* add ut for comparing FP32 and QAT INT8
* add save qat transformed model python script
test=develop
* updated
* added missing file
* add "with_label"
test=develop
* performance benchmark as unit test
test=develop
* change names of unnecessary thing
* Change CMakeList.txt for model downloading and UT
test=develop
* change names of functions and params for more readable code
test=develop
* Change PADDLE_ENFORCE messages
test=develop
* fix indent problems
test=develop
* indent problems
test=develop
* Implement Int8 FC
* Integrate FC into INT8v2
test=develop
* int8 FC: transpose weights before computing scales
test=develop
* Add support for activation_type string in FC
test=develop
* Disable MKL-DNN's FC in VGG16 and 19
test=develop
* Disable FC quantization when mkldnn FC is disabled
test=develop
* Solve PADDLE_ENFORCES in FC int8
* Fix Paddle enforces and remove const cast
test=develop
* Fix style changes
test=develop
* Fix quantizer_tester test and add fc quantization
test=develop
* Fix FC test fail on CUDA
* Remove unnecessary log from quantize placement pass
test=develop
* Add Thread ID to FC hash key
test=develop
* Add comments to MKL-DNN FC Kernel
test=develop
* Refactor quantizer
test=develop
* Fix linter issues
test=develop
* Fix crash in slim googlenet
test=develop
* Fix PADDLE_ENFORCE messages
test=develop
* Add fc padding to solve mkl performance
test=develop
* fix gpu pass and error information
test=develop
* fix fc_fuse_pass_test
test=develop
* fix error information
test=develop
* fix error information
test=develop
* fix name and add fc op padding test
test=develop
* fix attributes
test=develop
* optimize fc padding
test=develop
* fix test
test=develop
* Refactor MKL-DNN ElementwiseMul
remove manual fallback, remove format attrs
test=develop
* Refine PADDLE_ENFORCEs in eltwise_mul_op.h
test=develop
* Make ElementwiseMulOp inherit from ElementwiseOp
* Change type of simd_width to int
test=develop
* Remove Constructor extensions in ElementwiseOp and ElementwiseMulOp
test=develop
* Restore attributes
test=develop
* Fix test coverage for mkldnn eltwise mul
test=develop
* Conform to new is_run_common_broadcast API
test=develop
* Add UT for AreDimsAndFormatCorrect
test=develop
* Improve argsort performance.
- On a V100 with 200000 input elements, argsort is ~190x faster:
before optimization: 0.53s
after optimization: 0.0027s
- Add fp16 support
* Refine error message
* Refine code
test=develop
Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>
* fix fetch handler problem and refactor
When a user defines a FetchHandler class, they should initialize the handler
with a variable dict. Each key of the variable dict is a user-defined name, and
each value is a Variable generated from the Python API.
For each fetch, the user should implement a handler function in which
fetched_result_dict is available, so the fetched values can be accessed
with the user-defined keys.
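A minimal sketch of the pattern described in this entry, assuming nothing about the real Paddle classes: `FetchHandlerSketch`, `CollectingHandler` and `fetch_once` are hypothetical names, and plain strings stand in for Variables.

```python
# Hypothetical stand-in for illustration only; this is not the Paddle class.
class FetchHandlerSketch:
    def __init__(self, var_dict):
        # var_dict maps user-defined names to variables
        # (plain strings stand in for Variables in this sketch).
        self.var_dict = var_dict

    def handler(self, fetched_result_dict):
        # Users override this; results arrive keyed by their own names.
        raise NotImplementedError


class CollectingHandler(FetchHandlerSketch):
    def __init__(self, var_dict):
        super().__init__(var_dict)
        self.seen = []

    def handler(self, fetched_result_dict):
        # Access each fetched value through the user-defined key.
        self.seen.append(
            {name: fetched_result_dict[name] for name in self.var_dict}
        )


def fetch_once(handler, fetch_fn):
    """Hypothetical driver: fetch every variable, then call the handler."""
    results = {name: fetch_fn(var) for name, var in handler.var_dict.items()}
    handler.handler(results)
```

The handler only ever sees the user-chosen keys, so the internal variable names can change without touching the handler code.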
* Disable fusion_group pass for windows and mac. We will do some experiments on Linux first.
test=develop
* Print the subgraph when check failed.
test=develop
* add int8 kernel to lookup_table op and add dequantize op test=develop
* change paddle_enforce to paddle_enforce_eq test=develop
* change copyright and change some unsuitable code test=develop
* remove debug log test=develop
* replace GetInputType with IndicateVarDataType test=develop
* fix EmptyGradMaker test=develop
* fix diff between cpu and gpu test=develop
* use memcopy when int8_t test=develop
* open dygraph op test, test=develop
* modify to_variable, test=develop
* modify input and output for dygraph, test=develop
* modify input and output for dygraph(fix bug), test=develop
* fix input processing of dygraph op test, test=develop
* fix bug, test=develop
* fix op test, test=develop
* fix forward bug for dygraph, test=develop
* fix mkldnn op test for forward, test=develop
* update nn.py for dygraph, test=develop
* fix crop_tensor_op, test=develop
* fix elementwise_mul_op, test=develop
* fix fill_op, test=develop
* fix some mkldnn op, test=develop
* open backward op test for dygraph, test=develop
* delete log, test=develop
* close backward op test for dygraph, test=develop
* fix bug for edit_distance_op and test_lstm_cudnn_op, test=develop
* fix optest backward bug for dygraph, test=develop
* fix optest backward bug for dygraph, test=develop
* close backward op test for dygraph, test=develop
* close backward op test for dygraph, test=develop
* open dygraph op test, test=develop
* fix op test for dygraph, fix GradOpDescMaker, test=develop
* fix bug for linear_chain_crf_op.h, test=develop
* remove log, test=develop
* remove log, test=develop
* remove log for op_test.py, test=develop
* remove log for op_test.py, test=develop
* fix bug for var_conv_2d_op, change PADDLE_ENFORCE, test=develop
* fix PADDLE_ENFORCE_EQ for hierarchical_sigmoid_op.cc, test=develop
* fix bug for test_increment_ngraph_op.py, test=develop
* fix lod for op test in dygraph, test=develop
* refactor op_test.py to reduce redundant code, test=develop
* fix lod optest, modify InputVar/OutputVar to HasInput/HasOutput, test=develop
* remove debug log, test=develop
* remove redundant code in base.py, test=develop
* fix some error in optest, test=develop
* fix ClearNoNeedBufferInputs function's bug for LoDTensor, test=develop
* refactor op_test.py, test=develop
* remove redundant writing, test=develop
* fix error(get tensor of the grad variable), test=develop
* fix test_concat_mkldnn test_conv2d_mkldnn, test=develop
* fix optest.py for get tensor of LoDTensor, test=develop
* fix optest.py for get tensor of LoDTensor, test=develop
* fix optest.py for get tensor of LoDTensor, test=develop
* fix some redundant code, test=develop
* resolve conflict and rewrite paddle error message, test=develop
* fix the CAPI ZeroCopy shape error and reconstruct the output obtain
* use an anonymous namespace to cover the functor
* fix unit tests because the output of typeid(T).name() differs between Linux and Windows, test=develop
* Enable generating code for a given subgraph.
* Support sorting the subgraph.
* Remove the rearrangement of expressions because we use the sorted subgraph directly.
* Enable generating code for a subgraph which is composed of grad ops.
* Use expression information to check the accuracy in unittest.
* Separate load and store from computation expressions.
test=develop
* Improve the loading statements in generated codes.
test=develop
* Remove unused arguments from formal list.
test=develop
* fix auc drop first commit test=develop
* update datanorm op
* update datanorm with enforce test=develop
* update test=develop
* update format test=develop
* update format
* update format test=develop
* add unit test test=develop
* update unit test test=develop
* update format test=develop
* update format test=develop
* update API description test=develop
* update API description test=develop
* update format test=develop
* fix codes as comments test=develop
* fix description as comments test=develop
* fix description as comments test=develop
* update codes.. test=develop
* modified error message for conv and conv_transpose, test=develop
* modified doc of conv and conv_transpose op, test=develop
* modified the expression for error message, test=develop
* modified error message for group_norm op, test=develop
* modified detail of Attr(data_format) or Attr(data_layout)
* add ValueError in API doc for maxout op, test=develop
* copy some feasigns and corresponding embeddings from one sparse table to another
* copy all feasigns and corresponding embeddings from one sparse table to another
* copy all dense params from one table to another
* copy some local vars to other local vars
* Add asymmetric padding (Asypadding) for conv fusion.
test=develop
reference: pr/20042
* Fix eigen build link error
* Change back file mode
* Use math function & add more checks.
* set the default value of alpha for prelu to 0.25, test=develop
* add the call to __syncthreads(), test=develop
* fix the implementation of cpu prelu, test=develop
* repair the implementation of element mode prelu, test=develop
* modify test_prelu_op.py, test=develop
* Add the check of lod_level between compile-time and runtime.
test=develop
* Fix bug in check_compile_vs_runtime.
test=develop
* Fix the check of output when it is dispensable or intermediate.
test=develop
* Share lod of x to out in match_matrix_tensor op in compile-time.
* Implement GetLoDLevel in InferShapeContext.
* Set the default value of check_compile_vs_runtime to False and enable it in test_sequence_pad_op.
test=develop
* Enable check_compile_vs_runtime in test_match_matrix_tensor.
* Add the implementation of SetLoDLevel in InferShapeContext.
* Remove the implementation of IncreaseLoDLevel and call Get/SetLoDLevel instead.
* Remove the implementation of DecreaseLoDLevel and call Set/GetLoDLevel instead.
* Refine some ops and unittests.
test=develop
* Fix a typo.
test=develop
* Remove the check of var type, and change int to int32_t.
test=develop
* Add unittest for Get/SetLoDLevel.
test=develop
* Add the definition of operation in fusion_group.
* Use operations in OperationMap to detect fusion_group of elementwise pattern.
* Add namespace fusion_group in code_generator.
* Use operations recorded in OperationMap to generate code.
* Move implementation code to .cc file.
* Refine Operation and CodeGenerator to make it easier to generate code for grad_op.
Refine the unittest for better reuse.
* Avoid recording the template's keyword in an array.
* Support the generating of code for grad_op and add unittest.
test=develop
* Remove replaced_element_in_order and use number instead.
test=develop
* fix bug in pool/conv/conv_transpose:
1. It should be stride[i] not stride[0] in UpdatePaddingAndDilation;
2. fix bug of func _get_padding_with_SAME in test_conv/conv_transpose_op.py;
3. fix bug of the computation process in function conv2dtranspose_forward_naive.
test=develop
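For illustration, a minimal Python sketch of per-dimension SAME padding, showing why the stride must be indexed by dimension. `same_padding` is a hypothetical helper, not the C++ `UpdatePaddingAndDilation` itself, and dilation is assumed to be 1.

```python
def same_padding(in_sizes, ksizes, strides):
    """Return (pad_before, pad_after) per spatial dimension for SAME padding.

    Hypothetical helper for illustration; assumes dilation == 1.
    """
    pads = []
    for i, (in_size, ksize) in enumerate(zip(in_sizes, ksizes)):
        stride = strides[i]  # per-dimension stride; strides[0] is wrong for i > 0
        out_size = (in_size + stride - 1) // stride  # ceil(in_size / stride)
        pad_sum = max((out_size - 1) * stride + ksize - in_size, 0)
        pad_before = pad_sum // 2
        pads.append((pad_before, pad_sum - pad_before))
    return pads


# With strides (1, 2) the second dimension needs different padding than the first.
print(same_padding([8, 8], [3, 3], [1, 2]))  # [(1, 1), (0, 1)]
```

Using `strides[0]` for every dimension would give `(1, 1)` for the second dimension as well, which only goes unnoticed when all strides are equal.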
* change test to make the data of different dimensions different. test=develop
* Add ernie unit test
test=develop
* Add ernie unit test
test=develop
* Add ernie unit test
test=develop
* remove ngraph
* optimize gpu test
test=develop
* optimize codes
test=develop
* Enrich the type of error and declare the error type interfaces, test=develop
* adjust tests to adapt new form, test=develop
* add inference deps with error_codes.pb.h, test=develop
* restore stack iter start pos, test=develop
* polish code based review comments, test=develop
* Add asymmetric padding support for mkldnn pooling
test=develop
* Add asymmetric padding support for mkldnn conv
test=develop
* Add asymmetric padding support for mkldnn conv_transpose
test=develop
* Add c++ global current tracer for dygraph, test=develop
* add tracer property in c++, test=develop
* support different place, test=develop
* add unittest for tracer, test=develop
* remove duplicate code and duplicate config of master+patch
* drop all ins which have a conflicting slot or size < merge_size
* the user only needs to set the merge size; if the number of ins with the same id is not equal to the merge size, those ins are dropped
* the user must make sure master data and patch data have no common slot whose feasigns are both non-zero, otherwise these ins will be dropped. (the slot list should still be the same for both master and patch)
* test=develop
* support no need buffer vars in dygraph, test=develop
* fix inference compilation error, test=develop
* update no_need_buffer_vars_inference, test=develop
* add unittests for no_need_buffer_vars_context, test=develop
* refine no_need_buffer_vars by return ref, test=develop
* polish some codes, test=develop
* fix the bug of the conv_transpose cudnn kernel: before version 1.6, the data_format is AnyLayout in the inference model. When version 1.6 loads a model saved by a previous version, an error occurs, because the cudnn kernel in version 1.6 is not compatible with the AnyLayout setting.
* don't expose numerous Tensor.set(), test=develop
* fix condition, test=develop
* fix float16 bug, test=develop
* feed should be Tensor or np.array, not Variable or number, test=develop
* use forcecast to copy numpy slice to new array, test=develop
* remove float16-uint16 hacking, test=develop
* Refine the cache of program, context and scope in executor.
test=develop
* Refine the unittest test_executor_and_use_program_cache.
* Add the test of PaddingRNN with use_program_cache=True.
test=develop
* Remove a check.
test=develop
* Refine the unittest to check whether it is correct when setting use_program_cache=True.
test=develop
* Move the codes of fused operators to operators/fused directory.
test=develop
* Correct the op name in cmake.
* Change the use of PADDLE_ENFORCE.
test=develop
* improve split and concat op:
1. support Tensor for argument 'dim' in split op.
2. support Tensor for argument 'axis' in concat op.
test=develop
* redefine function GetDataFromTensor and set unknown output shape to -1.
test=develop
* add check: Attr(sections) match Input(X). test=develop
* support Tensor for attr(sections) and attr(sections) can contain -1.
add check for attr(sections).
test=develop
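A minimal sketch of how a -1 entry in attr(sections) can be resolved against the size of the split dimension. `resolve_sections` is a hypothetical helper; the checks in the actual op may differ.

```python
def resolve_sections(dim_size, sections):
    """Replace a single -1 in sections with the remaining size.

    Hypothetical helper for illustration; not the op's real implementation.
    """
    unknown = [i for i, s in enumerate(sections) if s == -1]
    if len(unknown) > 1:
        raise ValueError("at most one section may be -1")
    known_sum = sum(s for s in sections if s != -1)
    resolved = list(sections)
    if unknown:
        if known_sum > dim_size:
            raise ValueError("known sections exceed the dimension size")
        resolved[unknown[0]] = dim_size - known_sum
    elif known_sum != dim_size:
        # Without a -1, the sections must match the input dimension exactly.
        raise ValueError("sections must sum to the dimension size")
    return resolved


print(resolve_sections(10, [2, -1, 3]))  # [2, 5, 3]
```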
* modify error message for concat and call Resize only when necessary. test=develop
* Refine the InferShape of ReadFrom and WriteTo op, and add comment to explain why not call ShareLoD for runtime.
test=develop
* Add comment for ReorderLoDTensorByRank op.
* Add comment for lod_tensor_to_tensor_array op to explain why only call DecreaseLoDLevel for compile time.
test=develop
* ShrinkRNNMemory op should call ShareLoD for compile time.
test=develop
* Add the implementation of IncreaseLoDLevel and add the compile-time check of lod_level in InferShape of sequence_pool.
test=develop
* Refine the unittest of DynamicRNN.
test=develop
* Change PADDLE_ENFORCE to PADDLE_ENFORCE_NE.
test=develop
* Add fusion_group_pass and elementwise pattern.
* Rewrite the detector of elementwise group.
test=develop
* Add a comment in codegen.
* Add more unittest cases.
test=develop
* Move code_generator related code to fusion_group directory.
* Correct the including path.
* Add the definition of SubGraph and finish the insert of fusion_group op in pass.
* Insert graph_vis_pass in tester to visualize the graph for debug.
* replace part of the old implementation, test=develop
* restore concat op, test=develop
* update all ops implementation & delete GetDataTypeOfVar func, test=develop
* no longer need to define all embedding layers (no one less) of all slots in each program. make trainer_param repeated in ps.proto.
* add find_distributed_lookup_table_grads instead of hard code GRAD
* support embedding stop gradient. push sparse raised an error before this fix.
* fix fill sparse, skip slots which do not have embedding. before this fix, each slot's embedding in a sparse table had to be used in all training programs.
* fix pull sparse, skip slots which do not have embedding.
* fix collect feasign label info, skip slots which do not have embedding.
* support when there are multi sparse tables in one or multi training programs, each program can pull/push its own related sparse tables instead of all sparse tables.
* test=develop
* All elements in attr(shape) of crop_tensor can be -1, test=develop, test=document_preview
* fix the bug that attr(offsets) should be initialized, test=develop
* - Flushing mkl-dnn cache
test=develop
- Disabled clearing cache for LoadModel
- Added clearing of mkl-dnn cache when Executor is created
test=develop
- Do not clear for GPU places
test=develop
- compilation fix
test=develop
* - Moved clearing of mkl-dnn cache in destructor of executor
test=develop
* - Compilation fix
test=develop
- Reverted conditional clearing of mkl-dnn cache in Executor's
destructor
test=develop
- compilation fix
* improve save and load behaviour, test=develop
* code cleaning, test=develop
* disable check_guards and update_guards in release version, test=develop
* fix compilation issue, test=develop
* add buddy_allocator speed test data, test=develop
* fix compilation issue, test=develop
* fix comment, test=develop
* update function names according to the google C++ style guide, test=develop
* tweak the test data format, test=develop
* move buddy_allocator_test_data to paddle/fluid/testdata, test=develop
* add accessor and mutator for Desc, test=develop
* add data type check, test=develop
* polish error messages, test=develop
* polish error messages, test=develop
* Remove support for the CPU architecture matmul, test=develop
* fix syntax bug, test=develop
* Fix docs of gru_unit and dynamic_gru.
Fix basic_gru in rnn_impl.py.
Add error messages for param_attr setting in layer_norm api.
Add int64 dtype for expand.
test=develop
* Reopen unit-tests of basic_gru/basic_lstm in rnn_impl.py.
test=develop
* Add unit test for layer_norm api.
test=develop
* Remove the deprecated gru doc fix. test=develop
* Fix basic_gru test coverage. test=develop
* Update API.spec. test=develop
* Update API.spec. test=develop
* Fix test_basic_gru coverage test. test=develop
* Update test_basic_gru in test_layers to use fluid.data
test=develop
* Update test_basic_gru for coverage. test=develop
* Refine the documentation of sums.
* Remove Chinese comments and update API.spec.
* Refine the description of input argument.
* Update API.spec.
test=develop
test=document_fix
* refine the en api doc of ones, zeros, reverse, increment, hsigmoid and create_py_reader_by_data ops
test=develop, test=document_preview, test=document_fix
* refine eng doc for hsigmoid and create_py_reader_by_data ops
test=develop, test=document_preview, test=document_fix
* update API.spec
test=document_fix
* Fix the parameter name axis of reverse op in eng doc
test=develop, test=document_fix
* Update API.spec
test=develop, test=document_fix
* Refine eng doc of zeros, ones, reverse and assign op
test=develop, test=document_fix
* Update API.spec for assign, ones, zeros and reverse
test=develop, test=document_fix
* Fix data type of reverse op in eng doc
test=develop, test=document_fix
* Update API.spec for reverse op
test=develop, test=document_fix
* refine eng doc for hard_sigmoid op
test=develop
test=document_fix
* refine the description of hard_sigmoid
test=develop
test=document_fix
* update API.spec
test=document_fix
* Refine the description of parameters of HardSigmoid op
test=develop, test=document_fix
* Update API.spec for hard_sigmoid op
test=develop, test=document_fix
* add input type and dtype check for accuracy_op
* add input type and dtype check for accuracy_op
* modify python error on accuracy_op,add test=develop
* modify details on accuracy_op, test=develop
* test float16, test=develop
* add warning, test=develop
* Refine the main comment of DynamicRNN.
* Refine the documentation of DynamicRNN's step_input function.
* Refine the documentation of DynamicRNN's static_input function.
* Refine the documentation of DynamicRNN's block function.
* Refine the documentation of DynamicRNN's memory function.
* Refine the documentation of DynamicRNN's update_memory and output function.
* Refine the code format and remove the method list.
* Refine the documentation of DynamicRNN's __call__ function.
test=develop
test=document_fix
* Minor modification.
test=develop
test=document_fix
* Fix some typos.
* Update API.spec.
test=develop
test=document_fix
* Refine the English according to the comments.
* Update API.spec.
test=develop
test=document_fix
* Fix some typos.
* Update API.spec.
* fix English Doc of API:layers.py_func/sum, test=document_fix
* fix English Doc of API:layers.array_read/array_write/array_length,test=develop test=document_fix
* update the api en doc of BuildStrategy and its setting, test=develop, test=document_fix
* update api.spec, test=develop, test=document_fix
* update the en doc of fuse_relu_depthwise_conv, test=develop, test=document_fix
* fix the reduce api en doc test=document_fix test=develop
* fix the fluid.data test=develop test=document_fix
* fix the API.spec test=develop test=document_fix
* fix according to the review test=develop test=document_fix
* fix the conflict test=develop test=document_fix
* test=develop, fix docker with paddle nccl problem
* test=develop, refine en_doc for Variable and Program
* test=document_fix, fix English doc for Variable and Program
* test=document_fix, refine astype code block style
* test=document_fix, add example code for Variable properties
* test=document_fix, fix BackwardStrategy English Doc
* test=document_fix, fix syntax
* test=document_fix, refresh API.spec
* test=document_fix, refine api spec
* test=document_fix, refine api spec
* test=document_fix
test=develop
Fix English API doc for the ops retinanet_target_assign, sigmoid_focal_loss and retinanet_detection_output.
* test=document_fix
test=develop
remove Notice: this OP supports CPU mode only in english doc api of retinanet_target_assign and retinanet_detection_output
* test=document_fix
test=develop
fix API Difference for retinanet_target_assign and retinanet_detection_output
* fix API.spec conflicts in english doc api of retinanet_target_assign, sigmoid_focal_loss, retinanet_detection_output
test=develop
test=document_fix
* fix API.spec conflicts in english doc api of retinanet_target_assign, sigmoid_focal_loss, retinanet_detection_output
test=develop
test=document_fix
* test=document_fix
Fix English API doc for the ops greater_equal, greater_than, less_equal, not_equal,
rank, rsqrt, diag, linspace, reduce_all, reduce_any, sign, where, zeros_like, unique_with_counts.
* Fix some format problems in the ops sign and greater_than.
test=develop
test=document_fix
* Fix the example of zeros_like, and update api.spec
test=develop
test=document_fix
* test=develop, fix docker with paddle nccl problem
* test=develop, refine en_doc for Variable and Program
* test=document_fix, fix English doc for Variable and Program
* test=document_fix, refine astype code block style
* test=document_fix, add example code for Variable properties
* add api check in fc test=develop
* enforce shape error info of sum op test=develop
* fix spelling test=develop
* print x_dims info test=develop
* enhance shape error info test=develop
* polish minimize en doc
* polish adam optimizer en doc
* polish adamax optimizer en doc
* polish adagrad and decayed adagrad optimizer en doc
* polish model average en doc, test=develop, test=document_fix, test=document_preview
* self review and further polishing doc
* update API.spec, test=develop, test=document_fix
* update fluid.data api in examples, test=develop, test=document_fix
* update fluid.data interface, test=develop, test=document_fix
* replace -1 by none, test=document_fix
* fix fluid.data code example, test=develop, test=document_preview, test=document_fix
* use None instead of -1 in shape, test=develop, test=document_preview, test=document_fix