* update benchmark for int8v2, QAT1, QAT2 accuracy and performance
test=document_fix
* change according to reviews
test=develop test=document_fix
* improve some descriptions and some models
test=develop test=document_fix
* update models benchmark data
test=develop test=document_fix
* update int8v2 and qat2 performance
test=develop test=document_fix
* fix python API tests that do not need to inherit OpTest, test=develop
* fix fp16 cases that will only be enabled in GPU mode, test=develop
* remove TestSoftmaxFP16Op from test cases of softmax_mkldnn_op, test=develop
* fix tests so that the cases are only created in GPU mode, test=develop
Add tests to use dy/dx to make sure the gradient values calculated by the control flow backward is correct. Also fixed bugs detected by those tests.
Fix bugs:
1. Unlike sum_op, optimizer ops don't allow uninitialized input tensor. But in conditional_block_grad_op, since the conditional_block may not run, the output gradient tensor may be uninitialized, which will cause the optimizer op error. To fix it, we should let optimizer ops support uninitialized input like sum_op or assign the uninitialized gradient to 0 when the conditional_block_grad_op doesn't run. I found there are about 10+ optimizer ops. **To be simpler, I just assign output gradient of the conditional_block_grad_op to 0 in this PR**. But it can be further explored whether we can make optimizer ops like sum_op to support uninitialized input tensor because theoretically we can speed up without the assigning in conditional_block_grad_op.
2. Infer parameter shapes during append_backward. I didn't know that all our parameters are in global block. When op_desc is inferring shapes at the sub-block, it may not know the shape of gradients of parameters whose shape information is at global block. I fixed it by inferring shapes of gradients from forward var.
This PR also did some code clean up:
1. Print the var name when sgd_op catches shape error so that it is easier to debug
2. Fix a typo: dicta -> dict
* add file check_op_desc.py and add interface to get default value. test=develop
* add test for c++ coverage rate. test=develop
* Correct typo. test=develop
* dygraph mode support linear lr warm up; test=develop
* add unitest for linear warmup; test=develop
* add input type check; test=develop
* fix type check assert error; test=develop
* change type error; test=develop
* test=develop, fix docker with paddle nccl problem
* don't expose numerous Tensor.set(), test=develop
* fix condition, test=develop
* fix float16 bug, test=develop
* feed should be Tensor or np.array, not Variable or number, test=develop
* use forcecast to copy numpy slice to new array, test=develop
* remove float16-uint16 hacking, test=develop
* add variable method to varbase and refactor to_variable to support return varbase
* support kwargs in varbase constructor
* add VarBase constructor to support default python args
* refine varbase initial method
* reset branch
* fix ut for change VarBase error info to PaddleEnforce
* cherry is parameter change before
* overload isinstance to replace too many change of is_variable
* rm useless files
* rm useless code merged by git
* test=develop, fix some ut failed error
* test=develop, fix test_graph_wrapper
* add some tests, test=develop
* refine __getitem__, test=develop
* add tests, test=develop
* fix err_msg, test=develop
* fix the device supported of the op unique and unique_with_counts.
test=develop
test=document_fix
* Fix the precision of test in the op of unique and unique_with_counts.
test=develop
test=document_fix
* add param & grad shape check for sgd op
* add _reshape_inplece interface for dygraph parallel
* refine unittest based paddle/models scripts, test=develop
* add unittest for parallel grad fuse, test=develop
* Commit before merging develop
test=develop
* Backup after working with Huihuang logs
* Commit before deleting Huihuang debug loggings
* Commit before debug
test=develop
* Fix bug commit
test=develop
* Backup of fixing bugs
test=develop
* Clean up code
test=develop
* Fix a bug in sum_op
test=develop
* Add ascending for argsort
* Refine api doc description.
* Refine descending description
* Add int32 logic to speedup when data is small size.
* Remove int32 opt as not support in python
* add ut for comparing FP32 and QAT INT8
* add save qat transformed model python script
test=develop
* updated
* added missing file
* add "with_label"
test=develop
* performance benchmark as unit test
test=develop
* change names of unnecessary thing
* Change CMakeList.txt for model downloading and UT
test=develop
* change names of functions and params for more readable code
test=develop
* Change PADDLE_ENFORCE messages
test=develop
* fix indent problems
test=develop
* indent problems
test=develop
* Implement Int8 FC
* Integrate FC into INT8v2
test=develop
* int8 FC: transpose weights before computing scales
test=develop
* Add support for activation_type string in FC
test=develop
* Disable MKL-DNN's FC in VGG16 and 19
test=develop
* Disable FC quantization when mkldnn FC is disabled
test=develop
* Solve PADDLE_ENFORCES in FC int8
* Fix Paddle enforces and remove const cast
test=develop
* Fix style changes
test=develop
* Fix quantizer_tester test and add fc quantization
test=develop
* Fix FC test fail on CUDA
* Remove unnecessary log from quantize placement pass
test=develop
* Add Thread ID to FC hash key
test=develop
* Add comments to MKL-DNN FC Kernel
test=develop
* Refactor quantizer
test=develop
* Fix linter issues
test=develop
* Fix crash in slim googlenet
test=develop
* Fix PADDLE_ENFORCE messages
test=develop
* Add fc padding to solve mkl performance
test=develop
* fix gpu pass and error information
test=develop
* fix fc_fuse_pass_test
test=develop
* fix error information
test=develop
* fix error information
test=develop
* fix name and add fc op padding test
test=develop
* fix attributes
test=develop
* optimize fc padding
test=develop
* fix test
test=develop
* Refactor MKL-DNN ElementwiseMul
remove manual fallback, remove format attrs
test=develop
* Refine PADDLE_ENFORCEs in eltwise_mul_op.h
test=develop
* Make ElementwiseMulOp inherit from ElementwiseOp
* Change type of simd_width to int
test=develop
* Remove Constructor extensions in ElementwiseOp and ElementwiseMulOp
test=develop
* Restore attributes
test=develop
* Fix test coverage for mkldnn eltwise mul
test=develop
* Conform to new is_run_common_broadcast API
test=develop
* Add UT for AreDimsAndFormatCorrect
test=develop
* Improve argsort performance.
- Give 200000 data to compute argsort on v100,
can speed up ~190x
before opt cost: 0.53s
after opt cost:0.0027s
- Add fp16 support
* Refine error message
* Refine code
test=develop
Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>
* fix fetch handler problem and refactor
when a user define FetchHandler class, he or she should initialize a handler
with variable dict. the key of a variable dict is a user defined name,
the value of a variable dict is a Varaible generated from python API.
For each fetching, a user should implement handler function in which
fetched_result_dict will be available and the user can access the fetched value
with user defined keys.
* add int8 kernel to lookup_table op and add dequantize op test=develop
* change paddle_enforce to paddle_enforce_eq test=develop
* change copyright and change some not suitable code test=develop
* remove debug log test=develop
* replace GetInputType with IndicateVarDataType test=develop
* fix EmptyGradMaker test=develop
* fix diff between cpu and gpu test=develop
* use memcopy when int8_t test=develop
* open dygraph op test, test=develop
* modify to_variable, test=develop
* modify input and output for dygraph, test=develop
* modify input and output for dygraph(fix bug), test=develop
* fix input processing of dygraph op test, test=develop
* fix bug, test=develop
* fix op test, test=develop
* fix forward bug for dygraph, test=develop
* fix mkldnn op test for forward, test=develop
* update nn.py for dygraph, test=develop
* fix crop_tensor_op, test=develop
* fix elementwise_mul_op, test=develop
* fix fill_op, test=develop
* fix some mkldnn op, test=develop
* open backward op test for dygraph, test=develop
* delete log, test=develop
* close backward op test for dygraph, test=develop
* fix bug for edit_distance_op and test_lstm_cudnn_op, test=develop
* fix optest backward bug for dygraph, test=develop
* fix optest backward bug for dygraph, test=develop
* close backward op test for dygraph, test=develop
* close backward op test for dygraph, test=develop
* open dygraph op test, test=develop
* fix op test for dygraph, fix GradOpDescMaker, test=develop
* fix bug for linear_chain_crf_op.h, test=develop
* remove log, test=develop
* remove log, test=develop
* remove log for op_test.py, test=develop
* remove log for op_test.py, test=develop
* fix bug for var_conv_2d_op, change PADDLE_ENFORCE, test=develop
* fix PADDLE_ENFORCE_EQ for hierarchical_sigmoid_op.cc, test=develop
* fix bug for test_increment_ngraph_op.py, test=develop
* fix lod for op test in dygraph, test=develop
* refactor op_test.py to reduce redundant code, test=develop
* fix lod optest, modify InputVar/OutputVar to HasInput/HasOutput, test=develop
* remove debug log, test=develop
* remove redundant code in base.py, test=develop
* fix some error in optest, test=develop
* fix ClearNoNeedBufferInputs function's bug for LoDTensor, test=develop
* refactor op_test.py, test=develop
* remove redundant writing, test=develop
* fix error(get tensor of the grad variable), test=develop
* fix test_concat_mkldnn test_conv2d_mkldnn, test=develop
* fix optest.py for get tensor of LoDTensor, test=develop
* fix optest.py for get tensor of LoDTensor, test=develop
* fix optest.py for get tensor of LoDTensor, test=develop
* fix some redundant code, test=develop
* reslove conflict and rewrite paddle error message, test=develop
* add control flow API: case. test=develop
* delete 'raise TypeError' in _error_message() and return a string. test=develop
* polish API document. test=develop
* fix auc drop first commit test=develop
* update datanorm op
* update datanorm with enforce test=develop
* update test=develop
* update format test=develop
* update format
* update format test=develop
* add unit test test=develop
* update unit test test=develop
* update format test=develop
* update format test=develop
* update API description test=develop
* update API description test=develop
* update format test=develop
* fix codes as comments test=develop
* fix description as comments test=develop
* fix description as comments test=develop
* update codes.. test=develop
* Fix TensorRT detection bug
1. Add new search path for TensorRT at tensorrt.cmake
2. Add better debug message
3. Fix the bug of detection of TensorRT version
In NVIDIA official docker image, TensorRT headers are located at
`/usr/include/x86_64-linux-gnu` and TensorRT libraries are located
at `/usr/lib/x86_64-linux-gnu`, so using `-DTENSORRT_ROOT` will
fail to detect TensorRT.
There is no debug/warning message to tell developer that TensorRT
is failed to be detected.
In later version of TensorRT (e.g. v6), `NV_TENSORRT_MAJOR` is
defined at `NvInferVersion.h` instead of `NvInfer.h`, so add
compatibility fix.
* Fix TensorRT variables in CMake
1. Replace `${TENSORRT_ROOT}/include` with `${TENSORRT_INCLUDE_DIR}`
2. Replace `${TENSORRT_ROOT}/lib` with `${TENSORRT_LIBRARY}`
Manually type path may locate incorrect path of TensorRT. Use the
paths detected by system instead.
* Fix TensorRT library path
1. Add new variable - `${TENSORRT_LIBRARY_DIR}`
2. Fix TensorRT library path
inference_lib.cmake and setup.py.in need the path of TensorRT library
instead of the file of TensorRT library, so add new variable to fix it.
* Add more general search rule for TensoRT
Let system detect architecture instead of manually assign it, so
replace `x86_64-linux-gnu` with `${CMAKE_LIBRARY_ARCHITECTURE}`.
* Add more general search rule for TensorRT
Remove duplicate search rules for TensorRT libraries. Use
`${TENSORRT_LIBRARY_DIR}` to get full path of libnvinfer.so
test=develop
* modified error message for conv and conv_transpose, test=develop
* modified doc of conv and conv_transpose op, test=develop
* modified the expression for error message, test=develop
* modified error message for group_norm op, test=develop
* modified detail of Attr(data_format) or Attr(data_layout)
* add ValueError in API doc for maxout op, test=develop
* add API switch_case. test=develop
add Nest
* modify code according to reviews:
1.Attr(branch_index) support 'uint8' and 'int64' besides 'int32'.
2.remove useless code.
test=develop
* replace fluid.layers.data with fluid.data and polish API document. test=develop
* copy some feasigns and corresponding embeddings from one sparse table to another
* copy all feasigns and corresponding embeddings from one sparse table to another
* copy all dense params from one table to another
* copy some local vars to other local vars
* add input type and dtype check template, and update some APIs check
* refine check template, and update some APIs check in nn.py
* update some APIs check in loss.py
test=develop
* Add Asypadding for conv fusion.
test=develop
reference: pr/20042
* Fix eigen build link error
* Change back file mode
* Use math function & add more checks.
* set the default value of alpha for prelu to 0.25, test=develop
* add the call to __syncthreads(), test=develop
* fix the implementation of cpu prelu, test=develop
* repair the implementation of element mode prelu, test=develop
* modify test_prelu_op.py, test=develop
* Add the check of lod_level between compile-time and runtime.
test=develop
* Fix bug in check_compile_vs_runtime.
test=develop
* Fix the check of output when it is dispensiable or intermediate.
test=develop
* Share lod of x to out in match_matrix_tensor op in compile-time.
* Implement GetLoDLevel in InferShapeContext.
* Set the default value of check_compile_vs_runtime to False and enable it in test_sequence_pad_op.
test=develop
* Enable check_compile_vs_runtime in test_match_matrix_tensor.
* Add the implementation of SetLoDLevel in InferShapeContext.
* Remove the implementation of IncreaseLoDLevel and call Get/SetLoDLevel instead.
* Remove the implementation of DecreaseLoDLevel and call Set/GetLoDLevel instead.
* Refine some ops and unittests.
test=develop
* Fix a typo.
test=develop
* Remove the check of var type, and change int to int32_t.
test=develop
* Add unittest for Get/SetLoDLevel.
test=develop
* split some APIs from nn.py to rnn.py
* split some APIs from nn.py to sequence_lod.py
test=develop
* fix unit-test bug
test=develop
* fix test_layers unit-test bug
test=develop
* fix bug in pool/conv/conv_transpose:
1. It should be stride[i] not stride[0] in UpdatePaddingAndDilation;
2. fix bug of func _get_padding_with_SAME in test_conv/conv_transpose_op.py;
3. fix bug of the computation process in function conv2dtranspose_forward_naive.
test=develop
* change test to make the data of different dimensions different. test=develop
* Add asymetric padding support for mkldnn pooling
test=develop
* Add asymetric padding support for mkldnn conv
test=develop
* Add asymetric padding support for mkldnn conv_transpose
test=develop
* Add c++ global current tracer for dygraph, test=develop
* add tracer property in c++, test=develop
* support different place, test=develop
* add unittest for tracer, test=develop
* remove duplicate code and duplicate config of master+patch
* drop all ins which has conflict slot or size < merge_size
* user only need to set merge size,if ins num of same id is not equal to merge size, just drop these ins
* user must make sure master data and patch data has no same slot whose feasigns are both non-zero, otherwise these ins will be dropped. (slot list should still be the same of both master and patch)
* test=develop
* add launch_ps module so that we can launch a parameter server training job
1) a user can specify worker_num and server_num
2) parameter server can be killed after all workers exit
3) unit test is added
test=develop
* don't expose numerous Tensor.set(), test=develop
* fix condition, test=develop
* fix float16 bug, test=develop
* feed should be Tensor or np.array, not Variable or number, test=develop
* use forcecast to copy numpy slice to new array, test=develop
* remove float16-uint16 hacking, test=develop
* Refine the cache of program, context and scope in executor.
test=develop
* Refine the unittest test_executor_and_use_program_cache.
* Add the test the PaddingRNN with use_program_cache=True.
test=develop
* Remove a check.
test=develop
* Refine the unittest to check whether it is correct when setting use_program_cache=True.
test=develop
* improve split and concat op:
1. support Tensor for argument 'dim' in split op.
2. support Tensor for argument 'axis' in concat op.
test=develop
* redefine function GetDataFromTensor and set unknown output shape to - 1.
test=develop
* add check: Attr(sections) match Input(X). test=develop
* support Tensor for attr(sections) and attr(sections) can contain -1.
add check for attr(sections).
test=develop
* modify error message for concat and call Resize only when necessary. test=develop
* Refine the InferShape of ReadFrom and WriteTo op, and add comment to explain why not call ShareLoD for runtime.
test=develop
* Add comment for ReorderLoDTensorByRank op.
* Add comment for lod_tensor_to_tensor_array op to explain why only call DecreaseLoDLevel for compile time.
test=develop
* ShrinkRNNMemory op should call ShareLoD for compile time.
test=develop
* Add the implementation of IncreaseLoDLevel and add the compile-time check of lod_level in InferShape of sequence_pool.
test=develop
* Refine the unittest of DynamicRNN.
test=develop
* Change PADDLE_ENFORCE to PADDLE_ENFORCE_NE.
test=develop
* no longer need to define all embedding layers (no one less) of all slots in each program. make trainer_param repeated in ps.proto.
* add find_distributed_lookup_table_grads instead of hard code GRAD
* support embedding stop gradient. push sparse has error before fix this.*
* fix fill sparse, skip slots which do not have embedding. each slot's embedding in a sparse table should be used in all training programs before fix this.
* fix pull sparse, skip slots which do not have embedding.
* fix collect feasign label info, skip slots which do not have embedding.
* support when there are multi sparse tables in one or multi training programs, each program can pull/push its own related sparse tables instead of all sparse tables.
* test=develop
* All elements in attr(shape) of crop_tensor can be -1, test=develop, test=document_preview
* fix the bug that attr(offsets) should be initialized, test=develop
* refine Categorical and MultivariateNormalDiag en doc test=develop, test=document_fix
* refine Categorical and MultivariateNormalDiag en doc test=develop, test=document_fix
* add cuda place and remove fluid.layers.data for test of python API, test=develop
* add cuda place and remove fluid.layers.data for test of python API, test=develop
* modified batch size for Python API test, test=develop
* add data type check, test=develop
* polish error messages, test=develop
* polish error messages, test=develop
* Remove support for the CPU architecture matmul, test=develop
* fix syntax bug, test=develop
* add input type and dtype check for cast_op
test=develop
* fix annotation
test=develop
* support more data type
test=develop
* fix bug for fill_constant's error type
test=develop
* improve converage
test=develop
* improve converage
test=develop
* Fix docs of gru_unit and dynamic_gru.
Fix basic_gru in rnn_impl.py.
Add error messages for param_attr setting in layer_norm api.
Add int64 dtype for expand.
test=develop
* Reopen unit-tests of basic_gru/basic_lstm in rnn_impl.py.
test=develop
* Add unit test for layer_norm api.
test=develop
* Remove the deprecated gru doc fix. test=develop
* Fix basic_gru test coverage. test=develop
* Update API.spec. test=develop
* Update API.spec. test=develop
* Fix test_basic_gru coverage test. test=develop
* Update test_basic_gru in test_layers to use fluid.data
test=develop
* Update test_basic_gru for coverage. test=develop
* Refine the documentation of sums.
* Remove Chinese comments and update API.spec.
* Refine the description of input argument.
* Update API.spec.
test=develop
test=document_fix
* refine the en api doc of ones, zeros, reverse, increment, hsigmoid and create_py_reader_by_data ops
test=develop, test=document_preview, test=document_fix
* refine eng doc for hsigmoid and create_py_reader_by_data ops
test=develop, test=document_preview, test=document_fix
* update API.spec
test=document_fix
* Fix the parameter name axis of reverse op in eng doc
test=develop, test=document_fix
* Update API.spec
test=develop, test=document_fix
* Refine eng doc of zeros, ones, reverse and assign op
test=develop, test=document_fix
* Update API.spec for assign, ones, zeros and reverse
test=develop, test=document_fix
* Fix data type of reverse op in eng doc
test=develop, test=document_fix
* Update API.spec for reverse op
test=develop, test=document_fix
* refine eng doc for hard_sigmoid op
test=develop
test=document_fix
* refine the description of hard_sigmoid
test=develop
test=document_fix
* update API.spec
test=document_fix
* Refine the decription of parameters of HardSigmoid op
test=develop, test=document_fix
* Update API.spec for hard_sigmoid op
test=develop, test=document_fix
* add input type and dtype check for accuracy_op
* add input type and dtype check for accuracy_op
* modify python error on accuracy_op,add test=develop
* modify details on accuracy_op, test=develop
* test float16, test=develop
* add warning, test=develop
* Refine the main comment of DynamicRNN.
* Refine the documentation of DynamicRNN's step_input function.
* Refine the documentation of DynamicRNN's static_input function.
* Refine the documentation of DynamicRNN's block function.
* Refine the documentation of DynamicRNN's memory function.
* Refine the documentation of DynamicRNN's update_memory and output function.
* Refine the code format and remove the method list.
* Refine the documentation of DynamicRNN's __call__ function.
test=develop
test=document_fix
* Minor modification.
test=develop
test=document_fix
* Fix some typo.
* Update API.spec.
test=develop
test=document_fix
* Refine the English according to the comments.
* Update API.spec.
test=develop
test=document_fix
* Fix some typo.
* Update API.spec.
* Add fp16 in input.dtype check test=develop
* Add warning of fp16 in CPU test=develop
* add unittest code for fp16 test=develop
* fix float16 list error test=develop
* fix English Doc of API:layers.py_func/sum, test=document_fix
* fix English Doc of API:layers.array_read/array_write/array_length,test=develop test=document_fix
* fix the reduce api en doc test=document_fix test=develop
* fix the fluid.data test=develop test=document_fix
* fix the API.spec test=develop test=document_fix
* fix according the review test=develop test=document_fix
* fix the confilict test=develop test=document_fix
* Leave fake quantization around mul
* Replace Fake with Real Quantized Mul
* Gather all scales from fake_quantize_ops
* Enable uint8 in conv_relu tensors
* Disable int8 mul and restore fake mul
* Fix buf for running QAT on VGG16 and 19
* test=document_fix
test=develop
Fix english doc api, invloves the op of retinanet_target_assign, sigmoid_focal_loss and retinanet_detection_output.
* test=document_fix
test=develop
remove Notice: this OP supports CPU mode only in english doc api of retinanet_target_assign and retinanet_detection_output
* test=document_fix
test=develop
fix API Difference for retinanet_target_assign and retinanet_detection_output
* fix API.spec conflicts in english doc api of retinanet_target_assign, sigmoid_focal_loss, retinanet_detection_output
test=develop
test=document_fix
* fix API.spec conflicts in english doc api of retinanet_target_assign, sigmoid_focal_loss, retinanet_detection_output
test=develop
test=document_fix
* test=document_fix
Fix english doc api, invloves the op of greater_equal,greater_than,less_equal,not_equal,
rank,rsqrt,diag,linspace,reduce_all,reduce_any,sign,where,zeros_like,unique_with_counts.
* Fix some format problem in the op of sign and greather_than.
test=develop
test=document_fix
* Fix the example of zeros_like, and update api.spec
test=develop
test=document_fix
* test=develop, fix docker with paddle nccl problem
* test=develop, refine en_doc for Variable and Program
* test=document_fix, fix English doc for Variable and Program
* test=document_fix, refine astype code block style
* test=document_fix, add example code for Variable properties
* add api check in fc test=develop
* enforce shape error info of sum op test=develop
* fix spelling test=develop
* print x_dims info test=develop
* enhance shape error info test=develop
* polish minimize en doc
* polish adam optimizer en doc
* polish adamax optimizer en doc
* polish adagrad and decayed adagrad optimizer en doc
* polish model average en doc, test=develop, test=document_fix, test=document_preview
* self review and further polishing doc
* update API.spec, test=develop, test=document_fix
* update fluid.data api in examples, test=develop, test=document_fix
* update fluid.data inferface, test=develop, test=document_fix
* replace -1 by none, test=document_fix
* fix fluid.data code example, test=develop, test=document_preview, test=document_fix
* use None instead of -1 in shape, test=develop, test=document_preview, test=document_fix
* rm unused ckpt and sort ckpt
* use max op idx to sort, test=develop
* remove unsed code,test=develop
* add testcase, test_develop
* modify test case, test=develop
* test=develop
Add input type and dtype check for sign_op.
* test=develop
Fix the api text format in sign op.
* test=develop
Fix the api examples in sign op add update the api.spec.
* Update crf_decoding api & example
test=develop
* Update api spec
test=develop
* Fix linear chain crf api
test=develop
* Avoid sharing data pointer with input
test=develop
* Simplify the logic in linear_chain_crf_decoding
* Add unittest for crf_decoding when label & path both are set
test=develop
* Update API spec
test=develop
* Add unittest for layers && correct infer_shape in chunk_eval
test=develop
* test=develop, fix docker with paddle nccl problem
* test=develop, Add Variable api and refine dygraph related API
* test=develop, Add Variable api and refine dygraph related API
* test=develop, refine test for new api and error info
* test=develop, refine error info and test_layers
* test=develop, add API.spec
* test=devleop, fix to_string python2 and python3 compat error and refien doc
* test=devleop, add API spec
* test=devleop, update API spec
* test=devleop, update API spec
* test=develop, invoke ci
* test=develop, fix example code
* test=develop, update API spec
* test=develop, fix auto_prune_error_on_leaf
* test=develop, fix auto prune error on loss stop_gradient
* test=develop, remove useless error check
* test=develop, add more ut for sorted gradient
* fix the error message for reduce_mean and reduce_sum op test=develop
* fix typo test=develop
* fix according review advice test=develop
* fix the test test=develop
* fix test=develop
* fix the constant error message test=develop
* fix typo test=develop
* fix typo test=develop
* fix code style test=develop
* fix comment and bugs test=develop
* fix the bug test=develop
* fix and add unittest test=develop
* fix the typo test=develop
* add support for the fill_constant op test=develop
* add test for ci coverage test=develop
1.support asymmetric padding;
2.support padding algorithm:"SAME" and "VALID";
3.support channel_last: data_format NHWC and NDHWC;
4.change doc of python API and c++;
test=develop, test=document_preview
* How to write custom op needs to follow framework OP spec.
* Package fluid_framework.so and headers into whl.
* Add paddle.sysconfig.get_include() and paddle.sysconfig.get_lib() to get include dir and lib dir.
* Export some C-APIs to merge OpInfo between core.so and custom_op.so.
* Add unit testing.
* Update API.spec.
* test=develop, argument shape support tensor and tensor in list
* test=develop,Increasing the coverage of CI tests
* test=develop, modify the document and update API.spec
* test=develop, modify the doc and update API.spec
* test=develop, modify the doc and update API.spec
* test=develop, modify the interface of UniformInitializer
* test=develop, modify the interface of XavierInitializer and MSRAInitializer
* test=develop, modify based on review's comments
* test=develop, modify based on review's comments
* test=develop, modify based on review's comments
* fix pool2d pool3d:
1. support asymmetric padding;
2. support padding algorithm:"SAME" and "VALID";
3. support channel_last: data_format NHWC and NDHWC;
4. support inferring shape when input with negative dims in compile time;
5. change doc of python API and c++;
6. fix bug in cuda kernel when Attr(adaptive) is true.
test=develop,test=document_preview
* fix 'tensors' to 'Tensors'. test=develop,test=document_preview
* add test for converage ValueError.test=develop,test=document_preview
* resolve conflict in test_pool2d. test=develop
* Follow Wangzhen's comment in PR 18970, test=develop
* Review comments, test=develop
* Leave fake quantization around mul
test=develop
* Replace Fake with Real Quantized Mul
test=develop
* Fix bug in quantize placement pass
Nodes in the graph now have checked type instead of node name when they are to be marked for quantization test=develop
* test=develop, fix docker with paddle nccl problem
* test=develop, Add Variable api and refine dygraph related API
* test=develop, Add Variable api and refine dygraph related API
* test=develop, refine test for new api and error info
* test=develop, refine error info and test_layers
* test=develop, add API.spec
* test=devleop, fix to_string python2 and python3 compat error and refien doc
* test=devleop, add API spec
* test=devleop, update API spec
* test=devleop, update API spec
* test=develop, invoke ci
* test=develop, fix example code
* test=develop, update API spec
* test=develop, add compat test and fix inplace campat dict error
* impove error message when passing ndarray with object dtype
* imporve message format
* change assert to raise TypeError
* remind user how to locate the irregular data instead of printing
* add unittest for input array type check
* Fix conv2d+dequantize squash for residual fusion
test=develop
* Correct int8 input
test=develop
* Add if exclude or include padding in pool2d mkldnn
test=develop
The new "fluid.data" changes old "fluid.layers.data":
1. Add shape and dtype check.
2. Remove "append_batch_size" parameter. We won't offer this in the new data layer because other deep learning platforms don't have this kind of data layer pre-processing. It may confuse users.
3. Remove "stop gradient" parameter because the data layer doesn't do back-propagation
TODO:
Now data layer feeded by executor is checked, will we want to check the feed data of readers in the future?
* add kernel for fill_op, test=develop
* modify PADDLE_ENFORCE to PADDLE_ENFORCE_EQ, test=develop
* add op test for fill_op, test=develop
* REGISTER COP CUDA KERNEL, test=develop
* update test_fill_op.py, test=develop
* change FillConstantOpVarTypeInference to FillOpVarTypeInference, test=develop
* fix op test, test=develop
* add head file, test=develop
* add support of matmul with multiple head even different width and height
Original matmul with multiple head supports only the mat_a.width == mat_b.height,
in that case, mat_b will be horizontally split. In this patch, we extend the
support when mat_a.width != mat_b.height but mat_a.width/head_number == mat_b.height,
in this case, mab_b will be vertically split.
One example is A is [3, 8], B is [2, 16], head_number is 4. In this
case, A will be split as [3, 2], B will be (vertically) split as
[2, 4]. The final result will be 4 matrix of 4 matrix of [3,4], i.e. [3, 16]
test=develop
* add support of matmul with multiple head even different width and height
Original matmul with multiple head supports only the mat_a.width == mat_b.height,
in that case, mat_b will be horizontally split. In this patch, we extend the
support when mat_a.width != mat_b.height but mat_a.width/head_number == mat_b.height,
in this case, mab_b will be vertically split.
One example is A is [3, 8], B is [2, 16], head_number is 4. In this
case, A will be split as [3, 2], B will be (vertically) split as
[2, 4]. The final result will be 4 matrix of 4 matrix of [3,4], i.e. [3, 16]
test=develop
* refactor the code of matmul with multiple head even different width and height
test=develop
* Add support for new QAT models
test=develop
Co-Authored-By: Michał Gallus <michal.gallus@intel.com>
Co-Authored-By: Wojciech Uss <wojciech.uss@intel.com>
* fixed fps results
test=develop
* fix top5 accuracy drop problem
* updated for new QAT models
* skip quantizing average pooling - dirty but working
* add missing pass
* added missing conv+brelu fuse pass
* removed a call to non-existent pass
test=develop
* renamed pass
test=develop
* Adjust finding pooling scale to newest QAT models
* Remove unnecessary code from quantization_mkldnn_pass
* Copy Pooling input scale to output scale in QAT
* Refactor & remove unused code in QAT
* Incorporate fp32 FC into QAT
test=develop
* Enable graph drawing with debug flag
test=develop
* Add tests for QATv2
* Fix paths for QATv2 models
test=develop
* Add option to save transformed int8 qat model
test=develop
* Remove redundant lines from qat mkldnn pass
test=develop
* Delegate disablement of avg pooling to qat
test=develop
* fix CI bug, test=develop
* Follow Wangzhen's Review, test=develop
* Update API.spec
test=develop
* Name False in (is_unsigned, TensorScale) tuple
test=develop
* Remove constraint that last dimension is forced to be 1 by add
lookup_table_v2 test=develop
* modify into PADDLE_ENFORCE_CUDA_SUCCESS test=develop
* Revert "modify into PADDLE_ENFORCE_CUDA_SUCCESS test=develop"
This reverts commit 8a960bfc61e51aa27c3c529df8fb90b93ebd19f9.
* move api into fluid.embedding test=develop
* fix example code test=develop
* move one_hot into fluid.one_hot
* modify api.spec test=develop
* fix loss shape test=develop
1. Support customize eval function instead of eval program.
2. Fix loading checkpoint in quantization strategy.
3. Support saving eval model when saving a checkpoint.
4. Fix decoder of loading context in PaddleSlim.
5. Fix restoring from the checkpoint of uniform prune strategy.
6. Support saving eval model and infer model during training.
7. Add ‘unitest’ for saving eval model, saving infer model and uniform pruning restoring from the checkpoint.
8. Fix pruning of depthwise_conv_grad op by updating the groups.
* support change shuffle thread num
* support change train thread num
* fix receive shuffle data of each channel
* data norm stop gradient
* add check thread_tensor type and root_tensor type when merge metric
* remove sleep in shuffle, add config
* add config of pslib client to client communication
* fix xbox str
* add data norm op testcase
* add flush in trainer finalize
* make OpTest check grad inplace even if forward has no inplace, test=develop
* do not run PE when enable_inplace is False, test=develop
* add conv3d cuda kernel for float16 type, test=develop
* refactor OpTest for inplace, test=develop
* add comments, test=develop
* add recompute based checkpoints methods for large batch training
test=develop
* add append_backward_with_forward_recomputation
test=develop
* refine optimizer
test=develop
* update backward and optimizer
test=develop
* make Variable usable
test=develop
* add recompute code
* refine optimizer
test=develop
* refine addup _append_backward_ops_with_checkpoints_
1) for recompute part, just cache the grad_op_desc without appending to block
2) before appending grad_op_desc to backward part, addup_repetitive_vars, remove unused branch
test=develop
* make method private
* add recompute strategy into DistributedStrategy
test=develop
* checkpoint version3
test=develop
* remove some print information
test=develop
* remove unused sumop
test=develop
* try to fix recompute with graph building modules
* add input names to vars should be held
* add memory debug tool
* backup backward
* Fix bugs
* add backward desc for op not in any segments
* add exception info for sub_block
test=develop
* modify code style
test=develop
* modify code style
test=develop
* remove print functions
test=develop
* add API spec
test=develop
test=document_preview
* make Recompute a child class of Optimizer
test=develop
test=document_preview
* add API spec
test=develop
test=document_preview
* modify API spec
test=develop
test=document_preview
* add document for Recompute
test=develop
test=document_preview
* change API doc of Rcompute
test=develop
test=document_preview
* code cleaning
test=develop
test=document_preview
* modify API spec
* fix bugs when segments hold no element
* add testcase for Recompute Optimizer
test=develop
test=document_preview
* add test for apply_gradient, and code cleaning
test=develop
test=document_preview
* add test case for load function
* enable CI
test=develop
test=document
* add test case
test=develop
test=document_preview
* add sample code for 4 function of recompute optimizer
test=develop
test=document_preview
* move tree_conv to fluid.contrib.layers
test=develop
* update API.spec for tree_conv
test=develop
* update tree_conv api to increase unit coverage
test=develop
* refactor dygraph,test=develop
* fix failed unittest,test=develop
* polish code,test=develop
* check windows ci error,test=develop
try to fix windows ci error by np.allclose,test=develop
* polish vlog and profiler, test=develop
* try to fix preceding ops order,test=develop
* test transformer in windows ci, test=develop
* use python c-api to speed up tracer.trace,test=develop
* test=develop, fix docker with paddle nccl problem
* test=develop, add ut for debug string and gradient_accumulator
* test=develop, add tests for layer/gradient_accumulator/prepared_op
* test=develop, fix complie error for test_prepared_op
* test=develop, add more ut for dygraph
* test=develop, create API.spec for dygraph api change
* test=develop, refoctor name to make it easier to understand
* test=develop, refoctor name to make it easier to understand
* test=develop, fix multi-gpu failed problem , add Tracer tests, change PADDLEENFORCE to PADDLEENFORCE_EQ
* test=develop, fix ut failed on parallel se-resnext
* test=develop, change one more PADDLE_ENFORCE
* support auto prune in dygraph mode
* test=develop, support auto prune
* test=develop, merge develop conflict
* test=develop, fix test_layer and test_tracer ut
* test=develop, fix bug which may cause stop_gradient disabled with a list of backward inputs
modified interpolate_op to support tensor attribute
1. the parameter out_shape of image_resize、resize_nearest/bilinear/trilinear can be a list or a 1-D tensor variable. If a list, each element can be an integer or a tensor variable with shape: [1].
2. the parameter scale of above Ops can be a 1-D tensor variable.
modified document of image_resize, resize_nearest, resize_bilinear, resize_trilinear and add some code example.
add crop_tensor op. The main difference with crop is :
1. If the argument shape is a list, each element is an integer or a tensor variable with shape: [1]. This way is suitable for the case that the shape may be changed each iteration.
2. If the argument shape is a variable. Its rank must be 1. In crop op, the rank of shape must be the same as x
offsets can be a list, in which each element is an integer or a tensor variavle with shape: [1].
* Add fc_elementwise_layernorm_fuse pass and unittest.
* Add fused_fc_elementwise_layernorm op and its GPU kernel.
test=develop
* Apply fc_elementwise_layernorm_fuse_pass to GPU inference.
* Add the setting of attrs in the definition of binary_op.
test=develop
* Add comment.
* Implement the unittest.
test=develop
* Change the unittest name of layer_norm.
test=develop
* strided_slice op basic function test=develop
* test=develop rewrite and fix
* fix bug test=develop
* fix for the PADDLE_ENFORCE usage
* add some unit testw
* fix for the aip test and copright and fix test=develop
* fix API.spec test=develop
* fix API.spec test=develop
* add axis parameter test=develop
* fix for the build error test=develop
* fix python api test=develop
* fix the build test=develop
* fix build test=develop
* fix API spec test=develop
* test=develop add some comment and single op test
* fix API spece test=develop
* fix test=develop
* fix test=develop
* fix api test=develop
* fix api test=develop
* fix API.spec test=develop
* fix typo test=develop
* fix API.spec test=develop
* fix API typo test=develop
* fix doc and API.spec test=develop
improve pow op according to reviews:
1. Delete unnecessary judgement statements in PowGradOpDescMaker;
2. Improve test of test_api;
overload GetKernelTypeForVar
add stop_gradient=True when attr(factor) is tensor Variable, change examples in API pow.
test=develop,test=document_preview
add support parameter inference when argument shape is a list containing integer and tensor variable;
test=develop
fix reshape op according to reviews:
1. improve or message;
2. improve test of test_api.
test=develop,test=document_preview
fix reshape op: Add error message in nn.py, test=develop
add stop_gradient=True when attr(shape) is tensor Variable.
change examples in API reshape.
test=develop,test=document_preview
add support parameter inference when arguments starts or ends is a list containing integer and tensor variable;
test=develop,test=document_preview
improve slice op according to review(from hongyu). test=develop
fix slice op according to review: infer_flags, test=develop
fix slice op: improve overload operator __getitem__ to support attrs(starts and ends) are Variable.
test=develop,test=document_preview
fix test_slice_op: add TestSliceOp_decs_dim_6 to resolve conflict with test_slice_ngraph_op. test=develop
add stop_gradient=True when attr(starts) or attr(ends) is tensor Variable.
test=develop,test=document_preview
1. add tensor support for argument expand_times in expand op;
2. add support parameter inference when argument expand_times is a list containing integer and tensor variable;
improve expand op according to reviews:
1. add doc of ExpandTimes in expand_op.cc;
2. improve the test of test_api.
add stop_gradient=True when attr(expand_times) is tensor Variable, change code examples.
test=develop,test=document_preview
* refactor dygraph,test=develop
* fix failed unittest,test=develop
* polish code,test=develop
* check windows ci error,test=develop
try to fix windows ci error by np.allclose,test=develop
* polish vlog and profiler, test=develop
* try to fix preceding ops order,test=develop
* test transformer in windows ci, test=develop
* use python c-api to speed up tracer.trace,test=develop
* test=develop, fix docker with paddle nccl problem
* test=develop, add ut for debug string and gradient_accumulator
* test=develop, add tests for layer/gradient_accumulator/prepared_op
* test=develop, fix complie error for test_prepared_op
* test=develop, add more ut for dygraph
* test=develop, create API.spec for dygraph api change
* add transform_data to dygraph
* test=develop, refoctor name to make it easier to understand
* test=develop, refoctor name to make it easier to understand
* add test and change input to const ref for safety
* test=develop, fix multi-gpu failed problem , add Tracer tests, change PADDLEENFORCE to PADDLEENFORCE_EQ
* add ut for data transform
* refine ut for data_transform
* test=develop, fix ut failed on parallel se-resnext
* test=develop, change one more PADDLE_ENFORCE
* add test_tracer on multiple devices
* test=develop, change place to mutable for data transform
* test=develop, add transform data on same place test and remove useless log
* test=develop, Add to do for data layout and and ut for conv2d with no bias
* Implement the operator with sprase matrix multiply
* Update the URL of mklml library.
test=develop
* Disable MKLML implematation when using no-linux.
test=develop
* optimize bp with mkl sparse matrix
test=develop
* tmp add fused_emb_seq layer
* Add the support of padding_idx attribute.
test=develop
* add padding_idx support
test=develop
* implement grad refer lego
test=develop
TemporaryAllocator is a singleton used for allocating memory for Cudnn. Since it is a singleton, we can delete it for better performance in memory.
We replace TemporaryAllocator by CUDADeviceContextAllocator and CUDADeviceContextAllocation, which uses stream callback to delete the memory allocated for the stream to avoid singleton.
Also added data_feed_proto to operator to fix CI in CPU compilation
* Refine the codes related to fc op.
* Add GPU implementation for fc functor.
* Apply fc_fuse_pass in GPU inference.
test=develop
* Change the cmake for fc op.
* Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ.
* Add an attribute to set the activation type in fc_op.
* Enhance the unittest of fc_op.
test=develop
* Remove the declaration of FCOpGrad back to the header file.
test=develop
* Set default value for newly added arguments in test_fc_op.
test=develop
* Remove constraint that last dimension is forced to be 1 in huber_loss
test=develop
* add y[rank-1] == 1 when x_rank=y_rank test=develop
* modify into contain_unknown_dim test=develop
* refactor dygraph,test=develop
* fix failed unittest,test=develop
* polish code,test=develop
* check windows ci error,test=develop
try to fix windows ci error by np.allclose,test=develop
* polish vlog and profiler, test=develop
* try to fix preceding ops order,test=develop
* test transformer in windows ci, test=develop
* use python c-api to speed up tracer.trace,test=develop
* test=develop, fix docker with paddle nccl problem
* test=develop, add ut for debug string and gradient_accumulator
* test=develop, add tests for layer/gradient_accumulator/prepared_op
* test=develop, fix complie error for test_prepared_op
* test=develop, add more ut for dygraph
* test=develop, create API.spec for dygraph api change
* test=develop, refoctor name to make it easier to understand
* test=develop, refoctor name to make it easier to understand
* test=develop, fix multi-gpu failed problem , add Tracer tests, change PADDLEENFORCE to PADDLEENFORCE_EQ
* test=develop, fix ut failed on parallel se-resnext
* test=develop, change one more PADDLE_ENFORCE