1. support asymmetric padding;
2. support padding algorithms "SAME" and "VALID";
3. support channel_last: data_format NHWC and NDHWC;
4. update the docs of the Python API and C++;
test=develop, test=document_preview
* Writing a custom op needs to follow the framework OP spec.
* Package fluid_framework.so and headers into the whl.
* Add paddle.sysconfig.get_include() and paddle.sysconfig.get_lib() to get the include dir and lib dir (see the sketch after this list).
* Export some C-APIs to merge OpInfo between core.so and custom_op.so.
* Add unit testing.
* Update API.spec.
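A minimal sketch of how these helpers might be used when compiling a custom op against the packaged library. The op source name and compile flags are illustrative assumptions, not taken from this log:

    import paddle

    include_dir = paddle.sysconfig.get_include()  # headers packaged into the whl
    lib_dir = paddle.sysconfig.get_lib()          # directory holding fluid_framework.so

    # hypothetical compile command for a custom op source file relu_op.cc:
    #   g++ relu_op.cc -o relu_op.so -shared -fPIC -std=c++11 \
    #       -I<include_dir> -L<lib_dir> -l:fluid_framework.so
    print(include_dir, lib_dir)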
* test=develop, the shape argument supports Tensor and Tensors in a list
* test=develop, increase the coverage of CI tests
* test=develop, modify the document and update API.spec
* test=develop, modify the doc and update API.spec
* test=develop, modify the doc and update API.spec
* test=develop, modify the interface of UniformInitializer
* test=develop, modify the interface of XavierInitializer and MSRAInitializer
* test=develop, modify based on reviewers' comments
* test=develop, modify based on reviewers' comments
* test=develop, modify based on reviewers' comments
* fix pool2d pool3d:
1. support asymmetric padding;
2. support padding algorithms "SAME" and "VALID";
3. support channel_last: data_format NHWC and NDHWC;
4. support inferring the shape at compile time when the input has negative dims;
5. update the docs of the Python API and C++;
6. fix a bug in the CUDA kernel when Attr(adaptive) is true.
test=develop, test=document_preview
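A minimal sketch of the new pool2d options listed above, assuming the 1.6-style fluid.layers.pool2d interface with a string-valued pool_padding and a data_format argument:

    import paddle.fluid as fluid

    # NHWC input: the channel dimension comes last
    x = fluid.layers.data(name='x', shape=[32, 32, 3], dtype='float32')

    # "SAME" padding keeps the spatial size when the stride is 1
    y = fluid.layers.pool2d(input=x, pool_size=3, pool_type='max',
                            pool_stride=1, pool_padding="SAME",
                            data_format="NHWC")

    # asymmetric padding given as [pad_top, pad_bottom, pad_left, pad_right]
    z = fluid.layers.pool2d(input=x, pool_size=3, pool_type='avg',
                            pool_padding=[1, 0, 1, 0],
                            data_format="NHWC")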
* fix 'tensors' to 'Tensors'. test=develop,test=document_preview
* add test for coverage of ValueError. test=develop,test=document_preview
* resolve conflict in test_pool2d. test=develop
* Follow Wangzhen's comment in PR 18970, test=develop
* Review comments, test=develop
* Leave fake quantization around mul
test=develop
* Replace Fake with Real Quantized Mul
test=develop
* Fix bug in quantize placement pass
Nodes in the graph are now checked by type instead of by node name when they are to be marked for quantization. test=develop
* test=develop, fix the Paddle NCCL problem in Docker
* test=develop, Add Variable API and refine dygraph-related APIs
* test=develop, Add Variable API and refine dygraph-related APIs
* test=develop, refine test for new api and error info
* test=develop, refine error info and test_layers
* test=develop, add API.spec
* test=develop, fix to_string Python 2 and Python 3 compat error and refine doc
* test=develop, add API spec
* test=develop, update API spec
* test=develop, update API spec
* test=develop, invoke ci
* test=develop, fix example code
* test=develop, update API spec
* test=develop, add compat test and fix inplace compat dict error
* Fix conv2d+dequantize squash for residual fusion
test=develop
* Correct int8 input
test=develop
* Add option to exclude or include padding in pool2d mkldnn
test=develop
The new "fluid.data" changes old "fluid.layers.data":
1. Add shape and dtype check.
2. Remove "append_batch_size" parameter. We won't offer this in the new data layer because other deep learning platforms don't have this kind of data layer pre-processing. It may confuse users.
3. Remove "stop gradient" parameter because the data layer doesn't do back-propagation
TODO:
Now data layer feeded by executor is checked, will we want to check the feed data of readers in the future?
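A minimal sketch of the new interface, assuming the fluid.data signature described above; the network around it is illustrative:

    import numpy as np
    import paddle.fluid as fluid

    # the full shape including the batch dim; -1 marks a variable dimension
    x = fluid.data(name='x', shape=[-1, 784], dtype='float32')
    pred = fluid.layers.fc(input=x, size=10)

    exe = fluid.Executor(fluid.CPUPlace())
    exe.run(fluid.default_startup_program())

    # feeding data whose shape or dtype mismatches the declaration
    # is now rejected instead of silently accepted
    out, = exe.run(feed={'x': np.random.rand(8, 784).astype('float32')},
                   fetch_list=[pred])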
* add kernel for fill_op, test=develop
* modify PADDLE_ENFORCE to PADDLE_ENFORCE_EQ, test=develop
* add op test for fill_op, test=develop
* REGISTER OP CUDA KERNEL, test=develop
* update test_fill_op.py, test=develop
* change FillConstantOpVarTypeInference to FillOpVarTypeInference, test=develop
* fix op test, test=develop
* add head file, test=develop
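A sketch of what the new fill_op unit test might look like, following Paddle's OpTest pattern; the attribute names and shapes here are assumptions based on the framework's conventions, not copied from the patch:

    import numpy as np
    from paddle.fluid import core
    from op_test import OpTest  # Paddle's operator-test base class

    class TestFillOp(OpTest):
        def setUp(self):
            self.op_type = "fill"
            val = np.random.random(size=[10, 5]).astype('float32')
            self.inputs = {}
            self.attrs = {
                'value': val.flatten().tolist(),
                'shape': [10, 5],
                'dtype': int(core.VarDesc.VarType.FP32),
                'force_cpu': False,  # let the new CUDA kernel run when available
            }
            self.outputs = {'Out': val}

        def test_check_output(self):
            self.check_output()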
* add support of matmul with multiple heads even with different width and height
The original matmul with multiple heads supports only mat_a.width == mat_b.height;
in that case, mat_b is horizontally split. This patch extends the support to
mat_a.width != mat_b.height as long as mat_a.width/head_number == mat_b.height;
in this case, mat_b is vertically split.
One example: A is [3, 8], B is [2, 16], head_number is 4. In this
case, A is split into 4 matrices of [3, 2] and B is (vertically) split into
4 matrices of [2, 4]. The final result is 4 matrices of [3, 4], i.e. [3, 16].
test=develop
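A small numpy sketch of the vertical-split case in the example above; this is just the splitting arithmetic, not the CUDA/BLAS implementation:

    import numpy as np

    A = np.random.rand(3, 8)    # mat_a
    B = np.random.rand(2, 16)   # mat_b
    head_number = 4

    # mat_a.width / head_number == mat_b.height  ->  8 / 4 == 2
    A_heads = np.split(A, head_number, axis=1)  # 4 matrices of [3, 2]
    B_heads = np.split(B, head_number, axis=1)  # 4 matrices of [2, 4]

    # each head multiplies [3, 2] x [2, 4] -> [3, 4];
    # concatenating the 4 heads gives the final [3, 16] result
    C = np.concatenate([a @ b for a, b in zip(A_heads, B_heads)], axis=1)
    assert C.shape == (3, 16)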
* refactor the code of matmul with multiple heads even with different width and height
test=develop
* Add support for new QAT models
test=develop
Co-Authored-By: Michał Gallus <michal.gallus@intel.com>
Co-Authored-By: Wojciech Uss <wojciech.uss@intel.com>
* fixed fps results
test=develop
* fix top5 accuracy drop problem
* updated for new QAT models
* skip quantizing average pooling - dirty but working
* add missing pass
* added missing conv+brelu fuse pass
* removed a call to non-existent pass
test=develop
* renamed pass
test=develop
* Adjust finding pooling scale to newest QAT models
* Remove unnecessary code from quantization_mkldnn_pass
* Copy Pooling input scale to output scale in QAT
* Refactor & remove unused code in QAT
* Incorporate fp32 FC into QAT
test=develop
* Enable graph drawing with debug flag
test=develop
* Add tests for QATv2
* Fix paths for QATv2 models
test=develop
* Add option to save transformed int8 qat model
test=develop
* Remove redundant lines from qat mkldnn pass
test=develop
* Delegate disablement of avg pooling to qat
test=develop
* fix CI bug, test=develop
* Follow Wangzhen's Review, test=develop
* Update API.spec
test=develop
* Name False in (is_unsigned, TensorScale) tuple
test=develop
* Remove the constraint that the last dimension is forced to be 1 by adding
lookup_table_v2 test=develop
* modify into PADDLE_ENFORCE_CUDA_SUCCESS test=develop
* Revert "modify into PADDLE_ENFORCE_CUDA_SUCCESS test=develop"
This reverts commit 8a960bfc61e51aa27c3c529df8fb90b93ebd19f9.
* move api into fluid.embedding test=develop
* fix example code test=develop
* move one_hot into fluid.one_hot
* modify api.spec test=develop
* fix loss shape test=develop
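A minimal sketch of the relaxed embedding interface, assuming the new fluid.embedding (backed by lookup_table_v2) accepts ids without the trailing dimension of 1 that fluid.layers.embedding required; the vocabulary and dimensions are illustrative:

    import paddle.fluid as fluid

    # ids of shape [batch, seq_len]; no trailing dimension of 1 needed
    ids = fluid.data(name='ids', shape=[-1, 20], dtype='int64')

    # vocabulary of 10000 entries, embedding dimension 64
    emb = fluid.embedding(input=ids, size=[10000, 64])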
1. Support customize eval function instead of eval program.
2. Fix loading checkpoint in quantization strategy.
3. Support saving eval model when saving a checkpoint.
4. Fix decoder of loading context in PaddleSlim.
5. Fix restoring from the checkpoint of uniform prune strategy.
6. Support saving eval model and infer model during training.
7. Add unit tests for saving eval model, saving infer model and uniform pruning restoring from the checkpoint.
8. Fix pruning of depthwise_conv_grad op by updating the groups.
* support change shuffle thread num
* support change train thread num
* fix receiving shuffle data of each channel
* data norm stop gradient
* add check thread_tensor type and root_tensor type when merge metric
* remove sleep in shuffle, add config
* add config of pslib client to client communication
* fix xbox str
* add data norm op testcase
* add flush in trainer finalize
* make OpTest check grad inplace even if forward has no inplace, test=develop
* do not run PE when enable_inplace is False, test=develop
* add conv3d cuda kernel for float16 type, test=develop
* refactor OpTest for inplace, test=develop
* add comments, test=develop
* add recompute based checkpoints methods for large batch training
test=develop
* add append_backward_with_forward_recomputation
test=develop
* refine optimizer
test=develop
* update backward and optimizer
test=develop
* make Variable usable
test=develop
* add recompute code
* refine optimizer
test=develop
* refine addup and _append_backward_ops_with_checkpoints_:
1) for the recompute part, just cache the grad_op_desc without appending it to the block
2) before appending grad_op_desc to the backward part, run addup_repetitive_vars and remove the unused branch
test=develop
* make method private
* add recompute strategy into DistributedStrategy
test=develop
* checkpoint version3
test=develop
* remove some print information
test=develop
* remove unused sumop
test=develop
* try to fix recompute with graph building modules
* add input names to vars that should be held
* add memory debug tool
* backup backward
* Fix bugs
* add backward desc for op not in any segments
* add exception info for sub_block
test=develop
* modify code style
test=develop
* modify code style
test=develop
* remove print functions
test=develop
* add API spec
test=develop
test=document_preview
* make Recompute a child class of Optimizer
test=develop
test=document_preview
* add API spec
test=develop
test=document_preview
* modify API spec
test=develop
test=document_preview
* add document for Recompute
test=develop
test=document_preview
* change API doc of Recompute
test=develop
test=document_preview
* code cleaning
test=develop
test=document_preview
* modify API spec
* fix bugs when segments hold no elements
* add testcase for Recompute Optimizer
test=develop
test=document_preview
* add test for apply_gradient, and code cleaning
test=develop
test=document_preview
* add test case for load function
* enable CI
test=develop
test=document
* add test case
test=develop
test=document_preview
* add sample code for the 4 functions of the recompute optimizer
test=develop
test=document_preview
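A minimal sketch of how the Recompute optimizer described in the commits above might be used, assuming it wraps a regular optimizer and takes a list of checkpoint Variables; the network and checkpoint choices are illustrative:

    import paddle.fluid as fluid

    # build a network; fc1 and fc2 are the activations chosen as checkpoints
    x = fluid.data(name='x', shape=[-1, 784], dtype='float32')
    fc1 = fluid.layers.fc(input=x, size=512, act='relu')
    fc2 = fluid.layers.fc(input=fc1, size=512, act='relu')
    loss = fluid.layers.mean(fluid.layers.fc(input=fc2, size=10))

    # Recompute is a child class of Optimizer wrapping an inner optimizer;
    # activations between checkpoints are recomputed in the backward pass
    sgd = fluid.optimizer.SGD(learning_rate=0.01)
    sgd = fluid.optimizer.RecomputeOptimizer(sgd)
    sgd._set_checkpoints([fc1, fc2])
    sgd.minimize(loss)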
* move tree_conv to fluid.contrib.layers
test=develop
* update API.spec for tree_conv
test=develop
* update tree_conv api to increase unit coverage
test=develop
* refactor dygraph, test=develop
* fix failed unittest, test=develop
* polish code, test=develop
* check windows ci error, test=develop
try to fix windows ci error by np.allclose, test=develop
* polish vlog and profiler, test=develop
* try to fix preceding ops order, test=develop
* test transformer in windows ci, test=develop
* use the Python C-API to speed up tracer.trace, test=develop
* test=develop, fix the Paddle NCCL problem in Docker
* test=develop, add ut for debug string and gradient_accumulator
* test=develop, add tests for layer/gradient_accumulator/prepared_op
* test=develop, fix compile error for test_prepared_op
* test=develop, add more ut for dygraph
* test=develop, create API.spec for dygraph api change
* test=develop, refactor names to make them easier to understand
* test=develop, refactor names to make them easier to understand
* test=develop, fix multi-GPU failed problem, add Tracer tests, change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ
* test=develop, fix ut failed on parallel se-resnext
* test=develop, change one more PADDLE_ENFORCE
* support auto prune in dygraph mode
* test=develop, support auto prune
* test=develop, merge develop conflict
* test=develop, fix test_layer and test_tracer ut
* test=develop, fix a bug which may cause stop_gradient to be disabled with a list of backward inputs
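A small sketch of what auto-pruning in dygraph mode means in practice, assuming the 1.x dygraph API; the script itself is illustrative:

    import numpy as np
    import paddle.fluid as fluid

    with fluid.dygraph.guard():
        x = fluid.dygraph.to_variable(np.ones([2, 2], dtype='float32'))
        x.stop_gradient = False   # gradients flow through x
        y = fluid.dygraph.to_variable(np.ones([2, 2], dtype='float32'))
        y.stop_gradient = True    # this branch is auto-pruned in backward

        out = fluid.layers.elementwise_add(fluid.layers.scale(x, scale=2.0), y)
        loss = fluid.layers.reduce_sum(out)
        loss.backward()

        # only x receives a gradient; the subgraph feeding y is skipped
        assert x.gradient() is not None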
* Set states of recurrent op as dependent vars in prune of save inference model
This PR fixes the save/load inference model problem for RNN models.
The cause of the bug is that save_inference_model prunes OPs that don't contribute to the Output. But in recurrent_op, States are not an Output, so OPs that refer to States were pruned.
This fix adds the States of recurrent_op as dependent vars so that OPs referring to States won't be pruned.
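For context, a sketch of the call this fix makes work for RNN models; the tiny StaticRNN network (which lowers to recurrent_op) and all names are illustrative, assuming the 1.6 fluid API:

    import paddle.fluid as fluid
    import paddle.fluid.layers as layers

    x = fluid.data(name='x', shape=[-1, 10, 16], dtype='float32')  # [batch, seq, dim]
    x_t = layers.transpose(x, perm=[1, 0, 2])                      # [seq, batch, dim]

    rnn = layers.StaticRNN()
    with rnn.step():
        word = rnn.step_input(x_t)
        prev = rnn.memory(shape=[-1, 16], batch_ref=word, init_value=0.0)
        hidden = layers.fc(input=[word, prev], size=16, act='tanh')
        rnn.update_memory(prev, hidden)
        rnn.step_output(hidden)
    prediction = layers.fc(input=layers.reduce_mean(rnn(), dim=0), size=2, act='softmax')

    exe = fluid.Executor(fluid.CPUPlace())
    exe.run(fluid.default_startup_program())

    # pruning now treats recurrent_op's States as dependent vars, so ops that
    # only feed those States survive and the saved model runs at inference time
    fluid.io.save_inference_model(dirname="rnn_infer_model",
                                  feeded_var_names=['x'],
                                  target_vars=[prediction],
                                  executor=exe)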