* improve error message when passing ndarray with object dtype
* improve message format
* change assert to raise TypeError
* remind user how to locate the irregular data instead of printing
* add unittest for input array type check
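The input check described in this group can be sketched roughly as follows. This is an illustration only, not Paddle's actual code: `check_regular_rows` is a hypothetical helper, and plain nested lists stand in for the ragged data that NumPy would turn into an object-dtype ndarray.

```python
from collections import Counter


def check_regular_rows(rows):
    """Raise TypeError (instead of using assert) when rows have unequal lengths.

    Hypothetical helper: the message points the user at the irregular rows
    rather than printing the data itself.
    """
    lengths = [len(row) for row in rows]
    if len(set(lengths)) <= 1:
        return  # all rows have the same length, nothing to report
    # Treat the most common length as the expected one (an assumption).
    expected = Counter(lengths).most_common(1)[0][0]
    bad = [i for i, n in enumerate(lengths) if n != expected]
    raise TypeError(
        "feed data has irregular row lengths; most rows have length %d, "
        "check the rows at indices %s" % (expected, bad)
    )
```

Raising `TypeError` keeps the check active even when Python runs with assertions disabled.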
* Fix conv2d+dequantize squash for residual fusion
test=develop
* Correct int8 input
test=develop
* Add if exclude or include padding in pool2d mkldnn
test=develop
The new "fluid.data" differs from the old "fluid.layers.data" as follows:
1. Add shape and dtype check.
2. Remove "append_batch_size" parameter. We won't offer this in the new data layer because other deep learning platforms don't have this kind of data layer pre-processing. It may confuse users.
3. Remove "stop_gradient" parameter because the data layer doesn't do back-propagation.
TODO:
The data layer fed by the executor is now checked. Should the feed data of readers also be checked in the future?
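A shape and dtype check of the kind mentioned in item 1 might look like the sketch below. `check_feed` and its arguments are hypothetical, and treating a declared dimension of -1 or None as "any size" is an assumption of this sketch rather than a statement about the real implementation.

```python
def check_feed(name, declared_shape, declared_dtype, feed_shape, feed_dtype):
    """Compare a fed array's shape and dtype with the declared data layer.

    Hypothetical helper. Dimensions declared as -1 or None match any size.
    """
    if len(declared_shape) != len(feed_shape):
        raise ValueError(
            "%s: expected rank %d, got rank %d"
            % (name, len(declared_shape), len(feed_shape))
        )
    for axis, (want, got) in enumerate(zip(declared_shape, feed_shape)):
        if want in (-1, None):
            continue  # variable-length dimension, nothing to compare
        if want != got:
            raise ValueError(
                "%s: dimension %d should be %d, got %d" % (name, axis, want, got)
            )
    if declared_dtype != feed_dtype:
        raise ValueError(
            "%s: expected dtype %s, got %s" % (name, declared_dtype, feed_dtype)
        )
```

For example, a layer declared with shape `[-1, 3]` and dtype `float32` would accept a `(8, 3)` float32 feed and reject a `(8, 4)` one.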
* add kernel for fill_op, test=develop
* modify PADDLE_ENFORCE to PADDLE_ENFORCE_EQ, test=develop
* add op test for fill_op, test=develop
* REGISTER COP CUDA KERNEL, test=develop
* update test_fill_op.py, test=develop
* change FillConstantOpVarTypeInference to FillOpVarTypeInference, test=develop
* fix op test, test=develop
* add header file, test=develop
* add support of matmul with multiple heads even with different width and height
The original matmul with multiple heads supports only mat_a.width == mat_b.height;
in that case, mat_b is split horizontally. This patch extends the support to
mat_a.width != mat_b.height with mat_a.width/head_number == mat_b.height;
in this case, mat_b is split vertically.
For example, with A of shape [3, 8], B of shape [2, 16] and head_number 4,
A is split into 4 blocks of [3, 2] and B is (vertically) split into 4 blocks of
[2, 4]. The final result is 4 matrices of [3, 4], i.e. [3, 16].
test=develop
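The vertically split case (mat_a.width/head_number == mat_b.height) can be sketched in plain Python. This is a minimal illustration using nested lists, not the kernel from the patch; `multihead_matmul` and its helpers are hypothetical names.

```python
def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def split_columns(m, parts):
    """Split a matrix into `parts` blocks of equal width, column-wise."""
    width = len(m[0]) // parts
    return [[row[i * width:(i + 1) * width] for row in m] for i in range(parts)]


def multihead_matmul(a, b, head_number):
    """Multiply per head and concatenate the per-head results column-wise.

    Handles only the case a.width / head_number == b.height.
    """
    a_width, b_height, b_width = len(a[0]), len(b), len(b[0])
    if a_width % head_number or b_width % head_number:
        raise ValueError("widths must be divisible by head_number")
    if a_width // head_number != b_height:
        raise ValueError("a.width / head_number must equal b.height")
    out = [[] for _ in a]
    heads = zip(split_columns(a, head_number), split_columns(b, head_number))
    for a_i, b_i in heads:
        for out_row, block_row in zip(out, matmul(a_i, b_i)):
            out_row.extend(block_row)
    return out
```

With A of shape [3, 8], B of shape [2, 16] and head_number 4, each head multiplies a [3, 2] block by a [2, 4] block, and the four [3, 4] results are joined into a [3, 16] matrix.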
* refactor the code of matmul with multiple head even different width and height
test=develop
* Remove constraint that last dimension is forced to be 1 by adding
lookup_table_v2 test=develop
* modify into PADDLE_ENFORCE_CUDA_SUCCESS test=develop
* Revert "modify into PADDLE_ENFORCE_CUDA_SUCCESS test=develop"
This reverts commit 8a960bfc61e51aa27c3c529df8fb90b93ebd19f9.
* move api into fluid.embedding test=develop
* fix example code test=develop
* move one_hot into fluid.one_hot
* modify api.spec test=develop
* fix loss shape test=develop
* support change shuffle thread num
* support change train thread num
* fix receive shuffle data of each channel
* data norm stop gradient
* add check thread_tensor type and root_tensor type when merge metric
* remove sleep in shuffle, add config
* add config of pslib client to client communication
* fix xbox str
* add data norm op testcase
* add flush in trainer finalize
* make OpTest check grad inplace even if forward has no inplace, test=develop
* do not run PE when enable_inplace is False, test=develop
* add conv3d cuda kernel for float16 type, test=develop
* refactor OpTest for inplace, test=develop
* add comments, test=develop
* add recompute based checkpoints methods for large batch training
test=develop
* add append_backward_with_forward_recomputation
test=develop
* refine optimizer
test=develop
* update backward and optimizer
test=develop
* make Variable usable
test=develop
* add recompute code
* refine optimizer
test=develop
* refine addup _append_backward_ops_with_checkpoints_
1) for recompute part, just cache the grad_op_desc without appending to block
2) before appending grad_op_desc to backward part, addup_repetitive_vars, remove unused branch
test=develop
* make method private
* add recompute strategy into DistributedStrategy
test=develop
* checkpoint version3
test=develop
* remove some print information
test=develop
* remove unused sumop
test=develop
* try to fix recompute with graph building modules
* add input names to vars should be held
* add memory debug tool
* backup backward
* Fix bugs
* add backward desc for op not in any segments
* add exception info for sub_block
test=develop
* modify code style
test=develop
* modify code style
test=develop
* remove print functions
test=develop
* add API spec
test=develop
test=document_preview
* make Recompute a child class of Optimizer
test=develop
test=document_preview
* add API spec
test=develop
test=document_preview
* modify API spec
test=develop
test=document_preview
* add document for Recompute
test=develop
test=document_preview
* change API doc of Recompute
test=develop
test=document_preview
* code cleaning
test=develop
test=document_preview
* modify API spec
* fix bugs when segments hold no element
* add testcase for Recompute Optimizer
test=develop
test=document_preview
* add test for apply_gradient, and code cleaning
test=develop
test=document_preview
* add test case for load function
* enable CI
test=develop
test=document
* add test case
test=develop
test=document_preview
* add sample code for 4 functions of recompute optimizer
test=develop
test=document_preview
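The general idea behind checkpoint-based recomputation can be sketched on a chain of scalar functions. This is not the Recompute optimizer from these commits; it only illustrates the technique, and all names are hypothetical. The forward pass keeps activations only at checkpoint positions, and the backward pass recomputes each segment from its checkpoint before applying the chain rule.

```python
import math


def forward_with_checkpoints(x, fns, checkpoints):
    """Run fns in order; keep the input and the activations at checkpoints.

    fns is a list of (f, df) pairs. saved[i] is the input of fns[i].
    """
    saved = {0: x}
    for i, (f, _) in enumerate(fns):
        x = f(x)
        if i + 1 in checkpoints:
            saved[i + 1] = x
    return x, saved


def backward_with_recompute(fns, saved, grad_out=1.0):
    """Compute d(output)/d(input), recomputing dropped activations per segment."""
    grad = grad_out
    end = len(fns)
    for start in sorted(saved, reverse=True):
        # Recompute the inputs of fns[start:end] from the saved checkpoint.
        acts = [saved[start]]
        for f, _ in fns[start:end - 1]:
            acts.append(f(acts[-1]))
        # Chain rule, walking the segment backwards.
        for (_, df), a in zip(reversed(fns[start:end]), reversed(acts)):
            grad *= df(a)
        end = start
    return grad


# y = exp(sin(x) ** 2), with one checkpoint after the second function.
chain = [
    (math.sin, math.cos),
    (lambda v: v * v, lambda v: 2 * v),
    (math.exp, math.exp),
]
y, saved = forward_with_checkpoints(0.5, chain, checkpoints={2})
dy_dx = backward_with_recompute(chain, saved)
```

Fewer checkpoints mean less memory held between the passes at the cost of more recomputation, which is the trade-off that makes larger batches fit.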
* move tree_conv to fluid.contrib.layers
test=develop
* update API.spec for tree_conv
test=develop
* update tree_conv api to increase unit coverage
test=develop
* refactor dygraph,test=develop
* fix failed unittest,test=develop
* polish code,test=develop
* check windows ci error,test=develop
try to fix windows ci error by np.allclose,test=develop
* polish vlog and profiler, test=develop
* try to fix preceding ops order,test=develop
* test transformer in windows ci, test=develop
* use python c-api to speed up tracer.trace,test=develop
* test=develop, fix docker with paddle nccl problem
* test=develop, add ut for debug string and gradient_accumulator
* test=develop, add tests for layer/gradient_accumulator/prepared_op
* test=develop, fix compile error for test_prepared_op
* test=develop, add more ut for dygraph
* test=develop, create API.spec for dygraph api change
* test=develop, refactor name to make it easier to understand
* test=develop, refactor name to make it easier to understand
* test=develop, fix multi-gpu failure, add Tracer tests, change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ
* test=develop, fix ut failed on parallel se-resnext
* test=develop, change one more PADDLE_ENFORCE
* support auto prune in dygraph mode
* test=develop, support auto prune
* test=develop, merge develop conflict
* test=develop, fix test_layer and test_tracer ut
* test=develop, fix bug which may cause stop_gradient disabled with a list of backward inputs
modified interpolate_op to support tensor attribute
1. the parameter out_shape of image_resize, resize_nearest/bilinear/trilinear can be a list or a 1-D tensor variable. If it is a list, each element can be an integer or a tensor variable with shape: [1].
2. the parameter scale of the above Ops can be a 1-D tensor variable.
modified the documentation of image_resize, resize_nearest, resize_bilinear, resize_trilinear and added some code examples.
add crop_tensor op. The main differences from crop are:
1. If the argument shape is a list, each element is an integer or a tensor variable with shape: [1]. This suits the case where the shape may change each iteration.
2. If the argument shape is a variable, its rank must be 1. In the crop op, the rank of shape must be the same as that of x.
offsets can be a list, in which each element is an integer or a tensor variable with shape: [1].
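A shape list that mixes integers with shape-[1] tensor variables can be read in two ways: what is known when the program is built, and what is known once the tensors hold values. The sketch below illustrates that split. `FakeTensor`, `static_shape` and `runtime_shape` are hypothetical, and marking unknown entries as None is a choice made for this sketch only.

```python
class FakeTensor:
    """Stand-in for a tensor variable of shape [1], valued only at run time."""

    def __init__(self, value):
        self.value = value


def static_shape(shape):
    """Build-time view: tensor entries are unknown and recorded as None."""
    return [None if isinstance(d, FakeTensor) else int(d) for d in shape]


def runtime_shape(shape):
    """Run-time view: every entry resolves to a concrete integer."""
    return [int(d.value) if isinstance(d, FakeTensor) else int(d) for d in shape]


shape = [3, FakeTensor(5), 2]
known_at_build = static_shape(shape)
known_at_run = runtime_shape(shape)
```

Because only the tensor entries change between iterations, the integer entries can still be used for shape inference ahead of time.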
* Add fc_elementwise_layernorm_fuse pass and unittest.
* Add fused_fc_elementwise_layernorm op and its GPU kernel.
test=develop
* Apply fc_elementwise_layernorm_fuse_pass to GPU inference.
* Add the setting of attrs in the definition of binary_op.
test=develop
* Add comment.
* Implement the unittest.
test=develop
* Change the unittest name of layer_norm.
test=develop
* strided_slice op basic function test=develop
* test=develop rewrite and fix
* fix bug test=develop
* fix for the PADDLE_ENFORCE usage
* add some unit tests
* fix for the api test and copyright and fix test=develop
* fix API.spec test=develop
* fix API.spec test=develop
* add axis parameter test=develop
* fix for the build error test=develop
* fix python api test=develop
* fix the build test=develop
* fix build test=develop
* fix API spec test=develop
* test=develop add some comment and single op test
* fix API spec test=develop
* fix test=develop
* fix test=develop
* fix api test=develop
* fix api test=develop
* fix API.spec test=develop
* fix typo test=develop
* fix API.spec test=develop
* fix API typo test=develop
* fix doc and API.spec test=develop
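The basic behaviour of a strided slice can be sketched on nested lists. The argument names `axes`, `starts`, `ends` and `strides` are assumptions made for this illustration, and the function is not the operator added in these commits.

```python
def strided_slice(data, axes, starts, ends, strides):
    """Slice a nested list along the given axes.

    Each listed axis is sliced like seq[start:end:stride]; other axes are
    kept whole.
    """
    spec = {
        axis: slice(start, end, stride)
        for axis, start, end, stride in zip(axes, starts, ends, strides)
    }

    def walk(node, depth):
        if not isinstance(node, list):
            return node  # reached a scalar element
        picked = node[spec[depth]] if depth in spec else node
        return [walk(child, depth + 1) for child in picked]

    return walk(data, 0)


grid = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
every_other = strided_slice(grid, axes=[0, 1], starts=[0, 1], ends=[3, 4], strides=[2, 2])
reversed_cols = strided_slice(grid, axes=[1], starts=[3], ends=[0], strides=[-1])
```

Here `every_other` takes rows 0 and 2 and columns 1 and 3, while `reversed_cols` walks each row backwards from column 3 down to column 1.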
improve pow op according to reviews:
1. Delete unnecessary judgement statements in PowGradOpDescMaker;
2. Improve test of test_api;
overload GetKernelTypeForVar
add stop_gradient=True when attr(factor) is tensor Variable, change examples in API pow.
test=develop,test=document_preview
add support parameter inference when argument shape is a list containing integer and tensor variable;
test=develop
fix reshape op according to reviews:
1. improve error message;
2. improve test of test_api.
test=develop,test=document_preview
fix reshape op: Add error message in nn.py, test=develop
add stop_gradient=True when attr(shape) is tensor Variable.
change examples in API reshape.
test=develop,test=document_preview
add support parameter inference when arguments starts or ends is a list containing integer and tensor variable;
test=develop,test=document_preview
improve slice op according to review(from hongyu). test=develop
fix slice op according to review: infer_flags, test=develop
fix slice op: improve the overloaded operator __getitem__ to support attrs (starts and ends) that are Variables.
test=develop,test=document_preview
fix test_slice_op: add TestSliceOp_decs_dim_6 to resolve conflict with test_slice_ngraph_op. test=develop
add stop_gradient=True when attr(starts) or attr(ends) is tensor Variable.
test=develop,test=document_preview
1. add tensor support for argument expand_times in expand op;
2. add support parameter inference when argument expand_times is a list containing integer and tensor variable;
improve expand op according to reviews:
1. add doc of ExpandTimes in expand_op.cc;
2. improve the test of test_api.
add stop_gradient=True when attr(expand_times) is tensor Variable, change code examples.
test=develop,test=document_preview
* refactor dygraph,test=develop
* fix failed unittest,test=develop
* polish code,test=develop
* check windows ci error,test=develop
try to fix windows ci error by np.allclose,test=develop
* polish vlog and profiler, test=develop
* try to fix preceding ops order,test=develop
* test transformer in windows ci, test=develop
* use python c-api to speed up tracer.trace,test=develop
* test=develop, fix docker with paddle nccl problem
* test=develop, add ut for debug string and gradient_accumulator
* test=develop, add tests for layer/gradient_accumulator/prepared_op
* test=develop, fix compile error for test_prepared_op
* test=develop, add more ut for dygraph
* test=develop, create API.spec for dygraph api change
* add transform_data to dygraph
* test=develop, refactor name to make it easier to understand
* test=develop, refactor name to make it easier to understand
* add test and change input to const ref for safety
* test=develop, fix multi-gpu failure, add Tracer tests, change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ
* add ut for data transform
* refine ut for data_transform
* test=develop, fix ut failed on parallel se-resnext
* test=develop, change one more PADDLE_ENFORCE
* add test_tracer on multiple devices
* test=develop, change place to mutable for data transform
* test=develop, add transform data on same place test and remove useless log
* test=develop, add TODO for data layout and ut for conv2d with no bias
* Implement the operator with sparse matrix multiply
* Update the URL of mklml library.
test=develop
* Disable MKLML implementation on non-Linux platforms.
test=develop
* optimize bp with mkl sparse matrix
test=develop
* tmp add fused_emb_seq layer
* Add the support of padding_idx attribute.
test=develop
* add padding_idx support
test=develop
* implement grad refer lego
test=develop
* Refine the codes related to fc op.
* Add GPU implementation for fc functor.
* Apply fc_fuse_pass in GPU inference.
test=develop
* Change the cmake for fc op.
* Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ.
* Add an attribute to set the activation type in fc_op.
* Enhance the unittest of fc_op.
test=develop
* Move the declaration of FCOpGrad back to the header file.
test=develop
* Set default value for newly added arguments in test_fc_op.
test=develop
* Remove constraint that last dimension is forced to be 1 in huber_loss
test=develop
* add y[rank-1] == 1 when x_rank=y_rank test=develop
* modify into contain_unknown_dim test=develop
* refactor dygraph,test=develop
* fix failed unittest,test=develop
* polish code,test=develop
* check windows ci error,test=develop
try to fix windows ci error by np.allclose,test=develop
* polish vlog and profiler, test=develop
* try to fix preceding ops order,test=develop
* test transformer in windows ci, test=develop
* use python c-api to speed up tracer.trace,test=develop
* test=develop, fix docker with paddle nccl problem
* test=develop, add ut for debug string and gradient_accumulator
* test=develop, add tests for layer/gradient_accumulator/prepared_op
* test=develop, fix compile error for test_prepared_op
* test=develop, add more ut for dygraph
* test=develop, create API.spec for dygraph api change
* test=develop, refactor name to make it easier to understand
* test=develop, refactor name to make it easier to understand
* test=develop, fix multi-gpu failure, add Tracer tests, change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ
* test=develop, fix ut failed on parallel se-resnext
* test=develop, change one more PADDLE_ENFORCE
* test=develop add an argument for softshrink python api
* test=develop fix doc format
* test=develop fix API.spec
* test=develop
Fix the scatter op bug when using the add mode, and support the int64 data type for scatter_op Index (#18804).
* test=develop
Replace PADDLE_ENFORCE with PADDLE_ENFORCE_EQ
* test=develop
Remove the scatter_add bug fix and only add int64 support to scatter_add
* test=develop
Add a test case for the scatter op that covers only int64 index
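The difference between overwriting and adding in a scatter can be sketched as below. The `overwrite` flag and the zero-then-accumulate behaviour of the add mode are assumptions of this sketch, which works on nested lists and is not the operator's kernel.

```python
def scatter(x, index, updates, overwrite=True):
    """Write rows of `updates` into a copy of `x` at the positions in `index`.

    overwrite=True: a later update replaces an earlier one for the same index.
    overwrite=False: indexed rows are zeroed first, then updates are summed.
    """
    out = [list(row) for row in x]
    if overwrite:
        for i, row in zip(index, updates):
            out[i] = list(row)
        return out
    for i in set(index):
        out[i] = [0] * len(out[i])
    for i, row in zip(index, updates):
        out[i] = [a + b for a, b in zip(out[i], row)]
    return out


x = [[1, 1], [2, 2], [3, 3]]
index = [2, 1, 0, 1]  # index 1 appears twice
updates = [[1, 1], [2, 2], [3, 3], [4, 4]]
replaced = scatter(x, index, updates, overwrite=True)
summed = scatter(x, index, updates, overwrite=False)
```

Repeated indices are where the two modes diverge: row 1 ends up as the last update in the first mode and as the sum of both updates in the second.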
* add to and detach for Variable in dygraph, test=develop
* add detach for Variable in dygraph, test=develop
* add detach for Variable in dygraph, test=develop
* add detach for Variable in dygraph, test=develop
* add detach for Variable in dygraph, test=develop
* add detach for Variable in dygraph, test=develop
* add exception check, test=develop