* Support assignment to a Variable in dynamic mode. Note: backward is not handled.
* Rewrite VarBase __setitem__ for high performance.
* Tried three approaches to implementing __setitem__ and benchmarked each.
* Kept the fastest approach: C++ code that does not trace an op (see the sketch below).
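A minimal sketch of the resulting behavior, assuming the fluid dygraph API of this period:

```python
# Minimal sketch: slice assignment on a dygraph Tensor now runs through
# C++ without tracing an op (fluid dygraph API assumed).
import numpy as np
import paddle.fluid as fluid

with fluid.dygraph.guard():
    x = fluid.dygraph.to_variable(np.ones((3, 4), dtype='float32'))
    x[0] = 2.0           # __setitem__ fast path; note: no backward support
    x[1:3, 0:2] = 0.0    # slice assignment takes the same path
```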
* add float64 input to ctc_loss
* modified error message of warpctc
* update repo and tag of warpctc
* add test for warpctc with float64 input
* modified warpctc.cmake to ensure it always builds
* resolved sample code bug of warpctc
* add core.ops in warpctc dygraph
* fix a bug of test
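A minimal sketch of the newly supported float64 path, assuming the paddle.nn.functional.ctc_loss API:

```python
# Minimal sketch: ctc_loss now accepts float64 inputs alongside float32
# (shapes and values are illustrative).
import numpy as np
import paddle

log_probs = paddle.to_tensor(
    np.random.rand(5, 2, 10).astype('float64'))            # (T, N, classes)
labels = paddle.to_tensor(np.array([[1, 2], [3, 4]], dtype='int32'))
input_lengths = paddle.to_tensor(np.array([5, 5], dtype='int64'))
label_lengths = paddle.to_tensor(np.array([2, 2], dtype='int64'))

loss = paddle.nn.functional.ctc_loss(log_probs, labels,
                                     input_lengths, label_lengths,
                                     blank=0, reduction='mean')
```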
* support using add instead of sum for gradient accumulation
* add inplace addto pass
* add grad_add op and inplace addto pass
* remove debug code
* code refine
* fix bug when several sum ops are inserted at the same op_idx
* fix Flags type
* add addto attribute for conv3d
* fix ut
* code clean
* fix type
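A minimal sketch of enabling the pass from Python, assuming the BuildStrategy.enable_addto switch introduced here:

```python
# Minimal sketch: gradient accumulation uses in-place add (grad_add/addto)
# instead of a separate sum op when enable_addto is on.
import paddle.fluid as fluid

build_strategy = fluid.BuildStrategy()
build_strategy.enable_addto = True   # also gated by FLAGS_max_inplace_grad_add

compiled_prog = fluid.CompiledProgram(
    fluid.default_main_program()).with_data_parallel(
        loss_name='loss',            # 'loss' names an illustrative variable
        build_strategy=build_strategy)
```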
* Finished ChannelWiseQuantDequantAbsMaxOp and passed unittests.
* Finished channel-wise quantize strategy in imperative quantization.
* Added CUDA code of ChannelWiseQuantDequantMaxAbsOp
* Add quant_axis for channel_wise quant.
* fixed a bug in unittests which did not trigger the axis = 1 case and could not meet the coverage requirement.
* Added some assert information and fixed some coding style mistakes.
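A plain NumPy sketch of the semantics (not the Op itself): quant_axis picks the dimension that gets one abs-max scale per slice.

```python
# Minimal sketch: channel-wise abs-max quant/dequant keeps one scale per
# slice along quant_axis (quant_axis=0 for conv weights, quant_axis=1 for
# mul/matmul weights).
import numpy as np

def channel_wise_quant_dequant_abs_max(x, quant_axis=0, bits=8):
    bnt = (1 << (bits - 1)) - 1                    # 127 for int8
    x_t = np.moveaxis(x, quant_axis, 0)            # channels first
    flat = x_t.reshape(x_t.shape[0], -1)
    scales = np.abs(flat).max(axis=1)              # per-channel abs-max
    quant = np.round(flat / scales[:, None] * bnt)
    dequant = quant * scales[:, None] / bnt
    return np.moveaxis(dequant.reshape(x_t.shape), 0, quant_axis), scales
```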
* update amp_check_finite_and_scale_op for static_amp.
* use amp_check_finite_and_scale in static graph amp.
* set grads to zero when they contain infinite values (for the amp_check_finite_and_scale op).
* add update_loss_scaling op in cpp.
* add update_loss_scaling_op unit test.
* update the doc of the check_finite_and_unscale op
* Skip the gradient update when the gradients contain infinite values.
* update the way to zero grads.
* update test_update_loss_scaling_op.py
* add log info when find infinite grads.
* add the unit test for UpdateLossScaling Layer.
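A minimal sketch of the control logic these ops implement (constants are illustrative, not the ops' defaults):

```python
# Minimal sketch of dynamic loss scaling: shrink the scale and skip the
# update when inf/nan gradients are found, grow it after a streak of
# finite steps.
def update_loss_scaling(found_inf, scale, good_steps,
                        incr_every_n=1000, incr_ratio=2.0, decr_ratio=0.5):
    if found_inf:
        return scale * decr_ratio, 0   # grads are zeroed, step is skipped
    good_steps += 1
    if good_steps >= incr_every_n:
        return scale * incr_ratio, 0
    return scale, good_steps
```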
* enhance collect_op for dygraph, test=develop
* enhance detection ops with lod, test=develop
* support the case of no bbox left in generate_proposals, test=develop
* unify MultiLevelRoisNum, test=develop
* update core.ops, test=develop
* add op register for new input & output, test=develop
* make use of the global 'use_mkldnn' flag in layer_helper
* update for CI
* update for CI, relu test
* update for CI, relu test added, make FLAGS_use_mkldnn a public flag
* added more strict tests, fixes after review
* fixes after review
* fixes after review, CI stuff
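A minimal sketch of the resulting workflow, assuming the flag is read at op-construction time:

```python
# Minimal sketch: FLAGS_use_mkldnn becomes a public flag that LayerHelper
# consults when building ops.
import os
os.environ['FLAGS_use_mkldnn'] = '1'   # set before constructing the program

import paddle.fluid as fluid

x = fluid.data(name='x', shape=[None, 16], dtype='float32')
y = fluid.layers.relu(x)   # LayerHelper can pick use_mkldnn from the flag
```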
* optimized transformation from tensor to numpy, test=develop
* modify FetchOpHandle zero-copy to deep copy in PE & CPU, test=develop
* modify py:array construct, test=develop
* fix _fetch_var to use deep copy, test=develop
* support Baidu AI Accelerator, test=kunlun
* support xpu op in separate file, test=kunlun
* update XPU error message and remove duplicated code, test=kunlun
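A minimal sketch of running on the new device, assuming an XPU build of Paddle:

```python
# Minimal sketch: execute on a Baidu Kunlun accelerator through the new
# XPU place.
import paddle.fluid as fluid

place = fluid.XPUPlace(0)   # first Kunlun device
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
```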
* expose and unify the Tensor concepts to the user
* expose tensor to user
* add copy place for Tensor
* add note
* add macro PADDLE_WITH_CUDA
* remove RUN_TYPE=DIST
* fix some errors
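A minimal sketch of the user-facing Tensor workflow this series unifies; the `_copy` helper name is an assumption about the pybind interface of this period:

```python
# Minimal sketch: set a Tensor from numpy and copy it to a place
# (the _copy name is an assumption, not a confirmed API).
import numpy as np
import paddle.fluid as fluid

t = fluid.LoDTensor()
t.set(np.arange(12, dtype='float32').reshape(3, 4), fluid.CPUPlace())
t2 = t._copy(fluid.CPUPlace())     # copy to a given place
print(np.array(t2))
```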
* fix the double grad bug for the star gan. test=develop
* update the retain_graph parameter doc. test=develop
* add the unit test for the retain_graph parameter. test=develop
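A minimal sketch of the documented parameter: retain_graph=True keeps the intermediate graph alive so a second backward pass (as in the StarGAN double-grad case) can run over it.

```python
# Minimal sketch: two backward passes over (part of) the same graph.
import numpy as np
import paddle.fluid as fluid

with fluid.dygraph.guard():
    x = fluid.dygraph.to_variable(np.ones([2, 2], dtype='float32'))
    x.stop_gradient = False
    y = x * x
    loss1 = fluid.layers.reduce_sum(y)
    loss2 = fluid.layers.reduce_mean(y)
    loss1.backward(retain_graph=True)   # keep graph for the next pass
    loss2.backward()
```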
* fix batch_norm & instance_norm in dygraph, test=develop
* update instance_norm,test=develop
* fix bugs,test=develop
* add more case in unittest,test=develop
* fix, test=develop
* Add a StatValue class in the backend to represent a stat.
* Add a singleton StatRegistry to maintain the collection of stats.
* For the sake of code neatness, we only support int and float types, which cover most scenarios.
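A Python rendering of the C++ design, for illustration only (the real classes live in the backend):

```python
# Minimal sketch: a singleton registry mapping stat names to int/float
# StatValue objects.
class StatValue:
    def __init__(self, value=0):
        self.value = value            # int or float

    def increase(self, n):
        self.value += n

class StatRegistry:
    _instance = None

    @classmethod
    def instance(cls):
        if cls._instance is None:
            cls._instance = cls()
            cls._instance._stats = {}
        return cls._instance

    def get(self, name):
        return self._stats.setdefault(name, StatValue())

StatRegistry.instance().get('peak_memory').increase(42)
```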
* add new macro BOOST_GET_SAFELY & unittests, test=develop
* add different macro type, test=develop
* fix get macro type in executor, test=develop
* four macro part change backup
* using one macro for all case, test=develop
* revert attribute change, test=develop
* change to three func to solve gcc4.8 bug, test=develop
* polish some details, test=develop
* Support LoDTensorArray in fetch op
test=develop
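A minimal sketch of the enabled usage, assuming the fluid control-flow API:

```python
# Minimal sketch: a LoDTensorArray can now appear directly in fetch_list
# (fetch as tensors, not numpy).
import numpy as np
import paddle.fluid as fluid

x = fluid.data(name='x', shape=[None, 10], dtype='float32')
i = fluid.layers.zeros(shape=[1], dtype='int64')
arr = fluid.layers.array_write(x, i=i)          # build a LoDTensorArray

exe = fluid.Executor(fluid.CPUPlace())
out, = exe.run(feed={'x': np.ones([2, 10], dtype='float32')},
               fetch_list=[arr], return_numpy=False)
# out is a LoDTensorArray; convert elements with np.array(out[0])
```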
* Remove the NGraph engine from PDPD repository
1. Each operator was removed from the operators directory
2. Each test was removed from the unittest directory
3. The parallel executor support was removed from PDPD
4. The CMake file was removed from PDPD
5. The NG flags were removed from the repository
test=develop
* Remove ngraph from:
1. Cmake file
2. Python file
test=develop
* Add a function to update FLAGS
test=develop
* expr flags
* distinguish public/private vars, test=develop
* fix windows issues, test=develop
* expr flag
* Add functions to get and set flags in Paddle
test=develop
Co-authored-by: sneaxiy <sneaxiy@126.com>
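A minimal sketch of the resulting interface (exposed as fluid.get_flags / fluid.set_flags; only public, writable flags are accepted):

```python
# Minimal sketch: read and write a GFlags value from Python.
import paddle.fluid as fluid

fluid.set_flags({'FLAGS_eager_delete_tensor_gb': 1.0})
print(fluid.get_flags(['FLAGS_eager_delete_tensor_gb']))
```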
* static model runner basic implement, test=develop
* add run program op to execute loaded program, test=develop
* refactor static model runner & run program op, test=develop
* reset engine.cc to resolve conflict
* adapt the change of dygraph double grad, test=develop
* refactor impl to solve control flow error, test=develop
* clear debug code, test=develop
* fix ci str compatible error & checkout dygraph grad maker & add example, test=develop
* hide api & add op test, test=develop
* fix run program op test places error, test=develop
* fix program by review comment, test=develop
* delete change var desc name, test=develop
* fix other program by review comment, test=develop
* remove _static_graph_guard, test=develop
* add selectedrows test, test=develop
* remove desc parser, test=develop
* fix detail program, test=develop
* change scope create & add test, test=develop
* sequential reader stage 1, test=develop
* fix ut, test=develop
* fix iterable=False reset bug, add some logs and polish code, test=develop
* inference feed partial data, test=develop
* Turn on keep_order=True for test, test=develop
* enhance ut to test more cases, test=develop
* add ut of merged and unmerged results, test=develop
* add more uts for coverages and add en doc of api, test=develop
* follow comments, test=develop
* change note style, test=develop
* change the CI TensorRT version from 5 to 6.0
* paddle-trt dynamic shape support init
* conv+bias or conv+bn dynamic shape support
test=develop
* modify trt engine op converter
test=develop
* fix ci error
test=develop
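A minimal sketch of configuring dynamic shapes in the inference API; the `image` input name and shapes are illustrative:

```python
# Minimal sketch: dynamic-shape TensorRT needs min/max/opt shape hints per
# input (AnalysisConfig inference API assumed).
from paddle.fluid.core import AnalysisConfig

config = AnalysisConfig('./model_dir')                # illustrative path
config.enable_use_gpu(100, 0)                         # 100 MB pool, GPU 0
config.enable_tensorrt_engine(
    precision_mode=AnalysisConfig.Precision.Float32)
config.set_trt_dynamic_shape_info(
    {'image': [1, 3, 112, 112]},                      # min shape
    {'image': [1, 3, 448, 448]},                      # max shape
    {'image': [1, 3, 224, 224]})                      # optimal shape
```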
* update ScopeBufferedSSAGraphExecutor, AsyncSSAGraphExecutor, ThreadedSSAGraphExecutor, FastThreadedSSAGraphExecutor, ParallelSSAGraphExecutor and ParallelExecutor for fetching unmerged results.
* add the unit test for fetch_unmerged.
* update ut for multi-card and multi-cpu.
* add the error message and the user suggestion in FetchOpHandle. test=develop
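A minimal sketch of the user-facing switch, assuming the `return_merged` parameter this series adds to Executor.run:

```python
# Minimal sketch: with return_merged=False, fetch results stay per-device
# instead of being concatenated along the batch dimension.
import numpy as np
import paddle.fluid as fluid

x = fluid.data(name='x', shape=[None, 4], dtype='float32')
loss = fluid.layers.reduce_mean(fluid.layers.fc(x, size=2))

exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())
compiled = fluid.CompiledProgram(
    fluid.default_main_program()).with_data_parallel(loss_name=loss.name)

unmerged = exe.run(compiled,
                   feed={'x': np.ones([8, 4], dtype='float32')},
                   fetch_list=[loss.name],
                   return_merged=False)   # per-device (unmerged) results
```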
* Refine adam op, test=develop
* Fuse kernels together to reduce cpu time.
* Refine paddle enforce, test=develop
* Remove some comments, test=develop
* Refine code,test=develop
* Refine cuda kernel, test=develop
* Refine code according to comments, test=develop
* Add two types of Metric Calculator: MultiTaskCalculator & CmatchRankCalculator.
* Add a config for the DynamicAdjustChannelNum function to denote whether to discard the remaining instances when they cannot be distributed evenly.
* Remove CPU code in Pull/PushSparse and we will add it back when testing it fully.
* Fix some known issues: such as copying persistable vars after one epoch running.
* Add the first implementation of fusion_group op #19621 (#3)
* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc.
test=develop
* Call CUDA driver api to launch the kernel compiled by nvrtc.
test=develop
* Disable for mac and windows.
test=develop
* Refine the codes to support manually specified num_threads and workload_per_thread.
test=develop
* Refine the CUDA kernel to support large dims.
test=develop
* Add DeviceCodePool to manage all device codes.
* Add the first implementation of fusion_group op.
* Add unit-test for fusion_group op.
* Add the check of result.
* Add the check of nvrtc in unit-test.
test=develop
* Add comment to explain the inputs, outputs and features of fusion_group op.
test=develop
* Disable fusion_group op for mac and windows.
test=develop
* Make the compiling of device code return status instead of hanging up.
test=develop
* Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.
* Unify fusion_group_op's input and output names.
test=develop
* Add the check of CUDA driver library in unittest.
test=develop
* Enable generating code for a given subgraph. #21126 (#4)
* Enable generating code for a given subgraph.
* Support sorting the subgraph.
* Remove the rearrangement of expressions because we use the sorted subgraph directly.
* Enable generating code for a subgraph which is composed of grad ops.
* Use expression information to check the accuracy in unittest.
* Separate load and store from computation expressions.
test=develop
* Improve the loading statements in generated codes.
test=develop
* Remove unused arguments from formal list.
test=develop
* Enable the detection of subgraph of grad ops.
* Generate code for detected subgraph in fusion_group_pass.
* Add an option in BuildStrategy to enable fusion_group_pass and add unittest.
test=develop
* Fix a bug when checking whether the shapes of all inputs are the same.
* Add debug information.
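A minimal sketch of turning the pass on from Python, assuming the BuildStrategy option mentioned above is exposed as enable_auto_fusion (effective only in GPU builds):

```python
# Minimal sketch: enable fusion_group_pass through BuildStrategy.
import paddle.fluid as fluid

build_strategy = fluid.BuildStrategy()
build_strategy.enable_auto_fusion = True   # turn on fusion_group_pass

compiled_prog = fluid.CompiledProgram(
    fluid.default_main_program()).with_data_parallel(
        loss_name='loss',                  # 'loss' is illustrative
        build_strategy=build_strategy)
```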
* Remove subgraph_detector from inference/analysis to the common framework/ir directory. (#5)
test=develop
* Call subgraph_detector in fusion_group pass.
test=develop
* Disable fusion_group when WITH_GPU is OFF.
test=develop
* Refine all PADDLE_ENFORCE message.
test=develop
* Fix the case that some inputs are not defined in grad ops, and set op_role for fused op.
test=develop
* Follow review comments.
test=develop
* Implement a common python unittest to test the ir passes.
test=develop
* Save the results in np.array and support starting up on CPU.
test=develop
* Fix the unittest.
test=develop
* Add check_program to check whether the optimized program is different from the origin one.
test=develop
* Remove the interface all_ops.
test=develop
* Add exception test in pass_test.
test=develop
* add bn and relu fuse pass
* add op attr assert and dtype assert
* fix some input/output bugs for the fused op and pattern.
* add the unittest for fuse_bn_act_pass. test=develop
* use normative enforce statements. test=develop
* add the cpu test. test=develop
* add the support of batch_size=1 for the bn with relu op. test=develop
* add the error type for paddle throws. test=develop
* add fused_batch_norm_act and fused_batch_norm_act_grad to op_has_unused_vars_white_list. test=develop
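A minimal sketch of the fused pattern and its switch, assuming the pass is wired to the fuse_bn_act_ops knob on BuildStrategy:

```python
# Minimal sketch: batch_norm followed by relu is the pattern fused into
# fused_batch_norm_act when fuse_bn_act_ops is on.
import paddle.fluid as fluid

x = fluid.data(name='x', shape=[None, 16, 8, 8], dtype='float32')
conv = fluid.layers.conv2d(x, num_filters=16, filter_size=3)
out = fluid.layers.batch_norm(conv, act='relu')   # bn + relu pattern

build_strategy = fluid.BuildStrategy()
build_strategy.fuse_bn_act_ops = True             # enable the fuse pass
```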
* modify fc to linear in sample code, test=develop
* remove FC, test=develop
* remove warnings, test=develop
* drop fluid/imperative/README.md , test=develop
* change fc to linear, test=develop
* polish code style, test=develop
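A minimal sketch of the migration described here: fluid.dygraph.Linear replaces the removed FC in sample code.

```python
# Minimal sketch: Linear takes explicit input/output dims instead of FC's
# inferred input shape.
import numpy as np
import paddle.fluid as fluid

with fluid.dygraph.guard():
    linear = fluid.dygraph.Linear(input_dim=32, output_dim=10, act='relu')
    y = linear(fluid.dygraph.to_variable(
        np.random.rand(4, 32).astype('float32')))
```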
* add file check_op_desc.py and add interface to get default value. test=develop
* add test for c++ coverage rate. test=develop
* Correct typo. test=develop