* replace config by kwargs
* change save path from dir to prefix
* fix failed unittests
* revert unittest name change
* polish en docs
* add more tests for coverage
* Add api of constant in paddle.nn.initializer
* modify doc for Layer, test=develop
* modify doc for Layer2, test=develop
* set dtype default value to float32, test=develop
* add example code for paddle.nn.Layer, test=develop
* set create_parameter and create_variable dtype default value to None, test=develop
* modify some example code, test=develop
* refine, test=develop
* delete unused code, test=develop
* modify doc, example code, args, test=develop
* modify doc, test=develop
* fix huber_loss and npair_loss doc and example code, test=document_fix
* remove disable_static in example code, test=document_fix
* remove huber_loss and refine npair_loss example code, test=document_fix
* remove huber_loss in functional/__init__.py, test=document_fix
* add multinomial cpu kernel
* fix C++ notype error
* fix windows ci array len error
* let array len be const
* change array to vector
* add cuda kernel for num_distribution=1; replacement=False is not supported
* add multinomial python api
* support num_distribution different multinomial distributions
* add categorical class
* fix test_distribution enable_static error
* add unittest for different setting of Categorical
* optimize format
* little change
* little change
* add raise error if shape not match, optimize format
* fix windows CI dtype error in concat
* little changes
* little changes2
* change values type to int64
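The multinomial kernel above draws category indices from one or more probability rows, with int64 output values. A pure-Python sketch of the semantics (the function name, arguments, and without-replacement strategy are illustrative, not Paddle's actual API):

```python
import random

def multinomial_sample(probs_rows, num_samples, replacement=True, seed=None):
    """Draw num_samples category indices per probability row
    (plain int indices, mirroring the op's int64 output)."""
    rng = random.Random(seed)
    out = []
    for probs in probs_rows:
        if not replacement and num_samples > len(probs):
            # matches the op's restriction: cannot draw more samples
            # than categories without replacement
            raise ValueError("num_samples > number of categories")
        if replacement:
            row = rng.choices(range(len(probs)), weights=probs, k=num_samples)
        else:
            # naive weighted sampling without replacement: zero out a
            # category's weight once it has been picked
            weights = list(probs)
            row = []
            for _ in range(num_samples):
                pick = rng.choices(range(len(weights)), weights=weights, k=1)[0]
                row.append(pick)
                weights[pick] = 0.0
        out.append(row)
    return out
```

For example, `multinomial_sample([[0.1, 0.9]], 4)` returns one row of four indices drawn from {0, 1}, with index 1 roughly nine times as likely.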
* remove input requirement in dygraph Model
* correct unittest
* update save inference model in dygraph without input
* fix unittests for test_model.py
* solve conflicts
* delete http.log
* fix test_model.py bug, correct initialization of MyModel
* fix unittests bugs
* set paddle manual seed for unittest
* fix Model bugs: inputs can be a list or dict when provided.
* add random seed for test_export_deploy_model
* delete redundant code
* Code optimization, error information optimization
* add multinomial python api unittest
* change output dtype to int64
* fix coverage problem
* optimize format
* fix output dtype error; it should be int64_t
* increase tolerance
* increase the difference between low and high
* change tolerance of Normal log_prob method
* change probs tolerance to 1e-4
* change tolerance of Normal kl method
* polish Program api doc & example
* polish CompiledProgram api doc & example
* polish ParallelEnv api doc & examples
* polish details, test=document_fix
* polish program doc details, test=document_fix
* polish details, test=document_fix
* fix note format error, test=document_fix
* add lost example, test=document_fix
* fix lost example, test=document_fix
Refine Dy2stat APIs to 2.0rc
After discussion, we accepted 3 key points from reviewers:
1. In 2.0rc we renamed the dygraph_to_static folder to dy2static
2. Keep the three files: convert_call_func.py, convert_operators.py, variable_trans_func.py
3. Remove convert_operators path when users import convert_xxx.
After this PR, users can import convert_xxx APIs by:
`import paddle.jit.dy2static.convert_xxx`
The file structure will be:
```
jit
dy2static
convert_operators.py
convert_func_call.py
variable_trans_func.py
```
Detailed API changes by file:
In python/paddle/jit/dygraph_to_static/convert_call_func.py:
from ...fluid.dygraph.dygraph_to_static.convert_call_func import convert_call #DEFINE_ALIAS
In python/paddle/jit/dygraph_to_static/convert_operators.py:
from ...fluid.dygraph.dygraph_to_static.convert_operators import cast_bool_if_necessary #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.convert_operators import convert_assert #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.convert_operators import convert_ifelse #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.convert_operators import convert_len #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.convert_operators import convert_logical_and #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.convert_operators import convert_logical_not #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.convert_operators import convert_logical_or #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.convert_operators import convert_print #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.convert_operators import convert_var_dtype #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.convert_operators import convert_var_shape #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.convert_operators import convert_while_loop #DEFINE_ALIAS
In python/paddle/jit/dygraph_to_static/variable_trans_func.py:
from ...fluid.dygraph.dygraph_to_static.variable_trans_func import create_fill_constant_node #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.variable_trans_func import create_static_variable_gast_node #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.variable_trans_func import data_layer_not_check #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.variable_trans_func import to_static_variable #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.variable_trans_func import to_static_variable_gast_node #DEFINE_ALIAS
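Each DEFINE_ALIAS line above is a plain re-export: the public module re-binds a name from the internal fluid implementation so users import from the stable path while the code lives elsewhere. A minimal self-contained sketch of the pattern (all module and function names below are made up for illustration):

```python
import sys
import types

# Stand-in for the internal implementation module
# (e.g. fluid.dygraph.dygraph_to_static.convert_operators).
_impl = types.ModuleType("_impl_convert_operators")

def _convert_len(x):
    # Minimal stand-in for the real convert_len.
    return len(x)

_impl.convert_len = _convert_len
sys.modules["_impl_convert_operators"] = _impl

# Stand-in for the public alias module
# (e.g. paddle.jit.dy2static.convert_operators).
public = types.ModuleType("dy2static_convert_operators")
from _impl_convert_operators import convert_len  # DEFINE_ALIAS
public.convert_len = convert_len
sys.modules["dy2static_convert_operators"] = public

# Users now import only the stable public path.
import dy2static_convert_operators
print(dy2static_convert_operators.convert_len([1, 2, 3]))  # -> 3
```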
This PR fixed two bugs when converting LSTM in dy2stat:
1. is_unsupported has a condition that can trigger a Python syntax error.
2. The LSTM API's implementation in _rnn_static_graph doesn't include parameter initialization, which can cause a dy2stat error.
We decreased the batch size on CPU so that it runs correctly on Win/Mac machines; this may enlarge the numerical difference, so I set a larger delta value.
* modified sample code of add_position_encoding to 2.0, test=document_fix
* use core.op in add_position_encoding API.
* add test for add_position_encoding in dygraph mode
* add unittests and op version register for tensorrt_subgraph_pass
* rename to test_trt_subgraph_pass.py
* fix softmax converter diff when padding dim=1
* Support assignment to a Variable in dynamic mode. Note: backward is not handled.
* Rewrite VarBase __setitem__ for high-performance.
* test 3 approaches to __setitem__ and compare their performance.
* Retain the highest-performance approach: C++ code without tracing an op.
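A rough pure-Python sketch of the __setitem__ semantics being added; MiniVar is a hypothetical stand-in for VarBase (the retained implementation does the real work in C++ without tracing an op):

```python
class MiniVar:
    """Toy 1-D tensor supporting in-place __setitem__ with int or slice keys."""

    def __init__(self, data):
        self.data = list(data)  # flat buffer; assignment mutates it in place

    def __setitem__(self, key, value):
        if isinstance(key, slice):
            idxs = range(*key.indices(len(self.data)))
            # broadcast a scalar value across the slice
            vals = value if hasattr(value, "__len__") else [value] * len(idxs)
            for i, v in zip(idxs, vals):
                self.data[i] = v
        else:
            self.data[key] = value

v = MiniVar([0, 0, 0, 0])
v[1:3] = [7, 8]   # slice assignment
v[0] = 5          # integer assignment
print(v.data)     # -> [5, 7, 8, 0]
```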
* add float64 input to ctc_loss
* modified error message of warpctc
* update repo and tag of warpctc
* add test for warpctc with float64 input
* modified warpctc.cmake to make sure it always builds
* resolved sample code bug of warpctc
* add core.ops in warpctc dygraph
* fix a bug of test
* support elementwise add, activation, matmul on Baidu Kunlun
* test=kunlun
* minor
* test=kunlun
* reconstruct the xpu directory
* test=kunlun
As the title says, this decreases the random failure probability for test_parallel_executor_mnist.
The old code set a larger delta when comparing reduce and all-reduce, but didn't set it everywhere; I added it.
On my Linux machine I ran the test 100 times with no failures. In addition, we have only seen this random failure twice on CI since I started, so I judged it rare and simply increased the delta.
* support use add instead of sum to do gradient accumulation
* add inplace addto pass
* add grad_add op and inplace addto pass
* remove debug code
* code refine
* fix bug when several sum ops are inserted at the same op_idx
* fix Flags type
* add addto attribute for conv3d
* fix ut
* code clean
* fix type
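The motivation for the addto pass above can be sketched in plain Python: sum-based accumulation keeps every partial gradient buffer alive until the sum op runs, while add-based accumulation folds each gradient into one buffer as it is produced. The function names are illustrative only; the real pass rewrites the graph and reuses device buffers:

```python
def accumulate_by_sum(grads):
    # sum-based: all partial gradient buffers exist simultaneously,
    # then a single sum op combines them
    total = [0.0] * len(grads[0])
    for g in grads:
        for i, v in enumerate(g):
            total[i] += v
    return total

def accumulate_by_addto(grad_stream):
    # addto-based: each gradient is added in place into the accumulator
    # as soon as it is produced, so peak memory stays at one extra buffer
    acc = None
    for g in grad_stream:
        if acc is None:
            acc = list(g)
        else:
            for i, v in enumerate(g):
                acc[i] += v
    return acc
```

Both produce the same result; the win is memory reuse and one fewer kernel launch, which is what the inplace addto pass exploits (e.g. via the addto attribute on conv3d).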
* Finished ChannelWiseQuantDequantAbsMaxOp and Passed unittests.
* Finished channel-wise quantize strategy in imperative quantization.
* Added Cuda code of ChannelWiseQuantDequantMaxAbsOP
* Add quant_axis for channel_wise quant.
* fixed a bug in unittests which did not trigger the axis = 1 case and could not meet the coverage rate requirement.
* Added some assert information and fixed some coding style mistakes.
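Channel-wise abs-max quantize-dequantize, as implemented above, computes one scale per channel along quant_axis. A pure-Python sketch with quant_axis fixed to the first axis (the function name and defaults are illustrative, not the op's signature):

```python
def channel_wise_quant_dequant(x, bits=8):
    """x: list of channels, each a list of floats (quant_axis = 0 here).
    Each channel gets its own abs-max scale; values are quantized to
    signed integers and immediately dequantized back."""
    qmax = (1 << (bits - 1)) - 1  # 127 for 8 bits
    out = []
    for channel in x:
        scale = max(abs(v) for v in channel) or 1.0  # avoid divide-by-zero
        quantized = [round(v / scale * qmax) for v in channel]
        out.append([q * scale / qmax for q in quantized])
    return out
```

Per-channel scales keep a channel with small magnitudes from being crushed by another channel's large abs-max, which is the point of the channel-wise strategy.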
* fix fleet util and gloo
* fix worker endpoints
* fix
* fix UT
* fix gloo
* update gloo
* fix gloo wrapper for hdfs
* add file gloo and UT
* fix UT
* hide public method of RoleMaker
* fix UT
* GPU fleetrun support gloo
* parameterserver fleetrun support gloo
* add UT
* fix UT
* fix get server endpoint
* fix UT
* hide public method of rolemaker
* Update test_fleet_rolemaker_new.py
* hide public method of rolemaker
* 1. Add env value to log to stdout; 2. Add logger name
* Optimize log messages in dygraph-to-static
* Replace logging.warn and warnings.warn with logging_utils.warn
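A minimal sketch of the env-controlled stdout logging described above; the env var name, logger name, and helper are hypothetical stand-ins, not Paddle's actual switches:

```python
import logging
import os
import sys

def get_translator_logger(name="dy2stat-logger"):
    # Route dygraph-to-static messages through one named logger; attach a
    # stdout handler only when the (hypothetical) env switch is set.
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if os.getenv("TRANSLATOR_LOG_TO_STDOUT", "0") == "1" and not logger.handlers:
        logger.addHandler(logging.StreamHandler(sys.stdout))
    return logger

os.environ["TRANSLATOR_LOG_TO_STDOUT"] = "1"
logger = get_translator_logger()
logger.info("dy2stat: transformed one function")
```

Using a dedicated logger name (instead of the root logger) lets users tune dy2stat verbosity without affecting the rest of their application's logging.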
* update model.save_inference_model
* update doc for _save_inference_model, delete useless class in unittests
* prevent users from setting model._inputs to None
* update usage of Model class in unittests
* fix bugs of _verify_spec
* fix bugs of _verify_spec
* add unittest to increase coverage rate
* delete http.log
* update doc for save, remove requirements and limitations for using
* update doc for class Model
* optimize slice TRT plugin
This patch removes an unnecessary barrier for the data transfer of the needed offset, so the data transfer can overlap with GPU kernel execution.
This patch also fixes the incorrect name of the slice plugin, i.e., replaces "layernorm" with "slice".
test=develop
* add serialize/deserialize to slice plugin
* add static shape slice trt plugin
* fix slice trt op convertor dynamic shape bug
* fix format by clang-format
* fix pylint format error
* fix problems commented by peiyang
Co-authored-by: Ryan Jeng <rjeng@nvidia.com>
* update amp_check_finite_and_scale_op for static_amp.
* use amp_check_finite_and_scale in static graph amp.
* set grads to zero when they contain infinite values (for the amp_check_finite_and_scale op).
* add update_loss_scaling op in cpp.
* add update_loss_scaling_op unit test.
* update the doc of the check_finite_and_unscale op
* Update the gradient-update process to skip updating when the gradients contain infinite values.
* update the way to zero grads.
* update test_update_loss_scaling_op.py
* add log info when find infinite grads.
* add the unit test for UpdateLossScaling Layer.
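The dynamic loss-scaling rule behind check_finite_and_unscale and update_loss_scaling can be sketched as follows; the signature and default hyperparameters are a generic AMP recipe, not Paddle's exact op interface:

```python
def update_loss_scaling(scale, good_steps, found_inf,
                        incr_every_n=2000, incr_ratio=2.0, decr_ratio=0.5):
    """Return the new (scale, good_steps) pair for one training step."""
    if found_inf:
        # infinite/NaN grads: shrink the scale, reset the streak of good
        # steps, and the caller skips this parameter update
        return max(scale * decr_ratio, 1.0), 0
    good_steps += 1
    if good_steps >= incr_every_n:
        # a long run of finite grads suggests a larger scale is safe to try
        return scale * incr_ratio, 0
    return scale, good_steps
```

The scale is kept as large as possible so small gradients survive fp16 underflow, while any overflow immediately halves it and discards that step's update.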