* fix multinomial doc
* fix multinomial error message
* little doc change
* fix Categorical class doc
* optimize format of error message
* fix CPU Kernel error message format
* fix isinf and isnan error in Windows OPENBLAS CI
* delete inf and nan
* add manual_seed in sample code
* little error message change
* change error message to InvalidArgument
* add full stop to error message and add manual_seed in CPU environment
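For context, a minimal sketch of the kind of sample code these doc fixes target, assuming the 2.0rc names paddle.manual_seed and paddle.distribution.Categorical (the seed call is what makes the sampled values in the doc reproducible):
```
import paddle

paddle.manual_seed(100)  # make the sampled values in the doc example reproducible
x = paddle.rand([6])
cat = paddle.distribution.Categorical(x)
print(cat.sample([2, 3]))   # a [2, 3] batch of sampled category indices
print(cat.entropy())
```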
* Add truncated_gaussian_random_op XPU kernel
* Add truncated_gaussian_random_op XPU kernel, test=kunlun
* little change, test=kunlun
* change boost_get to BOOST_GET_CONST
* change boost_get to BOOST_GET_CONST, test=kunlun
* little change, test=kunlun
* use Generator to generate random number and optimize format, test=kunlun
* little change, test=kunlun
* add TODO, test=kunlun
* Add gaussian_random XPU kernels
* commit kunlun, test=kunlun
* new version, test=kunlun
* change boost_get to BOOST_GET_CONST, test=kunlun
* use Generator to generate random number and optimize format, test=kunlun
* add TODO, test=kunlun
* support uniform_random op on Baidu Kunlun
* change dtype of attr shape from int to int64_t
* kunlun ci, test=kunlun
* new version, test=kunlun
* change boost_get to BOOST_GET_CONST
* change boost_get to BOOST_GET_CONST, test=kunlun
* use Generator to generate random number and optimize format
* run Kunlun CI, test=kunlun
* add TODO, test=kunlun
* Incorporate cudnn_lstm into LSTM api.
test=develop
* Make coalesce_tensor support alignment optionally.
test=develop
* Reorganize RNN apis. test=develop
* Fix cudnn rnn layout conversion.
test=develop
* Add sequence_length support for RNN cudnn implement.
Add optional init_h and init_c gradient for cudnn_lstm_op.
test=develop
* Use create_parameter for rnn cudnn impl.
test=develop
* Move `self._flat_weight = self.create_parameter()` in RNNBase to main_program.
test=develop
* Update RNN api unittest to use set_device.
test=develop
* Fix set_place for unit tests of RNN apis.
test=develop
* Fix use_align in coalesce_tensor_op.
test=develop
* Adjust RNN apis arguments according to comments.
test=develop
* Polish documents for SimpleRNN apis.
test=develop
* Refine random seed in cudnn_lstm_op.
Expose rnn params from sublayers to RNN.
test=develop
* Fix RNN saving for jit.save.
Refine cudnn_lstm dropout behavior.
test=develop
* Fix doc of GRU. test=develop
* Use ShareDataWith to avoid copying for cudnn_lstm_op test.
test=develop
* Remove updates on cudnn_lstm temporarily.
test=develop
* Use ShareDataWith to avoid copying for cudnn_lstm_op test.
test=develop
* Refine random seed in cudnn_lstm_op.
test=develop
* Fix test_lstm by adjust ConcreteProgram buffer getter.
test=develop
* Use create_parameter instead of create_var for rnn._flat_weight for static graph usage.
test=develop
* Remove W input for cudnn_lstm to pass unused_var_check.
test=develop
* Add test_predict for RNN unit tests coverage.
test=develop
* Fix code style of rnn.
test=develop
* Fix F.rnn usage in rnn.py.
test=develop
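A minimal usage sketch of the reorganized api, assuming paddle.nn.LSTM and set_device as used in the updated unit tests (shapes are illustrative):
```
import paddle

paddle.set_device("gpu")  # or "cpu"; the RNN unit tests now pick the place via set_device
lstm = paddle.nn.LSTM(input_size=16, hidden_size=32, num_layers=2)

x = paddle.randn([4, 23, 16])       # [batch, time, input_size]
prev_h = paddle.randn([2, 4, 32])   # [num_layers, batch, hidden_size]
prev_c = paddle.randn([2, 4, 32])

# an optional sequence_length tensor can be passed as a third argument
y, (h, c) = lstm(x, (prev_h, prev_c))
print(y.shape, h.shape, c.shape)
```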
* test=kunlun;
Add elementwise XPU OP kernels for KUNLUN core (common broadcast is still not supported), including:
* elementwise_div op
* elementwise_max op
* elementwise_mul op (with grad op)
* elementwise_sub op (with grad op)
* 0.05->0.01
* add xpu error message description; test=kunlun
* Make dynamic_decode support dygraph and expose to API 2.0
test=develop
* update info about BeamSearchDecoder and dynamic_decode
* remove all APIs in paddle.text, expose BeamSearchDecoder and dynamic_decode
* update example code
* delete test_text.py and decode.py, update some docs, fix float64 in example code
* delete decode import from paddle.nn
* fix unittest bugs
* use dygraph.Embedding instead of nn.Embedding, add paddle.enable_static()
* update, correct doc
* move dynamic_decode, BeamSearchDecoder API to paddle.nn
* fix code style
* update unittest params, delete import of text.py
* set dtype of beam search test to float64
* update example code of BeamSearchDecoder, dynamic_decode
Co-authored-by: LiuChiaChi <709153940@qq.com>
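A rough usage sketch of the pair exposed in paddle.nn, loosely following the public docstring example (layer sizes are arbitrary and purely illustrative):
```
import paddle
from paddle.nn import BeamSearchDecoder, dynamic_decode

trg_embedder = paddle.nn.Embedding(100, 32)
output_layer = paddle.nn.Linear(32, 32)
decoder_cell = paddle.nn.GRUCell(input_size=32, hidden_size=32)
decoder = BeamSearchDecoder(decoder_cell,
                            start_token=0,
                            end_token=1,
                            beam_size=4,
                            embedding_fn=trg_embedder,
                            output_fn=output_layer)
encoder_output = paddle.ones([4, 8, 32])
outputs = dynamic_decode(decoder=decoder,
                         inits=decoder_cell.get_initial_states(encoder_output),
                         max_step_num=10)
```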
1. support channel last in BatchNorm*d (#27875)
2. fix a bug in batch_norm_op CUDA kernel by extracting ResizeToChannelFirst(Last), TransToChannelFirst(Last) to operators/layer_utils.h
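A small sketch of the newly supported channel-last layout, assuming the data_format argument of paddle.nn.BatchNorm2D:
```
import paddle

x = paddle.rand([2, 8, 8, 3])   # NHWC: channels on the last axis
bn = paddle.nn.BatchNorm2D(num_features=3, data_format="NHWC")
y = bn(x)
print(y.shape)                  # [2, 8, 8, 3]
```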
* modify cond while_loop to paddle.static.nn.cond
* modify crop_tensor to paddle.crop
* modify Variable to paddle.static.Variable
* remove nn.beam_search, nn.beam_search_decode, nn.gather_tree
* remove bpr_loss, center_loss, rank_loss, smooth_l1, teacher_student_sigmoid_loss, edit_distance, sampled_softmax_with_cross_entropy in nn.functional
* remove apis in nn.functional.learn_rate.py
* remove pool2d, pool3d, adaptive_pool2d, adaptive_pool3d in nn.functional
* remove apis in nn.functional.vision
* remove erf, soft_relu in nn.functional.activation
* remove apis in nn.functional.extension
* remove nn.functional.rnn
* remove hash from nn.functional.lod
* remove row_conv from nn.functional.extension
* remove one_hot, pad2d, pad_constant_like from nn.functional.common
* remove nn.gather_tree, nn.BilinearTensorProduct, nn.Pool2D, nn.Pad2D
* remove apis from optimizer.__init__
* remove tensor.creation.fill_constant
* remove elementwise_mul in nn.functional.common and modify to paddle.multiply
* remove tensor.stat.reduce_mean
* remove reduce_all, reduce_any in tensor.logic
* remove apis in tensor.math
* remove apis in tensor.__init__
* remove has_inf, has_nan in tensor.search
* remove apis in framework.__init__
* remove apis in paddle.__init__
* remove apis in nn.functional.__init__
* modify removed alias apis to raw api in doc and unittests
* fix remove grid_sample bug
* modify removed alias apis to raw api in doc and unittests
* modify removed alias apis to raw api in doc and unittests
* modify removed alias apis to raw api in doc and unittests
* modify removed alias apis to raw api in doc and unittests
* modify removed alias apis to raw api in doc and unittests
* modify removed alias apis to raw api in doc and unittests
* delete alias api relations in doc
* retain paddle.compat, paddle.sysconfig
* remove unittest for paddle.reduce_all, paddle.reduce_any
* modify removed alias apis to raw api in doc and unittests
* recover paddle.save and paddle.load
* resolve conflicts
* fix sample code missing paddle.enable_static() bug
* fix sample code missing paddle.enable_static() bug
* fix to_string sample code error
* fix error in GPU version of paddle when there is no CUDA device
* optimize format and add new unittest
* fix coverage problem
* fix unittest format
* change static mode to dygraph mode
* use subprocess in unittest
* 1. remove paddle.unique_with_counts api, whose functionality is covered by the unique api
2. add paddle.math.increment(x, value=1.0, name=None) api
3. replace paddle.sums with paddle.add_n api
4. update paddle.metric.accuracy api (add name parameter)
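A tiny hedged sketch of the adjusted apis listed above (increment, add_n, and accuracy with a name parameter):
```
import paddle

x = paddle.zeros([1])
x = paddle.increment(x, value=2.0)                              # replaces the old increment helpers
s = paddle.add_n([paddle.ones([2, 2]), paddle.ones([2, 2])])    # replaces paddle.sums

pred = paddle.to_tensor([[0.1, 0.9], [0.8, 0.2]])
label = paddle.to_tensor([[1], [0]])
acc = paddle.metric.accuracy(input=pred, label=label, k=1, name="acc_top1")
print(x.numpy(), s.numpy(), acc.numpy())
```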
* fix doc and unittest of 2.0 lr_scheduler
* fix doc of 2.0 lr_scheduler
* fix unittest
* fix english doc of lr_scheduler
* fix api name of lr scheduler
* fix api name of lr scheduler
* add load_op_xpu for Baidu Kunlun, test=kunlun
* add is_compiled_with_xpu for unit test, test=kunlun
* add is_compiled_with_xpu for unit test, test=kunlun
* replace config by kwargs
* change save path from dir to prefix
* fix failed unittests
* revert unittest name change
* polish en docs
* add more tests for coverage
* Add api of constant in paddle.nn.initializer
* Add api of constant in paddle.nn.initializer
* Add api of constant in paddle.nn.initializer
* Add api of constant in paddle.nn.initializer
* Add api of constant in paddle.nn.initializer
* Add api of constant in paddle.nn.initializer
* Add api of constant in paddle.nn.initializer
* add multinomial cpu kernel
* fix C++ notype error
* fix windows ci array len error
* let array len be const
* change array to vector
* add cuda kernel for the case where num_distribution is 1; replacement=False is not supported
* add multinomial python api
* support num_distribution different multinomial distributions
* add categorical class
* fix test_distribution enable_static error
* add unittest for different setting of Categorical
* optimize format
* little change
* little change
* raise an error if shapes do not match, optimize format
* fix windows CI dtype error in concat
* little changes
* little changes2
* change values type to int64
* change values type to int64
* change values type to int64
* add multinomial cpu kernel
* fix C++ notype error
* fix windows ci array len error
* let array len be const
* change array to vector
* add cuda kernel for the case where num_distribution is 1; replacement=False is not supported
* add multinomial python api
* support num_distribution different multinomial distributions
* add multinomial python api unittest
* change output dtype to int64
* fix coverage problem
* optimize format
* fix dtype of output error, should be int64_t
* increase tolerance
* increase the difference between low and high
* change tolerance of Normal log_prob method
* change probs tolerance to 1e-4
* change tolerance of Normal kl method
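A minimal sketch of the python api added in this series, assuming paddle.multinomial with int64 output indices:
```
import paddle

paddle.manual_seed(100)
probs = paddle.to_tensor([[0.2, 0.3, 0.5],
                          [0.7, 0.2, 0.1]])   # num_distribution = 2 rows

# 5 samples per row with replacement; output dtype is int64
idx = paddle.multinomial(probs, num_samples=5, replacement=True)
print(idx)   # shape [2, 5]

# without replacement, num_samples must not exceed the number of categories
idx2 = paddle.multinomial(probs, num_samples=3, replacement=False)
print(idx2)
```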
Refine Dy2stat APIs to 2.0rc
After discussion, we accepted 3 key points from reviewers:
1. In 2.0rc we changed dygraph_to_static folder to dy2static
2. Keep the three files: convert_call_func.py, convert_operators.py, variable_trans_func.py
3. Remove convert_operators path when users import convert_xxx.
After this PR, users can import convert_xxx APIs by:
`import paddle.jit.dy2static.convert_xxx`
The file structure will be:
```
jit
dy2static
convert_operators.py
convert_call_func.py
variable_trans_func.py
```
Detail changed API in files:
In python/paddle/jit/dygraph_to_static/convert_call_func.py:
from ...fluid.dygraph.dygraph_to_static.convert_call_func import convert_call #DEFINE_ALIAS
In python/paddle/jit/dygraph_to_static/convert_operators.py:
from ...fluid.dygraph.dygraph_to_static.convert_operators import cast_bool_if_necessary #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.convert_operators import convert_assert #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.convert_operators import convert_ifelse #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.convert_operators import convert_len #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.convert_operators import convert_logical_and #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.convert_operators import convert_logical_not #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.convert_operators import convert_logical_or #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.convert_operators import convert_print #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.convert_operators import convert_var_dtype #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.convert_operators import convert_var_shape #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.convert_operators import convert_while_loop #DEFINE_ALIAS
In python/paddle/jit/dygraph_to_static/variable_trans_func.py:
from ...fluid.dygraph.dygraph_to_static.variable_trans_func import create_fill_constant_node #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.variable_trans_func import create_static_variable_gast_node #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.variable_trans_func import data_layer_not_check #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.variable_trans_func import to_static_variable #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.variable_trans_func import to_static_variable_gast_node #DEFINE_ALIAS
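For illustration, a hedged sketch of what user imports look like after this restructuring, assuming the dy2static package re-exports the convert_xxx helpers:
```
# convert_xxx helpers re-exported under paddle.jit.dy2static
from paddle.jit.dy2static import convert_call, convert_ifelse, convert_while_loop

# or via the module path named in this PR
import paddle.jit.dy2static.convert_operators as ops
# ops.convert_len, ops.convert_logical_and, ... are the aliases listed above
```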
This PR fixed two bugs when converting LSTM in dy2stat:
is_unsupported has a condition that can trigger a Python syntax error
LSTM API's implementation in _rnn_static_graph doesn't include parameter initialization, which can cause a dy2stat error.
We decreased the batch size on CPU so that it can run correctly on Win/Mac machines; this may cause the delta to be larger, so I set a larger delta value.
* modified sample code of add_position_encoding to 2.0, test=document_fix
* use core.op in add_position_encoding API.
* add test for add_position_encoding in dygraph mode
* add unittests and op version register for tensorrt_subgraph_pass
* rename to test_trt_subgraph_pass.py
* fix softmax converter diff when padding dim=1
* Support assignment to a Variable in dynamic mode. Note: backward is not handled yet.
* Rewrite VarBase __setitem__ for high-performance.
* try 3 approaches to __setitem__ and test the performance of each.
* Retain the approach with the highest performance: C++ code that doesn't trace an op.
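A small sketch of the dygraph-mode assignment this enables (forward only, since backward is not handled; the exact index/value forms supported may differ):
```
import paddle

x = paddle.to_tensor([[1.0, 2.0], [3.0, 4.0]])
x[0] = paddle.to_tensor([9.0, 9.0])   # assign a whole row without tracing an op
print(x.numpy())
```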
* add float64 input to ctc_loss
* modified error message of warpctc
* update repo and tag of warpctc
* add test for warpctc with float64 input
* modified warpctc.cmake to make sure it always builds
* resolved sample code bug of warpctc
* add core.ops in warpctc dygraph
* fix a bug of test
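A hedged sketch of calling the api with float64 inputs, assuming paddle.nn.functional.ctc_loss keeps the (log_probs, labels, input_lengths, label_lengths) signature:
```
import paddle
import paddle.nn.functional as F

# log_probs: [max_logit_length, batch_size, num_classes + 1], now also accepted as float64
log_probs = F.log_softmax(paddle.randn([5, 2, 4], dtype="float64"), axis=-1)
labels = paddle.to_tensor([[1, 2, 2], [1, 3, 0]], dtype="int32")
input_lengths = paddle.to_tensor([5, 5], dtype="int64")
label_lengths = paddle.to_tensor([3, 2], dtype="int64")

loss = F.ctc_loss(log_probs, labels, input_lengths, label_lengths, blank=0, reduction="mean")
print(loss)
```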
* support elementwise add, activation, matmul on Baidu Kunlun
* test=kunlun
* minor
* test=kunlun
* reconstruct the xpu directory
* test=kunlun
* minor
* test=kunlun
* minor
* test=kunlun
* minor
* test=kunlun
* minor
* test=kunlun
* minor
* test=kunlun
As the title says, decrease the random failure probability for test_parallel_executor_mnist.
The old code set a larger delta when comparing reduce and all-reduce, but didn't set it everywhere. I added it.
On my Linux machine I ran it 100 times and no failure occurred. In addition, we have only seen this random failure on CI twice since I started working on it, so I considered it rare and just increased the delta.
* support using add instead of sum to do gradient accumulation (see the sketch after this list)
* add inplace addto pass
* add grad_add op and inplace addto pass
* remove debug code
* code refine
* fix bug when several sum ops are inserted at the same op_idx
* fix Flags type
* add addto attribute for conv3d
* fix ut
* code clean
* fix type
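A conceptual numpy sketch (not the real pass) of the difference between sum-based and addto-based gradient accumulation:
```
import numpy as np

g1 = np.ones([2, 2], dtype=np.float32)
g2 = np.full([2, 2], 2.0, dtype=np.float32)

# sum-based accumulation: a sum op gathers all partial grads into a fresh buffer
acc_sum = np.sum(np.stack([g1, g2]), axis=0)

# addto-based accumulation: each partial grad is added in place into one buffer,
# avoiding the temporary and letting ops (e.g. conv3d with the addto attribute)
# write directly onto the accumulated gradient
acc_addto = np.zeros_like(g1)
acc_addto += g1
acc_addto += g2

assert np.allclose(acc_sum, acc_addto)
```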
* Finished ChannelWiseQuantDequantAbsMaxOp and Passed unittests.
* Finished channel-wise quantize strategy in imperative quantization.
* Added Cuda code of ChannelWiseQuantDequantMaxAbsOP
Add Cuda code of ChannelWiseQuantDequantMaxAbsOp
* Add quant_axis for channel_wise quant.
* fixed a bug in unittests, which would not trigger the axis = 1 case and could not meet the coverage rate requirement.
* Added some assert information and fixed some coding style mistakes.
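A numpy sketch of the channel-wise abs-max quantize/dequantize scheme, assuming per-channel scales taken along quant_axis and an 8-bit range (illustrative, not the op's code):
```
import numpy as np

def channel_wise_quant_dequant_abs_max(x, quant_axis=0, bit_length=8):
    # per-channel scale: max |x| over every axis except quant_axis
    reduce_axes = tuple(i for i in range(x.ndim) if i != quant_axis)
    scale = np.abs(x).max(axis=reduce_axes, keepdims=True)
    bnt = (1 << (bit_length - 1)) - 1      # 127 for 8 bits
    q = np.round(x / scale * bnt)          # quantize to integers in [-127, 127]
    return q * scale / bnt                 # dequantize back to float

x = np.random.randn(4, 3, 2, 2).astype("float32")
y = channel_wise_quant_dequant_abs_max(x, quant_axis=1)
print(np.abs(x - y).max())                 # small quantization error
```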
* fix fleet util and gloo
* fix worker endpoints
* fix
* fix UT
* fix gloo
* fix gloo
* update gloo
* update gloo
* update gloo
* update gloo
* update gloo
* fix gloo wrapper for hdfs
* add file gloo and UT
* fix UT
* fix UT
* fix UT
* hide public method of RoleMaker
* fix UT
* GPU fleetrun support gloo
* parameterserver fleetrun support gloo
* add UT
* add UT
* fix UT
* fix get server endpoint
* fix get server endpoint
* fix UT
* hide public method of rolemaker
* hide public method of rolemaker
* hide public method of rolemaker
* Update test_fleet_rolemaker_new.py
* hide public method of rolemaker
* hide public method of rolemaker
* 1. Add env value to log to stdout; 2. Add logger name
* Optimize log messages in dygraph-to-static
* Replace logging.warn and warnings.warn with logging_utils.warn
* optimize slice TRT plugin
This patch removes an unnecessary barrier for the data transfer of the needed offset,
so the data transfer can overlap with GPU kernel execution.
This patch also fixes the incorrect name of the slice plugin, replacing
"layernorm" with "slice".
test=develop
* add serialize/deserialize to slice plugin
* add static shape slice trt plugin
* fix slice trt op convertor dynamic shape bug
* fix format by clang-format
* fix pylint format error
* fix problems commented by peiyang
Co-authored-by: Ryan Jeng <rjeng@nvidia.com>
* update amp_check_finite_and_scale_op for static_amp.
* use amp_check_finite_and_scale in static graph amp.
* set grads to zero when they contain infinite values (as for the amp_check_finite_and_scale op).
* add update_loss_scaling op in cpp.
* add update_loss_scaling_op unit test.
* update the doc of the check_finite_and_unscale op
* Update the process so that gradient updates are skipped if the gradients have infinite values (see the sketch after this list).
* update the way to zero grads.
* update test_update_loss_scaling_op.py
* add log info when infinite grads are found.
* add the unit test for UpdateLossScaling Layer.
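A conceptual numpy sketch (not the op itself) of the loss-scaling update described above: when any grad is non-finite, zero the grads, skip the update and shrink the scale; otherwise grow the scale after enough good steps:
```
import numpy as np

def update_loss_scaling(grads, loss_scale, good_steps,
                        incr_every_n=1000, incr_ratio=2.0, decr_ratio=0.5):
    found_inf = any(not np.isfinite(g).all() for g in grads)
    if found_inf:
        grads = [np.zeros_like(g) for g in grads]      # zero grads, skip this update
        return grads, loss_scale * decr_ratio, 0, True
    good_steps += 1
    if good_steps >= incr_every_n:
        loss_scale *= incr_ratio                       # grow after enough good steps
        good_steps = 0
    return grads, loss_scale, good_steps, False

grads = [np.array([1.0, np.inf]), np.array([0.5])]
grads, scale, steps, skipped = update_loss_scaling(grads, 32768.0, 0)
print(skipped, scale)   # True 16384.0 -> this iteration's parameter update is skipped
```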
* add some unittest cases to verify jit.save, no_test
* add more unittests
* add test with example inputs
* polish implement details
* remove useless blank
* fix fetch random error
* enhance collect_op for dygraph, test=develop
* enhance detection ops with lod, test=develop
* support the case where no bbox is left in generate_proposals, test=develop
* unify MultiLevelRoisNum, test=develop
* update core.ops, test=develop
* add op register for new input & output, test=develop
* fix _to_tensor method of Distribution class
* Add unittest
* let dtype be consistent with value in log_prob and probs
* fix format
* fix dtype problem and change unittest
* fix dtype of Numpy class in unittest
* add formula for entropy and kl (reproduced after this list)
* change formula
* fix kl formula format
* fix kl formula format 2
* change gt to np in unittest
* optimize unittest format
* delete duplicate
* delete duplicate 2
* extract common function used to convert dtype value
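For reference, the textbook closed forms presumably behind the entropy and kl items, written here for the Normal distribution (standard identities, not copied from the patch):
```
H(\mathcal{N}(\mu, \sigma^2)) = \tfrac{1}{2} + \tfrac{1}{2}\ln(2\pi) + \ln\sigma

\mathrm{KL}\left(\mathcal{N}(\mu_1,\sigma_1^2) \,\|\, \mathcal{N}(\mu_2,\sigma_2^2)\right)
    = \ln\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1-\mu_2)^2}{2\sigma_2^2} - \frac{1}{2}
```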
* support load infer model format state dict
* add unittests
* remove keep name table
* resolve circular import
* fix compatible problem
* recover unittest
* polish doc and comment
* fix dropout bug in backward when input is 1d tensor, test=develop
* add test case and refine error message, test=develop
* refine error message, test=develop
* support mnist and resnet dygraph_to_static test
* make FLAGS_use_mkldnn a public flag
* fix test_get_set_flags
* Change name of a function
* Rerun CIs commit
* Fix oneDNN dygraph training
Co-authored-by: Adam <38704900+grygielski@users.noreply.github.com>
Co-authored-by: grygielski <adam.grygielski@gmail.com>
* remove backend argument of init_parallel_env
* remove keep name table in transformer
* add cpu version check
* add skip unittest for init_parallel_env
* polish doc: remove func use & update example
* make use of global 'use_mkldnn' in layer_helper
* update for CI
* update for CI, relu test
* update for CI, relu test added, make FLAGS_use_mkldnn a public flag
* added more strict tests, fixes after review
* fixes after review
* fixes after review, CI stuff
* Move hapi from paddle/incubate to paddle
* Remove vision/datasets/utils.py and clean code
* Add sample code for conll05
* Print full path when saving model
* Fix sample code after parameter_list of SGD is changed to parameters
* Fix bug in wmt16 dataset
* set default dtype for distribution API
* Add unittest
* Add unittest
* fix import get_default_dtype problem
* delete change under fluid.layers.nn
* little change
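A small hedged sketch of the intended behaviour: the distribution api follows paddle.get_default_dtype() when it is given plain python floats:
```
import paddle

paddle.set_default_dtype("float64")
normal = paddle.distribution.Normal(loc=0.0, scale=1.0)
sample = normal.sample([3])
print(sample.dtype)   # expected to follow the default dtype, i.e. paddle.float64
```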
* [Dy2Stat] Add debugging and logging mechanism for dygraph to static.
* Remove TransformerError temporarily.
* import mock in PY2, from unittest import mock in PY3. test=develop
* Expose interfaces set_code_level and set_verbosity in paddle.jit, fix docs of the two interfaces.
* polish doc of set_verbosity and set_code_level.
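A minimal sketch of the two exposed interfaces (the level values are just examples; the same settings can also come from the environment variables handled in dy2stat's logging utilities):
```
import paddle

paddle.jit.set_verbosity(3)      # more verbose dygraph-to-static logs
paddle.jit.set_code_level(100)   # print the code produced by the AST transformers
```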
* Add RNN related apis in paddle.nn
test=develop
* new rnn api, cell almost done
* add new progresses in rnn APIs for 2.0
* refine rnn APIs and docstrings.
* add unittests
* disable gpu tests when paddle is not compiled with cuda support
* remove unnecessary imports
* fix docstring
* add to no_sample wlist
* backport to python2 to avoid yield from
* add **kwargs, fix typos
* update docstrings for birnn
* rename argument for SimpleRNN and SimpleRNNCell, fix sample code
* add default value for initial_states in fluid.layers.birnn
Co-authored-by: guosheng <guosheng@baidu.com>
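A brief sketch of the cell and layer flavours mentioned above, assuming paddle.nn.SimpleRNNCell and paddle.nn.SimpleRNN (shapes illustrative):
```
import paddle

cell = paddle.nn.SimpleRNNCell(input_size=16, hidden_size=32)
x_t = paddle.randn([4, 16])          # one time step: [batch, input_size]
y_t, h_t = cell(x_t)                 # initial state defaults to zeros

rnn = paddle.nn.SimpleRNN(input_size=16, hidden_size=32, num_layers=1)
x = paddle.randn([4, 23, 16])        # [batch, time, input_size]
y, h = rnn(x)
print(y.shape, h.shape)              # [4, 23, 32] and [1, 4, 32]
```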