Recently, test_parallel_executor_test_while_train failed randomly on CI. All CI logs showed that either NCCL initialization or cusolver initialization failed. From what I found online, such failures are usually caused by a shortage of GPU memory. Both of those calls go to the CUDA APIs directly, so the allocator should not be the problem; something elsewhere in PaddlePaddle may be increasing GPU memory usage.
However, I ran this test 1000 times on my machine and on the CI machine, and neither could reproduce the random failure. There may be something environment-specific that only happens in the test environment.
To verify my assumption that something in PaddlePaddle increases GPU memory usage, and also to fix this CI, I decreased the batch_size to see whether the random failure disappears in the test environment.
* fix strided_slice_op's GetExpectedKernelType when input tensor is at CUDAPinnedPlace
* add unittest for tensors in cuda pinned place
* skip test for cuda pinned place on cpu machines
* fix bug of fetch_async_op_handle
* revert some changes of test_buffer_shared_memory_reuse_pass
* revert some changes of test_buffer_shared_memory_reuse_pass
* move paddle.fluid.layers.assign() into creation.py, test=develop
* fix UT failure, add support for paddle.assign (see the sketch after this list), test=develop
* fix, test=develop
* fix UT coverage, test=coverage
* fix UT failure, test=coverage
* fix doc, test=develop
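A minimal usage sketch of the relocated paddle.assign API (the tensor values below are illustrative only, not taken from the PR's tests):

```python
import numpy as np
import paddle

x = paddle.to_tensor([1.0, 2.0, 3.0])
y = paddle.assign(x)                                       # copy a Tensor
z = paddle.assign(np.array([4.0, 5.0], dtype="float32"))   # copy a numpy array
print(y.numpy(), z.numpy())
```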
* Still has bugs.
* Fixed an allclose_op bug that could not handle some fp64 input cases (see the sketch after this list).
* improved CUDA kernel performance.
* Changed CUDA code.
* Fixed a bug in the CUDA kernel that could not handle large-dimension inputs, and added a unit test for it.
* Add a test case for float32 input.
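A minimal sketch of paddle.allclose on fp64 inputs, the case the fix above targets (the values are illustrative, not the PR's test cases):

```python
import paddle

x = paddle.to_tensor([1e10, 1e-8], dtype="float64")
y = paddle.to_tensor([1.00001e10, 1e-9], dtype="float64")
# returns a Tensor holding a single bool
print(paddle.allclose(x, y, rtol=1e-05, atol=1e-08, equal_nan=False))
```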
* fix multinomial doc
* fix multinomial error message
* little doc change
* fix Categorical class doc
* optimize format of error message
* fix CPU Kernel error message format
* fix isinf and isnan error in Windows OPENBLAS CI
* delete inf and nan
* add manual_seed in sample code
* little error message change
* change error message to InvalidArgument
* add full point for error message and add manual_seed in CPU environment
* Add truncated_gaussian_random_op XPU kernel
* Add truncated_gaussian_random_op XPU kernel, test=kunlun
* little change, test=kunlun
* change boost_get to BOOST_GET_CONST
* change boost_get to BOOST_GET_CONST, test=kunlun
* little change, test=kunlun
* use Generator to generate random number and optimize format, test=kunlun
* little change, test=kunlun
* add TODO, test=kunlun
* Add gaussian_random XPU kernels
* commit kunlun, test=kunlun
* new version, test=kunlun
* change boost_get to BOOST_GET_CONST, test=kunlun
* use Generator to generate random number and optimize format, test=kunlun
* add TODO, test=kunlun
* support uniform_random op on Baidu Kunlun
* change dtype of attr shape from int to int64_t
* kunlun ci, test=kunlun
* new version, test=kunlun
* change boost_get to BOOST_GET_CONST
* change boost_get to BOOST_GET_CONST, test=kunlun
* use Generator to generate random number and optimize format
* run Kunlun CI, test=kunlun
* add TODO, test=kunlun
* Incorporate cudnn_lstm into LSTM api.
test=develop
* Make coalesce_tensor support alignment optionally.
test=develop
* Reorganize RNN APIs (see the sketch after this list). test=develop
* Fix cudnn rnn layout conversion.
test=develop
* Add sequence_length support for the RNN cuDNN implementation.
Add optional init_h and init_c gradients for cudnn_lstm_op.
test=develop
* Use create_parameter for rnn cudnn impl.
test=develop
* Move `self._flat_weight = self.create_parameter()` in RNNBase to main_program.
test=develop
* Update RNN api unittest to use set_device.
test=develop
* Fix set_place for unit tests of RNN apis.
test=develop
* Fix use_align in coalesce_tensor_op.
test=develop
* Adjust RNN apis arguments according to comments.
test=develop
* Polish documents for SimpleRNN apis.
test=develop
* Refine random seed in cudnn_lstm_op.
Expose rnn params from sublayers to RNN.
test=develop
* Fix RNN saving for jit.save.
Refine cudnn_lstm dropout behavior.
test=develop
* Fix doc of GRU. test=develop
* Use ShareDataWith to avoid copying for cudnn_lstm_op test.
test=develop
* Remove updates on cudnn_lstm temporarily.
test=develop
* Use ShareDataWith to avoid copying for cudnn_lstm_op test.
test=develop
* Refine random seed in cudnn_lstm_op.
test=develop
* Fix test_lstm by adjust ConcreteProgram buffer getter.
test=develop
* Use create_parameter instead of create_var for rnn._flat_weight for static graph usage.
test=develop
* Remove W input for cudnn_lstm to pass unused_var_check.
test=develop
* Add test_predict for RNN unit tests coverage.
test=develop
* Fix code style of rnn.
test=develop
* Fix F.rnn usage in rnn.py.
test=develop
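A minimal sketch of the reorganized RNN APIs, including the sequence_length support mentioned above (sizes and shapes are made up for illustration):

```python
import paddle
import paddle.nn as nn

paddle.set_device("cpu")                             # use "gpu" to exercise the cuDNN-backed kernel
lstm = nn.LSTM(input_size=16, hidden_size=32, num_layers=2)

x = paddle.randn([4, 10, 16])                        # [batch, time, features]
seq_len = paddle.to_tensor([10, 8, 6, 4], dtype="int64")
y, (h, c) = lstm(x, sequence_length=seq_len)         # steps beyond each length are masked
print(y.shape, h.shape, c.shape)                     # [4, 10, 32] [2, 4, 32] [2, 4, 32]
```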
* test=kunlun;
Add elementwise XPU OP kernels for the KUNLUN core, including the following (general broadcasting is not yet supported):
* elementwise_div op
* elementwise_max op
* elementwise_mul op (with grad op)
* elementwise_sub op (with grad op)
* 0.05->0.01
* add xpu error message description;test=kunlun
* Make dynamic_decode support dygraph and expose to API 2.0
test=develop
* update info about BeamSearchDecoder and dynamic_decode
* remove all APIs in paddle.text, expose BeamSearchDecoder and dynamic_decode
* update example code
* delete test_text.py, decode.py, update some doc, fix example code float64
* delete decode import from paddle.nn
* fix unittest bugs
* use dygraph.Embedding instead of nn.Embedding, add paddle.enable_static()
* update, correct doc
* move dynamic_decode, BeamSearchDecoder APIs to paddle.nn (see the sketch after this list)
* fix code style
* update unittest param, delete import of text.py
* set dtype of BeamSearch test to float64
* update example code of BeamSearchDecoder, dynamic_decode
Co-authored-by: LiuChiaChi <709153940@qq.com>
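A minimal sketch of the relocated APIs, adapted from their reference examples (layer sizes and the encoder output are made up):

```python
import paddle
from paddle.nn import BeamSearchDecoder, dynamic_decode
from paddle.nn import Embedding, GRUCell, Linear

trg_embeder = Embedding(100, 32)
output_layer = Linear(32, 32)
decoder_cell = GRUCell(input_size=32, hidden_size=32)
decoder = BeamSearchDecoder(decoder_cell,
                            start_token=0,
                            end_token=1,
                            beam_size=4,
                            embedding_fn=trg_embeder,
                            output_fn=output_layer)
encoder_output = paddle.ones((4, 8, 32), dtype=paddle.get_default_dtype())
outputs = dynamic_decode(decoder=decoder,
                         inits=decoder_cell.get_initial_states(encoder_output),
                         max_step_num=10)
```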
1. support channel last in BatchNorm*d (#27875); see the sketch below
2. fix a bug in the batch_norm_op CUDA kernel by extracting ResizeToChannelFirst(Last) and TransToChannelFirst(Last) into operators/layer_utils.h
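A minimal sketch of the channel-last support described in point 1 (the tensor shape is made up):

```python
import paddle
import paddle.nn as nn

x = paddle.randn([8, 28, 28, 16])                     # NHWC: channels last
bn = nn.BatchNorm2D(num_features=16, data_format="NHWC")
y = bn(x)
print(y.shape)                                        # [8, 28, 28, 16]
```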
* modify cond while_loop to paddle.static.nn.cond
* modify crop_tensor to paddle.crop
* modify Variable to paddle.static.Variable
* remove nn.beam_search, nn.beam_search_decode, nn.gather_tree
* remove bpr_loss, center_loss, rank_loss, smooth_l1, teacher_student_sigmoid_loss, edit_distance, sampled_softmax_with_cross_entropy in nn.functional
* remove apis in nn.functional.learn_rate.py
* remove pool2d, pool3d, adaptive_pool2d, adaptive_pool3d in nn.functional
* remove apis in nn.functional.vision
* remove erf, soft_relu in nn.functional.activation
* remove apis in nn.functional.extension
* remove nn.functional.rnn
* remove hash from nn.functional.lod
* remove row_conv from nn.functional.extension
* remove one_hot, pad2d, pad_constant_like from nn.functional.common
* remove nn.gather_tree, nn.BilinearTensorProduct, nn.Pool2D, nn.Pad2D
* remove apis from optimizer.__init
* remove tensor.creation.fill_constant
* remove elementwise_mul in nn.functional.common and modify it to paddle.multiply (see the sketch after this list)
* remove tensor.stat.reduce_mean
* remove reduce_all, reduce_any in tensor.logic
* remove apis in tensor.math
* remove apis in tensor.__init__
* remove has_inf, has_nan in tensor.search
* remove apis in framework.__init__
* remove apis in paddle.__init__
* remove apis in nn.functional.__init__
* modify removed alias apis to raw api in doc and unittests
* fix remove grid_sample bug
* modify removed alias apis to raw api in doc and unittests
* delete alias API relations in doc
* keep paddle.compat, paddle.sysconfig
* remove unittest for paddle.reduce_all, paddle.reduce_any
* modify removed alias apis to raw api in doc and unittests
* recover paddle.save and paddle.load
* resolve conflicts
* fix sample code missing paddle.enable_static() bug
* fix sample code missing paddle.enable_static() bug
* fix to_string sample code error
* del the DEFINE_ALIAS of sigmoid_cross_entropy_with_logits
* del sigmoid_cross_entropy_with_logits in python/paddle/nn/functional/loss.py, test=develop
* call paddle.fluid.layers.sigmoid_cross_entropy_with_logits in bce_with_logits_loss, test=develop
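A minimal sketch of two of the alias-to-raw-API replacements above, paddle.multiply for the removed elementwise_mul and paddle.crop for crop_tensor (values are illustrative only):

```python
import paddle

x = paddle.to_tensor([[1.0, 2.0], [3.0, 4.0]])
y = paddle.to_tensor([[10.0, 10.0], [10.0, 10.0]])
print(paddle.multiply(x, y))                          # elementwise product

z = paddle.crop(x, shape=[1, 2], offsets=[1, 0])      # crop out the second row
print(z)
```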
* Add api of constant in paddle.nn.initializer
* Add api of KaimingUniform & KaimingNormal in paddle.nn.initializer (see the sketch after this list)
* test=document_fix
* test=document_fix
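A minimal sketch of the added initializers, used through weight_attr/bias_attr (the layer sizes are made up):

```python
import paddle
import paddle.nn as nn
from paddle.nn.initializer import Constant, KaimingNormal, KaimingUniform

fc1 = nn.Linear(8, 4,
                weight_attr=paddle.ParamAttr(initializer=KaimingNormal()),
                bias_attr=paddle.ParamAttr(initializer=Constant(value=0.0)))
fc2 = nn.Linear(8, 4,
                weight_attr=paddle.ParamAttr(initializer=KaimingUniform()))
print(fc1.weight.shape, fc2.weight.shape)
```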
* fix an error in GPU-version Paddle when there is no CUDA device
* optimize format and add new unittest
* fix coverage problem
* fix unittest format
* change static mode to dygraph mode
* use subprocess in unittest
* 1. remove the paddle.unique_with_counts api, whose functionality is covered by the unique api
2. add the paddle.math.increment(x, value=1.0, name=None) api
3. replace paddle.sums with the paddle.add_n api
4. update the paddle.metric.accuracy api (add a name parameter); a sketch of these APIs follows this list
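A minimal sketch of the APIs touched above, assuming increment is exposed as paddle.increment (values are illustrative):

```python
import paddle

counter = paddle.zeros([1], dtype="float32")
paddle.increment(counter, value=1.0)                  # in-place add on a 1-element Tensor

a = paddle.to_tensor([1.0, 2.0])
b = paddle.to_tensor([3.0, 4.0])
print(paddle.add_n([a, b]))                           # [4.0, 6.0], replacing paddle.sums

pred = paddle.to_tensor([[0.1, 0.9], [0.8, 0.2]])
label = paddle.to_tensor([[1], [0]], dtype="int64")
print(paddle.metric.accuracy(input=pred, label=label, k=1, name=None))
```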
* fix doc and unittest of 2.0 lr_scheduler
* fix doc of 2.0 lr_scheduler
* fix unittest
* fix english doc of lr_scheduler
* fix api name of lr scheduler
* fix api name of lr scheduler
* add load_op_xpu for Baidu Kunlun, test=kunlun
* add is_compiled_with_xpu for unit test, test=kunlun
* add is_compiled_with_xpu for unit test (see the sketch below), test=kunlun
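A minimal sketch of gating an XPU-only unit test on the new helper (the test class and its body are made-up placeholders):

```python
import unittest
import paddle

@unittest.skipUnless(paddle.is_compiled_with_xpu(),
                     "paddle is not compiled with XPU")
class TestXPUOnly(unittest.TestCase):
    def test_basic(self):
        paddle.set_device("xpu")                      # assumes an XPU device is available
        x = paddle.ones([2, 2])
        self.assertEqual(x.shape, [2, 2])

if __name__ == "__main__":
    unittest.main()
```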
* replace config by kwargs
* change the save path from a directory to a prefix (see the sketch after this list)
* fix failed unittests
* revert unittest name change
* polish en docs
* add more tests for coverage
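A minimal sketch of the prefix-style save path, assuming these commits refer to paddle.jit.save (the layer and path are made up):

```python
import paddle
import paddle.nn as nn

class Net(nn.Layer):
    def __init__(self):
        super(Net, self).__init__()
        self.fc = nn.Linear(4, 2)

    @paddle.jit.to_static
    def forward(self, x):
        return self.fc(x)

net = Net()
net(paddle.randn([1, 4]))                             # run once so the program is traced
paddle.jit.save(net, path="saved/net")                # "saved/net" is a prefix, not a directory
```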
* Add api of constant in paddle.nn.initializer
* modify doc for Layer, test=develop
* modify doc for Layer2, test=develop
* set dtype default value to float32, test=develop
* add example code for paddle.nn.Layer, test=develop
* set the create_parameter and create_variable dtype default value to None (see the sketch after this list), test=develop
* modify some example code, test=develop
* refine, test=develop
* del no ues code, test=develop
* modify doc, example code, args, test=develop
* modify doc, test=develop
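A minimal sketch of create_parameter with the new default dtype=None (the layer is made up; the dtype falls back to the framework default):

```python
import paddle
import paddle.nn as nn

class Scale(nn.Layer):
    def __init__(self):
        super(Scale, self).__init__()
        # dtype is omitted, so the default dtype (float32 unless changed) is used
        self.w = self.create_parameter(
            shape=[1], default_initializer=nn.initializer.Constant(2.0))

    def forward(self, x):
        return x * self.w

layer = Scale()
print(layer(paddle.to_tensor([1.0, 2.0])))            # [2.0, 4.0]
```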
* fix huber_loss and npair_loss doc and example code, test=document_fix
* remove disable_static in example code, test=document_fix
* remove huber_loss and refine npair_loss example code, test=document_fix
* remove huber_loss in functional/__init__.py, test=document_fix
* add multinomial cpu kernel
* fix C++ notype error
* fix windows ci array len error
* let array len be const
* change array to vector
* add a CUDA kernel for the case where num_distribution is 1; replacement=False is not supported (see the sketch after this list)
* add multinomial python api
* support num_distribution different multinomial distributions
* add categorical class
* fix test_distribution enable_static error
* add unittest for different setting of Categorical
* optimize format
* little change
* little change
* raise an error if shapes do not match, optimize format
* fix windows CI dtype error in concat
* little changes
* little changes2
* change values type to int64
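A minimal sketch of the multinomial API and the Categorical class built in the commits above (the probabilities are made up; replacement=True is used because the CUDA kernel above does not cover replacement=False):

```python
import paddle
from paddle.distribution import Categorical

probs = paddle.to_tensor([0.1, 0.3, 0.6])
idx = paddle.multinomial(probs, num_samples=5, replacement=True)  # int64 indices
print(idx)

cat = Categorical(probs)
print(cat.sample([4]))                                # 4 draws from the distribution
print(cat.entropy())
```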
* remove the input requirement in dygraph Model
* correct unittest
* update save inference model in dygraph without input
* fix unittests for test_model.py
* solve conflicts
* solve conflicts
* delete http.log
* fix test_model.py bug, correct initialization of MyModel
* fix unittests bugs
* set paddle manual seed for unittest
* fix Model bugs, since inputs can be a list or a dict when provided
* add random seed for test_export_deploy_model
* delete redundant codes, because calls
* Code optimization, error information optimization
* add multinomial cpu kernel
* fix C++ notype error
* fix windows ci array len error
* let array len be const
* change array to vector
* add a CUDA kernel for the case where num_distribution is 1; replacement=False is not supported
* add multinomial python api
* support num_distribution different multinomial distributions
* add multinomial python api unittest
* change output dtype to int64
* fix coverage problem
* optimize format
* fix dtype of output error, should be int64_t
* increase tolerance
* increase the difference between low and high
* change tolerance of Normal log_prob method
* change probs tolerance to 1e-4
* change tolerance of Normal kl method
* polish Program api doc & example
* polish CompiledProgram api doc & example
* polish ParallelEnv api doc & examples
* polish details, test=document_fix
* polish program doc details, test=document_fix
* polish details, test=document_fix
* fix note format error, test=document_fix
* add the missing example, test=document_fix
* fix the missing example, test=document_fix
Refine Dy2stat APIs to 2.0rc
After discussion, we accepted 3 key points from reviewers:
1. In 2.0rc we changed dygraph_to_static folder to dy2static
2. Keep the three files: convert_call_func.py, convert_operators.py, variable_trans_func.py
3. Remove convert_operators path when users import convert_xxx.
After this PR, users can import convert_xxx APIs by:
`import paddle.jit.dy2static.convert_xxx`
The file structure will be:
```
jit
    dy2static
        convert_operators.py
        convert_call_func.py
        variable_trans_func.py
```
Changed APIs in detail, by file (a usage sketch follows the listing):
In python/paddle/jit/dygraph_to_static/convert_call_func.py:
from ...fluid.dygraph.dygraph_to_static.convert_call_func import convert_call #DEFINE_ALIAS
In python/paddle/jit/dygraph_to_static/convert_operators.py:
from ...fluid.dygraph.dygraph_to_static.convert_operators import cast_bool_if_necessary #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.convert_operators import convert_assert #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.convert_operators import convert_ifelse #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.convert_operators import convert_len #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.convert_operators import convert_logical_and #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.convert_operators import convert_logical_not #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.convert_operators import convert_logical_or #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.convert_operators import convert_print #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.convert_operators import convert_var_dtype #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.convert_operators import convert_var_shape #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.convert_operators import convert_while_loop #DEFINE_ALIAS
In python/paddle/jit/dygraph_to_static/variable_trans_func.py:
from ...fluid.dygraph.dygraph_to_static.variable_trans_func import create_fill_constant_node #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.variable_trans_func import create_static_variable_gast_node #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.variable_trans_func import data_layer_not_check #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.variable_trans_func import to_static_variable #DEFINE_ALIAS
from ...fluid.dygraph.dygraph_to_static.variable_trans_func import to_static_variable_gast_node #DEFINE_ALIAS
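A minimal sketch of the new flat import path described above; these helpers are normally invoked by dy2stat-generated code rather than called by users directly:

```python
from paddle.jit.dy2static import convert_call, convert_ifelse, convert_while_loop

print(convert_call, convert_ifelse, convert_while_loop)
```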
This PR fixed two bugs when converting LSTM in dy2stat (a sketch of the scenario follows):
1. is_unsupported has a condition that can trigger a Python syntax error.
2. The LSTM API's implementation in _rnn_static_graph doesn't include parameter initialization, which can cause a dy2stat error.
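A minimal sketch of the scenario these fixes target, converting a Layer that holds an nn.LSTM with paddle.jit.to_static (sizes are made up):

```python
import paddle
import paddle.nn as nn

class Net(nn.Layer):
    def __init__(self):
        super(Net, self).__init__()
        self.lstm = nn.LSTM(input_size=8, hidden_size=8, num_layers=1)

    @paddle.jit.to_static
    def forward(self, x):
        y, _ = self.lstm(x)
        return y

net = Net()
out = net(paddle.randn([2, 5, 8]))                    # runs the dy2stat-converted program
print(out.shape)                                      # [2, 5, 8]
```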
We decreased the batch size on CPU so that the test can run correctly on Windows/Mac machines; this may make the delta larger, so I set a larger delta value.