* add more settings for distributed strategy
Basically, DistributedStrategy has several groups of configurations:
- BuildStrategy: the same as paddle.fluid.BuildStrategy, but the distributed arguments are moved out of BuildStrategy
- ExecutionStrategy: the same as paddle.fluid.ExecutionStrategy
- collective communication configs: nccl_comm_num, hierarchical allreduce and so on
- distributed algorithms: async_update (mainly used in parameter server training), lars, lamb and so on
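The four groups above can be pictured as one flat settings object. The sketch below is hypothetical: the class and field names are illustrative stand-ins, not Paddle's actual DistributedStrategy API.

```python
from dataclasses import dataclass, field

# Hypothetical grouping of the four configuration areas described above;
# all names here are illustrative, not Paddle's real DistributedStrategy API.
@dataclass
class BuildStrategyCfg:
    # plays the role of paddle.fluid.BuildStrategy, minus distributed args
    fuse_all_reduce_ops: bool = False

@dataclass
class ExecutionStrategyCfg:
    # plays the role of paddle.fluid.ExecutionStrategy
    num_threads: int = 1

@dataclass
class DistributedStrategySketch:
    build_strategy: BuildStrategyCfg = field(default_factory=BuildStrategyCfg)
    execution_strategy: ExecutionStrategyCfg = field(default_factory=ExecutionStrategyCfg)
    # collective communication configs
    nccl_comm_num: int = 1
    use_hierarchical_allreduce: bool = False
    # distributed algorithms
    async_update: bool = False  # mainly used in parameter server training
    lars: bool = False
    lamb: bool = False

strategy = DistributedStrategySketch(nccl_comm_num=2, use_hierarchical_allreduce=True)
print(strategy.nccl_comm_num)  # 2
```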
Based on the comment here b5f8784cab/paddle/fluid/framework/details/build_strategy.h (L49)
The unit test that compares Reduce and AllReduce inevitably shows some diff. PR_CI_Night runs on a P40 machine with 8GB of GPU memory, smaller than the 16GB on the normal CI machines, so in the past we decreased the batch size to make the test runnable: https://github.com/PaddlePaddle/Paddle/pull/24651/files . Decreasing the batch size makes the difference occur more often, so this PR replaces the absolute delta with a relative delta.
Before this PR, the unit test failed with a probability of about 1/100. After this PR, it no longer fails.
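The difference between an absolute and a relative delta can be illustrated in plain Python. The loss values below are made up for illustration; the point is that an absolute tolerance fails when the compared values are large, while a relative tolerance scales with them.

```python
import math

# Comparing Reduce vs AllReduce losses: an absolute delta fails when the
# loss magnitude is large, while a relative delta scales with it.
# These loss values are made up for illustration.
loss_reduce = 104.7
loss_allreduce = 104.9

abs_delta = abs(loss_reduce - loss_allreduce)
rel_delta = abs_delta / max(abs(loss_reduce), abs(loss_allreduce))

assert not abs_delta <= 1e-3   # an absolute tolerance of 1e-3 would fail
assert rel_delta <= 1e-2       # a 1% relative tolerance passes
# math.isclose expresses the same relative check directly:
assert math.isclose(loss_reduce, loss_allreduce, rel_tol=1e-2)
print("ok")
```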
* fix the double grad bug for the star gan. test=develop
* update the retain_graph parameter doc. test=develop
* add the unit test for the retain_graph parameter. test=develop
* Refine Model
1. Take the network (instance of Layer) as the input of Model.
2. Refine set_dict/load_dict of Layer.
3. Refine the Input interface, and update the code samples for Input accordingly.
* New features: add sinh and cosh ops, test=develop
* remove duplicate test function and remove out parameters, test=develop
* Add out parameters temporarily, to be removed later. test=develop
* remove out args, PR 25570, test=develop
* remove TestParameter, test=develop
* add test api for static dygraph, test=develop
* add backward unittests for sinh and cosh, test=develop
* add ci check for changing op-related api without core.ops, test=develop
* generate api_source_md5 file when build, test=develop
* add failed example, test=develop
* add failed example, test=develop
* handle exception, test=develop
We found that the cause of the multiple-return error in SimNet was that I wrote the wrong task_mode. If task_mode is correctly set to "pairwise", which describes the format of the model input data, multiple return causes no problem in the unit test. In this PR we corrected the task_mode and enabled multiple return in the SimNet unit test.
This PR fixes a bug where SelectedRows could not be supported in SimNet. The reason for this bug is that the dygraph basic_engine didn't copy a var's type when the var needed to be accumulated during backward. So when a var is a SelectedRows and needs to be accumulated, as in SimNet, which calls the net twice, the var's type is changed to the default LoDTensor and the bug occurs. To fix it, we simply copy the type as well.
Without this PR, accumulated SelectedRows parameters in dygraph are changed into LoDTensor. So when we fixed the bug of supporting SelectedRows in SimNet, we found that `test_imperative_lod_tensor_to_selected_rows` failed, throwing an error that SelectedRows was not supported by the Transpose OP. To fix that as well, this PR also adds SelectedRows support to the Transpose OP.
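The essence of the accumulation fix can be sketched in a few lines of plain Python. The types and names below are illustrative stand-ins for the dygraph engine's internals, not the actual C++ code.

```python
# Minimal sketch of the fix: when creating the accumulator for a grad var,
# copy the source var's type instead of defaulting to LoDTensor.
# These names are illustrative stand-ins for dygraph basic_engine internals.
LOD_TENSOR = "LoDTensor"
SELECTED_ROWS = "SelectedRows"

class GradVar:
    def __init__(self, var_type):
        self.type = var_type

def make_accumulator(src):
    # buggy version was: acc = GradVar(LOD_TENSOR)  -- dropped SelectedRows type
    acc = GradVar(src.type)  # fixed: also copy the var's type
    return acc

grad = GradVar(SELECTED_ROWS)
assert make_accumulator(grad).type == SELECTED_ROWS
print("ok")
```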
According to the paddle 2.0 standard:
1. Change the API to def meshgrid(*args, **kwargs); the name argument is hidden in **kwargs.
2. Add related unit tests.
3. Change the example code to imperative mode.
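The new calling convention can be sketched as below. NumPy stands in for Paddle tensors here, and the assumption that the output layout matches NumPy's `indexing='ij'` mode is illustrative, not a statement about Paddle's implementation.

```python
import numpy as np

# Sketch of the new calling convention: tensors are passed as *args and
# `name` is hidden in **kwargs. NumPy stands in for Paddle tensors, and the
# 'ij' layout is an assumption made for illustration.
def meshgrid(*args, **kwargs):
    name = kwargs.get('name', None)  # optional, only used for op naming
    return np.meshgrid(*args, indexing='ij')

x = np.array([1, 2, 3])
y = np.array([4, 5])
grid_x, grid_y = meshgrid(x, y, name='my_meshgrid')
print(grid_x.shape, grid_y.shape)  # (3, 2) (3, 2)
```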
Add Similarity Net as unit test. During the unit test, we found three problems:
1. The run_program_op has memory optimization error when running dy2stat net multiple times.
2. The support for SelectedRows can cause problem in dy2stat.
3. The return grammar has problem.
This PR fixes problem 1, but only modifies code around problems 2 and 3 to keep this PR small. I will fix those two problems in the next PR(s).
* fix optimizer.state_dict and LRScheduler.state_dict to save/load dygraph,test=develop
* fix optimizer.state_dict and LRScheduler.state_dict to save/load dygraph,test=develop
* Add a check for incorrect use of state_dict/set_dict,test=develop
* fix some doc errors,test=develop
* fix current_step_lr for _LearningRateEpochDecay,test=develop
* remove some unused code to improve coverage,test=develop
* remove some unused code to improve coverage,test=develop
* fix optimizer parameter being an iterator; test=develop
* fix parameter list None bug; test=develop
* use is not None; test=develop
* change list to iterable; test=develop
* fix BatchNorm & InstanceNorm in dygraph, test=develop
* update instance_norm,test=develop
* fix bugs,test=develop
* add more case in unittest,test=develop
* fix,test=develop
* fix,test=develop
* show the attr and functions of the Layer,test=develop
* add buffer for dir,test=develop
* fix __dir__,test=develop
* fix doc of Layer.__dir__, test=develop
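A `Layer.__dir__` that shows attributes, methods, parameters, and buffers together could look like the sketch below. This is a hypothetical plain-Python illustration of the idea, not Paddle's actual `Layer` implementation.

```python
# Hypothetical sketch of a Layer.__dir__ that exposes parameters, buffers,
# sublayers, attributes, and methods together; not Paddle's actual code.
class Layer(object):
    def __init__(self):
        self._parameters = {}
        self._buffers = {}
        self._sub_layers = {}

    def __dir__(self):
        attrs = list(self.__dict__.keys())
        methods = [m for m in dir(type(self)) if not m.startswith('_')]
        # merge parameters, buffers, and sublayers into the listing
        extra = list(self._parameters) + list(self._buffers) + list(self._sub_layers)
        return sorted(set(attrs + methods + extra))

layer = Layer()
layer._parameters['weight'] = object()
layer._buffers['running_mean'] = object()
assert 'weight' in dir(layer)
assert 'running_mean' in dir(layer)
print("ok")
```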
* support tuple/list init for VarBase,test=develop
* fix doc of fluid.dygraph.to_variable,test=develop
* fix doc of fluid.dygraph.to_variable,test=develop
* fix the compatibility of PY2 and PY3 in paddle.distributed.launch
test=develop
* only pull log of local rank 0
test=develop
* log exception if UnicodeEncodeError occurs when pulling log in
paddle.distributed.launch
test=develop
Co-authored-by: SunGaofeng <peakbee@gmail.com>
* Add `matrix_nms_op`
test=develop
* Make ci happy
test=develop
* Exit early when no detection
test=develop
* Fix license year
test=develop
* Output index as well
test=develop
* Match nms2 lod behavior and add `return_index` flag
test=develop
* Make CI happy
test=develop
* Fix wording
test=develop
* add new API: MultiStepDecay, a new learning rate strategy, test=develop
* add new API: MultiStepDecay, a new learning rate strategy,test=develop
* add new API: MultiStepDecay, a new learning rate strategy,test=develop
* add base class of LearningRateEpochDecay, and MultiStepDecay, and StepDecay, test=develop
* fix doc to add coverage,test=develop
* add new api: optimizer.set_lr, test=develop
* add API doc and example code for optimizer.set_lr,test=develop
* add API doc and example code for optimizer.set_lr,test=develop
* Modified doc to :api_attr: imperative,test=develop
Support Various-Length Return Grammar in Dy2stat. This PR is a follow-up of https://github.com/PaddlePaddle/Paddle/pull/25176 .
The basic idea is inserting no-value placeholder variables at `return` statements so that all `return` statements have the same length; after that, the static graph can have a fixed fetch output (code in return_transformer.py). Then we remove those no-value placeholders when we finally return the dygraph result (code in partial_program.py).
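The placeholder idea can be sketched in plain Python. The `RETURN_NO_VALUE` sentinel below is an illustrative stand-in, not Paddle's actual marker.

```python
# Pure-Python sketch of the idea: pad every return to the same length with
# a no-value placeholder, then strip placeholders from the final result.
# RETURN_NO_VALUE is an illustrative sentinel, not Paddle's actual marker.
RETURN_NO_VALUE = object()

def transformed_func(flag):
    # original code returned (a, b) on one path and just a on the other;
    # after transformation both paths return two values
    if flag:
        return 1, 2
    return 3, RETURN_NO_VALUE  # placeholder keeps the fetch length fixed

def strip_placeholders(outs):
    outs = tuple(o for o in outs if o is not RETURN_NO_VALUE)
    return outs[0] if len(outs) == 1 else outs

print(strip_placeholders(transformed_func(True)))   # (1, 2)
print(strip_placeholders(transformed_func(False)))  # 3
```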
However, variable-length return in the Bert model is still not supported. Dy2stat can change the code as I wish, but some ops that check shapes at compile time (e.g. Reshape, MatMul) will throw errors because the no-value placeholder may not have the required shape. Does this matter? To me, those no-value placeholders will be replaced by real values that meet the shape requirements at run time, so I think the solution should be some way to handle the compile-time checking. By the way, every time we have a dynamic shape, it tends to cause problems in dy2stat. We should find a way to handle this in the future.
Fixing variable-length return in Bert is on my TODO list, and I will also find some other existing models for verification.
This PR adds basic support for the 'return' grammar in dy2stat. It supports the control flow of 'return'.
The basic idea is using a return-value variable to store early return values, plus boolean state variables and if-else to skip the statements after a return statement.
**This PR is very basic support. There are some corner cases I didn't develop/test**. For example: 'return None', 'return different lengths of variables', 'return non-tensor and tensor together', 'no return statement'. **These corner cases will be done in my next PRs**. The target date is this week.
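The return-value-variable plus boolean-flag transformation can be sketched in plain Python; the `__return_*` names below are illustrative, not the actual generated identifiers.

```python
# Sketch of the transformation: an early `return` becomes an assignment to a
# return-value variable plus a boolean flag, and later statements are guarded
# so they are skipped once the flag is set. Names are illustrative.
def original(x):
    if x > 0:
        return x * 2   # early return
    x = x + 10         # only runs when no early return happened
    return x

def transformed(x):
    __return_flag = False
    __return_value = None
    if x > 0:
        __return_flag = True
        __return_value = x * 2
    if not __return_flag:  # skip statements after the early return
        x = x + 10
        __return_flag = True
        __return_value = x
    return __return_value

for v in (-3, 5):
    assert original(v) == transformed(v)
print("ok")
```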
**Note**:
1. for the unit test, I changed test_program_translator.py because the StaticCode of `dyfunc_with_if_else` will change. To guarantee the correctness of `dyfunc_with_if_else`, I also run it in `TestRecursiveReturn` in test_return.py.
2. I commented out the early-return code in bert_dygraph_model.py because 'return different lengths of variables' is unsupported now. I also know that some other models use early return, and we didn't enable them in the unit tests. I will add support in the next PRs and then re-enable those tests.
* add new api (set_global_initializer/reset_global_initializer),test=develop
* add new api (set_global_initializer/reset_global_initializer),test=develop
* fix doc and example code of set_global_initializer,test=develop
* The arg of append() may temporarily be a non-Tensor.
* Add Seq2Seq as ProgramTranslator Unit Test.
* set dtype of vocab_size_tensor to int64 to pass Windows-CI.
* Add a StatValue class in the backend to represent a stat.
* Add a singleton StatRegistry to maintain the collection of stats.
* For the sake of code neatness, we only support the int and float types, which cover most scenarios.
* Support int and long: int or long -> six.integer_types.
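The StatValue/StatRegistry design described above can be sketched in Python (the backend itself is C++). Everything here is an illustrative stand-in; in Py2 the int check would also accept long via six.integer_types.

```python
# Hypothetical Python sketch of the C++-side design described above:
# a StatValue holds one stat, and a singleton StatRegistry owns them all.
# Only int and float are accepted, matching the description; in Py2 the
# check would use six.integer_types to also accept long.
class StatValue(object):
    def __init__(self, name, value):
        if not isinstance(value, (int, float)):
            raise TypeError("only int and float stats are supported")
        self.name = name
        self.value = value

class StatRegistry(object):
    _instance = None

    @classmethod
    def instance(cls):
        # classic lazy singleton
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def __init__(self):
        self._stats = {}

    def register(self, name, value):
        self._stats[name] = StatValue(name, value)

    def get(self, name):
        return self._stats[name].value

StatRegistry.instance().register("gpu_mem_mb", 512)
print(StatRegistry.instance().get("gpu_mem_mb"))  # 512
```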
* Modify test_tensor_shape: fix bug and modify comment.
* Support convert_var_shape to convert var.shape stmt
* Modify code in ifelse_simple_func.py because returning a non-Tensor in a Tensor-dependent 'if' statement is currently unsupported.
* Convert the return variables of Tensor-dependent 'if' statements to Tensor if they are not. test=develop
* Move function 'convert_len' to file convert_operators.py
* Support that for statements are transformed to while statements.
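The for-to-while lowering can be illustrated in plain Python; this is a sketch of the transformation's effect, not the actual AST pass.

```python
# Sketch of lowering a `for` statement to a `while` statement, mirroring the
# transformation described above (illustrative, not the actual AST pass).
def sum_with_for(items):
    total = 0
    for x in items:
        total += x
    return total

def sum_with_while(items):
    total = 0
    __i = 0
    __len = len(items)
    while __i < __len:  # same loop expressed as while
        x = items[__i]
        total += x
        __i += 1
    return total

assert sum_with_for([1, 2, 3]) == sum_with_while([1, 2, 3]) == 6
print("ok")
```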
* Fix bug: raise None -> return None.
* Support variable loaded and created in loop.
* Use int64 in Py2 and Py3 in function to_static_variable.
* Support LoDTensorArray in reverse_op test=develop
* polish en doc and unittest code test=develop
* refine sample code test=develop
* add example of LoDTensorArray test=develop
* fix typo test=develop
* cast var in convert_logical_XX.
* Add convert_ifelse function in convert_operators.py
* Add logical_transformer. Remove LogicalTransformer from loop_transformer.py
* Revert modified tests in PR24799(convert_while_stmt).
* Comment and modify code that doesn't support `return` statement.
* Remove unnecessary class: MergeAssignTransformer, NodeTestTransformer and IfConditionVisitor in ifelse_transformer.