ShenLiang
0fb18bc214
enforce the matmul_v2 error message ( #29297 )
4 years ago
Zhen Wang
9b59a589b1
Remove some useless log. ( #29300 )
4 years ago
Leo Chen
13a22a3752
fix shape of tile_grad op ( #29289 )
4 years ago
Zhen Wang
be3777a50a
Add pure fp16 training with master weights. ( #27712 )
...
* add the weight decay func for the momentum op
* Add the multi_precision function in Momentum Optimizer.
* Make sure that the initial values of the master weights are the same as the fp16 weights.
* add static loss scaling.
* add the rescale_grad function in the pure fp16 training.
* use the original momentum updating method.
* Polish some codes, such as variable names.
* add docstring for apis.
* update the var creation details of _create_master_weight.
* do not modify the code for imperative momentum updating.
* Fix the error of test_dist_sparse_tensor_load_momentum UT.
* add unit test for multi precision fp16 training.
* add more unit tests for CI.
* Use lower threshold values for allclose comparing in test_multi_precision_fp16_train UT.
* For CI Coverage Checking.
4 years ago
furnace
7584bb5096
Layer norm fp16 ( #29169 )
...
* add fp16 for layer_norm op
* revert layernorm api
* fix forward
* fix forward
* fix backward for layernorm with fp16
* fix unit test for layernorm with fp16
* fix with_mkldnn compile error for layernorm with fp16
* 1. revert to PADDLE_ENFORCE_NOT_NULL, 2. change static_cast<float> to static_cast<U>
* fix with_mkldnn compile error for layernorm with fp16
* fix with_mkldnn compile error for layernorm with fp16
Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
4 years ago
Leo Chen
116305ea4b
Improve performance of elementwise_add grad op ( #29187 )
...
* pass stop_gradient for cast op
* improve performance of elementwise_add grad
* use tensor copy async
* dygraph branch
* fix dygraph branch
* add ut
4 years ago
卖鱼的哲学
07c67d5a8b
add deformable_conv op on xpu ( #29234 )
...
* rebase develop
* update deformable_conv op on xpu
* update deformable_conv op on xpu
4 years ago
QingshuChen
64f29fbb70
update kunlun conv2d/softmax/elementwise implementation ( #29229 )
...
* update conv2d & softmax to new xpu api
* test=kunlun
* remove useless comments
* test=kunlun
* remove softmax xpu op
* test=kunlun
* update kunlun softmax
* test=kunlun
* update xpu unittest
* test=kunlun
* fix elementwise_grad bug for kunlun
* test=kunlun
4 years ago
chentianyu03
8f45d14263
add complex64 and complex128 type; add +-*/@ and slice operator for c… ( #29199 )
...
* add complex64 and complex128 type; add +-*/@ and slice operator for complex types
* add test cases for complex elementwise, matmul and getitem unittest
* add test cases for complex types
* add test cases for complex matmul unittest
4 years ago
Wilber
74c43ac638
fix lite unit test. ( #29233 )
4 years ago
Adam Osewski
4096ff94dc
Small optimizations for conv2d kernel subroutines. ( #29188 )
...
- Make sure that oneDNN memory descriptors are created only once at
first iteration.
4 years ago
123malin
b5c6342336
Update ps gpu ( #29209 )
...
* fix parameter prefetch & device guard
Co-authored-by: MrChengmo <cmchengmo@163.com>
Co-authored-by: chengmo <chengmo@baidu.com>
4 years ago
123malin
03d4665f44
prefetch optimize ( #29095 )
...
* test=develop, optimize async prefetch
4 years ago
WangXi
0c2a51d240
optimizer amp, all use fp16 communication, overlap last comm and compute ( #28957 )
4 years ago
Jack Zhou
bc6033f86b
fix gru gcc7.4 bug for the gru compile
4 years ago
wangchaochaohu
b818429ae7
optimize cumsum OP ( #29193 )
4 years ago
lilong12
7e5e9934fe
update expand as op to use the shape of the target tensor instead of the target tensor itself. ( #29020 )
...
* update, test=develop
4 years ago
Jack Zhou
085260f3de
Add eigen gru and fix the dropout bug in the rnn
4 years ago
arlesniak
bc902044a4
Fixes mkldnn dygraph learning rate scheduler crashes ( #28988 )
4 years ago
Shang Zhizhou
b9e76a0103
detect tensorRT plugin fp16 in runtime ( #27933 )
...
* remove -DSUPPORTS_CUDA_FP16 in cuda.cmake
* compile with cuda9
* add some unittest
* notest;test=coverage
* add unittest for trt plugin swish && split
* update ernie unittest
* fix some error message
* remove repeated judgement of CUDA version in mbEltwiseLayerNormOpConverter
* fix compile error when CUDA_ARCH_NAME < Pascal
* fix compile error
* update unittest timeout
* compile with cuda9
* update error msg
* fix code style
* add some comments
* add define IF_CUDA_ARCH_SUPPORT_FP16
* rename IF_CUDA_ARCH_SUPPORT_FP16 to CUDA_ARCH_FP16_SUPPORTED
4 years ago
Noel
da71173bc9
Fix ops doc for some ops
4 years ago
joanna.wozna.intel
b0d1ac161e
Add bf16 pool2d and unify bf16 unit tests ( #29039 )
...
* Add bf16 pool2d and unify bf16 unit tests
* Add change default ops test
4 years ago
joejiong
582c0a0468
add uint8 for reshape op ( #28996 )
...
add uint8 for reshape operator
4 years ago
taixiurong
a5aa4dc7a9
add xpu elementwise ops ( #29031 )
4 years ago
joejiong
b04c78ef5e
Update pow ( #29000 )
...
Simple code clean up
4 years ago
wawltor
b2c8a00745
remove eigen threadpool for the speed up
4 years ago
lilong12
767d0ba267
update, test=develop ( #28700 )
4 years ago
123malin
fbf9564f6b
【paddle.distributed.fleet】Optimize ParameterServer's Async Mode ( #28442 )
...
* test=develop, optimize global_step
4 years ago
furnace
8ff3550658
refactor momentum op to combine weight ( #27414 )
...
* refactor momentum op to combine weight_decay (scale op and sum op)
4 years ago
Jacek Czaja
bd1d6d3b30
extends oneDNN caching keys so caching objects are unique to executor/predictor ( #28758 )
4 years ago
yaoxuefeng
71c1cd1408
fix truncated_gaussian seed ( #28777 )
4 years ago
gongweibao
1dad8ceaab
Fix gpu memory allocation bug. ( #28703 )
4 years ago
Chen Weihang
b969c32ab1
fix occupied 0 device memory bug ( #28771 )
4 years ago
joejiong
1a532d5133
add uint8 support for squeeze operator ( #28734 )
...
Adding uint8 support for squeeze operator.
4 years ago
wangchaochaohu
8b853b3030
fix the number of perf algo for conv cudnn in exhaustive mode ( #28694 )
4 years ago
joanna.wozna.intel
8c0ea4bffe
Add bf16 matmul, fc, elementwise add and mul ( #28729 )
...
* Add bf16 matmul, fc, elementwise add and mul
* Correct unit test
4 years ago
yaoxuefeng
08b62f4902
fix shuffle batch op shuffle ( #28533 )
4 years ago
taixiurong
d3d1a6b6e0
add kunlun kernel: slice, slice_grad, top_k, cast. *test=kunlun ( #28542 )
...
* 1.add xpu slice op 2. add xpu top_k op 3.modify xpu cast to new api
* 1.add xpu slice op 2. add xpu top_k op 3.modify xpu cast to new api
4 years ago
Jack Zhou
9362d85e0e
Add LSTM, Simple RNN and GRU CPU kernel ( #28577 )
...
* add lstm, simple rnn op kernel
* fix the test_lstm for the rnn op
* change func name
* fix forward postprocess bug
* add gru forward, backward code
* remove unittest.skipIf; use a big rnn op instead of combination op
* fix input doesn't have gradient bug
* add eigen lstm forward, backward
Co-authored-by: wawltor <fangzeyang0904@hotmail.com>
4 years ago
QingshuChen
30ef3815b3
adjust kunlun header file ( #28536 )
...
* adjust kunlun header file
* test=kunlun
* update kunlun unittest
* test=kunlun
* update xpu unittest
* test=kunlun
* update xpu unittest
* test=kunlun
* update xpu unittest
* test=kunlun
4 years ago
Zhang Ting
dab4920568
improve performance of cast op ( #28727 )
4 years ago
yaoxuefeng
03f46e3526
fix truncated_gaussian op cuda seed setting ( #28678 )
4 years ago
Wojciech Uss
04bcc13fac
Add multi_gru op and tests ( #28591 )
...
* Add multi_gru op and tests
* removed redundant disable_dygraph()
4 years ago
joejiong
32b90b1c2d
add log10 ( #28576 )
...
Add new operator log10
4 years ago
Guo Sheng
858ffa0c8b
Fix the dropout setting when not initialized in rnn_op. ( #28561 )
...
test=develop
4 years ago
Jacek Czaja
6d8d3d4c22
[oneDNN] Layer norm bf16 kernel ( #28619 )
4 years ago
Zhou Wei
bf143652ac
fix lstm OP compile error on windows ( #28667 )
...
* add unittest and check unittest for windows
* fix lstm OP compile error on windows
4 years ago
石晓伟
57dab959ca
add datanorm op new scale_w register ( #28657 )
...
Co-authored-by: yaoxuefeng6 <yaoxuefeng@baidu.com>
4 years ago
cc
65aac81191
Fix fake_quant error when cout > 1024, test=develop ( #28603 )
4 years ago
lilong12
b2f7ab6636
bug fix, test=develop ( #28648 )
4 years ago
wawltor
8f2656ef5c
fix the gradient bug for the topk v2
4 years ago
wangchaochaohu
a972c33fd7
refine gather OP performance for dynamic mode ( #28587 )
4 years ago
joanna.wozna.intel
2cb71c0cde
Add checkpoint to quantize ( #28612 )
...
* Add checkpoint to quantize
* Change bfloat16 option
4 years ago
pangyoki
b889a0cee2
add gaussian_random op_version ( #28602 )
4 years ago
Guo Sheng
110febdc54
Fix gradients with ignore_idx in softmax_with_cross_entropy ( #28622 )
...
* Fix gradients with ignore_idx in softmax_with_cross_entropy.
test=develop
* Fix gradients with ignore_idx in softmax_with_cross_entropy on cpu.
Remove softmax_with_cross_entropy from op_threshold_white_list.
test=develop
* Fix test_softmax_cross_entropy_op.py.
test=develop
4 years ago
Leo Chen
f962bd3432
Fix cudnn workspace limit in cudnn-8 ( #28611 )
4 years ago
Leo Chen
90805e2df7
Register op_version for new attribute use_addto ( #28463 )
...
* register op_version for addto
* upgrade pass capability
* change eq to le
* change eq to le
* fix merge
4 years ago
lilong12
ed9dd7c9f0
add send and recv ops ( #28590 )
...
* update, test=develop
4 years ago
Zhong Hui
a829357e4d
register the op version for some ops
4 years ago
Zhou Wei
bf6e7cba7a
update 2.0 API English doc ( #28525 )
...
* make sure the NumPy version is below 1.19.3
* fix 2.0 doc
4 years ago
Shang Zhizhou
8699f38d08
Add TensorRT support for pruned transformer models; fix the bug that TensorRT does not support DeletePass ( #28517 )
...
* skip_layernorm_op done
* add unittest
* slice op converter supports trt < 6
* skip_layernorm only work in ernie
4 years ago
joejiong
08d2413142
add log2 operator ( #28319 )
...
As the title
4 years ago
wangchaochaohu
c52fe48f6f
fix the GetKernelTypeForVar of input for fluid.gather ( #28534 )
4 years ago
wangchaochaohu
d7cfee9b31
Checkout point add ( #28488 )
...
* upgrade pass capability
4 years ago
zhupengyang
47cbf61dd4
fix softmax unittest float16 random error ( #28480 )
4 years ago
wangchaochaohu
e14ed71cc2
refine the performance of gather Op ( #28458 )
4 years ago
YUNSHEN XIE
ba0756325a
exec ut no more than 15s 1 ( #28439 )
...
* disable ut test_parallel_executor_fetch_isolated_var,test=document_fix
* test for limiting ut exec time as 15S
* fix an error caused by cannot find ut
* fix some error
* can not find test_transformer
* fix error caused by ut not run in windows
* fix error caused by Compiler Options
* fix error caused by setting timeout value as 15 in python/paddle/tests/CMakeLists.txt
* setting timeout value to 120s for old ut
* add the timeout value setting
* fix error caused by ut only run in coverage_ci
* add analyzer_transformer_profile_tester
* fix some error
* fix some error
* fix error with inference option
* fix error with inference option setting as ON_INFER
* add some ut to set timeout
* modified some option
* fix error
* fix some timeout error
* fix error
* fix error
* fix timeout for test_analyzer_bfloat16_resnet50
* fix error
* setting timeout property for some ut
* first pr for new ut timeout as 15S
4 years ago
taixiurong
fad4744aa4
fix crash in adam in xpu, *test=kunlun ( #28433 )
4 years ago
QingshuChen
6bba8e57b1
fix batch_norm_xpu bug & remove xpusimulator dependence ( #28430 )
...
*test=kunlun
4 years ago
joanna.wozna.intel
7821759d48
Add bfloat16 softmax and gelu ( #28394 )
...
* Add bfloat16 softmax and gelu
* Add pass attr bfloat16_enabled_op_types
* Changes from review
4 years ago
石晓伟
c41fd033e5
check op_version_registry in CI test, test=develop ( #28402 )
4 years ago
Jacek Czaja
ca41541472
[oneDNN]Sum bf16 kernel ( #28382 )
...
* - Added sum bf16 oneDNN
test=develop
* - Fix to UT of sum bf16
test=develop
4 years ago
Leo Chen
8b2436a776
Add broadcast_shape api ( #28257 )
...
* add broadcast_shape api
* add ut
* follow comments
* add example code, test=dodument_fix
* update example code, test=document_fix
4 years ago
石晓伟
21a63f6f90
enhance the op_version_registry, test=develop ( #28347 )
...
* enhance the op_version_registry, test=develop
* add unittests, test=develop
* enhance the op_version_registry, test=develop
* fix bugs, test=develop
* revert pybind_boost_headers.h, test=develop
* fix a attribute bug, test=develop
4 years ago
Shang Zhizhou
ea851796e5
Optimize ERNIE model inference performance in TensorRT and support variable-length input ( #28367 )
...
* fp16 result ok
* change -DWITH_NVINFER_PLUGIN to config.EnableTensorRtOSS
* auto detect special slice op converter for ernie with trt oss
* ernie oss only support fp16
* fix special_slice_plugin serialize bug
* matmul in tensorrt ok
* ernie unittest ok
* add matmul tensorrt unittest
* remove demo code
4 years ago
Jacek Czaja
84cc61b2cd
[oneDNN] sum op refactor ( #28318 )
4 years ago
Wilber
09fd2b2aab
Paddle support compile on sw ( #27858 )
4 years ago
Leo Chen
6115c14fca
Pool2d cuda kernel supports fp16 ( #28316 )
...
* pool2d cuda kernel supports fp16
* fix compile issue of template
* add ut
4 years ago
Guo Sheng
9a600df373
Add rnn_op ( #28197 )
...
* Add rnn_op.
test=develop
* Fix rnn_op grad maker's drop_empty_grad.
test=develop
4 years ago
wangguanzhong
5262b02585
add generate_proposals_v2 op ( #28214 )
...
* add generate_proposals_v2 op
4 years ago
joanna.wozna.intel
571a63e7ec
Add bf16 transpose2, reshape2, concat ops ( #28195 )
4 years ago
Guanghua Yu
e8f2614da5
Enhance multiclass_nms op to support LoD for dygraph mode ( #28276 )
...
* Enhance multiclass_nms to support LoD for dygraph mode
* fix some error in multiclass_nms
* update GetLodFromRoisNum to GetNmsLodFromRoisNum
4 years ago
Leo Chen
8953038400
Fix transpose in conv cudnn kernel when addto enabled ( #28295 )
4 years ago
Tao Luo
e1e666a05f
fix conv mkldnn build error ( #28288 )
4 years ago
Jacek Czaja
0b678d401b
- sum ( #28233 )
...
test=develop
4 years ago
Jacek Czaja
c11d9b3035
[oneDNN] conv2d fwd&bwd optimization ( #27871 )
4 years ago
wangxinxin08
41d26a8287
update matrix nms op to api 2.0 ( #28265 )
...
* update matrix nms op to api 2.0
* modify code according to review
4 years ago
Leo Chen
7fcb32ddf3
fill_constant op supports NINF ( #28270 )
4 years ago
wangchaochaohu
6905608cea
refine yolo box Op for performance optimization ( #28155 )
4 years ago
wangchaochaohu
cdadc8f019
refine temporal_shift_op for performance optimization using gpu kernel config ( #28114 )
4 years ago
Zhang Ting
fdc06f2158
add Fuse bn add act pass ( #28196 )
...
* add fuse_bn_add_act pass
4 years ago
Chen Weihang
2babd6ff67
Add compile limit for PADDLE_ENFORCE without error message ( #28221 )
...
* add compile limit for paddle enforce
* polish elementwise_op_function.cu.h
* fix failed unittest
* fix windows compile failed
* detail polish
* revert no type constructor
4 years ago
Double_V
2db77be423
fix wrong data type, test=develop ( #28203 )
4 years ago
Feiyu Chan
efe6e2840c
fix strided_slice_op's GetExpectedKernelType ( #28192 )
...
* fix strided_slice_op's GetExpectedKernelType when input tensor is at CUDAPinnedPlace
* add unittest for tensors in cuda pinned place
* skip test for cuda pinned place on cpu machines
4 years ago
WangXi
e450823b8b
Fix nccl op test failed, test=develop ( #28172 )
4 years ago
wangguanzhong
5cd97a1cb0
support multiclass nms for multi-batch, test=develop ( #28154 )
4 years ago
Double_V
5289b72acc
fix Wmaybe-uninitialized warning in pooling.cc, test=develop ( #28126 )
4 years ago
wangguanzhong
d1e1f17482
fix generate_proposal_labels in cascade-rcnn series model, test=develop ( #27892 )
...
* fix generate_proposal_labels in cascade-rcnn series model, test=develop
* fix example code & unittest, test=develop
* update code from review comments, test=develop
4 years ago
Leo Chen
a911c19eb0
fill_constant op supports NaN and Inf ( #28109 )
...
* fill_constant supports nan and inf
* add ut
4 years ago
zhupengyang
6dd64b0a30
randperm run error in multi-gpus ( #27942 )
4 years ago
Double_V
d43f75e4cc
add rois_num for roi_align xpu OP ( #28077 )
...
* add stack pool2d roi_align xpu op,test=kunlun
* error message opt, test=kunlun
* add xpu unittest,test=kunlun
* skip check grad,test=kunlun
* fix boostget , test=kunlun
* error message opt for XPU, test=kunlun
* add rois_num for roi_align xpu OP, test=develop
4 years ago
xiaoting
e3d02c9574
rm max_input in conv2d for kunlun, test=kunlun ( #28062 )
4 years ago
wangchaochaohu
463c72c2d9
refine gpu kernel config for Paddle ( #28085 )
4 years ago
yinhaofeng
2cb1ecb99e
lookup_table_v2_op_xpu report errors;test=kunlun ( #28064 )
...
* lookup_table_v2_op_xpu report errors;test=kunlun
* lookup_table_v2_op_xpu report errors;test=kunlun
4 years ago
yinhaofeng
6f0c3d1f06
xpu adam op ( #28031 )
...
* lookup_table_xpu op report errors;test=kunlun
* add adam xpu op;test=kunlun
* reset lookup
* change adam wrong;test=kunlun
4 years ago
TeslaZhao
a5c95cd588
Add xpu transpose2 op.test=kunlun ( #28086 )
4 years ago
Chengmo
5f04875c30
Fix xpu error message ( #28061 )
...
* fix error message,test=kunlun
* fix, test=kunlun
4 years ago
LutaoChu
c8d32c8c10
Fix diag OP bug on Windows Python3.8
...
Fix diag OP bug on Windows Python3.8, remove the std::min
4 years ago
huangxu96
d466893820
Allclose op ( #27891 )
...
* Still has bugs.
* Fixed allclose_op bug, which cannot deal with some cases of fp64 inputs.
* improved CUDA kernel performance.
* Changed CUDA code.
* Fixed a bug in the CUDA kernel, which could not deal with large-dimension input, and added a unittest for it.
* Add a test case for float32 input.
4 years ago
pangyoki
975bd8873b
Fix error message of multinomial op ( #27946 )
...
* fix multinomial doc
* fix multinomial error message
* little doc change
* fix Categorical class doc
* optimize format of error message
* fix CPU Kernel error message format
* fix isinf and isnan error in WindowsOPENBLAS CI
* delete inf and nan
* add manual_seed in sample code
* little error message change
* change error message to InvalidArgument
* add full point for error message and add manual_seed in CPU environment
4 years ago
Kaipeng Deng
b6eff4427c
update yolo_box support h != w. test=develop ( #27327 )
4 years ago
Double_V
c1eed1fa24
error message opt for XPU, test=kunlun ( #27972 )
...
* add stack pool2d roi_align xpu op,test=kunlun
* error message opt, test=kunlun
* add xpu unittest,test=kunlun
* skip check grad,test=kunlun
* fix boostget , test=kunlun
* error message opt for XPU, test=kunlun
4 years ago
pangyoki
4c5b779a99
Add truncated_gaussian_random XPU kernel ( #27861 )
...
* Add truncated_gaussian_random_op XPU kernel
* Add truncated_gaussian_random_op XPU kernel, test=kunlun
* little change, test=kunlun
* change boost_get to BOOST_GET_CONST
* change boost_get to BOOST_GET_CONST, test=kunlun
* little change, test=kunlun
* use Generator to generate random number and optimize format, test=kunlun
* little change, test=kunlun
* add TODO, test=kunlun
4 years ago
pangyoki
5b8e500135
Add gaussian_random XPU kernels ( #27853 )
...
* Add gaussian_random XPU kernels
* commit kunlun, test=kunlun
* new version, test=kunlun
* change boost_get to BOOST_GET_CONST, test=kunlun
* use Generator to generate random number and optimize format, test=kunlun
* add TODO, test=kunlun
4 years ago
pangyoki
74ce039743
Add uniform_random XPU kernel ( #27846 )
...
* support uniform_random op on Baidu Kunlun
* change dtype of attr shape from int to int64_t
* kunlun ci, test=kunlun
* new version, test=kunlun
* change boost_get to BOOST_GET_CONST
* change boost_get to BOOST_GET_CONST, test=kunlun
* use Generator to generate random number and optimize format
* run Kunlun CI, test=kunlun
* add TODO, test=kunlun
4 years ago
xiaoting
abf4d52a74
Polish kunlun error ( #27974 )
...
* polish error message,test=kunlun
* polish error,test=kunlun
* polish error,test=kunlun
* polish error,test=kunlun
4 years ago
liuyuhui
3e9568653b
add cast/concat/assign xpu op ( #27911 )
...
* addd
* add cast_op_xpu, test=kunlun
* fix bug for cast_op_xpu,test=kunlun
* add concat_op_xpu, test=kunlun
* solve conflicts, test=kunlun
* fix bug,test=kunlun
* add assign_op_xpu, test=kunlun
* fix bug,test=kunlun
* test=kunlun;test=develop
* fix concat bug,test=kunlun
* fix check_dygraph set in test_concat_op_xpu.py,test=kunlun
* fix error message,test=kunlun
Co-authored-by: mapingshuo <mps2012@yeah.net>
4 years ago
Guo Sheng
fa9d3fa5bf
Incorporate cudnn_lstm into LSTM api ( #27217 )
...
* Incorporate cudnn_lstm into LSTM api.
test=develop
* Make coalesce_tensor support alignment optionally.
test=develop
* Reorganize RNN apis. test=develop
* Fix cudnn rnn layout conversion.
test=develop
* Add sequence_length support for RNN cudnn implement.
Add optional init_h and init_c gradient for cudnn_lstm_op.
test=develop
* Use create_parameter for rnn cudnn impl.
test=develop
* Move `self._flat_weight = self.create_parameter()` in RNNBase to main_program.
test=develop
* Update RNN api unittest to use set_device.
test=develop
* Fix set_place for unit tests of RNN apis.
test=develop
* Fix use_align in coalesce_tensor_op.
test=develop
* Adjust RNN apis arguments according to comments.
test=develop
* Polish documents for SimpleRNN apis.
test=develop
* Refine random seed in cudnn_lstm_op.
Expose rnn params from sublayers to RNN.
test=develop
* Fix RNN saving for jit.save.
Refine cudnn_lstm dropout behavior.
test=develop
* Fix doc of GRU. test=develop
* Use ShareDataWith to avoid copying for cudnn_lstm_op test.
test=develop
* Remove updates on cudnn_lstm temporarily.
test=develop
* Use ShareDataWith to avoid copying for cudnn_lstm_op test.
test=develop
* Refine random seed in cudnn_lstm_op.
test=develop
* Fix test_lstm by adjusting ConcreteProgram buffer getter.
test=develop
* Use create_parameter instead of create_var for rnn._flat_weight for static graph usage.
test=develop
* Remove W input for cudnn_lstm to pass unused_var_check.
test=develop
* Add test_predict for RNN unit tests coverage.
test=develop
* Fix code style of rnn.
test=develop
* Fix F.rnn usage in rnn.py.
test=develop
4 years ago
Guanghua Yu
f94d053705
error message optimization in mean_xpu,softmax_with_cross_entropy_op_xpu,test=kunlun ( #27967 )
4 years ago
Jack Zhou
d330cf66cc
Fix xpu enforce ( #27978 )
...
* test=kunlun;
Add elementwise XPU OP kernel for KUNLUN core, including (but still cannot process common broadcast):
* elementwise_div op
* elementwise_max op
* elementwise_mul op (with grad op)
* elementwise_sub op (with grad op)
* 0.05->0.01
* add xpu error message description;test=kunlun
4 years ago
lidanqing
7cb4a8b8f2
[oneDNN] Conv dilation support ( #27914 )
...
* conv dilated mkldnn support: forward and backward pass
* add mkldnn conv_transpose dilation UT
test=develop
* remove unnecessary PADDLE_ENFORCE
* add int8 and bf16 dilated conv UT
* update according to reviews
4 years ago
mapingshuo
64c2634995
fix kunlun kernel of reshape op ( #27988 )
4 years ago
tangwei12
202bfab1be
Feature/large scale kv save base/delta ( #27470 )
...
* add size method for large scale
* add large scale UT
* add ut for checkpoint
4 years ago
123malin
aa3b4ed717
【paddle.fleet】geo send sparse optimize ( #27719 )
...
* test=develop, fix geo sgd communicator
* test=develop, gloo_init_method
* test=develop, bug fix for gloo http_init
4 years ago
mapingshuo
5ccaaab8aa
reshape support bool, test=develop ( #27944 )
4 years ago
Qinghe JING
4a4f773658
Add reduce sum and reduce mean xpu op ( #27939 )
...
* add reduce xpu op test=develop;test=kunlun
* add reduce xpu op test=develop;test=kunlun
* add reduce xpu op test=develop;test=kunlun
* add reduce xpu op test=develop;test=kunlun
* add reduce xpu op test=develop;test=kunlun
4 years ago
Zhou Wei
bf412f4665
add tensor clone ( #27953 )
...
* add tensor clone
* fix unittest test_var_base
4 years ago
Feiyu Chan
2e845182d9
support channel last in BatchNorm*d
...
1. support channel last in BatchNorm*d (#27875 )
2. fix a bug in batch_norm_op cuda kernel by extracting ResizeToChannelFirst(Last), TransToChannelFirst(Last) to operators/layer_utils.h
4 years ago
Leo Chen
9a2a4b5f65
Support setting xpu place in dygraph mode ( #27909 )
...
* support setting xpu place
* add ut, test=kunlun
4 years ago
MRXLT
263a9e97fd
Fix adam ( #27778 )
...
* fix adam
* fix gpu adam
* fix code style
* fix ut
* update ut add cuda code
4 years ago
Double_V
b0edda4d99
kunlun add op ( #27890 )
...
* add stack pool2d roi_align xpu op,test=kunlun
* error message opt, test=kunlun
* add xpu unittest,test=kunlun
* skip check grad,test=kunlun
* fix boostget , test=kunlun
4 years ago
Jack Zhou
c791df09cf
Add elementwise XPU OP kernel for KUNLUN core, including (but still cannot process common broadcast
4 years ago
wangchaochaohu
c5fcc96d5b
xpu support for fill_constant Op ( #27675 )
4 years ago
Chengmo
328cb289ed
【paddle.fleet】fix sparse load ( #27680 )
...
* add sparse tensor load method
4 years ago
tangwei12
cf70d5b350
fix paddle error information ( #27889 )
4 years ago
wawltor
95aa53425d
update the code for the topk message optimize
4 years ago
Chen Weihang
4ba977c720
Polish some error messages in operators ( #27876 )
...
* polish some error message
* add white list
* revert shell script change
4 years ago
123malin
a4f850748a
【paddle.fleet】bug fix for parameter_recv ( #27838 )
...
* test=develop, bug fix for parameter_recv
* test=develop, for unittest, test_fleet_rolemaker_new
4 years ago
QingshuChen
2712d07644
support kunlun matmul_v2 ( #27910 )
...
*test=kunlun
4 years ago
zhang wenhui
5a83496c8d
Multi task ( #26002 )
...
* add multitask
* add multitask, test=develop
* fix code style, test=develop
* add partial push dense, test=develop
* fix has_key in py3, test=develop
* fix, test=develop
* fix, test=develop
* fix, test=develop
4 years ago
zhang wenhui
7a58431c0a
fix norm api doc, test=develop ( #27652 )
...
* fix norm api doc, test=develop
* fix error message, test=develop
* fix api norm, test=develop
* add adagrad, test=develop
* fix bug, test=develop
* fix bug, test=develop
* add spectral_norm, test=develop
* fix adagrad, test=develop
* merge , test=develop
4 years ago
yinhaofeng
3eb106da6d
Lookup table v2 xpu ( #27888 )
...
* add lookup_table_v2_op_xpu, test=kunlun
* add lookup_table_v2_op_xpu, test=kunlun
* change some Tips ,test=kunlun
4 years ago
Zhang Ting
d5cc144c60
tune backward filter algorithm for float16 ( #27529 )
...
* use exhaustive_search for float16
* tune algo only when dtype is float16
4 years ago
hutuxian
3f2a6ab65d
fix error msg ( #27887 )
4 years ago
xiaoting
ae01801f0a
Add dropout and log_loss for kunlun ( #27790 )
...
* add dropout,log_loss, test=kunlun
* fix dropout, test=kunlun
* polish error message, test=kunlun
* change boost::get to BOOST_GET_CONST, test=kunlun
* fix copyright, test=kunlun
4 years ago
Guanghua Yu
70c8c31371
support mean,softmax_with_cross_entropy on Baidu Kunlun ( #27792 )
...
* support mean,softmax_with_cross_entropy on Baidu Kunlun,test=kunlun
* fix unittests error,test=kunlun
* delete boost::get,test=kunlun
4 years ago
Chengmo
1607e87cb9
add xpu sgd & momentum ( #27728 )
...
* add xpu sgd & momentum
4 years ago
hong19860320
c90d35564b
Add batch_norm and layer_norm XPU kernels ( #27818 )
4 years ago
xiaoting
6da7a7458b
add conv for xpu, test=kunlun ( #27809 )
...
* add conv for xpu, test=kunlun
* polish error_message, test=kunlun
* polish error_message, test=kunlun
* fix copyright, test=kunlun
4 years ago
Thunderbrook
04be37c57f
add xpu slice op ( #27349 )
...
* add xpu slice op
test=xpu
* add slice xpu op
test=xpu
* code style
test=kunlun
* style
test=kunlun
* format
test=kunlun
4 years ago
Thunderbrook
8c25dfaacc
op error info ( #27856 )
...
* op error info
* style
* code format
4 years ago
ShenLiang
6d63cd2b93
add gather_op xpu, test=kunlun ( #27822 )
...
* add gather_op xpu, test=develop, test=kunlun
* fix ut, test=develop, test=kunlun
* fix the ut,test=develop, test=kunlun
4 years ago
Feiyu Chan
1d95a0fbc3
fix error message for nce_op ( #27863 )
4 years ago
guofei
2e1bca99ca
Refine the gradient calculation errors caused by renaming in while_grad ( #27814 )
...
test=develop
4 years ago
wanghuancoder
8fa4c09889
add load_op_xpu for Baidu Kunlun ( #27817 )
...
* add load_op_xpu for Baidu Kunlun, test=kunlun
* add is_compiled_with_xpu for unit test, test=kunlun
* add is_compiled_with_xpu for unit test, test=kunlun
4 years ago
Jacek Czaja
55e63763ec
[oneDNN] adaptive pool support ( #27747 )
4 years ago
Zhang Ting
16999ae49d
use IndexList to improve performance of instance_norm op ( #25132 )
...
* use IndexList to improve performance, test=develop
* remove EIGEN_HAS_INDEX_LIST, test=develop
* use IndexList only when EIGEN_HAS_INDEX_LIST is true
4 years ago
GaoWei8
36bb056ed6
Add flatten weight of lstm ( #27192 )
...
* add flatten weight of lstm
4 years ago
Guanghua Yu
7779790c61
error message optimization in softmax_with_cross_entropy_op ( #27772 )
...
* error message optimization in softmax_with_cross_entropy_op
* fix some unsuited comment
4 years ago
TeslaZhao
070ac9590c
Add double grad in Squeeze and Unsqueeze ( #27810 )
...
* Add double grad in Squeeze and Unsqueeze
* Add double grad in Squeeze and Unsqueeze
4 years ago
Jack Zhou
d4359b0f39
add the kunlun kernel for the paddle 2.0
...
Add xpu kernel for KUNLUN core:
* accuracy op
* sign op
* scale op
* sum op
Add default atol in xpu unittest.
4 years ago
mapingshuo
840d54de9b
add XPU support for shape op and reshape op ( #27804 )
4 years ago
cc
8fabb1c32f
Add test attribute in channelwise_quant op, test=develop ( #27742 )
...
* Add test attribute in channelwise_quant op, test=develop
4 years ago
wangxinxin08
ad99e638fd
add double grad op for matmul ( #27776 )
...
* add matmul doublegrad op
* fix compile errors
* modify code according to review
* delete float16
4 years ago
zhupengyang
0025e0d87b
refine APIs: brelu, hardsigmoid, hardswish, maxout ( #27658 )
4 years ago
zhupengyang
5098891fdf
add softmax xpu kernel ( #27700 )
4 years ago
Double_V
f6ad2375be
fix pool3d bug, test=develop ( #27718 )
...
* fix pool3d bug, test=develop
* fix unittest, test=develop
* fix test and fix pool2d bug, test=develop
4 years ago
Feiyu Chan
0a7bab4e34
fix error message for negative_positive_pair_op and nce_op ( #27779 )
4 years ago
zhupengyang
395cb561aa
refine logsumexp error message and docs ( #27713 )
4 years ago
smallv0221
057e28bc8f
API(lstm_unit, lstmp, sequence_mask, sequence_enumerate, sequence_conv) error message enhancement ( #27572 )
...
* API(Compute) error message enhancement on line 44, 50, 53.
* lstm_unit error message enhancement.
lstmp error message enhancement.
sequence_conv error message enhancement.
sequence_enumerate error message enhancement.
sequence_mask error message enhancement.
* Update lstm_unit_op.cc
* Update lstm_unit_op.h
* error msg enhancement.
* Update sequence_conv_op.cc
* Update lstm_unit_op.cc
* Update sequence_conv_op.cc
* Update sequence_enumerate_op.cc
* Update sequence_enumerate_op.cu
* Update sequence_enumerate_op.h
* Update sequence_pool_op.h
* error message enhancement.
* error message enhancement.
4 years ago
Jacek Czaja
606611d351
[oneDNN] GRU BF16 kernel ( #27731 )
4 years ago
xiemoyuan
6c1acf34ed
Optimize the error message for OP ( #27617 )
...
* Optimize the error message for OPs.
* Optimize the error message for OPs in details.
4 years ago
cc
ec7d11a492
refine fused_elemwise_activation error message ( #27734 )
4 years ago
Zhen Wang
365c2c9c89
fix error message showing in UpdateLossScalingOp ( #27596 )
4 years ago
LielinJiang
9089841b6e
Fix bilateral inference shape bug ( #26822 )
...
* fix bilateral bug
4 years ago
Yiqun Liu
65207b4560
Polish the error message of fc, fused_fc_elementwise_layernorm and fused_embedding_seq_pool. ( #27692 )
...
* Polish the error message of fc_op.
* Polish the error message of fused_fc_elementwise_layer_norm op.
* Polish an error message in fused_embedding_seq_pool_op.
4 years ago
Jacek Czaja
b9fda2ff09
Fix to issue #25537 ( #27546 )
...
* - candidate fix to issue #25537
test=develop
* - UT for transpose NHWC
test=develop
4 years ago
Wojciech Uss
966447e338
Added support for quantization of fusion_gru ( #27518 )
4 years ago
hong19860320
7a96d5788d
Optimize the error messages of the CUDA implementation of activation ops ( #27741 )
...
test=develop
4 years ago
tangwei12
fd616fadc2
reopen heartbeat ut ( #27684 )
4 years ago
Qi Li
f373269df0
update histogram op for performance optimization, test=develop ( #24912 )
4 years ago
MRXLT
20fb01fb00
fix distributed error info ( #27206 )
...
* fix distributed error info
* bug fix; notest
* error info refine
* update error info
* update error info
* update error info
* bug fix
* bug fix
* bug fix
* bug fix
4 years ago
pangyoki
7cd2c13f1b
add multinomial op ( #27219 )
...
* add multinomial cpu kernel
* fix C++ notype error
* fix windows ci array len error
* let array len be const
* change array to vector
* add cuda kernel for the case where num_distribution is 1; replacement=False is not supported
* add multinomial python api
* support num_distribution different multinomial distributions
* add multinomial python api unittest
* change output dtype to int64
* fix coverage prob
* optimize format
* fix dtype of output error, should be int64_t
4 years ago
Wojciech Uss
42d175385d
Add support for (de/re)quantization with shift ( #27481 )
4 years ago
123malin
cc780b1977
test=develop, optimize geo communicator ( #26857 )
...
* test=develop, optimize geo communicator
4 years ago
yukavio
7b46fb0f14
fix generate_proposals and affine grid error info ( #27636 )
4 years ago
AshburnLee
c3a3df6466
Add cuda support for unique op ( #27646 )
...
* unique op for cuda is added
* add support for cuda
* Add cuda support for unique op.
* Add support for int32_t and int64_t.
* For old version, process by cpu
* Add VisitDataType for thrust
4 years ago
wawltor
29f4922906
optimize the error message for detection_map_op
...
optimize the error message for detection_map_op
4 years ago
whs
daf5aa9b8b
Fix round in grid sample op ( #27657 )
4 years ago
ysh329
2f9cdd9038
API/OP clip_by_norm_op error message enhancement. test=develop ( #27614 )
...
* Fix clip_by_norm_op error message. test=develop
* test=develop
* test=develop
4 years ago
yongqiangma
aac57159c9
enhance array_to_lod_tensor_op lod_tensor_to_array_op error information ( #27386 )
...
* enhance array_to_lod_tensor_op lod_tensor_to_array_op error information. test=develop
4 years ago
xiemoyuan
99e3337368
Optimize the error message of OP. ( #27478 )
...
* iCafe 9009: Optimize the error message of OP.
* Optimize the error message of GatherTreeOP.
4 years ago
ShenLiang
e8f873df88
optimize the speed&memory of matmul op ( #27610 )
...
* fix the speed&memory of matmul
* fix the comment
* fix the memory copy
* fix the windows ci
4 years ago
tangwei12
9704582eef
fix op error ( #27599 )
...
* fix error
* fix error
* fix error
* merge develop
4 years ago
yaoxuefeng
c9a8801325
enhance error messages of lookup_table, merge_ids, data_norm ( #27619 )
...
* enhance error messages of lookup_table, merge_ids, data_norm
* fix
* fix error msg in .cu
4 years ago
whs
9cc5603d56
Make grid support stopping gradients. ( #27630 )
4 years ago
furnace
d01f626944
update mv op according to PR#27024 ( #27474 )
4 years ago
Double_V
9d783aeddd
Error message opt, test=develop ( #27467 )
...
* Error message opt, test=develop
* solve comments, test=develop
* fix typo, test=develop
4 years ago
Li Fuchen
1501a80f74
add support to float64 input of warpctc op. ( #27399 )
...
* add float64 input to ctc_loss
* modified error message of warpctc
* update repo and tag of warpctc
* add test for warpctc with float64 input
* modified warpctc.cmake to make sure build always
* resolved sample code bug of warpctc
* add core.ops in warpctc dygraph
* fix a bug of test
4 years ago
QingshuChen
6b727e08b1
support elementwise add, activation, matmul on Baidu Kunlun ( #27143 )
...
* support elementwise add, activation, matmul on Baidu Kunlun
* test=kunlun
* minor
* test=kunlun
* reconstruct the xpu directory
* test=kunlun
* minor
* test=kunlun
* minor
* test=kunlun
* minor
* test=kunlun
* minor
* test=kunlun
* minor
* test=kunlun
4 years ago