Commit Graph

5891 Commits (3d015f1cf529915ab52cb8aef7c475f67fb128b5)

Author SHA1 Message Date
ShenLiang 0fb18bc214
enforce the matmul_v2 error message (#29297)
4 years ago
Zhen Wang 9b59a589b1
Remove some useless log. (#29300)
4 years ago
Leo Chen 13a22a3752
fix shape of tile_grad op (#29289)
4 years ago
Zhen Wang be3777a50a
Add pure fp16 training with master weights. (#27712)
4 years ago
furnace 7584bb5096
Layer norm fp16 (#29169)
4 years ago
Leo Chen 116305ea4b
Improve performance of elementwise_add grad op (#29187)
4 years ago
卖鱼的哲学 07c67d5a8b
add deformable_conv op on xpu (#29234)
4 years ago
QingshuChen 64f29fbb70
update kunlun conv2d/softmax/elementwise implementation (#29229)
4 years ago
chentianyu03 8f45d14263
add complex64 and complex128 type; add +-*/@ and slice operator for c… (#29199)
4 years ago
Wilber 74c43ac638
fix lite unit test. (#29233)
4 years ago
Adam Osewski 4096ff94dc
Small optimizations for conv2d kernel subroutines. (#29188)
4 years ago
123malin b5c6342336
Update ps gpu (#29209)
4 years ago
123malin 03d4665f44
prefetch optimize (#29095)
4 years ago
WangXi 0c2a51d240
optimizer amp, all use fp16 communication, overlap last comm and compute (#28957)
4 years ago
Jack Zhou bc6033f86b
fix gru gcc7.4 bug for the gru compile
4 years ago
wangchaochaohu b818429ae7
optimize cumsum OP (#29193)
4 years ago
lilong12 7e5e9934fe
update expand as op to use the shape of the target tensor instead of the target tensor itself. (#29020)
4 years ago
Jack Zhou 085260f3de
Add eigen gru and fix the dropout bug in the rnn
4 years ago
arlesniak bc902044a4
Fixes mkldnn dygraph learning rate scheduler crashes (#28988)
4 years ago
Shang Zhizhou b9e76a0103
detect tensorRT plugin fp16 in runtime (#27933)
4 years ago
Noel da71173bc9
Fix ops doc for some ops
4 years ago
joanna.wozna.intel b0d1ac161e
Add bf16 pool2d and unify bf16 unit tests (#29039)
4 years ago
joejiong 582c0a0468
add uint8 for reshape op (#28996)
4 years ago
taixiurong a5aa4dc7a9
add xpu elementwise ops (#29031)
4 years ago
joejiong b04c78ef5e
Update pow (#29000)
4 years ago
wawltor b2c8a00745
remove eigen threadpool for the speed up
4 years ago
lilong12 767d0ba267
update, test=develop (#28700)
4 years ago
123malin fbf9564f6b
【paddle.distributed.fleet】Optimize ParameterServer's Async Mode (#28442)
4 years ago
furnace 8ff3550658
refactor momentum op to combine weight (#27414)
4 years ago
Jacek Czaja bd1d6d3b30
extends oneDNN caching keys so caching objects are unique to executor/predictor (#28758)
4 years ago
yaoxuefeng 71c1cd1408
fix truncated_gaussian seed (#28777)
4 years ago
gongweibao 1dad8ceaab
Fix gpu memory allocation bug. (#28703)
4 years ago
Chen Weihang b969c32ab1
fix occupied 0 device memory bug (#28771)
4 years ago
joejiong 1a532d5133
add uint8 support for squeeze operator (#28734)
4 years ago
wangchaochaohu 8b853b3030
fix the number of perf algo for conv cudnn in exhaustive mode (#28694)
4 years ago
joanna.wozna.intel 8c0ea4bffe
Add bf16 matmul, fc, elementwise add and mul (#28729)
4 years ago
yaoxuefeng 08b62f4902
fix shuffle batch op shuffle (#28533)
4 years ago
taixiurong d3d1a6b6e0
add kunlun kernel: slice, slice_grad, top_k, cast. *test=kunlun (#28542)
4 years ago
Jack Zhou 9362d85e0e
Add LSTM, Simple RNN and GRU CPU kernel (#28577)
4 years ago
QingshuChen 30ef3815b3
adjust kunlun header file (#28536)
4 years ago
Zhang Ting dab4920568
improve performance of cast op (#28727)
4 years ago
yaoxuefeng 03f46e3526
fix truncated_gaussian op cuda seed setting (#28678)
4 years ago
Wojciech Uss 04bcc13fac
Add multi_gru op and tests (#28591)
4 years ago
joejiong 32b90b1c2d
add log10 (#28576)
4 years ago
Guo Sheng 858ffa0c8b
Fix the dropout setting when not initialized in rnn_op. (#28561)
4 years ago
Jacek Czaja 6d8d3d4c22
[oneDNN] Layer norm bf16 kernel (#28619)
4 years ago
Zhou Wei bf143652ac
fix lstm OP compile error on windows (#28667)
4 years ago
石晓伟 57dab959ca
add datanorm op new scale_w register (#28657)
4 years ago
cc 65aac81191
Fix fake_quant error when cout > 1024, test=develop (#28603)
4 years ago
lilong12 b2f7ab6636
bug fix, test=develop (#28648)
4 years ago
wawltor 8f2656ef5c
fix the gradient bug for the topk v2
4 years ago
wangchaochaohu a972c33fd7
refine gather OP performance for dynamic mode (#28587)
4 years ago
joanna.wozna.intel 2cb71c0cde
Add checkpoint to quantize (#28612)
4 years ago
pangyoki b889a0cee2
add gaussian_random op_version (#28602)
4 years ago
Guo Sheng 110febdc54
Fix gradients with ignore_idx in softmax_with_cross_entropy (#28622)
4 years ago
Leo Chen f962bd3432
Fix cudnn workspace limit in cudnn-8 (#28611)
4 years ago
Leo Chen 90805e2df7
Register op_version for new attribute use_addto (#28463)
4 years ago
lilong12 ed9dd7c9f0
add send and recv ops (#28590)
4 years ago
Zhong Hui a829357e4d
register the op version for some ops
4 years ago
Zhou Wei bf6e7cba7a
update 2.0 API English doc (#28525)
4 years ago
Shang Zhizhou 8699f38d08
Support TRT for the pruned transformer model; fix the bug where tensorRT does not support DeletePass (#28517)
4 years ago
joejiong 08d2413142
add log2 operator (#28319)
4 years ago
wangchaochaohu c52fe48f6f
fix the GetKernelTypeForVar of input for fluid.gather (#28534)
4 years ago
wangchaochaohu d7cfee9b31
Checkout point add (#28488)
4 years ago
zhupengyang 47cbf61dd4
fix softmax unittest float16 random error (#28480)
4 years ago
wangchaochaohu e14ed71cc2
refine the performance of gather Op (#28458)
4 years ago
YUNSHEN XIE ba0756325a
exec ut no more than 15s 1 (#28439)
4 years ago
taixiurong fad4744aa4
fix crash in adam in xpu, *test=kunlun (#28433)
4 years ago
QingshuChen 6bba8e57b1
fix batch_norm_xpu bug & remove xpusimulator dependence (#28430)
4 years ago
joanna.wozna.intel 7821759d48
Add bfloat16 softmax and gelu (#28394)
4 years ago
石晓伟 c41fd033e5
check op_version_registry in CI test, test=develop (#28402)
4 years ago
Jacek Czaja ca41541472
[oneDNN]Sum bf16 kernel (#28382)
4 years ago
Leo Chen 8b2436a776
Add broadcast_shape api (#28257)
4 years ago
石晓伟 21a63f6f90
enhance the op_version_registry, test=develop (#28347)
4 years ago
Shang Zhizhou ea851796e5
Optimize inference performance of the ernie model in TensorRT; support variable-length input (#28367)
4 years ago
Jacek Czaja 84cc61b2cd
[oneDNN] sum op refactor (#28318)
4 years ago
Wilber 09fd2b2aab
Paddle support compile on sw (#27858)
4 years ago
Leo Chen 6115c14fca
Pool2d cuda kernel supports fp16 (#28316)
4 years ago
Guo Sheng 9a600df373
Add rnn_op (#28197)
4 years ago
wangguanzhong 5262b02585
add generate_proposals_v2 op (#28214)
4 years ago
joanna.wozna.intel 571a63e7ec
Add bf16 transpose2, reshape2, concat ops (#28195)
4 years ago
Guanghua Yu e8f2614da5
Enhance multiclass_nms op to support LoD for dygraph mode (#28276)
4 years ago
Leo Chen 8953038400
Fix transpose in conv cudnn kernel when addto enabled (#28295)
4 years ago
Tao Luo e1e666a05f
fix conv mkldnn build error (#28288)
4 years ago
Jacek Czaja 0b678d401b
- sum (#28233)
4 years ago
Jacek Czaja c11d9b3035
[oneDNN ] conv2d fwd&bwd optimization (#27871)
4 years ago
wangxinxin08 41d26a8287
update matrix nms op to api 2.0 (#28265)
4 years ago
Leo Chen 7fcb32ddf3
fill_constant op supports NINF (#28270)
4 years ago
wangchaochaohu 6905608cea
refine yolo box Op for performance optimization (#28155)
4 years ago
wangchaochaohu cdadc8f019
refine temporal_shift_op for performance optimization using gpu kernel config (#28114)
4 years ago
Zhang Ting fdc06f2158
add Fuse bn add act pass (#28196)
4 years ago
Chen Weihang 2babd6ff67
Add compile limit for PADDLE_ENFORCE without error message (#28221)
4 years ago
Double_V 2db77be423
fix wrong data type, test=develop (#28203)
4 years ago
Feiyu Chan efe6e2840c
fix strided_slice_op's GetExpectedKernelType (#28192)
4 years ago
WangXi e450823b8b
Fix nccl op test failed, test=develop (#28172)
4 years ago
wangguanzhong 5cd97a1cb0
support multiclass nms for multi-batch, test=develop (#28154)
4 years ago
Double_V 5289b72acc
fix Wmaybe-uninitialized warning in pooling.cc, test=develop (#28126)
4 years ago
wangguanzhong d1e1f17482
fix generate_proposal_labels in cascade-rcnn series model, test=develop (#27892)
4 years ago
Leo Chen a911c19eb0
fill_constant op supports NaN and Inf (#28109)
4 years ago
zhupengyang 6dd64b0a30
randperm run error in multi-gpus (#27942)
4 years ago
Double_V d43f75e4cc
add rois_num for roi_align xpu OP (#28077)
4 years ago
xiaoting e3d02c9574
rm max_input in conv2d for kunlun, test=kunlun (#28062)
4 years ago
wangchaochaohu 463c72c2d9
refine gpu kernel config for Paddle (#28085)
4 years ago
yinhaofeng 2cb1ecb99e
lookup_table_v2_op_xpu report errors;test=kunlun (#28064)
4 years ago
yinhaofeng 6f0c3d1f06
xpu adam op (#28031)
4 years ago
TeslaZhao a5c95cd588
Add xpu transpose2 op.test=kunlun (#28086)
4 years ago
Chengmo 5f04875c30
Fix xpu error message (#28061)
4 years ago
LutaoChu c8d32c8c10
Fix diag OP bug on Windows Python3.8
4 years ago
huangxu96 d466893820
Allclose op (#27891)
4 years ago
pangyoki 975bd8873b
Fix error message of multinomial op (#27946)
4 years ago
Kaipeng Deng b6eff4427c
update yolo_box support h != w. test=develop (#27327)
4 years ago
Double_V c1eed1fa24
error message opt for XPU, test=kunlun (#27972)
4 years ago
pangyoki 4c5b779a99
Add truncated_gaussian_random XPU kernel (#27861)
4 years ago
pangyoki 5b8e500135
Add gaussian_random XPU kernels (#27853)
4 years ago
pangyoki 74ce039743
Add uniform_random XPU kernel (#27846)
4 years ago
xiaoting abf4d52a74
Polish kunlun error (#27974)
4 years ago
liuyuhui 3e9568653b
add cast/concat/assign xpu op (#27911)
4 years ago
Guo Sheng fa9d3fa5bf
Incorporate cudnn_lstm into LSTM api (#27217)
4 years ago
Guanghua Yu f94d053705
error message optimization in mean_xpu,softmax_with_cross_entropy_op_xpu,test=kunlun (#27967)
4 years ago
Jack Zhou d330cf66cc
Fix xpu enforce (#27978)
4 years ago
lidanqing 7cb4a8b8f2
[oneDNN] Conv dilation support (#27914)
4 years ago
mapingshuo 64c2634995
fix kunlun kernel of reshape op (#27988)
4 years ago
tangwei12 202bfab1be
Feature/large scale kv save base/delta (#27470)
4 years ago
123malin aa3b4ed717
【paddle.fleet】geo send sparse optimize (#27719)
4 years ago
mapingshuo 5ccaaab8aa
reshape support bool, test=develop (#27944)
4 years ago
Qinghe JING 4a4f773658
Add reduce sum and reduce mean xpu op (#27939)
4 years ago
Zhou Wei bf412f4665
add tensor clone (#27953)
4 years ago
Feiyu Chan 2e845182d9
support channel last in BatchNorm*d
4 years ago
Leo Chen 9a2a4b5f65
Support setting xpu place in dygraph mode (#27909)
4 years ago
MRXLT 263a9e97fd
Fix adam (#27778)
4 years ago
Double_V b0edda4d99
kunlun add op (#27890)
4 years ago
Jack Zhou c791df09cf
Add elementwise XPU OP kernel for KUNLUN core, including (but still cannot process common broadcast
4 years ago
wangchaochaohu c5fcc96d5b
xpu support for fill_constant Op (#27675)
4 years ago
Chengmo 328cb289ed
【paddle.fleet】fix sparse load (#27680)
4 years ago
tangwei12 cf70d5b350
fix paddle error information (#27889)
4 years ago
wawltor 95aa53425d
update the code for the topk message optimize
4 years ago
Chen Weihang 4ba977c720
Polish some error messages in operators (#27876)
4 years ago
123malin a4f850748a
【paddle.fleet】bug fix for parameter_recv (#27838)
4 years ago
QingshuChen 2712d07644
support kunlun matmul_v2 (#27910)
4 years ago
zhang wenhui 5a83496c8d
Multi task (#26002)
4 years ago
zhang wenhui 7a58431c0a
fix norm api doc, test=develop (#27652)
4 years ago
yinhaofeng 3eb106da6d
Lookup table v2 xpu (#27888)
4 years ago
Zhang Ting d5cc144c60
tune backward filter algorithm for float16 (#27529)
4 years ago
hutuxian 3f2a6ab65d
fix error msg (#27887)
4 years ago
xiaoting ae01801f0a
Add dropout and log_loss for kunlun (#27790)
4 years ago
Guanghua Yu 70c8c31371
support mean,softmax_with_cross_entropy on Baidu Kunlun (#27792)
4 years ago
Chengmo 1607e87cb9
add xpu sgd & momentum (#27728)
4 years ago
hong19860320 c90d35564b
Add batch_norm and layer_norm XPU kernels (#27818)
4 years ago
xiaoting 6da7a7458b
add conv for xpu, test=kunlun (#27809)
4 years ago
Thunderbrook 04be37c57f
add xpu slice op (#27349)
4 years ago
Thunderbrook 8c25dfaacc
op error info (#27856)
4 years ago
ShenLiang 6d63cd2b93
add gather_op xpu, test=kunlun (#27822)
4 years ago
Feiyu Chan 1d95a0fbc3
fix error message for nce_op (#27863)
4 years ago
guofei 2e1bca99ca
Refine the gradient calculation errors caused by renaming in while_grad (#27814)
4 years ago
wanghuancoder 8fa4c09889
add load_op_xpu for Baidu Kunlun (#27817)
4 years ago
Jacek Czaja 55e63763ec
[oneDNN] adaptive pool support (#27747)
4 years ago
Zhang Ting 16999ae49d
use IndexList to improve performance of instance_norm op (#25132)
4 years ago
GaoWei8 36bb056ed6
Add flatten weight of lstm (#27192)
4 years ago
Guanghua Yu 7779790c61
error message optimization in softmax_with_cross_entropy_op (#27772)
4 years ago
TeslaZhao 070ac9590c
Add double grad in Squeeze and Unsqueeze (#27810)
4 years ago
Jack Zhou d4359b0f39
add the kunlun kernel for the paddle 2.0
4 years ago
mapingshuo 840d54de9b
add XPU support for shape op and reshape op (#27804)
4 years ago
cc 8fabb1c32f
Add test attribute in channelwise_quant op, test=develop (#27742)
4 years ago
wangxinxin08 ad99e638fd
add double grad op for matmul (#27776)
4 years ago
zhupengyang 0025e0d87b
refine APIs: brelu, hardsigmoid, hardswish, maxout (#27658)
4 years ago
zhupengyang 5098891fdf
add softmax xpu kernel (#27700)
4 years ago
Double_V f6ad2375be
fix pool3d bug, test=develop (#27718)
4 years ago
Feiyu Chan 0a7bab4e34
fix error message for negative_positive_pair_op and nce_op (#27779)
4 years ago
zhupengyang 395cb561aa
refine logsumexp error message and docs (#27713)
4 years ago
smallv0221 057e28bc8f
API(lstm_unit, lstmp, sequence_mask, sequence_enumerate, sequence_conv) error message enhancement (#27572)
4 years ago
Jacek Czaja 606611d351
[oneDNN] GRU BF16 kernel (#27731)
4 years ago
xiemoyuan 6c1acf34ed
Optimize the error message for OP (#27617)
4 years ago
cc ec7d11a492
refine fused_elemwise_activation error message (#27734)
4 years ago
Zhen Wang 365c2c9c89
fix error message showing in UpdateLossScalingOp (#27596)
4 years ago
LielinJiang 9089841b6e
Fix bilateral inference shape bug (#26822)
4 years ago
Yiqun Liu 65207b4560
Polish the error message of fc, fused_fc_elementwise_layernorm and fused_embedding_seq_pool. (#27692)
4 years ago
Jacek Czaja b9fda2ff09
Fix to issue #25537 (#27546)
4 years ago
Wojciech Uss 966447e338
Added support for quantization of fusion_gru (#27518)
4 years ago
hong19860320 7a96d5788d
Optimize the error messages of the CUDA implementation of activation ops (#27741)
4 years ago
tangwei12 fd616fadc2
reopen heartbeat ut (#27684)
4 years ago
Qi Li f373269df0
update histogram op for performance optimization, test=develop (#24912)
4 years ago
MRXLT 20fb01fb00
fix distributed error info (#27206)
4 years ago
pangyoki 7cd2c13f1b
add multinomial op (#27219)
4 years ago
Wojciech Uss 42d175385d
Add support for (de/re)quantization with shift (#27481)
4 years ago
123malin cc780b1977
test=develop, optimize geo communicator (#26857)
4 years ago
yukavio 7b46fb0f14
fix generate_proposals and affine grid error info (#27636)
4 years ago
AshburnLee c3a3df6466
Add cuda support for unique op (#27646)
4 years ago
wawltor 29f4922906
optimize the error message for detetion_map_op
4 years ago
whs daf5aa9b8b
Fix round in grid sample op (#27657)
4 years ago
ysh329 2f9cdd9038
API/OP clip_by_norm_op error message enhancement. test=develop (#27614)
4 years ago
yongqiangma aac57159c9
enhance array_to_lod_tensor_op lod_tensor_to_array_op error information (#27386)
4 years ago
xiemoyuan 99e3337368
Optimize the error message of OP. (#27478)
4 years ago
ShenLiang e8f873df88
optimize the speed&memory of matmul op (#27610)
4 years ago
tangwei12 9704582eef
fix op error (#27599)
4 years ago
yaoxuefeng c9a8801325
enhance error messages of lookup_tale, merge_ids, data_norm (#27619)
4 years ago
whs 9cc5603d56
Make grid support stopping gradients. (#27630)
4 years ago
furnace d01f626944
update mv op according PR#27024 (#27474)
4 years ago
Double_V 9d783aeddd
Error message opt, test=develop (#27467)
4 years ago
Li Fuchen 1501a80f74
add support to float64 input of warpctc op. (#27399)
4 years ago
QingshuChen 6b727e08b1
support elementwise add, activation, matmul on Baidu Kunlun (#27143)
4 years ago