5977 Commits (84639b61939ccd68702e6423f50f085af93ede19)

Author SHA1 Message Date
FlyingQianMM d42f93e504
add op_register_version for allclose op; test=op_version (#29968)
5 years ago
guofei b23faf37be
Add moving_average_abs_max_scale op_register_version test=develop (#29957)
5 years ago
wangxinxin08 be8b5fd18a
register op version for conv2d_transpose, conv3d_transpose and depthwise_conv2d_transpose, test=op_version (#29937)
5 years ago
Guo Sheng 6ac4f0af6a
Register op version for coalesce_tensor. (#29940)
5 years ago
Jack Zhou 5a4e42ca9a
add gru op_register_version; test=op_version; (#29931)
5 years ago
Qi Li 913f77a4b7
Register op version for print, test=op_version (#29945)
5 years ago
cc 7667e59bf7
add op version for fake_quant and fake_dequant ops, test=op_version (#29923)
5 years ago
Wilber 332da133a1
Support mips arch (#29903)
5 years ago
LielinJiang eab0b60e16
Register op version for grid_sampler, test=op_version (#29916)
5 years ago
LielinJiang 0f4b218640
Enable bilateral_slice unittest on windows platform (#29896)
5 years ago
Chen Weihang a6072055be
[Complex] Handle complex to real after type promotion (#29855)
5 years ago
Chen Weihang 1a304e6c06
[Complex] Add support for complex grad accumulated (#29889)
5 years ago
taixiurong c7acad9f2f
support some shape for matmul and cast in xpu place (#29900)
5 years ago
QingshuChen 59b47f3b32
feat: support check_nan_inf for kunlun/xpu device (#29694)
5 years ago
tangwei12 032414ca2a
[Feature] one ps (3/4) (#29604)
5 years ago
jakpiase edc06c6a1b
Added fc + activation fuse pass (currently only gelu, sigmoid and tanh are supported) (#29772)
5 years ago
liym27 97e75ad0f5
[setitem] Support Tensor setitem in static mode (#29708)
5 years ago
Jacek Czaja c9e874fc8e
[oneDNN] Unit test for checking oneDNN caching (#29606)
5 years ago
Thunderbrook 09b6e71928
heter box (#29734)
5 years ago
123malin a400b76db7
Roll cuda kernel (#29655)
5 years ago
wuhuanzhou e7ac74c85b
optimize compilation time of argmin/argmax op (#29595)
5 years ago
chentianyu03 ddfc3d2c2f
change grad elementwise_mul for complex types (#29757)
5 years ago
chentianyu03 2a260d9b0e
change the grad of div when complex types (#29804)
5 years ago
TTerror 82aa01c373
add nearest_interp_v2 on kunlun (#29725)
5 years ago
wangchaochaohu 01c37c8e02
refine the compiler error for half2 operation (#29816)
5 years ago
whs 82630408b4
Support double backward rsqrt (#29589)
5 years ago
Zhang Ting b76f5a8489
fix the bug of dropout_grad (#29813)
5 years ago
LielinJiang a94c3cbbf3
register cudnn conv double grad for depthwise conv (#29807)
5 years ago
wangchaochaohu f350aa59ff
Fix the compiler error for half type (#29799)
5 years ago
LielinJiang e5af650b71
Add double grad for conv_transpose (#29706)
5 years ago
LoveAn 2e5b4a216c
Optimize compilation time with Unity Build (#29733)
5 years ago
wangchaochaohu 7b2dc4e6b1
optimization for fp16 elementwise add (#29744)
5 years ago
Jacek Czaja 07790ba13e
[oneDNN] Reimplemented elementwise_add grad (#29747)
5 years ago
wangchaochaohu 068d905e1e
fix the shape choose of vectorize for cuda
5 years ago
syyxsxx 7c2affaa26
fix isfinite_v2_op OpProtoAndCheckerMaker AddComment bug (#29626)
5 years ago
chentianyu03 71063b8137
add conj op for complex types (#29527)
5 years ago
Chen Weihang 6cfa59de1b
[Complex] Add real & imag op and api for complex tensor (#29672)
5 years ago
wangchaochaohu 2e0d1ed00f
delete the code for fp16 optimization because it is not faster than common template code (#29715)
5 years ago
TTerror af8ded773a
update activation op on kunlun (#29577)
5 years ago
ceci3 cc387159f3
add pad and concat double grad (#29549)
5 years ago
Y_Xuan 76738504ad
Add ROCm platform support code (#29342)
5 years ago
Zhang Ting 1e9127f688
improve dropout grad (#29605)
5 years ago
wangchaochaohu eab44e1f32
refine (#29622)
5 years ago
WangXi 613c46bc07
fix gen_nccl_id_op_helper compile failed, test=develop (#29614)
5 years ago
Chen Weihang f02aece1f0
Add complex dtype op (add) test example (#29603)
5 years ago
lijianshe02 7779768b53
add transpose double grad test=develop (#29600)
5 years ago
wangchaochaohu 1b69e528d3
optimize for long width for elementwise (#29602)
5 years ago
ShenLiang 1efef8baed
Fix bug of matmul_v2 for broadcast case (#29599)
5 years ago
qingqing01 8d549fc85d
Add clip double grad (#29590)
5 years ago
wangchaochaohu ac4bae8ee9
elementwise_add_grad Op optimization (#29575)
5 years ago
arlesniak 62d4483649
Added verbose oneDNN lib version (#29378)
5 years ago
WangXi 467c716963
gen nccl id use socket (#29431)
5 years ago
Leo Chen c0163837a5
Fix compile problem when cuda_arch < 6000 (#29576)
5 years ago
QingshuChen 79a41a9ed6
support roi_align & affine_channel for kunlun (#29561)
5 years ago
Jacek Czaja f6cca62575
[oneDNN] Making ThreadID info in caching key optional (#29272)
5 years ago
Leo Chen 1e72e03217
remove duplicated macro (#29563)
5 years ago
Zhang Ting 6702040e94
improve dropout (#29465)
5 years ago
Zhang Ting 30d9589afe
add cast cuda kernel (#29352)
5 years ago
LoveAn b5d4a1f33d
Add the strategy of skipping cc/cu test compilation and execution in CI (#29499)
5 years ago
taixiurong 760d015c14
add xpu ops for training transformer in kunlun (#29539)
5 years ago
Zhong Hui 60bfd308ab
fix p_norm with empty shape (#29500)
5 years ago
Leo Chen 9f926eb720
Layernorm opt (#29522)
5 years ago
ShenLiang d8391a1983
fix error message of gather nd (#29521)
5 years ago
Zhen Wang 5ac71b36fb
Remove tensor copy in the update_loss_scaling op. (#29426)
5 years ago
joejiong 87e75a77c2
Add tangent operator (#29207)
5 years ago
zlsh80826 95e334810a
Softmax vectorization (#29404)
5 years ago
procr 3a0558339d
support mobilenet for kunlun (#29458)
5 years ago
Leo Chen e5e522493d
make gelu fp16 computing more robust (#29484)
5 years ago
Zhang Ting 560b432349
Revert "improve elementwise_add_grad perf (#29277)" (#29464)
5 years ago
jakpiase 57a4f16d9e
added internal and external reorders to profiler (#29443)
5 years ago
taixiurong ecca6585cd
1. fix elementwise ops' bug 2. fix softmax_with_cross_entropy_op 3. add bilinear_interp_op (#29448)
5 years ago
TTerror a5fcc4b545
update reduce_sum op on xpu (#29367)
5 years ago
Jack Zhou c7cada8571
Fix gru performance decline in 1.8.5 (#29455)
5 years ago
Zhang Ting 6296f4ed09
revert cast eigen kernel (#29427)
5 years ago
Leo Chen a040c055a5
fix layer_norm accuracy (#29434)
5 years ago
Leo Chen 4e19ce1df5
refine reshape grad and double grad kernel, use tensor copy async (#29128)
5 years ago
LoveAn 671555ed32
Compiling operator libraries with Unity build (#29130)
5 years ago
chentianyu03 879e913b6d
Make transpose, trace, kron, reshape, sum op support complex type (#29321)
5 years ago
卖鱼的哲学 074065e5de
fix expand/uniform_random && concat/transpose to new api on xpu (#29280)
5 years ago
QingshuChen 74bf3bed36
support global pooling for kunlun (#29293)
5 years ago
Chen Weihang 9ad800ebb2
Support type promote for basic math ops (quantum required) (#29265)
5 years ago
tangwei12 8358791607
fix gpu outofrange (#29238)
5 years ago
Zhang Ting befd6d5338
improve elementwise_add_grad perf (#29277)
5 years ago
Shang Zhizhou ebf689197d
fix tensorrt output shape error (#29308)
5 years ago
Aurelius84 67c700b479
[Dy2Stat] Add cache for Executor and Context in run_program_op (#28421)
5 years ago
wangchaochaohu c4be80f402
polish the code of cumsum and remove some unused code (#29303)
5 years ago
ShenLiang 0fb18bc214
enforce the matmul_v2 error message (#29297)
5 years ago
Zhen Wang 9b59a589b1
Remove some useless log. (#29300)
5 years ago
Leo Chen 13a22a3752
fix shape of tile_grad op (#29289)
5 years ago
Zhen Wang be3777a50a
Add pure fp16 training with master weights. (#27712)
5 years ago
furnace 7584bb5096
Layer norm fp16 (#29169)
5 years ago
Leo Chen 116305ea4b
Improve performance of elementwise_add grad op (#29187)
5 years ago
卖鱼的哲学 07c67d5a8b
add deformable_conv op on xpu (#29234)
5 years ago
QingshuChen 64f29fbb70
update kunlun conv2d/softmax/elementwise implementation (#29229)
5 years ago
chentianyu03 8f45d14263
add complex64 and complex128 type; add +-*/@ and slice operator for c… (#29199)
5 years ago
Wilber 74c43ac638
fix lite unit test. (#29233)
5 years ago
Adam Osewski 4096ff94dc
Small optimizations for conv2d kernel subroutines. (#29188)
5 years ago
123malin b5c6342336
Update ps gpu (#29209)
5 years ago
123malin 03d4665f44
prefetch optimize (#29095)
5 years ago
WangXi 0c2a51d240
optimizer amp, all use fp16 communication, overlap last comm and compute (#28957)
5 years ago
Jack Zhou bc6033f86b
fix gru gcc7.4 bug for the gru compile
5 years ago
wangchaochaohu b818429ae7
optimize cumsum OP (#29193)
5 years ago
lilong12 7e5e9934fe
update expand as op to use the shape of the target tensor instead of the target tensor itself. (#29020)
5 years ago
Jack Zhou 085260f3de
Add eigen gru and fix the dropout bug in the rnn
5 years ago
arlesniak bc902044a4
Fixes mkldnn dygraph learning rate scheduler crashes (#28988)
5 years ago
Shang Zhizhou b9e76a0103
detect tensorRT plugin fp16 in runtime (#27933)
5 years ago
Noel da71173bc9
Fix ops doc for some ops
5 years ago
joanna.wozna.intel b0d1ac161e
Add bf16 pool2d and unify bf16 unit tests (#29039)
5 years ago
joejiong 582c0a0468
add uint8 for reshape op (#28996)
5 years ago
taixiurong a5aa4dc7a9
add xpu elementwise ops (#29031)
5 years ago
joejiong b04c78ef5e
Update pow (#29000)
5 years ago
wawltor b2c8a00745
remove eigen threadpool for the speed up
5 years ago
lilong12 767d0ba267
update, test=develop (#28700)
5 years ago
123malin fbf9564f6b
【paddle.distributed.fleet】Optimize ParameterServer's Async Mode (#28442)
5 years ago
furnace 8ff3550658
refactor momentum op to combine weight (#27414)
5 years ago
Jacek Czaja bd1d6d3b30
extends oneDNN caching keys so caching objects are unique to executor/predictor (#28758)
5 years ago
yaoxuefeng 71c1cd1408
fix truncated_gaussian seed (#28777)
5 years ago
gongweibao 1dad8ceaab
Fix gpu memory allocation bug. (#28703)
5 years ago
Chen Weihang b969c32ab1
fix occupied 0 device memory bug (#28771)
5 years ago
joejiong 1a532d5133
add uint8 support for squeeze operator (#28734)
5 years ago
wangchaochaohu 8b853b3030
fix the number of perf algo for conv cudnn in exhaustive mode (#28694)
5 years ago
joanna.wozna.intel 8c0ea4bffe
Add bf16 matmul, fc, elementwise add and mul (#28729)
5 years ago
yaoxuefeng 08b62f4902
fix shuffle batch op shuffle (#28533)
5 years ago
taixiurong d3d1a6b6e0
add kunlun kernel: slice, slice_grad, top_k, cast. *test=kunlun (#28542)
5 years ago
Jack Zhou 9362d85e0e
Add LSTM, Simple RNN and GRU CPU kernel (#28577)
5 years ago
QingshuChen 30ef3815b3
adjust kunlun header file (#28536)
5 years ago
Zhang Ting dab4920568
improve performance of cast op (#28727)
5 years ago
yaoxuefeng 03f46e3526
fix truncated_gaussian op cuda seed setting (#28678)
5 years ago
Wojciech Uss 04bcc13fac
Add multi_gru op and tests (#28591)
5 years ago
joejiong 32b90b1c2d
add log10 (#28576)
5 years ago
Guo Sheng 858ffa0c8b
Fix the dropout setting when not initialized in rnn_op. (#28561)
5 years ago
Jacek Czaja 6d8d3d4c22
[oneDNN] Layer norm bf16 kernel (#28619)
5 years ago
Zhou Wei bf143652ac
fix lstm OP compile error on windows (#28667)
5 years ago
石晓伟 57dab959ca
add datanorm op new scale_w register (#28657)
5 years ago
cc 65aac81191
Fix fake_quant error when cout > 1024, test=develop (#28603)
5 years ago
lilong12 b2f7ab6636
bug fix, test=develop (#28648)
5 years ago
wawltor 8f2656ef5c
fix the gradient bug for the topk v2
5 years ago
wangchaochaohu a972c33fd7
refine gather OP performance for dynamic mode (#28587)
5 years ago
joanna.wozna.intel 2cb71c0cde
Add checkpoint to quantize (#28612)
5 years ago
pangyoki b889a0cee2
add gaussian_random op_version (#28602)
5 years ago
Guo Sheng 110febdc54
Fix gradients with ignore_idx in softmax_with_cross_entropy (#28622)
5 years ago
Leo Chen f962bd3432
Fix cudnn workspace limit in cudnn-8 (#28611)
5 years ago
Leo Chen 90805e2df7
Register op_version for new attribute use_addto (#28463)
5 years ago
lilong12 ed9dd7c9f0
add send and recv ops (#28590)
5 years ago
Zhong Hui a829357e4d
register the op version for some ops
5 years ago
Zhou Wei bf6e7cba7a
update 2.0 API English doc (#28525)
5 years ago
Shang Zhizhou 8699f38d08
Add TRT support for pruned transformer models; fix the bug that TensorRT does not support DeletePass (#28517)
5 years ago
joejiong 08d2413142
add log2 operator (#28319)
5 years ago
wangchaochaohu c52fe48f6f
fix the GetKernelTypeForVar of input for fluid.gather (#28534)
5 years ago
wangchaochaohu d7cfee9b31
Checkout point add (#28488)
5 years ago
zhupengyang 47cbf61dd4
fix softmax unittest float16 random error (#28480)
5 years ago
wangchaochaohu e14ed71cc2
refine the performance of gather Op (#28458)
5 years ago
YUNSHEN XIE ba0756325a
exec ut no more than 15s 1 (#28439)
5 years ago
taixiurong fad4744aa4
fix crash in adam in xpu, *test=kunlun (#28433)
5 years ago
QingshuChen 6bba8e57b1
fix batch_norm_xpu bug & remove xpusimulator dependence (#28430)
5 years ago
joanna.wozna.intel 7821759d48
Add bfloat16 softmax and gelu (#28394)
5 years ago
石晓伟 c41fd033e5
check op_version_registry in CI test, test=develop (#28402)
5 years ago
Jacek Czaja ca41541472
[oneDNN]Sum bf16 kernel (#28382)
5 years ago
Leo Chen 8b2436a776
Add broadcast_shape api (#28257)
5 years ago
石晓伟 21a63f6f90
enhance the op_version_registry, test=develop (#28347)
5 years ago
Shang Zhizhou ea851796e5
Optimize ernie model inference performance in TensorRT; support variable-length input (#28367)
5 years ago
Jacek Czaja 84cc61b2cd
[oneDNN] sum op refactor (#28318)
5 years ago
Wilber 09fd2b2aab
Paddle support compile on sw (#27858)
5 years ago
Leo Chen 6115c14fca
Pool2d cuda kernel supports fp16 (#28316)
5 years ago
Guo Sheng 9a600df373
Add rnn_op (#28197)
5 years ago
wangguanzhong 5262b02585
add generate_proposals_v2 op (#28214)
5 years ago
joanna.wozna.intel 571a63e7ec
Add bf16 transpose2, reshape2, concat ops (#28195)
5 years ago
Guanghua Yu e8f2614da5
Enhance multiclass_nms op to support LoD for dygraph mode (#28276)
5 years ago
Leo Chen 8953038400
Fix transpose in conv cudnn kernel when addto enabled (#28295)
5 years ago
Tao Luo e1e666a05f
fix conv mkldnn build error (#28288)
5 years ago
Jacek Czaja 0b678d401b
- sum (#28233)
5 years ago
Jacek Czaja c11d9b3035
[oneDNN ] conv2d fwd&bwd optimization (#27871)
5 years ago
wangxinxin08 41d26a8287
update matrix nms op to api 2.0 (#28265)
5 years ago
Leo Chen 7fcb32ddf3
fill_constant op supports NINF (#28270)
5 years ago
wangchaochaohu 6905608cea
refine yolo box Op for performance optimization (#28155)
5 years ago
wangchaochaohu cdadc8f019
refine temporal_shift_op for performance optimization using gpu kernel config (#28114)
5 years ago
Zhang Ting fdc06f2158
add Fuse bn add act pass (#28196)
5 years ago
Chen Weihang 2babd6ff67
Add compile limit for PADDLE_ENFORCE without error message (#28221)
5 years ago
Double_V 2db77be423
fix wrong data type, test=develop (#28203)
5 years ago
Feiyu Chan efe6e2840c
fix strided_slice_op's GetExpectedKernelType (#28192)
5 years ago
WangXi e450823b8b
Fix nccl op test failed, test=develop (#28172)
5 years ago
wangguanzhong 5cd97a1cb0
support multiclass nms for multi-batch, test=develop (#28154)
5 years ago
Double_V 5289b72acc
fix Wmaybe-uninitialized warning in pooling.cc, test=develop (#28126)
5 years ago
wangguanzhong d1e1f17482
fix generate_proposal_labels in cascade-rcnn series model, test=develop (#27892)
5 years ago
Leo Chen a911c19eb0
fill_constant op supports NaN and Inf (#28109)
5 years ago
zhupengyang 6dd64b0a30
randperm run error in multi-gpus (#27942)
5 years ago
Double_V d43f75e4cc
add rois_num for roi_align xpu OP (#28077)
5 years ago
xiaoting e3d02c9574
rm max_input in conv2d for kunlun, test=kunlun (#28062)
5 years ago
wangchaochaohu 463c72c2d9
refine gpu kernel config for Paddle (#28085)
5 years ago
yinhaofeng 2cb1ecb99e
lookup_table_v2_op_xpu report errors;test=kunlun (#28064)
5 years ago
yinhaofeng 6f0c3d1f06
xpu adam op (#28031)
5 years ago
TeslaZhao a5c95cd588
Add xpu transpose2 op.test=kunlun (#28086)
5 years ago
Chengmo 5f04875c30
Fix xpu error message (#28061)
5 years ago
LutaoChu c8d32c8c10
Fix diag OP bug on Windows Python3.8
5 years ago
huangxu96 d466893820
Allclose op (#27891)
5 years ago
pangyoki 975bd8873b
Fix error message of multinomial op (#27946)
5 years ago
Kaipeng Deng b6eff4427c
update yolo_box support h != w. test=develop (#27327)
5 years ago
Double_V c1eed1fa24
error message opt for XPU, test=kunlun (#27972)
5 years ago
pangyoki 4c5b779a99
Add truncated_gaussian_random XPU kernel (#27861)
5 years ago
pangyoki 5b8e500135
Add gaussian_random XPU kernels (#27853)
5 years ago