Commit Graph

18242 Commits (342d62de60850d1e991b1a23aed360c1d6f78bbf)

Author SHA1 Message Date
wangchaochaohu f350aa59ff
Fix the compiler error for half type (#29799)
4 years ago
wuhuanzhou 27aa15150c
Add approval for PR-CI-OP-benchmark (#29797)
4 years ago
Huihuang Zheng 1cbb282d77
Add Retry Logic to CublasHandlerHolder
4 years ago
LielinJiang e5af650b71
Add double grad for conv_transpose (#29706)
4 years ago
Leo Chen 224f3bcbb1
format code (#29714)
4 years ago
LoveAn 2e5b4a216c
Optimize compilation time with Unity Build (#29733)
4 years ago
Zhang Jun 0c23ba95d8
enable MakeCiper api for inference;test=develop (#29692)
4 years ago
wangchaochaohu 7b2dc4e6b1
optimization for fp16 elementwise add (#29744)
4 years ago
chalsliu 27bdbec7fc
Refine precision test print message
4 years ago
chalsliu e63a68feac
Retry when download failed for precision test
4 years ago
Jacek Czaja 07790ba13e
[oneDNN] Reimplemented elementwise_add grad (#29747)
4 years ago
Aurelius84 17c8e3adfe
Polish code in gpu_launch_config.h (#29730)
4 years ago
wangchaochaohu 068d905e1e
fix the shape choose of vectorize for cuda
4 years ago
syyxsxx 7c2affaa26
fix isfinite_v2_op OpProtoAndCheckerMaker AddComment bug (#29626)
4 years ago
石晓伟 8bd2879ef7
update the operator registration for incompatible upgrade, test=develop (#29720)
4 years ago
chentianyu03 71063b8137
add conj op for complex types (#29527)
4 years ago
Wilber b593d588aa
[Inference] EnableUseGpu has higher priority than flags (#29697)
4 years ago
WangXi 9cbcc6cadc
fleet sync build strategy, test=develop (#29732)
4 years ago
wanghuancoder 0c59ad2a1a
Windows generate pdb and dump, for debug (#29628)
4 years ago
Huihuang Zheng 4c4d4ba5e0
Modify CublasHandleHolder to Fix Random Unittest Failure. test=develop (#29617)
4 years ago
Chen Weihang 6cfa59de1b
[Complex] Add real & imag op and api for complex tensor (#29672)
4 years ago
Jacek Czaja 9eff1a674f
Added missing format of oneDNN (#29670)
4 years ago
wangchaochaohu 2e0d1ed00f
delete the code for fp16 optimization because it is not faster than common template code (#29715)
4 years ago
TTerror af8ded773a
update activation op on kunlun (#29577)
4 years ago
ceci3 cc387159f3
add pad and concat double grad (#29549)
4 years ago
liuyuhui f13c3a9cd7
[Kunlun] PR1:Support one Kunlun card training in parallel executor (#29337)
4 years ago
Y_Xuan 76738504ad
Add rocm platform support code (#29342)
4 years ago
Zhang Ting 1e9127f688
improve dropout grad (#29605)
4 years ago
wangchaochaohu eab44e1f32
refine (#29622)
4 years ago
WangXi 613c46bc07
fix gen_nccl_id_op_helper compile failed, test=develop (#29614)
4 years ago
chen zhiyu f5f8809c1a
1. add python version selection 2.add dynamic flags setting. (#29612)
4 years ago
YUNSHEN XIE 2926e74326
New UT should not exceed 15s (#29492)
4 years ago
Chen Weihang f02aece1f0
Add complex dtype op (add) test example (#29603)
4 years ago
AshburnLee efea540ca9
Add tf32 support for A100 tensor core acceleration for cuBLAS (#28732)
4 years ago
lijianshe02 7779768b53
add transpose double grad test=develop (#29600)
4 years ago
wangchaochaohu 1b69e528d3
optimize for long width for elementwise (#29602)
4 years ago
Wilber 78dad78610
fix none-contiguous bug for python api. (#29615)
4 years ago
Zhou Wei 18f9df0da4
fix cache pip error (#29618)
4 years ago
ShenLiang 1efef8baed
Fix bug of matmul_v2 for broadcast case (#29599)
4 years ago
qingqing01 8d549fc85d
Add clip double grad (#29590)
4 years ago
wangchaochaohu ac4bae8ee9
elementwise_add_grad Op optimization (#29575)
4 years ago
arlesniak 62d4483649
Added verbose oneDNN lib version (#29378)
4 years ago
lilong12 ff6a145011
update, test=develop (#29559)
4 years ago
WangXi 467c716963
gen nccl id use socket (#29431)
4 years ago
tangwei12 0034273b7e
add service (#29560)
4 years ago
Leo Chen c0163837a5
Fix compile problem when cuda_arch < 6000 (#29576)
4 years ago
QingshuChen 79a41a9ed6
support roi_align & affine_channel for kunlun (#29561)
4 years ago
Huihuang Zheng 831e9135b9
Fix Windows Unittest (#29543)
4 years ago
Jacek Czaja f6cca62575
[oneDNN] Making ThreadID info in caching key optional (#29272)
4 years ago
GeminiCarrie 08f24a3108
Fix precision problem (#29567)
4 years ago
Wilber 740c0d58c3
update for xpu ci. (#29568)
4 years ago
JZ-LIANG d33d468f02
[Sharding] add hybrid-dp feature (#29518)
4 years ago
Leo Chen 1e72e03217
remove duplicated macro (#29563)
4 years ago
Zhang Ting 6702040e94
improve dropout (#29465)
4 years ago
Zhang Ting 30d9589afe
add cast cuda kernel (#29352)
4 years ago
LoveAn b5d4a1f33d
Add the strategy of skipping cc/cu test compilation and execution in CI (#29499)
4 years ago
Aurelius84 2a42250699
Polish hash function of executor cache key (#29556)
4 years ago
taixiurong 760d015c14
add xpu ops for training transformer in kunlun (#29539)
4 years ago
Jacek Czaja 83a693ee55
[oneDNN] Added Unit Test for Multiple instances prediction (#29501)
4 years ago
joanna.wozna.intel 0ce6d7fa77
Fix bf16 activations test for softmax and gelu (#29502)
4 years ago
Zhong Hui 60bfd308ab
fix p_norm with empty shape (#29500)
4 years ago
Zhou Wei b9e926b8e5
change the code format (#29550)
4 years ago
Leo Chen 9f926eb720
Layernorm opt (#29522)
4 years ago
arlesniak b781953ef5
[oneDNN] Fix flags use test for #29080, assert condition more general (#29493)
4 years ago
tangwei12 ae3f7a7100
add ps table (#29463)
4 years ago
chalsliu 36ec9456cf
Make PADDLE_ROOT as an environment variable
4 years ago
ShenLiang d8391a1983
fix error message of gather nd (#29521)
4 years ago
Zhen Wang 5ac71b36fb
Remove tensor copy in the update_loss_scaling op. (#29426)
4 years ago
Zhou Wei e74e1a226c
support deepcopy for Layer/Tensor/Paramerbase (#29387)
4 years ago
joejiong 87e75a77c2
Add tangent operator (#29207)
4 years ago
zlsh80826 95e334810a
Softmax vectorization (#29404)
4 years ago
wanghuancoder a136c9cdb8
fix increamental coverage script bug, WITH_INCREMENTAL_COVERAGE to DWITH_INCREMENTAL_COVERAGE, test=develop (#29509)
4 years ago
Aurelius84 966aa0e387
Fix test_mobile_net random failed on windows GPU(#29480)
4 years ago
ShenLiang 2ef9e0e23c
Rebuild group automatically in dynamic graph distributed (#29255)
4 years ago
procr 3a0558339d
support mobilenet for kunlun (#29458)
4 years ago
Huihuang Zheng a1909affc6
Fix Unit Test: Add Sleep Time for CUDA Retry (#29442)
4 years ago
Leo Chen e5e522493d
make gelu fp16 computing more robust (#29484)
4 years ago
LoveAn 8094ac686e
Print ccache/clcache hit rate (#29341)
4 years ago
Zhang Ting 560b432349
Revert "improve elementwise_add_grad perf (#29277)" (#29464)
4 years ago
jakpiase 57a4f16d9e
added internal and external reorders to profiler (#29443)
4 years ago
Pei Yang 2480bdef6c
change hard_swish from plugin to layer (#29177)
4 years ago
taixiurong ecca6585cd
1. fix elementwise ops'bug 2. fix softmax_with_cross_entropy_op 3. add biliner_interp_op (#29448)
4 years ago
LoveAn 03b42d9fa7
fix unittest on windows, test=develop (#29365)
4 years ago
TTerror a5fcc4b545
update reduce_sum op on xpu (#29367)
4 years ago
Jack Zhou c7cada8571
Fix gru performace decline in 1.8.5 (#29455)
4 years ago
Zhang Ting 6296f4ed09
revert cast eigen kernel (#29427)
4 years ago
Leo Chen a040c055a5
fix layer_norm accuracy (#29434)
4 years ago
Zhou Wei 24ba9ed436
fix that parameters'grad has grad var (#29408)
4 years ago
Leo Chen 4e19ce1df5
refine reshape grad and double grad kernel, use tensor copy async (#29128)
4 years ago
Shang Zhizhou 225a9c4ed8
Fix unittest (#29412)
4 years ago
Pei Yang f860de4af7
support clip op trt converter (#29411)
4 years ago
Jack Zhou 1dd7b97b66
fix rnn_op bug in cudnn_version>= 8 (#29406)
4 years ago
LoveAn 671555ed32
Compiling operator libraries with Unity build (#29130)
4 years ago
Zhou Wei 5c9bd0bf7c
print whether has build cache (#29035)
4 years ago
cc a623ce044f
Use different name_scope for different conv type, test=develop (#29355)
4 years ago
yongqiangma 7c508d8668
update unbind norm add CUDAPlace api doc information (#29322)
4 years ago
chentianyu03 879e913b6d
Make transpose, trace, kron, reshape, sum op support complex type (#29321)
4 years ago
卖鱼的哲学 074065e5de
fix expand/uniform_random && concat/transpose to new api on xpu (#29280)
4 years ago
lilong12 1decf4ada6
update, test=develop (#29331)
4 years ago
QingshuChen 74bf3bed36
support global pooling for kunlun (#29293)
4 years ago
liym27 b10ecd9d3a
[inplace] Add ShareHolderWith for class Variable and SharePlaceholderWith in VarBase.detach() to share the same Tensor/SelectedRows (#29267)
4 years ago
Chen Weihang 9ad800ebb2
Support type promote for basic math ops (quantum required) (#29265)
4 years ago
tangwei12 8358791607
fix gpu outofrange (#29238)
4 years ago
Leo Chen b58cfff89d
use has_grad instead of train_mode (#29309)
4 years ago
Zhang Ting befd6d5338
improve elementwise_add_grad perf (#29277)
4 years ago
Shang Zhizhou ebf689197d
fix tensorrt output shape error (#29308)
4 years ago
Aurelius84 67c700b479
[Dy2Stat] Add cache for Executor and Context in run_program_op (#28421)
4 years ago
ShenLiang 696dc4bb13
fix the warning of reducer (#29323)
4 years ago
wangchaochaohu c4be80f402
polish the code of cumsum and remove some unused code (#29303)
4 years ago
ShenLiang c00af94435
fix matmulv2 for windows (#29302)
4 years ago
wanghuancoder 3765da98c7
add coverage incremental switch, test=develop (#29290)
4 years ago
Wilber d68af02c04
fix analysis_config bug. (#29304)
4 years ago
ShenLiang 0fb18bc214
enforce the matmul_v2 error message (#29297)
4 years ago
Zhen Wang 9b59a589b1
Remove some useless log. (#29300)
4 years ago
Leo Chen 13a22a3752
fix shape of tile_grad op (#29289)
4 years ago
Zhen Wang be3777a50a
Add pure fp16 training with master weights. (#27712)
4 years ago
Wojciech Uss 6673fb0565
change import math.h to cmath (#29260)
4 years ago
furnace 7584bb5096
Layer norm fp16 (#29169)
4 years ago
Shang Zhizhou c59b4f28a2
fix cmake error when WITH_GPU=ON and WITH_TENSORRT=ON && WITH_MKL=OFF (#29275)
4 years ago
Shang Zhizhou fc80d2e09c
add compile option WITH_TENSORRT (#29208)
4 years ago
Leo Chen 116305ea4b
Improve performance of elementwise_add grad op (#29187)
4 years ago
卖鱼的哲学 07c67d5a8b
add deformable_conv op on xpu (#29234)
4 years ago
Chen Weihang 1de32f823d
Hot fix complle failed in gcc4.8 caused by complex impl (#29254)
4 years ago
GeminiCarrie 642abe2a48
Fix a bug when running on an operating system without "bash." (#29131)
4 years ago
ShenLiang 46b73e6cd9
Change the api of DataParallel and Fleet (#29224)
4 years ago
QingshuChen 64f29fbb70
update kunlun conv2d/softmax/elementwise implemetation (#29229)
4 years ago
chentianyu03 8f45d14263
add complex64 and complex128 type; add +-*/@ and slice opreator for c… (#29199)
4 years ago
Zhou Wei c0a991c874
accumulate gradient for leaf tensor with previous graph and expose leaf tensor concept (#28429)
4 years ago
Wilber 74c43ac638
fix lite unit test. (#29233)
4 years ago
Adam Osewski 4096ff94dc
Small optimizations for conv2d kernel subroutines. (#29188)
4 years ago
joanna.wozna.intel 5c61eeef61
Enable all image classification models (#29155)
4 years ago
Wilber 4fec182d24
[Lite-Subgraph] Fix compile error for lite subgraph. (#29146)
4 years ago
123malin b5c6342336
Update ps gpu (#29209)
4 years ago
liym27 865a45984f
Check whether there is any inplace operation affecting gradient calculation. (#27901)
4 years ago
chen zhiyu 4056c4f11c
Add unittest in musl build (#29099)
4 years ago
123malin 03d4665f44
prefetch optimize (#29095)
4 years ago
WangXi 0c2a51d240
optimizer amp, all use fp16 communication, overlap last comm and compute (#28957)
4 years ago
Chen Weihang 0b032faeee
Polish unittests details and execution conditions to adapt to MUSL (#29044)
4 years ago
123malin 92817f8005
test=develop, rm pathlib (#28658)
4 years ago
Wojciech Uss 4fd4095d1b
Add quantization of multi_gru op and tests (#28615)
4 years ago
Jack Zhou bc6033f86b
fix gru gcc7.4 bug for the gru compile
4 years ago
wanghuancoder 0239f79695
Generate code coverage reports only for incremental files (#28508)
4 years ago
wangchaochaohu b818429ae7
optimize cumsum OP (#29193)
4 years ago
ShenLiang e2d01eb650
Support dynamic graph distributed (#28997)
4 years ago
lilong12 7e5e9934fe
update expand as op to use the shape of the target tensor instead of the target tensor itself. (#29020)
4 years ago
pangyoki 7c8ac064c8
Delete prettytable in condabuild (#29145)
4 years ago
Zhou Wei e668cb07fb
fix CUDA 11 error on windows (#29101)
4 years ago
Jack Zhou 085260f3de
Add eigen gru and fix the dropout bug in the rnn
4 years ago
yaoxuefeng 545df287fc
add user_define_dump (#28596)
4 years ago
Aurelius84 71815637cc
Move gym into unittest/requirements.txt (#29149)
4 years ago
arlesniak bc902044a4
Fixes mkldnn dygraph learning rate scheduler crashes (#28988)
4 years ago
Shang Zhizhou b9e76a0103
detect tensorRT plugin fp16 in runtime (#27933)
4 years ago
Leo Chen fd3fcb051a
fix typo of flag name (#29154)
4 years ago
Noel da71173bc9
Fix ops doc for some ops
4 years ago
Leo Chen 770395cb93
Split train_mode and has_grad for tracer (#29064)
4 years ago
Aurelius84 7ae3cb554a
Polish CUDA Information stdout (#29109)
4 years ago
chalsliu 7a15e64034
Support precision test for new ut
4 years ago
WangXi 173c22aec2
optimize fast graph executor (#28962)
4 years ago
Shang Zhizhou 562ded1041
fix unittest trt_dynamic_shape_transformer_prune_test error (#29122)
4 years ago
Shibo Tao db41258501
add API serialize_program, serialize_persistables, save_to_file, deserialize_program, deserialize_persistables, load_from_file. (#29034)
4 years ago
joanna.wozna.intel b0d1ac161e
Add bf16 pool2d and unify bf16 unit tests (#29039)
4 years ago
joanna.wozna.intel fddea67445
Fix cpu_bfloat16_pass (#28730)
4 years ago
Qi Li 2fd16cf6fc
fix win ci failure, test=develop (#29089)
4 years ago
Chen Weihang fea0e294ee
Hide the C++ stack by default and add hints (#29042)
4 years ago
Chen Weihang b1274ac3d6
set show cpp stack by default, test=document_fix (#29102)
4 years ago
joejiong 582c0a0468
add uint8 for reshape op (#28996)
4 years ago
Zhou Wei 8ca0a8a859
fix tensor detach to zero copy (#27921)
4 years ago
Aurelius84 8af0d85ea4
fix unittest failed on windows GPU (#29072)
4 years ago
taixiurong a5aa4dc7a9
add xpu elementwise ops (#29031)
4 years ago
joejiong b04c78ef5e
Update pow (#29000)
4 years ago
wawltor b2c8a00745
remove eigen threadpool for the speed up
4 years ago
Wojciech Uss 7b5a8e46de
Add multi_gru_fuse_pass and tests (#28601)
4 years ago
LoveAn c91bb084f4
Add op benchmark ci pipeline in Paddle repo (#28692)
4 years ago
Zhou Wei 5e26a15484
Open GPU unitest on windows (#29003)
4 years ago
Leo Chen 3815d7aa40
Upgrade string literals to raw string (#28989)
4 years ago
lilong12 767d0ba267
update, test=develop (#28700)
4 years ago
Wojciech Uss 991345b368
Add multi_gru_seq_fuse_pass and tests (#28604)
4 years ago
123malin fbf9564f6b
【paddle.distributed.fleet】Optimize ParameterServer's Async Mode (#28442)
4 years ago
lilong12 f77a78cdee
enable pipeline to run with Executor.run() (#28373)
4 years ago
Thunderbrook 0073f9bdb0
support ps-gpu (#28752)
4 years ago
Chen Weihang 768dab441e
polish two api doc detail, test=document_fix (#28971)
4 years ago
furnace 8ff3550658
refactor momentum op to combine weight (#27414)
4 years ago
Jacek Czaja bd1d6d3b30
extends oneDNN caching keys so caching objects are unique to executor/predictor (#28758)
4 years ago
chen zhiyu 3d0ff8eebc
optimize musl docker build script (#28974)
4 years ago
Pei Yang 994673bf4f
change avg pooling and global pooling to trt layer in dynamic shape mode (#28702)
4 years ago
yaoxuefeng 71c1cd1408
fix truncated_gaussian seed (#28777)
4 years ago
HappyAngel de528981e5
fix paddlepredictor build error. test=develop (#28792)
4 years ago
Wilber a22ea652cf
fix trt delete_pass bug. (#28763)
4 years ago
gongweibao 1dad8ceaab
Fix gpu memory allocation bug. (#28703)
4 years ago
Chen Weihang b969c32ab1
fix occupied 0 device memory bug (#28771)
4 years ago
joejiong 1a532d5133
add uint8 support for squeeze operator (#28734)
4 years ago
wangchaochaohu 8b853b3030
fix the number of perf algo for conv cudnn in exhaustive mode (#28694)
4 years ago
joanna.wozna.intel 8c0ea4bffe
Add bf16 matmul, fc, elementwise add and mul (#28729)
4 years ago
Wojciech Uss efc3b182f0
a fix for the fc_lstm_fuse_pass (#28709)
4 years ago
Zhou Wei 3b0dd5f620
fix bug that to_tensor not support paddle.Place (#28717)
4 years ago
yaoxuefeng 08b62f4902
fix shuffle batch op shuffle (#28533)
4 years ago
taixiurong d3d1a6b6e0
add kunlun kernel: slice, slice_grad, top_k, cast. *test=kunlun (#28542)
4 years ago
Jack Zhou 9362d85e0e
Add LSTM, Simple RNN and GRU CPU kernel (#28577)
4 years ago
QingshuChen 30ef3815b3
adjust kunlun header file (#28536)
4 years ago
Zhang Ting dab4920568
improve performance of cast op (#28727)
4 years ago