Commit Graph

10699 Commits (c4eb5d0378cadd0fe8ed0f079746de448aaae3c0)

Author SHA1 Message Date
chentianyu03 ddfc3d2c2f
change grad elementwise_mul for complex types (#29757)
4 years ago
chentianyu03 2a260d9b0e
change the grad of div when complex types (#29804)
4 years ago
ShenLiang f65f1caad3
opt sparse allreduce using ncclgather (#29819)
4 years ago
TTerror 82aa01c373
add nearest_interp_v2 on kunlun (#29725)
4 years ago
wangchaochaohu 01c37c8e02
refine the compiler error for half2 operation (#29816)
4 years ago
whs 82630408b4
Support double backward rsqrt (#29589)
4 years ago
Zhang Ting b76f5a8489
fix the bug of dropout_grad (#29813)
4 years ago
LielinJiang a94c3cbbf3
register cudnn conv double grad for depthwise conv (#29807)
4 years ago
ShenLiang 01e2874a0e
Support multi-stream communication for dynamic graph distributed (#29525)
4 years ago
wangchaochaohu f350aa59ff
Fix the compiler error for half type (#29799)
4 years ago
Huihuang Zheng 1cbb282d77
Add Retry Logic to CublasHandlerHolder
4 years ago
LielinJiang e5af650b71
Add double grad for conv_transpose (#29706)
4 years ago
Leo Chen 224f3bcbb1
format code (#29714)
4 years ago
LoveAn 2e5b4a216c
Optimize compilation time with Unity Build (#29733)
4 years ago
Zhang Jun 0c23ba95d8
enable MakeCiper api for inference;test=develop (#29692)
4 years ago
wangchaochaohu 7b2dc4e6b1
optimization for fp16 elementwise add (#29744)
4 years ago
Jacek Czaja 07790ba13e
[oneDNN] Reimplemented elementwise_add grad (#29747)
4 years ago
Aurelius84 17c8e3adfe
Polish code in gpu_launch_config.h (#29730)
4 years ago
wangchaochaohu 068d905e1e
fix the shape choose of vectorize for cuda
4 years ago
syyxsxx 7c2affaa26
fix isfinite_v2_op OpProtoAndCheckerMaker AddComment bug (#29626)
4 years ago
石晓伟 8bd2879ef7
update the operator registration for incompatible upgrade, test=develop (#29720)
4 years ago
chentianyu03 71063b8137
add conj op for complex types (#29527)
4 years ago
Wilber b593d588aa
[Inference] EnableUseGpu has higher priority than flags (#29697)
4 years ago
WangXi 9cbcc6cadc
fleet sync build strategy, test=develop (#29732)
4 years ago
wanghuancoder 0c59ad2a1a
Windows generate pdb and dump, for debug (#29628)
4 years ago
Huihuang Zheng 4c4d4ba5e0
Modify CublasHandleHolder to Fix Random Unittest Failure. test=develop (#29617)
4 years ago
Chen Weihang 6cfa59de1b
[Complex] Add real & imag op and api for complex tensor (#29672)
4 years ago
Jacek Czaja 9eff1a674f
Added missing format of oneDNN (#29670)
4 years ago
wangchaochaohu 2e0d1ed00f
delete the code for fp16 optimization because it is not faster than common template code (#29715)
4 years ago
TTerror af8ded773a
update activation op on kunlun (#29577)
4 years ago
ceci3 cc387159f3
add pad and concat double grad (#29549)
4 years ago
liuyuhui f13c3a9cd7
[Kunlun] PR1:Support one Kunlun card training in parallel executor (#29337)
4 years ago
Y_Xuan 76738504ad
Add ROCm platform support code (#29342)
4 years ago
Zhang Ting 1e9127f688
improve dropout grad (#29605)
4 years ago
wangchaochaohu eab44e1f32
refine (#29622)
4 years ago
WangXi 613c46bc07
fix gen_nccl_id_op_helper compile failed, test=develop (#29614)
4 years ago
Chen Weihang f02aece1f0
Add complex dtype op (add) test example (#29603)
4 years ago
AshburnLee efea540ca9
Add tf32 support for A100 tensor core acceleration for cuBLAS (#28732)
4 years ago
lijianshe02 7779768b53
add transpose double grad test=develop (#29600)
4 years ago
wangchaochaohu 1b69e528d3
optimize for long width for elementwise (#29602)
4 years ago
Wilber 78dad78610
fix none-contiguous bug for python api. (#29615)
4 years ago
ShenLiang 1efef8baed
Fix bug of matmul_v2 for broadcast case (#29599)
4 years ago
qingqing01 8d549fc85d
Add clip double grad (#29590)
4 years ago
wangchaochaohu ac4bae8ee9
elementwise_add_grad Op optimization (#29575)
4 years ago
arlesniak 62d4483649
Added verbose oneDNN lib version (#29378)
4 years ago
lilong12 ff6a145011
update, test=develop (#29559)
4 years ago
WangXi 467c716963
gen nccl id use socket (#29431)
4 years ago
tangwei12 0034273b7e
add service (#29560)
4 years ago
Leo Chen c0163837a5
Fix compile problem when cuda_arch < 6000 (#29576)
4 years ago
QingshuChen 79a41a9ed6
support roi_align & affine_channel for kunlun (#29561)
4 years ago