Commit Graph

10948 Commits (bc7a3afa687696541b032d56d1e9a8ca8e101c77)

Author SHA1 Message Date
Wilber 609c022222
shape op support int8 and uint8 tensor (#30201)
4 years ago
Wilber 01a287bf0a
fix windows compile when WITH_PYTHON=ON and WITH_TENSORRT=ON (#30194)
4 years ago
ruri e42e1e80dc
Add version checking, test=op_version (#30129)
4 years ago
Leo Chen 1f97d61c68
Add callback after TensorCopy (#30123)
4 years ago
Chengmo 528e03fc08
【Paddle.Fleet】Fix tensor table (#30075)
4 years ago
Wilber ade244948c
disable mkldnn inplace pass on windows (#30164)
4 years ago
joanna.wozna.intel 907262ee15
Fix analysis predictor test (#30191)
4 years ago
lijianshe02 2dc7ee276b
enhance error message of nll_loss op test=develop (#30125)
4 years ago
Huihuang Zheng 54bf3f5a56
Refine PADDLE_ENFORCE Error Messages. test=develop (#30149)
4 years ago
Chen Weihang d0fb06b27f
[Complex] Simplify prepared op impl to improve performance (#30153)
4 years ago
123malin c5b415bfd9
Improve Index select cuda kernel (#30139)
4 years ago
wangchaochaohu 7dd551e08b
refine the paddle place support using str (#28769)
4 years ago
WeiXin 404c16763a
Add detailed error message for curandStatus_t, cublasStatus_t, cusolverStatus_t (#30161)
4 years ago
Wilber 91a8a25721
enhance error info for py_func (#30138)
4 years ago
weihaoji b8207af6bc
[XPU] Remove lite_xpu ut lite_resnet50_test since fusion pass changes introduced precision diff. test=develop (#30122)
4 years ago
liuyuhui 15fac5e7fa
fix assign_op_xpu concat_op_xpu warning (#30120)
4 years ago
Jack Zhou f5428eca4f
fix enforce msg of sum xpu op (#30113)
4 years ago
123malin 198fbdfb60
Add Lookahead and ModelAverage Optimizer (#30004)
4 years ago
Leo Chen adac38c506
add dispensable input for core.ops.reshape2/expand/slice (#30072)
4 years ago
ShenLiang becf99d2e8
fix error message (#30135)
4 years ago
Zhou Wei 30888ca343
Polish and Optimize the print/repr information of Layer (#29998)
4 years ago
wangguanzhong 69839f8a9a
fix error message for distribute_fpn_proposals_op (#30116)
4 years ago
QingshuChen 8e1c3ddf15
add aarch64 and sunway kunlun lib (#30027)
4 years ago
Shang Zhizhou 05b27695f1
add inference api: DisableTensorRtOps (#30109)
4 years ago
石晓伟 53bb126510
fix a bug in op_version_registry, test=develop, test=op_version (#29994)
4 years ago
xiemoyuan 3e0c492910
Optimize the error message of framework. (#30134)
4 years ago
liym27 9922bd4125
Fix bug: In dynamic mode, if start or end is negative, __getitem__ returns wrong result (#30003)
4 years ago
chentianyu03 666e665132
change the kron gradient when complex types (#29995)
4 years ago
chentianyu03 a5e422c85d
add trace op_register_version and fix version bug; test=op_version (#30000)
4 years ago
cc 9f34374b48
Fix the format of raising error in randperm op (#30108)
4 years ago
liuyuhui 254ad61959
fix xpu pe sync, test=notest (#30095)
4 years ago
Thunderbrook 0b8e1fadc5
add topo-aware in heter-ps (#30087)
4 years ago
hong 297fff1a79
support dygraph in xpu place (#30051)
4 years ago
wangchaochaohu d0a5620575
fix the compiler error when gcc4 cuda9.0 (#29997)
4 years ago
WangXi ee16006b5d
Optimization grad merge performance (#29784)
4 years ago
yongqiangma e891f4da1b
Add p_norm op version info (#30042)
4 years ago
tangwei12 7d1c149e09
for inference checkpoint (#30081)
4 years ago
tangwei12 7d4bdff07d
fix large scale memory (#30035)
4 years ago
Shang Zhizhou 08dc5bc27e
fix op version checker of pass bug (#30028)
4 years ago
cc 68398abce9
[Inference] zero_copy_tensor supports int8_t (#30053)
4 years ago
whs 1b999d2b5d
Add version checking (#30040)
4 years ago
ceci3 85b2f05ab0
register ModifyAttr for instance_norm, test=op_version (#30065)
4 years ago
channings ddcff254db
fix op_register_version for compare ops, test=op_version (#30007)
4 years ago
Wilber 66e16b7e99
update lite subgraph. (#30056)
4 years ago
GaoWei8 a64822589f
add REGISTER_OP_VERSION for LSTM (#30038)
4 years ago
yinhaofeng 6e93fb92f9
Register op version for linspace,test=op_version (#30025)
4 years ago
123malin d0056c324d
test=develop, add op_register_version for roll_op (#30023)
4 years ago
chentianyu03 e012930aa3
complex gradient matmul (#29966)
4 years ago
ShenLiang 893d37e5c6
Fix rank_attention op_version, test=op_version (#30006)
4 years ago
Adam Osewski 13aef97043
operator checkpoints for new attributes. (#29832)
4 years ago
wangguanzhong 844d8e0c2c
add REGISTER_OP_VERSION for generate_proposals, roi_align, roi_pool test=op_version (#30034)
4 years ago
cc c3c064a8fc
Add mkldnn nearest_interp and bilinear_interp op (#30016)
4 years ago
chalsliu c053bf2a57
Revert "register ModifyAttr for instance_norm, test=op_version (#29938)"
4 years ago
wawltor cc2f94620c
add the support the op version check for matmul, test=op_version (#30011)
5 years ago
wawltor b33aaea86c
add the op version check for the elementwise ops, test=op_version (#30010)
5 years ago
Chengmo 4cbcc9b6da
fix momentum op register (#29941)
5 years ago
hutuxian 7c1f69bdf0
add op_version for flip op [test=op_version] (#30019)
5 years ago
ceci3 77c1684397
register ModifyAttr for instance_norm, test=op_version (#29938)
5 years ago
Leo Chen 47d10c55d5
Enhance debugging (#30001)
5 years ago
FlyingQianMM d42f93e504
add op_register_version for allclose op; test=op_version (#29968)
5 years ago
wawltor 8f49f9d5c9
change the elementwise ops version check, test=op_version
5 years ago
guofei b23faf37be
Add moving_average_abs_max_scale op_register_version test=develop (#29957)
5 years ago
Thunderbrook 0ca6de171f
add include (#29952)
5 years ago
Pei Yang 6206b9bc71
fix ut:trt_resnext_test, trt_quant_int8_yolov3_r50_test, test_trt_dynamic_shape_ernie, test_trt_dynamic_shape_ernie_fp16_ser_deser, trt_cascade_rcnn_test (#29977)
5 years ago
wangxinxin08 be8b5fd18a
register op version for conv2d_transpose, conv3d_transpose and depthwise_conv2d_transpose, test=op_version (#29937)
5 years ago
石晓伟 958612231f
compile the denormal.cc on aarch64, test=develop (#29956)
5 years ago
Guo Sheng 6ac4f0af6a
Register op version for coalesce_tensor. (#29940)
5 years ago
Chen Weihang a1d9a14e89
support grad accumulated across batch (#29942)
5 years ago
cc 6a0102b038
map matmul/squeeze2+matmul/reshape2+matmul to mul (#29911)
5 years ago
Huihuang Zheng d038746e1c
Fix Unix Sleep for Wrong Time. test=develop (#29953)
5 years ago
Jack Zhou 5a4e42ca9a
add gru op_register_version; test=op_version; (#29931)
5 years ago
Wilber 2b1d796cd0
[Inference] Solve 2.0 trt performance reduce compare 1.8. (#29925)
5 years ago
Qi Li 913f77a4b7
Register op version for print, test=op_version (#29945)
5 years ago
石晓伟 181ea1870b
flush denormals to zero, test=develop (#29924)
5 years ago
cc 7667e59bf7
add op version for fake_quant and fake_dequant ops, test=op_version (#29923)
5 years ago
石晓伟 acb5e86363
fix a bug in reset_tensor_array, test=develop (#29620)
5 years ago
liuyuhui 3d1741b794
[Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor (#29926)
5 years ago
Wilber 332da133a1
Support mips arch (#29903)
5 years ago
LielinJiang eab0b60e16
Register op version for grid_sampler, test=op_version (#29916)
5 years ago
liym27 9602a182b2
[Dynamic Inplace] Support ShareInplaceVersionCounterWith for C++ Tensor (#29842)
5 years ago
liuyuhui 4427df37cf
[Kunlun] PR2: Support MultiDevicePass and BKCL in parallel executor (#29574)
5 years ago
LielinJiang 0f4b218640
Enable bilateral_slice unittest on windows platform (#29896)
5 years ago
YUNSHEN XIE 2a01756bf3
remove duplicate ut names (#29809)
5 years ago
Chen Weihang a6072055be
[Complex] Handle complex to real after type promotion (#29855)
5 years ago
Chen Weihang 1a304e6c06
[Complex] Add support for complex grad accumulated (#29889)
5 years ago
taixiurong c7acad9f2f
support some shape for matmul and cast in xpu place (#29900)
5 years ago
Leo Chen 6b258317cb
fix TransferInplaceBack (#29830)
5 years ago
QingshuChen 59b47f3b32
feat: support check_nan_inf for kunlun/xpu device (#29694)
5 years ago
tangwei12 032414ca2a
[Feature] one ps (3/4) (#29604)
5 years ago
jakpiase edc06c6a1b
Added fc + activation fuse pass (currently only gelu, sigmoid and tanh are supported) (#29772)
5 years ago
Wilber 2c0a4a3470
call_stack is turned on by default when ON_INFER=ON (#29798)
5 years ago
Wilber ad0b01ffe2
lod operator should not be reused in memory_optimize pass. (#29828)
5 years ago
liym27 97e75ad0f5
[setitem] Support Tensor setitem in static mode (#29708)
5 years ago
YUNSHEN XIE 24ce051a84
remove duplicate ut reload (#29810)
5 years ago
Jacek Czaja c9e874fc8e
[oneDNN] Unit test for checking oneDNN caching (#29606)
5 years ago
Thunderbrook 09b6e71928
heter box (#29734)
5 years ago
Jacek Czaja 7b33720c90
[oneDNN] Tensor copy fix to oneDNN tensors (#29771)
5 years ago
123malin a400b76db7
Roll cuda kernel (#29655)
5 years ago
wuhuanzhou e7ac74c85b
optimize compilation time of argmin/argmax op (#29595)
5 years ago
chentianyu03 ddfc3d2c2f
change grad elementwise_mul for complex types (#29757)
5 years ago
chentianyu03 2a260d9b0e
change the grad of div when complex types (#29804)
5 years ago
ShenLiang f65f1caad3
opt sparse allreduce using ncclgather (#29819)
5 years ago
TTerror 82aa01c373
add nearest_interp_v2 on kunlun (#29725)
5 years ago
wangchaochaohu 01c37c8e02
refine the compiler error for half2 operation (#29816)
5 years ago
whs 82630408b4
Support double backward rsqrt (#29589)
5 years ago
Zhang Ting b76f5a8489
fix the bug of dropout_grad (#29813)
5 years ago
LielinJiang a94c3cbbf3
register cudnn conv double grad for depthwise conv (#29807)
5 years ago
ShenLiang 01e2874a0e
Support multi-stream communication for dynamic graph distributed (#29525)
5 years ago
wangchaochaohu f350aa59ff
Fix the compiler error for half type (#29799)
5 years ago
Huihuang Zheng 1cbb282d77
Add Retry Logic to CublasHandlerHolder
5 years ago
LielinJiang e5af650b71
Add double grad for conv_transpose (#29706)
5 years ago
Leo Chen 224f3bcbb1
format code (#29714)
5 years ago
LoveAn 2e5b4a216c
Optimize compilation time with Unity Build (#29733)
5 years ago
Zhang Jun 0c23ba95d8
enable MakeCiper api for inference;test=develop (#29692)
5 years ago
wangchaochaohu 7b2dc4e6b1
optimization for fp16 elementwise add (#29744)
5 years ago
Jacek Czaja 07790ba13e
[oneDNN] Reimplemented elementwise_add grad (#29747)
5 years ago
Aurelius84 17c8e3adfe
Polish code in gpu_launch_config.h (#29730)
5 years ago
wangchaochaohu 068d905e1e
fix the shape choose of vectorize for cuda
5 years ago
syyxsxx 7c2affaa26
fix isfinite_v2_op OpProtoAndCheckerMaker AddComment bug (#29626)
5 years ago
石晓伟 8bd2879ef7
update the operator registration for incompatible upgrade, test=develop (#29720)
5 years ago
chentianyu03 71063b8137
add conj op for complex types (#29527)
5 years ago
Wilber b593d588aa
[Inference] EnableUseGpu has higher priority than flags (#29697)
5 years ago
WangXi 9cbcc6cadc
fleet sync build strategy, test=develop (#29732)
5 years ago
wanghuancoder 0c59ad2a1a
Windows generate pdb and dump, for debug (#29628)
5 years ago
Huihuang Zheng 4c4d4ba5e0
Modify CublasHandleHolder to Fix Random Unittest Failure. test=develop (#29617)
5 years ago
Chen Weihang 6cfa59de1b
[Complex] Add real & imag op and api for complex tensor (#29672)
5 years ago
Jacek Czaja 9eff1a674f
Added missing format of oneDNN (#29670)
5 years ago
wangchaochaohu 2e0d1ed00f
delete the code for fp16 optimization because it is not faster than common template code (#29715)
5 years ago
TTerror af8ded773a
update activation op on kunlun (#29577)
5 years ago
ceci3 cc387159f3
add pad and concat double grad (#29549)
5 years ago
liuyuhui f13c3a9cd7
[Kunlun] PR1:Support one Kunlun card training in parallel executor (#29337)
5 years ago
Y_Xuan 76738504ad
Add ROCm platform support code (#29342)
5 years ago
Zhang Ting 1e9127f688
improve dropout grad (#29605)
5 years ago
wangchaochaohu eab44e1f32
refine (#29622)
5 years ago
WangXi 613c46bc07
fix gen_nccl_id_op_helper compile failed, test=develop (#29614)
5 years ago
Chen Weihang f02aece1f0
Add complex dtype op (add) test example (#29603)
5 years ago
AshburnLee efea540ca9
Add tf32 support for A100 tensor core acceleration for cuBLAS (#28732)
5 years ago
lijianshe02 7779768b53
add transpose double grad test=develop (#29600)
5 years ago
wangchaochaohu 1b69e528d3
optimize for long width for elementwise (#29602)
5 years ago
Wilber 78dad78610
fix non-contiguous bug for python api. (#29615)
5 years ago
ShenLiang 1efef8baed
Fix bug of matmul_v2 for broadcast case (#29599)
5 years ago
qingqing01 8d549fc85d
Add clip double grad (#29590)
5 years ago
wangchaochaohu ac4bae8ee9
elementwise_add_grad Op optimization (#29575)
5 years ago
arlesniak 62d4483649
Added verbose oneDNN lib version (#29378)
5 years ago
lilong12 ff6a145011
update, test=develop (#29559)
5 years ago
WangXi 467c716963
gen nccl id use socket (#29431)
5 years ago
tangwei12 0034273b7e
add service (#29560)
5 years ago
Leo Chen c0163837a5
Fix compile problem when cuda_arch < 6000 (#29576)
5 years ago
QingshuChen 79a41a9ed6
support roi_align & affine_channel for kunlun (#29561)
5 years ago
Jacek Czaja f6cca62575
[oneDNN] Making ThreadID info in caching key optional (#29272)
5 years ago
Wilber 740c0d58c3
update for xpu ci. (#29568)
5 years ago
JZ-LIANG d33d468f02
[Sharding] add hybrid-dp feature (#29518)
5 years ago
Leo Chen 1e72e03217
remove duplicated macro (#29563)
5 years ago
Zhang Ting 6702040e94
improve dropout (#29465)
5 years ago
Zhang Ting 30d9589afe
add cast cuda kernel (#29352)
5 years ago
LoveAn b5d4a1f33d
Add the strategy of skipping cc/cu test compilation and execution in CI (#29499)
5 years ago
Aurelius84 2a42250699
Polish hash function of executor cache key (#29556)
5 years ago
taixiurong 760d015c14
add xpu ops for training transformer in kunlun (#29539)
5 years ago
Jacek Czaja 83a693ee55
[oneDNN] Added Unit Test for Multiple instances prediction (#29501)
5 years ago
Zhong Hui 60bfd308ab
fix p_norm with empty shape (#29500)
5 years ago
Leo Chen 9f926eb720
Layernorm opt (#29522)
5 years ago
tangwei12 ae3f7a7100
add ps table (#29463)
5 years ago
ShenLiang d8391a1983
fix error message of gather nd (#29521)
5 years ago
Zhen Wang 5ac71b36fb
Remove tensor copy in the update_loss_scaling op. (#29426)
5 years ago
Zhou Wei e74e1a226c
support deepcopy for Layer/Tensor/Paramerbase (#29387)
5 years ago
joejiong 87e75a77c2
Add tangent operator (#29207)
5 years ago
zlsh80826 95e334810a
Softmax vectorization (#29404)
5 years ago
ShenLiang 2ef9e0e23c
Rebuild group automatically in dynamic graph distributed (#29255)
5 years ago
procr 3a0558339d
support mobilenet for kunlun (#29458)
5 years ago
Huihuang Zheng a1909affc6
Fix Unit Test: Add Sleep Time for CUDA Retry (#29442)
5 years ago
Leo Chen e5e522493d
make gelu fp16 computing more robust (#29484)
5 years ago
Zhang Ting 560b432349
Revert "improve elementwise_add_grad perf (#29277)" (#29464)
5 years ago
jakpiase 57a4f16d9e
added internal and external reorders to profiler (#29443)
5 years ago
Pei Yang 2480bdef6c
change hard_swish from plugin to layer (#29177)
5 years ago
taixiurong ecca6585cd
1. fix elementwise ops'bug 2. fix softmax_with_cross_entropy_op 3. add biliner_interp_op (#29448)
5 years ago
LoveAn 03b42d9fa7
fix unittest on windows, test=develop (#29365)
5 years ago
TTerror a5fcc4b545
update reduce_sum op on xpu (#29367)
5 years ago
Jack Zhou c7cada8571
Fix gru performance decline in 1.8.5 (#29455)
5 years ago
Zhang Ting 6296f4ed09
revert cast eigen kernel (#29427)
5 years ago
Leo Chen a040c055a5
fix layer_norm accuracy (#29434)
5 years ago
Zhou Wei 24ba9ed436
fix that parameters'grad has grad var (#29408)
5 years ago
Leo Chen 4e19ce1df5
refine reshape grad and double grad kernel, use tensor copy async (#29128)
5 years ago
Shang Zhizhou 225a9c4ed8
Fix unittest (#29412)
5 years ago
Pei Yang f860de4af7
support clip op trt converter (#29411)
5 years ago
Jack Zhou 1dd7b97b66
fix rnn_op bug in cudnn_version>= 8 (#29406)
5 years ago
LoveAn 671555ed32
Compiling operator libraries with Unity build (#29130)
5 years ago
cc a623ce044f
Use different name_scope for different conv type, test=develop (#29355)
5 years ago
yongqiangma 7c508d8668
update unbind norm add CUDAPlace api doc information (#29322)
5 years ago
chentianyu03 879e913b6d
Make transpose, trace, kron, reshape, sum op support complex type (#29321)
5 years ago
卖鱼的哲学 074065e5de
fix expand/uniform_random && concat/transpose to new api on xpu (#29280)
5 years ago
lilong12 1decf4ada6
update, test=develop (#29331)
5 years ago
QingshuChen 74bf3bed36
support global pooling for kunlun (#29293)
5 years ago
liym27 b10ecd9d3a
[inplace] Add ShareHolderWith for class Variable and SharePlaceholderWith in VarBase.detach() to share the same Tensor/SelectedRows (#29267)
5 years ago
Chen Weihang 9ad800ebb2
Support type promote for basic math ops (quantum required) (#29265)
5 years ago
tangwei12 8358791607
fix gpu outofrange (#29238)
5 years ago
Leo Chen b58cfff89d
use has_grad instead of train_mode (#29309)
5 years ago
Zhang Ting befd6d5338
improve elementwise_add_grad perf (#29277)
5 years ago
Shang Zhizhou ebf689197d
fix tensorrt output shape error (#29308)
5 years ago
Aurelius84 67c700b479
[Dy2Stat] Add cache for Executor and Context in run_program_op (#28421)
5 years ago
ShenLiang 696dc4bb13
fix the warning of reducer (#29323)
5 years ago