3157 Commits (bc7a3afa687696541b032d56d1e9a8ca8e101c77)

Author SHA1 Message Date
Leo Chen a6343afc70
[NPU] support npu for memcpy op (#31808)
4 years ago
An Improved PeleeNet Algorithm with Feature Pyramid Networks for Image Detection 3ab39705ea
adapter npu (#31926)
4 years ago
Leo Chen ac89174e5a
[NPU] support GarbageCollector for npu (#31874)
4 years ago
lilong12 228bce12c8
Add 3d parallelism (#31796)
4 years ago
Leo Chen 3f206e97c4
Support TensorFormVector, TensorToVector of bool type (#31518)
4 years ago
Leo Chen d23bf89cf6
support list of list attribute for NPU (#31299)
4 years ago
liym27 1435b4c096
[NPU] Support executor with NPU (#31057)
4 years ago
Leo Chen 85cbd55648
Fix compilation problem (#31100)
4 years ago
Leo Chen 5cb20f30fc
add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)
4 years ago
Leo Chen 1201cd2ef2
[feature] support npu allocator, part 2 (#30972)
4 years ago
Leo Chen 7e049108c5
[feature] support npu operator (#30951)
4 years ago
Leo Chen 81138239db
[feature] support npu allocator (#30840)
4 years ago
gongweibao ebef6601d5
Destroy session first. (#30954)
4 years ago
Leo Chen 88dfd067bf
Dev/fix ascend string (#30749)
4 years ago
Leo Chen 6eabbc8076
fix compilation on ascend-20.1 (#30722)
4 years ago
gongweibao 1882f2ce2d
Fix compilation on CANN20.1 and older (#30494)
4 years ago
hutuxian 6dd52c5b25
Ascend rc (#30483)
4 years ago
yaoxuefeng 6e0da01c61
Heter ps new (#30198)
4 years ago
cc 8e3a294045
skip quantizing ops in cpu inference (#30342)
4 years ago
alncat 7bbf3ac5ab
Added support for inference using quantization aware trained dygraph (#30288)
4 years ago
Zhang Jun 10a8f3e5c3
fix bug on compiling inference shared lib with crypto;test=develop (#30269)
4 years ago
JZ-LIANG 75936d838f
Recompute Offload (#30233)
4 years ago
tangwei12 5e839e4da5
add sparse embedding & load vars for 2.0 & gloo bug fix (#30306)
4 years ago
tangwei12 25f80fd304
Fix/distributed proto (#29981)
4 years ago
liym27 b4989fb744
Support vector<double> as type of op attribute and op set_value support vector<double> as value (#30126)
4 years ago
石晓伟 8ce2482b80
fix header file paths of gflags, commit 1, test=develop (#30271)
4 years ago
wangchaochaohu af80859dd6
reduce the occupied size of memory for the fused pattern of elementwise_add Op and activation Op(relu Op for example) (#29885)
4 years ago
Zhen Wang 7f7dfccf20
Support pure fp16 training for AMP API. (#29544)
4 years ago
Leo Chen 789743e190
use cuda generator in bernoulli cuda kernel (#30199)
4 years ago
Leo Chen 1f97d61c68
Add callback after TensorCopy (#30123)
4 years ago
Chengmo 528e03fc08
【Paddle.Fleet】Fix tensor table (#30075)
4 years ago
Huihuang Zheng 54bf3f5a56
Refine PADDLE_ENFORCE Error Messages. test=develop (#30149)
4 years ago
Chen Weihang d0fb06b27f
[Complex] Simplify prepared op impl to improve performance (#30153)
4 years ago
liuyuhui 15fac5e7fa
fix assign_op_xpu concat_op_xpu warning (#30120)
4 years ago
石晓伟 53bb126510
fix a bug in op_version_registry, test=develop, test=op_version (#29994)
4 years ago
liuyuhui 254ad61959
fix xpu pe sync, test=notest (#30095)
4 years ago
Thunderbrook 0b8e1fadc5
add topo-aware in heter-ps (#30087)
4 years ago
WangXi ee16006b5d
Optimization grad merge performance (#29784)
4 years ago
Shang Zhizhou 08dc5bc27e
fix op version checker of pass bug (#30028)
4 years ago
cc c3c064a8fc
Add mkldnn nearest_interp and bilinear_interp op (#30016)
5 years ago
wawltor cc2f94620c
add the support the op version check for matmul, test=op_version (#30011)
5 years ago
wawltor b33aaea86c
add the op version check for the elementwise ops, test=op_version (#30010)
5 years ago
Leo Chen 47d10c55d5
Enhance debugging (#30001)
5 years ago
wawltor 8f49f9d5c9
change the elementwise ops version check, test=op_version
5 years ago
Thunderbrook 0ca6de171f
add include (#29952)
5 years ago
cc 6a0102b038
map matmul/squeeze2+matmul/reshape2+matmul to mul (#29911)
5 years ago
Jack Zhou 5a4e42ca9a
add gru op_register_version; test=op_version; (#29931)
5 years ago
Wilber 2b1d796cd0
[Inference] Solve 2.0 trt performance reduce compare 1.8. (#29925)
5 years ago
石晓伟 181ea1870b
flush denormals to zero, test=develop (#29924)
5 years ago
liuyuhui 3d1741b794
[Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor (#29926)
5 years ago