Commit Graph

10948 Commits (bc7a3afa687696541b032d56d1e9a8ca8e101c77)

Author SHA1 Message Date
oyjxer 7241bc2210
[NPU] Support npu op elementwise_min (#31575)
4 years ago
oyjxer 9606a86b18
[NPU] Support npu op logicalnot_op (#31534)
4 years ago
oyjxer 47860ce20d
[NPU] Support npu op log, log_grad, sqrt, sqrt_grad, square, tanh and tanh_grad (#31600)
4 years ago
oyjxer de65486c19
[NPU] Support npu op elementwise_div and elementwise_div_grad (#31573)
4 years ago
OleNet ec2160a622
[NPU] add range op (#31560)
4 years ago
Leo Chen 0234693040
fix gather_grad bug (#31607)
4 years ago
Leo Chen 5e851bff42
[NPU] fix assign cmake (#31595)
4 years ago
oyjxer 382fc31f89
[NPU] Support npu op gelu and gelu_grad (#31530)
4 years ago
oyjxer 5d29a27c2e
[NPU] fix npu op elementwise_mul_grad (#31592)
4 years ago
OleNet 09bf2cfc0e
[NPU] add Assign OP (#31561)
4 years ago
xiayanming f1fdddfdc8
[NPU] Support npu kernel for c sync stream op (#31386)
4 years ago
yinhaofeng e1c33a6d69
[NPU] accuracy op (#31492)
4 years ago
xiayanming 3bf8a34c69
[NPU] Support npu kernel for amp_check_finite_and_unscale_npu op (#31457)
4 years ago
xiayanming d746197398
[NPU] Support npu kernel for gather op fix bug (#31541)
4 years ago
zhang wenhui 5d22e15b6e
[NPU] Support npu kernel for reshape2 op (#31524)
4 years ago
zhang wenhui 581e5460a0
[NPU] add relu op for npu (#31515)
4 years ago
oyjxer cfeeb4bc95
[NPU] Support npu op elementwise_max (#31574)
4 years ago
oyjxer e15ccafb84
[NPU] Support npu op elementwise_mul and elementwise_mul_grad (#31571)
4 years ago
zhang wenhui 29d50d2049
[NPU] Support npu kernel for matmul op (#31544)
4 years ago
xiayanming f400ce9f51
[NPU] Support npu kernel for reduceany op (#31422)
4 years ago
zhang wenhui 7524ac9345
[NPU] support npu kernel for fill_constant op (#31521)
4 years ago
Leo Chen 3f206e97c4
Support TensorFromVector, TensorToVector of bool type (#31518)
4 years ago
zhang wenhui 9df84bd693
[NPU] add scale op for npu (#31499)
4 years ago
xiayanming e19195f795
Support npu kernel for gather op (#31458)
4 years ago
lw921014 15823bb0df
[NPU] add npu kernel for communication op (#31437)
4 years ago
Reventon_L 388c69f27d
[NPU] squeeze and unsqueeze op for ascend (#31452)
4 years ago
Leo Chen 83f81eb573
Fix pow, refine code (#31440)
4 years ago
Leo Chen 5fe3d596e4
Fix pow, use fillD instead of broadcast (#31433)
4 years ago
zhang wenhui ecc6e213d7
fix endif (#31431)
4 years ago
zhang wenhui b3c88e961c
[NPU] Support npu kernel for shape op (#31427)
4 years ago
Leo Chen ac3d821bc0
[NPU] add npu kernel for equal op (#31393)
4 years ago
Leo Chen 0310945f5c
[NPU] Support npu op layer_norm and layer_norm_grad (#31310)
4 years ago
Void Main 45765d6eb6
Refactor HCCLCommContext to be compatible with Paddle (#31359)
4 years ago
Leo Chen 8497e2aad3
[NPU] add npu kernel for elementwise_add_grad (#31347)
4 years ago
lw921014 9fcdaeba5e
add allreduce and broadcast without test (#31024)
4 years ago
liym27 a1ddff81e3
[NPU] Support npu op: (1) slice (2) slice_grad (#31275)
4 years ago
Leo Chen d23bf89cf6
support list of list attribute for NPU (#31299)
4 years ago
liym27 187248f568
[NPU] Support npu op pow and pow grad (#31247)
4 years ago
Leo Chen d45f5d787e
Fix typo of selected_npus (#31230)
4 years ago
Leo Chen ff4654e216
refactor npu device manager (#31154)
4 years ago
liym27 1435b4c096
[NPU] Support executor with NPU (#31057)
4 years ago
Leo Chen 85cbd55648
Fix compilation problem (#31100)
4 years ago
Leo Chen 5cb20f30fc
add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)
4 years ago
Leo Chen 1201cd2ef2
[feature] support npu allocator, part 2 (#30972)
4 years ago
Leo Chen 7e049108c5
[feature] support npu operator (#30951)
4 years ago
Leo Chen 81138239db
[feature] support npu allocator (#30840)
4 years ago
gongweibao ebef6601d5
Destroy session first. (#30954)
4 years ago
Leo Chen 88dfd067bf
Dev/fix ascend string (#30749)
4 years ago
Leo Chen 6eabbc8076
fix compilation on ascend-20.1 (#30722)
4 years ago
gongweibao e4287ca60b
Add Hccl program group (#30642)
4 years ago
gongweibao f9c97dd728
Add distribution supported (#30578)
4 years ago
gongweibao 1882f2ce2d
Fix compilation on CANN20.1 and older (#30494)
4 years ago
hutuxian 6dd52c5b25
Ascend rc (#30483)
4 years ago
石晓伟 715d862868
export global google flags to users, test=develop (#30448)
4 years ago
Wojciech Uss 88fc7a7d68
fix cache key for inplaced elementwise ops (#30404)
4 years ago
wawltor 3d49882e2c
fix the rnn mask memory bug for out of read (#30459)
4 years ago
taixiurong 6a3c8725b0
support transformer v2.0 (#30381)
4 years ago
ShenLiang e85be1b1b2
fix flatten api grad (#30426)
4 years ago
yaoxuefeng 6e0da01c61
Heter ps new (#30198)
4 years ago
123malin 2a98e9323a
test=develop, add distributed_infer (#30300)
4 years ago
QingshuChen cf786d22ec
fix bug that can't find mkldnn (kunlun) (#30394)
4 years ago
cc 8e3a294045
skip quantizing ops in cpu inference (#30342)
4 years ago
alncat 7bbf3ac5ab
Added support for inference using quantization aware trained dygraph (#30288)
4 years ago
GaoWei8 180877e988
Softmax backward optimize (#30249)
4 years ago
Zhang Jun 10a8f3e5c3
fix bug on compiling inference shared lib with crypto; test=develop (#30269)
4 years ago
Huihuang Zheng 28e156c27f
Fix Sleep Error in enforce.h (#30335)
4 years ago
Leo Chen 3d015f1cf5
Set expected place in child thread for dataloader to avoid costing cuda memory on other card (#30338)
4 years ago
QingshuChen 2c1bba02e4
optimize memcpy perf for kunlun (#30291)
4 years ago
ShenLiang a60f17b89d
Support unused parameters in dynamic graph distributed (#30224)
4 years ago
JZ-LIANG 75936d838f
Recompute Offload (#30233)
4 years ago
lidanqing a60893f6b5
correct the allowed dimension size (#30326)
4 years ago
Chen Weihang c8c8f205ba
remove c++ stacktrace hint (#30325)
4 years ago
tangwei12 5e839e4da5
add sparse embedding & load vars for 2.0 & gloo bug fix (#30306)
4 years ago
tangwei12 25f80fd304
Fix/distributed proto (#29981)
4 years ago
Chengmo d479ae1725
[Paddle.Fleet] Support local save sparse param (#30175)
4 years ago
Double_V 231501fefc
fix elugradgrad test fail & error message opt (#30171)
4 years ago
Zhen Wang fb49ea388e
Fix the accuracy problem of allclose op when using float64 data type in static mode. (#29890)
4 years ago
yaoxuefeng 4656525e24
fix datanorm error msg (#30294)
4 years ago
furnace 77051cc9f0
add fp16 support for tril_triu op (#30186)
4 years ago
石晓伟 efa54629fb
fix header file paths of gflags, commit 3, test=develop (#30273)
4 years ago
Chengmo 5b2c15afcd
Fix server.h include device_context (#30243)
4 years ago
石晓伟 a0ee09148e
enhance error msgs of fusion_seqpool_cvm_concat_op.cc, test=develop (#30240)
4 years ago
石晓伟 a66eebab5c
fix header file paths of gflags, commit 4, test=develop (#30274)
4 years ago
石晓伟 8c4500ff6d
fix header file paths of gflags, commit 2, test=develop (#30272)
4 years ago
liym27 b4989fb744
Support vector<double> as type of op attribute and op set_value suppport vector<double> as value (#30126)
4 years ago
wangchaochaohu 8dcae0c55d
register OPMaker and Infer Shape Check for fused_elementwise_add (#30259)
4 years ago
AshburnLee 924aac2216
Add tf32 switch for cuDNN (#29192)
4 years ago
石晓伟 8ce2482b80
fix header file paths of gflags, commit 1, test=develop (#30271)
4 years ago
chentianyu03 c7371b7b20
type promotion for grad (#30177)
4 years ago
liym27 3ce878f309
Check the rank of input in kernel of set_value op (#30147)
4 years ago
WeiXin 66dc4ac77b
modify error message based on comments (#30189)
4 years ago
wawltor fee424411a
just add the op error message for the matmul xpu (#30246)
4 years ago
GaoWei8 0a21924a8d
optimize softmax forward (#30217)
4 years ago
wangchaochaohu af80859dd6
reduce the occupied size of memory for the fused pattern of elementwise_add Op and activation Op(relu Op for example) (#29885)
4 years ago
zhang wenhui 5932fee60a
enhance error message, test=develop (#30220)
4 years ago
pangyoki da16b33f2e
add View(reuse allocation) strategy on squeeze, unsqueeze, reshape, flatten op (#29913)
4 years ago
Jacek Czaja 4aba17b5db
[oneDNN] Added UT for testing elementwise_mul caching (#30203)
4 years ago
Zhen Wang 7f7dfccf20
Support pure fp16 training for AMP API. (#29544)
4 years ago
Leo Chen 789743e190
use cuda generator in bernoulli cuda kernel (#30199)
4 years ago
Leo Chen 8696335f86
Fix dtype of ungenerated grad var (#28511)
4 years ago