Commit Graph

10948 Commits (bc7a3afa687696541b032d56d1e9a8ca8e101c77)

Author SHA1 Message Date
oyjxer 7241bc2210
[NPU] Support npu op elementwise_min (#31575)
4 years ago
oyjxer 9606a86b18
[NPU] Support npu op logicalnot_op (#31534)
4 years ago
oyjxer 47860ce20d
[NPU] Support npu op log, log_grad, sqrt, sqrt_grad, square, tanh and tanh_grad (#31600)
4 years ago
oyjxer de65486c19
[NPU] Support npu op elementwise_div and elementwise_div_grad (#31573)
4 years ago
OleNet ec2160a622
[NPU] add range op (#31560)
4 years ago
Leo Chen 0234693040
fix gather_grad bug (#31607)
4 years ago
Leo Chen 5e851bff42
[NPU] fix assign cmake (#31595)
4 years ago
oyjxer 382fc31f89
[NPU] Support npu op gelu and gelu_grad (#31530)
4 years ago
oyjxer 5d29a27c2e
[NPU] fix npu op elementwise_mul_grad (#31592)
4 years ago
OleNet 09bf2cfc0e
[NPU] add Assign OP (#31561)
4 years ago
xiayanming f1fdddfdc8
[NPU] Support npu kernel for c sync stream op (#31386)
4 years ago
yinhaofeng e1c33a6d69
[NPU] accuracy op (#31492)
4 years ago
xiayanming 3bf8a34c69
[NPU] Support npu kernel for amp_check_finite_and_unscale_npu op (#31457)
4 years ago
xiayanming d746197398
[NPU] Support npu kernel for gather op fix bug (#31541)
4 years ago
zhang wenhui 5d22e15b6e
[NPU] Support npu kernel for reshape2 op (#31524)
4 years ago
zhang wenhui 581e5460a0
[NPU] add relu op for npu (#31515)
4 years ago
oyjxer cfeeb4bc95
[NPU] Support npu op elementwise_max (#31574)
4 years ago
oyjxer e15ccafb84
[NPU] Support npu op elementwise_mul and elementwise_mul_grad (#31571)
4 years ago
zhang wenhui 29d50d2049
[NPU] Support npu kernel for matmul op (#31544)
4 years ago
xiayanming f400ce9f51
[NPU] Support npu kernel for reduceany op (#31422)
4 years ago
zhang wenhui 7524ac9345
[NPU] support npu kernel for fill_constant op (#31521)
4 years ago
Leo Chen 3f206e97c4
Support TensorFromVector, TensorToVector of bool type (#31518)
4 years ago
zhang wenhui 9df84bd693
[NPU] add scale op for npu (#31499)
4 years ago
xiayanming e19195f795
Support npu kernel for gather op (#31458)
4 years ago
lw921014 15823bb0df
[NPU] add npu kernel for communication op (#31437)
4 years ago
Reventon_L 388c69f27d
[NPU] squeeze and unsqueeze op for ascend (#31452)
4 years ago
Leo Chen 83f81eb573
Fix pow, refine code (#31440)
4 years ago
Leo Chen 5fe3d596e4
Fix pow, use fillD instead of broadcast (#31433)
4 years ago
zhang wenhui ecc6e213d7
fix endif (#31431)
4 years ago
zhang wenhui b3c88e961c
[NPU] Support npu kernel for shape op (#31427)
4 years ago
Leo Chen ac3d821bc0
[NPU] add npu kernel for equal op (#31393)
4 years ago
Leo Chen 0310945f5c
[NPU] Support npu op layer_norm and layer_norm_grad (#31310)
4 years ago
Void Main 45765d6eb6
Refactor HCCLCommContext to be compatible with Paddle (#31359)
4 years ago
Leo Chen 8497e2aad3
[NPU] add npu kernel for elementwise_add_grad (#31347)
4 years ago
lw921014 9fcdaeba5e
add allreduce and broadcast without test (#31024)
4 years ago
liym27 a1ddff81e3
[NPU] Support npu op: (1) slice (2) slice_grad (#31275)
4 years ago
Leo Chen d23bf89cf6
support list of list attribute for NPU (#31299)
4 years ago
liym27 187248f568
[NPU] Support npu op pow and pow grad (#31247)
4 years ago
Leo Chen d45f5d787e
Fix typo of selected_npus (#31230)
4 years ago
Leo Chen ff4654e216
refactor npu device manager (#31154)
4 years ago
liym27 1435b4c096
[NPU] Support executor with NPU (#31057)
4 years ago
Leo Chen 85cbd55648
Fix compilation problem (#31100)
4 years ago
Leo Chen 5cb20f30fc
add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)
4 years ago
Leo Chen 1201cd2ef2
[feature] support npu allocator, part 2 (#30972)
4 years ago
Leo Chen 7e049108c5
[feature] support npu operator (#30951)
4 years ago
Leo Chen 81138239db
[feature] support npu allocator (#30840)
4 years ago
gongweibao ebef6601d5
Destroy session first. (#30954)
4 years ago
Leo Chen 88dfd067bf
Dev/fix ascend string (#30749)
4 years ago
Leo Chen 6eabbc8076
fix compilation on ascend-20.1 (#30722)
4 years ago
gongweibao e4287ca60b
Add Hccl program group (#30642)
4 years ago
gongweibao f9c97dd728
Add distribution supported (#30578)
4 years ago
gongweibao 1882f2ce2d
Fix compilation on CANN20.1 and older (#30494)
4 years ago
hutuxian 6dd52c5b25
Ascend rc (#30483)
4 years ago
石晓伟 715d862868
export global google flags to users, test=develop (#30448)
4 years ago
Wojciech Uss 88fc7a7d68
fix cache key for inplaced elementwise ops (#30404)
4 years ago
wawltor 3d49882e2c
fix the rnn mask memory bug for out of read (#30459)
4 years ago
taixiurong 6a3c8725b0
support transformer v2.0 (#30381)
4 years ago
ShenLiang e85be1b1b2
fix flatten api grad (#30426)
4 years ago
yaoxuefeng 6e0da01c61
Heter ps new (#30198)
4 years ago
123malin 2a98e9323a
test=develop, add distributed_infer (#30300)
4 years ago
QingshuChen cf786d22ec
fix bug that can't find mkldnn (kunlun) (#30394)
4 years ago
cc 8e3a294045
skip quantizing ops in cpu inference (#30342)
4 years ago
alncat 7bbf3ac5ab
Added support for inference using quantization aware trained dygraph (#30288)
4 years ago
GaoWei8 180877e988
Softmax backward optimize (#30249)
4 years ago
Zhang Jun 10a8f3e5c3
fix bug on compiling inference shared lib with crypto; test=develop (#30269)
4 years ago
Huihuang Zheng 28e156c27f
Fix Sleep Error in enforce.h (#30335)
4 years ago
Leo Chen 3d015f1cf5
Set expected place in child thread for dataloader to avoid costing cuda memory on other card (#30338)
4 years ago
QingshuChen 2c1bba02e4
optimize memcpy perf for kunlun (#30291)
4 years ago
ShenLiang a60f17b89d
Support unused parameters in dynamic graph distributed (#30224)
4 years ago
JZ-LIANG 75936d838f
Recompute Offload (#30233)
4 years ago
lidanqing a60893f6b5
correct the allowed dimension size (#30326)
4 years ago
Chen Weihang c8c8f205ba
remove c++ stacktrace hint (#30325)
4 years ago
tangwei12 5e839e4da5
add sparse embedding & load vars for 2.0 & gloo bug fix (#30306)
4 years ago
tangwei12 25f80fd304
Fix/distributed proto (#29981)
4 years ago
Chengmo d479ae1725
[Paddle.Fleet] Support local save sparse param (#30175)
4 years ago
Double_V 231501fefc
fix elugradgrad test fail & error message opt (#30171)
4 years ago
Zhen Wang fb49ea388e
Fix the accuracy problem of allclose op when using float64 data type in static mode. (#29890)
4 years ago
yaoxuefeng 4656525e24
fix datanorm error msg (#30294)
4 years ago
furnace 77051cc9f0
add fp16 support for tril_triu op (#30186)
4 years ago
石晓伟 efa54629fb
fix header file paths of gflags, commit 3, test=develop (#30273)
4 years ago
Chengmo 5b2c15afcd
Fix server.h include device_context (#30243)
4 years ago
石晓伟 a0ee09148e
enhance error msgs of fusion_seqpool_cvm_concat_op.cc, test=develop (#30240)
4 years ago
石晓伟 a66eebab5c
fix header file paths of gflags, commit 4, test=develop (#30274)
4 years ago
石晓伟 8c4500ff6d
fix header file paths of gflags, commit 2, test=develop (#30272)
4 years ago
liym27 b4989fb744
Support vector<double> as type of op attribute and op set_value suppport vector<double> as value (#30126)
4 years ago
wangchaochaohu 8dcae0c55d
register OPMaker and Infer Shape Check for fused_elementwise_add (#30259)
4 years ago
AshburnLee 924aac2216
Add tf32 switch for cuDNN (#29192)
4 years ago
石晓伟 8ce2482b80
fix header file paths of gflags, commit 1, test=develop (#30271)
4 years ago
chentianyu03 c7371b7b20
type promotion for grad (#30177)
4 years ago
liym27 3ce878f309
Check the rank of input in kernel of set_value op (#30147)
4 years ago
WeiXin 66dc4ac77b
modify error message based on comments (#30189)
4 years ago
wawltor fee424411a
just add the op error message for the matmul xpu (#30246)
4 years ago
GaoWei8 0a21924a8d
optimize softmax forward (#30217)
4 years ago
wangchaochaohu af80859dd6
reduce the occupied size of memory for the fused pattern of elementwise_add Op and activation Op(relu Op for example) (#29885)
4 years ago
zhang wenhui 5932fee60a
enhance error message, test=develop (#30220)
4 years ago
pangyoki da16b33f2e
add View(reuse allocation) strategy on squeeze, unsqueeze, reshape, flatten op (#29913)
4 years ago
Jacek Czaja 4aba17b5db
[oneDNN] Added UT for testing elementwise_mul caching (#30203)
4 years ago
Zhen Wang 7f7dfccf20
Support pure fp16 training for AMP API. (#29544)
4 years ago
Leo Chen 789743e190
use cuda generator in bernoulli cuda kernel (#30199)
4 years ago
Leo Chen 8696335f86
Fix dtype of ungenerated grad var (#28511)
4 years ago