Paddle

Commit Graph

Author	SHA1	Message	Date
tianshuo78520a	2e93233899	Add WITH_XPU_BKCL in Kunlun-CI (#30919)	4 years ago
Qi Li	34f1628ce8	[ROCM] update fluid platform for rocm39 (part2), test=develop (#30774)	4 years ago
Jacek Czaja	9e527d9956	[oneDNN] Added basic changes for elementwise_add_grad bf16 (#30925)	4 years ago
Chengmo	c98f144fbc	add truncated gaussian random (#30922) add truncated gaussian random	4 years ago
liuyuhui	4a8b8b4547	[Kunlun] add gen_bkcl_id_op, support multi XPU cards training using multiprocess (#30858)	4 years ago
liym27	39f41cb47f	Performance optimization for dynamic setitem: Call op set_value to speed up because the original call to TensorToPyArray will introduce unnecessary data copy. (#30817)	4 years ago
liuyuhui	bef46ccfc8	[Kunlun]fix include files of gen_comm_id_helper.cc (#30917)	4 years ago
wanghuancoder	aab3a3012e	add include for heterbox_trainer.cc, develop=test (#30910)	4 years ago
taixiurong	24873f4f77	dyngraph (#30892)	4 years ago
Adam Osewski	092a2b1413	More UT for LayerNormFuse pass (#30891) * Additionally change to not throw error from inside pass.	4 years ago
tianshuo78520a	a80fe67f84	Change cmake/third_party files for CI (#30833)	4 years ago
Jacek Czaja	abfa822650	[oneDNN]Extended adaptive pooling support for oneDNN pool kernel (#30757)	4 years ago
joanna.wozna.intel	73cdea01d4	Add bf16 fast performance verification (#30551) * Update Xbyak and add bf16 fast performance verification * Fix formating * Change LOG message * Trigger an update of a new tag	4 years ago
Shang Zhizhou	e6095bc2ce	fix split trt plugin initialize (#30875) * fix split trt plugin initialize * update	4 years ago
WangXi	6e3856d3fb	fix xpu dygraph place (#30868)	4 years ago
wanghuancoder	35c5b23f68	use iwyu clean include second time, test=develop (#30829) * use iwyu clean include second time, test=develop	4 years ago
cucuzg	ac2e2e6b7f	add clip_by_norm on kunlun, *test=kunlun (#30862)	4 years ago
wawltor	b7560a59ab	fix the broadcast for the large second input (#30818) fix the broadcast for the large second input	4 years ago
JamesLim	6e1e036a75	Implement cuda kernel for index_sample. (#30380)	4 years ago
AshburnLee	666efc2336	Call new cudnn batch norm API regardless of data type and data layout (#30157)	4 years ago
QingshuChen	5c8455d6ea	try again if kunlun memory malloc failed (#30855) * try again if kunlun memory malloc failed * minor	4 years ago
石晓伟	2ac4143b6c	support xpu with analysis predictor, test=develop (#30832) * support xpu inference with analysis predictor, test=develop * merge the cmake of the xpu toolchain, test=develop * add c-apis, test=develop * fix a bug in extern_xpu, test=develop	4 years ago
liuyuhui	2cb55eff57	fix WITH_XPU_BKCL in CMakeLists.txt (#30854)	4 years ago
Adam Osewski	4f066e316e	Layer normalization fuse pass. (#30721)	4 years ago
WangXi	b1026f64af	【kunlun】dygraph supports multi xpu card training (#30671)	4 years ago
joanna.wozna.intel	04532b8a83	Update Xbyak to v5.81 (#30809)	4 years ago
Shang Zhizhou	b909450994	fix trt plugin clone and initialize bugs in TRT7.1+ (#30709) * fix trt plugin clone and initialize bugs * fix unit test error * enable trt in ci py3 * update unittest timeout	4 years ago
Wilber	b08ae368bb	ci compilation depends on a stable release (#30755) * update lite tag * disable ut	4 years ago
Thunderbrook	cb66c53c2d	dump to cpu (#30750) * dump to cpu * format * format * format	4 years ago
Chengmo	d3fac0ea85	fix int64 bug (#30780) fix push sparse int64 bug	4 years ago
Qi Li	69875dc42c	[ROCM] update fluid memory for rocm35 (part1), test=develop (#30758)	4 years ago
QingshuChen	c35a9880f9	fix malloc L3 failed bug for kunlun (#30745) * fix malloc L3 failed bug for kunlun * minor	4 years ago
WangXi	31ed9c9eed	Fleet distributed strategy support pure fp16 (#30754)	4 years ago
Zhen Wang	53d01afed6	Fix the nan bug when passing all zero values into clip_by_norm_op. (#30777)	4 years ago
ShenLiang	3858f458ea	rm Singleton of reducer (#30775)	4 years ago
Qi Li	f89da4ab45	[ROCM] update fluid platform for rocm35 (part1), test=develop (#30639) * [ROCM] update fluid platform for rocm35 (part1), test=develop * address review comments, test=develop	4 years ago
Wojciech Uss	fc00240575	A fix for oneDNN matmul kernel. Fixes issue #30309 (#30723)	4 years ago
lidanqing	46989e889b	Fix python3 incompatibility issues (#30698) * solve python3 incompatibility issues * update checksum	4 years ago
alncat	5b59499e57	fixed compilation error on gcc 4.8.x due to the usage of isfinite (#30733)	4 years ago
Chengmo	78d37c3f75	【Paddle.Fleet】Fix brpc get hostname (#30703) * fix Brpc get hostname	4 years ago
taixiurong	caf3680bbc	fix bugs in transformer predict in xpu place (#30730) * transformer predict * trans bug fix	4 years ago
jakpiase	f8da5536ed	REUPLOAD Added vanilla LSTM and LSTM with peepholes oneDNN fp32 kernel (#30719) * added external reorder to profiler * resolved conflict * added enable_static * initial version of lstm, not working yet * added lstm to operators.cmake * added vanilla lstm mkldnn op * added peephole weights integration * minor changes * added formatting * added fusion_lstm_mkldnn to static_whitelist * added formatting * removed comment * moved use_peepholes attribute inside is_cached block * reverted wrong changes * minor formatting change * minor changes * changed stream handling * minor change * added datatype to GetExpectedKernelType() * added reading stream from TLS	4 years ago
liuyuhui	67abfc1588	[Kunlun] fix dead lock for exec_op_count_ (#30718)	4 years ago
alncat	5ace20fc3f	modified conv+bn fuse pass to fix wrong mask in mask rcnn (#30704)	4 years ago
Tao Luo	824a79d383	Revert "Added vanilla LSTM and LSTM with peepholes oneDNN fp32 kernel (#30661)" (#30708) This reverts commit `d834f4e6e8`.	4 years ago
lilong12	7fbc68a2c0	update, test=develop (#30692)	4 years ago
jakpiase	d834f4e6e8	Added vanilla LSTM and LSTM with peepholes oneDNN fp32 kernel (#30661) * added external reorder to profiler * resolved conflict * added enable_static * initial version of lstm, not working yet * added lstm to operators.cmake * added vanilla lstm mkldnn op * added peephole weights integration * minor changes * added formatting * added fusion_lstm_mkldnn to static_whitelist * added formatting * removed comment * moved use_peepholes attribute inside is_cached block * reverted wrong changes * minor formatting change * minor changes	4 years ago
arlesniak	5bf25d1e8b	More precise mkldnn kernel rules in GetExpectedKernelType (#29840) * More precise mkldnn kernel choice in GetExpectedKernelType * Fixes after review * Refresh develop for CI * CI experiment * get back from CI exper	4 years ago
Jacek Czaja	173660be7b	[oneDNN] Cache oneDNN stream not to recreate in each oneDNN op (#30358)	4 years ago
Shang Zhizhou	ae0f88a988	add DLA support：C++&&Python api (#30165) * add dla * add dla done * add python api Co-authored-by: shangzhizhou <root@szth-rp-fanyi-opera49.szth.baidu.com>	4 years ago
chentianyu03	fb7fbc7a5d	fix abs bug and add abs test case (#30637) * add abs test case * use std::abs to fix abs bug * fix the abs bug * fix abs bug	4 years ago
ShenLiang	9514b4aa5f	Fix scatter grad bug (#30604)	4 years ago
Pei Yang	cf9bdb9404	extend trt ut timeout threshold (#30537)	4 years ago
Thunderbrook	1bebc09253	solve build gpu task core (#30626) * build gpu task core * format	4 years ago
石晓伟	33bf6eb753	revert external gflags, test=develop (#30623)	4 years ago
Jacek Czaja	dfdb0359ea	- Disabling oneDNN inplace pass (#30588)	4 years ago
TTerror	10271ddfc4	support reduce_max op on kunlun (#30581) * support reduce_max op on kunlun * support reduce_max op on kunlun * support reduce_max op on kunlun * support reduce_max op on kunlun	4 years ago
QingshuChen	5013c67644	fix softmax bug for multi_card in kunlun (#30600)	4 years ago
wuhuanzhou	7e671c07b6	optimize unity build (#30195) * optimize unity build, test=develop * fix code style error, test=develop * fix code style error and test /MP settings, test=develop	4 years ago
liuyuhui	e5b0d9e1fc	[Kunlun] Add condition_variable and notify() in BindThreadedSSAGraphExecutor (#30586)	4 years ago
Zhou Wei	9674e440e2	optimize windows CI, clear tp cache,polish code,improve level of msvc log (#30579)	4 years ago
wanghuancoder	90773473a0	use nvtx push pop in timeline (#30567) * delete empty line of pybing.cc, test=develop * use nvtx push pop in timeline, test=develop * change year, test=develop * add #ifdef PADDLE_WITH_CUDA, test=develop * add #ifndef WIN32, test=develop * is_pushed to is_pushed_, test=develop	4 years ago
chentianyu03	358106fcb0	make abs op support complex types (#30375) * rewrite abs op * rewrite abs op and remove abs in activation * remove abs register in old codes * fix abs_grad type error * fix abs double_grad output name error * modify abs_grad, abs_grad_grad functor for windows building * format code style * fix the bug of result is nan when the divisor is zero * add missing abs attr and add abs for float16	4 years ago
Wilber	2d5758c456	update. (#30585)	4 years ago
Tao Luo	9dd71c74df	disable test_analyzer_detect (#30541)	4 years ago
tangwei12	c9e78a22c5	add trainers for pserver (#30523) * add trainers for pserver Change-Id: I1a75793ec81ce126d07f4c47cae09b95d530bbc8	4 years ago
wanghuancoder	d1b25ed9d7	add some RecordEvent, for dygraph timeline (#30299) * add some RecordEvent, for dygraph timeline, test=develop * change GpuMemcpySync to memory::Copy, test=develop * fix compile problem, test=develop * fix compile problem, test=develop * fix, test=develop * fix, test=develop	4 years ago
YUNSHEN XIE	bbea5a1fa9	The new unit test cannot have the same name as the existing unit test (#29878) * check UT Duplicate name * fix error * Optimized log display * modified exit code	4 years ago
liym27	ff25c5b36f	Fix bug: GetAttrValue should deal with attr with attrType vector<double> (#30536)	4 years ago
WangXi	572c466d19	[Prepare for MultiProcess xpu] unified gen nccl id, refine imperative reducer (#30455)	4 years ago
ykkk2333	549855ac20	add rmsprop_op_xpu test=kunlun (#30493) * add rmsprop_op_xpu test=kunlun * modified rmsprop_op_xpu error code. test=kunlun	4 years ago
Zhou Wei	fb20ec9a4e	fix bug of multicard grad ncclAllReduce (#30553)	4 years ago
Zhen Wang	f30d00553a	Fix the compiling error of update_loss_scaling when using cuda9. (#30538)	4 years ago
Leo Chen	81217a94d8	unify calling cudaSetDevice (#30470) * unify calling cudaSetDevice * fix compile	4 years ago
pangyoki	00554b3f6b	fix error message of Inplace strategy (#30520)	4 years ago
Leo Chen	7043b8cfc6	support layer_norm fp16 in dygraph amp (#30430) * support layer_norm fp16 in dygraph amp * add ut * refine code	4 years ago
wanghuancoder	59ad6ff3e3	delete empty line of pybing.cc, test=develop (#30529)	4 years ago
hutuxian	e207fe6385	Ascend Framework Part2: pybind files (#30410)	4 years ago
hutuxian	40ede12631	Ascend Framework Part1: OP & Wrapper (#30281)	4 years ago
liuyuhui	843dc3cdbd	[Kunlun]PR3: add xpu executor, multi xpu card train function optimization (#30317)	4 years ago
QingshuChen	8489d4f76f	optimize batch_norm & pool op for kunlun (#30490)	4 years ago
wanghuancoder	bd97192274	if pybind.cc changed, generate total report, test=develop (#30514)	4 years ago
taixiurong	5e5c2827a3	fix range op crash in dygraph xpu place (#30469)	4 years ago
JZ-LIANG	16ba0abc79	Recompute Offload: fixed bug in memcpy (#30484)	4 years ago
guofei	11e78ebaa3	Modify the calculation logic of LambOptimizer (#29313) * Modify the calculation logic of LambOptimizer	4 years ago
Adam Osewski	c5ffad126c	[oneDNN] Refactor fuse pass helper functions to one place. (#30460) * Move pass tester helper functions to single common place. * Use helper functions in two more fuse pass tests.	4 years ago
Zhang Ting	c9a334e1b3	add VecCastCUDAKernel (#30296)	4 years ago
pangyoki	13d757362c	Add Inplace strategy (Output reuse Input Varbase) in dygraph (#30103) * add view strategy on squeeze,unsqueeze,reshape,flatten * add squeeze unittest * add unittests * use View strategy as name rather than Reuse Allacation * fix view api doc * fix format * use core.ops when input of reshape2 is Tensor * fix test_cross_entropy_loss error because of reshape2 * fix test_cross_entropy_loss error because of reshape2 * add inplace strategy * add elementwise_add sub * let backward op not use inplace * grad op do not use inplace * fix memory increase error and add leaf error message * delete selected_rows * change op_function * little change * solve HandleViewBetweenInputAndOutput * add unittest and leaf error message * merge view error * optimize op_function_generator format and support sum inplace op * fix format of basic_engine * fix format for framework * little change of variable wrapper * add reshape, squeeze, unsqueeze, scatter api * add relu elu tanh softmax inplace api * fix test_squeeze_op unittest * fix test_relu_op unittest * fix comment problems * delete sample code of inplace api * add reference of grad_pending_nodes in basic_engine * fix unittest name * add inplace apis into wlist * fix error message * add PADDLE_ENFORCE for set grad op twice * fix head file error	4 years ago
Yang Zhang	008b0a8b56	Fix float64 bug in layer norm (#30452) built-in `rsqrt` is shadowed	4 years ago
石晓伟	715d862868	export global google flags to users, test=develop (#30448)	4 years ago
Wojciech Uss	88fc7a7d68	fix cache key for inplaced elementwise ops (#30404)	4 years ago
wawltor	3d49882e2c	fix the rnn mask memory bug for out of read (#30459) * fix the rnn mask memory bug for out of read * update the code for the rnn	4 years ago
taixiurong	6a3c8725b0	support transformer v2.0 (#30381)	4 years ago
ShenLiang	e85be1b1b2	fix flatten api grad (#30426)	4 years ago
yaoxuefeng	6e0da01c61	Heter ps new (#30198)	4 years ago
123malin	2a98e9323a	test=develop, add distributed_infer (#30300) * test=develop, add distributed_infer	4 years ago
QingshuChen	cf786d22ec	fix bug that cann't find mkldnn(kunlun) (#30394)	4 years ago
cc	8e3a294045	skip quantizing ops in cpu inference (#30342) * skip quantizing ops in cpu inference, test=develop	4 years ago
alncat	7bbf3ac5ab	Added support for inference using quantization aware trained dygraph (#30288) * added support for inference using qunatization aware trained dygraph * added support for inference using qunatization aware trained dygraph correct boost get usage * Delete incorrect warning message (#30196) * fix warning and no grad * clean redundant API alias in 2.0 - part 2 (#30013) * delete paddle.nn.functional.assign * fix dynamic to static error * just add the op error message for the matmul xpu (#30246) add the op error message for the matmul xpu * Add Static Variable Clone (#30208) Add clone method for static Variable so that this interface will be same as dygraph. It fixed some bugs in dy2stat * use wget to replace curl to download the lcov file (#30229) * use wget to replace curl to download the lcov file * add cache for lcov * fix test_pool3d_op timeout issue (#30248) * Fix unittests bugs. (#30250) * modify error message based on comments (#30189) * modify error message based on comments * edit code according to review. * Correct spelling according to review. * Fix bug for 'save mutiple method' (#30218) * Fix bug for 'save mutiple method' * To pass coverage. * edit code to pass coverage. * edit code to pass coverage. * add unittest for coverage. * change for coverage. * edit for coverage. * added support for inference using qunatization aware trained dygraph * Alias from paddle.fluid.layers.auc to paddle.static.auc (#30206) * add alias from fluid.layers.auc to static.auc * Update __init__.py * added support for inference using qunatization aware trained dygraph correct boost get usage * corrected boost get usage * corrected naming issues and enforcing zero check * correct paddle enforce message * added more error checkings * corrected error report message and optimized code * corrected findvar usage * corrected paddle_enforce in scope * correct error messages * correct error reporting format Co-authored-by: LielinJiang <50691816+LielinJiang@users.noreply.github.com> Co-authored-by: XiaoguangHu <46782768+XiaoguangHu01@users.noreply.github.com> Co-authored-by: wawltor <fangzeyang0904@hotmail.com> Co-authored-by: Huihuang Zheng <zhhsplendid@gmail.com> Co-authored-by: YUNSHEN XIE <1084314248@qq.com> Co-authored-by: Bai Yifan <me@ethanbai.com> Co-authored-by: gongweibao <weibao.gong@gmail.com> Co-authored-by: WeiXin <weixin10@baidu.com> Co-authored-by: Jiaqi Liu <liujiaqi06@baidu.com>	4 years ago
GaoWei8	180877e988	Softmax backward optimize (#30249) * softmax backward optimize	4 years ago
Zhou Wei	b1d8ff45d7	running unit test sigle GPU parallely on Linux/windows GPU (#29523)	4 years ago
Zhang Jun	10a8f3e5c3	fix bug on compiling inference shared lib with crypto;test=develop (#30269) * fix bug on compiling inference shared lib with crypto;test=develop * fix cmake bug when build inference lib using -DWITH_CRYPTO=OFF * update cmake * remove unnecessary enforce message	4 years ago
Huihuang Zheng	28e156c27f	Fix Sleep Error in enforce.h (#30335) usleep function in <unistd.h> only takes argument less than 1,000,000. Current call can exceed this limit, we have to fix it. This PR can fix random CI error.	4 years ago
Leo Chen	3d015f1cf5	Set expected place in child thread for dataloader to avoid costing cuda memory on other card (#30338) * set expected place in child thread for dataloader * set device id when set tensor from numpy * revert tensor_py change * add compile guard * fix ci * fix bug	4 years ago
QingshuChen	2c1bba02e4	optimize memcpy perf for kunlun (#30291) * optimize memcpy perf for kunlun * remove useless unitest for kunlun mean * minor	4 years ago
ShenLiang	a60f17b89d	Support unused parameters in dynamic graph distributed (#30224)	4 years ago
JZ-LIANG	75936d838f	Recompute Offload (#30233)	4 years ago
lidanqing	a60893f6b5	correct the allowed dimension size (#30326)	4 years ago
Chen Weihang	c8c8f205ba	remove c++ stacktrace hint (#30325)	4 years ago
tangwei12	5e839e4da5	add sparse embedding & load vars for 2.0 & gloo bug fix (#30306) * add sparse embedding & load vars for 2.0 Change-Id: I36b59ed5f015189dc9d9d2e34a9357722d369f1b * fix hdfs gloo Change-Id: Ia84d579053720ad804183e54c9a04b4f031c79c6 * fix gloo hdfs Change-Id: I5ab982fd483cddc10adcdef0b8aa83aca976cb9e * move loadvar/sparse embedding from incubute to static Change-Id: I57081d3545ad2efab78c72420d2162c0eacaf3a0	4 years ago
tangwei12	25f80fd304	Fix/distributed proto (#29981) * rename sendrecv.proto to namespace paddle.distributed * split ps with distributed	4 years ago
Chengmo	d479ae1725	【Paddle.Fleet】Support local save sparse param (#30175) * add save tensor support Co-authored-by: seiriosPlus <tangwei12@baidu.com>	4 years ago
Double_V	231501fefc	fix elugradgrad test fail & error message opt (#30171) * fix elugradgrad test fail and error message opt * fix unitest,test=develop * Update prroi_pool_op.h fix error message * opt message,test=develop * fix ci fail,test=develop	4 years ago
Zhen Wang	fb49ea388e	Fix the accuracy problem of allclose op when using float64 data type in static mode. (#29890) * Fix the accuracy problem of allclose op when using float64 data type in static mode. * Format the code style.	4 years ago
yaoxuefeng	4656525e24	fix datanorm error msg (#30294)	4 years ago
furnace	77051cc9f0	add fp16 support for tril_triu op (#30186)	4 years ago
石晓伟	efa54629fb	fix header file paths of gflags, commit 3, test=develop (#30273)	4 years ago
Chengmo	5b2c15afcd	Fix server.h include device_context (#30243) * fix cmake Co-authored-by: seiriosPlus <tangwei12@baidu.com>	4 years ago
石晓伟	a0ee09148e	enhance error msgs of fusion_seqpool_cvm_concat_op.cc, test=develop (#30240)	4 years ago
石晓伟	a66eebab5c	fix header file paths of gflags, commit 4, test=develop (#30274)	4 years ago
石晓伟	8c4500ff6d	fix header file paths of gflags, commit 2, test=develop (#30272)	4 years ago
liym27	b4989fb744	Support vector<double> as type of op attribute and op set_value suppport vector<double> as value (#30126)	4 years ago
wangchaochaohu	8dcae0c55d	register OPMaker and Infer Shape Check for fused_elementwise_add (#30259)	4 years ago
AshburnLee	924aac2216	Add tf32 switch for cuDNN (#29192)	4 years ago
石晓伟	8ce2482b80	fix header file paths of gflags, commit 1, test=develop (#30271)	4 years ago
chentianyu03	c7371b7b20	type promotion for grad (#30177) * type promotion for grad * add type promotion for div op	4 years ago
liym27	3ce878f309	Check the rank of input in kernel of set_value op (#30147)	4 years ago
WeiXin	66dc4ac77b	modify error message based on comments (#30189) * modify error message based on comments * edit code according to review. * Correct spelling according to review.	4 years ago
wawltor	fee424411a	just add the op error message for the matmul xpu (#30246) add the op error message for the matmul xpu	4 years ago
GaoWei8	0a21924a8d	optimize softmax forward (#30217) * optimize softmax forward	4 years ago
wangchaochaohu	af80859dd6	reduce the occupied size of memory for the fused pattern of elementwise_add Op and activation Op(relu Op for example) (#29885)	4 years ago
zhang wenhui	5932fee60a	enhance error message, test=develop (#30220)	4 years ago
pangyoki	da16b33f2e	add View(reuse allocation) strategy on squeeze, unsqueeze, reshape, flatten op (#29913) * add view strategy on squeeze,unsqueeze,reshape,flatten * add squeeze unittest * add unittests * use View strategy as name rather than Reuse Allacation * fix view api doc * fix format * use core.ops when input of reshape2 is Tensor * fix test_cross_entropy_loss error because of reshape2 * delete selected_rows * change op_function * little change * solve HandleViewBetweenInputAndOutput	4 years ago
Jacek Czaja	4aba17b5db	[oneDNN] Added UT for testing elementwise_mul caching (#30203) * - Added UT for testing elementwise_mul caching * lint fixes	4 years ago
Zhen Wang	7f7dfccf20	Support pure fp16 training for AMP API. (#29544) * add cast ops before and after unsupported fp16 ops. * Keep partial net in FP32 pattern. * Support check_finite_and_unscale and update_loss_scaling for FP16 calculation mode. * Add fp16 support for adam op. * add multi precision attr for adam. * Fix the bug of test_multi_precision_fp16_train UT. * Code format for CI. * Fix the redefine error about MPTypeTrait on windows. * fix bugs of the _create_accumulators func in Momentum. * fix bug when inserting post cast op. * Add the update_loss_scaling op in allow_set of UnusedVarCheck. * Update for ci coverage. * Add some doc for OptimizerWithMixedPrecision. * Fix the code style. * Imporve the doc of `amp_init`. * Change for fp16 testing if users have the infer program defined in separate way.	4 years ago
Leo Chen	789743e190	use cuda generator in bernoulli cuda kernel (#30199)	4 years ago
Leo Chen	8696335f86	Fix dtype of ungenerated grad var (#28511) * fix dtype of ungenerated grad var * update ut * refine code * set default dtype * fix could_use_cudnn bug * remove debug code * re-implement * fix bug	4 years ago
Wilber	609c022222	shape op support int8 and uint8 tensor (#30201)	4 years ago
Wilber	01a287bf0a	fix windows compile when WITH_PYTHON=ON and WITH_TENSORRT=ON (#30194)	4 years ago
ruri	e42e1e80dc	Add version checking, test=op_version (#30129)	4 years ago
Leo Chen	1f97d61c68	Add callback after TensorCopy (#30123) * change to tensor copy sync * change to tensor copy sync * make copy_to safe when use TensorCopy * refine code * add ut * add cudapinned garbagecollector * add testcase: cpu place -> cuda pinned place	4 years ago
Chengmo	528e03fc08	【Paddle.Fleet】Fix tensor table (#30075) * add tensor table	4 years ago
Wilber	ade244948c	disable mkldnn inplace pass on windows (#30164)	4 years ago
joanna.wozna.intel	907262ee15	Fix analysis predictor test (#30191) * Add a necessary condition * Remove test for white list and add header	4 years ago
lijianshe02	2dc7ee276b	enhance error message of nll_loss op test=develop (#30125) * enhance error message of nll_loss op test=develop	4 years ago
Huihuang Zheng	54bf3f5a56	Refine PADDLE_ENFORCE Error Messages. test=develop (#30149) Improve some error messages in parallel_executor.cc, conditional_block_op.cc, recurrent_op.cc	4 years ago
Chen Weihang	d0fb06b27f	[Complex] Simplify prepared op impl to improve performance (#30153) * simplify prepared op impl to improve performance * fix kunlun compile error * continue fix kunlun compile error * only transform diff place when dtype diff * fix failed unittests * remove useless file * polish impl by review comment	4 years ago
123malin	c5b415bfd9	Improve Index select cuda kernel (#30139) * test=develop, add index_select_cuda kernel	4 years ago
wangchaochaohu	7dd551e08b	refine the paddle place support using str (#28769)	4 years ago
WeiXin	404c16763a	Add detailed error message for curandStatus_t, cublasStatus_t, cusolverStatus_t (#30161)	4 years ago
Wilber	91a8a25721	enhance error info for py_func (#30138) * enhance error info for py_func * update	4 years ago
weihaoji	b8207af6bc	[XPU] Remove lite_xpu ut lite_resnet50_test since fusion pass changes introduced precision diff. test=develop (#30122)	4 years ago
liuyuhui	15fac5e7fa	fix assign_op_xpu concat_op_xpu warining (#30120)	4 years ago
Jack Zhou	f5428eca4f	fix enforce msg of sum xpu op (#30113)	4 years ago
123malin	198fbdfb60	Add Lookahead and ModelAverage Optimizer (#30004) * test=develop, add model_average and lookahead	4 years ago
Leo Chen	adac38c506	add dispenable input for core.ops.reshape2/expand/slice (#30072) * add dispenable input 'shape' for core.ops.reshape2 * add dispenable inputs for core.ops.reshape2/expand/slice * add ut	4 years ago
ShenLiang	becf99d2e8	fix error message (#30135)	4 years ago
Zhou Wei	30888ca343	Polish and Optimize the print/repr information of Layer (#29998) * Polish and Optimize the print/repr message of all layer * fix some code format	4 years ago
Zhou Wei	9c99d37906	fix unittest failed on windows (#29837)	4 years ago
wangguanzhong	69839f8a9a	fix error message for distribute_fpn_proposals_op (#30116)	4 years ago
QingshuChen	8e1c3ddf15	add aarch64 and sunway kunlun lib (#30027) * add aarch64 and sunway kunlun lib * minor * optimize elementwise_add for kunlun * update kunlun dependence * minor * minor	4 years ago
Shang Zhizhou	05b27695f1	add inference api： DisableTensorRtOps (#30109) * snap * add inference api: DisableTensorRtOPs * fix code style * update api to experimental * update variable name	4 years ago
石晓伟	53bb126510	fix a bug in op_version_registry, test=develop, test=op_version (#29994)	4 years ago
xiemoyuan	3e0c492910	Optimize the error message of framework. (#30134)	4 years ago
liym27	9922bd4125	Fix bug: In dynamic mode, if start or end is negetive, __getitem__ return wrong result(#30003) 1. when slice_item is a slice: 1) the start of __getitem__ should be std::max(start, 0) if slice 2) the start of __getitem__ should be std::min(end, dim) 2. when slice_item is an integer, it should be in [-dim_len, dim_len) 3. Fix error message to use accurate data	4 years ago
chentianyu03	666e665132	change the kron gradient when complex types (#29995)	4 years ago
chentianyu03	a5e422c85d	add trace op_register_version and fix version bug; test=op_version (#30000) * add trace op_register_version and fix defaulf bug; test=op_version * add trace op_register_version; test=op_version * add trace op_register_version; test=op_version * add trace op_register_version; test=op_version * fix missing the template bug of vector; test=op_version	4 years ago
cc	9f34374b48	Fix the formate of raising error in randperm op (#30108) * fix the formate of raising error in randperm op	4 years ago
liuyuhui	254ad61959	fix xpu pe sync, test=notest (#30095)	4 years ago
Thunderbrook	0b8e1fadc5	add topo-aware in heter-ps (#30087) * add topo aware * resource.h * topo aware * format	4 years ago
hong	297fff1a79	support dygraph in xpu place (#30051) * support dygraph in xpu place; test=develop * fix cpu/gpu compile error; test=develop * fix compile error; test=develop * fix xpu compile error; testd=develop	4 years ago
wangchaochaohu	d0a5620575	fix the compiler error when gcc4 cuda9.0 (#29997)	4 years ago
WangXi	ee16006b5d	Optimization grad merge performance (#29784)	4 years ago
yongqiangma	e891f4da1b	Add p_norm op version info (#30042) * p_norm fix op version info. test=develop	4 years ago
tangwei12	7d1c149e09	for inference checkpoint (#30081) * for inference checkpoint Change-Id: I36c979240ffa55bf1ef0c9315402960762af6be4 * for inference checkpoint Change-Id: I82025365d5b792cbea1ead506df685aecc8ac198	4 years ago
tangwei12	7d4bdff07d	fix large scale memory (#30035) * memory holder optimize Change-Id: Ic91af8ac6f2853336d28a9fbbc5e8d0c57b5d05e * memory holder optimize Change-Id: I2fd1c14ecc17f5d5ce88b87890381ea801e6367f * fix large scale memory holder Change-Id: Ief0992b02b00220e16c72cc637a56e7b5788140f * fix large scale memory holder Change-Id: I910142a3952ead643a5604f8f80955f3e6efe655	4 years ago
Shang Zhizhou	08dc5bc27e	fix op version checker of pass bug (#30028) * fix op version checker of pass bug * fix code style * update pass version	4 years ago
cc	68398abce9	[Inference] zero_copy_tensor supports int8_t (#30053) * zero_copy_tensor supports int8_t	4 years ago
whs	1b999d2b5d	Add version checking (#30040)	4 years ago
ceci3	85b2f05ab0	register ModifyAttr for instance_norm, test=op_version (#30065) * register instance norm, test=op_version	4 years ago
channings	ddcff254db	fix op_register_version for compare ops, test=op_version (#30007) Co-authored-by: zhoushunjie <zhoushunjie@baidu.com>	4 years ago
Wilber	66e16b7e99	update lite subgraph. (#30056)	4 years ago
GaoWei8	a64822589f	add REGISTER_OP_VERSION for LSTM (#30038)	4 years ago
yinhaofeng	6e93fb92f9	Register op version for linspace,test=op_version (#30025) * Register op version for linspace,test=op_version * Register op version for linspace,test=op_version * Register op version for linspace,test=op_version * Register op version for linspace,test=op_version * Register op version for linspace,test=op_version	4 years ago
123malin	d0056c324d	test=develop, add op_register_version for roll_op (#30023) * test=develop, add op_register_version for roll_op	4 years ago
chentianyu03	e012930aa3	complex gradient matmul (#29966) * dot op support complex types * matmul support complex types * add test case * matmul broadcast gradient support complex * move conjFunctor to complex_functor.h	4 years ago
ShenLiang	893d37e5c6	Fix rank_attention op_version, test=op_version (#30006) * fix rank_attention, test=op_version	4 years ago
Adam Osewski	13aef97043	operator checkpoints for new attributes. (#29832) * Add operator checkpoints for new attributes. * Fix adding subsequent checkpoint to quantize op.	4 years ago
wangguanzhong	844d8e0c2c	add REGISTER_OP_VERSION for generate_proposals, roi_align, roi_pool test=op_version (#30034)	4 years ago
cc	c3c064a8fc	Add mkldnn nearest_interp and bilinear_interp op (#30016) * Add mkldnn nearest_interp and bilinear_interp op * don't run mkldnn interpolate in default * add interpolate_mkldnn_pass	4 years ago
chalsliu	c053bf2a57	Revert "register ModifyAttr for instance_norm, test=op_version (#29938)"	4 years ago
wawltor	cc2f94620c	add the support the op version check for matmul, test=op_version (#30011) * add the support the op version check for matmul, test=op_version	4 years ago
wawltor	b33aaea86c	add the op version check for the elementwise ops, test=op_version (#30010) * add the op version check for the elementwise ops, test=op_version * add the support check for elementwise_ops, test=op_version	4 years ago
Chengmo	4cbcc9b6da	fix momentum op register (#29941) * fix momentum op register	4 years ago
hutuxian	7c1f69bdf0	add op_version for flip op [test=op_version] (#30019)	4 years ago
ceci3	77c1684397	register ModifyAttr for instance_norm, test=op_version (#29938) * upgrade instance_norm, test=op_version * fix	4 years ago
Leo Chen	47d10c55d5	Enhance debugging (#30001) * add debug code * add place info * fix compile problem * add place for output	4 years ago
FlyingQianMM	d42f93e504	add op_register_version for allclose op; test=op_version (#29968)	4 years ago
wawltor	8f49f9d5c9	change the elementwise ops version check, test=op_version change the elementwise ops version check, test=op_version	4 years ago
guofei	b23faf37be	Add moving_average_abs_max_scale op_register_version test=develop (#29957) Add moving_average_abs_max_scale op_register_version	4 years ago
Thunderbrook	0ca6de171f	add include (#29952)	4 years ago
zhangchunle	631d783748	fix bug in windows ci (#29963)	4 years ago
Pei Yang	6206b9bc71	fix ut:trt_resnext_test, trt_quant_int8_yolov3_r50_test, test_trt_dynamic_shape_ernie, test_trt_dynamic_shape_ernie_fp16_ser_deser, trt_cascade_rcnn_test (#29977)	4 years ago
wangxinxin08	be8b5fd18a	register op version for conv2d_transpose, conv3d_transpose and depthwise_conv2d_transpose, test=op_version (#29937)	4 years ago
石晓伟	958612231f	compile the denormal.cc on aarch64, test=develop (#29956)	4 years ago
Guo Sheng	6ac4f0af6a	Register op version for coalesce_tensor. (#29940) test=develop test=op_version	4 years ago
Chen Weihang	a1d9a14e89	support grad accumulated across batch (#29942)	4 years ago
cc	6a0102b038	map matmul/squeeze2+matmul/reshape2+matmul to mul (#29911) * map matmul/squeeze2+matmul/reshape2+matmul to mul	4 years ago
Huihuang Zheng	d038746e1c	Fix Unix Sleep for Wrong Time. test=develop (#29953) PADDLE_RETRY_CUDA_SUCCESS used wrong sleep time so it can cause timeout in unittest. This PR fixed it. After we searched the doc in https://pubs.opengroup.org/onlinepubs/7908799/xsh/unistd.h.html, the time unit of sleep in unistd.h takes "seconds", usleep takes "microseconds", Sleep in windows.h takes "milliseconds".	4 years ago
YUNSHEN XIE	121658d251	Support xpu ut coverage (#29892) * add xpu_coverage function * xpu coverage ipipe only deal with xpu files * fix import error * fix format error * 'fix format error' * fix format error * fix error * fix format error * fix format error	4 years ago
Jack Zhou	5a4e42ca9a	add gru op_register_version; test=op_version; (#29931 ) * add gru op_register_version; test=op_version; * Update fc,mul version;test=op_version;	4 years ago
Wilber	2b1d796cd0	[Inference] Solve 2.0 trt performance reduce compare 1.8. (#29925 )	4 years ago
Qi Li	913f77a4b7	Register op version for print, test=op_version (#29945 )	4 years ago
石晓伟	181ea1870b	flush denormals to zero, test=develop (#29924 ) * flush denormals to zero, test=develop * add comments, test=develop	4 years ago
cc	7667e59bf7	add op version for fake_quant and fake_dequant ops, test=op_version (#29923 ) * add op version for fake_quant and fake_dequant ops, test=op_version, test=develop	4 years ago
石晓伟	acb5e86363	fix a bug in reset_tensor_array, test=develop (#29620 ) * fix a bug in reset_tensor_array, test=develop * ci coverage, test=develop	4 years ago
liuyuhui	3d1741b794	[Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor (#29926 )	4 years ago
Wilber	332da133a1	Support mips arch (#29903 ) * Support MIPS arch.	4 years ago
LielinJiang	eab0b60e16	Register op version for grid_sampler, test=op_version (#29916 ) * register op version for grid_sampler	4 years ago
liym27	9602a182b2	[Dynamic Inplace] Support ShareInplaceVersionCounterWith for C++ Tensor (#29842 ) * Revert "[inplace] Add ShareHolderWith for class Variable and SharePlaceholderWith in VarBase.detach() to share the same Tensor/SelectedRows (#29267)" This reverts commit `b10ecd9d3a`. * Support ShareInplaceVersionCounterWith to share the same inplace version counter for VarBase	4 years ago
liuyuhui	4427df37cf	[Kunlun] PR2: Support MultiDevicePass and BKCL in parallel executor (#29574 )	4 years ago
LielinJiang	0f4b218640	Enable bilateral_slice unittest on windows platform (#29896 ) * enable bilateral_slice unittest on windows platform * reduce max threads	4 years ago
Ren Wei (任卫)	95df0e1447	Add the ipipe log param prefix (#29545 ) * Add the ipipe log param prefix 1. add the prefix; 2. using Colon before the metric values; * Add the prefix for Efficiency Cloud (ipipe) log metric collection; not yet verified whether this string replacement works correctly in the Windows bat script * Preserve The Old Format Metrics During The Transition Period Please DELETE the old format metrics log finally. The period may last for a week. * ipipe_log_param + ccache and clcache ..	4 years ago
YUNSHEN XIE	2a01756bf3	remove duplicate ut names (#29809 )	4 years ago
Chen Weihang	a6072055be	[Complex] Handle complex to real after type promotion (#29855 ) * try to add fwd op input dtypes * refactor base impl * return tmp_ins after dygraph prepare data * fix typo found in debug * polish comment & add complex net test * revert detail change * fix unittest failed * add complex kernel condition control * fix xpu test failed & polish comment * polish details by review comments	4 years ago
Chen Weihang	1a304e6c06	[Complex] Add support for complex grad accumulated (#29889 ) * add support for complex grad accumulated * add unittest for coverage * update test dtype * remove useless blank line	4 years ago
taixiurong	c7acad9f2f	support some shape for matmul and cast in xpu place (#29900 ) * support some shape in matmul and cast * modify matmul	4 years ago
Leo Chen	6b258317cb	fix TransferInplaceBack (#29830 )	4 years ago
QingshuChen	59b47f3b32	feat: support check_nan_inf for kunlun/xpu device (#29694 ) * feat: support check_nan_inf for kunlun device * support kunlun stack * minor	4 years ago
tangwei12	032414ca2a	[Feature] one ps (3/4) (#29604 ) * oneps (3/4) Co-authored-by: MrChengmo <cmchengmo@163.com> Co-authored-by: malin10 <malin10@baidu.com> Co-authored-by: chengmo <chengmo@baidu.com>	4 years ago
jakpiase	edc06c6a1b	Added fc + activation fuse pass (currently only gelu, sigmoid and tanh are supported) (#29772 )	4 years ago
Wilber	2c0a4a3470	call_statck is turned on default when ON_INFER=ON (#29798 )	4 years ago
Wilber	ad0b01ffe2	lod operator should not be reused in memory_optimize pass. (#29828 )	4 years ago
liym27	97e75ad0f5	[setitem] Support Tensor setitem in static mode (#29708 ) 1. Type of index: int, slice(step must be 1). 2. Type of value: (1) int32, int64, float32, bool; (2) numpy.array(int32, int64, float32, bool);<Note: float64 is not supported> (3) paddle.Tensor(int32, int64, float32, float64, bool);	4 years ago
YUNSHEN XIE	24ce051a84	remove duplicate ut reload (#29810 ) * remove duplicate ut reload * remove duplicate ut define in cmakelist	4 years ago
Jacek Czaja	c9e874fc8e	[oneDNN] Unit test for checking oneDNN caching (#29606 )	4 years ago
Thunderbrook	09b6e71928	heter box (#29734 ) * add heter box * add trainer, worker, wrapper... * format * for ci * format * remove boost get * boost & copyright * rename * rename * format * format * format Co-authored-by: yaoxuefeng6 <yaoxuefeng@baidu.com>	4 years ago
Jacek Czaja	7b33720c90	[oneDNN] Tensor copy fix to oneDNN tensors (#29771 ) * - Tensor copy fix to oneDNN tensors * - Fixes after review	4 years ago
123malin	a400b76db7	Roll cuda kernel (#29655 ) * test=develop, optimize roll_op_cuda_kernel	4 years ago
wuhuanzhou	e7ac74c85b	optimize compilation time of argmin/argmax op (#29595 ) * Using VisitDataTypeTiny and put CastOP after ReduceOP, test=develop * remove changes of reduce_op.h, test=develop	4 years ago
Zhou Wei	3f83ec61c2	move running unittest on windows to another file (#29815 )	4 years ago
chentianyu03	ddfc3d2c2f	change grad elementwise_mul for complex types (#29757 ) * add conj op for complex types * add conj for complex types * add more test case * add conj_op test * modify conj api and impl * add complex type for fill_constant_op xpu * add setConstant for complex type * remove complex conj test file * user define grad for test_conj_op * add test case for static mode of conj api * modify conj doc * change input args name to x * remove useless codes * conj support real types * add conj test case for real number * delete no need to calculate inputs in dygraph op_test * delete no need to calculate inputs in dygraph op_test * modify grad of mul for complex types * fix the grads of inputs args order not match bug	4 years ago
chentianyu03	2a260d9b0e	change the grad of div when complex types (#29804 ) * change the grad of div when complex types * fix the grads of inputs args order not match bug	4 years ago
ShenLiang	f65f1caad3	opt sparse allreduce using ncclgather (#29819 )	4 years ago
TTerror	82aa01c373	add nearest_interp_v2 on kunlun (#29725 ) * add nearest_interp_v2 on kunlun * add nearest_interp_v2 on kunlun	4 years ago
wangchaochaohu	01c37c8e02	refine the compiler error for half2 operation (#29816 )	4 years ago
whs	82630408b4	Support double backward rsqrt (#29589 )	4 years ago
Zhang Ting	b76f5a8489	fix the bug of dropout_grad (#29813 )	4 years ago
LielinJiang	a94c3cbbf3	register cudnn conv double grad for depthwise conv (#29807 )	4 years ago
ShenLiang	01e2874a0e	Support multi-stream communication for dynamic graph distributed (#29525 ) * fix fleet for multi-stream * fix memcpy for ncclid * use sync to solve move operation	4 years ago


18542 Commits (b48841ba2e7335eaa435a54436ed580d4aef001c)