Paddle

Commit Graph

Author	SHA1	Message	Date
jakpiase	2f1165342b	OneDNN hardswish integration (#30211 )	4 years ago
Chen Weihang	e8cdb49aa9	[CustomOp] Support attributes as func input in custom op (#31128 ) * add simple attr support and test * add int, float attr support * support other attribute * add custom attrs test in cmake * polish details * fix test failed * add backward test * update test flags	4 years ago
Zhou Wei	ffbf71359a	modify custom op dependent from paddle_framework to paddle_custom_op (#31195 )	4 years ago
Leo Chen	0f1fde5102	fix the modification of set_expected_place (#31177 ) * revert the modification of set_expected_place * set device before op run * add ut	4 years ago
lilong12	dc8dfba35b	align the default value of some configuration for fleet to that of single cards (#30740 ) * update, test=develop	4 years ago
lilong12	a373aa7645	fix the bug in expand_v2 op (#30984 ) * update, test=develop	4 years ago
Thunderbrook	c4f279fe8d	support multi node in heterps (#31102 ) * push multi node * multi node * MultiThread * remove log * solve bug in 30829	4 years ago
liu zhengxi	ae2be49f40	Add cublas_handle() to expose cublas_handle to ops (#31157 ) * add get_cublas_handle() api * update format * add unittests * alter function name	4 years ago
Pei Yang	00b09e86ac	[Paddle-TRT] support group_norm (#31040 ) * add group norm plugin * fix compile problems * move concat axis check to trt op teller * add nbDims for scale and bias nv dims * add group norm unit test * fix unittest * add trt version restriction for group norm op teller * fix unittest	4 years ago
Chen Weihang	1ce96fa118	[CustomOp] Add new paddle custom op so (#31141 ) * add new custom op so * fix use new method error * fix test failed	4 years ago
tangwei12	ebbdf52557	fix entry (#31079 ) * fix entry * fix distributed lookup table fuse case * fix entry bug at first time * move entry from paddle.fluid -> paddle.distributed * fix ut with paddle.enable_static() Co-authored-by: malin10 <malin10@baidu.com>	4 years ago
Qi Li	ee76ea72de	[ROCM] update fluid collective op for rocm, test=develop (#31075 )	4 years ago
yaoxuefeng	d8fa65a3a8	fix heter compile (#30518 )	4 years ago
Zhou Wei	4b220550ef	[Custom OP]Fix problem of custom op unitests on Windows CI (#31114 ) * fix some problem of Windows custom op * fix some problem of Windows custom op * fix some problem of Windows custom op	4 years ago
Zhou Wei	be61c2d06b	support build whl and inference library nightly,test=windows3 (#30616 )	4 years ago
alncat	5d6a8c7b73	added support for fake_quantize_dequantize_abs_max op in quantization… (#30896 ) * added support for fake_quantize_dequantize_abs_max op in quantization inference pass * remove const_cast to pass ci * remove compare operator to pass ci-coverage * added detailed error message for unregistered tensorrt_subgrah_pass	4 years ago
Jacek Czaja	d3f09ad702	Update of onednn to 2.2 (#31067 )	4 years ago
Guanghua Yu	24ba5ee05c	merge develop conflict (#31122 )	4 years ago
Qi Li	cced930b61	[ROCM] update fluid operators for rocm (part1), test=develop (#31077 )	4 years ago
wangchaochaohu	364cfa2686	fix windows for optimization of elementwise_add Op (#31068 ) * fix windows for optimization of elementwise_add Op	4 years ago
joanna.wozna.intel	781df300d0	Unification of BF16 enablement process (#31034 ) * Unification of bfloat16 enablement process and refactor * Remove unnecessary function * Standardize the output name search	4 years ago
Zhong Hui	16fe11d71e	fix softmax cross entropy integer overflow (#30590 ) [BUG FIX] Fix softmax cross entropy overflow problem.	4 years ago
Zhou Wei	44ee251fde	fix UNIX cmake problem (#31113 )	4 years ago
Qi Li	a60d93fb77	[ROCM] update fluid framework for rocm (part2), test=develop (#31010 )	4 years ago
Thunderbrook	565354f676	support save multi sparse table in one path (#31108 ) * save multi table one path * format	4 years ago
Qi Li	50967135a5	[ROCM] update fluid framework for rocm (part3), test=develop (#31011 )	4 years ago
Qi Li	8fe09faf14	[ROCM] update fluid framework for rocm (part1), test=develop (#31009 )	4 years ago
Qi Li	334296306c	[ROCM] update fluid platform for rocm39 (part4), test=develop (#30936 )	4 years ago
Shang Zhizhou	a5c56d83a1	update trt int8 calibrator to IEntropyCalibratorV2 (#31060 ) * update trt int8 calibrator to IEntropyCalibratorV2 * add delele opt_cache for trt_split_converter_test	4 years ago
Zhou Wei	adaec0073d	[2.0Custom OP]Support New Custom OP on Windows (#31063 ) * [2.0.1]Support New Custom OP on windows * fix CI * fix code style * fix CI * fix CI * fix coverage * fix CI * fix CI	4 years ago
Qi Li	1d996637e6	[ROCM] update fluid imperative for rocm (part1), test=develop (#31017 ) * [ROCM] update fluid imperative for rocm (part1), test=develop * [ROCM] update reducer.cc after merge, test=develop * update reducer cmake after merge, test=develop	4 years ago
JamesLim	b95eb38b8a	fix the bug in backward OP of index_sample. (#31026 )	4 years ago
Chengmo	6b3371e0c7	Remove PE special profiler (#30886 ) * remove pe special profiler * add profiler info	4 years ago
Chen Weihang	6beeafe797	[CustomOp] Add more dispatch marco for users (#31058 ) * add more dispatch marco * add more dispatch marco * add more tests * revert unneeded change * add timeout for test dispatch * add float and complex test * remove and marco	4 years ago
TTerror	d5323dab41	add squeeze_op/unsqueeze_op on kunlun;fix conv op and parallel executor;optimize lookup_table op (#31056 ) * add squeeze_op/unsqueeze_op on kunlun; fix conv op and parallel executor on kunlun; optimize lookup_table op on kunlun * update squeeze/unsqueeze op	4 years ago
123malin	16b4260b2f	test=develop, save/load, shrink (#30625 ) * test=develop, save/load, shrink Co-authored-by: seiriosPlus <tangwei12@baidu.com>	4 years ago
Jiabin Yang	628451af06	hide useless headers and add complex support (#31074 )	4 years ago
Wilber	463eae0383	update paddle_fluid.so to paddle_inference.so (#30850 ) * update paddle_fluid.so to paddle_inference.so	4 years ago
liym27	5b367dab44	[static setitem] Support the index is Tensor; step>1; step<0 .(#30949 ) * [static setitem] support the index step > 1. tensor_a[::3] = value * [static setitem] support the index step < 0. Eg: tensor_a[::-3] = value * [static setitem] support the index is Tensor. eg: tensor_a[tensor_3:0:-1] = value * Add op version.	4 years ago
Qi Li	eb3050fa9a	[ROCM] update fluid inference for rocm (part1), test=develop (#31018 )	4 years ago
Jacek Czaja	f7465641c3	Added reshape grad bf16 (#31035 ) * - added Reshape grad bf16 * - Added reshape grad bf16 * - cosmetics in py	4 years ago
Wojciech Uss	615d8a2264	Modify relu native implementation 2 (#30996 ) * Modify relu native implementation * fix GPU performance	4 years ago
ShenLiang	9401173e3a	Remove scale loss before reduce in dygraph (#30807 )	4 years ago
Wilber	0020d91506	fix python pass builder error. (#30946 )	4 years ago
Wilber	39aeaa160e	fix jetson problem (#30939 )	4 years ago
Wilber	01ccfbcde9	update trt error message when input height or width is -1 (#31019 )	4 years ago
Wilber	cf8b8f9c5e	resolve memory leak in cudnn8.0 (#31029 )	4 years ago
Guanghua Yu	5b267474a9	add offset parameter in roi_align,generate_proposals.etc ops (#30864 ) * add parameter in roi_align op	4 years ago
Chen Weihang	75f81233ae	fix regex error & simplify marco name (#31031 )	4 years ago
Zhang Ting	f0ee159280	enable exhaustive_search for forward and backward algos when dtype is float16 (#30959 ) * enable exhaustive_search for input_grad when dtype is float16 * enable exhaustive_search for forward algos	4 years ago
Pei Yang	9b54fe4154	add trt transpose and flatten converter (#31022 )	4 years ago
joanna.wozna.intel	caf9d39839	Add Conv Transpose BF16 (#30877 ) * Add conv transpose BF16 * Share function GetWeightsTz * Adjust to review and fix op compatibility * Add bias to unique handler name * Remove errors related to paddle enforce * Add conv2d_transpose to bf16 list and kernel refator	4 years ago
Chen Weihang	f649442ddd	New custom operator extension mechanism (#30690 ) * initial commit: simple demo * polish copyright format * add grap op simple demo * adapt uncertain number of argument * change trait marco name * add place & dtype support for add kernel * add dispath and infershape func * poish code & add notes * add dynamic_loader dep for paddle_framework * add new custom op test dir * polish impl details * add unittest for new custom op * fix failed unittest * Costum op (#1) * fix compile error * wrap framework tensor with LoDTensor * add CustomTensor default constructor * add size() for CustomTensor * make size const for CustomTensor * refactor place related api to circle the concept * make place const * make Tensor copy * debug CustomTensor core * remove additional head of framework * use back to shared ptr for custom tensor * add gpu test * merge latest cwh code in * adjust ut code of custom op * Remove ShareData from user && Change CustomTensor to Tensor && Support more data type (#2) * hid share data from and to * rename CustomTensor to Tensor * refactor register design & add test * change op_funtion to op_meta_info * split op meta info into .h and .cc * move get methods into friend class * move OpMetaInfoHelper into framework space * move CustomTensorUtils into framework space * change pybind api name * move PD C API into op meta info * add register custom op api * remove inference cmake change * refactor copy to api && change Reshape to lowercase && support more dtype && add more test (#3) * support multi dtype * remove lod, make reshape lowercase, add copy test and refactor copy api * fix copy to error * add more test * polish detail & error message * polish test details * Add cast api && Change copy related api to copy_to && add more test (#4) * add type cast * add cast and make copy to api * merge cwh code * add more error log * polish code * used for test * remove test comment * fix uint8 type error * fix lost uint8 type error * add test for coverage * polish details by reviewer comments * add prefix for DISABLE_COPY_AND_ASSIGN Co-authored-by: Jiabin Yang <360788950@qq.com>	4 years ago
Zhou Wei	5c0332714f	fix bug of Linux UT parallel level (#30971 )	4 years ago
wuhuanzhou	9b3c80c8ab	update eigen version on Windows (#30573 ) * update eigen version on Windows, test=develop * add /bigobj for cl, test=develop	4 years ago
ShenLiang	dae3e1f337	Solve inconsistent order in each card in dynamic graph (#30931 )	4 years ago
WangXi	14d039e4a1	Fix the problem that the number of ops executed by xpu is wrong (#30961 )	4 years ago
Chen Weihang	010f2caa23	try to fix reader and signal test failed (#30960 )	4 years ago
Adam Osewski	3ba69809bf	Fix LayerNorm tester for gcc4.8 (#30962 )	4 years ago
Qi Li	93c1d9e761	[ROCM] update fluid platform for rocm39 (part3), test=develop (#30913 )	4 years ago
QingshuChen	15297a065c	fix depends of kunlun bkcl (#30945 )	4 years ago
liym27	97f7a70c01	Add error message for slice op(#30851 )	4 years ago
liuyuhui	87197f8c2e	[kunlun]fix sync in multi kunlun xpu dygraph training. (#30943 )	4 years ago
石晓伟	99bd16eb4e	bug fix of xpu lite engine, test=develop (#30918 ) * bug fix of xpu lite engine, test=develop * xpu zero copy tensor, test=develop * revert paddle/fluid/inference/tests/api/CMakeLists.txt	4 years ago
tianshuo78520a	2e93233899	Add WITH_XPU_BKCL in Kunlun-CI (#30919 )	4 years ago
Qi Li	34f1628ce8	[ROCM] update fluid platform for rocm39 (part2), test=develop (#30774 )	4 years ago
Jacek Czaja	9e527d9956	[oneDNN] Added basic changes for elementwise_add_grad bf16 (#30925 )	4 years ago
Chengmo	c98f144fbc	add truncated gaussian random (#30922 ) add truncated gaussian random	4 years ago
liuyuhui	4a8b8b4547	[Kunlun] add gen_bkcl_id_op, support multi XPU cards training using multiprocess (#30858 )	4 years ago
liym27	39f41cb47f	Performance optimization for dynamic setitem: Call op set_value to speed up because the original call to TensorToPyArray will introduce unnecessary data copy. (#30817 )	4 years ago
liuyuhui	bef46ccfc8	[Kunlun]fix include files of gen_comm_id_helper.cc (#30917 )	4 years ago
wanghuancoder	aab3a3012e	add include for heterbox_trainer.cc, develop=test (#30910 )	4 years ago
taixiurong	24873f4f77	dyngraph (#30892 )	4 years ago
Adam Osewski	092a2b1413	More UT for LayerNormFuse pass (#30891 ) * Additionally change to not throw error from inside pass.	4 years ago
tianshuo78520a	a80fe67f84	Change cmake/third_party files for CI (#30833 )	4 years ago
Jacek Czaja	abfa822650	[oneDNN]Extended adaptive pooling support for oneDNN pool kernel (#30757 )	4 years ago
joanna.wozna.intel	73cdea01d4	Add bf16 fast performance verification (#30551 ) * Update Xbyak and add bf16 fast performance verification * Fix formating * Change LOG message * Trigger an update of a new tag	4 years ago
Shang Zhizhou	e6095bc2ce	fix split trt plugin initialize (#30875 ) * fix split trt plugin initialize * update	4 years ago
WangXi	6e3856d3fb	fix xpu dygraph place (#30868 )	4 years ago
wanghuancoder	35c5b23f68	use iwyu clean include second time, test=develop (#30829 ) * use iwyu clean include second time, test=develop	4 years ago
cucuzg	ac2e2e6b7f	add clip_by_norm on kunlun, *test=kunlun (#30862 )	4 years ago
wawltor	b7560a59ab	fix the broadcast for the large second input (#30818 ) fix the broadcast for the large second input	4 years ago
JamesLim	6e1e036a75	Implement cuda kernel for index_sample. (#30380 )	4 years ago
AshburnLee	666efc2336	Call new cudnn batch norm API regardless of data type and data layout (#30157 )	4 years ago
QingshuChen	5c8455d6ea	try again if kunlun memory malloc failed (#30855 ) * try again if kunlun memory malloc failed * minor	4 years ago
石晓伟	2ac4143b6c	support xpu with analysis predictor, test=develop (#30832 ) * support xpu inference with analysis predictor, test=develop * merge the cmake of the xpu toolchain, test=develop * add c-apis, test=develop * fix a bug in extern_xpu, test=develop	4 years ago
liuyuhui	2cb55eff57	fix WITH_XPU_BKCL in CMakeLists.txt (#30854 )	4 years ago
Adam Osewski	4f066e316e	Layer normalization fuse pass. (#30721 )	4 years ago
WangXi	b1026f64af	【kunlun】dygraph supports multi xpu card training (#30671 )	4 years ago
joanna.wozna.intel	04532b8a83	Update Xbyak to v5.81 (#30809 )	4 years ago
Shang Zhizhou	b909450994	fix trt plugin clone and initialize bugs in TRT7.1+ (#30709 ) * fix trt plugin clone and initialize bugs * fix unit test error * enable trt in ci py3 * update unittest timeout	4 years ago
Wilber	b08ae368bb	ci compilation depends on a stable release (#30755 ) * update lite tag * disable ut	4 years ago
Thunderbrook	cb66c53c2d	dump to cpu (#30750 ) * dump to cpu * format * format * format	4 years ago
Chengmo	d3fac0ea85	fix int64 bug (#30780 ) fix push sparse int64 bug	4 years ago
Qi Li	69875dc42c	[ROCM] update fluid memory for rocm35 (part1), test=develop (#30758 )	4 years ago
QingshuChen	c35a9880f9	fix malloc L3 failed bug for kunlun (#30745 ) * fix malloc L3 failed bug for kunlun * minor	4 years ago
WangXi	31ed9c9eed	Fleet distributed strategy support pure fp16 (#30754 )	4 years ago
Zhen Wang	53d01afed6	Fix the nan bug when passing all zero values into clip_by_norm_op. (#30777 )	4 years ago
ShenLiang	3858f458ea	rm Singleton of reducer (#30775 )	4 years ago
Qi Li	f89da4ab45	[ROCM] update fluid platform for rocm35 (part1), test=develop (#30639 ) * [ROCM] update fluid platform for rocm35 (part1), test=develop * address review comments, test=develop	4 years ago
Wojciech Uss	fc00240575	A fix for oneDNN matmul kernel. Fixes issue #30309 (#30723 )	4 years ago
lidanqing	46989e889b	Fix python3 incompatibility issues (#30698 ) * solve python3 incompatibility issues * update checksum	4 years ago
alncat	5b59499e57	fixed compilation error on gcc 4.8.x due to the usage of isfinite (#30733 )	4 years ago
Chengmo	78d37c3f75	【Paddle.Fleet】Fix brpc get hostname (#30703 ) * fix Brpc get hostname	4 years ago
taixiurong	caf3680bbc	fix bugs in transformer predict in xpu place (#30730 ) * transformer predict * trans bug fix	4 years ago
jakpiase	f8da5536ed	REUPLOAD Added vanilla LSTM and LSTM with peepholes oneDNN fp32 kernel (#30719 ) * added external reorder to profiler * resolved conflict * added enable_static * initial version of lstm, not working yet * added lstm to operators.cmake * added vanilla lstm mkldnn op * added peephole weights integration * minor changes * added formatting * added fusion_lstm_mkldnn to static_whitelist * added formatting * removed comment * moved use_peepholes attribute inside is_cached block * reverted wrong changes * minor formatting change * minor changes * changed stream handling * minor change * added datatype to GetExpectedKernelType() * added reading stream from TLS	4 years ago
liuyuhui	67abfc1588	[Kunlun] fix dead lock for exec_op_count_ (#30718 )	4 years ago
alncat	5ace20fc3f	modified conv+bn fuse pass to fix wrong mask in mask rcnn (#30704 )	4 years ago
Tao Luo	824a79d383	Revert "Added vanilla LSTM and LSTM with peepholes oneDNN fp32 kernel (#30661 )" (#30708 ) This reverts commit `d834f4e6e8`.	4 years ago
lilong12	7fbc68a2c0	update, test=develop (#30692 )	4 years ago
jakpiase	d834f4e6e8	Added vanilla LSTM and LSTM with peepholes oneDNN fp32 kernel (#30661 ) * added external reorder to profiler * resolved conflict * added enable_static * initial version of lstm, not working yet * added lstm to operators.cmake * added vanilla lstm mkldnn op * added peephole weights integration * minor changes * added formatting * added fusion_lstm_mkldnn to static_whitelist * added formatting * removed comment * moved use_peepholes attribute inside is_cached block * reverted wrong changes * minor formatting change * minor changes	4 years ago
arlesniak	5bf25d1e8b	More precise mkldnn kernel rules in GetExpectedKernelType (#29840 ) * More precise mkldnn kernel choice in GetExpectedKernelType * Fixes after review * Refresh develop for CI * CI experiment * get back from CI exper	4 years ago
Jacek Czaja	173660be7b	[oneDNN] Cache oneDNN stream not to recreate in each oneDNN op (#30358 )	4 years ago
Shang Zhizhou	ae0f88a988	add DLA support：C++&&Python api (#30165 ) * add dla * add dla done * add python api Co-authored-by: shangzhizhou <root@szth-rp-fanyi-opera49.szth.baidu.com>	4 years ago
chentianyu03	fb7fbc7a5d	fix abs bug and add abs test case (#30637 ) * add abs test case * use std::abs to fix abs bug * fix the abs bug * fix abs bug	4 years ago
ShenLiang	9514b4aa5f	Fix scatter grad bug (#30604 )	4 years ago
Pei Yang	cf9bdb9404	extend trt ut timeout threshold (#30537 )	4 years ago
Thunderbrook	1bebc09253	solve build gpu task core (#30626 ) * build gpu task core * format	4 years ago
石晓伟	33bf6eb753	revert external gflags, test=develop (#30623 )	4 years ago
Jacek Czaja	dfdb0359ea	- Disabling oneDNN inplace pass (#30588 )	4 years ago
TTerror	10271ddfc4	support reduce_max op on kunlun (#30581 ) * support reduce_max op on kunlun * support reduce_max op on kunlun * support reduce_max op on kunlun * support reduce_max op on kunlun	4 years ago
QingshuChen	5013c67644	fix softmax bug for multi_card in kunlun (#30600 )	4 years ago
wuhuanzhou	7e671c07b6	optimize unity build (#30195 ) * optimize unity build, test=develop * fix code style error, test=develop * fix code style error and test /MP settings, test=develop	4 years ago
liuyuhui	e5b0d9e1fc	[Kunlun] Add condition_variable and notify() in BindThreadedSSAGraphExecutor (#30586 )	4 years ago
Zhou Wei	9674e440e2	optimize windows CI, clear tp cache,polish code,improve level of msvc log (#30579 )	4 years ago
wanghuancoder	90773473a0	use nvtx push pop in timeline (#30567 ) * delete empty line of pybing.cc, test=develop * use nvtx push pop in timeline, test=develop * change year, test=develop * add #ifdef PADDLE_WITH_CUDA, test=develop * add #ifndef WIN32, test=develop * is_pushed to is_pushed_, test=develop	4 years ago
chentianyu03	358106fcb0	make abs op support complex types (#30375 ) * rewrite abs op * rewrite abs op and remove abs in activation * remove abs register in old codes * fix abs_grad type error * fix abs double_grad output name error * modify abs_grad, abs_grad_grad functor for windows building * format code style * fix the bug of result is nan when the divisor is zero * add missing abs attr and add abs for float16	4 years ago
Wilber	2d5758c456	update. (#30585 )	4 years ago
Tao Luo	9dd71c74df	disable test_analyzer_detect (#30541 )	4 years ago
tangwei12	c9e78a22c5	add trainers for pserver (#30523 ) * add trainers for pserver Change-Id: I1a75793ec81ce126d07f4c47cae09b95d530bbc8	4 years ago
wanghuancoder	d1b25ed9d7	add some RecordEvent, for dygraph timeline (#30299 ) * add some RecordEvent, for dygraph timeline, test=develop * change GpuMemcpySync to memory::Copy, test=develop * fix compile problem, test=develop * fix compile problem, test=develop * fix, test=develop * fix, test=develop	4 years ago
YUNSHEN XIE	bbea5a1fa9	The new unit test cannot have the same name as the existing unit test (#29878 ) * check UT Duplicate name * fix error * Optimized log display * modified exit code	4 years ago
liym27	ff25c5b36f	Fix bug: GetAttrValue should deal with attr with attrType vector<double> (#30536 )	4 years ago
WangXi	572c466d19	[Prepare for MultiProcess xpu] unified gen nccl id, refine imperative reducer (#30455 )	4 years ago
ykkk2333	549855ac20	add rmsprop_op_xpu test=kunlun (#30493 ) * add rmsprop_op_xpu test=kunlun * modified rmsprop_op_xpu error code. test=kunlun	4 years ago
Zhou Wei	fb20ec9a4e	fix bug of multicard grad ncclAllReduce (#30553 )	4 years ago
Zhen Wang	f30d00553a	Fix the compiling error of update_loss_scaling when using cuda9. (#30538 )	4 years ago
Leo Chen	81217a94d8	unify calling cudaSetDevice (#30470 ) * unify calling cudaSetDevice * fix compile	4 years ago
pangyoki	00554b3f6b	fix error message of Inplace strategy (#30520 )	4 years ago
Leo Chen	7043b8cfc6	support layer_norm fp16 in dygraph amp (#30430 ) * support layer_norm fp16 in dygraph amp * add ut * refine code	4 years ago
wanghuancoder	59ad6ff3e3	delete empty line of pybing.cc, test=develop (#30529 )	4 years ago
hutuxian	e207fe6385	Ascend Framework Part2: pybind files (#30410 )	4 years ago
hutuxian	40ede12631	Ascend Framework Part1: OP & Wrapper (#30281 )	4 years ago
liuyuhui	843dc3cdbd	[Kunlun]PR3: add xpu executor, multi xpu card train function optimization (#30317 )	4 years ago
QingshuChen	8489d4f76f	optimize batch_norm & pool op for kunlun (#30490 )	4 years ago
wanghuancoder	bd97192274	if pybind.cc changed, generate total report, test=develop (#30514 )	4 years ago
taixiurong	5e5c2827a3	fix range op crash in dygraph xpu place (#30469 )	4 years ago
JZ-LIANG	16ba0abc79	Recompute Offload: fixed bug in memcpy (#30484 )	4 years ago
guofei	11e78ebaa3	Modify the calculation logic of LambOptimizer (#29313 ) * Modify the calculation logic of LambOptimizer	4 years ago
Adam Osewski	c5ffad126c	[oneDNN] Refactor fuse pass helper functions to one place. (#30460 ) * Move pass tester helper functions to single common place. * Use helper functions in two more fuse pass tests.	4 years ago
Zhang Ting	c9a334e1b3	add VecCastCUDAKernel (#30296 )	4 years ago
pangyoki	13d757362c	Add Inplace strategy (Output reuse Input Varbase) in dygraph (#30103 ) * add view strategy on squeeze,unsqueeze,reshape,flatten * add squeeze unittest * add unittests * use View strategy as name rather than Reuse Allacation * fix view api doc * fix format * use core.ops when input of reshape2 is Tensor * fix test_cross_entropy_loss error because of reshape2 * fix test_cross_entropy_loss error because of reshape2 * add inplace strategy * add elementwise_add sub * let backward op not use inplace * grad op do not use inplace * fix memory increase error and add leaf error message * delete selected_rows * change op_function * little change * solve HandleViewBetweenInputAndOutput * add unittest and leaf error message * merge view error * optimize op_function_generator format and support sum inplace op * fix format of basic_engine * fix format for framework * little change of variable wrapper * add reshape, squeeze, unsqueeze, scatter api * add relu elu tanh softmax inplace api * fix test_squeeze_op unittest * fix test_relu_op unittest * fix comment problems * delete sample code of inplace api * add reference of grad_pending_nodes in basic_engine * fix unittest name * add inplace apis into wlist * fix error message * add PADDLE_ENFORCE for set grad op twice * fix head file error	4 years ago
Yang Zhang	008b0a8b56	Fix float64 bug in layer norm (#30452 ) built-in `rsqrt` is shadowed	4 years ago
石晓伟	715d862868	export global google flags to users, test=develop (#30448 )	4 years ago
Wojciech Uss	88fc7a7d68	fix cache key for inplaced elementwise ops (#30404 )	4 years ago
wawltor	3d49882e2c	fix the rnn mask memory bug for out of read (#30459 ) * fix the rnn mask memory bug for out of read * update the code for the rnn	4 years ago
taixiurong	6a3c8725b0	support transformer v2.0 (#30381 )	4 years ago
ShenLiang	e85be1b1b2	fix flatten api grad (#30426 )	4 years ago
yaoxuefeng	6e0da01c61	Heter ps new (#30198 )	4 years ago
123malin	2a98e9323a	test=develop, add distributed_infer (#30300 ) * test=develop, add distributed_infer	4 years ago
QingshuChen	cf786d22ec	fix bug that cann't find mkldnn(kunlun) (#30394 )	4 years ago
cc	8e3a294045	skip quantizing ops in cpu inference (#30342 ) * skip quantizing ops in cpu inference, test=develop	4 years ago
alncat	7bbf3ac5ab	Added support for inference using quantization aware trained dygraph (#30288 ) * added support for inference using qunatization aware trained dygraph * added support for inference using qunatization aware trained dygraph correct boost get usage * Delete incorrect warning message (#30196) * fix warning and no grad * clean redundant API alias in 2.0 - part 2 (#30013) * delete paddle.nn.functional.assign * fix dynamic to static error * just add the op error message for the matmul xpu (#30246) add the op error message for the matmul xpu * Add Static Variable Clone (#30208) Add clone method for static Variable so that this interface will be same as dygraph. It fixed some bugs in dy2stat * use wget to replace curl to download the lcov file (#30229) * use wget to replace curl to download the lcov file * add cache for lcov * fix test_pool3d_op timeout issue (#30248) * Fix unittests bugs. (#30250) * modify error message based on comments (#30189) * modify error message based on comments * edit code according to review. * Correct spelling according to review. * Fix bug for 'save mutiple method' (#30218) * Fix bug for 'save mutiple method' * To pass coverage. * edit code to pass coverage. * edit code to pass coverage. * add unittest for coverage. * change for coverage. * edit for coverage. * added support for inference using qunatization aware trained dygraph * Alias from paddle.fluid.layers.auc to paddle.static.auc (#30206) * add alias from fluid.layers.auc to static.auc * Update __init__.py * added support for inference using qunatization aware trained dygraph correct boost get usage * corrected boost get usage * corrected naming issues and enforcing zero check * correct paddle enforce message * added more error checkings * corrected error report message and optimized code * corrected findvar usage * corrected paddle_enforce in scope * correct error messages * correct error reporting format Co-authored-by: LielinJiang <50691816+LielinJiang@users.noreply.github.com> Co-authored-by: XiaoguangHu <46782768+XiaoguangHu01@users.noreply.github.com> Co-authored-by: wawltor <fangzeyang0904@hotmail.com> Co-authored-by: Huihuang Zheng <zhhsplendid@gmail.com> Co-authored-by: YUNSHEN XIE <1084314248@qq.com> Co-authored-by: Bai Yifan <me@ethanbai.com> Co-authored-by: gongweibao <weibao.gong@gmail.com> Co-authored-by: WeiXin <weixin10@baidu.com> Co-authored-by: Jiaqi Liu <liujiaqi06@baidu.com>	4 years ago
GaoWei8	180877e988	Softmax backward optimize (#30249 ) * softmax backward optimize	4 years ago
Zhou Wei	b1d8ff45d7	running unit test sigle GPU parallely on Linux/windows GPU (#29523 )	4 years ago
Zhang Jun	10a8f3e5c3	fix bug on compiling inference shared lib with crypto;test=develop (#30269 ) * fix bug on compiling inference shared lib with crypto;test=develop * fix cmake bug when build inference lib using -DWITH_CRYPTO=OFF * update cmake * remove unnecessary enforce message	4 years ago
Huihuang Zheng	28e156c27f	Fix Sleep Error in enforce.h (#30335 ) usleep function in <unistd.h> only takes argument less than 1,000,000. Current call can exceed this limit, we have to fix it. This PR can fix random CI error.	4 years ago
Leo Chen	3d015f1cf5	Set expected place in child thread for dataloader to avoid costing cuda memory on other card (#30338 ) * set expected place in child thread for dataloader * set device id when set tensor from numpy * revert tensor_py change * add compile guard * fix ci * fix bug	4 years ago
QingshuChen	2c1bba02e4	optimize memcpy perf for kunlun (#30291 ) * optimize memcpy perf for kunlun * remove useless unitest for kunlun mean * minor	4 years ago
ShenLiang	a60f17b89d	Support unused parameters in dynamic graph distributed (#30224 )	4 years ago
JZ-LIANG	75936d838f	Recompute Offload (#30233 )	4 years ago
lidanqing	a60893f6b5	correct the allowed dimension size (#30326 )	4 years ago
Chen Weihang	c8c8f205ba	remove c++ stacktrace hint (#30325 )	4 years ago
tangwei12	5e839e4da5	add sparse embedding & load vars for 2.0 & gloo bug fix (#30306 ) * add sparse embedding & load vars for 2.0 Change-Id: I36b59ed5f015189dc9d9d2e34a9357722d369f1b * fix hdfs gloo Change-Id: Ia84d579053720ad804183e54c9a04b4f031c79c6 * fix gloo hdfs Change-Id: I5ab982fd483cddc10adcdef0b8aa83aca976cb9e * move loadvar/sparse embedding from incubute to static Change-Id: I57081d3545ad2efab78c72420d2162c0eacaf3a0	4 years ago
tangwei12	25f80fd304	Fix/distributed proto (#29981 ) * rename sendrecv.proto to namespace paddle.distributed * split ps with distributed	4 years ago
Chengmo	d479ae1725	【Paddle.Fleet】Support local save sparse param (#30175 ) * add save tensor support Co-authored-by: seiriosPlus <tangwei12@baidu.com>	4 years ago
Double_V	231501fefc	fix elugradgrad test fail & error message opt (#30171 ) * fix elugradgrad test fail and error message opt * fix unitest,test=develop * Update prroi_pool_op.h fix error message * opt message,test=develop * fix ci fail,test=develop	4 years ago
Zhen Wang	fb49ea388e	Fix the accuracy problem of allclose op when using float64 data type in static mode. (#29890 ) * Fix the accuracy problem of allclose op when using float64 data type in static mode. * Format the code style.	4 years ago
yaoxuefeng	4656525e24	fix datanorm error msg (#30294 )	4 years ago
furnace	77051cc9f0	add fp16 support for tril_triu op (#30186 )	4 years ago
石晓伟	efa54629fb	fix header file paths of gflags, commit 3, test=develop (#30273 )	4 years ago
Chengmo	5b2c15afcd	Fix server.h include device_context (#30243 ) * fix cmake Co-authored-by: seiriosPlus <tangwei12@baidu.com>	4 years ago
石晓伟	a0ee09148e	enhance error msgs of fusion_seqpool_cvm_concat_op.cc, test=develop (#30240 )	4 years ago
石晓伟	a66eebab5c	fix header file paths of gflags, commit 4, test=develop (#30274 )	4 years ago
石晓伟	8c4500ff6d	fix header file paths of gflags, commit 2, test=develop (#30272 )	4 years ago
liym27	b4989fb744	Support vector<double> as type of op attribute and op set_value suppport vector<double> as value (#30126 )	4 years ago
wangchaochaohu	8dcae0c55d	register OPMaker and Infer Shape Check for fused_elementwise_add (#30259 )	4 years ago
AshburnLee	924aac2216	Add tf32 switch for cuDNN (#29192 )	4 years ago
石晓伟	8ce2482b80	fix header file paths of gflags, commit 1, test=develop (#30271 )	4 years ago
chentianyu03	c7371b7b20	type promotion for grad (#30177 ) * type promotion for grad * add type promotion for div op	4 years ago
liym27	3ce878f309	Check the rank of input in kernel of set_value op (#30147 )	4 years ago
WeiXin	66dc4ac77b	modify error message based on comments (#30189 ) * modify error message based on comments * edit code according to review. * Correct spelling according to review.	4 years ago
wawltor	fee424411a	just add the op error message for the matmul xpu (#30246 ) add the op error message for the matmul xpu	4 years ago
GaoWei8	0a21924a8d	optimize softmax forward (#30217 ) * optimize softmax forward	4 years ago
wangchaochaohu	af80859dd6	reduce the occupied size of memory for the fused pattern of elementwise_add Op and activation Op(relu Op for example) (#29885 )	4 years ago
zhang wenhui	5932fee60a	enhance error message, test=develop (#30220 )	4 years ago
pangyoki	da16b33f2e	add View(reuse allocation) strategy on squeeze, unsqueeze, reshape, flatten op (#29913 ) * add view strategy on squeeze,unsqueeze,reshape,flatten * add squeeze unittest * add unittests * use View strategy as name rather than Reuse Allacation * fix view api doc * fix format * use core.ops when input of reshape2 is Tensor * fix test_cross_entropy_loss error because of reshape2 * delete selected_rows * change op_function * little change * solve HandleViewBetweenInputAndOutput	4 years ago
Jacek Czaja	4aba17b5db	[oneDNN] Added UT for testing elementwise_mul caching (#30203 ) * - Added UT for testing elementwise_mul caching * lint fixes	4 years ago
Zhen Wang	7f7dfccf20	Support pure fp16 training for AMP API. (#29544 ) * add cast ops before and after unsupported fp16 ops. * Keep partial net in FP32 pattern. * Support check_finite_and_unscale and update_loss_scaling for FP16 calculation mode. * Add fp16 support for adam op. * add multi precision attr for adam. * Fix the bug of test_multi_precision_fp16_train UT. * Code format for CI. * Fix the redefine error about MPTypeTrait on windows. * fix bugs of the _create_accumulators func in Momentum. * fix bug when inserting post cast op. * Add the update_loss_scaling op in allow_set of UnusedVarCheck. * Update for ci coverage. * Add some doc for OptimizerWithMixedPrecision. * Fix the code style. * Imporve the doc of `amp_init`. * Change for fp16 testing if users have the infer program defined in separate way.	4 years ago
Leo Chen	789743e190	use cuda generator in bernoulli cuda kernel (#30199 )	4 years ago

18556 Commits (52b05baca349d1bbfcbb6ed78b289d6c66dbec3e)