Paddle

Commit Graph

Author	SHA1	Message	Date
Qi Li	59940cb383	[ROCM] update fluid operators for rocm (part8), test=develop (#31309 )	4 years ago
tangwei12	5d7a8b05f8	fix sycn training error (#31357 ) * fix sycn training error Change-Id: Ie2feebcf0b5b2984fd59cfcdde0c817840e203d2	4 years ago
Qi Li	ec72f5b235	fix ELU output for nan, test=develop (#31132 )	4 years ago
Qi Li	65bcaeb004	[ROCM] update fluid operators for rocm (part5), test=develop (#31258 ) * [ROCM] update fluid operators for rocm (part5), test=develop * address review comments, test=develop * fix typo, test=develop	4 years ago
YUNSHEN XIE	2111d912d4	Decrease threshold for failed ut retry (#30903 ) * Decrease threshold for failed ut retry * retry Method upgrade * second method upgrade * fix error * Remove the comment lines * test for modified_retry_times * fix error * fix some error * fix error * fix error * remove test content * fix error * Reduce duplicate code * fix more than 10 ut failed bug * fix more than 10 ut failed bug on mac	4 years ago
Pei Yang	2e9e3fad15	add n-d input support for trt scale converter (#31316 ) * add n-d input support for trt scale converter * add flatten for ut * fix dims	4 years ago
Shang Zhizhou	6404c43814	support trt serialize when load model from memory (#31342 ) * support trt serialize when load model from memory * delete conv_bn_fuse_pass before tensorrt, with which trt serialize engine id is not stable * Revert "delete conv_bn_fuse_pass before tensorrt, with which trt serialize engine id is not stable" performance degradation, fix in the future This reverts commit fa6cd17e60b15df351efda379ddd00e9e9c1fea9. * add delete conv_bn * delete path when delete_cache_files	4 years ago
Gradie	d79fdc3d62	lamb_op_xpu;test=kunlun (#31012 ) * lamb_op_xpu;test=kunlun * modify lamb_op_xpu.cc;test=kunlun * delete atol lamb_op_xpu; test=kunlun * update xpu.cmake;test=kunlun * test_error 1e-5,lamb_op_xpu;test=kunlun * error1e-5,lamb_op_xpu,test=kunlun * delete atol lamb_xpu;test=kunlun * modify atol,lamb_op_xpy;test=kunlun * lamb_op_xpu;test=kunlun * lamb_op_xpu;test=kunlun * lamb_op_xpu, XPUOptest;test=kunlun * lamb_op_xpu;test=kunlun * lamb_op_xpu;test=kunlun * lamb_op_xpu;test=kunlun * lamb_op_xpu;test=kunlun * lamb_op_xpu;test=kunlun * lamb_op_xpu;test=kunlun * lamb_op_xpu;test=kunlun * lamb_op_xpu;test=kunlun * lamb_op_xpu;test=kunlun * lamb_op_xpu;test=kunlun * lamb_op_xpu;test=kunlun * lamb_op_xpu,modify xpu_cmake; test=kunlun * lamb_op_xpu;test=kunlun * lamb_op_xpu,modify xpucmake;test=kunlun	4 years ago
danleifeng	d1075df2e8	topo and memory performance for heterps (#30440 ) * topo and memory performance for heterps; test=develop * add trainwithprofiler in heter trainier; test=develop	4 years ago
Qi Li	72d99c5dcd	[ROCM] update fluid operators for rocm (part4), test=develop (#31225 )	4 years ago
cucuzg	91635de390	opt matmul and matmul_v2 on kunlun, test=kunlun (#31326 ) add clip_by_norm on kunlun, test=kunlun opt matmul and matmul_v2 on kunlun, *test=kunlun	4 years ago
Wilber	e20234094c	Fix xpu compile and cipher symbol problem. (#31271 )	4 years ago
wuhuanzhou	30858d8974	fix compilation errors for missing brpc header files, test=develop (#31325 )	4 years ago
石晓伟	625482f752	inference modification for custom operator, test=develop (#31312 )	4 years ago
wuhuanzhou	a13f1d6930	optimize unity build (#31119 ) * optimize unity build, test=develop * fix compilation error on Windows, test=develop * fix compilation error, test=develop * fix code style error, test=develop	4 years ago
jiangcheng	8f4ac6b525	optimize topk op through limit SortTopK kernel entrance, test=develop (#30403 )	4 years ago
alncat	bfb8a64234	updated conv bn fuse pass to make it compatible with latest batch_norm op (#31272 )	4 years ago
Chen Weihang	5610c1717e	fix dtype unmatched (#31305 )	4 years ago
Qi Li	9b016c7cb7	[ROCM] update fluid operators for rocm (part2), test=develop (#31211 )	4 years ago
niuliling123	2fd999d979	Optimized the adaptive_avg_pool2d op when output_size == 1 (#31197 ) * Optimized the adaptive_avg_pool2d op when output_size == 1	4 years ago
石晓伟	1da3280660	inference modification for custom operator, test=develop (#31283 )	4 years ago
Zhou Wei	af9066e89c	[Custom OP]add PD_THROW and PD_CHECK for User Error message (#31253 ) * [Custom OP]add PD_THROW and PD_CHECK for User error message * PD_THROW and PD_CHECK, fix comment * fix Windows error message * fix Windows error message * fix CI	4 years ago
石晓伟	8c94d8cb4c	[Custom OP] change the user header file format, test=develop (#31274 )	4 years ago
Jiabin Yang	038ce70d69	[Custom OP] Support stream set on Custom Op (#31257 )	4 years ago
Jiabin Yang	0c38708a90	[Custom Op] Remove unsupport dtypes (#31232 ) * remove remove_unsupport_dtype * remove remove_unsupport_dtype * remove test dtype * add more include * change dtype.h's enum as enum class to avoid conflict with inference lib * make enum as enum class * remove additional test * merge develop * polish code	4 years ago
WangXi	b8bce682e0	xpu support fuse allreduce (#31104 )	4 years ago
Chen Weihang	126633c50f	[CustomOp] Split build op marco & polish details (#31229 ) * split build op marco & polish details * revert register api del * fix other unittest	4 years ago
tangwei12	903235945b	loglevel adjustment for distributed training (#31205 ) Change-Id: I6210ce9c60bed48f3323c47b16500302b66cedf2	4 years ago
Qi Li	28b356b9a2	[ROCM] update fluid framework for rocm (part6), test=develop (#31015 )	4 years ago
Qi Li	c8fac5ee30	[ROCM] update fluid framework for rocm (part5), test=develop (#31014 )	4 years ago
Qi Li	580447d019	[ROCM] update fluid framework for rocm (part4), test=develop (#31013 )	4 years ago
Wilber	7d91974c91	enable lite ut. (#30890 )	4 years ago
Guanghua Yu	d18c5e47f3	fix ignore_index check in softmax_with_cross_entropy (#31201 )	4 years ago
chentianyu03	ca3b6bcf78	add cache for VariableWrapper (#30880 ) * add cache for VariableWrapper * modify args names and vlog level * format code style * add log when set cache to variable_wrapper * add log when set cache to variable_wrapper * add comment to variableWrapper cache * format code style	4 years ago
wangchaochaohu	f114c3f8ca	fix the branch of code choose (#31200 )	4 years ago
joanna.wozna.intel	d11602481c	Add bf16 gru model test (#31158 )	4 years ago
jakpiase	2f1165342b	OneDNN hardswish integration (#30211 )	4 years ago
Chen Weihang	e8cdb49aa9	[CustomOp] Support attributes as func input in custom op (#31128 ) * add simple attr support and test * add int, float attr support * support other attribute * add custom attrs test in cmake * polish details * fix test failed * add backward test * update test flags	4 years ago
Zhou Wei	ffbf71359a	modify custom op dependent from paddle_framework to paddle_custom_op (#31195 )	4 years ago
Leo Chen	0f1fde5102	fix the modification of set_expected_place (#31177 ) * revert the modification of set_expected_place * set device before op run * add ut	4 years ago
lilong12	dc8dfba35b	align the default value of some configuration for fleet to that of single cards (#30740 ) * update, test=develop	4 years ago
lilong12	a373aa7645	fix the bug in expand_v2 op (#30984 ) * update, test=develop	4 years ago
Thunderbrook	c4f279fe8d	support multi node in heterps (#31102 ) * push multi node * multi node * MultiThread * remove log * solve bug in 30829	4 years ago
liu zhengxi	ae2be49f40	Add cublas_handle() to expose cublas_handle to ops (#31157 ) * add get_cublas_handle() api * update format * add unittests * alter function name	4 years ago
Pei Yang	00b09e86ac	[Paddle-TRT] support group_norm (#31040 ) * add group norm plugin * fix compile problems * move concat axis check to trt op teller * add nbDims for scale and bias nv dims * add group norm unit test * fix unittest * add trt version restriction for group norm op teller * fix unittest	4 years ago
Chen Weihang	1ce96fa118	[CustomOp] Add new paddle custom op so (#31141 ) * add new custom op so * fix use new method error * fix test failed	4 years ago
tangwei12	ebbdf52557	fix entry (#31079 ) * fix entry * fix distributed lookup table fuse case * fix entry bug at first time * move entry from paddle.fluid -> paddle.distributed * fix ut with paddle.enable_static() Co-authored-by: malin10 <malin10@baidu.com>	4 years ago
Qi Li	ee76ea72de	[ROCM] update fluid collective op for rocm, test=develop (#31075 )	4 years ago
yaoxuefeng	d8fa65a3a8	fix heter compile (#30518 )	4 years ago
Zhou Wei	4b220550ef	[Custom OP]Fix problem of custom op unitests on Windows CI (#31114 ) * fix some problem of Windows custom op * fix some problem of Windows custom op * fix some problem of Windows custom op	4 years ago
Zhou Wei	be61c2d06b	support build whl and inference library nightly,test=windows3 (#30616 )	4 years ago
alncat	5d6a8c7b73	added support for fake_quantize_dequantize_abs_max op in quantization… (#30896 ) * added support for fake_quantize_dequantize_abs_max op in quantization inference pass * remove const_cast to pass ci * remove compare operator to pass ci-coverage * added detailed error message for unregistered tensorrt_subgrah_pass	4 years ago
Jacek Czaja	d3f09ad702	Update of onednn to 2.2 (#31067 )	4 years ago
Guanghua Yu	24ba5ee05c	merge develop conflict (#31122 )	4 years ago
Qi Li	cced930b61	[ROCM] update fluid operators for rocm (part1), test=develop (#31077 )	4 years ago
wangchaochaohu	364cfa2686	fix windows for optimization of elementwise_add Op (#31068 ) * fix windows for optimization of elementwise_add Op	4 years ago
joanna.wozna.intel	781df300d0	Unification of BF16 enablement process (#31034 ) * Unification of bfloat16 enablement process and refactor * Remove unnecessary function * Standardize the output name search	4 years ago
Zhong Hui	16fe11d71e	fix softmax cross entropy integer overflow (#30590 ) [BUG FIX] Fix softmax cross entropy overflow problem.	4 years ago
Zhou Wei	44ee251fde	fix UNIX cmake problem (#31113 )	4 years ago
Qi Li	a60d93fb77	[ROCM] update fluid framework for rocm (part2), test=develop (#31010 )	4 years ago
Thunderbrook	565354f676	support save multi sparse table in one path (#31108 ) * save multi table one path * format	4 years ago
Qi Li	50967135a5	[ROCM] update fluid framework for rocm (part3), test=develop (#31011 )	4 years ago
Qi Li	8fe09faf14	[ROCM] update fluid framework for rocm (part1), test=develop (#31009 )	4 years ago
Qi Li	334296306c	[ROCM] update fluid platform for rocm39 (part4), test=develop (#30936 )	4 years ago
Shang Zhizhou	a5c56d83a1	update trt int8 calibrator to IEntropyCalibratorV2 (#31060 ) * update trt int8 calibrator to IEntropyCalibratorV2 * add delele opt_cache for trt_split_converter_test	4 years ago
Zhou Wei	adaec0073d	[2.0Custom OP]Support New Custom OP on Windows (#31063 ) * [2.0.1]Support New Custom OP on windows * fix CI * fix code style * fix CI * fix CI * fix coverage * fix CI * fix CI	4 years ago
Qi Li	1d996637e6	[ROCM] update fluid imperative for rocm (part1), test=develop (#31017 ) * [ROCM] update fluid imperative for rocm (part1), test=develop * [ROCM] update reducer.cc after merge, test=develop * update reducer cmake after merge, test=develop	4 years ago
JamesLim	b95eb38b8a	fix the bug in backward OP of index_sample. (#31026 )	4 years ago
Chengmo	6b3371e0c7	Remove PE special profiler (#30886 ) * remove pe special profiler * add profiler info	4 years ago
Chen Weihang	6beeafe797	[CustomOp] Add more dispatch marco for users (#31058 ) * add more dispatch marco * add more dispatch marco * add more tests * revert unneeded change * add timeout for test dispatch * add float and complex test * remove and marco	4 years ago
TTerror	d5323dab41	add squeeze_op/unsqueeze_op on kunlun;fix conv op and parallel executor;optimize lookup_table op (#31056 ) * add squeeze_op/unsqueeze_op on kunlun; fix conv op and parallel executor on kunlun; optimize lookup_table op on kunlun * update squeeze/unsqueeze op	4 years ago
123malin	16b4260b2f	test=develop, save/load, shrink (#30625 ) * test=develop, save/load, shrink Co-authored-by: seiriosPlus <tangwei12@baidu.com>	4 years ago
Jiabin Yang	628451af06	hide useless headers and add complex support (#31074 )	4 years ago
Wilber	463eae0383	update paddle_fluid.so to paddle_inference.so (#30850 ) * update paddle_fluid.so to paddle_inference.so	4 years ago
liym27	5b367dab44	[static setitem] Support the index is Tensor; step>1; step<0 .(#30949 ) * [static setitem] support the index step > 1. tensor_a[::3] = value * [static setitem] support the index step < 0. Eg: tensor_a[::-3] = value * [static setitem] support the index is Tensor. eg: tensor_a[tensor_3:0:-1] = value * Add op version.	4 years ago
Qi Li	eb3050fa9a	[ROCM] update fluid inference for rocm (part1), test=develop (#31018 )	4 years ago
Jacek Czaja	f7465641c3	Added reshape grad bf16 (#31035 ) * - added Reshape grad bf16 * - Added reshape grad bf16 * - cosmetics in py	4 years ago
Wojciech Uss	615d8a2264	Modify relu native implementation 2 (#30996 ) * Modify relu native implementation * fix GPU performance	4 years ago
ShenLiang	9401173e3a	Remove scale loss before reduce in dygraph (#30807 )	4 years ago
Wilber	0020d91506	fix python pass builder error. (#30946 )	4 years ago
Wilber	39aeaa160e	fix jetson problem (#30939 )	4 years ago
Wilber	01ccfbcde9	update trt error message when input height or width is -1 (#31019 )	4 years ago
Wilber	cf8b8f9c5e	resolve memory leak in cudnn8.0 (#31029 )	4 years ago
Guanghua Yu	5b267474a9	add offset parameter in roi_align,generate_proposals.etc ops (#30864 ) * add parameter in roi_align op	4 years ago
Chen Weihang	75f81233ae	fix regex error & simplify marco name (#31031 )	4 years ago
Zhang Ting	f0ee159280	enable exhaustive_search for forward and backward algos when dtype is float16 (#30959 ) * enable exhaustive_search for input_grad when dtype is float16 * enable exhaustive_search for forward algos	4 years ago
Pei Yang	9b54fe4154	add trt transpose and flatten converter (#31022 )	4 years ago
joanna.wozna.intel	caf9d39839	Add Conv Transpose BF16 (#30877 ) * Add conv transpose BF16 * Share function GetWeightsTz * Adjust to review and fix op compatibility * Add bias to unique handler name * Remove errors related to paddle enforce * Add conv2d_transpose to bf16 list and kernel refator	4 years ago
Chen Weihang	f649442ddd	New custom operator extension mechanism (#30690 ) * initial commit: simple demo * polish copyright format * add grap op simple demo * adapt uncertain number of argument * change trait marco name * add place & dtype support for add kernel * add dispath and infershape func * poish code & add notes * add dynamic_loader dep for paddle_framework * add new custom op test dir * polish impl details * add unittest for new custom op * fix failed unittest * Costum op (#1) * fix compile error * wrap framework tensor with LoDTensor * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * add CustomTensor default constructor * add size() for CustomTensor * make size const for CustomTensor * refactor place related api to circle the concept * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * make place const * make Tensor copy * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * remove additional head of framework * use back to shared ptr for custom tensor * use back to shared ptr for custom tensor * use back to shared ptr for custom tensor * use back to shared ptr for custom tensor * use back to shared ptr for custom tensor * use back to shared ptr for custom tensor * add gpu test * merge latest cwh code in * adjust ut code of custom op * adjust ut code of custom op * adjust ut code of custom op * 
Remove ShareData from user && Change CustomTensor to Tensor && Support more data type (#2) * fix compile error * wrap framework tensor with LoDTensor * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * add CustomTensor default constructor * add size() for CustomTensor * make size const for CustomTensor * refactor place related api to circle the concept * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * make place const * make Tensor copy * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * remove additional head of framework * use back to shared ptr for custom tensor * use back to shared ptr for custom tensor * use back to shared ptr for custom tensor * use back to shared ptr for custom tensor * use back to shared ptr for custom tensor * use back to shared ptr for custom tensor * add gpu test * merge latest cwh code in * adjust ut code of custom op * adjust ut code of custom op * adjust ut code of custom op * adjust ut code of custom op * adjust ut code of custom op * hid share data from and to * rename CustomTensor to Tensor * refactor register design & add test * change op_funtion to op_meta_info * split op meta info into .h and .cc * move get methods into friend class * move OpMetaInfoHelper into framework space * move CustomTensorUtils into framework space * change pybind api name * move PD C API 
into op meta info * add register custom op api * remove inference cmake change * refactor copy to api && change Reshape to lowercase && support more dtype && add more test (#3) * fix compile error * wrap framework tensor with LoDTensor * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * add CustomTensor default constructor * add size() for CustomTensor * make size const for CustomTensor * refactor place related api to circle the concept * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * make place const * make Tensor copy * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * remove additional head of framework * use back to shared ptr for custom tensor * use back to shared ptr for custom tensor * use back to shared ptr for custom tensor * use back to shared ptr for custom tensor * use back to shared ptr for custom tensor * use back to shared ptr for custom tensor * add gpu test * merge latest cwh code in * adjust ut code of custom op * adjust ut code of custom op * adjust ut code of custom op * adjust ut code of custom op * adjust ut code of custom op * hid share data from and to * rename CustomTensor to Tensor * support multi dtype * remove lod, make reshape lowercase, add copy test and refactor copy api * remove lod, make reshape lowercase, add copy test and refactor copy api * remove lod, make reshape 
lowercase, add copy test and refactor copy api * remove lod, make reshape lowercase, add copy test and refactor copy api * fix copy to error * add more test * add more test * add more test * add more test * add more test * add more test * add more test * add more test * add more test * add more test * add more test * add more test * add more test * add more test * add more test * add more test * polish detail & error message * polish test details * Add cast api && Change copy related api to copy_to && add more test (#4) * fix compile error * wrap framework tensor with LoDTensor * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * add CustomTensor default constructor * add size() for CustomTensor * make size const for CustomTensor * refactor place related api to circle the concept * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * fix compile error * make place const * make Tensor copy * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * debug CustomTensor core * remove additional head of framework * use back to shared ptr for custom tensor * use back to shared ptr for custom tensor * use back to shared ptr for custom tensor * use back to shared ptr for custom tensor * use back to shared ptr for custom tensor * use back to shared ptr for custom tensor * add gpu test * merge latest cwh code in * adjust ut code of custom op * adjust ut code of custom op 
* adjust ut code of custom op * adjust ut code of custom op * adjust ut code of custom op * hid share data from and to * rename CustomTensor to Tensor * support multi dtype * remove lod, make reshape lowercase, add copy test and refactor copy api * remove lod, make reshape lowercase, add copy test and refactor copy api * remove lod, make reshape lowercase, add copy test and refactor copy api * remove lod, make reshape lowercase, add copy test and refactor copy api * fix copy to error * add more test * add more test * add more test * add more test * add more test * add more test * add more test * add more test * add more test * add more test * add more test * add more test * add more test * add more test * add more test * add more test * add type cast * add cast and make copy to api * add cast and make copy to api * add cast and make copy to api * add cast and make copy to api * merge cwh code * merge cwh code * merge cwh code * merge cwh code * merge cwh code * add more error log * add more error log * polish code * used for test * remove test comment * remove test comment * fix uint8 type error * fix lost uint8 type error * add test for coverage * polish details by reviewer comments * add prefix for DISABLE_COPY_AND_ASSIGN Co-authored-by: Jiabin Yang <360788950@qq.com>	4 years ago
Zhou Wei	5c0332714f	fix bug of Linux UT parallel level (#30971 )	4 years ago
wuhuanzhou	9b3c80c8ab	update eigen version on Windows (#30573 ) * update eigen version on Windows, test=develop * add /bigobj for cl, test=develop	4 years ago
ShenLiang	dae3e1f337	Solve inconsistent order in each card in dynamic graph (#30931 )	4 years ago
WangXi	14d039e4a1	Fix the problem that the number of ops executed by xpu is wrong (#30961 )	4 years ago
Chen Weihang	010f2caa23	try to fix reader and signal test failed (#30960 )	4 years ago
Adam Osewski	3ba69809bf	Fix LayerNorm tester for gcc4.8 (#30962 )	4 years ago
Qi Li	93c1d9e761	[ROCM] update fluid platform for rocm39 (part3), test=develop (#30913 )	4 years ago
QingshuChen	15297a065c	fix depends of kunlun bkcl (#30945 )	4 years ago
liym27	97f7a70c01	Add error message for slice op(#30851 )	4 years ago
liuyuhui	87197f8c2e	[kunlun]fix sync in multi kunlun xpu dygraph training. (#30943 )	4 years ago
石晓伟	99bd16eb4e	bug fix of xpu lite engine, test=develop (#30918 ) * bug fix of xpu lite engine, test=develop * xpu zero copy tensor, test=develop * revert paddle/fluid/inference/tests/api/CMakeLists.txt	4 years ago
tianshuo78520a	2e93233899	Add WITH_XPU_BKCL in Kunlun-CI (#30919 )	4 years ago
Qi Li	34f1628ce8	[ROCM] update fluid platform for rocm39 (part2), test=develop (#30774 )	4 years ago
Jacek Czaja	9e527d9956	[oneDNN] Added basic changes for elementwise_add_grad bf16 (#30925 )	4 years ago
Chengmo	c98f144fbc	add truncated gaussian random (#30922 ) add truncated gaussian random	4 years ago
liuyuhui	4a8b8b4547	[Kunlun] add gen_bkcl_id_op, support multi XPU cards training using multiprocess (#30858 )	4 years ago
liym27	39f41cb47f	Performance optimization for dynamic setitem: Call op set_value to speed up because the original call to TensorToPyArray will introduce unnecessary data copy. (#30817 )	4 years ago
liuyuhui	bef46ccfc8	[Kunlun]fix include files of gen_comm_id_helper.cc (#30917 )	4 years ago
wanghuancoder	aab3a3012e	add include for heterbox_trainer.cc, develop=test (#30910 )	4 years ago
taixiurong	24873f4f77	dyngraph (#30892 )	4 years ago
Adam Osewski	092a2b1413	More UT for LayerNormFuse pass (#30891 ) * Additionally change to not throw error from inside pass.	4 years ago
tianshuo78520a	a80fe67f84	Change cmake/third_party files for CI (#30833 )	4 years ago
Jacek Czaja	abfa822650	[oneDNN]Extended adaptive pooling support for oneDNN pool kernel (#30757 )	4 years ago
joanna.wozna.intel	73cdea01d4	Add bf16 fast performance verification (#30551 ) * Update Xbyak and add bf16 fast performance verification * Fix formating * Change LOG message * Trigger an update of a new tag	4 years ago
Shang Zhizhou	e6095bc2ce	fix split trt plugin initialize (#30875 ) * fix split trt plugin initialize * update	4 years ago
WangXi	6e3856d3fb	fix xpu dygraph place (#30868 )	4 years ago
wanghuancoder	35c5b23f68	use iwyu clean include second time, test=develop (#30829 ) * use iwyu clean include second time, test=develop	4 years ago
cucuzg	ac2e2e6b7f	add clip_by_norm on kunlun, *test=kunlun (#30862 )	4 years ago
wawltor	b7560a59ab	fix the broadcast for the large second input (#30818 ) fix the broadcast for the large second input	4 years ago
JamesLim	6e1e036a75	Implement cuda kernel for index_sample. (#30380 )	4 years ago
AshburnLee	666efc2336	Call new cudnn batch norm API regardless of data type and data layout (#30157 )	4 years ago
QingshuChen	5c8455d6ea	try again if kunlun memory malloc failed (#30855 ) * try again if kunlun memory malloc failed * minor	4 years ago
石晓伟	2ac4143b6c	support xpu with analysis predictor, test=develop (#30832 ) * support xpu inference with analysis predictor, test=develop * merge the cmake of the xpu toolchain, test=develop * add c-apis, test=develop * fix a bug in extern_xpu, test=develop	4 years ago
liuyuhui	2cb55eff57	fix WITH_XPU_BKCL in CMakeLists.txt (#30854 )	4 years ago
Adam Osewski	4f066e316e	Layer normalization fuse pass. (#30721 )	4 years ago
WangXi	b1026f64af	【kunlun】dygraph supports multi xpu card training (#30671 )	4 years ago
joanna.wozna.intel	04532b8a83	Update Xbyak to v5.81 (#30809 )	4 years ago
Shang Zhizhou	b909450994	fix trt plugin clone and initialize bugs in TRT7.1+ (#30709 ) * fix trt plugin clone and initialize bugs * fix unit test error * enable trt in ci py3 * update unittest timeout	4 years ago
Wilber	b08ae368bb	ci compilation depends on a stable release (#30755 ) * update lite tag * disable ut	4 years ago
Thunderbrook	cb66c53c2d	dump to cpu (#30750 ) * dump to cpu * format * format * format	4 years ago
Chengmo	d3fac0ea85	fix int64 bug (#30780 ) fix push sparse int64 bug	4 years ago
Qi Li	69875dc42c	[ROCM] update fluid memory for rocm35 (part1), test=develop (#30758 )	4 years ago
QingshuChen	c35a9880f9	fix malloc L3 failed bug for kunlun (#30745 ) * fix malloc L3 failed bug for kunlun * minor	4 years ago
WangXi	31ed9c9eed	Fleet distributed strategy support pure fp16 (#30754 )	4 years ago
Zhen Wang	53d01afed6	Fix the nan bug when passing all zero values into clip_by_norm_op. (#30777 )	4 years ago
ShenLiang	3858f458ea	rm Singleton of reducer (#30775 )	4 years ago
Qi Li	f89da4ab45	[ROCM] update fluid platform for rocm35 (part1), test=develop (#30639 ) * [ROCM] update fluid platform for rocm35 (part1), test=develop * address review comments, test=develop	4 years ago
Wojciech Uss	fc00240575	A fix for oneDNN matmul kernel. Fixes issue #30309 (#30723 )	4 years ago
lidanqing	46989e889b	Fix python3 incompatibility issues (#30698 ) * solve python3 incompatibility issues * update checksum	4 years ago
alncat	5b59499e57	fixed compilation error on gcc 4.8.x due to the usage of isfinite (#30733 )	4 years ago
Chengmo	78d37c3f75	【Paddle.Fleet】Fix brpc get hostname (#30703 ) * fix Brpc get hostname	4 years ago
taixiurong	caf3680bbc	fix bugs in transformer predict in xpu place (#30730 ) * transformer predict * trans bug fix	4 years ago
jakpiase	f8da5536ed	REUPLOAD Added vanilla LSTM and LSTM with peepholes oneDNN fp32 kernel (#30719 ) * added external reorder to profiler * resolved conflict * added enable_static * initial version of lstm, not working yet * added lstm to operators.cmake * added vanilla lstm mkldnn op * added peephole weights integration * minor changes * added formatting * added fusion_lstm_mkldnn to static_whitelist * added formatting * removed comment * moved use_peepholes attribute inside is_cached block * reverted wrong changes * minor formatting change * minor changes * changed stream handling * minor change * added datatype to GetExpectedKernelType() * added reading stream from TLS	4 years ago
liuyuhui	67abfc1588	[Kunlun] fix dead lock for exec_op_count_ (#30718 )	4 years ago
alncat	5ace20fc3f	modified conv+bn fuse pass to fix wrong mask in mask rcnn (#30704 )	4 years ago
Tao Luo	824a79d383	Revert "Added vanilla LSTM and LSTM with peepholes oneDNN fp32 kernel (#30661 )" (#30708 ) This reverts commit `d834f4e6e8`.	4 years ago
lilong12	7fbc68a2c0	update, test=develop (#30692 )	4 years ago
jakpiase	d834f4e6e8	Added vanilla LSTM and LSTM with peepholes oneDNN fp32 kernel (#30661 ) * added external reorder to profiler * resolved conflict * added enable_static * initial version of lstm, not working yet * added lstm to operators.cmake * added vanilla lstm mkldnn op * added peephole weights integration * minor changes * added formatting * added fusion_lstm_mkldnn to static_whitelist * added formatting * removed comment * moved use_peepholes attribute inside is_cached block * reverted wrong changes * minor formatting change * minor changes	4 years ago
arlesniak	5bf25d1e8b	More precise mkldnn kernel rules in GetExpectedKernelType (#29840 ) * More precise mkldnn kernel choice in GetExpectedKernelType * Fixes after review * Refresh develop for CI * CI experiment * get back from CI exper	4 years ago
Jacek Czaja	173660be7b	[oneDNN] Cache oneDNN stream not to recreate in each oneDNN op (#30358 )	4 years ago
Shang Zhizhou	ae0f88a988	add DLA support：C++&&Python api (#30165 ) * add dla * add dla done * add python api Co-authored-by: shangzhizhou <root@szth-rp-fanyi-opera49.szth.baidu.com>	4 years ago


18542 Commits (b48841ba2e7335eaa435a54436ed580d4aef001c)