Paddle

Commit Graph

Author	SHA1	Message	Date
Wilber	609c022222	shape op support int8 and uint8 tensor (#30201 )	4 years ago
Wilber	01a287bf0a	fix windows compile when WITH_PYTHON=ON and WITH_TENSORRT=ON (#30194 )	4 years ago
ruri	e42e1e80dc	Add version checking, test=op_version (#30129 )	4 years ago
Leo Chen	1f97d61c68	Add callback after TensorCopy (#30123 ) * change to tensor copy sync * change to tensor copy sync * make copy_to safe when use TensorCopy * refine code * add ut * add cudapinned garbagecollector * add testcase: cpu place -> cuda pinned place	4 years ago
Chengmo	528e03fc08	【Paddle.Fleet】Fix tensor table (#30075 ) * add tensor table	4 years ago
Wilber	ade244948c	disable mkldnn inplace pass on windows (#30164 )	4 years ago
joanna.wozna.intel	907262ee15	Fix analysis predictor test (#30191 ) * Add a necessary condition * Remove test for white list and add header	4 years ago
lijianshe02	2dc7ee276b	enhance error message of nll_loss op test=develop (#30125 ) * enhance error message of nll_loss op test=develop	4 years ago
Huihuang Zheng	54bf3f5a56	Refine PADDLE_ENFORCE Error Messages. test=develop (#30149 ) Improve some error messages in parallel_executor.cc, conditional_block_op.cc, recurrent_op.cc	4 years ago
Chen Weihang	d0fb06b27f	[Complex] Simplify prepared op impl to improve performance (#30153 ) * simplify prepared op impl to improve performance * fix kunlun compile error * continue fix kunlun compile error * only transform diff place when dtype diff * fix failed unittests * remove useless file * polish impl by review comment	4 years ago
123malin	c5b415bfd9	Improve Index select cuda kernel (#30139 ) * test=develop, add index_select_cuda kernel	4 years ago
wangchaochaohu	7dd551e08b	refine the paddle place support using str (#28769 )	4 years ago
WeiXin	404c16763a	Add detailed error message for curandStatus_t, cublasStatus_t, cusolverStatus_t (#30161 )	4 years ago
Wilber	91a8a25721	enhance error info for py_func (#30138 ) * enhance error info for py_func * update	4 years ago
weihaoji	b8207af6bc	[XPU] Remove lite_xpu ut lite_resnet50_test since fusion pass changes introduced precision diff. test=develop (#30122 )	4 years ago
liuyuhui	15fac5e7fa	fix assign_op_xpu concat_op_xpu warning (#30120 )	4 years ago
Jack Zhou	f5428eca4f	fix enforce msg of sum xpu op (#30113 )	4 years ago
123malin	198fbdfb60	Add Lookahead and ModelAverage Optimizer (#30004 ) * test=develop, add model_average and lookahead	4 years ago
Leo Chen	adac38c506	add dispensable input for core.ops.reshape2/expand/slice (#30072 ) * add dispensable input 'shape' for core.ops.reshape2 * add dispensable inputs for core.ops.reshape2/expand/slice * add ut	4 years ago
ShenLiang	becf99d2e8	fix error message (#30135 )	4 years ago
Zhou Wei	30888ca343	Polish and Optimize the print/repr information of Layer (#29998 ) * Polish and Optimize the print/repr message of all layer * fix some code format	4 years ago
wangguanzhong	69839f8a9a	fix error message for distribute_fpn_proposals_op (#30116 )	4 years ago
QingshuChen	8e1c3ddf15	add aarch64 and sunway kunlun lib (#30027 ) * add aarch64 and sunway kunlun lib * minor * optimize elementwise_add for kunlun * update kunlun dependence * minor * minor	4 years ago
Shang Zhizhou	05b27695f1	add inference api: DisableTensorRtOps (#30109 ) * snap * add inference api: DisableTensorRtOPs * fix code style * update api to experimental * update variable name	4 years ago
石晓伟	53bb126510	fix a bug in op_version_registry, test=develop, test=op_version (#29994 )	4 years ago
xiemoyuan	3e0c492910	Optimize the error message of framework. (#30134 )	4 years ago
liym27	9922bd4125	Fix bug: In dynamic mode, if start or end is negative, __getitem__ returns wrong result (#30003 ) 1. when slice_item is a slice: 1) the start of __getitem__ should be std::max(start, 0) if slice 2) the end of __getitem__ should be std::min(end, dim) 2. when slice_item is an integer, it should be in [-dim_len, dim_len) 3. Fix error message to use accurate data	4 years ago
chentianyu03	666e665132	change the kron gradient when complex types (#29995 )	4 years ago
chentianyu03	a5e422c85d	add trace op_register_version and fix version bug; test=op_version (#30000 ) * add trace op_register_version and fix default bug; test=op_version * add trace op_register_version; test=op_version * add trace op_register_version; test=op_version * add trace op_register_version; test=op_version * fix missing the template bug of vector; test=op_version	4 years ago
cc	9f34374b48	Fix the format of raising error in randperm op (#30108 ) * fix the format of raising error in randperm op	4 years ago
liuyuhui	254ad61959	fix xpu pe sync, test=notest (#30095 )	4 years ago
Thunderbrook	0b8e1fadc5	add topo-aware in heter-ps (#30087 ) * add topo aware * resource.h * topo aware * format	4 years ago
hong	297fff1a79	support dygraph in xpu place (#30051 ) * support dygraph in xpu place; test=develop * fix cpu/gpu compile error; test=develop * fix compile error; test=develop * fix xpu compile error; testd=develop	4 years ago
wangchaochaohu	d0a5620575	fix the compiler error when gcc4 cuda9.0 (#29997 )	4 years ago
WangXi	ee16006b5d	Optimization grad merge performance (#29784 )	4 years ago
yongqiangma	e891f4da1b	Add p_norm op version info (#30042 ) * p_norm fix op version info. test=develop	4 years ago
tangwei12	7d1c149e09	for inference checkpoint (#30081 ) * for inference checkpoint Change-Id: I36c979240ffa55bf1ef0c9315402960762af6be4 * for inference checkpoint Change-Id: I82025365d5b792cbea1ead506df685aecc8ac198	4 years ago
tangwei12	7d4bdff07d	fix large scale memory (#30035 ) * memory holder optimize Change-Id: Ic91af8ac6f2853336d28a9fbbc5e8d0c57b5d05e * memory holder optimize Change-Id: I2fd1c14ecc17f5d5ce88b87890381ea801e6367f * fix large scale memory holder Change-Id: Ief0992b02b00220e16c72cc637a56e7b5788140f * fix large scale memory holder Change-Id: I910142a3952ead643a5604f8f80955f3e6efe655	4 years ago
Shang Zhizhou	08dc5bc27e	fix op version checker of pass bug (#30028 ) * fix op version checker of pass bug * fix code style * update pass version	4 years ago
cc	68398abce9	[Inference] zero_copy_tensor supports int8_t (#30053 ) * zero_copy_tensor supports int8_t	4 years ago
whs	1b999d2b5d	Add version checking (#30040 )	4 years ago
ceci3	85b2f05ab0	register ModifyAttr for instance_norm, test=op_version (#30065 ) * register instance norm, test=op_version	4 years ago
channings	ddcff254db	fix op_register_version for compare ops, test=op_version (#30007 ) Co-authored-by: zhoushunjie <zhoushunjie@baidu.com>	4 years ago
Wilber	66e16b7e99	update lite subgraph. (#30056 )	4 years ago
GaoWei8	a64822589f	add REGISTER_OP_VERSION for LSTM (#30038 )	4 years ago
yinhaofeng	6e93fb92f9	Register op version for linspace,test=op_version (#30025 ) * Register op version for linspace,test=op_version * Register op version for linspace,test=op_version * Register op version for linspace,test=op_version * Register op version for linspace,test=op_version * Register op version for linspace,test=op_version	4 years ago
123malin	d0056c324d	test=develop, add op_register_version for roll_op (#30023 ) * test=develop, add op_register_version for roll_op	4 years ago
chentianyu03	e012930aa3	complex gradient matmul (#29966 ) * dot op support complex types * matmul support complex types * add test case * matmul broadcast gradient support complex * move conjFunctor to complex_functor.h	4 years ago
ShenLiang	893d37e5c6	Fix rank_attention op_version, test=op_version (#30006 ) * fix rank_attention, test=op_version	4 years ago
Adam Osewski	13aef97043	operator checkpoints for new attributes. (#29832 ) * Add operator checkpoints for new attributes. * Fix adding subsequent checkpoint to quantize op.	4 years ago
wangguanzhong	844d8e0c2c	add REGISTER_OP_VERSION for generate_proposals, roi_align, roi_pool test=op_version (#30034 )	4 years ago
cc	c3c064a8fc	Add mkldnn nearest_interp and bilinear_interp op (#30016 ) * Add mkldnn nearest_interp and bilinear_interp op * don't run mkldnn interpolate in default * add interpolate_mkldnn_pass	4 years ago
chalsliu	c053bf2a57	Revert "register ModifyAttr for instance_norm, test=op_version (#29938 )"	4 years ago
wawltor	cc2f94620c	add the support the op version check for matmul, test=op_version (#30011 ) * add the support the op version check for matmul, test=op_version	5 years ago
wawltor	b33aaea86c	add the op version check for the elementwise ops, test=op_version (#30010 ) * add the op version check for the elementwise ops, test=op_version * add the support check for elementwise_ops, test=op_version	5 years ago
Chengmo	4cbcc9b6da	fix momentum op register (#29941 ) * fix momentum op register	5 years ago
hutuxian	7c1f69bdf0	add op_version for flip op [test=op_version] (#30019 )	5 years ago
ceci3	77c1684397	register ModifyAttr for instance_norm, test=op_version (#29938 ) * upgrade instance_norm, test=op_version * fix	5 years ago
Leo Chen	47d10c55d5	Enhance debugging (#30001 ) * add debug code * add place info * fix compile problem * add place for output	5 years ago
FlyingQianMM	d42f93e504	add op_register_version for allclose op; test=op_version (#29968 )	5 years ago
wawltor	8f49f9d5c9	change the elementwise ops version check, test=op_version change the elementwise ops version check, test=op_version	5 years ago
guofei	b23faf37be	Add moving_average_abs_max_scale op_register_version test=develop (#29957 ) Add moving_average_abs_max_scale op_register_version	5 years ago
Thunderbrook	0ca6de171f	add include (#29952 )	5 years ago
Pei Yang	6206b9bc71	fix ut:trt_resnext_test, trt_quant_int8_yolov3_r50_test, test_trt_dynamic_shape_ernie, test_trt_dynamic_shape_ernie_fp16_ser_deser, trt_cascade_rcnn_test (#29977 )	5 years ago
wangxinxin08	be8b5fd18a	register op version for conv2d_transpose, conv3d_transpose and depthwise_conv2d_transpose, test=op_version (#29937 )	5 years ago
石晓伟	958612231f	compile the denormal.cc on aarch64, test=develop (#29956 )	5 years ago
Guo Sheng	6ac4f0af6a	Register op version for coalesce_tensor. (#29940 ) test=develop test=op_version	5 years ago
Chen Weihang	a1d9a14e89	support grad accumulated across batch (#29942 )	5 years ago
cc	6a0102b038	map matmul/squeeze2+matmul/reshape2+matmul to mul (#29911 ) * map matmul/squeeze2+matmul/reshape2+matmul to mul	5 years ago
Huihuang Zheng	d038746e1c	Fix Unix Sleep for Wrong Time. test=develop (#29953 ) PADDLE_RETRY_CUDA_SUCCESS used wrong sleep time so it can cause timeout in unittest. This PR fixed it. After we searched the doc in https://pubs.opengroup.org/onlinepubs/7908799/xsh/unistd.h.html, the time unit of sleep in unistd.h takes "seconds", usleep takes "microseconds", Sleep in windows.h takes "milliseconds".	5 years ago
Jack Zhou	5a4e42ca9a	add gru op_register_version; test=op_version; (#29931 ) * add gru op_register_version; test=op_version; * Update fc,mul version;test=op_version;	5 years ago
Wilber	2b1d796cd0	[Inference] Solve 2.0 trt performance reduce compare 1.8. (#29925 )	5 years ago
Qi Li	913f77a4b7	Register op version for print, test=op_version (#29945 )	5 years ago
石晓伟	181ea1870b	flush denormals to zero, test=develop (#29924 ) * flush denormals to zero, test=develop * add comments, test=develop	5 years ago
cc	7667e59bf7	add op version for fake_quant and fake_dequant ops, test=op_version (#29923 ) * add op version for fake_quant and fake_dequant ops, test=op_version, test=develop	5 years ago
石晓伟	acb5e86363	fix a bug in reset_tensor_array, test=develop (#29620 ) * fix a bug in reset_tensor_array, test=develop * ci coverage, test=develop	5 years ago
liuyuhui	3d1741b794	[Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor (#29926 )	5 years ago
Wilber	332da133a1	Support mips arch (#29903 ) * Support MIPS arch.	5 years ago
LielinJiang	eab0b60e16	Register op version for grid_sampler, test=op_version (#29916 ) * register op version for grid_sampler	5 years ago
liym27	9602a182b2	[Dynamic Inplace] Support ShareInplaceVersionCounterWith for C++ Tensor (#29842 ) * Revert "[inplace] Add ShareHolderWith for class Variable and SharePlaceholderWith in VarBase.detach() to share the same Tensor/SelectedRows (#29267)" This reverts commit `b10ecd9d3a`. * Support ShareInplaceVersionCounterWith to share the same inplace version counter for VarBase	5 years ago
liuyuhui	4427df37cf	[Kunlun] PR2: Support MultiDevicePass and BKCL in parallel executor (#29574 )	5 years ago
LielinJiang	0f4b218640	Enable bilateral_slice unittest on windows platform (#29896 ) * enable bilateral_slice unittest on windows platform * reduce max threads	5 years ago
YUNSHEN XIE	2a01756bf3	remove duplicate ut names (#29809 )	5 years ago
Chen Weihang	a6072055be	[Complex] Handle complex to real after type promotion (#29855 ) * try to add fwd op input dtypes * refactor base impl * return tmp_ins after dygraph prepare data * fix typo found in debug * polish comment & add complex net test * revert detail change * fix unittest failed * add complex kernel condition control * fix xpu test failed & polish comment * polish details by review comments	5 years ago
Chen Weihang	1a304e6c06	[Complex] Add support for complex grad accumulated (#29889 ) * add support for complex grad accumulated * add unittest for coverage * update test dtype * remove useless blank line	5 years ago
taixiurong	c7acad9f2f	support some shape for matmul and cast in xpu place (#29900 ) * support some shape in matmul and cast * modify matmul	5 years ago
Leo Chen	6b258317cb	fix TransferInplaceBack (#29830 )	5 years ago
QingshuChen	59b47f3b32	feat: support check_nan_inf for kunlun/xpu device (#29694 ) * feat: support check_nan_inf for kunlun device * support kunlun stack * minor	5 years ago
tangwei12	032414ca2a	[Feature] one ps (3/4) (#29604 ) * oneps (3/4) Co-authored-by: MrChengmo <cmchengmo@163.com> Co-authored-by: malin10 <malin10@baidu.com> Co-authored-by: chengmo <chengmo@baidu.com>	5 years ago
jakpiase	edc06c6a1b	Added fc + activation fuse pass (currently only gelu, sigmoid and tanh are supported) (#29772 )	5 years ago
Wilber	2c0a4a3470	call_statck is turned on default when ON_INFER=ON (#29798 )	5 years ago
Wilber	ad0b01ffe2	lod operator should not be reused in memory_optimize pass. (#29828 )	5 years ago
liym27	97e75ad0f5	[setitem] Support Tensor setitem in static mode (#29708 ) 1. Type of index: int, slice(step must be 1). 2. Type of value: (1) int32, int64, float32, bool; (2) numpy.array(int32, int64, float32, bool);<Note: float64 is not supported> (3) paddle.Tensor(int32, int64, float32, float64, bool);	5 years ago
YUNSHEN XIE	24ce051a84	remove duplicate ut reload (#29810 ) * remove duplicate ut reload * remove duplicate ut define in cmakelist	5 years ago
Jacek Czaja	c9e874fc8e	[oneDNN] Unit test for checking oneDNN caching (#29606 )	5 years ago
Thunderbrook	09b6e71928	heter box (#29734 ) * add heter box * add trainer, worker, wrapper... * format * for ci * format * remove boost get * boost & copyright * rename * rename * format * format * format Co-authored-by: yaoxuefeng6 <yaoxuefeng@baidu.com>	5 years ago
Jacek Czaja	7b33720c90	[oneDNN] Tensor copy fix to oneDNN tensors (#29771 ) * - Tensor copy fix to oneDNN tensors * - Fixes after review	5 years ago
123malin	a400b76db7	Roll cuda kernel (#29655 ) * test=develop, optimize roll_op_cuda_kernel	5 years ago
wuhuanzhou	e7ac74c85b	optimize compilation time of argmin/argmax op (#29595 ) * Using VisitDataTypeTiny and put CastOP after ReduceOP, test=develop * remove changes of reduce_op.h, test=develop	5 years ago
chentianyu03	ddfc3d2c2f	change grad elementwise_mul for complex types (#29757 ) * add conj op for complex types * add conj for complex types * add more test case * add conj_op test * modify conj api and impl * add complex type for fill_constant_op xpu * add setConstant for complex type * remove complex conj test file * user define grad for test_conj_op * add test case for static mode of conj api * modify conj doc * change input args name to x * remove useless codes * conj support real types * add conj test case for real number * delete no need to calculate inputs in dygraph op_test * delete no need to calculate inputs in dygraph op_test * modify grad of mul for complex types * fix the grads of inputs args order not match bug	5 years ago
chentianyu03	2a260d9b0e	change the grad of div when complex types (#29804 ) * change the grad of div when complex types * fix the grads of inputs args order not match bug	5 years ago
ShenLiang	f65f1caad3	opt sparse allreduce using ncclgather (#29819 )	5 years ago
TTerror	82aa01c373	add nearest_interp_v2 on kunlun (#29725 ) * add nearest_interp_v2 on kunlun * add nearest_interp_v2 on kunlun	5 years ago
wangchaochaohu	01c37c8e02	refine the compiler error for half2 operation (#29816 )	5 years ago
whs	82630408b4	Support double backward rsqrt (#29589 )	5 years ago
Zhang Ting	b76f5a8489	fix the bug of dropout_grad (#29813 )	5 years ago
LielinJiang	a94c3cbbf3	register cudnn conv double grad for depthwise conv (#29807 )	5 years ago
ShenLiang	01e2874a0e	Support multi-stream communication for dynamic graph distributed (#29525 ) * fix fleet for multi-stream * fix memcpy for ncclid * use sync to solve move operation	5 years ago
wangchaochaohu	f350aa59ff	Fix the compiler error for half type (#29799 )	5 years ago
Huihuang Zheng	1cbb282d77	Add Retry Logic to CublasHandlerHolder Add Retry Logic to CublasHandlerHolder to avoid random unittest failure.	5 years ago
LielinJiang	e5af650b71	Add double grad for conv_transpose (#29706 ) * add double grad for conv_transpose	5 years ago
Leo Chen	224f3bcbb1	format code (#29714 )	5 years ago
LoveAn	2e5b4a216c	Optimize compilation time with Unity Build (#29733 ) * Test compilation time with less parallel count, notest, test=windows_ci * optimize rules of Unity Build, notest, test=windows_ci, test=windows_op * limit parallel counts used only on GPU, test=develop * remove limit of argument /m:8 on Windows, test=develop	5 years ago
Zhang Jun	0c23ba95d8	enable MakeCiper api for inference;test=develop (#29692 )	5 years ago
wangchaochaohu	7b2dc4e6b1	optimization for fp16 elementwise add (#29744 )	5 years ago
Jacek Czaja	07790ba13e	[oneDNN] Reimplemented elementwise_add grad (#29747 ) * - Reimplemented elementwise_add grad - lint * - fix after review * - Fix to fix after review	5 years ago
Aurelius84	17c8e3adfe	Polish code in gpu_launch_config.h (#29730 )	5 years ago
wangchaochaohu	068d905e1e	fix the shape choose of vectorize for cuda	5 years ago
syyxsxx	7c2affaa26	fix isfinite_v2_op OpProtoAndCheckerMaker AddComment bug (#29626 ) fix isfinite_v2_op OpProtoAndCheckerMaker AddComment bug	5 years ago
石晓伟	8bd2879ef7	update the operator registration for incompatible upgrade, test=develop (#29720 )	5 years ago
chentianyu03	71063b8137	add conj op for complex types (#29527 ) * add conj op for complex types * add conj for complex types * add more test case * add conj_op test * modify conj api and impl * add complex type for fill_constant_op xpu * add setConstant for complex type * remove complex conj test file * user define grad for test_conj_op * add test case for static mode of conj api * modify conj doc * change input args name to x * remove useless codes * conj support real types * add conj test case for real number	5 years ago
Wilber	b593d588aa	[Inference] EnableUseGpu has higher priority than flags (#29697 ) * enable_use_gpu has higher priority than FLAGS * update.	5 years ago
WangXi	9cbcc6cadc	fleet sync build strategy, test=develop (#29732 )	5 years ago
wanghuancoder	0c59ad2a1a	Windows generate pdb and dump, for debug (#29628 ) * Windows generate pdb and dump, for debug * fix code style, test=develop * modify cmakelist	5 years ago
Huihuang Zheng	4c4d4ba5e0	Modify CublasHandleHolder to Fix Random Unittest Failure. test=develop (#29617 ) Modify CublasHandleHolder from using PADDLE_ENFORCE_CUDA_SUCCESS to PADDLE_RETRY_CUDA_SUCCESS to fix random unittest failure. We checked that the unittest log showed CUDA allocation error at this file, which may due to GPU not enough. We fixed similar failure in the past, so we applied PADDLE_RETRY_CUDA_SUCCESS here.	5 years ago
Chen Weihang	6cfa59de1b	[Complex] Add real & imag op and api for complex tensor (#29672 ) * add complex real op & api & unittest * add imag op & api & unittest * refactor op impl * revert simplify writing due to compile failed * polish details * polish grad op code	5 years ago
Jacek Czaja	9eff1a674f	Added missing format of oneDNN (#29670 )	5 years ago
wangchaochaohu	2e0d1ed00f	delete the code for fp16 optimization because it is not faster than common template code (#29715 )	5 years ago
TTerror	af8ded773a	update activation op on kunlun (#29577 ) * fix expand && concat/transpose to new api * update xpu_header * update activation op on kunlun * update activation op on kunlun * update activation op on kunlun * update activation op on kunlun * update activation op on kunlun * add nearest_interp on kunlun * update error message	5 years ago
ceci3	cc387159f3	add pad and concat double grad (#29549 ) * add constant pad double grad	5 years ago
liuyuhui	f13c3a9cd7	[Kunlun] PR1:Support one Kunlun card training in parallel executor (#29337 )	5 years ago
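Y_Xuan	76738504ad	Add ROCm platform support code (#29342 ) * add ROCm platform support code * fix some issues * resolve some ambiguities and add comments * fix code format * code changes after resolving conflicts * modify operators.cmake * fix format * fix errors * unify interfaces * update date	5 years ago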
Zhang Ting	1e9127f688	improve dropout grad (#29605 ) * improve grad perf	5 years ago
wangchaochaohu	eab44e1f32	refine (#29622 )	5 years ago
WangXi	613c46bc07	fix gen_nccl_id_op_helper compile failed, test=develop (#29614 )	5 years ago
Chen Weihang	f02aece1f0	Add complex dtype op (add) test example (#29603 ) * add op test case for complex * polish code details * add xpu set constant support * fix argument error * remove useless pyc file	5 years ago
AshburnLee	efea540ca9	Add tf32 support for A100 tensor core acceleration for cuBLAS (#28732 )	5 years ago
lijianshe02	7779768b53	add transpose double grad test=develop (#29600 ) * add transpose double grad test=develop	5 years ago
wangchaochaohu	1b69e528d3	optimize for long width for elementwise (#29602 )	5 years ago
Wilber	78dad78610	fix none-contiguous bug for python api. (#29615 )	5 years ago
ShenLiang	1efef8baed	Fix bug of matmul_v2 for broadcast case (#29599 ) * fix bug of matmul_v2 for broadcast	5 years ago
qingqing01	8d549fc85d	Add clip double grad (#29590 )	5 years ago
wangchaochaohu	ac4bae8ee9	elementwise_add_grad Op optimization (#29575 )	5 years ago
arlesniak	62d4483649	Added verbose oneDNN lib version (#29378 )	5 years ago
lilong12	ff6a145011	update, test=develop (#29559 )	5 years ago
WangXi	467c716963	gen nccl id use socket (#29431 )	5 years ago
tangwei12	0034273b7e	add service (#29560 ) * add service, remove ut on mac * fix heter_profiler & add heter stop method * fix code style	5 years ago
Leo Chen	c0163837a5	Fix compile problem when cuda_arch < 6000 (#29576 ) * fix compile problem when cuda_arch < 6000 * refine code * refine code	5 years ago
QingshuChen	79a41a9ed6	support roi_align & affine_channel for kunlun (#29561 ) * support roi_align & affine_channel for kunlun * minor	5 years ago
Jacek Czaja	f6cca62575	[oneDNN] Making ThreadID info in caching key optional (#29272 )	5 years ago
Wilber	740c0d58c3	update for xpu ci. (#29568 )	5 years ago
JZ-LIANG	d33d468f02	[Sharding] add hybrid-dp feature (#29518 ) * Sharding add hybrid-dp feature * update sharding in distributed_strategy * update sharding unitest * revise code format for sharding	5 years ago
Leo Chen	1e72e03217	remove duplicated macro (#29563 )	5 years ago
Zhang Ting	6702040e94	improve dropout (#29465 ) * improve drop out * add VectorizedRandomGeneratorWithGenerator * fix bug * modify according to comments	5 years ago
Zhang Ting	30d9589afe	add cast cuda kernel (#29352 )	5 years ago
LoveAn	b5d4a1f33d	Add the strategy of skipping cc/cu test compilation and execution in CI (#29499 ) * Add the strategy of skipping cc/cu test compilation and execution in CI, test=develop * fix if error with CI_SKIP_TEST, test=develop * fix add properties to test error on Linux/MAC, test=develop * fix set test properties of test_code_generator error, test=develop * remove test codes and advance judgment of file modification on Linux, test=develop * rename CI_SKIP_TEST to CI_SKIP_CPP_TEST, test=document_fix * Add branch judgement on Linux, test=develop	5 years ago
Aurelius84	2a42250699	Polish hash function of executor cache key (#29556 ) * Add more value to calculate hash key * fix size_t * polish code	5 years ago
taixiurong	760d015c14	add xpu ops for training transformer in kunlun (#29539 ) * 1.fix matmul bug 2. add one hot * add xpu error msg	5 years ago
Jacek Czaja	83a693ee55	[oneDNN] Added Unit Test for Multiple instances prediction (#29501 ) * - Added infrastructure for new test - Added UT for Multiple models prediction - cosmetic fixes - lint - lint fixes * - Removed timeout for MMP test	5 years ago
Zhong Hui	60bfd308ab	fix p_norm with empty shape (#29500 ) fix p_norm with empty shape (#29500)	5 years ago
Leo Chen	9f926eb720	Layernorm opt (#29522 ) * layernorm fw opt * layernorm bw opt * fix typo, test=develop * remove const dim3 for windows CI compatibility * merge develop Co-authored-by: zlsh80826 <zlsh80826@gmail.com>	5 years ago
tangwei12	ae3f7a7100	add ps table (#29463 ) * add ps table Change-Id: I468a04bd071d21ff52654926fcf4d5f3da19e178	5 years ago
ShenLiang	d8391a1983	fix error message of gather nd (#29521 )	5 years ago
Zhen Wang	5ac71b36fb	Remove tensor copy in the update_loss_scaling op. (#29426 ) * remove tensor copy in the update_loss_scaling op * not use thrust. * fix some cuda memory access error.	5 years ago
Zhou Wei	e74e1a226c	support deepcopy for Layer/Tensor/Paramerbase (#29387 ) * support deepcopy for Layer/Tensor/Paramerbase * fix some code	5 years ago
joejiong	87e75a77c2	Add tangent operator (#29207 ) As the title	5 years ago
zlsh80826	95e334810a	Softmax vectorization (#29404 ) * vec softmax fw * vec softmax bw * add a message argument for compiler compatibility	5 years ago
ShenLiang	2ef9e0e23c	Rebuild group automatically in dynamic graph distributed (#29255 ) * add tensor_indices in AssignGroupBySize * add rebuild group in reducer	5 years ago
procr	3a0558339d	support mobilenet for kunlun (#29458 )	5 years ago
Huihuang Zheng	a1909affc6	Fix Unit Test: Add Sleep Time for CUDA Retry (#29442 ) Add Sleep Time for CUDA Retry, which is similar to our GPU retry logic. This is a try to avoid init GPU allocation random failure in unit test.	5 years ago
Leo Chen	e5e522493d	make gelu fp16 computing more robust (#29484 )	5 years ago
Zhang Ting	560b432349	Revert "improve elementwise_add_grad perf (#29277 )" (#29464 ) This reverts commit `befd6d5338`.	5 years ago
jakpiase	57a4f16d9e	added internal and external reorders to profiler (#29443 ) * added external reorder to profiler * added external and internal reorders to profiler * added internal and external reorder to profiler * added formatting to int/ext reorder commit * removed unnecessary comment	5 years ago
Pei Yang	2480bdef6c	change hard_swish from plugin to layer (#29177 ) * change hard_swish from plugin to layer * add ut when threshold != scale	5 years ago
taixiurong	ecca6585cd	1. fix elementwise ops'bug 2. fix softmax_with_cross_entropy_op 3. add biliner_interp_op (#29448 ) Co-authored-by: root <root@bjhw-sys-rpm0223.bjhw.baidu.com>	5 years ago
LoveAn	03b42d9fa7	fix unittest on windows, test=develop (#29365 )	5 years ago
TTerror	a5fcc4b545	update reduce_sum op on xpu (#29367 ) * update reduce_sum op on xpu * update reduce_sum op on xpu * support running on xpu	5 years ago
Jack Zhou	c7cada8571	Fix gru performance decline in 1.8.5 (#29455 )	5 years ago
Zhang Ting	6296f4ed09	revert cast eigen kernel (#29427 )	5 years ago
Leo Chen	a040c055a5	fix layer_norm accuracy (#29434 )	5 years ago
Zhou Wei	24ba9ed436	fix that parameters'grad has grad var (#29408 )	5 years ago
Leo Chen	4e19ce1df5	refine reshape grad and double grad kernel, use tensor copy async (#29128 )	5 years ago
Shang Zhizhou	225a9c4ed8	Fix unittest (#29412 ) * fix tensorrt unittest precision error * fix unittest precision error. test_trt_subgraph_pass && test_trt_dynamic_shape_transformer_prune	5 years ago
Pei Yang	f860de4af7	support clip op trt converter (#29411 )	5 years ago
Jack Zhou	1dd7b97b66	fix rnn_op bug in cudnn_version>= 8 (#29406 )	5 years ago
LoveAn	671555ed32	Compiling operator libraries with Unity build (#29130 ) * Compiling operator libraries with Unity Build on Windows CPU. * Compiling operator libraries with Unity Build on Windows GPU, no_test, test=windows_ci * Add option in windows ci script, no_test, test=windows_ci * Optimize parallel compiling, test=develop * remove limit of parallel compile and skip some ops in UB, test=develop * remove changes of header file, test=develop * remove changes of header file, test=develop * fix test_eye_op unittest failed, test=develop * Compiling operator libraries with Unity Build on Linux, test=develop * set default WITH_UNITY_BUILD=OFF, test=develop * Move unity build rules into a single file and add comment, test=develop * optimize parallel compilation, test=develop * fix undefined reference error on coverage ci, test=develop	5 years ago
cc	a623ce044f	Use different name_scope for different conv type, test=develop (#29355 )	5 years ago
yongqiangma	7c508d8668	update unbind norm add CUDAPlace api doc information (#29322 ) * enhance array_to_lod_tensor_op lod_tensor_to_array_op errors information. test=develop * fix format. test=develop * format fix. test=develop * add lod_rank_table. test=develop * fix format. test=develop * fix doc info. test=develop * fix np error * add unbind dygraph api. test=develop * fix unbind doc.test=develop	5 years ago
chentianyu03	879e913b6d	Make transpose, trace, kron, reshape, sum op support complex type (#29321 ) * add complex64 and complex128 type; add +-/@ and slice operator for complex types add test cases for complex elementwise, matmul and getitem unittest * add test cases for complex types * add test cases for complex matmul unittest * kron, reshape, transpose support complex types * sum and trace op support complex types * add test case of sum and trace op * fix the bug of imag part of complex not initialized * format file * format code style * kron support type promotion; modify test cases	5 years ago
卖鱼的哲学	074065e5de	fix expand/uniform_random && concat/transpose to new api on xpu (#29280 ) * fix expand && concat/transpose to new api * update uniform_random_op * update xpu_header	5 years ago
lilong12	1decf4ada6	update, test=develop (#29331 )	5 years ago
QingshuChen	74bf3bed36	support global pooling for kunlun (#29293 ) * test=kunlun	5 years ago
liym27	b10ecd9d3a	[inplace] Add ShareHolderWith for class Variable and SharePlaceholderWith in VarBase.detach() to share the same Tensor/SelectedRows (#29267 )	5 years ago
Chen Weihang	9ad800ebb2	Support type promote for basic math ops (quantum required) (#29265 ) * basic impl of type promote * add comment & another testcase * fix complex bugs & support python op promote type * fix failed unittests & polish code * add unittest for coverage * change to only promote complex type * polish code details * polish several comments	5 years ago
tangwei12	8358791607	fix gpu outofrange (#29238 ) * fix gpu emb out of range Change-Id: I5794ac73bd634d5ea069a6fbbd914274b6d6b7bf * fix doc Change-Id: I5a3350b2930a9ab2f52116c192b087307faf8fdf	5 years ago
Leo Chen	b58cfff89d	use has_grad instead of train_mode (#29309 ) * use has_grad instead of train_mode * add vlog for debug * fix ut * fix ut	5 years ago
Zhang Ting	befd6d5338	improve elementwise_add_grad perf (#29277 ) * improve performance of elementwise_sum_grad	5 years ago
Shang Zhizhou	ebf689197d	fix tensorrt output shape error (#29308 ) * fix tensorrt output shape error * fix unittest tensorrt_engine_op_test * fix code style for unitest	5 years ago
Aurelius84	67c700b479	[Dy2Stat] Add cache for Executor and Context in run_program_op (#28421 )	5 years ago
ShenLiang	696dc4bb13	fix the warning of reducer (#29323 )	5 years ago

10948 Commits (bc7a3afa687696541b032d56d1e9a8ca8e101c77)