Paddle

Commit Graph

Author	SHA1	Message	Date
FlyingQianMM	d42f93e504	add op_register_version for allclose op; test=op_version (#29968 )	5 years ago
guofei	b23faf37be	Add moving_average_abs_max_scale op_register_version test=develop (#29957 )	5 years ago
wangxinxin08	be8b5fd18a	register op version for conv2d_transpose, conv3d_transpose and depthwise_conv2d_transpose, test=op_version (#29937 )	5 years ago
Guo Sheng	6ac4f0af6a	Register op version for coalesce_tensor. (#29940 ) test=develop test=op_version	5 years ago
Jack Zhou	5a4e42ca9a	add gru op_register_version; test=op_version; (#29931 ) * add gru op_register_version; test=op_version; * Update fc,mul version;test=op_version;	5 years ago
Qi Li	913f77a4b7	Register op version for print, test=op_version (#29945 )	5 years ago
cc	7667e59bf7	add op version for fake_quant and fake_dequant ops, test=op_version (#29923 ) * add op version for fake_quant and fake_dequant ops, test=op_version, test=develop	5 years ago
Wilber	332da133a1	Support mips arch (#29903 ) * Support MIPS arch.	5 years ago
LielinJiang	eab0b60e16	Register op version for grid_sampler, test=op_version (#29916 ) * register op version for grid_sampler	5 years ago
LielinJiang	0f4b218640	Enable bilateral_slice unittest on windows platform (#29896 ) * enable bilateral_slice unittest on windows platform * reduce max threads	5 years ago
Chen Weihang	a6072055be	[Complex] Handle complex to real after type promotion (#29855 ) * try to add fwd op input dtypes * refactor base impl * return tmp_ins after dygraph prepare data * fix typo found in debug * polish comment & add complex net test * revert detail change * fix unittest failed * add complex kernel condition control * fix xpu test failed & polish comment * polish details by review comments	5 years ago
Chen Weihang	1a304e6c06	[Complex] Add support for complex grad accumulated (#29889 ) * add support for complex grad accumulated * add unittest for coverage * update test dtype * remove useless blank line	5 years ago
taixiurong	c7acad9f2f	support some shape for matmul and cast in xpu place (#29900 ) * support some shape in matmul and cast * modify matmul	5 years ago
QingshuChen	59b47f3b32	feat: support check_nan_inf for kunlun/xpu device (#29694 ) * feat: support check_nan_inf for kunlun device * support kunlun stack * minor	5 years ago
tangwei12	032414ca2a	[Feature] one ps (3/4) (#29604 ) * oneps (3/4) Co-authored-by: MrChengmo <cmchengmo@163.com> Co-authored-by: malin10 <malin10@baidu.com> Co-authored-by: chengmo <chengmo@baidu.com>	5 years ago
jakpiase	edc06c6a1b	Added fc + activation fuse pass (currently only gelu, sigmoid and tanh are supported) (#29772 )	5 years ago
liym27	97e75ad0f5	[setitem] Support Tensor setitem in static mode (#29708 ) 1. Type of index: int, slice(step must be 1). 2. Type of value: (1) int32, int64, float32, bool; (2) numpy.array(int32, int64, float32, bool);<Note: float64 is not supported> (3) paddle.Tensor(int32, int64, float32, float64, bool);	5 years ago
Jacek Czaja	c9e874fc8e	[oneDNN] Unit test for checking oneDNN caching (#29606 )	5 years ago
Thunderbrook	09b6e71928	heter box (#29734 ) * add heter box * add trainer, worker, wrapper... * format * for ci * format * remove boost get * boost & copyright * rename * rename * format * format * format Co-authored-by: yaoxuefeng6 <yaoxuefeng@baidu.com>	5 years ago
123malin	a400b76db7	Roll cuda kernel (#29655 ) * test=develop, optimize roll_op_cuda_kernel	5 years ago
wuhuanzhou	e7ac74c85b	optimize compilation time of argmin/argmax op (#29595 ) * Using VisitDataTypeTiny and put CastOP after ReduceOP, test=develop * remove changes of reduce_op.h, test=develop	5 years ago
chentianyu03	ddfc3d2c2f	change grad elementwise_mul for complex types (#29757 ) * add conj op for complex types * add conj for complex types * add more test case * add conj_op test * modify conj api and impl * add complex type for fill_constant_op xpu * add setConstant for complex type * remove complex conj test file * user define grad for test_conj_op * add test case for static mode of conj api * modify conj doc * change input args name to x * remove useless codes * conj support real types * add conj test case for real number * delete no need to calculate inputs in dygraph op_test * delete no need to calculate inputs in dygraph op_test * modify grad of mul for complex types * fix the grads of inputs args order not match bug	5 years ago
chentianyu03	2a260d9b0e	change the grad of div when complex types (#29804 ) * change the grad of div when complex types * fix the grads of inputs args order not match bug	5 years ago
TTerror	82aa01c373	add nearest_interp_v2 on kunlun (#29725 ) * add nearest_interp_v2 on kunlun * add nearest_interp_v2 on kunlun	5 years ago
wangchaochaohu	01c37c8e02	refine the compiler error for half2 operation (#29816 )	5 years ago
whs	82630408b4	Support double backward rsqrt (#29589 )	5 years ago
Zhang Ting	b76f5a8489	fix the bug of dropout_grad (#29813 )	5 years ago
LielinJiang	a94c3cbbf3	register cudnn conv double grad for depthwise conv (#29807 )	5 years ago
wangchaochaohu	f350aa59ff	Fix the compiler error for half type (#29799 )	5 years ago
LielinJiang	e5af650b71	Add double grad for conv_transpose (#29706 ) * add double grad for conv_transpose	5 years ago
LoveAn	2e5b4a216c	Optimize compilation time with Unity Build (#29733 ) * Test compilation time with less parallel count, notest, test=windows_ci * optimize rules of Unity Build, notest, test=windows_ci, test=windows_op * limit parallel counts used only on GPU, test=develop * remove limit of argument /m:8 on Windows, test=develop	5 years ago
wangchaochaohu	7b2dc4e6b1	optimization for fp16 elementwise add (#29744 )	5 years ago
Jacek Czaja	07790ba13e	[oneDNN] Reimplemented elementwise_add grad (#29747 ) * - Reimplemented elementwise_add grad - lint * - fix after review * - Fix to fix after review	5 years ago
wangchaochaohu	068d905e1e	fix the shape choose of vectorize for cuda	5 years ago
syyxsxx	7c2affaa26	fix isfinite_v2_op OpProtoAndCheckerMaker AddComment bug (#29626 )	5 years ago
chentianyu03	71063b8137	add conj op for complex types (#29527 ) * add conj op for complex types * add conj for complex types * add more test case * add conj_op test * modify conj api and impl * add complex type for fill_constant_op xpu * add setConstant for complex type * remove complex conj test file * user define grad for test_conj_op * add test case for static mode of conj api * modify conj doc * change input args name to x * remove useless codes * conj support real types * add conj test case for real number	5 years ago
Chen Weihang	6cfa59de1b	[Complex] Add real & imag op and api for complex tensor (#29672 ) * add complex real op & api & unittest * add imag op & api & unittest * refactor op impl * revert simplify writing due to compile failed * polish details * polish grad op code	5 years ago
wangchaochaohu	2e0d1ed00f	delete the code for fp16 optimization because it is not faster than common template code (#29715 )	5 years ago
TTerror	af8ded773a	update activation op on kunlun (#29577 ) * fix expand && concat/transpose to new api * update xpu_header * update activation op on kunlun * update activation op on kunlun * update activation op on kunlun * update activation op on kunlun * update activation op on kunlun * add nearest_interp on kunlun * update error message	5 years ago
ceci3	cc387159f3	add pad and concat double grad (#29549 ) * add constant pad double grad	5 years ago
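Y_Xuan	76738504ad	Add ROCm platform support code (#29342 ) * Add ROCm platform support code * Fix some issues * Resolve some ambiguities and add comments * Fix code formatting * Code changes after resolving conflicts * Modify operators.cmake * Fix formatting * Fix errors * Unify interfaces * Update date	5 years ago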
Zhang Ting	1e9127f688	improve dropout grad (#29605 ) * improve grad perf	5 years ago
wangchaochaohu	eab44e1f32	refine (#29622 )	5 years ago
WangXi	613c46bc07	fix gen_nccl_id_op_helper compile failed, test=develop (#29614 )	5 years ago
Chen Weihang	f02aece1f0	Add complex dtype op (add) test example (#29603 ) * add op test case for complex * polish code details * add xpu set constant support * fix argument error * remove useless pyc file	5 years ago
lijianshe02	7779768b53	add transpose double grad test=develop (#29600 ) * add transpose double grad test=develop	5 years ago
wangchaochaohu	1b69e528d3	optimize for long width for elementwise (#29602 )	5 years ago
ShenLiang	1efef8baed	Fix bug of matmul_v2 for broadcast case (#29599 ) * fix bug of matmul_v2 for broadcast	5 years ago
qingqing01	8d549fc85d	Add clip double grad (#29590 )	5 years ago
wangchaochaohu	ac4bae8ee9	elementwise_add_grad Op optimization (#29575 )	5 years ago
arlesniak	62d4483649	Added verbose oneDNN lib version (#29378 )	5 years ago
WangXi	467c716963	gen nccl id use socket (#29431 )	5 years ago
Leo Chen	c0163837a5	Fix compile problem when cuda_arch < 6000 (#29576 ) * fix compile problem when cuda_arch < 6000 * refine code * refine code	5 years ago
QingshuChen	79a41a9ed6	support roi_align & affine_channel for kunlun (#29561 ) * support roi_align & affine_channel for kunlun * minor	5 years ago
Jacek Czaja	f6cca62575	[oneDNN] Making ThreadID info in caching key optional (#29272 )	5 years ago
Leo Chen	1e72e03217	remove duplicated macro (#29563 )	5 years ago
Zhang Ting	6702040e94	improve dropout (#29465 ) * improve drop out * add VectorizedRandomGeneratorWithGenerator * fix bug * modify according to comments	5 years ago
Zhang Ting	30d9589afe	add cast cuda kernel (#29352 )	5 years ago
LoveAn	b5d4a1f33d	Add the strategy of skipping cc/cu test compilation and execution in CI (#29499 ) * Add the strategy of skipping cc/cu test compilation and execution in CI, test=develop * fix if error with CI_SKIP_TEST, test=develop * fix add properties to test error on Linux/MAC, test=develop * fix set test properties of test_code_generator error, test=develop * remove test codes and advance judgment of file modification on Linux, test=develop * rename CI_SKIP_TEST to CI_SKIP_CPP_TEST, test=document_fix * Add branch judgement on Linux, test=develop	5 years ago
taixiurong	760d015c14	add xpu ops for training transformer in kunlun (#29539 ) * 1.fix matmul bug 2. add one hot * add xpu error msg	5 years ago
Zhong Hui	60bfd308ab	fix p_norm with empty shape (#29500 )	5 years ago
Leo Chen	9f926eb720	Layernorm opt (#29522 ) * layernorm fw opt * layernorm bw opt * fix typo, test=develop * remove const dim3 for windows CI compatibility * merge develop Co-authored-by: zlsh80826 <zlsh80826@gmail.com>	5 years ago
ShenLiang	d8391a1983	fix error message of gather nd (#29521 )	5 years ago
Zhen Wang	5ac71b36fb	Remove tensor copy in the update_loss_scaling op. (#29426 ) * remove tensor copy in the update_loss_scaling op * not use thrust. * fix some cuda memory access error.	5 years ago
joejiong	87e75a77c2	Add tangent operator (#29207 ) As the title	5 years ago
zlsh80826	95e334810a	Softmax vectorization (#29404 ) * vec softmax fw * vec softmax bw * add a message argument for compiler compatibility	5 years ago
procr	3a0558339d	support mobilenet for kunlun (#29458 )	5 years ago
Leo Chen	e5e522493d	make gelu fp16 computing more robust (#29484 )	5 years ago
Zhang Ting	560b432349	Revert "improve elementwise_add_grad perf (#29277 )" (#29464 ) This reverts commit `befd6d5338`.	5 years ago
jakpiase	57a4f16d9e	added internal and external reorders to profiler (#29443 ) * added external reorder to profiler * added external and internal reorders to profiler * added internal and external reorder to profiler * added formatting to int/ext reorder commit * removed unnecessary comment	5 years ago
taixiurong	ecca6585cd	1. fix elementwise ops'bug 2. fix softmax_with_cross_entropy_op 3. add biliner_interp_op (#29448 ) Co-authored-by: root <root@bjhw-sys-rpm0223.bjhw.baidu.com>	5 years ago
TTerror	a5fcc4b545	update reduce_sum op on xpu (#29367 ) * update reduce_sum op on xpu * update reduce_sum op on xpu * support running on xpu	5 years ago
Jack Zhou	c7cada8571	Fix gru performance decline in 1.8.5 (#29455 )	5 years ago
Zhang Ting	6296f4ed09	revert cast eigen kernel (#29427 )	5 years ago
Leo Chen	a040c055a5	fix layer_norm accuracy (#29434 )	5 years ago
Leo Chen	4e19ce1df5	refine reshape grad and double grad kernel, use tensor copy async (#29128 )	5 years ago
LoveAn	671555ed32	Compiling operator libraries with Unity build (#29130 ) * Compiling operator libraries with Unity Build on Windows CPU. * Compiling operator libraries with Unity Build on Windows GPU, no_test, test=windows_ci * Add option in windows ci script, no_test, test=windows_ci * Optimize parallel compiling, test=develop * remove limit of parallel compile and skip some ops in UB, test=develop * remove changes of header file, test=develop * remove changes of header file, test=develop * fix test_eye_op unittest failed, test=develop * Compiling operator libraries with Unity Build on Linux, test=develop * set default WITH_UNITY_BUILD=OFF, test=develop * Move unity build rules into a single file and add comment, test=develop * optimize parallel compilation, test=develop * fix undefined reference error on coverage ci, test=develop	5 years ago
chentianyu03	879e913b6d	Make transpose, trace, kron, reshape, sum op support complex type (#29321 ) * add complex64 and complex128 type; add +-/@ and slice operator for complex types add test cases for complex elementwise, matmul and getitem unittest * add test cases for complex types * add test cases for complex matmul unittest * kron, reshape, transpose support complex types * sum and trace op support complex types * add test case of sum and trace op * fix the bug of imag part of complex not initialized * format file * format code style * kron support type promotion; modify test cases	5 years ago
卖鱼的哲学	074065e5de	fix expand/uniform_random && concat/transpose to new api on xpu (#29280 ) * fix expand && concat/transpose to new api * update uniform_random_op * update xpu_header	5 years ago
QingshuChen	74bf3bed36	support global pooling for kunlun (#29293 ) * test=kunlun	5 years ago
Chen Weihang	9ad800ebb2	Support type promote for basic math ops (quantum required) (#29265 ) * basic impl of type promote * add comment & another testcase * fix complex bugs & support python op promote type * fix failed unittests & polish code * add unittest for coverage * change to only promote complex type * polish code details * polish several comments	5 years ago
tangwei12	8358791607	fix gpu outofrange (#29238 ) * fix gpu emb out of range Change-Id: I5794ac73bd634d5ea069a6fbbd914274b6d6b7bf * fix doc Change-Id: I5a3350b2930a9ab2f52116c192b087307faf8fdf	5 years ago
Zhang Ting	befd6d5338	improve elementwise_add_grad perf (#29277 ) * improve performance of elementwise_sum_grad	5 years ago
Shang Zhizhou	ebf689197d	fix tensorrt output shape error (#29308 ) * fix tensorrt output shape error * fix unittest tensorrt_engine_op_test * fix code style for unittest	5 years ago
Aurelius84	67c700b479	[Dy2Stat] Add cache for Executor and Context in run_program_op (#28421 )	5 years ago
wangchaochaohu	c4be80f402	polish the code of cumsum and remove some unused code (#29303 )	5 years ago
ShenLiang	0fb18bc214	enforce the matmul_v2 error message (#29297 )	5 years ago
Zhen Wang	9b59a589b1	Remove some useless log. (#29300 )	5 years ago
Leo Chen	13a22a3752	fix shape of tile_grad op (#29289 )	5 years ago
Zhen Wang	be3777a50a	Add pure fp16 training with master weights. (#27712 ) * add the weight decay func for the momentum op * Add the multi_precision function in Momentum Optimizer. * Make sure that the initial value of master weights are same with the fp16 weights. * add static loss scaling. * add the rescale_grad function in the pure fp16 training. * use the original momentum updating method. * Polish some codes, such as variable names. * add docstring for apis. * update the var creation details of _create_master_weight. * not modify codes about imperative momentum updating. * Fix the error of test_dist_sparse_tensor_load_momentum UT. * add unit test for multi precision fp16 training. * add more unit tests for CI. * Use lower threshold values for allclose comparing in test_multi_precision_fp16_train UT. * For CI Coverage Checking.	5 years ago
furnace	7584bb5096	Layer norm fp16 (#29169 ) * add fp16 for layer_norm op * revert layernorm api * fix forward * fix forward * fix backward for layernorm with fp16 * fix unit test for layernorm with fp16 * fix with_mkldnn compile error for layernorm with fp16 * 1. revert to PADDLE_ENFORCE_NOT_NULL, 2. change static_cast<float> to static_cast<U> * fix with_mkldnn compile error for layernorm with fp16 * fix with_mkldnn compile error for layernorm with fp16 Co-authored-by: zhiqiu <chenqiuliang@baidu.com>	5 years ago
Leo Chen	116305ea4b	Improve performance of elementwise_add grad op (#29187 ) * pass stop_gradient for cast op * improve performance of elementwise_add grad * use tensor copy async * dygraph branch * fix dygraph branch * add ut	5 years ago
卖鱼的哲学	07c67d5a8b	add deformable_conv op on xpu (#29234 ) * rebase develop * update deformable_conv op on xpu * update deformable_conv op on xpu	5 years ago
QingshuChen	64f29fbb70	update kunlun conv2d/softmax/elementwise implementation (#29229 ) * update conv2d & softmax to new xpu api * test=kunlun * remove useless comments * test=kunlun * remote softmax xpu op * test=kunlun * update kunlun softmax * test=kunlun * update xpu unittest * test=kunlun * fix elementwise_grad bug for kunlun *test=kunlun	5 years ago
chentianyu03	8f45d14263	add complex64 and complex128 type; add +-/@ and slice operator for c… (#29199 ) add complex64 and complex128 type; add +-/@ and slice operator for complex types add test cases for complex elementwise, matmul and getitem unittest * add test cases for complex types * add test cases for complex matmul unittest	5 years ago
Wilber	74c43ac638	fix lite unit test. (#29233 )	5 years ago
Adam Osewski	4096ff94dc	Small optimizations for conv2d kernel subroutines. (#29188 ) - Make sure that oneDNN memory descriptors are created only once at first iteration.	5 years ago
123malin	b5c6342336	Update ps gpu (#29209 ) * fix parameter prefetch & device guard Co-authored-by: MrChengmo <cmchengmo@163.com> Co-authored-by: chengmo <chengmo@baidu.com>	5 years ago
123malin	03d4665f44	prefetch optimize (#29095 ) * test=develop, optimize async prefetch	5 years ago
WangXi	0c2a51d240	optimizer amp, all use fp16 communication, overlap last comm and compute (#28957 )	5 years ago
Jack Zhou	bc6033f86b	fix gru gcc7.4 bug for the gru compile	5 years ago
wangchaochaohu	b818429ae7	optimize cumsum OP (#29193 )	5 years ago
lilong12	7e5e9934fe	update expand as op to use the shape of the target tensor instead of the target tensor itself. (#29020 ) * update, test=develop	5 years ago
Jack Zhou	085260f3de	Add eigen gru and fix the dropout bug in the rnn	5 years ago
arlesniak	bc902044a4	Fixes mkldnn dygraph learning rate scheduler crashes (#28988 )	5 years ago
Shang Zhizhou	b9e76a0103	detect tensorRT plugin fp16 in runtime (#27933 ) * remove -DSUPPORTS_CUDA_FP16 in cuda.cmake * compile with cuda9 * add some unittest * notest;test=coverage * add unittest for trt plugin swish && split * update ernie unittest * fix some error message * remove repeated judgement of CUDA version in mbEltwiseLayerNormOpConverter * fix compile error when CUDA_ARCH_NAME < Pascal * fix compile error * update unittest timeout * compile with cuda9 * update error msg * fix code style * add some comments * add define IF_CUDA_ARCH_SUPPORT_FP16 * rename IF_CUDA_ARCH_SUPPORT_FP16 to CUDA_ARCH_FP16_SUPPORTED	5 years ago
Noel	da71173bc9	Fix ops doc for some ops	5 years ago
joanna.wozna.intel	b0d1ac161e	Add bf16 pool2d and unify bf16 unit tests (#29039 ) * Add bf16 pool2d and unify bf16 unit tests * Add change default ops test	5 years ago
joejiong	582c0a0468	add uint8 for reshape op (#28996 ) add uint8 for reshape operator	5 years ago
taixiurong	a5aa4dc7a9	add xpu elementwise ops (#29031 )	5 years ago
joejiong	b04c78ef5e	Update pow (#29000 ) Simple code clean up	5 years ago
wawltor	b2c8a00745	remove eigen threadpool for the speed up	5 years ago
lilong12	767d0ba267	update, test=develop (#28700 )	5 years ago
123malin	fbf9564f6b	【paddle.distributed.fleet】Optimize ParameterServer's Async Mode (#28442 ) * test=develop, optimize global_step	5 years ago
furnace	8ff3550658	refactor momentum op to combine weight (#27414 ) * refactor momentum op to combine weight_decay (scale op and sum op)	5 years ago
Jacek Czaja	bd1d6d3b30	extends oneDNN caching keys so caching objects are unique to executor/predictor (#28758 )	5 years ago
yaoxuefeng	71c1cd1408	fix truncated_gaussian seed (#28777 )	5 years ago
gongweibao	1dad8ceaab	Fix gpu memory allocation bug. (#28703 )	5 years ago
Chen Weihang	b969c32ab1	fix occupied 0 device memory bug (#28771 )	5 years ago
joejiong	1a532d5133	add uint8 support for squeeze operator (#28734 ) Adding uint8 support for squeeze operator.	5 years ago
wangchaochaohu	8b853b3030	fix the number of perf algo for conv cudnn in exhaustive mode (#28694 )	5 years ago
joanna.wozna.intel	8c0ea4bffe	Add bf16 matmul, fc, elementwise add and mul (#28729 ) * Add bf16 matmul, fc, elementwise add and mul * Correct unit test	5 years ago
yaoxuefeng	08b62f4902	fix shuffle batch op shuffle (#28533 )	5 years ago
taixiurong	d3d1a6b6e0	add kunlun kernel: slice, slice_grad, top_k, cast. test=kunlun (#28542 ) 1.add xpu slice op 2. add xpu top_k op 3.modify xpu cast to new api * 1.add xpu slice op 2. add xpu top_k op 3.modify xpu cast to new api	5 years ago
Jack Zhou	9362d85e0e	Add LSTM, Simple RNN and GRU CPU kernel (#28577 ) * add lstm, simple rnn op kernel * fix the test_lstm for the rnn op * change func name * fix forward postprocess bug * add gru forward, backward code * remove unittest.skipIf; use a big rnn op instead of combination op * fix input doesn't have gradient bug * add eigen lstm forward, backward Co-authored-by: wawltor <fangzeyang0904@hotmail.com>	5 years ago
QingshuChen	30ef3815b3	adjust kunlun header file (#28536 ) * adjust kunlun header file test=kunlun update kunlun unittest test=kunlun update xpu unittest * test = kunlun * update xpu unittest * test=kunlun * update xpu unittest * test=kunlun	5 years ago
Zhang Ting	dab4920568	improve performance of cast op (#28727 )	5 years ago
yaoxuefeng	03f46e3526	fix truncated_gaussian op cuda seed setting (#28678 )	5 years ago
Wojciech Uss	04bcc13fac	Add multi_gru op and tests (#28591 ) * Add multi_gru op and tests * removed redundant disable_dygraph()	5 years ago
joejiong	32b90b1c2d	add log10 (#28576 ) Add new operator log10	5 years ago
Guo Sheng	858ffa0c8b	Fix the dropout setting when not initialized in rnn_op. (#28561 ) test=develop	5 years ago
Jacek Czaja	6d8d3d4c22	[oneDNN] Layer norm bf16 kernel (#28619 )	5 years ago
Zhou Wei	bf143652ac	fix lstm OP compile error on windows (#28667 ) * add unittest and check unittest for windows * fix lstm OP compile error on windows	5 years ago
石晓伟	57dab959ca	add datanorm op new scale_w register (#28657 ) Co-authored-by: yaoxuefeng6 <yaoxuefeng@baidu.com>	5 years ago
cc	65aac81191	Fix fake_quant error when cout > 1024, test=develop (#28603 )	5 years ago
lilong12	b2f7ab6636	bug fix, test=develop (#28648 )	5 years ago
wawltor	8f2656ef5c	fix the gradient bug for the topk v2	5 years ago
wangchaochaohu	a972c33fd7	refine gather OP performance for dynamic mode (#28587 )	5 years ago
joanna.wozna.intel	2cb71c0cde	Add checkpoint to quantize (#28612 ) * Add checkpoint to quantize * Change bfloat16 option	5 years ago
pangyoki	b889a0cee2	add gaussian_random op_version (#28602 )	5 years ago
Guo Sheng	110febdc54	Fix gradients with ignore_idx in softmax_with_cross_entropy (#28622 ) * Fix gradients with ignore_idx in softmax_with_cross_entropy. test=develop * Fix gradients with ignore_idx in softmax_with_cross_entropy on cpu. Remove softmax_with_cross_entropy from op_threshold_white_list. test=develop * Fix test_softmax_cross_entropy_op.py. test=develop	5 years ago
Leo Chen	f962bd3432	Fix cudnn workspace limit in cudnn-8 (#28611 )	5 years ago
Leo Chen	90805e2df7	Register op_version for new attribute use_addto (#28463 ) * register op_version for addto * upgrade pass capability * change eq to le * change eq to le * fix merge	5 years ago
lilong12	ed9dd7c9f0	add send and recv ops (#28590 ) * update, test=develop	5 years ago
Zhong Hui	a829357e4d	register the op version for some ops	5 years ago
Zhou Wei	bf6e7cba7a	update 2.0 API English doc (#28525 ) * make sure the NumPy version is below 1.19.3 * fix 2.0 doc	5 years ago
Shang Zhizhou	8699f38d08	Add TRT support for pruned transformer models; fix the bug that TensorRT does not support DeletePass (#28517 ) * skip_layernorm_op done * add unittest * slice op convertor support trt < 6 * skip_layernorm only work in ernie	5 years ago
joejiong	08d2413142	add log2 operator (#28319 ) As the title	5 years ago
wangchaochaohu	c52fe48f6f	fix the GetKernelTypeForVar of input for fluid.gather (#28534 )	5 years ago
wangchaochaohu	d7cfee9b31	Checkout point add (#28488 ) * upgrade pass capability	5 years ago
zhupengyang	47cbf61dd4	fix softmax unittest float16 random error (#28480 )	5 years ago
wangchaochaohu	e14ed71cc2	refine the performance of gather Op (#28458 )	5 years ago
YUNSHEN XIE	ba0756325a	exec ut no more than 15s 1 (#28439 ) * disable ut test_parallel_executor_fetch_isolated_var,test=document_fix * test for limiting ut exec time as 15S * fix an error caused by cannot find ut * fix some error * can not find test_transformer * fix error caused by ut not run in windows * fix error caused by Compiler Options * fix error caused by setting timeout value as 15 in python/paddle/tests/CMakeLists.txt * setting timeout value to 120s for old ut * add the timeout value setting * fix error caused by ut only run in coverage_ci * add analyzer_transformer_profile_tester * fix some error * fix some error * fix error with inference option * fix error with inference option setting as ON_INFER * add some ut to set timeout * modified some option * fix error * fix some timeout error * fix error * fix error * fix timeout for test_analyzer_bfloat16_resnet50 * fix error * setting timeout property for some ut * first pr for new ut timeout as 15S	5 years ago
taixiurong	fad4744aa4	fix crash in adam in xpu, *test=kunlun (#28433 )	5 years ago
QingshuChen	6bba8e57b1	fix batch_norm_xpu bug & remove xpusimulator dependence (#28430 ) *test=kunlun	5 years ago
joanna.wozna.intel	7821759d48	Add bfloat16 softmax and gelu (#28394 ) * Add bfloat16 softmax and gelu * Add pass attr bfloat16_enabled_op_types * Changes from review	5 years ago
石晓伟	c41fd033e5	check op_version_registry in CI test, test=develop (#28402 )	5 years ago
Jacek Czaja	ca41541472	[oneDNN]Sum bf16 kernel (#28382 ) * - Added sum bf16 oneDNN test=develop * - Fix to UT of sum bf16 test=develop	5 years ago
Leo Chen	8b2436a776	Add broadcast_shape api (#28257 ) * add broadcast_shape api * add ut * follow comments * add example code, test=dodument_fix * update example code, test=document_fix	5 years ago
石晓伟	21a63f6f90	enhance the op_version_registry, test=develop (#28347 ) * enhance the op_version_registry, test=develop * add unittests, test=develop * enhance the op_version_registry, test=develop * fix bugs, test=develop * revert pybind_boost_headers.h, test=develop * fix a attribute bug, test=develop	5 years ago
Shang Zhizhou	ea851796e5	Optimize inference performance of the ernie model in TensorRT and support variable-length input (#28367 ) * fp16 result ok * change -DWITH_NVINFER_PLUGIN to config.EnableTensorRtOSS * auto detect special slice op converter for ernie with trt oss * ernie oss only support fp16 * fix special_slice_plugin serialize bug * matmul in tensorrt ok * ernie unittest ok * add matmul tensorrt unittest * remove demo code	5 years ago
Jacek Czaja	84cc61b2cd	[oneDNN] sum op refactor (#28318 )	5 years ago
Wilber	09fd2b2aab	Paddle support compile on sw (#27858 )	5 years ago
Leo Chen	6115c14fca	Pool2d cuda kernel supports fp16 (#28316 ) * pool2d cuda kernel supports fp16 * fix compile issue of template * add ut	5 years ago
Guo Sheng	9a600df373	Add rnn_op (#28197 ) * Add rnn_op. test=develop * Fix rnn_op grad maker's drop_empty_grad. test=develop	5 years ago
wangguanzhong	5262b02585	add generate_proposals_v2 op (#28214 ) * add generate_proposals_v2 op	5 years ago
joanna.wozna.intel	571a63e7ec	Add bf16 transpose2, reshape2, concat ops (#28195 )	5 years ago
Guanghua Yu	e8f2614da5	Enhance multiclass_nms op to support LoD for dygraph mode (#28276 ) * Enhance multiclass_nms to support LoD for dygraph mode * fix some error in multiclass_nms * update GetLodFromRoisNum to GetNmsLodFromRoisNum	5 years ago
Leo Chen	8953038400	Fix transpose in conv cudnn kernel when addto enabled (#28295 )	5 years ago
Tao Luo	e1e666a05f	fix conv mkldnn build error (#28288 )	5 years ago
Jacek Czaja	0b678d401b	- sum (#28233 ) test=develop	5 years ago
Jacek Czaja	c11d9b3035	[oneDNN ] conv2d fwd&bwd optimization (#27871 )	5 years ago
wangxinxin08	41d26a8287	update matrix nms op to api 2.0 (#28265 ) * update matrix nms op to api 2.0 * modify code according to review	5 years ago
Leo Chen	7fcb32ddf3	fill_constant op supports NINF (#28270 )	5 years ago
wangchaochaohu	6905608cea	refine yolo box Op for performance optimization (#28155 )	5 years ago
wangchaochaohu	cdadc8f019	refine temporal_shift_op for performance optimization using gpu kernel config (#28114 )	5 years ago
Zhang Ting	fdc06f2158	add Fuse bn add act pass (#28196 ) * add fuse_bn_add_act pass	5 years ago
Chen Weihang	2babd6ff67	Add compile limit for PADDLE_ENFORCE without error message (#28221 ) * add compile limit for paddle enforce * polish elementwise_op_function.cu.h * fix failed unittest * fix windows compile failed * detail polish * revert no type constructor	5 years ago
Double_V	2db77be423	fix wrong data type, test=develop (#28203 )	5 years ago
Feiyu Chan	efe6e2840c	fix strided_slice_op's GetExpectedKernelType (#28192 ) * fix strided_slice_op's GetExpectedKernelType when input tensor is at CUDAPinnedPlace * add unittest for tensors in cuda pinned place * skip test for cuda pinned place on cpu machines	5 years ago
WangXi	e450823b8b	Fix nccl op test failed, test=develop (#28172 )	5 years ago
wangguanzhong	5cd97a1cb0	support multiclass nms for multi-batch, test=develop (#28154 )	5 years ago
Double_V	5289b72acc	fix Wmaybe-uninitialized warning in pooling.cc, test=develop (#28126 )	5 years ago
wangguanzhong	d1e1f17482	fix generate_proposal_labels in cascade-rcnn series model, test=develop (#27892 ) * fix generate_proposal_labels in cascade-rcnn series model, test=develop * fix example code & unittest, test=develop * update code from review comments, test=develop	5 years ago
Leo Chen	a911c19eb0	fill_constant op supports NaN and Inf (#28109 ) * fill_constant supports nan and inf * add ut	5 years ago
zhupengyang	6dd64b0a30	randperm run error in multi-gpus (#27942 )	5 years ago
Double_V	d43f75e4cc	add rois_num for roi_align xpu OP (#28077 ) * add stack pool2d roi_align xpu op,test=kunlun * error message opt, test=kunlun * add xpu unittest,test=kunlun * skip check grad,test=kunlun * fix boostget , test=kunlun * error message opt for XPU, test=kunlun * add rois_num for roi_align xpu OP, test=develop	5 years ago
xiaoting	e3d02c9574	rm max_input in conv2d for kunlun, test=kunlun (#28062 )	5 years ago
wangchaochaohu	463c72c2d9	refine gpu kernel config for Paddle (#28085 )	5 years ago
yinhaofeng	2cb1ecb99e	lookup_table_v2_op_xpu report errors;test=kunlun (#28064 ) * lookup_table_v2_op_xpu report errors;test=kunlun * lookup_table_v2_op_xpu report errors;test=kunlun	5 years ago
yinhaofeng	6f0c3d1f06	xpu adam op (#28031 ) * lookup_table_xpu op report errors;test=kunlun * add adam xpu op;test=kunlun * reset lookup * change adam wrong;test=kunlun	5 years ago
TeslaZhao	a5c95cd588	Add xpu transpose2 op.test=kunlun (#28086 )	5 years ago
Chengmo	5f04875c30	Fix xpu error message (#28061 ) * fix error message,test=kunlun * fix, test=kunlun	5 years ago
LutaoChu	c8d32c8c10	Fix diag OP bug on Windows Python3.8, remove the std::min	5 years ago
huangxu96	d466893820	Allclose op (#27891 ) * Still has bugs. * Fixed allclose_op bug, which cannot deal with some cases of fp64 inputs. * improved CUDA kernel performance. * Changed CUDA code. * Fixed a bug in cuda kernel which cannot deal with large dimension input, and added an unittest for it. * Add a test case for float32 input.	5 years ago
pangyoki	975bd8873b	Fix error message of multinomial op (#27946 ) * fix multinomial doc * fix multinomial error message * little doc change * fix Categorical class doc * optimize format of error message * fix CPU Kernel error message format * fix isinf and isnan error in WindowsOPENBLAS CI * delete inf and nan * add manual_seed in sample code * little error message change * change error message to InvalidArgument * add full point for error message and add manual_seed in CPU environment	5 years ago
Kaipeng Deng	b6eff4427c	update yolo_box support h != w. test=develop (#27327 )	5 years ago
Double_V	c1eed1fa24	error message opt for XPU, test=kunlun (#27972 ) * add stack pool2d roi_align xpu op,test=kunlun * error message opt, test=kunlun * add xpu unittest,test=kunlun * skip check grad,test=kunlun * fix boostget , test=kunlun * error message opt for XPU, test=kunlun	5 years ago
pangyoki	4c5b779a99	Add truncated_gaussian_random XPU kernel (#27861 ) * Add truncated_gaussian_random_op XPU kernel * Add truncated_gaussian_random_op XPU kernel, test=kunlun * little change, test=kunlun * change boost_get to BOOST_GET_CONST * change boost_get to BOOST_GET_CONST, test=kunlun * little change, test=kunlun * use Generator to generate random number and optimize format, test=kunlun * little change, test=kunlun * add TODO, test=kunlun	5 years ago
pangyoki	5b8e500135	Add gaussian_random XPU kernels (#27853 ) * Add gaussian_random XPU kernels * commit kunlun, test=kunlun * new version, test=kunlun * change boost_get to BOOST_GET_CONST, test=kunlun * use Generator to generate random number and optimize format, test=kunlun * add TODO, test=kunlun	5 years ago

5977 Commits (84639b61939ccd68702e6423f50f085af93ede19)