Paddle

Commit Graph

Author	SHA1	Message	Date
ShenLiang	0fb18bc214	enforce the matmul_v2 error message (#29297 )	4 years ago
Zhen Wang	9b59a589b1	Remove some useless log. (#29300 )	4 years ago
Leo Chen	13a22a3752	fix shape of tile_grad op (#29289 )	4 years ago
Zhen Wang	be3777a50a	Add pure fp16 training with master weights. (#27712 ) * add the weight decay func for the momentum op * Add the multi_precision function in Momentum Optimizer. * Make sure that the initial value of master weights are same with the fp16 weights. * add static loss scaling. * add the rescale_grad function in the pure fp16 training. * use the original momentum updating method. * Polish some codes, such as variable names. * add docstring for apis. * update the var creation details of _create_master_weight. * not modify codes about imperative momentum updating. * Fix the error of test_dist_sparse_tensor_load_momentum UT. * add unit test for multi precision fp16 training. * add more unit tests for CI. * Use lower threshold values for allclose comparing in test_multi_precision_fp16_train UT. * For CI Coverage Checking.	4 years ago
furnace	7584bb5096	Layer norm fp16 (#29169 ) * add fp16 for layer_norm op * revert layernorm api * fix forward * fix forward * fix backward for layernorm with fp16 * fix unit test for layernorm with fp16 * fix with_mkldnn compile error for layernorm with fp16 * 1. revert to PADDLE_ENFORCE_NOT_NULL, 2. change static_cast<float> to static_cast<U> * fix with_mkldnn compile error for layernorm with fp16 * fix with_mkldnn compile error for layernorm with fp16 Co-authored-by: zhiqiu <chenqiuliang@baidu.com>	4 years ago
Leo Chen	116305ea4b	Improve performance of elementwise_add grad op (#29187 ) * pass stop_gradient for cast op * improve performance of elementwise_add grad * use tensor copy async * dygraph branch * fix dygraph branch * add ut	4 years ago
卖鱼的哲学	07c67d5a8b	add deformable_conv op on xpu (#29234 ) * rebase develop * update deformable_conv op on xpu * update deformable_conv op on xpu	4 years ago
QingshuChen	64f29fbb70	update kunlun conv2d/softmax/elementwise implementation (#29229 ) * update conv2d & softmax to new xpu api * test=kunlun * remove useless comments * test=kunlun * remove softmax xpu op * test=kunlun * update kunlun softmax * test=kunlun * update xpu unittest * test=kunlun * fix elementwise_grad bug for kunlun *test=kunlun	4 years ago
chentianyu03	8f45d14263	add complex64 and complex128 type; add +-/@ and slice operator for complex types (#29199 ) add test cases for complex elementwise, matmul and getitem unittest * add test cases for complex types * add test cases for complex matmul unittest	4 years ago
Wilber	74c43ac638	fix lite unit test. (#29233 )	4 years ago
Adam Osewski	4096ff94dc	Small optimizations for conv2d kernel subroutines. (#29188 ) - Make sure that oneDNN memory descriptors are created only once at first iteration.	4 years ago
123malin	b5c6342336	Update ps gpu (#29209 ) * fix parameter prefetch & device guard Co-authored-by: MrChengmo <cmchengmo@163.com> Co-authored-by: chengmo <chengmo@baidu.com>	4 years ago
123malin	03d4665f44	prefetch optimize (#29095 ) * test=develop, optimize async prefetch	4 years ago
WangXi	0c2a51d240	optimizer amp, all use fp16 communication, overlap last comm and compute (#28957 )	4 years ago
Jack Zhou	bc6033f86b	fix gru gcc7.4 bug for the gru compile	4 years ago
wangchaochaohu	b818429ae7	optimize cumsum OP (#29193 )	4 years ago
lilong12	7e5e9934fe	update expand as op to use the shape of the target tensor instead of the target tensor itself. (#29020 ) * update, test=develop	4 years ago
Jack Zhou	085260f3de	Add eigen gru and fix the dropout bug in the rnn	4 years ago
arlesniak	bc902044a4	Fixes mkldnn dygraph learning rate scheduler crashes (#28988 )	4 years ago
Shang Zhizhou	b9e76a0103	detect tensorRT plugin fp16 in runtime (#27933 ) * remove -DSUPPORTS_CUDA_FP16 in cuda.cmake * compile with cuda9 * add some unittest * notest;test=coverage * add unittest for trt plugin swish && split * update ernie unittest * fix some error message * remove repeated judgement of CUDA version in mbEltwiseLayerNormOpConverter * fix compile error when CUDA_ARCH_NAME < Pascal * fix compile error * update unittest timeout * compile with cuda9 * update error msg * fix code style * add some comments * add define IF_CUDA_ARCH_SUPPORT_FP16 * rename IF_CUDA_ARCH_SUPPORT_FP16 to CUDA_ARCH_FP16_SUPPORTED	4 years ago
Noel	da71173bc9	Fix ops doc for some ops	4 years ago
joanna.wozna.intel	b0d1ac161e	Add bf16 pool2d and unify bf16 unit tests (#29039 ) * Add bf16 pool2d and unify bf16 unit tests * Add change default ops test	4 years ago
joejiong	582c0a0468	add uint8 for reshape op (#28996 ) add uint8 for reshape operator	4 years ago
taixiurong	a5aa4dc7a9	add xpu elementwise ops (#29031 )	4 years ago
joejiong	b04c78ef5e	Update pow (#29000 ) Simple code clean up	4 years ago
wawltor	b2c8a00745	remove eigen threadpool for the speed up	4 years ago
lilong12	767d0ba267	update, test=develop (#28700 )	4 years ago
123malin	fbf9564f6b	【paddle.distributed.fleet】Optimize ParameterServer's Async Mode (#28442 ) * test=develop, optimize global_step	4 years ago
furnace	8ff3550658	refactor momentum op to combine weight (#27414 ) * refactor momentum op to combine weight_decay (scale op and sum op)	4 years ago
Jacek Czaja	bd1d6d3b30	extends oneDNN caching keys so caching objects are unique to executor/predictor (#28758 )	4 years ago
yaoxuefeng	71c1cd1408	fix truncated_gaussian seed (#28777 )	4 years ago
gongweibao	1dad8ceaab	Fix gpu memory allocation bug. (#28703 )	4 years ago
Chen Weihang	b969c32ab1	fix occupied 0 device memory bug (#28771 )	4 years ago
joejiong	1a532d5133	add uint8 support for squeeze operator (#28734 ) Adding uint8 support for squeeze operator.	4 years ago
wangchaochaohu	8b853b3030	fix the number of perf algo for conv cudnn in exhaustive mode (#28694 )	4 years ago
joanna.wozna.intel	8c0ea4bffe	Add bf16 matmul, fc, elementwise add and mul (#28729 ) * Add bf16 matmul, fc, elementwise add and mul * Correct unit test	4 years ago
yaoxuefeng	08b62f4902	fix shuffle batch op shuffle (#28533 )	4 years ago
taixiurong	d3d1a6b6e0	add kunlun kernel: slice, slice_grad, top_k, cast. test=kunlun (#28542 ) 1.add xpu slice op 2. add xpu top_k op 3.modify xpu cast to new api * 1.add xpu slice op 2. add xpu top_k op 3.modify xpu cast to new api	4 years ago
Jack Zhou	9362d85e0e	Add LSTM, Simple RNN and GRU CPU kernel (#28577 ) * add lstm, simple rnn op kernel * fix the test_lstm for the rnn op * change func name * fix forward postprocess bug * add gru forward, backward code * remove unittest.skipIf; use a big rnn op instead of combination op * fix input doesn't have gradient bug * add eigen lstm forward, backward Co-authored-by: wawltor <fangzeyang0904@hotmail.com>	4 years ago
QingshuChen	30ef3815b3	adjust kunlun header file (#28536 ) * adjust kunlun header file test=kunlun update kunlun unittest test=kunlun update xpu unittest * test = kunlun * update xpu unittest * test=kunlun * update xpu unittest * test=kunlun	4 years ago
Zhang Ting	dab4920568	improve performance of cast op (#28727 )	4 years ago
yaoxuefeng	03f46e3526	fix truncated_gaussian op cuda seed setting (#28678 )	4 years ago
Wojciech Uss	04bcc13fac	Add multi_gru op and tests (#28591 ) * Add multi_gru op and tests * removed redundant disable_dygraph()	4 years ago
joejiong	32b90b1c2d	add log10 (#28576 ) Add new operator log10	4 years ago
Guo Sheng	858ffa0c8b	Fix the dropout setting when not initialized in rnn_op. (#28561 ) test=develop	4 years ago
Jacek Czaja	6d8d3d4c22	[oneDNN] Layer norm bf16 kernel (#28619 )	4 years ago
Zhou Wei	bf143652ac	fix lstm OP compile error on windows (#28667 ) * add unittest and check unittest for windows * fix lstm OP compile error on windows	4 years ago
石晓伟	57dab959ca	add datanorm op new scale_w register (#28657 ) Co-authored-by: yaoxuefeng6 <yaoxuefeng@baidu.com>	4 years ago
cc	65aac81191	Fix fake_quant error when cout > 1024, test=develop (#28603 )	4 years ago
lilong12	b2f7ab6636	bug fix, test=develop (#28648 )	4 years ago
wawltor	8f2656ef5c	fix the gradient bug for the topk v2	4 years ago
wangchaochaohu	a972c33fd7	refine gather OP performance for dynamic mode (#28587 )	4 years ago
joanna.wozna.intel	2cb71c0cde	Add checkpoint to quantize (#28612 ) * Add checkpoint to quantize * Change bfloat16 option	4 years ago
pangyoki	b889a0cee2	add gaussian_random op_version (#28602 )	4 years ago
Guo Sheng	110febdc54	Fix gradients with ignore_idx in softmax_with_cross_entropy (#28622 ) * Fix gradients with ignore_idx in softmax_with_cross_entropy. test=develop * Fix gradients with ignore_idx in softmax_with_cross_entropy on cpu. Remove softmax_with_cross_entropy from op_threshold_white_list. test=develop * Fix test_softmax_cross_entropy_op.py. test=develop	4 years ago
Leo Chen	f962bd3432	Fix cudnn workspace limit in cudnn-8 (#28611 )	4 years ago
Leo Chen	90805e2df7	Register op_version for new attribute use_addto (#28463 ) * register op_version for addto * upgrade pass capability * change eq to le * change eq to le * fix merge	4 years ago
lilong12	ed9dd7c9f0	add send and recv ops (#28590 ) * update, test=develop	4 years ago
Zhong Hui	a829357e4d	register the op version for some ops	4 years ago
Zhou Wei	bf6e7cba7a	update 2.0 API English doc (#28525 ) * make Numpy version below 1.19.3 * fix 2.0 doc	4 years ago
Shang Zhizhou	8699f38d08	Support TRT for pruned transformer models; fix the bug that TensorRT does not support DeletePass (#28517 ) * skip_layernorm_op done * add unittest * slice op convertor support trt < 6 * skip_layernorm only work in ernie	4 years ago
joejiong	08d2413142	add log2 operator (#28319 ) As the title	4 years ago
wangchaochaohu	c52fe48f6f	fix the GetKernelTypeForVar of input for fluid.gather (#28534 )	4 years ago
wangchaochaohu	d7cfee9b31	Checkout point add (#28488 ) * upgrade pass capability	4 years ago
zhupengyang	47cbf61dd4	fix softmax unittest float16 random error (#28480 )	4 years ago
wangchaochaohu	e14ed71cc2	refine the performance of gather Op (#28458 )	4 years ago
YUNSHEN XIE	ba0756325a	exec ut no more than 15s 1 (#28439 ) * disable ut test_parallel_executor_fetch_isolated_var,test=document_fix * test for limiting ut exec time as 15S * fix an error caused by cannot find ut * fix some error * can not find test_transformer * fix error caused by ut not run in windows * fix error caused by Compiler Options * fix error caused by setting timeout value as 15 in python/paddle/tests/CMakeLists.txt * setting timeout value to 120s for old ut * add the timeout value setting * fix error caused by ut only run in coverage_ci * add analyzer_transformer_profile_tester * fix some error * fix some error * fix error with inference option * fix error with inference option setting as ON_INFER * add some ut to set timeout * modified some option * fix error * fix some timeout error * fix error * fix error * fix timeout for test_analyzer_bfloat16_resnet50 * fix error * setting timeout property for some ut * first pr for new ut timeout as 15S	4 years ago
taixiurong	fad4744aa4	fix crash in adam in xpu, *test=kunlun (#28433 )	4 years ago
QingshuChen	6bba8e57b1	fix batch_norm_xpu bug & remove xpusimulator dependence (#28430 ) *test=kunlun	4 years ago
joanna.wozna.intel	7821759d48	Add bfloat16 softmax and gelu (#28394 ) * Add bfloat16 softmax and gelu * Add pass attr bfloat16_enabled_op_types * Changes from review	4 years ago
石晓伟	c41fd033e5	check op_version_registry in CI test, test=develop (#28402 )	4 years ago
Jacek Czaja	ca41541472	[oneDNN]Sum bf16 kernel (#28382 ) * - Added sum bf16 oneDNN test=develop * - Fix to UT of sum bf16 test=develop	4 years ago
Leo Chen	8b2436a776	Add broadcast_shape api (#28257 ) * add broadcast_shape api * add ut * follow comments * add example code, test=dodument_fix * update example code, test=document_fix	4 years ago
石晓伟	21a63f6f90	enhance the op_version_registry, test=develop (#28347 ) * enhance the op_version_registry, test=develop * add unittests, test=develop * enhance the op_version_registry, test=develop * fix bugs, test=develop * revert pybind_boost_headers.h, test=develop * fix a attribute bug, test=develop	4 years ago
Shang Zhizhou	ea851796e5	Optimize ernie model inference performance in TensorRT; support variable-length input (#28367 ) * fp16 result ok * change -DWITH_NVINFER_PLUGIN to config.EnableTensorRtOSS * auto detect special slice op converter for ernie with trt oss * ernie oss only support fp16 * fix special_slice_plugin serialize bug * matmul in tensorrt ok * ernie unittest ok * add matmul tensorrt unittest * remove demo code	4 years ago
Jacek Czaja	84cc61b2cd	[oneDNN] sum op refactor (#28318 )	4 years ago
Wilber	09fd2b2aab	Paddle support compile on sw (#27858 )	4 years ago
Leo Chen	6115c14fca	Pool2d cuda kernel supports fp16 (#28316 ) * pool2d cuda kernel supports fp16 * fix compile issue of template * add ut	4 years ago
Guo Sheng	9a600df373	Add rnn_op (#28197 ) * Add rnn_op. test=develop * Fix rnn_op grad maker's drop_empty_grad. test=develop	4 years ago
wangguanzhong	5262b02585	add generate_proposals_v2 op (#28214 ) * add generate_proposals_v2 op	4 years ago
joanna.wozna.intel	571a63e7ec	Add bf16 transpose2, reshape2, concat ops (#28195 )	4 years ago
Guanghua Yu	e8f2614da5	Enhance multiclass_nms op to support LoD for dygraph mode (#28276 ) * Enhance multiclass_nms to support LoD for dygraph mode * fix some error in multiclass_nms * update GetLodFromRoisNum to GetNmsLodFromRoisNum	4 years ago
Leo Chen	8953038400	Fix transpose in conv cudnn kernel when addto enabled (#28295 )	4 years ago
Tao Luo	e1e666a05f	fix conv mkldnn build error (#28288 )	4 years ago
Jacek Czaja	0b678d401b	- sum (#28233 ) test=develop	4 years ago
Jacek Czaja	c11d9b3035	[oneDNN ] conv2d fwd&bwd optimization (#27871 )	4 years ago
wangxinxin08	41d26a8287	update matrix nms op to api 2.0 (#28265 ) * update matrix nms op to api 2.0 * modify code according to review	4 years ago
Leo Chen	7fcb32ddf3	fill_constant op supports NINF (#28270 )	4 years ago
wangchaochaohu	6905608cea	refine yolo box Op for performance optimization (#28155 )	4 years ago
wangchaochaohu	cdadc8f019	refine temporal_shift_op for performance optimization using gpu kernel config (#28114 )	4 years ago
Zhang Ting	fdc06f2158	add Fuse bn add act pass (#28196 ) * add fuse_bn_add_act pass	4 years ago
Chen Weihang	2babd6ff67	Add compile limit for PADDLE_ENFORCE without error message (#28221 ) * add compile limit for paddle enforce * polish elementwise_op_function.cu.h * fix failed unittest * fix windows compile failed * detail polish * revert no type constructor	4 years ago
Double_V	2db77be423	fix wrong data type, test=develop (#28203 )	4 years ago
Feiyu Chan	efe6e2840c	fix strided_slice_op's GetExpectedKernelType (#28192 ) * fix strided_slice_op's GetExpectedKernelType when input tensor is at CUDAPinnedPlace * add unittest for tensors in cuda pinned place * skip test for cuda pinned place on cpu machines	4 years ago
WangXi	e450823b8b	Fix nccl op test failed, test=develop (#28172 )	4 years ago
wangguanzhong	5cd97a1cb0	support multiclass nms for multi-batch, test=develop (#28154 )	4 years ago
Double_V	5289b72acc	fix Wmaybe-uninitialized warning in pooling.cc, test=develop (#28126 )	4 years ago
wangguanzhong	d1e1f17482	fix generate_proposal_labels in cascade-rcnn series model, test=develop (#27892 ) * fix generate_proposal_labels in cascade-rcnn series model, test=develop * fix example code & unittest, test=develop * update code from review comments, test=develop	4 years ago
Leo Chen	a911c19eb0	fill_constant op supports NaN and Inf (#28109 ) * fill_constant supports nan and inf * add ut	4 years ago
zhupengyang	6dd64b0a30	randperm run error in multi-gpus (#27942 )	4 years ago
Double_V	d43f75e4cc	add rois_num for roi_align xpu OP (#28077 ) * add stack pool2d roi_align xpu op,test=kunlun * error message opt, test=kunlun * add xpu unittest,test=kunlun * skip check grad,test=kunlun * fix boostget , test=kunlun * error message opt for XPU, test=kunlun * add rois_num for roi_align xpu OP, test=develop	4 years ago
xiaoting	e3d02c9574	rm max_input in conv2d for kunlun, test=kunlun (#28062 )	4 years ago
wangchaochaohu	463c72c2d9	refine gpu kernel config for Paddle (#28085 )	4 years ago
yinhaofeng	2cb1ecb99e	lookup_table_v2_op_xpu report errors;test=kunlun (#28064 ) * lookup_table_v2_op_xpu report errors;test=kunlun * lookup_table_v2_op_xpu report errors;test=kunlun	4 years ago
yinhaofeng	6f0c3d1f06	xpu adam op (#28031 ) * lookup_table_xpu op report errors;test=kunlun * add adam xpu op;test=kunlun * reset lookup * change adam wrong;test=kunlun	4 years ago
TeslaZhao	a5c95cd588	Add xpu transpose2 op.test=kunlun (#28086 )	4 years ago
Chengmo	5f04875c30	Fix xpu error message (#28061 ) * fix error message,test=kunlun * fix, test=kunlun	4 years ago
LutaoChu	c8d32c8c10	Fix diag OP bug on Windows Python3.8, remove the std::min	4 years ago
huangxu96	d466893820	Allclose op (#27891 ) * Still has bugs. * Fixed allclose_op bug, which cannot deal with some cases of fp64 inputs. * improved CUDA kernel performance. * Changed CUDA code. * Fixed a bug in cuda kernel which cannot deal with large dimension input, and added an unittest for it. * Add a test case for float32 input.	4 years ago
pangyoki	975bd8873b	Fix error message of multinomial op (#27946 ) * fix multinomial doc * fix multinomial error message * little doc change * fix Categorical class doc * optimize format of error message * fix CPU Kernel error message format * fix isinf and isnan error in Windows OPENBLAS CI * delete inf and nan * add manual_seed in sample code * little error message change * change error message to InvalidArgument * add full point for error message and add manual_seed in CPU environment	4 years ago
Kaipeng Deng	b6eff4427c	update yolo_box support h != w. test=develop (#27327 )	4 years ago
Double_V	c1eed1fa24	error message opt for XPU, test=kunlun (#27972 ) * add stack pool2d roi_align xpu op,test=kunlun * error message opt, test=kunlun * add xpu unittest,test=kunlun * skip check grad,test=kunlun * fix boostget , test=kunlun * error message opt for XPU, test=kunlun	4 years ago
pangyoki	4c5b779a99	Add truncated_gaussian_random XPU kernel (#27861 ) * Add truncated_gaussian_random_op XPU kernel * Add truncated_gaussian_random_op XPU kernel, test=kunlun * little change, test=kunlun * change boost_get to BOOST_GET_CONST * change boost_get to BOOST_GET_CONST, test=kunlun * little change, test=kunlun * use Generator to generate random number and optimize format, test=kunlun * little change, test=kunlun * add TODO, test=kunlun	4 years ago
pangyoki	5b8e500135	Add gaussian_random XPU kernels (#27853 ) * Add gaussian_random XPU kernels * commit kunlun, test=kunlun * new version, test=kunlun * change boost_get to BOOST_GET_CONST, test=kunlun * use Generator to generate random number and optimize format, test=kunlun * add TODO, test=kunlun	4 years ago
pangyoki	74ce039743	Add uniform_random XPU kernel (#27846 ) * support uniform_random op on Baidu Kunlun * change dtype of attr shape from int to int64_t * kunlun ci, test=kunlun * new version, test=kunlun * change boost_get to BOOST_GET_CONST * change boost_get to BOOST_GET_CONST, test=kunlun * use Generator to generate random number and optimize format * run Kunlun CI, test=kunlun * add TODO, test=kunlun	4 years ago
xiaoting	abf4d52a74	Polish kunlun error (#27974 ) * polish error message,test=kunlun * polish error,test=kunlun * polish error,test=kunlun * polish error,test=kunlun	4 years ago
liuyuhui	3e9568653b	add cast/concat/assign xpu op (#27911 ) * add * add cast_op_xpu, test=kunlun * fix bug for cast_op_xpu,test=kunlun * add concat_op_xpu, test=kunlun * solve conflicts, test=kunlun * fix bug,test=kunlun * add assign_op_xpu, test=kunlun * fix bug,test=kunlun * test=kunlun;test=develop * fix concat bug,test=kunlun * fix check_dygraph set in test_concat_op_xpu.py,test=kunlun * fix error message,test=kunlun Co-authored-by: mapingshuo <mps2012@yeah.net>	4 years ago
Guo Sheng	fa9d3fa5bf	Incorporate cudnn_lstm into LSTM api (#27217 ) * Incorporate cudnn_lstm into LSTM api. test=develop * Make coalesce_tensor support alignment optionally. test=develop * Reorganize RNN apis. test=develop * Fix cudnn rnn layout conversion. test=develop * Add sequence_length support for RNN cudnn implement. Add optional init_h and init_c gradient for cudnn_lstm_op. test=develop * Use create_parameter for rnn cudnn impl. test=develop * Move `self._flat_weight = self.create_parameter()` in RNNBase to main_program. test=develop * Update RNN api unittest to use set_device. test=develop * Fix set_place for unit tests of RNN apis. test=develop * Fix use_align in coalesce_tensor_op. test=develop * Adjust RNN apis arguments according to comments. test=develop * Polish documents for SimpleRNN apis. test=develop * Refine random seed in cudnn_lstm_op. Expose rnn params from sublayers to RNN. test=develop * Fix RNN saving for jit.save. Refine cudnn_lstm dropout behavior. test=develop * Fix doc of GRU. test=develop * Use ShareDataWith to avoid copying for cudnn_lstm_op test. test=develop * Remove updates on cudnn_lstm temporarily. test=develop * Use ShareDataWith to avoid copying for cudnn_lstm_op test. test=develop * Refine random seed in cudnn_lstm_op. test=develop * Fix test_lstm by adjust ConcreteProgram buffer getter. test=develop * Use create_parameter instead of create_var for rnn._flat_weight for static graph usage. test=develop * Remove W input for cudnn_lstm to pass unused_var_check. test=develop * Add test_predict for RNN unit tests coverage. test=develop * Fix code style of rnn. test=develop * Fix F.rnn usage in rnn.py. test=develop	4 years ago
Guanghua Yu	f94d053705	error message optimization in mean_xpu,softmax_with_cross_entropy_op_xpu,test=kunlun (#27967 )	4 years ago
Jack Zhou	d330cf66cc	Fix xpu enforce (#27978 ) * test=kunlun; Add elementwise XPU OP kernel for KUNLUN core, including (but still cannot process common broadcast): * elementwise_div op * elementwise_max op * elementwise_mul op (with grad op) * elementwise_sub op (with grad op) * 0.05->0.01 * add xpu error message description;test=kunlun	4 years ago
lidanqing	7cb4a8b8f2	[oneDNN] Conv dilation support (#27914 ) * conv dilated mkldnn support: forward and backward pass * add mkldnn conv_transpose dilation UT test=develop * remove unnecessary PADDLE_ENFORCE * add int8 and bf16 dilated conv UT * update according to reviews	4 years ago
mapingshuo	64c2634995	fix kunlun kernel of reshape op (#27988 )	4 years ago
tangwei12	202bfab1be	Feature/large scale kv save base/delta (#27470 ) * add size method for large scale * add large scale UT * add ut for checkpoint	4 years ago
123malin	aa3b4ed717	【paddle.fleet】geo send sparse optimize (#27719 ) * test=develop, fix geo sgd communicator * test=develop, gloo_init_method * test=develop, bug fix for gloo http_init	4 years ago
mapingshuo	5ccaaab8aa	reshape support bool, test=develop (#27944 )	4 years ago
Qinghe JING	4a4f773658	Add reduce sum and reduce mean xpu op (#27939 ) * add reduce xpu op test=develop;test=kunlun * add reduce xpu op test=develop;test=kunlun * add reduce xpu op test=develop;test=kunlun * add reduce xpu op test=develop;test=kunlun * add reduce xpu op test=develop;test=kunlun	4 years ago
Zhou Wei	bf412f4665	add tensor clone (#27953 ) * add tensor clone * fix unittest test_var_base	4 years ago
Feiyu Chan	2e845182d9	support channel last in BatchNormd 1. support channel last in BatchNormd (#27875) 2. fix a bug in batch_norm_op cuda kernel by extracting ResizeToChannelFist(Last), TransToChannelFirst(Last) to operators/layer_utils.h	4 years ago
Leo Chen	9a2a4b5f65	Support setting xpu place in dygraph mode (#27909 ) * support setting xpu place * add ut, test=kunlun	4 years ago
MRXLT	263a9e97fd	Fix adam (#27778 ) * fix adam * fix gpu adam * fix code style * fix ut * update ut add cuda code	4 years ago
Double_V	b0edda4d99	kunlun add op (#27890 ) * add stack pool2d roi_align xpu op,test=kunlun * error message opt, test=kunlun * add xpu unittest,test=kunlun * skip check grad,test=kunlun * fix boostget , test=kunlun	4 years ago
Jack Zhou	c791df09cf	Add elementwise XPU OP kernel for KUNLUN core, including (but still cannot process common broadcast)	4 years ago
wangchaochaohu	c5fcc96d5b	xpu support for fill_constant Op (#27675 )	4 years ago
Chengmo	328cb289ed	【paddle.fleet】fix sparse load (#27680 ) * add sparse tensor load method	4 years ago
tangwei12	cf70d5b350	fix paddle error informations (#27889 )	4 years ago
wawltor	95aa53425d	update the code for the topk message optimize	4 years ago
Chen Weihang	4ba977c720	Polish some error message in operators (#27876 ) * polish some error message * add white list * revert shell script change	4 years ago
123malin	a4f850748a	【paddle.fleet】bug fix for parameter_recv (#27838 ) * test=develop, bug fix for parameter_recv * test=develop, for unittest, test_fleet_rolemaker_new	4 years ago
QingshuChen	2712d07644	support kunlun matmul_v2 (#27910 ) *test=kunlun	4 years ago
zhang wenhui	5a83496c8d	Multi task (#26002 ) * add multitask * add multitask, test=develop * fix code style, test=develop * add partial push dense, test=develop * fix has_key in py3, test=develop * fix, test=develop * fix, test=develop * fix, test=develop	4 years ago
zhang wenhui	7a58431c0a	fix norm api doc, test=develop (#27652 ) * fix norm api doc, test=develop * fix error message, test=develop * fix api norm, test=develop * add adagrad, test=develop * fix bug, test=develop * fix bug, test=develop * add spectral_norm, test=develop * fix adagrad, test=develop * merge , test=develop	4 years ago
yinhaofeng	3eb106da6d	Lookup table v2 xpu (#27888 ) * add lookup_table_v2_op_xpu, test=kunlun * add lookup_table_v2_op_xpu, test=kunlun * change some Tips ,test=kunlun	4 years ago
Zhang Ting	d5cc144c60	tune backward filter algorithm for float16 (#27529 ) * use exhaustive_search for float16 * tune algo only when dtype is float16	4 years ago
hutuxian	3f2a6ab65d	fix error msg (#27887 )	4 years ago
xiaoting	ae01801f0a	Add dropout and log_loss for kunlun (#27790 ) * add dropout,log_loss, test=kunlun * fix dropout, test=kunlun * polish error message, test=kunlun * change boost::get to BOOST_GET_CONST, test=kunlun * fix copyright, test=kunlun	4 years ago
Guanghua Yu	70c8c31371	support mean,softmax_with_cross_entropy on Baidu Kunlun (#27792 ) * support mean,softmax_with_cross_entropy on Baidu Kunlun,test=kunlun * fix unittests error,test=kunlun * delete boost::get,test=kunlun	4 years ago
Chengmo	1607e87cb9	add xpu sgd & momentum (#27728 ) * add xpu sgd & momentum	4 years ago
hong19860320	c90d35564b	Add batch_norm and layer_norm XPU kernels (#27818 )	4 years ago
xiaoting	6da7a7458b	add conv for xpu, test=kunlun (#27809 ) * add conv for xpu, test=kunlun * polish error_message, test=kunlun * polish error_message, test=kunlun * fix copyright, test=kunlun	4 years ago
Thunderbrook	04be37c57f	add xpu slice op (#27349 ) * add xpu slice op test=xpu * add slice xpu op test=xpu * code style test=kunlun * style test=kunlun * format test=kunlun	4 years ago
Thunderbrook	8c25dfaacc	op error info (#27856 ) * op error info * style * code format	4 years ago
ShenLiang	6d63cd2b93	add gather_op xpu, test=kunlun (#27822 ) * add gather_op xpu, test=develop, test=kunlun * fix ut, test=develop, test=kunlun * fix the ut,test=develop, test=kunlun	4 years ago
Feiyu Chan	1d95a0fbc3	fix error message for nce_op (#27863 )	4 years ago
guofei	2e1bca99ca	Refine the gradient calculation errors caused by renaming in while_grad (#27814 ) test=develop	4 years ago
wanghuancoder	8fa4c09889	add load_op_xpu for Baidu Kunlun (#27817 ) * add load_op_xpu for Baidu Kunlun, test=kunlun * add is_compiled_with_xpu for unit test, test=kunlun * add is_compiled_with_xpu for unit test, test=kunlun	4 years ago
Jacek Czaja	55e63763ec	[oneDNN] adaptive pool support (#27747 )	4 years ago
Zhang Ting	16999ae49d	use IndexList to improve performance of instance_norm op (#25132 ) * use IndexList to improve performance, test=develop * remove EIGEN_HAS_INDEX_LIST, test=develop * use IndexList only when EIGEN_HAS_INDEX_LIST is true	4 years ago
GaoWei8	36bb056ed6	Add flattened weight of lstm (#27192 ) * add flattened weight of lstm	4 years ago
Guanghua Yu	7779790c61	error message optimization in softmax_with_cross_entropy_op (#27772 ) * error message optimization in softmax_with_cross_entropy_op * fix some unsuited comment	4 years ago
TeslaZhao	070ac9590c	Add double grad in Squeeze and Unsqueeze (#27810 ) * Add double grad in Squeeze and Unsqueeze * Add double grad in Squeeze and Unsqueeze	4 years ago
Jack Zhou	d4359b0f39	add the kunlun kernel for the paddle 2.0 Add xpu kernel for KUNLUN core: * accuracy op * sign op * scale op * sum op Add default atol in xpu unittest.	4 years ago
mapingshuo	840d54de9b	add XPU support for shape op and reshape op (#27804 )	4 years ago
cc	8fabb1c32f	Add test attribute in channelwise_quant op, test=develop (#27742 ) * Add test attribute in channelwise_quant op, test=develop	4 years ago
wangxinxin08	ad99e638fd	add double grad op for matmul (#27776 ) * add matmul doublegrad op * fix compile errors * modify code according to review * delete float16	4 years ago
zhupengyang	0025e0d87b	refine APIs: brelu, hardsigmoid, hardswish, maxout (#27658 )	4 years ago
zhupengyang	5098891fdf	add softmax xpu kernel (#27700 )	4 years ago
Double_V	f6ad2375be	fix pool3d bug, test=develop (#27718 ) * fix pool3d bug, test=develop * fix unitest, test=develop * fix test and fix pool2d bug, test=develop	4 years ago
Feiyu Chan	0a7bab4e34	fix error message for negative_positive_pair_op and nce_op (#27779 )	4 years ago
zhupengyang	395cb561aa	refine logsumexp error message and docs (#27713 )	4 years ago
smallv0221	057e28bc8f	API(lstm_unit, lstmp, sequence_mask, sequence_enumerate, sequence_conv) error message enhancement (#27572 ) * API(Compute) error message enhancement on line 44, 50, 53. * lstm_unit error message enhancement. lstmp error message enhancement. sequence_conv error message enhancement. sequence_enumerate error message enhancement. sequence_mask error message enhancement. * Update lstm_unit_op.cc * Update lstm_unit_op.h * error msg enhancement. * Update sequence_conv_op.cc * Update lstm_unit_op.cc * Update sequence_conv_op.cc * Update sequence_enumerate_op.cc * Update sequence_enumerate_op.cu * Update sequence_enumerate_op.h * Update sequence_pool_op.h * error message enhancement. * error message enhancement.	4 years ago
Jacek Czaja	606611d351	[oneDNN] GRU BF16 kernel (#27731 )	4 years ago
xiemoyuan	6c1acf34ed	Optimize the error message for OP (#27617 ) * Optimize the error message for OPs. * Optimize the error message for OPs in details.	4 years ago
cc	ec7d11a492	refine fused_elemwise_activation error message (#27734 )	4 years ago
Zhen Wang	365c2c9c89	fix error message showing in UpdateLossScalingOp (#27596 )	4 years ago
LielinJiang	9089841b6e	Fix bilateral inference shape bug (#26822 ) * fix bilateral bug	4 years ago
Yiqun Liu	65207b4560	Polish the error message of fc, fused_fc_elementwise_layernorm and fused_embedding_seq_pool. (#27692 ) * Polish the error message of fc_op. * Polish the error message of fused_fc_elementwise_layer_norm op. * Polish an error message in fused_embedding_seq_pool_op.	4 years ago
Jacek Czaja	b9fda2ff09	Fix to issue #25537 (#27546 ) * - candidate fix to issue #25537 test=develop * - UT for transpose NHWC test=develop	4 years ago
Wojciech Uss	966447e338	Added support for quantization of fusion_gru (#27518 )	4 years ago
hong19860320	7a96d5788d	Optimize the error messages of the CUDA implementation of activation ops (#27741 ) test=develop	4 years ago
tangwei12	fd616fadc2	reopen heartbeat ut (#27684 )	4 years ago
Qi Li	f373269df0	update histogram op for performance optimization, test=develop (#24912 )	4 years ago
MRXLT	20fb01fb00	fix distributed error info (#27206 ) * fix distributed error info * bug fix; notest * error info refine * update error info * update error info * update error info * bug fix * bug fix * bug fix * bug fix	4 years ago
pangyoki	7cd2c13f1b	add multinomial op (#27219 ) * add multinomial cpu kernel * fix C++ notype error * fix windows ci array len error * let array len be const * change array to vector * add cuda kernel with num_distribution is 1, and not support replacement=False * add multinomial python api * support num_distribution different multinomial distributions * add multinomial python api unittest * change output dtype to int64 * fix coverage prob * optimize format * fix dtype of output error, should be int64_t	4 years ago
Wojciech Uss	42d175385d	Add support for (de/re)quantization with shift (#27481 )	4 years ago
123malin	cc780b1977	test=develop, optimize geo communicator (#26857 ) * test=develop, optimize geo communicator	4 years ago
yukavio	7b46fb0f14	fix generate_proposals and affine grid error info (#27636 )	4 years ago
AshburnLee	c3a3df6466	Add cuda support for unique op (#27646 ) * unique op for cuda is added * add support for cuda * Add cuda support for unique op. * Add support for int32_t and int64_t. * For old version, process by cpu * Add VisitDataType for thrust	4 years ago
wawltor	29f4922906	optimize the error message for detection_map_op	4 years ago
whs	daf5aa9b8b	Fix round in grid sample op (#27657 )	4 years ago
ysh329	2f9cdd9038	API/OP clip_by_norm_op error message enhancement. test=develop (#27614 ) * Fix clip_by_norm_op error message. test=develop * test=develop * test=develop	4 years ago
yongqiangma	aac57159c9	enhance array_to_lod_tensor_op lod_tensor_to_array_op errors information (#27386 ) * enhance array_to_lod_tensor_op lod_tensor_to_array_op errors information. test=develop	4 years ago
xiemoyuan	99e3337368	Optimize the error message of OP. (#27478 ) * iCafe 9009: Optimize the error message of OP. * Optimize the error message of GatherTreeOP.	4 years ago
ShenLiang	e8f873df88	optimize the speed&memory of matmul op (#27610 ) * fix the speed&memory of matmul * fix the comment * fix the memory copy * fix the windows ci	4 years ago
tangwei12	9704582eef	fix op error (#27599 ) * fix error * fix error * fix error * merge develop	4 years ago
yaoxuefeng	c9a8801325	enhance error messages of lookup_table, merge_ids, data_norm (#27619 ) * enhance error messages of lookup_table, merge_ids, data_norm * fix * fix error msg in .cu	4 years ago
whs	9cc5603d56	Make grid support stopping gradients. (#27630 )	4 years ago
furnace	d01f626944	update mv op according PR#27024 (#27474 )	4 years ago
Double_V	9d783aeddd	Error message opt, test=develop (#27467 ) * Error message opt, test=develop * solve comments, test=develop * fix typo, test=develop	4 years ago
Li Fuchen	1501a80f74	add support to float64 input of warpctc op. (#27399 ) * add float64 input to ctc_loss * modified error message of warpctc * update repo and tag of warpctc * add test for warpctc with float64 input * modified warpctc.cmake to make sure build always * resolved sample code bug of warpctc * add core.ops in warpctc dygraph * fix a bug of test	4 years ago
QingshuChen	6b727e08b1	support elementwise add, activation, matmul on Baidu Kunlun (#27143 ) * support elementwise add, activation, matmul on Baidu Kunlun * test=kunlun * minor * test=kunlun * reconstruct the xpu directory * test=kunlun * minor * test=kunlun * minor * test=kunlun * minor * test=kunlun * minor * test=kunlun * minor * test=kunlun	4 years ago

5891 Commits (3d015f1cf529915ab52cb8aef7c475f67fb128b5)