Paddle

Commit Graph

Author	SHA1	Message	Date
Jack Zhou	c7cada8571	Fix gru performace decline in 1.8.5 (#29455 )	5 years ago
LoveAn	671555ed32	Compiling operator libraries with Unity build (#29130 ) * Compiling operator libraries with Unity Build on Windows CPU. * Compiling operator libraries with Unity Build on Windows GPU, no_test, test=windows_ci * Add option in windows ci script, no_test, test=windows_ci * Optimize parallel compiling, test=develop * remove limit of parallel compile and skip some ops in UB, test=develop * remove changes of header file, test=develop * remove changes of header file, test=develop * fix test_eye_op unittest failed, test=develop * Compiling operator libraries with Unity Build on Linux, test=develop * set default WITH_UNITY_BUILD=OFF, test=develop * Move unity build rules into a single file and add comment, test=develop * optimize parallel compilation, test=develop * fix undefined reference error on coverage ci, test=develop	5 years ago
chentianyu03	8f45d14263	add complex64 and complex128 type; add +-/@ and slice opreator for c… (#29199 ) add complex64 and complex128 type; add +-/@ and slice opreator for complex types add test cases for complex elementwise, matmul and getitem unittest * add test cases for complex types * add test cases for complex matmul unittest	5 years ago
Jack Zhou	bc6033f86b	fix gru gcc7.4 bug for the gru compile	5 years ago
Jack Zhou	085260f3de	Add eigen gru and fix the dropout bug in the rnn	5 years ago
Shang Zhizhou	b9e76a0103	detect tensorRT plugin fp16 in runtime (#27933 ) * remove -DSUPPORTS_CUDA_FP16 in cuda.cmake * comile with cuda9 * add some unittest * notest;test=coverage * add unittest for trt plugin swish && split * update ernie unittest * fix some error message * remove repeated judgement of CUDA version in mbEltwiseLayerNormOpConverter * fix comile errror when CUDA_ARCH_NAME < Pascal" * fix comile error * update unittest timeout * compile with cuda9 * update error msg * fix code style * add some comments * add define IF_CUDA_ARCH_SUPPORT_FP16 * rename IF_CUDA_ARCH_SUPPORT_FP16 to CUDA_ARCH_FP16_SUPPORTED	5 years ago
wawltor	b2c8a00745	remove eigen threadpool for the speed up	5 years ago
Jack Zhou	9362d85e0e	Add LSTM, Simple RNN and GRU CPU kernel (#28577 ) * add lstm, simple rnn op kernel * fix the test_lstm for the rnn op * change func name * fix forward postprocess bug * add gru forward, backward code * remove unittest.skipIf; use a big rnn op instead of combination op * fix input doesn't have gradient bug * add eigen lstm forward, backward Co-authored-by: wawltor <fangzeyang0904@hotmail.com>	5 years ago
QingshuChen	30ef3815b3	adjust kunlun header file (#28536 ) * adjust kunlun header file test=kunlun update kunlun unittest test=kunlun update xpu unitest * test = kunlun * update xpu unittest * test=kunlun * update xpu unitest * test=kunlun	5 years ago
YUNSHEN XIE	ba0756325a	exec ut no more than 15s 1 (#28439 ) * disable ut test_parallel_executor_fetch_isolated_var,test=document_fix * test for limiting ut exec time as 15S * fix an error caused by cannot find ut * fix some error * can not find test_transformer * fix error caused by ut not run in windows * fix error caused by Compiler Options * fix error caused by setting timeout value as 15 in python/paddle/tests/CMakeLists.txt * setting timeout value to 120s for old ut * add the timeout value setting * fix error caused by ut only run in coverage_ci * add analyzer_transformer_profile_tester * fix some error * fix some error * fix error with inference option * fix error with inference option setting as ON_INFER * add some ut to set timeout * modified some option * fix error * fix some timeout error * fix error * fix error * fix timeout for test_analyzer_bfloat16_resnet50 * fix error * setting timeout properity for some ut * first pr for new ut timeout as 15S	5 years ago
Wilber	09fd2b2aab	Paddle support compile on sw (#27858 )	5 years ago
Leo Chen	6115c14fca	Pool2d cuda kernel supports fp16 (#28316 ) * pool2d cuda kernel supports fp16 * fix compile issue of template * add ut	5 years ago
Double_V	5289b72acc	fix Wmaybe-uninitialized warning in pooling.cc, test=develop (#28126 )	5 years ago
wangchaochaohu	463c72c2d9	refine gpu kernel config for Paddle (#28085 )	5 years ago
wangchaochaohu	c5fcc96d5b	xpu support for fill_constant Op (#27675 )	5 years ago
Double_V	f6ad2375be	fix pool3d bug, test=develop (#27718 ) * fix pool3d bug, test=develop * fix unitest, test=develop * fix test and fix pool2d bug, test=develop	5 years ago
Li Fuchen	1501a80f74	add support to float64 input of warpctc op. (#27399 ) * add float64 input to ctc_loss * modified error message of warpctc * update repo and tag of warpctc * add test for warpctc with float64 input * modified warpctc.cmake to make sure build always * resolved sample code bug of warpctc * add core.ops in warpctc dygraph * fix a bug of test	5 years ago
Zhong Hui	a85592bcbf	fix cpplint error for the autmic max/min	5 years ago
ShenLiang	6fc74bbaf6	add fp16 for matmul (#27523 ) * add fp16 for matmul	5 years ago
wanghuancoder	df43905f12	use iwyu clean include (#27267 ) * use iwyu clean include, test=develop, test=win * compilation error, test=develop * fix compilation error2, test=develop * fix compilation error3, test=develop * fix compilation error4, test=develop * fix compilation error5, test=develop * fix compilation error6, test=develop * fix compilation error7, test=develop * fix compilation error8, test=develop * fix compilation error8, test=develop * fix compilation error10, test=develop * fix compilation error11, test=develop	5 years ago
Zhong Hui	4a9d21de49	Add GPU Kernels of Segment Ops, support, sum, max, min, mean	5 years ago
Zhong Hui	f4c750d721	Add the cpu version of segment sum mean max min op	5 years ago
wawltor	b6a4349dd4	fix the error message for the math dir https://github.com/PaddlePaddle/Paddle/pull/27332	5 years ago
Jack Zhou	63203c4abc	enhance reduce op which can reduce tensor with arbitrary rank	5 years ago
Jack Zhou	6e29c2da05	Error description optimize for the math dir	5 years ago
Zhong Hui	bbad3414e8	Enhance the error messages for files in operators/math	5 years ago
Jack Zhou	9437ce36c4	Error description optimize for math dir	5 years ago
Steffy-zxf	50e60e8779	update error info for selected_rows_functor	5 years ago
wangchaochaohu	c71d79b1d2	[cuda11 support] change the CMakeLists to support the cuda11 (#27124 )	5 years ago
kinghuin	ed292695c5	optimize the error message for math dir	5 years ago
kinghuin	1b102dd552	optimize the error message for unpooling.cc fix the error message for the unpooling.cc	5 years ago
joanna.wozna.intel	95e1434bb2	Add bfloat16 data type (#25402 )	5 years ago
Leo Chen	844583c8fd	Refine paddle.manual_seed (#26496 ) * refine manual seed * fix ci problem * fix unittests * fix unittest * set is_init_py=false in manual_seed * fix unittest * fix bernoulli_op * fix(unittest): change random_seed to manual_seed * 🐞fix(unittest): fix manual_seed * trigger ci * fix test_sentiment * fix test_imperative_save_load * fix test_uniform_random_op * fix test_uniform_random_op * fix test_jit_save_load * merge develop * fix manual_seed * fix manual_seed * use global engine * use shared_ptr * fix double free * fix bug * fix bug * fix bug * fix test bug * fix test bug * fix test bug * fix ci	5 years ago
Bai Yifan	8986a82131	fix adaptive gpu grad bug, add doc refine (#26660 )	5 years ago
yaoxuefeng	efee426742	support generator seed in related kernals test=develop (#26495 )	5 years ago
ShenLiang	c609066074	Add Matmul op (#26411 ) * add matmul_v2	5 years ago
QingshuChen	138ecf24aa	support Baidu Kunlun AI Accelerator (#25959 ) * support Baidu AI Accelerator * test=kunlun * minor * test=kunlun * support xpu op in separate file * test=kunlun * update XPU error message and remove duplicated code * test=kunlun * minor * test=kunlun * minor * test=kunlun	5 years ago
Pei Yang	b717895f64	Fix registering trt plugin (#25744 ) * develop dynamic shape serilization * add test param for gelu * fix bugs * delete redundant comments * debug * fix conflict. test=develop * fix bug. test=develop * add trt dynamic shape serialized support * fix ernie serialized bug test=develop * fix codestyle test=develop * fix bug test=develop * fix bug.test=develop * modify cmakelist test=develop * fix bug test=develop * fix error message. test=develop * fix trt register plugin based on pr#25003 * add trt dynload * fix deserialization bug of not finding plugin registration * refine code style * recover engine key in tensorrt_subgraph_pass * for ci coverage * add unittest for deserialization Co-authored-by: haozech <chenhaoze94@gmail.com>	5 years ago
Zhang Ting	6486fe8a94	improve GPU performance of transpose, test=develop (#25862 )	5 years ago
ShenLiang	bca303165a	fix inverse bug (#25641 ) * fix inverse bug, test=develop * fix the untest, test=develop * add singular checking, test=develop * fix the utest, test=develop * use memory::copy, test=develop * fix bost_get, test=develop * fix position, test=develop	5 years ago
joanna.wozna.intel	e5bbffa84c	Add NOMINMAX define due to windows.h max/min macro conflict (#25637 ) test=develop	5 years ago
Zhang Ting	30d1ff3bb4	call cublasGemmStridedBatchedEx when using fp16, test=develop (#25553 )	5 years ago
Chen Weihang	0b54d54fd8	Fix index overflow bug of the CUDA kernel loop increment (#25435 ) * fix softmax_with_cross_entropy cuda kernel overflow bug, test=develop * replace old macro & for condition, test=develop * polish details, test=develop	5 years ago
zlsh80826	e528392de9	[Paddle-TRT] SkipLayernorm vectorized memory optimization (#25117 ) * add explicit specialization * add skiplayernorm vector load if available * test=develop	5 years ago
zhupengyang	6de75082cb	fix test_hsigmoid windows ci (#25311 )	5 years ago
Leo Chen	fa657b3dbb	fix bug of prelu when rank not equal 4, test=develop (#25067 ) * fix bug of prelu when rank not equal 4, test=develop * fix prelu inference, test=develop * fix api, test=develop * fix shape when mode is chennel, test=develop * remove debug code, test=develop * add unittest, test=develop	5 years ago
zlsh80826	479c8834f7	[Paddle-TRT] Fixes #24731 , opt for SoftmaxKernelWithEltadd kernel, test=develop (#24834 ) * blockReduce opt * launch threads align to warpSize * reduce unnecessary shared memory for broadcast reduced value * vectorize SoftmaxKernelWithEltadd * add fp16 constrain * test=develop	5 years ago
ceci3	8db66fc3f6	fix cos_sim, test=develop (#25017 )	5 years ago
Chen Weihang	d1062d5278	Replace all errors thrown by LOG(FATAL) with PADDLE_THROW (#24759 ) * remove REPLACE_ENFORCE_GLOG compile option & add ci rule prohibit LOG(FATAL) using, test=develop * remove ci test case, test=develop * replace all LOG(FATAL) & polish message, test=develop * fix typo, test=develop * polish error info detail, test=develop	5 years ago
Leo Chen	b67ded04f2	Support gradient accumulation of fp16 in imperative mode (#24823 ) * support gradient accumulation of fp16 in imperative mode, test=develop * enhance coverage test, test=develop * follow comments, test=develop	5 years ago


769 Commits (760d015c14d9c35b0271c3a90898d52f39596190)