Paddle

Commit Graph

Author	SHA1	Message	Date
QingshuChen	64f29fbb70	update kunlun conv2d/softmax/elementwise implementation (#29229 ) * update conv2d & softmax to new xpu api * test=kunlun * remove useless comments * test=kunlun * remote softmax xpu op * test=kunlun * update kunlun softmax * test=kunlun * update xpu unittest * test=kunlun * fix elementwise_grad bug for kunlun * test=kunlun	4 years ago
chentianyu03	8f45d14263	add complex64 and complex128 type; add +-/@ and slice operator for c… (#29199 ) add complex64 and complex128 type; add +-/@ and slice operator for complex types add test cases for complex elementwise, matmul and getitem unittest * add test cases for complex types * add test cases for complex matmul unittest	4 years ago
ShenLiang	e2d01eb650	Support dynamic graph distributed (#28997 ) * add reducer * refine event for memorycopy * add concat&split for allreduce * apply concat & split for fuse tensor * fix nccl dep * fix the unittest, compile problem and ddp initialize problem * fix unittest for mac & add some comments & solve the repeated param in sublayers * fix unittest for windows & fix document	4 years ago
Zhou Wei	e668cb07fb	fix CUDA 11 error on windows (#29101 )	4 years ago
arlesniak	bc902044a4	Fixes mkldnn dygraph learning rate scheduler crashes (#28988 )	4 years ago
Shang Zhizhou	b9e76a0103	detect tensorRT plugin fp16 in runtime (#27933 ) * remove -DSUPPORTS_CUDA_FP16 in cuda.cmake * compile with cuda9 * add some unittest * notest;test=coverage * add unittest for trt plugin swish && split * update ernie unittest * fix some error message * remove repeated judgement of CUDA version in mbEltwiseLayerNormOpConverter * fix compile error when CUDA_ARCH_NAME < Pascal * fix compile error * update unittest timeout * compile with cuda9 * update error msg * fix code style * add some comments * add define IF_CUDA_ARCH_SUPPORT_FP16 * rename IF_CUDA_ARCH_SUPPORT_FP16 to CUDA_ARCH_FP16_SUPPORTED	4 years ago
Leo Chen	fd3fcb051a	fix typo of flag name (#29154 )	4 years ago
Aurelius84	7ae3cb554a	Polish CUDA Information stdout (#29109 )	4 years ago
Chen Weihang	fea0e294ee	Hide the C++ stack by default and add hints (#29042 ) * default not show cpp stack & add hint * fix failed unittest * fix failed unittests	4 years ago
wawltor	b2c8a00745	remove eigen threadpool for the speed up	4 years ago
Jacek Czaja	bd1d6d3b30	extends oneDNN caching keys so caching objects are unique to executor/predictor (#28758 )	4 years ago
Pei Yang	994673bf4f	change avg pooling and global pooling to trt layer in dynamic shape mode (#28702 ) * change avg pooling and global pooling to trt layer * add support for static shape global pooling * modify trt errmsg	4 years ago
gongweibao	1dad8ceaab	Fix gpu memory allocation bug. (#28703 )	4 years ago
QingshuChen	30ef3815b3	adjust kunlun header file (#28536 ) * adjust kunlun header file test=kunlun update kunlun unittest test=kunlun update xpu unittest * test=kunlun * update xpu unittest * test=kunlun * update xpu unittest * test=kunlun	4 years ago
Jacek Czaja	6d8d3d4c22	[oneDNN] Layer norm bf16 kernel (#28619 )	4 years ago
lilong12	80d2024644	bug fix, test=develop (#28674 )	4 years ago
Zhou Wei	849467b5aa	fix user set CUDA_VISIBLE_DEVICES start/end with quotation marks (#28547 )	4 years ago
Chen Weihang	23439b1688	show cpp stack when catch signal (#28415 )	4 years ago
Shang Zhizhou	ea851796e5	Optimize ernie model inference performance in TensorRT and support variable-length input (#28367 ) * fp16 result ok * change -DWITH_NVINFER_PLUGIN to config.EnableTensorRtOSS * auto detect special slice op converter for ernie with trt oss * ernie oss only support fp16 * fix special_slice_plugin serialize bug * matmul in tensorrt ok * ernie unittest ok * add matmul tensorrt unittest * remove demo code	4 years ago
Jacek Czaja	84cc61b2cd	[oneDNN] sum op refactor (#28318 )	4 years ago
Wilber	09fd2b2aab	Paddle support compile on sw (#27858 )	4 years ago
Guo Sheng	9a600df373	Add rnn_op (#28197 ) * Add rnn_op. test=develop * Fix rnn_op grad maker's drop_empty_grad. test=develop	4 years ago
wangchaochaohu	0f4b6247c8	refine the gpu config for performance optimization (#28291 )	4 years ago
Huihuang Zheng	acc11c2a62	Retry CUDA Initialization to Fix Random Failure, test=develop (#28323 ) This PR is follow up of #28213. On that PR we tried to decrease GPU usage, however the CI still randomly failed. So I added retry logic for the initialization of nccl and cusolver. If the initialization failed, we can retry to avoid the random failure.	4 years ago
Leo Chen	18c86fb2fb	hide some logs of p2p (#28307 )	4 years ago
Jacek Czaja	c11d9b3035	[oneDNN ] conv2d fwd&bwd optimization (#27871 )	4 years ago
Chen Weihang	813b2ade34	Enrich the python error types of paddle & polish format (#28124 ) * add multiple exception type * define all exception & polish compile pystack * mapping paddle error to python exception * polish static mode error format * fix failed unittests * fix dytostatic test_error * fix check_nan_inf failed * add unittest for coverage * revert some code try to solve compile error * refactor enforce & error change * polish code & add unittest	4 years ago
Chen Weihang	2babd6ff67	Add compile limit for PADDLE_ENFORCE without error message (#28221 ) * add compile limit for paddle enforce * polish elementwise_op_function.cu.h * fix failed unittest * fix windows compile failed * detail polish * revert no type constructor	4 years ago
Zhou Wei	5d7000215a	fix dynamic_loader more safe and error message on windows (#28117 )	4 years ago
wangchaochaohu	463c72c2d9	refine gpu kernel config for Paddle (#28085 )	4 years ago
Pei Yang	a0b2f93689	reduce trt warning message (#28011 )	4 years ago
lidanqing	7cb4a8b8f2	[oneDNN] Conv dilation support (#27914 ) * conv dilated mkldnn support: forward and backward pass * add mkldnn conv_transpose dilation UT test=develop * remove unnecessary PADDLE_ENFORCE * add int8 and bf16 dilated conv UT * update according to reviews	4 years ago
Zhang Ting	d5cc144c60	tune backward filter algorithm for float16 (#27529 ) * use exhaustive_search for float16 * tune algo only when dtype is float16	4 years ago
Jacek Czaja	55e63763ec	[oneDNN] adaptive pool support (#27747 )	4 years ago
chen zhiyu	6335e6a0a6	add musl option (#27798 )	4 years ago
Jacek Czaja	b9fda2ff09	Fix to issue #25537 (#27546 ) * - candidate fix to issue #25537 test=develop * - UT for transpose NHWC test=develop	4 years ago
joanna.wozna.intel	0cd4907eba	Add avx512 core instructions check (#27732 ) * Add avx instructions check * Small fix * Change function name * Change uint to unsigned int	4 years ago
123malin	cc780b1977	test=develop, optimize geo communicator (#26857 ) * test=develop, optimize geo communicator	4 years ago
lilong12	bbc2add703	Initialize gloo for low level collective apis (#27672 ) * add gloo initializer, test=develop	4 years ago
arlesniak	0ecf441af1	Add support for mkldnn ops types selection with FLAGS in dygraph (#27482 ) * Add support for mkldnn ops types selection with FLAGS in dygraph * use regex to match DNNL verbose * python3 encoding fix	4 years ago
lilong12	36c0410223	Revert "Initialize gloo for low level collective apis (#27356 )", test=document_fix (#27665 )	4 years ago
lilong12	5218b7af6b	add ncclSend and ncclRecv (#27621 ) * include ncclRecv and ncclSend, test=develop	4 years ago
lilong12	fa73e4a284	Initialize gloo for low level collective apis (#27356 ) * add gloo initializer, test=develop	4 years ago
Li Fuchen	1501a80f74	add support to float64 input of warpctc op. (#27399 ) * add float64 input to ctc_loss * modified error message of warpctc * update repo and tag of warpctc * add test for warpctc with float64 input * modified warpctc.cmake to make sure build always * resolved sample code bug of warpctc * add core.ops in warpctc dygraph * fix a bug of test	4 years ago
QingshuChen	6b727e08b1	support elementwise add, activation, matmul on Baidu Kunlun (#27143 ) * support elementwise add, activation, matmul on Baidu Kunlun * test=kunlun * minor * test=kunlun * reconstruct the xpu directory * test=kunlun * minor * test=kunlun * minor * test=kunlun * minor * test=kunlun * minor * test=kunlun * minor * test=kunlun	4 years ago
Zhong Hui	a85592bcbf	fix cpplint error for the atomic max/min	4 years ago
Zhong Hui	597345d17b	fix cuda atomic for ARCH<350 for the automic_max	4 years ago
Shibo Tao	8f7bb52bd2	fix tensorrt 6 build error. test=develop (#27511 ) * fix tensorrt 6 build error. test=develop * fix. test=develop * bug fix * test=develop	4 years ago
wanghuancoder	df43905f12	use iwyu clean include (#27267 ) * use iwyu clean include, test=develop, test=win * compilation error, test=develop * fix compilation error2, test=develop * fix compilation error3, test=develop * fix compilation error4, test=develop * fix compilation error5, test=develop * fix compilation error6, test=develop * fix compilation error7, test=develop * fix compilation error8, test=develop * fix compilation error8, test=develop * fix compilation error10, test=develop * fix compilation error11, test=develop	4 years ago
Zhong Hui	4a9d21de49	Add GPU Kernels of Segment Ops, support, sum, max, min, mean	4 years ago

1012 Commits (28164b266f4639c48fad7923caebbc8fb4921b45)