Paddle

Commit Graph

Author	SHA1	Message	Date
AshburnLee	efea540ca9	Add tf32 support for A100 tensor core acceleration for cuBLAS (#28732 )	4 years ago
arlesniak	62d4483649	Added verbose oneDNN lib version (#29378 )	4 years ago
Jacek Czaja	f6cca62575	[oneDNN] Making ThreadID info in caching key optional (#29272 )	4 years ago
taixiurong	760d015c14	add xpu ops for training transformer in kunlun (#29539 ) * 1.fix matmul bug 2. add one hot * add xpu error msg	4 years ago
Huihuang Zheng	a1909affc6	Fix Unit Test: Add Sleep Time for CUDA Retry (#29442 ) Add Sleep Time for CUDA Retry, which is similar to our GPU retry logic. This is a try to avoid init GPU allocation random failure in unit test.	4 years ago
jakpiase	57a4f16d9e	added internal and external reorders to profiler (#29443 ) * added external reorder to profiler * added external and internal reorders to profiler * added internal and external reorder to profiler * added formatting to int/ext reorder commit * removed unnecessary comment	4 years ago
Jack Zhou	1dd7b97b66	fix rnn_op bug in cudnn_version>= 8 (#29406 )	4 years ago
chentianyu03	879e913b6d	Make transpose, trace, kron, reshape, sum op support complex type (#29321 ) * add complex64 and complex128 type; add +-/@ and slice operator for complex types add test cases for complex elementwise, matmul and getitem unittest * add test cases for complex types * add test cases for complex matmul unittest * kron, reshape, transpose support complex types * sum and trace op support complex types * add test case of sum and trace op * fix the bug of imag part of complex not initialized * format file * format code style * kron support type promotion; modify test cases	4 years ago
卖鱼的哲学	074065e5de	fix expand/uniform_random && concat/transpose to new api on xpu (#29280 ) * fix expand && concat/transpose to new api * update uniform_random_op * update xpu_header	4 years ago
lilong12	1decf4ada6	update, test=develop (#29331 )	4 years ago
Chen Weihang	9ad800ebb2	Support type promote for basic math ops (quantum required) (#29265 ) * basic impl of type promote * add comment & another testcase * fix complex bugs & support python op promote type * fix failed unittests & polish code * add unittest for coverage * change to only promote complex type * polish code details * polish several comments	4 years ago
QingshuChen	64f29fbb70	update kunlun conv2d/softmax/elementwise implementation (#29229 ) * update conv2d & softmax to new xpu api * test=kunlun * remove useless comments * test=kunlun * remote softmax xpu op * test=kunlun * update kunlun softmax * test=kunlun * update xpu unittest * test=kunlun * fix elementwise_grad bug for kunlun * test=kunlun	4 years ago
chentianyu03	8f45d14263	add complex64 and complex128 type; add +-/@ and slice operator for c… (#29199 ) add complex64 and complex128 type; add +-/@ and slice operator for complex types add test cases for complex elementwise, matmul and getitem unittest * add test cases for complex types * add test cases for complex matmul unittest	4 years ago
ShenLiang	e2d01eb650	Support dynamic graph distributed (#28997 ) * add reducer * refine event for memorycopy * add concat&split for allreduce * apply concat & split for fuse tensor * fix nccl dep * fix the unittest, compile problem and ddp initialize problem * fix unittest for mac & add some comments & solve the repeated param in sublayers * fix unittest for windows & fix document	4 years ago
Zhou Wei	e668cb07fb	fix CUDA 11 error on windows (#29101 )	4 years ago
arlesniak	bc902044a4	Fixes mkldnn dygraph learning rate scheduler crashes (#28988 )	4 years ago
Shang Zhizhou	b9e76a0103	detect tensorRT plugin fp16 in runtime (#27933 ) * remove -DSUPPORTS_CUDA_FP16 in cuda.cmake * compile with cuda9 * add some unittest * notest;test=coverage * add unittest for trt plugin swish && split * update ernie unittest * fix some error message * remove repeated judgement of CUDA version in mbEltwiseLayerNormOpConverter * fix compile error when CUDA_ARCH_NAME < Pascal * fix compile error * update unittest timeout * compile with cuda9 * update error msg * fix code style * add some comments * add define IF_CUDA_ARCH_SUPPORT_FP16 * rename IF_CUDA_ARCH_SUPPORT_FP16 to CUDA_ARCH_FP16_SUPPORTED	4 years ago
Leo Chen	fd3fcb051a	fix typo of flag name (#29154 )	4 years ago
Aurelius84	7ae3cb554a	Polish CUDA Information stdout (#29109 )	4 years ago
Chen Weihang	fea0e294ee	Hide the C++ stack by default and add hints (#29042 ) * default not show cpp stack & add hint * fix failed unittest * fix failed unittests	4 years ago
wawltor	b2c8a00745	remove eigen threadpool for the speed up	4 years ago
Jacek Czaja	bd1d6d3b30	extends oneDNN caching keys so caching objects are unique to executor/predictor (#28758 )	4 years ago
Pei Yang	994673bf4f	change avg pooling and global pooling to trt layer in dynamic shape mode (#28702 ) * change avg pooling and global pooling to trt layer * add support for static shape global pooling * modify trt errmsg	4 years ago
gongweibao	1dad8ceaab	Fix gpu memory allocation bug. (#28703 )	4 years ago
QingshuChen	30ef3815b3	adjust kunlun header file (#28536 ) * adjust kunlun header file test=kunlun update kunlun unittest test=kunlun update xpu unittest * test = kunlun * update xpu unittest * test=kunlun * update xpu unittest * test=kunlun	4 years ago
Jacek Czaja	6d8d3d4c22	[oneDNN] Layer norm bf16 kernel (#28619 )	4 years ago
lilong12	80d2024644	bug fix, test=develop (#28674 )	4 years ago
Zhou Wei	849467b5aa	fix user set CUDA_VISIBLE_DEVICES start/end with quotation marks (#28547 )	4 years ago
Chen Weihang	23439b1688	show cpp stack when catch signal (#28415 )	4 years ago
Shang Zhizhou	ea851796e5	Optimize ernie model inference performance in TensorRT and support variable-length input (#28367 ) * fp16 result ok * change -DWITH_NVINFER_PLUGIN to config.EnableTensorRtOSS * auto detect special slice op converter for ernie with trt oss * ernie oss only support fp16 * fix special_slice_plugin serialize bug * matmul in tensorrt ok * ernie unittest ok * add matmul tensorrt unittest * remove demo code	4 years ago
Jacek Czaja	84cc61b2cd	[oneDNN] sum op refactor (#28318 )	4 years ago
Wilber	09fd2b2aab	Paddle support compile on sw (#27858 )	4 years ago
Guo Sheng	9a600df373	Add rnn_op (#28197 ) * Add rnn_op. test=develop * Fix rnn_op grad maker's drop_empty_grad. test=develop	4 years ago
wangchaochaohu	0f4b6247c8	refine the gpu config for performance optimization (#28291 )	4 years ago
Huihuang Zheng	acc11c2a62	Retry CUDA Initialization to Fix Random Failure, test=develop (#28323 ) This PR is follow up of #28213. On that PR we tried to decrease GPU usage, however the CI still randomly failed. So I added retry logic for the initialization of nccl and cusolver. If the initialization failed, we can retry to avoid the random failure.	4 years ago
Leo Chen	18c86fb2fb	hide some logs of p2p (#28307 )	4 years ago
Jacek Czaja	c11d9b3035	[oneDNN ] conv2d fwd&bwd optimization (#27871 )	4 years ago
Chen Weihang	813b2ade34	Enrich the python error types of paddle & polish format (#28124 ) * add multiple exception type * define all exception & polish compile pystack * mapping paddle error to python exception * polish static mode error format * fix failed unittests * fix dytostatic test_error * fix check_nan_inf failed * add unittest for coverage * revert some code try to solve compile error * refactor enforce & error change * polish code & add unittest	4 years ago
Chen Weihang	2babd6ff67	Add compile limit for PADDLE_ENFORCE without error message (#28221 ) * add compile limit for paddle enforce * polish elementwise_op_function.cu.h * fix failed unittest * fix windows compile failed * detail polish * revert no type constructor	4 years ago
Zhou Wei	5d7000215a	fix dynamic_loader more safe and error message on windows (#28117 )	4 years ago
wangchaochaohu	463c72c2d9	refine gpu kernel config for Paddle (#28085 )	4 years ago
Pei Yang	a0b2f93689	reduce trt warning message (#28011 )	4 years ago
lidanqing	7cb4a8b8f2	[oneDNN] Conv dilation support (#27914 ) * conv dilated mkldnn support: forward and backward pass * add mkldnn conv_transpose dilation UT test=develop * remove unnecessary PADDLE_ENFORCE * add int8 and bf16 dilated conv UT * update according to reviews	4 years ago
Zhang Ting	d5cc144c60	tune backward filter algorithm for float16 (#27529 ) * use exhaustive_search for float16 * tune algo only when dtype is float16	4 years ago
Jacek Czaja	55e63763ec	[oneDNN] adaptive pool support (#27747 )	4 years ago
chen zhiyu	6335e6a0a6	add musl option (#27798 )	4 years ago
Jacek Czaja	b9fda2ff09	Fix to issue #25537 (#27546 ) * - candidate fix to issue #25537 test=develop * - UT for transpose NHWC test=develop	4 years ago
joanna.wozna.intel	0cd4907eba	Add avx512 core instructions check (#27732 ) * Add avx instructions check * Small fix * Change function name * Change uint to unsigned int	4 years ago
123malin	cc780b1977	test=develop, optimize geo communicator (#26857 ) * test=develop, optimize geo communicator	4 years ago
lilong12	bbc2add703	Initialize gloo for low level collective apis (#27672 ) * add gloo initializer, test=develop	4 years ago
arlesniak	0ecf441af1	Add support for mkldnn ops types selection with FLAGS in dygraph (#27482 ) * Add support for mkldnn ops types selection with FLAGS in dygraph * use regex to match DNNL verbose * python3 encoding fix	4 years ago
lilong12	36c0410223	Revert "Initialize gloo for low level collective apis (#27356 )", test=document_fix (#27665 )	4 years ago
lilong12	5218b7af6b	add ncclSend and ncclRecv (#27621 ) * include ncclRecv and ncclSend, test=develop	4 years ago
lilong12	fa73e4a284	Initialize gloo for low level collective apis (#27356 ) * add gloo initializer, test=develop	4 years ago
Li Fuchen	1501a80f74	add support to float64 input of warpctc op. (#27399 ) * add float64 input to ctc_loss * modified error message of warpctc * update repo and tag of warpctc * add test for warpctc with float64 input * modified warpctc.cmake to make sure build always * resolved sample code bug of warpctc * add core.ops in warpctc dygraph * fix a bug of test	4 years ago
QingshuChen	6b727e08b1	support elementwise add, activation, matmul on Baidu Kunlun (#27143 ) * support elementwise add, activation, matmul on Baidu Kunlun * test=kunlun * minor * test=kunlun * reconstruct the xpu directory * test=kunlun * minor * test=kunlun * minor * test=kunlun * minor * test=kunlun * minor * test=kunlun * minor * test=kunlun	4 years ago
Zhong Hui	a85592bcbf	fix cpplint error for the atomic max/min	4 years ago
Zhong Hui	597345d17b	fix cuda atomic for ARCH<350 for the atomic max	4 years ago
Shibo Tao	8f7bb52bd2	fix tensorrt 6 build error. test=develop (#27511 ) * fix tensorrt 6 build error. test=develop * fix. test=develop * bug fix * test=develop	4 years ago
wanghuancoder	df43905f12	use iwyu clean include (#27267 ) * use iwyu clean include, test=develop, test=win * compilation error, test=develop * fix compilation error2, test=develop * fix compilation error3, test=develop * fix compilation error4, test=develop * fix compilation error5, test=develop * fix compilation error6, test=develop * fix compilation error7, test=develop * fix compilation error8, test=develop * fix compilation error8, test=develop * fix compilation error10, test=develop * fix compilation error11, test=develop	4 years ago
Zhong Hui	4a9d21de49	Add GPU Kernels of Segment Ops, support sum, max, min, mean	4 years ago
Shang Zhizhou	c17f9cf25f	[bug fix]:Memory increases after adapting the cudnn version to cudnn8 (#27436 ) * [bug fix]:Memory increases after adapting the cudnn version to 8 * [bug fix]cudnnGetConvolutionForwardAlgorithm not defined	4 years ago
Chen Weihang	765064476b	Polish some lost invalid error message (#27445 ) * polish some lost error msg * add some math file to white list * polish detail based reviewer comment	4 years ago
Leo Chen	aba759ba16	[Feature] Enhance inplace addto strategy for gradient accumulation in static graph (#27112 ) * support use add instead of sum to do gradient accumulation * add inplace addto pass * add grad_add op and inplace addto pass * remove debug code * code refine * fix bug when several sum ops inserts at same op_idx * fix Flags type * add addto attribute for conv3d * fix ut * code clean * fix type	4 years ago
GaoWei8	1a7559718e	fix cudnn dyload (#27308 ) * fix cudnn dyload error	4 years ago
Jack Zhou	63203c4abc	enhance reduce op which can reduce tensor with arbitrary rank	4 years ago
GaoWei8	ee1ed42c99	change sequence length attribute to input (#27193 ) * replace sequence length attr to input	5 years ago
joanna.wozna.intel	1483ea2304	Add bfloat16 passes (#26999 )	5 years ago
GaoWei8	4ff16eb201	Add padding cudnn interface (#26370 ) * add lstm cudnn of padding data and refine cudnn codes	5 years ago
wangchaochaohu	3eacced950	[cuda11 support] add support for cublas load of same function name (parameter diff) (#26963 )	5 years ago
joanna.wozna.intel	95e1434bb2	Add bfloat16 data type (#25402 )	5 years ago
Zhen Wang	f9066e6a6f	Update the demo code and the doc of varbase.backward. (#26506 ) * update the demo code and the doc of varbase.backward. * update the doc of the fake interface `paddle.fluid.Variable`. * remove BackwardStrategy.	5 years ago
lilong12	1c68138327	[api 2.0] add collective op for cpu using gloo and paddle.distributed.* apis (#26552 ) add collective op for cpu using gloo and paddle.distributed.* apis	5 years ago
joanna.wozna.intel	559e43eee4	Small change in conv2d and quantize pass (#26671 )	5 years ago
Adam	f3909020de	Add mechanism for blocking oneDNN cache clearing (#26502 ) * Add mechanism for blocking oneDNN cache clearing * Review changes and Add thread guards	5 years ago
QingshuChen	138ecf24aa	support Baidu Kunlun AI Accelerator (#25959 ) * support Baidu AI Accelerator * test=kunlun * minor * test=kunlun * support xpu op in separate file * test=kunlun * update XPU error message and remove duplicated code * test=kunlun * minor * test=kunlun * minor * test=kunlun	5 years ago
GaoWei8	1fbee267d4	remove scope in cudnn lstm (#25188 )	5 years ago
Leo Chen	672578a797	Print user-friendly error message in core.ops (#26261 ) * print user-friendly error message * adjust error summary	5 years ago
wangchaochaohu	0b81d76310	[API2.0] add op for cudnn version query test=develop (#26180 )	5 years ago
joanna.wozna.intel	734cf1c3e9	Change use_quantizer attribute name and data type (#25838 ) * Change use_quantizer attribute name and data type * Fix problem with setting attribute * Add changes due to review * Small change in function * Restore use_quantizer attr for compatibility	5 years ago
Leo Chen	751305ecf0	Add flags to control call stack of error message (#25997 ) * add flags_call_stack_level * update * refine code	5 years ago
Pei Yang	beb0ca5fab	Fix TRT plugin registry without TRT lib (#25982 ) * fix trt plugin registry without trt lib * support trt4 * refine code style	5 years ago
Adam	68c6160e63	Add oneDNN fusion_gru kernel (#25594 ) * Add oneDNN fusion_gru kernel and fix fc+gru pass test=develop * Formatting changes test=develop * Lint fixes test=develop * Add memory::format_tag::any to GRU weights test=develop * Fix build with CUDA * Fix build with CUDA v2	5 years ago
Zhaolong Xing	358bc06c72	[CUDNN8 support] : support CUDNN8 (#25664 ) * cudnn8 support test=develop * fix ci error test=develop	5 years ago
Pei Yang	b717895f64	Fix registering trt plugin (#25744 ) * develop dynamic shape serialization * add test param for gelu * fix bugs * delete redundant comments * debug * fix conflict. test=develop * fix bug. test=develop * add trt dynamic shape serialized support * fix ernie serialized bug test=develop * fix codestyle test=develop * fix bug test=develop * fix bug.test=develop * modify cmakelist test=develop * fix bug test=develop * fix error message. test=develop * fix trt register plugin based on pr#25003 * add trt dynload * fix deserialization bug of not finding plugin registration * refine code style * recover engine key in tensorrt_subgraph_pass * for ci coverage * add unittest for deserialization Co-authored-by: haozech <chenhaoze94@gmail.com>	5 years ago
Chen Weihang	9b5a65b819	refine init signal handler msg dumper (#25911 )	5 years ago
Chen Weihang	d47304e6d9	Refine paddle error stack format (#25790 ) * refine error stack format * polish compile traceback format * polish detail format	5 years ago
Chen Weihang	2469b578f5	Unified paddle error format when catch system signal (#25765 ) * unified signal error format * refine signal error message	5 years ago
Chen Weihang	1b3081b1b4	Simplify BufferedReader to improve DataLoader performance (#25648 ) * simplify buffered reader to improve DataLoader performance * fix 22 failed unittests * fix cuda pinned context condition * fix test_reader_reset failed * fix two failed unittests * change unittest place * polish error message * polish cast op GetExpectedKernelType * remove debug info in unittest	5 years ago
arlesniak	e52df3b125	Added DNNL cache management for DyGraph (#25624 ) * Added DNNL cache management for DyGraph * move FLAGS_use_mkldnn to more general CMakeLists, get use of the flag in ClearGradients * missing file * Fixes after review * Bringing back original idea of place for 'use_mkldnn' flag to be accessible from platform and imperative. * Removed duplicate and added docs * Fixes for CI	5 years ago
joanna.wozna.intel	e5bbffa84c	Add NOMINMAX define due to windows.h max/min macro conflict (#25637 ) test=develop	5 years ago
Chen Weihang	a6abd92dfd	Polish install error hint message (#25531 ) * polish install error hint msg, test=develop * fix variable error, test=develop * polish hint message again	5 years ago
Jacek Czaja	7dbc441eab	[oneDNN] cache cosmetics improvement (#25576 )	5 years ago
LielinJiang	7129f544f0	Add bilateral_slice op (#25401 ) * add bilateral slice op	5 years ago
GaoWei8	c10dcff12d	refine PADDLE_ENFORCE (#25456 ) * Refine PADDLE_ENFORCE in paddle/fluid/platform test=develop	5 years ago
Chen Weihang	0b54d54fd8	Fix index overflow bug of the CUDA kernel loop increment (#25435 ) * fix softmax_with_cross_entropy cuda kernel overflow bug, test=develop * replace old macro & for condition, test=develop * polish details, test=develop	5 years ago
Chen Weihang	7be285a66f	remove useless property, test=develop (#25461 ) remove useless property	5 years ago
Jacek Czaja	a5d1592f6c	Added missing oneDNN format (#25450 ) test=develop	5 years ago
Chen Weihang	172d4ecb6c	remove WITH_DSO compile option (#25444 )	5 years ago
Zhen Wang	bb45af02ac	add the c++ part of Imperative QAT. test=develop (#25446 )	5 years ago

1073 Commits (a5c56d83a1b16482dcaae1db6e0543b1cf355f3f)