Paddle

Commit Graph

Author	SHA1	Message	Date
dzhwinter	ad2ab95207	"small fix of Place" (#6766 )	7 years ago
QI JUN	93a2d9c59d	add more place test and rename Cudnn to CUDNN (#6621 ) * add more place_test and rename Cudnn to CUDNN * fix ci	7 years ago
Yu Yang	1b0c7d7c7a	Simplize system_allocator and fix GPU_INFO (#6653 )	7 years ago
Yu Yang	d5cab4f07c	Fix compile on CUDA9.1 & MacOS (#6642 )	7 years ago
tensor-tang	bf269d67b3	fix place_test on MKLDNNPlace	7 years ago
tensor-tang	a92f057ed1	fix conflict of Place	7 years ago
tensor-tang	7728c53448	Merge remote-tracking branch 'upstream/develop' into fluid Conflicts: paddle/platform/place.h	7 years ago
tensor-tang	f271210595	fix undefined issue when with_gpu	7 years ago
tensor-tang	e0c3317646	add MKLDNNPlace	7 years ago
dzhwinter	0e9b393b34	"derived cudnnDevice context" (#6585 ) * "derived cudnnDevice context" * "leave remove cudnn handle from CUDADeviceContext" * "fix math function error"	7 years ago
QI JUN	61ec0b9516	Refine device context (#6433 ) There are mainly following fixes: - take `DeviceContext` as the template parameter of math functors and OpKernel instead of `Place` - remove `eigen_device` interface in base class `DeviceContext` - remove `GetEigenDevice` interface in `ExecutionContext` and base class `DeviceContext` - remove unused `platform::EigenDeviceConverter` - rename `REGISTER_OP_GPU_KERNEL` to `REGISTER_OP_CUDA_KERNEL` - rename `USE_GPU_ONLY_OP` to `USE_CUDA_ONLY_OP`	7 years ago
qingqing01	5ba231d80b	Merge pull request #6374 from reyoung/feature/remove_device_context_finish Remove DeviceContext::Finish	7 years ago
Yang Yu	6b9567e0ac	Remove DeviceContext::Finish	7 years ago
Yu Yang	f291abfc53	Add HasCUDNN to detect if CUDNN is installed or not (#6349 ) * Add HasCUDNN to detect if CUDNN is installed or not * Fix CI	7 years ago
QI JUN	96a5f96cc1	fix bug in gpu default memory allocating policy (#6268 )	7 years ago
QI JUN	d066b07f14	change GPU memory allocating policy (#6159 ) * change GPU memory allocating policy * fix potential overflow bug	7 years ago
chengduo	e50f35706a	code refine (#6164 )	7 years ago
Yu Yang	8ac02279f2	Fix the proformance problem of enforce (#6085 ) * Fix Proformance problem of enforce * Fix missing `;` in code * Fix CI	7 years ago
武毅	4ecbab42d8	Fix compile on cudnn7 (#5982 ) * fix compile on cudnn7 * update * update * make silent	7 years ago
dangqingqing	696b0253e5	Refine paddle/v2/fluid/profiler.py.	7 years ago
dangqingqing	623f62a7dc	Add cuda profiler tools and expose it in Python.	7 years ago
dangqingqing	322d69f209	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into nvprof	7 years ago
dangqingqing	6cf2dcbc1f	Add cuda profiler tools.	7 years ago
武毅	a06bec1287	Conv cudnn 3d (#5783 ) * conv cudnn 3d * update test case * update * update * follow comments and remove groups from helper * update * refine * update * follow comments2 * update * fix compile	7 years ago
Qiao Longfei	c9172c1cb3	Make enforce target (#5889 ) * make enforce a target and dependent on nccl when gpu is enabled * add some more dependency	7 years ago
Yu Yang	c077a6d57c	Feature/support int64 for sum (#5832 ) * Support int64 for sum op * Refine code	7 years ago
chengduoZH	dec61ab6df	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add_cudnn_pool3d	7 years ago
chengduoZH	0bc2f41da9	remove conflict	7 years ago
chengduoZH	7e91da41e7	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add_cudnn_pool3d	7 years ago
wanghaox	0968c7cd6b	Update code and fix conflicts.	7 years ago
dzhwinter	e97b89873a	"fix accuracy kernel bug" (#5673 ) * "fix accuracy kernel bug" * "relauch ci"	7 years ago
chengduoZH	74912c7d4e	fix data layout	7 years ago
dangqingqing	884ce5d5a2	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into cmake_speed	7 years ago
chengduoZH	ec1e2fc938	add cudnn_pool3d unit test	7 years ago
chengduoZH	a93a59ec7d	add cudnn 3d unit test	7 years ago
Yang Yu	174050277a	Fix GPU Compile on Linux	7 years ago
dangqingqing	524ccba4fe	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into cmake_speed	7 years ago
dangqingqing	f5e367655e	Use G++ to compile some cu operators.	7 years ago
emailweixu	2378679a9e	Fix a dead lock bug for dyload/nccl.h when nccl lib cannot be loaded (#5533 ) It caused by a bug of std::call_once described in https://stackoverflow.com/questions/41717579/stdcall-once-hangs-on-second-call-after-callable-threw-on-first-call. It is likely caused by a deeper bug of pthread_once, which is discussed in https://patchwork.ozlabs.org/patch/482350/	7 years ago
Yang Yu	3187451ae7	CompareOp's kernel device type is decided by input tensor place CompareOp can run on CPU even other operators are running on GPU, since opeatations like comparing control flags should be performed only on CPU	7 years ago
qingqing01	58db07b7bb	Check errors for the cuda kernel calls. (#5436 )	7 years ago
QI JUN	afd1e844fd	remove unused code (#5219 ) * remove unused code * fix cmake file * fix build error	7 years ago
Dong Zhihong	16a39d24f3	fix conflict	7 years ago
Qiao Longfei	56b723c40d	Cudnn batch norm op (#5067 ) * init cudnn batch norm op * rename batch_norm_cudnn_op.cc batch_norm_op.cu * correct name style * add ExtractNCWHD, simplify code * fix ExtractNCWHD * use CUDNN_ENFORCE instead of PADDLE_ENFORCE	7 years ago
Dong Zhihong	0990c87bf6	checkin nccl operator	7 years ago
Yu Yang	94e741d6f0	Use external project for NCCL (#5028 )	7 years ago
Yu Yang	43c6ff212e	Feature/nccl dso (#5001 ) * "add nccl enforce" * Dev * Update comment * Add nccl test * Follow comments	7 years ago
Markus Kliegl	164898277c	MatMul operator (#4856 ) * initial matmul operator Similar to np.matmul, but also has transpose_X and transpose_Y flags, and only supports tensors from rank 1 to 3 inclusive. For GPU, uses cublas?gemmStridedBatched. For CPU, uses cblas_?gemm_batch if available via MKL; otherwise a simple serial implementation that loops over the batch dimension is employed for now.	7 years ago
武毅	a3ccbdb3b6	Cudnn conv op (#4195 ) * add cudnn_conv_op * WIP * update * update * fix grad check * use platform::memory * add support group for cudnn * update * follow comments * fix onlycpu build * update cuda define * follow comments * follow comments * merge with updates * fix compile error * follow comments * follow comments	7 years ago
Yang Yang(Tony)	c3bf332666	Merge pull request #4537 from QiJune/executor_impl Executor interface design and implementation	7 years ago

221 Commits (feb05c3a54c5fd1f50e76dcb713bcc832c784fb5)