Paddle

Commit Graph

Author	SHA1	Message	Date
dangqingqing	521db98bc9	Refine CUDA profiler and delete the test file.	7 years ago
dangqingqing	f266284d9f	Fix the compiling for only CPU mode.	7 years ago
dangqingqing	10622ba3cf	Resolve conflicts.	7 years ago
dangqingqing	9d73950ec9	Add profiling tools for fluid.	7 years ago
QI JUN	93a2d9c59d	add more place test and rename Cudnn to CUDNN (#6621) * add more place_test and rename Cudnn to CUDNN * fix ci	7 years ago
Yu Yang	1b0c7d7c7a	Simplize system_allocator and fix GPU_INFO (#6653)	7 years ago
Yu Yang	d5cab4f07c	Fix compile on CUDA9.1 & MacOS (#6642)	7 years ago
tensor-tang	bf269d67b3	fix place_test on MKLDNNPlace	7 years ago
tensor-tang	a92f057ed1	fix conflict of Place	7 years ago
tensor-tang	7728c53448	Merge remote-tracking branch 'upstream/develop' into fluid Conflicts: paddle/platform/place.h	7 years ago
tensor-tang	f271210595	fix undefined issue when with_gpu	7 years ago
tensor-tang	e0c3317646	add MKLDNNPlace	7 years ago
dzhwinter	0e9b393b34	"derived cudnnDevice context" (#6585) * "derived cudnnDevice context" * "leave remove cudnn handle from CUDADeviceContext" * "fix math function error"	7 years ago
QI JUN	61ec0b9516	Refine device context (#6433) There are mainly following fixes: - take `DeviceContext` as the template parameter of math functors and OpKernel instead of `Place` - remove `eigen_device` interface in base class `DeviceContext` - remove `GetEigenDevice` interface in `ExecutionContext` and base class `DeviceContext` - remove unused `platform::EigenDeviceConverter` - rename `REGISTER_OP_GPU_KERNEL` to `REGISTER_OP_CUDA_KERNEL` - rename `USE_GPU_ONLY_OP` to `USE_CUDA_ONLY_OP`	7 years ago
qingqing01	5ba231d80b	Merge pull request #6374 from reyoung/feature/remove_device_context_finish Remove DeviceContext::Finish	7 years ago
Yang Yu	6b9567e0ac	Remove DeviceContext::Finish	7 years ago
Yu Yang	f291abfc53	Add HasCUDNN to detect if CUDNN is installed or not (#6349) * Add HasCUDNN to detect if CUDNN is installed or not * Fix CI	7 years ago
QI JUN	96a5f96cc1	fix bug in gpu default memory allocating policy (#6268)	7 years ago
QI JUN	d066b07f14	change GPU memory allocating policy (#6159) * change GPU memory allocating policy * fix potential overflow bug	7 years ago
chengduo	e50f35706a	code refine (#6164)	7 years ago
Yu Yang	8ac02279f2	Fix the proformance problem of enforce (#6085) * Fix Proformance problem of enforce * Fix missing `;` in code * Fix CI	7 years ago
武毅	4ecbab42d8	Fix compile on cudnn7 (#5982) * fix compile on cudnn7 * update * update * make silent	7 years ago
dangqingqing	696b0253e5	Refine paddle/v2/fluid/profiler.py.	7 years ago
dangqingqing	623f62a7dc	Add cuda profiler tools and expose it in Python.	7 years ago
dangqingqing	322d69f209	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into nvprof	7 years ago
dangqingqing	6cf2dcbc1f	Add cuda profiler tools.	7 years ago
武毅	a06bec1287	Conv cudnn 3d (#5783) * conv cudnn 3d * update test case * update * update * follow comments and remove groups from helper * update * refine * update * follow comments2 * update * fix compile	7 years ago
Qiao Longfei	c9172c1cb3	Make enforce target (#5889) * make enforce a target and dependent on nccl when gpu is enabled * add some more dependency	7 years ago
Yu Yang	c077a6d57c	Feature/support int64 for sum (#5832) * Support int64 for sum op * Refine code	7 years ago
chengduoZH	dec61ab6df	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add_cudnn_pool3d	7 years ago
chengduoZH	0bc2f41da9	remove conflict	7 years ago
chengduoZH	7e91da41e7	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add_cudnn_pool3d	7 years ago
wanghaox	0968c7cd6b	Update code and fix conflicts.	7 years ago
dzhwinter	e97b89873a	"fix accuracy kernel bug" (#5673) * "fix accuracy kernel bug" * "relauch ci"	7 years ago
chengduoZH	74912c7d4e	fix data layout	7 years ago
dangqingqing	884ce5d5a2	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into cmake_speed	7 years ago
chengduoZH	ec1e2fc938	add cudnn_pool3d unit test	7 years ago
chengduoZH	a93a59ec7d	add cudnn 3d unit test	7 years ago
Yang Yu	174050277a	Fix GPU Compile on Linux	7 years ago
dangqingqing	524ccba4fe	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into cmake_speed	7 years ago
dangqingqing	f5e367655e	Use G++ to compile some cu operators.	7 years ago
emailweixu	2378679a9e	Fix a dead lock bug for dyload/nccl.h when nccl lib cannot be loaded (#5533) It caused by a bug of std::call_once described in https://stackoverflow.com/questions/41717579/stdcall-once-hangs-on-second-call-after-callable-threw-on-first-call. It is likely caused by a deeper bug of pthread_once, which is discussed in https://patchwork.ozlabs.org/patch/482350/	7 years ago
Yang Yu	3187451ae7	CompareOp's kernel device type is decided by input tensor place CompareOp can run on CPU even other operators are running on GPU, since opeatations like comparing control flags should be performed only on CPU	7 years ago
qingqing01	58db07b7bb	Check errors for the cuda kernel calls. (#5436)	7 years ago
QI JUN	afd1e844fd	remove unused code (#5219) * remove unused code * fix cmake file * fix build error	7 years ago
Dong Zhihong	16a39d24f3	fix conflict	7 years ago
Qiao Longfei	56b723c40d	Cudnn batch norm op (#5067) * init cudnn batch norm op * rename batch_norm_cudnn_op.cc batch_norm_op.cu * correct name style * add ExtractNCWHD, simplify code * fix ExtractNCWHD * use CUDNN_ENFORCE instead of PADDLE_ENFORCE	7 years ago
Dong Zhihong	0990c87bf6	checkin nccl operator	7 years ago
Yu Yang	94e741d6f0	Use external project for NCCL (#5028)	7 years ago
Yu Yang	43c6ff212e	Feature/nccl dso (#5001) * "add nccl enforce" * Dev * Update comment * Add nccl test * Follow comments	7 years ago
Markus Kliegl	164898277c	MatMul operator (#4856) * initial matmul operator Similar to np.matmul, but also has transpose_X and transpose_Y flags, and only supports tensors from rank 1 to 3 inclusive. For GPU, uses cublas?gemmStridedBatched. For CPU, uses cblas_?gemm_batch if available via MKL; otherwise a simple serial implementation that loops over the batch dimension is employed for now.	7 years ago
武毅	a3ccbdb3b6	Cudnn conv op (#4195) * add cudnn_conv_op * WIP * update * update * fix grad check * use platform::memory * add support group for cudnn * update * follow comments * fix onlycpu build * update cuda define * follow comments * follow comments * merge with updates * fix compile error * follow comments * follow comments	7 years ago
Yang Yang(Tony)	c3bf332666	Merge pull request #4537 from QiJune/executor_impl Executor interface design and implementation	7 years ago
Luo Tao	871a3f6e76	remove unused PADDLE_ONLY_CPU comment	7 years ago
Yang Yang	e51557130e	clean up for review	7 years ago
qijun	1f5192a27b	fix executor gpu unittest	7 years ago
qijun	39f75a13a4	Merge remote-tracking branch 'baidu/develop' into executor_impl	7 years ago
Yi Wang	880b874b47	Merge branch 'develop' of https://github.com/paddlepaddle/paddle into paddle_only_cpu	7 years ago
Yi Wang	2b204f048b	Rename platform::GetDeviceCount into platform::GetCUDADeviceCount	7 years ago
qijun	e02cc571cf	Merge remote-tracking branch 'baidu/develop' into executor_impl	7 years ago
qijun	fe10e86dd5	fix gpu build error	7 years ago
Yi Wang	4558807c48	Use PADDLE_WITH_CUDA instead of PADDLE_WITH_GPU	7 years ago
Yu Yang	84500f9487	Change `PADDLE_ONLY_CPU` to `PADDLE_WITH_GPU` By shell command ```bash sed -i 's#ifdef PADDLE_ONLY_CPU#ifndef PADDLE_WITH_GPU#g' `find ./paddle/ -name '*.h' -o -name '*.cc' -o -name '*.cpp' -o -name '*.c' -o -name '*.cu'` sed -i 's#ifndef PADDLE_ONLY_CPU#ifdef PADDLE_WITH_GPU#g' `find ./paddle/ -name '*.h' -o -name '*.cc' -o -name '*.cpp' -o -name '*.c' -o -name '*.cu'` ```	7 years ago
qijun	cb198fa7b6	merge baidu/develop	7 years ago
qijun	395051512d	remove device context manager	7 years ago
qijun	6c4d1f551d	refine codes	7 years ago
qijun	023ed5eb39	merge baidu/develop	7 years ago
qijun	b5dbe88b5a	follow comments	7 years ago
dzhwinter	8acc010691	Merge branch 'develop' into macro	7 years ago
dongzhihong	5423cb3e57	format	7 years ago
Yu Yang	8fd845e0fa	Unify Map in OpDescBind	7 years ago
chengduoZH	df59889984	remove conflict	7 years ago
qijun	b611a479fc	fix gpu build error	7 years ago
qijun	7a6fcc7d30	move EigenDeviceConverter to device_context.h	7 years ago
Yu Yang	f2feb33384	Follow comments	7 years ago
Yu Yang	3a5693e0a8	Add Skeleton of Double support	7 years ago
chengduoZH	3c0f079333	remove conflict and fix InferShape function	7 years ago
Yu Yang	bc30ba19ed	Merge pull request #4375 from reyoung/feature/use_bool_for_enforce Use `bool` for PADDLE_ENFORCE, not int	7 years ago
chengduoZH	30a586df0c	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into Add_pool_op	7 years ago
Qiao Longfei	d0ad82cff1	fix nv_library (#4370) * fix nv_library * fix symbol in gpu_info.h	8 years ago
Yu Yang	699dbe3be9	Use `bool` for PADDLE_ENFORCE, not int * If stat is an integer, bool value will implicit cast to int before pass to PADDLE_ENFORCE	8 years ago
Yu Yang	ba1f5b5c58	Sync computation when Python invoke `run` * Since GPU is an async device by default. We should sync computation when Python invoke `run`. So Python can get the correct computation result	8 years ago
chengduoZH	0417e4e4bf	fix framework::LoDTensor => Tensor	8 years ago
dangqingqing	41a2321a0e	Refine platform::Transform function and fix prelu_op testing.	8 years ago
Yu Yang	87e4e25db1	Change Transform API Using DeviceContext, not Place to get stream	8 years ago
Yu Yang	847fe47310	Merge branch 'develop' of github.com:baidu/Paddle into feature/remove_lazy_init_in_dev_ctx	8 years ago
Yu Yang	81d56ca86b	Remove lazy-initialization in device_context * Also use `const DeviceContext&` all the time, to prevent `const_cast` Fix #4169 Fix #3468 Fix #3475	8 years ago
武毅	8580dce308	Refine accuracy_op CUDA kernel (#4097) * refind accuracy_op * follow comments * follow comments	8 years ago
Yu Yang	9d3b920d75	Merge pull request #3981 from reyoung/feature/transform_api Host and device transform API	8 years ago
liaogang	59d661b9a9	Fix enforce test failed Note: If no symbol with a suitable value is found, both this field and dli_saddr shall be set to NULL.	8 years ago
Yu Yang	f8c6792aa3	Extract DevPtrCast to device_ptr_cast.h	8 years ago
Yu Yang	54d88d4472	Merge branch 'develop' of github.com:baidu/Paddle into feature/transform_api	8 years ago
Yu Yang	6fbf097bcc	Mark thrust::device_ptr in transform Fix TravisCI	8 years ago
Yu Yang	dad5421afe	Remove enforce demangle It is buggy in some Linux because the unique_ptr will be free however the std::string trying to use that char*. Moreover, it is no need to demangle for error log by Paddle. Just use `c++filt` or other shell utilities to do this.	8 years ago
Yu Yang	c5fa417c62	Host and device transform API * with unit-tests * Also complete `memcpy`	8 years ago
Yu Yang	ed346f1dcd	Pass CI	8 years ago
dangqingqing	8c048aa099	Remove cudnn_helper.cc	8 years ago
dangqingqing	207132226c	Add unit testing for cuDNN wrapper.	8 years ago
dangqingqing	c20a01d67d	Add cuDNN Wrapper.	8 years ago
dangqingqing	f188e22b33	Remove set functor and add comapre_grad test	8 years ago


274 Commits (adc26dffa9dac81bd93c88d70f0ab66fcdcc81f0)