Paddle

Commit Graph

Author	SHA1	Message	Date
Qi Li	4d647ec137	[ROCM] update fluid platform for rocm (part5), test=develop (#31315 )	4 years ago
liu zhengxi	ae2be49f40	Add cublas_handle() to expose cublas_handle to ops (#31157 ) * add get_cublas_handle() api * update format * add unittests * alter function name	4 years ago
Qi Li	93c1d9e761	[ROCM] update fluid platform for rocm39 (part3), test=develop (#30913 )	4 years ago
Jacek Czaja	173660be7b	[oneDNN] Cache oneDNN stream not to recreate in each oneDNN op (#30358 )	4 years ago
AshburnLee	924aac2216	Add tf32 switch for cuDNN (#29192 )	4 years ago
liuyuhui	3d1741b794	[Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor (#29926 )	4 years ago
liuyuhui	4427df37cf	[Kunlun] PR2: Support MultiDevicePass and BKCL in parallel executor (#29574 )	4 years ago
Jacek Czaja	c9e874fc8e	[oneDNN] Unit test for checking oneDNN caching (#29606 )	4 years ago
AshburnLee	efea540ca9	Add tf32 support for A100 tensor core acceleration for cuBLAS (#28732 )	4 years ago
arlesniak	62d4483649	Added verbose oneDNN lib version (#29378 )	4 years ago
Jacek Czaja	f6cca62575	[oneDNN] Making ThreadID info in caching key optional (#29272 )	4 years ago
wawltor	b2c8a00745	remove eigen threadpool for the speed up	4 years ago
Jacek Czaja	bd1d6d3b30	extends oneDNN caching keys so caching objects are unique to executor/predictor (#28758 )	4 years ago
Huihuang Zheng	acc11c2a62	Retry CUDA Initialization to Fix Random Failure, test=develop (#28323 ) This PR is follow up of #28213. On that PR we tried to decrease GPU usage, however the CI still randomly failed. So I added retry logic for the initialization of nccl and cusolver. If the initialization failed, we can retry to avoid the random failure.	4 years ago
wanghuancoder	df43905f12	use iwyu clean include (#27267 ) * use iwyu clean include, test=develop, test=win * compilation error, test=develop * fix compilation error2, test=develop * fix compilation error3, test=develop * fix compilation error4, test=develop * fix compilation error5, test=develop * fix compilation error6, test=develop * fix compilation error7, test=develop * fix compilation error8, test=develop * fix compilation error8, test=develop * fix compilation error10, test=develop * fix compilation error11, test=develop	4 years ago
Jack Zhou	63203c4abc	enhance reduce op which can reduce tensor with arbitrary rank	4 years ago
Adam	f3909020de	Add mechanism for blocking oneDNN cache clearing (#26502 ) * Add mechanism for blocking oneDNN cache clearing * Review changes and Add thread guards	5 years ago
QingshuChen	138ecf24aa	support Baidu Kunlun AI Accelerator (#25959 ) * support Baidu AI Accelerator * test=kunlun * minor * test=kunlun * support xpu op in separate file * test=kunlun * update XPU error message and remove duplicated code * test=kunlun * minor * test=kunlun * minor * test=kunlun	5 years ago
GaoWei8	fb70682f00	fix PADDLE_ENFORCE (#25297 ) * fix PADDLE_ENFORCE and refine the description test=develop	5 years ago
pawelpiotrowicz	db2b6b6568	Hide globals & redesign restore PR (#24279 ) test=develop	5 years ago
Guo Sheng	a8c0fb4e86	Add cholesky_op (#23543 ) * Add cholesky_op forward part. test=develop * Complete cholesky_op forward part. test=develop * Add cholesky_op backward part. test=develop * Complete cholesky_op backward part. test=develop * Refine cholesky_op error check and docs. test=develop * Add grad_check unit test for cholesky_op. test=develop * Fix sample code in cholesky doc. test=develop * Refine some error messages of cholesky_op. test=develop * Refine some error messages of cholesky_op. test=develop * Remove unused input in cholesky_grad. test=develop * Remove unused input in cholesky_grad. test=develop * Fix stream for cusolverDnSetStream. test=develop * Update PADDLE_ENFORCE_CUDA_SUCCESS from cholesky_op to adapt to latest code. test=develop * Add CUSOLVER ERROR in enforce.h test=develop * Fix the missing return value in cholesky. test=develop	5 years ago
石晓伟	34d7d6aef0	declare the stream::Priority as enum class, test=develop (#24013 )	5 years ago
Zhou Wei	7817003795	Optimize the error messages of paddle CUDA API (#23816 ) * Optimize the error messages of paddle CUDA API, test=develop * fix the error messages of paddle CUDA API, test=develop * Refactoring PADDLE_ENFORCE_CUDA_SUCCESS, and apply to curand/cudnn/cublas/NCCL,test=develop * remove build_ex_string,test=develop * merge conflict,test=develop	5 years ago
石晓伟	2d01cc85c4	DeviceContext Split, test=develop (#23737 ) * supports thread-binding stream, test=develop * avoid using thread_local variables in dtor, test=develop * modify the stream priority enum, test=develop	5 years ago
石晓伟	5c59d2139e	reverts the commit 23177, test=develop (#23363 )	5 years ago
石晓伟	75ebb48a91	supports thread-binding stream, test=develop (#23177 )	5 years ago
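Wilber	7bc4b09500	add WITH_NCCL option for cmake. (#22384 ) Added a WITH_NCCL cmake option that explicitly specifies whether to compile the NCCL-related code. WITH_NCCL is ON by default, but it is turned off if WITH_GPU is OFF. Added the PADDLE_WITH_NCCL definition. Single-machine, single-card setups can disable NCCL compilation; multi-card setups need NCCL enabled, which is the default. If NCCL is disabled, only a single card can be used. Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>	5 years ago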
zhaoyuchen2018	3d4f2aa689	Refine stack op to improve xlnet performance, test=develop (#22142 ) stack's wait cost a lot of cpu time, use cuda kernel to do memory copy will reduce cpu time. Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>	5 years ago
Jacek Czaja	cd43c4440e	[MKL-DNN] LRN and Pool2d (FWD) NHWC support (#21375 )	5 years ago
Zeng Jinle	cdb3d27985	Fix warn of gcc8 (#21205 ) * fix warnings of gcc 8 compilation, test=develop * fix boost::bad_get, test=develop * refine PADDLE_ENFORCE, test=develop	5 years ago
zhaoyuchen2018	b93870e696	Improve topk performance. (#21087 ) * Improve topk performance. give 200000 data to compute topk, before opt: cost 1s after opt: cost 0.0028s. * Refine return value. * Add cuda util functions. * Fix ComputeBlockSize bug & refine comments. Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>	5 years ago
qingqing01	1a3eef026c	Enable users to create custom cpp op outside framework. (#19256 ) * How to write custom op needs to follow framework OP spec. * Package fluid_framework.so and headers into whl. * Add paddle.sysconfig.get_include() and paddle.sysconfig.get_lib() to get include dir and lib dir. * Export some C-APIs to merge OpInfo between core.so and custom_op.so. * Add unit testing. * Update API.spec.	5 years ago
Zeng Jinle	37f76407b0	fix cuda dev_ctx allocator cmake deps, test=develop (#19953 )	5 years ago
Zeng Jinle	c7f36e7c00	Add lock to cudnn handle calls (#19845 ) * refine reallocate of workspace size, test=develop * add lock to cudnn handle calls, test=develop	5 years ago
Zeng Jinle	5eb381a3e2	refine reallocate of workspace size, test=develop (#19843 )	5 years ago
Huihuang Zheng	12542320c5	Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989 ) TemporaryAllocator is a singleton used for allocating memory for Cudnn. Since it is a singleton, we can delete it for better performance in memory. We replace TemporaryAllocator by CUDADeviceContextAllocator and CUDADeviceContextAllocation, which uses stream callback to delete the memory allocated for the stream to avoid singleton. Also added data_feed_proto to operator to fix CI in CPU compilation	6 years ago
Tao Luo	75d1571995	refine PADDLE_ENFORCE codes for unify PADDLE_ASSERT_MSG (#19603 ) test=develop	6 years ago
Tao Luo	fe32879d2a	add mkldnn shapeblob cache clear strategy (#18513 ) * add mkldnn shapeblob cache clear strategy test=develop * refine with comments test=develop * make cache clear strategy safer test=develop * add lock for GetShapeBlobSize test=develop	6 years ago
Tao Luo	3f3112ceb0	add shape_blob for cache mkldnn primitive (#18454 ) test=develop	6 years ago
Leo Zhao	8f5fffca0a	rename mkldnn set/get_cur_thread_id() to set/get_cur_mkldnn_session_id() (#18453 ) * rename mkldnn set/get_cur_thread_id() to set/get_cur_mkldnn_session_id() test=develop * update session id definition and adjust logic for default behavior test=develop * reset logic in mkldnn reuse as most of cases work in default. test=develop	6 years ago
Michał Gallus	8409693272	Reset DeviceContext after quantization warmup (#18182 ) test=develop	6 years ago
Huihuang Zheng	b9494058b3	Use CudnnWorkspaceHandle in exhaustive search (#17082 ) 1. Use CudnnWorkspaceHandle in exhaustive search of conv_cudnn. 2. For Ops using CudnnWorkspaceHandle in exhaustive search, release their GPU memory after exhaustive search. test=develop	6 years ago
Zeng Jinle	1202d3fc74	Refine model gpu memory (#16993 ) * speedup gc and inplace softmax_with_cross_entropy_grad test=develop * refine models gpu mem Merge skip vars and warning messages of mem opt remove relu mem opt test=develop * follow comments test=develop	6 years ago
nhzlx	a1d11bb175	fix ci bug: cudnn handler in multi card test=develop	6 years ago
nhzlx	3df7b98a0f	Merge branch 'develop' of https://github.com/paddlepaddle/paddle into HEAD	6 years ago
Wu Yi	b7baeed7bb	fix win gpu build test=develop (#16334 )	6 years ago
nhzlx	07dcf2856c	git cherry-pick from feature/anakin-engine: update anakin subgraph #16278	6 years ago
Wu Yi	6382b62f6b	Collective ops (#15572 ) * wip allreduce in op * wip * wip * wip * wip adding test * wip for conflict with mp mode * fix tests test=develop * fix cpu build test=develop * fix travis clang format test=develop * fix cpu build test=develop * update api.spec test=develop * delete comment test=develop * fix cpplint test=develop * fix test=develop * follow comment test=develop * add file test=develop * fix build test=develop * update test=develop * to be compatible with sync_bn, and fix mp mode in develop test=develop	6 years ago
qingqing01	86e912c544	Fix windows compiling (#16230 ) test=develop	6 years ago
qingqing01	8ad672a287	Support sync batch norm. (#16121 ) * Support Sync Batch Norm. * Note, do not enable it in one device. Usage: build_strategy = fluid.BuildStrategy() build_strategy.sync_batch_norm = True binary = fluid.compiler.CompiledProgram(tp).with_data_parallel( loss_name=loss_mean.name, build_strategy=build_strategy)	6 years ago

103 Commits (17030ff28b9a54bb57779e9b8448a6d222110ec5)