Paddle

Commit Graph

Author	SHA1	Message	Date
Adam	f3909020de	Add mechanism for blocking oneDNN cache clearing (#26502 ) * Add mechanism for blocking oneDNN cache clearing * Review changes and Add thread guards	5 years ago
QingshuChen	138ecf24aa	support Baidu Kunlun AI Accelerator (#25959 ) * support Baidu AI Accelerator * test=kunlun * minor * test=kunlun * support xpu op in separate file * test=kunlun * update XPU error message and remove duplicated code * test=kunlun * minor * test=kunlun * minor * test=kunlun	5 years ago
GaoWei8	c10dcff12d	refine PADDLE_ENFORCE (#25456 ) * Refine PADDLE_ENFORCE in paddle/fluid/platform test=develop	5 years ago
GaoWei8	ea7e532598	Refine PADDLE_ENFORCE (#25369 ) * refine PADDLE_ENFORCE test=develop	5 years ago
Chen Weihang	d1062d5278	Replace all errors thrown by LOG(FATAL) with PADDLE_THROW (#24759 ) * remove REPLACE_ENFORCE_GLOG compile option & add ci rule prohibit LOG(FATAL) using, test=develop * remove ci test case, test=develop * replace all LOG(FATAL) & polish message, test=develop * fix typo, test=develop * polish error info detail, test=develop	5 years ago
pawelpiotrowicz	db2b6b6568	Hide globals & redesign restore PR (#24279 ) test=develop	5 years ago
Chen Weihang	aa0f254fbe	Add macro BOOST_GET to enrich the error information of boost::get (#24175 ) * add new macro BOOST_GET_SAFELY & unittests, test=develop * add different macro type, test=develop * fix get macro type in executor, test=develop * four macro part change backup * using one macro for all case, test=develop * revert attribute change, test=develop * change to three func to solve gcc4.8 bug, test=develop * polish some details, test=develop	5 years ago
Sylwester Fraczek	e1a7a88057	added reshape transpose matmul fuse pass (#23754 )	5 years ago
Guo Sheng	a8c0fb4e86	Add cholesky_op (#23543 ) * Add cholesky_op forward part. test=develop * Complete cholesky_op forward part. test=develop * Add cholesky_op backward part. test=develop * Complete cholesky_op backward part. test=develop * Refine cholesky_op error check and docs. test=develop * Add grad_check unit test for cholesky_op. test=develop * Fix sample code in cholesky doc. test=develop * Refine some error messages of cholesky_op. test=develop * Refine some error messages of cholesky_op. test=develop * Remove unused input in cholesky_grad. test=develop * Remove unused input in cholesky_grad. test=develop * Fix stream for cusolverDnSetStream. test=develop * Update PADDLE_ENFORCE_CUDA_SUCCESS from cholesky_op to adapt to latest code. test=develop * Add CUSOLVER ERROR in enforce.h test=develop * Fix the missing return value in cholesky. test=develop	5 years ago
石晓伟	34d7d6aef0	declare the stream::Priority as enum class, test=develop (#24013 )	5 years ago
Zhang Ting	b89dd86fb6	Update eigen (#23203 ) * update eigen, test=develop * remove patches, test=develop * add definition of -fabi-version, test=develop * add patch for TensorBlock.h, test=develop * test windows, test=develop * only update eigen for Linux, test=develop * add code comments, test=develop	5 years ago
石晓伟	2d01cc85c4	DeviceContext Split, test=develop (#23737 ) * supports thread-binding stream, test=develop * avoid using thread_local variables in dtor, test=develop * modify the stream priority enum, test=develop	5 years ago
石晓伟	5c59d2139e	reverts the commit 23177, test=develop (#23363 )	5 years ago
Yi Liu	0471476a18	fix nccl comm double free bug (#23344 ) As nccl comm is not created by CUDADeviceContext, it should be destroyed by the creator as the best practice of RAII.	5 years ago
石晓伟	75ebb48a91	supports thread-binding stream, test=develop (#23177 )	5 years ago
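Wilber	7bc4b09500	add WITH_NCCL option for cmake. (#22384 ) Added a WITH_NCCL cmake option that explicitly specifies whether to compile the NCCL-related code. WITH_NCCL is ON by default, but it is turned OFF if WITH_GPU is OFF. Added the PADDLE_WITH_NCCL definition. A single-machine, single-GPU setup can disable NCCL compilation; multi-GPU setups need NCCL enabled, which is the default. If NCCL is disabled, only a single GPU can be used. Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>	5 years ago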
zhaoyuchen2018	3d4f2aa689	Refine stack op to improve xlnet performance, test=develop (#22142 ) stack's wait cost a lot of cpu time, use cuda kernel to do memory copy will reduce cpu time. Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>	5 years ago
Adam	e81f0228df	MKL-DNN 1.0 Update (#20162 ) * MKLDNN v1.0 rebase to Paddle 1.6 test=develop * Add hacky paddle::string::to_string() implementation * vectorize<int64-t>() -> vectorize() cleanup test=develop * PADDLE_ENFORCE and void_cast fixes test=develop * Rebase changes test=develop * Cosmetics test=develop * Delete MKL from mkldnn.cmake test=develop * CMake debug commands test=develop * Delete MKLDNN_VERBOSE and rebase fixes test=develop * Rebase fixes test=develop * Temporarily disable int8 resnet101 vgg16 and vgg19 tests test=develop * Add libmkldnn.so.1 to python setup test=develop * Add libmkldnn.so.1 to inference_lib cmake after rebase test=develop * Post rebase fixes + FC int8 changes test=develop * Fix LRN NHWC test=develop * Fix NHWC conv3d test=develop * Windows build fix + next conv3d fix test=develop * Fix conv2d on AVX2 machines test=develop	5 years ago
Zeng Jinle	97e76cb96d	refine dev_ctx.Wait() exception throw, test=develop (#21600 )	5 years ago
Jacek Czaja	cd43c4440e	[MKL-DNN] LRN and Pool2d (FWD) NHWC support (#21375 )	5 years ago
liuwei1031	d8b6cf2bcd	fix sporadically hang issue on windows(#21201 ) cudaStreamSynchronize randomly hang when used in multi-thread environment, replace it with cudaStreamQuery API on windows	5 years ago
zhaoyuchen2018	b93870e696	Improve topk performance. (#21087 ) * Improve topk performance. give 200000 data to compute topk, before opt: cost 1s after opt: cost 0.0028s. * Refine return value. * Add cuda util funtions. * Fix ComputeBlockSize bug & refine comments. Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>	5 years ago
Zeng Jinle	37f76407b0	fix cuda dev_ctx allocator cmake deps, test=develop (#19953 )	5 years ago
Zeng Jinle	c7f36e7c00	Add lock to cudnn handle calls (#19845 ) * refine reallocate of workspace size, test=develop * add lock to cudnn handle calls, test=develop	5 years ago
Huihuang Zheng	12542320c5	Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989 ) TemporaryAllocator is a singleton used for allocating memory for Cudnn. Since it is a singleton, we can delete it for better performance in memory. We replace TemporaryAllocator by CUDADeviceContextAllocator and CUDADeviceContextAllocation, which uses stream callback to delete the memory allocated for the stream to avoid singleton. Also added data_feed_proto to operator to fix CI in CPU compilation	6 years ago
gongweibao	29d8781240	Polish fleet API to support cuda collective mode and nccl2 mode. (#18966 ) Polish fleet API to support cuda collective mode and nccl2 mode	6 years ago
Tao Luo	076f833110	add config.SetMkldnnCacheCapacity api for mkldnn cache clear strategy (#18580 ) * add config.SetMkldnnCacheCapacity api for mkldnn cache clear strategy test=develop * enhance MkldnnPostReset test=develop * add comments for mkldnn_cache_capacity field test=develop	6 years ago
Tao Luo	fe32879d2a	add mkldnn shapeblob cache clear strategy (#18513 ) * add mkldnn shapeblob cache clear strategy test=develop * refine with comments test=develop * make cache clear strategy more safey test=develop * add lock for GetShapeBlobSize test=develop	6 years ago
Tao Luo	3f3112ceb0	add shape_blob for cache mkldnn primitive (#18454 ) test=develop	6 years ago
Leo Zhao	8f5fffca0a	rename mkldnn set/get_cur_thread_id() to set/get_cur_mkldnn_session_id() (#18453 ) * rename mkldnn set/get_cur_thread_id() to set/get_cur_mkldnn_session_id() test=develop * update session id definition and adjust logic for default behavior test=develop * reset logic in mkldnn reuse as most of cases work in default. test=develop	6 years ago
Michał Gallus	8409693272	Reset DeviceContext after quantization warmup (#18182 ) test=develop	6 years ago
chengduo	4978db2c10	Remove nccl dep when the number of GPU is 1 (#18158 ) * remove nccl dep when the number of GPU is 1 test=develop	6 years ago
Zeng Jinle	3ece61f71e	Remove attribute in Allocator::Allocate (#17878 ) * remove attribute in Allocator::Allocate, test=develop * fix travis ci error, test=develop	6 years ago
Zeng Jinle	3925bd81e8	Fix cuda/cudnn version detection error (#17853 ) * fix cuda/cudnn version detection error, test=develop * fix again, test=develop	6 years ago
gongweibao	eb83abeac3	Add DGC(Deep Gradient Compression) interface. (#15841 )	6 years ago
nhzlx	a1d11bb175	fix ci bug: cudnn handler in multi card test=develop	6 years ago
nhzlx	07dcf2856c	git cherry-pick from feature/anakin-engine: update anakin subgraph #16278	6 years ago
qingqing01	86e912c544	Fix windows compiling (#16230 ) test=develop	6 years ago
qingqing01	8ad672a287	Support sync batch norm. (#16121 ) * Support Sync Batch Norm. * Note, do not enable it in one device. Usage: build_strategy = fluid.BuildStrategy() build_strategy.sync_batch_norm = True binary = fluid.compiler.CompiledProgram(tp).with_data_parallel( loss_name=loss_mean.name, build_strategy=build_strategy)	6 years ago
Sylwester Fraczek	74672d1aff	Change (smart_ptr.get()) -> smart_ptr reason: dereferencing smart pointer is the same as the underlying pointer test=develop	6 years ago
sneaxiy	209b355762	fix many warning test=develop	6 years ago
minqiyang	315b133e67	Add single GPU support to imperative	6 years ago
chengduo	064512aa47	Remove workspace_handle in conv_cudnn (#15186 ) * remove workspace_handle in conv2d_cudnn test=develop * remove workspace_handle test=develop * fix bug test=develop * make test_conv2d_op SERIAL test=develop * save memory in conv_cudnn test=develop * enhance thread safety test=develop * enhance temporary allocator test=develop * Add excess fraction test=develop * follow comments test=develop * fix bug and code refine test=develop * fix memory size check test=develop * rename reuse_tmp_allocation_excess_fraction test=develop	6 years ago
Zeng Jinle	e29f10d315	Merge pull request #15207 from sneaxiy/remove_op_handle_lock_and_fix_var Remove op handle lock and fix var	6 years ago
Zeng Jinle	c562be20d9	Merge pull request #15193 from sneaxiy/fix_cudnn_compatible_check Fix cudnn compatible check	6 years ago
sneaxiy	ed409ac9f4	Revert "Revert "Remove op handle lock"" test=develop	6 years ago
Zeng Jinle	dacfaaa966	Revert "Remove op handle lock" test=develop	6 years ago
sneaxiy	9793a0b6a6	fix_cudnn_compatible_check	6 years ago
sneaxiy	d0a8a1e950	remove_op_handle_lock test=develop	6 years ago
sneaxiy	d25395fc98	remove tensor core lock test=develop	6 years ago


114 Commits (8645591d664f9e059113900281a715f8f83ae93c)