Paddle

Commit Graph

Author	SHA1	Message	Date
石晓伟	5c59d2139e	reverts the commit 23177, test=develop (#23363 )	5 years ago
石晓伟	75ebb48a91	supports thread-binding stream, test=develop (#23177 )	5 years ago
Wilber	7bc4b09500	add WITH_NCCL option for cmake. (#22384 ) cmake选项中添加了WITH_NCCL，显示指定是否编译NCCL的部分代码，WITH_NCCL默认打开，但如果WITH_GPU为OFF，则关闭WITH_NCCL 添加了PADDLE_WITH_NCCL定义单机单卡能够关闭NCCL编译，多卡的话需要默认打开NCCL，如果关闭NCCL，则只能使用单卡 Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>	5 years ago
zhaoyuchen2018	3d4f2aa689	Refine stack op to improve xlnet performance, test=develop (#22142 ) stack's wait cost a lot of cpu time, use cuda kernel to do memory copy will reduce cpu time. Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>	5 years ago
Jacek Czaja	cd43c4440e	[MKL-DNN] LRN and Pool2d (FWD) NHWC support (#21375 )	5 years ago
Zeng Jinle	cdb3d27985	Fix warn of gcc8 (#21205 ) * fix warnings oof gcc 8 compilation, test=develop * fix boost::bad_get, test=develop * refine PADDLE_ENFORCE, test=develop	5 years ago
zhaoyuchen2018	b93870e696	Improve topk performance. (#21087 ) * Improve topk performance. give 200000 data to compute topk, before opt: cost 1s after opt: cost 0.0028s. * Refine return value. * Add cuda util funtions. * Fix ComputeBlockSize bug & refine comments. Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>	5 years ago
qingqing01	1a3eef026c	Enable users to create custom cpp op outside framework. (#19256 ) * How to write custom op needs to follow framework OP spec. * Package fluid_framework.so and headers into whl. * Add paddle.sysconfig.get_include() and paddle.sysconfig.get_lib() to get include dir and lib dir. * Export some C-APIs to merge OpInfo between core.so and custom_op.so. * Add unit testing. * Update API.spec.	5 years ago
Zeng Jinle	37f76407b0	fix cuda dev_ctx allocator cmake deps, test=develop (#19953 )	5 years ago
Zeng Jinle	c7f36e7c00	Add lock to cudnn handle calls (#19845 ) * refine reallocate of workspace size, test=develop * add lock to cudnn handle calls, test=develop	5 years ago
Zeng Jinle	5eb381a3e2	refine reallocate of workspace size, test=develop (#19843 )	5 years ago
Huihuang Zheng	12542320c5	Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989 ) TemporaryAllocator is a singleton used for allocating memory for Cudnn. Since it is a singleton, we can delete it for better performance in memory. We replace TemporaryAllocator by CUDADeviceContextAllocator and CUDADeviceContextAllocation, which uses stream callback to delete the memory allocated for the stream to avoid singleton. Also added data_feed_proto to operator to fix CI in CPU compilation	6 years ago
Tao Luo	75d1571995	refine PADDLE_ENFORCE codes for unify PADDLE_ASSERT_MSG (#19603 ) test=develop	6 years ago
Tao Luo	fe32879d2a	add mkldnn shapeblob cache clear strategy (#18513 ) * add mkldnn shapeblob cache clear strategy test=develop * refine with comments test=develop * make cache clear strategy more safey test=develop * add lock for GetShapeBlobSize test=develop	6 years ago
Tao Luo	3f3112ceb0	add shape_blob for cache mkldnn primitive (#18454 ) test=develop	6 years ago
Leo Zhao	8f5fffca0a	rename mkldnn set/get_cur_thread_id() to set/get_cur_mkldnn_session_id() (#18453 ) * rename mkldnn set/get_cur_thread_id() to set/get_cur_mkldnn_session_id() test=develop * update session id definition and adjust logic for default behavior test=develop * reset logic in mkldnn reuse as most of cases work in default. test=develop	6 years ago
Michał Gallus	8409693272	Reset DeviceContext after quantization warmup (#18182 ) test=develop	6 years ago
Huihuang Zheng	b9494058b3	Use CudnnWorkspaceHandle in exhaustive search (#17082 ) 1. Use CudnnWorkspaceHandle in exhaustive search of conv_cudnn. 2. For Ops using CudnnWorkspaceHandle in exhaustive search, release their GPU memory after exhaustive search. test=develop	6 years ago
Zeng Jinle	1202d3fc74	Refine model gpu memory (#16993 ) * speedup gc and inplace softmax_with_cross_entropy_grad test=develop * refine models gpu mem Merge skip vars and warning messages of mem opt remove relu mem opt test=develop * follow comments test=develop	6 years ago
nhzlx	a1d11bb175	fix ci bug: cudnn handler in multi card test=develop	6 years ago
nhzlx	3df7b98a0f	Merge branch 'develop' of https://github.com/paddlepaddle/paddle into HEAD	6 years ago
Wu Yi	b7baeed7bb	fix win gpu build test=develop (#16334 )	6 years ago
nhzlx	07dcf2856c	git cherry-pick from feature/anakin-engine: update anakin subgraph #16278	6 years ago
Wu Yi	6382b62f6b	Collective ops (#15572 ) * wip allreduce in op * wip * wip * wip * wip adding test * wip for conflict with mp mode * fix tests test=develop * fix cpu build test=develop * fix travis clang format test=develop * fix cpu build test=develop * update api.spec test=develop * delete comment test=develop * fix cpplint test=develop * fix test=develop * follow comment test=develop * add file test=develop * fix build test=develop * update test=develop * to be compatible with sync_bn, and fix mp mode in develop test=develop	6 years ago
qingqing01	86e912c544	Fix windows compiling (#16230 ) test=develop	6 years ago
qingqing01	8ad672a287	Support sync batch norm. (#16121 ) * Support Sync Batch Norm. * Note, do not enable it in one device. Usage: build_strategy = fluid.BuildStrategy() build_strategy.sync_batch_norm = True binary = fluid.compiler.CompiledProgram(tp).with_data_parallel( loss_name=loss_mean.name, build_strategy=build_strategy)	6 years ago
chengduo	46d01d798e	Revert "Revert "Remove workspace_handle in conv_cudnn (#15186 )"" (#15290 ) test=develop This reverts commit `358e657f68`.	6 years ago
chengduozh	358e657f68	Revert "Remove workspace_handle in conv_cudnn (#15186 )" test=develop This reverts commit `064512aa47`.	6 years ago
chengduo	064512aa47	Remove workspace_handle in conv_cudnn (#15186 ) * remove workspace_handle in conv2d_cudnn test=develop * remove workspace_handle test=develop * fix bug test=develop * make test_conv2d_op SERIAL test=develop * save memory in conv_cudnn test=develop * enhance thread safety test=develop * enhance temporary allocator test=develop * Add excess fraction test=develop * follow comments test=develop * fix bug and code refine test=develop * fix memory size check test=develop * rename reuse_tmp_allocation_excess_fraction test=develop	6 years ago
sneaxiy	ed409ac9f4	Revert "Revert "Remove op handle lock"" test=develop	6 years ago
Zeng Jinle	dacfaaa966	Revert "Remove op handle lock" test=develop	6 years ago
sneaxiy	d0a8a1e950	remove_op_handle_lock test=develop	6 years ago
sneaxiy	d25395fc98	remove tensor core lock test=develop	6 years ago
chengduo	b9fb03cf54	Move GetTensor to tensor_util (#15011 ) * refine tensor test=develop * refine tensor test=develop * fix device_context log test=develop	6 years ago
chengduo	79bd6dfa18	[Feature] Add Temporary Allocator (#14875 ) * Add Temporal Allocator * add Temporay Allocator to DeviceContext test=develop * code refine test=develop * fix mean_iou test=develop * Add DeviceTemporaryAllocator test=develop * fix conv_op bug test=develop * small fix test=develop * code refine test=develop * log refine test=develop * fix unit test test=develop * move double check * refine concat_and_split test=develop * add limit_of_temporary_allocation test=develop * fix name test=develop	6 years ago
sneaxiy	ca84c2ca8f	merge develop test=develop	6 years ago
Yu Yang	7604b1ad51	Fix Eigen macro when using GPU The macro should be defined by compiler rather than by source. test=develop	6 years ago
sneaxiy	c47c451a00	fix bug	6 years ago
chengduo	00b9e9a135	Refine cublas to support CUBLAS_TENSOR_OP_MATH (#13929 ) * refine cublase test=develop * code refine * refine cublas * add GEMME_EX * add enable_cublas_tensor_op_math doc and add cublasCall test=develop * fix CublasCall for cuda version test=develop * fix error test=develop * fix GEMM_EX to be compatible with gcc 4.8 test=develop * add GEMM_EX test=develop * to compatiable with gcc4.8 test=develop	6 years ago
Yu Yang	0d6718fcbd	Pass compile	6 years ago
Yu Yang	c774bcbd2d	Merge device_context test=develop	6 years ago
sneaxiy	faac8a76ce	remove unnecessary codes test=develop	6 years ago
sneaxiy	7ff320f8cc	merge develop	6 years ago
Yu Yang	90d9e5aee8	feat(platform): lazy initialization of devicecontext in pool (#14067 ) * feat(platform): lazy initialization of devicecontext in pool Use std::async(deferer, []{...}) to lazy initialize DeviceContext in Pool test=develop * Add future includes test=develop	6 years ago
Sylwester Fraczek	2098b42584	review fixes (Teamcity fails) test=develop	6 years ago
sneaxiy	5be6f762d0	remove_lock_in_some_ops test=develop	6 years ago
Brian Liu	a53e8a8da6	Update MKLDNN integration framework to support Paddle multi-instances Make all blob info saved in global device context to be thread based. Meanwhile save thread id in thread local storage in ParallelDo	6 years ago
chengduozh	82d2903b63	Fix fast ParallelExe bug test=develop	6 years ago
chengduo	2c9839c847	add cuda version display (#13885 ) test=develop	6 years ago
typhoonzero	a4f7696a18	Revert "Some trivial optimization (#13530 )" This reverts commit `1d91a49d2f`.	6 years ago

1 2

79 Commits (8af85922d0facbe1480da7059369f38ffed93f15)