Commit Graph

46 Commits (162b4d6c13f6f38a234423bc984fb41710796475)

Author SHA1 Message Date
Leo Chen      bbc84e0fe0  Refine error msg in paddle/fluid/framework/details [part 1] (#25631)  4 years ago
Chen Weihang  d1062d5278  Replace all errors thrown by LOG(FATAL) with PADDLE_THROW (#24759)  5 years ago
Chen Weihang  aa0f254fbe  Add macro BOOST_GET to enrich the error information of boost :: get (#24175)  5 years ago
Wilber        a90fa54092  Compile without nccl deps. [1/2] (#22509)  5 years ago
Wilber        7bc4b09500  add WITH_NCCL option for cmake. (#22384)  5 years ago
WangXi        17299b8d21  fix batch_norm_grad infer shape=0 & add allreduce enforce shape, test=develop (#21801)  5 years ago
WangXi        507afa8a8a  Fix dgc nan by stripping nccl from sparseReduce. (#20630)  5 years ago
chengduo      5866a7a5fe  Enable fused_all_reduce_op_handle support GPU and CPU Gradients (#19418)  6 years ago
Zeng Jinle    708bd9798d  move_flags_to_unified_files_for_management, test=develop (#19224)  6 years ago
gongweibao    29d8781240  Polish fleet API to support cuda collective mode and nccl2 mode. (#18966)  6 years ago
Zeng Jinle    d3003a1620  Feature/buffer_shared_inplace (#17911)  6 years ago
gongweibao    f5caf3443c  Fix reinitialized ncclid error! (#18025)  6 years ago
gongweibao    fbbdc9ccad  Add backward and optimizer operator dependency pass. (#17746)  6 years ago
gongweibao    65bbf950ee  Add multi-ncclcomm and 2D ncclallreduce support. (#17263)  6 years ago
gongweibao    cbdb8a17b1  Polish DGC code (#16818)  6 years ago
gongweibao    8b793d0efd  Fix DGC bug. (#16697)  6 years ago
chengduo      ea2a2f778a  Fix the bug of AllReduceDepPass (#16393)  6 years ago
gongweibao    eb83abeac3  Add DGC(Deep Gradient Compression) interface. (#15841)  6 years ago
chengduo      a6a3b2fbbc  [Speed]Refine ParallelExecutor (#16190)  6 years ago
Dun           a83e470405  Profiler refine and add CUDA runtime api tracer (#15301)  6 years ago
gongweibao    7cd4dd7ce4  Hide varhandle members. (#15382)  6 years ago
Yancey1989    4ad9de74dd  disable sync nccl by default test=develop  6 years ago
Yancey1989    0a885ac12a  Merge branch 'develop' of github.com:PaddlePaddle/Paddle into parallel_graph_mode  6 years ago
Yancey1989    845bfd5807  cleanup code  6 years ago
peizhilin     1e7f83e60a  add cuda dso support for windows  6 years ago
Yancey1989    41a64f6a2a  Merge branch 'develop' of github.com:PaddlePaddle/Paddle into parallel_graph_mode  6 years ago
Yancey1989    06936a2ff5  fix 1gpu test=develop  6 years ago
Yancey1989    d3a4da5cf6  fix comment test=develop  6 years ago
Yancey1989    49870f507d  delete unused code test=develop  6 years ago
Yancey1989    4a4ccac1d0  update by comment test=develop  6 years ago
Yu Yang       9bd70a1e04  Change tensor uses proto::VarType::type  6 years ago
Yancey1989    47740ace28  fix performance  6 years ago
Yancey1989    cb8a24be14  clean code  6 years ago
Yancey1989    c9de6f1b05  init parallel graph mode  6 years ago
Wu Yi         29d9fb53fc  [Feature] multi process multi gpu dist training, boost v100 performance by 20% (#14661)  6 years ago
peizhilin     7c8c9dc9bf  fix unit test cases  6 years ago
chengduo      ed087f8232  refine op_handle (#14178)  6 years ago
Yancey1989    1e1b6622fd  update by comment  7 years ago
Yancey1989    5ce1a960a5  move bcast op into pass  7 years ago
Xin Pan       caf10b474f  make profiler use thread_id from g_thread_id  7 years ago
Xin Pan       37e514432b  op compose node and update nodes.  7 years ago
chengduoZH    7b723839ef  Add cpu test for parallel_executor_crf executor_fetch_feed, and enable these tests  7 years ago
chengduoZH    d24e046c1e  fix allReduce bug  7 years ago
chengduoZH    a57e8a4338  add cpu test  7 years ago
chengduoZH    495368c243  ADD CPU_NUM  7 years ago
chengduoZH    27073c284d  nccl_all_reduce_op_handle => all_reduce_op_handle  7 years ago