90 Commits (756a7d9d506780c8d395d12be1b798a073049f55)

Author SHA1 Message Date
Yi Wang 535646cf25 Update (#9717)
7 years ago
Yi Wang e185502ebe Fix cpplint errors with paddle/fluid/platform/dynload (#9715)
7 years ago
Yi Wang 0c43a376e2 Fix cpplint errors with paddle/fluid/platform/gpu_info.* (#9710)
7 years ago
Yi Wang 55ffceaadb Fix cpplint errors paddle/fluid/platform/place.* (#9711)
7 years ago
Yi Wang 809962625f Fix cpplint errors of enforce.* (#9706)
7 years ago
Yi Wang ef4ee22668 Fix cpplint errors with paddle/fluid/platform/cpu_info* (#9708)
7 years ago
Kexin Zhao b2a1c9e8b7 Add float16 support to non-cudnn softmax op on GPU (#9686)
7 years ago
Yi Wang 797a7184ac Unify Fluid code to Google C++ style (#9685)
7 years ago
Kexin Zhao d00bd9eb72 Update the cuda API and enable tensor core for GEMM (#9622)
7 years ago
Lei Wang 09b4a1a361 Build: generate all the build related files into one directory. (#9512)
7 years ago
Kexin Zhao d904b3dd1d Merge pull request #9623 from kexinzhao/enable_cudnn_tensor_core
7 years ago
Kexin Zhao 9ba36604d8 fix cpplint error
7 years ago
Kexin Zhao 187ba08789 enable tensor core for conv cudnn
7 years ago
chengduoZH e099b18045 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/add_CUDAPinnedPlace
7 years ago
chengduoZH 2514d70ea7 follow comments
7 years ago
Luo Tao 5baa529e0e fix compiler error of profiler_test in ONLY_CPU mode
7 years ago
chengduoZH 58a9f9f781 set the max size of cudapinned memory
7 years ago
Yu Yang 7dcb217e31 Refine allreduce op
7 years ago
Yu Yang c0c2e15920 NCCL AllReduce
7 years ago
chengduoZH ab601c19c3 Add CUDAPinnedPlace
7 years ago
chengduoZH 158d6c4d19 add unit test
7 years ago
chengduoZH 18eb77303d add CUDAPinnedPlace
7 years ago
Yu Yang 50e7e25db3 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into cpp_parallel_executor
7 years ago
Darcy 8090eb6272 added proto_desc to device_tracer's dep list (#9342)
7 years ago
Yu Yang 1d8fe2a220 Enhance device context pool (#9293)
7 years ago
Yu Yang 5c333e4143 Add dctor for dev_ctx
7 years ago
Yu Yang fe7ed285d1 Extract NCCLCtxMap
7 years ago
Kexin Zhao ed2bc194c5 Merge pull request #9176 from kexinzhao/batch_norm_fp16
7 years ago
Yu Yang 6ebc6bf533 ReorganizeCode
7 years ago
Yu Yang 41ad632341 Add NCCL Group Guard
7 years ago
Yu Yang 99fe83a020 Move nccl helper
7 years ago
Yu Yang a0494f8e55 Mutex lock wait
7 years ago
Kexin Zhao d307b5e4a6 Merge remote-tracking branch 'upstream/develop' into elementwise_add_fp16
7 years ago
Kexin Zhao 182da95317 small fix
7 years ago
Kexin Zhao f2bbbb2b66 fix arithmetic operator
7 years ago
Kexin Zhao 18d616ed70 add float16 arithmetic operators on new GPU
7 years ago
Yu Yang 3aa7051b98 Remove DevCtx lock
7 years ago
Yu Yang d3e55fde03 Guard devctx
7 years ago
Yu Yang 0023c3bcf5 Use atomic bool
7 years ago
Kexin Zhao 446d54f5c3 update
7 years ago
Kexin Zhao ffa22a5f90 fix scaling param type
7 years ago
Kexin Zhao e870947cfd fix batch norm fp16 param type
7 years ago
Yu Yang 5e87cd7574 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into cpp_parallel_executor
7 years ago
qiaolongfei a39c861530 rm unused private field in profiler
7 years ago
Kexin Zhao a13ec3432a fix test error
7 years ago
Kexin Zhao e4de5dc347 add conv2d fp16 support
7 years ago
Xin Pan d284cf88e5 Merge pull request #9037 from panyx0718/develop
7 years ago
dzhwinter 128adf53cb [Speed]implement cudnn sequence softmax cudnn (#8978)
7 years ago
Yu Yang baef1124fb ParallelExecutor And dependency engine
7 years ago
Xin Pan 4840c49b27 Better timeline
7 years ago