130 Commits (6418c42148ef96b9040c978dd901acbd316f7cda)

Author SHA1 Message Date
chengduoZH d36af62c1e wrap_shfl_x_sync
7 years ago
chengduo 54797abd53 Merge pull request #10347 from chengduoZH/replace___shfl_with__shfl_sync
7 years ago
chengduoZH e97c1a8ca0 fix __shfl
7 years ago
chengduoZH 0cc635497c merge develop
7 years ago
Yiqun Liu 6084af47ef Fix the bug when a input variable of op is dispensable. (#10268)
7 years ago
chengduo 4fbde42cdf Fix __shfl_down_sync_ of cross_entropy (#10345)
7 years ago
chengduoZH b8f7fa97b6 replace __shfl with __shfl_sync
7 years ago
chengduoZH 90d73c79c3 fix shfl_sync for CUDA8.0
7 years ago
dzhwinter eb6f9dd5de Feature/cuda9 cudnn7 (#10140)
7 years ago
Yu Yang c02ba51de0 Merge pull request #10191 from reyoung/feature/strict_dynload
7 years ago
Yu Yang 3d53631bad Make dyload strictly use the same ABI in header
7 years ago
gongweibao 6171705a2c Potential bug in paddle/fluid/platform/CMakeLists.txt (#9723)
7 years ago
Tao Luo 44fa823841 Merge pull request #9949 from mozga-intel/mozga-intel/Mul_mkldnn
7 years ago
fengjiayi 9f11da5931 Add synchronous TensorCopy and use it in double buffer
7 years ago
mozga-intel 171471eada Merge branch 'develop' into mozga-intel/Mul_mkldnn
7 years ago
Yu Yang c3c7b7bd1b Merge pull request #9928 from reyoung/feature/stablize_code
7 years ago
mozga-intel 6e7b883bdd Initial implementation of multiplication operator for MKLDNN
7 years ago
Tao Luo 038dbb386e Merge pull request #9958 from luotao1/find_tensorrt
7 years ago
Kexin Zhao 64bf3df0f9 add print support to float16 (#9960)
7 years ago
Luo Tao d4682247e1 auto find tensorrt library
7 years ago
Yan Chunwei 186659798f add tensorrt build support(#9891)
7 years ago
Yu Yang 093d227a77 Use mutex to stablize ncclCtxMap
7 years ago
Yi Wang 630943c7a7 Update documentation (#9918)
7 years ago
Yi Wang b48cf1712b Fix cpplint errors in transform_test.cu (#9915)
7 years ago
Yi Wang 47609ab2b8 Document transform.h and fix cpplint errors (#9913)
7 years ago
Yu Yang 6b20b35589 Fix Transformer Hang Problem
7 years ago
Yu Yang c64190ecbb Polish NCCLHelper
7 years ago
Yu Yang 7483555a81 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/change_int64
7 years ago
qingqing01 129859e732 Support data type int64 in NCCL. (#9818)
7 years ago
Kexin Zhao 7ed457e77a Fix cuda 7.5 error with cublas GEMM (#9811)
7 years ago
Yu Yang 40e3fe173c Make cuda_helper.h Pass cpplint
7 years ago
chengduo b1224da8d9 Move reduceSum to elementwise_op_function.h (#9773)
7 years ago
Kexin Zhao 0f38bb4593 add fp16 support to activation op (#9769)
7 years ago
Yi Wang 8dbd9c394e Fix part of the cpplint errors in fluid/platform (#9802)
7 years ago
qingqing01 add367c3f4 Code cleanup in the profiler code. (#9782)
7 years ago
Yi Wang 47a4ec0672 Remove call_once.h (#9764)
7 years ago
Yi Wang b1a5a3cab8 Fix cpplint errors with float16* (#9751)
7 years ago
Yi Wang 25ad6884bb Merge branch 'develop' of http://github.com/paddlepaddle/paddle into cpplint-memory-detail
7 years ago
Yi Wang 67ba884d2a Update CMakeLists
7 years ago
Yi Wang 478055bd9f Update CMakeLists.txt
7 years ago
Yi Wang 535646cf25 Update (#9717)
7 years ago
Yi Wang e185502ebe Fix cpplint errors with paddle/fluid/platform/dynload (#9715)
7 years ago
Yi Wang 0c43a376e2 Fix cpplint errors with paddle/fluid/platform/gpu_info.* (#9710)
7 years ago
Yi Wang 55ffceaadb Fix cpplint errors paddle/fluid/platform/place.* (#9711)
7 years ago
Yi Wang 809962625f Fix cpplint errors of enforce.* (#9706)
7 years ago
Yi Wang ef4ee22668 Fix cpplint errors with paddle/fluid/platform/cpu_info* (#9708)
7 years ago
Kexin Zhao b2a1c9e8b7 Add float16 support to non-cudnn softmax op on GPU (#9686)
7 years ago
Yi Wang 797a7184ac Unify Fluid code to Google C++ style (#9685)
7 years ago
Kexin Zhao d00bd9eb72 Update the cuda API and enable tensor core for GEMM (#9622)
7 years ago
Lei Wang 09b4a1a361 Build: generate all the build related files into one directory. (#9512)
7 years ago