Yan Chunwei
7796f65f89
fix inference on gpu out of mem ( #14414 )
...
* fix inference on gpu out of mem
the transfer logic in operator.cc will keep creating new scopes.
6 years ago
tensor-tang
94ab65d591
disable avx2 and avx512 flag
...
test=develop
6 years ago
dzhwinter
b9fcf8e677
"configure" ( #13539 )
7 years ago
dzhwinter
dbd7896678
cmakelist windows ( #12927 )
...
* picked pr
* "fix ci"
7 years ago
minqiyang
d214dff13c
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into fix_anakin_in_manylinux1
7 years ago
minqiyang
8f8560744a
Reorder the cmake import and add CUDNN_INCLUDE_DIR into anakin cmake module
7 years ago
Yihua Xu
084d4a9e9e
Optimize CRF Decoding with AVX/AVX2/AVX512F instruction ( #12767 )
...
* Optimize CRF decoding with AVX/AVX2 instruction
* Enable the AVX2 flags for compiling
* Clean the code and decrease the count of multiply calculation
* Add the support of AVX512 instruction to optimize CRF Decoding
* Clean the code
* Enable the AVX512f flags for compiling
* Clean the code for the invaluable switch
* Fixed the issue to check AVX512F status
* Clean the code
* Add some explanation of the key points
7 years ago
dzhwinter
00463fdfe3
cudnn windows support ( #12757 )
...
* cudnn windows
* "add comment"
* "windows support"
* "fix cmake error"
7 years ago
minqiyang
bfe8d6fa66
Polish code
7 years ago
minqiyang
e10e0d4a5b
Fix anakin build problem in manylinux1 docker image
7 years ago
luotao1
413bf9d494
disable anakin when cuda < 8.0 or cudnn < 7.0
7 years ago
dzhwinter
d7873e1412
remove patchelf in windows ( #12710 )
...
* remove patchelf in windows
* "follow comment"
7 years ago
luotao1
3373535b21
fix specific cudnn include and library path
7 years ago
Luo Tao
e8aa6d1283
add anakin compiler from github source code
7 years ago
gongweibao
66c91911cf
Improve brpccmake ( #11842 )
7 years ago
Wu Yi
34865f2de3
Trainer send term signal ( #11220 )
...
* wip
* use executor.complete to end trainer
* fix build
* fix build with distribute off
* fix typo
* fix cmake typo
* fix build
7 years ago
gongweibao
d9de6b8621
Add brpc support. ( #11263 )
7 years ago
Luo Tao
e116129f03
rewrite unittest of trt_activation_op
7 years ago
Houjiang Chen
83f4e9e9a6
enable eigen multi-threads on mobile device ( #10938 )
7 years ago
Luo Tao
d4682247e1
auto find tensorrt library
7 years ago
sabreshao
45c988d86a
Demonstration of cmake refine for HIP support.
...
1. Add option WITH_AMD_GPU.
2. Add cmake/hip.cmake for HIP toolchain.
3. Some external modules such as eigen may need a HIP port.
4. Add macro hip_library/hip_binary/hip_test to cmake/generic.cmake.
5. Add one HIP source concat.hip.cu as an example. Each .cu may have its corresponding .hip.cu.
7 years ago
Yu Yang
22b5c07a7d
Fix the compilation on CUDA 9.1/GCC 5.3
...
* Make CUPTI_LIB_PATH not passing by macro.
* Add missing header
7 years ago
Xin Pan
b9ec24c6e9
Extend current profiler for timeline and more features.
7 years ago
qingqing01
24509f4af9
Fix the grammar in copyright. ( #8403 )
7 years ago
Kexin Zhao
4901184ee9
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into float16
7 years ago
Kexin Zhao
41bd1f9115
fix gpu test, clean code and add cmake
7 years ago
tensor-tang
8496eab45a
make mklml necessary when with_mkldnn
7 years ago
Yu Yang
94e741d6f0
Use external project for NCCL ( #5028 )
8 years ago
Yu Yang
43c6ff212e
Feature/nccl dso ( #5001 )
...
* "add nccl enforce"
* Dev
* Update comment
* Add nccl test
* Follow comments
8 years ago
Yan Chunwei
843ed8e320
dynamic recurrent op forward c++ implementation ( #4597 )
8 years ago
Yi Wang
f985700abf
Resolve conflict
8 years ago
Yu Yang
84500f9487
Change `PADDLE_ONLY_CPU` to `PADDLE_WITH_GPU`
...
By shell command
```bash
sed -i 's#ifdef PADDLE_ONLY_CPU#ifndef PADDLE_WITH_GPU#g' `find ./paddle/ -name '*.h' -o -name '*.cc' -o -name '*.cpp' -o -name '*.c' -o -name '*.cu'`
sed -i 's#ifndef PADDLE_ONLY_CPU#ifdef PADDLE_WITH_GPU#g' `find ./paddle/ -name '*.h' -o -name '*.cc' -o -name '*.cpp' -o -name '*.c' -o -name '*.cu'`
```
8 years ago
Yi Wang
5f51d0afc4
Add -D PADDLE_WITH_CUDA in cmake/configure.cmake
8 years ago
hedaoyuan
adcca2cc06
Add PADDLE_USE_EIGEN_FOR_BLAS macro.
8 years ago
liaogang
7a56d46a8a
Rename PROJ_ROOT to PADDLE_SOURCE_DIR and PROJ_BINARY_ROOT to PADDLE_BINARY_DIR
8 years ago
tensor-tang
e71976f221
remove global linker and exe from mkldnn iomp
8 years ago
tensor-tang
577bb4e346
rename mkllite to mklml
8 years ago
tensor-tang
f490d94210
separate MKL_LITE from MKLDNN
8 years ago
tensor-tang
89a4158038
enable MKLDNN library and MKL small package
8 years ago
Helin Wang
fbfbe93a78
cmake: do not run glide install every time.
8 years ago
yi.wu
9c853c269d
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into cmake_go_vendor
8 years ago
Helin Wang
38790c1c21
fix according to comment
8 years ago
qiaolongfei
9e13b68f01
refine code
8 years ago
qiaolongfei
d9aac1e13d
add WITH_Go to disable compile go to paddle
8 years ago
Liu Yiqun
ea5b265fdb
Fix the setting of simd flags in cmake when there is no avx and sse3 support.
8 years ago
Liu Yiqun
1b8564206f
Use native cross-compiling support for Android of cmake.
8 years ago
Liu Yiqun
38fa74edaa
Fix cmake error of failing to find UINT64_MAX.
8 years ago
Liu Yiqun
ccd3d0a42b
Modify cmake for cross-compiling on arm architecture.
8 years ago
liaogang
c24e94c8a4
Check python if system already equipped one
8 years ago
liaogang
9e7f2b8de8
Add system configure
8 years ago