qingqing01
58db07b7bb
Check errors for the cuda kernel calls. ( #5436 )
8 years ago
QI JUN
afd1e844fd
remove unused code ( #5219 )
* remove unused code
* fix cmake file
* fix build error
8 years ago
Dong Zhihong
16a39d24f3
fix conflict
8 years ago
Qiao Longfei
56b723c40d
Cudnn batch norm op ( #5067 )
* init cudnn batch norm op
* rename batch_norm_cudnn_op.cc batch_norm_op.cu
* correct name style
* add ExtractNCWHD, simplify code
* fix ExtractNCWHD
* use CUDNN_ENFORCE instead of PADDLE_ENFORCE
8 years ago
Dong Zhihong
0990c87bf6
checkin nccl operator
8 years ago
Yu Yang
94e741d6f0
Use external project for NCCL ( #5028 )
8 years ago
Yu Yang
43c6ff212e
Feature/nccl dso ( #5001 )
* "add nccl enforce"
* Dev
* Update comment
* Add nccl test
* Follow comments
8 years ago
Markus Kliegl
164898277c
MatMul operator ( #4856 )
* initial matmul operator
Similar to np.matmul, but also has transpose_X and transpose_Y flags,
and only supports tensors from rank 1 to 3 inclusive.
For GPU, uses cublas?gemmStridedBatched. For CPU, uses
cblas_?gemm_batch if available via MKL; otherwise a simple serial
implementation that loops over the batch dimension is employed for now.
8 years ago
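The serial CPU fallback described in the MatMul commit above can be sketched as follows. This is a minimal illustration, not the operator's actual code: it covers only the rank-3 case, uses plain nested lists, and the names `batched_matmul`, `transpose_x`, and `transpose_y` are hypothetical stand-ins for the operator and its transpose_X / transpose_Y flags.

```python
def _transpose(m):
    """Return the transpose of a 2-D nested list."""
    return [list(col) for col in zip(*m)]


def _matmul_2d(a, b):
    """Multiply two 2-D nested lists, checking the inner dimensions."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def batched_matmul(x, y, transpose_x=False, transpose_y=False):
    """Serial batched matmul over rank-3 inputs shaped [batch, rows, cols].

    Assumption: both inputs carry the same batch size; the rank-1 and
    rank-2 cases mentioned in the commit message are not handled here.
    """
    if len(x) != len(y):
        raise ValueError("batch sizes do not match")
    out = []
    # Loop over the batch dimension one matrix pair at a time.
    for xb, yb in zip(x, y):
        if transpose_x:
            xb = _transpose(xb)
        if transpose_y:
            yb = _transpose(yb)
        out.append(_matmul_2d(xb, yb))
    return out
```

A batched GEMM routine would replace the Python loop with a single library call; the loop form only shows what result each batch entry should hold.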
武毅
a3ccbdb3b6
Cudnn conv op ( #4195 )
* add cudnn_conv_op
* WIP
* update
* update
* fix grad check
* use platform::memory
* add support group for cudnn
* update
* follow comments
* fix onlycpu build
* update cuda define
* follow comments
* follow comments
* merge with updates
* fix compile error
* follow comments
* follow comments
8 years ago
Yang Yang(Tony)
c3bf332666
Merge pull request #4537 from QiJune/executor_impl
Executor interface design and implementation
8 years ago
Luo Tao
871a3f6e76
remove unused PADDLE_ONLY_CPU comment
8 years ago
Yang Yang
e51557130e
clean up for review
8 years ago
qijun
1f5192a27b
fix executor gpu unittest
8 years ago
qijun
39f75a13a4
Merge remote-tracking branch 'baidu/develop' into executor_impl
8 years ago
Yi Wang
880b874b47
Merge branch 'develop' of https://github.com/paddlepaddle/paddle into paddle_only_cpu
8 years ago
Yi Wang
2b204f048b
Rename platform::GetDeviceCount into platform::GetCUDADeviceCount
8 years ago
qijun
e02cc571cf
Merge remote-tracking branch 'baidu/develop' into executor_impl
8 years ago
qijun
fe10e86dd5
fix gpu build error
8 years ago
Yi Wang
4558807c48
Use PADDLE_WITH_CUDA instead of PADDLE_WITH_GPU
8 years ago
Yu Yang
84500f9487
Change `PADDLE_ONLY_CPU` to `PADDLE_WITH_GPU`
Done with the following shell commands:
```bash
sed -i 's#ifdef PADDLE_ONLY_CPU#ifndef PADDLE_WITH_GPU#g' `find ./paddle/ -name '*.h' -o -name '*.cc' -o -name '*.cpp' -o -name '*.c' -o -name '*.cu'`
sed -i 's#ifndef PADDLE_ONLY_CPU#ifdef PADDLE_WITH_GPU#g' `find ./paddle/ -name '*.h' -o -name '*.cc' -o -name '*.cpp' -o -name '*.c' -o -name '*.cu'`
```
8 years ago
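The effect of the two `sed` substitutions in the commit above can be sketched on a single string. This is an illustrative sketch only: `invert_guards` and `RULES` are hypothetical names, and guards written in other forms (for example `#if defined(...)`) are left untouched, just as they are by the original commands.

```python
# Each pair mirrors one sed substitution: the macro is renamed and the
# sense of the guard is flipped, because "only CPU" is the opposite of
# "with GPU".
RULES = [
    ("ifdef PADDLE_ONLY_CPU", "ifndef PADDLE_WITH_GPU"),
    ("ifndef PADDLE_ONLY_CPU", "ifdef PADDLE_WITH_GPU"),
]


def invert_guards(source: str) -> str:
    """Apply both guard rewrites to the text of one source file."""
    for old, new in RULES:
        source = source.replace(old, new)
    return source
```

The order of the two rules does not matter here: "ifndef" does not contain "ifdef" as a substring, and text rewritten by the first rule no longer mentions `PADDLE_ONLY_CPU`, so the second rule cannot match it.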
qijun
cb198fa7b6
merge baidu/develop
8 years ago
qijun
395051512d
remove device context manager
8 years ago
qijun
6c4d1f551d
refine codes
8 years ago
qijun
023ed5eb39
merge baidu/develop
8 years ago
qijun
b5dbe88b5a
follow comments
8 years ago
dzhwinter
8acc010691
Merge branch 'develop' into macro
8 years ago
dongzhihong
5423cb3e57
format
8 years ago
Yu Yang
8fd845e0fa
Unify Map in OpDescBind
8 years ago
chengduoZH
df59889984
remove conflict
8 years ago
qijun
b611a479fc
fix gpu build error
8 years ago
qijun
7a6fcc7d30
move EigenDeviceConverter to device_context.h
8 years ago
Yu Yang
f2feb33384
Follow comments
8 years ago
Yu Yang
3a5693e0a8
Add Skeleton of Double support
8 years ago
chengduoZH
3c0f079333
remove conflict and fix InferShape function
8 years ago
Yu Yang
bc30ba19ed
Merge pull request #4375 from reyoung/feature/use_bool_for_enforce
Use `bool` for PADDLE_ENFORCE, not int
8 years ago
chengduoZH
30a586df0c
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into Add_pool_op
8 years ago
Qiao Longfei
d0ad82cff1
fix nv_library ( #4370 )
* fix nv_library
* fix symbol in gpu_info.h
8 years ago
Yu Yang
699dbe3be9
Use `bool` for PADDLE_ENFORCE, not int
* If stat is an integer, a bool value will be implicitly cast to int before
being passed to PADDLE_ENFORCE
8 years ago
Yu Yang
ba1f5b5c58
Sync computation when Python invokes `run`
* The GPU is an asynchronous device by default, so computation should be
synchronized when Python invokes `run`. That way Python gets the correct
computation result.
8 years ago
chengduoZH
0417e4e4bf
fix framework::LoDTensor => Tensor
8 years ago
dangqingqing
41a2321a0e
Refine platform::Transform function and fix prelu_op testing.
8 years ago
Yu Yang
87e4e25db1
Change Transform API
Use DeviceContext, not Place, to get the stream
8 years ago
Yu Yang
847fe47310
Merge branch 'develop' of github.com:baidu/Paddle into feature/remove_lazy_init_in_dev_ctx
8 years ago
Yu Yang
81d56ca86b
Remove lazy-initialization in device_context
* Also use `const DeviceContext&` all the time, to prevent `const_cast`
Fix #4169
Fix #3468
Fix #3475
8 years ago
武毅
8580dce308
Refine accuracy_op CUDA kernel ( #4097 )
* refine accuracy_op
* follow comments
* follow comments
8 years ago
Yu Yang
9d3b920d75
Merge pull request #3981 from reyoung/feature/transform_api
Host and device transform API
8 years ago
liaogang
59d661b9a9
Fix failing enforce test
Note (from the dladdr documentation for dli_sname): If no symbol with a suitable value is found, both this field and dli_saddr shall be set to NULL.
8 years ago
Yu Yang
f8c6792aa3
Extract DevPtrCast to device_ptr_cast.h
8 years ago
Yu Yang
54d88d4472
Merge branch 'develop' of github.com:baidu/Paddle into feature/transform_api
8 years ago
Yu Yang
6fbf097bcc
Mark thrust::device_ptr in transform
Fix TravisCI
8 years ago