qingqing01
45073b7c39
Always synchronize when copy data on GPU from C++ to Numpy array. ( #9110 )
7 years ago
Yu Yang
35744e7b36
Polish code
7 years ago
Xin Pan
d284cf88e5
Merge pull request #9037 from panyx0718/develop
...
Better timeline
7 years ago
Yu Yang
ae88fdefb7
Use thread pool
7 years ago
dzhwinter
128adf53cb
[Speed]implement cudnn sequence softmax cudnn ( #8978 )
...
* "add softmax cudnn functor support"
* "add testing"
* "refine cmakelist"
* "sequence softmax forward speed up"
* "add softmax grad"
* "fix sequence softmax test"
* "add double precision"
* "fix softmax test"
* "add softmax cudnn support"
* "fix softmax cudnn test"
* "add softmax to nn.py"
* "fix compile bug"
* "refine cmakelist"
* "fix ci"
* "fix based on comment"
* "fix based on comments"
* "fix ci"
7 years ago
Kexin Zhao
e26f1123da
Add fp16 mul op support and bind paddle fp16 to numpy fp16 ( #9017 )
...
* add fp16 mul op support
* small fix
* fix bug
* small fix
* fix PADDLE_WITH_CUDA compiling issue
* reorg code
* test for pybind
* treat float16 as uint16_t in pybind
* bind np.float16 to paddle float16
* small fix
* clean code
* remove redundancy
* fix mul_op test
* address comments
* small fix
* add is_float16_supported func
7 years ago
dzhwinter
7140071152
"exported scatter to python" ( #9038 )
...
* "exported scatter to python"
* Revert ""exported scatter to python""
This reverts commit 38745a626c3f937bec836c92c98a76deadf0a03d.
* "polish scatter and export to python"
7 years ago
Tao Luo
cf2addd21f
Merge pull request #9067 from luotao1/with_fluid
...
enable WITH_FLUID option
7 years ago
chengduo
11c43e5da3
Merge pull request #9072 from chengduoZH/feature/refine_parallel_do
...
Refine parallel_do_grad
7 years ago
Abhinav Arora
41894da145
Add changes to channel that are needed for select op ( #9084 )
7 years ago
Yu Yang
692a0f7425
Better name
7 years ago
Yu Yang
baef1124fb
ParallelExecutor And dependency engine
7 years ago
Yibing Liu
90afbd2856
Move back operator's event to RunImpl()
7 years ago
Xin Pan
4840c49b27
Better timeline
7 years ago
chengduoZH
ef28e7deba
refine parallel_do_grad
7 years ago
Luo Tao
76e1c6af9f
enable WITH_FLUID option
7 years ago
Yu Yang
48f213e5a1
Merge pull request #8991 from reyoung/feature/shuffle_reader
...
Feature/shuffle reader
7 years ago
Cao Ying
881c5227ab
Merge pull request #8843 from zhouhanqing/Paddle-ReduceProd
...
Add product reduction for reduce op.
7 years ago
武毅
d13ce35875
Feature/send recv can now retry ( #9027 )
7 years ago
dzhwinter
14fe40aaa6
Refine/nccl ( #9009 )
...
* "Refine nccl op"
* "refine code "
* "refine nccl code"
7 years ago
chengduo
788c600e9d
Merge pull request #8932 from chengduoZH/feature/add_concat_rows
...
Enhance look_up_table op
7 years ago
Yang Yang
8f061e43b7
delete param name
7 years ago
Yang Yang
0621c327f1
init commit
7 years ago
chengduoZH
92e2207e18
refine doc
7 years ago
Yu Yang
164f2382af
Polish code
7 years ago
chengduoZH
ff09b21cd0
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/add_concat_rows
7 years ago
Yu Yang
e13aec601a
Merge pull request #8830 from reyoung/feature/recordio_file_reader
...
Feature/recordio file reader
7 years ago
Yu Yang
f9974a4a12
Make double_buffer reader async
7 years ago
Yu Yang
a8c076e577
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/shuffle_reader
7 years ago
chengduoZH
b9397b2668
remove concat_rows
7 years ago
QI JUN
7287630e83
Repair nccl op test ( #8575 )
...
* fix nccl op unit test
* fix build error
* format code
* refine nccl related unit test
* fix build error
* add setGPUData
* clean up
* follow comments
* rm test_nccl.cu
* follow comment
* rm wait
7 years ago
Yu Yang
b52ad9de92
Merge pull request #9000 from reyoung/feature/extract_prepare_from_executor_run
...
Extract Prepare from Executor
7 years ago
Tao Luo
b62874429d
Merge pull request #8910 from Xreki/core_inference_profile
...
Refine the profile codes for inference.
7 years ago
Yu Yang
43d09a1c5f
Extract Prepare from Executor
7 years ago
Yu Yang
225efa671f
Remove dims in base class
7 years ago
QI JUN
f7e9fe57d3
[Memory]More memory optimization policy ( #8690 )
...
* add memopt level
* add opt level for image classification demo
* clean code
* add delete op
* clean code
* test machine translation demo
* clean code
* clean code
* skip fill constant with force cpu
* clean code
* clean code
* refine code
* clean code
* fix bug
7 years ago
Yu Yang
2ea4a5d96c
Polish double buffer reader
7 years ago
kexinzhao
607eec30a8
Merge pull request #8946 from kexinzhao/fix_cuda_arch_fp16
...
Add GPU compute capability check for float16 math function test
7 years ago
Yancey
b5ef315cf1
Fix dist compile error ( #8987 )
7 years ago
qingqing01
b3d26cd3ad
Fix bug in detection_output and mAP calculation in SSD. ( #8985 )
...
* Clipping bbox in the mAP evaluator calculation.
* Fix bug in detection_output and mAP calculation in SSD.
* Fix bug in detection.py.
* Fix bug in test_detection_map_op.py.
7 years ago
Yu Yang
46ae4075ee
Polish ShuffleReader and test
7 years ago
Kexin Zhao
c88f58dbd8
add comment
7 years ago
chengduoZH
f1c3ecb2b2
add concat rows
7 years ago
chengduo
685f03762e
Merge pull request #8890 from chengduoZH/feature/fix_bug_of_elementwise
...
Add ElementwiseOpInferVarType for Elementwise_op
7 years ago
Kexin Zhao
3b44b849d3
address comments
7 years ago
fengjiayi
dd1244f3c9
Merge pull request #8943 from JiayiFeng/fix_bugs_in_readers
...
Fix a potential bug in the c++ reader
7 years ago
Yu Yang
7eedced82a
Polish RecordIO
7 years ago
Yu Yang
cfca8a3a26
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/recordio_file_reader
7 years ago
Yu Yang
fea43077f6
Refine
7 years ago
pzelazko-intel
4730a4be24
MKLDNN pool2d OP kernel added ( #8879 )
...
* MKLDNN pool2d OP kernel added
* conv2d and pool2d MKLDNN kernels renamed
* MKLDNN conv2d kernel refactoring
7 years ago
Kexin Zhao
95de7617eb
fix bug
7 years ago
Kexin Zhao
1998d5afa2
add gpu info func to get compute cap
7 years ago
Kexin Zhao
d400b4192d
fix math function arch mismatch for older GPU
7 years ago
fengjiayi
614c33fb3a
fix a potential bug in the c++ reader
7 years ago
chengduoZH
1509ce6638
enhancement look_up_table
7 years ago
fengjiayi
aa3f5058d3
Merge pull request #8841 from JiayiFeng/dev_double_buffer_for_cpp_reader
...
Basic double buffer for cpp reader
7 years ago
QI JUN
b341bac7e1
Refine cast op ( #8923 )
...
* fix mac build error
* override GetExpectedKernelType for cast op
* fix typo
* add cuda unittest
7 years ago
Yancey
8468037918
Fix sparse update memory error for distributed training ( #8837 )
...
Fix sparse update memory error for distributed training
7 years ago
fengjiayi
35e1e0d521
uses channel to replace the traditional buffer
7 years ago
fengjiayi
b3a11fdf3a
Merge branch 'rm_reader_HasNext' into dev_double_buffer_for_cpp_reader
7 years ago
fengjiayi
6e5736e270
fix a compile error
7 years ago
fengjiayi
4e517881f7
remove HasNext
7 years ago
Liu Yiqun
a8e8507767
Refine the profile codes for inference.
7 years ago
武毅
9dd34e4169
update unpushed commits for zerocopy grpc ( #8900 )
7 years ago
zhouhanqing
9d78971d8b
Some comments have been modified.
7 years ago
Xin Pan
b825c79261
Merge pull request #8897 from panyx0718/message
...
Print exception message from threads
7 years ago
zhouhanqing
3ca968441d
Merge branch 'develop' into Paddle-ReduceProd
7 years ago
kexinzhao
90215b7844
Add float16 GEMM math function on GPU ( #8695 )
...
* test cpu float16 data transform
* add isnan etc
* small fix
* fix containsNAN test error
* add data_type transform GPU test
* add float16 GPU example
* fix error
* fix GPU test error
* initial commit
* fix error
* small fix
* add more gemm fp16 tests
* fix error
* add utility function
7 years ago
武毅
45af8c1e99
Performance/zero copy variable serialization ( #8839 )
7 years ago
Xin Pan
9a27d3af23
Print exception message from threads
7 years ago
chengduoZH
53d19f5b1e
Add ElementwiseOpInferVarType
7 years ago
qingqing01
ffda2c414d
Clipping bbox in the mAP evaluator calculation. ( #8872 )
7 years ago
Yiqun Liu
fecc9a38c6
Add test for nested RecordEvent. ( #8773 )
...
* Add test for nested RecordEvent.
* Remove the debug information.
* Add log information for the 3 usages and reduce the loop counts of nested case.
7 years ago
Xin Pan
a9b9ec45ab
Merge pull request #8775 from panyx0718/test2
...
Improve the timeline profiler
7 years ago
Yu Yang
9d4c93a0a7
Fix CI
7 years ago
chengduo
abb10556e8
Merge pull request #8859 from chengduoZH/feature/refine_exe_log
...
Add log before op Run
7 years ago
Yu Yang
b536799af0
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/recordio_file_reader
7 years ago
Xin Pan
30e556d675
Use vlog instead.
7 years ago
Yu Yang
db46778bdd
Polish codes and comments
7 years ago
Yu Yang
5cb79524d2
Fix CI
7 years ago
QI JUN
47ca1814f3
fix mac build error ( #8856 )
7 years ago
chengduoZH
f7c7135673
Add log before op Run
7 years ago
chengduo
f3cdeb9a29
Merge pull request #8820 from chengduoZH/feature/refine_elementwise_
...
[Speed] Refine elementwise sub,div,min,max gradient functor
7 years ago
Xin Pan
eb46845313
Add warning
7 years ago
Yiqun Liu
a032f56f7c
Add profiling information for inference example ( #8748 )
...
* Add profiling information for inference example, recognize digits.
* Refine the profiling method.
* Correct the use of RecordEvent and simplify recognize_digits.
7 years ago
qingqing01
ded34b2c0f
Fix detection_map_op for multi-device. ( #8845 )
7 years ago
kexinzhao
7f00716c87
Add context wait in type_transform ( #8850 )
7 years ago
Tao Luo
6f50dee4d5
compile and install the static library of fluid inference ( #7827 )
...
* compile and install the static library of fluid inference
* fix dynload_cuda not in CPU mode
* update shared library and adjust the deploy of openblas
* adjust the deploy of openblas
* auto add all fluid modules for static library
* use libprotobuf.a instead of libprotobuf-lite.a for profiler
* use set_property to set the global variable instead of ENV
* add gpu depends of fluid modules, auto add inference_lib_dist depends
* change the condition of openblas_lib, and fix a typo
7 years ago
Yu Yang
72be7a6151
Complete RecordIO reader op
7 years ago
fengjiayi
b1f647fd6d
fix errors
7 years ago
zhouhanqing
732eebb286
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into Paddle-ReduceProd
7 years ago
zhouhanqing
15306ffdc3
add product reduction for reduce_op
7 years ago
fengjiayi
e8d21b6349
fix an error
7 years ago
fengjiayi
4fb7b96756
Add basic double buffer reader
7 years ago
Luo Tao
49f3f1db07
add back framework_proto depends
7 years ago
Luo Tao
3ddc997182
rename concat_functor to concat, refine CMakeLists based on comments
7 years ago
Luo Tao
1ef97fa7b1
Merge branch 'develop' into math_function
7 years ago
Yu Yang
bcb80756af
Add Writer/Scanner
...
Make vec<Tensor> serializable to RecordIO
7 years ago
chengduo
84aea8a8a1
Merge pull request #8669 from chengduoZH/feature/concat_op
...
Refine concat_op
7 years ago
pzelazko-intel
8c71adaa8c
MKLDNN conv2d kernel added ( #8451 )
...
* MKLDNN conv2d OP kernel added
* TODOs added
* mkldnn conv2d OP refactor
* CanCUDNNBeUsed and CanMKLDNNBeUsed moved
7 years ago