zhaoyuchen2018
b93870e696
Improve topk performance. ( #21087 )
* Improve topk performance.
Computing topk over 200000 elements:
before optimization: 1 s
after optimization: 0.0028 s
* Refine return value.
* Add CUDA util functions.
* Fix ComputeBlockSize bug & refine comments.
Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>
5 years ago
Zeng Jinle
a710ccc0cb
refine error message of allocator again, test=develop ( #21023 )
6 years ago
wangchaochaohu
7695b713e1
gpu info query refine test=develop ( #20904 )
6 years ago
Wilber
751812a674
enable cpu machine to run paddle in gpu lib
enable cpu machine to run paddle model in gpu lib
6 years ago
Zeng Jinle
708bd9798d
move_flags_to_unified_files_for_management, test=develop ( #19224 )
6 years ago
Zeng Jinle
08fa98f7cc
Fix gpu_info PADDLE_ENFORCE_GT when fraction_of_gpu_memory_to_use=1.0 ( #18950 )
* fix gpu_info, test=develop
* fix reserving gpu memory calculation bug, add fraction=1 unittest, test=develop
* fix bug again for reserving size, test=develop
6 years ago
Huihuang Zheng
ea6ee76fa9
GPU allocation uses fraction of available memory ( #18896 )
GPU allocation uses fraction of available memory, also fix the GetUsed without lock
6 years ago
zhouwei25
772e09560e
Optimize the content of error reporting information, print error code and official document web sites ( #18671 )
optimize the error reporting information of cuda related API
index on develop: 130ac17 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into develop
6 years ago
liuwei1031
759530966c
print out error code of cudaGetDeviceProperties if failed ( #18643 )
6 years ago
Huihuang Zheng
e4a5332416
Fix a typo in gpu_info.cc ( #17175 )
test=develop
6 years ago
zhhsplendid
124f1df481
Add flags for init and re-alloc gpu
test=develop
6 years ago
zhhsplendid
22715487dc
add allocator flags
test=develop
6 years ago
Yiqun Liu
3008fa1261
Add the CUDA kernel for beam_search op ( #15020 )
* Refine the beam_search op and test.
* A basic CUDA implementation of beam_search for small batch_size.
* Implement CUDA kernel for beam_search_op.
* Use multiple CUDA threads in the same block to select the top beam.
* Update the python api of beam_search op.
* Enable extend function in CPU kernel of beam_search op.
* Unify the CUDA codes.
test=develop
* Unify the CPU kernel of beam_search op.
* Ensure the selected items of beam_search_op's CPU kernel are sorted by scores.
* Update the description of beam_search in API.spec.
* Enable the use of CUDA kernel in beam_search op.
* Exclude the beam_search's CUDA unittest when there is no CUDA GPU, and delete some debugging statements.
test=develop
* Follow comments.
test=develop
* Call the CPU kernel for beam_search op when batch_size > 4.
test=develop
* Remove the exception for is_empty op in PrepareData.
test=develop
6 years ago
sneaxiy
9c360cc798
test=develop
6 years ago
sneaxiy
51227bd447
lazy_allocator
test=develop
6 years ago
Wu Yi
29d9fb53fc
[Feature] multi process multi gpu dist training, boost v100 performance by 20% ( #14661 )
* wip multi process multi gpu dist training
* workable for p2p
* update test=develop
* change back env name test=develop
* fix alloc init
* fix cpu build test=devlop
* fix mac tests test=develop
* refine code
* refine test=develop
6 years ago
minqiyang
a02ce58f2c
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into revert_vlog
test=develop
6 years ago
peizhilin
38715e6fd0
minor fix
6 years ago
minqiyang
be04d99fe4
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into revert_vlog
test=develop
6 years ago
minqiyang
53433d7f2e
Revert the changes of VLOG
test=develop
6 years ago
peizhilin
36cd18b549
Merge remote-tracking branch 'upstream/develop' into windows/build
6 years ago
peizhilin
b2f8d4183d
Give a different fraction_of_gpu_memory_to_use depending on platform
6 years ago
chengduo
00b9e9a135
Refine cublas to support CUBLAS_TENSOR_OP_MATH ( #13929 )
* refine cublas
test=develop
* code refine
* refine cublas
* add GEMM_EX
* add enable_cublas_tensor_op_math doc and add cublasCall
test=develop
* fix CublasCall for cuda version
test=develop
* fix error
test=develop
* fix GEMM_EX to be compatible with gcc 4.8
test=develop
* add GEMM_EX
test=develop
* make compatible with gcc 4.8
test=develop
6 years ago
peizhilin
7c8c9dc9bf
fix unit test cases
6 years ago
minqiyang
0c3227a523
Change the original VLOG levels to 10 times their value
Fix code to support cpplint syntax check
test=develop
7 years ago
chengduo
2c9839c847
add cuda version display ( #13885 )
test=develop
7 years ago
Xin Pan
ab798a2832
clarify the fraction_of_gpu_memory flag
test=develop
7 years ago
typhoonzero
a4f7696a18
Revert "Some trivial optimization ( #13530 )"
This reverts commit 1d91a49d2f.
7 years ago
chengduo
1d91a49d2f
Some trivial optimization ( #13530 )
* some trivial opt
* remove the fix of lod_tensor and shrink_rnn_memory_op
* refine ShrinkRNNMemoryOp
test=develop
7 years ago
chenweihang
da39d84a48
refine by reviewer's advice
7 years ago
chenweihang
61052cdbc6
polish high frequency enforce error message
7 years ago
fengjiayi
9f11da5931
Add synchronous TensorCopy and use it in double buffer
7 years ago
Yi Wang
0c43a376e2
Fix cpplint errors with paddle/fluid/platform/gpu_info.* ( #9710 )
* Fix cpplint errors with paddle/fluid/platform/gpu_info.*
* Update
7 years ago
Kexin Zhao
1998d5afa2
add gpu info func to get compute cap
7 years ago
chengduoZH
00e596edbe
get max threads of GPU
7 years ago
qingqing01
24509f4af9
Fix the grammar in copyright. ( #8403 )
7 years ago
Yi Wang
fc374821dd
Correct #include path
7 years ago
Yi Wang
90648f336d
Move file to fluid/; Edit CMakeLists.txt
7 years ago