Paddle

Commit Graph

Author	SHA1	Message	Date
zhaoyuchen2018	b93870e696	Improve topk performance. (#21087) * Improve topk performance. give 200000 data to compute topk, before opt: cost 1s after opt: cost 0.0028s. * Refine return value. * Add cuda util functions. * Fix ComputeBlockSize bug & refine comments. Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>	5 years ago
Zeng Jinle	a710ccc0cb	refine error message of allocator again, test=develop (#21023)	6 years ago
wangchaochaohu	7695b713e1	gpu info query refine test=develop (#20904)	6 years ago
Wilber	751812a674	enable cpu machine to run paddle in gpu lib enable cpu machine to run paddle model in gpu lib	6 years ago
Zeng Jinle	708bd9798d	move_flags_to_unified_files_for_management, test=develop (#19224)	6 years ago
Zeng Jinle	08fa98f7cc	Fix gpu_info PADDLE_ENFORCE_GT when fraction_of_gpu_memory_to_use=1.0 (#18950) * fix gpu_info, test=develop * fix reserving gpu memory calculation bug, add fraction=1 unittest, test=develop * fix bug again for reserving size, test=develop	6 years ago
Huihuang Zheng	ea6ee76fa9	GPU allocation uses fraction of available memory (#18896) GPU allocation uses fraction of available memory, also fix the GetUsed without lock	6 years ago
zhouwei25	772e09560e	Optimize the content of error reporting information, print error code and official document web sites (#18671) optimize the error reporting information of cuda related API index on develop: 130ac17 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into develop	6 years ago
liuwei1031	759530966c	print out error code of cudaGetDeviceProperties if failed (#18643)	6 years ago
Huihuang Zheng	e4a5332416	Fix a typo in gpu_info.cc (#17175) test=develop	6 years ago
zhhsplendid	124f1df481	Add flags for init and re-alloc gpu test=develop	6 years ago
zhhsplendid	22715487dc	add allocator flags test=develop	6 years ago
Yiqun Liu	3008fa1261	Add the CUDA kernel for beam_search op (#15020) * Refine the beam_search op and test. * A basic CUDA implementation of beam_search for small batch_size. * Implement CUDA kernel for beam_search_op. * Use multiple CUDA threads in the same block to select the top beam. * Update the python api of beam_search op. * Enable extend function in CPU kernel of beam_search op. * Unify the CUDA codes. test=develop * Unify the CPU kernel of beam_search op. * Ensure the selected items of beam_search_op's CPU kernel sorted by scores. * Update the description of beam_search in API.spec. * Enable the use of CUDA kernel in beam_search op. * Exclude the beam_search's CUDA unittest when there is no CUDA gpu, and delete some debugging statements. test=develop * Follow comments. test=develop * Call the CPU kernel for beam_search op when batch_size > 4. test=develop * Remove the except of is_empty op in PrepareData. test=develop	6 years ago
sneaxiy	9c360cc798	test=develop	6 years ago
sneaxiy	51227bd447	lazy_allocator test=develop	6 years ago
Wu Yi	29d9fb53fc	[Feature] multi process multi gpu dist training, boost v100 performance by 20% (#14661) * wip multi process multi gpu dist training * workable for p2p * update test=develop * change back env name test=develop * fix alloc init * fix cpu build test=devlop * fix mac tests test=develop * refine code * refine test=develop	6 years ago
minqiyang	a02ce58f2c	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into revert_vlog test=develop	6 years ago
peizhilin	38715e6fd0	minor fix	6 years ago
minqiyang	be04d99fe4	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into revert_vlog test=develop	6 years ago
minqiyang	53433d7f2e	Revert the changes of VLOG test=develop	6 years ago
peizhilin	36cd18b549	Merge remote-tracking branch 'upstream/develop' into windows/build	6 years ago
peizhilin	b2f8d4183d	Given the different fraction_of_gpu_memory_to_use depends on platform	6 years ago
chengduo	00b9e9a135	Refine cublas to support CUBLAS_TENSOR_OP_MATH (#13929) * refine cublas test=develop * code refine * refine cublas * add GEMME_EX * add enable_cublas_tensor_op_math doc and add cublasCall test=develop * fix CublasCall for cuda version test=develop * fix error test=develop * fix GEMM_EX to be compatible with gcc 4.8 test=develop * add GEMM_EX test=develop * to be compatible with gcc4.8 test=develop	6 years ago
peizhilin	7c8c9dc9bf	fix unit test cases	6 years ago
minqiyang	0c3227a523	Change the origin VLOG level to 10 times Fix code to support cpplint syntax check test=develop	7 years ago
chengduo	2c9839c847	add cuda version display (#13885) test=develop	7 years ago
Xin Pan	ab798a2832	clarify the fraction_of_gpu_memory flag test=develop	7 years ago
typhoonzero	a4f7696a18	Revert "Some trivial optimization (#13530)" This reverts commit `1d91a49d2f`.	7 years ago
chengduo	1d91a49d2f	Some trivial optimization (#13530) * some trivial opt * remove the fix of lod_tensor and shrink_rnn_memory_op * refine ShrinkRNNMemoryOp test=develop	7 years ago
chenweihang	da39d84a48	refine by reviewer's advice	7 years ago
chenweihang	61052cdbc6	polish high frequency enforce error message	7 years ago
fengjiayi	9f11da5931	Add synchronous TensorCopy and use it in double buffer	7 years ago
Yi Wang	0c43a376e2	Fix cpplint errors with paddle/fluid/platform/gpu_info.* (#9710) * Fix cpplint errors with paddle/fluid/platform/gpu_info.* * Update	7 years ago
Kexin Zhao	1998d5afa2	add gpu info func to get compute cap	7 years ago
chengduoZH	00e596edbe	get max threads of GPU	7 years ago
qingqing01	24509f4af9	Fix the grammar in copyright. (#8403)	7 years ago
Yi Wang	fc374821dd	Correct #include path	7 years ago
Yi Wang	90648f336d	Move file to fluid/; Edit CMakeLists.txt	7 years ago

38 Commits (3e1404d2087e4ea52e61ba4273638a4fa00e2928)