Paddle

Commit Graph

Author	SHA1	Message	Date
HaoRen	b7128bac5f	supports collective communicated training (#18175 ) * fix prepare context redundant code problem, optimize executor by caching create_varaiables test=develop * supports collective training in executor * make fetch_list runable with variables, add more unittest for use_program_cache test=develop * fix comment test=develop * use unique name for nccl_id * supports output to stream in program_to_code * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code * set op role in collective training * add collective op role * remove orig file * add build optimizer by strategy * add collective strategy * refine collective strategy * add multi-process role maker * refine strategy building factory so that we can easily plugin more strategy * scale loss grad in collective sgd transpiler * add support for distributed fc * code format * revert some features for dist fc * add support for distributed fc training * fix prepare context redundant code problem, optimize executor by caching create_varaiables test=develop * supports collective training in executor * make fetch_list runable with variables, add more unittest for use_program_cache test=develop * use unique name for nccl_id * supports output to stream in program_to_code * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code * set op role in collective training * add collective op role * fix comment test=develop * remove orig file * add build optimizer by strategy * add collective strategy * refine collective strategy * add multi-process role maker * refine strategy building factory so that we can easily plugin more strategy * scale loss grad in collective sgd transpiler * add support for distributed fc * code format * revert some features for dist fc * add support for distributed fc training * test=develop add collective op unittest standard * test=develop remove the test_collective directory * test=develop remove the test_collective directory * remove slicegather test * code format for reducescatter * update attr of shard_index_op * Modify macro nccl_helper * remove test without distribute * macro collective_helper * marcro update * test=develop update support python3.5 * test=develop change gpu memory use to 0.1 when test * test=develop update ut equal func * test=develop set flags to 1.5 * test=develop fix pickle dumple py35 * test=develop fix divide in slice and add sync_comm_stream update atol and rtol to 1e-05 rm shard_index op and test modify read input from file to read from memory remove origin_program in framework and add i/o in c_sync_calc_stream * test=develop update unittest sync operator I/O	6 years ago
wangchaochaohu	c10157a5df	revise the cudnn conv choose algorithm to improve the performance(mask rcnn benchmark) (#17753 ) * revise conv layer cudnn algo choose test=develop * update for code style test=develop * update for code style test=develop	6 years ago
chengduo	863c75168c	polish error doc (#17772 ) test=develop	6 years ago
Tao Luo	ff1661f12a	remove unused FLAGS_warpctc_dir (#17162 ) * remove unused FLAGS_warpctc_dir test=develop * remove FLAGS_warpctc_dir test=develop	6 years ago
Chen Weihang	0b2aec14b6	Revert "Model data cryption link all lib (#16555 )" test=develop This reverts commit `c38c7c5619`.	6 years ago
Chen Weihang	c38c7c5619	Model data cryption link all lib (#16555 ) * link the libwbaes.so into paddle * polish detail, test=develop * try fix mac_pr_ci error, test=develop * add compile option, test=develop * fix ci error, test=develop * ignore failed to find mac lib, test=develop * change cdn to bj, cdn can't get the latest version * trigger ci, test=develop * temporary delete win32 lib linking, test=develop * change https to http, test=develop * turn compile option on to off * turn compile option off to on, test=develop * try lib compiled by gcc4.8, test=develop * update lib version, test=develop * link other lib, test=develop * add setup config * delete false, test=develop * delete no_soname, test=develop * recover so name set * fix, test=develop * adjust make config, test=develop * remove link to wbaes, test=develop * remove useless define, test=develop	6 years ago
Tao Luo	4efdebc6f6	Merge pull request #15931 from yihuaxu/develop_2c5c7b2a7_gelu_mkl_opt Optimize gelu operation with mkl erf	6 years ago
dzhwinter	225c11a91f	polish cudnn related code and fix bug. (#15164 ) * staged. * polish code * polish code. test=develop * polish code. test=develop * api change. test=develop * fix default value. test=develop * fix default value. test=develop	6 years ago
Yihua Xu	7396788694	Optimize gelu operation with mkl erf. test=develop	6 years ago
tensor-tang	ee2321debd	Revert 15770 develop `a6910f900` gelu mkl opt (#15872 ) * Revert "Optimze Gelu with MKL Erf function (#15770)" This reverts commit `676995c86c`. * test=develop	6 years ago
Yihua Xu	676995c86c	Optimze Gelu with MKL Erf function (#15770 ) * Optimize for gelu operator * Set up the low accuracy mode of MKL ERF function. test=develop * Only enable MKLML ERF when OS is linux * Use the speical mklml version included vmsErf function to verify gelu mkl kernel. test=develop * Add the CUDA macro to avoid NVCC's compile issue. test=develop * Add the TODO comments for mklml library modification. test=develop * Clean Code test=develop * Add the comment of marco for NVCC compiler. test=develop	6 years ago
tensor-tang	8117725852	add jit kernel hsum, hmax and softmax refer code test=develop	6 years ago
peizhilin	1e7f83e60a	add cuda dso support for windows test=develop	6 years ago
peizhilin	40a94a138f	remove irrelevant fix for mkl test=develop	6 years ago
peizhilin	ed5bd5e586	test=develop	6 years ago
Yu Yang	7b10bf0e60	Use mkl	6 years ago
liuhongyu	8daf67f90f	fix bugs; test=develop	6 years ago
liuhongyu	968dd3c078	add cudnn 5 support; test=develop	6 years ago
phlrain	cf1fe61004	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add_cudnn_lstm	6 years ago
Tao Luo	ea47685f91	Merge pull request #14646 from jczaja/prv-softmax-mkl-sasum Softmax for inference MKL further changes	6 years ago
Jacek Czaja	8bfa1fa9bb	- ASUM MKL integration	6 years ago
liuhongyu	05917c3c79	add cudnn lstm; test=develop	6 years ago
minqiyang	be04d99fe4	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into revert_vlog test=develop	6 years ago
minqiyang	53433d7f2e	Revert the changes of VLOG test=develop	6 years ago
peizhilin	36cd18b549	Merge remote-tracking branch 'upstream/develop' into windows/build	6 years ago
chengduozh	f7847ca6a3	fix cublas warp error test=develop	6 years ago
chengduo	00b9e9a135	Refine cublas to support CUBLAS_TENSOR_OP_MATH (#13929 ) * refine cublase test=develop * code refine * refine cublas * add GEMME_EX * add enable_cublas_tensor_op_math doc and add cublasCall test=develop * fix CublasCall for cuda version test=develop * fix error test=develop * fix GEMM_EX to be compatible with gcc 4.8 test=develop * add GEMM_EX test=develop * to compatiable with gcc4.8 test=develop	6 years ago
peizhilin	7c8c9dc9bf	fix unit test cases	6 years ago
wopeizl	d9a1f3e58e	Windows/online (#14474 ) * add recordio support * disable the openblas multi-thread on windows since no support adjust the python script * code style * code style test=develop * add create_recordio_file_reader back * fix code style test=develop * fix the gtest.cmake on windows * fix cc_test on windows * fix the win build test=develop * remove fused compile support on windows test=develop * add the jit support test=develop * add the jit support, test=develop * add the jit support, test=develop * add the jit back fix compile error on windows * rollback test=develop * test case fix * disable DSO by default on windows * exclude warpctc_op on windows * exclude the dynload_warpctc out on windows test=develop * fix the scripts error test=develop * disable avx on windows by default test=develop * re-organize the cmake file * disable mkl on windows by default * add warp_ctc back * fix the dependency * fix the dependency * fix the build issue on windows * remove unsupported flag on windows * code style * code style test=develop * fix issue * add profiler, parallel_executor back * clean up the pre-definitions on windows * fix build issue * test=develop	6 years ago
peizhilin	6e66fadb95	clean up the pre-definitions on windows	6 years ago
qingqing01	fd7e643153	Convolution fusion operator. (#14449 ) * Convolution fusion operator. * Clean code test=develop	6 years ago
Wu Yi	b32c13dc20	Add cudnn ctc loss (#12366 ) * add cudnn ctc loss * wip add test test=develop * wip * wip * done test=develop * move include cudnn test=develop * test test=develop * fix build test=develop * fix build test=develop * fix build on cudnn5 test=develop * fix cudnn5 build test=develop * fix cudnn5 build test=develop * merge develop softmax functor change test=develop	6 years ago
tensor-tang	1be85d011d	add mkl vsqr and vpow	6 years ago
Qiyang Min	698698f2fa	Merge branch 'develop' into fix_vlog	6 years ago
qingqing01	abe209234f	Exhaustive search for cuDNN conv. (#14286 ) * exhaustive search for cuDNN conv. * Refine code and add unit testing. * Fix model load in fluid/inference and unit testing in conv2d * Follow comments. * Fix compiling test=develop	6 years ago
minqiyang	0c3227a523	Change the origin VLOG level to 10 times Fix code to support cpplint syntax check test=develop	6 years ago
qingqing01	db8c52da5e	Revert " Exhaustive search for cuDNN conv. (#14043 )" This reverts commit `ce7d9b0799`.	6 years ago
qingqing01	ce7d9b0799	Exhaustive search for cuDNN conv. (#14043 ) * exhaustive search for cuDNN conv. * Refine code and add unit testing. * Clean code * Fix model load in fluid/inference and unit testing in conv2d * Follow comments.	6 years ago
whs	0c319e0b35	Add affine grid generator op (#12238 ) * Add affine grid generator. * fix ffine grid. * Add unitest. * Add CPU kernel and fix unitest. * Fix CPU kernel. * Refine code. test=develop * Fix python api. test=develop * Update python api. test=develop * Fix comment. test=develop * Rename affine_grid_generator to affine_grid and enhence unitest. test=develop * Fix unitest. test=develop	6 years ago
dzhwinter	2d00e65819	namespace issue (#13543 ) * flags * "follow comment"	6 years ago
JiabinYang	e322fc4e0e	add error info for nccl not found	7 years ago
dzhwinter	d361624c1d	platform module (#12932 ) * platform module * Update profiler.h	7 years ago
tensor-tang	3dd66390b2	add blas vexp	7 years ago
tensor-tang	0ec1f65cf1	fix blas dot and add cblas scal	7 years ago
tensor-tang	a2203d0466	add cblas dot	7 years ago
dzhwinter	e23ddf6ae4	status (#12764 )	7 years ago
Tao Luo	d04ef276a5	Merge pull request #12745 from tensor-tang/refine/op/elewise_mul Refine elementwise mul cpu forward	7 years ago
dzhwinter	00463fdfe3	cudnn windows support (#12757 ) * cudnn widndows * "add comment" * "windows support" * "fix cmake error"	7 years ago
tensor-tang	6644ce79a5	add mklml vmul	7 years ago
tensor-tang	43cee33a23	add mkl packed gemm	7 years ago

77 Commits (2246f7c133e3dc3cfd9f2779fd2f4cc2778c7ea7)