Paddle

Commit Graph

Author	SHA1	Message	Date
wopeizl	3ccd8964a4	Merge pull request #15905 from wopeizl/win/fix_eigen fix build issue on windows for sample prop op	7 years ago
chengduo	8e904d322f	Remove unnecessary dependence for profiler (#15899 ) * refile profiler test=develop * follow comment test=develop	7 years ago
Xin Pan	44e7fcddc5	Merge pull request #15844 from panyx0718/infer add per kernel config and remove const_cast.	7 years ago
Jacek Czaja	dec9cf53c8	[MKL-DNN] MKL-DNN specific Tensor modification (#15429 ) * - Implemented draft of primitive desc keeping in Tensor test=develop - TransposeMKLDNNHandler::AcquireSrcMemory was reimplemented - Added nchw and nc formats setting for sake of compatiblity Fixed unit tests - Worakaround to problem with 5D data in conv - Added 3D and 1D MKL-DNN formats for name handles for tensor test=develop - Fix to UTs test=develop - Conv fp32 op was updated Cosmetic fixes test=develop - tensor mkldnn cosmetics test=develop - Moved most of mkl-dnn specific code from Tensor to mkl-dnn utils * - Lint fixes test=develop * - setting prim dec in Tensor , sets also layout to kMKLDNN test=develop * - Moved creation of prim desc totally out of Tensor test=develop * - Cosmetic fixes adter review test=develop	7 years ago
peizhilin	6ccdb1b947	fix build issue on windows for sample prop op test=develop	7 years ago
Dun	c6bd434ffe	add memset CUPTI && test=develop (#15868 )	7 years ago
Sylwester Fraczek	74672d1aff	Change (smart_ptr.get()) -> smart_ptr reason: dereferencing smart pointer is the same as the underlying pointer test=develop	7 years ago
tensor-tang	ee2321debd	Revert 15770 develop `a6910f900` gelu mkl opt (#15872 ) * Revert "Optimze Gelu with MKL Erf function (#15770)" This reverts commit `676995c86c`. * test=develop	7 years ago
chengduo	3b08c9abf4	enhance profiler (#15842 ) test=develop	7 years ago
Yihua Xu	676995c86c	Optimze Gelu with MKL Erf function (#15770 ) * Optimize for gelu operator * Set up the low accuracy mode of MKL ERF function. test=develop * Only enable MKLML ERF when OS is linux * Use the speical mklml version included vmsErf function to verify gelu mkl kernel. test=develop * Add the CUDA macro to avoid NVCC's compile issue. test=develop * Add the TODO comments for mklml library modification. test=develop * Clean Code test=develop * Add the comment of marco for NVCC compiler. test=develop	7 years ago
Tao Luo	e3dd6970fc	disable dam temporarily (#15860 ) test=develop	7 years ago
Dun Liang	35a90e06bf	test=develop	7 years ago
Dun Liang	c9080f516b	test=develop	7 years ago
Dun Liang	1c7bb0e40c	test=develop	7 years ago
Xin Pan	5eb87506bc	add per kernel config and remove const_cast. test=develop	7 years ago
Dun	a83e470405	Profiler refine and add CUDA runtime api tracer (#15301 ) * refine profiler && add runtime tracer * test=develop * test=develop * test=develop * test=develop * test=develop * test=develop * test=develop * test=develop * fix bug && test=develop * add thread id map && test=develop * test=develop * testing * bug fix * remove cuda event && refine code && test=develop * test=develop * test=develop * test=develop * fix windows temp file && test=develop * test=develop * fix windows bug && test=develop * fix start up issue && test=develop * code polish && test=develop * remove unused code && test=develop * add some cupti cbid && test=develop * add FLAGS_multiple_of_cupti_buffer_size && test=develop * fix compile error && test=develop * add keyword && test=develop * fix && test=develop * code polish && test=develop	7 years ago
mozga-intel	13ec2d331b	Enable momentum operator for a ngraph engine (#15673 ) * Enable momentum operator for a ngraph engine test=develop * Update tests test=develop * Unnecessary line of the code as intended was removed test=develop	7 years ago
Tao Luo	c797a1f050	remove legacy any.cmake	7 years ago
Tao Luo	bd2fa73620	Merge pull request #15794 from sneaxiy/fix-warnings Fix compile warning	7 years ago
tensor-tang	e1c707fe9c	fix warnings (#15790 ) * fix warnings test=develop * fix enforce test test=develop	7 years ago
sneaxiy	9b8e0e2f17	fix enforce_test test=develop	7 years ago
sneaxiy	209b355762	fix many warning test=develop	7 years ago
Zeng Jinle	fc87ef741b	Merge pull request #15687 from sneaxiy/fix_enforce fix enforce	7 years ago
sneaxiy	f0590947c3	fix enforce test=develop	7 years ago
tensor-tang	31fd8ce1e1	Merge pull request #15375 from mozga-intel/mozga-intel/batch_norm_ngraph_operator Enable batch_norm operator for a ngraph engine	7 years ago
dzhwinter	04e9776aef	add details. test=develop	7 years ago
mozga-intel	1198ccae6b	Enable batch_norm operator for a ngraph engine test=develop	7 years ago
peizhilin	883d22093a	fix the lib_any dependency test=develop	7 years ago
wopeizl	3614dadf23	Merge pull request #15631 from wopeizl/windows/fixci fix ci broken randomly and disable some warnings	7 years ago
peizhilin	061299be87	fix dependency test=develop	7 years ago
baojun	ac4cde009d	Enable accuracy op for ngraph engine (#15592 ) * Added accuracy ngraph op test=develop * fixed name type test=develop	7 years ago
dzhwinter	ce0394bcd0	merge develop branch. test=develop	7 years ago
guoshengCS	b6c3b69af8	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into fix-beam-search-size test=develop	7 years ago
liuwei1031	6e84eb131f	expose peak gpu memory API to python test=develop (#15529 ) * expose peak gpu memory API to python test=develop * add unittest for peak gpu memory monitoring test=develop * add pybind change test=develop * add mutex to gpu mem usage monitor test=develop * update benchmark flag definition file test=develop * tweak unittest for memory monitoring test=develop	7 years ago
guoshengCS	5dfce93101	To make CUDA_LAUNCH_KERNEL_HELPER support large size. test=develop	7 years ago
tensor-tang	8117725852	add jit kernel hsum, hmax and softmax refer code test=develop	7 years ago
sneaxiy	ba4f43fd62	fix compile error in distributed mode test=develop	7 years ago
Yiqun Liu	3008fa1261	Add the CUDA kernel for beam_search op (#15020 ) * Refine the beam_search op and test. * A basic CUDA implementation of beam_search for small batch_size. * Implement CUDA kernel for beam_search_op. * Use multiple CUDA threads in the same block to select the top beam. * Update the python api of beam_search op. * Enable extend function in CPU kernel of beam_search op. * Unify the CUDA codes. test=develop * Unify the CPU kernel of beam_search op. * Ensure the seletced items of beam_search_op's CPU kernel sorted by scores. * Update the description of beam_search in API.spec. * Enable the use of CUDA kernel in beam_search op. * Exclude the beam_search's CUDA unittest when there is no CUDA gpu, and delete some debuging statements. test=develop * Follow comments. test=develop * Call the CPU kernel for beam_search op when batch_size > 4. test=develop * Remove the except of is_empty op in PrepareData. test=develop	7 years ago
Zeng Jinle	2480a3df7d	Merge pull request #15496 from sneaxiy/lazy_allocator2 Fix bug when user set CUDA_VISIBLE_DEVICES be empty and run CPU-only models	7 years ago
sneaxiy	9c360cc798	test=develop	7 years ago
Xin Pan	58cb18d9d9	Merge pull request #15322 from velconia/imperative_resnet Imperative Resnet	7 years ago
sneaxiy	51227bd447	lazy_allocator test=develop	7 years ago
tangwei12	8b50ad80ff	checkpoint at distributed training (#14854 ) checkpoint for distributed training.	7 years ago
minqiyang	8ce198b2e1	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into imperative_resnet test=develop	7 years ago
minqiyang	315b133e67	Add single GPU support to imperative	7 years ago
tensor-tang	3759c1db8c	Merge pull request #14805 from mozga-intel/mozga-intel/element_wise_operator_ngraph Enable element_wise_add operator for a ngraph engine	7 years ago
peizhilin	eea75a1d93	fix issue when type is invalid test=develop	7 years ago
peizhilin	9adb158e5b	Merge remote-tracking branch 'upstream/develop' into debug/support	7 years ago
chengduo	46d01d798e	Revert "Revert "Remove workspace_handle in conv_cudnn (#15186 )"" (#15290 ) test=develop This reverts commit `358e657f68`.	7 years ago
Wojciech Uss	cb2ba58458	Fix performance drop when with MKL-DNN test=develop	7 years ago
chengduozh	c4eced9881	fix thread safe bug test=develop	7 years ago
chengduozh	358e657f68	Revert "Remove workspace_handle in conv_cudnn (#15186 )" test=develop This reverts commit `064512aa47`.	7 years ago
wopeizl	5d9edb4124	Merge pull request #15156 from wopeizl/windows/fixgpuissue fix gpu buils issue on windows test=develop	7 years ago
chengduo	064512aa47	Remove workspace_handle in conv_cudnn (#15186 ) * remove workspace_handle in conv2d_cudnn test=develop * remove workspace_handle test=develop * fix bug test=develop * make test_conv2d_op SERIAL test=develop * save memory in conv_cudnn test=develop * enhance thread safety test=develop * enhance temporary allocator test=develop * Add excess fraction test=develop * follow comments test=develop * fix bug and code refine test=develop * fix memory size check test=develop * rename reuse_tmp_allocation_excess_fraction test=develop	7 years ago
xiaolil1	8f17c714de	Conv int8 residual (#15145 ) * Enable basic MKL-DNN INT8 Conv OP test=develop * Modify test case test=develop * Clean unittest code test=develop * Fix test test=develop * Modify test test=develop * Enable MKL-DNN INT8 Conv with Relu Fusion OP test=develop * Enable INT8 Conv with residual fusion OP test=develop * Modify code. test=develop * Modify basic INT8 Conv test=develop * Modify Conv. test=develop * fix style test=develop * Fix style test=develop * Fix test test=develop * Modify code. test=develop * Fix test test=develop	7 years ago
peizhilin	439691f5bd	adjust the shlwapi on windows test=develop	7 years ago
peizhilin	92da467c99	Merge remote-tracking branch 'upstream/develop' into windows/fixgpuissue	7 years ago
peizhilin	c1235c935f	add the enable_debug flag test=develop	7 years ago
Zeng Jinle	e29f10d315	Merge pull request #15207 from sneaxiy/remove_op_handle_lock_and_fix_var Remove op handle lock and fix var	7 years ago
mozga-intel	a42f8f4f6f	Enable element_wise_add operator for a ngraph test=develop	7 years ago
Zeng Jinle	c562be20d9	Merge pull request #15193 from sneaxiy/fix_cudnn_compatible_check Fix cudnn compatible check	7 years ago
peizhilin	1cd95d8a0b	use thread local instance test=develop	7 years ago
sneaxiy	ed409ac9f4	Revert "Revert "Remove op handle lock"" test=develop	7 years ago
peizhilin	d54133ea85	not include the numeric under linux test=develop	7 years ago
peizhilin	a6f5ceee74	add the python callstack for debug support test=develop	7 years ago
Zeng Jinle	dacfaaa966	Revert "Remove op handle lock" test=develop	7 years ago
xiaolil1	c8f101e5da	Conv int8 relu (#15130 ) * Enable basic MKL-DNN INT8 Conv OP test=develop * Modify test case test=develop * Clean unittest code test=develop * Fix test test=develop * Modify test test=develop * Enable MKL-DNN INT8 Conv with Relu Fusion OP test=develop * Modify basic INT8 Conv test=develop * fix type test=develop * Modify test test=develop	7 years ago
sneaxiy	9793a0b6a6	fix_cudnn_compatible_check	7 years ago
Zeng Jinle	ccb322d6a5	merge develop	7 years ago
Zeng Jinle	f3a13512fc	Merge pull request #15139 from sneaxiy/remove_op_handle_lock Remove op handle lock	7 years ago
xiaolil1	bbc9336878	Enable basic MKL-DNN INT8 Conv OP (#15124 ) * Enable basic MKL-DNN INT8 Conv OP test=develop * Modify test case test=develop * Clean unittest code test=develop * Fix test test=develop * Modify test test=develop * Modify basic INT8 Conv test=develop	7 years ago
peizhilin	c919b2f31d	Merge remote-tracking branch 'upstream/develop' into windows/fixgpuissue	7 years ago
peizhilin	fd4f4d0e5f	fix build issue test=develop	7 years ago
Yan Xu	a1e60ab19b	Merge pull request #14791 from Yancey1989/parallel_graph_mode [Feature] Add ParallelGraph executor mode in parallelexecutor to improve performance	7 years ago
peizhilin	9ae50dd07d	fix gpu buils issue on windows test=develop	7 years ago
sneaxiy	d0a8a1e950	remove_op_handle_lock test=develop	7 years ago
Yancey1989	e65436103f	Merge branch 'develop' of github.com:PaddlePaddle/Paddle into parallel_graph_mode test=develop	7 years ago
sneaxiy	6f06e6cdac	Merge remote origin test=develop	7 years ago
Xin Pan	9186451f60	hide GetTensor test=develop	7 years ago
sneaxiy	d25395fc98	remove tensor core lock test=develop	7 years ago
Yancey1989	82b42e31f0	polish unittest test=develop	7 years ago
Yancey1989	0a885ac12a	Merge branch 'develop' of github.com:PaddlePaddle/Paddle into parallel_graph_mode test=develop	7 years ago
peizhilin	813c2ce539	fix timer test=develop	7 years ago
wopeizl	7ab501264d	Merge pull request #15069 from wopeizl/windows/dsosupport add cuda dso support for windows	7 years ago
guru4elephant	ff739449ab	Merge pull request #15018 from guru4elephant/add_timer Add debug thread function for async executor	7 years ago
Yancey1989	4743c9cd5d	Merge branch 'develop' of github.com:PaddlePaddle/Paddle into parallel_graph_mode	7 years ago
wopeizl	719ebe3786	Merge pull request #15070 from wopeizl/windows/testcasefix fix test issues on windows	7 years ago
Qiyang Min	0238a3bb4f	Merge pull request #14972 from velconia/accelerate_lstm Accelerate PADDLE_ENFORCE	7 years ago
Yancey1989	86bb583881	Merge branch 'develop' of github.com:PaddlePaddle/Paddle into parallel_graph_mode	7 years ago
peizhilin	01c00b07dd	fix test issues on windows test=develop	7 years ago
peizhilin	1e7f83e60a	add cuda dso support for windows test=develop	7 years ago
Yancey1989	41a64f6a2a	Merge branch 'develop' of github.com:PaddlePaddle/Paddle into parallel_graph_mode	7 years ago
Wu Yi	856f0da0fe	Fp16 training (#14992 ) * wip * wip * wip * wip for test * add fp16 tests test=develop * fix cpu build test=develop * fix test=develop * fix py3 tests test=develop * fix lr_scheduler dtype test=develop * fix test=dvelop * test fix ci compile test=develop * fix build and merge test=develop * fallback momentumop change to general test=develop * make fp16 lr schedule simple test=develop * fix ut test=develop * fix tests test=develop * remove fp16 learning rate cast test=develop	7 years ago
chengduo	b9fb03cf54	Move GetTensor to tensor_util (#15011 ) * refine tensor test=develop * refine tensor test=develop * fix device_context log test=develop	7 years ago
dongdaxiang	ab2abfc5b2	Merge branch 'add_timer' of https://github.com/guru4elephant/Paddle into add_timer test=develop	7 years ago
dongdaxiang	4cb833d2de	Merge branch 'add_timer' of https://github.com/guru4elephant/Paddle into add_timer test=develop	7 years ago
tensor-tang	f0e02a65ed	Merge pull request #14974 from xiaolil1/quantize Add Quantize OP	7 years ago
dongdaxiang	68a2d1f3d7	Merge branch 'add_timer' of https://github.com/guru4elephant/Paddle into add_timer add timer_test test=develop	7 years ago
dongdaxiang	2e5ebc4594	Merge branch 'add_timer' of https://github.com/guru4elephant/Paddle into add_timer test=develop	7 years ago
dongdaxiang	5dfd9c9aa9	Merge branch 'add_timer' of https://github.com/guru4elephant/Paddle into add_timer test=develop	7 years ago

668 Commits (bcc0d41646efbba034cb68f8dbf302a8f33d992c)