Commit Graph

624 Commits (6d5a04c1e7b4d0aecb2b5e44e75fb4776da566b1)

Author SHA1 Message Date
Tao Luo 4efdebc6f6 Merge pull request #15931 from yihuaxu/develop_2c5c7b2a7_gelu_mkl_opt (6 years ago)
dzhwinter 225c11a91f polish cudnn related code and fix bug. (#15164) (6 years ago)
xiaolil1 6724be2b0d INT8 Pool kernel Key Creation Optimization. (#15883) (6 years ago)
Yihua Xu 7396788694 Optimize gelu operation with mkl erf. (6 years ago)
peizhilin c6472579c0 test=develop (6 years ago)
peizhilin b5d6e38b05 fix build issue for cudaEvent_t (6 years ago)
wopeizl 3ccd8964a4 Merge pull request #15905 from wopeizl/win/fix_eigen (6 years ago)
chengduo 8e904d322f Remove unnecessary dependence for profiler (#15899) (6 years ago)
Xin Pan 44e7fcddc5 Merge pull request #15844 from panyx0718/infer (6 years ago)
Jacek Czaja dec9cf53c8 [MKL-DNN] MKL-DNN specific Tensor modification (#15429) (6 years ago)
peizhilin 6ccdb1b947 fix build issue on windows for sample prop op (6 years ago)
Dun c6bd434ffe add memset CUPTI && test=develop (#15868) (6 years ago)
Sylwester Fraczek 74672d1aff Change *(smart_ptr.get()) -> *smart_ptr (6 years ago)
tensor-tang ee2321debd Revert 15770 develop a6910f900 gelu mkl opt (#15872) (6 years ago)
chengduo 3b08c9abf4 enhance profiler (#15842) (6 years ago)
Yihua Xu 676995c86c Optimze Gelu with MKL Erf function (#15770) (6 years ago)
Tao Luo e3dd6970fc disable dam temporarily (#15860) (6 years ago)
Dun Liang 35a90e06bf test=develop (6 years ago)
Dun Liang c9080f516b test=develop (6 years ago)
Dun Liang 1c7bb0e40c test=develop (6 years ago)
Xin Pan 5eb87506bc add per kernel config and remove const_cast. (6 years ago)
Dun a83e470405 Profiler refine and add CUDA runtime api tracer (#15301) (6 years ago)
mozga-intel 13ec2d331b Enable momentum operator for a ngraph engine (#15673) (6 years ago)
Tao Luo c797a1f050 remove legacy any.cmake (6 years ago)
Tao Luo bd2fa73620 Merge pull request #15794 from sneaxiy/fix-warnings (6 years ago)
tensor-tang e1c707fe9c fix warnings (#15790) (6 years ago)
sneaxiy 9b8e0e2f17 fix enforce_test (6 years ago)
sneaxiy 209b355762 fix many warning (6 years ago)
Zeng Jinle fc87ef741b Merge pull request #15687 from sneaxiy/fix_enforce (6 years ago)
sneaxiy f0590947c3 fix enforce (6 years ago)
tensor-tang 31fd8ce1e1 Merge pull request #15375 from mozga-intel/mozga-intel/batch_norm_ngraph_operator (6 years ago)
dzhwinter 04e9776aef add details. test=develop (6 years ago)
mozga-intel 1198ccae6b Enable batch_norm operator for a ngraph engine (6 years ago)
peizhilin 883d22093a fix the lib_any dependency (6 years ago)
wopeizl 3614dadf23 Merge pull request #15631 from wopeizl/windows/fixci (6 years ago)
peizhilin 061299be87 fix dependency (6 years ago)
baojun ac4cde009d Enable accuracy op for ngraph engine (#15592) (6 years ago)
dzhwinter ce0394bcd0 merge develop branch. test=develop (6 years ago)
guoshengCS b6c3b69af8 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into fix-beam-search-size (6 years ago)
liuwei1031 6e84eb131f expose peak gpu memory API to python test=develop (#15529) (6 years ago)
guoshengCS 5dfce93101 To make CUDA_LAUNCH_KERNEL_HELPER support large size. (6 years ago)
tensor-tang 8117725852 add jit kernel hsum, hmax and softmax refer code (6 years ago)
sneaxiy ba4f43fd62 fix compile error in distributed mode (6 years ago)
Yiqun Liu 3008fa1261 Add the CUDA kernel for beam_search op (#15020) (6 years ago)
Zeng Jinle 2480a3df7d Merge pull request #15496 from sneaxiy/lazy_allocator2 (6 years ago)
sneaxiy 9c360cc798 test=develop (6 years ago)
Xin Pan 58cb18d9d9 Merge pull request #15322 from velconia/imperative_resnet (6 years ago)
sneaxiy 51227bd447 lazy_allocator (6 years ago)
tangwei12 8b50ad80ff checkpoint at distributed training (#14854) (6 years ago)
minqiyang 8ce198b2e1 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into imperative_resnet (6 years ago)