Commit Graph

624 Commits (6d5a04c1e7b4d0aecb2b5e44e75fb4776da566b1)

Author SHA1 Message Date
Tao Luo 4efdebc6f6 Merge pull request #15931 from yihuaxu/develop_2c5c7b2a7_gelu_mkl_opt (6 years ago)
dzhwinter 225c11a91f polish cudnn related code and fix bug. (#15164) (6 years ago)
xiaolil1 6724be2b0d INT8 Pool kernel Key Creation Optimization. (#15883) (6 years ago)
Yihua Xu 7396788694 Optimize gelu operation with mkl erf. (6 years ago)
peizhilin c6472579c0 test=develop (6 years ago)
peizhilin b5d6e38b05 fix build issue for cudaEvent_t (6 years ago)
wopeizl 3ccd8964a4 Merge pull request #15905 from wopeizl/win/fix_eigen (6 years ago)
chengduo 8e904d322f Remove unnecessary dependence for profiler (#15899) (6 years ago)
Xin Pan 44e7fcddc5 Merge pull request #15844 from panyx0718/infer (6 years ago)
Jacek Czaja dec9cf53c8 [MKL-DNN] MKL-DNN specific Tensor modification (#15429) (6 years ago)
peizhilin 6ccdb1b947 fix build issue on windows for sample prop op (6 years ago)
Dun c6bd434ffe add memset CUPTI && test=develop (#15868) (6 years ago)
Sylwester Fraczek 74672d1aff Change *(smart_ptr.get()) -> *smart_ptr (6 years ago)
tensor-tang ee2321debd Revert 15770 develop a6910f900 gelu mkl opt (#15872) (6 years ago)
chengduo 3b08c9abf4 enhance profiler (#15842) (6 years ago)
Yihua Xu 676995c86c Optimze Gelu with MKL Erf function (#15770) (6 years ago)
Tao Luo e3dd6970fc disable dam temporarily (#15860) (6 years ago)
Dun Liang 35a90e06bf test=develop (6 years ago)
Dun Liang c9080f516b test=develop (6 years ago)
Dun Liang 1c7bb0e40c test=develop (6 years ago)
Xin Pan 5eb87506bc add per kernel config and remove const_cast. (6 years ago)
Dun a83e470405 Profiler refine and add CUDA runtime api tracer (#15301) (6 years ago)
mozga-intel 13ec2d331b Enable momentum operator for a ngraph engine (#15673) (6 years ago)
Tao Luo c797a1f050 remove legacy any.cmake (6 years ago)
Tao Luo bd2fa73620 Merge pull request #15794 from sneaxiy/fix-warnings (6 years ago)
tensor-tang e1c707fe9c fix warnings (#15790) (6 years ago)
sneaxiy 9b8e0e2f17 fix enforce_test (6 years ago)
sneaxiy 209b355762 fix many warning (6 years ago)
Zeng Jinle fc87ef741b Merge pull request #15687 from sneaxiy/fix_enforce (6 years ago)
sneaxiy f0590947c3 fix enforce (6 years ago)
tensor-tang 31fd8ce1e1 Merge pull request #15375 from mozga-intel/mozga-intel/batch_norm_ngraph_operator (6 years ago)
dzhwinter 04e9776aef add details. test=develop (6 years ago)
mozga-intel 1198ccae6b Enable batch_norm operator for a ngraph engine (6 years ago)
peizhilin 883d22093a fix the lib_any dependency (6 years ago)
wopeizl 3614dadf23 Merge pull request #15631 from wopeizl/windows/fixci (6 years ago)
peizhilin 061299be87 fix dependency (6 years ago)
baojun ac4cde009d Enable accuracy op for ngraph engine (#15592) (6 years ago)
dzhwinter ce0394bcd0 merge develop branch. test=develop (6 years ago)
guoshengCS b6c3b69af8 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into fix-beam-search-size (6 years ago)
liuwei1031 6e84eb131f expose peak gpu memory API to python test=develop (#15529) (6 years ago)
guoshengCS 5dfce93101 To make CUDA_LAUNCH_KERNEL_HELPER support large size. (6 years ago)
tensor-tang 8117725852 add jit kernel hsum, hmax and softmax refer code (6 years ago)
sneaxiy ba4f43fd62 fix compile error in distributed mode (6 years ago)
Yiqun Liu 3008fa1261 Add the CUDA kernel for beam_search op (#15020) (6 years ago)
Zeng Jinle 2480a3df7d Merge pull request #15496 from sneaxiy/lazy_allocator2 (6 years ago)
sneaxiy 9c360cc798 test=develop (6 years ago)
Xin Pan 58cb18d9d9 Merge pull request #15322 from velconia/imperative_resnet (6 years ago)
sneaxiy 51227bd447 lazy_allocator (6 years ago)
tangwei12 8b50ad80ff checkpoint at distributed training (#14854) (6 years ago)
minqiyang 8ce198b2e1 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into imperative_resnet (6 years ago)