2656 Commits (e21edb26f6e7fb364597c31a26f128c3c2710516)

Author SHA1 Message Date
luotao1 e21edb26f6 add Set/GetCPUNumThreads api
6 years ago
qingqing01 36f08eef3b CUDA kernel for density_prior_box_op. (#14513)
6 years ago
chengduo 00b9e9a135 Refine cublas to support CUBLAS_TENSOR_OP_MATH (#13929)
6 years ago
Dun ae7d22862b Group Norm (#13843)
6 years ago
wopeizl d9a1f3e58e Windows/online (#14474)
6 years ago
Tao Luo 5d4d117edc Merge pull request #14502 from qingqing01/cudnn5_fix
6 years ago
Yu Yang e68c1fcd5a Merge pull request #14522 from reyoung/feature/fix_op_header_deps
6 years ago
Zhaolong Xing ad349e770f Merge pull request #14452 from NHZlX/fix_avg_pool_trt_bug
6 years ago
Yu Yang 3edd32d070 fix(Compile): fix depends error when compile op using cub
6 years ago
Dang Qingqing cda60311f9 Fix compling with cuDNN v5
6 years ago
tensor-tang 10fb4ceefc Merge pull request #14351 from tpatejko/tpatejko/mkldnn-elementwise_mul
6 years ago
nhzlx e62872df8b fix conflicts
6 years ago
Tao Luo 1d3e9bde1e Merge pull request #14488 from yihuaxu/develop_7a64d48f5_stack_opt
6 years ago
tensor-tang 7aa3aff338 Merge pull request #14465 from tensor-tang/fea/jit/exp
6 years ago
Tao Luo 1b894e495f Merge pull request #14437 from jczaja/prv-softmax-mkl
6 years ago
Yihua Xu a906a361be Add the macro for NVCC (test=develop)
6 years ago
Yihua Xu d91740acb1 Revert "Remove the remnant code (test=develop)"
6 years ago
Yihua Xu be50670348 Remove the remnant code (test=develop)
6 years ago
qingqing01 9eefd2c766 Modify some infer-shape about detection operators in compile-time. (#14483)
6 years ago
Yihua Xu f4c869d872 Optimize the layer_norm operator with AVX intrinsic function (#14417)
6 years ago
Yu Yang f1a392a5fe Merge pull request #13804 from sneaxiy/rewrite_allocation
6 years ago
Yihua Xu f418f552df Merge branch 'develop' into develop_7a64d48f5_stack_opt (test=develop)
6 years ago
qingqing01 fd7e643153 Convolution fusion operator. (#14449)
6 years ago
Yu Yang 98bbfc17be Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into rewrite_allocation
6 years ago
Wu Yi d7bd0361cb fix dist deps (#14471)
6 years ago
Jacek Czaja 9b0eae3023 - Removing partial specialization of sotmax for inference for GPU
6 years ago
tensor-tang a19b3225a1 fix jitcode small size
6 years ago
Jacek Czaja be80bb4f28 - Fix to GPU
6 years ago
tensor-tang 4dbdfa60ef sigmoid and tanh support all size
6 years ago
tensor-tang ccb8963705 refine exp jitcode with all size
6 years ago
tensor-tang d3eae8f61b refine relu and fix addrelu test
6 years ago
tensor-tang 4e67fe6a12 refine act and vxx with all size
6 years ago
tensor-tang ba3eaed7a7 exp support all size
6 years ago
tensor-tang 1ffce8c0ae fix build error on noavx
6 years ago
Michal Gallus c69c41604e MKLDNN elementwise_mul: Move Kernel to KernelPool to avoid segfaults
6 years ago
Michal Gallus 785066eb8a MKLDNN elementwise_mul: Check if AVX512 is available
6 years ago
Michal Gallus 08f63c4d12 MKLDNN elementwise_mul: Lint changes to UT & integration
6 years ago
Michal Gallus 49b09327f6 MKLDNN elementwise_mul: Reorder on non-nchw input, fallback on non-16 divisable fm
6 years ago
Michal Gallus d14858e4ba MKLDNN elementwise_mul: Parallelize mul
6 years ago
Michal Gallus ed31936ba1 MKLDNN elementwise_mul: Support NCHW, update UT
6 years ago
Tomasz Patejko 700bcbf74f MKLDNN elementwise_mul: h and w loops implemented in xbyak
6 years ago
Tomasz Patejko ad09facafe MKLDNN elementwise_mul: CPU tests initially refactored. MKLDNN mul test for broadcast added
6 years ago
Tomasz Patejko 2d73ad180a MKLDNN elementwise_mul: simple xbyak version for AVX512
6 years ago
Tomasz Patejko 213ec37d6a MKLDNN elementwise_add: simple initial implementation of the operator for MKLDNN format
6 years ago
Wu Yi a2d9b34417 Refine operator cmake (#14413)
6 years ago
tensor-tang 7f17e561d7 Merge pull request #14423 from tensor-tang/fea/jit/act
6 years ago
Jiabin Yang 28bd5b7bad fix space_to_depth_op unicode problem (#14430)
6 years ago
Jacek Czaja 513bb6c151 Squashing MKL based softmax for inference
6 years ago
nhzlx 9b64aac41f add macro for pool2dDirectCUDAFunctor
6 years ago
whs 1722678258 Make nce support more distribution. (#13549)
6 years ago