32 Commits (0536b5263d188c54069765c4168bdba91ad250c7)

Author SHA1 Message Date
danleifeng 425279a57b Improve elementwise operators performance in same dimensions. (#19763)
5 years ago
Bob Zhu c670058a8d add support of matmul with multiple head even different width and height (#19708)
5 years ago
Yihua Xu b920395842 Use sparse matrix to implement fused emb_seq_pool operator (#19064)
6 years ago
Bob Zhu 220eef602e Extend Matmul to support matrix multiplication with multiple heads (#18570)
6 years ago
Yihua Xu 7396788694 Optimize gelu operation with mkl erf.
6 years ago
tensor-tang ee2321debd Revert 15770 develop a6910f900 gelu mkl opt (#15872)
6 years ago
Yihua Xu 676995c86c Optimze Gelu with MKL Erf function (#15770)
6 years ago
Yu Yang 7b10bf0e60 Use mkl
6 years ago
Jacek Czaja 8bfa1fa9bb - ASUM MKL integration
6 years ago
tensor-tang 64f7516aee fix lrn on mac (#14426)
6 years ago
tensor-tang 1be85d011d add mkl vsqr and vpow
6 years ago
tensor-tang cf5ea925c3 fix bugs
7 years ago
tensor-tang 3dd66390b2 add blas vexp
7 years ago
tensor-tang 0ec1f65cf1 fix blas dot and add cblas scal
7 years ago
tensor-tang a2203d0466 add cblas dot
7 years ago
tensor-tang f72ab8961e refine blas gemm
7 years ago
tensor-tang 6644ce79a5 add mklml vmul
7 years ago
tensor-tang 54c95e49f0 fix blas
7 years ago
tensor-tang 8c23f7c4f0 fix blas and use packed weight
7 years ago
tensor-tang 43cee33a23 add mkl packed gemm
7 years ago
tensor-tang 17987eb3fc link libxsmm
7 years ago
tensor-tang 3df99e72ab Merge remote-tracking branch 'ups/develop' into refine/set_num_threads
7 years ago
dzhwinter 99a99ec7e3 "remove lapack" (#11966)
7 years ago
tensor-tang e3a96300bb move SetNumThreads to platform
7 years ago
tensor-tang f503f12925 enable dynamic load mklml lib on fluid
7 years ago
tensor-tang 9169b3b802 Merge pull request #10789 from Xreki/core_fix_openblas_threads
7 years ago
Tomasz Patejko e43c8f33cd MKL elementwise add: elementwise_add uses vAdd VML function when MKL is used
7 years ago
Liu Yiqun 39eb871ddf Add an interface to set the number of threads for math function, and set the default value to 1 for inference.
7 years ago
Yu Yang fcd31d6161 Follow comments and polish code names
7 years ago
Yu Yang 0a13d3c67a Move MatMul to blas_impl.h
7 years ago
Yu Yang c6a6d87f96 Rewrite Matmul, make code cleaner
7 years ago
Yu Yang ef6ea790dc Clean and extract blas
7 years ago