Commit Graph

673 Commits (b34933d9ee3b61dbbd642fd02f244c36d0d14550)

Author SHA1 Message Date
tensor-tang 6447155dac Merge pull request #13851 from tensor-tang/fea/jitkernel_peephole
7 years ago
sneaxiy 4b4af84e67 test=develop
7 years ago
Qiao Longfei 0225957515 change elementwise_add to elementwise_add_to test=develop
7 years ago
Qiao Longfei b4a32eafdf Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into optimize-sum-seq-pooling-op
7 years ago
Zeng Jinle 93606c2c2c Merge pull request #13689 from sneaxiy/sparse_rmsprop
7 years ago
sneaxiy 5cedfb60c8 test=develop
7 years ago
Qiao Longfei 936926aadd code optimize
7 years ago
Qiyang Min cab29828a5 Merge pull request #13829 from velconia/accelerate_sequence_pool_op
7 years ago
Qiao Longfei c52ccbc109 clean code
7 years ago
Qiao Longfei 6056d04361 optimize blas call
7 years ago
Qiao Longfei 5db7551317 optimize code
7 years ago
Qiao Longfei eb6d9e3bbe Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into optimize-sum-seq-pooling-op
7 years ago
Qiao Longfei 0170d36c42 fix a bug
7 years ago
Qiyang Min e37c9e6732 Merge pull request #13828 from velconia/accelerate_selected_rows_functor
7 years ago
Qiao Longfei 86e2e686ee fix bug
7 years ago
Qiao Longfei 333fd15204 add gpu test for mrege add
7 years ago
Qiao Longfei ab3e36da80 update MergeAdd for selected_rows_functor.cu
7 years ago
Qiao Longfei d5c64af24f change map to unordered_map
7 years ago
Qiao Longfei 005f1923a2 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into optimize-sum-seq-pooling-op
7 years ago
tensor-tang bcb8ea397d Merge remote-tracking branch 'ups/develop' into fea/jitkernel_peephole
7 years ago
tensor-tang 8e182170ba refine and replace lstm peephole kernel
7 years ago
Dun 5f2e837847 optimize depthwise conv by register memory (#13778)
7 years ago
minqiyang 3f6ec90060 Polish code
7 years ago
tensor-tang 7ef2699e18 init peephole runtime kernel
7 years ago
minqiyang 0385b0a1ea Accelerate SequencePool Op on SUM mode
7 years ago
minqiyang 8ec748cfa0 Accelerate SelectedRows Functors:
7 years ago
Qiao Longfei 38568519f7 optimize code
7 years ago
tensor-tang 3ee8f2c6cf thread local jit kernels
7 years ago
tensor-tang 9131a35676 replace the lstm compute with jitkernel
7 years ago
tensor-tang b55c247678 add lstm compute unit test
7 years ago
sneaxiy 4c672ab1a2 Merge reyoung:rewrite_allocation
7 years ago
tensor-tang 2a00969165 optimize lstm jitkernel keq8
7 years ago
tensor-tang f2adaf1c3e add vrelu and lstm kernel
7 years ago
tensor-tang e6d8aca3bf refine code and fix
7 years ago
qiaolongfei 1a59880084 update test_sum_op
7 years ago
qiaolongfei 40d3bd4e81 selected rows merge add support multi input
7 years ago
tensor-tang ea7dc9cbf6 Merge remote-tracking branch 'ups/develop' into fea/jitkernel
7 years ago
tensor-tang 2513b2cc4e fix bug vtanh
7 years ago
tensor-tang cf8c8e72bd add vtanh and unit test
7 years ago
tensor-tang b37fe30417 Merge pull request #13690 from wangguibao/fix_cpu_lstm_compute_cc
7 years ago
dzhwinter 26771f41ba "fix compile error" (#13579)
7 years ago
tensor-tang d10a9df7b8 add vaddbias and unit test
7 years ago
tensor-tang 3c8b651187 add vsigmoid avx implementations and unit test
7 years ago
tensor-tang 55e44761fb refine code and init vsigmoid
7 years ago
wangguibao 1940bc2d83 Avoid multiple definitions of lstm_compute_ctht when linking libpaddle_fluid.so
7 years ago
sneaxiy 584c3f048f fix sparse rmsprop
7 years ago
Yu Yang 8e3fdc6e65 Fix SetDevice on init
7 years ago
Yu Yang 524f6e9b36 Refine code
7 years ago
Dun 161c3e31f7 Optimization of Kernels that related to DeepLabv3+ (#13534)
7 years ago
tensor-tang 2d0ff6a3c2 add vexp and unit test
7 years ago
tensor-tang b3c63f40fa add vscal and unit test
7 years ago
tensor-tang 0987f2b4d9 add vadd unit test
7 years ago
tensor-tang 3d928d4f9d refine and seepdup
7 years ago
tensor-tang 77fc42d2d1 Merge remote-tracking branch 'ups/develop' into fea/jitkernel
7 years ago
tensor-tang 2937314d8e refine vmul and test
7 years ago
tensor-tang 6c986e127a fix macro and add vmul unit test
7 years ago
Yu Yang 0be1582df0 Merge pull request #13525 from reyoung/fix_mixed_vector
7 years ago
tensor-tang 8c69764d12 add vmul unit tests
7 years ago
tensor-tang 084893a9a9 add vadd kernel
7 years ago
tensor-tang eeff268a6c clean and refine kernels
7 years ago
tensor-tang dee5d35c20 refine vmul
7 years ago
tensor-tang 92031968d7 init vmul kernel
7 years ago
tensor-tang b9acbcc8c5 init lstm kernel
7 years ago
tensor-tang c260bf942d init jit kernel
7 years ago
Yu Yang 3043f51b3a Merge pull request #13511 from reyoung/fix_ce
7 years ago
Yu Yang f7af695801 Merge pull request #13505 from reyoung/fix_selected_rows_functor_test
7 years ago
Yu Yang 6d2c6f96f1 Revert "Revert "Merge pull request #13431 from chengduoZH/refine_lod""
7 years ago
Yu Yang a6c8d6b9a2 Revert "Merge pull request #13431 from chengduoZH/refine_lod"
7 years ago
Zeng Jinle 7f1e312677 Merge pull request #13456 from sneaxiy/refine_sparse_adam
7 years ago
Yu Yang b5996fa124 Fix unstable selected_rows_functor_test.cu
7 years ago
sneaxiy a29b4227eb fix sparse gradient clip
7 years ago
Yihua Xu 87086b1386 Refine activation for GRU operator (#13275)
7 years ago
chengduo d402234ba8 Feature/op_fuse_pass (#12440)
7 years ago
Yu Yang 2c31ea9293 Merge pull request #13424 from chengduoZH/refine_seq_concat
7 years ago
Yu Yang 5996e224fa Merge pull request #13430 from chengduoZH/refine_seq_pool
7 years ago
sneaxiy b6f61faf13 fix adam
7 years ago
chengduoZH 6534f8527a Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into refine_lod
7 years ago
chengduoZH 24459501fe Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into refine_seq_concat
7 years ago
chengduoZH f92b07f0b5 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into refine_seq_pool
7 years ago
gongweibao 0c8c0d943f fix macunittest (#13434)
7 years ago
chengduoZH cdb9605bad refine
7 years ago
chengduoZH cacf549e8a refine seq_pool
7 years ago
chengduoZH e7940141ce refine seq_concat
7 years ago
tensor-tang 7c8730824a Merge pull request #13396 from tensor-tang/refine/op/lstm
7 years ago
Tao Luo 40c54db301 Merge pull request #13338 from bingyanghuang/bingyang/seq_pool_memcpy
7 years ago
tensor-tang e09cf031a8 refine src and header
7 years ago
bingyanghuang 76553c5a6d fix travis-ci
7 years ago
tensor-tang bc9971dd6c fix deps
7 years ago
tensor-tang ff858d35ed fix bug and enable on batch mode as well
7 years ago
tensor-tang 8dea07f209 fix comopile
7 years ago
tensor-tang 612ba41aee add simple lstm compute
7 years ago
bingyanghuang 83394bab3e modified by luotao's suggestion
7 years ago
Bai Yifan faf8ad2436 Add ignore_index in cross_entropy op (#13217)
7 years ago
bingyanghuang 1454cd54aa pre-commit check
7 years ago
bingyanghuang 7429067ab3 clean code
7 years ago
bingyanghuang cdbc5e7353 Add some comments
7 years ago
bingyanghuang 53185fde11 Rewrite sequence pooling last and first mode with memcpy and clean code
7 years ago
dzhwinter 379b471ee2 squash commit
7 years ago
dzhwinter f05520060e fix style (#13142)
7 years ago
tensor-tang f38905a6e5 Merge remote-tracking branch 'ups/develop' into optimize/op/fusion_gru
7 years ago
dzhwinter 34757efb8e fix windows compile
7 years ago
dzhwinter dbe90cc0f6 merge develop branch
7 years ago
dzhwinter ab1097cd8e Feature/template (#13093)
7 years ago
tensor-tang 7bdd11d88e Merge branch 'develop' into optimize/op/fusion_gru
7 years ago
tensor-tang b0d36c4c3d add cross vec to speedup gru
7 years ago
chengduo 3bd1d22a7d Enhance fused_elementwise_activation_op (#12837)
7 years ago
tensor-tang 2d0ddf8c41 refine cpu gru batch mode
7 years ago
tensor-tang 70d3981220 add cpu vec bias sub
7 years ago
tensor-tang d941192e74 fix gcc53 on cpu vec (#13020)
7 years ago
tensor-tang 2328a69157 Merge pull request #13012 from tensor-tang/refine/seq2batch
7 years ago
tensor-tang fd4f7c3ab5 refine seq2batch
7 years ago
fengjiayi 7e0c9f50ae Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into dev_sequence_padding_op
7 years ago
fengjiayi 9cb455fa7d update function
7 years ago
Zeng Jinle ef7bd03a03 Merge pull request #12964 from sneaxiy/fix_concat_sync
7 years ago
qingqing01 1f09bc320c Support data type int8_t . (#12841)
7 years ago
dzhwinter cd8f3e9ed0 operator module is done
7 years ago
chengduo 3e1050a2e8 Add pad_constant_like_op (#12943)
7 years ago
dzhwinter 6cc7870517 fix concat synchronization bug
7 years ago
dzhwinter 2ec589a24e float.h fixed
7 years ago
dzhwinter 7dceb8a080 check some operators
7 years ago
dzhwinter 26dbe35c54 add msvc flags and copy lib done
7 years ago
Qiao Longfei 3c58b87b45 fix auc layer and add check for auc op (#12954)
7 years ago
dzhwinter d7f98f37a7 more platform is done
7 years ago
dzhwinter eca4563e5d operators module (#12938)
7 years ago
dzhwinter a94d4f51a8 fix math_function compile
7 years ago
tensor-tang 7bdaf09664 Merge remote-tracking branch 'ups/develop' into refine/jit
7 years ago
tensor-tang 3462c29940 refine add bias with avx
7 years ago
dzhwinter c1ad52f768 pre-commit
7 years ago
dzhwinter 89f95ea25e merge develop branch
7 years ago
tensor-tang bb9f98e10d add inplace test
7 years ago
tensor-tang f269614bcd further optimize tanh with avx and mkl
7 years ago
luotao1 2b4edacca0 enhance the forward of concat op
7 years ago
dzhwinter 34f8c9b6f5 windows port
7 years ago
tensor-tang 7a4924cd44 further optimize sigmoid with avx and avx512
7 years ago
tensor-tang 6bd89ba5b6 fix typo
7 years ago
tensor-tang e3bb98eb38 optimize relu with avx and avx512
7 years ago
tensor-tang 25976fe736 optimize the sigmoid and tanh
7 years ago
tensor-tang 2eb46c2b06 add cpu vec test
7 years ago
tensor-tang f0f06992c1 Merge pull request #12878 from tensor-tang/feature/op/attention_lstm
7 years ago
fengjiayi f4a4a4cbd9 add op comment and python layer
7 years ago
tensor-tang 5ca0bb9aad support more activation type and remove some comments
7 years ago
tensor-tang ec59f0d454 add cpu vec
7 years ago
tensor-tang cf5ea925c3 fix bugs
7 years ago
tensor-tang 3dd66390b2 add blas vexp
7 years ago
tensor-tang 0ec1f65cf1 fix blas dot and add cblas scal
7 years ago
tensor-tang a2203d0466 add cblas dot
7 years ago
tensor-tang f72ab8961e refine blas gemm
7 years ago
Yu Yang 3768677980 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/process_lod_grad
7 years ago
Yu Yang 2a36ad1a96 Handle LoD for concat & seq_softmax ops
7 years ago
fengjiayi ce182d9037 bug fix
7 years ago
Tao Luo d04ef276a5 Merge pull request #12745 from tensor-tang/refine/op/elewise_mul
7 years ago
fengjiayi 34b209cffa Complete sequence_padding GPU kernel
7 years ago
tensor-tang b090479409 Merge remote-tracking branch 'ups/develop' into feature/op/fusion_lstm
7 years ago
fengjiayi 8d8d48a34f Complete sequence_pad_op and its CPU kernel. Add unittests
7 years ago
dzhwinter 4069262f0e Revert ""cherry picked operators changes" (#12184)" (#12747)
7 years ago
fengjiayi 3c749fae43 update CPU sequence_padding functor
7 years ago
tensor-tang 92890ac258 Merge remote-tracking branch 'ups/develop' into feature/op/fusion_lstm
7 years ago
tensor-tang 6644ce79a5 add mklml vmul
7 years ago
tensor-tang ff92b6ba81 Merge pull request #12531 from tensor-tang/refine/op/gru
7 years ago
tensor-tang a72f68f223 Merge remote-tracking branch 'ups/develop' into feature/op/fusion_lstm
7 years ago
tensor-tang f3cd2612ae refine fc and use the fc compute in fusion_lstm
7 years ago
dzhwinter bf3c34960f "cherry picked operators changes" (#12184)
7 years ago
fengjiayi a38a8db928 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into dev_sequence_padding_op
7 years ago
tensor-tang 3bf3e77ac8 Merge remote-tracking branch 'ups/develop' into refine/op/gru
7 years ago
chengduo 7c8b69c700 Feature/op fusion (#12240)
7 years ago
tensor-tang 54c95e49f0 fix blas
7 years ago
tensor-tang 8c23f7c4f0 fix blas and use packed weight
7 years ago
tensor-tang 43cee33a23 add mkl packed gemm
7 years ago
tensor-tang d8d2dbcfac further optimize im2col using variables
7 years ago
tensor-tang 687a322267 Merge remote-tracking branch 'ups/develop' into refine/im2col
7 years ago
tensor-tang 65d418f060 complete im2col with padding==1 and speedup filter width==1
7 years ago
tensor-tang 52eb86e30f refine im2col benchmark
7 years ago
tensor-tang 3017f46076 add more test cases
7 years ago
tensor-tang 8d6be4fb5f refine im2col test and add benchmark
7 years ago
tensor-tang 507c143047 im2col cfo cpu code clean
7 years ago
tensor-tang 4eeed0b5e4 refine width padding and enable core copy
7 years ago
Wu Yi 73fcfc06ec refine conv cudnn enforce (#12353)
7 years ago
tensor-tang e3131e2d73 enable width padding
7 years ago
tensor-tang 92518c519f reuse sizes saving time
7 years ago
tensor-tang 660df122ce enable padding!=0 and fill height padding with 0
7 years ago
tensor-tang d8e00facf7 reuse im_size
7 years ago
tensor-tang b72befc5cc reuse copy size
7 years ago
tensor-tang 6788af4bf1 refine test cases
7 years ago
tensor-tang b163e601b6 add gtest
7 years ago
tensor-tang aae994fd26 refine im2col no padding
7 years ago
Yan Chunwei 02cf54d331 bugfix lod cpu performance (#12297)
7 years ago
tensor-tang fc2b578842 add gemm_warp test
7 years ago
tensor-tang a916c52579 refine gemm
7 years ago
tensor-tang 961e754c9f mkl split gemm for better perf
7 years ago
tensor-tang f0cd493c0d Merge pull request #11989 from tensor-tang/feature/libxsmm
7 years ago
Guo Sheng da3f766821 Merge pull request #12088 from guoshengCS/complete-hsigmoid
7 years ago
guosheng 4ee069fdba Fix the HierarchicalSigmoidGradOpKernel and refine the codes. Now hsigmoid_op is same with V2 implementation and can pass gradient check.
7 years ago
tensor-tang 1c5d6c5692 disable xsmm with float16
7 years ago
tensor-tang c9ba51ead8 Merge remote-tracking branch 'ups/develop' into feature/libxsmm
7 years ago
tensor-tang 64a8e6d20e refine the threshold functions
7 years ago
lemon34 29145e1e31 change im2sequence for ctc batch inference (#11696)
7 years ago
guosheng e7a4cfc0ff complete the hsigmoid_op
7 years ago
guosheng d695381677 Merge branch 'develop' of https://github.com/PaddlePaddle/paddle into complete-hsigmoid
7 years ago
tensor-tang 6bc1aaaac7 refine the ColMajor replacement
7 years ago
tensor-tang de856da9a6 fix ColMajor and RowMajor replacement
7 years ago
tensor-tang 21516e5cbe add unit test of smm
7 years ago
tensor-tang c3941745b3 add libxsmm_gemm
7 years ago
tensor-tang 7782a4ab53 fix blas build issue
7 years ago
tensor-tang 17987eb3fc link libxsmm
7 years ago
tensor-tang 3df99e72ab Merge remote-tracking branch 'ups/develop' into refine/set_num_threads
7 years ago
dzhwinter 4ed0b62476 Move fluid::framework::InitDevices into fluid::platform (#11757)
7 years ago
dzhwinter 99a99ec7e3 "remove lapack" (#11966)
7 years ago
Xin Pan a9086bf320 also move a few other dir to legacy/
7 years ago
tensor-tang e3a96300bb move SetNumThreads to platform
7 years ago
tensor-tang 1f09ddf806 Merge remote-tracking branch 'ups/develop' into refine/mklml/dyload
7 years ago
Tao Luo bfe5dc6312 Merge pull request #11607 from chengduoZH/fix_concat_warning
7 years ago
chengduoZH 804c767107 fix concat warning
7 years ago
tensor-tang f503f12925 enable dynamic load mklml lib on fluid
7 years ago
fengjiayi 12619fcf90 fix a compile error
7 years ago
qiaolongfei 762160bd8c fix concat grad kernel
7 years ago
qingqing01 9c90dc9728 Make the CUDA kernel of concat correct and fix unit tests. (#11541)
7 years ago
qiaolongfei ad1ad738d8 add gpu support for concat
7 years ago
qiaolongfei 9c128fe656 concat support data as input
7 years ago
weixing02 ee13b396f2 fix some errors
7 years ago
weixing02 8bd148dc00 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into hsigmoid_op
7 years ago
tensor-tang 9169b3b802 Merge pull request #10789 from Xreki/core_fix_openblas_threads
7 years ago
guochaorong 04b8d3d03c Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into paddle_fix
7 years ago
guochaorong 0fec9469f9 fix some bugs introduced by unfreed memory
7 years ago
weixing02 3e46ec41a9 add hsigmoid
8 years ago
qingqing01 3ba75d4a69 Check label range in cross entropy calculation. (#10954)
8 years ago
Tomasz Patejko e43c8f33cd MKL elementwise add: elementwise_add uses vAdd VML function when MKL is used
8 years ago
yangyaming 10ec329b7d Refine code.
8 years ago
Liu Yiqun 50ba205d79 Merge branch 'develop' into core_fix_openblas_threads
8 years ago
Liu Yiqun 39eb871ddf Add an interface to set the number of threads for math function, and set the default value to 1 for inference.
8 years ago
yuyang18 fd2b4b478e Make tensor support uint8
8 years ago
Yiqun Liu b7026f79a9 Fix a bug related to dispensable inputs and refine the inference unittest (#10527)
8 years ago
yangyaming 0797246704 Enhance sequence_padding functor (CPU and GPU).
8 years ago
yuyang18 66590a0b88 Fix typo in blas_impl.h
8 years ago
yuyang18 27197290dc matmul support float16/double
8 years ago
Yu Yang fcd31d6161 Follow comments and polish code names
8 years ago
Yu Yang 0a13d3c67a Move MatMul to blas_impl.h
8 years ago
Yu Yang 3dd01823a8 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/clean_matmul
8 years ago
Yu Yang c6a6d87f96 Rewrite Matmul, make code cleaner
8 years ago
fengjiayi b708ec0ae1 Merge pull request #10412 from JiayiFeng/correct_TensorCopy_misuse
8 years ago
Darcy 8f8a4768dc adding device_context to blas deps list (#10420)
8 years ago
fengjiayi 0c99cd7bbb fix errors in sequence_padding_test
8 years ago
Siddharth Goyal b65282168c Fix cpplint errors in lstm kernel (#10394)
8 years ago
fengjiayi e309f42293 fix errors in concat_test
8 years ago
Yu Yang 0285a2b95d Merge pull request #10371 from reyoung/refine_code
8 years ago
Abhinav Arora c9f55dfafc Fix CPPLint issues in /math/detail/gru_kernel.h (#10390)
8 years ago
Yu Yang ef6ea790dc Clean and extract blas
8 years ago
Yu Yang 815d888468 Clean MatMul
8 years ago
Yu Yang bc8160350b Fix compile
8 years ago
Yu Yang a6edeb39b3 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/clean_blas
8 years ago
chengduo 4fbde42cdf Fix __shfl_down_sync_ of cross_entropy (#10345)
8 years ago
Yu Yang caa4027d9d Follow comments
8 years ago
Abhinav Arora 1945b729b6 Fix CPPLint issues with math/sequence_padding (#10317)
8 years ago
chengduo 9bcd9f661b fix cpplint error (#10329)
8 years ago
Yu Yang 4db43c6c9f Naive implement cblas
8 years ago
Yu Yang 60d6348e69 Revert develop
8 years ago
Yu Yang 86af6bdc81 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/clean_blas
8 years ago
Yu Yang 49dedfad17 Polish code and tests
8 years ago
Abhinav Arora 738585476d Fix more CPPLint issues in fluid/operators/math (#10276)
8 years ago
dzhwinter eb6f9dd5de Feature/cuda9 cudnn7 (#10140)
8 years ago
Yu Yang c888e01660 Refactor GEMM in blas
8 years ago
Abhinav Arora e735359631 Fix more CPPlint issues in fluid/operators/math (#10249)
8 years ago
fengjiayi 71fa3ca9c4 Merge pull request #10232 from JiayiFeng/fix_unittests
8 years ago
fengjiayi 30f9dc92e5 fix errors
8 years ago
fengjiayi 330fa95cbd Follow comments
8 years ago
Abhinav Arora 83b1a8f6bf Pending more CPPLint errors in fluid/operators/math (#10243)
8 years ago
fengjiayi bcf260e1e8 fix several unit tests
8 years ago
Abhinav Arora f457d5da06 Fix more CPPLint errors (#10218)
8 years ago
Yu Yang 580dad0c2c Fix compile when there is no mkl
8 years ago
Yu Yang 2a06e307d0 Fix batch_gemm bugs
8 years ago
Kexin Zhao 92913027fc fix unused var error (#9908)
8 years ago
Kexin Zhao 617e790a59 fix cuda 7.5 compile error (#9885)
8 years ago
Kexin Zhao 7ed457e77a Fix cuda 7.5 error with cublas GEMM (#9811)
8 years ago
Kexin Zhao b2a1c9e8b7 Add float16 support to non-cudnn softmax op on GPU (#9686)
8 years ago
Kexin Zhao d00bd9eb72 Update the cuda API and enable tensor core for GEMM (#9622)
8 years ago
chengduoZH e099b18045 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/add_CUDAPinnedPlace
8 years ago
Yang Yu af230d9bef Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into cpp_parallel_executor
8 years ago
dzhwinter 8425c2c859 Speed/sequence op1 (#9217)
8 years ago
Yang Yu b0775588c0 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into cpp_parallel_executor
8 years ago
chengduoZH ab601c19c3 Add CUDAPinnedPlace
8 years ago
Luo Tao 6332bd1ed8 Merge branch 'develop' into infer_mkl
8 years ago
Yu Yang 50e7e25db3 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into cpp_parallel_executor
8 years ago
chengduoZH aca9180a76 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/fix_concat
8 years ago
chengduoZH 750aff10ce code refine
8 years ago
chengduoZH 043f47b27f fix concat op
8 years ago
Luo Tao ae820a34bc Merge branch 'develop' into infer_mkl
8 years ago
Tao Luo 9126e626fc Merge pull request #9165 from ROCmSoftwarePlatform/amd_cmake_01
8 years ago
Kexin Zhao 4eaa789730 resolve conflict
8 years ago
Kexin Zhao ed2bc194c5 Merge pull request #9176 from kexinzhao/batch_norm_fp16
8 years ago
Kexin Zhao 70e7122785 initial commit
8 years ago
sabreshao e50205e744 CMake refine for HIP support.
8 years ago
Yang yaming 381c6a026d Merge pull request #9100 from pkuyym/fix-9049
8 years ago
yangyaming 2f2c5f5e60 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into fix-9049
8 years ago
Xi Chen 9eae086e39 add math_function to softmax's dep list
8 years ago
Yu Yang 9cb8f50302 Complete fetch op
8 years ago
Kexin Zhao 39c676e208 initial commit
8 years ago
xuwei06 ab3543e35e Fix compilation for gcc5.4
8 years ago
yangyaming bf3f56e899 Finish adaption for backward.
8 years ago
sabreshao 45c988d86a Demostration of cmake refine for HIP support.
8 years ago
Tao Luo a448fbe9e1 Merge pull request #9134 from putcn/fix-selected-row-dep
8 years ago
qingqing01 7c1a0b77a0 Delete the detection_output_op, which had been split into several operators. (#9121)
8 years ago
Xi Chen d20c6eb6de add math_function to selected_rows_functor dependency list
8 years ago
dzhwinter 128adf53cb [Speed]implement cudnn sequence softmax cudnn (#8978)
8 years ago
Luo Tao de13f0eb4e Merge branch 'develop' into infer_mkl
8 years ago
Kexin Zhao 3b44b849d3 address comments
8 years ago
Kexin Zhao 95de7617eb fix bug
8 years ago
Kexin Zhao 1998d5afa2 add gpu info func to get compute cap
8 years ago
Kexin Zhao d400b4192d fix math function arch mismatch for older GPU
8 years ago
kexinzhao 90215b7844 Add float16 GEMM math function on GPU (#8695)
8 years ago
Luo Tao bc0cfb2283 remove PADDLE_USE_ATLAS
8 years ago
Luo Tao 49f3f1db07 add back framework_proto depends
8 years ago
Luo Tao 3ddc997182 rename concat_functor to concat, refine CMakeLists based on comments
8 years ago
Luo Tao 1ef97fa7b1 Merge branch 'develop' into math_function
8 years ago
chengduo 84aea8a8a1 Merge pull request #8669 from chengduoZH/feature/concat_op
8 years ago
kexinzhao 266ccaa843 Integrate float16 into data_type_transform (#8619)
8 years ago
chengduoZH 131ec276ed fix bug for big number; float->double and code refine
8 years ago
chengduoZH 82bd82c186 follow comments and refine code
8 years ago
chengduoZH 00e596edbe get max threads of GPU
8 years ago
Luo Tao f67275a920 refine operator/math/CMakeLists.txt, seperate im2col from math_function
8 years ago
chengduoZH 60e7ee0611 refine concat_op
8 years ago
Yi Wang cfffb1a362 Update tensor_util.h (#8422)
8 years ago
qingqing01 24509f4af9 Fix the grammar in copyright. (#8403)
8 years ago
Yi Wang fc374821dd Correct #include path
8 years ago
Yi Wang 90648f336d Move file to fluid/; Edit CMakeLists.txt
8 years ago