Commit Graph

673 Commits (b34933d9ee3b61dbbd642fd02f244c36d0d14550)

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Kaipeng Deng | 99c78b772a | fix softmax axis!=-1. test=develop (#19800) | 6 years ago |
| Huihuang Zheng | 12542320c5 | Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989) | 6 years ago |
| Yiqun Liu | a65c728e5d | Implement the GPU kernel of fc operator (#19687) | 6 years ago |
| 123malin | 2f037c3189 | fix the diff between async mode and async_half mode (#19535) | 6 years ago |
| Tao Luo | 3ae939e48a | unify PADDLE_ASSERT_MSG into PADDLE_ENFORCE(error_message) (#19631) | 6 years ago |
| Tao Luo | d6c85c96dc | paddle::framework::vectorize() templatization (#19627) | 6 years ago |
| Tao Luo | 0a46d34538 | refine some PADDLE_ENFORCE codes for unify PADDLE_ASSERT_MSG (#19607) | 6 years ago |
| Tao Luo | 75d1571995 | refine PADDLE_ENFORCE codes for unify PADDLE_ASSERT_MSG (#19603) | 6 years ago |
| Tao Luo | 49523ea189 | replace PADDLE_ASSERT with PADDLE_ASSERT_MSG (#19586) | 6 years ago |
| zhouwei25 | 84c728013c | fix the compilation issue on windows caused by mkl_CSRMM (#19533) | 6 years ago |
| Zeng Jinle | 11f2f78458 | fix sofmax seg fault in AVX, test=develop (#19487) | 6 years ago |
| Yihua Xu | b920395842 | Use sparse matrix to implement fused emb_seq_pool operator (#19064) | 6 years ago |
| silingtong123 | af0fbd9012 | change PADDLE_ENFORCE to PADDLE_ENFORCE_CUDA_SUCCESS (#19205) | 6 years ago |
| LielinJiang | 22fa4c2d24 | Fix depthwise conv gpu kernel bug (#18582) | 6 years ago |
| Bob Zhu | 220eef602e | Extend Matmul to support matrix multiplication with multiple heads (#18570) | 6 years ago |
| Zeng Jinle | f5641000bb | Add a unittest to inplace elementwise_add (#18385) | 6 years ago |
| Hongyu Liu | df2eee71d8 | Sequence mask support tensor (#18249) | 6 years ago |
| Yiqun Liu | 660c1a65f3 | Optimize fused_elewise_activation_grad op. (#18041) | 6 years ago |
| Yiqun Liu | 7e463c84a6 | Optimize the concat and split cuda implementation for cases when the number of inputs/outputs is less than 5. (#17979) | 6 years ago |
| Yibing Liu | 33d1e56506 | Enable seq_pool op to accept len 0 input (#17284) | 6 years ago |
| Yiqun Liu | 8fd39f3e99 | Enhance fused_elementwise_activation op and add python api in contrib.layers (#17236) | 6 years ago |
| Yiqun Liu | 5782dddad0 | Optimize the concat and split kernel for specical cases when the number of inputs/outputs is 2 (#17415) | 6 years ago |
| tensor-tang | 7ae461eb13 | [CPU] refine cpu softmax bwd (#17534) | 7 years ago |
| tensor-tang | 0600b370ea | [CPU] refine softmax op fwd on CPU (#17522) | 7 years ago |
| liuwei1031 | ba70cc499e | fix security bugs : (#17464) | 7 years ago |
| zhaoyuchen2018 | b02f2aff04 | Add conditional compile for gru opt (#17368) | 7 years ago |
| Krzysztof Binias | 0823a7bc8b | Optimize the sequence padding op (#17403) | 7 years ago |
| zhaoyuchen2018 | 8a2caacdbc | improve gru unit performance. (#16338) | 7 years ago |
| Kaipeng Deng | a71d8fdb87 | Softmax_cross_entropy op add axis (#16806) | 7 years ago |
| Yibing Liu | 3c375751f8 | Support seq len equal to 0 in sequence ops (#16935) | 7 years ago |
| Kevin | c474e7ddf5 | fix overflow by int32 mul test=develop (#16794) | 7 years ago |
| Qiao Longfei | faae1b4170 | fix cpplint test=develop | 7 years ago |
| Qiao Longfei | 0a8ff2ecd4 | add cpu_merge_add_multi_noduplicated_test test=develop | 7 years ago |
| Qiao Longfei | 920a960974 | optimize merge add if input rows of all selected rows is not duplicated | 7 years ago |
| Qiao Longfei | baf02328b2 | Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor-communicator | 7 years ago |
| Kaipeng Deng | 54474637ae | Merge pull request #16057 from heavengate/softmax_axis | 7 years ago |
| Qiao Longfei | 30618409db | Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor-communicator | 7 years ago |
| dengkaipeng | 90bd038d35 | fix format. test=develop | 7 years ago |
| phlrain | 1580be5d6c | fix sequence pad; test=develop | 7 years ago |
| dengkaipeng | 93701dba50 | add jit kernel for softmax axis. test=develop | 7 years ago |
| dengkaipeng | 6c64182709 | refine softmax kernel. test=develop | 7 years ago |
| phlrain | 802b33489a | remove resize then seq num == 1; test=develop | 7 years ago |
| sneaxiy | 5a92e4c097 | revert revert 16144 | 7 years ago |
| Zeng Jinle | a91964c8fe | Revert "PaddingRNN model memory optimize" | 7 years ago |
| Zeng Jinle | 0b49e43d3a | Merge pull request #16144 from sneaxiy/rnn_mem_opt | 7 years ago |
| sneaxiy | b26e9bd232 | refine code | 7 years ago |
| tensor-tang | 6ff230a624 | Merge remote-tracking branch 'ups/develop' into refine/jit | 7 years ago |
| tensor-tang | 14a764c930 | simplify the jitkernel templates and tests | 7 years ago |
| Yiqun Liu | 5bde120243 | Make parent_idx a dispensable output for beam_search op to support models saved by older paddle version. (#16106) | 7 years ago |
| tensor-tang | 802f362ac4 | unify the kernelfuncs cache and add unit test | 7 years ago |