684 Commits (569951c418fb3c9f82cbdde9fda3910cc7033bff)

Author SHA1 Message Date
qingqing01 01eddc1a04 Support fp16 in GPU impl of fused_elemwise_activation_op. (#20636)
6 years ago
Zhang Ting 78910480c1 fix conv_transpose's bug: compatible with Anylayout setting, test=develop (#20589)
6 years ago
liym27 ad60b3b8ac mv two function in conv op for good code style (#20116)
6 years ago
Zhang Ting cf6919bf6e conv_transpose supports channel_last input, test=develop, test=document_preview (#20072)
6 years ago
danleifeng 425279a57b Improve elementwise operators performance in same dimensions. (#19763)
6 years ago
liym27 3aa331d97e fix conv2d and conv3d: (#20042)
6 years ago
liym27 24010472d4 fix pool2d pool3d,support asymmetric padding and channel_last (#19739)
6 years ago
chengduo fb2a9cdf83 Add fp16 support for pad and split (#19881)
6 years ago
Bob Zhu c670058a8d add support of matmul with multiple head even different width and height (#19708)
6 years ago
Kaipeng Deng 3f021781a1 fix softmax CE time limit check failed (#19846)
6 years ago
Aurelius84 fcf53e55ff support 2-level lod of input in sequence_pool (#19839)
6 years ago
Kaipeng Deng 99c78b772a fix softmax axis!=-1. test=develop (#19800)
6 years ago
Huihuang Zheng 12542320c5 Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989)
6 years ago
Yiqun Liu a65c728e5d Implement the GPU kernel of fc operator (#19687)
6 years ago
123malin 2f037c3189 fix the diff between async mode and async_half mode (#19535)
6 years ago
Tao Luo 3ae939e48a unify PADDLE_ASSERT_MSG into PADDLE_ENFORCE(error_message) (#19631)
6 years ago
Tao Luo d6c85c96dc paddle::framework::vectorize() templatization (#19627)
6 years ago
Tao Luo 0a46d34538 refine some PADDLE_ENFORCE codes for unify PADDLE_ASSERT_MSG (#19607)
6 years ago
Tao Luo 75d1571995 refine PADDLE_ENFORCE codes for unify PADDLE_ASSERT_MSG (#19603)
6 years ago
Tao Luo 49523ea189 replace PADDLE_ASSERT with PADDLE_ASSERT_MSG (#19586)
6 years ago
zhouwei25 84c728013c fix the compilation issue on windows caused by mkl_CSRMM (#19533)
6 years ago
Zeng Jinle 11f2f78458 fix sofmax seg fault in AVX, test=develop (#19487)
6 years ago
Yihua Xu b920395842 Use sparse matrix to implement fused emb_seq_pool operator (#19064)
6 years ago
silingtong123 af0fbd9012 change PADDLE_ENFORCE to PADDLE_ENFORCE_CUDA_SUCCESS (#19205)
6 years ago
LielinJiang 22fa4c2d24 Fix depthwise conv gpu kernel bug (#18582)
6 years ago
Bob Zhu 220eef602e Extend Matmul to support matrix multiplication with multiple heads (#18570)
6 years ago
Zeng Jinle f5641000bb Add a unittest to inplace elementwise_add (#18385)
6 years ago
Hongyu Liu df2eee71d8 Sequence mask support tensor (#18249)
6 years ago
Yiqun Liu 660c1a65f3 Optimize fused_elewise_activation_grad op. (#18041)
6 years ago
Yiqun Liu 7e463c84a6 Optimize the concat and split cuda implementation for cases when the number of inputs/outputs is less than 5. (#17979)
6 years ago
Yibing Liu 33d1e56506 Enable seq_pool op to accept len 0 input (#17284)
6 years ago
Yiqun Liu 8fd39f3e99 Enhance fused_elementwise_activation op and add python api in contrib.layers (#17236)
6 years ago
Yiqun Liu 5782dddad0 Optimize the concat and split kernel for specical cases when the number of inputs/outputs is 2 (#17415)
6 years ago
tensor-tang 7ae461eb13 [CPU] refine cpu softmax bwd (#17534)
6 years ago
tensor-tang 0600b370ea [CPU] refine softmax op fwd on CPU (#17522)
6 years ago
liuwei1031 ba70cc499e fix security bugs : (#17464)
6 years ago
zhaoyuchen2018 b02f2aff04 Add conditional compile for gru opt (#17368)
6 years ago
Krzysztof Binias 0823a7bc8b Optimize the sequence padding op (#17403)
6 years ago
zhaoyuchen2018 8a2caacdbc improve gru unit performance. (#16338)
6 years ago
Kaipeng Deng a71d8fdb87 Softmax_cross_entropy op add axis (#16806)
6 years ago
Yibing Liu 3c375751f8 Support seq len equal to 0 in sequence ops (#16935)
6 years ago
Kevin c474e7ddf5 fix overflow by int32 mul test=develop (#16794)
6 years ago
Qiao Longfei faae1b4170 fix cpplint test=develop
6 years ago
Qiao Longfei 0a8ff2ecd4 add cpu_merge_add_multi_noduplicated_test test=develop
6 years ago
Qiao Longfei 920a960974 optimize merge add if input rows of all selected rows is not duplicated
6 years ago
Qiao Longfei baf02328b2 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor-communicator
6 years ago
Kaipeng Deng 54474637ae Merge pull request #16057 from heavengate/softmax_axis
6 years ago
Qiao Longfei 30618409db Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add-async-ssa-graph-executor-communicator
6 years ago
dengkaipeng 90bd038d35 fix format. test=develop
6 years ago
phlrain 1580be5d6c fix sequence pad; test=develop
7 years ago