Commit Graph

695 Commits (01a9646323e306c505efde60aa1c67669052de8f)

Author SHA1 Message Date
whs cfdd1fc2cd Fix warpctc in padding mode. (#21033)
6 years ago
lilong12 e249d9a3e2 fix the computation for dx (grad for x) for prelu operation. (#20949)
6 years ago
Chen Weihang 2f27b10331 Add dependency for error_codes.proto (#21084)
6 years ago
zhaoyuchen2018 0059404e77 Fix ce ocr_recognition test fails (#20987)
6 years ago
Tao Luo 25ffa8445d refine murmurhash3_x64_128 for bloom_filter (#20996)
6 years ago
zhaoyuchen2018 7f3a445e9a Fix gru as small frame_size has error. (#20922)
6 years ago
Zhang Ting 8d1e9f0f7e maxout supports channel_last input (#20846)
6 years ago
Zhang Ting c18f1bd716 fix the bug of conv_transpose:compatible with Anylayout setting, test=develop (#20897)
6 years ago
zhang wenhui d428912503 fix select_rows mergeadd bug, test=develop (#20876)
6 years ago
Aurelius84 aacd16dbb4 add pyramid_hash_op (#20698)
6 years ago
Pei Yang e89c16b90d Bug Fix: Paddle-TRT cannot handle adaptive pooling in pool2d op converter and "num" attribute in split op converter (#20733)
6 years ago
qingqing01 01eddc1a04 Support fp16 in GPU impl of fused_elemwise_activation_op. (#20636)
6 years ago
Zhang Ting 78910480c1 fix conv_transpose's bug: compatible with Anylayout setting, test=develop (#20589)
6 years ago
liym27 ad60b3b8ac mv two function in conv op for good code style (#20116)
6 years ago
Zhang Ting cf6919bf6e conv_transpose supports channel_last input, test=develop, test=document_preview (#20072)
6 years ago
danleifeng 425279a57b Improve elementwise operators performance in same dimensions. (#19763)
6 years ago
liym27 3aa331d97e fix conv2d and conv3d: (#20042)
6 years ago
liym27 24010472d4 fix pool2d pool3d,support asymmetric padding and channel_last (#19739)
6 years ago
chengduo fb2a9cdf83 Add fp16 support for pad and split (#19881)
6 years ago
Bob Zhu c670058a8d add support of matmul with multiple head even different width and height (#19708)
6 years ago
Kaipeng Deng 3f021781a1 fix softmax CE time limit check failed (#19846)
6 years ago
Aurelius84 fcf53e55ff support 2-level lod of input in sequence_pool (#19839)
6 years ago
Kaipeng Deng 99c78b772a fix softmax axis!=-1. test=develop (#19800)
6 years ago
Huihuang Zheng 12542320c5 Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989)
6 years ago
Yiqun Liu a65c728e5d Implement the GPU kernel of fc operator (#19687)
6 years ago
123malin 2f037c3189 fix the diff between async mode and async_half mode (#19535)
6 years ago
Tao Luo 3ae939e48a unify PADDLE_ASSERT_MSG into PADDLE_ENFORCE(error_message) (#19631)
6 years ago
Tao Luo d6c85c96dc paddle::framework::vectorize() templatization (#19627)
6 years ago
Tao Luo 0a46d34538 refine some PADDLE_ENFORCE codes for unify PADDLE_ASSERT_MSG (#19607)
6 years ago
Tao Luo 75d1571995 refine PADDLE_ENFORCE codes for unify PADDLE_ASSERT_MSG (#19603)
6 years ago
Tao Luo 49523ea189 replace PADDLE_ASSERT with PADDLE_ASSERT_MSG (#19586)
6 years ago
zhouwei25 84c728013c fix the compilation issue on windows caused by mkl_CSRMM (#19533)
6 years ago
Zeng Jinle 11f2f78458 fix sofmax seg fault in AVX, test=develop (#19487)
6 years ago
Yihua Xu b920395842 Use sparse matrix to implement fused emb_seq_pool operator (#19064)
6 years ago
silingtong123 af0fbd9012 change PADDLE_ENFORCE to PADDLE_ENFORCE_CUDA_SUCCESS (#19205)
6 years ago
LielinJiang 22fa4c2d24 Fix depthwise conv gpu kernel bug (#18582)
6 years ago
Bob Zhu 220eef602e Extend Matmul to support matrix multiplication with multiple heads (#18570)
6 years ago
Zeng Jinle f5641000bb Add a unittest to inplace elementwise_add (#18385)
6 years ago
Hongyu Liu df2eee71d8 Sequence mask support tensor (#18249)
6 years ago
Yiqun Liu 660c1a65f3 Optimize fused_elewise_activation_grad op. (#18041)
6 years ago
Yiqun Liu 7e463c84a6 Optimize the concat and split cuda implementation for cases when the number of inputs/outputs is less than 5. (#17979)
6 years ago
Yibing Liu 33d1e56506 Enable seq_pool op to accept len 0 input (#17284)
6 years ago
Yiqun Liu 8fd39f3e99 Enhance fused_elementwise_activation op and add python api in contrib.layers (#17236)
6 years ago
Yiqun Liu 5782dddad0 Optimize the concat and split kernel for specical cases when the number of inputs/outputs is 2 (#17415)
6 years ago
tensor-tang 7ae461eb13 [CPU] refine cpu softmax bwd (#17534)
6 years ago
tensor-tang 0600b370ea [CPU] refine softmax op fwd on CPU (#17522)
6 years ago
liuwei1031 ba70cc499e fix security bugs : (#17464)
6 years ago
zhaoyuchen2018 b02f2aff04 Add conditional compile for gru opt (#17368)
6 years ago
Krzysztof Binias 0823a7bc8b Optimize the sequence padding op (#17403)
6 years ago
zhaoyuchen2018 8a2caacdbc improve gru unit performance. (#16338)
6 years ago