702 Commits (310edc0d0c9050f1f01c108655493c1935c00214)

Author SHA1 Message Date
GaoWei8 d4dda8628e optimize fc jit (#21878)
5 years ago
GaoWei8 5af0c7ba89 Modify padding strategy: remove weight copy in fc padding (#21650)
5 years ago
Tao Luo 01fa4ead61 fix -Wno-error=sign-compare warning in gcc8 (#21434)
5 years ago
Tao Luo c0656dcb1a remove -Wno-error=sign-compare, make warning as error (#21358)
5 years ago
GaoWei8 8493f20ebc Polish the codes of fc when needs padding (#21378)
5 years ago
GaoWei8 234060f88f Add fc padding to improve mkl GEMM's performance when N and K are multiple of 128. (#20972)
5 years ago
Liufang Sang f0b1518438 add dequantize_abs_max op and modify lookup_table op (#20899)
5 years ago
whs cfdd1fc2cd Fix warpctc in padding mode. (#21033)
5 years ago
lilong12 e249d9a3e2 fix the computation for dx (grad for x) for prelu operation. (#20949)
5 years ago
Chen Weihang 2f27b10331 Add dependency for error_codes.proto (#21084)
5 years ago
zhaoyuchen2018 0059404e77 Fix ce ocr_recognition test fails (#20987)
5 years ago
Tao Luo 25ffa8445d refine murmurhash3_x64_128 for bloom_filter (#20996)
5 years ago
zhaoyuchen2018 7f3a445e9a Fix gru as small frame_size has error. (#20922)
5 years ago
Zhang Ting 8d1e9f0f7e maxout supports channel_last input (#20846)
5 years ago
Zhang Ting c18f1bd716 fix the bug of conv_transpose:compatible with Anylayout setting, test=develop (#20897)
5 years ago
zhang wenhui d428912503 fix select_rows mergeadd bug, test=develop (#20876)
5 years ago
Aurelius84 aacd16dbb4 add pyramid_hash_op (#20698)
5 years ago
Pei Yang e89c16b90d Bug Fix: Paddle-TRT cannot handle adaptive pooling in pool2d op converter and "num" attribute in split op converter (#20733)
5 years ago
qingqing01 01eddc1a04 Support fp16 in GPU impl of fused_elemwise_activation_op. (#20636)
5 years ago
Zhang Ting 78910480c1 fix conv_transpose's bug: compatible with Anylayout setting, test=develop (#20589)
5 years ago
liym27 ad60b3b8ac mv two function in conv op for good code style (#20116)
5 years ago
Zhang Ting cf6919bf6e conv_transpose supports channel_last input, test=develop, test=document_preview (#20072)
5 years ago
danleifeng 425279a57b Improve elementwise operators performance in same dimensions. (#19763)
5 years ago
liym27 3aa331d97e fix conv2d and conv3d: (#20042)
5 years ago
liym27 24010472d4 fix pool2d pool3d,support asymmetric padding and channel_last (#19739)
5 years ago
chengduo fb2a9cdf83 Add fp16 support for pad and split (#19881)
5 years ago
Bob Zhu c670058a8d add support of matmul with multiple head even different width and height (#19708)
5 years ago
Kaipeng Deng 3f021781a1 fix softmax CE time limit check failed (#19846)
5 years ago
Aurelius84 fcf53e55ff support 2-level lod of input in sequence_pool (#19839)
6 years ago
Kaipeng Deng 99c78b772a fix softmax axis!=-1. test=develop (#19800)
6 years ago
Huihuang Zheng 12542320c5 Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989)
6 years ago
Yiqun Liu a65c728e5d Implement the GPU kernel of fc operator (#19687)
6 years ago
123malin 2f037c3189 fix the diff between async mode and async_half mode (#19535)
6 years ago
Tao Luo 3ae939e48a unify PADDLE_ASSERT_MSG into PADDLE_ENFORCE(error_message) (#19631)
6 years ago
Tao Luo d6c85c96dc paddle::framework::vectorize() templatization (#19627)
6 years ago
Tao Luo 0a46d34538 refine some PADDLE_ENFORCE codes for unify PADDLE_ASSERT_MSG (#19607)
6 years ago
Tao Luo 75d1571995 refine PADDLE_ENFORCE codes for unify PADDLE_ASSERT_MSG (#19603)
6 years ago
Tao Luo 49523ea189 replace PADDLE_ASSERT with PADDLE_ASSERT_MSG (#19586)
6 years ago
zhouwei25 84c728013c fix the compilation issue on windows caused by mkl_CSRMM (#19533)
6 years ago
Zeng Jinle 11f2f78458 fix sofmax seg fault in AVX, test=develop (#19487)
6 years ago
Yihua Xu b920395842 Use sparse matrix to implement fused emb_seq_pool operator (#19064)
6 years ago
silingtong123 af0fbd9012 change PADDLE_ENFORCE to PADDLE_ENFORCE_CUDA_SUCCESS (#19205)
6 years ago
LielinJiang 22fa4c2d24 Fix depthwise conv gpu kernel bug (#18582)
6 years ago
Bob Zhu 220eef602e Extend Matmul to support matrix multiplication with multiple heads (#18570)
6 years ago
Zeng Jinle f5641000bb Add a unittest to inplace elementwise_add (#18385)
6 years ago
Hongyu Liu df2eee71d8 Sequence mask support tensor (#18249)
6 years ago
Yiqun Liu 660c1a65f3 Optimize fused_elewise_activation_grad op. (#18041)
6 years ago
Yiqun Liu 7e463c84a6 Optimize the concat and split cuda implementation for cases when the number of inputs/outputs is less than 5. (#17979)
6 years ago
Yibing Liu 33d1e56506 Enable seq_pool op to accept len 0 input (#17284)
6 years ago
Yiqun Liu 8fd39f3e99 Enhance fused_elementwise_activation op and add python api in contrib.layers (#17236)
6 years ago