Commit Graph

5729 Commits (c21a979790aebefffdde5c470dc406a2a81959e7)

Author SHA1 Message Date
123malin 03d4665f44
prefetch optimize (#29095)
4 years ago
WangXi 0c2a51d240
optimizer amp, all use fp16 communication, overlap last comm and compute (#28957)
4 years ago
Jack Zhou bc6033f86b
fix gru gcc7.4 bug for the gru compile
4 years ago
wangchaochaohu b818429ae7
optimize cumsum OP (#29193)
4 years ago
lilong12 7e5e9934fe
update expand as op to use the shape of the target tensor instead of the target tensor itself. (#29020)
4 years ago
Jack Zhou 085260f3de
Add eigen gru and fix the dropout bug in the rnn
4 years ago
arlesniak bc902044a4
Fixes mkldnn dygraph learning rate scheduler crashes (#28988)
4 years ago
Shang Zhizhou b9e76a0103
detect tensorRT plugin fp16 in runtime (#27933)
4 years ago
Noel da71173bc9
Fix ops doc for some ops
4 years ago
joanna.wozna.intel b0d1ac161e
Add bf16 pool2d and unify bf16 unit tests (#29039)
4 years ago
joejiong 582c0a0468
add uint8 for reshape op (#28996)
4 years ago
taixiurong a5aa4dc7a9
add xpu elementwise ops (#29031)
4 years ago
joejiong b04c78ef5e
Update pow (#29000)
4 years ago
wawltor b2c8a00745
remove eigen threadpool for the speed up
4 years ago
lilong12 767d0ba267
update, test=develop (#28700)
4 years ago
123malin fbf9564f6b
【paddle.distributed.fleet】Optimize ParameterServer's Async Mode (#28442)
4 years ago
furnace 8ff3550658
refactor momentum op to combine weight (#27414)
4 years ago
Jacek Czaja bd1d6d3b30
extends oneDNN caching keys so caching objects are unique to executor/predictor (#28758)
4 years ago
yaoxuefeng 71c1cd1408
fix truncated_gaussian seed (#28777)
4 years ago
gongweibao 1dad8ceaab
Fix gpu memory allocation bug. (#28703)
4 years ago
Chen Weihang b969c32ab1
fix occupied 0 device memory bug (#28771)
4 years ago
joejiong 1a532d5133
add uint8 support for squeeze operator (#28734)
4 years ago
wangchaochaohu 8b853b3030
fix the number of perf algo for conv cudnn in exhaustive mode (#28694)
4 years ago
joanna.wozna.intel 8c0ea4bffe
Add bf16 matmul, fc, elementwise add and mul (#28729)
4 years ago
yaoxuefeng 08b62f4902
fix shuffle batch op shuffle (#28533)
4 years ago
taixiurong d3d1a6b6e0
add kunlun kernel: slice, slice_grad, top_k, cast. *test=kunlun (#28542)
4 years ago
Jack Zhou 9362d85e0e
Add LSTM, Simple RNN and GRU CPU kernel (#28577)
4 years ago
QingshuChen 30ef3815b3
adjust kunlun header file (#28536)
4 years ago
Zhang Ting dab4920568
improve performance of cast op (#28727)
4 years ago
yaoxuefeng 03f46e3526
fix truncated_gaussian op cuda seed setting (#28678)
4 years ago
Wojciech Uss 04bcc13fac
Add multi_gru op and tests (#28591)
4 years ago
joejiong 32b90b1c2d
add log10 (#28576)
4 years ago
Guo Sheng 858ffa0c8b
Fix the dropout setting when not initialized in rnn_op. (#28561)
4 years ago
Jacek Czaja 6d8d3d4c22
[oneDNN] Layer norm bf16 kernel (#28619)
4 years ago
Zhou Wei bf143652ac
fix lstm OP compile error on windows (#28667)
4 years ago
石晓伟 57dab959ca
add datanorm op new scale_w register (#28657)
4 years ago
cc 65aac81191
Fix fake_quant error when cout > 1024, test=develop (#28603)
4 years ago
lilong12 b2f7ab6636
bug fix, test=develop (#28648)
4 years ago
wawltor 8f2656ef5c
fix the gradient bug for the topk v2
4 years ago
wangchaochaohu a972c33fd7
refine gather OP performance for dynamic mode (#28587)
4 years ago
joanna.wozna.intel 2cb71c0cde
Add checkpoint to quantize (#28612)
4 years ago
pangyoki b889a0cee2
add gaussian_random op_version (#28602)
4 years ago
Guo Sheng 110febdc54
Fix gradients with ignore_idx in softmax_with_cross_entropy (#28622)
4 years ago
Leo Chen f962bd3432
Fix cudnn workspace limit in cudnn-8 (#28611)
4 years ago
Leo Chen 90805e2df7
Register op_version for new attribute use_addto (#28463)
4 years ago
lilong12 ed9dd7c9f0
add send and recv ops (#28590)
4 years ago
Zhong Hui a829357e4d
register the op version for some ops
4 years ago
Zhou Wei bf6e7cba7a
updata 2.0 API english doc (#28525)
4 years ago
Shang Zhizhou 8699f38d08
Add TRT support for pruned transformer models; fix the bug where tensorRT does not support DeletePass (#28517)
4 years ago
joejiong 08d2413142
add log2 operator (#28319)
4 years ago