Commit Graph

10708 Commits (2c0a4a347015d2201fde5acdd2c0bb411a43f8f0)

Author SHA1 Message Date
lilong12 1decf4ada6
update, test=develop (#29331)
5 years ago
QingshuChen 74bf3bed36
support global pooling for kunlun (#29293)
5 years ago
liym27 b10ecd9d3a
[inplace] Add ShareHolderWith for class Variable and SharePlaceholderWith in VarBase.detach() to share the same Tensor/SelectedRows (#29267)
5 years ago
Chen Weihang 9ad800ebb2
Support type promote for basic math ops (quantum required) (#29265)
5 years ago
tangwei12 8358791607
fix gpu outofrange (#29238)
5 years ago
Leo Chen b58cfff89d
use has_grad instead of train_mode (#29309)
5 years ago
Zhang Ting befd6d5338
improve elementwise_add_grad perf (#29277)
5 years ago
Shang Zhizhou ebf689197d
fix tensorrt output shape error (#29308)
5 years ago
Aurelius84 67c700b479
[Dy2Stat] Add cache for Executor and Context in run_program_op (#28421)
5 years ago
ShenLiang 696dc4bb13
fix the warning of reducer (#29323)
5 years ago
wangchaochaohu c4be80f402
polish the code of cumsum and remove some unused code (#29303)
5 years ago
Wilber d68af02c04
fix analysis_config bug. (#29304)
5 years ago
ShenLiang 0fb18bc214
enforce the matmul_v2 error message (#29297)
5 years ago
Zhen Wang 9b59a589b1
Remove some useless log. (#29300)
5 years ago
Leo Chen 13a22a3752
fix shape of tile_grad op (#29289)
5 years ago
Zhen Wang be3777a50a
Add pure fp16 training with master weights. (#27712)
5 years ago
Wojciech Uss 6673fb0565
change import math.h to cmath (#29260)
5 years ago
furnace 7584bb5096
Layer norm fp16 (#29169)
5 years ago
Shang Zhizhou c59b4f28a2
fix cmake error when WITH_GPU=ON and WITH_TENSORRT=ON && WITH_MKL=OFF (#29275)
5 years ago
Leo Chen 116305ea4b
Improve performance of elementwise_add grad op (#29187)
5 years ago
卖鱼的哲学 07c67d5a8b
add deformable_conv op on xpu (#29234)
5 years ago
Chen Weihang 1de32f823d
Hot fix compile failure in gcc4.8 caused by complex impl (#29254)
5 years ago
GeminiCarrie 642abe2a48
Fix a bug when running on an operating system without "bash." (#29131)
5 years ago
ShenLiang 46b73e6cd9
Change the api of DataParallel and Fleet (#29224)
5 years ago
QingshuChen 64f29fbb70
update kunlun conv2d/softmax/elementwise implementation (#29229)
5 years ago
chentianyu03 8f45d14263
add complex64 and complex128 type; add +-*/@ and slice operator for c… (#29199)
5 years ago
Zhou Wei c0a991c874
accumulate gradient for leaf tensor with previous graph and expose leaf tensor concept (#28429)
5 years ago
Wilber 74c43ac638
fix lite unit test. (#29233)
5 years ago
Adam Osewski 4096ff94dc
Small optimizations for conv2d kernel subroutines. (#29188)
5 years ago
joanna.wozna.intel 5c61eeef61
Enable all image classification models (#29155)
5 years ago
Wilber 4fec182d24
[Lite-Subgraph] Fix compile error for lite subgraph. (#29146)
5 years ago
123malin b5c6342336
Update ps gpu (#29209)
5 years ago
liym27 865a45984f
Check whether there is any inplace operation affecting gradient calculation. (#27901)
5 years ago
123malin 03d4665f44
prefetch optimize (#29095)
5 years ago
WangXi 0c2a51d240
optimizer amp, all use fp16 communication, overlap last comm and compute (#28957)
5 years ago
Chen Weihang 0b032faeee
Polish unittests details and execution conditions to adapt to MUSL (#29044)
5 years ago
Wojciech Uss 4fd4095d1b
Add quantization of multi_gru op and tests (#28615)
5 years ago
Jack Zhou bc6033f86b
fix gru gcc7.4 bug for the gru compile
5 years ago
wangchaochaohu b818429ae7
optimize cumsum OP (#29193)
5 years ago
ShenLiang e2d01eb650
Support dynamic graph distributed (#28997)
5 years ago
lilong12 7e5e9934fe
update expand as op to use the shape of the target tensor instead of the target tensor itself. (#29020)
5 years ago
Zhou Wei e668cb07fb
fix CUDA 11 error on windows (#29101)
5 years ago
Jack Zhou 085260f3de
Add eigen gru and fix the dropout bug in the rnn
5 years ago
yaoxuefeng 545df287fc
add user_define_dump (#28596)
5 years ago
arlesniak bc902044a4
Fixes mkldnn dygraph learning rate scheduler crashes (#28988)
5 years ago
Shang Zhizhou b9e76a0103
detect tensorRT plugin fp16 in runtime (#27933)
5 years ago
Leo Chen fd3fcb051a
fix typo of flag name (#29154)
5 years ago
Noel da71173bc9
Fix ops doc for some ops
5 years ago
Leo Chen 770395cb93
Split train_mode and has_grad for tracer (#29064)
5 years ago
Aurelius84 7ae3cb554a
Polish CUDA Information stdout (#29109)
5 years ago
WangXi 173c22aec2
optimize fast graph executor (#28962)
5 years ago
Shang Zhizhou 562ded1041
fix unittest trt_dynamic_shape_transformer_prune_test error (#29122)
5 years ago
Shibo Tao db41258501
add API serialize_program, serialize_persistables, save_to_file, deserialize_program, deserialize_persistables, load_from_file. (#29034)
5 years ago
joanna.wozna.intel b0d1ac161e
Add bf16 pool2d and unify bf16 unit tests (#29039)
5 years ago
joanna.wozna.intel fddea67445
Fix cpu_bfloat16_pass (#28730)
5 years ago
Qi Li 2fd16cf6fc
fix win ci failure, test=develop (#29089)
5 years ago
Chen Weihang fea0e294ee
Hide the C++ stack by default and add hints (#29042)
5 years ago
joejiong 582c0a0468
add uint8 for reshape op (#28996)
5 years ago
Zhou Wei 8ca0a8a859
fix tensor detach to zero copy (#27921)
5 years ago
taixiurong a5aa4dc7a9
add xpu elementwise ops (#29031)
5 years ago
joejiong b04c78ef5e
Update pow (#29000)
5 years ago
wawltor b2c8a00745
remove eigen threadpool for the speed up
5 years ago
Wojciech Uss 7b5a8e46de
Add multi_gru_fuse_pass and tests (#28601)
5 years ago
lilong12 767d0ba267
update, test=develop (#28700)
5 years ago
Wojciech Uss 991345b368
Add multi_gru_seq_fuse_pass and tests (#28604)
5 years ago
123malin fbf9564f6b
【paddle.distributed.fleet】Optimize ParameterServer's Async Mode (#28442)
5 years ago
lilong12 f77a78cdee
enable pipeline to run with Executor.run() (#28373)
5 years ago
Thunderbrook 0073f9bdb0
support ps-gpu (#28752)
5 years ago
Chen Weihang 768dab441e
polish two api doc detail, test=document_fix (#28971)
5 years ago
furnace 8ff3550658
refactor momentum op to combine weight (#27414)
5 years ago
Jacek Czaja bd1d6d3b30
extends oneDNN caching keys so caching objects are unique to executor/predictor (#28758)
5 years ago
Pei Yang 994673bf4f
change avg pooling and global pooling to trt layer in dynamic shape mode (#28702)
5 years ago
yaoxuefeng 71c1cd1408
fix truncated_gaussian seed (#28777)
5 years ago
HappyAngel de528981e5
fix paddlepredictor build error. test=develop (#28792)
5 years ago
Wilber a22ea652cf
fix trt delete_pass bug. (#28763)
5 years ago
gongweibao 1dad8ceaab
Fix gpu memory allocation bug. (#28703)
5 years ago
Chen Weihang b969c32ab1
fix occupied 0 device memory bug (#28771)
5 years ago
joejiong 1a532d5133
add uint8 support for squeeze operator (#28734)
5 years ago
wangchaochaohu 8b853b3030
fix the number of perf algo for conv cudnn in exhaustive mode (#28694)
5 years ago
joanna.wozna.intel 8c0ea4bffe
Add bf16 matmul, fc, elementwise add and mul (#28729)
5 years ago
Wojciech Uss efc3b182f0
a fix for the fc_lstm_fuse_pass (#28709)
5 years ago
Zhou Wei 3b0dd5f620
fix bug that to_tensor not support paddle.Place (#28717)
5 years ago
yaoxuefeng 08b62f4902
fix shuffle batch op shuffle (#28533)
5 years ago
taixiurong d3d1a6b6e0
add kunlun kernel: slice, slice_grad, top_k, cast. *test=kunlun (#28542)
5 years ago
Jack Zhou 9362d85e0e
Add LSTM, Simple RNN and GRU CPU kernel (#28577)
5 years ago
QingshuChen 30ef3815b3
adjust kunlun header file (#28536)
5 years ago
Zhang Ting dab4920568
improve performance of cast op (#28727)
5 years ago
yaoxuefeng 03f46e3526
fix truncated_gaussian op cuda seed setting (#28678)
5 years ago
Wilber 04cefeacc5
Disable windows gpu static lib. (#28741)
5 years ago
Wojciech Uss 04bcc13fac
Add multi_gru op and tests (#28591)
5 years ago
wanghuancoder 5aec7dbeb0
use forward declarations for framework.pb.h (#28494)
5 years ago
joejiong 32b90b1c2d
add log10 (#28576)
5 years ago
Leo Chen 3d09929b1f
Add check for non-dispensable input (#28666)
5 years ago
Chen Weihang 7eeb99fe02
Add basic hook classes for dygraph & implement reduce hook (#28584)
5 years ago
Guo Sheng 858ffa0c8b
Fix the dropout setting when not initialized in rnn_op. (#28561)
5 years ago
Jacek Czaja 6d8d3d4c22
[oneDNN] Layer norm bf16 kernel (#28619)
5 years ago
lilong12 80d2024644
bug fix, test=develop (#28674)
5 years ago
Zhou Wei bf143652ac
fix lstm OP compile error on windows (#28667)
5 years ago
石晓伟 57dab959ca
add datanorm op new scale_w register (#28657)
5 years ago
cc 65aac81191
Fix fake_quant error when cout > 1024, test=develop (#28603)
5 years ago
lilong12 b2f7ab6636
bug fix, test=develop (#28648)
5 years ago
wawltor 8f2656ef5c
fix the gradient bug for the topk v2
5 years ago
wangchaochaohu a972c33fd7
refine gather OP performance for dynamic mode (#28587)
5 years ago
joanna.wozna.intel 2cb71c0cde
Add checkpoint to quantize (#28612)
5 years ago
lidanqing 804271cff9
Op version python mkldnn_inplace test (#28354)
5 years ago
pangyoki b889a0cee2
add gaussian_random op_version (#28602)
5 years ago
Guo Sheng 110febdc54
Fix gradients with ignore_idx in softmax_with_cross_entropy (#28622)
5 years ago
Wilber 8b97bb2e1f
Update cmake for arm ft and fix a bug for Predictor dtor. (#28586)
5 years ago
Leo Chen f962bd3432
Fix cudnn workspace limit in cudnn-8 (#28611)
5 years ago
Leo Chen 90805e2df7
Register op_version for new attribute use_addto (#28463)
5 years ago
danleifeng a24d186814
fix nccl init failed in parallel dygraph mode (#28497)
5 years ago
lilong12 ed9dd7c9f0
add send and recv ops (#28590)
5 years ago
Zhong Hui a829357e4d
register the op version for some ops
5 years ago
Zhou Wei bf6e7cba7a
update 2.0 API English doc (#28525)
5 years ago
YUNSHEN XIE 7b1619e69b
disable test_trt_dynamic_shape_transformer_prune,test=document_fix (#28588)
5 years ago
Zhou Wei 849467b5aa
fix user set CUDA_VISIBLE_DEVICES start/end with quotation marks (#28547)
5 years ago
Shang Zhizhou 8699f38d08
Add TRT support for pruned transformer models; fix bug where TensorRT did not support DeletePass (#28517)
5 years ago
joejiong 08d2413142
add log2 operator (#28319)
5 years ago
lidanqing 0fc181dbd0
[Fix bug] If the pass name is not found, IsCompatible should return false (#28475)
5 years ago
Wilber 1bf4836580
[Inference] Add TryShrinkMemory interface. (#28409)
5 years ago
wangchaochaohu c52fe48f6f
fix the GetKernelTypeForVar of input for fluid.gather (#28534)
5 years ago
wangchaochaohu d7cfee9b31
Checkout point add (#28488)
5 years ago
Pei Yang 75196cda40
Paddle-TRT int8 support mul op channelwise quant (#28422)
5 years ago
zhupengyang 47cbf61dd4
fix softmax unittest float16 random error (#28480)
5 years ago
YUNSHEN XIE 369605be1d
fix cmake error when execute build_inference_lib (#28503)
5 years ago
Wilber 645e999afc
fix api_impl test. (#28483)
5 years ago
YUNSHEN XIE 1e698c600e
fix cmake error when setting ut timeout property (#28492)
5 years ago
wangchaochaohu e14ed71cc2
refine the performance of gather Op (#28458)
5 years ago
YUNSHEN XIE ba0756325a
exec ut no more than 15s 1 (#28439)
5 years ago
Chen Weihang 155b4f9b6c
Remove selected rows all reduce over height check (#28460)
5 years ago
taixiurong fad4744aa4
fix crash in adam in xpu, *test=kunlun (#28433)
5 years ago
QingshuChen 6bba8e57b1
fix batch_norm_xpu bug & remove xpusimulator dependence (#28430)
5 years ago
Wilber ced5c40c41
Update memory release interface. (#28456)
5 years ago
joanna.wozna.intel 7821759d48
Add bfloat16 softmax and gelu (#28394)
5 years ago
iducn ba0fe0a812
revert the modified shell script (#28453)
5 years ago
Chen Weihang c42e656179
Add retry for dygraph parallel socket bind (#28404)
5 years ago
石晓伟 c41fd033e5
check op_version_registry in CI test, test=develop (#28402)
5 years ago
Jacek Czaja ca41541472
[oneDNN]Sum bf16 kernel (#28382)
5 years ago
Chen Weihang 23439b1688
show cpp stack when catch signal (#28415)
5 years ago
Leo Chen 44a476c2ab
support cuda pinned place (#28416)
5 years ago
lidanqing 12b9587be5
Add conv_bias pass version python test (#28278)
5 years ago
Wilber 05114693cf
[Inference] Memory modification for ShrinkMemory. (#28355)
5 years ago
Leo Chen 8b2436a776
Add broadcast_shape api (#28257)
5 years ago
石晓伟 21a63f6f90
enhance the op_version_registry, test=develop (#28347)
5 years ago
Shang Zhizhou ea851796e5
Optimize ERNIE model inference performance in TensorRT, support variable-length input (#28367)
5 years ago
Jacek Czaja 84cc61b2cd
[oneDNN] sum op refactor (#28318)
5 years ago
Wilber 6f0f45f69c
copy_to_cpu support uint8 (#28372)
5 years ago
Wilber 09fd2b2aab
Paddle support compile on sw (#27858)
5 years ago
Leo Chen 6115c14fca
Pool2d cuda kernel supports fp16 (#28316)
5 years ago
Guo Sheng 9a600df373
Add rnn_op (#28197)
5 years ago