Commit Graph

1130 Commits (b34933d9ee3b61dbbd642fd02f244c36d0d14550)

Author SHA1 Message Date
nhzlx b21770a2aa cherry-pick from feature/anakin-engine: Add subgraph fuse support and anakin engine #16018
6 years ago
nhzlx 084310f536 paddle-anakin: concat, split, pool2d converter#16003
6 years ago
flame be523baad2 Add anakin conv2d/relu/sigmoid/tanh converter (#15997)
6 years ago
Yan Chunwei d0ce6a9044 fix anakin converter registry (#15993)
6 years ago
luotao1 82af8031d9 add runtime_context_cache_pass
6 years ago
Tao Luo 7d2740db83 Revert "cache runtime_context"
6 years ago
Jacek Czaja 13816dd4ac [MKL-DNN] Fix to crash of Transformer when mkldnn is to be used (#16233)
6 years ago
Tao Luo dbb92ee4b1 Merge pull request #16002 from luotao1/runtime_context
6 years ago
Qiyang Min 8e4ad008fb Merge pull request #16198 from velconia/imperative_train_speed
6 years ago
luotao1 a275fd6e0c Merge branch 'develop' into runtime_context
6 years ago
Wojciech Uss 2579ade45f Add cpu_quantize_pass for C-API quantization (#16127)
6 years ago
luotao1 5ecdc49c6b set enable_runtime_context_cache_ default false
6 years ago
minqiyang 7355d41834 1. Add imperative gperf profiler
6 years ago
minqiyang 98dfb492bb Release GIL lock
6 years ago
minqiyang 42e96a029f Accelerate CPU part
6 years ago
luotao1 1510b866b6 turn off runtime_context_cache for tensorrt
6 years ago
luotao1 d94fd97230 add runtime_context_cache_pass
6 years ago
fc500110 1c6e72b905 remove visualizer, which can be replaced by python IrGraph draw API
6 years ago
Tao Luo c49b7855fa Merge pull request #16120 from Xreki/fix_cmake_compress
6 years ago
Liu Yiqun 4e052e0ac9 Disable inference download for WIN32 temporary.
6 years ago
luotao1 1283833395 zero_copy tensor support INT32
6 years ago
luotao1 31c4e1d9fc Merge branch 'develop' into zero_copy
6 years ago
luotao1 9e2c7e69fb simplify the zero_copy tests
6 years ago
luotao1 aeee4cbe71 add compare between zerocopy and analysis
6 years ago
Liu Yiqun 6bb84b74b2 Change the download and compress command of cmake.
6 years ago
Tao Luo 25ca2ca001 change init_idx to INT32 in transformer_test
6 years ago
Tao Luo e5e7e9b865 Merge branch 'develop' into transformer_ut
6 years ago
Tao Luo 6f2581e4c5 Merge pull request #16090 from lidanqing-intel/paddle-int32
6 years ago
Zhaolong Xing 3d63aa0a11 Merge pull request #15729 from NHZlX/add_static_model_load_for_trt
6 years ago
nhzlx a9ed427749 cant not pass ci
6 years ago
luotao1 fad06cb928 unify ZeroCopy in analysis_test
6 years ago
lidanqing 4aeb261da9 Add INT32 support. INT32 in last switch case
6 years ago
luotao1 06aab1b493 refine SetCpuMathLibraryNumThreads
6 years ago
nhzlx 3c40cb767b 7 refine zero copy
6 years ago
Yiqun Liu 1616c32acf Add the include of cudnn.h to enable the use of CUDNN_VERSION. (#15961)
6 years ago
flame b187e3728e add anakin fc op converter (#15965)
6 years ago
flame e40d56c3d3 anakin subgraph engine (#15774)
6 years ago
nhzlx 2eff3e26b6 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add_static_model_load_for_trt
6 years ago
nhzlx 06a088a199 fix comments and fix cpplint
6 years ago
nhzlx 0ed63b2108 6. delete useless predictor id
6 years ago
nhzlx 1d5ef7c9ee 5. add static trt load model
6 years ago
Tao Luo 4774dad806 Merge pull request #15857 from sfraczek/fix-typo
6 years ago
Tao Luo e3dd6970fc disable dam temporarily (#15860)
6 years ago
Sylwester Fraczek 1943119fc5 fix typo memeroy->memory
6 years ago
Sylwester Fraczek 8bc604571f fix typo seriazlized->serialized
6 years ago
Sylwester Fraczek 543e53db05 fix typo releated->related
6 years ago
Dun a83e470405 Profiler refine and add CUDA runtime api tracer (#15301)
6 years ago
Yiqun Liu e38dd91f04 Refine cmake's download function. (#15512)
6 years ago
tensor-tang e1c707fe9c fix warnings (#15790)
6 years ago
nhzlx 2070fb246d 4. do the trt_engine optim during init.
6 years ago
nhzlx ecc12fb430 3. when runing in trt mode, do not allocate memory for parameters in fluid.
6 years ago
nhzlx 9cc6249cd6 2. TRTEngine using stream only when execute.
6 years ago
Wojciech Uss daac6a05f5 Removed duplicated code
6 years ago
Yan Chunwei 3a5d6e5e64 move passes to src to avoid different behavior in deployment (#15705)
6 years ago
nhzlx 034ba1c291 add static model load for trt
6 years ago
Yan Chunwei c00ed19df2 add more comment (#15603)
6 years ago
Gabor Buella da9c94da33 Clang build fixes (#15628)
6 years ago
Chunwei d85c2e4e5c fix anakin compile dependency
6 years ago
wopeizl 3614dadf23 Merge pull request #15631 from wopeizl/windows/fixci
6 years ago
peizhilin 061299be87 fix dependency
6 years ago
Gabor Buella 2bf63f4c33 Fix std::abs usage in memory_optimize_pass.cc (#15627)
6 years ago
peizhilin 3a4110f960 fix ci broken randomly and disable some warnings
6 years ago
dzhwinter 4f01de6378 Merge remote-tracking branch 'origin/develop' into feature/ir_inplace_pass
6 years ago
qingqing01 943d972878 Fix analysis predictor when loading the persistable RAW type variable. (#15613)
6 years ago
dzhwinter 9c9ad7d40b Merge remote-tracking branch 'origin/develop' into feature/ir_inplace_pass
6 years ago
Yan Chunwei e887d71958 fix ir debug config (#15571)
6 years ago
Yan Chunwei 897789b16e fix save_inferece_model bug (#15365)
6 years ago
dzhwinter 6f9904e99a rerun windows ci. test=develop
6 years ago
Tao Luo 3d0ecab41b add analyzer_transformer_test
6 years ago
Tao Luo 1a252f4be6 Merge pull request #15587 from luotao1/bert
6 years ago
Jiabin Yang b4c24f3f7c Merge pull request #15575 from JiabinYang/feature/imperative
6 years ago
Zhaolong Xing 90ffe74954 Merge pull request #15546 from NHZlX/fix_trt_utest_random_failed
6 years ago
luotao1 8f0c2b07f2 use embedding=128 bert model for test
6 years ago
JiabinYang 16f64b43d4 test=develop, Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/imperative
6 years ago
Tao Luo 245b1f0579 Merge pull request #15570 from luotao1/bert
6 years ago
JiabinYang bb881199f2 test=develop, polish code and fix wrong change in /paddle/fluid/inference/utils/CMakeLists.txt
6 years ago
Jiabin Yang 075df09f86 Merge pull request #15470 from JiabinYang/feature/imperative
6 years ago
luotao1 5504425eb3 fix compiler error, use len20 dataset for bert
6 years ago
Yan Chunwei 655179089f AnalysisConfig remove contrib namespace (#15540)
6 years ago
luotao1 e31aef9f6e Merge branch 'develop' into fc500110-bert_test
6 years ago
qingqing01 a6910f900e Always create variables in analysis_predictor before OptimizeInferenceProgram. (#15533)
6 years ago
Yan Chunwei b62b756b28 add version support (#15469)
6 years ago
Yan Chunwei 526790e652 infer get program (#15511)
6 years ago
JiabinYang 2e309b11c2 test=develop, merge develop
6 years ago
nhzlx 95b98f27ae fix trt models utest failed.
6 years ago
Tao Luo b919190232 Merge pull request #15531 from jczaja/prv-googlenet-fix
6 years ago
JiabinYang 53d558cd41 test=develop, polish code and merge develop
6 years ago
Zhaolong Xing 97b76c94c4 Merge pull request #15242 from NHZlX/trt_int8_ultimate_version
6 years ago
Jacek Czaja 4aa7ef3c13 - Compensation fix to LRN MKL-DNN op
6 years ago
nhzlx b43ea40c51 delete the usage of the const_cast
6 years ago
Yan Chunwei e2818c8608 add dynamic memory optim (#15457)
6 years ago
nhzlx 92cf4a4c6b fix comments
6 years ago
JiabinYang 1bf2facecb Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/imperative
6 years ago
JiabinYang e3a8929cf8 little change
6 years ago
Zhaolong Xing a7ba07d7ef Merge pull request #15504 from NHZlX/fix_conv2d_fusion
6 years ago
nhzlx 0779e35544 fix two bug:
6 years ago
nhzlx 027d24c831 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into trt_int8_ultimate_version
6 years ago
nhzlx 9641324995 fix comments
6 years ago
nhzlx 484b3bc801 When cudnn version < 7100, there is problem with conv_fusion.
6 years ago
tensor-tang 5c68dee798 fix debug compile of analysis pass fail
6 years ago
luotao1 353b5f06a7 refine analyzer_bert_test to pass the ci
6 years ago
nhzlx e6218c1d7b change the input to a smaller value
6 years ago
fuchang01 4a33a44f45 analyzer bert tester
6 years ago
nhzlx 5b92ddabe2 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into fix_trt_stream_bug
6 years ago
nhzlx 2f4aee361a fix comments
6 years ago
nhzlx ec213730bc fix trt stream bug.
6 years ago
flame d60751fb71 add python inference api (#15248)
6 years ago
Yan Chunwei 885c4e57ab fea/infer memory optim2 (#14953)
6 years ago
Tao Luo 8f522c15ed Merge pull request #15408 from luotao1/mm_dnn
6 years ago
Tao Luo 001827c270 test_analyzer_mm_dnn runs in serial
6 years ago
Tao Luo 140fc1e92c Merge pull request #15392 from luotao1/pyramid_dnn
6 years ago
Yan Chunwei c9e5aa19c1 get tensor API add more comments (#15345)
6 years ago
Yan Chunwei e84234b551 make clone thread safe (#15363)
6 years ago
Tao Luo 668563088e add pyramid_dnn c++ inference test
6 years ago
Zhaolong Xing 236201c222 Merge pull request #15350 from NHZlX/fix_bug_for_precditor
6 years ago
nhzlx 8817841c73 fix unit test bug
6 years ago
Yan Chunwei e07900d317 cache tensor ptr in ZeroCopyTensor (#15352)
6 years ago
Yan Chunwei b7916440ff hot fix the Native clone (#15344)
6 years ago
Xin Pan 3ecf6bb338 Merge pull request #15028 from yihuaxu/develop_641313ea7_elementwise_mul_mkldnn_bug_fix
6 years ago
nhzlx b95f2ff8fe fix win build bug
6 years ago
nhzlx b938324381 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into trt_int8_ultimate_version
6 years ago
nhzlx 312fe0ece1 add trt int8 calibration support
6 years ago
Yiqun Liu 568cc2ffa8 Optimize while_op for test (#14764)
6 years ago
nhzlx b2ba3471fd fix analysis config bug.
6 years ago
tensor-tang a7fc3d42a0 Merge pull request #15304 from tensor-tang/fuse/second_order_mul_sub
6 years ago
bingyanghuang a152a5c731 Disable conv3d mkldnn in dam (#15335)
6 years ago
tensor-tang 1a95cd227d disable seqpool test on mac or without mkl
6 years ago
Tao Luo 2411ed4286 fix multi-threads in ZeroCopyProfile
6 years ago
tensor-tang 84b0ecdcce Merge remote-tracking branch 'ups/develop' into fuse/second_order_mul_sub
6 years ago
tensor-tang 7035f051a8 adjust acc on mac
6 years ago
Tao Luo e33427da0d Merge pull request #15280 from luotao1/random_test
6 years ago
tensor-tang d618e48309 fix fuse square mat order and refine test
6 years ago
tensor-tang a5d2a6d1ad add fuse pass of sequared mat sub fusion
6 years ago
tensor-tang 84e023eae5 adjust the acc since the refer result is too large
6 years ago
tensor-tang 4461a458a5 adjust diff since abs is too large
6 years ago
tensor-tang ca6fdc6e33 refine and fix test
6 years ago
tensor-tang a89296ac1f add repeated fc relu pass
6 years ago
tensor-tang 781cd0cf51 add multi threads test of seqpool test (#15293)
6 years ago
Tao Luo cbd1c7c01f fix CompareDeterministic error when test_all_data
6 years ago
Zhaolong Xing 98e85f3735 add_transpose_flatten_concat_fuse (#15121)
6 years ago
wopeizl 5d9edb4124 Merge pull request #15156 from wopeizl/windows/fixgpuissue
6 years ago
tensor-tang 146e942c65 Merge pull request #15250 from tensor-tang/refine/seqpool/feed
6 years ago
peizhilin 439691f5bd adjust the shlwapi on windows
6 years ago
tensor-tang 96786d3716 add compare_determine of seqpool1 test
6 years ago
tensor-tang ce909664d8 Merge remote-tracking branch 'ups/develop' into refine/seqpool/feed
6 years ago
peizhilin e239558e56 remove the dismatch enclosure to avoid warning message test=develop
6 years ago
Tao Luo 7d13d20769 Merge pull request #15245 from luotao1/rnn1_multi_thread
6 years ago
Tao Luo 2b11c710b3 Merge pull request #15249 from NHZlX/fix_trt_demo_ci
6 years ago
tensor-tang 54afcb7ec6 add compare zerocopy test with native result
6 years ago
tensor-tang 137060135e fix zerocopy size
6 years ago
tensor-tang 7461356723 add zerocopy for seqpool test
6 years ago
tensor-tang 48410b9bfe Merge pull request #15237 from tensor-tang/fuse/seqpool_concat_2
6 years ago
nhzlx e7d83389e6 fix demo ci bug
6 years ago
Tao Luo 9b41e45584 Merge pull request #15222 from luotao1/native_config
6 years ago
Tao Luo d43983b61d reduce threads number to avoid hang in CI
6 years ago
nhzlx c1264e99f3 fix win error
6 years ago
nhzlx 4e3522e5b4 add trt int8 support
6 years ago
tensor-tang f8c305b243 Merge remote-tracking branch 'ups/develop' into fuse/seqpool_concat_2
6 years ago
Tao Luo 197d0f2431 fix trt_model_tester to pass the ci
6 years ago
Tao Luo feee78d9f0 Merge pull request #15214 from tensor-tang/fix/debug
6 years ago
Tao Luo 71d9097a89 fix analyzer_test runs error in native_config
6 years ago
Tao Luo 9c02765158 Merge pull request #15210 from Superjomn/fix/analysis_tester_bug
6 years ago
tensor-tang 72d2a1801e add seqpool concat fuse pass
6 years ago
tensor-tang 69fd3fdb52 fix debug build error
6 years ago
superjomn 23bdd0a223 fix analysis_tester bug
6 years ago
Yan Chunwei d09d6eadc0 make inference api work with Doxygen (#15195)
6 years ago
Tao Luo 6ca9a4810b Merge pull request #15196 from luotao1/serial
6 years ago
Yan Chunwei 6ccf8685f7 refactor tensorrt node teller (#15181)
6 years ago
Tao Luo 7dc0181c46 run analyzer_tester serial in multi-thread
6 years ago
Yan Chunwei 875a07c32d refactor inference analysis api (#14634)
6 years ago
tensor-tang 516fe301ee add comment in case of empty name
6 years ago
tensor-tang b9c645639b workaround with third party cache
6 years ago
tensor-tang dca68cdf97 throw error when name not find
6 years ago
tensor-tang 484085693e update url and num_ops
6 years ago
tensor-tang cd94df8679 fix load and refine
6 years ago
tensor-tang 8e271896ae add test data for seqpool1
6 years ago
Zhaolong Xing 4048cfa9da Merge pull request #15048 from NHZlX/add_affine_channel_fuse
6 years ago
Zeng Jinle c0bcff00dc Merge pull request #14962 from sneaxiy/rewrite_variable_type
6 years ago
Tao Luo 85471533e0 Merge pull request #15079 from luotao1/analysis_test
6 years ago
wopeizl 719ebe3786 Merge pull request #15070 from wopeizl/windows/testcasefix
6 years ago
Qiyang Min 0238a3bb4f Merge pull request #14972 from velconia/accelerate_lstm
6 years ago
sneaxiy c4ce2e7b21 merge develop, solve conflict
6 years ago
Tao Luo ecae157edf simplify some data record in analyzer_tester
6 years ago
Tao Luo 05f1b65da3 simplify prepere_input in analyzer_test
6 years ago
nhzlx 02e17396c2 fix comments
6 years ago
nhzlx 71636e677d add min_subgraph_size attr to tensorrt config
6 years ago
peizhilin 01c00b07dd fix test issues on windows
6 years ago
nhzlx a6aa8ea771 faster rcnn input is presistable. (fix it in paddle-trt)
6 years ago
sneaxiy dde3afe7b7 Merge develop
6 years ago
Yihua Xu 0b0acfaa88 Add mkldnn item for porfile and compare usage.
6 years ago
tensor-tang d46a140dd9 add seq pool inference test
6 years ago
tensor-tang d4931a2abc support more input fake data
6 years ago
nhzlx 73b47df1f4 Merge branch 'develop' of https://github.com/paddlepaddle/paddle into add_affine_channel_fuse
6 years ago
nhzlx ce3782c193 add affine_channel fuse.
6 years ago
Tao Luo 91408e3122 fix analyzer_mm_dnn_tester fails when bs > 1
6 years ago
Tao Luo f01c966800 Merge branch 'develop' into mm_dnn
6 years ago
qingqing01 51a9fca323 Async memory copy (#15013)
6 years ago
minqiyang b1d0a14c14 Change the ut back
6 years ago
minqiyang 7d1533216d Fix syntax error in unit test
6 years ago
Tao Luo 22c71398e3 add MM_DNN inference test
6 years ago