Commit Graph

1236 Commits (24a063f6ac0ba1122b5b6bec524c6ec659197e5f)

Author SHA1 Message Date
翟飞跃 78441c5449 add mkldnn Int8v2 slim doc (#17909)
6 years ago
Wojciech Uss ca5642c850 unify FP32 vs. INT8 comparison tests output (#18111)
6 years ago
Wojciech Uss c26130f3a9 reuse C-API INT8 unit test application (#18077)
6 years ago
lidanqing 466254151a add Mobilienet ssd int8 analyzer tester (#18075)
6 years ago
石晓伟 42f12a4aca
fix ci test cmake test=develop (#18060)
6 years ago
Michał Gallus 8462e2b805 Disable MKLDNN FC in Resnet50 test (#18030)
6 years ago
石晓伟 04ea7cb069
modify the access level of anakin engine (#18015)
6 years ago
石晓伟 bce259e5bf
Update the Anakin interfaces for content-dnn and MLU (#17890)
6 years ago
Zhaolong Xing 4e8d5a034f
Light mem reuse strategy for inference. (#17925)
6 years ago
mozga-intel c1379bf238 [NGraph] Bert model for a capi, ngraph's support test=develop (#17844)
6 years ago
石晓伟 d008260fa8
update the initialization of anakin subgraph (#17880)
6 years ago
Zhaolong Xing ae576f3c68
fix: when use the load model from memory mode, the RAM occupy is high (#17788)
6 years ago
翟飞跃 993c703bcc INT8 MKL-DNN v2 integrate to slim (#17634)
6 years ago
Tao Luo e089e454a1
make omp thread num default 1 after inference run (#17801)
6 years ago
Tao Luo b4b169467b
add fc_mkldnn_pass in compare_mkldnn (#17712)
6 years ago
Zhaolong Xing 4337009b92 fix trt ci timeout error (#17701)
6 years ago
mozga-intel 5eb81fe595 Capi for a ngraph engine (#17037)
6 years ago
lidanqing 04b6c29ee0 Improve mobilenetv2 INT8 performance by using INT8 relu as post-op (#17570)
6 years ago
Jacek Czaja 6d8075ecef [MKL-DNN] conv_transpose mkldnn bias pass (#17644)
6 years ago
Sylwester Fraczek 96845d2168 add Concat quantization (#17448)
6 years ago
Zhen Wang 8bd651b7ed
Fix the bug in the AnalysisPredictor and add more directions about io APIs. (#17639)
6 years ago
Zeng Jinle 4aa931dd85
Code clean of Allocator (#17602)
6 years ago
Zhaolong Xing 61221ebc28
TRT: Support set dynamic range in int8 mode. (#17524)
6 years ago
Michał Gallus 0c39b97b4e [MKL-DNN] Add Fully Connected Op for inference only(#15226)
6 years ago
Sylwester Fraczek 5b2a3c4b12 Conv concat relu quantization (#17466)
6 years ago
Sylwester Fraczek bccb0ba49a fix quantize_squash_pass segfault when no tensor linked to Bias (#17292)
6 years ago
Zhaolong Xing 38da103034 fix trt ci bug temporary. (#17565)
6 years ago
lijianshe02 daf88968e2
fix bug that saved optimal model path in test_analyzer_save_model con… (#17555)
6 years ago
guomingz 2281ebf0f3 Enable the convolution/relu6(bounded_relu) fusion for FP32 on Intel platform. (#17130)
6 years ago
Tao Luo 3d19f44a89
remove unused SERIAL compiler option (#17500)
6 years ago
lidanqing 36757ed203 Enabling resnet101, vgg16, vgg19 INT8v2 model tests (#17468)
6 years ago
liuwei1031 ba70cc499e
fix security bugs : (#17464)
6 years ago
Tao Luo 32da5e9c3d
remove unused expected_kernel_cache_pass (#17486)
6 years ago
wopeizl ca3ba378c7
fix the random compilation failure on windows test=develop (#17475)
6 years ago
Zhen Wang 4a1b7fec96
Add setting Scope function for the graph class (#17417)
6 years ago
flame e48dd92fc8
bug fix (#17392)
6 years ago
Zhaolong Xing 7a3bb061d8
fix: (#17279)
6 years ago
Wojciech Uss 984aa90583 improved unit test output (#17266)
6 years ago
石晓伟 a72dbe9abf
Cherry-pick benchmark related changes from release/1.4 (#17156)
6 years ago
Leo Zhao 54636a1982 call SetNumThreads everytime to avoid missing omp thread setting (#17224)
6 years ago
wopeizl 83c4f7721f
use two GPUs to run the exclusive test test=develop (#17187)
6 years ago
tensor-tang 79ed1c76cd
fix bn fuse vardesc and add model saver (#17143)
6 years ago
Tao Luo d9cd989825
Merge pull request #17048 from luotao1/fix_runtime_cache_bug
6 years ago
tangwei12 13295d90d9
load persistables with selected rows, test=develop (#17047)
6 years ago
luotao1 490e746269 fix runtime_context_cache bug when gpu model has an op runs only on cpu
6 years ago
wopeizl d9991dccdd
add parallel build script to ci … (#16901)
6 years ago
Tao Luo aa7b975bf6 disable runtime_context_cache pass by default
6 years ago
nhzlx bc6b0ca1f4 fix trt anakin subgraph compile rely
6 years ago
gongweibao cbdb8a17b1
Polish DGC code (#16818)
6 years ago
Tao Luo bc037c13c7 use multi-thread to speedup CI tests
6 years ago
Tao Luo 5b1565a7be
Merge pull request #16875 from lidanqing-intel/lidanqing/improve_preprocess_script
6 years ago
root 1965a22488 minus trt ci times.
6 years ago
Tao Luo ca8b8fa0bd
Merge pull request #16830 from Superjomn/fix/tmp-memory-optim
6 years ago
lijianshe02 de26df440b add SaveOptimModel interface in analysis_predictor.h and test it in a… (#16441)
6 years ago
lidanqing de02d40e98 improve preprocess script and read from tar
6 years ago
superjomn f58c3ec189 fix memory optim temporarily
6 years ago
Yihua Xu 93cedfdb9c Fix the order while sorting the operators (#16756)
6 years ago
liuwei1031 85363848a1
Security issue (#16774)
6 years ago
tensor-tang d6c1b5a73b disable seqpool concat pass by default saving CI time
6 years ago
Tao Luo ad4a1bd13c
Merge pull request #16339 from luotao1/core_opt_choose_kernel
6 years ago
Tao Luo d5c8d4acfe reduce all analyzer_test ci elasped time
6 years ago
luotao1 226596a296 Merge branch 'develop' into core_opt_choose_kernel
6 years ago
bingyanghuang 88ceda5134 MKLDNN INT8 v2 readme.md (#16515)
6 years ago
luotao1 bd636a9ea6 test_analyzer_int8 tests use default pass order
6 years ago
Yan Chunwei 044ae2497d
fix identity temporarily (#15942)
6 years ago
Wojciech Uss ec2750b3c2 fix repeating passes (#16606)
6 years ago
Wojciech Uss 9b6a029666 fix dataset reading and add support for full dataset (#16559)
6 years ago
lidanqing 2ca0de3cd4 fix preprocess script with processbar, integrity check and logs (#16608)
6 years ago
Tao Luo ce18710421 enhance analyzer_tests download
6 years ago
石晓伟 5dea0bdd1b
Merge pull request #16498 from Shixiaowei02/feature/anakin-engine
6 years ago
Shixiaowei02 7b9fc71076 update tensorrt subgraph_util test=develop
6 years ago
Wojciech Uss 2498395132 remove profiling from int8 test
6 years ago
Zhaolong Xing 3e6aa498d6
Merge pull request #16526 from NHZlX/refine_trt_anakin
6 years ago
Tao Luo 8f7b5883b8
Merge pull request #16529 from lidanqing-intel/lidanqing/preprocess-data
6 years ago
Tao Luo 5b24002389
Merge pull request #16399 from sfraczek/sfraczek/analyzer_int8_resnet50_test
6 years ago
Shixiaowei02 bddb2cd315 resolve conflicts with the develop branch test=develop
6 years ago
lidanqing 0d656996bf fix some bugs of unzip and reading val list
6 years ago
nhzlx d065b5bf2b Anakin ssd support
6 years ago
lidanqing b46e467abc add wget and unzip part and change data_dir
6 years ago
lidanqing 894aa9b235 change script file name and data_dir location
6 years ago
lidanqing 57f51e5b08 preprocess with PIL the full val dataset and save binary
6 years ago
chengduo ed61d67c73
Fix the interface of Pass::Apply (#16484)
6 years ago
Sylwester Fraczek 8ece7a9708 fixed url to dataset
6 years ago
gongweibao eb83abeac3
Add DGC(Deep Gradient Compression) interface. (#15841)
6 years ago
Sylwester Fraczek fe21578a44 create test for quantized resnet50
6 years ago
Michał Gallus 2d8b7b3a76 Refine default MKL-DNN Pass order (#16490)
6 years ago
Wojciech Uss 09dfc7a2aa
C-API quantization core 2 (#16396)
6 years ago
Yihua Xu 57dc3c1943 Disable compare for Issue#16316 (#16466)
6 years ago
nhzlx 953bdde058 Merge branch 'develop' of https://github.com/paddlepaddle/paddle into HEAD
6 years ago
nhzlx 45b3766fdf fix comments
6 years ago
Wojciech Uss 46677fb080 Move cpu_quantize_* passes into mkldnn subfolder
6 years ago
liuwei1031 de3b70a101
fix cdn issue, test=develop (#16423)
6 years ago
nhzlx 3df7b98a0f Merge branch 'develop' of https://github.com/paddlepaddle/paddle into HEAD
6 years ago
nhzlx f3a2e4b3d8 1. Add ANAKIN_ROOT compile option
6 years ago
Tao Luo 294cdf6f48
Merge pull request #16177 from fc500110/remove_visualizer
6 years ago
luotao1 056599a738 add expected_kernel_cache_pass
6 years ago
Wojciech Uss cbe2dbf0db Add enabling quantization (#16326)
6 years ago
nhzlx 4f4daa4b66 cherry-pick from feature/anakin-engine: add data type for zero copy #16313
6 years ago
nhzlx 07dcf2856c git cherry-pick from feature/anakin-engine: update anakin subgraph #16278
6 years ago
nhzlx c407dfa3cb cherry-pick from feature/anakin-engine: refine paddle-anakin to new interface. #16276
6 years ago
nhzlx a25331bc26 cherry-pick from feature/anakin-engine: deal the changing shape when using anakin #16189
6 years ago
nhzlx c79f06d3d8 cherry-pick from feature/anakin-engine: add batch interface for pd-anakin #16178
6 years ago
nhzlx 69d37f81d7 cherry-pick from feature/anakin-engine: refine anakin subgraph. #16157
6 years ago
nhzlx a1d200a5de cherry-pick from feature/anakin-engine: Anakin support facebox #16111
6 years ago
flame a32d420043 cherry-pick from feature/anakin-engine: batch norm (#16110)
6 years ago
flame 0945b97f07 cherry-pick feature/anakin-engine: add anakin softmax/transpose/batch_norm/flatten/reshape op (#16020)
6 years ago
nhzlx b21770a2aa cherry-pick from feature/anakin-engine: Add subgraph fuse support and anakin engine #16018
6 years ago
nhzlx 084310f536 paddle-anakin: concat, split, pool2d converter#16003
6 years ago
flame be523baad2 Add anakin conv2d/relu/sigmoid/tanh converter (#15997)
6 years ago
Yan Chunwei d0ce6a9044 fix anakin converter registry (#15993)
6 years ago
luotao1 82af8031d9 add runtime_context_cache_pass
6 years ago
Tao Luo 7d2740db83
Revert "cache runtime_context"
6 years ago
Jacek Czaja 13816dd4ac [MKL-DNN] Fix to crash of Transformer when mkldnn is to be used (#16233)
6 years ago
Tao Luo dbb92ee4b1
Merge pull request #16002 from luotao1/runtime_context
6 years ago
Qiyang Min 8e4ad008fb
Merge pull request #16198 from velconia/imperative_train_speed
6 years ago
luotao1 a275fd6e0c Merge branch 'develop' into runtime_context
6 years ago
Wojciech Uss 2579ade45f Add cpu_quantize_pass for C-API quantization (#16127)
6 years ago
luotao1 5ecdc49c6b set enable_runtime_context_cache_ default false
6 years ago
minqiyang 7355d41834 1. Add imperative gperf profiler
6 years ago
minqiyang 98dfb492bb Release GIL lock
6 years ago
minqiyang 42e96a029f Accelerate CPU part
6 years ago
luotao1 1510b866b6 turn off runtime_context_cache for tensorrt
6 years ago
luotao1 d94fd97230 add runtime_context_cache_pass
6 years ago
fc500110 1c6e72b905 remove visualizer, which can be replaced by python IrGraph draw API
6 years ago
Tao Luo c49b7855fa
Merge pull request #16120 from Xreki/fix_cmake_compress
6 years ago
Liu Yiqun 4e052e0ac9 Disable inference download for WIN32 temporary.
6 years ago
luotao1 1283833395 zero_copy tensor support INT32
6 years ago
luotao1 31c4e1d9fc Merge branch 'develop' into zero_copy
6 years ago
luotao1 9e2c7e69fb simplify the zero_copy tests
6 years ago
luotao1 aeee4cbe71 add compare between zerocopy and analysis
6 years ago
Liu Yiqun 6bb84b74b2 Change the download and compress command of cmake.
6 years ago
Tao Luo 25ca2ca001 change init_idx to INT32 in transformer_test
6 years ago
Tao Luo e5e7e9b865 Merge branch 'develop' into transformer_ut
6 years ago
Tao Luo 6f2581e4c5
Merge pull request #16090 from lidanqing-intel/paddle-int32
6 years ago
Zhaolong Xing 3d63aa0a11
Merge pull request #15729 from NHZlX/add_static_model_load_for_trt
6 years ago
nhzlx a9ed427749 cant not pass ci
6 years ago
luotao1 fad06cb928 unify ZeroCopy in analysis_test
6 years ago
lidanqing 4aeb261da9 Add INT32 support. INT32 in last switch case
6 years ago
luotao1 06aab1b493 refine SetCpuMathLibraryNumThreads
6 years ago
nhzlx 3c40cb767b 7 refine zero copy
6 years ago
Yiqun Liu 1616c32acf
Add the include of cudnn.h to enable the use of CUDNN_VERSION. (#15961)
6 years ago
flame b187e3728e
add anakin fc op converter (#15965)
6 years ago
flame e40d56c3d3
anakin subgraph engine (#15774)
6 years ago
nhzlx 2eff3e26b6 Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add_static_model_load_for_trt
6 years ago
nhzlx 06a088a199 fix comments and fix cpplint
6 years ago
nhzlx 0ed63b2108 6. delete useless predictor id
6 years ago
nhzlx 1d5ef7c9ee 5. add static trt load model
6 years ago
Tao Luo 4774dad806
Merge pull request #15857 from sfraczek/fix-typo
6 years ago
Tao Luo e3dd6970fc disable dam temporarily (#15860)
6 years ago
Sylwester Fraczek 1943119fc5 fix typo memeroy->memory
6 years ago
Sylwester Fraczek 8bc604571f fix typo seriazlized->serialized
6 years ago
Sylwester Fraczek 543e53db05 fix typo releated->related
6 years ago
Dun a83e470405
Profiler refine and add CUDA runtime api tracer (#15301)
6 years ago
Yiqun Liu e38dd91f04
Refine cmake's download function. (#15512)
6 years ago
tensor-tang e1c707fe9c
fix warnings (#15790)
6 years ago
nhzlx 2070fb246d 4. do the trt_engine optim during init.
6 years ago
nhzlx ecc12fb430 3. when runing in trt mode, do not allocate memory for parameters in fluid.
6 years ago
nhzlx 9cc6249cd6 2. TRTEngine using stream only when execute.
6 years ago
Wojciech Uss daac6a05f5 Removed duplicated code
6 years ago
Yan Chunwei 3a5d6e5e64
move passes to src to avoid different behavior in deployment (#15705)
6 years ago
nhzlx 034ba1c291 add static model load for trt
6 years ago
Yan Chunwei c00ed19df2
add more comment (#15603)
6 years ago
Gabor Buella da9c94da33 Clang build fixes (#15628)
6 years ago
Chunwei d85c2e4e5c fix anakin compile dependency
6 years ago
wopeizl 3614dadf23
Merge pull request #15631 from wopeizl/windows/fixci
6 years ago
peizhilin 061299be87 fix dependency
6 years ago
Gabor Buella 2bf63f4c33 Fix std::abs usage in memory_optimize_pass.cc (#15627)
6 years ago
peizhilin 3a4110f960 fix ci broken randomly and disable some warnings
6 years ago
dzhwinter 4f01de6378 Merge remote-tracking branch 'origin/develop' into feature/ir_inplace_pass
6 years ago
qingqing01 943d972878
Fix analysis predictor when loading the persistable RAW type variable. (#15613)
6 years ago
dzhwinter 9c9ad7d40b Merge remote-tracking branch 'origin/develop' into feature/ir_inplace_pass
6 years ago
Yan Chunwei e887d71958
fix ir debug config (#15571)
6 years ago
Yan Chunwei 897789b16e
fix save_inferece_model bug (#15365)
6 years ago
dzhwinter 6f9904e99a rerun windows ci. test=develop
6 years ago
Tao Luo 3d0ecab41b add analyzer_transformer_test
6 years ago
Tao Luo 1a252f4be6
Merge pull request #15587 from luotao1/bert
6 years ago
Jiabin Yang b4c24f3f7c
Merge pull request #15575 from JiabinYang/feature/imperative
6 years ago
Zhaolong Xing 90ffe74954
Merge pull request #15546 from NHZlX/fix_trt_utest_random_failed
6 years ago
luotao1 8f0c2b07f2 use embedding=128 bert model for test
6 years ago
JiabinYang 16f64b43d4 test=develop, Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/imperative
6 years ago
Tao Luo 245b1f0579
Merge pull request #15570 from luotao1/bert
6 years ago
JiabinYang bb881199f2 test=develop, polish code and fix wrong change in /paddle/fluid/inference/utils/CMakeLists.txt
6 years ago
Jiabin Yang 075df09f86
Merge pull request #15470 from JiabinYang/feature/imperative
6 years ago
luotao1 5504425eb3 fix compiler error, use len20 dataset for bert
6 years ago
Yan Chunwei 655179089f
AnalysisConfig remove contrib namespace (#15540)
6 years ago
luotao1 e31aef9f6e Merge branch 'develop' into fc500110-bert_test
6 years ago
qingqing01 a6910f900e
Always create variables in analysis_predictor before OptimizeInferenceProgram. (#15533)
6 years ago
Yan Chunwei b62b756b28
add version support (#15469)
6 years ago
Yan Chunwei 526790e652
infer get program (#15511)
6 years ago
JiabinYang 2e309b11c2 test=develop, merge develop
6 years ago
nhzlx 95b98f27ae fix trt models utest failed.
6 years ago
Tao Luo b919190232
Merge pull request #15531 from jczaja/prv-googlenet-fix
6 years ago
JiabinYang 53d558cd41 test=develop, polish code and merge develop
6 years ago
Zhaolong Xing 97b76c94c4
Merge pull request #15242 from NHZlX/trt_int8_ultimate_version
6 years ago
Jacek Czaja 4aa7ef3c13 - Compensation fix to LRN MKL-DNN op
6 years ago
nhzlx b43ea40c51 delete the usage of the const_cast
6 years ago
Yan Chunwei e2818c8608
add dynamic memory optim (#15457)
6 years ago
nhzlx 92cf4a4c6b fix comments
6 years ago
JiabinYang 1bf2facecb Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/imperative
6 years ago
JiabinYang e3a8929cf8 little change
6 years ago