Paddle

Commit Graph

Author	SHA1	Message	Date
翟飞跃	78441c5449	add mkldnn Int8v2 slim doc (#17909 )	6 years ago
Wojciech Uss	ca5642c850	unify FP32 vs. INT8 comparison tests output (#18111 ) test=develop	6 years ago
Wojciech Uss	c26130f3a9	reuse C-API INT8 unit test application (#18077 ) * reuse C-API INT8 unit test application test=develop * updates after review test=develop	6 years ago
lidanqing	466254151a	add Mobilienet ssd int8 analyzer tester (#18075 ) * add pascalvoc preprocess script and mobilenet-ssd analyzer_tester, wait 17737 * change converting local dataset to downloading and converting tarfile test=develop * change the test data_path test=develop * change copyright (c) 2016 to copyright (c) 2019 test=develop	6 years ago
石晓伟	42f12a4aca	fix ci test cmake test=develop (#18060 )	6 years ago
Michał Gallus	8462e2b805	Disable MKLDNN FC in Resnet50 test (#18030 )	6 years ago
石晓伟	04ea7cb069	modify the access level of anakin engine (#18015 ) test=develop	6 years ago
石晓伟	bce259e5bf	Update the Anakin interfaces for content-dnn and MLU (#17890 ) * update anakin-engine interfaces for content-dnn test=develop * support only-gpu mode of Anakin modify eltwise parse test=develop * modification for thread-safe test=develop * Integrated template instance test=develop * increase template parameters test=develop * support MLU predictor test=develop * update anakin cmake files test=develop * update TargetWrapper::set_device * update the initialization of anakin subgraph test=develop * use the default constructor of base class test=develop	6 years ago
Zhaolong Xing	4e8d5a034f	Light mem reuse strategy for inference. (#17925 ) * fix: when use the load model from memory mode, the RAM occupy is high test=develop * ligth mem reuse test=develop * fix cpplint test=develop	6 years ago
mozga-intel	c1379bf238	[NGraph] Bert model for a capi, ngraph's support test=develop (#17844 )	6 years ago
石晓伟	d008260fa8	update the initialization of anakin subgraph (#17880 ) test=develop	6 years ago
Zhaolong Xing	ae576f3c68	fix: when use the load model from memory mode, the RAM occupy is high (#17788 ) test=develop	6 years ago
翟飞跃	993c703bcc	INT8 MKL-DNN v2 integrate to slim (#17634 ) * refactor PR 16865 * delete mergetool files * test=develop * test=develop * test=develop * test=develop * create dir for int8 model before call SaveOptimModel * test=develop * mkldnn int8 only support linux; test=develop * refine code; test=develop * remove comment; test=develop * refine code; test=develop * fix bug; test=develop * add exception for mkldnn_post_training_strategy * reuse int8v2 CAPI dataset; test=develop * fix accuracy check bug; test=develop * remove tab * convert files to unix format * test=develop * reduce CI time;test=develop * reduce CI time and refine code;test=develop * refine comment; test=develop * add cmake FLAGS;test=develop * remove predict_num;test=develop	6 years ago
Tao Luo	e089e454a1	make omp thread num default 1 after inference run (#17801 ) test=develop	6 years ago
Tao Luo	b4b169467b	add fc_mkldnn_pass in compare_mkldnn (#17712 ) test=develop	6 years ago
Zhaolong Xing	4337009b92	fix trt ci timeout error (#17701 ) test=develop	6 years ago
mozga-intel	5eb81fe595	Capi for a ngraph engine (#17037 )	6 years ago
lidanqing	04b6c29ee0	Improve mobilenetv2 INT8 performance by using INT8 relu as post-op (#17570 ) * add INT8 conv+relu6 fuse and enbale mobilentv2 INT8 test test=develop * change fasle and 0.0 to fuse_brelu and brelu_threshold test=develop change the "fuse_relu\|\|fuse_brelu" to "unsigned_output" test=develop * Use relu instead of brelu as INT8 post-op because INT8 brelu is not enabled in mkldnn v0.18 test=develop * continuous-integration fix test=develop	6 years ago
Jacek Czaja	6d8075ecef	[MKL-DNN] conv_transpose mkldnn bias pass (#17644 ) * - changes to graph detector - Changes to pass - Added ut for new pass - use_pass - Added pass to mkldnn passes - fix to registration - improved verbose messaging for conv bias passes - Lint fixes test=develop * - Lint fixes test=develop	6 years ago
Sylwester Fraczek	96845d2168	add Concat quantization (#17448 ) * add Concat quantization add unit test for quantizing concat fix for wrong value when the input is not in map of calculated scales add use_quantizer to concat_op.cc add scale_algo rules for concat test=develop * missing fix for multiple inputs quantize-squash * wojtuss review fix: adding comment test=develop	6 years ago
Zhen Wang	8bd651b7ed	Fix the bug in the AnalysisPredictor and add more directions about io APIs. (#17639 ) * fix the bug that sub_scope_ may be null in AnalysisPredictor::Run. * add more directions about io APIs' docs. * update the API.spec. test=develop test=document_preview	6 years ago
Zeng Jinle	4aa931dd85	Code clean of Allocator (#17602 ) * Revert "Revert "Fix allocator bug"" This reverts commit `174d0d0b90`. * Revert "fix travis ci" This reverts commit `5656fa9f7c`. test=develop * add inlined_vector.h, test=develop * add inlined_vector_test,test=develop * clean code of allocator,test=develop * delete zero_size_allocator.h,test=develop * fix failed unittest,test=develop	6 years ago
Zhaolong Xing	61221ebc28	TRT: Support set dynamic range in int8 mode. (#17524 ) * fluid int8 train and trt int8 predict align. trt int8 predict init op converter * 2. align fluid int8 train and trt int8 inference. enhance quant dequant fuse pass enhance op converter, trt engine, trt engine op, trt subgraph pass. * 3. add delete_quant_dequant_pass for trt test=develop * 4. add the missing file test=develop * 5. i modify the c++ interface, but forget to modify the pybind code fix the IS_TRT_VERSION_GE bug, and fix elementwise op converter test=develop	6 years ago
Michał Gallus	0c39b97b4e	[MKL-DNN] Add Fully Connected Op for inference only(#15226 ) * fuse mul and elementwise add to fc * Reimplement the FC forward operator * Fix FC MKLDNN integration by transposing weights * Add FC MKLDNN Pass test=develop * FC MKLDNN Pass: change memcpy to std::copy * Fix MKLDNN FC handling of mismatch input and weights dims * Lower tolerance for MKL-DNN in resnet50 test test=develop * Adjust FC to support MKLDNN Op placement test=develop * Adjust Placement Op to set use_mkldnn attribute for graph test=develop * MKLDNN FC: fix weights format so that gemm version is called test=develop * FC MKLDNN: Remove tolerance decrease from tester_helper * FC MKL-DNN: Refactor the code, change input reorder to weight reorder * MKL-DNN FC: Introduce operator caching test=develop * FC MKL-DNN: Fix the tensor type in ExpectedKernelType test=develop * FC MKL-DNN: fix style changes test=develop * FC MKL-DNN: fallback to native on non-supported dim sizes test=develop * FC MKLDNN: fix CMake paths test=develop * FC MKLDNN: Refine placement pass graph mkldnn attribute test=develop * Fix Transpiler error for fuse_conv_eltwise test=develop * Fix missing STL includes in files test=develop * FC MKL-DNN: Enable new output size computation Also, refine pass to comply with newest interface. test=develop * FC MKL-DNN: enable only when fc_mkldnn_pass is enabled * FC MKL-DNN: Allow Weights to use oi or io format * FC MKL-DNN: Adjust UT to work with correct dims test=develop * Enable MKL DEBUG for resnet50 analyzer test=develop * FC MKL-DNN: Improve Hashing function test=develop * FC MKL-DNN: Fix shape for fc weights in transpiler * FC MKL-DNN: Update input pointer in re-used fc primitive * Add log for not handling fc fuse for unsupported dims test=develop * FC MKL-DNN: Move transpose from pass to Op Kernel test=develop * FC MKL-DNN: Disable transpose in unit test test=develop * FC MKL-DNN: Remove fc_mkldnn_pass from default list * Correct Flag for fake data analyzer tests test=develop * FC MKL-DNN: Add comment about fc mkldnn pass disablement test=develop * FC MKL-DNN: Disable fc in int8 tests test=develop	6 years ago
Sylwester Fraczek	5b2a3c4b12	Conv concat relu quantization (#17466 ) * add conv_concat_relu fuse test=develop * add test code test=develop * added missing include with unordered_map test=develop * review fixes for wojtuss test=develop * remove 'should (not) be fused' comment statements one of them was invalid anyway test=develop	6 years ago
Sylwester Fraczek	bccb0ba49a	fix quantize_squash_pass segfault when no tensor linked to Bias (#17292 ) * fix quantize_squash_pass segfault when there is no tensor linked do Bias input test=develop * add googlenet test test=develop * fix concat CreateKey not using input format test=develop	6 years ago
Zhaolong Xing	38da103034	fix trt ci bug temporary. (#17565 ) ban all trt ut. will fix it later. test=develop	6 years ago
lijianshe02	daf88968e2	fix bug that saved optimal model path in test_analyzer_save_model con… (#17555 ) * modify saved model path in analyzer_save_model.cc test=develop	6 years ago
guomingz	2281ebf0f3	Enable the convolution/relu6(bounded_relu) fusion for FP32 on Intel platform. (#17130 ) * Relu6 is the bottleneck op for Mobilenet-v2. As the mkldnn supports the conv/relu6 fusion, we implement it fusion via cpass way. Due to the int8 enabling for this fusion will be supported in MKLDNN v0.20, so this PR is focused on the fp32 optimization. Below table shows the benchmark(FPS) which measured on skx-8180(28 cores) Batch size \| with fusion \| without fusion -- \| -- \| -- 1 \| 214.7 \| 53.4 50 \| 1219.727 \| 137.280 test=develop * Fix the format issue test=develop * Add the missing nolint comments. test=develop * Fix the typos. test=develop * Register the conv_brelu_mkldnn_fuse_pass for the MKLDNN engine. test=develop * Adjust the indentation. test=develop * Add the test_conv_brelu_mkldnn_fuse_pass case. test=develop * Slightly update the code per Baidu comments. Let the parameter definition embedded into the code. That's will make the code easy to understand. test=develop	6 years ago
Tao Luo	3d19f44a89	remove unused SERIAL compiler option (#17500 ) test=develop	6 years ago
lidanqing	36757ed203	Enabling resnet101, vgg16, vgg19 INT8v2 model tests (#17468 ) * Add 6 models tests support in CMake * enabling resnet101, vgg16, vgg19 INT8v2 model tests test=develop * remove SERIAL test=develop	6 years ago
liuwei1031	ba70cc499e	fix security bugs : (#17464 ) http://newicafe.baidu.com:80/issue/PaddleSec-33/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-28/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-25/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-24/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-21/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-20/show?from=page test=develop	6 years ago
Tao Luo	32da5e9c3d	remove unused expected_kernel_cache_pass (#17486 ) test=develop	6 years ago
wopeizl	ca3ba378c7	fix the random compilation failure on windows test=develop (#17475 ) * fix the random compilation failure on windows	6 years ago
Zhen Wang	4a1b7fec96	Add setting Scope function for the graph class (#17417 ) * add set_not_owned function for graph * add scope set. test=develop * add scope_ptr enforce not null before setting.test=develop	6 years ago
flame	e48dd92fc8	bug fix (#17392 ) fix secure bug	6 years ago
Zhaolong Xing	7a3bb061d8	fix: (#17279 ) 1. infernce multi card occupy 2. facebox model inference occupy too much test=develop	6 years ago
Wojciech Uss	984aa90583	improved unit test output (#17266 ) added printing data type to differentiate int8 and fp32 latency results test=develop	6 years ago
石晓伟	a72dbe9abf	Cherry-pick benchmark related changes from release/1.4 (#17156 ) * cherry-pick commit from `8877054` * cherry-pick commit from `3f0b97d` * cherry-pick from 16691:Anakin subgraph support yolo_v3 and faster-rcnn (cherry picked from commit `8643dbc233`) * Cherry-Pick from 16662 : Anakin subgraph cpu support (cherry picked from commit `7ad182e16c`) * Cherry-pick from 1662, 16797.. : add anakin int8 support (cherry picked from commit `e14ab180fe`) * Cherry-pick from 16813 : change singleton to graph RegistBlock test=release/1.4 (cherry picked from commit `4b9fa42307`) * Cherry Pick : 16837 Support ShuffleNet and MobileNet-v2 Support ShuffleNet and MobileNet-v2, test=release/1.4 (cherry picked from commit `a6fb066f90`) * Cherry-pick : anakin subgraph add opt config layout argument #16846 test=release/1.4 (cherry picked from commit `8121b3eccb`) * 1. add shuffle_channel_detect (cherry picked from commit `6efdea8997`) * update shuffle_channel op convert, test=release/1.4 (cherry picked from commit `e4726a066f`) * Modify symbol export rules test=develop	6 years ago
Leo Zhao	54636a1982	call SetNumThreads everytime to avoid missing omp thread setting (#17224 ) * call SetNumThreads everytime to avoid missing omp thread setting resolve #17153 test=develop * add paddle_num_threads into config for test_analyzer_pyramid_dnn resolve #17153 test=develop	6 years ago
wopeizl	83c4f7721f	use two GPUs to run the exclusive test test=develop (#17187 )	6 years ago
tensor-tang	79ed1c76cd	fix bn fuse vardesc and add model saver (#17143 ) * fix bn fuse vardesc and add model saver test=develop * unify save model in test helper test=develop * fix mkdir on windows test=develop * remove magic number use bn bias var desc test=develop	6 years ago
Tao Luo	d9cd989825	Merge pull request #17048 from luotao1/fix_runtime_cache_bug fix runtime_context_cache bug when gpu model has an op runs only on cpu	6 years ago
tangwei12	13295d90d9	load persistables with selected rows, test=develop (#17047 )	6 years ago
luotao1	490e746269	fix runtime_context_cache bug when gpu model has an op runs only on cpu test=develop	6 years ago
wopeizl	d9991dccdd	add parallel build script to ci … (#16901 ) * add parallel build script to ci test=develop * 1. classify the test case as single card/two cards/multiple cards type 2. run test case according to the run type	6 years ago
Tao Luo	aa7b975bf6	disable runtime_context_cache pass by default test=develop	6 years ago
nhzlx	bc6b0ca1f4	fix trt anakin subgraph compile rely test=develop	6 years ago
gongweibao	cbdb8a17b1	Polish DGC code (#16818 )	6 years ago
Tao Luo	bc037c13c7	use multi-thread to speedup CI tests test=develop	6 years ago
Tao Luo	5b1565a7be	Merge pull request #16875 from lidanqing-intel/lidanqing/improve_preprocess_script Improve preprocessing script and read from tar	6 years ago
root	1965a22488	minus trt ci times. test=develop	6 years ago
Tao Luo	ca8b8fa0bd	Merge pull request #16830 from Superjomn/fix/tmp-memory-optim fix memory optim temporarily	6 years ago
lijianshe02	de26df440b	add SaveOptimModel interface in analysis_predictor.h and test it in a… (#16441 ) * add SaveOptimModel interface in analysis_predictor.h and test it in analyzer_dam_tester and analyzer_resnet50_tester test=develop	6 years ago
lidanqing	de02d40e98	improve preprocess script and read from tar test=develop	6 years ago
superjomn	f58c3ec189	fix memory optim temporarily test=develop	6 years ago
Yihua Xu	93cedfdb9c	Fix the order while sorting the operators (#16756 ) * Fix the order when sorting operators. test=develop * Enable transfomer compare test item. test=develop * Use set to replace vector. test=develop	6 years ago
liuwei1031	85363848a1	Security issue (#16774 ) * disable memory_optimize and inpalce strategy by default, test=develop * fix security issue http://newicafe.baidu.com:80/issue/PaddleSec-3/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-8/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-12/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-32/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-35/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-37/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-40/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-43/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-44/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-45/show?from=page test=develop * revert piece.cc, test=develop * adjust api.cc,test=develop	6 years ago
tensor-tang	d6c1b5a73b	disable seqpool concat pass by default saving CI time test=develop	6 years ago
Tao Luo	ad4a1bd13c	Merge pull request #16339 from luotao1/core_opt_choose_kernel Cache the chosen kernel of operators	6 years ago
Tao Luo	d5c8d4acfe	reduce all analyzer_test ci elasped time test=develop	6 years ago
luotao1	226596a296	Merge branch 'develop' into core_opt_choose_kernel	6 years ago
bingyanghuang	88ceda5134	MKLDNN INT8 v2 readme.md (#16515 )	6 years ago
luotao1	bd636a9ea6	test_analyzer_int8 tests use default pass order test=develop	6 years ago
Yan Chunwei	044ae2497d	fix identity temporarily (#15942 )	6 years ago
Wojciech Uss	ec2750b3c2	fix repeating passes (#16606 )	6 years ago
Wojciech Uss	9b6a029666	fix dataset reading and add support for full dataset (#16559 )	6 years ago
lidanqing	2ca0de3cd4	fix preprocess script with processbar, integrity check and logs (#16608 ) * fix preprocess script with processbar, integrity check and logs test=develop * delete unnecessary empty lines, change function name test=develop	6 years ago
Tao Luo	ce18710421	enhance analyzer_tests download test=develop	6 years ago
石晓伟	5dea0bdd1b	Merge pull request #16498 from Shixiaowei02/feature/anakin-engine merge feature/anakin-engine to develop	6 years ago
Shixiaowei02	7b9fc71076	update tensorrt subgraph_util test=develop	6 years ago
Wojciech Uss	2498395132	remove profiling from int8 test test=develop	6 years ago
Zhaolong Xing	3e6aa498d6	Merge pull request #16526 from NHZlX/refine_trt_anakin refine subgraph trt and anakin	6 years ago
Tao Luo	8f7b5883b8	Merge pull request #16529 from lidanqing-intel/lidanqing/preprocess-data preprocess with PIL the full val dataset and save binary	6 years ago
Tao Luo	5b24002389	Merge pull request #16399 from sfraczek/sfraczek/analyzer_int8_resnet50_test create test for quantized resnet50	6 years ago
Shixiaowei02	bddb2cd315	resolve conflicts with the develop branch test=develop	6 years ago
lidanqing	0d656996bf	fix some bugs of unzip and reading val list test=develop	6 years ago
nhzlx	d065b5bf2b	Anakin ssd support refine trt first run add quant dequant fuse pass omit simplify_anakin_priorbox_detection template omit transpose_flatten_concat_fuse template test=develop	6 years ago
lidanqing	b46e467abc	add wget and unzip part and change data_dir test=develop	6 years ago
lidanqing	894aa9b235	change script file name and data_dir location test=develop	6 years ago
lidanqing	57f51e5b08	preprocess with PIL the full val dataset and save binary test=develop	6 years ago
chengduo	ed61d67c73	Fix the interface of Pass::Apply (#16484 ) * modify the interface of Pass::Allay test=develop * Polish code test=develop * Fix Travis CI test=develop * fix Pass::Apply interface test=develop * Fix Travis CI test=develop	6 years ago
Sylwester Fraczek	8ece7a9708	fixed url to dataset test=develop	6 years ago
gongweibao	eb83abeac3	Add DGC(Deep Gradient Compression) interface. (#15841 )	6 years ago
Sylwester Fraczek	fe21578a44	create test for quantized resnet50 test=develop	6 years ago
Michał Gallus	2d8b7b3a76	Refine default MKL-DNN Pass order (#16490 ) * Refine default MKL-DNN Pass order test=develop * Add comment to default MKL-DNN Pass list test=develop	6 years ago
Wojciech Uss	09dfc7a2aa	C-API quantization core 2 (#16396 ) * C-API quantization core test=develop Co-authored-by: Sylwester Fraczek <sylwester.fraczek@intel.com> * Decouple Quantizer from AnalysisPredictor test=develop * fixes after review test=develop * renamed mkldnn quantize stuff test=develop * remove ifdef from header file test=develop	6 years ago
Yihua Xu	57dc3c1943	Disable compare for Issue#16316 (#16466 ) * Disable compare for accuracy issue. test=develop * Add todo comments to show more information. test=develop	6 years ago
nhzlx	953bdde058	Merge branch 'develop' of https://github.com/paddlepaddle/paddle into HEAD test=develop	6 years ago
nhzlx	45b3766fdf	fix comments test=develop	6 years ago
Wojciech Uss	46677fb080	Move cpu_quantize_* passes into mkldnn subfolder test=develop	6 years ago
liuwei1031	de3b70a101	fix cdn issue, test=develop (#16423 ) * fix cdn issue, test=develop * fix cdn issue, test=develop	6 years ago
nhzlx	3df7b98a0f	Merge branch 'develop' of https://github.com/paddlepaddle/paddle into HEAD	6 years ago
nhzlx	f3a2e4b3d8	1. Add ANAKIN_ROOT compile option 2. refine trt code test=develop	6 years ago
Tao Luo	294cdf6f48	Merge pull request #16177 from fc500110/remove_visualizer remove graph visualizer tool, which can be replaced by python IrGraph draw API	6 years ago
luotao1	056599a738	add expected_kernel_cache_pass test=develop	6 years ago
Wojciech Uss	cbe2dbf0db	Add enabling quantization (#16326 ) * Add enabling quantization test=develop * remove unused (here) function	6 years ago
nhzlx	4f4daa4b66	cherry-pick from feature/anakin-engine: add data type for zero copy #16313 1. refine anakin engine 2. add data type for zero copy align dev branch and PaddlePaddle:feature/anakin-engine brach the cudnn workspace modify was not included for now, because we use a hard code way in feature/anakin-engine branch. There should be a better way to implement it, and subsequent submissions will be made. test=develop	6 years ago
nhzlx	07dcf2856c	git cherry-pick from feature/anakin-engine: update anakin subgraph #16278	6 years ago
nhzlx	c407dfa3cb	cherry-pick from feature/anakin-engine: refine paddle-anakin to new interface. #16276	6 years ago
nhzlx	a25331bc26	cherry-pick from feature/anakin-engine: deal the changing shape when using anakin #16189	6 years ago
nhzlx	c79f06d3d8	cherry-pick from feature/anakin-engine: add batch interface for pd-anakin #16178	6 years ago
nhzlx	69d37f81d7	cherry-pick from feature/anakin-engine: refine anakin subgraph. #16157 support change input size	6 years ago
nhzlx	a1d200a5de	cherry-pick from feature/anakin-engine: Anakin support facebox #16111	6 years ago
flame	a32d420043	cherry-pick from feature/anakin-engine: batch norm (#16110 ) * use anakin batch norm and scale implement fluid batch norm	6 years ago
flame	0945b97f07	cherry-pick feature/anakin-engine: add anakin softmax/transpose/batch_norm/flatten/reshape op (#16020 ) * add anakin softmax/ flatten/reshape/transpose/batch_norm op converter	6 years ago
nhzlx	b21770a2aa	cherry-pick from feature/anakin-engine: Add subgraph fuse support and anakin engine #16018	6 years ago
nhzlx	084310f536	paddle-anakin: concat, split, pool2d converter#16003	6 years ago
flame	be523baad2	Add anakin conv2d/relu/sigmoid/tanh converter (#15997 ) * add activation op * test conv2d relu sigmoid tanh	6 years ago
Yan Chunwei	d0ce6a9044	fix anakin converter registry (#15993 )	6 years ago
luotao1	82af8031d9	add runtime_context_cache_pass test=develop	6 years ago
Tao Luo	7d2740db83	Revert "cache runtime_context"	6 years ago
Jacek Czaja	13816dd4ac	[MKL-DNN] Fix to crash of Transformer when mkldnn is to be used (#16233 ) * - Fix to crash of Transformer when mkldnn is to be used Desc: TensorCopy was not setting MKLDNN primitive descriptor when layout was to be kMKLDNN test=develop * - Enable transformer for mkl-dnn test=develo * - Compilation fix test=develop * - Removed manual selection of MKL-DNN ops to be used in Transformer test test=develop	6 years ago
Tao Luo	dbb92ee4b1	Merge pull request #16002 from luotao1/runtime_context cache runtime_context	6 years ago
Qiyang Min	8e4ad008fb	Merge pull request #16198 from velconia/imperative_train_speed Improve imperative mode training speed	6 years ago
luotao1	a275fd6e0c	Merge branch 'develop' into runtime_context	6 years ago
Wojciech Uss	2579ade45f	Add cpu_quantize_pass for C-API quantization (#16127 ) * Add cpu_quantize_pass for C-API quantization test=develop * add cpu_quantize_pass test * fix lint: add include memory unorderd_map and unordered_set test=develop * fuse_relu 1 test=develop * tuned 2 without squash * fixes test=develop * remove unused vars test=develop * refactored test=develop * fix lint c-style cast -> C++ style cast test=develop * remove QuantMax and c style casts test=develop * last usage of QuantMax removed test=develop * Fix Analysis Predictor UT Check if memory_optimize_pass has already been added to the analysis config before adding a new one, so that it is not added multiple times. test=develop * change map to unordered_map fix the forgotten part of cpu_quantize_pass_tester.cc test=develop * removed quantized attribute * fixed cpu_quantize_pass_tester and op attr comments test=develop * removed redundant line test=debug * removed gmock test=develop * fix after merge	6 years ago
luotao1	5ecdc49c6b	set enable_runtime_context_cache_ default false test=develop	6 years ago
minqiyang	7355d41834	1. Add imperative gperf profiler 2. Add binutils 2.27 in manylinux support test=develop	6 years ago
minqiyang	98dfb492bb	Release GIL lock	6 years ago
minqiyang	42e96a029f	Accelerate CPU part	6 years ago
luotao1	1510b866b6	turn off runtime_context_cache for tensorrt test=develop	6 years ago
luotao1	d94fd97230	add runtime_context_cache_pass test=develop	6 years ago
fc500110	1c6e72b905	remove visualizer, which can be replaced by python IrGraph draw API	6 years ago
Tao Luo	c49b7855fa	Merge pull request #16120 from Xreki/fix_cmake_compress Change the download and compress command of cmake.	6 years ago
Liu Yiqun	4e052e0ac9	Disable inference download for WIN32 temporary. test=develop	6 years ago
luotao1	1283833395	zero_copy tensor support INT32 test=develop	6 years ago
luotao1	31c4e1d9fc	Merge branch 'develop' into zero_copy	6 years ago
luotao1	9e2c7e69fb	simplify the zero_copy tests test=develop	6 years ago
luotao1	aeee4cbe71	add compare between zerocopy and analysis	6 years ago
Liu Yiqun	6bb84b74b2	Change the download and compress command of cmake. test=develop	6 years ago
Tao Luo	25ca2ca001	change init_idx to INT32 in transformer_test test=develop	6 years ago
Tao Luo	e5e7e9b865	Merge branch 'develop' into transformer_ut	6 years ago
Tao Luo	6f2581e4c5	Merge pull request #16090 from lidanqing-intel/paddle-int32 Add PaddleDType INT32 support	6 years ago
Zhaolong Xing	3d63aa0a11	Merge pull request #15729 from NHZlX/add_static_model_load_for_trt Four points for enhancing Paddle-TRT	6 years ago
nhzlx	a9ed427749	cant not pass ci add if use static engine for trt test=develop	6 years ago
luotao1	fad06cb928	unify ZeroCopy in analysis_test	6 years ago
lidanqing	4aeb261da9	Add INT32 support. INT32 in last switch case test=develop	6 years ago
luotao1	06aab1b493	refine SetCpuMathLibraryNumThreads test=develop	6 years ago
nhzlx	3c40cb767b	7 refine zero copy update trt in docker file test=develop	6 years ago
Yiqun Liu	1616c32acf	Add the include of cudnn.h to enable the use of CUDNN_VERSION. (#15961 ) test=develop	6 years ago
flame	b187e3728e	add anakin fc op converter (#15965 )	6 years ago
flame	e40d56c3d3	anakin subgraph engine (#15774 ) * add anakin subgraph engine * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * add initial op converter * update * update * fix op register compile error * update test=develop * update	6 years ago
nhzlx	2eff3e26b6	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add_static_model_load_for_trt	6 years ago
nhzlx	06a088a199	fix comments and fix cpplint test=develop	6 years ago
nhzlx	0ed63b2108	6. delete useless predictor id test=develop	6 years ago
nhzlx	1d5ef7c9ee	5. add static trt load model 1). add static trt load model 2). fix bug: when device_id is not 0, the trt will have a bug test=develop	6 years ago
Tao Luo	4774dad806	Merge pull request #15857 from sfraczek/fix-typo Fix few typos	6 years ago
Tao Luo	e3dd6970fc	disable dam temporarily (#15860 ) test=develop	6 years ago
Sylwester Fraczek	1943119fc5	fix typo memeroy->memory test=develop	6 years ago
Sylwester Fraczek	8bc604571f	fix typo seriazlized->serialized	6 years ago
Sylwester Fraczek	543e53db05	fix typo releated->related	6 years ago
Dun	a83e470405	Profiler refine and add CUDA runtime api tracer (#15301 ) * refine profiler && add runtime tracer * test=develop * test=develop * test=develop * test=develop * test=develop * test=develop * test=develop * test=develop * fix bug && test=develop * add thread id map && test=develop * test=develop * testing * bug fix * remove cuda event && refine code && test=develop * test=develop * test=develop * test=develop * fix windows temp file && test=develop * test=develop * fix windows bug && test=develop * fix start up issue && test=develop * code polish && test=develop * remove unused code && test=develop * add some cupti cbid && test=develop * add FLAGS_multiple_of_cupti_buffer_size && test=develop * fix compile error && test=develop * add keyword && test=develop * fix && test=develop * code polish && test=develop	6 years ago
Yiqun Liu	e38dd91f04	Refine cmake's download function. (#15512 ) * Refine cmake's download function. test=develop * Set DOWNLOAD_NO_EXTRACT to 1 pure download function. test=develop * Fix unpack problem in ExternalProject_Add, and it seem DOWNLOAD_NO_EXTRACT option is not support in cmake-3.5. test=develop	6 years ago
tensor-tang	e1c707fe9c	fix warnings (#15790 ) * fix warnings test=develop * fix enforce test test=develop	6 years ago
nhzlx	2070fb246d	4. do the trt_engine optim during init. add simple static mode loading test=develop	6 years ago
nhzlx	ecc12fb430	3. when runing in trt mode, do not allocate memory for parameters in fluid. test=develop	6 years ago
nhzlx	9cc6249cd6	2. TRTEngine using stream only when execute.	6 years ago
Wojciech Uss	daac6a05f5	Removed duplicated code This also fixes linking to libpaddle_fluid.so built in debug mode test=develop	6 years ago
Yan Chunwei	3a5d6e5e64	move passes to src to avoid different behavior in deployment (#15705 )	6 years ago
nhzlx	034ba1c291	add static model load for trt 1. bind trt input and output to fluid tensors	6 years ago
Yan Chunwei	c00ed19df2	add more comment (#15603 )	6 years ago
Gabor Buella	da9c94da33	Clang build fixes (#15628 ) * Remove some superfluous std::move calls The std:move triggered a build error (with -Werror): ``` [ 9%] Building CXX object paddle/fluid/memory/allocation/CMakeFiles/allocator_facade.dir/allocator_facade.cc.o /home/tej/code/gbuella_paddle/paddle/fluid/memory/allocation/allocator_facade.cc:86:29: error: moving a temporary object prevents copy elision [-Werror,-Wpessimizing-move] [this] { return std::move(CreateAllocatorWithChunk()); }, capacity); ^ /home/tej/code/gbuella_paddle/paddle/fluid/memory/allocation/allocator_facade.cc:86:29: note: remove std::move call here [this] { return std::move(CreateAllocatorWithChunk()); }, capacity); ^~~~~~~~~~ ~ 1 error generated. ``` See: https://reviews.llvm.org/D7633 * Remove a superfluous lambda capture from framework/operator.h ``` [ 10%] Building CXX object paddle/fluid/platform/CMakeFiles/device_context.dir/init.cc.o In file included from /home/tej/code/gbuella_paddle/paddle/fluid/platform/init.cc:19: /home/tej/code/gbuella_paddle/paddle/fluid/framework/operator.h:229:21: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture] [this](Variable* var) { return var; }); ^~~~ 1 error generated. ``` Changing it to `return it->second;`, as is in the function below. * Rethrow an exception (instead of copying it) ``` [ 11%] Building CXX object paddle/fluid/framework/CMakeFiles/operator.dir/operator.cc.o /home/tej/code/gbuella_paddle/paddle/fluid/framework/operator.cc:191:13: error: local variable 'exception' will be copied despite being thrown by name [-Werror,-Wreturn-std-move] throw exception; ^~~~~~~~~ /home/tej/code/gbuella_paddle/paddle/fluid/framework/operator.cc:191:13: note: call 'std::move' explicitly to avoid copying throw exception; ^~~~~~~~~ std::move(exception) ``` See https://reviews.llvm.org/D43322 for an explanation of this diagnostic message. * Remove an unused variable ``` /home/tej/code/gbuella_paddle/paddle/fluid/framework/operator.cc:884:16: error: private field 'scope_' is not used [-Werror,-Wunused-private-field] const Scope& scope_; ^ ``` * struct ComputationOpHandle -> class ComputationOpHandle ``` [ 13%] Building CXX object paddle/fluid/framework/details/CMakeFiles/memory_early_delete_pass.dir/memory_early_delete_pass.cc.o In file included from /home/tej/code/gbuella_paddle/paddle/fluid/framework/details/memory_early_delete_pass.cc:21: /home/tej/code/gbuella_paddle/paddle/fluid/framework/details/reference_count_pass_helper.h:30:1: error: class 'ComputationOpHandle' was previously declared as a struct; this is valid, but may result in linker errors under the Microsoft C++ ABI [-Werror,-Wmismatched-tags] class ComputationOpHandle; ^ /home/tej/code/gbuella_paddle/paddle/fluid/framework/details/computation_op_handle.h:29:8: note: previous use is here struct ComputationOpHandle : public OpHandleBase { ^ /home/tej/code/gbuella_paddle/paddle/fluid/framework/details/reference_count_pass_helper.h:30:1: note: did you mean struct here? class ComputationOpHandle; ^~~~~ struct 1 error generated. ``` * Fix name() methods under fluid/operators ``` In file included from /home/tej/code/gbuella_paddle/paddle/fluid/operators/jit/gen/act.cc:15: In file included from /home/tej/code/gbuella_paddle/paddle/fluid/operators/jit/gen/act.h:19: /home/tej/code/gbuella_paddle/paddle/fluid/operators/jit/gen/jitcode.h:71:23: error: 'name' overrides a member function but is not marked 'override' [-Werror,-Winconsistent-missing-override] virtual const char* name() const = 0; ^ /home/tej/code/gbuella_paddle/paddle/fluid/operators/jit/gen_base.h:31:23: note: overridden virtual function is here virtual const char* name() const = 0; ^ ``` test=develop	6 years ago
Chunwei	d85c2e4e5c	fix anakin compile dependency test=develop	6 years ago
wopeizl	3614dadf23	Merge pull request #15631 from wopeizl/windows/fixci fix ci broken randomly and disable some warnings	6 years ago
peizhilin	061299be87	fix dependency test=develop	6 years ago
Gabor Buella	2bf63f4c33	Fix std::abs usage in memory_optimize_pass.cc (#15627 ) test=develop size_t is an unsigned integer, with a conversion rank larger than int, therefore in the following expression the int value was promoted to size_t, making it a subtraction of unsigned values. The result of such a subtraction is also an unsigned value.	6 years ago
peizhilin	3a4110f960	fix ci broken randomly and disable some warnings test=develop	6 years ago
dzhwinter	4f01de6378	Merge remote-tracking branch 'origin/develop' into feature/ir_inplace_pass	6 years ago
qingqing01	943d972878	Fix analysis predictor when loading the persistable RAW type variable. (#15613 )	6 years ago
dzhwinter	9c9ad7d40b	Merge remote-tracking branch 'origin/develop' into feature/ir_inplace_pass test=develop	6 years ago
Yan Chunwei	e887d71958	fix ir debug config (#15571 )	6 years ago
Yan Chunwei	897789b16e	fix save_inferece_model bug (#15365 )	6 years ago
dzhwinter	6f9904e99a	rerun windows ci. test=develop	6 years ago
Tao Luo	3d0ecab41b	add analyzer_transformer_test test=develop	6 years ago
Tao Luo	1a252f4be6	Merge pull request #15587 from luotao1/bert use embedding=128 bert model for test	6 years ago
Jiabin Yang	b4c24f3f7c	Merge pull request #15575 from JiabinYang/feature/imperative test=develop, polish code and fix some wrong change	6 years ago
Zhaolong Xing	90ffe74954	Merge pull request #15546 from NHZlX/fix_trt_utest_random_failed fix trt models utest failed.	6 years ago
luotao1	8f0c2b07f2	use embedding=128 bert model for test test=develop	6 years ago
JiabinYang	16f64b43d4	test=develop, Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/imperative	6 years ago
Tao Luo	245b1f0579	Merge pull request #15570 from luotao1/bert fix compiler error, use len20 dataset for bert	6 years ago
JiabinYang	bb881199f2	test=develop, polish code and fix wrong change in /paddle/fluid/inference/utils/CMakeLists.txt	6 years ago
Jiabin Yang	075df09f86	Merge pull request #15470 from JiabinYang/feature/imperative Add simple RNN in imperative	6 years ago
luotao1	5504425eb3	fix compiler error, use len20 dataset for bert test=develop	6 years ago
Yan Chunwei	655179089f	AnalysisConfig remove contrib namespace (#15540 )	6 years ago
luotao1	e31aef9f6e	Merge branch 'develop' into fc500110-bert_test test=develop	6 years ago
qingqing01	a6910f900e	Always create variables in analysis_predictor before OptimizeInferenceProgram. (#15533 ) Otherwise, some other persistable variable (like RAW type) will not be created	6 years ago
Yan Chunwei	b62b756b28	add version support (#15469 )	6 years ago
Yan Chunwei	526790e652	infer get program (#15511 )	6 years ago
JiabinYang	2e309b11c2	test=develop, merge develop	6 years ago
nhzlx	95b98f27ae	fix trt models utest failed. test=develop	6 years ago
Tao Luo	b919190232	Merge pull request #15531 from jczaja/prv-googlenet-fix Performance and functional fixes to LRN	6 years ago
JiabinYang	53d558cd41	test=develop, polish code and merge develop	6 years ago
Zhaolong Xing	97b76c94c4	Merge pull request #15242 from NHZlX/trt_int8_ultimate_version add trt int8 support	6 years ago
Jacek Czaja	4aa7ef3c13	- Compensation fix to LRN MKL-DNN op test=develop	6 years ago
nhzlx	b43ea40c51	delete the usage of the const_cast test=develop	6 years ago
Yan Chunwei	e2818c8608	add dynamic memory optim (#15457 )	6 years ago
nhzlx	92cf4a4c6b	fix comments test=develop	6 years ago
JiabinYang	1bf2facecb	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/imperative	6 years ago
JiabinYang	e3a8929cf8	little change	6 years ago

... 2 3 4 5 6 ...

1236 Commits (24a063f6ac0ba1122b5b6bec524c6ec659197e5f)