Paddle

Commit Graph

Author	SHA1	Message	Date
石晓伟	2bb135825e	fix analysis_predictor when func is called multiple times, test=release/1.6 (#21665 )	5 years ago
joanna.wozna.intel	d419b859c0	Add reshape int8 mkldnn op (#21428 ) * Add reshape int8 op test=develop * Change test to CPUPlace test=develop * Correct tests test=develop	5 years ago
Zhaolong Xing	fbbd94a6ce	there is bug for inference using auto grwoth allocator (#21621 ) test=develop	5 years ago
rensilin	7f5d532a9c	fix: fail to call ZeroCopyTensor::mutable_data() when device_id is no… (#21461 ) * ZeroCopyTensor::mutable_data in the right device, test=develop * add unittest for zerocopy, test=develop	5 years ago
Pei Yang	20d61414b4	fix glog warning, test=develop (#21573 )	5 years ago
Pei Yang	122b37ce62	make config option DisableGlogInfo() able to mute all inference logs (#21318 ) * make DisableGlogInfo able to mute all logs in inference.	5 years ago
Zhaolong Xing	b39c011637	specify the auto growth allocator for inference. (#21448 ) test=develop	5 years ago
Lv Mengsi	37f3e56dea	Fix transpose conv (#21406 ) * fix transpose conv,test=develop * fix comments test=develop	5 years ago
Zhaolong Xing	d1a6e112e6	fix C++ multicard inference bug. (#20955 ) test=develop	5 years ago
Michał Gallus	5d7d548275	INT8 Fully-connected (#17641 ) * Implement Int8 FC * Integrate FC into INT8v2 test=develop * int8 FC: transpose weights before computing scales test=develop * Add support for activation_type string in FC test=develop * Disable MKL-DNN's FC in VGG16 and 19 test=develop * Disable FC quantization when mkldnn FC is disabled test=develop * Solve PADDLE_ENFORCES in FC int8 * Fix Paddle enforces and remove const cast test=develop * Fix style changes test=develop * Fix quantizer_tester test and add fc quantization test=develop * Fix FC test fail on CUDA * Remove unnecessary log from quantize placement pass test=develop * Add Thread ID to FC hash key test=develop * Add comments to MKL-DNN FC Kernel test=develop * Refactor quantizer test=develop * Fix linter issues test=develop * Fix crash in slim googlenet test=develop * Fix PADDLE_ENFORCE messages test=develop	5 years ago
silingtong123	45c1e7bb7b	add prediction demo and script on windows (#21248 )	5 years ago
zhouwei25	c0dcb090a3	Determine whether to copy and link inference lib by ON_INFER (#20931 )	5 years ago
Zhaolong Xing	65f7052554	TRT int8: refine trt int8 for dynamic range set (#21112 ) * refine trt int8 for dynamic range set test=develop * refine trt int8 test=develop	5 years ago
joanna.wozna.intel	77c2083586	Add transpose2 INT8 for mkl-dnn (#19424 ) * Add transpose2 INT8 for mkl-dnn test=develop * Fix test_transpose_int8_mkldnn test=develop * Revert "Merge branch 'develop' into transpose_int8_mkldnn_2" This reverts commit 34011bdba4c859abb945e062ab13124f70508054, reversing changes made to 2ce6473f144da298aba4a43d46918f27d463cf7c. * Revert "Revert "Merge branch 'develop' into transpose_int8_mkldnn_2"" This reverts commit 23754dd78ca47ae56881161172b2aacd349aba90. * Add template to TransposeMKLDNNHandler test=develop * Resolve conflict test=develop * Restore get_size and refactor test=develop	5 years ago
Pei Yang	e89c16b90d	Bug Fix: Paddle-TRT cannot handle adaptive pooling in pool2d op converter and "num" attribute in split op converter (#20733 ) * fix pool2d trt converter, test=develop * add fix for split op converter, test=develop	5 years ago
石晓伟	e742760f8e	optimize version error, test=develop (#20715 )	5 years ago
石晓伟	d8f4f4239d	Ensure backward compatibility with the anakin interface, test=develop (#20691 ) * support MLU nums, test=develop * change anakin apis, test=develop	5 years ago
Pei Yang	443f604c3b	add DisableGlogInfo() to AnalysisConfig, test=develop (#20581 )	5 years ago
zhaoyuchen2018	b8333edef6	Add Multihead matmul fuse pass (#20167 ) * Add multihead fuse pass for ernie opt * Refine softmax test=develop * Refine cuda kernel * Refine cuda version * Refine cmake test=develop * refine header file * refine test case and pass * refine comments	5 years ago
Adam	7faa3e9555	Add ConvTranspose + BatchNorm fuse pass (#20161 ) * Add ConvTranspose + BatchNorm fuse pass test=develop * Add tests for conv+bn and conv_transpose+bn passes test=develop	5 years ago
石晓伟	2c28e3283a	fix analysis_predictor ci, test=release/1.6 (#20141 )	5 years ago
石晓伟	01b9d07963	update operator compatible info, test=develop (#19978 ) * update operator compatible info, test=develop * revert cmake/version.cmake, test=develop * add unit_tests and fix bugs, test=develop * update ../paddle/fluid/framework/framework.proto, test=develop * fix bug of paddle/fluid/inference/api/analysis_predictor.cc, test=develop * update paddle/fluid/framework/version_test.cc, test=develop * add comments and rename interfaces, test=develop	5 years ago
Zhaolong Xing	e89b12884a	FIx C++ inference BUG: When open memory optim and enable trt subgraph at the same time, there is a bug (#19969 ) * fix memory optimization type test=develop * 1. fix BUG: open trt and memory optim will trigger bug. 2. Clean memory optim bug. test=develop	5 years ago
Yiqun Liu	3cd985a669	Add a pass to fuse fc+elementwise_add+layernorm (#19776 ) * Add fc_elementwise_layernorm_fuse pass and unittest. * Add fused_fc_elementwise_layernorm op and its GPU kernel. test=develop * Apply fc_elementwise_layernorm_fuse_pass to GPU inference. * Add the setting of attrs in the definition of binary_op. test=develop * Add comment. * Implement the unittest. test=develop * Change the unittest name of layer_norm. test=develop	5 years ago
石晓伟	71b2ed61bc	support MLU nums, test=develop (#19372 )	5 years ago
Pei Yang	9cbc1eff2d	zerocopytensor support uint8, analysis config support profile, analysis predictor support GetInputTensorShape, test=develop (#19822 )	5 years ago
Yiqun Liu	a65c728e5d	Implement the GPU kernel of fc operator (#19687 ) * Refine the codes related to fc op. * Add GPU implementation for fc functor. * Apply fc_fuse_pass in GPU inference. test=develop * Change the cmake for fc op. * Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ. * Add an attribute to set the activation type in fc_op. * Enhance the unittest of fc_op. test=develop * Remove the declaration of FCOpGrad back to the header file. test=develop * Set default value for newly added arguments in test_fc_op. test=develop	6 years ago
Tao Luo	f05d2c519d	paddle::framework::vectorize() templatization [PART3] (#19643 ) * paddle::framework::vectorize() templatization test=develop * update pybind/imperative.cc test=develop * revert update on unsqueeze_op.cc and warpctc_cudnn_op.cu.cc test=develop	6 years ago
Yiqun Liu	c5548178b0	A a pass to enable the use of cudnn (#19346 ) * Add a interface to enable cudnn for inference. * Add cudnn_placement_pass. test=develop * Set the default value of cudnn_enabled_op_types to null. test=develop * Write the common basic class, placement_pass_base, to refine the codes. test=develop * Call EnableCUDNN in unittest. test=develop * Refine cudnn_placement_pass tester. * Enable the testing of cudnn_placement_pass in inference's unittest. test=develop * Add the check of op kernels. test=develop	6 years ago
liuwei1031	d6cb1a4122	add dynamic C runtime support on windows, test=develop (#19502 )	6 years ago
Yiqun Liu	fcec365d29	Add a pass to replace dropout_op with scale_op when is_test is true (#19297 ) * Add simplify_with_basic_ops_pass to replace dropout_op with scale_op when is_test is true. test=develop * Delete dropout_op directly when upscale_in_train is true. test=develop * Improve the debug string, adding the print of op_desc information. * Fix the case when dropout's input x is reused as the next op's output. * Add the pass to inference. test=develop * Change the log level. test=develop * Add unittest for inplace case. * Add comment to explain the pass. * Apply the pass for CPU inference. test=develop * Fix the typo. test=develop * Add the check of AttrType. test=develop	6 years ago
lidanqing	9240e5325c	add local user data conversion into full_pascalvoc_test_preprocess.py (#19283 ) * add local user data conversion into full_pascalvoc_test_preprocess.py test=develop * change PADDLE_ENFORCE to PADDLE_ENFORCE_GE test=develop * change according to reviews test=develop	6 years ago
Adam	97d1db1874	Add generalized Conv+Activation MKLDNN fuse pass creation Part2 (#19237 ) * Add generalized Conv+Activation MKLDNN fuse pass creation Part2 test=develop * Undefined behaviour of GetAttrIfExists<> FIX test=develop	6 years ago
Zhaolong Xing	76c95af000	Fix BUG: Mask RCNN inference diff When using AnalysisPredictor. (#19213 ) * fix mask rcnn bug: 1. affine channel fuse (diff) 2. condition block op (memory leak) 3. merge lod tensor op (diff) 4. memroy optim (diff) test=develop * fix ci aboud PADDLE_ENFOCE fix merge lod infer op ut test=develop	6 years ago
Zeng Jinle	5b6673c44d	merge develop to solve conflict, also fix API doc, test=develop (#18823 )	6 years ago
Adam	b837689e97	Add generalized Conv+Activation MKLDNN fuse pass creation (#19072 ) test=develop	6 years ago
wopeizl	80b7ef6fc8	add tensorrt support for windows (#19084 ) * add tensorrt support for windows	6 years ago
Tao Luo	741ce8bb1a	inference_shared_library support profile (#16275 ) test=develop	6 years ago
silingtong123	fd3b666d8c	test=develop,Synchronize the contents of develop with release1.5 (#18937 ) Fix the third-party openblas dependency for paddle on windows	6 years ago
石晓伟	ee2f296ef8	Fusion: seqpool_cvm_concat (#18471 ) * add fusion_seqpool_cvm_concat test=develop * simplify pass, test=develop * fix code style, test=develop	6 years ago
liuwei1031	0d99690809	fix several security bugs reported by security team (#18831 ) * fix security issue, test=develop * bug fix, test=develop * throw an exception when null pointer data with non-zero length PaddleBuf is passed, test=develop	6 years ago
Zhaolong Xing	61238d31f7	Trt fp16 support (#18860 ) * Fix Mask rcnn predictor 1. refine memory optim algorithm to support the model with the block op. 2. output diff : modify the affine channel fuse 3. add condition_block_infer op add interface for setting trt calib table dir test=develop * add the missing files. test=develop * 1 add trt fp16 support test=develop	6 years ago
Zhaolong Xing	26ae6d49e4	Update trt5 for paddle-trt (#18645 ) * update paddle-trt for: 1. fix bug: when batch > 2, core in split plugin. 2. add leaky_relu trt5.0 support (yolov3 from 65ms to 42ms.) 3. add new attr to dropout. 4. shuffle channel, swish, relu6 support test=develop * 1. fix ci test=develop	6 years ago
石晓伟	25d8079140	Fix Bitmain Predictor::Clone() (#18599 ) * update anakin-engine interfaces for content-dnn test=develop * support only-gpu mode of Anakin modify eltwise parse test=develop * modification for thread-safe test=develop * Integrated template instance test=develop * increase template parameters test=develop * support MLU predictor test=develop * update anakin cmake files test=develop * update TargetWrapper::set_device * update the initialization of anakin subgraph test=develop * use the default constructor of base class test=develop * load model from buffer with length test=develop * modify the access level of class test=develop * support anakin for bitmain arch test=develop * remove files * checkout cmakelists test=develop * modify interfaces test=develop * add cmake dependments test=develop * enforce the outputs of net test=develop	6 years ago
Tao Luo	076f833110	add config.SetMkldnnCacheCapacity api for mkldnn cache clear strategy (#18580 ) * add config.SetMkldnnCacheCapacity api for mkldnn cache clear strategy test=develop * enhance MkldnnPostReset test=develop * add comments for mkldnn_cache_capacity field test=develop	6 years ago
Jiabin Yang	667f88f9a6	Fix/gcc 4.8 ubt link error (#18558 ) * test=develop, fix docker with paddle nccl problem * test=develop, fix/gcc_4.8_ubt_link_error * test=develop, fix code format	6 years ago
Zhaolong Xing	88b52a27fe	Inference: fix mask rcnn model diff, optim memory usage, memory leak. (#18532 ) * Fix Mask rcnn predictor 1. refine memory optim algorithm to support the model with the block op. 2. output diff : modify the affine channel fuse 3. add condition_block_infer op add interface for setting trt calib table dir test=develop * add the missing files. test=develop	6 years ago
石晓伟	1529154821	Support Bitmain Anakin (#18542 ) * update anakin-engine interfaces for content-dnn test=develop * support only-gpu mode of Anakin modify eltwise parse test=develop * modification for thread-safe test=develop * Integrated template instance test=develop * increase template parameters test=develop * support MLU predictor test=develop * update anakin cmake files test=develop * update TargetWrapper::set_device * update the initialization of anakin subgraph test=develop * use the default constructor of base class test=develop * load model from buffer with length test=develop * modify the access level of class test=develop * support anakin for bitmain arch test=develop * remove files * checkout cmakelists test=develop	6 years ago
石晓伟	047bba855b	Remove the obsolete cmake options (#18481 ) * remove the obsolete cmake options, test=develop * remove unittests, test=develop	6 years ago
Tao Luo	3123d18787	remove unused AnalysisPredictor::SetMkldnnThreadID() (#18444 ) test=develop	6 years ago
Michał Gallus	7023a86c3a	Fix Pooling output scale (#18186 ) * Int8: Fix Pooling output scale test=develop * Update scales quantization for certain operators These include: concat, transpose, pool and reshape. test=develop * Move concat minimum scale finding to quantizer test=develop	6 years ago
Michał Gallus	8409693272	Reset DeviceContext after quantization warmup (#18182 ) test=develop	6 years ago
Sylwester Fraczek	9252e8fa08	add int8 mkldnn prior_box (#17242 ) add prior_box quantization code add scale algo rules for prior box test=develop	6 years ago
wopeizl	daa32d5383	fix package generation for inference test=develop (#18220 )	6 years ago
翟飞跃	802ea50956	fix spelling errors (#17941 ) * fix spelling errors; test=develop * Update API.spec update md5 * Update API.spec * change the order of api;test=develop	6 years ago
石晓伟	04ea7cb069	modify the access level of anakin engine (#18015 ) test=develop	6 years ago
石晓伟	bce259e5bf	Update the Anakin interfaces for content-dnn and MLU (#17890 ) * update anakin-engine interfaces for content-dnn test=develop * support only-gpu mode of Anakin modify eltwise parse test=develop * modification for thread-safe test=develop * Integrated template instance test=develop * increase template parameters test=develop * support MLU predictor test=develop * update anakin cmake files test=develop * update TargetWrapper::set_device * update the initialization of anakin subgraph test=develop * use the default constructor of base class test=develop	6 years ago
石晓伟	d008260fa8	update the initialization of anakin subgraph (#17880 ) test=develop	6 years ago
Zhaolong Xing	ae576f3c68	fix: when use the load model from memory mode, the RAM occupy is high (#17788 ) test=develop	6 years ago
翟飞跃	993c703bcc	INT8 MKL-DNN v2 integrate to slim (#17634 ) * refactor PR 16865 * delete mergetool files * test=develop * test=develop * test=develop * test=develop * create dir for int8 model before call SaveOptimModel * test=develop * mkldnn int8 only support linux; test=develop * refine code; test=develop * remove comment; test=develop * refine code; test=develop * fix bug; test=develop * add exception for mkldnn_post_training_strategy * reuse int8v2 CAPI dataset; test=develop * fix accuracy check bug; test=develop * remove tab * convert files to unix format * test=develop * reduce CI time;test=develop * reduce CI time and refine code;test=develop * refine comment; test=develop * add cmake FLAGS;test=develop * remove predict_num;test=develop	6 years ago
Tao Luo	e089e454a1	make omp thread num default 1 after inference run (#17801 ) test=develop	6 years ago
mozga-intel	5eb81fe595	Capi for a ngraph engine (#17037 )	6 years ago
lidanqing	04b6c29ee0	Improve mobilenetv2 INT8 performance by using INT8 relu as post-op (#17570 ) * add INT8 conv+relu6 fuse and enbale mobilentv2 INT8 test test=develop * change fasle and 0.0 to fuse_brelu and brelu_threshold test=develop change the "fuse_relu\|\|fuse_brelu" to "unsigned_output" test=develop * Use relu instead of brelu as INT8 post-op because INT8 brelu is not enabled in mkldnn v0.18 test=develop * continuous-integration fix test=develop	6 years ago
Jacek Czaja	6d8075ecef	[MKL-DNN] conv_transpose mkldnn bias pass (#17644 ) * - changes to graph detector - Changes to pass - Added ut for new pass - use_pass - Added pass to mkldnn passes - fix to registration - improved verbose messaging for conv bias passes - Lint fixes test=develop * - Lint fixes test=develop	6 years ago
Sylwester Fraczek	96845d2168	add Concat quantization (#17448 ) * add Concat quantization add unit test for quantizing concat fix for wrong value when the input is not in map of calculated scales add use_quantizer to concat_op.cc add scale_algo rules for concat test=develop * missing fix for multiple inputs quantize-squash * wojtuss review fix: adding comment test=develop	6 years ago
Zhen Wang	8bd651b7ed	Fix the bug in the AnalysisPredictor and add more directions about io APIs. (#17639 ) * fix the bug that sub_scope_ may be null in AnalysisPredictor::Run. * add more directions about io APIs' docs. * update the API.spec. test=develop test=document_preview	6 years ago
Zeng Jinle	4aa931dd85	Code clean of Allocator (#17602 ) * Revert "Revert "Fix allocator bug"" This reverts commit `174d0d0b90`. * Revert "fix travis ci" This reverts commit `5656fa9f7c`. test=develop * add inlined_vector.h, test=develop * add inlined_vector_test,test=develop * clean code of allocator,test=develop * delete zero_size_allocator.h,test=develop * fix failed unittest,test=develop	6 years ago
Zhaolong Xing	61221ebc28	TRT: Support set dynamic range in int8 mode. (#17524 ) * fluid int8 train and trt int8 predict align. trt int8 predict init op converter * 2. align fluid int8 train and trt int8 inference. enhance quant dequant fuse pass enhance op converter, trt engine, trt engine op, trt subgraph pass. * 3. add delete_quant_dequant_pass for trt test=develop * 4. add the missing file test=develop * 5. i modify the c++ interface, but forget to modify the pybind code fix the IS_TRT_VERSION_GE bug, and fix elementwise op converter test=develop	6 years ago
Michał Gallus	0c39b97b4e	[MKL-DNN] Add Fully Connected Op for inference only(#15226 ) * fuse mul and elementwise add to fc * Reimplement the FC forward operator * Fix FC MKLDNN integration by transposing weights * Add FC MKLDNN Pass test=develop * FC MKLDNN Pass: change memcpy to std::copy * Fix MKLDNN FC handling of mismatch input and weights dims * Lower tolerance for MKL-DNN in resnet50 test test=develop * Adjust FC to support MKLDNN Op placement test=develop * Adjust Placement Op to set use_mkldnn attribute for graph test=develop * MKLDNN FC: fix weights format so that gemm version is called test=develop * FC MKLDNN: Remove tolerance decrease from tester_helper * FC MKL-DNN: Refactor the code, change input reorder to weight reorder * MKL-DNN FC: Introduce operator caching test=develop * FC MKL-DNN: Fix the tensor type in ExpectedKernelType test=develop * FC MKL-DNN: fix style changes test=develop * FC MKL-DNN: fallback to native on non-supported dim sizes test=develop * FC MKLDNN: fix CMake paths test=develop * FC MKLDNN: Refine placement pass graph mkldnn attribute test=develop * Fix Transpiler error for fuse_conv_eltwise test=develop * Fix missing STL includes in files test=develop * FC MKL-DNN: Enable new output size computation Also, refine pass to comply with newest interface. test=develop * FC MKL-DNN: enable only when fc_mkldnn_pass is enabled * FC MKL-DNN: Allow Weights to use oi or io format * FC MKL-DNN: Adjust UT to work with correct dims test=develop * Enable MKL DEBUG for resnet50 analyzer test=develop * FC MKL-DNN: Improve Hashing function test=develop * FC MKL-DNN: Fix shape for fc weights in transpiler * FC MKL-DNN: Update input pointer in re-used fc primitive * Add log for not handling fc fuse for unsupported dims test=develop * FC MKL-DNN: Move transpose from pass to Op Kernel test=develop * FC MKL-DNN: Disable transpose in unit test test=develop * FC MKL-DNN: Remove fc_mkldnn_pass from default list * Correct Flag for fake data analyzer tests test=develop * FC MKL-DNN: Add comment about fc mkldnn pass disablement test=develop * FC MKL-DNN: Disable fc in int8 tests test=develop	6 years ago
Sylwester Fraczek	5b2a3c4b12	Conv concat relu quantization (#17466 ) * add conv_concat_relu fuse test=develop * add test code test=develop * added missing include with unordered_map test=develop * review fixes for wojtuss test=develop * remove 'should (not) be fused' comment statements one of them was invalid anyway test=develop	6 years ago
guomingz	2281ebf0f3	Enable the convolution/relu6(bounded_relu) fusion for FP32 on Intel platform. (#17130 ) * Relu6 is the bottleneck op for Mobilenet-v2. As the mkldnn supports the conv/relu6 fusion, we implement it fusion via cpass way. Due to the int8 enabling for this fusion will be supported in MKLDNN v0.20, so this PR is focused on the fp32 optimization. Below table shows the benchmark(FPS) which measured on skx-8180(28 cores) Batch size \| with fusion \| without fusion -- \| -- \| -- 1 \| 214.7 \| 53.4 50 \| 1219.727 \| 137.280 test=develop * Fix the format issue test=develop * Add the missing nolint comments. test=develop * Fix the typos. test=develop * Register the conv_brelu_mkldnn_fuse_pass for the MKLDNN engine. test=develop * Adjust the indentation. test=develop * Add the test_conv_brelu_mkldnn_fuse_pass case. test=develop * Slightly update the code per Baidu comments. Let the parameter definition embedded into the code. That's will make the code easy to understand. test=develop	6 years ago
liuwei1031	ba70cc499e	fix security bugs : (#17464 ) http://newicafe.baidu.com:80/issue/PaddleSec-33/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-28/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-25/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-24/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-21/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-20/show?from=page test=develop	6 years ago
Tao Luo	32da5e9c3d	remove unused expected_kernel_cache_pass (#17486 ) test=develop	6 years ago
wopeizl	ca3ba378c7	fix the random compilation failure on windows test=develop (#17475 ) * fix the random compilation failure on windows	6 years ago
Zhen Wang	4a1b7fec96	Add setting Scope function for the graph class (#17417 ) * add set_not_owned function for graph * add scope set. test=develop * add scope_ptr enforce not null before setting.test=develop	6 years ago
Zhaolong Xing	7a3bb061d8	fix: (#17279 ) 1. infernce multi card occupy 2. facebox model inference occupy too much test=develop	6 years ago
Wojciech Uss	984aa90583	improved unit test output (#17266 ) added printing data type to differentiate int8 and fp32 latency results test=develop	6 years ago
石晓伟	a72dbe9abf	Cherry-pick benchmark related changes from release/1.4 (#17156 ) * cherry-pick commit from `8877054` * cherry-pick commit from `3f0b97d` * cherry-pick from 16691:Anakin subgraph support yolo_v3 and faster-rcnn (cherry picked from commit `8643dbc233`) * Cherry-Pick from 16662 : Anakin subgraph cpu support (cherry picked from commit `7ad182e16c`) * Cherry-pick from 1662, 16797.. : add anakin int8 support (cherry picked from commit `e14ab180fe`) * Cherry-pick from 16813 : change singleton to graph RegistBlock test=release/1.4 (cherry picked from commit `4b9fa42307`) * Cherry Pick : 16837 Support ShuffleNet and MobileNet-v2 Support ShuffleNet and MobileNet-v2, test=release/1.4 (cherry picked from commit `a6fb066f90`) * Cherry-pick : anakin subgraph add opt config layout argument #16846 test=release/1.4 (cherry picked from commit `8121b3eccb`) * 1. add shuffle_channel_detect (cherry picked from commit `6efdea8997`) * update shuffle_channel op convert, test=release/1.4 (cherry picked from commit `e4726a066f`) * Modify symbol export rules test=develop	6 years ago
Leo Zhao	54636a1982	call SetNumThreads everytime to avoid missing omp thread setting (#17224 ) * call SetNumThreads everytime to avoid missing omp thread setting resolve #17153 test=develop * add paddle_num_threads into config for test_analyzer_pyramid_dnn resolve #17153 test=develop	6 years ago
wopeizl	83c4f7721f	use two GPUs to run the exclusive test test=develop (#17187 )	6 years ago
luotao1	490e746269	fix runtime_context_cache bug when gpu model has an op runs only on cpu test=develop	6 years ago
wopeizl	d9991dccdd	add parallel build script to ci … (#16901 ) * add parallel build script to ci test=develop * 1. classify the test case as single card/two cards/multiple cards type 2. run test case according to the run type	6 years ago
Tao Luo	aa7b975bf6	disable runtime_context_cache pass by default test=develop	6 years ago
Tao Luo	ca8b8fa0bd	Merge pull request #16830 from Superjomn/fix/tmp-memory-optim fix memory optim temporarily	6 years ago
lijianshe02	de26df440b	add SaveOptimModel interface in analysis_predictor.h and test it in a… (#16441 ) * add SaveOptimModel interface in analysis_predictor.h and test it in analyzer_dam_tester and analyzer_resnet50_tester test=develop	6 years ago
superjomn	f58c3ec189	fix memory optim temporarily test=develop	6 years ago
liuwei1031	85363848a1	Security issue (#16774 ) * disable memory_optimize and inpalce strategy by default, test=develop * fix security issue http://newicafe.baidu.com:80/issue/PaddleSec-3/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-8/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-12/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-32/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-35/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-37/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-40/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-43/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-44/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-45/show?from=page test=develop * revert piece.cc, test=develop * adjust api.cc,test=develop	6 years ago
tensor-tang	d6c1b5a73b	disable seqpool concat pass by default saving CI time test=develop	6 years ago
luotao1	226596a296	Merge branch 'develop' into core_opt_choose_kernel	6 years ago
luotao1	bd636a9ea6	test_analyzer_int8 tests use default pass order test=develop	6 years ago
Yan Chunwei	044ae2497d	fix identity temporarily (#15942 )	6 years ago
Wojciech Uss	ec2750b3c2	fix repeating passes (#16606 )	6 years ago
Wojciech Uss	9b6a029666	fix dataset reading and add support for full dataset (#16559 )	6 years ago
石晓伟	5dea0bdd1b	Merge pull request #16498 from Shixiaowei02/feature/anakin-engine merge feature/anakin-engine to develop	6 years ago
Shixiaowei02	bddb2cd315	resolve conflicts with the develop branch test=develop	6 years ago
nhzlx	d065b5bf2b	Anakin ssd support refine trt first run add quant dequant fuse pass omit simplify_anakin_priorbox_detection template omit transpose_flatten_concat_fuse template test=develop	6 years ago
Michał Gallus	2d8b7b3a76	Refine default MKL-DNN Pass order (#16490 ) * Refine default MKL-DNN Pass order test=develop * Add comment to default MKL-DNN Pass list test=develop	6 years ago
Wojciech Uss	09dfc7a2aa	C-API quantization core 2 (#16396 ) * C-API quantization core test=develop Co-authored-by: Sylwester Fraczek <sylwester.fraczek@intel.com> * Decouple Quantizer from AnalysisPredictor test=develop * fixes after review test=develop * renamed mkldnn quantize stuff test=develop * remove ifdef from header file test=develop	6 years ago
nhzlx	953bdde058	Merge branch 'develop' of https://github.com/paddlepaddle/paddle into HEAD test=develop	6 years ago
nhzlx	45b3766fdf	fix comments test=develop	6 years ago
liuwei1031	de3b70a101	fix cdn issue, test=develop (#16423 ) * fix cdn issue, test=develop * fix cdn issue, test=develop	6 years ago
nhzlx	3df7b98a0f	Merge branch 'develop' of https://github.com/paddlepaddle/paddle into HEAD	6 years ago
nhzlx	f3a2e4b3d8	1. Add ANAKIN_ROOT compile option 2. refine trt code test=develop	6 years ago
luotao1	056599a738	add expected_kernel_cache_pass test=develop	6 years ago
Wojciech Uss	cbe2dbf0db	Add enabling quantization (#16326 ) * Add enabling quantization test=develop * remove unused (here) function	6 years ago
nhzlx	4f4daa4b66	cherry-pick from feature/anakin-engine: add data type for zero copy #16313 1. refine anakin engine 2. add data type for zero copy align dev branch and PaddlePaddle:feature/anakin-engine brach the cudnn workspace modify was not included for now, because we use a hard code way in feature/anakin-engine branch. There should be a better way to implement it, and subsequent submissions will be made. test=develop	6 years ago
nhzlx	07dcf2856c	git cherry-pick from feature/anakin-engine: update anakin subgraph #16278	6 years ago
nhzlx	c407dfa3cb	cherry-pick from feature/anakin-engine: refine paddle-anakin to new interface. #16276	6 years ago
nhzlx	a25331bc26	cherry-pick from feature/anakin-engine: deal the changing shape when using anakin #16189	6 years ago
nhzlx	c79f06d3d8	cherry-pick from feature/anakin-engine: add batch interface for pd-anakin #16178	6 years ago
nhzlx	69d37f81d7	cherry-pick from feature/anakin-engine: refine anakin subgraph. #16157 support change input size	6 years ago
nhzlx	a1d200a5de	cherry-pick from feature/anakin-engine: Anakin support facebox #16111	6 years ago
nhzlx	b21770a2aa	cherry-pick from feature/anakin-engine: Add subgraph fuse support and anakin engine #16018	6 years ago
luotao1	82af8031d9	add runtime_context_cache_pass test=develop	6 years ago
Tao Luo	7d2740db83	Revert "cache runtime_context"	6 years ago
luotao1	a275fd6e0c	Merge branch 'develop' into runtime_context	6 years ago
Wojciech Uss	2579ade45f	Add cpu_quantize_pass for C-API quantization (#16127 ) * Add cpu_quantize_pass for C-API quantization test=develop * add cpu_quantize_pass test * fix lint: add include memory unorderd_map and unordered_set test=develop * fuse_relu 1 test=develop * tuned 2 without squash * fixes test=develop * remove unused vars test=develop * refactored test=develop * fix lint c-style cast -> C++ style cast test=develop * remove QuantMax and c style casts test=develop * last usage of QuantMax removed test=develop * Fix Analysis Predictor UT Check if memory_optimize_pass has already been added to the analysis config before adding a new one, so that it is not added multiple times. test=develop * change map to unordered_map fix the forgotten part of cpu_quantize_pass_tester.cc test=develop * removed quantized attribute * fixed cpu_quantize_pass_tester and op attr comments test=develop * removed redundant line test=debug * removed gmock test=develop * fix after merge	6 years ago
luotao1	5ecdc49c6b	set enable_runtime_context_cache_ default false test=develop	6 years ago
luotao1	1510b866b6	turn off runtime_context_cache for tensorrt test=develop	6 years ago
luotao1	d94fd97230	add runtime_context_cache_pass test=develop	6 years ago
luotao1	1283833395	zero_copy tensor support INT32 test=develop	6 years ago
luotao1	31c4e1d9fc	Merge branch 'develop' into zero_copy	6 years ago
Tao Luo	e5e7e9b865	Merge branch 'develop' into transformer_ut	6 years ago
Tao Luo	6f2581e4c5	Merge pull request #16090 from lidanqing-intel/paddle-int32 Add PaddleDType INT32 support	6 years ago
Zhaolong Xing	3d63aa0a11	Merge pull request #15729 from NHZlX/add_static_model_load_for_trt Four points for enhancing Paddle-TRT	6 years ago
nhzlx	a9ed427749	cant not pass ci add if use static engine for trt test=develop	6 years ago
luotao1	fad06cb928	unify ZeroCopy in analysis_test	6 years ago
lidanqing	4aeb261da9	Add INT32 support. INT32 in last switch case test=develop	6 years ago
luotao1	06aab1b493	refine SetCpuMathLibraryNumThreads test=develop	6 years ago
nhzlx	3c40cb767b	7 refine zero copy update trt in docker file test=develop	6 years ago
Yiqun Liu	1616c32acf	Add the include of cudnn.h to enable the use of CUDNN_VERSION. (#15961 ) test=develop	6 years ago
nhzlx	2eff3e26b6	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into add_static_model_load_for_trt	6 years ago
nhzlx	06a088a199	fix comments and fix cpplint test=develop	6 years ago
nhzlx	0ed63b2108	6. delete useless predictor id test=develop	6 years ago
Sylwester Fraczek	1943119fc5	fix typo memeroy->memory test=develop	6 years ago
Sylwester Fraczek	8bc604571f	fix typo seriazlized->serialized	6 years ago
Sylwester Fraczek	543e53db05	fix typo releated->related	6 years ago
tensor-tang	e1c707fe9c	fix warnings (#15790 ) * fix warnings test=develop * fix enforce test test=develop	6 years ago
nhzlx	2070fb246d	4. do the trt_engine optim during init. add simple static mode loading test=develop	6 years ago
Yan Chunwei	3a5d6e5e64	move passes to src to avoid different behavior in deployment (#15705 )	6 years ago
Yan Chunwei	c00ed19df2	add more comment (#15603 )	6 years ago
Gabor Buella	da9c94da33	Clang build fixes (#15628 ) * Remove some superfluous std::move calls The std:move triggered a build error (with -Werror): ``` [ 9%] Building CXX object paddle/fluid/memory/allocation/CMakeFiles/allocator_facade.dir/allocator_facade.cc.o /home/tej/code/gbuella_paddle/paddle/fluid/memory/allocation/allocator_facade.cc:86:29: error: moving a temporary object prevents copy elision [-Werror,-Wpessimizing-move] [this] { return std::move(CreateAllocatorWithChunk()); }, capacity); ^ /home/tej/code/gbuella_paddle/paddle/fluid/memory/allocation/allocator_facade.cc:86:29: note: remove std::move call here [this] { return std::move(CreateAllocatorWithChunk()); }, capacity); ^~~~~~~~~~ ~ 1 error generated. ``` See: https://reviews.llvm.org/D7633 * Remove a superfluous lambda capture from framework/operator.h ``` [ 10%] Building CXX object paddle/fluid/platform/CMakeFiles/device_context.dir/init.cc.o In file included from /home/tej/code/gbuella_paddle/paddle/fluid/platform/init.cc:19: /home/tej/code/gbuella_paddle/paddle/fluid/framework/operator.h:229:21: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture] [this](Variable* var) { return var; }); ^~~~ 1 error generated. ``` Changing it to `return it->second;`, as is in the function below. * Rethrow an exception (instead of copying it) ``` [ 11%] Building CXX object paddle/fluid/framework/CMakeFiles/operator.dir/operator.cc.o /home/tej/code/gbuella_paddle/paddle/fluid/framework/operator.cc:191:13: error: local variable 'exception' will be copied despite being thrown by name [-Werror,-Wreturn-std-move] throw exception; ^~~~~~~~~ /home/tej/code/gbuella_paddle/paddle/fluid/framework/operator.cc:191:13: note: call 'std::move' explicitly to avoid copying throw exception; ^~~~~~~~~ std::move(exception) ``` See https://reviews.llvm.org/D43322 for an explanation of this diagnostic message. * Remove an unused variable ``` /home/tej/code/gbuella_paddle/paddle/fluid/framework/operator.cc:884:16: error: private field 'scope_' is not used [-Werror,-Wunused-private-field] const Scope& scope_; ^ ``` * struct ComputationOpHandle -> class ComputationOpHandle ``` [ 13%] Building CXX object paddle/fluid/framework/details/CMakeFiles/memory_early_delete_pass.dir/memory_early_delete_pass.cc.o In file included from /home/tej/code/gbuella_paddle/paddle/fluid/framework/details/memory_early_delete_pass.cc:21: /home/tej/code/gbuella_paddle/paddle/fluid/framework/details/reference_count_pass_helper.h:30:1: error: class 'ComputationOpHandle' was previously declared as a struct; this is valid, but may result in linker errors under the Microsoft C++ ABI [-Werror,-Wmismatched-tags] class ComputationOpHandle; ^ /home/tej/code/gbuella_paddle/paddle/fluid/framework/details/computation_op_handle.h:29:8: note: previous use is here struct ComputationOpHandle : public OpHandleBase { ^ /home/tej/code/gbuella_paddle/paddle/fluid/framework/details/reference_count_pass_helper.h:30:1: note: did you mean struct here? class ComputationOpHandle; ^~~~~ struct 1 error generated. ``` * Fix name() methods under fluid/operators ``` In file included from /home/tej/code/gbuella_paddle/paddle/fluid/operators/jit/gen/act.cc:15: In file included from /home/tej/code/gbuella_paddle/paddle/fluid/operators/jit/gen/act.h:19: /home/tej/code/gbuella_paddle/paddle/fluid/operators/jit/gen/jitcode.h:71:23: error: 'name' overrides a member function but is not marked 'override' [-Werror,-Winconsistent-missing-override] virtual const char* name() const = 0; ^ /home/tej/code/gbuella_paddle/paddle/fluid/operators/jit/gen_base.h:31:23: note: overridden virtual function is here virtual const char* name() const = 0; ^ ``` test=develop	6 years ago
Chunwei	d85c2e4e5c	fix anakin compile dependency test=develop	6 years ago
qingqing01	943d972878	Fix analysis predictor when loading the persistable RAW type variable. (#15613 )	6 years ago
Yan Chunwei	e887d71958	fix ir debug config (#15571 )	6 years ago
Yan Chunwei	897789b16e	fix save_inferece_model bug (#15365 )	6 years ago
Tao Luo	3d0ecab41b	add analyzer_transformer_test test=develop	6 years ago
Yan Chunwei	655179089f	AnalysisConfig remove contrib namespace (#15540 )	6 years ago
qingqing01	a6910f900e	Always create variables in analysis_predictor before OptimizeInferenceProgram. (#15533 ) Otherwise, some other persistable variable (like RAW type) will not be created	6 years ago
Yan Chunwei	b62b756b28	add version support (#15469 )	6 years ago
Yan Chunwei	526790e652	infer get program (#15511 )	6 years ago
Zhaolong Xing	97b76c94c4	Merge pull request #15242 from NHZlX/trt_int8_ultimate_version add trt int8 support	6 years ago
Yan Chunwei	e2818c8608	add dynamic memory optim (#15457 )	6 years ago
nhzlx	92cf4a4c6b	fix comments test=develop	6 years ago
nhzlx	027d24c831	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into trt_int8_ultimate_version	6 years ago
nhzlx	9641324995	fix comments test=develop	6 years ago
nhzlx	484b3bc801	When cudnn version < 7100, there is problem with conv_fusion. Add check for it. test=develop	6 years ago
flame	d60751fb71	add python inference api (#15248 ) add python inference api	6 years ago
Yan Chunwei	885c4e57ab	fea/infer memory optim2 (#14953 )	6 years ago
Yan Chunwei	c9e5aa19c1	get tensor API add more comments (#15345 )	6 years ago
Yan Chunwei	e84234b551	make clone thread safe (#15363 )	6 years ago
Zhaolong Xing	236201c222	Merge pull request #15350 from NHZlX/fix_bug_for_precditor fix analysis config bug	6 years ago
Yan Chunwei	e07900d317	cache tensor ptr in ZeroCopyTensor (#15352 )	6 years ago
Yan Chunwei	b7916440ff	hot fix the Native clone (#15344 )	6 years ago
nhzlx	b95f2ff8fe	fix win build bug test=develop	6 years ago
nhzlx	b938324381	Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into trt_int8_ultimate_version test=develop	6 years ago
nhzlx	312fe0ece1	add trt int8 calibration support fix comments test=develop	6 years ago
Yiqun Liu	568cc2ffa8	Optimize while_op for test (#14764 ) * Simplify the compare op for CPU. * Use asynchronous tensor copy in reshape_op's kernel. * Optimize while_op for test, avoiding creating variables every time. test=develop * Enable the cache of kernel type and kernel function. test=develop * Enable profiling with gperftools. * Remove flags for testing, and fix the linking error. test=develop * Delete the codes of ChooseKernel. test=develop * Fix bug when preparing ExecutorPrepareContext for while_op. * Fix missing depending on grpc libraries. * Remove the redundant print. test=develop * Follow comments. * Remove the codes related to prepare the ExecutorPrepareContext for while_op. test=develop	6 years ago
nhzlx	b2ba3471fd	fix analysis config bug.	6 years ago
tensor-tang	a5d2a6d1ad	add fuse pass of sequared mat sub fusion	6 years ago
tensor-tang	a89296ac1f	add repeated fc relu pass	6 years ago
Zhaolong Xing	98e85f3735	add_transpose_flatten_concat_fuse (#15121 )	6 years ago
wopeizl	5d9edb4124	Merge pull request #15156 from wopeizl/windows/fixgpuissue fix gpu buils issue on windows test=develop	6 years ago
tensor-tang	146e942c65	Merge pull request #15250 from tensor-tang/refine/seqpool/feed Refine/seqpool/feed with infer zerocopytensor	6 years ago
peizhilin	439691f5bd	adjust the shlwapi on windows test=develop	6 years ago
tensor-tang	ce909664d8	Merge remote-tracking branch 'ups/develop' into refine/seqpool/feed	6 years ago
peizhilin	e239558e56	remove the dismatch enclosure to avoid warning message test=develop	6 years ago
Tao Luo	2b11c710b3	Merge pull request #15249 from NHZlX/fix_trt_demo_ci fix demo ci bug	6 years ago
tensor-tang	137060135e	fix zerocopy size	6 years ago
nhzlx	e7d83389e6	fix demo ci bug 1. trt_demo bug 2. trigger exit when exists a bug test=develop	6 years ago
nhzlx	4e3522e5b4	add trt int8 support test=develop	6 years ago
tensor-tang	72d2a1801e	add seqpool concat fuse pass test=develop	6 years ago
Yan Chunwei	d09d6eadc0	make inference api work with Doxygen (#15195 )	6 years ago
Yan Chunwei	875a07c32d	refactor inference analysis api (#14634 )	6 years ago
tensor-tang	516fe301ee	add comment in case of empty name test=develop	6 years ago
tensor-tang	dca68cdf97	throw error when name not find test=develop	6 years ago
tensor-tang	cd94df8679	fix load and refine	6 years ago
Zhaolong Xing	4048cfa9da	Merge pull request #15048 from NHZlX/add_affine_channel_fuse Add conv+ affine channel fuse pass	6 years ago
Zeng Jinle	c0bcff00dc	Merge pull request #14962 from sneaxiy/rewrite_variable_type Rewrite variable type	6 years ago
Tao Luo	05f1b65da3	simplify prepere_input in analyzer_test test=develop	6 years ago
nhzlx	02e17396c2	fix comments test=develop	6 years ago
nhzlx	71636e677d	add min_subgraph_size attr to tensorrt config test=develop	6 years ago
sneaxiy	dde3afe7b7	Merge develop test=develop	6 years ago
nhzlx	73b47df1f4	Merge branch 'develop' of https://github.com/paddlepaddle/paddle into add_affine_channel_fuse test=develop	6 years ago
nhzlx	ce3782c193	add affine_channel fuse. fix conv+elemenwise fuse bug.	6 years ago
qingqing01	51a9fca323	Async memory copy (#15013 )	6 years ago
sneaxiy	ae6f46a1a9	rewrite variable type test=develop	6 years ago
peizhilin	07c7eaabb4	Merge remote-tracking branch 'upstream/develop' into windows/mkl test=develop	6 years ago
Zhaolong Xing	a9fb34fad8	Merge pull request #14903 from NHZlX/add_conv_elementwise_pass Add conv + elementwiseAdd pass	6 years ago
peizhilin	5a6d7fe2ff	add mkl,ctc support for windows	6 years ago

... 2 3 4 5 6 ...

617 Commits (e50bc2c2a6fcfec748d5bb991588a2fdc2ab4caf)