Paddle

Commit Graph

Author	SHA1	Message	Date
Michał Gallus	5d7d548275	INT8 Fully-connected (#17641 ) * Implement Int8 FC * Integrate FC into INT8v2 test=develop * int8 FC: transpose weights before computing scales test=develop * Add support for activation_type string in FC test=develop * Disable MKL-DNN's FC in VGG16 and 19 test=develop * Disable FC quantization when mkldnn FC is disabled test=develop * Solve PADDLE_ENFORCES in FC int8 * Fix Paddle enforces and remove const cast test=develop * Fix style changes test=develop * Fix quantizer_tester test and add fc quantization test=develop * Fix FC test fail on CUDA * Remove unnecessary log from quantize placement pass test=develop * Add Thread ID to FC hash key test=develop * Add comments to MKL-DNN FC Kernel test=develop * Refactor quantizer test=develop * Fix linter issues test=develop * Fix crash in slim googlenet test=develop * Fix PADDLE_ENFORCE messages test=develop	5 years ago
GaoWei8	234060f88f	Add fc padding to improve mkl GEMM's performance when N and K are multiple of 128. (#20972 ) * Add fc padding to solve mkl performance test=develop * fix gpu pass and error information test=develop * fix fc_fuse_pass_test test=develop * fix error information test=develop * fix error information test=develop * fix name and add fc op padding test test=develop * fix attributes test=develop * optimize fc padding test=develop * fix test test=develop	5 years ago
silingtong123	45c1e7bb7b	add prediction demo and script on windows (#21248 )	5 years ago
zhouwei25	345b67b5e2	remove warning LNK4006 and warning LNK4221 (#21226 )	5 years ago
liu zhengxi	3cb6c0a059	Fix the CAPI ZeroCopy shape error and reuse the code to get output (#21240 ) * fix the CAPI ZeroCopy shape error and reconstruct the output obtain * use an anonymous namespace to cover the functor * fix unit tests because of the output of typeid(T).name() is different from linux and windows, test=develop	5 years ago
Pei Yang	2e2f92a5b1	fix trt weight bug (#21231 ) added splitter "__" between weight name and suffix number to avoid conflicts.	5 years ago
zhouwei25	c0dcb090a3	Determine whether to copy and link inference lib by ON_INFER (#20931 )	5 years ago
Zhaolong Xing	65f7052554	TRT int8: refine trt int8 for dynamic range set (#21112 ) * refine trt int8 for dynamic range set test=develop * refine trt int8 test=develop	5 years ago
GaoWei8	a9d4eed3a8	fix cmake fails on inference_download_and_uncompress (#21185 ) * solve cmake fails on inference_download_and_uncompress test=develop * solve cmake fails on inference_download_and_uncompress test=develop	5 years ago
Adam	d74ea0855f	Add relative error measure when (value > 1) (#21144 ) * Add relative error measure when value > 1 test=develop * Move code to CheckError function test=develop	5 years ago
Chen Weihang	8414575b78	Add examples for error message writing specification - PreconditionNotMet, Unimplemented, Unavailable (#21137 ) * add examples for error spec, test=develop * change ENFORCE to ENFORCE_**, test=develop	5 years ago
joanna.wozna.intel	77c2083586	Add transpose2 INT8 for mkl-dnn (#19424 ) * Add transpose2 INT8 for mkl-dnn test=develop * Fix test_transpose_int8_mkldnn test=develop * Revert "Merge branch 'develop' into transpose_int8_mkldnn_2" This reverts commit 34011bdba4c859abb945e062ab13124f70508054, reversing changes made to 2ce6473f144da298aba4a43d46918f27d463cf7c. * Revert "Revert "Merge branch 'develop' into transpose_int8_mkldnn_2"" This reverts commit 23754dd78ca47ae56881161172b2aacd349aba90. * Add template to TransposeMKLDNNHandler test=develop * Resolve conflict test=develop * Restore get_size and refactor test=develop	5 years ago
GaoWei8	829bf871d7	Add ernie c++ inference test (#21015 ) * Add ernie unit test test=develop * Add ernie unit test test=develop * Add ernie unit test test=develop * remove ngraph * optimize gpu test test=develop * optimize codes test=develop	5 years ago
Pei Yang	e89c16b90d	Bug Fix: Paddle-TRT cannot handle adaptive pooling in pool2d op converter and "num" attribute in split op converter (#20733 ) * fix pool2d trt converter, test=develop * add fix for split op converter, test=develop	5 years ago
石晓伟	e742760f8e	optimize version error, test=develop (#20715 )	5 years ago
bingyanghuang	fd49ebcbd8	update int8 benchmark with 6271 data, test=develop test=document_fix (#20736 )	5 years ago
石晓伟	d8f4f4239d	Ensure backward compatibility with the anakin interface, test=develop (#20691 ) * support MLU nums, test=develop * change anakin apis, test=develop	5 years ago
liu zhengxi	d39777fefa	alter the capi of PD_PredictorRun to provide proper function, test=develop (#20697 ) modify the way to pass parameter out_size in function.	5 years ago
liu zhengxi	dbc2bb3376	improve the performance of capi in PD_PredictorRun (#20665 )	5 years ago
lidanqing	57b656f956	Add document for int8 object detection quantization (#19356 )	5 years ago
liu zhengxi	922d432477	fix the PD_ZeroCopyPredictorRun output problem (#20612 ) * fix the PD_ZeroCopyPredictorRun output problem and add some checks and logs for users * modify the cmakelists depends and fix the cmakelists problem	5 years ago
bingyanghuang	85e1f2150b	Modify the helper information in full_pascalvoc_test_preprocess.py (#20475 )	5 years ago
Pei Yang	443f604c3b	add DisableGlogInfo() to AnalysisConfig, test=develop (#20581 )	5 years ago
zhaoyuchen2018	b8333edef6	Add Multihead matmul fuse pass (#20167 ) * Add multihead fuse pass for ernie opt * Refine softmax test=develop * Refine cuda kernel * Refine cuda version * Refine cmake test=develop * refine header file * refine test case and pass * refine comments	5 years ago
Adam	7faa3e9555	Add ConvTranspose + BatchNorm fuse pass (#20161 ) * Add ConvTranspose + BatchNorm fuse pass test=develop * Add tests for conv+bn and conv_transpose+bn passes test=develop	5 years ago
liu zhengxi	53d8799bee	remove incorrect new in c style, test=develop (#20370 ) remove incorrect "new" in c style.	5 years ago
石晓伟	2c28e3283a	fix analysis_predictor ci, test=release/1.6 (#20141 )	5 years ago
liu zhengxi	acb02fd69e	add dll to inference capi (#20180 ) * add dll to inference capi, test=develop * add if win32 in cmakelists, test=develop	5 years ago
liu zhengxi	301eeb5bea	Add capi for fluid inference api (#20092 ) * add capi for fluid inference api, including AnalysisConfig, AnalysisPredictor, PaddleBuf, PaddleTensor, ZeroCopyTensor	5 years ago
Wilber	276b5e3440	fix compile paddle with anakin bug * fix compile with anakin bug * remove useless deps test=develop - 修复了联编anakin时，遇到的bug. - 编译test_anakin_activate 不通过 - 编译test_anakin_engine 不通过	5 years ago
石晓伟	01b9d07963	update operator compatible info, test=develop (#19978 ) * update operator compatible info, test=develop * revert cmake/version.cmake, test=develop * add unit_tests and fix bugs, test=develop * update ../paddle/fluid/framework/framework.proto, test=develop * fix bug of paddle/fluid/inference/api/analysis_predictor.cc, test=develop * update paddle/fluid/framework/version_test.cc, test=develop * add comments and rename interfaces, test=develop	5 years ago
Zhaolong Xing	e89b12884a	FIx C++ inference BUG: When open memory optim and enable trt subgraph at the same time, there is a bug (#19969 ) * fix memory optimization type test=develop * 1. fix BUG: open trt and memory optim will trigger bug. 2. Clean memory optim bug. test=develop	5 years ago
Aurelius84	99a9615a4b	Removing length dims constraints of seq_pad and seq_unpad (#19497 ) * Removing last dims constraints of seq_pad and seq_unpad test=develop * fix test_layer api code test=develop * fix sequence_pad_op.cc conflict test=develop * remove test_analyzer_mm_dnn test=develop * fix vectorize bug test=develop * fix vectorize<int> test=develop	5 years ago
pawelpiotrowicz	2c5c636514	Add two extra flags for test_analyzer_int8_image_classification to disable fp32/int8 (#19840 ) test=develop	5 years ago
Pei Yang	baccd7e2ca	Add TRT input shape check between model and runtime (#19864 ) * add TRT shape check, test=develop * model_input_shape == runtime_input_shape, refine message, test=develop	5 years ago
Pei Yang	74812d1c90	Fix BUGS: paddle-TRT repeatedly sets weight_map and overdeletes repetitive_params (#19825 ) * fix trt bugs when sharing params, test=develop * add unittest for cascade_rcnn	5 years ago
石晓伟	d004a0f50e	fix multi-thread exec of trt, test=develop (#19338 )	5 years ago
Yiqun Liu	3cd985a669	Add a pass to fuse fc+elementwise_add+layernorm (#19776 ) * Add fc_elementwise_layernorm_fuse pass and unittest. * Add fused_fc_elementwise_layernorm op and its GPU kernel. test=develop * Apply fc_elementwise_layernorm_fuse_pass to GPU inference. * Add the setting of attrs in the definition of binary_op. test=develop * Add comment. * Implement the unittest. test=develop * Change the unittest name of layer_norm. test=develop	6 years ago
石晓伟	71b2ed61bc	support MLU nums, test=develop (#19372 )	6 years ago
Pei Yang	9cbc1eff2d	zerocopytensor support uint8, analysis config support profile, analysis predictor support GetInputTensorShape, test=develop (#19822 )	6 years ago
Zhaolong Xing	110be57c1b	fix memory optimization type (#19781 ) test=develop	6 years ago
Yiqun Liu	c67c8758cb	Enhance fc_fuse_pass to enable fusing relu to fc_op (#19733 ) * Refine the codes related to fc op. * Add GPU implementation for fc functor. * Apply fc_fuse_pass in GPU inference. test=develop * Change the cmake for fc op. * Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ. * Add an attribute to set the activation type in fc_op. * Enhance the unittest of fc_op. test=develop * Remove the declaration of FCOpGrad back to the header file. test=develop * Set default value for newly added arguments in test_fc_op. test=develop * Enhance fc_fuse_pass to enable fusing relu. * Allow print the shapes of var_desc in graph. test=develop * Enhance fc_fuse_pass_tester. * Remove the use of PADDLE_ENFORCE. test=develop * Correct the number of ops after fusing. test=develop * Fix a typo. test=develop * Set activation_type to null when there is no relu in fc. test=develop * Refine fc_fuse_pass's codes. * Enable the set of shape for tensor. * Refine repeated_fc_relu_pass and add unittest. test=develop	6 years ago
Yiqun Liu	a65c728e5d	Implement the GPU kernel of fc operator (#19687 ) * Refine the codes related to fc op. * Add GPU implementation for fc functor. * Apply fc_fuse_pass in GPU inference. test=develop * Change the cmake for fc op. * Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ. * Add an attribute to set the activation type in fc_op. * Enhance the unittest of fc_op. test=develop * Remove the declaration of FCOpGrad back to the header file. test=develop * Set default value for newly added arguments in test_fc_op. test=develop	6 years ago
Tao Luo	f05d2c519d	paddle::framework::vectorize() templatization [PART3] (#19643 ) * paddle::framework::vectorize() templatization test=develop * update pybind/imperative.cc test=develop * revert update on unsqueeze_op.cc and warpctc_cudnn_op.cu.cc test=develop	6 years ago
Tao Luo	3ae939e48a	unify PADDLE_ASSERT_MSG into PADDLE_ENFORCE(error_message) (#19631 ) * remove assert.h * change PADDLE_ASSERT_MSG to PADDLE_ENFORCE test=develop * fix tensorrt paddle_enforce test=develop	6 years ago
baojun	a3a4b6e570	Enable ngraph through build_strategy (#19266 ) * enable ngraph throught build_strategy test=develop * add unittest test=develop * put use_ngraph unconditional test=develop * remove paddle_enforce test=develop * remove paddle_enforce test=develop * fix copyright test=develop * limit for ngraph only test=develop	6 years ago
Yiqun Liu	c5548178b0	A a pass to enable the use of cudnn (#19346 ) * Add a interface to enable cudnn for inference. * Add cudnn_placement_pass. test=develop * Set the default value of cudnn_enabled_op_types to null. test=develop * Write the common basic class, placement_pass_base, to refine the codes. test=develop * Call EnableCUDNN in unittest. test=develop * Refine cudnn_placement_pass tester. * Enable the testing of cudnn_placement_pass in inference's unittest. test=develop * Add the check of op kernels. test=develop	6 years ago
liuwei1031	d6cb1a4122	add dynamic C runtime support on windows, test=develop (#19502 )	6 years ago
Yiqun Liu	fcec365d29	Add a pass to replace dropout_op with scale_op when is_test is true (#19297 ) * Add simplify_with_basic_ops_pass to replace dropout_op with scale_op when is_test is true. test=develop * Delete dropout_op directly when upscale_in_train is true. test=develop * Improve the debug string, adding the print of op_desc information. * Fix the case when dropout's input x is reused as the next op's output. * Add the pass to inference. test=develop * Change the log level. test=develop * Add unittest for inplace case. * Add comment to explain the pass. * Apply the pass for CPU inference. test=develop * Fix the typo. test=develop * Add the check of AttrType. test=develop	6 years ago
lidanqing	9240e5325c	add local user data conversion into full_pascalvoc_test_preprocess.py (#19283 ) * add local user data conversion into full_pascalvoc_test_preprocess.py test=develop * change PADDLE_ENFORCE to PADDLE_ENFORCE_GE test=develop * change according to reviews test=develop	6 years ago
Adam	97d1db1874	Add generalized Conv+Activation MKLDNN fuse pass creation Part2 (#19237 ) * Add generalized Conv+Activation MKLDNN fuse pass creation Part2 test=develop * Undefined behaviour of GetAttrIfExists<> FIX test=develop	6 years ago
Zhaolong Xing	76c95af000	Fix BUG: Mask RCNN inference diff When using AnalysisPredictor. (#19213 ) * fix mask rcnn bug: 1. affine channel fuse (diff) 2. condition block op (memory leak) 3. merge lod tensor op (diff) 4. memroy optim (diff) test=develop * fix ci aboud PADDLE_ENFOCE fix merge lod infer op ut test=develop	6 years ago
Zeng Jinle	5b6673c44d	merge develop to solve conflict, also fix API doc, test=develop (#18823 )	6 years ago
lidanqing	07a4d8f8d6	Fix mAP problem in unit test of int8 object detection test (#18946 ) * change the top1 comparison to mAP comparison test=develop * change the mobilenet-ssd tester demo data and batch_size settings test=develop	6 years ago
Adam	b837689e97	Add generalized Conv+Activation MKLDNN fuse pass creation (#19072 ) test=develop	6 years ago
wopeizl	80b7ef6fc8	add tensorrt support for windows (#19084 ) * add tensorrt support for windows	6 years ago
Tao Luo	741ce8bb1a	inference_shared_library support profile (#16275 ) test=develop	6 years ago
mapingshuo	4ad7c9d5a7	[WIP] Add Imdb train demo (#18895 ) * add train demo for imdb text classification task * make inference library release data_feed dataset dataset_factory data_feed_factory * add String Data Generator * new feature of train demo: save model params * New feature of train demo: set training config using gflags * change code style for CI * add readme and dataset for imdb demo trainer	6 years ago
silingtong123	fd3b666d8c	test=develop,Synchronize the contents of develop with release1.5 (#18937 ) Fix the third-party openblas dependency for paddle on windows	6 years ago
Zhaolong Xing	3816d221ff	Fix the CE error which caused by paddle-trt version (#18941 ) * Fix Mask rcnn predictor 1. refine memory optim algorithm to support the model with the block op. 2. output diff : modify the affine channel fuse 3. add condition_block_infer op add interface for setting trt calib table dir test=develop * add the missing files. test=develop * 1 add trt fp16 support test=develop * fix trt fp16 ce error test=develop * add an vlog if the user use trt4 and specify fp16. test=develop	6 years ago
石晓伟	ee2f296ef8	Fusion: seqpool_cvm_concat (#18471 ) * add fusion_seqpool_cvm_concat test=develop * simplify pass, test=develop * fix code style, test=develop	6 years ago
liuwei1031	0d99690809	fix several security bugs reported by security team (#18831 ) * fix security issue, test=develop * bug fix, test=develop * throw an exception when null pointer data with non-zero length PaddleBuf is passed, test=develop	6 years ago
Zhaolong Xing	61238d31f7	Trt fp16 support (#18860 ) * Fix Mask rcnn predictor 1. refine memory optim algorithm to support the model with the block op. 2. output diff : modify the affine channel fuse 3. add condition_block_infer op add interface for setting trt calib table dir test=develop * add the missing files. test=develop * 1 add trt fp16 support test=develop	6 years ago
Leo Zhao	10eeed93d1	Revert "use static variable to do cache instead of thread local in thread frequent switching case (#18428 )" (#18879 ) This reverts commit `ce38bb5341`. test=develop	6 years ago
Huihuang Zheng	cfce4994cf	Merge cuda 9/10 dockerfile with root dockerfile (#18693 ) Also fix a dependency error which may cause compile error	6 years ago
Zhaolong Xing	26ae6d49e4	Update trt5 for paddle-trt (#18645 ) * update paddle-trt for: 1. fix bug: when batch > 2, core in split plugin. 2. add leaky_relu trt5.0 support (yolov3 from 65ms to 42ms.) 3. add new attr to dropout. 4. shuffle channel, swish, relu6 support test=develop * 1. fix ci test=develop	6 years ago
guru4elephant	d714bf037c	remove async executor and add data_feed.proto to the deps of train demo (#18659 ) * remove async executor and add data_feed.proto to the deps of train demo	6 years ago
石晓伟	25d8079140	Fix Bitmain Predictor::Clone() (#18599 ) * update anakin-engine interfaces for content-dnn test=develop * support only-gpu mode of Anakin modify eltwise parse test=develop * modification for thread-safe test=develop * Integrated template instance test=develop * increase template parameters test=develop * support MLU predictor test=develop * update anakin cmake files test=develop * update TargetWrapper::set_device * update the initialization of anakin subgraph test=develop * use the default constructor of base class test=develop * load model from buffer with length test=develop * modify the access level of class test=develop * support anakin for bitmain arch test=develop * remove files * checkout cmakelists test=develop * modify interfaces test=develop * add cmake dependments test=develop * enforce the outputs of net test=develop	6 years ago
Tao Luo	076f833110	add config.SetMkldnnCacheCapacity api for mkldnn cache clear strategy (#18580 ) * add config.SetMkldnnCacheCapacity api for mkldnn cache clear strategy test=develop * enhance MkldnnPostReset test=develop * add comments for mkldnn_cache_capacity field test=develop	6 years ago
Jiabin Yang	667f88f9a6	Fix/gcc 4.8 ubt link error (#18558 ) * test=develop, fix docker with paddle nccl problem * test=develop, fix/gcc_4.8_ubt_link_error * test=develop, fix code format	6 years ago
Zhaolong Xing	88b52a27fe	Inference: fix mask rcnn model diff, optim memory usage, memory leak. (#18532 ) * Fix Mask rcnn predictor 1. refine memory optim algorithm to support the model with the block op. 2. output diff : modify the affine channel fuse 3. add condition_block_infer op add interface for setting trt calib table dir test=develop * add the missing files. test=develop	6 years ago
石晓伟	1529154821	Support Bitmain Anakin (#18542 ) * update anakin-engine interfaces for content-dnn test=develop * support only-gpu mode of Anakin modify eltwise parse test=develop * modification for thread-safe test=develop * Integrated template instance test=develop * increase template parameters test=develop * support MLU predictor test=develop * update anakin cmake files test=develop * update TargetWrapper::set_device * update the initialization of anakin subgraph test=develop * use the default constructor of base class test=develop * load model from buffer with length test=develop * modify the access level of class test=develop * support anakin for bitmain arch test=develop * remove files * checkout cmakelists test=develop	6 years ago
Leo Zhao	ce38bb5341	use static variable to do cache instead of thread local in thread frequent switching case (#18428 )	6 years ago
Tao Luo	fe32879d2a	add mkldnn shapeblob cache clear strategy (#18513 ) * add mkldnn shapeblob cache clear strategy test=develop * refine with comments test=develop * make cache clear strategy more safey test=develop * add lock for GetShapeBlobSize test=develop	6 years ago
bingyanghuang	3fe6bf5ee6	fix command line bug in int8v2 readme (#18507 )	6 years ago
石晓伟	047bba855b	Remove the obsolete cmake options (#18481 ) * remove the obsolete cmake options, test=develop * remove unittests, test=develop	6 years ago
Tao Luo	d234aa02cd	add transfer_scope_cache unit-test (#18467 ) test=develop	6 years ago
Tao Luo	3123d18787	remove unused AnalysisPredictor::SetMkldnnThreadID() (#18444 ) test=develop	6 years ago
Michał Gallus	7023a86c3a	Fix Pooling output scale (#18186 ) * Int8: Fix Pooling output scale test=develop * Update scales quantization for certain operators These include: concat, transpose, pool and reshape. test=develop * Move concat minimum scale finding to quantizer test=develop	6 years ago
Michał Gallus	8409693272	Reset DeviceContext after quantization warmup (#18182 ) test=develop	6 years ago
Sylwester Fraczek	9252e8fa08	add int8 mkldnn prior_box (#17242 ) add prior_box quantization code add scale algo rules for prior box test=develop	6 years ago
lidanqing	5fd68ac154	some fixes for int8 mobilenet_ssd tester (#18112 ) * some fixes for int8 mobilenet_ssd tester test=develop * change wrong data file name test=develop * change test images bin file from 200 images to 100 images * change directory existence to file existence during downloading test=develop * reuse download_data test=develop * run full dataset when iterations=0 test=develop	6 years ago
wopeizl	daa32d5383	fix package generation for inference test=develop (#18220 )	6 years ago
翟飞跃	de42fe8fd5	Change int8v2 CAPI unit test name and add log in the prediction stage (#18200 ) * fix issue 18111;test=develop * fix timer;test=develop * refine code;test=develop	6 years ago
翟飞跃	802ea50956	fix spelling errors (#17941 ) * fix spelling errors; test=develop * Update API.spec update md5 * Update API.spec * change the order of api;test=develop	6 years ago
翟飞跃	78441c5449	add mkldnn Int8v2 slim doc (#17909 )	6 years ago
Wojciech Uss	ca5642c850	unify FP32 vs. INT8 comparison tests output (#18111 ) test=develop	6 years ago
Wojciech Uss	c26130f3a9	reuse C-API INT8 unit test application (#18077 ) * reuse C-API INT8 unit test application test=develop * updates after review test=develop	6 years ago
lidanqing	466254151a	add Mobilienet ssd int8 analyzer tester (#18075 ) * add pascalvoc preprocess script and mobilenet-ssd analyzer_tester, wait 17737 * change converting local dataset to downloading and converting tarfile test=develop * change the test data_path test=develop * change copyright (c) 2016 to copyright (c) 2019 test=develop	6 years ago
石晓伟	42f12a4aca	fix ci test cmake test=develop (#18060 )	6 years ago
Michał Gallus	8462e2b805	Disable MKLDNN FC in Resnet50 test (#18030 )	6 years ago
石晓伟	04ea7cb069	modify the access level of anakin engine (#18015 ) test=develop	6 years ago
石晓伟	bce259e5bf	Update the Anakin interfaces for content-dnn and MLU (#17890 ) * update anakin-engine interfaces for content-dnn test=develop * support only-gpu mode of Anakin modify eltwise parse test=develop * modification for thread-safe test=develop * Integrated template instance test=develop * increase template parameters test=develop * support MLU predictor test=develop * update anakin cmake files test=develop * update TargetWrapper::set_device * update the initialization of anakin subgraph test=develop * use the default constructor of base class test=develop	6 years ago
Zhaolong Xing	4e8d5a034f	Light mem reuse strategy for inference. (#17925 ) * fix: when use the load model from memory mode, the RAM occupy is high test=develop * ligth mem reuse test=develop * fix cpplint test=develop	6 years ago
mozga-intel	c1379bf238	[NGraph] Bert model for a capi, ngraph's support test=develop (#17844 )	6 years ago
石晓伟	d008260fa8	update the initialization of anakin subgraph (#17880 ) test=develop	6 years ago
Zhaolong Xing	ae576f3c68	fix: when use the load model from memory mode, the RAM occupy is high (#17788 ) test=develop	6 years ago
翟飞跃	993c703bcc	INT8 MKL-DNN v2 integrate to slim (#17634 ) * refactor PR 16865 * delete mergetool files * test=develop * test=develop * test=develop * test=develop * create dir for int8 model before call SaveOptimModel * test=develop * mkldnn int8 only support linux; test=develop * refine code; test=develop * remove comment; test=develop * refine code; test=develop * fix bug; test=develop * add exception for mkldnn_post_training_strategy * reuse int8v2 CAPI dataset; test=develop * fix accuracy check bug; test=develop * remove tab * convert files to unix format * test=develop * reduce CI time;test=develop * reduce CI time and refine code;test=develop * refine comment; test=develop * add cmake FLAGS;test=develop * remove predict_num;test=develop	6 years ago
Tao Luo	e089e454a1	make omp thread num default 1 after inference run (#17801 ) test=develop	6 years ago
Tao Luo	b4b169467b	add fc_mkldnn_pass in compare_mkldnn (#17712 ) test=develop	6 years ago
Zhaolong Xing	4337009b92	fix trt ci timeout error (#17701 ) test=develop	6 years ago
mozga-intel	5eb81fe595	Capi for a ngraph engine (#17037 )	6 years ago
lidanqing	04b6c29ee0	Improve mobilenetv2 INT8 performance by using INT8 relu as post-op (#17570 ) * add INT8 conv+relu6 fuse and enbale mobilentv2 INT8 test test=develop * change fasle and 0.0 to fuse_brelu and brelu_threshold test=develop change the "fuse_relu\|\|fuse_brelu" to "unsigned_output" test=develop * Use relu instead of brelu as INT8 post-op because INT8 brelu is not enabled in mkldnn v0.18 test=develop * continuous-integration fix test=develop	6 years ago
Jacek Czaja	6d8075ecef	[MKL-DNN] conv_transpose mkldnn bias pass (#17644 ) * - changes to graph detector - Changes to pass - Added ut for new pass - use_pass - Added pass to mkldnn passes - fix to registration - improved verbose messaging for conv bias passes - Lint fixes test=develop * - Lint fixes test=develop	6 years ago
Sylwester Fraczek	96845d2168	add Concat quantization (#17448 ) * add Concat quantization add unit test for quantizing concat fix for wrong value when the input is not in map of calculated scales add use_quantizer to concat_op.cc add scale_algo rules for concat test=develop * missing fix for multiple inputs quantize-squash * wojtuss review fix: adding comment test=develop	6 years ago
Zhen Wang	8bd651b7ed	Fix the bug in the AnalysisPredictor and add more directions about io APIs. (#17639 ) * fix the bug that sub_scope_ may be null in AnalysisPredictor::Run. * add more directions about io APIs' docs. * update the API.spec. test=develop test=document_preview	6 years ago
Zeng Jinle	4aa931dd85	Code clean of Allocator (#17602 ) * Revert "Revert "Fix allocator bug"" This reverts commit `174d0d0b90`. * Revert "fix travis ci" This reverts commit `5656fa9f7c`. test=develop * add inlined_vector.h, test=develop * add inlined_vector_test,test=develop * clean code of allocator,test=develop * delete zero_size_allocator.h,test=develop * fix failed unittest,test=develop	6 years ago
Zhaolong Xing	61221ebc28	TRT: Support set dynamic range in int8 mode. (#17524 ) * fluid int8 train and trt int8 predict align. trt int8 predict init op converter * 2. align fluid int8 train and trt int8 inference. enhance quant dequant fuse pass enhance op converter, trt engine, trt engine op, trt subgraph pass. * 3. add delete_quant_dequant_pass for trt test=develop * 4. add the missing file test=develop * 5. i modify the c++ interface, but forget to modify the pybind code fix the IS_TRT_VERSION_GE bug, and fix elementwise op converter test=develop	6 years ago
Michał Gallus	0c39b97b4e	[MKL-DNN] Add Fully Connected Op for inference only(#15226 ) * fuse mul and elementwise add to fc * Reimplement the FC forward operator * Fix FC MKLDNN integration by transposing weights * Add FC MKLDNN Pass test=develop * FC MKLDNN Pass: change memcpy to std::copy * Fix MKLDNN FC handling of mismatch input and weights dims * Lower tolerance for MKL-DNN in resnet50 test test=develop * Adjust FC to support MKLDNN Op placement test=develop * Adjust Placement Op to set use_mkldnn attribute for graph test=develop * MKLDNN FC: fix weights format so that gemm version is called test=develop * FC MKLDNN: Remove tolerance decrease from tester_helper * FC MKL-DNN: Refactor the code, change input reorder to weight reorder * MKL-DNN FC: Introduce operator caching test=develop * FC MKL-DNN: Fix the tensor type in ExpectedKernelType test=develop * FC MKL-DNN: fix style changes test=develop * FC MKL-DNN: fallback to native on non-supported dim sizes test=develop * FC MKLDNN: fix CMake paths test=develop * FC MKLDNN: Refine placement pass graph mkldnn attribute test=develop * Fix Transpiler error for fuse_conv_eltwise test=develop * Fix missing STL includes in files test=develop * FC MKL-DNN: Enable new output size computation Also, refine pass to comply with newest interface. test=develop * FC MKL-DNN: enable only when fc_mkldnn_pass is enabled * FC MKL-DNN: Allow Weights to use oi or io format * FC MKL-DNN: Adjust UT to work with correct dims test=develop * Enable MKL DEBUG for resnet50 analyzer test=develop * FC MKL-DNN: Improve Hashing function test=develop * FC MKL-DNN: Fix shape for fc weights in transpiler * FC MKL-DNN: Update input pointer in re-used fc primitive * Add log for not handling fc fuse for unsupported dims test=develop * FC MKL-DNN: Move transpose from pass to Op Kernel test=develop * FC MKL-DNN: Disable transpose in unit test test=develop * FC MKL-DNN: Remove fc_mkldnn_pass from default list * Correct Flag for fake data analyzer tests test=develop * FC MKL-DNN: Add comment about fc mkldnn pass disablement test=develop * FC MKL-DNN: Disable fc in int8 tests test=develop	6 years ago
Sylwester Fraczek	5b2a3c4b12	Conv concat relu quantization (#17466 ) * add conv_concat_relu fuse test=develop * add test code test=develop * added missing include with unordered_map test=develop * review fixes for wojtuss test=develop * remove 'should (not) be fused' comment statements one of them was invalid anyway test=develop	6 years ago
Sylwester Fraczek	bccb0ba49a	fix quantize_squash_pass segfault when no tensor linked to Bias (#17292 ) * fix quantize_squash_pass segfault when there is no tensor linked do Bias input test=develop * add googlenet test test=develop * fix concat CreateKey not using input format test=develop	6 years ago
Zhaolong Xing	38da103034	fix trt ci bug temporary. (#17565 ) ban all trt ut. will fix it later. test=develop	6 years ago
lijianshe02	daf88968e2	fix bug that saved optimal model path in test_analyzer_save_model con… (#17555 ) * modify saved model path in analyzer_save_model.cc test=develop	6 years ago
guomingz	2281ebf0f3	Enable the convolution/relu6(bounded_relu) fusion for FP32 on Intel platform. (#17130 ) * Relu6 is the bottleneck op for Mobilenet-v2. As the mkldnn supports the conv/relu6 fusion, we implement it fusion via cpass way. Due to the int8 enabling for this fusion will be supported in MKLDNN v0.20, so this PR is focused on the fp32 optimization. Below table shows the benchmark(FPS) which measured on skx-8180(28 cores) Batch size \| with fusion \| without fusion -- \| -- \| -- 1 \| 214.7 \| 53.4 50 \| 1219.727 \| 137.280 test=develop * Fix the format issue test=develop * Add the missing nolint comments. test=develop * Fix the typos. test=develop * Register the conv_brelu_mkldnn_fuse_pass for the MKLDNN engine. test=develop * Adjust the indentation. test=develop * Add the test_conv_brelu_mkldnn_fuse_pass case. test=develop * Slightly update the code per Baidu comments. Let the parameter definition embedded into the code. That's will make the code easy to understand. test=develop	6 years ago
Tao Luo	3d19f44a89	remove unused SERIAL compiler option (#17500 ) test=develop	6 years ago
lidanqing	36757ed203	Enabling resnet101, vgg16, vgg19 INT8v2 model tests (#17468 ) * Add 6 models tests support in CMake * enabling resnet101, vgg16, vgg19 INT8v2 model tests test=develop * remove SERIAL test=develop	6 years ago
liuwei1031	ba70cc499e	fix security bugs : (#17464 ) http://newicafe.baidu.com:80/issue/PaddleSec-33/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-28/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-25/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-24/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-21/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-20/show?from=page test=develop	6 years ago
Tao Luo	32da5e9c3d	remove unused expected_kernel_cache_pass (#17486 ) test=develop	6 years ago
wopeizl	ca3ba378c7	fix the random compilation failure on windows test=develop (#17475 ) * fix the random compilation failure on windows	6 years ago
Zhen Wang	4a1b7fec96	Add setting Scope function for the graph class (#17417 ) * add set_not_owned function for graph * add scope set. test=develop * add scope_ptr enforce not null before setting.test=develop	6 years ago
flame	e48dd92fc8	bug fix (#17392 ) fix secure bug	6 years ago
Zhaolong Xing	7a3bb061d8	fix: (#17279 ) 1. infernce multi card occupy 2. facebox model inference occupy too much test=develop	6 years ago
Wojciech Uss	984aa90583	improved unit test output (#17266 ) added printing data type to differentiate int8 and fp32 latency results test=develop	6 years ago
石晓伟	a72dbe9abf	Cherry-pick benchmark related changes from release/1.4 (#17156 ) * cherry-pick commit from `8877054` * cherry-pick commit from `3f0b97d` * cherry-pick from 16691:Anakin subgraph support yolo_v3 and faster-rcnn (cherry picked from commit `8643dbc233`) * Cherry-Pick from 16662 : Anakin subgraph cpu support (cherry picked from commit `7ad182e16c`) * Cherry-pick from 1662, 16797.. : add anakin int8 support (cherry picked from commit `e14ab180fe`) * Cherry-pick from 16813 : change singleton to graph RegistBlock test=release/1.4 (cherry picked from commit `4b9fa42307`) * Cherry Pick : 16837 Support ShuffleNet and MobileNet-v2 Support ShuffleNet and MobileNet-v2, test=release/1.4 (cherry picked from commit `a6fb066f90`) * Cherry-pick : anakin subgraph add opt config layout argument #16846 test=release/1.4 (cherry picked from commit `8121b3eccb`) * 1. add shuffle_channel_detect (cherry picked from commit `6efdea8997`) * update shuffle_channel op convert, test=release/1.4 (cherry picked from commit `e4726a066f`) * Modify symbol export rules test=develop	6 years ago
Leo Zhao	54636a1982	call SetNumThreads everytime to avoid missing omp thread setting (#17224 ) * call SetNumThreads everytime to avoid missing omp thread setting resolve #17153 test=develop * add paddle_num_threads into config for test_analyzer_pyramid_dnn resolve #17153 test=develop	6 years ago
wopeizl	83c4f7721f	use two GPUs to run the exclusive test test=develop (#17187 )	6 years ago
tensor-tang	79ed1c76cd	fix bn fuse vardesc and add model saver (#17143 ) * fix bn fuse vardesc and add model saver test=develop * unify save model in test helper test=develop * fix mkdir on windows test=develop * remove magic number use bn bias var desc test=develop	6 years ago
Tao Luo	d9cd989825	Merge pull request #17048 from luotao1/fix_runtime_cache_bug fix runtime_context_cache bug when gpu model has an op runs only on cpu	6 years ago
tangwei12	13295d90d9	load persistables with selected rows, test=develop (#17047 )	6 years ago
luotao1	490e746269	fix runtime_context_cache bug when gpu model has an op runs only on cpu test=develop	6 years ago
wopeizl	d9991dccdd	add parallel build script to ci … (#16901 ) * add parallel build script to ci test=develop * 1. classify the test case as single card/two cards/multiple cards type 2. run test case according to the run type	6 years ago
Tao Luo	aa7b975bf6	disable runtime_context_cache pass by default test=develop	6 years ago
nhzlx	bc6b0ca1f4	fix trt anakin subgraph compile rely test=develop	6 years ago
gongweibao	cbdb8a17b1	Polish DGC code (#16818 )	6 years ago
Tao Luo	bc037c13c7	use multi-thread to speedup CI tests test=develop	6 years ago
Tao Luo	5b1565a7be	Merge pull request #16875 from lidanqing-intel/lidanqing/improve_preprocess_script Improve preprocessing script and read from tar	6 years ago
root	1965a22488	minus trt ci times. test=develop	6 years ago
Tao Luo	ca8b8fa0bd	Merge pull request #16830 from Superjomn/fix/tmp-memory-optim fix memory optim temporarily	6 years ago
lijianshe02	de26df440b	add SaveOptimModel interface in analysis_predictor.h and test it in a… (#16441 ) * add SaveOptimModel interface in analysis_predictor.h and test it in analyzer_dam_tester and analyzer_resnet50_tester test=develop	6 years ago
lidanqing	de02d40e98	improve preprocess script and read from tar test=develop	6 years ago
superjomn	f58c3ec189	fix memory optim temporarily test=develop	6 years ago
Yihua Xu	93cedfdb9c	Fix the order while sorting the operators (#16756 ) * Fix the order when sorting operators. test=develop * Enable transfomer compare test item. test=develop * Use set to replace vector. test=develop	6 years ago
liuwei1031	85363848a1	Security issue (#16774 ) * disable memory_optimize and inpalce strategy by default, test=develop * fix security issue http://newicafe.baidu.com:80/issue/PaddleSec-3/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-8/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-12/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-32/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-35/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-37/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-40/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-43/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-44/show?from=page http://newicafe.baidu.com:80/issue/PaddleSec-45/show?from=page test=develop * revert piece.cc, test=develop * adjust api.cc,test=develop	6 years ago
tensor-tang	d6c1b5a73b	disable seqpool concat pass by default saving CI time test=develop	6 years ago
Tao Luo	ad4a1bd13c	Merge pull request #16339 from luotao1/core_opt_choose_kernel Cache the chosen kernel of operators	6 years ago
Tao Luo	d5c8d4acfe	reduce all analyzer_test ci elasped time test=develop	6 years ago
luotao1	226596a296	Merge branch 'develop' into core_opt_choose_kernel	6 years ago
bingyanghuang	88ceda5134	MKLDNN INT8 v2 readme.md (#16515 )	6 years ago
luotao1	bd636a9ea6	test_analyzer_int8 tests use default pass order test=develop	6 years ago
Yan Chunwei	044ae2497d	fix identity temporarily (#15942 )	6 years ago

1 2 3 4 5 ...

1271 Commits (e58619295e82b86795da6d29587a17a3efdc486a)