* add new macro BOOST_GET_SAFELY & unittests, test=develop
* add different macro type, test=develop
* fix get macro type in executor, test=develop
* four-part macro change backup
* use one macro for all cases, test=develop
* revert attribute change, test=develop
* change to three functions to work around a gcc 4.8 bug, test=develop
* polish some details, test=develop
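A minimal sketch of what such a checked get could look like; the real macro's three-function split for the gcc 4.8 workaround and its error reporting are not shown, and all names below are illustrative:

```cpp
#include <boost/variant.hpp>
#include <stdexcept>
#include <string>

// Illustrative checked wrapper: behaves like boost::get, but rethrows
// bad_get with the call site attached so failures are easy to locate.
template <typename T, typename VariantT>
T& GetSafely(VariantT& var, const char* file, int line) {
  try {
    return boost::get<T>(var);
  } catch (boost::bad_get&) {
    throw std::runtime_error(std::string("boost::get failed at ") + file +
                             ":" + std::to_string(line));
  }
}

#define BOOST_GET_SAFELY(T, var) GetSafely<T>((var), __FILE__, __LINE__)
```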
* update the analysis predictor, test=develop
* update the unit test, test=develop
* no priority set before the interface is determined, test=develop
* interface name generalization, test=develop
* Support LoDTensorArray in fetch op
test=develop
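A minimal sketch of the idea, with stand-in types (the framework's actual FetchType/LoDTensor definitions may differ): the fetch slot becomes a variant over both tensor kinds, and callers dispatch on the held type.

```cpp
#include <boost/variant.hpp>
#include <vector>

struct LoDTensor {};                            // stand-in for the framework type
using LoDTensorArray = std::vector<LoDTensor>;  // stand-in
using FetchType = boost::variant<LoDTensor, LoDTensorArray>;

// The pointer form of boost::get returns nullptr instead of throwing,
// so this is a cheap type test on a fetched variable.
bool IsTensorArray(const FetchType& fetch) {
  return boost::get<LoDTensorArray>(&fetch) != nullptr;
}
```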
* Remove the NGraph engine from PDPD repository
1. Each operator was removed from the operator's directory
2. Each test was removed from the unittest directory
3. The parallel executor support was removed from the PDPD
4. The CMake file was removed from the PDPD
5. The NG flags were removed from the repository
test=develop
* Remove ngraph from:
1. Cmake file
2. Python file
test=develop
* change the CI TRT from version 5 to 6.0
* initial paddle-trt dynamic shape support
* conv+bias or conv+bn dynamic shape support
test=develop
* modify trt engine op converter
test=develop
* fix ci error
test=develop
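For context, TensorRT 6 drives dynamic shapes through optimization profiles; a sketch of the usual setup, assuming `builder` and `config` were created elsewhere and "image" is a placeholder input name:

```cpp
// Declare min/opt/max shapes for a dynamically-shaped input so TRT can
// pick kernels valid across the whole range.
nvinfer1::IOptimizationProfile* profile = builder->createOptimizationProfile();
profile->setDimensions("image", nvinfer1::OptProfileSelector::kMIN,
                       nvinfer1::Dims4{1, 3, 112, 112});
profile->setDimensions("image", nvinfer1::OptProfileSelector::kOPT,
                       nvinfer1::Dims4{4, 3, 224, 224});
profile->setDimensions("image", nvinfer1::OptProfileSelector::kMAX,
                       nvinfer1::Dims4{8, 3, 448, 448});
config->addOptimizationProfile(profile);
```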
* Implement Int8 FC
* Integrate FC into INT8v2
test=develop
* int8 FC: transpose weights before computing scales
test=develop
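The transpose matters because per-output-channel scales must be taken over rows of the oc-major weights; a plain-vector sketch of the computation (layout and the 127-based symmetric range are assumptions):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// One scale per output channel: 127 / max|w| over that channel's row
// of the transposed weights.
std::vector<float> ComputeWeightScales(
    const std::vector<std::vector<float>>& weights_t) {
  std::vector<float> scales;
  scales.reserve(weights_t.size());
  for (const auto& row : weights_t) {
    float max_abs = 0.0f;
    for (float v : row) max_abs = std::max(max_abs, std::fabs(v));
    scales.push_back(max_abs > 0.0f ? 127.0f / max_abs : 1.0f);
  }
  return scales;
}
```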
* Add support for activation_type string in FC
test=develop
* Disable MKL-DNN's FC in VGG16 and 19
test=develop
* Disable FC quantization when mkldnn FC is disabled
test=develop
* Fix PADDLE_ENFORCE checks in FC int8
* Fix Paddle enforces and remove const cast
test=develop
* Fix style changes
test=develop
* Fix quantizer_tester test and add fc quantization
test=develop
* Fix FC test fail on CUDA
* Remove unnecessary log from quantize placement pass
test=develop
* Add Thread ID to FC hash key
test=develop
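A minimal sketch of the idea (the real key format is an assumption): making the cache key thread-specific prevents concurrent threads from sharing one cached MKL-DNN primitive.

```cpp
#include <sstream>
#include <string>
#include <thread>

// Append the current thread id so each thread resolves to its own
// cached FC primitive.
std::string AppendThreadIdToKey(const std::string& key) {
  std::ostringstream oss;
  oss << key << "-t:" << std::this_thread::get_id();
  return oss.str();
}
```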
* Add comments to MKL-DNN FC Kernel
test=develop
* Refactor quantizer
test=develop
* Fix linter issues
test=develop
* Fix crash in slim googlenet
test=develop
* Fix PADDLE_ENFORCE messages
test=develop
* Add fc_elementwise_layernorm_fuse pass and unittest.
* Add fused_fc_elementwise_layernorm op and its GPU kernel.
test=develop
* Apply fc_elementwise_layernorm_fuse_pass to GPU inference.
* Add attr setting to the definition of binary_op.
test=develop
* Add comment.
* Implement the unittest.
test=develop
* Change the unittest name of layer_norm.
test=develop
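For reference, the fused op computes LayerNorm(FC(x) + y) in a single kernel; an unfused per-row sketch of the math, with layer_norm's scale/bias omitted for brevity:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Reference (unfused) computation for one row: add the residual, then
// normalize to zero mean and unit variance.
std::vector<float> FcAddLayerNormRow(const std::vector<float>& fc_out,
                                     const std::vector<float>& residual,
                                     float epsilon) {
  const size_t n = fc_out.size();
  std::vector<float> sum(n);
  float mean = 0.0f, var = 0.0f;
  for (size_t i = 0; i < n; ++i) {
    sum[i] = fc_out[i] + residual[i];
    mean += sum[i];
  }
  mean /= n;
  for (size_t i = 0; i < n; ++i) var += (sum[i] - mean) * (sum[i] - mean);
  var /= n;
  const float inv_std = 1.0f / std::sqrt(var + epsilon);
  std::vector<float> out(n);
  for (size_t i = 0; i < n; ++i) out[i] = (sum[i] - mean) * inv_std;
  return out;
}
```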
* Refine the code related to the fc op.
* Add GPU implementation for fc functor.
* Apply fc_fuse_pass in GPU inference.
test=develop
* Change the cmake for fc op.
* Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ.
* Add an attribute to set the activation type in fc_op.
* Enhance the unittest of fc_op.
test=develop
* Remove the declaration of FCOpGrad back to the header file.
test=develop
* Set default value for newly added arguments in test_fc_op.
test=develop
* Add an interface to enable cuDNN for inference.
* Add cudnn_placement_pass.
test=develop
* Set the default value of cudnn_enabled_op_types to null.
test=develop
* Add a common base class, placement_pass_base, to refine the code.
test=develop
* Call EnableCUDNN in unittest.
test=develop
* Refine cudnn_placement_pass tester.
* Enable the testing of cudnn_placement_pass in inference's unittest.
test=develop
* Add the check of op kernels.
test=develop
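The pass itself boils down to a whitelist walk over the graph; a stand-alone sketch, where OpDesc is a stand-in type and the empty-set-means-everything rule is an assumption drawn from the default-null setting above:

```cpp
#include <string>
#include <unordered_set>
#include <vector>

struct OpDesc {  // stand-in for the framework's op description
  std::string type;
  bool use_cudnn = false;
};

// Flip use_cudnn on every op whose type is whitelisted (or on all ops
// when the whitelist is empty).
void ApplyCudnnPlacement(std::vector<OpDesc>* ops,
                         const std::unordered_set<std::string>& enabled_types) {
  for (auto& op : *ops) {
    if (enabled_types.empty() || enabled_types.count(op.type)) {
      op.use_cudnn = true;
    }
  }
}
```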
* Add simplify_with_basic_ops_pass to replace dropout_op with scale_op when is_test is true.
test=develop
* Delete dropout_op directly when upscale_in_train is true.
test=develop
* Improve the debug string by printing op_desc information.
* Fix the case when dropout's input x is reused as the next op's output.
* Add the pass to inference.
test=develop
* Change the log level.
test=develop
* Add unittest for inplace case.
* Add comment to explain the pass.
* Apply the pass for CPU inference.
test=develop
* Fix the typo.
test=develop
* Add the check of AttrType.
test=develop
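The rewrite is sound because test-time dropout is deterministic; a reference sketch of the semantics the pass relies on:

```cpp
#include <cstddef>
#include <vector>

// At inference: upscale_in_train makes dropout the identity (delete the
// op); otherwise it is exactly a scale by (1 - dropout_prob).
std::vector<float> DropoutAtInference(const std::vector<float>& x,
                                      float dropout_prob,
                                      bool upscale_in_train) {
  if (upscale_in_train) return x;  // identity
  std::vector<float> out(x.size());
  for (size_t i = 0; i < x.size(); ++i) out[i] = x[i] * (1.0f - dropout_prob);
  return out;  // equivalent to scale_op with scale = 1 - dropout_prob
}
```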
* add local user data conversion into full_pascalvoc_test_preprocess.py
test=develop
* change PADDLE_ENFORCE to PADDLE_ENFORCE_GE
test=develop
* change according to reviews
test=develop
* fix security issue, test=develop
* bug fix, test=develop
* throw an exception when a PaddleBuf with a null data pointer and non-zero length is passed, test=develop
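A minimal sketch of that guard, with an illustrative free function standing in for wherever the real check lives:

```cpp
#include <cstddef>
#include <stdexcept>

// Reject the inconsistent state: no data but a claimed non-zero length.
void CheckPaddleBuf(const void* data, size_t length) {
  if (data == nullptr && length > 0) {
    throw std::runtime_error(
        "PaddleBuf: null data pointer with non-zero length");
  }
}
```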
* Fix Mask R-CNN predictor
1. refine the memory optimization algorithm to support models with the block op.
2. fix output diff: modify the affine channel fuse.
3. add condition_block_infer op.
4. add an interface for setting the TRT calibration table dir.
test=develop
* add the missing files.
test=develop
* add trt fp16 support
test=develop
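For context, the TRT 5/6-era builder API turns FP16 on like this (builder creation elided):

```cpp
// Enable FP16 kernels only when the GPU reports fast fp16 support.
if (builder->platformHasFastFp16()) {
  builder->setFp16Mode(true);
}
```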
* update paddle-trt for:
1. fix bug: core dump in the split plugin when batch > 2.
2. add leaky_relu TRT 5.0 support (yolov3 latency from 65ms to 42ms).
3. add a new attr to dropout.
4. add shuffle_channel, swish, and relu6 support.
test=develop
* fix ci
test=develop
* update anakin-engine interfaces for content-dnn
test=develop
* support only-gpu mode of Anakin
modify eltwise parse
test=develop
* modification for thread-safe
test=develop
* integrate template instances
test=develop
* increase template parameters
test=develop
* support MLU predictor
test=develop
* update anakin cmake files
test=develop
* update TargetWrapper::set_device
* update the initialization of anakin subgraph
test=develop
* use the default constructor of the base class
test=develop
* load model from buffer with length
test=develop
* modify the access level of the class
test=develop
* support anakin for bitmain arch
test=develop
* remove files
* checkout cmakelists
test=develop
* modify interfaces
test=develop
* add cmake dependencies
test=develop
* add enforce checks on the outputs of the net
test=develop
* add INT8 conv+relu6 fuse and enable MobileNet-v2 INT8 test
test=develop
* change false and 0.0 to fuse_brelu and brelu_threshold
test=develop
* change the "fuse_relu||fuse_brelu" to "unsigned_output"
test=develop
* Use relu instead of brelu as INT8 post-op because INT8 brelu is not enabled in mkldnn v0.18
test=develop
* continuous-integration fix
test=develop
* add Concat quantization
add unit test for quantizing concat
fix wrong value when the input is not in the map of calculated scales
add use_quantizer to concat_op.cc
add scale_algo rules for concat
test=develop
* add missing fix for the multiple-input quantize-squash
* wojtuss review fix: adding comment
test=develop
* fix the bug that sub_scope_ may be null in AnalysisPredictor::Run.
* add more guidance to the io APIs' docs.
* update the API.spec. test=develop test=document_preview
* 1. align fluid int8 training and trt int8 prediction.
init trt int8 prediction
add the op converter
* 2. align fluid int8 training and trt int8 inference.
enhance the quant-dequant fuse pass
enhance the op converter, trt engine, trt engine op, and trt subgraph pass.
* 3. add delete_quant_dequant_pass for trt
test=develop
* 4. add the missing file
test=develop
* 5. update the pybind code missed in the C++ interface change
fix the IS_TRT_VERSION_GE bug and the elementwise op converter
test=develop
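The pass is possible because a symmetric fake quant-dequant round-trip is fully described by one scale, which TensorRT can consume directly; a reference sketch of that round-trip (the 127-based symmetric range is an assumption):

```cpp
#include <algorithm>
#include <cmath>

// Symmetric int8 fake quantize-dequantize with threshold s; deleting
// this round-trip and recording s as the tensor's scale is equivalent.
float FakeQuantDequant(float x, float s) {
  float q = std::round(x / s * 127.0f);
  q = std::max(-127.0f, std::min(127.0f, q));
  return q / 127.0f * s;
}
```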
* fuse mul and elementwise_add into fc
* Reimplement the FC forward operator
* Fix FC MKLDNN integration by transposing weights
* Add FC MKLDNN Pass
test=develop
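For reference, the fuse replaces mul + elementwise_add with one fc whose pre-activation semantics are the plain affine map below (row-major layout assumed):

```cpp
#include <vector>

// Out = X * W + b, with X (MxK), W (KxN), b (N); naive loops for clarity.
std::vector<float> FcReference(const std::vector<float>& X,
                               const std::vector<float>& W,
                               const std::vector<float>& b,
                               int M, int K, int N) {
  std::vector<float> out(M * N);
  for (int m = 0; m < M; ++m) {
    for (int n = 0; n < N; ++n) {
      float acc = b[n];
      for (int k = 0; k < K; ++k) acc += X[m * K + k] * W[k * N + n];
      out[m * N + n] = acc;
    }
  }
  return out;
}
```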
* FC MKLDNN Pass: change memcpy to std::copy
* Fix MKLDNN FC handling of mismatched input and weights dims
* Lower tolerance for MKL-DNN in resnet50 test
test=develop
* Adjust FC to support MKLDNN Op placement
test=develop
* Adjust Placement Op to set use_mkldnn attribute for graph
test=develop
* MKLDNN FC: fix weights format so that gemm version is called
test=develop
* FC MKLDNN: Remove tolerance decrease from tester_helper
* FC MKL-DNN: Refactor the code, change input reorder to weight reorder
* MKL-DNN FC: Introduce operator caching
test=develop
* FC MKL-DNN: Fix the tensor type in ExpectedKernelType
test=develop
* FC MKL-DNN: fix style changes
test=develop
* FC MKL-DNN: fallback to native on non-supported dim sizes
test=develop
* FC MKLDNN: fix CMake paths
test=develop
* FC MKLDNN: Refine placement pass graph mkldnn attribute
test=develop
* Fix Transpiler error for fuse_conv_eltwise
test=develop
* Fix missing STL includes in files
test=develop
* FC MKL-DNN: Enable new output size computation
Also, refine pass to comply with newest interface.
test=develop
* FC MKL-DNN: enable only when fc_mkldnn_pass is enabled
* FC MKL-DNN: Allow Weights to use oi or io format
* FC MKL-DNN: Adjust UT to work with correct dims
test=develop
* Enable MKL DEBUG for resnet50 analyzer
test=develop
* FC MKL-DNN: Improve Hashing function
test=develop
* FC MKL-DNN: Fix shape for fc weights in transpiler
* FC MKL-DNN: Update input pointer in re-used fc primitive
* Add log for not handling fc fuse for unsupported dims
test=develop
* FC MKL-DNN: Move transpose from pass to Op Kernel
test=develop
* FC MKL-DNN: Disable transpose in unit test
test=develop
* FC MKL-DNN: Remove fc_mkldnn_pass from default list
* Correct Flag for fake data analyzer tests
test=develop
* FC MKL-DNN: Add comment about fc mkldnn pass disablement
test=develop
* FC MKL-DNN: Disable fc in int8 tests
test=develop
* add conv_concat_relu fuse
test=develop
* add test code
test=develop
* add missing <unordered_map> include
test=develop
* review fixes for wojtuss
test=develop
* remove 'should (not) be fused' comment statements
one of them was invalid anyway
test=develop
* Relu6 is the bottleneck op for MobileNet-v2. Since MKL-DNN supports conv/relu6 fusion, we implement this fusion via a fuse pass (relu6 semantics are sketched below). Because INT8 support for this fusion will arrive in MKL-DNN v0.20, this PR focuses on the FP32 optimization.
The table below shows the benchmark (FPS) measured on SKX-8180 (28 cores):
Batch size | with fusion (FPS) | without fusion (FPS)
-- | -- | --
1 | 214.7 | 53.4
50 | 1219.727 | 137.280
test=develop
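For reference, brelu with threshold 6 is the familiar relu6:

```cpp
// relu6(x) = min(max(x, 0), 6); brelu generalizes the upper bound.
float Relu6(float x) { return x < 0.0f ? 0.0f : (x > 6.0f ? 6.0f : x); }
```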
* Fix the format issue
test=develop
* Add the missing nolint comments.
test=develop
* Fix the typos.
test=develop
* Register the conv_brelu_mkldnn_fuse_pass for the MKLDNN engine.
test=develop
* Adjust the indentation.
test=develop
* Add the test_conv_brelu_mkldnn_fuse_pass case.
test=develop
* Slightly update the code per Baidu comments.
Embed the parameter definitions into the code.
That makes the code easier to understand.
test=develop
* add parallel build script to ci test=develop
* 1. classify the test cases by run type: single-card, two-card, or multiple-card
2. run each test case according to its run type