* update benchmark for int8v2, QAT1, QAT2 accuracy and performance
test=document_fix
* change according to reviews
test=develop test=document_fix
* improve some descriptions and some models
test=develop test=document_fix
* update models benchmark data
test=develop test=document_fix
* update int8v2 and qat2 performance
test=develop test=document_fix
This document describes how to use the Paddle inference engine to convert FP32 models to INT8 models using INT8 MKL-DNN post-training quantization. We provide instructions for enabling INT8 MKL-DNN quantization in Paddle inference and report the accuracy and performance of the quantized models, covering 7 image classification models (GoogleNet, MobileNet-V1, MobileNet-V2, ResNet-101, ResNet-50, VGG16, VGG19) and 1 object detection model (Mobilenet-SSD).
## 0. Install PaddlePaddle
We provide the results of accuracy and performance measured on Intel(R) Xeon(R) Gold 6271 servers.
>**I. Top-1 Accuracy on Intel(R) Xeon(R) Gold 6271**
# SLIM Quantization-aware training (QAT) on INT8 MKL-DNN
This document describes how to use [Paddle Slim](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/advanced_usage/paddle_slim/paddle_slim.md) to convert a quantization-aware trained model into an INT8 MKL-DNN quantized model. In **Release 1.5**, we released QAT1.0 MKL-DNN, which enabled INT8 MKL-DNN kernels for QAT trained models with an accuracy difference within 0.05% on GoogleNet, MobileNet-V1, MobileNet-V2, ResNet-101, ResNet-50, VGG16 and VGG19. In **Release 1.6**, QAT2.0 MKL-DNN, we optimized performance for fake QAT models (ResNet50, ResNet101, Mobilenet-v1, Mobilenet-v2, VGG16 and VGG19) with a minor accuracy drop. Compared with Release 1.5, QAT2.0 MKL-DNN achieves a larger inference performance gain over fake QAT models, at the cost of a slightly larger accuracy difference. We provide accuracy benchmarks for both QAT1.0 MKL-DNN and QAT2.0 MKL-DNN, and a performance benchmark for QAT2.0 MKL-DNN.
Notes:
* MKL-DNN and MKL are required. The performance gain can only be obtained on CPU servers with AVX512 support; see the check below.
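A quick way to verify AVX512 support is to inspect the CPU feature flags. A minimal sketch using standard Linux tools:

```bash
# List any AVX512 feature flags reported by the CPU;
# empty output means the server does not support AVX512.
grep -o 'avx512[a-z0-9_]*' /proc/cpuinfo | sort -u
```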
## 0. Prerequisite
You need to install at least the PaddlePaddle 1.6 Python package: `pip install paddlepaddle==1.6`.
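For example, a minimal sketch of installing and verifying the package, assuming a Linux environment with `pip` available:

```bash
pip install paddlepaddle==1.6
# Verify the installed version.
python -c "import paddle; print(paddle.__version__)"
```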
You can refer to the unit test in `test_quantization_mkldnn_pass.py`.
* FP32 Optimized Throughput (images/s) is from [int8_mkldnn_quantization.md](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/inference/tests/api/int8_mkldnn_quantization.md).
## 3. How to reproduce the results
Three steps are needed to reproduce the above-mentioned accuracy results; we take the ResNet50 benchmark as an example:
The converted data binary file is saved by default in `$HOME/.cache/paddle/dataset/int8/download/int8_full_val.bin`.
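To confirm that the conversion succeeded, you can check that the binary file exists, e.g.:

```bash
# The full validation set binary is roughly the size of the
# 50 000 preprocessed images it contains.
ls -lh $HOME/.cache/paddle/dataset/int8/download/int8_full_val.bin
```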
* ### Prepare model
You can run the following commands to download the ResNet50 model. The exemplary code snippet below downloads a ResNet50 QAT model. The reason for having two different versions of the same model is that two different QAT training strategies were used: one for a non-optimized graph transform and one for an optimized graph transform, corresponding to QAT1.0 and QAT2.0 respectively.
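A sketch of the download step only: the exact URL and archive name are not reproduced here, so both are hypothetical placeholders; substitute the links published with the PaddlePaddle QAT models.

```bash
mkdir -p $HOME/qat_models && cd $HOME/qat_models
# Hypothetical placeholder URL; replace with the actual link to the
# ResNet50 QAT model (QAT1.0 or QAT2.0 variant as needed).
wget <ResNet50_QAT_model_URL>
tar -xzvf <downloaded_archive>.tar.gz
```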
You can run `qat_int8_comparison.py` with the following arguments to reproduce the accuracy result on ResNet50. The only command-line difference between QAT1.0 MKL-DNN and QAT2.0 MKL-DNN is that the `qat2` argument enables QAT2.0 MKL-DNN. To run the QAT2.0 MKL-DNN performance test, set the environment variable `OMP_NUM_THREADS=1` and the `batch_size=1` parameter.
>*QAT 1.0*
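A minimal sketch of the QAT1.0 accuracy run; the flag names (`--qat_model`, `--infer_data`, `--batch_size`, `--acc_diff_threshold`) are assumptions about the script's interface, and the model path is a placeholder:

```bash
# Accuracy benchmark for the QAT1.0 model (flag names assumed).
# Set OMP_NUM_THREADS to the number of physical cores to speed it up.
OMP_NUM_THREADS=28 python qat_int8_comparison.py \
    --qat_model=/PATH/TO/DOWNLOADED/QAT/MODEL \
    --infer_data=$HOME/.cache/paddle/dataset/int8/download/int8_full_val.bin \
    --batch_size=50 \
    --acc_diff_threshold=0.01
```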
> Notes: Due to the large number of images in the `int8_full_val.bin` dataset (50 000), the accuracy benchmark, which compares the unoptimized and optimized QAT models, may take a long time (even several hours). To accelerate the process, it is recommended to set `OMP_NUM_THREADS` to the maximum number of physical cores available on the server.
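For the QAT2.0 MKL-DNN performance test, a sketch under the same assumptions about flag names, with the `qat2` argument and the single-thread, batch-size-1 settings mentioned above:

```bash
# Performance test for the QAT2.0 model (flag names assumed).
OMP_NUM_THREADS=1 python qat_int8_comparison.py \
    --qat2 \
    --qat_model=/PATH/TO/DOWNLOADED/QAT2/MODEL \
    --infer_data=$HOME/.cache/paddle/dataset/int8/download/int8_full_val.bin \
    --batch_size=1
```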