Update DNNL QAT document 2.0-alpha (#24494)

5 years ago · 8ef3c02e90
parent db2b6b6568
commit 8ef3c02e90
1 changed file with 10 additions and 45 deletions
--- a/python/paddle/fluid/contrib/slim/tests/QAT_mkldnn_int8_readme.md
+++ b/python/paddle/fluid/contrib/slim/tests/QAT_mkldnn_int8_readme.md
@@ -109,10 +109,9 @@ The code snipped shows how the `Qat2Int8MkldnnPass` can be applied to a model gr
 ## 5. Accuracy and Performance benchmark
-This section contain QAT2 MKL-DNN accuracy and performance benchmark results measured on two servers:
+This section contain QAT2 MKL-DNN accuracy and performance benchmark results measured on the following server:
 * Intel(R) Xeon(R) Gold 6271 (with AVX512 VNNI support),
-* Intel(R) Xeon(R) Gold 6148.
 Performance benchmarks were run with the following environment settings:
@@ -144,17 +143,6 @@ Performance benchmarks were run with the following environment settings:
 |    VGG16     |       72.08%       |         71.73%         |  -0.35%   |       90.63%       |         89.71%         |  -0.92%   |
 |    VGG19     |       72.57%       |         72.12%         |  -0.45%   |       90.84%       |         90.15%         |  -0.69%   |
->**Intel(R) Xeon(R) Gold 6148**
-|    Model     | FP32 Top1 Accuracy | INT8 QAT Top1 Accuracy | Top1 Diff | FP32 Top5 Accuracy | INT8 QAT Top5 Accuracy | Top5 Diff |
-| :----------: | :----------------: | :--------------------: | :-------: | :----------------: | :--------------------: | :-------: |
-| MobileNet-V1 |       70.78%       |         70.85%         |   0.07%   |       89.69%       |         89.41%         |  -0.28%   |
-| MobileNet-V2 |       71.90%       |         72.08%         |   0.18%   |       90.56%       |         90.66%         |  +0.10%   |
-|  ResNet101   |       77.50%       |         77.51%         |   0.01%   |       93.58%       |         93.50%         |  -0.08%   |
-|   ResNet50   |       76.63%       |         76.55%         |  -0.08%   |       93.10%       |         92.96%         |  -0.14%   |
-|    VGG16     |       72.08%       |         71.72%         |  -0.36%   |       90.63%       |         89.75%         |  -0.88%   |
-|    VGG19     |       72.57%       |         72.08%         |  -0.49%   |       90.84%       |         90.11%         |  -0.73%   |
 #### Performance
 Image classification models performance was measured using a single thread. The setting is included in the benchmark reproduction commands below.
@@ -164,23 +152,12 @@ Image classification models performance was measured using a single thread. The
 |    Model     | FP32 (images/s) | INT8 QAT (images/s) | Ratio (INT8/FP32)  |
 | :----------: | :-------------: | :-----------------: | :---------------:  |
-| MobileNet-V1 |      77.00      |       210.76        |      2.74          |
+| MobileNet-V1 |      74.05      |       196.98        |      2.66          |
-| MobileNet-V2 |      88.43      |       182.47        |      2.06          |
+| MobileNet-V2 |      88.60      |       187.67        |      2.12          |
-|  ResNet101   |      7.20       |        25.88        |      3.60          |
+|  ResNet101   |      7.20       |       26.43         |      3.67          |
-|   ResNet50   |      13.26      |        47.44        |      3.58          |
+|   ResNet50   |      13.23      |       47.44         |      3.59          |
-|    VGG16     |      3.48       |        10.11        |      2.90          |
+|    VGG16     |      3.47       |       10.20         |      2.94          |
-|    VGG19     |      2.83       |        8.77         |      3.10          |
+|    VGG19     |      2.83       |       8.67          |      3.06          |
->**Intel(R) Xeon(R) Gold 6148**
-|    Model     | FP32 (images/s) | INT8 QAT (images/s) | Ratio (INT8/FP32) |
-| :----------: | :-------------: | :-----------------: | :---------------: |
-| MobileNet-V1 |      75.23      |       103.63        |      1.38         |
-| MobileNet-V2 |      86.65      |       128.14        |      1.48         |
-|  ResNet101   |      6.61       |       10.79         |      1.63         |
-|   ResNet50   |      12.42      |       19.65         |      1.58         |
-|    VGG16     |      3.31       |        4.74         |      1.43         |
-|    VGG19     |      2.68       |        3.91         |      1.46         |
 Notes:
@@ -194,13 +171,8 @@ Notes:
 |     Model    |  FP32 Accuracy | QAT INT8 Accuracy | Accuracy Diff |
 |:------------:|:----------------------:|:----------------------:|:---------:|
-|   Ernie      |      80.20%            |        79.88%        |  -0.32%  |
+|   Ernie      |      80.20%            |        79.44%        |  -0.76%  |
->**Intel(R) Xeon(R) Gold 6148**
-| Model | FP32 Accuracy | QAT INT8 Accuracy | Accuracy Diff |
-| :---: | :-----------: | :---------------: | :-----------: |
-| Ernie |    80.20%     |      79.64%       |    -0.56%     |
 #### Performance
@@ -209,17 +181,10 @@ Notes:
 |  Model  |     Threads  | FP32 Latency (ms) | QAT INT8 Latency (ms)    | Ratio (FP32/INT8) |
 |:------------:|:----------------------:|:-------------------:|:---------:|:---------:|
-| Ernie | 1 thread     |       236.72        |     83.70    |   2.82x   |
+| Ernie | 1 thread     |       237.21        |     79.26    |   2.99x    |
-| Ernie | 20 threads   |       27.40         |     15.01    |   1.83x   |
+| Ernie | 20 threads   |       22.08         |     12.57    |   1.76x    |
->**Intel(R) Xeon(R) Gold 6148**
-| Model |  Threads   | FP32 Latency (ms) | QAT INT8 Latency (ms) | Ratio (FP32/INT8) |
-| :---: | :--------: | :---------------: | :-------------------: | :---------------: |
-| Ernie |  1 thread  |    248.42         |       169.30           |       1.46       |
-| Ernie | 20 threads |    28.92          |       20.83            |       1.39       |
 ## 6. How to reproduce the results
 The steps below show, taking ResNet50 as an example, how to reproduce the above accuracy and performance results for Image Classification models.