Offline INT8 Calibration Tool

PaddlePaddle supports offline INT8 calibration to accelerate inference. This document explains how to enable INT8 calibration and reports accuracy results for ResNet-50 and MobileNet-V1.

0. Prerequisite

You need the PaddlePaddle Python package, version 1.3 or later. For example, to install version 1.3: pip install paddlepaddle==1.3

1. How to generate INT8 model

You can refer to the unit test in test_calibration.py. Basically, there are three steps:

Construct the calibration object.

# Module path assumed from the package layout (paddle/fluid/contrib/int8_inference/utility.py)
import paddle.fluid.contrib.int8_inference.utility as int8_utility

calibrator = int8_utility.Calibrator( # Step 1
    program=infer_program, # required, FP32 program
    pretrained_model=model_path, # required, FP32 pretrained model
    algo=algo, # required, calibration algorithm; default is max, the alternative is KL (Kullback–Leibler divergence)
    exe=exe, # required, executor
    output=int8_model, # required, INT8 model
    feed_var_names=feed_dict, # required, feed dict
    fetch_list=fetch_targets) # required, fetch targets

Call calibrator.sample_data() after each executor run.

_, acc1, _ = exe.run(
    program,
    feed={feed_dict[0]: image,
          feed_dict[1]: label},
    fetch_list=fetch_targets)

calibrator.sample_data() # Step 2

Call calibrator.save_int8_model() after sampling for the specified number of iterations (e.g., iterations = 50).

calibrator.save_int8_model() # Step 3
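
The sample-then-finalize pattern of the three steps above can be illustrated without Paddle. The following is a toy sketch only: ToyMaxCalibrator is a hypothetical stand-in, not Paddle's Calibrator, and it assumes that a max-style algorithm derives a scale from the largest absolute value seen during sampling. Paddle's actual implementation may differ.

```python
class ToyMaxCalibrator:
    """Hypothetical stand-in for illustration; not Paddle's Calibrator."""

    S8_MAX = 127  # largest magnitude representable in signed 8-bit

    def __init__(self):
        self.max_abs = 0.0  # running max of |x| over all sampled batches

    def sample_data(self, activations):
        """Record statistics from one batch (called after each run)."""
        batch_max = max(abs(v) for v in activations)
        self.max_abs = max(self.max_abs, batch_max)

    def scale(self):
        """Scale that maps the observed range onto [-127, 127]."""
        if self.max_abs == 0.0:
            raise ValueError("no non-zero data sampled")
        return self.S8_MAX / self.max_abs

    def quantize(self, activations):
        """Scale, round, and clamp values to the signed 8-bit range."""
        s = self.scale()
        return [max(-self.S8_MAX, min(self.S8_MAX, round(v * s)))
                for v in activations]


# Made-up "executor outputs" for three calibration iterations.
batches = [[0.5, -1.0, 0.25], [2.0, -0.5, 1.0], [-4.0, 3.0, 0.0]]

calibrator = ToyMaxCalibrator()
for batch in batches:
    calibrator.sample_data(batch)  # analogous to Step 2

# Analogous to Step 3: the collected statistics are turned into a scale.
int8_values = calibrator.quantize([4.0, -1.0, 0.1])
print(calibrator.scale(), int8_values)
```

The more batches are sampled, the closer the recorded range gets to the range seen at inference time, which is presumably why sampling runs over a number of iterations before the INT8 model is saved.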

2. How to run INT8 model

You can load the INT8 model with the load_inference_model API and run INT8 inference the same way as FP32 inference.

[infer_program, feed_dict,
    fetch_targets] = fluid.io.load_inference_model(model_path, exe)
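
A minimal sketch of a full load-and-run call, assuming the saved model takes an image and a label as its two feeds (as in the calibration run in Section 1) and that inference runs on CPU. The function name run_int8_inference and its parameters are hypothetical. Paddle is imported inside the function so that defining it does not require Paddle to be installed.

```python
def run_int8_inference(model_path, image, label):
    """Load a saved INT8 model and run one batch through it.

    image and label are expected to be NumPy arrays shaped as the
    model requires; this helper is illustrative, not part of Paddle.
    """
    import paddle.fluid as fluid  # imported lazily; requires PaddlePaddle

    exe = fluid.Executor(fluid.CPUPlace())

    # Same call as above: returns the program, feed names, and fetch targets.
    [infer_program, feed_dict,
     fetch_targets] = fluid.io.load_inference_model(model_path, exe)

    # Assumption: feed order is (image, label), as in the calibration run.
    return exe.run(
        infer_program,
        feed={feed_dict[0]: image,
              feed_dict[1]: label},
        fetch_list=fetch_targets)
```

The reproduction commands in Section 4 set FLAGS_use_mkldnn=true, so the same flag is presumably needed when running a script that calls this function.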

3. Result

The accuracy results below were measured on an Intel® Xeon® Gold 6148 processor (Skylake).

Model	Dataset	FP32 Accuracy	INT8 Accuracy	Accuracy Diff
ResNet-50	Small	72.00%	72.00%	0.00%
MobileNet-V1	Small	62.00%	62.00%	0.00%
ResNet-50	Full ImageNet Val	76.63%	76.17%	0.46%
MobileNet-V1	Full ImageNet Val	70.78%	70.49%	0.29%

Note that Small is a subset of the full ImageNet validation dataset.
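
The Accuracy Diff column appears to be the FP32 accuracy minus the INT8 accuracy, in percentage points. A short check using the full-dataset figures copied from the table (the helper name accuracy_diff is made up for this sketch):

```python
# (model, dataset) -> (FP32 accuracy %, INT8 accuracy %), copied from the table
results = {
    ("ResNet-50", "Full ImageNet Val"): (76.63, 76.17),
    ("MobileNet-V1", "Full ImageNet Val"): (70.78, 70.49),
}


def accuracy_diff(fp32_acc, int8_acc):
    """Accuracy drop in percentage points, rounded to two decimals."""
    return round(fp32_acc - int8_acc, 2)


diffs = {key: accuracy_diff(*accs) for key, accs in results.items()}
for (model, dataset), diff in diffs.items():
    print(f"{model} ({dataset}): {diff:.2f}%")
```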

Notes:

Accuracy measurement requires a model that includes the label input.
The theoretical INT8 speedup is ~1.33X on Intel® Xeon® Skylake servers (see the statement "This allows for 4x more input at the cost of 3x more instructions or 33.33% more compute" in the Reference).
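
The ~1.33X figure follows directly from the quoted statement, reading it as four times the input processed for three times the instruction count:

```latex
\text{speedup} \approx \frac{4\ (\text{input})}{3\ (\text{instructions})} = \frac{4}{3} \approx 1.33
```

This is an upper bound under that reading; measured speedups depend on the model and on how much of the runtime is spent in INT8 kernels.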

4. How to reproduce the results

Small dataset

FLAGS_use_mkldnn=true python python/paddle/fluid/contrib/tests/test_calibration.py

Full dataset

FLAGS_use_mkldnn=true DATASET=full python python/paddle/fluid/contrib/tests/test_calibration.py
