- [Script and Sample Code](#script-and-sample-code)
- [Script Parameters](#script-parameters)
- [Train](#train)
- [Eval](#eval)
- [Options and Parameters](#options-and-parameters)
- [Parameters](#parameters)
- [Training Process](#training-process)
- [Training](#training)
- [Evaluation Process](#evaluation-process)
- [Evaluation](#evaluation)
- [evaluation on STS-B dataset](#evaluation-on-sts-b-dataset)
- [evaluation on QNLI dataset](#evaluation-on-qnli-dataset)
- [evaluation on MNLI dataset](#evaluation-on-mnli-dataset)
- [Model Description](#model-description)
- [Performance](#performance)
- [Training Performance](#training-performance)
- [Inference Performance](#inference-performance)
- [Description of Random Situation](#description-of-random-situation)
- [ModelZoo Homepage](#modelzoo-homepage)
# [TernaryBERT Description](#contents)
[TernaryBERT](https://arxiv.org/abs/2009.12812) ternarizes the weights in a fine-tuned [BERT](https://arxiv.org/abs/1810.04805) or [TinyBERT](https://arxiv.org/abs/1909.10351) model and achieves competitive performance on natural language processing tasks. TernaryBERT outperforms other BERT quantization methods and even achieves performance comparable to the full-precision model while being 14.9x smaller.
[Paper](https://arxiv.org/abs/2009.12812): Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang and Qun Liu. [TernaryBERT: Distillation-aware Ultra-low Bit BERT](https://arxiv.org/abs/2009.12812). arXiv preprint arXiv:2009.12812.
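For intuition, the following is a minimal NumPy sketch of TWN-style weight ternarization (each weight matrix is approximated by a scale times a {-1, 0, +1} tensor). It is for illustration only; TernaryBERT itself learns the ternary weights with distillation-aware training as described in the paper, and the 0.7 threshold heuristic here is an assumption borrowed from ternary weight networks.

```python
import numpy as np

def ternarize(weights: np.ndarray):
    """Illustrative TWN-style ternarization: W ~ alpha * T, with T in {-1, 0, +1}.

    Sketch for intuition only; TernaryBERT learns the quantization jointly
    with distillation (see the paper above).
    """
    # Threshold: keep only sufficiently large weights (0.7 * mean|W| heuristic).
    delta = 0.7 * np.mean(np.abs(weights))
    ternary = np.zeros_like(weights)
    ternary[weights > delta] = 1.0
    ternary[weights < -delta] = -1.0
    # Scale alpha minimizes ||W - alpha * T||^2 over the non-zero positions.
    mask = ternary != 0
    alpha = np.abs(weights[mask]).mean() if mask.any() else 0.0
    return alpha, ternary

# Example: a random weight matrix compressed to one scale plus a {-1, 0, +1} mask.
alpha, t = ternarize(np.random.randn(4, 4))
```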
# [Model Architecture](#contents)
The backbone of TernaryBERT is the Transformer, which contains six encoder modules; each encoder contains a self-attention module, and each self-attention module contains an attention module.
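The nesting described above can be pictured roughly as follows; the class names are placeholders for illustration and do not match the repository's actual module names.

```python
# Illustrative pseudocode of the module nesting described above
# (placeholder names, not the repository's actual classes).
class Attention:            # scaled dot-product attention with ternarized weights
    ...

class SelfAttention:        # wraps one Attention module plus its output projection
    def __init__(self):
        self.attention = Attention()

class Encoder:              # one Transformer encoder block
    def __init__(self):
        self.self_attention = SelfAttention()

class TernaryBertBackbone:  # the backbone stacks six encoder blocks
    def __init__(self):
        self.encoders = [Encoder() for _ in range(6)]
```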
# [Dataset](#contents)
- Download the GLUE dataset for task distillation. To convert the dataset files from JSON format to TFRecord format, please refer to run_classifier.py in the [BERT](https://github.com/google-research/bert) repository.
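For reference, the fragment below only illustrates the kind of TFRecord records such a conversion produces; the feature names and sequence length are assumptions for illustration, and the actual conversion should be done with run_classifier.py as noted above.

```python
import tensorflow as tf

# Illustration of writing one tokenized example to a TFRecord file.
# Feature names and the sequence length (128) are assumptions for illustration;
# use run_classifier.py from the BERT repository for the real conversion.
def int64_feature(values):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=list(values)))

example = tf.train.Example(features=tf.train.Features(feature={
    "input_ids": int64_feature([101, 2023, 2003, 102] + [0] * 124),  # padded to 128
    "input_mask": int64_feature([1, 1, 1, 1] + [0] * 124),
    "segment_ids": int64_feature([0] * 128),
    "label_ids": int64_feature([1]),
}))

with tf.io.TFRecordWriter("example.tfrecord") as writer:
    writer.write(example.SerializeToString())
```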
# [Environment Requirements](#contents)
- Hardware (GPU)
    - Prepare the hardware environment with a GPU processor.
num_attention_heads number of attention heads: N, default is 12
intermediate_size size of intermediate layer: N
hidden_act activation function used: ACTIVATION, default is "gelu"
hidden_dropout_prob dropout probability for BertOutput: Q
attention_probs_dropout_prob dropout probability for BertAttention: Q
max_position_embeddings maximum length of sequences: N, default is 512
save_ckpt_step step interval for saving checkpoints: N, default is 100
max_ckpt_num maximum number of checkpoints to keep: N, default is 1
type_vocab_size size of token type vocab: N, default is 2
initializer_range initialization value of TruncatedNormal: Q, default is 0.02
use_relative_positions use relative positions or not: True | False, default is False
dtype data type of input: mstype.float16 | mstype.float32, default is mstype.float32
compute_type compute type in BertTransformer: mstype.float16 | mstype.float32, default is mstype.float32
do_quant do activation quantization or not: True | False, default is True
embedding_bits the quant bits of embedding: N, default is 2
weight_bits the quant bits of weight: N, default is 2
cls_dropout_prob dropout probability for BertModelCLS: Q
activation_init initialization value of activation quantization: Q, default is 2.5
is_lgt_fit use label ground truth loss or not: True | False, default is False
```
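For reference, these options typically end up in a BERT-style configuration for the student network. The snippet below is a hypothetical sketch assembled from the defaults listed above; the field names mirror the parameter list, but the actual configuration object is defined in the repository's source code, and values marked as assumptions are not taken from this document.

```python
# Hypothetical sketch of a student-model configuration built from the defaults
# listed above; the real config object lives in the repository's source code.
student_net_cfg = dict(
    num_attention_heads=12,
    hidden_act="gelu",
    hidden_dropout_prob=0.1,          # assumption: common BERT default
    attention_probs_dropout_prob=0.1,  # assumption: common BERT default
    max_position_embeddings=512,
    type_vocab_size=2,
    initializer_range=0.02,
    use_relative_positions=False,
    do_quant=True,
    embedding_bits=2,
    weight_bits=2,
    cls_dropout_prob=0.1,             # assumption
    activation_init=2.5,
    is_lgt_fit=False,
)
```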
## [Training Process](#contents)
### Training
Before running the command below, please make sure that `teacher_model_dir`, `student_model_dir` and `data_dir` have been set. Please set these to absolute full paths, e.g. "/home/xxx/model_dir/".
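A small pre-flight check like the hypothetical one below (not part of the repository's scripts) can catch unset, relative, or missing paths before launching training.

```python
import os

# Hypothetical pre-flight check: verify that the three directories are set,
# absolute, and exist before launching training.
def check_dirs(**dirs):
    for name, path in dirs.items():
        assert path, f"{name} is not set"
        assert os.path.isabs(path), f"{name} must be an absolute path, e.g. /home/xxx/model_dir/"
        assert os.path.isdir(path), f"{name} does not exist: {path}"

check_dirs(
    teacher_model_dir="/home/xxx/teacher_model_dir/",
    student_model_dir="/home/xxx/student_model_dir/",
    data_dir="/home/xxx/data_dir/",
)
```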
The shell command above will run in the background; you can view the results in the file log.txt. The python command will run in the console, and you can view the results on the interface. After training, you will get some checkpoint files under the script folder by default. The eval metric values will be as follows:
The shell command above will run in the background; you can view the results in the file log.txt. The python command will run in the console, and you can view the results on the interface. The metric value on the test dataset will be as follows: