This example implements general distillation and task-specific distillation of BERT-base (the base version of the BERT model).
TinyBERT is 7.5x smaller and 9.4x faster at inference than BERT-base, and achieves competitive performance on natural language understanding tasks. It performs a novel transformer distillation at both the pre-training and task-specific learning stages.
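To give a feel for what one transformer-layer distillation term can look like, here is a minimal sketch in plain Python. It assumes a mean-squared-error objective on the attention matrices and on linearly projected hidden states, which is how the TinyBERT paper describes the layer-wise loss. It is not the code used by this example, and every name in it (`mse`, `project`, `layer_distill_loss`, the toy inputs) is hypothetical.

```python
# Illustrative sketch only, not the implementation shipped with this example.
# All function names and toy values below are hypothetical.

def mse(a, b):
    """Mean squared error between two equally shaped 2-D lists."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += (x - y) ** 2
            count += 1
    return total / count


def project(hidden, weight):
    """Map student hidden states (seq x d_student) to the teacher width
    using a projection matrix of shape (d_student x d_teacher)."""
    columns = list(zip(*weight))
    return [[sum(h * w for h, w in zip(row, col)) for col in columns]
            for row in hidden]


def layer_distill_loss(student_attn, teacher_attn,
                       student_hidden, teacher_hidden, weight):
    """Attention MSE plus hidden-state MSE for one student/teacher layer pair."""
    attn_loss = mse(student_attn, teacher_attn)
    hidden_loss = mse(project(student_hidden, weight), teacher_hidden)
    return attn_loss + hidden_loss


# Toy inputs: sequence length 2, student width 2, teacher width 3.
student_attn = [[0.5, 0.5], [0.5, 0.5]]
teacher_attn = [[0.6, 0.4], [0.3, 0.7]]
student_hidden = [[1.0, 0.0], [0.0, 1.0]]
teacher_hidden = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
weight = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]  # learnable in a real setup

loss = layer_distill_loss(student_attn, teacher_attn,
                          student_hidden, teacher_hidden, weight)
print(loss)
```

In a real training setup the projection would be a learnable parameter, and a loss of this kind would be summed over the mapped student and teacher layers in both the general and the task-specific stage.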