
README.md

This implementation is still under development.

Contents
GPT Description
Model Architecture
Dataset
Environment Requirements
Quick Start
Script Description
Script and Sample Code
ModelZoo Homepage

GPT Description

The GPT network was proposed by OpenAI and has three versions: GPT, GPT2 and GPT3. The newest version, GPT3, was proposed in May 2020 and is a very large language model with 175 billion parameters. By stacking many Transformer decoder layers and training on a massive amount of data, GPT3 becomes such a powerful language model that no fine-tuning process is needed. As the paper title says, language models are few-shot learners: GPT3 shows that a large, well-trained model can achieve performance comparable to that of fine-tuning methods.

Paper: Tom B. Brown, Benjamin Mann, Nick Ryder et al. Language Models are Few-Shot Learners. arXiv preprint arXiv:2005.14165

Model Architecture

GPT3 stacks many Transformer decoder layers. Depending on the number of layers and the embedding size, GPT3 comes in several versions. The largest model contains 96 layers with an embedding size of 12288, resulting in a total of 175 billion parameters.
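
As a rough sanity check on the 175 billion figure, the parameter count can be estimated from the layer count and embedding size given above. The sketch below uses the common approximation of 12 x layers x embedding_size^2 weights for the decoder stack and ignores biases and layer norms. The vocabulary size (50257) and sequence length (2048) are assumptions taken from the GPT3 paper, not from this README, and the helper names are hypothetical rather than part of this repository.

```python
def approx_decoder_params(num_layers: int, embedding_size: int) -> int:
    """Rough weight count of a decoder-only Transformer stack (no biases, no layer norms)."""
    attention = 4 * embedding_size * embedding_size      # Q, K, V and output projections
    feed_forward = 8 * embedding_size * embedding_size   # two linear layers with a 4x hidden size
    return num_layers * (attention + feed_forward)


def approx_embedding_params(vocab_size: int, seq_length: int, embedding_size: int) -> int:
    """Token embedding table plus learned position embedding table."""
    return (vocab_size + seq_length) * embedding_size


# 96 layers and embedding size 12288 come from the text above.
# vocab_size=50257 and seq_length=2048 are assumptions from the GPT3 paper.
total = approx_decoder_params(96, 12288) + approx_embedding_params(50257, 2048, 12288)
print(f"{total / 1e9:.1f} billion parameters")  # prints "174.6 billion parameters"
```

The estimate lands within about 1% of the quoted total; the remainder is accounted for by the terms the approximation leaves out.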

Dataset

OpenWebText is used as the training data, and the training objective is to predict the next token at each position.
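
The next-token objective can be illustrated with a minimal sketch: the label at each position is simply the token that follows it, so inputs and labels are the same sequence shifted by one. The function name and the token ids below are hypothetical and only for illustration; the actual preprocessing in `src/dataset.py` may differ.

```python
def make_next_token_pairs(token_ids):
    """Split one token sequence into model inputs and next-token labels."""
    inputs = token_ids[:-1]   # every token except the last
    labels = token_ids[1:]    # the same sequence shifted left by one position
    return inputs, labels


# Made-up token ids standing in for a tokenized sentence.
tokens = [464, 3290, 318, 257, 922]
inputs, labels = make_next_token_pairs(tokens)

for position, (current, target) in enumerate(zip(inputs, labels)):
    print(f"position {position}: input token {current} -> predict token {target}")
```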

Environment Requirements

Hardware (Ascend)
- Prepare a hardware environment with an Ascend processor. If you want to try Ascend, please send the application form to ascend@huawei.com. Once approved, you can get access to the resources.
Framework
- MindSpore
For more information, please check the resources below:
- MindSpore Tutorials
- MindSpore Python API

Quick Start

After installing MindSpore via the official website, you can start training and evaluation as follows:


# run standalone training example

bash scripts/run_standalone_train.sh 0 10 /path/dataset

# run distributed training example

bash scripts/run_distribute_train.sh /path/dataset /path/hccl.json 8

# run evaluation example; currently only accuracy and perplexity on lambada and wikitext103 are supported

bash scripts/run_evaluation.sh lambada /your/ckpt /your/data acc
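
For reference, the two supported metrics can be sketched as follows. This is a generic illustration under standard definitions (accuracy as the fraction of correct predictions, perplexity as the exponential of the mean negative log-likelihood per token), not the code in `src/inference.py`, and the function names are hypothetical.

```python
import math


def accuracy(predictions, targets):
    """Fraction of predictions that exactly match their targets."""
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood (natural log) over the evaluated tokens."""
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# A model that gives every correct token probability 0.25 has a perplexity of about 4.
log_probs = [math.log(0.25)] * 8
print(perplexity(log_probs))

# Two of three predicted words match the targets.
print(accuracy(["a", "b", "c"], ["a", "x", "c"]))
```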

For distributed training, an hccl configuration file in JSON format needs to be created in advance. Please follow the instructions at the link below: https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools.

Script Description

Script and Sample Code

.
└─gpt
  ├─README.md
  ├─scripts
    ├─run_standalone_train.sh                 # shell script for standalone training on Ascend
    ├─run_distribute_train.sh                 # shell script for distributed training on Ascend
    └─run_evaluation.sh                       # shell script for evaluation on Ascend
  ├─src
    ├─gpt_wrapper.py                          # backbone code of network
    ├─gpt.py                                  # backbone code of network
    ├─dataset.py                              # data preprocessing
    ├─inference.py                            # evaluation function
    └─utils.py                                # util function
  ├─train.py                                  # train net for training phase
  └─eval.py                                   # eval net for evaluation

ModelZoo Homepage

Please check the official homepage.
