# Contents
- [DQN Description](#dqn-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Requirements](#requirements)
- [Script Description](#script-description)
- [Script and Sample Code](#script-and-sample-code)
- [Script Parameters](#script-parameters)
- [Training Process](#training-process)
- [Evaluation Process](#evaluation-process)
- [Model Description](#model-description)
- [Performance](#performance)
- [Description of Random Situation](#description-of-random-situation)
- [ModelZoo Homepage](#modelzoo-homepage)
# [DQN Description](#contents)
DQN is the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning.
[Paper](https://www.nature.com/articles/nature14236) Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves et al. "Human-level control through deep reinforcement learning." nature 518, no. 7540 (2015): 529-533.
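The core of DQN training is the Bellman (TD) target that the Q-network is regressed toward. A minimal NumPy sketch of that target computation (an illustration only, not code from this repository; the function and argument names are hypothetical):

```python
import numpy as np

def td_target(reward, next_q_values, done, gamma=0.8):
    # Bellman target: r + gamma * max_a' Q(s', a'); no bootstrap at episode end.
    return reward + gamma * (1.0 - done) * float(np.max(next_q_values))

# Mid-episode transition with two actions available in the next state.
target = td_target(reward=1.0, next_q_values=np.array([0.5, 2.0]), done=0.0)
# target == 1.0 + 0.8 * 2.0 == 2.6
```

The `(1.0 - done)` factor zeroes the bootstrapped value at terminal states, so the target collapses to the immediate reward there.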
## [Model Architecture](#content)
The overall network architecture of DQN is shown below:
[Paper](https://www.nature.com/articles/nature14236)
## [Dataset](#content)
DQN learns from interaction with a Gym environment rather than from a static dataset, so no dataset download is required.
## [Requirements](#content)
- Hardware (Ascend/GPU/CPU)
- Prepare hardware environment with Ascend or GPU processor. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources.
- Framework
- [MindSpore](https://www.mindspore.cn/install/en)
- For more information, please check the resources below
- [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
- [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)
- Third-party libraries
```bash
pip install gym
```
## [Script Description](#content)
### [Scripts and Sample Code](#contents)
```text
├── dqn
    ├── README.md                          # descriptions about DQN
    ├── scripts
    │   ├── run_standalone_eval_ascend.sh  # shell script for evaluation with Ascend
    │   ├── run_standalone_eval_gpu.sh     # shell script for evaluation with GPU
    │   ├── run_standalone_train_ascend.sh # shell script for training with Ascend
    │   ├── run_standalone_train_gpu.sh    # shell script for training with GPU
    ├── src
    │   ├── agent.py                       # model agent
    │   ├── config.py                      # parameter configuration
    │   ├── dqn.py                         # DQN architecture
    ├── train.py                           # training script
    ├── eval.py                            # evaluation script
```
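The agent stores transitions in an experience replay buffer of size `capacity` and trains on uniformly sampled mini-batches. A minimal sketch of such a buffer (an assumed illustration, not the actual implementation in `src/agent.py`):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity FIFO store of (state, action, reward, next_state, done) transitions."""
    def __init__(self, capacity=100000):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        # Oldest transitions are evicted automatically once capacity is exceeded.
        self.buffer.append(transition)

    def sample(self, batch_size=512):
        # Uniform random mini-batch for a training step.
        return random.sample(self.buffer, batch_size)

buf = ReplayBuffer(capacity=3)
for i in range(5):
    buf.push((i, 0, 1.0, i + 1, False))
# only the 3 most recent transitions remain
```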
### [Script Parameters](#content)
```python
'gamma': 0.8                # discount factor for future rewards
'epsi_high': 0.9            # highest exploration rate
'epsi_low': 0.05            # lowest exploration rate
'decay': 200                # decay rate (in steps) of the exploration rate
'lr': 0.001                 # learning rate
'capacity': 100000          # capacity of the replay buffer
'batch_size': 512           # training batch size
'state_space_dim': 4        # dimension of the environment state space
'action_space_dim': 2       # dimension of the action space
```
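`epsi_high`, `epsi_low`, and `decay` together define an annealed epsilon-greedy exploration schedule. A common way such a schedule is computed (a sketch under that assumption, not necessarily the exact formula used in `src/agent.py`):

```python
import math

def epsilon(step, epsi_high=0.9, epsi_low=0.05, decay=200):
    # Exponentially anneal the exploration rate from epsi_high toward epsi_low,
    # with decay acting as the time constant in steps.
    return epsi_low + (epsi_high - epsi_low) * math.exp(-step / decay)

eps_start = epsilon(0)     # starts at epsi_high
eps_late = epsilon(2000)   # approaches epsi_low
```

Early in training the agent mostly explores (random actions); as the step count grows it increasingly exploits the learned Q-values.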
### [Training Process](#content)
```shell
# training example
# Python:
# Ascend:
python train.py --device_target Ascend --ckpt_path ckpt > log.txt 2>&1 &
# GPU:
python train.py --device_target GPU --ckpt_path ckpt > log.txt 2>&1 &

# shell:
# Ascend:
sh run_standalone_train_ascend.sh ckpt
# GPU:
sh run_standalone_train_gpu.sh ckpt
```
### [Evaluation Process](#content)
```shell
# evaluation example
# Python:
# Ascend:
python eval.py --device_target Ascend --ckpt_path ./ckpt/checkpoint_dqn.ckpt
# GPU:
python eval.py --device_target GPU --ckpt_path ./ckpt/checkpoint_dqn.ckpt

# shell:
# Ascend:
sh run_standalone_eval_ascend.sh ./ckpt/checkpoint_dqn.ckpt
# GPU:
sh run_standalone_eval_gpu.sh ./ckpt/checkpoint_dqn.ckpt
```
## [Performance](#content)
### Inference Performance
| Parameters | DQN |
| -------------------------- | ----------------------------------------------------------- |
| Resource                   | Ascend 910; CPU 2.60 GHz, 192 cores; memory 755 GB           |
| Uploaded Date              | 03/10/2021 (month/day/year)                                  |
| MindSpore Version          | 1.1.0                                                        |
| Training Parameters        | batch_size = 512, lr = 0.001                                 |
| Optimizer                  | RMSProp                                                      |
| Loss Function              | MSELoss                                                      |
| Outputs                    | probability                                                  |
| Params                     | 7.3 K                                                        |
| Scripts | https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/rl/dqn |
## [Description of Random Situation](#content)
We use a random seed in train.py.
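Seeding the random number generators is what makes sampled actions and replay mini-batches repeatable across runs. A generic sketch of the pattern (the actual seed value and calls in train.py may differ; MindSpore's own RNG would typically be seeded with `mindspore.set_seed` as well):

```python
import random
import numpy as np

def set_seed(seed=1):
    # Seed the stdlib and NumPy RNGs; in train.py the framework RNG
    # (e.g. via mindspore.set_seed) would also be seeded.
    random.seed(seed)
    np.random.seed(seed)

set_seed(1)
first = np.random.rand(3)
set_seed(1)
second = np.random.rand(3)
# re-seeding reproduces the same draws
```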
## [ModelZoo Homepage](#contents)
Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).