# Contents

- [DQN Description](#dqn-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Requirements](#requirements)
- [Script Description](#script-description)
    - [Script and Sample Code](#script-and-sample-code)
    - [Script Parameters](#script-parameters)
    - [Training Process](#training-process)
    - [Evaluation Process](#evaluation-process)
- [Model Description](#model-description)
    - [Performance](#performance)
- [Description of Random Situation](#description-of-random-situation)
- [ModelZoo Homepage](#modelzoo-homepage)

# [DQN Description](#contents)

DQN is the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning.

[Paper](https://www.nature.com/articles/nature14236): Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves et al. "Human-level control through deep reinforcement learning." Nature 518, no. 7540 (2015): 529-533.

## [Model Architecture](#contents)

The overall network architecture of DQN is described in the [paper](https://www.nature.com/articles/nature14236): a Q-network maps the environment state to one Q-value per action, and the greedy action is the one with the largest Q-value.

## [Dataset](#contents)

DQN does not use a static dataset. Training data is generated online by interacting with an OpenAI Gym environment (installed via the `gym` dependency below) and stored in a replay buffer.

## [Requirements](#contents)

- Hardware (Ascend/GPU/CPU)
    - Prepare a hardware environment with an Ascend or GPU processor. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources.
- Framework
    - [MindSpore](https://www.mindspore.cn/install/en)
- For more information, please check the resources below:
    - [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
    - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)
- Third-party libraries

```bash
pip install gym
```

## [Script Description](#contents)

### [Script and Sample Code](#contents)

```text
├── dqn
    ├── README.md                           # descriptions about DQN
    ├── scripts
    │   ├── run_standalone_eval_ascend.sh   # shell script for evaluation with Ascend
    │   ├── run_standalone_eval_gpu.sh      # shell script for evaluation with GPU
    │   ├── run_standalone_train_ascend.sh  # shell script for training with Ascend
    │   ├── run_standalone_train_gpu.sh     # shell script for training with GPU
    ├── src
    │   ├── agent.py                        # model agent
    │   ├── config.py                       # parameter configuration
    │   ├── dqn.py                          # DQN architecture
    ├── train.py                            # training script
    ├── eval.py                             # evaluation script
```

### [Script Parameters](#contents)

```python
'gamma': 0.8            # discount factor applied to the next-state value
'epsi_high': 0.9        # the highest exploration rate
'epsi_low': 0.05        # the lowest exploration rate
'decay': 200            # decay constant (in steps) for the exploration rate
'lr': 0.001             # learning rate
'capacity': 100000      # the capacity of the replay buffer
'batch_size': 512       # training batch size
'state_space_dim': 4    # dimension of the environment state space
'action_space_dim': 2   # dimension of the action space
```

A sketch of how these values are typically used inside the agent is given after the training commands below.

### [Training Process](#contents)

```shell
# training example
python:
    Ascend: python train.py --device_target Ascend --ckpt_path ckpt > log.txt 2>&1 &
    GPU:    python train.py --device_target GPU --ckpt_path ckpt > log.txt 2>&1 &

shell:
    Ascend: sh run_standalone_train_ascend.sh ckpt
    GPU:    sh run_standalone_train_gpu.sh ckpt
```
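As a rough, framework-agnostic illustration of how the parameters above enter the training loop (a hedged sketch only, not the code in `src/agent.py`; the function names are hypothetical), the exploration rate is annealed from `epsi_high` to `epsi_low` with time constant `decay`, and the one-step TD target discounts the best next-state Q-value by `gamma`:

```python
import math
import random

import numpy as np

# Values taken from the configuration listed above.
GAMMA, EPSI_HIGH, EPSI_LOW, DECAY = 0.8, 0.9, 0.05, 200


def epsilon(step):
    """Exploration rate annealed from epsi_high towards epsi_low over DECAY steps."""
    return EPSI_LOW + (EPSI_HIGH - EPSI_LOW) * math.exp(-step / DECAY)


def select_action(q_values, step):
    """Epsilon-greedy action selection over the discrete action space."""
    if random.random() < epsilon(step):
        return random.randrange(len(q_values))
    return int(np.argmax(q_values))


def td_target(reward, next_q_values, done):
    """One-step TD target: r + gamma * max_a' Q(s', a') for non-terminal states."""
    return reward + (0.0 if done else GAMMA * float(np.max(next_q_values)))
```

With `decay = 200`, exploration drops to roughly `0.05 + 0.85 * e^-1 ≈ 0.36` after 200 steps and continues to approach `epsi_low` afterwards.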
### [Evaluation Process](#contents)

```shell
# evaluation example
python:
    Ascend: python eval.py --device_target Ascend --ckpt_path ./ckpt/checkpoint_dqn.ckpt
    GPU:    python eval.py --device_target GPU --ckpt_path ./ckpt/checkpoint_dqn.ckpt

shell:
    Ascend: sh run_standalone_eval_ascend.sh ./ckpt/checkpoint_dqn.ckpt
    GPU:    sh run_standalone_eval_gpu.sh ./ckpt/checkpoint_dqn.ckpt
```

## [Model Description](#contents)

### [Performance](#contents)

#### Inference Performance

| Parameters          | DQN                                                          |
| ------------------- | ------------------------------------------------------------ |
| Resource            | Ascend 910; CPU 2.60 GHz, 192 cores; memory 755 GB           |
| Uploaded Date       | 03/10/2021 (month/day/year)                                  |
| MindSpore Version   | 1.1.0                                                        |
| Training Parameters | batch_size = 512, lr = 0.001                                 |
| Optimizer           | RMSProp                                                      |
| Loss Function       | MSELoss                                                      |
| Outputs             | Q-value of each action                                       |
| Params              | 7.3K                                                         |
| Scripts             | <https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/rl/dqn> |

## [Description of Random Situation](#contents)

We use a random seed in train.py; a minimal seeding sketch is given at the end of this document.

## [ModelZoo Homepage](#contents)

Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).
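For reference, below is a minimal seeding sketch. The exact seed value and calls used in `train.py` may differ, and `mindspore.set_seed` is assumed to be available in your MindSpore version.

```python
import random

import numpy as np
import mindspore

SEED = 1  # hypothetical value; check train.py for the seed actually used


def seed_everything(seed=SEED):
    """Fix Python, NumPy and MindSpore global seeds so runs are repeatable."""
    random.seed(seed)
    np.random.seed(seed)
    mindspore.set_seed(seed)  # assumed API; older versions expose mindspore.common.set_seed


seed_everything()
```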