You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
mindspore/model_zoo/official/recommend/deepfm/README.md

304 lines
12 KiB

This file contains ambiguous Unicode characters!

This file contains ambiguous Unicode characters that may be confused with others in your current locale. If your use case is intentional and legitimate, you can safely ignore this warning. Use the Escape button to highlight these characters.

# Contents
- [DeepFM Description](#deepfm-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Script Description](#script-description)
- [Script and Sample Code](#script-and-sample-code)
- [Script Parameters](#script-parameters)
- [Training Process](#training-process)
- [Training](#training)
- [Distributed Training](#distributed-training)
- [Evaluation Process](#evaluation-process)
- [Evaluation](#evaluation)
- [Model Description](#model-description)
- [Performance](#performance)
- [Evaluation Performance](#evaluation-performance)
- [Inference Performance](#evaluation-performance)
- [Description of Random Situation](#description-of-random-situation)
- [ModelZoo Homepage](#modelzoo-homepage)
# [DeepFM Description](#contents)
Learning sophisticated feature interactions behind user behaviors is critical in maximizing CTR for recommender systems. Despite great progress, existing methods seem to have a strong bias towards low- or high-order interactions, or require expertise feature engineering. In this paper, we show that it is possible to derive an end-to-end learning model that emphasizes both low- and high-order feature interactions. The proposed model, DeepFM, combines the power of factorization machines for recommendation and deep learning for feature learning in a new neural network architecture.
[Paper](https://arxiv.org/abs/1703.04247): Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, Xiuqiang He. DeepFM: A Factorization-Machine based Neural Network for CTR Prediction
# [Model Architecture](#contents)
DeepFM consists of two components. The FM component is a factorization machine, which is proposed in to learn feature interactions for recommendation. The deep component is a feed-forward neural network, which is used to learn high-order feature interactions.
The FM and deep component share the same input raw feature vector, which enables DeepFM to learn low- and high-order feature interactions simultaneously from the input raw features.
# [Dataset](#contents)
- [1] A dataset used in Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, Xiuqiang He. DeepFM: A Factorization-Machine based Neural Network for CTR Prediction[J]. 2017.
# [Environment Requirements](#contents)
- HardwareAscend/GPU/CPU
- Prepare hardware environment with Ascend, GPU, or CPU processor. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources.
- Framework
- [MindSpore](https://www.mindspore.cn/install/en)
- For more information, please check the resources below
- [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
- [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)
# [Quick Start](#contents)
After installing MindSpore via the official website, you can start training and evaluation as follows:
- runing on Ascend
```shell
# run training example
python train.py \
--dataset_path='dataset/train' \
--ckpt_path='./checkpoint' \
--eval_file_name='auc.log' \
--loss_file_name='loss.log' \
--device_target='Ascend' \
--do_eval=True > ms_log/output.log 2>&1 &
# run distributed training example
sh scripts/run_distribute_train.sh 8 /dataset_path /rank_table_8p.json
# run evaluation example
python eval.py \
--dataset_path='dataset/test' \
--checkpoint_path='./checkpoint/deepfm.ckpt' \
--device_target='Ascend' > ms_log/eval_output.log 2>&1 &
OR
sh scripts/run_eval.sh 0 Ascend /dataset_path /checkpoint_path/deepfm.ckpt
```
For distributed training, a hccl configuration file with JSON format needs to be created in advance.
Please follow the instructions in the link below:
[hccl tools](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools).
- running on GPU
For running on GPU, please change `device_target` from `Ascend` to `GPU` in configuration file src/config.py
```shell
# run training example
python train.py \
--dataset_path='dataset/train' \
--ckpt_path='./checkpoint' \
--eval_file_name='auc.log' \
--loss_file_name='loss.log' \
--device_target='GPU' \
--do_eval=True > ms_log/output.log 2>&1 &
# run distributed training example
sh scripts/run_distribute_train.sh 8 /dataset_path
# run evaluation example
python eval.py \
--dataset_path='dataset/test' \
--checkpoint_path='./checkpoint/deepfm.ckpt' \
--device_target='GPU' > ms_log/eval_output.log 2>&1 &
OR
sh scripts/run_eval.sh 0 GPU /dataset_path /checkpoint_path/deepfm.ckpt
```
- running on CPU
```shell
# run training example
python train.py \
--dataset_path='dataset/train' \
--ckpt_path='./checkpoint' \
--eval_file_name='auc.log' \
--loss_file_name='loss.log' \
--device_target='CPU' \
--do_eval=True > ms_log/output.log 2>&1 &
# run evaluation example
python eval.py \
--dataset_path='dataset/test' \
--checkpoint_path='./checkpoint/deepfm.ckpt' \
--device_target='CPU' > ms_log/eval_output.log 2>&1 &
```
# [Script Description](#contents)
## [Script and Sample Code](#contents)
```path
.
└─deepfm
├─README.md
├─mindspore_hub_conf.md # config for mindspore hub
├─scripts
├─run_standalone_train.sh # launch standalone training(1p) in Ascend or GPU
├─run_distribute_train.sh # launch distributed training(8p) in Ascend
├─run_distribute_train_gpu.sh # launch distributed training(8p) in GPU
└─run_eval.sh # launch evaluating in Ascend or GPU
├─src
├─__init__.py # python init file
├─config.py # parameter configuration
├─callback.py # define callback function
├─deepfm.py # deepfm network
├─dataset.py # create dataset for deepfm
├─eval.py # eval net
└─train.py # train net
```
## [Script Parameters](#contents)
Parameters for both training and evaluation can be set in config.py
- train parameters
```help
optional arguments:
-h, --help show this help message and exit
--dataset_path DATASET_PATH
Dataset path
--ckpt_path CKPT_PATH
Checkpoint path
--eval_file_name EVAL_FILE_NAME
Auc log file path. Default: "./auc.log"
--loss_file_name LOSS_FILE_NAME
Loss log file path. Default: "./loss.log"
--do_eval DO_EVAL Do evaluation or not. Default: True
--device_target DEVICE_TARGET
Ascend or GPU. Default: Ascend
```
- eval parameters
```help
optional arguments:
-h, --help show this help message and exit
--checkpoint_path CHECKPOINT_PATH
Checkpoint file path
--dataset_path DATASET_PATH
Dataset path
--device_target DEVICE_TARGET
Ascend or GPU. Default: Ascend
```
## [Training Process](#contents)
### Training
- running on Ascend
```shell
python train.py \
--dataset_path='dataset/train' \
--ckpt_path='./checkpoint' \
--eval_file_name='auc.log' \
--loss_file_name='loss.log' \
--device_target='Ascend' \
--do_eval=True > ms_log/output.log 2>&1 &
```
The python command above will run in the background, you can view the results through the file `ms_log/output.log`.
After training, you'll get some checkpoint files under `./checkpoint` folder by default. The loss value are saved in loss.log file.
```log
2020-05-27 15:26:29 epoch: 1 step: 41257, loss is 0.498953253030777
2020-05-27 15:32:32 epoch: 2 step: 41257, loss is 0.45545706152915955
...
```
The model checkpoint will be saved in the current directory.
- running on GPU
To do.
### Distributed Training
- running on Ascend
```shell
sh scripts/run_distribute_train.sh 8 /dataset_path /rank_table_8p.json
```
The above shell script will run distribute training in the background. You can view the results through the file `log[X]/output.log`. The loss value are saved in loss.log file.
- running on GPU
To do.
## [Evaluation Process](#contents)
### Evaluation
- evaluation on dataset when running on Ascend
Before running the command below, please check the checkpoint path used for evaluation.
```shell
python eval.py \
--dataset_path='dataset/test' \
--checkpoint_path='./checkpoint/deepfm.ckpt' \
--device_target='Ascend' > ms_log/eval_output.log 2>&1 &
OR
sh scripts/run_eval.sh 0 Ascend /dataset_path /checkpoint_path/deepfm.ckpt
```
The above python command will run in the background. You can view the results through the file "eval_output.log". The accuracy is saved in auc.log file.
```log
{'result': {'AUC': 0.8057789065281104, 'eval_time': 35.64779996871948}}
```
- evaluation on dataset when running on GPU
To do.
# [Model Description](#contents)
## [Performance](#contents)
### Training Performance
| Parameters | Ascend | GPU |
| -------------------------- | ----------------------------------------------------------- | ---------------------- |
| Model Version | DeepFM | To do |
| Resource | Ascend 910; CPU 2.60GHz, 192cores; Memory 755G | To do |
| uploaded Date | 09/15/2020 (month/day/year) | To do |
| MindSpore Version | 1.0.0 | To do |
| Dataset | [1] | To do |
| Training Parameters | epoch=15, batch_size=1000, lr=1e-5 | To do |
| Optimizer | Adam | To do |
| Loss Function | Sigmoid Cross Entropy With Logits | To do |
| outputs | Accuracy | To do |
| Loss | 0.45 | To do |
| Speed | 1pc: 8.16 ms/step; | To do |
| Total time | 1pc: 90 mins; | To do |
| Parameters (M) | 16.5 | To do |
| Checkpoint for Fine tuning | 190M (.ckpt file) | To do |
| Scripts | [deepfm script](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/recommend/deepfm) | To do |
### Inference Performance
| Parameters | Ascend | GPU |
| ------------------- | --------------------------- | --------------------------- |
| Model Version | DeepFM | To do |
| Resource | Ascend 910 | To do |
| Uploaded Date | 05/27/2020 (month/day/year) | To do |
| MindSpore Version | 0.3.0-alpha | To do |
| Dataset | [1] | To do |
| batch_size | 1000 | To do |
| outputs | accuracy | To do |
| Accuracy | 1pc: 80.55%; | To do |
| Model for inference | 190M (.ckpt file) | To do |
# [Description of Random Situation](#contents)
We set the random seed before training in train.py.
# [ModelZoo Homepage](#contents)
Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).