# Contents
- [WaveNet Description](#wavenet-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Environment Requirements](#environment-requirements)
- [Script Description](#script-description)
- [Script and Sample Code](#script-and-sample-code)
- [Script Parameters](#script-parameters)
- [Training Process](#training-process)
- [Evaluation Process](#evaluation-process)
- [Convert Process](#convert-process)
- [Model Description](#model-description)
- [Performance](#performance)
- [Training Performance](#training-performance)
- [Inference Performance](#inference-performance)
- [ModelZoo Homepage](#modelzoo-homepage)
# [WaveNet Description](#contents)
WaveNet is a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones. We support training and evaluation on both GPU and CPU.
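Concretely, WaveNet factorizes the joint probability of a waveform x = (x_1, ..., x_T) into a product of per-sample conditionals, each predicted by the network:

```latex
p(\mathbf{x}) = \prod_{t=1}^{T} p(x_t \mid x_1, \ldots, x_{t-1})
```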
[Paper](https://arxiv.org/pdf/1609.03499.pdf): Oord A van den, Dieleman S, Zen H, et al. WaveNet: A generative model for raw audio
# [Model Architecture](#contents)
The current model consists of a pre-convolution layer, followed by several residual blocks with gated activation units and residual and skip connections.
Finally, post-convolution layers are added to predict the output distribution.
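For illustration, below is a minimal MindSpore sketch of one such gated residual block. The class and argument names are hypothetical, and the padding is simplified to symmetric (the real model uses causal convolutions); the actual implementation lives in `wavenet_vocoder/modules.py` and `wavenet_vocoder/conv.py`.
```python
import mindspore.nn as nn
import mindspore.ops as ops

class GatedResidualBlock(nn.Cell):
    """Sketch of a WaveNet residual block: gated activation + residual/skip outputs."""
    def __init__(self, channels, kernel_size, dilation):
        super().__init__()
        pad = (kernel_size - 1) * dilation // 2  # simplified; the real model pads causally
        self.filter_conv = nn.Conv1d(channels, channels, kernel_size,
                                     pad_mode='pad', padding=pad, dilation=dilation)
        self.gate_conv = nn.Conv1d(channels, channels, kernel_size,
                                   pad_mode='pad', padding=pad, dilation=dilation)
        self.res_conv = nn.Conv1d(channels, channels, 1)   # 1x1 conv back into the stack
        self.skip_conv = nn.Conv1d(channels, channels, 1)  # 1x1 conv toward the post-net
        self.tanh = ops.Tanh()
        self.sigmoid = ops.Sigmoid()

    def construct(self, x):
        z = self.tanh(self.filter_conv(x)) * self.sigmoid(self.gate_conv(x))  # gated unit
        return self.res_conv(z) + x, self.skip_conv(z)  # (residual, skip)
```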
# [Dataset](#contents)
The following sections describe how to run the scripts with the dataset below.
Dataset used: [The LJ Speech Dataset](<https://keithito.com/LJ-Speech-Dataset>)
- Dataset size: 2.6 GB
- Data format: audio clips (13,100) and transcriptions
- The dataset structure is as follows:
```path
.
└── LJSpeech-1.1
├─ wavs //audio clips files
└─ metadata.csv //transcripts
```
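As a quick sanity check after downloading, the transcripts can be read directly; each line of `metadata.csv` is `<wav id>|<raw transcript>|<normalized transcript>`. The snippet below is purely illustrative (preprocessing in this project is done by the scripts from the r9y9 repository):
```python
import csv

# Each metadata.csv line: <wav id>|<raw transcript>|<normalized transcript>
with open('LJSpeech-1.1/metadata.csv', encoding='utf-8') as f:
    reader = csv.reader(f, delimiter='|', quoting=csv.QUOTE_NONE)
    for wav_id, raw_text, normalized_text in reader:
        wav_path = 'LJSpeech-1.1/wavs/{}.wav'.format(wav_id)
        print(wav_path, normalized_text)
```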
# [Environment Requirements](#contents)
- Hardware (GPU/CPU)
- Prepare hardware environment with GPU/CPU processor.
- Framework
- [MindSpore](https://www.mindspore.cn/install/en)
- For more information, please check the resources below:
- [MindSpore tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
- [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)
# [Script Description](#contents)
## [Script and Sample Code](#contents)
**Note that some of the scripts described below are not included in our code**. They should first be downloaded from [r9y9](https://github.com/r9y9/wavenet_vocoder) and added to this project.
```path
.
├── audio
└──wavenet
├── scripts
│ ├──run_distribute_train_gpu.sh // launch distributed training with gpu platform(8p)
│ ├──run_eval_cpu.sh // launch evaluation with cpu platform
│ ├──run_eval_gpu.sh // launch evaluation with gpu platform
│ ├──run_standalone_train_cpu.sh // launch standalone training with cpu platform
│ └──run_standalone_train_gpu.sh // launch standalone training with gpu platform(1p)
├──datasets // Note the datasets folder should be downloaded from the above link
├──egs // Note the egs folder should be downloaded from the above link
├──utils // Note the utils folder should be downloaded from the above link
├── audio.py // Audio utils. Note this script should be downloaded from the above link
├── compute-meanvar-stats.py // Compute mean-variance normalization stats. Note this script should be downloaded from the above link
├── evaluate.py // Evaluation
├── export.py // Convert mindspore model to air model
├── hparams.py // Hyper-parameter configuration. Note this script should be downloaded from the above link
├── mksubset.py // Make subset of dataset. Note this script should be downloaded from the above link
├── preprocess.py // Preprocess dataset. Note this script should be downloaded from the above link
├── preprocess_normalize.py // Perform meanvar normalization to preprocessed features. Note this script should be downloaded from the above link
├── README.md // Descriptions about WaveNet
├── train.py // Training scripts
├── train_pytorch.py // Note this script should be downloaded from the above link; its original name in that project is train.py
├── src
│ ├──__init__.py
│ ├──dataset.py // Generate dataloader and data processing entry
│ ├──callback.py // Callbacks to monitor the training
│ ├──lr_generator.py // Learning rate generator
│ └──loss.py // Loss function definition
└── wavenet_vocoder
├──__init__.py
├──conv.py // Extended 1D convolution
├──mixture.py // Loss function for training and sample function for testing
├──modules.py // Modules for Wavenet construction
├──upsample.py // Upsample layer definition
├──util.py // Utils. Note this script should be downloaded from the above link
├──wavenet.py // WaveNet networks
└──tfcompat // Note this folder should be downloaded from the above link
├──__init__.py
└──hparam.py // Param management tools
```
## [Script Parameters](#contents)
### Training
```text
usage: train.py [--data_path DATA_PATH] [--preset PRESET]
[--checkpoint_dir CHECKPOINT_DIR] [--checkpoint CHECKPOINT]
[--speaker_id SPEAKER_ID] [--platform PLATFORM]
[--is_distributed IS_DISTRIBUTED]
options:
--data_path dataset path
--preset path of preset parameters (json)
--checkpoint_dir directory for saving model checkpoints, default is "./checkpoints"
--checkpoint pre-trained ckpt path for resuming training
--speaker_id specific speaker of data in case for multi-speaker datasets, not used currently
--platform specify platform to be used, default is "GPU"
--is_distributed whether distributed training or not
```
### Evaluation
```text
usage: evaluate.py [--data_path DATA_PATH] [--preset PRESET]
[--pretrain_ckpt PRETRAIN_CKPT] [--is_numpy]
[--output_path OUTPUT_PATH] [--speaker_id SPEAKER_ID]
[--platform PLATFORM]
options:
--data_path dataset path
--preset path of preset parameters (json)
--pretrain_ckpt pre-trained ckpt path
--is_numpy whether using numpy for inference or not
--output_path path to save synthesized audio
--speaker_id specific speaker of data in case for multi-speaker datasets, not used currently
--platform specify platform to be used, default is "GPU"
```
More parameters for training and evaluation can be set in `hparams.py`.
## [Training Process](#contents)
Before your first training, the dependency scripts should be downloaded and placed in the correct directories as described in [Script and Sample Code](#script-and-sample-code).
After that, the raw data should be preprocessed using the scripts in `egs`. The directory structure of `egs` is as follows:
```path
.
├── egs
├──gaussian
│ ├──conf
│ │ ├──gaussian_wavenet.json
│ │ └──gaussian_wavenet_demo.json
│ └──run.sh
├──mol
│ ├──conf
│ │ ├──mol_wavenet.json
│ │ └──mol_wavenet_demo.json
│ └──run.sh
├──mulaw256
│ ├──conf
│ │ ├──mulaw_wavenet.json
│ │ └──mulaw_wavenet_demo.json
│ └──run.sh
└──README.md
```
In this project, three different losses are implemented to train the network:
- mulaw256: categorical output distribution. The input is an 8-bit mu-law quantized waveform (see the quantization sketch after this list).
- mol: discretized mixture of logistics loss. The input is 16-bit raw audio.
- gaussian: mixture of Gaussians loss. The input is 16-bit raw audio.
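As a reference for the mulaw256 case, here is a minimal NumPy sketch of 8-bit mu-law quantization (the project itself uses the implementation shipped with the r9y9 scripts):
```python
import numpy as np

def mulaw_quantize(x, mu=255):
    """Map a waveform x in [-1, 1] to 8-bit class indices in {0, ..., mu}."""
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)  # mu-law companding
    return ((y + 1) / 2 * mu + 0.5).astype(np.int64)          # quantize to integers
```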
The three folders gaussian, mol, and mulaw256 are used to generate the corresponding training data respectively. For example, to generate the training data for the mixture of Gaussians loss, first modify line 28 of `run.sh`: change `conf/gaussian_wavenet_demo.json` to `conf/gaussian_wavenet.json`. We use the default parameters in `gaussian_wavenet.json`. With this setting, the data will be generated to fit the mixture of Gaussians loss, and some parameters in `hparams.py` will be overridden by those in `gaussian_wavenet.json`. You can also define your own hyper-parameter JSON here. After the modification, run the following commands to generate the data. Note that if you want to change the value of a parameter, you may need to modify it in `gaussian_wavenet.json` rather than in `hparams.py`, since values in `gaussian_wavenet.json` override those in `hparams.py`.
```bash
bash run.sh --stage 0 --stop-stage 0 --db-root /path_to_dataset/LJSpeech-1.1/wavs
bash run.sh --stage 1 --stop-stage 1
```
After the processing, the directory of gaussian will be as follows:
```path
.
├── gaussian
├──conf
├──data
├──exp
└──dump
└──lj
└──logmelspectrogram
├──org
└──norm
├──train_no_dev
├──dev
└──eval
```
The train_no_dev folder contains the final training data. The process for mol and mulaw256 is the same. When the training data is ready,
you can run the following command to train the network:
```bash
# Standalone training (GPU)
sh ./scripts/run_standalone_train_gpu.sh [CUDA_DEVICE_ID] [/path_to_egs/egs/gaussian/dump/lj/logmelspectrogram/norm/] [/path_to_egs/egs/gaussian/conf/gaussian_wavenet.json] [path_to_save_ckpt]

# Standalone training (CPU)
sh ./scripts/run_standalone_train_cpu.sh [/path_to_egs/egs/gaussian/dump/lj/logmelspectrogram/norm/] [/path_to_egs/egs/gaussian/conf/gaussian_wavenet.json] [path_to_save_ckpt]

# Distributed training (GPU, 8p)
sh ./scripts/run_distribute_train_gpu.sh [/path_to_egs/egs/gaussian/dump/lj/logmelspectrogram/norm/] [/path_to_egs/egs/gaussian/conf/gaussian_wavenet.json] [path_to_save_ckpt]
```
## [Evaluation Process](#contents)
WaveNet is auto-regressive, and this generation process currently cannot be run in Graph mode (i.e., placed inside `construct`). Therefore, we implement it as a plain function and provide two implementations: one using NumPy and one using MindSpore ops. Set `is_numpy` to choose between them. We recommend the NumPy mode since it is much faster: the auto-regression loop only calls simple operations such as MatMul and BiasAdd, and outside Graph mode each operator launch carries a fixed per-step overhead, which slows the MindSpore-ops mode down. For more information, please refer to this [link](https://bbs.huaweicloud.com/forum/thread-94852-1-1.html).
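The loop below is a schematic of what per-sample generation looks like in the NumPy mode; `step_fn` is a hypothetical stand-in for the network's single-step forward pass, not the project's actual API:
```python
import numpy as np

def autoregressive_sample(step_fn, x0, length):
    """Schematic sample-by-sample generation: each output feeds the next step."""
    x, samples = x0, []
    for _ in range(length):
        mean, log_scale = step_fn(x)                      # cheap per-step ops (MatMul, BiasAdd)
        x = mean + np.exp(log_scale) * np.random.randn()  # draw the next sample
        samples.append(x)
    return np.array(samples)
```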
```bash
# Evaluation (GPU, using numpy)
sh ./scripts/run_eval_gpu.sh [CUDA_DEVICE_ID] [/path_to_egs/egs/gaussian/dump/lj/logmelspectrogram/norm/] [/path_to_egs/egs/gaussian/conf/gaussian_wavenet.json] [path_to_load_ckpt] is_numpy [path_to_save_audio]

# Evaluation (GPU, using mindspore)
sh ./scripts/run_eval_gpu.sh [CUDA_DEVICE_ID] [/path_to_egs/egs/gaussian/dump/lj/logmelspectrogram/norm/] [/path_to_egs/egs/gaussian/conf/gaussian_wavenet.json] [path_to_load_ckpt] [path_to_save_audio]

# Evaluation (CPU)
sh ./scripts/run_eval_cpu.sh [/path_to_egs/egs/gaussian/dump/lj/logmelspectrogram/norm/] [/path_to_egs/egs/gaussian/conf/gaussian_wavenet.json] [path_to_load_ckpt] [is_numpy] [path_to_save_audio]
```
## [Convert Process](#contents)
```bash
# GPU
python export.py --preset=/path_to_egs/egs/gaussian/conf/gaussian_wavenet.json --checkpoint_dir=path_to_dump_hparams --pretrain_ckpt=path_to_load_ckpt

# CPU
python export.py --preset=/path_to_egs/egs/gaussian/conf/gaussian_wavenet.json --checkpoint_dir=path_to_dump_hparams --pretrain_ckpt=path_to_load_ckpt --platform=CPU
```
# [Model Description](#contents)
## [Performance](#contents)
### Training Performance on GPU
| Parameters | WaveNet |
| -------------------------- | ---------------------------------------------------------------|
| Resource | NV SMX2 V100-32G |
| Uploaded Date              | 01/14/2021 (month/day/year)                                     |
| MindSpore Version | 1.0.0 |
| Dataset | LJSpeech-1.1 |
| Training Parameters | 1p, epoch=600(max), steps=1635 * epoch, batch_size = 8, lr=1e-3 |
| Optimizer | Adam |
| Loss Function | SoftmaxCrossEntropyWithLogits/discretized_mix_logistic/mix_gaussian |
| Loss | around 2.0(mulaw256)/around 4.5(mol)/around -6.0(gaussian) |
| Speed | 1p 1.467s/step |
| Total training time        | 1p (mol/gaussian): around 4 days; 2p (mulaw256): around 1 week  |
| Checkpoint                 | 59.79M/54.87M/54.83M (.ckpt file)                               |
| Scripts | [WaveNet script](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/research/audio/wavenet) |
### Inference Performance on GPU
Audio samples will be demonstrated online soon.
# [ModelZoo Homepage](#contents)
Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).