WaveNet is a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones. We support training and evaluation on both GPU and CPU.
[Paper](https://arxiv.org/pdf/1609.03499.pdf): van den Oord A, Dieleman S, Zen H, et al. WaveNet: A Generative Model for Raw Audio. arXiv preprint arXiv:1609.03499, 2016.
# [Model Architecture](#contents)
The current model consists of a pre-convolution layer followed by several residual blocks, each with residual and skip connections and gated activation units.
Finally, post-convolution layers are added to predict the output distribution.
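As a rough illustration of the block structure described above, the following NumPy sketch stacks gated residual blocks in the spirit of the WaveNet paper: a kernel-size-2 dilated causal convolution feeds a filter branch (tanh) and a gate branch (sigmoid), and their product goes through 1x1 convolutions to a residual output and a skip output. It is not the project's MindSpore code; the kernel size, the parameter names (`w_f`, `w_g`, `w_res`, `w_skip`), the random initialisation and the absence of local conditioning are assumptions made to keep it short.

```python
import numpy as np


def causal_dilated_conv(x, w, b, dilation):
    """Kernel-size-2 dilated causal convolution (illustrative sketch).

    x: (c_in, T), w: (2, c_out, c_in), b: (c_out,).
    Output at time t depends only on x[:, t - dilation] and x[:, t].
    """
    c_in, t_len = x.shape
    # Left-pad with zeros so the output keeps length T and stays causal.
    padded = np.concatenate([np.zeros((c_in, dilation)), x], axis=1)
    past = padded[:, :t_len]        # x[:, t - dilation] (zeros for t < dilation)
    current = padded[:, dilation:]  # x[:, t]
    return w[0] @ past + w[1] @ current + b[:, None]


def residual_block(x, params, dilation):
    """Gated activation unit with residual and skip outputs."""
    f = causal_dilated_conv(x, params["w_f"], params["b_f"], dilation)
    g = causal_dilated_conv(x, params["w_g"], params["b_g"], dilation)
    z = np.tanh(f) / (1.0 + np.exp(-g))  # tanh(filter) * sigmoid(gate)
    skip = params["w_skip"] @ z          # 1x1 conv to the skip path
    residual = x + params["w_res"] @ z   # 1x1 conv plus residual connection
    return residual, skip


def wavenet_stack(x, blocks, dilations):
    """Run the blocks in sequence and sum their skip outputs."""
    skip_sum = 0.0
    for params, dilation in zip(blocks, dilations):
        x, skip = residual_block(x, params, dilation)
        skip_sum = skip_sum + skip
    return skip_sum


def make_params(rng, channels, skip_channels, scale=0.3):
    """Hypothetical random weights for one block, for illustration only."""
    return {
        "w_f": rng.standard_normal((2, channels, channels)) * scale,
        "b_f": np.zeros(channels),
        "w_g": rng.standard_normal((2, channels, channels)) * scale,
        "b_g": np.zeros(channels),
        "w_res": rng.standard_normal((channels, channels)) * scale,
        "w_skip": rng.standard_normal((skip_channels, channels)) * scale,
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dilations = [1, 2, 4, 8]  # doubling dilations: receptive field of 16 samples
    blocks = [make_params(rng, channels=4, skip_channels=3) for _ in dilations]
    x = rng.standard_normal((4, 32))
    print(wavenet_stack(x, blocks, dilations).shape)  # (3, 32)
```

The summed skip output is what the post-convolution layers would map to the output distribution. Because every convolution only looks backwards in time, perturbing the input at step t leaves all outputs before t unchanged.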
# [Dataset](#contents)
The following sections describe how to run the scripts with the related dataset below.
**Note that some of the scripts described below are not included in our code.** Download these scripts from [r9y9](https://github.com/r9y9/wavenet_vocoder) first and add them to this project.
```path
├── scripts
│   ├── run_distribute_train_gpu.sh   // launch distributed training on GPU (8p)
│   ├── run_eval_cpu.sh               // launch evaluation on CPU
│   ├── run_eval_gpu.sh               // launch evaluation on GPU
│   ├── run_standalone_train_cpu.sh   // launch standalone training on CPU
│   └── run_standalone_train_gpu.sh   // launch standalone training on GPU (1p)
├── datasets                          // download from the r9y9 link above
├── egs                               // download from the r9y9 link above
├── utils                             // download from the r9y9 link above
├── audio.py                          // audio utilities; download from the r9y9 link above
├── compute-meanvar-stats.py          // compute mean-variance normalization stats; download from the r9y9 link above
├── evaluate.py                       // evaluation
├── export.py                         // export the MindSpore model to AIR format
├── hparams.py                        // hyper-parameter configuration; download from the r9y9 link above
├── mksubset.py                       // make a subset of the dataset; download from the r9y9 link above
├── preprocess.py                     // preprocess the dataset; download from the r9y9 link above
├── preprocess_normalize.py           // apply mean-variance normalization to preprocessed features; download from the r9y9 link above
├── README.md                         // description of WaveNet
├── train.py                          // training script
└── train_pytorch.py                  // download from the r9y9 link above; named train.py in that project
```
More parameters for training and evaluation can be set in the file `hparams.py`.
## [Training Process](#contents)
Before training for the first time, download the dependency scripts and place them in the correct directories as described in the Script and Sample Code section.
Then preprocess the raw data with the scripts in `egs`, which is organized as follows:
```path
.
└── egs
    ├── gaussian
    │   ├── conf
    │   │   ├── gaussian_wavenet.json
    │   │   └── gaussian_wavenet_demo.json
    │   └── run.sh
    ├── mol
    │   ├── conf
    │   │   ├── mol_wavenet.json
    │   │   └── mol_wavenet_demo.json
    │   └── run.sh
    ├── mulaw256
    │   ├── conf
    │   │   ├── mulaw_wavenet.json
    │   │   └── mulaw_wavenet_demo.json
    │   └── run.sh
    └── README.md
```
In this project, three different losses are implemented to train the network:
- mulaw256: categorical output distribution. The input is an 8-bit mu-law quantized waveform.
- mol: discretized mixture of logistics loss. The input is 16-bit raw audio.
- gaussian: mixture of Gaussians loss. The input is 16-bit raw audio.
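For the mulaw256 setting, the sketch below shows standard 8-bit mu-law companding and quantization in NumPy, which turns float audio into the 256 classes predicted by the categorical output. It follows the textbook mu-law formula with mu = 255; the function names are hypothetical, and the project's preprocessing (done by the scripts downloaded from r9y9) may differ in details such as rounding.

```python
import numpy as np

MU = 255  # 256 classes -> mu = 255 (standard 8-bit mu-law)


def mulaw_encode(x, mu=MU):
    """Compand float audio in [-1, 1] and quantize it to integers in [0, mu]."""
    x = np.clip(np.asarray(x, dtype=np.float64), -1.0, 1.0)
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)  # still in [-1, 1]
    # Map [-1, 1] to [0, mu] and round to the nearest integer class.
    return np.floor((y + 1.0) / 2.0 * mu + 0.5).astype(np.int64)


def mulaw_decode(q, mu=MU):
    """Invert mulaw_encode up to quantization error."""
    y = 2.0 * np.asarray(q, dtype=np.float64) / mu - 1.0
    # (1 + mu) ** |y| - 1, written with expm1 for accuracy near zero
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu


if __name__ == "__main__":
    wave = np.array([-1.0, -0.5, 0.0, 0.01, 0.5, 1.0])
    codes = mulaw_encode(wave)
    print(codes)
    print(mulaw_decode(codes))
```

Mu-law spends more of the 256 levels on small amplitudes, so quiet passages are reproduced more finely than with a uniform 8-bit quantizer.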
The three folders `gaussian`, `mol`, and `mulaw256` generate the training data for the corresponding loss. For example, to generate the training data for
the mixture of Gaussians loss, first edit line 28 of `run.sh`, changing `conf/gaussian_wavenet_demo.json` to
`conf/gaussian_wavenet.json`. We use the default parameters in `gaussian_wavenet.json`. With this setting, the generated data is adapted to the mixture of Gaussians loss, and
some parameters in `hparams.py` are overridden by those in `gaussian_wavenet.json`. You can also define your own hyper-parameter JSON file here. After the modification,
run the modified `run.sh` to generate the data. Note that to change the value of a parameter, you may need to edit `gaussian_wavenet.json` instead of `hparams.py`, because values in `gaussian_wavenet.json` override those in `hparams.py`.
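The precedence rule above can be pictured as a dictionary merge in which the JSON file is applied last. This is a minimal sketch of that rule, not the loader the project actually uses; `DEFAULT_HPARAMS`, `apply_json_overrides` and the key names are hypothetical.

```python
import json

# Hypothetical defaults standing in for hparams.py; the key names are illustrative.
DEFAULT_HPARAMS = {
    "batch_size": 8,
    "learning_rate": 1e-3,
    "sample_rate": 22050,
}


def apply_json_overrides(defaults, json_text):
    """Return a copy of `defaults` in which keys present in the JSON win."""
    overrides = json.loads(json_text)
    merged = dict(defaults)   # copy so the defaults themselves stay unchanged
    merged.update(overrides)  # JSON values take precedence over the defaults
    return merged


if __name__ == "__main__":
    conf = '{"batch_size": 2}'  # e.g. the contents of a custom conf/*.json file
    hp = apply_json_overrides(DEFAULT_HPARAMS, conf)
    print(hp["batch_size"], hp["sample_rate"])  # 2 22050
```

With this ordering, changing a value only in `hparams.py` has no effect when the JSON file also sets that key. To read a real file, `json.load` on an open file handle would replace `json.loads`.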
```bash
# standalone training on CPU
sh ./scripts/run_standalone_train_cpu.sh [/path_to_egs/egs/gaussian/dump/lj/logmelspectrogram/norm/] [/path_to_egs/egs/gaussian/conf/gaussian_wavenet.json] [path_to_save_ckpt]
# distributed training on GPU (8p)
sh ./scripts/run_distribute_train_gpu.sh [/path_to_egs/egs/gaussian/dump/lj/logmelspectrogram/norm/] [/path_to_egs/egs/gaussian/conf/gaussian_wavenet.json] [path_to_save_ckpt]
```
WaveNet generates audio auto-regressively, and this process currently cannot run in Graph mode (that is, with the auto-regressive loop placed inside `construct`). Therefore, we implement it in an ordinary function. Two implementations are provided, one using NumPy and one using MindSpore ops; set `is_numpy` to choose between them. We recommend NumPy because it is much faster: each step of the auto-regressive loop only calls simple operations such as MatMul and BiasAdd, and outside Graph mode every such call carries a fixed per-step overhead, which lowers the overall speed. For more information, please refer to
this [link](https://bbs.huaweicloud.com/forum/thread-94852-1-1.html).
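To illustrate why a plain NumPy loop suits this workload, the sketch below generates samples one at a time from a toy, randomly initialised WaveNet. Each layer keeps a FIFO of its last `d` inputs, so a generation step needs only a few matrix-vector products, bias adds and elementwise ops instead of recomputing the whole sequence. The class name, layer sizes, starting class and sampling strategy are assumptions for illustration; this is not the project's `is_numpy` code path.

```python
import numpy as np
from collections import deque


class NumpyWaveNetSketch:
    """Toy WaveNet used only to illustrate incremental generation.

    Hypothetical architecture: one-hot input -> 1x1 conv -> gated residual
    layers (kernel size 2) -> skip sum -> ReLU -> 1x1 conv -> softmax.
    """

    def __init__(self, rng, classes=256, channels=16, dilations=(1, 2, 4, 8)):
        s = 0.1  # small random weights; a trained model would load checkpoints
        self.classes = classes
        self.dilations = dilations
        self.w_in = rng.standard_normal((channels, classes)) * s
        self.layers = [
            {
                "w_f": rng.standard_normal((2, channels, channels)) * s,
                "w_g": rng.standard_normal((2, channels, channels)) * s,
                "b_f": np.zeros(channels),
                "b_g": np.zeros(channels),
                "w_res": rng.standard_normal((channels, channels)) * s,
                "w_skip": rng.standard_normal((channels, channels)) * s,
            }
            for _ in dilations
        ]
        self.w_out = rng.standard_normal((classes, channels)) * s
        self.b_out = np.zeros(classes)

    def generate(self, num_samples, rng):
        channels = self.w_in.shape[0]
        # One FIFO per layer holding its last `d` inputs; zeros act as padding.
        buffers = [deque([np.zeros(channels)] * d, maxlen=d) for d in self.dilations]
        prev = self.classes // 2  # start from the middle ("silence") class
        out = np.empty(num_samples, dtype=np.int64)
        for t in range(num_samples):
            x = self.w_in[:, prev]  # one-hot @ matrix == column lookup
            skip = np.zeros(channels)
            for layer, buf in zip(self.layers, buffers):
                past = buf[0]  # this layer's input at time t - d
                buf.append(x)  # maxlen discards the oldest entry
                f = layer["w_f"][0] @ past + layer["w_f"][1] @ x + layer["b_f"]
                g = layer["w_g"][0] @ past + layer["w_g"][1] @ x + layer["b_g"]
                z = np.tanh(f) / (1.0 + np.exp(-g))  # gated activation
                skip += layer["w_skip"] @ z
                x = x + layer["w_res"] @ z  # residual connection
            logits = self.w_out @ np.maximum(skip, 0.0) + self.b_out
            p = np.exp(logits - logits.max())  # numerically stable softmax
            p /= p.sum()
            prev = int(rng.choice(self.classes, p=p))  # sample the next class
            out[t] = prev
        return out


if __name__ == "__main__":
    model = NumpyWaveNetSketch(np.random.default_rng(0))
    samples = model.generate(100, np.random.default_rng(1))
    print(samples[:10])
```

The arithmetic per step is tiny, so the run time is dominated by per-call overhead; that is the same reason the text gives for preferring NumPy over issuing MindSpore ops step by step.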