Contents
- WaveNet Description
- Model Architecture
- Dataset
- Environment Requirements
- Script Description
- Model Description
- ModelZoo Homepage
WaveNet Description
WaveNet is a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones. We support training and evaluation on both GPU and CPU.
Paper: van den Oord A, Dieleman S, Zen H, et al. WaveNet: A Generative Model for Raw Audio
Model Architecture
The current model consists of a pre-convolution layer, followed by several residual blocks, each using gated activation units together with residual and skip connections. Finally, post-convolution layers are added to predict the output distribution.
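As a rough illustration of one residual block (a NumPy sketch for intuition only, not the repository's implementation in wavenet_vocoder/modules.py; plain matrix products stand in for the dilated convolutions):

# Sketch of a single WaveNet residual block with a gated activation unit.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def residual_block(x, w_filter, w_gate, w_res, w_skip):
    # x: (channels, time). The gated activation multiplies a tanh "filter"
    # branch with a sigmoid "gate" branch, elementwise.
    z = np.tanh(w_filter @ x) * sigmoid(w_gate @ x)
    residual = w_res @ z + x  # residual connection feeding the next block
    skip = w_skip @ z         # skip connection collected by the post-network
    return residual, skip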
Dataset
In the following sections, we will introduce how to run the scripts using the dataset described below.
Dataset used: The LJ Speech Dataset
- Dataset size: 2.6 GB
- Data format: audio clips (13,100) and transcriptions
- The dataset structure is as follows:
.
└── LJSpeech-1.1
    ├── wavs          // audio clip files
    └── metadata.csv  // transcripts
Environment Requirements
- Hardware (GPU/CPU)
  - Prepare the hardware environment with a GPU or CPU processor.
- Framework
  - MindSpore
- For more information, please check the official MindSpore tutorials and Python API documentation.
Script Description
Script and Sample Code
Note that some of the scripts described below are not included in our code. These scripts should first be downloaded from r9y9 and added to this project.
.
├── audio
└──wavenet
├── scripts
│ ├──run_distribute_train_gpu.sh // launch distributed training with gpu platform(8p)
│ ├──run_eval_cpu.sh // launch evaluation with cpu platform
│ ├──run_eval_gpu.sh // launch evaluation with gpu platform
│ ├──run_standalone_train_cpu.sh // launch standalone training with cpu platform
│ └──run_standalone_train_gpu.sh // launch standalone training with gpu platform(1p)
├──datasets // Note the datasets folder should be downloaded from the above link
├──egs // Note the egs folder should be downloaded from the above link
├──utils // Note the utils folder should be downloaded from the above link
├── audio.py // Audio utils. Note this script should be downloaded from the above link
├── compute-meanvar-stats.py // Compute mean-variance normalization stats. Note this script should be downloaded from the above link
├── evaluate.py // Evaluation
├── export.py // Convert MindSpore model to AIR model
├── hparams.py // Hyper-parameter configuration. Note this script should be downloaded from the above link
├── mksubset.py // Make subset of dataset. Note this script should be downloaded from the above link
├── preprocess.py // Preprocess dataset. Note this script should be downloaded from the above link
├── preprocess_normalize.py // Perform meanvar normalization to preprocessed features. Note this script should be downloaded from the above link
├── README.md // Descriptions about WaveNet
├── train.py // Training scripts
├── train_pytorch.py // Note this script should be downloaded from the above link; its original name in that project is train.py
├── src
│ ├──__init__.py
│ ├──dataset.py // Generate dataloader and data processing entry
│ ├──callback.py // Callbacks to monitor the training
│ ├──lr_generator.py // Learning rate generator
│ └──loss.py // Loss function definition
└── wavenet_vocoder
├──__init__.py
├──conv.py // Extended 1D convolution
├──mixture.py // Loss function for training and sample function for testing
├──modules.py // Modules for Wavenet construction
├──upsample.py // Upsample layer definition
├──util.py // Utils. Note this script should be downloaded from the above link
├──wavenet.py // WaveNet networks
└──tfcompat // Note this folder should be downloaded from the above link
├──__init__.py
└──hparam.py // Param management tools
Script Parameters
Training
usage: train.py [--data_path DATA_PATH] [--preset PRESET]
[--checkpoint_dir CHECKPOINT_DIR] [--checkpoint CHECKPOINT]
[--speaker_id SPEAKER_ID] [--platform PLATFORM]
[--is_distributed IS_DISTRIBUTED]
options:
--data_path dataset path
--preset path of preset parameters (json)
--checkpoint_dir directory for saving model checkpoints, default is "./checkpoints"
--checkpoint pre-trained ckpt path
--speaker_id specific speaker of data in case of multi-speaker datasets, not used currently
--platform specify platform to be used, default is "GPU"
--is_distributed whether distributed training or not
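For example, a standalone GPU run with the gaussian preset might look like this (the paths and checkpoint directory are placeholders):

python train.py --data_path=/path_to_egs/egs/gaussian/dump/lj/logmelspectrogram/norm/ \
                --preset=/path_to_egs/egs/gaussian/conf/gaussian_wavenet.json \
                --checkpoint_dir=./checkpoints \
                --platform=GPU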
Evaluation
usage: evaluate.py [--data_path DATA_PATH] [--preset PRESET]
[--pretrain_ckpt PRETRAIN_CKPT] [--is_numpy]
[--output_path OUTPUT_PATH] [--speaker_id SPEAKER_ID]
[--platform PLATFORM]
options:
--data_path dataset path
--preset path of preset parameters (json)
--pretrain_ckpt pre-trained ckpt path
--is_numpy whether to use NumPy for inference or not
--output_path path to save synthesized audio
--speaker_id specific speaker of data in case of multi-speaker datasets, not used currently
--platform specify platform to be used, default is "GPU"
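For example, NumPy-based inference on GPU might look like this (paths and checkpoint name are placeholders):

python evaluate.py --data_path=/path_to_egs/egs/gaussian/dump/lj/logmelspectrogram/norm/ \
                   --preset=/path_to_egs/egs/gaussian/conf/gaussian_wavenet.json \
                   --pretrain_ckpt=./checkpoints/wavenet.ckpt \
                   --is_numpy \
                   --output_path=./synthesized \
                   --platform=GPU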
More parameters for training and evaluation can be set in the file hparams.py.
Training Process
Before your first training run, the dependency scripts should be downloaded and placed in the correct directories as described in Script and Sample Code. After that, the raw data should be pre-processed using the scripts in egs. The directory structure of egs is as follows:
.
├── egs
├──gaussian
│ ├──conf
│ │ ├──gaussian_wavenet.json
│ │ └──gaussian_wavenet_demo.json
│ └──run.sh
├──mol
│ ├──conf
│ │ ├──mol_wavenet.json
│ │ └──mol_wavenet_demo.json
│ └──run.sh
├──mulaw256
│ ├──conf
│ │ ├──mulaw_wavenet.json
│ │ └──mulaw_wavenet_demo.json
│ └──run.sh
└──README.md
In this project, three different losses are implemented to train the network:
- mulaw256: categorical output distribution. The input is an 8-bit mu-law quantized waveform.
- mol: discretized mixture of logistics (MoL) loss. The input is 16-bit raw audio.
- gaussian: mixture of Gaussians loss. The input is 16-bit raw audio.
The three folders gaussian, mol, and mulaw256 are used to generate the corresponding training data. For example, to generate the training data for the mixture of Gaussians loss, first modify run.sh at line 28, changing conf/gaussian_wavenet_demo.json to conf/gaussian_wavenet.json (this edit can also be scripted; see the sketch below). We use the default parameters in gaussian_wavenet.json. With this setting, the data is generated to suit the mixture of Gaussians loss, and some parameters in hparams.py are overridden by those in gaussian_wavenet.json. You can also define your own hyper-parameter JSON here. Note that if you want to change the value of a parameter, you may need to modify it in gaussian_wavenet.json rather than hparams.py, since gaussian_wavenet.json may override hparams.py. After the modification, run the following commands for data generation:
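If you prefer to script the line-28 edit, a minimal sketch (assuming the stock run.sh and the default demo preset name; run from inside egs/gaussian) is:

# Swap the demo preset for the full preset in run.sh; equivalent to editing line 28 by hand.
sed -i 's|conf/gaussian_wavenet_demo.json|conf/gaussian_wavenet.json|' run.sh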
bash run.sh --stage 0 --stop-stage 0 --db-root /path_to_dataset/LJSpeech-1.1/wavs
bash run.sh --stage 1 --stop-stage 1
After the processing, the directory of gaussian will be as follows:
.
└── gaussian
    ├──conf
    ├──data
    ├──exp
    └──dump
       └──lj
          └──logmelspectrogram
             ├──org
             └──norm
                ├──train_no_dev
                ├──dev
                └──eval
The train_no_dev folder contains the final training data. The process for mol and mulaw256 is the same. Once the training data is prepared, you can run the following commands to train the network:
Standalone training
GPU:
sh ./scripts/run_standalone_train_gpu.sh [CUDA_DEVICE_ID] [/path_to_egs/egs/gaussian/dump/lj/logmelspectrogram/norm/] [/path_to_egs/egs/gaussian/conf/gaussian_wavenet.json] [path_to_save_ckpt]
CPU:
sh ./scripts/run_standalone_train_cpu.sh [/path_to_egs/egs/gaussian/dump/lj/logmelspectrogram/norm/] [/path_to_egs/egs/gaussian/conf/gaussian_wavenet.json] [path_to_save_ckpt]
Distributed training (8p)
sh ./scripts/run_distribute_train_gpu.sh [/path_to_egs/egs/gaussian/dump/lj/logmelspectrogram/norm/] [/path_to_egs/egs/gaussian/conf/gaussian_wavenet.json] [path_to_save_ckpt]
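As a concrete illustration (device id, paths, and checkpoint directory are placeholders; assumes the gaussian preset), a standalone GPU launch might be:

sh ./scripts/run_standalone_train_gpu.sh 0 /path_to_egs/egs/gaussian/dump/lj/logmelspectrogram/norm/ /path_to_egs/egs/gaussian/conf/gaussian_wavenet.json ./checkpoints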
Evaluation Process
WaveNet involves an auto-regressive generation process that currently cannot be run in Graph mode (i.e., the auto-regression cannot be placed into construct). Therefore, we implement the process in a plain function and provide two ways to realize it: using NumPy or using MindSpore ops. One can set is_numpy to determine which mode is used. We recommend NumPy, since it is much faster than using MindSpore ops: the auto-regressive process only calls simple operations such as MatMul and BiasAdd, and unlike Graph mode there is a fixed per-operator cost at each step, which leads to lower speed. For more information, please refer to this link.
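To make the cost model concrete, the loop has roughly the following shape (an illustrative Python sketch, not the repository's code; step_fn stands for the per-sample network evaluation):

# Illustrative sketch of sample-by-sample auto-regressive synthesis.
import numpy as np

def synthesize(step_fn, init_sample, num_samples):
    # step_fn(prev_sample) -> next_sample. Each call issues only a few tiny
    # MatMul/BiasAdd-style operations, so a fixed per-operator dispatch cost
    # paid on every step dominates unless the ops run as plain NumPy.
    samples = [init_sample]
    for _ in range(num_samples - 1):
        samples.append(step_fn(samples[-1]))
    return np.asarray(samples)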
Evaluation
GPU (using numpy):
sh ./scripts/run_eval_gpu.sh [CUDA_DEVICE_ID] [/path_to_egs/egs/gaussian/dump/lj/logmelspectrogram/norm/] [/path_to_egs/egs/gaussian/conf/gaussian_wavenet.json] [path_to_load_ckpt] is_numpy [path_to_save_audio]
GPU (using MindSpore ops):
sh ./scripts/run_eval_gpu.sh [CUDA_DEVICE_ID] [/path_to_egs/egs/gaussian/dump/lj/logmelspectrogram/norm/] [/path_to_egs/egs/gaussian/conf/gaussian_wavenet.json] [path_to_load_ckpt] [path_to_save_audio]
CPU:
sh ./scripts/run_eval_cpu.sh [/path_to_egs/egs/gaussian/dump/lj/logmelspectrogram/norm/] [/path_to_egs/egs/gaussian/conf/gaussian_wavenet.json] [path_to_load_ckpt] [is_numpy] [path_to_save_audio]
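For example, NumPy inference on GPU device 0 might be launched as follows (paths and checkpoint name are placeholders):

sh ./scripts/run_eval_gpu.sh 0 /path_to_egs/egs/gaussian/dump/lj/logmelspectrogram/norm/ /path_to_egs/egs/gaussian/conf/gaussian_wavenet.json ./checkpoints/wavenet.ckpt is_numpy ./synthesized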
Convert Process
GPU:
python export.py --preset=/path_to_egs/egs/gaussian/conf/gaussian_wavenet.json --checkpoint_dir=path_to_dump_hparams --pretrain_ckpt=path_to_load_ckpt
CPU:
python export.py --preset=/path_to_egs/egs/gaussian/conf/gaussian_wavenet.json --checkpoint_dir=path_to_dump_hparams --pretrain_ckpt=path_to_load_ckpt --platform=CPU
Model Description
Performance
Training Performance on GPU
Parameters | WaveNet |
---|---|
Resource | NV SMX2 V100-32G |
Uploaded Date | 01/14/2021 (month/day/year) |
MindSpore Version | 1.0.0 |
Dataset | LJSpeech-1.1 |
Training Parameters | 1p, epoch=600 (max), steps=1635 * epoch, batch_size=8, lr=1e-3 |
Optimizer | Adam |
Loss Function | SoftmaxCrossEntropyWithLogits / discretized_mix_logistic / mix_gaussian |
Loss | around 2.0 (mulaw256) / around 4.5 (mol) / around -6.0 (gaussian) |
Speed | 1p: 1.467 s/step |
Total time | 1p (mol/gaussian): around 4 days; 2p (mulaw256): around 1 week |
Checkpoint | 59.79M / 54.87M / 54.83M (.ckpt file) |
Scripts | WaveNet script |
Inference Performance on GPU
Audio samples will be demonstrated online soon.
ModelZoo Homepage
Please check the official homepage.