SqueezeWave: Extremely Lightweight Vocoders for On-device Speech Synthesis

By Bohan Zhai *, Tianren Gao *, Flora Xue, Daniel Rothchild, Bichen Wu, Joseph Gonzalez, and Kurt Keutzer (UC Berkeley)

Automatic speech synthesis is a challenging task that is becoming increasingly important as edge devices begin to interact with users through speech. Typical text-to-speech pipelines include a vocoder, which translates intermediate audio representations into an audio waveform. Most existing vocoders are difficult to parallelize since each generated sample is conditioned on previous samples. WaveGlow is a flow-based feed-forward alternative to these auto-regressive models (Prenger et al., 2019). However, while WaveGlow can be easily parallelized, the model is too expensive for real-time speech synthesis on the edge. This paper presents SqueezeWave, a family of lightweight vocoders based on WaveGlow that can generate audio of similar quality to WaveGlow with 61x - 214x fewer MACs.

Link to the paper: https://arxiv.org/abs/2001.05685. If you find this work useful, please consider citing:

@inproceedings{squeezewave,
   Author = {Bohan Zhai and Tianren Gao and Flora Xue and Daniel Rothchild and Bichen Wu and Joseph Gonzalez and Kurt Keutzer},
   Title = {SqueezeWave: Extremely Lightweight Vocoders for On-device Speech Synthesis},
   Journal = {arXiv:2001.05685},
   Year = {2020}
}

Audio samples generated by SqueezeWave

Audio samples of SqueezeWave are here: https://tianrengao.github.io/SqueezeWaveDemo/

Results

We introduce 4 variants of SqueezeWave in our paper. See the table below.

| Model | length | n_channels | MACs (G) | Reduction | MOS |
|---|---|---|---|---|---|
| WaveGlow | 2048 | 8 | 228.9 | 1x | 4.57±0.04 |
| SqueezeWave-128L | 128 | 256 | 3.78 | 60x | 4.07±0.06 |
| SqueezeWave-64L | 64 | 256 | 2.16 | 106x | 3.77±0.05 |
| SqueezeWave-128S | 128 | 128 | 1.06 | 214x | 3.79±0.05 |
| SqueezeWave-64S | 64 | 128 | 0.68 | 332x | 2.74±0.04 |

Model Complexity

A detailed MAC calculation can be found here.
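
As a rough sanity check on the scale of these numbers (a generic 1D-convolution estimate, not the paper's exact per-layer breakdown), the MACs of a single convolution are approximately in_channels × out_channels × kernel_size × output_length. For example:

    # Hypothetical layer: 256 input/output channels, kernel size 3, temporal length 64
    python3 -c "print(256 * 256 * 3 * 64)"   # about 12.6M MACs for this single layer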

Setup

  1. (Optional) Create a virtual environment

    virtualenv env
    source env/bin/activate
    
  2. Clone our repo and initialize submodule

    git clone https://github.com/BohanZhai/SqueezeWave.git
    cd SqueezeWave
    git submodule init
    git submodule update
  3. Install requirements

    pip3 install -r requirements.txt

  4. Install Apex

    cd ../
    git clone https://www.github.com/nvidia/apex
    cd apex
    python setup.py install

Generate audio with our pretrained model

  1. Download our pretrained models. We provide 4 pretrained models as described in the paper.

  2. Download mel-spectrograms

  3. Generate audio. Replace SqueezeWave.pt with the file name of the pretrained model you downloaded.

    python3 inference.py -f <(ls mel_spectrograms/*.pt) -w SqueezeWave.pt -o . --is_fp16 -s 0.6

Train your own model

  1. Download LJ Speech Data. We assume all the waves are stored in the directory data/
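
    One way to fetch and unpack the dataset (the URL below is the standard LJ Speech 1.1 download from keithito.com, assumed here rather than taken from this repo):

    # Assumed download location for LJ Speech 1.1; adjust paths so the waves end up in data/
    wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
    tar -xjf LJSpeech-1.1.tar.bz2
    ln -s LJSpeech-1.1/wavs data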

  2. Make a list of the file names to use for training/testing

    ls data/*.wav | tail -n+10 > train_files.txt
    ls data/*.wav | head -n10 > test_files.txt
  3. We provide 4 model configurations, with the temporal length and number of channels (the length and n_channels columns in the results table above) specified for each variant. The configuration files are under the configs/ directory. To choose the model you want to train, select the corresponding configuration file.
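
    The two configuration names used in the commands below are config_a256_c128.json and config_a128_c256.json; listing the directory shows all four variants:

    ls configs/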

  4. Train your SqueezeWave model

    mkdir checkpoints
    python train.py -c configs/config_a256_c128.json

    For multi-GPU training, replace train.py with distributed.py. Only tested with a single node and NCCL.
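
    A minimal multi-GPU invocation, assuming distributed.py takes the same -c flag as train.py (as in the upstream WaveGlow recipe this code is based on):

    python distributed.py -c configs/config_a256_c128.json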

    For mixed-precision training, set "fp16_run": true in the config file.
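
    One way to flip the flag in place (assuming the chosen config ships with "fp16_run": false):

    sed -i 's/"fp16_run": false/"fp16_run": true/' configs/config_a256_c128.json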

  5. Make test set mel-spectrograms

    mkdir -p eval/mels
    python3 mel2samp.py -f test_files.txt -o eval/mels -c configs/config_a128_c256.json
    
  6. Run inference on the test data.

    ls eval/mels > eval/mel_files.txt
    sed -i -e 's_.*_eval/mels/&_' eval/mel_files.txt
    mkdir -p eval/output
    python3 inference.py -f eval/mel_files.txt -w checkpoints/SqueezeWave_10000 -o eval/output --is_fp16 -s 0.6

    Replace SqueezeWave_10000 with the checkpoint you want to test.

Credits

The implementation of this work is based on WaveGlow: https://github.com/NVIDIA/waveglow

