GitHub - tianrengao/SqueezeWave
source link: https://github.com/tianrengao/SqueezeWave
SqueezeWave: Extremely Lightweight Vocoders for On-device Speech Synthesis
By Bohan Zhai *, Tianren Gao *, Flora Xue, Daniel Rothchild, Bichen Wu, Joseph Gonzalez, and Kurt Keutzer (UC Berkeley)
Automatic speech synthesis is a challenging task that is becoming increasingly important as edge devices begin to interact with users through speech. Typical text-to-speech pipelines include a vocoder, which translates intermediate audio representations into an audio waveform. Most existing vocoders are difficult to parallelize since each generated sample is conditioned on previous samples. WaveGlow is a flow-based feed-forward alternative to these auto-regressive models (Prenger et al., 2019). However, while WaveGlow can be easily parallelized, the model is too expensive for real-time speech synthesis on the edge. This paper presents SqueezeWave, a family of lightweight vocoders based on WaveGlow that can generate audio of similar quality to WaveGlow with 61x - 214x fewer MACs.
Link to the paper: https://arxiv.org/abs/2001.05685. If you find this work useful, please consider citing:
@article{squeezewave,
    Author = {Bohan Zhai and Tianren Gao and Flora Xue and Daniel Rothchild and Bichen Wu and Joseph Gonzalez and Kurt Keutzer},
    Title = {SqueezeWave: Extremely Lightweight Vocoders for On-device Speech Synthesis},
    Journal = {arXiv:2001.05685},
    Year = {2020}
}
Audio samples generated by SqueezeWave
Audio samples of SqueezeWave are here: https://tianrengao.github.io/SqueezeWaveDemo/
Results
We introduce 4 variants of SqueezeWave in our paper. See the table below.
| Model | length | n_channels | MACs | Reduction | MOS |
| --- | --- | --- | --- | --- | --- |
| WaveGlow | 2048 | 8 | 228.9 | 1x | 4.57±0.04 |
| SqueezeWave-128L | 128 | 256 | 3.78 | 60x | 4.07±0.06 |
| SqueezeWave-64L | 64 | 256 | 2.16 | 106x | 3.77±0.05 |
| SqueezeWave-128S | 128 | 128 | 1.06 | 214x | 3.79±0.05 |
| SqueezeWave-64S | 64 | 128 | 0.68 | 332x | 2.74±0.04 |

Model Complexity
A detailed MAC calculation can be found here.
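The Reduction column is the ratio of WaveGlow's MACs to each variant's MACs. The sketch below recomputes that ratio from the rounded MACs values in the results table. Because the inputs are rounded, the recomputed ratios can differ by a few units from the published column, which presumably was derived from unrounded counts. The function name `reduction_factors` is illustrative and not part of the repository.

```python
# MACs values copied from the results table, in the same unit the table uses.
MACS = {
    "WaveGlow": 228.9,
    "SqueezeWave-128L": 3.78,
    "SqueezeWave-64L": 2.16,
    "SqueezeWave-128S": 1.06,
    "SqueezeWave-64S": 0.68,
}


def reduction_factors(macs, baseline="WaveGlow"):
    """Return the baseline's MACs divided by each model's MACs."""
    base = macs[baseline]
    return {name: base / value for name, value in macs.items()}


if __name__ == "__main__":
    for name, factor in reduction_factors(MACS).items():
        print(f"{name:18s} {factor:6.1f}x")
```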
Setup
- (Optional) Create a virtual environment
    virtualenv env
    source env/bin/activate
- Clone our repo and initialize submodule
    git clone https://github.com/BohanZhai/SqueezeWave.git
    cd SqueezeWave
    git submodule init
    git submodule update
- Install requirements
    pip3 install -r requirements.txt
- Install Apex
    cd ../
    git clone https://www.github.com/nvidia/apex
    cd apex
    python setup.py install
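After the setup steps, a quick check that the key packages are importable can save a failed run later. This is a minimal sketch, not part of the repository: the helper `missing_packages` is hypothetical, and the import names `torch` and `apex` are assumptions about what the requirements file and the Apex install provide.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names; adjust to match the contents of requirements.txt.
    missing = missing_packages(["torch", "apex"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All checked packages are importable.")
```

`find_spec` only locates a top-level package without importing it, so the check stays fast and does not initialize a GPU.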
Generate audio with our pretrained model
- Download our pretrained models. We provide 4 pretrained models as described in the paper.
- Download mel-spectrograms
- Generate audio. Replace SqueezeWave.pt with the name of the specific pretrained model.
    python3 inference.py -f <(ls mel_spectrograms/*.pt) -w SqueezeWave.pt -o . --is_fp16 -s 0.6
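The `<(ls mel_spectrograms/*.pt)` argument uses shell process substitution to hand `inference.py` a file that lists one mel-spectrogram path per line. In shells without that feature, the same list can be written to a regular file first. The sketch below does so and builds the equivalent command. The helper names and the `mel_files.txt` file name are hypothetical, and calling the `-s` value `sigma` follows WaveGlow's inference script, which is an assumption about this repository.

```python
import glob
import os
import shlex


def list_mels(mel_dir):
    """Return the sorted .pt paths found directly inside `mel_dir`."""
    return sorted(glob.glob(os.path.join(mel_dir, "*.pt")))


def write_list(paths, list_path):
    """Write one path per line, the format passed to the -f flag."""
    with open(list_path, "w") as f:
        for path in paths:
            f.write(path + "\n")


def inference_command(list_path, checkpoint, output_dir=".", sigma=0.6):
    """Build the argument list for the inference command, using only its flags."""
    return ["python3", "inference.py", "-f", list_path, "-w", checkpoint,
            "-o", output_dir, "--is_fp16", "-s", str(sigma)]


if __name__ == "__main__":
    mels = list_mels("mel_spectrograms")
    if mels:
        write_list(mels, "mel_files.txt")
        # Print the command rather than running it, so it can be inspected first.
        print(shlex.join(inference_command("mel_files.txt", "SqueezeWave.pt")))
    else:
        print("No .pt files found in mel_spectrograms/")
```

The printed command can be pasted into a shell, or the argument list can be passed to `subprocess.run` once the paths look right.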
Train your own model
- Download LJ Speech Data. We assume all the wave files are stored in the directory ^/data/
- Make a list of the file names to use for training/testing
    ls data/*.wav | tail -n+11 > train_files.txt
    ls data/*.wav | head -n10 > test_files.txt
- We provide 4 model configurations, with the audio channel and channel numbers specified in the table under Results. The configuration files are under the /configs directory. To choose the model you want to train, select the corresponding configuration file.
- Train your SqueezeWave model
    mkdir checkpoints
    python train.py -c configs/config_a256_c128.json
  For multi-GPU training, replace train.py with distributed.py. Only tested with single node and NCCL.
  For mixed precision training, set "fp16_run": true in config.json.
- Make test set mel-spectrograms
    mkdir -p eval/mels
    python3 mel2samp.py -f test_files.txt -o eval/mels -c configs/config_a128_c256.json
- Run inference on the test data.
    ls eval/mels > eval/mel_files.txt
    sed -i -e 's_.*_eval/mels/&_' eval/mel_files.txt
    mkdir -p eval/output
    python3 inference.py -f eval/mel_files.txt -w checkpoints/SqueezeWave_10000 -o eval/output --is_fp16 -s 0.6
  Replace SqueezeWave_10000 with the checkpoint you want to test.
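The file-list step at the start of this section can also be done in Python, which makes it easy to confirm that no file ends up in both lists. This is a minimal sketch that holds out the first 10 sorted files for testing and trains on the rest. The helper names are hypothetical, and Python's `sorted` may order names differently from `ls` under some locales.

```python
import glob
import os


def split_wavs(data_dir, n_test=10):
    """Return (train, test); the first `n_test` sorted .wav paths are held out."""
    wavs = sorted(glob.glob(os.path.join(data_dir, "*.wav")))
    return wavs[n_test:], wavs[:n_test]


def write_list(paths, list_path):
    """Write one path per line."""
    with open(list_path, "w") as f:
        f.writelines(path + "\n" for path in paths)


if __name__ == "__main__":
    train, test = split_wavs("data")
    if not test:
        print("No .wav files found in data/")
    else:
        # Guard against evaluating on a file the model was trained on.
        assert not set(train) & set(test)
        write_list(train, "train_files.txt")
        write_list(test, "test_files.txt")
        print(f"{len(train)} training files, {len(test)} test files")
```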
Credits
The implementation of this work is based on WaveGlow: https://github.com/NVIDIA/waveglow