GitHub - pyannote/pyannote-audio: Neural building blocks for speaker diarization...
source link: https://github.com/pyannote/pyannote-audio
Announcement
Open PhD/postdoc positions at LIMSI combining machine learning, NLP, speech processing, and computer vision.
pyannote-audio
Neural building blocks for speaker diarization
Installation
$ conda create --name pyannote python=3.6 anaconda
$ source activate pyannote
$ conda install -c conda-forge yaafe
$ conda install cmake
$ pip install -U pip setuptools
$ pip install --process-dependency-links pyannote.audio
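After installation, a quick sanity check is to confirm that the interpreter in the active environment can locate the package. The sketch below is not part of the original README. It assumes the distribution exposes an importable module named `pyannote.audio` (inferred from the pip package name), and `is_installed` is a hypothetical helper introduced only for illustration.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be located without fully importing it.

    Hypothetical helper: for a dotted name, find_spec imports the parent
    package first, so a missing parent raises ImportError, handled here.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


if __name__ == "__main__":
    # Assumption: the installed module is importable as "pyannote.audio".
    name = "pyannote.audio"
    status = "found" if is_installed(name) else "not found"
    print(f"{name}: {status}")
```

Run it with the `pyannote` environment activated; a "not found" result usually means the install step failed or a different environment is active.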
Citation
If you use pyannote.audio in your research, please use the following citations.
- Speech activity and speaker change detection
@inproceedings{Yin2017,
  Author = {Ruiqing Yin and Herv\'e Bredin and Claude Barras},
  Title = {{Speaker Change Detection in Broadcast TV using Bidirectional Long Short-Term Memory Networks}},
  Booktitle = {{18th Annual Conference of the International Speech Communication Association, Interspeech 2017}},
  Year = {2017},
  Month = {August},
  Address = {Stockholm, Sweden},
  Url = {https://github.com/yinruiqing/change_detection}
}
- Speaker embedding
@inproceedings{Bredin2017,
  author = {Herv\'{e} Bredin},
  title = {{TristouNet: Triplet Loss for Speaker Turn Embedding}},
  booktitle = {42nd IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2017},
  year = {2017},
  url = {http://arxiv.org/abs/1609.04301},
}
- Speaker diarization pipeline
@inproceedings{Yin2018,
  Author = {Ruiqing Yin and Herv\'e Bredin and Claude Barras},
  Title = {{Neural Speech Turn Segmentation and Affinity Propagation for Speaker Diarization}},
  Booktitle = {{19th Annual Conference of the International Speech Communication Association, Interspeech 2018}},
  Year = {2018},
  Month = {September},
  Address = {Hyderabad, India},
}
Tutorials
- Feature extraction
- LSTM-based speech activity detection
- LSTM-based speaker change detection
- TristouNet neural speech turn embedding
- Speaker diarization pipeline
Documentation
The API is unfortunately not documented yet.