Identify a spoken language using artificial intelligence (LID)
source link: https://www.tuicool.com/articles/hit/fuANRfe
spoken language identification
Identify a spoken language using artificial intelligence (LID). The solution uses a convolutional neural network to detect language-specific phonemes. It supports 3 languages: English, German and Spanish. The inspiration for the project came from the TopCoder contest, Spoken Languages 2.
Take a look at the Demo section to try the project yourself against real-life content.
Dataset
A new dataset was created from scratch.
LibriVox recordings were used to prepare the dataset. Particular attention was paid to including a wide variety of unique speakers. High variance forces the network to concentrate on language properties rather than on a specific voice. Samples are balanced equally between languages, genders and speakers so that no subgroup is favoured. Finally, speakers present in the test set are not present in the train set. This helps estimate the generalization error.
More information at tomasz-oponowicz/spoken_language_dataset.
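The speaker-disjoint split described above can be sketched as follows. This is a minimal illustration, not the dataset's actual build code: the function name `split_by_speaker`, the `(speaker_id, path)` tuple layout and the 20% test fraction are all assumptions made for the example.

```python
import random
from collections import defaultdict


def split_by_speaker(samples, test_fraction=0.2, seed=0):
    """Split (speaker_id, path) pairs so that no speaker lands in both sets.

    Hypothetical helper: the real dataset tooling may split differently.
    """
    by_speaker = defaultdict(list)
    for speaker_id, path in samples:
        by_speaker[speaker_id].append(path)

    # Shuffle speakers (not samples), so a whole speaker goes to one side.
    speakers = sorted(by_speaker)
    random.Random(seed).shuffle(speakers)

    n_test = max(1, round(len(speakers) * test_fraction))
    test = [(s, p) for s in speakers[:n_test] for p in by_speaker[s]]
    train = [(s, p) for s in speakers[n_test:] for p in by_speaker[s]]
    return train, test
```

Because the split is made over speakers, a high test score cannot come from the network memorizing individual voices.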
Architecture
The first step is to normalize the input audio. Each sample is a FLAC audio file with:
- sample rate: 22050 Hz
- bit depth: 16
- channels: 1
- duration: exactly 10 seconds
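One way to produce files in this format is to call ffmpeg, which is listed among the training prerequisites. The sketch below only builds and runs a conversion command; it is an assumption about how such a step could look, not the project's actual pipeline, and the helper names are hypothetical. Note that `-t` truncates to at most 10 seconds and does not pad shorter recordings.

```python
import subprocess


def build_normalize_command(src, dst, sample_rate=22050, seconds=10):
    """Return an ffmpeg argument list that writes a mono 16-bit file to `dst`.

    Hypothetical helper; the output container (FLAC) is chosen by the
    extension of `dst`.
    """
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-ar", str(sample_rate),  # resample to the target rate
        "-ac", "1",               # downmix to a single channel
        "-sample_fmt", "s16",     # 16-bit samples
        "-t", str(seconds),       # keep at most the first `seconds` seconds
        dst,
    ]


def normalize(src, dst):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_normalize_command(src, dst), check=True)
```

For example, `normalize("en.mp3", "en.flac")` would convert one of the demo downloads, provided ffmpeg is on the PATH.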
Next, filter banks are extracted from the samples. Mean and variance normalization is applied. Then the data is scaled with the Min/Max scaler.
Finally, the preprocessed data is passed to the convolutional neural network. Note the AveragePooling2D layer, which improved performance. This strategy is called global average pooling. It effectively forces the preceding layers to produce confidence maps.
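The two normalization steps can be illustrated with a dependency-free sketch. It assumes the filter-bank features arrive as a list of frames, each a list of channel energies, and it scales every value with a single global minimum and maximum computed from the sample itself. The project may instead fit its scaler per feature or on the whole training set, so treat the details as assumptions.

```python
from statistics import fmean, pstdev


def mean_variance_normalize(frames, eps=1e-8):
    """Give every filter-bank channel zero mean and unit variance over time."""
    columns = list(zip(*frames))  # one tuple per channel
    means = [fmean(c) for c in columns]
    stds = [pstdev(c) for c in columns]
    return [
        [(x - m) / (s + eps) for x, m, s in zip(frame, means, stds)]
        for frame in frames
    ]


def min_max_scale(frames, eps=1e-8):
    """Rescale all values linearly into the range [0, 1]."""
    flat = [x for frame in frames for x in frame]
    lo, hi = min(flat), max(flat)
    return [[(x - lo) / (hi - lo + eps) for x in frame] for frame in frames]
```

Applying `min_max_scale(mean_variance_normalize(frames))` yields inputs on a fixed scale regardless of recording loudness.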
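To make the idea of global average pooling concrete, here is a small sketch in plain Python. It assumes the last convolutional layer emits one 2-D feature map per class, which is the usual setup for this technique. The source does not spell out the layer shapes, so the example is illustrative only.

```python
def global_average_pooling(feature_maps):
    """Collapse each 2-D feature map (rows x cols) to its mean value.

    `feature_maps` is a list of maps, one per channel; the result is one
    number per channel.
    """
    pooled = []
    for fmap in feature_maps:
        values = [v for row in fmap for v in row]
        pooled.append(sum(values) / len(values))
    return pooled
```

Since each map is reduced to a single score, a map can only raise its class score by responding strongly across the input, which is why the maps end up behaving like per-class confidence maps.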
The output is multiclass.
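A multiclass output is commonly turned into a label and a confidence with a softmax followed by an argmax. The sketch below assumes that convention and an `en`, `de`, `es` label order; neither is confirmed by the source.

```python
import math

LANGUAGES = ("en", "de", "es")  # assumed label order, for illustration only


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores):
    """Return the most likely language and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LANGUAGES[best], probs[best]
```

The returned probability plays the role of the confidence figures quoted in the Demo section.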
Performance
The F1 score on the test set (out-of-sample) is 97%. The network also generalizes well and scores highly on real-life content, for example podcasts or TV news.
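For reference, a multiclass F1 score can be computed as below. The source does not say which averaging was used, so this sketch assumes the macro average (the unweighted mean of per-class F1 scores). The labels and predictions in the usage note are made up.

```python
def f1_for_class(y_true, y_pred, label):
    """F1 = harmonic mean of precision and recall for a single class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, l) for l in labels) / len(labels)
```

For example, `macro_f1(["en", "de", "es"], ["en", "de", "es"])` is 1.0, and every misclassification lowers the score for both the true and the predicted class.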
Sound effects or languages other than English, German or Spanish may be badly classified. If you want to work with noisy audio consider filtering noise out beforehand.
Demo
Prerequisites
- docker is installed (tested with 18.03.0)
Steps
-
Create a temporary directory and change the current directory:
$ mkdir examples && cd $_
-
Download samples:
NOTE: An audio file should contain speech and silence only. For example podcasts, interviews or audiobooks are a good fit. Sound effects or languages other than English, German or Spanish may be badly classified.
-
English (confidence 85.36%):
$ wget "https://javascriptair.podbean.com/mf/player-preload/nkdkps/048_JavaScript_Air_-_JavaScript_and_the_Web_Platform_The_Grand_Finale_.mp3" -O en.mp3
-
German (confidence 85.53%):
$ wget "http://mp3-download.ard.de/radio/radiofeature/auf-die-fresse-xa9c.l.mp3" -O de.mp3
-
Spanish (confidence 86.96%):
$ wget "http://mvod.lvlt.rtve.es/resources/TE_SCINCOC/mp3/2/8/1526585716282.mp3" -O es.mp3
-
Build the docker image:
$ docker build -t sli --rm https://github.com/tomasz-oponowicz/spoken_language_identification.git
-
Mount the examples directory and classify an audio file, for example:
$ docker run --rm -it -v $(pwd):/data sli /data/en.mp3
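To classify all three downloaded samples in one go, the same docker invocation can be scripted. The sketch below is a convenience wrapper, not part of the project. The helper names are hypothetical, and the `-it` flags are dropped on the assumption that the script runs without an interactive terminal.

```python
import os
import subprocess


def build_classify_command(directory, filename, image="sli"):
    """Return the docker argument list that classifies one file in `directory`."""
    return [
        "docker", "run", "--rm",
        "-v", f"{os.path.abspath(directory)}:/data",  # mount the samples
        image,
        f"/data/{filename}",
    ]


def classify_all(directory, filenames=("en.mp3", "de.mp3", "es.mp3")):
    """Run the classifier on each file; its report goes to standard output."""
    for name in filenames:
        subprocess.run(build_classify_command(directory, name), check=True)
```

Calling `classify_all(".")` from inside the examples directory would process the English, German and Spanish downloads in turn.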
Train
Prerequisites
- ffmpeg is installed (tested with 3.4.2)
- sox is installed (tested with 14.4.2)
- docker is installed (tested with 18.03.0)
Steps
-
Clone the repository:
$ git clone git@github.com:tomasz-oponowicz/spoken_language_identification.git
-
Go to the newly created directory:
$ cd spoken_language_identification
-
Generate samples:
-
Fetch the spoken_language_dataset dataset:
$ git submodule update --init --recursive
-
Go to the dataset directory:
$ cd spoken_language_dataset
-
Generate samples:
$ make build
-
Fix file permission of newly generated samples:
$ make fix_permissions
-
Return to the spoken_language_identification directory:
$ cd ..
-
Install dependencies:
$ pip install -r requirements.txt
...the tensorflow package is installed by default (i.e. CPU support only). To speed up training, install the tensorflow-gpu package instead (i.e. GPU support). More information at Installing TensorFlow.
-
Generate features from samples:
$ python features.py
-
Normalize features and build folds:
$ python folds.py
-
Train the model:
$ python model.py
...the new model is stored at model.h5.
Release history
- 2018-07-06 / v1.0 / Initial version