source link: https://foundation.mozilla.org/en/blog/mozilla-common-voice-adds-16-new-languages-and-4600-new-hours-of-speech/

Mozilla Common Voice Adds 16 New Languages and 4,600 New Hours of Speech

By Mozilla | Aug. 2, 2021

The Mozilla Common Voice initiative has released a new, expanded data set featuring 16 new languages — like Basaa and Kazakh — and 4,622 new hours of speech.

Mozilla Common Voice is an open-source initiative to make voice technology more inclusive. Contributors donate speech data to a public dataset, which anyone can then use to train voice-enabled technology.

Says Hillary Juma, Common Voice Community Manager: “Internet access is increasingly mediated through speech: Voice assistants and smart speakers give us directions, search for information, connect us to friends, [are] used in assistive technology and much more. Yet this technology doesn’t work for millions of people. For example, neither Amazon’s Alexa, Apple’s Siri, nor Google Home support a single native African language.”

Hillary continues: “By giving individuals the ability to share their speech, we can help ensure all communities have access to voice technology and the opportunity it unlocks.”

In recent months, Mozilla has also announced three Common Voice fellows, a $3.4 million investment to fuel work in East Africa, and a partnership with NVIDIA.

The latest numbers

-- This latest release introduces 16 new languages to the Common Voice data set: Basaa, Slovak, Northern Kurdish, Bulgarian, Kazakh, Bashkir, Galician, Uyghur, Armenian, Belarusian, Urdu, Guarani, Serbian, Uzbek, Azerbaijani, and Hausa.

-- The top five languages by total hours are English (2,630 hours), Kinyarwanda (2,260), German (1,040), Catalan (920), and Esperanto (840).

-- The languages that have grown the most by percentage are Thai (almost 20x growth, from 12 hours to 250 hours), Luganda (9x growth, from 8 hours to 80 hours), Tamil (more than 8x growth, from 24 hours to 220 hours), and Esperanto (more than 7x growth, from 100 hours to 840 hours).
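The growth multiples above can be checked directly: a growth factor is just the later total divided by the earlier one, and ranking by percentage growth is the same as ranking by that factor. The Python sketch below uses the rounded hour figures exactly as listed; the names `GROWTH`, `growth_factor`, and `rank_by_growth` are illustrative and not part of any Common Voice tool. Because the inputs are rounded, the computed ratios (for example, about 20.8x for Thai) come out somewhat higher than the quoted multiples, which are presumably based on unrounded totals.

```python
# Rounded hour figures from the release summary: (hours before, hours after).
# Assumption: these rounded values stand in for the unpublished exact totals.
GROWTH = {
    "Thai": (12, 250),
    "Luganda": (8, 80),
    "Tamil": (24, 220),
    "Esperanto": (100, 840),
}


def growth_factor(before: float, after: float) -> float:
    """Return how many times larger `after` is than `before`."""
    if before <= 0:
        raise ValueError("baseline hours must be positive")
    return after / before


def rank_by_growth(data: dict[str, tuple[float, float]]) -> list[tuple[str, float]]:
    """Sort languages by growth factor, largest first."""
    return sorted(
        ((lang, growth_factor(before, after)) for lang, (before, after) in data.items()),
        key=lambda item: item[1],
        reverse=True,
    )


if __name__ == "__main__":
    for lang, factor in rank_by_growth(GROWTH):
        # Percentage increase is (factor - 1) * 100.
        print(f"{lang}: {factor:.1f}x ({(factor - 1) * 100:.0f}% increase)")
```

With these inputs the ranking is Thai, Luganda, Tamil, Esperanto, matching the order of the list above.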