
Largest Text-To-Speech AI Model Yet Shows 'Emergent Abilities' - Slashdot

 3 months ago
source link: https://slashdot.org/story/24/02/15/0117248/largest-text-to-speech-ai-model-yet-shows-emergent-abilities


Devin Coldewey reports via TechCrunch: Researchers at Amazon have trained the largest ever text-to-speech model yet, which they claim exhibits "emergent" qualities improving its ability to speak even complex sentences naturally. The breakthrough could be what the technology needs to escape the uncanny valley. These models were always going to grow and improve, but the researchers specifically hoped to see the kind of leap in ability that we observed once language models got past a certain size. For reasons unknown to us, once LLMs grow past a certain point, they start being way more robust and versatile, able to perform tasks they weren't trained to. That is not to say they are gaining sentience or anything, just that past a certain point their performance on certain conversational AI tasks hockey sticks. The team at Amazon AGI -- no secret what they're aiming at -- thought the same might happen as text-to-speech models grew as well, and their research suggests this is in fact the case.

The new model is called Big Adaptive Streamable TTS with Emergent abilities, which they have contorted into the abbreviation BASE TTS. The largest version of the model uses 100,000 hours of public domain speech, 90% of which is in English, the remainder in German, Dutch and Spanish. At 980 million parameters, BASE-large appears to be the biggest model in this category. They also trained 400M- and 150M-parameter models based on 10,000 and 1,000 hours of audio respectively, for comparison -- the idea being, if one of these models shows emergent behaviors but another doesn't, you have a range for where those behaviors begin to emerge. As it turns out, the medium-sized model showed the jump in capability the team was looking for, not necessarily in ordinary speech quality (it is reviewed better but only by a couple points) but in the set of emergent abilities they observed and measured. Here are examples of tricky text mentioned in the paper:

- Compound nouns: The Beckhams decided to rent a charming stone-built quaint countryside holiday cottage.
- Emotions: "Oh my gosh! Are we really going to the Maldives? That's unbelievable!" Jennie squealed, bouncing on her toes with uncontained glee.
- Foreign words: Mr. Henry, renowned for his mise en place, orchestrated a seven-course meal, each dish a piece de resistance.
- Paralinguistics (i.e. readable non-words): "Shh, Lucy, shhh, we mustn't wake your baby brother," Tom whispered, as they tiptoed past the nursery.
- Punctuations: She received an odd text from her brother: 'Emergency @ home; call ASAP! Mom & Dad are worried... #familymatters.'
- Questions: But the Brexit question remains: After all the trials and tribulations, will the ministers find the answers in time?
- Syntactic complexities: The movie that De Moya who was recently awarded the lifetime achievement award starred in 2022 was a box-office hit, despite the mixed reviews.

You can read more examples of these difficult texts being spoken naturally here.
