Highlights:

  • Nvidia asserts that linguistic inclusion for voice AI has numerous data health advantages, including aiding AI models in comprehending speaker variation and noise characteristics.
  • Nvidia wants to incorporate recent advancements in AST and next-generation voice AI into use cases for the real-time metaverse.

At the Speech AI Summit, Nvidia recently unveiled its new speech artificial intelligence (AI) ecosystem, created in collaboration with Mozilla Common Voice. The ecosystem focuses on developing open-source pretrained models and crowdsourced multilingual speech corpora. The goal of Nvidia and Mozilla Common Voice is to accelerate the development of automatic speech recognition systems that work well for speakers of every language in the world.

Nvidia found that popular voice assistants such as Amazon Alexa and Google Assistant support less than one percent of the world's spoken languages. To address this, the company aims to improve linguistic inclusion in speech AI and make speech data more accessible for under-resourced languages.

Nvidia now joins Meta and Google, both of which recently unveiled speech AI models intended to help people who speak different languages communicate with one another. Translation Hub, Google's AI-powered document translation service, can translate large volumes of documents into numerous languages. The tech giant also revealed that it is building a universal speech translator trained on more than 400 languages, claiming it is the "largest language model coverage seen in a speech model recently."

Likewise, Meta AI’s universal speech translator (UST) project contributes to the development of AI systems that allow for real-time translation from speech to speech in any language, including those that are spoken but not frequently written.

A system for users of different languages

Nvidia asserts that linguistic inclusion for voice AI has numerous data health advantages, including helping AI models understand speaker variation and noise characteristics. The new speech AI ecosystem lets developers build, maintain, and improve speech AI models and datasets for linguistic inclusion, usability, and experience. Users can train their models on Mozilla Common Voice datasets and then offer those pretrained models as high-quality automatic speech recognition architectures, which other companies and individuals around the world can adapt to build their own speech AI applications.
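For context, the sketch below shows roughly what that kind of workflow can look like with widely used open-source tooling; it is illustrative rather than Nvidia's own pipeline, and it assumes the Common Voice mirror hosted on the Hugging Face Hub (which is gated behind its dataset terms) plus an off-the-shelf pretrained ASR checkpoint, both named here only as examples.

    # Illustrative sketch: run a pretrained ASR model over Mozilla Common Voice clips.
    # Assumes the Hugging Face "datasets" and "transformers" packages, acceptance of
    # the Common Voice dataset terms, and a logged-in Hugging Face token.
    from datasets import Audio, load_dataset
    from transformers import pipeline

    # Stream a few validated English clips instead of downloading the full corpus.
    cv = load_dataset("mozilla-foundation/common_voice_11_0", "en",
                      split="validation", streaming=True)
    cv = cv.cast_column("audio", Audio(sampling_rate=16_000))

    # Any pretrained ASR checkpoint can stand in here; this English model is an example.
    asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")

    for sample in cv.take(3):
        prediction = asr(sample["audio"]["array"])
        print("reference :", sample["sentence"])
        print("hypothesis:", prediction["text"])

In practice, a pretrained checkpoint like this would then be fine-tuned or adapted on additional Common Voice data for the languages and accents a given application needs.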

Caroline de Brito Gottlieb, product manager at Nvidia, said, “Demographic diversity is key to capturing language diversity. Several vital factors impact speech variation, such as underserved dialects, sociolects, pidgins, and accents. Through this partnership, we aim to create a dataset ecosystem that helps communities build speech datasets and models for any language or context.”

Currently, the Mozilla Common Voice platform supports 100 languages, with 24,000 hours of speech data from 500,000 contributors worldwide. The most recent edition of the Common Voice dataset also adds more speech data from female speakers and six new languages: Tigre, Taiwanese (Minnan), Meadow Mari, Bengali, Toki Pona, and Cantonese.

Using the Mozilla Common Voice platform, users can donate audio by recording sentences as short voice clips, which Mozilla validates after submission to ensure dataset quality.

Siddharth Sharma, head of product marketing, AI and deep learning at Nvidia, said, “The speech AI ecosystem extensively focuses on not only the diversity of languages but also on accents and noise profiles that different language speakers across the globe have. This has been our unique focus at Nvidia, and we created a solution that can be customized for every aspect of the speech AI model pipeline.”

Current speech AI implementations from Nvidia

The company is building speech AI for a range of applications, including text-to-speech (TTS), automatic speech recognition (ASR), and automatic speech translation (AST). Nvidia Riva, a component of the Nvidia AI platform, offers GPU-optimized workflows for designing and deploying fully customizable, real-time AI pipelines for applications such as contact center agent assists, virtual assistants, digital avatars, brand voices, and video conferencing transcription. Applications built with Riva can be deployed in any cloud, in any data center, at the edge, or on embedded hardware.
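As a rough illustration, calling a deployed Riva ASR service from Python can look like the sketch below. It assumes a Riva server already running at localhost:50051 and the public nvidia-riva-client package; exact class and field names may differ between Riva releases, and the audio file name is a placeholder.

    # Minimal sketch: offline transcription against an already-deployed Riva server.
    # Assumes the nvidia-riva-client package; API details may vary by Riva version.
    import riva.client

    # Connect to a Riva server assumed to be running locally on its default gRPC port.
    auth = riva.client.Auth(uri="localhost:50051")
    asr_service = riva.client.ASRService(auth)

    # Basic recognition settings; Riva ships separate models per language and accent.
    config = riva.client.RecognitionConfig(
        language_code="en-US",
        max_alternatives=1,
        enable_automatic_punctuation=True,
    )

    # "meeting_snippet.wav" is a placeholder file name for illustration only.
    with open("meeting_snippet.wav", "rb") as f:
        audio_bytes = f.read()

    response = asr_service.offline_recognize(audio_bytes, config)
    for result in response.results:
        print(result.alternatives[0].transcript)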

The Singapore government’s transportation technology partner, NCS, adapted the Riva FastPitch model from Nvidia and created its text-to-speech engine for English-Singapore utilizing the voice data of local speakers. A recently designed app by NCS, Breeze, is an app for local drivers that translates languages including Mandarin, Hokkien, Malay, and Tamil into Singaporean English with the same expressiveness and clarity as a native Singaporean would speak them.

T-Mobile, a multinational mobile communications provider, also collaborated with Nvidia to create AI-based software for its customer experience centers that transcribes customer conversations in real time and surfaces recommendations to thousands of front-line employees. To develop the software, T-Mobile used Riva and Nvidia NeMo, an open-source framework for state-of-the-art conversational AI models. With these Nvidia tools, T-Mobile engineers were able to fine-tune ASR models on the company’s own datasets and accurately interpret customer jargon in noisy environments.
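The NeMo side of that kind of workflow can be sketched roughly as follows, assuming the open-source nemo_toolkit package and a publicly available pretrained checkpoint; the checkpoint name and audio file paths are illustrative placeholders, not T-Mobile’s actual configuration.

    # Rough sketch: load a pretrained NeMo ASR checkpoint and transcribe domain audio,
    # the usual starting point before fine-tuning on in-house data.
    import nemo.collections.asr as nemo_asr

    # Load a publicly available pretrained English ASR checkpoint.
    model = nemo_asr.models.ASRModel.from_pretrained("stt_en_conformer_ctc_small")

    # Transcribe a few recordings to gauge baseline accuracy on domain-specific audio;
    # the file names below are placeholders.
    audio_files = ["support_call_01.wav", "support_call_02.wav"]
    transcripts = model.transcribe(audio_files)
    for path, text in zip(audio_files, transcripts):
        print(path, "->", text)

    # Fine-tuning on in-house data would then follow NeMo's standard recipe: point the
    # model's training config at manifests of audio/text pairs and resume training.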

Nvidia’s future focus for voice AI

According to Sharma, Nvidia wants to incorporate recent advancements in AST and next-generation voice AI into use cases for the real-time metaverse.

He said, “Today, we’re limited to only offering slow translation from one language to the other, and those translations have to go through text. But the future is where you can have people in the metaverse across so many different languages all being able to have instant translation with each other.”

He added, “The next step is developing systems that will enable fluid interactions with people across the globe through speech recognition for all languages and real-time text-to-speech.”