Since its rebrand in 2021, Meta has positioned itself as a pioneer for the digital age. Zuckerberg’s recent announcement that Meta is working on speech-to-speech AI translation is an exciting development, but it’s best to be realistic about what can be achieved and grounded about timelines. Here, we take a look at the current state of technology-based translation - what works well, what’s still forthcoming, and where care is needed to produce quality results.
Meta’s translation ambitions
AI breakthroughs are central to Zuckerberg’s vision of the Metaverse and how it will operate. Direct speech-to-speech translation could open up a world of connection, and would also offer innovative possibilities for planned products like Meta’s AR glasses.
Meta has announced two core translation projects: No Language Left Behind, its push to translate mid and long-tail languages traditionally neglected in the digital world; and a Universal Speech Translator, an effort to use AI advances to simplify automatic translation.
Meta claims their machine translation advances will ‘fundamentally change the way people in the world connect and share ideas’. Zuckerberg has lofty ambitions for the universal translator, in particular, calling it ‘a superpower people have dreamed of forever’ in his live stream.
In the flurry of excitement, it’s easy to overlook the qualifications. Zuckerberg’s announcement of this superpower AI ‘within our lifetimes’ is a pretty ambiguous timeline. The fact is, this isn’t an imminent launch for Meta: both projects are research experiments, with no public roadmap to production.
Meta’s biggest bet is on the Universal Speech Translator, which they hope will enable near-instantaneous communication across all languages. Direct speech-to-speech translation is possible today, but even between just two languages it can at best achieve an accuracy level of 85% within five seconds of receiving input.
No Language Left Behind is more straightforward but still hampered by the limits of machine translation. Machine learning needs enough data sets to work from to begin predicting and translating a new language and this becomes tricky when you reach languages with fewer records and speech examples. Meta plans to build on their existing automatic data sets to overcome this, but by Zuckerberg’s own admission, they’re far from reaching everyone. He states, ‘Five years ago, we could translate across a dozen languages. Three years ago, we were up to 30 languages and this year, we are now aiming for hundreds of languages.’ Clearly, many are still being left behind.
"I think people need to be grounded and realistic about what state this is in," Jesse Shemen, Papercup co-founder and CEO, explains. "It's not about to be deployed in a product that 8 billion people will start using tomorrow."
The challenges with machine translation
Today, entirely automatic translation is often unsuitable for professional purposes. Machine translation has come a long way: synthetic voices can now be produced that sound extremely lifelike, and software like Papercup’s AI can even capture the nuances of an original speaker’s vocal traits. However, without human review at the post-editing stage, mistakes can slip through that create a jarring experience for audiences.
Machine translation frequently misses nuance, produces overly literal translations, and can’t account for cultural differences. Algorithms are great for scaling content output and increasing accessibility, but there’s a danger they’ll introduce or amplify bias as well – like when Google Translate’s algorithm wrongly rendered all historians as men and all nurses as women.
Meta alludes to these issues in their launch post, citing their ‘need to find ways to […] preserve cultural sensitivities’ and ‘not create or intensify biases’. Yet they admit finding ways to overcome this will be difficult, stating, ‘evaluating a large-scale, multilingual model’s performance is […] time-consuming, resource intensive, and often impractical.’
These are issues Meta will need to consider carefully if audiences are to feel safe using its technology: it was only in 2017 that Facebook’s own automatic translation led to the arrest of a Palestinian man after his post reading ‘good morning’ was mistranslated as ‘attack them’.
How human-in-the-loop ensures quality translation at scale
Meta has made a play for a future translation landscape where scalability, efficiency, and accessibility are easy and expected. To achieve these goals without compromising on quality today, machine translation needs to be combined with the input of expert translators. As Jesse explains, by using human-in-the-loop models like Papercup’s, "You reap the vast speed benefits of machine learning while achieving the last mile of quality that people expect with a human layer."
Hybrid solutions like Papercup use professional translators to audit machine translations and customize pronunciation. Human translators can also apply local phrases and ensure complex brand and technical terms are conveyed in the way intended. This gives companies the fast and scalable translation that they require while increasing audience engagement.
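Papercup hasn’t published the internals of its pipeline, but the human-in-the-loop workflow described above can be sketched in outline. In this illustrative Python snippet, every name, the `Segment` structure, and the confidence threshold are assumptions for the sake of the example, not a description of any real system: machine output the model is confident about is auto-approved, while low-confidence segments are queued for a professional translator to audit and correct.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    source: str            # original-language text
    machine_output: str    # raw machine translation
    confidence: float      # model's own confidence estimate, 0..1
    approved: bool = False

# Hypothetical cut-off: segments the model is unsure about go to a human.
REVIEW_THRESHOLD = 0.9

def route(segments):
    """Split machine output into auto-approved and human-review queues."""
    auto, review = [], []
    for seg in segments:
        if seg.confidence >= REVIEW_THRESHOLD:
            seg.approved = True
            auto.append(seg)
        else:
            review.append(seg)
    return auto, review

def human_review(seg, corrected_text):
    """A translator audits the segment, fixing phrasing and terminology."""
    seg.machine_output = corrected_text
    seg.approved = True
    return seg
```

The design choice this illustrates is the "last mile" Jesse describes: the machine does the bulk of the work at speed, and human effort is spent only where the model itself signals uncertainty or where brand and technical terms need checking.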
It’s a strategy Meta itself appears to be leaning into: despite the utopia Zuckerberg gestures toward, the team admits their translation goals ‘will require not just expertise in AI but also the sustained input of numerous experts, researchers, and individuals from around the world.’
Meta’s ambitious plans are exactly what’s needed to keep the digital world innovating and progressing. However, it’s important to understand what’s possible now and what will produce consistent, high-quality results. In the short to medium term, a hybrid model provides a fast, efficient way to scale content to a global audience. As Papercup’s Jesse Shemen explains, it’s ‘humans and machines working in concert’ that will ‘unlock billions of hours of content and speech stuck in a single language in the foreseeable future.’