
Building Inclusive Speech Technology with Diverse Data

Inclusive speech recognition technology, trained on diverse, accented speech data, is key to staying relevant in the voice recognition market.

Three New Yorkers walk into a bar: one grew up in the Midwest in a Mexican family, another is a native Spanish speaker from Colombia, and the last grew up in New York speaking Castilian Spanish at home until high school. There’s no punchline here: they simply sit down and have a conversation in English.

As they speak, we notice major differences in how each person talks. Geography, socioeconomic status, and ethnicity, among other factors, all cause variations in pronunciation, vocabulary, and other speech patterns.

Given those differences, what happens when each of them goes home to their voice assistant and makes a request in English? How well is each of their accents understood? And what are the consequences for those who aren’t understood? 

Those are essential questions for data scientists, developers, and other AI speech professionals as they work to create speech recognition technology that is inclusive, diverse, and free from the biases caused by an accent gap.

Bridging the accent gap 

An accent gap is a type of algorithmic bias that occurs in voice recognition models trained without diverse, representative data: for example, models trained exclusively on English speech sourced from a single geographic and cultural background. Such a gap frustrates users who fall outside a narrow definition of an English speaker (predominantly white, upper-class, male speakers), resulting in a product that doesn’t meet the needs of a diverse market.

An accent gap can affect speech technology of all kinds. For example, one Washington Post study found that Amazon’s Alexa was 30% less likely to understand non-native English accents. In the same study, voice assistants from Google and other major competitors produced similar results.  

This means that to compete long-term in the voice recognition market and to create inclusive speech products, your model must understand accented speech. And when we say “models,” we don’t just mean voice assistants. Any model or device in the Internet of Things (IoT), where voice activation and recognition are often part of the core offering, should be trained on diverse, representative, and bias-aware data.

By releasing a free Spanish-accented English speech dataset, Defined.ai aims to help AI professionals test whether their models exhibit an accent gap for one specific group: non-native English speakers in the US whose native language is Spanish.
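One common way to check for an accent gap, assuming you have reference transcripts and your model's output for speakers in each group, is to compare word error rate (WER) across groups. The Python sketch below is a minimal illustration: the function names, group labels, and sample sentences are hypothetical placeholders, not the format of Defined.ai's dataset, and a real evaluation would also normalize punctuation and numbers before scoring.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def word_errors(reference: str, hypothesis: str) -> tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] = edit distance between the reference words processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: a reference word was missed
                curr[j - 1] + 1,     # insertion: an extra word in the output
                prev[j - 1] + cost,  # substitution (or exact match when cost is 0)
            )
        prev = curr
    return prev[-1], len(ref)


def wer_by_group(samples: Iterable[tuple[str, str, str]]) -> dict[str, float]:
    """Pooled WER per group: total word errors / total reference words."""
    errors: dict[str, int] = defaultdict(int)
    words: dict[str, int] = defaultdict(int)
    for group, reference, hypothesis in samples:
        e, n = word_errors(reference, hypothesis)
        errors[group] += e
        words[group] += n
    return {g: errors[g] / words[g] for g in words if words[g] > 0}


def accent_gap(rates: dict[str, float], baseline: str, target: str) -> float:
    """Absolute WER difference between a target group and a baseline group."""
    return rates[target] - rates[baseline]


if __name__ == "__main__":
    # Hypothetical (group, reference transcript, model output) triples.
    samples = [
        ("baseline", "turn on the kitchen lights", "turn on the kitchen lights"),
        ("baseline", "set a timer for ten minutes", "set a timer for ten minutes"),
        ("spanish_accented", "turn on the kitchen lights", "turn on the chicken lights"),
        ("spanish_accented", "set a timer for ten minutes", "set a time for ten minutes"),
    ]
    rates = wer_by_group(samples)
    for group, rate in rates.items():
        print(f"{group}: WER {rate:.1%}")
    print(f"accent gap: {accent_gap(rates, 'baseline', 'spanish_accented'):.1%}")
```

With the made-up samples above, the baseline group scores 0.0% WER and the accented group 18.2% (2 errors in 11 reference words). Pooling errors across utterances, rather than averaging per-utterance WER, keeps very short commands from dominating the score; a gap that persists on comparable prompts and larger samples is the kind of signal an accent gap describes.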

Spanish-accented data 

Spanish is the most spoken non-English language in the United States, with more than 37 million speakers. This number has grown by 233% since 1980, mostly due to immigration and organic population growth in certain regions of the US.

Spanish itself has many variations: there are approximately 577 million Spanish speakers in the world, spread across 21 countries, each with its own distinct accents.

As a result, closing the accent gap for Spanish accents is complex and nuanced. Models must be trained on English spoken by native Spanish speakers from a wide variety of Spanish-speaking countries.

The importance of inclusive speech technology

The importance of building inclusive, representative speech technology cannot be overstated. Beyond appealing to a broader customer base, putting inherently biased technology into everyday use has far-reaching consequences. Imagine having to change the very way you speak just to be understood by your home voice assistant or on a customer service call.

These biases already exist in the real world, outside of technology, and it is our responsibility to create inclusive AI that fights these biases rather than reinforcing them.

Free speech data from Defined.ai 

To continue the fight against the accent gap, Defined.ai is releasing a free speech dataset made up of recordings from Spanish-accented English speakers from all around the world.

Language is constantly evolving, adapting to its environment and its users. As a result, voice assistants and interactive voice response (IVR) systems must evolve as well to stay inclusive, relevant, and competitive.

Let’s build AI that drops the outdated model of what American English is supposed to sound like and instead focuses on what it actually sounds like.

Claim your free dataset by registering on Defined.ai’s marketplace.
