Accelerating Innovation: Voice Assistant Development

18.04.2023

 

As voice technology continues to reshape consumer electronics, leading companies are investing in voice assistant development to stay ahead of the curve. One global leader in audio equipment partnered with us to develop an automatic speech recognition (ASR) model for integration into their products. Our end-to-end pipeline and diverse human-in-the-loop community collected and processed high-quality data, enabling the client to train their ASR system to perform well across a range of scenarios, accents, and speech nuances.

With this data, our client is now positioned to deliver voice-enabled products that understand and respond to user requests accurately. Are you ready to transform your products and customer experiences with advanced voice technology?

 

The customer

Our client is a leading global provider of audio equipment.

 

The context

Given the rise of voice technology, a leading global audio equipment provider wanted to develop an automatic speech recognition (ASR) model to test in their products. Our client needed high-quality data to train their ASR system on everything from simple audio system commands like “repeat” to more complex assistant requests like “find me a restaurant”, which could be spoken in a quiet home environment or in a moving vehicle with background noise. The system would need to understand variations of the same request – such as “make it louder” or “turn it up” – as well as accents and other nuances in people’s speech.

 

 

The solution

Our enterprise portal served as an end-to-end pipeline for voice assistant development, collecting everything needed to build a voice assistant from the ground up. The project took advantage of our range of purpose-built workflows in speech and NLP, utilizing the skills of our diverse human-in-the-loop community.


Step 1 – Speech collection

The client’s specific requirements for voice assistant development were converted into an online task on our crowd’s platform. We selected 230 qualified people from the crowd to record phrases in their own words related to specific scenarios. For example, one scenario was asking the assistant to play music the speaker would like to hear. This step would not only help the virtual assistant understand customers, but would also provide data for training the assistant to speak to users at a later stage of development.

 

Step 2 – Transcription & validation

Other community members then transcribed the speech data collected in Step 1 into text. Since quality is maximized when one community member validates another’s data, we recruited a new group to validate the transcriptions. Data that did not pass validation was re-collected and revalidated at no additional charge to the customer.
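
The routing of validated and rejected transcriptions can be sketched as follows. This is a minimal illustration rather than the portal's actual implementation: the `Transcription` record, its `passed_validation` flag, and the `route` function are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Transcription:
    clip_id: str
    text: str
    # Verdict from a second community member who did not write the transcription.
    passed_validation: bool


def route(items):
    """Split a batch into accepted items and clip IDs that need re-collection."""
    accepted = [t for t in items if t.passed_validation]
    recollect = [t.clip_id for t in items if not t.passed_validation]
    return accepted, recollect


# Hypothetical batch using phrases quoted in this case study.
batch = [
    Transcription("clip-001", "turn it up", True),
    Transcription("clip-002", "make it louder", False),
]
accepted, recollect = route(batch)
print(len(accepted), recollect)
```

Clips listed in `recollect` would go back to the speech collection task and then through validation again.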

 

Step 3 – Semantic annotation

The validated transcriptions were compiled with a set of additional transcriptions from the client.

A further group of 75 people annotated the sentences, identifying the speakers’ intent. Examples of intent included adjusting volume, activating noise cancellation, finding a restaurant, etc.

Annotators also categorized phrases related to specific artists, music genres, song titles, and other entertainment categories based on domains defined by the client. This step would ensure that the ASR model would be able to respond to user requests accordingly.

 

Step 4 – Entity tagging

A subset of the data that was semantically annotated in the previous step was then transferred to a final entity tagging job. Only the datapoints that contained entities went through this step, rather than the entire data set, saving both time and money.

A group of 90 community members tagged the remaining sentences with the entities mentioned, for example “Lady Gaga” tagged as “Singer”.

A predefined inter-annotator agreement threshold was used to ensure consistency.
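
The case study does not say which agreement measure was used. As one common option, the sketch below computes Cohen's kappa for two annotators labelling the same items. The metric choice, the `AGREEMENT_THRESHOLD` value, and the sample labels are all assumptions made for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Fraction of items on which the two annotators chose the same tag.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's tag frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[tag] * counts_b[tag] for tag in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical tag throughout
    return (observed - expected) / (1 - expected)


AGREEMENT_THRESHOLD = 0.7  # hypothetical value; the real threshold is not stated

# Hypothetical entity tags from two annotators for the same six mentions.
annotator_a = ["Singer", "Genre", "Singer", "Song", "Genre", "Singer"]
annotator_b = ["Singer", "Genre", "Song", "Song", "Genre", "Singer"]

kappa = cohen_kappa(annotator_a, annotator_b)
needs_review = kappa < AGREEMENT_THRESHOLD
print(round(kappa, 2), needs_review)  # 0.75 False
```

Batches that fall below the chosen threshold would typically be sent back for another round of tagging or adjudication.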

 

Step 5 – Aggregating & delivering

Once the whole process was completed, we aggregated, vetted, and delivered a unique set of speech data to the client, enabling them to build a voice-enabled product from scratch.

This entire process was completed via our enterprise portal. At every step, our machine learning algorithms monitored our community members’ work, eliminating data deemed inaccurate or invalid to ensure the highest possible output quality.

 

Workflow: Voice Assistant Data Accuracy

 

With ongoing support, our client received the high-quality data needed to train, test, and tune a model for developing voice-enabled products. The delivered data measured over 98% accurate, based on a 1.8% word error rate and an F1 score threshold of 0.8, enabling the client to build a baseline ASR model.
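
Word error rate (WER) is conventionally the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The sketch below shows that calculation. The function name and sample sentences are hypothetical, and reading the 98% figure as 1 minus WER is our interpretation of the numbers above, not a documented formula. The F1 threshold is not modelled here.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of seven reference words.
wer = word_error_rate(
    "find me a restaurant near the station",
    "find me a restaurant near station",
)
print(round(wer, 4))  # 0.1429

# Under the 1 - WER reading, a 1.8% WER corresponds to 98.2% word accuracy.
word_accuracy = 1 - 0.018
```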

Having worked with us for over a year, the client was already familiar with our services and used our complete end-to-end process to combine diverse data workflows, delivering greater quality, efficiency, and ROI.

Enhance your products with advanced voice technology using our ready-to-use ASR datasets. Check our Marketplace today to explore how our comprehensive solutions can elevate your offerings and boost customer satisfaction.