DefinedCrowd Launches DefinedData, an Online Marketplace of AI Datasets Available for On-Demand Purchase


High-quality data is now just a click away 

Seattle, Washington, July 7, 2020 – DefinedCrowd, the leading data provider for Artificial Intelligence, today announced the launch of DefinedData, a new offering that enables customers to bring their AI initiatives to market faster by acquiring pre-collected, annotated, and validated AI training data from an online catalog.

This product launch follows the recent closing of a US$50.5M Series B funding round and the addition of a new investor, Balderton Capital. The funding round enables DefinedCrowd to continue launching new and innovative data solutions for the AI industry.

“Machine learning teams building AI models have always faced one particularly pressing problem, and that is continuous access to highly accurate data. When technology-focused companies want to launch their AI initiatives into the market quickly, they simply don’t have the time to collect and validate the data required to do so,” said Daniela Braga, founder and CEO of DefinedCrowd.

According to Braga, DefinedData aims to solve this problem by providing time-strapped customers with high-quality, pre-collected datasets, already annotated and validated by a global crowd of over 300,000 contributors. Creating datasets of this quality would usually take a machine learning (ML) team anywhere from three to six months. With DefinedData, the same kind of data can be acquired on demand from the catalog.

Customers can browse pre-collected AI datasets in multiple languages, domains, and recording types, and either request samples or request to purchase. Customers can also choose between a one-time purchase and an annual subscription that provides access to all new datasets. By May 2021, the catalog is expected to grow to include over 25,000 hours of speech and natural language data.

“As the appetite for high-quality data continues to grow, the market for training data will become increasingly modularised. Training data repositories and marketplaces will be a key feature of the value chain, allowing teams to both monetize existing data sets as well as source new data time- and cost-effectively. We are incredibly excited to be joining Daniela and her team on their journey as they pave the way in this space,” said Laura Connell, Principal at Balderton Capital.

DefinedData will maintain the commitment to quality for which DefinedCrowd has become known. To ensure the highest levels of accuracy and authenticity, multiple key performance indicators (KPIs) will be used, including Word Error Rate, gender distribution, age distribution, ambient noise levels, nativeness (accuracy of native speakers), and domain accuracy.

“Whether you’re building a prototype or minimum viable product, testing internal models, or benchmarking third-party cognitive services, our continually updated library of datasets will help you quickly achieve your AI goals,” concluded Braga.

To learn more about DefinedData, visit the online catalog.

About DefinedCrowd  

DefinedCrowd offers a platform with multiple data delivery options that leverages machine learning technology and human intelligence to deliver quality-guaranteed training data for AI systems. The platform offers self-service and fully customizable solutions that deliver high-quality, project-specific training data, enabling AI products to reach market faster. Our value proposition is quality, privacy, speed, and scale, covering more than 50 different languages. With strong expertise in speech and natural language processing technologies, we have been serving AI companies and Fortune 500 companies since day one. DefinedCrowd was founded in Seattle and has offices in Lisbon, Porto, and Tokyo.