DefinedCrowd Addresses Data Quality Challenge as Global Pandemic Accelerates Demand for Bias-Free Artificial Intelligence, Boosts Access via NVIDIA NGC


The Seattle-based company is expanding its Data Marketplace, complemented by detailed metadata, broadening dataset access through the NVIDIA NGC catalog, and adding new subscription plans

Seattle, Washington, USA, March 30th, 2021 - To address the rapid increase in demand for high-quality, bias-aware AI training data, DefinedCrowd today announced that it is opening its online data marketplace, DefinedData, to third-party suppliers who want to sell or share AI datasets, and that it is collaborating with NVIDIA to provide dataset samples through the NVIDIA NGC catalog. In addition, the platform now provides AI engineers with unprecedented levels of training data transparency and a range of subscription options, with special discounts for academia.

AI is Top of the Corporate Agenda

The global pandemic has pushed AI to the top of the corporate agenda. According to a study by IDC, the AI market was predicted in 2020 to be worth $300 billion by 2024. As of February 2021, the market is expected to break the $500 billion mark in 2024. A McKinsey survey found that responses to the crisis sped up the adoption of digital technologies by several years, with 61% of high-performance companies increasing their investment in AI. This acceleration in AI development has sharply increased the demand for high-quality datasets.

Avoiding AI Bias Through Data Transparency

As more AI systems are deployed, and at a faster rate, the implications of underlying bias become more pressing. To address this concern, DefinedData’s catalog now offers detailed information on the gender, age, accent, and phonetic distribution of datasets, as well as metadata on the recordings and audio samples. Access the up-to-date catalog here.

Democratizing Data Access via NVIDIA NGC

As a key step in democratizing access to data, DefinedCrowd will provide dataset samples through the NVIDIA NGC catalog, a GPU-optimized hub for AI and HPC containers, pre-trained models and SDKs that simplifies and accelerates end-to-end workflows. Datasets can be used to train models using libraries within the NVIDIA Jarvis application framework; NVIDIA Transfer Learning Toolkit, which enables developers to build production-quality models faster with no coding required; as well as the NVIDIA NeMo platform, a Python toolkit for building, training, and fine-tuning unmatched GPU-accelerated conversational AI models. This collaboration allows researchers and developers to build high-quality, state-of-the-art conversational AI models.

“By working with DefinedCrowd, we’re providing NVIDIA Jarvis and NeMo users with sample datasets to build and accelerate their models, all within the NGC environment,” said Richard Kerris, head of developer relations at NVIDIA.

Affordable Dataset Subscriptions

DefinedCrowd is introducing DefinedData subscriptions, providing access to a constantly expanding and refined catalog of high-quality speech and NLP datasets.

“Companies constantly need to engage a long tail of data in order to grow in new sectors, and data scientists need the raw material in order to address these issues as data science becomes more democratic each day,” said Director of Machine Learning at DefinedCrowd, Dr. Christopher Shulby. “This offering will allow data scientists to keep their models relevant in a continually evolving world.”

To learn more about DefinedData’s subscriptions, follow this link. Academia will have access to special pricing options.

Welcoming Data Partners

DefinedCrowd is encouraging third parties to list and sell their datasets on DefinedData, in an effort to address the increasing demand for AI training data. To ensure world-class quality, all datasets will be subjected to a vetting process before being made available. Express your interest in becoming a vendor on DefinedData here.

“This is an exciting moment. I am proud to see DefinedCrowd becoming the GitHub of AI,” said Founder and CEO, Dr. Daniela Braga. “Transparent, traceable and bias-aware data is crucial to build ethical AI technologies.”

About DefinedCrowd

DefinedCrowd is a trusted AI data partner, offering an overarching infrastructure of solutions focused on providing its clients with a one-stop shop for AI training data. The broad scope of products ranges from off-the-shelf data to customized AI solutions. With Expertise, Reliability, Trust, and Innovation at the core of the business, DefinedCrowd has been serving AI companies and Fortune 500 companies since day one. With offices in Seattle, Lisbon, Porto and Tokyo, DefinedCrowd combines human knowledge and machine learning, creating a natural interaction between people and machines towards a smarter future.


Catarina Peyroteo Salteiro
Director of Global Communication & Brand