Artificial Intelligence systems essentially involve two main ingredients: Code and Data.
The code is the AI model or algorithm, which is trained using the data. Conventional model-centric AI focuses on improving the code to achieve better results on a fixed set of data. AI developers generally treat the training dataset their code learns from as a collection of ground-truth labels, and the AI model is made to fit that labeled training data. This approach therefore generally treats the training data as external to the AI development process.
On the other hand, data-centric AI aims to improve data quality to achieve better outcomes while treating the code as fixed. In other words, while model-centric AI deals with developing or improving the AI model or algorithm, data-centric AI deals with labeling, augmenting, managing and curating data. Data-centric AI may look like mere data pre-processing; however, it emphasizes an iterative AI life cycle consisting of data collection, model training and error analysis.
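The iterative life cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a prescribed method: the "model" is a toy 1-D nearest-centroid classifier that is never modified, and `review` is a hypothetical stand-in for a domain expert who re-checks flagged labels. All function names and the sample data are invented for this example.

```python
def train(data):
    """Fit the fixed 'model': one centroid (mean value) per label."""
    sums, counts = {}, {}
    for x, y in data:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def review(x, label):
    """Hypothetical expert review: here, a simple rule we assume is correct."""
    return "high" if x >= 5 else "low"


def data_centric_loop(data, max_rounds=5):
    """Train, analyze errors, fix the data, and repeat; the model code stays fixed."""
    data = list(data)  # work on a copy so the caller's list is untouched
    for _ in range(max_rounds):
        centroids = train(data)
        # Error analysis: examples where the model disagrees with the label.
        flagged = [i for i, (x, y) in enumerate(data) if predict(centroids, x) != y]
        changed = False
        for i in flagged:
            x, y = data[i]
            corrected = review(x, y)
            if corrected != y:
                data[i] = (x, corrected)
                changed = True
        if not changed:
            break  # no label was corrected, so another round would change nothing
    return data, train(data)


# Assumed sample data: the labels for 3 and 10 are deliberately wrong.
noisy = [(1, "low"), (2, "low"), (3, "high"), (8, "high"), (9, "high"), (10, "low")]
cleaned, centroids = data_centric_loop(noisy)
print(cleaned)
print(centroids)
```

Note that only the data changes between rounds; `train` and `predict` are identical in every iteration, which is the point of the data-centric view.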
In model-centric AI, we spend relatively more time optimizing an AI model, whereas in data-centric AI, we spend more time improving data quality. In model-centric AI, we aim to find the most suitable AI model or optimization technique for a given problem, whereas in data-centric AI we aim to find inconsistencies in the data collected for that problem.
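One simple kind of inconsistency is the same input appearing with different labels. The sketch below shows one possible check for that case, assuming the dataset is a list of (text, label) pairs; the function name `find_label_conflicts`, the light normalization step and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def find_label_conflicts(examples):
    """Return each input that appears with more than one distinct label."""
    labels_by_input = defaultdict(set)
    for text, label in examples:
        # Normalize lightly so casing and stray whitespace do not hide duplicates.
        labels_by_input[text.strip().lower()].add(label)
    return {
        text: sorted(labels)
        for text, labels in labels_by_input.items()
        if len(labels) > 1
    }


# Assumed sample data: the first two rows are the same review with opposite labels.
examples = [
    ("Great product", "positive"),
    ("great product ", "negative"),
    ("Arrived late", "negative"),
]
conflicts = find_label_conflicts(examples)
print(conflicts)  # {'great product': ['negative', 'positive']}
```

A check like this only surfaces candidates; deciding which label is right usually still needs someone who knows the domain.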
Nowadays, model-centric AI tends to optimize ever-larger AI models, which require large-scale datasets and substantial computing resources, whereas data-centric AI may require domain knowledge or experts to find inconsistencies in the data.
Though most data-centric AI ideas already exist as conventional wisdom in the AI community, data-centric AI aims to build a systematic approach and the tools needed to facilitate this process.