Unlearning, or forgetting what it has learned, is an important action that artificial intelligence (AI) must undergo from time to time.
Unlearning is also known as selective amnesia in AI, and it can be needed for all kinds of reasons, including removing bias, correcting inaccuracies, or updating information.
Over time, AI learns from vast and varied datasets and inevitably picks up biases, inaccuracies, and discriminatory patterns. These flaws can be dangerous and can be targeted by malicious entities.
However, the task of unlearning is difficult: a single piece of data can affect many different models, and different machine learning models need different unlearning tools.
Still, unlearning is an important way to improve AI.
What Is AI Unlearning?
Let’s try to understand AI unlearning with the example of an imaginary person, John Smith.
John has been exposed to information about the food habits of the people of a particular region, which leads him to believe that those people have poor food habits.
He has learned from hearsay, people’s secondhand experiences, the media, and the internet, and all of this information has shaped his opinion.
You can say that John’s learning has made his opinion biased, false, and even defamatory.
Now, people with real-world exposure to the food habits of that region find that much of what John believes is untrue and baseless.
When John finally visits the region for an extended period, eats the local food, and experiences the food habits firsthand, he returns with a new perspective. His recent experiences challenge and update his old beliefs; he has unlearned much of what he thought he knew.
In other words, new data has replaced old data.
AI unlearning happens in a similar manner. An AI system can be exposed to incorrect and biased datasets and, over time, amplify that inaccurate knowledge.
At times, AI must be taken through an unlearning process that replaces or updates old datasets with newer, more accurate ones. This is a continuous process that may need to happen regularly.
Circumstances Behind Unlearning AI
The primary purpose is to remove inaccurate and biased output. However, another concern is that AI may leak private data, and that knowledge, too, must be “unlearned”.
Various regulatory authorities have already been asking companies to eliminate data that violates privacy.
In 2018, the data regulator in the UK warned that companies using AI could be subject to the GDPR. The US Federal Trade Commission (FTC) forced Paravision, a facial recognition software company, to delete a collection of photos that it had collected without following protocol, and also to delete the AI models that had been trained on those photos.
Unlearning Is a Complex Proposition
From the perspective of the companies that train AI systems, the circumstances leading to unlearning create a problematic situation.
One, the need to protect privacy drives continual changes to laws such as the GDPR, and companies must adapt their AI systems to each new regulation, which can be costly and time-consuming.
Two, unlearning currently means removing the data from the AI system and retraining the model from scratch. Add to this the effort of removing the data’s influence from every other model that was trained on it.
This means you might face the prospect of repeated, expensive retraining.
It would be far simpler if the contested data could be removed without retraining the AI system at all.
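To see why retraining is the expensive part, here is a minimal sketch of the “remove and retrain from scratch” baseline, assuming a scikit-learn-style classifier; retrain_without and forget_ids are illustrative names, not part of any particular system:

```python
# Minimal sketch of "exact" unlearning: drop the contested records and
# retrain the whole model from scratch.
import numpy as np
from sklearn.linear_model import LogisticRegression

def retrain_without(X, y, record_ids, forget_ids):
    """Remove flagged records, then retrain from scratch (full cost each time)."""
    keep = np.array([rid not in forget_ids for rid in record_ids])
    model = LogisticRegression(max_iter=1000)
    model.fit(X[keep], y[keep])
    return model

# Hypothetical usage on synthetic data:
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] > 0).astype(int)
model = retrain_without(X, y, record_ids=np.arange(1000), forget_ids={3, 17, 256})
```

Every deletion request repeats the entire training run, which is what makes this baseline impractical for large models.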
Can You Forget But Avoid Retraining an AI Model?
According to Aaron Roth, a researcher on AI unlearning at the University of Pennsylvania, the question is: “Can we remove all influence of someone’s data when they ask to delete it, but avoid the full cost of retraining from scratch?” A lot of effort is being put in that direction.
One example is a project by researchers at the universities of Toronto and Wisconsin-Madison, in which they split the training data into multiple shards, trained a smaller model on each shard, and combined the models into a larger ensemble. Because each data point influences only one of the smaller models, unlearning it means retraining just that one model rather than the whole system.
The research paper describes the project as “a framework that expedites the unlearning process by strategically limiting the influence of a data point in the training procedure.
“While our framework is applicable to any learning algorithm, it is designed to achieve the largest improvements for stateful algorithms like stochastic gradient descent for deep neural networks.
“SISA training reduces the computational overhead associated with unlearning, even in the worst-case setting where unlearning requests are made uniformly across the training set.”
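Below is a simplified sketch of that sharding idea, an illustration of the concept rather than the researchers’ actual implementation; ShardedEnsemble and its methods are hypothetical names:

```python
# Simplified sketch of sharded training for cheaper unlearning: one small
# model per disjoint data shard, predictions aggregated by majority vote.
# Forgetting a record retrains only the shard that contains it.
import numpy as np
from sklearn.linear_model import LogisticRegression

class ShardedEnsemble:
    def __init__(self, n_shards=5):
        self.n_shards = n_shards
        self.models = [None] * n_shards
        self.forgotten = set()  # ids of records removed so far

    def _shard_of(self, record_id):
        return record_id % self.n_shards  # fixed record-to-shard assignment

    def fit(self, X, y, record_ids):
        self.X, self.y = X, y
        self.record_ids = np.asarray(record_ids)
        for s in range(self.n_shards):
            self._fit_shard(s)

    def _fit_shard(self, s):
        mask = np.array([self._shard_of(r) == s and r not in self.forgotten
                         for r in self.record_ids])
        self.models[s] = LogisticRegression(max_iter=1000).fit(self.X[mask], self.y[mask])

    def unlearn(self, forget_ids):
        self.forgotten |= set(forget_ids)
        # Only the affected shards pay the retraining cost.
        for s in {self._shard_of(r) for r in forget_ids}:
            self._fit_shard(s)

    def predict(self, X):
        votes = np.stack([m.predict(X) for m in self.models])
        return (votes.mean(axis=0) > 0.5).astype(int)  # majority vote

# Hypothetical usage: forgetting one record retrains 1 shard instead of 5.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] > 0).astype(int)
ens = ShardedEnsemble(n_shards=5)
ens.fit(X, y, record_ids=np.arange(1000))
ens.unlearn({42})
```

The trade-off is that each per-shard model sees less data, which can cost some accuracy in exchange for much cheaper deletions.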
Are There Any Limitations?
This approach has a limitation, as researchers from Harvard, Pennsylvania, and Stanford universities have pointed out: if deletion requests arrive in a particular sequence, whether by chance or by a malicious actor’s design, the unlearning scheme can break down.
Apart from this, there is the additional problem of verifying whether an AI system has successfully unlearned the data.
This is not to question the company’s intention but to find out whether the effort to unlearn has fully succeeded.
According to Gautam Kamath, a professor at the University of Waterloo, “It feels like it’s a little way down the road, but maybe they’ll eventually have auditors for this sort of thing.”
Other ideas include differential privacy, a technique that can put mathematical bounds on how much an AI system can leak about any individual’s private data. The technique must still be vetted by experts before it can be rolled out successfully.
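As a rough sketch of what such a mathematical boundary looks like in practice, here is the classic Laplace mechanism applied to a simple count query; this is an illustration only, and private_count and the epsilon values are assumptions, not a vetted implementation:

```python
# Minimal sketch of differential privacy via the Laplace mechanism: noise
# calibrated to a query's sensitivity limits how much any one individual's
# record can change the released answer.
import numpy as np

def private_count(values, predicate, epsilon=1.0):
    """Release a count with epsilon-differential privacy.

    Adding or removing one record changes a count by at most 1
    (sensitivity = 1), so Laplace noise of scale 1/epsilon suffices.
    Smaller epsilon means more noise and stronger privacy.
    """
    true_count = sum(1 for v in values if predicate(v))
    noise = np.random.default_rng().laplace(0.0, 1.0 / epsilon)
    return true_count + noise

# Hypothetical usage on a toy dataset:
ages = [23, 45, 67, 71, 34, 88]
print(private_count(ages, lambda a: a > 65, epsilon=0.5))
```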
The Bottom Line
Unlearning is at a nascent stage, and it will be a while before it matures into a proven approach that enables AI systems not only to unlearn but also to retrain with minimal effort.
Constant pressure from regulators, laws, and litigation will keep companies that use AI systems on their toes, especially in regions like the European Union (EU), where strong laws such as the GDPR are in force.
Unlearning is an extremely complex proposition, and finding out how AI systems can unlearn will require a deeper look at how they learn in the first place.