Some enthusiasts would argue that artificial intelligence (AI) is on the verge of becoming sentient. But not Yann LeCun, who shared the 2018 Turing Award for his contributions to deep learning.
In fact, LeCun believes AI is not on the right path to think and learn like humans. He points out that while a teenager can learn to drive in around 20 hours, today's self-driving cars require millions or billions of labeled training samples or reinforcement learning trials in simulated environments, and they still fall short of a human's ability to drive reliably.
Based on this realization, LeCun sketched a roadmap for building "autonomous artificial intelligence." The roadmap draws inspiration from several disciplines, including deep learning, robotics, cognitive science and neuroscience, to propose a modular, configurable architecture. And while actually implementing this roadmap would require further research, it offers a useful way to think about the components needed to replicate animal and human intelligence.
This article delves deeper into the methodology behind LeCun's autonomous artificial intelligence roadmap.
How It Works
The World Model
The core of LeCun's framework is a "world model" that predicts world conditions or states. LeCun suggests that animals and people each have their own "world model," located somewhere in their prefrontal cortex.
While attempts have already been made to develop AI-based world models, these models are task-dependent and can't be adapted to different tasks. LeCun rejects the notion of multiple task-dependent world models and instead proposes a single, dynamically configurable world model. According to LeCun, having a single world model lets knowledge be shared across tasks, which is what allows humans to reason by analogy.
In the context of LeCun's autonomous AI roadmap, the idea of the world model is accompanied by other models that help the AI system understand the world and perform actions in it.
The Perception Model
The “perception” model collects and processes signals from sensors to estimate the current state of the world. In this sense, it plays the role of a person's five senses. Working from the perception model's estimate, the world model performs two essential tasks:
- Filling in absent pieces of information in the sensory data (e.g., occluded objects).
- Predicting the most likely future states of the world (e.g., a moving car's location five seconds from now).
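To make the two tasks above concrete, here is a minimal sketch that assumes a hand-written constant-velocity rule as the world model for a single car. LeCun's world model would be learned from data and would cover far more than one object; `CarState`, `predict` and `fill_in_occluded` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CarState:
    x: float   # position along the road, in meters
    y: float   # lateral position, in meters
    vx: float  # velocity along the road, in m/s
    vy: float  # lateral velocity, in m/s


def predict(state: CarState, dt: float) -> CarState:
    """Toy world model (hypothetical): assume the car keeps its current velocity for dt seconds."""
    return CarState(state.x + state.vx * dt, state.y + state.vy * dt, state.vx, state.vy)


def fill_in_occluded(last_seen: CarState, seconds_since_seen: float,
                     observation: Optional[CarState]) -> CarState:
    """Trust the sensors when they see the car; otherwise infer where the hidden car should be."""
    if observation is not None:
        return observation
    return predict(last_seen, seconds_since_seen)


car = CarState(x=0.0, y=0.0, vx=10.0, vy=0.0)
print(predict(car, 5.0))                 # CarState(x=50.0, y=0.0, vx=10.0, vy=0.0)
print(fill_in_occluded(car, 2.0, None))  # hidden behind a truck for 2 s: estimated at x=20.0
```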
LeCun's autonomous AI architecture also contains other models that work alongside the world model to help the system learn and act. These include:
The Cost Model
The cost model drives an AI system toward its objectives. It measures the system's level of discomfort and consists of two sub-models:
- The intrinsic cost: a built-in, non-trainable model that computes immediate discomfort (e.g., damage to the system).
- The critic: a learnable model that predicts future values of the intrinsic cost.
The AI system aims to minimize the intrinsic cost over time. According to LeCun, the cost model is where basic behavioral urges and intrinsic motivations reside. Crucially, the cost model is differentiable, so gradients of the cost can backpropagate through the other models, training them to work together to reduce the intrinsic cost.
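The split between a fixed intrinsic cost and a trainable critic can be sketched in a few lines. This is a toy under stated assumptions: the state is just the distance to an obstacle, the intrinsic cost is a hand-picked penalty for getting closer than 5 m, and the critic is a simple lookup table (swapped in for the neural network LeCun has in mind) that learns the intrinsic cost observed a few steps later. All names are hypothetical.

```python
from collections import defaultdict


def intrinsic_cost(distance_m: float) -> float:
    """Hard-wired, non-trainable discomfort: grows once the agent is closer than 5 m to an obstacle."""
    return max(0.0, 5.0 - distance_m)


class TabularCritic:
    """Trainable critic (hypothetical): for each 1 m distance bucket, the average intrinsic
    cost observed a fixed number of steps later. A lookup table stands in for a neural network."""

    def __init__(self) -> None:
        self.sums = defaultdict(float)
        self.counts = defaultdict(int)

    def update(self, distance_m: float, future_cost: float) -> None:
        bucket = int(distance_m)
        self.sums[bucket] += future_cost
        self.counts[bucket] += 1

    def predict(self, distance_m: float) -> float:
        bucket = int(distance_m)
        n = self.counts.get(bucket, 0)
        return self.sums[bucket] / n if n else 0.0


def total_cost(distance_m: float, critic: TabularCritic) -> float:
    """Cost model output: immediate discomfort plus predicted future discomfort."""
    return intrinsic_cost(distance_m) + critic.predict(distance_m)


# Train the critic on one episode: the agent drives toward an obstacle at 1 m per step from 10 m away.
horizon = 3
distances = [float(d) for d in range(10, -1, -1)]
critic = TabularCritic()
for i in range(len(distances) - horizon):
    critic.update(distances[i], intrinsic_cost(distances[i + horizon]))

print(total_cost(7.0, critic))  # 1.0: no discomfort yet, but the critic predicts some
print(total_cost(4.0, critic))  # 5.0: 1.0 now plus 4.0 predicted
```

Because the critic has learned that 7 m tends to be followed by discomfort, the total cost flags that state before any intrinsic cost has actually been incurred.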
The Actor Model
The “actor” model proposes actions intended to minimize the system's level of discomfort, as measured by the cost model.
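One simple way for an actor to use the other modules is to try each candidate action in the world model and keep the one with the lowest predicted cost. The sketch below assumes a one-step lookahead over a handful of speeds, a hypothetical world model for the distance to a stop line, and a hypothetical cost; it is an illustration, not LeCun's actor.

```python
from typing import Sequence


def world_model(distance_m: float, speed_mps: float, dt_s: float = 1.0) -> float:
    """Hypothetical world model: predicted distance to a stop line after dt_s seconds."""
    return distance_m - speed_mps * dt_s


def cost(distance_m: float) -> float:
    """Hypothetical cost: mild discomfort for being far from the line, large discomfort for crossing it."""
    return 0.1 * distance_m + (10.0 if distance_m < 0 else 0.0)


def actor(distance_m: float, candidate_speeds: Sequence[float]) -> float:
    """Try each candidate action in the world model and keep the one with the lowest predicted cost."""
    return min(candidate_speeds, key=lambda v: cost(world_model(distance_m, v)))


print(actor(20.0, [0.0, 5.0, 10.0, 15.0, 20.0, 25.0]))  # 20.0: reaches the line without crossing it
```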
The Short-Term Memory Model
The “short-term memory” model stores important information about the state of the world along with the corresponding intrinsic cost. This stored information helps the world model make accurate predictions.
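A short-term memory can be pictured as a fixed-size buffer of recent (state, intrinsic cost) pairs. In LeCun's proposal, such stored pairs can also serve as training data for the critic described in the cost model section. The sketch below assumes a plain first-in, first-out buffer and a hypothetical `training_pairs` helper that pairs each state with the cost recorded `horizon` steps later; a real system would store much richer state.

```python
from collections import deque
from typing import List, Tuple


class ShortTermMemory:
    """Hypothetical short-term memory: a fixed-size FIFO of (state, intrinsic cost) pairs."""

    def __init__(self, capacity: int) -> None:
        self.entries = deque(maxlen=capacity)  # the oldest entry is evicted when full

    def store(self, state: float, intrinsic_cost: float) -> None:
        self.entries.append((state, intrinsic_cost))

    def training_pairs(self, horizon: int) -> List[Tuple[float, float]]:
        """Pair each stored state with the intrinsic cost recorded `horizon` steps later."""
        items = list(self.entries)
        return [(items[i][0], items[i + horizon][1]) for i in range(len(items) - horizon)]


memory = ShortTermMemory(capacity=4)
for state, cost in [(10.0, 0.0), (9.0, 0.0), (8.0, 0.5), (7.0, 1.0), (6.0, 2.0)]:
    memory.store(state, cost)

print(list(memory.entries))              # (10.0, 0.0) was evicted to make room
print(memory.training_pairs(horizon=2))  # [(9.0, 1.0), (8.0, 2.0)]
```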
The Configurator Model
Lastly, LeCun's autonomous AI architecture includes a “configurator” model to provide executive control to the system.
The configurator's main purpose is to let the AI handle many different tasks. It does this by adjusting the other models in the architecture for the task at hand, for example by modulating their parameters.
To return to the self-driving example from earlier: if the task is driving a car, the configurator would tune your "perception model" (your five senses) to absorb the information relevant for driving, so you look out through the windshield, feel the steering wheel and listen to the engine. It would also set up your "actor model" to plan driving actions, such as starting the engine and changing gears, and your "cost model" to take traffic rules into account.
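The configurator's role in this example can be sketched as a function that returns task-specific settings for the other modules. This is a minimal illustration assuming a hard-coded preset table, hypothetical sensor names and hypothetical cost weights; in LeCun's architecture the configurator would produce these modulations itself rather than look them up.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ModuleSettings:
    attended_sensors: List[str] = field(default_factory=list)    # what perception should focus on
    cost_weights: Dict[str, float] = field(default_factory=dict)  # how the cost model weighs each term


# Hypothetical task presets; names and weights are illustrative only.
TASK_PRESETS: Dict[str, ModuleSettings] = {
    "driving": ModuleSettings(["windshield_camera", "steering_wheel", "engine_sound"],
                              {"collision": 10.0, "traffic_rule_violation": 5.0, "travel_time": 0.1}),
    "parking": ModuleSettings(["rear_camera", "proximity_sensors"],
                              {"collision": 10.0, "distance_to_spot": 1.0}),
}


def configure(task: str) -> ModuleSettings:
    """Return the settings the configurator pushes to the perception and cost modules for a task."""
    return TASK_PRESETS[task]


def weighted_cost(terms: Dict[str, float], settings: ModuleSettings) -> float:
    """A cost model whose behavior is modulated by the configurator's weights."""
    return sum(settings.cost_weights.get(name, 0.0) * value for name, value in terms.items())


situation = {"collision": 0.0, "traffic_rule_violation": 1.0, "distance_to_spot": 2.0}
print(weighted_cost(situation, configure("driving")))  # 5.0
print(weighted_cost(situation, configure("parking")))  # 2.0
```

The same situation is judged differently depending on the task: under the driving configuration the traffic-rule violation dominates, while under the parking configuration only the distance to the parking spot counts.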
Interestingly, LeCun's roadmap was inspired by the dual process theory popularized by Daniel Kahneman in “Thinking, Fast and Slow.” Following this idea, LeCun's architecture lets an AI system exhibit two types of behavior:
- Mode 1: fast, reflexive behavior that maps perception directly to action.
- Mode 2: slow, deliberate behavior that uses the world model, perception model, cost model, actor model, short-term memory model and configurator model for reasoning and planning.
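The contrast between the two modes can be sketched with a one-dimensional toy problem: move from position 0 to a goal at 3. The assumptions are loud here: the world model is the hand-written rule x[t+1] = x[t] + a[t], the cost penalizes distance from the goal plus a small effort term, Mode 1 is a fixed clipped reflex, and Mode 2 plans a five-step action sequence by gradient descent on the predicted cost. Gradient-based planning is one option LeCun discusses; all function names are hypothetical.

```python
from typing import List


def rollout(x0: float, actions: List[float]) -> List[float]:
    """Hypothetical world model x[t+1] = x[t] + a[t]; returns the predicted states x[1..T]."""
    states, x = [], x0
    for a in actions:
        x = x + a
        states.append(x)
    return states


def plan_cost(x0: float, actions: List[float], goal: float, effort: float = 0.1) -> float:
    """Predicted cost of a plan: squared distance from the goal at every step plus an effort penalty."""
    return sum((x - goal) ** 2 for x in rollout(x0, actions)) + effort * sum(a * a for a in actions)


def cost_gradient(x0: float, actions: List[float], goal: float, effort: float = 0.1) -> List[float]:
    """d(plan_cost)/d(a[t]), backpropagated by hand: a[t] shifts every later state by the same amount."""
    states = rollout(x0, actions)
    return [sum(2 * (x - goal) for x in states[t:]) + 2 * effort * actions[t]
            for t in range(len(actions))]


def mode1_policy(x: float, goal: float) -> float:
    """Mode 1: a fast reflex that maps the perceived state straight to an action, with no lookahead."""
    return max(-1.0, min(1.0, goal - x))


def mode2_plan(x0: float, goal: float, horizon: int = 5, steps: int = 500, lr: float = 0.05) -> List[float]:
    """Mode 2: deliberately optimize a whole action sequence by gradient descent on the predicted cost."""
    actions = [0.0] * horizon
    for _ in range(steps):
        grads = cost_gradient(x0, actions, goal)
        actions = [a - lr * g for a, g in zip(actions, grads)]
    return actions


# Roll the Mode 1 reflex forward through the same world model for comparison.
x, reflex_actions = 0.0, []
for _ in range(5):
    a = mode1_policy(x, 3.0)
    reflex_actions.append(a)
    x = rollout(x, [a])[0]

plan = mode2_plan(0.0, 3.0)
print(plan_cost(0.0, reflex_actions, 3.0))  # reflex: creeps toward the goal one unit at a time
print(plan_cost(0.0, plan, 3.0))            # deliberate plan: lower predicted cost
```

Mode 2 finds a plan with a lower predicted cost than the reflex, at the price of hundreds of gradient steps. It also shows why a differentiable cost matters: the gradients flow from the cost back through the world model to the actions.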
How to Implement Yann LeCun's Autonomous AI Framework
LeCun acknowledges that turning his conceptual framework into a working system is a key challenge.
LeCun favors implementing the framework with deep learning models trained by gradient-based optimization algorithms. He is skeptical of symbolic systems, which require knowledge to be hand-coded by humans.
Two promising methodologies for implementing this framework are:
1. Self-Supervised Learning
Because supervised deep learning requires large, human-annotated datasets, LeCun advocates self-supervised learning (SSL): a form of unsupervised learning that uses the supervisory signals naturally present in a dataset (i.e., no human annotations). LeCun argues that human children also use self-supervised learning to gain common-sense knowledge of the world, such as gravity, dimensionality, depth and social relationships.
Beyond these theoretical motivations, SSL has also proven highly useful in practice: it is how transformer-based foundation language models are pretrained.
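The key idea of SSL, that the supervision comes from the data itself, shows up in next-word prediction, a pretext task used to pretrain many language models. This minimal sketch assumes a tiny made-up corpus and a count-based predictor in place of a neural network; `make_examples`, `train` and `predict_next` are hypothetical names.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


def make_examples(sentences: List[str]) -> List[Tuple[str, str]]:
    """Turn raw, unlabeled text into (word, next word) training pairs: the targets come from the data."""
    pairs: List[Tuple[str, str]] = []
    for sentence in sentences:
        words = sentence.lower().split()
        pairs.extend(zip(words, words[1:]))
    return pairs


def train(pairs: List[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count-based stand-in for a neural language model: tally which word follows which."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for word, next_word in pairs:
        counts[word][next_word] += 1
    return counts


def predict_next(model: Dict[str, Counter], word: str) -> Optional[str]:
    """Predict the most frequent continuation seen during training, or None for unseen words."""
    options = model.get(word)
    return options.most_common(1)[0][0] if options else None


corpus = ["The ball falls down", "The apple falls down", "The cup falls to the floor"]
model = train(make_examples(corpus))
print(predict_next(model, "falls"))  # down (seen twice) beats to (seen once)
```

No sentence was labeled by hand; each training target is simply the word that follows in the raw text.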
2. Energy-Based Models
While various SSL approaches exist, such as auto-encoding and contrastive learning, LeCun emphasizes using energy-based models (EBMs).
An EBM learns a scalar "energy" function that measures how compatible two observations are: a compatible pair (for example, a video clip and a plausible continuation) gets low energy, and an incompatible pair gets high energy. In LeCun's proposal, this comparison is made after encoding high-dimensional data, such as images, into low-dimensional embedding spaces that preserve only the relevant information. To this end, LeCun proposes an EBM-based learning architecture called “Joint Embedding Predictive Architecture (JEPA)” to learn world models.
According to LeCun, a key feature of JEPA is that it can ignore details that are irrelevant or too hard to predict. In image processing, for example, rather than predicting the world's state at the pixel level, JEPA makes its predictions in a learned embedding space, so it can focus on the low-dimensional features that matter for a given task. LeCun also explains how JEPA architectures can be stacked on top of each other to form "Hierarchical JEPA" (H-JEPA), which could be crucial for handling complex tasks such as reasoning and planning at multiple time scales.
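The energy idea behind JEPA can be sketched with toy vectors. The assumptions: each observation has two predictable dimensions (position, velocity) and four unpredictable "texture" dimensions, the encoder is a hand-set projection that keeps only the first two, and the predictor applies constant-velocity motion. In a real JEPA the encoders and predictor are trained jointly, and training also needs a mechanism that stops the encoders from collapsing to a constant embedding, which this sketch omits. All names are hypothetical.

```python
from typing import List

Vector = List[float]

# Hypothetical hand-set encoder: keeps (position, velocity) and drops the last four
# "texture" dimensions, which stand in for details that cannot be predicted.
ENCODER_WEIGHTS = [
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
]


def encode(observation: Vector) -> Vector:
    """Map a 6-D observation to a 2-D embedding (a learned network in a real JEPA)."""
    return [sum(w * o for w, o in zip(row, observation)) for row in ENCODER_WEIGHTS]


def predict_embedding(embedding: Vector) -> Vector:
    """Hypothetical predictor: one time step of constant-velocity motion in embedding space."""
    position, velocity = embedding
    return [position + velocity, velocity]


def energy(x: Vector, y: Vector) -> float:
    """Low energy means y is a compatible continuation of x; compared in embedding space."""
    predicted, target = predict_embedding(encode(x)), encode(y)
    return sum((p - t) ** 2 for p, t in zip(predicted, target))


def pixel_error(x: Vector, y: Vector) -> float:
    """For contrast: error of a naive raw-space prediction that must also get the texture right."""
    predicted = [x[0] + x[1], x[1]] + x[2:]
    return sum((p - t) ** 2 for p, t in zip(predicted, y))


x = [0.0, 1.0, 0.3, -0.7, 0.9, 0.1]
y_plausible = [1.0, 1.0, -0.5, 0.2, 0.4, -0.8]   # moved as expected; texture changed unpredictably
y_implausible = [5.0, 1.0, 0.3, -0.7, 0.9, 0.1]  # texture unchanged, but the object jumped

print(energy(x, y_plausible))       # 0.0
print(energy(x, y_implausible))     # 16.0
print(pixel_error(x, y_plausible))  # positive: penalized for texture it could never have predicted
```

The plausible continuation gets zero energy even though its texture changed, while the raw-space prediction is penalized for details it had no way of knowing.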
Conclusion: The Upward Climb to Autonomous AI
While some researchers believe artificial general intelligence (AGI) can be achieved by massively scaling deep learning architectures, LeCun argues that scaling alone is not enough to achieve autonomous AI. Scaling has produced remarkable advances in language models, which work with discrete data, but it has not had a similar impact on high-dimensional continuous data, such as video.
LeCun is also not convinced that reward functions and reinforcement learning algorithms are enough to achieve AGI. He argues that reinforcement learning requires constant interaction with the environment, unlike humans and animals, who learn mainly through perception.
LeCun's framework still requires further research to address its implementation challenges.