How Interactive AI is the Next Phase of Generative AI

Picture a world much like the legendary Tower of Babel, where everyone excels in their unique field but speaks in different languages.

Your mission is to construct this grand tower. To conquer this monumental task, you need to plan meticulously, break the task into manageable steps, assemble the right team for each role, decode their languages for seamless communication, and ensure flawless coordination.

This Herculean effort calls for masterful planning, a deep understanding of each person’s expertise, multilingual fluency, and effective coordination.

For years, you’ve expertly managed this complex task. Then, you stumble upon an interpreter – someone who can effortlessly translate your instructions from your language into the diverse languages of your team.

While this discovery is a significant relief, your responsibilities of planning, team selection, and coordination persist.

But what if this interpreter could become more than just a translator? What if they could be a strategic genius, a talent scout, and a master coordinator?

What is Interactive AI?

The term “Interactive AI” was coined last month by Mustafa Suleyman, co-founder of DeepMind, defining it as the next evolution of generative AI, focused on developing bots capable of executing assigned tasks by orchestrating other software and human resources.

Although this term has generated significant buzz on the internet, little has been written about what actually makes up such interactive AI systems. In this article, we delve into the world of interactive AI, seeking to understand its foundations and to assess the progress made in this field.

How Generative AI Plays a Role

Generative AI, short for Generative Artificial Intelligence, refers to a subset of artificial intelligence technologies designed to generate content, data, or information. These systems can produce new and original content rather than simply making decisions or predictions based on existing data. Generative AI operates by learning patterns, styles, and structures from large datasets and then using this knowledge to create something new.

One of the most well-known applications of generative AI is in natural language processing, where models like ChatGPT (GPT stands for Generative Pre-trained Transformer) have been developed to generate human-like text. These models can write coherent and contextually relevant text, answer questions, generate creative writing, and even perform language translation.

Transitioning from Generative AI to Interactive AI

In our Tower of Babel analogy, Generative AI is the interpreter. To take on every task involved in constructing the tower, it needs three fundamental capabilities:

1) the ability to follow human instructions,

2) access to various technologies (referred to as “workers”), and

3) planning capabilities.

Although Generative AI wasn’t initially designed with these skills in mind, research is emerging to add each of them. The following sections elaborate on the ongoing work in these areas.

Equipping Generative AI with Planning and Problem-Solving Abilities

Generative AI language models are getting better at reasoning and problem-solving through “in-context learning.” This involves placing a few examples (prompts paired with responses) in the model’s input ahead of the actual task, so that the model picks up the pattern without being retrained.

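For instance, a technique called “Chain-of-Thought Prompting,” developed by Google, gives the AI example prompts whose responses spell out the intermediate reasoning steps. No retraining is involved: the examples simply encourage the AI to work through a new problem step by step, which helps it think logically and make effective plans.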
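To make the idea concrete, here is a minimal sketch of how such a prompt might be assembled. The `Example` class, the `build_cot_prompt` function, the "Q:/A:" layout, and the sample questions are all illustrative assumptions rather than anything prescribed by the technique; the only essential point is that each example answer shows its reasoning before the result.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One worked example (hypothetical structure, for illustration only)."""
    question: str
    reasoning: str  # intermediate steps written out in plain language
    answer: str


def build_cot_prompt(examples: list[Example], question: str) -> str:
    """Assemble a few-shot prompt whose examples show their reasoning."""
    parts = []
    for ex in examples:
        parts.append(
            f"Q: {ex.question}\nA: {ex.reasoning} The answer is {ex.answer}."
        )
    # The final question is left open so the model continues with its own steps.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


EXAMPLES = [
    Example(
        question="A crew has 5 cranes and rents 2 lots of 3 cranes each. "
                 "How many cranes does it have?",
        reasoning="It starts with 5 cranes. 2 lots of 3 cranes is 6 cranes. "
                  "5 + 6 = 11.",
        answer="11",
    )
]

prompt = build_cot_prompt(
    EXAMPLES,
    "A wall needs 4 rows of 12 bricks. 10 bricks are laid. How many remain?",
)
print(prompt)
```

The resulting text would be sent to a language model as-is; the model is expected, though not guaranteed, to imitate the step-by-step style of the example.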

For more complex problems with multiple possible solutions, “Tree-of-Thought (ToT) Prompting” was developed by researchers from Princeton University and Google DeepMind. ToT organizes intermediate reasoning steps as a decision tree, allowing the AI to explore different approaches, evaluate them, and come up with creative solutions.
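The tree-shaped exploration can be sketched as a small beam search. This is a toy under stated assumptions, not the researchers’ implementation: in a real system a language model would propose and rate each partial solution, whereas here the hypothetical `propose` and `score` functions are plain Python stand-ins, and the puzzle (pick steps of 1, 3, or 4 that sum to exactly 10) is invented for illustration.

```python
from typing import Callable, Optional

State = tuple  # a partial solution: the sequence of steps chosen so far


def tree_search(
    root: State,
    propose: Callable[[State], list],
    score: Callable[[State], float],
    is_solution: Callable[[State], bool],
    beam_width: int = 2,
    max_depth: int = 5,
) -> Optional[State]:
    """Expand the tree level by level, keeping only the best branches."""
    frontier = [root]
    for _ in range(max_depth):
        # Branch: every state on the frontier proposes its next "thoughts".
        candidates = [child for state in frontier for child in propose(state)]
        for candidate in candidates:
            if is_solution(candidate):
                return candidate
        # Prune: keep the highest-scoring partial solutions.
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]
        if not frontier:
            return None
    return None


TARGET = 10
STEPS = (1, 3, 4)


def propose(state: State) -> list:
    # Stand-in for a model suggesting next steps; overshooting is not allowed.
    return [state + (s,) for s in STEPS if sum(state) + s <= TARGET]


def score(state: State) -> float:
    # Stand-in for a model rating a partial solution: closer to target is better.
    return -(TARGET - sum(state))


result = tree_search((), propose, score, lambda s: sum(s) == TARGET)
print(result)
```

Widening `beam_width` explores more branches at each level at the cost of more evaluations, which mirrors the trade-off between thoroughness and the number of model queries.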

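“Algorithm of Thoughts (AoT),” from researchers at Virginia Tech and Microsoft, takes it a step further. It guides the AI with algorithmic examples so that the model explores ideas and backtracks, much as a human would when working through a math problem. AoT is efficient because it streamlines this thinking process within a single context, unlike other methods that require numerous queries.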

Empowering Generative AI to Utilize External Tools

One exciting frontier in Generative AI is enabling these AI systems to use external tools. Researchers from Meta have taken a significant step in this direction by introducing “Toolformer,” a language model that teaches itself to call external tools such as search engines and calculators through their APIs, without requiring extensive human guidance.

Furthermore, a collaborative effort between researchers from UC Berkeley and Microsoft Research has expanded the capabilities of Large Language Models (LLMs). They’ve created a model called “Gorilla,” which builds upon LLaMA, an open-source language model from Meta. Gorilla is fine-tuned to interact with a wide range of tools through API calls, which opens up new possibilities for integrating AI with various software and platforms.

This approach is supported by the creation of the “APIBench” dataset, which encompasses a diverse collection of API calls from platforms such as HuggingFace, TorchHub, and TensorHub. Work like this is making Generative AI more versatile by letting it draw on external resources instead of relying only on what it learned during training.
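On the software side, tool use boils down to the model emitting a structured request that ordinary code then executes. The sketch below assumes a JSON format with hypothetical `tool` and `arguments` fields and two made-up tools; it is not the format used by Toolformer or Gorilla, only an illustration of the dispatch step.

```python
import json
import operator
from typing import Callable

OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def calculator(a: float, op: str, b: float) -> float:
    """A tiny arithmetic tool."""
    return OPS[op](a, b)


def word_count(text: str) -> int:
    """A second tool, to show that the registry is not tied to one function."""
    return len(text.split())


# Registry of "workers" the model is allowed to call (names are hypothetical).
TOOLS: dict[str, Callable] = {
    "calculator": calculator,
    "word_count": word_count,
}


def run_tool_call(model_output: str):
    """Parse a JSON tool request and dispatch it to the matching function."""
    call = json.loads(model_output)
    name = call["tool"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


# In practice this string would come from the language model.
request = '{"tool": "calculator", "arguments": {"a": 400, "op": "/", "b": 1600}}'
result = run_tool_call(request)
print(result)
```

Rejecting unknown tool names, as `run_tool_call` does, is a simple safeguard: the model can only trigger functions that were deliberately placed in the registry.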

Empowering Generative AI to Follow Instructions

Generative AI language models are not primarily designed to follow instructions. Their initial training revolves around predicting the next word in the text, which is quite different from the goal of having them follow user instructions. However, the field of Generative AI is rapidly advancing in this direction.

One effective method gaining traction is “reinforcement learning from human feedback (RLHF),” in which a pretrained language model is fine-tuned to follow instructions using human judgments of its outputs. An example of this approach is “InstructGPT,” a fine-tuned GPT model designed explicitly to follow human instructions.
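One ingredient of RLHF is a reward model trained on human comparisons between pairs of responses. A commonly used objective for that step is a pairwise loss that is small when the human-preferred response receives the higher reward. The sketch below shows only that loss on plain numbers; the function name and the sample reward values are assumptions for illustration, and a real system would compute rewards with a neural network.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise loss: -log(sigmoid(reward_chosen - reward_rejected))."""
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); log1p keeps small values accurate.
    return math.log1p(math.exp(-margin))


# The reward model agrees with the human ranking: low loss.
agrees = preference_loss(reward_chosen=2.0, reward_rejected=0.0)

# The reward model disagrees with the human ranking: high loss.
disagrees = preference_loss(reward_chosen=0.0, reward_rejected=2.0)

print(agrees < disagrees)
```

Minimizing this loss over many human comparisons pushes the reward model to rank responses the way people do; the language model is then optimized against that learned reward.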

Another noteworthy development is the study on “In-Context Instruction Learning,” which employs in-context learning techniques to enhance language models’ ability to follow instructions. Although this study primarily focuses on specific tasks, it demonstrates how instruction-based demonstrations placed in the prompt can significantly improve the alignment between human intention and AI behavior.

The Bottom Line

The journey from Generative AI to Interactive AI is marked by significant advancements in equipping AI systems with the ability to plan, problem-solve, utilize external tools, and follow instructions.

As we continue to break down the language barriers between different technologies and domains, Interactive AI is poised to revolutionize how we interact with and leverage AI-driven systems.

The interdisciplinary efforts of researchers and technologists are driving us closer to a future where AI can seamlessly orchestrate complex tasks, becoming more than just an interpreter and evolving into a strategic genius that empowers us in unprecedented ways.