For the past year, an artificial intelligence (AI) war between OpenAI, Microsoft, Google, and others has supercharged this disruptive field, with each company competing head-to-head to release new and more powerful models.
While Google was not first on the AI scene, it now intends to rise to the top with Gemini, which has been speculated to be the most powerful AI model ever built.
Gemini was launched on Wednesday, December 6, 2023, so we will now see how the long game plays out.
This is how Gemini works, how powerful it is, and what it will be able to do: everything we know about Gemini today.
Google Gemini: Multimodal From the Start
When CEO Sundar Pichai first announced Gemini on May 10, 2023, at the Google I/O developer conference, one thing was made clear: Google was building a next-generation AI. The project, led by Google’s Brain Team and DeepMind, builds upon PaLM 2.
PaLM 2, or Pathways Language Model 2, is the core technology that Google uses to drive AI capabilities throughout its suite of products. This includes Google Cloud products and services, Gmail, Google Workspace, hardware devices such as the Pixel smartphone or the Nest thermostat, and, of course, the famous AI chatbot Bard.
Back then, Gemini was still in full development and training mode, but Pichai revealed what would make the new AI different.
Gemini Takes Multimodal AI Further
“Gemini was created from the ground up to be multimodal.”
That was the key phrase from Pichai, and if there is one word that describes Gemini, it is without a doubt “multimodal”. While many use the term multimodal AI for any AI that can work with different kinds of content, such as images or text, for Google the term means much more.
More recently, on October 24, during Alphabet’s third-quarter 2023 earnings call, Pichai gave clear signs of what type of multimodal AI the company was building.
“We are just really laying the foundation of what I think of as the next-generation series of models we’ll be launching all throughout 2024,” Pichai said.
“The pace of innovation is extraordinarily impressive to see.”
Gemini Is a More Human AI
In one way or another, we have already witnessed multimodal AI. Companies like OpenAI — responsible for ChatGPT — or Microsoft offer different generative AI technologies that can work with images, text, data, and even code. However, all these early AI systems are just scratching the surface of multimodal technology, as the integration of different content and data formats is not efficient.
The reason generative AI is such a wild success is that, for the first time, a machine can imitate what humans do. But what exactly can humans do? We do not just chat, or code, or write reports, or create images; we can do all of that.
The human brain is brilliantly complex. It can simultaneously interpret and understand various data formats, including text, words, sounds, and visuals. This allows us to make sense of the world around us, respond to stimuli, and solve problems in creative and innovative ways. And that is what Google’s Gemini is all about. A new AI that comes closer to what humans really do: a multi-tasking multimodal AI.
Gemini Is Not One Model, It’s Many AIs Combined
There is only one way to create elegant and efficient multimodal AI: combining different AI models into one. Machine learning and AI models for graph processing, computer vision, audio processing, language, code, and 3D content need to be integrated and orchestrated to achieve synergy when developing multimodal AI.
This is a monumental, challenging task, and Google wants to take this concept to a new, unprecedented level.
Unleashed for Developers
Another big difference between Gemini and other models like ChatGPT or Bing Chat concerns developer access, which for those models is currently limited.
But straight out of the gate, Gemini will break that trend.
Pichai added that Gemini would be “highly efficient with tools and API integrations”.
This means that Google is not just building a new AI to be a dog and pony show for the web; it is building lightweight and powerful versions of Gemini for developers to use and customize to create their own AI apps and APIs.
An AI To Build AI
It’s not too early in the game to understand how developers will use Gemini to create new AI apps and APIs. In mid-September, news broke that Google had begun giving users access to an early version of Gemini. As expected, the first leaks of Gemini soon followed.
On October 15, JavaScript engineer Bedros Pamboukian shocked the world with the first screenshots of what seemed to be Gemini integrated into MakerSuite. Released in early 2023 and powered by PaLM 2, Google’s MakerSuite is used by developers to create AI applications.
MakerSuite is basically an AI to create AI. It has a simple user interface where developers can create code generation tools, natural language processing (NLP) apps, and more.
Pamboukian, the first to leak the integration of Gemini into MakerSuite, revealed the tip of the iceberg of Gemini’s multimodal capabilities. The leak shows that Gemini already has text and object recognition capabilities and can caption images and understand prompts that combine free text with images.
Is Gemini More Powerful Than ChatGPT?
When comparing Gemini with ChatGPT, many experts talk about parameters. Parameters in an AI system are the variables whose values are adjusted or tuned during the training stage and which the AI uses to transform input data into output. In broad strokes, the more parameters an AI has, the more sophisticated it is.
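To make the idea of a parameter count concrete, here is a minimal sketch in Python. It assumes a plain fully connected network, which is far simpler than ChatGPT or Gemini; the function name and the layer sizes are illustrative choices, not details of either model.

```python
def count_parameters(layer_sizes):
    """Count the trainable values in a toy fully connected network.

    layer_sizes lists the number of units in each layer, input first.
    This is an illustrative model, not the architecture of any real system.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per input-to-output connection
        total += n_out         # one bias per output unit
    return total


tiny = count_parameters([4, 8, 2])
wider = count_parameters([512, 2048, 512])
print(f"tiny network: {tiny:,} parameters")
print(f"wider network: {wider:,} parameters")
```

Every one of those values is adjusted during training. Widening the layers by a couple of orders of magnitude takes the count from dozens to millions, which is why parameter totals climb so quickly as models grow.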
ChatGPT 4.0, the most advanced AI in operation, is reported to have about 1.75 trillion parameters, a figure OpenAI has not confirmed. Gemini is reported to exceed this number, with some reports claiming it will have 30 trillion or even 65 trillion parameters.
But the power of an AI system is not just about big parameter numbers.
A study by SemiAnalysis claims that Gemini will “smash” ChatGPT 4.0. SemiAnalysis anticipates that by the end of 2023, Gemini could surpass ChatGPT 4.0 by a factor of five, and could eventually be 20 times more powerful.
Gemini, Chips, and Training Data
The design of an AI model matters as well.
While, as mentioned, ChatGPT’s multimodal capacity is still limited, Gemini is meant to combine it all.
“Google Gemini is multimodal, meaning it can process and generate text, images, and other data types. This makes it more versatile than ChatGPT, which is only capable of processing text,” the SemiAnalysis report reads.
SemiAnalysis added that Google “invested unprecedented computational power” to train Gemini, exceeding GPT-4. To train Gemini, Google uses cutting-edge training chips known as TPUv5. TPUv5 systems are reported to be the only technology in the world capable of orchestrating 16,384 chips working together, and that scale is the secret that allows Google to train such a massive model.
SemiAnalysis says:
“At present, no other entities in the field possess the capacity to undertake such training endeavors.”
Training an AI model, however, is not just about chips but also about data. And when it comes to data, Google is one of the ruling kings. “Google possesses an extensive collection of code-only data, estimated at around 40 trillion tokens, a fact that has been verified,” SemiAnalysis added.
Forty trillion tokens are roughly the equivalent of hundreds of terabytes of raw text, or the content of millions of books. According to SemiAnalysis, the Google dataset alone is four times larger than the entirety of the data used to train ChatGPT 4.0, which includes code and non-code data.
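A back-of-the-envelope conversion helps put the token count in perspective. The sketch below assumes roughly 4 bytes of text per token and about 100,000 tokens per book. Both numbers are illustrative assumptions, not figures from SemiAnalysis, and the result shifts in proportion to them.

```python
TOKENS = 40_000_000_000_000   # 40 trillion tokens, the SemiAnalysis estimate
BYTES_PER_TOKEN = 4           # assumption: rough average size of one token
TOKENS_PER_BOOK = 100_000     # assumption: length of a typical book in tokens


def dataset_scale(tokens, bytes_per_token, tokens_per_book):
    """Convert a token count into terabytes of raw text and a book count."""
    terabytes = tokens * bytes_per_token / 1e12  # decimal terabytes
    books = tokens // tokens_per_book
    return terabytes, books


terabytes, books = dataset_scale(TOKENS, BYTES_PER_TOKEN, TOKENS_PER_BOOK)
print(f"{terabytes:,.0f} TB of raw text, about {books:,} books")
```

Under these assumptions the dataset works out to about 160 terabytes of raw text, or several hundred million books.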
The Bottom Line: Google’s End Game for Gemini
Just as PaLM 2 powers everything Google-branded, Gemini is expected to do the same for AI. Google is nurturing Gemini and expects it to grow to become the backbone of all the AI embedded and integrated into every Google product and service.
What end products and services will we see powered by Gemini? If it replaces PaLM 2, Gemini will power everything from Maps to Docs and Translate, all Google Workspace and Cloud environments and services, as well as software, hardware, and new products.
Google is fully committed to building a more powerful, versatile, and context-aware AI capable of understanding and interacting with the world in new and unprecedented ways.
Programmers will use Gemini to code, to automate and enhance cloud and edge operations, and to drive sales, and Gemini will be integrated into chatbots and virtual assistants inside wearable Google tech, smartphones, apps, APIs, and much more.
If 2023 ends up being seen as the year AI hits mainstream awareness and use, 2024 really might be the year of the Gemini.