Google Gemini Goes Live: Here’s What to Expect from the AI

Today, Google announced the launch of its new multimodal AI model called Gemini, designed to understand and recognize text, images, video, audio, and code.

“Gemini is the result of large-scale collaborative efforts by teams across Google, including our colleagues at Google Research,” CEO and co-founder of Google DeepMind Demis Hassabis wrote in the official blog post.

“It was built from the ground up to be multimodal, which means it can generalize and seamlessly understand, operate across, and combine different types of information including text, code, audio, image, and video,” Hassabis wrote.

Google announces Gemini

There are three confirmed versions of the model, each aimed at a different use: Gemini Ultra, Gemini Pro, and Gemini Nano. Gemini Ultra is the largest, Gemini Pro is designed to scale across a range of tasks, and Gemini Nano is the most efficient model for on-device tasks, which makes it suitable for mobile devices.

Gemini - Three Types

As of today, Gemini has been added to Google’s Bard chatbot, and Gemini Nano will be added to the Pixel 8 Pro to power summary and smart-reply capabilities in December.

So, Just How Good is Google Gemini?

The release comes just a month after OpenAI announced the launch of GPT-4 Turbo and its own multimodal model, GPT-4V, which can understand image inputs.

While it’s too early to conclude that Gemini has overtaken OpenAI and GPT-4, it certainly does look that way. In an interview with The Verge, Hassabis confirmed that Google had tested Gemini against GPT-4 across 32 benchmarks and found that Gemini was “substantially ahead” on 30 of those.

One of Gemini’s standout achievements so far is that Gemini Ultra has become the first model to outperform human experts on Massive Multitask Language Understanding (MMLU), achieving a score of 90.0%.

At the same time, Gemini Ultra has scored just above GPT-4 in a range of benchmarks, including:

Big-Bench Hard (83.6% vs 83.1%),
DROP (82.4% vs 80.9%),
GSM8K (94.4% vs 92.0%),
MATH (53.2% vs 52.9%),
HumanEval (74.4% vs 67.0%).

This indicates that Gemini Ultra has a slight edge over GPT-4 in multi-step reasoning, reading comprehension, basic arithmetic manipulations, and Python code generation.
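To put a number on that edge, the short Python sketch below computes Gemini Ultra’s lead over GPT-4 in percentage points, using only the scores quoted in the list above. The `scores` table and the `margins` helper are illustrative names introduced here, not part of any Google or OpenAI tooling.

```python
# Scores as quoted in the article: (Gemini Ultra %, GPT-4 %).
# The dictionary layout and helper name are illustrative assumptions.
scores = {
    "Big-Bench Hard": (83.6, 83.1),
    "DROP": (82.4, 80.9),
    "GSM8K": (94.4, 92.0),
    "MATH": (53.2, 52.9),
    "HumanEval": (74.4, 67.0),
}


def margins(table):
    """Return Gemini Ultra's lead over GPT-4 in percentage points, per benchmark."""
    return {name: round(gemini - gpt4, 1) for name, (gemini, gpt4) in table.items()}


# Print the benchmarks from the narrowest lead to the widest.
for name, gap in sorted(margins(scores).items(), key=lambda item: item[1]):
    print(f"{name:<15} +{gap:.1f} points")
```

On these figures the lead ranges from 0.3 points on MATH to 7.4 points on HumanEval, so the gap is narrow on most benchmarks and widest on Python code generation.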

Gemini benchmarks

In addition, Google claims Gemini Ultra also edges out GPT-4 in multimodal performance, natural image understanding, natural image OCR, document understanding, infographic understanding, and mathematical reasoning in visual contexts.

Gemini has also achieved a state-of-the-art score on the MMMU benchmark, which measures performance in multimodal tasks.

To achieve this performance, Gemini was pre-trained on different modalities and then fine-tuned, which Google says improves the model’s ability to understand and reason about different types of inputs beyond that of any LLM to date.

The Bottom Line

With the doors now open, we’ll explore Gemini over the coming weeks and see how the claims stack up against reality.

What is exciting is how Gemini could be plugged in across Google’s suite of services. Will Google Home be easier to use (the odd hallucination aside) when you can have more casual conversations with your ‘house’? Will search results pages look radically different? Will services like Gmail and Google Maps change significantly, with AI sitting between you and the product?

When considering Gemini’s performance on these benchmarks alongside plans to integrate the LLM with popular products like Chrome and Search on the road toward a Search Generative Experience, it’s clear that OpenAI has a serious contender to confront.