Grok 1.5 Nears GPT-4-Level Performance, But Is Still Playing Catch-Up

KEY TAKEAWAYS

  • Grok 1.5, the chatbot from Elon Musk’s AI startup X.AI, shows promising performance improvements, approaching GPT-4-level performance on several benchmarks.
  • Grok’s unique selling point lies in its humorous approach and its freedom from rigid content moderation guidelines, but its market proposition remains uncertain.
  • However, it holds up remarkably well in testing against competitors like ChatGPT and Gemini.
  • It remains a unique alternative in the AI landscape with its humorous approach, providing users with a wry perspective on the world.

In the era of artificial intelligence (AI), one week feels like a year, with newer models arriving faster than a Big Mac.

Grok, the humor-flavored chatbot from Elon Musk’s AI startup X.AI, has now reached version 1.5. As we discuss below, that puts it in contention to be used as widely as ChatGPT, but is it all the way there yet?

Grok 1.5, released at the end of March 2024, not only boasts a context length of 128,000 tokens, the same as GPT-4 Turbo, but also comes close to GPT-4-level performance on key benchmarks, including Massive Multitask Language Understanding (MMLU), MATH (mathematical problem solving), and GSM8K (Grade School Math 8K).

On one benchmark, HumanEval, which measures the code generation capabilities of language models, Grok 1.5 actually outperformed GPT-4, scoring 74.1% compared to GPT-4’s 67%.

These results are promising, but Musk claims they are just a taste of what’s to come. In a post on X, he stated that “Grok 2 should exceed current AI on all metrics” and that it is “in training now.”

Where Does Grok Stand Post Update?

When Grok was first announced in November 2023, there was plenty of hype around its release as Musk’s answer to ChatGPT: an assistant with real-time knowledge of the world via the X platform. Grok 1 has since been released as one of the largest open-source models available, with 314 billion parameters.

Yet shortly after release, it drew criticism from all angles: it was called unfunny, it was called woke, and all the while it failed to reach the level of GPT-4.

The release of Grok 1.5 is a big win for X.AI because it shows that the gap between the humorous chatbot and ChatGPT is closing. But the organization is still playing catch-up.

For starters, there is a question of scale. Grok is only available to X Premium+ subscribers, and it’s unclear how many Premium+ subscribers there are on the platform as of today.

Back in September 2023, Fortune reported that there were 40,000 paying subscribers. By comparison, OpenAI reported that ChatGPT had 100 million weekly active users by November 2023.

Of course, ChatGPT has had longer to establish itself and enjoyed a first-mover advantage. But one of the biggest problems facing Grok is how competitive the large language model (LLM) market is. There are now so many high-performing tools that AI vendors not only need to perform well; they also have to offer concrete differentiation.

Humor’s Place in the AI Market

In an attempt to differentiate Grok from other LLMs, X.AI initially marketed it as “an AI modeled after the Hitchhiker’s Guide to the Galaxy,” with “a bit of wit” and a “rebellious streak,” which “is intended to answer almost anything.” This would include “spicy questions that are rejected by most other AI systems.”

In this sense, Grok’s differentiation was that it would give users a virtual assistant that wasn’t hamstrung by moderation guidelines the way ChatGPT or Bard (now Gemini) are, and that would respond to a wider range of questions in a more lighthearted way.

The problem with this differentiation is that it’s ultimately unclear if Grok’s approach to content moderation and its humorous outputs make it preferable to ChatGPT, Gemini, or Claude 3 in most use cases. For example, content creation and translation don’t leave much of an opportunity for humor.

In addition, generative AI also has a major PR problem in that a substantial proportion of people are anxious about what this technology means for the future.

According to Pew Research, 52% of Americans say they feel more concerned than excited about the increased use of artificial intelligence.

Likewise, research conducted by Forbes Advisor finds that 76% of consumers are concerned about misinformation from AI tools like ChatGPT, Bing Chat, and Gemini.

Given how widespread these concerns are, many users will likely gravitate toward heavily moderated AI tools that emphasize accuracy and harmlessness over humor.

This isn’t to suggest that Grok is dangerous, but that many people are wary enough of language models’ general tendency to produce misinformation, prejudiced output, and harmful content that they are unlikely to embrace a humorous AI assistant, even if X.AI has a robust content moderation policy.

X.AI tried to address such users head-on in its initial promotional release by saying, “Please don’t use it if you hate humor!” However, this positioning is likely to write off a lot of users unless Grok becomes head and shoulders above every other LLM on the market.

Grok vs. GPT-4 and Gemini

ChatGPT with GPT-4 and Gemini are two of Grok’s biggest competitors in the LLM market, and each has strong advantages and market positioning that give it a significant edge.

Since releasing ChatGPT in November 2022, OpenAI has built its flagship GPT-4 model into a multimodal virtual assistant that accepts text, voice, and image input. It has also built the generative AI equivalent of an app store: the GPT Store, where developers can share custom versions of ChatGPT created with the GPT builder.

Similarly, Gemini has built an identity as a multimodal research assistant that can search the web. Google is also rolling out integrations with Google Cloud and Google products like Gmail, Google Docs, and Search.

Other competitors have their own strengths: Microsoft, for example, offers the GPT-4-powered Bing Chat and the Microsoft Office 365 ecosystem.

By comparison, Grok offers very little beyond its promising performance, its connection to X, and its use of humor. The jury is still out on whether this will be enough to let it go toe to toe with OpenAI or Google.

The Bottom Line

Grok has come a long way in a short amount of time, but despite its improved performance, it has a long road ahead before it can be considered on par with ChatGPT and Gemini in the immensely competitive LLM market.

However, as a tool to experiment with and an alternative, wry look at the world? It’s good to have a competitor with a hint of snark.
