What is Grok?
The chatbot is inspired by the sci-fi comedy The Hitchhiker’s Guide to the Galaxy and is powered by xAI’s frontier large language model (LLM), Grok-1. It also has access to real-time data from posts on X (formerly known as Twitter).
As xAI notes in the announcement blog post, Grok is “intended to answer almost anything” and is “designed to answer questions with a bit of wit.” The post also noted that the chatbot aims to help users access information, process data, and discover new ideas.
The @xAI Grok AI assistant will be provided as part of 𝕏 Premium+, so I recommend signing up for that.
Just $16/month via web. https://t.co/wEEIZNjEkp
— Elon Musk (@elonmusk) November 4, 2023
The organization also confirmed that Grok would be available to a limited group of users in the U.S. before a wider release. On November 22, 2023, Musk posted on X that Grok would be available to all Premium+ subscribers the following week.
Grok should be available to all X Premium+ subscribers next week.
— Elon Musk (@elonmusk) November 22, 2023
Grok AI vs. ChatGPT, Other AI Assistants
While the nature of Grok’s training data hasn’t been publicly disclosed, being able to access the high volume of conversational content on X, and potentially some of xAI’s proprietary behind-the-scenes data, could make the chatbot a significant player in the market.
In addition, Grok’s emphasis on humor and wit is a significant point of differentiation from competitors like GPT-4 and Claude 2, which have focused on interacting with users in a conversational but restrained manner and on minimizing harmful outputs. As Musk explained in a post on X, Grok is “based & loves sarcasm.”
Grok has real-time access to info via the 𝕏 platform, which is a massive advantage over other models.
It’s also based & loves sarcasm. I have no idea who could have guided it this way 🤷‍♂️ 🤣 pic.twitter.com/e5OwuGvZ3Z
— Elon Musk (@elonmusk) November 4, 2023
As a result, Grok’s playful approach has the potential to entertain users with witty responses in a way that replicates the lighthearted nature of everyday human interaction.
How Does Grok Perform Against Other LLMs?
With just two months’ worth of training completed, xAI has already reported that the Grok-1 LLM performed well on key AI benchmarks like HumanEval and MMLU, scoring 63.2% and 73%, respectively.
These scores exceeded those of both OpenAI’s GPT-3.5 and Meta’s Llama 2 70B on both benchmarks. For reference, GPT-3.5 scored 48.1% on HumanEval and 70% on MMLU, while Llama 2 70B scored 29.9% and 68.9%.
xAI also reports results from another performance task, which graded Grok-1, Claude 2, and GPT-4 on the May 2023 Hungarian national high school math exam. In this exercise, Grok-1 achieved a C grade with 59%, Claude 2 a C grade with 55%, and GPT-4 a B grade with 68%.
Although Grok falls short of GPT-4-level performance, the fact that it is competitive with LLMs like GPT-3.5, Claude 2, and Llama 2 70B on certain tasks is impressive considering it has been under development for only four months.
It also uses a fraction of the training data and compute resources of LLMs like GPT-4 and Llama 2 70B. While it’s unclear how many parameters Grok-1 has, Grok-0 reportedly had 33 billion parameters; by comparison, Llama 2 70B has 70 billion.
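For quick reference, the benchmark figures quoted above can be tabulated in a few lines of Python. The numbers are simply those reported in xAI’s announcement post, restated here for side-by-side comparison:

```python
# Benchmark scores (percent) as reported in xAI's announcement blog post.
# HumanEval measures code generation; MMLU measures multi-domain knowledge.
scores = {
    "Grok-1":      {"HumanEval": 63.2, "MMLU": 73.0},
    "GPT-3.5":     {"HumanEval": 48.1, "MMLU": 70.0},
    "Llama 2 70B": {"HumanEval": 29.9, "MMLU": 68.9},
}

# Print a simple comparison table.
for model, s in scores.items():
    print(f"{model:12}  HumanEval {s['HumanEval']:5.1f}%  MMLU {s['MMLU']:5.1f}%")
```

Laid out this way, it is easy to see that the reported Grok-1 scores lead all three models on both benchmarks, while the gap to GPT-4 (not shown; xAI reported only the Hungarian exam comparison for it) remains.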
Research Team Behind Grok
xAI launched in March 2023 and is composed of experienced AI researchers who previously worked at organizations including OpenAI, DeepMind, Google Research, and the University of Toronto.
This includes Igor Babuschkin, Manuel Kroiss, Yuhuai Wu, Christian Szegedy, Jimmy Ba, Toby Pohlen, Ross Nordeen, Kyle Kosic, Greg Yang, Guodong Zhang, Zihang Dai, Xiao Sun, Fabio Aguilera-Convers, Ting Chen, and Szymon Tworkowski.
The company’s researchers have contributed to a wide range of innovations in the space, including GPT-4, GPT-3.5, AlphaStar, AlphaCode, Inception, Minerva, the Adam optimizer, batch normalization, layer normalization, Transformer-XL, autoformalization, and batch size scaling.
Overall, the highly experienced team of researchers behind Grok suggests that xAI has the potential to be an important vendor in the generative AI market going forward.
The Potential for Harmful Output
As an LLM-driven chatbot, Grok faces the same challenges as all other language models in that it can be prompted or jailbroken to produce harmful, discriminatory, or illegal content.
However, it’s unclear whether Grok’s emphasis on providing humorous and witty responses to user prompts will amplify the risk of creating content that some users may find offensive.
As xAI notes, Grok has a “rebellious streak” and will answer questions rejected by other AI systems, which means that there are potentially more opportunities for offensive content to be generated.
Other Challenges: Bias from X
Another potential risk factor is the use of real-time data from X. Historically, X, when it was known as Twitter, drew significant criticism over the spread of toxicity and misinformation on the platform.
For example, Pew Research found that 17% of users have experienced harassing or abusive behavior on the platform, and 33% have seen a lot of inaccurate or misleading information.
This means there is a risk that some of the toxicity and misinformation on the platform could leak into Grok’s training data and create harmful biases and responses. As a result, a significant amount of content moderation will need to be in place to prevent toxic or inaccurate content from filtering into outputs.
So far, xAI appears to be working to minimize the risk of harmful outputs. The company highlighted in its blog post that the team is “interested in improving the robustness of LLMs” and “doing our utmost to ensure that AI remains a force for good.” It is actively being advised by Dan Hendrycks, the director of the Center for AI Safety.