OpenAI’s GPT-4o: Everything We Know So Far


Today OpenAI announced GPT-4o, a major update to the large language model (LLM) that more than 100 million people are using.

The features, which will roll out over the next few weeks, bring speech and video to all users, free or paid, and the biggest takeaway is just what a difference using voice and video to interact with GPT-4o makes.

The changes, OpenAI told viewers on the live-stream, are aimed at “reducing the friction” between “humans and machines” and “bringing AI to everyone.”

In a stunning demo, technology chief and presenter Mira Murati, along with ChatGPT developers, held real-time conversations with ChatGPT, asking it for a bedtime story.

GPT-4o even made jokes in different voices, from playful to dramatic to singsong, at the request of OpenAI researcher Mark Chen.

We saw video capabilities, real-time voice communication, and simulated emotion during the voice demo.


Key Takeaways

  • OpenAI’s GPT-4o introduces speech and video capabilities, enabling users to interact with the model through voice and video inputs.
  • The update aims to reduce the friction between humans and machines by leveraging advanced AI capabilities to create more natural and seamless interactions.
  • GPT-4o can engage in real-time conversations, respond to multiple speakers simultaneously, and even simulate emotions, adding depth and richness to interactions.
  • The upgrade includes improvements in quality and speed across over 50 languages, as well as a desktop version for Mac users.
  • OpenAI acknowledges the challenges related to misuse of real-time audio and video capabilities, and emphasizes it will work with stakeholders to address these challenges responsibly.
  • GPT-4o rolls out iteratively over the coming weeks, including a Desktop app starting with the Mac.

When using video, ChatGPT held real-time conversations with the engineers, solving math equations written on paper in front of a phone lens while keeping up a playful conversation.

Watch the OpenAI livestream

OpenAI says the features, which will roll out over the next few weeks, will also boost quality and speed in over 50 languages “to bring this experience to as many people as possible”.

The upgrade also includes a desktop version, rolling out today on the Mac and available to paid users.

The team talked about university lecturers offering tools to their students, podcasters creating content for their listeners, and ways to use real-time data in your own work.

OpenAI says GPT-4o (the ‘o’ stands for ‘Omni’) can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds — similar to human response time in a conversation.

While the features will be available to free users, OpenAI stressed that paying Plus subscribers are not left out: they can access up to five times the message capacity.

The changes will also flow through to the application programming interface (API), which OpenAI says is 2x faster and 50% cheaper.
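For developers, that amounts to a one-parameter change in existing chat-completion code. Here is a minimal sketch, assuming the official OpenAI Python SDK and the gpt-4o model identifier from the announcement; the prompt text is illustrative only:

# Minimal sketch: calling the new model through the OpenAI API.
# Assumes the official OpenAI Python SDK (pip install openai) and an
# OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # the new model; previously e.g. "gpt-4-turbo"
    messages=[
        {"role": "user", "content": "Summarize GPT-4o's launch in one sentence."},
    ],
)
print(response.choices[0].message.content)

Because switching models is a single-line change, the price and speed improvements are immediately available to anyone already building on the API.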

One impressive feature of the voice and video demo was that all three presenters talked to ChatGPT at the same time, and the artificial intelligence (AI) successfully discerned each speaker and talked back to each of them.

Some users on X, formerly Twitter, compared the new flavor of ChatGPT to the movie “Her,” in which the all-knowing AI companion was indistinguishable from a human personality.

We also saw real-time translation between Italian and English, prompted by a user question on X.

OpenAI Technology Chief and presenter Mira Murati introduces OpenAI GPT-4o.

OpenAI stated that “GPT-4o presents new challenges to real-time audio and real-time vision against misuse, and we continue to work with different stakeholders … to figure how to best bring these technologies into the world.”

The features will, therefore, be rolled out iteratively over the next few weeks, with safeguards intact.

Asked for comment, Brian Jackson, Principal Research Director at Info-Tech Research Group, said:

“After watching OpenAI’s live event today, my takeaway is that the release of GPT-4o represents both a significant upgrade to ChatGPT’s capabilities and insight into its business strategy.

“So far, ChatGPT has been orchestrating across multiple models to handle visual interpretation, audio analysis, and understanding text. GPT-4o changes that by bringing together those capabilities natively under one unified model.

“In an onstage demo that strongly evoked the 2013 Spike Jonze movie Her, members of the OpenAI team had a real-time conversation with the updated model. It fluidly detected emotion in the users’ voice, paused when it was interrupted and adjusted its responses accordingly, and understood a math question drawn on paper by processing the view through a smartphone’s camera.

“It’s like a super-charged version of Siri or Google Assistant that promises to disrupt our concept of AI personal assistants.

“Beyond the model itself, OpenAI hinted a bit at its business strategy with the release. By lowering the cost of queries on the new model by 50% compared to GPT-4, OpenAI said that it could now afford to bring the new model to all users, not just paying subscribers.

“This suggests that OpenAI is more interested in drawing in a large number of users than in driving as many paying subscribers as possible and then improving its margin on the subscription service.

“OpenAI also made other formerly paid-for features available to free users, including browsing the web for information and uploading a file for analysis.

“OpenAI also took the opportunity to reference its Custom GPT ‘store’, which has been available for months. It envisions a future where micro-communities form around these Custom GPTs.

“For example, it was suggested a professor could make a Custom GPT for their students, or that a podcaster could make one for listeners.

“This suggests a network business model approach in which the use of ChatGPT is driven as much by a creator community as by OpenAI’s developers themselves, similar to Apple’s relationship with its iOS developer community.

“OpenAI says limits on querying GPT-4o will still be in place for free users, so there is still some incentive to use the paid version.

“I’d expect that its new capabilities will make it a killer feature for smartphones in the very near future.”

OpenAI said in a blog post:

“We spent a lot of effort over the last two years working on efficiency improvements at every layer of the stack.

“As a first fruit of this research, we’re able to make a GPT-4 level model available much more broadly. GPT-4o’s capabilities will be rolled out iteratively (with extended red team access starting today).


GPT-4o’s text and image capabilities are starting to roll out today in ChatGPT. We are making GPT-4o available in the free tier, and to Plus users with up to 5x higher message limits. We’ll roll out a new version of Voice Mode with GPT-4o in alpha within ChatGPT Plus in the coming weeks.”

OpenAI chose a good day for the attention-grabbing update, landing a day before Google’s I/O developer conference, which is expected to be AI-heavy.

Eddie Wrenn
Senior Content Editor

Eddie is Techopedia's Senior Editor who has previously worked in local, national, and international newsrooms in the UK and Australia, including Mail Online and Sydney's Daily Telegraph over the past 20 years. As a former science and technology editor, he focuses on emerging technologies and breaking news at Techopedia. He has also previously worked in product teams at Microsoft and News Corp, where he focused on introducing new editorial tools to newsrooms. He currently resides in London, UK, and spends his free time reading and scuba diving.