At DevDay 2024 on October 1st, OpenAI announced API updates aimed at helping developers customize models, build speech applications, cut costs, and enhance the performance of smaller models.
At its San Francisco event, OpenAI highlighted incremental improvements to its AI tools and APIs instead of major product launches.
Today at DevDay SF, we’re launching a bunch of new capabilities to the OpenAI platform.
— OpenAI Developers (@OpenAIDevs) October 1, 2024
The company introduced four key API updates:
- Model Distillation
- Prompt Caching
- Vision Fine-Tuning
- Realtime API
These tools reflect OpenAI’s shift toward empowering its developer ecosystem instead of competing directly in the end-user application market.
Realtime API
OpenAI has made its Advanced Voice Mode available to all ChatGPT subscribers and is now enabling developers to create speech-to-speech applications. Previously, building AI-powered applications that spoke to users required transcribing audio, processing it with a language model like GPT-4, and converting the response back to speech, an approach that often introduced noticeable latency.
Starting this week, Advanced Voice is rolling out to all ChatGPT Enterprise, Edu, and Team users globally. Free users will also get a sneak peek of Advanced Voice.
Plus and Free users in the EU…we’ll keep you updated, we promise.
— OpenAI (@OpenAI) October 1, 2024
The new Realtime API processes audio directly and with low latency, without chaining together separate transcription and text-to-speech systems. It supports function calling, enabling tasks like ordering pizza or scheduling appointments, with future updates planned for multimodal experiences, including video.
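To make the shape of the new API concrete, here is a minimal sketch of opening a Realtime session over WebSocket and requesting a spoken reply. It assumes the `websockets` Python package and an `OPENAI_API_KEY` environment variable; the model name, beta header, and event types follow OpenAI’s launch documentation and may change.

```python
# Minimal sketch of a Realtime API session: connect, ask for a spoken
# response, and watch events stream back over the same socket.
import asyncio
import json
import os

import websockets  # header kwarg is `extra_headers` in older releases,
                   # `additional_headers` in newer ones

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01"

async def main():
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }
    async with websockets.connect(URL, extra_headers=headers) as ws:
        # Ask the model to respond; audio arrives as streamed delta events.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["audio", "text"],
                "instructions": "Greet the caller.",
            },
        }))
        async for message in ws:
            event = json.loads(message)
            print(event["type"])  # e.g. response.audio.delta, response.done
            if event["type"] == "response.done":
                break

asyncio.run(main())
```

The design point is that audio deltas stream back over the same socket, so there is no separate transcription or text-to-speech hop to add latency.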
For text, the API costs $5 per million input tokens and $20 per million output tokens. Audio is priced at $100 per million input tokens and $200 per million output tokens, which works out to approximately $0.06 per minute of audio input and $0.24 per minute of audio output.
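The per-minute figures follow directly from the token prices; a quick back-of-the-envelope check shows the implied audio token rates:

```python
# Sanity-check of the quoted Realtime API audio pricing. All dollar figures
# come from the announcement; the tokens-per-minute rates are derived from
# them, not published constants.
AUDIO_INPUT_PER_M = 100.0   # $ per 1M audio input tokens
AUDIO_OUTPUT_PER_M = 200.0  # $ per 1M audio output tokens
INPUT_PER_MIN = 0.06        # $ per minute of audio input (quoted)
OUTPUT_PER_MIN = 0.24       # $ per minute of audio output (quoted)

input_tokens_per_min = INPUT_PER_MIN / (AUDIO_INPUT_PER_M / 1_000_000)     # 600
output_tokens_per_min = OUTPUT_PER_MIN / (AUDIO_OUTPUT_PER_M / 1_000_000)  # 1200

print(f"~{input_tokens_per_min:.0f} input tokens/min, "
      f"~{output_tokens_per_min:.0f} output tokens/min")
# -> roughly 10 tokens per second of input audio, 20 per second of output.
```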
Introducing Vision to the Fine-Tuning API
Developers can now fine-tune GPT-4o with images, improving its visual recognition for applications like visual search, object detection, and medical image analysis.
For instance, OpenAI says that Grab, a food delivery and rideshare company, transforms driver-collected street imagery into mapping data for GrabMaps. With only 100 examples, Grab fine-tuned GPT-4o to localize traffic signs and count lane dividers, improving lane-count accuracy by 20% and speed-limit-sign localization by 13% while automating its mapping process.
To support developers, OpenAI will offer one million free training tokens daily in October. Starting in November, fine-tuning GPT-4o with images will cost $25 per million tokens.
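As a rough illustration of the workflow, the sketch below writes one image-annotation example in the chat-style JSONL format (with `image_url` content parts) and submits a fine-tuning job via the official `openai` Python client. The question, answer, image URL, and model snapshot are placeholders patterned on the Grab use case, not taken from OpenAI’s materials.

```python
# Hypothetical sketch: prepare one vision fine-tuning example and submit
# a job. A real dataset needs many such JSONL lines, one object per line.
import json
from openai import OpenAI

client = OpenAI()

example = {
    "messages": [
        {"role": "user", "content": [
            {"type": "text", "text": "How many lane dividers are visible?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/street-view.jpg"}},
        ]},
        {"role": "assistant", "content": "3"},
    ]
}

with open("train.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")

upload = client.files.create(file=open("train.jsonl", "rb"),
                             purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=upload.id,
    model="gpt-4o-2024-08-06",  # assumed GPT-4o snapshot for vision tuning
)
print(job.id)
```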
Prompt Caching
Prompt Caching reduces API costs by letting developers reuse frequently repeated prompts at a discounted rate. Long prefixes, often used to guide model behavior and improve consistency, typically drive up the cost of each API call.
OpenAI’s API now automatically caches lengthy prefixes for up to an hour and applies a 50% discount when they are reused. This feature covers the latest GPT-4o, GPT-4o mini, o1-preview, and o1-mini models, as well as fine-tuned versions of those models, helping developers save money.
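In practice, benefiting from caching mostly means keeping the long, static part of the prompt identical and up front across calls, as in the sketch below. The `prompt_tokens_details.cached_tokens` usage field is how cache hits were reported at launch; treat the exact field names as an assumption.

```python
# Sketch of structuring requests so prompt caching can kick in: the long,
# unchanging prefix goes first, the per-request content goes last.
from openai import OpenAI

client = OpenAI()

# Static prefix (system prompt, few-shot examples, tool schemas). Caching
# only applies past a minimum prompt length (about 1,024 tokens at launch),
# so a short prefix like this placeholder would not actually hit the cache.
STATIC_PREFIX = "You are a support agent for Acme Corp. ..."

def answer(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": STATIC_PREFIX},  # identical each call
            {"role": "user", "content": question},         # varies per call
        ],
    )
    # On a cache hit, part of the prompt is billed at the discounted rate.
    details = getattr(resp.usage, "prompt_tokens_details", None)
    print("cached tokens:", getattr(details, "cached_tokens", 0))
    return resp.choices[0].message.content

answer("How do I reset my password?")
answer("What is your refund policy?")  # second call can reuse the cached prefix
```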
Model Distillation
Model Distillation enhances smaller models, like GPT-4o mini, by fine-tuning them on outputs from larger models. Previously, the process was error-prone, requiring developers to manually orchestrate separate steps for dataset generation, fine-tuning, and performance measurement. The new Model Distillation suite in the API platform streamlines this by enabling developers to create datasets with advanced models, fine-tune smaller models, and assess their performance on specific tasks.
To assist developers with distillation, OpenAI is offering two million free training tokens daily for GPT-4o mini and one million for GPT-4o until October 31. Beyond this limit, training and operating a distilled model will be priced at OpenAI’s standard fine-tuning rates.
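The capture half of that workflow can be sketched as follows: production calls to the larger model are logged with `store=True` so their inputs and outputs can later seed a distillation dataset for GPT-4o mini. The metadata tag is an invented example, and the subsequent dataset selection and fine-tune launch happen on the platform afterwards rather than in this snippet.

```python
# Sketch of logging "teacher" completions for later distillation.
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4o",  # the large teacher model
    messages=[{"role": "user",
               "content": "Summarize this support ticket: ..."}],
    store=True,                                  # persist input/output pairs
    metadata={"use_case": "ticket-summaries"},   # hypothetical tag for filtering
)
print(resp.choices[0].message.content)
```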