OpenAI’s o1 model, launching today, is trained to think through complex math problems more extensively, try different strategies, and recognize its own mistakes.
In tests, the new model “performs similarly to PhD students” on physics, chemistry, and biology tasks, according to OpenAI. The company also claims the model excels at coding and math.
In a qualifying exam for the International Mathematical Olympiad, the model solved 83% of problems correctly, compared to GPT-4o’s 13% score.
Because this model is a preview, it can’t yet browse the web, accept file or image uploads, or perform many of the other useful tasks that ChatGPT can. For many everyday tasks, GPT-4o remains the more capable model for the time being.
For complex math, coding, and science tasks, o1 is a “significant advancement,” says OpenAI. The company also says it named the new series “OpenAI o1” because it “represents a new level of AI capability.”
In its blog post, OpenAI said its target audience is anyone looking to solve complex science, coding, math, or similar problems, citing examples such as developers looking to build and execute workflows or physicists wishing to generate complex mathematical formulas.
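For developers curious what building o1 into a workflow looks like in practice, here is a minimal sketch using OpenAI’s official Python SDK. The model identifier `o1-preview` comes from OpenAI’s announcement; the restriction to user-only messages reflects the preview’s limited feature set at launch and is an assumption that may change over time.

```python
# Minimal sketch: asking o1-preview a math question via OpenAI's Python SDK.
# Assumes the `openai` package (v1+) is installed and OPENAI_API_KEY is set
# in the environment.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

# The preview model initially accepted only user messages (no system prompt),
# so the request is kept deliberately simple.
response = client.chat.completions.create(
    model="o1-preview",
    messages=[
        {
            "role": "user",
            "content": "Derive a closed-form formula for the sum of the first n squares.",
        }
    ],
)

print(response.choices[0].message.content)
```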
The model was also trained with a new safety approach, and OpenAI says it holds up better against abuse: on one of the company’s hardest jailbreaking tests (scored on a scale of 0 to 100), o1-preview scored 84, while GPT-4o managed a less impressive 22.
OpenAI has also started to operationalize its agreements with the US and UK AI Safety Institutes, which will get early access to a research version of the model for evaluation and testing before and after its public release.