OpenAI’s O1 Series: What’s new in reasoning models?

September 16, 2024

OpenAI has launched its new O1 series of AI models, designed to enhance performance in complex reasoning tasks. Previously referred to as Project Strawberry, these models demonstrate significant improvements over their predecessors, particularly in solving difficult reasoning and mathematical questions.

The O1 series introduces a novel approach by prioritising careful deliberation before providing answers. Unlike earlier models that often rushed responses, the O1 series spends additional time analysing queries to deliver more accurate and nuanced replies.

This new capability is aimed at professionals such as chemists, physicists, and engineers, helping them solve complex problems and develop new solutions.

In a statement, the company said: “We trained these models to think through problems in a more human-like manner. They learn to refine their thinking, explore various strategies, and recognise their mistakes.”

These models can tackle intricate, multistep problems and adjust their strategies based on feedback, mirroring human reasoning processes. Benchmarks show that the O1 series outperforms previous models in code generation and handling complex problems. For instance, the models have achieved results comparable to PhD students on challenging tasks in physics, chemistry, and biology.

The O1 series excels in mathematics and coding. In a qualifying exam for the International Mathematical Olympiad (IMO), the new reasoning model solved 83% of problems, a notable improvement over GPT-4o, which solved only 13%.

OpenAI has made the O1-preview and O1-mini models available to all subscribers of its premium ChatGPT Plus and ChatGPT Team products, as well as to developers in the highest usage tier of its API.

These enhanced reasoning capabilities offer valuable support in fields requiring complex problem-solving, such as healthcare research, quantum optics, and multi-step workflows in development.

Key points on OpenAI’s O1 series

While these new models show significant improvements, they are not examples of artificial general intelligence (AGI) and still fall short of human-level reasoning.

The O1 series may prompt competitors like Google and Meta to accelerate their own AI developments.

The models employ techniques such as “chain of thought” reasoning and reinforcement learning, yet OpenAI has not detailed their training processes.

The O1-preview models are accessible to premium ChatGPT subscribers and top developers, but they come with high usage costs.

OpenAI has opted not to disclose the models’ internal reasoning steps.

The scaling behaviour reported for O1 suggests that accuracy improves when the model is given more time to process a query, which in turn affects how the models are deployed and what each response costs. Despite its advanced capabilities, O1 also raises concerns about unintended actions and other safety risks.
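To make the cost side of that trade-off concrete, here is a minimal sketch. It assumes that the extra deliberation is billed as additional output tokens at the same rate as the visible answer. The function name, the token counts, and the price are hypothetical placeholders chosen for illustration, not OpenAI's published figures.

```python
def estimate_cost(visible_tokens: int, reasoning_tokens: int,
                  price_per_1k_output: float) -> float:
    """Estimate the cost of one response.

    Assumption: hidden reasoning tokens are billed at the same rate as
    visible output tokens. The result is in the same unit as the price.
    """
    total_tokens = visible_tokens + reasoning_tokens
    return total_tokens / 1000 * price_per_1k_output


# Hypothetical price per 1,000 output tokens (placeholder, not a real rate).
PRICE = 0.06

# Same visible answer length, with and without extra deliberation.
quick = estimate_cost(visible_tokens=500, reasoning_tokens=0,
                      price_per_1k_output=PRICE)
deliberate = estimate_cost(visible_tokens=500, reasoning_tokens=4000,
                           price_per_1k_output=PRICE)

print(f"quick: {quick:.2f}  deliberate: {deliberate:.2f}")
```

Under these assumptions the visible answer is the same length in both cases, so the whole difference in cost comes from the time spent reasoning before the answer is produced.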

The new models also come with improvements in safety. In a jailbreaking test, which assesses how well models adhere to safety rules when users attempt to bypass them, GPT-4o scored 22 out of 100. In contrast, the O1-preview model achieved a score of 84, indicating a stronger adherence to safety protocols.
