October 10, 2024
OpenAI's o1 Models Show Promise but Face Challenges
The latest release from OpenAI features two new models, o1-preview and o1-mini, which have sparked interest in the AI community. Both are designed to reason through a problem at inference time before answering, with the aim of improving planning and decision-making. But just how effective are these new models?
To find out, a recent research paper compared the o1 models to the well-known GPT-4. The study used six planning benchmarks that focus on spatial reasoning and logic, tasks that are straightforward for humans but challenging for AI. The ARC Prize, a million-dollar award aimed at progress toward Artificial General Intelligence (AGI), inspired the study's approach: unlike typical benchmarks that measure skill at a fixed task, it treats intelligence as the ability to acquire new skills efficiently.
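A comparison like this boils down to sending both models the same planning prompt and scoring what comes back. As a minimal sketch of the prompting side, assuming the OpenAI Python SDK and an illustrative task (this is not the paper's actual evaluation harness):

```python
# Minimal sketch: send the same planning task to two models.
# The task text is illustrative; the paper's harness is more involved.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

TASK = (
    "Blocks A, B, C are stacked as C on B on A. "
    "Goal: A on B on C. List the moves, one per line."
)

def get_plan(model: str) -> str:
    """Ask a model for a plan and return its raw text answer."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": TASK}],
    )
    return response.choices[0].message.content

for model in ("o1-preview", "gpt-4"):
    print(f"--- {model} ---")
    print(get_plan(model))
```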
The research tested the models on three aspects: feasibility, optimality, and generalizability. Feasibility asks whether the AI can devise a workable plan at all. Optimality checks whether that plan is efficient, with no wasted steps. Generalizability assesses whether the model can apply learned skills to new, unfamiliar tasks.
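To make the first two criteria concrete, here is a hypothetical scoring sketch (the function names and encodings are my own, not the paper's code): a plan is feasible if every step is legal and the goal is reached, and optimal if it is also no longer than the shortest known solution.

```python
# Hypothetical sketch: score a plan for feasibility and optimality,
# given a step checker, a start state, a goal test, and the optimal length.
from typing import Callable, List, Optional

def evaluate_plan(
    plan: List[str],
    apply_step: Callable[[dict, str], Optional[dict]],  # next state, or None if illegal
    start: dict,
    is_goal: Callable[[dict], bool],
    optimal_length: int,
) -> dict:
    state = start
    for step in plan:
        state = apply_step(state, step)
        if state is None:  # one illegal move makes the whole plan infeasible
            return {"feasible": False, "optimal": False}
    feasible = is_goal(state)  # feasible = all steps legal AND goal reached
    optimal = feasible and len(plan) <= optimal_length  # no wasted steps
    return {"feasible": feasible, "optimal": optimal}
```

Generalizability is probed differently: the same evaluation is re-run on variants of a task whose surface details, such as object names, have been changed.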

The o1 models showed promise, particularly in self-evaluation and constraint-following. However, they struggled with decision-making and memory management, especially on tasks that demand robust spatial reasoning, a common weakness across language models. AI researcher Yann LeCun has argued that language alone may not be enough to support high-level spatial reasoning.
The paper evaluated the models on six task domains, such as Barman and Blocksworld, each requiring the AI to plan and execute a specific goal while following the domain's rules. The o1 models excelled in some areas but fell short in others, especially as complexity increased: o1-preview handled simpler tasks well but struggled with the more complex spatial ones.
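Blocksworld is simple enough to sketch in a few lines. The toy simulator below is my own illustration of the kind of rule-following these domains demand, not the paper's environment:

```python
# Toy Blocksworld: each stack is a list with the bottom block first.
def move(stacks: list[list[str]], src: int, dst: int) -> bool:
    """Move the top block of stack src onto stack dst; False if illegal."""
    if not stacks[src]:  # rule: you cannot take from an empty stack
        return False
    stacks[dst].append(stacks[src].pop())  # rule: only top blocks move
    return True

stacks = [["A", "B", "C"], [], []]       # C on B on A (bottom first)
plan = [(0, 1), (0, 2), (2, 1), (0, 1)]  # unstack, then rebuild reversed
for src, dst in plan:
    assert move(stacks, src, dst), f"illegal move {(src, dst)}"
print(stacks)  # [[], ['C', 'B', 'A'], []] -- goal reached: A on B on C
```

A model passes this kind of test only if every individual move is legal and the final state matches the goal, which is exactly where longer, more complex plans give more chances to slip.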
One domain, Tyreworld, showed o1-preview excelling when tasks used familiar symbols. But when those symbols were swapped for abstract ones, its performance dropped, highlighting the challenge of generalization and the difficulty AI faces in adapting to new, abstract environments.
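The symbol-change manipulation is easy to reproduce in spirit: rewrite the task with arbitrary fresh names and see whether performance survives. A hypothetical sketch, using a naive string substitution purely for illustration:

```python
# Hypothetical sketch of the symbol-substitution probe: replace familiar
# words with meaningless tokens, then re-test the model on the renamed copy.
def rename_symbols(task: str, mapping: dict[str, str]) -> str:
    for old, new in mapping.items():
        task = task.replace(old, new)  # naive replace; fine for a sketch
    return task

original = "Fetch the spare tire from the boot and inflate it."
abstract = rename_symbols(original, {"tire": "glorp", "boot": "zenth"})
print(abstract)  # Fetch the spare glorp from the zenth and inflate it.
```

The underlying task is unchanged, so a model that has genuinely learned the skill, rather than memorized the vocabulary, should score about the same on both versions.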
The study also noted that while the o1 models often found feasible solutions, those solutions weren't always optimal: the models sometimes added redundant steps, a clear sign of inefficiency. Improved decision-making frameworks could help close this gap.
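One simple way to quantify that inefficiency (my framing, not necessarily the paper's exact metric) is the ratio of the model's plan length to the shortest known plan:

```python
# Illustrative metric: 1.0 means optimal; anything higher means wasted steps.
def optimality_ratio(plan_length: int, optimal_length: int) -> float:
    return plan_length / optimal_length

print(optimality_ratio(9, 6))  # 1.5 -> the plan is 50% longer than needed
```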
Despite these challenges, the o1 models represent real progress in AI development. They show improved rule-following and state management compared with older models. Yet the road to true AGI remains long. The study suggests directions for improvement, such as integrating multimodal inputs and leveraging human feedback for continuous learning.
The o1 models reflect genuine advances but also highlight areas that need further work. These insights matter for future AI development, especially the push toward models that can generalize knowledge across different domains.