There’s a lot of hype on social media surrounding OpenAI’s final announcement of its “12 Days of OpenAI” extravaganza. Following the release of o1 (codenamed Strawberry) earlier this year, the company has now announced o3. (o2 was not immediately available for comment.)
Here’s arguably the most important tweet regarding o3, from an AI researcher who designed one of the best-known benchmarks meant to measure true reasoning ability:
Today OpenAI announced o3, its next-gen reasoning model. We've worked with OpenAI to test it on ARC-AGI, and we believe it represents a significant breakthrough in getting AI to adapt to novel tasks.

It scores 75.7% on the semi-private eval in low-compute mode (for $20 per task… pic.twitter.com/ESQ9CNVCEA

— François Chollet (@fchollet) December 20, 2024
This is a big milestone in AI research, but it has also led to some hyperbole:
Straight facts. Let that sink in.

1) there is no wall
2) AGI is here, next step ASI
3) exponential development is real
4) we are still accelerating
5) no end in sight
6) from o1 to o3 was 3 months! https://t.co/yLcHIy4lL9 pic.twitter.com/bBMMjq3jCp

— Chubby♨️ (@kimmonismus) December 21, 2024
As well as some ridiculous nonsense:
If you are a software engineer who’s three years into your career: quit now. there is not a single job in CS anymore. it's over. this field won't exist in 1.5 years.
— null (@nullpointered) December 19, 2024
But there are those who have pushed back on the hype:

How to achieve AGI in 2024:

1. Define a benchmark with puzzles and call it "The AGI Testing Benchmark."
2. Fine-tune a VLM to solve these puzzles.
3. Declare the AGI achievement.

— Andriy Burkov (@burkov) December 20, 2024
We still don’t know exactly how o1 and o3 work, but most assume they pair some form of search over reasoning chains with GPT-4.5 LLM(s). Inference costs to solve the most challenging problems are quite high ($1,000+ per task in high-compute mode).
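To make the “search plus an LLM” idea concrete: one of the simplest ways to spend extra test-time compute is to sample many candidate answers and keep the most common one (often called self-consistency or majority voting). The sketch below is purely illustrative and is not a description of how o1/o3 actually work; `sample_answer` is a hypothetical stand-in for a stochastic LLM call, simulated here with a solver that is right 60% of the time.

```python
import random
from collections import Counter

def sample_answer(problem: str) -> str:
    """Stand-in for one stochastic LLM sample (hypothetical -- any model
    API could go here). Simulates a solver that answers "42" correctly
    60% of the time and an arbitrary wrong number otherwise."""
    return "42" if random.random() < 0.6 else str(random.randint(0, 99))

def majority_vote(problem: str, n_samples: int = 101) -> str:
    """Spend more test-time compute: draw n_samples candidate answers
    and return the most frequent one."""
    counts = Counter(sample_answer(problem) for _ in range(n_samples))
    return counts.most_common(1)[0][0]

if __name__ == "__main__":
    random.seed(0)
    print(majority_vote("What is 6 * 7?"))
```

Even this naive scheme shows the core trade-off: each extra sample costs more inference compute, but aggregating over many samples amplifies a merely-decent solver into a much more reliable one. Presumably o1/o3 do something far more sophisticated, but the cost-versus-accuracy curve in the ARC-AGI results reflects the same underlying trade.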
Ultimately, my opinion is that while this is a big step forward for AI, we still have a long way to go before true AGI. I’ve believed for years that more advanced inference algorithms would be necessary to get closer to human-level intelligence, and this appears to be one successful way of using more test-time compute to solve tougher problems. Progress will continue, but my guess is that we are still a decade or so away from genuine human-level intelligence.