Is OpenAI’s ChatGPT Pro Worth the $200 a Month?
OpenAI has recently released o1 Pro Mode, claiming it is the smartest model in the world. But does it live up to the hype? I signed up for the service, tested it, and dug into the release notes to find out.
The Cost of ChatGPT Pro
To access ChatGPT Pro, you need to pay $200 a month. For that, you get unlimited access to Advanced Voice and to o1, the full version of the o1-preview model we have been testing.
However, if you stay on the $20-a-month tier, you still get access to o1, just not o1 Pro Mode. OpenAI warns that remaining on the $20 tier will limit your access to its latest advancements in AI.
Benchmark Performance
I ran benchmark tests on o1 and o1 Pro Mode. The results were surprising.
o1 and o1 Pro Mode are both significantly better at mathematics than previous models. However, they are not as good at coding or PhD-level science questions as you might expect.
o1 Pro Mode is not that much better than o1. According to OpenAI, o1 Pro Mode uses a “special way” of aggregating multiple o1 answers and picking the majority-vote answer. This increases reliability, but it does not substantially raise peak performance.
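OpenAI has not disclosed how its aggregation actually works, but the generic idea of majority voting over several sampled answers (often called self-consistency) can be sketched as follows. The `sample_answers` list here is a stand-in for repeated model outputs; it is an illustrative assumption, not OpenAI’s implementation.

```python
from collections import Counter

def majority_vote(answers):
    """Return the most common answer among several sampled answers.

    This is a generic self-consistency sketch: reliability improves
    because occasional wrong samples get outvoted, but the best answer
    the model can produce is unchanged.
    """
    counts = Counter(answers)
    return counts.most_common(1)[0][0]

# Hypothetical samples from five runs of the same question:
sample_answers = ["B", "B", "C", "B", "A"]
print(majority_vote(sample_answers))  # -> B
```

This illustrates why such a scheme boosts consistency more than raw capability: if the model rarely produces the correct answer at all, no amount of voting will surface it.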
The o1 System Card
OpenAI has released a system card for o1. The card contains a variety of benchmarks and tests.
o1 is slightly more persuasive than o1-preview, which is itself slightly more persuasive than GPT-4o. However, o1 is not as good at writing creative tweets as GPT-4o or Anthropic’s Claude 3.5 Sonnet.
o1 Pro Mode is not mentioned in the system card, which suggests that it is not a major improvement over o1.
My Own Benchmark
I ran my own benchmark on o1 and o1 Pro Mode using the 10 questions in the public SimpleBench dataset. The results were disappointing.
o1 and o1 Pro Mode both got 5 out of 10 on the benchmark. This suggests that o1 Pro Mode does not significantly improve performance over o1.
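Scoring a small benchmark like this is simple exact-match accuracy. The harness below is a hypothetical sketch, not the actual SimpleBench code, and the answer key and model answers are placeholders chosen only to reproduce a 5-out-of-10 result.

```python
def score_benchmark(model_answers, answer_key):
    """Count exact matches between model answers and the answer key."""
    correct = sum(
        1 for q, expected in answer_key.items()
        if model_answers.get(q) == expected
    )
    return correct, len(answer_key)

# Placeholder data: 10 questions, model gets the first 5 right.
answer_key = {f"q{i}": "A" for i in range(1, 11)}
model_answers = {f"q{i}": ("A" if i <= 5 else "B") for i in range(1, 11)}

correct, total = score_benchmark(model_answers, answer_key)
print(f"{correct}/{total}")  # -> 5/10
```

With only 10 questions, identical 5/10 scores are weak evidence on their own, but they are consistent with the benchmark results above showing little gap between the two modes.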
Image Analysis and Abstract Reasoning
o1 and o1 Pro Mode can both analyze images. However, neither is particularly impressive at it.
o1 Pro Mode is not very good at abstract reasoning. For example, it failed to identify the pattern in a set of images.
OpenAI Research Engineer Interview Questions
o1-preview does better than o1 on OpenAI research engineer interview questions, both pre- and post-mitigation.
o1 Mini also does better than o1 on these questions.
Vanilla Hallucinations
The performance difference between o1 and o1-preview on vanilla hallucinations is slight.
o1-preview also outperforms o1 on one fairly important machine learning benchmark that tests the models’ ability to self-improve.
Safety
OpenAI claims that o1 outperforms o1-preview on difficult real-world questions and reduces major errors by 34%. However, it does not provide any details on what these questions are.
o1 does outperform all other OpenAI models at working in different languages.
Conclusion
Overall, I am not particularly impressed with o1 Pro Mode, or with the full o1 for that matter. Neither comes close to justifying $200 a month for Pro Mode.
I believe OpenAI will release a limited preview of GPT-4o’s successor within the next 11 days. That would explain why Sam Altman said they are not hitting a wall on these benchmarks.
Thank you for watching. Have a wonderful day.