Math Whole Number On the Benchmark

New secret math benchmark stumps AI models and PhDs alike

On Friday, research organization Epoch AI released FrontierMath, a new mathematics benchmark that has been turning heads in the AI world because it contains hundreds of expert-level problems that ...

Unite.AI

From Math Exams to Machine Reasoning: AI’s Latest Struggles

Recently, Artificial Intelligence (AI) has reached a historic milestone in one of the world's toughest math contests, the International Mathematical Olympiad (IMO). Google DeepMind’s Gemini Deep Think ...

Bleeping Computer

Grok 4 benchmark results: Tops math, ranks second in coding

Grok 4 is a huge leap from Grok 3, but how good is it compared to other models in the market, such as Gemini 2.5 Pro? We now have answers, thanks to new independent benchmarks. LMArena.ai, which is an ...

Ars Technica

New secret math benchmark stumps AI models and PhDs alike | Page 2 | Ars OpenForum

FrontierMath: a new benchmark of expert-level math problems designed to measure AI’s mathematical abilities. See how leading AI models perform against the collective mathematics community. They ...
