Flowgorithm Grade Average Examples

OpenAI tested GPT-5, Claude, and Gemini on real-world tasks - the results were surprising

OpenAI had experienced professionals blindly grade outputs from its own GPT-4o, o4-mini, o3, and GPT-5 models, as well as Anthropic's Claude Opus 4.1, Google's Gemini 2.5 Pro, and xAI's Grok 4.
