Beyond coding capabilities, the model performed competitively against Claude Sonnet 4.5, the company's flagship LLM that launched at the end of September, on a series of benchmarks, including the MMMU ...