Anthropic evaluated the model’s programming capabilities using a benchmark called SWE-bench Verified. Sonnet 4.5 set a new industry record with an 82% score. The next two highest scores were also ...
MMLU-Pro holds steady at 85.0, AIME 2025 improves slightly to 89.3, and GPQA-Diamond dips from 80.7 to 79.9. Coding and agent benchmarks tell a similar story, with Codeforces ratings rising from ...
Vivo India has announced the OriginOS 6 Preview Program in India, aimed at bringing “the smoothest Android experience with a refreshed interface.” ...