CODEELO: The New Battleground for AI Programming, New Standards for LLM Capability Assessment ...
MMLU-Pro holds steady at 85.0, AIME 2025 slightly improves to 89.3, while GPQA-Diamond dips from 80.7 to 79.9. Coding and agent benchmarks tell a similar story, with Codeforces ratings rising from ...
Anthropic's new release is the most sophisticated for applications that allow an AI assistant to use a computer as a human ...
Anthropic evaluated the model’s programming capabilities using a benchmark called SWE-bench Verified. Sonnet 4.5 set a new industry record with an 82% score. The next two highest scores were also ...
Artificial intelligence has taken many forms over the years and is still evolving. Will machines soon surpass human knowledge ...
The latest upgrade brings the ability to save your progress and create custom agents, with fewer behavioral issues, such as ...
University of Tartu study finds frequent AI chatbot use linked with lower programming grades, highlighting overreliance risks ...
BENCHPRO has sparked heated discussions regarding the programming capabilities of artificial intelligence. In this test, the solution rates for GPT-5, Claude Opus 4.1, and Gemini 2.5 were 23.3%, 22.7 ...
Google DeepMind is also making Gemini Robotics-ER 1.5 available to developers via the Gemini API in Google AI Studio.
Google's Gemini 2.5 Flash Lite is now the fastest proprietary model (and there are more big Gemini updates). Google continues to improve its Gemini family of large language models (LLMs) and its audio ...
Alibaba Cloud, the digital technology and intelligence backbone of Alibaba Group, today unveiled its latest full-stack AI ...
Nature highlighted R1 as the first major LLM to undergo formal peer-review, building upon a preprint released earlier this year that detailed how DeepSeek enhanced a standard LLM to tackle complex ...