Programming Language Benchmarks

DeepSeek's new V3.2-Exp model cuts API pricing in half to less than 3 cents per 1M input tokens

MMLU-Pro holds steady at 85.0, AIME 2025 slightly improves to 89.3, while GPQA-Diamond dips from 80.7 to 79.9. Coding and agent benchmarks tell a similar story, with Codeforces ratings rising from ...

IEEE

Survey of Different Large Language Model Architectures: Trends, Benchmarks, and Challenges

Abstract: Large Language Models (LLMs) represent a class of deep learning models adept at understanding natural language and generating coherent responses to various prompts or queries. These models ...

GitHub

MPL - Motion Programming Language

MPL is a domain-specific language that revolutionizes 3D motion and animation through human-readable, semantic syntax. Designed to bridge the gap between natural language and 3D movement, MPL ...

IEEE

Language Model Evolutionary Algorithms for Recommender Systems: Benchmarks and Algorithm Comparisons

Abstract: In the evolutionary computing community, the remarkable language-handling capabilities and reasoning power of large language models (LLMs) have significantly enhanced the functionality of ...

Slator

Stanford and UC Santa Cruz Launch Benchmark for Audio-Language Models

A team from Stanford University and UC Santa Cruz has introduced AHELM, a new benchmark designed to evaluate audio-language models (ALMs) across a wide range of capabilities. ALMs are multimodal ...

ZDNet

MS-BASIC 1.1 introduced programming to a generation - now you can download it for free

Microsoft open-sourced the MS-BASIC language. Bill Gates would never have seen this coming back in the day. MS-BASIC 1.1 was many developers' first language. In 1976, they rebranded Altair BASIC to ...

PC World

Microsoft’s first-ever programming language was just open-sourced

Did you know that, between 1976 and 1978, Microsoft developed its own version of the BASIC programming language? It was initially called Altair BASIC before becoming Microsoft BASIC, and it was ...

GitHub

Elfsong/Awesome-Code-Benchmark

Software Development Life Cycle Perspective: A Survey of Benchmarks for Code Large Language Models and Agents (from Xi’an Jiaotong University); HumanEval: Evaluating Large Language Models Trained on Code ...
