Programming Language Benchmarks

Autonomous AI Coding Clears 60,000-Line Ceiling: MirrorCode Benchmark Released

AI coding benchmark MirrorCode published its full results June 26, showing Claude Opus 4.7 autonomously rebuilt a 60,000-line interpreter and scored 56% overall — completing tasks that take human ...

GitHub

LangArena: A Balanced Programming Language Benchmark Suite

The suite started with my original implementation in Crystal. AI tools assisted in translating it to other languages. Throughout this process, I reviewed and edited the implementation for semantic ...

Small Language Models Outperform Frontier AI On Cost, Speed And Accuracy

Bigger has defined AI from day one. New data says task-specific small models beat frontier LLMs on accuracy, cost and speed — and save money.

Slator

AI Translation’s Key Benchmark Takes Aim at Low-Resource Languages

The Eleventh Conference on Machine Translation (WMT26) has moved into its active evaluation phase, with test data releases and submission windows now opening across several of the conference’s shared ...

Morning Overview on MSN

Alibaba’s Qwen released three AI models built to drive robots

Alibaba’s Qwen team published three separate AI models designed to give robots the ability to see, manipulate objects, and ...

InfoWorld

33 LLM metrics to watch closely

Look to these key metrics and benchmarks to evaluate the performance, capability, reliability, and safety of your AI models and agents.

1 month ago

DeepSWE blows up the AI coding leaderboard, crowns GPT-5.5, and finds Claude Opus exploiting a benchmark loophole

DeepSWE puts GPT-5.5 atop the AI coding leaderboard while raising new questions about Claude Opus, SWE-Bench Pro, and benchmark leakage.

1 month ago

Is there systematic religious bias in AI models? What new research says

ChatGPT, Claude, Grok, Gemini and other AI models display systematic religious bias, according to scientific research from ...

techtimes

Which Programming Languages Should You Learn in 2026? Best Coding Languages for Beginners

Programming languages shape how software, apps, and websites are built, making them one of the most important skills in the modern digital world. With industries shifting toward automation, AI tools, ...

acm.org

How AI is Changing Programming Language Usage

While much attention regarding AI has been focused on developers using it to code, the impact of AI on software development goes far beyond code creation tools. Armando Solar-Lezama, Distinguished ...

Science Daily

Study of 1,700 languages reveals surprising hidden patterns

A massive new analysis of over 1,700 languages shows that some long-debated “universal” grammar rules are actually real. By using cutting-edge evolutionary methods, researchers found that languages ...

MIT Technology Review

AI benchmarks are broken. Here’s what we need instead.

One-off tests don’t measure AI’s true impact. We’re better off shifting to more human-centered, context-specific methods. For decades, artificial intelligence has been evaluated through the question ...
