Benchmarking the Benchmarks: Why Current AI Evaluations Measure the Wrong Things
AI & MACHINE LEARNING

A systematic review of 147 AI benchmark datasets reveals that most popular evaluation methods are testing for capabilities that do not predict real-world performance. The gap between what we measure and what matters is growing.

The Geometry of Large Language Models: Mathematicians Are Finally Making Sense of What's Inside
AI & MACHINE LEARNING

A new framework from algebraic topology is giving mathematicians precise language to describe the internal structure of neural networks, and the results are surprising.