AI & MACHINE LEARNING
Benchmarking the Benchmarks: Why Current AI Evaluations Measure the Wrong Things
A systematic review of 147 AI benchmark datasets reveals that most popular evaluation methods are testing for capabilities that do not predict real-world performance. The gap between what we measure and what matters is growing.
AI & MACHINE LEARNING
The Geometry of Large Language Models: Mathematicians Are Finally Making Sense of What's Inside
A new framework from algebraic topology is giving mathematicians precise language to describe the internal structure of neural networks — and the results are surprising.