October 30, 2025

Radiology's Last Exam (RadLE): Benchmarking Frontier Multimodal AI Against Human Experts and a Taxonomy of Visual Reasoning Errors in Radiology

A medical imaging benchmark for comprehensively assessing the multimodal capabilities of foundation models in radiology

The benchmark highlights the present limitations of generalist AI in medical imaging and cautions against unsupervised clinical use. We also provide a qualitative analysis of reasoning traces and propose a practical taxonomy of the visual reasoning errors made by AI models, with the aim of clarifying their failure modes, informing evaluation standards, and guiding more robust model development.
