October 30, 2025

Radiology's Last Exam (RadLE): Benchmarking Frontier Multimodal AI Against Human Experts and a Taxonomy of Visual Reasoning Errors in Radiology

A state-of-the-art medical imaging benchmark for comprehensively assessing the multimodal capabilities of foundation models in radiology

The benchmark highlights the present limitations of generalist AI in medical imaging and cautions against unsupervised clinical use. We also provide a qualitative analysis of reasoning traces and propose a practical taxonomy of visual reasoning errors made by AI models, to better understand their failure modes, inform evaluation standards, and guide more robust model development.


