/ Publications

Analysis & Insight Blog Posts

Our analysis of, and [mild] takes on, SOTA healthcare AI work

The Benchmark Bug That Inflates (or Deflates) Healthcare AI Results

We found a bug in the design of SOTA healthcare benchmarks that produces wildly different scores for the same output. Depending on how the evaluation criteria are phrased, the score for an identical output can differ by up to 16%!

Are We Measuring Reasoning or Formatting? Rethinking Benchmarks for Clinical AI

Our analysis of MedAgentBench, a SOTA benchmark for agentic workflows in FHIR systems

Your Next MRI Could Use a Little Help (And Why That’s a Big Problem)

Our take on the challenges of AI in radiology

Why Most Healthcare AI Benchmarks Are Biased (and How to Fix Them)

We discovered a major flaw in how benchmarks are generated while building our own internal benchmark for medical conversations.

/ Get Started

Let's Accelerate Healthcare AI Innovation Together

Join the leading researchers, startups, and AI labs advancing the impact of AI in healthcare.

Discuss your use case with us