
Dr. Shantanu Nundy MD, MBA, is a Primary Care Physician and an FDA AI Advisor

Readers, we’re always looking to evolve how we cover the intersection of health and technology. So we couldn’t say no when AI expert Dr. Shantanu Nundy offered to review the latest research from around the world and unpack what matters most. There’s an immense amount of noise in the market today and not enough signal.

So every two weeks, he will unpack a study that’s making the rounds amongst his peer set and dig into the methodology and likely impact.

If you have a study you’d like Dr. Nundy to tackle, please do reach out to us and we’ll ensure he gets your note!

His first review looks at a study with the title: Automation Bias in LLM-Assisted Diagnostic Reasoning Among AI-Trained Physicians, which was published on April 23, 2026. The full article is only available to subscribers behind a paywall.

🔥 HOT TAKE

A new RCT in NEJM AI shows that even doctors trained to use AI are biased by bad AI recommendations. This suggests the problem isn't with physicians; it's that we have no independent way to know when AI is wrong in the first place.

Methodology: 8/10
Importance: 10/10
Likely impact of the study: Should make health systems more wary about doctor-facing AI without rigorous independent benchmarking
Ideal follow-up study: Replicating study in US context with specialty-matched cases and doctors

📋  CLIFF NOTES

Qazi et al. conducted a single-blind randomized clinical trial enrolling 44 physicians across multiple institutions in Pakistan, all of whom had completed a rigorous 20-hour AI literacy training covering LLM capabilities, prompt engineering, and critical evaluation of AI outputs. Physicians were randomized 1:1 to diagnose six clinical vignettes with either accurate ChatGPT-4o suggestions (control) or suggestions containing deliberately introduced errors in 3 of 6 cases (treatment).

Importantly, LLM consultation was entirely voluntary. Physicians could choose to consult, modify, or ignore AI output at any point. Despite their training, physicians exposed to erroneous AI showed a 14 percentage-point drop in diagnostic accuracy.

To me, more interesting than the headline result are two subgroup analyses: 1) more experienced physicians, who at baseline had higher diagnostic accuracy, showed greater degradation in diagnostic performance than less experienced physicians (statistically significant); 2) physicians who reported more frequent LLM use before the study trended toward greater degradation in diagnostic performance (not statistically significant). If confirmed in other studies, these results would suggest that as physicians use these tools more, we may paradoxically see worse outcomes, not better.

Table: Key outcome from Qazi et al., NEJM AI (2026). Physicians with identical AI literacy training performed significantly worse when exposed to erroneous LLM recommendations, even when consultation was optional.

Reserve Your Spot for Upcoming Webinars!

1. What will AI do for employer healthcare and benefits?
   Panelists: Nick Reber, Ellen Kelsay, Christina Farr
   Timing: May 19th, 2026, at 3:00 PM (ET)

2. Privacy, AI, and the future of HIPAA with the former founding director of ONC
   Panelists: Jodi Daniel, Christina Farr
   Timing: June 3rd, 2026, at 12:00 PM (ET)

3. Freeing Data From the EHR
   Panelists: Lisa Bari, Ryan Howells, Ruth Reader
   Timing: June 17th, 2026, at 12:00 PM (ET)

4. Not everyone can access the Top 1% of physicians. Will AI change that?
   Panelists: Daniel Stein, Christina Farr, Fred Thiele
   Timing: June 23rd, 2026, at 12:00 PM (ET)

🌐  BIGGER PICTURE

This finding doesn't exist in isolation. A growing body of evidence — from Goh et al. in JAMA Network Open to the "From Tool to Teammate" RCT published last month in npj Digital Medicine — consistently shows that how AI is deployed matters as much as its performance in isolation. But this paper adds a darker dimension: you can train physicians on how to use AI, give them autonomy on when and how to use it, and it still may not be enough.

So what does that mean for healthcare leaders and investors? The U.S. clinical AI market is projected to exceed $45 billion by 2030, yet much of the investment and attention is still going to building clinical AI systems and demonstrating their performance on curated benchmarks. What it hasn't built is the infrastructure to evaluate those systems in conditions that reflect clinical reality: adversarial inputs, diverse patient populations, and real physician workflows. That’s the gap the field needs to close.

💬  MY LONGER TAKE

I've spent the better part of two decades at the intersection of human and machine intelligence in medicine. The finding that jumps out to me isn't the 14-point accuracy drop. It's that it happened in physicians who chose to consult the AI. These were engaged, trained clinicians exercising judgment about when AI was appropriate to use, and even then, the AI misled them. That tells me the problem is upstream of the physician entirely.
