Synthetic Data vs. Real-World Data: A Reality Check for Healthcare AI

I first encountered the concept of synthetic data back in 2013, while teaching a health informatics course as a tenure-track assistant professor at UNC Charlotte. To help students experience the complexity of Electronic Health Record (EHR) systems, we partnered with a startup that built an educational EHR platform on top of the VA’s open-source system—once considered the gold standard in the industry. It was a great way for students to “feel” the real-world challenges of clinical data.

A few years later, the idea of synthetic data spread into the HL7 FHIR community, where it enabled developers to test interoperability. That made sense. But later, while working at Merck, I saw startups pitching synthetic data to pharma as their core asset. I stayed collegial, but deep down, I was skeptical: who will trust results generated from synthetic data? FDA, CMS, or any Health Technology Assessment body?

The argument was compelling: synthetic data can train AI models when real-world data is scarce. It's a concept rooted in machine learning. But doesn't it sound a bit like a perpetual motion machine, too good to be true?

Here’s why I remain cautious:

  • Semantic correctness matters. Synthetic data is often generated using generative adversarial networks (GANs), variational autoencoders, or diffusion models. But how do we ensure biomedical plausibility? Can a male patient have breast cancer? (Yes, but rarely.) Can someone have both hypertension and hypotension? What about hundreds of thousands of drug-drug contraindications? These nuances matter.
  • Quality metrics are unclear. How do we evaluate how “good” a synthetic dataset is? Statistical similarity isn’t enough when clinical decisions are at stake.
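
To make the plausibility point concrete, here is a minimal Python sketch of rule-based checks on synthetic patient records. Everything in it is an assumption for illustration: the `SyntheticRecord` fields, the rule tables, and the `check_record` helper are hypothetical, and the handful of rules stands in for what would need to be a curated clinical knowledge base. Note that male breast cancer passes (rare but real), while hypertension plus hypotension is only flagged for review rather than rejected.

```python
from dataclasses import dataclass, field


@dataclass
class SyntheticRecord:
    """Hypothetical, heavily simplified synthetic patient record."""
    patient_id: str
    sex: str  # "M" or "F" in this sketch
    diagnoses: set = field(default_factory=set)
    medications: set = field(default_factory=set)


# Illustrative rules only; a real system would draw on curated clinical sources.
SEX_RESTRICTED_DX = {"prostate cancer": "M", "ovarian cancer": "F"}
REVIEW_DX_PAIRS = [frozenset({"hypertension", "hypotension"})]
CONTRAINDICATED_DRUG_PAIRS = [frozenset({"sildenafil", "nitroglycerin"})]


def check_record(rec):
    """Return a list of human-readable plausibility issues for one record."""
    issues = []
    # Diagnoses that only make sense for one recorded sex in this sketch.
    for dx in sorted(rec.diagnoses):
        required_sex = SEX_RESTRICTED_DX.get(dx)
        if required_sex is not None and rec.sex != required_sex:
            issues.append(f"implausible: {dx} with sex={rec.sex}")
    # Diagnosis combinations that are unusual enough to warrant a human look.
    for pair in REVIEW_DX_PAIRS:
        if pair <= rec.diagnoses:
            issues.append("review: " + " + ".join(sorted(pair)))
    # Drug pairs that should not be co-prescribed.
    for pair in CONTRAINDICATED_DRUG_PAIRS:
        if pair <= rec.medications:
            issues.append("contraindicated: " + " + ".join(sorted(pair)))
    return issues


def flagged_fraction(records):
    """Share of records with at least one issue (0.0 for an empty dataset)."""
    if not records:
        return 0.0
    return sum(1 for r in records if check_record(r)) / len(records)


if __name__ == "__main__":
    sample = [
        SyntheticRecord("p1", "M", {"breast cancer"}),
        SyntheticRecord("p2", "F", {"prostate cancer"}),
        SyntheticRecord("p3", "F", {"hypertension", "hypotension"}),
        SyntheticRecord("p4", "M", set(), {"sildenafil", "nitroglycerin"}),
    ]
    for rec in sample:
        print(rec.patient_id, check_record(rec))
    print("flagged fraction:", flagged_fraction(sample))
```

A dataset can match the real one on every marginal distribution and still fail checks like these, which is one reason statistical similarity alone is a weak quality metric. Hand-written rules also do not scale to hundreds of thousands of drug-drug contraindications, which is exactly the difficulty the first bullet raises.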

Despite my skepticism, the field is booming, in part because AI hype attracts investment. Today, multiple companies specialize in synthetic healthcare data, and research papers increasingly report models trained on it.

Recently, a viewpoint in The Lancet Digital Health by researchers from Stanford and NIMHD offered a thorough and rigorous discussion of synthetic data. They proposed actionable safeguards for synthetic medical AI: standards for training data, fragility testing during development, and deployment disclosures.
Reference: Koul, Arman, Deborah Duran, and Tina Hernandez-Boussard. "Synthetic data, synthetic trust: navigating data challenges in the digital revolution." The Lancet Digital Health (Nov 30, 2025).

This reminds me of my first blog post, written in January 2023 when I launched Polygon Health Analytics: “Damn, it is the data!” Three years later, it feels as though LLMs and generative AI have changed the world, yet the shortage of high-quality, real-world data in healthcare and biomedicine remains.

If we want AI to truly revolutionize these fields (and I believe it can), our top priority must be collaborative efforts to make high-quality real-world data accessible.
