Synthetic Data vs. Real-World Data: A Reality Check for Healthcare AI

I first encountered the concept of synthetic data back in 2013, while teaching a health informatics course as a tenure-track assistant professor at UNC Charlotte. To help students experience the complexity of Electronic Health Record (EHR) systems, we partnered with a startup that built an educational EHR platform on top of the VA’s open-source system—once considered the gold standard in the industry. It was a great way for students to “feel” the real-world challenges of clinical data.

Fast forward, and the idea of synthetic data had spread into the HL7 FHIR community, where it let developers test interoperability. That made sense. But later, while working at Merck, I saw startups pitching synthetic data to pharma as their core asset. I stayed collegial, but deep down I was skeptical: who will trust results generated from synthetic data? The FDA, CMS, or any Health Technology Assessment body?

The argument was compelling: synthetic data can train AI models when real-world data is scarce. It’s a concept rooted in machine learning. But does it sound a bit like a perpetual motion machine—too good to be true?

Here’s why I remain cautious:

  • Semantic correctness matters. Synthetic data is often generated using generative adversarial networks (GANs), variational autoencoders, or diffusion models. But how do we ensure biomedical plausibility? Can a male patient have breast cancer? (Yes, but rarely.) Can someone have both hypertension and hypotension? What about hundreds of thousands of drug-drug contraindications? These nuances matter.
  • Quality metrics are unclear. How do we evaluate how “good” a synthetic dataset is? Statistical similarity isn’t enough when clinical decisions are at stake.
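
To make both points concrete, here is a minimal Python sketch rather than a real validation pipeline. Everything in it is an assumption for illustration: the `Record` type, the rule tables, the placeholder drug names, and the two tiny datasets are all hypothetical, and the rules are not clinical guidance. It shows a toy case where a simple statistical comparison (per-diagnosis prevalence) looks identical between "real" and "synthetic" data, while a rule-based plausibility check still flags a synthetic record for review.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """A toy patient record (hypothetical schema)."""
    sex: str
    diagnoses: frozenset = frozenset()
    drugs: frozenset = frozenset()


# Illustrative rule tables (assumptions, not clinical guidance).
# Pairs are flagged for expert review, not rejected outright, because
# rare-but-valid cases exist (e.g. male breast cancer).
REVIEW_IF_TOGETHER = [frozenset({"hypertension", "hypotension"})]
FLAGGED_DRUG_PAIRS = [frozenset({"drug_a", "drug_b"})]  # placeholder names


def plausibility_flags(rec):
    """Return human-readable review flags for one record."""
    flags = []
    for pair in REVIEW_IF_TOGETHER:
        if pair <= rec.diagnoses:  # both diagnoses present
            flags.append("co-occurring diagnoses: " + ", ".join(sorted(pair)))
    for pair in FLAGGED_DRUG_PAIRS:
        if pair <= rec.drugs:  # both drugs present
            flags.append("flagged drug pair: " + ", ".join(sorted(pair)))
    return flags


def prevalence(records, diagnosis):
    """Fraction of records carrying a diagnosis (a simple marginal statistic)."""
    return sum(diagnosis in r.diagnoses for r in records) / len(records)


def flagged_fraction(records):
    """Fraction of records with at least one plausibility flag."""
    return sum(bool(plausibility_flags(r)) for r in records) / len(records)


# Made-up data: the marginals match, the co-occurrence pattern does not.
real = [
    Record("F", frozenset({"hypertension"})),
    Record("M", frozenset({"hypotension"})),
    Record("F"),
    Record("M"),
]
synthetic = [
    Record("F", frozenset({"hypertension", "hypotension"})),
    Record("M"),
    Record("F"),
    Record("M"),
]

for name, data in [("real", real), ("synthetic", synthetic)]:
    print(
        name,
        prevalence(data, "hypertension"),
        prevalence(data, "hypotension"),
        flagged_fraction(data),
    )
```

In this toy case both datasets report the same prevalence for each diagnosis, yet only the synthetic one contains a flagged record. A real rule set would need to come from clinical experts and curated knowledge bases, and would be far larger than two hand-written pairs.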

Despite my skepticism, the field is booming—AI hype attracts investment. Today, multiple companies specialize in synthetic healthcare data, and research papers increasingly report models trained on it.

Recently, a viewpoint in The Lancet Digital Health by researchers from Stanford and NIMHD offered a thorough and rigorous discussion of synthetic data. They proposed actionable safeguards for synthetic medical AI: standards for training data, fragility testing during development, and deployment disclosures.
Reference: Koul, Arman, Deborah Duran, and Tina Hernandez-Boussard. "Synthetic data, synthetic trust: navigating data challenges in the digital revolution." The Lancet Digital Health (Nov 30, 2025).

This reminds me of my first blog post, written in January 2023 when I launched Polygon Health Analytics: “Damn, it is the data!” Three years later, it feels as though LLMs and generative AI have changed the world, yet the shortage of high-quality, real-world data in healthcare and biomedicine remains.

If we want AI to truly revolutionize these fields (and I believe it can), our top priority must be collaborative efforts to make high-quality real-world data accessible.
