Leveraging Large Language Models like ChatGPT for Real-World Evidence Generation

Large Language Models (LLMs), including ChatGPT, have gained significant traction since late 2022, sparking debate about whether they are mere hype or a genuine revolution. Either way, they have proven useful across personal and business activities, including proofreading and editing, language translation, coding and debugging, and content creation. At Polygon Health Analytics, we see several areas where LLMs can make a substantial positive impact on real-world evidence (RWE) generation and Health Economics and Outcomes Research (HEOR), as discussed below.

LLMs Enable RWE Generation from Novel Data Types

While most current RWE is derived from claims data or structured Electronic Health Records (EHRs), a wealth of untapped potential lies in unstructured real-world data (RWD) sources. These include unstructured EHR content (e.g., clinicians’ notes), surveys, social media (including online patient forums), news articles, and scientific literature. Such free-text data have often been overlooked because of challenges like missing entries, typos, specialized jargon, and context-dependent acronyms, but they can now be harnessed with LLMs. These models can extract clinical terms (e.g., diagnoses, symptoms, and medications), transform them into structured formats, fix misspellings, and interpret acronyms and jargon in context [1-4] (Figure 1). LLMs can thereby unlock previously overlooked RWE insights, driving the evolution of evidence-based medicine.

Figure 1. ChatGPT deciphers the clinical notes and extracts medical terms into a structured format (adapted from [4])
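
The extraction step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used in [1-4]: the prompt wording, the `ClinicalTerms` container, and the `extract_terms` helper are all hypothetical, and `llm` stands for any function that sends a prompt to a model and returns its text reply. A canned `fake_llm` is used here so the example runs without a model.

```python
import json
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical prompt; real wording would need tuning and validation.
PROMPT_TEMPLATE = (
    "Extract the clinical terms from the note below. Expand acronyms and "
    "correct misspellings. Reply with JSON only, using the keys "
    '"diagnoses", "symptoms", and "medications", each a list of strings.\n\n'
    "Note:\n{note}"
)


@dataclass
class ClinicalTerms:
    """Structured view of one free-text clinical note (illustrative)."""
    diagnoses: list[str] = field(default_factory=list)
    symptoms: list[str] = field(default_factory=list)
    medications: list[str] = field(default_factory=list)


def extract_terms(note: str, llm: Callable[[str], str]) -> ClinicalTerms:
    """Ask the model for JSON and parse it into a ClinicalTerms record."""
    reply = llm(PROMPT_TEMPLATE.format(note=note))
    # Models sometimes wrap JSON in extra text, so keep only the outermost object.
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("model reply contained no JSON object")
    data = json.loads(reply[start:end + 1])

    def as_list(key: str) -> list[str]:
        value = data.get(key, [])
        return [str(v).strip() for v in value] if isinstance(value, list) else []

    return ClinicalTerms(as_list("diagnoses"), as_list("symptoms"), as_list("medications"))


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call; always returns the same canned reply.
    return (
        'Here is the result: {"diagnoses": ["type 2 diabetes mellitus"], '
        '"symptoms": ["shortness of breath"], "medications": ["metformin"]}'
    )


terms = extract_terms("Pt w/ T2DM c/o SOB, on metfromin.", fake_llm)
print(terms)
```

In practice the parsed output would still need review against the source note, since a model can omit or invent terms.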

LLMs Democratize RWD Exploration and Analysis

RWE generation often involves navigating vast datasets, which demands advanced technical skills. LLMs hold the promise of democratizing data analysis, making it accessible to a broader audience regardless of programming or statistical expertise. Users can interact with data naturally by posing questions in everyday language, and LLMs can translate these questions into structured queries for data retrieval [5,6] (Figure 2). Furthermore, LLMs can help users generate code for complex analyses and interpret and summarize findings presented in tables and figures, thereby lowering barriers to RWD analysis and facilitating data-driven decision-making [7,8].

Figure 2. An LLM converts free-text questions into SQL queries for data retrieval from a relational database storing RWD ([6]).
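
As a rough sketch of the question-to-SQL flow, the Python below passes a table schema and a plain-language question to a model and runs the returned query against an in-memory SQLite database. Everything specific here is an assumption made for illustration: the `patients` table, the prompt text, the `question_to_sql` and `answer` helpers, and the canned `fake_llm` that stands in for a real model. It is not the approach benchmarked in [5,6].

```python
import sqlite3
from typing import Callable

# Hypothetical toy schema standing in for an RWD database.
SCHEMA = "CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, age INTEGER, diagnosis TEXT)"


def question_to_sql(question: str, schema: str, llm: Callable[[str], str]) -> str:
    """Ask the model for one read-only SQL query that answers the question."""
    prompt = (
        f"Database schema:\n{schema}\n\n"
        f"Write a single SQLite SELECT statement that answers: {question}\n"
        "Reply with SQL only."
    )
    sql = llm(prompt).strip().rstrip(";")
    # Conservative guard: accept only a single SELECT statement.
    if not sql.lower().startswith("select") or ";" in sql:
        raise ValueError(f"refusing to run non-SELECT or multi-statement SQL: {sql!r}")
    return sql


def answer(question: str, conn: sqlite3.Connection, llm: Callable[[str], str]) -> list[tuple]:
    """Translate the question to SQL and return the query's rows."""
    return conn.execute(question_to_sql(question, SCHEMA, llm)).fetchall()


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call; returns a fixed query.
    return "SELECT COUNT(*) FROM patients WHERE diagnosis = 'lupus';"


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO patients VALUES (?, ?, ?)",
    [(1, 54, "lupus"), (2, 61, "diabetes"), (3, 47, "lupus")],
)
rows = answer("How many patients have lupus?", conn, fake_llm)
print(rows)  # [(2,)]
```

The SELECT-only check is deliberately strict (it also rejects queries whose string literals contain a semicolon); a production system would more likely rely on a read-only database connection plus human review of generated queries.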

LLMs Automate Scientific Literature Synthesis

A systematic literature review analyzes and synthesizes existing scientific literature to support evidence-based decision-making and to identify novel research and development opportunities. Traditionally, manual reviews are time-consuming and expensive. LLMs can perform systematic literature review tasks, from defining search terms to summarizing and extracting information from articles [9]. By deploying multiple AI agents, LLMs can streamline the review process, offering timely insights amid the growing body of literature [10].
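
Two of the review tasks named above, proposing search terms and summarizing articles, might be chained as in the following Python sketch. The function names, prompts, and sample abstracts are hypothetical, and `fake_llm` returns canned text in place of a real model, so this shows only the shape of such a pipeline rather than the workflow in [9] or the multi-agent design in [10].

```python
from typing import Callable

LLM = Callable[[str], str]  # any function mapping a prompt to a text reply


def propose_search_terms(question: str, llm: LLM) -> list[str]:
    """Ask the model for search terms, one per line, and clean up list markers."""
    reply = llm(f"List search terms, one per line, for a systematic review on: {question}")
    terms = (line.strip("-• \t") for line in reply.splitlines())
    return [term for term in terms if term]


def summarize_articles(abstracts: dict[str, str], question: str, llm: LLM) -> dict[str, str]:
    """Return a one-sentence, question-focused summary per article title."""
    return {
        title: llm(
            f"Summarize in one sentence how this abstract bears on '{question}':\n{text}"
        ).strip()
        for title, text in abstracts.items()
    }


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call; branches on the prompt type.
    if prompt.startswith("List search terms"):
        return "- lupus\n- real-world evidence\n"
    return " Canned one-sentence summary. "


question = "RWE in lupus care"
search_terms = propose_search_terms(question, fake_llm)
summaries = summarize_articles(
    {"Article A": "Abstract text A.", "Article B": "Abstract text B."},
    question,
    fake_llm,
)
print(search_terms)
print(summaries)
```

Each stage takes the model as a parameter, so separate agents or differently configured models could be assigned to separate stages.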

In conclusion, integrating LLMs into RWE generation promises to advance HEOR, patient care, and health policy. LLMs are poised to reshape the landscape of RWD and RWE, and as these models continue to evolve, their impact on the future of medicine and patient care will be profound. It is crucial to approach these advancements critically, addressing challenges such as model customization, data privacy, bias and fairness, and regulatory compliance.

References:

  1. Jethani, Neil, et al. “Evaluating ChatGPT in Information Extraction: A Case Study of Extracting Cognitive Exam Dates and Scores.” medRxiv (2023): 2023-07.
  2. Huang, Jingwei, et al. “A Critical Assessment of Using ChatGPT for Extracting Structured Data from Clinical Notes.” Available at SSRN 4488945.
  3. Hu, Yan, et al. “Zero-shot Clinical Entity Recognition Using ChatGPT.” arXiv preprint arXiv:2303.16416 (2023).
  4. “Large language models help decipher clinical notes.” MIT.edu (2022).
  5. Pan, Youcheng, et al. “A BERT-based generation model to transform medical texts to SQL queries for electronic medical records: model development and validation.” JMIR Medical Informatics 9.12 (2021): e32698.
  6. Lee, Gyubok, et al. “EHRSQL: A Practical Text-to-SQL Benchmark for Electronic Health Records.” Advances in Neural Information Processing Systems 35 (2022): 15589-15601.
  7. “How to Use ChatGPT Code Interpreter.” Datacamp.com (2023).
  8. Maddigan, Paula, and Teo Susnjak. “Chat2VIS: Generating Data Visualisations via Natural Language Using ChatGPT, Codex and GPT-3 Large Language Models.” IEEE Access (2023).
  9. Alshami, Ahmad, et al. “Harnessing the Power of ChatGPT for Automating Systematic Review Process: Methodology, Case Study, Limitations, and Future Directions.” Systems 11.7 (2023): 351.
  10. Talebirad, Yashar, and Amirhossein Nadiri. “Multi-Agent Collaboration: Harnessing the Power of Intelligent LLM Agents.” arXiv preprint arXiv:2306.03314 (2023).