San Antonio, TX — March 18, 2026 — Polygon Health Analytics LLC announced today that its research has been accepted for a platform presentation at the 115th Annual Meeting of the United States and Canadian Academy of Pathology (USCAP), taking place March 21–26, 2026, in San Antonio, Texas.
The presentation, “From PDF to Precision Medicine: MAGIC Automates Pathology Report Curation and Links to Genomic Data for Squamous Cell Carcinoma Research,” is based on a Small Business Innovation Research (SBIR) Phase I project supported by the National Cancer Institute (NCI).
“This work shows how automated extraction can transform large collections of pathology reports into research‑ready datasets,” said Dr. Lixia Yao, presenting author. “By building tools that help non‑technical users discover relevant signals across large, complex public cancer datasets, we aim to make it easier for physicians and scientists to accelerate discovery and improve patient outcomes.”
The abstract will be presented by Dr. Lixia Yao during the Informatics session on Monday, March 23, 2026, from 9:00 AM to 9:15 AM at the Henry B. González Convention Center in San Antonio.
Background
The National Cancer Institute hosts over 10 petabytes of genomic, transcriptomic, proteomic, imaging, and clinical datasets, including digitized slides and pathology reports in PDF or scanned-image formats. Because these reports are unstructured, they are difficult to search, standardize, or link to molecular data at scale. Similar challenges exist across healthcare systems, where vast amounts of valuable data remain disorganized and underutilized, limiting their potential to support research and improve patient care.
Design
To address these challenges, the team developed MAGIC (Multimodal Analysis of Genomics, Imaging, and Clinical Data), an analytics platform that enables pathologists to leverage large-scale datasets for clinical and translational research. MAGIC applies natural language processing (NLP) to extract standardized indicators, such as tumor site, histologic grade, and margin status, from scanned pathology reports.
The platform interface supports cohort building based on clinical or molecular criteria, with export capabilities for downstream analysis. Squamous cell carcinoma (SCC), a common histotype across multiple organ systems, served as a pilot use case.
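For technically inclined readers, the extraction and cohort-building steps described above can be illustrated with a minimal sketch. This is not MAGIC's implementation, which this release does not describe at the code level. The field labels, regular expressions, and the function names extract_indicators and build_cohort are hypothetical, and the sketch assumes the report text has already been obtained from the PDF or scanned image.

```python
import re

# Hypothetical patterns for illustration only. Real pathology reports vary
# widely in wording and layout, and MAGIC's actual NLP is not shown here.
PATTERNS = {
    "tumor_site": re.compile(r"tumor site:\s*(.+)", re.IGNORECASE),
    "histologic_grade": re.compile(
        r"histologic grade:\s*(?:grade\s*)?G?([1-4])", re.IGNORECASE
    ),
    "margin_status": re.compile(
        r"margins?:\s*(positive|negative|involved|uninvolved)", re.IGNORECASE
    ),
}


def extract_indicators(report_text):
    """Return one structured record per report; missing fields become None."""
    record = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(report_text)
        record[field] = match.group(1).strip().lower() if match else None
    return record


def build_cohort(records, min_grade, max_grade):
    """Keep records whose extracted grade falls within [min_grade, max_grade]."""
    cohort = []
    for record in records:
        grade = record.get("histologic_grade")
        if grade is not None and min_grade <= int(grade) <= max_grade:
            cohort.append(record)
    return cohort


# Made-up report text, not taken from any real dataset.
sample_report = "Tumor site: Tongue\nHistologic grade: G3\nMargins: Negative\n"
sample_record = extract_indicators(sample_report)
high_grade = build_cohort([sample_record], min_grade=3, max_grade=4)
```

A record with no recognizable grade is simply left out of grade-based cohorts in this sketch. A production system would more likely flag such reports for manual review.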
Results
Using MAGIC, researchers assembled a cohort of 1,390 SCC cases and exported the dataset to cBioPortal for comparative genomic analysis. The NLP pipeline parsed and normalized unstructured pathology reports in PDF format into structured data elements, including demographics, tumor site, grade, stage, margin status, treatment effect, and histologic features such as differentiation, mitotic activity, lymphovascular invasion, and perineural invasion.
These structured data were linked to sequencing datasets encompassing genomic alterations and gene expression profiles stored in the NCI Commons.
As a use case, the team examined associations among tumor grade, primary site, and mutation profiles. A comparison of 754 Grade 1–2 SCC cases with 532 Grade 3–4 cases revealed differences in anatomical site distribution. ZFHX4 mutations were enriched in Grade 3–4 tumors (log₂ ratio = −1.04, p = 3.74×10⁻⁷, q = 6.61×10⁻³), suggesting a potential biomarker associated with tumor aggressiveness.
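To show how an enrichment statistic of this kind can be read, the sketch below computes a log₂ ratio and a two-sided Fisher's exact p-value for a two-group comparison. It rests on two assumptions this release does not confirm: that the ratio is the mutated fraction in the lower-grade group divided by the mutated fraction in the higher-grade group, and that the p-value comes from Fisher's exact test. The counts are invented for illustration and are not the study's data.

```python
from math import comb, log2


def log2_ratio(mut_a, n_a, mut_b, n_b):
    """log2 of (mutated fraction in group A) / (mutated fraction in group B).

    A negative value means the alteration is more frequent in group B.
    """
    return log2((mut_a / n_a) / (mut_b / n_b))


def fisher_exact_two_sided(mut_a, n_a, mut_b, n_b):
    """Two-sided Fisher's exact test on a 2x2 table (mutated vs. not, A vs. B).

    Sums the hypergeometric probabilities of every table that is no more
    likely than the observed one, holding the row and column totals fixed.
    """
    total_mut = mut_a + mut_b
    denom = comb(n_a + n_b, total_mut)

    def prob(k):
        # Probability that exactly k of the mutated cases fall in group A.
        return comb(n_a, k) * comb(n_b, total_mut - k) / denom

    p_observed = prob(mut_a)
    lowest = max(0, total_mut - n_b)
    highest = min(n_a, total_mut)
    p_value = sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )
    return min(1.0, p_value)


# Invented counts: 10 of 100 mutated in group A, 20 of 100 in group B.
ratio = log2_ratio(10, 100, 20, 100)  # -1.0: half as frequent in group A
p_value = fisher_exact_two_sided(10, 100, 20, 100)
```

Under these assumptions, a log₂ ratio near −1 means the alteration is about half as frequent in the first group as in the second, which matches the direction reported above for Grade 3–4 tumors. A q-value additionally adjusts the p-value for the number of genes tested, and that step is omitted here.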
Conclusions
This proof of concept shows that analyses requiring months of manual curation can be completed in a single day using MAGIC with cBioPortal. By automating pathology report extraction and integrating multi-omic datasets, MAGIC offers an intuitive, scalable, and reproducible tool for translational research.
Contact
For more information, please contact Polygon Health Analytics LLC at info@polygonhealthanalytics.com.
About Polygon Health Analytics LLC
Polygon Health Analytics LLC develops advanced analytics solutions that transform complex healthcare data into actionable insights. The company focuses on real-world data, artificial intelligence, and multimodal data integration to support biomedical research, drug development, and healthcare innovation.