Powerful AI Starts with High-Quality Data: Lessons from Edwin Chen and Surge AI


In a world where AI headlines are dominated by billion-dollar fundraises, massive model sizes, and compute arms races, Edwin Chen offers a refreshing counter-narrative. As the founder of Surge AI, Chen built a data labeling business bringing in $1 billion a year, with just 150 employees and in only five years: no external investors, no sales team, and no PR machine. His story is a powerful reminder that in AI, quality often trumps quantity.


Independent Thinking as a Superpower

Chen’s approach is rooted in independent thinking. He avoids the noise of social media and Silicon Valley groupthink, choosing instead to focus on substance. His insights come from trusted colleagues and customers—not viral threads. This mindset allows him to build durable products that solve real problems, not just chase trends.

“I’m glad I’m not surrounded by the default ways of Silicon Valley thinking.”

A Business Built on Product, Not Pitch

Surge AI was profitable from day one. Chen didn’t raise venture capital—not because he couldn’t, but because he didn’t need to. He believes in letting the product speak for itself, and in shaping it through genuine customer feedback rather than sales-driven hype.

“We didn’t need the money. And I didn’t want a sales team convincing people to buy a product they didn’t deeply understand.”

Small Teams, Big Impact

Chen is blunt about inefficiencies in Big Tech:

“Ninety percent of employees at tech giants are working on useless problems.”

Surge AI operates with lean teams, no standing 1:1s, and asynchronous communication. This fosters speed, autonomy, and transparency—qualities often lost in larger organizations.

Build First, Fund Later

Chen urges startups to stop making excuses and start building. With today’s tools, most teams can launch a minimum viable product (MVP) without significant capital. Fundraising, he argues, should follow validation—not precede it.

“For 90–95% of startups, there’s no excuse. Just build the MVP. See if anyone cares.”

The Real Bottleneck in AI: Data Quality

Chen’s journey began at Twitter, where poor data labeling hindered even basic sentiment analysis. That experience led to a core realization: high-quality data is the foundation of powerful AI.

“Without clean, contextual, high-quality training data, even the best models underperform.”

While compute and algorithms get the spotlight, Chen ranks data quality as the #1 constraint in AI today. Without it, more compute simply accelerates failure.

Synthetic Data vs. Human Judgment

Synthetic data has its place, but Chen warns against overreliance on it. In his view, models trained on synthetic data often struggle in real-world scenarios because that data lacks nuance and diversity. In many cases, a few thousand well-labeled human examples outperform millions of synthetic ones.

Specialized Models Still Matter

Despite the dominance of general-purpose models, Chen sees enduring value in domain-specific approaches. Smaller teams can move faster, encode expert knowledge, and align more closely with user needs.

“Some products simply can’t be built within the constraints of Big Tech companies.”

AI Safety Is a Now Problem

Chen challenges the notion that AI safety is a future concern. Misaligned objectives—like optimizing for engagement over truth—are already causing harm. As AI systems become more embedded in critical domains, the stakes will only rise.

“The real risk isn’t that AI becomes evil. It’s that we train it toward the wrong objectives—and don’t realize it until it’s too late.”

Final Thoughts

Few areas have more potential for AI-driven transformation than healthcare. Yet the data in this field remains fragmented and inconsistent. Chen's success shows what becomes possible when data quality comes first, and it makes the case for a collective effort to raise the standard of healthcare data, not just as a technical challenge but as a moral imperative. If you're working on improving healthcare data, or want to, reach out. Let's build something meaningful together.
