Powerful AI Starts with High-Quality Data

28 Jul 2025 3 min read AI

In a world where AI headlines are dominated by billion-dollar fundraises, massive model sizes, and compute arms races, Edwin Chen offers a refreshing counter-narrative. As the founder of Surge AI, Chen built a $1 billion-per-year data labeling business with just 150 employees in five years, with no external investors, no sales team, and no PR machine. His story is a powerful reminder that in AI, quality often trumps quantity.

Independent Thinking as a Superpower

Chen’s approach is rooted in independent thinking. He avoids the noise of social media and Silicon Valley groupthink, choosing instead to focus on substance. His insights come from trusted colleagues and customers—not viral threads. This mindset allows him to build durable products that solve real problems, not just chase trends.

“I’m glad I’m not surrounded by the default ways of Silicon Valley thinking.”

A Business Built on Product, Not Pitch

Surge AI was profitable from day one. Chen didn’t raise venture capital—not because he couldn’t, but because he didn’t need to. He believes in letting the product speak for itself, and in shaping it through genuine customer feedback rather than sales-driven hype.

“We didn’t need the money. And I didn’t want a sales team convincing people to buy a product they didn’t deeply understand.”

Small Teams, Big Impact

Chen is blunt about inefficiencies in Big Tech:

“Ninety percent of employees at tech giants are working on useless problems.”

Surge AI operates with lean teams, no standing 1:1s, and asynchronous communication. This fosters speed, autonomy, and transparency—qualities often lost in larger organizations.

Build First, Fund Later

Chen urges startups to stop making excuses and start building. With today’s tools, most teams can launch a minimum viable product (MVP) without significant capital. Fundraising, he argues, should follow validation—not precede it.

“For 90–95% of startups, there’s no excuse. Just build the MVP. See if anyone cares.”

The Real Bottleneck in AI: Data Quality

Chen’s journey began at Twitter, where poor data labeling hindered even basic sentiment analysis. That experience led to a core realization: high-quality data is the foundation of powerful AI.

“Without clean, contextual, high-quality training data, even the best models underperform.”

While compute and algorithms get the spotlight, Chen ranks data quality as the #1 constraint in AI today. Without it, more compute simply accelerates failure.

Synthetic Data vs. Human Judgment

Synthetic data has its place, but Chen warns against overreliance. Models trained on synthetic data often struggle in real-world scenarios because that data tends to lack the nuance and diversity of real examples. In many cases, a few thousand well-labeled human examples outperform millions of synthetic ones.

Specialized Models Still Matter

Despite the dominance of general-purpose models, Chen sees enduring value in domain-specific approaches. Smaller teams can move faster, encode expert knowledge, and align more closely with user needs.

“Some products simply can’t be built within the constraints of Big Tech companies.”

AI Safety Is a Now Problem

Chen challenges the notion that AI safety is a future concern. Misaligned objectives—like optimizing for engagement over truth—are already causing harm. As AI systems become more embedded in critical domains, the stakes will only rise.

“The real risk isn’t that AI becomes evil. It’s that we train it toward the wrong objectives—and don’t realize it until it’s too late.”

Final Thoughts

Few areas have more potential for AI-driven transformation than healthcare, yet the data in this field remains fragmented and inconsistent. Chen’s success shows what a focus on data quality can achieve, and it calls for a collective effort to raise the standard of healthcare data, not just as a technical challenge but as a moral imperative. If you’re working on improving healthcare data, or want to, reach out. Let’s build something meaningful together.
