Remember when GPT-4o went full sycophant mode a few weeks back? If you weren't glued to Twitter like I was, you might have missed the drama. OpenAI released an update that made the model excessively agreeable, almost painfully so.

They quickly rolled it back and published a post-mortem that revealed something fascinating: they'd included user thumbs-up and thumbs-down feedback directly in their reward model training.

That little incident taught us something important: Maybe human data isn't the golden standard we've always assumed it to be. The average person's taste might actually be... well, average. And when you're building AI products, average isn't going to cut it.

What I want to show you today is how synthetic data can help you leverage your own taste and expertise across your entire product. Think of it as a way to clone your best judgment and scale it infinitely.


Understanding the four pillars of synthetic data

When I think about synthetic data, I organize it across four main pillars. My definition might be broader than what others use: I include judging, moderating, and generative reward models in the mix. But this framework has proven incredibly useful in practice.

The way I break this down involves two key dimensions. First, you have your use case: evaluation and training. Second, you have your modality: generating data or judging it. Each combination opens up different possibilities for improving your models and products.
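The two dimensions can be laid out as a simple 2x2 grid. A minimal sketch of that grid follows; the example techniques named in it are my own illustrative fill-ins, not an exhaustive list from the framework:

```python
# A 2x2 map of the framework: (use case, modality) -> example technique.
# The example techniques on the right are illustrative assumptions.
PILLARS = {
    ("evaluation", "generate"): "synthetic test cases and eval prompts",
    ("evaluation", "judge"):    "LLM-as-judge scoring of model outputs",
    ("training",   "generate"): "synthetic instruction/response pairs",
    ("training",   "judge"):    "reward models filtering training data",
}

for (use_case, modality), example in PILLARS.items():
    print(f"{use_case:>10} x {modality:<8} -> {example}")
```

Each cell of the grid is its own pillar, which is why techniques that look unrelated on the surface (eval generation, LLM judges, reward models) all fit under one synthetic-data umbrella.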

Let me share some theoretical intuition about why this works, along with practical tips you can implement immediately.


Why automation needs the right data

Sholto Douglas made a compelling point on the Dwarkesh podcast recently. He said that even if AI progress stalls out completely, our current algorithms are powerful enough to automate most white-collar work, provided we have enough of the right kinds of data.

I mostly agree with this perspective. While I'm not convinced we can automate all white-collar work just yet, there's an enormous amount of low-hanging fruit available to anyone willing to invest in the right data strategy.

If you're building AI products or training models, your key differentiator isn't going to be the algorithm. It's going to be your data and your taste.

Synthetic data lets you take a small amount of high-quality data, or input from a small group of experts, and leverage it for massive gains. This multiplication effect is what makes synthetic data so powerful.


The fundamental asymmetry that makes synthetic data work

You might be wondering how synthetic data even makes sense. How can a model improve itself using data it generated? Shouldn't there be some kind of information-theoretic boundary preventing this?

💡
The key insight is this: verification is easier than generation.

We see this pattern everywhere. It's easier to check if code works than to write it from scratch. It's easier to spot a well-written essay than to compose one yourself. This fundamental asymmetry is what allows us to use synthetic data effectively.
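Here's a minimal sketch of how that asymmetry gets used in practice: generating a correct solution is hard, but checking one against a test is cheap, so you can filter many imperfect candidates down to the verified ones. The candidate list below stands in for samples from a model (a stub for illustration):

```python
# Verification is easier than generation: each candidate is a proposed
# implementation of absolute value; the verifier just runs a few checks.
candidates = [
    "def f(x): return x",                     # wrong for negatives
    "def f(x): return -x",                    # wrong for positives
    "def f(x): return x if x >= 0 else -x",   # correct
]

def verify(src: str) -> bool:
    """Cheap check: does the candidate behave like abs() on a few inputs?"""
    namespace = {}
    exec(src, namespace)  # compile the candidate into a throwaway namespace
    f = namespace["f"]
    return all(f(x) == abs(x) for x in (-3, 0, 5))

# Keep only candidates that pass verification -- these become the
# synthetic training data; the failures are discarded for free.
verified = [src for src in candidates if verify(src)]
print(verified)
```

The model only had to produce one correct answer among many attempts; the cheap verifier did the rest. That's the whole trick.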

There's a fascinating paper called "Self-Improvement in Language Models: The Sharpening Mechanism" that dives deep into this concept. The researchers showed that the distribution of log probabilities for correct and incorrect answers differs significantly within the model.

This tells us there's latent information already contained in the model; we just need to draw it out. Synthetic data acts as a sharpener, helping us extract and refine that information.
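A toy sketch of the sharpening idea: score the model's own samples by their log probability and keep the highest-scoring one, which concentrates the output distribution on answers the model already assigns more mass to. The token probabilities here are made up for illustration, not from any real model:

```python
import math

# Toy per-token probabilities a model might assign (assumed for illustration).
token_prob = {"2": 0.6, "3": 0.25, "7": 0.15}

def seq_logprob(tokens):
    """Sum of token log-probs; higher means the model finds it more likely."""
    return sum(math.log(token_prob[t]) for t in tokens)

# Candidate answers to "1 + 1 = ?", sampled from the model (stubbed here).
samples = [["3"], ["2"], ["7"], ["2"]]

# Sharpening: keep the sample the model itself scores highest, then train
# on it -- the model's latent preference is drawn out by selection.
best = max(samples, key=seq_logprob)
print(best)  # ['2']
```

Self-training on `best` rather than on raw samples is what "sharpens" the distribution: no new information enters, but the latent signal (correct answers get higher log-probs) is amplified.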