Edwin Chen
These are the best posts from Edwin Chen.

5 viral posts with 685 likes, 37 comments, and 36 shares.
1 image post, 0 carousel posts, 0 video posts, 1 text post.

Best Posts by Edwin Chen on LinkedIn

Two weeks ago, ICONIQ invited me to speak about AI in Singapore.
They also had an extra ticket to the F1.

I said yes because yeah, sure, I thought I knew what F1 was. They mentioned something about PR and I nodded.

…Turns out I did *not* know what F1 was. (It's a racing competition. Not a Kaggle.)

After the race, I was lucky to run into Toto, who showed me around the garage.

And it blew my mind: F1 might have one of the most insane data systems I've ever seen.
Each car has 300+ sensors streaming real-time telemetry. There’s a decked-out room with giant monitors and analysts studying the data. Teams make million-dollar decisions in seconds. Get your model wrong and... kaboom?

He also explained the gap between their racing simulations and the live, real-world races.
In AI, we talk about data distribution shifts like an annoying edge case.
In F1, distribution shifts are the entire point.
The track spikes 6 degrees. A rain cloud appears. The strategy from 2 laps ago? Completely wrong now. And you have 200 ms to figure it out.
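In ML terms, that's covariate shift, and a toy sketch makes the stakes concrete (all numbers invented; this is not real telemetry or any team's actual model):

```python
# Toy illustration of a distribution shift (made-up physics and numbers;
# nothing here reflects real F1 telemetry or strategy models).

def true_wear(temp_c, lap):
    """Ground-truth tyre wear: degrades faster on a hotter track."""
    return (0.05 + 0.01 * (temp_c - 30)) * lap

def fit_slope(xs, ys):
    """Least-squares slope through the origin: sum(x*y) / sum(x*x)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

laps = list(range(1, 21))

# "Simulator" data: every training lap recorded on a 30 C track.
slope = fit_slope(laps, [true_wear(30, lap) for lap in laps])

# Race day: the track spikes 6 degrees. Same model, new distribution.
errors = [abs(slope * lap - true_wear(36, lap)) for lap in laps]
mean_error = sum(errors) / len(errors)
# The fit was exact at 30 C, but now the error grows with every lap:
# the strategy from 2 laps ago is already stale.
```

With these invented constants the model is perfect on the training distribution, yet its error after the 6-degree spike compounds lap after lap - the model didn't get worse, the world moved.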

So Toto showed me around, explained their data systems, and at the end, gave me a hat.
In return, I gave him a lecture on machine learning evaluation metrics.
(He was gracious about it.)

Thanks, ICONIQ and Toto, for the great time!

P.S. If these are the events ICONIQ throws, no wonder they won the Anthropic deal.

I joined Lenny Rachitsky's podcast to talk about building Surge AI.

We crossed >$1B in revenue last year with <100 employees. Without VCs.

How? By solving the hardest data problems for frontier AI labs.

When researchers need to build the best coding model in the world, understand and fix their model’s weaknesses, and run last-minute evals before a new model launch, they turn to us.

Lenny Rachitsky and I talked about:
• Why Anthropic and Google are winning
• The brutal choice model builders face: prioritizing Engagement vs. sticking to their Values
• Why Fields Medalists and Harvard professors love teaching models on our platform
• What RL environments reveal about hierarchies of agentic behavior
• The underappreciated skills in post-training: Taste and Sophistication
• Why we're still over a decade away from AGI

It was a blast! Check out our full conversation here: https://lnkd.in/e5C5aM_c

YouTube: https://lnkd.in/essGkxqK
Spotify: https://lnkd.in/euXtcRUD

LMArena is a cancer on AI.

I was hoping it would die out 6 months ago, after Maverick showed what it gets you.

But it keeps rearing its head. The WSJ talks about how important it is. A new VP asks what their team is doing to climb it. The cycle continues.

It’s fundamentally broken, optimized for the wrong incentives:
• Users spend 2 seconds skimming responses before clicking their favorite.
• They're not reading carefully. They're not fact-checking. They're just picking whichever model response catches their eye.

This means that the easiest way to win on LMArena is by:
• Being verbose. Longer responses look more authoritative!
• Formatting aggressively. Bold headers look like polished writing!
• Vibing. Wild, colorful emojis grab your attention!

It doesn't matter if a model completely hallucinates. If it looks impressive, LMSYS users will vote for it over a correct answer.
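That incentive is easy to demonstrate with a toy pairwise-voting simulation - assumed numbers and a plain Elo update, not a claim about LMArena's actual rating pipeline:

```python
import random

# Toy simulation of a skim-and-click leaderboard (hypothetical numbers;
# not LMArena's real rating system). Voters pick the longer, flashier
# answer 70% of the time, regardless of which one is correct.
def simulate(n_battles=1000, p_pick_flashy=0.7, k=16, seed=0):
    rng = random.Random(seed)
    ratings = {"concise_correct": 1000.0, "verbose_hallucinated": 1000.0}
    for _ in range(n_battles):
        flashy_wins = rng.random() < p_pick_flashy
        rv, rc = ratings["verbose_hallucinated"], ratings["concise_correct"]
        # Standard Elo expected score for the verbose model
        expected = 1 / (1 + 10 ** ((rc - rv) / 400))
        delta = k * ((1.0 if flashy_wins else 0.0) - expected)
        ratings["verbose_hallucinated"] += delta
        ratings["concise_correct"] -= delta
    return ratings

ratings = simulate()
# The hallucinating-but-flashy model ends up on top of the leaderboard.
```

After a thousand battles the verbose model sits comfortably ahead, even though nothing in the simulation ever rewarded being right.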

It doesn't have to be this way. The best products have principles they stick to.

This is the brutal choice every model builder must eventually make:

1. Do you want to optimize for shiny leaderboards and short-term engagement, chasing user clicks no matter where they take you – in the vein of the worst dopamine loops?
2. Or do you stick to your guns – and prioritize street smarts and real utility?

Sticking to your values is hard. But we’ve seen some frontier labs hold the line. And users loved their models anyway, because hype eventually dies and quality is the only metric that survives the cycle.

LMArena is a plague on AI, and I hope more labs start pushing back.

Blog post with examples here:

It's 3am on Thursday. You're staring at Slack.
"Payment processing is failing for 0.3% of transactions."
Only on mobile. Only for German users.

Figure it out (or you’re fired).

You look.
- Tests? Green 🟢
- Staging? Gorgeous ☀️
- Can’t repro locally
You’re one Stack Overflow prayer away from a breakdown.

Also, your 1yo heard you clacking away and started bawling too.

Which AI do you turn to?

Option A:
Trained on LeetCode. Aces "reverse a linked list" on the first try. Scores 99% on HumanEval.

OR

Option B:
Trained by a senior staff software engineer. Stack Overflows as a hobby. Has saved 1.2M devs from getting fired.

Model A’s advice: “Have you considered refactoring your hash table?”
Model B's: “The algorithms are fine. The problem is Germany's mobile carrier is insane. heh.”

This is why we choose that SO engineer every time.
So he can teach AI to operate in the trenches.

He's NOT teaching models how to:
- Solve problems with perfect inputs.

He's teaching them:
- It's 3am, Suzie is bawling, and the payments only break for users with umlauts. Help!

PhDs and textbooks prepare you for:
✅ Clean inputs
✅ Single correct answers
✅ Problems with an elegant solution

Production hits you with:
😱 "It only breaks on Tuesdays"
😱 "The logs contradict each other"
😱 "It worked yesterday, I swear!"

Unix wizards aren't wizards because they memorized algorithms.
They're wizards because they've been paged enough times to recognize oh, it’s a caching issue.

Again.

Because they know the difference between:
- Code that passes tests
- Code that survives angry Klauses, Felixes, and Helgas wielding bratwurst

Smart ≠ useful. We’re teaching AI to recognize the second.

Because when it’s 3am and you're Googling "why does this only break in Germany" for the 17th time...
You don't need AI that’s read CLRS and spits out Big O.

You need AI that’s faced – and solved – a thousand production meltdowns before.

"Prognosticative pastry." "A hound circling a tree, nose to bark."

These aren’t parodies - they’re actual quotes from SOTA models responding to creative writing prompts, and they’re winning on leaderboards that reward slop.

We’re introducing *Hemingway-bench*, a new AI writing leaderboard, to fix this:

https://lnkd.in/gdG9QdMc

https://lnkd.in/gEbnMwJs

We designed Hemingway-bench to push frontier model writing toward genuine nuance and impact.

Instead of autograders and two-second vibe checks - both of which reward fancy literary devices and dense formatting over actual quality - we asked expert human writers across a variety of fields to judge real-world writing tasks.

Why? I love writing. I love reading. Great science fiction is one of the things that's always inspired me. Even in terms of "enterprise value", so much of what we do in our day-to-day involves writing - we want crisp emails and insightful reports, not dry, verbose summaries.

Yeah, coding is important - but there's a reason I use CC-assisted apps and still haven't read a full-fledged AI novel.

What did we find? Current leaderboards are easily hacked, and often negatively correlated with actual quality. If a model (over)uses all the stuff you learn about in school (metaphors in every sentence! transition words! complex, flowery phrases!), it ranks high on EQ-bench and LMArena.

But that’s not good writing that people actually want.

The winners of Hemingway-bench didn't sound like they were trying to win a poetry slam. Gemini 3 Flash, Pro, and Opus 4.5 took the top 3 spots because they had natural voices that didn't sound pretentious.

They were poetic and immersive, but in the right ways.

When they used wit, they didn't sound cringey and try-hard - they sounded like your naturally funny friend.

I'm waiting for the day AI wins a Pulitzer, and hopefully Hemingway-bench helps guide it on its way.

Check out the leaderboard and examples here: https://lnkd.in/gdG9QdMc

And our blog post describing it: https://lnkd.in/gEbnMwJs
