Weekend Read in AI - #6
Grok-3 makes a strong debut, DeepSeek sheds its censorship, and a front-row seat to how AI is reshaping startups at Y Combinator.
xAI’s new model, Grok-3, has delivered an impressive performance on lmarena.ai. Strangely, it’s not making big waves, and I can’t quite wrap my head around why. My best guess is that it comes down to the fact that it’s Elon Musk’s model. Some people may prefer to ignore that the most powerful figure in tech now also controls one of the most powerful AIs. Others distrust him so deeply that they assume the benchmark must be rigged. That’s a tough claim to buy, especially given Grok-3’s strong results across multiple benchmarks.
Another reaction is: “Well, he bought 200,000 H100 GPUs and threw a bunch of data at it - what did you expect?” Sure, and producing electric cars and reusable rockets is just engineering… Of course, it would have been surprising if Grok-3 hadn’t performed well. But its success also underscores that catching up to the state of the art is possible. The real question is: how many others could pull this off? From both an engineering and organizational standpoint, it’s hard not to be impressed. I certainly am - and I can’t wait to get my hands on the API to test it.
DeepSeek-R1 1776
Perplexity made waves by releasing a version of the DeepSeek-R1 model that has been “post-trained to provide unbiased, accurate, and factual information.” This open-source release is remarkable for several reasons. First, it’s impressive how they managed to make the model respond factually on topics like Tiananmen Square and the Uyghurs in China. It’s not just that the R1-1776 version has less Chinese censorship - it seems to have eliminated it entirely.
Another striking aspect is the model’s name. 1776 is, of course, the year of the US Declaration of Independence, and it’s hard to interpret this as anything other than a geopolitical middle finger. What Perplexity hopes to gain from this move isn’t entirely clear, but I’m glad they are:
a) proving that this can be done, and
b) releasing the model weights.
AI startups from Y Combinator
At AMLD 2025 last week, we had the pleasure of hosting YC partner Nicolas Dessaigne as a keynote speaker. On the sidelines of the event, he took some time to sit down with me for a podcast episode - one I highly recommend listening to (on Apple Podcasts or Spotify). Nicolas has a front-row seat to the AI startup ecosystem and how these companies are disrupting markets at an unprecedented pace.
Here are a few highlights from our conversation:
1️⃣ "The most impressive trend we've seen in 2024 was voice AI. The first company that impressed me was a company doing an AI interviewer who is going to interview candidates for a job." - on the types of AI use cases that started working last year.
2️⃣ "At YC, we think founders first, not idea first. We would sometimes fund a team of founders that we really like despite their idea." - on why people matter more than anything else.
3️⃣ "When I was a founder, the best startups went from zero to 1 million annual recurring revenue, in one year. That was best-in-class. Today, best-in-class is zero to 10 million." - on how AI is immediately unlocking business value that was inaccessible to software pre-AI.
4️⃣ "Usually, after 5 minutes, you know if you want to fund a company or not." - on selecting founders for YC.
5️⃣ "Moore's Law was 2x every 18 months. We are at 10x every 12 months - it's super-exponential. You can work on things today that seem impossible because the cost doesn't make sense, but in just 12 months from now, it will totally make sense." - on how to think about a future where intelligence is becoming a commodity.
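To put the two rates from that last quote side by side, here is a quick back-of-the-envelope sketch in Python. It assumes both rates compound smoothly over time. The function name and the 36-month horizon are my own choices for illustration, not something from the conversation.

```python
def improvement_factor(multiple: float, period_months: float, elapsed_months: float) -> float:
    """Total improvement after `elapsed_months`, assuming things get
    `multiple` times better every `period_months` (smooth compounding)."""
    return multiple ** (elapsed_months / period_months)


# Illustrative horizon (an assumption, not a figure from the interview).
HORIZON_MONTHS = 36

moore = improvement_factor(2, 18, HORIZON_MONTHS)   # 2x every 18 months
ai = improvement_factor(10, 12, HORIZON_MONTHS)     # 10x every 12 months

print(f"Moore's Law pace after {HORIZON_MONTHS} months: {moore:.0f}x")  # 4x
print(f"10x-per-year pace after {HORIZON_MONTHS} months: {ai:,.0f}x")   # 1,000x
```

Over three years, the Moore's Law pace compounds to 4x, while 10x per year compounds to 1,000x. That gap is why something whose cost looks absurd today can look reasonable a year or two later.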
The main thing I took away from our conversation is the enormous speed at which startups can now execute on an idea and rapidly capture large parts of a market. So many things seem to be up for grabs - things that simply didn’t work two years ago now do because of AI.
And in this technology revolution, we’re not talking about fluff. We’re talking about things that deliver immediate economic value - things people are willing to pay for (hence the massive revenue numbers after just one year).
There were too many points to cover here, so I encourage you to listen to the full conversation on Apple Podcasts or Spotify.
CODA
This is a newsletter with two subscription types. I highly recommend switching to the paid version. While all content will remain free, all financial support directly funds EPFL AI Center activities.