Weekend Read in AI - #2
DeepSeek R1, US regulations, and more mature agents: what caught my attention this week.
A number of big developments in AI this week, and I’ll focus on just a few: an open model, DeepSeek R1, from China shifts the AI landscape; the new U.S. administration has revoked the AI risk executive order and given itself 180 days to come up with a new plan; AI agent frameworks like OpenAI’s Operator are reaching new levels of maturity; and, depending on where you are in the world, the “Framework for Artificial Intelligence Diffusion” published by the outgoing U.S. administration might feel like a nightmare, as it imposes harsh AI compute constraints.
DeepSeek
You’ve probably heard about the Chinese DeepSeek models, but if you’re confused about what’s actually going on and why it’s important, you’re not alone. Here’s the deal:
DeepSeek released two models in the past 30 days: DeepSeek V3 and DeepSeek R1. Both are big deals, but for very different reasons:
DeepSeek V3: This was the first open release of a GPT-4-level model, and it was reportedly trained on a budget of 2.788 million H800 GPU hours. Both the model’s performance and its relatively small training budget made headlines, but the cost angle is what got the most attention.
Two things make this interesting. First, it shows that even when you restrict someone to lower-performance chips (as the U.S. did with export controls on China), they can still achieve their goals - necessity really is the mother of invention. Second, this training budget is estimated to be just 5% of GPT-4’s, which highlights how much room there is to make AI training more energy-efficient.
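To put a rough dollar figure on that budget: DeepSeek’s own technical report prices H800 rental at about $2 per GPU hour (their assumption, not a market quote, and it covers only the final training run, not research and ablations):

```python
gpu_hours = 2.788e6          # reported H800 GPU hours for the final V3 training run
price_per_gpu_hour = 2.0     # assumed rental rate in USD, per DeepSeek's report
total_cost = gpu_hours * price_per_gpu_hour
print(f"${total_cost:,.0f}")  # ≈ $5,576,000
```

Roughly $5.6 million for a frontier-class training run is the number that startled people.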
DeepSeek R1: This model is in a league of its own. Like OpenAI’s o1 and o3, it’s a reasoning model: it works through problems step by step during inference. But R1 is openly released under the very permissive MIT license (unlike V3, which uses a more restrictive license).
What makes R1 even more exciting is that it shows its reasoning steps in a fully transparent way. With earlier reasoning models like o1 and o3, the reasoning was there, but it stayed hidden - the “secret sauce.” DeepSeek R1, on the other hand, lets you see everything, which is huge for researchers and practitioners.
Reasoning models - those that reason during inference - are generally much more powerful than ones that don’t. That’s why o1 was a step up from traditional GPT-like models and why o3 took it even further with better reasoning.
What sets DeepSeek R1 apart is that it completely exposes its reasoning process. This transparency makes it not only fascinating to watch but also a great tool for understanding how these models “think.”
For example, in my lab, we tested R1 by asking it to classify food items into predefined groups, always a nightmare in nutrition studies. Here’s how it reasoned through the classification for “Berries, (average), cooked, without addition of sugar”:
Alright, I'm faced with assigning the food item "Berries, (average), cooked, without addition of sugar" to one of the predefined food groups. Let's break this down step by step.
First, I'll identify the key components of the food item. The main ingredient here is "Berries," which are a type of fruit. So, the first thought is that this might fall under the "fruits_nuts" group, as that category includes fresh fruits and dried fruits, among others.
The terms "average," "cooked," and "without addition of sugar" are descriptors. "Average" likely refers to a standard preparation, "cooked" indicates that the berries have been heated, and "without addition of sugar" suggests that no sweeteners have been added. However, none of these descriptors change the fundamental classification of berries as a fruit. Cooking methods and absence of sugar don't typically shift a food item to another category; they just describe the preparation method.
Looking at the predefined groups, "fruits_nuts" seems appropriate here because berries are a fruit. Let me double-check the other groups to ensure there's no better fit. For example, "sauces_seasoning" might involve things like jams or preserves, but since this is cooked berries without added sugar, it's more about the fruit itself than a sauce or seasoning.
Additionally, "fruits_nuts" includes both fresh and dried fruits, so while cooks might prepare them differently, the core ingredient is still fruit. Therefore, I'm confident that "fruits_nuts" is the correct group for this item.
I should also consider the format and guidelines. The answer needs to be a string without any additional explanations, just the boxed group name. There's no mention of case sensitivity issues here, so the exact match is fine.
In conclusion, "Berries, (average), cooked, without addition of sugar" fits perfectly into the "fruits_nuts" category.
It’s incredibly fascinating to follow the thought process here, and it’s bound to be an excellent data source for training future models. The release of such a strong reasoning model with a permissive license is nothing short of transformative for AI.
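Because R1’s chat format wraps the chain of thought in <think> tags before the final answer, pulling the trace out for analysis - or for building a training set - is a small parsing job. A minimal sketch, assuming you have the raw completion text:

```python
import re

def split_reasoning(completion: str):
    """Split an R1-style completion into (reasoning, answer).

    Assumes the chain of thought is wrapped in <think>...</think>,
    as DeepSeek R1's chat format does; returns an empty reasoning
    string if no such block is present.
    """
    match = re.search(r"<think>(.*?)</think>", completion, flags=re.DOTALL)
    if not match:
        return "", completion.strip()
    reasoning = match.group(1).strip()
    answer = completion[match.end():].strip()
    return reasoning, answer

raw = "<think>Berries are a fruit, so fruits_nuts fits.</think>fruits_nuts"
reasoning, answer = split_reasoning(raw)
print(answer)  # fruits_nuts
```

In our food-classification runs, collecting these traces alongside the final labels is exactly what makes the output so useful beyond the classification itself.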
US regulations
A lot has happened in the U.S. over the past two weeks. On January 15 (still under the Biden administration), the Bureau of Industry and Security released a document titled “Framework for Artificial Intelligence Diffusion.” This framework establishes global export controls on advanced AI computing chips and model weights, introducing a worldwide licensing requirement with stricter rules for most countries - except for 19 specifically listed “trusted” nations, which include key U.S. allies in Europe, Asia, and Oceania.
Interestingly, countries like Switzerland, Austria, Israel, and Singapore are not on the trusted list. This means they’ll face stringent, country-specific quotas on Total Processing Performance (TPP) or will need to apply for “Validated End User” status, which comes with heavy security requirements and monitoring.
Since I’m based in Switzerland, this development is deeply concerning. I can only hope that Swiss diplomats and policymakers are on top of this issue and working to secure better terms.
Meanwhile, the new Trump administration has already made waves by revoking the previous AI risk executive order. They replaced it with a short policy statement:
“It is the policy of the United States to sustain and enhance America’s global AI dominance in order to promote human flourishing, economic competitiveness, and national security.”
Beyond that, they’ve tasked a number of people with drafting a comprehensive action plan over the next 180 days. It’ll be interesting to see what they come up with by summer, as it will likely have a massive impact on global AI regulation.
Agents
Agents are widely considered the big thing in AI for 2025, and OpenAI has just added to the excitement by releasing Operator - “a research preview of an agent that can use its own browser to perform tasks for you.” (Currently available to Pro users in the U.S.)
Since I’m not in the U.S., I haven’t been able to try it out yet, but early testers seem enthusiastic about it - here’s an example.
If you’re wondering what exactly an agent is, here’s a quick explanation: it’s an AI system that can independently perform tasks on your behalf. For example, you could ask it to:
“Book me a flight to New York next week, Tuesday evening, if you can find a direct flight for under $1,000.”
The agent would then head off, complete the task, and return with either a ticket and receipt or a message saying no flights were found.
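Under the hood, most agent frameworks boil down to a loop: the model chooses an action, the host executes it as a tool call, and the observation is fed back until the model decides it’s done. Here’s a toy sketch of that loop - the search_flights tool and the decide stub are hypothetical illustrations, not Operator’s actual API:

```python
def search_flights(destination, max_price):
    # Hypothetical tool: returns fake direct flights under the budget.
    flights = [{"to": "New York", "price": 850, "direct": True}]
    return [f for f in flights
            if f["to"] == destination and f["price"] <= max_price and f["direct"]]

def decide(goal, observations):
    # Stand-in for the LLM's next-action choice; a real framework
    # would prompt the model with the goal and observations so far.
    if not observations:
        return ("search_flights", {"destination": "New York", "max_price": 1000})
    return ("finish", observations[-1])

def run_agent(goal):
    observations = []
    while True:
        action, arg = decide(goal, observations)
        if action == "finish":
            return arg
        observations.append(search_flights(**arg))

result = run_agent("Book me a direct flight to New York under $1,000")
print(result)  # [{'to': 'New York', 'price': 850, 'direct': True}]
```

Swap the decide stub for a model call and add real tools (browser, payments, email), and you have the skeleton of systems like Operator - which is also where the safety questions start.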
As you can imagine, this capability is both incredibly powerful and somewhat concerning - AIs operating independently out in the world raise all kinds of questions. But there’s no doubt this is the direction things are heading. I’m looking forward to eventually testing Operator myself.