Taking it step by step
One of the most powerful ways to keep ChatGPT from hallucinating is to ask it to work through a problem one step at a time.
One of the main limitations of ChatGPT is that it lacks a model of how the world works. Instead, it functions as a "next word prediction" machine. For instance, if you prompt ChatGPT with:
I wish you all a wonderful
It will almost certainly answer with “day”. Simply put, ChatGPT has learned from the internet that the phrase "I wish you all a wonderful" is often followed by the word "day".
ChatGPT doesn't understand the meaning of the sentence or the concept of a "day"; it only knows that, statistically, "day" is the most common word that follows that particular sequence of words.
Consequently, when you ask ChatGPT a question, the answer will be correct to the extent that it has appeared, in some form, many times in the training data.
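To make this concrete, here is a toy sketch of frequency-based next-word prediction in Python. It is deliberately naive (a lookup table over a three-sentence corpus, nothing like ChatGPT's actual neural network), but it captures the "most common continuation" intuition:

```python
from collections import Counter

# Toy "training corpus". Real models learn from trillions of words;
# three sentences are enough to illustrate the idea.
corpus = (
    "i wish you all a wonderful day . "
    "i wish you all a wonderful day . "
    "i wish you all a wonderful evening . "
)

words = corpus.split()
CONTEXT = 5  # number of preceding words we condition on

# Count which word follows each 5-word context in the corpus.
counts = {}
for i in range(len(words) - CONTEXT):
    context = tuple(words[i:i + CONTEXT])
    counts.setdefault(context, Counter())[words[i + CONTEXT]] += 1

def predict_next(prompt: str) -> str:
    """Return the most frequent continuation of the prompt's last words."""
    context = tuple(prompt.lower().split()[-CONTEXT:])
    return counts[context].most_common(1)[0][0]

print(predict_next("I wish you all a wonderful"))  # prints "day"
```

Here, "day" wins simply because it follows that context more often than "evening" does: pure statistics, no understanding of what a day is.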
Of course, there is much more going on under the hood, but the key idea is correct, and understanding this can help you improve your prompts. Let’s look at an example.
A quick note on examples
In my posts, I will often use examples. I am testing these examples using the ChatGPT API, and I will use the latest model that is available (currently gpt-3.5-turbo). If you post the example in the ChatGPT interface, you may get a different answer for one of two reasons:
1. You might be using another model, such as GPT-4.
2. Although the ChatGPT interface also sends requests to the API, it asks for a bit of “creativity” or randomness in its responses. As a result, the same prompt may generate slightly different answers. When I conduct my tests via the API, I set this value (the so-called temperature) to 0, ensuring that I always get the same result.
Therefore, if you get different results, it's not because things aren't working, but rather due to one of the two reasons mentioned above. Don't worry about this - the overall concept behind the example remains valid.
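For the curious, here is roughly what such a test setup looks like. This is a minimal sketch, assuming the openai Python package with its pre-1.0 ChatCompletion interface (the one that matches the gpt-3.5-turbo era); the important detail is temperature=0, which switches off the randomness:

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; use your own key

def ask(prompt: str) -> str:
    """Send a single-message prompt and return the model's reply."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # no "creativity": same prompt, same answer
    )
    return response["choices"][0]["message"]["content"]
```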
A simple math example
Let’s now look at the key idea mentioned above, i.e. that the performance of next-word prediction depends on how often certain sequences have been encountered before. A simple math example can help bring this point home.
If you prompt ChatGPT with
23 * 34
It will give you the answer
782
ChatGPT is correct. And why wouldn’t it be? It’s a computer program, after all.
But not all computer programs are built to solve math problems the way calculators are. ChatGPT is a next-word-prediction machine. If it provides a correct answer, it's because it has encountered the sequence of words frequently enough to make an accurate prediction. For instance, the sequence "23 * 34" likely exists on the internet and is often followed by "782".
As you can imagine, only a tiny fraction of all possible math expressions are found on the internet. If you create a longer math sequence, it's probable that the exact sequence is not present in the data used to train ChatGPT.
To see this in action, let's prompt the model with a slightly longer sequence:
23 * 34 - 12
Mathematically speaking, the problem hasn't gotten any harder - anyone capable of performing the multiplication should also be able to do the subtraction.
Not ChatGPT. The answer it gives us is
742
Oops. Since 23 * 34 is 782, the correct answer should be 782 - 12 = 770.
We can't see exactly what's happening inside ChatGPT, but it's safe to conclude that, when it comes to math, the lights are off. ChatGPT is not a calculator but a next-word-prediction machine: whatever its billions of parameters are doing, they just determined that "742" has the highest probability of being the next word after the sequence "23 * 34 - 12".
How do we fix this?
At this point, many people might conclude that ChatGPT is useless since it can't even solve a simple math problem. The issue is that they're using it incorrectly. It's like asking a calculator to tell you tomorrow's weather. Good question, but the wrong tool.
The user interface of a calculator doesn't allow you to ask about the weather, and that's a good thing. The user interface of ChatGPT, on the other hand, is a text field where you can ask anything. However, that doesn't mean you should, at least not if you're expecting a correct answer.
Ok, so no math for ChatGPT?
The answer is that we have to use our limited understanding of ChatGPT's inner workings to our advantage. ChatGPT was able to solve a simple problem, but not the harder one, even though the harder problem was just a combination of two simpler ones.
Thus, the trick is to ask ChatGPT to solve the problem step by step.
If we modify the prompt to the following:
23 * 34 - 12
Work through the problem step by step.
Here’s what ChatGPT will return:
First, we need to multiply 23 and 34:
23 * 34 = 782
Next, we need to subtract 12 from the result:
782 - 12 = 770
Therefore, the final answer is 770.
It’s not rocket science, but still: 🚀
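If you want to reproduce this via the API, the whole experiment fits in a few lines. Using the ask helper sketched earlier (again, an illustrative setup, not the only way to do it), the only difference between the two calls is the extra instruction:

```python
plain = "23 * 34 - 12"
stepwise = "23 * 34 - 12\nWork through the problem step by step."

print(ask(plain))     # plausible but wrong, e.g. "742" as above
print(ask(stepwise))  # works through 23 * 34 = 782, then 782 - 12 = 770
```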
Take it slow
Let’s reflect on what just happened. By modifying the prompt to instruct the model to solve the problem step by step, we obtained the correct answer. Because ChatGPT is a next-word-prediction machine, it will often give an answer that is plausible but incorrect. By instructing it to break a problem into simple steps, you increase the chance of getting the right answer, since the model is much better at solving simple problems with next-word prediction.
Of course, if you just want to do math, you should use a good old calculator. However, the insight here is not about math - it’s about how ChatGPT creates answers. So, can we use these insights for non-math problems?
Yes, we can. In a future post, I will show you examples of just how useful this is going to be.