How to Manage an AI's Limited Memory

AI-generated photo of a forgetful goldfish alone in a black void: a forgetful AI with too little memory

You might have experienced this. You're in the zone, cheeks flushed with excitement because ChatGPT is your mentor, understanding you far better than your boss and the rest of the department. Together, you're devising the world's best marketing strategy, five minutes to the deadline. And then suddenly, it's all over, and you're left sitting there, feeling betrayed, frustrated, and astonished, with a forgetful goldfish swimming around, firing random words and phrases left and right.

Individual AI models have a limited memory (context window), and although they are regularly upgraded, there is an upper limit to how much information they can handle at one time. The size of the memory sets a limit for the tasks we can use an AI to solve. It's important to know this in order to choose the right AI model and to use it appropriately, so you get the highest quality answers out of it.

While memory is an important factor, it's not the only one. You can read more about this in 8 AI Limitations You Should Know.

Next, we'll look at how the size of an AI’s memory is measured, the challenges that arise when memory becomes too small, and how we can work around the problem to ensure the best results. We'll use ChatGPT as an example because it is the most widespread model, but the principles also apply to Gemini, Claude, LLaMA, and other language models.

An AI’s Memory Is Measured in Tokens

The memory of a language model like ChatGPT is measured in tokens. The more tokens, the more memory, and the bigger and more complex tasks the AI model can solve.

We can use OpenAI’s tokenizer to see how ChatGPT breaks down our text input (prompts) into tokens. If we now ask ChatGPT-4 Plus the question, “Do AI models have limited memory?”, we can see that the 33 letters, spaces, and other characters are divided into 8 tokens:

This means that with our simple question, we have used the first 8 tokens of ChatGPT-4 Plus’ 32,000 tokens in the chat we have started. There are now approximately 31,992 tokens left in the chat for ChatGPT’s answer and our continued conversation.
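If you want to count tokens programmatically rather than in the web tokenizer, OpenAI's open-source tiktoken library can do it. Here is a minimal sketch in Python; the exact count depends on which encoding you pick, so it may differ slightly from the web tool:

```python
# A minimal sketch: counting tokens with OpenAI's open-source tiktoken library.
# cl100k_base is the encoding used by the GPT-3.5/GPT-4 chat models; the web
# tokenizer may use a different encoding and show a slightly different count.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

prompt = "Do AI models have limited memory?"
tokens = encoding.encode(prompt)

print(tokens)                 # the token IDs the model actually sees
print(len(tokens), "tokens")  # how much of the context window this prompt uses
```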

When we exceed the limit of 32,000 tokens, ChatGPT-4 Plus starts to forget what we talked about at the beginning of our conversation. It simply throws out the earliest tokens, so we always work within a short-term memory consisting of the last approximately 32,000 tokens. This is known as a rolling context window, and the length of this rolling window corresponds roughly to the number of tokens the AI model can hold in its memory. As long as we stay within the limit of 32,000 tokens, ChatGPT-4 Plus can remember what we are talking about, learn from our conversation, and improve its answers to us. Outside the chat, it knows nothing about our conversation. Each chat has its own short-term memory of 32,000 tokens. Therefore, it's a good idea to switch to a new chat when you switch topics.
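If you build on the API yourself, the model does not manage the history for you; you send the whole conversation with every request and must trim it yourself. Below is a minimal sketch of such a rolling window; the budget, helper names, and simplified token counting are my own illustration, not OpenAI's implementation:

```python
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")
TOKEN_BUDGET = 32_000  # e.g. the 32k limit discussed above


def count_tokens(message: dict) -> int:
    # Rough estimate: only the text content is counted; the real chat format
    # adds a few extra tokens of overhead per message.
    return len(encoding.encode(message["content"]))


def rolling_window(messages: list[dict], budget: int = TOKEN_BUDGET) -> list[dict]:
    """Keep only the most recent messages that fit within the token budget."""
    kept, used = [], 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > budget:
            break                       # everything older falls out of memory
        kept.append(message)
        used += cost
    return list(reversed(kept))         # restore chronological order
```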

If our prompts are complex, or we write in a language other than English, then the token consumption will increase significantly, and the memory will correspondingly decrease. We will return to this later.

What Are Tokens, and Why Do Language Models Use Them?

Tokens are the smallest units of meaning into which our text input to ChatGPT and its answers to us are divided. Looking again at the example above, we can see how a token can be a whole word like “Do”, a character like “?”, or a combination like “space+limited”:

Each of these tokens is represented by a number in ChatGPT’s gigantic database, which maps all words and the contexts they can appear in, based on the enormous text volumes it has been trained on. In this way, ChatGPT and other language models make a qualified guess at the most likely next word in their answers, just as we know from the countless apps and programs that use autocomplete to facilitate our text input.

Here's an example from Microsoft Copilot/Bing, which guesses that the word I'm looking for is "feelings" when I enter the first part of the question from before, “Do AI models have”:

“Feelings” was not the word I was looking for; I was looking for “limited”. The fact that an AI can guess wrong is something we must consider when prompting. But that's a different story.

The New York Times has a fine article with animations that show how ChatGPT guesses the next most likely word in its answer based on the content of our conversation with it.

Tokens Can Be Converted to Characters, Words, or Pages

Tokens are good for computers but difficult for us humans to relate to. Therefore, we can convert them into characters (letters, numbers, spaces, periods, etc.), words, or pages in a book. Here is OpenAI’s rule of thumb: one token corresponds to 4 characters or ¾ of a word, but only in English text. In other languages, such as Danish, the token consumption is higher. We will look at why this is important later.

In the table, you can see the maximum memory for the different ChatGPT models measured in tokens, and approximately how many characters, words, and pages that corresponds to in English:

|                      | Free    | Plus & Team | Enterprise & API |
|----------------------|---------|-------------|------------------|
| Model                | GPT-3.5 | GPT-4       | GPT-4            |
| Tokens               | 8k      | 32k         | 128k             |
| Characters           | 32k     | 128k        | 512k             |
| Words                | 6k      | 24k         | 96k              |
| Pages (at 320 words) | 19      | 75          | 300              |

The free ChatGPT-3.5 has the smallest memory of all GPT models, equivalent to 6,000 words, whereas ChatGPT-4 Plus can remember a chat that is four times as long. It can thus handle considerably longer prompts, analyze a larger data volume, and produce longer answers. The largest models are found in Enterprise, which only large companies currently have access to, and in the API models developers use to integrate ChatGPT into various software products such as chatbots for online customer service. Enterprise has a memory 16 times larger than ChatGPT-3.5. There is thus a huge difference in the size and complexity of the tasks the different models can perform.
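Using OpenAI's rule of thumb (one token ≈ 4 characters ≈ ¾ of an English word) and the 320-words-per-page norm, you can reproduce the figures in the table with a few lines of Python. This is only the rule-of-thumb arithmetic, not an exact measurement:

```python
# Rule-of-thumb conversion of a token limit into characters, words, and book
# pages (at 320 words per page), following OpenAI's guidance for English text.
def context_window_estimate(tokens: int, words_per_page: int = 320) -> dict:
    characters = tokens * 4
    words = int(tokens * 0.75)
    pages = round(words / words_per_page)
    return {"tokens": tokens, "characters": characters, "words": words, "pages": pages}


for model, tokens in [("GPT-3.5", 8_000), ("GPT-4 Plus", 32_000), ("GPT-4 Turbo", 128_000)]:
    print(model, context_window_estimate(tokens))
    # GPT-3.5: 32,000 characters, 6,000 words, ~19 pages, and so on up the table
```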

Unfortunately, there is no counter in ChatGPT to help us keep track of our token consumption and give us an indication of whether our prompts can be executed within its memory. There is, however, one in Microsoft's Copilot, which has a small counter in the lower right corner that tracks the number of characters in each prompt; see the screenshot in the previous section.

Convert Tokens to Number of Words, or Use OpenAI’s Tokenizer

It is thus up to us to make sure we do not exceed the memory limit of the AI we use. When it comes to ChatGPT, we can of course use OpenAI’s tokenizer. But we can also create our own rule of thumb based on a conversion of tokens to characters, words, or pages, as shown in the table above. Personally, I find it easiest to keep track of the number of words, as I typically write my prompts in Word. There, the number of words can easily be read in the status bar in the bottom left corner.

Although the number of pages is easier to remember than tokens, characters, and words, the number of words per page varies quite a lot and is therefore difficult to use. OpenAI says that ChatGPT-4 Turbo, with 128,000 tokens, can handle prompts of about 300 pages. These are unspecified book pages of about 320 words each, which is less than an A4 page. I have used that norm anyway, so the comparison of models in the table stays consistent with OpenAI's documentation.

If you work with A4 pages, as we typically do in the office, there can be 400-500 words on a page with standard margins, 1½ line spacing, and a 12-point font. Depending on the number of paragraphs, subheadings, and illustrations, it can be even less. Pages are thus a rather imprecise measure, so I recommend that you use the number of words or OpenAI’s tokenizer.

Significantly Higher Token Usage for Non-English Languages

ChatGPT and other major language models are primarily trained on English texts, which favors the use of English over other languages. This also applies to token consumption. As mentioned in the section above, a token in the GPT-3.5 and GPT-4 models corresponds to 4 characters or about ¾ of a word in English. OpenAI states that consumption is higher in other languages, but does not specify how big the difference is.

I have often wondered why ChatGPT loses its memory much faster when I write in Danish compared to English. Therefore, I conducted a small, entirely unscientific test to see the actual difference in token usage for texts written in Danish compared to their corresponding English translations.

If we now translate the question “Do AI models have limited memory?” from the example above into Danish, “Har AI-modellerne en begrænset hukommelse”, and compare the two prompts in OpenAI’s tokenizer, we can see that the Danish prompt costs 14 tokens, while the equivalent English prompt costs only 8 tokens, even though the two sentences are of comparable length:

I have entered my articles here on the blog into the tokenizer, and it turns out that the original Danish texts use about 48% more tokens than their equivalent English translations! That's quite significant, and it would have been nice if OpenAI had informed us about the consequences of not prompting in English.

Looking at the two examples, we can see that the low English token consumption is due to tokens here more often being whole words, and not, as in the Danish example, broken down into even smaller units of meaning like syllables and single letters. For example, the English word “limited” consists of only one token, namely "space + limited", while the corresponding word in Danish, "begrænset", is broken down into 4 tokens: "space+begr" + "æ" + "n" + "set".
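You can run the same comparison yourself with tiktoken. Here is a small sketch; the current cl100k_base encoding may split the words slightly differently than the web tokenizer screenshots above, but the pattern of Danish costing more tokens holds:

```python
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

english = "Do AI models have limited memory?"
danish = "Har AI-modellerne en begrænset hukommelse?"

for label, text in [("English", english), ("Danish", danish)]:
    token_ids = encoding.encode(text)
    # Decode each token separately to see exactly how the sentence was split.
    pieces = [encoding.decode([token_id]) for token_id in token_ids]
    print(f"{label}: {len(token_ids)} tokens -> {pieces}")
```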

ChatGPT is primarily trained on English, so it is simply much better at English than, for example, Danish, and it handles English much more efficiently. This will of course change as it is trained on more Danish and other non-English text. It also helps that, since this article was written, ChatGPT has been upgraded from 8,000 to 32,000 tokens. However, the fundamental issue remains the same.

Should We Write Prompts in English?

Does the significantly higher token usage in Danish and other non-English languages mean that we should write our prompts in English? Perhaps. I like to think that ChatGPT gives better results with English prompts, because it is trained on English and because it writes better and with more nuance in English. It would be nice if OpenAI told us whether we get better results by prompting in English.

I write prompts in both Danish and English. First and foremost, it is the customer who decides. They must be able to understand, maintain, and change them as needed. But when one can choose, I recommend prompting in the language in which one is most familiar with the technical terms related to the specific task, so one can write as precisely and concisely as possible. That is indeed one of the most important prompt principles if one wants a good answer from an AI. Otherwise, one must learn to write professional prompts in English to fully utilize the capabilities of one's AI. Unfortunately, this makes prompt usability worse for many, as writing good prompts can already be challenging.

The increased token usage for non-English languages also has an economic impact for the API models, where you pay for the total number of tokens in both input and output. This creates a temptation to select a cheaper and less capable model, which can make your product less competitive.
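As a back-of-the-envelope illustration, the extra tokens translate directly into extra cost, because API pricing is per token for both input and output. The prices below are placeholders, not OpenAI's actual rates, so check the current price list:

```python
# Hypothetical per-token prices -- placeholders only; check OpenAI's price list.
PRICE_PER_INPUT_TOKEN = 0.00001
PRICE_PER_OUTPUT_TOKEN = 0.00003


def request_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * PRICE_PER_INPUT_TOKEN
            + output_tokens * PRICE_PER_OUTPUT_TOKEN)


# If the same content costs ~48% more tokens in Danish, it costs ~48% more money.
english_cost = request_cost(input_tokens=1_000, output_tokens=500)
danish_cost = request_cost(input_tokens=1_480, output_tokens=740)
print(f"English: ${english_cost:.4f}  Danish: ${danish_cost:.4f}")
```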

How to Manage an AI's Limited Memory

There is no fixed token limit at which an AI begins to forget what we are talking about; it is rather a gradual transition, so one must pay attention to the details. First, it makes small mistakes, then a few more, and then suddenly it completely loses its short-term memory and has no idea what we were just talking about. Or it simply stops.

This is particularly important to remember when your prompts are complex or written in a language other than English, as token consumption can significantly increase, thereby reducing the AI's memory and its capacity for work.

Here are some tips on how to get the most quality out of the memory size of the AI you are working with:

  • Write only about one topic per chat, and switch to a new chat with a new and fresh short-term memory when you change the topic.

  • When you have a longer conversation with your AI, it may be necessary to repeat important information from time to time, so that it is not forgotten as it falls out of the rolling memory window.

  • Test the limit of your AI’s memory, so you know how it reacts, and what signs to look for when it can no longer remember all the information you have given it.

  • Find out how many tokens the AI you are using has. Then make a rule of thumb for what this corresponds to in characters, words, or pages, whatever suits you best.

  • Use OpenAI’s tokenizer to find out how many tokens your prompts and ChatGPT's responses use in total. This gives you an idea of whether you are within the GPT model's rolling token limit (context window length).

  • Break up large and complex tasks and prompts into several smaller prompts, each of which can be solved within your AI's memory, in a chat, or within a single request if you are using the API models. The output of one prompt may serve as input for the next (see the sketch after this list).

    You may also need to collect the various partial answers at the end and run them through a final prompt before you have the result.

  • If you prompt in a language other than English, you might consider learning how to write professional prompts in English, to fully leverage the capabilities of your AI.

  • If none of the above helps, use a larger and more capable AI model.
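For the tip about breaking a large task into smaller prompts, here is a minimal sketch of what that can look like against the OpenAI API. The chunk size, helper names, and model name are my own choices for illustration; the final step simply collects the partial answers in one last prompt:

```python
import tiktoken
from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set
encoding = tiktoken.get_encoding("cl100k_base")

CHUNK_TOKENS = 3_000  # keep each chunk well below the model's context window


def split_into_chunks(text: str, max_tokens: int = CHUNK_TOKENS) -> list[str]:
    token_ids = encoding.encode(text)
    return [encoding.decode(token_ids[i:i + max_tokens])
            for i in range(0, len(token_ids), max_tokens)]


def summarize(text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat model; this name is just an example
        messages=[{"role": "user", "content": f"Summarize the key points:\n\n{text}"}],
    )
    return response.choices[0].message.content


def summarize_long_document(document: str) -> str:
    # Solve each chunk separately, then collect the partial answers in a final prompt.
    partial_answers = [summarize(chunk) for chunk in split_into_chunks(document)]
    return summarize("\n\n".join(partial_answers))
```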

Will AI Get Permanent Memory?

In ChatGPT you can store some important information using the system prompt, ‘Custom instructions’, so that ChatGPT does not forget it across your chats. It doesn’t fully compensate for the short-term memory issues, though.

OpenAI has developed a new memory management feature, which is currently being tested by selected users in beta. It’s going to be really interesting to see how much it can compensate for the forgetfulness in ChatGPT’s short-term memory.



Jakob Styrup Brodersen

I have worked with data-driven online optimization for 20 years in 5 different industries. Now, I am a freelance CRO and AI consultant: I teach and advise on how to utilize the benefits of AI, and I do prompt engineering.