Jun 19, 2025
When Machines Guess: What "Pick a Number Between 0 and 50" Reveals About AI

Manish Patel
Chief Confuserer
TL;DR - Don't blindly use LLMs. Use the right tool for the right job, and know how your LLM actually works to determine whether it did the right thing.
Guess a number between 0 and 50
Recently, while scrolling through social media, I stumbled upon a curious observation: when prompted with the simple request, “Guess a number between 0 and 50,” large language models (LLMs) like ChatGPT, Gemini, and others almost invariably respond with the number 27. Intrigued by this oddly consistent behaviour, I decided to run a quick experiment and posed the same question to several LLMs, sharing the results in a LinkedIn post.
The outcome was pretty interesting, but then my day was full of meetings, so all things are relative I guess. Nearly every model settled on 27, with very little variation. This isn’t a highly scientific study, but the consistency of the responses highlights something important about how these AI systems work.
It’s a reminder that LLMs, for all their sophistication, aren’t “intelligent” in the way many people imagine. Instead, their outputs are shaped by patterns in the data they’ve seen, not necessarily by genuine understanding or reasoning. As we explore what this means for using LLMs as tools, it’s worth considering when these models are helpful - and when they might fall short.
A Brief Reminder of What LLMs Actually Are…
As a brief technical reminder, large language models (LLMs) are advanced machine learning models built on deep neural networks - specifically, the transformer architecture.
These models are trained on massive datasets containing billions or even trillions of words, allowing them to learn statistical patterns and relationships in language. At their core, LLMs operate by estimating the probability of a given token (a word or part of a word) occurring in a sequence, and then generating the most likely next token based on the input and their learned parameters. This is a really important point: the better they are at doing this, the more intelligent they appear.
During inference, LLMs process input text by converting it into tokens, passing these through multiple layers of self-attention and feed-forward neural networks, and autoregressively predicting each subsequent token until a stopping condition is met. While these models can produce remarkably fluent and contextually relevant text, it’s crucial to remember that they do not possess genuine understanding or reasoning (not yet, anyway) - they are fundamentally statistical engines, not sentient intelligences.
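To make that concrete, here is a minimal sketch of the autoregressive loop. It assumes the Hugging Face transformers library and the small gpt2 checkpoint purely for illustration - production LLMs are vastly larger, but the decoding loop is structurally the same:

```python
# A minimal sketch of autoregressive next-token prediction.
# gpt2 is used here only because it is small and public; the loop itself is
# what every causal LLM does at inference time.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Guess a number between 0 and 50. My guess is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

for _ in range(5):
    with torch.no_grad():
        logits = model(input_ids).logits       # shape: (1, seq_len, vocab_size)
    next_token = logits[0, -1].argmax()        # greedy: pick the most probable token
    input_ids = torch.cat([input_ids, next_token.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))
```

The model only ever repeats that bottom loop: score every token in its vocabulary, emit one, and go again. There is no separate "pick a random number" faculty anywhere in it.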
Yes, I'm sure I'm going to get some hate mail/messages from LLM fanboys. You're totally entitled to your own opinion on it, but mine is that that's where we are right now (June 2025, publicly available LLMs).
27 Reigns Supreme
Overwhelming bias toward 27: Approximately 90% of responses across models like ChatGPT (GPT-3.5 and GPT-4), Gemini, Claude, and Llama defaulted to this specific number. The sample size is quite small, but the pattern is significant, I would say.
Limited intra-model variation: Within individual model families, responses showed high consistency.
Notable exceptions: Microsoft Copilot displayed bifurcated behaviour (is that really surprising? It uses a number of different LLMs under the hood), producing 27 two out of three times and 37 otherwise. This 37 anomaly (highlighted by Leonard Schenk - thanks!) is a fascinating secondary pattern worth deeper investigation. He also shared this very cool video - worth a watch if you're bored intra-meeting. :D
By far the funniest response was 104 from Siri (James Smith, Critical Cloud), thereby proving Apple must be way ahead of everyone else in the AI game. This is the kind of stuff social media was meant for. Love it.

But Why?

I've summarised below some of the comments from the post. When asked "Why 27?", LLMs offer a blend of self-analysis and insight into both human and machine behaviour.
Psychological Preference & Human Bias
LLMs point out that humans tend to perceive odd numbers, especially those ending in 7, as “more random” than even numbers. Numbers like 27 are far enough from the endpoints (0 and 50) to seem non-obvious, yet not so high as to feel arbitrary. The cultural mystique around the number 7 (often cited as the world’s “favorite” number) also plays a role - 27 (as with 37 and the explanation in the video) ends in 7, reinforcing the sense of randomness.
There is some truth to this. As discussed below, LLMs are trained on human-generated data, and that data inherently picks up our biases.
AI Training Bias
Language models are trained on vast datasets full of human-generated content. In these datasets, 27 frequently appears in “pick a number” contexts, creating a statistical feedback loop: because people often choose 27, the AI learns to do the same.
I did a quick check on this, and it's completely wrong. I can't find any significant references online where 27 was repeatedly given as an answer. 37, yes - when the question was 1 to 100. But not 27.
This is an extremely important point. Almost all LLMs have almost the same training data, at least from a foundational perspective (i.e. the whole internet). Given we're getting consistent results across models, there is obviously something in the training data pointing to 27. There was a mention of "7" being a psychologically biased value (thanks to Prof Dan Franks of Optifi for the pointer) - I think that partially answers it. The "7" within "27" is not a separate token, of course, so an LLM doesn't treat the "7" specially; but the overall effect of doing so can still surface through the bias in the data itself.
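If you want to check the tokenisation point yourself, here is a quick sketch using OpenAI's tiktoken library. The exact split depends on which tokeniser a given model uses, so treat the output as illustrative rather than definitive:

```python
# Quick check of how number strings tokenise, using the "tiktoken" library.
# Different models use different tokenisers, so these splits are illustrative only.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a GPT-4-era tokeniser
for text in ["7", "27", "37", " 27"]:
    ids = enc.encode(text)
    print(f"{text!r} -> {ids} -> {[enc.decode([i]) for i in ids]}")
```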
Random, but Not Too Random
LLMs are implicitly optimizing for answers that seem random but are still relatable and familiar. 27 sits in this psychological sweet spot—unpredictable, but not outlandish. While some responses cite mathematical or cultural significance, the dominant factors are the statistical patterns in both human psychology and the training data itself. In short, LLMs pick 27 not because it’s truly random, but because it’s what they’ve learned we expect.
The reasoning on this one is sound, as described in the previous paragraph.
Echo Chamber Effect
The repetition of 27 in online discussions, games, and prior AI responses amplifies its prominence. This “echo chamber” makes 27 an even more likely candidate for future responses, both from humans and AIs.
I think this is only partially true. How much optimisation does GPT 4o do intraday, do you think? My guess would be zero. A fresh chat thread should just use the model "naively". If OpenAI are actually holding a cache of data in RAG-like fashion intra-release, that would be very interesting. I suppose it's possible - seeing as many of these models are news-aware. But that's more about tools than the model itself, which I'll come on to next…
Tools and LLMs are Two Different Things
So a really interesting result was this one from Anne Marie Cunningham, using 4o, which highlights my next point:

Why? Because it offers to generate a random number using a function. For those who are using LLMs at a technical level, this will be of no surprise:

A simple change in prompt forces 4o to use a coding agent, i.e. a tool - it writes some Python code that will actually produce a pseudorandom number, not one based on training data. This is the behaviour we wanted from our original prompt, but not a single model (as far as we can tell) actually did such a thing.
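For illustration, here is a minimal sketch of what such a tool call effectively executes. The exact code 4o writes will differ in detail, but the essence is a standard-library pseudorandom draw rather than a guess learned from training data:

```python
# What the coding tool effectively runs: a pseudorandom draw from Python's
# standard library, independent of any training-data statistics.
import random
from collections import Counter

print(random.randint(0, 50))  # a single draw, inclusive of both endpoints

# Repeated draws are roughly uniform - no single number dominates the way
# 27 does in the raw LLM responses.
counts = Counter(random.randint(0, 50) for _ in range(10_000))
print(counts.most_common(3))
```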

My Point, Finally…
How many times in the last month have you asked an LLM of any flavour to answer a question, do some research, or perform some task where it would have been obvious to us, as humans, to write a plan or some code to answer it properly?
On your next critical assignment I dare you to ask the LLM to put some confidence scores on the output it produces. Do you think it actually calculated those confidence scores? Do you think it actually did some data science and machine learning - from scratch - to produce that score?
My point is this: LLMs are series-of-next-token predictors. They seem to have intelligence, which I think they do to some extent, so far as language is concerned. Why? Because we put our knowledge and intelligence into unstructured, imprecise language. We also put a lot of bullsh*t on the internet, which also gets trained on.
Like a photograph, an LLM captures the moment but isn't the moment itself. So be careful to use LLMs for the right purpose, and be even more careful with your prompting strategy. If you use LLMs as a way to retrieve knowledge rather than as something intelligent, you can't go far wrong (hallucinations notwithstanding).
Importantly, know when an LLM is using tools, as in my earlier example - in which case it did do the right thing.
The marriage of tooling with LLMs is absolutely key - and we, as LLM operators, need to know how and when to use such things. Otherwise we will become victims of our own human-originating biases.