#how-llms-work

How an LLM Generates Text

Step through the transformer pipeline from prompt to prediction

HowsOfThings
May 6, 2026

All LLMs are next-word prediction algorithms. That's it. ChatGPT, Claude, LLaMA: at their core, they all do the same thing. They read what you've written so far, compute a probability for every possible next token, and pick one (usually, though not always, the most likely). Then they append that token to the input and predict again. And again. That's how a paragraph gets written, one word at a time.

But "guessing the next word" involves an insane amount of math. Your sentence gets chopped into tokens (pieces of words). Each token becomes a list of 768 numbers called an embedding. Those numbers get processed through 12 layers of attention and transformation, where every token figures out which other tokens matter to it. Finally, the model produces a probability for every single token in its 50,257-token vocabulary.

The tool below runs GPT-2, a real transformer with 117 million parameters, on your text. You can step through each stage: see the tokens, the embeddings as points in space, the actual weight matrices (W_q, W_k, W_v) that compute attention, how each layer transforms the numbers, and the final probability distribution that picks the next word. Everything you see is real data from a real model, not a simulation.

[Interactive tool]
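If you'd rather poke at the same pipeline in code, here's a minimal sketch using the Hugging Face transformers library to load the same GPT-2 checkpoint. The prompt is arbitrary, and the greedy argmax at the end is just one decoding strategy, not the only way these models pick words.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The cat sat on the"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Stage 1: tokenization. The prompt is chopped into sub-word pieces.
print(tokenizer.convert_ids_to_tokens(input_ids[0].tolist()))

# Stage 2: embeddings. Each token id looks up a vector of 768 numbers.
embeddings = model.transformer.wte(input_ids)
print(embeddings.shape)  # torch.Size([1, 5, 768])

# Stages 3 and 4: 12 transformer layers, then a score for every
# token in the 50,257-token vocabulary.
with torch.no_grad():
    logits = model(input_ids).logits  # shape [1, seq_len, 50257]

# Softmax turns the last position's scores into a probability
# distribution over the next token.
probs = torch.softmax(logits[0, -1], dim=-1)

# Greedy decoding: append the single most likely token, predict again.
for _ in range(5):
    next_id = torch.argmax(probs).reshape(1, 1)
    input_ids = torch.cat([input_ids, next_id], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    probs = torch.softmax(logits[0, -1], dim=-1)

print(tokenizer.decode(input_ids[0]))
```

Swapping the argmax for torch.multinomial(probs, 1) gives you sampling instead of greedy decoding, which is closer to how the chatbots you actually use choose their words.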