It Doesn't Read Words. It Reads Tokens.
To an AI, the sentence "I love coding" isn't three words. It's a sequence of numbers.
First, text is chopped into chunks called Tokens. A token can be a whole word, part of a word, or even a space. Common words are usually single tokens, while complex or rare words might be split up.
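You can inspect this yourself with OpenAI's open-source tiktoken library (assuming it's installed; the exact IDs depend on which encoding you load):

```python
# pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # the GPT-4-era encoding

ids = enc.encode("I love coding")
print(ids)  # a short list of integer token IDs, one per chunk

# Map each ID back to the text chunk it stands for.
for t in ids:
    print(t, repr(enc.decode([t])))
# Note: tokens often absorb the leading space, e.g. " love" rather than "love".
```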
Words into Numbers (Embeddings)
Once tokenized, each token is converted into a list of numbers called a Vector.
Imagine a giant map. Words with similar meanings (like "King" and "Queen", or "Apple" and "Pear") live close to each other on this map.
This allows the AI to capture relationships. The direction from "France" to "Paris" is roughly the same as the direction from "Japan" to "Tokyo", so the model can work out that "Paris" is to "France" what "Tokyo" is to "Japan" just from distance and direction.
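Here is a toy sketch of that idea with hand-picked 2-D vectors. Real embeddings are learned and have hundreds or thousands of dimensions; the values below are invented purely for illustration:

```python
import numpy as np

# Hand-picked toy vectors; real embeddings are learned, not chosen.
vecs = {
    "France": np.array([1.0, 0.0]),
    "Paris":  np.array([1.0, 1.0]),
    "Japan":  np.array([5.0, 0.0]),
    "Tokyo":  np.array([5.0, 1.0]),
}

# The "capital of" relationship shows up as a shared direction:
offset_fr = vecs["Paris"] - vecs["France"]   # [0., 1.]
offset_jp = vecs["Tokyo"] - vecs["Japan"]    # [0., 1.]

# So Japan + (Paris - France) lands right on Tokyo:
guess = vecs["Japan"] + offset_fr
print(np.allclose(guess, vecs["Tokyo"]))     # True
```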
The Attention Mechanism
This is the secret sauce. When an LLM processes a word, it "looks back" at all the previous words to figure out the context, assigning each one an Attention Score that measures how important it is to the current word.
Example: In "The animal didn't cross the street because it was too tired", when the AI reads "it", it pays huge attention to "animal" to know what "it" refers to.
Predicting the Future
The LLM doesn't just pick one word. It calculates a probability for every word in its vocabulary being the next one.
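To make that concrete: the model's raw scores (called logits) are pushed through a softmax to become probabilities. The prompt, vocabulary, and scores below are invented for illustration; a real vocabulary has tens of thousands of entries.

```python
import numpy as np

# Invented logits for a tiny 5-word vocabulary after the prompt
# "The cat sat on the ..."
vocab  = ["mat", "sofa", "floor", "moon", "banana"]
logits = np.array([4.0, 2.0, 1.5, 0.5, -1.0])

# Softmax turns raw scores into probabilities that sum to 1.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

for word, p in zip(vocab, probs):
    print(f"{word:>7}: {p:.1%}")
# "mat" gets most of the mass, but every word keeps a nonzero chance.
```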
Temperature controls how "risky" the AI is.
- Low Temp (0.1): Boring, accurate. Almost always picks the most likely word.
- High Temp (1.0+): Creative, chaotic. Might pick less likely words.
"The quick brown fox jumps over the ..."
Summary
1. Tokenize: chop text into numbers.
2. Embed: map meaning in space.
3. Attend: find context links.
4. Predict: roll dice for the next word.