It Doesn't Read Words. It Reads Tokens.
To an AI, the sentence "I love coding" isn't three words. It's a sequence of numbers.
First, text is chopped into chunks called Tokens. A token can be a whole word, part of a word, or even a space. Common words are usually single tokens, while complex or rare words might be split up.
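You can inspect this yourself with OpenAI's open-source tiktoken library (assuming it's installed; the exact IDs depend on which encoding you load):

```python
# pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # the GPT-4-era encoding

ids = enc.encode("I love coding")
print(ids)  # a short list of integer token IDs, one per chunk

# Map each ID back to the text chunk it stands for.
for t in ids:
    print(t, repr(enc.decode([t])))
# Note: tokens often absorb the leading space, e.g. " love" rather than "love".
```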
Words into Numbers (Embeddings)
Once tokenized, each token is converted into a list of numbers called a Vector.
Imagine a giant map. Words with similar meanings (like "King" and "Queen", or "Apple" and "Pear") live close to each other on this map.
This allows the AI to capture relationships. The direction from "France" to "Paris" is roughly the same as the direction from "Japan" to "Tokyo", so the model can work out that "Paris" is to "France" what "Tokyo" is to "Japan" just from distance and direction.
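Here is a toy sketch of that idea with hand-picked 2-D vectors. Real embeddings are learned and have hundreds or thousands of dimensions; the values below are invented purely for illustration:

```python
import numpy as np

# Hand-picked toy vectors; real embeddings are learned, not chosen.
vecs = {
    "France": np.array([1.0, 0.0]),
    "Paris":  np.array([1.0, 1.0]),
    "Japan":  np.array([5.0, 0.0]),
    "Tokyo":  np.array([5.0, 1.0]),
}

# The "capital of" relationship shows up as a shared direction:
offset_fr = vecs["Paris"] - vecs["France"]   # [0., 1.]
offset_jp = vecs["Tokyo"] - vecs["Japan"]    # [0., 1.]

# So Japan + (Paris - France) lands right on Tokyo:
guess = vecs["Japan"] + offset_fr
print(np.allclose(guess, vecs["Tokyo"]))     # True
```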
The Attention Mechanism
This is the secret sauce. When an LLM processes a word, it "looks back" at all the previous words to figure out the context, assigning each one an Attention Score that measures how important it is to the current word.
Example: In "The animal didn't cross the street because it was too tired", when the AI reads "it", it pays huge attention to "animal" to know what "it" refers to.
Predicting the Future
The LLM doesn't just pick one word. It calculates a probability for every word in its vocabulary being the next one.
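To make that concrete: the model's raw scores (called logits) are pushed through a softmax to become probabilities. The prompt, vocabulary, and scores below are invented for illustration; a real vocabulary has tens of thousands of entries.

```python
import numpy as np

# Invented logits for a tiny 5-word vocabulary after the prompt
# "The cat sat on the ..."
vocab  = ["mat", "sofa", "floor", "moon", "banana"]
logits = np.array([4.0, 2.0, 1.5, 0.5, -1.0])

# Softmax turns raw scores into probabilities that sum to 1.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

for word, p in zip(vocab, probs):
    print(f"{word:>7}: {p:.1%}")
# "mat" gets most of the mass, but every word keeps a nonzero chance.
```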
Temperature controls how "risky" the AI is.
- Low Temp (0.1): Boring, accurate. Almost always picks the most likely word.
- High Temp (1.0+): Creative, chaotic. Might pick less likely words.
"The quick brown fox jumps over the ..."
Summary
1. Tokenize: chop text into numbers.
2. Embed: map meaning in space.
3. Attend: find context links.
4. Predict: roll dice for the next word.