Token IDs are just addresses. The real meaning lives in embeddings — long lists of numbers where similar words sit close together, and relationships between words become directions you can calculate.
Course: Beginner.
This lesson covers 5 concepts: Token IDs In, The Meaning Table, A Word's Vector, Similar Words Are Close, Words Have Relationships.
These are the token IDs from tokenization — raw integers. They identify each word piece but carry no meaning on their own.
Token IDs are just addresses. The actual meaning lives in what those addresses point to — vectors in the embedding table.
Like a library catalogue number — "BD 1234" means nothing until you walk to that shelf and open the actual book.
3956 = "cat", 38617 = "kitten", 5679 = "dog", 39937 = "puppy". Four integers. No meaning — yet.
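A quick way to see that IDs carry no meaning is to treat them as ordinary numbers and ask which word's ID is numerically closest. This minimal sketch uses the four IDs quoted above. They are illustrative, because real values depend on the tokenizer. The helper `nearest_by_id` is hypothetical. By ID distance, "cat" ends up paired with "dog" and never with "kitten". That pairing comes from how the vocabulary happened to be numbered, not from meaning.

```python
# Token IDs quoted in the lesson (illustrative; real values depend on the tokenizer).
token_ids = {"cat": 3956, "kitten": 38617, "dog": 5679, "puppy": 39937}


def nearest_by_id(word: str) -> str:
    """Hypothetical helper: the other word whose ID is numerically closest to `word`'s ID."""
    target = token_ids[word]
    others = [w for w in token_ids if w != word]
    return min(others, key=lambda w: abs(token_ids[w] - target))


for word in token_ids:
    print(word, "->", nearest_by_id(word))
```

Under a different tokenizer the IDs, and therefore these pairings, would change completely. This is why the model needs the embedding table to attach meaning to each ID.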
A giant lookup table built during training — one row per vocabulary entry, each row packed with numbers encoding that word's meaning.
This table is the entry point for AI's understanding of language: much of what it learned from billions of sentences about how words relate is compressed into these numbers.
Imagine a spreadsheet with 100,000 rows, one per word, where each row holds about 4,000 numbers that somehow capture the word's meaning. That is the embedding table.
A table with 100,000 rows and 4,096 columns. Row 3956 is "cat". Row 5679 is "dog". Their rows are similar. Row 7723 for "inflation" looks nothing like either.
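To get a feel for the scale, here is a back-of-the-envelope size calculation for a table with the 100,000 × 4,096 shape above. Real models use different sizes. The two storage formats shown (32-bit and 16-bit floating point) are assumptions chosen for illustration.

```python
# Shape of the embedding table from the lesson's example (illustrative numbers).
vocab_size = 100_000   # one row per vocabulary entry
embedding_dim = 4_096  # numbers per row

total_numbers = vocab_size * embedding_dim
print(f"{total_numbers:,} numbers")  # 409,600,000 numbers

# Memory depends on how many bytes are used to store each number.
for name, bytes_per_number in [("float32", 4), ("float16/bfloat16", 2)]:
    gigabytes = total_numbers * bytes_per_number / 1e9
    print(f"{name}: {gigabytes:.2f} GB")
```

At roughly 0.8 to 1.6 GB, this single lookup table can be a noticeable share of a model's total size.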
Each token ID retrieves one row from the table: a vector of thousands of numbers (4,096 in this lesson's example). This vector is the word's meaning fingerprint in AI's mathematical space.
This vector is the only representation of meaning the model works with. Every subsequent step — attention, reasoning, generation — builds entirely on these vectors.
Every word gets a unique fingerprint made of thousands of numbers. "Cat" has one fingerprint. "Kitten" has a very similar one. "Democracy" has a completely different one.
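"Cat" becomes a list of 4,096 numbers. Taken together, patterns across those numbers reflect that a cat is an animal, that it is small, and that it purrs. Nobody labelled any of them, and no single number necessarily stands for one property: AI discovered the structure itself during training.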
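The lookup step itself is just indexing: the token ID is used as a row number. This minimal sketch uses a tiny table filled with random placeholder numbers in place of a trained one. The sizes and the `embed` helper are hypothetical. Because the numbers are random, these vectors do not show the cat/kitten similarity that training would produce. Frameworks wrap the same operation in a dedicated layer (for example PyTorch's `torch.nn.Embedding`).

```python
import random

# Hypothetical toy sizes: large enough that the lesson's IDs (up to 39937) are valid rows.
VOCAB_SIZE = 40_000
EMBEDDING_DIM = 8

rng = random.Random(0)  # fixed seed so the toy table is reproducible
# Row i holds the vector for token ID i. A real table is learned during training;
# these random numbers are placeholders.
embedding_table = [
    [rng.gauss(0.0, 1.0) for _ in range(EMBEDDING_DIM)] for _ in range(VOCAB_SIZE)
]


def embed(token_ids: list[int]) -> list[list[float]]:
    """Hypothetical helper: look up one row of the table per token ID."""
    return [embedding_table[token_id] for token_id in token_ids]


vectors = embed([3956, 38617, 5679, 39937])  # "cat", "kitten", "dog", "puppy"
print(len(vectors), "vectors of", len(vectors[0]), "numbers each")  # 4 vectors of 8 numbers each
```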
"Cat" and "kitten" have vectors pointing in nearly the same direction — they are neighbours in meaning-space. "Cat" and "democracy" point in completely different directions.
Proximity in vector space is how AI knows that words are related, even words nobody ever told it were synonyms. Similarity is pure geometry.
Imagine a city where similar ideas live on the same street. Animals on one street, emotions on another, technology on a third. AI measures how far apart any two words are on this map.
cosine("cat","kitten") = 0.89. cosine("cat","dog") = 0.77. cosine("cat","car") = 0.18. cosine("cat","democracy") = 0.04. The numbers match human intuition.
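Cosine similarity, the measure behind these scores, compares the directions of two vectors. A score of 1 means the vectors point the same way, 0 means they are perpendicular (unrelated), and negative values mean they point in opposite directions. This is a minimal sketch with three-number vectors picked by hand (hypothetical, not taken from any model). They are chosen so that "cat" and "kitten" point roughly the same way, so the sketch does not reproduce the exact figures above.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made toy vectors (hypothetical; real embeddings have thousands of numbers).
cat = [0.9, 0.3, 0.0]
kitten = [0.8, 0.2, 0.1]
car = [0.0, 0.1, 0.9]

print(round(cosine(cat, kitten), 2))  # high: nearly the same direction
print(round(cosine(cat, car), 2))     # low: nearly perpendicular
```

Cosine ignores vector length and looks only at direction. This is why it is a common choice for comparing embeddings.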
Take the words you just saw. Start from "cat", subtract "kitten", and add "puppy": you land almost exactly on "dog". The step from kitten to cat (growing up) points in the same direction as the step from puppy to dog.
This is part of what lets AI complete analogies, fill in blanks, and transfer knowledge: relationships tend to show up as consistent geometric directions across the whole vocabulary.
The relationship between a cat and its baby is almost the same mathematical step as the relationship between a dog and its baby. The geometry is remarkably consistent across the language.
cat − kitten + puppy ≈ dog (cosine similarity 0.94 between the result and "dog"). The "baby animal" direction is the same for cats and dogs. AI learned this without being told, purely from reading billions of sentences.
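The arithmetic can be traced end to end with toy vectors. This minimal sketch builds a handful of four-number vectors by hand (hypothetical, not from a real model), computes cat − kitten + puppy, and returns the nearest remaining word by cosine similarity. Excluding the three input words is a common convention in analogy tests. In this toy setup the result is exactly "dog". With real embeddings it only lands close, as the 0.94 above suggests. The `analogy` helper is hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: how closely two vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hand-built toy vectors (hypothetical). Real embeddings are learned and far
# messier, so there the result only lands near the answer.
vocab = {
    "cat":    [1.0, 0.0, 0.0, 0.0],
    "kitten": [1.0, 0.0, 1.0, 0.0],
    "dog":    [0.0, 1.0, 0.0, 0.0],
    "puppy":  [0.0, 1.0, 1.0, 0.0],
    "car":    [0.0, 0.0, 0.0, 1.0],
}


def analogy(a: str, b: str, c: str) -> str:
    """Return the vocabulary word closest to a - b + c, excluding the three inputs."""
    target = [x - y + z for x, y, z in zip(vocab[a], vocab[b], vocab[c])]
    candidates = [w for w in vocab if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vocab[w], target))


print(analogy("cat", "kitten", "puppy"))  # dog

# The "baby animal" offset is identical for both species in this toy setup.
print([k - c for k, c in zip(vocab["kitten"], vocab["cat"])])  # [0.0, 0.0, 1.0, 0.0]
print([p - d for p, d in zip(vocab["puppy"], vocab["dog"])])   # [0.0, 0.0, 1.0, 0.0]
```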