AI never looks up a stored answer. It builds the response one word at a time — each word chosen from a probability distribution, each chosen word feeding back in to shape the next choice. Pure construction, not retrieval.
Course: Beginner.
This lesson covers 5 concepts: Your Question, The Generation Loop, Choosing the Next Word, Word by Word, Answer Assembled.
Your question enters the model as context: the text the model reads before it writes a single word of the answer.
The question is the only starting point. AI has no script, no stored answer, no template — just your words and everything it learned during training.
Like handing someone a question and saying "go": they have no notes, no cheat sheet, just what they already know and your question in front of them.
"What is the largest planet?" Five words that trigger a chain of predictions, each adding one token to the answer.
Before generating each word, the model sees your question plus every word it has already written. Each new word expands the context for the next prediction.
This feedback loop is what gives AI responses coherence — each new word is informed by the complete conversation, not just the last few words.
Like a typewriter that can read everything it has already typed. Each new keystroke is chosen based on seeing the full page — not just the last word.
Generating "Jupiter": sees full context → picks "Jupiter" → adds it → now sees "...is Jupiter" → picks "." to end the sentence.
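The loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how a real model works internally: the `TOY_NEXT_TOKEN` table, the `predict_next` function, and the `<end>` stop marker are all hypothetical. The table exists only so the example runs; a real model computes each prediction instead of looking it up. The point is the shape of the loop: predict, append, repeat.

```python
# Hypothetical stand-in for a trained model. A real model computes the next
# token from the context; this table only exists so the loop can run.
TOY_NEXT_TOKEN = {
    (): "The",
    ("The",): "largest",
    ("The", "largest"): "planet",
    ("The", "largest", "planet"): "is",
    ("The", "largest", "planet", "is"): "Jupiter",
    ("The", "largest", "planet", "is", "Jupiter"): ".",
}
END = "<end>"  # assumed stop marker


def predict_next(question, written):
    """Hypothetical predictor: receives the question plus every token written so far."""
    return TOY_NEXT_TOKEN.get(tuple(written), END)


def generate(question, max_tokens=20):
    written = []
    for _ in range(max_tokens):
        token = predict_next(question, written)
        if token == END:
            break
        written.append(token)  # the new token becomes part of the context
    return written


tokens = generate("What is the largest planet?")
print(tokens)
```

Each pass through the loop hands `predict_next` a longer list than the one before, which is the feedback loop in miniature.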
After generating "The largest planet is", the model computes a score for every entry in its vocabulary, which can run to roughly 100,000 entries. In this example, "Jupiter" wins with 94%.
This probability distribution is computed fresh at every single step. Different questions produce entirely different distributions.
Like a very confident multiple-choice exam where every word in the vocabulary is an option. The model scores them all and, in this simple case, picks the top scorer.
Context: "The largest planet is ___". Jupiter scores 94%. Saturn 3%. Everything else shares the final 3%. The answer is unambiguous — Jupiter wins easily.
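One common way to turn raw scores into percentages like these is the softmax function. The sketch below assumes a made-up four-word vocabulary with invented scores, so the numbers are illustrative and will not match the 94% and 3% above exactly.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores.values())  # subtract the max for numerical stability
    exps = {word: math.exp(s - highest) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}


# Invented raw scores for the context "The largest planet is ___".
scores = {"Jupiter": 6.0, "Saturn": 2.5, "Earth": 1.0, "banana": -3.0}

probs = softmax(scores)
best = max(probs, key=probs.get)  # simplest strategy: take the top scorer

for word, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{word}: {p:.1%}")
print("chosen:", best)
```

Taking the top scorer every time is the simplest strategy. Many systems instead sample from the distribution, so a lower-ranked word is occasionally chosen.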
Each chosen word joins the growing output sequence. Six separate predictions produced six tokens — stacked together, they form a complete sentence.
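Stacking the tokens into a sentence can be sketched as below. The `assemble` helper is hypothetical and assumes whole-word tokens. Real tokenizers usually work with sub-word pieces and handle spacing differently.

```python
def assemble(tokens):
    """Join tokens into text, attaching punctuation to the preceding word."""
    text = ""
    for token in tokens:
        # Put a space before every token except the first and punctuation marks.
        if text and token not in {".", ",", "!", "?"}:
            text += " "
        text += token
    return text


tokens = ["The", "largest", "planet", "is", "Jupiter", "."]
print(assemble(tokens))  # The largest planet is Jupiter.
```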
The output is never planned ahead. Each word only comes into existence at its own step — the model does not "know" the full answer in advance.
Like building a sentence with individual word cards: you do not plan the whole sentence first, you just keep picking the best next card until the sentence feels complete.
"The" then "largest" then "planet" then "is" then "Jupiter" then "." Six predictions made in sequence, each depending on the words already written; six tokens, one complete answer.
The finished answer, assembled token by token in a fraction of a second. Every word was chosen from a probability distribution. Nothing was stored, nothing was retrieved.
This exact same process produces every response you have ever gotten from an AI — poetry, code, explanations, essays. Always one word at a time.
Like watching a sentence assemble itself from nothing — each word appearing because the model decided it was the best next choice, not because it was written in advance.
"Jupiter. It is by far the largest planet..." — built token by token in under a second. No copy-paste. No database. Pure prediction.