Context Length/Window
Context length (also called the context window) is the maximum number of tokens a language model can consider at once when generating a response. It is a critical constraint for LLMs: the prompt, the conversation so far, and the response being generated must all fit within this limit.
Example:
Consider the sentence: "The quick brown fox jumps over the lazy dog."
- Context Window: If the context window is set to 5 tokens (whole words, in this simplified example), the model only sees the last 5: "jumps over the lazy dog." (A short sketch of this follows the example.)
- Output Generation: When asked about the sentence, the model might generate a response related to these words, e.g., "The dog is lazy."
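Below is a minimal sketch of the idea, using naive whitespace splitting in place of a real tokenizer; the window size of 5 and the example sentence are purely illustrative.

```python
# Minimal sketch of a fixed-size context window over whitespace tokens.
# Real models use subword tokens, but the truncation idea is the same.

def apply_context_window(text: str, window_size: int) -> list[str]:
    """Keep only the last `window_size` tokens; everything earlier is discarded."""
    tokens = text.split()          # naive whitespace "tokenization" for illustration
    return tokens[-window_size:]   # the model would only "see" this slice

sentence = "The quick brown fox jumps over the lazy dog."
visible = apply_context_window(sentence, window_size=5)
print(visible)  # ['jumps', 'over', 'the', 'lazy', 'dog.']
```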
Maintaining Context in Long Conversations
When a conversation exceeds the model's context window, the earliest messages fall out of the prompt and are effectively forgotten. To work around this, chat systems often summarize or compress earlier turns so that a condensed version of the prior conversation stays inside the window. This lets the model produce relevant responses even when the original messages are no longer part of its input.
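Here is a hedged sketch of one common strategy: keep the most recent turns verbatim and fold older turns into a running summary. The `summarize` helper below is hypothetical; in practice it would usually be another LLM call.

```python
# Sketch only: fold older turns into a summary so the prompt stays short.

def summarize(messages: list[str]) -> str:
    # Placeholder: a real system would ask a model to compress these turns.
    return "Summary of earlier conversation: " + " / ".join(m[:30] for m in messages)

def build_prompt(history: list[str], max_recent: int = 4) -> str:
    older, recent = history[:-max_recent], history[-max_recent:]
    parts = []
    if older:
        parts.append(summarize(older))  # compressed form of turns that no longer fit
    parts.extend(recent)                # recent turns kept verbatim
    return "\n".join(parts)

history = [f"turn {i}: ..." for i in range(10)]
print(build_prompt(history))
```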
Tokens
Tokens are the fundamental units of text that a language model processes. They can be individual words, subwords, or even characters, depending on the model's tokenization strategy.
Tokenization Examples:
- Word-Level Tokenization: "The quick brown fox" → ["The", "quick", "brown", "fox"].
- Subword-Level Tokenization: words missing from the vocabulary are split into pieces, e.g., "tokenization" → ["token", "##ization"]. The "##" prefix (a WordPiece convention) marks a piece that continues the preceding word; a runnable sketch follows this list.
- Character-Level Tokenization: "The quick brown" → ["T", "h", "e", " ", "q", "u", "i", "c", "k", " ", "b", "r", "o", "w", "n"].
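The sketch below shows WordPiece-style subword tokenization using the Hugging Face `transformers` library. It assumes the library is installed and the `bert-base-uncased` tokenizer can be downloaded; the exact splits depend on that tokenizer's vocabulary.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

print(tokenizer.tokenize("The quick brown fox"))
# Common words usually stay whole, e.g. ['the', 'quick', 'brown', 'fox'].

print(tokenizer.tokenize("Tokenization of unhappiness"))
# Rarer words are split into pieces, with '##' marking a continuation
# of the preceding piece.
```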
Token Counting
Tokenization is essential for both training and inference, and token counts matter in two ways (a counting sketch follows this list):
- Determine Context: As in the context window example above, the prompt and the generated response must together fit within the model's token limit, so the token count decides how much context can be kept.
- Training Data Preparation: Tokenization maps text to integer token IDs, the numerical representation the model is actually trained on.
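A hedged sketch of counting tokens before sending a prompt, using the `tiktoken` library (assumed installed); the encoding name, the 4096-token limit, and the reply budget are illustrative values, not tied to a specific model.

```python
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

def fits_in_context(prompt: str, max_context: int = 4096, reserved_for_reply: int = 512) -> bool:
    """Check whether the prompt leaves enough room for the reply inside the window."""
    n_tokens = len(encoding.encode(prompt))
    return n_tokens + reserved_for_reply <= max_context

prompt = "The quick brown fox jumps over the lazy dog."
print(len(encoding.encode(prompt)), fits_in_context(prompt))
```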