Lesson 21 of 27 | Intermediate | 8 min read

Before this: Skills & AI config files; The context window, in detail

Feeding the model the right context

Key takeaways

The model only knows what it can see: getting the right material into its context is most of the battle.
Many ways in: pasting, @-mentioning files, letting an agent read the repo, retrieval/RAG over large corpora, and tools via MCP.
Relevant beats big: context management is a skill. Prune the irrelevant, summarise long history, and start fresh when a thread gets muddy.

This is lesson 21 of the path. Back in context windows we established the central fact: a model has no hidden knowledge of your project — it answers from what’s in its context plus what it absorbed during training. The previous lesson gave it standing context through config files. This lesson is about the rest: how you get the right code, docs, and data in front of the model for the task at hand, and — just as important — how you keep the junk out. By the end you’ll know the main ways to supply context and why managing it well is a skill in its own right.

The model only knows what it can see

It’s worth saying plainly, because it explains nearly every disappointing answer: when a model gives a generic or wrong response about your code, it usually wasn’t looking at your code. It was working from the general patterns it learned in training, because the specific file it needed wasn’t in the context. The model isn’t withholding effort; it literally cannot see what you didn’t show it.

So “providing context” isn’t a nice-to-have — it’s the main thing that turns a generic assistant into one that understands your system. Everything below is a different mechanism for the same goal: get the relevant material into the window.

The ways to supply context

Method	What it is	Best for
Paste / attach	Drop code, an error, or a file straight into the chat	Quick, focused questions about a snippet you already have
@-mention files	Reference files by name in an IDE so the tool pulls them in	Working in an editor where the files are right there
Agent reads the repo	An agentic tool opens, searches, and reads files itself	Tasks spanning many files where you don’t know all of them up front
Retrieval / RAG	Automatically fetch the most relevant chunks of a large corpus	Codebases or doc sets far too big to fit in the window
Tools / MCP	Connect the model to live sources — files, APIs, databases	Grounding answers in current, external data

Pasting and attaching is the simplest: you hand the model exactly the text you want it to consider. It’s precise but manual, and it doesn’t scale past a handful of files.

@-mentioning files in an IDE integration is the same idea with less friction — you name the file and the tool inserts its contents for you.
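
To make the idea concrete, here is a minimal Python sketch of what an @-mention expansion might look like. The `@path` syntax, the `expand_mentions` name, and the `--- file ---` separator are illustrative assumptions, not how any particular IDE implements the feature.

```python
import re
from pathlib import Path

# Hypothetical syntax: "@" followed by a relative file path.
MENTION = re.compile(r"@([\w./-]+)")


def expand_mentions(prompt: str, root: Path) -> str:
    """Return the prompt with each @-mentioned file's contents appended."""
    mentioned = []
    for raw in MENTION.findall(prompt):
        rel_path = raw.rstrip(".,;:")  # drop trailing sentence punctuation
        if rel_path not in mentioned and (root / rel_path).is_file():
            mentioned.append(rel_path)
    attachments = [
        f"--- {rel_path} ---\n{(root / rel_path).read_text(encoding='utf-8')}"
        for rel_path in mentioned
    ]
    # Mentions that do not resolve to a file are left in the prompt untouched.
    return "\n\n".join([prompt, *attachments])
```

The point of the sketch is only that naming a file and pasting a file end up as the same thing from the model’s side: text in the window.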

Letting an agent read the repo is the leap the agentic tools made. Instead of you choosing every file, the agent searches the codebase, opens what looks relevant, and reads it — discovering context you might not have known to provide. The cost is that it spends context (and time) exploring, so a good prompt still points it roughly where to look.
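
A single exploration step of that kind can be sketched in Python as a keyword search that ranks files by how often they mention a term. This is an assumption-laden simplification: real agentic tools use their own search and file-reading tools and iterate over several steps, and the `find_relevant_files` name is hypothetical.

```python
from pathlib import Path


def find_relevant_files(root: Path, term: str, limit: int = 5) -> list[tuple[str, int]]:
    """Rank readable text files under root by case-insensitive mentions of term."""
    needle = term.lower()
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binaries and unreadable files
        count = text.lower().count(needle)
        if count:
            hits.append((path.relative_to(root).as_posix(), count))
    # Most mentions first; the stable sort keeps path order for ties.
    hits.sort(key=lambda hit: -hit[1])
    return hits[:limit]
```

Even this toy version shows the cost mentioned above: every file is read to decide whether it matters, which is why a prompt that names the right directory saves the agent work.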

Retrieval and RAG for large corpora

When the material is far bigger than any context window — a million-line codebase, years of documentation — you can’t paste it and an agent can’t read all of it. Retrieval-augmented generation (RAG) solves this by fetching only the most relevant pieces.

The trick relies on embeddings, which we met in types of models: a model turns each chunk of text into a vector (a list of numbers) positioned so that passages about similar things sit near each other. Your question gets embedded the same way, and the system retrieves the chunks whose vectors are closest to it. Those few chunks get pasted into the prompt, and the model answers grounded in your content rather than its general training. That’s the whole shape of RAG: embed the corpus ahead of time (re-embedding chunks when they change), retrieve the nearest matches per question, generate from them. It’s how a tool can answer questions about a codebase that could never fit in its context window all at once.
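
The embed, retrieve, generate shape can be sketched in a few lines of Python. One loud assumption: `embed` here is a toy word-count vector standing in for a real embedding model, so it only matches passages that share words with the question, whereas learned embeddings also match on meaning. All function names are illustrative.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a sparse word-count vector."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks: list[str]) -> list[tuple[str, Counter]]:
    """Embed every chunk ahead of time."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index: list[tuple[str, Counter]], question: str, k: int = 2) -> list[str]:
    """Return the k chunks whose vectors are closest to the question's."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(index: list[tuple[str, Counter]], question: str, k: int = 2) -> str:
    """Paste the retrieved chunks into the prompt ahead of the question."""
    context = "\n\n".join(retrieve(index, question, k))
    return f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"
```

The string returned by `build_prompt` is what would be sent to the model. A production system would typically swap the toy `embed` for an embedding model and the linear scan in `retrieve` for a vector index, but the three stages stay the same.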

Tools and MCP for live data

Some context isn’t a file at all — it’s the current state of a database, a live API response, or today’s open issues. Tools let a model reach out and fetch such things during a conversation. The Model Context Protocol (MCP) is an open standard for these connections: an MCP server exposes a source — files, docs, an API, a database, an issue tracker — through a common interface, and any MCP-aware client can use it. The point of a standard is leverage: expose your data source once, and every MCP-capable tool can read it, instead of writing custom glue for each tool. This is increasingly how an agent grounds itself in current, external reality rather than a static snapshot.
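
To give a feel for what “a common interface” means, here is a toy Python sketch of the request and response shapes. MCP messages are JSON-RPC 2.0, and the `tools/list` and `tools/call` method names and the `content` result shape follow the published protocol as best understood here. Everything else is a hypothetical illustration: the in-process `handle` function, the `get_ticket` tool, and the ticket data. A real server would normally be built on an official SDK, which also handles the initialisation handshake and the transport.

```python
import json

# Hypothetical data source standing in for a real issue tracker.
TICKETS = {"T-1": "Decoder drops frames after retune"}

TOOLS = [{
    "name": "get_ticket",  # hypothetical tool name
    "description": "Fetch a ticket title by id",
    "inputSchema": {
        "type": "object",
        "properties": {"id": {"type": "string"}},
        "required": ["id"],
    },
}]


def _reply(request_id, key: str, payload: dict) -> str:
    return json.dumps({"jsonrpc": "2.0", "id": request_id, key: payload})


def handle(request_json: str) -> str:
    """Answer one JSON-RPC 2.0 request the way a tool server might."""
    request = json.loads(request_json)
    request_id, method = request["id"], request["method"]
    params = request.get("params", {})
    if method == "tools/list":
        return _reply(request_id, "result", {"tools": TOOLS})
    if method == "tools/call":
        if params.get("name") != "get_ticket":
            return _reply(request_id, "error", {"code": -32602, "message": "Unknown tool"})
        title = TICKETS.get(params["arguments"]["id"])
        return _reply(request_id, "result", {
            "content": [{"type": "text", "text": title or "not found"}],
            "isError": title is None,
        })
    return _reply(request_id, "error", {"code": -32601, "message": "Method not found"})


def call_tool(request_id: int, name: str, arguments: dict) -> dict:
    """What a client sends: the same request shape whatever the server is."""
    request = {"jsonrpc": "2.0", "id": request_id, "method": "tools/call",
               "params": {"name": name, "arguments": arguments}}
    return json.loads(handle(json.dumps(request)))
```

The leverage comes from `call_tool` knowing nothing about tickets: a client that can send this one request shape can talk to a database server or a docs server just as well.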

Context management is a skill

Having many ways in creates a new problem: it’s easy to over-fill the window. And more context is not automatically better. Relevant beats big. A window padded with marginally related files dilutes the signal the model needs, can actively mislead it toward the wrong file, and costs more on every turn. Managing context well is mostly about restraint:

Prune the irrelevant. Remove files and attachments that aren’t pulling their weight. If a file isn’t load-bearing for the task, it’s noise.
Summarise long history. A sprawling conversation eventually crowds out the actual code. Condense the decisions so far into a short summary and drop the back-and-forth that produced them.
Start fresh when the thread gets muddy. Once a session is full of abandoned directions and contradictions, the cleanest fix is a new session seeded with just the relevant state. A clean window is easier to steer than a cluttered one — the same advice as refining versus restarting a prompt.

The mental model: you are the editor of the model’s attention. Every token of context competes with every other for the model’s focus, so curating down to what matters is as much the job as gathering material in the first place.
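
The “relevant beats big” rule can be sketched as a small selection routine in Python. The assumptions are stated up front: the relevance scores are whatever you or a tool assign, the four-characters-per-token estimate is a rough heuristic rather than a real tokenizer, and the `Item`, `select_context`, and `min_relevance` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    text: str
    relevance: float  # 0..1, scored however you choose


def estimate_tokens(text: str) -> int:
    """Rough heuristic (an assumption): about four characters per token."""
    return max(1, len(text) // 4)


def select_context(items: list[Item], budget: int, min_relevance: float = 0.3):
    """Keep the most relevant items that fit the budget; drop weak ones outright."""
    chosen, used = [], 0
    for item in sorted(items, key=lambda i: i.relevance, reverse=True):
        if item.relevance < min_relevance:
            continue  # prune: fitting in the window is not a reason to include it
        cost = estimate_tokens(item.text)
        if used + cost <= budget:
            chosen.append(item)
            used += cost
    return chosen, used
```

The `min_relevance` cut-off is the design choice worth noticing: a marginal file is left out even when there is room for it, because spare window space is not a reason to spend the model’s attention.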

In a real codebase: point, don’t dump

GopherTrunk makes the practical lesson concrete. Suppose you’re fixing a bug in the control-channel decoder. The instinct might be to give the model “the whole repo” and let it sort things out. Resist it. The repo spans DSP, multiple protocol decoders, a daemon, and replay tooling — most of which is irrelevant to your bug and would only dilute the model’s attention.

Instead, point it at the right neighbourhood: the internal/scanner/ccdecoder package, plus the one or two types your change touches and the test file you’ll extend. That focused context lets the model match the package’s existing error style and conventions — exactly what the config file and a precise prompt set up — without wading through unrelated DSP code. If it turns out the bug reaches into the down-converter, you add ddc.go then; you don’t front-load the entire tree on the off chance. Start narrow, widen only when the task proves it needs more.

This also keeps you out of a known trap: GopherTrunk has two separate down-conversion paths (a single-channel Downconverter used by replay, and a wideband DDCBank used live). A model handed the whole repo can easily edit the wrong one. Pointing it at the specific path the task concerns prevents that class of mistake — context discipline is correctness, not just efficiency.

Quick check: how does retrieval-augmented generation (RAG) handle a codebase far too large for the context window?

Recap

The model only knows what it can see — generic answers usually mean the relevant code wasn’t in the context, not that the model didn’t try.
Many ways in — paste/attach, @-mention files in an IDE, let an agent read the repo, retrieve with RAG, or connect live sources with tools.
RAG uses embeddings — it fetches the chunks most similar to your question so the model can ground its answer in a corpus too big to fit.
MCP standardises connections — an open protocol so one data source, exposed once, works with any MCP-aware tool.
Relevant beats big — prune irrelevant files, summarise long history, and start fresh when a thread gets muddy.
Point, don’t dump — aim the model at the specific package and types a task needs, and widen only when it proves it must.

Next up, in “One model, or a combination?”: whether to commit to a single model and provider or combine several, the trade-offs involved, and how multi-model setups actually work.

Frequently asked questions

What is RAG and when do I need it?

RAG (retrieval-augmented generation) means fetching the most relevant chunks of a large body of text — code, docs, tickets — and pasting them into the prompt automatically before the model answers. It uses embeddings to find passages similar to your question. You need it when the material is far too big to fit in the context window, like a huge codebase or a wiki, and you want the model grounded in your content rather than guessing.

What is MCP?

MCP, the Model Context Protocol, is an open standard for connecting models to external sources — files, documentation, APIs, databases, issue trackers — through a common interface. Instead of every tool inventing its own way to plug in a data source, an MCP server exposes that source once and any MCP-aware client can use it. It’s how an agent can, say, read your database schema or fetch a live ticket without custom glue for each tool.

Is more context always better?

No — relevant beats big. Stuffing the window with marginally related files dilutes the signal, can confuse the model, and costs more. Point it at the few files that matter, prune what doesn’t, and start a fresh session when a thread gets muddy. Good context management is choosing what to leave out as much as what to put in.