What Building a RAG Pipeline Taught Me About Search
DocuChat, chunking strategies, and why retrieval - not the LLM - decides whether your answers are any good
DocuChat started as a simple idea: upload a PDF, ask it questions. LangChain, Pinecone, Next.js - the tutorial stack. The tutorial version worked in a demo and fell apart on real documents. Getting it production-worthy taught me that RAG is a search problem wearing an AI costume.
The LLM is the last mile, retrieval is the road
When answers were wrong, my instinct was to blame the model or tweak the prompt. Almost every time, the real failure was upstream: the relevant passage never made it into the context. You can't prompt-engineer your way out of bad retrieval. Once I started evaluating retrieval separately - “did the right chunks come back?” - quality problems became debuggable instead of mystical.
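Evaluating retrieval on its own can be quite small. The sketch below assumes a hand-labeled set of questions, each paired with the IDs of the chunks that actually contain the answer. It scores any retriever with two common metrics: hit rate (did at least one relevant chunk come back in the top K?) and mean reciprocal rank (how high did the first relevant chunk rank?). The `EvalCase` and `Retriever` types and the choice of metrics are illustrative assumptions. They are not DocuChat's actual harness.

```typescript
// A hand-labeled evaluation case (hypothetical shape): a real question plus
// the IDs of the chunks that contain its answer.
interface EvalCase {
  question: string;
  relevantChunkIds: string[];
}

// Any retriever that returns ranked chunk IDs for a question.
type Retriever = (question: string, topK: number) => Promise<string[]>;

interface RetrievalReport {
  hitRate: number; // fraction of questions with a relevant chunk in the top K
  meanReciprocalRank: number; // mean of 1 / rank of the first relevant chunk (0 if none)
  misses: string[]; // questions whose relevant chunks never came back
}

async function evaluateRetrieval(
  cases: EvalCase[],
  retrieve: Retriever,
  topK = 8,
): Promise<RetrievalReport> {
  let hits = 0;
  let reciprocalRankSum = 0;
  const misses: string[] = [];

  for (const c of cases) {
    const retrieved = (await retrieve(c.question, topK)).slice(0, topK);
    const relevant = new Set(c.relevantChunkIds);
    // 0-based position of the first relevant chunk, or -1 if none came back.
    const firstHit = retrieved.findIndex((id) => relevant.has(id));
    if (firstHit >= 0) {
      hits += 1;
      reciprocalRankSum += 1 / (firstHit + 1);
    } else {
      misses.push(c.question);
    }
  }

  const n = cases.length || 1; // avoid dividing by zero on an empty set
  return { hitRate: hits / n, meanReciprocalRank: reciprocalRankSum / n, misses };
}
```

The `misses` list is usually the most useful output. Each entry is a question where the model never had a chance, so the fix belongs in chunking or filtering, not the prompt.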
Chunking is where the quality lives
- Fixed-size chunks split tables and clauses mid-thought. Splitting on document structure - headings, sections, paragraphs - kept meaning intact and answers grounded.
- Overlap between chunks looks wasteful and isn't. A sentence that straddles a boundary is unfindable without it.
- Smaller chunks retrieve precisely but lose context; bigger chunks carry context but dilute the embedding. There is no universal size - evaluate on your own documents.
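The three points above can be combined in one small chunker. This is a minimal sketch that assumes the PDF text has already been converted so that headings appear as Markdown-style `#` lines. Recovering headings from a real PDF needs a layout-aware parser, and the post doesn't say how DocuChat did it. The chunker never lets a chunk cross a section heading. It packs whole paragraphs up to a character budget and seeds each new chunk with the tail of the previous one as overlap. `chunkByStructure` and the default sizes (1200 and 200 characters) are illustrative placeholders, not the author's settings.

```typescript
interface Chunk {
  text: string;
  section: string; // heading the chunk came from, kept as metadata for filtering and citations
  index: number; // position within the document
}

// Assumption: headings are Markdown-style lines ("# Title"); blank lines separate paragraphs.
function chunkByStructure(doc: string, maxChars = 1200, overlapChars = 200): Chunk[] {
  const chunks: Chunk[] = [];
  let section = "(untitled)";
  let paragraphs: string[] = [];
  let buffer: string[] = [];

  // Close the paragraph being accumulated. Lines are joined with "\n" so tables keep their rows.
  const endParagraph = () => {
    const text = buffer.join("\n").trim();
    if (text) paragraphs.push(text);
    buffer = [];
  };

  // Pack the current section's paragraphs into chunks of at most ~maxChars.
  const flushSection = () => {
    let current = "";
    for (const para of paragraphs) {
      const candidate = current ? `${current}\n\n${para}` : para;
      // An oversized single paragraph is kept whole rather than cut mid-thought.
      if (candidate.length <= maxChars || !current) {
        current = candidate;
        continue;
      }
      chunks.push({ text: current, section, index: chunks.length });
      // Overlap: start the next chunk with the tail of the previous one, so text
      // near the boundary is findable from either side. (Character-based; a real
      // splitter would snap this to a sentence boundary.)
      current = `${current.slice(-overlapChars)}\n\n${para}`;
    }
    if (current) chunks.push({ text: current, section, index: chunks.length });
    paragraphs = [];
  };

  for (const line of doc.split("\n")) {
    const trimmed = line.trim();
    if (/^#{1,6}\s/.test(trimmed)) {
      // A heading closes the previous section: no chunk spans two sections.
      endParagraph();
      flushSection();
      section = trimmed.replace(/^#+\s*/, "");
    } else if (trimmed === "") {
      endParagraph();
    } else {
      buffer.push(trimmed);
    }
  }
  endParagraph();
  flushSection();
  return chunks;
}
```

Both knobs (`maxChars`, `overlapChars`) are exactly what the last bullet says to tune against your own documents. Re-running a retrieval evaluation after each change shows whether a new size actually helped.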
Metadata filters (plus caching) cut our query time about 60%
The single biggest performance win wasn't a better index - it was scoping. Every chunk carries metadata: document ID, section, page. Filtering the vector search to the active document before similarity ranking shrank the search space dramatically. Combined with caching embeddings for repeated queries, end-to-end query time dropped about 60%.
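The post doesn't show the embedding cache, and the snippet below calls an `embed` helper it doesn't define. One plausible shape is a small in-memory LRU wrapper around whatever embedding call you already have. This sketch assumes that trimming whitespace is enough normalization for the cache key. It caches the promise rather than the vector, so concurrent identical questions share one request. `withEmbeddingCache`, `EmbedFn`, and the 1000-entry limit are hypothetical. A multi-instance deployment would likely want a shared cache such as Redis instead.

```typescript
// Any embedding call, e.g. a wrapper around your provider's API (not shown).
type EmbedFn = (text: string) => Promise<number[]>;

// Wrap an embedding function with an in-memory LRU cache keyed on the trimmed text.
function withEmbeddingCache(embed: EmbedFn, maxEntries = 1000): EmbedFn {
  // Map preserves insertion order, so the first key is the least recently used.
  const cache = new Map<string, Promise<number[]>>();

  return (text: string) => {
    const key = text.trim();
    const hit = cache.get(key);
    if (hit) {
      // Refresh recency by moving the entry to the end of the Map.
      cache.delete(key);
      cache.set(key, hit);
      return hit;
    }
    // Cache the promise so concurrent identical queries share one embedding call.
    const pending = embed(key).catch((err) => {
      // Don't cache failures; only remove the entry if it is still this one.
      if (cache.get(key) === pending) cache.delete(key);
      throw err;
    });
    cache.set(key, pending);
    if (cache.size > maxEntries) {
      const oldest = cache.keys().next().value;
      if (oldest !== undefined) cache.delete(oldest);
    }
    return pending;
  };
}
```

Usage would be something like `const embed = withEmbeddingCache(rawEmbed);`, where `rawEmbed` stands for your existing provider call. The query code itself wouldn't change.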
```typescript
const results = await index.query({
  vector: await embed(question),
  topK: 8,
  // Scope before you rank: similarity search over one document,
  // not the whole corpus.
  filter: { docId: activeDocument.id },
  includeMetadata: true,
});
```
Answers need receipts
The feature that made people trust DocuChat wasn't better answers - it was citations. Every response links back to the page and section it came from. In a RAG system, “here's where I got that” converts a plausible-sounding paragraph into something a user will act on.
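Citations are cheap if the metadata is already on every chunk. This sketch assumes each retrieved chunk carries the `docId`, `section`, and `page` fields the post mentions. It deduplicates chunks by location and numbers each distinct source. Each passage in the prompt context is tagged with its number so the model can be asked to cite "[2]" and the UI can link that number back to the page. `buildCitedContext`, `formatSource`, and the numbered-label scheme are hypothetical. The post doesn't describe how DocuChat renders its links.

```typescript
interface ChunkMetadata {
  docId: string;
  section: string;
  page: number;
}

interface RetrievedChunk {
  id: string;
  text: string;
  metadata: ChunkMetadata;
}

interface Source {
  label: number; // number the model is asked to cite, e.g. [1]
  docId: string;
  section: string;
  page: number;
}

// Number each distinct (document, page, section) and tag every passage with its number.
function buildCitedContext(chunks: RetrievedChunk[]): { context: string; sources: Source[] } {
  const sources: Source[] = [];
  const labelByLocation = new Map<string, number>();
  const parts: string[] = [];

  for (const chunk of chunks) {
    const { docId, section, page } = chunk.metadata;
    const location = `${docId}|${page}|${section}`;
    let label = labelByLocation.get(location);
    if (label === undefined) {
      label = sources.length + 1;
      labelByLocation.set(location, label);
      sources.push({ label, docId, section, page });
    }
    parts.push(`[${label}] ${chunk.text}`);
  }
  return { context: parts.join("\n\n"), sources };
}

// Render a source for display under the answer.
function formatSource(s: Source): string {
  return `[${s.label}] ${s.docId}, ${s.section}, p. ${s.page}`;
}
```

The model can still cite the wrong number. A cheap safeguard is to check that every "[n]" in the answer maps to an entry in `sources` before showing it.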
Where I'd start today
Treat retrieval as the product. Build an evaluation set of real questions with known source passages before you tune anything. Chunk on structure, filter on metadata, cite your sources. The LLM at the end is the easiest part to swap - the search underneath is what you'll actually own.