# Building a Simple RAG Application with Modern LLMs
Retrieval-Augmented Generation (RAG) has become one of the most practical ways to make LLMs useful with your own data. In this post, I'll walk through how I built an open-source RAG app from scratch.
## What is RAG?
RAG combines the power of large language models with a retrieval system that fetches relevant context from your own documents before generating a response.
Instead of fine-tuning a model (expensive and slow), RAG lets you:
- Ground responses in your actual data
- Reduce hallucinations by providing real context
- Update knowledge without retraining
## Architecture Overview
```
User Query
     │
     ▼
┌────────────┐      ┌──────────────┐
│ Embedding  │────▶ │ Vector Store │
│   Model    │      │  (pgvector)  │
└────────────┘      └──────┬───────┘
                           │ Top-K results
                           ▼
                    ┌──────────────┐
                    │  LLM Prompt  │
                    │ Query+Context│
                    └──────┬───────┘
                           │
                           ▼
                       Response
```
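To make the flow above concrete, here is a minimal in-memory sketch of the pipeline, not the app's actual code: `embed` is a stand-in for a real embedding API (a bag-of-words count instead of a neural model), and the "vector store" is a plain Python list instead of pgvector. The function names are illustrative.

```python
import math

def embed(text: str) -> dict:
    # Stand-in for a real embedding model: bag-of-words counts.
    vec: dict = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0) + 1
    return vec

def cosine(a: dict, b: dict) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, store: list, k: int = 2) -> list:
    # Rank stored chunks by similarity to the query embedding, keep top-K.
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(query: str, context: list) -> str:
    # Stuff the retrieved chunks and the user question into one prompt.
    joined = "\n---\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

docs = [
    "pgvector stores embeddings in Postgres",
    "FastAPI serves the backend",
    "chunk size matters for retrieval",
]
store = [(d, embed(d)) for d in docs]
prompt = build_prompt("how are embeddings stored?",
                      retrieve("how are embeddings stored?", store))
```

In the real app, `embed` calls the embedding endpoint, `retrieve` becomes a SQL query against pgvector, and `prompt` is sent to the LLM; the shape of the data flow is the same.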
## Tech Stack
| Component | Technology | Why |
|---|---|---|
| Embeddings | OpenAI text-embedding-3-small | Best cost/quality ratio |
| Vector DB | PostgreSQL + pgvector | No extra infrastructure |
| LLM | GPT-4o / Claude | Interchangeable via API |
| Backend | Python + FastAPI | Async support, type hints |
| Frontend | React + TypeScript | Familiar, fast iteration |
## The Chunking Strategy
One of the most critical decisions in RAG is how you split your documents. Here's what worked best:
```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split on paragraph breaks first, then lines, then sentences, then words,
# falling back to single characters only when nothing else fits.
# Note: chunk_size here is measured in characters; for token-based sizing,
# use RecursiveCharacterTextSplitter.from_tiktoken_encoder instead.
splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,
    chunk_overlap=64,
    separators=["\n\n", "\n", ". ", " ", ""],
)

chunks = splitter.split_documents(documents)
```
### Why 512 tokens?
- Too small (128) — loses context, retrieval becomes noisy
- Too large (2048) — dilutes relevant information
- 512 with 64 overlap — sweet spot for most use cases
## Lessons Learned
- Chunking matters more than the model — garbage in, garbage out
- Hybrid search wins — combine vector similarity with keyword BM25
- Reranking is worth it — a cheap reranker on top-20 results beats top-5 vector search
- Evaluation is hard — build a test set early, even if it's small
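On the hybrid-search point: a common way to merge the vector and BM25 result lists is reciprocal rank fusion (RRF). A minimal sketch, where `k=60` is the conventional smoothing constant and the document IDs are made up:

```python
def rrf(rankings: list, k: int = 60) -> list:
    # Each document earns 1 / (k + rank) per list it appears in, so
    # documents ranked highly by either retriever float to the top.
    scores: dict = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_a", "doc_b", "doc_c"]   # from cosine similarity
bm25_hits   = ["doc_c", "doc_a", "doc_d"]   # from keyword search
fused = rrf([vector_hits, bm25_hits])
```

RRF needs no score normalization, which is exactly why it works well here: vector similarities and BM25 scores live on incompatible scales, but ranks are always comparable.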
## What's Next?
I'm experimenting with:
- Multi-modal RAG (images + text) using vision models
- Agentic RAG, where the LLM decides when and what to retrieve
- Graph RAG for documents with complex entity relationships
The full source code is available on GitHub. Star it if you find it useful!
