AleFeri

Simple open-source RAG application using new LLMs

Python · 9/5/2025

Building a Simple RAG Application with Modern LLMs

Retrieval-Augmented Generation (RAG) has become one of the most practical ways to make LLMs useful with your own data. In this post, I'll walk through how I built an open-source RAG app from scratch.

What is RAG?

RAG combines the power of large language models with a retrieval system that fetches relevant context from your own documents before generating a response.

Instead of fine-tuning a model (expensive and slow), RAG lets you:

  • Ground responses in your actual data
  • Reduce hallucinations by providing real context
  • Update knowledge without retraining

Architecture Overview

User Query
    │
    ▼
┌──────────┐     ┌──────────────┐
│ Embedding │────▶│ Vector Store │
│  Model    │     │  (pgvector)  │
└──────────┘     └──────┬───────┘
                        │ Top-K results
                        ▼
                 ┌──────────────┐
                 │   LLM Prompt │
                 │ Query+Context│
                 └──────┬───────┘
                        │
                        ▼
                   Response
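The flow above can be sketched end to end in a few lines. This is a toy in-memory version, not the app itself: `embed` here is a stand-in keyword-presence vector over a tiny hypothetical vocabulary, where the real app calls an embedding API and queries pgvector instead.

```python
from math import sqrt

# Hypothetical stand-in for a real embedding model: a keyword-presence
# vector over a tiny fixed vocabulary. Real embeddings come from an API.
VOCAB = ["vector", "postgresql", "python", "fastapi", "react", "search"]

def embed(text: str) -> list[float]:
    t = text.lower()
    return [float(word in t) for word in VOCAB]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank documents by cosine similarity to the query embedding.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, context_docs: list[str]) -> str:
    # Stuff the retrieved chunks into the prompt ahead of the question.
    context = "\n\n".join(context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "pgvector adds vector similarity search to PostgreSQL.",
    "FastAPI is a Python web framework with async support.",
    "React is a JavaScript library for building user interfaces.",
]
top = retrieve("vector search in postgresql", docs, k=1)
prompt = build_prompt("vector search in postgresql", top)
```

The prompt then goes to the LLM; swapping the fake `embed` for a real model is the only structural change.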

Tech Stack

Component    Technology                       Why
----------   ------------------------------   -------------------------
Embeddings   OpenAI text-embedding-3-small    Best cost/quality ratio
Vector DB    PostgreSQL + pgvector            No extra infrastructure
LLM          GPT-4o / Claude                  Interchangeable via API
Backend      Python + FastAPI                 Async support, type hints
Frontend     React + TypeScript               Familiar, fast iteration
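One reason pgvector needs no extra infrastructure: a top-K retrieval is a single SQL query. A sketch of how I'd build it, assuming a hypothetical `chunks` table with `id`, `content`, and an `embedding vector(...)` column (all names illustrative):

```python
def topk_query(table: str = "chunks", k: int = 5) -> str:
    # "<=>" is pgvector's cosine-distance operator (smaller = closer).
    # The query embedding is bound as a parameter at execution time,
    # e.g. cursor.execute(sql, (query_embedding,)).
    return (
        f"SELECT id, content, embedding <=> %s::vector AS distance "
        f"FROM {table} ORDER BY distance LIMIT {k}"
    )

sql = topk_query(k=5)
```

pgvector also offers `<->` (L2 distance) and `<#>` (negative inner product) if cosine isn't the right metric for your embeddings.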

The Chunking Strategy

One of the most critical decisions in RAG is how you split your documents. Here's what worked best:

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Note: chunk_size is measured by len() (characters) by default;
# pass a token-based length_function if you want true token counts.
splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,
    chunk_overlap=64,
    # Try to break on paragraphs first, then lines, sentences, words.
    separators=["\n\n", "\n", ". ", " ", ""],
)

chunks = splitter.split_documents(documents)

Why 512 tokens?

  1. Too small (128) — loses context, retrieval becomes noisy
  2. Too large (2048) — dilutes relevant information
  3. 512 with 64 overlap — sweet spot for most use cases
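To see what the size/overlap mechanics actually do, here's a toy character-based splitter — not LangChain's recursive one, just an illustration of how the window slides so each chunk repeats the tail of the previous one:

```python
def chunk(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    # Slide a window of `size` characters, stepping by size - overlap,
    # so consecutive chunks share `overlap` characters of context.
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

text = "x" * 1200
chunks = chunk(text, size=512, overlap=64)
```

The overlap is what keeps a sentence that straddles a chunk boundary retrievable from at least one side.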

Lessons Learned

  • Chunking matters more than the model — garbage in, garbage out
  • Hybrid search wins — combine vector similarity with keyword BM25
  • Reranking is worth it — a cheap reranker on top-20 results beats top-5 vector search
  • Evaluation is hard — build a test set early, even if it's small
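The hybrid-search point can be made concrete with reciprocal rank fusion (RRF), one simple way to merge a vector ranking with a BM25 ranking without tuning score weights. The two rankings below are hypothetical placeholders:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Each ranking is a list of doc ids, best first. RRF scores a doc
    # as the sum of 1 / (k + rank) over all rankings; the constant k
    # damps the influence of any single top-ranked result.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

vector_ranking = ["d3", "d1", "d2"]   # by cosine similarity
bm25_ranking = ["d1", "d2", "d3"]     # by keyword match
fused = rrf([vector_ranking, bm25_ranking])  # → ["d1", "d3", "d2"]
```

Here `d1` wins because it places well in both lists, which is exactly the behavior you want from hybrid search.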

What's Next?

I'm experimenting with:

  • Multi-modal RAG (images + text) using vision models
  • Agentic RAG where the LLM decides when and what to retrieve
  • Graph RAG for documents with complex entity relationships

The full source code is available on GitHub. Star it if you find it useful!