AleFeri

Simple open-source RAG application using new LLMs

Python · 9/5/2025

Building a Simple RAG Application with Modern LLMs

Retrieval-Augmented Generation (RAG) has become one of the most practical ways to make LLMs useful with your own data. In this post, I'll walk through how I built an open-source RAG app from scratch.

What is RAG?

RAG combines the power of large language models with a retrieval system that fetches relevant context from your own documents before generating a response.

Instead of fine-tuning a model (expensive and slow), RAG lets you:

  • Ground responses in your actual data
  • Reduce hallucinations by providing real context
  • Update knowledge without retraining

Architecture Overview

User Query
    │
    ▼
┌──────────┐     ┌──────────────┐
│ Embedding │────▶│ Vector Store │
│  Model    │     │  (pgvector)  │
└──────────┘     └──────┬───────┘
                        │ Top-K results
                        ▼
                 ┌──────────────┐
                 │   LLM Prompt │
                 │ Query+Context│
                 └──────┬───────┘
                        │
                        ▼
                   Response
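The flow above can be sketched end to end in a few lines. This is a toy in-memory version, not the app itself: `embed` here is a stand-in keyword-presence vector over a tiny hypothetical vocabulary, where the real app calls an embedding API and queries pgvector instead.

```python
from math import sqrt

# Hypothetical stand-in for a real embedding model: a keyword-presence
# vector over a tiny fixed vocabulary. Real embeddings come from an API.
VOCAB = ["vector", "postgresql", "python", "fastapi", "react", "search"]

def embed(text: str) -> list[float]:
    t = text.lower()
    return [float(word in t) for word in VOCAB]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank documents by cosine similarity to the query embedding.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, context_docs: list[str]) -> str:
    # Stuff the retrieved chunks into the prompt ahead of the question.
    context = "\n\n".join(context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "pgvector adds vector similarity search to PostgreSQL.",
    "FastAPI is a Python web framework with async support.",
    "React is a JavaScript library for building user interfaces.",
]
top = retrieve("vector search in postgresql", docs, k=1)
prompt = build_prompt("vector search in postgresql", top)
```

The prompt then goes to the LLM; swapping the fake `embed` for a real model is the only structural change.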

Tech Stack

Component    Technology                       Why
----------   ------------------------------   -------------------------
Embeddings   OpenAI text-embedding-3-small    Best cost/quality ratio
Vector DB    PostgreSQL + pgvector            No extra infrastructure
LLM          GPT-4o / Claude                  Interchangeable via API
Backend      Python + FastAPI                 Async support, type hints
Frontend     React + TypeScript               Familiar, fast iteration
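One reason pgvector needs no extra infrastructure: a top-K retrieval is a single SQL query. A sketch of how I'd build it, assuming a hypothetical `chunks` table with `id`, `content`, and an `embedding vector(...)` column (all names illustrative):

```python
def topk_query(table: str = "chunks", k: int = 5) -> str:
    # "<=>" is pgvector's cosine-distance operator (smaller = closer).
    # The query embedding is bound as a parameter at execution time,
    # e.g. cursor.execute(sql, (query_embedding,)).
    return (
        f"SELECT id, content, embedding <=> %s::vector AS distance "
        f"FROM {table} ORDER BY distance LIMIT {k}"
    )

sql = topk_query(k=5)
```

pgvector also offers `<->` (L2 distance) and `<#>` (negative inner product) if cosine isn't the right metric for your embeddings.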

The Chunking Strategy

One of the most critical decisions in RAG is how you split your documents. Here's what worked best:

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Note: chunk_size is measured by len() (characters) by default;
# pass a token-based length_function if you want true token counts.
splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,
    chunk_overlap=64,
    # Try to break on paragraphs first, then lines, sentences, words.
    separators=["\n\n", "\n", ". ", " ", ""],
)

chunks = splitter.split_documents(documents)

Why 512 tokens?

  1. Too small (128) — loses context, retrieval becomes noisy
  2. Too large (2048) — dilutes relevant information
  3. 512 with 64 overlap — sweet spot for most use cases
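To see what the size/overlap mechanics actually do, here's a toy character-based splitter — not LangChain's recursive one, just an illustration of how the window slides so each chunk repeats the tail of the previous one:

```python
def chunk(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    # Slide a window of `size` characters, stepping by size - overlap,
    # so consecutive chunks share `overlap` characters of context.
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

text = "x" * 1200
chunks = chunk(text, size=512, overlap=64)
```

The overlap is what keeps a sentence that straddles a chunk boundary retrievable from at least one side.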

Lessons Learned

  • Chunking matters more than the model — garbage in, garbage out
  • Hybrid search wins — combine vector similarity with keyword BM25
  • Reranking is worth it — a cheap reranker on top-20 results beats top-5 vector search
  • Evaluation is hard — build a test set early, even if it's small
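The hybrid-search point can be made concrete with reciprocal rank fusion (RRF), one simple way to merge a vector ranking with a BM25 ranking without tuning score weights. The two rankings below are hypothetical placeholders:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Each ranking is a list of doc ids, best first. RRF scores a doc
    # as the sum of 1 / (k + rank) over all rankings; the constant k
    # damps the influence of any single top-ranked result.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

vector_ranking = ["d3", "d1", "d2"]   # by cosine similarity
bm25_ranking = ["d1", "d2", "d3"]     # by keyword match
fused = rrf([vector_ranking, bm25_ranking])  # → ["d1", "d3", "d2"]
```

Here `d1` wins because it places well in both lists, which is exactly the behavior you want from hybrid search.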

What's Next?

I'm experimenting with:

  • Multi-modal RAG (images + text) using vision models
  • Agentic RAG where the LLM decides when and what to retrieve
  • Graph RAG for documents with complex entity relationships

The full source code is available on GitHub. Star it if you find it useful!