Skip to main content

Command Palette

Search for a command to run...

Chunking Methods for RAG: What and Why

Published
10 min read
Chunking Methods for RAG: What and Why

The Day My RAG System Failed

Meet Charlie, a developer who learned a valuable lesson about RAG systems the hard way. (Let’s just call him Charlie but we all really know who he is. 😉)

Last year, Charlie built what he thought was the perfect RAG (Retrieval-Augmented Generation) system for a Solar Company. The architecture was elegant, the UI sleek, and the LLM integration seamless. It was perfect for consuming PDFs and being a central source of truth for onboarding new employees.

Then came the testing portion. An employee asked the system a specific question about sales technique. The system confidently responded with completely wrong information that contradicted their report. The room went silent.

That was the day Charlie learned that a RAG system is only as good as its chunking strategy, and he had chosen the wrong one.

You might think chunking documents is just about splitting text into smaller pieces. But in reality, it's the strategic foundation that can make or break your entire RAG system's effectiveness.

Why Most RAG Systems Fail Despite Using Advanced Models

When engineers build RAG systems, they often focus on the fancy parts, the latest embedding models, vector stores, and prompt engineering techniques. Yet many overlook the humble chunking step, treating it as a trivial preprocessing task.

This is a big mistake.

No matter how advanced your retrieval algorithms or language models are, if your chunks don't properly preserve context and semantic meaning, your RAG system will inevitably deliver hallucinations and irrelevant responses.

The Three Chunking Methods You Need to Know

1. Fixed-Size Chunking: The Default Trap

Most engineers start with fixed-size chunking, slicing documents into equal segments of token or character counts. It's simple and conventional.

But here's the shocking truth: fixed-size chunking regularly destroys the semantic cohesion of your content. When you arbitrarily split text every 512 tokens, you're likely cutting right through important concepts, breaking relationships between sentences, and fragmenting contextual information.

Example Scenario

Imagine we're feeding a RAG system these two sentences:

  • "The brown dog jumps over the lazy fox."

  • "A brown dog jumps over the lazy fox quickly."

If we use fixed-size chunks (e.g., 20 characters), we might get these chunks:

  • Sentence 1: "The brown dog jumps o" and "ver the lazy fox."

  • Sentence 2: "A brown dog jumps ove" and "r the lazy fox quickly."

Now, if a user asks "What animal jumps over the lazy fox?", neither chunk perfectly captures the key information. The query terms are split across chunks due to the slight shift ("The" vs. "A"). The RAG system might miss the crucial link, even though both sentences clearly answer the question. This demonstrates how fixed-size chunking can break semantic context and hurt retrieval accuracy.

2. Recursive Character Text Splitting

Recursive character splitting aims to create smart chunks, but it can still fragment context.

Example Scenario

Recursive character splitting breaks text by punctuation (periods, etc.), then characters if needed. Sounds good, but it can still fragment context.

Example:

Consider: "Climate change threatens coasts. Rising sea levels are a problem. Communities rely on fishing and tourism. Reducing emissions is crucial."

Recursive splitting (aiming for ~100-character chunks) might give:

  • "Climate change threatens coasts. Rising sea levels are a problem."

  • "Communities rely on fishing and tourism. Reducing emissions is crucial."

If someone asks, "How do we protect coastal communities?", the link between climate change specifically and the need for emissions reductions is weakened. It's now less clear that reducing emissions directly addresses the threats caused by climate change.

The problem? Even with punctuation-based splitting, semantic relationships between chunks can be lost. Don't assume it's perfect; experiment and consider smarter chunking for optimal RAG!

3. Semantic Chunking: The Contextual Approach

Unlike fixed-size chunking, semantic chunking preserves meaning by respecting natural boundaries in the text, paragraphs, sections, or semantic units.

Example Scenario

"The use of Large Language Models (LLMs) has revolutionized Natural Language Processing (NLP). LLMs, trained on massive datasets, demonstrate impressive capabilities in text generation, translation, and question answering. However, LLMs also present challenges. One significant concern is the potential for generating biased or harmful content. Careful data curation and bias mitigation techniques are crucial. Another challenge is the computational cost associated with training and deploying these models. Research into more efficient architectures is ongoing. Despite these challenges, the benefits of LLMs are undeniable, and their applications are rapidly expanding across various industries."

Fixed-Size Chunking (Problem):

If we used a fixed-size chunk of, say, 200 characters, we might get chunks like:

  • "The use of Large Language Models (LLMs) has revolutionized Natural Language Processing (NLP). LLMs, trained on massive datasets, demonstrate impressive capabilities in text generation, translation, and quest"

  • "ion answering. However, LLMs also present challenges. One significant concern is the potential for generating biased or harmful content. Careful data curation and bias mitigation techniques are cru"

  • "cial. Another challenge is the computational cost associated with training and deploying these models. Research into more efficient architectures is ongoing. Despite these challenges, the benefits"

  • " of LLMs are undeniable, and their applications are rapidly expanding across various industries."

Notice how the chunks break in the middle of sentences and thoughts.

Semantic Chunking (Solution):

A semantic chunking approach might produce the following chunks, identifying logical breaks between topics:

  • Chunk 1: "The use of Large Language Models (LLMs) has revolutionized Natural Language Processing (NLP). LLMs, trained on massive datasets, demonstrate impressive capabilities in text generation, translation, and question answering." (This chunk focuses on the positive impact of LLMs)

  • Chunk 2: "However, LLMs also present challenges. One significant concern is the potential for generating biased or harmful content. Careful data curation and bias mitigation techniques are crucial." (This chunk focuses on the bias challenge and mitigation)

  • Chunk 3: "Another challenge is the computational cost associated with training and deploying these models. Research into more efficient architectures is ongoing." (This chunk focuses on the computational cost challenge)

  • Chunk 4: "Despite these challenges, the benefits of LLMs are undeniable, and their applications are rapidly expanding across various industries." (This chunk serves as a conclusion, summarizing the overall value of LLMs).

Why Semantic Chunking Wins:

  • If the user asks: "What are the advantages of LLMs?", Chunk 1 is a perfect match.

  • If the user asks: "What are the challenges with LLMs?", Chunks 2 and 3 provide detailed answers to different challenges.

  • If the user asks: "Are LLMs useful despite their problems?", Chunk 4 provides the concluding perspective.

Traditional chunking methods (fixed-size, punctuation-based) often fragment context, hurting RAG performance. Semantic chunking aims for meaningful chunks, leading to better retrieval and generation.

Why This Works (A Simplified Analogy):

Imagine each word has a "location" in a semantic space. You could imagine the location as a single point in a vector space described by the values of the embedding vector. Semantic chunking tries to group words into chunks where the "average location" (centroid) of all words in the chunk is close to each individual word. The closer the words are, the more coherent and semantically related the chunk is. This minimizes "semantic distance" within each chunk, maximizing its relevance. Fixed chunking ignores this all together.

How it Works (Simply):

Instead of rigidly sticking to character counts or punctuation, semantic chunking tries to understand the text. It identifies logical boundaries based on the content itself. This might involve:

  • Looking for topic shifts.

  • Identifying clear beginnings and endings of arguments.

  • Using more sophisticated NLP techniques to recognize semantic similarity within a chunk.

A visual example:

Note that the sentences and the vectors that represent them on the graph are arbitrary, and only exists to show what it means to semantically chunk.*

The moment the next sentence is far off, its automatically considered as another chunk. One could imagine this as similar to removing duplicates in a database to save storage and compute resources (Although objectively speaking, these vectors aren’t similar unless they are linearly dependent).

4. Agentic Chunking: The Advanced Solution

For complex documents with nested structure, hierarchical chunking creates multiple granularity levels—section-level chunks, paragraph-level chunks, and sentence-level chunks.

Even the best semantic chunking has limitations. It's static – analyzes data once and creates fixed chunks. This fails with messy data like scraped websites or complex PDFs.

The Problem: Unstructured Data & Varying Queries

  • Websites: Noisy HTML, ads, irrelevant disclaimers.

  • PDFs: Complex formatting, embedded images breaking text flow.

  • Long Text: Subtle topic shifts, making uniform chunking ineffective.

  • Query Variance: Some questions need broad context, others specific details. Static chunks can't adapt.

Agentic RAG: Dynamic & Adaptive Chunking to the Rescue!

Agentic RAG uses an "agent" (often a smaller LLM) to dynamically analyze the data and adapt the chunking strategy based on the source and the user's query.

Example: Scraping a Product Review Website

  • Static Chunking: You scrape a product review page and try to split by HTML structure. You end up with chunks containing navigation menus, ads, and user comments alongside the actual review.

  • Agentic RAG Approach:

    1. Agent Identifies Core Content: The agent identifies the main review text, ignoring irrelevant parts. It might use rules like, "Find the longest text block within the <article> tag" or "Identify the section with the highest concentration of keywords related to the product."

    2. Content-Aware Chunking: Now that the agent has the main article, it can do semantic chunking on just the review content, prioritizing sections with headings like "Pros," "Cons," or "Performance."

    3. Query-Aware Chunking: If the user asks, "What are the drawbacks of this product?", the agent could re-chunk the review, focusing specifically on sentences containing keywords related to "drawbacks," "cons," "problems," or "issues," creating highly targeted chunks.

Benefits of Agentic RAG:

  • Noise Reduction: Filters irrelevant content before chunking.

  • Contextual Understanding: Adapts to different data types (webpages, PDFs, etc.).

  • Query Optimization: Tailors chunk sizes and content to answer the user's specific question.

  • PDF Mastery: Handles PDFs by first extracting text, identifying headings, and chunking structurally.

  • Long Text Savvy: For long texts, employs techniques like sliding windows or hierarchical summarization to maintain context across large distances.

Cons of Agentic RAG:

  • Complexity Overload: Designing intelligent agents adds layers of code and complexity. You'll need stronger programming and NLP skills than with simple chunking. *

  • Higher Costs: Running agents, especially those powered by LLMs, consumes more computational resources. Expect increased latency and potentially higher cloud bills.

  • Prompt Engineering: Just like with other applications using LLMs, prompt engineering plays a critical role. If the prompts used to generate the agents are off, then the performance will not meet the requirements.

  • Over-Engineering Trap: It's tempting to over-engineer your agents. Start with simple agents and add complexity only when it demonstrably improves results.

Agentic RAG isn't just chunking; it's intelligent chunking. By dynamically adapting to the data and the user's needs, it unlocks far better accuracy and relevance in RAG systems compared to static approaches. If you're serious about RAG, you need to explore agentic strategies. But then again, it might be overkill to use Agentic RAG for simple use cases.

The Impact on Your RAG System's Performance

Choosing the right chunking method isn't just a technical decision, it's a business-critical one. Here's what happens when you get it right:

  1. Reduced Hallucinations: Proper chunks preserve context, giving the LLM less reason to "fill in the gaps" with fabricated information

  2. Improved Relevance: Better chunks mean more precise retrieval, ensuring responses actually address the user's query

  3. Enhanced Context Window Utilization: Strategic chunking makes better use of limited context windows in LLMs

  4. Lower Operational Costs: Better retrieval means fewer tokens processed and less computational overhead

Implementing the Right Chunking Strategy Today

The most successful RAG engineers I've worked with follow this process:

  1. Analyze your document structure and content type

  2. Experiment with multiple chunking strategies on a test dataset

  3. Measure retrieval effectiveness using precision, recall, and answer relevance metrics

  4. Implement a hybrid approach tailored to your specific knowledge base

Remember, what works for general web content may fail spectacularly for legal documents, code bases, or scientific papers.

Conclusion: The Decision That Will Define Your RAG System

As AI engineers, we're often drawn to the exciting parts of RAG, the latest models, complex retrievers, and advanced prompting techniques. But I've seen time and again that the engineers who master the seemingly mundane art of chunking are the ones who build systems that actually work when it matters most. For most cases, semantic chunking just might be enough.

What chunking method are you using in your RAG system today? And more importantly, are you absolutely certain it's the right one?


If you find this blog interesting, connect with me on Linkedin sure to leave a message!

Links:

https://www.youtube.com/@hddatascience

https://harveyducay.blog/

https://github.com/harvsDucs/