So, What Is Chunking?
By Shreyash and Bhavnick
Chonkie's claim to fame is chunking. But what exactly is chunking? And what makes an ideal chunk or chunker? Let's dive in.
Defining Chunking
First, let’s define “Chunking.”
Chunking is the process of breaking down large documents into smaller, manageable pieces. These chunks can then be fed to an LLM during inference to supplement its knowledge with the latest information (a process known as "Retrieval-Augmented Generation", or RAG). Here's how it works, step by step:
1. Collect Data Sources: Gather the documents or information you want to use to enhance your model's responses.
2. Create Chunks: Use a chunking algorithm to divide your documents into smaller, manageable chunks, then generate vector embeddings for each chunk.
3. Store Chunks in a Vector Database: Save these embeddings in a vector database so they can be retrieved later.
4. Retrieve Your Chunks: When a user asks a question, use their query to search for similar chunks in the vector database.
5. Provide Relevant Context: Feed the retrieved chunks to the model during inference, giving it the latest and most relevant context.
By incorporating chunking into the RAG workflow, you ensure your model is more accurate, context-aware, and reliable.
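Here's what that whole loop can look like in code. This is a minimal sketch, not a production pipeline: it assumes Chonkie's `TokenChunker` (where each chunk exposes a `.text` attribute), uses a toy hashed bag-of-words `embed()` as a stand-in for a real embedding model, and a plain Python list with cosine similarity in place of a vector database:

```python
import math
import zlib
from collections import Counter

from chonkie import TokenChunker

def embed(text: str, dim: int = 256) -> list[float]:
    """Toy embedding: hashed bag-of-words, L2-normalized.
    Swap in a real embedding model for actual use."""
    vec = [0.0] * dim
    for word, count in Counter(text.lower().split()).items():
        vec[zlib.crc32(word.encode()) % dim] += count
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are unit-length, so the dot product is the cosine similarity
    return sum(x * y for x, y in zip(a, b))

# 1. Collect data sources
document = open("document.txt").read()

# 2. Create chunks and embed each one
chunker = TokenChunker(chunk_size=256)
chunk_texts = [chunk.text for chunk in chunker.chunk(document)]

# 3. "Store" the embeddings (a real app would use a vector database)
index = [(embed(text), text) for text in chunk_texts]

# 4. Retrieve: embed the user's query and rank chunks by similarity
query = "What does the document say about hippos?"
query_vec = embed(query)
top = sorted(index, key=lambda item: cosine(query_vec, item[0]), reverse=True)[:3]

# 5. Provide relevant context to the model alongside the query
context = "\n\n".join(text for _, text in top)
prompt = f"Context:\n{context}\n\nQuestion: {query}"
```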
What Makes an Ideal Chunk?
An ideal chunk has three key traits:
- Reconstructable: When combined, chunks should recreate the original text without missing a beat, guaranteeing that all the information in the original document is preserved (see the sketch after this list).
- Independent: Each chunk should tackle a single idea. Keeping chunks independent ensures that related information from the original text is kept together and can be easily retrieved when needed.
- Sufficient: Chunks should contain enough information to be meaningful and useful for the model.
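The reconstructability trait is easy to verify in code. A quick sketch, assuming Chonkie's `TokenChunker` with no chunk overlap (each chunk exposes a `.text` attribute):

```python
from chonkie import TokenChunker

text = open("document.txt").read()

chunker = TokenChunker(chunk_size=256, chunk_overlap=0)
chunks = chunker.chunk(text)

# Reconstructable: concatenating the chunks yields the original text exactly
assert "".join(chunk.text for chunk in chunks) == text
```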
And an Ideal Chunker?
An ideal chunker is:
- Smart: Breaks text into reconstructable, independent, and sufficient chunks.
- Deterministic: Always produces the same chunks for the same input (an easy property to test; see below).
- Efficient: Works fast and doesn’t hog resources.
These values make up the philosophy behind Chonkie's chunkers. You can read more about them here.
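Determinism, at least, you can check for any chunker by chunking the same text twice and comparing. A small sketch, along the same lines as above:

```python
from chonkie import TokenChunker

chunker = TokenChunker(chunk_size=256)
text = open("document.txt").read()

# Deterministic: two runs over identical input produce identical chunks
run_one = [chunk.text for chunk in chunker.chunk(text)]
run_two = [chunk.text for chunk in chunker.chunk(text)]
assert run_one == run_two
```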
Why Is Chunking So Important?
Can’t we just feed the whole document to the model? Nope. Here’s why chunking is absolutely essential:
1. Limited Context Windows
All models have a limit on how much text they can process at once. This is referred to as their "context window". Chunking breaks down large documents into manageable pieces that fit within these limits.
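You can check whether a document even fits before sending it. A rough sketch using the `tiktoken` tokenizer; the 8,192-token limit below is illustrative, so substitute your model's actual window:

```python
import tiktoken

CONTEXT_WINDOW = 8_192  # illustrative; check your model's documentation

encoding = tiktoken.get_encoding("cl100k_base")
text = open("document.txt").read()

token_count = len(encoding.encode(text))
if token_count > CONTEXT_WINDOW:
    print(f"{token_count} tokens won't fit in {CONTEXT_WINDOW}. Time to chunk!")
```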
2. Computational Efficiency
Processing a 100GB document every time you make a query? Bad idea. Attention mechanisms are computationally expensive, scaling as O(n²) with input length even in optimized implementations. Chunking keeps things efficient and memory-friendly.
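To see why quadratic cost hurts, consider the attention score matrix: it has one entry per pair of tokens, so doubling the input quadruples the work. A quick back-of-the-envelope:

```python
# Self-attention builds an n x n score matrix, so cost grows quadratically
for n in [1_000, 10_000, 100_000]:
    print(f"{n:>7,} tokens -> {n * n:>18,} attention scores")

# 10x more tokens means 100x more attention scores (and memory to match)
```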
3. Better Representation
As mentioned earlier, chunks represent each idea as an independent entity. Feeding an unchunked document will likely cause your model to conflate concepts and get confused. Representation (embedding) models also compress text into fixed-size vectors, which is inherently lossy, so keeping chunks concise ensures more of each idea survives the compression and the model understands the context better.
4. Reduced Hallucination
Feeding too much context at once makes models more likely to hallucinate: they start leaning on irrelevant information to answer queries, and that’s a big no-no. Smaller, focused chunks reduce this risk.
All of this makes chunking a must-have for RAG applications. Don't get caught using your whole document as a single chunk!
Getting Started with Chunking
Ready to Chunk? Chonkie the Hippo is here to help! Check out the Quick Start Guide to start chunking like a pro.
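If you want a taste before clicking through, a minimal sketch looks like this (assuming `pip install chonkie` and the `TokenChunker` shown earlier; the `.token_count` attribute here is our assumption about Chonkie's chunk objects):

```python
# pip install chonkie
from chonkie import TokenChunker

chunker = TokenChunker(chunk_size=512)
chunks = chunker.chunk("Woah! Chonkie, the chunking library, is so cool!")

for chunk in chunks:
    print(f"{chunk.token_count} tokens: {chunk.text!r}")
```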
Remember: chunking isn’t just a process—it’s an art. And with Chonkie by your side, you’re in good paws.
Happy chunking! 🦛✨