So, What Is Chunking?
By Shreyash and Bhavnick
Chonkie's claim to fame is chunking. But what exactly is chunking? And what makes an ideal chunk or chunker? Let's dive in.
Defining Chunking
First, let’s define “Chunking.”
Chunking is the process of breaking down large documents into smaller, manageable pieces. These chunks can then be fed to an LLM during inference to supplement its knowledge with the latest information (a process known as "Retrieval-Augmented Generation", or RAG). Here's how it works, step by step:
1. Collect Data Sources: Gather the documents or information you want to use to enhance your model's responses.
2. Create Chunks: Use a chunking algorithm to divide your documents into smaller, manageable chunks, then generate vector embeddings for each chunk.
3. Store Chunks in a Vector Database: Save these embeddings in a vector database so they can be retrieved later.
4. Retrieve Your Chunks: When a user asks a question, use their query to search for similar chunks in the vector database.
5. Provide Relevant Context: Feed the retrieved chunks to the model during inference, giving it the latest and most relevant context.
By incorporating chunking into the RAG workflow, you ensure your model is more accurate, context-aware, and reliable.
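Here's what that whole loop can look like in code. This is a minimal sketch, not a production pipeline: it assumes Chonkie's `TokenChunker` (where each chunk exposes a `.text` attribute), uses a toy hashed bag-of-words `embed()` as a stand-in for a real embedding model, and a plain Python list with cosine similarity in place of a vector database:

```python
import math
import zlib
from collections import Counter

from chonkie import TokenChunker

def embed(text: str, dim: int = 256) -> list[float]:
    """Toy embedding: hashed bag-of-words, L2-normalized.
    Swap in a real embedding model for actual use."""
    vec = [0.0] * dim
    for word, count in Counter(text.lower().split()).items():
        vec[zlib.crc32(word.encode()) % dim] += count
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are unit-length, so the dot product is the cosine similarity
    return sum(x * y for x, y in zip(a, b))

# 1. Collect data sources
document = open("document.txt").read()

# 2. Create chunks and embed each one
chunker = TokenChunker(chunk_size=256)
chunk_texts = [chunk.text for chunk in chunker.chunk(document)]

# 3. "Store" the embeddings (a real app would use a vector database)
index = [(embed(text), text) for text in chunk_texts]

# 4. Retrieve: embed the user's query and rank chunks by similarity
query = "What does the document say about hippos?"
query_vec = embed(query)
top = sorted(index, key=lambda item: cosine(query_vec, item[0]), reverse=True)[:3]

# 5. Provide relevant context to the model alongside the query
context = "\n\n".join(text for _, text in top)
prompt = f"Context:\n{context}\n\nQuestion: {query}"
```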
What Makes an Ideal Chunk?
An ideal chunk has three key traits:
- Reconstructable: When combined, chunks should recreate the original text without missing a beat, guaranteeing that all the information in the original document is preserved (see the sketch after this list).
- Independent: Each chunk should tackle a single idea. Keeping chunks independent ensures that related information from the original text is kept together and can be easily retrieved when needed.
- Sufficient: Chunks should contain enough information to be meaningful and useful for the model.
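The reconstructability trait is easy to verify in code. A quick sketch, assuming Chonkie's `TokenChunker` with no chunk overlap (each chunk exposes a `.text` attribute):

```python
from chonkie import TokenChunker

text = open("document.txt").read()

chunker = TokenChunker(chunk_size=256, chunk_overlap=0)
chunks = chunker.chunk(text)

# Reconstructable: concatenating the chunks yields the original text exactly
assert "".join(chunk.text for chunk in chunks) == text
```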
And an Ideal Chunker?
An ideal chunker is:
- Smart: Breaks text into reconstructable, independent, and sufficient chunks.
- Deterministic: Always produces the same chunks for the same input (an easy property to test; see below).
- Efficient: Works fast and doesn’t hog resources.
These values make up the philosophy behind Chonkie's chunkers. You can read more about them here.
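Determinism, at least, you can check for any chunker by chunking the same text twice and comparing. A small sketch, along the same lines as above:

```python
from chonkie import TokenChunker

chunker = TokenChunker(chunk_size=256)
text = open("document.txt").read()

# Deterministic: two runs over identical input produce identical chunks
run_one = [chunk.text for chunk in chunker.chunk(text)]
run_two = [chunk.text for chunk in chunker.chunk(text)]
assert run_one == run_two
```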
Why Is Chunking So Important?
Can’t we just feed the whole document to the model? Nope. Here’s why chunking is absolutely essential:
1. Limited Context Windows
All models have a limit on how much text they can process at once. This is referred to as their "context window". Chunking breaks down large documents into manageable pieces that fit within these limits.
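You can check whether a document even fits before sending it. A rough sketch using the `tiktoken` tokenizer; the 8,192-token limit below is illustrative, so substitute your model's actual window:

```python
import tiktoken

CONTEXT_WINDOW = 8_192  # illustrative; check your model's documentation

encoding = tiktoken.get_encoding("cl100k_base")
text = open("document.txt").read()

token_count = len(encoding.encode(text))
if token_count > CONTEXT_WINDOW:
    print(f"{token_count} tokens won't fit in {CONTEXT_WINDOW}. Time to chunk!")
```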
2. Computational Efficiency
Processing a 100GB document every time you make a query? Bad idea. Attention mechanisms are computationally expensive, scaling as O(n²) with input length even in optimized implementations. Chunking keeps things efficient and memory-friendly.
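To see why quadratic cost hurts, consider the attention score matrix: it has one entry per pair of tokens, so doubling the input quadruples the work. A quick back-of-the-envelope:

```python
# Self-attention builds an n x n score matrix, so cost grows quadratically
for n in [1_000, 10_000, 100_000]:
    print(f"{n:>7,} tokens -> {n * n:>18,} attention scores")

# 10x more tokens means 100x more attention scores (and memory to match)
```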
3. Better Representation
As mentioned earlier, chunks represent each idea as an independent entity. Feeding an unchunked document will likely cause your model to conflate concepts and get confused. Representation (embedding) models also compress text into fixed-size vectors, which is inherently lossy, so keeping chunks concise ensures more of each idea survives the compression and the model understands the context better.
4. Reduced Hallucination
Feeding too much context at once makes models more likely to hallucinate: they start leaning on irrelevant information to answer queries, and that’s a big no-no. Smaller, focused chunks reduce this risk.
All of this makes chunking a must-have for RAG applications. Don't get caught using your whole document as a single chunk!
Getting Started with Chunking
Ready to Chunk? Chonkie the Hippo is here to help! Check out the Quick Start Guide to start chunking like a pro.
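If you want a taste before clicking through, a minimal sketch looks like this (assuming `pip install chonkie` and the `TokenChunker` shown earlier; the `.token_count` attribute here is our assumption about Chonkie's chunk objects):

```python
# pip install chonkie
from chonkie import TokenChunker

chunker = TokenChunker(chunk_size=512)
chunks = chunker.chunk("Woah! Chonkie, the chunking library, is so cool!")

for chunk in chunks:
    print(f"{chunk.token_count} tokens: {chunk.text!r}")
```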
Remember: chunking isn’t just a process—it’s an art. And with Chonkie by your side, you’re in good paws.
Happy chunking! 🦛✨