Welcome to an in-depth exploration of RAPTOR RAG, a powerful technique for hierarchical indexing in retrieval-augmented generation (RAG) systems.

In this article, we dive into the RAPTOR method, which innovatively addresses the challenges of retrieving both low-level and high-level information from large document corpora. Drawing from the insights and explanations by Lance from LangChain, we unravel the concepts, implementation details, and advantages of RAPTOR, empowering you to grasp this technique and apply it effectively in your own projects.

Why RAPTOR RAG? The Retrieval Challenge in RAG Systems

RAG systems are designed to answer questions by retrieving relevant information from a large corpus of documents. However, a significant challenge arises because queries can vary greatly in their scope and detail:

  • Low-level questions require very detailed information, often referencing facts contained within a single document or a specific chunk of text.
  • High-level questions demand consolidation and synthesis of information spanning multiple documents or many chunks within a document.

Traditional retrieval methods, such as k-nearest neighbors (kNN) search, typically retrieve a fixed number (k) of chunks. This approach works well for low-level questions where the answer resides within a few chunks. But what happens when a question requires information distributed across more chunks than the k parameter allows? For example, if k=3 but your question would benefit from insights drawn from five or six different chunks, the retrieval may miss critical information.
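
To make the limitation concrete, here is a minimal, self-contained sketch of fixed-k retrieval using cosine similarity over precomputed embeddings; the vectors and chunks are random stand-ins for illustration:

    import numpy as np

    def knn_retrieve(query_vec, chunk_vecs, chunks, k=3):
        # Cosine similarity between the query and every chunk embedding.
        sims = chunk_vecs @ query_vec / (
            np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(query_vec)
        )
        # Keep only the k most similar chunks. Anything relevant beyond
        # the k-th chunk is silently dropped, which is the core limitation.
        top = np.argsort(sims)[::-1][:k]
        return [chunks[i] for i in top]

    # Toy corpus: six chunks, but k=3 can surface at most three of them.
    chunk_vecs = np.random.rand(6, 128)
    chunks = [f"chunk-{i}" for i in range(6)]
    print(knn_retrieve(np.random.rand(128), chunk_vecs, chunks, k=3))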

This gap in retrieval capability motivates the development of hierarchical indexing methods like RAPTOR, which aim to provide semantic coverage across different levels of abstraction, from granular document chunks to broad, consolidated summaries.

Illustration of kNN retrieval limitations in RAG systems

Introducing RAPTOR: Hierarchical Indexing with Recursive Summarization

RAPTOR (short for Recursive Abstractive Processing for Tree-Organized Retrieval) is a hierarchical indexing technique that builds a tree of document summaries to address the retrieval challenges described above. The core idea is to create multiple layers of abstraction over your documents, enabling the retrieval system to access both detailed and high-level information depending on the nature of the question.

Here's the high-level intuition behind RAPTOR:

  1. Start with your base documents, which act as the leaves of a tree structure.
  2. Cluster similar documents or chunks together based on their semantic embeddings.
  3. Summarize each cluster, creating a higher-level representation that distills the core ideas of the grouped documents.
  4. Repeat this clustering and summarization process recursively, building successive layers of summaries until you reach a limit or a single, comprehensive summary of the entire corpus.
  5. Index all these layers—raw documents and summaries—together in a vector store.

This process creates a hierarchical index where queries can retrieve information at the appropriate level of detail:

  • Low-level questions will find close matches among the raw document chunks.
  • High-level questions will align better with the summaries of clusters, allowing semantic search to access broader concepts effectively.

Hierarchical tree of document summaries in RAPTOR

How This Hierarchy Improves Retrieval

By incorporating summaries at multiple levels, RAPTOR enables better semantic coverage across the abstraction hierarchy of question types. This means:

  • A single retrieved summary can already consolidate many chunks, so a fixed k no longer caps how much source material an answer can draw on.
  • It can leverage both detailed and synthesized knowledge, improving answer accuracy for a wide range of queries.
  • The method helps mitigate fragmentation of information that can occur when documents are split into many small chunks.

The original RAPTOR paper and related studies demonstrate that this approach enhances retrieval effectiveness, especially for high-level questions requiring synthesis across multiple documents.

Comparison of raw chunks and hierarchical summaries in RAPTOR

Applying RAPTOR RAG: A Practical Walkthrough

Let's explore how RAPTOR can be applied in practice, based on a project involving LangChain's expression language documentation. The corpus consists of approximately 30 documents, each varying in size—most under 4,000 tokens, but some larger.
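
As a quick sanity check on document sizes before clustering, you can count tokens per document. A minimal sketch using the tiktoken library, where the docs list stands in for the loaded texts:

    import tiktoken

    # cl100k_base is the tokenizer used by recent OpenAI models.
    enc = tiktoken.get_encoding("cl100k_base")

    docs = ["...raw text of each loaded document..."]  # placeholder corpus
    counts = [len(enc.encode(d)) for d in docs]
    print(f"{len(counts)} docs, max {max(counts)} tokens, "
          f"{sum(c > 4000 for c in counts)} docs over 4,000 tokens")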

The main steps taken in the implementation are as follows:

  1. Load the documents: All LangChain expression language docs are loaded as raw text.
  2. Embedding: Each document is embedded into a vector space using an embedding model.
  3. Clustering: Documents are grouped into clusters based on embedding similarity.
  4. Summarization: Each cluster is summarized using a language model, condensing the key information.
  5. Recursive processing: The clustering and summarization steps are repeated recursively, building a multi-level summary tree. In this case, the recursion depth was set to three levels.
  6. Indexing: Both the raw documents (leaves) and all generated summaries (higher-level nodes) are indexed together in a vector store.
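
As a sketch of the final indexing step, assuming the langchain_chroma and langchain_openai packages, the leaf texts and the summary texts from every level can simply be embedded together into one vector store:

    from langchain_chroma import Chroma
    from langchain_openai import OpenAIEmbeddings

    leaf_texts = ["...raw document text..."]    # tree leaves
    summary_texts = ["...cluster summary..."]   # nodes from every level

    # Index leaves and summaries side by side in a single collection.
    vectorstore = Chroma.from_texts(
        texts=leaf_texts + summary_texts,
        embedding=OpenAIEmbeddings(),
    )
    retriever = vectorstore.as_retriever()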

Loading and analyzing LangChain documents for RAPTOR

Models and Tools Used

The implementation pairs an embedding model with large language models, including OpenAI's GPT models and Anthropic's Claude, for the summarization steps. The clustering follows the approach described in the RAPTOR paper, which groups documents by semantic similarity using Gaussian mixture models over their embeddings.
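
As an illustrative sketch of that style of clustering, here is scikit-learn's GaussianMixture applied directly to an embedding matrix (the paper additionally reduces dimensionality with UMAP first, omitted here for brevity; the embeddings are random placeholders):

    import numpy as np
    from sklearn.mixture import GaussianMixture

    embeddings = np.random.rand(30, 1536)  # placeholder: one vector per doc

    # Fit a Gaussian mixture and assign each document to a cluster.
    gm = GaussianMixture(n_components=5, random_state=0).fit(embeddings)
    labels = gm.predict(embeddings)

    clusters = {c: np.where(labels == c)[0].tolist() for c in range(5)}
    print(clusters)  # document indices grouped by cluster

In practice, the number of components can be chosen automatically, for example by comparing BIC scores across candidate values.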

All code for the process is available in LangChain's RAG From Scratch repository, providing a detailed notebook that walks through each step of the RAPTOR pipeline.

Clustering and summarizing clusters using language models

Deep Dive Into the Code

While the RAPTOR technique involves several detailed steps, the core logic can be summarized as follows:

  • Embedding and Clustering: Documents are converted into embeddings and clustered based on similarity.
  • Summarization: Each cluster's documents are combined into a summary using a language model.
  • Recursion: The summarization and clustering steps are applied recursively to the summaries themselves, building a tree of summaries.
  • Index Construction: Finally, all leaf documents and summary nodes are indexed together.

Here is a conceptual Python sketch of the recursive process; embed_docs, cluster_docs, and summarize_docs are placeholder helpers for the embedding, clustering, and summarization steps described above:

    def recursive_embed_cluster(documents, depth_limit):
        # Base case: depth limit reached, no further summarization.
        if depth_limit == 0:
            return documents

        # Embed the documents and group them by semantic similarity.
        embeddings = embed_docs(documents)
        clusters = cluster_docs(documents, embeddings)

        # Summarize each cluster into one higher-level "document".
        summaries = [summarize_docs(cluster) for cluster in clusters]

        # Recurse on the summaries with a reduced depth limit; keep every
        # level so that leaves and summaries can be indexed together.
        return documents + recursive_embed_cluster(summaries, depth_limit - 1)

This recursive approach produces a hierarchical tree where each node summarizes its child nodes, enabling queries to retrieve information from the appropriate level.
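
Once the tree is indexed, querying is ordinary similarity search over the combined store. A hypothetical example using the retriever sketched earlier, where the two questions would tend to match different levels of the tree:

    # A low-level question is likely to match a raw leaf chunk.
    docs = retriever.invoke("What does the pipe operator do in LCEL?")

    # A high-level question is likely to match a mid- or top-level summary.
    docs = retriever.invoke("What are the main design goals of LCEL?")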

Recursive embedding and clustering process for RAPTOR

Advantages of RAPTOR for Large-Scale Document Retrieval

Several factors make RAPTOR particularly attractive for modern RAG applications, especially those dealing with large or complex document sets:

  • Handles Long Contexts: Modern LLMs can process very large token contexts—up to hundreds of thousands or even a million tokens—making it feasible to summarize large clusters without splitting.
  • Reduces Need for Chunk Splitting: By clustering entire documents or large chunks before summarization, RAPTOR avoids arbitrary splits that can lose context.
  • Improves Semantic Coverage: The hierarchical index improves retrieval for both granular and abstract queries.
  • Flexible Recursion Levels: Developers can tune the number of recursive summarization levels to balance detail and abstraction.
  • Scalable and Modular: The approach naturally scales to large corpora by building a tree structure rather than flat indexing.

In the LangChain example, RAPTOR was applied to expression language docs with great success, indexing the raw documents and summaries together to enable versatile querying.

RAPTOR hierarchical index applied to LangChain documents

Frequently Asked Questions About RAPTOR RAG

What does RAPTOR stand for?

RAPTOR stands for Recursive Abstractive Processing for Tree-Organized Retrieval, a hierarchical indexing technique introduced in a research paper by Parth Sarthi and colleagues. It focuses on recursive clustering and summarization to create multi-level document indexes.

How does RAPTOR differ from standard kNN retrieval?

Standard kNN retrieval retrieves a fixed number of document chunks based on similarity to a query. RAPTOR builds a hierarchical index of summaries and raw documents, allowing retrieval at multiple abstraction levels, which is especially useful for complex queries requiring broad context.

Can RAPTOR handle very large documents?

Yes. RAPTOR is designed to work well with large documents by clustering and summarizing without needing to split documents arbitrarily. Modern LLMs' ability to handle large token contexts enables this capability.

What models are used for summarization and embedding in RAPTOR?

RAPTOR implementations typically pair a dedicated embedding model (such as OpenAI's text-embedding models) with a strong language model, such as OpenAI's GPT series or Anthropic's Claude, for the summarization steps.

Is the RAPTOR code available for public use?

Yes, LangChain provides open-source notebooks demonstrating RAPTOR implementations, including recursive clustering and summarization. You can find them in the LangChain GitHub repository.

Conclusion: Harnessing the Power of RAPTOR RAG

The RAPTOR technique offers a robust solution to the challenges of retrieval in RAG systems, especially when dealing with questions that span different levels of detail. By recursively clustering and summarizing documents, RAPTOR creates a hierarchical index that enables semantic search to operate effectively across both granular and high-level queries.

Applying RAPTOR to real-world document corpora, like the LangChain expression language documentation, demonstrates its practical value and scalability. As large language models continue to evolve, their increasing capacity to handle vast contexts makes RAPTOR an even more compelling choice for building intelligent, context-aware retrieval systems.

Whether you're building a RAG system for enterprise knowledge bases, research papers, or any large document corpus, RAPTOR RAG is a technique worth exploring. It bridges the gap between detailed fact retrieval and broad conceptual understanding, enhancing the overall user experience and answer quality.

For those interested in diving deeper, detailed code notebooks and research papers are available, providing comprehensive guidance on implementing RAPTOR in your projects.

Summary and final thoughts on RAPTOR RAG technique

This article was created from the video RAG From Scratch: Part 13 (RAPTOR) with the help of AI.
