How to Implement Custom Retrieval Strategies in LlamaIndex

This blog delves into the basics of retrieval in LlamaIndex, and then progresses to advanced techniques like recency-aware, context-aware, and hybrid retrieval strategies.

We're in the era of AI agents that promise to truly understand and reason. But these artificial minds are only as good as the knowledge they can access and comprehend. LlamaIndex is one of the toolkits that helps us build AI agents, providing ready-to-use functionality for accessing external data. With customizable retrieval at its core, LlamaIndex has become one of the most widely used tools for crafting AI agents. In this article, we will show you how to implement and use custom retrieval strategies with LlamaIndex.

Custom Retrieval Strategies: The Secret Sauce

Now, you might be thinking, "That's all well and good, but what about custom retrieval strategies?" Glad you asked! Custom retrieval strategies are where the magic happens. They allow you to tailor how LlamaIndex fetches and processes information, making your applications smarter and more context-aware.

Let's break it down with an analogy. Imagine you're at a massive library (your data). A standard retrieval strategy is like asking a librarian to fetch books based on their titles. A custom retrieval strategy, on the other hand, is like having a team of expert librarians who not only know the books by title but understand their contents, themes, and even how they relate to each other. Cool, right?

Getting Started: Setting Up Your Environment

Before we dive into the code, let's make sure we're all on the same page. First things first, you'll need to install LlamaIndex. Open up your terminal and run:


pip install llama-index

You might also want to install the additional dependencies used in the examples below (textblob powers the sentiment scoring later on):


pip install numpy pandas scikit-learn textblob

Great! Now we're cooking with gas. Let's move on to the main course.

Basic Retrieval in LlamaIndex

Before we start customizing, it's important to understand how basic retrieval works in LlamaIndex. Here's a simple example:


from llama_index import VectorStoreIndex, SimpleDirectoryReader
# Load documents
documents = SimpleDirectoryReader('data').load_data()
# Create an index
index = VectorStoreIndex.from_documents(documents)
# Perform a query through a query engine
query_engine = index.as_query_engine()
response = query_engine.query("What is the capital of France?")
print(response)

This is straightforward, but what if we want more control over how our data is retrieved? That's where custom retrieval strategies come in.
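Under the hood, that query runs through a retriever that fetches the top-k most similar nodes. You can work with this layer directly via as_retriever(), which is exactly the hook our custom strategies will wrap (similarity_top_k is a standard parameter; tune it to your needs):


retriever = index.as_retriever(similarity_top_k=5)
nodes = retriever.retrieve("What is the capital of France?")
for node_with_score in nodes:
    print(f"{node_with_score.score:.4f}  {node_with_score.node.get_content()[:80]}")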

Implementing a Custom Retrieval Strategy

Let's implement a custom retrieval strategy that takes into account the recency of documents. We'll assume that more recent documents are more relevant.


from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index.vector_stores import SimpleVectorStore
from llama_index.storage.storage_context import StorageContext
from llama_index.embeddings import OpenAIEmbedding
from datetime import datetime
import numpy as np

class RecencyAwareRetriever:
    def __init__(self, index, recency_weight=0.5):
        self.index = index
        self.recency_weight = recency_weight

    def retrieve(self, query):
        # Get basic similarity scores
        basic_results = self.index.as_retriever().retrieve(query)

        # Calculate recency scores
        current_time = datetime.now()
        ages = [(current_time - r.node.extra_info['created_at']).days for r in basic_results]
        max_age = max(ages, default=0) or 1  # guard against division by zero when every document is from today
        
        for result in basic_results:
            age = (current_time - result.node.extra_info['created_at']).days
            recency_score = 1 - (age / max_age)
            
            # Combine similarity and recency scores
            result.score = (1 - self.recency_weight) * result.score + self.recency_weight * recency_score

        # Sort by combined score
        return sorted(basic_results, key=lambda x: x.score, reverse=True)

# Load and prepare documents
documents = SimpleDirectoryReader('data').load_data()
for doc in documents:
    doc.extra_info = {'created_at': datetime.now()}  # You'd typically load real timestamps here

# Create the index
embed_model = OpenAIEmbedding()
vector_store = SimpleVectorStore()
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context, embed_model=embed_model)

# Create our custom retriever
custom_retriever = RecencyAwareRetriever(index)
# Use the custom retriever
results = custom_retriever.retrieve("What are the latest developments in AI?")
for result in results[:5]:
    print(f"Score: {result.score:.4f}, Content: {result.node.get_content()[:100]}...")

In this example, we've created a RecencyAwareRetriever class that combines traditional similarity scores with a recency factor. This means that more recent documents will be given a boost in relevance, which can be crucial in fast-moving fields like AI and technology.
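A quick sanity check of the blend helps when choosing recency_weight (illustrative numbers, not real output):


# With recency_weight = 0.5:
#   older but very similar doc:    0.5 * 0.9 + 0.5 * 0.1 = 0.50
#   newer, moderately similar doc: 0.5 * 0.6 + 0.5 * 0.9 = 0.75  -> ranked first
balanced_retriever = RecencyAwareRetriever(index, recency_weight=0.3)     # similarity still dominates
fresh_first_retriever = RecencyAwareRetriever(index, recency_weight=0.8)  # freshness dominates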

Taking It Further: Contextual Retrieval

Now that we've got a taste for custom retrieval, let's kick it up a notch. What if we want our retrieval to be aware of the user's context? For instance, if we're building a personalized news aggregator, we might want to take into account the user's interests and reading history.

Here's how we might approach this:


from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index.vector_stores import SimpleVectorStore
from llama_index.storage.storage_context import StorageContext
from llama_index.embeddings import OpenAIEmbedding
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np

class ContextAwareRetriever:
    def __init__(self, index, user_interests, user_history):
        self.index = index
        self.user_interests = user_interests
        self.user_history = user_history
        self.vectorizer = TfidfVectorizer()
        
        # Create TF-IDF matrix of user interests and history
        user_docs = user_interests + [doc.get_content() for doc in user_history]
        self.user_tfidf = self.vectorizer.fit_transform(user_docs)

    def retrieve(self, query):
        # Get basic results
        basic_results = self.index.as_retriever().retrieve(query)
        
        for result in basic_results:
            # Calculate TF-IDF similarity with user interests and history
            doc_tfidf = self.vectorizer.transform([result.node.get_content()])
            similarity = np.mean(self.user_tfidf.dot(doc_tfidf.T).toarray())
            
            # Combine with original score
            result.score = 0.7 * result.score + 0.3 * similarity

        # Sort by combined score
        return sorted(basic_results, key=lambda x: x.score, reverse=True)

# Load and prepare documents
documents = SimpleDirectoryReader('data').load_data()

# Create the index
embed_model = OpenAIEmbedding()
vector_store = SimpleVectorStore()
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context, embed_model=embed_model)

# Mock user data
user_interests = ["artificial intelligence", "machine learning", "data science"]
user_history = documents[:5]  # Assume these are the last 5 articles the user read

# Create our context-aware retriever
context_retriever = ContextAwareRetriever(index, user_interests, user_history)

# Use the context-aware retriever
results = context_retriever.retrieve("What are the latest developments in technology?")
for result in results[:5]:
    print(f"Score: {result.score:.4f}, Content: {result.node.get_content()[:100]}...")

In this example, we've created a ContextAwareRetriever that takes into account the user's interests and reading history. It uses TF-IDF to calculate the similarity between the retrieved documents and the user's context, then combines this with the original similarity score.
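If the TF-IDF step feels opaque, here's a self-contained toy version of that scoring, separate from the index. Because scikit-learn's TfidfVectorizer L2-normalizes its rows by default, these dot products behave like cosine similarities:


from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np

user_docs = ["artificial intelligence", "machine learning applications"]
vectorizer = TfidfVectorizer()
user_tfidf = vectorizer.fit_transform(user_docs)

on_topic = vectorizer.transform(["new machine learning breakthroughs"])
off_topic = vectorizer.transform(["best pasta recipes"])
print(np.mean(user_tfidf.dot(on_topic.T).toarray()))   # noticeably higher
print(np.mean(user_tfidf.dot(off_topic.T).toarray()))  # 0.0: no vocabulary overlap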

The Power of Hybrid Approaches

While we've looked at recency-aware and context-aware strategies separately, in real-world applications, you might want to combine multiple strategies. Let's create a hybrid approach that takes into account recency, user context, and even sentiment:


from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index.vector_stores import SimpleVectorStore
from llama_index.storage.storage_context import StorageContext
from llama_index.embeddings import OpenAIEmbedding
from sklearn.feature_extraction.text import TfidfVectorizer
from textblob import TextBlob
from datetime import datetime
import numpy as np

class HybridRetriever:
    def __init__(self, index, user_interests, user_history, recency_weight=0.3, context_weight=0.3, sentiment_weight=0.1):
        self.index = index
        self.user_interests = user_interests
        self.user_history = user_history
        self.recency_weight = recency_weight
        self.context_weight = context_weight
        self.sentiment_weight = sentiment_weight
        self.vectorizer = TfidfVectorizer()
        
        user_docs = user_interests + [doc.get_content() for doc in user_history]
        self.user_tfidf = self.vectorizer.fit_transform(user_docs)

    def retrieve(self, query):
        basic_results = self.index.as_retriever().retrieve(query)
        
        current_time = datetime.now()
        ages = [(current_time - r.node.extra_info['created_at']).days for r in basic_results]
        max_age = max(ages, default=0) or 1  # guard against division by zero when every document is from today
        
        for result in basic_results:
            # Recency score
            age = (current_time - result.node.extra_info['created_at']).days
            recency_score = 1 - (age / max_age)
            
            # Context score
            doc_tfidf = self.vectorizer.transform([result.node.get_content()])
            context_score = np.mean(self.user_tfidf.dot(doc_tfidf.T).toarray())
            
            # Sentiment score
            sentiment = TextBlob(result.node.get_content()).sentiment.polarity
            sentiment_score = (sentiment + 1) / 2  # Normalize to 0-1
            
            # Combine scores
            result.score = (
                (1 - self.recency_weight - self.context_weight - self.sentiment_weight) * result.score +
                self.recency_weight * recency_score +
                self.context_weight * context_score +
                self.sentiment_weight * sentiment_score
            )

        return sorted(basic_results, key=lambda x: x.score, reverse=True)

# Load and prepare documents
documents = SimpleDirectoryReader('data').load_data()
for doc in documents:
    doc.extra_info = {'created_at': datetime.now()}  # You'd typically load real timestamps here

# Create the index
embed_model = OpenAIEmbedding()
vector_store = SimpleVectorStore()
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context, embed_model=embed_model)

# Mock user data
user_interests = ["artificial intelligence", "machine learning", "data science"]
user_history = documents[:5]  # Assume these are the last 5 articles the user read

# Create our hybrid retriever
hybrid_retriever = HybridRetriever(index, user_interests, user_history)

# Use the hybrid retriever
results = hybrid_retriever.retrieve("What are the latest positive developments in AI?")
for result in results[:5]:
    print(f"Score: {result.score:.4f}, Content: {result.node.get_content()[:100]}...")

This hybrid approach combines multiple factors:

  1. The original similarity score from LlamaIndex
  2. A recency score, favoring more recent documents
  3. A context score based on the user's interests and history
  4. A sentiment score, which could be useful for queries specifically looking for positive or negative information

By adjusting the weights, you can fine-tune how much each factor influences the final ranking.
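For instance, you might keep two differently tuned instances around (hypothetical configurations; just keep the three weights summing to less than 1 so the original similarity score retains some influence):


# News-style feed: freshness matters most, sentiment ignored
news_retriever = HybridRetriever(index, user_interests, user_history,
                                 recency_weight=0.5, context_weight=0.2, sentiment_weight=0.0)
# Personalized digest: user context matters most
personal_retriever = HybridRetriever(index, user_interests, user_history,
                                     recency_weight=0.1, context_weight=0.5, sentiment_weight=0.1)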

Performance Considerations

While these custom retrieval strategies can significantly improve the relevance of results, it's important to consider their impact on performance. Here are a few tips to keep in mind:

  1. Caching: If you're calculating complex scores (like TF-IDF similarities), consider caching the results to avoid recalculating for every query.
  2. Batching: When dealing with large datasets, process documents in batches to manage memory usage.
  3. Indexing: Use appropriate indexing techniques to speed up searches, especially for recency-based queries.
  4. Asynchronous Processing: For non-time-critical updates (like updating user context), consider using asynchronous processing to avoid slowing down query responses.

Here's a quick example of how you might implement caching:


from functools import lru_cache

class CachedHybridRetriever(HybridRetriever):
    # Note: lru_cache keys on (self, doc_content), so the cache keeps a reference
    # to the retriever alive while entries remain; fine for a long-lived instance.
    @lru_cache(maxsize=1000)
    def calculate_context_score(self, doc_content):
        doc_tfidf = self.vectorizer.transform([doc_content])
        return np.mean(self.user_tfidf.dot(doc_tfidf.T).toarray())

    def retrieve(self, query):
        basic_results = self.index.as_retriever().retrieve(query)
        
        current_time = datetime.now()
        ages = [(current_time - r.node.extra_info['created_at']).days for r in basic_results]
        max_age = max(ages, default=0) or 1  # guard against division by zero when every document is from today
        
        for result in basic_results:
            # Recency score
            age = (current_time - result.node.extra_info['created_at']).days
            recency_score = 1 - (age / max_age)
            
            # Context score (now cached)
            context_score = self.calculate_context_score(result.node.get_content())
            
            # ... rest of the method remains the same

This caching approach can significantly speed up repeated queries, especially if you have a large number of documents that are frequently accessed.
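Tip 2 (batching) can be sketched along the same lines. Assuming a retriever like the ones above, with a fitted vectorizer and a user_tfidf matrix, you can score document contents in fixed-size chunks instead of building one enormous matrix:


def batched_context_scores(retriever, contents, batch_size=64):
    """Score many documents against the user profile in memory-friendly chunks."""
    scores = []
    for start in range(0, len(contents), batch_size):
        batch = contents[start:start + batch_size]
        batch_tfidf = retriever.vectorizer.transform(batch)  # sparse (batch_size x vocab)
        # Mean similarity of each batch document against all user documents
        sims = retriever.user_tfidf.dot(batch_tfidf.T).toarray().mean(axis=0)
        scores.extend(sims.tolist())
    return scores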

Conclusion: The Future of Information Retrieval

As we've seen, implementing custom retrieval strategies in LlamaIndex opens up a world of possibilities for context-aware information retrieval. By combining traditional similarity measures with factors like recency, user context, and even sentiment, we can create highly personalized and relevant search experiences.

The examples we've explored here are just the tip of the iceberg. As AI and machine learning continue to evolve, we can expect even more sophisticated retrieval strategies to emerge. Perhaps we'll see strategies that incorporate real-time data streams, or that dynamically adjust their parameters based on user feedback.
