Improve the performance of LLMs on knowledge-intensive tasks by combining them with a knowledge retriever.
Knowledge-intensive tasks are those that require a deep understanding of facts and contextual information to provide accurate and meaningful responses. The Retrieval-Augmented Generation (RAG) pattern is particularly useful for such tasks because it combines a pre-trained parametric model (the LLM) with non-parametric memory (such as a dense vector index of a large-scale knowledge source) to enhance the model's ability to access factual knowledge.
When it comes to knowledge-intensive tasks, traditional LLMs still exhibit certain limitations that hinder their overall effectiveness.
Incomplete and Static Knowledge: An LLM's knowledge is limited to what it saw during pre-training, so it can be incomplete and becomes increasingly outdated over time.
Inaccurate Knowledge Retrieval: LLMs may generate text that is contextually incorrect or not fully aligned with the given facts, particularly when answering questions or producing content that requires deep understanding.
Lack of Provenance: Traditional LLMs do not provide clear sources for the factual knowledge they use, which reduces the credibility of the generated content.
Difficulty in Updating Knowledge: Updating the knowledge stored in an LLM is computationally expensive and time-consuming, as it usually requires retraining.
Given these limitations, traditional LLMs tend to underperform on knowledge-intensive tasks compared to task-specific architectures, which are specifically designed to access and manipulate external knowledge sources.
The main role of the embedding generator is to convert input text (e.g., a document to be indexed or an end-user query) into a continuous dense vector representation. This is typically achieved with a pre-trained neural encoder that has been trained to capture the semantic relationships between pieces of text.
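As a minimal sketch, assuming the sentence-transformers library and the all-MiniLM-L6-v2 encoder (any pre-trained text encoder works the same way), generating a dense vector for a piece of text looks like this:
# Minimal embedding-generator sketch; assumes the sentence-transformers package.
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Encode a piece of text into a fixed-size dense vector (384 dimensions for this model).
vector = encoder.encode("Paul Graham grew up writing short stories and programming.")
print(vector.shape)  # (384,)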
For large documents, the embedding generator can instead produce embeddings for smaller chunks of the document, obtained with a sliding window or a sentence tokenizer; the resulting list of chunk embeddings then collectively represents the original document, as sketched below.
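A sliding-window chunker can be as simple as the following sketch; the chunk_text helper and the window and overlap sizes are illustrative, and the encoder from the previous sketch is reused:
# Illustrative sliding-window chunking over words; window/overlap sizes are arbitrary.
def chunk_text(text: str, window: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping word-level chunks."""
    words = text.split()
    stride = window - overlap
    return [" ".join(words[i:i + window]) for i in range(0, len(words), stride)]

long_document = "..."  # e.g. the full text of an essay or report
chunks = chunk_text(long_document)
chunk_vectors = encoder.encode(chunks)  # one embedding per chunk; together they represent the document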
Some embedding generators are multi-modal: they can also take images or audio as input and generate embeddings in the same vector space as text.
The Knowledge Base is a fundamental component of the Retrieval-Augmented Generation (RAG) design pattern, serving as the external, non-parametric memory that stores a vast amount of factual information. Its primary role is to supply the RAG model with accurate, up-to-date information that can be used to augment its responses in knowledge-intensive tasks.
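Conceptually, a minimal knowledge base is just a store of text chunks and their vectors plus a similarity search over them. The sketch below uses NumPy and cosine similarity; the class name and structure are illustrative, and a production system would delegate this to a vector database or index library instead:
import numpy as np

class InMemoryKnowledgeBase:
    """Toy non-parametric memory: stores text chunks alongside their embeddings."""

    def __init__(self):
        self.texts: list[str] = []
        self.vectors: list[np.ndarray] = []

    def add(self, text: str, vector: np.ndarray) -> None:
        self.texts.append(text)
        self.vectors.append(vector)

    def search(self, query_vector: np.ndarray, top_k: int = 3) -> list[str]:
        # Cosine similarity between the query vector and every stored vector.
        matrix = np.vstack(self.vectors)
        sims = matrix @ query_vector / (
            np.linalg.norm(matrix, axis=1) * np.linalg.norm(query_vector) + 1e-10
        )
        best = np.argsort(sims)[::-1][:top_k]
        return [self.texts[i] for i in best]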
The main role of the query engine is to convert the end-user query into the same vector space as the documents indexed in the knowledge base, so that the most relevant documents can be retrieved by similarity search.
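Continuing the same sketch, the query engine runs the user query through the same encoder used at indexing time and asks the knowledge base for the nearest chunks:
# Index the chunk embeddings, then retrieve context for a user query.
kb = InMemoryKnowledgeBase()
for chunk, vec in zip(chunks, chunk_vectors):
    kb.add(chunk, vec)

query = "What did the author do growing up?"
query_vector = encoder.encode(query)  # same vector space as the indexed documents
retrieved_chunks = kb.search(query_vector, top_k=3)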
The LLM generator is responsible for producing the final response to the end-user query. It takes the retrieved documents and the original query as input and generates the response, typically by embedding both into a prompt template such as the ones below.
# Default template: ask the LLM to judge whether the retrieved context is relevant before answering.
PROMPT_TMPL = (
    "Context information is below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Assess if the context information is relevant and accordingly answer the question: {query_str}\n"
)

# Stricter template: restrict the LLM to the retrieved context and suppress its prior (parametric) knowledge.
PROMPT_TMPL_TO_LIMIT_PRIOR_KNOWLEDGE = (
    "Context information is below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given only the context information and not any prior knowledge, "
    "answer the question factually: {query_str}\n"
)

# Summarization template: condense the retrieved context around the query.
PROMPT_TMPL_TO_SUMMARIZE = (
    "Context information is below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Summarize the above context to refine knowledge about this query: {query_str}\n"
)
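To make the generation step concrete, the sketch below fills PROMPT_TMPL with the chunks retrieved in the query-engine sketch above and sends the prompt to a chat model. The OpenAI client call and model name are assumptions for illustration; any LLM API can be substituted.
# Sketch: assemble the prompt from retrieved context and call an LLM.
# The OpenAI client usage and model name are illustrative; substitute any LLM API.
from openai import OpenAI

context_str = "\n\n".join(retrieved_chunks)
prompt = PROMPT_TMPL.format(context_str=context_str, query_str=query)

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
print(completion.choices[0].message.content)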
from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader
# Reading documents from the data directory.
# The example assumes that the data directory contains the html from http://paulgraham.com/worked.html
documents = SimpleDirectoryReader('data').load_data()
# Ingesting documents into an in-memory knowledge base.
# GPTVectorStoreIndex uses a default OpenAI embedding generator that produces dense
# vectors for the documents before indexing them into a simple in-memory dictionary index.
index = GPTVectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
# The query engine internally first uses the same embedding generator to generate a dense vector for the input query
# and then queries the index for the most similar documents to the query vector.
# The query engine then embeds the retrieved documents and the query into a prompt template and feeds it to the LLM.
# The LLM generates a response to the query, which is returned by the query engine.
response = query_engine.query("What did the author do growing up?")
print(response)
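Because the answer is grounded in retrieved chunks, provenance can be surfaced as well; the llama_index response object exposes the source nodes that were used (exact attribute and method names may vary across versions):
# Provenance: inspect which indexed chunks the answer was grounded in.
# Attribute/method names follow llama_index's Response object and may vary by version.
print(response.get_formatted_sources())
for source in response.source_nodes:
    print(source.score)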
Improved performance on knowledge-intensive tasks
More control over how specific, diverse, and factual the generated language is
Provenance for generated content
Fact-based Text Generation: Generating text that requires incorporating accurate factual information, like writing a summary of an event, creating an informative article, or producing a detailed description of a specific topic.
Conversational AI: Building chatbots or virtual assistants that can provide detailed and accurate responses (likely drawn from a knowledge base) in natural language conversations, demonstrating an understanding of context and factual knowledge.
Open-domain Question Answering (QA): Answering questions that span a wide range of topics and require access to a vast knowledge base, such as answering questions about history, science, literature, or general trivia.
Knowledge Base Population: Automatically populating a knowledge base with new entities and relationships by extracting and synthesizing information from multiple sources, such as web pages, news articles, or scientific papers.