Building a Simple RAG System
Implementing a Bot to Answer Questions Using LlamaIndex and Local LLMs
Typically, LLMs are trained on a large corpus of data. If you ask questions about information that is not included in the training data, or that requires domain-specific knowledge, the LLM may hallucinate. In this article, we will explore how to implement a simple RAG (Retrieval-Augmented Generation) system for custom requirements.
Let’s assume we want to build a bot that responds to user queries about Berkshire Hathaway by searching its annual report. We will build a simple RAG system using LlamaIndex.
LlamaIndex
LlamaIndex is a framework that helps build GenAI-based applications, offering a wide range of tools and libraries for interacting with LLMs, building agents, and creating workflows.
To build a simple RAG system, we need an LLM and a Retriever.
LlamaIndex can work with both OpenAI and a local Ollama instance. In this case, we will utilize the Ollama instance for the LLM.
To install and set up Ollama, please refer to the article below.
Since we are going to use the annual report of Berkshire Hathaway in PDF format, we will use the SimpleDirectoryReader provided by LlamaIndex. SimpleDirectoryReader is the most basic reader: it reads local files and converts them into LlamaIndex document objects.
Installation
Install the llama-index and llama-index-llms-ollama packages:
pip install llama-index llama-index-llms-ollama
Import the packages and set Ollama as the preferred LLM instance.
from llama_index.llms.ollama import Ollama
from llama_index.core import Settings

Settings.llm = Ollama(model="phi3", request_timeout=120.0)
Create Index
Download this PDF document and place it in a folder called “data”.
Since the document is raw text, it must be converted into embeddings to work with LLMs. We will use HuggingFaceEmbedding for this task.
pip install llama-index-embeddings-huggingface
Load the documents from the directory using SimpleDirectoryReader and create a vector store index.
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

documents = SimpleDirectoryReader("data").load_data()

# Use the bge-base embedding model
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")

index = VectorStoreIndex.from_documents(documents)
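Under the hood, the index answers a query by comparing the embedding of the query against the embeddings of the document chunks, typically with cosine similarity. The toy sketch below (plain Python, with made-up 3-dimensional vectors rather than real 768-dimensional bge-base embeddings) illustrates the idea: semantically related texts map to vectors that score high against each other.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up 3-d "embeddings" purely for illustration.
query_vec = [0.9, 0.1, 0.0]      # query about insurance employees
chunk_close = [0.8, 0.2, 0.1]    # chunk about insurance staffing
chunk_far = [0.0, 0.2, 0.9]      # chunk about railroad revenue

print(cosine_similarity(query_vec, chunk_close))  # high, ~0.98
print(cosine_similarity(query_vec, chunk_far))    # low, ~0.02
```

The retriever inside the vector store index performs essentially this comparison at scale, returning the highest-scoring chunks as context for the LLM.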
Query
Now that the index is built, we can create a query engine and interact with the document through it.
query_engine = index.as_query_engine()
response = query_engine.query("what was Berkshire parent company debt outstanding?")
print(response)
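It is also worth inspecting which chunks the retriever actually pulled for a query; a LlamaIndex response exposes them as `response.source_nodes`, where each entry carries a similarity score and the node text. The helper below is written duck-typed, so it only assumes those attributes exist:

```python
def show_sources(response, max_chars=200):
    """Print the score and a text preview for each retrieved chunk.

    Assumes `response` has a `source_nodes` list whose items carry a
    `score` and a `node` with `get_content()`, as LlamaIndex responses do.
    """
    for i, nws in enumerate(response.source_nodes, start=1):
        preview = nws.node.get_content()[:max_chars].replace("\n", " ")
        print(f"[{i}] score={nws.score:.3f} text={preview!r}")

# Usage with the query engine above:
# response = query_engine.query("what was Berkshire parent company debt outstanding?")
# show_sources(response)
```

Seeing the retrieved chunks makes it much easier to tell whether a bad answer comes from retrieval (the wrong chunks were fetched) or from generation (the right chunks were fetched but misread).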
Result: Berkshire Hathaway Inc.'s (the holder of BHFC) total long-term borrowings were about $18.8 billion at the end of December 31, 2023. This figure includes its senior notes and other obligations as reported in their consolidated financial statements for that year, in which these debts are disclosed along with information on repayments made during the same period.
Although the simple RAG engine provided a correct response here, it does not give accurate answers to other questions. Below are a few questions put to the RAG system, along with the responses.
Query 1:
Question: What is the revenue of the industrial products compared to 2022?
Answer: The revenues from other building product businesses decreased by $1.5 billion (9.4%) in 2023 when compared with those in 2022.
Actual Answer: Revenues of the industrial products group in 2023 increased $4.1 billion (13.2%).
Query 2:
Question: How many people are employed with Berkshire’s insurance businesses?
Answer: I'm unable to provide specific numbers regarding employment without access to current data or company reports directly from Berkshire Hathaway Inc., as my training only permits sharing information up until September 2021. For accurate, recent figures on workforce size within the insurance businesses of a corporation like Berkshire would require consulting their most contemporary annual report or official publications.
Actual Answer: Berkshire’s insurance businesses employed approximately 43,000 people at the end of 2023.
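Part of the problem may simply be retrieval depth: by default the vector retriever returns only a couple of top-scoring chunks, so the passage containing the actual answer can be missed if it ranks just below the cutoff. The toy ranking below (made-up chunk labels and scores, plain Python) illustrates how widening top-k changes what the LLM gets to see; with LlamaIndex the equivalent knob would be something like `index.as_query_engine(similarity_top_k=5)`.

```python
# Made-up retrieval scores for chunks of the annual report.
scored_chunks = [
    ("other building products revenue", 0.71),
    ("railroad operating results", 0.69),
    ("industrial products revenue up $4.1B", 0.68),  # the chunk we actually need
    ("insurance float discussion", 0.55),
]

def retrieve(scored_chunks, top_k):
    """Return the top_k chunk labels, ranked by descending score."""
    ranked = sorted(scored_chunks, key=lambda pair: pair[1], reverse=True)
    return [label for label, _ in ranked[:top_k]]

print(retrieve(scored_chunks, top_k=2))  # relevant chunk missed
print(retrieve(scored_chunks, top_k=3))  # relevant chunk included
```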
Looking at the responses from the query engine, we can see that it either provides incorrect answers or fails to find the data within the provided document. The basic RAG engine we built therefore requires fine-tuning and enhancements. In the upcoming articles, we will explore how to improve this RAG system and make it production-ready.