Simple Wonders of RAG using Ollama, Langchain and ChromaDB

by Arjun April 30th, 2024

Too Long; Didn't Read

Maximize your query outcomes with RAG. Learn how to leverage Retrieval Augmented Generation for domain-specific questions effectively.

Dive with me into the details of how you can use RAG to produce interesting results for questions related to a specific domain without needing to fine-tune your own model.

What is RAG?

RAG, or Retrieval Augmented Generation, is a really complicated way of saying “Knowledge base + LLM.” It describes a system that supplements the user’s input with additional data before querying the LLM. Where does that additional data come from? It could come from a number of different sources: vector databases, search engines, other pre-trained LLMs, etc.
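
At its simplest, you can think of it as a sketch like this, where retrieve_context is a hypothetical stand-in for whatever knowledge source you use — a vector database, a search engine, etc. —

def retrieve_context(query: str) -> str:
    # Hypothetical: embed the query and run a similarity search
    # against a vector store, call a search engine, etc.
    ...

def rag_answer(llm, query: str) -> str:
    # Augment the user's question with retrieved context before
    # handing it to the LLM
    context = retrieve_context(query)
    augmented_prompt = f"Context:\n{context}\n\nQuestion: {query}"
    return llm(augmented_prompt)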


You can read my article for some more context about RAG.

How does it work in practice?

1. No RAG, aka Base Case

Before we get to RAG, we need to discuss “Chains” — the raison d’être of Langchain. From the Langchain documentation, Chains refer to sequences of calls — whether to an LLM, a tool, or a data preprocessing step. You can see more details in the experiments section.


A simple chain you can set up would look something like —

chain = prompt | llm | output

As you can see, this is very straightforward. You are passing a prompt to an LLM of choice and then using a parser to produce the output. You are using Langchain’s concept of “chains” to help sequence these elements, much like you would use pipes in Unix to chain together several system commands, like ls | grep file.txt.
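
If the pipe syntax feels magical, it helps to know that each element in the chain is a langchain “Runnable,” and | simply composes them. Here is a minimal sketch using RunnableLambda with toy functions (no LLM involved) to show the mechanics —

from langchain_core.runnables import RunnableLambda

# Each Runnable transforms its input and hands the result to the next
# one, exactly like a Unix pipe
add_one = RunnableLambda(lambda x: x + 1)
times_two = RunnableLambda(lambda x: x * 2)

chain = add_one | times_two
print(chain.invoke(3))  # (3 + 1) * 2 = 8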

2. With RAG

The majority of use cases that use RAG today do so by using some form of vector database to store the information that the LLM uses to enhance its answer to the prompt. You can read about how vector stores (or vector databases) store information, and why they are a better alternative to something like a traditional SQL or NoSQL store, here.
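
The core idea behind vector search is that text gets mapped to high-dimensional vectors (embeddings), and “relevant” simply means “nearby” under a similarity metric such as cosine similarity. A toy sketch with numpy — real embeddings have hundreds of dimensions; these 3-dimensional vectors are made up for illustration —

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # 1.0 means same direction (very similar), 0.0 means orthogonal (unrelated)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([0.9, 0.1, 0.0])
doc_a = np.array([0.8, 0.2, 0.1])  # on-topic document
doc_b = np.array([0.0, 0.1, 0.9])  # unrelated document

print(cosine_similarity(query, doc_a))  # high score -> retrieved
print(cosine_similarity(query, doc_b))  # low score -> skipped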


There are several flavors of vector databases ranging from commercial paid products like Pinecone to open-source alternatives like ChromaDB and FAISS. I typically use ChromaDB because it’s easy to use and has a lot of support in the community.
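
To give you a taste of how little code it takes, here is a minimal ChromaDB sketch using its native client (the collection name and documents are made up for illustration) —

import chromadb

# In-memory client; Chroma also supports persistent storage
client = chromadb.Client()
collection = client.create_collection(name="demo_docs")

# Chroma embeds the documents with its default embedding function
collection.add(
    documents=["Langsmith helps you test LLM apps", "Bananas are yellow"],
    ids=["doc1", "doc2"],
)

results = collection.query(query_texts=["how do I test my LLM app?"], n_results=1)
print(results["documents"])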


At a high level, this is how RAG works with vector stores —

  • Take the data you want to provide as contextual information for your prompt and store it as vector embeddings in your vector store of choice.
  • Preload this information so that it’s available to any prompt that needs to use it.
  • At the time of prompt processing, retrieve the relevant context from the vector store — this time by embedding your query and then running a vector search against your store (see the sketch after this list).
  • Pass that context along with your prompt to the LLM so that the LLM can provide an appropriate response.
  • 💲 Profit 💲
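
Here is the retrieval step on its own — a minimal sketch using the same FAISS + Ollama embeddings combination as the full example later (the toy documents are made up) —

from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS

# Steps 1-2: embed a few documents and preload them into the store
embeddings = OllamaEmbeddings(model="nomic-embed-text")
vector_store = FAISS.from_texts(
    ["Langsmith lets you create test datasets", "Bananas are yellow"],
    embeddings,
)

# Step 3: embed the query and run a vector search against the store
retriever = vector_store.as_retriever(search_kwargs={"k": 1})
docs = retriever.invoke("how can langsmith help with testing?")
print(docs[0].page_content)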

In practice, a RAG chain would look something like this —

docs_chain = create_stuff_documents_chain(llm, prompt)
retriever = vector_store.as_retriever()
retrieval_chain = create_retrieval_chain(retriever, docs_chain)


If you were to strip away the details of all these langchain functions like create_stuff_documents_chain and create_retrieval_chain (you can read up on these in Langchain’s official documentation), what it really boils down to is something like —

context_data = vector_store
chain = context_data | prompt | llm | output

which at a high level isn’t all that different from the base case shown above. The main difference is the inclusion of the contextual data that is provided with your prompt. And that, my friends, is RAG.


However, like all modern software, things can’t be THAT simple. So there is a lot of syntactic sugar to make all of these things work — that is what the create_stuff_documents_chain and create_retrieval_chain calls in the snippet above are doing.


There are many things happening under the hood to build an actual RAG-based system. However, if you focus on the “Retrieval chain,” you will see that it is composed of 2 main components —

  • the simple chain (prompt, LLM, and output parser), and
  • the construction of the vector data that supplies the context.


Bob’s your uncle.

Running Experiments


To demonstrate the effectiveness of RAG, I would like to know the answer to the question —


How can langsmith help with testing?


For those who are unaware, Langsmith is Langchain’s product offering, which provides tooling to help develop, test, deploy, and monitor LLM applications.

Since unpaid versions of LLMs (as of April 2024) still have the limitation of not being connected to the internet and being trained on data that predates Langsmith’s launch, Langsmith is not a concept known to them. The point of using RAG here is to see whether providing context about Langsmith helps the LLM produce a better answer to the question above.

1. Without RAG


from langchain_community.llms import Ollama
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Simple chain invocation
## LLM + Prompt
llm = Ollama(model="mistral")
output = StrOutputParser()
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a skilled technical writer.",
        ),
        ("human", "{user_input}"),
    ]
)
chain = prompt | llm | output

## Winner winner chicken dinner
response = chain.invoke({"user_input": "how can langsmith help with testing?"})
print(":::ROUND 1:::")
print(response)


Output


# :::ROUND 1::: SIMPLE RETRIEVAL

Langsmith, being a text-based AI model, doesn't directly interact with software or perform tests in the traditional sense. However, it can assist with various aspects of testing through its strong language processing abilities. Here are some ways Langsmith can contribute to testing:

1. Writing test cases and test plans: Langsmith can help write clear, concise, and comprehensive test cases and test plans based on user stories or functional specifications. It can also suggest possible edge cases and boundary conditions for testing.

2. Generating test data: Langsmith can create realistic test data for different types of applications. This can be especially useful for large datasets or complex scenarios where generating data manually would be time-consuming and error-prone.

3. Creating test scripts: Langsmith can write test scripts in popular automation frameworks such as Selenium, TestNG, JMeter, etc., based on the test cases and expected outcomes.

4. Providing test reports: Langsmith can help draft clear and concise test reports that summarize the results of different testing activities. It can also generate statistics and metrics from test data to help identify trends and patterns in software performance.

5. Supporting bug tracking systems: Langsmith can write instructions for how to reproduce bugs and suggest potential fixes based on symptom analysis and past issue resolutions.

6. Automating regression tests: While it doesn't directly execute automated tests, Langsmith can write test scripts or provide instructions on how to automate existing manual tests using tools like Selenium, TestComplete, etc.

7. Improving testing documentation: Langsmith can help maintain and update testing documentation, ensuring that all relevant information is kept up-to-date and easily accessible to team members.



  • ❌ As you can see, there are no references to any testing benefits of Langsmith (“Langsmith, being a text-based AI model, doesn’t directly interact with software or perform tests in the traditional sense.”).
  • All the verbiage is super vague, and the LLM is hallucinating insipid use cases in an effort to answer the question (“Langsmith can write test scripts in popular automation frameworks such as Selenium, TestNG, JMeter, etc., based on the test cases and expected outcomes”).

2. With RAG


The extra context is coming from a webpage that we will be loading into our vector store.

from langchain_community.llms import Ollama
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import ChatPromptTemplate
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain

# Invoke chain with RAG context
llm = Ollama(model="mistral")
## Load page content
loader = WebBaseLoader("https://docs.smith.langchain.com/user_guide")
docs = loader.load()

## Vector store things
embeddings = OllamaEmbeddings(model="nomic-embed-text")
text_splitter = RecursiveCharacterTextSplitter()
split_documents = text_splitter.split_documents(docs)
vector_store = FAISS.from_documents(split_documents, embeddings)

## Prompt construction
prompt = ChatPromptTemplate.from_template(
    """
    Answer the following question based only on the given context:

    <context>
    {context}
    </context>

    Question: {input}
    """
)

## Retrieve context from vector store
docs_chain = create_stuff_documents_chain(llm, prompt)
retriever = vector_store.as_retriever()
retrieval_chain = create_retrieval_chain(retriever, docs_chain)

## Winner winner chicken dinner
response = retrieval_chain.invoke({"input": "how can langsmith help with testing?"})
print(":::ROUND 2:::")
print(response["answer"])


Output

# :::ROUND 2::: RAG RETRIEVAL

Langsmith is a platform that helps developers test and monitor their Large Language Model (LLM) applications in various stages of development, including prototyping, beta testing, and production. It provides several workflows to support effective testing:

1. Tracing: Langsmith logs application traces, allowing users to debug issues by examining the data step-by-step. This can help identify unexpected end results, infinite agent loops, slower than expected execution, or higher token usage. Traces in Langsmith are rendered with clear visibility and debugging information at each step of an LLM sequence, making it easier to diagnose and root-cause issues.

2. Initial Test Set: Langsmith supports creating datasets (collections of inputs and reference outputs) and running tests on LLM applications using these test cases. Users can easily upload, create on the fly, or export test cases from application traces. This allows developers to adopt a more test-driven approach and compare test results across different model configurations.

3. Comparison View: Langsmith's comparison view enables users to track and diagnose regressions in test scores across multiple revisions of their applications. Changes in the prompt, retrieval strategy, or model choice can have significant implications on the responses produced by the application, so being able to compare results for different configurations side-by-side is essential.

4. Monitoring and A/B Testing: Langsmith provides monitoring charts to track key metrics over time and drill down into specific data points to get trace tables for that time period. This is helpful for debugging production issues and A/B testing changes in prompt, model, or retrieval strategy.

5. Production: Once the application hits production, Langsmith's high-level overview of application performance with respect to latency, cost, and feedback scores ensures it continues delivering desirable results at scale.


✅ As you can see, there are very specific references to the testing capabilities of Langsmith, which we were able to extract thanks to the supplemental knowledge provided by the webpage we loaded.


We were able to augment the capabilities of the standard LLM with the specific domain knowledge required to answer this question.
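
As a quick sanity check, the response returned by create_retrieval_chain also carries the retrieved documents under the "context" key, so you can inspect exactly which chunks informed the answer —

# Peek at the chunks the retriever handed to the LLM
for doc in response["context"]:
    print(doc.metadata.get("source"), "-", doc.page_content[:80])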

So what now?

This is just the tip of the iceberg with RAG. There is a world of optimizations and enhancements you can make to see the full power of RAG applied in practice. Hopefully, this is a good launchpad for you to try out the bigger and better things yourself!


To get access to the complete code, you can go here. Thanks for reading!


⭐ If you like this type of content, be sure to follow me or subscribe to https://a1engineering.substack.com/subscribe! ⭐