In the age of Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) has become a powerful technique to generate accurate, up-to-date, and context-aware answers by combining external knowledge with generative AI.
In this blog, we’ll walk through how to build a RAG pipeline using LangChain — starting from loading your documents (PDF, text, web), splitting them, embedding them into vectors, storing them in a vector database, and finally using them in a complete retrieval-based chatbot or QA system.
RAG (Retrieval-Augmented Generation) is a technique that combines information retrieval with text generation. Instead of relying solely on a language model’s internal memory, RAG fetches relevant documents from an external knowledge base (using embeddings and vector databases) and provides them as context for the LLM to generate informed responses.
In a RAG (Retrieval-Augmented Generation) pipeline, the first step is ingesting raw data. LangChain simplifies this through Document Loaders, which help load data from multiple file types and sources — whether it’s a .txt file, a .pdf, a website, or even structured formats like CSV or JSON.
Document loaders convert unstructured or semi-structured data into Document objects, which are later processed and used for retrieval by the LLM.
A Document Loader is a class in LangChain that reads content from a source and returns it as a list of Document objects. Each Document typically includes:
page_content: the main content (text)
metadata: useful metadata such as the source path, page number, or URL

TextLoader – Load a Simple Text File
from langchain_community.document_loaders import TextLoader
loader = TextLoader("example.txt")
documents = loader.load()
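Once loaded, you can inspect the returned Document objects directly. A quick sketch, assuming example.txt exists and is non-empty:
# Inspect the first loaded Document
first_doc = documents[0]
print(first_doc.page_content[:200])  # preview the text content
print(first_doc.metadata)            # e.g. {'source': 'example.txt'}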
PyPDFLoader – Loads content from PDF files, split page by page.
from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader("document.pdf")
documents = loader.load()
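Since PyPDFLoader produces one Document per page, the page number is available in each Document's metadata:
# One Document per PDF page, with the page number in metadata
print(len(documents))         # number of pages loaded
print(documents[0].metadata)  # e.g. {'source': 'document.pdf', 'page': 0}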
WebBaseLoader – Loads data from web pages by scraping the content.
from langchain_community.document_loaders import WebBaseLoader
import bs4  # WebBaseLoader parses pages with BeautifulSoup under the hood

loader = WebBaseLoader(
    web_paths=("https://python.langchain.com/docs/integrations/document_loaders/",),
)
docs = loader.load()
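Because the page is parsed with BeautifulSoup, you can restrict parsing to specific elements via bs_kwargs. A minimal sketch; the class name here is purely illustrative, so inspect the real page to find the right selector:
# Only keep elements matching the SoupStrainer filter (class name is illustrative)
loader = WebBaseLoader(
    web_paths=("https://python.langchain.com/docs/integrations/document_loaders/",),
    bs_kwargs=dict(parse_only=bs4.SoupStrainer(class_="theme-doc-markdown")),
)
docs = loader.load()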
Explore the full list of loaders here:
https://python.langchain.com/docs/integrations/document_loaders
Once you’ve loaded documents using loaders like TextLoader, PyPDFLoader, or WebBaseLoader, the next critical step in building a RAG pipeline is splitting the documents into smaller, manageable chunks.
Why?
👉 Because LLMs like GPT have a context length limit, and chunking ensures that each retrieved piece of text fits within that limit while staying focused enough for precise retrieval.
A Text Splitter takes in long documents and splits them into smaller chunks, each ideally representing a self-contained semantic unit of text. Each chunk can overlap with others, ensuring contextual continuity.
RecursiveCharacterTextSplitter – The most robust and commonly used splitter. It tries multiple strategies recursively to split at natural boundaries — like paragraphs or sentences — before falling back to characters.
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Split the loaded docs into ~1000-character chunks with 100 characters of overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
final_documents = text_splitter.split_documents(docs)
CharacterTextSplitter – Splits text into chunks by character count using a single separator, without considering sentence or paragraph boundaries.
from langchain_text_splitters import CharacterTextSplitter

splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=300,
    chunk_overlap=30
)
chunks = splitter.split_documents(docs)
chunk_size – Maximum size of each chunk (in characters or tokens)
chunk_overlap – Overlap between chunks to maintain context
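To see how these two parameters interact, here is a minimal sketch using split_text on a short string; the sizes are deliberately tiny so the overlap is visible:
from langchain_text_splitters import RecursiveCharacterTextSplitter

demo_splitter = RecursiveCharacterTextSplitter(chunk_size=50, chunk_overlap=10)
demo_chunks = demo_splitter.split_text(
    "RAG pipelines split long documents into overlapping chunks "
    "so that each piece fits the model's context window."
)
for c in demo_chunks:
    print(repr(c))  # adjacent chunks share up to 10 characters of overlap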
LangChain provides a wide range of text splitters for specific file types and formats — Markdown, CSV, HTML, JSON, and more.
https://python.langchain.com/docs/concepts/text_splitters
Once your documents are split into meaningful chunks, the next step in building a RAG pipeline is to convert those text chunks into numerical representations, called embeddings. These embeddings are stored in a vector database and used to find the most relevant chunks when a user asks a question.
Embeddings are high-dimensional vector representations of text that capture semantic meaning. Similar texts have similar embeddings, which allows us to perform semantic search — a core part of RAG.
For example: "What is AI?" and "Explain Artificial Intelligence" will have similar embeddings.
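You can check this yourself by embedding both phrases and comparing them with cosine similarity. A minimal sketch, using the same Hugging Face model that appears later in this post:
import numpy as np
from langchain_huggingface import HuggingFaceEmbeddings

hf_embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

def cosine(a, b):
    a, b = np.array(a), np.array(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

v1 = hf_embeddings.embed_query("What is AI?")
v2 = hf_embeddings.embed_query("Explain Artificial Intelligence")
v3 = hf_embeddings.embed_query("Dogs are cute")

print(cosine(v1, v2))  # noticeably higher similarity
print(cosine(v1, v3))  # noticeably lower similarity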
LangChain provides a variety of embedding providers — both API-based and local.

| Provider | Class | Notes |
|---|---|---|
| OpenAI | OpenAIEmbeddings | Cloud-based, powerful, easy |
| Hugging Face | HuggingFaceEmbeddings | Local models like all-MiniLM |
| Cohere | CohereEmbeddings | Cloud-based, free tier |
| Google PaLM | GooglePalmEmbeddings | For Google AI services |
| Azure | AzureOpenAIEmbeddings | Enterprise-ready |
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
text = "This is a tutorial on OPENAI embeddings"
query_result = embeddings.embed_query(text)
embed_query() – used for converting a user question into a vector during retrieval. For text-embedding-3-large, len(query_result) returns 3072, the dimensionality of each embedding vector.
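embed_query handles a single string; its counterpart embed_documents embeds a batch of chunks in one call. A quick sketch:
# Batch-embed several chunks in one call
doc_vectors = embeddings.embed_documents([
    "RAG combines retrieval with generation.",
    "Embeddings capture semantic meaning.",
])
print(len(doc_vectors), len(doc_vectors[0]))  # 2 vectors, 3072 dimensions each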
For offline or privacy-sensitive setups, HuggingFace embeddings are ideal.
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
query_embedding = embeddings.embed_query("hello AI")
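This model produces 384-dimensional vectors, which is why the FAISS index later in this post is created with a dimension of 384:
print(len(query_embedding))  # 384 for all-MiniLM-L6-v2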
You can explore other embedding integrations here:
https://python.langchain.com/docs/integrations/text_embedding
After converting your text chunks into embeddings (vectors), the next step in the RAG pipeline is to store them in a vector database. This helps you search and retrieve the most relevant chunks based on a user’s query.
A vector database stores high-dimensional vectors (like your embedded text) and helps you find the most similar ones quickly using similarity search (e.g., cosine similarity).
Think of it like Google Search, but instead of matching keywords, it matches meanings of texts.
| Vector DB | Type | Description |
|---|---|---|
| FAISS | Local | Fast and easy, good for testing and small apps |
| Pinecone | Cloud | Scalable and production-ready |
| Chroma | Local | Lightweight and simple |
| Weaviate | Cloud/Local | Offers metadata filtering |
| Qdrant | Open-source | Supports filtering and hybrid search |
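All of these stores plug into the same LangChain vector store interface. As a quick contrast with the FAISS setup below, here is a minimal sketch using Chroma (assuming the langchain-chroma package is installed), reusing the final_documents and embeddings objects created earlier:
from langchain_chroma import Chroma

# Build a Chroma store directly from already-split documents
chroma_store = Chroma.from_documents(documents=final_documents, embedding=embeddings)
results = chroma_store.similarity_search("What is RAG?", k=2)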
import faiss
from langchain_community.vectorstores import FAISS
from langchain_community.docstore.in_memory import InMemoryDocstore

# The index dimension must match the embedding model (384 for all-MiniLM-L6-v2)
index = faiss.IndexFlatIP(384)

vector_store = FAISS(
    embedding_function=embeddings,
    index=index,
    docstore=InMemoryDocstore(),
    index_to_docstore_id={},
)
vector_store.add_texts(["AI is future", "AI is powerful", "Dogs are cute"])
Once your document chunks are stored as embeddings in a vector database, you can use similarity search to find the most relevant information based on a user’s question.
Similarity Search compares the user query embedding with the document embeddings stored in your vector database and returns the chunks that are most similar in meaning.
Instead of exact word matches (like traditional search), it finds semantically related content.
results = vector_store.similarity_search("Tell me about AI", k=2)
With k=2, we get output like:
[Document(id='ca455999-8dbe-4065-9052-82b65ef06c5b', metadata={}, page_content='AI is powerful'),
 Document(id='e35c7715-e0dd-4280-adab-4cc0f83db33c', metadata={}, page_content='AI is future')]
Only the two AI-related texts are returned; "Dogs are cute" is left out because it is semantically unrelated to the query.
The parameter k specifies the number of most similar documents (chunks) to retrieve from the vector database.
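If you also want to know how strong each match is, the FAISS vector store exposes similarity_search_with_score, which returns (document, score) pairs. A quick sketch:
# Score interpretation depends on the index type:
# L2 distance (lower is closer) or inner product (higher is closer)
results_with_scores = vector_store.similarity_search_with_score("Tell me about AI", k=2)
for doc, score in results_with_scores:
    print(score, doc.page_content)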
Retrieval-Augmented Generation (RAG) is a powerful approach to enhance the output of large language models by combining them with external knowledge sources. In this guide, we’ll show how to build a simple RAG pipeline using a PDF file, LangChain’s tools, Hugging Face embeddings, and FAISS as the vector database.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
import faiss
from langchain_community.vectorstores import FAISS
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_huggingface import HuggingFaceEmbeddings

FILE_PATH = r"C:\AgenticAI\2-Langchain_Basics\2.4-Vector-db\llama2.pdf"

# 1. Load the PDF page by page
loader = PyPDFLoader(FILE_PATH)
pages = loader.load()

# 2. Split pages into overlapping chunks
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50
)
split_docs = splitter.split_documents(pages)

# 3. Embed chunks with a local Hugging Face model (384-dimensional vectors)
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# 4. Store the embeddings in a FAISS inner-product index
index = faiss.IndexFlatIP(384)
vector_store = FAISS(
    embedding_function=embeddings,
    index=index,
    docstore=InMemoryDocstore(),
    index_to_docstore_id={},
)
vector_store.add_documents(documents=split_docs)

# 5. Turn the store into a retriever and query it
retriever = vector_store.as_retriever(
    search_kwargs={"k": 10}
)
retrieved_docs = retriever.invoke("what is llama model")
In this RAG pipeline, we begin by using PyPDFLoader from langchain_community.document_loaders to load the contents of a PDF file — in this case, llama2.pdf. This loader reads each page and converts it into structured Document objects, making the text ready for downstream processing.
Once loaded, we use RecursiveCharacterTextSplitter to chunk the document content into manageable pieces. Here, each chunk is limited to 500 characters with a 50-character overlap. This overlap ensures that contextual continuity is maintained between chunks, which is especially important in semantic search and retrieval-based tasks.
After splitting, we initialize HuggingFaceEmbeddings using the lightweight and efficient "all-MiniLM-L6-v2" model. This step transforms each document chunk into high-dimensional vectors that capture semantic meaning — critical for finding similar chunks later. These embeddings are fully open-source, ideal for developers seeking alternatives to proprietary APIs.
To store and search these vectors efficiently, we use FAISS (faiss.IndexFlatIP), a fast and scalable vector similarity library. The FAISS vector store is wrapped using LangChain’s FAISS class, where we also configure an InMemoryDocstore to map vector IDs to the original documents.
Once set up, we call add_documents() to embed and insert the chunks into the FAISS index. Finally, we convert the vector store into a retriever using .as_retriever() and query it with "what is llama model". The retriever fetches the top 10 most semantically similar chunks using inner product similarity, which can then be fed to a language model for answer generation.
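To go from retrieved chunks to a final answer, the retriever can be plugged into an LLM chain. This is not part of the pipeline above; it is a minimal sketch assuming an OpenAI API key is configured and the gpt-4o-mini chat model, though any chat model supported by LangChain works the same way:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain

llm = ChatOpenAI(model="gpt-4o-mini")  # assumed model; swap in any LangChain chat model

# Prompt that stuffs the retrieved chunks into {context}
prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer the question using only the following context:\n\n{context}"),
    ("human", "{input}"),
])

question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

response = rag_chain.invoke({"input": "What is the Llama 2 model?"})
print(response["answer"])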
Building a Retrieval-Augmented Generation (RAG) system using LangChain, FAISS, and Hugging Face embeddings offers a powerful and flexible way to unlock insights from your own documents like PDFs. By loading documents with PyPDFLoader, splitting text into meaningful chunks using RecursiveCharacterTextSplitter, and converting those chunks into semantic vectors with Hugging Face’s all-MiniLM-L6-v2 model, you create a robust foundation for efficient and accurate semantic search.
FAISS enables lightning-fast similarity searches over these vectors, making it easy to retrieve the most relevant information for any query. This approach is especially valuable for developing intelligent chatbots, document search engines, or knowledge assistants that require deep understanding grounded in your data.
With fully open-source tools, this pipeline is customizable and scalable for a wide range of applications. If you want to build AI systems that combine the strengths of retrieval and generation, leveraging LangChain with Hugging Face embeddings and FAISS is an excellent choice.