1 Introduction
Vector databases store high-dimensional numerical vectors and enable fast similarity search. They are widely used in Retrieval-Augmented Generation (RAG) systems, recommendation engines, and computer vision applications.
This post covers the fundamentals of vector databases, discusses Retrieval-Augmented Generation (RAG), and provides a Python implementation using FAISS, a popular library for fast similarity search.
2 What is a Vector Database?
A vector database is optimized for storing and querying high-dimensional vectors efficiently. Unlike traditional databases that use structured queries, vector databases retrieve data using similarity measures like:
- Cosine Similarity (angle between vectors)
- Euclidean Distance (L2 norm)
- Inner Product (dot product)
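For two vectors represented as NumPy arrays, each of these measures takes only a line or two to compute. The snippet below is a minimal illustration with made-up example vectors.
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 1.0, 4.0])

cosine_similarity = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))  # angle between vectors
euclidean_distance = np.linalg.norm(a - b)                           # L2 norm of the difference
inner_product = a @ b                                                # dot product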
2.1 Use Cases
- Retrieval-Augmented Generation (RAG): Enhancing LLM responses
- Recommendation Systems: Finding similar users or products
- Image & Video Search: Searching by content rather than metadata
3 What is Retrieval-Augmented Generation (RAG)?
Retrieval-Augmented Generation (RAG) is an AI framework that combines retrieval-based search with generative models to improve the quality and accuracy of text generation. Instead of relying only on a pre-trained language model’s internal knowledge, RAG dynamically fetches relevant information from external sources (like a database, vector store, or web search) to generate more informed and up-to-date responses.
How RAG Works
- Retrieval Step
  - Given a query, the system retrieves relevant documents from an external knowledge base (e.g., a vector database like FAISS, Pinecone, or a search engine).
  - Common retrieval methods include dense vector search (e.g., using embeddings from transformers like BERT or OpenAI embeddings) and keyword search.
- Augmentation Step
  - The retrieved documents are provided as additional context to a large language model (LLM).
  - This allows the model to generate responses based on both its pre-trained knowledge and real-time, external information.
- Generation Step
  - The LLM synthesizes an answer, incorporating the retrieved knowledge while ensuring coherence and fluency.
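To make the three steps above concrete, here is a minimal sketch of a RAG loop. The embed and generate functions are placeholders for whichever embedding model and LLM you use; they are assumptions for illustration, not part of any specific library.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: return an embedding for the text (e.g., from a transformer model)."""
    raise NotImplementedError

def generate(prompt: str) -> str:
    """Placeholder: call an LLM with the augmented prompt."""
    raise NotImplementedError

def rag_answer(query: str, documents: list, k: int = 3) -> str:
    # Retrieval: embed the query and all documents, then rank by cosine similarity
    doc_vectors = np.array([embed(d) for d in documents])
    q = embed(query)
    scores = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    top_docs = [documents[i] for i in np.argsort(-scores)[:k]]

    # Augmentation: prepend the retrieved documents to the prompt
    context = "\n".join(top_docs)
    prompt = f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

    # Generation: let the LLM synthesize the final answer
    return generate(prompt)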
Benefits of RAG
- More Accurate & Up-to-Date: Retrieves real-time or domain-specific knowledge, reducing hallucinations.
- Interpretable: Users can see the sources used for generating responses.
- Efficient: Allows smaller models to perform better by offloading factual knowledge to retrieval systems.
Use Cases
- Chatbots & Virtual Assistants: Improved customer support with company-specific knowledge.
- Enterprise Search: Querying internal documents dynamically.
- Medical & Legal AI: Ensuring responses are based on authoritative sources.
4 How Are Best Matches Found for a Query?
The best matches for a given query vector are found using nearest neighbor search (NNS) techniques. The most common method is k-nearest neighbors (k-NN), which identifies the top-k closest vectors in the database based on a similarity metric.
For large-scale search, approximate methods like Hierarchical Navigable Small World (HNSW) and Product Quantization (PQ) can be used to improve efficiency while maintaining accuracy.
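As a quick illustration of exact k-NN, the brute-force version is just a distance computation followed by a sort. The data below is random and purely illustrative.
import numpy as np

database = np.random.rand(1000, 64).astype(np.float32)  # 1,000 stored vectors of dimension 64
query = np.random.rand(64).astype(np.float32)
k = 5

# Euclidean (L2) distance from the query to every stored vector
distances = np.linalg.norm(database - query, axis=1)
top_k = np.argsort(distances)[:k]  # indices of the k closest vectors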
4.1 k-NN in FAISS
FAISS provides both exact and approximate k-NN search.
- Exact Search: Uses brute-force comparison for the most accurate results.
- Approximate Search: Uses indexing structures like IVF (Inverted File Index) and HNSW for faster retrieval at scale.
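As a rough sketch of approximate search with an IVF index (the parameter values here are illustrative, not tuned), FAISS first trains a coarse quantizer on the data and then adds the vectors:
import faiss
import numpy as np

d = 64                                              # vector dimensionality
xb = np.random.rand(10000, d).astype(np.float32)   # database vectors (random demo data)
xq = np.random.rand(1, d).astype(np.float32)       # query vector

nlist = 100                                         # number of coarse clusters (illustrative)
quantizer = faiss.IndexFlatL2(d)                    # coarse quantizer for cluster assignment
index = faiss.IndexIVFFlat(quantizer, d, nlist)
index.train(xb)                                     # learn the cluster centroids
index.add(xb)

index.nprobe = 10                                   # clusters to visit at query time
distances, indices = index.search(xq, 5)            # approximate top-5 neighbors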
5 Implementing a Simple Vector Database in Python
We’ll start by implementing a minimal in-memory vector database in Python before introducing FAISS for efficient retrieval.
5.1 Step 1: Stub Implementation
Below is a basic skeleton for a vector database.
from typing import List, Tuple, Optional
import numpy as np
class SimpleVectorDB:
    def __init__(self, dim: int):
        """Initialize the vector database with a given dimensionality."""
        self.dim = dim
        self.vectors = []   # List of stored vectors
        self.metadata = []  # Optional metadata for each vector

    def add_vector(self, vector: np.ndarray, meta: Optional[dict] = None) -> int:
        """Add a new vector with optional metadata and return its index."""
        self.vectors.append(vector)
        self.metadata.append(meta)
        return len(self.vectors) - 1

    def search(self, query: np.ndarray, k: int = 5) -> List[Tuple[int, float]]:
        """Find the top-k closest vectors using cosine similarity."""
        if not self.vectors:
            return []
        matrix = np.array(self.vectors)
        similarities = matrix @ query / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(query))
        top_k = np.argsort(-similarities)[:k]
        return [(int(i), float(similarities[i])) for i in top_k]

    def get_vector(self, index: int) -> Optional[np.ndarray]:
        """Retrieve a vector by index."""
        return self.vectors[index] if 0 <= index < len(self.vectors) else None
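A quick smoke test of this class might look like the following; the vectors and the metadata key are random demo data, not anything prescribed.
db = SimpleVectorDB(dim=4)
for _ in range(10):
    db.add_vector(np.random.rand(4), meta={"source": "demo"})

query = np.random.rand(4)
for idx, score in db.search(query, k=3):
    print(idx, round(score, 3), db.metadata[idx])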
5.2 Step 2: Using FAISS for Fast Search
FAISS (Facebook AI Similarity Search) is optimized for large-scale vector search.
import faiss
class FaissVectorDB:
    def __init__(self, dim: int):
        """Initialize a FAISS-based vector database."""
        self.index = faiss.IndexFlatL2(dim)  # L2 distance index
        self.vectors = []

    def add_vector(self, vector: np.ndarray):
        """Add a new vector to the FAISS index."""
        # FAISS expects float32 row vectors of shape (n, dim)
        self.index.add(vector.reshape(1, -1).astype(np.float32))
        self.vectors.append(vector)

    def search(self, query: np.ndarray, k: int = 5):
        """Retrieve top-k closest vectors."""
        distances, indices = self.index.search(query.reshape(1, -1).astype(np.float32), k)
        return list(zip(indices[0], distances[0]))
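Usage mirrors the simple version. Note that FAISS returns L2 distances here, so smaller values mean closer matches; the data below is again random and only for illustration.
faiss_db = FaissVectorDB(dim=64)
data = np.random.rand(1000, 64).astype(np.float32)
for row in data:
    faiss_db.add_vector(row)

query = np.random.rand(64).astype(np.float32)
for idx, dist in faiss_db.search(query, k=5):
    print(int(idx), float(dist))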
6 Conclusion
Vector databases are powerful tools for similarity search and information retrieval. This post introduced a simple implementation and an optimized approach using FAISS. Future improvements could include HNSW for approximate search or metadata storage.
Would you like a follow-up post on integrating FAISS with RAG? Let me know in the comments!