Sujal Magar
Learn ChromaDB

February 16, 2026

1. Introduction

ChromaDB is an open-source vector database designed for storing and querying embeddings.

It is widely used in AI applications like semantic search and Retrieval-Augmented Generation (RAG).


2. What are Embeddings?

Embeddings are numerical vector representations of data (text, images, audio).

Example:

Text -> [0.12, 0.98, 0.44, …]

Because similar inputs map to nearby vectors, two items can be compared by measuring the distance between their embeddings.
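
To make the "Text -> vector" idea concrete, here is a minimal sketch that uses word counts over a tiny fixed vocabulary as a stand-in for an embedding. The `VOCAB` list and the `toy_embed` helper are hypothetical and exist only for illustration; a real embedding model produces dense learned vectors instead.

```python
import math

# Hypothetical fixed vocabulary, used only for this illustration.
VOCAB = ["ai", "is", "powerful", "cats", "sleep"]

def toy_embed(text):
    """Turn text into a vector by counting each vocabulary word."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]

a = toy_embed("AI is powerful")
b = toy_embed("powerful AI")
c = toy_embed("cats sleep")

# Euclidean distance: a smaller value means the texts are more alike.
print(math.dist(a, b))  # 1.0
print(math.dist(a, c) > math.dist(a, b))  # True
```

The two sentences about AI end up closer to each other than either is to the unrelated sentence, which is the property a vector database relies on.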


3. Key Features

  • Vector similarity search
  • Metadata filtering
  • Persistent storage
  • Scalable indexing
  • AI framework integrations

4. Architecture

Components:

  • Client
  • Collections
  • Embedding functions
  • Storage engine

5. Creating a Client

import chromadb
 
client = chromadb.PersistentClient(path="./db")

6. Collections

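A collection is roughly equivalent to a table in a relational database: it groups related embeddings, documents, and metadata under one name.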

collection = client.get_or_create_collection("docs")

7. Adding Data

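collection.add(
  ids=["1"],                      # unique ID for each record
  documents=["AI is powerful"],   # original text
  embeddings=[[0.1, 0.2, 0.3]],   # one vector per document
  metadatas=[{"topic": "AI"}]     # optional key-value metadata, usable in filters
)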

8. Querying

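results = collection.query(
  query_embeddings=[[0.1, 0.2, 0.3]],
  n_results=2
)

The query returns up to n_results stored records nearest to each query embedding, along with their IDs, documents, and distances.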
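
Conceptually, a query ranks stored vectors by their distance to the query vector and keeps the top few. The sketch below does this by brute force in plain Python. It is not how Chroma is implemented, and the `brute_force_query` helper and its flat result shape are assumptions made for illustration (Chroma's real result is a dictionary of nested lists, one inner list per query embedding).

```python
import math

def brute_force_query(stored, query, n_results=2):
    """Return ids and distances of the n_results vectors nearest to query.

    stored maps a record id to its vector; Euclidean distance is assumed.
    """
    scored = sorted(
        (math.dist(vec, query), item_id) for item_id, vec in stored.items()
    )
    top = scored[:n_results]
    return {"ids": [i for _, i in top], "distances": [d for d, _ in top]}

stored = {
    "1": [0.1, 0.2, 0.3],
    "2": [0.9, 0.8, 0.7],
    "3": [0.1, 0.2, 0.4],
}
result = brute_force_query(stored, [0.1, 0.2, 0.3], n_results=2)
print(result["ids"])  # ['1', '3']
```

Scanning every vector like this gives exact answers but gets slow as the collection grows.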


9. Distance Metrics

  • Cosine similarity
  • Euclidean distance
  • Dot product
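
The three metrics can be written in a few lines each. This is a sketch of the textbook definitions in plain Python, not of Chroma's internal code; the function names are chosen here for illustration.

```python
import math

def dot(a, b):
    """Dot product: larger means more aligned (and longer) vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; ignores vector length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_distance(a, b):
    """Straight-line distance between a and b; smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

u = [1.0, 0.0]
v = [2.0, 0.0]  # same direction as u, twice the length
w = [0.0, 1.0]  # perpendicular to u

print(cosine_similarity(u, v))   # 1.0
print(euclidean_distance(u, v))  # 1.0
print(dot(u, v))                 # 2.0
print(cosine_similarity(u, w))   # 0.0
```

Note how `u` and `v` are identical under cosine similarity but not under Euclidean distance: cosine looks only at direction, while the other two are also sensitive to magnitude.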

10. Metadata Filtering

collection.query(
  query_embeddings=[[...]],
  where={"topic": "AI"}
)
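
A `where` filter narrows the search to records whose metadata matches. The sketch below shows only the simple equality case from the query above, using a hypothetical `matches` helper on plain dictionaries. Chroma's filter syntax supports more than simple equality, which this sketch leaves out.

```python
def matches(metadata, where):
    """True when every key in where has the same value in metadata."""
    return all(metadata.get(key) == value for key, value in where.items())

# Illustrative records; ids and metadata values are made up.
records = [
    {"id": "1", "metadata": {"topic": "AI"}},
    {"id": "2", "metadata": {"topic": "cooking"}},
    {"id": "3", "metadata": {"topic": "AI", "lang": "en"}},
]

where = {"topic": "AI"}
kept = [r["id"] for r in records if matches(r["metadata"], where)]
print(kept)  # ['1', '3']
```

Only the records that pass the filter remain candidates for the similarity ranking.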

11. Indexing

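Chroma uses Approximate Nearest Neighbor (ANN) search, which trades a small amount of accuracy for much faster lookups than comparing against every stored vector: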

  • HNSW algorithm
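
HNSW searches a layered graph in which each vector is linked to nearby vectors, hopping greedily toward the query. The sketch below shows only that core idea on a single layer with a brute-force-built graph. It is a deliberately simplified illustration, not HNSW itself and not Chroma's implementation; `build_knn_graph` and `greedy_search` are hypothetical names.

```python
import math

def build_knn_graph(points, k):
    """Link every point to its k nearest neighbours (brute force, sketch only)."""
    graph = {}
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        graph[i] = others[:k]
    return graph

def greedy_search(points, graph, query, entry=0):
    """Hop to the neighbour closest to query until no neighbour improves."""
    current = entry
    current_dist = math.dist(points[current], query)
    while True:
        best = min(graph[current], key=lambda j: math.dist(points[j], query))
        best_dist = math.dist(points[best], query)
        if best_dist >= current_dist:
            return current, current_dist
        current, current_dist = best, best_dist

points = [[float(x), 0.0] for x in range(10)]  # ten points on a line
graph = build_knn_graph(points, k=2)
nearest, dist = greedy_search(points, graph, query=[7.2, 0.0])
print(nearest)  # 7
```

The search visits only a handful of points instead of all of them. It is approximate because a greedy walk can stop at a local minimum; HNSW's extra layers and wider candidate lists are designed to reduce that risk.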

12. Persistence

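With PersistentClient, Chroma stores collections on disk at the given path, so embeddings survive restarts and do not need to be recomputed.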


13. Integrations

  • LangChain
  • LlamaIndex
  • Haystack
  • Hugging Face
  • OpenAI embeddings

14. Use Cases

  • Semantic search
  • Chatbots
  • Recommendation systems
  • Document retrieval

15. Advantages

  • Optimized for vectors
  • Fast similarity search
  • AI-native design

16. Limitations

  • Not for transactional data
  • Requires an embedding pipeline to turn raw data into vectors