Learn ChromaDB | Sujal Magar

1. Introduction

ChromaDB is an open-source vector database designed for storing and querying embeddings.

It is widely used in AI applications like semantic search and Retrieval-Augmented Generation (RAG).

Embeddings are numerical vector representations of data (text, images, audio).

Example:

Text -> [0.12, 0.98, 0.44, …]

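They allow similarity comparison using distance metrics such as cosine or Euclidean (L2) distance: the smaller the distance between two vectors, the more similar the underlying data.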
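To make the idea of a distance metric concrete, here is a minimal sketch in plain Python. The three vectors and their names (`cat`, `kitten`, `car`) are made up for illustration and do not come from a real embedding model; the two helper functions are ordinary textbook formulas, not ChromaDB's internal code.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def l2_distance(a, b):
    """Euclidean (straight-line) distance between a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical 3-dimensional embeddings, invented for this example.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
car = [0.0, 0.1, 0.9]

# "cat" should look more like "kitten" than like "car" under both metrics.
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, car))
print(l2_distance(cat, kitten) < l2_distance(cat, car))
```

Note that the two metrics point in opposite directions: a higher cosine similarity means "more alike", while a lower L2 distance means "more alike".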

Components:

import chromadb

# Create a client that persists its data under ./db
client = chromadb.PersistentClient(path="./db")

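A collection groups related embeddings, documents, and metadata. It is roughly equivalent to a table in a relational database.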

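# Fetch the "docs" collection, creating it if it does not exist yet
collection = client.get_or_create_collection("docs")

# Store one document together with its id and a precomputed embedding
collection.add(
    documents=["AI is powerful"],
    ids=["1"],
    embeddings=[[0.1, 0.2, 0.3]]
)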

# Ask for the 2 stored entries closest to the query embedding
results = collection.query(
    query_embeddings=[[0.1, 0.2, 0.3]],
    n_results=2
)

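The query returns the stored entries whose embeddings are closest to the query embedding, nearest first. By default each result includes the entry's id, document, metadata, and distance.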
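What "closest" means can be sketched with an exact, brute-force search in plain Python. This is only a conceptual illustration: the `nearest` function and the `store` dictionary are hypothetical names invented here, and ChromaDB does not scan every vector like this.

```python
import math

def nearest(query, store, n_results=2):
    """Return the n_results (id, distance) pairs closest to query, nearest first."""
    scored = [(doc_id, math.dist(query, vector)) for doc_id, vector in store.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:n_results]

# Hypothetical stored embeddings keyed by id.
store = {
    "1": [0.1, 0.2, 0.3],
    "2": [0.9, 0.8, 0.7],
    "3": [0.2, 0.2, 0.3],
}

results = nearest([0.1, 0.2, 0.3], store, n_results=2)
print([doc_id for doc_id, _ in results])
```

The entry with id "1" matches the query exactly, so it comes first with a distance of zero, followed by the next-nearest entry.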

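# Metadata filtering: only entries whose metadata has topic == "AI" are searched.
# Metadata is attached when adding entries, via the metadatas argument of collection.add.
collection.query(
    query_embeddings=[[...]],
    where={"topic": "AI"}
)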
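The effect of a `where` filter can be pictured as "filter first, then rank". The sketch below is a plain-Python illustration under that assumption; `filtered_query` and the `entries` list are hypothetical, and only simple equality matching is handled.

```python
import math

def filtered_query(query, entries, where, n_results=2):
    """Keep entries whose metadata matches every key in where, then rank by distance."""
    candidates = [
        e for e in entries
        if all(e["metadata"].get(key) == value for key, value in where.items())
    ]
    candidates.sort(key=lambda e: math.dist(query, e["embedding"]))
    return [e["id"] for e in candidates[:n_results]]

# Hypothetical entries; id "2" is the same vector as id "1" but has a different topic.
entries = [
    {"id": "1", "embedding": [0.1, 0.2, 0.3], "metadata": {"topic": "AI"}},
    {"id": "2", "embedding": [0.1, 0.2, 0.3], "metadata": {"topic": "cooking"}},
    {"id": "3", "embedding": [0.5, 0.5, 0.5], "metadata": {"topic": "AI"}},
]

print(filtered_query([0.1, 0.2, 0.3], entries, where={"topic": "AI"}))
```

Entry "2" is excluded even though its embedding matches the query exactly, because its metadata does not satisfy the filter.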

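ChromaDB uses ANN (Approximate Nearest Neighbor) search: instead of comparing the query against every stored embedding, it consults an index that finds very close matches quickly, trading a small amount of accuracy for speed.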
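The trade-off behind ANN can be sketched with a toy index. The example below uses random-hyperplane hashing (a form of locality-sensitive hashing) purely because it fits in a few lines; it is not the index ChromaDB uses internally, and every name in it (`make_hasher`, `build_index`, `ann_query`) is hypothetical.

```python
import math
import random

def make_hasher(dim, n_planes, seed=0):
    """Build a bucket function from n_planes random hyperplanes through the origin."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]

    def bucket_of(vector):
        # One bit per hyperplane: which side of the plane the vector falls on.
        return tuple(sum(p * x for p, x in zip(plane, vector)) >= 0 for plane in planes)

    return bucket_of

def build_index(vectors, bucket_of):
    """Group vector ids by bucket so a query only scans its own bucket."""
    index = {}
    for vec_id, vector in vectors.items():
        index.setdefault(bucket_of(vector), []).append(vec_id)
    return index

def ann_query(query, vectors, index, bucket_of, n_results=1):
    """Rank only the candidates in the query's bucket (true neighbors may be missed)."""
    candidates = index.get(bucket_of(query), [])
    return sorted(candidates, key=lambda i: math.dist(query, vectors[i]))[:n_results]

# 1000 random 8-dimensional vectors standing in for real embeddings.
rng = random.Random(42)
vectors = {str(i): [rng.uniform(-1, 1) for _ in range(8)] for i in range(1000)}

bucket_of = make_hasher(dim=8, n_planes=4)
index = build_index(vectors, bucket_of)

# Querying with a stored vector finds that vector while scanning only one bucket.
print(ann_query(vectors["7"], vectors, index, bucket_of))
print(len(index[bucket_of(vectors["7"])]), "of", len(vectors), "vectors scanned")
```

The search is approximate because a true nearest neighbor that happens to fall in a different bucket is never examined. Production ANN indexes use more sophisticated structures to keep such misses rare.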

With PersistentClient, embeddings are stored on disk at the given path, so they can be reused across sessions without being recomputed.