Chroma: Open-source embedding database for AI
Vector database for embedding storage and semantic search.
Chroma is a vector database that stores embeddings and enables retrieval through nearest-neighbor search rather than traditional substring matching. It handles the full pipeline of tokenization, embedding generation, and indexing automatically, though users can also provide custom embeddings. The system supports filtering through metadata and document content, and can run in multiple modes including in-memory for development, persistent local storage, or client-server architecture. Common deployment contexts include retrieval-augmented generation (RAG) systems, semantic search applications, and LLM-based chat interfaces that require contextual document retrieval.
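The difference between nearest-neighbor retrieval and substring matching can be illustrated with a small standard-library sketch. It does not use Chroma; the three-dimensional vectors, the `store` dictionary, and the `nearest` helper are made-up stand-ins for real embeddings and a real index, chosen only to show the ranking idea.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query_vec, store, k=1):
    """Return the k ids whose vectors are most similar to query_vec."""
    ranked = sorted(
        store,
        key=lambda doc_id: cosine_similarity(query_vec, store[doc_id]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy embeddings; real ones have hundreds of dimensions.
store = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.1, 0.9, 0.0],
    "cars": [0.0, 0.1, 0.9],
}

# Pretend this is the embedding of the query "kittens": the word shares
# no substring with "cats", but its vector points in a similar direction.
query = [0.8, 0.3, 0.0]
top = nearest(query, store, k=2)
print(top)
```

A brute-force scan like this is linear in the number of stored vectors; a vector database replaces it with an index so that lookups stay fast as the collection grows.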

Minimal API surface
The core functionality is exposed through four primary functions for collection management and querying, reducing the learning curve for integration into existing applications.
Automatic embedding handling
The system can automatically tokenize, embed, and index documents using a default Sentence Transformers model (all-MiniLM-L6-v2), while also accepting precomputed embeddings from other providers such as OpenAI or Cohere.
Multi-mode deployment
Chroma runs in-memory for prototyping, supports persistent local storage, and offers a client-server mode for scaling, allowing the same API to function across development, testing, and production environments.
import chromadb

# In-memory client, suitable for prototyping
client = chromadb.Client()
collection = client.create_collection(name="my_documents")

# Documents are embedded with the default model when no embeddings are supplied
collection.add(
    documents=["This is a document about cats", "This is about dogs"],
    ids=["doc1", "doc2"],
)

Development pre-release build from main branch; release notes do not specify changes, breaking updates, or requirements.
- Treat as an unstable dev snapshot; pin to a stable release for production workloads.
- No changelog provided; review commit history or wait for official release notes before upgrading.
Adds BM25 embedding function and fixes Qwen embedding function hydration; no breaking changes noted.
- Use the new chroma_bm25 embedding function in Python for sparse retrieval workflows.
- Qwen embedding function now correctly hydrates with custom prompts and tasks.
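For readers unfamiliar with sparse retrieval, the following is a minimal standard-library sketch of Okapi BM25 scoring. It is not Chroma's chroma_bm25 implementation and does not call its API; the whitespace tokenizer, the `bm25_scores` helper, and the `k1` and `b` defaults are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = ["This is a document about cats", "This is about dogs"]
scores = bm25_scores("cats", docs)
print(scores)
```

Unlike dense embeddings, this scoring depends only on exact term overlap, which is why sparse and dense retrieval are often combined.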
Patch release fixing compaction with empty logs and sparse autoembed queries in the search API.
- Update if you hit compaction failures when log segments are empty during rebuild operations.
- Upgrade to resolve broken sparse autoembed queries when using the search API endpoint.
Related Repositories
Discover similar tools and frameworks used by developers
supabase
PostgreSQL backend with auto-generated APIs and real-time subscriptions.
exceljs
Parse, modify, and generate XLSX files in Node.js.
pdfplumber
Python library for extracting PDF text and tables.
flyway
Version-controlled SQL migrations with automated execution tracking.
luigi
Build complex batch pipelines with dependency management.