Getting started
- Python
- JavaScript
Chroma is a database for building AI applications with embeddings. It comes with everything you need to get started built in, and runs on your machine. A hosted version is coming soon!
1. Install
pip install chromadb
npm install --save chromadb # yarn add chromadb
You will need to install the Chroma Python package to use the Chroma CLI and backend server.
pip install chromadb
Alternatively, you can use a Docker container to run the Chroma backend server.
2. Get the Chroma Client
import chromadb
chroma_client = chromadb.Client()
Start the Chroma backend server:
chroma run --path /db_path
Then create a client which connects to it:
const { ChromaClient } = require("chromadb");
const client = new ChromaClient();
3. Create a collection
Collections are where you'll store your embeddings, documents, and any additional metadata. You can create a collection with a name:
collection = chroma_client.create_collection(name="my_collection")
For this example, we want to generate embeddings from text. OpenAI's ada-002 model is popular and quick to sign up for. Grab your API key and come back. Chroma's API is polymorphic (it can run in the browser or server-side), but OpenAI's is not, so run this example server-side.
Please take steps to secure your API key when interacting with frontend systems.
const { OpenAIEmbeddingFunction } = require("chromadb");
const embedder = new OpenAIEmbeddingFunction({
  openai_api_key: "your_api_key",
});
const collection = await client.createCollection({
  name: "my_collection",
  embeddingFunction: embedder,
});
4. Add some text documents to the collection
Chroma will store your text, and handle tokenization, embedding, and indexing automatically.
collection.add(
    documents=["This is a document", "This is another document"],
    metadatas=[{"source": "my_source"}, {"source": "my_source"}],
    ids=["id1", "id2"]
)
await collection.add({
  ids: ["id1", "id2"],
  metadatas: [{ source: "my_source" }, { source: "my_source" }],
  documents: ["This is a document", "This is another document"],
});
If you have already generated embeddings yourself, you can load them directly in:
collection.add(
    embeddings=[[1.2, 2.3, 4.5], [6.7, 8.2, 9.2]],
    documents=["This is a document", "This is another document"],
    metadatas=[{"source": "my_source"}, {"source": "my_source"}],
    ids=["id1", "id2"]
)
await collection.add({
  ids: ["id1", "id2"],
  embeddings: [
    [1.2, 2.3, 4.5],
    [6.7, 8.2, 9.2],
  ],
  metadatas: [{ source: "my_source" }, { source: "my_source" }],
  documents: ["This is a document", "This is another document"],
});
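The lists passed to add are parallel: the i-th id, embedding, document, and metadata entry all describe the same record. The sketch below is plain Python with no Chroma dependency. It shows a hypothetical helper, check_batch, that is not part of the Chroma API and that checks a batch before you hand it to collection.add. It assumes that ids must be unique and that all embeddings share one dimension. Chroma runs its own validation, so treat this as an illustration of the data shape rather than a required step.

```python
def check_batch(ids, embeddings, documents, metadatas):
    """Hypothetical pre-flight check for a batch; not part of the Chroma API."""
    # All four lists must line up record by record.
    lengths = {len(ids), len(embeddings), len(documents), len(metadatas)}
    if len(lengths) != 1:
        raise ValueError("ids, embeddings, documents, and metadatas must have the same length")
    # Assumption: each record needs its own id.
    if len(set(ids)) != len(ids):
        raise ValueError("ids must be unique")
    # Assumption: every embedding in a collection has the same dimension.
    if len({len(vector) for vector in embeddings}) > 1:
        raise ValueError("all embeddings must have the same dimension")
    return len(ids)


# The same batch used in the example above.
count = check_batch(
    ids=["id1", "id2"],
    embeddings=[[1.2, 2.3, 4.5], [6.7, 8.2, 9.2]],
    documents=["This is a document", "This is another document"],
    metadatas=[{"source": "my_source"}, {"source": "my_source"}],
)
print(count)
```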
5. Query the collection
You can query the collection with a list of query texts, and Chroma will return the n most similar results. It's that easy!
results = collection.query(
    query_texts=["This is a query document"],
    n_results=2,
)
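To make "most similar" concrete, here is a minimal sketch of the idea behind a similarity query: rank the stored embeddings by their distance to the query embedding and keep the closest n. It is plain Python and does not call Chroma. The squared Euclidean distance, the brute-force scan, and the query vector are assumptions chosen for illustration; a real vector database typically uses an index rather than scanning every record.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(records, query_embedding, n_results):
    """Return the n_results records closest to query_embedding (brute force)."""
    ranked = sorted(
        records,
        key=lambda record: squared_l2(record["embedding"], query_embedding),
    )
    return ranked[:n_results]


# The embeddings from the earlier add example.
records = [
    {"id": "id1", "embedding": [1.2, 2.3, 4.5]},
    {"id": "id2", "embedding": [6.7, 8.2, 9.2]},
]

# Hypothetical query embedding, made up for this sketch.
top = nearest(records, [1.0, 2.0, 4.0], n_results=1)
print([record["id"] for record in top])  # ['id1']
```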
By default, data stored in Chroma is ephemeral, which makes it easy to prototype scripts. You can also make Chroma persistent, so that you can reuse every collection you create and add more documents to it later. Chroma then loads your data automatically when you start the client and saves it automatically when you close it. Check out the Usage Guide for more info.
Find chromadb on PyPI.
const results = await collection.query({
  nResults: 2,
  queryTexts: ["This is a query document"],
});
Find chromadb on npm.
Next steps
- Chroma is designed to be simple enough to get started with quickly and flexible enough to meet many use cases. You can use your own embedding models, query Chroma with your own embeddings, and filter on metadata. To learn more about Chroma, check out the Usage Guide and API Reference.
- Chroma is integrated in LangChain (python and js), making it easy to build AI applications with Chroma. Check out the integrations page to learn more.
- You can deploy a persistent instance of Chroma to an external server, to make it easier to work on larger projects or with a team.
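As a rough illustration of what filtering on metadata means, the sketch below keeps only the records whose metadata matches a filter. It is plain Python and does not call Chroma. The matches helper and the "other_source" value are hypothetical, and only simple equality on a single key is modelled. In the Python client a filter like this is passed as the where argument to a query; see the Usage Guide for the supported filter syntax.

```python
def matches(metadata, where):
    """Return True when every key in `where` equals the record's metadata value."""
    return all(metadata.get(key) == value for key, value in where.items())


records = [
    {"id": "id1", "metadata": {"source": "my_source"}},
    {"id": "id2", "metadata": {"source": "other_source"}},  # hypothetical value
]

# Keep only records whose "source" is "my_source".
where = {"source": "my_source"}
kept = [record["id"] for record in records if matches(record["metadata"], where)]
print(kept)  # ['id1']
```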
Coming Soon
- A hosted version of Chroma, with an easy-to-use web UI and API
- Multiple datatypes, including images, audio, video, and more