BUG: ChromaDB dimension mismatch when switching between different embedding models #157
Open
Labels
all-hands features (2025 Second Me All-hands Contribution)
Issue Description
Switching between embedding models (e.g., from OpenAI to Ollama) causes dimension mismatch errors in ChromaDB. ChromaDB is initialized with a fixed dimension (1536 for OpenAI) and does not gracefully handle embeddings of a different dimension afterwards.
Current Behavior
- ChromaDB gets locked to the dimension of the first embedding model used (typically 1536 for OpenAI)
- Switching to a model with a different embedding dimension causes dimension mismatch errors
- No clear warning or handling mechanism for dimension mismatches
Proposed Solution
We should:
- Add a check for embedding dimensions before initializing ChromaDB
- Provide clear error messages when a dimension mismatch occurs
- Add documentation about handling different embedding models
- Consider adding an automatic cleanup/reinit mechanism when switching models
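As a rough illustration of the first two points, a dimension check with an explicit error message could look like the sketch below. It is not taken from the project: the names `EmbeddingDimensionMismatchError` and `validate_embedding_dimension` are hypothetical, and how the stored dimension is obtained from ChromaDB is deliberately left out.

```python
from typing import Optional, Sequence


class EmbeddingDimensionMismatchError(ValueError):
    """Hypothetical error type for an embedding whose size differs from the stored one."""


def validate_embedding_dimension(
    stored_dim: Optional[int],
    embedding: Sequence[float],
    model_name: str = "unknown",
) -> int:
    """Return the embedding's dimension, or raise if it conflicts with stored_dim.

    stored_dim is None when the database is still empty, in which case any
    non-empty embedding is accepted.
    """
    new_dim = len(embedding)
    if new_dim == 0:
        raise ValueError(f"Model {model_name!r} returned an empty embedding.")
    if stored_dim is not None and stored_dim != new_dim:
        raise EmbeddingDimensionMismatchError(
            f"Embedding dimension mismatch: the database holds {stored_dim}-dimensional "
            f"vectors, but model {model_name!r} produced {new_dim} dimensions. "
            "Clear the ChromaDB data and retrain, or switch back to the previous model."
        )
    return new_dim
```

Running such a check once before any write would surface the problem with an actionable message instead of a low-level database error.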
Tasks
- Add dimension validation checks
- Implement clear error messages
- Update documentation with model switching guidelines
- Consider adding a utility function to handle DB cleanup
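One possible way to detect a model switch ahead of time is to record the embedding dimension in a small file next to the database and compare it on startup. The sketch below assumes this approach; the file name `embedding_meta.json` and all function names are hypothetical and not part of the project or of ChromaDB.

```python
import json
from pathlib import Path
from typing import Optional

# Hypothetical sidecar file; the project may prefer a different location or format.
META_FILENAME = "embedding_meta.json"


def read_recorded_dimension(data_dir: str) -> Optional[int]:
    """Return the dimension recorded for data_dir, or None if nothing was recorded."""
    meta_path = Path(data_dir) / META_FILENAME
    if not meta_path.exists():
        return None
    with meta_path.open("r", encoding="utf-8") as fh:
        return int(json.load(fh)["dimension"])


def record_dimension(data_dir: str, model_name: str, dimension: int) -> None:
    """Write the model name and embedding dimension used to build the database."""
    root = Path(data_dir)
    root.mkdir(parents=True, exist_ok=True)
    with (root / META_FILENAME).open("w", encoding="utf-8") as fh:
        json.dump({"model": model_name, "dimension": dimension}, fh)


def needs_reinit(data_dir: str, current_dimension: int) -> bool:
    """True when a recorded dimension exists and differs from the current model's."""
    recorded = read_recorded_dimension(data_dir)
    return recorded is not None and recorded != current_dimension
```

A caller could check `needs_reinit(...)` before opening the database and then either warn the user or trigger a cleanup.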
Temporary Workaround
For users encountering this issue:
- Delete the contents of the data folder to clear ChromaDB
- Restart training with the new embedding model
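The first workaround step can be scripted. The following is a minimal sketch that empties a folder while keeping the folder itself; `clear_directory_contents` is a hypothetical helper, and the path of the data folder must be supplied by the caller because its exact location is not stated here. The deletion is irreversible, so back up anything you need first.

```python
import shutil
from pathlib import Path


def clear_directory_contents(data_dir: str) -> int:
    """Delete everything inside data_dir but keep data_dir itself.

    Returns the number of top-level entries removed.
    """
    root = Path(data_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"{data_dir!r} is not an existing directory")
    removed = 0
    for entry in root.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove a real subdirectory recursively
        else:
            entry.unlink()  # remove a regular file or a symlink
        removed += 1
    return removed
```

Stop the application before running this so that no process is holding the database files open.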