BUG: ChromaDB dimension mismatch when switching between different embedding models #157

@kevin-mindverse

Issue Description

When switching between embedding models (e.g., from OpenAI to Ollama), users encounter dimension mismatch errors in ChromaDB. ChromaDB is initialized with a fixed dimension (1536 for OpenAI), so embeddings from a model with a different dimension are rejected instead of being handled gracefully.

Current Behavior

  • ChromaDB is locked to the dimension of the first embedding model used (typically 1536 for OpenAI)
  • Switching to a model with a different embedding dimension causes dimension mismatch errors
  • There is no clear warning or handling mechanism for dimension mismatches

Proposed Solution

We should:

  1. Add a check for embedding dimensions before initializing ChromaDB
  2. Provide clear error messages when dimension mismatch occurs
  3. Add documentation about handling different embedding models
  4. Consider adding an automatic cleanup/reinit mechanism when switching models
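
As a rough illustration of points 1 and 2, a dimension check with a descriptive error could look something like the sketch below. It is plain Python and does not call ChromaDB. The names `EmbeddingDimensionMismatch` and `validate_embedding_dimension` are hypothetical, and the sketch assumes the expected dimension is already known from wherever the project records it (for example, 1536 for the OpenAI case above).

```python
from typing import Sequence


class EmbeddingDimensionMismatch(ValueError):
    """Hypothetical error type: embedding size differs from what the DB expects."""


def validate_embedding_dimension(
    embedding: Sequence[float],
    expected_dim: int,
    model_name: str = "unknown",
) -> None:
    """Raise a descriptive error if the embedding has the wrong dimension.

    Intended to run before any vector is written to or queried against
    the existing collection. Returns None when the dimensions match.
    """
    actual_dim = len(embedding)
    if actual_dim != expected_dim:
        raise EmbeddingDimensionMismatch(
            f"Embedding model '{model_name}' produced vectors of dimension "
            f"{actual_dim}, but the existing ChromaDB data expects "
            f"{expected_dim}. Clear the ChromaDB data and retrain with the "
            f"new model, or switch back to the original embedding model."
        )
```

Calling `validate_embedding_dimension(vector, 1536, "ollama")` with a 768-dimensional vector would raise the error with both numbers in the message, which tells the user what happened and what to do about it.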

Tasks

  • Add dimension validation checks
  • Implement clear error messages
  • Update documentation with model switching guidelines
  • Consider adding a utility function to handle DB cleanup

Temporary Workaround

For users encountering this issue:

  1. Delete the contents of the data folder to clear ChromaDB
  2. Restart training with the new embedding model
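
Step 1 of the workaround could be scripted roughly as follows. This is only a sketch: `clear_data_folder` is a hypothetical helper, not an existing utility in the project, and the path to the data folder must be supplied by the user. It permanently deletes everything inside that folder, so the application should be stopped first.

```python
import shutil
from pathlib import Path


def clear_data_folder(data_dir: str) -> int:
    """Delete every entry inside data_dir but keep the folder itself.

    Returns the number of top-level entries removed.
    """
    root = Path(data_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"Data folder not found: {root}")

    removed = 0
    for entry in root.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            # Real subdirectory: remove it and everything below it.
            shutil.rmtree(entry)
        else:
            # Regular file or symlink: remove only the entry itself.
            entry.unlink()
        removed += 1
    return removed
```

After the folder has been cleared, training can be restarted with the new embedding model as described in step 2.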

Related Issues

#134
