
Fix VLM and Chat Documentation Discrepancies #328

Merged
kovtcharov merged 4 commits into main from kalin/fix-vlm-documentation
Feb 10, 2026

Conversation


@kovtcharov kovtcharov commented Feb 9, 2026

Summary

Comprehensive fixes for VLM and RAG documentation discrepancies identified in detailed documentation review.

Changes (6 files)

1. docs/spec/vlm-client.mdx

  • ✅ Update default VLM model: Qwen2.5-VL-7B → Qwen3-VL-4B-Instruct-GGUF (8 occurrences)
  • ✅ Fix timeout: 300s for extraction (was 60s), clarify 60s for loading
  • ✅ Document MIME type auto-detection (PNG, JPEG, GIF, WebP, BMP support)

2. src/gaia/rag/sdk.py (CODE)

  • ✅ Normalize VLM default: Qwen2.5-VL-7B → Qwen3-VL-4B-Instruct-GGUF
  • Ensures consistency with VLMClient and other GAIA components

3. docs/spec/rag-sdk.mdx

  • ✅ Update VLM model reference: Qwen2.5-VL-7B → Qwen3-VL-4B

4. docs/reference/cli.mdx

  • ✅ Fix VLM model in init profiles table: Qwen2.5-VL-7B → Qwen3-VL-4B

5. docs/guides/chat.mdx

  • ✅ Fix ChatConfig assistant_name default: "assistant" → "gaia"
  • ✅ Fix VLM model in PDF indexing note: Qwen2.5-VL-7B → Qwen3-VL-4B

6. src/gaia/vlm/mixin.py (CODE)

  • ✅ Remove invalid Qwen3-VL-8B-Instruct-GGUF example (model doesn't exist)
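For readers curious what "MIME type auto-detection" in change 1 could look like, here is a minimal, hypothetical sketch using magic-byte sniffing for the five formats the docs now list (PNG, JPEG, GIF, WebP, BMP). The function name and fallback behavior are assumptions for illustration; the actual helper lives in src/gaia/llm/vlm_client.py and may differ.

```python
def detect_image_mime(data: bytes) -> str:
    """Guess an image MIME type from the file's leading magic bytes.

    Hypothetical sketch; not the actual GAIA implementation.
    """
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):          # JPEG SOI marker
        return "image/jpeg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    if data.startswith(b"RIFF") and data[8:12] == b"WEBP":
        return "image/webp"
    if data.startswith(b"BM"):                     # BMP "BM" header
        return "image/bmp"
    return "application/octet-stream"              # assumed fallback
```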

Verification

All changes verified against implementation:

  • src/gaia/llm/vlm_client.py:73 - Qwen3-VL-4B default ✓
  • src/gaia/llm/vlm_client.py:253 - 300s timeout ✓
  • src/gaia/llm/vlm_client.py:27-58 - MIME detection ✓
  • src/gaia/chat/sdk.py:39 - "gaia" assistant_name ✓
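The defaults verified above can be summarized in a small sketch. ChatConfig's field name comes from the PR; VLMSettings is a hypothetical grouping invented here purely to show the verified values together (the real defaults are plain values in vlm_client.py, not a dataclass).

```python
from dataclasses import dataclass


@dataclass
class ChatConfig:
    """Sketch of the documented default; other fields omitted."""
    assistant_name: str = "gaia"          # was documented as "assistant"


@dataclass
class VLMSettings:
    """Hypothetical container for the defaults verified in vlm_client.py."""
    model: str = "Qwen3-VL-4B-Instruct-GGUF"  # was Qwen2.5-VL-7B
    load_timeout_s: int = 60                  # model loading
    extraction_timeout_s: int = 300           # extraction (complex forms)
```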

Impact

  • ✅ Consistent VLM model across all GAIA components (Qwen3-VL-4B)
  • ✅ Accurate timeout expectations (5 min for complex forms)
  • ✅ Documented MIME type auto-detection feature
  • ✅ Correct ChatConfig defaults
  • ✅ Removed invalid code examples

Testing

  • Documentation and minor code changes only
  • No breaking changes
  • All updates align with existing implementation

- Update VLM default model to Qwen3-VL-4B-Instruct-GGUF (was Qwen2.5-VL-7B)
- Fix VLM timeout: 300s for extraction, 60s for loading (was 60s for both)
- Fix ChatConfig assistant_name default to "gaia" (was "assistant")
- Remove invalid Qwen3-VL-8B model example

Aligns documentation with implementation in src/gaia/llm/vlm_client.py
and src/gaia/chat/sdk.py
@github-actions github-actions bot added the documentation Documentation changes label Feb 9, 2026
Update RAGConfig default VLM model from Qwen2.5-VL-7B-Instruct-GGUF
to Qwen3-VL-4B-Instruct-GGUF for consistency with VLMClient and other
GAIA components (EMR, SD agents).

This ensures consistent VLM model defaults across the framework.
@github-actions github-actions bot added rag RAG system changes performance Performance-critical changes labels Feb 9, 2026
- Update all Qwen2.5-VL-7B references to Qwen3-VL-4B across docs
- Document VLM MIME type auto-detection (PNG, JPEG, GIF, WebP, BMP)
- Normalize RAG SDK to use Qwen3-VL-4B for consistency

Files updated:
- src/gaia/rag/sdk.py (code)
- docs/spec/rag-sdk.mdx
- docs/reference/cli.mdx
- docs/guides/chat.mdx
- docs/spec/vlm-client.mdx
@kovtcharov kovtcharov added this to the v0.15.4 milestone Feb 9, 2026
@kovtcharov kovtcharov self-assigned this Feb 9, 2026
@kovtcharov kovtcharov force-pushed the kalin/fix-vlm-documentation branch from 73bffff to 0244b48 on February 9, 2026 at 10:05
- Update Qwen3-30B → Qwen3-Coder-30B (full model name)
- Update Qwen2.5-VL → Qwen3-VL-4B

Installer plan now uses correct, current model names.
@kovtcharov kovtcharov force-pushed the kalin/fix-vlm-documentation branch from 0244b48 to f461d5c on February 9, 2026 at 10:06
@kovtcharov kovtcharov enabled auto-merge February 9, 2026 10:07
@kovtcharov kovtcharov added this pull request to the merge queue Feb 9, 2026
Merged via the queue into main with commit 66116fa Feb 10, 2026
51 checks passed
@kovtcharov kovtcharov deleted the kalin/fix-vlm-documentation branch February 10, 2026 00:11