NEXO Brain benchmark results on LoCoMo — first MCP memory server evaluation #33
Hi! We benchmarked NEXO Brain, an open-source MCP memory server built on an Atkinson-Shiffrin cognitive architecture, on LoCoMo.
Results (v0.5.0)
| System | F1 | Hardware |
|---|---|---|
| NEXO Brain v0.5.0 | 0.588 | CPU only |
| GPT-4 (128K full context) | 0.379 | GPU cloud |
| Gemini Pro 1.0 | 0.313 | GPU cloud |
| LLaMA-3 70B | 0.295 | A100 GPU |
| GPT-3.5 + Contriever RAG | 0.283 | GPU |
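For readers comparing the F1 column, here is a minimal sketch of a token-overlap F1 between a predicted and a reference answer. It assumes the scores above are the token-level F1 commonly used for QA benchmarks; the function name `token_f1` and the whitespace/lowercase normalization are illustrative choices, not taken from the NEXO Brain or LoCoMo code.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between two answers (assumed metric, simplified normalization)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()

    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(token_f1("the cat sat", "the cat"))
```

A benchmark-level score would then be the mean of this value over all questions; real evaluation scripts typically also strip punctuation and articles before tokenizing.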
Setup
- Embedding model: BAAI/bge-base-en-v1.5 (768 dims, CPU)
- Answer generation: Claude Sonnet 4
- Retrieval: Hybrid vector+BM25, HyDE expansion, cross-encoder reranking, multi-query decomposition
- Memory architecture: STM/LTM stores with adaptive Ebbinghaus decay, intelligent chunking, session summaries
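To illustrate what Ebbinghaus-style decay can look like in a memory store, here is a rough sketch using the classic forgetting curve R = exp(-t / S), where S is a per-item stability that grows when the item is accessed. This is an assumption about the general technique, not NEXO Brain's actual implementation: `MemoryItem`, `retention`, `reinforce`, the 24-hour starting stability, and the 1.5 boost factor are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class MemoryItem:
    text: str
    stability_hours: float = 24.0   # hypothetical starting stability S
    last_access_hours: float = 0.0  # time of last access, in hours


def retention(item: MemoryItem, now_hours: float) -> float:
    """Forgetting-curve retention R = exp(-t / S) since the last access."""
    elapsed = max(0.0, now_hours - item.last_access_hours)
    return math.exp(-elapsed / item.stability_hours)


def reinforce(item: MemoryItem, now_hours: float, boost: float = 1.5) -> None:
    """On access, reset the clock and raise stability so the item decays more slowly."""
    item.stability_hours *= boost
    item.last_access_hours = now_hours


if __name__ == "__main__":
    item = MemoryItem("user prefers dark mode")
    print(retention(item, 24.0))   # retention one stability period after creation
    reinforce(item, 24.0)
    print(retention(item, 48.0))   # decays more slowly after reinforcement
```

In a design like this, items whose retention drops below some threshold could be dropped from the short-term store or left out of retrieval, while frequently accessed items would persist longer.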
Key findings
- Outperforms GPT-4 (128K full context) by 55% in relative F1 (0.588 vs. 0.379)
- 93.3% adversarial rejection rate (446 questions)
- 74.9% recall across 1,986 questions
- Runs entirely on CPU with 768-dim embeddings
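The 55% figure is a relative gain, not a difference in F1 points. A short sketch of the arithmetic, using the F1 values from the results table (the helper name `relative_gain` is illustrative):

```python
# F1 scores copied from the results table above.
NEXO_F1 = 0.588
BASELINES = {
    "GPT-4 (128K full context)": 0.379,
    "Gemini Pro 1.0": 0.313,
    "LLaMA-3 70B": 0.295,
    "GPT-3.5 + Contriever RAG": 0.283,
}


def relative_gain(system: float, baseline: float) -> float:
    """Relative improvement of `system` over `baseline`, as a fraction."""
    return (system - baseline) / baseline


if __name__ == "__main__":
    for name, f1 in BASELINES.items():
        gain = relative_gain(NEXO_F1, f1)
        print(f"{name}: {gain:.0%} relative, {NEXO_F1 - f1:.3f} absolute")
```

For GPT-4 this works out to (0.588 - 0.379) / 0.379, or about 55% relative, which corresponds to an absolute gap of 0.209 F1.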
We believe this is the highest published score on LoCoMo. It would be great if you would consider adding external benchmark results to your repo or a leaderboard.
Thanks for building such a useful benchmark!