NEXO Brain benchmark results on LoCoMo — first MCP memory server evaluation #33

@wazionapps

Hi! We benchmarked NEXO Brain, an open-source MCP memory server built on the Atkinson-Shiffrin memory model, on the LoCoMo benchmark.

Results (v0.5.0)

| System | F1 | Hardware |
|---|---|---|
| NEXO Brain v0.5.0 | 0.588 | CPU only |
| GPT-4 (128K full context) | 0.379 | GPU cloud |
| Gemini Pro 1.0 | 0.313 | GPU cloud |
| LLaMA-3 70B | 0.295 | A100 GPU |
| GPT-3.5 + Contriever RAG | 0.283 | GPU |
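
For readers comparing numbers: the sketch below assumes the F1 column is the token-overlap F1 commonly used to score free-form QA answers. That is an assumption on our side of the write-up, not a description of the exact scoring script, and the normalization rules (lowercasing, stripping punctuation and articles) are illustrative and may differ from the benchmark's own.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction, reference):
    """Harmonic mean of token precision and recall between two answers."""
    pred = normalize(prediction)
    ref = normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

A per-question score like this would then be averaged over the question set to give a single figure per system.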

Setup

  • Embedding model: BAAI/bge-base-en-v1.5 (768 dims, CPU)
  • Answer generation: Claude Sonnet 4
  • Retrieval: Hybrid vector+BM25, HyDE expansion, cross-encoder reranking, multi-query decomposition
  • Memory architecture: STM/LTM stores with adaptive Ebbinghaus decay, intelligent chunking, session summaries
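
To make two of the setup items more concrete, here is a minimal sketch of (1) merging a vector ranking with a BM25 ranking and (2) an Ebbinghaus-style retention score. Both are illustrations under stated assumptions: reciprocal rank fusion is used as a stand-in fusion method, the exponential form `exp(-t / S)` is the usual approximation of the forgetting curve, and all function and parameter names are hypothetical. Neither is a copy of the NEXO Brain implementation.

```python
import math


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of ids; k=60 is the conventional RRF constant."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def retention(elapsed_hours, stability_hours):
    """Ebbinghaus-style retention: 1.0 when fresh, decaying towards 0."""
    return math.exp(-elapsed_hours / stability_hours)


# Hypothetical memory ids returned by each retriever.
vector_hits = ["m1", "m2", "m3"]
bm25_hits = ["m2", "m4", "m1"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

An "adaptive" variant could, for example, raise `stability_hours` each time a memory is accessed so that frequently used memories decay more slowly; that update rule is a guess at what "adaptive" means here.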

Key findings

  • Outperforms GPT-4 (128K full context) by 55% in relative terms on F1 (0.588 vs. 0.379)
  • 93.3% adversarial rejection rate (446 questions)
  • 74.9% recall across 1,986 questions
  • Runs entirely on CPU with 768-dim embeddings
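
The 55% figure is a relative improvement, not a difference in F1 points. A short sketch of the arithmetic, using the F1 values from the results table (the helper name `relative_gain` is ours):

```python
def relative_gain(new, baseline):
    """Fractional improvement of `new` over `baseline`."""
    return (new - baseline) / baseline


# F1 scores copied from the results table above.
f1_scores = {
    "GPT-4 (128K full context)": 0.379,
    "Gemini Pro 1.0": 0.313,
    "LLaMA-3 70B": 0.295,
    "GPT-3.5 + Contriever RAG": 0.283,
}
nexo_f1 = 0.588

for system, f1 in f1_scores.items():
    print(f"{system}: {relative_gain(nexo_f1, f1):.1%} relative gain")
```

For GPT-4 this gives (0.588 - 0.379) / 0.379, which rounds to 55%; the absolute gap is 0.209 F1.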

Full results: https://github.com/wazionapps/nexo/tree/main/benchmarks/locomo

We believe this is the highest published score on LoCoMo. Would be great if you'd consider adding external benchmark results to your repo or leaderboard.

Thanks for building such a useful benchmark!
