NEXO Brain benchmark results on LoCoMo — first MCP memory server evaluation

Hi! We benchmarked [NEXO Brain](https://github.com/wazionapps/nexo), an open-source MCP memory server built on an Atkinson-Shiffrin-style cognitive architecture, on LoCoMo.

## Results (v0.5.0)

| System | F1 | Hardware |
|---|---|---|
| **NEXO Brain v0.5.0** | **0.588** | **CPU only** |
| GPT-4 (128K full context) | 0.379 | GPU cloud |
| Gemini Pro 1.0 | 0.313 | GPU cloud |
| LLaMA-3 70B | 0.295 | A100 GPU |
| GPT-3.5 + Contriever RAG | 0.283 | GPU |
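
For readers comparing numbers, here is a minimal sketch of a token-overlap F1 of the kind commonly used for QA benchmarks. This post does not spell out the exact scoring script, so the normalization steps (lowercasing, stripping punctuation and English articles) and the function names `normalize` and `token_f1` are assumptions for illustration, not the code behind the table.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace.

    Assumed normalization (SQuAD-style); the benchmark's own script may differ.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between two answers."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, `token_f1("The cat sat", "cat sat down")` has precision 1.0 and recall 2/3, giving an F1 of 0.8.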

## Setup

- **Embedding model:** BAAI/bge-base-en-v1.5 (768 dims, CPU)
- **Answer generation:** Claude Sonnet 4
- **Retrieval:** Hybrid vector+BM25, HyDE expansion, cross-encoder reranking, multi-query decomposition
- **Memory architecture:** STM/LTM stores with adaptive Ebbinghaus decay, intelligent chunking, session summaries
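
To illustrate what "adaptive Ebbinghaus decay" could look like, here is a minimal sketch based on the classic forgetting curve R = exp(-t / S). The `MemoryItem` class, the 24-hour starting stability, and the 1.5x reinforcement factor are hypothetical values chosen for the example; they are not taken from the NEXO Brain source.

```python
import math
from dataclasses import dataclass


@dataclass
class MemoryItem:
    text: str
    stability_hours: float = 24.0   # hypothetical S: larger means slower forgetting
    last_access_hours: float = 0.0  # time of the last recall, in hours


def retention(item: MemoryItem, now_hours: float) -> float:
    """Ebbinghaus forgetting curve: R = exp(-elapsed / stability)."""
    elapsed = max(0.0, now_hours - item.last_access_hours)
    return math.exp(-elapsed / item.stability_hours)


def reinforce(item: MemoryItem, now_hours: float, boost: float = 1.5) -> None:
    """The 'adaptive' part (assumed): each recall raises stability and resets the clock."""
    item.stability_hours *= boost
    item.last_access_hours = now_hours


item = MemoryItem("User prefers window seats")
before = retention(item, now_hours=24.0)   # one stability period elapsed
reinforce(item, now_hours=24.0)
after = retention(item, now_hours=48.0)    # 24 h later, but stability is now 36 h
print(f"before recall: {before:.3f}, 24h after recall: {after:.3f}")
```

In a design like this, a store could demote or drop items whose retention falls below a threshold, while frequently recalled items decay more slowly and become candidates for the long-term store.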

## Key findings

- Outperforms GPT-4 (128K full context) by 55% in relative F1 (0.588 vs. 0.379)
- 93.3% adversarial rejection rate (446 questions)
- 74.9% recall across 1,986 questions
- Runs entirely on CPU with 768-dim embeddings
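
The 55% figure is a relative gain, not a difference in F1 points. A short sketch of the arithmetic, using only the scores from the results table (the helper name `relative_gain` is ours):

```python
NEXO_F1 = 0.588

# Baseline F1 scores copied from the results table above.
BASELINES = {
    "GPT-4 (128K full context)": 0.379,
    "Gemini Pro 1.0": 0.313,
    "LLaMA-3 70B": 0.295,
    "GPT-3.5 + Contriever RAG": 0.283,
}


def relative_gain(new: float, baseline: float) -> float:
    """Fractional improvement of `new` over `baseline`."""
    return (new - baseline) / baseline


for name, f1 in BASELINES.items():
    gain = relative_gain(NEXO_F1, f1)
    print(f"{name}: +{NEXO_F1 - f1:.3f} F1 absolute, {gain:.1%} relative")
```

For GPT-4 this works out to (0.588 - 0.379) / 0.379 ≈ 0.551, which rounds to the 55% quoted above; the absolute gap is 0.209 F1.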

Full results: https://github.com/wazionapps/nexo/tree/main/benchmarks/locomo

We believe this is the highest published score on LoCoMo. It would be great if you would consider adding external benchmark results to your repo or to a leaderboard.

Thanks for building such a useful benchmark!
