I've been doing reviews of agentic memory systems and figured I'd flag this: no other system in my survey has shown this degree of mismatch between the README's claims and what's actually in the code.
| README claim | What the code actually does | Severity |
|---|---|---|
| "Contradiction detection" — automatically flags inconsistencies against the knowledge graph | `knowledge_graph.py` has no contradiction detection. The only dedup is blocking identical open triples (same subject/predicate/object where `valid_to IS NULL`). Conflicting facts (e.g., two different `married_to` values) accumulate silently. | Feature does not exist |
| "30x compression, zero information loss" — AAAK described as "lossless shorthand" | AAAK is lossy abbreviation: regex entity codes + keyword frequency + 55-char sentence truncation. `decode()` is string splitting — no original text reconstruction. Token counting uses a `len(text)//3` heuristic. LongMemEval drops from 96.6% to 84.2% in AAAK mode — a 12.4pp quality loss. | Claim is false |
| 96.6% LongMemEval R@5 (headline, positioned as MemPalace's score) | Real score, but measured in "raw mode" — uncompressed verbatim text stored in ChromaDB, standard nearest-neighbor retrieval. The palace structure (wings/rooms/halls) is not involved. This measures ChromaDB's default embedding model performance, not MemPalace. | Misleading attribution |
| "+34% retrieval boost from palace structure" | Narrowing search scope from all drawers → wing → wing+room. This is metadata filtering — a standard technique in any vector DB, not a novel retrieval mechanism. | Misleading framing |
| "100% with Haiku rerank" | Not in the benchmark scripts. Method undocumented and unverifiable from the repo. | Unverifiable |
| "Closets" as compressed summaries | AAAK produces abbreviations, not summaries. No evidence of a separate closet storage tier distinct from drawers. | Nomenclature mismatch |
| Hall types structurally enforced | Halls exist as metadata strings but are not used in retrieval ranking or enforced as constraints. | Conceptual, not functional |
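To make the first row concrete, here's a minimal sketch (all names — `add_triple`, the `triples` table — are hypothetical, not taken from `knowledge_graph.py`) of the dedup-only behavior described above: an *identical* open triple is rejected, but a *conflicting* one sails through.

```python
# Hypothetical sketch of dedup-only insertion: the store blocks an exact
# duplicate among open rows (valid_to IS NULL) but has no notion of a
# contradiction, so conflicting facts accumulate silently.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE triples (
    subject TEXT, predicate TEXT, object TEXT, valid_to TEXT)""")

def add_triple(subject, predicate, obj):
    # Only guard: reject an exact duplicate among open triples.
    dup = conn.execute(
        "SELECT 1 FROM triples WHERE subject=? AND predicate=? AND object=? "
        "AND valid_to IS NULL", (subject, predicate, obj)).fetchone()
    if dup:
        return False
    conn.execute("INSERT INTO triples VALUES (?,?,?,NULL)",
                 (subject, predicate, obj))
    return True

add_triple("alice", "married_to", "bob")
add_triple("alice", "married_to", "bob")    # rejected: identical open triple
add_triple("alice", "married_to", "carol")  # accepted: contradiction accumulates

open_facts = conn.execute(
    "SELECT object FROM triples WHERE subject='alice' "
    "AND predicate='married_to' AND valid_to IS NULL").fetchall()
print([o for (o,) in open_facts])  # → ['bob', 'carol']
```

Actual contradiction detection would need a check on (subject, predicate) pairs — e.g. closing the old triple's `valid_to` or flagging the conflict — which is exactly what's absent here.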
There's a lot to like conceptually, but the combination of these claims and the benchmark setup is concerning: the LongMemEval run uses raw ChromaDB (so it measures ChromaDB's embeddings, not the palace structure at all), both AAAK and room-boosting *decrease* the score, and ConvoMem is extremely truncated.
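To illustrate why keyword-frequency abbreviation of the kind described above cannot be "lossless", here's a minimal sketch (the `encode`/`decode`/`estimate_tokens` names and the sentence are hypothetical, not MemPalace code) showing the round trip losing word order and content, plus the `len(text)//3` token heuristic:

```python
# Hypothetical illustration of lossy "shorthand" compression: keep the most
# frequent words, truncate to 55 chars, and estimate tokens as len//3.
from collections import Counter

MAX_LEN = 55  # sentence truncation length cited in the analysis

def encode(text):
    # Keyword-frequency "compression": keeps words, drops order and context.
    words = [w.lower().strip(".,") for w in text.split()]
    top = [w for w, _ in Counter(words).most_common(6)]
    return " ".join(top)[:MAX_LEN]

def decode(shorthand):
    # String splitting only: the original sentence cannot be reconstructed.
    return shorthand.split()

def estimate_tokens(text):
    return len(text) // 3  # crude heuristic, not a real tokenizer

original = "Alice told Bob she had moved from Paris to Lisbon in March."
short = encode(original)
assert decode(short) != original.split()  # round trip loses information
```

Anything built this way is a summary heuristic at best; calling it "zero information loss" is a category error.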
Full analysis for review: https://github.com/lhl/agentic-memory/blob/main/ANALYSIS-mempalace.md
UPDATE: @milla-jovovich has acknowledged our findings and has been actively pushing fixes. 🥳
For those interested in the remediations, links to the relevant comments in this issue: