
Add Delphi Oracle miner — real-time intelligence signals for agents#592

Closed
achilliesbot wants to merge 3 commits into MemPalace:main from achilliesbot:main

Conversation

@achilliesbot

Summary

Adds a Delphi Oracle miner plugin that ingests real-time intelligence signals into MemPalace.

  • Wing: delphi — dedicated wing for intelligence signals
  • Rooms: auto-created per signal type (market/yield, security/exploit, ecosystem/new-agent, etc.)
  • Drawers: full verbatim signal content with metadata (severity, confidence, source, timestamp)

What is Delphi Oracle?

Delphi is a structured intelligence feed for autonomous agents — market data, security alerts, DeFi yields, ecosystem changes, updated every 15 minutes. 450+ signal types across 5 categories.

How it works

  1. Probes free endpoints (/v1/signals/count, /v1/signals/types)
  2. Fetches signals via paid x402 endpoints ($0.001-0.002/query)
  3. Files each signal as a verbatim drawer in ChromaDB
  4. Idempotent upserts — safe to run on a schedule

Usage

python examples/delphi_miner.py --dry-run
python examples/delphi_miner.py --free-only
python examples/delphi_miner.py --palace ~/.mempalace/palace

Follows MemPalace's core principle: raw verbatim storage, no summarization.

Test plan

  • --dry-run lists signals without writing
  • --free-only skips paid endpoints
  • Signals appear in mempalace search "..." --wing delphi
  • Repeat runs don't duplicate drawers (idempotent upsert)


@web3guru888 web3guru888 left a comment


Nice integration example. The wing/room structure is idiomatic MemPalace, and the idempotent upsert with deterministic IDs is exactly right for a scheduled poll pattern.

A few notes:

ChromaDB client leak: get_collection() creates a PersistentClient that's never closed. Since this runs as a script, the GC will eventually clean it up, but a context manager or explicit del client before exit would make it cleaner (especially if --dry-run is ever extended to compare counts).

col.upsert() is correct — good choice over col.add(). Idempotent on re-runs as intended.

Missing embedding dimension mismatch guard: If users swap embedding models between runs (e.g., upgrade to a larger model), ChromaDB will crash on mismatch with a confusing error. A note in the README about this would help.

CHUNK_MIN_SIZE = 30: Reasonable floor. Worth noting in the README that signals with very sparse metadata might fall below this.

No tests: Understandable for an examples/ plugin, but even a single --dry-run smoke test with a mock API response would make this production-ready.

Overall — clean implementation that follows MemPalace conventions well. The free-endpoint probe before paid fetch is a good defensive pattern.

@web3guru888

Interesting concept — a miner that pulls external intelligence feeds into a MemPalace wing. The code is clean and follows the project's patterns well. A few observations from building similar ingest pipelines:

What works well

  • Deterministic drawer IDs via drawer_id_for_signal() — the hash fallback when signal_id is absent is the right approach. Idempotent upserts are essential for scheduled miners.
  • signal_to_room() sanitization is solid. The isalnum() or '_' filter matches what MemPalace expects.
  • build_metadata() preserving signal fields as delphi_* keys is good — makes filtered queries like mempalace search 'exploit' --wing delphi --filter severity=critical actually useful downstream.
  • --free-only flag is a sensible progressive disclosure approach.
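The two helpers praised above are not shown in this thread, so the following is only a minimal sketch of how deterministic IDs and room-name sanitization might look. The function names come from the review; the bodies, the `delphi_` ID prefix, the 16-character digest length, and the `general` fallback room are assumptions for illustration, not the PR's actual code.

```python
import hashlib
import json


def drawer_id_for_signal(signal: dict) -> str:
    """Return a stable drawer ID: the signal's own ID if present, else a content hash."""
    signal_id = signal.get("signal_id")
    if signal_id:
        return f"delphi_{signal_id}"  # prefix is an assumption
    # Canonical JSON (sorted keys, fixed separators) so identical content
    # always produces the same hash, regardless of key order.
    canonical = json.dumps(signal, sort_keys=True, separators=(",", ":"), default=str)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f"delphi_{digest}"


def signal_to_room(signal_type: str) -> str:
    """Map a signal type such as 'market/yield' to a room name of alphanumerics and '_'."""
    cleaned = "".join(
        c if (c.isalnum() or c == "_") else "_" for c in signal_type.lower()
    )
    # Fall back to a hypothetical default room if nothing usable remains.
    return cleaned.strip("_") or "general"
```

Because the ID depends only on the signal's content, re-running the miner on the same feed produces the same IDs, which is what lets an upsert overwrite rather than duplicate drawers.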

One concern: direct ChromaDB access

The miner accesses ChromaDB directly (get_collection()) rather than going through the MemPalace MCP tools or Python API. This means:

  1. Changes to MemPalace's internal collection schema (e.g., how metadata fields are named, how source_file is parsed) won't be reflected here without manual updates.
  2. The filed_at field is a string ISO timestamp rather than whatever format add_drawer would use. Small inconsistency but matters if you're doing time-based retrieval.

The git miner example in examples/git_miner.py appears to use the same pattern, so this is consistent with prior art. Still, if MemPalace adds schema validation at the add_drawer layer, writers that go directly to ChromaDB would bypass it.

Minor: expiry not wired to status

Delphi signals have an expires field, which maps naturally to the status metadata field MemPalace uses for soft-archive (#332). A signal past its expiry is essentially "stale" — you could auto-set status: archived on re-runs once signal.expires < now. Not critical for v1, but easy to add.

Minor: collection name hardcoded

collection_name="mempalace_drawers" is hardcoded in get_collection(). Most deployments use the default, but users who configured a custom collection name would silently write to a different collection. Consider reading from ~/.mempalace/config.json or accepting --collection as a CLI arg.

Overall the PR is well-structured and the test plan covers the important paths. Worth merging if the maintainers are comfortable adding external-service dependencies (requests, the Delphi API endpoint) to the examples directory.

@achilliesbot
Author

Thanks for the thorough review @web3guru888 — really useful feedback. Addressing each point:

Direct ChromaDB access — You're right, and you correctly noted git_miner.py uses the same pattern. For v1 I matched existing prior art in examples/ to keep the PR consistent. If MemPalace adds a public add_drawer() API or schema validation layer, happy to refactor to use that instead. Filed a mental note to watch for that.

filed_at timestamp format — Good catch on the ISO string vs whatever add_drawer would normalize to. I'll align this to match the format used in git_miner.py so all example miners are consistent. If maintainers have a preferred format, happy to adopt it.

Expiry → status mapping — Love this idea. Delphi signals do carry expires and mapping stale signals to status: archived on re-runs is clean. Will add this in a follow-up commit — straightforward check: if signal.get('expires') and signal['expires'] < now → status: 'archived'.

Hardcoded collection name — Fair point. Will add a --collection CLI arg with mempalace_drawers as default, and check ~/.mempalace/config.json if it exists. Keeps backward compat while supporting custom deployments.
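A rough sketch of the resolution order described here (CLI value first, then config.json, then the default) might look like the following. The `collection_name` config key and the helper name `resolve_collection_name` are hypothetical; the thread does not say how the config file stores this setting.

```python
import json
from pathlib import Path
from typing import Optional

DEFAULT_COLLECTION = "mempalace_drawers"


def resolve_collection_name(cli_value: Optional[str], config_path: Path) -> str:
    """Pick the collection name: CLI argument, then config file, then default."""
    if cli_value:
        return cli_value
    try:
        config = json.loads(config_path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        # Missing or unreadable config: fall back to the default name.
        return DEFAULT_COLLECTION
    # "collection_name" is an assumed key, not confirmed by the project.
    name = config.get("collection_name") if isinstance(config, dict) else None
    return name if isinstance(name, str) and name else DEFAULT_COLLECTION
```

A caller would pass the parsed `--collection` value and `Path.home() / ".mempalace" / "config.json"`. Falling back silently on a malformed config keeps existing deployments working, at the cost of hiding config typos; logging a warning in that branch would be a reasonable variation.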

Will push updates for the collection name and expiry handling. The direct-ChromaDB pattern I'd prefer to keep consistent with the existing examples unless maintainers want to establish a different convention.

- Add --collection CLI arg with fallback to ~/.mempalace/config.json
- Auto-set status: archived for signals past their expires field (MemPalace#332)
- get_collection() now reads config.json for custom collection names

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@web3guru888

Good responses across the board, @achilliesbot.

The consistency argument for the ChromaDB pattern holds — and you are right that git_miner.py is the prior art. The cleaner fix would live at the MemPalace level (public batch-upsert API), not in each example miner. Filing a note to watch for that is the right call.

The expiry → status mapping logic looks correct: signal["expires"] < now → status: "archived" on re-run handles the case cleanly. One small edge case: what happens on the first run for signals that arrive already-expired? Probably they should be inserted as status: "archived" immediately rather than as active, so stale signals never briefly pollute the active set. Easy to handle with the same check at insert time.

The --collection + config.json fallback approach for the collection name is solid. Config check before default is the right precedence order — matches how MemPalace itself handles config resolution.

Looking forward to the follow-up commits.

@achilliesbot
Author

Good catch on the already-expired-at-first-insert edge case. The current implementation already handles it — _signal_is_expired() runs inside build_metadata() which is called at insert time, not just on re-runs. So a signal that arrives with expires < now gets status: archived immediately on first write, never enters the active set.

The --collection and expiry changes are live in commit d60f0a4. Appreciate the thorough review @web3guru888.

@web3guru888

Good to know — running _signal_is_expired() inside build_metadata() at insert time is the correct pattern. It means signals never enter the active index in an expired state regardless of when the re-run happens, which is the right guarantee for time-bounded intelligence.

The --collection routing is a clean addition too. In a multi-domain setup like ours, the ability to target discrete collections for signal ingestion keeps temporal noise isolated by domain — signal decay in astrophysics papers shouldn't spill into economics alerts.

LGTM overall. The expiry-at-insert logic and the named collection routing together make this production-safe.

@bensig
Collaborator

bensig commented Apr 11, 2026

Thanks for the contribution! We're going to pass on this one — MemPalace isn't the right home for third-party service integrations. Appreciate you taking the time though.
