feat: restructure as framework, add feedback loop + ranker + GitHub importer #1
…orter
- Reorganize into subpackages: embeddings/, search/, ingest/, benchmark/, mcp/, cli/commands/
- Add `context8 import-github`, `context8_rate` MCP tool, per-strategy attribution, and confidence/recency/worked-ratio quality ranker
- Wire previously-unused solution named vector into search
- Add `bench` (Recall@K ablation) and `demo` CLI commands
- Fix StorageService not calling connect() on the Actian sync wrapper
- Harden `context8 doctor` to verify hybrid/named/sparse are actually live
Sorry @pathfindermilan, your pull request is larger than the review limit of 150000 diff characters
Caution: Review failed. The pull request is closed.
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (49)
📝 Walkthrough
Context8 undergoes a comprehensive reorganization from flat structure to modular packages: CLI commands split into separate files under …
Changes
Sequence Diagram(s)
sequenceDiagram
participant User
participant CLI as CLI (import-github)
participant GitHub as GitHub API
participant Importer as GitHubIssueImporter
participant Embeddings as EmbeddingService
participant Pipeline as IngestPipeline
participant Storage as StorageService
User->>CLI: context8 import-github owner/repo --label bug
CLI->>CLI: check_db_connection()
CLI->>Importer: fetch(repo, labels, max_issues, state)
Importer->>GitHub: GET /repos/owner/repo/issues?labels=bug
GitHub-->>Importer: issues + comments
Importer-->>CLI: FetchResult
CLI->>Importer: to_records(FetchResult, require_resolution)
Importer-->>CLI: list[ResolutionRecord]
CLI->>Pipeline: ingest(records, skip_existing=True)
Pipeline->>Storage: get_record(id) [check duplicates]
Storage-->>Pipeline: existing or None
Pipeline->>Embeddings: embed_record(problem, solution, code)
Embeddings-->>Pipeline: vectors dict
Pipeline->>Storage: store_record(record, vectors)
Storage-->>Pipeline: record_id
Pipeline-->>CLI: IngestStats(attempted=N, stored=M, duplicates=X)
CLI->>User: Display summary table
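The import flow in the diagram above depends on the duplicate check that runs before embedding. A minimal sketch of that loop, assuming a plain dict stands in for StorageService and any callable stands in for EmbeddingService. The `IngestStats` field names come from the diagram; the function signature and everything else are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class IngestStats:
    attempted: int = 0
    stored: int = 0
    duplicates: int = 0


def ingest(records, store, embed, skip_existing=True):
    """Store records keyed by id, skipping ids that are already present.

    `records` is an iterable of (record_id, text) pairs, `store` is a dict
    standing in for the storage backend, and `embed` is any callable that
    maps text to vectors. All three are illustrative stand-ins.
    """
    stats = IngestStats()
    for record_id, text in records:
        stats.attempted += 1
        if skip_existing and record_id in store:
            stats.duplicates += 1
            continue  # duplicate: skip the embedding call entirely
        store[record_id] = (text, embed(text))
        stats.stored += 1
    return stats
```

Checking for the id before embedding means a re-run of the same import only pays for lookups, not for embedding work.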
sequenceDiagram
participant User
participant SearchEngine as SearchEngine
participant Embeddings as EmbeddingService
participant Attribution as AttributionTracker
participant Ranking as QualityRanker
participant CLI as CLI (search)
User->>CLI: context8 search "error message"
CLI->>SearchEngine: search(query, language=..., limit=5)
SearchEngine->>Embeddings: embed_query("error message")
Embeddings-->>SearchEngine: dense vector
SearchEngine->>SearchEngine: _search_named("problem", vector, filter, 5)
SearchEngine-->>SearchEngine: problem_results[]
SearchEngine->>SearchEngine: _search_sparse(sparse_vector, filter, 5)
SearchEngine-->>SearchEngine: sparse_results[]
SearchEngine->>SearchEngine: RRF fusion(problem_results, sparse_results)
SearchEngine-->>SearchEngine: fused_results[]
SearchEngine->>Attribution: record(strategy="dense", results)
SearchEngine->>Attribution: record(strategy="sparse", results)
SearchEngine->>Ranking: boost(fused_results)
Ranking->>Ranking: Apply confidence/recency/feedback factors
Ranking-->>SearchEngine: boosted_results[]
SearchEngine->>Attribution: build_for(record_id) for each result
SearchEngine-->>CLI: list[SearchResult with attribution]
CLI->>User: Display results with source tracking
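The "RRF fusion" step in the diagram above can be sketched as plain reciprocal rank fusion. This is an illustration of the general technique, not the PR's code: the function name, the `weights` parameter, and the constant `k=60` (a commonly used default) are assumptions.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, weights=None):
    """Reciprocal rank fusion over several ranked lists of record ids.

    score(id) = sum over lists of weight / (k + rank), with rank starting
    at 1. k=60 is a commonly used constant; the value Context8 uses is
    not stated in this thread, so treat it as an assumption.
    """
    weights = weights or [1.0] * len(rankings)
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, record_id in enumerate(ranking, start=1):
            scores[record_id] += weight / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

For example, `rrf_fuse([problem_ids, sparse_ids])` would merge the dense and sparse result lists by rank alone, so the two strategies do not need comparable raw scores.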
sequenceDiagram
participant User
participant CLI as CLI (rate tool)
participant MCP as MCP Handler
participant FeedbackService as FeedbackService
participant Storage as StorageService
participant Embeddings as EmbeddingService
User->>MCP: Call context8_rate(record_id, worked=true)
MCP->>FeedbackService: rate(record_id, worked=true)
FeedbackService->>Storage: get_record(record_id)
Storage-->>FeedbackService: ResolutionRecord
FeedbackService->>FeedbackService: Increment applied_count
FeedbackService->>FeedbackService: Increment worked_count (if worked=true)
FeedbackService->>FeedbackService: Update last_seen timestamp
FeedbackService->>Embeddings: embed_record(updated record)
Embeddings-->>FeedbackService: vectors dict
FeedbackService->>Storage: update_record(record, vectors)
Storage-->>FeedbackService: success
FeedbackService-->>MCP: FeedbackOutcome(accepted=true, worked_ratio=0.5)
MCP-->>User: "Feedback recorded: 1/2 worked"
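The counter updates in the diagram above reduce to a small amount of arithmetic. A minimal sketch, assuming a standalone `FeedbackStats` class (hypothetical; the PR keeps these counters on the record model) with the `applied_count` and `worked_count` names taken from the diagram:

```python
from dataclasses import dataclass


@dataclass
class FeedbackStats:
    applied_count: int = 0
    worked_count: int = 0

    def rate(self, worked: bool) -> float:
        """Record one application of a solution and return the worked ratio."""
        self.applied_count += 1
        if worked:
            self.worked_count += 1
        return self.worked_ratio

    @property
    def worked_ratio(self) -> float:
        # Avoid division by zero for records that were never applied.
        if self.applied_count == 0:
            return 0.0
        return self.worked_count / self.applied_count
```

A record that was applied once without success and then rated `worked=True` ends at 1 of 2, which matches the `worked_ratio=0.5` shown in the diagram.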
Estimated Code Review Effort
🎯 4 (Complex) | ⏱️ ~50 minutes
docs: add ROADMAP — DO NOT MERGE; pluggable ingestion sources next Flags branch as not-ready-to-merge and lists the highest-leverage |
Sorry @hallelx2, your pull request is larger than the review limit of 150000 diff characters
Pull request overview
This PR restructures Context8 into a capability-oriented framework package layout and adds production features around ingestion, ranking, attribution, benchmarking, and MCP/CLI operations.
Changes:
- Replaces the previous flat module layout (search.py, cli.py, etc.) with subpackages (search/, ingest/, benchmark/, mcp/, cli/commands/, embeddings/).
- Adds GitHub Issues ingestion, an agent feedback loop, per-strategy attribution, and a quality re-ranker.
- Adds a benchmark harness + expanded unit/e2e tests and updates docs/artifacts (RESULTS.md, README.md).
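To make the quality re-ranker concrete, here is one way a confidence/recency/feedback boost could be combined with a fused score. Every formula and constant below is an illustrative assumption (the thread does not show the actual ranker), including the function name, the 180-day half-life, and the smoothing:

```python
import math


def boost(raw_score, confidence, age_days, worked, applied,
          half_life_days=180.0):
    """Multiply a fused score by confidence, recency and feedback factors.

    All constants and formulas here are illustrative, not taken from the PR.
    """
    # Clamp confidence to [0, 1] and map it onto a factor in [0.5, 1.0].
    confidence_factor = 0.5 + 0.5 * max(0.0, min(confidence, 1.0))
    # Exponential decay: the recency bonus halves every `half_life_days`.
    recency_factor = 0.5 + 0.5 * math.exp(-math.log(2) * age_days / half_life_days)
    # Laplace-smoothed worked ratio so a single rating does not dominate.
    feedback_factor = 0.5 + (worked + 1) / (applied + 2)
    return raw_score * confidence_factor * recency_factor * feedback_factor
```

Keeping each factor bounded away from zero means a low-confidence or old record is demoted rather than removed from the results.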
Reviewed changes
Copilot reviewed 49 out of 49 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_ranking.py | Adds unit tests for quality ranker factors and boosting behavior |
| tests/test_models_extended.py | Adds unit tests for feedback + attribution model behavior |
| tests/test_github_importer.py | Adds unit tests for GitHub importer parsing/detection helpers |
| tests/test_embeddings.py | Updates tests to target BM25Tokenizer + QueryAnalyzer in new layout |
| tests/test_e2e.py | Adds live-DB end-to-end coverage for hybrid/filter/feedback/boosting |
| tests/test_benchmark.py | Adds unit tests for benchmark math + ground-truth integrity |
| tests/test_attribution.py | Adds unit tests for attribution tracking logic |
| tests/test_agents.py | Updates MCP entrypoint assertion after module move |
| src/context8/storage.py | Connect-on-demand client + collection introspection + update_record |
| src/context8/search/ranking.py | Implements confidence/recency/feedback-based score boosting |
| src/context8/search/engine.py | New hybrid search engine with attribution + quality boost hooks |
| src/context8/search/attribution.py | Tracks per-strategy rank/score contributions for results |
| src/context8/search/analyzer.py | QueryAnalyzer extracted to its own module |
| src/context8/search/__init__.py | Exposes new search package surface |
| src/context8/search.py | Removes legacy monolithic search module |
| src/context8/models.py | Adds feedback stats + attribution + raw_score/boost_factors |
| src/context8/mcp/tools.py | Adds MCP tools for rating + solution-approach search + formatting |
| src/context8/mcp/server.py | New MCP server entrypoint wrapping tools module |
| src/context8/mcp/__init__.py | Exposes MCP app/run_server API |
| src/context8/ingest/seed.py | Adds deterministic seed slugs + routes seeding via ingest pipeline |
| src/context8/ingest/pipeline.py | Adds generic ingest pipeline + ingest stats |
| src/context8/ingest/github.py | Adds GitHub issue importer and extraction heuristics |
| src/context8/ingest/__init__.py | Exposes ingest package API (seed/pipeline/importer) |
| src/context8/feedback.py | Adds FeedbackService to persist agent success/failure ratings |
| src/context8/embeddings/tokenizer.py | Extracts BM25 tokenizer used for sparse vectors |
| src/context8/embeddings/service.py | Refactors embeddings to use BM25Tokenizer and new package layout |
| src/context8/embeddings/__init__.py | Exposes embeddings package surface |
| src/context8/config.py | Adds ranker tuning constants + updates MCP server command path |
| src/context8/cli/ui.py | Adds shared CLI helpers (docker compose selection, DB checks) |
| src/context8/cli/main.py | New Click CLI group wiring commands from cli/commands/ |
| src/context8/cli/commands/serve.py | Adds context8 serve command to run MCP server |
| src/context8/cli/commands/ops.py | Adds/updates stats/doctor/search CLI operations |
| src/context8/cli/commands/lifecycle.py | Adds start/stop/init commands for DB lifecycle + seeding |
| src/context8/cli/commands/integrations.py | Adds agent integration commands (add/remove) with aliases |
| src/context8/cli/commands/ingest.py | Adds import-github CLI command to ingest GitHub issues |
| src/context8/cli/commands/bench.py | Adds bench and demo CLI commands |
| src/context8/cli/commands/__init__.py | Exports CLI commands for main registration |
| src/context8/cli/__init__.py | Exposes CLI main entrypoint |
| src/context8/cli.py | Removes legacy monolithic CLI module |
| src/context8/benchmark/runner.py | Adds benchmark runner with ablation configurations |
| src/context8/benchmark/ground_truth.py | Adds ground-truth query set for benchmark evaluation |
| src/context8/benchmark/__init__.py | Exposes benchmark package surface |
| src/context8/agents.py | Updates agent config writer to new MCP module command |
| src/context8/__main__.py | Updates module entry to run new CLI main |
| src/context8/__init__.py | Bumps version to 0.2.0 |
| pyproject.toml | Updates ruff per-file ignores to new file paths |
| docker-compose.yml | Switches to fully-qualified docker.io image reference |
| RESULTS.md | Adds submission results template + reproduction steps |
| README.md | Updates docs for new capabilities, commands, and layout |
  named_vectors = self._discover_named_vectors(info)
  sparse_vectors = self._discover_sparse_vectors(info)
  return {
      "status": str(getattr(info, "status", "unknown")),
      "points": getattr(info, "points_count", 0),
-     "vectors": ["problem", "solution", "code_context"],
+     "vectors": named_vectors or ["problem", "solution", "code_context"],
      "named_vector_count": len(named_vectors),
      "sparse_vectors": sparse_vectors,
      "sparse_supported": bool(sparse_vectors),
      "hybrid_enabled": len(named_vectors) >= 2 and bool(sparse_vectors),
get_collection_info() returns a fallback list of vector names when discovery fails, but named_vector_count is still len(named_vectors) (0). Downstream checks like context8 doctor treat named_vector_count < 3 as a hard failure even though vectors reports the fallback 3. Consider making named_vector_count consistent with the vectors you return (or add a separate flag like named_vectors_discovered).
  def __init__(
      self,
      storage: StorageService,
      embeddings: EmbeddingService,
      ranker: QualityRanker | None = None,
      dense_weight: float = DEFAULT_DENSE_WEIGHT,
      code_weight: float = DEFAULT_CODE_WEIGHT,
      sparse_weight: float = DEFAULT_SPARSE_WEIGHT,
  ):
      self.storage = storage
      self.embeddings = embeddings
      self.ranker = ranker or QualityRanker()
      self.dense_weight = dense_weight
      self.code_weight = code_weight
      self.sparse_weight = sparse_weight
SearchEngine accepts dense_weight/code_weight/sparse_weight and stores them, but they are not used when building fusion_weights (QueryAnalyzer weights are used directly). This makes the constructor parameters misleading and prevents config-level tuning. Either incorporate these weights into fusion_weights (e.g., multiply or use as defaults when QueryAnalyzer doesn't apply) or remove the parameters/fields.
  table.add_row("Status", "[green]HEALTHY[/]")

  if collection_info:
      table.add_row("Vector spaces", ", ".join(collection_info.get("vectors", [])))
      table.add_row("Status", collection_info.get("status", "unknown"))
The stats command adds a hard-coded "Status: HEALTHY" row and then (when collection_info is present) adds another "Status" row from the collection metadata. This produces duplicate/conflicting metrics in the output; consider renaming one (e.g., "DB health" vs "Collection status") or removing the hard-coded row.
- table.add_row("Status", "[green]HEALTHY[/]")
- if collection_info:
-     table.add_row("Vector spaces", ", ".join(collection_info.get("vectors", [])))
-     table.add_row("Status", collection_info.get("status", "unknown"))
+ table.add_row("DB health", "[green]HEALTHY[/]")
+ if collection_info:
+     table.add_row("Vector spaces", ", ".join(collection_info.get("vectors", [])))
+     table.add_row("Collection status", collection_info.get("status", "unknown"))
  for token, freq in sorted(term_freqs.items()):
      idx = abs(hash(token)) % self.vocab_size
      weight = freq / (freq + 1.0)
BM25Tokenizer.encode() derives sparse indices via Python's built-in hash(), which is salted per process (PYTHONHASHSEED). That makes stored sparse vectors and query-time sparse vectors inconsistent across restarts/processes, effectively breaking sparse retrieval and hybrid fusion.
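One way to address the comment above is to derive indices from a digest that does not depend on the process. The sketch below uses `hashlib.blake2b` from the standard library; the function names, the default `vocab_size`, and the collision policy are assumptions for illustration, not the PR's eventual fix.

```python
import hashlib


def stable_index(token: str, vocab_size: int) -> int:
    """Map a token to a sparse index that is identical across processes."""
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % vocab_size


def encode(term_freqs: dict, vocab_size: int = 30000) -> dict:
    """Build {index: weight} using the same freq / (freq + 1) weighting."""
    sparse = {}
    for token, freq in sorted(term_freqs.items()):
        idx = stable_index(token, vocab_size)
        # On a hash collision keep the larger weight (one possible policy).
        sparse[idx] = max(sparse.get(idx, 0.0), freq / (freq + 1.0))
    return sparse
```

Because the digest depends only on the token bytes, vectors written by one process stay comparable with query vectors built by another, regardless of `PYTHONHASHSEED`. Sparse vectors already stored with the salted hash would need to be re-encoded.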
  @property
  def sparse_supported(self) -> bool:
      """Check if the collection supports sparse vectors."""
      if self._sparse_supported is None:
          try:
              self.client.collections.get_info(COLLECTION_NAME)
-             self._sparse_supported = False  # Safe default
+             self._sparse_supported = False
          except Exception:
              self._sparse_supported = False
      return self._sparse_supported
StorageService.sparse_supported always resolves to False when _sparse_supported is None (even if the existing collection supports sparse vectors). This will disable sparse search paths on fresh processes that connect to a pre-existing hybrid collection (e.g., when initialize() returns False because the collection already exists). Consider introspecting collection info and setting _sparse_supported based on discovered sparse vector config instead of hard-coding False.
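The introspection suggested above could look roughly like the following. This is a sketch only: `SparseSupportProbe` and `fetch_info` are hypothetical stand-ins, and the dict shape with a `"sparse_vectors"` key is assumed from the `get_collection_info()` snippet earlier in this review rather than from the Actian client API.

```python
class SparseSupportProbe:
    """Caches whether a collection exposes sparse vectors.

    `fetch_info` is a hypothetical callable returning a dict with a
    "sparse_vectors" list; it is not the Actian client API.
    """

    def __init__(self, fetch_info):
        self._fetch_info = fetch_info
        self._sparse_supported = None

    @property
    def sparse_supported(self) -> bool:
        if self._sparse_supported is None:
            try:
                info = self._fetch_info()
                # Supported only if discovery found at least one sparse vector.
                self._sparse_supported = bool(info.get("sparse_vectors"))
            except Exception:
                # Unreachable or unreadable collection: degrade to dense-only.
                self._sparse_supported = False
        return self._sparse_supported
```

Note that this caches `False` after a failed lookup, so a transient connection error would disable sparse search for the life of the object; resetting the cache on reconnect is one option.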
README rewritten around the new SQLite-first install (single command: pip install context8 && context8 init --seed && context8 add claude-code). The Actian path is preserved as an "Optional: Actian VectorAI DB backend" section with the hackathon-era install. The "Hackathon: Advanced Features Used" section becomes "Capabilities (and how each backend delivers them)": the same three capabilities (named vectors, hybrid fusion, filtered search) framed as backend-portable. Architecture diagram redrawn: pluggable Protocol fanning out to two concrete backends (SQLite vec0+FTS5 below, Actian gRPC container right). Tech-stack table promotes sqlite-vec and FTS5 to primary, demotes Actian to optional. Project-structure tree updated to reflect the new storage/ package and search/fusion.py. v0.5.0 changelog entry added.

CLAUDE.md fully rewritten: new project overview, structure, key design decisions (#1 is now "pluggable storage backend"), commands, plus distinct SQLite Backend Notes / Actian Backend Notes sections.

RESULTS.md kept as the hackathon submission narrative but reframed: the Actian-feature table is rewritten as a backend-portability table (SQLite delivers each capability via vec0/FTS5/SQL+JSON1; Actian delivers them via named vectors / sparse vectors / FilterBuilder). Benchmark section now has placeholders for both backends so the ablation can be run side-by-side.

All twelve docs/*.md design docs (CONCEPT, ARCHITECTURE, BOTTLENECKS, PLAN-01..08, Hackathon Demo Video — Script) prepended with a historical-artifact banner pointing readers to README.md / CLAUDE.md. The hackathon design narrative is preserved in place; it just stops being the canonical source for the current architecture.

tests/test_e2e.py:
- pytestmark adds pytest.mark.actian + a new skipif on CONTEXT8_BACKEND != "actian", so the Actian e2e suite skips cleanly under the default SQLite install (15 tests skip).
- Fixed pre-existing FeedbackService(storage, embeddings) arity mismatch on lines 281 and 300: the production constructor takes storage only (mcp/tools.py:44 confirms). Drop the second arg.
- isolated_collection fixture now patches actian_backend.COLLECTION_NAME (the captured-at-import-time copy) rather than the package module attribute; the module's attribute is no longer load-time-bound to COLLECTION_NAME after the storage package split.

Verification: 127 passed, 15 actian e2e skipped, ruff clean, context8 init/doctor/stats/search/bench/export/import all green under SQLite.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Summary
Restructures Context8 from a flat module layout into a focused subpackage framework, and adds the four production capabilities that turn it from a hackathon demo into a credible submission.
What's new
- embeddings/, search/, ingest/, benchmark/, mcp/, cli/commands/ (one responsibility per module)
- context8 import-github vercel/next.js --label bug --max-issues 50
- context8_rate MCP tool; worked_ratio feeds the ranker
- bench (Recall@K ablation across 5 configs), demo (4 scripted scenarios)
Bug fixes
- StorageService.client now calls connect(), which fixes the 503 from the Actian sync wrapper
- context8 doctor asserts hybrid / named / sparse / filter are actually live (no silent degradation)
- docker-compose.yml uses fully-qualified docker.io/... image (Podman compatibility)
Test plan
- test_ranking, test_attribution, test_github_importer, test_models_extended, test_benchmark
- tests/test_e2e.py covers hybrid retrieval, filter isolation, feedback persistence, quality boost (live DB, auto-skips when unreachable)
- ruff check src/ tests/ clean
- context8 --help shows all 12 commands; context8 doctor reports green on a live DB
- context8 bench against live DB and paste numbers into RESULTS.md
Summary by CodeRabbit
Release Notes
New Features
- import-github command
- bench and demo commands
- (context8_rate) with worked/applied counters
Documentation
Chores
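For reference, the Recall@K metric behind the `bench` ablation mentioned in the summary can be computed as below. This is a generic sketch of the metric, with hypothetical function names and input shapes; it does not reproduce the PR's benchmark runner or its ground-truth format.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant ids that appear in the top-k retrieved ids."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & relevant_set) / len(relevant_set)


def mean_recall_at_k(runs, k):
    """Average Recall@K over (retrieved, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(recall_at_k(ret, rel, k) for ret, rel in runs) / len(runs)
```

Running `mean_recall_at_k` once per search configuration over the same query set gives the side-by-side numbers an ablation table needs.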