Factory managers and shift leads need answers from production data — yield, waste, compliance, traceability. Today, they either write SQL themselves (error-prone), wait for IT (slow), or export to Excel (outdated by the time it opens). OpsMind lets anyone type a question in plain English and get an answer in seconds, directly from the production database.
- Ask in English, get answers in seconds — no SQL knowledge required
- Runs entirely on your machine — no data leaves the factory network
- SQL injection protection — validates every query before execution
- Covers 7 business areas — production, waste, orders, compliance, staff, suppliers, traceability
- Smart alerts — flags yield drops, temperature breaches, and overtime automatically
- Domain-aware — loads compliance, production, and waste rules at runtime for context-aware answers
Manufacturing teams query data through Excel exports and IT requests. OpsMind lets any operator ask the database in English — offline, on-prem, no API keys.
Includes production ERP integration with 19 tables covering batch-centric runs, waterfall yield tracking (Premium/Standard/Export tiers), batch lineage for OCM scan-back traceability, and shelf life management.
$ ollama pull gemma3:12b
$ streamlit run app.py
┌─────────────────────────────────────────────────┐
│ OpsMind — 7 tabs loaded │
│ │
│ > "What was the yield for cod fillets last week?"│
│ │
│ Detecting domain... production (2 tables) │
│ Generating SQL... │
│ SELECT ProductCode, AVG(YieldPercent) │
│ FROM ProductionRuns │
│ WHERE ProductCode = 'COD-F' │
│ AND ProductionDate >= date('now', '-7 days') │
│ GROUP BY ProductCode; │
│ │
│ ┌──────────┬──────────────┐ │
│ │ Product │ Avg Yield % │ │
│ ├──────────┼──────────────┤ │
│ │ COD-F │ 94.2% │ │
│ └──────────┴──────────────┘ │
│ │
│ "Cod fillet yield averaged 94.2% last week, │
│ which is 1.8% above your 30-day average." │
└─────────────────────────────────────────────────┘
User asks: "What was yesterday's waste?"
│
▼
┌─────────────┐ ┌──────────────────┐
│ Query Library│────▶│ 10 pre-built SQL │──── Match? ───▶ Execute instantly
│ (fast path) │ │ patterns │
└─────────────┘ └──────────────────┘
│ No match
▼
┌─────────────┐ ┌──────────────────┐ ┌──────────────┐
│ Schema │────▶│ Pick 4 tables │────▶│ Ollama LLM │
│ Registry │ │ from 19 │ │ (Gemma3 12B) │
│ (7 domains) │ │ (domain match) │ │ │
└─────────────┘ └──────────────────┘ └──────────────┘
│
▼
┌──────────────┐
│ SQLAlchemy │
│ execute │
└──────────────┘
│
┌──────┴──────┐
▼ ▼
Result Table Plotly Chart
│
▼
LLM explains in
plain English
Step 1 — Domain detection. User asks about "orders" → schema registry maps it to 2 tables out of 19. Only those go to the LLM.
Step 2 — SQL generation. Ollama converts the question to SQL. Pre-built library short-circuits the 10 most common questions.
Step 3 — Execution. SQLAlchemy runs the query (read-only — INSERT/UPDATE/DELETE blocked). Result rendered as table + Plotly chart.
Step 4 — Explanation. LLM summarises the result in English with context ("above average", "trending down").
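A minimal sketch of how Step 1's keyword scoring could work. The keyword sets and table lists below are illustrative, not OpsMind's actual registry (which is configured in schema.yaml):

```python
# Illustrative domain detection via keyword scoring; the real registry is
# loaded from schema.yaml and covers 7 domains across 19 tables.
DOMAIN_KEYWORDS = {
    "orders": ["order", "shipped", "late", "customer", "despatch"],
    "production": ["yield", "run", "output", "line", "batch"],
    "waste": ["waste", "scrap", "giveaway"],
}

DOMAIN_TABLES = {
    "orders": ["Orders", "OrderLines"],            # the 2 tables sent to the LLM
    "production": ["ProductionRuns", "Products"],
    "waste": ["WasteLog", "ProductionRuns"],
}

def detect_domain(question: str) -> tuple[str, list[str]]:
    """Score each domain by keyword hits; return the winner and its tables."""
    q = question.lower()
    scores = {d: sum(kw in q for kw in kws) for d, kws in DOMAIN_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best, DOMAIN_TABLES[best]

# detect_domain("How many orders shipped late this month?")
# -> ("orders", ["Orders", "OrderLines"])
```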
┌─────────────────────────────────────────────────────────────┐
│ TAB 1: SQL Chat │
│ "How many orders shipped late this month?" │
│ → 12 orders, 3.4% of total. Worst day: Tuesday 18th. │
│ │
│ TAB 2: Document Search (RAG) │
│ Upload: allergen-procedure-v3.pdf │
│ "What's the allergen cleaning protocol?" │
│ → "Section 4.2: All surfaces must be cleaned with..." │
│ → Source: allergen-procedure-v3.pdf, page 8 │
│ │
│ TAB 3: Production Dashboard │
│ Output: 2,841 kg │ Waste: 187 kg │ Yield: 93.8% │
│ Orders: 38 open │ Shipped: 412 │ Late: 12 │
│ │
│ TAB 4: Compliance & Traceability │
│ Batch COD-2024-0847: │
│ Raw material → Supplier ABC, intake 06:12 │
│ Production → Line 2, yield 95.1% │
│ Despatch → Customer XYZ, temp 2.1°C ✓ │
│ │
│ TAB 5: Smart Alerts │
│ ⚠ Yield drop: Haddock -4.2% vs 30-day avg │
│ ⚠ Cold Room 2: 5.3°C (threshold: 5.0°C) │
│ ⚠ 3 batches expiring within 48 hours │
│ │
│ TAB 6: Excel Upload │
│ Uploaded: march-production.xlsx (340 rows) │
│ "What product had the most waste?" │
│ → Salmon fillets: 42kg waste (8.1% of output) │
│ │
│ TAB 7: Schema Registry │
│ 7 domains │ up to 147 tables │ 4 selected for current query │
└─────────────────────────────────────────────────────────────┘
# Step 1: Get Ollama running
curl -fsSL https://ollama.com/install.sh | sh
ollama pull gemma3:12b # default model, better SQL accuracy
# Step 2: Clone and install
git clone https://github.com/Pawansingh3889/OpsMind.git
cd OpsMind
pip install -r requirements.txt
# Step 3: Seed demo data (60 days of synthetic manufacturing data)
python scripts/seed_demo_db.py
# → Products: 10 | Runs: 662 | Orders: 451 | Temp logs: 3,600 | Materials: 282
# Step 4: Index documents for RAG search
python scripts/ingest_documents.py
# → Indexing PDFs into ChromaDB vectors...
# Step 5: Run
streamlit run app.py
# → OpsMind running at http://localhost:8501

Or as a one-liner: make setup && make run
Isolated deployment pattern inspired by PyCon DE 2026: "Building Secure Environments for CLI Code Agents" (Nezbeda).
# One-command deployment: app + Ollama in isolated containers
docker compose up -d
# OpsMind: http://localhost:8501
# Ollama:  http://localhost:11434

What this gives you:
- OpsMind runs as non-root user in a minimal Python 3.11 container
- Ollama runs in a separate container (isolated bridge network)
- Model weights persist in a named volume
- Logs persist outside containers at ./logs/
- Health checks on both services with auto-restart
- No secrets in images — all configuration via environment variables
Every agent interaction is logged to logs/audit.jsonl as structured JSON.
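For reference, a single entry could look like the line below. The event, passed, and timestamp fields are implied by the jq queries that follow; the remaining fields are illustrative:

```json
{"timestamp": "2024-03-18T09:41:02Z", "event": "sql_validated", "passed": true, "question": "What was yesterday's waste?", "sql": "SELECT SUM(WasteKg) FROM ProductionRuns WHERE ProductionDate = date('now', '-1 day')"}
```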
# Last 10 SQL executions
jq -c 'select(.event == "sql_executed")' logs/audit.jsonl | tail -10
# All validation failures
jq -c 'select(.event == "sql_validated" and .passed == false)' logs/audit.jsonl
# Questions per day
jq -r 'select(.event == "question_asked") | .timestamp[:10]' logs/audit.jsonl | sort | uniq -c

Events logged: question_asked, sql_generated, sql_validated, sql_executed, llm_call.
Required for BRC traceability — every query, who asked, what SQL ran, what came back.
Three suites, three commands:
make test # cross-module smoke (tests/test_core.py) + per-module (tests/unit/)
make eval-library # library fast-path eval — no Ollama needed
make eval # full eval (library + LLM paths) — needs Ollama + gemma3:12b

Coverage at a glance:
| Suite | File | What it covers |
|---|---|---|
| Smoke | tests/test_core.py | Config, SQL dialect, schema registry, database, compliance, alerts, waste, SQL safety, doc search — one test per concern. |
| Per-module | tests/unit/test_sql_validator.py | Every stage of the 5-stage SQL validation pipeline (statement type, injection, table existence, column resolution, row-limit injection). |
| Per-module | tests/unit/test_query_library.py | One canonical question per library pattern + explicit regex-collision guards. |
| Per-module | tests/unit/test_schema_registry.py | Domain detection for all 7 domains, edge cases, registry contracts. |
| Eval | tests/eval/golden_set.yaml | 20 factory questions — 14 library-path, 6 LLM-path. Judge compares result sets against the demo database. |
The failure-mode taxonomy for the eval harness lives in tests/eval/failure_modes.md — it's a living document that grows as the LLM path hits real failures.
# Environment variable — connection string
OPSMIND_DB=mssql+pyodbc://user:pass@server/database?driver=ODBC+Driver+17+for+SQL+Server
# Windows Auth
OPSMIND_DB=mssql+pyodbc://server/database?driver=ODBC+Driver+17+for+SQL+Server&trusted_connection=yes

Then edit schema.yaml to map your tables to OpsMind's 7 business domains:
traceability:
tables:
ProductionBatch: BatchID, BatchNo, ProductCode, ProductionDate
RawMaterialIntake: IntakeID, ProductCode, BatchNo, SupplierCode
production:
tables:
ProductionRuns: RunID, ProductCode, FinishedOutputKg, WasteKg
# Also: orders, temperature, staff, stock, compliance

Production ERP integration: includes 19 production ERP tables (runs, traceability, temperature, non-conformance, shifts, despatch, shelf life, yield tiers) with 8 pre-built SQL queries and 3 production-specific alerts.
Production data follows a batch-centric run structure: one batch feeds one run producing multiple products across RSPCA (Tier 1), GG (Tier 2), and catch-all (Tier 3) tiers. The schema registry maps these tables to 7 business domains for efficient NL-to-SQL query generation.
8 pre-built production queries for instant results (no LLM round-trip):
| # | Query | Description |
|---|---|---|
| 1 | Daily yield by production line | Yield % per line for a given date, compared to 30-day average |
| 2 | Batch traceability lookup (batch to vessel) | Full lineage from finished batch back to raw material vessel/intake |
| 3 | Temperature breach report | All cold-store and in-process readings outside threshold in a date range |
| 4 | Allergen changeover check | Validates cleaning records between allergen-class changeovers on a line |
| 5 | Shift productivity (day vs night) | Output kg, waste kg, and yield % split by shift for comparison |
| 6 | Giveaway analysis by product | Overweight giveaway per product vs target, ranked by cost impact |
| 7 | Open critical non-conformances | All open NCs with severity = Critical, grouped by category and age |
| 8 | MSC/ASC certification status | Current certification status per species/supplier with expiry dates |
3 production-specific alerts monitored continuously:
- Yield drops — flags when line yield falls below the 30-day rolling average by a configurable threshold
- Temperature breaches — triggers when any cold-store or in-process sensor exceeds its defined limit
- Open critical NCs — alerts when critical non-conformances remain unresolved past the SLA window
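Thresholds for these alerts are configured in YAML (see the Config row in the stack table below). A sketch of what such a config could look like; the key names here are assumptions, not OpsMind's actual schema:

```yaml
alerts:
  yield_drop:
    baseline_days: 30      # rolling-average window to compare against
    threshold_pct: 3.0     # flag lines this far below their baseline
  temperature_breach:
    cold_store_max_c: 5.0  # any reading above this triggers an alert
  open_critical_nc:
    sla_hours: 48          # alert when a critical NC stays open past SLA
```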
OpsMind supports two vector backends for RAG document search:
Local, embedded vector store. No external services needed -- works out of the box.
# No extra config required. ChromaDB is the default.
OPSMIND_VECTOR_DB=chromadb # optional, this is the default

Shared, production-grade vector store backed by PostgreSQL with the pgvector extension. Recommended when multiple OpsMind instances need to share the same document index, or when you want vector search co-located with your existing PostgreSQL database.
# 1. Install pgvector in your PostgreSQL instance
# https://github.com/pgvector/pgvector
# 2. Point OpsMind at the database
OPSMIND_VECTOR_DB=pgvector
OPSMIND_VECTOR_PG_URL=postgresql+psycopg2://user:pass@host:5432/opsmind

OpsMind will automatically create the documents table and the pgvector extension on first use. If PostgreSQL is unreachable, the system falls back to ChromaDB so the app keeps working.
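A minimal sketch of what that fallback logic could look like; the function name and connectivity probe are assumptions, not OpsMind's actual code:

```python
import os
import chromadb
from sqlalchemy import create_engine, text

def make_vector_store():
    """Prefer pgvector when configured and reachable, else embedded ChromaDB."""
    if os.getenv("OPSMIND_VECTOR_DB") == "pgvector":
        try:
            engine = create_engine(os.environ["OPSMIND_VECTOR_PG_URL"])
            with engine.connect() as conn:
                # First use: create the extension (requires privileges)
                conn.execute(text("CREATE EXTENSION IF NOT EXISTS vector"))
                conn.commit()
            return engine  # the pgvector-backed store builds on this engine
        except Exception:
            pass  # PostgreSQL unreachable: fall back so the app keeps working
    client = chromadb.PersistentClient(path="./chroma")
    return client.get_or_create_collection("documents")
```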
| Layer | Tool | What it does |
|---|---|---|
| LLM | Ollama (Gemma 3 12B) | English to SQL, result explanation |
| Agent | LangGraph (6-node state graph) | Structured NL-to-SQL pipeline with conditional routing |
| MCP Servers | FastMCP (database + doc search) | Decoupled tool servers via Model Context Protocol |
| SQL Validation | sqlparse + custom validator | Injection detection, schema checks, row limits |
| Database | SQLAlchemy | SQLite (demo) + SQL Server (production) |
| Vector Search | ChromaDB or PostgreSQL+pgvector | PDF and SOP search (RAG) |
| Domain Docs | Runtime-loaded markdown | Compliance, production, waste rules injected into LLM context |
| UI | Streamlit (7 tabs) | Dashboard, chat, charts |
| Charts | Plotly | Production and waste visualisation |
| Config | YAML + env vars | Schema registry, alert thresholds, MCP settings |
| Tests | pytest | Unit + integration tests |
| Lint | ruff + ty (Astral) | Linting and type checking in CI |
User Question (plain English)
|
v
[LangGraph Agent] --- 6-node state graph
|
+---> [Query Library] --- 8 pre-built queries (fast path)
|
+---> [Schema Registry] --- 7 domains, 19 tables
|
+---> [Ollama / Gemma 3 12B] --- NL-to-SQL generation
|
v
[SQL Validation] --- read-only enforcement
|
v
[SQLAlchemy] --- execute query
|
v
[Results + Explanation]
RAG: ChromaDB or PostgreSQL pgvector
Monitoring: Sentry (opt-in)
MCP Servers (opt-in):
[Database Server :9000] --- query, discover tables/columns, domain schema
[Doc Search Server :9001] --- semantic search, domain context
[MCP Client] --- connects to servers or falls back to direct module calls
OpsMind can optionally expose database and document search as MCP servers, decoupling data access from agent logic. This follows patterns from the PyCon DE 2026 talk on building agentic systems with LangGraph and MCP.
| Server | Port | Tools |
|---|---|---|
| Database | 9000 | query_database, discover_tables, discover_columns, get_schema_for_domain |
| Doc Search | 9001 | search_documents, get_document_count, get_domain_context |
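As a sketch, a FastMCP tool server along these lines could expose the database tools; the tool bodies and the transport/port wiring below are assumptions, not OpsMind's actual servers:

```python
from fastmcp import FastMCP

mcp = FastMCP("opsmind-database")

@mcp.tool()
def discover_tables() -> list[str]:
    """List the tables the agent is allowed to query."""
    return ["ProductionRuns", "Orders"]  # illustrative

@mcp.tool()
def query_database(sql: str) -> list[dict]:
    """Run a validated, read-only query and return rows as dicts."""
    raise NotImplementedError  # the real server delegates to the SQL pipeline

if __name__ == "__main__":
    # Assumed transport settings, chosen to match the port table above
    mcp.run(transport="sse", host="127.0.0.1", port=9000)
```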
MCP is opt-in. Set MCP_ENABLED=true to use the servers. When disabled (default), OpsMind calls modules directly — no behaviour change.
# Start MCP servers
python -m mcp_servers.database_server
python -m mcp_servers.doc_search_server
# Run OpsMind with MCP
MCP_ENABLED=true streamlit run app.py

See specs/mcp-servers.md for full documentation.
OpsMind uses a LangGraph state graph to structure the NL-to-SQL pipeline as a multi-step agent with 6 nodes:
question -> [detect_domain] -> [check_library] --match--> [validate_sql] -> [execute_sql] -> [explain_results]
                                      |                        |
                                  no match                invalid SQL
                                      |                        |
                               [generate_sql] --------->  END (error)
| Node | Purpose |
|---|---|
| detect_domain | Maps the question to one of 7 business domains via keyword scoring |
| check_library | Checks 18 pre-built regex patterns for a fast-path match (no LLM needed) |
| generate_sql | LLM generates SQL from the question and domain-scoped schema |
| validate_sql | Safety gate -- only SELECT/WITH allowed, blocks dangerous keywords |
| execute_sql | Runs the validated query via SQLAlchemy (read-only) |
| explain_results | LLM explains results in plain English with business context |
Pre-built library fast path -- the 18 most common production questions bypass LLM generation entirely, returning tested SQL in milliseconds.
SQL safety validation -- every query passes through a 5-stage validation pipeline: statement type check, injection pattern detection (tautologies, UNION, comments, stacked queries), table existence, column existence, and automatic row limit enforcement. See specs/security.md for the full threat model.
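A condensed sketch of the first two stages (statement type check and injection pattern detection) using sqlparse; the pattern list here is illustrative and much smaller than the real pipeline's:

```python
import re
import sqlparse

INJECTION_PATTERNS = [
    r"(?i)\bor\s+1\s*=\s*1\b",  # tautology
    r"(?i)\bunion\b",           # UNION-based exfiltration
    r"--|/\*",                  # SQL comments
    r";\s*\S",                  # stacked queries after a terminator
]

def validate_sql(sql: str) -> tuple[bool, str]:
    """Stages 1-2 only: statement type, then injection patterns."""
    statements = sqlparse.parse(sql)
    if len(statements) != 1:
        return False, "exactly one statement allowed"
    is_select = statements[0].get_type() == "SELECT"
    is_cte = sql.lstrip().upper().startswith("WITH")
    if not (is_select or is_cte):
        return False, "only SELECT/WITH statements allowed"
    for pattern in INJECTION_PATTERNS:
        if re.search(pattern, sql):
            return False, f"blocked pattern: {pattern}"
    return True, "ok"

# validate_sql("SELECT 1; DROP TABLE Orders")  -> (False, "exactly one statement allowed")
# validate_sql("DELETE FROM Orders")           -> (False, "only SELECT/WITH statements allowed")
```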
Structured state flow -- the full state {question, domain, sql, results, explanation, error} flows through each node, making the pipeline observable and debuggable.
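A condensed sketch of how the graph above could be wired with LangGraph; the node bodies are stubs, and the real wiring lives in modules/agent_graph.py:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict, total=False):
    question: str
    domain: str
    sql: str
    results: list
    explanation: str
    error: str

# Stub nodes: each real node returns a partial state update (a dict)
def detect_domain(state: AgentState) -> dict: return {"domain": "production"}
def check_library(state: AgentState) -> dict: return {}
def generate_sql(state: AgentState) -> dict: return {"sql": "SELECT ..."}
def validate_sql(state: AgentState) -> dict: return {}
def execute_sql(state: AgentState) -> dict: return {"results": []}
def explain_results(state: AgentState) -> dict: return {"explanation": "..."}

g = StateGraph(AgentState)
for name, fn in [("detect_domain", detect_domain), ("check_library", check_library),
                 ("generate_sql", generate_sql), ("validate_sql", validate_sql),
                 ("execute_sql", execute_sql), ("explain_results", explain_results)]:
    g.add_node(name, fn)

g.set_entry_point("detect_domain")
g.add_edge("detect_domain", "check_library")
g.add_conditional_edges(  # a library hit skips LLM generation entirely
    "check_library", lambda s: "validate_sql" if s.get("sql") else "generate_sql")
g.add_edge("generate_sql", "validate_sql")
g.add_conditional_edges(  # invalid SQL ends the run with an error
    "validate_sql", lambda s: END if s.get("error") else "execute_sql")
g.add_edge("execute_sql", "explain_results")
g.add_edge("explain_results", END)
agent = g.compile()
```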
See modules/agent_graph.py for the implementation.
OpsMind supports optional error monitoring via Sentry. Set the SENTRY_DSN environment variable to enable it. If not set, monitoring is completely disabled (no-op).
export SENTRY_DSN="https://[email protected]/0"
export ENVIRONMENT="production" # optional, defaults to "development"

Errors in the SQL agent pipeline are automatically reported when Sentry is enabled.
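The no-op behaviour amounts to a guard around initialisation; a minimal sketch, assuming standard sentry_sdk usage:

```python
import os
import sentry_sdk

dsn = os.getenv("SENTRY_DSN")
if dsn:  # unset DSN means monitoring stays a complete no-op
    sentry_sdk.init(dsn=dsn, environment=os.getenv("ENVIRONMENT", "development"))
```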
| Area | Reality |
|---|---|
| LLM accuracy | Measured per release — see tests/eval/golden_set.yaml (library path + LLM path). Run make eval locally to get current numbers for your model + schema. |
| Speed | 10-25 sec per query on 16GB RAM. LLM is the bottleneck. |
| Auth | Password via Streamlit secrets. No multi-user roles. |
| Safety | Read-only. SELECT only — INSERT/UPDATE/DELETE blocked. |
The accuracy claim is backed by a golden set (tests/eval/golden_set.yaml) with two paths:

- Library path — 14 questions that should hit pre-built SQL patterns. Runs with no LLM, catches regressions when someone edits query_library.py.
- LLM path — 6 questions the library can't match, forcing real NL-to-SQL generation. Generated SQL is executed against the demo database and compared to a reference SQL's result set.
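For illustration, a single golden-set entry could look like this; the field names are assumptions based on the harness description, not the actual file format:

```yaml
- id: waste_yesterday
  question: "What was yesterday's waste?"
  path: library              # expected to hit a pre-built pattern, no LLM
  reference_sql: >
    SELECT SUM(WasteKg) FROM ProductionRuns
    WHERE ProductionDate = date('now', '-1 day');
```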
make eval-library # fast, CI-safe, no Ollama needed
make eval-llm # requires Ollama + gemma3:12b
make eval # both

Failure modes are catalogued in tests/eval/failure_modes.md — the taxonomy grows as real failures arrive (pattern from Martin Seeler's "AI Evals Done Right", PyCon DE 2026).
OpsMind is solo-maintained with an open door for contributors.
- Read CONTRIBUTING.md for how to set up, test, and open a PR.
- Read GOVERNANCE.md before large changes — roles, response-time commitments, and the four hard scope lines live there.
- Security issues go through SECURITY.md (private advisory), not public issues.
- Behavioural expectations are in CODE_OF_CONDUCT.md.
- Third-party licence attributions are in THIRD_PARTY_NOTICES.md; the project's own copyright notice is in NOTICE.
Good first issues are labelled good first issue. First-PR-wins on any
issue: claim by commenting, ship within 7 days, or the next contributor
may take it.
If OpsMind is useful to you, a GitHub star is the easiest way to help — it makes the project more discoverable for people with the same problem.
