NeuronFeeder – High‑level System Design
Below is an implementation‑ready architecture for the NeuronFeeder Agent, covering the main
components, their interactions, and the data flow from upload to final import.
1) Box‑and‑Lines Overview
┌──────────────────────────┐
│          Client          │
│   (Web UI / CLI / API)   │
└────────────┬─────────────┘
             │  HTTPS/JSON (auth via OAuth/JWT)
     ┌───────▼────────┐
     │   API Gateway  │ ← rate limiting, authn/z, request routing
     └───────┬────────┘
             │
   ┌─────────▼──────────┐
   │ Ingestion Service  │ ← chunked, resumable upload
   └─────────┬──────────┘
             │
      ┌──────┴──────────────────────┐
      │                             │
┌─────▼────────────┐      ┌─────────▼──────────┐
│  Object Storage  │      │  Metadata Catalog  │
│  (files/chunks)  │      │  + Schema Registry │
│  (S3/GCS/Azure)  │      │  (tables, PK/FK,   │
└──────────────────┘      │   constraints)     │
                          └─────────┬──────────┘
                                    │
                          ┌─────────▼──────┐
                          │ Mapping Engine │ ← AI + rules (header
                          │  (LLM + rules) │   analysis, table split)
                          └─────────┬──────┘
                                    │
                         ┌──────────▼────────┐
                         │ Preview Generator │ ← sample rows, mapping view
                         └──────────┬────────┘
                                    │
                         ┌──────────▼────────┐
                         │ Feedback/NLP Loop │ ← “map FullName→First_Name”
                         └──────────┬────────┘
                                    │  (confirmed mapping)
                            ┌───────▼──────┐
                            │ Orchestrator │ ← workflow engine (steps, retries)
                            └───────┬──────┘
                                    │
      ┌──────────────┬──────────────┼────────────────┐
      │              │              │                │
┌─────▼──────┐ ┌─────▼──────┐ ┌─────▼───────┐ ┌──────▼─────────┐
│ Staging DB │ │ Validator  │ │  Bulk Load  │ │ Observability  │
│ (landing)  │ │ (PK/FK,    │ │  Adapters   │ │ & Audit Logs   │
└─────┬──────┘ │  types,    │ │  (COPY,     │ │ (ELK/OTel,     │
      │        │  ranges)   │ │   BULK,     │ │  Audit DB)     │
      │        └─────┬──────┘ │ SQL*Loader) │ └──────┬─────────┘
      │              │        └─────┬───────┘        │
      ▼              │              │                │
┌──────────────────┐ │              │                │
│   Final DW/DB    │◄┘       ┌──────▼───────┐        │
│ (OLTP/DW/Lake)   │         │ Message Bus  │◄───────┘
└──────────────────┘         │ (Kafka/SQS)  │  events, metrics
                             └──────────────┘
2) Component Responsibilities
• Client (Web UI/CLI/API): Upload files, pick target application/tables, review preview, submit
corrections, confirm import.
• API Gateway: TLS termination, auth (OAuth2/JWT), quota & rate limits, routing to services.
• Ingestion Service: Resumable, chunked uploads (GB‑scale), virus scan, basic format sniffing,
writes to Object Storage, records file metadata.
• Object Storage: Durable store for raw files/chunks (e.g., S3/GCS/Azure Blob) with versioning &
lifecycle.
• Metadata Catalog + Schema Registry: Stores system schemas, table definitions, PK/FK,
constraints, mappings history, and dataset lineage.
• Mapping Engine (AI + Rules): Header parsing, fuzzy matching, PII detection, table split
suggestion, PK/FK inference using registry metadata.
• Preview Generator: Builds static, non‑destructive previews (mapping tables, sample rows, table
split plan).
• Feedback/NLP Loop: Parses natural‑language corrections (e.g., rename, split/merge columns,
type overrides) and updates the proposed mapping.
• Orchestrator (Workflow Engine): Coordinates staging → validation → bulk load, handles
retries/compensation, checkpoints, and rollback.
• Staging DB: Landing zone; raw → conformed transformations, light normalization; immutable
audit copies.
• Validator: Enforces constraints (PK uniqueness, FK existence), type & range checks; produces
reject files & error reports.
• Bulk Load Adapters: High‑speed loaders (PostgreSQL COPY, SQL Server BULK INSERT, Oracle
SQL*Loader) with parallelism.
• Observability & Audit: Structured logs, metrics, traces (OpenTelemetry), per‑job audit trail,
lineage, and alerts.
• Message Bus (Kafka/SQS/PubSub): Event backbone (upload‑received, mapping‑ready,
validation‑passed/failed, load‑complete).
• Final DW/DB: Target systems (OLTP schemas, Data Warehouse, or Lakehouse) where validated
data lands.
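To make the Mapping Engine's "fuzzy matching" responsibility concrete, here is a minimal sketch of its rules layer using Python's standard-library difflib. The function name, normalization scheme, and 0.6 cutoff are illustrative assumptions, not part of the design; in practice this would be combined with the LLM pass described above.

```python
from difflib import get_close_matches

def propose_mapping(headers, registry_fields, cutoff=0.6):
    """Propose a source-header -> target-field mapping via fuzzy matching.

    Headers and registry fields are normalized (lowercased, separators
    stripped) before comparison; anything below the cutoff stays unmapped
    so the user can resolve it in the preview/feedback loop.
    """
    def norm(s):
        return s.lower().replace("_", "").replace(" ", "")

    normalized = {norm(f): f for f in registry_fields}
    mapping, unmapped = {}, []
    for header in headers:
        candidates = get_close_matches(norm(header), normalized, n=1, cutoff=cutoff)
        if candidates:
            mapping[header] = normalized[candidates[0]]
        else:
            unmapped.append(header)
    return mapping, unmapped
```

For example, `propose_mapping(["Full Name", "EMail"], ["full_name", "email", "postal_code"])` maps both headers and leaves nothing for manual review.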
3) End‑to‑End Flow (Happy Path)
1. Upload: Client uploads file in chunks → Ingestion Service → Object Storage. Metadata (file
name, size, checksum) recorded.
2. Analyze: Orchestrator triggers Mapping Engine → reads headers/sample → consults Schema
Registry → proposes mapping & table split.
3. Preview: Preview Generator renders mapping, sample rows, and PK/FK plan → shown to user.
4. Feedback: User submits NLP corrections → Feedback/NLP Loop updates the mapping → new
preview; loop until Confirm.
5. Stage: Orchestrator materializes confirmed mapping into Staging DB with idempotent batch
ids.
6. Validate: Validator checks types, PK/FK, nullability, business rules. Rejects are written as files &
surfaced to UI.
7. Bulk Load: On pass, Bulk Adapters write to Final DW/DB using COPY/BULK/SQL*Loader with
parallel threads.
8. Finish: Orchestrator emits events, updates audit, exposes run report & lineage.
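Step 6 (Validate) can be sketched as a pure function over staged rows: accepted rows flow on to bulk load, while rejects carry a reason string for the error report. The function shape and reject format are assumptions for illustration; real checks would also cover types, ranges, and business rules.

```python
def validate_rows(rows, pk_field, fk_field=None, valid_fk_values=None):
    """Split staged rows into (accepted, rejects).

    Enforces PK presence/uniqueness and, optionally, FK existence against
    a set of known FK values. Rejects keep the row plus a reason so they
    can be written out as a reject file and surfaced in the UI.
    """
    accepted, rejects, seen_pks = [], [], set()
    for row in rows:
        pk = row.get(pk_field)
        if pk is None:
            rejects.append({"row": row, "reason": f"missing PK '{pk_field}'"})
        elif pk in seen_pks:
            rejects.append({"row": row, "reason": f"duplicate PK {pk!r}"})
        elif fk_field and valid_fk_values is not None \
                and row.get(fk_field) not in valid_fk_values:
            rejects.append({"row": row, "reason": f"unknown FK {row.get(fk_field)!r}"})
        else:
            seen_pks.add(pk)
            accepted.append(row)
    return accepted, rejects
```

Keeping validation side-effect free makes it easy to parallelize per batch and to replay a batch idempotently after a retry.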
4) Data Models (Concise)
• FileArtifact: { id, uri, size, checksum, format, uploader, created_at }
• SchemaEntity: { app, table, fields[], pk[], fk[], constraints[] }
• MappingPlan: { file_id, targets[], transforms[], conflicts[], created_by, version }
• RunJob: { job_id, state, started_at, finished_at, stats, rejects_uri, report_uri }
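Two of these models, rendered as Python dataclasses to pin down the field shapes (string ids, ISO‑8601 timestamps, and default-empty lists are assumptions; the source leaves types open):

```python
from dataclasses import dataclass, field

@dataclass
class FileArtifact:
    id: str
    uri: str
    size: int
    checksum: str
    format: str          # e.g. "csv", "xlsx"
    uploader: str
    created_at: str      # ISO-8601 timestamp

@dataclass
class MappingPlan:
    file_id: str
    targets: list = field(default_factory=list)     # target table/field pairs
    transforms: list = field(default_factory=list)  # rename/split/type steps
    conflicts: list = field(default_factory=list)   # unresolved mappings
    created_by: str = ""
    version: int = 1
```

Versioning the MappingPlan (rather than mutating it) is what lets the feedback loop produce a new preview per correction while keeping the full history in the catalog.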
5) Technology Options
• API/Gateway: FastAPI / Spring Boot + Kong/NGINX
• Storage: S3/GCS/Azure Blob, multipart uploads
• Queue: Kafka / SQS / PubSub
• Staging DB: PostgreSQL / Snowflake stage / BigQuery temp
• Final: Postgres/SQL Server/Oracle/Snowflake/BigQuery/Lakehouse
• Orchestrator: Temporal / Airflow / Dagster
• LLM/NLP: Local HF model or API (for header → field mapping); Rules engine (Drools/JSONLogic)
• Observability: OpenTelemetry + Prometheus + Grafana; Audit in Postgres/Elastic
6) Non‑Functional Highlights
• Scalability: Horizontal workers for ingestion, mapping, and load; back‑pressure via queue.
• Reliability: Idempotent job ids, exactly‑once staging writes, retries with exponential backoff.
• Security: At‑rest encryption (SSE‑S3/KMS), in‑transit TLS, RBAC/ABAC, PII redaction.
• Governance: Schema versioning, lineage, role‑based approvals, full audit trail.
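The "retries with exponential backoff" point can be sketched as a small helper (a simplified stand-in for what Temporal/Airflow provide out of the box; the attempt counts and delays are illustrative defaults):

```python
import random
import time

def retry(fn, attempts=5, base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Call fn, retrying on exception with exponential backoff plus jitter.

    Delay doubles each attempt, capped at max_delay; jitter spreads
    retries so parallel workers don't hammer a recovering dependency
    in lockstep. The final failure is re-raised to the caller.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay * (0.5 + random.random() / 2))  # jitter: 50-100% of delay
```

Pairing this with idempotent job ids is what makes retries safe: re-running a staging write for the same batch id must be a no-op.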
7) Sequence Diagram (NLP Correction Loop)
Client → API → Mapping Engine : propose mapping
API → Preview Service : render preview
Client → API : “map FullName → First_Name; split Address into City,State”
API → NLP/Rules : parse intents → updated MappingPlan
NLP/Rules → Mapping Engine : rebuild plan → new preview
API → Client : updated preview (repeat until Confirm)
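A minimal sketch of the "parse intents" step for exactly the two correction types shown above (map and split). The regex grammar is a hypothetical rules-engine fallback; the design also allows an LLM to do this parsing.

```python
import re

# Illustrative intent grammar covering "map X -> Y" and "split X into A,B".
MAP_RE = re.compile(r"map\s+(\w+)\s*(?:→|->)\s*(\w+)", re.IGNORECASE)
SPLIT_RE = re.compile(r"split\s+(\w+)\s+into\s+([\w,\s]+)", re.IGNORECASE)

def parse_corrections(text):
    """Turn a natural-language correction into MappingPlan update intents."""
    intents = []
    for clause in re.split(r"[;.]", text):
        if m := MAP_RE.search(clause):
            intents.append({"op": "map", "source": m.group(1), "target": m.group(2)})
        elif m := SPLIT_RE.search(clause):
            parts = [p.strip() for p in m.group(2).split(",") if p.strip()]
            intents.append({"op": "split", "source": m.group(1), "into": parts})
    return intents
```

Running it on the example utterance from the sequence above yields one map intent and one split intent, which the Mapping Engine then applies to rebuild the plan.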
8) Deployment Sketch
• Microservices (Ingestion, Mapping, Preview, Feedback, Orchestrator, Validator) in containers on
Kubernetes.
• Stateful bits (Postgres, Kafka) as managed services where possible.
• Config via ConfigMaps/Secrets; CI/CD with canary for Mapping Engine.
This diagram and breakdown are designed to be implementation‑ready while staying
tech‑agnostic. We can tailor choices (e.g., Postgres vs Snowflake, Kafka vs SQS) to your
environment.