How It Works
8 peer-reviewed papers. 7 clinical ratios. One trajectory pipeline that turns blood test snapshots into a health story.
Users upload blood test PDFs via the UploadForm component; the files are stored in Cloudflare R2 and processed through a LlamaIndex IngestionPipeline with a custom BloodTestNodeParser. The pipeline produces 3 node types per test—test-level, per-marker, and health-state with 7 derived ratios—all embedded locally via BAAI/bge-large-en-v1.5 (native 1024-dim, ONNX Runtime) and persisted to pgvector (3 of the 7 paired embedding tables; the remaining tables hold conditions, medications, symptoms, and appointments). For AI Q&A, a 4-node LangGraph StateGraph (triage → retrieve → synthesize → guard) classifies intent into 8 categories, routes to intent-specific pgvector searches (hybrid for markers, fan-out for general health), generates a clinical answer, and audits it against 5 safety rules before delivery.
Key findings:
- 1024-dim — native embedding dimension for BAAI/bge-large-en-v1.5, no truncation needed (langgraph/embeddings.py: _LOCAL_MODEL = "BAAI/bge-large-en-v1.5" via FastEmbed TextEmbedding)
- 7B — parameter count for the Qwen 2.5 Instruct LLM (LLM_MODEL=mlx-community/Qwen2.5-7B-Instruct-4bit in .env.example)
- 7 — embedding tables (tests, markers, health state, conditions, medications, symptoms, appointments), each a paired pgvector table with vector(1024) and a BTREE user_id index
- 2 — evaluation frameworks, DeepEval and RAGAS (evals/ directory with 15+ pytest modules using DeepEval metrics and the RAGAS triad)
- O(log n) — indexed search performance for userId queries (PostgreSQL indexes on userId columns for all user-owned tables)
- < 100ms — target latency for local LLM inference via mlx_lm.server (Apple Silicon optimization for the Qwen 2.5 4bit model)
- 0 egress — Cloudflare R2 storage cost model with zero egress fees (R2 bucket healthcare-blood-tests for PDF storage)
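The four-node Q&A flow described above can be sketched as plain functions chained over a shared state dict. This is an illustration only: the real pipeline wires these nodes into a LangGraph StateGraph, and the intent heuristic, retrieval, and safety check below are stubs.

```python
# Plain-function sketch of triage → retrieve → synthesize → guard.
# All node bodies are stubs; only the state-passing shape mirrors the pipeline.

def triage(state: dict) -> dict:
    """Classify the query into a coarse intent (stub keyword heuristic)."""
    q = state["query"].lower()
    state["intent"] = "marker" if "marker" in q or "cholesterol" in q else "general_health"
    return state

def retrieve(state: dict) -> dict:
    """Dispatch to an intent-specific pgvector search (stubbed)."""
    state["context"] = [f"pgvector hits for intent={state['intent']}"]
    return state

def synthesize(state: dict) -> dict:
    """Generate an answer grounded in the retrieved context (stubbed)."""
    state["answer"] = f"Answer grounded in {len(state['context'])} context chunk(s)."
    return state

def guard(state: dict) -> dict:
    """Audit the draft answer against safety rules before delivery (stubbed)."""
    state["safe"] = "diagnose" not in state["answer"].lower()
    return state

def run_pipeline(query: str) -> dict:
    state = {"query": query}
    for node in (triage, retrieve, synthesize, guard):
        state = node(state)
    return state
```

In the real graph each node returns a partial state update and LangGraph merges it; the sequential loop here stands in for that wiring.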
Technical Foundations
- Next.js 15 — Vercel (2024). Finding: React framework with App Router, Server Components, and Server Actions for optimized performance and SEO. Relevance: Used for the entire frontend with app/ directory routing, enabling Server Components in pages like app/(app)/chat/page.tsx and Server Actions in app/(app)/conditions/actions.ts [link]
- PostgreSQL (Neon) — Neon (2024). Finding: Serverless PostgreSQL with branching and auto-scaling for modern applications. Relevance: Primary database storing the bloodTests, conditions, medications, symptoms, appointments, doctors, and familyMembers tables, with Drizzle ORM for schema management [link]
- Drizzle ORM — Drizzle Team (2024). Finding: TypeScript ORM with zero dependencies and full type safety. Relevance: Used for all database operations, including migrations with Drizzle Kit and queries such as the multi-table joins in app/(app)/doctors/page.tsx [link]
- Better Auth — AI Apps (2024). Finding: Authentication library with session management and route protection. Relevance: Handles user authentication via @ai-apps/auth, with session checks in lib/auth-helpers.ts and userId isolation in all queries [link]
- Qwen 2.5 7B Instruct 4bit — Alibaba (2024). Finding: Local LLM optimized for Apple Silicon via mlx_lm.server, offering privacy and low-latency inference. Relevance: Primary LLM for AI health Q&A in the ChatInterface component, configured via the LLM_BASE_URL and LLM_MODEL environment variables [link]
- BAAI/bge-large-en-v1.5 — Beijing Academy of AI (2023). Finding: BERT-based bi-encoder producing native 1024-dim embeddings, ranked #1 on MTEB at release. Runs fully offline via ONNX Runtime (Python) and quantized INT8 (TypeScript) — zero API calls, zero egress, full data sovereignty. Relevance: Dual-runtime embedding: FastEmbed (ONNX) in langgraph/embeddings.py for ingestion and search, and Xenova/bge-large-en-v1.5 (q8 via @huggingface/transformers) in lib/embed.ts for browser-side entity embedding. Both produce aligned 1024-dim vectors stored in 7 pgvector tables [link]
- Cloudflare R2 — Cloudflare (2024). Finding: Object storage with S3-compatible API and zero egress fees. Relevance: Stores uploaded blood test PDFs in the healthcare-blood-tests bucket, accessed via AWS SDK v3 [link]
- Panda CSS — Panda CSS Team (2024). Finding: CSS-in-JS with codegen for type-safe styling and theming. Relevance: Used for styling across components, with panda codegen for design token management [link]
- Radix UI — Radix UI (2024). Finding: Unstyled, accessible component primitives for building custom UIs. Relevance: Provides base components like dialogs and dropdowns, customized with the Panda CSS theme [link]
- DeepEval — Confident AI (2024). Finding: Python evaluation framework for LLMs with built-in RAG metrics, GEval custom criteria, and synthetic test generation. Relevance: Powers the full evaluation suite in evals/, with 15+ test modules covering the RAG triad, safety, trajectory, conversational, and graph pipeline quality [link]
- Turbopack — Vercel (2024). Finding: Incremental bundler for fast development builds in Next.js. Relevance: Used for development builds to speed up iteration on components like UploadForm and ChatInterface [link]
Pipeline
- Blood Test Upload and Storage — Users upload blood test PDFs via the UploadForm component in app/(app)/blood-tests/upload-form.tsx, which handles file validation and multipart form data submission. A Next.js Server Action sends the file to the Python FastAPI backend at POST /upload, which uploads the raw bytes to Cloudflare R2 bucket healthcare-blood-tests via boto3 (S3-compatible). This ensures raw data is persisted before processing, with userId isolation for multi-tenancy. Research basis: Cloudflare R2 object storage with S3-compatible API via boto3.
- LlamaParse PDF-to-Markdown Conversion — LlamaParse is called directly (not through LlamaIndex or LangGraph) to convert uploaded PDFs into structured markdown. The PDF bytes are written to a temp file, passed to LlamaParse's cloud API with result_type='markdown', then deleted. The returned markdown is processed by _markdown_to_elements() which uses regex to extract markdown tables and converts them to HTML via _md_table_to_html(). The result is a list of element dicts with type 'Table' (containing text_as_html) or 'NarrativeText'. Research basis: LlamaParse cloud API — standalone invocation, decoupled from LlamaIndex.
- 3-Tier Marker Extraction — parse_markers() in langgraph/parsers.py applies a 3-tier fallback strategy to extract biomarkers from the LlamaParse element dicts. Tier 1: parse_html_table extracts name/value/unit/reference_range from HTML table cells. Tier 2: parse_form_key_values handles Romanian/European key-value lab formats. Tier 3: parse_text_markers uses regex pattern matching for tab/space-separated free text. Each marker's flag (normal/low/high) is computed by compare_flag() against the reference range, and markers are stored in the blood_markers table. Research basis: Multi-strategy parsing with compare_flag() for clinical range comparison.
- Embedding Generation (LlamaIndex IngestionPipeline) — After the upload response returns, a FastAPI BackgroundTask runs _run_ingestion() which orchestrates a LlamaIndex IngestionPipeline. A custom BloodTestNodeParser produces 3 node types per test: blood_test (summary), blood_marker (one per marker), and health_state (7 derived clinical ratios with risk classification). All text is embedded locally via BAAI/bge-large-en-v1.5 (1024-dim, ONNX Runtime) through FastEmbedEmbedding, then persisted to 3 pgvector tables with ON CONFLICT upsert for idempotency. Research basis: BAAI/bge-large-en-v1.5 via FastEmbed (local ONNX, zero API cost).
- Multi-Source Retrieval for RAG — When a user submits a query, the triage node classifies intent into 8 categories, then the retrieve node dispatches to different pgvector search strategies. Marker queries use hybrid search (0.7 cosine + 0.3 FTS via a CTE with ts_rank normalization). Trajectory queries extend this with temporal joins for time-ordered series. General-health queries fan out to all 7 entity tables simultaneously. Results are deduplicated and re-ranked by score. Research basis: Intent-routed hybrid search: cosine similarity + full-text search in one SQL CTE.
- LLM Generation and Streaming — The assembled context is sent to the local Qwen 2.5 7B Instruct 4bit LLM via QwenClient, configured with LLM_BASE_URL and LLM_MODEL environment variables. The LLM generates an answer grounded in the retrieved context, with system prompts emphasizing medical accuracy. Responses are streamed back to the UI using the ChatInterface component, enabling real-time interaction with conversation history management. Research basis: mlx_lm.server for Apple Silicon-optimized local inference.
- Trajectory Analysis and Velocity Calculation — The TrajectoryPreview and TrajectoryInsights components analyze sequential blood tests by comparing embeddings via cosine similarity to detect pattern shifts. Functions calculate biomarker velocities (rate of change per day) and clinical ratios based on available markers. This enables trend detection and alerts for accelerating trends, providing longitudinal health insights beyond snapshot analysis. Research basis: Embedding similarity for pattern recognition and rate calculations.
- Evaluation and Quality Assurance — Comprehensive evaluation is performed using DeepEval with RAGAS for RAG quality metrics (faithfulness, relevancy, contextual precision/recall), GEval custom criteria with DeepSeek as judge, and synthetic test generation. 15+ pytest modules cover critical paths: RAG triad, safety guardrails, trajectory analysis, conversational multi-turn, graph pipeline, ingestion, extraction, and embedding quality. Research basis: DeepEval + RAGAS evaluation suite with custom DeepSeek judge model.
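The upload-and-storage step can be sketched as follows. The userId-prefixed key layout is an assumption for illustration, and an in-memory stub stands in for the boto3 S3 client pointed at the R2 endpoint; only the put_object call shape matches what boto3 would send.

```python
# Sketch of PDF upload to the healthcare-blood-tests bucket with per-user
# key isolation. StubS3Client mimics boto3.client("s3", endpoint_url=<R2 URL>).
import uuid

BUCKET = "healthcare-blood-tests"

def object_key(user_id: str, filename: str) -> str:
    """Prefix every object with the owner's id for multi-tenant isolation (assumed layout)."""
    return f"{user_id}/{uuid.uuid4().hex}-{filename}"

class StubS3Client:
    """In-memory stand-in for a boto3 S3 client; real code calls R2 over HTTPS."""
    def __init__(self):
        self.store = {}

    def put_object(self, Bucket: str, Key: str, Body: bytes) -> None:
        self.store[(Bucket, Key)] = Body

def upload_blood_test(client, user_id: str, filename: str, pdf_bytes: bytes) -> str:
    """Persist the raw PDF before any processing, returning its object key."""
    key = object_key(user_id, filename)
    client.put_object(Bucket=BUCKET, Key=key, Body=pdf_bytes)
    return key
```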
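The markdown-to-elements conversion step might look like the sketch below: extract pipe tables with a regex, convert each to HTML, and emit Table / NarrativeText element dicts. The real _markdown_to_elements() and _md_table_to_html() internals are not shown in the source, so this is an approximation of the described behavior.

```python
# Approximate sketch of converting LlamaParse markdown into element dicts.
import re

def md_table_to_html(table_md: str) -> str:
    """Turn a markdown pipe table into a minimal HTML table (first row = header)."""
    rows = [
        [c.strip() for c in line.strip().strip("|").split("|")]
        for line in table_md.strip().splitlines()
        if not re.match(r"^\s*\|?[\s:|-]+\|?\s*$", line)  # drop |---|---| separator rows
    ]
    html = ["<table>"]
    for i, row in enumerate(rows):
        tag = "th" if i == 0 else "td"
        html.append("<tr>" + "".join(f"<{tag}>{c}</{tag}>" for c in row) + "</tr>")
    html.append("</table>")
    return "".join(html)

def markdown_to_elements(markdown: str) -> list[dict]:
    """Split on blank lines; pipe-table blocks become 'Table', the rest 'NarrativeText'."""
    elements = []
    for block in re.split(r"\n\s*\n", markdown.strip()):
        if block.lstrip().startswith("|"):
            elements.append({"type": "Table", "text_as_html": md_table_to_html(block)})
        elif block.strip():
            elements.append({"type": "NarrativeText", "text": block.strip()})
    return elements
```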
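The marker-extraction step's tier-3 fallback and range comparison can be sketched as below. The regex and the "low - high" reference-range format are assumptions; the real parsers also cover HTML tables (tier 1) and Romanian/European key-value layouts (tier 2).

```python
# Sketch of parse_text_markers-style free-text parsing plus flag computation.
import re

def compare_flag(value: float, reference_range: str) -> str:
    """Classify a value as low/normal/high against a 'low - high' range string."""
    m = re.match(r"\s*([\d.]+)\s*-\s*([\d.]+)", reference_range)
    if not m:
        return "normal"  # unparseable range: don't guess a flag
    low, high = float(m.group(1)), float(m.group(2))
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"

MARKER_LINE = re.compile(
    r"^(?P<name>[A-Za-z][A-Za-z ()-]+?)[\t ]+(?P<value>[\d.]+)[\t ]+"
    r"(?P<unit>\S+)[\t ]+(?P<range>[\d.]+\s*-\s*[\d.]+)\s*$"
)

def parse_text_markers(text: str) -> list[dict]:
    """Tier-3 fallback: name, value, unit, range from tab/space-separated lines."""
    markers = []
    for line in text.splitlines():
        m = MARKER_LINE.match(line.strip())
        if m:
            value = float(m.group("value"))
            markers.append({
                "name": m.group("name").strip(),
                "value": value,
                "unit": m.group("unit"),
                "reference_range": m.group("range"),
                "flag": compare_flag(value, m.group("range")),
            })
    return markers
```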
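The ingestion step's three node types per test can be sketched as below. The real BloodTestNodeParser emits LlamaIndex nodes inside an IngestionPipeline; plain dicts are used here, and only one of the 7 derived ratios (total cholesterol / HDL, an assumed example) is shown.

```python
# Sketch of the 3 node types a BloodTestNodeParser-style parser emits per test:
# one blood_test summary, one blood_marker per marker, one health_state node.

def build_nodes(test: dict) -> list[dict]:
    markers = test["markers"]
    nodes = [{
        "type": "blood_test",
        "text": f"Blood test on {test['date']} with {len(markers)} markers.",
    }]
    for m in markers:
        nodes.append({
            "type": "blood_marker",
            "text": f"{m['name']}: {m['value']} {m['unit']} ({m['flag']})",
        })
    by_name = {m["name"]: m["value"] for m in markers}
    ratios = {}
    # One illustrative ratio of the seven; names and formula are assumptions.
    if "Total Cholesterol" in by_name and "HDL" in by_name:
        ratios["chol_hdl"] = round(by_name["Total Cholesterol"] / by_name["HDL"], 2)
    nodes.append({"type": "health_state", "text": f"Derived ratios: {ratios}"})
    return nodes
```

Each node's text would then be embedded (1024-dim) and upserted into its pgvector table with ON CONFLICT for idempotency.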
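The retrieval step's hybrid scoring and dedupe/re-rank can be sketched as below. The SQL string only illustrates the shape of the described CTE (table and column names are assumptions); the fusion and re-ranking functions are runnable.

```python
# Illustrative shape of the hybrid CTE (0.7 cosine + 0.3 normalized ts_rank);
# identifiers are assumed, not taken from the codebase.
HYBRID_SQL = """
WITH scored AS (
    SELECT id, content,
           1 - (embedding <=> %(query_vec)s) AS cosine,
           ts_rank(fts, plainto_tsquery('english', %(query)s)) AS fts_raw
    FROM blood_marker_embeddings
    WHERE user_id = %(user_id)s
)
SELECT id, content,
       0.7 * cosine + 0.3 * (fts_raw / NULLIF(MAX(fts_raw) OVER (), 0)) AS score
FROM scored ORDER BY score DESC LIMIT %(k)s;
"""

def fuse(cosine: float, fts_norm: float, w_vec: float = 0.7, w_fts: float = 0.3) -> float:
    """Weighted fusion of cosine similarity and normalized full-text rank."""
    return w_vec * cosine + w_fts * fts_norm

def dedupe_rerank(rows: list[dict]) -> list[dict]:
    """Keep the best score per id, then sort descending by score."""
    best = {}
    for r in rows:
        if r["id"] not in best or r["score"] > best[r["id"]]["score"]:
            best[r["id"]] = r
    return sorted(best.values(), key=lambda r: r["score"], reverse=True)
```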
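Consuming the streamed completion can be sketched as below. mlx_lm.server exposes an OpenAI-compatible endpoint, so chunks arrive as "data: {json}" SSE lines; parse_sse_line() is runnable, while stream_chat() is an untested illustration of how a client would call the local server.

```python
# Sketch of streaming from an OpenAI-compatible local server (mlx_lm.server).
import json
import urllib.request
from typing import Optional

def parse_sse_line(line: str) -> Optional[str]:
    """Extract the delta text from one SSE line, or None if the line carries none."""
    if not line.startswith("data: ") or line == "data: [DONE]":
        return None
    chunk = json.loads(line[len("data: "):])
    return chunk["choices"][0]["delta"].get("content")

def stream_chat(base_url: str, model: str, messages: list):
    """Yield answer tokens from the local LLM server (illustrative, not executed here)."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps({"model": model, "messages": messages, "stream": True}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for raw in resp:
            token = parse_sse_line(raw.decode().strip())
            if token:
                yield token
```

In the app, base_url and model would come from the LLM_BASE_URL and LLM_MODEL environment variables.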
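The trajectory step's core math can be sketched as below: cosine similarity between two test embeddings to detect pattern shifts, and per-day velocity for a single biomarker. The functions are generic math, not the actual component code.

```python
# Runnable sketch of the trajectory math used for trend detection.
import math
from datetime import date

def cosine_similarity(a: list, b: list) -> float:
    """Similarity between two embeddings; 1.0 = identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def velocity_per_day(v1: float, d1: date, v2: float, d2: date) -> float:
    """Rate of change per day between two measurements of the same marker."""
    days = (d2 - d1).days
    if days == 0:
        raise ValueError("tests taken on the same day")
    return (v2 - v1) / days
```

An accelerating trend would show up as successive velocities increasing in magnitude, which is what the alerting described above would key on.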