Skip to content

AnshumanAtrey/oki-doki

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

16 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Oki-Doki · Your Parent's Companion, Built For Yours


The 30-second pitch

Oki-Doki is a physical desk-top turtle that lives on your parent's laptop. It listens, remembers, and gently looks after them — and it shares just enough with you, the adult child, so you can stop worrying without stopping trusting.

  • They hold up a lab report. The turtle OCRs it, extracts meds and findings, and remembers them forever.
  • They eat lunch. The turtle notes the calories and sugar.
  • They mention "I'm having chest pain." The turtle drops its gentle mood, gets urgent, and you're alerted.
  • They go quiet at 10pm. The turtle logs a sleep observation. At 9am it says good morning.

You bought it for them. You see what matters. The relationship lives on the turtle, not in another app on your parent's phone they'll never open.


Why this matters

Most health apps fail because the user won't open them. A smartwatch gets taken off. A dashboard app gets ignored. Oki-Doki flips that: it lives on the laptop the recipient already uses every day, and the physical, emotional embodiment — turtle flippers, OLED eyes, 12 moods — is why they actually engage.

What it does:

  • ⌘⌃R — point at a lab report; Gemini OCRs + extracts findings + meds + severity.
  • ⌘⌃D — generates a 1-page plain-language doctor summary with talking points and red flags, copied to clipboard.
  • ⌘⌃F — photograph food; nutrition is extracted and the meal is logged.
  • Conversational: "Hey Shelly" — the turtle remembers profile, medications, recent observations, and semantically relevant long-term memories, injected into every LLM turn.
  • Ambient: infers sleep from HID idle, logs mood from voice, nudges on med schedules, flags urgent mentions ("chest pain", "I fell") for the gifter.

One Mac daemon. One ESP32. One Supabase schema with pgvector. The gifter dashboard reads off the same source of truth — no five-app sprawl, no dashboard your parent will never open.


What it does in 60 seconds

  1. Turtle sitting on the laptop, ambient blinking.
  2. "Hey Shelly, did I take my atorvastatin last night?" → Shelly (the turtle persona) answers based on the actual profile + recent messages.
  3. Hold a medical report to the camera + ⌘⌃R → Shelly says "I'll remember that for you", logs a medical_report observation with severity=medium, new medication auto-merges into users.profile.medications.
  4. Show a plate of food + ⌘⌃F → calories + sugar logged, mood turns happy, flipper waves.
  5. Say "I fell"mood flips to surprised, intensity 0.8, urgent observation with severity=urgent lands for the gifter.
  6. ⌘⌃D → a clean 1-page markdown doctor summary is saved to /tmp/oki_doctor_summary.md and on the clipboard.
  7. Switch to the gifter dashboard (friend's web) → daily feed, attention-needed flags, meal log all visible in real-time from Supabase.

Architecture

┌── On the recipient's Mac ──────────────────────────────────────┐
│                                                                │
│  Mic → WebRTC VAD → faster-whisper (local)                     │
│                                     │                          │
│  Camera (on-demand) ───┐            │                          │
│  Screen/app sampler ───┤            │                          │
│  HID idle detector ────┤            │                          │
│  Hotkeys ⌘⇧Space,⌘⌃F,R,D├─────▶ Event Bus ─▶ Scheduler         │
│  Cron (morning / 9pm /  │                 │                    │
│       random check-ins /│       ┌─────────┴─────────┐          │
│       med reminders) ───┘       │ rules.decide:     │          │
│  Conversation tracker ──────────▶ SKIP/LOG/         │          │
│  (passive capture)              │ CHEAP_LLM/BIG_LLM │          │
│                                  └─────────┬─────────┘         │
│                                            │                   │
│                  ┌─────────────────────────▼──────────────┐    │
│                  │ Gemini 2.5 pro/flash · structured JSON │    │
│                  │   ↑ profile + permanent memories +     │    │
│                  │     semantic-relevant memories injected│    │
│                  │     into every system prompt           │    │
│                  └─────────────────────────┬──────────────┘    │
│                                            │                   │
│            ┌───────────────────────────────┼──────────┐        │
│            ▼                               ▼          ▼        │
│   TTS (pyttsx3 / NSSpeech)         JSON over USB   SQLite      │
│   → Mac speaker                    → ESP32 firmware  + pgvec   │
│                                      (OLED eyes +   embeddings │
│                                       2 flippers)      │       │
└─────────────────────────────────────────────────────┼──────────┘
                                                      │
                                    (every 60s async) │
                                                      ▼
              ┌── Supabase (shared source of truth) ────────────┐
              │ users.profile (JSONB)                            │
              │ memories + vector(1536) + RPC match_memories()   │
              │ meals · observations · messages · activity       │
              │ Storage: photos bucket (food + reports)          │
              └──────────────────────────────────────────────────┘
                                      │
                       ┌──────────────┴──────────────┐
                       ▼                             ▼
             Gifter web (Next.js)           Emergency / daily-summary
             · daily feed                   · FCM push (TODO post-hack)
             · attention feed
             · medical history
             · meal log

Event vocabulary (14 types, 4 routing tiers)

User action Event Route Result
⌘⇧Space HOTKEY BIG_LLM Vision + mood reply + animation
⌘⌃F MEAL_LOG BIG_LLM Nutrition extraction + photo upload + meals row
⌘⌃R MEDICAL_REPORT_LOG BIG_LLM OCR + meds/conditions extracted, auto-merged into profile
⌘⌃D DOCTOR_SUMMARY BIG_LLM 1-page markdown summary → /tmp + clipboard
"Hey Shelly, …" SPEECH → wake BIG_LLM Conversational reply (60s follow-up window)
"chest pain" SPEECH → EMERGENCY BIG_LLM Urgent reply + severity='urgent' observation, bypasses budget
Ambient talk SPEECH LOG Fed into passive tracker
60s speech + 30s silence CONVERSATION_ENDED BIG_LLM Auto-summary + durable memory extraction
Med schedule hits MED_REMINDER CHEAP_LLM Profile-driven medication nudge
First activity ≥6am MORNING CHEAP_LLM Good-morning line
9pm first tick DAILY_SUMMARY BIG_LLM Day rollup
Random 30–90m active CHECK_IN CHEAP_LLM Proactive warm nudge, 5/day cap
Idle >10m → active ACTIVE (with was_idle_seconds) CHEAP_LLM Welcome back
>45m social media APP_DURATION CHEAP_LLM Doomscroll nudge
Night idle ≥4h (observation) Logged as inferred sleep

Rate limit: OMI_LLM_BUDGET_PER_HOUR (default 10). Emergencies bypass it.


Repository layout

oki-doki/
├── agent/                 Python daemon running on the Mac
│   ├── pyproject.toml
│   ├── demo_setup.sh      ← one-command pre-flight for demo-day
│   ├── scripts/
│   │   └── seed_demo_data.py   ← populate Supabase with 48h of realistic history
│   └── src/omi_companion/
│       ├── main.py             asyncio entry — spawns 10 concurrent workers
│       ├── config.py           pydantic-settings from .env
│       ├── events.py           EventKind enum + asyncio.Queue bus
│       ├── schemas.py          all LLM response shapes (mood, meal, report, doctor-summary, conversation)
│       ├── observability.py    /tmp/oki_status.json live snapshot
│       ├── perception/         camera · mic · screen · hid_idle · hotkey
│       ├── brain/              scheduler · rules · context · gemini · embeddings
│       │                       profile · medication · conversation (passive) · cron
│       │                       budget · personas
│       ├── speech/             stt (local whisper) · tts (pyttsx3 → NSSpeech)
│       ├── hardware/           serial_link (USB-JSON + MockHardware fallback)
│       └── storage/            sqlite (offline-first) · supabase_sync · supabase_storage
├── firmware/              ESP32 PlatformIO — SSD1306 OLED eyes + 2× SG90 flippers
│   ├── platformio.ini
│   ├── include/protocol.h
│   ├── src/main.cpp            state-machine mood renderer + flipper animator
│   └── README.md
├── web/                   Next.js 16 — landing / shop / gifter login (friend's scope)
└── shared/                Supabase schema + migrations + RLS
    ├── schema.sql
    ├── rls.sql
    ├── migration_002_health.sql   memory_type + importance + profile JSONB + severity
    ├── migration_003_memory_rpc.sql   match_memories + get_permanent_memories RPC
    └── migration_004_kinds.sql    expanded observations.kind domain

Naming note: the Python package namespace is omi_companion (legacy from the Omi-inspired scaffold). Product, CLI, log file, firmware all say Oki-Doki. We intentionally deferred a Python package rename to avoid import-churn during this cycle.


Tech stack at a glance

Layer Choice Why
LLM (reasoning / vision) Gemini 2.5 Pro Best schema adherence, multimodal, expensive only where needed
LLM (cheap / check-ins) Gemini 2.5 Flash 4× cheaper per image than 3.x, clean JSON
Embeddings gemini-embedding-001 @ 1536-d Matches pgvector column, free within project quota
STT faster-whisper small.en Local, private, no API cost, good enough for wake + passive capture
TTS macOS NSSpeechSynthesizer via pyttsx3 Free, offline, sounds human with Ava/Zoe Enhanced voice
Global hotkeys pynput GlobalHotKeys Works with Terminal Accessibility permission, cross-editor-safe combos
Screen / HID pyobjc (Quartz · NSWorkspace) Native macOS APIs
Local DB SQLite via aiosqlite Offline-first buffer, survives network loss
Cloud DB Supabase Postgres Auth + Realtime + pgvector + Storage, one service
Vector search pgvector inside Supabase Zero extra service — Omi uses Pinecone; we don't need to
Hardware ESP32 + SSD1306 0.96" OLED + 2× SG90 ~$8 BOM, I²C + PWM, everything off-the-shelf

Run it yourself

# 1. Fill in .env (see agent/.env.example)
cd agent
cp .env.example .env
$EDITOR .env                          # Gemini + Supabase keys

# 2. Apply schema to a fresh Supabase project
cd ../shared
psql $DATABASE_URL -f schema.sql
psql $DATABASE_URL -f rls.sql
psql $DATABASE_URL -f migration_002_health.sql
psql $DATABASE_URL -f migration_003_memory_rpc.sql
psql $DATABASE_URL -f migration_004_kinds.sql

# 3. Seed demo history + pre-flight check
cd ../agent
uv sync
uv run python scripts/seed_demo_data.py
./demo_setup.sh                       # Supabase + Gemini + embed + Whisper warm

# 4. Grant macOS permissions (one-time) for Terminal:
#    System Settings → Privacy & Security → Accessibility, Camera, Microphone

# 5. Run
uv run python -m omi_companion

# 6. In another pane, watch live state
watch -n1 'cat /tmp/oki_status.json | jq .'

For the hardware: cd firmware && pio run -t upload (see firmware/README.md). Then OMI_SERIAL_PORT=/dev/tty.usbserial-XXXX in .env.


Honest limitations

  • Not a medical device. Every LLM response carries a "not-a-doctor" guardrail. Reports are surfaced for discussion, not diagnosed.
  • Single-user in v1. OMI_USER_ID is hard-coded to one dev UUID. Multi-user + real Supabase Auth is a week of plumbing deliberately deferred.
  • Whisper wake-word matching is string-level. "hi Shelly" misses. Production would use Porcupine / OpenWakeWord.
  • Sleep signal is HID-inferred, not polysomnography. It knows you're away; it doesn't know you slept.
  • No FCM push to gifter yet — urgent observations surface in the dashboard feed but don't ping their phone. One cron + one webhook away.

Business model

Revenue stream Scale
Hardware bundle ($29 one-time) BOM ~$8, 3× margin
Personality / animal license upsell at checkout
Gifter subscription ($5/mo per recipient) cloud storage + dashboard + priority LLM budget
Optional Doctor-Export-Pro ($2/mo) auto-email monthly summary to family doctor

Target acquirer / customer: the adult child living in a different city from a parent they quietly worry about. The pitch on the shop page is "the quiet check-in you've been meaning to make, every day, on their terms."


Credits

Omi-inspired ambient architecture — we did an architecture audit of BasedHardware/omi and diverged where our scale didn't need the complexity.

Not affiliated with Omi / BasedHardware. We stood on their published architecture and simplified (1 database not 7, 1 LLM vendor not 3, no plugin marketplace, no speaker diarization) because our scale doesn't need it — yet.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors