Skip to content

Isabel-S/perception-llm

Repository files navigation

Person Perception

A local web application with a chat interface for interacting with an LLM and a visualization panel that displays insights based on the responses.

Screenshots

Run conversations — after a few turns of a simulated eval, the right panel shows live mental-model scores and a chart of how they evolve:

Run conversations tab with mental model

Explore conversations — browse saved Spiral-Bench eval runs turn by turn with the mental model JSON per turn:

Explore chatlog view

Chart view — average mental-model scores across 20 turns per scenario category; this run shows a clear upward trend in validation_seeking and user_rightness:

Explore chart view with trend

Features

  • 💬 Chat interface for LLM conversations
  • 📊 Visualization panel with live mental-model scores and per-turn chart
  • 🧠 Multiple mental model types: Induct, Support, Structured, Person Perception
  • 🔁 Simulated eval runner (Spiral-Bench, 30 scenarios × 20 turns)
  • 📂 Explore saved runs: chatlog and chart views
  • 🔌 Azure OpenAI (GPT-4o), Google Gemini, and Llama (Vertex AI) support

Setup

  1. Install dependencies:
npm install
  1. Start the development server:
npm run dev
  1. Open your browser to the URL shown in the terminal (usually http://localhost:5173)

Azure API Integration

The app supports Azure OpenAI (GPT-4o) and Google Gemini. You can switch the API model in the UI (dropdown “API model”) or in the terminal with --api_gemini or --api_gpt-4o when running npm run run_eval.

  1. Copy the example environment file (if you have one) or create a .env file in the project root.

  2. Add your keys to .env:

Azure OpenAI (GPT-4o):

VITE_AZURE_ENDPOINT=your-actual-endpoint-here
VITE_AZURE_API_KEY=your-actual-api-key-here
VITE_AZURE_DEPLOYMENT=gpt-4o
VITE_AZURE_API_VERSION=2024-12-01-preview

Google Gemini (optional; use when API model = Gemini):

  • Browser / API key: Get an API key from Google AI Studio and set:
    VITE_GEMINI_API_KEY=your-gemini-api-key-here
    
  • CLI / Node (Vertex AI with service account): Use a Google Cloud service account JSON key (like the one you pasted) and set:
    GOOGLE_APPLICATION_CREDENTIALS=./new_key.json   # or another path to your JSON
    VITE_GEMINI_PROJECT_ID=your-gcp-project-id      # optional if JSON has project_id
    
    Optional: VITE_GEMINI_LOCATION=us-central1, VITE_GEMINI_MODEL=gemini-1.5-flash (or e.g. gemini-2.5-pro).

Default provider (optional):

# VITE_API_PROVIDER=gpt-4o
# or
# VITE_API_PROVIDER=gemini
  1. Replace placeholder values with your actual keys.

  2. Restart the development server for the changes to take effect.

How data is saved

When you run evals (from the UI or via npm run run_eval), results are written under the data/ folder. Every saved JSON now includes api_model (e.g. "gpt-4o" or "gemini") so you know which API produced the run.

Where things go

Mode Command / UI Save location
Single-call --single_call --model_induct (or --model_support_2) data/single_call/<model>/run_<api_model>_<N>/ → e.g. run_gemini_1, run_gpt-4o_2. One folder per run; inside it, one JSON per scenario (e.g. spiral_tropes/sc01.json).
Separate call --separate_call --convo_<N> --model_induct data/separate_call/convo_<N>/<model>/run_<api_model>_<N>/ → same structure (e.g. run_gemini_1).
Generate convos --generate_convo data/separate_call/convo_<N>/ → only user/assistant turns (no mental model); used later by separate_call.
Human data --human_data --model_induct --filename do_not_upload/h01.json data/do_not_upload/<filename_no_ext>/<filename_no_ext>_<api_model>_<mental_model_type>.json → e.g. h01_gemini_induct.json, h01_gpt-4o_induct.json. One file per (source, api model, mental model); re-running overwrites/resumes that file.
Backfill empty --backfill_empty --model_induct --file <path> Overwrites the given file, filling in missing mental models and updating meta.api_model.

What’s in the JSON

  • Scenario runs (single_call / separate_call): Each file has category, prompt_id, categoryInjection, extraInjection, api_model, turns, and situation_log. Each turn has turnIndex, userMessage, assistantMessage, and mentalModel.
  • Human data runs: Top-level meta (includes api_model, source, mentalModelType, turns_recorded_up_to, etc.) and turns (same turn shape as above).

So you can always see which API model was used for a run by checking api_model in the saved JSON.

Resuming a run after a crash or error

Single-call runs: If a run fails mid-way (e.g. API error, rate limit, or OAuth error), re-run the same command with --resume_run <runId>. Use the same --api_*, --seed, and --prior (if you used them) as the original run. The script loads existing scenario JSONs from the run folder, skips scenarios that already have 20 turns, and continues from the first incomplete scenario (re-running that scenario from the beginning, then the rest).

Example (run folder run_gemini_3_prior):

npm run run_eval -- --single_call --model_induct --resume_run run_gemini_3_prior --api_gemini --prior

If you used a seed, add it (e.g. --seed 42). The run ID is the folder name under data/single_call/<model>/, e.g. run_gemini_3_prior, run_gpt-4o_2.

Human data runs: The CLI writes a checkpoint file after each turn. Re-run the same --human_data --filename ... command; it will detect the checkpoint and resume from the next turn.

Project Structure

├── src/
│   ├── components/
│   │   ├── ChatInterface.jsx           # Chat UI
│   │   ├── ExploreConversations.jsx    # Browse saved eval runs (chatlog + chart)
│   │   └── VisualizationPanel.jsx      # Mental model scores + per-turn chart
│   ├── eval/
│   │   ├── categories.js              # Spiral-Bench category injections
│   │   ├── default_prompt.js          # Seeker LLM system prompt
│   │   ├── injections.js              # Per-scenario extra injections
│   │   ├── mental_model_prompts.js    # Prompt builders + response parsers
│   │   └── scenarios.js              # Spiral-Bench scenario list
│   ├── services/
│   │   └── api.js                     # LLM API calls (Azure, Gemini, Llama)
│   ├── App.jsx                        # Root component + eval orchestration
│   └── main.jsx                       # Entry point
├── scripts/
│   ├── run_eval.js                    # CLI eval runner (npm run run_eval)
│   └── generate-spiral-manifest.js   # Rebuild public data manifest
├── data/                              # Saved eval run JSONs (gitignored)
├── screenshots/                       # README screenshots
├── index.html
├── package.json
└── vite.config.js                     # Dev server + data middleware + build plugin

Customization

  • Mental model types: Add or modify prompts and parsers in src/eval/mental_model_prompts.js
  • Visualization Panel: Edit VisualizationPanel.jsx to add new score series or chart types
  • Eval scenarios: Swap in different scenarios via src/eval/scenarios.js and src/eval/categories.js
  • API providers: Add new LLM backends in src/services/api.js

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors