danielrosehill/Diarised-Transcript-Assistant

Diarised-Transcript-Assistant

Lightweight prompt-and-schema toolkit for producing diarised transcripts and meeting‑style summaries from audio recordings using an LLM with speech‑to‑text (works especially well with Gemini). The approach favors clear, deterministic prompting and simple diarisation via speaker–gender matching instead of heavy diarisation pipelines.

Overview

This project provides system prompts and a structured output schema that guide an AI assistant to:

  • Convert audio to text and attribute turns to named speakers using a descriptive mapping (e.g., “male voice is Daniel, female voice is Sarah”).
  • Apply minimal, non‑substantive textual edits to improve intelligibility (remove fillers, break up long turns) while preserving meaning.
  • Produce both a concise business‑style summary and a readable, diarised transcript with optional emotion/voice cues.
  • Optionally emit a machine‑parseable JSON block that mirrors the transcript and metadata for downstream use (minutes, redaction, analytics).

Primary use case: generating meeting minutes for business contexts, often in a form that will later be redacted, where clarity, consistency, and structured output are crucial.

Why This Approach

  • Lightweight diarisation: map speakers by descriptive gender/identity hints rather than running separate speaker‑embedding pipelines.
  • Deterministic prompting: encourages reproducible, well‑formatted transcripts and summaries.
  • Redaction‑friendly: structured output simplifies redacting sensitive fields while keeping the readable transcript intact.

Repository Structure

.
├─ README.md                      # You are here
├─ examples/
│  └─ weather_call.md             # Sample formatted output (illustrative)
└─ system-prompts/
   ├─ assistant.md                # Human-readable transcript + summary prompt
   └─ structured/
      ├─ prompt.md                # JSON-first + Markdown transcript prompt
      └─ schema.json              # JSON schema for structured output

Quick Start (Gemini)

  1. Choose your mode
     • Human‑readable only: use system-prompts/assistant.md.
     • JSON + Markdown: use system-prompts/structured/prompt.md (its JSON output is meant to conform to system-prompts/structured/schema.json).
  2. Provide inputs
     • Audio recording of the conversation (upload or reference).
     • Speaker mapping (for diarisation): e.g., “the male voice is Daniel; the female voice is Sarah.” Include any known roles if helpful.
     • Optional context: date, local time and UTC, medium (e.g., WhatsApp, in‑person), recording method, microphone, location.
  3. Run the prompt
     • In Gemini, start a new chat and paste the chosen system prompt in full.
     • Upload/attach the audio.
     • Add a short user message with your speaker mapping and context (see Example User Message).
  4. Receive output
     • The assistant returns a descriptive title, optional context note, a particulars table, a concise summary, and a diarised transcript.
     • In structured mode, a fenced JSON block appears at the top, followed by the readable Markdown.

Example User Message

Use this alongside the appropriate system prompt.

Audio: <uploaded file>
Participants: The male voice is Daniel; the female voice is Sarah.
Context: Recorded March 2, 2025, 10:00 AM IST (08:00 UTC), face-to-face in Daniel’s office. Device: Zoom H1n, mic: built-in stereo.
Goal: Return the diarised transcript and a concise summary per the format.

For JSON + Markdown, explicitly request “structured output per schema followed by readable transcript.”
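
If you send many recordings, it may help to assemble this message programmatically so the layout stays identical between runs. The following is a minimal sketch, not part of the repository: `build_user_message` and `DEFAULT_GOAL` are hypothetical names, and the line layout simply mirrors the example message above.

```python
# Hypothetical helper (not shipped with this repository): builds the short
# user message that accompanies the audio upload.
DEFAULT_GOAL = "Return the diarised transcript and a concise summary per the format."


def build_user_message(speakers, context=None, goal=None):
    """Return the user message as a single string.

    speakers: dict mapping a voice description to a name,
              e.g. {"male voice": "Daniel"}.
    context:  optional free-text recording details (date, medium, device).
    goal:     optional override for the default goal line.
    """
    if not speakers:
        raise ValueError("at least one speaker mapping is required")

    mapping = "; ".join(f"the {voice} is {name}" for voice, name in speakers.items())
    # Capitalise only the first letter so participant names stay untouched.
    mapping = mapping[0].upper() + mapping[1:] + "."

    lines = ["Audio: <uploaded file>", f"Participants: {mapping}"]
    if context:
        lines.append(f"Context: {context}")
    lines.append(f"Goal: {goal or DEFAULT_GOAL}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_user_message(
        {"male voice": "Daniel", "female voice": "Sarah"},
        context="Face-to-face in Daniel's office. Device: Zoom H1n.",
    ))
```

For structured mode, pass the "structured output per schema followed by readable transcript" request as `goal`.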

Output Format (Readable)

The prompts are designed to return:

  • Title: short, descriptive.
  • Optional italicized context note.
  • Call particulars: a Markdown table (participants, date, local/UTC time, medium, recording method, microphone, location).
  • Separator: ---.
  • Summary: 3–6 neutral sentences focused on key points and outcomes.
  • Transcript: diarised turns with **Speaker** names, optional emotion in parentheses (only when strongly noticeable), aggressive paragraphing for readability, and blank lines between turns.

See system-prompts/assistant.md for full formatting rules and an example.
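
To make the transcript layout concrete, here is a small sketch that renders a list of turns in roughly the style described above. It is illustrative only: the exact punctuation after the speaker name and the placement of the emotion cue are assumptions, and system-prompts/assistant.md remains the authoritative reference. `render_turn` and `render_transcript` are hypothetical names.

```python
def render_turn(turn):
    """Render one turn as '**Speaker** (emotion): text'.

    The emotion cue is only included when present, matching the rule that
    it appears only when strongly noticeable. The colon separator is an
    assumption made for this sketch.
    """
    label = f"**{turn['speaker']}**"
    emotion = turn.get("emotion")
    if emotion:
        label = f"{label} ({emotion})"
    return f"{label}: {turn['text']}"


def render_transcript(turns):
    """Join rendered turns with a blank line between them."""
    return "\n\n".join(render_turn(turn) for turn in turns)


if __name__ == "__main__":
    sample = [
        {"speaker": "Daniel", "emotion": None, "text": "Did you see the forecast?"},
        {"speaker": "Sarah", "emotion": "laughing", "text": "Rain again, apparently."},
    ]
    print(render_transcript(sample))
```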

Structured Output (JSON)

When using system-prompts/structured/prompt.md, the assistant first emits a fenced JSON block conforming to system-prompts/structured/schema.json, then the readable Markdown transcript.

Top‑level JSON fields include: title, context_note, particulars (participants, date/time, medium, recording_method, microphone, location), summary, and an array transcript with { speaker, emotion, text } entries. Unknown metadata should be null rather than omitted.

This enables downstream processing (e.g., minutes generation, analytics, selective redaction).
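
As a starting point for such processing, the sketch below pulls the first fenced JSON block out of a model response and checks that the top-level fields listed above are present. It is a rough sanity check under stated assumptions, not a replacement for validating against system-prompts/structured/schema.json: `extract_structured` is a hypothetical name, and the key set is taken from the field list in this README rather than from the schema file itself.

```python
import json
import re

# Matches the first fenced block, with or without a "json" info string.
# `{3} is used instead of three literal backticks so this example can sit
# inside a Markdown fence.
JSON_BLOCK = re.compile(r"`{3}(?:json)?\s*\n(.*?)\n`{3}", re.DOTALL)

# Top-level fields as described in this README (assumed to match the schema).
REQUIRED_KEYS = {"title", "context_note", "particulars", "summary", "transcript"}


def extract_structured(response_text):
    """Return the parsed JSON object from the first fenced block.

    Raises ValueError if no block is found or a top-level key is missing.
    Keys whose value is null still count as present, since unknown
    metadata is expected to be null rather than omitted.
    """
    match = JSON_BLOCK.search(response_text)
    if match is None:
        raise ValueError("no fenced JSON block found in response")
    data = json.loads(match.group(1))
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing top-level keys: {sorted(missing)}")
    return data
```

For strict validation, a JSON Schema validator run against the schema file would be the more thorough option.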

Diarisation Strategy

  • Provide a descriptive mapping of voices to identities/genders in your user message.
  • The assistant attributes each turn using this mapping; emotion notes are optional and only added if strongly noticeable.
  • This trades complex speaker‑embedding pipelines for a prompt‑driven, explainable approach suitable for many business calls with distinct voices.

Limitations:

  • Overlapping speech, very similar voices, or missing/ambiguous mappings can cause occasional misattribution.
  • If roles or names change mid‑call, include clarifications in your message.
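
One cheap way to catch some of these problems in structured mode is to compare the speaker names in the returned transcript with the names you supplied. The sketch below assumes the `{ speaker, emotion, text }` entry shape described in this README; `unexpected_speakers` is a hypothetical helper. It cannot detect a turn attributed to the wrong known speaker, only names that were never in your mapping.

```python
def unexpected_speakers(turns, expected_names):
    """Return speaker names that appear in the transcript but were not
    part of the supplied mapping, in order of first appearance."""
    expected = set(expected_names)
    unexpected = []
    for turn in turns:
        name = turn["speaker"]
        if name not in expected and name not in unexpected:
            unexpected.append(name)
    return unexpected


if __name__ == "__main__":
    turns = [
        {"speaker": "Daniel", "emotion": None, "text": "Morning."},
        {"speaker": "Speaker 3", "emotion": None, "text": "Sorry I'm late."},
        {"speaker": "Sarah", "emotion": None, "text": "No problem."},
    ]
    # Any name printed here deserves a manual listen to the recording.
    print(unexpected_speakers(turns, ["Daniel", "Sarah"]))
```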

Redaction & Intelligibility

  • The assistant applies only non‑substantive edits: removes filler words and obvious false starts; adds paragraph breaks for readability; preserves meaning.
  • For redaction, use the JSON to locate sensitive fields or instruct the model to replace protected items with [REDACTED] in both JSON and Markdown.
  • Always review generated content for compliance and correctness before distribution.
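
If you prefer to redact after the fact rather than in the prompt, a simple post-processing pass over the JSON is one option. The following is a minimal sketch under stated assumptions: `redact` is a hypothetical helper, it relies on the `summary` and `transcript[].text` fields described in this README, and it only does literal, case-insensitive term matching, so it will miss paraphrases, misspellings, and anything not on your list.

```python
import copy
import re


def redact(data, protected_terms, placeholder="[REDACTED]"):
    """Return a copy of the structured output with protected terms replaced
    in the summary and in every transcript turn. The input is not modified."""
    redacted = copy.deepcopy(data)
    if not protected_terms:
        return redacted

    # Longest terms first so "Acme Corp" wins over a shorter "Acme".
    ordered = sorted(protected_terms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(term) for term in ordered), re.IGNORECASE)

    if redacted.get("summary"):
        redacted["summary"] = pattern.sub(placeholder, redacted["summary"])
    for turn in redacted.get("transcript", []):
        turn["text"] = pattern.sub(placeholder, turn["text"])
    return redacted
```

The readable Markdown would need the same replacement applied separately, or can be regenerated from the redacted JSON.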

Tips for Deterministic Results

  • Paste the entire system prompt verbatim.
  • Provide precise participant mapping and any known roles (e.g., “Daniel, male, Product Manager; Sarah, female, Legal”).
  • Supply time, medium, and recording details; they improve the particulars table and title.
  • Keep your user message concise and structured (see example above).

Roadmap Ideas

  • Optional local STT pre‑pass (e.g., Whisper) with the same formatting prompts.
  • Automatic turn segmentation with VAD before prompting the LLM.
  • Simple UI or CLI to package prompt + audio + mapping in one step.
  • Lightweight speaker naming assistance (name suggestions from few-shot cues).

Contributing

Issues and PRs are welcome. Keep prompt changes minimal, additive, and well‑justified; avoid introducing non‑deterministic behavior. For schema updates, propose changes in system-prompts/structured/schema.json with examples.

License

No license file is currently provided. If you plan to use or distribute this repository, consider adding a license appropriate to your needs.
