Hyprvoice - Voice-Powered Typing for Hyprland / Wayland

Press a toggle key, speak, and get instant text input. Built natively for Wayland/Hyprland - no X11 hacks or workarounds, just clean integration with modern Linux desktops.

Features

Toggle workflow: Press once to start recording, press again to stop and inject text
Wayland native: Purpose-built for Wayland compositors - no legacy X11 dependencies or hacky workarounds
Real-time feedback: Desktop notifications for recording states and transcription status
Multiple transcription backends: OpenAI Whisper, Groq, Mistral Voxtral, and ElevenLabs Scribe (99 languages, excellent accuracy)
Smart text injection: Clipboard save/restore with direct typing fallback
Daemon architecture: Lightweight control plane with efficient pipeline management

Status: Beta - core functionality complete and tested, ready for early adopters

Installation

From AUR (Arch Linux) - Recommended

# Install hyprvoice and all dependencies automatically
yay -S hyprvoice-bin
# or
paru -S hyprvoice-bin

The AUR package automatically installs all dependencies (pipewire, wl-clipboard, wtype, etc.) and sets up the systemd service. Follow the post-install instructions to complete setup.

Alternative: Download Binary

For non-Arch users or testing:

# Download and install binary
wget https://github.com/leonardotrapani/hyprvoice/releases/latest/download/hyprvoice-linux-x86_64
mkdir -p ~/.local/bin
mv hyprvoice-linux-x86_64 ~/.local/bin/hyprvoice
chmod +x ~/.local/bin/hyprvoice

# Add to PATH (add to ~/.bashrc or ~/.zshrc)
export PATH="$HOME/.local/bin:$PATH"

# You'll need to install the dependencies manually and create a systemd user service
# See the Requirements section below

Build from Source

git clone https://github.com/leonardotrapani/hyprvoice.git
cd hyprvoice
go mod download
go build -o hyprvoice ./cmd/hyprvoice

# Install locally
mkdir -p ~/.local/bin
cp hyprvoice ~/.local/bin/
export PATH="$HOME/.local/bin:$PATH"

Requirements

Wayland desktop (Hyprland, Niri, GNOME, KDE, etc.)
PipeWire audio system with tools
API key for a transcription provider: OpenAI, Groq, Mistral, or ElevenLabs (check each provider's pricing)

System packages (automatically installed with AUR package):

pipewire, pipewire-pulse, pipewire-audio - Audio capture
wl-clipboard - Clipboard integration
wtype - Text typing (Wayland)
ydotool - Text typing (universal, recommended for Chromium apps)
libnotify - Desktop notifications
systemd - User service management

For manual installation on other distros:

# Ubuntu/Debian
sudo apt install pipewire-pulse pipewire-bin wl-clipboard wtype ydotool libnotify-bin

# Fedora
sudo dnf install pipewire-utils wl-clipboard wtype ydotool libnotify

# ydotool also needs its daemon (ydotoold) running:
systemctl --user enable --now ydotool
# The daemon needs access to /dev/uinput; if access is denied, add your user to the input group, then log out and back in:
sudo usermod -aG input $USER

Quick Start

After installing via AUR:

Configure hyprvoice interactively:

hyprvoice configure

This wizard will guide you through setting up your transcription provider, API key, audio preferences, and other settings.

Enable and start the service:

systemctl --user enable --now hyprvoice.service

Add keybinding to your window manager:

# For Hyprland, add to ~/.config/hypr/hyprland.conf
bind = SUPER, R, exec, hyprvoice toggle

Test voice input:

# Check daemon status
hyprvoice status

# Toggle recording (or use your keybind)
hyprvoice toggle
# Speak something...
hyprvoice toggle  # Stop and transcribe

Quick Reference

Common Commands

# Interactive configuration wizard
hyprvoice configure

# Start the daemon
hyprvoice serve

# Toggle recording on/off
hyprvoice toggle

# Cancel current operation
hyprvoice cancel

# Check current status
hyprvoice status

# Get protocol version
hyprvoice version

# Stop the daemon (if not using systemd service)
hyprvoice stop

Keybinding Pattern

Most setups use this toggle pattern in window manager config:

bind = SUPER, R, exec, hyprvoice toggle
bind = SUPER SHIFT, C, exec, hyprvoice cancel  # Optional: cancel current operation

Keyboard Shortcuts Setup

Hyprland

Add to your ~/.config/hypr/hyprland.conf:

# Hyprvoice - Voice to Text (toggle recording)
bind = SUPER, R, exec, hyprvoice toggle

# Optional: Cancel current operation
bind = SUPER SHIFT, C, exec, hyprvoice cancel

# Optional: Status check
bind = SUPER SHIFT, R, exec, hyprvoice status && notify-send "Hyprvoice" "$(hyprvoice status)"

Usage Examples

Basic Toggle Workflow

Press keybind → Recording starts (notification appears)
Speak your text → Audio captured in real-time
Press keybind again → Recording stops, transcription begins
Text appears → Injected at the cursor position or copied to the clipboard

Cancel anytime: Press your cancel keybind (e.g., SUPER+SHIFT+C) to abort the current operation and return to idle.

CLI Usage

# Start daemon manually (if not using systemd service)
hyprvoice serve

# In another terminal: toggle recording
hyprvoice toggle
# ... speak ...
hyprvoice toggle

# Check what's happening
hyprvoice status

Configuration

Use the interactive configuration wizard:

hyprvoice configure

This will guide you through setting up:

Transcription provider and API key
Language preferences (auto-detect or specific language)
Text injection backends (ydotool, wtype, clipboard) and their fallback order
Notification settings
Recording timeout

Configuration is stored in ~/.config/hyprvoice/config.toml and can also be edited manually. Changes are applied immediately without restarting the daemon.

Transcription Providers

Hyprvoice supports multiple transcription backends:

OpenAI Whisper API

Cloud-based transcription using OpenAI's Whisper API:

[transcription]
provider = "openai"
api_key = "sk-..."              # Or set OPENAI_API_KEY environment variable
language = ""                   # Empty for auto-detect, or "en", "es", "fr", etc.
model = "whisper-1"

Features:

High-quality transcription
Supports 50+ languages
Auto-detection or specify language for better accuracy

Groq Whisper API (Transcription)

Fast cloud-based transcription using Groq's Whisper API:

[transcription]
provider = "groq-transcription"
api_key = "gsk_..."             # Or set GROQ_API_KEY environment variable
language = ""                   # Empty for auto-detect, or "en", "es", "fr", etc.
model = "whisper-large-v3"      # Or "whisper-large-v3-turbo" for faster processing

Features:

Ultra-fast transcription (significantly faster than OpenAI)
Same Whisper model quality
Supports 50+ languages
Free tier available with generous limits

Groq Translation API

Fast translation of audio to English using Groq's Whisper API:

[transcription]
provider = "groq-translation"
api_key = "gsk_..."             # Or set GROQ_API_KEY environment variable
language = "es"                 # Optional: hint source language for better accuracy
model = "whisper-large-v3-turbo"

Features:

Translates any language audio → English text
Ultra-fast processing
Language field hints at source language (improves accuracy)
Always outputs English regardless of input language

Generated Configuration Example

The daemon automatically creates ~/.config/hyprvoice/config.toml with helpful comments:

# Hyprvoice Configuration
# This file is automatically generated with defaults.
# Edit values as needed - changes are applied immediately without daemon restart.

# Audio Recording Configuration
[recording]
  sample_rate = 16000          # Audio sample rate in Hz (16000 recommended for speech)
  channels = 1                 # Number of audio channels (1 = mono, 2 = stereo)
  format = "s16"               # Audio format (s16 = 16-bit signed integers)
  buffer_size = 8192           # Internal buffer size in bytes (larger = less CPU, more latency)
  device = ""                  # PipeWire audio device (empty = use default microphone)
  channel_buffer_size = 30     # Audio frame buffer size (frames to buffer)
  timeout = "5m"               # Maximum recording duration (e.g., "30s", "2m", "5m")

# Speech Transcription Configuration
[transcription]
  provider = "openai"          # Transcription service: "openai", "groq-transcription", or "groq-translation"
  api_key = ""                 # API key (or set OPENAI_API_KEY/GROQ_API_KEY environment variable)
  language = ""                # Language code (empty for auto-detect, "en", "it", "es", "fr", etc.)
  model = "whisper-1"          # Model: OpenAI="whisper-1", Groq="whisper-large-v3" or "whisper-large-v3-turbo"

# Text Injection Configuration
[injection]
  backends = ["ydotool", "wtype", "clipboard"]  # Ordered fallback chain
  ydotool_timeout = "5s"       # Timeout for ydotool commands
  wtype_timeout = "5s"         # Timeout for wtype commands
  clipboard_timeout = "3s"     # Timeout for clipboard operations

# Desktop Notification Configuration
[notifications]
  enabled = true               # Enable desktop notifications
  type = "desktop"             # Notification type ("desktop", "log", "none") -- always keep "desktop" unless debugging

whisper.cpp Local (Planned - Not Yet Implemented)

Private, offline transcription using local models:

[transcription]
provider = "whisper_cpp"
model_path = "~/models/ggml-base.en.bin"
threads = 4

Recording Configuration

Audio capture settings:

[recording]
sample_rate = 16000        # Audio sample rate in Hz
channels = 1               # Number of audio channels (1 for mono)
format = "s16"             # Audio format (s16 recommended)
buffer_size = 8192         # Internal buffer size in bytes
device = ""                # PipeWire device (empty for default)
channel_buffer_size = 30   # Audio frame buffer size
timeout = "5m"             # Maximum recording duration (prevents runaway recordings)

Recording Timeout:

Prevents accidental long recordings that could consume resources
Default: 5 minutes ("5m")
Format: Go duration strings like "30s", "2m", "10m"
Recording automatically stops when timeout is reached
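
To make the timeout format concrete, here is a small sketch that parses a timeout string with Go's standard time.ParseDuration and estimates how much raw audio a full-length recording would produce at the default settings (16000 Hz, mono, 16-bit samples). The helper names are hypothetical, and the size is only an upper bound on uncompressed PCM, not a statement about how the daemon stores audio.

```go
package main

import (
	"fmt"
	"time"
)

// parseTimeout parses a timeout value such as "30s", "2m" or "5m".
func parseTimeout(s string) (time.Duration, error) {
	d, err := time.ParseDuration(s)
	if err != nil {
		return 0, fmt.Errorf("invalid timeout %q: %w", s, err)
	}
	if d <= 0 {
		return 0, fmt.Errorf("timeout must be positive, got %q", s)
	}
	return d, nil
}

// maxRawBytes estimates the uncompressed PCM size of a full-length recording.
// bytesPerSample is 2 for the "s16" format.
func maxRawBytes(sampleRate, channels, bytesPerSample int, timeout time.Duration) int64 {
	bytesPerSecond := int64(sampleRate * channels * bytesPerSample)
	return bytesPerSecond * int64(timeout/time.Second)
}

func main() {
	d, err := parseTimeout("5m")
	if err != nil {
		fmt.Println(err)
		return
	}
	// 16000 samples/s * 1 channel * 2 bytes = 32000 bytes/s, times 300 s.
	fmt.Println(d, maxRawBytes(16000, 1, 2, d)) // prints: 5m0s 9600000
}
```

At the defaults, a recording that runs into the 5-minute timeout is therefore under 10 MB of raw audio.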

Text Injection

Configurable text injection with multiple backends:

[injection]
backends = ["ydotool", "wtype", "clipboard"]  # Ordered fallback chain
ydotool_timeout = "5s"
wtype_timeout = "5s"
clipboard_timeout = "3s"

Injection Backends:

ydotool: Uses ydotool (requires ydotoold daemon for ydotool v1.0.0+). Most compatible with Chromium/Electron apps.
wtype: Uses wtype for Wayland. May have issues with some Chromium-based apps (known upstream bug).
clipboard: Copies text to clipboard only. Most reliable, but requires manual paste.

Fallback Chain:

Backends are tried in order. The first successful one wins. Example configurations:

# Clipboard only (safest, always works)
backends = ["clipboard"]

# wtype with clipboard fallback
backends = ["wtype", "clipboard"]

# Full fallback chain (default) - best compatibility
backends = ["ydotool", "wtype", "clipboard"]

# ydotool only (if you have it set up)
backends = ["ydotool"]

ydotool Setup:

ydotool requires the ydotoold daemon running (for ydotool v1.0.0+) and access to /dev/uinput:

# Start ydotool daemon (systemd)
systemctl --user enable --now ydotool

# If ydotoold cannot open /dev/uinput, add your user to the input group
sudo usermod -aG input $USER
# Then log out and back in

# For Hyprland, add to config to set correct keyboard layout:
# device:ydotoold-virtual-device {
#     kb_layout = us
# }

Behavior:

Backends are tried in order until one succeeds
Include clipboard in the chain if you want text copied to clipboard as fallback
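
The ordered fallback described above can be sketched in a few lines of Go. This is a minimal illustration under stated assumptions: the Backend type and injectWithFallback function are hypothetical, and the fake backends in main stand in for the real ydotool, wtype, and clipboard calls.

```go
package main

import (
	"errors"
	"fmt"
)

// Backend is a hypothetical stand-in for an injection backend.
type Backend struct {
	Name   string
	Inject func(text string) error
}

// injectWithFallback tries each backend in order and returns the name of the
// first one that succeeds. If all fail, the individual errors are joined.
func injectWithFallback(backends []Backend, text string) (string, error) {
	var errs []error
	for _, b := range backends {
		if err := b.Inject(text); err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", b.Name, err))
			continue
		}
		return b.Name, nil
	}
	if len(errs) == 0 {
		return "", errors.New("no injection backends configured")
	}
	return "", errors.Join(errs...)
}

func main() {
	chain := []Backend{
		{Name: "ydotool", Inject: func(string) error { return errors.New("ydotoold not running") }},
		{Name: "wtype", Inject: func(string) error { return nil }},
		{Name: "clipboard", Inject: func(string) error { return nil }},
	}
	used, err := injectWithFallback(chain, "hello world")
	fmt.Println(used, err) // prints: wtype <nil>
}
```

Because the first success ends the loop, clipboard is only reached when every backend listed before it has failed.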

Notifications

Desktop notification settings:

[notifications]
enabled = true             # Enable/disable notifications
type = "desktop"           # "desktop", "log", or "none"

Notification Types:

desktop: Use notify-send for desktop notifications
log: Log messages to console only
none: Disable all notifications

Always keep type = "desktop" unless debugging.
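
As a rough sketch of how the two settings interact, the Go snippet below routes a message to notify-send, the log, or nowhere. The type and function names are hypothetical. It also makes two assumptions the text above does not confirm: that enabled = false suppresses messages regardless of type, and that an unrecognized type is treated as "desktop".

```go
package main

import (
	"fmt"
	"log"
	"os/exec"
)

// Notifications mirrors the [notifications] table.
type Notifications struct {
	Enabled bool
	Type    string // "desktop", "log", or "none"
}

// sink reports where a message would go for the given settings.
func sink(n Notifications) string {
	if !n.Enabled || n.Type == "none" {
		return "none"
	}
	if n.Type == "log" {
		return "log"
	}
	return "desktop" // assumption: anything else behaves like "desktop"
}

// notify delivers a message according to the settings.
func notify(n Notifications, title, body string) error {
	switch sink(n) {
	case "desktop":
		// notify-send takes the summary and body as positional arguments.
		return exec.Command("notify-send", title, body).Run()
	case "log":
		log.Printf("%s: %s", title, body)
	}
	return nil
}

func main() {
	n := Notifications{Enabled: true, Type: "log"}
	if err := notify(n, "Hyprvoice", "Recording Started"); err != nil {
		fmt.Println("notify failed:", err)
	}
}
```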

Custom Notification Messages

You can customize notification text via the [notifications.messages] section.

[notifications.messages]
  [notifications.messages.recording_started]
    title = "Hyprvoice"
    body = "Recording Started"
  [notifications.messages.transcribing]
    title = "Hyprvoice"
    body = "Recording Ended... Transcribing"
  [notifications.messages.config_reloaded]
    title = "Hyprvoice"
    body = "Config Reloaded"
  [notifications.messages.operation_cancelled]
    title = "Hyprvoice"
    body = "Operation Cancelled"
  [notifications.messages.recording_aborted]
    body = "Recording Aborted"
  [notifications.messages.injection_aborted]
    body = "Injection Aborted"

Configuration Hot-Reloading

The daemon automatically watches the config file for changes and applies them immediately:

Notification settings: Applied instantly
Injection settings: Applied to current and future operations
Recording/Transcription settings: Applied to new recording sessions
Invalid configs: Rejected with error notification, daemon continues with previous config
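
The "reject invalid configs, keep the previous one" behavior can be sketched as below. This illustrates the pattern only, not the daemon's implementation: the Config struct holds a small hypothetical subset of the settings, and the validation rules (known provider, non-empty backend list) are assumptions chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// Config holds a small, illustrative subset of the settings shown earlier.
type Config struct {
	Provider string
	Backends []string
}

// validate rejects configs that this sketch considers unusable.
func validate(c Config) error {
	switch c.Provider {
	case "openai", "groq-transcription", "groq-translation":
	default:
		return fmt.Errorf("unknown provider %q", c.Provider)
	}
	if len(c.Backends) == 0 {
		return errors.New("injection.backends must not be empty")
	}
	return nil
}

// reload returns the config to use after a file change: the candidate if it
// validates, otherwise the previous config together with an error.
func reload(current, candidate Config) (Config, error) {
	if err := validate(candidate); err != nil {
		return current, fmt.Errorf("config rejected, keeping previous: %w", err)
	}
	return candidate, nil
}

func main() {
	active := Config{Provider: "openai", Backends: []string{"wtype", "clipboard"}}
	active, err := reload(active, Config{Provider: "typo", Backends: []string{"clipboard"}})
	fmt.Println(active.Provider, err)
}
```

Returning the old config alongside the error lets the caller both keep running and surface the problem, for example as an error notification.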

Service Management

The systemd user service is automatically installed with the AUR package:

# Check service status
systemctl --user status hyprvoice.service

# Start/stop service
systemctl --user start hyprvoice.service
systemctl --user stop hyprvoice.service

# Enable/disable autostart
systemctl --user enable hyprvoice.service
systemctl --user disable hyprvoice.service

# View logs
journalctl --user -u hyprvoice.service -f

File Locations

Socket: ~/.cache/hyprvoice/control.sock - IPC communication
PID file: ~/.cache/hyprvoice/hyprvoice.pid - Process tracking
Config: ~/.config/hyprvoice/config.toml - User settings

Development Status

Component	Status	Notes
Core daemon & IPC	✅	Unix socket control plane
Recording workflow	✅	Toggle recording via PipeWire
Audio capture	✅	Efficient PipeWire integration
Desktop notifications	✅	Status feedback via notify-send
OpenAI transcription	✅	HTTP API integration
Groq transcription	✅	Fast Whisper API with transcription and translation
Text injection	✅	ydotool, wtype, and clipboard with ordered fallback
Configuration system	✅	TOML-based user settings with hot-reload
Interactive setup	✅	`hyprvoice configure` wizard for easy setup
Unit test coverage	✅	Comprehensive test suite (100% pass)
CI/CD Pipeline	✅	Automated builds and releases via GitHub Actions
Installation (AUR etc)	✅	AUR package with automated dependency installation
Light dictation models	⏳	Alternatives to Whisper for light, fast dictation
whisper.cpp support	⏳	Local model inference

Legend: ✅ Complete · ⏳ Planned

Architecture Overview

Hyprvoice uses a daemon + pipeline architecture for efficient resource management:

Control Daemon: Lightweight IPC server managing lifecycle
Pipeline: Stateful audio processing (recording → transcribing → injecting)
State Machine: idle → recording → transcribing → injecting → idle

System Architecture

flowchart LR
  subgraph Client
    CLI["CLI/Tool"]
  end
  subgraph Daemon
    D["Control Daemon (lifecycle + IPC)"]
  end
  subgraph Pipeline
    A["Audio Capture"]
    T["Transcribing"]
    I["Injecting (ydotool/wtype/clipboard)"]
  end
  N["notify-send/log"]

  CLI -- unix socket --> D
  D -- start/stop --> A
  A -- frames --> T
  T -- status --> D
  D -- events --> N
  D -- inject action --> T
  T --> I
  I -->|done| D

stateDiagram-v2
  [*] --> idle
  idle --> recording: toggle
  recording --> transcribing: first_frame
  transcribing --> injecting: inject_action
  injecting --> idle: done
  recording --> idle: abort
  injecting --> idle: abort
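
The state diagram above translates directly into a transition table. The Go sketch below encodes exactly the edges drawn in the diagram. The type and function names are hypothetical, and the real pipeline may handle more events than are shown here.

```go
package main

import "fmt"

type State string
type Event string

const (
	Idle         State = "idle"
	Recording    State = "recording"
	Transcribing State = "transcribing"
	Injecting    State = "injecting"
)

// transitions mirrors the edges of the state diagram above.
var transitions = map[State]map[Event]State{
	Idle:         {"toggle": Recording},
	Recording:    {"first_frame": Transcribing, "abort": Idle},
	Transcribing: {"inject_action": Injecting},
	Injecting:    {"done": Idle, "abort": Idle},
}

// next returns the state reached from s on event e, or an error if the
// diagram has no such edge.
func next(s State, e Event) (State, error) {
	if to, ok := transitions[s][e]; ok {
		return to, nil
	}
	return s, fmt.Errorf("no transition from %q on %q", s, e)
}

func main() {
	s := Idle
	for _, e := range []Event{"toggle", "first_frame", "inject_action", "done"} {
		var err error
		if s, err = next(s, e); err != nil {
			fmt.Println(err)
			return
		}
		fmt.Println(e, "->", s)
	}
}
```

Keeping the edges in one table makes illegal transitions (for example, done while idle) fail loudly instead of being silently ignored.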

How It Works

Toggle recording → Pipeline starts, audio capture begins
Audio streaming → PipeWire frames buffered for transcription
Toggle stop → Recording ends, transcription starts
Text injection → Result typed or copied to clipboard
Return to idle → Pipeline cleaned up, ready for next session

Data Flow

toggle (daemon) → create pipeline → recording
First frame arrives → transcribing (daemon may notify Transcribing later)
Audio frames → audio buffer (collect all audio during session)
Second toggle during transcribing → send inject action → transcribe collected audio → injecting (simulated)
Complete → idle; pipeline stops; daemon clears reference
Notifications at key transitions

Troubleshooting

Common Issues

Daemon Issues

Daemon won't start:

# Check if already running
hyprvoice status

# Check for stale files
ls -la ~/.cache/hyprvoice/

# Clean up and restart
rm -f ~/.cache/hyprvoice/hyprvoice.pid
rm -f ~/.cache/hyprvoice/control.sock
hyprvoice serve

Command not found:

# Check installation
which hyprvoice

# Add to PATH if using ~/.local/bin
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc

Audio Issues

No audio recording:

# Check PipeWire is running
systemctl --user status pipewire

# Test microphone
pw-record --help
pw-record test.wav

# Check microphone permissions and levels

Audio device issues:

# List available audio devices
pw-cli list-objects | grep -A5 -B5 Audio

# Check microphone is not muted in system settings

Notification Issues

No desktop notifications:

# Test notify-send directly
notify-send "Test" "This is a test notification"

# Install if missing
sudo pacman -S libnotify  # Arch
sudo apt install libnotify-bin  # Ubuntu/Debian

Text Injection Issues

Text not appearing:

Ensure cursor is in a text field when toggling off recording

Check that wtype and wl-clipboard tools are installed:

# Test wtype directly
wtype "test text"

# Test clipboard tools
echo "test" | wl-copy
wl-paste

Verify Wayland compositor supports text input protocols
Check the injection backends in your configuration (the full fallback chain is the most robust)

Clipboard issues:

# Install wl-clipboard if missing
sudo pacman -S wl-clipboard  # Arch
sudo apt install wl-clipboard  # Ubuntu/Debian

# Test clipboard functionality
wl-copy "test text"
wl-paste

Debug Mode

# Run the daemon in the foreground to see its log output
hyprvoice serve

# Follow logs from the systemd service (or watch the output of hyprvoice serve)
journalctl --user -u hyprvoice.service -f

# Test individual commands
hyprvoice toggle
hyprvoice status

Development

Building from Source

git clone https://github.com/leonardotrapani/hyprvoice.git
cd hyprvoice
go mod download
go build -o hyprvoice ./cmd/hyprvoice

# Install locally
mkdir -p ~/.local/bin
cp hyprvoice ~/.local/bin/
export PATH="$HOME/.local/bin:$PATH"

For Maintainers

Publishing to AUR

See packaging/RELEASE.md for complete release process including AUR deployment.

Quick start for AUR:

# After creating your first GitHub release
cd packaging/
./setup-aur.sh    # One-time AUR repository setup

Project Structure

hyprvoice/
├── cmd/hyprvoice/         # CLI application entry point
├── internal/
│   ├── bus/              # IPC (Unix socket) + PID management
│   ├── daemon/           # Control daemon (lifecycle management)
│   ├── injection/        # Text injection (ydotool, wtype, clipboard)
│   ├── notify/           # Desktop notification integration
│   ├── pipeline/         # Audio processing pipeline + state machine
│   ├── recording/        # PipeWire audio capture
│   └── transcriber/      # Transcription adapters (OpenAI, Groq; whisper.cpp planned)
├── go.mod                # Go module definition
└── README.md

Development Workflow

# Terminal 1: Run daemon with logs
go run ./cmd/hyprvoice serve

# Terminal 2: Test commands
go run ./cmd/hyprvoice toggle
go run ./cmd/hyprvoice status
go run ./cmd/hyprvoice stop

IPC Protocol

Simple single-character commands over Unix socket:

t - Toggle recording on/off
c - Cancel current operation
s - Get current status
v - Get protocol version
q - Quit daemon gracefully
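
For illustration, a minimal Go client for this protocol might look like the sketch below. The socket path comes from the File Locations section and the command characters from the list above. Everything else is an assumption: the function names are hypothetical, the mapping from CLI subcommand names to characters is inferred, and the framing (one byte sent, reply read until the daemon closes the connection) is a guess rather than documented behavior.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"os"
	"path/filepath"
	"time"
)

// commands maps CLI-style names to the single-character protocol commands.
var commands = map[string]byte{
	"toggle": 't', "cancel": 'c', "status": 's', "version": 'v', "stop": 'q',
}

// commandFor looks up the protocol byte for a command name.
func commandFor(name string) (byte, bool) {
	b, ok := commands[name]
	return b, ok
}

// send writes one command byte to the socket and returns whatever the daemon
// writes back. Assumption: the daemon replies and then closes the connection.
func send(socketPath string, cmd byte) (string, error) {
	conn, err := net.DialTimeout("unix", socketPath, 2*time.Second)
	if err != nil {
		return "", fmt.Errorf("connect to daemon: %w", err)
	}
	defer conn.Close()
	// Bound the whole exchange so a stuck daemon cannot hang the client.
	if err := conn.SetDeadline(time.Now().Add(5 * time.Second)); err != nil {
		return "", err
	}
	if _, err := conn.Write([]byte{cmd}); err != nil {
		return "", err
	}
	reply, err := io.ReadAll(conn)
	return string(reply), err
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Println(err)
		return
	}
	sock := filepath.Join(home, ".cache", "hyprvoice", "control.sock")
	cmd, _ := commandFor("status")
	reply, err := send(sock, cmd)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(reply)
}
```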

Contributing

Contributions welcome! Please:

Follow existing code conventions and patterns
Add tests for new functionality where possible
Update documentation for user-facing changes
Test on Hyprland/Wayland before submitting PRs

License

MIT License - see LICENSE.md for details.
