Press a toggle key, speak, and get instant text input. Built natively for Wayland/Hyprland - no X11 hacks or workarounds, just clean integration with modern Linux desktops.
- Toggle workflow: Press once to start recording, press again to stop and inject text
- Wayland native: Purpose-built for Wayland compositors - no legacy X11 dependencies or hacky workarounds
- Real-time feedback: Desktop notifications for recording states and transcription status
- Multiple transcription backends: OpenAI Whisper, Groq, Mistral Voxtral, and Eleven Labs Scribe (99 languages, excellent accuracy)
- Smart text injection: Clipboard save/restore with direct typing fallback
- Daemon architecture: Lightweight control plane with efficient pipeline management
Status: Beta - core functionality complete and tested, ready for early adopters
# Install hyprvoice and all dependencies automatically
yay -S hyprvoice-bin
# or
paru -S hyprvoice-bin
The AUR package automatically installs all dependencies (pipewire, wl-clipboard, wtype, etc.) and sets up the systemd service. Follow the post-install instructions to complete setup.
For non-Arch users or testing:
# Download and install binary
wget https://github.com/leonardotrapani/hyprvoice/releases/latest/download/hyprvoice-linux-x86_64
mkdir -p ~/.local/bin
mv hyprvoice-linux-x86_64 ~/.local/bin/hyprvoice
chmod +x ~/.local/bin/hyprvoice
# Add to PATH (add to ~/.bashrc or ~/.zshrc)
export PATH="$HOME/.local/bin:$PATH"
# You'll need to manually install dependencies and create systemd service
# See the Requirements section below
git clone https://github.com/leonardotrapani/hyprvoice.git
cd hyprvoice
go mod download
go build -o hyprvoice ./cmd/hyprvoice
# Install locally
mkdir -p ~/.local/bin
cp hyprvoice ~/.local/bin/
export PATH="$HOME/.local/bin:$PATH"
- Wayland desktop (Hyprland, Niri, GNOME, KDE, etc.)
- PipeWire audio system with tools
- API key for transcription: OpenAI, Groq, Mistral, or Eleven Labs API key (check each provider's pricing)
System packages (automatically installed with AUR package):
- pipewire, pipewire-pulse, pipewire-audio - Audio capture
- wl-clipboard - Clipboard integration
- wtype - Text typing (Wayland)
- ydotool - Text typing (universal, recommended for Chromium apps)
- libnotify - Desktop notifications
- systemd - User service management
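A quick way to confirm that the runtime tools are installed is to look each executable up on PATH. The sketch below is illustrative only and is not part of hyprvoice. The executable names are the ones used elsewhere in this README (pw-record, wl-copy, wl-paste, wtype, ydotool, notify-send); `requiredTools` and `missingTools` are hypothetical names, and only one of wtype or ydotool is strictly needed for typing.

```go
package main

import (
	"fmt"
	"os/exec"
)

// requiredTools lists the executables this README invokes directly.
// Which package ships each one varies by distro.
var requiredTools = []string{"pw-record", "wl-copy", "wl-paste", "wtype", "ydotool", "notify-send"}

// missingTools returns the names from tools that cannot be found on PATH.
func missingTools(tools []string) []string {
	var missing []string
	for _, name := range tools {
		if _, err := exec.LookPath(name); err != nil {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	missing := missingTools(requiredTools)
	if len(missing) == 0 {
		fmt.Println("all runtime tools found")
		return
	}
	fmt.Println("missing tools:", missing)
}
```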
For manual installation on other distros:
# Ubuntu/Debian
sudo apt install pipewire-pulse pipewire-bin wl-clipboard wtype ydotool libnotify-bin
# Fedora
sudo dnf install pipewire-utils wl-clipboard wtype ydotool libnotify
# For ydotool, you also need to start the daemon:
systemctl --user enable --now ydotool
# Or add user to input group for uinput access:
sudo usermod -aG input $USER
After installing via AUR:
- Configure hyprvoice interactively:
hyprvoice configure
This wizard will guide you through setting up your transcription provider, API key, audio preferences, and other settings.
- Enable and start the service:
systemctl --user enable --now hyprvoice.service
- Add keybinding to your window manager:
# For Hyprland, add to ~/.config/hypr/hyprland.conf
bind = SUPER, R, exec, hyprvoice toggle
- Test voice input:
# Check daemon status
hyprvoice status
# Toggle recording (or use your keybind)
hyprvoice toggle
# Speak something...
hyprvoice toggle # Stop and transcribe
# Interactive configuration wizard
hyprvoice configure
# Start the daemon
hyprvoice serve
# Toggle recording on/off
hyprvoice toggle
# Cancel current operation
hyprvoice cancel
# Check current status
hyprvoice status
# Get protocol version
hyprvoice version
# Stop the daemon (if not using systemd service)
hyprvoice stop
Most setups use this toggle pattern in window manager config:
bind = SUPER, R, exec, hyprvoice toggle
bind = SUPER SHIFT, C, exec, hyprvoice cancel # Optional: cancel current operation
Add to your ~/.config/hypr/hyprland.conf:
# Hyprvoice - Voice to Text (toggle recording)
bind = SUPER, R, exec, hyprvoice toggle
# Optional: Cancel current operation
bind = SUPER SHIFT, C, exec, hyprvoice cancel
# Optional: Status check
bind = SUPER SHIFT, R, exec, hyprvoice status && notify-send "Hyprvoice" "$(hyprvoice status)"
- Press keybind → Recording starts (notification appears)
- Speak your text → Audio captured in real-time
- Press keybind again → Recording stops, transcription begins
- Text appears → Injected at cursor position or clipboard
Cancel anytime: Press your cancel keybind (e.g., SUPER+SHIFT+C) to abort the current operation and return to idle.
# Start daemon manually (if not using systemd service)
hyprvoice serve
# In another terminal: toggle recording
hyprvoice toggle
# ... speak ...
hyprvoice toggle
# Check what's happening
hyprvoice status
Use the interactive configuration wizard:
hyprvoice configure
This will guide you through setting up:
- Transcription provider and API key
- Language preferences (auto-detect or specific language)
- Text injection method (clipboard/typing/fallback)
- Notification settings
- Recording timeout
Configuration is stored in ~/.config/hyprvoice/config.toml and can also be edited manually. Changes are applied immediately without restarting the daemon.
Hyprvoice supports multiple transcription backends:
Cloud-based transcription using OpenAI's Whisper API:
[transcription]
provider = "openai"
api_key = "sk-..." # Or set OPENAI_API_KEY environment variable
language = "" # Empty for auto-detect, or "en", "es", "fr", etc.
model = "whisper-1"
Features:
- High-quality transcription
- Supports 50+ languages
- Auto-detection or specify language for better accuracy
Fast cloud-based transcription using Groq's Whisper API:
[transcription]
provider = "groq-transcription"
api_key = "gsk_..." # Or set GROQ_API_KEY environment variable
language = "" # Empty for auto-detect, or "en", "es", "fr", etc.
model = "whisper-large-v3" # Or "whisper-large-v3-turbo" for faster processing
Features:
- Ultra-fast transcription (significantly faster than OpenAI)
- Same Whisper model quality
- Supports 50+ languages
- Free tier available with generous limits
Fast translation of audio to English using Groq's Whisper API:
[transcription]
provider = "groq-translation"
api_key = "gsk_..." # Or set GROQ_API_KEY environment variable
language = "es" # Optional: hint source language for better accuracy
model = "whisper-large-v3-turbo"
Features:
- Translates any language audio → English text
- Ultra-fast processing
- Language field hints at source language (improves accuracy)
- Always outputs English regardless of input language
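The `api_key` comments in the provider examples say the key can come either from the config file or from an environment variable. The sketch below shows one way that lookup could work. It is an assumption rather than hyprvoice's actual code: the precedence (config value first, environment second) is a guess, and `envVarFor` and `resolveAPIKey` are hypothetical names. Only the providers and variables named in those examples are covered.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// envVarFor maps a provider name to the environment variable mentioned in
// the config comments. Unknown providers return an empty string.
func envVarFor(provider string) string {
	switch {
	case provider == "openai":
		return "OPENAI_API_KEY"
	case strings.HasPrefix(provider, "groq-"):
		return "GROQ_API_KEY"
	default:
		return ""
	}
}

// resolveAPIKey returns the configured key if it is set, otherwise the value
// of the provider's environment variable.
// ASSUMPTION: the config file takes precedence over the environment.
func resolveAPIKey(provider, configured string, getenv func(string) string) string {
	if configured != "" {
		return configured
	}
	if name := envVarFor(provider); name != "" {
		return getenv(name)
	}
	return ""
}

func main() {
	key := resolveAPIKey("groq-transcription", "", os.Getenv)
	fmt.Println("key found:", key != "")
}
```

Passing `getenv` as a parameter keeps the function easy to exercise without touching the real environment.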
The daemon automatically creates ~/.config/hyprvoice/config.toml with helpful comments:
# Hyprvoice Configuration
# This file is automatically generated with defaults.
# Edit values as needed - changes are applied immediately without daemon restart.
# Audio Recording Configuration
[recording]
sample_rate = 16000 # Audio sample rate in Hz (16000 recommended for speech)
channels = 1 # Number of audio channels (1 = mono, 2 = stereo)
format = "s16" # Audio format (s16 = 16-bit signed integers)
buffer_size = 8192 # Internal buffer size in bytes (larger = less CPU, more latency)
device = "" # PipeWire audio device (empty = use default microphone)
channel_buffer_size = 30 # Audio frame buffer size (frames to buffer)
timeout = "5m" # Maximum recording duration (e.g., "30s", "2m", "5m")
# Speech Transcription Configuration
[transcription]
provider = "openai" # Transcription service: "openai", "groq-transcription", or "groq-translation"
api_key = "" # API key (or set OPENAI_API_KEY/GROQ_API_KEY environment variable)
language = "" # Language code (empty for auto-detect, "en", "it", "es", "fr", etc.)
model = "whisper-1" # Model: OpenAI="whisper-1", Groq="whisper-large-v3" or "whisper-large-v3-turbo"
# Text Injection Configuration
[injection]
backends = ["ydotool", "wtype", "clipboard"] # Ordered fallback chain
ydotool_timeout = "5s" # Timeout for ydotool commands
wtype_timeout = "5s" # Timeout for wtype commands
clipboard_timeout = "3s" # Timeout for clipboard operations
# Desktop Notification Configuration
[notifications]
enabled = true # Enable desktop notifications
type = "desktop" # Notification type ("desktop", "log", "none") -- always keep "desktop" unless debugging
Private, offline transcription using local models (whisper.cpp support is planned and not yet available):
[transcription]
provider = "whisper_cpp"
model_path = "~/models/ggml-base.en.bin"
threads = 4
Audio capture settings:
[recording]
sample_rate = 16000 # Audio sample rate in Hz
channels = 1 # Number of audio channels (1 for mono)
format = "s16" # Audio format (s16 recommended)
buffer_size = 8192 # Internal buffer size in bytes
device = "" # PipeWire device (empty for default)
channel_buffer_size = 30 # Audio frame buffer size
timeout = "5m" # Maximum recording duration (prevents runaway recordings)
Recording Timeout:
- Prevents accidental long recordings that could consume resources
- Default: 5 minutes ("5m")
- Format: Go duration strings like "30s", "2m", "10m"
- Recording automatically stops when timeout is reached
Configurable text injection with multiple backends:
[injection]
backends = ["ydotool", "wtype", "clipboard"] # Ordered fallback chain
ydotool_timeout = "5s"
wtype_timeout = "5s"
clipboard_timeout = "3s"
Injection Backends:
- ydotool: Uses ydotool (requires the ydotoold daemon for ydotool v1.0.0+). Most compatible with Chromium/Electron apps.
- wtype: Uses wtype for Wayland. May have issues with some Chromium-based apps (known upstream bug).
- clipboard: Copies text to clipboard only. Most reliable, but requires manual paste.
Fallback Chain:
Backends are tried in order. The first successful one wins. Example configurations:
# Clipboard only (safest, always works)
backends = ["clipboard"]
# wtype with clipboard fallback
backends = ["wtype", "clipboard"]
# Full fallback chain (default) - best compatibility
backends = ["ydotool", "wtype", "clipboard"]
# ydotool only (if you have it set up)
backends = ["ydotool"]
ydotool Setup:
ydotool requires the ydotoold daemon running (for ydotool v1.0.0+) and access to /dev/uinput:
# Start ydotool daemon (systemd)
systemctl --user enable --now ydotool
# Or add user to input group
sudo usermod -aG input $USER
# Then logout/login
# For Hyprland, add to config to set correct keyboard layout:
# device:ydotoold-virtual-device {
# kb_layout = us
# }
Behavior:
- Backends are tried in order until one succeeds
- Include clipboard in the chain if you want text copied to clipboard as fallback
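The ordered fallback behavior can be sketched in a few lines of Go. This is a minimal illustration, not hyprvoice's implementation: the `injector` type, `injectWithFallback`, and the stub backends are hypothetical, and the real backends shell out to external tools with the timeouts shown in the config.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// injector is a hypothetical signature for one injection backend.
type injector func(text string) error

// errUnavailable stands in for any backend failure in this sketch.
var errUnavailable = errors.New("backend unavailable")

// injectWithFallback tries the named backends in order and returns the name
// of the first one that succeeds.
func injectWithFallback(order []string, backends map[string]injector, text string) (string, error) {
	if len(order) == 0 {
		return "", errors.New("no injection backends configured")
	}
	var failures []string
	for _, name := range order {
		inject, ok := backends[name]
		if !ok {
			failures = append(failures, name+": unknown backend")
			continue
		}
		if err := inject(text); err != nil {
			failures = append(failures, name+": "+err.Error())
			continue
		}
		return name, nil
	}
	return "", fmt.Errorf("all injection backends failed: %s", strings.Join(failures, "; "))
}

func main() {
	// Stub backends: the two typing tools fail, the clipboard succeeds.
	backends := map[string]injector{
		"ydotool":   func(string) error { return errUnavailable },
		"wtype":     func(string) error { return errUnavailable },
		"clipboard": func(string) error { return nil },
	}
	used, err := injectWithFallback([]string{"ydotool", "wtype", "clipboard"}, backends, "hello")
	fmt.Println(used, err) // prints "clipboard <nil>"
}
```

Collecting every failure message, rather than only the last one, makes it easier to see why the chain fell through to a later backend.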
Desktop notification settings:
[notifications]
enabled = true # Enable/disable notifications
type = "desktop" # "desktop", "log", or "none"
Notification Types:
- desktop: Use notify-send for desktop notifications
- log: Log messages to console only
- none: Disable all notifications
Always keep type = "desktop" unless debugging.
You can customize notification text via the [notifications.messages] section.
[notifications.messages]
[notifications.messages.recording_started]
title = "Hyprvoice"
body = "Recording Started"
[notifications.messages.transcribing]
title = "Hyprvoice"
body = "Recording Ended... Transcribing"
[notifications.messages.config_reloaded]
title = "Hyprvoice"
body = "Config Reloaded"
[notifications.messages.operation_cancelled]
title = "Hyprvoice"
body = "Operation Cancelled"
[notifications.messages.recording_aborted]
body = "Recording Aborted"
[notifications.messages.injection_aborted]
body = "Injection Aborted"
The daemon automatically watches the config file for changes and applies them immediately:
- Notification settings: Applied instantly
- Injection settings: Applied to current and future operations
- Recording/Transcription settings: Applied to new recording sessions
- Invalid configs: Rejected with error notification, daemon continues with previous config
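The "reject invalid config, keep the previous one" rule can be sketched as a small pure function. Everything here is hypothetical: the `config` struct holds only two fields for illustration, and the validation rules are assumptions, since this README does not list what the daemon actually checks.

```go
package main

import (
	"errors"
	"fmt"
)

// config is a hypothetical subset of the settings in config.toml.
type config struct {
	Provider string
	Backends []string
}

// validate applies illustrative rules only; the real checks may differ.
func validate(c config) error {
	if c.Provider == "" {
		return errors.New("transcription.provider must not be empty")
	}
	if len(c.Backends) == 0 {
		return errors.New("injection.backends must list at least one backend")
	}
	return nil
}

// reload returns the config that should stay active. An invalid candidate is
// rejected and the current config is returned together with the error.
func reload(current, candidate config) (config, error) {
	if err := validate(candidate); err != nil {
		return current, fmt.Errorf("config rejected, keeping previous: %w", err)
	}
	return candidate, nil
}

func main() {
	active := config{Provider: "openai", Backends: []string{"ydotool", "wtype", "clipboard"}}
	// The candidate has no backends, so the active config is kept.
	active, err := reload(active, config{Provider: "openai"})
	fmt.Println(active.Provider, len(active.Backends), err)
}
```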
The systemd user service is automatically installed with the AUR package:
# Check service status
systemctl --user status hyprvoice.service
# Start/stop service
systemctl --user start hyprvoice.service
systemctl --user stop hyprvoice.service
# Enable/disable autostart
systemctl --user enable hyprvoice.service
systemctl --user disable hyprvoice.service
# View logs
journalctl --user -u hyprvoice.service -f
- Socket: ~/.cache/hyprvoice/control.sock - IPC communication
- PID file: ~/.cache/hyprvoice/hyprvoice.pid - Process tracking
- Config: ~/.config/hyprvoice/config.toml - User settings (planned)
| Component | Status | Notes |
|---|---|---|
| Core daemon & IPC | ✅ | Unix socket control plane |
| Recording workflow | ✅ | Toggle recording via PipeWire |
| Audio capture | ✅ | Efficient PipeWire integration |
| Desktop notifications | ✅ | Status feedback via notify-send |
| OpenAI transcription | ✅ | HTTP API integration |
| Groq transcription | ✅ | Fast Whisper API with transcription and translation |
| Text injection | ✅ | Clipboard + wtype with fallback |
| Configuration system | ✅ | TOML-based user settings with hot-reload |
| Interactive setup | ✅ | hyprvoice configure wizard for easy setup |
| Unit test coverage | ✅ | Comprehensive test suite (100% pass) |
| CI/CD Pipeline | ✅ | Automated builds and releases via GitHub Actions |
| Installation (AUR etc) | ✅ | AUR package with automated dependency installation |
| Light dictation models | ⏳ | Alternatives to Whisper for light and fast dictation |
| whisper.cpp support | ⏳ | Local model inference |
Legend: ✅ Complete · ⏳ Planned
Hyprvoice uses a daemon + pipeline architecture for efficient resource management:
- Control Daemon: Lightweight IPC server managing lifecycle
- Pipeline: Stateful audio processing (recording → transcribing → injecting)
- State Machine: idle → recording → transcribing → injecting → idle
flowchart LR
subgraph Client
CLI["CLI/Tool"]
end
subgraph Daemon
D["Control Daemon (lifecycle + IPC)"]
end
subgraph Pipeline
A["Audio Capture"]
T["Transcribing"]
I["Injecting (wtype + clipboard)"]
end
N["notify-send/log"]
CLI -- unix socket --> D
D -- start/stop --> A
A -- frames --> T
T -- status --> D
D -- events --> N
D -- inject action --> T
T --> I
I -->|done| D
stateDiagram-v2
[*] --> idle
idle --> recording: toggle
recording --> transcribing: first_frame
transcribing --> injecting: inject_action
injecting --> idle: done
recording --> idle: abort
injecting --> idle: abort
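The state diagram maps directly onto a transition table. The Go sketch below encodes exactly the edges drawn in the diagram; the type and function names are hypothetical, and the actual pipeline code may be organized differently. The diagram shows no abort edge out of transcribing, so the table omits it as well.

```go
package main

import "fmt"

type state string
type event string

const (
	idle         state = "idle"
	recording    state = "recording"
	transcribing state = "transcribing"
	injecting    state = "injecting"
)

const (
	toggle       event = "toggle"
	firstFrame   event = "first_frame"
	injectAction event = "inject_action"
	done         event = "done"
	abort        event = "abort"
)

// edge is a (current state, event) pair used as the lookup key.
type edge struct {
	from state
	on   event
}

// transitions mirrors the edges of the state diagram one to one.
var transitions = map[edge]state{
	{idle, toggle}:               recording,
	{recording, firstFrame}:      transcribing,
	{transcribing, injectAction}: injecting,
	{injecting, done}:            idle,
	{recording, abort}:           idle,
	{injecting, abort}:           idle,
}

// next returns the state reached from s on event e, or an error if the
// diagram has no such edge (the state is left unchanged in that case).
func next(s state, e event) (state, error) {
	to, ok := transitions[edge{s, e}]
	if !ok {
		return s, fmt.Errorf("no transition from %q on %q", s, e)
	}
	return to, nil
}

func main() {
	s := idle
	for _, e := range []event{toggle, firstFrame, injectAction, done} {
		to, err := next(s, e)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("%s --%s--> %s\n", s, e, to)
		s = to
	}
}
```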
- Toggle recording → Pipeline starts, audio capture begins
- Audio streaming → PipeWire frames buffered for transcription
- Toggle stop → Recording ends, transcription starts
- Text injection → Result typed or copied to clipboard
- Return to idle → Pipeline cleaned up, ready for next session
- toggle (daemon) → create pipeline → recording
- First frame arrives → transcribing (daemon may notify Transcribing later)
- Audio frames → audio buffer (collect all audio during session)
- Second toggle during transcribing → send inject action → transcribe collected audio → injecting (simulated)
- Complete → idle; pipeline stops; daemon clears reference
- Notifications at key transitions
Daemon won't start:
# Check if already running
hyprvoice status
# Check for stale files
ls -la ~/.cache/hyprvoice/
# Clean up and restart
rm -f ~/.cache/hyprvoice/hyprvoice.pid
rm -f ~/.cache/hyprvoice/control.sock
hyprvoice serve
Command not found:
# Check installation
which hyprvoice
# Add to PATH if using ~/.local/bin
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
No audio recording:
# Check PipeWire is running
systemctl --user status pipewire
# Test microphone
pw-record --help
pw-record test.wav
# Check microphone permissions and levels
Audio device issues:
# List available audio devices
pw-cli list-objects | grep -A5 -B5 Audio
# Check microphone is not muted in system settings
No desktop notifications:
# Test notify-send directly
notify-send "Test" "This is a test notification"
# Install if missing
sudo pacman -S libnotify # Arch
sudo apt install libnotify-bin # Ubuntu/Debian
Text not appearing:
- Ensure cursor is in a text field when toggling off recording
- Check that wtype and wl-clipboard tools are installed:
# Test wtype directly
wtype "test text"
# Test clipboard tools
echo "test" | wl-copy
wl-paste
- Verify Wayland compositor supports text input protocols
- Check injection mode in configuration (fallback mode is most robust)
Clipboard issues:
# Install wl-clipboard if missing
sudo pacman -S wl-clipboard # Arch
sudo apt install wl-clipboard # Ubuntu/Debian
# Test clipboard functionality
wl-copy "test text"
wl-paste
# Run daemon with verbose output
hyprvoice serve
# Check logs from systemd service (or just see results from hyprvoice serve)
journalctl --user -u hyprvoice.service -f
# Test individual commands
hyprvoice toggle
hyprvoice status
git clone https://github.com/leonardotrapani/hyprvoice.git
cd hyprvoice
go mod download
go build -o hyprvoice ./cmd/hyprvoice
# Install locally
mkdir -p ~/.local/bin
cp hyprvoice ~/.local/bin/
export PATH="$HOME/.local/bin:$PATH"
See packaging/RELEASE.md for complete release process including AUR deployment.
Quick start for AUR:
# After creating your first GitHub release
cd packaging/
./setup-aur.sh # One-time AUR repository setup
hyprvoice/
├── cmd/hyprvoice/ # CLI application entry point
├── internal/
│ ├── bus/ # IPC (Unix socket) + PID management
│ ├── daemon/ # Control daemon (lifecycle management)
│ ├── injection/ # Text injection (clipboard + wtype)
│ ├── notify/ # Desktop notification integration
│ ├── pipeline/ # Audio processing pipeline + state machine
│ ├── recording/ # PipeWire audio capture
│ └── transcriber/ # Transcription adapters (OpenAI, whisper.cpp)
├── go.mod # Go module definition
└── README.md
# Terminal 1: Run daemon with logs
go run ./cmd/hyprvoice serve
# Terminal 2: Test commands
go run ./cmd/hyprvoice toggle
go run ./cmd/hyprvoice status
go run ./cmd/hyprvoice stop
Simple single-character commands over Unix socket:
- t - Toggle recording on/off
- c - Cancel current operation
- s - Get current status
- v - Get protocol version
- q - Quit daemon gracefully
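For anyone scripting against the daemon, a client for this protocol could look roughly like the sketch below. Several details are assumptions, because this README does not specify the wire framing: the request is sent as the command byte followed by a newline, and the reply is read as a single line. `commandFor` and `sendCommand` are hypothetical helpers; the socket path is the one listed under the service files.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"os"
	"path/filepath"
	"strings"
	"time"
)

// commands maps descriptive names to the single-character commands above.
var commands = map[string]byte{
	"toggle": 't', "cancel": 'c', "status": 's', "version": 'v', "quit": 'q',
}

// commandFor looks up the wire byte for a descriptive command name.
func commandFor(name string) (byte, error) {
	b, ok := commands[name]
	if !ok {
		return 0, fmt.Errorf("unknown command %q", name)
	}
	return b, nil
}

// sendCommand writes one command byte and reads one reply line.
// ASSUMPTION: newline-terminated request and reply; the real framing may differ.
func sendCommand(socketPath string, cmd byte) (string, error) {
	conn, err := net.DialTimeout("unix", socketPath, 2*time.Second)
	if err != nil {
		return "", fmt.Errorf("connect to daemon: %w", err)
	}
	defer conn.Close()
	if err := conn.SetDeadline(time.Now().Add(5 * time.Second)); err != nil {
		return "", fmt.Errorf("set deadline: %w", err)
	}
	if _, err := conn.Write([]byte{cmd, '\n'}); err != nil {
		return "", fmt.Errorf("send command: %w", err)
	}
	reply, err := bufio.NewReader(conn).ReadString('\n')
	if err != nil && reply == "" {
		return "", fmt.Errorf("read reply: %w", err)
	}
	return strings.TrimSpace(reply), nil
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Println("cannot locate home directory:", err)
		return
	}
	sock := filepath.Join(home, ".cache", "hyprvoice", "control.sock")
	cmd, _ := commandFor("status")
	reply, err := sendCommand(sock, cmd)
	if err != nil {
		fmt.Println("daemon not reachable:", err)
		return
	}
	fmt.Println(reply)
}
```

In normal use the `hyprvoice` CLI already wraps these commands, so a custom client is only worth writing for integrations the CLI does not cover.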
Contributions welcome! Please:
- Follow existing code conventions and patterns
- Add tests for new functionality when available
- Update documentation for user-facing changes
- Test on Hyprland/Wayland before submitting PRs
MIT License - see LICENSE.md for details.