# Cleanvoice AI

> Cleanvoice is a REST API for automated audio and podcast editing. It removes filler words ("um", "uh"), long silences, stutters, mouth sounds, and breathing from recordings. It also supports noise reduction, studio sound processing, loudness normalization, transcription (Whisper, ~99 languages), chapter summarization, and social media content generation.

## Integration choice (read first)

- **Default for developers:** Use the **official Python SDK** or **JavaScript SDK**. They wrap the same REST API: one high-level call handles upload, job polling, and downloading the result.
- **When to use REST instead:** stacks other than Python or JavaScript, curl/shell scripts, or cases where you need full manual control over each HTTP request.
- **For AI assistants:** If the user’s project is Python or Node/JavaScript, generate integration code from the **SDK quick start** for that language. Do **not** default to REST endpoint documentation unless the stack is not covered by an SDK or the user asked for raw HTTP.

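- Base URL: https://api.cleanvoice.ai/v2 (the endpoint paths below include their version prefix and are relative to https://api.cleanvoice.ai)
- Authentication: `X-API-Key` header
- Interactive API reference: https://api.cleanvoice.ai/docs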

## Docs

- [Introduction](https://docs.cleanvoice.ai/docs/v2)
- [Python SDK — Quick Start](https://docs.cleanvoice.ai/docs/v2/python/quick-start)
- [Python SDK — Configuration Reference](https://docs.cleanvoice.ai/docs/v2/python/configuration)
- [Python SDK — Uploads](https://docs.cleanvoice.ai/docs/v2/python/uploads)
- [Python SDK — Authentication](https://docs.cleanvoice.ai/docs/v2/python/authentication)
- [Python SDK — Languages](https://docs.cleanvoice.ai/docs/v2/python/languages)
- [Python SDK — Recommendations](https://docs.cleanvoice.ai/docs/v2/python/scalability)
- [Python SDK — SDK Reference](https://docs.cleanvoice.ai/docs/v2/python/sdk-reference)
- [JavaScript SDK — Quick Start](https://docs.cleanvoice.ai/docs/v2/javascript/quick-start)
- [JavaScript SDK — Configuration Reference](https://docs.cleanvoice.ai/docs/v2/javascript/configuration)
- [JavaScript SDK — Uploads](https://docs.cleanvoice.ai/docs/v2/javascript/uploads)
- [JavaScript SDK — Authentication](https://docs.cleanvoice.ai/docs/v2/javascript/authentication)
- [JavaScript SDK — Languages](https://docs.cleanvoice.ai/docs/v2/javascript/languages)
- [JavaScript SDK — Recommendations](https://docs.cleanvoice.ai/docs/v2/javascript/scalability)
- [JavaScript SDK — SDK Reference](https://docs.cleanvoice.ai/docs/v2/javascript/sdk-reference)
- [REST API — Quick Start](https://docs.cleanvoice.ai/docs/v2/rest/quick-start)
- [REST API — Configuration Reference](https://docs.cleanvoice.ai/docs/v2/rest/configuration)
- [REST API — Uploads](https://docs.cleanvoice.ai/docs/v2/rest/uploads)
- [REST API — Create an Edit](https://docs.cleanvoice.ai/docs/v2/rest/edits/create)
- [REST API — Retrieve an Edit](https://docs.cleanvoice.ai/docs/v2/rest/edits/retrieve)
- [REST API — Delete Files](https://docs.cleanvoice.ai/docs/v2/rest/delete-files)
- [REST API — Languages](https://docs.cleanvoice.ai/docs/v2/rest/languages)
- [REST API — Rate Limits](https://docs.cleanvoice.ai/docs/v2/rest/rate-limits)
- [REST API — Recommendations](https://docs.cleanvoice.ai/docs/v2/rest/scalability)
- [Make.com Integration](https://docs.cleanvoice.ai/docs/v2/make)
- [n8n Integration](https://docs.cleanvoice.ai/docs/v2/n8n)

## API endpoints

- POST /v2/edits — submit an edit job; returns { id }
- GET /v2/edits/{edit_id} — poll for status and result; the response identifies the job in a `task_id` field rather than `id`
- DELETE /v2/edits/{edit_id} — delete an edit and its associated files
- POST /v2/upload?filename=... — get a signed upload URL for a local file
- GET /v1/account — verify API key and inspect remaining credit
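For stacks without an SDK, these endpoints can be called with any HTTP client. The sketch below uses only the Python standard library. `build_request` and `send` are hypothetical helper names, and the sketch assumes the versioned paths are appended to the host `https://api.cleanvoice.ai`.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Assumption: versioned paths such as /v2/edits are appended to this host.
API_HOST = "https://api.cleanvoice.ai"


def build_request(method: str, path: str, api_key: str,
                  body: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
    """Build (but do not send) a request authenticated with the X-API-Key header."""
    headers = {"X-API-Key": api_key}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(API_HOST + path, data=data,
                                  headers=headers, method=method)


def send(req: urllib.request.Request) -> Dict[str, Any]:
    """Send the request and decode the JSON response (raises HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `send(build_request("GET", "/v1/account", api_key))` would verify a key. Keeping request construction separate from sending makes the header and body logic easy to test without network access.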

## Edit lifecycle

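PENDING → PREPROCESSING → CLASSIFICATION → EDITING → POSTPROCESSING → EXPORT → SUCCESS (`result.download_url` available)

A job can also end in FAILURE instead of SUCCESS, or pass through RETRY before resuming. RETRY is not a final state, so keep polling.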
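A poller only needs to know which of these statuses are final. This minimal sketch treats SUCCESS and FAILURE as terminal and every other status, including RETRY, as still in progress. It assumes the polled JSON reports the state in a top-level `status` field; that field name is an assumption, not confirmed by this page.

```python
from typing import Any, Dict, Optional

# Final states from the lifecycle above; every other status (PENDING ... EXPORT,
# and RETRY) is treated as "still running, keep polling".
TERMINAL_STATUSES = {"SUCCESS", "FAILURE"}


def is_terminal(status: str) -> bool:
    """True once the edit has finished, successfully or not."""
    return status in TERMINAL_STATUSES


def download_url(edit: Dict[str, Any]) -> Optional[str]:
    """Return result.download_url for a successful edit, else None.

    Assumes the polled JSON exposes the state as a top-level "status" field.
    """
    if edit.get("status") != "SUCCESS":
        return None
    return (edit.get("result") or {}).get("download_url")
```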

## Create an edit — request body

```json
{
  "input": {
    "files": ["https://example.com/episode.mp3"],
    "config": {
      "fillers": true,
      "long_silences": true,
      "normalize": true
    }
  }
}
```

Multi-track (interview with separate tracks): pass multiple URLs in `files` and set `"upload_type": "multitrack"`.
Batch processing (multiple independent files): submit one POST /v2/edits per file — do NOT pass multiple files in one request unless they are multi-track.
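As a sketch of the batch rule, the hypothetical helper below builds one independent request body per file, each with its own copy of the shared config. A multi-track job would instead list every track in a single `files` array.

```python
from typing import Any, Dict, List


def batch_edit_bodies(file_urls: List[str], config: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Build one POST /v2/edits body per independent file.

    Each body gets its own shallow copy of the config so later tweaks to one
    job do not leak into the others.
    """
    return [{"input": {"files": [url], "config": dict(config)}} for url in file_urls]
```

Each body would then be sent as its own POST /v2/edits request.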

## Upload a local file

`POST /v2/upload?filename=episode.mp3` returns:

```json
{
  "signedUrl": "https://storage.example.com/episode.mp3?signature=..."
}
```

Upload the file with an HTTP PUT to `signedUrl`, then strip the query string from `signedUrl` and pass the resulting file URL in `input.files` when calling `POST /v2/edits`.
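The upload flow can be sketched with the standard library. `file_url_from_signed` and `build_put` are hypothetical helper names. The sketch assumes the signed URL accepts the raw file bytes as the PUT body; if the URL was signed for a specific content type, pass that type so the header matches.

```python
import urllib.request
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def file_url_from_signed(signed_url: str) -> str:
    """Drop the query string (the signature) to get the URL for input.files."""
    parts = urlsplit(signed_url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def build_put(signed_url: str, audio: bytes,
              content_type: Optional[str] = None) -> urllib.request.Request:
    """Build the PUT that uploads the raw file bytes to the signed URL."""
    headers = {"Content-Type": content_type} if content_type else {}
    return urllib.request.Request(signed_url, data=audio, headers=headers, method="PUT")


# Usage (performs network I/O, not run here):
# with open("episode.mp3", "rb") as f:
#     urllib.request.urlopen(build_put(signed_url, f.read()))
# files = [file_url_from_signed(signed_url)]
```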

## Deliver result to your own storage

Pass a pre-signed PUT URL as `signed_url` inside `input.config`. Cleanvoice will PUT the cleaned file directly to your storage (S3, GCS, etc.) instead of hosting it.

```json
{
  "input": {
    "files": ["https://example.com/episode.mp3"],
    "config": {
      "signed_url": "https://your-bucket.s3.amazonaws.com/cleaned.mp3?X-Amz-Signature=...",
      "fillers": true
    }
  }
}
```
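A small helper can attach `signed_url` to an existing request body without mutating the original. `with_delivery` is a hypothetical name, and the `https://` check is a defensive choice of this sketch, not a documented API requirement.

```python
import copy
from typing import Any, Dict


def with_delivery(body: Dict[str, Any], presigned_put_url: str) -> Dict[str, Any]:
    """Return a copy of an edit body that asks Cleanvoice to PUT the result to your storage."""
    if not presigned_put_url.startswith("https://"):
        # Defensive check of this sketch only.
        raise ValueError("expected an https:// pre-signed PUT URL")
    out = copy.deepcopy(body)
    out["input"].setdefault("config", {})["signed_url"] = presigned_put_url
    return out
```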

## Configuration options (REST / Python / JavaScript)

### Audio cleaning (all default: false)
- fillers / fillers=True / fillers: true — remove filler words ("um", "uh", "like", etc.)
- long_silences / long_silences=True / long_silences: true — trim long pauses
- mouth_sounds / mouth_sounds=True / mouth_sounds: true — remove clicks, lip smacks
- breath / breath=True / breath: true — remove audible breathing. Accepts: true (recommended), "legacy" (conservative, for clean audio), "natural" (lighter, preserves more breathing feel), false (disabled)
- stutters / stutters=True / stutters: true — remove repeated word fragments
- hesitations / hesitations=True / hesitations: true — remove short hesitation sounds that aren't full filler words
- muted / muted=True / muted: true — silence the edited regions instead of cutting them, preserving the original timing and duration

### Audio enhancement
- remove_noise / remove_noise=True / remove_noise: true — reduce background noise. On by default; pass false to disable
- studio_sound / studio_sound=True / studio_sound: true — aggressive studio-quality enhancement. Accepts: true (recommended), "nightly" (advanced/experimental, currently similar to true), false (disabled, default)
- normalize / normalize=True / normalize: true — loudness normalization (default: false)
- keep_music / keep_music=True / keep_music: true — preserve music sections during noise reduction (default: false)
- autoeq / autoeq=True / autoeq: true — legacy automatic EQ. Prefer studio_sound; autoeq will be removed in a future release (default: false)
- mute_lufs / mute_lufs=-120 / mute_lufs: -120 — gate level (in LUFS) for loudness measurement. The default, -120, effectively disables gating
- target_lufs / target_lufs=-16 / target_lufs: -16 — target loudness in LUFS. -16 LUFS is a widely used podcast target

### Output
- export_format / export_format="mp3" / export_format: "mp3" — audio-only output format: mp3 | wav | flac | m4a | opus | aac | auto (default: auto, matches input). Video jobs keep the original container format
- video / video=True / video: true — must be set to true for video editing. The SDKs auto-detect video from the file extension, but setting it explicitly is safer. Without it, video input is processed as audio only
- merge / merge=True / merge: true — merge multi-track files into a single output. Only for multi-track editing
- audio_for_edl / audio_for_edl=True / audio_for_edl: true — return a separate audio track alongside video output (video workflows only)
- signed_url — pre-signed PUT URL to deliver result directly to your storage (S3, GCS, etc.)

### Content generation (all default: false)
- transcription / transcription=True / transcription: true — full transcript (Whisper, auto-detects language)
- summarize / summarize=True / summarize: true — chapters and key learnings (auto-enables transcription)
- social_content / social_content=True / social_content: true — social media posts (auto-enables summarize)
- export_timestamps / export_timestamps=True / export_timestamps: true — include timestamps in edit results
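Several of these options accept only specific values, so a client-side check can catch typos before a job is submitted. The sketch below encodes the allowed values and the video-only and legacy notes from the list above. `check_config` is a hypothetical helper; the API's own validation remains authoritative.

```python
from typing import Any, Dict, List

# Allowed values as listed in the configuration options above.
ALLOWED_VALUES = {
    "breath": {True, False, "legacy", "natural"},
    "studio_sound": {True, False, "nightly"},
    "export_format": {"mp3", "wav", "flac", "m4a", "opus", "aac", "auto"},
}


def check_config(config: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a config dict (empty list = OK)."""
    problems = []
    for key, allowed in ALLOWED_VALUES.items():
        if key in config and config[key] not in allowed:
            problems.append(f"{key}: unsupported value {config[key]!r}")
    if config.get("audio_for_edl") and not config.get("video"):
        problems.append("audio_for_edl only applies to video workflows (set video=True)")
    if config.get("autoeq"):
        problems.append("autoeq is legacy and slated for removal; prefer studio_sound")
    return problems
```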

## Recommendations

### How to choose audio enhancement settings (in order)
1. Start with studio_sound=True, the recommended default for most recordings; it produces studio-quality output.
2. Always add normalize=True to keep loudness consistent across the file.
3. If studio_sound is too aggressive for a particular recording, use remove_noise=True instead.

### Optional cleanup settings
Add these only if the user wants to remove those elements:
- fillers=True — remove "um", "uh", filler words
- long_silences=True — trim long pauses
- breath=True — remove audible breathing. Use for most recordings.
- breath="legacy" — use instead of True when the audio is already relatively clean. More conservative removal, fewer artifacts.
- stutters=True, mouth_sounds=True — optional extras

### After processing
After processing, call result.audio.download("filename.mp3") to save the cleaned file, or read result.audio.url to get its URL.

### Which method to use
Use client.process() in both the Python and JavaScript SDKs. It submits the job and waits for the result in one call. Use create_edit / createEdit with get_edit / getEdit only when manual polling is explicitly needed.

### Mode summary
- breath: true (most audio), "legacy" (conservative, for already-clean recordings), "natural" (lighter touch), false (disabled, default)
- studio_sound: true (recommended), "nightly" (advanced/experimental, similar to true), false (disabled, default). autoeq is legacy; use studio_sound instead
- muted: set to true to preserve original timing. Edits are silenced instead of cut, so the file keeps its original duration. Useful when syncing with video timelines or subtitle files

## Common presets

- Audio enhancement (default recommendation): studio_sound + normalize
- Audio enhancement (if studio_sound is too aggressive): remove_noise + normalize
- Full podcast edit: studio_sound + normalize + fillers + long_silences + breath (or breath="legacy" for clean audio)
- Transcript only: transcription
- Full analysis: transcription + summarize + social_content
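These presets map naturally onto plain config dictionaries. In this sketch, `PRESETS` and `preset()` are hypothetical names; the `clean_audio` flag swaps in breath="legacy" as the full podcast edit preset suggests.

```python
from typing import Any, Dict

# Presets from the list above, expressed as config dictionaries.
PRESETS: Dict[str, Dict[str, Any]] = {
    "enhance": {"studio_sound": True, "normalize": True},
    "enhance_gentle": {"remove_noise": True, "normalize": True},
    "full_podcast": {"studio_sound": True, "normalize": True, "fillers": True,
                     "long_silences": True, "breath": True},
    "transcript_only": {"transcription": True},
    "full_analysis": {"transcription": True, "summarize": True, "social_content": True},
}


def preset(name: str, clean_audio: bool = False) -> Dict[str, Any]:
    """Return a fresh copy of a preset; use breath="legacy" for already-clean audio."""
    cfg = dict(PRESETS[name])
    if clean_audio and "breath" in cfg:
        cfg["breath"] = "legacy"
    return cfg
```

Because the Python SDK's `process()` takes options as keyword arguments, a preset could be passed as `client.process(url, **preset("full_podcast"))`.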

## Python SDK

- Install: `pip install cleanvoice-sdk`
- GitHub: https://github.com/cleanvoice/cleanvoice-python

```python
import asyncio

from cleanvoice import Cleanvoice, AsyncCleanvoice

# Sync
client = Cleanvoice.from_env()  # reads CLEANVOICE_API_KEY env var
# or: client = Cleanvoice(api_key="your_key")

# process() blocks until the job finishes and returns the result
result = client.process(
    "https://example.com/episode.mp3",  # URL
    # "/path/to/episode.mp3",           # local file path
    # (numpy_array, sample_rate),       # NumPy array
    fillers=True,
    long_silences=True,
    normalize=True,
)
result.audio.download("cleaned.mp3")

# Async: await only works inside a coroutine, so wrap the call and run it with asyncio
async def main():
    async_client = AsyncCleanvoice.from_env()
    return await async_client.process("episode.mp3", fillers=True)

async_result = asyncio.run(main())

# Lower-level methods
edit_id = client.create_edit("episode.mp3", fillers=True)
edit = client.get_edit(edit_id)          # poll manually
file_url = client.upload_file("/path/to/episode.mp3")  # upload local file
account = client.check_auth()            # verify API key
```

## JavaScript SDK

- Install: `npm install @cleanvoice/cleanvoice-sdk`
- GitHub: https://github.com/cleanvoice/cleanvoice-js
- Works in: Node.js, Deno, Bun, edge runtimes

```typescript
import { Cleanvoice } from '@cleanvoice/cleanvoice-sdk';

const client = Cleanvoice.fromEnv();  // reads CLEANVOICE_API_KEY
// or: new Cleanvoice({ apiKey: '...' })

const result = await client.process('https://example.com/episode.mp3', {
  fillers: true, long_silences: true, normalize: true,
});
console.log(result.audio.url);
await result.audio.download('cleaned.mp3');

// Lower-level methods
const editId = await client.createEdit('episode.mp3', { fillers: true });
const edit = await client.getEdit(editId);  // poll manually
const account = await client.checkAuth();   // verify API key
```

## Languages

Audio enhancement (noise reduction, silences, normalization, mouth sounds, breathing, stutters) is language-agnostic — works for all languages.

Filler word detection — confirmed working:
English (en), German (de), French (fr), Dutch (nl), Spanish (es), Italian (it), Portuguese (pt), Romanian (ro), Polish (pl), Arabic (ar), Turkish (tr), Bulgarian (bg)

Transcription — powered by Whisper, supports ~99 languages, auto-detected (no manual language field needed).
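To check whether filler removal is confirmed for a recording's language, a client could compare its ISO 639-1 code with the list above. The helper name is hypothetical, and stripping region suffixes such as `pt-BR` is an assumption of this sketch.

```python
# ISO 639-1 codes listed above as confirmed for filler word detection.
FILLER_LANGUAGES = {"en", "de", "fr", "nl", "es", "it", "pt", "ro", "pl", "ar", "tr", "bg"}


def filler_detection_confirmed(lang: str) -> bool:
    """True if filler removal is confirmed for this language tag, e.g. "en" or "pt-BR"."""
    primary = lang.replace("_", "-").split("-")[0].lower()
    return primary in FILLER_LANGUAGES
```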

## File retention

Edit files are automatically deleted after 7 days. To delete immediately: DELETE /v2/edits/{edit_id}
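Immediate deletion is a single authenticated DELETE request. This sketch builds that request with the standard library and estimates the automatic deletion date, assuming the 7-day window starts when the edit is created (the page does not say which timestamp it counts from). Both helper names are hypothetical, and the host is assumed to be `https://api.cleanvoice.ai`.

```python
import urllib.request
from datetime import datetime, timedelta, timezone
from urllib.parse import quote

API_HOST = "https://api.cleanvoice.ai"  # assumption: paths are relative to this host
RETENTION = timedelta(days=7)


def build_delete(edit_id: str, api_key: str) -> urllib.request.Request:
    """DELETE /v2/edits/{edit_id}: remove the edit and its files immediately."""
    url = f"{API_HOST}/v2/edits/{quote(edit_id, safe='')}"
    return urllib.request.Request(url, headers={"X-API-Key": api_key}, method="DELETE")


def auto_delete_after(created_at: datetime) -> datetime:
    """Approximate automatic deletion time, assuming the 7 days count from creation."""
    return created_at + RETENTION
```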

## Processing time

Processing typically takes about 30 seconds for a 2–3 minute clip and 5–10 minutes for a 1-hour file. When polling manually, wait about 30 seconds, then poll GET /v2/edits/{edit_id} every 10 seconds.
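The timing guidance translates into a simple polling loop. This sketch takes the fetch and sleep functions as parameters so it can run without network access. It assumes the polled JSON carries its state in a `status` field and its identifier in `task_id`; `wait_for_edit` and its `timeout` default are hypothetical choices, not SDK behavior.

```python
import time
from typing import Any, Callable, Dict


def wait_for_edit(fetch: Callable[[], Dict[str, Any]],
                  sleep: Callable[[float], None] = time.sleep,
                  initial_wait: float = 30.0,
                  interval: float = 10.0,
                  timeout: float = 3600.0) -> Dict[str, Any]:
    """Poll until the edit reaches SUCCESS or FAILURE.

    fetch() should return the parsed JSON of GET /v2/edits/{edit_id}.
    `timeout` is approximate: it counts sleep time only, not request time.
    """
    sleep(initial_wait)  # short jobs take about 30 s, so don't poll immediately
    waited = initial_wait
    while True:
        edit = fetch()
        status = edit.get("status")  # assumed field name
        if status == "SUCCESS":
            return edit
        if status == "FAILURE":
            raise RuntimeError(f"edit {edit.get('task_id')} failed")
        if waited >= timeout:
            raise TimeoutError(f"edit still {status} after ~{waited:.0f} s")
        sleep(interval)  # any other status (including RETRY): keep waiting
        waited += interval
```

In practice `fetch` would wrap GET /v2/edits/{edit_id}. The SDKs' `process()` already performs this waiting for you.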