Kesha Voice Kit

Give your local tools and LLM agents a voice.
Fast speech-to-text, text-to-speech, voice-activity detection, and language detection in one local-first CLI: Apple Silicon CoreML first, ONNX fallback on supported Linux/Windows builds.

Transcribe locally — 25 languages, up to ~19x faster than Whisper on Apple Silicon, ~2.5x on CPU
Speak back — text-to-speech in 9 languages
Plug into agents — ship voice workflows as CLI commands, an MCP server, an OpenClaw skill, or a Hermes agent
Small Rust engine — single ~20MB binary, no ffmpeg, no Python, no native Node addons

Quick Start

Runtime: Bun >= 1.3.0 · Platforms: macOS arm64, Linux x64, Windows x64.

# 1. Install Bun (skip if you have it) — Linux & macOS:
curl -fsSL https://bun.sh/install | bash        # or: brew install oven-sh/bun/bun
# Windows: powershell -c "irm bun.sh/install.ps1 | iex"

# 2. Install Kesha:
bun add -g @drakulavich/kesha-voice-kit
kesha install        # downloads engine + models (explicit — never automatic)

# 3. Transcribe:
kesha audio.ogg      # transcript to stdout

Prefer Homebrew, .deb/.rpm, Docker, or Nix? See Other install methods. Air-gapped or behind a corporate mirror? See docs/model-mirror.md.
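
The Bun >= 1.3.0 requirement above can be checked before installing. The following is a minimal sketch, not part of Kesha: `version_ge` and `check_bun` are hypothetical helper names, and the only external command assumed is `bun --version`, which prints the installed version.

```shell
# version_ge A B: succeed when dotted version A >= B, compared field by field.
# Non-numeric suffixes (e.g. "0-canary") are read as their leading number.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "[.]"); nb = split(b, y, "[.]")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      if (x[i] + 0 > y[i] + 0) exit 0
      if (x[i] + 0 < y[i] + 0) exit 1
    }
    exit 0
  }'
}

# check_bun: report whether the Bun on PATH meets the minimum stated above.
check_bun() {
  if ! command -v bun >/dev/null 2>&1; then
    echo "bun not found"
    return 1
  fi
  cb_have=$(bun --version)
  if version_ge "$cb_have" 1.3.0; then
    echo "bun $cb_have is new enough"
  else
    echo "bun $cb_have is older than 1.3.0"
    return 1
  fi
}
```

Calling `check_bun` before `bun add -g @drakulavich/kesha-voice-kit` gives an early, readable failure instead of an install error.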

Speech-to-text

kesha audio.ogg                            # transcribe (plain text)
kesha --format transcript audio.ogg        # text + language/confidence
kesha --format json audio.ogg              # full JSON with lang fields
kesha --json --timestamps audio.ogg        # JSON with timestamped segments
kesha --toon audio.ogg                     # compact LLM-friendly TOON
kesha status                               # show installed backend info

Multiple files get head-style headers; stdout is the transcript, stderr is errors — pipe-friendly:

$ kesha freedom.ogg tahiti.ogg
=== freedom.ogg ===
Свободу попугаям! Свободу!

=== tahiti.ogg ===
Таити, Таити! Не были мы ни в какой Таити! Нас и тут неплохо кормят.
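
Because the headers follow the `=== name ===` shape shown above, the combined output can be split back into one file per input. This is a sketch under that assumption only; `split_transcripts` is a hypothetical helper, not a Kesha command, and a transcript line that happens to look like a header would be misread as one.

```shell
# usage: kesha a.ogg b.ogg | split_transcripts OUTDIR
# Writes OUTDIR/<name>.txt for every "=== name ===" header read from stdin.
split_transcripts() {
  st_outdir="$1"
  mkdir -p "$st_outdir" || return 1
  awk -v dir="$st_outdir" '
    /^=== .* ===$/ {
      if (out != "") close(out)
      # Drop the leading "=== " and trailing " ===" (4 characters each).
      name = substr($0, 5, length($0) - 8)
      sub(/.*\//, "", name)        # keep only the basename
      out = dir "/" name ".txt"
      printf "" > out              # create the file even if the transcript is empty
      next
    }
    out != "" && NF { print > out }  # skip the blank separator lines
  '
}
```

Inputs that share a basename would overwrite each other here; pass distinct file names or extend the naming scheme.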

Long / silence-heavy audio: install VAD (kesha install --vad); Kesha auto-uses it past 120 s. Without VAD, long audio falls back to fixed ASR chunks. See docs/vad.md.
Speaker diarization (darwin-arm64): kesha install --diarize, then kesha --json --vad --speakers meeting.m4a stamps each segment with a speaker id. Linux/Windows return a clear "darwin-arm64 only" error (#199).

Text-to-speech

Kesha speaks back in 9 languages, auto-picking the voice from the text's language. Override with --lang <code> or --voice <id>.

kesha install --tts                              # opt-in models (~990MB)
kesha say "Hello, world" > hello.wav
kesha say "Привет, мир" > privet.wav             # auto-routes by language
kesha say --voice ru-vosk-m02 "Голос в текст." > ru.wav

Output formats (--format, or inferred from the --out extension):

kesha say "Hello" --out hi.wav                    # WAV (default, uncompressed)
kesha say "Hello" --format ogg-opus --out hi.ogg  # OGG/Opus — messenger voice notes
kesha say "Hello" --format flac --out hi.flac     # FLAC — lossless, plays in every browser incl. Safari/iOS

kesha say --list-voices lists what's installed. Voices, the full catalogue, macOS system voices, SSML, speaking rate (--rate, <prosody>), Russian word stress, and Russian/English abbreviation handling are all in docs/tts.md.
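
For batch synthesis, the `kesha say "<text>" --out <file>` form shown above can be wrapped in a loop. A minimal sketch: `say_lines` is a hypothetical helper, `KESHA_BIN` is a hypothetical override (not a documented Kesha variable) that lets the loop be tried with a stub, and the output format is assumed to follow the `--out` extension as described above.

```shell
# usage: say_lines OUTDIR [EXT] < lines.txt
# Speaks each non-empty input line to OUTDIR/001.EXT, 002.EXT, ...
# and prints how many files were written.
say_lines() {
  sl_outdir="$1"
  sl_ext="${2:-wav}"
  mkdir -p "$sl_outdir" || return 1
  sl_n=0
  while IFS= read -r sl_line || [ -n "$sl_line" ]; do
    [ -n "$sl_line" ] || continue
    sl_n=$((sl_n + 1))
    sl_out=$(printf '%s/%03d.%s' "$sl_outdir" "$sl_n" "$sl_ext")
    # Redirect stdin so the command cannot consume the remaining lines.
    "${KESHA_BIN:-kesha}" say "$sl_line" --out "$sl_out" < /dev/null || return 1
  done
  echo "$sl_n"
}
```

Lines that begin with `-` may be parsed as flags; filter or prefix them before piping text in.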

Languages

Speech-to-text spans 25 languages. Text-to-speech covers 9 languages through English, Russian, and select multilingual voices; full tables with codes and flags are in docs/languages.md. Audio language detection identifies 107 languages.

Performance

Up to ~19x faster than Whisper on Apple Silicon (M2), ~2.5x faster on CPU

Speedups are measured against Whisper large-v3-turbo, with all engines auto-detecting language.

Full per-file breakdown (Russian + English): BENCHMARK.md.

Other install methods

All of these install the Bun CLI wrapper; engine + models still download explicitly via kesha install.

Homebrew — brew install drakulavich/tap/kesha-voice-kit · docs/homebrew.md
Linux packages (.deb/.rpm, x64) — docs/linux-packages.md
Docker (GHCR image) — docs/docker.md
Nix (aarch64-darwin / x86_64-linux) — nix run github:drakulavich/kesha-voice-kit -- install · docs/nix-install.md
Shell completions + manpage — kesha completions bash|zsh|fish and kesha manpage print the packaged files to install wherever your shell expects them.

Integrations

MCP server — kesha mcp exposes transcribe/synthesize/list tools to any MCP client (Claude, Cursor, Codex, Gemini). Setup: docs/mcp.md.
OpenClaw — give your LLM agent ears. Install & config: docs/openclaw.md.
Hermes Agent — local STT/TTS through Hermes command providers. Setup: docs/hermes.md.
Raycast (macOS) — transcribe selected audio & speak the clipboard from the launcher. Source + install: raycast/.
Programmatic API — @drakulavich/kesha-voice-kit/core for use inside a Bun program. See docs/api.md.

More

Architecture — runtime data flow, the models that ship, the CLI ↔ Rust engine boundary, model pinning, and where tests live.
Use cases — copy-paste recipes (transcribe a meeting, speak from OpenClaw, run offline, move the cache).
Product positioning — supported workflows, non-goals, maturity labels, platform matrix.
Diagnostics: kesha doctor, kesha support-bundle (redacted .tar.gz for issues), and kesha logs produce local, content-free diagnostics — see docs/diagnostic-logs.md. Every failure prints a stable error [CODE]: … line and a documented process exit code.
Scripting & CI: --json (or --toon) for machine-readable output, --quiet/-q to silence progress, and --no-color (or NO_COLOR=1) for plain logs. Colors switch off automatically when CI=true.
Privacy / Local Stats: Stats are off by default and fully local. Opt in with kesha stats enable to record content-free operational metrics in a local SQLite database — never networked, never storing audio, transcripts, text, or paths. Full commands & lifecycle: docs/local-stats.md.
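
The scripting flags and the stable `error [CODE]: …` line described above combine naturally in a CI loop. The sketch below is illustrative only: `extract_error_code` and `transcribe_batch` are hypothetical helpers, `KESHA_BIN` is a hypothetical override (not a documented Kesha variable) so the loop can be exercised with a stub, and the error marker is assumed to start its line, which this README does not spell out.

```shell
# Print the CODE from the first "error [CODE]: ..." line on stdin.
extract_error_code() {
  sed -n 's/^error \[\([^]]*\)\]:.*/\1/p' | head -n 1
}

# usage: transcribe_batch OUTDIR FILE...
# Writes OUTDIR/<name>.json per file, keeps OUTDIR/<name>.err for failures,
# and returns 1 if any file failed.
transcribe_batch() {
  tb_outdir="$1"
  shift
  mkdir -p "$tb_outdir" || return 1
  tb_failed=0
  for tb_file in "$@"; do
    tb_base=$(basename "$tb_file")
    tb_json="$tb_outdir/$tb_base.json"
    tb_err="$tb_outdir/$tb_base.err"
    if "${KESHA_BIN:-kesha}" --json --quiet "$tb_file" > "$tb_json" 2> "$tb_err"; then
      rm -f "$tb_err"
    else
      tb_status=$?
      tb_code=$(extract_error_code < "$tb_err")
      echo "failed: $tb_file (exit $tb_status, code ${tb_code:-unknown})" >&2
      rm -f "$tb_json"
      tb_failed=1
    fi
  done
  return "$tb_failed"
}
```

Keeping stderr in a per-file `.err` leaves stdout clean for the JSON and preserves the full diagnostic for a later issue report.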

Contributing

See CONTRIBUTING.md, the Roadmap (Now / Next / Later), and the Decision log (why platform/model choices were made — and reversed). Dev setup: make dev-setup (Bun, Rust, nextest, platform libs).

License

Made with 💛🩵 and 🥤 energy under MIT License
