I have XLRS, dyslexia, and ADHD.
I built Rift because every voice tool I tried fought how my brain works. This one doesn’t.
Voice to text. Text to voice. Entirely on your Mac.
Why I built Rift
XLRS is a genetic eye condition that makes screen reading hard. I listen for hours a day, so I wanted voices that don’t add their own fatigue.
Text fights my brain. I needed to speak instead of type and listen instead of read — without my voice leaving my Mac.
My thinking doesn’t follow a straight line. Most voice tools cut you off after two seconds of silence. Rift waits until you stop.
The features I built for myself turn out to help everyone.
What is XLRS?
X-linked retinoschisis is a rare genetic condition that affects the retina’s layers and central vision. It can’t be corrected with glasses. Symptoms vary; prolonged screen reading often causes extra strain and fatigue.
Who Rift is for
The same choices that help me help anyone who wants patient dictation, natural speech, and privacy.
Dyslexia
If you think better out loud than on paper, Rift turns speech into text without fighting you — and reads it back when you need to hear what you wrote.
ADHD
If your brain takes detours, Rift doesn’t punish pauses or restarts. Live paste keeps the loop tight enough to follow.
Low vision
If reading the screen is tiring, Rift reads to you — fast first word, adjustable speed, pause anywhere — with voices made for long listens.
Motor differences
Hold-to-talk, global shortcuts, and no forced cutoff mean fewer precise key presses and no penalty for hesitation.
Writers & thinkers
If you think by talking, Rift captures it privately — on your Mac, under your control.
Voice to Text
Speak naturally. A local model cleans and merges as you go — not just raw transcription.
You decide when you're done.
My thoughts don’t follow a timer. ADHD means I pause mid-sentence to find the right word — other tools treated that pause as “done.”
No auto-endpointing
Speak. Pause. Think. Rift waits.
Other apps cut you off after 2 seconds of silence.
Others
"The quick brown—"
Cut off after pause
Rift
"The quick brown fox jumps over the lazy dog."
You press stop when ready
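The contrast above can be sketched as two stop policies. This is a hypothetical illustration, not Rift's code: the function names, the event format, and the 2-second default are assumptions made for the example.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical event format: (timestamp_seconds, kind),
# where kind is "speech", "silence", or "stop".
Event = Tuple[float, str]

def end_with_silence_timeout(events: Iterable[Event], timeout_s: float = 2.0) -> Optional[float]:
    """Auto-endpointing: end the session once silence has lasted timeout_s."""
    silence_started = None
    for t, kind in events:
        if kind == "speech":
            silence_started = None          # any speech resets the timer
        elif kind == "silence":
            if silence_started is None:
                silence_started = t
            if t - silence_started >= timeout_s:
                return t                    # cut off mid-thought
        elif kind == "stop":
            return t
    return None

def end_on_user_stop(events: Iterable[Event]) -> Optional[float]:
    """No auto-endpointing: only an explicit stop ends the session."""
    for t, kind in events:
        if kind == "stop":
            return t
    return None
```

With a three-second pause in the middle of a sentence, the first policy ends the session during the pause; the second keeps listening until the stop event.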
First-word capture
Your first word is never lost.
A 250ms lead-in buffer starts recording before you even finish pressing the button.
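A lead-in buffer is usually a small ring buffer that is always listening and only keeps the most recent audio. The sketch below shows the idea under stated assumptions: the class name, the 16 kHz sample rate, and the method names are hypothetical, and Rift's real capture path (Core Audio) is not shown.

```python
from collections import deque

class LeadInBuffer:
    """Keeps the most recent lead_in_ms of audio so it can be prepended on start."""

    def __init__(self, sample_rate=16000, lead_in_ms=250):
        self.capacity = sample_rate * lead_in_ms // 1000
        self._ring = deque(maxlen=self.capacity)  # old samples fall off the front
        self.recording = None                     # becomes a list once started

    def feed(self, samples):
        """Called for every incoming block of samples, before and after start."""
        if self.recording is None:
            self._ring.extend(samples)
        else:
            self.recording.extend(samples)

    def start(self):
        """Button pressed: the recording begins with the buffered lead-in."""
        self.recording = list(self._ring)
        self._ring.clear()
```

Because the ring is filled before the button press registers, the first syllable is already in the recording when capture officially starts.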
Buffered
Button pressed
Recording
"Hel—" is already captured
Rolling context window
The model considers the last 25 seconds of audio.
It understands context, not just isolated words.
Live paste
Text appears in your app as you speak.
Real-time streaming with final reconciliation when you stop.
The quick brown fox jumps over the lazy dog.
Auto-fix
Hallucination detection
If the first transcription guess is wrong, Rift detects it and auto-replaces.
No manual cleanup. No re-recording.
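One way to picture live paste with later replacement: compare what is already on screen with the revised transcript, keep the shared prefix, and retype only the rest. This is a minimal sketch of that idea; the function name and the backspace-then-type approach are assumptions, not a description of how Rift edits text in other apps.

```python
def paste_update(pasted: str, revised: str):
    """Return (backspaces, text_to_type) that turns `pasted` into `revised`."""
    common = 0
    for a, b in zip(pasted, revised):
        if a != b:
            break
        common += 1
    # Delete everything after the shared prefix, then type the new tail.
    return len(pasted) - common, revised[common:]
```

When the transcript only grows, nothing is deleted and the new words are appended. When an earlier guess was wrong, only the characters from the first difference onward are replaced.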
Real-time
Streaming transcription
Audio is processed in chunks as you speak.
No waiting for you to finish.
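Chunked streaming and the rolling context window fit together: each new chunk triggers a pass over the most recent stretch of audio, capped at the context length. The sketch below illustrates this with a plain list of samples. The one-second chunk size, the 16 kHz sample rate, and the function name are assumptions; only the 25-second context comes from the description above.

```python
def rolling_windows(samples, sample_rate=16000, chunk_s=1.0, context_s=25.0):
    """Yield, after each chunk arrives, the trailing window the model would see."""
    chunk = int(sample_rate * chunk_s)
    context = int(sample_rate * context_s)
    for end in range(chunk, len(samples) + chunk, chunk):
        end = min(end, len(samples))        # the last chunk may be short
        start = max(0, end - context)       # never look back further than context_s
        yield samples[start:end]
```

Early windows grow with the recording; once the recording exceeds the context length, every window has the same size and slides forward.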
And the smaller touches
Things you stop noticing — because they just work.
- Silence polish. After a few seconds of silence, Rift tidies what you already pasted.
- Polish modes. Verbatim keeps your words. Clean fixes obvious issues. Professional tightens tone.
- Audio cues. Soft tones mark start and stop — confirmation without looking.
- Toggle or hold-to-talk. Pick what fits your hands. Optional auto-send after paste for chat apps.
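The three polish modes can be pictured as increasingly aggressive passes over the same text. Rift describes a local language model doing this work; the sketch below swaps in simple rules purely for illustration, and every name in it (`PolishMode`, `polish`, the filler list) is hypothetical.

```python
import re
from enum import Enum

class PolishMode(Enum):
    VERBATIM = "verbatim"
    CLEAN = "clean"
    PROFESSIONAL = "professional"

# Toy filler list; a real system would handle far more than this.
_FILLERS = re.compile(r"\b(?:um|uh)\b,?\s*", re.IGNORECASE)

def polish(text: str, mode: PolishMode) -> str:
    if mode is PolishMode.VERBATIM:
        return text                          # keep the words exactly as spoken
    cleaned = _FILLERS.sub("", text).strip() # drop obvious fillers
    if mode is PolishMode.CLEAN or not cleaned:
        return cleaned
    # PROFESSIONAL: clean, then capitalize and close the sentence.
    sentence = cleaned[0].upper() + cleaned[1:]
    if sentence[-1] not in ".!?":
        sentence += "."
    return sentence
```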
Text to Voice
Select text. Multiple engines. Natural speech — including code.
First word in 150 milliseconds.
I can’t always read the screen for long stretches. When audio is how I read, the first syllable can’t arrive late.
First-word latency
You hear the first word before the sentence finishes generating.
No loading spinners. No waiting.
Seamless
Clause-level streaming
The next sentence is synthesized while the current one plays.
No gaps. No stutters. Continuous audio.
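The overlap described here is a one-step-ahead pipeline: split the text into short pieces, and start synthesizing the next piece before playing the current one. A minimal sketch, assuming hypothetical `synthesize` and `play` callables supplied by the caller (the real engines, Kokoro and Chatterbox, are not modeled):

```python
import re
from concurrent.futures import ThreadPoolExecutor

def split_clauses(text):
    """Break after , ; : . ! ? when followed by whitespace."""
    parts = re.split(r"(?<=[,;:.!?])\s+", text.strip())
    return [p for p in parts if p]

def speak(text, synthesize, play):
    """Play clauses in order while the next one is synthesized in the background."""
    clauses = split_clauses(text)
    if not clauses:
        return
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(synthesize, clauses[0])
        for nxt in clauses[1:]:
            audio = pending.result()
            pending = pool.submit(synthesize, nxt)  # runs while `audio` plays
            play(audio)
        play(pending.result())
```

Short first clauses are what make a fast first word possible: playback can begin as soon as the first piece is ready, not when the whole passage is.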
Audio poll rate
The audio buffer is checked every 20 milliseconds.
Imperceptible latency between chunks.
50 checks per second
Pause anywhere
Tap to pause mid-syllable. Tap again to resume from the exact position.
Your place is never lost.
0.5× – 2×
Playback speed
Speed up for skimming. Slow down for comprehension.
Adjust in real-time without restarting.
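Pausing mid-syllable and changing speed on the fly both come down to tracking position in samples instead of in sentences. The sketch below only does that bookkeeping; the class name and the 24 kHz default are assumptions, and real variable-speed playback would also need time-stretching to keep pitch natural.

```python
class PlaybackCursor:
    """Tracks the playback position in samples across pauses and speed changes."""

    def __init__(self, total_samples, sample_rate=24000):
        self.total_samples = total_samples
        self.sample_rate = sample_rate
        self.position = 0
        self.speed = 1.0
        self.paused = False

    def set_speed(self, speed):
        self.speed = min(2.0, max(0.5, speed))  # clamp to the 0.5x to 2x range

    def toggle_pause(self):
        self.paused = not self.paused

    def advance(self, wall_seconds):
        """Move forward by the audio consumed in wall_seconds of real time."""
        if not self.paused:
            step = int(wall_seconds * self.sample_rate * self.speed)
            self.position = min(self.total_samples, self.position + step)
        return self.position
```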
And the smaller touches
The parts that make it usable for hours.
- Code Talk. In Cursor, VS Code, Terminal, and docs, Rift speaks technical text naturally: `overflow-x: hidden` becomes “overflow-x set to hidden.”
- Engines & voices. Kokoro (stable) and Chatterbox variants. 14+ voices. Download extras from the tray when you need them.
- Global shortcuts. ⌃1 read selection. ⌃2 dictate. ⌃3 show, hide, or pause. One keystroke away.
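The Code Talk example above can be imitated for one narrow case with a rule. Rift describes an LLM transform for this; the sketch below is a rule-based stand-in that handles only simple CSS declarations, and the function name is hypothetical.

```python
import re

# Matches "property: value" with an optional trailing semicolon.
_CSS_DECL = re.compile(r"^\s*([a-z-]+)\s*:\s*([^;]+?)\s*;?\s*$", re.IGNORECASE)

def speak_css_declaration(line: str) -> str:
    """Turn 'prop: value' into a speakable phrase; leave anything else untouched."""
    m = _CSS_DECL.match(line)
    if not m:
        return line
    prop, value = m.groups()
    return f"{prop} set to {value}"
```

A single regular expression like this breaks down quickly on real code, which is a fair argument for using a language model for the general case.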
How it works
Two pipelines. Local speech models. A local language model for merge, correction, and polish. Zero cloud for your voice and text.
Start dictation
Capture
Core Audio streams from your microphone with a 250ms lead-in buffer. Your first word is never lost.
Process
Parakeet runs on the GPU via MLX. 25 seconds of rolling context. Real-time streaming.
Paste
Text appears at your cursor as you speak. Final reconciliation when you stop. On-device Gemma 4 polishes your text — see Intelligence.
Speak selected text
Select
Highlight text in any app or copy it to the clipboard. Rift reads whatever you give it.
Synthesize
Kokoro or Chatterbox generates audio clause-by-clause. First word in 150ms. Next sentence ready before current ends. Code Talk may run an LLM transform first in developer contexts.
Play
Audio streams to system output. Pause anywhere, resume from exact position. 0.5× to 2× speed.
Four phases of local intelligence
Rift runs local language models (Gemma 4 + Qwen3, via MLX) next to Parakeet and TTS. Not just transcription — understanding and cleanup, on your Mac.
- Merge — New words fold into what came before. Fewer duplicates and jumps as the recognizer updates.
- Correct — Grammar, punctuation, and light formatting in real time. Numbers and phrasing stay intentional.
- Extract — When the model revises earlier audio, only genuinely new words are appended.
- Polish — On pause or stop (and silence polish), fillers can be trimmed, lists formatted, sentences smoothed — per your polish mode.
A fast Qwen3 0.6B tier handles real-time phases; a deeper Gemma 4 E4B tier powers polish and Code Talk transforms. All on-device.
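The Merge and Extract phases both hinge on finding where new recognizer output overlaps what is already there, so only the new words are added. Rift uses language models for this; the sketch below is a plain word-overlap stand-in with a hypothetical function name, shown only to make the idea concrete.

```python
def merge_words(existing, incoming):
    """Append `incoming` to `existing`, skipping the words they share at the seam."""
    max_overlap = min(len(existing), len(incoming))
    # Try the longest possible overlap first so repeated phrases are not duplicated.
    for k in range(max_overlap, 0, -1):
        if existing[-k:] == incoming[:k]:
            return existing + incoming[k:]
    return existing + incoming
```

Exact matching fails as soon as the recognizer rewords the overlapping part, which is where a model-based merge earns its keep.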
Privacy. That's Rift.
Your voice never leaves your Mac. Ever. When assistive tech is how you read and write, that isn’t abstract — it’s dignity.
Zero file I/O
Audio is synthesized directly to memory. Nothing is written to disk. Nothing persists after you close the app.
See the patience in action
A simplified replay: streaming text, a long pause, then an auto-fix.
Demo transcript
Recording starts → text streams in → a 3s pause (other tools might have ended) → speech resumes → a wrong word auto-corrects.
Performance
Tested on real hardware. Real workloads.
M1 MacBook Air
M3 MacBook Pro
M4 Mac Studio
How Rift compares
| Feature | Rift | whisper.cpp | macOS Dictation |
|---|---|---|---|
| On-device | Yes | Yes | Partial |
| No auto-cutoff | Yes | No | No |
| Live paste | Yes | No | Yes |
| First-word buffer | 250ms | None | None |
| Local LLM polish | Yes | No | No |
| TTS included | Yes | No | Basic |
| TTS latency (first word) | ~150ms | N/A | ~500ms |
| Voice & text privacy | 100% local | 100% local | Cloud fallback |
Requirements
- macOS: Sonoma 14.0+
- Chip: Apple Silicon
- RAM: 8GB minimum
- Disk: ~2GB
The visual metaphor
Nothing escapes.
Your data goes in — and stays in.
The Singularity
Your Mac is the center of gravity. Voice in, text out, text in, voice out — all here. No servers. No cloud.
The Event Horizon
Once your words enter Rift, they never leave your machine. No telemetry. No uploads. No exceptions.
How the visualization works
Raymarching
Volumetric rendering via signed distance functions. The sphere-traced shader calculates 128 iterations per pixel to simulate photon paths.
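Sphere tracing steps along a ray by the distance the signed distance function reports, which is always a safe step. The sketch below is a CPU illustration in Python with straight rays and a single sphere; the actual visualization is a GPU shader that also bends light, so treat the function names and tolerances here as assumptions. Only the 128-step cap comes from the description above.

```python
import math

def sphere_sdf(p, radius=1.0):
    """Signed distance from point p to a sphere centered at the origin."""
    return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - radius

def sphere_trace(origin, direction, sdf=sphere_sdf, max_steps=128, eps=1e-4, max_dist=100.0):
    """Return the hit distance along a unit-length ray, or None on a miss."""
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = sdf(p)
        if dist < eps:
            return t          # close enough to the surface to count as a hit
        t += dist             # the SDF guarantees nothing is closer than dist
        if t > max_dist:
            break
    return None
```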
Schwarzschild geodesics
Light follows the curved spacetime geometry of a non-rotating black hole. The photon sphere, at 1.5× the event horizon radius, shows up as a bright ring.
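The 1.5× figure follows from the standard Schwarzschild results, with $G$ the gravitational constant, $M$ the black hole mass, and $c$ the speed of light:

```latex
r_s = \frac{2GM}{c^2}, \qquad
r_{\mathrm{ph}} = \frac{3GM}{c^2} = \frac{3}{2}\, r_s
```

Here $r_s$ is the event horizon (Schwarzschild) radius and $r_{\mathrm{ph}}$ is the radius at which light can travel on a circular orbit.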
Keplerian disk
Accretion disk particles orbit according to Kepler's laws. Inner particles orbit faster, creating the characteristic spiral structure.
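In the Newtonian form of Kepler's laws, which is an assumption about what the visualization uses, the speed and period of a circular orbit at radius $r$ around mass $M$ are:

```latex
v(r) = \sqrt{\frac{GM}{r}}, \qquad
T(r) = 2\pi \sqrt{\frac{r^3}{GM}}
```

Since $v \propto r^{-1/2}$, particles closer in move faster and complete orbits sooner, so neighboring rings shear past each other.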
ACES tonemapping
Film-industry-standard color grading compresses the HDR luminance into displayable range while preserving the fiery accretion glow.
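A widely used curve fit for the ACES filmic look is Krzysztof Narkowicz's approximation. Whether this visualization uses that fit or the full ACES transform is not stated, so the sketch below is an assumption, shown per channel in Python instead of shader code.

```python
def aces_approx(x: float) -> float:
    """Narkowicz's ACES filmic fit: maps HDR luminance (>= 0) into [0, 1]."""
    a, b, c, d, e = 2.51, 0.03, 2.43, 0.59, 0.14
    y = (x * (a * x + b)) / (x * (c * x + d) + e)
    return min(1.0, max(0.0, y))
```

The curve is nearly linear in the shadows and rolls off smoothly in the highlights, which is what lets very bright regions stay colored instead of clipping to flat white.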
Visualization based on Singularity by MisterPrada
Frequently asked
Does it work offline?
Yes — all voice, text, and language work runs on your Mac. The only network use is optional update checks and the first-run model download. After that, Rift works offline.
Is my voice stored anywhere?
Never. Audio lives in memory and is discarded right away. Nothing is written to disk. Nothing is sent anywhere.
What languages are supported?
English today. The Parakeet model supports more languages — I’m working on enabling them.
What voices ship with it?
Kokoro includes several built-in voices. Chatterbox variants — including MLX fast paths — add more, downloadable from the app. Voice cloning isn’t available.
Does it run on Intel Macs?
No. Rift needs Apple Silicon (M1+) for the MLX framework.
Why is the first run slow?
First launch caches the models (~2GB). After that, launches are instant.
Is Rift open source?
Yes — MIT licensed. Source on GitHub.
How do I install it?
Download the DMG, drag Rift to Applications, launch. Apple Silicon (M1+) required. If macOS shows a security warning, the install guide has the one-line fix.
The Technology
Built different.
Four pillars: a foundation, speech in, intelligence in the middle, and speech out. All on Apple Silicon. No cloud for your content.
The Foundation
MLX
Apple's machine learning framework for Apple Silicon. Runs entirely on your Mac's GPU and CPU.
Voice to Text
Parakeet
NVIDIA's state-of-the-art speech recognition, optimized for Apple Silicon.
Text to Voice
Kokoro
Neural TTS with natural voices; Chatterbox variants optional. Real-time synthesis.
Intelligence
Gemma 4 + Qwen3
Local LLMs for merge, correct, extract, polish, and Code Talk. Fast Qwen3 0.6B + Gemma 4 E4B for deep cleanup — all via MLX.
Rift
Your voice. Your Mac. Nothing else.
Download for macOS
Apple Silicon (M1+) · macOS 14+ · English