I have XLRS, dyslexia, and ADHD.
I built Rift because every voice tool I tried fought how my brain works. This one doesn’t.
Voice to text. Text to voice. Entirely on your Mac. Nothing leaves your machine. Ever.
Why I built Rift
XLRS is a congenital eye condition that makes reading on screens harder. I depend on text-to-speech for hours a day — I wanted voices that don’t create their own fatigue.
Text doesn’t cooperate with my brain. I needed a way to speak instead of type and listen instead of read — without sending my voice to someone’s cloud.
My thinking doesn’t follow a straight line. Most voice tools cut you off after two seconds of silence. Rift waits until you’re actually done.
The features I built for myself turn out to help everyone.
What is XLRS?
X-linked retinoschisis is a rare genetic condition that affects the retina’s layers and central vision. It’s uncorrectable with glasses. Symptoms vary; prolonged screen reading often causes extra strain and fatigue.
Voice to Text
Speak naturally. A local model cleans and merges as you go — not just raw transcription.
You decide
when you're done.
My thoughts don’t follow a timer. ADHD means I pause mid-sentence to find the right word — other tools treated that pause as “done.”
No auto-endpointing
Speak. Pause. Think. Rift waits.
Other apps cut you off after 2 seconds of silence.
Others
"The quick brown—"
Cut off after pause
Rift
"The quick brown fox jumps over the lazy dog."
You press stop when ready
250ms
First-word capture
Your first word is never lost.
A 250ms lead-in buffer starts recording before you even finish pressing the button.
Buffered
Button pressed
Recording
"Hel—" is already captured
25s
Rolling context window
The model considers the last 25 seconds of audio.
It understands context, not just isolated words.
Live paste
Text appears in your app as you speak.
Real-time streaming with final reconciliation when you stop.
The quick brown fox jumps over the lazy dog.
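As a rough illustration of final reconciliation (a simplified, hypothetical helper, not Rift's implementation), one way to update already-pasted text is to keep the shared prefix and rewrite only the tail that changed:

```swift
// Hypothetical reconciliation helper: compare what was pasted with the final transcript
// and compute the smallest edit that fixes the tail.
func reconcile(pasted: String, finalText: String) -> (deleteCount: Int, insert: String) {
    let a = Array(pasted), b = Array(finalText)
    var i = 0
    while i < min(a.count, b.count), a[i] == b[i] { i += 1 }   // length of the common prefix
    return (deleteCount: a.count - i, insert: String(b[i...]))
}

// Example: the live paste guessed "jump", the final pass hears "jumps".
let step = reconcile(pasted: "The quick brown fox jump over",
                     finalText: "The quick brown fox jumps over the lazy dog.")
// step.deleteCount == 5, step.insert == "s over the lazy dog."
// Delete 5 characters at the cursor, insert the corrected tail, and the text matches.
```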
Auto-fix
Hallucination detection
If the first transcription guess is wrong, Rift detects the mistake and replaces it automatically.
No manual cleanup. No re-recording.
Real-time
Streaming transcription
Audio is processed in chunks as you speak.
No waiting for you to finish.
Silence polish
~5 seconds of quiet
After a few seconds of silence while dictating, Rift can polish what you already pasted — fillers, lists, grammar — using the same on-device model that powers final polish. Pauses aren’t wasted.
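Here is a toy sketch of how such a silence trigger could be wired up. It is illustrative only, and polishPastedText is a stand-in for whatever invokes the on-device model: each new burst of pasted words resets a five-second timer, and if the timer fires, the pause is spent polishing.

```swift
import Foundation

// Illustrative silence-polish trigger (hypothetical; not Rift's actual scheduling code).
final class SilencePolishTrigger {
    private var timer: Timer?
    private let quietThreshold: TimeInterval = 5.0
    var polishPastedText: () -> Void = {}          // stand-in for the on-device polish pass

    // Call whenever fresh text is pasted; every new word pushes the deadline back.
    func noteNewWords() {
        timer?.invalidate()
        timer = Timer.scheduledTimer(withTimeInterval: quietThreshold, repeats: false) { [weak self] _ in
            self?.polishPastedText()               // ~5 seconds of quiet: use the pause
        }
    }

    // Call when dictation stops; the final polish pass takes over from here.
    func stopDictation() {
        timer?.invalidate()
    }
}
```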
Polish modes
Verbatim keeps your words. Clean fixes obvious issues. Professional tightens tone more aggressively. You choose how much help you want.
Audio cues
Soft tones mark recording start and stop — so I get confirmation even when I’m not looking at the screen.
Hold to talk · Auto-send
Toggle or hold-to-talk dictation — pick what fits your hands and attention. Optional auto-send after paste (e.g. Return in chat apps) reduces friction after a burst of speech.
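The auto-send step is conceptually tiny. A hedged sketch, assuming the standard CGEvent approach to posting a Return keypress after the paste lands (not necessarily how Rift does it):

```swift
import CoreGraphics

// Post a synthetic Return keypress after pasting (36 is kVK_Return on macOS).
// Sketch only; like any synthetic key event, this needs Accessibility permission.
func pressReturn() {
    let keyDown = CGEvent(keyboardEventSource: nil, virtualKey: 36, keyDown: true)
    let keyUp   = CGEvent(keyboardEventSource: nil, virtualKey: 36, keyDown: false)
    keyDown?.post(tap: .cghidEventTap)
    keyUp?.post(tap: .cghidEventTap)
}
```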
Text to Voice
Select text. Multiple engines. Natural speech — including code.
First word in
150 milliseconds.
I can’t always read the screen for long stretches. When audio is how I read, the first syllable can’t arrive late.
150ms
First-word latency
You hear the first word before the sentence finishes generating.
No loading spinners. No waiting.
Seamless
Clause-level streaming
The next sentence is synthesized while the current one plays.
No gaps. No stutters. Continuous audio.
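Conceptually this is a one-clause lookahead. A simplified sketch, with synthesize and play as hypothetical stand-ins for the real engine and audio output:

```swift
// One-clause lookahead: while clause N plays, clause N+1 is already being synthesized.
// Sketch under assumed APIs; `synthesize` and `play` are placeholders.
func speak(_ clauses: [String],
           synthesize: @escaping (String) async -> [Float],
           play: @escaping ([Float]) async -> Void) async {
    guard let first = clauses.first else { return }
    var pending = Task { await synthesize(first) }       // start on clause 1 immediately
    for next in clauses.dropFirst() {
        let audio = await pending.value
        pending = Task { await synthesize(next) }        // overlap: synthesize the next clause ...
        await play(audio)                                // ... while the current one plays
    }
    await play(await pending.value)                      // last clause
}
```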
20ms
Audio poll rate
The audio buffer is checked every 20 milliseconds.
Imperceptible latency between chunks.
50 checks per second
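A bare-bones version of that poll loop might look like this. The buffer and playback hooks are hypothetical stand-ins, not Rift's audio stack:

```swift
import Foundation

// Illustrative 20 ms poll loop: 50 times a second, hand any finished samples to playback.
final class AudioPoller {
    private var timer: Timer?
    var drainSynthesizedSamples: () -> [Float] = { [] }   // stand-in: pulls ready audio
    var enqueueForPlayback: ([Float]) -> Void = { _ in }  // stand-in: feeds the output device

    func start() {
        timer = Timer.scheduledTimer(withTimeInterval: 0.020, repeats: true) { [weak self] _ in
            guard let self else { return }
            let samples = self.drainSynthesizedSamples()
            if !samples.isEmpty { self.enqueueForPlayback(samples) }
        }
    }

    func stop() { timer?.invalidate() }
}
```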
Pause anywhere
Tap to pause mid-syllable. Tap again to resume from the exact position.
Your place is never lost.
Tap to pause
0.5× – 2×
Playback speed
Speed up for skimming. Slow down for comprehension.
Adjust in real-time without restarting.
Code Talk
IDEs, terminals, docs
In Cursor, VS Code, Terminal, or developer sites, Rift detects context and can transform technical text into speakable phrasing before TTS — e.g. CSS overflow-x: hidden becomes “overflow-x set to hidden.” I read a lot of code with my ears.
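To give a flavor of the transformation (Rift's actual transform is model-driven; this rule-based version is only a toy illustration):

```swift
import Foundation

// Toy Code Talk rewrite for a single CSS declaration, e.g.
// speakableCSS("overflow-x: hidden;")  ->  "overflow-x set to hidden"
func speakableCSS(_ line: String) -> String {
    let trimmed = line.trimmingCharacters(in: .whitespaces)
        .replacingOccurrences(of: ";", with: "")
    let parts = trimmed.split(separator: ":", maxSplits: 1).map {
        $0.trimmingCharacters(in: .whitespaces)
    }
    guard parts.count == 2 else { return trimmed }
    return "\(parts[0]) set to \(parts[1])"
}
```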
Engines & voices
Kokoro (stable) and Chatterbox variants (including MLX fast paths) — pick what sounds right. 14+ voices across engines. Download extra models from the tray when you need them.
⌃3 — Show / hide / pause
Global shortcuts: ⌃1 read selection, ⌃2 dictation, ⌃3 show or hide the widget and pause audio. Your flow stays one keystroke away.
How it works
Two pipelines. Local speech models. A local language model for merge, correction, and polish. Zero cloud for your voice and text.
Start dictation
Capture
Core Audio streams from your microphone with a 250ms lead-in buffer. Your first word is never lost.
Process
Parakeet runs on the Neural Engine and GPU via MLX. 25 seconds of rolling context. Real-time streaming.
Paste
Text appears at your cursor as you speak. Final reconciliation when you stop. On-device Gemma 4 polishes your text — see Intelligence.
Speak selected text
Select
Highlight text in any app or copy to clipboard. Rift reads whatever you give it.
Synthesize
Kokoro or Chatterbox generates audio clause-by-clause. First word in 150ms. Next sentence ready before current ends. Code Talk may run an LLM transform first in developer contexts.
Play
Audio streams to system output. Pause anywhere, resume from exact position. 0.5× to 2× speed.
Four phases of local intelligence
Rift runs local language models (Gemma 4 and Qwen3, via MLX) alongside Parakeet and TTS. It’s not just transcription — it’s understanding and cleanup that never leaves your Mac.
- Merge — New words fold into what came before. Fewer duplicates and jumps as the recognizer updates.
- Correct — Grammar, punctuation, and light formatting in real time. Numbers and phrasing stay intentional.
- Extract — When the model revises earlier audio, only genuinely new words are appended.
- Polish — On pause or stop (and silence polish), fillers can be trimmed, lists formatted, sentences smoothed — per your polish mode.
A fast Qwen3 0.6B tier handles real-time phases; a deeper Gemma 4 E4B tier powers polish and Code Talk transforms. All on-device.
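In pseudocode terms, the split looks roughly like this (the types are illustrative; only the phase and model names come from the description above):

```swift
// Illustrative phase-to-tier routing; not Rift's actual types.
enum Phase { case merge, correct, extract, polish }

enum ModelTier: String {
    case fast = "Qwen3 0.6B"     // real-time phases while you speak
    case deep = "Gemma 4 E4B"    // polish and Code Talk transforms
}

// Route each phase to the tier that can keep up with it.
func tier(for phase: Phase) -> ModelTier {
    switch phase {
    case .merge, .correct, .extract: return .fast
    case .polish:                    return .deep
    }
}
```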
Privacy.
That's Rift.
Your voice never leaves your Mac. Ever. When assistive tech is how you read and write, that isn’t abstract — it’s dignity.
Zero file I/O
Audio is synthesized directly to memory. Nothing is written to disk. Nothing persists after you close the app.
Who Rift is for
The same design choices that help me help anyone who wants patient dictation, natural TTS, and privacy.
Dyslexia
I think better out loud than on paper. Rift turns speech into text without fighting me — and reads it back when I need to hear what I wrote.
ADHD
My brain takes detours. Rift doesn’t punish pauses, restarts, or nonlinear thinking — and live paste keeps the feedback loop tight.
Low vision
I can’t always read the screen. Rift reads to me — fast first word, adjustable speed, pause anywhere — with voices I can listen to for hours.
Motor differences
Hold-to-talk, global shortcuts, and no forced auto-cutoff mean less reliance on precise timing and fewer repeated keypresses.
Writers & thinkers
If you think by talking, Rift captures voice privately — on your Mac, under your control.
See the patience in action
A simplified replay: streaming text, a long pause, then an auto-fix.
Demo transcript
Recording starts → text streams in → a 3s pause (where other tools would have cut off) → speech resumes → a wrong word auto-corrects.
Performance
Tested on real hardware. Real workloads.
M1 MacBook Air
M3 MacBook Pro
M4 Mac Studio
How Rift compares
| Feature | Rift | Whisper.cpp | macOS Dictation |
|---|---|---|---|
| On-device | Yes | Yes | Partial |
| No auto-cutoff | Yes | No | No |
| Live paste | Yes | No | Yes |
| First-word buffer | 250ms | None | None |
| Local LLM polish | Yes | No | No |
| TTS included | Yes | No | Basic |
| TTS latency (first word) | ~150ms | N/A | ~500ms |
| Voice & text privacy | 100% local | 100% local | Cloud fallback |
Requirements
- macOS: Sonoma 14.0+
- Chip: Apple Silicon
- RAM: 8GB minimum
- Disk: ~2GB
The visual metaphor
Nothing escapes.
A black hole where your data goes in — and stays in.
The Singularity
Your Mac is the center of gravity. All processing happens here — voice recognition, text synthesis, everything. No servers. No cloud. One machine.
The Accretion Disk
Your voice flows in like matter spiraling toward the event horizon. It gets captured, processed, transformed. The warm glow is energy being released as computation.
The Event Horizon
The point of no return — but in a good way. Once your words enter Rift, they never leave your machine. No telemetry, no uploads, no exceptions.
Gravitational Lensing
Just as light bends around a black hole, your voice bends into text. Text bends into voice. Transformation through the most powerful force — local compute.
How the visualization works
Raymarching
Volumetric rendering via signed distance functions. The sphere-traced shader calculates 128 iterations per pixel to simulate photon paths.
Schwarzschild geodesics
Light follows the curved spacetime geometry of a non-rotating black hole. The photon sphere appears as a bright ring at 1.5× the event horizon radius.
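That 1.5× figure falls straight out of the Schwarzschild solution: for a black hole of mass M, the photon sphere sits at three halves of the Schwarzschild radius.

$$r_\mathrm{s} = \frac{2GM}{c^{2}}, \qquad r_\mathrm{ph} = \frac{3GM}{c^{2}} = \tfrac{3}{2}\,r_\mathrm{s}$$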
Keplerian disk
Accretion disk particles orbit according to Kepler's laws. Inner particles orbit faster, creating the characteristic spiral structure.
ACES tonemapping
Film-industry-standard color grading compresses the HDR luminance into displayable range while preserving the fiery accretion glow.
Visualization based on Singularity by MisterPrada
Frequently asked
Does it work offline?
Yes for voice and text — all STT, TTS, and LLM work runs on your Mac. Rift does not send your speech or transcripts to the cloud. Optional Check for Updates and first-run model downloads use the network; you can use the app fully offline after models are cached.
What languages are supported?
Currently English only. The underlying Parakeet model supports multiple languages, and we're working on enabling them in future updates.
What voices and TTS engines are available?
Kokoro ships with multiple built-in voices. Chatterbox variants (including MLX fast options) add more voices and can be downloaded from the app when needed. Custom voice cloning is not available yet.
What is Code Talk?
In IDEs, terminals, and docs sites, Rift can transform technical text into natural speech before TTS — so code and symbols are spoken clearly instead of letter-by-letter noise.
What is Silence Polish?
When you pause for a few seconds while dictating, Rift can use that silence to clean up pasted text (fillers, lists, light grammar) using the on-device model — without sending anything off your Mac.
Is my voice data stored anywhere?
Never. Audio is processed in memory and discarded immediately. Nothing is written to disk or sent anywhere.
Why is the first run slow?
On first launch, Rift downloads and caches the ML models (~2GB). Subsequent launches are instant.
Does it work on Intel Macs?
No. Rift requires Apple Silicon (M1 or later) for the MLX machine learning framework.
Is Rift open source?
Yes. The full source code is available on GitHub under the MIT license.
How do I install it?
Download the DMG, drag to Applications, and launch. Apple Silicon (M1+) required. If macOS shows a security warning, check the installation guide for a quick fix. First launch downloads ~2GB of ML models.
The Technology
Built different.
Four pillars — a shared foundation, speech in, intelligence in the middle, speech out — all on Apple Silicon. No cloud for your content.
The Foundation
MLX
Apple's machine learning framework. Runs entirely on your Mac's Neural Engine and GPU.
Voice to Text
Parakeet
NVIDIA's state-of-the-art speech recognition, optimized for Apple Silicon.
Text to Voice
Kokoro
Neural TTS with natural voices; Chatterbox variants optional. Real-time synthesis.
Intelligence
Gemma 4 + Qwen3
Local LLMs for merge, correct, extract, polish, and Code Talk. A fast Qwen3 0.6B for real-time work plus Gemma 4 E4B for deep cleanup — all via MLX.
Rift
Your voice. Your Mac. Nothing else.
Download for macOS
Apple Silicon (M1+) · macOS 14+ · English