Rift Evals

v2026.01.24 · Jan 24, 2026 · Source · JSON

Rift is a local voice-to-text pipeline using MLX on Apple Silicon.
185 scenarios across 5 test suites. Pass rate: 71.2% (threshold: 70%).

Methodology

Tests compare LLM output to expected results using Jaccard similarity of word sets. Pass requires similarity ≥ 0.70 and latency ≤ threshold. Latency limits: 1500ms (merge), 500ms (correction), 6500ms (polish), 16000ms (TTS transform).

Hardware: Apple M3 Pro, 18GB unified memory, MLX framework. Models: Qwen3-0.6B (fast operations), Qwen3-4B (quality operations).

Model comparison: All candidates run on the same 66 LLM test scenarios. Prompts are stored in Qwen3 chat format and auto-converted to each model's native format via tokenizer.apply_chat_template(). Models benchmarked: Qwen3-4B, Gemma 4 E4B (4-bit, 6-bit), Gemma 4 26B MoE (4-bit).

Limitations