Rift is a local voice-to-text pipeline using MLX on Apple Silicon.
185 scenarios across 5 test suites.
Pass rate: 71.2% (threshold: 70%).
Tests that didn't meet the 70% similarity threshold
Tests compare LLM output to expected results using Jaccard similarity of word sets.
Pass requires similarity ≥ 0.70 and latency ≤ threshold.
Latency limits: 1500ms (merge), 500ms (correction), 6500ms (polish), 16000ms (TTS transform).
Hardware: Apple M3 Pro, 18GB unified memory, MLX framework. Models: Qwen3-0.6B (fast operations), Qwen3-4B (quality operations).
Model comparison: All candidates run on the same 66 LLM test scenarios.
Prompts are stored in Qwen3 chat format and auto-converted to each model's native
format via tokenizer.apply_chat_template(). Models benchmarked: Qwen3-4B,
Gemma 4 E4B (4-bit, 6-bit), Gemma 4 26B MoE (4-bit).