On-device Speech AI for Apple Silicon
Updated Jun 29, 2026 - Swift
Cutting-edge AI technology for automated audio transcription. A nice GUI for OpenAI's Whisper and pyannote (speaker identification)
Open source inference code for Rev's model
Very fast, accurate speaker diarization
Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023.
Speaker diarization in Rust. 312–912x realtime on Apple Silicon, 50–121x on CUDA. Matches pyannote accuracy.
Provide Gradio custom components to make the diarization-based audio labeling process easier and faster.
EchoInStone is an audio processing tool that transcribes, diarizes, and aligns speaker segments from audio files, prioritizing accuracy and reliability.
On-device VAD / streaming STT / TTS / diarization in C++17 (ONNX + LiteRT) with a voice-agent pipeline. Linux, Windows, Android.
Speaker identification powered by pyannote and resemblyzer
Official repository for Mamba-based Segmentation Model for Speaker Diarization
Ultra-fast, customizable speech-to-text and speaker diarization for noisy, multi-speaker audio. Includes advanced noise reduction, stereo channel support, and flexible audio preprocessing—ideal for call centers, meetings, and podcasts.
GPU-accelerated WhisperX on NVIDIA Blackwell (SM_121) - DGX Spark compatible
Transcription from MP3 files to HTML, with or without an embedded player
Multi-source transcript merging inspired by textual criticism — LLM adjudicates multiple Whisper, YouTube captions & external transcripts for higher quality. Includes speaker diarization and summarization.
Speech-to-text GUI for different models (e.g. Whisper, Voxtral) and backends, including whisper.cpp, crispasar, mlx-whisper, faster-whisper, and ctranslate2; applies pyannote for diarization
Real-time speaker diarization using straightforward, intuitive logic - High accuracy thanks to SpeechBrain/Pyannote-WeSpeaker models
PyAnnote Voice Activity Detection (ONNX version)
Record and transcribe Teams, Zoom, and Google Meet calls locally with AI-powered speaker identification. Open-source alternative to Evaer, Otter.ai, and Fireflies. Offline speech-to-text using Whisper — no cloud, no subscriptions.