Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.
Updated Jun 29, 2026 - Python
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
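End-to-end Chinese speech recognition implemented with PaddlePaddle, from first steps to hands-on practice, with very simple introductory examples and practical enterprise projects. Supports the currently popular DeepSpeech2, Conformer, and Squeezeformer models.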
On-device VAD / streaming STT / TTS / diarization in C++17 (ONNX + LiteRT) with a voice-agent pipeline. Linux, Windows, Android.
Streaming on-device speech recognition for Android — NEON-accelerated, encrypted FastConformer (32M params), ~150 ms latency, no cloud. Powered by the VoxRT runtime.
Nemotron ASR streaming + real-time translation. 19 source × 200 target languages, runs on a MacBook CPU. Vi↔En foundation, any pair via flags. No cloud.
Streaming on-device speech recognition for iOS — NEON-accelerated, encrypted FastConformer (32M params), RTF 0.08–0.10 on iPhone 13 Pro Max. Built on the VoxRT custom Rust inference runtime. SwiftPM distribution.
A 1300-hour English speech and text corpus of parliamentary debates for streaming ASR training and benchmarking, speech data filtering and speech data verbatimization.
Pre-compiled ASR model weights for the VoxRT on-device runtime. Encrypted .vxrt v2 format. streaming-medium-pc: FastConformer 32M, CTC + RNN-T, CC-BY-4.0 (NVIDIA NeMo).
Faster-Whisper Transcription Server & API is a production-ready speech-to-text micro-service stack that wraps faster-whisper with a streaming FastAPI server, a Celery/Redis background queue, and optional Docker deployment—delivering real-time or batch audio transcription with minimal latency and simple web-hook integration.
OpenAI-compatible proxy bridging the Doubao/Volcengine ASR 2.0 (Seed-ASR) WebSocket protocol to /v1/audio/transcriptions; works with Spokenly and other OpenAI-compatible clients.
Production-ready REST API for Russian speech recognition using T-one model. FastAPI-based service with offline and streaming transcription support.
PhD Thesis: "Automatic speech recognition and machine translation with deep neural networks for open educational resources, parliamentary contents and broadcast media" (2024)
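Windows desktop voice input tool for Doubao: global-hotkey recording → Volcengine streaming ASR → automatic paste at the cursor. Natively supports Doubao platform hotword table IDs.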
Lightweight Windows voice input tool with offline streaming ASR, hotwords, and AI text correction
Low-latency voice AI agent platform with streaming ASR/TTS, FSM-based dialog management, and microservices architecture. Built with FastAPI, LangGraph, vLLM, and F5-TTS.
SwiftPM streaming ASR runtime for NVIDIA Nemotron 3.5 on Apple CoreAI
Reusable Rust speech-to-text runtime with audio capture, VAD, backend selection, model provisioning, and transcript streaming.
Injecting semantics into streaming automatic speech recognition models