Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.
Updated Jun 8, 2026 - Python
A PyTorch-based Speech Toolkit
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
End-to-End Speech Processing Toolkit
On-device Speech AI for Apple Silicon
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
Frontier CoreML audio models in your apps — text-to-speech, speech-to-text, voice activity detection, and speaker diarization. In Swift, powered by SOTA open source.
A Python package to build AI-powered real-time audio applications
A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.
This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.
Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit
End-to-end speech recognition large model: 31 languages, dialects, accents, lyrics, hotwords, timestamps, speaker diarization. Trained on tens of millions of hours.
Turnkey self-hosted offline transcription and diarization service with LLM summaries
AI speech toolkit for Apple Silicon — ASR, TTS, speech-to-speech, VAD, and diarization powered by MLX and CoreML
One-stop automated subtitle generator. Handles downloading, transcription, translation, and hardcoding, with zero human intervention required.
Python re-implementation of the (constrained) spectral clustering algorithms used in Google's speaker diarization papers.