Speakr's open source AI audio transcription tool

Speakr can generate concise summaries and titles and interact with content through a chat interface. It provides multiple functions, including audio upload, browser recording, transcription, speaker recognition, AI summary and title generation, interactive chat, etc. Maintained by murtaza‑nasir:

ˇProject overview

Speakr is a “Self-hosted smart voice taking application“, the main uses include:

Automatically convert recordings (such as meetings, lectures, interviews) into text
supportAutomatically identify speakersspeaker diarization, and can be named manually
Generate summaries and titles for transcribed content
Built-in interactive chat interface, you can ask questions about the recording content
Support multiple audio formats (MP3, WAV, M4A, AMR, etc.)

Core characteristics

📋 Recording and uploading

Browser-enabled recording (microphone, system audio, or both)
Support drag-and-drop or “black hole” directory to automatically identify and process files

ˇAutomatic transcription and speaker recognition

Use the OpenAI Whisper API or locally compatible models for voice transcribing
When combined with ASR services (such as WhisperX), you can automaticallyDistinguish multiple speakersAfter uploading, tags such as SPEAKER01 and SPEAKER02 can be generated, and AI-assisted naming and saving of personal speaker identities can be supported.

Automatically generate summary/title

Use LLM (such as the GPT series) to generate summaries and titles for each transcription

💬Smart chat interaction

Built-in chat interface allows you to “talk” with recorded content: ask questions and let AI find answers in the text

ˇEditing and formatting support

Support online editing of transcribed texts, summaries, and speaker information
Markdown support improves content aesthetics and structure

🧑‍💻可自定义与部署

Provides Docker container (Dockerfile, Docker-compose),.env configuration template guide (ASR/Whisper)
Support self-hosting Whisper model or calling APIs such as OpenAI/OpenRouter/Azure

Latest update (v0.4.1, 2025‑07‑19)

New UI interface
Security sharing function: You can set permissions to make recordings/summaries public and withdraw links at any time
Enhanced recording experience (mobile optimization, dual audio visualization)
Supports AMR audio format
Implement online editing of transcribed texts and writing summaries in Markdown format ([GitHub][2])

ˇ Suitable for crowd use scenarios

user	scene
Office worker/team host	Compilation of meeting minutes and interviews
Student/Lecturer	Course notes and lecture management
Journalists, content creators	Organization and quick summary of interview content
Privacy sensitive users	Deploy locally without requiring third-party cloud platforms

Quick start suggestions

Prepare a server or VPS with container support
Clone the project and configure it according to deployment guidelines .env and docker-compose.yml
Select interfaces based on budget:
- Free self-hosting: Use WhisperX ASR + Local LLM
- Cloud services: Using OpenAI Whisper and GPT interfaces
After running, open the web interface to upload recordings and experience transcription, summary, and Chat features

Excerpts of community feedback

Reddit user “hedonihilistic” concluded:

“Speaker Diarization: … automatically detect different speakers … You can easily rename them … Reprocess Button: …”([Reddit][3])

Summary

Speakr is a fully functional, modern interface and privacy-focused self-hosted voice taking tool for users who need transcription, summary generation and intelligent interaction. Whether it is a classroom, a meeting or an interview, it can provide efficient and structured transcripts and AI-driven interactive experiences.

Github：https://github.com/murtaza-nasir/speakr

Oil tubing: