Speakr can generate concise summaries and titles and interact with content through a chat interface. It provides multiple functions, including audio upload, browser recording, transcription, speaker recognition, AI summary and title generation, interactive chat, etc. Maintained by murtaza‑nasir:
ˇProject overview
Speakr is a “Self-hosted smart voice taking application“, the main uses include:
- Automatically convert recordings (such as meetings, lectures, interviews) into text
- supportAutomatically identify speakersspeaker diarization, and can be named manually
- Generate summaries and titles for transcribed content
- Built-in interactive chat interface, you can ask questions about the recording content
- Support multiple audio formats (MP3, WAV, M4A, AMR, etc.)
Core characteristics
📋 Recording and uploading
- Browser-enabled recording (microphone, system audio, or both)
- Support drag-and-drop or “black hole” directory to automatically identify and process files
ˇAutomatic transcription and speaker recognition
- Use the OpenAI Whisper API or locally compatible models for voice transcribing
- When combined with ASR services (such as WhisperX), you can automaticallyDistinguish multiple speakersAfter uploading, tags such as SPEAKER01 and SPEAKER02 can be generated, and AI-assisted naming and saving of personal speaker identities can be supported.
Automatically generate summary/title
- Use LLM (such as the GPT series) to generate summaries and titles for each transcription
💬Smart chat interaction
- Built-in chat interface allows you to “talk” with recorded content: ask questions and let AI find answers in the text
ˇEditing and formatting support
- Support online editing of transcribed texts, summaries, and speaker information
- Markdown support improves content aesthetics and structure
🧑💻可自定义与部署
- Provides Docker container (Dockerfile, Docker-compose),.env configuration template guide (ASR/Whisper)
- Support self-hosting Whisper model or calling APIs such as OpenAI/OpenRouter/Azure
Latest update (v0.4.1, 2025‑07‑19)
- New UI interface
- Security sharing function: You can set permissions to make recordings/summaries public and withdraw links at any time
- Enhanced recording experience (mobile optimization, dual audio visualization)
- Supports AMR audio format
- Implement online editing of transcribed texts and writing summaries in Markdown format ([GitHub][2])
ˇ Suitable for crowd use scenarios
| user | scene |
|---|---|
| Office worker/team host | Compilation of meeting minutes and interviews |
| Student/Lecturer | Course notes and lecture management |
| Journalists, content creators | Organization and quick summary of interview content |
| Privacy sensitive users | Deploy locally without requiring third-party cloud platforms |
Quick start suggestions
- Prepare a server or VPS with container support
- Clone the project and configure it according to deployment guidelines
.envanddocker-compose.yml - Select interfaces based on budget:
- Free self-hosting: Use WhisperX ASR + Local LLM
- Cloud services: Using OpenAI Whisper and GPT interfaces
- After running, open the web interface to upload recordings and experience transcription, summary, and Chat features
Excerpts of community feedback
Reddit user “hedonihilistic” concluded:
“Speaker Diarization: … automatically detect different speakers … You can easily rename them … Reprocess Button: …”([Reddit][3])
Summary
Speakr is a fully functional, modern interface and privacy-focused self-hosted voice taking tool for users who need transcription, summary generation and intelligent interaction. Whether it is a classroom, a meeting or an interview, it can provide efficient and structured transcripts and AI-driven interactive experiences.
Github:https://github.com/murtaza-nasir/speakr
Oil tubing: