Speakr’s open source AI audio transcription tool

Speakr can generate concise summaries and titles and interact with content through a chat interface. It provides multiple functions, including audio upload, browser recording, transcription, speaker recognition, AI summary and title generation, interactive chat, etc. Maintained by murtaza‑nasir:

ˇProject overview

Speakr is a “Self-hosted smart voice taking application“, the main uses include:

  • Automatically convert recordings (such as meetings, lectures, interviews) into text
  • supportAutomatically identify speakersspeaker diarization, and can be named manually
  • Generate summaries and titles for transcribed content
  • Built-in interactive chat interface, you can ask questions about the recording content
  • Support multiple audio formats (MP3, WAV, M4A, AMR, etc.)

Core characteristics

📋 Recording and uploading

  • Browser-enabled recording (microphone, system audio, or both)
  • Support drag-and-drop or “black hole” directory to automatically identify and process files

ˇAutomatic transcription and speaker recognition

  • Use the OpenAI Whisper API or locally compatible models for voice transcribing
  • When combined with ASR services (such as WhisperX), you can automaticallyDistinguish multiple speakersAfter uploading, tags such as SPEAKER01 and SPEAKER02 can be generated, and AI-assisted naming and saving of personal speaker identities can be supported.

Automatically generate summary/title

  • Use LLM (such as the GPT series) to generate summaries and titles for each transcription

💬Smart chat interaction

  • Built-in chat interface allows you to “talk” with recorded content: ask questions and let AI find answers in the text

ˇEditing and formatting support

  • Support online editing of transcribed texts, summaries, and speaker information
  • Markdown support improves content aesthetics and structure

🧑‍💻可自定义与部署

  • Provides Docker container (Dockerfile, Docker-compose),.env configuration template guide (ASR/Whisper)
  • Support self-hosting Whisper model or calling APIs such as OpenAI/OpenRouter/Azure

Latest update (v0.4.1, 2025‑07‑19)

  • New UI interface
  • Security sharing function: You can set permissions to make recordings/summaries public and withdraw links at any time
  • Enhanced recording experience (mobile optimization, dual audio visualization)
  • Supports AMR audio format
  • Implement online editing of transcribed texts and writing summaries in Markdown format ([GitHub][2])

ˇ Suitable for crowd use scenarios

userscene
Office worker/team hostCompilation of meeting minutes and interviews
Student/LecturerCourse notes and lecture management
Journalists, content creatorsOrganization and quick summary of interview content
Privacy sensitive usersDeploy locally without requiring third-party cloud platforms

Quick start suggestions

  1. Prepare a server or VPS with container support
  2. Clone the project and configure it according to deployment guidelines .env and docker-compose.yml
  3. Select interfaces based on budget:
    • Free self-hosting: Use WhisperX ASR + Local LLM
    • Cloud services: Using OpenAI Whisper and GPT interfaces
  4. After running, open the web interface to upload recordings and experience transcription, summary, and Chat features

Excerpts of community feedback

Reddit user “hedonihilistic” concluded:

“Speaker Diarization: … automatically detect different speakers … You can easily rename them … Reprocess Button: …”([Reddit][3])

Summary

Speakr is a fully functional, modern interface and privacy-focused self-hosted voice taking tool for users who need transcription, summary generation and intelligent interaction. Whether it is a classroom, a meeting or an interview, it can provide efficient and structured transcripts and AI-driven interactive experiences.

Github:https://github.com/murtaza-nasir/speakr

Oil tubing:

Scroll to Top