One-stop speech recognition, separation and summary solution!
Private-ASR is a locally deployed tool, based on modifications to the open source project FunClip, that integrates automatic speech recognition (ASR), speaker separation, SRT subtitle editing, and summarization based on a Large Language Model (LLM). The project uses Gradio to provide an intuitive, easy-to-use user interface.
Main functions:
- Automatic speech recognition (ASR): supports video and audio input; outputs text and SRT subtitles.
- Speaker separation (SD): identifies and distinguishes different speakers in multi-speaker audio/video.
- SRT subtitle editor: allows users to replace speaker labels with custom names.
- LLM-based summary: uses a GPT-based model to summarize ASR results; supports custom API configurations.
- Deployment options: provides a lightweight Docker container for production environments and a Python environment for development/testing.
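The LLM summary feature talks to an OpenAI-compatible endpoint (see the `OPENAI_API_BASE` setting below). As a rough sketch of what such a request looks like, here is how a chat-completions payload can be assembled with only the standard library; the endpoint path, model name, and prompt wording are illustrative assumptions, not the project's actual code:

```python
import json

def build_summary_request(transcript, api_base, api_key, model="gpt-3.5-turbo"):
    """Build an OpenAI-compatible chat-completions request.

    NOTE: the endpoint path, default model name, and prompt wording are
    illustrative assumptions; they are not taken from the Private-ASR source.
    """
    url = f"{api_base.rstrip('/')}/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "Summarize the following transcript."},
            {"role": "user", "content": transcript},
        ],
    }
    return url, headers, json.dumps(payload)

# Example: point the request at a custom API base, as the .env settings allow.
url, headers, body = build_summary_request(
    "spk0: Hello everyone ...", "https://your-custom-api.com", "your_openai_key"
)
```

Because the base URL is configurable, the same request shape works against any OpenAI-compatible backend, which is what makes the custom API configuration useful for private deployments.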
System requirements:
- Deployment method: Docker (for container-based deployment) or Python 3.9+ (for manual deployment).
- Dependencies: see the requirements.txt file.
Deployment steps:

Docker deployment:

- Build the Docker image:

  ```bash
  docker build -t audio-processor:latest .
  ```

- Deploy using Docker Compose:

  ```yaml
  version: '3.8'

  services:
    audio-processor:
      image: audio-processor:latest
      container_name: audio-processor
      ports:
        - "7860:7860"
      volumes:
        - ./.env:/app/.env
      working_dir: /app
      restart: unless-stopped
  ```

  Then run:

  ```bash
  docker-compose up -d
  ```

  The Gradio interface will be available at http://localhost:7860.
Python deployment:

- Set up the environment:

  ```bash
  git clone https://github.com/MotorBottle/Private-ASR.git
  cd audio-processor
  python3 -m venv .venv
  source .venv/bin/activate
  pip install --no-cache-dir -r requirements.txt
  ```

- Make sure FFmpeg is installed:

  ```bash
  sudo apt-get update
  sudo apt-get install -y ffmpeg
  ```

- Run the application:

  ```bash
  python funclip/launch.py --listen
  ```

  The Gradio interface will be available at http://localhost:7860.
Environment configuration:
All credentials and API configurations can be stored in a .env file. For example:

```
USERNAME=motor
PASSWORD=admin
OPENAI_API_KEY=your_openai_key
OPENAI_API_BASE=https://your-custom-api.com
```
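For intuition, a .env file like the one above is just KEY=VALUE lines read into the process environment. The project may well use a library such as python-dotenv for this; the minimal, standard-library-only loader below is an illustrative sketch, not the project's code:

```python
import os
import tempfile

def load_env(path=".env"):
    """Minimal .env loader (illustrative; the project may use python-dotenv
    or similar instead of this exact code).

    Reads KEY=VALUE lines, skips blank lines and '#' comments, and puts the
    values into os.environ without overwriting variables already set.
    """
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

# Demonstrate with a throwaway file so the sketch runs anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as fh:
    fh.write("DEMO_OPENAI_API_BASE=https://your-custom-api.com\n")
load_env(fh.name)
```

Using `setdefault` means variables exported in the shell (or injected by Docker) take precedence over the file, which is the usual convention for .env handling.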
Usage:
- Upload an audio or video file.
- Run ASR recognition or speaker separation.
- Edit the speaker names in the generated SRT subtitles.
- Analyze and summarize the ASR text with the LLM summary function.
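The speaker-renaming step above amounts to a text substitution over the SRT content. A minimal sketch, assuming `spk0`-style diarization labels (the label format and `rename_speakers` helper are illustrative assumptions, not taken from the project):

```python
def rename_speakers(srt_text, name_map):
    """Replace diarization speaker labels in SRT text with custom names.

    `name_map` maps raw labels (e.g. "spk0") to display names; the label
    format is an illustrative assumption about the diarization output.
    """
    for label, name in name_map.items():
        srt_text = srt_text.replace(label, name)
    return srt_text

srt = """1
00:00:00,000 --> 00:00:02,000
spk0: Hello there.

2
00:00:02,000 --> 00:00:04,000
spk1: Hi!
"""
renamed = rename_speakers(srt, {"spk0": "Alice", "spk1": "Bob"})
```

Since the timing lines contain no letters, a plain string replacement of the labels is safe for this format.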
Contributions and license:
This project is released under the MIT license. Contributions are welcome!
For more information, visit the project's GitHub page:
GitHub: https://github.com/MotorBottle/Private-ASR
YouTube: