Private-ASR: Locally deployed intelligent voice assistant

One-stop speech recognition, separation and summary solution!

Integrates automatic speech recognition (ASR), speaker separation, SRT subtitle editing, and LLM-based summarization. The project uses Gradio to provide an intuitive, easy-to-use user interface.

Private-ASR is a locally deployed tool, built on modifications to the open-source project FunClip, that integrates automatic speech recognition (ASR), speaker separation, SRT subtitle editing, and summarization based on a large language model (LLM).

Main functions:

  1. Automatic Speech Recognition (ASR):

    • Supports video and audio input; outputs text and SRT subtitles.
  2. Speaker separation (SD):

    • Identifies and distinguishes different speakers in multi-speaker audio/video.
  3. SRT subtitle editor:

    • Allows users to replace speaker labels with custom names.
  4. Summary based on LLM:

    • Uses a GPT-based model to summarize ASR results, with support for custom API configurations.
  5. Deployment options:

    • Provides a lightweight Docker container for production environments and a Python environment for development/testing.
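As an illustration of the subtitle-editing step above, speaker labels emitted by diarization can be swapped for custom names with a simple substitution. The sketch below assumes `spkN`-style labels; the actual tag format the project emits may differ:

```python
import re

def rename_speakers(srt_text, name_map):
    """Replace speaker tags (assumed 'spkN' style) with user-chosen names."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, name_map)) + r")\b")
    return pattern.sub(lambda m: name_map[m.group(1)], srt_text)

srt = "1\n00:00:01,000 --> 00:00:03,000\nspk0: Hello everyone.\n"
print(rename_speakers(srt, {"spk0": "Alice"}))
```

Because the replacement is a single regex pass over the whole SRT text, every cue for a given speaker is renamed at once.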

System requirements:

  • Deployment method:

    • Docker (for container-based deployment)
    • Python 3.9+ (for manual deployment)
  • Dependencies:

    • See requirements.txt.

Deployment steps:

  1. Docker deployment:

    • Build Docker image:

      docker build -t audio-processor:latest .
    • Deploy using Docker Compose:

      version: '3.8'

      services:
        audio-processor:
          image: audio-processor:latest
          container_name: audio-processor
          ports:
            - "7860:7860"
          volumes:
            - ./.env:/app/.env
          working_dir: /app
          restart: unless-stopped

      Then run:

      docker-compose up -d

      The Gradio interface will then be available at http://localhost:7860.

  2. Python deployment:

    • Setting environment:

      git clone https://github.com/MotorBottle/Private-ASR.git
      cd Private-ASR
      python3 -m venv .venv
      source .venv/bin/activate
      pip install --no-cache-dir -r requirements.txt
    • Make sure FFmpeg is installed:

      sudo apt-get update
      sudo apt-get install -y ffmpeg
    • Run the application:

      python funclip/launch.py --listen

      The Gradio interface will then be available at http://localhost:7860.
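FFmpeg is required because video input must first be converted into audio the ASR model can consume. A hedged sketch of the kind of command such tools typically run (the exact options Private-ASR uses may differ):

```python
def ffmpeg_extract_cmd(input_path, wav_path, sample_rate=16000):
    """Build an FFmpeg command that strips video and resamples to mono WAV.

    -vn drops the video stream, -ac 1 downmixes to mono, and -ar sets the
    sample rate; 16 kHz mono is a common input format for ASR models.
    """
    return ["ffmpeg", "-y", "-i", input_path,
            "-vn", "-ac", "1", "-ar", str(sample_rate), wav_path]

print(" ".join(ffmpeg_extract_cmd("talk.mp4", "talk.wav")))
```

Passing the command as a list (e.g. to subprocess.run) avoids shell-quoting issues with file names containing spaces.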

Environment configuration:

All credentials and API configuration can be stored in the .env file. For example:

USERNAME=motor
PASSWORD=admin
OPENAI_API_KEY=your_openai_key
OPENAI_API_BASE=https://your-custom-api.com
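How the application reads these values depends on its implementation (many Python projects use python-dotenv); a minimal stdlib sketch of parsing such KEY=VALUE lines:

```python
import os

def load_env(path=".env"):
    """Parse KEY=VALUE lines; blank lines and '#' comments are skipped."""
    values = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    os.environ.update(values)  # make the settings visible to the whole process
    return values
```

Keeping secrets in .env (and out of version control) is what allows the same image to be reused across deployments, with only the mounted file changing.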

Usage:

  1. Upload audio or video files.
  2. Perform ASR recognition or speaker separation.
  3. Edit speaker names in the generated SRT subtitles.
  4. Use the LLM summary function to analyze and summarize the ASR text.

Contributions and license:

This project is released under the MIT license. Contributions are welcome!

For more information, visit the project’s GitHub page:

GitHub: https://github.com/MotorBottle/Private-ASR

YouTube:
