AI-Media2Doc is based on the AI model and converts video and audio into various styles of documents with one click. There is no need to login and register. It is deployed locally at the front and back ends to experience AI video/audio conversion style document services at a very low cost.
overview of items
AI-Media2Doc It is a completely open source (MIT licensed) Web tool that aims to convert audio and video content into multiple styles of documents with one click, such as small red books, Weixin Official Accounts articles, knowledge notes, mind maps, video subtitles, etc.
- No need to login to register: The project design has strong privacy protection, and all task records are kept locally.
- Front-end local deployment: Supports users to run in their own environment without relying on external servers.
- The front end uses ffmpeg-wasm technology: You can process audio and video files in the browser without installing local ffmpeg.
- Support rich document styles: The generated content includes small red books, Weixin Official Accounts, knowledge notes, mind maps, content summaries, etc.
- AI secondary dialogue function: AI questions and answers, further interactions or content expansion can be conducted for video content.
- Subtitle export smart screenshots:
- Support one-click export of subtitle files.
- It can automatically take screenshots based on subtitle information and insert them into the article to achieve the combination of text and text without the need for a large visual model.
- Custom Prompt: Front-end allows users to configure custom prompt words (prompt)
- Docker one-click deployment: The project supports rapid deployment using Docker images, making it easy to get started and integrate
Technology and update highlights
- The project was initiated by hanshuaikang and currently has about 2.2k Stars and 268 Forks, indicating that the community has attracted high attention
- The project was first created on April 12, 2025. The most recent update (v0.5.1) was released on August 3, 2025, adding features including enhanced screenshot accuracy, optimization of multi-file processing performance, and Markdown table support
- Complete local deployment guidelines, including Docker image building,
variables.envof its top-shelf specs andmake runinitiate process
Overview of usage process (high-level)
- local deployment: Build an image through Docker, configure environment variables and execute it
make runStart the service quickly. - Online Voice/Video File: Users upload files on the front-end interface.
- AI transcribes and generates documents: The system automatically recognizes audio content (such as using Whisper technology) and generates structured content and documents, such as mind maps, notes, public account articles, etc.
- Interaction and adjustment: Users can inquire or adjust content through AI, and also support screenshot embedding, subtitle export and custom Prompt settings.
- export results: After completion, documents, subtitles, and mixed text manuscripts can be exported to improve subsequent editing efficiency.
Applicable scenarios and target users
- self-media creators: Quickly convert video content into graphic formats that adapt to different platforms.
- Learner/academic recorder: Quickly organize the course videos into notes or mind maps.
- Enterprise/Internal Sharing: Used to convert meeting recordings to documents for easy archiving and sharing.
- Privacy sensitive users: Willing to deploy locally to avoid the risk of data uploading to cloud platforms.
Future planning and community response
- The long-term goal includes using the local fast-whisper large model to further improve offline identification efficiency and accuracy.
- The community also looks forward to a more convenient one-click deployment panel (1-panel deployment) to lower the threshold for use.
Summary:
AI-Media2Doc is a strong privacy, open source, and easy-to-deploy tool that intelligently converts video and audio content into diverse, high-quality documents. It is very useful for scenarios such as self-media creation and learning notes collation.
If you want to learn more about how to deploy, use Docker or debug specific functions, you can also let me know and I can continue to help you answer them!
Github:https://github.com/hanshuaikang/AI-Media2Doc
Oil tubing: