Project name: VideoTextPro
Project function: text conversion tool
Project description: A text conversion tool optimized for recordings of Douyin live streams. It is mainly used for text extraction and subtitle generation from live-stream replays and recorded videos. It supports multiple video and audio formats, including FLV, MP4, and AVI.
It handles batch processing efficiently: it can automatically scan recording folders, skip files that have already been processed, and output multiple subtitle formats (such as SRT, ASS, and TXT).
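The scan-and-skip behavior described above could look roughly like the following. This is a minimal sketch, not the project's actual code: the function name `find_unprocessed`, the `VIDEO_EXTENSIONS` set, and the rule that a video counts as processed once a same-named subtitle file exists are all assumptions made for illustration.

```python
from pathlib import Path

# Extensions named in the description; extend this set as needed.
VIDEO_EXTENSIONS = {".flv", ".mp4", ".avi"}


def find_unprocessed(video_dir, subtitle_dir, subtitle_ext=".srt"):
    """Return the videos under video_dir that have no subtitle file yet."""
    video_dir, subtitle_dir = Path(video_dir), Path(subtitle_dir)
    pending = []
    for path in sorted(video_dir.rglob("*")):
        # Ignore directories and files that are not recognized videos.
        if not path.is_file() or path.suffix.lower() not in VIDEO_EXTENSIONS:
            continue
        # Assumption: a video is "processed" when a subtitle file with the
        # same base name already exists in subtitle_dir.
        if (subtitle_dir / (path.stem + subtitle_ext)).exists():
            continue
        pending.append(path)
    return pending
```

Calling `find_unprocessed("videos", "subtitles")` would then return only the files that still need transcription. Because the check uses the base name alone, two videos with the same name in different subfolders would be treated as one; a real tool may need a stricter rule.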
Project introduction
This project, called video-to-text-conversion, transcribes the audio content of a video file into text in order to generate subtitles automatically. It supports multilingual speech recognition and generates subtitles with a timeline (.srt files).
🧰Project functions
- Extract audio from videos and recognize it as text;
- Generate standard .srt subtitle files for easy loading by video players;
- Recognize multiple languages;
- Batch process multiple video files.
Installation steps
- Clone the project

    git clone https://github.com/ldlkuz/video-to-text-conversion.git
    cd video-to-text-conversion

- Create a virtual environment and install dependencies

    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt
Usage
- Put your video files (for example .mp4) in the videos/ folder.
- Run the main script

    python main.py

After execution, it will:
- Traverse all videos in the videos/ folder;
- Extract audio for each video;
- Use Whisper for speech transcription;
- Save the result as an .srt subtitle file in the subtitles/ folder.
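The last step, turning transcription results into an .srt file, can be sketched as follows. This is an illustration rather than the code in main.py: the function names are hypothetical, and the input is assumed to be a list of dictionaries with `start`, `end`, and `text` keys, which is the shape of the entries Whisper returns under `result["segments"]`.

```python
def format_timestamp(seconds):
    """Convert a time in seconds to the SRT form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render segments (dicts with start, end, text) as SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        # Each SRT cue is: a counter, a time range, then the text.
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

The returned string can be written to a file such as subtitles/example.srt with UTF-8 encoding so that non-English text survives intact.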
Project structure
- main.py: Main program, which processes video files, calls Whisper, and generates subtitles.
- videos/: Holds the video files to be processed.
- subtitles/: Receives the generated subtitle files.
- requirements.txt: List of required Python libraries (mainly including openai-whisper, moviepy, and ffmpeg-python).
Supported languages
The Whisper model natively supports multiple languages, not limited to English. You can modify the code to specify the recognition language.
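One way to specify the recognition language is to pass a `language` option to Whisper's `transcribe` call. The sketch below assumes the openai-whisper package from requirements.txt; the helper names `transcribe_options` and `transcribe` and the default model name `"base"` are illustrative choices, not taken from main.py.

```python
def transcribe_options(language=None):
    """Build the keyword arguments passed to Whisper's transcribe()."""
    options = {}
    if language:
        # A language code such as "zh", "en", or "ja".
        options["language"] = language
    return options


def transcribe(media_path, model_name="base", language=None):
    """Transcribe one file, optionally forcing the recognition language."""
    # Imported lazily so the helper above works without the package.
    import whisper  # provided by the openai-whisper package

    model = whisper.load_model(model_name)
    return model.transcribe(str(media_path), **transcribe_options(language))
```

When `language` is left as `None`, Whisper detects the language from the audio itself. Passing `model_name="large"` loads the larger model at the cost of more memory and time.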
📝Precautions
- You need to install FFmpeg (for processing video and audio) and make sure the ffmpeg command can be called from the command line;
- If the system has no graphics card, the model runs on the CPU by default, which may be slower;
- If you need higher accuracy, you can use Whisper's large model (you need to modify the code yourself to load the corresponding model).
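The FFmpeg requirement can be checked before any video is processed. The following sketch uses only the standard library; the function names are hypothetical, and the mono 16 kHz output settings are an assumed choice that matches the audio format Whisper works with internally.

```python
import shutil


def ffmpeg_available():
    """Return True when an ffmpeg executable is found on PATH."""
    return shutil.which("ffmpeg") is not None


def build_extract_command(video_path, audio_path):
    """Build an ffmpeg command that extracts mono 16 kHz audio."""
    return [
        "ffmpeg",
        "-y",             # overwrite the output file if it exists
        "-i", str(video_path),
        "-vn",            # drop the video stream
        "-ac", "1",       # one audio channel
        "-ar", "16000",   # 16 kHz sample rate
        str(audio_path),
    ]
```

If `ffmpeg_available()` returns `False`, the script can stop early with a clear message instead of failing partway through a batch. The command list can be run with `subprocess.run(command, check=True)`.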
Project address: https://github.com/ldlkuz/video-to-text-conversion