GitHub stars: 14.1K+
An AI-powered ebook-to-audiobook tool with voice cloning and support for 1100+ languages.
ebook2audiobook is an open-source project developed by DrewThomasson that automatically converts non-DRM ebooks (EPUB, MOBI, etc.) into high-quality audiobooks. It integrates advanced TTS models, supports chapter splitting, metadata embedding, voice cloning, and multilingual output, and offers a Gradio web UI, a CLI, and Docker deployment.
1. Prepare the environment
Required software
- Python 3.10+
- Git
- FFmpeg (required for audio processing)
- Calibre (parses EPUB/PDF structure)
Installing on Windows
- Python: https://www.python.org/downloads/
- Git: https://git-scm.com/downloads
- FFmpeg (the easy way): download the zipped build from https://www.gyan.dev/ffmpeg/builds/, unzip it, and add its /bin directory to the PATH
- Calibre: https://calibre-ebook.com/download
Make sure these commands run from the command line:
python --version
ffmpeg -version
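
The same check can be scripted. The sketch below is a minimal example and not part of ebook2audiobook: it uses Python's standard `shutil.which` to report which of the required tools are on the PATH. The tool list and the `find_tools` helper name are assumptions made for illustration; `ebook-convert` is the command-line converter that ships with Calibre.

```python
import shutil

# Executables this guide relies on; adjust the list to your setup
# (for example "python3" instead of "python" on some systems).
REQUIRED_TOOLS = ["python", "git", "ffmpeg", "ebook-convert"]


def find_tools(names):
    """Map each executable name to its full path, or None if it is not on the PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools(REQUIRED_TOOLS).items():
        print(f"{name:15} {path or 'NOT FOUND - add it to the PATH'}")
```

Any tool reported as not found needs to be installed or have its directory added to the PATH before continuing.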
2. Clone the project code
git clone https://github.com/DrewThomasson/ebook2audiobook
cd ebook2audiobook
3. Install dependencies
The project manages its dependencies with Poetry, so install that first:
pip install poetry
Then install the project dependencies:
poetry install
When you’re done, enter the virtual environment:
poetry shell
4. Prepare the ebook you want to convert
Supported Formats:
- EPUB (best)
- MOBI
- TXT
Recommendation: for Chinese ebooks, use EPUB for the most reliable results.
5. The simplest conversion command (Chinese example)
Convert an ebook directly to MP3:
python main.py --input "your-ebook.epub" --output "output-directory" --language "zh" --tts-engine "coqui" --output-format "mp3"
The key parameters:
| Parameter | Meaning |
|---|---|
| --input | Path to the input ebook |
| --output | Output directory |
| --language "zh" | Chinese |
| --tts-engine "coqui" | Coqui XTTSv2 (stable for Chinese) |
| --output-format | mp3 / m4b / flac |
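
To make the parameters concrete, the sketch below assembles the same command as an argument list that can be passed to `subprocess.run`. The flag names are copied from the command in this article and have not been checked against the project's current CLI, so confirm them against the project's README before relying on them. `build_command` is a hypothetical helper, not part of ebook2audiobook.

```python
import subprocess
import sys


def build_command(input_path, output_dir, language="zh",
                  tts_engine="coqui", output_format="mp3"):
    """Return the conversion command from this article as an argument list."""
    # Flag names are taken from the article's example, not verified against the CLI.
    return [
        sys.executable, "main.py",
        "--input", input_path,
        "--output", output_dir,
        "--language", language,
        "--tts-engine", tts_engine,
        "--output-format", output_format,
    ]


if __name__ == "__main__":
    cmd = build_command("your-ebook.epub", "output-directory")
    print(" ".join(cmd))
    # Uncomment to run it from inside the cloned project directory:
    # subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than one string avoids shell quoting problems with file names that contain spaces or non-ASCII characters.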
6. For a voice closer to an audiobook narrator (recommended)
This command gives more natural-sounding Chinese:
python main.py --input "your-ebook.epub" --output "output-directory" --language "zh" --tts-engine "bark" --voice "v2/zh_speaker_6" --output-format "m4b"
Why Bark?
- Bark's Chinese pronunciation sounds more natural than Coqui's
- It comes with multiple Chinese speakers (closer to an audiobook feel)
7. Advanced: use your own voice (voice cloning)
Prepare a sample of your voice (about 20–30 seconds), for example:
samples/myvoice.wav
Then run:
python main.py --input "ebook.epub" --output "output" --language "zh" --tts-engine "xtts" --voice "samples/myvoice.wav"
The tool clones the voice in the sample and uses it to read the entire book.
For Chinese, the cloning result depends heavily on the quality of the sample you provide.
Record yourself reading aloud normally; a natural speaking pace is enough.
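
Before cloning, it can help to confirm that the sample really falls in the suggested 20–30 second range. The sketch below is a minimal check using Python's standard `wave` module, which reads uncompressed PCM WAV files; the helper names and the idea of validating the length up front are assumptions for illustration, not something ebook2audiobook does for you.

```python
import sys
import wave


def wav_duration_seconds(path):
    """Return the length of an uncompressed PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_good_sample_length(path, min_seconds=20.0, max_seconds=30.0):
    """Check the sample against the 20-30 second guideline from this article."""
    return min_seconds <= wav_duration_seconds(path) <= max_seconds


if __name__ == "__main__" and len(sys.argv) > 1:
    sample = sys.argv[1]  # e.g. samples/myvoice.wav
    seconds = wav_duration_seconds(sample)
    verdict = "OK" if is_good_sample_length(sample) else "outside 20-30 s"
    print(f"{sample}: {seconds:.1f} s ({verdict})")
```

Saved under a name of your choice, the script takes the sample path as its only argument.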
8. Output to the M4B audiobook format (with chapter support)
Add these two flags:
--output-format "m4b"
--chapters "true"
Example:
python main.py --input "Book.epub" --output "out" --language "zh" --tts-engine "coqui" --output-format "m4b" --chapters "true"
9. The lowest-effort option: run with Docker (no Python setup needed)
If Docker is installed on your computer:
docker run -v "$PWD:/data" ebook2audiobook --input "/data/book.epub" --output "/data/output" --language "zh"
10. Precautions (common pitfalls for Chinese users)
- Poor PDF extraction
→ Convert the PDF to EPUB with Calibre before running.
- Chinese sentence segmentation
→ Bark / Coqui handle it automatically; no manual marking is needed.
- Output is too slow
→ GPU = fast
→ CPU = slow but usable
→ Be patient with long texts.
- An error says ffmpeg cannot be found
→ Add FFmpeg's /bin directory to the PATH.
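
The PDF pitfall above can be handled with Calibre's `ebook-convert` command-line tool, which picks the input and output formats from the file extensions. The sketch below only builds the command; the `epub_convert_command` helper and the choice to write the EPUB next to the PDF are assumptions for illustration.

```python
import subprocess
from pathlib import Path


def epub_convert_command(pdf_path):
    """Build the Calibre command that converts a PDF to an EPUB beside it."""
    source = Path(pdf_path)
    target = source.with_suffix(".epub")
    # ebook-convert infers the formats from the file extensions.
    return ["ebook-convert", str(source), str(target)]


if __name__ == "__main__":
    cmd = epub_convert_command("book.pdf")
    print(" ".join(cmd))
    # Uncomment to run the conversion (requires Calibre on the PATH):
    # subprocess.run(cmd, check=True)
```

The resulting EPUB can then be used as the input ebook for the conversion commands in the earlier steps.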
GitHub: https://github.com/DrewThomasson/ebook2audiobook