ebook2audiobook: A quick-start guide for Chinese ebooks

GitHub stars: 14.1K+
Tagline: AI-powered eBook-to-audiobook tool with voice cloning and support for 1100+ languages.

ebook2audiobook is an open-source project by DrewThomasson that converts DRM-free ebooks (EPUB, MOBI, and others) into high-quality audiobooks. It integrates advanced TTS models; supports chapter splitting, metadata embedding, voice cloning, and multilingual output; and offers a Gradio web UI, a CLI, and Docker deployment.

1. Prepare the environment

Required software

  1. Python 3.10+
  2. Git
  3. FFmpeg (required for audio processing)
  4. Calibre (used to parse EPUB/PDF structure)

Verifying the installation on Windows

Make sure both commands run successfully from the command line:

python --version
ffmpeg -version
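The same checks can be run in one go with a short Python script. This is a minimal sketch, not part of ebook2audiobook: the function name check_environment is hypothetical, and it assumes Calibre's command-line converter ebook-convert should also be reachable on PATH.

```python
import shutil
import sys


def check_environment(min_python=(3, 10), tools=("git", "ffmpeg", "ebook-convert")):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, found {sys.version.split()[0]}"
        )
    for tool in tools:
        # shutil.which searches PATH the same way the command line does
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("Problem:", issue)
    print("Environment OK" if not issues else f"{len(issues)} problem(s) found")
```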

2. Clone the project code

git clone https://github.com/DrewThomasson/ebook2audiobook
cd ebook2audiobook

3. Install dependencies

The project uses Poetry to manage its dependencies, so install Poetry first:

pip install poetry

Then install the project dependencies:

poetry install

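When you're done, enter the virtual environment (on Poetry 2.0 and later, poetry shell requires the poetry-plugin-shell plugin; the built-in poetry env activate prints an activation command instead):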

poetry shell

4. Prepare the ebook you want to convert

Supported formats:

  • EPUB (best)
  • PDF
  • MOBI
  • TXT

Recommendation: for Chinese ebooks, EPUB gives the most stable results.
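To sort a folder of books before converting, a small helper can map each file extension to the advice above. This is a hypothetical sketch (classify_ebook is not part of ebook2audiobook); it simply encodes the format list from this section and the PDF pitfall from section 10.

```python
from pathlib import Path

# Formats listed in this guide; EPUB is the recommended one for Chinese books.
SUPPORTED = {".epub", ".pdf", ".mobi", ".txt"}


def classify_ebook(path):
    """Return 'recommended', 'convert-first', 'supported', or 'unsupported'."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED:
        return "unsupported"
    if suffix == ".epub":
        return "recommended"
    if suffix == ".pdf":
        # PDF text extraction is often poor; converting to EPUB with Calibre first helps
        return "convert-first"
    return "supported"


if __name__ == "__main__":
    for name in ("电子书.EPUB", "scan.pdf", "notes.txt", "comic.cbz"):
        print(name, "->", classify_ebook(name))
```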

5. The simplest conversion command (Chinese)

Convert an eBook straight to MP3:

python main.py \
 --input "your_ebook.epub" \
 --output "output_dir" \
 --language "zh" \
 --tts-engine "coqui" \
 --output-format "mp3"

The key parameters:

Parameter              Meaning
--input                Path to the input eBook
--output               Output directory
--language "zh"        Chinese
--tts-engine "coqui"   Coqui XTTSv2 (stable for Chinese)
--output-format        mp3 / m4b / flac
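When scripting conversions, building the command as an argument list avoids shell quoting problems. This is a minimal sketch: build_command is a hypothetical helper, and the flag names are copied from this guide's examples, so confirm them against the help output of your ebook2audiobook version.

```python
import sys


def build_command(ebook, output_dir, language="zh", engine="coqui",
                  output_format="mp3", voice=None, script="main.py"):
    """Assemble the conversion command shown above as an argument list.

    Flag names follow this guide's examples (an assumption, not a verified API).
    """
    cmd = [
        sys.executable, script,
        "--input", str(ebook),
        "--output", str(output_dir),
        "--language", language,
        "--tts-engine", engine,
        "--output-format", output_format,
    ]
    if voice is not None:
        # Either a Bark speaker preset or a path to a WAV sample for cloning
        cmd += ["--voice", str(voice)]
    return cmd


if __name__ == "__main__":
    # Chinese filenames need no extra quoting when passed as list items
    print(build_command("电子书.epub", "输出目录"))
```

Passing this list to subprocess.run(cmd, check=True) skips the shell entirely, so paths containing spaces or Chinese characters reach the program unchanged, and check=True raises CalledProcessError if the conversion exits with an error.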

6. For a voice closer to an audiobook narrator (recommended)

This command gives better results (more natural-sounding Chinese):

python main.py \
 --input "your_ebook.epub" \
 --output "output_dir" \
 --language "zh" \
 --tts-engine "bark" \
 --voice "v2/zh_speaker_6" \
 --output-format "m4b"

Why Bark?

  • Bark's Chinese pronunciation sounds more natural than Coqui's
  • It ships with several Chinese speaker presets, which feel closer to a real audiobook
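One practical way to pick a narrator is to render a short test book with each Chinese preset and listen. The sketch below only generates those commands. It assumes the presets are named v2/zh_speaker_0 through v2/zh_speaker_9 (Bark's v2 naming) and reuses this guide's flag names; preview_commands and the previews folder are hypothetical.

```python
import sys

# Assumed preset names: Bark's v2 Chinese speakers zh_speaker_0 .. zh_speaker_9
ZH_SPEAKERS = tuple(f"v2/zh_speaker_{i}" for i in range(10))


def preview_commands(sample_ebook, out_root="previews", speakers=ZH_SPEAKERS):
    """Return one conversion command per speaker, each writing to its own folder."""
    commands = []
    for speaker in speakers:
        name = speaker.rsplit("/", 1)[-1]  # e.g. "zh_speaker_6"
        commands.append([
            sys.executable, "main.py",
            "--input", sample_ebook,
            "--output", f"{out_root}/{name}",
            "--language", "zh",
            "--tts-engine", "bark",
            "--voice", speaker,
            "--output-format", "mp3",
        ])
    return commands


if __name__ == "__main__":
    for cmd in preview_commands("sample_chapter.epub"):
        print(" ".join(cmd[1:]))
```

Keeping the test book to a chapter or two keeps ten renders manageable, especially on a CPU.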

7. Advanced: use your own voice (voice cloning)

Prepare a voice sample of about 20–30 seconds, for example:

samples/myvoice.wav

Then run:

python main.py \
 --input "ebook.epub" \
 --output "output" \
 --language "zh" \
 --tts-engine "xtts" \
 --voice "samples/myvoice.wav"

The tool clones the voice in your sample and uses it to narrate the whole book.

For Chinese, cloning quality depends heavily on the quality of your sample. Record yourself reading aloud normally at a natural pace; nothing more is needed.
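Before a long cloning run, Python's standard wave module can confirm the sample length. This is a minimal sketch with hypothetical helper names: it only reads uncompressed PCM WAV files (wave raises wave.Error otherwise), and it checks duration only, not background noise or clarity, which matter at least as much.

```python
import os
import wave


def sample_duration_seconds(source):
    """Duration of a PCM WAV file in seconds (accepts a path or a binary file object)."""
    if isinstance(source, os.PathLike):
        source = os.fspath(source)  # wave.open expects a str path or a file object
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_voice_sample(source, min_s=20.0, max_s=30.0):
    """Return (ok, message) comparing the sample length with the suggested range."""
    duration = sample_duration_seconds(source)
    ok = min_s <= duration <= max_s
    verdict = "within" if ok else "outside"
    return ok, f"{duration:.1f} s is {verdict} the suggested {min_s:.0f}-{max_s:.0f} s range"


if __name__ == "__main__":
    print(check_voice_sample("samples/myvoice.wav")[1])
```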

8. Output to M4B, the dedicated audiobook format (with chapter support)

Add these options:

--output-format "m4b"
--chapters "true"

Example:

python main.py \
 --input "Book.epub" \
 --output "out" \
 --language "zh" \
 --tts-engine "coqui" \
 --output-format "m4b" \
 --chapters true

9. The most hassle-free option: Docker (no Python setup)

If Docker is installed on your computer:

docker run -v "$PWD:/data" \
 ebook2audiobook \
 --input "/data/book.epub" \
 --output "/data/output" \
 --language "zh"
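The -v "$PWD:/data" option mounts the current directory at /data inside the container, which is why the paths above start with /data. A small hypothetical helper can do that translation and catch files outside the mounted folder, which the container cannot see. This is a sketch, not part of ebook2audiobook.

```python
from pathlib import Path, PurePosixPath


def to_container_path(host_path, host_root, container_root="/data"):
    """Translate a host path under the mounted directory into its path inside the container.

    Raises ValueError when host_path is not inside host_root (not visible in the container).
    """
    rel = Path(host_path).resolve().relative_to(Path(host_root).resolve())
    # Container paths are always POSIX-style, even when the host is Windows
    return str(PurePosixPath(container_root, *rel.parts))


if __name__ == "__main__":
    print(to_container_path("book.epub", "."))  # the file as the container sees it
```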

10. Common pitfalls for Chinese users

  1. Poor text extraction from PDFs
    → Convert the book to EPUB with Calibre before running.
  2. Chinese sentence segmentation
    → Bark and Coqui handle it automatically; no manual markup is needed.
  3. Conversion is slow
    → With a GPU: fast
    → With only a CPU: slow but usable
    → Long books take time, so be patient.
  4. Error: ffmpeg cannot be found
    → Add FFmpeg's bin folder to PATH.
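For pitfall 1, Calibre's command-line tool ebook-convert can convert a PDF to EPUB without opening the Calibre GUI. A minimal sketch, assuming ebook-convert is on PATH; both helper names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def calibre_command(pdf_path):
    """Return (command, epub_path) for converting a PDF into an EPUB next to it."""
    pdf = Path(pdf_path)
    epub = pdf.with_suffix(".epub")
    return ["ebook-convert", str(pdf), str(epub)], epub


def pdf_to_epub(pdf_path):
    """Run Calibre's ebook-convert and return the path of the new EPUB."""
    cmd, epub = calibre_command(pdf_path)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("ebook-convert not found; add Calibre's folder to PATH")
    subprocess.run(cmd, check=True)  # raises CalledProcessError if the conversion fails
    return epub
```

The resulting EPUB can then be passed to --input as in section 5.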

GitHub: https://github.com/DrewThomasson/ebook2audiobook
