WeClone project to create your digital avatar

Project description: An open source project that aims to fine-tune the large language model through WeChat chat records and voice messages to achieve personalized digital doppelganger, including text style and sound cloning.
Support binding fine-tuned models to robots on platforms such as WeChat, QQ, Enterprise WeChat, Flybook, and Telegram to achieve cross-platform digital avatar. The project continues to be updated and iterated.

WeClone is an open source project that aims to fine-tune the Large Language Model (LLM) through WeChat chat records to achieve personalized digital avatar, and can be deployed as a chat robot on WeChat, QQ, Telegram and other platforms.

project overview

WeClone provides a complete process, including:

data export: Use the PyWxDump tool to extract WeChat chat records.
data preprocessing: Cleanses the data, removes sensitive information, and formats it into a JSON format that can be used for the model.
model training: Based on the ChatGLM3 – 6B model, the LoRA method is used for fine-tuning.
Deployment and Reasoning: Conduct model reasoning through browser Demo or API services and can be deployed as a chat robot.

In addition, the project also supports the voice cloning function (WeClone-audio module), which can combine WeChat voice messages with the 0.5B model to achieve high-quality voice cloning.

hardware and software requirements

Hardware requirements:

The ChatGLM3 – 6B model is used by default, and the LoRA method fine-tuning phase requires about 16GB of video memory.
Support using other models and methods supported by LLaMA Factory, which consumes less memory.

Software requirements:

Python version 3.8 and above.
Required libraries: Torch, Transformers, Datasets, Accelerate, PEFT, TRL, etc.
Optional libraries: CUDA, Deepspeed, Bitsandbytes, Flash-attn, etc.

environment construction

Recommended to use uv tools for environmental management:

git clone https://github.com/xming521/WeClone.git
cd WeClone
uv venv .venv --python=3.9
source .venv/bin/activate
uv pip install --group main -e .



Data preparation and pretreatment

Use the PyWxDump tool to extract WeChat chat records and export them to CSV format.
Place the exported CSV file in the ./ data/csv Under the catalog.
run ./ make_dataset/csv_to_json.py Script to clean and format the data.

By default, the project removes sensitive information such as mobile phone number, ID number, mailbox, and website address from the data, and provides a lexicon of prohibited words. blocked_words, you can add your own words that need to be filtered.

Model download and fine-tuning

Your first choice is to download ChatGLM3 models from Hugging Face, or use models provided by the Magic Community.

Fine-tuning configuration is unified in settings.json In the file, model paths, training parameters, etc. can be modified as needed.

Single card training:

python src/train_sft.py



Doka Training:

pip install deepspeed
deepspeed --num_gpus= number of graphics cards used src/train_sft.py



Reasoning and deployment

Browser Demo:

python ./ src/web_demo.py



API services:

python ./ src/api_service.py



Deploy as a chatbot:

ˇNote: Using WeChat robots risks being blocked. It is recommended to use a small number and bind it to a bank card.

python ./ src/api_service.py #Start the API service
python ./ src/wechat_bot/main.py #Launch WeChat robot



After scanning the code and logging in, you can interact with @ robots in private chats or Group chats.

Project status and considerations

WeClone is still in rapid iteration, and the current effect does not represent the final effect.
The fine-tuning effect depends largely on the amount and quality of chat data.
The Windows environment has not been tested and it is recommended to use a WSL or Linux environment.

The WeClone project provides users with a complete solution from data preparation to model deployment, making it possible to create personalized digital avatars.

Project address: Click to open (https://github.com/xming521/WeClone)

Oil tubing: