WeClone project to create your digital avatar

Project description: An open source project that aims to fine-tune the large language model through WeChat chat records and voice messages to achieve personalized digital doppelganger, including text style and sound cloning.
Support binding fine-tuned models to robots on platforms such as WeChat, QQ, Enterprise WeChat, Flybook, and Telegram to achieve cross-platform digital avatar. The project continues to be updated and iterated.

WeClone is an open source project that aims to fine-tune the Large Language Model (LLM) through WeChat chat records to achieve personalized digital avatar, and can be deployed as a chat robot on WeChat, QQ, Telegram and other platforms.

project overview

WeClone provides a complete process, including:

  • data export: Use the PyWxDump tool to extract WeChat chat records.
  • data preprocessing: Cleanses the data, removes sensitive information, and formats it into a JSON format that can be used for the model.
  • model training: Based on the ChatGLM3 – 6B model, the LoRA method is used for fine-tuning.
  • Deployment and Reasoning: Conduct model reasoning through browser Demo or API services and can be deployed as a chat robot.

In addition, the project also supports the voice cloning function (WeClone-audio module), which can combine WeChat voice messages with the 0.5B model to achieve high-quality voice cloning.

hardware and software requirements

Hardware requirements:

  • The ChatGLM3 – 6B model is used by default, and the LoRA method fine-tuning phase requires about 16GB of video memory.
  • Support using other models and methods supported by LLaMA Factory, which consumes less memory.

Software requirements:

  • Python version 3.8 and above.
  • Required libraries: Torch, Transformers, Datasets, Accelerate, PEFT, TRL, etc.
  • Optional libraries: CUDA, Deepspeed, Bitsandbytes, Flash-attn, etc.

environment construction

Recommended to use uv tools for environmental management:

git clone https://github.com/xming521/WeClone.git
cd WeClone
uv venv .venv --python=3.9
source .venv/bin/activate
uv pip install --group main -e .

Data preparation and pretreatment

  1. Use the PyWxDump tool to extract WeChat chat records and export them to CSV format.
  2. Place the exported CSV file in the ./ data/csv Under the catalog.
  3. run ./ make_dataset/csv_to_json.py Script to clean and format the data.

By default, the project removes sensitive information such as mobile phone number, ID number, mailbox, and website address from the data, and provides a lexicon of prohibited words. blocked_words, you can add your own words that need to be filtered.

Model download and fine-tuning

Your first choice is to download ChatGLM3 models from Hugging Face, or use models provided by the Magic Community.

Fine-tuning configuration is unified in settings.json In the file, model paths, training parameters, etc. can be modified as needed.

Single card training:

python src/train_sft.py

Doka Training:

pip install deepspeed
deepspeed --num_gpus= number of graphics cards used src/train_sft.py

Reasoning and deployment

Browser Demo:

python ./ src/web_demo.py

API services:

python ./ src/api_service.py

Deploy as a chatbot:

ˇNote: Using WeChat robots risks being blocked. It is recommended to use a small number and bind it to a bank card.

python ./ src/api_service.py #Start the API service
python ./ src/wechat_bot/main.py #Launch WeChat robot

After scanning the code and logging in, you can interact with @ robots in private chats or Group chats.

Project status and considerations

  • WeClone is still in rapid iteration, and the current effect does not represent the final effect.
  • The fine-tuning effect depends largely on the amount and quality of chat data.
  • The Windows environment has not been tested and it is recommended to use a WSL or Linux environment.

The WeClone project provides users with a complete solution from data preparation to model deployment, making it possible to create personalized digital avatars.

Project address: Click to open (https://github.com/xming521/WeClone)

Oil tubing:

Scroll to Top