The first model in the series with more than 100 billion parameters
Qwen1.5-110B is a new member of the Qwen1.5 series and the first model in the series with more than 100 billion parameters.
The model performs comparably to Meta-Llama-3-70B in base model evaluations and shows strong results in chat model evaluations, including MT-Bench and AlpacaEval 2.0.
It supports multiple languages, including English, Chinese, French, and Spanish, and supports a context length of up to 32K tokens.
Model characteristics:
- Architecture: Transformer decoder architecture with Grouped Query Attention (GQA); see the sketch after this list.
- Performance: Strong results in both base model evaluations and chat model evaluations.
- Multilingual support: Supports multiple languages, with a context length of up to 32K tokens.
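To make the GQA bullet concrete, here is a minimal PyTorch sketch of the idea behind grouped query attention, not the actual Qwen1.5 implementation: several query heads share one key/value head, which shrinks the KV cache at long context lengths. The function name and tensor shapes are illustrative.

```python
import math
import torch

def grouped_query_attention(q, k, v):
    """Scaled dot-product attention where query heads share key/value heads.

    q: (batch, n_q_heads, seq, head_dim)
    k, v: (batch, n_kv_heads, seq, head_dim), with n_q_heads % n_kv_heads == 0.
    """
    group = q.shape[1] // k.shape[1]
    # Duplicate each KV head so every query head in a group sees the same K/V;
    # the memory saving comes from caching only the n_kv_heads originals.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
    return torch.softmax(scores, dim=-1) @ v

# Example: 32 query heads sharing 8 KV heads (4 query heads per group).
q = torch.randn(1, 32, 16, 64)
k = torch.randn(1, 8, 16, 64)
v = torch.randn(1, 8, 16, 64)
print(grouped_query_attention(q, k, v).shape)  # torch.Size([1, 32, 16, 64])
```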
According to the official evaluation results, Qwen1.5-110B slightly exceeds Llama-3-70B and Mixtral-8x22B.
It scores slightly higher than Llama-3-70B on comprehensive understanding (MMLU) and mathematical reasoning (GSM8K and MATH), making it the strongest of the compared models on those benchmarks. However, on the complex reasoning task ARC-C it falls slightly below Mixtral-8x22B. On the coding benchmarks, its HumanEval score is well above the other models, while its MBPP score is lower than Mixtral-8x22B's.
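If you want to try the model yourself, below is a minimal sketch of chatting with the released chat variant through the Hugging Face transformers library, following the usage pattern in the official blog post linked below. It assumes transformers >= 4.37 (which added Qwen1.5 support) and enough GPU memory for a ~110B-parameter checkpoint; the prompt and generation settings are illustrative.

```python
# Minimal sketch: chatting with Qwen1.5-110B-Chat via Hugging Face transformers.
# device_map="auto" shards the model across all available GPUs.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen1.5-110B-Chat"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",  # use the dtype stored in the checkpoint config
    device_map="auto",   # shard layers across available devices
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to large language models."},
]
# Format the conversation with the chat template the model was trained on.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
reply = tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(reply)
```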
If you want to learn more, you can click on the link below the video.
Thank you for watching this video. If you liked it, please like and subscribe. Thanks!
Details: https://qwenlm.github.io/blog/qwen1.5-110b/
Video: