Despite the remarkable achievements of large language models (LLMs) across a variety of tasks, these models still exhibit a linguistic bias that favors high-resource languages such as English, often at the expense of low-resource and regional languages.
To address this imbalance, the authors introduced SeaLLM, a series of language models specifically targeted at Southeast Asian (SEA) languages.
SeaLLM is built on the Llama-2 model and further developed through continued pre-training with an expanded vocabulary, specialized instruction tuning, and alignment tuning, to better capture the nuances of regional languages. This allows the models to respect and reflect local cultural norms, customs, stylistic preferences, and legal considerations.
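To make the vocabulary-expansion idea concrete, here is a toy, self-contained sketch (not the SeaLLM code; the token tables are hypothetical) of why it matters: a tokenizer with no dedicated tokens for a script falls back to byte-level pieces, inflating sequence length, while adding whole-word tokens for that script shortens sequences dramatically.

```python
def tokenize(text, vocab):
    """Greedy longest-match tokenization; characters not covered by the
    vocabulary fall back to one token per UTF-8 byte (mimicking the
    byte-fallback behavior of SentencePiece-style tokenizers)."""
    tokens = []
    i = 0
    while i < len(text):
        match = None
        for end in range(len(text), i, -1):  # try the longest match first
            if text[i:end] in vocab:
                match = text[i:end]
                break
        if match:
            tokens.append(match)
            i += len(match)
        else:
            # Byte fallback: each UTF-8 byte of the character becomes a token.
            tokens.extend(f"<0x{b:02X}>" for b in text[i].encode("utf-8"))
            i += 1
    return tokens

base_vocab = {"hello", " world"}                 # no Thai coverage
expanded_vocab = base_vocab | {"สวัสดี", "ครับ"}  # hypothetical added Thai tokens

thai = "สวัสดีครับ"  # "hello" in Thai
print(len(tokenize(thai, base_vocab)))      # 30 byte-fallback tokens
print(len(tokenize(thai, expanded_vocab)))  # 2 tokens
```

A 15x reduction in tokens for the same text means proportionally cheaper training and inference for that language, which is why vocabulary expansion typically precedes continued pre-training.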
A comprehensive evaluation shows that, compared to open-source models of similar size, the SeaLLM-13b model demonstrates superior performance across a wide range of linguistic tasks and assistant-style instruction-following capabilities. Moreover, it far outperforms ChatGPT-3.5 in non-Latin-script languages such as Thai, Khmer, Lao, and Burmese, while remaining lightweight and cost-effective to run.
Peer Comparison
One of the most reliable ways to compare chatbot models is peer comparison. With the help of native speakers, we built a suite of instruction tests called Sea-bench that covers the aspects expected of user-facing chatbots, namely:
(1) Task-solving (e.g., translation and comprehension),
(2) Math reasoning (e.g., mathematical and logical reasoning problems),
(3) General instructions (e.g., general-domain instructions),
(4) Natural questions (e.g., questions about local context, usually written informally), and
(5) Safety-related questions.
The test set also covers all the languages we care about.
Similar to MT-bench, we used GPT-4 as the evaluator to judge comparisons between our model, ChatGPT-3.5, and other baselines.
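The judge-based protocol above can be sketched as follows. This is a hedged, MT-bench-style illustration, not the authors' exact prompt or pipeline: a judge model such as GPT-4 receives the question and two candidate answers and replies with a verdict marker like `[[A]]`, `[[B]]`, or `[[C]]` (tie). The actual API call is omitted; only prompt assembly and verdict parsing are shown.

```python
import re

# Hypothetical judge prompt; the real Sea-bench prompt may differ.
JUDGE_TEMPLATE = (
    "You are an impartial judge. Compare the two AI assistant answers to the "
    "user question below. Output [[A]] if assistant A is better, [[B]] if "
    "assistant B is better, or [[C]] for a tie.\n\n"
    "Question: {question}\n\n"
    "Assistant A: {answer_a}\n\n"
    "Assistant B: {answer_b}"
)

def build_judge_prompt(question, answer_a, answer_b):
    """Assemble the pairwise-comparison prompt sent to the judge model."""
    return JUDGE_TEMPLATE.format(
        question=question, answer_a=answer_a, answer_b=answer_b
    )

def parse_verdict(judge_reply):
    """Extract the first [[A]]/[[B]]/[[C]] verdict marker; None if absent."""
    m = re.search(r"\[\[([ABC])\]\]", judge_reply)
    return m.group(1) if m else None

print(parse_verdict("Both answers are fluent, but A is more accurate. [[A]]"))  # A
```

In practice, positions of the two answers are usually swapped and the question judged twice to control for the judge's position bias.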
Regional Language World Knowledge
M3Exam is a benchmark collected from real, official human exam questions in multiple Southeast Asian countries. Its questions require strong multilingual ability and cultural knowledge across key educational stages, from primary-school to high-school difficulty.
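Exam benchmarks of this kind are typically scored as multiple-choice accuracy. The following minimal sketch assumes a simplified format (it is not the official M3Exam harness): the model answers each question with a single option letter, and accuracy is the fraction of exact matches against the answer key.

```python
def score_mcq(predictions, answer_key):
    """Multiple-choice accuracy.

    predictions / answer_key: dicts mapping question id -> option letter.
    Predictions are normalized (whitespace stripped, uppercased) before
    comparison; missing predictions count as wrong.
    """
    correct = sum(
        1
        for qid, answer in answer_key.items()
        if predictions.get(qid, "").strip().upper() == answer
    )
    return correct / len(answer_key)

# Hypothetical three-question exam.
key = {"q1": "A", "q2": "C", "q3": "B"}
preds = {"q1": "a", "q2": "C", "q3": "D"}
print(round(score_mcq(preds, key), 2))  # 0.67
```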
As the table shows, our SeaLLM model outperforms most 13B baselines and approaches ChatGPT's performance. Notably, for Thai, a seemingly low-resource language, our model is only 1% behind ChatGPT despite the large difference in model size.
If you want to learn more, you can click on the link below the video.
Thank you for watching this video. If you liked it, please like and subscribe.
Paper: https://huggingface.co/papers/2312.00738
Video: