FAIR’s new research: better and faster large language models via multi-token prediction

Meta AI has introduced a new paper on accelerating LLM training by predicting multiple tokens at once.

Language models are typically trained to predict the next token given the preceding context. This paper instead proposes predicting several future tokens at each position, rather than just one.
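The core idea can be sketched as a shared trunk with several output heads, where head k is trained against the token k+1 steps ahead. This is a minimal toy sketch, not the paper's implementation: the class and function names are hypothetical, and a GRU stands in for the transformer trunk.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenPredictor(nn.Module):
    """Toy sketch: a shared trunk with n_future independent output heads;
    head k predicts the token k+1 positions ahead of the current one."""
    def __init__(self, vocab_size=100, d_model=32, n_future=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Stand-in for a transformer trunk, to keep the sketch short
        self.trunk = nn.GRU(d_model, d_model, batch_first=True)
        self.heads = nn.ModuleList(
            nn.Linear(d_model, vocab_size) for _ in range(n_future)
        )

    def forward(self, tokens):
        h, _ = self.trunk(self.embed(tokens))   # (batch, seq, d_model)
        return [head(h) for head in self.heads]  # one logit tensor per head

def multi_token_loss(model, tokens):
    """Average of per-head cross-entropies: head k's target at position t
    is the token at position t + k + 1."""
    logits = model(tokens)
    seq_len = tokens.size(1)
    loss = 0.0
    for k, lg in enumerate(logits):
        valid = seq_len - (k + 1)  # positions that still have a target
        loss = loss + F.cross_entropy(
            lg[:, :valid].reshape(-1, lg.size(-1)),
            tokens[:, k + 1:].reshape(-1),
        )
    return loss / len(logits)

tokens = torch.randint(0, 100, (2, 16))
model = MultiTokenPredictor()
loss = multi_token_loss(model, tokens)
loss.backward()
```

Because all heads share the trunk, the extra supervision comes almost for free at training time; only the small head layers are added on top of the usual next-token setup.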

This approach improves performance on downstream code and natural-language tasks without increasing training time, and the gains grow with model size.

Models trained with 4-token prediction can be up to three times faster at inference, even at large batch sizes.
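The inference speedup comes from self-speculative decoding: the extra heads draft several future tokens in one forward pass, and the main next-token head keeps the longest prefix it agrees with. A toy sketch of that accept/reject step, with hypothetical helper names and a deterministic stand-in for the main head:

```python
import torch

def verify_drafts(next_token_logits, drafts, context):
    """Keep the longest prefix of the drafted tokens that the main
    next-token head would have produced itself; accepted tokens cost
    roughly one forward pass instead of one pass each."""
    accepted = []
    ctx = list(context)
    for t in drafts:
        if int(next_token_logits(ctx).argmax()) != t:
            break  # first disagreement: discard this and later drafts
        accepted.append(t)
        ctx.append(t)
    return accepted

# Toy "main head" over a vocab of 10: always prefers (last token + 1) % 10
def toy_logits(ctx):
    logits = torch.zeros(10)
    logits[(ctx[-1] + 1) % 10] = 1.0
    return logits

# Drafts [3, 4, 9] after context [2]: the main head agrees with 3 and 4,
# rejects 9, so two tokens are accepted in one step.
print(verify_drafts(toy_logits, [3, 4, 9], [2]))  # -> [3, 4]
```

In the real setting the verification is batched into a single model call; the greedy loop here only illustrates the acceptance rule.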

The authors show that replacing the next-token prediction task with multi-token prediction yields better code-generation performance with exactly the same training budget and data, while also making inference up to three times faster.

Although similar methods have previously been used during fine-tuning to speed up inference, this research extends them to the pre-training of large models, where they show significant gains at scale.

If you want to learn more, follow the links below.

Research Paper: https://go.fb.me/wty7gj
Paper: https://arxiv.org/abs/2404.19737

