Model architecture:
The MoE model has 132B total parameters spread across 16 experts, and each token activates 4 of them, giving 36B active parameters; Mixtral, by comparison, has only about 13B active parameters (roughly a third as many).
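As a rough illustration of that arithmetic, the sketch below recovers the 36B active-parameter figure from the published totals; the split between shared and per-expert parameters is an assumption made up for the example, not a published figure.

```python
# Rough sketch of fine-grained MoE active-parameter arithmetic.
# DBRX publishes 132B total / 36B active; the shared-vs-expert split
# below is assumed purely for illustration.

TOTAL_PARAMS = 132e9      # published total parameter count
NUM_EXPERTS = 16          # experts per MoE layer
ACTIVE_EXPERTS = 4        # experts selected per token

shared = 4e9              # assumed shared weights (attention, embeddings, router)
per_expert = (TOTAL_PARAMS - shared) / NUM_EXPERTS

active = shared + ACTIVE_EXPERTS * per_expert
print(f"Active parameters per token: {active / 1e9:.1f}B")  # ~36B
```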
Performance:
It outperforms open-source models such as LLaMA2-70B, Mixtral, and Grok-1 on language understanding, programming, mathematics, and logic benchmarks.
DBRX also surpasses GPT-3.5 on most benchmarks.
DBRX is a mixture-of-experts (MoE) model built on the MegaBlocks research and open-source project, which makes it very fast in terms of tokens processed per second.
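At the core of an MoE layer is a router that scores all 16 experts for each token and dispatches the token to the top 4. The sketch below shows that gating step in generic PyTorch; the dimensions and names are illustrative only, not DBRX's actual implementation (which relies on MegaBlocks' optimized kernels).

```python
import torch
import torch.nn.functional as F

def top4_gate(hidden, router_weight, k=4):
    """Illustrative top-k MoE gating: score every expert per token,
    keep the k best, and renormalize their mixing weights."""
    # hidden: (tokens, d_model); router_weight: (d_model, num_experts)
    logits = hidden @ router_weight                  # (tokens, num_experts)
    topk_logits, topk_idx = logits.topk(k, dim=-1)   # best k experts per token
    gate_weights = F.softmax(topk_logits, dim=-1)    # weights to mix expert outputs
    return topk_idx, gate_weights

# Toy usage: 3 tokens, model width 8, 16 experts.
h = torch.randn(3, 8)
w = torch.randn(8, 16)
idx, gates = top4_gate(h, w)
print(idx.shape, gates.shape)  # torch.Size([3, 4]) torch.Size([3, 4])
```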
Training data:
Pre-trained on 12 trillion tokens of text and code; the maximum supported context length is 32K tokens.
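To check whether a prompt fits inside that 32K window, you can count tokens with the model's tokenizer before sending a request. A minimal sketch, assuming the Hugging Face checkpoint databricks/dbrx-instruct is accessible (the repo is gated and may require accepting its license and passing an access token):

```python
from transformers import AutoTokenizer

MAX_CONTEXT = 32_768  # 32K-token context window

# Assumes the databricks/dbrx-instruct tokenizer can be downloaded;
# swap in whichever tokenizer you actually serve the model with.
tokenizer = AutoTokenizer.from_pretrained(
    "databricks/dbrx-instruct", trust_remote_code=True
)

prompt = "Summarize the following document: ..."
n_tokens = len(tokenizer.encode(prompt))
print(f"{n_tokens} tokens; fits in context: {n_tokens <= MAX_CONTEXT}")
```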
Meet DBRX: a general-purpose LLM that sets a new standard for efficient open-source models.
Use the DBRX model in RAG applications, or build on the DBRX design to create your own custom LLMs and improve the quality of your GenAI applications (see the sketch after the link below).
https://dbricks.co/43xaCMj
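As an illustration of the RAG pattern, the sketch below stuffs retrieved passages into the prompt and calls DBRX Instruct through an OpenAI-compatible chat endpoint; the endpoint URL, model name, and retrieve() helper are placeholders for whatever serving setup and vector store you actually use.

```python
from openai import OpenAI

# Placeholder endpoint/model: point these at wherever DBRX Instruct is served
# (e.g. a model-serving endpoint or a self-hosted deployment).
client = OpenAI(base_url="https://example.com/serving-endpoints", api_key="...")
MODEL = "dbrx-instruct"  # placeholder model name

def retrieve(query: str) -> list[str]:
    """Hypothetical retriever: look up relevant chunks in your vector store."""
    return ["<chunk 1 from your index>", "<chunk 2 from your index>"]

def rag_answer(question: str) -> str:
    context = "\n\n".join(retrieve(question))
    messages = [
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]
    resp = client.chat.completions.create(model=MODEL, messages=messages)
    return resp.choices[0].message.content

print(rag_answer("What does DBRX's MoE architecture look like?"))
```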