MistralAI open-source large models based on MoE

MistralAI has open-sourced what may be the world’s first openly released large model built on MoE (Mixture of Experts) technology

Fun facts:

  • Released as an 87 GB torrent
  • Appears to be a scaled-down version of GPT-4
  • Announced on X with no press release, and the company declined to elaborate

Mixture of Experts (MoE) is a technique used in LLMs to improve their efficiency and accuracy. This approach works by dividing complex tasks into smaller, more manageable subtasks, each handled by specialized mini-models or “experts.”

  1. Expert layers: These are smaller neural networks that are trained to be highly skilled in a specific area.
  2. Gating network: This is the decision-maker of the MoE architecture.

Extended information:

Introduction to MoE Technology: Mixture of Experts (MoE) is a technique used in large language models (LLMs) to improve efficiency and accuracy. It works by breaking down complex tasks into smaller, more manageable subtasks, each handled by a dedicated small model or “expert.”

Components of the MoE: Expert layers: These are small neural networks that are trained to specialize in specific areas. Each expert processes the same input in a way that matches its specialization.

Gating network: This is the decision-maker of the MoE architecture. It evaluates which experts are best suited for a given input. The network calculates compatibility scores between the input and each expert, which are then used to determine the level of involvement of each expert in the task.
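The interaction between expert layers and the gating network can be sketched as a toy top-2 routing pass. This is a minimal illustration with made-up tiny dimensions, not Mistral's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes for illustration -- far smaller than any real model.
d_model, d_hidden, n_experts, top_k = 16, 32, 8, 2

# Expert layers: independent small feed-forward networks.
experts = [
    (rng.standard_normal((d_model, d_hidden)) * 0.1,
     rng.standard_normal((d_hidden, d_model)) * 0.1)
    for _ in range(n_experts)
]

# Gating network: one linear layer that produces a score per expert.
gate_w = rng.standard_normal((d_model, n_experts)) * 0.1

def moe_forward(x):
    """Route a single token vector through its top-k experts."""
    scores = x @ gate_w                   # compatibility score for each expert
    top = np.argsort(scores)[-top_k:]     # keep only the k best-matching experts
    weights = np.exp(scores[top])
    weights /= weights.sum()              # softmax over the selected experts
    out = np.zeros_like(x)
    for w, i in zip(weights, top):
        w1, w2 = experts[i]
        out += w * (np.maximum(x @ w1, 0) @ w2)  # weighted sum of expert outputs
    return out

token = rng.standard_normal(d_model)
print(moe_forward(token).shape)  # (16,)
```

Because only `top_k` of the 8 experts run per token, compute per token scales with the active experts rather than the full parameter count, which is the efficiency argument made above.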

Mistral’s MoE vs. GPT-4: Mixtral 8x7B reportedly uses an architecture very similar to GPT-4 (whose figures below are rumored, not confirmed), but at a smaller scale:

  • 8 experts instead of 16 (2x reduction)
  • 7B parameters per expert instead of 166B (~24x reduction)
  • about 42B total parameters instead of 1.8T (~42x reduction)
  • the same 32K context length as the original GPT-4

Download link (magnet link):

magnet:?xt=urn:btih:5546272da9065eddeb6fcd7ffddeef5b75be79a7&dn=mixtral-8x7b-32kseqlen&tr=udp%3A%2F%2Fopentracker.i2p.rocks%3A6969%2Fannounce&tr=http%3A%2F%https://t.co/g0m9cEUz0T%3A80%2Fannounce
RELEASE a6bbd9affe0c2725c1b7410d66833e24

MoE 8x7B online experience, courtesy of @mattshumer_: https://replicate.com/nateraw/mixtral-8x7b-32kseqlen
