AI21 releases Jamba, the world’s first production-grade Mamba-based model

Pioneering SSM-Transformer architecture

🧠 52B parameters, 12B active during generation
👨‍🏫 16 experts, only 2 active during generation
Combines Joint Attention and Mamba technologies
Supports a 256K context length
A single A100 80GB can hold up to a 140K-token context
🚀 3x higher long-context throughput than Mixtral 8x7B

Jamba combines Mamba’s Structured State Space Model (SSM) technology with elements of the traditional Transformer architecture to compensate for the inherent limitations of a pure SSM model.

Background knowledge

Jamba represents a major innovation in model design. “Mamba” here refers to a Structured State Space Model (SSM), which is a model used to capture and process data over time, and is particularly suitable for processing sequential data, such as text or time-series data. A key advantage of the SSM model is its ability to efficiently process long sequences of data, but it may not be as powerful as other models at handling complex patterns and dependencies.
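The efficiency claim above comes from the SSM’s recurrent form: each token updates a fixed-size hidden state, so cost per token is constant in sequence length. A minimal toy sketch of that linear recurrence (illustrative only, not Jamba’s actual Mamba implementation, whose matrices are learned and input-dependent):

```python
import numpy as np

# Toy linear state-space recurrence, the core idea behind SSM layers:
#   h_t = A @ h_{t-1} + B @ x_t   (state update)
#   y_t = C @ h_t                 (output projection)
# Per-token cost is constant in sequence length, which is why SSMs
# scale well to long sequences.

def ssm_scan(x, A, B, C):
    """Run the recurrence over a sequence x of shape (seq_len, d_in)."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:
        h = A @ h + B @ x_t
        ys.append(C @ h)
    return np.stack(ys)

rng = np.random.default_rng(0)
d_state, d_in, d_out, seq_len = 4, 3, 2, 8
A = 0.9 * np.eye(d_state)             # stable state transition (toy choice)
B = rng.standard_normal((d_state, d_in))
C = rng.standard_normal((d_out, d_state))
x = rng.standard_normal((seq_len, d_in))

y = ssm_scan(x, A, B, C)
print(y.shape)  # one output vector per input token
```

The fixed-size state `h` is also what limits a pure SSM: everything it knows about the past must be compressed into that one vector.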

The “Transformer” architecture is one of the most successful models in the field of artificial intelligence in recent years, especially in natural language processing (NLP) tasks. It can process and understand language data very efficiently and capture long-distance dependencies, but it encounters problems with computational efficiency and memory consumption when processing long sequences of data.
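The memory problem mentioned above is easy to see in miniature: full self-attention builds a score matrix with one entry per token pair, so its size grows quadratically with context length. A small illustrative sketch:

```python
import numpy as np

# Toy illustration of the Transformer's long-sequence cost: scaled
# dot-product attention builds a seq_len x seq_len score matrix, so
# memory and compute grow quadratically with context length.

def attention_scores(q, k):
    """Scaled dot-product scores for query/key matrices of shape (seq_len, d)."""
    d = q.shape[-1]
    return q @ k.T / np.sqrt(d)

rng = np.random.default_rng(0)
for seq_len in (256, 512, 1024):
    q = rng.standard_normal((seq_len, 64))
    scores = attention_scores(q, q)
    # Doubling the sequence quadruples the number of score entries.
    print(seq_len, scores.size)
```

At a 256K context, that quadratic growth is exactly what a pure-attention model struggles with.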

The Jamba model combines Mamba’s SSM technology with elements of the Transformer architecture, aiming to leverage the strengths of both while overcoming their respective limitations. Through this combination, Jamba is not only able to efficiently process long-sequence data (a strength of Mamba), but also maintains a high understanding of complex language patterns and dependencies (a strength of Transformer). This means that the Jamba model remains efficient when handling tasks that require understanding large amounts of text and complex dependencies without sacrificing performance or precision.
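One way to picture this combination is as an interleaved layer stack: a few attention layers for complex dependencies, with cheaper SSM layers in between. The sketch below is purely illustrative; the ratio and the placement of MoE layers in the real Jamba architecture are not stated in this post, so `attn_every` here is an assumption:

```python
# Hypothetical sketch of a hybrid stack: interleave attention layers
# (strong at complex, long-range dependencies) with SSM layers (cheap
# over long sequences). The real Jamba layer ratio is an assumption here.

def build_hybrid_stack(n_layers, attn_every=4):
    """Return a layer-type schedule: one attention layer every
    `attn_every` layers, SSM layers everywhere else."""
    return ["attention" if i % attn_every == 0 else "ssm"
            for i in range(n_layers)]

stack = build_hybrid_stack(8)
print(stack)  # mostly "ssm", with periodic "attention" layers
```

Because most layers are SSM layers, per-token cost stays near-constant in sequence length, while the periodic attention layers preserve the Transformer’s modeling power.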

Website: https://ai21.com/jamba
Detailed introduction: https://ai21.com/blog/announcing-jamba
Model: https://huggingface.co/ai21labs/Jamba-v0.1
