Bi-Mamba: Towards Accurate 1-Bit State Space Models
Tang, Shengkun; Ma, Liqun; Li, Haonan; Sun, Mingjie; Shen, Zhiqiang
Department
Machine Learning
Type
Journal article
Date
2025
Language
English
Abstract
The typical Selective State-Space Model (SSM) used in Mamba addresses several limitations of Transformers, such as the quadratic computational complexity with respect to sequence length and the significant memory requirements during inference due to the key-value (KV) cache. However, the increasing size of Mamba models continues to pose challenges for training and deployment, particularly due to their substantial computational demands during both training and inference. In this work, we introduce Bi-Mamba, a scalable and powerful 1-bit Mamba architecture designed to enable more efficient large language models (LLMs), with model sizes of 780M, 1.3B, and 2.7B parameters. Bi-Mamba models are trained from scratch on a standard LLM-scale dataset using an autoregressive distillation loss. Extensive experiments on language modeling benchmarks demonstrate that Bi-Mamba achieves performance comparable to its full-precision (FP16 or BF16) counterparts, while outperforming post-training binarization (PTB) Mamba and binarization-aware training (BAT) Transformer baselines. Moreover, Bi-Mamba drastically reduces memory usage and computational cost compared to the original Mamba. Our work pioneers a new line of linear-complexity LLMs under low-bit representation and paves the way for the design of specialized hardware optimized for efficient 1-bit Mamba-based models. Code and the pre-trained weights are available at https://github.com/Tangshengku/Bi-Mamba.
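For readers who want a concrete picture of what 1-bit (binarized) training involves, the sketch below shows one common binarization-aware training recipe for a single linear layer in PyTorch: latent full-precision weights are binarized to ±1 with a per-tensor scaling factor on the forward pass, and a straight-through estimator routes gradients back to the latent weights. This is an illustrative assumption only, not the exact Bi-Mamba binarization scheme or its autoregressive distillation loss; refer to the paper and the released code for the actual method.

# Hypothetical sketch of binarization-aware training (not the Bi-Mamba implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinaryLinear(nn.Module):
    """Linear layer whose weights are binarized to {-1, +1} on the forward pass."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # Latent full-precision weights are kept and updated by the optimizer.
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        # Per-tensor scaling factor (assumption: mean absolute value, a common choice)
        # preserves the overall weight magnitude after binarization.
        scale = w.abs().mean()
        w_bin = torch.sign(w) * scale
        # Straight-through estimator: the forward pass uses binary weights,
        # while gradients flow to the latent full-precision weights.
        w_ste = w + (w_bin - w).detach()
        return F.linear(x, w_ste)

if __name__ == "__main__":
    layer = BinaryLinear(16, 8)
    out = layer(torch.randn(4, 16))
    out.sum().backward()  # gradients reach layer.weight via the straight-through estimator
    print(out.shape, layer.weight.grad is not None)

In a setup like Bi-Mamba's, layers of this kind could stand in for the full-precision linear projections inside each Mamba block, with training proceeding end to end so the latent weights adapt to the binarization.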
Citation
S. Tang, L. Ma, H. Li, M. Sun, and Z. Shen, “Bi-Mamba: Towards Accurate 1-Bit State Space Models,” Transactions on Machine Learning Research, 2025.
Source
Transactions on Machine Learning Research
Publisher
Transactions on Machine Learning Research
