SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer
Chen, Hao ; Li, Xiang ; Sun, Ximeng ; Chen, Fangyi ; Liu, Jiang ; Wang, Jindong ; Raj, Bhiksha ; Liu, Zicheng ; Barsoum, Emad
Department
Natural Language Processing
Type
Conference proceeding
Date
2025
Language
English
Abstract
Efficient image tokenization with high compression ratios remains a critical challenge for training generative models. We present SoftVQ-VAE, a continuous image tokenizer that leverages soft categorical posteriors to aggregate multiple codewords into each latent token, substantially increasing the representation capacity of the latent space. When applied to Transformer-based architectures, our approach compresses 256×256 and 512×512 images into as few as 32 or 64 one-dimensional tokens. SoftVQ-VAE not only delivers consistent, high-quality reconstruction but, more importantly, achieves state-of-the-art and significantly faster image generation across different denoising-based generative models. Remarkably, SoftVQ-VAE improves inference throughput by up to 18× for generating 256×256 images and 55× for 512×512 images while achieving competitive FID scores of 1.78 and 2.21, respectively, with SiT-XL. It also improves the training efficiency of generative models, reducing the number of training iterations by 2.3× while maintaining comparable performance. With its fully differentiable design and semantically rich latent space, SoftVQ-VAE achieves efficient tokenization without compromising generation quality, as our experiments demonstrate, paving the way for more efficient generative models. The code and model are released.
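The abstract's central idea is that each latent token is a soft mixture of codewords rather than a single hard-assigned codeword. The minimal sketch below illustrates that idea under stated assumptions. It assumes the posterior logits are negative squared Euclidean distances between an encoder output and each codeword, divided by a temperature. The names `softmax` and `soft_quantize`, the toy 2-D codebook, and the distance-based logits are hypothetical illustrations, not the released SoftVQ-VAE code.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with a temperature parameter."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_quantize(
    z: Vector, codebook: List[Vector], temperature: float = 1.0
) -> Tuple[Vector, List[float]]:
    """Map one encoder output z to a continuous latent token.

    Hypothetical assumption: the logit for codeword k is the negative
    squared Euclidean distance between z and codebook[k].
    """
    logits = [-sum((zi - ci) ** 2 for zi, ci in zip(z, c)) for c in codebook]
    probs = softmax(logits, temperature)  # soft categorical posterior q(k | z)
    # The latent token is the posterior-weighted sum of ALL codewords,
    # so every codeword contributes (and would receive gradient in training).
    token = [sum(p * c[d] for p, c in zip(probs, codebook)) for d in range(len(z))]
    return token, probs


if __name__ == "__main__":
    toy_codebook = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # hypothetical codebook
    token, probs = soft_quantize([0.6, 0.4], toy_codebook, temperature=0.5)
    print("posterior:", probs)
    print("latent token:", token)
```

Because the token is a weighted average rather than an argmax, the mapping stays differentiable with respect to every codeword. Lowering the temperature sharpens the posterior toward a one-hot assignment, so in that limit this sketch behaves like hard vector quantization.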
Citation
H. Chen et al., "SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer," 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 2025, pp. 28358-28370, doi: 10.1109/CVPR52734.2025.02641.
Source
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Conference
2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2025
Publisher
IEEE
