ECM: Enhancing Compressibility of Quantized Vision Encoder and LLM for Large Vision-Language Models

Wang, Weilan
Mao, Yu
Tang, Dongdong
Guan, Nan
Xue, Chun Jason
Department
Computer Science
Type
Conference proceeding
Language
English
Abstract
Quantizing the large language model (LLM) in vision-language models (VLMs) is an effective approach to reducing memory size. However, quantizing only the LLM shifts the memory bottleneck to the vision encoder, particularly in lightweight models. This paper proposes ECM, an end-to-end quantization framework for VLMs that compresses both the vision encoder and the LLM. We first study the impact of quantization granularity on model compressibility and accuracy, and find that finer granularity improves compression at the cost of performance, motivating the need for adaptive strategies. ECM incorporates Adaptive Granularity Quantization and Weight Scaling to balance compression and accuracy. We further apply lossless compression to the quantized weights to maximize storage efficiency. Experiments show that ECM achieves compression ratios of 1.34× for the vision encoder and 1.25× for the LLM, reducing memory usage on average by 80.3% relative to the FP16 VLM and by 51.3% relative to the LLM-quantized VLM.
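For readers unfamiliar with the terms in the abstract, the following is a minimal sketch of how quantization granularity and lossless compressibility can be measured together. It is not the ECM method: the function names (`quantize_groupwise`, `pack_int4`, `lossless_ratio`), the symmetric 4-bit scheme, the synthetic Gaussian weights, and the use of zlib as the lossless coder are all assumptions made for illustration, and results on such toy data need not match the trends reported in the paper.

```python
import random
import zlib


def quantize_groupwise(weights, group_size, bits=4):
    """Symmetric quantization with one scale per `group_size` weights (illustrative)."""
    qmax = 2 ** (bits - 1) - 1  # 7 for signed 4-bit codes
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        scale = max(abs(w) for w in group) / qmax or 1.0  # fall back to 1.0 for all-zero groups
        scales.append(scale)
        codes.extend(max(-qmax, min(qmax, round(w / scale))) for w in group)
    return codes, scales


def dequantize(codes, scales, group_size):
    """Map integer codes back to floats using the scale of their group."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


def pack_int4(codes):
    """Pack two signed 4-bit codes per byte (codes in -7..7 are shifted to 1..15)."""
    shifted = [c + 8 for c in codes]
    if len(shifted) % 2:
        shifted.append(0)  # pad odd-length input with an unused nibble
    return bytes((hi << 4) | lo for hi, lo in zip(shifted[0::2], shifted[1::2]))


def lossless_ratio(codes):
    """Packed size divided by zlib-compressed size (higher means more compressible)."""
    raw = pack_int4(codes)
    return len(raw) / len(zlib.compress(raw, 9))


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Synthetic stand-in for a weight tensor; real model weights are not used here.
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(4096)]

results = {}
for group_size in (32, 128, 4096):  # smaller group = finer granularity
    codes, scales = quantize_groupwise(weights, group_size)
    restored = dequantize(codes, scales, group_size)
    results[group_size] = (lossless_ratio(codes), mse(weights, restored))
    print(f"group_size={group_size}: ratio={results[group_size][0]:.3f}, "
          f"mse={results[group_size][1]:.3e}")
```

Running the script prints, for each group size, the lossless compression ratio of the packed codes next to the reconstruction error. The ratio here ignores the storage cost of the per-group scales, which a full accounting would need to include.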
Citation
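W. Wang, Y. Mao, D. Tang, N. Guan, and C. J. Xue, "ECM: Enhancing Compressibility of Quantized Vision Encoder and LLM for Large Vision-Language Models," in ICASSP 2026 - 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2026, pp. 19732-19736.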
Source
ICASSP 2026 - 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference
2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Keywords
40 Engineering, 46 Information and Computing Sciences, 4603 Computer Vision and Multimedia Computation
Publisher
IEEE