Flipping Knowledge Distillation: Leveraging Small Models' Expertise to Enhance LLMs in Text Matching
Li, Mingzhe; Xiang, Jing; Zhang, Qishen; Wan, Kaiyang; Chen, Xiuying
Department
Natural Language Processing
Type
Conference proceeding
Date
2025
Language
English
Abstract
Knowledge distillation typically involves transferring knowledge from a Large Language Model (LLM) to a Smaller Language Model (SLM). However, in tasks like text matching, smaller fine-tuned models often produce more effective domain-specific representations as they focus on optimizing the similarity between input pairs. To combine the specialized strengths of small models with the rich semantic understanding of LLMs, we propose a flipped knowledge distillation paradigm, where the LLM learns from the SLM. To bridge the architectural gap between commonly used decoder-only LLMs and the encoder-based frameworks of smaller models, we reinterpret LLMs as encoder-decoder models using LoRA. In this setup, the encoder generates compressed text representations, while the decoder transforms them into the output space. During training, the encoder produces text representations and computes their similarities, which are then aligned with the similarity scores produced by the teacher model. We achieve this alignment using our proposed Margin-aware Contrastive Learning (MCL) approach. MCL ensures accurate similarity for both positive and negative pairs, while also adaptively handling differences within positive and negative samples. We validate the effectiveness of our approach on financial and healthcare benchmarks as well as real-world online applications. Our model has been fully deployed in an online application environment, demonstrating its practical utility.
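As a rough illustration of the alignment step described in the abstract, the PyTorch sketch below combines a teacher-alignment term with a margin-based contrastive term. It is a hypothetical reconstruction, not the paper's actual MCL objective: the function name `mcl_loss`, the fixed margin, and the specific loss terms are all assumptions, and the paper's adaptive handling of differences within positive and negative samples is only approximated here by a single margin.

```python
import torch
import torch.nn.functional as F

def mcl_loss(student_sim, teacher_sim, is_pos, margin=0.1):
    """Hypothetical margin-aware contrastive alignment loss (sketch).

    student_sim: (B,) pair similarities from the LLM encoder (student)
    teacher_sim: (B,) pair similarities from the fine-tuned SLM (teacher)
    is_pos:      (B,) bool mask, True for positive pairs
    """
    # Alignment term: match the teacher's similarity scores, but allow
    # a margin of slack so near-correct scores are not over-penalized.
    align = F.relu((student_sim - teacher_sim).abs() - margin).mean()

    # Contrastive term: every positive pair should score at least
    # `margin` higher than every negative pair in the student's space.
    pos, neg = student_sim[is_pos], student_sim[~is_pos]
    contrast = F.relu(neg.unsqueeze(0) - pos.unsqueeze(1) + margin).mean()

    return align + contrast

# Toy usage with cosine-style scores in [-1, 1].
student_sim = torch.tensor([0.82, 0.65, 0.30])
teacher_sim = torch.tensor([0.90, 0.70, 0.15])
is_pos = torch.tensor([True, True, False])
print(mcl_loss(student_sim, teacher_sim, is_pos))
```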
Citation
M. Li, J. Xiang, Q. Zhang, K. Wan, and X. Chen, "Flipping Knowledge Distillation: Leveraging Small Models' Expertise to Enhance LLMs in Text Matching," in Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1), pp. 22218–22229, Aug. 2025, doi: 10.18653/v1/2025.acl-long.1081
Source
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1)
Conference
63rd Annual Meeting of the Association for Computational Linguistics
Publisher
Association for Computational Linguistics
