DRAG: Distilling RAG for SLMs from LLMs to Transfer Knowledge and Mitigate Hallucination via Evidence and Graph-based Distillation
Chen, Jennifer ; Myrzakhan, Aidar ; Luo, Yaxin ; Khan, Hassaan Muhammad ; Bsharat, Sondos Mahmoud ; Shen, Zhiqiang
Department
Machine Learning
Type
Conference proceeding
Date
2025
Language
English
Abstract
Retrieval-Augmented Generation (RAG) methods have proven highly effective for tasks requiring factual consistency and robust knowledge retrieval. However, large-scale RAG systems consume significant computational resources and are prone to generating “hallucinated” content. In this work, we introduce DRAG, a novel framework for distilling RAG knowledge from large-scale Language Models (LLMs) into small LMs (SLMs). Our approach leverages evidence- and knowledge graph-based distillation, ensuring that the distilled model retains critical factual knowledge while significantly reducing model size and computational cost. By aligning the smaller model's predictions with a structured knowledge graph and ranked evidence, DRAG effectively mitigates hallucinations and improves factual accuracy. We further present a case demonstrating how our framework mitigates user privacy risks and introduce a corresponding benchmark. Experimental evaluations on multiple benchmarks demonstrate that our method outperforms prior competitive RAG methods such as MiniRAG for SLMs by up to 27.7% using the same models, while preserving high efficiency and reliability. With DRAG, we provide a practical and resource-efficient roadmap to deploying enhanced retrieval and generation capabilities in small-sized LLMs. Code is available at https://github.com/VILA-Lab/DRAG.
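As a rough illustration of the idea summarized above, the sketch below shows one way ranked evidence and knowledge-graph triples could be assembled into a prompt for a small language model. This is not the authors' released implementation (see the repository linked above); the function names, the overlap-based scoring heuristic, and the prompt layout are all assumptions made for illustration only.

```python
# Conceptual sketch of combining ranked evidence with knowledge-graph facts
# for a small LM, in the spirit of the abstract. All names here are hypothetical;
# the actual implementation is at https://github.com/VILA-Lab/DRAG.
from typing import Callable


def rank_evidence(question: str, evidences: list[str],
                  score: Callable[[str, str], float]) -> list[str]:
    """Order LLM-provided evidence sentences by relevance to the question."""
    return sorted(evidences, key=lambda e: score(question, e), reverse=True)


def build_prompt(question: str, evidences: list[str],
                 triples: list[tuple[str, str, str]], top_k: int = 5) -> str:
    """Assemble the top-ranked evidence and graph triples into an SLM prompt."""
    evidence_block = "\n".join(f"- {e}" for e in evidences[:top_k])
    graph_block = "\n".join(f"({h}, {r}, {t})" for h, r, t in triples)
    return (
        "Answer using only the evidence and graph facts below.\n"
        f"Evidence:\n{evidence_block}\n"
        f"Graph facts:\n{graph_block}\n"
        f"Question: {question}\nAnswer:"
    )


def overlap(question: str, evidence: str) -> float:
    """Toy relevance score: word overlap between question and evidence."""
    return float(len(set(question.lower().split()) & set(evidence.lower().split())))


# Usage example with made-up inputs.
question = "Who wrote Hamlet?"
ranked = rank_evidence(question,
                       ["Hamlet is a tragedy by William Shakespeare.",
                        "Hamlet premiered around 1600."],
                       overlap)
prompt = build_prompt(question, ranked,
                      [("Hamlet", "written_by", "William Shakespeare")])
print(prompt)
```

In this reading, the prompt constrains the small model to the distilled evidence and graph facts, which is the mechanism the abstract credits for reducing hallucination; the actual distillation and alignment procedure is described in the paper itself.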
Citation
J. Chen et al., “DRAG: Distilling RAG for SLMs from LLMs to Transfer Knowledge and Mitigate Hallucination via Evidence and Graph-based Distillation,” in Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), vol. 1, pp. 7240–7260, Aug. 2025, doi: 10.18653/v1/2025.acl-long.358.
Source
Proceedings of the Annual Meeting of the Association for Computational Linguistics
Conference
63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
Keywords
Retrieval-Augmented Generation, Knowledge Distillation, Small-Scale Language Models, Large Language Models, Evidence-Graph Alignment, Hallucination Mitigation, Structured Knowledge Transfer, Efficient Model Deployment
Publisher
Association for Computational Linguistics
