FAID: Fine-grained AI-generated Text Detection using Multi-task Auxiliary and Multi-level Contrastive Learning
Ta, Minh Ngoc ; Van, Dong Cao ; Hoang, Duc-Anh ; Le-Anh, Minh ; Nguyen, Truong ; Nguyen, My Anh Tran ; Wang, Yuxia ; Nakov, Preslav ; Sang, Dinh Viet
Files
2026.eacl-long.151.pdf
Adobe PDF, 3.34 MB
Department
Natural Language Processing
Type
Conference proceeding
License
http://creativecommons.org/licenses/by/4.0/
Abstract
The growing collaboration between humans and AI models in generative tasks has introduced new challenges in distinguishing between human-written, LLM-generated, and human-LLM collaborative texts. In this work, we collect a multilingual, multi-domain, multi-generator dataset FAIDSet. We further introduce a fine-grained detection framework FAID to classify text into these three categories, and also to identify the underlying LLM family of the generator. Unlike existing binary classifiers, FAID is built to capture both authorship and model-specific characteristics. Our method combines multi-level contrastive learning with multi-task auxiliary classification to learn subtle stylistic cues. By modeling LLM families as distinct stylistic entities, we incorporate an adaptation to address distributional shifts without retraining for unseen data. Our experimental results demonstrate that FAID outperforms several baselines, particularly enhancing the generalization accuracy on unseen domains and new LLMs, thus offering a potential solution for improving transparency and accountability in AI-assisted writing. Our data and code are available at https://github.com/mbzuai-nlp/FAID.
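As a rough illustration of how a contrastive objective can be combined with an auxiliary classification objective, the following minimal sketch may help. It is not the authors' implementation (see the linked repository for that). It assumes a generic supervised contrastive loss applied at two label levels (authorship category and LLM family) plus a cross-entropy term on family logits. The function names, the `temperature` and `aux_weight` parameters, and the toy data are all hypothetical.

```python
import math

def _normalize(v):
    # Scale a vector to unit length (assumes a non-zero vector).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def _log_softmax(scores):
    # Numerically stable log-softmax over a list of scores.
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]

def contrastive_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss: samples sharing a label are positives."""
    n = len(embeddings)
    if n < 2:
        return 0.0
    z = [_normalize(v) for v in embeddings]
    total, anchors = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        sims = [sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in others]
        log_probs = _log_softmax(sims)
        pos = [lp for j, lp in zip(others, log_probs) if labels[j] == labels[i]]
        if pos:  # anchors with no positive partner are skipped
            total -= sum(pos) / len(pos)
            anchors += 1
    return total / anchors if anchors else 0.0

def cross_entropy(logits, labels):
    # Mean negative log-likelihood of the true class.
    return -sum(_log_softmax(row)[y] for row, y in zip(logits, labels)) / len(labels)

def combined_loss(embeddings, authorship, family, family_logits, aux_weight=0.5):
    # Two contrastive "levels" (an assumption of this sketch):
    # coarse authorship category and finer LLM family.
    con = (contrastive_loss(embeddings, authorship)
           + contrastive_loss(embeddings, family))
    aux = cross_entropy(family_logits, family)  # auxiliary classification task
    return con + aux_weight * aux

# Toy data (hypothetical): four 2-D embeddings forming two tight clusters.
emb = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.01, 1.0]]
authorship = [0, 0, 1, 1]
family = [0, 0, 1, 1]
family_logits = [[2.0, 0.0], [2.0, 0.0], [0.0, 2.0], [0.0, 2.0]]
loss = combined_loss(emb, authorship, family, family_logits)
```

In this sketch the contrastive term is small when embeddings that share a label lie close together, and the auxiliary term is small when the classifier assigns high probability to the correct family. How the two are weighted is a design choice left open here.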
Citation
M.N. Ta, D.C. Van, D.-A. Hoang, M. Le-Anh, T. Nguyen, M.A.T. Nguyen, et al., "FAID: Fine-grained AI-generated Text Detection using Multi-task Auxiliary and Multi-level Contrastive Learning," 2026, pp. 3275-3296.
Source
Conference
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Keywords
46 Information and Computing Sciences, 4608 Human-Centred Computing
Publisher
Association for Computational Linguistics
