The Gaps between Fine Tuning and In-context Learning in Bias Evaluation and Debiasing

Kaneko, Masahiro
Bollegala, Danushka
Baldwin, Timothy
Department
Natural Language Processing
Type
Conference proceeding
Date
2025
Language
English
Abstract
The output tendencies of pre-trained language models (PLMs) vary markedly before and after fine-tuning (FT) due to the updates to the model parameters. These divergences in output tendencies result in a gap in the social biases of PLMs. For example, there exists a low correlation between the intrinsic bias scores of a PLM and its extrinsic bias scores under FT-based debiasing methods. Additionally, applying FT-based debiasing methods to a PLM leads to a decline in downstream task performance. On the other hand, PLMs trained on large datasets can learn without parameter updates via in-context learning (ICL) using prompts. ICL induces smaller changes in a PLM than FT-based debiasing methods do. Therefore, we hypothesize that the gap observed between pre-trained and FT models does not hold for debiasing methods that use ICL. In this study, we demonstrate that ICL-based debiasing methods show a higher correlation between intrinsic and extrinsic bias scores than FT-based methods. Moreover, the performance degradation due to debiasing is also lower for ICL than for FT.
Citation
M. Kaneko, D. Bollegala, and T. Baldwin, “The Gaps between Fine Tuning and In-context Learning in Bias Evaluation and Debiasing,” 2025. Accessed: Mar. 12, 2025. [Online]. Available: https://aclanthology.org/2025.coling-main.187/
Source
Proceedings of the 31st International Conference on Computational Linguistics, 2025
Keywords
Intrinsic bias evaluation, Extrinsic bias evaluation, Fine-tuning (FT), In-context learning (ICL), Pre-trained language models (PLMs)
Publisher
Association for Computational Linguistics