Handling Realistic Label Noise in BERT Text Classification
Author
Agro, Maha Tufail
Department
Natural Language Processing
Type
Thesis
Date
2023
Language
English
Collections
Research Projects
Abstract
This thesis examines the impact of label noise on deep learning models, focusing on text classification. As the volume of training data grows, producing gold-standard labels has become increasingly difficult, and cheaper annotation methods introduce label noise into datasets. This noise can significantly degrade the performance of supervised classifiers. Although many methods have been developed to mitigate label noise, they have not been extensively evaluated on text classification tasks, and most of them assume that noise is injected at random, which does not represent the label noise found in real-world data. In this work, we evaluate state-of-the-art approaches for text classification under realistic label noise and propose three methods to combat it: deep ensembles, data noise cleansing, and few-shot prompt learning. Our findings demonstrate the effectiveness of these approaches in handling realistic label noise in text classification and provide insights for further research in this area.
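The abstract contrasts the common benchmark practice of injecting label noise at random with realistic noise, and names deep ensembles and data noise cleansing among the proposed methods. The sketch below is a hypothetical illustration, not the thesis's implementation. `inject_symmetric_noise` shows the uniform "random injection" setting, where each label is flipped with a fixed probability to any other class. `ensemble_average` and `flag_suspect_labels` show one generic way that averaged ensemble predictions could flag suspicious labels before cleansing. All function names, the 0.5 threshold, and the flagging rule are assumptions; in practice the probabilities would come from several independently fine-tuned classifiers (for example, BERT models trained with different seeds).

```python
import random
from typing import List, Sequence


def inject_symmetric_noise(labels: Sequence[int], num_classes: int,
                           noise_rate: float, seed: int = 0) -> List[int]:
    """Hypothetical helper: flip each label with probability `noise_rate`
    to a different class chosen uniformly at random (synthetic noise)."""
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            # Choose uniformly among the other num_classes - 1 classes.
            candidates = [c for c in range(num_classes) if c != y]
            noisy.append(rng.choice(candidates))
        else:
            noisy.append(y)
    return noisy


def ensemble_average(member_probs: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Average class probabilities over ensemble members.
    member_probs[m][i][c] is the probability model m gives class c for example i."""
    num_models = len(member_probs)
    num_examples = len(member_probs[0])
    num_classes = len(member_probs[0][0])
    return [
        [sum(member_probs[m][i][c] for m in range(num_models)) / num_models
         for c in range(num_classes)]
        for i in range(num_examples)
    ]


def flag_suspect_labels(avg_probs: Sequence[Sequence[float]],
                        given_labels: Sequence[int],
                        threshold: float = 0.5) -> List[int]:
    """Assumed cleansing rule: flag an example when the ensemble's top class
    differs from its given label and the given label's averaged probability
    falls below `threshold`."""
    suspects = []
    for i, (probs, y) in enumerate(zip(avg_probs, given_labels)):
        pred = max(range(len(probs)), key=probs.__getitem__)
        if pred != y and probs[y] < threshold:
            suspects.append(i)
    return suspects


if __name__ == "__main__":
    # Two hypothetical ensemble members, three examples, two classes.
    members = [
        [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]],
        [[0.8, 0.2], [0.4, 0.6], [0.1, 0.9]],
    ]
    avg = ensemble_average(members)
    print(flag_suspect_labels(avg, [0, 1, 0]))  # → [2]
    print(inject_symmetric_noise([0, 1, 2, 1], num_classes=3, noise_rate=0.5))
```

Flagged examples could then be dropped or relabeled before retraining. Note that symmetric noise is independent of the input text, which is exactly the property the abstract argues real-world annotation noise lacks.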
Citation
M.T. Agro, "Handling Realistic Label Noise in BERT Text Classification", M.S. Thesis, Natural Language Processing, MBZUAI, Abu Dhabi, UAE, 2023.
