A Scalable Framework for Automated NER Annotation Correction in Low-Resource Languages
Ehsan, Toqeer ; Solorio, Thamar
Department
Natural Language Processing
Type
Conference proceeding
License
http://creativecommons.org/licenses/by/4.0/
Abstract
Poor quality or noisy annotations in Named Entity Recognition (NER), as in any other NLP task, make it challenging to achieve state-of-the-art performance. In this paper, we present a multi-step framework to enhance the annotation quality of NER datasets by employing automated techniques. We propose a frequency-based iterative approach that leverages self-training and a dual-threshold mechanism to enhance inference confidence. Experimental evaluations on different NER datasets demonstrate significant improvements in NER performance with respect to the original datasets. This work further explores the potential of generative Large Language Models (LLMs) to perform NER for low-resource languages.
Citation
T. Ehsan, T. Solorio, "A Scalable Framework for Automated NER Annotation Correction in Low-Resource Languages," 2026, pp. 4138-4151.
Source
Conference
Findings of the Association for Computational Linguistics: EACL 2026
Publisher
Association for Computational Linguistics
