A Fully Automated Pipeline for Conversational Discourse Annotation: Tree Scheme Generation and Labeling with LLMs

Petukhova, Kseniia
Department
Natural Language Processing
Embargo End Date
30/05/2025
Type
Thesis
Date
2025
Language
English
Abstract
Recent advances in Large Language Models (LLMs) have shown promise in automating discourse annotation for conversations. While manually designed tree annotation schemes significantly improve annotation quality for both humans and models, creating them remains time-consuming and requires expert knowledge. I propose a fully automated pipeline that uses LLMs both to construct such schemes and to perform the annotation. I evaluate my approach on the speech function (SF) and Switchboard DAMSL (SWBD-DAMSL) taxonomies. My experiments compare various design choices, and I show that frequency-guided decision trees, paired with an advanced LLM for annotation, can outperform manually designed trees from prior work and even match or surpass human annotators, while significantly reducing the time required for annotation. I release all code, along with the resulting schemes and annotations, to facilitate future research on discourse annotation.
Citation
Kseniia Petukhova, “A Fully Automated Pipeline for Conversational Discourse Annotation: Tree Scheme Generation and Labeling with LLMs,” Master of Science thesis, Natural Language Processing, MBZUAI, 2025.
Keywords
Discourse, Conversation, Annotation