NeuralNexus at BEA 2025 Shared Task: Retrieval-Augmented Prompting for Mistake Identification in AI Tutors

Naeem, Numaan
Ahmad, Sarfraz
Ahsan, Momina
Iqbal, Hasan
Department
Natural Language Processing
Type
Conference proceeding
Date
2025
Language
English
Abstract
This paper presents our system for Track 1: Mistake Identification in the BEA 2025 Shared Task on Pedagogical Ability Assessment of AI-powered Tutors. The task involves evaluating whether a tutor’s response correctly identifies a mistake in a student’s mathematical reasoning. We explore four approaches: (1) an ensemble of machine learning models over pooled token embeddings from multiple pretrained language models (LMs); (2) a frozen sentence-transformer using [CLS] embeddings with an MLP classifier; (3) a history-aware model with multi-head attention between token-level history and response embeddings; and (4) a retrieval-augmented few-shot prompting system with a large language model (LLM), namely GPT-4o. Our final system retrieves semantically similar examples, constructs structured prompts, and uses schema-guided output parsing to produce interpretable predictions. It outperforms all baselines, demonstrating the effectiveness of combining example-driven prompting with LLM reasoning for pedagogical feedback assessment.
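The retrieval-augmented pipeline the abstract describes (retrieve semantically similar labeled examples, build a structured few-shot prompt, constrain the output to a fixed label schema) can be sketched as below. This is a minimal illustration, not the authors' implementation: the `embed` function is a toy bag-of-words stand-in for the sentence-transformer embeddings, the example dialogues are invented, and the Yes / To some extent / No label set is assumed from the track's annotation scheme.

```python
from collections import Counter
from math import sqrt

def embed(text):
    # Toy bag-of-words vector; a stand-in for a real sentence-transformer embedding.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, pool, k=2):
    # Rank labeled training examples by semantic similarity to the query dialogue.
    q = embed(query)
    return sorted(pool, key=lambda ex: cosine(q, embed(ex["dialogue"])), reverse=True)[:k]

def build_prompt(query, examples):
    # Structured few-shot prompt; the fixed answer schema guides parseable output.
    shots = "\n".join(
        f"Dialogue: {ex['dialogue']}\nLabel: {ex['label']}" for ex in examples
    )
    return (
        "Decide whether the tutor response identifies the student's mistake.\n"
        "Answer with exactly one of: Yes / To some extent / No.\n\n"
        f"{shots}\n\nDialogue: {query}\nLabel:"
    )

pool = [
    {"dialogue": "Student adds fractions by adding denominators; tutor points out the error.",
     "label": "Yes"},
    {"dialogue": "Student misplaces a decimal; tutor praises the work without correction.",
     "label": "No"},
]
query = "Student adds fractions wrongly; tutor notes the denominator error."
prompt = build_prompt(query, retrieve(query, pool))
```

In the full system, `prompt` would be sent to the LLM and the single-token label parsed from its reply; constraining the answer to the fixed label set is what makes the prediction directly interpretable.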
Citation
N. Naeem, S. Ahmad, M. Ahsan, and H. Iqbal, “NeuralNexus at BEA 2025 Shared Task: Retrieval-Augmented Prompting for Mistake Identification in AI Tutors,” in Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications, Aug. 2025, pp. 1254–1259, doi: 10.18653/v1/2025.bea-1.100.
Source
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications
Conference
20th Workshop on Innovative Use of NLP for Building Educational Applications
Publisher
Association for Computational Linguistics