Grounded Answers from Multi-Passage Regulations: Learning-to-Rank for Regulatory RAG
Gokhan, Tuba ; Briscoe, Ted
Department
Natural Language Processing
Type
Conference proceeding
Date
2025
Language
English
Abstract
Regulatory compliance questions often require aggregating evidence from multiple, interrelated sections of long, complex documents. To support question answering (QA) in this setting, we introduce ObliQA-MP, a dataset for multi-passage regulatory QA that extends the earlier ObliQA benchmark (CITATION). We improve evidence quality with an LLM-based validation step that filters out ~20% of passages missed by prior natural language inference (NLI)-based filtering. Our benchmarks show a notable performance drop from single- to multi-passage retrieval, underscoring the challenges of semantic overlap and structural complexity in regulatory texts. To address this, we propose a feature-based learning-to-rank (LTR) framework that integrates lexical, semantic, and graph-derived information, achieving consistent gains over dense and hybrid baselines. We further add a lightweight score-based filter to trim noisy tails and an obligation-centric prompting technique. On ObliQA-MP, LTR improves retrieval (Recall@10, MAP@10, and nDCG@10) over dense, hybrid, and fusion baselines. Our generation approach, which combines domain-specific filtering with prompting, achieves strong scores on ObliQA-MP under the RePAS metric (CITATION), producing faithful, citation-grounded answers. Together, ObliQA-MP and our validation and RAG systems offer a stronger benchmark and a practical recipe for grounded, citation-controlled QA in regulatory domains.
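For readers unfamiliar with the retrieval metrics named in the abstract, the following is a minimal sketch of Recall@k, average precision at k (averaged over queries to give MAP@k), and nDCG@k under binary relevance. It is not the authors' evaluation code. The function names and passage IDs are hypothetical, and normalizing average precision by min(|relevant|, k) is one common convention that is assumed here, not taken from the paper.

```python
import math


def recall_at_k(ranked, relevant, k=10):
    """Fraction of the relevant passages that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for pid in ranked[:k] if pid in relevant)
    return hits / len(relevant)


def average_precision_at_k(ranked, relevant, k=10):
    """Average of precision values at each rank where a relevant passage occurs.

    Assumption: normalized by min(len(relevant), k); other conventions exist.
    MAP@k is the mean of this value over all queries.
    """
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, pid in enumerate(ranked[:k], start=1):
        if pid in relevant:
            hits += 1
            total += hits / rank
    return total / min(len(relevant), k)


def ndcg_at_k(ranked, relevant, k=10):
    """nDCG with binary gains: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, pid in enumerate(ranked[:k], start=1)
        if pid in relevant
    )
    ideal = sum(
        1.0 / math.log2(rank + 1)
        for rank in range(1, min(len(relevant), k) + 1)
    )
    return dcg / ideal if ideal else 0.0


# Hypothetical multi-passage query: two gold passages, retrieved at ranks 2 and 4.
ranked = ["p3", "p1", "p7", "p2"]
relevant = {"p1", "p2"}
print(recall_at_k(ranked, relevant))
print(average_precision_at_k(ranked, relevant))
print(ndcg_at_k(ranked, relevant))
```

In a multi-passage setting, Recall@k rewards retrieving all gold passages, while MAP@k and nDCG@k additionally reward placing them near the top of the ranking.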
Citation
T. Gokhan and T. Briscoe, “Grounded Answers from Multi-Passage Regulations: Learning-to-Rank for Regulatory RAG,” Proceedings of the Natural Legal Language Processing Workshop 2025, pp. 135–146, 2025, doi: 10.18653/V1/2025.NLLP-1.10
Source
Proceedings of the Natural Legal Language Processing Workshop 2025
Conference
Natural Legal Language Processing Workshop 2025
Keywords
Regulatory RAG, Multi-Passage Retrieval, Learning-to-Rank, Question-Passage Matching, Legal Regulation QA, Evidence Ranking, ObliQA-MP Dataset, Large Language Models, Law
Publisher
Association for Computational Linguistics
