Socrates or Smartypants: Testing Logic Reasoning Capabilities of Large Language Models with Logic Programming-Based Test Oracles

Xu, Zihao
Ding, Junchen
Lou, Yiling
Zhang, Kun
Gong, Dong
Li, Yuekang
Department
Machine Learning
Type
Conference proceeding
Abstract
Large Language Models (LLMs) have achieved significant progress in language understanding and reasoning. Evaluating and analyzing their logical reasoning abilities has therefore become essential. However, existing datasets and benchmarks are often limited to overly simplistic, unnatural, or contextually constrained examples. In response to the growing demand, we introduce SMARTYPAT-BENCH, a challenging, naturally expressed, and systematically labeled benchmark derived from real-world high-quality Reddit posts containing subtle logical fallacies. Unlike existing datasets and benchmarks, it provides more detailed annotations of logical fallacies and features more diverse data. To further scale up the study and address the limitations of manual data collection and labeling, such as fallacy-type imbalance and labor-intensive annotation, we introduce SMARTYPAT, an automated framework powered by logic programming-based oracles. SMARTYPAT utilizes Prolog rules to systematically generate logically fallacious statements, which are then refined into fluent natural language sentences by LLMs, ensuring precise fallacy representation. Extensive evaluation demonstrates that SMARTYPAT produces fallacies comparable in subtlety and quality to human-generated content and significantly outperforms baseline methods. Finally, experiments reveal insights into LLM capabilities, highlighting that while excessive reasoning steps hinder fallacy detection accuracy, structured reasoning enhances fallacy categorization performance.
Citation
Z. Xu, J. Ding, Y. Lou, K. Zhang, D. Gong, Y. Li, "Socrates or Smartypants: Testing Logic Reasoning Capabilities of Large Language Models with Logic Programming-Based Test Oracles," 2026, pp. 19433-19440.
Source
Proceedings of the AAAI Conference on Artificial Intelligence
Conference
AAAI Conference on Artificial Intelligence
Keywords
46 Information and Computing Sciences, 4602 Artificial Intelligence, 4605 Data Management and Data Science
Publisher
Association for the Advancement of Artificial Intelligence