Control Illusion: The Failure of Instruction Hierarchies in Large Language Models
Geng, Yilin ; Li, Haonan ; Mu, Honglin ; Han, Xudong ; Baldwin, Timothy ; Abend, Omri ; Hovy, Eduard ; Frermann, Lea
Department
Natural Language Processing
Type
Conference proceeding
Abstract
Large language models (LLMs) are increasingly deployed with hierarchical instruction schemes, where certain instructions (e.g., system-level directives) are expected to take precedence over others (e.g., user messages). Yet we lack a systematic understanding of how effectively these hierarchical control mechanisms work. We introduce a systematic evaluation framework based on constraint prioritization to assess how well LLMs enforce instruction hierarchies. Our experiments across six state-of-the-art LLMs reveal that models struggle with consistent instruction prioritization, even for simple formatting conflicts. We find that the widely adopted system/user prompt separation fails to establish a reliable instruction hierarchy, and that models exhibit strong inherent biases toward certain constraint types regardless of their priority designation. Interestingly, we also find that societal hierarchy framings (e.g., authority, expertise, consensus) show stronger influence on model behavior than system/user roles, suggesting that pretraining-derived social structures function as latent behavioral priors with potentially greater impact than post-training guardrails.
Citation
Y. Geng, H. Li, H. Mu, X. Han, T. Baldwin, O. Abend, et al., "Control Illusion: The Failure of Instruction Hierarchies in Large Language Models," 2026, pp. 30816-30824.
Source
Proceedings of the AAAI Conference on Artificial Intelligence
Conference
AAAI Conference on Artificial Intelligence
Keywords
46 Information and Computing Sciences, 4608 Human-Centred Computing
Publisher
Association for the Advancement of Artificial Intelligence
