Prompt Inference Attack on Distributed Large Language Model Inference Frameworks
Luo, Xinjian ; Yu, Ting ; Xiao, Xiaokui
Files
3719027.3744820.pdf
Adobe PDF, 1.95 MB
Department
Computer Science
Type
Conference proceeding
Date
2025
Language
English
Abstract
The inference process of modern large language models (LLMs) demands prohibitive computational resources, rendering them infeasible for deployment on consumer-grade devices. To address this limitation, recent studies propose distributed LLM inference frameworks, which employ split learning principles to enable collaborative LLM inference on resource-constrained hardware. However, distributing LLM layers across participants requires the transmission of intermediate outputs, which may introduce privacy risks to the original input prompts, a critical issue that has yet to be thoroughly explored in the literature. In this paper, we rigorously examine the privacy vulnerabilities of distributed LLM inference frameworks by designing and evaluating three prompt inference attacks aimed at reconstructing input prompts from intermediate LLM outputs. These attacks are developed under various query and data constraints to reflect diverse real-world LLM service scenarios. Specifically, the first attack assumes an unlimited query budget and access to an auxiliary dataset sharing the same distribution as the target prompts. The second attack also leverages unlimited queries but uses an auxiliary dataset with a distribution differing from the target prompts. The third attack operates under the most restrictive scenario, with limited query budgets and no auxiliary dataset available. We evaluate these attacks on a range of LLMs, including state-of-the-art models such as Llama-3.2 and Phi-3.5, as well as widely used models like GPT-2 and BERT for comparative analysis. Our experiments show that the first two attacks achieve reconstruction accuracies exceeding 90%, while the third achieves accuracies typically above 50%, even under stringent constraints. These findings highlight substantial privacy risks in distributed LLM inference frameworks, raising serious concerns about their deployment in real-world applications. Additionally, our analysis uncovers distinct distributional properties of intermediate embeddings across LLM layers, providing valuable insights into the LLM inference process and the development of effective defense mechanisms for distributed LLM frameworks.
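To make the threat model concrete, the following minimal sketch (not from the paper) illustrates what an attacker in a split-inference setup actually observes: the client runs the first few transformer blocks locally and ships the resulting intermediate hidden states to a remote party. The model choice (GPT-2, one of the models evaluated in the paper), the split index SPLIT, and the helper name client_forward are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only (not the paper's code): shows how split inference
# exposes intermediate hidden states. Model choice (gpt2) and the split index
# are assumptions chosen for demonstration.
import torch
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2").eval()

SPLIT = 6  # client runs blocks [0, SPLIT); the remaining blocks run remotely

@torch.no_grad()
def client_forward(prompt: str) -> torch.Tensor:
    """Run the embedding layers and the first SPLIT transformer blocks
    locally, then return the intermediate hidden states that would be
    transmitted to the remote participant."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    positions = torch.arange(ids.size(1)).unsqueeze(0)
    h = model.wte(ids) + model.wpe(positions)   # token + position embeddings
    for block in model.h[:SPLIT]:
        h = block(h)[0]                         # each block returns a tuple
    return h  # this tensor is what a prompt inference attack observes

hidden = client_forward("my secret prompt")
print(hidden.shape)  # (1, seq_len, 768) for the base gpt2 model
```

The paper's attacks amount to inverting this mapping: given only the tensor returned by such a partial forward pass (and, depending on the scenario, query access and/or an auxiliary dataset), reconstruct the original prompt tokens.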
Citation
X. Luo, T. Yu, and X. Xiao, “Prompt Inference Attack on Distributed Large Language Model Inference Frameworks,” in Proceedings of the 2025 ACM SIGSAC Conference on Computer and Communications Security (CCS 2025), Nov. 2025, pp. 1739–1753, doi: 10.1145/3719027.3744820.
Source
Proceedings of the 2025 ACM SIGSAC Conference on Computer and Communications Security
Conference
32nd ACM SIGSAC Conference on Computer and Communications Security, CCS 2025
Keywords
Distributed Inference Framework, LLM, Prompt Inference
Publisher
Association for Computing Machinery
