Loading...
Thumbnail Image
Item

Discovery of the hidden world with large language models

Liu, Chenxi
Chen, Yongqiang
Liu, Tongliang
Gong, Mingming
Cheng, James
Han, Bo
Zhang, Kun
Research Projects
Organizational Units
Journal Issue
Abstract
Revealing the underlying causal mechanisms in the real world is the key to the development of science. Despite the progress in the past decades, traditional causal discovery approaches (CDs) mainly rely on high-quality measured variables, usually given by human experts, to find causal relations. The lack of well-defined high-level variables in many real-world applications has already been a longstanding roadblock to a broader application of CDs. To this end, this paper presents Causal representatiOn AssistanT (COAT) that introduces large language models (LLMs) to bridge the gap. LLMs are trained on massive observations of the world and have demonstrated great capability in extracting key information from unstructured data. Therefore, it is natural to employ LLMs to assist with proposing useful high-level factors and crafting their measurements. Meanwhile, COAT also adopts CDs to find causal relations among the identified variables as well as to provide feedback to LLMs to iteratively refine the proposed factors. We show that LLMs and CDs are mutually beneficial and the constructed feedback provably also helps with the factor proposal. We construct and curate several synthetic and real-world benchmarks including analysis of human reviews and diagnosis of neuropathic and brain tumors, to comprehensively evaluate COAT. Extensive empirical results confirm the effectiveness and reliability of COAT with significant improvements.
Citation
C. Liu et al., “Discovery of the Hidden World with Large Language Models,” Adv Neural Inf Process Syst, vol. 37, pp. 102307–102365, Dec. 2024, [Online]. Available: https://causalcoat.github.io
Source
Advances in Neural Information Processing Systems (NeurIPS 2024)
Conference
Keywords
Causal discovery, Large Language Models (LLMs), High-level variable identification, COAT framework, Unstructured data analysis?
Subjects
Source
Publisher
NEURIPS
DOI
Full-text link