ChatENV: An Interactive Vision-Language Model for Sensor-Guided Environmental Monitoring and Scenario Simulation

Elgendy, Hosam
Sharshar, Ahmed
Aboeitta, Ahmed
Guizani, Mohsen
Department
Machine Learning
Type
Journal article
Language
English
Abstract
Understanding environmental changes from remote sensing imagery is vital for climate resilience, urban planning, and ecosystem monitoring. Yet current vision-language models (VLMs) overlook causal signals from environmental sensors, rely on single-source captions prone to stylistic bias, and lack interactive scenario-based reasoning. We present ChatENV, the first interactive VLM that jointly reasons over satellite image pairs and real-world sensor data. Our framework: (i) creates a 177k-image dataset forming 152k temporal pairs across 62 land-use classes in 197 countries with rich sensor metadata (e.g., temperature, PM10, CO); (ii) annotates data using GPT-4o and Gemini 2.0 for stylistic and semantic diversity; and (iii) fine-tunes Qwen-2.5-VL using efficient Low-Rank Adaptation (LoRA) adapters for chat purposes. ChatENV achieves strong performance in temporal and what-if reasoning (e.g., BERT-F1 0.902) and rivals or outperforms state-of-the-art temporal models, while supporting interactive scenario-based analysis. This positions ChatENV as a powerful tool for grounded, sensor-aware environmental monitoring.
Citation
H. Elgendy, A. Sharshar, A. Aboeitta, M. Guizani, "ChatENV: An Interactive Vision-Language Model for Sensor-Guided Environmental Monitoring and Scenario Simulation," IEEE Transactions on Geoscience and Remote Sensing, vol. PP, no. 99, pp. 1-1, 2026, https://doi.org/10.1109/tgrs.2026.3685864.
Source
IEEE Transactions on Geoscience and Remote Sensing
Keywords
37 Earth Sciences, 40 Engineering, 13 Climate Action
Publisher
IEEE