Multimodal Agentic System for Highway Safety Monitoring

Almarzooqi, Abdulla Hasan Ali Abdulwahab
Department
Machine Learning
Embargo End Date
2025-05-30
Type
Thesis
Date
2025
Language
English
Collections
Research Projects
Abstract
With the increasing volume of road and highway traffic and its associated risks, traditional road surveillance methods, which depend largely on human intervention, are often slow and inefficient. This research addresses some of these issues by introducing a multimodal agentic system for road monitoring that improves traffic safety in two respects: real-time detection of unforeseen driving events through scene description, and automation of the accident reporting process. Two primary agentic systems are developed. The first is designed for driving scene description: an object detection agent based on YOLOv11 analyzes the input images, and the detected objects and the original image are then passed to a scene description agent that uses Multimodal Large Language Models (MLLMs) to generate a structured description of the exterior driving environment. This approach enables the system to quickly identify and interpret anomalies, enhancing situational awareness. The second system focuses on automatic accident report generation and comprises five specialized agents: the weather condition agent, the seatbelt status agent, the airbag status agent, the video preprocessing agent, and the accident report generation agent. Upon detecting an accident (signalled by airbag deployment), the system retrieves the recent dashcam footage, processes the relevant frames, and gathers data from the other agents (weather conditions, seatbelt usage, and airbag status). The accident report generation agent, which also uses MLLMs, then analyzes these multimodal inputs to produce a comprehensive, structured accident report without human intervention. The key benefits of this agentic framework are its adaptability to non-deterministic traffic environments, its scalability to increasingly complex road systems, and its ability to unify diverse data sources within a coherent, automated workflow.
The evaluations confirmed the system's effectiveness in both scene description and report generation. In the scene description system, integrating YOLOv11 with MLLMs significantly improved performance, with GPT-4o achieving the top score of 95.6%. In the accident report generation system, the frame sampling technique yielded the best results, reaching a peak score of 89.6% with GPT-4o. This research integrates MLLMs with computer vision (CV) models and real-time sensory data within an agentic framework, offering an efficient, artificial intelligence (AI)-driven solution to improve road safety and advance the capabilities of intelligent transportation systems.
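The two workflows described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: every name below (`Detection`, `build_scene_prompt`, `sample_frames`, `SensorContext`, `generate_accident_report`) is hypothetical, the MLLM is represented by a caller-supplied function, and evenly spaced sampling is only an assumption about what "frame sampling" means here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Detection:
    """One object reported by the (hypothetical) object detection agent."""
    label: str
    confidence: float


def build_scene_prompt(detections: Sequence[Detection]) -> str:
    """Fold detector output into a text prompt for the scene description agent."""
    objects = ", ".join(f"{d.label} ({d.confidence:.2f})" for d in detections)
    return f"Detected objects: {objects or 'none'}. Describe the driving scene."


def sample_frames(num_frames: int, num_samples: int) -> List[int]:
    """Pick evenly spaced frame indices (uniform spacing is an assumption)."""
    if num_frames <= 0 or num_samples <= 0:
        return []
    if num_samples >= num_frames:
        return list(range(num_frames))
    step = num_frames / num_samples
    return [int(i * step) for i in range(num_samples)]


@dataclass
class SensorContext:
    """Values gathered from the weather, seatbelt, and airbag agents."""
    weather: str
    seatbelt_fastened: bool
    airbag_deployed: bool


def generate_accident_report(
    ctx: SensorContext,
    num_frames: int,
    report_agent: Callable[[str, List[int]], str],
    num_samples: int = 8,
) -> Optional[str]:
    """Run the report flow only when airbag deployment signals an accident."""
    if not ctx.airbag_deployed:
        return None
    frame_ids = sample_frames(num_frames, num_samples)
    prompt = (
        f"Weather: {ctx.weather}. "
        f"Seatbelt fastened: {ctx.seatbelt_fastened}. "
        f"Airbag deployed: {ctx.airbag_deployed}. "
        "Write a structured accident report from the attached frames."
    )
    # report_agent stands in for the MLLM-backed report generation agent.
    return report_agent(prompt, frame_ids)
```

In this sketch `report_agent` is any callable that accepts the prompt and the sampled frame indices, so a real MLLM client or a test stub can be swapped in without changing the flow.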
Citation
Abdulla Hasan Ali Abdulwahab Almarzooqi, “Multimodal Agentic System for Highway Safety Monitoring,” Master of Science thesis, Machine Learning, MBZUAI, 2025.
Keywords
Agentic AI, Multimodal LLMs, Highway Safety Monitoring, Driving Scene Description, Automatic Accident Report Generation, Computer Vision