
A multi-sensor fusion network with multi-cognitive visual adaptation and adaptive dynamic convolution

Tian, Di
Shi, Jiahang
Li, Jiabo
Gong, Mingming
Department
Machine Learning
Type
Journal article
Date
2025
Language
English
Abstract
Multimodal fusion of LiDAR (Light Detection and Ranging) point clouds and camera images is a critical approach to enhancing environmental perception performance. The core lies in fully leveraging the complementary advantages of different sensors: LiDAR provides high-precision spatial depth information, while cameras contribute rich color and contextual features. Our study identifies the high complexity of feature extraction caused by the 2D (two-dimensional) nature of images as a major bottleneck in multi-sensor fusion. To address strong local dependency, scale diversity, and high feature complexity in visual features, we propose the Nova (Nonlinearly Optimized Visual Adaptation) module, which introduces adaptive input normalization and a dynamic multi-scale convolution perception mechanism to significantly enhance image feature representation and transferability. Furthermore, to overcome the fixed sampling positions of traditional convolutions, which hinder effective image feature extraction, we introduce the DARconv (Dynamic Adaptive Rectangle Convolution) module, which dynamically adjusts the number of convolution kernels and refines the sampling-point computation, enabling more flexible and effective feature capture. Building on these modules, we propose a novel multi-sensor fusion network, SLRN (Super Learning and Representation Multimodal Fusion Net), which integrates features from multi-view cameras and LiDAR to produce BEV (bird's-eye view) feature maps with high semantic content and precise spatial representation, enabling more accurate 3D (three-dimensional) object detection. On the nuScenes benchmark, our method achieves 46.3% mAP and 56.4% NDS (nuScenes Detection Score), outperforming the baseline by 2.1% and 1.5% respectively and demonstrating strong multi-sensor perception performance.
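
To make the abstract's mechanisms concrete, below is a minimal PyTorch-style sketch of the two ideas attributed to the Nova module: adaptive input normalization followed by a dynamic multi-scale convolution whose branch weights are predicted from the input itself. The class name, layer choices, and gating scheme are illustrative assumptions, not the paper's released implementation.

# Hypothetical sketch of the Nova-style mechanism described in the abstract,
# NOT the authors' code: normalize the input, then mix multi-scale convolution
# branches with input-dependent weights.
import torch
import torch.nn as nn

class AdaptiveMultiScaleConv(nn.Module):
    """Adaptive normalization + dynamically weighted 3x3/5x5/7x7 branches."""
    def __init__(self, channels, scales=(3, 5, 7)):
        super().__init__()
        # Layer-style normalization with learned affine parameters.
        self.norm = nn.GroupNorm(1, channels)
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels, k, padding=k // 2) for k in scales]
        )
        # Predict one mixing weight per scale from global context.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, len(scales), 1),
        )

    def forward(self, x):
        x = self.norm(x)
        w = torch.softmax(self.gate(x), dim=1)                     # (B, S, 1, 1)
        outs = torch.stack([b(x) for b in self.branches], dim=1)  # (B, S, C, H, W)
        return (w.unsqueeze(2) * outs).sum(dim=1)                  # weighted sum

feat = torch.randn(2, 64, 32, 32)
print(AdaptiveMultiScaleConv(64)(feat).shape)  # torch.Size([2, 64, 32, 32])

Because the gate is computed per input, the effective receptive field adapts to scale diversity in the image, which is the behavior the abstract ascribes to the dynamic multi-scale perception mechanism.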
Citation
D. Tian, J. Shi, J. Li, M. Gong, "A multi-sensor fusion network with multi-cognitive visual adaptation and adaptive dynamic convolution," Measurement, vol. 261, Art. no. 119975, 2025, https://doi.org/10.1016/j.measurement.2025.119975.
Source
Measurement
Keywords
46 Information and Computing Sciences, 4603 Computer Vision and Multimedia Computation, 4605 Data Management and Data Science
Publisher
Elsevier