
Towards a Unified Pipeline for Scalable 3D Reconstruction via Scene Decomposition and Camera-Aware Fusion

Cavada, Sebastian
Department
Computer Vision
Embargo End Date
2025-05-30
Type
Thesis
Date
2025
Language
English
Collections
Research Projects
Abstract
Scalable 3D reconstruction from multi-camera video data remains challenging because of the computational bottlenecks of classical Structure-from-Motion (SfM) and the inefficiency of transformer-based fusion pipelines. Large-scale environments exacerbate these issues, making global alignment and dense reconstruction prohibitively expensive. In this work, we present a scalable reconstruction pipeline based on hierarchical SfM, combined with Pos3R, a camera-aware transformer model for dense pointmap prediction in world coordinates. First, we design a modular, spatially consistent SfM pipeline that partitions environments into subscenes, reconstructs them independently, and merges them via shared camera poses. This dramatically reduces COLMAP's computational overhead and enables robust alignment at scale. Second, we introduce Pos3R, an enhanced version of the transformer-based MASt3R model that incorporates known camera poses to predict dense pointmaps directly in world coordinates. Unlike MASt3R and DUSt3R, which require expensive global optimization for multi-view fusion, Pos3R leverages pose priors to eliminate this step, achieving a 10–40× speedup in fusion. We validate our system on a newly collected multi-camera video dataset of the MBZUAI campus comprising 97,000 frames, of which approximately 10,000 sampled frames are used for reconstruction. As an initial exploration, we apply MASt3R-based densification to SfM outputs to examine how point density affects Gaussian Splatting, revealing a nuanced relationship between structural density and rendering fidelity. To support reproducibility and future research, we release the MBZUAI-Campus dataset: a large-scale multi-view benchmark with high-resolution imagery, SfM-derived camera poses, and reconstructions.
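The abstract says that independently reconstructed subscenes are merged via shared camera poses, but it does not specify how the alignment is computed. The sketch below is a minimal illustration under a stated assumption: the merge estimates a similarity transform (scale, rotation, translation) from the camera centers of images registered in both subscenes, using Umeyama's closed-form least-squares method. The function names umeyama_sim3 and merge_subscenes are hypothetical and are not taken from the thesis code.

```python
import numpy as np


def umeyama_sim3(src, dst):
    """Closed-form (s, R, t) minimizing sum ||dst_i - (s * R @ src_i + t)||^2.

    src, dst: (N, 3) arrays of corresponding points, N >= 3 and non-collinear.
    """
    n = src.shape[0]
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_src, dst - mu_dst
    cov = xd.T @ xs / n                      # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                       # avoid returning a reflection
    R = U @ S @ Vt
    var_src = (xs ** 2).sum() / n
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t


def merge_subscenes(points_a, points_b, centers_a, centers_b):
    """Hypothetical merge: map subscene B into A's frame via shared camera centers.

    centers_a[i] and centers_b[i] are the center of the same shared image,
    expressed in subscene A and subscene B respectively.
    """
    s, R, t = umeyama_sim3(centers_b, centers_a)
    points_b_in_a = s * points_b @ R.T + t   # row-vector form of s R x + t
    return np.vstack([points_a, points_b_in_a]), (s, R, t)
```

A similarity transform, rather than a purely rigid one, is used because each SfM reconstruction is only defined up to an arbitrary scale. At least three non-collinear shared camera centers are needed to fix the rotation. With noisy real-world poses, a practical version would likely add outlier rejection (e.g., RANSAC) around this estimator.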
Citation
Sebastian Cavada, “Towards a Unified Pipeline for Scalable 3D Reconstruction via Scene Decomposition and Camera-Aware Fusion,” Master of Science thesis, Computer Vision, MBZUAI, 2025.
Source
Conference
Keywords
Large-Scale Reconstruction, 3D Geometry, Two-View Regression, Multi-View Reconstruction, Deep Learning, Gaussian Splatting