Towards a Unified Pipeline for Scalable 3D Reconstruction via Scene Decomposition and Camera-Aware Fusion
Cavada, Sebastian
Author
Supervisor
Department
Computer Vision
Embargo End Date
2025-05-30
Type
Thesis
Date
2025
License
Language
English
Abstract
Scalable 3D reconstruction from multi-camera video data remains a challenging problem due to the computational bottlenecks of classical Structure-from-Motion (SfM) and the inefficiencies of transformer-based fusion pipelines. Large-scale environments exacerbate these issues, making global alignment and dense reconstruction prohibitively expensive. In this work, we present a scalable reconstruction pipeline based on hierarchical SfM, combined with the development of Pos3R, a camera-aware transformer model for dense pointmap prediction in world coordinates. First, we design a modular, spatially consistent SfM pipeline that partitions environments into subscenes, reconstructs them independently, and merges them via shared camera poses, dramatically reducing COLMAP’s computational overhead and enabling robust alignment at scale. Second, we introduce Pos3R, an enhanced version of MASt3R, a transformer-based model that predicts dense pointmaps directly in world coordinates by incorporating known camera poses. Unlike MASt3R and DUSt3R, which require expensive global optimization for multi-view fusion, Pos3R leverages pose priors to eliminate this step, achieving a 10–40× speedup in fusion. We validate our system on a newly collected multi-camera video dataset comprising 97,000 frames of the MBZUAI campus, of which approximately 10,000 sampled frames are used for reconstruction. As an initial exploration, we apply MASt3R-based densification to SfM outputs to examine how point density impacts Gaussian Splatting, highlighting a more nuanced relationship between structure and rendering fidelity. To support reproducibility and future research, we release the MBZUAI-Campus dataset: a large-scale multi-view benchmark with high-resolution imagery and SfM-derived camera poses and reconstructions.
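To illustrate the idea of merging independently reconstructed subscenes via shared camera poses, the following is a minimal sketch, not the thesis's implementation. It assumes that two subscenes contain at least three non-collinear cameras in common, and that a single similarity transform (scale, rotation, translation) relates the two coordinate frames. The transform is estimated from the shared camera centers with the standard Umeyama least-squares method. The function names `estimate_similarity` and `apply_similarity` are hypothetical and chosen for this example only.

```python
import numpy as np


def estimate_similarity(src, dst):
    """Least-squares similarity transform (Umeyama) mapping src onto dst.

    src, dst: (N, 3) arrays of corresponding camera centers, i.e. the same
    cameras expressed in two different subscene coordinate frames.
    Returns (s, R, t) such that dst is approximately s * R @ src + t.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = len(src)

    # Center both point sets on their means.
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst

    # Cross-covariance between the centered sets and its SVD.
    cov = dst_c.T @ src_c / n
    U, S, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0
    R = U @ D @ Vt

    # Scale from the singular values and the variance of the source set.
    var_src = (src_c ** 2).sum() / n
    s = np.trace(np.diag(S) @ D) / var_src

    # Translation that maps the source mean onto the destination mean.
    t = mu_dst - s * R @ mu_src
    return s, R, t


def apply_similarity(points, s, R, t):
    """Map (N, 3) points from the source frame into the destination frame."""
    return s * (np.asarray(points, dtype=float) @ R.T) + t
```

Under these assumptions, a subscene would be merged by estimating `(s, R, t)` from its shared camera centers and then passing all of its 3D points and remaining camera centers through `apply_similarity`. A practical pipeline would likely add outlier rejection and a joint refinement step, neither of which is shown here.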
Citation
Sebastian Cavada, “Towards a Unified Pipeline for Scalable 3D Reconstruction via Scene Decomposition and Camera-Aware Fusion,” Master of Science thesis, Computer Vision, MBZUAI, 2025.
Source
Conference
Keywords
Large Scale Reconstruction, 3D Geometry, Two Views Regression, Multi View Reconstruction, Deep Learning, Gaussian Splatting
