Monocular 3-D Human Reconstruction Using Gaussian Splatting Based On Multiview Priors
Mitkin, Vladislav
Author
Department
Computer Vision
Embargo End Date
2025-05-30
Type
Thesis
Date
2025
Language
English
Abstract
This thesis addresses the challenge of reconstructing photorealistic 3D human models from monocular video in real time, focusing on applications in telepresence and immersive communication. As AR/VR technologies advance, there is growing demand for accessible methods to create high-quality 3D human representations without specialized hardware. Current approaches often compromise between quality and computational efficiency: high-fidelity methods require extensive processing time, while real-time methods produce lower-quality results. We present a novel approach that bridges this gap by decomposing the inherently ambiguous monocular reconstruction problem into more tractable subproblems using multiview priors. Our method introduces BlendNet, the first real-time 1-to-4 view synthesizer specifically designed for human subjects, which generates consistent canonical views by combining geometric priors from SMPL parametric body models with perceptual features from the Sapiens foundation model. These synthesized views serve as intermediate representations that facilitate more accurate 3D reconstruction. Building on these consistent multiview representations, we develop LGHM (Large Gaussian Human Model), a specialized 3D reconstruction module that efficiently converts synthesized views into real-time renderable 3D Gaussian representations. Our approach incorporates techniques such as frontal view reprojection and adaptive Gaussian upsampling to enhance visual fidelity while maintaining real-time performance. Comprehensive evaluations demonstrate that our system outperforms existing approaches in both quality and computational efficiency. Quantitative metrics show improvements in PSNR, LPIPS, SSIM, and FID scores compared to state-of-the-art methods, while qualitative assessments highlight superior preservation of clothing details, identity consistency, and pose accuracy.
The complete pipeline operates at 25 frames per second, making high-quality 3D human telepresence accessible for practical applications. Our contributions include a complete end-to-end system for monocular 3D human reconstruction, novel architectures for multiview synthesis and 3D representation generation, and a 2D-supervised learning approach that overcomes limitations of existing 3D datasets. This research represents a significant step toward democratizing immersive communication technologies, enabling photorealistic virtual presence from ordinary monocular video.
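To make the 3D Gaussian representation referenced in the abstract concrete, the sketch below shows the standard 3D Gaussian splatting parameterization: each splat carries a mean position, an anisotropic scale, a rotation quaternion, and an opacity, and its covariance is built as Sigma = R S S^T R^T so it is guaranteed to be positive semi-definite. This is an illustrative sketch of the general technique, not code from the thesis; all names and values are assumptions.

```python
import numpy as np

def quat_to_rot(q):
    # Normalized quaternion (w, x, y, z) -> 3x3 rotation matrix.
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def gaussian_covariance(scale, quat):
    # Sigma = R S S^T R^T: factoring through scale and rotation keeps the
    # covariance symmetric positive semi-definite during optimization.
    R = quat_to_rot(quat)
    S = np.diag(scale)
    return R @ S @ S.T @ R.T

def density(x, mean, cov, opacity):
    # Unnormalized Gaussian falloff scaled by opacity, the per-splat
    # quantity that alpha blending accumulates along each camera ray.
    d = x - mean
    return opacity * np.exp(-0.5 * d @ np.linalg.inv(cov) @ d)

# One hypothetical splat: a flat, elongated Gaussian near the origin.
mean = np.array([0.0, 1.0, 0.0])
scale = np.array([0.05, 0.02, 0.01])
quat = np.array([1.0, 0.0, 0.0, 0.0])  # identity rotation
cov = gaussian_covariance(scale, quat)
print(density(mean, mean, cov, opacity=0.9))  # peaks at the opacity, 0.9
```

In practice a reconstructed human is tens of thousands of such splats, each additionally carrying view-dependent color; rendering sorts them by depth and alpha-composites their projected 2D footprints, which is what makes the representation renderable in real time.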
Citation
Vladislav Mitkin, “Monocular 3-D Human Reconstruction Using Gaussian Splatting Based On Multiview Priors,” Master of Science thesis, Computer Vision, MBZUAI, 2025.
Keywords
Virtual Teleportation, 3D Human Reconstruction, Computer Graphics
