Metric from human: Zero-shot monocular metric depth estimation via test-time adaptation
Zhao, Yizhou; Bian, Hengwei; Chen, Kaihua; Ji, Pengliang; Qu, Liao; Lin, Shao-yu; Yu, Weichen; Li, Haoran; Chen, Hao; Shen, Jun; ... (2 more)
Department
Natural Language Processing
Type
Conference proceeding
Date
2024
Language
English
Abstract
Monocular depth estimation (MDE) is fundamental for deriving 3D scene structure from 2D images. While state-of-the-art monocular relative depth estimation (MRDE) excels at estimating relative depth for in-the-wild images, current monocular metric depth estimation (MMDE) approaches still struggle with unseen scenes. Since MMDE can be viewed as the composition of MRDE and metric scale recovery, we attribute this difficulty to scene dependency: MMDE models rely on scenes observed during supervised training to predict scene scales at inference time. To address this issue, we propose using humans as landmarks for distilling scene-independent metric scale priors from generative painting models. Our approach, Metric from Human (MfH), bridges generalizable MRDE and zero-shot MMDE in a generate-and-estimate manner. Specifically, MfH paints humans into the input image with a generative painting model and estimates their body dimensions with an off-the-shelf human mesh recovery (HMR) model. Based on MRDE predictions, it then propagates the metric information from the painted humans to the surrounding context, yielding metric depth estimates for the original input. Through this annotation-free test-time adaptation, MfH achieves superior zero-shot MMDE performance, demonstrating strong generalization.
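The step of propagating metric information from painted humans to the rest of the scene can be illustrated with a minimal sketch. It rests on several assumptions that are ours, not necessarily the paper's exact formulation: the MRDE output is related to metric depth by an affine map (scale and shift), HMR supplies metric depth only at pixels covered by a painted human, and an ordinary least-squares fit over those pixels is enough to recover the map. The function names `fit_scale_shift` and `metric_from_human` are hypothetical and do not come from the MfH repository.

```python
from typing import List, Optional, Sequence, Tuple


def fit_scale_shift(rel: Sequence[float], metric: Sequence[float]) -> Tuple[float, float]:
    """Closed-form least-squares fit of metric ~= s * rel + t over paired samples."""
    n = len(rel)
    if n != len(metric) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_r = sum(rel) / n
    mean_m = sum(metric) / n
    var_r = sum((r - mean_r) ** 2 for r in rel)
    if var_r == 0:
        # All relative depths equal: the scale cannot be identified.
        raise ValueError("relative depths are constant; scale is unidentifiable")
    cov = sum((r - mean_r) * (m - mean_m) for r, m in zip(rel, metric))
    s = cov / var_r
    t = mean_m - s * mean_r
    return s, t


def metric_from_human(
    rel_depth: List[List[float]],
    human_depth: List[List[Optional[float]]],
) -> List[List[float]]:
    """Hypothetical helper: align an MRDE map to metric depth using human pixels.

    rel_depth   -- relative depth per pixel (assumed affine-related to metric depth)
    human_depth -- metric depth from HMR where a painted human is present, else None
    """
    rel: List[float] = []
    met: List[float] = []
    # Collect only pixels where the painted human provides a metric anchor.
    for row_r, row_h in zip(rel_depth, human_depth):
        for r, h in zip(row_r, row_h):
            if h is not None:
                rel.append(r)
                met.append(h)
    s, t = fit_scale_shift(rel, met)
    # Apply the recovered scale and shift to every pixel, including the context.
    return [[s * r + t for r in row] for row in rel_depth]


if __name__ == "__main__":
    rel_map = [[1.0, 2.0], [3.0, 4.0]]
    human_map = [[3.0, None], [None, 9.0]]
    print(metric_from_human(rel_map, human_map))
```

Running the example prints `[[3.0, 5.0], [7.0, 9.0]]`: the two human pixels fix s = 2 and t = 1, and that map is then applied to the unpainted pixels. Anchors from several paintings of the same image could be pooled into the same fit, which is also an assumption. Many MRDE models predict affine-invariant inverse depth rather than depth, in which case the fit would be done in disparity space before inverting.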
Citation
Y. Zhao et al., “Metric from Human: Zero-shot Monocular Metric Depth Estimation via Test-time Adaptation,” Adv Neural Inf Process Syst, vol. 37, pp. 104724–104753, Dec. 2024, [Online]. Available: https://github.com/Skaldak/MfH
Source
Advances in Neural Information Processing Systems (NeurIPS 2024)
Keywords
Monocular depth estimation, Metric depth estimation, Zero-shot learning, Test-time adaptation, Human mesh recovery, Generative painting
Publisher
NEURIPS
