ConsistentID: Portrait Generation With Multimodal Fine-Grained Identity Preserving

Huang, Jiehui
Dong, Xiao
Song, Wenhui
Chong, Zheng
Tang, Zhenchao
Zhou, Jun
Cheng, Yuhao
Chen, Long
Li, Hanhui
Yan, Yiqiang
... and 2 more authors
Abstract
Diffusion-based technologies have made significant strides, particularly in personalized and customized facial generation. However, existing methods struggle to achieve high-fidelity, detailed identity (ID) consistency. This is mainly due to two challenges: insufficient fine-grained control over specific facial areas, and the absence of a comprehensive strategy for ID preservation that accounts for both intricate facial details and the overall facial structure. To address these limitations, we introduce ConsistentID, a method for diverse identity-preserving portrait generation under fine-grained multimodal facial prompts, using only a single reference image. ConsistentID comprises two core components: a multimodal facial prompt generator and an ID-preservation network. The facial prompt generator combines localized facial features, facial feature descriptions, and overall facial descriptions to enhance the precision of facial detail reconstruction. The ID-preservation network, optimized with a facial attention localization strategy, ensures consistent identity preservation across facial regions. Together, these components leverage fine-grained multimodal identity information to significantly improve identity-preservation accuracy. To drive ConsistentID's training, we propose a fine-grained portrait dataset, FGID, with over 500,000 facial images, offering greater diversity and comprehensiveness than existing public facial datasets. Experimental results show that ConsistentID achieves exceptional precision and diversity in personalized facial generation, surpassing existing methods on the MyStyle dataset. In addition, although ConsistentID introduces more multimodal ID information, it still maintains rapid inference speed during generation. Our code and pre-trained checkpoints are available at https://github.com/JackAILab/ConsistentID.
Citation
J. Huang, X. Dong, W. Song, Z. Chong, Z. Tang, J. Zhou, et al., "ConsistentID: Portrait Generation With Multimodal Fine-Grained Identity Preserving," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PP, no. 99, pp. 1-17, 2026, https://doi.org/10.1109/tpami.2026.3652557.
Source
IEEE Transactions on Pattern Analysis and Machine Intelligence
Keywords
46 Information and Computing Sciences, 4608 Human-Centred Computing, 3 Good Health and Well Being
Publisher
IEEE