Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs
Yun, Sukmin ; Lin, Haokun ; Thushara, Rusiru ; Bhat, Mohammad Qazim ; Wang, Yongxin ; Jiang, Zutao ; Deng, Mingkai ; Wang, Jinhong ; Tao, Tianhua ; Li, Junbo ; and 7 more authors
Department
Machine Learning
Type
Conference proceeding
Date
2024
Language
English
Abstract
Multimodal large language models (MLLMs) have shown impressive success across modalities such as image, video, and audio in a variety of understanding and generation tasks. However, current MLLMs are surprisingly poor at understanding webpage screenshots and generating their corresponding HTML code. To address this problem, we propose Web2Code, a benchmark consisting of a new large-scale webpage-to-code dataset for instruction tuning and an evaluation framework for the webpage understanding and HTML code translation abilities of MLLMs. For dataset construction, we leverage pretrained LLMs to enhance existing webpage-to-code datasets as well as generate a diverse pool of new webpages rendered into images. Specifically, the inputs are webpage images and instructions, while the responses are the webpage's HTML code. We further include diverse natural language QA pairs about the webpage content in the responses to enable a more comprehensive understanding of the web content. To evaluate model performance in these tasks, we develop an evaluation framework for testing MLLMs' abilities in webpage understanding and web-to-code generation. Extensive experiments show that our proposed dataset is beneficial not only to our proposed tasks but also in the general visual domain. We hope our work will contribute to the development of general MLLMs suitable for web-based content generation and task automation. Our data and code are available at https://github.com/MBZUAI-LLM/web2code.
Citation
S. Yun et al., “Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs,” Adv Neural Inf Process Syst, vol. 37, pp. 112134–112157, Dec. 2024, 2025. [Online]. Available: https://mbzuai-llm.github.io/webpage2code/
Source
Advances in Neural Information Processing Systems (NeurIPS 2024)
Keywords
Webpage-to-code translation, Multimodal large language models (MLLMs), Web2Code dataset, HTML code generation, Webpage underst
Publisher
NeurIPS
