Description-Guided Code Generation: Using Vision-Language Models to Translate Figures into TikZ
Awad, Mostafa Mohamed Abdelhamid Azazi
Department
Natural Language Processing
Embargo End Date
30/05/2025
Type
Thesis
Date
2025
Language
English
Collections
Research Projects
Abstract
This thesis addresses the challenge of editing static scientific figures by recovering editable TikZ code from images using vision-language models (VLMs). TikZ is a LaTeX-based graphics language widely used in scientific writing for precise, programmatic figure creation. Building on prior work that generated code directly from images, this thesis introduces structured textual descriptions as an intermediate step to improve both interpretability and accuracy. Two training pipelines were developed using a new dataset of image-description-code triples. Results show that incorporating descriptions improves code quality, reduces errors, and accelerates training. This work contributes a new intermediate representation, an updated dataset, and a VLM-based approach that leverages models pretrained on segmentation tasks to generate better, interpretable TikZ code.
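The two-stage, description-guided pipeline summarized in the abstract can be sketched as follows. This is a minimal illustrative sketch only: the function names are hypothetical, and both model calls are stubbed with placeholder outputs rather than the thesis's actual fine-tuned VLMs.

```python
# Illustrative sketch of a description-guided figure-to-TikZ pipeline.
# Both stages are stubs; a real system would call a fine-tuned
# vision-language model at each step (names below are hypothetical).

def describe_figure(image_bytes: bytes) -> str:
    """Stage 1 (stub): produce a structured textual description of the figure."""
    # Placeholder output; a real VLM would generate this from the image.
    return "A directed graph with three labeled nodes arranged in a triangle."

def description_to_tikz(description: str) -> str:
    """Stage 2 (stub): translate the structured description into TikZ code."""
    # Placeholder output; a real model would condition on the description
    # (and possibly the image) to generate compilable TikZ.
    return (
        "\\begin{tikzpicture}\n"
        "  \\node (a) at (0,0) {A};\n"
        "  \\node (b) at (2,0) {B};\n"
        "  \\node (c) at (1,1.5) {C};\n"
        "  \\draw[->] (a) -- (b);\n"
        "\\end{tikzpicture}"
    )

def figure_to_tikz(image_bytes: bytes) -> tuple[str, str]:
    """Run both stages; the intermediate description remains inspectable."""
    description = describe_figure(image_bytes)
    tikz = description_to_tikz(description)
    return description, tikz

description, tikz = figure_to_tikz(b"")
```

Keeping the description as an explicit intermediate output is what makes the approach interpretable: a user can read (or edit) the description before code generation, rather than receiving TikZ directly from an opaque image-to-code model.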
Citation
Mostafa Mohamed Abdelhamid Azazi Awad, “Description-Guided Code Generation: Using Vision-Language Models to Translate Figures into TikZ,” Master of Science thesis, Natural Language Processing, MBZUAI, 2025.
Keywords
TikZ Code Generation, Vision-Language Models (VLMs), Figure to Code, DeTikZify, PaLI-Gemma 2, Description-Guided Generation
