CODEMENV: Benchmarking Large Language Models on Code Migration

Cheng, Keyuan
Shen, Xudong
Yang, Yihao
Wang, Tengyue
Cao, Yang
Ali, Muhammad Asif
Wang, Hanbin
Hu, Lijie
Wang, Di
Department
Machine Learning
Type
Conference proceeding
Date
2025
Language
English
Abstract
Large language models (LLMs) have demonstrated remarkable proficiency in handling a wide range of tasks within the software engineering domain, but their ability to perform code migration—adapting code to different environments—remains underexplored. In this work, we propose a novel benchmark, CODEMENV (Code Migration Across Environment), designed to evaluate LLMs’ performance on code migration tasks. The benchmark comprises 922 data points across 19 Python and Java packages, offering three tasks to systematically evaluate code migration: identifying version-incompatible functions, determining function changes, and adapting code to target environments. Experimental evaluation of CODEMENV across seven LLMs revealed an average pass@1 rate of 26.50%, with GPT-4o performing best at 43.84%. We highlight our key findings as follows: (i) LLMs are more familiar with newer function versions, making them better at migrating legacy code, and (ii) LLMs exhibit a logical inconsistency, sometimes identifying function changes irrelevant to the target migration environment.
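To illustrate the kind of migration the benchmark targets, here is a minimal, hypothetical Python sketch (not taken from the paper): a function that must behave correctly across interpreter versions because `bytes.hex` only gained its `sep` parameter in Python 3.8, so code written for a newer environment needs a fallback when migrated to an older one.

```python
import sys

def to_bytes_hex(data: bytes) -> str:
    """Hex-encode `data` with space separators across Python versions."""
    # bytes.hex(sep) was introduced in Python 3.8; calling it with an
    # argument on an older interpreter raises TypeError, so a migrated
    # version must branch on the runtime environment.
    if sys.version_info >= (3, 8):
        return data.hex(" ")
    return " ".join("{:02x}".format(b) for b in data)

print(to_bytes_hex(b"\x01\x02\x03"))  # → 01 02 03
```

CODEMENV's tasks map onto this example: spotting that `bytes.hex(sep)` is version-incompatible, stating what changed and in which version, and rewriting the call for the target environment.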
Citation
K. Cheng, X. Shen, Y. Yang, T. Wang, Y. Cao, M. A. Ali, H. Wang, L. Hu, and D. Wang, “CODEMENV: Benchmarking Large Language Models on Code Migration,” in Findings of the Association for Computational Linguistics: ACL 2025, Vienna, Austria, 2025, pp. 2719–2744.
Source
Findings of the Association for Computational Linguistics: ACL 2025
Conference
Annual Meeting of the Association for Computational Linguistics (ACL 2025)
Publisher
Association for Computational Linguistics