Function Alignment: Inter-Layer Cross-Attention for Performance MIDI-to-Score Conversion
Qian, Xiaoyu
Author
Supervisor
Department
Machine Learning
Embargo End Date
2025-10-01
Type
Thesis
Date
2025
License
Language
English
Abstract
This thesis addresses a sub-problem of Automatic Music Transcription (AMT): Performance MIDI-to-Score (PM2S) conversion, which translates a human performance captured in MIDI format, with all its expressive nuance, into a canonical, quantized musical score. A fundamental difficulty in this task is disentangling the expressive features of the performance from the underlying musical structure. Conventional models often struggle to learn and align the implicit relationships between performance nuances and their corresponding notational concepts.
The thesis investigates a modular approach to this task by adapting two specialized, pre-trained language models—one for performance context and one for score generation. We propose a framework where the performance model conditions the score model through a parameter-efficient, inter-layer cross-attention adapter, designed to facilitate the transfer of hierarchical musical information between them.
The system is evaluated on the ASAP (Aligned Scores and Performances) dataset. Although the proposed model does not surpass the overall transcription accuracy of current state-of-the-art baselines, the analysis identifies architectural and training considerations that matter for this task. The work contributes a detailed empirical analysis of a dual-model approach to music transcription, with methodological findings that can inform future research in symbolic music processing. The implementation is publicly available at https://github.com/XyuQian/functionalignment-MIDI/. The demo page is at https://xyuqian.github.io/function-alignmentdemo/
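The abstract describes conditioning the score model on the performance model through a cross-attention adapter. The sketch below illustrates that general idea for a single pair of layers in plain Python. It is not the thesis implementation: the class name `CrossAttentionAdapter`, the single attention head, the dimensions, and the zero-initialized tanh gate are all assumptions made for illustration. Score-side hidden states act as queries, performance-side hidden states supply keys and values, and only the small adapter matrices would be trained while both pre-trained models stay frozen.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of logits."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix; stands in for learned weights or hidden states."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class CrossAttentionAdapter:
    """Hypothetical single-head adapter linking one performance layer to one score layer."""

    def __init__(self, d_model, d_attn, seed=0):
        rng = random.Random(seed)
        self.d_attn = d_attn
        self.w_q = random_matrix(d_model, d_attn, rng)  # projects score states to queries
        self.w_k = random_matrix(d_model, d_attn, rng)  # projects performance states to keys
        self.w_v = random_matrix(d_model, d_attn, rng)  # projects performance states to values
        self.w_o = random_matrix(d_attn, d_model, rng)  # maps attended values back to d_model
        self.gate = 0.0  # assumption: a zero gate makes the adapter start as the identity

    def attention_weights(self, score_h, perf_h):
        """Return one distribution over performance positions per score position."""
        q = matmul(score_h, self.w_q)
        k = matmul(perf_h, self.w_k)
        k_t = [list(col) for col in zip(*k)]
        logits = matmul(q, k_t)
        scale = math.sqrt(self.d_attn)
        return [softmax([v / scale for v in row]) for row in logits]

    def forward(self, score_h, perf_h):
        """Add a gated cross-attention update to the score hidden states."""
        weights = self.attention_weights(score_h, perf_h)
        values = matmul(perf_h, self.w_v)
        update = matmul(matmul(weights, values), self.w_o)
        g = math.tanh(self.gate)
        return [
            [s + g * u for s, u in zip(s_row, u_row)]
            for s_row, u_row in zip(score_h, update)
        ]


# Toy usage: 3 score positions attend over 5 performance positions (width 8).
demo_rng = random.Random(1)
score_states = random_matrix(3, 8, demo_rng, scale=1.0)
perf_states = random_matrix(5, 8, demo_rng, scale=1.0)
adapter = CrossAttentionAdapter(d_model=8, d_attn=4)
conditioned = adapter.forward(score_states, perf_states)
```

In an inter-layer arrangement, one such adapter would sit between each chosen pair of performance and score layers, so that information from different depths of the performance model can reach the matching depth of the score model. With the gate at zero the score states pass through unchanged, which is one common way to keep a frozen model's behavior intact at the start of adapter training.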
Citation
Qian, Xiaoyu, “Function Alignment: Inter-Layer Cross-Attention for Performance MIDI-to-Score Conversion,” Master of Science thesis, Machine Learning, MBZUAI, 2025.
Keywords
Symbolic Music Representation, Function Alignment, Performance MIDI-to-Score Conversion, Parameter-Efficient Fine-Tuning, Cross-Attention Adapters
