MuPT: A Generative Symbolic Music Pretrained Transformer
Author
Qu Xingwei
Bai Yuelin
Ma Yinghao
Zhou Ziya
Lo Ka Man
Liu Jiaheng
Yuan Ruibin
Min Lejun
Liu Xueling
Zhang Tianyu
Du Xinrun
Guo Shuyue
Liang Yiming
Li Yizhi
Wu Shangda
Zhou Junting
Zheng Tianyu
Ma Ziyang
Han Fengze
Xue Wei
Xia Gus
Benetos Emmanouil
Yue Xiang
Lin Chenghua
Tan Xu
Huang Wenhao
Fu Jie
Zhang Ge
Department
Machine Learning
Type
Conference proceeding
Date
2025
Language
English
Abstract
In this paper, we explore the application of Large Language Models (LLMs) to pre-training on symbolic music. While the prevalent use of MIDI in music modeling is well-established, our findings suggest that LLMs are inherently more compatible with ABC Notation, which aligns more closely with their design and strengths, thereby enhancing the model's performance in musical composition. To address the challenges associated with misaligned measures from different tracks during generation, we propose the development of a Synchronized Multi-Track ABC Notation (SMT-ABC Notation), which aims to preserve coherence across multiple musical tracks. Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set. Furthermore, we explore the implications of the Symbolic Music Scaling Law (SMS Law) on model performance. In experiments on musical repetition structure, we outperform GPT-4 by 17% (average Intra Similarity and Repetition Rate) on the full test set, and surpass the SOTA ABC-notation model ChatMusician by 6% on the single-track test set. In subjective evaluation, listeners preferred music from our system in 79% of cases compared to GPT-4. The results indicate a promising direction for future research in music generation, offering extensive resources for community-led research through our open-source contributions. © 2025 13th International Conference on Learning Representations, ICLR 2025. All rights reserved.
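To make the SMT-ABC idea concrete, the sketch below reorganizes per-track ABC bars so that bar i of every track is emitted together, which keeps measures from different voices synchronized during left-to-right generation. It is a minimal illustration under assumed conventions, not the authors' implementation: the TRACK_SEP token, the split_bars and to_smt_abc helpers, and the naive split on '|' are hypothetical, and real ABC bodies (repeats, inline fields, voice headers) would need a proper parser.

# Minimal sketch of the bar-interleaving idea behind SMT-ABC Notation.
# Illustrative reconstruction only; TRACK_SEP and the naive '|' split
# are assumptions, not the paper's actual tokenization.

TRACK_SEP = "<|>"  # hypothetical separator between tracks in one bar group


def split_bars(abc_body: str) -> list[str]:
    """Naive bar splitter that treats every '|' as a barline."""
    return [bar.strip() for bar in abc_body.split("|") if bar.strip()]


def to_smt_abc(tracks: list[list[str]]) -> str:
    """Interleave bars so that bar i of every track appears together."""
    n_bars = max(len(track) for track in tracks)
    groups = []
    for i in range(n_bars):
        # Pad shorter tracks with a whole-bar rest so alignment survives.
        bars = [track[i] if i < len(track) else "z" for track in tracks]
        groups.append(TRACK_SEP.join(bars))
    return " | ".join(groups) + " |"


if __name__ == "__main__":
    melody = split_bars("C2 E2 G2 c2 | B2 G2 E2 C2 |")
    bass = split_bars("C,4 G,4 | C,4 G,4 |")
    print(to_smt_abc([melody, bass]))
    # C2 E2 G2 c2<|>C,4 G,4 | B2 G2 E2 C2<|>C,4 G,4 |

Grouping tokens by bar index rather than by track means the model never has to generate one full voice before starting the next, which is what prevents measures in different tracks from drifting out of alignment.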
Citation
X. Qu et al., “MuPT: A Generative Symbolic Music Pretrained Transformer,” International Conference on Learning Representations, vol. 2025, pp. 46753–46779, May 2025.
Source
13th International Conference on Learning Representations, ICLR 2025
Conference
13th International Conference on Learning Representations, ICLR 2025
Publisher
International Conference on Learning Representations, ICLR
