Language Model Mapping in Multimodal Music Learning: A Grand Challenge Proposal
Author: Gus Xia
Department: Machine Learning
Type: Preprint
Date: 2025
Language: English
Abstract
Deep neural networks have achieved remarkable success in representation learning and language models (LMs). Many studies aim to build connections among different modalities via alignment and mapping at the token or embedding level, but most such methods are very data-hungry, which limits their performance in domains such as music where paired data are scarce. We argue that embedding alignment captures only the surface level of multimodal alignment. In this paper, we propose a grand challenge of language model mapping (LMM): how to map the essence captured by the LM of one domain to the LM of another domain, under the assumption that LMs of different modalities track the same underlying phenomena. We first introduce a basic setup of LMM, highlighting its twin goals of unveiling a deeper aspect of cross-modal alignment and achieving more sample-efficient learning. We then discuss why music is an ideal domain in which to conduct LMM research. After that, we connect LMM in music to a more general and challenging scientific problem, learning to take actions based on both sensory input and abstract symbols, and finally present an advanced version of the challenge problem setup.
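To make concrete what the abstract calls "surface-level" alignment, the following toy sketch (not from the paper; all names and data are illustrative) fits a linear map between synthetic paired embeddings of two modalities by least squares, the kind of embedding-level mapping that LMM aims to go beyond:

```python
import numpy as np

# Toy illustration of embedding-level cross-modal alignment:
# learn a linear map W sending "audio" embeddings into the
# "text" embedding space from paired examples. All data here
# are synthetic; this is a sketch, not the paper's method.
rng = np.random.default_rng(0)

d_audio, d_text, n_pairs = 8, 8, 200
true_map = rng.normal(size=(d_audio, d_text))  # hidden ground-truth relation

audio_emb = rng.normal(size=(n_pairs, d_audio))  # e.g. music-audio embeddings
text_emb = audio_emb @ true_map + 0.01 * rng.normal(size=(n_pairs, d_text))

# Fit W minimizing ||audio_emb @ W - text_emb||^2 (ordinary least squares).
W, *_ = np.linalg.lstsq(audio_emb, text_emb, rcond=None)

# Alignment quality: paired embeddings should be close after mapping.
residual = np.linalg.norm(audio_emb @ W - text_emb) / np.linalg.norm(text_emb)
print(f"relative alignment error: {residual:.4f}")
```

Note that fitting such a map well requires many paired examples; the abstract's point is precisely that this data-hungry, token/embedding-level view misses deeper structure shared between the two domains' language models.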
Publisher: arXiv
DOI: 10.48550/arXiv.2503.00427
Full-text link: https://arxiv.org/abs/2503.00427