Adapting Multilingual Models to Code-Mixed Tasks via Model Merging
Kodali, Prashant ; Shivkumar, Vaishnavi ; Joshi, Swarang ; Choudhury, Monojit ; Kumaraguru, Ponnurangam ; Shrivastava, Manish
Files
3799830.3799852.pdf
Adobe PDF, 588.52 KB
Department
Natural Language Processing
Type
Conference proceeding
License
http://creativecommons.org/licenses/by/4.0/
Language
English
Abstract
We study model merging as a practical alternative to standard adaptation for code-mixed NLP. Starting from a multilingual base model, we (i) perform continued pre-training (CPT) on unlabeled code-mixed text to obtain an adapted checkpoint, (ii) merge this checkpoint with the base model, and (iii) fine-tune (FT) on downstream task data. We evaluate this approach on sentence classification (sentiment and hate speech) in English–Hindi (En–Hi) and English–Spanish (En–Es) using XLM-R and Llama-3.2-1B. Merged models consistently outperform both full fine-tuning and CPT → FT, yielding gains of 2–5 F1 points over full fine-tuning and roughly 1–2 F1 points over CPT → FT, suggesting that merging exploits unlabeled data more effectively than CPT alone. Zero-/few-shot prompting with larger LLMs (e.g., Llama-3.3-70B) trails fine-tuned and merged checkpoints, highlighting limits of in-context learning for code-mixed inputs. Cross-pair transfer from En–Hi to En–Ta/Ml shows merged checkpoints (e.g., TV/TIES) outperform monolingual baselines (0.65–0.68 vs. 0.61–0.63 F1), confirming code-mixed knowledge as a superior substrate for low-resource pairs. We conclude with adaptation recipes for common data regimes (labeled only; labeled+unlabeled; transfer-only) and discuss limitations and scaling considerations for broader tasks and larger models.
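Step (ii) of the pipeline, merging the CPT checkpoint with the base model, can be illustrated with a minimal sketch. It assumes a simple task-vector style merge (base weights plus a scaled difference between the adapted and base weights); the abstract does not spell out the exact merging procedure or hyperparameters, so the function names, the `scale` parameter, and the toy weights below are all hypothetical, and plain Python lists stand in for real model tensors.

```python
from typing import Dict, List

# Toy stand-in for a model state dict: parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]


def task_vector(base: StateDict, adapted: StateDict) -> StateDict:
    """Per-parameter difference between the adapted (CPT) and base weights."""
    if base.keys() != adapted.keys():
        raise ValueError("base and adapted checkpoints must share parameter names")
    return {
        name: [a - b for a, b in zip(adapted[name], base[name])]
        for name in base
    }


def merge(base: StateDict, adapted: StateDict, scale: float = 0.5) -> StateDict:
    """Add a scaled task vector back onto the base weights.

    scale=0.0 returns the base model, scale=1.0 returns the adapted one.
    The value of `scale` is an illustrative assumption, not taken from the paper.
    """
    delta = task_vector(base, adapted)
    return {
        name: [b + scale * d for b, d in zip(base[name], delta[name])]
        for name in base
    }


# Hypothetical weights for illustration only.
base = {"layer.weight": [1.0, 2.0, 3.0], "layer.bias": [0.0]}
cpt = {"layer.weight": [2.0, 2.0, 1.0], "layer.bias": [1.0]}

merged = merge(base, cpt, scale=0.5)
print(merged)
```

In the workflow described above, the merged weights would then serve as the starting point for downstream fine-tuning (step iii). Sparsifying variants such as TIES additionally trim and sign-align the task vector before adding it back, which this sketch does not attempt.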
Citation
P. Kodali, V. Shivkumar, S. Joshi, M. Choudhury, P. Kumaraguru, M. Shrivastava, "Adapting Multilingual Models to Code-Mixed Tasks via Model Merging," 2026, pp. 199-207.
Source
CODS '25: Proceedings of the 13th ACM IKDD International Conference on Data Science
Conference
The 13th ACM IKDD International Conference on Data Science
Keywords
46 Information and Computing Sciences, 47 Language, Communication and Culture, 4704 Linguistics
Publisher
Association for Computing Machinery
