MBZUAI Institutional Repository

Recent Submissions

  • Item
    CS-FLEURS: A Massively Multilingual and Code-Switched Speech Dataset
    (Delft University of Technology, 2025-08-21) Yan, Brian; Hamed, Injy; Shimizu, Shuichiro; Lodagala, Vasista Sai; Chen, William; Iakovenko, Olga; Talafha, Bashar; Hussein, Amir; Polok, Alexander; Chang, Kalvin; Klement, Dominik; Althubaiti, Sara; Peng, Puyuan; Wiesner, Matthew; Solorio, Thamar; Ali, Ahmed; Khudanpur, Sanjeev; Watanabe, Shinji
    We present CS-FLEURS, a new dataset for developing and evaluating code-switched speech recognition and translation systems beyond high-resourced languages. CS-FLEURS consists of four test sets which together cover 113 unique code-switched language pairs across 52 languages: 1) a 14 X-English language pair set with real voices reading synthetically generated code-switched sentences, 2) a 16 X-English language pair set with generative text-to-speech, 3) a 60 {Arabic, Mandarin, Hindi, Spanish}-X language pair set with generative text-to-speech, and 4) a 45 X-English lower-resourced language pair test set with concatenative text-to-speech. Besides the four test sets, CS-FLEURS also provides a training set with 128 hours of generative text-to-speech data across 16 X-English language pairs. Our hope is that CS-FLEURS helps to broaden the scope of future code-switched speech research.
  • Item
    Convergence of the EM algorithm in KL distance for overspecified Gaussian mixtures
    (Springer Nature, 2025-08-19) Arman Bolatov
    We present a study of the convergence properties of the Expectation-Maximization (EM) algorithm when applied to an overspecified model. In particular, we consider fitting a balanced mixture of two Gaussians to data originating from a single Gaussian. We provide theoretical bounds on the Kullback-Leibler (KL) divergence between the fitted and true distributions. A key feature of the analysis is the concavity and radiality of the expected log-likelihood function on a hypersurface induced by the EM algorithm, which greatly simplifies the argument. We also show how our result on KL divergence can be used to upper-bound the error rate of a mixture discriminant analysis classifier trained by the EM algorithm.
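The overspecified setting described in the abstract can be sketched concretely: fitting a balanced mixture 0.5·N(μ, 1) + 0.5·N(−μ, 1) to data drawn from a single standard Gaussian, where EM reduces to a scalar fixed-point iteration on μ. This is a minimal illustrative sketch of that setup (the function name and defaults are ours, not from the paper):

```python
import numpy as np

def em_balanced_two_gaussians(x, mu0=1.0, n_iters=200):
    """EM for the balanced mixture 0.5*N(mu,1) + 0.5*N(-mu,1) in 1D.

    With symmetric means and fixed unit variances, the M-step collapses
    to a single scalar update for mu.
    """
    mu = mu0
    for _ in range(n_iters):
        # E-step: responsibility of the +mu component,
        # since log N(x; mu, 1) - log N(x; -mu, 1) = 2*mu*x.
        w = 1.0 / (1.0 + np.exp(-2.0 * mu * x))
        # M-step: maximizing the expected log-likelihood gives
        # mu = mean((2w - 1) * x).
        mu = np.mean((2.0 * w - 1.0) * x)
    return mu
```

When the data truly come from N(0, 1), the fitted mixture is overspecified and the iterates shrink toward μ = 0, but only sublinearly, which is exactly the slow-convergence regime the KL-divergence bounds address.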
  • Item
    Toward Resilient Airdrop Mechanisms: Empirical Measurement of Hunter Profits and Airdrop Game Theory Modeling
    (IEEE, 2025-08-14) Xue Liu
    Platforms issue airdrops to distribute tokens, drive user adoption, and promote decentralized services. The distributions attract airdrop hunters (attackers), who exploit the system through Sybil attacks, i.e., using multiple identities to manipulate token allocations and meet eligibility criteria. While debates around airdrop hunting question its potential benefits to the ecosystem, exploitative behaviors like Sybil attacks clearly undermine the system’s integrity, eroding trust and credibility. Despite the increasing prevalence of these tactics, a gap persists in the literature regarding systematic modeling of airdrop hunters’ costs and returns, as well as theoretical models capturing the interactions among all roles for airdrop mechanism design. Our study first conducts an empirical analysis of transaction data from the Hop Protocol and LayerZero, identifying prevalent attack patterns and estimating hunters’ expected profits. Furthermore, we develop a game-theory model that simulates the interactions between attackers, organizers, and bounty hunters, proposing optimal incentive structures that enhance detection while minimizing organizational costs.
  • Item
    Policy Gradient-Based Optimal Subset Selection for Few-Shot Vision-Language Learning
    (IEEE, 2025-08-18) Muhammad Haris Khan
    Vision-Language models (VLMs) like Contrastive Language-Image Pre-Training (CLIP) have been extensively adapted for few-shot classification. Most few-shot methods rely on randomly selected samples from the dataset. However, since only a few samples are used, the sample selection process can significantly impact the performance of the downstream classification task. In this work, we propose a reinforcement learning-based policy gradient technique that employs a diversity- and informativeness-based reward function to optimize the sample selection process. We evaluate various sample selection techniques based on downstream classification accuracy across three benchmark datasets, where the proposed method demonstrates promising results.
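The abstract does not spell out the reward function, but a diversity-plus-informativeness reward for a candidate subset could plausibly combine mean pairwise embedding distance with mean predictive entropy. The function below is a hypothetical sketch of that idea (the name, the weighting `alpha`, and both terms are our assumptions, not the paper's definition):

```python
import numpy as np

def subset_reward(features, probs, alpha=0.5):
    """Illustrative reward for a selected few-shot subset.

    features: (k, d) embeddings of the k selected samples.
    probs:    (k, c) class probabilities for the same samples.
    Combines diversity (mean pairwise Euclidean distance) with
    informativeness (mean predictive entropy).
    """
    k = len(features)
    # Diversity: average pairwise distance between embeddings.
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    diversity = dists.sum() / (k * (k - 1))
    # Informativeness: mean entropy of the model's predictions;
    # uncertain samples carry more information for adaptation.
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=-1).mean()
    return alpha * diversity + (1.0 - alpha) * entropy
```

In a REINFORCE-style loop, such a scalar reward would weight the log-probability of the sampled subset under the selection policy.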
  • Item
    Frequency-Based Comprehensive Prompt Learning for Vision-Language Models
    (IEEE, 2025-08-19) Tongliang Liu
    This paper aims to learn multiple comprehensive text prompts that can describe visual concepts from coarse to fine, thereby endowing pre-trained VLMs with better transfer ability to various downstream tasks. We focus on exploring this idea on transformer-based VLMs since this kind of architecture achieves more compelling performance than CNN-based ones. Unfortunately, unlike CNNs, the transformer-based visual encoder of pre-trained VLMs cannot naturally provide discriminative and representative local visual information. To solve this problem, we propose Frequency-based Comprehensive Prompt Learning (FCPrompt) to excavate representative local visual information from the redundant output features of the visual encoder. FCPrompt transforms these features into the frequency domain via the Discrete Cosine Transform (DCT). Taking advantage of the energy concentration and information orthogonality of the DCT, we can obtain compact, informative, and disentangled local visual information by leveraging specific frequency components of the transformed features. To better fit transformer architectures, FCPrompt further adopts and optimizes different text prompts to respectively align with the global and frequency-based local visual information via a dual-branch framework. The learned text prompts can thus describe the entire visual concept from coarse to fine comprehensively. Extensive experiments indicate that FCPrompt achieves state-of-the-art performance on various benchmarks. Code is available at https://github.com/llcllc1997/FCPrompt.
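The core frequency-domain step the abstract describes, projecting redundant token features onto a few DCT components so that energy concentrates in the low frequencies, can be sketched as follows. This is an illustrative implementation of an orthonormal DCT-II along the token axis; the helper names and the choice of `num_freqs` are ours, not from FCPrompt:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of shape (n, n)."""
    k = np.arange(n)[:, None]   # frequency index
    i = np.arange(n)[None, :]   # position index
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0] *= np.sqrt(1.0 / n)    # DC row normalization
    m[1:] *= np.sqrt(2.0 / n)   # AC rows normalization
    return m

def low_frequency_features(tokens, num_freqs=4):
    """Project token features onto their lowest DCT frequencies.

    tokens: (n, d) array, e.g. local patch features from a ViT encoder.
    Returns (num_freqs, d): the k=0 row is the (scaled) global mean
    component; higher rows capture progressively finer variation, and
    distinct rows are mutually orthogonal.
    """
    basis = dct_matrix(tokens.shape[0])
    freq = basis @ tokens       # DCT along the token axis
    return freq[:num_freqs]
```

Because the basis is orthonormal, the retained rows are decorrelated summaries of the token sequence, which is what makes them compact, informative, and disentangled in the sense the abstract uses.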

Communities in MBZUAI iRep

Select a community to browse its collections.