MBZUAI Institutional Repository
Welcome to MBZUAIRep, the MBZUAI institutional repository. It is the hub for collecting and preserving the university's research output. The library manages this service to collect academic journal articles, conference proceedings, books, book chapters, theses, and dissertations from MBZUAI faculty members, students, staff, and researchers.
Recent Submissions
CS-FLEURS: A Massively Multilingual and Code-Switched Speech Dataset (Delft University of Technology, 2025-08-21)
We present CS-FLEURS, a new dataset for developing and evaluating code-switched speech recognition and translation systems beyond high-resourced languages. CS-FLEURS consists of 4 test sets which cover in total 113 unique code-switched language pairs across 52 languages: 1) a 14 X-English language pair set with real voices reading synthetically generated code-switched sentences, 2) a 16 X-English language pair set with generative text-to-speech, 3) a 60 {Arabic, Mandarin, Hindi, Spanish}-X language pair set with generative text-to-speech, and 4) a 45 X-English lower-resourced language pair test set with concatenative text-to-speech. Besides the four test sets, CS-FLEURS also provides a training set with 128 hours of generative text-to-speech data across 16 X-English language pairs. Our hope is that CS-FLEURS helps to broaden the scope of future code-switched speech research.

SecureDA: Privacy-preserving Source-free Domain Adaptation for Person Re-identification (IEEE, 2025-08-14)
Conventional domain adaptation (DA) for person re-identification (ReID) aims to bridge the domain gap but often requires direct use of fully labeled source and target domains, raising significant data privacy concerns due to the inclusion of personal identity information (PII) in raw data. Source-free domain adaptation (SFDA) for person ReID effectively preserves PII within the authorized source model. Nevertheless, these methods leave the data privacy (e.g., portrait rights) of the target domain exposed during retrieval, where attackers can exploit pedestrian images for malicious generation, damaging an individual's reputation. To overcome these limitations, we propose a novel framework called SecureDA to address privacy-preserving SFDA for person ReID, which can generate a privacy key to defend against potential attacks on PII. Technically, we introduce domain-specific adversarial attacks into DA, where the protected query and gallery images are encrypted to ensure secure image retrieval. Furthermore, we employ two simultaneous processes: 1) The global–local adversarial pathway (GLAP) leverages encrypted and original images as adversarial pairs, thereby fostering the development of robust ReID models; 2) The global–local collaborative pathway (GLCP) is mastered through positive pairs collected from the same domain, effectively mitigating catastrophic forgetting. Extensive experiments show that SecureDA achieves state-of-the-art performance on multiple DA benchmarks and even outperforms the conventional DA and SFDA methods, which inherently compromise data privacy.

Convergence of the EM algorithm in KL distance for overspecified Gaussian mixtures (Springer Nature, 2025-08-19)
We present a study of the convergence properties of the Expectation-Maximization (EM) algorithm when applied to an overspecified model. In particular, we consider fitting a balanced mixture of two Gaussians to data originating from a single Gaussian. We provide theoretical bounds on the Kullback-Leibler (KL) divergence between the fitted and true distributions. An important feature is concavity and radiality of the expected log-likelihood function on a hypersurface induced by the EM algorithm, which greatly simplifies the analysis. We also show how our result on KL divergence can be used to upper-bound the error rate of a mixture discriminant analysis classifier trained by the EM algorithm.

Toward Resilient Airdrop Mechanisms: Empirical Measurement of Hunter Profits and Airdrop Game Theory Modeling (IEEE, 2025-08-14)
Platforms issue airdrops to distribute tokens, drive user adoption, and promote decentralized services. The distributions attract airdrop hunters (attackers), who exploit the system by employing Sybil attacks, i.e., using multiple identities to manipulate token allocations to meet eligibility criteria. While debates around airdrop hunting question the potential benefits to the ecosystem, exploitative behaviors like Sybil attacks clearly undermine the system’s integrity, eroding trust and credibility. Despite the increasing prevalence of these tactics, a gap persists in the literature regarding systematic modeling of airdrop hunters’ costs and returns, alongside the theoretical models capturing the interactions among all roles for airdrop mechanism design. Our study first conducts an empirical analysis of transaction data from the Hop Protocol and LayerZero, identifying prevalent attack patterns and estimating hunters’ expected profits. Furthermore, we develop a game-theory model that simulates the interactions between attackers, organizers, and bounty hunters, proposing optimal incentive structures that enhance detection while minimizing organizational costs.

Policy Gradient-Based Optimal Subset Selection for Few-Shot Vision-Language Learning (IEEE, 2025-08-18)
Vision-Language models (VLMs) like Contrastive Language-Image Pre-Training (CLIP) have been extensively adapted for few-shot classification. Most few-shot methods rely on randomly selected samples from the dataset. However, since only a few samples are used, the sample selection process can significantly impact the performance of the downstream classification task. In this work, we propose a reinforcement learning-based policy gradient technique that employs a diversity and informativeness-based reward function to optimise the sample selection process. We evaluate various sample selection techniques based on downstream classification accuracy across three benchmark datasets, where the proposed method demonstrates promising results.