MBZUAI Institutional Repository

Recent Submissions

  • Item (Metadata only)
    Action Tokenizer Matters in In-Context Imitation Learning
    (IEEE, 2025-11-27) Vuong, An Dinh; Vu, Minh Nhat; An, Dong; Reid, Ian
    In-context imitation learning (ICIL) is a new paradigm that enables robots to generalize from demonstrations to unseen tasks without retraining. A well-structured action representation is key to capturing demonstration information effectively, yet action tokenization (the process of discretizing and encoding actions) remains largely unexplored in ICIL. In this work, we first systematically evaluate existing action tokenization methods in ICIL and reveal a critical limitation: while they effectively encode action trajectories, they fail to preserve temporal smoothness, which is crucial for stable robotic execution. To address this, we propose LipVQ-VAE, a variational autoencoder that enforces the Lipschitz condition in the latent action space via weight normalization. By propagating smoothness constraints from raw action inputs to a quantized latent codebook, LipVQ-VAE generates smoother actions. When integrated into ICIL, LipVQ-VAE improves performance by more than 5.3% in high-fidelity simulators, and real-world experiments confirm its ability to produce smoother, more reliable trajectories. Code and checkpoints are available at https://action-tokenizer-matters.github.io/.
  • Item (Open Access)
    A High Magnifications Histopathology Image Dataset for Oral Squamous Cell Carcinoma Diagnosis and Prognosis
    (Springer Nature, 2026-02-06) Guan, Jinquan; Guo, Junhong; Chen, Qi; Chen, Jian; Cai, Yongkang; He, Yilin; Huang, Zhiquan; Wang, Yan; Xie, Yutong
    Oral Squamous Cell Carcinoma (OSCC) is a prevalent and aggressive malignancy where deep learning-based computer-aided diagnosis and prognosis can enhance clinical assessments. However, existing publicly available OSCC datasets often suffer from limited patient cohorts and a restricted focus on either diagnostic or prognostic tasks, limiting the development of comprehensive and generalizable models. To bridge this gap, we introduce Multi-OSCC, a new histopathology image dataset comprising 1,325 OSCC patients, integrating both diagnostic and prognostic information to expand existing public resources. Each patient is represented by six high-resolution histopathology images captured at ×200, ×400, and ×1000 (two per magnification), covering both the core and edge tumor regions. The Multi-OSCC dataset is richly annotated for six critical clinical tasks: recurrence prediction (REC), lymph node metastasis (LNM), tumor differentiation (TD), tumor invasion (TI), cancer embolus (CE), and perineural invasion (PI). We systematically evaluate the impact of different visual encoders, multi-image fusion techniques, stain normalization, and multi-task learning frameworks to benchmark this dataset. To accelerate future research, we publicly release the Multi-OSCC dataset at: https://github.com/guanjinquan/OSCC-PathologyImageDataset.
  • Item (Open Access)
    vEMRec: High-Resolution Volume Electron Microscopy Reconstruction Based on Structure-Preserving and High-Fidelity 3D Alignment
    (Wiley, 2026-02-20) Zhang, Zhenbang; Li, Hongjia; Yang, Zhongjun; Xu, Zhiqiang; Sun, Duanchen; Gao, Xin; Zhang, Fa; Han, Renmin
    Three-dimensional (3D) alignment is a key step in volume electron microscopy (vEM), aimed at correcting misalignment introduced during data acquisition and thereby recovering the correct biological structures. However, automated 3D alignment has long been challenged by the dilemma between eliminating nonlinear distortions and preserving the natural morphological variations inherent to biological specimens. Here, we present vEMRec, a paradigm-shifting, fully automated algorithm for vEM 3D alignment. vEMRec redefines the 3D alignment problem by decoupling it into high-frequency and low-frequency subproblems. In this framework, precision rigid alignment corrects rigid distortions, while a Gaussian filter-driven elastic registration algorithm addresses nonlinear distortions, all while faithfully preserving biologically plausible deformations. Extensive experiments demonstrate that vEMRec resolves this long-standing dilemma. Serving as a critical preprocessing step, vEMRec improves performance in downstream isotropic reconstruction and 3D segmentation tasks by enhancing axial continuity in anisotropic data while preserving the structural integrity of ultrastructural details. Moreover, vEMRec is computationally efficient, enabling TB-scale specimen analysis at biologically relevant throughput. vEMRec has successfully aligned six representative large-scale real-world datasets, demonstrating its applicability, accuracy, and robustness for large-scale data processing.
  • Item (Metadata only)
    MPromer: A Unified Diffusion-Based Framework for Scalable and Generalizable Multi-Modal Medical Image Segmentation
    (IEEE, 2026-02-23) Ghallabi, Wafa Al; Dudhane, Akshay; Zamir, Syed Waqas; Khan, Salman; Khan, Fahad Shahbaz
    Multi-modal medical image analysis is essential for comprehensive diagnostics; however, existing segmentation models often struggle to generalize across diverse imaging modalities such as MRI, CT, fundus imaging, and colonoscopy. While recent diffusion-based approaches have shown promising results, they typically rely on task-specific training, which limits their scalability and imposes significant computational demands. To address these limitations, we propose MPromer, a robust and adaptable segmentation framework that incorporates multi-scale implicit prompting within a diffusion-based architecture. In contrast to traditional prompt-driven methods, MPromer adapts automatically to various imaging modalities without requiring manually designed prompts or retraining for individual tasks. By integrating prompt-conditioned diffusion processes into an encoder-decoder structure, the model achieves consistent and effective segmentation across a wide range of medical domains. We evaluate MPromer on six benchmark datasets, demonstrating state-of-the-art performance with strong generalization capabilities. In addition to improved segmentation accuracy, MPromer enhances computational efficiency and extends naturally to multi-label segmentation tasks, making it well-suited for complex clinical applications. The framework provides a scalable and efficient solution that minimizes the need for fine-tuning, which is particularly beneficial in resource-constrained medical environments. Our code and models are available at https://github.com/wafaAlghallabi/MPromer.
  • Item (Metadata only)
    Vocabulary-Free Fine-Grained Visual Recognition via Enriched Contextually Grounded Vision-Language Model
    (Institute of Electrical and Electronics Engineers (IEEE), 2026-02-23) Demidov, Dmitry; Zaheer, Zaigham; Thawakar, Omkar; Khan, Salman; Khan, Fahad Shahbaz
    Fine-grained image classification, the task of distinguishing between visually similar subcategories within a broader category (e.g., bird species, car models, flower types), is a challenging computer vision problem. Traditional approaches rely heavily on fixed vocabularies and closed-set classification paradigms, limiting their scalability and adaptability in real-world settings where novel classes frequently emerge. Recent research has demonstrated that combining large language models (LLMs) with vision-language models (VLMs) makes open-set recognition possible without the need for predefined class labels. However, existing methods often underutilize LLMs at the classification stage and rely heavily on class names guessed by an LLM without further analysis or refinement. To address these bottlenecks, we propose a training-free method, Enriched-FineR (E-FineR for short), which achieves state-of-the-art results in fine-grained visual recognition while also offering greater interpretability, highlighting its strong potential in real-world scenarios and new domains where expert annotations are difficult to obtain. Additionally, we demonstrate the application of our approach to zero-shot and few-shot classification, where it achieves performance on par with existing state-of-the-art methods while remaining training-free and requiring no human intervention. Overall, our vocabulary-free framework supports the shift in image classification from rigid label prediction to flexible, language-driven understanding, enabling scalable and generalizable systems for real-world applications. Well-documented code is available at https://github.com/demidovd98/e-finer.
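The abstracts above describe their methods only at a high level. The first entry, for example, says LipVQ-VAE "enforces the Lipschitz condition in the latent action space via weight normalization" without giving details. As a minimal, hypothetical sketch of that general idea (not the paper's actual implementation), a linear layer can be made 1-Lipschitz in the L2 norm by rescaling its weight matrix whenever its spectral norm exceeds the desired bound:

```python
import numpy as np

def lipschitz_normalize(W, bound=1.0):
    """Rescale W so its largest singular value is at most `bound`.

    The linear map x -> W @ x is then `bound`-Lipschitz in the L2 norm:
    ||W @ x - W @ y|| <= bound * ||x - y|| for all x, y.
    """
    sigma = np.linalg.norm(W, ord=2)  # spectral norm = largest singular value
    return W * (bound / sigma) if sigma > bound else W

rng = np.random.default_rng(0)
W = lipschitz_normalize(rng.normal(size=(8, 16)), bound=1.0)

# 1-Lipschitz check: output distance never exceeds input distance.
x, y = rng.normal(size=16), rng.normal(size=16)
assert np.linalg.norm(W @ x - W @ y) <= np.linalg.norm(x - y) + 1e-9
```

In a trained network this rescaling would be applied to each layer so that the product of the per-layer bounds caps the Lipschitz constant of the whole encoder, which is the property that makes small input changes produce small latent changes and hence smoother decoded actions.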
