MBZUAI Institutional Repository

Recent Submissions

  • Item
    BAKER: Bayesian Kernel Uncertainty in Domain-Specific Document Modelling
    (Association for Computing Machinery, 10/03/2025) Imran Razzak
    In critical domains such as healthcare and law, accurately modelling the uncertainty of automatic computational models is essential. For instance, healthcare models must produce reliable estimates to guide human decision-making. However, modelling uncertainty remains challenging, particularly for models handling low-resource datasets and complex, domain-specific vocabulary. Most existing predictive models produce point estimates rather than probability distributions, limiting our ability to quantify model uncertainty. This paper introduces a novel model, BAKER, designed to address these limitations. BAKER combines the strengths of Bayesian inference, known for its effectiveness in modelling uncertainty, and kernel methods, which excel at capturing complex data relationships. Incorporating kernel functions enhances model performance, particularly by reducing overfitting in data-limited scenarios. Our experimental analysis shows that BAKER significantly improves uncertainty reasoning compared to existing models.
  • Item
    Large Language Model Simulator for Cold-Start Recommendation
    (Association for Computing Machinery, 21/05/2025) Fakhri Karray
    Recommending cold items remains a significant challenge in billion-scale online recommendation systems. While warm items benefit from historical user behaviors, cold items rely solely on content features, limiting their recommendation performance and impacting user experience and revenue. Current models generate synthetic behavioral embeddings from content features but fail to address the core issue: the absence of historical behavior data. To tackle this, we introduce the LLM Simulator framework, which leverages large language models to simulate user interactions for cold items, fundamentally addressing the cold-start problem. However, simply using an LLM to traverse all users can introduce significant complexity in billion-scale systems. To manage the computational complexity, we propose ColdLLM, a coupled-funnel framework for online recommendation. ColdLLM efficiently reduces the number of candidate users from billions to hundreds using a trained coupled filter, allowing the LLM to operate efficiently and effectively on the filtered set. Extensive experiments show that ColdLLM significantly surpasses baselines in cold-start recommendation on metrics including Recall and NDCG. A two-week A/B test also validates that ColdLLM can effectively increase GMV during the cold-start period.
  • Item
    GAMED: Knowledge Adaptive Multi-Experts Decoupling for Multimodal Fake News Detection
    (Association for Computing Machinery, 10/03/2025) Imran Razzak
  • Item
    Assessing the effects of financial toxicity on quality of life among hematopoietic stem cell transplantation recipients.
    (LIPPINCOTT WILLIAMS & WILKINS, 28/05/2025) Shahrukh Hashmi
    Background: “Financial toxicity” refers to the financial burden imposed by treatment costs on individuals with cancer, constituting a major barrier to achieving equitable cancer outcomes. Recent literature increasingly demonstrates the detrimental impacts of financial toxicity on quality of life (QOL) among individuals with cancer, including individuals who have undergone hematopoietic stem cell transplantation (HSCT). This study evaluates associations between treatment cost burden and various aspects of QOL following HSCT. Methods: Seven hundred one HSCT recipients completed a survey examining their biopsychosocial health one year following transplant. The survey included the Functional Assessment of Cancer Therapy – Bone Marrow Transplantation (FACT-BMT), a multifactorial measure of QOL specific to this population. Treatment cost burden endorsement was measured on a 5-item Likert scale. Hierarchical regression models were developed to assess the incremental effects of demographic characteristics (Block 1), clinical predictors (Block 2), and cost burden (Block 3) on physical, emotional, social, functional, BMT-specific, general, and composite QOL outcomes. Results: Significant model improvement was observed with the addition of clinical factors (ΔF(2,650) = 20.28, p < .001), and subsequently, treatment cost burden (ΔF(1,649) = 110.29, p < .001). In the final model, higher cost burden was associated with poorer physical (β = -0.323, p < .001), emotional (β = -0.301, p < .001), social (β = -0.250, p < .001), functional (β = -0.317, p < .001), BMT-specific (β = -0.341, p < .001), general (β = -0.377, p < .001), and composite QOL (β = -0.381, p < .001). Poorer performance score was associated with each QOL indicator (p < .001), and allogeneic transplant type was associated with poorer functional wellbeing (β = -0.001, p = .002) but higher emotional wellbeing (β = 0.118, p = .002). Older age (β = 0.113, p = .003) and female sex (β = 0.183, p < .001) predicted higher social wellbeing, while Hispanic ethnicity predicted poorer social wellbeing (β = -0.095, p = .010). Female sex was associated with poorer QOL specific to BMT concerns (β = -0.118, p = .001). Conclusions: Higher treatment cost burden is associated with poorer overall QOL and its physical, emotional, social, functional, and BMT-specific components one year following HSCT, after controlling for demographic and clinical characteristics. This reflects a critical barrier to equitable cancer care, suggesting that financial toxicity may perpetuate preexisting inequities in QOL, treatment, disease, and survival outcomes that disproportionately impact the underserved. Future research should prioritize 1) better understanding relationships among complex indicators of financial toxicity, QOL, and their underpinning mechanisms and 2) developing solutions to mitigate financial toxicity of HSCT and overall cancer care.
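For readers unfamiliar with the ΔF values in the abstract above, the block-wise F-change statistic in hierarchical regression is conventionally written as follows. This is the standard textbook form, not a formula quoted from the paper; the symbols q (number of predictors added in the block), k (total predictors in the larger model), and n (sample size) are notation assumed here.

```latex
\Delta F(q,\; n - k - 1)
  = \frac{\left(R^{2}_{\text{full}} - R^{2}_{\text{reduced}}\right) / q}
         {\left(1 - R^{2}_{\text{full}}\right) / (n - k - 1)}
```

Under this convention, ΔF(1,649) corresponds to one added predictor (cost burden) tested against 649 residual degrees of freedom.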
  • Item
    XGBoost-Liver: An Intelligent Integrated Features Approach for Classifying Liver Diseases Using Ensemble XGBoost Training Model.
    (Tech Science Press, 26/03/2025) Salman Khan
    The liver is a crucial gland and the second-largest organ in the human body, essential to digestion, metabolism, detoxification, and immunity. Liver diseases result from factors such as viral infections, obesity, alcohol consumption, injuries, or genetic predispositions. They pose significant health risks and demand timely diagnosis and treatment to enhance survival rates. Traditionally, diagnosing liver diseases relied heavily on clinical expertise, often leading to subjective, challenging, and time-intensive processes. However, early detection is essential for effective intervention, and advancements in machine learning (ML) have demonstrated remarkable success in predicting various conditions, including Chronic Obstructive Pulmonary Disease (COPD), hypertension, and diabetes. This study proposes a novel XGBoost-Liver predictor that integrates distinct feature methodologies, including Ranking and Statistical Projection-based strategies, to detect early signs of liver disease. The Fisher score method is applied to perform global interpretation analysis, helping to select optimal features by assessing their contributions to the overall model. The performance of the proposed model has been extensively evaluated through k-fold cross-validation tests. First, the proposed model is evaluated using individual and hybrid features. Second, the XGBoost-Liver model is compared with commonly used classifier algorithms. Third, it is compared with existing state-of-the-art computational models. The experimental results show that the proposed model performs better than the existing predictors, reaching an average accuracy of 92.07%. This paper demonstrates the potential of machine learning to improve liver disease prediction, enhance diagnostic accuracy, and enable timely medical interventions for better patient outcomes.
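The abstract above mentions Fisher-score feature selection without giving details. As a rough illustration only, the sketch below computes the common textbook Fisher score (between-class scatter divided by within-class scatter, per feature) with NumPy. The function names `fisher_scores` and `top_k_features` are hypothetical, and this is not claimed to be the authors' exact procedure.

```python
import numpy as np


def fisher_scores(X, y):
    """Return one Fisher score per column of X (higher = more class-separating).

    Assumed textbook definition, not taken from the paper:
    sum_c n_c * (mean_c - mean)^2 / sum_c n_c * var_c, computed per feature.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        rows = X[y == label]
        n = rows.shape[0]
        between += n * (rows.mean(axis=0) - overall_mean) ** 2
        within += n * rows.var(axis=0)
    # Small constant avoids division by zero for constant features.
    return between / (within + 1e-12)


def top_k_features(X, y, k):
    """Indices of the k highest-scoring features, best first."""
    return np.argsort(fisher_scores(X, y))[::-1][:k]
```

In a pipeline like the one described, the selected column indices would then be used to subset the data before fitting a classifier inside each cross-validation fold, so that selection does not see the held-out samples.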
