A Transformer-Based Supervised Machine Learning Approach for Detecting SCAMs
Alharthi, Farah
Author
Supervisor
Department
Machine Learning
Embargo End Date
2024-01-01
Type
Thesis
Date
2024
License
Language
English
Abstract
Small Colloidal Aggregating Molecules (SCAMs), formed by small molecules with low aqueous solubility, often produce false-positive results in bioassays and so compromise the reliability of reported bioactivities. These false positives can mislead researchers in drug discovery and development, wasting resources and delaying the search for effective treatments. In this study, we aim to mitigate this issue by developing a supervised machine learning model, obtained by fine-tuning the pre-trained transformer model ChemBERTa. The primary objective of the model is to identify small molecules that are prone to forming SCAMs. Its predictions support early-stage screening of potential drug candidates by flagging molecules that might yield false-positive outcomes in high-throughput screens due to colloidal aggregation. Despite challenges related to class imbalance, our transformer model predicts colloidal aggregate formation reliably, surpassing the performance of state-of-the-art models.
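The abstract does not say how class imbalance was handled. As an illustration only, and not as the method used in the thesis, one common option is to weight the training loss by inverse class frequency so that the rarer class contributes as much as the more common one. The minimal sketch below assumes binary labels (1 = aggregator, 0 = non-aggregator) and that the aggregator class is the minority; the function names, toy labels, and probabilities are all hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {class: weight} with weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean negative log-likelihood for binary labels.

    probs[i] is the predicted probability that sample i belongs to class 1.
    """
    eps = 1e-12  # guards against log(0)
    total, weight_sum = 0.0, 0.0
    for p, y in zip(probs, labels):
        p_true = p if y == 1 else 1.0 - p  # probability assigned to the true class
        w = weights[y]
        total += -w * math.log(max(p_true, eps))
        weight_sum += w
    return total / weight_sum


# Hypothetical toy data: 8 non-aggregators (0) and 2 aggregators (1).
labels = [0] * 8 + [1] * 2
weights = inverse_frequency_weights(labels)

# Hypothetical model outputs: probability of class 1 for each sample.
probs = [0.1] * 8 + [0.6, 0.4]
loss = weighted_cross_entropy(probs, labels, weights)
```

With these weights each class carries the same total weight in the loss (8 × 0.625 = 2 × 2.5), so errors on the minority class are not drowned out by the majority class. In an actual fine-tuning run, the same per-class weights would be passed to the loss function of the training framework rather than computed by hand.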
Citation
F. Alharthi, "A Transformer-Based Supervised Machine Learning Approach for Detecting SCAMs," M.S. thesis, Machine Learning, MBZUAI, Abu Dhabi, UAE, 2024.
