Memoization-Aware Bayesian Optimization for AI Pipelines with Unknown Costs
Salahuddeen, Ridwan
Author
Supervisor
Department
Machine Learning
Embargo End Date
2024-01-01
Type
Thesis
Date
2024
License
Language
English
Collections
Abstract
Bayesian optimization (BO) is an effective approach for optimizing black-box functions via potentially noisy function evaluations. However, few BO techniques address the cost-aware setting, in which training costs are sensitive to parameter values, particularly when costs are initially unknown. This thesis explores cost awareness in tuning multi-stage AI pipelines (such as language model training and tuning), with a focus on caching techniques that store and reuse early-stage outputs so that later stages can be optimized without incurring the cost of re-running the full pipeline. To take advantage of caching, we propose the Expected-Expected Improvement Per Unit (EEIPU) acquisition function, which adapts to black-box costs in the multi-stage pipeline setting. EEIPU incorporates early-stage memoization, allowing it to optimize later stages of the pipeline while incurring only a fraction of the pipeline cost. We evaluated EEIPU against state-of-the-art cost-aware and multi-stage BO methods on a series of synthetic and real experiments. Our method reports strong results, completing on average 148% more BO iterations within the same optimization budget than comparable methods on synthetic experiments, and 103% more BO iterations on real pipelines. After warm-up iterations, EEIPU improves on the objective value by 58% and 108% over the average baseline in synthetic and real experiments, respectively.
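The memoization idea in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation and does not include the EEIPU acquisition function. It only shows how caching the outputs of early stages, keyed by the prefix of stage parameters, lets a later evaluation pay for just the stages that changed. The class `MemoizedPipeline`, the toy stages, and their fixed costs are all hypothetical names and values chosen for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A stage maps (input value, stage parameter) to (output value, cost paid).
Stage = Callable[[float, float], Tuple[float, float]]


class MemoizedPipeline:
    """Toy multi-stage pipeline that caches stage outputs by parameter prefix."""

    def __init__(self, stages: Sequence[Stage]) -> None:
        self.stages: List[Stage] = list(stages)
        # Maps the parameters of stages 1..k to the output of stage k.
        self.cache: Dict[Tuple[float, ...], float] = {}

    def run(self, params: Sequence[float]) -> Tuple[float, float]:
        """Return (final output, cost actually paid for this evaluation)."""
        if len(params) != len(self.stages):
            raise ValueError("expected one parameter per stage")

        # Find the longest parameter prefix whose output is already cached.
        start, value = 0, 0.0  # 0.0 is the (assumed) input to the first stage
        for k in range(len(params), 0, -1):
            key = tuple(params[:k])
            if key in self.cache:
                start, value = k, self.cache[key]
                break

        # Run only the remaining stages, caching each new intermediate output.
        paid = 0.0
        for k in range(start, len(self.stages)):
            value, cost = self.stages[k](value, params[k])
            paid += cost
            self.cache[tuple(params[: k + 1])] = value
        return value, paid


# Hypothetical stages with fixed costs (real costs would depend on parameters).
def stage_a(x: float, p: float) -> Tuple[float, float]:
    return x + p, 5.0


def stage_b(x: float, p: float) -> Tuple[float, float]:
    return x * p, 3.0


def stage_c(x: float, p: float) -> Tuple[float, float]:
    return x - p, 1.0


demo = MemoizedPipeline([stage_a, stage_b, stage_c])
print(demo.run((2.0, 3.0, 1.0)))  # full run: (5.0, 9.0)
print(demo.run((2.0, 3.0, 4.0)))  # reuses stages a and b: (2.0, 1.0)
print(demo.run((2.0, 5.0, 4.0)))  # reuses stage a only: (6.0, 4.0)
```

In this sketch the second and third evaluations pay 1.0 and 4.0 instead of the full 9.0, which is the sense in which a memoization-aware method could fit more evaluations into a fixed budget. How candidate points are chosen to exploit the cache is the job of the acquisition function and is outside this example.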
Citation
R. Salahuddeen, "Memoization-Aware Bayesian Optimization for AI Pipelines with Unknown Costs," M.S. thesis, Machine Learning, MBZUAI, Abu Dhabi, UAE, 2024.
