Memoization-Aware Bayesian Optimization for AI Pipelines with Unknown Costs

Salahuddeen, Ridwan
Department
Machine Learning
Embargo End Date
2024-01-01
Type
Thesis
Date
2024
Language
English
Abstract
Bayesian optimization (BO) is an effective approach for optimizing black-box functions via potentially noisy function evaluations. However, few BO techniques address the cost-aware setting, in which training costs are sensitive to parameter values, particularly when costs are initially unknown. This thesis explores cost awareness in tuning multi-stage AI pipelines (such as language model training/tuning), focusing on caching techniques that store and reuse early-stage outputs so that later stages can be optimized without incurring the cost of re-running the full pipeline. To take advantage of caching, we propose the Expected-Expected Improvement Per Unit (EEIPU) acquisition function, which adapts to black-box costs in the multi-stage pipeline setting. EEIPU incorporates early-stage memoization, allowing it to optimize later stages of the pipeline while incurring only a fraction of the pipeline cost. We evaluated EEIPU against state-of-the-art cost-aware and multi-stage BO methods on a series of synthetic and real experiments. Our method achieves strong results, completing an average of 148% more BO iterations within the same optimization budget than comparable methods on synthetic experiments, and 103% more BO iterations on real pipelines. After warm-up iterations, EEIPU improves the objective value by 58% and 108% over the average baseline in synthetic and real experiments, respectively.
Citation
R. Salahuddeen, "Memoization-Aware Bayesian Optimization for AI Pipelines with Unknown Costs," M.S. thesis, Machine Learning, MBZUAI, Abu Dhabi, UAE, 2024.