Efficient Chromosome Parallelization for Precision Medicine Genomic Workflows
Mas Montserrat, Daniel ; Verma, Ray ; Barrabés, Míriam ; De la Vega, Francisco M ; Bustamante, Carlos D ; Ioannidis, Alexander G
Department
Personalized Medicine
Type
Conference proceeding
License
http://creativecommons.org/licenses/by/4.0/
Abstract
Large-scale genomic workflows used in precision medicine can process datasets spanning tens to hundreds of gigabytes per sample, leading to high memory spikes, intensive disk I/O, and task failures due to out-of-memory errors. Simple static resource allocation methods struggle to handle the variability in per-chromosome RAM demands, resulting in poor resource utilization and long runtimes. In this work, we propose multiple mechanisms for adaptive, RAM-efficient parallelization of chromosome-level bioinformatics workflows. First, we develop a symbolic regression model that estimates per-chromosome memory consumption for a given task and introduces an interpolating bias to conservatively minimize over-allocation. Second, we present a dynamic scheduler that adaptively predicts RAM usage with a polynomial regression model, treating task packing as a Knapsack problem to optimally batch jobs based on predicted memory requirements. Additionally, we present a static scheduler that optimizes chromosome processing order to minimize peak memory while preserving throughput. Our proposed methods, evaluated on simulations and real-world genomic pipelines, provide new mechanisms to reduce memory overruns and balance load across threads. We thereby achieve faster end-to-end execution, showcasing the potential to optimize large-scale genomic workflows.
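The idea of treating task packing as a Knapsack problem can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `pack_batch` and `schedule`, the integer-gigabyte predictions, and the 16 GB budget are all hypothetical, and the predicted values stand in for whatever a regression model would output. The sketch uses a 0/1 knapsack in which each task's value equals its predicted RAM, so each batch fills the budget as fully as possible without exceeding it.

```python
from typing import Dict, List, Optional


def pack_batch(pred_gb: Dict[str, int], budget_gb: int) -> List[str]:
    """Pick tasks whose predicted RAM (positive integers, in GB) sums as
    close to the budget as possible without exceeding it (0/1 knapsack
    with value equal to weight)."""
    # best[c] holds one set of task names whose predicted RAM sums to exactly c.
    best: List[Optional[List[str]]] = [None] * (budget_gb + 1)
    best[0] = []
    for name, need in pred_gb.items():
        # Iterate capacities downward so each task is used at most once.
        for cap in range(budget_gb, need - 1, -1):
            if best[cap] is None and best[cap - need] is not None:
                best[cap] = best[cap - need] + [name]
    # Return the fullest reachable capacity.
    for cap in range(budget_gb, -1, -1):
        if best[cap] is not None:
            return best[cap]
    return []


def schedule(pred_gb: Dict[str, int], budget_gb: int) -> List[List[str]]:
    """Repeatedly pack batches until every task has been assigned."""
    remaining = dict(pred_gb)
    batches: List[List[str]] = []
    while remaining:
        batch = pack_batch(remaining, budget_gb)
        if not batch:
            raise ValueError("a task's predicted RAM exceeds the budget")
        batches.append(batch)
        for name in batch:
            del remaining[name]
    return batches


if __name__ == "__main__":
    # Hypothetical per-chromosome predictions in GB (illustrative only).
    predicted = {"chr1": 8, "chr2": 7, "chr3": 6, "chr4": 5, "chr5": 4}
    for i, batch in enumerate(schedule(predicted, budget_gb=16), start=1):
        print(i, batch, sum(predicted[n] for n in batch))
```

Packing batch by batch is a simplification: a dynamic scheduler as described in the abstract would presumably re-pack as tasks finish and as predictions are updated, which this sketch does not attempt.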
Citation
D. Mas Montserrat, R. Verma, M. Barrabés, F.M. De la Vega, C.D. Bustamante, A.G. Ioannidis, "Efficient Chromosome Parallelization for Precision Medicine Genomic Workflows," 2026, pp. 40083-40091.
Source
Proceedings of the AAAI Conference on Artificial Intelligence
Conference
The Fortieth AAAI Conference on Artificial Intelligence
Keywords
31 Biological Sciences, 46 Information and Computing Sciences, 4606 Distributed Computing and Systems Software
Publisher
Association for the Advancement of Artificial Intelligence
