Biological databases in the age of generative artificial intelligence
Author
Pop, Mihai
Attwood, Teresa K
Blake, Judith A
Bourne, Philip E
Conesa, Ana
Gaasterland, Terry
Hunter, Lawrence
Kingsford, Carl
Kohlbacher, Oliver
Lengauer, Thomas
Markel, Scott
Moreau, Yves
Noble, William S
Orengo, Christine
Ouellette, B F Francis
Parida, Laxmi
Przulj, Natasa
Przytycka, Teresa M
Ranganathan, Shoba
Schwartz, Russell
Valencia, Alfonso
Warnow, Tandy
Department
Computational Biology
Type
Journal article
Date
2025
Language
English
Abstract
Summary: Modern biological research critically depends on public databases. The introduction and propagation of errors within and across databases can lead to wasted resources, as scientists are led astray by bad data or have to conduct expensive validation experiments. The emergence of generative artificial intelligence systems threatens to compound this problem owing to the ease with which massive volumes of synthetic data can be generated. We provide an overview of several key issues that occur within the biological data ecosystem and make several recommendations aimed at reducing data errors and their propagation. We specifically highlight the critical importance of improved educational programs aimed at biologists and life scientists that emphasize best practices in data engineering. We also argue for increased theoretical and empirical research on data provenance, error propagation, and on understanding the impact of errors on analytic pipelines. Furthermore, we recommend enhanced funding for the stewardship and maintenance of public biological databases.
Citation
M. Pop et al., “Biological databases in the age of generative artificial intelligence,” Bioinformatics Advances, vol. 5, no. 1, p. 19, Dec. 2024, doi: 10.1093/BIOADV/VBAF044.
Source
Bioinformatics Advances
Publisher
Oxford University Press
