Public Omics Explorer (POE): Enabling integrative semantic search across GEO omics datasets based on PubMed publications
Grigoriadis, Dimitris ; Tsifintaris, Margaritis ; Giannakakis, Antonis ; Pavlopoulos, Georgios A. ; Perdikopanis, Nikos
Department
Computational Biology
Type
Journal article
Date
2025
Language
English
Abstract
The exponential growth of publicly available omics datasets and biomedical literature has created both opportunities and challenges for data-driven discovery in life sciences. While the Gene Expression Omnibus (GEO) hosts millions of high-throughput experimental datasets, the European Nucleotide Archive (ENA) stores the corresponding raw sequencing data, and PubMed contains an extensive body of related scientific publications, integrated exploration of these resources remains limited. We present Public Omics Explorer (POE), a web-based platform that performs literature‑informed dataset retrieval, semantically linking GEO datasets and ENA records through their associated PubMed publications. POE automatically collects and indexes GEO metadata, ENA cross‑references, and PubMed abstracts on a daily basis. For semantic embedding, POE employs the biomedical-specialized SBioBERT model, which generates dense vector representations from publication text. These embeddings are indexed using Facebook AI Similarity Search (FAISS) to enable high-precision, context-aware retrieval. Users can search using free‑text natural language queries, which are processed through semantic search to identify conceptually relevant datasets based on linked publication content. Structured filters allow refinement by organism, experiment type, library strategy, sample type, extracted molecule, and publication year. In addition to semantic queries, POE supports direct retrieval of datasets via accession identifiers (GSE IDs, PubMed IDs, DOIs) and offers a programmatic RESTful API for integration into computational pipelines and automated workflows. By linking processed data in GEO with raw data in ENA through shared publication context, POE facilitates hypothesis generation, meta‑analysis, and exploratory research. The application is freely available at https://nplab.gr/poe.
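The retrieval flow described in the abstract (embed the publication text, rank datasets by similarity to a free-text query, then narrow with structured filters) can be illustrated with a minimal, dependency-free sketch. This is not POE's implementation: the bag-of-words `embed` function is a toy stand-in for SBioBERT embeddings, the brute-force cosine ranking stands in for a FAISS index, and the `Record` fields, accession labels, and sample texts are hypothetical values invented for illustration.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Record:
    gse_id: str    # hypothetical dataset accession label
    organism: str  # structured filter field
    year: int      # publication year, structured filter field
    abstract: str  # text of the linked publication


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, records, organism=None, min_year=None, top_k=5):
    """Apply structured filters, then rank by similarity of publication text."""
    q = embed(query)
    candidates = [
        r for r in records
        if (organism is None or r.organism == organism)
        and (min_year is None or r.year >= min_year)
    ]
    # Brute-force scoring; a vector index would replace this loop at scale.
    scored = [(cosine(q, embed(r.abstract)), r) for r in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(round(s, 3), r.gse_id) for s, r in scored[:top_k] if s > 0]


# Invented example records, not real GEO entries.
RECORDS = [
    Record("GSE_A", "Homo sapiens", 2021,
           "RNA-seq of breast cancer tumor samples reveals lncRNA expression changes"),
    Record("GSE_B", "Mus musculus", 2019,
           "Single-cell RNA-seq of mouse liver during regeneration"),
    Record("GSE_C", "Homo sapiens", 2016,
           "ChIP-seq profiling of histone marks in breast cancer cell lines"),
]

if __name__ == "__main__":
    for score, gse in search("breast cancer RNA-seq", RECORDS, organism="Homo sapiens"):
        print(gse, score)
```

In this sketch the filters run before ranking, so only matching records are scored. Whether POE filters before or after the similarity search is not stated in the abstract.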
Citation
D. Grigoriadis, M. Tsifintaris, A. Giannakakis, G. A. Pavlopoulos, and N. Perdikopanis, “Public Omics Explorer (POE): Enabling integrative semantic search across GEO omics datasets based on PubMed publications,” Comput Struct Biotechnol J, vol. 27, pp. 4802–4812, Jan. 2025, doi: 10.1016/J.CSBJ.2025.11.004
Source
Computational and Structural Biotechnology Journal
Keywords
BERT embeddings, Dataset discovery, GEO datasets, Omics datasets, PubMed integration, Semantic search
Publisher
Elsevier
