Screening for diabetes mellitus in the US population using neural network-based modeling and complex survey designs.

Matabuena, Marcos
Vidal, Juan C
Ghosal, Rahul
Onnela, Jukka-Pekka
Department
Epidemiology
Type
Journal article
Language
English
Abstract
Complex survey designs are widely used in medical cohort studies. Developing risk score models that adequately account for the sampling design is essential to minimize selection bias and obtain representative population estimates. This work addresses three complementary objectives. First, we propose a general predictive framework for regression and classification tasks that utilizes neural networks to incorporate survey weights into the model estimation process. Second, we introduce a procedure for quantifying prediction uncertainty based on conformal inference, adapted to the characteristics of complex survey data. Third, we demonstrate the application of the proposed methodology in a case study assessing the risk of diabetes mellitus in the US population, using the NHANES 2011-2014 cohort. The empirical results show that models of varying complexity, each using different sets of predictors, achieve different trade-offs between predictive performance and economic cost while maintaining generalizability at the population level. Although the case study focuses on diabetes, the proposed framework is directly applicable to the development of clinical prediction models for other diseases and complex survey datasets. All software and data used in this study are publicly available on GitHub.
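As a rough illustration of the first two objectives, the sketch below shows one simple way survey weights could enter a training loss and a split-conformal calibration step. It is not the authors' implementation: the function names (`weighted_mse`, `weighted_quantile`, `conformal_interval`) are hypothetical, the weighted quantile omits any finite-sample or design-based correction, and absolute residuals are assumed as the conformity score.

```python
import numpy as np

def weighted_mse(y_true, y_pred, w):
    """Survey-weighted mean squared error: sum(w * err^2) / sum(w).

    Assumption: w holds the sampling weights, so heavily weighted
    respondents contribute more to the loss a network would minimize.
    """
    y_true, y_pred, w = (np.asarray(a, dtype=float) for a in (y_true, y_pred, w))
    return float(np.sum(w * (y_true - y_pred) ** 2) / np.sum(w))

def weighted_quantile(values, w, q):
    """Smallest value whose cumulative normalized weight reaches q."""
    values = np.asarray(values, dtype=float)
    w = np.asarray(w, dtype=float)
    order = np.argsort(values)
    values, w = values[order], w[order]
    cum = np.cumsum(w) / np.sum(w)
    idx = np.searchsorted(cum, q, side="left")
    return float(values[min(idx, len(values) - 1)])

def conformal_interval(cal_y, cal_pred, cal_w, new_pred, alpha=0.1):
    """Simplified weighted split-conformal interval for one prediction.

    Scores are absolute residuals on a held-out calibration set; the
    interval radius is their survey-weighted (1 - alpha) quantile.
    """
    scores = np.abs(np.asarray(cal_y, dtype=float) - np.asarray(cal_pred, dtype=float))
    radius = weighted_quantile(scores, cal_w, 1.0 - alpha)
    return new_pred - radius, new_pred + radius
```

In practice the weighted loss would be minimized by a neural network trained on one split of the survey data, and `conformal_interval` would be calibrated on a separate split with that split's weights.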
Citation
M. Matabuena, J.C. Vidal, R. Ghosal, J.-P. Onnela, "Screening for diabetes mellitus in the US population using neural network-based modeling and complex survey designs," Statistical Methods in Medical Research, pp. 9622802261442893-, 2026, https://doi.org/10.1177/09622802261442893.
Source
Statistical Methods in Medical Research
Keywords
42 Health Sciences, 4202 Epidemiology, 49 Mathematical Sciences, 4905 Statistics
Publisher
SAGE Publications