Ethics code: IR.YUMS.REC.1402.152
History
Received: 2025/05/28 | Accepted: 2025/07/02 | Published: 2025/07/08
1- Department of Medical Informatics, Boukan School of Medical Sciences, Urmia University of Medical Sciences, Urmia, Iran
2- Social Determinants of Health Research Center, Yasuj University of Medical Sciences, Yasuj, Iran
* Corresponding Author Address: Yasuj University of Medical Sciences, Shahid Motahari Boulevard, Yasuj, Kohgiluyeh and Boyer-Ahmad Province, Iran. Postal Code: 7591994799 (cirruse.salehnasab@gmail.com)
Abstract
Aims: Type 2 diabetes mellitus is a major global health challenge, and early prediction is key to prevention. This study compared three filter-based feature selection methods, namely ANOVA (f-classif), mutual information, and the Chi-square test, for identifying predictors of type 2 diabetes and assessed their impact on the performance of logistic regression.
Instrument & Methods: This retrospective study analyzed data from 3,203 adults aged 35-70 years from Yasuj, Kohgiluyeh and Boyer-Ahmad Province, Iran, gathered between 2020 and 2022 in the Dena-PERSIAN cohort, including 402 (12.55%) individuals with type 2 diabetes. Preprocessing included imputation, normalization, and class balancing using the synthetic minority oversampling technique. Each method ranked predictors, and the top five features were used to train logistic regression models. Model performance was evaluated on a test set using accuracy, precision, recall, and F1-score.
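As a minimal sketch of the filter-based ranking step described above, the following computes a one-way ANOVA F statistic per feature and keeps the top-scoring ones. This is the statistic that the f-classif approach is based on, but the code is an illustration only and not the authors' implementation. The function names (`anova_f`, `top_k_features`), the column names, and the toy values are all hypothetical, and preprocessing (imputation, normalization, oversampling) is omitted.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F statistic for one numeric feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand = mean(values)
    # Variation of the class means around the overall mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    # Variation of the samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_k_features(columns, labels, k=5):
    """Rank features by F statistic (highest first) and return the top k."""
    scores = {name: anova_f(vals, labels) for name, vals in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: 0 = no diabetes, 1 = type 2 diabetes.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "fasting_blood_sugar": [88, 92, 95, 90, 130, 142, 128, 150],
    "age": [40, 45, 38, 50, 55, 62, 58, 66],
    "noise": [3, 7, 5, 6, 4, 6, 5, 7],
}

selected, scores = top_k_features(columns, labels, k=2)
print(selected)
```

In this toy example the two features whose class means differ most relative to their within-class spread are retained, and the uninformative column is dropped. The study kept the top five features per method; the sketch uses two only because the toy data has three columns.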
Findings: Fasting blood sugar and age consistently emerged as dominant predictors across all three methods. ANOVA highlighted metabolic factors (triglycerides, fatty liver, and kidney stones), mutual information emphasized high-density lipoprotein cholesterol and lifestyle behaviors, and the Chi-square test prioritized categorical comorbidities. Logistic regression performed best with the ANOVA and mutual information feature sets (accuracy and F1-score of 0.84 each), slightly outperforming the Chi-square feature set (accuracy and F1-score of 0.82 each).
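For readers less familiar with the reported metrics, the sketch below shows how accuracy, precision, recall, and F1-score can be derived from test-set predictions. It assumes a binary, positive-class definition of precision, recall, and F1; the abstract does not state whether the reported scores were computed this way or averaged across classes. The function name and the example labels are hypothetical and are not study data.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 3 true positives, 1 false negative,
# 1 false positive, 3 true negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```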
Conclusion: ANOVA and mutual information produced clinically meaningful and stable feature subsets for type 2 diabetes prediction, centered on fasting blood sugar, age, and fatty liver.