Ethics code: IR.YUMS.REC.1402.152
History
Received: 2025/08/31 | Accepted: 2025/09/23 | Published: 2025/07/1
How to cite this article
Ghaderzadeh M, Salehnasab C. Filter-Based Feature Selection for Type II Diabetes Prediction: A Comparative Study of ANOVA, Mutual Information, and Chi-Square Tests. J Clinic Care Skill 2025; 6(3): 1001-1013
URL: http://jccs.yums.ac.ir/article-1-427-en.html
1- , cirruse.salehnasab@gmail.com
Abstract
Aims: Type 2 diabetes mellitus (T2DM) is a major global health challenge, and early prediction is key to prevention. This study compared three filter-based feature selection methods—ANOVA (f_classif), Mutual Information, and Chi-Square—for identifying predictors of T2DM and assessed their impact on Logistic Regression performance.
Methods: Data from 3,203 adults aged 35–70 years in the Dena-PERSIAN cohort were analyzed, including 402 (12.55%) with T2DM. Preprocessing included imputation, normalization, and class balancing with the Synthetic Minority Oversampling Technique (SMOTE). Each method ranked predictors, and the top five features were used to train Logistic Regression models. Model performance was evaluated on a test set using Accuracy, Precision, Recall, and F1-score.
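As a rough illustration of the filter-based ranking step described in the Methods, the sketch below computes a one-way ANOVA F statistic per feature (the quantity that f_classif is based on) and keeps the top-scoring features. It is a minimal pure-Python sketch, not the authors' pipeline: the function names `anova_f` and `top_k_features`, the feature names, and the toy values are all hypothetical, and imputation, normalization, and SMOTE are omitted.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F statistic for one numeric feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand = mean(values)
    # Between-group sum of squares: distance of each class mean from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    # Within-group sum of squares: spread of samples around their own class mean.
    # Assumes this is non-zero, i.e. the feature is not constant within every class.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_k_features(columns, labels, k):
    """Rank features by F score (highest first) and return the k best names."""
    scores = {name: anova_f(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data, not taken from the Dena-PERSIAN cohort.
labels = [0, 0, 0, 1, 1, 1]
columns = {
    "fbs": [90, 95, 100, 130, 140, 150],
    "age": [40, 50, 60, 45, 55, 65],
    "noise": [1, 2, 3, 3, 2, 1],
}
print(top_k_features(columns, labels, k=2))
```

In the study the same idea is applied with k = 5 and with three different scoring functions; swapping the scorer (for example to a mutual information or chi-square statistic) changes only the `anova_f` call in this sketch.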
Findings: Fasting blood sugar and age consistently emerged as dominant predictors across all three methods. ANOVA highlighted metabolic variables (triglycerides, fatty liver, kidney stones), Mutual Information (MI) emphasized HDL cholesterol and lifestyle behaviors, and Chi-Square prioritized categorical comorbidities. Logistic Regression performed best with the ANOVA and MI feature sets (Accuracy and F1 = 0.84), slightly outperforming the Chi-Square feature set (Accuracy and F1 = 0.82).
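For readers less familiar with the reported metrics, the following sketch shows how Accuracy, Precision, Recall, and F1-score can be derived from predicted and true labels. It assumes a binary setting with T2DM as the positive class; the abstract does not state which averaging scheme was used, so this is only one plausible reading. The function name `classification_metrics` and the label vectors are hypothetical and do not reproduce the study's results.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when no positives are predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for illustration only (1 = T2DM, 0 = no T2DM).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Because SMOTE balances only the training data, metrics computed on an imbalanced test set can differ noticeably between accuracy and the positive-class F1, which is why reporting several metrics together is useful.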
Conclusion: ANOVA and Mutual Information produced clinically meaningful and stable feature subsets for T2DM prediction, centered on fasting glucose, age, and fatty liver. Their complementary strengths—ANOVA in metabolic signals and MI in lifestyle and lipid factors—support their use in interpretable risk models and potential integration into hybrid frameworks for early diabetes detection.