Abstract
Early detection of neurodegenerative diseases such as Alzheimer's disease (AD), mild cognitive impairment (MCI), and Parkinson's disease (PD) is crucial for timely intervention and improved patient outcomes. This study presents a machine learning framework for non-invasive diagnosis from voice signals recorded on smartphones. The model was trained on the Slovak EWA-DB speech database and evaluated on a large cohort of patients with AD-MCI, patients with PD, and healthy controls (HC). It performs multi-class classification, a harder task than binary classification. The model achieved 93.5% accuracy, 93.6% sensitivity, and an F1 score of 88.5%, comparable to results reported on EWA-DB for binary classification tasks. These results were made possible by thorough data preprocessing: stratified sampling by age and diagnosis, class balancing through synthetic oversampling of minority classes, and dimensionality reduction through principal component analysis (PCA) that preserves the key information in the data. In future work, we plan to apply the approach to the separate multi-class classification of AD, MCI, and HC, with the aim of supporting non-invasive screening strategies in clinical practice. The proposed approach highlights the potential of combining speech biomarkers with machine learning to improve early diagnosis across multiple diagnostic classes.
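The following is a minimal sketch of the preprocessing chain and the evaluation metrics named in the abstract. It uses NumPy only. Several parts are illustrative assumptions and do not come from the authors' reported configuration:

* the function names (`stratified_split`, `smote_like`, `fit_pca`, `transform`, `macro_metrics`),
* the integer encoding of diagnoses and age bands,
* the 95% explained-variance threshold,
* k = 5 neighbours,
* macro averaging of sensitivity and F1.

The oversampling step is a SMOTE-style interpolation written from scratch. It is not any specific library's implementation.

```python
import numpy as np

def stratified_split(labels, age_bands, test_frac=0.2, seed=0):
    """Split indices so every (diagnosis, age band) stratum appears in both sets.
    Age bands are a hypothetical integer encoding of age groups."""
    rng = np.random.default_rng(seed)
    strata = {}
    for i, key in enumerate(zip(labels, age_bands)):
        strata.setdefault(key, []).append(i)
    train, test = [], []
    for idx in strata.values():
        idx = rng.permutation(idx)
        n_test = int(round(test_frac * len(idx)))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return np.array(sorted(train)), np.array(sorted(test))

def smote_like(X, y, k=5, seed=0):
    """Oversample each minority class up to the majority-class count by
    interpolating between a sample and one of its k nearest same-class neighbours."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for c, n in zip(classes, counts):
        need = target - n
        if need == 0:
            continue
        Xc = X[y == c]
        kk = min(k, len(Xc) - 1)
        if kk < 1:
            # A single-sample class has no neighbours, so duplicate it instead.
            X_parts.append(np.repeat(Xc, need, axis=0))
        else:
            # Pairwise Euclidean distances within the class; exclude self-matches.
            d = np.linalg.norm(Xc[:, None, :] - Xc[None, :, :], axis=2)
            np.fill_diagonal(d, np.inf)
            neighbours = np.argsort(d, axis=1)[:, :kk]
            base = rng.integers(0, len(Xc), size=need)
            pick = neighbours[base, rng.integers(0, kk, size=need)]
            gap = rng.random((need, 1))  # interpolation factor in [0, 1)
            X_parts.append(Xc[base] + gap * (Xc[pick] - Xc[base]))
        y_parts.append(np.full(need, c))
    return np.vstack(X_parts), np.concatenate(y_parts)

def fit_pca(X, var_kept=0.95):
    """Return the mean and the smallest set of principal axes whose cumulative
    explained variance reaches var_kept (an assumed threshold)."""
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    n = int(np.searchsorted(ratio, var_kept) + 1)
    return mean, vt[:n]

def transform(X, mean, components):
    """Project samples onto the retained principal axes."""
    return (X - mean) @ components.T

def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged sensitivity (recall) and F1 over all classes."""
    classes = np.unique(np.concatenate([y_true, y_pred]))
    recalls, f1s = [], []
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))
        fn = np.sum((y_pred != c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        rec = tp / (tp + fn) if tp + fn else 0.0
        prec = tp / (tp + fp) if tp + fp else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
        recalls.append(rec)
    return {"accuracy": float(np.mean(y_true == y_pred)),
            "sensitivity": float(np.mean(recalls)),
            "f1": float(np.mean(f1s))}
```

In this sketch, oversampling and PCA are fitted on the training split only. As a result, synthetic samples and projection axes never draw on test recordings. The abstract does not state whether the original study followed the same order.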

This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright (c) 2025 Information Technology Applications
