The dataset contained spirometry investigation reports of 1314 patients from the Institute of Pulmocare and Research (IPCR), Kolkata, who were diagnosed with obstructive or non-obstructive diseases. The patients were divided into two groups, Group A and Group B, consisting of 1163 and 151 patients, respectively. Reports of patients diagnosed with obstructive diseases were labelled positive, and reports of patients with non-obstructive diseases were labelled negative. The reports in Group A were used for training and testing with cross validation (CV-dataset), and the reports in Group B were used as a blind validation dataset. A summary of the dataset is given in Table - 1.
Table - 1: Summary of patient groups in the dataset.
| | Group A | Group B |
|---|---|---|
| Usage | Training and testing with 5-fold cross validation | Blind dataset for validation |
| Patient count | 1163 | 151 |
| Total number of spirometry reports | 1172 | 154 |
| Number of obstructive spirometry reports | 1006 | 103 |
| Number of non-obstructive spirometry reports | 166 | 51 |
In spirometry, patients are asked to take a maximal inspiration and then expel the air into a mouthpiece as forcefully and quickly as possible. The test is repeated after a bronchodilator is administered. The pre- and post-bronchodilator values of the following three metrics were used as input:
For each of the above three metrics, there are 4 attributes, giving a total of 12 input attributes.
Supervised machine learning (ML) models were developed for the classification task using Support Vector Machine (SVM), Random Forest (RF), Naive Bayes (NB) and Multi-layer Perceptron (MLP) algorithms. Several performance metrics, namely accuracy, sensitivity, specificity, F1-score, Matthews correlation coefficient (MCC) and area under the receiver operating characteristic curve (AUROC), were computed and compared. The optimal model was chosen as the one with the highest MCC.
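The metrics listed above have standard definitions in terms of the confusion matrix (TP, FN, TN, FP, with obstructive as the positive class). The following is a minimal standard-library sketch of those definitions. It also computes AUROC as the probability that a randomly chosen positive report scores higher than a randomly chosen negative one, and it picks the model with the highest MCC from the cross-validation values in Table - 2. The function names are hypothetical, this is not the code used to build the server, and the confusion-matrix counts in the example are illustrative only.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Standard confusion-matrix metrics (positive = obstructive)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1, "mcc": mcc}


def auroc(scores_pos, scores_neg):
    """AUROC as P(score of a random positive > score of a random negative); ties count 1/2."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Illustrative (hypothetical) counts, not results from this study.
print(classification_metrics(tp=90, fn=10, tn=40, fp=10))

# Model selection by highest MCC, using the 5-fold CV values reported in Table - 2.
cv_mcc = {
    ("whole", "SVM"): 0.532, ("whole", "RF"): 0.597,
    ("whole", "NB"): 0.495, ("whole", "MLP"): 0.645,
    ("undersampled", "SVM"): 0.650, ("undersampled", "RF"): 0.647,
    ("undersampled", "NB"): 0.607, ("undersampled", "MLP"): 0.682,
}
best = max(cv_mcc, key=cv_mcc.get)
print(best)  # ('undersampled', 'MLP')
```

MCC is a natural selection criterion here because, unlike accuracy, it stays low when a model does well on the majority (obstructive) class but poorly on the minority class.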
The training dataset used for cross validation was highly imbalanced, with a positive-to-negative ratio (P:N) of about 6:1. To handle this imbalance, an undersampling method was used: the majority (positive) class samples were randomly divided into six disjoint, exhaustive subsets, and the minority (negative) class samples were concatenated with each positive subset to obtain six undersampled datasets with P:N of approximately 1:1. One model was trained on each undersampled dataset, and the performance metrics of the six models were averaged.
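A minimal sketch of the undersampling scheme described above, assuming the samples are held in plain Python lists. `undersample_partitions` is a hypothetical helper, and the fixed seed and round-robin split are illustrative choices. The text does not state whether the split was made before or within each cross-validation fold. The usage example uses the Group A report counts from Table - 1 (1006 positive, 166 negative).

```python
import random


def undersample_partitions(positives, negatives, n_subsets=6, seed=0):
    """Split the majority (positive) class into n_subsets disjoint, exhaustive
    subsets and pair each subset with all minority (negative) samples."""
    rng = random.Random(seed)
    shuffled = list(positives)
    rng.shuffle(shuffled)
    # Round-robin slicing gives near-equal, disjoint subsets covering every positive.
    subsets = [shuffled[i::n_subsets] for i in range(n_subsets)]
    return [subset + list(negatives) for subset in subsets]


pos = [("obstructive", i) for i in range(1006)]
neg = [("non-obstructive", i) for i in range(166)]
for dataset in undersample_partitions(pos, neg):
    n_pos = sum(1 for label, _ in dataset if label == "obstructive")
    print(n_pos, len(dataset) - n_pos)  # roughly 1:1 in every dataset
```

Each of the six datasets would then train one model, and the six sets of metrics would be averaged.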
Hyperparameters were tuned for each ML algorithm using grid search, an exhaustive search over a parameter grid formed by the Cartesian product of pre-specified sets of candidate values for each hyperparameter. Hyperparameter optimization was performed separately for both sets of models: those trained on the whole training set and those trained on the undersampled datasets. The optimal model was saved and is used in this prediction server.
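Grid search as described can be sketched with `itertools.product`, which yields exactly the Cartesian product of the candidate value lists. The grid values and the `toy_score` function below are hypothetical placeholders: in practice the score would be something like the mean cross-validated MCC, and the actual grids used for the four algorithms are not listed here.

```python
import itertools


def grid_search(param_grid, score_fn):
    """Exhaustively evaluate every combination in the Cartesian product of the grid."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid: 3 x 2 = 6 combinations are evaluated.
grid = {
    "hidden_layer_sizes": [(50,), (100,), (100, 100)],
    "learning_rate_init": [0.01, 0.001],
}


def toy_score(params):
    # Hypothetical stand-in for a cross-validated MCC.
    return len(params["hidden_layer_sizes"]) - abs(params["learning_rate_init"] - 0.001)


best_params, best_score = grid_search(grid, toy_score)
print(best_params, best_score)
```

The cost grows multiplicatively with each added hyperparameter, which is the main trade-off of exhaustive grid search.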
Table - 2: Performance of models with 5-fold cross validation
Dataset | Model | Accuracy | Sensitivity | Specificity | F1-score | MCC |
---|---|---|---|---|---|---|
Whole training dataset | Support Vector Machine (SVM) | 0.835 | 0.837 | 0.826 | 0.897 | 0.532 |
| | Random Forest (RF) | 0.906 | 0.955 | 0.609 | 0.946 | 0.597 |
| | Naive Bayes (NB) | 0.870 | 0.915 | 0.602 | 0.924 | 0.495 |
| | Multi-layer Perceptron (MLP) | 0.918 | 0.966 | 0.626 | 0.953 | 0.645 |
| Under-sampled datasets | Support Vector Machine (SVM) | 0.823 | 0.825 | 0.821 | 0.824 | 0.650 |
| | Random Forest (RF) | 0.822 | 0.832 | 0.811 | 0.824 | 0.647 |
| | Naive Bayes (NB) | 0.800 | 0.864 | 0.737 | 0.813 | 0.607 |
| | Multi-layer Perceptron (MLP) | 0.837 | 0.853 | 0.822 | 0.841 | 0.682 |
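The combination of high accuracy and low specificity in the whole-dataset rows follows from a standard identity: accuracy is the prevalence-weighted mean of sensitivity and specificity, since TP = P x Sensitivity and TN = N x Specificity. With P = 1006 positive and N = 166 negative reports in Group A (Table - 1), sensitivity dominates; after undersampling (P approximately equal to N) both terms carry equal weight. Because the reported values are averages over folds and models, the check holds only approximately.

```latex
\text{Accuracy} = \frac{TP + TN}{P + N}
  = \frac{P}{P + N}\,\text{Sensitivity} + \frac{N}{P + N}\,\text{Specificity}

\text{RF, whole training dataset: }
  \frac{1006 \times 0.955 + 166 \times 0.609}{1172} \approx 0.906

\text{SVM, under-sampled datasets } (P \approx N):
  \frac{0.825 + 0.821}{2} = 0.823
```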
The MLP model trained on the under-sampled datasets showed the best cross-validation performance, with an MCC of 0.682 and an accuracy of 83.7% (Table - 2). This model is used in this prediction server. The hyperparameters chosen by grid search for this MLP model specify an architecture with two hidden layers of 100 nodes each. The input and output layers have 12 and 1 nodes, respectively. The Adam weight optimizer with a constant learning rate of 0.001 and the rectified linear unit (ReLU) activation function were used. ROC plots of the different models are given in Figure - 1. The performance of the models on the blind dataset (Group B) is given in Table - 3.
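The selected architecture (12 inputs, two ReLU hidden layers of 100 nodes, one output node) can be illustrated with a forward pass in plain Python. This is a sketch under stated assumptions: the weights are random and untrained (Adam training is not shown), and the output node is assumed to use a logistic (sigmoid) activation so that it yields a probability of obstruction. The hyperparameter names resemble those of scikit-learn's `MLPClassifier` (`hidden_layer_sizes=(100, 100)`, `activation="relu"`, `solver="adam"`, `learning_rate="constant"`, `learning_rate_init=0.001`), but the text does not say which library was used.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random (Glorot-style uniform) weights and zero biases for one dense layer."""
    bound = math.sqrt(6.0 / (n_in + n_out))
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, activation):
    """Apply one fully connected layer followed by an element-wise activation."""
    weights, biases = layer
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


rng = random.Random(42)
# 12 inputs -> 100 ReLU -> 100 ReLU -> 1 output (untrained, illustrative weights)
layers = [init_layer(12, 100, rng), init_layer(100, 100, rng), init_layer(100, 1, rng)]


def predict_proba(x):
    """Forward pass; returns the (assumed sigmoid) output for one 12-attribute report."""
    h = dense(x, layers[0], relu)
    h = dense(h, layers[1], relu)
    return dense(h, layers[2], sigmoid)[0]


print(predict_proba([0.5] * 12))
```

Counting weights and biases, this architecture has 12 x 100 + 100 + 100 x 100 + 100 + 100 x 1 + 1 = 11,501 trainable parameters.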
Figure - 1: Receiver Operating Characteristic (ROC) plot of different models (σ: standard deviation).
Table - 3: Performance of models on the blind validation dataset (Group B)
Training Dataset | Model | Accuracy | Sensitivity | Specificity | F1-score | MCC |
---|---|---|---|---|---|---|
Whole training dataset | Support Vector Machine (SVM) | 0.853 | 0.897 | 0.765 | 0.891 | 0.667 |
| | Random Forest (RF) | 0.835 | 0.971 | 0.561 | 0.887 | 0.619 |
| | Naive Bayes (NB) | 0.823 | 0.944 | 0.580 | 0.877 | 0.586 |
| | Multi-layer Perceptron (MLP) | 0.857 | 0.986 | 0.596 | 0.902 | 0.677 |
| Under-sampled datasets | Support Vector Machine (SVM) | 0.854 | 0.898 | 0.766 | 0.892 | 0.669 |
| | Random Forest (RF) | 0.862 | 0.902 | 0.781 | 0.897 | 0.687 |
| | Naive Bayes (NB) | 0.855 | 0.926 | 0.712 | 0.895 | 0.665 |
| | Multi-layer Perceptron (MLP) | 0.849 | 0.886 | 0.774 | 0.887 | 0.663 |
Bhattacharjee S. et al., J Comput Sci (2022), 63:101768. doi: 10.1016/j.jocs.2022.101768. Please contact Dr. Sudipto Saha (ssaha4@jcbose.ac.in) regarding any further queries.