Predicting Exercise Histories of Eating-Disordered and Control Subjects Using Machine Learning Classification Techniques
We are going to classify the exercise histories of eating-disordered and control subjects using machine learning classification techniques such as Decision Tree, Logistic Regression, Random Forest, Neural Networks and Perceptron.
- Loading and preprocessing our Data
- Splitting the data into training and testing samples.
- Using different Machine Learning techniques like Decision Tree, Logistic Regression, Random Forest, Neural Networks and Perceptron.
- Evaluating the different setups with suitable measures such as Accuracy, Precision and Recall.
- Concluding the best model.
- Experiments with Regularization
About the Data:
The dataset is Blackmore.csv. It has 946 rows and 5 columns; the columns include Subject, age, exercise and group. The target label is group: each record is classified as either patient or control.
For binary classification, we want to assign each record to one of two groups, usually represented as 0 and 1 in the data.
1) Loading data and preprocessing:
This code reads the dataset using pandas.
The code below deletes the first row and two columns, and assigns 0 for ‘control’ and 1 for ‘patient’.
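A minimal sketch of this loading and preprocessing step is given here. It reads a tiny in-memory stand-in rather than the real file, so the four rows and their values are invented for illustration. Two things are assumptions: that the two deleted columns are the unnamed row index and the subject ID, and that the file stores its row index in an unnamed first column.

```python
import io
import pandas as pd

# Invented four-row stand-in for Blackmore.csv (values are illustrative only).
# With the real file this would be: df = pd.read_csv("Blackmore.csv")
sample_csv = io.StringIO(
    ",subject,age,exercise,group\n"
    "1,101,8.0,1.5,patient\n"
    "2,101,10.0,2.0,patient\n"
    "3,202,8.0,0.5,control\n"
    "4,202,10.0,0.8,control\n"
)
df = pd.read_csv(sample_csv)

# Delete the first row and two columns. Which two columns are removed is an
# assumption: here, the unnamed index column and the subject ID.
df = (
    df.iloc[1:]
    .drop(columns=[df.columns[0], "subject"])
    .reset_index(drop=True)
)

# Encode the target: 0 for 'control', 1 for 'patient'.
df["group"] = df["group"].map({"control": 0, "patient": 1})

print(df)
```

Encoding the label as 0/1 up front keeps every later model and metric working on the same numeric target.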
2) Splitting the data into training samples and testing samples:
Dividing the dataset into 70% training and 30% testing.
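One way to do a 70/30 split is scikit-learn's `train_test_split`. The sketch below assumes scikit-learn is the library in use and runs on synthetic data standing in for the age and exercise features; `random_state` and `stratify` are illustrative choices, not settings taken from this post.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: two features (think age, exercise) and a 0/1 label.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = rng.integers(0, 2, size=100)

# 70% training, 30% testing. stratify=y keeps the class ratio similar in
# both parts; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
```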
3) Using Classification techniques and finding accuracy of our model:
1. Decision Tree:
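A hedged sketch of the decision tree step, assuming scikit-learn and synthetic stand-in data. Note that scikit-learn implements an optimized version of CART only; setting `criterion="entropy"` uses the information-gain style split that ID3 is known for, so the second model here is an ID3-like approximation rather than a true ID3 tree.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: the label depends noisily on the second feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 1] + rng.normal(size=300) > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# CART-style tree: splits chosen by Gini impurity.
cart = DecisionTreeClassifier(criterion="gini", random_state=0)
cart.fit(X_train, y_train)
cart_acc = accuracy_score(y_test, cart.predict(X_test))

# ID3-like tree: splits chosen by entropy (information gain).
id3_like = DecisionTreeClassifier(criterion="entropy", random_state=0)
id3_like.fit(X_train, y_train)
id3_acc = accuracy_score(y_test, id3_like.predict(X_test))

print(f"CART accuracy:     {cart_acc:.2f}")
print(f"ID3-like accuracy: {id3_acc:.2f}")
```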
2. Logistic Regression:
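A minimal logistic regression sketch, assuming scikit-learn's `LogisticRegression` with its default settings and synthetic stand-in data in place of the real features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: the label depends noisily on the second feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 1] + rng.normal(size=300) > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Fit on the training part, score on the held-out part.
logreg = LogisticRegression()
logreg.fit(X_train, y_train)
logreg_acc = accuracy_score(y_test, logreg.predict(X_test))

print(f"Logistic Regression accuracy: {logreg_acc:.2f}")
print("Coefficients:", logreg.coef_)
```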
3. Random Forest:
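A minimal random forest sketch under the same assumptions (scikit-learn, synthetic stand-in data). The number of trees is an illustrative choice, not a value from this post.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: the label depends noisily on the second feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 1] + rng.normal(size=300) > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# An ensemble of 100 trees, each trained on a bootstrap sample.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
forest_acc = accuracy_score(y_test, forest.predict(X_test))

print(f"Random Forest accuracy: {forest_acc:.2f}")
print("Feature importances:", forest.feature_importances_)
```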
4. Neural Networks:
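A minimal neural network sketch, assuming scikit-learn's `MLPClassifier`. The hidden layer size and `max_iter` are illustrative values, and the data is a synthetic stand-in.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data: the label depends noisily on the second feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 1] + rng.normal(size=300) > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# A small multi-layer perceptron with one hidden layer of 10 units.
mlp = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=0)
mlp.fit(X_train, y_train)
mlp_acc = accuracy_score(y_test, mlp.predict(X_test))

print(f"Neural Network accuracy: {mlp_acc:.2f}")
```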
4) Analyzing Different Classification Metrics like Precision, Recall, Accuracy etc:
We have already analyzed all these classification metrics for all the above classification techniques.
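As a reminder of what these metrics measure, here is a small worked example on hand-written labels (invented for illustration, not results from this dataset), using scikit-learn's metric functions. With 1 = patient as the positive class, there are 3 true positives, 2 false positives, 1 false negative and 2 true negatives.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Invented labels: 1 = patient (positive class), 0 = control.
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 1]

accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / all = 5 / 8
precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3 / 5
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 3 / 4

print(f"Accuracy:  {accuracy:.3f}")
print(f"Precision: {precision:.3f}")
print(f"Recall:    {recall:.3f}")
```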
5) Analyzing and concluding the best model from these Classification Algorithms:
Now we compare the predictive accuracy of all these models to see which classification technique is most accurate for this dataset:
- Decision Tree (CART) : 0.58
- Decision Tree ID3 : 0.57
- Logistic Regression : 0.63
- Random Forest : 0.63
- Neural Networks : 0.63
- Perceptron : 0.63
This shows that, for this specific dataset, Logistic Regression, Random Forest, Neural Networks and Perceptron tie for the highest accuracy (0.63), ahead of both decision trees.
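A comparison like this can be produced by fitting every model on the same split and collecting the scores. The sketch below assumes scikit-learn and runs on synthetic stand-in data, so its numbers will not match the table above; the hyperparameters are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: the label depends noisily on the second feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 1] + rng.normal(size=300) > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

models = {
    "Decision Tree (gini)": DecisionTreeClassifier(criterion="gini", random_state=0),
    "Decision Tree (entropy)": DecisionTreeClassifier(criterion="entropy", random_state=0),
    "Logistic Regression": LogisticRegression(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Neural Network": MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=0),
    "Perceptron": Perceptron(random_state=0),
}

# Fit each model on the same training data; score() returns test accuracy.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = model.score(X_test, y_test)

for name, acc in sorted(results.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<24} {acc:.2f}")
```

Using one shared split for all models keeps the comparison fair, since every score is computed on the same held-out rows.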
Experiments with Regularization:
We select the neural network for this experiment and change its alpha value. Alpha controls the strength of regularization: if alpha is zero there is no regularization, and the higher the alpha, the more the regularization term influences the final model.
- alpha = 0.1: accuracy = 0.64
- alpha = 0.9: accuracy = 0.63
So different alpha values give different accuracies: alpha = 0.1 scores 0.64, slightly higher than the 0.63 for alpha = 0.9. For this dataset, we therefore use alpha = 0.1.
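The experiment can be sketched as a loop over alpha values. This assumes the alpha in question is the L2 penalty parameter of scikit-learn's `MLPClassifier`; the data is a synthetic stand-in and the other hyperparameters are illustrative, so the accuracies printed will differ from those reported above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data: the label depends noisily on the second feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 1] + rng.normal(size=300) > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train the same network with two regularization strengths and compare.
alpha_scores = {}
for alpha in (0.1, 0.9):
    mlp = MLPClassifier(
        hidden_layer_sizes=(10,), alpha=alpha, max_iter=1000, random_state=0
    )
    mlp.fit(X_train, y_train)
    alpha_scores[alpha] = mlp.score(X_test, y_test)
    print(f"Accuracy for alpha {alpha} = {alpha_scores[alpha]:.2f}")
```

Keeping `random_state` fixed means the only thing that changes between the two runs is alpha, so any difference in accuracy comes from the regularization strength.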
In this post, we applied several machine learning classification techniques and used the accuracy of each one to find the technique best suited to the given dataset.