This project compares the performance of four classifiers, namely K-Nearest Neighbors (KNN), Random Forest, Multi-Layer Perceptron (MLP), and Support Vector Machine (SVM), on pattern classification tasks. It uses four different datasets: Iris, Breast Cancer, Digits, and Wine.
The project allows the user to choose one of the following datasets for classification (loading each via scikit-learn is sketched after the list):
- Iris: A dataset containing measurements of iris flowers from three different species.
- Breast Cancer: A dataset containing features of breast cancer tumors, categorized as malignant or benign.
- Digits: A dataset of handwritten digits (0-9) represented as 8x8 images.
- Wine: A dataset with chemical analysis results of wines from three different cultivars.
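A minimal sketch of how the chosen dataset name could be mapped to the corresponding scikit-learn loader; the dictionary keys and prompt wording are assumptions for illustration, not taken from `classifier_comparison.py`:

```python
from sklearn.datasets import load_iris, load_breast_cancer, load_digits, load_wine

# Hypothetical mapping from the name the user types to the scikit-learn loader.
DATASET_LOADERS = {
    "iris": load_iris,
    "breast_cancer": load_breast_cancer,
    "digits": load_digits,
    "wine": load_wine,
}

name = input("Enter the dataset name (iris, breast_cancer, digits, wine): ").strip().lower()
data = DATASET_LOADERS[name]()  # each loader returns a Bunch with .data and .target
X, y = data.data, data.target
print(f"Loaded {name}: {X.shape[0]} samples, {X.shape[1]} features, {len(set(y))} classes")
```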
- The user is prompted to enter the name of the dataset they want to use for classification.
- The chosen dataset is loaded using scikit-learn's `load_*` functions.
- The dataset is divided into training and testing sets using an 80:20 train-test split.
- Preprocessing steps, such as data normalization or feature selection, can be applied to the data (not implemented in the provided code).
- Four classifiers (KNN, Random Forest, MLP, and SVM) are trained on the training set.
- The trained classifiers are evaluated using the testing set.
- Evaluation metrics, including accuracy, precision, recall, and F1 score, are calculated for each classifier.
- Comparison bar graphs and confusion matrices are generated to visualize the performance of each classifier (a sketch of this end-to-end workflow follows this list).
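The sketch below shows how these steps could fit together, assuming `X` and `y` have already been loaded as in the earlier snippet; the classifier settings and plotting details are illustrative rather than the exact contents of `classifier_comparison.py`:

```python
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, ConfusionMatrixDisplay)

# 80:20 train-test split, stratified so class proportions are preserved.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

classifiers = {
    "KNN": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(random_state=42),
    "MLP": MLPClassifier(max_iter=1000, random_state=42),
    "SVM": SVC(random_state=42),
}

scores = {}
for clf_name, clf in classifiers.items():
    clf.fit(X_train, y_train)          # train on the training set
    y_pred = clf.predict(X_test)       # evaluate on the held-out test set
    scores[clf_name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        # macro averaging handles the multi-class datasets (Iris, Digits, Wine)
        "precision": precision_score(y_test, y_pred, average="macro"),
        "recall": recall_score(y_test, y_pred, average="macro"),
        "f1": f1_score(y_test, y_pred, average="macro"),
    }
    # One confusion matrix per classifier.
    ConfusionMatrixDisplay.from_predictions(y_test, y_pred)
    plt.title(f"{clf_name} confusion matrix")

# Grouped bar chart comparing the four metrics across classifiers.
metrics = ["accuracy", "precision", "recall", "f1"]
fig, ax = plt.subplots()
width = 0.2
for i, metric in enumerate(metrics):
    values = [scores[clf_name][metric] for clf_name in classifiers]
    ax.bar([x + i * width for x in range(len(classifiers))], values, width, label=metric)
ax.set_xticks([x + 1.5 * width for x in range(len(classifiers))])
ax.set_xticklabels(classifiers.keys())
ax.set_ylabel("Score")
ax.legend()
plt.show()
```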
- Ensure that you have Python installed on your system.
- Install the required dependencies by running the following command: `pip install -r requirements.txt` (a sample set of dependencies is sketched after this list).
- Run the code by executing the script file: `python classifier_comparison.py`.
- Follow the on-screen instructions to select the dataset.
- The code will output the evaluation metrics and generate comparison graphs and confusion matrices.
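The repository's `requirements.txt` is not reproduced here; a plausible minimal set of dependencies for the workflow described above would be:

```text
# Assumed contents of requirements.txt; pin versions as appropriate for your environment.
scikit-learn
matplotlib
numpy
```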
To further enhance the project, the following improvements could be considered (a combined sketch follows the list):
- Implement cross-validation to obtain more reliable performance estimates.
- Perform hyperparameter tuning using techniques like grid search for each classifier.
- Utilize scikit-learn's `Pipeline` to streamline the workflow and make the code more modular.
- Explore feature engineering techniques to improve classification performance.
- Consider incorporating ensemble methods to further boost classifier performance.
- Experiment with additional classifiers available in scikit-learn.
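As a hedged illustration of several of these ideas together, the sketch below wraps scaling and an SVM in a `Pipeline`, tunes hyperparameters with `GridSearchCV` (which performs cross-validation internally), and combines classifiers in a simple voting ensemble. It assumes `X_train` and `y_train` from the earlier sketch; the parameter grid and estimator names are illustrative assumptions, not part of the existing code:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier, VotingClassifier

# Pipeline: normalization (currently not implemented in the project) followed by an SVM.
svm_pipe = Pipeline([
    ("scale", StandardScaler()),
    ("svm", SVC()),
])

# Hyperparameter tuning with grid search; 5-fold cross-validation gives more
# reliable performance estimates than a single 80:20 split.
param_grid = {
    "svm__C": [0.1, 1, 10],
    "svm__gamma": ["scale", 0.01, 0.1],
}
search = GridSearchCV(svm_pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)
print("Best SVM params:", search.best_params_, "CV accuracy:", search.best_score_)

# A simple ensemble of the existing classifiers via majority voting.
ensemble = VotingClassifier([
    ("knn", KNeighborsClassifier()),
    ("rf", RandomForestClassifier(random_state=42)),
    ("svm", search.best_estimator_),
])
print("Ensemble CV accuracy:",
      cross_val_score(ensemble, X_train, y_train, cv=5).mean())
```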
This project is licensed under the MIT License.