- Data set
The selected data set is the iris dataset (classification) from Python's scikit-learn library.
Data Set Characteristics:
:Number of Instances: 150 (50 in each of three classes)
:Number of Attributes: 4 numeric, predictive attributes and the class
:Attribute Information:
- sepal length in cm
- sepal width in cm
- petal length in cm
- petal width in cm
- class:
- Iris-Setosa
- Iris-Versicolour
- Iris-Virginica
:Missing Attribute Values: None
:Class Distribution: 33.3% for each of 3 classes.
:Creator: R.A. Fisher
:Donor: Michael Marshall (MARSHALL%[email protected])
:Date: July, 1988
This is a copy of UCI ML iris datasets.
http://archive.ics.uci.edu/ml/datasets/Iris
The famous Iris database, first used by Sir R.A. Fisher, is perhaps the best known database in the pattern recognition literature. Fisher's paper is a classic in the field and is referenced frequently to this day (see Duda & Hart, for example).
The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other.
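The dataset described above can be loaded directly from scikit-learn; a minimal sketch (variable names `X` and `y` are my own choice):

```python
from sklearn.datasets import load_iris

# Load the iris dataset bundled with scikit-learn
iris = load_iris()
X, y = iris.data, iris.target

print(X.shape)          # (150, 4): 150 instances, 4 numeric attributes
print(iris.target_names)  # the three iris classes
```

The `iris` object also exposes `feature_names` (the four sepal/petal measurements in cm) matching the attribute list above.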
- The classification algorithms
- Gaussian Naive Bayes - a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features.
- KNeighborsClassifier - is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression.
- Support Vector Classification - The implementation is based on libsvm. The fit time complexity is more than quadratic in the number of samples, which makes it hard to scale to datasets with more than a couple of tens of thousands of samples. The multiclass support is handled according to a one-vs-one scheme.
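The three classifiers above can be instantiated with scikit-learn defaults; a sketch (the dictionary layout and the explicit `n_neighbors=5`, which is scikit-learn's default, are my own choices):

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# One instance of each algorithm compared in this report
classifiers = {
    "Gaussian Naive Bayes": GaussianNB(),
    "KNeighborsClassifier": KNeighborsClassifier(n_neighbors=5),  # k nearest training examples
    "SVC": SVC(),  # libsvm-based, one-vs-one multiclass by default
}

for name in classifiers:
    print(name)
```

All three follow the same `fit`/`predict` interface, so they can be trained and evaluated in a single loop.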
- Training and testing data split
The ratio of train to test data is 70 / 30:
- Size of X_train: 105
- Size of X_test: 45
- Size of Y_train: 105
- Size of Y_test: 45
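The 70/30 split can be produced with `train_test_split`; a sketch (the `random_state=42` seed is my own assumption, added only to make the split reproducible):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()

# 70% of the 150 instances for training, 30% for testing
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.30, random_state=42
)

print(len(X_train), len(X_test))  # 105 45
```

With 150 instances, a 30% test fraction yields exactly the 105/45 sizes listed above.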
- Results
| | Gaussian Naive Bayes | KNeighborsClassifier | SVC |
|---|---|---|---|
| Accuracy score | 0.977777777778 | 1.0 | 1.0 |
| F-score | 0.977585377585 | 1.0 | 1.0 |
| Precision score | 0.97908496732 | 1.0 | 1.0 |
So we can see that the KNeighborsClassifier and SVC classifiers give the best results.
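The three metrics in the table can be computed with scikit-learn's metric functions; a sketch using Gaussian Naive Bayes (the `random_state=42` seed and the `average="weighted"` choice for the multiclass F-score and precision are my own assumptions, so the exact numbers depend on the split and may differ slightly from the table):

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, f1_score, precision_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.30, random_state=42
)

# Fit on the 105 training instances, predict on the 45 test instances
clf = GaussianNB().fit(X_train, y_train)
y_pred = clf.predict(X_test)

print("Accuracy score:", accuracy_score(y_test, y_pred))
print("F-score:", f1_score(y_test, y_pred, average="weighted"))
print("Precision score:", precision_score(y_test, y_pred, average="weighted"))
```

The same loop applied to `KNeighborsClassifier` and `SVC` fills in the remaining columns of the table.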