1. Data set

The selected data set is the iris data set (a classification problem) from Python's scikit-learn library.

Data Set Characteristics:

:Number of Instances: 150 (50 in each of three classes)

:Number of Attributes: 4 numeric, predictive attributes and the class

:Attribute Information:

- sepal length in cm

- sepal width in cm

- petal length in cm

- petal width in cm

- class:

- Iris-Setosa

- Iris-Versicolour

- Iris-Virginica

:Missing Attribute Values: None

:Class Distribution: 33.3% for each of 3 classes.

:Creator: R.A. Fisher

:Donor: Michael Marshall

:Date: July, 1988

This is a copy of the UCI ML iris data set.
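The characteristics listed above can be checked directly against the copy shipped with scikit-learn. This is a minimal sketch, assuming scikit-learn is installed; the variable names are illustrative only.

```python
from collections import Counter

from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target  # X: feature matrix, y: integer class labels

print(X.shape)             # (number of instances, number of attributes)
print(iris.feature_names)  # the four measurements, in cm
print(iris.target_names)   # class names as stored by scikit-learn

class_counts = Counter(y.tolist())
print(class_counts)        # instances per class
```

Note that scikit-learn stores the class names as setosa, versicolor and virginica, without the "Iris-" prefix used in the description.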


The famous Iris database was first used by Sir R.A. Fisher.

This is perhaps the best known database to be found in the pattern recognition literature. Fisher's paper is a classic in the field and is referenced frequently to this day. (See Duda & Hart, for example.)

The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other.
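The separability claim can be illustrated with a single feature. The sketch below, assuming scikit-learn is installed, compares petal length ranges per class: if every setosa value lies below every value from the other two classes, a simple threshold (which is a linear boundary) separates setosa from the rest. The overlap check for the other two classes is only an illustration on one feature, not a proof that they are not linearly separable.

```python
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target
petal_length = X[:, 2]  # column 2 is petal length in cm

# Setosa (label 0) versus the other two classes
setosa_max = petal_length[y == 0].max()
others_min = petal_length[y != 0].min()
setosa_separable = setosa_max < others_min
print("setosa max:", setosa_max, "others min:", others_min)

# Versicolor (label 1) versus virginica (label 2): do the ranges overlap?
versicolor_max = petal_length[y == 1].max()
virginica_min = petal_length[y == 2].min()
ranges_overlap = versicolor_max > virginica_min
print("versicolor max:", versicolor_max, "virginica min:", virginica_min)
```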

2. The classification algorithms

  1. Gaussian Naive Bayes - a member of the family of simple probabilistic classifiers that apply Bayes' theorem with strong (naive) independence assumptions between the features. The Gaussian variant models each feature as normally distributed within each class.
  2. KNeighborsClassifier - an implementation of k-nearest neighbors (k-NN), a non-parametric method that can be used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. For classification, the output is the class chosen by a majority vote among those neighbors.
  3. Support Vector Classification (SVC) - the implementation is based on libsvm. The fit time complexity is more than quadratic in the number of samples, which makes it hard to scale to data sets with more than a couple of tens of thousands of samples. Multiclass support is handled according to a one-vs-one scheme.
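The three classifiers listed above could be created in scikit-learn as follows. This is a sketch only: the post does not state which hyperparameters were used, so default settings are assumed here, and the models are fitted on the full data set purely to show the shared fit / predict interface.

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Default hyperparameters are an assumption, not taken from the post
classifiers = {
    "Gaussian Naive Bayes": GaussianNB(),
    "KNeighborsClassifier": KNeighborsClassifier(),  # default n_neighbors=5
    "SVC": SVC(),  # default RBF kernel, one-vs-one for multiclass
}

first_sample_predictions = {}
for name, clf in classifiers.items():
    clf.fit(X, y)
    # Predict the class of the first instance (a setosa flower, label 0)
    first_sample_predictions[name] = int(clf.predict(X[:1])[0])
    print(name, first_sample_predictions[name])
```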

3. Training and testing data split

The train / test ratio is 70 / 30, so 105 of the 150 instances are used for training and 45 for testing.

Size of X_train - 105

Size of X_test - 45

Size of Y_train - 105

Size of Y_test - 45
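A 70 / 30 split with these sizes can be produced with scikit-learn's train_test_split. This is a minimal sketch: the random_state value is an assumption added for repeatability, since the post does not say which seed, if any, was used.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# test_size=0.3 gives the 70 / 30 ratio; random_state=0 is an assumed seed
X_train, X_test, Y_train, Y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

print("Size of X_train -", len(X_train))
print("Size of X_test -", len(X_test))
print("Size of Y_train -", len(Y_train))
print("Size of Y_test -", len(Y_test))
```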


                  Gaussian Naive Bayes   KNeighborsClassifier   SVC
Accuracy score    0.977777777778         1.0                    1.0
F-score           0.977585377585         1.0                    1.0
Precision score   0.97908496732          1.0                    1.0

So we can see that the KNeighborsClassifier and SVC classifiers give the best results, each scoring 1.0 on all three metrics for this test split.
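Scores of this kind could be computed along the following lines. This is a sketch under several assumptions not stated in the post: default hyperparameters, a random_state of 0 for the split, and weighted averaging for the multiclass F-score and precision. With a different seed or averaging mode the numbers will differ from those reported above.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, f1_score, precision_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, y, test_size=0.3, random_state=0  # assumed seed
)

classifiers = {
    "Gaussian Naive Bayes": GaussianNB(),
    "KNeighborsClassifier": KNeighborsClassifier(),
    "SVC": SVC(),
}

scores = {}
for name, clf in classifiers.items():
    # Train on the 105 training instances, evaluate on the 45 held out
    Y_pred = clf.fit(X_train, Y_train).predict(X_test)
    scores[name] = {
        "accuracy": accuracy_score(Y_test, Y_pred),
        # average="weighted" is an assumption about how the post averaged classes
        "f_score": f1_score(Y_test, Y_pred, average="weighted"),
        "precision": precision_score(Y_test, Y_pred, average="weighted"),
    }
    print(name, scores[name])
```

Because the test set has only 45 instances, a single misclassification changes the accuracy by about 0.022, so small differences between classifiers on one split should be read with caution.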
