**Data set**

The selected data set is the Iris data set (a classification task) from Python’s scikit-learn library.

**Data Set Characteristics:**

:Number of Instances: 150 (50 in each of three classes)

:Number of Attributes: 4 numeric, predictive attributes and the class

:Attribute Information:

- sepal length in cm

- sepal width in cm

- petal length in cm

- petal width in cm

- class:

  - Iris-Setosa

  - Iris-Versicolour

  - Iris-Virginica

:Missing Attribute Values: None

:Class Distribution: 33.3% for each of 3 classes.

:Creator: R.A. Fisher

:Donor: Michael Marshall

:Date: July, 1988

This is a copy of the UCI ML Iris data set.

http://archive.ics.uci.edu/ml/datasets/Iris

The famous Iris database was first used by Sir R.A. Fisher.

This is perhaps the best known database to be found in the pattern recognition literature. Fisher's paper is a classic in the field and is referenced frequently to this day. (See Duda & Hart, for example.)

The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other.
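The characteristics above can be checked directly. The following is a minimal sketch, assuming scikit-learn and NumPy are installed; the variable names are illustrative only. It loads the data, counts the instances per class, and verifies that a single threshold on petal length separates one class (setosa, encoded as label 0) from the other two.

```python
import numpy as np
from sklearn.datasets import load_iris

# Load the bundled copy of the Iris data set.
iris = load_iris()
X, y = iris.data, iris.target

print(X.shape)             # (instances, attributes)
print(iris.feature_names)  # the four numeric attributes
print(iris.target_names)   # the three class names

# Number of instances in each class (labels are 0, 1, 2).
class_counts = np.bincount(y)
print(class_counts)

# Petal length is the third attribute (column index 2).
petal_length = X[:, 2]
setosa_max = petal_length[y == 0].max()
others_min = petal_length[y != 0].min()

# If the largest setosa petal is shorter than the smallest petal of the
# other classes, any threshold between the two values separates setosa.
setosa_separable = setosa_max < others_min
print(setosa_max, others_min, setosa_separable)
```

A one-dimensional threshold is the simplest kind of linear separator, so this check is enough to confirm the separability of one class; it says nothing about the remaining two classes, which overlap.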

**The classification algorithms**

- Gaussian Naive Bayes - a member of the naive Bayes family of simple probabilistic classifiers, which apply Bayes' theorem with strong (naive) independence assumptions between the features. The Gaussian variant models the likelihood of each feature as a normal distribution.
- KNeighborsClassifier - a non-parametric k-nearest neighbors (k-NN) method. k-NN can be used for classification and regression; in both cases, the input consists of the k closest training examples in the feature space. For classification, as used here, the output is the class chosen by a vote among those k neighbors.
- Support Vector Classification (SVC) - The implementation is based on libsvm. The fit time complexity is more than quadratic in the number of samples, which makes it hard to scale to data sets with more than a few tens of thousands of samples. Multiclass support is handled according to a one-vs-one scheme.
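The three classifiers listed above correspond to scikit-learn estimator classes. The sketch below shows one way to instantiate them. The hyperparameters used for the reported results are not stated, so library defaults are assumed here, and the `classifiers` dictionary is a hypothetical helper for illustration.

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Assumption: default hyperparameters for every estimator.
classifiers = {
    "Gaussian Naive Bayes": GaussianNB(),
    "KNeighborsClassifier": KNeighborsClassifier(),  # default n_neighbors=5
    "SVC": SVC(),                                    # default RBF kernel
}

for name, clf in classifiers.items():
    print(name, type(clf).__name__)
```

Keeping the estimators in a dictionary makes it easy to train and score all of them with the same loop later on.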

**Training and testing data split**

The data is split into training and test sets in a 70 / 30 ratio.

Size of X_train - 105

Size of X_test - 45

Size of Y_train - 105

Size of Y_test - 45
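A 70 / 30 split with these sizes can be reproduced with `train_test_split`. This is a sketch under assumptions: the original random seed and any shuffling or stratification settings are not given, so `random_state=0` is an arbitrary choice made only for repeatability.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# test_size=0.3 holds out 30% of the 150 instances for testing.
# random_state=0 is an assumption, not the value used for the reported results.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

print(len(X_train), len(X_test), len(Y_train), len(Y_test))
```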

**Result**

| Metric | Gaussian Naive Bayes | KNeighborsClassifier | SVC |
|---|---|---|---|
| Accuracy score | 0.977777777778 | 1.0 | 1.0 |
| F-score | 0.977585377585 | 1.0 | 1.0 |
| Precision score | 0.97908496732 | 1.0 | 1.0 |

KNeighborsClassifier and SVC give the best results, each scoring 1.0 on all three metrics for this test split. Gaussian Naive Bayes follows closely, at about 0.98 on each metric.
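Scores of this kind can be computed with the functions in `sklearn.metrics`. The sketch below is not the original experiment. It assumes default hyperparameters, an arbitrary `random_state=0`, and `average="weighted"` for the multiclass F-score and precision, none of which are stated above, so the exact numbers may differ from the table.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, f1_score, precision_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Assumption: the seed is arbitrary; the original split is unknown.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Assumption: default hyperparameters for all three classifiers.
classifiers = {
    "Gaussian Naive Bayes": GaussianNB(),
    "KNeighborsClassifier": KNeighborsClassifier(),
    "SVC": SVC(),
}

results = {}
for name, clf in classifiers.items():
    clf.fit(X_train, Y_train)        # train on the 70% portion
    Y_pred = clf.predict(X_test)     # predict the held-out 30%
    results[name] = {
        "accuracy": accuracy_score(Y_test, Y_pred),
        # Assumption: weighted averaging over the three classes.
        "f_score": f1_score(Y_test, Y_pred, average="weighted"),
        "precision": precision_score(Y_test, Y_pred, average="weighted"),
    }

for name, scores in results.items():
    print(name, scores)
```

With only 45 test instances, a single misclassified flower changes accuracy by about 0.022, so small differences between classifiers depend heavily on the particular split.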