  1. Data set

The selected data set is the iris data set (classification) from Python's scikit-learn library.

Data Set Characteristics:

:Number of Instances: 150 (50 in each of three classes)

:Number of Attributes: 4 numeric, predictive attributes and the class

:Attribute Information:

- sepal length in cm

- sepal width in cm

- petal length in cm

- petal width in cm

- class:

- Iris-Setosa

- Iris-Versicolour

- Iris-Virginica

:Missing Attribute Values: None

:Class Distribution: 33.3% for each of 3 classes.

:Creator: R.A. Fisher

:Donor: Michael Marshall (MARSHALL%[email protected])

:Date: July, 1988

This is a copy of the UCI ML iris data set.

http://archive.ics.uci.edu/ml/datasets/Iris

The famous Iris database was first used by Sir R.A. Fisher.

This is perhaps the best known database to be found in the pattern recognition literature.  Fisher's paper is a classic in the field and is referenced frequently to this day.  (See Duda & Hart, for example.)

The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant.  One class is linearly separable from the other 2; the latter are NOT linearly separable from each other.
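The data set can be loaded directly from scikit-learn; a minimal sketch:

```python
from sklearn.datasets import load_iris

# Load the iris data set bundled with scikit-learn
iris = load_iris()

print(iris.data.shape)          # (150, 4) -- 150 instances, 4 numeric attributes
print(list(iris.target_names))  # ['setosa', 'versicolor', 'virginica']
```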

  2. The classification algorithms
  1. Gaussian Naive Bayes - a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features.
  2. KNeighborsClassifier - a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression.
  3. Support Vector Classification - the implementation is based on libsvm. The fit time complexity is more than quadratic with the number of samples, which makes it hard to scale to data sets with more than a couple of 10000 samples. The multiclass support is handled according to a one-vs-one scheme.
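The three classifiers can be instantiated with scikit-learn's defaults; a minimal sketch (the post does not state any hyperparameter tuning, so defaults are an assumption):

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Default hyperparameters; the original post does not specify any tuning
classifiers = {
    "Gaussian Naive Bayes": GaussianNB(),
    "KNeighborsClassifier": KNeighborsClassifier(n_neighbors=5),  # default k=5
    "SVC": SVC(),  # RBF kernel by default; multiclass is handled one-vs-one
}
```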

3. Training and testing data split

The train / test split ratio is 70 / 30.

Size of X_train - 105

Size of X_test - 45

Size of Y_train - 105

Size of Y_test - 45
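A 70/30 split that reproduces these sizes might look like this (the `random_state` value is an assumption; the original seed is unknown):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()

# 70/30 split: 105 training and 45 test instances out of 150
X_train, X_test, Y_train, Y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

print(len(X_train), len(X_test))  # 105 45
```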

Results

                  Gaussian Naive Bayes   KNeighborsClassifier   SVC
Accuracy score    0.977777777778         1.0                    1.0
F-score           0.977585377585         1.0                    1.0
Precision score   0.97908496732          1.0                    1.0
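Such scores can be computed with scikit-learn's metric functions; a minimal sketch for one classifier (the averaging method and random seed are assumptions, so the exact numbers may differ from the table):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, f1_score, precision_score

iris = load_iris()
X_train, X_test, Y_train, Y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

# Fit one of the three classifiers and score it on the held-out 30%
model = GaussianNB().fit(X_train, Y_train)
pred = model.predict(X_test)

acc = accuracy_score(Y_test, pred)
f1 = f1_score(Y_test, pred, average="weighted")       # averaging is an assumption
prec = precision_score(Y_test, pred, average="weighted")
print(acc, f1, prec)
```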

So, we can see that the best results are given by the KNeighborsClassifier and SVC classifiers.
