KEEL: A software tool to assess evolutionary algorithms for Data Mining problems (regression, classification, clustering, pattern mining and so on)

Publications Supported by the KEEL Project

Iris Plants Database

Description

Name: Iris Plants Database
Type: Real World

Number of examples: 150

Number of features: 4
Domain of the feature 1: [4.3, 7.9]
Domain of the feature 2: [2.0, 4.4]
Domain of the feature 3: [1.0, 6.9]
Domain of the feature 4: [0.1, 2.5]
Number of classes: 3
Domain of the Class: [Iris-setosa, Iris-versicolor, Iris-virginica]

The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other two; the latter two are not linearly separable from each other.

Data Set

You can download the original UCI data file (http://www.keel.es/dataset/uci/iris.data) and the UCI description file of the data set (http://www.keel.es/dataset/uci/iris.names), or you can visit the UCI web site of the data set (http://archive.ics.uci.edu/ml/datasets/Iris).
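As a rough illustration of how the downloaded UCI file might be read and checked against the description above, here is a minimal Python sketch. It assumes the file holds one example per line as four comma-separated numeric values followed by the class label; the function name `parse_iris` and the two sample rows are introduced here for illustration only and are not part of KEEL.

```python
import csv
import io

# Feature domains and class labels as listed in the description above.
FEATURE_DOMAINS = [(4.3, 7.9), (2.0, 4.4), (1.0, 6.9), (0.1, 2.5)]
CLASS_LABELS = {"Iris-setosa", "Iris-versicolor", "Iris-virginica"}


def parse_iris(text):
    """Parse comma-separated iris rows into (features, label) pairs."""
    examples = []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines, e.g. a trailing newline
            continue
        *values, label = (field.strip() for field in row)
        features = [float(v) for v in values]
        if len(features) != len(FEATURE_DOMAINS):
            raise ValueError(f"expected 4 features, got {len(features)}")
        if label not in CLASS_LABELS:
            raise ValueError(f"unknown class label: {label!r}")
        # Check every value against the domain stated for its feature.
        for value, (low, high) in zip(features, FEATURE_DOMAINS):
            if not low <= value <= high:
                raise ValueError(f"value {value} outside [{low}, {high}]")
        examples.append((features, label))
    return examples


# Two illustrative rows in the assumed format (not read from the real file).
sample = "5.1,3.5,1.4,0.2,Iris-setosa\n7.0,3.2,4.7,1.4,Iris-versicolor\n"
parsed = parse_iris(sample)
```

To check a real download, the same function could be applied to the contents of the local copy, for example `parse_iris(open("iris.data").read())`, where the local file name is an assumption.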

In KEEL, the original data set is partitioned using a 10-fold cross-validation procedure. The initial data set T is randomly divided into 10 disjoint sets of equal size, T1,...,T10. The original class distribution (before partitioning) is maintained within each set during the partition process. We then construct 10 pairs of training and test sets, each pair using one set for testing and the remaining nine for training.
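The partitioning procedure described above can be sketched as follows. This is only an illustration under stated assumptions, not the code KEEL actually uses: the function names, the fixed seed, and the round-robin dealing of each class across folds are choices made here to keep the example short.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split example indices into k disjoint folds, preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def train_test_pairs(folds):
    """Yield (training indices, test indices), one pair per fold."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 150 labels with the class sizes of this data set: 50 examples per class.
labels = ["Iris-setosa"] * 50 + ["Iris-versicolor"] * 50 + ["Iris-virginica"] * 50
folds = stratified_folds(labels, k=10)
pairs = list(train_test_pairs(folds))
```

With 50 examples per class and 10 folds, every fold receives 5 examples of each class, so each pair has 135 training and 15 test examples. For class sizes that are not multiples of 10, this simple dealing scheme would leave the folds slightly unequal.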

Moreover, the original data set of 150 examples has been randomly divided 5 times into 2 different subsets of 75 examples each. This yields 5 different 50%-50% partitions of the original data set for applying the 5x2cv paired t-test, that is, 5 replications of 2-fold cross-validation.

Training (50% examples)	Test (50% examples)
Iris-5x2-1tra.dat	Iris-5x2-1tst.dat
Iris-5x2-2tra.dat	Iris-5x2-2tst.dat
Iris-5x2-3tra.dat	Iris-5x2-3tst.dat
Iris-5x2-4tra.dat	Iris-5x2-4tst.dat
Iris-5x2-5tra.dat	Iris-5x2-5tst.dat
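The page does not spell out the test statistic. As a hedged sketch, the following Python function computes it in the commonly cited formulation of the 5x2cv paired t-test (Dietterich, 1998): the numerator is the error-rate difference on the first fold of the first replication, and the denominator pools the variance estimates of all 5 replications. The function name and the input values are made up for illustration; real inputs would come from evaluating two classifiers on the partitions listed above.

```python
import math


def five_by_two_cv_t(differences):
    """Compute the 5x2cv paired t statistic.

    `differences` holds 5 pairs (p1, p2): the difference in error rate between
    the two classifiers on each of the 2 folds of one replication.
    """
    if len(differences) != 5 or any(len(pair) != 2 for pair in differences):
        raise ValueError("expected 5 replications with 2 differences each")

    variances = []
    for p1, p2 in differences:
        mean = (p1 + p2) / 2
        variances.append((p1 - mean) ** 2 + (p2 - mean) ** 2)

    pooled = sum(variances) / 5
    if pooled == 0:
        raise ValueError("variance estimate is zero; statistic is undefined")
    # The numerator uses only the first difference of the first replication.
    return differences[0][0] / math.sqrt(pooled)


# Made-up error-rate differences, purely for illustration.
example = [(0.04, 0.02), (0.01, 0.03), (0.05, 0.01), (0.02, 0.02), (0.03, 0.05)]
t_stat = five_by_two_cv_t(example)
```

Under the null hypothesis that both classifiers have the same error rate, the statistic is compared against a t distribution with 5 degrees of freedom; for a two-sided test at the 0.05 level, the critical value is about 2.571.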

You can also download the whole data set in Keel format here.
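This page does not describe the KEEL format itself. The sketch below assumes an ARFF-like layout: header lines starting with `@` (such as `@relation`, `@attribute`, `@inputs`, `@outputs`), followed by an `@data` line and one comma-separated example per line. The function name `parse_keel` and the attribute names in the sample are hypothetical; only the domains and class labels are taken from the description above.

```python
def parse_keel(text):
    """Parse an assumed KEEL-style .dat file into (header lines, data rows)."""
    header, rows = [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if in_data:
            # After @data, every non-empty line is one comma-separated example.
            rows.append([field.strip() for field in line.split(",")])
        elif line.lower() == "@data":
            in_data = True
        else:
            header.append(line)
    return header, rows


# Hypothetical file content; the attribute names are assumptions.
sample = """@relation iris
@attribute SepalLength real [4.3, 7.9]
@attribute SepalWidth real [2.0, 4.4]
@attribute PetalLength real [1.0, 6.9]
@attribute PetalWidth real [0.1, 2.5]
@attribute Class {Iris-setosa, Iris-versicolor, Iris-virginica}
@inputs SepalLength, SepalWidth, PetalLength, PetalWidth
@outputs Class
@data
5.1, 3.5, 1.4, 0.2, Iris-setosa
7.0, 3.2, 4.7, 1.4, Iris-versicolor
"""
header, rows = parse_keel(sample)
```

The rows are returned as lists of strings; converting the first four fields with `float` and keeping the last as the class label would recover the examples.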