KEEL-dataset: Missing values data sets

This section lists the data sets with missing values available in the repository. Each one defines a supervised classification problem, in which every example is composed of several nominal or numerical input attributes and a nominal output attribute (its class).

Each data file has the following structure:

  • @relation: Name of the data set
  • @attribute: Description of an attribute (one for each attribute)
  • @inputs: List with the names of the input attributes
  • @output: Name of the output attribute
  • @data: Starting tag of the data

The rest of the file contains all the examples belonging to the data set, written in comma-separated values format. Missing values are represented by either the ? or the <null> token.
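The file layout above can be read with a short parser. The following is a minimal sketch, not KEEL's own reader: the helper name `read_keel` is hypothetical, it keeps only the attribute names from each `@attribute` line (types and ranges are ignored), it accepts `@outputs` as an assumed spelling variant of `@output`, it returns all values as strings, and it maps both missing-value tokens to `None`.

```python
MISSING_TOKENS = {"?", "<null>"}

def read_keel(text):
    """Parse KEEL-format text into (relation, attributes, inputs, output, rows)."""
    relation, output = None, None
    attributes, inputs, rows = [], [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if in_data:
            # Every line after @data is one comma-separated example.
            values = [v.strip() for v in line.split(",")]
            rows.append([None if v in MISSING_TOKENS else v for v in values])
            continue
        parts = line.split(None, 1)
        tag = parts[0].lower()
        rest = parts[1] if len(parts) > 1 else ""
        if tag == "@relation":
            relation = rest.strip()
        elif tag == "@attribute":
            # Keep only the name; the type and range/value list are ignored here.
            attributes.append(rest.split()[0])
        elif tag == "@inputs":
            inputs = [name.strip() for name in rest.split(",")]
        elif tag in ("@output", "@outputs"):  # "@outputs" accepted as an assumed variant
            output = rest.strip()
        elif tag == "@data":
            in_data = True
    return relation, attributes, inputs, output, rows
```

For instance, a data row such as `?, <null>, neg` is returned as `[None, None, 'neg']`, so both missing-value tokens become indistinguishable after parsing.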


We offer information about experimental studies using these data sets (result files, papers and more) in the Experimental studies with data sets with missing values section of the repository.


Below are all the available data sets with missing values. For each data set, the table shows its name, its number of attributes (Real/Integer/Nominal valued), examples and classes (number of possible values of the output variable), and the percentage of examples with missing values.
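The %MVs (Examples) column counts examples, not individual values: an example is counted once if at least one of its attributes is missing. A minimal sketch of that computation, assuming each example is a list in which missing values have already been mapped to `None` (the helper name is hypothetical):

```python
def percent_examples_with_mv(rows):
    """Percentage of examples that contain at least one missing value (None)."""
    if not rows:
        return 0.0
    incomplete = sum(1 for row in rows if any(v is None for v in row))
    return 100.0 * incomplete / len(rows)
```

As a worked check against the table, 48.39 % of the 155 hepatitis examples corresponds to roughly 75 incomplete examples.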

Each data set can be downloaded from the table in KEEL format (inside a ZIP file). It is also available already partitioned by means of a 10-fold or 5-fold stratified cross-validation procedure.
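One common way to build stratified folds is sketched below. It is an illustration under stated assumptions, not the procedure KEEL used to generate its partitions: the helper names `stratified_folds` and `train_test_splits` are hypothetical, and the examples of each class are shuffled and dealt round-robin across the k folds so that every fold keeps roughly the class proportions of the whole data set.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=10, seed=0):
    """Split example indices into k folds that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal this class round-robin, starting where the previous class stopped,
        # so that total fold sizes differ by at most one example.
        for j, idx in enumerate(indices):
            folds[(offset + j) % k].append(idx)
        offset = (offset + len(indices)) % k
    return folds

def train_test_splits(labels, k=10, seed=0):
    """Yield (train, test) index lists; fold i is the test set of split i."""
    folds = stratified_folds(labels, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(test)
```

Calling `train_test_splits(labels, k=5)` gives the 5-fold variant. Continuing the round-robin from where the previous class stopped is a design choice that keeps the test folds balanced in size as well as in class distribution.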


Name            #Attributes (R/I/N)   #Examples   #Classes   %MVs (Examples)
census          41 (1/12/28)          299284      3          52.38 %
dermatology     34 (0/34/0)           366         6          2.19 %
automobile      25 (15/0/10)          205         6          26.83 %
horse-colic     23 (7/1/15)           368         2          98.1 %
mushroom        22 (0/0/22)           8124        2          30.53 %
bands           19 (13/6/0)           539         2          32.28 %
hepatitis       19 (2/17/0)           155         2          48.39 %
housevotes      16 (0/0/16)           435         2          46.67 %
crx             15 (3/3/9)            690         2          5.36 %
adult           14 (6/0/8)            48842       2          7.41 %
cleveland       13 (13/0/0)           303         5          1.98 %
marketing       13 (0/13/0)           8993        9          23.54 %
wisconsin       9 (0/9/0)             699         2          2.29 %
breast          9 (0/0/9)             286         2          3.15 %
post-operative  8 (0/0/8)             90          3          3.33 %
mammographic    5 (0/5/0)             961         2          13.63 %
All data sets (ZIP)

This section provides a set of classification data sets with induced missing values. They are modified versions of those found in the Standard classification data sets category of the repository, in which 10% of the values have been randomly removed (only the training partitions contain missing values; the test partitions remain unchanged).
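The removal procedure can be simulated as follows. This is a hypothetical sketch (`induce_missing` is not a KEEL tool) that assumes exactly 10% of the input-attribute cells of a training partition are chosen uniformly at random without replacement and blanked, that the class column (assumed to be the last one) is never touched, and that the test partition is simply left as it is; the exact procedure used for the repository files may differ.

```python
import random

def induce_missing(train_rows, rate=0.10, n_inputs=None, seed=0):
    """Return a copy of train_rows with a fraction `rate` of input values set to None."""
    if not train_rows:
        return []
    rng = random.Random(seed)
    rows = [list(r) for r in train_rows]          # never modify the caller's data
    if n_inputs is None:
        n_inputs = len(rows[0]) - 1               # assumption: class is the last column
    cells = [(i, j) for i in range(len(rows)) for j in range(n_inputs)]
    for i, j in rng.sample(cells, round(rate * len(cells))):
        rows[i][j] = None
    return rows
```

In use, only the training partition of each fold would be passed through `induce_missing`, while its test partition is kept unchanged.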

For each data set, the table shows its name, its number of attributes (Real/Integer/Nominal valued), examples and classes (number of possible values of the output variable), and the percentage of examples with missing values.

The table lists the data sets used in the following paper, and provides the 10-fold cross-validation partitions of each data set in KEEL format (inside a ZIP file) that were used in it:

J. Luengo, S. García, F. Herrera. A Study on the Use of Imputation Methods for Experimentation with Radial Basis Function Network Classifiers Handling Missing Attribute Values: The good synergy between RBFs and EventCovering method. Neural Networks 23, 406-418. doi:10.1016/j.neunet.2009.11.014.

Name            #Attributes (R/I/N)   #Examples   #Classes   %MVs (Examples)
Iris+MV         4 (4/0/0)             150         3          32.67 %
Pima+MV         8 (8/0/0)             768         2          50.65 %
Wine+MV         13 (13/0/0)           178         3          70.22 %
Australian+MV   14 (3/5/6)            690         2          70.58 %
Newthyroid+MV   5 (4/1/0)             215         3          35.35 %
Ecoli+MV        7 (7/0/0)             336         8          48.21 %
Satimage+MV     36 (0/36/0)           6435        7          87.80 %
German+MV       20 (0/7/13)           1000        2          80.00 %
Magic+MV        10 (10/0/0)           1902        2          58.20 %
Shuttle+MV      9 (0/9/0)             2175        7          55.95 %
All data sets (ZIP)

Collecting Data Sets

If you have some example data sets and you would like to share them with the rest of the research community by means of this page, please be so kind as to send your data to the Webmaster Team with the following information:

  • People responsible for the data (full name, affiliation, e-mail, web page, ...).
  • Training and test data sets considered, preferably in ASCII format.
  • A brief description of the application.
  • References where it is used.
  • Results obtained by the methods proposed by the authors or used for comparison.
  • Type of experiment developed.
  • Any additional useful information.

Collecting Results

If you have applied your methods to some of the problems presented here, we will be glad to show your results on this page. Please be so kind as to send the following information to the Webmaster Team:

  • Name of the application considered and type of experiment developed.
  • Results obtained by the methods proposed by the authors or used for comparison.
  • References where the results are shown.
  • Any additional useful information.

Contact Us

If you are interested in being informed of each update made to this page, or would like to comment on it, please contact the Webmaster Team.



 
 Copyright 2004-2018, KEEL (Knowledge Extraction based on Evolutionary Learning)