KEEL - dataset: Semi-supervised Classification data sets

In semi-supervised classification we work with both unlabeled and labeled examples. Two main settings can be found: transductive and inductive classification. The former concerns the problem of predicting the labels of the unlabeled examples, given in advance (in the training set), by taking both labeled and unlabeled data together into account to train a classifier. The latter considers the given labeled and unlabeled data as the training examples, and its objective is to predict unseen data.

All data sets included here also appear in the Standard data sets section. All of them were partitioned using a 10-fold cross-validation procedure and were employed in the paper:

I. Triguero, S. García, and F. Herrera, Self-Labeled Techniques for Semi-Supervised Learning: Taxonomy, Software and Empirical Study. Knowledge and Information Systems, 42 (2015) 245-284.

Note that the test partitions are kept aside so that they can be used for evaluation in the inductive setting.

The training partitions have been divided into labeled and unlabeled examples. In the division process we do not maintain the class proportion in the labeled and unlabeled sets since the main aim of semi-supervised classification is to exploit unlabeled data for better classification results. Hence, we use a random selection of examples that will be marked as labeled instances, and the class label of the rest of the instances will be removed.

Four labeled ratios are considered: 10, 20, 30 and 40 %. Thus, for instance, assuming a data set that contains 1,000 examples, when the labeled rate is 10 %, 100 examples are put into the labeled set with their labels, while the remaining 900 examples are put into the unlabeled set without their labels.
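The division described above can be illustrated with a minimal Python sketch. It is not the code used to build the repository files; the function name `split_labeled_unlabeled`, the `seed` parameter and the toy data are assumptions made for illustration only.

```python
import random

UNLABELED = "unlabeled"  # class value given to instances whose label is removed


def split_labeled_unlabeled(examples, labeled_ratio, seed=0):
    """Randomly keep the label of a fraction of the examples.

    `examples` is a list of (attributes, label) pairs. The selection is
    purely random: class proportions are not preserved (no stratification).
    """
    rng = random.Random(seed)
    n_labeled = round(len(examples) * labeled_ratio)
    labeled_idx = set(rng.sample(range(len(examples)), n_labeled))
    labeled, unlabeled = [], []
    for i, (attrs, label) in enumerate(examples):
        if i in labeled_idx:
            labeled.append((attrs, label))
        else:
            # The true class is dropped and replaced by the marker value.
            unlabeled.append((attrs, UNLABELED))
    return labeled, unlabeled


# Toy data set with 1,000 examples (hypothetical values).
examples = [([i], "pos" if i % 2 else "neg") for i in range(1000)]
labeled, unlabeled = split_labeled_unlabeled(examples, 0.10)
print(len(labeled), len(unlabeled))  # 100 900
```

With a labeled ratio of 0.10 and 1,000 examples, the sketch reproduces the 100/900 division given in the text.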

This section shows the semi-supervised classification data sets available in this repository. Each one defines a semi-supervised classification problem, where each example is composed of some nominal or numerical attributes and a nominal output attribute (its class).

Each partition of these data sets is composed of three files: training, transductive and test. The training file is composed of labeled and unlabeled instances (the unlabeled ones carry the class value "unlabeled"). The transductive file contains the real class of the unlabeled instances, and the test file collects the test instances.
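As a rough illustration of how the three files of a partition fit together, the following Python sketch computes a transductive and an inductive accuracy. The 1-NN classifier trained only on the labeled instances is a stand-in baseline, not a method prescribed by the repository, and the toy instances are hypothetical. The sketch also assumes that the transductive file lists the unlabeled training instances together with their real class.

```python
import math

UNLABELED = "unlabeled"


def nearest_label(labeled, attrs):
    """Return the class of the labeled instance closest to `attrs` (1-NN)."""
    return min(labeled, key=lambda inst: math.dist(inst[0], attrs))[1]


def accuracy(labeled, instances):
    """Fraction of `instances` whose true class matches the 1-NN prediction."""
    hits = sum(nearest_label(labeled, a) == y for a, y in instances)
    return hits / len(instances)


# Hypothetical contents of the three files of one partition.
training = [([0.0, 0.0], "a"), ([1.0, 1.0], "b"),
            ([0.1, 0.2], UNLABELED), ([0.9, 0.8], UNLABELED)]
transductive = [([0.1, 0.2], "a"), ([0.9, 0.8], "b")]  # real classes
test = [([0.2, 0.1], "a"), ([0.8, 1.0], "b"), ([0.6, 0.6], "a")]

# This baseline ignores the unlabeled instances; a semi-supervised
# method would exploit them as well.
labeled = [(a, y) for a, y in training if y != UNLABELED]

print("transductive accuracy:", accuracy(labeled, transductive))
print("inductive accuracy:", accuracy(labeled, test))
```

The transductive figure measures how well the unlabeled training instances are labeled, while the inductive figure measures performance on instances never seen during training.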

Each data file has the following structure:

  • @relation: Name of the data set
  • @attribute: Description of an attribute (one for each attribute)
  • @inputs: List with the names of the input attributes
  • @output: Name of the output attribute
  • @data: Starting tag of the data

The rest of the file contains all the examples belonging to the data set, expressed in comma-separated values format. None of the data sets contains missing values.
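The structure above can be read with a few lines of Python. The following is a minimal sketch, not an official KEEL reader: the function name `parse_keel` and the sample file contents (including the exact syntax after `@attribute`) are assumptions for illustration, and attribute descriptions are kept as raw strings rather than interpreted.

```python
def parse_keel(text):
    """Parse a KEEL-style data file into a header dict and a list of rows."""
    header = {"relation": None, "attributes": [], "inputs": [], "output": None}
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if in_data:
            # Every line after @data is one example in comma-separated form.
            rows.append([value.strip() for value in line.split(",")])
            continue
        tag, _, rest = line.partition(" ")
        tag, rest = tag.lower(), rest.strip()
        if tag == "@relation":
            header["relation"] = rest
        elif tag == "@attribute":
            header["attributes"].append(rest)  # kept as a raw description
        elif tag == "@inputs":
            header["inputs"] = [name.strip() for name in rest.split(",")]
        elif tag.startswith("@output"):
            # Tolerating a plural spelling of the tag is an assumption.
            header["output"] = rest
        elif tag == "@data":
            in_data = True
    return header, rows


# Hypothetical file contents, made up for this example.
SAMPLE = """\
@relation toy
@attribute width real [0.0, 10.0]
@attribute height real [0.0, 10.0]
@attribute class {a, b, unlabeled}
@inputs width, height
@output class
@data
1.5, 2.0, a
3.0, 4.5, unlabeled
"""

header, rows = parse_keel(SAMPLE)
print(header["relation"], header["inputs"], header["output"])
print(rows)
```

Values are returned as strings; converting them to numbers would require interpreting each attribute description, which this sketch deliberately leaves out.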

We offer information about experimental studies using these data sets (result files, papers and more) in the Experimental studies with Semi-supervised classification data section of the repository.



Below you can find all the semi-supervised classification data sets available with 10% of labeled data. For each data set, the table shows its name, its number of attributes (Real/Integer/Nominal valued), its number of examples and its number of classes (possible values of the output variable).

The table lets you download each data set in KEEL format (inside a ZIP file), already partitioned by means of a 10-fold cross-validation procedure.

Name                   #Attributes (R/I/N)  #Examples  #Classes  10-fcv  Header
hepatitis-ssl10        19 (2/17/0)          80         2         zip     txt
zoo-ssl10              16 (0/0/16)          101        7         zip     txt
appendicitis-ssl10     7 (7/0/0)            106        2         zip     txt
lymphography-ssl10     18 (0/3/15)          148        4         zip     txt
iris-ssl10             4 (4/0/0)            150        3         zip     txt
tae-ssl10              5 (0/5/0)            151        3         zip     txt
automobile-ssl10       25 (15/0/10)         159        6         zip     txt
wine-ssl10             13 (13/0/0)          178        3         zip     txt
sonar-ssl10            60 (60/0/0)          208        2         zip     txt
glass-ssl10            9 (9/0/0)            214        7         zip     txt
housevotes-ssl10       16 (0/0/16)          232        2         zip     txt
spectfheart-ssl10      44 (0/44/0)          267        2         zip     txt
heart-ssl10            13 (1/12/0)          270        2         zip     txt
breast-ssl10           9 (0/0/9)            277        2         zip     txt
cleveland-ssl10        13 (13/0/0)          297        5         zip     txt
haberman-ssl10         3 (0/3/0)            306        2         zip     txt
ecoli-ssl10            7 (7/0/0)            336        8         zip     txt
bupa-ssl10             6 (1/5/0)            345        2         zip     txt
dermatology-ssl10      34 (0/34/0)          358        6         zip     txt
movement_libras-ssl10  90 (90/0/0)          360        15        zip     txt
monk-2-ssl10           6 (0/6/0)            432        2         zip     txt
saheart-ssl10          9 (5/3/1)            462        2         zip     txt
led7digit-ssl10        7 (7/0/0)            500        10        zip     txt
crx-ssl10              15 (3/3/9)           653        2         zip     txt
wisconsin-ssl10        9 (0/9/0)            683        2         zip     txt
australian-ssl10       14 (3/5/6)           690        2         zip     txt
pima-ssl10             8 (8/0/0)            768        2         zip     txt
mammographic-ssl10     5 (0/5/0)            830        2         zip     txt
vehicle-ssl10          18 (0/18/0)          846        4         zip     txt
tic-tac-toe-ssl10      9 (0/0/9)            958        2         zip     txt
vowel-ssl10            13 (10/3/0)          990        11        zip     txt
german-ssl10           20 (0/7/13)          1000       2         zip     txt
flare-ssl10            11 (0/0/11)          1066       6         zip     txt
contraceptive-ssl10    9 (0/9/0)            1473       3         zip     txt
yeast-ssl10            8 (8/0/0)            1484       10        zip     txt
titanic-ssl10          3 (3/0/0)            2201       2         zip     txt
segment-ssl10          19 (19/0/0)          2310       7         zip     txt
splice-ssl10           60 (0/0/60)          3190       3         zip     txt
chess-ssl10            36 (0/0/36)          3196       2         zip     txt
abalone-ssl10          8 (7/0/1)            4174       28        zip     txt
spambase-ssl10         57 (57/0/0)          4597       2         zip     txt
banana-ssl10           2 (2/0/0)            5300       2         zip     txt
phoneme-ssl10          5 (5/0/0)            5404       2         zip     txt
page-blocks-ssl10      10 (4/6/0)           5472       5         zip     txt
texture-ssl10          40 (40/0/0)          5500       11        zip     txt
mushroom-ssl10         22 (0/0/22)          5644       2         zip     txt
satimage-ssl10         36 (0/36/0)          6435       7         zip     txt
marketing-ssl10        13 (0/13/0)          6876       9         zip     txt
thyroid-ssl10          21 (6/15/0)          7200       3         zip     txt
ring-ssl10             20 (20/0/0)          7400       2         zip     txt
twonorm-ssl10          20 (20/0/0)          7400       2         zip     txt
coil2000-ssl10         85 (0/85/0)          9822       2         zip     txt
penbased-ssl10         16 (0/16/0)          10992      10        zip     txt
nursery-ssl10          8 (0/0/8)            12960      5         zip     txt
magic-ssl10            10 (10/0/0)          19020      2         zip     txt
All data sets: zip

Below you can find all the semi-supervised classification data sets available with 20% of labeled data. For each data set, the table shows its name, its number of attributes (Real/Integer/Nominal valued), its number of examples and its number of classes (possible values of the output variable).

The table lets you download each data set in KEEL format (inside a ZIP file), already partitioned by means of a 10-fold cross-validation procedure.

Name                   #Attributes (R/I/N)  #Examples  #Classes  10-fcv  Header
hepatitis-ssl20        19 (2/17/0)          80         2         zip     txt
zoo-ssl20              16 (0/0/16)          101        7         zip     txt
appendicitis-ssl20     7 (7/0/0)            106        2         zip     txt
lymphography-ssl20     18 (0/3/15)          148        4         zip     txt
iris-ssl20             4 (4/0/0)            150        3         zip     txt
tae-ssl20              5 (0/5/0)            151        3         zip     txt
automobile-ssl20       25 (15/0/10)         159        6         zip     txt
wine-ssl20             13 (13/0/0)          178        3         zip     txt
sonar-ssl20            60 (60/0/0)          208        2         zip     txt
glass-ssl20            9 (9/0/0)            214        7         zip     txt
housevotes-ssl20       16 (0/0/16)          232        2         zip     txt
spectfheart-ssl20      44 (0/44/0)          267        2         zip     txt
heart-ssl20            13 (1/12/0)          270        2         zip     txt
breast-ssl20           9 (0/0/9)            277        2         zip     txt
cleveland-ssl20        13 (13/0/0)          297        5         zip     txt
haberman-ssl20         3 (0/3/0)            306        2         zip     txt
ecoli-ssl20            7 (7/0/0)            336        8         zip     txt
bupa-ssl20             6 (1/5/0)            345        2         zip     txt
dermatology-ssl20      34 (0/34/0)          358        6         zip     txt
movement_libras-ssl20  90 (90/0/0)          360        15        zip     txt
monk-2-ssl20           6 (0/6/0)            432        2         zip     txt
saheart-ssl20          9 (5/3/1)            462        2         zip     txt
led7digit-ssl20        7 (7/0/0)            500        10        zip     txt
crx-ssl20              15 (3/3/9)           653        2         zip     txt
wisconsin-ssl20        9 (0/9/0)            683        2         zip     txt
australian-ssl20       14 (3/5/6)           690        2         zip     txt
pima-ssl20             8 (8/0/0)            768        2         zip     txt
mammographic-ssl20     5 (0/5/0)            830        2         zip     txt
vehicle-ssl20          18 (0/18/0)          846        4         zip     txt
tic-tac-toe-ssl20      9 (0/0/9)            958        2         zip     txt
vowel-ssl20            13 (10/3/0)          990        11        zip     txt
german-ssl20           20 (0/7/13)          1000       2         zip     txt
flare-ssl20            11 (0/0/11)          1066       6         zip     txt
contraceptive-ssl20    9 (0/9/0)            1473       3         zip     txt
yeast-ssl20            8 (8/0/0)            1484       10        zip     txt
titanic-ssl20          3 (3/0/0)            2201       2         zip     txt
segment-ssl20          19 (19/0/0)          2310       7         zip     txt
splice-ssl20           60 (0/0/60)          3190       3         zip     txt
chess-ssl20            36 (0/0/36)          3196       2         zip     txt
abalone-ssl20          8 (7/0/1)            4174       28        zip     txt
spambase-ssl20         57 (57/0/0)          4597       2         zip     txt
banana-ssl20           2 (2/0/0)            5300       2         zip     txt
phoneme-ssl20          5 (5/0/0)            5404       2         zip     txt
page-blocks-ssl20      10 (4/6/0)           5472       5         zip     txt
texture-ssl20          40 (40/0/0)          5500       11        zip     txt
mushroom-ssl20         22 (0/0/22)          5644       2         zip     txt
satimage-ssl20         36 (0/36/0)          6435       7         zip     txt
marketing-ssl20        13 (0/13/0)          6876       9         zip     txt
thyroid-ssl20          21 (6/15/0)          7200       3         zip     txt
ring-ssl20             20 (20/0/0)          7400       2         zip     txt
twonorm-ssl20          20 (20/0/0)          7400       2         zip     txt
coil2000-ssl20         85 (0/85/0)          9822       2         zip     txt
penbased-ssl20         16 (0/16/0)          10992      10        zip     txt
nursery-ssl20          8 (0/0/8)            12960      5         zip     txt
magic-ssl20            10 (10/0/0)          19020      2         zip     txt
All data sets: zip

Below you can find all the semi-supervised classification data sets available with 30% of labeled data. For each data set, the table shows its name, its number of attributes (Real/Integer/Nominal valued), its number of examples and its number of classes (possible values of the output variable).

The table lets you download each data set in KEEL format (inside a ZIP file), already partitioned by means of a 10-fold cross-validation procedure.

Name                   #Attributes (R/I/N)  #Examples  #Classes  10-fcv  Header
hepatitis-ssl30        19 (2/17/0)          80         2         zip     txt
zoo-ssl30              16 (0/0/16)          101        7         zip     txt
appendicitis-ssl30     7 (7/0/0)            106        2         zip     txt
lymphography-ssl30     18 (0/3/15)          148        4         zip     txt
iris-ssl30             4 (4/0/0)            150        3         zip     txt
tae-ssl30              5 (0/5/0)            151        3         zip     txt
automobile-ssl30       25 (15/0/10)         159        6         zip     txt
wine-ssl30             13 (13/0/0)          178        3         zip     txt
sonar-ssl30            60 (60/0/0)          208        2         zip     txt
glass-ssl30            9 (9/0/0)            214        7         zip     txt
housevotes-ssl30       16 (0/0/16)          232        2         zip     txt
spectfheart-ssl30      44 (0/44/0)          267        2         zip     txt
heart-ssl30            13 (1/12/0)          270        2         zip     txt
breast-ssl30           9 (0/0/9)            277        2         zip     txt
cleveland-ssl30        13 (13/0/0)          297        5         zip     txt
haberman-ssl30         3 (0/3/0)            306        2         zip     txt
ecoli-ssl30            7 (7/0/0)            336        8         zip     txt
bupa-ssl30             6 (1/5/0)            345        2         zip     txt
dermatology-ssl30      34 (0/34/0)          358        6         zip     txt
movement_libras-ssl30  90 (90/0/0)          360        15        zip     txt
monk-2-ssl30           6 (0/6/0)            432        2         zip     txt
saheart-ssl30          9 (5/3/1)            462        2         zip     txt
led7digit-ssl30        7 (7/0/0)            500        10        zip     txt
crx-ssl30              15 (3/3/9)           653        2         zip     txt
wisconsin-ssl30        9 (0/9/0)            683        2         zip     txt
australian-ssl30       14 (3/5/6)           690        2         zip     txt
pima-ssl30             8 (8/0/0)            768        2         zip     txt
mammographic-ssl30     5 (0/5/0)            830        2         zip     txt
vehicle-ssl30          18 (0/18/0)          846        4         zip     txt
tic-tac-toe-ssl30      9 (0/0/9)            958        2         zip     txt
vowel-ssl30            13 (10/3/0)          990        11        zip     txt
german-ssl30           20 (0/7/13)          1000       2         zip     txt
flare-ssl30            11 (0/0/11)          1066       6         zip     txt
contraceptive-ssl30    9 (0/9/0)            1473       3         zip     txt
yeast-ssl30            8 (8/0/0)            1484       10        zip     txt
titanic-ssl30          3 (3/0/0)            2201       2         zip     txt
segment-ssl30          19 (19/0/0)          2310       7         zip     txt
splice-ssl30           60 (0/0/60)          3190       3         zip     txt
chess-ssl30            36 (0/0/36)          3196       2         zip     txt
abalone-ssl30          8 (7/0/1)            4174       28        zip     txt
spambase-ssl30         57 (57/0/0)          4597       2         zip     txt
banana-ssl30           2 (2/0/0)            5300       2         zip     txt
phoneme-ssl30          5 (5/0/0)            5404       2         zip     txt
page-blocks-ssl30      10 (4/6/0)           5472       5         zip     txt
texture-ssl30          40 (40/0/0)          5500       11        zip     txt
mushroom-ssl30         22 (0/0/22)          5644       2         zip     txt
satimage-ssl30         36 (0/36/0)          6435       7         zip     txt
marketing-ssl30        13 (0/13/0)          6876       9         zip     txt
thyroid-ssl30          21 (6/15/0)          7200       3         zip     txt
ring-ssl30             20 (20/0/0)          7400       2         zip     txt
twonorm-ssl30          20 (20/0/0)          7400       2         zip     txt
coil2000-ssl30         85 (0/85/0)          9822       2         zip     txt
penbased-ssl30         16 (0/16/0)          10992      10        zip     txt
nursery-ssl30          8 (0/0/8)            12960      5         zip     txt
magic-ssl30            10 (10/0/0)          19020      2         zip     txt
All data sets: zip

Below you can find all the semi-supervised classification data sets available with 40% of labeled data. For each data set, the table shows its name, its number of attributes (Real/Integer/Nominal valued), its number of examples and its number of classes (possible values of the output variable).

The table lets you download each data set in KEEL format (inside a ZIP file), already partitioned by means of a 10-fold cross-validation procedure.

Name                   #Attributes (R/I/N)  #Examples  #Classes  10-fcv  Header
hepatitis-ssl40        19 (2/17/0)          80         2         zip     txt
zoo-ssl40              16 (0/0/16)          101        7         zip     txt
appendicitis-ssl40     7 (7/0/0)            106        2         zip     txt
lymphography-ssl40     18 (0/3/15)          148        4         zip     txt
iris-ssl40             4 (4/0/0)            150        3         zip     txt
tae-ssl40              5 (0/5/0)            151        3         zip     txt
automobile-ssl40       25 (15/0/10)         159        6         zip     txt
wine-ssl40             13 (13/0/0)          178        3         zip     txt
sonar-ssl40            60 (60/0/0)          208        2         zip     txt
glass-ssl40            9 (9/0/0)            214        7         zip     txt
housevotes-ssl40       16 (0/0/16)          232        2         zip     txt
spectfheart-ssl40      44 (0/44/0)          267        2         zip     txt
heart-ssl40            13 (1/12/0)          270        2         zip     txt
breast-ssl40           9 (0/0/9)            277        2         zip     txt
cleveland-ssl40        13 (13/0/0)          297        5         zip     txt
haberman-ssl40         3 (0/3/0)            306        2         zip     txt
ecoli-ssl40            7 (7/0/0)            336        8         zip     txt
bupa-ssl40             6 (1/5/0)            345        2         zip     txt
dermatology-ssl40      34 (0/34/0)          358        6         zip     txt
movement_libras-ssl40  90 (90/0/0)          360        15        zip     txt
monk-2-ssl40           6 (0/6/0)            432        2         zip     txt
saheart-ssl40          9 (5/3/1)            462        2         zip     txt
led7digit-ssl40        7 (7/0/0)            500        10        zip     txt
crx-ssl40              15 (3/3/9)           653        2         zip     txt
wisconsin-ssl40        9 (0/9/0)            683        2         zip     txt
australian-ssl40       14 (3/5/6)           690        2         zip     txt
pima-ssl40             8 (8/0/0)            768        2         zip     txt
mammographic-ssl40     5 (0/5/0)            830        2         zip     txt
vehicle-ssl40          18 (0/18/0)          846        4         zip     txt
tic-tac-toe-ssl40      9 (0/0/9)            958        2         zip     txt
vowel-ssl40            13 (10/3/0)          990        11        zip     txt
german-ssl40           20 (0/7/13)          1000       2         zip     txt
flare-ssl40            11 (0/0/11)          1066       6         zip     txt
contraceptive-ssl40    9 (0/9/0)            1473       3         zip     txt
yeast-ssl40            8 (8/0/0)            1484       10        zip     txt
titanic-ssl40          3 (3/0/0)            2201       2         zip     txt
segment-ssl40          19 (19/0/0)          2310       7         zip     txt
splice-ssl40           60 (0/0/60)          3190       3         zip     txt
chess-ssl40            36 (0/0/36)          3196       2         zip     txt
abalone-ssl40          8 (7/0/1)            4174       28        zip     txt
spambase-ssl40         57 (57/0/0)          4597       2         zip     txt
banana-ssl40           2 (2/0/0)            5300       2         zip     txt
phoneme-ssl40          5 (5/0/0)            5404       2         zip     txt
page-blocks-ssl40      10 (4/6/0)           5472       5         zip     txt
texture-ssl40          40 (40/0/0)          5500       11        zip     txt
mushroom-ssl40         22 (0/0/22)          5644       2         zip     txt
satimage-ssl40         36 (0/36/0)          6435       7         zip     txt
marketing-ssl40        13 (0/13/0)          6876       9         zip     txt
thyroid-ssl40          21 (6/15/0)          7200       3         zip     txt
ring-ssl40             20 (20/0/0)          7400       2         zip     txt
twonorm-ssl40          20 (20/0/0)          7400       2         zip     txt
coil2000-ssl40         85 (0/85/0)          9822       2         zip     txt
penbased-ssl40         16 (0/16/0)          10992      10        zip     txt
nursery-ssl40          8 (0/0/8)            12960      5         zip     txt
magic-ssl40            10 (10/0/0)          19020      2         zip     txt
All data sets: zip

Collecting Data Sets

If you have some example data sets and you would like to share them with the rest of the research community by means of this page, please be so kind as to send your data to the Webmaster Team with the following information:

  • People responsible for the data (full name, affiliation, e-mail, web page, ...).
  • Training and test data sets considered, preferably in ASCII format.
  • A brief description of the application.
  • References where it is used.
  • Results obtained by the methods proposed by the authors or used for comparison.
  • Type of experiment developed.
  • Any additional useful information.

Collecting Results

If you have applied your methods to some of the problems presented here, we will be glad to show your results on this page. Please be so kind as to send the following information to the Webmaster Team:

  • Name of the application considered and type of experiment developed.
  • Results obtained by the methods proposed by the authors or used for comparison.
  • References where the results are shown.
  • Any additional useful information.

Contact Us

If you are interested in being informed of each update made to this page, or you would like to comment on it, please contact the Webmaster Team.



 
 Copyright 2004-2018, KEEL (Knowledge Extraction based on Evolutionary Learning)