KEEL-dataset - data set description



This section describes the main characteristics of the rcv1v2-s4 data set and its attributes:

General information

Reuters Corpus Volume I (v2) - subset 4 data set
Type: Multi label
Origin: Real world
Features: 47229
(Real / Integer / Nominal): (47229 / 0 / 0)
Instances: 6000
Classes: 101
Missing values? No

Additional information

Reuters Corpus Volume I (RCV1) is an archive of over 800,000 manually categorized newswire stories recently made available by Reuters, Ltd. for research purposes. Use of this data for research on text categorization requires a detailed understanding of the real-world constraints under which the data was produced. This data set contains subset 4 of Reuters Corpus Volume I.




In this section you can download some files related to the rcv1v2-s4 data set:

  • The complete data set, already formatted in KEEL format, can be downloaded from here.
  • A copy of the data set already partitioned by means of a 10-fold cross validation procedure can be downloaded from here.
  • A copy of the data set already partitioned by means of a 5-fold cross validation procedure can be downloaded from here.
  • The header file associated with this data set can be downloaded from here.
  • This is not a native data set from the KEEL project. It has been obtained from the Mulan repository. The original page where the data set can be found is: http://mulan.sourceforge.net/datasets.html.
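
To illustrate what a 5-fold or 10-fold partition of the 6000 instances looks like, here is a minimal Python sketch. It assumes a plain random split with a fixed seed. The procedure KEEL actually used to build the downloadable partitions is not described on this page and may differ (for example, it may be stratified). The function names `k_fold_indices` and `train_test_splits` are hypothetical and not part of KEEL or Mulan.

```python
import random


def k_fold_indices(n_instances, k, seed=0):
    """Split range(n_instances) into k disjoint folds of near-equal size.

    Assumption: plain random assignment, not KEEL's own procedure.
    """
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    return [indices[i::k] for i in range(k)]


def train_test_splits(n_instances, k, seed=0):
    """Yield (train, test) index lists, using each fold once as the test set."""
    folds = k_fold_indices(n_instances, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    # 6000 is the instance count reported for rcv1v2-s4.
    for k in (5, 10):
        sizes = [len(test) for _, test in train_test_splits(6000, k)]
        print(k, sizes)
```

With 6000 instances, each test fold holds 1200 instances for 5 folds and 600 instances for 10 folds, and the remaining instances form the corresponding training set.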


 
 Copyright 2004-2018, KEEL (Knowledge Extraction based on Evolutionary Learning)