KEEL-dataset - data set description



This section describes the main characteristics of the enron data set and its attributes:

General information

e-mail messages data set
Type: Multi label
Origin: Real world
Features: 1001
(Real / Integer / Nominal): (0 / 0 / 1001)
Instances: 1702
Classes: 53
Missing values?: No

Additional information

The e-mail messages data set contains a subset of about 1700 labeled email messages. These were chosen in a semi-motivated fashion (focusing on business-related emails, on the California energy crisis, and on emails that occurred later in the collection, while trying to avoid very personal messages, jokes, and so on). Each message was labeled by two people, but no claims of consistency, comprehensiveness, or generality are made about these labelings.




In this section you can download some files related to the enron data set:

  • The complete data set, already formatted in KEEL format, can be downloaded from here (ZIP file).
  • A copy of the data set already partitioned by means of a 10-fold cross-validation procedure can be downloaded from here (ZIP file).
  • A copy of the data set already partitioned by means of a 5-fold cross-validation procedure can be downloaded from here (ZIP file).
  • The header file associated with this data set can be downloaded from here (text file).
  • This is not a native data set from the KEEL project. It has been obtained from the Mulan repository. The original page where the data set can be found is: http://mulan.sourceforge.net/datasets.html.
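
For readers who want to inspect the downloaded files programmatically, the following is a minimal sketch of reading a KEEL-style file in Python. It assumes the usual KEEL header layout (`@relation`, `@attribute`, `@inputs`, `@outputs`, `@data`, followed by comma-separated rows); whether the enron files declare their 53 labels through `@outputs` in exactly this way is an assumption and should be checked against the header file. The function names `parse_keel` and `split_inputs_outputs` and the toy sample are hypothetical and are not part of KEEL or Mulan.

```python
def parse_keel(text):
    """Parse KEEL-style text into (attributes, inputs, outputs, rows).

    Assumes header lines start with '@' keywords and that every line
    after '@data' is one comma-separated instance.
    """
    attributes, inputs, outputs, rows = [], [], [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
            continue
        keyword = line.split(None, 1)[0].lower()
        rest = line[len(keyword):].strip()
        if keyword == "@attribute":
            # The attribute name precedes its type or its {value set}.
            attributes.append(rest.split("{")[0].split()[0])
        elif keyword == "@inputs":
            inputs = [name.strip() for name in rest.split(",") if name.strip()]
        elif keyword == "@outputs":
            outputs = [name.strip() for name in rest.split(",") if name.strip()]
        elif keyword == "@data":
            in_data = True
    return attributes, inputs, outputs, rows


def split_inputs_outputs(attributes, inputs, outputs, rows):
    """Split each row into feature values and label values by attribute name."""
    in_idx = [attributes.index(name) for name in inputs]
    out_idx = [attributes.index(name) for name in outputs]
    features = [[row[i] for i in in_idx] for row in rows]
    labels = [[row[i] for i in out_idx] for row in rows]
    return features, labels


# Toy sample (hypothetical, not taken from the enron files): two nominal
# features and two binary labels, mirroring a multi-label layout.
SAMPLE = """@relation toy
@attribute w1 {0,1}
@attribute w2 {0,1}
@attribute labelA {0,1}
@attribute labelB {0,1}
@inputs w1, w2
@outputs labelA, labelB
@data
1,0,0,1
0,1,1,1
"""

attributes, inputs, outputs, rows = parse_keel(SAMPLE)
features, labels = split_inputs_outputs(attributes, inputs, outputs, rows)
print(len(rows), "instances,", len(inputs), "features,", len(outputs), "labels")
```

Applied to the complete enron file, a parser along these lines would be expected to report 1702 instances, 1001 features and 53 labels if the file follows the assumed layout. Values are kept as strings here because all 1001 features are nominal.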


 
 Copyright 2004-2018, KEEL (Knowledge Extraction based on Evolutionary Learning)