This section describes the main characteristics of the Enron data set and its attributes:
The email messages data set contains a subset of about 1700 labeled email messages. These were chosen in a semi-motivated fashion: the selection focused on emails related to business and the California Energy Crisis and on emails that occurred later in the collection, while trying to avoid very personal messages, jokes, and so on. Each message was labeled by two people, but no claims of consistency, comprehensiveness, or generality are made about these labelings.