This section describes the main characteristics of the ecoli data set and its attributes:
General information
Ecoli data set |
Type | Classification |
Origin | Real world |
Features | 7 (Real / Integer / Nominal: 7 / 0 / 0) |
Instances | 336 |
Classes | 8 |
Missing values? | No |
Attribute description
Attribute | Domain |
Mcg | [0.0,89.0] |
Gvh | [1.0,88.0] |
Lip | [1.0,48.0] |
Chg | [1.0,5.0] |
Aac | [0.0,88.0] |
Alm1 | [1.0,94.0] |
Alm2 | [0.0,99.0] |
Site | {cp,im,imS,imL,imU,om,omL,pp} |
Additional information
The objective of this problem is to predict the localization site of proteins from a set of measurements computed for each protein. The possible sites are cytoplasm (cp), inner membrane (im), periplasm (pp), outer membrane (om), outer membrane lipoprotein (omL), inner membrane lipoprotein (imL), inner membrane with a cleavable signal sequence (imS), and inner membrane with an uncleavable signal sequence (imU). To adapt the data to the classification process, the first attribute of the original data set (the sequence name) has been removed in this version.
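The preprocessing step described above, dropping the sequence name, can be sketched as follows. This is a minimal illustration, not the procedure KEEL actually used: it assumes the original UCI file stores one protein per line as whitespace-separated fields, with the sequence name first, the seven numeric features next, and the localization site last. The function name `drop_sequence_name` and the sample line are hypothetical.

```python
def drop_sequence_name(lines):
    """Return (features, site) pairs with the leading sequence name removed.

    Assumed layout per non-empty line (hypothetical, based on the UCI file):
    <sequence name> <7 numeric features> <localization site>
    """
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 9:
            raise ValueError(f"expected 9 fields, got {len(fields)}: {line!r}")
        # fields[0] is the sequence name, which is discarded
        features = [float(value) for value in fields[1:8]]
        site = fields[8]
        rows.append((features, site))
    return rows


# Made-up line in the assumed layout; the values are illustrative only.
sample = ["SEQ_1  0.10  0.20  0.30  0.40  0.50  0.60  0.70  cp"]
print(drop_sequence_name(sample))
# [([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7], 'cp')]
```

Raising an error on a wrong field count, rather than silently skipping the line, makes a malformed row visible instead of shrinking the data set.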
In this section you can download the following files related to the ecoli data set:
- The complete data set, already formatted in KEEL format, can be downloaded from here.
- A copy of the data set already partitioned by means of a 10-fold cross-validation procedure can be downloaded from here.
- A copy of the data set already partitioned by means of a 5-fold cross-validation procedure can be downloaded from here.
- The header file associated with this data set can be downloaded from here.
- This is not a native data set from the KEEL project. It has been obtained from the UCI Machine Learning Repository. The original page where the data set can be found is: http://archive.ics.uci.edu/ml/datasets/Ecoli.
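Once a KEEL-formatted file has been downloaded, it can be read with a small parser. The sketch below assumes the usual KEEL layout: header lines starting with `@` (`@relation`, `@attribute`, `@inputs`, `@outputs`), an `@data` line, and then one comma-separated instance per line with the class in the last column. The function name `parse_keel` is hypothetical. The header in `SAMPLE` reuses the attribute names and domains listed above, but the data row is invented for illustration.

```python
def parse_keel(text):
    """Parse KEEL-style text into (attribute names, feature rows, class labels).

    Assumptions: '@attribute <name> ...' declares each column in order,
    '@data' starts the instances, and the last column is the class.
    """
    attributes, features, labels = [], [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if not in_data:
            lowered = line.lower()
            if lowered.startswith("@attribute"):
                # "@attribute Mcg real [0.0,89.0]" -> "Mcg"
                attributes.append(line.split()[1])
            elif lowered.startswith("@data"):
                in_data = True
            # @relation, @inputs and @outputs are ignored in this sketch
            continue
        values = [v.strip() for v in line.split(",")]
        if len(values) != len(attributes):
            raise ValueError(f"expected {len(attributes)} values, got {len(values)}")
        features.append([float(v) for v in values[:-1]])
        labels.append(values[-1])
    return attributes, features, labels


# Header taken from the attribute table; the single data row is made up.
SAMPLE = """@relation ecoli
@attribute Mcg real [0.0,89.0]
@attribute Gvh real [1.0,88.0]
@attribute Lip real [1.0,48.0]
@attribute Chg real [1.0,5.0]
@attribute Aac real [0.0,88.0]
@attribute Alm1 real [1.0,94.0]
@attribute Alm2 real [0.0,99.0]
@attribute Site {cp,im,imS,imL,imU,om,omL,pp}
@inputs Mcg, Gvh, Lip, Chg, Aac, Alm1, Alm2
@outputs Site
@data
10.0, 20.0, 30.0, 4.0, 50.0, 60.0, 70.0, cp
"""

names, X, y = parse_keel(SAMPLE)
print(len(names), X, y)
# 8 [[10.0, 20.0, 30.0, 4.0, 50.0, 60.0, 70.0]] ['cp']
```

Treating the last column as the class keeps the sketch short. A more general reader could use the `@outputs` line to locate the class attribute instead.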