This section describes the main characteristics of the page-blocks data set and its attributes:
General information
| Page Blocks Classification data set |
| Type | Classification | Origin | Real world |
| Features | 10 | (Real / Integer / Nominal) | (4 / 6 / 0) |
| Instances | 5472 |
| Classes | 5 |
| Missing values? | No |
Attribute description
| Attribute | Domain |
| Height | [1, 804] |
| Lenght | [1, 553] |
| Area | [7, 143993] |
| Eccen | [0.0070, 537.0] |
| P_black | [0.052, 1.0] |
| P_and | [0.062, 1.0] |
| Mean_tr | [1.0, 4955.0] |
| Blackpix | [1, 33017] |
| Blackand | [7, 46133] |
| Wb_trans | [1, 3212] |
| Class | {1, 2, 3, 4, 5} |
Additional information
This database contains blocks of the page layout of a document that have been detected by a segmentation process.
The task is to determine the type of block: Text (1), Horizontal line (2), Graphic (3), Vertical line (4) or Picture (5).
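As a minimal sketch of how the tables above might be used in practice, the following Python snippet encodes the class codes and attribute domains listed in this section and checks a record against them. The helper names `out_of_range` and `class_label` are hypothetical, and the record is assumed to be a plain dictionary keyed by attribute name.

```python
# Class codes and attribute domains copied from the tables in this section.
CLASS_LABELS = {
    1: "Text",
    2: "Horizontal line",
    3: "Graphic",
    4: "Vertical line",
    5: "Picture",
}

DOMAINS = {
    "Height": (1, 804),
    "Lenght": (1, 553),  # spelling as it appears in the attribute table
    "Area": (7, 143993),
    "Eccen": (0.0070, 537.0),
    "P_black": (0.052, 1.0),
    "P_and": (0.062, 1.0),
    "Mean_tr": (1.0, 4955.0),
    "Blackpix": (1, 33017),
    "Blackand": (7, 46133),
    "Wb_trans": (1, 3212),
}


def out_of_range(record):
    """Return names of attributes that are missing or outside their domain.

    Hypothetical helper: `record` is assumed to map attribute names to numbers.
    """
    bad = []
    for name, (low, high) in DOMAINS.items():
        value = record.get(name)
        if value is None or not low <= value <= high:
            bad.append(name)
    return bad


def class_label(code):
    """Translate a numeric class code (1 to 5) into its block type."""
    return CLASS_LABELS[code]
```

A record that passes the check yields an empty list from `out_of_range`, and `class_label(2)` gives the block type for class 2.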
In this section you can download some files related to the page-blocks data set:
- The complete data set, already formatted in KEEL format, can be downloaded from here.
- A copy of the data set already partitioned by means of a 10-fold cross-validation procedure can be downloaded from here.
- A copy of the data set already partitioned by means of a 5-fold cross-validation procedure can be downloaded from here.
- The header file associated with this data set can be downloaded from here.
- This is not a native data set from the KEEL project. It has been obtained from the UCI Machine Learning Repository. The original page where the data set can be found is: http://archive.ics.uci.edu/ml/datasets/Page+Blocks+Classification.
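Once a KEEL-format file has been downloaded, it could be read along the following lines. This is only a sketch: it assumes the file consists of header lines starting with `@` (including one `@attribute` line per column), followed by an `@data` line and comma-separated rows. The function name `read_keel` and the abbreviated three-column sample are hypothetical and used for illustration only; check them against the actual header file.

```python
import io


def read_keel(stream):
    """Parse a KEEL-style text stream into (attribute names, rows of strings).

    Assumed layout: '@attribute <name> ...' header lines, then '@data',
    then one comma-separated record per line.
    """
    names, rows, in_data = [], [], False
    for raw in stream:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if in_data:
            rows.append([field.strip() for field in line.split(",")])
        elif line.lower().startswith("@attribute"):
            # Second token is the name; drop any '{' glued to it.
            names.append(line.split()[1].split("{")[0])
        elif line.lower().startswith("@data"):
            in_data = True
    return names, rows


# Abbreviated, illustrative sample (not the real header file).
SAMPLE = """@relation page-blocks
@attribute Height integer [1, 804]
@attribute Lenght integer [1, 553]
@attribute Class {1, 2, 3, 4, 5}
@data
5, 7, 1
"""

names, rows = read_keel(io.StringIO(SAMPLE))
```

Values are returned as strings so that the caller can decide how to convert the real, integer and class columns.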