This section describes the main characteristics of the page-blocks data set and its attributes:
General information
Page Blocks Classification data set |
Type | Classification | Origin | Real world |
Features | 10 | (Real / Integer / Nominal) | (4 / 6 / 0) |
Instances | 5472 |
Classes | 5 |
Missing values? | No |
Attribute description
Attribute | Domain |
Height | [1, 804] |
Lenght | [1, 553] |
Area | [7, 143993] |
Eccen | [0.0070, 537.0] |
P_black | [0.052, 1.0] |
P_and | [0.062, 1.0] |
Mean_tr | [1.0, 4955.0] |
Blackpix | [1, 33017] |
Blackand | [7, 46133] |
Wb_trans | [1, 3212] |
Class | {1, 2, 3, 4, 5} |
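The domains listed above can be used as a quick sanity check when loading records. The following is a minimal sketch, assuming each record is a dictionary keyed by attribute name; the `out_of_domain` helper and the sample record are hypothetical and not part of the data set or of KEEL.

```python
# Attribute domains copied from the table above (names kept as spelled there).
DOMAINS = {
    "Height": (1, 804),
    "Lenght": (1, 553),
    "Area": (7, 143993),
    "Eccen": (0.0070, 537.0),
    "P_black": (0.052, 1.0),
    "P_and": (0.062, 1.0),
    "Mean_tr": (1.0, 4955.0),
    "Blackpix": (1, 33017),
    "Blackand": (7, 46133),
    "Wb_trans": (1, 3212),
}


def out_of_domain(record):
    """Return the names of attributes whose value lies outside its domain."""
    return [
        name
        for name, (low, high) in DOMAINS.items()
        if not low <= record[name] <= high
    ]


# Made-up record for illustration only; it is not taken from the data set.
sample = {
    "Height": 5, "Lenght": 7, "Area": 35, "Eccen": 1.4, "P_black": 0.4,
    "P_and": 0.657, "Mean_tr": 2.33, "Blackpix": 14, "Blackand": 23,
    "Wb_trans": 6,
}
violations = out_of_domain(sample)
```

An empty list means every attribute of the record falls within the documented range.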
Additional information
This database contains the blocks of the page layout of a document that have been detected by a segmentation process.
The task is to determine the type of block: Text (1), Horizontal line (2), Graphic (3), Vertical line (4) or Picture (5).
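The class codes above can be turned into readable labels with a small lookup. This is only a sketch: the `BLOCK_TYPES` name and the `block_type` helper are illustrative assumptions, and the code-to-name pairs are taken exactly as listed in the sentence above.

```python
# Class codes and names as listed in the description above.
BLOCK_TYPES = {
    1: "Text",
    2: "Horizontal line",
    3: "Graphic",
    4: "Vertical line",
    5: "Picture",
}


def block_type(code):
    """Return the block type name for a class code; raise on unknown codes."""
    try:
        return BLOCK_TYPES[code]
    except KeyError:
        raise ValueError(f"unknown class code: {code}") from None
```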
In this section you can download some files related to the page-blocks data set:
- The complete data set, already formatted in KEEL format, can be downloaded from here.
- A copy of the data set already partitioned by means of a 10-fold cross-validation procedure can be downloaded from here.
- A copy of the data set already partitioned by means of a 5-fold cross-validation procedure can be downloaded from here.
- The header file associated with this data set can be downloaded from here.
- This is not a native data set from the KEEL project. It has been obtained from the UCI Machine Learning Repository. The original page where the data set can be found is: http://archive.ics.uci.edu/ml/datasets/Page+Blocks+Classification.
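Once downloaded, a KEEL-format file can be read with a few lines of code. The sketch below assumes the usual KEEL layout, in which header lines start with `@` (such as `@attribute`) and comma-separated records follow an `@data` line. The `parse_keel` function is a hypothetical helper, and the embedded sample text is made up for illustration rather than copied from the actual file.

```python
def parse_keel(text):
    """Split KEEL-style text into attribute names and raw string rows."""
    names, rows, in_data = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if in_data:
            # Every line after @data is one comma-separated record.
            rows.append([field.strip() for field in line.split(",")])
        elif line.lower().startswith("@attribute"):
            # The attribute name is the second token; drop a glued "{...}".
            names.append(line.split()[1].split("{")[0])
        elif line.lower().startswith("@data"):
            in_data = True
    return names, rows


# Made-up two-attribute excerpt in the assumed layout (not real data).
SAMPLE = """@relation page-blocks
@attribute Height integer [1, 804]
@attribute Class {1, 2, 3, 4, 5}
@inputs Height
@outputs Class
@data
5, 1
"""

names, rows = parse_keel(SAMPLE)
```

The rows are returned as strings so that the caller can decide how to convert each column, for example to `int` or `float` according to the attribute types in the header.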