This section describes the main characteristics of the census data set and its attributes:
General information
Census-Income (KDD) data set |
Type | Classification | Origin | Real world |
Features | 41 | (Real / Integer / Nominal) | (1 / 12 / 28) |
Classes | 3 |
Missing values? | Yes |
Total instances | 299284 |
Instances without missing values | 142521 |
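The two instance counts above imply how much data is lost if incomplete records are simply discarded. The short calculation below is only an illustrative sketch; the variable names are not part of the data set.

```python
# Counts copied from the "General information" table.
total_instances = 299284
complete_instances = 142521

# Instances that contain at least one missing value.
with_missing = total_instances - complete_instances

# Share of instances that survive if incomplete records are dropped.
share_complete = complete_instances / total_instances

print(f"Instances with missing values: {with_missing}")
print(f"Share without missing values: {share_complete:.1%}")
```

Fewer than half of the instances are complete, so dropping incomplete records is a costly choice for this data set.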
Attribute description
Attribute | Domain | Attribute | Domain | Attribute | Domain |
Atr-0 | [0,90] | Atr-14 | {Not_in_universe, ..., Job_leaver} | Atr-28 | {Yes, ..., No} |
Atr-1 | {Not_in_universe, ..., Without_pay} | Atr-15 | {Armed_Forces, ..., Unemployed_part-_time} | Atr-29 | {Yes, ..., No} |
Atr-2 | [0,51] | Atr-16 | [0,99999] | Atr-30 | [0,6] |
Atr-3 | [0,46] | Atr-17 | [0,4608] | Atr-31 | {Not_in_universe, ..., Father_only_present} |
Atr-4 | {10th_grade, ..., High_school_graduate} | Atr-18 | [0,99999] | Atr-32 | {United-States, ..., Panama} |
Atr-5 | [0,9999] | Atr-19 | {Nonfiler, ..., Single} | Atr-33 | {United-States, ..., Laos} |
Atr-6 | {Not_in_universe, ... ,College_or_university} | Atr-20 | {South, ..., Abroad} | Atr-34 | {United-States, ..., Laos} |
Atr-7 | {Divorced, ..., Separated} | Atr-21 | {Arkansas, ..., Mississippi} | Atr-35 | {Native, ..., Foreign} |
Atr-8 | {Construction, ..., Armed_Forces} | Atr-22 | {Householder, ..., Spouse_of_householder} | Atr-36 | [0,2] |
Atr-9 | {Not_in_universe, ..., Armed_Forces} | Atr-23 | {Householder, ..., Nonrelative_of_householder} | Atr-37 | {Yes, ..., No} |
Atr-10 | {White, ..., Other} | Atr-24 | [37.87,18656.3] | Atr-38 | [0,2] |
Atr-11 | {All_other, ..., NA} | Atr-25 | {MSA_to_MSA, ..., NonMSA_to_MSA} | Atr-39 | [0,52] |
Atr-12 | {Male,Female} | Atr-26 | {Same_county, ..., Abroad} | Atr-40 | [94,95] |
Atr-13 | {Not_in_universe,No,Yes} | Atr-27 | {Same_county, ..., Nonmover} | Output | {-_50000.,50000+.} |
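The domains in the table can be used to sanity-check values after loading the data. The following is a minimal sketch, not part of KEEL: the function name `in_domain` and the two dictionaries are hypothetical, and only attributes whose domain is fully listed in the table are included (the elided nominal domains are left out on purpose).

```python
# Numeric domains copied from the table: attribute -> (lower bound, upper bound).
NUMERIC_DOMAINS = {
    "Atr-0": (0, 90),
    "Atr-24": (37.87, 18656.3),
    "Atr-40": (94, 95),
}

# Nominal domains copied from the table; only fully enumerated ones.
NOMINAL_DOMAINS = {
    "Atr-12": {"Male", "Female"},
    "Atr-13": {"Not_in_universe", "No", "Yes"},
    "Output": {"-_50000.", "50000+."},
}


def in_domain(attribute, value):
    """Return True if value lies inside the listed domain of attribute."""
    if attribute in NUMERIC_DOMAINS:
        low, high = NUMERIC_DOMAINS[attribute]
        return low <= float(value) <= high
    if attribute in NOMINAL_DOMAINS:
        return value in NOMINAL_DOMAINS[attribute]
    raise KeyError(f"No domain recorded for {attribute}")


print(in_domain("Atr-0", 45))        # inside [0,90]
print(in_domain("Atr-0", 91))        # outside [0,90]
print(in_domain("Output", "50000+."))
```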
Additional information
The Census data set was extracted in 1994 from census data of the United States. It contains continuous and nominal attributes describing social information (age, race, sex, marital status, ...) about the registered citizens. The task is to predict whether a citizen’s income exceeds fifty thousand dollars a year.
This data set is an extended version of the Adult data set. It has four times as many instances and three times as many attributes.
In this section you can download some files related to the census data set:
- The complete data set, already formatted in KEEL format, can be downloaded from here.
- A copy of the data set already partitioned by means of a 10-fold cross-validation procedure can be downloaded from here.
- A copy of the data set already partitioned by means of a 5-fold cross-validation procedure can be downloaded from here.
- The header file associated with this data set can be downloaded from here.
- This is not a native data set from the KEEL project. It has been obtained from the UCI Machine Learning Repository. The original page where the data set can be found is: http://archive.ics.uci.edu/ml/datasets/Census-Income+%28KDD%29.
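Once one of the files above has been downloaded, it can be read with a few lines of code. The sketch below assumes the usual KEEL layout (a header of `@relation`, `@attribute`, `@inputs` and `@outputs` lines, followed by `@data` and comma-separated rows) and assumes that missing values are written as `?` or `<null>`; check the downloaded file before relying on either assumption. The function names and the three-attribute `SAMPLE` are made up for illustration and are not real records from the data set.

```python
import re


def parse_keel(text):
    """Split KEEL-style text into attribute names and data rows (as strings)."""
    attributes, rows = [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if in_data:
            rows.append([field.strip() for field in line.split(",")])
        elif line.lower().startswith("@attribute"):
            # The name is the first token after "@attribute", ending at
            # whitespace, "{" (nominal domain) or "[" (numeric range).
            rest = line[len("@attribute"):].strip()
            attributes.append(re.split(r"[\s{\[]", rest, maxsplit=1)[0])
        elif line.lower().startswith("@data"):
            in_data = True
    return attributes, rows


def complete_rows(rows, missing_markers=("?", "<null>")):
    """Keep only rows with no missing-value marker (markers are an assumption)."""
    return [row for row in rows if not any(f in missing_markers for f in row)]


# Made-up sample in the assumed layout; not real census records.
SAMPLE = """@relation sample
@attribute Atr-0 integer [0, 90]
@attribute Atr-12 {Male, Female}
@attribute Class {-_50000., 50000+.}
@inputs Atr-0, Atr-12
@outputs Class
@data
45, Male, 50000+.
?, Female, -_50000.
"""

attributes, rows = parse_keel(SAMPLE)
print(attributes)
print(len(rows), len(complete_rows(rows)))
```

Values are kept as strings here; converting numeric attributes to `int` or `float` would be a natural next step once the header types have been inspected.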