UCI DATA FILE FORMAT

 

Files are encoded in the C4.5 format. A data set in this format consists of two files: a names file with the extension ".names" and a data file with the extension ".data".

 

Each attribute in the names file is declared in the following form:

  <attribute-name : attribute-type>

The attribute-name is an identifier followed by a colon. The attribute-type must be one of:

                continuous: the attribute takes continuous (numeric) values.

                discrete <n>: the word 'discrete' followed by an integer that indicates how many values the attribute can take.

                a comma-separated list of values: the attribute takes one of the listed values (for example, 'yes, no').

                ignore: the attribute should be ignored.
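The attribute types above can be told apart with a few string checks. The following is a minimal Python sketch, not part of C4.5 itself: the function name parse_attribute_type and the (kind, detail) return shape are choices made for this illustration, and it assumes that any type that is not 'continuous', 'ignore' or 'discrete <n>' is an explicit comma-separated value list such as "tc, none, tcf".

```python
def parse_attribute_type(spec):
    """Classify the type part of a names-file attribute line.

    Returns a (kind, detail) tuple. The tuple layout is a choice made for
    this sketch, not something the format prescribes.
    """
    # Drop surrounding blanks and the period that ends the entry.
    text = spec.strip().rstrip(".").strip()
    if text == "continuous":
        return ("continuous", None)
    if text == "ignore":
        return ("ignore", None)
    parts = text.split()
    if len(parts) == 2 and parts[0] == "discrete" and parts[1].isdigit():
        return ("discrete", int(parts[1]))
    # Assumption: anything else is an explicit comma-separated value list,
    # as in "cola: tc, none, tcf."
    values = [v.strip() for v in text.split(",") if v.strip()]
    return ("values", values)
```

For instance, parse_attribute_type("discrete 3.") yields the kind 'discrete' with the integer 3, while "yes, no." yields the kind 'values' with the two listed names.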

 

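The format of the '.names' file is the following: the class names come first, followed by one entry per attribute, and every entry ends with a period.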
 

class-1, class-2, ..., class-N.
attribute-1: attribute-type.
attribute-2: attribute-type.
...
attribute-M: attribute-type.
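The layout above can be read with a short routine. This is a hedged Python sketch rather than the reader used by C4.5: the name parse_names is hypothetical, and it assumes one entry per line, that '|' starts a comment running to the end of the line (as in the example file further down), and that names do not themselves contain ':' or '|'.

```python
def parse_names(text):
    """Split names-file text into (classes, attributes).

    classes    -> list of class names taken from the first entry
    attributes -> list of (name, spec) pairs in file order, where spec is
                  the text after the colon, without the final period
    """
    entries = []
    for line in text.splitlines():
        # Assumption: '|' begins a comment that runs to the end of the line.
        line = line.split("|", 1)[0].strip()
        if line:
            entries.append(line.rstrip("."))
    if not entries:
        raise ValueError("no class entry found")
    classes = [c.strip() for c in entries[0].split(",")]
    attributes = []
    for entry in entries[1:]:
        name, _, spec = entry.partition(":")
        attributes.append((name.strip(), spec.strip()))
    return classes, attributes
```

Keeping the attributes in a list rather than a dictionary preserves their order, which matters because the data file gives values in that same order.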


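The characteristics of data files are the following: each line describes one instance, giving its M attribute values in the order in which the attributes are declared in the '.names' file, followed by its class. Fields are separated by commas, and an unknown value is written as '?'.

The format of the '.data' file, for K instances, is the following:

value-1-1, value-1-2, ..., value-1-M, class
value-2-1, value-2-2, ..., value-2-M, class
...
value-K-1, value-K-2, ..., value-K-M, class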
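Reading such a file amounts to splitting each line on commas. The Python sketch below is illustrative only: the name parse_data and its n_attributes parameter are hypothetical, and it assumes, as in the example rows further down, that the class is the last field on each line, that '?' marks a missing value, and that values contain no commas.

```python
def parse_data(text, n_attributes):
    """Turn data-file text into a list of (values, label) pairs.

    values is a list of n_attributes strings, with None wherever the file
    has '?'; label is the last field on the line.
    """
    rows = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != n_attributes + 1:
            raise ValueError(
                f"line {lineno}: expected {n_attributes + 1} fields, "
                f"got {len(fields)}"
            )
        # Assumption: '?' marks a missing value.
        values = [None if f == "?" else f for f in fields[:-1]]
        rows.append((values, fields[-1]))
    return rows
```

Values are left as strings here; converting the continuous ones to numbers needs the attribute types from the names file.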

 

An example of a UCI data set is the following, starting with the '.names' file:

| First, the names of the classes

good, bad.

| Then the attributes
dur: continuous.
wage1: continuous.
wage2: continuous.
wage3: continuous.
cola: tc, none, tcf.
hours: continuous.
pension: empl contr, ret allw, none.
stby_pay: continuous.
shift_diff: continuous.
educ_allw: yes, no.
holidays: continuous.
vacation: average, generous, below average.
lngtrm_disabil: yes, no.
dntl_ins: half, none, full.
bereavement: yes, no.
empl_hplan: half, full, none.

The corresponding '.data' file contains one line per instance:

2,5.0,4.0,?,none,37,?,?,5,no,11,below average,yes,full,yes,full,good
3,2.0,2.5,?,?,35,none,?,?,?,10,average,?,?,yes,full,bad
3,4.5,4.5,5.0,none,40,?,?,?,no,11,average,?,half,?,?,good
3,3.0,2.0,2.5,tc,40,none,?,5,no,10,below average,yes,half,yes,full,bad
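As a consistency check on this example, the data lines can be validated against the attribute declarations. The Python sketch below copies the declarations and the four rows from the example; the names ATTRIBUTES, CLASSES, ROWS and check_row are hypothetical, and the rules it applies (16 attribute values followed by the class, '?' accepted everywhere as a missing value) are assumptions read off the example rather than a specification.

```python
# Declarations copied from the example names file; each entry is
# (name, "continuous") or (name, list of allowed values).
ATTRIBUTES = [
    ("dur", "continuous"), ("wage1", "continuous"), ("wage2", "continuous"),
    ("wage3", "continuous"), ("cola", ["tc", "none", "tcf"]),
    ("hours", "continuous"), ("pension", ["empl contr", "ret allw", "none"]),
    ("stby_pay", "continuous"), ("shift_diff", "continuous"),
    ("educ_allw", ["yes", "no"]), ("holidays", "continuous"),
    ("vacation", ["average", "generous", "below average"]),
    ("lngtrm_disabil", ["yes", "no"]), ("dntl_ins", ["half", "none", "full"]),
    ("bereavement", ["yes", "no"]), ("empl_hplan", ["half", "full", "none"]),
]
CLASSES = ["good", "bad"]

ROWS = """\
2,5.0,4.0,?,none,37,?,?,5,no,11,below average,yes,full,yes,full,good
3,2.0,2.5,?,?,35,none,?,?,?,10,average,?,?,yes,full,bad
3,4.5,4.5,5.0,none,40,?,?,?,no,11,average,?,half,?,?,good
3,3.0,2.0,2.5,tc,40,none,?,5,no,10,below average,yes,half,yes,full,bad
"""


def check_row(line):
    """Return a list of problems found in one data line (empty if none)."""
    fields = [f.strip() for f in line.split(",")]
    if len(fields) != len(ATTRIBUTES) + 1:
        return [f"expected {len(ATTRIBUTES) + 1} fields, got {len(fields)}"]
    problems = []
    # zip stops after the last attribute, so the class field is not included.
    for (name, domain), value in zip(ATTRIBUTES, fields):
        if value == "?":  # missing value, nothing to check
            continue
        if domain == "continuous":
            try:
                float(value)
            except ValueError:
                problems.append(f"{name}: {value!r} is not a number")
        elif value not in domain:
            problems.append(f"{name}: {value!r} is not one of {domain}")
    if fields[-1] not in CLASSES:
        problems.append(f"class: {fields[-1]!r} is not one of {CLASSES}")
    return problems


for number, line in enumerate(ROWS.splitlines(), start=1):
    print(number, check_row(line) or "ok")
```

Under these assumptions, all four example rows pass the check. A row with a value outside its declared list, or with the wrong number of fields, would be reported with the attribute name.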