WEKA DATA FILE FORMAT

 

The header of a WEKA data file starts with the relation declaration, which has the following format:

@relation <relation-name>

where <relation-name> is a string. The string must be quoted if the name includes spaces.
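As a minimal illustration of the quoting rule, the Python sketch below builds the relation line. The function name `relation_line` is hypothetical. The sketch wraps the name in double quotes (Weka also accepts single quotes) and does not escape quote characters inside the name.

```python
def relation_line(name):
    """Build the @relation line, quoting the name if it contains spaces."""
    if " " in name:
        name = f'"{name}"'  # quotes inside the name are not escaped in this sketch
    return f"@relation {name}"

print(relation_line("weather"))       # @relation weather
print(relation_line("weather data"))  # @relation "weather data"
```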

The format for the @attribute statement is:

@attribute <attribute-name> <datatype>

where <attribute-name> must start with an alphabetic character. If the name includes spaces, the entire name must be quoted.
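The naming rules can be checked while building a declaration. The following is a minimal Python sketch. `attribute_line` is a hypothetical helper, not part of Weka. It only checks the first character and whether the name contains spaces, and it passes the datatype through verbatim.

```python
def attribute_line(name, datatype):
    """Build an @attribute declaration, enforcing the naming rules above."""
    # The name must start with an alphabetic character.
    if not name or not name[0].isalpha():
        raise ValueError(f"attribute name must start with a letter: {name!r}")
    # A name containing spaces is quoted as a whole (double quotes assumed).
    if " " in name:
        name = f'"{name}"'
    return f"@attribute {name} {datatype}"

print(attribute_line("petal length", "numeric"))  # @attribute "petal length" numeric
```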

The <datatype> can be any of the four kinds of type currently (version 3.2.1) supported by Weka: numeric (items 1 and 2 below), date, string and enumerated (nominal).

1)      NUMERIC or REAL. Numeric attributes hold real numbers.

2)      INTEGER. Integer attributes hold integer numbers.

3)      DATE. A date attribute can be followed by an optional string specifying how date values should be parsed and printed. The default format string accepts the ISO-8601 combined date and time format: "yyyy-MM-dd'T'HH:mm:ss".

4)      STRING. String attributes allow us to create attributes containing arbitrary textual values.

5)      ENUMERATE (nominal). A nominal attribute is declared by listing the set of possible values it can take (characters or strings), separated by commas and enclosed in curly braces. For example, an attribute named time that indicates the weather can be expressed as:
                    @attribute time {sunny, rainy, cloudy}
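To show how the declarations in the list above can be told apart, here is a hypothetical Python parser for a single @attribute line. It is not Weka's own loader. It assumes the following:

- Type keywords are case-insensitive.
- Nominal values contain no commas or quotes.
- Date values use the default format. The Python pattern `%Y-%m-%dT%H:%M:%S` is a hand translation of the Java pattern "yyyy-MM-dd'T'HH:mm:ss" and covers only that default.

```python
import re
from datetime import datetime

# Name: double-quoted, single-quoted, or a run of non-space characters.
# Everything after it is the datatype specification.
_ATTR = re.compile(r"""@attribute\s+("[^"]*"|'[^']*'|\S+)\s+(.+)""", re.IGNORECASE)

def parse_attribute(line):
    """Return (name, kind, extra) for one @attribute line.

    extra is the list of allowed values for nominal attributes, the optional
    format string for dates, and None otherwise.
    """
    m = _ATTR.match(line.strip())
    if m is None:
        raise ValueError(f"not an @attribute line: {line!r}")
    name = m.group(1).strip("\"'")
    spec = m.group(2).strip()
    if spec.startswith("{") and spec.endswith("}"):
        # Nominal (enumerated): comma-separated values inside curly braces.
        return name, "nominal", [v.strip() for v in spec[1:-1].split(",")]
    keyword, _, rest = spec.partition(" ")
    keyword = keyword.lower()
    if keyword in ("numeric", "real", "integer", "string"):
        return name, keyword, None
    if keyword == "date":
        # Optional format string, e.g. "yyyy-MM-dd'T'HH:mm:ss" (double quotes removed).
        return name, "date", rest.strip().strip('"') or None
    raise ValueError(f"unknown datatype: {spec!r}")

# Hand translation of the default date pattern into Python strptime codes.
ARFF_DEFAULT_DATE = "%Y-%m-%dT%H:%M:%S"

def parse_arff_date(value):
    """Parse a date value in the default format; surrounding double quotes are removed."""
    return datetime.strptime(value.strip().strip('"'), ARFF_DEFAULT_DATE)

print(parse_attribute("@attribute time {sunny, rainy, cloudy}"))
# ('time', 'nominal', ['sunny', 'rainy', 'cloudy'])
print(parse_arff_date("2001-04-03T12:12:12").strftime(ARFF_DEFAULT_DATE))
# 2001-04-03T12:12:12
```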

The data section starts with the @data declaration, followed by one instance per line:

@data

x11,x12,...,x1N
x21,x22,...,x2N

Each instance is represented on a single line, with carriage returns denoting the end of the instance. Attribute values for each instance are delimited by commas. They must appear in the order in which the attributes were declared in the header section (i.e. the value corresponding to the nth @attribute declaration is always the nth field of the instance).
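Because of this ordering rule, each data line can be paired field by field with the declared attribute names. The following is a minimal Python sketch. `parse_instance` is a hypothetical name. The sketch assumes values are unquoted and contain no embedded commas, and it leaves them as raw strings.

```python
def parse_instance(line, attribute_names):
    """Map one data line to {attribute name: raw value string}."""
    values = [v.strip() for v in line.split(",")]
    if len(values) != len(attribute_names):
        raise ValueError(
            f"expected {len(attribute_names)} values, got {len(values)}: {line!r}"
        )
    # The nth field belongs to the nth @attribute declaration.
    return dict(zip(attribute_names, values))

names = ["outlook", "temperature", "humidity", "windy", "play"]
print(parse_instance("sunny,85,85,FALSE,no", names))
# {'outlook': 'sunny', 'temperature': '85', 'humidity': '85', 'windy': 'FALSE', 'play': 'no'}
```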

        Missing values are represented by a single question mark, as in:

                @data
                4.4,?,1.5,?,Iris-setosa
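A minimal Python sketch of reading such a line, given which columns are numeric. `parse_row` and its `numeric` flag list are hypothetical. Using `None` as the in-memory marker for `?` is a choice made here, not something ARFF prescribes.

```python
def parse_row(line, numeric):
    """Split a data line; '?' becomes None, numeric columns become floats.

    numeric[i] is True when the i-th declared attribute is numeric.
    """
    tokens = [t.strip() for t in line.split(",")]
    if len(tokens) != len(numeric):
        raise ValueError(f"expected {len(numeric)} values, got {len(tokens)}")
    row = []
    for token, is_numeric in zip(tokens, numeric):
        if token == "?":
            row.append(None)          # missing value
        elif is_numeric:
            row.append(float(token))
        else:
            row.append(token)
    return row

print(parse_row("4.4,?,1.5,?,Iris-setosa", [True, True, True, True, False]))
# [4.4, None, 1.5, None, 'Iris-setosa']
```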
 
 

Further rules of this format:

o       The names of the relation and of the attributes are strings. This string type is the same as the string type used in Java.

o       If a name contains spaces, it must be enclosed in double quotes.

o       Missing values are indicated with the symbol '?'.

o       The decimal separator is a point, not a comma.

o       Values in the @data section are separated by commas.

o       A % symbol means that the remainder of the line should be considered as a comment.

o       By default, these files are stored with the extension ".arff".
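Two of these rules, comments and the decimal point, can be sketched in Python as below. Both helpers are hypothetical. `strip_comment` ignores quoting, so a `%` inside a quoted name or value would be mistaken for the start of a comment. Since the comma already separates values on a data line, a value such as 85,5 could not be read as a single number anyway. `parse_decimal` rejects it explicitly only to give a clearer message.

```python
def strip_comment(line):
    """Remove a '%' comment and trailing whitespace from one line.

    Simplification: a '%' inside a quoted name or value is also treated as
    the start of a comment.
    """
    return line.split("%", 1)[0].rstrip()

def parse_decimal(token):
    """Parse a numeric value; ARFF uses '.' as the decimal separator."""
    if "," in token:
        raise ValueError(f"use '.' as decimal separator, not ',': {token!r}")
    return float(token)

print(strip_comment("@relation weather % toy data"))  # @relation weather
print(parse_decimal("85.5"))                            # 85.5
```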

 

In summary, a WEKA data file has the following overall structure:

 

@relation <relation-name>
@attribute <attribute-name-1> <datatype>
...
@attribute <attribute-name-N> <datatype>
@data
value11,value12,...,value1N
...
valueM1,valueM2,...,valueMN
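This structure can also be generated programmatically. The following is a minimal Python sketch with a hypothetical `write_arff`, whose argument shapes are assumptions. Datatypes are passed through verbatim (e.g. "real" or "{yes, no}"), `None` values are written as `?`, and names are not quoted.

```python
def write_arff(relation, attributes, rows):
    """Assemble ARFF text: @relation, one @attribute per column, then @data.

    attributes: list of (name, datatype) pairs; rows: lists of values in the
    same order as the attributes.
    """
    lines = [f"@relation {relation}"]
    lines += [f"@attribute {name} {datatype}" for name, datatype in attributes]
    lines.append("@data")
    for row in rows:
        lines.append(",".join("?" if v is None else str(v) for v in row))
    return "\n".join(lines) + "\n"

print(write_arff("weather",
                 [("outlook", "{sunny, overcast, rainy}"), ("temperature", "real")],
                 [["sunny", 85], ["rainy", None]]), end="")
# @relation weather
# @attribute outlook {sunny, overcast, rainy}
# @attribute temperature real
# @data
# sunny,85
# rainy,?
```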

 

One example of a valid WEKA file is:

 

% Comment

@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
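The following deliberately minimal Python reader handles files like this one. It assumes unquoted names and values, skips `%` comments and blank lines, and does not interpret datatypes. It is an illustration only and does not replace Weka's own loader. `read_arff` is a hypothetical name.

```python
def read_arff(text):
    """Return (relation, attribute_names, rows) from ARFF text.

    Minimal sketch: names and values are assumed unquoted, datatypes are not
    interpreted, '?' becomes None, and '%' comments and blank lines are skipped.
    """
    relation, names, rows, in_data = None, [], [], False
    for raw in text.splitlines():
        line = raw.split("%", 1)[0].strip()
        if not line:
            continue
        keyword = line.lower()  # declarations are matched case-insensitively
        if in_data:
            rows.append([None if v.strip() == "?" else v.strip() for v in line.split(",")])
        elif keyword.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif keyword.startswith("@attribute"):
            names.append(line.split(None, 2)[1])
        elif keyword.startswith("@data"):
            in_data = True
    return relation, names, rows


WEATHER = """% Comment

@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
"""

relation, names, rows = read_arff(WEATHER)
print(relation, names, len(rows))
# weather ['outlook', 'temperature', 'humidity', 'windy', 'play'] 5
```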