The WEKA data files must have the following format:
Header. The relation name is defined on the first line of the ARFF file. The format is:
@relation <relation-name>
where <relation-name> is a string. The string must be quoted if the name includes spaces.
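The quoting rule for the relation name can be sketched in a few lines of Python. The helper name relation_line is hypothetical and is not part of Weka; it assumes only the rule stated above (quote the name when it contains spaces).

```python
def relation_line(name: str) -> str:
    """Build an @relation header line (hypothetical helper, not a Weka API)."""
    # Quote the name only when it contains spaces, as the format requires.
    if " " in name:
        name = '"' + name + '"'
    return "@relation " + name


print(relation_line("weather"))
print(relation_line("weather data"))
```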
Declaration of attributes. Attribute declarations take the form of an ordered sequence of @attribute statements. Each attribute in the data set has its own @attribute statement, which uniquely defines the name of that attribute and its data type. The order in which the attributes are declared determines their column position in the data section of the file. For example, if an attribute is the third one declared, Weka expects all of that attribute's values to be found in the third comma-delimited column.
The format for the @attribute statement is:
@attribute <attribute-name> <datatype>
where <attribute-name> must start with an alphabetic character. If the name contains spaces, the entire name must be quoted.
The <datatype> can be any of the following types supported by Weka (version 3.2.1):
1) NUMERIC or REAL. A numeric attribute takes real-number values.
2) INTEGER. An integer attribute takes integer values.
3) DATE. A date attribute declaration may include an optional format string specifying how date values should be parsed and printed. The default format string accepts the ISO-8601 combined date and time format: "yyyy-MM-dd'T'HH:mm:ss".
4) STRING. A string attribute can hold arbitrary text values.
5) ENUMERATE. An enumerated attribute declares the set of values it can take (characters or strings), separated by commas and enclosed in braces. For example, an attribute that indicates the weather (here named time) could be expressed as:
@attribute time {sunny, rainy, cloudy}
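As an illustration of how an @attribute statement breaks down into a name and a datatype, here is a minimal Python sketch. The function parse_attribute is a hypothetical name, not a Weka call, and it assumes a well-formed declaration written on a single line.

```python
import re


def parse_attribute(line: str):
    """Split an @attribute declaration into (name, datatype).

    Hypothetical helper. For an enumerated attribute the datatype is
    returned as the list of allowed values; otherwise it is returned
    as written (for example "REAL" or a DATE with its format string).
    """
    match = re.match(r'@attribute\s+("[^"]*"|\S+)\s+(.+)$',
                     line.strip(), re.IGNORECASE)
    if match is None:
        raise ValueError("not an @attribute declaration: " + line)
    name = match.group(1).strip('"')      # drop quotes around names with spaces
    spec = match.group(2).strip()
    if spec.startswith("{") and spec.endswith("}"):
        return name, [value.strip() for value in spec[1:-1].split(",")]
    return name, spec


print(parse_attribute("@attribute time {sunny, rainy, cloudy}"))
print(parse_attribute('@attribute "wind speed" REAL'))
```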
Data section. The data section of the file contains the @data declaration line and the actual instance lines. The @data declaration is a single line denoting the start of the data segment in the file. The format is:
@data
x11, x12, ..., x1N
x21, x22, ..., x2N
Each instance is represented on a single line, with a line break denoting the end of the instance. Attribute values for each instance are delimited by commas. They must appear in the order in which they were declared in the header section (i.e. the data corresponding to the nth @attribute declaration is always the nth field of the instance line).
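The positional rule can be sketched as follows: each field of an instance line is paired with the attribute declared at the same position. The function parse_instance and the attribute names used here are hypothetical, and the sketch assumes that no value contains a comma inside quotes.

```python
def parse_instance(line, attribute_names):
    """Pair each comma-separated field with the attribute at the same position.

    Hypothetical helper; assumes values do not contain quoted commas.
    """
    fields = [field.strip() for field in line.split(",")]
    if len(fields) != len(attribute_names):
        raise ValueError("expected %d fields, got %d"
                         % (len(attribute_names), len(fields)))
    return dict(zip(attribute_names, fields))


# Illustrative attribute names, declared in this order in the header.
names = ["outlook", "temperature", "play"]
print(parse_instance("sunny, 85, no", names))
```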
Missing values are represented by a single question mark, as in:
@data
4.4,?,1.5,?,Iris-setosa
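When reading such a line in a program, one common approach is to map the question mark to a null value. The short Python sketch below does this for the instance line shown above; parse_fields is a hypothetical helper name and the choice of None as the marker is an assumption.

```python
def parse_fields(line):
    """Split an instance line and map the '?' marker to None (hypothetical helper)."""
    return [None if field.strip() == "?" else field.strip()
            for field in line.split(",")]


row = parse_fields("4.4,?,1.5,?,Iris-setosa")
print(row)  # ['4.4', None, '1.5', None, 'Iris-setosa']
```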
Some further details of this format are:
o The names of the relation and of the attributes are strings, of the same kind as the string type used in Java.
o If a name contains spaces, it must be enclosed in double quotes.
o A missing value is indicated with the symbol '?'.
o The decimal separator in numbers is a point, not a comma.
o The separator between values in the @data section is a comma.
o A % symbol means that the remainder of the line is treated as a comment.
o These files are stored, by default, with the extension ".arff".
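The comment rule in the list above can be sketched in Python as shown here. The function strip_comment is a hypothetical name, and the sketch assumes that the % symbol never appears inside a quoted name or value.

```python
def strip_comment(line):
    """Remove everything from the first '%' to the end of the line.

    Hypothetical helper; assumes '%' never occurs inside a quoted string.
    """
    return line.split("%", 1)[0].rstrip()


for raw in ["% Comment", "@relation weather % the data set name", "4.4,?,1.5"]:
    print(repr(strip_comment(raw)))
```

A line that consists only of a comment becomes an empty string, so a reader can skip it together with blank lines.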
In summary, a WEKA data file has the following overall layout:
@relation <relation-name>
@attribute <attribute-name-1> <datatype>
...
@attribute <attribute-name-N> <datatype>
@data
value11,value12,...,value1N
...
valueM1,valueM2,...,valueMN
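To show how the pieces fit together, the following Python sketch assembles a file with this layout from in-memory values. The function write_arff, its parameters, and the sample relation and attribute names are all hypothetical; the sketch assumes values that contain no commas and uses None for missing values.

```python
def write_arff(relation, attributes, rows):
    """Return ARFF text (hypothetical helper, not a Weka API).

    attributes: list of (name, datatype) pairs, in column order.
    rows: list of value lists; None stands for a missing value.
    """
    def quote(name):
        # Names that contain spaces must be quoted.
        return '"%s"' % name if " " in name else name

    lines = ["@relation " + quote(relation)]
    for name, datatype in attributes:
        lines.append("@attribute %s %s" % (quote(name), datatype))
    lines.append("@data")
    for row in rows:
        lines.append(",".join("?" if value is None else str(value)
                              for value in row))
    return "\n".join(lines) + "\n"


text = write_arff(
    "demo",
    [("x", "NUMERIC"), ("label", "{a,b}")],
    [[1.5, "a"], [None, "b"]],
)
print(text)
```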
One example of a valid WEKA file is:
% Comment
@relation weather