public class myDataset
extends java.lang.Object
It contains the methods to read a Classification/Regression Dataset.
Modifier and Type | Field and Description |
---|---|
static int |
INTEGER
Number to represent type of variable integer.
|
static int |
NOMINAL
Number to represent type of variable nominal.
|
static int |
REAL
Number to represent type of variable real or double.
|
Constructor and Description |
---|
myDataset()
Initializes a new set of instances
|
Modifier and Type | Method and Description |
---|---|
double |
average(int position)
It returns the average of a specific attribute
|
void |
computeInstancesPerClass()
It computes the number of instances per class
|
java.lang.String |
copyHeader()
It copies the header of the dataset
|
double[] |
getemax()
It returns an array with the maximum values of the attributes
|
double[] |
getemin()
It returns an array with the minimum values of the attributes
|
double[] |
getExample(int pos)
Outputs a specific example
|
double |
getMax(int variable)
It returns the maximum value of the attribute "variable"
|
double |
getMin(int variable)
It returns the minimum value of the attribute "variable"
|
boolean[] |
getMissing(int pos)
It returns an array showing if the value of each attribute for the instance "pos" is missing (TRUE) or not (FALSE)
|
int |
getnClasses()
It gets the number of output attributes of the data-set (for example number of classes in classification)
|
int |
getnData()
It gets the size of the data-set
|
int |
getnInputs()
It gets the number of input attributes of the data-set
|
int |
getnVars()
It gets the number of variables of the data-set (including the output)
|
int[] |
getOutputAsInteger()
Returns the output of the data-set as integer values
|
int |
getOutputAsInteger(int pos)
It returns the output value of the example "pos"
|
double[] |
getOutputAsReal()
Returns the output of the data-set as real values
|
double |
getOutputAsReal(int pos)
It returns the output value of the example "pos"
|
java.lang.String[] |
getOutputAsString()
Returns the output of the data-set as nominal values
|
java.lang.String |
getOutputAsString(int pos)
It returns the output value of the example "pos"
|
java.lang.String |
getOutputValue(int intValue)
It returns the nominal value for the class in the position "intValue"
|
int |
getType(int variable)
It returns the type for the attribute "variable"
|
double[][] |
getX()
Outputs an array of examples with their corresponding attribute values.
|
boolean |
hasMissingAttributes()
It checks if the data-set has any missing value
|
boolean |
hasNumericalAttributes()
It checks if the data-set has any numerical (real or integer) value
|
boolean |
hasRealAttributes()
It checks if the data-set has any real value
|
boolean |
isMissing(int i, int j)
This function checks if the attribute value is missing
|
java.lang.String |
nameVar(int pos)
It returns the name of the attribute in position "pos"
|
void |
normalize()
It transforms the input space into the [0,1] range
|
int |
numberInstances(java.lang.String clas)
It computes the number of instances for the class "clas"
|
int |
numberValues(int attribute)
It returns the number of nominal values for the attribute "attribute"
|
void |
readClassificationSet(java.lang.String datasetFile, boolean train)
It reads the whole input data-set and it stores each example and its associated output value in local arrays to ease their use.
|
void |
readRegressionSet(java.lang.String datasetFile, boolean train)
It reads the whole input data-set and it stores each example and its associated output value in local arrays to ease their use.
|
int |
sizeWithoutMissing()
It returns the size of the data-set without taking the missing values into account
|
double |
stdDev(int position)
It returns the standard deviation of a specific attribute
|
public static final int REAL
public static final int INTEGER
public static final int NOMINAL
public double[][] getX()
Outputs an array of examples with their corresponding attribute values.
public double[] getExample(int pos)
Outputs a specific example
Parameters:
pos - int position (id) of the example in the data-set
public int[] getOutputAsInteger()
Returns the output of the data-set as integer values
public double[] getOutputAsReal()
Returns the output of the data-set as real values
public java.lang.String[] getOutputAsString()
Returns the output of the data-set as nominal values
public java.lang.String getOutputAsString(int pos)
It returns the output value of the example "pos"
Parameters:
pos - int the position (id) of the example
public int getOutputAsInteger(int pos)
It returns the output value of the example "pos"
Parameters:
pos - int the position (id) of the example
public double getOutputAsReal(int pos)
It returns the output value of the example "pos"
Parameters:
pos - int the position (id) of the example
public double[] getemax()
It returns an array with the maximum values of the attributes
public double[] getemin()
It returns an array with the minimum values of the attributes
public double getMax(int variable)
It returns the maximum value of the attribute "variable"
Parameters:
variable - int Variable id
public double getMin(int variable)
It returns the minimum value of the attribute "variable"
Parameters:
variable - int Variable id
public int getnData()
It gets the size of the data-set
public int getnVars()
It gets the number of variables of the data-set (including the output)
public int getnInputs()
It gets the number of input attributes of the data-set
public int getnClasses()
It gets the number of output attributes of the data-set (for example number of classes in classification)
public boolean isMissing(int i, int j)
This function checks if the attribute value is missing
Parameters:
i - int Example id
j - int Variable id
public void readClassificationSet(java.lang.String datasetFile, boolean train) throws java.io.IOException
It reads the whole input data-set and it stores each example and its associated output value in local arrays to ease their use.
Parameters:
datasetFile - String name of the file containing the dataset
train - boolean It must have the value "true" if we are reading the training data-set
Throws:
java.io.IOException - If any problem occurs while reading the data-set
public void readRegressionSet(java.lang.String datasetFile, boolean train) throws java.io.IOException
It reads the whole input data-set and it stores each example and its associated output value in local arrays to ease their use.
Parameters:
datasetFile - String name of the file containing the dataset
train - boolean It must have the value "true" if we are reading the training data-set
Throws:
java.io.IOException - If any problem occurs while reading the data-set
public java.lang.String copyHeader()
It copies the header of the dataset
public void normalize()
It transforms the input space into the [0,1] range
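The description above does not say how the rescaling is done. A common way to map inputs into [0,1] is per-attribute min-max scaling, and the following is a minimal sketch of that idea only. The class `MinMaxSketch` is hypothetical and not part of `myDataset`, and mapping a constant column to 0 is an assumption made here to avoid dividing by zero.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of myDataset.
class MinMaxSketch {

    /** Rescales every column of x in place to [0,1] using that column's min and max. */
    static void normalize(double[][] x) {
        if (x.length == 0) {
            return; // nothing to rescale
        }
        int nInputs = x[0].length;
        for (int j = 0; j < nInputs; j++) {
            // First pass: find the range of attribute j.
            double min = Double.POSITIVE_INFINITY;
            double max = Double.NEGATIVE_INFINITY;
            for (double[] row : x) {
                min = Math.min(min, row[j]);
                max = Math.max(max, row[j]);
            }
            double range = max - min;
            // Second pass: rescale. A constant column is mapped to 0 (assumption).
            for (double[] row : x) {
                row[j] = (range == 0.0) ? 0.0 : (row[j] - min) / range;
            }
        }
    }

    public static void main(String[] args) {
        double[][] x = { {1.0, 10.0}, {3.0, 20.0}, {5.0, 30.0} };
        normalize(x);
        System.out.println(Arrays.deepToString(x));
        // prints [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
    }
}
```

The per-attribute minimum and maximum used here correspond in spirit to what `getemin()` and `getemax()` are described as returning, although the sketch computes them itself.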
public boolean hasRealAttributes()
It checks if the data-set has any real value
public boolean hasNumericalAttributes()
It checks if the data-set has any numerical (real or integer) value
public boolean hasMissingAttributes()
It checks if the data-set has any missing value
public int sizeWithoutMissing()
It returns the size of the data-set without taking the missing values into account
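One plausible reading of this description, which the documentation does not confirm, is that the method counts the examples that have no missing attribute value. The sketch below illustrates that reading on a plain `boolean[][]` mask shaped like the per-example arrays that `getMissing(int pos)` is described as returning. The class `MissingCountSketch` is hypothetical.

```java
// Hypothetical helper for illustration; not part of myDataset.
class MissingCountSketch {

    /**
     * Counts the rows of the mask that contain no 'true' entry,
     * i.e. the examples with every attribute value present (assumed semantics).
     */
    static int sizeWithoutMissing(boolean[][] missing) {
        int count = 0;
        for (boolean[] row : missing) {
            boolean complete = true;
            for (boolean isMissing : row) {
                if (isMissing) {
                    complete = false;
                    break;
                }
            }
            if (complete) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        boolean[][] missing = {
            {false, false},  // complete example
            {true,  false},  // first attribute missing
            {false, false}   // complete example
        };
        System.out.println(sizeWithoutMissing(missing)); // prints 2
    }
}
```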
public double stdDev(int position)
It returns the standard deviation of a specific attribute
Parameters:
position - int attribute id (position of the attribute)
public double average(int position)
It returns the average of a specific attribute
Parameters:
position - int attribute id (position of the attribute)
public void computeInstancesPerClass()
It computes the number of instances per class
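As an illustration of what counting instances per class can look like, here is a minimal sketch that tallies nominal output labels in a map and then looks one class up by name. It works on a plain `String[]` such as the one `getOutputAsString()` is described as returning. The class `ClassCountSketch` and both of its methods are hypothetical, and the real class may store its counts differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of myDataset.
class ClassCountSketch {

    /** Tallies how many times each class label occurs, in first-seen order. */
    static Map<String, Integer> countPerClass(String[] outputs) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String label : outputs) {
            counts.merge(label, 1, Integer::sum);
        }
        return counts;
    }

    /** Looks up the count for one class name; unseen classes count as 0. */
    static int numberInstances(Map<String, Integer> counts, String clas) {
        return counts.getOrDefault(clas, 0);
    }

    public static void main(String[] args) {
        String[] outputs = { "setosa", "virginica", "setosa" };
        Map<String, Integer> counts = countPerClass(outputs);
        System.out.println(numberInstances(counts, "setosa"));     // prints 2
        System.out.println(numberInstances(counts, "versicolor")); // prints 0
    }
}
```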
public int numberInstances(java.lang.String clas)
It computes the number of instances for the class "clas"
Parameters:
clas - String Name of the class
public int numberValues(int attribute)
It returns the number of nominal values for the attribute "attribute"
Parameters:
attribute - int attribute id (position of the attribute)
public java.lang.String getOutputValue(int intValue)
It returns the nominal value for the class in the position "intValue"
Parameters:
intValue - int class id (position of the class)
public int getType(int variable)
It returns the type for the attribute "variable"
Parameters:
variable - int attribute id (position of the attribute)
public boolean[] getMissing(int pos)
It returns an array showing if the value of each attribute for the instance "pos" is missing (TRUE) or not (FALSE)
Parameters:
pos - int Instance id
public java.lang.String nameVar(int pos)
It returns the name of the attribute in position "pos"
Parameters:
pos - int Variable id
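For the `average(int position)` and `stdDev(int position)` methods listed in this class, the following is a minimal sketch of how a per-attribute mean and standard deviation can be computed over an examples array like the one `getX()` is described as returning. The documentation does not say whether the class uses the population or the sample formula. This sketch assumes the population form (dividing by n), and the class `ColumnStatsSketch` is hypothetical.

```java
// Hypothetical helper for illustration; not part of myDataset.
class ColumnStatsSketch {

    /** Arithmetic mean of the attribute stored in column 'position'. */
    static double average(double[][] x, int position) {
        double sum = 0.0;
        for (double[] row : x) {
            sum += row[position];
        }
        return sum / x.length;
    }

    /** Population standard deviation (divides by n; an assumption) of column 'position'. */
    static double stdDev(double[][] x, int position) {
        double mean = average(x, position);
        double squaredDiffs = 0.0;
        for (double[] row : x) {
            double diff = row[position] - mean;
            squaredDiffs += diff * diff;
        }
        return Math.sqrt(squaredDiffs / x.length);
    }

    public static void main(String[] args) {
        double[][] x = { {2}, {4}, {4}, {4}, {5}, {5}, {7}, {9} };
        System.out.println(average(x, 0)); // prints 5.0
        System.out.println(stdDev(x, 0));  // prints 2.0
    }
}
```

Switching to the sample standard deviation would only change the divisor from `x.length` to `x.length - 1`.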