public class MyDataset
extends java.lang.Object
Contains the methods to read a Classification/Regression Dataset
Modifier and Type | Field and Description |
---|---|
static int | INTEGER: Number to represent type of variable integer. |
static int | NOMINAL: Number to represent type of variable nominal. |
static int | REAL: Number to represent type of variable real or double. |
Constructor and Description |
---|
MyDataset(): Initializes a new set of instances |
Modifier and Type | Method and Description |
---|---|
double | average(int position): It returns the average of a specific attribute |
java.lang.String[] | classify(Mask actives, Ruleset[] rulesets, int length): Classifies the entries' classes according to several sets of rules. |
java.lang.String[] | classify(Ruleset[] rulesets, int length): Classifies the entries' classes according to several sets of rules. |
void | computeInstancesPerClass(): It computes the number of instances per class. |
java.lang.String | copyHeader(): It copies the header of the dataset |
double[][] | devuelveRangos(): Returns the minimum and maximum values of every attribute as a matrix. |
void | filter(Mask mask, int A, double V, int operator): It filters the instances covered by a simple rule from this dataset; i.e., it deactivates the instances not covered by that rule. |
void | filter(Mask mask, Rule rule): It filters the instances covered by a rule from this dataset; i.e., it deactivates the instances not covered by that rule. |
void | filter(Mask mask, Ruleset rules): It filters the instances covered by a set of rules from this dataset; i.e., it deactivates the instances not covered by that ruleset. |
void | filter(Mask mask, SimpleRule sr): It filters the instances covered by a simple rule from this dataset; i.e., it deactivates the instances not covered by that rule. |
void | filterByClass(Mask mask, java.lang.String value): It filters the instances of a given class from this dataset; i.e., it deactivates the instances from the other classes. |
double[] | getemax(): It returns an array with the maximum values of the attributes |
double[] | getemin(): It returns an array with the minimum values of the attributes |
double[] | getExample(int pos): Returns a specific example |
double[] | getExample(Mask mask): Returns a specific example |
double | getMax(int variable): It returns the maximum value of the attribute specified |
double | getMin(int variable): It returns the minimum value of the attribute specified |
int | getnClasses(): It gets the number of output attributes of the data-set (for example, the number of classes in classification) |
int | getnData(): It gets the size of the data-set |
int | getnInputs(): It gets the number of input attributes of the data-set |
int | getnVars(): It gets the number of variables of the data-set (including the output) |
int[] | getOutputAsInteger(): Returns the output of the data-set as integer values |
int | getOutputAsInteger(int pos): It returns the output value of the example "pos" |
double[] | getOutputAsReal(): Returns the output of the data-set as real values |
double | getOutputAsReal(int pos): It returns the output value of the example "pos" |
java.lang.String[] | getOutputAsString(): Returns the output of the data-set as nominal values |
java.lang.String | getOutputAsString(int pos): It returns the output value of the example "pos" |
java.lang.String | getOutputValue(int intValue): It returns the name of the class of index intValue |
double[][] | getX(): Outputs an array of examples with their corresponding attribute values. |
boolean | hasMissingAttributes(): It checks if the data-set has any missing value |
boolean | hasNumericalAttributes(): It checks if the data-set has any numerical value |
boolean | hasRealAttributes(): It checks if the data-set has any real value |
boolean | isMissing(int i, int j): This function checks if the attribute value is missing |
boolean | isMissing(Mask mask, int j): This function checks if the attribute value is missing |
void | normalize(): It transforms the input space into the [0,1] range |
int | numberInstances(int clas): It returns the number of instances in the dataset of the given class |
int | numberValues(int attribute): It returns the number of different values of an attribute |
void | readClassificationSet(java.lang.String datasetFile, boolean train): It reads the whole input data-set and stores each example and its associated output value in local arrays to ease their use. |
void | readRegressionSet(java.lang.String datasetFile, boolean train): It reads the whole input data-set and stores each example and its associated output value in local arrays to ease their use. |
int | size(): It returns the number of examples |
int | sizeWithoutMissing(): It returns the size of the data-set without taking the missing values into account |
double | stdDev(int position): It returns the standard deviation of a specific attribute |
void | substract(Mask mask, int A, double V, int operator): It subtracts the instances covered by a simple rule from this dataset; i.e., it deactivates the instances covered by that rule. |
void | substract(Mask mask, Rule rule): It subtracts the instances covered by a rule from this dataset; i.e., it deactivates the instances covered by that rule. |
void | substract(Mask mask, Ruleset rules): It subtracts the instances covered by a set of rules from this dataset; i.e., it deactivates the instances covered by that ruleset. |
void | substract(Mask mask, Ruleset rules, int ignore): It subtracts the instances covered by a set of rules from this dataset; i.e., it deactivates the instances covered by that ruleset. |
void | substract(Mask mask, SimpleRule sr): It subtracts the instances covered by a simple rule from this dataset; i.e., it deactivates the instances covered by that rule. |
java.lang.String | toString(): Returns a string representation of the entries of this MyDataset. |
java.lang.String | toString(Mask mask): Returns a string representation of the active entries of this MyDataset. |
java.lang.String | toString(Mask mask, double[] distribution): Returns a string representation of the active entries of this MyDataset with its associated weights. |
public static final int REAL
public static final int INTEGER
public static final int NOMINAL
public double[][] getX()
public double[] getExample(int pos)
pos
- int position (id) of the example in the data-set
public double[] getExample(Mask mask)
mask
- Mask with the position (id) of the example in the data-set
public int[] getOutputAsInteger()
public double[] getOutputAsReal()
public java.lang.String[] getOutputAsString()
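The integer and string views of the output are two encodings of the same class label. The following is a minimal sketch of that relationship, assuming the class names are held in an array indexed by class id. The class `ClassLabelSketch` and its method are hypothetical and are not part of `MyDataset`; the real internal representation is not documented here.

```java
/** Hypothetical illustration; not the MyDataset implementation. */
final class ClassLabelSketch {

    /**
     * Maps each integer class index to its name.
     * Assumption: classNames[k] holds the name of class k.
     */
    static String[] outputsAsString(int[] outputs, String[] classNames) {
        String[] names = new String[outputs.length];
        for (int i = 0; i < outputs.length; i++) {
            names[i] = classNames[outputs[i]];
        }
        return names;
    }

    public static void main(String[] args) {
        String[] classNames = {"negative", "positive"};
        int[] outputs = {1, 0, 1};
        // prints positive,negative,positive
        System.out.println(String.join(",", outputsAsString(outputs, classNames)));
    }
}
```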
public java.lang.String getOutputAsString(int pos)
pos
- int the position (id) of the example
public int getOutputAsInteger(int pos)
pos
- int the position (id) of the example
public double getOutputAsReal(int pos)
pos
- int the position (id) of the example
public double[] getemax()
public double[] getemin()
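As an illustration of what `getemax` and `getemin` report, the sketch below computes per-attribute maxima and minima from a matrix of examples such as the one returned by `getX()`. It assumes a non-empty, rectangular matrix with no missing values. `AttributeRangeSketch` and its methods are hypothetical names, not the library's code.

```java
import java.util.Arrays;

/** Hypothetical illustration; not the MyDataset implementation. */
final class AttributeRangeSketch {

    /** Per-column minimum of a non-empty, rectangular examples matrix. */
    static double[] columnMin(double[][] x) {
        double[] min = x[0].clone();
        for (double[] row : x) {
            for (int j = 0; j < row.length; j++) {
                min[j] = Math.min(min[j], row[j]);
            }
        }
        return min;
    }

    /** Per-column maximum of a non-empty, rectangular examples matrix. */
    static double[] columnMax(double[][] x) {
        double[] max = x[0].clone();
        for (double[] row : x) {
            for (int j = 0; j < row.length; j++) {
                max[j] = Math.max(max[j], row[j]);
            }
        }
        return max;
    }

    public static void main(String[] args) {
        double[][] x = { {1.0, 10.0}, {3.0, 4.0}, {2.0, 7.0} };
        System.out.println(Arrays.toString(columnMin(x))); // [1.0, 4.0]
        System.out.println(Arrays.toString(columnMax(x))); // [3.0, 10.0]
    }
}
```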
public double getMax(int variable)
variable
- index of the attribute
public double getMin(int variable)
variable
- index of the attribute
public int getnData()
public int getnVars()
public int getnInputs()
public int getnClasses()
public boolean isMissing(int i, int j)
i
- int Example id
j
- int Variable id
public boolean isMissing(Mask mask, int j)
mask
- Mask
j
- int Variable id
public void readClassificationSet(java.lang.String datasetFile, boolean train) throws java.io.IOException
datasetFile
- String name of the file containing the dataset
train
- boolean It must have the value "true" if we are reading the training data-set
java.io.IOException
- If any problem occurs while reading the data-set
public void readRegressionSet(java.lang.String datasetFile, boolean train) throws java.io.IOException
datasetFile
- String name of the file containing the dataset
train
- boolean It must have the value "true" if we are reading the training data-set
java.io.IOException
- If any problem occurs while reading the data-set
public java.lang.String copyHeader()
public void normalize()
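The reference only states that `normalize()` maps the input space into the [0,1] range. One common way to do that is min-max scaling, sketched below as an assumption about the approach rather than a description of the actual implementation. `MinMaxNormalizeSketch` is a hypothetical name, and mapping constant attributes to 0 is a choice made for this sketch.

```java
/** Hypothetical illustration of min-max scaling; not the MyDataset implementation. */
final class MinMaxNormalizeSketch {

    /** Returns a copy of x with every column rescaled to [0,1]. */
    static double[][] normalize(double[][] x) {
        int n = x.length;
        int d = x[0].length;
        double[] min = x[0].clone();
        double[] max = x[0].clone();
        for (double[] row : x) {
            for (int j = 0; j < d; j++) {
                min[j] = Math.min(min[j], row[j]);
                max[j] = Math.max(max[j], row[j]);
            }
        }
        double[][] out = new double[n][d];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < d; j++) {
                double range = max[j] - min[j];
                // A constant attribute has no spread; map it to 0 (sketch-specific choice).
                out[i][j] = (range == 0.0) ? 0.0 : (x[i][j] - min[j]) / range;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] scaled = normalize(new double[][] { {0.0, 5.0}, {10.0, 5.0}, {5.0, 5.0} });
        System.out.println(java.util.Arrays.deepToString(scaled));
    }
}
```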
public boolean hasRealAttributes()
public boolean hasNumericalAttributes()
public boolean hasMissingAttributes()
public int sizeWithoutMissing()
public int size()
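The next two methods, `stdDev` and `average`, report per-attribute statistics. The sketch below shows one way such values could be computed over a column of the examples matrix. It assumes the population form of the standard deviation (dividing by n) and ignores missing values; the reference does not say which form the library uses. `AttributeStatsSketch` is a hypothetical name.

```java
/** Hypothetical illustration; not the MyDataset implementation. */
final class AttributeStatsSketch {

    /** Arithmetic mean of column `position` over all rows of x. */
    static double average(double[][] x, int position) {
        double sum = 0.0;
        for (double[] row : x) {
            sum += row[position];
        }
        return sum / x.length;
    }

    /** Population standard deviation of column `position` (divides by n; an assumption). */
    static double stdDev(double[][] x, int position) {
        double mean = average(x, position);
        double sumSq = 0.0;
        for (double[] row : x) {
            double diff = row[position] - mean;
            sumSq += diff * diff;
        }
        return Math.sqrt(sumSq / x.length);
    }

    public static void main(String[] args) {
        double[][] x = { {2}, {4}, {4}, {4}, {5}, {5}, {7}, {9} };
        System.out.println(average(x, 0)); // 5.0
        System.out.println(stdDev(x, 0));  // 2.0
    }
}
```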
public double stdDev(int position)
position
- int attribute id (position of the attribute)
public double average(int position)
position
- int attribute id (position of the attribute)
public void computeInstancesPerClass()
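`computeInstancesPerClass()` prepares the per-class counts that `numberInstances(int clas)` later returns. A minimal sketch of that counting step follows, assuming the outputs are available as integer class indices in the range 0 to nClasses - 1. `ClassCountSketch` is a hypothetical name and not the library's code.

```java
import java.util.Arrays;

/** Hypothetical illustration; not the MyDataset implementation. */
final class ClassCountSketch {

    /** counts[k] is the number of examples whose output equals class index k. */
    static int[] instancesPerClass(int[] outputs, int nClasses) {
        int[] counts = new int[nClasses];
        for (int label : outputs) {
            counts[label]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] outputs = {0, 1, 1, 2, 1, 0};
        System.out.println(Arrays.toString(instancesPerClass(outputs, 3))); // [2, 3, 1]
    }
}
```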
public int numberInstances(int clas)
clas
- the index of the class
public int numberValues(int attribute)
attribute
- the index of the attribute
public java.lang.String getOutputValue(int intValue)
intValue
- the index of the class
public double[][] devuelveRangos()
public void filter(Mask mask, int A, double V, int operator)
mask
- Mask the mask with the active entries of the dataset
A
- int attribute's id
V
- double attribute's value
operator
- int rule operator: >, <= or = (Rule.GREATER, Rule.LOWER, Rule.EQUAL)
public void filter(Mask mask, SimpleRule sr)
mask
- Mask the mask with the active entries of the dataset
sr
- SimpleRule the rule
public void filter(Mask mask, Rule rule)
mask
- Mask the mask with the active entries of the dataset
rule
- Rule the rule
public void filter(Mask mask, Ruleset rules)
mask
- Mask the mask with the active entries of the dataset
rules
- Ruleset the ruleset
public void filterByClass(Mask mask, java.lang.String value)
mask
- Mask the mask with the active entries of the dataset
value
- String the name of the class
public void substract(Mask mask, int A, double V, int operator)
mask
- Mask the mask with the active entries of the dataset
A
- int attribute's id
V
- double attribute's value
operator
- int rule operator: >, <= or = (Rule.GREATER, Rule.LOWER, Rule.EQUAL)
public void substract(Mask mask, SimpleRule sr)
mask
- Mask the mask with the active entries of the dataset
sr
- SimpleRule the rule
public void substract(Mask mask, Rule rule)
mask
- Mask the mask with the active entries of the dataset
rule
- Rule the rule
public void substract(Mask mask, Ruleset rules)
mask
- Mask the mask with the active entries of the dataset
rules
- Ruleset the set of rules
public void substract(Mask mask, Ruleset rules, int ignore)
mask
- Mask the mask with the active entries of the dataset
rules
- Ruleset the set of rules
ignore
- int number of the rule to ignore
public java.lang.String[] classify(Mask actives, Ruleset[] rulesets, int length)
actives
- Mask active entries of the dataset
rulesets
- Ruleset[] the rulesets
length
- int the number of rulesets
public java.lang.String[] classify(Ruleset[] rulesets, int length)
rulesets
- Ruleset[] the rulesets
length
- int the number of rulesets
public java.lang.String toString()
Overrides: toString in class java.lang.Object
public java.lang.String toString(Mask mask)
mask
- Mask active entries
public java.lang.String toString(Mask mask, double[] distribution)
mask
- Mask active entries
distribution
- a distribution of weights
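The `filter` and `substract` families documented above are complementary: `filter` deactivates the active entries a rule does not cover, while `substract` deactivates the ones it does cover. The sketch below illustrates this for the single-condition overloads, under loud assumptions: the mask is modelled as a plain `boolean[]` (the real `Mask` class is not documented here), and the `Operator` enum is a hypothetical stand-in for the `Rule.GREATER`, `Rule.LOWER` and `Rule.EQUAL` constants.

```java
import java.util.Arrays;

/** Hypothetical illustration; not the MyDataset, Mask or Rule implementation. */
final class MaskFilterSketch {

    /** Stand-in for the rule operators >, <= and = named in the parameter docs. */
    enum Operator { GREATER, LOWER_OR_EQUAL, EQUAL }

    /** True when the example satisfies "attribute OP value". */
    static boolean covers(double[] example, int attribute, double value, Operator op) {
        switch (op) {
            case GREATER:
                return example[attribute] > value;
            case LOWER_OR_EQUAL:
                return example[attribute] <= value;
            default:
                return example[attribute] == value;
        }
    }

    /** Deactivates the active entries NOT covered by the condition. */
    static void filter(boolean[] active, double[][] x, int attribute, double value, Operator op) {
        for (int i = 0; i < x.length; i++) {
            if (active[i] && !covers(x[i], attribute, value, op)) {
                active[i] = false;
            }
        }
    }

    /** Deactivates the active entries covered by the condition. */
    static void subtract(boolean[] active, double[][] x, int attribute, double value, Operator op) {
        for (int i = 0; i < x.length; i++) {
            if (active[i] && covers(x[i], attribute, value, op)) {
                active[i] = false;
            }
        }
    }

    public static void main(String[] args) {
        double[][] x = { {1.0}, {5.0}, {3.0}, {8.0} };
        boolean[] kept = {true, true, true, true};
        filter(kept, x, 0, 3.0, Operator.GREATER);
        System.out.println(Arrays.toString(kept)); // [false, true, false, true]

        boolean[] rest = {true, true, true, true};
        subtract(rest, x, 0, 3.0, Operator.GREATER);
        System.out.println(Arrays.toString(rest)); // [true, false, true, false]
    }
}
```

Applying both operations to copies of the same mask splits the active entries into two disjoint groups, which is how a covering-style rule learner would typically separate covered from uncovered instances.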