public class MyDataset
extends java.lang.Object
Modifier and Type | Field and Description |
---|---|
protected java.util.Vector |
attributes
The attributes.
|
protected int |
classIndex
The index of the class attribute.
|
protected InstanceSet |
IS
Keel dataset InstanceSet
|
protected java.util.Vector |
itemsets
The itemsets.
|
protected int[] |
m_IndicesBuffer
Buffer of indices for sparse itemsets
|
protected double[] |
m_ValueBuffer
Buffer of values for sparse itemsets
|
protected java.lang.String |
name
The name of the dataset.
|
Constructor and Description |
---|
MyDataset(MyDataset dataset)
Constructor that copies another dataset.
|
MyDataset(MyDataset dataset,
int capacity)
Constructor to copy all the attributes of another dataset but the itemsets.
|
MyDataset(MyDataset source,
int first,
int toCopy)
Creates a new set of itemsets by copying a
subset of another set.
|
MyDataset(java.lang.String name,
boolean train)
Constructor that reads the .dat file containing the information of the dataset.
|
MyDataset(java.lang.String name,
java.util.Vector attInfo,
int capacity)
Creates an empty set of itemsets.
|
Modifier and Type | Method and Description |
---|---|
void |
addItemset(Itemset itemset)
Function to add one itemset.
|
double[] |
attributeToDoubleArray(int index)
Gets the value of all itemsets in this dataset for a particular
attribute.
|
double |
averageClassValue()
Returns the average class value of the set.
|
double |
averageClassValue(Rule r)
Returns the average class value of the instances covered by a rule.
|
double |
averagePredictedClassValue(Rule r)
Returns the average predicted class value of the instances covered by a rule.
|
double |
averageValue(int att)
Returns the average value for a given attribute of the set.
|
boolean |
checkInstance(Itemset itemset)
Checks if the given itemset is compatible
with this dataset.
|
java.lang.String[] |
classify(Mask actives,
Ruleset[] rulesets,
int length)
Classifies the entries' classes according to several sets of rules.
|
double[] |
classify(Mask actives,
java.util.Vector rules)
Classifies the entries' classes according to several rules.
|
java.lang.String[] |
classify(Ruleset[] rulesets,
int length)
Classifies the entries' classes according to several sets of rules.
|
double[] |
classify(java.util.Vector rules)
Classifies the entries' classes according to several rules.
|
double |
classPredictedSTD(Rule r)
Computes the standard deviation (over the predicted values)
for the instances covered by a rule.
|
double |
classPredictedVariance(Rule r)
Computes the variance (over the predicted values)
for the instances covered by a rule.
|
double |
classSTD()
Computes the standard deviation for the class attribute.
|
double |
classSTD(Rule r)
Computes the standard deviation for the instances covered by a rule.
|
double |
classVariance()
Computes the variance for the class attribute.
|
double |
classVariance(Rule r)
Computes the variance for the instances covered by a rule.
|
void |
compactify()
Compactifies the set of itemsets.
|
java.lang.String |
copyHeader()
Copies the header of the dataset.
|
void |
delete()
Removes all itemsets from the set.
|
void |
delete(int index)
Function to remove an itemset at the given position.
|
void |
deleteAttributeAt(int position)
Deletes an attribute at the given position
(0 to numAttributes() - 1).
|
void |
deleteWithMissing(int attIndex)
Removes all the itemsets that have a missing value in the given attribute.
|
void |
deleteWithMissing(MyAttribute att)
Removes all itemsets with missing values for a particular
attribute from the dataset.
|
void |
deleteWithMissingClass()
Removes all itemsets with a missing class value
from the dataset.
|
MyDataset |
discretToBinary()
Transforms the discrete attribute into numValues()-1 synthetic binary attributes.
|
java.util.Enumeration |
enumerateAttributes()
Enumerates all the attributes.
|
java.util.Enumeration |
enumerateItemsets()
Enumerates all the itemsets.
|
boolean |
equalHeaders(MyDataset dataset)
Checks if two headers are equivalent.
|
void |
filter(Mask mask,
int A,
double V,
int operator)
It filters the instances covered by a simple rule from this dataset;
i.e., it deactivates the instances not covered by that rule.
|
void |
filter(Mask mask,
Rule rule)
It filters the instances covered by a rule from this dataset;
i.e., it deactivates the instances not covered by that rule.
|
void |
filter(Mask mask,
Ruleset rules)
It filters the instances covered by a set of rules from this dataset;
i.e., it deactivates the instances not covered by that ruleset.
|
void |
filter(Mask mask,
Ruleset rules,
int ignore)
It filters the instances covered by a set of rules from this dataset;
i.e., it deactivates the instances not covered by that ruleset.
|
void |
filter(Mask mask,
SimpleRule sr)
It filters the instances covered by a simple rule from this dataset;
i.e., it deactivates the instances not covered by that rule.
|
void |
filterByClass(Mask mask,
java.lang.String class_name)
It filters the instances of a given class from this dataset;
i.e., it deactivates the instances from the other class.
|
Itemset |
firstInstance()
Returns the first itemset in the set.
|
MyAttribute |
getAttribute(int index)
Returns the attribute that has the index.
|
MyAttribute |
getAttribute(java.lang.String name)
Returns the attribute that has the name.
|
MyAttribute |
getClassAttribute()
Returns class attribute.
|
int[] |
getClassFequency()
Returns the frequency (number of instances) of each class.
|
int[] |
getClassFequency(Mask filter)
Returns the frequency (number of instances) of each class.
|
int |
getClassIndex()
Returns the index of the class attribute.
|
double[] |
getExample(int pos)
Returns a specific example.
|
double[] |
getExample(Mask mask)
Returns a specific example.
|
java.lang.String |
getName()
Returns the name of the dataset.
|
void |
insertAttributeAt(MyAttribute att,
int position)
Inserts an attribute at the given position (0 to
numAttributes()) and sets all values to be missing.
|
boolean |
isMissing(int exemple,
int attribute)
Returns whether the value of an attribute in a given example is missing.
|
boolean |
isMissing(Mask mask,
int attribute)
Returns whether the value of an attribute in a given example is missing.
|
Itemset |
itemset(int index)
Returns the itemset at the given position.
|
Itemset |
lastItemset()
Returns the last itemset.
|
double |
meanOrMode(int attIndex)
Returns the mean (mode) for a numeric (nominal) attribute as
a floating-point value.
|
double |
meanOrMode(MyAttribute att)
Returns the mean (mode) for a numeric (nominal) attribute as a
floating-point value.
|
int |
numAttributes()
Returns the number of attributes.
|
int |
numClasses()
Returns the number of possible values of the class attribute.
|
int |
numItemsets()
Returns the number of itemsets.
|
double |
ruleCorrelation(Rule r)
Computes the third heuristic presented in [Holmes99].
|
double |
ruleDeviation(Rule r)
Computes the deviation of a rule from the predicted class values.
|
double |
ruleMeanAbsoluteError(Rule r)
Computes the mean absolute error of a rule for the predicted class values.
|
void |
setClass(MyAttribute att)
Sets the class attribute.
|
void |
setClassIndex(int classIndex)
Sets the class index of the set.
|
void |
setName(java.lang.String name)
Sets the name of the dataset.
|
int |
size()
Returns the number of examples in the dataset.
|
void |
sort(int attIndex)
Function to sort the dataset based on an attribute.
|
void |
sort(MyAttribute att)
Function to sort the dataset based on an attribute.
|
int[][] |
sortByAverageClassValues()
Computes the average class value for each attribute and value,
and sorts the values by that average.
|
MyDataset[] |
split(Rule r)
Physically splits the itemsets into two sub-datasets,
according to the coverage of a rule.
|
void |
stratify(int numFolds)
Stratifies a set of itemsets according to its class values
if the class attribute is nominal (so that afterwards a
stratified cross-validation can be performed).
|
void |
substract(Mask mask,
int A,
double V,
int operator)
Subtracts the instances covered by a simple rule from this dataset;
i.e., it deactivates the instances covered by that rule.
|
void |
substract(Mask mask,
Rule rule)
Subtracts the instances covered by a rule from this dataset;
i.e., it deactivates the instances covered by that rule.
|
void |
substract(Mask mask,
Ruleset rules)
Subtracts the instances covered by a set of rules from this dataset;
i.e., it deactivates the instances covered by that ruleset.
|
void |
substract(Mask mask,
Ruleset rules,
int ignore)
Subtracts the instances covered by a set of rules from this dataset;
i.e., it deactivates the instances covered by that ruleset.
|
void |
substract(Mask mask,
SimpleRule sr)
Subtracts the instances covered by a simple rule from this dataset;
i.e., it deactivates the instances covered by that rule.
|
double |
sumOfWeights()
Function to compute the sum of all the weights of the itemsets.
|
MyDataset |
testCV(int numFolds,
int numFold)
Creates the test set for one fold of a cross-validation on
the dataset.
|
java.lang.String |
toString()
Returns a string representation of the entries of this MyDataset.
|
java.lang.String |
toString(Mask mask)
Returns a string representation of the active entries of this MyDataset.
|
MyDataset |
trainCV(int numFolds,
int numFold)
Creates the training set for one fold of a cross-validation
on the dataset.
|
double |
variance(int attIndex)
Computes the variance for a numeric attribute.
|
double |
variance(MyAttribute att)
Computes the variance for a numeric attribute.
|
protected java.lang.String name
protected java.util.Vector attributes
protected java.util.Vector itemsets
protected int classIndex
protected InstanceSet IS
protected double[] m_ValueBuffer
protected int[] m_IndicesBuffer
public MyDataset(java.lang.String name, boolean train)
name - The reader object from which the itemsets are read.
train - Flag indicating whether the file is for training.
public MyDataset(java.lang.String name, java.util.Vector attInfo, int capacity)
name - the name of the relation
attInfo - the attribute information
capacity - the capacity of the set
public MyDataset(MyDataset dataset)
dataset - The dataset to be copied.
public MyDataset(MyDataset source, int first, int toCopy)
source - the set of itemsets from which a subset is to be created
first - the index of the first itemset to be copied
toCopy - the number of itemsets to be copied
Throws: java.lang.IllegalArgumentException - if first and toCopy are out of range
public MyDataset(MyDataset dataset, int capacity)
dataset - The dataset to be copied.
capacity - The number of itemsets.
public final void addItemset(Itemset itemset)
itemset - The itemset to add to the dataset.
public java.lang.String getName()
public void setName(java.lang.String name)
name - the name of the dataset.
public final MyAttribute getAttribute(int index)
index - int The index of the attribute.
public final MyAttribute getAttribute(java.lang.String name)
name - String The name of the attribute.
public double[] attributeToDoubleArray(int index)
index - the index of the attribute.
public final MyAttribute getClassAttribute()
public final void setClass(MyAttribute att)
att - attribute to be the class
public final int getClassIndex()
public final void setClassIndex(int classIndex)
classIndex - the new class index
Throws: java.lang.IllegalArgumentException - if the class index is too big or < 0
public final double variance(int attIndex)
attIndex - the numeric attribute
Throws: java.lang.IllegalArgumentException - if the attribute is not numeric
public final double variance(MyAttribute att)
att - the numeric attribute
Throws: java.lang.IllegalArgumentException - if the attribute is not numeric
public final double classVariance()
Throws: java.lang.IllegalArgumentException - if the class is not numeric
public final double classSTD()
Throws: java.lang.IllegalArgumentException - if the class is not numeric
public final double classVariance(Rule r)
r - the rule
Throws: java.lang.IllegalArgumentException - if the class is not numeric
public final double classSTD(Rule r)
r - the rule
Throws: java.lang.IllegalArgumentException - if the class is not numeric
public final double classPredictedVariance(Rule r)
r - the rule
Throws: java.lang.IllegalArgumentException - if the class is not numeric
public final double classPredictedSTD(Rule r)
r - the rule
Throws: java.lang.IllegalArgumentException - if the class is not numeric
public final double meanOrMode(int attIndex)
attIndex - the attribute's index
public final double meanOrMode(MyAttribute att)
att - the attribute
public final int numAttributes()
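The variance, classSTD and meanOrMode entries above do not say which estimator is used, how weights enter, or how missing values are treated. As a rough illustration only, the sketch below assumes an unweighted sample variance (divisor n - 1) and nominal values stored as value indices. The class `AttributeStatsSketch` and all of its methods are hypothetical stand-ins, not part of MyDataset.

```java
class AttributeStatsSketch {
    /** Arithmetic mean of a numeric column. */
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) sum += v;
        return sum / values.length;
    }

    /** Sample variance (divisor n - 1, an assumption); 0 for fewer than two values. */
    static double variance(double[] values) {
        if (values.length < 2) return 0.0;
        double m = mean(values);
        double squares = 0.0;
        for (double v : values) squares += (v - m) * (v - m);
        return squares / (values.length - 1);
    }

    /** Standard deviation as the square root of the variance. */
    static double std(double[] values) {
        return Math.sqrt(variance(values));
    }

    /** Most frequent value index of a nominal column; ties go to the lowest index. */
    static double mode(double[] valueIndices, int numValues) {
        int[] counts = new int[numValues];
        for (double v : valueIndices) counts[(int) v]++;
        int best = 0;
        for (int i = 1; i < numValues; i++) {
            if (counts[i] > counts[best]) best = i;
        }
        return best;
    }

    /** Mean for a numeric column, mode for a nominal one, both returned as double. */
    static double meanOrMode(double[] column, boolean numeric, int numValues) {
        return numeric ? mean(column) : mode(column, numValues);
    }

    public static void main(String[] args) {
        double[] numeric = {2.0, 4.0, 4.0, 6.0};
        double[] nominal = {0.0, 2.0, 2.0, 1.0}; // indices into a 3-value nominal attribute
        System.out.println("variance   = " + variance(numeric));
        System.out.println("std        = " + std(numeric));
        System.out.println("meanOrMode = " + meanOrMode(nominal, false, 3));
    }
}
```

Returning the mode as a double mirrors the documented return type of meanOrMode, where a nominal result is reported as a floating-point value.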
public final int numClasses()
public final int numItemsets()
public final void compactify()
public final void delete()
public final void delete(int index)
index - The index of the itemset to be deleted.
public final boolean checkInstance(Itemset itemset)
itemset - the itemset
public void insertAttributeAt(MyAttribute att, int position)
att - the attribute to be inserted
position - the attribute's position
Throws: java.lang.IllegalArgumentException - if the given index is out of range
public void deleteAttributeAt(int position)
position - the attribute's position
Throws: java.lang.IllegalArgumentException - if the given index is out of range or the class attribute is being deleted
public final void deleteWithMissing(int attIndex)
attIndex - The index of the attribute.
public final void deleteWithMissing(MyAttribute att)
att - the attribute
public final void deleteWithMissingClass() throws java.lang.Exception
Throws: java.lang.Exception - UnassignedClassException if class is not set
public java.util.Enumeration enumerateAttributes()
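The deleteWithMissing methods above drop every itemset whose value for one attribute is missing. A minimal sketch of that idea follows, assuming rows are plain double[] arrays and that a missing value is encoded as NaN. Both are assumptions made for illustration: the page does not describe how Itemset stores missing values, and `DeleteWithMissingSketch` is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.List;

class DeleteWithMissingSketch {
    /** Removes, in place, every row whose value at attIndex is missing (NaN here). */
    static void deleteWithMissing(List<double[]> rows, int attIndex) {
        rows.removeIf(row -> Double.isNaN(row[attIndex]));
    }

    public static void main(String[] args) {
        List<double[]> rows = new ArrayList<>();
        rows.add(new double[] {1.0, Double.NaN});
        rows.add(new double[] {2.0, 3.0});
        rows.add(new double[] {Double.NaN, 5.0});

        deleteWithMissing(rows, 1); // only the first row is missing attribute 1
        System.out.println("rows left: " + rows.size());
    }
}
```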
public final java.util.Enumeration enumerateItemsets()
public final void stratify(int numFolds) throws java.lang.Exception
numFolds - the number of folds in the cross-validation
Throws: java.lang.Exception - UnassignedClassException if the class is not set
public final Itemset firstInstance()
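The stratify method above is meant to be called before the trainCV and testCV methods of this class, so that each fold keeps roughly the class proportions of the full set. The page does not describe the algorithm, so the following is only one common way to do it: group rows by class, deal them out in steps of numFolds, then cut the reordered list into contiguous folds. `StratifiedFoldsSketch` and its methods are hypothetical and work on row indices rather than on Itemset objects.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class StratifiedFoldsSketch {
    /** Reorders row indices so classes are spread evenly over numFolds contiguous slices. */
    static List<Integer> stratify(int[] classOf, int numFolds) {
        List<Integer> byClass = new ArrayList<>();
        for (int i = 0; i < classOf.length; i++) byClass.add(i);
        byClass.sort(Comparator.comparingInt(row -> classOf[row])); // stable: groups rows by class
        List<Integer> order = new ArrayList<>();
        for (int start = 0; start < numFolds; start++) {
            for (int j = start; j < byClass.size(); j += numFolds) {
                order.add(byClass.get(j));
            }
        }
        return order;
    }

    /** Returns {first, count} of the test slice; the first n % numFolds folds get one extra row. */
    static int[] testRange(int n, int numFolds, int numFold) {
        if (numFolds < 2 || numFolds > n) {
            throw new IllegalArgumentException("numFolds must be between 2 and the number of rows");
        }
        int size = n / numFolds;
        int rem = n % numFolds;
        int first = numFold * size + Math.min(numFold, rem);
        int count = size + (numFold < rem ? 1 : 0);
        return new int[] {first, count};
    }

    /** Test set for fold numFold (0-based). */
    static List<Integer> testCV(List<Integer> rows, int numFolds, int numFold) {
        int[] r = testRange(rows.size(), numFolds, numFold);
        return new ArrayList<>(rows.subList(r[0], r[0] + r[1]));
    }

    /** Training set for fold numFold: everything outside the test slice. */
    static List<Integer> trainCV(List<Integer> rows, int numFolds, int numFold) {
        int[] r = testRange(rows.size(), numFolds, numFold);
        List<Integer> train = new ArrayList<>(rows.subList(0, r[0]));
        train.addAll(rows.subList(r[0] + r[1], rows.size()));
        return train;
    }

    public static void main(String[] args) {
        int[] classOf = {0, 0, 0, 0, 1, 1, 1, 1};
        List<Integer> order = stratify(classOf, 2);
        System.out.println("test fold 0:  " + testCV(order, 2, 0));
        System.out.println("train fold 0: " + trainCV(order, 2, 0));
    }
}
```

The range check mirrors the documented IllegalArgumentException for a fold count below 2 or above the number of itemsets.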
public final Itemset itemset(int index)
index - The index of the itemset.
public final Itemset lastItemset()
public final double sumOfWeights()
public final boolean equalHeaders(MyDataset dataset)
dataset - another dataset
public final void sort(int attIndex)
attIndex - The index of the attribute.
public final void sort(MyAttribute att)
att - The attribute.
public MyDataset trainCV(int numFolds, int numFold)
numFolds - the number of folds in the cross-validation. Must be greater than 1.
numFold - 0 for the first fold, 1 for the second, ...
Throws: java.lang.IllegalArgumentException - if the number of folds is less than 2 or greater than the number of itemsets.
public MyDataset testCV(int numFolds, int numFold)
numFolds - the number of folds in the cross-validation. Must be greater than 1.
numFold - 0 for the first fold, 1 for the second, ...
Throws: java.lang.IllegalArgumentException - if the number of folds is less than 2 or greater than the number of itemsets.
public void filter(Mask mask, int A, double V, int operator)
mask - Mask the mask with the active entries of the dataset
A - int attribute's id
V - double attribute's value
operator - int rule operator. It could be: Rule.EQUAL (for discrete attributes), Rule.GREATER (>) or Rule.LOWER (<=)
public void filter(Mask mask, SimpleRule sr)
mask - Mask the mask with the active entries of the dataset
sr - SimpleRule the rule
public void filter(Mask mask, Rule rule)
mask - Mask the mask with the active entries of the dataset
rule - Rule the rule
public void filter(Mask mask, Ruleset rules)
mask - Mask the mask with the active entries of the dataset
rules - Ruleset the ruleset
public void filter(Mask mask, Ruleset rules, int ignore)
mask - Mask the mask with the active entries of the dataset
rules - Ruleset the ruleset
ignore - int the algorithm ignores the i-th rule of the ruleset
public void filterByClass(Mask mask, java.lang.String class_name)
mask - Mask the mask with the active entries of the dataset
class_name - String the name of the class
public void substract(Mask mask, int A, double V, int operator)
mask - Mask the mask with the active entries of the dataset
A - int attribute's id
V - double attribute's value
operator - int rule operator. It could be: Rule.EQUAL (for discrete attributes), Rule.GREATER (>) or Rule.LOWER (<=)
public void substract(Mask mask, SimpleRule sr)
mask - Mask the mask with the active entries of the dataset
sr - SimpleRule the rule
public void substract(Mask mask, Rule rule)
mask - Mask the mask with the active entries of the dataset
rule - Rule the rule
public void substract(Mask mask, Ruleset rules)
mask - Mask the mask with the active entries of the dataset
rules - Ruleset the set of rules
public void substract(Mask mask, Ruleset rules, int ignore)
mask - Mask the mask with the active entries of the dataset
rules - Ruleset the set of rules
ignore - int number of the rule to ignore
public final double ruleDeviation(Rule r)
r - the rule
Throws: java.lang.IllegalArgumentException - if the class is not numeric
public final double ruleMeanAbsoluteError(Rule r)
r - the rule
Throws: java.lang.IllegalArgumentException - if the class is not numeric
public final double ruleCorrelation(Rule r)
r - the rule
Throws: java.lang.IllegalArgumentException - if the class is not numeric
public java.lang.String[] classify(Mask actives, Ruleset[] rulesets, int length)
actives - Mask active entries of the dataset
rulesets - Ruleset[] the rulesets
length - int the number of rulesets
public boolean isMissing(int exemple, int attribute)
exemple - int index of the example in the dataset
attribute - int index of the attribute
public boolean isMissing(Mask mask, int attribute)
mask - Mask whose current index designates the given example
attribute - int index of the attribute
public int size()
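The filter and substract methods documented above are complementary: filter deactivates the entries a rule does not cover, while substract deactivates the entries it does cover. The sketch below illustrates that contrast with a plain boolean[] standing in for Mask and a single condition (attribute, value, operator) standing in for a simple rule. The class `MaskSketch`, the numeric values of its operator constants, and the array-based mask are all assumptions made for the example, not the KEEL types.

```java
import java.util.Arrays;

class MaskSketch {
    // Hypothetical operator codes; the real Rule.EQUAL / GREATER / LOWER values are not given here.
    static final int EQUAL = 0;
    static final int GREATER = 1;
    static final int LOWER = 2;

    /** True when the row satisfies the condition "attribute a (operator) v". */
    static boolean covers(double[] row, int a, double v, int operator) {
        switch (operator) {
            case EQUAL:   return row[a] == v;
            case GREATER: return row[a] > v;
            case LOWER:   return row[a] <= v; // LOWER is documented as <=
            default: throw new IllegalArgumentException("unknown operator " + operator);
        }
    }

    /** Keeps only the covered entries active: deactivates entries the rule does not cover. */
    static void filter(boolean[] active, double[][] data, int a, double v, int operator) {
        for (int i = 0; i < data.length; i++) {
            if (active[i] && !covers(data[i], a, v, operator)) active[i] = false;
        }
    }

    /** Removes the covered entries: deactivates entries the rule covers. */
    static void substract(boolean[] active, double[][] data, int a, double v, int operator) {
        for (int i = 0; i < data.length; i++) {
            if (active[i] && covers(data[i], a, v, operator)) active[i] = false;
        }
    }

    public static void main(String[] args) {
        double[][] data = {{1.0}, {5.0}, {9.0}};

        boolean[] kept = {true, true, true};
        filter(kept, data, 0, 4.0, GREATER);
        System.out.println("after filter:    " + Arrays.toString(kept));

        boolean[] rest = {true, true, true};
        substract(rest, data, 0, 4.0, GREATER);
        System.out.println("after substract: " + Arrays.toString(rest));
    }
}
```

Neither operation touches the data itself, only the activity flags, which is why the same dataset can be reused with different masks.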
public java.lang.String[] classify(Ruleset[] rulesets, int length)
rulesets - Ruleset[] the rulesets
length - int the number of rulesets
public double[] classify(Mask actives, java.util.Vector rules)
actives - Mask active entries of the dataset
rules - Vector the rules vector
public double[] classify(java.util.Vector rules)
rules - Vector the rules vector
public double[] getExample(int pos)
pos - int position (id) of the example in the data-set
public double[] getExample(Mask mask)
mask - Mask with the position (id) of the example in the data-set
public int[] getClassFequency()
public int[] getClassFequency(Mask filter)
filter - Mask filter
public MyDataset[] split(Rule r)
r - the rule
public double averageClassValue()
public double averageClassValue(Rule r)
r - the rule
public double averagePredictedClassValue(Rule r)
r - the rule
public double averageValue(int att)
att - the attribute's index
public MyDataset discretToBinary()
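The discretToBinary summary only states that a discrete attribute becomes numValues()-1 synthetic binary attributes. It does not give the encoding. One encoding that yields exactly that count, and that would fit alongside sortByAverageClassValues, is the threshold scheme used in M5-style model trees: rank the nominal values by their average class value, then set the k-th binary attribute to 1 when the value's rank is above k. The sketch below assumes that scheme. `DiscreteToBinarySketch` and its methods are hypothetical, and the actual KEEL behaviour may differ.

```java
import java.util.Arrays;
import java.util.Comparator;

class DiscreteToBinarySketch {
    /** rank[v] = position of nominal value v when values are sorted by average class value. */
    static int[] rankByAverageClass(int[] values, double[] classValues, int numValues) {
        double[] sum = new double[numValues];
        int[] count = new int[numValues];
        for (int i = 0; i < values.length; i++) {
            sum[values[i]] += classValues[i];
            count[values[i]]++;
        }
        double[] avg = new double[numValues];
        Integer[] order = new Integer[numValues];
        for (int v = 0; v < numValues; v++) {
            avg[v] = count[v] == 0 ? 0.0 : sum[v] / count[v]; // unseen values default to 0
            order[v] = v;
        }
        Arrays.sort(order, Comparator.comparingDouble(x -> avg[x]));
        int[] rank = new int[numValues];
        for (int pos = 0; pos < numValues; pos++) rank[order[pos]] = pos;
        return rank;
    }

    /** Encodes one nominal value as numValues - 1 binary attributes (threshold encoding). */
    static double[] encode(int value, int[] rank) {
        double[] bits = new double[rank.length - 1];
        for (int k = 0; k < bits.length; k++) {
            bits[k] = rank[value] > k ? 1.0 : 0.0;
        }
        return bits;
    }

    public static void main(String[] args) {
        int[] values = {0, 1, 2, 0, 1, 2};          // nominal attribute with 3 values
        double[] classValues = {10, 1, 5, 12, 3, 7}; // numeric class
        int[] rank = rankByAverageClass(values, classValues, 3);
        for (int v = 0; v < 3; v++) {
            System.out.println("value " + v + " -> " + Arrays.toString(encode(v, rank)));
        }
    }
}
```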
public int[][] sortByAverageClassValues()
public java.lang.String copyHeader()
public java.lang.String toString()
Overrides: toString in class java.lang.Object
public java.lang.String toString(Mask mask)
mask - Mask active entries