public class M5Instances
extends java.lang.Object
implements java.io.Serializable
Modifier and Type | Field and Description |
---|---|
protected M5Vector | m_Attributes: The attribute information. |
protected int | m_ClassIndex: The class attribute's index. |
protected int[] | m_IndicesBuffer: Buffer of indices for sparse instances. |
protected M5Vector | m_Instances: The instances. |
protected java.lang.String | m_NameClassIndex: Name of the class attribute declared as @output. |
protected java.lang.String | m_RelationName: The dataset's name. |
protected double[] | m_ValueBuffer: Buffer of values for sparse instances. |
Constructor and Description |
---|
M5Instances(M5Instances dataset): Constructor copying all instances and references to the header information from the given set of instances. |
M5Instances(M5Instances dataset, int capacity): Creates an empty set of instances that takes its header information from the given dataset. |
M5Instances(M5Instances source, int first, int toCopy): Creates a new set of instances by copying a subset of another set. |
M5Instances(java.io.Reader reader): Reads a data file from a reader and assigns a weight of one to each instance. |
M5Instances(java.io.Reader reader, int capacity): Reads the header of a file from a reader and reserves space for the given number of instances. |
M5Instances(java.lang.String name, M5Vector attInfo, int capacity): Creates an empty set of instances. |
Modifier and Type | Method and Description |
---|---|
void | add(M5Instance instance): Adds one instance to the end of the set. |
M5Attribute | attribute(int index): Returns an attribute. |
M5Attribute | attribute(java.lang.String name): Returns an attribute given its name. |
M5AttrStats | attributeStats(int index): Calculates summary statistics on the values that appear in this set of instances for a specified attribute. |
double[] | attributeToDoubleArray(int index): Gets the values of all instances in this dataset for a particular attribute. |
boolean | checkForStringAttributes(): Checks for string attributes in the dataset. |
boolean | checkInstance(M5Instance instance): Checks if the given instance is compatible with this dataset. |
M5Attribute | classAttribute(): Returns the class attribute. |
int | classIndex(): Returns the class attribute's index. |
void | compactify(): Compactifies the set of instances. |
void | delete(): Removes all instances from the set. |
void | delete(int index): Removes an instance at the given position from the set. |
void | deleteAttributeAt(int position): Deletes an attribute at the given position (0 to numAttributes() - 1). |
void | deleteStringAttributes(): Deletes all string attributes in the dataset. |
void | deleteWithMissing(int attIndex): Removes all instances with missing values for a particular attribute from the dataset. |
void | deleteWithMissing(M5Attribute att): Removes all instances with missing values for a particular attribute from the dataset. |
void | deleteWithMissingClass(): Removes all instances with a missing class value from the dataset. |
java.util.Enumeration | enumerateAttributes(): Returns an enumeration of all the attributes. |
java.util.Enumeration | enumerateInstances(): Returns an enumeration of all instances in the dataset. |
boolean | equalHeaders(M5Instances dataset): Checks if two headers are equivalent. |
M5Instance | firstInstance(): Returns the first instance in the set. |
protected boolean | getInstance(java.io.StreamTokenizer tokenizer, boolean flag): Reads a single instance using the tokenizer and appends it to the dataset. |
protected boolean | getInstanceFull(java.io.StreamTokenizer tokenizer, boolean flag): Reads a single instance using the tokenizer and appends it to the dataset. |
protected boolean | getInstanceSparse(java.io.StreamTokenizer tokenizer, boolean flag): Reads a single instance using the tokenizer and appends it to the dataset. |
void | insertAttributeAt(M5Attribute att, int position): Inserts an attribute at the given position (0 to numAttributes()) and sets all its values to missing. |
M5Instance | instance(int index): Returns the instance at the given position. |
M5Instance | lastInstance(): Returns the last instance in the set. |
double | meanOrMode(int attIndex): Returns the mean (mode) for a numeric (nominal) attribute as a floating-point value. |
double | meanOrMode(M5Attribute att): Returns the mean (mode) for a numeric (nominal) attribute as a floating-point value. |
static M5Instances | mergeInstances(M5Instances first, M5Instances second): Merges two sets of M5Instances together; both sets must contain the same number of instances. |
java.lang.String | NameClassIndex(): Returns the name of the class attribute declared as @output. |
int | numAttributes(): Returns the number of attributes. |
int | numClasses(): Returns the number of class labels. |
int | numDistinctValues(int attIndex): Returns the number of distinct values of a given attribute. |
int | numDistinctValues(M5Attribute att): Returns the number of distinct values of a given attribute. |
int | numInstances(): Returns the number of instances in the dataset. |
void | randomize(java.util.Random random): Shuffles the instances in the set so that they are ordered randomly. |
protected void | readHeader(java.io.StreamTokenizer tokenizer): Reads and stores the header of a file. |
boolean | readInstance(java.io.Reader reader): Reads a single instance from the reader and appends it to the dataset. |
java.lang.String | relationName(): Returns the relation's name. |
void | renameAttribute(int att, java.lang.String name): Renames an attribute. |
void | renameAttribute(M5Attribute att, java.lang.String name): Renames an attribute. |
void | renameAttributeValue(int att, int val, java.lang.String name): Renames a value of a nominal (or string) attribute. |
void | renameAttributeValue(M5Attribute att, java.lang.String val, java.lang.String name): Renames a value of a nominal (or string) attribute. |
M5Instances | resample(java.util.Random random): Creates a new dataset of the same size using random sampling with replacement. |
M5Instances | resampleWithWeights(java.util.Random random): Creates a new dataset of the same size using random sampling with replacement according to the current instance weights. |
M5Instances | resampleWithWeights(java.util.Random random, double[] weights): Creates a new dataset of the same size using random sampling with replacement according to the given weight vector. |
void | setClass(M5Attribute att): Sets the class attribute. |
void | setClassIndex(int classIndex): Sets the class index of the set. |
void | setRelationName(java.lang.String newName): Sets the relation's name. |
void | sort(int attIndex): Sorts the instances based on an attribute. |
void | sort(M5Attribute att): Sorts the instances based on an attribute. |
void | stratify(int numFolds): Stratifies a set of instances according to its class values if the class attribute is nominal (so that afterwards a stratified cross-validation can be performed). |
M5Instances | stringFreeStructure(): Creates a copy of the dataset structure, but "cleanses" string attribute types. |
double | sumOfWeights(): Computes the sum of all the instances' weights. |
static void | test(java.lang.String[] argv): Method for testing this class. |
M5Instances | testCV(int numFolds, int numFold): Creates the test set for one fold of a cross-validation on the dataset. |
java.lang.String | toString(): Returns the dataset as a string. |
java.lang.String | toSummaryString(): Generates a string summarizing the set of instances. |
M5Instances | trainCV(int numFolds, int numFold): Creates the training set for one fold of a cross-validation on the dataset. |
double | variance(int attIndex): Computes the variance for a numeric attribute. |
double | variance(M5Attribute att): Computes the variance for a numeric attribute. |
protected java.lang.String m_RelationName
protected M5Vector m_Attributes
protected M5Vector m_Instances
protected int m_ClassIndex
protected double[] m_ValueBuffer
protected int[] m_IndicesBuffer
protected java.lang.String m_NameClassIndex
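The m_ValueBuffer and m_IndicesBuffer fields suggest that a sparse instance is held as two parallel arrays: the indices of the attributes that are present and their values. The following is a minimal sketch of that layout, not the class's actual code. It assumes the indices are sorted ascending and that an index absent from the buffer means the value 0, as in the sparse ARFF convention. SparseRowSketch and its methods are hypothetical names.

```java
import java.util.Arrays;

/** Hypothetical sparse row: stores only the entries that are present (non-zero). */
final class SparseRowSketch {
    private final int[] indices;   // attribute indices, assumed sorted ascending
    private final double[] values; // values[k] belongs to attribute indices[k]

    SparseRowSketch(int[] indices, double[] values) {
        if (indices.length != values.length) {
            throw new IllegalArgumentException("indices and values must have the same length");
        }
        this.indices = indices.clone();
        this.values = values.clone();
    }

    /** Value of attribute attIndex; an index absent from the buffer is read as 0. */
    double value(int attIndex) {
        int pos = Arrays.binarySearch(indices, attIndex); // O(log k) lookup
        return pos >= 0 ? values[pos] : 0.0;
    }

    /** Expands the row into a dense array with numAttributes slots. */
    double[] toDense(int numAttributes) {
        double[] dense = new double[numAttributes]; // Java zero-initializes the array
        for (int k = 0; k < indices.length; k++) {
            dense[indices[k]] = values[k];
        }
        return dense;
    }
}
```

Keeping the indices sorted is what makes the binary-search lookup valid; an unsorted buffer would need a linear scan instead.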

public M5Instances(java.io.Reader reader) throws java.io.IOException
Parameters:
reader - the reader
Throws:
java.io.IOException - if the data file is not read successfully

public M5Instances(java.io.Reader reader, int capacity) throws java.io.IOException
Parameters:
reader - the reader
capacity - the number of instances to reserve space for
Throws:
java.lang.IllegalArgumentException - if the header is not read successfully or the capacity is negative
java.io.IOException - if there is a problem with the reader

public M5Instances(M5Instances dataset)
Parameters:
dataset - the set to be copied

public M5Instances(M5Instances dataset, int capacity)
Parameters:
dataset - the instances from which the header information is to be taken
capacity - the capacity of the new dataset

public M5Instances(M5Instances source, int first, int toCopy)
Parameters:
source - the set of instances from which a subset is to be created
first - the index of the first instance to be copied
toCopy - the number of instances to be copied
Throws:
java.lang.IllegalArgumentException - if first and toCopy are out of range

public M5Instances(java.lang.String name, M5Vector attInfo, int capacity)
Parameters:
name - the name of the relation
attInfo - the attribute information
capacity - the capacity of the set

public M5Instances stringFreeStructure()
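The M5Instances(source, first, toCopy) constructor copies a contiguous run of instances and rejects arguments that are out of range. A minimal sketch of that check on a plain java.util.List, assuming "out of range" means a negative argument or a window that runs past the end of the source; SubsetCopySketch.copySubset is a hypothetical helper, not part of M5Instances.

```java
import java.util.ArrayList;
import java.util.List;

final class SubsetCopySketch {
    /** Copies toCopy elements of source, starting at index first (hypothetical helper). */
    static <T> List<T> copySubset(List<T> source, int first, int toCopy) {
        // Assumed meaning of "out of range": negative arguments or a window past the end.
        // Comparing against size() - toCopy avoids int overflow in first + toCopy.
        if (first < 0 || toCopy < 0 || first > source.size() - toCopy) {
            throw new IllegalArgumentException("Parameters first and/or toCopy out of range");
        }
        // Copy into a new list so later changes to the copy do not touch the source.
        return new ArrayList<>(source.subList(first, first + toCopy));
    }
}
```

Wrapping subList in a new ArrayList matters: subList alone is only a view, so structural changes to the source would invalidate it.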

public final void add(M5Instance instance)
Parameters:
instance - the instance to be added

public final M5Attribute attribute(int index)
Parameters:
index - the attribute's index

public final M5Attribute attribute(java.lang.String name)
Parameters:
name - the attribute's name

public boolean checkForStringAttributes()

public final boolean checkInstance(M5Instance instance)
Parameters:
instance - the instance to check

public final M5Attribute classAttribute() throws java.lang.Exception
Throws:
java.lang.Exception - if the class is not set

public final int classIndex()
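checkInstance only promises that an instance is "compatible" with the dataset. One plausible reading is sketched below under stated assumptions: the instance has exactly one value per attribute, NaN marks a missing value, and every non-missing nominal value is a valid label index. The header is reduced to a hypothetical numValues array (0 for a numeric attribute); the real M5Instance and M5Attribute types are not used, and CompatibilitySketch is a hypothetical name.

```java
final class CompatibilitySketch {
    /**
     * Hypothetical compatibility check. numValues[i] is the number of labels of
     * nominal attribute i, or 0 if attribute i is numeric; NaN marks a missing value.
     */
    static boolean isCompatible(double[] instance, int[] numValues) {
        if (instance.length != numValues.length) {
            return false; // wrong number of attributes
        }
        for (int i = 0; i < instance.length; i++) {
            double v = instance[i];
            if (Double.isNaN(v) || numValues[i] == 0) {
                continue; // missing values and numeric attributes are accepted as-is
            }
            // A nominal value must be a whole-number index of an existing label.
            if (v != Math.rint(v) || v < 0 || v >= numValues[i]) {
                return false;
            }
        }
        return true;
    }
}
```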

public final java.lang.String NameClassIndex()

public final void compactify()

public final void delete()

public final void delete(int index)
Parameters:
index - the instance's position

public void deleteAttributeAt(int position)
Parameters:
position - the attribute's position
Throws:
java.lang.IllegalArgumentException - if the given position is out of range or the class attribute is being deleted

public void deleteStringAttributes()
Throws:
java.lang.IllegalArgumentException - if a string attribute could not be deleted (probably because it is the class attribute)

public final void deleteWithMissing(int attIndex)
Parameters:
attIndex - the attribute's index

public final void deleteWithMissing(M5Attribute att)
Parameters:
att - the attribute

public final void deleteWithMissingClass() throws java.lang.Exception
Throws:
java.lang.Exception - if the class is not set

public java.util.Enumeration enumerateAttributes()
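deleteWithMissing(int attIndex) drops every instance whose value for one attribute is missing. Assuming rows are stored as double[] with NaN as the missing-value marker (a convention borrowed from Weka, not confirmed for M5Instances), the filter reduces to a single removeIf call. DeleteWithMissingSketch is a hypothetical name.

```java
import java.util.List;

final class DeleteWithMissingSketch {
    /** Removes, in place, every row whose value at attIndex is missing (NaN by assumption). */
    static void deleteWithMissing(List<double[]> rows, int attIndex) {
        // removeIf keeps the relative order of the surviving rows.
        rows.removeIf(row -> Double.isNaN(row[attIndex]));
    }
}
```

Under the same assumptions, deleteWithMissingClass() would be this filter applied at the class index, after first checking that a class has been set.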

public final java.util.Enumeration enumerateInstances()

public final boolean equalHeaders(M5Instances dataset)
Parameters:
dataset - another dataset

public final M5Instance firstInstance()

public void insertAttributeAt(M5Attribute att, int position)
Parameters:
att - the attribute to be inserted
position - the attribute's position
Throws:
java.lang.IllegalArgumentException - if the given position is out of range

public final M5Instance instance(int index)
Parameters:
index - the instance's index

public final M5Instance lastInstance()
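insertAttributeAt accepts positions from 0 to numAttributes() inclusive and starts the new attribute out as missing in every instance. A sketch of that operation on rows stored as double[], again assuming NaN as the missing marker and that each row holds numAttributes values. InsertAttributeSketch.insertColumnAt is hypothetical and ignores the header bookkeeping the real method must also do, such as updating the attribute list and, presumably, shifting the class index.

```java
import java.util.List;

final class InsertAttributeSketch {
    /**
     * Inserts a new column at position (0..numAttributes) into every row and
     * fills it with NaN, standing in for "missing" (an assumption of this sketch).
     */
    static void insertColumnAt(List<double[]> rows, int numAttributes, int position) {
        if (position < 0 || position > numAttributes) {
            throw new IllegalArgumentException("Index out of range");
        }
        for (int r = 0; r < rows.size(); r++) {
            double[] old = rows.get(r);
            double[] grown = new double[old.length + 1];
            System.arraycopy(old, 0, grown, 0, position); // values before the new column
            grown[position] = Double.NaN;                  // new attribute starts out missing
            System.arraycopy(old, position, grown, position + 1, old.length - position);
            rows.set(r, grown);
        }
    }
}
```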

public final double meanOrMode(int attIndex)
Parameters:
attIndex - the attribute's index

public final double meanOrMode(M5Attribute att)
Parameters:
att - the attribute

public final int numAttributes()
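meanOrMode returns the mean for a numeric attribute and the index of the most frequent label for a nominal one, both as a double. A minimal unweighted sketch over one column; it assumes numLabels == 0 marks a numeric attribute, NaN marks a missing value, and 0 is returned when every value is missing. The real method may weight instances. MeanOrModeSketch is a hypothetical name.

```java
final class MeanOrModeSketch {
    /**
     * Mean of a numeric column, or the most frequent label index of a nominal one.
     * numLabels == 0 marks a numeric attribute; NaN marks a missing value (assumptions).
     */
    static double meanOrMode(double[] column, int numLabels) {
        if (numLabels == 0) {
            double sum = 0;
            int count = 0;
            for (double v : column) {
                if (!Double.isNaN(v)) {
                    sum += v;
                    count++;
                }
            }
            return count == 0 ? 0.0 : sum / count;
        }
        int[] counts = new int[numLabels];
        for (double v : column) {
            if (!Double.isNaN(v)) {
                counts[(int) v]++; // nominal values are label indices
            }
        }
        int best = 0;
        for (int i = 1; i < numLabels; i++) {
            if (counts[i] > counts[best]) {
                best = i; // ties keep the lower index
            }
        }
        return best; // the mode, returned as a floating-point label index
    }
}
```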

public final int numClasses() throws java.lang.Exception
Throws:
java.lang.Exception - if the class is not set

public final int numDistinctValues(int attIndex)
Parameters:
attIndex - the attribute's index

public final int numDistinctValues(M5Attribute att)
Parameters:
att - the attribute

public final int numInstances()
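For a numeric attribute, numDistinctValues can be computed by sorting a copy of the column and counting value changes. The sketch below assumes NaN marks missing values and does not count them. For a nominal attribute a natural alternative would be the number of declared labels, but the entries above do not say which behavior M5Instances uses. DistinctValuesSketch is a hypothetical name.

```java
import java.util.Arrays;

final class DistinctValuesSketch {
    /** Counts distinct non-missing values of a numeric column by sorting a copy. */
    static int numDistinctNumeric(double[] column) {
        double[] sorted = column.clone();
        Arrays.sort(sorted); // Arrays.sort(double[]) orders NaN after every other value
        int count = 0;
        for (int i = 0; i < sorted.length; i++) {
            if (Double.isNaN(sorted[i])) {
                break; // only missing values remain from here on
            }
            if (i == 0 || sorted[i] > sorted[i - 1]) {
                count++; // a strictly larger value starts a new distinct group
            }
        }
        return count;
    }
}
```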

public final void randomize(java.util.Random random)
Parameters:
random - a random number generator

public final boolean readInstance(java.io.Reader reader) throws java.io.IOException
Parameters:
reader - the reader
Throws:
java.io.IOException - if the information is not read successfully

public final java.lang.String relationName()
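randomize(java.util.Random) shuffles the instances in place. A standard way to do that with a caller-supplied generator is the Fisher-Yates shuffle, sketched below on a generic List. It shows the technique, not necessarily M5Instances' own implementation; RandomizeSketch is a hypothetical name.

```java
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class RandomizeSketch {
    /** In-place Fisher-Yates shuffle; every permutation is equally likely. */
    static <T> void randomize(List<T> items, Random random) {
        for (int j = items.size() - 1; j > 0; j--) {
            // Pick one of the j + 1 not-yet-fixed slots and move it to slot j.
            Collections.swap(items, j, random.nextInt(j + 1));
        }
    }
}
```

For a plain list the JDK already provides java.util.Collections.shuffle(list, random). Passing a Random with a fixed seed makes the order reproducible, which repeatable cross-validation experiments usually need.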

public final void renameAttribute(int att, java.lang.String name)
Parameters:
att - the attribute's index
name - the new name

public final void renameAttribute(M5Attribute att, java.lang.String name)
Parameters:
att - the attribute
name - the new name

public final void renameAttributeValue(int att, int val, java.lang.String name)
Parameters:
att - the attribute's index
val - the value's index
name - the new name

public final void renameAttributeValue(M5Attribute att, java.lang.String val, java.lang.String name)
Parameters:
att - the attribute
val - the value
name - the new name

public final M5Instances resample(java.util.Random random)
Parameters:
random - a random number generator

public final M5Instances resampleWithWeights(java.util.Random random)
Parameters:
random - a random number generator

public final M5Instances resampleWithWeights(java.util.Random random, double[] weights)
Parameters:
random - a random number generator
weights - the weight vector
Throws:
java.lang.IllegalArgumentException - if the weights array is of the wrong length or contains negative weights

public final void setClass(M5Attribute att)
Parameters:
att - the attribute to be the class

public final void setClassIndex(int classIndex)
Parameters:
classIndex - the new class index
Throws:
java.lang.IllegalArgumentException - if the class index is too large or negative

public final void setRelationName(java.lang.String newName)
Parameters:
newName - the new relation name

public final void sort(int attIndex)
Parameters:
attIndex - the attribute's index

public final void sort(M5Attribute att)
Parameters:
att - the attribute

public final void stratify(int numFolds) throws java.lang.Exception
Parameters:
numFolds - the number of folds in the cross-validation
Throws:
java.lang.Exception - if the class is not set

public final double sumOfWeights()
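The resample and resampleWithWeights entries describe bootstrap sampling: draw as many instances as the dataset holds, with replacement, where instance i is chosen with probability proportional to its weight (plain resample corresponds to equal weights). The minimal sketch below returns only the chosen indices and uses a cumulative-sum table with binary search. The name WeightedResampleSketch.resampleIndices and its numInstances parameter are assumptions. The first two checks mirror the documented IllegalArgumentException cases; rejecting an all-zero weight vector is an extra assumption.

```java
import java.util.Random;

final class WeightedResampleSketch {
    /**
     * Draws numInstances indices with replacement; index i is drawn with
     * probability weights[i] / sum(weights).
     */
    static int[] resampleIndices(int numInstances, double[] weights, Random random) {
        if (weights.length != numInstances) {
            throw new IllegalArgumentException("Weights array has the wrong length");
        }
        if (numInstances == 0) {
            return new int[0]; // nothing to sample from
        }
        double[] cumulative = new double[weights.length]; // running totals of the weights
        double total = 0;
        for (int i = 0; i < weights.length; i++) {
            if (!(weights[i] >= 0)) { // also rejects NaN
                throw new IllegalArgumentException("Weights must be non-negative");
            }
            total += weights[i];
            cumulative[i] = total;
        }
        if (total <= 0) { // extra assumption: all-zero weights cannot be sampled
            throw new IllegalArgumentException("At least one weight must be positive");
        }
        int[] chosen = new int[numInstances];
        for (int k = 0; k < numInstances; k++) {
            double r = random.nextDouble() * total; // uniform in [0, total)
            // Find the first index whose running total exceeds r;
            // a zero-weight index can never be that first index.
            int lo = 0;
            int hi = cumulative.length - 1;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (cumulative[mid] > r) {
                    hi = mid;
                } else {
                    lo = mid + 1;
                }
            }
            chosen[k] = lo;
        }
        return chosen;
    }
}
```

Each draw costs O(log n) after an O(n) setup, so building the whole bootstrap sample is O(n log n).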

public M5Instances testCV(int numFolds, int numFold)
Parameters:
numFolds - the number of folds in the cross-validation; must be greater than 1
numFold - 0 for the first fold, 1 for the second, ...
Throws:
java.lang.IllegalArgumentException - if the number of folds is less than 2 or greater than the number of instances

public final java.lang.String toString()
Overrides:
toString in class java.lang.Object
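testCV(numFolds, numFold) returns the test part of one fold. One common scheme, assumed here because the entries above do not spell it out, takes a contiguous block of about n / numFolds instances and gives the n % numFolds leftover instances to the first folds, one each. The sketch returns only the block's start index and size; CrossValidationSketch.testFoldRange is hypothetical.

```java
final class CrossValidationSketch {
    /**
     * Returns {firstIndex, size} of the contiguous test block for fold numFold,
     * spreading the remainder n % numFolds over the first folds (one extra each).
     */
    static int[] testFoldRange(int numInstances, int numFolds, int numFold) {
        if (numFolds < 2 || numFolds > numInstances) {
            throw new IllegalArgumentException(
                "Number of folds must be between 2 and the number of instances");
        }
        if (numFold < 0 || numFold >= numFolds) {
            throw new IllegalArgumentException("Fold index out of range");
        }
        int base = numInstances / numFolds;
        int remainder = numInstances % numFolds;
        int size = base + (numFold < remainder ? 1 : 0);
        // Every earlier fold that received an extra instance shifts this block by one.
        int first = numFold * base + Math.min(numFold, remainder);
        return new int[] {first, size};
    }
}
```

Under this scheme, the matching trainCV set would be every instance outside [first, first + size). Calling randomize, or stratify for a nominal class, before splitting keeps the blocks from inheriting any ordering present in the file.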

public M5Instances trainCV(int numFolds, int numFold)
Parameters:
numFolds - the number of folds in the cross-validation; must be greater than 1
numFold - 0 for the first fold, 1 for the second, ...
Throws:
java.lang.IllegalArgumentException - if the number of folds is less than 2 or greater than the number of instances

public final double variance(int attIndex)
Parameters:
attIndex - the index of the numeric attribute
Throws:
java.lang.IllegalArgumentException - if the attribute is not numeric

public final double variance(M5Attribute att)
Parameters:
att - the numeric attribute
Throws:
java.lang.IllegalArgumentException - if the attribute is not numeric

public M5AttrStats attributeStats(int index)
Parameters:
index - the index of the attribute to summarize

public double[] attributeToDoubleArray(int index)
Parameters:
index - the index of the attribute

public java.lang.String toSummaryString()
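variance(int attIndex) applies only to numeric attributes. The entries above do not say whether it weights instances or divides by n or n - 1, so the sketch below assumes the unweighted sample variance (divisor n - 1) and skips NaN as missing. It uses Welford's single-pass update, which avoids the cancellation error of the naive sum-of-squares formula. VarianceSketch is a hypothetical name.

```java
final class VarianceSketch {
    /**
     * Unweighted sample variance (divisor n - 1) of a numeric column, skipping NaN
     * (missing) values. Returns 0 when fewer than two values are present (assumption).
     */
    static double variance(double[] column) {
        int n = 0;
        double mean = 0;
        double m2 = 0; // running sum of squared deviations from the current mean
        for (double x : column) {
            if (Double.isNaN(x)) {
                continue;
            }
            n++;
            double delta = x - mean; // Welford's online update
            mean += delta / n;
            m2 += delta * (x - mean);
        }
        return n < 2 ? 0.0 : m2 / (n - 1);
    }
}
```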

protected boolean getInstance(java.io.StreamTokenizer tokenizer, boolean flag) throws java.io.IOException
Parameters:
tokenizer - the tokenizer to be used
flag - whether the method should test for a carriage return after each instance
Throws:
java.io.IOException - if the information is not read successfully

protected boolean getInstanceSparse(java.io.StreamTokenizer tokenizer, boolean flag) throws java.io.IOException
Parameters:
tokenizer - the tokenizer to be used
flag - whether the method should test for a carriage return after each instance
Throws:
java.io.IOException - if the information is not read successfully

protected boolean getInstanceFull(java.io.StreamTokenizer tokenizer, boolean flag) throws java.io.IOException
Parameters:
tokenizer - the tokenizer to be used
flag - whether the method should test for a carriage return after each instance
Throws:
java.io.IOException - if the information is not read successfully

protected void readHeader(java.io.StreamTokenizer tokenizer) throws java.io.IOException
Parameters:
tokenizer - the stream tokenizer
Throws:
java.io.IOException - if the information is not read successfully

public static M5Instances mergeInstances(M5Instances first, M5Instances second)
Parameters:
first - the first set of M5Instances
second - the second set of M5Instances
Throws:
java.lang.IllegalArgumentException - if the datasets are not the same size

public static void test(java.lang.String[] argv)
Parameters:
argv - should contain one element: the name of a data file
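The Throws clause of mergeInstances (the datasets must be the same size) suggests a column-wise merge, in which instance i of the result combines the attribute values of instance i from both sets. That reading is an assumption. The sketch below applies it to plain double[][] rows and leaves out the header merge; MergeSketch.mergeColumns is hypothetical.

```java
final class MergeSketch {
    /**
     * Column-wise merge: row i of the result is row i of first followed by row i of
     * second. Both inputs must have the same number of rows, mirroring the size check.
     */
    static double[][] mergeColumns(double[][] first, double[][] second) {
        if (first.length != second.length) {
            throw new IllegalArgumentException("Instance sets must be of the same size");
        }
        double[][] merged = new double[first.length][];
        for (int i = 0; i < first.length; i++) {
            double[] row = new double[first[i].length + second[i].length];
            System.arraycopy(first[i], 0, row, 0, first[i].length);                // left part
            System.arraycopy(second[i], 0, row, first[i].length, second[i].length); // right part
            merged[i] = row;
        }
        return merged;
    }
}
```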