public class Instances
extends java.lang.Object
implements java.io.Serializable
Typical usage:
    import weka.core.converters.ConverterUtils.DataSource;
    ...

    // Read all the instances in the file (ARFF, CSV, XRFF, ...)
    DataSource source = new DataSource(filename);
    Instances instances = source.getDataSet();

    // Make the last attribute be the class
    instances.setClassIndex(instances.numAttributes() - 1);

    // Print header and instances.
    System.out.println("\nDataset:\n");
    System.out.println(instances);
    ...
All methods that change a set of instances are safe, i.e. a change to one set of instances does not affect any other set of instances. All methods that change a dataset's attribute information clone the dataset before it is changed.
Modifier and Type | Field | Description |
---|---|---|
static java.lang.String | ARFF_DATA | The keyword used to denote the start of the arff data section |
static java.lang.String | ARFF_RELATION | The keyword used to denote the start of an arff header |
static java.lang.String | FILE_EXTENSION | The filename extension that should be used for arff files |
protected FastVector | m_Attributes | The attribute information. |
protected int | m_ClassIndex | The class attribute's index |
protected FastVector | m_Instances | The instances. |
protected int | m_Lines | The lines read so far in case of incremental loading. |
protected java.lang.String | m_RelationName | The dataset's name. |
static java.lang.String | SERIALIZED_OBJ_FILE_EXTENSION | The filename extension that should be used for binary serialized instances files |
Constructor | Description |
---|---|
Instances(Instances dataset) | Constructor copying all instances and references to the header information from the given set of instances. |
Instances(Instances dataset, int capacity) | Constructor creating an empty set of instances. |
Instances(Instances source, int first, int toCopy) | Creates a new set of instances by copying a subset of another set. |
Instances(java.lang.String name, FastVector attInfo, int capacity) | Creates an empty set of instances. |
Modifier and Type | Method | Description |
---|---|---|
void | add(Instance instance) | Adds one instance to the end of the set. |
AttributeWeka | attribute(int index) | Returns an attribute. |
AttributeWeka | attribute(java.lang.String name) | Returns an attribute given its name. |
double[] | attributeToDoubleArray(int index) | Gets the value of all instances in this dataset for a particular attribute. |
boolean | checkForAttributeType(int attType) | Checks for attributes of the given type in the dataset. |
boolean | checkForStringAttributes() | Checks for string attributes in the dataset. |
boolean | checkInstance(Instance instance) | Checks if the given instance is compatible with this dataset. |
AttributeWeka | classAttribute() | Returns the class attribute. |
int | classIndex() | Returns the class attribute's index. |
void | compactify() | Compactifies the set of instances. |
protected void | copyInstances(int from, Instances dest, int num) | Copies instances from one set to the end of another one. |
void | delete() | Removes all instances from the set. |
void | delete(int index) | Removes an instance at the given position from the set. |
void | deleteAttributeAt(int position) | Deletes an attribute at the given position (0 to numAttributes() - 1). |
void | deleteAttributeType(int attType) | Deletes all attributes of the given type in the dataset. |
void | deleteStringAttributes() | Deletes all string attributes in the dataset. |
void | deleteWithMissing(AttributeWeka att) | Removes all instances with missing values for a particular attribute from the dataset. |
void | deleteWithMissing(int attIndex) | Removes all instances with missing values for a particular attribute from the dataset. |
void | deleteWithMissingClass() | Removes all instances with a missing class value from the dataset. |
java.util.Enumeration | enumerateAttributes() | Returns an enumeration of all the attributes. |
java.util.Enumeration | enumerateInstances() | Returns an enumeration of all instances in the dataset. |
boolean | equalHeaders(Instances dataset) | Checks if two headers are equivalent. |
Instance | firstInstance() | Returns the first instance in the set. |
protected void | freshAttributeInfo() | Replaces the attribute information by a clone of itself. |
protected void | initialize(Instances dataset, int capacity) | Initializes with the header information of the given dataset and sets the capacity of the set of instances. |
void | insertAttributeAt(AttributeWeka att, int position) | Inserts an attribute at the given position (0 to numAttributes()) and sets all values to be missing. |
Instance | instance(int index) | Returns the instance at the given position. |
protected java.lang.String | instancesAndWeights() | Returns a string including all instances, their weights and their indices in the original dataset. |
double | kthSmallestValue(AttributeWeka att, int k) | Returns the kth-smallest attribute value of a numeric attribute. |
double | kthSmallestValue(int attIndex, int k) | Returns the kth-smallest attribute value of a numeric attribute. |
Instance | lastInstance() | Returns the last instance in the set. |
double | meanOrMode(AttributeWeka att) | Returns the mean (mode) for a numeric (nominal) attribute as a floating-point value. |
double | meanOrMode(int attIndex) | Returns the mean (mode) for a numeric (nominal) attribute as a floating-point value. |
static Instances | mergeInstances(Instances first, Instances second) | Merges two sets of Instances together. |
int | numAttributes() | Returns the number of attributes. |
int | numClasses() | Returns the number of class labels. |
int | numDistinctValues(AttributeWeka att) | Returns the number of distinct values of a given attribute. |
int | numDistinctValues(int attIndex) | Returns the number of distinct values of a given attribute. |
int | numInstances() | Returns the number of instances in the dataset. |
protected int | partition(int attIndex, int l, int r) | Partitions the instances around a pivot. |
protected void | quickSort(int attIndex, int left, int right) | Implements quicksort according to Manber's "Introduction to Algorithms". |
void | randomize(Randomize random) | Shuffles the instances in the set so that they are ordered randomly. |
void | randomizeAttribute(int attIdx, Randomize random, int rounds) | Shuffles the values of a given attribute in all instances. |
java.lang.String | relationName() | Returns the relation's name. |
void | renameAttribute(AttributeWeka att, java.lang.String name) | Renames an attribute. |
void | renameAttribute(int att, java.lang.String name) | Renames an attribute. |
void | renameAttributeValue(AttributeWeka att, java.lang.String val, java.lang.String name) | Renames a value of a nominal (or string) attribute. |
void | renameAttributeValue(int att, int val, java.lang.String name) | Renames a value of a nominal (or string) attribute. |
Instances | resample(Randomize random) | Creates a new dataset of the same size using random sampling with replacement. |
Instances | resampleWithWeights(Randomize random) | Creates a new dataset of the same size using random sampling with replacement according to the current instance weights. |
Instances | resampleWithWeights(Randomize random, double[] weights) | Creates a new dataset of the same size using random sampling with replacement according to the given weight vector. |
protected int | select(int attIndex, int left, int right, int k) | Implements computation of the kth-smallest element according to Manber's "Introduction to Algorithms". |
void | setClass(AttributeWeka att) | Sets the class attribute. |
void | setClassIndex(int classIndex) | Sets the class index of the set. |
void | setRelationName(java.lang.String newName) | Sets the relation's name. |
void | sort(AttributeWeka att) | Sorts the instances based on an attribute. |
void | sort(int attIndex) | Sorts the instances based on an attribute. |
void | stratify(int numFolds) | Stratifies a set of instances according to its class values if the class attribute is nominal (so that afterwards a stratified cross-validation can be performed). |
protected void | stratStep(int numFolds) | Help function needed for stratification of set. |
Instances | stringFreeStructure() | Creates a copy of the structure. If the data has string or relational attributes, "cleanses" the string types (i.e. the copy doesn't contain references to the strings seen in the past) and all relational attributes. |
protected java.lang.String | stringWithoutHeader() | Returns the instances in the dataset as a string in ARFF format. |
double | sumOfWeights() | Computes the sum of all the instances' weights. |
void | swap(int i, int j) | Swaps two instances in the set. |
Instances | testCV(int numFolds, int numFold) | Creates the test set for one fold of a cross-validation on the dataset. |
java.lang.String | toString() | Returns the dataset as a string in ARFF format. |
Instances | trainCV(int numFolds, int numFold) | Creates the training set for one fold of a cross-validation on the dataset. |
Instances | trainCV(int numFolds, int numFold, Randomize random) | Creates the training set for one fold of a cross-validation on the dataset. |
void | undoRandomizeAttribute() | Does an undo of a previous call to randomizeAttribute, so that the original values of the attribute are restored. |
double | variance(AttributeWeka att) | Computes the variance for a numeric attribute. |
double | variance(int attIndex) | Computes the variance for a numeric attribute. |
public static final java.lang.String FILE_EXTENSION
public static final java.lang.String SERIALIZED_OBJ_FILE_EXTENSION
public static final java.lang.String ARFF_RELATION
public static final java.lang.String ARFF_DATA
protected java.lang.String m_RelationName
protected FastVector m_Attributes
protected FastVector m_Instances
protected int m_ClassIndex
protected int m_Lines
public Instances(Instances dataset)
dataset - the set to be copied
public Instances(Instances dataset, int capacity)
dataset - the instances from which the header information is to be taken
capacity - the capacity of the new dataset
public Instances(Instances source, int first, int toCopy)
source - the set of instances from which a subset is to be created
first - the index of the first instance to be copied
toCopy - the number of instances to be copied
Throws: java.lang.IllegalArgumentException - if first and toCopy are out of range
public Instances(java.lang.String name, FastVector attInfo, int capacity)
name - the name of the relation
attInfo - the attribute information
capacity - the capacity of the set
protected void initialize(Instances dataset, int capacity)
dataset - the dataset to use as template
capacity - the number of rows to reserve
public Instances stringFreeStructure()
public void add(Instance instance)
instance - the instance to be added
public AttributeWeka attribute(int index)
index - the attribute's index (index starts with 0)
public AttributeWeka attribute(java.lang.String name)
name - the attribute's name
public boolean checkForAttributeType(int attType)
attType - the attribute type to look for
public boolean checkForStringAttributes()
public boolean checkInstance(Instance instance)
instance - the instance to check
public AttributeWeka classAttribute()
Throws: UnassignedClassException - if the class is not set
public int classIndex()
public void compactify()
public void delete()
public void delete(int index)
index - the instance's position (index starts with 0)
public void deleteAttributeAt(int position)
position - the attribute's position (position starts with 0)
Throws: java.lang.IllegalArgumentException - if the given index is out of range or the class attribute is being deleted
public void deleteAttributeType(int attType)
attType - the attribute type to delete
Throws: java.lang.IllegalArgumentException - if attribute couldn't be successfully deleted (probably because it is the class attribute).
public void deleteStringAttributes()
Throws: java.lang.IllegalArgumentException - if string attribute couldn't be successfully deleted (probably because it is the class attribute).
See also: deleteAttributeType(int)
public void deleteWithMissing(int attIndex)
attIndex - the attribute's index (index starts with 0)
public void deleteWithMissing(AttributeWeka att)
att - the attribute
public void deleteWithMissingClass()
Throws: UnassignedClassException - if class is not set
public java.util.Enumeration enumerateAttributes()
public java.util.Enumeration enumerateInstances()
public boolean equalHeaders(Instances dataset)
dataset - another dataset
public Instance firstInstance()
public void insertAttributeAt(AttributeWeka att, int position)
att - the attribute to be inserted
position - the attribute's position (position starts with 0)
Throws: java.lang.IllegalArgumentException - if the given index is out of range
public Instance instance(int index)
index - the instance's index (index starts with 0)
public double kthSmallestValue(AttributeWeka att, int k)
att - the AttributeWeka object
k - the value of k
public double kthSmallestValue(int attIndex, int k)
attIndex - the attribute's index
k - the value of k
public Instance lastInstance()
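To illustrate what kthSmallestValue computes, the sketch below finds the k-th smallest entry of a plain double[] with quickselect, the partition-and-narrow idea that the protected select and partition helpers are described as implementing. It is a stand-alone sketch, not this class's code: the class name KthSmallestSketch, the Lomuto partition scheme, and the convention that k = 1 means the minimum are assumptions made for the example (this page does not state whether k counts from 0 or 1), and missing values are not handled.

```java
final class KthSmallestSketch {

    /** Returns the k-th smallest value, where k = 1 is the minimum (an assumption of this sketch). */
    static double kthSmallest(double[] values, int k) {
        if (k < 1 || k > values.length) {
            throw new IllegalArgumentException("k must be between 1 and " + values.length);
        }
        double[] a = values.clone();   // work on a copy so the caller's order is untouched
        int left = 0;
        int right = a.length - 1;
        int target = k - 1;            // zero-based position of the wanted element
        while (left < right) {
            int p = partition(a, left, right);
            if (p == target) {
                return a[p];
            } else if (p < target) {
                left = p + 1;          // wanted element lies right of the pivot
            } else {
                right = p - 1;         // wanted element lies left of the pivot
            }
        }
        return a[left];
    }

    /** Lomuto partition around a[right]; returns the pivot's final position. */
    static int partition(double[] a, int left, int right) {
        double pivot = a[right];
        int store = left;
        for (int i = left; i < right; i++) {
            if (a[i] < pivot) {
                swap(a, i, store);
                store++;
            }
        }
        swap(a, store, right);
        return store;
    }

    private static void swap(double[] a, int i, int j) {
        double tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

Unlike a full sort, this only narrows in on the side of the pivot that contains position k, which is why a selection helper can be cheaper than sorting first.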
public double meanOrMode(int attIndex)
attIndex - the attribute's index (index starts with 0)
public double meanOrMode(AttributeWeka att)
att - the attribute
public int numAttributes()
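The meanOrMode methods return a double in both cases, which for a nominal attribute presumably means the index of the most frequent value rather than its label. The sketch below shows both computations on plain arrays. Everything in it is an assumption for illustration: the class name MeanOrModeSketch, the use of NaN (numeric) and -1 (nominal) to mark missing values, returning 0 when no value is present, and ignoring instance weights, which the real class may well take into account.

```java
final class MeanOrModeSketch {

    /** Mean of the non-missing values; missing values are encoded as NaN in this sketch. */
    static double mean(double[] values) {
        double sum = 0.0;
        int count = 0;
        for (double v : values) {
            if (!Double.isNaN(v)) {
                sum += v;
                count++;
            }
        }
        return count == 0 ? 0.0 : sum / count;   // 0 for "no data" is a choice of this sketch
    }

    /**
     * Mode of a nominal attribute, returned as the index of the most frequent value.
     * valueIndices holds one value index per instance, -1 meaning "missing".
     */
    static double mode(int[] valueIndices, int numValues) {
        int[] counts = new int[numValues];
        for (int idx : valueIndices) {
            if (idx >= 0) {
                counts[idx]++;
            }
        }
        int best = 0;
        for (int i = 1; i < numValues; i++) {
            if (counts[i] > counts[best]) {   // ties keep the lowest index
                best = i;
            }
        }
        return best;
    }
}
```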
public int numClasses()
Throws: UnassignedClassException - if the class is not set
public int numDistinctValues(int attIndex)
attIndex - the attribute (index starts with 0)
public int numDistinctValues(AttributeWeka att)
att - the attribute
public int numInstances()
public void randomize(Randomize random)
random - a random number generator
public void undoRandomizeAttribute() throws java.lang.Exception
Throws: java.lang.Exception - if there was no call to randomizeAttribute or if attributes were added or removed since the last call to randomizeAttribute
See also: randomizeAttribute
public void randomizeAttribute(int attIdx, Randomize random, int rounds)
randomizeAttribute and undoRandomizeAttribute.
attIdx - the index of the attribute to shuffle
random - a random number generator
rounds - the number of shuffling rounds; must be at least 1. More rounds make the attribute's value distribution more random (e.g. choose 3), but the time needed for shuffling is proportional to the number of rounds.
See also: undoRandomizeAttribute
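The pairing of randomizeAttribute and undoRandomizeAttribute can be pictured as shuffling one column of a table while keeping a backup of that column. The following is a minimal sketch under several assumptions: rows are plain double[] arrays, java.util.Random stands in for the Randomize type used by this class, each round is a Fisher-Yates pass, and the class name ColumnShuffleSketch and its method names are hypothetical. The real method declares java.lang.Exception for a failed undo; the sketch uses IllegalStateException instead.

```java
import java.util.Random;

final class ColumnShuffleSketch {
    private final double[][] rows;
    private double[] savedColumn;   // backup of the shuffled column, null if nothing to undo
    private int savedIndex = -1;

    ColumnShuffleSketch(double[][] rows) {
        this.rows = rows;
    }

    /** Shuffles the values of one column across all rows, leaving other columns alone. */
    void shuffleColumn(int col, Random random, int rounds) {
        if (rounds < 1) {
            throw new IllegalArgumentException("rounds must be at least 1");
        }
        // Remember the original values so the shuffle can be undone later.
        savedColumn = new double[rows.length];
        for (int r = 0; r < rows.length; r++) {
            savedColumn[r] = rows[r][col];
        }
        savedIndex = col;
        for (int round = 0; round < rounds; round++) {
            // One Fisher-Yates pass over the column.
            for (int i = rows.length - 1; i > 0; i--) {
                int j = random.nextInt(i + 1);
                double tmp = rows[i][col];
                rows[i][col] = rows[j][col];
                rows[j][col] = tmp;
            }
        }
    }

    /** Restores the column saved by the last shuffle. */
    void undoShuffle() {
        if (savedColumn == null) {
            throw new IllegalStateException("no previous shuffle to undo");
        }
        for (int r = 0; r < rows.length; r++) {
            rows[r][savedIndex] = savedColumn[r];
        }
        savedColumn = null;
    }
}
```

Shuffling a single attribute this way breaks its relationship with the other attributes while keeping its value distribution intact, and the backup makes the operation reversible.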
public java.lang.String relationName()
public void renameAttribute(int att, java.lang.String name)
att - the attribute's index (index starts with 0)
name - the new name
public void renameAttribute(AttributeWeka att, java.lang.String name)
att - the attribute
name - the new name
public void renameAttributeValue(int att, int val, java.lang.String name)
att - the attribute's index (index starts with 0)
val - the value's index (index starts with 0)
name - the new name
public void renameAttributeValue(AttributeWeka att, java.lang.String val, java.lang.String name)
att - the attribute
val - the value
name - the new name
public Instances resample(Randomize random)
random - a random number generator
public Instances resampleWithWeights(Randomize random)
random - a random number generator
public Instances resampleWithWeights(Randomize random, double[] weights)
random - a random number generator
weights - the weight vector
Throws: java.lang.IllegalArgumentException - if the weights array is of the wrong length or contains negative weights.
public void setClass(AttributeWeka att)
att - attribute to be the class
public void setClassIndex(int classIndex)
classIndex - the new class index (index starts with 0)
Throws: java.lang.IllegalArgumentException - if the class index is too big or < 0
public void setRelationName(java.lang.String newName)
newName - the new relation name.
public void sort(int attIndex)
attIndex - the attribute's index (index starts with 0)
public void sort(AttributeWeka att)
att - the attribute
public void stratify(int numFolds)
numFolds - the number of folds in the cross-validation
Throws: UnassignedClassException - if the class is not set
public double sumOfWeights()
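For resampleWithWeights, "sampling with replacement according to a weight vector" means each draw picks instance i with probability proportional to weights[i], and the same instance may be picked several times. One simple way to do this is a cumulative-sum table with a binary search per draw, sketched below. This is not necessarily the algorithm this class uses; the class name WeightedResampleSketch, the use of java.util.Random in place of Randomize, returning indices instead of a new dataset, and rejecting an all-zero weight vector are assumptions of the example.

```java
import java.util.Random;

final class WeightedResampleSketch {

    /** Draws weights.length indices with replacement, index i with probability weights[i] / total. */
    static int[] resampleIndices(double[] weights, Random random) {
        int n = weights.length;
        double[] cumulative = new double[n];
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            if (weights[i] < 0) {
                throw new IllegalArgumentException("negative weight at index " + i);
            }
            total += weights[i];
            cumulative[i] = total;   // running sum up to and including i
        }
        if (n > 0 && total <= 0) {
            throw new IllegalArgumentException("at least one weight must be positive");
        }
        int[] picked = new int[n];
        for (int s = 0; s < n; s++) {
            double u = random.nextDouble() * total;
            // Binary search for the first index whose running sum exceeds u.
            int lo = 0;
            int hi = n - 1;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (cumulative[mid] > u) {
                    hi = mid;
                } else {
                    lo = mid + 1;
                }
            }
            picked[s] = lo;
        }
        return picked;
    }
}
```

An instance with weight 0 is never drawn, because its running sum equals that of its predecessor and the search always stops at the earlier index.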
public Instances testCV(int numFolds, int numFold)
numFolds - the number of folds in the cross-validation. Must be greater than 1.
numFold - 0 for the first fold, 1 for the second, ...
Throws: java.lang.IllegalArgumentException - if the number of folds is less than 2 or greater than the number of instances.
public java.lang.String toString()
Overrides: toString in class java.lang.Object
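The fold parameters of testCV and trainCV can be made concrete with a little index arithmetic. The sketch below assumes that each test fold is a contiguous block of instances and that any remainder is spread over the first folds, which matches how upstream Weka behaves as far as I know but is not stated on this page. The class name CvFoldSketch and the {first, size} return shape are hypothetical; the training set for a fold would then be every instance outside that block.

```java
final class CvFoldSketch {

    /** Returns {firstIndex, size} of the test block for the given zero-based fold. */
    static int[] testRange(int numInstances, int numFolds, int numFold) {
        if (numFolds < 2) {
            throw new IllegalArgumentException("Number of folds must be at least 2");
        }
        if (numFolds > numInstances) {
            throw new IllegalArgumentException("More folds than instances");
        }
        if (numFold < 0 || numFold >= numFolds) {
            throw new IllegalArgumentException("Fold index out of range");
        }
        int base = numInstances / numFolds;    // minimum size of every fold
        int extra = numInstances % numFolds;   // the first `extra` folds get one more instance
        int size = base + (numFold < extra ? 1 : 0);
        int first = numFold * base + Math.min(numFold, extra);
        return new int[] {first, size};
    }
}
```

With 10 instances and 3 folds this gives test blocks of sizes 4, 3 and 3 starting at indices 0, 4 and 7. Because the blocks are contiguous, a dataset is typically shuffled or stratified before the folds are taken.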
protected java.lang.String stringWithoutHeader()
public Instances trainCV(int numFolds, int numFold)
numFolds - the number of folds in the cross-validation. Must be greater than 1.
numFold - 0 for the first fold, 1 for the second, ...
Throws: java.lang.IllegalArgumentException - if the number of folds is less than 2 or greater than the number of instances.
public Instances trainCV(int numFolds, int numFold, Randomize random)
numFolds - the number of folds in the cross-validation. Must be greater than 1.
numFold - 0 for the first fold, 1 for the second, ...
random - the random number generator
Throws: java.lang.IllegalArgumentException - if the number of folds is less than 2 or greater than the number of instances.
public double variance(int attIndex)
attIndex - the numeric attribute (index starts with 0)
Throws: java.lang.IllegalArgumentException - if the attribute is not numeric
public double variance(AttributeWeka att)
att - the numeric attribute
Throws: java.lang.IllegalArgumentException - if the attribute is not numeric
public double[] attributeToDoubleArray(int index)
index - the index of the attribute.
protected void copyInstances(int from, Instances dest, int num)
from - the position of the first instance to be copied
dest - the destination for the instances
num - the number of instances to be copied
protected void freshAttributeInfo()
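Since attributeToDoubleArray exposes one attribute as a plain double[], a statistic such as the one returned by variance can be reproduced on that array. The sketch below computes the unweighted sample variance with an n - 1 denominator in two passes. Whether this class divides by n or n - 1, how it treats instance weights and missing values, and what it returns for fewer than two values are not stated on this page, so those choices, along with the class name VarianceSketch, are assumptions.

```java
final class VarianceSketch {

    /** Unweighted sample variance (n - 1 denominator); returns 0 for fewer than two values. */
    static double sampleVariance(double[] values) {
        int n = values.length;
        if (n < 2) {
            return 0.0;
        }
        // First pass: mean.
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        double mean = sum / n;
        // Second pass: sum of squared deviations from the mean.
        double squares = 0.0;
        for (double v : values) {
            double d = v - mean;
            squares += d * d;
        }
        return squares / (n - 1);
    }
}
```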
protected java.lang.String instancesAndWeights()
protected int partition(int attIndex, int l, int r)
attIndex - the attribute's index (index starts with 0)
l - the first index of the subset (index starts with 0)
r - the last index of the subset (index starts with 0)
protected void quickSort(int attIndex, int left, int right)
attIndex - the attribute's index (index starts with 0)
left - the first index of the subset to be sorted (index starts with 0)
right - the last index of the subset to be sorted (index starts with 0)
protected int select(int attIndex, int left, int right, int k)
attIndex - the attribute's index (index starts with 0)
left - the first index of the subset (index starts with 0)
right - the last index of the subset (index starts with 0)
k - the value of k
protected void stratStep(int numFolds)
numFolds - the number of folds for the stratification
public void swap(int i, int j)
i - the first instance's index (index starts with 0)
j - the second instance's index (index starts with 0)
public static Instances mergeInstances(Instances first, Instances second)
first - the first set of Instances
second - the second set of Instances
Throws: java.lang.IllegalArgumentException - if the datasets are not the same size
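The same-size requirement of mergeInstances suggests a side-by-side merge, where row i of the result carries the attribute values of row i from both inputs, rather than an append of one set after the other. The sketch below illustrates that reading on plain double[][] tables. It is an assumption about the semantics, not a statement of how this class implements the merge, and the class name MergeSketch and method mergeColumns are hypothetical.

```java
final class MergeSketch {

    /** Joins two tables row by row: the columns of first, followed by the columns of second. */
    static double[][] mergeColumns(double[][] first, double[][] second) {
        if (first.length != second.length) {
            throw new IllegalArgumentException("Instance sets must be of the same size");
        }
        double[][] merged = new double[first.length][];
        for (int r = 0; r < first.length; r++) {
            double[] row = new double[first[r].length + second[r].length];
            System.arraycopy(first[r], 0, row, 0, first[r].length);
            System.arraycopy(second[r], 0, row, first[r].length, second[r].length);
            merged[r] = row;
        }
        return merged;
    }
}
```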