public class myDataset
extends java.lang.Object
Title: Dataset
Description: It contains the methods to read a Classification/Regression Dataset
Company: KEEL
Modifier and Type | Field and Description |
---|---|
static int | INTEGER: Number that represents the integer variable type. |
static int | NOMINAL: Number that represents the nominal variable type. |
static int | REAL: Number that represents the real (double) variable type. |
Constructor and Description |
---|
myDataset(): Initializes a new set of instances. |
myDataset(myDataset copia): Generates a new binary dataset by copying it from the dataset given. |
myDataset(myDataset copia, double[] distribution): Generates a new binary dataset by copying it from the dataset given. |
myDataset(myDataset copia, int majC, double[][] Xmaj, int minC, double[][] Xmin): Generates a new binary dataset. |
myDataset(myDataset copia, int clase_1, int clase_2): Generates a new binary dataset. |
myDataset(myDataset copia, int clase_1, int clase_2, int[] empate): Generates a new binary dataset by copying the instances marked with the value 1 in the array given. |
myDataset(myDataset copia, java.lang.String bagType): Generates a new binary dataset by copying it from the dataset given and preprocessing it. |
Modifier and Type | Method and Description |
---|---|
java.util.List<java.lang.Integer> | asList(int[] is): Returns a List with the elements of the array given. |
double | average(int position): Returns the average of a specific attribute. |
java.lang.String | claseMasFrecuente(): Returns the most frequent class in the dataset. |
int | claseNumerica(java.lang.String valorNominal): Returns the numeric representation of the class nominal value given as argument. |
void | computeInstancesPerClass(): Computes the number of examples per class. |
void | computeIR(): Computes the Imbalanced Rate for each class. |
void | computeStatisticsPerClass(): Computes the average and standard deviation of the input attributes. |
java.lang.String | copyHeader(): Copies the header of the dataset. |
double[][] | devuelveRangos(): Returns the minimum and maximum values of every attribute as a matrix. |
void | discretize(int intervalos): Uniform width discretization. |
double[][] | getAveragePerClass(): Returns the average values per class. |
double[] | getemax(): Returns an array with the maximum values of the attributes. |
double[] | getemin(): Returns an array with the minimum values of the attributes. |
double[] | getExample(int pos): Outputs a specific example. |
double | getIR(int clase): Returns the Imbalanced Rate for the class given. |
InstanceSet | getIS(): Returns the instance set stored. |
double | getMax(int variable): Returns the maximum value of the given attribute. |
double | getMin(int variable): Returns the minimum value of the given attribute. |
int | getnClasses(): Gets the number of output attributes of the data-set (for example, the number of classes in classification). |
int | getnData(): Gets the size of the data-set. |
int | getnInputs(): Gets the number of input attributes of the data-set. |
int | getnVars(): Gets the number of variables of the data-set (including the output). |
int[] | getOutputAsInteger(): Returns the output of the data-set as integer values. |
int | getOutputAsInteger(int pos): Returns the output value of the example "pos". |
double[] | getOutputAsReal(): Returns the output of the data-set as real values. |
double | getOutputAsReal(int pos): Returns the output value of the example "pos". |
java.lang.String[] | getOutputAsString(): Returns the output of the data-set as nominal values. |
java.lang.String | getOutputAsString(int pos): Returns the output value of the example "pos". |
java.lang.String | getOutputValue(int intValue): Returns the output value (string) that matches a given integer value. |
double[][] | getStdPerClass(): Returns the standard deviation per class. |
int | getTipo(int variable): Returns the type of an attribute. |
double[][] | getX(): Outputs an array of examples with their corresponding attribute values. |
boolean | hasMissingAttributes(): Checks whether the data-set has any missing value. |
boolean | hasNumericalAttributes(): Checks whether the data-set has any numerical value. |
boolean | hasRealAttributes(): Checks whether the data-set has any real value. |
boolean[] | importanceSampling(myDataset copia, int size, boolean[] oob, double oobErr): Importance undersampling of the dataset given. |
boolean | isMissing(int i, int j): Checks whether the attribute value is missing. |
java.lang.String | nombreClase(int clase): Returns the nominal value for a class represented by the integer given. |
java.lang.String[] | nombres(): Returns the names of all input variables. |
java.lang.String | nombreVar(int pos): Returns the attribute name of a given variable. |
void | normalize(): Transforms the input space into the [0,1] range. |
int | numberInstances(int clas): Returns the number of instances in the data set for a given class. |
int | numberValues(int attribute): Gets the number of different feasible values for a given attribute. |
int | numEjemplos(int clase): Returns the number of instances in the data set for a given class. |
java.lang.String | printDataSet(): Returns a string representation of the dataset. |
int[] | randomSampling(myDataset copia, int majC, int minC, int a): Random undersampling of the dataset given. |
int[] | randomSampling(myDataset copia, int majC, int minC, int nMaj, int nMin): Random undersampling of the dataset given. |
int[] | randomUnderSampling(myDataset copia, int majC, int N): Randomly undersamples the majority class of the original dataset given; N controls how much of the majority class is kept in the new data set. |
void | readClassificationSet(java.lang.String datasetFile, boolean train): Reads the whole input data-set and stores each example and its associated output value in local arrays to ease their use. |
void | readInstanceSet(InstanceSet IS): Reads the whole input data-set and stores each example and its associated output value in local arrays to ease their use. |
int | size(): Returns the size of the data-set. |
int | sizeWithoutMissing(): Returns the size of the data-set without taking the missing values into account. |
double | stdDev(int position): Returns the standard deviation of a specific attribute. |
int | totalNominales(int atributo): Returns the number of nominal values for a given variable. |
boolean | vacio(): Checks if there is a class without instances. |
static java.lang.String | valorNominal(int atributo, double valorReal): Returns the nominal representation of an attribute's real value given as argument. |
static double | valorReal(int atributo, java.lang.String valorNominal): Returns the real representation of an attribute's nominal value given as argument. |
public static final int REAL
public static final int INTEGER
public static final int NOMINAL
public myDataset()
public myDataset(myDataset copia, double[] distribution)
copia - the original training dataset
distribution - dataset distribution.
public myDataset(myDataset copia)
copia - the original training dataset
public myDataset(myDataset copia, int clase_1, int clase_2)
copia - the original training dataset
clase_1 - first class
clase_2 - second class
public myDataset(myDataset copia, java.lang.String bagType)
copia - the original training dataset
bagType - type of preprocessing algorithm (OVERBAGGING, UNDERBAGGING)
public myDataset(myDataset copia, int clase_1, int clase_2, int[] empate)
copia - the original training dataset
clase_1 - first class
clase_2 - second class
empate - the instances with the value 1 in this array will be copied.
public myDataset(myDataset copia, int majC, double[][] Xmaj, int minC, double[][] Xmin)
copia - the original training dataset
majC - majority class.
Xmaj - instances belonging to majority class.
minC - minority class.
Xmin - instances belonging to minority class.
public int[] randomUnderSampling(myDataset copia, int majC, int N)
copia - original training dataset.
majC - majority class.
N - number of instances to be selected
public int[] randomSampling(myDataset copia, int majC, int minC, int a)
copia - original dataset.
majC - majority class.
minC - minority class.
a - % of majority class to be selected.
public int[] randomSampling(myDataset copia, int majC, int minC, int nMaj, int nMin)
copia - original dataset.
majC - majority class.
minC - minority class.
nMaj - number of majority class instances to be selected.
nMin - number of minority class instances to be selected.
public boolean[] importanceSampling(myDataset copia, int size, boolean[] oob, double oobErr)
copia - original dataset.
size - size of the sampling.
oob - oob to be considered.
oobErr - oob error.
public double[][] getX()
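This page does not describe how randomUnderSampling chooses its examples. As a rough illustration of random undersampling in general, the sketch below keeps every example outside the majority class and draws n majority examples without replacement. The class UnderSamplingSketch, its method, the seed parameter, and the choice to return the indices of the kept examples are all assumptions made for illustration, not KEEL's implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of KEEL.
final class UnderSamplingSketch {

    /**
     * Returns the sorted indices of the examples that are kept: every example
     * whose label differs from majC, plus at most n examples of class majC
     * drawn at random without replacement.
     */
    static int[] randomUnderSample(int[] labels, int majC, int n, long seed) {
        List<Integer> majority = new ArrayList<>();
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < labels.length; i++) {
            if (labels[i] == majC) {
                majority.add(i);
            } else {
                kept.add(i);
            }
        }
        // Shuffle the majority indices and keep the first "take" of them.
        Collections.shuffle(majority, new Random(seed));
        int take = Math.max(0, Math.min(n, majority.size()));
        kept.addAll(majority.subList(0, take));
        Collections.sort(kept);

        int[] result = new int[kept.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = kept.get(i);
        }
        return result;
    }
}
```

With five majority examples, two minority examples, and n = 2, the result holds four indices, two of them from the majority class. A fixed seed makes the draw repeatable.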
public double[] getExample(int pos)
pos - int position (id) of the example in the data-set
public int[] getOutputAsInteger()
public double[] getOutputAsReal()
public java.lang.String[] getOutputAsString()
public java.lang.String getOutputAsString(int pos)
pos - int the position (id) of the example
public int getOutputAsInteger(int pos)
pos - int the position (id) of the example
public double getOutputAsReal(int pos)
pos - int the position (id) of the example
public double[] getemax()
public double[] getemin()
public double getMax(int variable)
variable - the index of the attribute
public double getMin(int variable)
variable - the index of the attribute
public int getnData()
public int getnVars()
public int getnInputs()
public int getnClasses()
public boolean isMissing(int i, int j)
i - int Example id
j - int Variable id
public void readClassificationSet(java.lang.String datasetFile, boolean train) throws java.io.IOException
datasetFile - String name of the file containing the dataset
train - boolean It must have the value "true" if we are reading the training data-set
java.io.IOException - If any problem occurs while reading the data-set
public void readInstanceSet(InstanceSet IS) throws java.io.IOException
IS - Instance set given.
java.io.IOException - If any problem occurs while reading the data-set
public java.lang.String copyHeader()
public void normalize()
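The summary says normalize() transforms the input space into the [0,1] range but gives no formula. A common way to do this is min-max scaling, (x - min) / (max - min) per attribute, and the sketch below assumes that. The class MinMaxSketch, the choice to return a new matrix, and the handling of constant attributes (mapped to 0) are assumptions for illustration, not KEEL's implementation.

```java
import java.util.Arrays;

// Hypothetical helper, not part of KEEL.
final class MinMaxSketch {

    /** Returns a copy of x with every column rescaled to [0,1]. */
    static double[][] normalize(double[][] x) {
        if (x.length == 0) {
            return new double[0][];
        }
        int nVars = x[0].length;
        double[] min = new double[nVars];
        double[] max = new double[nVars];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);

        // First pass: per-attribute minimum and maximum.
        for (double[] row : x) {
            for (int j = 0; j < nVars; j++) {
                min[j] = Math.min(min[j], row[j]);
                max[j] = Math.max(max[j], row[j]);
            }
        }

        // Second pass: rescale; a constant attribute has zero range and maps to 0.
        double[][] out = new double[x.length][nVars];
        for (int i = 0; i < x.length; i++) {
            for (int j = 0; j < nVars; j++) {
                double range = max[j] - min[j];
                out[i][j] = range == 0.0 ? 0.0 : (x[i][j] - min[j]) / range;
            }
        }
        return out;
    }
}
```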
public void computeStatisticsPerClass()
public double[][] getAveragePerClass()
public double[][] getStdPerClass()
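The per-class statistics returned by getAveragePerClass() and getStdPerClass() can be pictured as two matrices indexed by class and attribute. The sketch below computes them in two passes. The class ClassStatsSketch, the [class][attribute] layout, and the use of the population standard deviation (dividing by the class count) are assumptions, since this page does not state which convention the library follows.

```java
// Hypothetical helper, not part of KEEL.
final class ClassStatsSketch {

    /**
     * Returns {mean, std}, each indexed as [class][attribute].
     * x holds one example per row and y holds the class index of each example.
     */
    static double[][][] statsPerClass(double[][] x, int[] y, int nClasses) {
        int nVars = x.length == 0 ? 0 : x[0].length;
        double[][] mean = new double[nClasses][nVars];
        double[][] std = new double[nClasses][nVars];
        int[] count = new int[nClasses];

        // First pass: sums per class, then divide by the class count.
        for (int i = 0; i < x.length; i++) {
            count[y[i]]++;
            for (int j = 0; j < nVars; j++) {
                mean[y[i]][j] += x[i][j];
            }
        }
        for (int c = 0; c < nClasses; c++) {
            for (int j = 0; j < nVars && count[c] > 0; j++) {
                mean[c][j] /= count[c];
            }
        }

        // Second pass: squared deviations from the class mean.
        for (int i = 0; i < x.length; i++) {
            for (int j = 0; j < nVars; j++) {
                double d = x[i][j] - mean[y[i]][j];
                std[y[i]][j] += d * d;
            }
        }
        for (int c = 0; c < nClasses; c++) {
            for (int j = 0; j < nVars && count[c] > 0; j++) {
                std[c][j] = Math.sqrt(std[c][j] / count[c]);
            }
        }
        return new double[][][] {mean, std};
    }
}
```

Classes without examples keep a mean and standard deviation of 0 in this sketch.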
public boolean hasRealAttributes()
public boolean hasNumericalAttributes()
public boolean hasMissingAttributes()
public int sizeWithoutMissing()
public int size()
public double stdDev(int position)
position - int attribute id (position of the attribute)
public double average(int position)
position - int attribute id (position of the attribute)
public void computeInstancesPerClass()
public int numberInstances(int clas)
clas - int Given class.
public void computeIR()
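This page does not define the Imbalanced Rate that computeIR() stores and getIR(int clase) returns. One common definition divides the size of the largest class by the size of the class in question, so the largest class gets 1 and rarer classes get larger values. The sketch below assumes that definition; the class ImbalanceSketch and the use of infinity for empty classes are illustrative choices, not KEEL's implementation.

```java
// Hypothetical helper, not part of KEEL.
final class ImbalanceSketch {

    /**
     * Returns, for each class c, (size of the largest class) / (size of c).
     * A class with no examples gets Double.POSITIVE_INFINITY.
     */
    static double[] imbalanceRatios(int[] y, int nClasses) {
        // Count the examples of each class.
        int[] count = new int[nClasses];
        for (int label : y) {
            count[label]++;
        }
        int largest = 0;
        for (int c : count) {
            largest = Math.max(largest, c);
        }

        double[] ir = new double[nClasses];
        for (int c = 0; c < nClasses; c++) {
            ir[c] = count[c] == 0
                    ? Double.POSITIVE_INFINITY
                    : (double) largest / count[c];
        }
        return ir;
    }
}
```

For class counts of 6, 2, and 1, this definition yields ratios of 1, 3, and 6.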
public double getIR(int clase)
clase - class id.
public int numberValues(int attribute)
attribute - int Given attribute
public java.lang.String getOutputValue(int intValue)
intValue - int Given value
public int getTipo(int variable)
variable - Given attribute
public double[][] devuelveRangos()
public java.lang.String nombreVar(int pos)
pos - variable id.
public java.lang.String nombreClase(int clase)
clase - integer representation of the class.
public void discretize(int intervalos)
intervalos - int Number of intervals
public java.lang.String[] nombres()
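discretize(int intervalos) is described above only as uniform width discretization. In that technique the range [min, max] of an attribute is cut into equally wide intervals and each value is replaced by the index of the interval it falls in. The sketch below shows the idea for one attribute; the class DiscretizeSketch, the explicit min and max parameters, and the clamping of the maximum into the last interval are assumptions for illustration, not KEEL's implementation.

```java
// Hypothetical helper, not part of KEEL.
final class DiscretizeSketch {

    /**
     * Maps each value to an interval index in [0, intervals - 1], using
     * intervals of equal width over [min, max].
     */
    static int[] discretize(double[] values, double min, double max, int intervals) {
        if (intervals <= 0) {
            throw new IllegalArgumentException("intervals must be positive");
        }
        double width = (max - min) / intervals;
        int[] bins = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            if (width <= 0.0) {
                bins[i] = 0; // degenerate range: everything falls in one interval
                continue;
            }
            int b = (int) Math.floor((values[i] - min) / width);
            // Clamp so that max itself (and out-of-range values) stay in range.
            bins[i] = Math.max(0, Math.min(intervals - 1, b));
        }
        return bins;
    }
}
```

With min = 0, max = 10, and 4 intervals, each interval is 2.5 wide, so 2.4 falls in interval 0 and 5.0 in interval 2.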
public static double valorReal(int atributo, java.lang.String valorNominal)
atributo - Attribute given.
valorNominal - Nominal value of the attribute given.
public int claseNumerica(java.lang.String valorNominal)
valorNominal - class nominal value.
public static java.lang.String valorNominal(int atributo, double valorReal)
atributo - Attribute given.
valorReal - Real value of the attribute given.
public int totalNominales(int atributo)
atributo - variable id
public java.lang.String claseMasFrecuente()
public int numEjemplos(int clase)
clase - int Given class.
public boolean vacio()
public java.lang.String printDataSet()
public java.util.List<java.lang.Integer> asList(int[] is)
is - array given.
public InstanceSet getIS()