public class Dataset
extends java.lang.Object
Methods for reading the training and test files
| Constructor | Description |
|---|---|
| Dataset() | Constructor. |
| Modifier and Type | Method | Description |
|---|---|---|
| void | calculateMostCommon() | Calculates the most common value of each attribute (column) |
| void | computeSVDM() | Computes a matrix that stores, for each nominal attribute, the distances between all of its possible values |
| java.lang.String | copyTestHeader() | Returns the header of the file |
| int[][][] | creaCount() | Creates a 3D array from the training set that stores, for each class C, attribute A and value V, the number of examples of class C that have value V for attribute A: COUNT[C,V,A] |
| double[] | createBall(int attr, double test, double train) | Creates a vector with the values of the attributes that d(tst[atr],trn[atr]) |
| double | distance(Complex R, double[] E, int s, int q, double minDist) | Calculates the distance between a rule and an example (instance) |
| double | distance(int E, double[] E_test, double minDist) | Calculates the distance between two examples |
| boolean | existContinousAttributes() | Checks whether any input attribute is real-valued (continuous) |
| boolean | existInstanceOfClassC(int whichClass) | Checks whether the set contains instances of the given class |
| java.lang.String | findNominalValue(int attribute, double value) | Returns the nominal value of the attribute that corresponds to value |
| int[] | getC() | Returns the output (class) values |
| int | getC(int pos) | Returns the output (class) value of an instance |
| java.lang.String[] | getC2() | |
| double[] | getEMaximum() | Returns an array with the maximum values of the input attributes |
| double[] | getEMinimum() | Returns an array with the minimum values of the input attributes |
| int | getInPuts() | Returns the number of input variables |
| InstanceSet | getInstanceSet() | Returns the instance set |
| double[][] | getListValues() | Gets, for each attribute, the ordered list of its possible values |
| int | getMaxim(double[] num, long seed) | Returns the index of the maximum of an array of doubles. If there is more than one maximum, returns one of them |
| int | getMaximum(int[] num, long seed) | Returns the index of the maximum of an array of integers. If there is more than one maximum, returns one of them |
| int | getMostFrequentClass() | Returns the most frequent class in the set of instances |
| int | getNClasses() | Returns the total number of classes |
| int | getNData() | Returns the number of examples |
| int[] | getNeighbourSet(double[] test, int k) | Calculates the neighbourhood of a test example |
| int[] | getNN(double[] test, int k) | Calculates the k examples of the set nearest to the test example |
| int | getNumNegExamples(Complex R) | Calculates the number of negative examples that match the rule |
| int | getNumPosExamples(Complex R) | Calculates the number of positive examples that match the rule |
| int[] | getNumValues2() | Returns, for each attribute, the number of values in the set |
| int | getNVariables() | Returns the number of variables |
| int[][] | getOptimumClass(int[][][] cuonter, long seed) | Returns an array with the class assigned to each attribute-value pair |
| double | getRealValue(int at, java.lang.String str) | Gets the real value of str for attribute at |
| double[][] | getX() | Returns the values of the input attributes |
| double[] | getX(int pos) | Returns the values of the input attributes for an instance |
| java.lang.String[][] | getX2() | Returns the values of the input attributes as strings |
| double[] | getXNor(int pos) | Returns the normalized values of the input attributes for an instance |
| java.lang.String[] | giveClasses() | Returns the class values |
| java.lang.String[] | giveNames() | Returns the names of the problem variables |
| boolean | isMissing(int i, int j) | Checks whether the value of attribute j in example i is missing |
| int | mostCommon(int i) | Returns the most common value of attribute i |
| int | mostFrequentClass(long seed) | Calculates the most frequent class in the set of values |
| int | nearestSample(Complex R, int defaultClass, long seed, int s, int q) | Gets the nearest example |
| void | normalize() | Converts all the values of the set into the [0,1] interval |
| void | readSet(java.lang.String samples, boolean train) | Reads a file of examples (training or test) |
| void | setNumValues() | Computes, for each attribute, the number of values it takes in the set |
| int[] | variableType() | Returns the type of each input attribute (0 for nominal, 1 for numeric) |
public double[][] getX()
Returns the values of the input attributes
public java.lang.String[][] getX2()
Returns the values of the input attributes as strings
public double[] getX(int pos)
Returns the values of the input attributes for an instance
pos
- The position of the instance
public double[] getXNor(int pos)
Returns the normalized values of the input attributes for an instance
pos
- The position of the instance in the set of values
public InstanceSet getInstanceSet()
Returns the instance set
public java.lang.String findNominalValue(int attribute, double value)
Returns the nominal value of the attribute that corresponds to value
public boolean existInstanceOfClassC(int whichClass) throws java.lang.ArrayIndexOutOfBoundsException
Checks whether the set contains instances of the given class
whichClass
- The class whose instances are looked for
java.lang.ArrayIndexOutOfBoundsException
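The throws clause hints that the check indexes an array by class code. A minimal sketch of that idea, assuming integer class codes 0..nClasses-1 and a per-class tally; ClassPresence and classCounts are hypothetical names, not part of this class:

```java
// Hypothetical sketch: illustrates why an out-of-range class can raise
// ArrayIndexOutOfBoundsException; not the actual implementation.
class ClassPresence {
    // classCounts[c] = number of examples of class c (classes coded 0..nClasses-1).
    static int[] classCounts(int[] classOfExample, int nClasses) {
        int[] counts = new int[nClasses];
        for (int c : classOfExample) {
            counts[c]++; // an invalid class code throws ArrayIndexOutOfBoundsException
        }
        return counts;
    }

    // True if at least one example belongs to whichClass.
    // An out-of-range whichClass throws ArrayIndexOutOfBoundsException.
    static boolean existInstanceOfClassC(int[] classCounts, int whichClass) {
        return classCounts[whichClass] > 0;
    }
}
```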
public double[][] getListValues()
Gets, for each attribute, the ordered list of its possible values
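A minimal sketch of what getListValues describes, assuming the data is available as a double[][] of examples by attributes with no missing values; ListValues is a hypothetical name:

```java
import java.util.TreeSet;

// Sketch: for each attribute (column), the sorted distinct values seen in the data.
class ListValues {
    static double[][] getListValues(double[][] x) {
        int nAttributes = x.length == 0 ? 0 : x[0].length;
        double[][] lists = new double[nAttributes][];
        for (int a = 0; a < nAttributes; a++) {
            TreeSet<Double> values = new TreeSet<>(); // keeps values sorted and unique
            for (double[] example : x) {
                values.add(example[a]);
            }
            lists[a] = new double[values.size()];
            int i = 0;
            for (double v : values) {
                lists[a][i++] = v;
            }
        }
        return lists;
    }
}
```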
public int[][][] creaCount()
Creates a 3D array from the training set that stores, for each class C, attribute A and value V, the number of examples of class C that have value V for attribute A: COUNT[C,V,A]
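The COUNT[C,V,A] table can be sketched as follows, assuming every attribute is nominal and integer-coded as 0..nValues[a]-1 and classes are coded 0..nClasses-1. The value dimension is sized to the largest number of values of any attribute; CountTable and its parameters are hypothetical:

```java
// Sketch under the stated assumptions (integer-coded nominal attributes and classes).
class CountTable {
    // count[c][v][a] = number of examples of class c whose attribute a has value v,
    // following the COUNT[C,V,A] index order given in the description.
    static int[][][] creaCount(int[][] x, int[] y, int nClasses, int[] nValues) {
        int maxValues = 0;
        for (int n : nValues) {
            maxValues = Math.max(maxValues, n);
        }
        int nAttributes = nValues.length;
        int[][][] count = new int[nClasses][maxValues][nAttributes];
        for (int e = 0; e < x.length; e++) {
            for (int a = 0; a < nAttributes; a++) {
                count[y[e]][x[e][a]][a]++;
            }
        }
        return count;
    }
}
```

Keeping the [C][V][A] order makes the array read directly as COUNT[C,V,A]; a jagged layout indexed by attribute first would waste less space when attributes have very different numbers of values.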
public int[][] getOptimumClass(int[][][] cuonter, long seed)
Returns an array with the class assigned to each attribute-value pair
Count
- The counts for each attribute-value pair
seed
- The random seed
public int getMaximum(int[] num, long seed)
Returns the index of the maximum of an array of integers. If there is more than one maximum, returns one of them.
num
- The array of integers
seed
- The random seed
public int getMaxim(double[] num, long seed)
Returns the index of the maximum of an array of doubles. If there is more than one maximum, returns one of them.
num
- The array of doubles
seed
- The random seed
public void setNumValues()
Computes, for each attribute, the number of values it takes in the set
public int[] getNumValues2()
Returns, for each attribute, the number of values in the set
public int[] getC()
Returns the output (class) values
public java.lang.String[] getC2()
public int getC(int pos)
Returns the output (class) value of an instance
pos
- The position of the instance in the set of values
public double[] getEMaximum()
Returns an array with the maximum values of the input attributes
public double[] getEMinimum()
Returns an array with the minimum values of the input attributes
public int getNData()
Returns the number of examples
public int getNVariables()
Returns the number of variables
public int getInPuts()
Returns the number of input variables
public int getNClasses()
Returns the total number of classes
public boolean isMissing(int i, int j)
Checks whether the value of an attribute is missing
i
- int Number of the example
j
- int Number of the attribute
public void readSet(java.lang.String samples, boolean train) throws java.io.IOException
Reads a file of examples (training or test)
samples
- Name of the file of examples
train
- True for the training file, false for the test file
java.io.IOException
- If an I/O error occurs
public java.lang.String copyTestHeader()
Returns the header of the file
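A sketch of header extraction, assuming an ARFF-like file whose header is terminated by an @data line; the file format is not described here, so this is an assumption, and HeaderCopy is a hypothetical name:

```java
// Hypothetical sketch: returns every line up to and including "@data".
class HeaderCopy {
    static String copyHeader(String fileContents) {
        StringBuilder header = new StringBuilder();
        for (String line : fileContents.split("\\R")) { // \R matches any line break
            header.append(line).append('\n');
            if (line.trim().equalsIgnoreCase("@data")) {
                break; // everything after @data is example data, not header
            }
        }
        return header.toString();
    }
}
```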
public void normalize()
Converts all the values of the set into the [0,1] interval
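Converting values into [0,1] is commonly done with min-max scaling, x' = (x - min_a) / (max_a - min_a), where min_a and max_a are per-attribute bounds like those exposed by getEMinimum() and getEMaximum(). Whether normalize() uses exactly this formula is an assumption; the sketch also assumes a constant attribute is mapped to 0, and MinMaxNormalizer is a hypothetical name:

```java
// Sketch of per-attribute min-max scaling into [0, 1], applied in place.
class MinMaxNormalizer {
    static void normalize(double[][] x) {
        if (x.length == 0) return;
        int nAttributes = x[0].length;
        for (int a = 0; a < nAttributes; a++) {
            double min = Double.POSITIVE_INFINITY;
            double max = Double.NEGATIVE_INFINITY;
            for (double[] example : x) {
                min = Math.min(min, example[a]);
                max = Math.max(max, example[a]);
            }
            double range = max - min;
            for (double[] example : x) {
                // Assumption: a constant attribute (range == 0) is mapped to 0.
                example[a] = range > 0 ? (example[a] - min) / range : 0.0;
            }
        }
    }
}
```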
public int[] variableType()
Returns the type of each input attribute (0 for nominal, 1 for numeric)
public void calculateMostCommon()
Calculates the most common value of each attribute (column)
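A minimal sketch of computing the most common value (mode) of each attribute, assuming nominal values integer-coded as 0..nValues[a]-1; the tie rule (smallest code wins) is an assumption, and MostCommon is a hypothetical name:

```java
// Sketch: mode of each integer-coded attribute.
class MostCommon {
    static int[] calculateMostCommon(int[][] x, int[] nValues) {
        int[] mostCommon = new int[nValues.length];
        for (int a = 0; a < nValues.length; a++) {
            int[] freq = new int[nValues[a]];
            for (int[] example : x) {
                freq[example[a]]++;
            }
            int best = 0;
            for (int v = 1; v < freq.length; v++) {
                if (freq[v] > freq[best]) best = v; // strict '>' keeps the smallest code on ties
            }
            mostCommon[a] = best;
        }
        return mostCommon;
    }
}
```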
public int mostCommon(int i)
Returns the most common value of attribute i
i
- int Number of the attribute
public java.lang.String[] giveNames()
Returns the names of the problem variables
public java.lang.String[] giveClasses()
Returns the class values
public boolean existContinousAttributes()
Checks whether any input attribute is real-valued (continuous)
public int nearestSample(Complex R, int defaultClass, long seed, int s, int q)
Gets the nearest example
public double distance(Complex R, double[] E, int s, int q, double minDist)
Calculates the distance between a rule and an example (instance)
R
- The rule
E
- The example
s
- Parameter used to calculate the distance
q
- Parameter used to calculate the distance
minDist
- The lowest distance
public double distance(int E, double[] E_test, double minDist)
Calculates the distance between two examples
E
- The index of the example in the dataset
E_test
- The test example
minDist
- The lowest distance
public int mostFrequentClass(long seed)
Calculates the most frequent class in the set of values
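mostFrequentClass(long seed) and the getMaximum/getMaxim helpers all take a seed and, per their descriptions, return one of the tied maxima. The sketch below assumes ties are broken uniformly at random with java.util.Random seeded by that value; the actual tie-breaking rule is not documented, and FrequentClass is a hypothetical name:

```java
import java.util.Random;

// Sketch: argmax with seeded random tie-breaking, and the most frequent class built on it.
class FrequentClass {
    static int argMaxRandomTie(int[] num, long seed) {
        Random rnd = new Random(seed);
        int best = 0;
        int ties = 1; // number of indices seen so far that share the current maximum
        for (int i = 1; i < num.length; i++) {
            if (num[i] > num[best]) {
                best = i;
                ties = 1;
            } else if (num[i] == num[best]) {
                ties++;
                // Reservoir sampling: keep i with probability 1/ties,
                // so each tied index is equally likely to be returned.
                if (rnd.nextInt(ties) == 0) best = i;
            }
        }
        return best;
    }

    // Assumption: classes are coded 0..nClasses-1.
    static int mostFrequentClass(int[] classOfExample, int nClasses, long seed) {
        int[] freq = new int[nClasses];
        for (int c : classOfExample) freq[c]++;
        return argMaxRandomTie(freq, seed);
    }
}
```

The reservoir-style selection chooses among tied indices in a single pass, without collecting the candidates first.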
public int getNumPosExamples(Complex R)
Calculates the number of positive examples that match the rule
R
- The rule
public int getNumNegExamples(Complex R)
Calculates the number of negative examples that match the rule
R
- The rule
public int[] getNeighbourSet(double[] test, int k)
Calculates the neighbourhood of a test example
test
- The test example
k
- The size of the neighbourhood
public void computeSVDM()
Computes a matrix that stores, for each nominal attribute, the distances between all of its possible values
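SVDM presumably stands for the Simplified Value Difference Metric, as used in Domingos's RISE rule-induction algorithm, whose exponents s and q appear to match the parameters of distance(Complex R, ...). Assuming that definition, the distance between two values v1 and v2 of a nominal attribute is the sum over classes c of |P(c | v1) - P(c | v2)|^q, with P(c | v) estimated from class counts such as one attribute's slice of COUNT[C,V,A]. Which definition computeSVDM() actually uses is an assumption; Svdm is a hypothetical name:

```java
// Sketch: SVDM matrix for one nominal attribute, from counts indexed [class][value].
class Svdm {
    static double[][] svdmMatrix(int[][] count, double q) {
        int nClasses = count.length;
        int nValues = count[0].length;
        // Conditional class probabilities P(c|v); a value never seen gets all zeros.
        double[][] p = new double[nClasses][nValues];
        for (int v = 0; v < nValues; v++) {
            int total = 0;
            for (int c = 0; c < nClasses; c++) total += count[c][v];
            for (int c = 0; c < nClasses; c++) {
                p[c][v] = total > 0 ? (double) count[c][v] / total : 0.0;
            }
        }
        // svdm[v1][v2] = sum over classes of |P(c|v1) - P(c|v2)|^q (symmetric, zero diagonal)
        double[][] svdm = new double[nValues][nValues];
        for (int v1 = 0; v1 < nValues; v1++) {
            for (int v2 = 0; v2 < nValues; v2++) {
                double d = 0.0;
                for (int c = 0; c < nClasses; c++) {
                    d += Math.pow(Math.abs(p[c][v1] - p[c][v2]), q);
                }
                svdm[v1][v2] = d;
            }
        }
        return svdm;
    }
}
```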
public double[] createBall(int attr, double test, double train)
Creates a vector with the values of the attributes that d(tst[atr],trn[atr])
attr
- The attribute
test
- The test example
train
- The training example
public double getRealValue(int at, java.lang.String str)
Gets the real value of str for attribute at
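How nominal values are encoded as doubles is not stated here. One plausible scheme, assumed purely for illustration, stores each nominal value as its index in the attribute's list of labels, which makes findNominalValue and getRealValue inverses of each other. NominalCodec and its labels field are hypothetical:

```java
import java.util.Arrays;

// Hypothetical encoding: nominal value k of an attribute is stored as the double k.
class NominalCodec {
    // labels[a] lists the possible nominal values of attribute a, in code order.
    private final String[][] labels;

    NominalCodec(String[][] labels) {
        this.labels = labels;
    }

    // double code -> label (counterpart of findNominalValue under this assumption)
    String findNominalValue(int attribute, double value) {
        return labels[attribute][(int) value];
    }

    // label -> double code (counterpart of getRealValue under this assumption);
    // returns -1 if the label is unknown.
    double getRealValue(int attribute, String str) {
        return Arrays.asList(labels[attribute]).indexOf(str);
    }
}
```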
public int getMostFrequentClass()
Returns the most frequent class in the set of instances
public int[] getNN(double[] test, int k)
Calculates the k examples of the set nearest to the test example
test
- The test example
k
- The number of neighbours
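A sketch of a k-nearest-neighbour query, assuming squared Euclidean distance over already normalized numeric inputs. The metric is not specified here, and nominal attributes would need a per-value distance such as SVDM instead; squaring does not change the neighbour order. NearestNeighbours is a hypothetical name:

```java
import java.util.Arrays;
import java.util.Comparator;

// Sketch: indices of the k training examples closest to a test example.
class NearestNeighbours {
    static int[] getNN(double[][] train, double[] test, int k) {
        double[] dist = new double[train.length];
        Integer[] idx = new Integer[train.length];
        for (int e = 0; e < train.length; e++) {
            double sum = 0.0;
            for (int a = 0; a < test.length; a++) {
                double diff = train[e][a] - test[a];
                sum += diff * diff; // squared Euclidean distance
            }
            dist[e] = sum;
            idx[e] = e;
        }
        // Arrays.sort on objects is stable, so equally distant examples keep their original order.
        Arrays.sort(idx, Comparator.comparingDouble(i -> dist[i]));
        int n = Math.min(k, train.length);
        int[] result = new int[n];
        for (int i = 0; i < n; i++) result[i] = idx[i];
        return result;
    }
}
```

Sorting all indices costs O(n log n); for large sets and small k, a bounded max-heap of size k would be the usual alternative.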