Datos

java.lang.Object
- keel.Algorithms.Preprocess.Feature_Selection.Datos

```
public class Datos
extends java.lang.Object
```
Datos.java Data structure used for Feature Selection preprocessing.

Author:

Manuel Chica Serrano (University of Jaen) 22/8/2005, Modified by Ignacio Robles Paiz (University of Granada) 27/06/2010, Modified by Ignacio Robles Paiz (University of Granada) 02/07/2010

Constructor Summary

Constructors
Constructor and Description
`Datos(java.lang.String trainFileNames1, java.lang.String testFileNames1, int k)` Creates a new instance of Datos

Method Summary

Modifier and Type	Method and Description
`int`	`diff(int numCarac, int numInstancia1, int numInstancia2)` Used by the RELIEF method; the data must be discretized. Returns 1 if the given feature has the same value in both instances, 0 otherwise.
`int`	`findNearestHit(int posI)` Returns the nearest instance to the given one that belongs to the same class.
`int`	`findNearestMiss(int posI)` Returns the nearest instance to the given one that belongs to a different class.
`void`	`generarFicherosSalida(java.lang.String ficheroTrainSalida, java.lang.String ficheroTestSalida, boolean[] solucion)` Generates the output files (.tra and .tst), removing the non-selected features.
`double`	`LVO(boolean[] featuresVector)` Calculates the error rate (errors/total_instances) when predicting the class of each instance.
`double`	`LVOTest(boolean[] featuresVector)` Calculates the error rate (errors/total_instances) when classifying every instance of the test dataset with the given features, using the same test dataset to predict.
`double`	`measureIEP(boolean[] featuresVector)` Calculates the inconsistent example pairs ratio (IEP)
`double`	`medidaInconsistencia(boolean[] featuresVector)` Calculates the inconsistency ratio.
`double[][]`	`obtenerIMVars()` Computes the mutual information between every two variables.
`double[]`	`obtenerIMVarsClase()` Calculates the mutual information measure between the variables and the class.
`int`	`returnNumFeatures()` Returns the number of features of the datasets
`int`	`returnNumInstances()` Returns the number of instances of the datasets
`double`	`sumDifferentClasses(int posExample, int feature)` Sums the prior probabilities of the classes that are different to the one of the example given.
`double`	`validacionCruzada(boolean[] featuresVector)` Calculates the error rate (errors/total_instances) when classifying every instance of the test dataset with the given features, using the training dataset to predict.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - Datos
```
public Datos(java.lang.String trainFileNames1,
             java.lang.String testFileNames1,
             int k)
```
    Creates a new instance of Datos
    
    Parameters:
    
    trainFileNames1 - Training filename.
    
    testFileNames1 - Test filename.
    
    k - number of nearest neighbours to consider.
- Method Detail
  - returnNumFeatures
```
public int returnNumFeatures()
```
    Returns the number of features of the datasets
    
    Returns:
    
    the number of input features
  - returnNumInstances
```
public int returnNumInstances()
```
    Returns the number of instances of the datasets
    
    Returns:
    
    the number of input instances
  - LVO
```
public double LVO(boolean[] featuresVector)
```
    Calculates the error rate (errors/total_instances) when predicting the class of each instance, using the leave-one-out procedure.
    
    Parameters:
    
    featuresVector - a boolean array with selected features
    
    Returns:
    
    returns a double value with the error (n_errors/total_instances)
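    
    As an illustration of the leave-one-out estimate described above, the following is a minimal sketch and not KEEL's implementation. The class `LeaveOneOutSketch`, its method names, and the in-memory `double[][]`/`int[]` layout are assumptions made for this example. It also uses a single nearest neighbour, whereas `Datos` receives `k` in its constructor.

```java
final class LeaveOneOutSketch {
    /** Squared Euclidean distance restricted to the features marked true in mask. */
    static double distance(double[] a, double[] b, boolean[] mask) {
        double sum = 0.0;
        for (int f = 0; f < mask.length; f++) {
            if (mask[f]) {
                double d = a[f] - b[f];
                sum += d * d;
            }
        }
        return sum;
    }

    /** Leave-one-out 1-NN error rate: misclassified instances / total instances. */
    static double looError(double[][] x, int[] y, boolean[] mask) {
        int errors = 0;
        for (int i = 0; i < x.length; i++) {
            int best = -1;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int j = 0; j < x.length; j++) {
                if (j == i) {
                    continue; // instance i is left out of its own neighbour search
                }
                double d = distance(x[i], x[j], mask);
                if (d < bestDist) {
                    bestDist = d;
                    best = j;
                }
            }
            // An instance with no neighbour at all is counted as an error.
            if (best < 0 || y[best] != y[i]) {
                errors++;
            }
        }
        return (double) errors / x.length;
    }
}
```
    Calling `looError(x, y, mask)` with different masks shows how the selected features change the estimated error on the same data.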
  - medidaInconsistencia
```
public double medidaInconsistencia(boolean[] featuresVector)
```
    Calculates the inconsistency ratio.
    
    Parameters:
    
    featuresVector - a boolean array with the selected features
    
    Returns:
    
    returns a double value with the inconsistency ratio (0..1)
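    
    The page does not define the inconsistency ratio. The sketch below assumes a commonly used definition: instances are grouped by their values on the selected features, each group contributes its size minus the count of its majority class, and the total is divided by the number of instances. The class `InconsistencySketch` and its data layout are hypothetical, and KEEL's own computation may differ.

```java
import java.util.HashMap;
import java.util.Map;

final class InconsistencySketch {
    /** Inconsistency ratio in [0, 1] for discretized data under the assumed definition. */
    static double inconsistencyRate(int[][] x, int[] y, boolean[] mask) {
        // Pattern over the selected features -> (class label -> number of instances).
        Map<String, Map<Integer, Integer>> groups = new HashMap<>();
        for (int i = 0; i < x.length; i++) {
            StringBuilder key = new StringBuilder();
            for (int f = 0; f < mask.length; f++) {
                if (mask[f]) {
                    key.append(x[i][f]).append(',');
                }
            }
            groups.computeIfAbsent(key.toString(), k -> new HashMap<>())
                  .merge(y[i], 1, Integer::sum);
        }
        int inconsistent = 0;
        for (Map<Integer, Integer> counts : groups.values()) {
            int total = 0;
            int majority = 0;
            for (int c : counts.values()) {
                total += c;
                majority = Math.max(majority, c);
            }
            // Every instance outside the majority class of its group is inconsistent.
            inconsistent += total - majority;
        }
        return (double) inconsistent / x.length;
    }
}
```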
  - measureIEP
```
public double measureIEP(boolean[] featuresVector)
```
    Calculates the inconsistent example pairs ratio (IEP)
    
    Parameters:
    
    featuresVector - a boolean array with the selected features
    
    Returns:
    
    returns a double value with the inconsistency ratio (0..1)
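    
    A pair-based measure can be sketched as follows, assuming that a pair of examples is inconsistent when both agree on every selected feature but belong to different classes, and that the ratio is taken over all unordered pairs. Both the definition and the class `IepSketch` are assumptions for illustration; the source does not state the exact formula.

```java
final class IepSketch {
    /** True if a and b have equal values on every feature marked true in mask. */
    static boolean sameOnMask(int[] a, int[] b, boolean[] mask) {
        for (int f = 0; f < mask.length; f++) {
            if (mask[f] && a[f] != b[f]) {
                return false;
            }
        }
        return true;
    }

    /** Inconsistent pairs divided by the number of unordered pairs of examples. */
    static double iepRatio(int[][] x, int[] y, boolean[] mask) {
        int n = x.length;
        long inconsistentPairs = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                if (y[i] != y[j] && sameOnMask(x[i], x[j], mask)) {
                    inconsistentPairs++;
                }
            }
        }
        long totalPairs = (long) n * (n - 1) / 2;
        return totalPairs == 0 ? 0.0 : (double) inconsistentPairs / totalPairs;
    }
}
```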
  - obtenerIMVarsClase
```
public double[] obtenerIMVarsClase()
```
    Calculates the mutual information measure between the variables and the class. This method will be applied only at the beginning
    
    Returns:
    
    a double array with the MI between ith variable and the class
  - obtenerIMVars
```
public double[][] obtenerIMVars()
```
    Computes the mutual information between every pair of variables. Like obtenerIMVarsClase, it is used when the mutual information (IM) measure is applied.
    
    Returns:
    
    matrix(ij) with the IM measurement (Mutual information) between the variable i and the variable j.
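    
    Both mutual information methods rest on the same quantity, I(A;B) = sum over (a, b) of p(a,b) log( p(a,b) / (p(a) p(b)) ). The sketch below estimates it from value counts of two discretized columns. The class `MutualInformationSketch` is hypothetical, and the base-2 logarithm is an assumption, since the page does not state which base KEEL uses.

```java
import java.util.HashMap;
import java.util.Map;

final class MutualInformationSketch {
    /** Mutual information in bits between two discrete columns of equal length. */
    static double mutualInformation(int[] a, int[] b) {
        int n = a.length;
        Map<Integer, Integer> countA = new HashMap<>();
        Map<Integer, Integer> countB = new HashMap<>();
        // Joint counts: value of a -> (value of b -> number of co-occurrences).
        Map<Integer, Map<Integer, Integer>> joint = new HashMap<>();
        for (int i = 0; i < n; i++) {
            countA.merge(a[i], 1, Integer::sum);
            countB.merge(b[i], 1, Integer::sum);
            joint.computeIfAbsent(a[i], k -> new HashMap<>())
                 .merge(b[i], 1, Integer::sum);
        }
        double mi = 0.0;
        for (Map.Entry<Integer, Map<Integer, Integer>> ea : joint.entrySet()) {
            for (Map.Entry<Integer, Integer> eb : ea.getValue().entrySet()) {
                double pab = eb.getValue() / (double) n;
                double pa = countA.get(ea.getKey()) / (double) n;
                double pb = countB.get(eb.getKey()) / (double) n;
                mi += pab * Math.log(pab / (pa * pb));
            }
        }
        return mi / Math.log(2.0); // convert from nats to bits
    }
}
```
    Passing a feature column and the class column gives one entry of the per-class array; passing two feature columns gives one entry of the pairwise matrix.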
  - findNearestHit
```
public int findNearestHit(int posI)
```
    Returns the instance nearest to the one passed as an argument, considering only instances of the same class. The nearest instance is the one at minimum Euclidean distance.
    
    Parameters:
    
    posI - position of the instance whose nearest hit is sought
    
    Returns:
    
    returns the instance position of the nearest instance found
  - findNearestMiss
```
public int findNearestMiss(int posI)
```
    Returns the instance nearest to the one passed as an argument, considering only instances of a different class. The nearest instance is the one at minimum Euclidean distance.
    
    Parameters:
    
    posI - position of the instance whose nearest miss is sought
    
    Returns:
    
    returns the instance position of the nearest instance found
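    
    The nearest-hit and nearest-miss searches differ only in the class filter, which the following sketch makes explicit. It is an illustration under assumed in-memory arrays, not KEEL's code; the class `NearestHitMissSketch` and the `-1` return value for "no candidate" are choices made for this example.

```java
final class NearestHitMissSketch {
    /**
     * Position of the instance closest to pos among those whose class equals
     * (sameClass = true) or differs from (sameClass = false) the class of pos.
     * Returns -1 when no such instance exists.
     */
    static int nearest(double[][] x, int[] y, int pos, boolean sameClass) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int j = 0; j < x.length; j++) {
            if (j == pos || (y[j] == y[pos]) != sameClass) {
                continue;
            }
            // Squared Euclidean distance gives the same ordering as the distance itself.
            double d = 0.0;
            for (int f = 0; f < x[pos].length; f++) {
                double diff = x[pos][f] - x[j][f];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = j;
            }
        }
        return best;
    }

    static int findNearestHit(double[][] x, int[] y, int pos) {
        return nearest(x, y, pos, true);
    }

    static int findNearestMiss(double[][] x, int[] y, int pos) {
        return nearest(x, y, pos, false);
    }
}
```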
  - sumDifferentClasses
```
public double sumDifferentClasses(int posExample,
                                  int feature)
```
    Sums the prior probabilities of the classes that are different to the one of the example given.
    
    Parameters:
    
    posExample - example id.
    
    feature - feature id.
    
    Returns:
    
    the sum of the prior probabilities of the classes other than the class of the given example
  - diff
```
public int diff(int numCarac,
                int numInstancia1,
                int numInstancia2)
```
    Used by the RELIEF method; the data must be discretized. Returns 1 if the given feature has the same value in both instances, 0 otherwise.
    
    Parameters:
    
    numCarac - index of the feature to check
    
    numInstancia1 - position of the first instance
    
    numInstancia2 - position of the second instance
    
    Returns:
    
    1 if the feature has the same value in both instances, 0 otherwise
  - LVOTest
```
public double LVOTest(boolean[] featuresVector)
```
    Calculates the error rate (errors/total_instances) when classifying every instance of the test dataset with the given features, using the same test dataset to predict. Uses the leave-one-out procedure.
    
    Parameters:
    
    featuresVector - a boolean array with the selected features
    
    Returns:
    
    returns a double value with the calculated error (n.errors/total)
  - validacionCruzada
```
public double validacionCruzada(boolean[] featuresVector)
```
    Calculates the error rate (errors/total_instances) when classifying every instance of the test dataset with the given features, using the training dataset to predict. Uses the leave-one-out procedure.
    
    Parameters:
    
    featuresVector - a boolean array with the selected features
    
    Returns:
    
    returns a double value with the calculated error (n.errors/total)
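    
    To make the train-predicts-test evaluation concrete, here is a hedged sketch of a k-nearest-neighbour error estimate in which every test instance is classified by majority vote among its k closest training instances. The class `HoldoutKnnSketch`, its parameters, and the tie-breaking rule (the first class to reach the highest vote count wins) are assumptions for this example; the page does not describe how KEEL resolves ties or stores the data.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

final class HoldoutKnnSketch {
    /** Error rate (errors / test instances) of k-NN trained on one set and tested on another. */
    static double testError(double[][] trainX, int[] trainY,
                            double[][] testX, int[] testY,
                            boolean[] mask, int k) {
        int errors = 0;
        for (int t = 0; t < testX.length; t++) {
            double[] dist = new double[trainX.length];
            Integer[] order = new Integer[trainX.length];
            for (int j = 0; j < trainX.length; j++) {
                double sum = 0.0;
                for (int f = 0; f < mask.length; f++) {
                    if (mask[f]) {
                        double d = testX[t][f] - trainX[j][f];
                        sum += d * d;
                    }
                }
                dist[j] = sum;
                order[j] = j;
            }
            // Sort training positions by increasing distance to the test instance.
            Arrays.sort(order, Comparator.comparingDouble(j -> dist[j]));
            Map<Integer, Integer> votes = new HashMap<>();
            int predicted = -1;
            int bestVotes = 0;
            for (int r = 0; r < Math.min(k, order.length); r++) {
                int label = trainY[order[r]];
                int v = votes.merge(label, 1, Integer::sum);
                if (v > bestVotes) {
                    bestVotes = v;
                    predicted = label;
                }
            }
            if (predicted != testY[t]) {
                errors++;
            }
        }
        return (double) errors / testX.length;
    }
}
```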
  - generarFicherosSalida
```
public void generarFicherosSalida(java.lang.String ficheroTrainSalida,
                                  java.lang.String ficheroTestSalida,
                                  boolean[] solucion)
```
    Generates the output files (.tra and .tst), removing the non-selected features.
    
    Parameters:
    
    ficheroTrainSalida - is a string with the pathname of the output training file
    
    ficheroTestSalida - is a string with the pathname of the output test file
    
    solucion - is a boolean array with the selected features
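    
    The core of removing non-selected features is a column projection driven by the boolean array. The sketch below applies it to a single data row, assuming rows are comma-separated with the class label after the features. The class `FeatureFilterSketch` and both method names are hypothetical; reading and writing the files, and filtering the attribute declarations in the file header, are not covered here.

```java
import java.util.ArrayList;
import java.util.List;

final class FeatureFilterSketch {
    /**
     * Keeps the values whose position is marked true in solucion.
     * Values beyond solucion.length (assumed to be the class label) are always kept.
     */
    static String[] keepSelected(String[] values, boolean[] solucion) {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            if (i >= solucion.length || solucion[i]) {
                kept.add(values[i]);
            }
        }
        return kept.toArray(new String[0]);
    }

    /** Applies keepSelected to one comma-separated data row. */
    static String filterRow(String row, boolean[] solucion) {
        String[] values = row.split(",");
        for (int i = 0; i < values.length; i++) {
            values[i] = values[i].trim();
        }
        return String.join(", ", keepSelected(values, solucion));
    }
}
```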
