public class PANDA
extends java.lang.Object
PANDA is a noise detection algorithm that identifies instances with a large deviation from normal given the values of a pair of attributes. When a set of instances has similar values for one attribute, a large deviation from normal for the second attribute may be considered suspicious. The output of PANDA is a list of instances ordered from most noisy to least noisy. Each instance is assigned an output score (Noise Factor), which is used to rank it relative to the other instances in the data set. Once a noise ranking has been obtained, some of the instances may be discarded, which yields a cleaner data set for additional analysis. Reference: 2007-Hulse-KIS
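The pairwise idea described above can be sketched as follows. This is a minimal illustration, not the code of this class: the class name `PandaSketch`, the equal-width discretization, the use of the population standard deviation, and the summation of standardized deviations over all ordered attribute pairs are all assumptions made for the example.

```java
import java.util.Arrays;

// Hypothetical sketch of a pairwise noise score; not the PANDA class itself.
final class PandaSketch {

    // Equal-width bin index of v within [min, max] (assumed discretization).
    static int bin(double v, double min, double max, int bins) {
        if (max == min) return 0;
        int b = (int) ((v - min) / (max - min) * bins);
        return Math.min(b, bins - 1);
    }

    // For every ordered attribute pair (j, k): group instances by their bin on
    // attribute j, then add |x_k - mean| / stdDev of attribute k within that
    // group to the instance's score.
    static double[] noiseFactors(double[][] data, int bins) {
        int n = data.length, m = data[0].length;
        double[] score = new double[n];
        for (int j = 0; j < m; j++) {
            double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
            for (double[] row : data) {
                min = Math.min(min, row[j]);
                max = Math.max(max, row[j]);
            }
            int[] binOf = new int[n];
            for (int i = 0; i < n; i++) binOf[i] = bin(data[i][j], min, max, bins);

            for (int k = 0; k < m; k++) {
                if (k == j) continue;
                double[] mean = new double[bins];
                double[] sd = new double[bins];
                int[] count = new int[bins];
                for (int i = 0; i < n; i++) {
                    mean[binOf[i]] += data[i][k];
                    count[binOf[i]]++;
                }
                for (int b = 0; b < bins; b++) if (count[b] > 0) mean[b] /= count[b];
                for (int i = 0; i < n; i++) {
                    double d = data[i][k] - mean[binOf[i]];
                    sd[binOf[i]] += d * d;
                }
                for (int b = 0; b < bins; b++) if (count[b] > 0) sd[b] = Math.sqrt(sd[b] / count[b]);
                for (int i = 0; i < n; i++) {
                    int b = binOf[i];
                    // Groups with zero spread contribute nothing to the score.
                    if (sd[b] > 0) score[i] += Math.abs(data[i][k] - mean[b]) / sd[b];
                }
            }
        }
        return score;
    }

    // Instance indices ordered from highest to lowest noise factor.
    static Integer[] rank(double[] score) {
        Integer[] idx = new Integer[score.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, (a, b) -> Double.compare(score[b], score[a]));
        return idx;
    }
}
```

Calling `PandaSketch.rank(PandaSketch.noiseFactors(data, bins))` returns the instance indices from most to least suspicious under these assumptions; the number of bins is a free parameter of the sketch.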
Constructor and Description |
---|
PANDA()
Constructor of the class
|
Modifier and Type | Method and Description |
---|---|
void |
CalculateNoiseFactor()
Computes the noise factor.
|
double |
computeDesv(int j,
int k,
int l,
double mean)
Computes the deviation of the values of attribute k over the instances whose discretized value j is equal to l.
|
double |
computeMean(int j,
int k,
int l)
Computes the mean of the values of attribute k over the instances whose discretized value j is equal to l.
|
void |
createDatasets(java.lang.String trainIN,
java.lang.String trainOUT,
java.lang.String testIN,
java.lang.String testOUT)
Applies the changes to remove the noise.
|
void |
run()
Executes the algorithm.
|
public void run()
public void CalculateNoiseFactor()
public double computeMean(int j, int k, int l)
j - discretized value id.
k - attribute id given.
l - value given.
public double computeDesv(int j, int k, int l, double mean)
j - discretized value id.
k - attribute id given.
l - value given.
mean - given mean.
public void createDatasets(java.lang.String trainIN, java.lang.String trainOUT, java.lang.String testIN, java.lang.String testOUT)
Applies the changes to remove the noise.
trainIN - Original training dataset filename.
trainOUT - Modified training dataset filename.
testIN - Original test dataset filename.
testOUT - Modified test dataset filename.
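How many instances are discarded once the noise factors are known is not stated on this page. The sketch below assumes a fixed fraction of the highest-scoring instances is dropped; the class `NoiseFilterSketch`, the method `keepCleanest`, and the `fraction` parameter are hypothetical and only illustrate the filtering step that precedes writing the modified datasets.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical filtering step; the real removal rule may differ.
final class NoiseFilterSketch {

    // Returns the indices of the instances to keep, in their original order,
    // after dropping floor(n * fraction) instances with the highest noise factor.
    static List<Integer> keepCleanest(double[] noiseFactor, double fraction) {
        int n = noiseFactor.length;
        int toDrop = (int) Math.floor(n * fraction);

        // Sort indices from most noisy to least noisy.
        Integer[] idx = new Integer[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        Arrays.sort(idx, (a, b) -> Double.compare(noiseFactor[b], noiseFactor[a]));

        boolean[] dropped = new boolean[n];
        for (int r = 0; r < toDrop; r++) dropped[idx[r]] = true;

        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < n; i++) if (!dropped[i]) kept.add(i);
        return kept;
    }
}
```

Keeping the surviving indices in their original order means the cleaned file preserves the instance order of the input file.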