Integrating Instance Selection, Instance Weighting and Feature Weighting for Nearest Neighbor Classifiers by Co-evolutionary Algorithms - Complementary Material

This Website contains complementary material to the paper:

J. Derrac, I. Triguero, S. García and F. Herrera, Integrating Instance Selection, Instance Weighting and Feature Weighting for Nearest Neighbor Classifiers by Co-evolutionary Algorithms. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 42:5 (2012) 1383-1397, doi: 10.1109/TSMCB.2012.2191953

The website is organized according to the following summary:

  1. Abstract
  2. CIW-NN model
  3. Experimental framework
    1. Data-sets partitions employed in the paper
    2. Algorithms and parameters
    3. Java implementation of CIW-NN
  4. Results

Experimental framework

Data-sets partitions employed in the paper

For the experimental study, we have selected thirty small data sets and eight large data sets from KEEL-datasets. In all the experiments, we have adopted a 10-fold cross-validation model, i.e., each data set has been split randomly into 10 folds, each containing 10% of its patterns. Thus, nine folds have been used for training and one for test. The reported results have been obtained through 5 rounds of 10 runs per data set (i.e., five runs for each possible test fold).
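
As a rough illustration of this partitioning scheme (a sketch only; the partitions actually used are the KEEL partitions provided below, and details such as file handling and class stratification are omitted), the following Java snippet splits a set of pattern indices into ten folds:

  import java.util.ArrayList;
  import java.util.Collections;
  import java.util.List;
  import java.util.Random;

  // Illustrative sketch of a 10-fold split over pattern indices.
  // It is not the code used to generate the KEEL partitions provided below.
  public class TenFoldSplit {

      public static List<List<Integer>> split(int numPatterns, long seed) {
          List<Integer> indices = new ArrayList<>();
          for (int i = 0; i < numPatterns; i++) {
              indices.add(i);
          }
          Collections.shuffle(indices, new Random(seed));

          // Deal the shuffled indices into 10 folds of roughly 10% each
          List<List<Integer>> folds = new ArrayList<>();
          for (int f = 0; f < 10; f++) {
              folds.add(new ArrayList<Integer>());
          }
          for (int i = 0; i < numPatterns; i++) {
              folds.get(i % 10).add(indices.get(i));
          }
          return folds;
      }

      public static void main(String[] args) {
          // One fold acts as the test set; the remaining nine folds form the training set
          List<List<Integer>> folds = split(150, 12345L);
          System.out.println("Size of the first fold: " + folds.get(0).size());
      }
  }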

Tables 1 and 2 summarize the properties of the selected data sets. They show, for each data set, the number of examples (#Ex.), the number of attributes (#Atts.), and the number of classes (#Cl.). The 10-fold cross-validation partitions of each data set are available in KEEL format from this page; all data sets may also be downloaded by clicking here.

Table 1. Summary Description for Small data sets

Data set #Ex. #Atts. #Cl. | Data set #Ex. #Atts. #Cl.
Australian 690 14 2 | Monk-2 432 6 2
Balance 625 4 3 | Movement 360 90 15
Bands 539 19 2 | New Thyroid 215 5 3
Breast 286 9 2 | Pima 768 8 2
Bupa 345 6 2 | Saheart 462 9 2
Car 1728 6 4 | Sonar 208 60 2
Cleveland 303 13 5 | Spectfheart 267 44 2
Contraceptive 1473 9 3 | Tae 151 5 3
Dermatology 366 34 6 | Tic-tac-toe 958 9 2
German 1000 20 2 | Vehicle 846 18 4
Glass 214 9 7 | Vowel 990 13 11
Hayes-roth 160 4 3 | Wine 178 13 3
Housevotes 435 16 2 | Wisconsin 699 9 2
Iris 150 4 3 | Yeast 1484 8 10
Lymphography 148 18 4 | Zoo 101 16 7

Table 2. Summary Description for Large data sets

Data set #Ex. #Atts. #Cl. | Data set #Ex. #Atts. #Cl.
Abalone 4174 8 28 | Page-blocks 5473 10 5
Banana 5300 2 2 | Phoneme 5404 5 2
Chess 3196 36 2 | Segment 2310 19 7
Marketing 8993 13 9 | Splice 3190 60 3

Algorithms and parameters

Several classification methods, evolutionary and non-evolutionary, have been selected to perform an exhaustive study of the capabilities of CIW-NN. We briefly describe them here:

1-NN

The 1-NN rule is used as a performance baseline that the rest of the methods are expected to surpass.
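
As a reminder of how this baseline works, here is a minimal Java sketch of the 1-NN rule with Euclidean distance (a sketch only; it is not the implementation used in the experiments and it ignores nominal attributes and missing values):

  // Minimal sketch of the 1-NN rule: the query is assigned the class of its
  // closest training instance under the (squared) Euclidean distance.
  public class OneNN {

      public static int classify(double[][] train, int[] labels, double[] query) {
          int nearest = -1;
          double bestDist = Double.POSITIVE_INFINITY;
          for (int i = 0; i < train.length; i++) {
              double dist = 0.0;
              for (int j = 0; j < query.length; j++) {
                  double diff = train[i][j] - query[j];
                  dist += diff * diff;   // squared distance preserves the ordering
              }
              if (dist < bestDist) {
                  bestDist = dist;
                  nearest = i;
              }
          }
          return labels[nearest];
      }
  }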

IS-CHC, FW-SSGA and IW-SSGA

These methods follow exactly the same setup as the corresponding populations of CIW-NN, except that the accuracy rate component of their fitness function is computed separately.
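
As a rough sketch of this kind of fitness function, the snippet below assumes the α-weighted combination of accuracy rate and reduction rate commonly used in evolutionary IS (with α = 0.5, as listed in Table 3); the exact fitness definitions used by CIW-NN and its single-population counterparts are given in the paper:

  // Hedged sketch: a typical alpha-weighted IS fitness of the form
  // fitness = alpha * accuracyRate + (1 - alpha) * reductionRate.
  // The precise definitions used by CIW-NN are described in the paper.
  public class SelectionFitness {

      public static double fitness(boolean[] selected, double accuracyRate, double alpha) {
          int kept = 0;
          for (boolean s : selected) {
              if (s) {
                  kept++;
              }
          }
          double reductionRate = 1.0 - ((double) kept / selected.length);
          return alpha * accuracyRate + (1.0 - alpha) * reductionRate;
      }
  }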

SSMA

S. García, J.R. Cano, F. Herrera, “A Memetic Algorithm for Evolutionary Prototype Selection: A Scaling Up Approach”, Pattern Recognition, vol. 41, no. 8, pp. 2693-2709, 2008.

A Steady-State Memetic Algorithm specifically designed for prototype selection. This evolutionary IS method includes a meme optimization mechanism (a local search procedure) able to improve the accuracy achieved by the SSGA and to avoid premature convergence. Moreover, it offers a high reduction capability and a good behavior when tackling large problems.

PW, CW and CPW

R. Paredes and E. Vidal, “Learning weighted metrics to minimize nearest-neighbor classification error”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 7, pp. 1100–1110, 2006.

Three gradient descent based algorithms developed with the aim of minimizing a performance index that approximates the leave-one-out error over the training set. Weights may be specified for each instance (Prototype Weighting, PW), for each combination of feature and class (Class Weighting, CW), or for both (Class and Prototype Weighting, CPW).
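
The following Java snippet sketches how class-dependent feature weights enter the distance computation (an illustration of the weighted-distance idea only; it is not Paredes and Vidal's exact formulation, and the gradient descent procedure that learns the weights is not shown):

  // Sketch of a class-dependent feature-weighted distance: each training
  // prototype is compared to the query using the weights of its own class.
  public class ClassWeightedDistance {

      // weights[c][j] = weight of feature j for class c (assumed already learned)
      public static double distance(double[] query, double[] prototype,
                                    int prototypeClass, double[][] weights) {
          double sum = 0.0;
          for (int j = 0; j < query.length; j++) {
              double diff = weights[prototypeClass][j] * (query[j] - prototype[j]);
              sum += diff * diff;
          }
          return Math.sqrt(sum);
      }
  }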

WDNN

M. Z. Jahromi, E. Parvinnia, and R. John, “A method of learning weighted similarity function to improve the performance of nearest neighbor”, Information Sciences, vol. 179, no. 17, pp. 2964–2973, 2009.

A novel IW method which iteratively searches, for each training instance, for the best weight that minimizes the leave-one-out error over the training set. A weight of 0.0 can be assigned to any instance, which means that it is discarded from the data set. Thus, this method can be regarded as a simultaneous IS and IW method.

TS/KNN

M. A. Tahir, A. Bouridane, and F. Kurugollu, “Simultaneous feature selection and feature weighting using hybrid tabu search/k-nearest neighbor classifier”, Pattern Recognition Letters, vol. 28, no. 4, pp. 438–446, 2007.

A Tabu Search based method for simultaneous Feature Selection and Feature Weighting, which encodes in its solutions the current set of features selected (binary codification), the current set of weights assigned to features, and the best value of k found for the k-NN classifier. Furthermore, this method uses Fuzzy k-NN to avoid ties in the classification process.

ReliefF

The first Relief-based method adapted to perform FW. The weights computed by the original Relief algorithm are not binarized to {0,1}; instead, they are employed directly as the final weights of the k-NN classifier.

MI

D. Wettschereck, D. W. Aha, and T. Mohri, “A review and empirical evaluation of feature weighting methods for a class of lazy learning algorithms”, Artificial Intelligence Review, vol. 11, pp. 273–314, 1997.

The mutual information between each feature and the class can be used successfully as a weighting factor for k-NN based algorithms. In our implementation, we have followed the indications given by Wettschereck et al., using Fayyad and Irani's discretizer to allow this method to handle continuous-valued problems.
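
As an illustration of the idea (a sketch, not our exact implementation), the snippet below estimates the mutual information between one discretized feature and the class labels and returns it as that feature's weight:

  // Sketch of mutual information between a discretized feature and the class,
  // used directly as the feature's weight. Continuous features are assumed to
  // have been discretized beforehand (e.g., with Fayyad and Irani's method).
  public class MIWeight {

      public static double mutualInformation(int[] feature, int[] classes,
                                             int numBins, int numClasses) {
          int n = feature.length;
          double[][] joint = new double[numBins][numClasses];
          double[] pFeature = new double[numBins];
          double[] pClass = new double[numClasses];
          for (int i = 0; i < n; i++) {
              joint[feature[i]][classes[i]] += 1.0 / n;
              pFeature[feature[i]] += 1.0 / n;
              pClass[classes[i]] += 1.0 / n;
          }
          double mi = 0.0;
          for (int f = 0; f < numBins; f++) {
              for (int c = 0; c < numClasses; c++) {
                  if (joint[f][c] > 0.0) {
                      mi += joint[f][c] * Math.log(joint[f][c] / (pFeature[f] * pClass[c]));
                  }
              }
          }
          return mi;   // in nats; used as the weight of this feature in the k-NN classifier
      }
  }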

GOCBR

H. Ahn and K. Kim, “Bankruptcy prediction modeling with hybrid case-based reasoning and genetic algorithms approach”, Applied Soft Computing, vol. 9, pp. 599–607, 2009.

A genetic algorithm designed to perform Instance Selection and Feature Weighting simultaneously, encoding both in the same chromosome. Weights are represented by binary strings, thus preserving a binary codification of the chromosomes. It has been applied successfully in several real-world applications.

 

The configuration parameters of all the algorithms in this study are shown in Table 3. We have selected common values for all problems, following the parameter values recommended by the respective authors, under the assumption that these recommended values are close to optimal.

Table 3. Parameter specification for the algorithms employed in the experimentation.

Algorithm Parameters
CIW-NN Evaluations: 10000, Population size (IS, FW and IW): 50, α: 0.5, prob0to1: 0.25, prob1: 0.25, Crossover operator: 2BLX0.3-4BLX0.5-2BLX0.7, Mutation probability: 0.05 per chromosome, Epoch length (FW population): 40 evaluations, Epoch length (IW population): 40 evaluations, γ: 0.8
IS-CHC Evaluations: 10000, Population size: 50, α: 0.5, prob0to1: 0.25, prob1: 0.25
FW-SSGA Evaluations: 10000, Population size: 50, Crossover operator: 2BLX0.3-4BLX0.5-2BLX0.7, Mutation probability: 0.05 per chromosome
IW-SSGA Evaluations: 10000, Population size: 50, Crossover operator: 2BLX0.3-4BLX0.5-2BLX0.7, Mutation probability: 0.05 per chromosome
SSMA Evaluations: 10000, Population size: 30, Cross probability: 0.5 per bit, Mutation probability: 0.001 per bit
PW β: Best in [0.125,128], ρ: Best in [0.1,0.001], ε: 0.001, Max iterations: 1000
CW β: Best in [0.125,128], μ: Best in [0.1,0.001], ε: 0.001, Max iterations: 1000
CPW β: Best in [0.125,128], μ: Best in [0.1,0.001], ρ: Best in [0.1,0.001], ε: 0.001, Max iterations: 1000
WDNN Maximum number of iterations: 10
TS/KNN Evaluations: 10000, M: 10, N: 2, P: ceil(Sqrt(#Features))
ReliefF K value for contributions: Best in [1,20]
MI It has no parameters to be fixed
GOCBR Evaluations: 10000, Population size: 100, Crossover probability: 0.7, Mutation Probability: 0.1
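
The crossover operator 2BLX0.3-4BLX0.5-2BLX0.7 used by the real-coded populations combines BLX-α crossover with different α values (0.3, 0.5 and 0.7). The following Java snippet sketches a single BLX-α crossover over a [0,1]-coded chromosome (an illustration only; the way the composite operator builds and selects its offspring is described in the paper):

  import java.util.Random;

  // Sketch of a single BLX-alpha crossover over [0,1]-coded chromosomes.
  // The composite operator 2BLX0.3-4BLX0.5-2BLX0.7 applies BLX crossover with
  // several alpha values to produce multiple offspring; see the paper for details.
  public class BlxAlpha {

      public static double[] cross(double[] parent1, double[] parent2, double alpha, Random rnd) {
          double[] child = new double[parent1.length];
          for (int i = 0; i < parent1.length; i++) {
              double min = Math.min(parent1[i], parent2[i]);
              double max = Math.max(parent1[i], parent2[i]);
              double range = max - min;
              double lower = min - alpha * range;
              double upper = max + alpha * range;
              double gene = lower + rnd.nextDouble() * (upper - lower);
              child[i] = Math.min(1.0, Math.max(0.0, gene));  // clip to the [0,1] domain
          }
          return child;
      }
  }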

 

Java implementation

Below we offer the Java source code implementing the CIW-NN model, ready to use with any of the data sets considered in the experimental framework. This implementation is fully compliant with the standards of the KEEL project, and will be considered for inclusion in the KEEL Software tool.

  • Download source code of CIW-NN

This implementation of CIW-NN can be executed on a standard Java Virtual Machine (version 1.5 or higher). To do so, it is only necessary to write a script containing all the relevant parameters of the algorithm (see the KEEL reference manual, Section 3, located under the KEEL Software Tool item of the main menu).

Essentially, running CIW-NN requires the following steps:

- Unzip the CIW-NN-source.zip file

- Execute CIW-NN with the command java -jar CIW_NN.jar script.txt (an illustrative sketch of a possible script.txt is given below)

- The results of the classification process will be written to the current folder.
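
As an orientation only, a script.txt file could look roughly like the sketch below, which follows the general layout of KEEL configuration files; the keywords, parameter names and data file names shown here are hypothetical and must be taken from the KEEL reference manual and from the example scripts included in CIW-NN-source.zip:

  algorithm = CIW-NN
  inputData = "australian-10-1tra.dat" "australian-10-1tra.dat" "australian-10-1tst.dat"
  outputData = "result_australian.tra" "result_australian.tst" "result_australian.txt"

  seed = 12345678
  Number of Evaluations = 10000
  Population Size = 50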