IPADE: Iterative Prototype Adjustment for Nearest Neighbor Classification - Complementary Material

I. Triguero, S. García and F. Herrera, IPADE: Iterative Prototype Adjustment for Nearest Neighbor Classification.

  1. Abstract
  2. Experimental Framework
    1. Data-sets partitions employed in the paper
    2. Algorithms and parameters
  3. Statistical Tests used
  4. Results obtained
  5. JAVA code for the methods used

Abstract

Nearest prototype methods are a successful approach to many pattern classification tasks. However, they present several shortcomings, such as response time, noise sensitivity and storage requirements. Data reduction techniques are suitable for alleviating these drawbacks. Prototype generation is an appropriate process for data reduction, which allows the fitting of a data set for nearest neighbor classification.

This paper presents a methodology to iteratively learn the positioning of prototypes using real-parameter optimization procedures. Concretely, we propose an iterative prototype adjustment technique based on differential evolution (IPADE). The results obtained are contrasted with nonparametric statistical tests and show that our proposal consistently outperforms previously proposed methods, thus becoming a suitable tool for enhancing the performance of the nearest neighbor classifier.
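As a purely illustrative aid (not the exact IPADE procedure, and with hypothetical class and method names), the following Java sketch shows a single DE/rand/1/bin step applied to a candidate solution encoded as a flat array of real-valued prototype attributes.

```java
import java.util.Random;

/** Minimal sketch of one DE/rand/1/bin step over a real-coded solution. */
public class DEStep {

    /**
     * Generates a trial vector from a target vector and three distinct,
     * randomly chosen population members (a, b, c).
     *
     * @param target current solution (e.g., flattened prototype positions)
     * @param f      scale factor (mutation strength)
     * @param cr     crossover rate
     */
    public static double[] trialVector(double[] target, double[] a, double[] b,
                                       double[] c, double f, double cr, Random rnd) {
        int d = target.length;
        double[] trial = new double[d];
        int jRand = rnd.nextInt(d);                    // guarantees at least one mutated gene
        for (int j = 0; j < d; j++) {
            if (rnd.nextDouble() < cr || j == jRand) {
                trial[j] = a[j] + f * (b[j] - c[j]);   // DE/rand/1 mutation
            } else {
                trial[j] = target[j];                  // inherit from the target vector
            }
        }
        return trial;
    }
}
```

In a prototype adjustment setting, the trial vector would be decoded back into a prototype set and kept only if it does not degrade the 1-NN accuracy over the training set.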

Experimental framework

Data-sets partitions employed in the paper

For the experimental study, we have selected fifty data sets from the KEEL-dataset repository. In all the experiments, we have adopted a 10-fold cross-validation model, i.e., we have split each data set randomly into 10 folds, each containing 10% of the patterns of the data set. Thus, nine folds have been used for training and one for testing.
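As a reference, the following is a minimal Java sketch (a hypothetical helper, not part of the KEEL code) of how such a random 10-fold partition of pattern indices can be produced.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Minimal sketch: random split of pattern indices into 10 folds. */
public class FoldSplitter {
    public static List<List<Integer>> tenFolds(int numPatterns, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < numPatterns; i++) indices.add(i);
        Collections.shuffle(indices, new Random(seed));    // random partition of the patterns
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < 10; f++) folds.add(new ArrayList<>());
        for (int i = 0; i < numPatterns; i++) {
            folds.get(i % 10).add(indices.get(i));          // roughly 10% of the patterns per fold
        }
        return folds;   // fold f acts as the test set; the other nine form the training set
    }
}
```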

Table 1 summarizes the properties of the selected data sets. It shows, for each data set, the number of examples (#Ex.), the number of attributes (#Atts.) and the number of classes (#Cl.). The 10-fold cross-validation partitions for each data set, in KEEL format, can be downloaded from the KEEL-dataset repository.

Table 1. Summary Description for the data sets used

Data set       #Ex.   #Atts.  #Cl.      Data set       #Ex.   #Atts.  #Cl.
Abalone        4174   8       28        led7digit      500    7       10
Appendicitis   106    7       2         Lymphography   148    18      4
Australian     690    14      2         Magic          19020  10      2
Balance        625    4       3         Mammography    830    5       2
Banana         5300   2       2         Marketing      8993   13      9
Bands          539    19      2         Monk-2         432    6       2
Breast         286    9       2         New Thyroid    215    5       3
Bupa           345    6       2         Page-blocks    5473   10      5
Car            1728   6       4         Pima           768    8       2
Chess          3196   36      2         Ring           7400   20      2
Cleveland      303    13      5         Saheart        462    9       2
coil2000       9822   85      2         Satimage       6435   36      7
Contraceptive  1473   9       3         Segment        2310   19      7
Crx            653    15      2         Sonar          208    60      2
Dermatology    366    34      6         Spectfheart    267    44      2
Ecoli          336    7       8         Splice         3190   60      3
Flare          1066   11      6         Tae            151    5       3
German         1000   20      2         Thyroid        7200   21      3
Glass          214    9       7         Tic-tac-toe    958    9       2
Haberman       306    3       2         Titanic        2201   3       2
Hayes-roth     160    4       3         Twonorm        7400   20      2
Heart          270    13      2         Wine           178    13      3
Hepatitis      80     19      2         Wisconsin      699    9       2
Housevotes     435    16      2         Yeast          1484   8       10
Iris           150    4       3         Zoo            101    16      7

Algorithms and parameters

Several prototype generation methods have been selected to perform an exhaustive study of the capabilities of IPADE. We briefly describe them here:

1-NN

The 1-NN rule is used as a baseline performance limit that most of the remaining methods should overcome.
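For reference, a minimal Java sketch of the 1-NN rule with Euclidean distance (hypothetical class and method names):

```java
/** Minimal sketch of the 1-NN rule with Euclidean distance. */
public class OneNN {

    /** Returns the class label of the training pattern closest to the query. */
    public static int classify(double[][] trainX, int[] trainY, double[] query) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < trainX.length; i++) {
            double dist = 0.0;
            for (int j = 0; j < query.length; j++) {
                double diff = trainX[i][j] - query[j];
                dist += diff * diff;               // squared Euclidean distance
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return trainY[best];                        // label of the nearest neighbor
    }
}
```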

RSP3

J. S. Sanchez, ''High training set size reduction by space partitioning and prototype abstraction,'' Pattern Recognition, vol. 37, no. 7, pp. 1561--1564, 2004.

This technique is based on Chen's algorithm, and it tries to avoid the drastic changes in the form of the decision boundaries associated with the training set, which are the main shortcoming of Chen's algorithm.

MixtGauss

M. Lozano, J. M. Sotoca, J. S. Sanchez, F. Pla, E. Pekalska and R. P. W. Duin, ''Experimental study on prototype optimisation algorithms for prototype-based classification in vector spaces,'' Pattern Recognition, vol. 39, no. 10, pp. 1827--1838, 2006.

It is an adaptive PG method formulated in the framework of mixture modeling by Gaussian distributions, assuming statistical independence of features.

ENPC

F. Fernandez and P. Isasi, ''Evolutionary design of nearest prototype classifiers,'' Journal of Heuristics, vol. 10, no. 4, pp. 431--454, 2004.

Usually, genetic algorithms are very dependent on some crucial parameters that have to be found by a trial-and-error process. One of the most important characteristics of this method is its lack of initial conditions: ENPC does not need any parameter.

Initially, this method selects a random prototype from TR, which is introduced into GS. Next, until the end condition is reached, five different operators are applied to GS. These operators focus their attention on defining regions in the search space.

First of all, ENPC obtains information about the search, i.e., it checks the quality of GS with regard to TR using different measures based on the NN rule.

Then, GS undergoes a mutation phase where prototypes are re-labeled with the most populated class in each region of the search space. In order to refine GS, a reproduction operator is introduced to generate new prototypes. Each prototype p from GS has the opportunity of introducing a new prototype to increase its own quality, with the objective of defining different regions. Next, a fight operator is able to refine the regions. This step does not modify the prototypes of GS; it is only used to redefine the different groups that can be found in GS. When the different regions have been established, ENPC executes the move operator, which consists of relocating each prototype to the best expected place, i.e., the centroid of its region. Finally, a die operator is used to remove those prototypes of GS that are not relevant.
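To illustrate the move operator described above, the following sketch relocates each prototype to the centroid of the training patterns that it currently owns under the 1-NN rule. It is a simplified, hypothetical sketch; the remaining ENPC operators are omitted.

```java
/** Sketch of the ENPC-style move operator: shift each prototype to the
 *  centroid of the training patterns whose nearest prototype it is. */
public class MoveOperatorSketch {

    public static void moveToCentroids(double[][] prototypes, double[][] trainX) {
        int d = trainX[0].length;
        double[][] sums = new double[prototypes.length][d];
        int[] counts = new int[prototypes.length];

        for (double[] x : trainX) {
            int owner = nearestPrototype(prototypes, x);   // 1-NN assignment of the pattern
            for (int j = 0; j < d; j++) sums[owner][j] += x[j];
            counts[owner]++;
        }
        for (int p = 0; p < prototypes.length; p++) {
            if (counts[p] == 0) continue;                  // empty region: leave the prototype unchanged
            for (int j = 0; j < d; j++) prototypes[p][j] = sums[p][j] / counts[p];
        }
    }

    private static int nearestPrototype(double[][] prototypes, double[] x) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int p = 0; p < prototypes.length; p++) {
            double dist = 0.0;
            for (int j = 0; j < x.length; j++) {
                double diff = prototypes[p][j] - x[j];
                dist += diff * diff;
            }
            if (dist < bestDist) { bestDist = dist; best = p; }
        }
        return best;
    }
}
```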

AMPSO

A. Cervantes, I. M. Galvan and P. Isasi, ''AMPSO: A new particle swarm method for nearest neighborhood classification,'' IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, vol. 39, no. 5, pp. 1082--1091, 2009.

AMPSO is a PSO algorithm with a particular way of encoding the solution into the system. AMPSO builds the population with the prototypes of the solution, i.e., each particle is a prototype. Next, PSO rules guide the optimization of the positioning. At each iteration, the position of each prototype is updated with the movement equations, which are neighborhood-based.

This method has a particular way of understanding the neighborhood. Specifically, for each class two different sets of prototypes are established, competing and non-competing sets, which are defined as follows:

For each prototype of class Ci, non-competing prototypes are all the prototypes of classes Cj != Ci that currently classify at least one pattern of TR of class Ci.

For each prototype of class Ci, competing prototypes are all the prototypes of class Ci that currently classify at least one pattern of TR of class Ci.

The position of each prototype is attracted by the closest non-competing prototype, and repelled by the closest competing prototype. Other parameters also determine the movement.
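A simplified sketch of this attraction/repulsion movement for a single prototype is shown below. The full PSO velocity update of the original paper (inertia, personal and social terms, Vmax clamping) is reduced here to two hypothetical attraction and repulsion factors.

```java
import java.util.Random;

/** Simplified sketch of an AMPSO-style position update for one prototype:
 *  attraction towards the closest non-competing prototype and repulsion
 *  from the closest competing prototype. */
public class AttractionRepulsionSketch {
    public static void updatePosition(double[] position,
                                      double[] closestNonCompeting,
                                      double[] closestCompeting,
                                      double attraction, double repulsion,
                                      Random rnd) {
        for (int j = 0; j < position.length; j++) {
            double pull = attraction * rnd.nextDouble() * (closestNonCompeting[j] - position[j]);
            double push = repulsion  * rnd.nextDouble() * (closestCompeting[j] - position[j]);
            position[j] += pull - push;   // attracted by one set, repelled by the other
        }
    }
}
```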

LVQPRU

J. Li, M. T. Manry, C. Yu and D. R. Wilson, ''Prototype classifier design with pruning,'' International Journal on Artificial Intelligence Tools, vol. 14, no. 1-2, pp. 261--280, 2005.

LVQPRU extends LVQ by using a pruning step to remove noisy instances. Concretely, it uses LVQ2.1 in order to refine the positioning of the prototypes, and it prunes those prototypes whose removal causes the least increase in classification error.

HYB

S.-W. Kim and B. J. Oommen, ''Enhancing prototype reduction schemes with LVQ3-type algorithms,'' Pattern Recognition, vol. 36, no. 5, pp. 1083--1093, 2003.

It constitutes a hybridization of several prototype reduction techniques. Concretely, HYB combines Support Vector Machines with LVQ3 and executes a search in order to find the most appropriate parameters of LVQ3.

LVQ3

T. Kohonen, ''The self-organizing map,'' Proceedings of the IEEE, vol. 78, no. 9, pp. 1464--1480, 1990.

Learning Vector Quantization can be understood as a special case of artificial neural network in which each neuron corresponds to a prototype, and a weight-based competition is carried out in order to locate each neuron in a concrete place of the m-dimensional space so as to increase the classification accuracy.
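For reference, the following sketch shows the classical LVQ3-style update for one training pattern, given its two nearest prototypes; α is the learning rate and ε the scaling factor used when both prototypes share the class of the pattern. The window test of the original rule is assumed to be performed by the caller, and the class and method names are hypothetical.

```java
/** Minimal sketch of the LVQ3 update rule for one training pattern,
 *  given its two nearest prototypes (window test assumed done by the caller). */
public class LVQ3UpdateSketch {
    public static void update(double[] x, int xClass,
                              double[] p1, int p1Class,
                              double[] p2, int p2Class,
                              double alpha, double epsilon) {
        if (p1Class == xClass && p2Class == xClass) {
            // both prototypes correct: move both slightly towards the pattern
            moveTowards(p1, x, epsilon * alpha);
            moveTowards(p2, x, epsilon * alpha);
        } else if (p1Class == xClass && p2Class != xClass) {
            moveTowards(p1, x, alpha);     // correct prototype attracted
            moveTowards(p2, x, -alpha);    // wrong prototype repelled
        } else if (p2Class == xClass && p1Class != xClass) {
            moveTowards(p2, x, alpha);
            moveTowards(p1, x, -alpha);
        }
        // if neither prototype matches the class, no update is performed
    }

    private static void moveTowards(double[] p, double[] x, double rate) {
        for (int j = 0; j < p.length; j++) p[j] += rate * (x[j] - p[j]);
    }
}
```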

The configuration parameters for all the algorithms of this study are shown in Table 2. We have selected common values for all problems, focusing this experimentation on the parameters recommended by the respective authors, under the assumption that those parameter values were chosen appropriately.

Table 2. Parameter specification for the algorithms employed in the experimentation.

Algorithm Parameters
IPADE Iterations of Basic DE = 300/500/1000, iterSFGSS = 8, iterSFHC = 20, Fl = 0.1, Fu = 0.9
RSP3 Subset Choice = Diameter
MixtGauss Reduction Rate = 0.95
ENPC Iterations = 300/500/1000
AMPSO Iterations = 300/500/1000, C1 = 1.0, C2 = 1.0, C3 = 0.25, Vmax = 1, W = 0.1, X = 0.5, Pr = 0.1, Pd = 0.1
LVQPRU Iterations = 300/500/1000, α = 0.1, WindowWidth = 0.5
HYB Search Iterations = 300/500/1000, Optimal Search Iterations = 1000, α = 0.1, Initial ε = 0, Final ε = 0.5, Initial WindowWidth = 0, Final WindowWidth = 0.5, δ = 0.1, delta WindowWidth = 0.1, Initial Selection = SVM
LVQ3 Iterations = 300/500/1000, α = 0.1, WindowWidth = 0.2, ε = 0.1

Statistical tests used

In this paper, we use hypothesis testing techniques to provide statistical support for the analysis of the results. Specifically, we use non-parametric tests, because the initial conditions that guarantee the reliability of parametric tests may not be satisfied, causing the statistical analysis to lose credibility with those parametric tests. The use of these tests is recommended in the field of machine learning.

In order to perform multiple comparisons between our proposal and the rest of the techniques considered, we will use the Friedman Aligned-Ranks test to detect statistical differences among a group of results, and the Holm post-hoc test to find out which algorithms are distinctive among the 1 × n comparisons performed.

The Friedman test is based on n sets of ranks, one set for each data set in our case; and the performances of the algorithms analyzed are ranked separately for each data set. Such a ranking scheme allows for intra-set comparisons only, since inter-set comparisons are not meaningful. When the number of algorithms for comparison is small, this may pose a disadvantage. In such cases, comparability among data sets is desirable and we can employ the method of aligned ranks.

In this technique, a value of location is computed as the average performance achieved by all algorithms on each data set. Then, the difference between the performance obtained by an algorithm and this value of location is calculated. This step is repeated for each combination of algorithm and data set. The resulting differences, called aligned observations, which keep their identities with respect to the data set and the algorithm to which they belong, are then ranked from 1 to kn relative to each other. The ranking scheme is therefore the same as that employed by a multiple comparison procedure which uses independent samples, such as the Kruskal-Wallis test. The ranks assigned to the aligned observations are called aligned ranks.

The Friedman aligned ranks test statistic can be written as:

$$T=\frac{(k-1)\left[\sum_{j=1}^{k}\widehat{R}_{.j}^{2}-\frac{kn^{2}}{4}(kn+1)^{2}\right]}{\frac{kn(kn+1)(2kn+1)}{6}-\frac{1}{k}\sum_{i=1}^{n}\widehat{R}_{i.}^{2}}$$

where $\widehat{R}_{i.}$ is equal to the rank total of the $i$th data set and $\widehat{R}_{.j}$ is the rank total of the $j$th algorithm.

The test statistic T is compared for significance with a chi-square distribution for k - 1 degrees of freedom. Critical values can be found at Table A3 in (Sheskin, D.J.: Handbook of Parametric and Nonparametric Statistical Procedures. CRC Press, Boca Raton (2006)). Furthermore, the p-value could be computed through normal approximations (Abramowitz, M.: Handbook of Mathematical Functions, With Formulas, Graphs, and Mathematical Tables. Dover, New York (1974)). If the null hypothesis is rejected, we can proceed with a post hoc test.
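The following sketch (a hypothetical helper, included only to illustrate the formula above) computes the aligned ranks and the statistic T from an n × k matrix of performance results; for simplicity, ties receive ordinal ranks rather than average ranks.

```java
import java.util.Arrays;

/** Sketch: Friedman Aligned Ranks statistic T from an n x k results matrix
 *  (n data sets, k algorithms). Ties are given simple ordinal ranks here. */
public class AlignedRanksSketch {
    public static double statistic(double[][] results) {
        int n = results.length, k = results[0].length;

        // aligned observations: result minus the data-set location (row mean)
        double[] aligned = new double[n * k];
        for (int i = 0; i < n; i++) {
            double location = Arrays.stream(results[i]).average().orElse(0.0);
            for (int j = 0; j < k; j++) aligned[i * k + j] = results[i][j] - location;
        }

        // rank all kn aligned observations from 1 to kn
        Integer[] order = new Integer[n * k];
        for (int t = 0; t < order.length; t++) order[t] = t;
        Arrays.sort(order, (a, b) -> Double.compare(aligned[a], aligned[b]));
        double[] rank = new double[n * k];
        for (int r = 0; r < order.length; r++) rank[order[r]] = r + 1;

        // rank totals per algorithm (R.j) and per data set (Ri.)
        double sumAlgSq = 0.0, sumSetSq = 0.0;
        for (int j = 0; j < k; j++) {
            double rj = 0.0;
            for (int i = 0; i < n; i++) rj += rank[i * k + j];
            sumAlgSq += rj * rj;
        }
        for (int i = 0; i < n; i++) {
            double ri = 0.0;
            for (int j = 0; j < k; j++) ri += rank[i * k + j];
            sumSetSq += ri * ri;
        }

        double kn = (double) k * n;
        double numerator = (k - 1) * (sumAlgSq - (k * n * n / 4.0) * Math.pow(kn + 1, 2));
        double denominator = kn * (kn + 1) * (2 * kn + 1) / 6.0 - sumSetSq / k;
        return numerator / denominator;   // compared with a chi-square with k - 1 d.o.f.
    }
}
```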

We focus on the comparison between a control method, which is usually the proposed method, and a set of algorithms used in the empirical study. This set of comparisons is associated with a set or family of hypotheses, all of which are related to the control method. Any of the post hoc tests is suitable for application to nonparametric tests working over a family of hypotheses. The test statistic for comparing the ith algorithm and the jth algorithm depends on the main nonparametric procedure used. In this case, it depends on the Friedman Aligned Ranks test:

Since the set of related rankings is converted to absolute rankings, the expression for computing the test statistic in Friedman Aligned Ranks is the same as that used by the Kruskal–Wallis test (Zar, J.H.: Biostatistical Analysis. Prentice Hall, Englewood Cliffs (1999)):

$$z=(\widehat{R_i}-\widehat{R_j})/\sqrt{\frac{k(n+1)}{6}}$$

where $\widehat{R}_i$ and $\widehat{R}_j$ are the average rankings, according to Friedman Aligned Ranks, of the algorithms compared.

In statistical hypothesis testing, the p-value is the probability of obtaining a result at least as extreme as the one that was actually observed, assuming that the null hypothesis is true. It is a useful and interesting datum for many consumers of statistical analysis. A p-value provides information about whether a statistical hypothesis test is significant or not, and it also indicates something about ‘‘how significant” the result is: the smaller the p-value, the stronger the evidence against the null hypothesis. Most importantly, it does this without committing to a particular level of significance.

When a p-value is considered in a multiple comparison, it reflects the probability error of a certain comparison, but it does not take into account the remaining comparisons belonging to the family. If one is comparing k algorithms and in each comparison the level of significance is α, then in a single comparison the probability of not making a Type I error is (1 − α), and the probability of not making a Type I error in the k − 1 comparisons is (1 − α)^(k − 1). Hence, the probability of making one or more Type I errors is 1 − (1 − α)^(k − 1).
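For instance, with the nine methods considered in this study (k = 9) and α = 0.05, the probability of making at least one Type I error across the eight comparisons against the control method, if no correction is applied, is 1 − 0.95^8 ≈ 0.34.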

One way to solve this problem is to report adjusted p-values (APVs) which take into account that multiple tests are conducted. An APV can be compared directly with any chosen significance level α. We recommend the use of APVs due to the fact that they provide more information in a statistical analysis.

The z value in all cases is used to find the corresponding probability (p-value) from the table of the normal distribution N(0,1), which is then compared with an appropriate level of significance α (Table A1 in Sheskin, D.J.: Handbook of Parametric and Nonparametric Statistical Procedures. CRC Press, Boca Raton (2006)). The post hoc tests differ in the way they adjust the value of α to compensate for multiple comparisons.

Next, we will define a candidate set of post hoc procedures and we will explain how to compute the APVs depending on the post hoc procedure used in the analysis, following the indications given in P.H. Westfall, S.S. Young, Resampling-based Multiple Testing: Examples and Methods for p-Value Adjustment, John Wiley and Sons (2004). The notation used in the computation of the APVs is as follows:

  • Indexes i and j each correspond to a concrete comparison or hypothesis in the family of hypotheses, according to an incremental order of their p-values. Index i always refers to the hypothesis in question whose APV is being computed and index j refers to another hypothesis in the family.
  • pj is the p-value obtained for the jth hypothesis.
  • k is the number of classifiers being compared.

The Holm procedure (S. Holm, A simple sequentially rejective multiple test procedure, Scandinavian Journal of Statistics 6 (1979) 65–70): it adjusts the value of α in a step-down manner. Let p1, p2, ..., pk−1 be the ordered p-values (smallest to largest), so that p1 ≤ p2 ≤ ... ≤ pk−1, and let H1, H2, ..., Hk−1 be the corresponding hypotheses. The Holm procedure rejects H1 to Hi−1 if i is the smallest integer such that pi > α/(k − i). Holm's step-down procedure starts with the most significant p-value. If p1 is below α/(k − 1), the corresponding hypothesis is rejected and we are allowed to compare p2 with α/(k − 2). If the second hypothesis is rejected, the test proceeds with the third, and so on. As soon as a certain null hypothesis cannot be rejected, all the remaining hypotheses are retained as well. Holm APVi: min{v, 1}, where v = max{(k − j)pj : 1 ≤ j ≤ i}.
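A minimal sketch of this computation, taking the raw p-values of the k − 1 comparisons against the control method as input (a hypothetical helper, not the implementation used in the paper):

```java
import java.util.Arrays;

/** Sketch: Holm adjusted p-values for k - 1 comparisons against a control method. */
public class HolmAPVSketch {
    /** @param pValues raw p-values of the k - 1 hypotheses (any order)
     *  @param k       number of classifiers compared (control included) */
    public static double[] adjustedPValues(double[] pValues, int k) {
        double[] sorted = pValues.clone();
        Arrays.sort(sorted);                       // p1 <= p2 <= ... <= p(k-1)
        double[] apv = new double[sorted.length];
        double runningMax = 0.0;
        for (int i = 0; i < sorted.length; i++) {
            // APV_i = min{ max_{j<=i} (k - j) * p_j , 1 }, with j counted from 1
            runningMax = Math.max(runningMax, (k - (i + 1)) * sorted[i]);
            apv[i] = Math.min(runningMax, 1.0);
        }
        return apv;
    }
}
```

Each adjusted p-value can then be compared directly with the chosen significance level α.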

More information about these tests and other statistical procedures can be found at SICIDM.

Results obtained

In this section, we report the full results obtained in the experimental study. For each pair data set/algorithm, we report its average performance results (using accuracy, reduction rate and Acc*Red) and standard deviations. Furthermore, we have carried out a preliminary study, for each method that depends on the number of iterations performed, with 300, 500 and 1000 iterations on all the data sets. This parameter can be very sensitive to the problem tackled: an excessive number of iterations may produce overfitting on some problems, and a lower number of iterations may not be enough to tackle other data sets. Finally, we present the best-performing number of iterations for each method and data set.

Accuracy Training

(Table too large to be displayed here.)

Accuracy Test

(Table too large to be displayed here.)

Reduction Rate

(Table too large to be displayed here.)

Acc*Red

(Table too large to be displayed here.)

Runtime

In this section we will analyze the average time elapsed (in seconds) by every PG method in each complete execution (no times are given for 1-NN, since it does not perform a data reduction phase). The characteristics of the machine used in the experimental study are:

Processor: Intel(R) Core(TM) i7 CPU 920 @ 2.67 GHz

Cache size: 8192 KB

CPU cores: 4

Total memory (MemTotal): 12322668 kB

(Table too large to be displayed here.)

Summary

In this section we present a comparison between the different algorithms, taking the best result obtained for each data set depending on the number of iterations.

Table 8. Test accuracy of the best results obtained for each method with the different data sets.

  1NN IPADE best RSP3 MixtGauss ENPC best AMPSO best LVQPRU best HYB best LVQ3 best
Data set Acc. Acc. Acc. Acc. Acc. Acc. Acc. Acc. Acc.
appendicitis 0.7936 0.8873 0.8018 0.8318 0.7945 0.8609 0.8400 0.7645 0.8973
australian 0.8145 0.8652 0.8290 0.8304 0.7797 0.8420 0.8232 0.7203 0.8275
bal 0.7904 0.8528 0.8432 0.8735 0.7150 0.6032 0.8367 0.7025 0.7985
bands 0.6309 0.6921 0.6940 0.6810 0.7292 0.6606 0.7032 0.7126 0.6700
bre 0.6535 0.7159 0.6317 0.6293 0.5943 0.6951 0.6628 0.6083 0.7098
bupa 0.6108 0.7039 0.5772 0.6126 0.6170 0.6044 0.6086 0.5943 0.5896
car 0.8565 0.8253 0.8466 0.7188 0.8096 0.7089 0.8629 0.8356 0.8056
cleveland 0.5314 0.5678 0.5311 0.5349 0.5380 0.5681 0.5880 0.5116 0.5509
contraceptive 0.4277 0.5479 0.4406 0.4195 0.4271 0.4474 0.4508 0.4033 0.4446
crx 0.7957 0.8536 0.8348 0.8116 0.7754 0.8464 0.8290 0.7333 0.8319
dermatology 0.9535 0.9620 0.9536 0.9590 0.9372 0.8993 0.9566 0.9454 0.9563
ecoli 0.8070 0.8010 0.8157 0.7294 0.7650 0.7117 0.7406 0.7769 0.7715
flare-solar 0.5554 0.6679 0.5517 0.6679 0.6614 0.6548 0.6209 0.6108 0.6304
german 0.7050 0.7360 0.6990 0.6750 0.6430 0.6390 0.6560 0.6590 0.6990
glass 0.7361 0.6552 0.6711 0.4812 0.7185 0.5534 0.6613 0.7425 0.5725
haberman 0.6697 0.7413 0.6666 0.7545 0.6277 0.7414 0.7215 0.5230 0.6930
hayes-roth 0.3570 0.8216 0.2648 0.5536 0.3653 0.5648 0.4943 0.3327 0.5220
heart 0.7704 0.8481 0.7259 0.8000 0.7741 0.7963 0.8148 0.7333 0.7778
hepatitis 0.8075 0.8400 0.7546 0.7804 0.7550 0.8121 0.7946 0.7488 0.7804
housevotes 0.9216 0.9378 0.9053 0.8872 0.9008 0.8891 0.9145 0.9056 0.9057
iris 0.9333 0.9600 0.9400 0.9333 0.9400 0.9467 0.9533 0.9333 0.9000
led7digit 0.4020 0.7380 0.5680 0.7360 0.6380 0.7140 0.6920 0.6760 0.6880
lym 0.7387 0.7915 0.7730 0.6696 0.7177 0.6977 0.7630 0.7779 0.7633
mammographic 0.7368 0.8200 0.7409 0.7867 0.7243 0.7857 0.7503 0.5807 0.7170
monks 0.7791 0.9681 0.7092 0.4883 0.7455 0.7224 0.7558 0.7713 0.7649
newthyroid 0.9723 0.9675 0.9541 0.9258 0.9673 0.9307 0.9494 0.9582 0.9216
pima 0.7033 0.7618 0.7214 0.7358 0.6577 0.7241 0.7240 0.6538 0.7150
saheart 0.6449 0.7208 0.6492 0.6279 0.6099 0.6902 0.6559 0.5973 0.6948
sonar 0.8555 0.7783 0.8264 0.7071 0.9086 0.6674 0.8219 0.8698 0.7543
spectfheart 0.6970 0.7940 0.7422 0.7640 0.7568 0.7983 0.7534 0.7608 0.7793
tae 0.4050 0.5763 0.4654 0.3258 0.4388 0.5254 0.4379 0.5188 0.5308
tic-tac-toe 0.7307 0.7495 0.7662 0.6711 0.7640 0.6900 0.7400 0.7922 0.7109
wine 0.9552 0.9552 0.9324 0.9660 0.9608 0.9275 0.9663 0.9663 0.9663
wisconsin 0.9557 0.9699 0.9485 0.9714 0.9357 0.9599 0.9642 0.9414 0.9571
yeast 0.5047 0.5782 0.5405 0.5209 0.4832 0.4866 0.5263 0.5135 0.5223
zoo 0.9281 0.9767 0.8892 0.7406 0.7561 0.9139 0.8617 0.9364 0.9264
abalone 0.1991 0.2132 0.2350 0.1907 0.1979 0.2041 0.2298 0.2010 0.2020
banana 0.8751 0.8345 0.8755 0.6260 0.8566 0.8106 0.8789 0.8591 0.8657
chess 0.8470 0.9014 0.8786 0.6152 0.8454 0.7728 0.8292 0.8279 0.7819
coil2000 0.8963 0.9400 0.8954 0.9391 0.8692 0.9384 0.9059 0.7819 0.9256
magic 0.8059 0.7968 0.8082 0.7799 0.7734 0.7968 0.7843 0.6513 0.7768
marketing 0.2738 0.2918 0.2786 0.2859 0.2560 0.2562 0.2704 0.2353 0.2472
page-blocks 0.9576 0.9322 0.8107 0.8915 0.9534 0.9102 0.9424 0.9006 0.9238
ring 0.7524 0.8600 0.9896 0.7577 0.8811 0.7331 0.7051 0.7277 0.9218
satimage 0.9058 0.8159 0.8719 0.8418 0.9005 0.8001 0.8828 0.8914 0.8632
segment 0.9662 0.9056 0.7547 0.8506 0.9580 0.7664 0.9485 0.9545 0.9035
splice 0.7495 0.7536 0.9506 0.6266 0.8006 0.6209 0.7107 0.8135 0.8481
thyroid 0.9258 0.9401 0.7997 0.9096 0.9031 0.9205 0.9182 0.7482 0.9081
titanic 0.6075 0.7892 0.9838 0.7556 0.7792 0.7889 0.7044 0.7756 0.8885
twonorm 0.9468 0.9777 0.9154 0.9761 0.9450 0.9461 0.9469 0.9423 0.9508
AVERAGE 0.7368 0.7916 0.7451 0.7170 0.7370 0.7309 0.7511 0.7224 0.7551


Table 9. Reduction rate of the best results obtained for each method with the different data sets.

  1NN IPADE best RSP3 MixtGauss ENPC best AMPSO best LVQPRU best HYB best LVQ3 best
Data set Red. Red. Red. Red. Red. Red. Red. Red. Red.
appendicitis 0.0000 0.9738 0.8344 0.9581 0.6823 0.9371 0.9161 0.5996 0.9476
australian 0.0000 0.9955 0.8341 0.9517 0.6976 0.9485 0.9034 0.6462 0.9501
bal 0.0000 0.9909 0.8158 0.9520 0.7312 0.9467 0.9042 0.1340 0.9502
bands 0.0000 0.9932 0.5510 0.9505 0.7401 0.9464 0.9027 0.3129 0.9505
bre 0.0000 0.9872 0.6857 0.9534 0.5381 0.9456 0.9064 0.2952 0.9534
bupa 0.0000 0.9878 0.5014 0.9549 0.5839 0.8520 0.9549 0.3955 0.9517
car 0.0000 0.9946 0.7359 0.9511 0.8968 0.9486 0.9022 0.1697 0.9505
cleveland 0.0000 0.9725 0.7712 0.9633 0.5024 0.8759 0.9039 0.6124 0.9490
contraceptive 0.0000 0.9955 0.6544 0.9502 0.3644 0.8548 0.9011 0.3553 0.9502
crx 0.0000 0.9955 0.8419 0.9517 0.6958 0.9354 0.9031 0.1767 0.9501
dermatology 0.0000 0.9794 0.8673 0.9636 0.9359 0.9396 0.9523 0.5533 0.9493
ecoli 0.0000 0.9663 0.7440 0.9735 0.6644 0.8925 0.9031 0.7046 0.9372
flare-solar 0.0000 0.9976 0.9628 0.9512 0.9268 0.9491 0.9015 0.3059 0.9506
german 0.0000 0.9954 0.6203 0.9511 0.7237 0.9435 0.9004 0.9251 0.9500
glass 0.0000 0.9548 0.6948 0.9688 0.6122 0.9377 0.9045 0.4068 0.9377
haberman 0.0000 0.9855 0.7509 0.9564 0.5225 0.8629 0.9546 0.0083 0.9528
hayes-roth 0.0000 0.9520 0.6271 0.9723 0.6406 0.9136 0.9200 0.4855 0.9571
heart 0.0000 0.9868 0.7626 0.9506 0.7119 0.9362 0.9091 0.4554 0.9506
hepatitis 0.0000 0.9713 0.7699 0.9570 0.7914 0.8658 0.9111 0.5572 0.9534
housevotes 0.0000 0.9903 0.8631 0.9540 0.9111 0.9489 0.9017 0.5469 0.9515
iris 0.0000 0.9770 0.8785 0.9556 0.8837 0.9333 0.9600 0.8077 0.9556
led7digit 0.0000 0.9751 0.8882 0.9556 0.8487 0.8570 0.9562 0.0807 0.9511
lym 0.0000 0.9662 0.6872 0.9700 0.7635 0.9200 0.8972 0.8193 0.9399
mammographic 0.0000 0.9957 0.8001 0.9514 0.6319 0.8844 0.9015 0.5023 0.9503
monks 0.0000 0.9892 0.5144 0.9537 0.7739 0.9486 0.9028 0.5752 0.9511
newthyroid 0.0000 0.9835 0.8491 0.9535 0.8630 0.8742 0.9530 0.4092 0.9535
pima 0.0000 0.9959 0.6793 0.9508 0.6567 0.8760 0.9013 0.4010 0.9508
saheart 0.0000 0.9899 0.6792 0.9519 0.6308 0.8830 0.9026 0.3378 0.9519
sonar 0.0000 0.9770 0.6010 0.9573 0.8478 0.9466 0.9054 0.3333 0.9519
spectfheart 0.0000 0.9863 0.6979 0.9501 0.8177 0.9417 0.9051 0.5481 0.9501
tae 0.0000 0.9603 0.7005 0.9558 0.4629 0.9338 0.9169 0.6151 0.9558
tic-tac-toe 0.0000 0.9935 0.7610 0.9513 0.7866 0.9490 0.9016 0.3054 0.9501
wine 0.0000 0.9800 0.8489 0.9625 0.9245 0.8612 0.9089 0.3347 0.9501
wisconsin 0.0000 0.9965 0.9488 0.9523 0.9258 0.9357 0.9030 0.4332 0.9507
yeast 0.0000 0.9897 0.6602 0.9551 0.4894 0.9476 0.9024 0.8461 0.9491
zoo 0.0000 0.9121 0.8490 0.9229 0.8672 0.9229 0.9143 0.0213 0.9130
abalone 0.0000 0.9916 0.7610 0.9561 0.2567 0.8294 0.9001 0.6083 0.9475
banana 0.0000 0.9984 0.8962 0.9501 0.7485 0.9799 0.9004 0.0000 0.9501
chess 0.0000 0.9985 0.6362 0.9506 0.8920 0.8791 0.9004 0.1049 0.9503
coil2000 0.0000 0.9997 0.9396 0.9500 0.8341 0.9429 0.9802 0.8139 0.9500
magic 0.0000 0.9998 0.7675 0.9501 0.7695 0.9468 0.9001 0.7988 0.9501
marketing 0.0000 0.9981 0.7828 0.9511 0.3658 0.9800 0.9005 0.5369 0.9501
page-blocks 0.0000 0.9984 0.5510 0.9503 0.8914 0.9797 0.9007 0.2363 0.9500
ring 0.0000 0.9983 0.9054 0.9502 0.9130 0.9799 0.9003 0.8372 0.9500
satimage 0.0000 0.9983 0.7982 0.9575 0.8952 0.9824 0.9006 0.4742 0.9501
segment 0.0000 0.9946 0.7181 0.9529 0.8399 0.9798 0.9028 0.5408 0.9505
splice 0.0000 0.9982 0.8589 0.9509 0.9185 0.9791 0.9005 0.5018 0.9502
thyroid 0.0000 0.9992 0.5110 0.9500 0.8593 0.9796 0.9003 0.6342 0.9500
titanic 0.0000 0.9985 0.8417 0.9505 0.9936 0.9798 0.9006 0.8593 0.9500
twonorm 0.0000 0.9994 0.9210 0.9502 0.9529 0.9799 0.9003 0.9935 0.9500
AVERAGE 0.0000 0.9861 0.7564 0.9541 0.7436 0.9279 0.9115 0.4791 0.9493


Table 10. Accuracy*Reduction rate of the best results obtained for each method with the different data sets.

  1NN IPADE best RSP3 MixtGauss ENPC best AMPSO best LVQPRU best HYB best LVQ3 best
Data set Acc*Red. Acc*Red. Acc*Red. Acc*Red. Acc*Red. Acc*Red. Acc*Red. Acc*Red. Acc*Red.
appendicitis 0.0000 0.8641 0.6690 0.7969 0.5421 0.8067 0.7695 0.4584 0.8503
australian 0.0000 0.8613 0.6915 0.7903 0.5439 0.7986 0.7437 0.4655 0.7862
bal 0.0000 0.8450 0.6879 0.8316 0.5228 0.5710 0.7565 0.0941 0.7587
bands 0.0000 0.6874 0.3824 0.6473 0.5397 0.6252 0.6348 0.2230 0.6368
bre 0.0000 0.7067 0.4332 0.6000 0.3198 0.6573 0.6008 0.1796 0.6767
bupa 0.0000 0.6953 0.2894 0.5850 0.3603 0.5149 0.5812 0.2350 0.5611
car 0.0000 0.8208 0.6230 0.6837 0.7260 0.6725 0.7785 0.1418 0.7657
cleveland 0.0000 0.5522 0.4096 0.5153 0.2703 0.4976 0.5315 0.3133 0.5228
contraceptive 0.0000 0.5454 0.2883 0.3986 0.1556 0.3824 0.4062 0.1433 0.4225
crx 0.0000 0.8498 0.7028 0.7724 0.5395 0.7917 0.7487 0.1296 0.7904
dermatology 0.0000 0.9422 0.8271 0.9241 0.8771 0.8450 0.9110 0.5231 0.9078
ecoli 0.0000 0.7740 0.6069 0.7101 0.5083 0.6352 0.6688 0.5474 0.7230
flare-solar 0.0000 0.6663 0.5312 0.6353 0.6130 0.6215 0.5597 0.1868 0.5993
german 0.0000 0.7326 0.4336 0.6420 0.4653 0.6029 0.5907 0.6096 0.6641
glass 0.0000 0.6256 0.4663 0.4662 0.4399 0.5189 0.5981 0.3020 0.5368
haberman 0.0000 0.7306 0.5005 0.7216 0.3280 0.6398 0.6887 0.0043 0.6603
hayes-roth 0.0000 0.7822 0.1661 0.5383 0.2340 0.5160 0.4548 0.1615 0.4996
heart 0.0000 0.8369 0.5536 0.7605 0.5511 0.7455 0.7407 0.3339 0.7394
hepatitis 0.0000 0.8159 0.5810 0.7468 0.5975 0.7031 0.7240 0.4172 0.7440
housevotes 0.0000 0.9287 0.7814 0.8464 0.8207 0.8437 0.8246 0.4953 0.8618
iris 0.0000 0.9379 0.8258 0.8919 0.8307 0.8836 0.9152 0.7538 0.8600
led7digit 0.0000 0.7196 0.5045 0.7033 0.5415 0.6119 0.6617 0.0546 0.6544
lym 0.0000 0.7647 0.5312 0.6495 0.5480 0.6419 0.6846 0.6373 0.7174
mammographic 0.0000 0.8165 0.5928 0.7485 0.4577 0.6949 0.6764 0.2917 0.6814
monks 0.0000 0.9576 0.3648 0.4657 0.5769 0.6853 0.6823 0.4437 0.7275
newthyroid 0.0000 0.9515 0.8101 0.8828 0.8348 0.8136 0.9048 0.3921 0.8787
pima 0.0000 0.7587 0.4900 0.6996 0.4319 0.6343 0.6525 0.2622 0.6798
saheart 0.0000 0.7135 0.4409 0.5977 0.3847 0.6094 0.5920 0.2018 0.6614
sonar 0.0000 0.7604 0.4967 0.6769 0.7703 0.6318 0.7441 0.2899 0.7180
spectfheart 0.0000 0.7831 0.5180 0.7259 0.6188 0.7518 0.6819 0.4170 0.7404
tae 0.0000 0.5534 0.3260 0.3114 0.2031 0.4906 0.4015 0.3191 0.5073
tic-tac-toe 0.0000 0.7446 0.5831 0.6384 0.6010 0.6548 0.6672 0.2419 0.6754
wine 0.0000 0.9361 0.7915 0.9298 0.8883 0.7988 0.8783 0.3234 0.9181
wisconsin 0.0000 0.9665 0.8999 0.9251 0.8663 0.8982 0.8707 0.4078 0.9099
yeast 0.0000 0.5722 0.3568 0.4975 0.2365 0.4611 0.4749 0.4345 0.4957
zoo 0.0000 0.8908 0.7549 0.6835 0.6557 0.8434 0.7879 0.0199 0.8458
abalone 0.0000 0.2114 0.1788 0.1823 0.0508 0.1693 0.2068 0.1223 0.1914
banana 0.0000 0.8332 0.7846 0.5948 0.6412 0.7943 0.7914 0.0000 0.8225
chess 0.0000 0.9000 0.5590 0.5848 0.7541 0.6794 0.7466 0.0868 0.7430
coil2000 0.0000 0.9397 0.8413 0.8921 0.7250 0.8848 0.8880 0.6364 0.8793
magic 0.0000 0.7966 0.6203 0.7410 0.5951 0.7544 0.7059 0.5203 0.7380
marketing 0.0000 0.2912 0.2181 0.2719 0.0936 0.2511 0.2435 0.1263 0.2349
page-blocks 0.0000 0.9307 0.4467 0.8472 0.8499 0.8917 0.8488 0.2128 0.8776
ring 0.0000 0.8585 0.8960 0.7200 0.8044 0.7184 0.6348 0.6092 0.8757
satimage 0.0000 0.8145 0.6960 0.8060 0.8061 0.7860 0.7950 0.4227 0.8201
segment 0.0000 0.9007 0.5420 0.8105 0.8046 0.7509 0.8563 0.5162 0.8588
splice 0.0000 0.7522 0.8165 0.5958 0.7354 0.6079 0.6400 0.4082 0.8059
thyroid 0.0000 0.9393 0.4086 0.8641 0.7760 0.9017 0.8267 0.4745 0.8627
titanic 0.0000 0.7880 0.8281 0.7182 0.7742 0.7730 0.6344 0.6665 0.8441
twonorm 0.0000 0.9771 0.8431 0.9275 0.9005 0.9271 0.8525 0.9362 0.9033
AVERAGE 0.0000 0.7805 0.5718 0.6839 0.5742 0.6797 0.6852 0.3439 0.7167

Comparison of IPADE with PSO and LPD techniques

In this section we provide a comparison with the PSO [1] and LPD [2] techniques. Concretely, we use the results reported in [1] (Table 8), where the authors compare PSO, LPD and 1-NN:

Table 11. Comparison between the IPADE, PSO and LPD techniques.

  PSO(5) LPD(5) 1-NN IPADE
Data set Acc. Std. Acc. Std. Acc. Std. Acc. Std.
Heart 0.8400 0.0200 0.7300 0.0310 0.7840 0.0290 0.8481 0.0785
Pima 0.7510 0.0060 0.7360 0.0010 0.6950 0.0290 0.7592 0.0388
Wisconsin 0.9660 0.0060 0.9650 0.0090 0.9560 0.0060 0.9685 0.0210
Wine 0.9600 0.0080 0.9550 0.0150 0.9470 0.0130 0.9493 0.0632
Spectfheart 0.7940 0.0140 0.7900 0.0350 0.6960 0.0290 0.7940 0.0631
AVERAGE 0.8622 0.0108 0.8352 0.0182 0.8156 0.0212 0.8638 0.0529

JAVA code for the methods used

In this section, we provide the source code of IPADE and the rest of techniques employed in the experimental framework.

Source code

The source code of the IPADE package can be downloaded from here.

It is written in the Java programming language. Although it is developed to be run inside the KEEL environment, it can also be executed on a standard Java Virtual Machine. To do this, it is only necessary to place the training data sets in the same folder and to write a script which contains all the relevant parameters of the algorithms (see the KEEL reference manual, Section 3, located under the KEEL Software Tool item of the main menu).

KEEL version with Prototype Generation package

The KEEL version with the Prototype Generation package can be downloaded from here. Note that the source code is available in Section ''KEEL Source code''.

The complete KEEL user manual can be downloaded from here.

Essentially, to generate an experiment, the following steps must be performed:

- Execute the KEEL GUI with the command: java -jar GraphInterKeel.jar.

- Select Experiments.

- Select Classification (the use of 10-fold cross-validation, as set by default, is recommended).

- Select the desired data sets and click on the main window to place them.

- Click on the third icon of the left bar, select the desired PG method, and click on the main window to place it.

- Click on the last icon of the left bar, and create the arcs that join all the nodes of the experiment.

- Click on the Run Experiment icon, located at the top bar.

This way, it is possible to generate an experiment ready to be executed on any machine with a Java Virtual Machine installed.

References

[1] L. Nanni and A. Lumini, "Particle swarm optimization for prototype reduction," Neurocomputing, vol. 72, no. 4-6, pp. 1092-1097, 2009.

[2] R. Paredes and E. Vidal, "Learning prototypes and distances: a prototype reduction technique based on nearest neighbor error minimization," Pattern Recognition, vol. 39, pp. 180-188, 2006.