Prototype Reduction in Nearest Neighbor Classification: Prototype Selection and Prototype Generation

This Website contains SCI2S research material on Prototype Reduction in Nearest Neighbor Classification. This research is related to the following SCI2S surveys published recently:

  • S. García, J. Derrac, J.R. Cano and F. Herrera, Prototype Selection for Nearest Neighbor Classification: Taxonomy and Empirical Study. IEEE Transactions on Pattern Analysis and Machine Intelligence 34:3 (2012) 417-435, doi: 10.1109/TPAMI.2011.142. COMPLEMENTARY MATERIAL to the paper here: datasets, experimental results and source codes.
  • I. Triguero, J. Derrac, S. García and F. Herrera, A Taxonomy and Experimental Study on Prototype Generation for Nearest Neighbor Classification. IEEE Transactions on Systems, Man, and Cybernetics--Part C: Applications and Reviews 42:1 (2012) 86-100, doi: 10.1109/TSMCC.2010.2103939. COMPLEMENTARY MATERIAL to the paper here: datasets, experimental results and source codes.

The website is organized according to the following summary:

  1. Introduction to Prototype Reduction
  2. Prototype Selection
    1. Background
    2. Taxonomy
    3. Prototype Selection Methods
    4. Experimental Analyses
    5. SCI2S Approaches on Prototype Selection
  3. Prototype Generation
    1. Background
    2. Taxonomy
    3. Prototype Generation Methods
    4. Experimental Analyses
    5. SCI2S Approaches on Prototype Generation
  4. Prototype Reduction Outlook
    1. Key Milestones & Surveys
    2. Hybrid approaches for Prototype and Feature Reduction
    3. Evolutionary Proposals
    4. SCI2S Related Approaches
    5. Prototype Reduction Visibility at the Web of Science
    6. Software and Algorithm Implementations
    7. Recent and forthcoming approaches

Introduction to Prototype Reduction

Nowadays, the amount of data that scientific, research and industrial processes must handle has increased substantially. The incoming data deluge has exceeded the original capabilities of most Data Mining techniques. However, several tools have been developed to deal with this issue, easing the drawbacks of feeding Data Mining processes with an overwhelming amount of data.

Data Reduction (D. Pyle, Data Preparation for Data Mining, Morgan Kaufmann, San Francisco (1999).) techniques arise as methods for minimizing the impact of huge data sets on the behavior of algorithms, reducing the size of the data without harming the quality of the knowledge initially stored in them. That is, data handling requirements are reduced while the predictive capabilities of the algorithms are maintained.

The K-Nearest Neighbors classifier (KNN) (T.M. Cover, P.E. Hart, Nearest Neighbor Pattern Classification. IEEE Transactions on Information Theory 13 (1967) 21–27. doi: 10.1109/TIT.1967.1053964) is probably one of the algorithms that benefits most from Data Reduction techniques. Although it is one of the most popular algorithms in Machine Learning (X. Wu and V. Kumar, Eds., The Top Ten Algorithms in Data Mining. Chapman & Hall/CRC, Data Mining and Knowledge Discovery (2009).), mainly due to its simplicity and overall good performance, KNN suffers from several drawbacks related to the size of the training set. In order to avoid, or at least ease, these drawbacks, several Data Reduction techniques have been successfully applied to enhance the behavior of this algorithm, including Feature Selection (H. Liu, H. Motoda (Eds.), Computational Methods of Feature Selection, Chapman & Hall/CRC, London, Boca Raton (2007).), Instance Selection (H. Liu, H. Motoda (Eds.), Instance Selection and Construction for Data Mining, Springer, New York (2001).) and Feature Discretization (H. Liu, F. Hussain, C. Lim, M. Dash, Discretization: An Enabling Technique. Data Mining and Knowledge Discovery 6:4 (2002) 393-423. doi: 10.1023/A:1016304305535).

Instance-Based Learning algorithms (D. W. Aha, D. F. Kibler, M. K. Albert: Instance-Based Learning Algorithms. Machine Learning 6 (1991) 37-66. doi: 10.1023/A:1022689900470), such as the KNN, rely heavily on the quality of the instances stored as training data. The memory requirements, time complexity and accuracy of the classifier can be affected substantially if the initial data is preprocessed in a smart way: the removal or correction of noisy instances will enhance the generalization capabilities of the algorithm, whereas the deletion of redundant or irrelevant examples will reduce the computational time and the amount of memory needed to run the classifier.

The first attempts to incorporate the notion of prototypes into the KNN realm focused on the reduction of the original data. By condensing the examples that make up the training set (P. E. Hart, The Condensed Nearest Neighbor Rule. IEEE Transactions on Information Theory 14 (1968) 515–516. doi: 10.1109/TIT.1968.1054155), or by deleting noisy instances (D. L. Wilson, Asymptotic Properties of Nearest Neighbor Rules Using Edited Data. IEEE Transactions on System, Man and Cybernetics 2:3 (1972) 408–421. doi: 10.1109/TSMC.1972.4309137), the first Prototype Selection methods began to attract the attention of many researchers and practitioners as the years went by.

On the other hand, Prototype Generation arose as a way to generate new data from the original training set, aiming to represent the domain of the problem better. Prototype Generation methods often work by modifying the position or the label of the original instances, or by generating entirely new ones. The resulting training sets are kept small, thus obtaining time and memory enhancements comparable to those provided by Prototype Selection methods.

In this website, we provide an outlook of the Prototype Selection and Prototype Generation fields, merged together as Prototype Reduction techniques. Background information, taxonomies and references to the methods are given, as well as a full set of experimental results obtained by applying the techniques to a set of well-known supervised classification problems. SCI2S contributions to both fields are also outlined.

In addition, we also provide an outlook of the Prototype Reduction field, describing the most important milestones in its development. Related approaches and outstanding evolutionary proposals are highlighted. Finally, the outlook is concluded with a Visibility analysis of the area at the Web of Science (WoS).

Prototype Selection

Background

Prototype Selection methods (S. García, J. Derrac, J.R. Cano and F. Herrera, Prototype Selection for Nearest Neighbor Classification: Taxonomy and Empirical Study. IEEE Transactions on Pattern Analysis and Machine Intelligence 34:3 (2012) 417-435. doi: 10.1109/TPAMI.2011.142) are Instance Selection methods which aim to find training sets offering the best classification accuracy with the KNN classifier. Thus, their goal is to isolate the smallest set of instances which enables a data mining algorithm to predict the class of a query instance with the same quality as the initial data set. By minimizing the data set size, it is possible to reduce the space complexity and computational cost of the Data Mining algorithms applied later, improving their generalization capabilities through the elimination of noise.

They can be formally specified as follows: let $X_p = (X_{p1}, X_{p2}, \ldots , X_{pm}, X_{pc})$ be an instance, where $X_p$ belongs to a class c given by $X_{pc}$ and lies in an m-dimensional space in which $X_{pi}$ is the value of the i-th feature of the p-th sample. Then, let us assume that there is a training set TR which consists of N instances, and a test set TS composed of T instances. Let S ⊆ TR be the subset of selected samples resulting from the execution of a Prototype Selection algorithm; then new patterns from TS are classified by the KNN rule acting only over the instances of S.
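
As a minimal sketch of this setting (Python with NumPy; the data and the selection mask below are toy stand-ins, since in practice a Prototype Selection method would supply S):

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy stand-ins for TR (N = 100 instances, m = 4 features) and TS (T = 20);
    # a real Prototype Selection method would supply the boolean mask below.
    TR_X = rng.normal(size=(100, 4)); TR_y = rng.integers(0, 2, size=100)
    TS_X = rng.normal(size=(20, 4));  TS_y = rng.integers(0, 2, size=20)
    mask = rng.random(100) < 0.2                    # hypothetical selection S

    S_X, S_y = TR_X[mask], TR_y[mask]               # S is a subset of TR

    def one_nn_predict(query):
        """Class of the nearest prototype in S (1-NN acting only over S)."""
        return S_y[np.argmin(np.linalg.norm(S_X - query, axis=1))]

    acc = np.mean([one_nn_predict(x) == y for x, y in zip(TS_X, TS_y)])
    print(f"accuracy over TS using only S: {acc:.3f}")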

Therefore, Prototype Selection methods select a subset of examples from the original training data. Depending on the strategy followed, they can be categorized into three classes: preservation methods, which aim to obtain a consistent subset of the training data, ignoring the presence of noise; noise removal methods, which aim to remove noise both from the border points (instances near the decision boundaries) and from the inner points (instances far from the decision boundaries); and hybrid methods, which pursue both objectives simultaneously.

Evolutionary Computation (A. Ghosh, L. C. Jain (Eds), Evolutionary Computation in Data Mining. Springer-Verlag, New York (2005)) has emerged among the extensive range of fields which have been producing Prototype Selection models over the years. It has become an outstanding methodology for the definition of new Prototype Selection approaches, redefining the Prototype Selection problem itself as a binary optimization problem (J. R. Cano, F. Herrera, M. Lozano, Using Evolutionary Algorithms as Instance Selection for Data Reduction in KDD: an Experimental Study. IEEE Transactions on Evolutionary Computation 7:6 (2003) 561–575. doi: 10.1109/TEVC.2003.819265). In this way, Evolutionary Computation has contributed methods that excel both in enhancing the accuracy of the KNN and in reducing its computational cost.
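
For illustration, the following is a sketch of the kind of fitness function used in such binary formulations, where a chromosome is a bitmask over TR and classification accuracy is balanced against the reduction rate (the equal weighting alpha = 0.5 is the one commonly reported; the remaining details are assumptions of this sketch):

    import numpy as np

    def fitness(mask, TR_X, TR_y, alpha=0.5):
        """Fitness of a binary chromosome: 1-NN accuracy of the selected
        subset S over TR, combined with the achieved reduction rate."""
        S_X, S_y = TR_X[mask], TR_y[mask]
        if S_y.size == 0:                    # empty selections are worthless
            return 0.0
        hits = 0
        for x, y in zip(TR_X, TR_y):
            d = np.linalg.norm(S_X - x, axis=1)
            hits += S_y[np.argmin(d)] == y   # a full implementation would
        clas_rate = hits / TR_y.size         # apply leave-one-out here
        perc_red = 1.0 - S_y.size / TR_y.size
        return alpha * clas_rate + (1.0 - alpha) * perc_red

An evolutionary algorithm (a GA, CHC or similar) then evolves the bitmask so as to maximize this value.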

As the years go by, more Prototype Selection methods are developed with the objective of improving the performance of former approaches. Improvements in storage reduction, noise tolerance, generalization accuracy and time requirements are still being reported today. They are proof of the lively character of this field, which continues to attract the interest of many research communities in the search for new ways to further improve the performance of the KNN classifier.

Taxonomy

The Prototype Selection taxonomy is oriented to categorize the algorithms regarding three aspects related to their operation:

  • Direction of Search: It refers to the directions in which the search can proceed: Incremental, decremental, batch, mixed or fixed.
  • Type of selection: This factor is mainly conditioned by the type of search carried out by the algorithms, whether they seek to retain border points, central points or some other set of points. The categories defined here are: Condensation, edition and hybrid.
  • Evaluation of Search: The third criterion relates to the way in which each step of the search is evaluated. Usually, the objective pursued is to make a prediction on a non-definitive selection and to compare between selections. This characteristic influences the quality criterion, and it can be divided into two categories: Filter and Wrapper.

Figure 1 shows a map of the taxonomy, categorizing all the Prototype Selection methods described in this website. Later, they will be described and referenced.
 

Figure 1. Prototype Selection Taxonomy

The features that define each category are described as follows (a code sketch of two classic instances, CNN and ENN, follows the list):

  • Direction of Search
    • Incremental: An incremental search begins with an empty subset S, and adds each instance in TR to S if it fulfills some criteria. In this case, the algorithm depends on the order of presentation and this factor could be very important. Under such a scheme, the order of presentation of instances in TR should be random because by definition, an incremental algorithm should be able to handle new instances as they become available without all of them being present at the beginning. Nevertheless, some recent incremental approaches are order-independent because they add instances to S in a somewhat incremental fashion, but they examine all available instances to help select which instance to add next. This makes the algorithm not truly incremental as we have defined above, although we will also consider them as incremental approaches.

      One advantage of an incremental scheme is that if instances are made available later, after training is complete, they can continue to be added to S according to the same criteria. This capability could be very helpful when dealing with data streams or online learning. Another advantage is that they can be faster and use less storage during the learning phase than non-incremental algorithms. The main disadvantage is that incremental algorithms must make decisions based on little information and are therefore prone to errors until more information is available.

    • Decremental: The decremental search begins with S = TR, and then searches for instances to remove from S. Again, the order of presentation is important, but unlike the incremental process, all of the training examples are available for examination at any time.

      One disadvantage with the decremental rule is that it presents a higher computational cost than incremental algorithms. Furthermore, the learning stage must be done in an off-line fashion because decremental approaches need all possible data. However, if the application of a decremental algorithm can result in greater storage reduction, then the extra computation during learning (which is done just once) can be well worth the computational savings during execution thereafter.

    • Batch: Another way to apply a Prototype Selection process is in batch mode. This involves deciding if each instance meets the removal criteria before removing any of them. Then all those that do meet the criteria are removed at once. As with decremental algorithms, batch processing suffers from increased time complexity over incremental algorithms.
    • Mixed: A mixed search begins with a pre-selected subset S (chosen randomly or by an incremental or decremental process) and iteratively can add or remove any instance which meets the specific criterion. This type of search allows the rectification of operations already performed, and its main advantage is that it makes it easy to obtain subsets of instances well suited for accuracy. It usually suffers from the same drawbacks reported for decremental algorithms, but this depends to a great extent on the specific proposal. Note that these kinds of algorithms are closely related to the order-independent incremental approaches but, in this case, instance removal from S is allowed.
    • Fixed: A fixed search is a subfamily of mixed search in which the number of additions and removals remains the same. Thus, the number of final prototypes is determined at the beginning of the learning phase and is never changed. This strategy of search is not very common in Prototype Selection, although it is typical in Prototype Generation methods, such as LVQ.
  • Type of Selection
    • Condensation: This set includes the techniques which aim to retain the points which are closer to the decision boundaries, also called border points. The intuition behind retaining border points is that internal points do not affect the decision boundaries as much as border points, and thus can be removed with relatively little effect on classification. The idea behind these methods is to preserve the accuracy over the training set, but the generalization accuracy over the test set can be negatively affected. Nevertheless, the reduction capability of condensation methods is normally high due to the fact that there are fewer border points than internal points in most of the data.
    • Edition: These kinds of algorithms instead seek to remove border points. They remove points that are noisy or do not agree with their neighbors. This removes close border points, leaving smoother decision boundaries behind. However, such algorithms do not remove internal points that do not necessarily contribute to the decision boundaries. The effect obtained is related to the improvement of generalization accuracy in test data, although the reduction rate obtained is lower.
    • Hybrid: Hybrid methods try to find the smallest subset S which maintains or even increases the generalization accuracy on test data. To achieve this, they allow the removal of internal and border points based on the criteria followed by the two previous strategies. The KNN classifier is highly adaptable to these methods, obtaining great improvements even with a very small subset of selected instances.
  • Evaluation of Search
    • Filter: The KNN rule is used on partial data to determine the criteria for adding or removing instances, and no leave-one-out validation scheme is used to estimate the generalization accuracy. Using subsets of the training data in each decision increases the efficiency of these methods, but the accuracy obtained may not be as high.
    • Wrapper: The KNN rule is used on the complete training set with the leave-one-out validation scheme. The conjunction of these two factors gives a good estimator of generalization accuracy, which helps to obtain better accuracy over test data. However, each decision involves a complete computation of the KNN rule over the training set, and the learning phase can be computationally expensive.
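
To make these categories concrete, the sketch announced above implements (assuming NumPy arrays TR_X, TR_y with integer class labels) two seminal methods referenced later: CNN, an incremental condensation method, and ENN, a batch edition method.

    import numpy as np

    def cnn(TR_X, TR_y):
        """Hart's CNN (incremental condensation): seed S with one instance,
        then absorb every instance misclassified by the current S, making
        passes over TR until no further additions occur."""
        selected = [0]
        changed = True
        while changed:
            changed = False
            for i in range(len(TR_y)):
                if i in selected:
                    continue
                d = np.linalg.norm(TR_X[selected] - TR_X[i], axis=1)
                if TR_y[selected][np.argmin(d)] != TR_y[i]:
                    selected.append(i)       # absorb the misclassified point
                    changed = True
        return np.array(sorted(selected))

    def enn(TR_X, TR_y, k=3):
        """Wilson's ENN (batch edition): flag every instance whose k nearest
        neighbours vote for another class, then remove all flags at once."""
        keep = np.ones(len(TR_y), dtype=bool)
        for i in range(len(TR_y)):
            d = np.linalg.norm(TR_X - TR_X[i], axis=1)
            d[i] = np.inf                    # leave-one-out: skip itself
            nn = np.argsort(d)[:k]
            votes = np.bincount(TR_y[nn], minlength=TR_y.max() + 1)
            keep[i] = votes.argmax() == TR_y[i]
        return np.where(keep)[0]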

Prototype Selection Methods

The Prototype Selection field has provided a remarkable number of approaches over the last forty years. Figure 2 depicts the evolution of the area, showing the year of publication of each method. Since the methods are gathered according to their place in the taxonomy, this map also allows one to observe the differences in the growth of each of the sub-areas.
 

Figure 2. Evolution map of the Prototype Selection field

Each dot in the map represents a significant contribution to the area. More details about them can be found in the SCI2S technical report on Prototype Selection methods:

S. García, J. Derrac, J.R. Cano and F. Herrera, Prototype Selection for Nearest Neighbor Classification: Survey of Methods.

References of the methods (by year of publishing)

  • 1968
    • Condensed Nearest Neighbor (CNN)

      P. E. Hart, The Condensed Nearest Neighbor Rule. IEEE Transactions on Information Theory 14 (1968) 515–516, doi: 10.1109/TIT.1968.1054155

  • 1972
    • Reduced Nearest Neighbor (RNN)

      G. W. Gates, The Reduced Nearest Neighbor Rule. IEEE Transactions on Information Theory 18 (1972) 431–433, doi: 10.1109/TIT.1972.1054809

    • Edited Nearest Neighbor (ENN)

      D. L. Wilson, Asymptotic Properties of Nearest Neighbor Rules Using Edited Data. IEEE Transactions on System, Man and Cybernetics 2:3 (1972) 408–421, doi: 10.1109/TSMC.1972.4309137

  • 1974
    • Ullmann's method (Ullman)

      J. R. Ullmann, Automatic Selection of Reference Data for Use in a Nearest-Neighbor Method of Pattern Classification. IEEE Transactions on Information Theory 20 (1974) 541–543, doi: 10.1109/TIT.1974.1055252

  • 1975
    • Selective Nearest Neighbor (SNN).

      G. L. Ritter, H. B. Woodruff, S. R. Lowry, T. L. Isenhour, An Algorithm for a Selective Nearest Neighbor Decision Rule. IEEE Transactions on Information Theory 21 (1975) 665–669, doi: 10.1109/TIT.1975.1055464

  • 1976
    • Repeated Edited Nearest Neighbor (RENN).
    • All-Knn method (AllKNN).

      I. Tomek, An Experiment with the Edited Nearest-Neighbor Rule. IEEE Transactions on Systems, Man, and Cybernetics 6:6 (1976) 448–452, doi: 10.1109/TSMC.1976.4309523

    • Tomek Condensed Nearest Neighbor (TCNN)

      I. Tomek, Two Modifications of CNN. IEEE Transactions on Systems, Man, and Cybernetics 6:6 (1976) 769–772, doi: 10.1109/TSMC.1976.4309452

  • 1979
    • Mutual Neighborhood Value (MNV)

      K. C. Gowda, G. Krishna, The Condensed Nearest Neighbor Rule Using the Concept of Mutual Nearest Neighborhood. IEEE Transactions on Information Theory 25 (1979) 488–490, doi: 10.1109/TIT.1979.1056066

  • 1982
    • MultiEdit (MultiEdit)

      P. A. Devijver, J. Kittler, Pattern Recognition, A Statistical Approach. Prentice Hall (1982).

      P. A. Devijver, On the Editing Rate of the Multiedit Algorithm. Pattern Recognition Letters 4:1 (1986) 9–12, doi: 10.1016/0167-8655(86)90066-8

  • 1987
    • Shrink (Shrink)

      D. Kibler, D. W. Aha, Learning Representative Exemplars of Concepts: An Initial Case Study. Proceedings of the Fourth International Workshop on Machine Learning (1987) 24–30.

  • 1991
    • Instance Based 2 (IB2)
    • Instance Based 3 (IB3)

      D. W. Aha, D. Kibler, M. K. Albert. Instance-Based Learning Algorithms. Machine Learning 6:1 (1991) 37–66, doi: 10.1023/A:1022689900470

  • 1994
    • Monte Carlo 1 (MC1)
    • Random Mutation Hill Climbing (RMHC)

      D. B. Skalak, Prototype and Feature Selection by Sampling and Random Mutation Hill Climbing Algorithms. Proceedings of the Eleventh International Conference on Machine Learning (1994) 293–301.

    • Minimal consistent set (MCS)

      B. V. Dasarathy, Minimal Consistent Set (MCS) Identification for Optimal Nearest Neighbor Decision System Design. IEEE Transactions on Systems, Man and Cybernetics 24:3 (1994) 511–517, doi: 10.1109/21.278999

  • 1995
    • Encoding Length Heuristic (ELH)
    • Encoding Length Growth (ELGrowth)
    • Explore (Explore)

      R. M. Cameron-Jones, Instance Selection by Encoding Length Heuristic with Random Mutation Hill Climbing. Proceedings of the Eighth Australian Joint Conference on Artificial Intelligence (1995) 99–106.

    • Model Class Selection (MoCS)

      C. E. Brodley, Recursive Automatic Bias Selection for Classifier Construction. Machine Learning 20:1-2 (1995) 63–94, doi: 10.1007/BF00993475

    • Variable Similarity Metric (VSM)

      D. G. Lowe, Similarity Metric Learning for a Variable-Kernel Classifier. Neural Computation 7:1 (1995) 72–85, doi: 10.1162/neco.1995.7.1.72

    • Generational Genetic Algorithm (GGA)

      L. I. Kuncheva, Editing for the K-Nearest Neighbors Rule by a Genetic Algorithm. Pattern Recognition Letters 16:8 (1995) 809–814, doi: 10.1016/0167-8655(95)00047-K

      L. I. Kuncheva, L. C. Jain, Nearest Neighbor Classifier: Simultaneous Editing and Feature Selection. Pattern Recognition Letters 20:11-13 (1999) 1149–1156, doi: 10.1016/S0167-8655(99)00082-3

  • 1997
    • Gabriel Graph Editing (GGE)
    • Relative Neighborhood Graph Editing (RNGE)

      J. S. Sánchez, F. Pla, F. J. Ferri, Prototype Selection for the Nearest Neighbor Rule Through Proximity Graphs. Pattern Recognition Letters 18 (1997) 507–513, doi: 10.1016/S0167-8655(97)00035-4

  • 1998
    • Polyline functions (PF)

      U. Lipowezky, Selection of the Optimal Prototype Subset for 1-nn Classification. Pattern Recognition Letters 19:10 (1998) 907–918, doi: 10.1016/S0167-8655(98)00075-0

  • 2000
    • Modified Edited Nearest Neighbor (MENN)

      K. Hattori, M. Takahashi, A New Edited K-Nearest Neighbor Rule in the Pattern Classification Problem. Pattern Recognition 33:3 (2000) 521–528, doi: 10.1016/S0031-3203(99)00068-0

    • Decremental Reduction Optimization Procedure 1 (DROP1)
    • Decremental Reduction Optimization Procedure 2 (DROP2)
    • Decremental Reduction Optimization Procedure 3 (DROP3)
    • Decremental Reduction Optimization Procedure 4 (DROP4)
    • Decremental Reduction Optimization Procedure 5 (DROP5)
    • Decremental Encoding Length (DEL)

      D. R. Wilson, T. R. Martinez, Reduction Techniques for Instance-Based Learning Algorithms. Machine Learning 38:3 (2000) 257–286, doi: 10.1023/A:1007626913721

    • Prototype Selection using Relative Certainty Gain (PSRCG)

      M. Sebban, R. Nock, Instance Pruning as an Information Preserving Problem. Proceedings of the Seventeenth International Conference on Machine Learning (2000) 855–862.

      M. Sebban, R. Nock, E. Brodley, A. Danyluk, Stopping Criterion for Boosting-Based Data Reduction Techniques: From Binary to Multiclass Problems. Journal of Machine Learning Research, 3 (2002) 863–885.

  • 2001
    • Estimation of Distribution Algorithm (EDA)

      B. Sierra, E. Lazkano, I. Inza, M. Merino, P. Larrañaga, J. Quiroga, Prototype Selection and Feature Subset Selection by Estimation of Distribution Algorithms. A Case Study in the Survival of Cirrhotic Patients Treated with TIPS. Proceedings of the 8th Conference on AI in Medicine in Europe, Lecture Notes in Computer Science 2101 (2001) 20–29.

    • Tabu Search (CerverónTS)

      V. Cerverón, F. J. Ferri, Another Move Toward the Minimum Consistent Subset: A Tabu Search Approach to the Condensed Nearest Neighbor Rule. IEEE Transactions on Systems, Man, and Cybernetics--Part B: Cybernetics 31:3 (2001) 408–413, doi: 10.1109/3477.931531

  • 2002
    • Iterative Case Filtering (ICF)

      H. Brighton, C. Mellish, Advances in Instance Selection for Instance-Based Learning Algorithms. Data Mining and Knowledge Discovery 6:2 (2002) 153–172, doi: 10.1023/A:1014043630878

    • Modified Condensed Nearest Neighbor (MCNN)

      V. S. Devi, M. N. Murty, An Incremental Prototype Set Building Technique. Pattern Recognition 35:2 (2002) 505–513, doi: 10.1016/S0031-3203(00)00184-9

    • Intelligent Genetic Algorithm (IGA)

      S.-Y. Ho, C.-C. Liu, S. Liu, Design of an Optimal Nearest Neighbor Classifier Using an Intelligent Genetic Algorithm. Pattern Recognition Letters 23:13 (2002) 1495–1503, doi: 10.1016/S0167-8655(02)00109-5

    • Tabu Search (ZhangTS)

      H. Zhang, G. Sun, Optimal Reference Subset Selection for Nearest Neighbor Classification by Tabu Search. Pattern Recognition 35:7 (2002) 1481–1490, doi: 10.1016/S0031-3203(01)00137-6

    • Improved KNN (IKNN)

      Y. Wu, K. G. Ianakiev, V. Govindaraju, Improved K-Nearest Neighbor Classification. Pattern Recognition 35:10 (2002) 2311–2318, doi: 10.1016/S0031-3203(01)00132-7

  • 2003
    • Iterative Maximal Nearest Centroid Neighbor (Iterative MaxNCN)
    • Reconsistent (Reconsistent)

      M. T. Lozano, J. S. Sánchez, and F. Pla, Using the Geometrical Distribution of Prototypes for Training Set Condensing. CAEPIA, Lecture Notes in Computer Science (2003) 618–627.

    • C-Pruner (CPruner)

      K. P. Zhao, S. G. Zhou, J. H. Guan, and A. Y. Zhou, C-Pruner: An Improved Instance Pruning Algorithm. Proceedings of the Second International Conference on Machine Learning and Cybernetics (2003) 94–99, doi: 10.1109/ICMLC.2003.1264449

    • Steady-State Genetic Algorithm (SSGA)
    • Population Based Incremental Learning (PBIL)
    • CHC Evolutionary Algorithm (CHC)

      J. R. Cano, F. Herrera, M. Lozano, Using Evolutionary Algorithms as Instance Selection for Data Reduction in KDD: An Experimental Study. IEEE Transactions on Evolutionary Computation 7:6 (2003) 561–575, doi: 10.1109/TEVC.2003.819265

    • Patterns by Ordered Projections (POP)

      J. C. R. Santos, J. S. Aguilar-Ruiz, M. Toro, Finding Representative Patterns With Ordered Projections. Pattern Recognition, 36:4 (2003) 1009–1018, doi: 10.1016/S0031-3203(02)00119-X

    • Nearest Centroid Neighbor Edition (NCNEdit)

      J. S. Sánchez, R. Barandela, A. I. Marqués, R. Alejo, J. Badenas, Analysis of New Techniques to Obtain Quality Training Sets. Pattern Recognition Letters 24:7 (2003) 1015–1022, doi: 10.1016/S0167-8655(02)00225-8

  • 2004
    • Edited Normalized Radial Basis Function (ENRBF)
    • Edited Normalized Radial Basis Function (ENRBF2)

      N. Jankowski, M. Grochowski, Comparison of Instance Selection Algorithms. I. Algorithms Survey. ICAISC, Lecture Notes in Computer Science 3070 (2004) 598–603.

  • 2005
    • Edited Nearest Neighbor Estimating Class Probabilistic (ENNProb)
    • Edited Nearest Neighbor Estimating Class Probabilistic and Threshold (ENNTh)

      F. Vázquez, J. S. Sánchez, and F. Pla, A Stochastic Approach to Wilson’s Editing Algorithm. 2nd Iberian Conference on Pattern Recognition and Image Analysis, Lecture Notes in Computer Science 3523 (2005) 35–42, doi: 10.1007/11492542_5

    • Support Vector based Prototype Selection (SVBPS)

      Y. Li, Z. Hu, Y. Cai, W. Zhang, Support Vector Based Prototype Selection Method for Nearest Neighbor Rules. Proceedings of the First International Conference on Advances in Natural Computation, Lecture Notes in Computer Science 3610 (2005) 528–535, doi: 10.1007/11539087_68

    • Backward Sequential Edition (BSE)

      J. A. Olvera-López, J. F. Martínez-Trinidad, and J. A. Carrasco-Ochoa, Edition Schemes Based on BSE. Proceedings of the 10th Iberoamerican Congress on Pattern Recognition (CIARP), Lecture Notes in Computer Science 3773 (2005) 360–367, doi: 10.1007/11578079_38

    • Modified Selective Subset (MSS)

      R. Barandela, F. J. Ferri, J. S. Sánchez, Decision Boundary Preserving Prototype Selection for Nearest Neighbor Classification. International Journal of Pattern Recognition and Artificial Intelligence 19:6 (2005) 787–806, doi: 10.1142/S0218001405004332

  • 2006
    • Generalized Condensed Nearest Neighbor (GCNN)

      F. Chang, C.-C. Lin, C.-J. Lu, Adaptive Prototype Learning Algorithms: Theoretical and Experimental Studies. Journal of Machine Learning Research 7 (2006) 2125–2148.

  • 2007
    • Fast Condensed Nearest Neighbor 1 (FCNN)
    • Fast Condensed Nearest Neighbor 2 (FCNN2)
    • Fast Condensed Nearest Neighbor 3 (FCNN3)
    • Fast Condensed Nearest Neighbor 4 (FCNN4)

      F. Angiulli, Fast Nearest Neighbor Condensation for Large Data Sets Classification. IEEE Transactions on Knowledge and Data Engineering, 19:11 (2007) 1450–1464, doi: 10.1109/TKDE.2007.190645

  • 2008
    • Noise Removing based on Minimal Consistent Set (NRMCS)

      X. Z. Wang, B. Wu, Y. L. He, X. H. Pei, NRMCS: Noise Removing Based on the MCS. Proceedings of the Seventh International Conference on Machine Learning and Cybernetics (2008) 89–93, doi: 10.1109/ICMLC.2008.4620384

    • Genetic Algorithm based on Mean Square Error, Clustered Crossover and Fast Smart Mutation (GA-MSE-CC-FSM)

      R. Gil-Pita, X. Yao, Evolving Edited K-Nearest Neighbor Classifiers. International Journal of Neural Systems 18:6 (2008) 459–467, doi: 10.1142/S0129065708001725

    • Steady-State Memetic Algorithm (SSMA)

      S. García, J. R. Cano, F. Herrera, A Memetic Algorithm for Evolutionary Prototype Selection: A Scaling Up Approach. Pattern Recognition 41:8 (2008) 2693–2709, doi: 10.1016/j.patcog.2008.02.006

    • Hit Miss Network C (HMNC)
    • Hit Miss Network Edition (HMNE)
    • Hit Miss Network Edition Iterative (HMNEI)

      E. Marchiori, Hit Miss Networks With Applications to Instance Selection. Journal of Machine Learning Research 9 (2008) 997–1017.

  • 2009
    • Template Reduction for KNN (TRKNN)

      H. A. Fayed, A. F. Atiya, A Novel Template Reduction Approach for the K-Nearest Neighbor Method. IEEE Transactions on Neural Networks, 20:5 (2009) 890–896, doi: 10.1109/TNN.2009.2018547

  • 2010
    • Prototype Selection based on Clustering (PSC)

      J. A. Olvera-López, J. A. Carrasco-Ochoa, and J. F. Martínez-Trinidad, A New Fast Prototype Selection Method Based on Clustering. Pattern Analysis and Applications 13:2 (2010) 131–141.

    • Class Conditional Instance Selection (CCIS)

      E. Marchiori, Class Conditional Nearest Neighbor for Large Margin Instance Selection. IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (2010) 364–370, doi: 10.1109/TPAMI.2009.164

    • Cooperative Coevolutionary Instance Selection (CoCoIS)

      N. García-Pedrajas, J. A. Romero Del Castillo, and D. Ortiz-Boyer, A Cooperative Coevolutionary Algorithm for Instance Selection for Instance-Based Learning. Machine Learning 78:3 (2010) 381–420.

Experimental Analyses

Comparative analyses of the most popular Prototype Selection methods are provided in this section. They have been taken from the SCI2S survey on Prototype Selection (S. García, J. Derrac, J.R. Cano and F. Herrera, Prototype Selection for Nearest Neighbor Classification: Taxonomy and Empirical Study. IEEE Transactions on Pattern Analysis and Machine Intelligence 34:3 (2012) 417-435, doi: 10.1109/TPAMI.2011.142). The data sets, parameters, and the rest of the features of the experimental study are described on the survey's complementary web page.

Result sheets are categorized according to each method's main place in the taxonomy: Condensation, Edition and Hybrid methods, including results for 1-NN and 3-NN (that is, considering one and three neighbors in the decision rule). For each category, average results in accuracy, kappa, reduction rate, running time and other composite measures are provided (a sketch of how kappa and the reduction rate are computed follows the result sheets):

  • 1-NN
    • Condensation methods:
      • Small data sets (XLS, ODS)
      • Medium data sets (XLS, ODS)
    • Edition methods:
      • Small data sets (XLS, ODS)
      • Medium data sets (XLS, ODS)
    • Hybrid methods:
      • Small data sets (XLS, ODS)
      • Medium data sets (XLS, ODS)
    • All methods:
      • Small data sets (XLS, ODS)
      • Medium data sets (XLS, ODS)
  • 3-NN
    • Condensation methods:
      • Small data sets (XLS, ODS)
      • Medium data sets (XLS, ODS)
    • Edition methods:
      • Small data sets (XLS, ODS)
      • Medium data sets (XLS, ODS)
    • Hybrid methods:
      • Small data sets (XLS, ODS)
      • Medium data sets (XLS, ODS)
    • All methods:
      • Small data sets (XLS, ODS)
      • Medium data sets (XLS, ODS)

All results (zipped):
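
For reference, the two less standard measures reported in the sheets above can be computed as in the following sketch (accuracy is the usual hit rate, and running time is implementation-dependent):

    import numpy as np

    def kappa(y_true, y_pred):
        """Cohen's kappa: agreement between predicted and true labels,
        corrected for the agreement expected by chance."""
        labels = np.unique(np.concatenate([y_true, y_pred]))
        idx = {c: i for i, c in enumerate(labels)}
        C = np.zeros((labels.size, labels.size))
        for t, p in zip(y_true, y_pred):
            C[idx[t], idx[p]] += 1                # confusion matrix
        n = C.sum()
        p_o = np.trace(C) / n                     # observed agreement
        p_e = (C.sum(0) * C.sum(1)).sum() / n**2  # chance agreement
        return (p_o - p_e) / (1.0 - p_e)

    def reduction_rate(n_original, n_selected):
        """Fraction of the training set removed by the reduction method."""
        return 1.0 - n_selected / n_original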

SCI2S Approaches on Prototype Selection

Below we provide an outline of the SCI2S published proposals on Prototype Selection.

J.R. Cano, F. Herrera, M. Lozano. Using Evolutionary Algorithms as Instance Selection for Data Reduction in KDD: An Experimental Study. IEEE Transactions on Evolutionary Computation 7:6 (2003) 561-575, doi: 10.1109/TEVC.2003.819265

Abstract: Evolutionary algorithms are adaptive methods based on natural evolution that may be used for search and optimization. As data reduction in knowledge discovery in databases (KDD) can be viewed as a search problem, it could be solved using evolutionary algorithms (EAs).

In this paper, we have carried out an empirical study of the performance of four representative EA models in which we have taken into account two different instance selection perspectives, the prototype selection and the training set selection for data reduction in KDD. This paper includes a comparison between these algorithms and other nonevolutionary instance selection algorithms. The results show that the evolutionary instance selection algorithms consistently outperform the nonevolutionary ones, the main advantages being: better instance reduction rates, higher classification accuracy, and models that are easier to interpret.

 

S. García, J.R. Cano, F. Herrera. A Memetic Algorithm for Evolutionary Prototype Selection: A Scaling Up Approach. Pattern Recognition 41:8 (2008) 2693-2709, doi: 10.1016/j.patcog.2008.02.006

Abstract: Prototype selection problem consists of reducing the size of databases by removing samples that are considered noisy or not influential on nearest neighbour classification tasks. Evolutionary algorithms have been used recently for prototype selection showing good results. However, due to the complexity of this problem when the size of the databases increases, the behaviour of evolutionary algorithms could deteriorate considerably because of a lack of convergence. This additional problem is known as the scaling up problem.

Memetic algorithms are approaches for heuristic searches in optimization problems that combine a population-based algorithm with a local search. In this paper, we propose a model of memetic algorithm that incorporates an ad hoc local search specifically designed for optimizing the properties of prototype selection problem with the aim of tackling the scaling up problem. In order to check its performance, we have carried out an empirical study including a comparison between our proposal and previous evolutionary and non-evolutionary approaches studied in the literature.

The results have been contrasted with the use of non-parametric statistical procedures and show that our approach outperforms previously studied methods, especially when the database scales up.

 

S. García, J. Derrac, J.R. Cano, F. Herrera. Prototype Selection for Nearest Neighbor Classification: Taxonomy and Empirical Study. IEEE Transactions on Pattern Analysis and Machine Intelligence 34:3 (2012) 417-435, doi: 10.1109/TPAMI.2011.142. COMPLEMENTARY MATERIAL to the paper here: datasets, experimental results and source codes.

Abstract: The nearest neighbor classifier is one of the most used and well known techniques for performing recognition tasks. It has also demonstrated itself to be one of the most useful algorithms in data mining in spite of its simplicity. However, the nearest neighbor classifier suffers from several drawbacks such as high storage requirements, low efficiency in classification response and low noise tolerance. These weaknesses have been the subject of study for many researchers and many solutions have been proposed. Among them, one of the most promising solutions consists of reducing the data used for establishing a classification rule (training data), by means of selecting relevant prototypes. Many prototype selection methods exist in the literature and the research in this area is still advancing. Different properties could be observed in the definition of them but no formal categorization has been established yet. This paper provides a survey of the prototype selection methods proposed in the literature from a theoretical and empirical point of view. Considering a theoretical point of view, we propose a taxonomy based on the main characteristics presented in prototype selection and we analyze their advantages and drawbacks. Empirically, we conduct an experimental study involving different sizes of data sets for measuring their performance in terms of accuracy, reduction capabilities and run-time. The results obtained by all the methods studied have been verified by nonparametric statistical tests. Several remarks, guidelines and recommendations are made for the use of prototype selection for nearest neighbor classification.

Prototype Generation

Background

Prototype Generation (I. Triguero, J. Derrac, S. García, F. Herrera, A Taxonomy and Experimental Study on Prototype Generation for Nearest Neighbor Classification. IEEE Transactions on Systems, Man, and Cybernetics--Part C: Applications and Reviews 42:1 (2012) 86-100, doi: 10.1109/TSMCC.2010.2103939) is an important technique in data reduction. It has been widely applied to instance-based classifiers and can be defined as the application of instance construction algorithms over a data set to improve the classification accuracy of a nearest neighbor classifier.

Prototype Generation builds new artificial examples from the training set. A formal specification of the problem is the following: let $X_p = (X_{p1}, X_{p2}, \ldots , X_{pm}, X_{pc})$ be an instance, where $X_p$ belongs to a class c given by $X_{pc}$ and lies in an m-dimensional space in which $X_{pi}$ is the value of the i-th feature of the p-th sample. Then, let us assume that there is a training set TR which consists of N instances, and a test set TS composed of T instances $X_t$, whose class is unknown. The purpose of Prototype Generation is to obtain a generated prototype set TG, which consists of r (r < N) prototypes, either selected or generated from the examples of TR. The prototypes of the generated set are determined so as to represent the class distributions efficiently and to discriminate well when used to classify the training objects. Their cardinality should be sufficiently small to reduce both the storage and evaluation time spent by a KNN classifier.

Prototype Generation methods follow different heuristic operations to generate a set of examples which adjusts the decision boundaries between classes. The first approach found in the literature, called PNN (C.-L. Chang, Finding prototypes for nearest neighbor classifiers. IEEE Transactions on Computers 23:11 (1974) 1179–1184.), belongs to the family of methods that merge prototypes of the same class in successive iterations, generating centroids. One of the most important families of methods is that based on Learning Vector Quantization (T. Kohonen, The self-organizing map. Proceedings of the IEEE 78:9 (1990) 1464–1480.), which tries to adjust the positioning of the prototypes. Other well-known methods are based on a divide-and-conquer scheme, separating the m-dimensional space into two or more subspaces with the purpose of simplifying the problem at each step (C. H. Chen and A. Jozwik, A sample set condensation algorithm for the class sensitive artificial neural network. Pattern Recognition Letters 17:8 (1996) 819–823.).
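
As a minimal illustration of this positioning adjustment, the following LVQ1-style sketch applies Kohonen's update rule, attracting the nearest prototype towards same-class samples and repelling it otherwise (the initialization, the decaying learning rate and the parameter values are assumptions of this sketch):

    import numpy as np

    def lvq1(TR_X, TR_y, n_per_class=2, alpha=0.1, iters=1000, seed=0):
        """LVQ1 positioning adjustment over a fixed-size prototype set TG."""
        rng = np.random.default_rng(seed)
        P_X, P_y = [], []
        for c in np.unique(TR_y):            # fixed reduction: seed TG with
            members = np.where(TR_y == c)[0] # random samples of each class
            pick = rng.choice(members, size=min(n_per_class, members.size),
                              replace=False)
            P_X.append(TR_X[pick]); P_y.append(TR_y[pick])
        P_X = np.vstack(P_X).astype(float); P_y = np.concatenate(P_y)
        for t in range(iters):
            i = rng.integers(len(TR_y))      # draw a random training sample
            x, y = TR_X[i], TR_y[i]
            j = np.argmin(np.linalg.norm(P_X - x, axis=1))
            step = alpha * (1.0 - t / iters) # decaying learning rate
            sign = 1.0 if P_y[j] == y else -1.0
            P_X[j] += sign * step * (x - P_X[j])
        return P_X, P_y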

Similarly to Prototype Selection, evolutionary algorithms have been successfully used in Prototype Generation, since the problem can be expressed as a continuous space search problem. These algorithms are based on the positioning adjustment of prototypes, a suitable methodology for optimizing their location, and have become an outstanding approach to the resolution of the Prototype Generation problem.
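
As a schematic example of that continuous formulation (a generic DE/rand/1/bin sketch, not any of the specific algorithms cited below; the population size, F, CR and the fitness choice are illustrative assumptions), each individual encodes the coordinates of the whole generated set TG and its fitness is the 1-NN accuracy of TG over TR:

    import numpy as np

    def de_positioning(TR_X, TR_y, TG_X, TG_y, pop=10, F=0.5, CR=0.9,
                       gens=50, seed=0):
        """Differential Evolution over prototype coordinates."""
        rng = np.random.default_rng(seed)

        def acc(P):
            # 1-NN accuracy of the candidate prototype set over TR
            d = np.linalg.norm(TR_X[:, None, :] - P[None, :, :], axis=2)
            return np.mean(TG_y[d.argmin(axis=1)] == TR_y)

        X = np.stack([TG_X + 0.1 * rng.normal(size=TG_X.shape)
                      for _ in range(pop)])  # perturbed copies of TG
        f = np.array([acc(x) for x in X])
        for _ in range(gens):
            for i in range(pop):
                a, b, c = X[rng.choice(pop, size=3, replace=False)]
                mutant = a + F * (b - c)     # DE/rand/1 mutation
                trial = np.where(rng.random(TG_X.shape) < CR, mutant, X[i])
                f_t = acc(trial)
                if f_t >= f[i]:              # greedy one-to-one replacement
                    X[i], f[i] = trial, f_t
        return X[f.argmax()], TG_y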

Taxonomy

This section presents different properties of Prototype Generation methods which define their taxonomy. The most important aspects to categorize the Prototype Generation algorithms are the following:

  • Type of Reduction: It refers to the direction in which the search for a reduced set TG of prototypes representing the training set can proceed: Incremental, decremental, fixed or mixed.
  • Resulting generation set: This factor refers to the resulting set generated by the technique, that is, whether the final set will retain border, central or both types of points.
  • Generation mechanisms: This factor describes the different mechanisms adopted in the literature to build the final TG set.
  • Evaluation of Search: The last criterion relates to the way in which each step of the search is evaluated. KNN itself is an appropriate heuristic to guide the search of a Prototype Generation method. The decisions made by the heuristic must have an evaluation measure that allows the comparison of different alternatives. This criterion depends on whether or not KNN is used in such an evaluation, and it can be divided into three categories: Filter, Semi-Wrapper and Wrapper.

Figure 3 shows a map of the taxonomy, categorizing all the Prototype Generation methods described in this website. Later, they will be described and referenced.
 

Figure 3. Prototype Generation Taxonomy

The features that define each category are described as follows (a sketch of the centroid-based generation mechanism follows the list):

  • Type of Reduction:
    • Incremental: An incremental reduction starts with an empty reduced set TG, or with only some representative prototypes of each class. Next, a succession of additions of new prototypes or modifications of earlier prototypes occurs. One important advantage of this kind of reduction is that these techniques can be faster and need less storage during the learning phase than non-incremental algorithms. Furthermore, this type of reduction allows the technique to establish an adequate number of prototypes for each data set. Nevertheless, it can produce adverse results when a high number of prototypes is required to fit TR, producing overfitting.
    • Decremental: The decremental reduction begins with TG = TR, and then the algorithm starts reducing TG or modifying the prototypes in TG. It can be accomplished by following different procedures, such as merging, moving or removing prototypes and re-labeling classes. One advantage observed in decremental schemes is that all training examples are available for examination when making a decision. On the other hand, a shortcoming of these kinds of methods is that they usually present a high computational cost.
    • Fixed: It is common to use a fixed reduction in Prototype Generation. These methods establish the final number of prototypes for TG through a user-defined parameter related to the percentage of retention of TR. This is the main drawback of the approach, apart from being very dependent on each data set tackled; on the other hand, since the final size is fixed in advance, these techniques can focus solely on increasing the classification accuracy.
    • Mixed: A mixed reduction begins with a pre-selected subset TG, obtained either by random selection with fixed reduction or by running a Prototype Selection method, and then additions, modifications and removals of prototypes are performed on TG. This type of reduction combines the advantages of those above, allowing several rectifications that overcome the limitation of fixed reduction. However, these techniques are prone to overfit the data and they usually have a high computational cost.
  • Resulting generation set
    • Condensation: This set includes the techniques which return a reduced set of prototypes that are closer to the decision boundaries, also called border points. The reasoning behind retaining border points is that internal points do not affect the decision boundaries as much as border points, and thus can be removed with relatively little effect on classification. The idea behind these methods is to preserve the accuracy over the training set, but the generalization accuracy over the test set can be negatively affected. Nevertheless, the reduction capability of condensation methods is normally high, due to the fact that there are fewer border points than internal points in most data.
    • Edition: These schemes instead seek to remove or modify border points. They act over points that are noisy or do not agree with their nearest neighbors leaving smoother decision boundaries behind. However, such algorithms do not remove internal points that do not necessarily contribute to the decision boundaries. The effect obtained is related to the improvement of generalization accuracy in test data, although the reduction rate obtained is lower.
    • Hybrid: Hybrid methods try to find the smallest set TG which maintains or even increases the generalization accuracy on test data. To achieve this, they allow modifications of internal and border points based on specific criteria followed by the algorithm. The KNN classifier is highly adaptable to these methods, obtaining great improvements even with a very small reduced set of prototypes.
  • Generation mechanisms
    • Class re-labeling: This generation mechanism consists of changing the class labels of samples from TR which are suspected of being errors, i.e., of belonging to a different class. Its purpose is to cope with all types of imperfections in the training set (mislabeled, noisy and atypical cases). The effect obtained is closely related to the improvement in generalization accuracy on test data, although the reduction rate is kept fixed.
    • Centroid based: These techniques are based on generating artificial prototypes by merging a set of similar examples. The merging process is usually made from the computation of averaged attribute values over a selected set, yielding the so-called centroids. The identification and selection of the set of examples are the main concerns of the algorithms that belong to this category. These methods can obtain a high reduction rate but they are also related to accuracy rate losses.
    • Space Splitting: This set includes the techniques based on different heuristics for partitioning the feature space, along with several mechanisms to define new prototypes. The idea consists of dividing TR into regions which are then replaced with representative examples establishing the decision boundaries associated with the original TR. This mechanism works at the space level, since the partitions are found in order to discriminate, as well as possible, one set of examples from the others, whereas centroid-based approaches work at the data level, mainly focusing on the optimal selection of the set of examples to be merged. The reduction capabilities of these techniques usually depend on the number of regions needed to represent TR.
    • Positioning Adjustment: The methods that belong to this family aim to correct the position of a subset of prototypes from the initial set by using an optimization procedure. New prototype positions are obtained by moving prototypes through the m-dimensional space, adding or subtracting quantities to or from their attribute values. This mechanism is usually associated with a fixed or mixed type of reduction.
  • Evaluation of Search
    • Filter: We refer to filter techniques when the KNN rule is not used during the evaluation phase. Different heuristics are used instead to obtain the reduced set. They can be faster than KNN-guided approaches, but the accuracy obtained may be worse.
    • Semi-Wrapper: KNN is used on partial data to determine the criteria for making a certain decision. Thus, KNN performance can be measured over localized data, containing most of the prototypes that influence the decision. It is an intermediate approach in which a trade-off between efficiency and accuracy is expected.
    • Wrapper: In this case, the KNN rule fully guides the search by using the complete training set with the leave-one-out validation scheme. The conjunction of these two factors gives a good estimator of generalization accuracy, thus obtaining better accuracy over test data. However, each decision involves a complete computation of the KNN rule over the training set, and the evaluation phase can be computationally expensive.
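
As announced above, here is a minimal illustration of the centroid-based mechanism: a greedy merging sketch in the spirit of PNN-type methods (the target size r and the weighted-centroid pairing are assumptions of this sketch; the actual PNN additionally checks classification consistency before accepting each merge):

    import numpy as np

    def centroid_merge(TR_X, TR_y, r):
        """Repeatedly replace the closest same-class pair of prototypes by
        their weighted centroid until only r prototypes remain."""
        P_X = TR_X.astype(float).copy()
        P_y = TR_y.copy()
        w = np.ones(len(P_y))                # originals behind each prototype
        while len(P_y) > r:
            best, pair = np.inf, None
            for i in range(len(P_y)):
                for j in range(i + 1, len(P_y)):
                    if P_y[i] != P_y[j]:
                        continue             # only merge within a class
                    d = np.linalg.norm(P_X[i] - P_X[j])
                    if d < best:
                        best, pair = d, (i, j)
            if pair is None:                 # no same-class pair left
                break
            i, j = pair
            P_X[i] = (w[i] * P_X[i] + w[j] * P_X[j]) / (w[i] + w[j])
            w[i] += w[j]
            keep = np.arange(len(P_y)) != j
            P_X, P_y, w = P_X[keep], P_y[keep], w[keep]
        return P_X, P_y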

Prototype Generation Methods

The Prototype Generation field has provided a considerable number of methods over the last forty years. Figure 4 depicts the evolution of the area, showing the year of publication of each method. Since the methods are gathered according to their place in the taxonomy, this map also allows one to observe the differences in the growth of each of the sub-areas.
 

Figure 4. Evolution map of the Prototype Generation field

Each dot in the map represents a significant contribution to the area. More details about them can be found in the SCI2S technical report on Prototype Generation methods:

I. Triguero, J. Derrac, S. García and F. Herrera, Prototype Generation for Nearest Neighbor Classification: Survey of Methods.

References of the methods (by year of publishing)

  • 1974
    • Prototype Nearest Neighbor (PNN)

      C.-L. Chang, Finding prototypes for nearest neighbor classifiers. IEEE Transactions on Computers 23:11 (1974) 1179–1184, doi: 10.1109/T-C.1974.223827

  • 1981
    • Generalized Editing using Nearest Neighbor (GENN)

      J. Koplowitz and T. Brown, On the relation of performance to editing in nearest neighbor rules. Pattern Recognition 13 (1981) 251–255, doi: 10.1016/0031-3203(81)90102-3

  • 1990
    • Learning Vector Quantization 1 (LVQ1)
    • Learning Vector Quantization 2 (LVQ2)
    • Learning Vector Quantization 2.1 (LVQ2.1)
    • Learning Vector Quantization 3 (LVQ3)

      T. Kohonen, The self-organizing map. Proceedings of the IEEE 78:9 (1990) 1464–1480.

  • 1991
    • Decision Surface Mapping (DSM)

      S. Geva and J. Sitte, Adaptive nearest neighbor pattern classification. IEEE Transactions on Neural Networks 2:2 (1991) 318–322, doi: 10.1109/72.80344

  • 1993
    • Vector Quantization (VQ)

      Q. Xie, C. A. Laszlo, and R. K. Ward, Vector quantization technique for nonparametric classifier design. IEEE Transactions on Pattern Analysis and Machine Intelligence 15:12 (1993) 1326–1330, doi: 10.1109/34.250849

  • 1996
    • Chen Algorithm (Chen)

      C. H. Chen and A. Jozwik, A sample set condensation algorithm for the class sensitive artificial neural network. Pattern Recognition Letters 17:8 (1996) 819–823, doi: 10.1016/0167-8655(96)00041-4

  • 1997
    • Bootstrap Technique for Nearest Neighbor (BTS3)

      Y. Hamamoto, S. Uchimura, and S. Tomita, A bootstrap technique for nearest neighbor classifier design. IEEE Transactions on Pattern Analysis and Machine Intelligence 19:1 (1997) 73–79, doi: 10.1109/34.566814

    • Learning Vector Quantization with Training Counter (LVQTC)

      R. Odorico, Learning vector quantization with training count (LVQTC). Neural Networks 10:6 (1997) 1083–1088, doi: 10.1016/S0893-6080(97)00012-9

    • MSE (MSE)

      C. Decaestecker, Finding prototypes for nearest neighbour classification by means of gradient descent and deterministic annealing. Pattern Recognition 30:2 (1997) 281–288, doi: 10.1016/S0031-3203(96)00072-6

  • 1998
    • Modified Chang’s Algorithm (MCA)

      J. C. Bezdek, T. R. Reichherzer, G. Lim, and Y. Attikiouzel, Multiple prototype classifier design. IEEE Transactions on Systems, Man and Cybernetics, Part C 28:1 (1998) 67–79, doi: 10.1109/5326.661091

  • 2002
    • Generalized Modified Chang’s Algorithm (GMCA)

      R. Mollineda, F. Ferri, E. Vidal, A merge-based condensing strategy for multiple prototype classifiers. IEEE Transactions on Systems, Man and Cybernetics B 32:5 (2002) 662–668, doi: 10.1109/TSMCB.2002.1033185

    • Integrated Concept Prototype Learner (ICPL)
    • Integrated Concept Prototype Learner 2 (ICPL2)
    • Integrated Concept Prototype Learner 3 (ICPL3)
    • Integrated Concept Prototype Learner 4 (ICPL4)

      W. Lam, C. K. Keung, and D. Liu, Discovering useful concept prototypes for classification based on filtering and abstraction. IEEE Transactions on Pattern Analysis and Machine Intelligence 24:8 (2002) 1075-1090, doi: 10.1109/TPAMI.2002.1023804

  • 2003
    • Depuration Algorithm (Depur)

      J. S. Sánchez, R. Barandela, A. I. Marqués, R. Alejo, and J. Badenas, Analysis of new techniques to obtain quality training sets. Pattern Recognition Letters 24:7 (2003) 1015-1022, doi: 10.1016/S0167-8655(02)00225-8

    • Hybrid LVQ3 algorithm (HYB)

      S.-W. Kim and B. J. Oommen, Enhancing prototype reduction schemes with LVQ3-type algorithms. Pattern Recognition 36 (2003) 1083–1093, doi: 10.1016/S0031-3203(02)00115-2

  • 2004
    • Reduction by space partitioning 1 (RSP1)
    • Reduction by space partitioning 2 (RSP2)
    • Reduction by space partitioning 3 (RSP3)

      J. S. Sánchez, High training set size reduction by space partitioning and prototype abstraction. Pattern Recognition 37:7 (2004) 1561-1564, doi: 10.1016/j.patcog.2003.12.012

    • Evolutionary Nearest Prototype Classifier (ENPC)

      F. Fernández and P. Isasi, Evolutionary design of nearest prototype classifiers. Journal of Heuristics 10:4 (2004) 431–454, doi: 10.1023/B:HEUR.0000034715.70386.5b

    • Adaptive Vector Quantization (AVQ)

      C.-W. Yen, C.-N. Young, and M. L. Nagurka, A vector quantization method for nearest neighbor classifier design. Pattern Recognition Letters 25:6 (2004) 725–731, doi: 10.1016/j.patrec.2004.01.012

  • 2005
    • Learning Vector Quantization with Pruning (LVQPRU)

      J. Li, M. T. Manry, C. Yu, and D. R. Wilson, Prototype classifier design with pruning. International Journal on Artificial Intelligence Tools 14:1-2 (2005) 261–280, doi: 10.1142/S0218213005002090

    • Pairwise Opposite Class Nearest Neighbor (POC-NN)

      T. Raicharoen and C. Lursinsap, A divide-and-conquer approach to the pairwise opposite class-nearest neighbor (POC-NN) algorithm. Pattern Recognition Letters 26 (2005) 1554-1567, doi: 10.1016/j.patrec.2005.01.003

  • 2006
    • Adaptive Condensing Algorithm Based on Mixtures of Gaussians (MGauss)

      M. Lozano, J. M. Sotoca, J. S. Sánchez, F. Pla, E. Pekalska, and R. P. W. Duin, Experimental study on prototype optimisation algorithms for prototype-based classification in vector spaces. Pattern Recognition 39:10 (2006) 1827–1838, doi: 10.1016/j.patcog.2006.04.005

  • 2007
    • Self-generating Prototypes (SGP)

      H. A. Fayed, S. R. Hashem, and A. F. Atiya, Self-generating prototypes for pattern classification. Pattern Recognition 40:5 (2007) 1498-1509, doi: 10.1016/j.patcog.2006.10.018

    • Adaptive Michigan Particle Swarm Optimization (AMPSO)

      A. Cervantes, I. Galván, P. Isasi, An Adaptive Michigan Approach PSO for Nearest Prototype Classification. 2nd International Work-Conference on the Interplay Between Natural and Artificial Computation (IWINAC 2007), Lecture Notes in Computer Science 4528 (2007) 287-296, doi: 10.1007/978-3-540-73055-2_31

      A. Cervantes, I. M. Galván, and P. Isasi, AMPSO: A new particle swarm method for nearest neighborhood classification. IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics 39:5 (2009) 1082-1091, doi: 10.1109/TSMCB.2008.2011816

  • 2008
    • Prototype Selection Clonal Selection Algorithm (PSCSA)

      U. Garain, Prototype reduction using an artificial immune model. Pattern Analysis and Applications 11:3-4 (2008) 353-363, doi: 10.1007/s10044-008-0106-1

    • Particle Swarm Optimization (PSO)

      L. Nanni and A. Lumini, Particle swarm optimization for prototype reduction. Neurocomputing, 72:4-6 (2008) 1092–1097, doi: 10.1016/j.neucom.2008.03.008

  • 2009
    • Iterative Prototype Adjustment based on Differential Evolution (IPADE)

      I. Triguero, S. García, F. Herrera. IPADE: Iterative Prototype Adjustment for Nearest Neighbor Classification. IEEE Transactions on Neural Networks 21:12 (2010) 1984-1990, doi: 10.1109/TNN.2010.2087415

  • 2010
    • Differential Evolution (DE)
    • Self-Adaptive Differential Evolution (SADE)
    • Scale Factor Local Search in Differential Evolution (SFLSDE)
    • Adaptive Differential Evolution with Optional External Archive (JADE)
    • Differential Evolution using a Neighborhood-Based Mutation Operator (DEGL)
    • Hybrid Iterative Case Filtering + Learning Vector Quantization 3 (ICF-LVQ3)
    • Hybrid Iterative Case Filtering + Particle Swarm Optimization (ICF-PSO)
    • Hybrid Iterative Case Filtering + Scale Factor Local Search in Differential Evolution (ICF-SFLSDE)
    • Hybrid Decremental Reduction Optimization Procedure 3 + Learning Vector Quantization 3 (DROP3-LVQ3)
    • Hybrid Decremental Reduction Optimization Procedure 3 + Particle Swarm Optimization (DROP3-PSO)
    • Hybrid Decremental Reduction Optimization Procedure 3 + Scale Factor Local Search in Differential Evolution (DROP3-SFLSDE)
    • Hybrid Steady-State Memetic Algorithm for Instance Selection + Learning Vector Quantization 3 (SSMA-LVQ3)
    • Hybrid Steady-State Memetic Algorithm for Instance Selection + Particle Swarm Optimization (SSMA-PSO)
    • Hybrid Steady-State Memetic Algorithm for Instance Selection + Scale Factor Local Search in Differential Evolution (SSMA-SFLSDE)

      I. Triguero, S. García, F. Herrera. Differential evolution for optimizing the positioning of prototypes in nearest neighbor classification. Pattern Recognition 44:4 (2011) 901-916, doi: 10.1016/j.patcog.2010.10.020

Experimental Analyses

In this section, the most important Prototype Generation methods are analyzed. The comparative analyses have been mostly taken from the SCI2S survey on Prototype Generation (I. Triguero, J. Derrac, S. García, F. Herrera, A Taxonomy and Experimental Study on Prototype Generation for Nearest Neighbor Classification. IEEE Transactions on Systems, Man, and Cybernetics--Part C: Applications and Reviews 42:1 (2012) 86-100, doi: 10.1109/TSMCC.2010.2103939), with the addition of two more recent techniques:

  • IPADE:

    I. Triguero, S. García, F. Herrera. IPADE: Iterative Prototype Adjustment for Nearest Neighbor Classification. IEEE Transactions on Neural Networks 21:12 (2010) 1984-1990.

    COMPLEMENTARY MATERIAL to the paper here: datasets, experimental results and source codes.

  • SSMA-SFLSDE:

    I. Triguero, S. García, F. Herrera. Differential evolution for optimizing the positioning of prototypes in nearest neighbor classification. Pattern Recognition 44:4 (2011) 901-916.

The data sets, parameters, and the remaining details of the experimental study are described at the survey's complementary web page. Result sheets are categorized according to each method's main place in the taxonomy: Positioning Adjustment, Class Re-labeling, Centroid-based and Space Splitting methods. For each category, average results in accuracy, kappa and reduction rate are provided (a minimal sketch of how these three metrics are computed is given below, after the result links):
 

  • Positioning Adjustment methods:
    • Small data sets  iconExcel.gif  ods.png
    • Medium data sets  iconExcel.gif  ods.png
  • Class Re-labeling methods:
    • Small data sets  iconExcel.gif  ods.png
    • Medium data sets  iconExcel.gif  ods.png
  • Centroid-based methods:
    • Small data sets  iconExcel.gif  ods.png
    • Medium data sets  iconExcel.gif  ods.png
  • Space Splitting methods:
    • Small data sets  iconExcel.gif  ods.png
    • Medium data sets  iconExcel.gif  ods.png
  • All methods:
    • Small data sets  iconExcel.gif  ods.png
    • Medium data sets  iconExcel.gif  ods.png

All results (zipped):

SCI2S Approaches on Prototype Generation

Below we provide an outline of the SCI2S published proposals on Prototype Generation.

I. Triguero, S. García and F. Herrera, IPADE: Iterative Prototype Adjustment for Nearest Neighbor Classification. IEEE Transactions on Neural Networks 21:12 (2010) 1984-1990. COMPLEMENTARY MATERIAL to the paper here: datasets, experimental results and source codes, doi: 10.1109/TNN.2010.2087415   PDF Icon

Abstract: Nearest prototype methods are a successful trend of many pattern classification tasks. However, they present several shortcomings such as time response, noise sensitivity, and storage requirements. Data reduction techniques are suitable to alleviate these drawbacks. Prototype generation is an appropriate process for data reduction, which allows the fitting of a dataset for nearest neighbor (NN) classification. This brief presents a methodology to learn iteratively the positioning of prototypes using real parameter optimization procedures. Concretely, we propose an iterative prototype adjustment technique based on differential evolution. The results obtained are contrasted with nonparametric statistical tests and show that our proposal consistently outperforms previously proposed methods, thus becoming a suitable tool in the task of enhancing the performance of the NN classifier.

I. Triguero, S. García and F. Herrera, Enhancing IPADE Algorithm with a Different Individual Codification. 6th International Conference on Hybrid Artificial Intelligence Systems (HAIS2011). LNAI 6679, Wroclaw (Poland, 2011) 262-270 PDF Icon

Abstract: Nearest neighbor is one of the most used techniques for performing classification tasks. However, its simplest version has several drawbacks, such as low efficiency, storage requirements and sensitivity to noise. Prototype generation is an appropriate process to alleviate these drawbacks that allows the fitting of a data set for nearest neighbor classification. In this work, we present an extension of our previous proposal called IPADE, a methodology to learn iteratively the positioning of prototypes using a differential evolution algorithm. In this extension, which we have called IPADECS, a complete solution is codified in each individual. The results are contrasted with non-parametrical statistical tests and show that our proposal outperforms previously proposed methods.

 

I. Triguero, S. García and F. Herrera, Differential evolution for optimizing the positioning of prototypes in nearest neighbor classification. Pattern Recognition 44:4 (2011) 901-916, doi: 10.1016/j.patcog.2010.10.020   PDF Icon

Abstract: Nearest neighbor classification is one of the most used and well known methods in data mining. Its simplest version has several drawbacks, such as low efficiency, high storage requirements and sensitivity to noise. Data reduction techniques have been used to alleviate these shortcomings. Among them, prototype selection and generation techniques have been shown to be very effective. Positioning adjustment of prototypes is a successful trend within the prototype generation methodology.

Evolutionary algorithms are adaptive methods based on natural evolution that may be used for searching and optimization. Positioning adjustment of prototypes can be viewed as an optimization problem, thus it can be solved using evolutionary algorithms. This paper proposes a differential evolution based approach for optimizing the positioning of prototypes. Specifically, we provide a complete study of the performance of four recent advances in differential evolution. Furthermore, we show the good synergy obtained by the combination of a prototype selection stage with an optimization of the positioning of prototypes previous to nearest neighbor classification. The results are contrasted with non-parametrical statistical tests and show that our proposals outperform previously proposed methods.
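
To make the positioning adjustment idea concrete, the sketch below implements a plain DE/rand/1 scheme with binomial crossover over the coordinates of a fixed-size prototype set, using the training accuracy of the resulting 1NN rule as fitness. It is our own simplified reconstruction, not the authors' code (names such as de_positioning and n_per_class are ours), and it omits the four advanced DE variants and the prototype selection stage studied in the paper:

    import numpy as np

    rng = np.random.default_rng(0)

    def nn_accuracy(prototypes, proto_labels, X, y):
        # Training accuracy of the 1NN rule defined by the candidate set.
        d = np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=2)
        return np.mean(proto_labels[d.argmin(axis=1)] == y)

    def de_positioning(X, y, n_per_class=2, pop_size=20, F=0.5, CR=0.9, iters=200):
        classes = np.unique(y)
        proto_labels = np.repeat(classes, n_per_class)
        dim = len(proto_labels) * X.shape[1]

        # Each individual encodes a whole prototype set, seeded from
        # random training examples of the matching class.
        def random_individual():
            rows = [X[rng.choice(np.flatnonzero(y == c))] for c in proto_labels]
            return np.concatenate(rows)

        pop = np.array([random_individual() for _ in range(pop_size)])
        fit = np.array([nn_accuracy(p.reshape(-1, X.shape[1]), proto_labels, X, y)
                        for p in pop])

        for _ in range(iters):
            for i in range(pop_size):
                a, b, c = rng.choice([j for j in range(pop_size) if j != i],
                                     size=3, replace=False)
                mutant = pop[a] + F * (pop[b] - pop[c])      # DE/rand/1 mutation
                cross = rng.random(dim) < CR                 # binomial crossover
                cross[rng.integers(dim)] = True
                trial = np.where(cross, mutant, pop[i])
                f = nn_accuracy(trial.reshape(-1, X.shape[1]), proto_labels, X, y)
                if f >= fit[i]:                              # greedy replacement
                    pop[i], fit[i] = trial, f

        best = pop[fit.argmax()].reshape(-1, X.shape[1])
        return best, proto_labels

In the hybrid models listed earlier (e.g., SSMA-SFLSDE), a prototype selection method supplies the initial prototype set instead of the random class-wise sampling used in this sketch.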

 

I. Triguero, J. Derrac, S. García and F. Herrera, A Taxonomy and Experimental Study on Prototype Generation for Nearest Neighbor Classification. IEEE Transactions on Systems, Man, and Cybernetics--Part C: Applications and Reviews 42:1 (2012) 86-100, doi: 10.1109/TSMCC.2010.2103939   PDF Icon
COMPLEMENTARY MATERIAL to the paper here: datasets, experimental results and source codes.

Abstract: The nearest neighbor rule is one of the most successfully used techniques for resolving classification and pattern recognition tasks. Despite its high classification accuracy, this rule suffers from several shortcomings in time response, noise sensitivity and high storage requirements. These weaknesses have been tackled from many different approaches, among them, a good and well-known solution that we can find in the literature consists of reducing the data used for the classification rule (training data). Prototype reduction techniques can be divided into two different approaches, known as prototype selection and prototype generation or abstraction. The former process consists of choosing a subset of the original training data, whereas prototype generation builds new artificial prototypes to increase the accuracy of the nearest neighbor classification. In this paper we provide a survey of prototype generation methods specifically designed for the nearest neighbor rule. From a theoretical point of view, we propose a taxonomy based on the main characteristics presented in them. Furthermore, from an empirical point of view, we conduct a wide experimental study which involves small and large data sets for measuring their performance in terms of accuracy and reduction capabilities. The results are contrasted through non-parametrical statistical tests. Several remarks are made to understand which prototype generation models are appropriate for application to different data sets.

Prototype Reduction Outlook

The field of Prototype Reduction has experienced flourishing growth over the last decades. Many interesting works have been developed in the Prototype Selection and Generation areas: not only new methods, but also many survey papers and related approaches for enhancing the quality of KNN classifiers.

In this section, we review the most representative works in Prototype Reduction, focusing our interest on surveys and key milestones in the development of the field. Given their importance in that development, evolutionary approaches are also highlighted. In addition, other relevant approaches are pointed out, including works on stratification, data complexity, imbalanced data, hyperrectangles, hybrid approaches and so forth.

Finally, to conclude this section, we give a succinct study of Prototype Reduction visibility at the Web of Science (WoS), which can be useful for analyzing the current state of the field.

Key Milestones & Surveys

T. M. Cover, P. E. Hart, Nearest Neighbor Pattern Classification. IEEE Transactions on Information Theory 13 (1967) 21-27, doi: 10.1109/TIT.1967.1053964    PDF Icon

Abstract: The nearest neighbor decision rule assigns to an unclassified sample point the classification of the nearest of a set of previously classified points. This rule is independent of the underlying joint distribution on the sample points and their classifications, and hence the probability of error R of such a rule must be at least as great as the Bayes probability of error R*, the minimum probability of error over all decision rules taking underlying probability structure into account. However, in a large sample analysis, we will show in the M-category case that R* ≤ R ≤ R*(2 - MR*/(M-1)), where these bounds are the tightest possible, for all suitably smooth underlying distributions. Thus for any number of categories, the probability of error of the nearest neighbor rule is bounded above by twice the Bayes probability of error. In this sense, it may be said that half the classification information in an infinite sample set is contained in the nearest neighbor.
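
Cleanly typeset, the asymptotic bounds stated in the abstract read as follows; the two-class specialization is added here as an illustration:

    % Cover-Hart bounds on the asymptotic NN error R, for Bayes error R*
    % and M classes:
    \[ R^{*} \le R \le R^{*}\Bigl(2 - \frac{M R^{*}}{M-1}\Bigr) \le 2R^{*} \]
    % For M = 2 the upper bound reduces to:
    \[ R \le 2R^{*}(1 - R^{*}) \]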

 

D. W. Aha, D. Kibler, M. K. Albert, Instance-Based Learning Algorithms. Machine Learning 6 (1991) 37-66, doi: 10.1023/A:1022689900470    PDF Icon

Abstract: Storing and using specific instances improves the performance of several supervised learning algorithms. These include algorithms that learn decision trees, classification rules, and distributed networks. However, no investigation has analyzed algorithms that use only specific instances to solve incremental learning tasks. In this paper, we describe a framework and methodology, called instance-based learning, that generates classification predictions using only specific instances. Instance-based learning algorithms do not maintain a set of abstractions derived from specific instances. This approach extends the nearest neighbor algorithm, which has large storage requirements. We describe how storage requirements can be significantly reduced with, at most, minor sacrifices in learning rate and classification accuracy. While the storage-reducing algorithm performs well on several real-world databases, its performance degrades rapidly with the level of attribute noise in training instances. Therefore, we extended it with a significance test to distinguish noisy instances. This extended algorithm's performance degrades gracefully with increasing noise levels and compares favorably with a noise-tolerant decision tree algorithm.

 

D. R. Wilson, T. R. Martinez, Reduction Techniques for Instance-Based Learning Algorithms. Machine Learning 38 (2000) 257–286, doi: 10.1023/A:1007626913721    PDF Icon

Abstract: Instance-based learning algorithms are often faced with the problem of deciding which instances to store for use during generalization. Storing too many instances can result in large memory requirements and slow execution speed, and can cause an oversensitivity to noise. This paper has two main purposes. First, it provides a survey of existing algorithms used to reduce storage requirements in instance-based learning algorithms and other exemplar-based algorithms. Second, it proposes six additional reduction algorithms called DROP1–DROP5 and DEL (three of which were first described in Wilson & Martinez, 1997c, as RT1–RT3) that can be used to remove instances from the concept description. These algorithms and 10 algorithms from the survey are compared on 31 classification tasks. Of those algorithms that provide substantial storage reduction, the DROP algorithms have the highest average generalization accuracy in these experiments, especially in the presence of uniform class noise.
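
To make the removal criterion concrete, here is a deliberately naive Python sketch of the DROP1 rule described in the abstract: an instance P is removed when at least as many of its associates (instances that have P among their k nearest neighbors) are classified correctly without P as with it. This is our own reconstruction; the published algorithm caches neighbor and associate lists instead of recomputing them at every step:

    import numpy as np

    def drop1(X, y, k=3):
        keep = list(range(len(X)))

        def knn_label(idx_set, target):
            # Majority label among target's k nearest neighbors in idx_set.
            cand = [i for i in idx_set if i != target]
            d = np.linalg.norm(X[cand] - X[target], axis=1)
            votes = y[[cand[i] for i in np.argsort(d)[:k]]]
            vals, counts = np.unique(votes, return_counts=True)
            return vals[counts.argmax()]

        for p in list(keep):
            # Associates of p: kept instances with p among their k nearest.
            associates = []
            for a in keep:
                if a == p:
                    continue
                cand = [i for i in keep if i != a]
                d = np.linalg.norm(X[cand] - X[a], axis=1)
                if p in [cand[i] for i in np.argsort(d)[:k]]:
                    associates.append(a)
            with_p = sum(knn_label(keep, a) == y[a] for a in associates)
            without = [i for i in keep if i != p]
            wo_p = sum(knn_label(without, a) == y[a] for a in associates)
            if wo_p >= with_p:          # removing p does not hurt its associates
                keep.remove(p)

        return np.array(keep)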

 

J. C. Bezdek, L. I. Kuncheva, Nearest Prototype Classifier Designs: An Experimental Study. International Journal of Intelligent Systems 16 (2001) 1445-1473, doi: 10.1002/int.1068    PDF Icon

Abstract: We compare eleven methods for finding prototypes upon which to base the nearest prototype classifier. Four methods for prototype selection are discussed: Wilson + Hart (a condensation + error-editing method), and three types of combinatorial search: random search, genetic algorithm, and tabu search. Seven methods for prototype extraction are discussed: unsupervised vector quantization, supervised learning vector quantization (with and without training counters), decision surface mapping, a fuzzy version of vector quantization, c-means clustering, and bootstrap editing. These eleven methods can be usefully divided in two other ways: by whether they employ pre- or post-supervision; and by whether the number of prototypes found is user-defined or "automatic". Generalization error rates of the 11 methods are estimated on two synthetic and two real data sets. Offering the usual disclaimer that these are just a limited set of experiments, we feel confident in asserting that presupervised, extraction methods offer a better chance for success to the casual user than postsupervised, selection schemes. Finally, our calculations do not suggest that methods which find the "best" number of prototypes "automatically" are superior to methods for which the user simply specifies the number of prototypes.

 

S. W. Kim, B. J. Oommen, A brief taxonomy and ranking of creative prototype reduction schemes. Pattern Analysis and Applications 6 (2003) 232-244, doi: 10.1007/s10044-003-0191-0 PDF Icon

Abstract: Various Prototype Reduction Schemes (PRS) have been reported in the literature. Based on their operating characteristics, these schemes fall into two fairly distinct categories: those which are of a creative sort, and those which are essentially selective. The norms for evaluating these methods are, typically, the reduction rate and the classification accuracy. It is generally believed that the former class of methods is superior to the latter. In this paper, we report the results of executing various creative PRSs, and attempt to comparatively quantify their capabilities. The paper presents a brief taxonomy of the various reported PRS schemes. Our experimental results for three artificial data sets, and for samples involving real-life data sets, demonstrate that no single method is uniformly superior to the others for all kinds of applications. This result, though consistent with the findings of Bezdek and Kuncheva, is, in one sense, counter-intuitive, because the various researchers have presented their specific PRS with the hope that it would be superior to the previously reported methods. However, the fact is that while one method is superior in certain domains, it is inferior to another method when dealing with a data set with markedly different characteristics. The conclusion of this study is that the question of determining when one method is superior to another remains open. Indeed, it appears as if the designers of the pattern recognition system will have to choose the appropriate PRS based on the specific characteristics of the data that they are studying. The paper also suggests answers to various hypotheses that relate to the accuracies and reduction rates of families of PRS.

 

J. Derrac, S. García, F. Herrera. A Survey on Evolutionary Instance Selection and Generation. International Journal of Applied Metaheuristic Computing, 1:1 (2010) 60-92 doi: 10.4018/IJAMC.2010010104    PDF Icon

Abstract: The use of Evolutionary Algorithms to perform data reduction tasks has become an effective approach to improve the performance of data mining algorithms. Many proposals in the literature have shown that Evolutionary Algorithms obtain excellent results in their application as Instance Selection and Instance Generation procedures. The purpose of this article is to present a survey on the application of Evolutionary Algorithms to Instance Selection and Generation process. It will cover approaches applied to the enhancement of the nearest neighbor rule, as well as other approaches focused on the improvement of the models extracted by some well-known data mining algorithms. Furthermore, some proposals developed to tackle two emerging problems in data mining, Scaling Up and Imbalance Data Sets, also are reviewed.

 

J. Olvera-López, J. Carrasco-Ochoa, J. Martínez-Trinidad, J. Kittler, A review of instance selection methods. Artificial Intelligence Review 34:2 (2010) 133-143, doi: 10.1007/s10462-010-9165-y PDF Icon

Abstract: In supervised learning, a training set providing previously known information is used to classify new instances. Commonly, several instances are stored in the training set, but some of them are not useful for classifying; therefore, it is possible to get acceptable classification rates while ignoring non-useful cases. This process is known as instance selection. Through instance selection, the training set is reduced, which allows reducing runtimes in the classification and/or training stages of classifiers. This work is focused on presenting a survey of the main instance selection methods reported in the literature.

 

I. Triguero, J. Derrac, S. García and F. Herrera, A Taxonomy and Experimental Study on Prototype Generation for Nearest Neighbor Classification. IEEE Transactions on Systems, Man, and Cybernetics--Part C: Applications and Reviews 42:1 (2012) 86-100, doi: 10.1109/TSMCC.2010.2103939   PDF Icon
COMPLEMENTARY MATERIAL to the paper here: datasets, experimental results and source codes.

Abstract: The nearest neighbor rule is one of the most successfully used techniques for resolving classification and pattern recognition tasks. Despite its high classification accuracy, this rule suffers from several shortcomings in time response, noise sensitivity and high storage requirements. These weaknesses have been tackled from many different approaches, among them, a good and well-known solution that we can find in the literature consists of reducing the data used for the classification rule (training data). Prototype reduction techniques can be divided into two different approaches, known as prototype selection and prototype generation or abstraction. The former process consists of choosing a subset of the original training data, whereas prototype generation builds new artificial prototypes to increase the accuracy of the nearest neighbor classification. In this paper we provide a survey of prototype generation methods specifically designed for the nearest neighbor rule. From a theoretical point of view, we propose a taxonomy based on the main characteristics presented in them. Furthermore, from an empirical point of view, we conduct a wide experimental study which involves small and large data sets for measuring their performance in terms of accuracy and reduction capabilities. The results are contrasted through non-parametrical statistical tests. Several remarks are made to understand which prototype generation models are appropriate for application to different data sets.

 

S. García, J. Derrac, J.R. Cano, F. Herrera. Prototype Selection for Nearest Neighbor Classification: Taxonomy and Empirical Study. IEEE Transactions on Pattern Analysis and Machine Intelligence 34:3 (2012) 417-435 doi: 10.1109/TPAMI.2011.142    PDF Icon
COMPLEMENTARY MATERIAL to the paper here: datasets, experimental results and source codes.

Abstract: The nearest neighbor classifier is one of the most used and well known techniques for performing recognition tasks. It has also demonstrated itself to be one of the most useful algorithms in data mining in spite of its simplicity. However, the nearest neighbor classifier suffers from several drawbacks such as high storage requirements, low efficiency in classification response and low noise tolerance. These weaknesses have been the subject of study for many researchers and many solutions have been proposed. Among them, one of the most promising solutions consists of reducing the data used for establishing a classification rule (training data), by means of selecting relevant prototypes. Many prototype selection methods exist in the literature and the research in this area is still advancing. Different properties could be observed in the definition of them but no formal categorization has been established yet. This paper provides a survey of the prototype selection methods proposed in the literature from a theoretical and empirical point of view. Considering a theoretical point of view, we propose a taxonomy based on the main characteristics presented in prototype selection and we analyze their advantages and drawbacks. Empirically, we conduct an experimental study involving different sizes of data sets for measuring their performance in terms of accuracy, reduction capabilities and run-time. The results obtained by all the methods studied have been verified by nonparametric statistical tests. Several remarks, guidelines and recommendations are made for the use of prototype selection for nearest neighbor classification.

Hybrid approaches for Prototype and Feature Reduction

L. I. Kuncheva, L. C. Jain, Nearest Neighbor Classifier: Simultaneous Editing and Feature Selection. Pattern Recognition Letters 20 (1999) 1149-1156 doi: 10.1016/S0167-8655(99)00082-3 PDF Icon

Abstract: Nearest neighbor classifiers demand significant computational resources (time and memory). Editing of the reference set and feature selection are two different approaches to this problem. Here we encode the two approaches within the same genetic algorithm (GA) and simultaneously select features and reference cases. Two data sets were used: the SATIMAGE data and a generated data set. The GA was found to be an expedient solution compared to editing followed by feature selection, feature selection followed by editing, and the individual results from feature selection and editing.

 

S.-Y. Ho, C.-C. Liu, S. Liu, Design of an Optimal Nearest Neighbor Classifier Using an Intelligent Genetic Algorithm. Pattern Recognition Letters 23:13 (2002) 1495-1503 doi: 10.1016/S0167-8655(02)00109-5 PDF Icon

Abstract: The goal of designing an optimal nearest neighbor classifier is to maximize the classification accuracy while minimizing the sizes of both the reference and feature sets. A novel intelligent genetic algorithm (IGA) superior to conventional GAs in solving large parameter optimization problems is used to effectively achieve this goal. It is shown empirically that the IGA-designed classifier outperforms existing GA-based and non-GA-based classifiers in terms of classification accuracy and total number of parameters of the reduced sets.

 

J.-H. Chen, H.-M. Chen, S.-Y. Ho, Design of Nearest Neighbor Classifiers: Multi-Objective Approach. International Journal of Approximate Reasoning 40 (2005) 3-22, doi: 10.1016/j.ijar.2004.11.009 PDF Icon

Abstract: The goal of designing optimal nearest neighbor classifiers is to maximize classification accuracy while minimizing the sizes of both reference and feature sets. A usual way is to adaptively weight the three objectives as an objective function and then use a single-objective optimization method for achieving this goal. This paper proposes a multi-objective approach to cope with the weight tuning problem for practitioners. A novel intelligent multi-objective evolutionary algorithm IMOEA is utilized to simultaneously edit compact reference and feature sets for nearest neighbor classification. Three comparison studies are designed to evaluate performance of the proposed approach. It is shown empirically that the IMOEA-designed classifiers have high classification accuracy and small sizes of reference and feature sets. Moreover, IMOEA can provide a set of good solutions for practitioners to choose from in a single run. The simulation results indicate that the IMOEA-based approach is an expedient method to design nearest neighbor classifiers, compared with an existing single-objective approach.

 

F. Ros, S. Guillaume, M. Pintore, J. R. Chretien, Hybrid genetic algorithm for dual selection. Pattern Analysis and Applications 11 (2008) 179-198, doi: 10.1007/s10044-007-0089-3 PDF Icon

Abstract: In this paper, a hybrid genetic approach is proposed to solve the problem of designing a subdatabase of the original one with the highest classification performances, the lowest number of features and the highest number of patterns. The method can simultaneously treat the double problem of editing instance patterns and selecting features as a single optimization problem, and therefore aims at providing a better level of information. The search is optimized by dividing the algorithm into self-controlled phases managed by a combination of pure genetic process and dedicated local approaches. Different heuristics such as an adapted chromosome structure and evolutionary memory are introduced to promote diversity and elitism in the genetic population. They particularly facilitate the resolution of real applications in the chemometric field presenting databases with large feature sizes and medium cardinalities. The study focuses on the double objective of enhancing the reliability of results while reducing the time consumed by combining genetic exploration and a local approach in such a way that excessive computational CPU costs are avoided. The usefulness of the method is demonstrated with artificial and real data and its performance is compared to other approaches.

 

J. Derrac, S. García, F. Herrera, Instance and Feature Selection based on Cooperative Coevolution with Nearest Neighbor Rule. Pattern Recognition 43:6 (2010) 2082-2105 doi: 10.1016/j.patcog.2009.12.012 PDF Icon

Abstract: Feature and instance selection are two effective data reduction processes which can be applied to classification tasks obtaining promising results. Although both processes are defined separately, it is possible to apply them simultaneously. This paper proposes an evolutionary model to perform feature and instance selection in nearest neighbor classification. It is based on cooperative coevolution, which has been applied to many computational problems with great success. The proposed approach is compared with a wide range of evolutionary feature and instance selection methods for classification. The results contrasted through non-parametric statistical tests show that our model outperforms previously proposed evolutionary approaches for performing data reduction processes in combination with the nearest neighbor rule.

 

J. Derrac, C. Cornelis, S. García, F. Herrera, Enhancing evolutionary instance selection algorithms by means of fuzzy rough set based feature selection. Information Sciences 186 (2012) 73–92 doi: 10.1016/j.ins.2011.09.027 PDF Icon

Abstract: In recent years, fuzzy rough set theory has emerged as a suitable tool for performing feature selection. Fuzzy rough feature selection enables us to analyze the discernibility of the attributes, highlighting the most attractive features in the construction of classifiers. However, its results can be enhanced even more if other data reduction techniques, such as instance selection, are considered. In this work, a hybrid evolutionary algorithm for data reduction, using both instance and feature selection, is presented. A global process of instance selection, carried out by a steady-state genetic algorithm, is combined with a fuzzy rough set based feature selection process, which searches for the most interesting features to enhance both the evolutionary search process and the final preprocessed data set. The experimental study, the results of which have been contrasted through nonparametric statistical tests, shows that our proposal obtains high reduction rates on training sets which greatly enhance the behavior of the nearest neighbor classifier.

Evolutionary Proposals

L. I. Kuncheva, Editing for the k-Nearest Neighbors Rule by a Genetic Algorithm. Pattern Recognition Letters 16 (1995) 809-814 doi: 10.1016/0167-8655(95)00047-K PDF Icon

Abstract: A genetic algorithm is applied for selecting a reference set for the k-Nearest Neighbors rule. The performance has been evaluated on a medical data set by the rotation method. The results are commented together with those obtained with the standard k-NN, random selection, Wilson's technique, and the MULTIEDIT algorithm.
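
As an illustration of the general scheme shared by the evolutionary proposals in this section, the following minimal Python sketch evolves a binary chromosome that marks which training instances remain in the edited reference set, with fitness given by the accuracy of the kNN rule restricted to the selected instances. It is our own simplified sketch (Kuncheva's study evaluated fitness on a medical data set by the rotation method, which is not reproduced here):

    import numpy as np

    rng = np.random.default_rng(1)

    def knn_error(mask, X, y, k=3):
        # Error of the kNN rule over all training points, using only the
        # instances selected by the binary mask as the reference set.
        ref = np.flatnonzero(mask)
        if ref.size == 0:
            return 1.0
        errors = 0
        for i in range(len(X)):
            cand = ref[ref != i]                 # leave the query point out
            if cand.size == 0:
                errors += 1
                continue
            d = np.linalg.norm(X[cand] - X[i], axis=1)
            votes = y[cand[np.argsort(d)[:k]]]
            vals, counts = np.unique(votes, return_counts=True)
            errors += vals[counts.argmax()] != y[i]
        return errors / len(X)

    def ga_edit(X, y, pop_size=30, gens=50, p_mut=0.01):
        # Plain generational GA: fitness-proportional selection,
        # one-point crossover and bit-flip mutation on binary masks.
        n = len(X)
        pop = rng.random((pop_size, n)) < 0.5
        for _ in range(gens):
            fit = np.array([1.0 - knn_error(ind, X, y) for ind in pop])
            probs = (fit + 1e-9) / (fit + 1e-9).sum()
            parents = pop[rng.choice(pop_size, size=pop_size, p=probs)]
            children = parents.copy()
            for j in range(0, pop_size - 1, 2):
                cut = rng.integers(1, n)
                children[j, cut:] = parents[j + 1, cut:]
                children[j + 1, cut:] = parents[j, cut:]
            children ^= rng.random((pop_size, n)) < p_mut
            pop = children
        fit = np.array([1.0 - knn_error(ind, X, y) for ind in pop])
        return pop[fit.argmax()]

Later proposals in this section mostly refine this template: stronger search models (CHC, memetic and steady-state schemes), fitness functions that trade accuracy against reduction rate, and mechanisms to scale the evaluation to large training sets.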

 

S.-Y. Ho, C.-C. Liu, S. Liu, Design of an Optimal Nearest Neighbor Classifier Using an Intelligent Genetic Algorithm. Pattern Recognition Letters 23:13 (2002) 1495-1503 doi: 10.1016/S0167-8655(02)00109-5 PDF Icon

Abstract: The goal of designing an optimal nearest neighbor classifier is to maximize the classification accuracy while minimizing the sizes of both the reference and feature sets. A novel intelligent genetic algorithm (IGA) superior to conventional GAs in solving large parameter optimization problems is used to effectively achieve this goal. It is shown empirically that the IGA-designed classifier outperforms existing GA-based and non-GA-based classifiers in terms of classification accuracy and total number of parameters of the reduced sets.

 

J.R. Cano, F. Herrera, M. Lozano. Using Evolutionary Algorithms as Instance Selection for Data Reduction in KDD: An Experimental Study. IEEE Transactions on Evolutionary Computation, 7:6 (2003) 561-575 doi: 10.1109/TEVC.2003.819265    PDF Icon

Abstract: Evolutionary algorithms are adaptive methods based on natural evolution that may be used for search and optimization. As data reduction in knowledge discovery in databases (KDDs) can be viewed as a search problem, it could be solved using evolutionary algorithms (EAs). In this paper, we have carried out an empirical study of the performance of four representative EA models in which we have taken into account two different instance selection perspectives, the prototype selection and the training set selection for data reduction in KDD. This paper includes a comparison between these algorithms and other nonevolutionary instance selection algorithms. The results show that the evolutionary instance selection algorithms consistently outperform the nonevolutionary ones, the main advantages being: better instance reduction rates, higher classification accuracy, and models that are easier to interpret.

 

F. Fernández and P. Isasi, Evolutionary design of nearest prototype classifiers. Journal of Heuristics 10:4 (2004) 431–454 doi: 10.1023/B:HEUR.0000034715.70386.5b PDF Icon

Abstract: In pattern classification problems, many works have been carried out with the aim of designing good classifiers from different perspectives. These works achieve very good results in many domains. However, in general they are very dependent on some crucial parameters involved in the design. These parameters have to be found by a trial and error process or by some automatic methods, like heuristic search and genetic algorithms, that strongly decrease the performance of the method. For instance, in nearest prototype approaches, main parameters are the number of prototypes to use, the initial set, and a smoothing parameter. In this work, an evolutionary approach based on Nearest Prototype Classifier (ENPC) is introduced where no parameters are involved, thus overcoming all the problems that classical methods have in tuning and searching for the appropriate values. The algorithm is based on the evolution of a set of prototypes that can execute several operators in order to increase their quality in a local sense, and with a high classification accuracy emerging for the whole classifier. This new approach has been tested using four different classical domains, including such artificial distributions as spiral and uniformly distributed data sets, the Iris Data Set and an application domain about diabetes. In all the cases, the experiments show successful results, not only in the classification accuracy, but also in the number and distribution of the prototypes achieved.

 

U. Garain, Prototype reduction using an artificial immune model. Pattern Analysis and Applications 11:3-4 (2008) 353-363 doi: 10.1007/s10044-008-0106-1 PDF Icon

Abstract: Artificial immune system (AIS)-based pattern classification approach is relatively new in the field of pattern recognition. The study explores the potentiality of this paradigm in the context of prototype selection task that is primarily effective in improving the classification performance of nearest-neighbor (NN) classifier and also partially in reducing its storage and computing time requirement. The clonal selection model of immunology has been incorporated to condense the original prototype set, and performance is verified by employing the proposed technique in a practical optical character recognition (OCR) system as well as for training and testing of a set of benchmark databases available in the public domain. The effect of control parameters is analyzed and the efficiency of the method is compared with other existing techniques often used for prototype selection. In the case of the OCR system, empirical study shows that the proposed approach exhibits very good generalization ability in generating a smaller prototype library from a larger one and at the same time giving a substantial improvement in the classification accuracy of the underlying NN classifier. The improvement in performance has been statistically verified. Consideration of both OCR data and public domain datasets demonstrate that the proposed method gives results better than or at least comparable to that of some existing techniques.

 

S. García, J.R. Cano, F. Herrera. A Memetic Algorithm for Evolutionary Prototype Selection: A Scaling Up Approach. Pattern Recognition, 41:8 (2008) 2693-2709 doi: 10.1016/j.patcog.2008.02.006    PDF Icon

Abstract: Prototype selection problem consists of reducing the size of databases by removing samples that are considered noisy or not influential on nearest neighbour classification tasks. Evolutionary algorithms have been used recently for prototype selection showing good results. However, due to the complexity of this problem when the size of the databases increases, the behaviour of evolutionary algorithms could deteriorate considerably because of a lack of convergence. This additional problem is known as the scaling up problem. Memetic algorithms are approaches for heuristic searches in optimization problems that combine a population-based algorithm with a local search. In this paper, we propose a model of memetic algorithm that incorporates an ad hoc local search specifically designed for optimizing the properties of prototype selection problem with the aim of tackling the scaling up problem. In order to check its performance, we have carried out an empirical study including a comparison between our proposal and previous evolutionary and non-evolutionary approaches studied in the literature. The results have been contrasted with the use of non-parametric statistical procedures and show that our approach outperforms previously studied methods, especially when the database scales up.

 

R. Gil-Pita, X. Yao, Evolving Edited K-Nearest Neighbor Classifiers. International Journal of Neural Systems 18:6 (2008) 459-467 doi: 10.1142/S0129065708001725 PDF Icon

Abstract: The k-nearest neighbor method is a classifier based on the evaluation of the distances to each pattern in the training set. The edited version of this method consists of the application of this classifier with a subset of the complete training set in which some of the training patterns are excluded, in order to reduce the classification error rate. In recent works, genetic algorithms have been successfully applied to determine which patterns must be included in the edited subset. In this paper we propose a novel implementation of a genetic algorithm for designing edited k-nearest neighbor classifiers. It includes the definition of a novel mean square error based fitness function, a novel clustered crossover technique, and the proposal of a fast smart mutation scheme. In order to evaluate the performance of the proposed method, results using the breast cancer database, the diabetes database and the letter recognition database from the UCI machine learning benchmark repository have been included. Both error rate and computational cost have been considered in the analysis. Obtained results show the improvement achieved by the proposed editing method.

 

L. Nanni and A. Lumini, Particle swarm optimization for prototype reduction. Neurocomputing, 72:4-6 (2008) 1092–1097 doi: 10.1016/j.neucom.2008.03.008 PDF Icon

Abstract: The problem addressed in this paper concerns the prototype reduction for a nearest-neighbor classifier. An efficient method based on particle swarm optimization is proposed here for finding a good set of prototypes. Starting from an initial random selection of a small number of training patterns, we generate a set of prototypes, using the particle swarm optimization, which minimizes the error rate on the training set. To improve the classification performance, during the training phase the prototype generation is repeated N times, then each of the resulting N sets of prototypes is used to classify each test pattern, and finally these N classification results are combined by the ‘‘vote rule’’. The performance improvement with respect to the state-of-the-art approaches is validated through experiments with several benchmark datasets.
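
A compact sketch of this scheme follows: a global-best particle swarm moves candidate prototype sets, encoded as flattened coordinate vectors, so as to minimize the 1NN error rate on the training set. It is our own reconstruction under simplified assumptions (names such as pso_prototypes are ours); the repetition of the search N times and the final vote-rule combination described in the abstract are omitted:

    import numpy as np

    rng = np.random.default_rng(2)

    def pso_prototypes(X, y, n_per_class=2, swarm=20, iters=100,
                       w=0.7, c1=1.5, c2=1.5):
        classes = np.unique(y)
        proto_labels = np.repeat(classes, n_per_class)
        dim = len(proto_labels) * X.shape[1]

        def error(vec):
            # 1NN training error of the prototype set encoded in vec.
            P = vec.reshape(-1, X.shape[1])
            d = np.linalg.norm(X[:, None, :] - P[None, :, :], axis=2)
            return np.mean(proto_labels[d.argmin(axis=1)] != y)

        def seed():
            # Particles start from random training patterns of each class.
            return np.concatenate(
                [X[rng.choice(np.flatnonzero(y == c))] for c in proto_labels])

        pos = np.array([seed() for _ in range(swarm)])
        vel = np.zeros_like(pos)
        pbest, pbest_err = pos.copy(), np.array([error(p) for p in pos])
        g = pbest_err.argmin()
        gbest, gbest_err = pbest[g].copy(), pbest_err[g]

        for _ in range(iters):
            r1, r2 = rng.random((2, swarm, dim))
            vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
            pos = pos + vel
            errs = np.array([error(p) for p in pos])
            improved = errs < pbest_err
            pbest[improved], pbest_err[improved] = pos[improved], errs[improved]
            if pbest_err.min() < gbest_err:
                g = pbest_err.argmin()
                gbest, gbest_err = pbest[g].copy(), pbest_err[g]

        return gbest.reshape(-1, X.shape[1]), proto_labels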

 

A. Cervantes, I. M. Galván, and P. Isasi, AMPSO: A new particle swarm method for nearest neighborhood classification. IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics 39:5 (2009) 1082-1091 doi: 10.1109/TSMCB.2008.2011816 PDF Icon

Abstract: Nearest prototype methods can be quite successful on many pattern classification problems. In these methods, a collection of prototypes has to be found that accurately represents the input patterns. The classifier then assigns classes based on the nearest prototype in this collection. In this paper, we first use the standard particle swarm optimizer (PSO) algorithm to find those prototypes. Second, we present a new algorithm, called adaptive Michigan PSO (AMPSO) in order to reduce the dimension of the search space and provide more flexibility than the former in this application. AMPSO is based on a different approach to particle swarms as each particle in the swarm represents a single prototype in the solution. The swarm does not converge to a single solution; instead, each particle is a local classifier, and the whole swarm is taken as the solution to the problem. It uses modified PSO equations with both particle competition and cooperation and a dynamic neighborhood. As an additional feature, in AMPSO, the number of prototypes represented in the swarm is able to adapt to the problem, increasing as needed the number of prototypes and classes of the prototypes that make the solution to the problem. We compared the results of the standard PSO and AMPSO in several benchmark problems from the University of California, Irvine, data sets and find that AMPSO always found a better solution than the standard PSO. We also found that it was able to improve the results of the Nearest Neighbor classifiers, and it is also competitive with some of the algorithms most commonly used for classification.

 

I. Triguero, S. García and F. Herrera, IPADE: Iterative Prototype Adjustment for Nearest Neighbor Classification. IEEE Transactions on Neural Networks 21:12 (2010) 1984-1990. COMPLEMENTARY MATERIAL to the paper here: datasets, experimental results and source codes, doi: 10.1109/TNN.2010.2087415   PDF Icon

Abstract: Nearest prototype methods are a successful trend of many pattern classification tasks. However, they present several shortcomings such as time response, noise sensitivity, and storage requirements. Data reduction techniques are suitable to alleviate these drawbacks. Prototype generation is an appropriate process for data reduction, which allows the fitting of a dataset for nearest neighbor (NN) classification. This brief presents a methodology to learn iteratively the positioning of prototypes using real parameter optimization procedures. Concretely, we propose an iterative prototype adjustment technique based on differential evolution. The results obtained are contrasted with nonparametric statistical tests and show that our proposal consistently outperforms previously proposed methods, thus becoming a suitable tool in the task of enhancing the performance of the NN classifier.

 

I. Triguero, S. García and F. Herrera, Differential evolution for optimizing the positioning of prototypes in nearest neighbor classification. Pattern Recognition 44:4 (2011) 901-916, doi: 10.1016/j.patcog.2010.10.020   PDF Icon

Abstract: Nearest neighbor classification is one of the most used and well known methods in data mining. Its simplest version has several drawbacks, such as low efficiency, high storage requirements and sensitivity to noise. Data reduction techniques have been used to alleviate these shortcomings. Among them, prototype selection and generation techniques have been shown to be very effective. Positioning adjustment of prototypes is a successful trend within the prototype generation methodology. Evolutionary algorithms are adaptive methods based on natural evolution that may be used for searching and optimization. Positioning adjustment of prototypes can be viewed as an optimization problem, thus it can be solved using evolutionary algorithms. This paper proposes a differential evolution based approach for optimizing the positioning of prototypes. Specifically, we provide a complete study of the performance of four recent advances in differential evolution. Furthermore, we show the good synergy obtained by the combination of a prototype selection stage with an optimization of the positioning of prototypes previous to nearest neighbor classification. The results are contrasted with non-parametrical statistical tests and show that our proposals outperform previously proposed methods.

 

N. García-Pedrajas, J. A. Romero del Castillo, and D. Ortiz-Boyer, A Cooperative Coevolutionary Algorithm for Instance Selection for Instance-Based Learning. Machine Learning 78:3 (2010) 381-420 PDF Icon

Abstract: This paper presents a cooperative evolutionary approach for the problem of instance selection for instance based learning. The model presented takes advantage of one of the recent paradigms in the field of evolutionary computation: cooperative coevolution. This paradigm is based on a similar approach to the philosophy of divide and conquer. In our method, the training set is divided into several subsets that are searched independently. A population of global solutions relates the search in different subsets and keeps track of the best combinations obtained. The proposed model has the advantage over standard methods in that it does not rely on any specific distance metric or classifier algorithm. Additionally, the fitness function of the individuals considers both storage requirements and classification accuracy, and the user can balance both objectives depending on his/her specific needs, assigning different weights to each one of these two terms. The method also shows good scalability when applied to large datasets. The proposed model is favorably compared with some of the most successful standard algorithms, IB3, ICF and DROP3, with a genetic algorithm using CHC method, and with four recent methods of instance selection, MSS, entropy-based instance selection, IMOEA and LVQPRU. The comparison shows a clear advantage of the proposed algorithm in terms of storage requirements, and is, at least, as good as any of the other methods in terms of testing error. A large set of 50 problems from the UCI Machine Learning Repository is used for the comparison. Additionally, a study of the effect of instance label noise is carried out, showing the robustness of the proposed algorithm. The major contribution of our work is showing that cooperative coevolution can be used to tackle large problems taking advantage of its inherently modular nature. We show that a combination of cooperative coevolution together with the principle of divide-and-conquer can be very effective both in terms of improving performance and in reducing computational cost.

 

SCI2S Related Approaches

Stratification methods

J.R. Cano, F. Herrera, M. Lozano, Stratification for Scaling Up Evolutionary Prototype Selection. Pattern Recognition Letters 26 (2005) 953-963, doi: 10.1016/j.patrec.2004.09.043 PDF Icon

Abstract: Evolutionary algorithms have recently been used for prototype selection, showing good results. An important problem is the scaling up problem, which appears when evaluating Evolutionary Prototype Selection algorithms on large size data sets. In this paper, we offer a proposal to solve the drawbacks introduced by the evaluation of large size data sets using evolutionary prototype selection algorithms. In order to do this we have proposed a combination of stratified strategy and CHC as representative evolutionary algorithm model. This study includes a comparison between our proposal and other non-evolutionary prototype selection algorithms combined with the stratified strategy. The results show that stratified evolutionary prototype selection consistently outperforms the non-evolutionary ones, the main advantages being: better instance reduction rates, higher classification accuracy and reduction in resources consumption.

 

J. Derrac, S. García, F. Herrera, Stratified Prototype Selection based on a Steady-State Memetic Algorithm: A Study of scalability. Memetic Computing 2:3 (2010) 183-199. doi: 10.1007/s12293-010-0048-1 PDF Icon

Abstract: Prototype selection (PS) is a suitable data reduction process for refining the training set of a data mining algorithm. Performing PS processes over existing datasets can sometimes be an inefficient task, especially as the size of the problem increases. However, in recent years some techniques have been developed to avoid the drawbacks that appeared due to the lack of scalability of the classical PS approaches. One of these techniques is known as stratification. In this study, we test the combination of stratification with a previously published steady-state memetic algorithm for PS in various problems, ranging from 50,000 to more than 1 million instances. We perform a comparison with some well-known PS methods, and make a deep study of the effects of stratification in the behavior of the selected method, focused on its time complexity, accuracy and convergence capabilities. Furthermore, the trade-off between accuracy and efficiency of the proposed combination is analyzed, concluding that it is a very suitable option to perform PS tasks when the size of the problem exceeds the capabilities of the classical PS methods.
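
The stratified strategy used in both works above can be summarized in a few lines. The sketch below is our own illustration (select stands for any prototype selection routine that returns the indices it keeps): the training set is partitioned into disjoint strata, the selection method runs independently on each stratum, and the selected subsets are joined:

    import numpy as np

    def stratified_ps(X, y, select, n_strata=5, seed=0):
        # Simple random partition into disjoint strata; the published
        # strategy additionally preserves the class distribution inside
        # each stratum.
        rng = np.random.default_rng(seed)
        order = rng.permutation(len(X))
        selected = []
        for stratum in np.array_split(order, n_strata):
            local = select(X[stratum], y[stratum])  # indices inside the stratum
            selected.extend(stratum[local])         # map back to global indices
        return np.sort(np.array(selected))

Because each stratum is much smaller than the whole training set, the cost of the evolutionary search drops sharply, which is precisely the scalability effect analyzed in the two papers above.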

 

Imbalanced Data

S. García, F. Herrera, Evolutionary Under-Sampling for Classification with Imbalanced Data Sets: Proposals and Taxonomy. Evolutionary Computation 17:3 (2009) 275-306, doi: 10.1162/evco.2009.17.3.275 PDF Icon

Abstract: Learning with imbalanced data is one of the recent challenges in machine learning. Various solutions have been proposed in order to find a treatment for this problem, such as modifying methods or the application of a preprocessing stage. Within the preprocessing focused on balancing data, two tendencies exist: reduce the set of examples (undersampling) or replicate minority class examples (oversampling). Undersampling with imbalanced datasets could be considered as a prototype selection procedure with the purpose of balancing datasets to achieve a high classification rate, avoiding the bias toward majority class examples. Evolutionary algorithms have been used for classical prototype selection showing good results, where the fitness function is associated to the classification and reduction rates. In this paper, we propose a set of methods called evolutionary undersampling that take into consideration the nature of the problem and use different fitness functions for getting a good trade-off between balance of distribution of classes and performance. The study includes a taxonomy of the approaches and an overall comparison among our models and state of the art undersampling methods. The results have been contrasted by using nonparametric statistical procedures and show that evolutionary undersampling outperforms the nonevolutionary models when the degree of imbalance is increased.

 

Data complexity

S. García, J.R. Cano, E. Bernadó-Mansilla, F. Herrera, Diagnose of Effective Evolutionary Prototype Selection using an Overlapping Measure. International Journal of Pattern Recognition and Artificial Intelligence 23:8 (2009) 1527-1548 doi: 10.1142/S0218001409007727 PDF Icon

Abstract: Evolutionary prototype selection has shown its effectiveness in the past in the prototype selection domain. It improves in most of the cases the results offered by classical prototype selection algorithms but its computational cost is expensive. In this paper, we analyze the behavior of the evolutionary prototype selection strategy, considering a complexity measure for classification problems based on overlapping. In addition, we have analyzed different k values for the nearest neighbour classifier in this domain of study to see its influence on the results of PS methods. The objective consists of predicting when the evolutionary prototype selection is effective for a particular problem, based on this overlapping measure.

 

Hyperrectangles

S. García, J. Derrac, J. Luengo, C.J. Carmona, F. Herrera, Evolutionary Selection of Hyperrectangles in Nested Generalized Exemplar Learning. Applied Soft Computing 11:3 (2011) 3032-3045 doi: 10.1016/j.asoc.2010.11.030 PDF Icon

Abstract: The nested generalized exemplar theory accomplishes learning by storing objects in Euclidean n-space, as hyperrectangles. Classification of new data is performed by computing their distance to the nearest "generalized exemplar" or hyperrectangle. This learning method allows the combination of the distance-based classification with the axis-parallel rectangle representation employed in most of the rule-learning systems. In this paper, we propose the use of evolutionary algorithms to select the most influential hyperrectangles to obtain accurate and simple models in classification tasks. The proposal has been compared with the most representative models based on hyperrectangle learning, such as the BNGE, RISE, INNER, and SIA genetics based learning approach. Our approach is also very competitive with respect to classical rule induction algorithms such as C4.5Rules and RIPPER. The results have been contrasted through non-parametric statistical tests over multiple data sets and they indicate that our approach outperforms them in terms of accuracy, requiring a lower number of hyperrectangles to be stored, thus obtaining simpler models than previous NGE approaches. Larger data sets have also been tackled with promising outcomes.

 

S. García, J. Derrac, I. Triguero, C.J. Carmona, F. Herrera, Evolutionary-Based Selection of Generalized Instances for Imbalanced Classification. Knowledge Based Systems 25:1 (2012) 3-12 doi: 10.1016/j.knosys.2011.01.012 PDF Icon

Abstract: In supervised classification, we often encounter many real world problems in which the data do not have an equitable distribution among the different classes of the problem. In such cases, we are dealing with the so-called imbalanced data sets. One of the most used techniques to deal with this problem consists of preprocessing the data previously to the learning process. This paper proposes a method belonging to the family of the nested generalized exemplar that accomplishes learning by storing objects in Euclidean n-space. Classification of new data is performed by computing their distance to the nearest generalized exemplar. The method is optimized by the selection of the most suitable generalized exemplars based on evolutionary algorithms. An experimental analysis is carried out over a wide range of highly imbalanced data sets and uses the statistical tests suggested in the specialized literature. The results obtained show that our evolutionary proposal outperforms other classic and recent models in accuracy and requires to store a lower number of generalized examples.

 

Hybrid methods

J. Derrac, S. García, F. Herrera, Instance and Feature Selection based on Cooperative Coevolution with Nearest Neighbor Rule. Pattern Recognition 43:6 (2010) 2082-2105 doi: 10.1016/j.patcog.2009.12.012 PDF Icon

Abstract: Feature and instance selection are two effective data reduction processes which can be applied to classification tasks obtaining promising results. Although both processes are defined separately, it is possible to apply them simultaneously. This paper proposes an evolutionary model to perform feature and instance selection in nearest neighbor classification. It is based on cooperative coevolution, which has been applied to many computational problems with great success. The proposed approach is compared with a wide range of evolutionary feature and instance selection methods for classification. The results contrasted through non-parametric statistical tests show that our model outperforms previously proposed evolutionary approaches for performing data reduction processes in combination with the nearest neighbor rule.

 

J. Derrac, C. Cornelis, S. García, F. Herrera, Enhancing evolutionary instance selection algorithms by means of fuzzy rough set based feature selection. Information Sciences 186 (2012) 73–92 doi: 10.1016/j.ins.2011.09.027 PDF Icon

Abstract: In recent years, fuzzy rough set theory has emerged as a suitable tool for performing feature selection. Fuzzy rough feature selection enables us to analyze the discernibility of the attributes, highlighting the most attractive features in the construction of classifiers. However, its results can be enhanced even more if other data reduction techniques, such as instance selection, are considered. In this work, a hybrid evolutionary algorithm for data reduction, using both instance and feature selection, is presented. A global process of instance selection, carried out by a steady-state genetic algorithm, is combined with a fuzzy rough set based feature selection process, which searches for the most interesting features to enhance both the evolutionary search process and the final preprocessed data set. The experimental study, the results of which have been contrasted through nonparametric statistical tests, shows that our proposal obtains high reduction rates on training sets which greatly enhance the behavior of the nearest neighbor classifier.
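
As an illustration of the fuzzy rough ingredient, the sketch below computes a dependency degree for a feature subset: the mean membership of each instance to the lower approximation of its own class, which such a feature selection process can use to rank subsets. The min t-norm, the Kleene-Dienes implicator, and features pre-scaled to [0, 1] are illustrative choices and need not match the paper's exact measure.

    import numpy as np

    def fuzzy_rough_dependency(X, y, feats):
        # Fuzzy rough dependency of the decision on the feature subset
        # `feats`; X is assumed scaled to [0, 1] per feature.
        n = len(X)
        # Fuzzy similarity per kept feature, aggregated with min t-norm.
        R = np.ones((n, n))
        for a in feats:
            col = X[:, a]
            R = np.minimum(R, 1.0 - np.abs(col[:, None] - col[None, :]))
        same = (y[:, None] == y[None, :]).astype(float)
        # Lower approximation of each instance's own class, with the
        # Kleene-Dienes implicator: inf_z max(1 - R(x, z), class(z)).
        lower = np.min(np.maximum(1.0 - R, same), axis=1)
        return float(np.mean(lower))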

 

Prototype Reduction Visibility at the Web of Science

The ISI Web of Science provides seamless access to current and retrospective multidisciplinary information from approximately 8,700 of the most prestigious, high-impact research journals in the world. Web of Science also provides a unique search method, cited reference searching. With it, users can navigate forward and backward through the literature, searching all disciplines and time spans to uncover all the information relevant to their research. Users can also navigate to electronic full-text journal articles (http://scientific.thomson.com/products/wos/).

Using the "Advanced Search" option, we consider the following query, where PR denotes the 'Topic' field of the search:

PR= ("Prototype Selection" OR "Prototype Generation" OR "Prototype Abstraction" OR "Prototype Reduction" OR ("Instance Selection" AND "Nearest Neighbor") OR ("Instance Generation" AND "Nearest Neighbor") OR ("Instance Abstraction" AND "Nearest Neighbor") OR ("Data Reduction" AND "Instance Selection") OR ("Data Reduction" AND "Instance Generation") OR ("Data Reduction" AND "Instance Abstraction") OR ("Data Reduction" AND "Nearest Neighbor"))

The numerical results of the query are:

Date of analysis: September 7th, 2011
Number of papers: 317
Sum of the times cited: 2,651
Average citations per item: 8.36

Figures 5 and 6 show the number of publications and citations per year.

Figure 5. Publications in PR per year (Web of Science)

Figure 6. Number of citations per year (Web of Science)

 

We observe an increasing number of publications, with more than 20 papers per year since 2005. The number of citations shows a similar increasing trend (more than 150 citations per year). It can therefore be seen that the PR field is still in a growing stage, where more approaches are presented every year, aiming to improve on the performance of the techniques developed so far.

 

Software and Algorithm Implementations

Knowledge Extraction based on Evolutionary Learning (KEEL): KEEL is a software tool to assess evolutionary algorithms for Data Mining problems, including regression, classification, clustering, pattern mining and so on. It contains a large collection of classical knowledge extraction algorithms, preprocessing techniques (instance selection, feature selection, discretization, imputation methods for missing values, etc.), Computational Intelligence based learning algorithms, including evolutionary rule learning algorithms based on different approaches (Pittsburgh, Michigan, IRL, etc.), and hybrid models such as genetic fuzzy systems, evolutionary neural networks, etc. It allows us to perform a complete analysis of any learning model in comparison to existing ones, including a statistical test module for such comparisons. Moreover, KEEL has been designed with a double goal: research and education. The tool is available as open source software. For more information, please refer to the following publication:

J. Alcalá-Fdez, L. Sánchez, S. García, M.J. del Jesus, S. Ventura, J.M. Garrell, J. Otero, C. Romero, J. Bacardit, V.M. Rivas, J.C. Fernández, F. Herrera , KEEL: A Software Tool to Assess Evolutionary Algorithms for Data Mining Problems. Soft Computing, 13:3 (2009) 307-318. doi: 10.1007/s00500-008-0323-y PDF Icon

KEEL Data Set Repository (KEEL-dataset): The KEEL-dataset repository is devoted to data sets in KEEL format which can be used with the software. It provides: a detailed categorization of the considered data sets and a description of their characteristics (tables for the data sets in each category have also been created); and descriptions of the papers which have used the partitions of data sets available in the KEEL-dataset repository, including results tables, the algorithms used and additional material. For more information about this repository, please refer to the following publication:

J. Alcalá-Fdez, A. Fernández, J. Luengo, J. Derrac, S. García, L. Sánchez, F. Herrera, KEEL Data-Mining Software Tool: Data Set Repository, Integration of Algorithms and Experimental Analysis Framework. Journal of Multiple-Valued Logic and Soft Computing 17:2-3 (2011) 255-287. PDF Icon

Recent and forthcoming approaches

Prototype Selection and Prototype Generation are two lively research fields. As time goes by, new methods are developed following the basic guidelines that any Prototype Reduction algorithm should satisfy. These new methods often offer the community different viewpoints on the problem, as well as better and more efficient approaches.

In this section, the approaches proposed after the publication of the two SCI2S surveys on Prototype Reduction are referenced, highlighting their categorization within the proposed taxonomies. In addition, a technical report with a brief description of their main characteristics is provided.

As this section is intended to grow with the development of new approaches over the following years, suggestions of new approaches for this survey are welcome (with only one requirement: new approaches must already have been published in an international journal or well-known conference). Please contact J. Derrac or I. Triguero if you wish to add a new Prototype Reduction method to this website.

J. Derrac, I. Triguero, S. García, F. Herrera, Survey of New Approaches on Prototype Selection and Generation. PDF Icon

Prototype Selection

  • Condensation - Incremental
    • Complete Cross Validation Functional Algorithm 2 (CCV-2)

      M. N. Ivanov, Prototype Sample Selection Based on Minimization of the Complete Cross Validation Functional. Pattern Recognition and Image Analysis 20:4 (2010) 427–437, doi: 10.1134/S1054661810040024

  • Condensation - Decremental
    • Complete Cross Validation Functional Algorithm 1 (CCV-1)

      M. N. Ivanov, Prototype Sample Selection Based on Minimization of the Complete Cross Validation Functional. Pattern Recognition and Image Analysis 20:4 (2010) 427–437, doi: 10.1134/S1054661810040024

    • Instance Selection by using Polar Grids (ISPG)

      Y. Sang, Z. Yi, Instance Selection by using Polar Grids. 2010 3rd International Conference on Advanced Computer Theory and Engineering (2010) V3-344 - V3-348 doi: 10.1109/ICACTE.2010.5579549

  • Edition - Decremental
    • Instance Selection algorithm based on Classification Contribution Function (ISCC)

      Y. Cai, B. Wu, Y. He, Y. Zhang, A New Instance Selection Algorithm Based on Contribution for Nearest Neighbour Classification. Ninth International Conference on Machine Learning and Cybernetics (2010) 155-160, doi: 10.1109/ICMLC.2010.5581074

    • Instance Selection based on Local SVM (LSVM)

      N. Segata, E. Blanzieri, S. J. Delany, P. Cunningham, Noise Reduction for Instance-Based Learning with a Local Maximal Margin Approach. Journal of Intelligent Information Systems 35 (2010) 301–331, doi: 10.1007/s10844-009-0101-z

  • Edition - Batch
    • Reward–Punishment Editing (RP-Editing)

      A. Franco, D. Maltoni, L. Nanni, Data Pre-Processing Through Reward–Punishment Editing. Pattern Analysis and Applications 13 (2010) 367–381, doi: 10.1007/s10044-010-0182-x

  • Hybrid - Mixed+Wrapper
    • Instinctive Mating Genetic Algorithm with (Correct My Wrongs distance/Hamming distance/Multiple populations) (IM-CMW, IM-H, IM-MP)

      T. Quirino, M. Kubat, N. J. Bryan, Instinct-Based Mating in Genetic Algorithms Applied to the Tuning of 1-NN Classifiers. IEEE Transactions on Knowledge and Data Engineering 22:12 (2010) 1724-1737, doi: 10.1109/TKDE.2009.211

    • Supervised and Nonparametric Evaluation of Sets of Instances (Eva)

      S. Ferrandiz, M. Boullé, Bayesian Instance Selection for the Nearest Neighbor Rule. Machine Learning 81 (2010) 229–256, doi: 10.1007/s10994-010-5170-2

    • Local Search with Tabu List for Instance Selection (LS+TL)

      I. Czarnowski, Cluster-Based Instance Selection for Machine Classification. Knowledge and Information Systems, in press (2011), doi: 10.1007/s10115-010-0375-z

    • Sequential Reduction Algorithm (SeqRA)

      M. Raniszewski, Sequential Reduction Algorithm for Nearest Neighbor Rule. 2010 International Conference on Computer Vision and Graphics, Part II, Lecture Notes in Computer Science 6375 (2010) 219–226, doi: 10.1007/978-3-540-93905-4_27

  • Hybrid - Fixed+Wrapper
    • A Genetic Algorithm for Prototype Selection with Dissimilarity Representation (GA+LDA)

      Y. Plasencia-Calaña, E. García-Reyes, M. Orozco-Alzate, R. P.W. Duin, Prototype Selection for Dissimilarity Representation by a Genetic Algorithm. 2010 International Conference on Pattern Recognition (2010) 177-180, doi: 10.1109/ICPR.2010.52

Prototype Generation

  • Positioning Adjustment - Condensation - Fixed
    • Weighted LVQ (WLVQ)

      M. Blachnik, W. Duch, Improving Accuracy of LVQ Algorithm by Instance Weighting. 2010 International Conference on Artificial Neural Networks, Lecture Notes in Computer Science 6354 (2010) 257-266, doi: 10.1007/978-3-642-15825-4_31

  • Positioning Adjustment - Edition - Mixed - Semi-Wrapper
    • Editing based on a Fast Two-String Median Computation (JJWilson)

      J. I. Abreu, J. R. Rico-Juan, A New Editing Scheme Based on a Fast Two-String Median Computation Applied to OCR. Structural, Syntactic, and Statistical Pattern Recognition, Lecture Notes in Computer Science 6218 (2010) 748-756, doi: 10.1007/978-3-642-14980-1_74

  • Centroids - Hybrid - Mixed
    • Class Boundary Preserving Algorithm (CBP)

      K. Nikolaidis, J. Y. Goulermas, Q. H. Wu, A Class Boundary Preserving Algorithm for Data Condensation. Pattern Recognition 44 (2011) 704–715, doi: 10.1016/j.patcog.2010.08.014

  • Space Splitting - Hybrid
    • Instance Seriation for Prototype Abstraction (IPSA)

      K. Nikolaidis, E. Rodriguez, J. Y. Goulermas, Q. H. Wu, Instance Seriation for Prototype Abstraction. 2010 IEEE Fifth International Conference on Bio-Inspired Computing: Theories and Applications (2010) 1351-1355, doi: 10.1109/BICTA.2010.5645066