Genetics-Based Machine Learning for Rule Induction: State of the Art, Taxonomy and Comparative Study

This Website contains complementary material to the paper:

A. Fernandez, S. García, J. Luengo, E. Bernadó-Mansilla, F. Herrera, Genetics-Based Machine Learning for Rule Induction: State of the Art, Taxonomy and Comparative Study. IEEE Transactions on Evolutionary Computation 14:6 (2010) 913-941, doi:10.1109/TEVC.2009.2039140.

The web is organized according to the following summary:

  1. Paper content
  2. Taxonomy of Genetics-Based Machine Learning Algorithms for Classification
    1. Taxonomy
    2. Preliminaries: Genetics-Based Machine Learning Features
    3. Taxonomy Families and Algorithms' Description
  3. Data-sets partitions employed in the paper
    1. Standard classification
    2. Imbalanced data (including pre-processed data sets)
  4. Experimental Study
    1. Parameters
    2. Analysis of the GBML algorithms for Rule Induction in Standard Classification
    3. Analysis of the GBML algorithms for Rule Induction in Imbalanced Data-sets
      1. Empirical study with the original data-sets
      2. Empirical study applying preprocessing


Paper Content

Abstract: The classification problem can be addressed by numerous techniques and algorithms, which belong to different paradigms of Machine Learning. In this work, we are interested in evolutionary algorithms, the so-called Genetics-Based Machine Learning algorithms. In particular, we focus on evolutionary approaches that evolve a set of rules, i.e., evolutionary rule-based systems, applied to classification tasks, in order to provide a state-of-the-art review of this field.

This study has been done with a double aim: to present a taxonomy of the Genetics-Based Machine Learning approaches for rule induction, and to develop an empirical analysis both for standard classification and for classification with imbalanced data sets.

We also include a comparative study of the GBML methods with some classical non-evolutionary algorithms, in order to observe the suitability and high potential of the search performed by evolutionary algorithms and the behaviour of the GBML algorithms in contrast to the classical approaches, in terms of classification accuracy.

Summary:

  1. Introduction.
  2. Taxonomy of Genetics Based Machine Learning Algorithms for Classification.
    1. Taxonomy.
    2. Preliminaries: Genetics-Based Machine Learning Features.
    3. Taxonomy Families and Algorithms' Description
  3. Experimental Framework.
    1. Performance measures for standard classification.
    2. Imbalanced data sets: formulation and performance measures.
    3. Selected classical Machine Learning algorithms.
    4. Data sets.
    5. Parameters.
    6. Statistical Tests for Performance Comparison.
    7. KEEL Software.
    8. Web page associated to the paper.
  4. Analysis of the GBML Algorithms for Rule Induction in Standard Classification.
    1. Study of the behaviour of the algorithms in the different families.
    2. Study of the representative algorithms of each family versus classical ones.
  5. Analysis of the GBML Algorithms for Rule Induction in Imbalanced Data sets.
    1. Empirical analysis with the original data sets
      1. Study of the behaviour of the algorithms in the different families.
      2. Study of the representative algorithms of each family versus classical ones.
    2. Empirical analysis applying preprocessing
      1. Study of the behaviour of the algorithms in the different families.
      2. Study of the representative algorithms of each family versus classical ones.
  6. Discussion: Lessons Learned and New Challenges.
  7. Conclusions.

Taxonomy of Genetics-Based Machine Learning Algorithms for Classification

In this section, we first propose a taxonomy for the different GBML approaches for rule induction, classifying them into different families. Then, we introduce some basic features of the GBML algorithms that enable us to highlight the similarities and differences among the existing methods in this context. Finally, we describe each one of the GBML families, presenting the representative algorithms that we have selected within each category.

Taxonomy

Many proposals have been developed under the label of GBML for rule induction. All these approaches share an underlying commonality: the codification of the rule set or a single rule in a chromosome and the use of an evolutionary algorithm as the search mechanism. However, there are many differences among the algorithms, such as the type of learning scheme (supervised vs. reinforcement learning, or online vs. offline learning), the rule set representation (i.e., a decision list or a set of overlapping rules), the representation of each rule (i.e., the management of numerical and nominal values), or the design of the EA that guides the search.
With the aim of categorising the GBML methods for rule induction presented in the literature, we distinguish among three different families based on the chromosome codification:

  • Chromosome = Rule. In this type of algorithm, each rule is encoded in a single chromosome and the whole population evolves towards the final rule set. We consider three subcategories of methods:
    • The Michigan approach (J. H. Holland, Escaping brittleness: The possibilities of general purpose learning algorithms applied to parallel rule-based systems. Los Altos, CA: Morgan Kaufmann, 1986, vol. 2, pp. 593–623, L. B. Booker, D. E. Goldberg, and J. H. Holland, "Classifier systems and genetic algorithms," Artificial Intelligence, vol. 40, no. 1-3, pp. 235–282, 1989), in which the rule-set is incrementally updated through the sequential observation of training examples and their corresponding classification. Eventually, the rule-set is improved by the action of the EA.
    • The Iterative Rule Learning (IRL) approach (G. Venturini, "SIA: a supervised inductive algorithm with genetic search for learning attributes based concepts," in Machine Learning ECML-93, ser. LNAI, P. Brazdil, Ed., vol. 667. Springer, 1993, pp. 280–296), where the EA learns rule by rule iteratively and removes from the training set the examples covered (matched) by each new rule, in a divide-and-conquer style.
    • The GCCL approach (D. P. Greene and S. F. Smith, "Competition-based induction of decision models from examples," Machine Learning, vol. 13, no. 23, pp. 229–257, 1993), where all rules/chromosomes are evolved together in the EA. The chromosomes cooperate among themselves to perform the classification task, but the final rule set does not need to include all the rules. Hence, rules compete to be among the best fitted and therefore to remain in the rule base.
  • Chromosome = Rule Set. This category is known as the Pittsburgh approach (S. F. Smith, "A learning system based on genetic algorithms," Ph.D. dissertation, University of Pittsburgh, Pittsburgh, PA, 1980, S. F. Smith, "Flexible learning of problem solving heuristics through adaptive search," in 8th International Joint Conference on Artificial Intelligence, 1983, pp. 422–425). These methods are the result of directly extending GAs (J. Holland, Adaptation in Natural and Artificial Systems. The University of Michigan Press, 1975) to supervised learning problems. The system maintains a population of candidate rule sets whose quality is evaluated with a fitness function that considers different aspects such as the prediction accuracy and the generalization of the rule sets. Once the classifier is trained, the best individual found during the evolutionary process is used to predict the class of unknown examples.
  • Hybrid Evolutionary Decision Trees (HEDTs). This type of GBML contains mechanisms that combine decision trees with genetic algorithms (GAs) (E. Cantú-Paz and C. Kamath, "Inducing oblique decision trees with evolutionary algorithms," IEEE Transactions on Evolutionary Computation, vol. 7, no. 1, pp. 54–68, 2003, D. Carvalho and A. Freitas, "A hybrid decision tree/genetic algorithm method for data mining," Information Sciences, vol. 163, no. 1-3, pp. 13–35, 2004). The underlying idea is to use the search capability of the GA to find a highly accurate final tree. The tree can then be interpreted as a rule set, formed by the disjunction of the rules that are obtained in each branch of the tree. Thus the chromosome can encode the full tree or just a tree node/rule. We have defined this family as Hybrid Evolutionary Decision Trees (HEDTs).

In this work, we have considered the algorithms that have been published in the most influential journals on the topic and only classical methods proposed in conference contributions. Genetic Fuzzy Systems (F. Herrera, "Genetic fuzzy systems: Taxonomy, current research trends and prospects," Evolutionary Intelligence, vol. 1, pp. 27–46, 2008), methods based on Genetic Programming (C. Zhou, W. Xiao, T. M. Tirpak, and P. C. Nelson, "Evolving accurate and compact classification rules with gene expression programming," IEEE Transactions on Evolutionary Computation, vol. 7, no. 6, pp. 519–531, 2003) and distributed approaches (A. Giordana and F. Neri, "Search-intensive concept induction," Evolutionary Computation, vol. 3, no. 4, pp. 375–419, 1995, C. Anglano and M. Botta, "NOW G-net: learning classification programs on networks of workstations," IEEE Transactions on Evolutionary Computation, vol. 6, no. 13, pp. 463–480, 2002) are out of the scope of this study.
Table 1 summarises the list of selected algorithms of each type. All algorithms were implemented by the authors of this paper, except for GASSIST and HIDER whose implementations were directly supplied by their corresponding authors. Furthermore, these methods are included in the KEEL software (J. Alcalá-Fdez, L. Sánchez, S. García, M. del Jesus, S. Ventura, J. Garrell, J. Otero, C. Romero, J. Bacardit, V. Rivas, J. Fernández, and F. Herrera, "KEEL: A software tool to assess evolutionary algorithms to data mining problems," Soft Computing, vol. 13, no. 3, pp. 307–318, 2009).

Table 1. Summary of the algorithms employed in this work

Algorithm name Acronym Family Reference
XCS XCS Michigan S. W. Wilson, "Classifier fitness based on accuracy," Evolutionary Computation, vol. 3, no. 2, pp. 149–175, 1995
UCS UCS Michigan E. Bernadó-Mansilla and J. M. Garrell, "Accuracy-based learning classifier systems: models, analysis and applications to classification tasks," Evolutionary Computation, vol. 11, no. 3, pp. 209–238, 2003
Supervised Inductive Algorithm SIA IRL G. Venturini, "SIA: a supervised inductive algorithm with genetic search for learning attributes based concepts," in Machine Learning ECML-93, ser. LNAI, P. Brazdil, Ed., vol. 667. Springer, 1993, pp. 280–296
HIerarchical DEcision Rules HIDER IRL J. Aguilar-Ruiz, R. Giráldez, and J. Riquelme, "Natural encoding for evolutionary supervised learning," IEEE Transactions on Evolutionary Computation, vol. 11, no. 4, pp. 466–479, 2007
CO-Evolutionary Rule Extractor CORE GCCL K. C. Tan, Q. Yu, and J. H. Ang, "A coevolutionary algorithm for rules discovery in data mining," International Journal of Systems Science, vol. 37, no. 12, pp. 835–864, 2006
Organizational Co-Evolutionary algorithm for Classification OCEC GCCL L. Jiao, J. Liu, and W. Zhong, "An organizational coevolutionary algorithm for classification," IEEE Transactions on Evolutionary Computation, vol. 10, no. 1, pp. 67–80, 2006.
COverage-based Genetic INduction COGIN GCCL D. P. Greene and S. F. Smith, "Competition-based induction of decision models from examples," Machine Learning, vol. 13, no. 23, pp. 229–257, 1993
Genetic-based Inductive Learning GIL Pittsburgh C. Janikow, "A knowledge-intensive genetic algorithm for supervised learning," Machine Learning, vol. 13, no. 23, pp. 189–228, 1993
Pittsburgh Genetic Interval Rule Learning Algorithm Pitts-GIRLA Pittsburgh A. L. Corcoran and S. Sen, "Using real-valued genetic algorithms to evolve rule sets for classification," in IEEE Conference on Evolutionary Computation Proceedings, 1994, pp. 120–124
Data Mining for Evolutionary Learning DMEL Pittsburgh W.-H. Au, K. C. C. Chan, and X. Yao, "A novel evolutionary data mining algorithm with applications to churn prediction," IEEE Transactions on Evolutionary Computation, vol. 7, no. 6, pp. 532–545, 2003
Genetic Algorithms based claSSIfier sySTem GASSIST Pittsburgh J. Bacardit, D. Goldberg, and M. Butz, "Improving the performance of a pittsburgh learning classifier system using a default rule," in Revised Selected Papers of the International Workshop on Learning Classifier Systems 2003-2005, ser. LNCS, T. Kovacs, X. Llor, K. Takadama, P. L. Lanzi, W. Stolzmann, and S. W. Wilson, Eds., vol. 4399. Springer, 2007, pp. 291–307
Ordered Incremental Genetic Algorithm OIGA Pittsburgh F. Zhu and S. U. Guan, "Ordered incremental training with genetic algorithms," International Journal of Intelligent Systems, vol. 19, no. 12, pp. 1239–1256, 2004
Incremental Learning with Genetic Algorithms ILGA Pittsburgh S. U. Guan and F. Zhu, "An incremental approach to genetic-algorithms based classification," IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 35, no. 2, pp. 227–239, 2005
Hybrid Decision Tree - Genetic Algorithm DT-GA HEDT D. Carvalho and A. Freitas, "A hybrid decision tree/genetic algorithm method for data mining,” Information Sciences, vol. 163, no. 1-3, pp. 13–35, 2004
Oblique Decision Tree Oblique-DT HEDT E. Cantú-Paz and C. Kamath, "Inducing oblique decision trees with evolutionary algorithms," IEEE Transactions on Evolutionary Computation, vol. 7, no. 1, pp. 54–68, 2003
Tree Analysis with Randomly Generated and Evolved Trees TARGET HEDT J. B. Gray and G. Fan, "Classification tree analysis using TARGET," Computational Statistics & Data Analysis, vol. 52, no. 3, pp. 1362–1372, 2008

Preliminaries: Genetics-Based Machine Learning Features

With the aim of giving a general background of the selected algorithms, in this section we describe the main components which are relevant to GBML algorithms and which, at the same time, distinguish them from non-evolutionary learning algorithms. This also gives the paper content an educational character. We consider the following characteristics:

A. Chromosome representation:
As mentioned before, the underlying commonality of GBML is the use of an EA as the search mechanism. In this respect, the first consideration is the representation of the solution in a population of chromosomes. The choice of whether the chromosome codifies a single rule or a rule set largely determines the operation of the algorithm, and that is the motivation for basing our taxonomy on the chromosome representation. In any case, the rule set as a whole may operate as a set of non-ordered rules, either overlapping or non-overlapping, or as a decision list. Also, the inference type (the classification process itself) depends heavily on the type of rule used in the algorithm. The details are as follows:

  • Non-ordered overlapping rule sets, also called "IF-THEN" rules. Since the rules can be overlapping, the whole rule set does not necessarily cover the full space of inputs. At classification time, if the example matches more than one rule, a voting method is usually applied, where the output is the class advocated by the majority (see the inference sketch after this list). Other approaches include a weighted voting, or a measure of the distance to the nearest rule (S. Salzberg, "A nearest hyperrectangle method," Machine Learning, vol. 6, pp. 251–276, 1991). In the case of a tie, some approaches leave the example as non-classified, while others assign the class randomly.
  • Non-ordered, non-overlapped rule sets. This is the case of rule sets obtained from decision trees or axis-parallel hyperplanes, which split the search space into a set of non-overlapping regions. In this sense, the rule set covers all the input space. The classification process with these rules is trivial, since only one rule can match the example.
  • Ordered rule sets, also called "IF-THEN-ELSE" rules or decision lists (R. L. Rivest, "Learning decision lists," Machine Learning, vol. 2, pp. 229–246, 1987). The classification is simply made using the first rule in the list that matches the example. It is very common to use a "default rule" that explains the input space not covered by any of the rules in the set.
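
To make these inference schemes concrete, the following minimal sketch (hypothetical data structures, not the encoding of any surveyed system) contrasts majority voting over a non-ordered rule set with the first-match inference of a decision list.

```python
from collections import Counter

# A rule is modelled here as a (condition, class) pair, where condition(x)
# returns True if the rule covers example x.

def classify_voting(rules, x, default=None):
    """Unordered 'IF-THEN' rule set: every matching rule votes for its class."""
    votes = Counter(cls for cond, cls in rules if cond(x))
    return votes.most_common(1)[0][0] if votes else default

def classify_decision_list(rules, x, default=None):
    """Ordered 'IF-THEN-ELSE' rules: the first matching rule decides."""
    for cond, cls in rules:
        if cond(x):
            return cls
    return default  # the 'default rule' explains the uncovered input space

# Two overlapping rules over a single numerical attribute.
rules = [(lambda x: x[0] <= 5.0, "A"), (lambda x: x[0] >= 3.0, "B")]
print(classify_voting(rules, [4.0]))         # tie: the first-seen class ("A") wins here
print(classify_decision_list(rules, [4.0]))  # "A": first matching rule in the list
```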

Each rule has the form condition ⇒ class, where the condition specifies the set of input states that the rule matches, i.e., the region of the search space covered by the rule, and the class is the classification that the rule advocates for that region. The condition part is usually represented as a conjunction of tests over the attributes. There are two main schemes:

  • "Attribute <operator> value". A test is a condition over an attribute value, which is specified differently depending on whether the attribute is nominal or numerical:
    • Nominal attributes: when the attribute can take a value from a list of possible categories, the test over the attribute consists of checking for equality to a given category or membership of a subset of categories. That is, "<operator>" may be any of {=, ≠, ∈, ∉}.
    • Numerical attributes: if the attribute can take numerical values, the test checks whether the value satisfies a comparison or belongs to a given interval. That is, "<operator>" may be any of {<, ≤, =, ≥, >}. Interval-rules take the form "Attribute ∈ [min, max]", which is equivalent to "Attribute ≥ min AND Attribute ≤ max".

Disjunction of attributes within a condition can also be codified, although this is rarely implemented in GBML algorithms. The disjunction of attribute values can be alternatively codified with the disjunction of several rules. This may enlarge the rule set, but simplifies the complexity of the rule. We must also remark that some algorithms treat the nominal attributes as ordered numerical values and consequently use the representations designed for numerical attributes.

  • Oblique Rules: The tests over the attributes are codified by a hyperplane of the form ∑_{i=1}^{d} a_i·x_i + a_{d+1} > 0, where the a_i are real-valued coefficients (D. G. Heath, S. Kasif, and S. Salzberg, "Induction of oblique decision trees," in 13th International Joint Conference on Artificial Intelligence (IJCAI93), 1993, pp. 1002–1007 and E. G. Henrichon and K. Fu, "A nonparametric partitioning procedure for pattern classification," IEEE Transactions on Computers, vol. C-18, pp. 614–624, 1969) and x_i is the value of the i-th attribute. This is an extension of the axis-parallel hyperplanes in the attribute space used by decision trees.

It is common that not all the attributes are relevant to the classification. Some algorithms explicitly allow the codification of irrelevant attributes with the so-called "don't care" symbol. This favours the generalization of the rules. Some approaches enable the chromosome to represent variable length conditions. For algorithms using a fixed chromosome length, the representation of the "don't care" condition depends on the selected rule scheme defined above:

  • For the "Attribute <operator> value" rules, we can represent the "don't care" using an specific special value with the $=$ operator, including all possible values for ∈ (none for ∉), using the maximum range or minimum range for the "<,≤,=,≥>" operators respectively, or applying an inversion of operators for the interval-rules, that is, having the "min" value higher than the "max" one.
  • For the oblique rules, it suffices to assign a weight of 0 to those variables that are not included in the antecedent.
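
The sketch below (illustrative encodings only, not those of any particular algorithm) shows how an interval rule with a reversed-interval "don't care" and an oblique rule can be matched against an example.

```python
def matches_interval_rule(bounds, x):
    """bounds: one (min, max) pair per numerical attribute; a reversed
    interval (min > max) is interpreted as a 'don't care' condition."""
    for (lo, hi), value in zip(bounds, x):
        if lo > hi:                      # reversed interval -> attribute ignored
            continue
        if not (lo <= value <= hi):
            return False
    return True

def matches_oblique_rule(coeffs, x):
    """Oblique test: sum_i a_i * x_i + a_{d+1} > 0; a coefficient of 0
    removes the corresponding attribute from the antecedent."""
    return sum(a * v for a, v in zip(coeffs, x)) + coeffs[-1] > 0

x = [1.5, 7.0]
print(matches_interval_rule([(0.0, 2.0), (5.0, 1.0)], x))  # True: second condition is 'don't care'
print(matches_oblique_rule([2.0, -1.0, 3.0], x))           # 2*1.5 - 1*7.0 + 3.0 = -1.0 -> False
```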

B. Learning scheme:

This feature refers to the procedure in which the algorithm learns the rule set from a set of pre-classified examples.
Depending on how examples are provided, learning can be performed in an incremental mode (online method) or in a batch mode (offline method). The former scheme implies that the knowledge can be updated as new examples arrive at the system; Michigan approaches usually follow this scheme. The latter model does not easily allow updates of the rule set once the algorithm is trained: if new information is available, the whole rule base should be retrained, often from scratch. Training starts with the whole training data set available. The internal operation of the algorithm may decide whether to use the whole data set in each iteration to build the model (as in the Pittsburgh approach), or to incrementally reduce it by deleting the examples already covered, as in those approaches following a "divide-and-conquer" strategy.

Learning can also happen in a supervised, unsupervised or reinforcement learning environment. In a supervised environment, the learning algorithm knows the class of the example. In reinforcement learning, the algorithm gets a reward after predicting the class of the example, and it should build its model based on the positive and negative rewards (or the absence of them). Unsupervised learning corresponds to the case where no external feedback is given to the learner; this case falls outside the scope of the paper.

C. Fitness Function:
The assessment of the quality of the rule set is essential to evolve an appropriate model from examples. Since the search of the EA is guided by the fitness function, it must include the quality measures that reflect the type of rule set required. The optimal rule set is usually that with high accuracy and low complexity. Accuracy is usually measured as the percentage of correct classifications, although more sophisticated metrics may arise, such as those considering the percentage of classifications per class. Complexity is measured by the number of rules or the generalization of the individual rules. We must remark that these quality measures are applied both at the rule set level and at the level of individual rules, depending on what the EA evolves.

Fitness may also include niching or sharing techniques, or token competition as in M. L. Wong and K. S. Leung, Data Mining Using Grammar Based Genetic Programming and Applications. Kluwer Academic Publishers, London, 2000, which make it possible to adjust the global fitness and optimise the search towards different areas in the problem space by penalising those solutions/rules which are similar or which cover the same training examples.
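
A hedged sketch of such a composite fitness, combining training accuracy with a penalty on the number of rules; the weights and the choice of complexity measure are illustrative assumptions, not the setting of any specific algorithm.

```python
def rule_set_fitness(predict, rule_set, X, y, w_acc=1.0, w_comp=0.01):
    """Composite fitness: accuracy reward minus a complexity penalty.

    `predict(rule_set, x)` stands for whatever inference procedure the
    algorithm uses; `len(rule_set)` is one possible complexity measure
    (rule length or generality are common alternatives).
    """
    correct = sum(1 for x, target in zip(X, y) if predict(rule_set, x) == target)
    accuracy = correct / len(X)
    complexity = len(rule_set)
    return w_acc * accuracy - w_comp * complexity
```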

D. Design of the EA:
As mentioned before, all GBML algorithms have as a common characteristic the use of a search process based on an EA to learn the final rule set. They usually apply a GA which can be generational or steady-state. In the former case, the population is replaced completely iteration by iteration, possibly with an elitism scheme in which the best individual(s) remain(s) in the new population. The steady-state model only modifies a subset of individuals (normally only two) during each iteration.
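
The two replacement schemes can be contrasted with the following schematic loops; `breed` is a placeholder for selection plus crossover and mutation, so this is only a sketch of the population update, not of any concrete GBML algorithm.

```python
def generational_step(population, fitness, breed, elitism=1):
    """Generational GA: the whole population is replaced each iteration,
    optionally keeping the best individual(s) through elitism."""
    ranked = sorted(population, key=fitness, reverse=True)
    elites = ranked[:elitism]
    offspring = [breed(population) for _ in range(len(population) - elitism)]
    return elites + offspring

def steady_state_step(population, fitness, breed, n_children=2):
    """Steady-state GA: only a few individuals (normally two) are created,
    and they replace the worst members of the current population."""
    children = [breed(population) for _ in range(n_children)]
    survivors = sorted(population, key=fitness, reverse=True)[:-n_children]
    return survivors + children
```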

E. Specific Operators:
Each new proposal in the context of GBML algorithms not only implies the choice of representation, learning methodology, and design of the EA scheme, but also the use of specific operators for the knowledge discovery. These usually represent the most significant novelty of the approach and accentuate the differences among the methods.

These specific operators include the heuristics to initialise the population, the use of covering algorithms, and the crossover and mutation operators. Since they are particular to each algorithm, we will address them in the description of each method when needed.

Taxonomy Families and Algorithms' Description

The description of the methods used in this study is summarised in a table for each one of the proposed families, in which we detail the main characteristics with regard to the GBML features stated previously. Therefore, the reader can observe the similarities and differences among the algorithms of the same category or between any two methods. Afterwards, we present a brief description of the intrinsic operators associated with each one of the algorithms to provide the specific details for each of them.

A. Michigan Approach
Michigan-style algorithms (J. H. Holland, Escaping brittleness: The possibilities of general purpose learning algorithms applied to parallel rule-based systems. Los Altos, CA: Morgan Kaufmann, 1986, vol. 2, pp. 593–623, L. B. Booker, D. E. Goldberg, and J. H. Holland, "Classifier systems and genetic algorithms," Artificial Intelligence, vol. 40, no. 1-3, pp. 235–282, 1989) evolve a set of rules called "classifiers" which are incrementally updated through the sequential observation of training examples. The best "classifiers", which are those that correctly cover a higher proportion of training examples, are selected and propagated over other less useful "classifiers", leading to an improvement in the overall system performance.

Some of the first developments of Michigan-style GBML are SCS (D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Publishing Company, 1989) and NewBoole (P. Bonelli and A. Parodi, "An efficient classifier system and its experimental comparison with two representative learning methods on three medical domains." in 4th international conference on genetic algorithms (ICGA91), R. K. Belew and L. B. Booker, Eds. Morgan Kaufmann, 1991, pp. 288–295). In this paper we select XCS, as it is the most influential algorithm in this family of classifiers, and UCS, a system that specialises XCS for supervised learning. The features of both algorithms are summarised in Table 2.

Table 2. Features of the Michigan Algorithms

Features | XCS | UCS
Learning Method | Online; Reinforcement Learning | Online; Supervised Learning
Type of Rule-Set | IF-THEN | IF-THEN
Inference Type | Weighted Voting | Weighted Voting
Type of Rule | Conjunction of Atts. | Conjunction of Atts.
Nominal Representation | None: Transformation to Ordered Numerical Values | None: Transformation to Ordered Numerical Values
Numerical Representation | Interval-Rule | Interval-Rule
"Don't Care" Management | Complete Interval | Complete Interval
Fitness | Inverse of Prediction Error; Sharing | Accuracy; Sharing
EA Type | Steady-State GA | Steady-State GA
EA Features | Roulette Wheel Selection; 2-Point Crossover; Random Mutation | Roulette Wheel Selection; 2-Point Crossover; Random Mutation
Chromosome Representation | Real Coding normalised to [0,1]; Fixed Length | Real Coding normalised to [0,1]; Fixed Length
  • XCS

XCS (S. W. Wilson, "Classifier fitness based on accuracy," Evolutionary Computation, vol. 3, no. 2, pp. 149–175, 1995, C. Stone and L. Bull, "For real! XCS with continuous-valued inputs," Evolutionary Computation, vol. 11, no. 3, pp. 299–336, 2003) was designed for reinforcement learning, although it can be used for pattern recognition by considering that a classification problem is a reinforcement problem in which maximum rewards are given to correct classifications and low rewards correspond to incorrect classifications.

Learning takes place incrementally. In each iteration, an example is provided to the system. Then, XCS finds the matching rules and randomly chooses one of the possible classes. If no rules match, then covering is applied. It creates a new rule with a condition, which is generalized from the attribute values of the example, and a random class. Once the class is decided, a reward is received. This reward is used to update the quality of the rules that advocated that particular class. Eventually, the GA is applied to the rules that proposed the chosen class as described in (S. W. Wilson, "Generalization in the XCS Classifier System," in Genetic Programming: Proceedings of the Third Annual Conference, J. Koza, W. Banzhaf, K. Chellapilla, K. Deb, M. Dorigo, D. Fogel, M. Garzon, D. Goldberg, H. Iba, and R. Riolo, Eds. Morgan Kaufmann, 1998, pp. 665–674). The GA selects two parents (two rules) based on their fitness, which is a measure of the accuracy of the reward prediction relative to the accuracies of the overlapping rules. The rules are crossed and mutated and the final offspring are introduced into the population, deleting other rules if the population is full. The rules follow the interval rule representation, which was firstly introduced in XCS in ( S. W. Wilson, "Get Real! XCS with Continuous-Valued Inputs," in Festschrift in Honor of John H. Holland, L. Booker, S. Forrest, M. Mitchell, and R. Riolo, Eds. Center for the Study of Complex Systems, University of Michigan, 1999, pp. 111–121).

  • UCS

UCS (E. Bernadó-Mansilla and J. M. Garrell, "Accuracy-based learning classifier systems: models, analysis and applications to classification tasks," Evolutionary Computation, vol. 11, no. 3, pp. 209–238, 2003) is a method derived from XCS. It inherits the main features of XCS, but mainly differs in two respects. Firstly, the learning interaction is adjusted to a supervised learning scheme. This means that each example shown to the system comes along with the class. Then, UCS uses the class information to build the set of correct rules. The rules that belong to this set get their fitness improved, while rules that matched the example but not the class get their fitness reduced. The second main difference with XCS is on fitness. In UCS, fitness is directly the proportion of correct predictions of the rule with respect to the total number of matches, whereas fitness in XCS is a windowed average of the accuracy of the reward prediction.
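
As a rough illustration of this difference, the sketch below keeps UCS-style accuracy bookkeeping for a single rule; it omits the exponent, fitness sharing and all the other parameters of the real XCS/UCS updates, so it should be read as a simplification rather than the actual algorithms.

```python
class SimpleUCSRule:
    """Minimal UCS-style bookkeeping for one rule (a deliberate simplification)."""

    def __init__(self, condition, cls):
        self.condition = condition   # condition(x) -> True if the rule matches x
        self.cls = cls
        self.matches = 0             # experience: how often the rule matched
        self.correct = 0             # matches whose example also had the rule's class

    def update(self, x, y):
        """Supervised update: the true class y is available for every example."""
        if self.condition(x):
            self.matches += 1
            if y == self.cls:
                self.correct += 1

    @property
    def accuracy(self):
        # In UCS, fitness derives directly from this ratio; in XCS it is a
        # windowed average of the accuracy of the reward prediction instead.
        return self.correct / self.matches if self.matches else 0.0
```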

B. Iterative Rule Learning Approach
IRL GBML systems use a divide-and-conquer methodology to create an ordered list of rules (G. Venturini, "SIA: a supervised inductive algorithm with genetic search for learning attributes based concepts," in Machine Learning ECML-93, ser. LNAI, P. Brazdil, Ed., vol. 667. Springer, 1993, pp. 280–296). The system iteratively invokes an EA where the best individual returned is added to the end of a list of rules and all the matching examples are removed from the training data set. This process is repeated until the training data set is empty.
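
The skeleton of this loop can be sketched as follows; `evolve_best_rule` and `covers` are placeholders for the algorithm-specific EA and matching procedure.

```python
def iterative_rule_learning(X, y, evolve_best_rule, covers):
    """IRL skeleton: learn one rule at a time and drop the covered examples."""
    rules = []
    remaining = list(zip(X, y))
    while remaining:
        rule = evolve_best_rule(remaining)               # EA run on the remaining data
        still_uncovered = [(x, t) for x, t in remaining if not covers(rule, x)]
        if len(still_uncovered) == len(remaining):       # guard: the rule covers nothing
            break
        rules.append(rule)                               # appended at the end of the list
        remaining = still_uncovered
    return rules                                         # an ordered list of rules
```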

In this case, we selected two algorithms. First, we considered a classical approach, the SIA algorithm and then, a recent representative method of this type: HIDER. Their characteristics are shown in Table 3.

Table 3. Features of the IRL Algorithms

Features | SIA | HIDER
Learning Method | Batch; Divide-and-conquer | Batch; Divide-and-conquer
Type of Rule-Set | IF-THEN | IF-THEN-ELSE
Inference Type | Distance to the Nearest Rule | Decision List
Type of Rule | Conjunction of Atts. | Conjunction of Atts.
Nominal Representation | Operator "=" | Operator "="
Numerical Representation | Interval-Rule | Natural Coding Discretization
"Don't Care" Management | Special Value | Complete Interval
Fitness | Accuracy and Complexity | Accuracy and Coverage
EA Type | Steady-State GA | Generational GA
EA Features | Random Selection; Uniform Crossover; Generalisation Operator (see description) | Roulette-Wheel Selection; Specific Crossover and Mutation; Elitism
Chromosome Representation | Real and Binary Coding; Fixed Length | Natural Coding (see description); Fixed Length

  •  SIA

Supervised Inductive Algorithm (SIA) (G. Venturini, "SIA: a supervised inductive algorithm with genetic search for learning attributes based concepts," in Machine Learning ECML-93, ser. LNAI, P. Brazdil, Ed., vol. 667. Springer, 1993, pp. 280–296) is a classical IRL mechanism. The main procedure of SIA is detailed as follows: first, a training sample which has not been classified yet (defined as "uncovered") is selected, and the most specific rule that matches that example is built. Then the condition part of the rule is generalised using a GA with a fitness function based on a linear combination of the number of classifications and misclassifications and the generalisation of the rule.

The generalisation operator works by enlarging the bounds of the conditions in the case of a numerical attribute and by converting the condition into a "don't care" for nominal attributes. The GA runs iteratively until it reaches the stop condition (a number of iterations without generating a better rule). The best rule in the final population is then added to the rule set and the examples covered by it are removed from the training set. The process is repeated until no more examples remain uncovered.

  • HIDER

HIerarchical DEcision Rules (HIDER) (J. S. Aguilar-Ruiz, J. C. Riquelme, and M. Toro, "Evolutionary learning of hierarchical decision rules," IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, vol. 33, no. 2, pp. 324–331, 2003 and J. Aguilar-Ruiz, R. Giráldez, and J. Riquelme, "Natural encoding for evolutionary supervised learning," IEEE Transactions on Evolutionary Computation, vol. 11, no. 4, pp. 466–479, 2007) uses natural coding (defined by the authors in J. Aguilar-Ruiz, R. Giráldez, and J. Riquelme, "Natural encoding for evolutionary supervised learning," IEEE Transactions on Evolutionary Computation, vol. 11, no. 4, pp. 466–479, 2007) to represent each rule. That is, each rule is encoded as IF x1 = L1 ∧ … ∧ xn = Ln THEN ck. For numerical attributes, each Li is a label obtained by means of the natural coding representation, which is a tabular representation of the computed cut-points of a specific discretization method designed for this algorithm (J. S. Aguilar-Ruiz, J. Bacardit, and F. Divina, "Experimental evaluation of discretization schemes for rule induction," in Genetic and Evolutionary Computation GECCO-2004, ser. LNCS, vol. 3102. Springer-Verlag, 2004, pp. 828-839). Therefore it can be directly translated into an interval-rule by taking the lower and upper cut-points.

The EA initialises the population by randomly selecting some examples and creating rules that cover them. Then, the evolutionary search is run for a specific number of generations, guided by a fitness function that considers both the accuracy and the generalization of the rules. The best rule obtained is added to the final rule set and the examples covered by it are removed from the training data set, similarly to SIA. The process is repeated until the number of remaining examples falls below a threshold determined by the "examples pruning factor".

C. Genetic Cooperative Competitive Learning
In this approach, each individual codifies a rule, and the whole rule set is evolved simultaneously. Thus, rules should cooperate among themselves to jointly form an optimal rule set and, at the same time, compete against each other to survive in the population. Population diversity must be enforced to avoid individuals converging to the same area of the search space.

The algorithms selected for this family are CORE, OCEC and COGIN. Next, we describe each algorithm, whose main features are shown in Table 4.

Table 4. Features of the GCCL Approaches

Features | CORE | OCEC | COGIN
Learning Method | Batch | Batch | Batch
Type of Rule-Set | IF-THEN-ELSE | IF-THEN | IF-THEN-ELSE
Inference Type | Decision List | Weighted Voting | Decision List
Type of Rule | Conjunction of Attributes | Conjunction of Attributes | Conjunction of Attributes
Nominal Representation | Operators "=" and "≠" | Operator "=" | Operator "∈"
Numerical Representation | All Comparisons: "<, ≤, =, ≥, >" and Interval-Rule | None: Discretization Required | None: Discretization Required
"Don't Care" Management | Absence of Conditions | Absence of Conditions | Absence of Conditions; Special Symbol
Fitness | Accuracy with Token Competition | Accuracy and Complexity | Accuracy and Complexity with Token Competition
EA Type | Generational GA | Generational GA | Generational GA
EA Features | Tournament Selection; 1-Point and Linear Crossover; Simple Random Mutation; No Elitism | Random Selection; Special Crossover (see description); No Mutation; Elitism | Random Selection; 1-Point Crossover; No Mutation; Elitism based on Token Competition
Chromosome Representation | Real and Binary Coding; Variable Length | Binary Coding; Variable Length | Binary Coding; Variable Length

  • CORE

CO-Evolutionary Rule Extractor (CORE) (K. C. Tan, Q. Yu, and J. H. Ang, "A coevolutionary algorithm for rules discovery in data mining," International Journal of Systems Science, vol. 37, no. 12, pp. 835–864, 2006) evolves a set of rules, which are initialised randomly, using as fitness a combination of the true positive rate and the false positive rate, together with a token competition that reduces the size of the rule-set. It uses a specific regeneration operator that re-initialises those chromosomes that have a fitness below the average. For nominal attributes it uses the one-point crossover, whereas for the numerical attributes it applies a linear combination of the parents.

  • OCEC

Organizational Co-Evolutionary algorithm for Classification (OCEC) (L. Jiao, J. Liu, and W. Zhong, "An organizational coevolutionary algorithm for classification," IEEE Transactions on Evolutionary Computation, vol. 10, no. 1, pp. 67–80, 2006) creates groups of similar examples with the same class label and then builds a rule for each group discovered. Thus the chromosome has a variable length and stores the positions of its examples in the training set. At classification time, the most general rule that covers the example defines its class.

Three specific evolutionary operators are used in this algorithm, the "migrating operator" that moves a predefined number of examples from one chromosome to another, the "exchanging operator" that exchanges a predefined number of examples between two chromosomes, and the "merging operator" that joins two chromosomes. It also uses an elitist scheme in which the best pair between the parents and the offspring remains in the population. The fitness function depends on the number of conditions and the confidence of the rule.

  • COGIN

COverage-based Genetic INduction (COGIN) (D. P. Greene and S. F. Smith, "Competition-based induction of decision models from examples," Machine Learning, vol. 13, no. 23, pp. 229–257, 1993) manages a variable number of rules, which are initialised randomly until they cover all the examples of the training set, making half of the conditions "don't care". It generates a new offspring population by randomly selecting two parents and applying a one-point crossover. Then it applies a token competition to remove those rules that do not match any example beyond those already covered by better rules. The fitness is based on a linear combination of the entropy and the generality of the rule (number of active bits).
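
A minimal sketch of the token competition mechanism shared by CORE and COGIN (the covering test and the strength ordering are placeholders, and the exact removal policy differs between the actual algorithms):

```python
def token_competition(rules, X, covers, strength):
    """Each training example is a token seized by the strongest covering rule;
    rules that seize no tokens are discarded (a simplified view of the
    CORE/COGIN mechanism)."""
    ordered = sorted(rules, key=strength, reverse=True)
    seized = {id(rule): 0 for rule in ordered}
    for x in X:
        for rule in ordered:             # strongest rules get the first chance
            if covers(rule, x):
                seized[id(rule)] += 1
                break
    return [rule for rule in ordered if seized[id(rule)] > 0]
```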

D. Pittsburgh Approaches
In the Pittsburgh-style GBML systems, the GA encodes the rule set directly into the chromosome. In this manner, each individual represents the whole solution and the different rule sets are evolved until convergence.

There are a large number of Pittsburgh style approaches in the literature. In this paper we consider the following methods: GIL, Pitts-GIRLA, DMEL, GASSIST, OIGA and ILGA. As with the previous families introduced in this section, we summarise the main characteristics of these algorithms in Table 5.

Table 5. Features of the Pittsburgh Approaches

Features | GIL | Pitts-GIRLA | DMEL | GASSIST | OIGA | ILGA
Learning Method | Batch | Batch | Batch | Batch | Batch | Batch
Type of Rule-Set | IF-THEN-ELSE | IF-THEN | IF-THEN | IF-THEN-ELSE | IF-THEN | IF-THEN
Inference Type | Decision List | Voting | Weighted Voting | Decision List | Voting | Voting
Type of Rule | Conjunction of Atts. | Conjunction of Atts. | Conjunction of Atts. | Conjunction and Disjunction of Atts. | Conjunction of Atts. | Conjunction of Atts.
Nominal Representation | Operator "∈" | None: Transformation to Ordered Numerical Values | Operator "=" | Operator "∈" | None: Transformation to Ordered Numerical Values | None: Transformation to Ordered Numerical Values
Numerical Representation | None: Discretization Required | Interval-Rule | None: Discretization Required | Interval-Rule | Interval-Rule | Interval-Rule
"Don't Care" Management | Absence of Conditions; All Values for a Condition | Reverse Interval | None | Absence of Conditions | Reverse Interval | Reverse Interval
Fitness | Accuracy and Complexity | Accuracy | Accuracy | Accuracy and Complexity | Accuracy | Accuracy
EA Type | Generational GA | Generational GA | Steady-State GA | Generational GA | Steady-State GA | Steady-State GA
EA Features | Special Operators (see description); No Elitism | Random Selection; 1-Point Crossover; Random and "Creep" Mutation; Elitism | Roulette-Wheel Selection; 1- and 2-Point Crossover; Mutation based on Local Search | Tournament Selection; Special Crossover and Mutation (see description); Elitism | Roulette-Wheel Selection; 1-Point Crossover; Random Mutation | Roulette-Wheel Selection; 1-Point Crossover; Random Mutation
Chromosome Representation | Binary Coding; Variable Length | Real Coding; Fixed Length | Binary Coding; Variable Length | Binary Coding and ADI Representation; Variable Length | Real Coding; Fixed Length | Real Coding; Fixed Length

  • GIL

The Genetic-based Inductive Learning (GIL) algorithm (C. Janikow, "A knowledge-intensive genetic algorithm for supervised learning," Machine Learning, vol. 13, no. 23, pp. 189–228, 1993) learns rules for each class sequentially, starting with the minority class and leaving the majority class as a default rule. Thus, it runs the GIL procedure n-1 times, where n is the number of classes. For a given run i, class i is the positive class and all the other classes represent the negative class. In this way, a multiple class problem is converted into n-1 different concept learning problems.

The initialisation of the chromosomes is carried out randomly for half of the population and using some of the training examples for the other half. The fitness of each individual is based on four criteria: the accuracy rate, completeness (covering all positive examples), correctness (minimisation of the number of negative examples covered) and complexity of the rule set (computed from the number of rules and conditions). GIL implements 14 genetic operators working at three different levels. Six of them work at the rule set level and consist of adding and exchanging rules in the rule set, and generalising or specialising the rule set. Five operators work at the rule level by adding, removing or exchanging conditions. The final three operators work at the condition level by adding or removing values. All these operators have an associated probability which is auto-adapted according to the needs of the learning process, that is, to favour the generalisation or specialisation of the rule set.

  • Pitts-GIRLA

The Pittsburgh Genetic Interval Rule Learning Algorithm (Pitts-GIRLA) (A. L. Corcoran and S. Sen, "Using real-valued genetic algorithms to evolve rule sets for classification," in IEEE Conference on Evolutionary Computation Proceedings, 1994, pp. 120–124) initialises all chromosomes at random, where a "don't care" condition occurs when the lower bound of the interval associated with a given variable is higher than the upper bound. Apart from the conventional 1-point crossover and random mutation, it uses a so-called "creep" mutation, which is applied to the bounds of each attribute condition and increments or decrements them by a fraction of the valid range of the attribute.
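
A sketch of such a creep mutation on one interval condition; perturbing each bound by a fraction of the attribute's valid range is the idea described above, but the concrete fraction and the clamping are illustrative assumptions.

```python
import random

def creep_mutate_interval(lower, upper, attr_min, attr_max, fraction=0.1):
    """Increment or decrement each bound by at most `fraction` of the valid range."""
    step = fraction * (attr_max - attr_min)
    lower = min(max(lower + random.uniform(-step, step), attr_min), attr_max)
    upper = min(max(upper + random.uniform(-step, step), attr_min), attr_max)
    return lower, upper  # note: lower > upper is allowed and encodes 'don't care'
```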

  • DMEL

Data Mining for Evolutionary Learning (DMEL) (W.-H. Au, K. C. C. Chan, and X. Yao, "A novel evolutionary data mining algorithm with applications to churn prediction," IEEE Transactions on Evolutionary Computation, vol. 7, no. 6, pp. 532–545, 2003) starts by generating an initial set of first-order rules (rules with one condition) using a probabilistic induction technique; based on these rules, rules of higher order (two or more conditions) are obtained iteratively. DMEL then begins an evolutionary learning process, generating an initial population of individuals by randomly combining the rules of order l-1 in the rule base to form a set of rules of order l. The consequent of the rule is not encoded in the chromosome but determined during the computation of the fitness value. Finally, the classification is made by the rule with the highest fitness that matches the example.

  • GASSIST

GASSIST (Genetic Algorithms based claSSIfier sySTem) (J. Bacardit and J. Garrell, "Evolving multiple discretizations with adaptive intervals for a pittsburgh rule-based learning classifier system," in Genetic and Evolutionary Computation Conference (GECCO 2003), ser. LNCS, vol. 2724. Springer-Verlag, 2003, pp. 1818–1831, J. Bacardit, D. Goldberg, and M. Butz, "Improving the performance of a pittsburgh learning classifier system using a default rule," in Revised Selected Papers of the International Workshop on Learning Classifier Systems 2003-2005, ser. LNCS, T. Kovacs, X. Llor, K. Takadama, P. L. Lanzi, W. Stolzmann, and S. W. Wilson, Eds., vol. 4399. Springer, 2007, pp. 291–307, J. Bacardit and J. M. Garrell, "Bloat control and generalization pressure using the minimum description length principle for a pittsburgh approach learning classifier system," in Revised Selected Papers of the International Workshop on Learning Classifier Systems 2003-2005, ser. LNCS, T. Kovacs, X. Llor, K. Takadama, P. L. Lanzi, W. Stolzmann, and S. W. Wilson, Eds., vol. 4399. Springer, 2007, pp. 59–79) inherits from the GABIL algorithm (K. A. DeJong, W. M. Spears, and D. F. Gordon, "Using genetic algorithms for concept learning," Machine Learning, vol. 13, pp. 161–188, 1993). The GASSIST algorithm initialises the population at random and uses as a fitness function the Minimum Description Length principle (J. Rissanen, "Modeling by shortest data description," Automatica, vol. 14, pp. 465–471, 1978), which takes into account both the accuracy and the complexity of the rule set. The representation of the chromosome differs for nominal and numerical attributes. For the former, it uses a string of bits, one per category allowed for that attribute, whereas for continuous variables it applies an Adaptive Discretization Intervals (ADI) rule representation (J. Bacardit and J. Garrell, "Evolving multiple discretizations with adaptive intervals for a pittsburgh rule-based learning classifier system," in Genetic and Evolutionary Computation Conference (GECCO 2003), ser. LNCS, vol. 2724. Springer-Verlag, 2003, pp. 1818–1831). This representation consists, in general terms, of a hierarchical uniform-width discretization of each attribute, allowing different cut-points for the same attribute in different rules. We should point out that this representation can be directly translated into an interval-rule by taking the lower and upper cut-points of each attribute.

It uses the multiple-point crossover operator inherited from GABIL that forces the selected cut points in both parents to be in the same position of the variable so that semantically correct offspring are obtained. The mutation operator randomly adds or removes one value of a given variable. GASSIST introduces a new deletion operator that removes rules from individuals to limit the rule set size. This operator is activated after a predefined number of iterations and removes the rules of an individual that do not match any input example.

  • OIGA

OIGA (Ordered Incremental Genetic Algorithm) (F. Zhu and S. U. Guan, "Ordered incremental training with genetic algorithms," International Journal of Intelligent Systems, vol. 19, no. 12, pp. 1239–1256, 2004) carries out the learning in two steps: first, it learns one-condition rules (generated randomly) for each attribute and optimises their values using a GA. Then, once all the attributes have been explored separately, OIGA joins the obtained one-condition rule sets, ordered by their fitness. Starting from the best one-condition rule set, this process adds the remaining one-condition rule sets to the current, growing rule sets one by one, randomly merging the rules with the same consequent.

This incremental process involves a new optimisation procedure at each merging step, in which the current, growing rule sets are refined by means of a GA (with the same features as the previous one). The process stops once all attributes have been added, obtaining a rule set that covers all the attributes. Both genetic processes use the accuracy rate as fitness.

  • ILGA

Incremental Learning with GAs (ILGA) (S. U. Guan and F. Zhu, "An incremental approach to genetic-algorithms based classification," IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 35, no. 2, pp. 227–239, 2005) is an evolution of the OIGA learning process. ILGA expands the possibilities of the incremental process by seeding the whole initial population with either the best one-condition rule set (as OIGA does) or the entire population of chromosomes in the current solution. It also introduces a bias in the mutation and crossover operators. This bias affects the "old" part of the chromosome, that is, the genes added in previous iterations but not in the current one. If the randomly chosen point for mutation or crossover is located in the "old" genes, the corresponding rates may be reduced by a reduction rate. The motivation behind this is that ILGA tends to preserve the structure of the old elements and to explore more of the combinations between old and new elements.

E. Hybrid Evolutionary Decision Trees
Apart from the classical GBML approaches described previously, in this paper we study some hybrid approaches that unify the use of decision trees with GAs, selecting the most important algorithms from the literature, DT-GA and Oblique-DT, and including the TARGET algorithm as a recent representative. The features of these algorithms are listed in Table 6.

Table 6. Features of the Hybrid Evolutionary Decision Trees (HEDTs)

Features | DT-GA | Oblique-DT | TARGET
Learning Method | Batch; Divide-and-conquer | Batch | Batch
Type of Rule-Set | Axis-Parallel Hyperplanes and IF-THEN | Oblique-Rules | Oblique-Rules and IF-THEN
Inference Type | Matching Rule | Matching Rule or Weighted Voting (see description) | Matching Rule
Type of Rule | Conjunction of Atts. | Conjunction of Linear Combination of Atts. | Conjunction and Linear Combination of Atts.
Nominal Representation | Operator "∈" | None: Transformation to Ordered Numerical Values | Operator "∈"
Numerical Representation | Operators "<" and "≥" | Linear Combination | Linear Combination (Up to Three Variables)
"Don't Care" Management | Special Symbol | Not implicit | Not implicit
Fitness | Accuracy | Accuracy | Accuracy and Complexity
EA Type | Generational GA | Generational GA | Generational GA
EA Features | Tournament Selection; 1-Point Crossover; Random Mutation; Elitism | Tournament Selection; Uniform Crossover; No Mutation; Elitism | Roulette-Wheel Selection; Special Crossover and Mutation (see description); Elitism and Reinitialisation
Chromosome Representation | Binary and Real Coding; Fixed Length | Real Coding; Fixed Length | Binary and Real Coding; Variable Length

  • DT-GA

The Hybrid Decision Tree - Genetic Algorithm (DT-GA) method (D. Carvalho and A. Freitas, "A hybrid decision tree/genetic algorithm method for data mining," Information Sciences, vol. 163, no. 1-3, pp. 13–35, 2004) discovers rules in two training phases. In the first phase, it runs C4.5 with the whole training set and transforms the resulting tree into an "IF-THEN" set of rules. The examples covered by each rule are considered either as a small disjunct or as a large disjunct, depending on whether their number is smaller than or equal to a given threshold. The second phase consists of using a GA with the union of the instances in the small disjuncts as a "second training set". To predict the output class of an example, it is pushed down the decision tree until it reaches a leaf node, as usual. If the leaf node corresponds to a large disjunct, the example is assigned the class predicted by that leaf node. Otherwise, the example is assigned the class of one of the small-disjunct rules discovered by the GA.

The rules of the GA are randomly initialised by selecting, for each attribute, a value from the examples of the "second training set". The chromosome representation encodes the conditions for all attributes (nominal and numerical), together with a bit that specifies the "don't care" condition. The rule consequent is not encoded into the genome, but dynamically chosen as the most frequent class in the set of examples covered by the rule's antecedent. The fitness function is given by a quadratic version of the geometric mean of the true rates. It also uses a specific rule pruning operator that transforms a condition into a "don't care" based on the accuracy rate associated with each attribute. The stopping criterion is given by a threshold on the number of remaining examples in the "second training set".

  • Oblique-DT

The Oblique Decision Tree (Oblique-DT) algorithm (E. Cantú-Paz and C. Kamath, "Inducing oblique decision trees with evolutionary algorithms," IEEE Transactions on Evolutionary Computation, vol. 7, no. 1, pp. 54–68, 2003) extends the classical OC1 method (S. K. Murthy, S. Kasif, and S. Salzberg, "A system for induction of oblique decision trees." Journal of Artificial Intelligence Research, vol. 2, pp. 1–32, 1994) (a greedy optimizer for oblique-type decision trees) to find oblique partitions by means of a standard generational GA.

Each chromosome encodes the coefficients of the linear combination that defines the oblique hyperplane. To initialise the population, the best axis-parallel hyperplane is first computed using the impurity measure described below and copied to 10% of the initial population. The remainder of the population is initialised randomly with coefficients a_i ∈ [-200, 200]. The fitness value is computed as the impurity of the split at each tree node using the twoing rule (L. Breiman, J. Friedman, R. Olshen, and C. Stone, Classification and Regression Trees. Pacific Grove, 1984), which is based on the balance of the number of examples on the left and right of the split.
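
For reference, the standard twoing criterion of Breiman et al. can be computed as in the sketch below (whether Oblique-DT applies any additional normalisation is not detailed here):

```python
from collections import Counter

def twoing(left_labels, right_labels):
    """Twoing value of a split: (P_L * P_R / 4) * (sum_c |p(c|L) - p(c|R)|)^2.
    Larger values indicate a better balanced and purer split."""
    n_l, n_r = len(left_labels), len(right_labels)
    n = n_l + n_r
    if n_l == 0 or n_r == 0:
        return 0.0
    count_l, count_r = Counter(left_labels), Counter(right_labels)
    classes = set(count_l) | set(count_r)
    diff = sum(abs(count_l[c] / n_l - count_r[c] / n_r) for c in classes)
    return (n_l / n) * (n_r / n) / 4.0 * diff ** 2

print(twoing(["A", "A"], ["B", "B"]))  # 0.25 for a perfectly separating, balanced split
```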

  • TARGET

The Tree Analysis with Randomly Generated and Evolved Trees (TARGET) methodology (J. B. Gray and G. Fan, "Classification tree analysis using TARGET," Computational Statistics & Data Analysis, vol. 52, no. 3, pp. 1362–1372, 2008) is a novel approach that uses a GA to build decision trees in which each chromosome represents a complete decision tree. The population is initialised randomly with a pre-specified probability of adding a new condition (node) to the tree. To evaluate the fitness of each chromosome, the authors use a measure based on the number of correct classifications and the length of the chromosome (number of conditions/nodes).

The genetic search uses specific operators for crossover and mutation. In the case of crossover, a node swap or a subtree swap is applied. In the case of mutation, there are four possibilities: split set mutation, split rule mutation, node swap mutation and subtree swap mutation. It also uses elitism (called cloning) and reinitialisation (called transplantation) to reach a good trade-off between convergence and diversity.

Data-sets partitions employed in the paper

Standard classification

For our study in standard classification, we have selected thirty data sets from the UCI repository. In all the experiments, we have adopted a 5-fold cross-validation model, i.e., we have split each data-set randomly into 5 folds, each containing 20% of the patterns of the data-set. Four folds have been used for training and one for testing. For each of the five partitions, we have executed 5 trials of the algorithms, so for each data-set we consider the average results of 25 runs. Table 7 summarizes the properties of the selected data-sets. It shows, for each data-set, the number of examples (#Ex.), the number of attributes (#Atts.), the number of numerical (#Num.) and nominal (#Nom.) features, and the number of classes (#Cl.). Some of the larger data-sets (abalone, magic, nursery, penbased and ring) have been sampled in a stratified manner at 10%. For data-sets containing missing values (cleveland, breast cancer, dermatology, hepatitis and wisconsin), we have removed the instances with any missing value before partitioning (the actual number of examples is shown between brackets). The 5-fold cross-validation partitions for each data-set are available in KEEL format; all data-sets may be downloaded by clicking here.
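
The validation scheme can be reproduced with the following sketch (using scikit-learn's KFold, which is an assumption here; the original partitions were generated and distributed in KEEL format):

```python
import numpy as np
from sklearn.model_selection import KFold

def repeated_cv_accuracy(X, y, train_and_score, n_folds=5, n_trials=5, seed=0):
    """5-fold cross-validation with 5 trials per partition -> 25 runs per data-set.
    `train_and_score(X_tr, y_tr, X_te, y_te, seed)` is a placeholder that runs
    one (stochastic) GBML algorithm and returns its test accuracy."""
    folds = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in folds.split(X):
        for trial in range(n_trials):
            scores.append(train_and_score(X[train_idx], y[train_idx],
                                          X[test_idx], y[test_idx], seed=trial))
    return float(np.mean(scores))
```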

Table 7. Summary Description for Standard Data-Sets

id Data-set #Ex. #Atts. #Num. #Nom. #Cl.
aba abalone 418 8 7 1 28 (22)
aus australian credit approval 690 14 8 6 2
bal balance scale 625 4 4 0 3
bre breast cancer 286 (277) 9 0 9 2
bup liver disorders [Bupa] 345 6 6 0 2
car car evaluation 1,728 6 0 6 4
cle cleveland 297 13 13 0 5
con contraceptive method choice 1,473 9 6 3 3
crx japanese credit screening 653 15 6 9 2
der dermatology 366 (358) 33 1 32 6
eco ecoli 336 7 7 0 8
fla solar flare 1,066 10 0 10 6
ger german credit data 1,000 20 6 14 2
gla glass identification 214 9 9 0 7 (6)
hab haberman 306 3 3 0 2
hea heart 270 13 6 7 2
hep hepatitis 155 (80) 19 6 13 2
iri iris 150 4 4 0 3
lym lymphography 148 18 3 15 4
mag magic 1,902 10 10 0 2
new new-thyroid 215 5 5 0 3
nur nursery 1,296 8 0 8 5
pen pen-based recognition 1,100 16 16 0 10
pim pima 768 8 8 0 2
rin ring 740 20 20 0 2
tic tic-tac-toe endgame 958 9 0 9 2
veh vehicle 846 18 18 0 4
win wine 178 13 13 0 3
wis wisconsin 683 (630) 9 9 0 2
zoo zoo 101 17 0 17 7

Imbalanced data (including pre-processed data sets)

For the study with imbalanced data, we have considered thirty-three data-sets from the UCI repository with different ratios between the minority and majority classes: from low imbalance to highly imbalanced data-sets. Multi-class data sets were modified to obtain two-class imbalanced problems, defining the union of one or more classes as positive and the union of the remaining classes as negative. Table 8 summarizes the imbalanced data employed in this study and shows, for each data-set, the number of examples (#Ex.), the number of attributes (#Atts.), the names of the minority and majority classes, and the class distribution. The table is ordered according to this last column, from low to high imbalance. Additionally, we have included a last column with both the original data-set partitions and the partitions preprocessed with SMOTE (N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: Synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002). All data-sets can be downloaded following this link. The preprocessed data sets are available here.
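
As a simple illustration of this binarisation step, the Python sketch below marks the union of the selected classes as positive and the remaining classes as negative; the class labels are made up in the style of the ecoli data set and do not reproduce any row of Table 8 exactly.

  import numpy as np

  def binarize(y, positive_classes):
      # Union of the selected classes -> positive (minority), rest -> negative
      return np.where(np.isin(y, positive_classes), "positive", "negative")

  y = np.array(["cp", "im", "om", "pp", "om", "cp", "imU", "cp", "cp", "cp"])
  y_bin = binarize(y, positive_classes=["om"])
  labels, counts = np.unique(y_bin, return_counts=True)
  print(dict(zip(labels, 100.0 * counts / counts.sum())))  # class distribution in %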

Table 8. Summary Description for Imbalanced Data-Sets

Data-set #Ex. #Atts. Class (min., maj.) %Class(min.; maj.) Download.
Glass1 214 9 (build-win-non_float-proc; remainder) (35.51, 64.49) dataBase.pngdataBase.png
Ecoli0vs1 220 7 (im; cp) (35.00, 65.00) dataBase.pngdataBase.png
Wisconsin 683 9 (malignant; benign) (35.00, 65.00) dataBase.pngdataBase.png
Pima 768 8 (tested-positive; tested-negative) (34.84, 65.16) dataBase.pngdataBase.png
Iris0 150 4 (Iris-Setosa; remainder) (33.33, 66.67) dataBase.pngdataBase.png
Glass0 214 9 (build-win-float-proc; remainder) (32.71, 67.29) dataBase.pngdataBase.png
Yeast1 1484 8 (nuc; remainder) (28.91, 71.09) dataBase.pngdataBase.png
Vehicle1 846 18 (Saab; remainder) (28.37, 71.63) dataBase.pngdataBase.png
Vehicle2 846 18 (Bus; remainder) (28.37, 71.63) dataBase.pngdataBase.png
Vehicle3 846 18 (Opel; remainder) (28.37, 71.63) dataBase.pngdataBase.png
Haberman 306 3 (Die; Survive) (27.42, 73.58) dataBase.pngdataBase.png
Glass0123vs456 214 9 (non-window glass; remainder) (23.83, 76.17) dataBase.pngdataBase.png
Vehicle0 846 18 (Van; remainder) (23.64, 76.36) dataBase.pngdataBase.png
Ecoli1 336 7 (im; remainder) (22.92, 77.08) dataBase.pngdataBase.png
New-thyroid2 215 5 (hypo; remainder) (16.89, 83.11) dataBase.pngdataBase.png
New-thyroid1 215 5 (hyper; remainder) (16.28, 83.72) dataBase.pngdataBase.png
Ecoli2 336 7 (pp; remainder) (15.48, 84.52) dataBase.pngdataBase.png
Segment0 2308 19 (brickface; remainder) (14.26, 85.74) dataBase.pngdataBase.png
Glass6 214 9 (headlamps; remainder) (13.55, 86.45) dataBase.pngdataBase.png
Yeast3 1484 8 (me3; remainder) (10.98, 89.02) dataBase.pngdataBase.png
Ecoli3 336 7 (imU; remainder) (10.88, 89.12) dataBase.pngdataBase.png
Page-blocks0 5472 10 (remainder; text) (10.23, 89.77) dataBase.pngdataBase.png
Vowel0 988 13 (hid; remainder) (9.01, 90.99) dataBase.pngdataBase.png
Glass2 214 9 (Ve-win-float-proc; remainder) (8.78, 91.22) dataBase.pngdataBase.png
Ecoli4 336 7 (om; remainder) (6.74, 93.26) dataBase.pngdataBase.png
Glass4 214 9 (containers; remainder) (6.07, 93.93) dataBase.pngdataBase.png
Abalone9vs18 731 8 (18; 9) (5.65, 94.25) dataBase.pngdataBase.png
Glass5 214 9 (tableware; remainder) (4.20, 95.80) dataBase.pngdataBase.png
Yeast2vs8 482 8 (pox; cyt) (4.15, 95.85) dataBase.pngdataBase.png
Yeast4 1484 8 (me2; remainder) (3.43, 96.57) dataBase.pngdataBase.png
Yeast5 1484 8 (me1; remainder) (2.96, 97.04) dataBase.pngdataBase.png
Yeast6 1484 8 (exc; remainder) (2.49, 97.51) dataBase.pngdataBase.png
Abalone19 4174 8 (19; remainder) (0.77, 99.23) dataBase.pngdataBase.png

Experimental Study

Experimental Framework: Parameters

The configuration parameters for the GBML algorithms are shown in Table 9, whereas Table 10 shows the parameters for the classical non-evolutionary algorithms. In both cases, the selected values are common to all problems and follow the recommendations of the corresponding authors of each algorithm, which correspond to the default parameter settings included in the KEEL software (J. Alcalá-Fdez, L. Sánchez, S. García, M. del Jesus, S. Ventura, J. Garrell, J. Otero, C. Romero, J. Bacardit, V. Rivas, J. Fernández, and F. Herrera, "KEEL: A software tool to assess evolutionary algorithms to data mining problems," Soft Computing, vol. 13, no. 3, pp. 307–318, 2009).

Table 9. Parameter specification for the algorithms employed in the experimentation.

Michigan Algorithms
Algorithm Parameters
XCS number of explores = 100000, population size = 6400, α = 0.1, β = 0.2, δ = 0.1,
ν = 10.0, θmna = 2, θdel = 50.0, θsub = 50.0, ε0 = 1, doActionSetSubsumption = false,
fitness reduction = 0.1, pI = 10.0, FI = 0.01, εI = 0.0, γ = 0.25, χ = 0.8, μ = 0.04,
θGA = 50.0, doGASubsumption = true, type of selection = RWS,
type of mutation = free, type of crossover = 2 point, P# = 0.33, r0 = 1.0, m0 = 0.1,
l0 = 0.1, doSpecify = false, nSpecify = 20.0, pSpecify = 0.5.
UCS number of explores = 100000, population size = 6400, δ = 0.1, acc0 = 0.99, Pcross = 0.8,
Pmut = 0.04, θGA = 50.0, θdel = 50.0, θsub = 50.0, doGASubsumption = true, r0 = 0.6,
type of selection = RWS, type of mutation = free, type of crossover = 2 point, m0 = 0.1
IRL Algorithms
Algorithm Parameters
SIA Number of iterations = 200, α = 150, β = 0, Threshold Strength = 0
HIDER Pop. Size = 100, Number of Gen. = 100, Mutation Prob.= 0.5,
Percentage of Crossing = 80, Extreme Mutation Prob. = 0.05,
Prune Examples Factor = 0.05, Penalty Factor = 1, Error Coefficient = 0.
GCCL Algorithms
Algorithm Parameters
CORE Pop. Size = 100, Co-population Size = 50, Gen. Limit = 100, Number of co-populations = 15,
Crossover rate = 1.0, Mutation Prob. = 0.1, Regeneration Prob. = 0.5
OCEC Number of Total Generations = 500, Number of migrating/exchanging members = 1
COGIN Misclassification error level = 2, Gen. limit = 1000, Crossover rate = 0.9, Negation bit = Yes
Pittsburgh Algorithms
Algorithm Parameters
GIL Pop. Size = 40, Number of Gen. = 1000, w1 = 0.5, w2 = 0.5, w3 = 0.01, Rules Exchange = 0.2,
Rule exchange selection = 0.2, Rules copy = 0.1, New Event = 0.4, Rules Generalization = 0.5
Rules Drop = 0.5, Rules Specialization = 0.5, Rule Split = 0.005,
Nominal Rule Split = 0.1, Linear Rule Split = 0.7, Condition Drop = 0.1
Conjunction to Disjunction = 0.02, Introduce Condition = 0.1, Rule Directed Split = 0.03,
Reference Change = 0.02, Reference Extension = 0.03, Reference Restriction = 0.03,
Condition Level Prob. = 0.5, Lower threshold = 0.2, Upper threshold = 0.8
Pitts-GIRLA Number of rules = 30, Number of generations = 10,000,
Population size = 61 chromosomes, Crossover Probability = 0.7, Mutation Probability = 0.5.
DMEL Pop. Size = 30, Crossover Prob. = 0.5, Mutation Prob. = 0.0001, Number of Gen. = 1000
GASSIST-ADI Threshold in Hierarchical Selection = 0,
Iteration of Activation for Rule Deletion Operator = 5,
Iteration of Activation for Hierarchical Selection = 24,
Minimum Number of Rules before Disabling the Deletion Operator = 12,
Minimum Number of Rules before Disabling the Size Penalty Operator = 4,
Number of Iterations = 750, Initial Number Of Rules = 20, Population Size = 400,
Crossover Probability = 0.6, Probability of Individual Mutation = 0.6,
Probability of Value 1 in Initialization = 0.90, Tournament Size = 3,
Possible size in micro-intervals of an attribute = {4, 5, 6, 7, 8, 10, 15, 20, 25},
Maximum Number of Intervals per Attribute = 5, psplit = 0.05, pmerge = 0.05,
Probability of Reinitialize Begin = 0.03, Probability of Reinitialize End = 0,
Use MDL = true, Iteration MDL = 25, Initial Theory Length Ratio = 0.075,
Weight Relaxation Factor = 0.90, Class Initialization Method = cwinit, Default Class = auto
OIGA Crossover Prob. = 1.0, Mutation Prob. = 0.01, Pop. Size = 200, Number of Rules = 30
Stagnation = 30, Generations = 200, Survivors Percent = 0.5, Attribute Order = descendent
ILGA Crossover Prob. = 1.0, Mutation Prob. = 0.01, Pop. Size = 200, Number of Rules = 30
Stagnation = 30, Generations = 200, Survivors Percent = 0.5, Attribute Order = descendent
Crossover reduction rate = 0.5, Mutation reduction rate = 0.5, Incremental strategy = 1
Hybrid Evolutionary Decision Trees
Algorithm Parameters
DT-GA Confidence = 0.25, Instances per leaf = 2, Threshold S to consider a Small Disjunct = 10,
Number of Gen. = 50, Number of chrom. = 200, Crossover Prob. = 0.8, Mutation Prob. = 0.01
Oblique-DT Number of Total Generations for the GA = 25, Population size = 20√d (d = dimensionality)
TARGET Split Prob. = 0.5, Number of Gen. = 100, Number of Trees = 50
 

Table 10. Parameter specification for the state-of-the-art non-evolutionary algorithms employed in the experimentation.

Algorithm Parameters
CART Max Depth of the Tree = 90
AQ Star Size = 5
CN2 Star Size = 5
C4.5 and C4.5-Rules Prune = True, Confidence level = 0.25, Minimum number of item-sets per leaf = 2
Ripper Size of growing subset = 66%
Repetitions of the optimization stage = 2
 

Finally, a preprocessing discretization step is applied for the algorithms that cannot cope with real values. Specifically, we made use of the Class-Attribute Dependent Discretizer (J. Ching, A. Wong, and K. Chan, "Class-dependent discretization for inductive learning from continuous and mixed-mode data," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, no. 7, pp. 641–651, 1995).
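
The class-attribute dependent discretizer is not reproduced here; as a rough illustration of where this preprocessing step fits, the Python sketch below applies a generic equal-frequency discretizer from scikit-learn, which is only a stand-in and not the CADD method actually used.

  import numpy as np
  from sklearn.preprocessing import KBinsDiscretizer

  X = np.random.RandomState(0).normal(size=(100, 3))  # continuous attributes
  discretizer = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="quantile")
  X_discrete = discretizer.fit_transform(X)  # integer bin indices per attribute
  print(np.unique(X_discrete[:, 0]))  # [0. 1. 2. 3. 4.]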

Analysis of the GBML algorithms for Rule Induction in Standard Classification

Our first objective is to study the behaviour of the GBML algorithms for rule induction, dividing the analysis into the different families presented in our taxonomy.

Table 11 shows the test results using accuracy, whereas Table 12 reports them using Cohen's kappa as performance measure. These tables are divided into five parts, each one related to the Michigan, Iterative Rule Learning (IRL), Genetic Cooperative-Competitive Learning (GCCL), Pittsburgh and Hybrid Evolutionary Decision Trees (HEDTs) approaches, respectively.
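
Both measures can be computed as in the Python/scikit-learn sketch below, on a made-up pair of true and predicted label vectors.

  from sklearn.metrics import accuracy_score, cohen_kappa_score

  y_true = [0, 0, 1, 1, 2, 2, 2, 1]
  y_pred = [0, 1, 1, 1, 2, 2, 0, 1]

  print("accuracy:", accuracy_score(y_true, y_pred))   # fraction of correct predictions
  print("kappa:", cohen_kappa_score(y_true, y_pred))   # agreement corrected for chance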

These tables can be downloaded as Excel documents by clicking on the corresponding links.

Table 11. Results with the accuracy measure for GBML algorithms for rule induction in standard classification

  Michigan Approach IRL Approach GCCL Approach Pittsburgh Approach HEDT Approach
Data-set XCS UCS SIA HIDER CORE OCEC COGIN GIL Pitts-GIRLA DMEL GASSIST OIGA ILGA DT-GA Oblique-DT TARGET
aba 18.67 17.80 19.52 19.20 19.29 14.12 20.39 10.48 18.52 2.92 24.45 23.65 20.72 13.82 16.56 20.15
aus 85.59 83.51 65.30 82.20 81.59 85.91 84.55 84.49 75.01 57.68 85.39 84.26 84.70 84.23 82.03 85.54
bal 82.69 76.93 82.40 73.54 66.46 70.05 74.91 52.32 69.12 70.91 78.56 53.44 53.60 75.20 90.56 75.62
bre 74.89 73.23 69.46 75.45 73.01 65.40 54.21 62.68 72.29 72.15 73.94 73.59 73.94 73.36 62.84 71.34
bup 66.03 65.80 63.83 64.23 60.23 45.22 55.36 45.86 59.42 46.43 64.81 56.87 54.90 64.64 63.07 65.97
car 79.55 91.60 93.37 70.02 79.24 77.52 72.12 80.58 79.02 8.59 92.06 79.78 70.49 86.25 98.38 77.71
cle 54.41 53.25 49.56 51.24 54.15 56.97 49.76 43.50 54.42 27.07 54.21 51.52 51.45 49.61 49.15 52.99
con 54.06 48.28 49.03 52.31 43.52 45.17 44.56 43.45 54.19 34.42 54.46 47.25 44.09 51.54 47.71 45.19
crx 85.79 83.50 65.61 80.98 82.82 86.37 83.59 84.96 86.07 56.78 85.85 84.60 85.70 86.00 80.10 86.37
der 95.59 84.41 83.24 87.59 31.72 80.45 85.65 84.35 49.79 49.90 92.29 85.98 85.27 84.85 93.85 66.15
eco 78.99 78.03 69.48 76.31 63.40 61.26 61.33 58.70 69.17 43.88 76.67 68.46 62.09 76.73 76.19 65.49
fla 53.17 72.12 52.01 74.42 65.57 70.54 67.88 61.52 46.96 50.97 73.99 71.94 63.75 74.72 73.17 61.54
ger 73.12 71.86 67.70 71.04 70.24 67.00 47.26 66.54 71.26 62.68 72.60 71.10 71.32 71.76 67.06 70.00
gla 66.47 64.12 73.43 63.91 51.03 49.72 52.46 54.99 59.45 15.34 61.42 54.94 51.02 68.35 67.12 44.11
hab 71.90 72.09 65.93 72.88 70.85 67.45 73.14 67.32 63.38 72.54 68.68 71.70 72.42 72.22 62.48 71.50
hea 78.52 76.89 67.56 72.37 74.67 78.00 77.33 77.04 53.33 77.85 80.07 78.96 78.81 77.26 74.00 74.89
hep 90.00 82.00 84.00 84.25 80.00 79.00 44.00 81.50 73.00 16.25 85.75 79.50 76.75 81.50 83.25 77.25
iri 94.40 90.00 94.40 94.13 93.47 88.13 82.80 90.00 86.40 50.27 95.87 93.73 92.93 93.33 92.80 92.93
lym 79.46 76.79 79.70 76.60 67.14 76.33 74.67 76.58 63.49 26.97 78.73 76.84 75.81 65.82 71.66 75.17
mag 81.24 79.31 75.37 73.44 73.36 68.77 67.23 68.41 78.94 64.91 79.94 77.02 75.83 80.11 74.40 75.76
new 94.60 94.88 93.12 92.09 91.35 86.14 81.77 84.19 90.42 40.56 92.74 92.19 91.44 90.42 94.79 86.79
nur 79.69 78.54 89.48 82.25 71.59 81.62 81.88 81.87 53.56 38.45 90.23 83.21 72.37 87.41 92.90 72.02
pen 88.58 91.04 95.73 68.29 33.38 67.75 72.84 48.79 33.02 19.15 65.84 49.13 47.18 86.33 91.15 38.93
pim 74.66 73.93 71.90 73.46 72.11 69.63 67.03 69.79 59.13 66.40 73.27 73.62 72.11 73.96 70.83 73.02
rin 91.84 79.51 68.62 33.95 61.14 78.46 68.89 74.16 59.89 50.41 88.14 86.81 86.05 84.46 79.89 67.59
tic 85.32 92.09 99.77 69.93 70.19 81.21 91.23 66.63 73.82 60.94 95.11 74.90 78.06 82.65 90.08 69.50
veh 72.70 71.63 62.39 63.00 39.55 52.33 51.10 45.98 45.20 33.00 66.46 59.89 57.38 68.68 70.33 49.81
win 96.83 90.54 95.26 77.49 90.29 72.70 64.29 72.91 78.84 37.87 92.66 91.01 88.49 94.90 92.08 82.24
wis 96.60 96.14 96.96 96.40 93.94 95.81 95.31 95.55 74.72 75.90 95.55 91.27 90.95 94.56 93.06 95.75
zoo 88.88 90.69 95.25 95.43 92.54 93.66 91.08 93.91 33.86 63.01 93.66 92.63 87.91 93.70 96.05 72.09
Mean 77.81 76.68 74.65 72.28 67.26 70.42 67.95 67.63 62.86 46.47 77.78 72.66 70.58 76.28 76.58 68.78

 

Table 12. Results with the kappa measure for GBML algorithms for rule induction in standard classification

  Michigan Approach IRL Approach GCCL Approach Pittsburgh Approach HEDT Approach
Data-set XCS UCS SIA HIDER CORE OCEC COGIN GIL Pitts-GIRLA DMEL GASSIST OIGA ILGA DT-GA Oblique-DT TARGET
aba .0701 .0770 .0966 .0952 .0643 .0701 .1044 .0511 .0748 .0186 .1255 .1214 .0899 .0559 .0674 .0734
aus .7092 .6646 .2752 .6404 .6320 .7162 .6912 .6897 .4578 .0537 .7042 .6888 .6958 .6832 .6345 .7122
bal .6851 .6004 .6790 .5092 .3789 .5113 .5363 .3414 .4286 .4611 .6031 .1708 .1854 .5453 .8347 .5481
bre .2781 .2924 .2191 .2731 .1998 .2082 .1612 .1966 .1872 .3198 .3099 .2858 .2418 .1926 .1051 .2990
bup .2662 .2745 .2560 .2276 .1114 -.0047 -.0256 -.0194 .1794 .0136 .2667 .0661 .0157 .2733 .2409 .2847
car .4321 .8055 .8550 .0000 .5856 .5535 .1759 .6413 .4680 .0272 .8238 .4681 .0235 .6668 .9647 .5580
cle .2794 .2281 .1720 .2034 .0875 .3067 .2132 .1894 .0852 .1143 .2448 .1118 .0663 .2203 .2213 .1313
con .2761 .1976 .1853 .2426 .1005 .1943 .1410 .1891 .2734 .0898 .2806 .1349 .0590 .2223 .1909 .1057
crx .7146 .6661 .2879 .6173 .6568 .7262 .6761 .7003 .7213 .0509 .7158 .6981 .7169 .7191 .5991 .7292
der .9446 .7840 .7890 .8435 .1266 .7543 .8227 .8033 .3693 .3243 .9033 .8248 .8178 .8091 .9227 .5645
eco .7043 .6927 .5654 .6718 .4548 .4768 .4185 .4652 .5570 .1709 .6711 .5552 .4518 .6691 .6732 .4815
fla .3931 .6424 .3479 .6709 .5563 .6290 .5851 .5309 .2636 .3483 .6631 .6366 .5309 .6738 .6561 .4971
ger .3082 .2551 .1104 .1895 .0642 .2400 .1508 .3138 .1447 .0966 .2953 .1750 .2064 .2795 .2268 .0000
gla .5350 .5053 .6404 .4903 .3367 .3376 .3701 .4098 .4141 .0576 .4517 .3734 .3143 .5652 .5546 .2302
hab .0835 .1727 .1304 .1167 .0446 .0862 .0318 .0879 .0442 .0842 .0880 .0146 .0114 .1521 .0701 .0673
hea .5644 .5247 .3379 .4374 .4834 .5583 .5562 .5380 .0000 .5473 .5948 .5935 .5931 .5346 .4738 .4849
hep .5389 .2348 .1283 .2850 .1778 .3646 .1109 .2985 .1225 .0000 .4230 .3003 .2434 .0036 .4171 .1011
iri .9160 .8500 .9160 .9120 .9020 .8220 .7427 .8500 .7960 .2540 .9380 .9079 .8964 .9000 .8920 .8940
lym .6044 .5399 .6099 .5425 .3422 .5508 .5394 .5602 .2383 .0451 .5918 .5900 .5493 .3570 .4597 .5161
mag .5643 .5328 .4462 .3836 .3788 .1623 .1039 .1510 .5130 .0013 .5430 .4762 .4399 .5256 .4384 .4518
new .8833 .8849 .8454 .8281 .8042 .7132 .6330 .6925 .7890 .2515 .8460 .8363 .8238 .7772 .8871 .7214
nur .6988 .6835 .8467 .7369 .5810 .7353 .7412 .7391 .3221 .2928 .8553 .7516 .5925 .8134 .8962 .5848
pen .8731 .9004 .9525 .6480 .2597 .6415 .7007 .4319 .2551 .0983 .6201 .4447 .4245 .8480 .9016 .3208
pim .4233 .4091 .3564 .3626 .3191 .2303 .1232 .2370 .0000 .0614 .3896 .4015 .3504 .4176 .3540 .4094
rin .8367 .5895 .3692 -.3282 .2187 .5680 .3938 .4812 .2152 .0000 .7627 .7549 .7406 .6893 .5977 .3486
tic .6487 .7959 .9949 .3399 .2794 .5909 .8213 .4109 .3330 .2464 .8891 .4286 .4997 .5900 .7796 .2975
veh .6361 .6216 .4983 .5066 .1991 .3647 .3755 .2845 .2688 .1100 .5529 .4753 .4459 .5817 .6043 .3322
win .9520 .8563 .9287 .6518 .8524 .5876 .5022 .5882 .6550 .1175 .8885 .8690 .8347 .9222 .8802 .7325
wis .9258 .9146 .9335 .9206 .8653 .9072 .8990 .9033 .4833 .3681 .9020 .8188 .8112 .8801 .8462 .9075
zoo .8527 .8686 .9377 .9397 .9050 .9166 .8838 .9192 .1051 .4668 .9167 .9037 .8421 .9165 .9477 .6224
Mean .5866 .5688 .5237 .4653 .3989 .4840 .4393 .4559 .3255 .1697 .5953 .4959 .4505 .5495 .5779 .4336

We now present the list of algorithms that have been selected as representative of each family. The criterion for choosing each method is based on the significant differences among the methods within each category found in the statistical analysis; in the case of the HEDTs, Oblique-DT was chosen for obtaining the best mean result in its group (please refer to the paper for details). The selected algorithms are the following ones:

  1. Chromosome = Rule:
    1. Michigan approach: XCS.
    2. IRL approach: SIA.
    3. GCCL approach: OCEC.
  2. Chromosome = Rule Set (Pittsburgh): GASSIST.
  3. HEDTs: Oblique-DT.

Now, our aim is to perform a comparative study between these approaches and some state-of-the-art non-evolutionary interval rule classification methods. These algorithms are CART (L. Breiman, J. Friedman, R. Olshen, and C. Stone, Classification and Regression Trees. Pacific Grove, 1984), AQ (R. Michalski, I. Mozetic, and N. Lavrac, "The multipurpose incremental learning system AQ15 and its testing application to three medical domains," in 5th National Conference on Artificial Intelligence (AAAI'86), 1986, pp. 1041–1045), CN2 (P. Clark and T. Niblett, "The CN2 induction algorithm," Machine Learning, vol. 3, no. 4, pp. 261–283, 1989), C4.5 (J. R. Quinlan, C4.5: Programs for Machine Learning. San Mateo, California: Morgan Kaufmann Publishers, 1993), C4.5-Rules (J. Quinlan, "MDL and categorical theories (continued)," in Machine Learning: Proceedings of the Twelfth International Conference. Morgan Kaufmann, 1995, pp. 464–470) and Ripper (W. W. Cohen, "Fast effective rule induction," in Machine Learning: Proceedings of the Twelfth International Conference. Morgan Kaufmann, 1995, pp. 115–123).

In Tables 13 and 14 we show the results for accuracy and Cohen's kappa, respectively. We have included the selected interval rule GBML approaches, that is, XCS, SIA, OCEC, GASSIST and Oblique-DT, together with the classical non-evolutionary methods.

These tables can also be downloaded as Excel documents by clicking on the corresponding links.

Table 13. Results with the accuracy measure for GBML algorithms for rule induction and classical non-evolutionary algorithms in standard classification

Data-set XCS SIA OCEC GASSIST Oblique-DT CART AQ CN2 C4.5 C4.5-Rules RIPPER
aba 18.67 19.52 14.12 24.45 16.56 21.29 15.17 26.80 18.67 20.97 17.37
aus 85.59 65.30 85.91 85.39 82.03 84.64 71.30 80.14 84.20 84.81 81.33
bal 82.69 82.40 70.05 78.56 90.56 62.24 57.12 78.72 77.28 80.64 50.69
bre 74.89 69.46 65.40 73.94 62.84 70.76 72.85 72.21 74.37 70.71 61.44
bup 66.03 63.83 45.22 64.81 63.07 68.99 44.58 59.42 66.09 64.17 63.59
car 79.55 93.37 77.52 92.06 98.38 80.85 86.82 77.26 90.80 86.81 88.28
cle 54.41 49.56 56.97 54.21 49.15 49.17 51.26 53.88 51.82 53.87 44.04
con 54.06 49.03 45.17 54.46 47.71 51.33 42.58 44.47 51.93 51.66 52.10
crx 85.79 65.61 86.37 85.85 80.10 84.39 78.89 79.49 85.30 85.39 81.77
der 95.59 83.24 80.45 92.29 93.85 73.22 87.04 87.98 92.46 90.79 88.11
eco 78.99 69.48 61.26 76.67 76.19 73.82 68.58 71.45 78.28 77.70 74.37
fla 53.17 52.01 70.54 73.99 73.17 74.48 65.93 66.60 74.48 70.15 67.13
ger 73.12 67.70 67.00 72.60 67.06 70.10 71.44 72.50 71.80 71.16 66.66
gla 66.47 73.43 49.72 61.42 67.12 68.70 60.59 60.79 68.73 62.76 64.80
hab 71.90 65.93 67.45 68.68 62.48 69.61 27.91 70.26 72.22 69.60 50.71
hea 78.52 67.56 78.00 80.07 74.00 72.59 78.59 76.67 79.63 77.78 74.74
hep 90.00 84.00 79.00 85.75 83.25 82.50 87.25 86.25 78.75 81.25 79.75
iri 94.40 94.40 88.13 95.87 92.80 95.33 81.47 90.67 93.33 94.00 93.07
lym 79.46 79.70 76.33 78.73 71.66 77.61 72.08 81.08 75.01 72.65 76.34
mag 81.24 75.37 68.77 79.94 74.40 74.76 67.51 68.66 79.81 78.43 76.89
new 94.60 93.12 86.14 92.74 94.79 94.42 84.28 90.70 91.16 90.23 94.33
nur 79.69 89.48 81.62 90.23 92.90 84.80 76.67 79.17 89.04 83.67 88.95
pen 88.58 95.73 67.75 65.84 91.15 62.00 61.85 66.91 89.36 87.20 85.71
pim 74.66 71.90 69.63 73.27 70.83 71.87 67.03 67.32 74.09 72.84 70.05
rin 91.84 68.62 78.46 88.14 79.89 83.38 49.22 79.59 82.70 84.92 81.41
tic 85.32 99.77 81.21 95.11 90.08 72.65 94.00 70.56 85.80 86.01 97.16
veh 72.70 62.39 52.33 66.46 70.33 65.36 56.72 55.79 71.87 64.47 70.76
win 96.83 95.26 72.70 92.66 92.08 92.11 79.47 85.95 94.90 94.90 90.06
wis 96.60 96.96 95.81 95.55 93.06 93.27 83.33 95.47 95.03 95.03 95.96
zoo 88.88 95.25 93.66 93.66 96.05 91.05 91.45 87.10 94.10 93.10 91.11
Mean 77.81 74.65 70.42 77.78 76.58 73.91 67.77 72.80 77.77 76.59 73.96

 

Table 14. Results with the kappa measure for GBML algorithms for rule induction and classical non-evolutionary algorithms in standard classification

Data-set XCS SIA OCEC GASSIST Oblique-DT CART AQ CN2 C4.5 C4.5-Rules RIPPER
aba .0701 .0966 .0701 .1255 .0674 .0918 .0590 .1590 .0873 .0941 .0935
aus .7092 .2752 .7162 .7042 .6345 .6947 .3876 .5884 .6799 .6933 .6300
bal .6851 .6790 .5113 .6031 .8347 .3024 .2160 .6069 .5922 .6439 .3178
bre .2781 .2191 .2082 .3099 .1051 .0000 .2166 .1417 .2330 .2700 .1716
bup .2662 .2560 -.0047 .2667 .2409 .3465 .0184 .0538 .2912 .2763 .2839
car .4321 .8550 .5535 .8238 .9647 .5534 .6689 .3230 .7986 .7011 .7591
cle .2794 .1720 .3067 .2448 .2213 .1655 .1288 .0918 .2257 .2372 .2068
con .2761 .1853 .1943 .2806 .1909 .2517 .0105 .0652 .2568 .2358 .2723
crx .7146 .2879 .7262 .7158 .5991 .6864 .5832 .5751 .7043 .7051 .6387
der .9446 .7890 .7543 .9033 .9227 .6567 .8341 .8448 .9048 .8829 .8513
eco .7043 .5654 .4768 .6711 .6732 .6367 .5349 .5816 .6998 .6886 .6559
fla .3931 .3479 .6290 .6631 .6561 .6702 .5552 .5596 .6716 .6137 .5891
ger .3082 .1104 .2400 .2953 .2268 .2591 .1858 .1254 .2826 .2926 .2510
gla .5350 .6404 .3376 .4517 .5546 .5738 .4477 .4360 .5742 .4920 .5288
hab .0835 .1304 .0862 .0880 .0701 .0387 .0018 .0051 .1521 .1087 .1432
hea .5644 .3379 .5583 .5948 .4738 .4459 .5577 .5097 .5866 .5528 .5017
hep .5389 .1283 .3646 .4230 .4171 .3124 .3916 .2169 .1240 .2705 .3191
iri .9160 .9160 .8220 .9380 .8920 .9300 .7220 .8600 .9000 .9100 .8960
lym .6044 .6099 .5508 .5918 .4597 .5611 .5040 .6325 .5367 .4895 .5627
mag .5643 .4462 .1623 .5430 .4384 .4480 .1092 .1524 .5246 .5241 .4990
new .8833 .8454 .7132 .8460 .8871 .8807 .6981 .7877 .8140 .7991 .8769
nur .6988 .8467 .7353 .8553 .8962 .7764 .6853 .6903 .8382 .7605 .8386
pen .8731 .9525 .6415 .6201 .9016 .5768 .5756 .6318 .8818 .8577 .8412
pim .4233 .3564 .2303 .3896 .3540 .4002 .0862 .1239 .4133 .4047 .3795
rin .8367 .3692 .5680 .7627 .5977 .6675 -.0076 .5907 .6538 .6986 .6286
tic .6487 .9949 .5909 .8891 .7796 .4222 .8613 .1881 .6766 .7014 .9375
veh .6361 .4983 .3647 .5529 .6043 .5400 .4252 .4083 .6248 .5259 .6104
win .9520 .9287 .5876 .8885 .8802 .8800 .6885 .7847 .9222 .9223 .8504
wis .9258 .9335 .9072 .9020 .8462 .8509 .6903 .8984 .8904 .8903 .9122
zoo .8527 .9377 .9166 .9167 .9477 .8815 .8855 .8197 .9217 .9088 .8828
Mean .5866 .5237 .4840 .5953 .5779 .5167 .4240 .4484 .5821 .5717 .5643

Analysis of the GBML algorithms for Rule Induction in Imbalanced Data-sets

In this section, we analyse the effect of imbalance on the classification quality of all the algorithms selected in our study, following the same methodology as in the previous section. We decided not to restrict the analysis to the representatives of the different families of GBML algorithms for rule induction, because we cannot assume that the algorithms that perform best in standard classification will also be the ones that behave best on imbalanced data sets.

This study is divided into two parts. First, we present the results obtained directly on the original data sets, in order to determine which algorithms are best adapted to the imbalanced framework. Then, we apply preprocessing to the training partitions to rebalance the distribution of examples among the classes. Our aim is to analyse whether class rebalancing improves the performance of the algorithms used in this study, and of the GBML methods in particular.

Empirical study with the original data-sets

Table 15 shows the results for all data sets, using the geometric mean of the true rates, for all the GBML algorithms, together with their average ranking. We stress in boldface the best value within each family.
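
For reference, the geometric mean of the true rates can be computed as in the Python/scikit-learn sketch below, where the label vectors are made up and label 1 denotes the minority (positive) class.

  import numpy as np
  from sklearn.metrics import confusion_matrix

  y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
  y_pred = [1, 0, 1, 0, 0, 0, 0, 1, 0, 0]

  tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
  tpr = tp / (tp + fn)  # sensitivity: true rate of the minority class
  tnr = tn / (tn + fp)  # specificity: true rate of the majority class
  print("geometric mean:", np.sqrt(tpr * tnr))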

This table can also be downloaded as an Excel document by clicking on the corresponding link.

Table 15. Results with the geometric mean of the true rates for GBML algorithms for rule induction in classification with imbalanced data-sets

  Michigan Approach IRL Approach GCCL Approach Pittsburgh Approach HEDT Approach
Data-set XCS UCS SIA HIDER CORE OCEC COGIN GIL Pitts-GIRLA DMEL GASSIST OIGA ILGA DT-GA Oblique-DT TARGET
Glass1 75.07 55.40 74.51 32.87 48.75 50.79 43.04 53.14 65.69 19.76 70.07 37.34 25.54 65.62 68.57 64.38
Ecoli0vs1 97.67 20.00 96.90 18.55 95.97 87.73 79.53 89.20 97.61 12.65 97.97 96.99 94.51 98.31 96.90 98.28
Wisconsin 96.97 96.22 96.61 76.37 93.77 95.38 94.66 96.57 52.93 53.18 95.23 91.74 90.51 94.05 93.21 93.93
Pima 68.05 69.14 63.17 0.00 60.13 50.38 36.95 50.65 0.00 16.39 65.64 62.45 61.99 69.79 66.94 55.33
Iris0 100.00 98.97 100.00 69.50 100.00 89.83 91.16 90.65 98.97 72.46 99.49 97.95 98.48 98.97 97.89 98.97
Glass0 83.29 30.93 77.20 41.55 63.07 68.40 56.51 69.32 59.51 0.00 81.03 49.17 56.05 81.02 74.53 31.19
Yeast1 41.59 68.76 57.08 69.58 48.64 41.15 8.21 40.96 27.85 0.00 61.23 47.72 38.58 61.25 61.20 14.27
Vehicle2 97.06 92.63 85.09 6.30 36.29 88.41 83.97 88.31 17.23 50.17 94.75 88.14 86.77 94.23 94.48 49.68
Vehicle1 64.06 62.74 57.71 0.00 34.92 58.95 51.36 59.84 0.00 3.56 56.23 43.77 32.51 58.28 66.92 8.42
Vehicle3 59.82 69.14 53.18 0.00 11.02 65.12 50.85 62.87 5.30 12.96 58.83 44.75 38.16 57.25 65.21 0.00
Haberman 36.55 47.87 52.29 0.00 37.30 45.26 13.90 45.18 17.59 33.56 45.26 5.00 9.89 35.63 59.77 36.49
Glass0123vs456 93.11 87.07 86.97 89.90 80.87 85.35 79.87 83.49 80.16 63.04 88.57 79.99 77.59 89.24 87.19 89.53
Vehicle0 95.37 88.64 80.54 60.81 12.83 76.49 64.90 75.70 15.29 1.76 90.77 89.00 79.73 92.58 90.07 58.10
Ecoli1 87.29 58.09 73.85 76.56 88.79 58.72 50.63 62.07 85.60 15.53 86.22 81.44 87.52 84.91 81.55 86.64
New-Thyroid2 96.49 91.59 90.37 87.57 88.83 93.51 81.94 93.51 92.45 78.13 97.67 86.43 90.67 92.17 93.29 91.66
New-Thyroid1 90.00 89.55 91.87 92.32 91.25 83.68 83.96 83.77 37.98 83.63 96.21 91.35 94.30 90.86 95.65 93.77
Ecoli2 30.82 50.81 86.90 59.37 78.00 53.49 37.37 53.17 82.88 0.00 87.91 79.80 76.20 77.55 86.47 82.28
Segment0 98.98 98.88 99.44 80.39 0.00 93.37 65.13 93.31 0.00 0.00 98.00 98.26 98.08 98.19 98.41 80.54
Glass6 94.00 88.49 90.63 76.25 73.70 89.15 84.87 88.31 91.98 50.68 84.74 83.50 83.30 79.42 82.29 89.66
Yeast3 17.18 89.46 81.71 52.27 70.10 90.55 54.06 90.36 83.67 0.00 85.25 80.50 63.65 82.38 82.05 53.89
Ecoli3 31.91 46.29 61.69 45.04 48.79 63.29 56.70 66.32 42.14 7.30 67.11 50.84 40.23 72.68 67.15 0.00
Page-Blocks0 84.58 85.04 86.59 89.27 64.92 75.14 55.01 74.73 45.75 0.00 80.87 81.62 62.49 92.18 90.07 59.99
Vowel0 90.81 96.68 100.00 95.81 72.60 62.38 39.67 62.77 60.33 16.86 93.60 84.52 81.71 96.83 95.59 71.07
Glass2 0.00 9.61 19.34 63.23 10.00 65.99 0.00 67.11 0.00 0.00 0.00 0.00 0.00 52.59 32.05 0.00
Ecoli4 44.37 68.73 65.21 100.00 78.40 76.06 27.18 73.38 63.74 7.56 78.02 75.28 58.00 82.33 88.09 67.70
Glass4 85.97 36.12 95.41 69.63 56.47 69.72 35.79 66.51 11.40 7.63 62.51 27.67 48.49 30.09 62.70 0.00
Abalone9-18 14.14 41.36 40.17 96.63 32.73 48.65 20.56 53.50 21.19 44.67 40.41 20.36 7.07 44.78 54.11 0.00
Glass5 0.00 33.97 67.79 64.12 73.47 88.34 19.75 72.01 0.00 0.00 13.97 67.62 0.00 42.25 93.65 0.00
Yeast2vs8 69.65 56.57 10.00 42.31 72.66 71.45 65.46 71.23 10.00 25.55 62.74 0.00 0.00 10.00 69.05 72.83
Yeast4 0.00 59.96 26.21 75.16 6.31 58.31 8.91 58.91 11.21 0.00 17.16 12.34 12.31 39.92 55.56 0.00
Yeast5 0.00 78.14 77.71 44.69 9.43 78.04 26.56 79.44 28.01 0.00 72.67 41.32 13.69 83.67 80.82 0.00
Yeast6 0.00 75.53 51.47 43.77 0.00 54.24 18.20 56.43 0.00 0.00 0.00 0.00 0.00 56.60 62.80 0.00
Abalone19 0.00 0.00 0.00 32.31 0.00 61.72 0.00 68.96 0.00 8.04 0.00 0.00 0.00 0.00 7.53 0.00
Mean 58.93 64.92 69.62 56.13 52.73 70.88 48.08 70.96 39.59 20.76 67.58 57.48 51.76 69.87 75.81 46.93
 

We present a list of algorithms that have been selected as representative of each family for imbalanced classification. The criterion for choosing each method is exactly the same as for standard classification, that is, we select the best performing method according to the differences supported by the statistical analysis; when no significant differences were found, we selected the best ranking method, as in the case of UCS, GASSIST and Oblique-DT (please refer to the paper for details).

  1. Chromosome = Rule:
    1. Michigan approach: UCS.
    2. IRL approach: SIA.
    3. GCCL approach: OCEC.
  2. Chromosome = Rule Set (Pittsburgh): GASSIST.
  3. HEDTs: Oblique-DT.

Once the representative algorithms of each family have been selected, we now compare their classification ability on imbalanced data sets against the classical non-evolutionary algorithms. The results using the geometric mean, together with the associated ranking for each method, are shown in Table 16.
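
The average rankings reported together with these results can be obtained as in the following Python/SciPy sketch, where the geometric-mean values are invented for illustration; the Friedman test is then a natural check for overall differences among the methods, in the spirit of the statistical analysis of the paper.

  import numpy as np
  from scipy.stats import friedmanchisquare, rankdata

  # rows = data sets, columns = methods; higher geometric mean is better
  gm = np.array([[75.1, 70.2, 68.9],
                 [97.7, 96.9, 95.0],
                 [68.1, 63.2, 66.9],
                 [83.3, 77.2, 74.5]])

  ranks = np.apply_along_axis(lambda row: rankdata(-row), 1, gm)  # rank 1 = best
  print("average ranking per method:", ranks.mean(axis=0))
  print(friedmanchisquare(*gm.T))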

This table can be downloaded as an Excel document by clicking on the corresponding link.

Table 16. Results for imbalanced data sets classification with the best GBML algorithms for rule induction and state-of-the-art non-evolutionary algorithms

Data-set UCS SIA OCEC GASSIST Oblique-DT CART AQ CN2 C4.5 C4.5-Rules RIPPER
Abalone9-18 41.36 40.17 48.65 40.41 54.11 25.14 19.79 20.78 44.82 47.55 53.37
Abalone19 0.00 0.00 61.72 0.00 7.53 0.00 56.64 0.00 0.00 0.00 23.20
Ecoli4 68.73 65.21 76.06 78.02 88.09 77.88 55.98 63.98 82.33 85.51 88.41
Glass2 9.61 19.34 65.99 0.00 32.05 25.08 65.72 0.00 60.08 60.08 36.66
Yeast4 59.96 26.21 58.31 17.16 55.56 10.84 38.39 21.21 41.97 55.08 63.82
Vowel0 96.68 100.00 62.38 93.60 95.59 84.68 60.06 48.83 96.83 96.83 95.73
Yeast2vs8 56.57 10.00 71.45 62.74 69.05 50.86 53.66 58.08 10.00 28.13 71.89
Glass4 36.12 95.41 69.72 62.51 62.70 76.37 74.23 58.55 55.37 58.22 78.89
Glass5 33.97 67.79 88.34 13.97 93.65 93.90 71.86 0.00 88.04 88.04 73.47
Yeast5 78.14 77.71 78.04 72.67 80.82 71.45 68.12 41.33 87.51 84.89 87.39
Yeast6 75.53 51.47 54.24 0.00 62.80 68.03 55.87 0.00 56.60 73.27 79.99
Ecoli1 58.09 73.85 58.72 86.22 81.55 82.91 42.12 55.93 85.38 85.70 91.52
Glass0123vs456 87.07 86.97 85.35 88.57 87.19 91.01 76.51 82.19 91.31 90.98 90.53
New-Thyroid1 89.55 91.87 83.68 96.21 95.65 94.62 86.56 85.66 90.84 91.78 92.88
New-Thyroid2 91.59 90.37 93.51 97.67 93.29 91.18 92.89 85.66 93.53 93.27 92.75
Page-Blocks0 85.04 86.59 75.14 80.87 90.07 71.04 11.70 59.29 91.95 93.09 93.57
Segment0 98.88 99.44 93.37 98.00 98.41 98.60 92.83 63.88 98.24 98.39 98.82
Vehicle0 88.64 80.54 76.49 90.77 90.07 92.28 69.86 29.88 92.90 92.83 90.88
Ecoli2 50.81 86.90 53.49 87.91 86.47 81.24 55.42 51.73 85.14 86.08 86.18
Yeast3 89.46 81.71 90.55 85.25 82.05 80.82 81.17 18.56 85.10 87.23 91.81
Ecoli3 46.29 61.69 63.29 67.11 67.15 68.12 45.38 56.04 67.38 69.56 80.69
Glass6 88.49 90.63 89.15 84.74 82.29 80.89 88.10 81.14 79.42 79.21 84.49
Ecoli0vs1 20.00 96.90 87.73 97.97 96.90 96.86 87.05 88.50 98.31 98.31 95.82
Haberman 47.87 52.29 45.26 45.26 59.77 25.44 13.85 23.20 35.63 35.06 55.54
Iris0 98.97 100.00 89.83 99.49 97.89 100.00 92.25 91.73 98.97 98.97 97.89
Pima 69.14 63.17 50.38 65.64 66.94 70.25 14.77 38.62 68.72 68.95 69.66
Vehicle2 92.63 85.09 88.41 94.75 94.48 93.52 89.08 49.64 95.57 93.64 95.33
Wisconsin 96.22 96.61 95.38 95.23 93.21 92.23 85.36 94.84 94.50 94.81 96.38
Yeast1 68.76 57.08 41.15 61.23 61.20 53.97 35.03 10.14 62.83 68.96 67.72
Glass0 30.93 77.20 68.40 81.03 74.53 74.60 63.11 21.62 81.36 78.30 76.80
Glass1 55.40 74.51 50.79 70.07 68.57 73.36 44.10 45.38 71.45 69.35 73.96
Vehicle1 62.74 57.71 58.95 56.23 66.92 53.63 43.78 30.94 64.70 61.75 71.28
Vehicle3 69.14 53.18 65.12 58.83 65.21 50.07 42.55 39.67 61.37 67.99 71.01
Mean 64.92 69.62 70.88 67.58 75.81 69.72 59.81 45.97 73.28 75.21 79.34

Empirical study applying preprocessing

In the previous subsection we have studied the behaviour of the GBML and non-evolutionary approaches by means of an empirical study on the original imbalanced data-sets. Our aim now is to study whether a preprocessing method (G. E. A. P. A. Batista, R. C. Prati, and M. C. Monard, "A study of the behaviour of several methods for balancing machine learning training data," SIGKDD Explorations, vol. 6, no. 1, pp. 20–29, 2004 and A. Estabrooks, T. Jo, and N. Japkowicz, "A multiple resampling method for learning from imbalanced data sets," Computational Intelligence, vol. 20, no. 1, pp. 18–36, 2004) that rebalances the class distribution of the data set helps to improve the performance, and in which cases. We use the SMOTE method (N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: Synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002), as it is one of the best known preprocessing methods for compensating class imbalance.
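
A minimal sketch of this rebalancing step is given below, using the SMOTE implementation of the imbalanced-learn Python library as a stand-in for the preprocessing actually applied to the KEEL partitions; the synthetic data set and the parameter values are illustrative only.

  from collections import Counter
  from imblearn.over_sampling import SMOTE
  from sklearn.datasets import make_classification

  # Synthetic two-class problem with roughly a 9:1 class imbalance
  X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)
  print("before:", Counter(y))

  # SMOTE creates synthetic minority examples by interpolating among k nearest neighbours
  X_res, y_res = SMOTE(k_neighbors=5, random_state=0).fit_resample(X, y)
  print("after:", Counter(y_res))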

Table 17 shows the results of all the GBML methods for all data sets. The results shown are the geometric mean of the true rates and the average ranking of each method. We stress in boldface the best value within each family. This table can also be downloaded as an Excel document by clicking on the corresponding link.

Table 17. Results with the geometric mean of the true rates for GBML algorithms for rule induction in classification with imbalanced data-sets applying SMOTE preprocessing

  Michigan Approach IRL Approach GCCL Approach Pittsburgh Approach HEDT Approach
Data-set XCS UCS SIA HIDER CORE OCEC COGIN GIL Pitts-GIRLA DMEL GASSIST OIGA ILGA DT-GA Oblique-DT TARGET
Glass1 75.53 58.40 82.33 68.45 62.43 63.01 46.65 50.73 0.00 35.43 80.07 68.46 64.91 70.03 71.31 61.16
Ecoli0vs1 97.63 97.35 95.15 90.17 97.97 97.26 85.65 86.97 95.95 5.16 97.94 97.67 96.57 98.31 96.90 95.10
Wisconsin 96.73 95.90 96.05 96.34 94.72 95.05 94.92 95.76 19.08 37.15 95.10 91.59 92.21 94.80 93.84 94.80
Pima 70.27 71.49 67.08 68.13 70.36 55.99 36.37 46.83 30.31 9.34 73.98 69.78 70.56 70.97 66.60 68.99
Iris0 100.00 100.00 99.49 99.49 100.00 94.91 93.31 93.82 0.00 74.24 100.00 98.47 96.96 98.97 97.89 100.00
Glass0 81.30 34.39 83.92 70.54 69.70 71.01 62.42 71.00 0.00 0.00 84.27 76.66 77.45 77.90 75.36 65.71
Yeast1 69.74 64.28 63.95 66.22 65.85 47.12 25.30 32.33 31.63 0.00 72.73 68.56 67.98 70.76 67.88 65.99
Vehicle2 96.88 94.30 77.41 83.36 39.73 90.60 73.59 75.46 0.00 57.46 95.24 90.97 90.71 94.29 93.91 76.22
Vehicle1 74.04 65.30 64.71 69.24 62.71 66.40 62.68 61.04 29.36 7.13 76.28 62.27 64.26 67.85 71.04 62.29
Vehicle3 74.18 68.02 66.66 67.93 19.55 69.17 60.55 64.54 0.00 15.78 73.67 65.95 65.30 66.85 68.04 65.14
Haberman 56.47 45.49 50.86 63.85 65.93 53.50 37.30 38.40 36.04 16.71 60.69 52.09 40.97 61.59 50.79 54.34
Glass0123vs456 89.48 76.13 93.48 83.16 86.03 79.71 62.79 80.92 88.04 65.18 93.13 85.71 87.09 92.35 91.77 92.42
Vehicle0 93.99 92.54 84.52 85.73 75.47 90.45 74.42 76.12 33.26 12.42 94.42 91.62 90.31 90.86 92.59 78.24
Ecoli1 89.04 48.42 84.38 85.70 89.44 37.15 37.39 41.04 0.00 14.90 85.25 86.00 87.04 90.01 82.01 87.75
New-Thyroid2 96.57 96.23 98.60 96.28 89.60 85.23 78.10 84.91 13.09 79.53 95.72 93.29 95.67 96.49 96.85 93.44
New-Thyroid1 97.41 97.96 98.30 96.56 93.46 86.97 75.41 85.35 38.26 78.38 94.84 96.47 93.40 96.06 98.03 93.55
Ecoli2 89.29 79.09 91.00 90.43 87.39 58.77 42.72 53.40 87.91 2.65 89.77 83.00 86.02 87.25 89.38 82.30
Segment0 98.62 99.06 99.19 92.33 73.04 93.92 89.77 92.10 0.00 14.11 98.98 98.53 98.96 99.11 98.91 94.08
Glass6 86.88 80.98 87.11 83.93 88.46 91.85 77.05 91.42 91.33 36.66 89.46 91.88 85.59 89.02 88.73 87.94
Yeast3 91.84 89.35 87.16 86.23 86.84 80.41 64.60 76.20 0.00 0.00 92.03 90.14 91.60 88.77 84.19 88.27
Ecoli3 83.79 59.29 84.43 82.71 82.86 50.18 45.68 43.24 18.44 15.49 83.08 83.69 72.06 81.43 75.53 85.03
Page-Blocks0 94.46 85.77 92.24 92.17 73.41 92.17 48.04 73.00 34.51 0.00 88.87 88.28 86.35 94.86 93.71 83.47
Vowel0 99.44 89.05 99.94 84.96 79.89 65.64 61.54 60.17 17.80 42.02 96.45 92.26 86.14 93.72 98.04 88.46
Glass2 64.09 0.00 68.23 26.02 60.78 49.30 63.77 64.85 0.00 0.00 54.69 58.43 44.04 51.36 66.13 59.21
Ecoli4 90.65 68.33 90.70 76.54 90.56 74.63 64.05 74.59 0.00 29.47 86.35 86.85 80.97 74.55 84.65 83.11
Glass4 83.20 34.05 87.44 59.47 60.98 78.92 58.57 73.48 0.00 61.14 68.92 62.63 78.75 87.50 83.37 85.97
Abalone9-18 69.27 38.36 69.18 69.01 69.45 50.03 45.92 48.02 0.00 4.17 70.24 59.51 61.19 52.79 65.42 68.02
Glass5 91.98 14.14 91.39 72.98 86.63 91.52 86.79 76.99 0.00 34.96 91.47 46.80 33.05 86.78 91.38 84.50
Yeast2vs8 68.61 59.65 70.61 61.74 72.11 69.11 64.57 69.37 0.00 23.05 76.77 30.90 33.96 81.01 77.11 71.15
Yeast4 80.56 56.29 72.36 66.27 80.57 48.67 29.10 37.67 32.96 1.18 76.72 81.71 72.36 66.66 62.44 82.07
Yeast5 96.40 84.04 94.79 97.35 93.70 64.70 50.63 68.92 0.00 13.84 95.58 85.88 89.77 90.93 89.14 93.04
Yeast6 87.04 77.37 83.10 84.71 77.62 66.92 51.40 60.27 0.00 0.00 80.81 72.55 76.80 80.57 75.20 85.27
Abalone19 67.08 0.00 23.35 64.58 67.01 55.84 55.14 55.43 0.00 0.70 48.18 62.42 47.34 15.58 18.67 63.51
Mean 84.92 67.30 81.79 78.26 76.19 71.70 60.79 66.80 21.15 23.89 83.69 77.91 75.95 80.61 80.51 80.02
 

Finally, we present the list of algorithms that have been selected as representative of each family for imbalanced classification when applying SMOTE preprocessing. The criterion for choosing each method is the same as we defined in the previous section of the study.

  1. Chromosome = Rule:
    1. Michigan approach: XCS.
    2. IRL approach: SIA.
    3. GCCL approach: CORE.
  2. Chromosome = Rule Set (Pittsburgh): GASSIST.
  3. HEDTs: DT-GA.

Table 18 shows the results with SMOTE preprocessing for the selected GBML algorithms and the classical non-evolutionary algorithms. The best methods in this case are XCS and GASSIST among the GBML algorithms, followed by C4.5 and C4.5-Rules among the non-evolutionary approaches.

As with the previous tables of results, this table can be downloaded as an Excel document by clicking on the corresponding link.

Table 18. Results for imbalanced data sets classification with the best GBML algorithms for rule induction and state-of-the-art non-evolutionary algorithms applying SMOTE preprocessing

Data-set XCS SIA CORE GASSIST DT-GA CART AQ CN2 C4.5 C4.5-Rules RIPPER
Glass1 75.53 82.33 62.43 80.07 70.03 73.36 47.25 40.77 70.49 72.04 68.37
Ecoli0vs1 97.63 95.15 97.97 97.94 98.31 96.86 86.65 86.65 97.24 97.58 95.13
Wisconsin 96.73 96.05 94.72 95.10 94.80 92.23 94.55 93.98 96.19 95.58 94.41
Pima 70.27 67.08 70.36 73.98 70.97 75.96 8.78 47.90 72.92 70.47 69.44
Iris0 100.00 99.49 100.00 100.00 98.97 100.00 95.34 94.34 98.97 98.97 97.89
Glass0 81.30 83.92 69.70 84.27 77.90 74.60 58.84 72.23 79.02 75.76 79.52
Yeast1 69.74 63.95 65.85 72.73 70.76 53.97 24.93 27.93 72.23 69.94 67.92
Vehicle2 96.88 77.41 39.73 95.24 94.29 93.52 71.49 68.12 95.79 95.46 96.70
Vehicle1 74.04 64.71 62.71 76.28 67.85 53.63 41.24 52.38 71.90 72.13 66.52
Vehicle3 74.18 66.66 19.55 73.67 66.85 50.07 35.35 56.74 72.62 69.18 64.85
Haberman 56.47 50.86 65.93 60.69 61.59 47.07 2.98 39.58 64.03 66.90 34.78
Glass0123vs456 89.48 93.48 86.03 93.13 92.35 91.01 55.14 80.57 90.05 87.11 85.53
Vehicle0 93.99 84.52 75.47 94.42 90.86 92.28 75.29 76.30 92.62 92.25 93.52
Ecoli1 89.04 84.38 89.44 85.25 90.01 82.91 40.79 42.14 88.26 88.26 85.09
New-Thyroid2 96.57 98.60 89.60 95.72 96.49 91.18 83.27 86.12 95.93 97.70 97.11
New-Thyroid1 97.41 98.30 93.46 94.84 96.06 94.62 83.25 85.35 93.39 94.85 94.34
Ecoli2 89.29 91.00 87.39 89.77 87.25 81.24 51.91 49.77 89.08 88.64 84.06
Segment0 98.62 99.19 73.04 98.98 99.11 98.60 91.96 78.02 99.54 98.91 99.14
Glass6 86.88 87.11 88.46 89.46 89.02 80.89 60.39 86.59 92.62 89.58 94.95
Yeast3 91.84 87.16 86.84 92.03 88.77 80.82 75.14 76.40 92.26 93.55 88.26
Ecoli3 83.79 84.43 82.86 83.08 81.43 68.12 39.98 19.99 85.78 81.83 87.16
Page-Blocks0 94.46 92.24 73.41 88.87 94.86 71.04 0.64 73.43 94.51 94.55 94.36
Vowel0 99.44 99.94 79.89 96.45 93.72 84.68 55.36 49.86 94.69 96.59 97.47
Glass2 64.09 68.23 60.78 54.69 51.36 25.08 62.66 65.12 64.04 68.04 62.33
Ecoli4 90.65 90.70 90.56 86.35 74.55 77.88 55.61 74.59 86.48 84.28 82.39
Glass4 83.20 87.44 60.98 68.92 87.50 76.37 71.09 76.96 72.13 68.00 68.94
Abalone9-18 69.27 69.18 69.45 70.24 52.79 25.14 7.08 49.71 63.19 66.46 73.80
Glass5 91.98 91.39 86.63 91.47 86.78 93.90 90.61 71.89 91.40 91.14 77.77
Yeast2vs8 68.61 70.61 72.11 76.77 81.01 50.86 41.69 69.44 82.33 74.18 58.75
Yeast4 80.56 72.36 80.57 76.72 66.66 10.84 28.71 22.59 73.39 75.83 65.67
Yeast5 96.40 94.79 93.70 95.58 90.93 71.45 34.16 68.16 95.06 95.07 84.71
Yeast6 87.04 83.10 77.62 80.81 80.57 68.03 42.71 60.22 84.18 80.31 73.25
Abalone19 67.08 23.35 67.01 48.18 15.58 0.00 18.26 55.41 8.00 7.99 38.80
Mean 84.92 81.79 76.19 83.69 80.61 70.55 52.52 63.61 82.43 81.79 79.48

Page Maintained by A. Fernández