Genetics-Based Machine Learning for Rule Induction: State of the Art, Taxonomy and Comparative Study

This Website contains complementary material to the paper:

A. Fernandez, S. García, J. Luengo, E. Bernadó-Mansilla, F. Herrera, Genetics-Based Machine Learning for Rule Induction: State of the Art, Taxonomy and Comparative Study. IEEE Transactions on Evolutionary Computation 14:6 (2010) 913-941, doi:10.1109/TEVC.2009.2039140.

The web is organized according to the following summary:

  1. Paper content
  2. Taxonomy of Genetics-Based Machine Learning Algorithms for Classification
    1. Taxonomy
    2. Preliminaries: Genetics-Based Machine Learning Features
    3. Taxonomy Families and Algorithms' Description
  3. Data-sets partitions employed in the paper
    1. Standard classification
    2. Imbalanced data (including pre-processed data sets)
  4. Experimental Study
    1. Parameters
    2. Analysis of the GBML algorithms for Rule Induction in Standard Classification
    3. Analysis of the GBML algorithms for Rule Induction in Imbalanced Data-sets
      1. Empirical study with the original data-sets
      2. Empirical study applying preprocessing


Paper Content

Abstract: The classification problem can be addressed by numerous techniques and algorithms, which belong to different paradigms of Machine Learning. In this work, we are interested in evolutionary algorithms, the so-called Genetics-Based Machine Learning algorithms. In particular, we focus on evolutionary approaches that evolve a set of rules, i.e., evolutionary rule-based systems, applied to classification tasks, in order to provide a state-of-the-art review of this field.

This study has been done with a double aim: to present a taxonomy of the Genetics-Based Machine Learning approaches for rule induction, and to develop an empirical analysis both for standard classification and for classification with imbalanced data sets.

We also include a comparative study of the GBML methods with some classical non-evolutionary algorithms, in order to observe the suitability and high potential of the search performed by evolutionary algorithms and the behaviour of the GBML algorithms in contrast to the classical approaches, in terms of classification accuracy.

Summary:

  1. Introduction.
  2. Taxonomy of Genetics Based Machine Learning Algorithms for Classification.
    1. Taxonomy.
    2. Preliminaries: Genetics-Based Machine Learning Features.
    3. Taxonomy Families and Algorithms' Description
  3. Experimental Framework.
    1. Performance measures for standard classification.
    2. Imbalanced data sets: formulation and performance measures.
    3. Selected classical Machine Learning algorithms.
    4. Data sets.
    5. Parameters.
    6. Statistical Tests for Performance Comparison.
    7. KEEL Software.
    8. Web page associated to the paper.
  4. Analysis of the GBML Algorithms for Rule Induction in Standard Classification.
    1. Study of the behaviour of the algorithms in the different families.
    2. Study of the representative algorithms of each family versus classical ones.
  5. Analysis of the GBML Algorithms for Rule Induction in Imbalanced Data sets.
    1. Empirical analysis with the original data sets
      1. Study of the behaviour of the algorithms in the different families.
      2. Study of the representative algorithms of each family versus classical ones.
    2. Empirical analysis applying preprocessing
      1. Study of the behaviour of the algorithms in the different families.
      2. Study of the representative algorithms of each family versus classical ones.
  6. Discussion: Lessons Learned and New Challenges.
  7. Conclusions.

Taxonomy of Genetics-Based Machine Learning Algorithms for Classification

In this section, we first propose a taxonomy for the different GBML approaches for rule induction, classifying them into different families. Then, we introduce some basic features of the GBML algorithms that enable us to highlight the similarities and differences among the existing methods in this context. Finally, we describe each one of the GBML families, presenting the representative algorithms that we have selected within each category.

Taxonomy

Many proposals have been developed under the label of GBML for rule induction. All these approaches share an underlying commonality: the codification of the rule set or a single rule in a chromosome and the use of an evolutionary algorithm as the search mechanism. However, there are many differences among the algorithms, such as the type of learning scheme (supervised vs. reinforcement learning, or online vs. offline learning), the rule set representation (i.e., a decision list or a set of overlapping rules), the representation of each rule (i.e., the management of numerical and nominal values), or the design of the EA that guides the search.
With the aim of categorising the GBML methods for rule induction presented in the literature, we distinguish among three different families based on the chromosome codification:

  • Chromosome = Rule. In this type of algorithm, each rule is encoded in a single chromosome and the whole population evolves towards the final rule set. We consider three subcategories of methods:
    • The Michigan approach (J. H. Holland, Escaping brittleness: The possibilities of general purpose learning algorithms applied to parallel rule-based systems. Los Altos, CA: Morgan Kaufmann, 1986, vol. 2, pp. 593–623, L. B. Booker, D. E. Goldberg, and J. H. Holland, "Classifier systems and genetic algorithms," Artificial Intelligence, vol. 40, no. 1-3, pp. 235–282, 1989), in which the rule-set is incrementally updated through the sequential observation of training examples and their corresponding classification. Eventually, the rule-set is improved by the action of the EA.
    • The Iterative Rule Learning (IRL) approach (G. Venturini, "SIA: a supervised inductive algorithm with genetic search for learning attributes based concepts," in Machine Learning ECML-93, ser. LNAI, P. Brazdil, Ed., vol. 667. Springer, 1993, pp. 280–296), where the EA learns rule by rule iteratively and removes from the training set the examples covered (matched) by each new rule, in a divide-and-conquer style.
    • The GCCL approach (D. P. Greene and S. F. Smith, "Competition-based induction of decision models from examples," Machine Learning, vol. 13, no. 23, pp. 229–257, 1993), where all rules/chromosomes are evolved together in the EA. The chromosomes cooperate among themselves to perform the classification task, but the final rule set does not need to include all the rules. Hence, rules compete to be among the best fitted and therefore to remain in the rule base.
  • Chromosome = Rule Set. This category is known as the Pittsburgh approach (S. F. Smith, "A learning system based on genetic algorithms," Ph.D. dissertation, University of Pittsburgh, Pittsburgh, PA, 1980, S. F. Smith, "Flexible learning of problem solving heuristics through adaptive search," in 8th International Joint Conference on Artificial Intelligence, 1983, pp. 422–425). These methods are the result of directly extending GAs (J. Holland, Adaptation in Natural and Artificial Systems. The University of Michigan Press, 1975) to supervised learning problems. The system maintains a population of candidate rule sets whose quality is evaluated with a fitness function that considers different aspects such as the prediction accuracy and the generalization of the rule sets. Once the classifier is trained, the best individual found during the evolutionary process is used to predict the class of unknown examples.
  • Hybrid Evolutionary Decision Trees (HEDTs). This type of GBML contains mechanisms that combine decision trees with genetic algorithms (GAs) (E. Cantú-Paz and C. Kamath, "Inducing oblique decision trees with evolutionary algorithms," IEEE Transactions on Evolutionary Computation, vol. 7, no. 1, pp. 54–68, 2003, D. Carvalho and A. Freitas, "A hybrid decision tree/genetic algorithm method for data mining," Information Sciences, vol. 163, no. 1-3, pp. 13–35, 2004). The underlying idea is to use the search capability of the GA to find a highly accurate final tree. The tree can then be interpreted as a rule set, formed by the disjunction of the rules that are obtained in each branch of the tree. Thus the chromosome can encode the full tree or just a tree node/rule. We have defined this family as Hybrid Evolutionary Decision Trees (HEDTs).

In this work, we have considered the algorithms that have been published in the most influential journals on the topic and only classical methods proposed in conference contributions. Genetic Fuzzy Systems (F. Herrera, "Genetic fuzzy systems: Taxonomy, current research trends and prospects," Evolutionary Intelligence, vol. 1, pp. 27–46, 2008), methods based on Genetic Programming (C. Zhou, W. Xiao, T. M. Tirpak, and P. C. Nelson, "Evolving accurate and compact classification rules with gene expression programming," IEEE Transactions on Evolutionary Computation, vol. 7, no. 6, pp. 519–531, 2003) and distributed approaches (A. Giordana and F. Neri, "Search-intensive concept induction," Evolutionary Computation, vol. 3, no. 4, pp. 375–419, 1995, C. Anglano and M. Botta, "NOW G-net: learning classification programs on networks of workstations," IEEE Transactions on Evolutionary Computation, vol. 6, no. 13, pp. 463–480, 2002) are out of the scope of this study.
Table 1 summarises the list of selected algorithms of each type. All algorithms were implemented by the authors of this paper, except for GASSIST and HIDER whose implementations were directly supplied by their corresponding authors. Furthermore, these methods are included in the KEEL software (J. Alcalá-Fdez, L. Sánchez, S. García, M. del Jesus, S. Ventura, J. Garrell, J. Otero, C. Romero, J. Bacardit, V. Rivas, J. Fernández, and F. Herrera, "KEEL: A software tool to assess evolutionary algorithms to data mining problems," Soft Computing, vol. 13, no. 3, pp. 307–318, 2009).

Table 1. Summary of the algorithms employed in this work

Algorithm name Acronym Family Reference
XCS XCS Michigan S. W. Wilson, "Classifier fitness based on accuracy," Evolutionary Computation, vol. 3, no. 2, pp. 149–175, 1995
UCS UCS Michigan E. Bernadó-Mansilla and J. M. Garrell, "Accuracy-based learning classifier systems: models, analysis and applications to classification tasks," Evolutionary Computation, vol. 11, no. 3, pp. 209–238, 2003
Supervised Inductive Algorithm SIA IRL G. Venturini, "SIA: a supervised inductive algorithm with genetic search for learning attributes based concepts," in Machine Learning ECML-93, ser. LNAI, P. Brazdil, Ed., vol. 667. Springer, 1993, pp. 280–296
HIerarchical DEcision Rules HIDER IRL J. Aguilar-Ruiz, R. Giráldez, and J. Riquelme, "Natural encoding for evolutionary supervised learning," IEEE Transactions on Evolutionary Computation, vol. 11, no. 4, pp. 466–479, 2007
CO-Evolutionary Rule Extractor CORE GCCL K. C. Tan, Q. Yu, and J. H. Ang, "A coevolutionary algorithm for rules discovery in data mining," International Journal of Systems Science, vol. 37, no. 12, pp. 835–864, 2006
Organizational Co-Evolutionary algorithm for Classification OCEC GCCL L. Jiao, J. Liu, and W. Zhong, "An organizational coevolutionary algorithm for classification," IEEE Transactions on Evolutionary Computation, vol. 10, no. 1, pp. 67–80, 2006.
COverage-based Genetic INduction COGIN GCCL D. P. Greene and S. F. Smith, "Competition-based induction of decision models from examples," Machine Learning, vol. 13, no. 23, pp. 229–257, 1993
Genetic-based Inductive Learning GIL Pittsburgh C. Janikow, "A knowledge-intensive genetic algorithm for supervised learning," Machine Learning, vol. 13, no. 23, pp. 189–228, 1993
Pittsburgh Genetic Interval Rule Learning Algorithm Pitts-GIRLA Pittsburgh A. L. Corcoran and S. Sen, "Using real-valued genetic algorithms to evolve rule sets for classification," in IEEE Conference on Evolutionary Computation Proceedings, 1994, pp. 120–124
Data Mining for Evolutionary Learning DMEL Pittsburgh W.-H. Au, K. C. C. Chan, and X. Yao, "A novel evolutionary data mining algorithm with applications to churn prediction," IEEE Transactions on Evolutionary Computation, vol. 7, no. 6, pp. 532–545, 2003
Genetic Algorithms based claSSIfier sySTem GASSIST Pittsburgh J. Bacardit, D. Goldberg, and M. Butz, "Improving the performance of a pittsburgh learning classifier system using a default rule," in Revised Selected Papers of the International Workshop on Learning Classifier Systems 2003-2005, ser. LNCS, T. Kovacs, X. Llor, K. Takadama, P. L. Lanzi, W. Stolzmann, and S. W. Wilson, Eds., vol. 4399. Springer, 2007, pp. 291–307
Ordered Incremental Genetic Algorithm OIGA Pittsburgh F. Zhu and S. U. Guan, "Ordered incremental training with genetic algorithms," International Journal of Intelligent Systems, vol. 19, no. 12, pp. 1239–1256, 2004
Incremental Learning with Genetic Algorithms ILGA Pittsburgh S. U. Guan and F. Zhu, "An incremental approach to genetic-algorithms based classification," IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 35, no. 2, pp. 227–239, 2005
Hybrid Decision Tree - Genetic Algorithm DT-GA HEDT D. Carvalho and A. Freitas, "A hybrid decision tree/genetic algorithm method for data mining,” Information Sciences, vol. 163, no. 1-3, pp. 13–35, 2004
Oblique Decision Tree Oblique-DT HEDT E. Cantú-Paz and C. Kamath, "Inducing oblique decision trees with evolutionary algorithms," IEEE Transactions on Evolutionary Computation, vol. 7, no. 1, pp. 54–68, 2003
Tree Analysis with Randomly Generated and Evolved Trees TARGET HEDT J. B. Gray and G. Fan, "Classification tree analysis using TARGET," Computational Statistics & Data Analysis, vol. 52, no. 3, pp. 1362–1372, 2008

Preliminaries: Genetics-Based Machine Learning Features

With the aim of giving a general background of the selected algorithms, in this section we describe the main components which are relevant to GBML algorithms and which, at the same time, distinguish them from non-evolutionary learning algorithms. This also gives the paper content an educational character. We consider the following characteristics:

A. Chromosome representation:
As mentioned before, the underlying commonality of GBML is the use of an EA as the search mechanism. In this respect, the first consideration is the representation of the solution in a population of chromosomes. The choice of whether the chromosome codifies a single rule or a rule set largely determines the operation of the algorithm, and that is the motivation for basing our taxonomy on the chromosome representation. In any case, the rule set as a whole may operate as a set of non-ordered rules, either overlapping or non-overlapping, or as a decision list. Also, the inference type (the classification process itself) depends heavily on the type of rule used in the algorithm. The details are as follows:

  • Non-ordered overlapping rule sets, also called "IF-THEN" rules. Since the rules can be overlapping, the whole rule set does not necessarily cover the full space of inputs. At classification time, if the example matches more than one rule, a voting method is usually applied, where the output is the class advocated by the majority (see the inference sketch after this list). Other approaches include a weighted voting, or a measure of the distance to the nearest rule (S. Salzberg, "A nearest hyperrectangle method," Machine Learning, vol. 6, pp. 251–276, 1991). In the case of a tie, some approaches leave the example as non-classified, while others assign the class randomly.
  • Non-ordered, non-overlapped rule sets. This is the case of rule sets obtained from decision trees or axis-parallel hyperplanes, which split the search space into a set of non-overlapping regions. In this sense, the rule set covers all the input space. The classification process with these rules is trivial, since only one rule can match the example.
  • Ordered rule sets, also called "IF-THEN-ELSE" rules or decision lists (R. L. Rivest, "Learning decision lists," Machine Learning, vol. 2, pp. 229–246, 1987). The classification is simply made using the first rule in the list that matches the example. It is very common to use a "default rule" that explains the input space not covered by any of the rules in the set.
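
To make these inference schemes concrete, the following minimal sketch (hypothetical data structures, not the encoding of any surveyed system) contrasts majority voting over a non-ordered rule set with the first-match inference of a decision list.

```python
from collections import Counter

# A rule is modelled here as a (condition, class) pair, where condition(x)
# returns True if the rule covers example x.

def classify_voting(rules, x, default=None):
    """Unordered 'IF-THEN' rule set: every matching rule votes for its class."""
    votes = Counter(cls for cond, cls in rules if cond(x))
    return votes.most_common(1)[0][0] if votes else default

def classify_decision_list(rules, x, default=None):
    """Ordered 'IF-THEN-ELSE' rules: the first matching rule decides."""
    for cond, cls in rules:
        if cond(x):
            return cls
    return default  # the 'default rule' explains the uncovered input space

# Two overlapping rules over a single numerical attribute.
rules = [(lambda x: x[0] <= 5.0, "A"), (lambda x: x[0] >= 3.0, "B")]
print(classify_voting(rules, [4.0]))         # tie: the first-seen class ("A") wins here
print(classify_decision_list(rules, [4.0]))  # "A": first matching rule in the list
```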

Each rule has the form condition ⇒ class, where the condition specifies the set of input states that the rule matches, i.e., the region of the search space covered by the rule, and the class is the classification that the rule advocates for that region. The condition part is usually represented as a conjunction of tests over the attributes. There are two main schemes:

  • "Attribute <operator> value". A test is a condition over an attribute value, which is specified differently depending on whether the attribute is nominal or numerical:
    • Nominal attributes: when the attribute can take a value from a list of possible categories, the test over the attribute consists of checking for equality to a given category or membership of a subset of categories. That is, "<operator>" may be any of {=, ≠, ∈, ∉}.
    • Numerical attributes: if the attribute can take numerical values, the test checks whether the value satisfies a comparison or belongs to a given interval. That is, "<operator>" may be any of {<, ≤, =, ≥, >}. Interval-rules take the form "Attribute ∈ [min, max]", which is equivalent to "Attribute ≥ min AND Attribute ≤ max".

Disjunction of attributes within a condition can also be codified, although this is rarely implemented in GBML algorithms. The disjunction of attribute values can be alternatively codified with the disjunction of several rules. This may enlarge the rule set, but simplifies the complexity of the rule. We must also remark that some algorithms treat the nominal attributes as ordered numerical values and consequently use the representations designed for numerical attributes.

  • Oblique Rules: The tests over the attributes are codified by a hyperplane of the form ∑_{i=1}^{d} a_i·x_i + a_{d+1} > 0, where the a_i are real-valued coefficients (D. G. Heath, S. Kasif, and S. Salzberg, "Induction of oblique decision trees," in 13th International Joint Conference on Artificial Intelligence (IJCAI93), 1993, pp. 1002–1007 and E. G. Henrichon and K. Fu, "A nonparametric partitioning procedure for pattern classification," IEEE Transactions on Computers, vol. C-18, pp. 614–624, 1969) and x_i is the value of the i-th attribute. This is an extension of the axis-parallel hyperplanes in the attribute space used by decision trees.

It is common that not all the attributes are relevant to the classification. Some algorithms explicitly allow the codification of irrelevant attributes with the so-called "don't care" symbol. This favours the generalization of the rules. Some approaches enable the chromosome to represent variable length conditions. For algorithms using a fixed chromosome length, the representation of the "don't care" condition depends on the selected rule scheme defined above:

  • For the "Attribute <operator> value" rules, we can represent the "don't care" using an specific special value with the $=$ operator, including all possible values for ∈ (none for ∉), using the maximum range or minimum range for the "<,≤,=,≥>" operators respectively, or applying an inversion of operators for the interval-rules, that is, having the "min" value higher than the "max" one.
  • For the oblique rules, it suffices to assign a weight of 0 to those variables that are not included in the antecedent.
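
The sketch below (illustrative encodings only, not those of any particular algorithm) shows how an interval rule with a reversed-interval "don't care" and an oblique rule can be matched against an example.

```python
def matches_interval_rule(bounds, x):
    """bounds: one (min, max) pair per numerical attribute; a reversed
    interval (min > max) is interpreted as a 'don't care' condition."""
    for (lo, hi), value in zip(bounds, x):
        if lo > hi:                      # reversed interval -> attribute ignored
            continue
        if not (lo <= value <= hi):
            return False
    return True

def matches_oblique_rule(coeffs, x):
    """Oblique test: sum_i a_i * x_i + a_{d+1} > 0; a coefficient of 0
    removes the corresponding attribute from the antecedent."""
    return sum(a * v for a, v in zip(coeffs, x)) + coeffs[-1] > 0

x = [1.5, 7.0]
print(matches_interval_rule([(0.0, 2.0), (5.0, 1.0)], x))  # True: second condition is 'don't care'
print(matches_oblique_rule([2.0, -1.0, 3.0], x))           # 2*1.5 - 1*7.0 + 3.0 = -1.0 -> False
```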

B. Learning scheme:

This feature refers to the procedure in which the algorithm learns the rule set from a set of pre-classified examples.
Depending on how examples are provided, learning can be performed in an incremental mode (online method) or in a batch mode (offline method). The former scheme implies that the knowledge can be updated as new examples arrive at the system; Michigan approaches usually follow this scheme. The latter model does not easily allow updates of the rule set once the algorithm is trained: if new information is available, the whole rule base should be retrained, often from scratch. Training starts with the whole training data set available. The internal operation of the algorithm may decide whether to use the whole data set in each iteration to build the model (as in the Pittsburgh approach), or to incrementally reduce it by deleting the examples already covered, as in those approaches following a "divide-and-conquer" strategy.

Learning can also happen in a supervised, unsupervised or reinforcement learning environment. In a supervised environment, the learning algorithm knows the class of the example. In reinforcement learning, the algorithm gets a reward after predicting the class of the example, and it should build its model based on the positive and negative rewards (or the absence of them). Unsupervised learning corresponds to the case where no external feedback is given to the learner; this case falls outside the scope of the paper.

C. Fitness Function:
The assessment of the quality of the rule set is essential to evolve an appropriate model from examples. Since the search of the EA is guided by the fitness function, it must include the quality measures that reflect the type of rule set required. The optimal rule set is usually that with high accuracy and low complexity. Accuracy is usually measured as the percentage of correct classifications, although more sophisticated metrics may arise, such as those considering the percentage of classifications per class. Complexity is measured by the number of rules or the generalization of the individual rules. We must remark that these quality measures are applied both at the rule set level and at the level of individual rules, depending on what the EA evolves.

Fitness may also include niching or sharing techniques, or token competition as in M. L. Wong and K. S. Leung, Data Mining Using Grammar Based Genetic Programming and Applications. Kluwer Academic Publishers, London, 2000, which make it possible to adjust the global fitness and optimise the search towards different areas in the problem space by penalising those solutions/rules which are similar or which cover the same training examples.
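
A hedged sketch of such a composite fitness, combining training accuracy with a penalty on the number of rules; the weights and the choice of complexity measure are illustrative assumptions, not the setting of any specific algorithm.

```python
def rule_set_fitness(predict, rule_set, X, y, w_acc=1.0, w_comp=0.01):
    """Composite fitness: accuracy reward minus a complexity penalty.

    `predict(rule_set, x)` stands for whatever inference procedure the
    algorithm uses; `len(rule_set)` is one possible complexity measure
    (rule length or generality are common alternatives).
    """
    correct = sum(1 for x, target in zip(X, y) if predict(rule_set, x) == target)
    accuracy = correct / len(X)
    complexity = len(rule_set)
    return w_acc * accuracy - w_comp * complexity
```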

D. Design of the EA:
As mentioned before, all GBML algorithms have as a common characteristic the use of a search process based on an EA to learn the final rule set. They usually apply a GA which can be generational or steady-state. In the former case, the population is replaced completely iteration by iteration, possibly with an elitism scheme in which the best individual(s) remain(s) in the new population. The steady-state model only modifies a subset of individuals (normally only two) during each iteration.
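
The two replacement schemes can be contrasted with the following schematic loops; `breed` is a placeholder for selection plus crossover and mutation, so this is only a sketch of the population update, not of any concrete GBML algorithm.

```python
def generational_step(population, fitness, breed, elitism=1):
    """Generational GA: the whole population is replaced each iteration,
    optionally keeping the best individual(s) through elitism."""
    ranked = sorted(population, key=fitness, reverse=True)
    elites = ranked[:elitism]
    offspring = [breed(population) for _ in range(len(population) - elitism)]
    return elites + offspring

def steady_state_step(population, fitness, breed, n_children=2):
    """Steady-state GA: only a few individuals (normally two) are created,
    and they replace the worst members of the current population."""
    children = [breed(population) for _ in range(n_children)]
    survivors = sorted(population, key=fitness, reverse=True)[:-n_children]
    return survivors + children
```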

E. Specific Operators:
Each new proposal in the context of GBML algorithms not only implies the choice of representation, learning methodology, and design of the EA scheme, but also the use of specific operators for the knowledge discovery. These usually represent the most significant novelty of the approach and accentuate the differences among the methods.

These specific operators include the heuristics to initialise the population, the use of covering algorithms, and the crossover and mutation operators. Since they are particular to each algorithm, we will address them in the description of each method when needed.

Taxonomy Families and Algorithms' Description

The description of the methods used in this study is summarised in a table for each one of the proposed families, in which we detail the main characteristics with regard to the GBML features stated previously. Therefore, the reader can observe the similarities and differences among the algorithms of the same category or between any two methods. Afterwards, we present a brief description of the intrinsic operators associated with each one of the algorithms to provide the specific details for each of them.

A. Michigan Approach
Michigan-style algorithms (J. H. Holland, Escaping brittleness: The possibilities of general purpose learning algorithms applied to parallel rule-based systems. Los Altos, CA: Morgan Kaufmann, 1986, vol. 2, pp. 593–623, L. B. Booker, D. E. Goldberg, and J. H. Holland, "Classifier systems and genetic algorithms," Artificial Intelligence, vol. 40, no. 1-3, pp. 235–282, 1989) evolve a set of rules called "classifiers" which are incrementally updated through the sequential observation of training examples. The best "classifiers", which are those that correctly cover a higher proportion of training examples, are selected and propagated over other less useful "classifiers", leading to an improvement in the overall system performance.

Some of the first developments of Michigan-style GBML are SCS (D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Publishing Company, 1989) and NewBoole (P. Bonelli and A. Parodi, "An efficient classifier system and its experimental comparison with two representative learning methods on three medical domains." in 4th international conference on genetic algorithms (ICGA91), R. K. Belew and L. B. Booker, Eds. Morgan Kaufmann, 1991, pp. 288–295). In this paper we select XCS, as it is the most influential algorithm in this family of classifiers, and UCS, a system that specialises XCS for supervised learning. The features of both algorithms are summarised in Table 2.

Table 2. Features of the Michigan Algorithms

Features | XCS | UCS
Learning Method | Online; Reinforcement Learning | Online; Supervised Learning
Type of Rule-Set | IF-THEN | IF-THEN
Inference Type | Weighted Voting | Weighted Voting
Type of Rule | Conjunction of Atts. | Conjunction of Atts.
Nominal Representation | None: Transformation to Ordered Numerical Values | None: Transformation to Ordered Numerical Values
Numerical Representation | Interval-Rule | Interval-Rule
"Don't Care" Management | Complete Interval | Complete Interval
Fitness | Inverse of Prediction Error; Sharing | Accuracy; Sharing
EA Type | Steady-State GA | Steady-State GA
EA Features | Roulette Wheel Selection; 2-Point Crossover; Random Mutation | Roulette Wheel Selection; 2-Point Crossover; Random Mutation
Chromosome Representation | Real Coding normalised to [0,1]; Fixed Length | Real Coding normalised to [0,1]; Fixed Length
  • XCS

XCS (S. W. Wilson, "Classifier fitness based on accuracy," Evolutionary Computation, vol. 3, no. 2, pp. 149–175, 1995, C. Stone and L. Bull, "For real! XCS with continuous-valued inputs," Evolutionary Computation, vol. 11, no. 3, pp. 299–336, 2003) was designed for reinforcement learning, although it can be used for pattern recognition by considering that a classification problem is a reinforcement problem in which maximum rewards are given to correct classifications and low rewards correspond to incorrect classifications.

Learning takes place incrementally. In each iteration, an example is provided to the system. Then, XCS finds the matching rules and randomly chooses one of the possible classes. If no rules match, then covering is applied. It creates a new rule with a condition, which is generalized from the attribute values of the example, and a random class. Once the class is decided, a reward is received. This reward is used to update the quality of the rules that advocated that particular class. Eventually, the GA is applied to the rules that proposed the chosen class as described in (S. W. Wilson, "Generalization in the XCS Classifier System," in Genetic Programming: Proceedings of the Third Annual Conference, J. Koza, W. Banzhaf, K. Chellapilla, K. Deb, M. Dorigo, D. Fogel, M. Garzon, D. Goldberg, H. Iba, and R. Riolo, Eds. Morgan Kaufmann, 1998, pp. 665–674). The GA selects two parents (two rules) based on their fitness, which is a measure of the accuracy of the reward prediction relative to the accuracies of the overlapping rules. The rules are crossed and mutated and the final offspring are introduced into the population, deleting other rules if the population is full. The rules follow the interval rule representation, which was firstly introduced in XCS in ( S. W. Wilson, "Get Real! XCS with Continuous-Valued Inputs," in Festschrift in Honor of John H. Holland, L. Booker, S. Forrest, M. Mitchell, and R. Riolo, Eds. Center for the Study of Complex Systems, University of Michigan, 1999, pp. 111–121).

  • UCS

UCS (E. Bernadó-Mansilla and J. M. Garrell, "Accuracy-based learning classifier systems: models, analysis and applications to classification tasks," Evolutionary Computation, vol. 11, no. 3, pp. 209–238, 2003) is a method derived from XCS. It inherits the main features of XCS, but mainly differs in two respects. Firstly, the learning interaction is adjusted to a supervised learning scheme. This means that each example shown to the system comes along with the class. Then, UCS uses the class information to build the set of correct rules. The rules that belong to this set get their fitness improved, while rules that matched the example but not the class get their fitness reduced. The second main difference with XCS is on fitness. In UCS, fitness is directly the proportion of correct predictions of the rule with respect to the total number of matches, whereas fitness in XCS is a windowed average of the accuracy of the reward prediction.
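
As a rough illustration of this difference, the sketch below keeps UCS-style accuracy bookkeeping for a single rule; it omits the exponent, fitness sharing and all the other parameters of the real XCS/UCS updates, so it should be read as a simplification rather than the actual algorithms.

```python
class SimpleUCSRule:
    """Minimal UCS-style bookkeeping for one rule (a deliberate simplification)."""

    def __init__(self, condition, cls):
        self.condition = condition   # condition(x) -> True if the rule matches x
        self.cls = cls
        self.matches = 0             # experience: how often the rule matched
        self.correct = 0             # matches whose example also had the rule's class

    def update(self, x, y):
        """Supervised update: the true class y is available for every example."""
        if self.condition(x):
            self.matches += 1
            if y == self.cls:
                self.correct += 1

    @property
    def accuracy(self):
        # In UCS, fitness derives directly from this ratio; in XCS it is a
        # windowed average of the accuracy of the reward prediction instead.
        return self.correct / self.matches if self.matches else 0.0
```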

B. Iterative Rule Learning Approach
IRL GBML systems use a divide-and-conquer methodology to create an ordered list of rules (G. Venturini, "SIA: a supervised inductive algorithm with genetic search for learning attributes based concepts," in Machine Learning ECML-93, ser. LNAI, P. Brazdil, Ed., vol. 667. Springer, 1993, pp. 280–296). The system iteratively invokes an EA where the best individual returned is added to the end of a list of rules and all the matching examples are removed from the training data set. This process is repeated until the training data set is empty.
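
The skeleton of this loop can be sketched as follows; `evolve_best_rule` and `covers` are placeholders for the algorithm-specific EA and matching procedure.

```python
def iterative_rule_learning(X, y, evolve_best_rule, covers):
    """IRL skeleton: learn one rule at a time and drop the covered examples."""
    rules = []
    remaining = list(zip(X, y))
    while remaining:
        rule = evolve_best_rule(remaining)               # EA run on the remaining data
        still_uncovered = [(x, t) for x, t in remaining if not covers(rule, x)]
        if len(still_uncovered) == len(remaining):       # guard: the rule covers nothing
            break
        rules.append(rule)                               # appended at the end of the list
        remaining = still_uncovered
    return rules                                         # an ordered list of rules
```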

In this case, we selected two algorithms. First, we considered a classical approach, the SIA algorithm and then, a recent representative method of this type: HIDER. Their characteristics are shown in Table 3.

Table 3. Features of the IRL Algorithms

Features | SIA | HIDER
Learning Method | Batch; Divide-and-conquer | Batch; Divide-and-conquer
Type of Rule-Set | IF-THEN | IF-THEN-ELSE
Inference Type | Distance to the Nearest Rule | Decision List
Type of Rule | Conjunction of Atts. | Conjunction of Atts.
Nominal Representation | Operator "=" | Operator "="
Numerical Representation | Interval-Rule | Natural Coding Discretization
"Don't Care" Management | Special Value | Complete Interval
Fitness | Accuracy and Complexity | Accuracy and Coverage
EA Type | Steady-State GA | Generational GA
EA Features | Random Selection; Uniform Crossover; Generalisation Operator (see description) | Roulette-Wheel Selection; Specific Crossover and Mutation; Elitism
Chromosome Representation | Real and Binary Coding; Fixed Length | Natural Coding (see description); Fixed Length

  •  SIA

Supervised Inductive Algorithm (SIA) (G. Venturini, "SIA: a supervised inductive algorithm with genetic search for learning attributes based concepts," in Machine Learning ECML-93, ser. LNAI, P. Brazdil, Ed., vol. 667. Springer, 1993, pp. 280–296) is a classical IRL mechanism. The main procedure of SIA is detailed as follows: first, a training sample which has not been classified yet (defined as "uncovered") is selected, and the most specific rule that matches that example is built. Then the condition part of the rule is generalised using a GA with a fitness function based on a linear combination of the number of classifications and misclassifications and the generalisation of the rule.

The generalisation operator works by enlarging the bounds of the conditions in the case of a numerical attribute and by converting the condition into a "don't care" for nominal attributes. The GA runs iteratively until it reaches the stop condition (a number of iterations without generating a better rule). The best rule in the final population is then added to the rule set and the examples covered by it are removed from the training set. The process is repeated until no more examples remain uncovered.

  • HIDER

HIerarchical DEcision Rules (HIDER) (J. S. Aguilar-Ruiz, J. C. Riquelme, and M. Toro, "Evolutionary learning of hierarchical decision rules," IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, vol. 33, no. 2, pp. 324–331, 2003 and J. Aguilar-Ruiz, R. Giráldez, and J. Riquelme, "Natural encoding for evolutionary supervised learning," IEEE Transactions on Evolutionary Computation, vol. 11, no. 4, pp. 466–479, 2007) uses natural coding (defined by the authors in J. Aguilar-Ruiz, R. Giráldez, and J. Riquelme, "Natural encoding for evolutionary supervised learning," IEEE Transactions on Evolutionary Computation, vol. 11, no. 4, pp. 466–479, 2007) to represent each rule. That is, each rule is encoded as IF x1 = L1 ∧ … ∧ xn = Ln THEN ck. For numerical attributes, each Li is a label obtained by means of the natural coding representation, which is a tabular representation of the computed cut-points of a specific discretization method designed for this algorithm (J. S. Aguilar-Ruiz, J. Bacardit, and F. Divina, "Experimental evaluation of discretization schemes for rule induction," in Genetic and Evolutionary Computation GECCO-2004, ser. LNCS, vol. 3102. Springer-Verlag, 2004, pp. 828-839). Therefore it can be directly translated into an interval-rule by taking the lower and upper cut-points.

The EA initialises the population by randomly selecting some examples and creating rules that cover them. Then, the evolutionary search is run for a specific number of generations, guided by a fitness function that considers both the accuracy and the generalization of the rules. The best rule obtained is added to the final rule set and the examples covered by it are removed from the training data set, similarly to SIA. The process is repeated until the number of remaining examples falls below a threshold determined by the "examples pruning factor".

C. Genetic Cooperative Competitive Learning
In this approach, each individual codifies a rule, and the whole rule set is evolved simultaneously. Thus, rules should cooperate among themselves to jointly form an optimal rule set and, at the same time, compete against each other to survive in the population. Population diversity must be enforced to avoid individuals converging to the same area of the search space.

The algorithms selected for this family are CORE, OCEC and COGIN. Next, we describe each algorithm, whose main features are shown in Table 4.

Table 4. Features of the GCCL Approaches

Features | CORE | OCEC | COGIN
Learning Method | Batch | Batch | Batch
Type of Rule-Set | IF-THEN-ELSE | IF-THEN | IF-THEN-ELSE
Inference Type | Decision List | Weighted Voting | Decision List
Type of Rule | Conjunction of Attributes | Conjunction of Attributes | Conjunction of Attributes
Nominal Representation | Operators "=" and "≠" | Operator "=" | Operator "∈"
Numerical Representation | All Comparisons: "<, ≤, =, ≥, >" and Interval-Rule | None: Discretization Required | None: Discretization Required
"Don't Care" Management | Absence of Conditions | Absence of Conditions | Absence of Conditions; Special Symbol
Fitness | Accuracy with Token Competition | Accuracy and Complexity | Accuracy and Complexity with Token Competition
EA Type | Generational GA | Generational GA | Generational GA
EA Features | Tournament Selection; 1-Point and Linear Crossover; Simple Random Mutation; No Elitism | Random Selection; Special Crossover (see description); No Mutation; Elitism | Random Selection; 1-Point Crossover; No Mutation; Elitism based on Token Competition
Chromosome Representation | Real and Binary Coding; Variable Length | Binary Coding; Variable Length | Binary Coding; Variable Length

  • CORE

CO-Evolutionary Rule Extractor (CORE) (K. C. Tan, Q. Yu, and J. H. Ang, "A coevolutionary algorithm for rules discovery in data mining," International Journal of Systems Science, vol. 37, no. 12, pp. 835–864, 2006) evolves a set of rules, which are initialised randomly, using as fitness a combination of the true positive rate and the false positive rate, together with a token competition that reduces the size of the rule-set. It uses a specific regeneration operator that re-initialises those chromosomes that have a fitness below the average. For nominal attributes it uses the one-point crossover, whereas for the numerical attributes it applies a linear combination of the parents.

  • OCEC

Organizational Co-Evolutionary algorithm for Classification (OCEC) (L. Jiao, J. Liu, and W. Zhong, "An organizational coevolutionary algorithm for classification," IEEE Transactions on Evolutionary Computation, vol. 10, no. 1, pp. 67–80, 2006) creates groups of similar examples with the same class label and then builds a rule for each group discovered. Thus the chromosome has a variable length and stores the positions of its examples in the training set. At classification time, the most general rule that covers the example defines its class.

Three specific evolutionary operators are used in this algorithm, the "migrating operator" that moves a predefined number of examples from one chromosome to another, the "exchanging operator" that exchanges a predefined number of examples between two chromosomes, and the "merging operator" that joins two chromosomes. It also uses an elitist scheme in which the best pair between the parents and the offspring remains in the population. The fitness function depends on the number of conditions and the confidence of the rule.

  • COGIN

COverage-based Genetic INduction (COGIN) (D. P. Greene and S. F. Smith, "Competition-based induction of decision models from examples," Machine Learning, vol. 13, no. 23, pp. 229–257, 1993) manages a variable number of rules, which are initialised randomly until they cover all the examples of the training set, making half of the conditions "don't care". It generates a new offspring population by randomly selecting two parents and applying a one-point crossover. Then it applies a token competition to remove those rules that do not match any example beyond those already covered by better rules. The fitness is based on a linear combination of the entropy and the generality of the rule (number of active bits).
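
A minimal sketch of the token competition mechanism shared by CORE and COGIN (the covering test and the strength ordering are placeholders, and the exact removal policy differs between the actual algorithms):

```python
def token_competition(rules, X, covers, strength):
    """Each training example is a token seized by the strongest covering rule;
    rules that seize no tokens are discarded (a simplified view of the
    CORE/COGIN mechanism)."""
    ordered = sorted(rules, key=strength, reverse=True)
    seized = {id(rule): 0 for rule in ordered}
    for x in X:
        for rule in ordered:             # strongest rules get the first chance
            if covers(rule, x):
                seized[id(rule)] += 1
                break
    return [rule for rule in ordered if seized[id(rule)] > 0]
```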

D. Pittsburgh Approaches
In the Pittsburgh-style GBML systems, the GA encodes the rule set directly into the chromosome. In this manner, each individual represents the whole solution and the different rule sets are evolved until convergence.

There are a large number of Pittsburgh style approaches in the literature. In this paper we consider the following methods: GIL, Pitts-GIRLA, DMEL, GASSIST, OIGA and ILGA. As with the previous families introduced in this section, we summarise the main characteristics of these algorithms in Table 5.

Table 5. Features of the Pittsburgh Approaches

Features | GIL | Pitts-GIRLA | DMEL | GASSIST | OIGA | ILGA
Learning Method | Batch | Batch | Batch | Batch | Batch | Batch
Type of Rule-Set | IF-THEN-ELSE | IF-THEN | IF-THEN | IF-THEN-ELSE | IF-THEN | IF-THEN
Inference Type | Decision List | Voting | Weighted Voting | Decision List | Voting | Voting
Type of Rule | Conjunction of Atts. | Conjunction of Atts. | Conjunction of Atts. | Conjunction and Disjunction of Atts. | Conjunction of Atts. | Conjunction of Atts.
Nominal Representation | Operator "∈" | None: Transformation to Ordered Numerical Values | Operator "=" | Operator "∈" | None: Transformation to Ordered Numerical Values | None: Transformation to Ordered Numerical Values
Numerical Representation | None: Discretization Required | Interval-Rule | None: Discretization Required | Interval-Rule | Interval-Rule | Interval-Rule
"Don't Care" Management | Absence of Conditions; All Values for a Condition | Reverse Interval | None | Absence of Conditions | Reverse Interval | Reverse Interval
Fitness | Accuracy and Complexity | Accuracy | Accuracy | Accuracy and Complexity | Accuracy | Accuracy
EA Type | Generational GA | Generational GA | Steady-State GA | Generational GA | Steady-State GA | Steady-State GA
EA Features | Special Operators (see description); No Elitism | Random Selection; 1-Point Crossover; Random and "Creep" Mutation; Elitism | Roulette-Wheel Selection; 1- and 2-Point Crossover; Mutation based on Local Search | Tournament Selection; Special Crossover and Mutation (see description); Elitism | Roulette-Wheel Selection; 1-Point Crossover; Random Mutation | Roulette-Wheel Selection; 1-Point Crossover; Random Mutation
Chromosome Representation | Binary Coding; Variable Length | Real Coding; Fixed Length | Binary Coding; Variable Length | Binary Coding and ADI Representation; Variable Length | Real Coding; Fixed Length | Real Coding; Fixed Length

  • GIL

The Genetic-based Inductive Learning (GIL) algorithm (C. Janikow, "A knowledge-intensive genetic algorithm for supervised learning," Machine Learning, vol. 13, no. 23, pp. 189–228, 1993) learns rules for each class sequentially, starting with the minority class and leaving the majority class as a default rule. Thus, it runs the GIL procedure n-1 times, where n is the number of classes. For a given run i, class i is the positive class and all the other classes represent the negative class. In this way, a multiple class problem is converted into n-1 different concept learning problems.

The initialisation of the chromosomes is carried out randomly for half of the population and using some of the training examples for the other half. The fitness of each individual is based on four criteria: the accuracy rate, completeness (covering all positive examples), correctness (minimisation of the number of negative examples covered) and complexity of the rule set (computed from the number of rules and conditions). GIL implements 14 genetic operators working at three different levels. Six of them work at the rule set level and consist of adding and exchanging rules in the rule set, and generalising or specialising the rule set. Five operators work at the rule level by adding, removing or exchanging conditions. The final three operators work at the condition level by adding or removing values. All these operators have an associated probability which is auto-adapted according to the needs of the learning process, that is, to favour the generalisation or specialisation of the rule set.

  • Pitts-GIRLA

The Pittsburgh Genetic Interval Rule Learning Algorithm (Pitts-GIRLA) (A. L. Corcoran and S. Sen, "Using real-valued genetic algorithms to evolve rule sets for classification," in IEEE Conference on Evolutionary Computation Proceedings, 1994, pp. 120–124) initialises all chromosomes at random, where a "don't care" condition occurs when the lower bound of the interval associated with a given variable is higher than the upper bound. Apart from the conventional 1-point crossover and random mutation, it uses a so-called "creep" mutation, which is applied to the bounds of each attribute condition and increments or decrements them by a fraction of the valid range of the attribute.
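
A sketch of such a creep mutation on one interval condition; perturbing each bound by a fraction of the attribute's valid range is the idea described above, but the concrete fraction and the clamping are illustrative assumptions.

```python
import random

def creep_mutate_interval(lower, upper, attr_min, attr_max, fraction=0.1):
    """Increment or decrement each bound by at most `fraction` of the valid range."""
    step = fraction * (attr_max - attr_min)
    lower = min(max(lower + random.uniform(-step, step), attr_min), attr_max)
    upper = min(max(upper + random.uniform(-step, step), attr_min), attr_max)
    return lower, upper  # note: lower > upper is allowed and encodes 'don't care'
```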

  • DMEL

Data Mining for Evolutionary Learning (DMEL) (W.-H. Au, K. C. C. Chan, and X. Yao, "A novel evolutionary data mining algorithm with applications to churn prediction," IEEE Transactions on Evolutionary Computation, vol. 7, no. 6, pp. 532–545, 2003) starts by generating an initial set of first-order rules (rules with one condition) using a probabilistic induction technique; based on these rules, rules of higher order (two or more conditions) are obtained iteratively. DMEL then begins an evolutionary learning process, generating an initial population of individuals by randomly combining the rules of order l-1 in the rule base to form a set of rules of order l. The consequent of the rule is not encoded in the chromosome but determined during the computation of the fitness value. Finally, the classification is made by the rule with the highest fitness that matches the example.

  • GASSIST

GASSIST (Genetic Algorithms based claSSIfier sySTem) (J. Bacardit and J. Garrell, "Evolving multiple discretizations with adaptive intervals for a pittsburgh rule-based learning classifier system," in Genetic and Evolutionary Computation Conference (GECCO 2003), ser. LNCS, vol. 2724. Springer-Verlag, 2003, pp. 1818–1831, J. Bacardit, D. Goldberg, and M. Butz, "Improving the performance of a pittsburgh learning classifier system using a default rule," in Revised Selected Papers of the International Workshop on Learning Classifier Systems 2003-2005, ser. LNCS, T. Kovacs, X. Llor, K. Takadama, P. L. Lanzi, W. Stolzmann, and S. W. Wilson, Eds., vol. 4399. Springer, 2007, pp. 291–307, J. Bacardit and J. M. Garrell, "Bloat control and generalization pressure using the minimum description length principle for a pittsburgh approach learning classifier system," in Revised Selected Papers of the International Workshop on Learning Classifier Systems 2003-2005, ser. LNCS, T. Kovacs, X. Llor, K. Takadama, P. L. Lanzi, W. Stolzmann, and S. W. Wilson, Eds., vol. 4399. Springer, 2007, pp. 59–79) inherits from the GABIL algorithm (K. A. DeJong, W. M. Spears, and D. F. Gordon, "Using genetic algorithms for concept learning," Machine Learning, vol. 13, pp. 161–188, 1993). The GASSIST algorithm initialises the population at random and uses as a fitness function the Minimum Description Length principle (J. Rissanen, "Modeling by shortest data description," Automatica, vol. 14, pp. 465–471, 1978), which takes into account both the accuracy and the complexity of the rule set. The representation of the chromosome differs for nominal and numerical attributes. For the former, it uses a string of bits, one per category allowed for that attribute, whereas for continuous variables it applies an Adaptive Discretization Intervals (ADI) rule representation (J. Bacardit and J. Garrell, "Evolving multiple discretizations with adaptive intervals for a pittsburgh rule-based learning classifier system," in Genetic and Evolutionary Computation Conference (GECCO 2003), ser. LNCS, vol. 2724. Springer-Verlag, 2003, pp. 1818–1831). This representation consists, in general terms, of a hierarchical uniform-width discretization of each attribute, allowing different cut-points for the same attribute in different rules. We should point out that this representation can be directly translated into an interval-rule by taking the lower and upper cut-points of each attribute.

It uses the multiple-point crossover operator inherited from GABIL that forces the selected cut points in both parents to be in the same position of the variable so that semantically correct offspring are obtained. The mutation operator randomly adds or removes one value of a given variable. GASSIST introduces a new deletion operator that removes rules from individuals to limit the rule set size. This operator is activated after a predefined number of iterations and removes the rules of an individual that do not match any input example.

  • OIGA

OIGA (Ordered Incremental Genetic Algorithm) (F. Zhu and S. U. Guan, "Ordered incremental training with genetic algorithms," International Journal of Intelligent Systems, vol. 19, no. 12, pp. 1239–1256, 2004) carries out the learning in two steps: first, it learns one-condition rules (generated randomly) for each attribute and optimises their values using a GA. Then, once all the attributes have been explored separately, OIGA joins the obtained one-condition rule sets, ordered by their fitness. Starting from the best one-condition rule set, this process adds the remaining one-condition rule sets to the current, growing rule sets one by one, randomly merging the rules with the same consequent.

This incremental process involves a new optimisation procedure at each merging step, in which the current, growing rule sets are refined by means of a GA (with the same features as the previous one). The process stops once all attributes have been added, obtaining a rule set that covers all the attributes. Both genetic processes use the accuracy rate as fitness.

  • ILGA

Incremental Learning with GAs (ILGA) (S. U. Guan and F. Zhu, "An incremental approach to genetic-algorithms based classification," IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 35, no. 2, pp. 227–239, 2005) is an evolution of the OIGA learning process. ILGA expands the possibilities of the incremental process by seeding the whole initial population with either the best one-condition rule set (as OIGA does) or the entire population of chromosomes in the current solution. It also introduces a bias in the mutation and crossover operators. This bias affects the "old" part of the chromosome, that is, the genes added in previous iterations but not in the current one. If the randomly chosen point for mutation or crossover is located in the "old" genes, the corresponding rates may be reduced by a reduction rate. The motivation behind this is that ILGA tends to preserve the structure of the old elements and to explore more of the combinations between old and new elements.

E. Hybrid Evolutionary Decision Trees
Apart from the classical GBML approaches described previously, in this paper we study some hybrid approaches that unify the use of decision trees with GAs, selecting the most important algorithms from the literature, DT-GA and Oblique-DT, and including the TARGET algorithm as a recent representative. The features of these algorithms are listed in Table 6.

Table 6. Features of the Hybrid Evolutionary Decision Trees (HEDTs)

Features | DT-GA | Oblique-DT | TARGET
Learning Method | Batch; Divide-and-conquer | Batch | Batch
Type of Rule-Set | Axis-Parallel Hyperplanes and IF-THEN | Oblique-Rules | Oblique-Rules and IF-THEN
Inference Type | Matching Rule | Matching Rule or Weighted Voting (see description) | Matching Rule
Type of Rule | Conjunction of Atts. | Conjunction of Linear Combination of Atts. | Conjunction and Linear Combination of Atts.
Nominal Representation | Operator "∈" | None: Transformation to Ordered Numerical Values | Operator "∈"
Numerical Representation | Operators "<" and "≥" | Linear Combination | Linear Combination (Up to Three Variables)
"Don't Care" Management | Special Symbol | Not implicit | Not implicit
Fitness | Accuracy | Accuracy | Accuracy and Complexity
EA Type | Generational GA | Generational GA | Generational GA
EA Features | Tournament Selection; 1-Point Crossover; Random Mutation; Elitism | Tournament Selection; Uniform Crossover; No Mutation; Elitism | Roulette-Wheel Selection; Special Crossover and Mutation (see description); Elitism and Reinitialisation
Chromosome Representation | Binary and Real Coding; Fixed Length | Real Coding; Fixed Length | Binary and Real Coding; Variable Length

  • DT-GA

The Hybrid Decision Tree - Genetic Algorithm (DT-GA) method (D. Carvalho and A. Freitas, "A hybrid decision tree/genetic algorithm method for data mining," Information Sciences, vol. 163, no. 1-3, pp. 13–35, 2004) discovers rules in two training phases. In the first phase, it runs C4.5 with the whole training set and transforms the resulting tree into an "IF-THEN" set of rules. The examples covered by each rule are considered either as a small disjunct or as a large disjunct, depending on whether their number is smaller than or equal to a given threshold. The second phase consists of using a GA with the union of the instances in the small disjuncts as a "second training set". To predict the output class of an example, it is pushed down the decision tree until it reaches a leaf node, as usual. If the leaf node corresponds to a large disjunct, the example is assigned the class predicted by that leaf node. Otherwise, the example is assigned the class of one of the small-disjunct rules discovered by the GA.

The rules of the GA are randomly initialised by selecting, for each attribute, a value from the examples of the "second training set". The chromosome representation encodes the conditions for all attributes (nominal and numerical), together with a bit that specifies the "don't care" condition. The rule consequent is not encoded into the genome, but dynamically chosen as the most frequent class in the set of examples covered by the rule's antecedent. The fitness function is given by a quadratic version of the geometric mean of the true rates. It also uses a specific rule pruning operator that transforms a condition into a "don't care" based on the accuracy rate associated with each attribute. The stopping criterion is given by a threshold on the number of remaining examples in the "second training set".

  • Oblique-DT

The Oblique Decision Tree (Oblique-DT) algorithm (E. Cantú-Paz and C. Kamath, "Inducing oblique decision trees with evolutionary algorithms," IEEE Transactions on Evolutionary Computation, vol. 7, no. 1, pp. 54–68, 2003) extends the classical OC1 method (S. K. Murthy, S. Kasif, and S. Salzberg, "A system for induction of oblique decision trees." Journal of Artificial Intelligence Research, vol. 2, pp. 1–32, 1994) (a greedy optimizer for oblique-type decision trees) to find oblique partitions by means of a standard generational GA.

Each chromosome encodes the coefficients of the linear combination that defines the oblique hyperplane. To initialise the population, the best axis-parallel hyperplane is first computed using the impurity measure described below and copied to 10% of the initial population. The remainder of the population is initialised randomly with coefficients a_i ∈ [-200, 200]. The fitness value is computed as the impurity of the split at each tree node using the twoing rule (L. Breiman, J. Friedman, R. Olshen, and C. Stone, Classification and Regression Trees. Pacific Grove, 1984), which is based on the balance of the number of examples on the left and right of the split.
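
For reference, the standard twoing criterion of Breiman et al. can be computed as in the sketch below (whether Oblique-DT applies any additional normalisation is not detailed here):

```python
from collections import Counter

def twoing(left_labels, right_labels):
    """Twoing value of a split: (P_L * P_R / 4) * (sum_c |p(c|L) - p(c|R)|)^2.
    Larger values indicate a better balanced and purer split."""
    n_l, n_r = len(left_labels), len(right_labels)
    n = n_l + n_r
    if n_l == 0 or n_r == 0:
        return 0.0
    count_l, count_r = Counter(left_labels), Counter(right_labels)
    classes = set(count_l) | set(count_r)
    diff = sum(abs(count_l[c] / n_l - count_r[c] / n_r) for c in classes)
    return (n_l / n) * (n_r / n) / 4.0 * diff ** 2

print(twoing(["A", "A"], ["B", "B"]))  # 0.25 for a perfectly separating, balanced split
```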

  • TARGET

The Tree Analysis with Randomly Generated and Evolved Trees (TARGET) methodology (J. B. Gray and G. Fan, "Classification tree analysis using TARGET," Computational Statistics & Data Analysis, vol. 52, no. 3, pp. 1362–1372, 2008) is a novel approach that uses a GA to build decision trees in which each chromosome represents a complete decision tree. The population is initialised randomly with a pre-specified probability of adding a new condition (node) to the tree. To evaluate the fitness of each chromosome, the authors use a measure based on the number of correct classifications and the length of the chromosome (number of conditions/nodes).

The genetic search uses specific operators for crossover and mutation. In the case of crossover, a node swap or a subtree swap is applied. In the case of mutation, there are four possibilities: split set mutation, split rule mutation, node swap mutation and subtree swap mutation. It also uses elitism (called cloning) and reinitialisation (called transplantation) to reach a good trade-off between convergence and diversity.

Data-sets partitions employed in the paper

Standard classification

For our study in standard classification, we have selected thirty data sets from the UCI repository. In all the experiments, we have adopted a 5-fold cross-validation model, i.e., we have split each data-set randomly into 5 folds, each containing 20% of the patterns of the data-set. Four folds have been used for training and one for testing. For each of the five partitions, we have executed 5 trials of the algorithms, so for each data-set we consider the average results of 25 runs. Table 7 summarizes the properties of the selected data-sets. It shows, for each data-set, the number of examples (#Ex.), the number of attributes (#Atts.), the number of numerical (#Num.) and nominal (#Nom.) features, and the number of classes (#Cl.). Some of the larger data-sets (abalone, magic, nursery, penbased and ring) have been sampled in a stratified manner at 10%. For data-sets containing missing values (cleveland, breast cancer, dermatology, hepatitis and wisconsin), we have removed the instances with any missing value before partitioning (the actual number of examples is shown between brackets). The 5-fold cross-validation partitions for each data-set are available in KEEL format; all data-sets may be downloaded by clicking here.
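
The validation scheme can be reproduced with the following sketch (using scikit-learn's KFold, which is an assumption here; the original partitions were generated and distributed in KEEL format):

```python
import numpy as np
from sklearn.model_selection import KFold

def repeated_cv_accuracy(X, y, train_and_score, n_folds=5, n_trials=5, seed=0):
    """5-fold cross-validation with 5 trials per partition -> 25 runs per data-set.
    `train_and_score(X_tr, y_tr, X_te, y_te, seed)` is a placeholder that runs
    one (stochastic) GBML algorithm and returns its test accuracy."""
    folds = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in folds.split(X):
        for trial in range(n_trials):
            scores.append(train_and_score(X[train_idx], y[train_idx],
                                          X[test_idx], y[test_idx], seed=trial))
    return float(np.mean(scores))
```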

Table 7. Summary Description for Standard Data-Sets

id Data-set #Ex. #Atts. #Num. #Nom. #Cl.
aba abalone 418 8 7 1 28 (22)
aus australian credit approval 690 14 8 6 2
bal balance scale 625 4 4 0 3
bre breast cancer 286 (277) 9 0 9 2
bup liver disorders [Bupa] 345 6 6 0 2
car car evaluation 1,728 6 0 6 4
cle cleveland 297 13 13 0 5
con contraceptive method choice 1,473 9 6 3 3
crx japanese credit screening 653 15 6 9 2
der dermatology 366 (358) 33 1 32 6
eco ecoli 336 7 7 0 8
fla solar flare 1,066 10 0 10 6
ger german credit data 1,000 20 6 14 2
gla glass identification 214 9 9 0 7 (6)
hab haberman 306 3 3 0 2
hea heart 270 13 6 7 2
hep hepatitis 155 (80) 19 6 13 2
iri iris 150 4 4 0 3
lym lymphography 148 18 3 15 4
mag magic 1,902 10 10 0 2
new new-thyroid 215 5 5 0 3
nur nursery 1,296 8 0 8 5
pen pen-based recognition 1,100 16 16 0 10
pim pima 768 8 8 0 2
rin ring 740 20 20 0 2
tic tic-tac-toe endgame 958 9 0 9 2
veh vehicle 846 18 18 0 4
win wine 178 13 13 0 3
wis wisconsin 683 (630) 9 9 0 2
zoo zoo 101 17 0 17 7

Imbalanced data (including pre-processed data sets)

For the study with imbalanced data, we have considered thirty-three data-sets from the UCI repository with different ratios between the minority and majority classes: from low imbalance to highly imbalanced data-sets. Multi-class data sets were modified to obtain two-class imbalanced problems, defining the union of one or more classes as positive and the union of the remaining classes as negative. Table 8 summarizes the imbalanced data employed in this study and shows, for each data-set, the number of examples (#Ex.), the number of attributes (#Atts.), the names of the minority and majority classes, and the class distribution. The table is ordered according to this last column, from low to high imbalance. Additionally, we have included a last column with both the original data-set partitions and the partitions preprocessed with SMOTE (N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: Synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002). All data-sets can be downloaded following this link. The preprocessed data sets are available here.
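
As a simple illustration of this binarisation step, the Python sketch below marks the union of the selected classes as positive and the remaining classes as negative; the class labels are made up in the style of the ecoli data set and do not reproduce any row of Table 8 exactly.

  import numpy as np

  def binarize(y, positive_classes):
      # Union of the selected classes -> positive (minority), rest -> negative
      return np.where(np.isin(y, positive_classes), "positive", "negative")

  y = np.array(["cp", "im", "om", "pp", "om", "cp", "imU", "cp", "cp", "cp"])
  y_bin = binarize(y, positive_classes=["om"])
  labels, counts = np.unique(y_bin, return_counts=True)
  print(dict(zip(labels, 100.0 * counts / counts.sum())))  # class distribution in %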

Table 8. Summary Description for Imbalanced Data-Sets

Data-set #Ex. #Atts. Class (min., maj.) %Class(min.; maj.) Download.
Glass1 214 9 (build-win-non_float-proc; remainder) (35.51, 64.49) dataBase.pngdataBase.png
Ecoli0vs1 220 7 (im; cp) (35.00, 65.00) dataBase.pngdataBase.png
Wisconsin 683 9 (malignant; benign) (35.00, 65.00) dataBase.pngdataBase.png
Pima 768 8 (tested-positive; tested-negative) (34.84, 65.16) dataBase.pngdataBase.png
Iris0 150 4 (Iris-Setosa; remainder) (33.33, 66.67) dataBase.pngdataBase.png
Glass0 214 9 (build-win-float-proc; remainder) (32.71, 67.29) dataBase.pngdataBase.png
Yeast1 1484 8 (nuc; remainder) (28.91, 71.09) dataBase.pngdataBase.png
Vehicle1 846 18 (Saab; remainder) (28.37, 71.63) dataBase.pngdataBase.png
Vehicle2 846 18 (Bus; remainder) (28.37, 71.63) dataBase.pngdataBase.png
Vehicle3 846 18 (Opel; remainder) (28.37, 71.63) dataBase.pngdataBase.png
Haberman 306 3 (Die; Survive) (27.42, 73.58) dataBase.pngdataBase.png
Glass0123vs456 214 9 (non-window glass; remainder) (23.83, 76.17) dataBase.pngdataBase.png
Vehicle0 846 18 (Van; remainder) (23.64, 76.36) dataBase.pngdataBase.png
Ecoli1 336 7 (im; remainder) (22.92, 77.08) dataBase.pngdataBase.png
New-thyroid2 215 5 (hypo; remainder) (16.89, 83.11) dataBase.pngdataBase.png
New-thyroid1 215 5 (hyper; remainder) (16.28, 83.72) dataBase.pngdataBase.png
Ecoli2 336 7 (pp; remainder) (15.48, 84.52) dataBase.pngdataBase.png
Segment0 2308 19 (brickface; remainder) (14.26, 85.74) dataBase.pngdataBase.png
Glass6 214 9 (headlamps; remainder) (13.55, 86.45) dataBase.pngdataBase.png
Yeast3 1484 8 (me3; remainder) (10.98, 89.02) dataBase.pngdataBase.png
Ecoli3 336 7 (imU; remainder) (10.88, 89.12) dataBase.pngdataBase.png
Page-blocks0 5472 10 (remainder; text) (10.23, 89.77) dataBase.pngdataBase.png
Vowel0 988 13 (hid; remainder) (9.01, 90.99) dataBase.pngdataBase.png
Glass2 214 9 (Ve-win-float-proc; remainder) (8.78, 91.22) dataBase.pngdataBase.png
Ecoli4 336 7 (om; remainder) (6.74, 93.26) dataBase.pngdataBase.png
Glass4 214 9 (containers; remainder) (6.07, 93.93) dataBase.pngdataBase.png
Abalone9vs18 731 8 (18; 9) (5.65, 94.25) dataBase.pngdataBase.png
Glass5 214 9 (tableware; remainder) (4.20, 95.80) dataBase.pngdataBase.png
Yeast2vs8 482 8 (pox; cyt) (4.15, 95.85) dataBase.pngdataBase.png
Yeast4 1484 8 (me2; remainder) (3.43, 96.57) dataBase.pngdataBase.png
Yeast5 1484 8 (me1; remainder) (2.96, 97.04) dataBase.pngdataBase.png
Yeast6 1484 8 (exc; remainder) (2.49, 97.51) dataBase.pngdataBase.png
Abalone19 4174 8 (19; remainder) (0.77, 99.23) dataBase.pngdataBase.png

Experimental Study

Experimental Framework: Parameters

The configuration parameters for the GBML algorithms are shown in Table 9, whereas Table 10 shows the parameters for the classical non-evolutionary algorithms. In both cases, the selected values are common to all problems and follow the recommendations of the corresponding authors of each algorithm, which correspond to the default parameter settings included in the KEEL software (J. Alcalá-Fdez, L. Sánchez, S. García, M. del Jesus, S. Ventura, J. Garrell, J. Otero, C. Romero, J. Bacardit, V. Rivas, J. Fernández, and F. Herrera, "KEEL: A software tool to assess evolutionary algorithms to data mining problems," Soft Computing, vol. 13, no. 3, pp. 307–318, 2009).

Table 9. Parameter specification for the algorithms employed in the experimentation.

Michigan Algorithms
Algorithm Parameters
XCS number of explores = 100000, population size = 6400, α = 0.1, β = 0.2, δ = 0.1,
ν = 10.0, θmna = 2, θdel = 50.0, θsub = 50.0, ε0 = 1, doActionSetSubsumption = false,
fitness reduction = 0.1, pI = 10.0, FI = 0.01, εI = 0.0, γ = 0.25, χ = 0.8, μ = 0.04,
θGA = 50.0, doGASubsumption = true, type of selection = RWS,
type of mutation = free, type of crossover = 2 point, P# = 0.33, r0 = 1.0, m0 = 0.1,
l0 = 0.1, doSpecify = false, nSpecify = 20.0, pSpecify = 0.5.
UCS number of explores = 100000, population size = 6400, δ = 0.1, acc0 = 0.99, Pcross = 0.8,
Pmut = 0.04, θGA = 50.0, θdel = 50.0, θsub = 50.0, doGASubsumption = true, r0 = 0.6,
type of selection = RWS, type of mutation = free, type of crossover = 2 point, m0 = 0.1
IRL Algorithms
Algorithm Parameters
SIA Number of iterations = 200, α = 150, β = 0, Threshold Strength = 0
HIDER Pop. Size = 100, Number of Gen. = 100, Mutation Prob.= 0.5,
Percentage of Crossing = 80, Extreme Mutation Prob. = 0.05,
Prune Examples Factor = 0.05, Penalty Factor = 1, Error Coefficient = 0.
GCCL Algorithms
Algorithm Parameters
CORE Pop. Size = 100, Co-population Size = 50, Gen. Limit = 100, Number of co-populations = 15,
Crossover rate = 1.0, Mutation Prob. = 0.1, Regeneration Prob. = 0.5
OCEC Number of Total Generations = 500, Number of migrating/exchanging members = 1
COGIN Misclassification error level = 2, Gen. limit = 1000, Crossover rate = 0.9, Negation bit = Yes
Pittsburgh Algorithms
Algorithm Parameters
GIL Pop. Size = 40, Number of Gen. = 1000, w1 = 0.5, w2 = 0.5, w3 = 0.01, Rules Exchange = 0.2,
Rule exchange selection = 0.2, Rules copy = 0.1, New Event = 0.4, Rules Generalization = 0.5
Rules Drop = 0.5, Rules Specialization = 0.5, Rule Split = 0.005,
Nominal Rule Split = 0.1, Linear Rule Split = 0.7, Condition Drop = 0.1
Conjunction to Disjunction = 0.02, Introduce Condition = 0.1, Rule Directed Split = 0.03,
Reference Change = 0.02, Reference Extension = 0.03, Reference Restriction = 0.03,
Condition Level Prob. = 0.5, Lower threshold = 0.2, Upper threshold = 0.8
Pitts-GIRLA Number of rules = 30, Number of generations = 10,000,
Population size = 61 chromosomes, Crossover Probability = 0.7, Mutation Probability = 0.5.
DMEL Pop. Size = 30, Crossover Prob. = 0.5, Mutation Prob. = 0.0001, Number of Gen. = 1000
GASSIST-ADI Threshold in Hierarchical Selection = 0,
Iteration of Activation for Rule Deletion Operator = 5,
Iteration of Activation for Hierarchical Selection = 24,
Minimum Number of Rules before Disabling the Deletion Operator = 12,
Minimum Number of Rules before Disabling the Size Penalty Operator = 4,
Number of Iterations = 750, Initial Number Of Rules = 20, Population Size = 400,
Crossover Probability = 0.6, Probability of Individual Mutation = 0.6,
Probability of Value 1 in Initialization = 0.90, Tournament Size = 3,
Possible size in micro-intervals of an attribute = {4, 5, 6, 7, 8, 10, 15, 20, 25},
Maximum Number of Intervals per Attribute = 5, psplit = 0.05, pmerge = 0.05,
Probability of Reinitialize Begin = 0.03, Probability of Reinitialize End = 0,
Use MDL = true, Iteration MDL = 25, Initial Theory Length Ratio = 0.075,
Weight Relaxation Factor = 0.90, Class Initialization Method = cwinit, Default Class = auto
OIGA Crossover Prob. = 1.0, Mutation Prob. = 0.01, Pop. Size = 200, Number of Rules = 30
Stagnation = 30, Generations = 200, Survivors Percent = 0.5, Attribute Order = descendent
ILGA Crossover Prob. = 1.0, Mutation Prob. = 0.01, Pop. Size = 200, Number of Rules = 30
Stagnation = 30, Generations = 200, Survivors Percent = 0.5, Attribute Order = descendent
Crossover reduction rate = 0.5, Mutation reduction rate = 0.5, Incremental strategy = 1
Hybrid Evolutionary Decision Trees
Algorithm Parameters
DT-GA Confidence = 0.25, Instances per leaf = 2, Threshold S to consider a Small Disjunct = 10,
Number of Gen. = 50, Number of chrom. = 200, Crossover Prob. = 0.8, Mutation Prob. = 0.01
Oblique-DT Number of Total Generations for the GA = 25, Population size = 20√d (d = dimensionality)
TARGET Split Prob. = 0.5, Number of Gen. = 100, Number of Trees = 50
 

Table 10. Parameter specification for the state-of-the-art non-evolutionary algorithms employed in the experimentation.

Algorithm Parameters
CART Max Depth of the Tree = 90
AQ Star Size = 5
CN2 Star Size = 5
C4.5 and C4.5-Rules Prune = True, Confidence level = 0.25, Minimum number of item-sets per leaf = 2
Ripper Size of growing subset = 66%
Repetitions of the optimization stage = 2
 

Finally, a preprocessing discretization step is applied for the algorithms that cannot cope with real values. Specifically, we made use of the Class-Attribute Dependent Discretizer (J. Ching, A. Wong, and K. Chan, "Class-dependent discretization for inductive learning from continuous and mixed-mode data," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, no. 7, pp. 641–651, 1995).
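
The class-attribute dependent discretizer is not reproduced here; as a rough illustration of where this preprocessing step fits, the Python sketch below applies a generic equal-frequency discretizer from scikit-learn, which is only a stand-in and not the CADD method actually used.

  import numpy as np
  from sklearn.preprocessing import KBinsDiscretizer

  X = np.random.RandomState(0).normal(size=(100, 3))  # continuous attributes
  discretizer = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="quantile")
  X_discrete = discretizer.fit_transform(X)  # integer bin indices per attribute
  print(np.unique(X_discrete[:, 0]))  # [0. 1. 2. 3. 4.]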

Analysis of the GBML algorithms for Rule Induction in Standard Classification

Our first objective is to study the behaviour of the GBML algorithms for rule induction, dividing the analysis into the different families presented in our taxonomy.

Table 11 shows the test results using accuracy, whereas Table 12 reports them using Cohen's kappa as performance measure. These tables are divided into five parts, each one related to the Michigan, Iterative Rule Learning (IRL), Genetic Cooperative-Competitive Learning (GCCL), Pittsburgh and Hybrid Evolutionary Decision Trees (HEDTs) approaches, respectively.
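
Both measures can be computed as in the Python/scikit-learn sketch below, on a made-up pair of true and predicted label vectors.

  from sklearn.metrics import accuracy_score, cohen_kappa_score

  y_true = [0, 0, 1, 1, 2, 2, 2, 1]
  y_pred = [0, 1, 1, 1, 2, 2, 0, 1]

  print("accuracy:", accuracy_score(y_true, y_pred))   # fraction of correct predictions
  print("kappa:", cohen_kappa_score(y_true, y_pred))   # agreement corrected for chance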

These tables can be downloaded as Excel documents by clicking on the corresponding links.

Table 11. Results with the accuracy measure for GBML algorithms for rule induction in standard classification

  Michigan Approach IRL Approach GCCL Approach Pittsburgh Approach HEDT Approach
Data-set XCS UCS SIA HIDER CORE OCEC COGIN GIL Pitts-GIRLA DMEL GASSIST OIGA ILGA DT-GA Oblique-DT TARGET
aba 18.67 17.80 19.52 19.20 19.29 14.12 20.39 10.48 18.52 2.92 24.45 23.65 20.72 13.82 16.56 20.15
aus 85.59 83.51 65.30 82.20 81.59 85.91 84.55 84.49 75.01 57.68 85.39 84.26 84.70 84.23 82.03 85.54
bal 82.69 76.93 82.40 73.54 66.46 70.05 74.91 52.32 69.12 70.91 78.56 53.44 53.60 75.20 90.56 75.62
bre 74.89 73.23 69.46 75.45 73.01 65.40 54.21 62.68 72.29 72.15 73.94 73.59 73.94 73.36 62.84 71.34
bup 66.03 65.80 63.83 64.23 60.23 45.22 55.36 45.86 59.42 46.43 64.81 56.87 54.90 64.64 63.07 65.97
car 79.55 91.60 93.37 70.02 79.24 77.52 72.12 80.58 79.02 8.59 92.06 79.78 70.49 86.25 98.38 77.71
cle 54.41 53.25 49.56 51.24 54.15 56.97 49.76 43.50 54.42 27.07 54.21 51.52 51.45 49.61 49.15 52.99
con 54.06 48.28 49.03 52.31 43.52 45.17 44.56 43.45 54.19 34.42 54.46 47.25 44.09 51.54 47.71 45.19
crx 85.79 83.50 65.61 80.98 82.82 86.37 83.59 84.96 86.07 56.78 85.85 84.60 85.70 86.00 80.10 86.37
der 95.59 84.41 83.24 87.59 31.72 80.45 85.65 84.35 49.79 49.90 92.29 85.98 85.27 84.85 93.85 66.15
eco 78.99 78.03 69.48 76.31 63.40 61.26 61.33 58.70 69.17 43.88 76.67 68.46 62.09 76.73 76.19 65.49
fla 53.17 72.12 52.01 74.42 65.57 70.54 67.88 61.52 46.96 50.97 73.99 71.94 63.75 74.72 73.17 61.54
ger 73.12 71.86 67.70 71.04 70.24 67.00 47.26 66.54 71.26 62.68 72.60 71.10 71.32 71.76 67.06 70.00
gla 66.47 64.12 73.43 63.91 51.03 49.72 52.46 54.99 59.45 15.34 61.42 54.94 51.02 68.35 67.12 44.11
hab 71.90 72.09 65.93 72.88 70.85 67.45 73.14 67.32 63.38 72.54 68.68 71.70 72.42 72.22 62.48 71.50
hea 78.52 76.89 67.56 72.37 74.67 78.00 77.33 77.04 53.33 77.85 80.07 78.96 78.81 77.26 74.00 74.89
hep 90.00 82.00 84.00 84.25 80.00 79.00 44.00 81.50 73.00 16.25 85.75 79.50 76.75 81.50 83.25 77.25
iri 94.40 90.00 94.40 94.13 93.47 88.13 82.80 90.00 86.40 50.27 95.87 93.73 92.93 93.33 92.80 92.93
lym 79.46 76.79 79.70 76.60 67.14 76.33 74.67 76.58 63.49 26.97 78.73 76.84 75.81 65.82 71.66 75.17
mag 81.24 79.31 75.37 73.44 73.36 68.77 67.23 68.41 78.94 64.91 79.94 77.02 75.83 80.11 74.40 75.76
new 94.60 94.88 93.12 92.09 91.35 86.14 81.77 84.19 90.42 40.56 92.74 92.19 91.44 90.42 94.79 86.79
nur 79.69 78.54 89.48 82.25 71.59 81.62 81.88 81.87 53.56 38.45 90.23 83.21 72.37 87.41 92.90 72.02
pen 88.58 91.04 95.73 68.29 33.38 67.75 72.84 48.79 33.02 19.15 65.84 49.13 47.18 86.33 91.15 38.93
pim 74.66 73.93 71.90 73.46 72.11 69.63 67.03 69.79 59.13 66.40 73.27 73.62 72.11 73.96 70.83 73.02
rin 91.84 79.51 68.62 33.95 61.14 78.46 68.89 74.16 59.89 50.41 88.14 86.81 86.05 84.46 79.89 67.59
tic 85.32 92.09 99.77 69.93 70.19 81.21 91.23 66.63 73.82 60.94 95.11 74.90 78.06 82.65 90.08 69.50
veh 72.70 71.63 62.39 63.00 39.55 52.33 51.10 45.98 45.20 33.00 66.46 59.89 57.38 68.68 70.33 49.81
win 96.83 90.54 95.26 77.49 90.29 72.70 64.29 72.91 78.84 37.87 92.66 91.01 88.49 94.90 92.08 82.24
wis 96.60 96.14 96.96 96.40 93.94 95.81 95.31 95.55 74.72 75.90 95.55 91.27 90.95 94.56 93.06 95.75
zoo 88.88 90.69 95.25 95.43 92.54 93.66 91.08 93.91 33.86 63.01 93.66 92.63 87.91 93.70 96.05 72.09
Mean 77.81 76.68 74.65 72.28 67.26 70.42 67.95 67.63 62.86 46.47 77.78 72.66 70.58 76.28 76.58 68.78

 

Table 12. Results with the kappa measure for GBML algorithms for rule induction in standard classification

  Michigan Approach IRL Approach GCCL Approach Pittsburgh Approach HEDT Approach
Data-set XCS UCS SIA HIDER CORE OCEC COGIN GIL Pitts-GIRLA DMEL GASSIST OIGA ILGA DT-GA Oblique-DT TARGET
aba .0701 .0770 .0966 .0952 .0643 .0701 .1044 .0511 .0748 .0186 .1255 .1214 .0899 .0559 .0674 .0734
aus .7092 .6646 .2752 .6404 .6320 .7162 .6912 .6897 .4578 .0537 .7042 .6888 .6958 .6832 .6345 .7122
bal .6851 .6004 .6790 .5092 .3789 .5113 .5363 .3414 .4286 .4611 .6031 .1708 .1854 .5453 .8347 .5481
bre .2781 .2924 .2191 .2731 .1998 .2082 .1612 .1966 .1872 .3198 .3099 .2858 .2418 .1926 .1051 .2990
bup .2662 .2745 .2560 .2276 .1114 -.0047 -.0256 -.0194 .1794 .0136 .2667 .0661 .0157 .2733 .2409 .2847
car .4321 .8055 .8550 .0000 .5856 .5535 .1759 .6413 .4680 .0272 .8238 .4681 .0235 .6668 .9647 .5580
cle .2794 .2281 .1720 .2034 .0875 .3067 .2132 .1894 .0852 .1143 .2448 .1118 .0663 .2203 .2213 .1313
con .2761 .1976 .1853 .2426 .1005 .1943 .1410 .1891 .2734 .0898 .2806 .1349 .0590 .2223 .1909 .1057
crx .7146 .6661 .2879 .6173 .6568 .7262 .6761 .7003 .7213 .0509 .7158 .6981 .7169 .7191 .5991 .7292
der .9446 .7840 .7890 .8435 .1266 .7543 .8227 .8033 .3693 .3243 .9033 .8248 .8178 .8091 .9227 .5645
eco .7043 .6927 .5654 .6718 .4548 .4768 .4185 .4652 .5570 .1709 .6711 .5552 .4518 .6691 .6732 .4815
fla .3931 .6424 .3479 .6709 .5563 .6290 .5851 .5309 .2636 .3483 .6631 .6366 .5309 .6738 .6561 .4971
ger .3082 .2551 .1104 .1895 .0642 .2400 .1508 .3138 .1447 .0966 .2953 .1750 .2064 .2795 .2268 .0000
gla .5350 .5053 .6404 .4903 .3367 .3376 .3701 .4098 .4141 .0576 .4517 .3734 .3143 .5652 .5546 .2302
hab .0835 .1727 .1304 .1167 .0446 .0862 .0318 .0879 .0442 .0842 .0880 .0146 .0114 .1521 .0701 .0673
hea .5644 .5247 .3379 .4374 .4834 .5583 .5562 .5380 .0000 .5473 .5948 .5935 .5931 .5346 .4738 .4849
hep .5389 .2348 .1283 .2850 .1778 .3646 .1109 .2985 .1225 .0000 .4230 .3003 .2434 .0036 .4171 .1011
iri .9160 .8500 .9160 .9120 .9020 .8220 .7427 .8500 .7960 .2540 .9380 .9079 .8964 .9000 .8920 .8940
lym .6044 .5399 .6099 .5425 .3422 .5508 .5394 .5602 .2383 .0451 .5918 .5900 .5493 .3570 .4597 .5161
mag .5643 .5328 .4462 .3836 .3788 .1623 .1039 .1510 .5130 .0013 .5430 .4762 .4399 .5256 .4384 .4518
new .8833 .8849 .8454 .8281 .8042 .7132 .6330 .6925 .7890 .2515 .8460 .8363 .8238 .7772 .8871 .7214
nur .6988 .6835 .8467 .7369 .5810 .7353 .7412 .7391 .3221 .2928 .8553 .7516 .5925 .8134 .8962 .5848
pen .8731 .9004 .9525 .6480 .2597 .6415 .7007 .4319 .2551 .0983 .6201 .4447 .4245 .8480 .9016 .3208
pim .4233 .4091 .3564 .3626 .3191 .2303 .1232 .2370 .0000 .0614 .3896 .4015 .3504 .4176 .3540 .4094
rin .8367 .5895 .3692 -.3282 .2187 .5680 .3938 .4812 .2152 .0000 .7627 .7549 .7406 .6893 .5977 .3486
tic .6487 .7959 .9949 .3399 .2794 .5909 .8213 .4109 .3330 .2464 .8891 .4286 .4997 .5900 .7796 .2975
veh .6361 .6216 .4983 .5066 .1991 .3647 .3755 .2845 .2688 .1100 .5529 .4753 .4459 .5817 .6043 .3322
win .9520 .8563 .9287 .6518 .8524 .5876 .5022 .5882 .6550 .1175 .8885 .8690 .8347 .9222 .8802 .7325
wis .9258 .9146 .9335 .9206 .8653 .9072 .8990 .9033 .4833 .3681 .9020 .8188 .8112 .8801 .8462 .9075
zoo .8527 .8686 .9377 .9397 .9050 .9166 .8838 .9192 .1051 .4668 .9167 .9037 .8421 .9165 .9477 .6224
Mean .5866 .5688 .5237 .4653 .3989 .4840 .4393 .4559 .3255 .1697 .5953 .4959 .4505 .5495 .5779 .4336

We now present the list of algorithms that have been selected as representative of each family. The criterion for choosing each method is based on the significant differences among the methods within each category found in the statistical analysis; in the case of the HEDTs, Oblique-DT was chosen for obtaining the best mean result in its group (please refer to the paper for details). The selected algorithms are the following ones:

  1. Chromosome = Rule:
    1. Michigan approach: XCS.
    2. IRL approach: SIA.
    3. GCCL approach: OCEC.
  2. Chromosome = Rule Set (Pittsburgh): GASSIST.
  3. HEDTs: Oblique-DT.

Now, our aim is to perform a comparative study between these approaches and some state-of-the-art non-evolutionary interval rule classification methods. These algorithms are CART (L. Breiman, J. Friedman, R. Olshen, and C. Stone, Classification and Regression Trees. Pacific Grove, 1984), AQ (R. Michalski, I. Mozetic, and N. Lavrac, "The multipurpose incremental learning system AQ15 and its testing application to three medical domains," in 5th National Conference on Artificial Intelligence (AAAI'86), 1986, pp. 1041–1045), CN2 (P. Clark and T. Niblett, "The CN2 induction algorithm," Machine Learning, vol. 3, no. 4, pp. 261–283, 1989), C4.5 (J. R. Quinlan, C4.5: Programs for Machine Learning. San Mateo, California: Morgan Kaufmann Publishers, 1993), C4.5-Rules (J. Quinlan, "MDL and categorical theories (continued)," in Machine Learning: Proceedings of the Twelfth International Conference. Morgan Kaufmann, 1995, pp. 464–470) and Ripper (W. W. Cohen, "Fast effective rule induction," in Machine Learning: Proceedings of the Twelfth International Conference. Morgan Kaufmann, 1995, pp. 115–123).

In Tables 13 and 14 we show the results for accuracy and Cohen's kappa, respectively. We have included the selected interval rule GBML approaches, that is, XCS, SIA, OCEC, GASSIST and Oblique-DT, together with the classical non-evolutionary methods.

These tables can also be downloaded as Excel documents by clicking on the corresponding links.

Table 13. Results with the accuracy measure for GBML algorithms for rule induction and classical non-evolutionary algorithms in standard classification

Data-set XCS SIA OCEC GASSIST Oblique-DT CART AQ CN2 C4.5 C4.5-Rules RIPPER
aba 18.67 19.52 14.12 24.45 16.56 21.29 15.17 26.80 18.67 20.97 17.37
aus 85.59 65.30 85.91 85.39 82.03 84.64 71.30 80.14 84.20 84.81 81.33
bal 82.69 82.40 70.05 78.56 90.56 62.24 57.12 78.72 77.28 80.64 50.69
bre 74.89 69.46 65.40 73.94 62.84 70.76 72.85 72.21 74.37 70.71 61.44
bup 66.03 63.83 45.22 64.81 63.07 68.99 44.58 59.42 66.09 64.17 63.59
car 79.55 93.37 77.52 92.06 98.38 80.85 86.82 77.26 90.80 86.81 88.28
cle 54.41 49.56 56.97 54.21 49.15 49.17 51.26 53.88 51.82 53.87 44.04
con 54.06 49.03 45.17 54.46 47.71 51.33 42.58 44.47 51.93 51.66 52.10
crx 85.79 65.61 86.37 85.85 80.10 84.39 78.89 79.49 85.30 85.39 81.77
der 95.59 83.24 80.45 92.29 93.85 73.22 87.04 87.98 92.46 90.79 88.11
eco 78.99 69.48 61.26 76.67 76.19 73.82 68.58 71.45 78.28 77.70 74.37
fla 53.17 52.01 70.54 73.99 73.17 74.48 65.93 66.60 74.48 70.15 67.13
ger 73.12 67.70 67.00 72.60 67.06 70.10 71.44 72.50 71.80 71.16 66.66
gla 66.47 73.43 49.72 61.42 67.12 68.70 60.59 60.79 68.73 62.76 64.80
hab 71.90 65.93 67.45 68.68 62.48 69.61 27.91 70.26 72.22 69.60 50.71
hea 78.52 67.56 78.00 80.07 74.00 72.59 78.59 76.67 79.63 77.78 74.74
hep 90.00 84.00 79.00 85.75 83.25 82.50 87.25 86.25 78.75 81.25 79.75
iri 94.40 94.40 88.13 95.87 92.80 95.33 81.47 90.67 93.33 94.00 93.07
lym 79.46 79.70 76.33 78.73 71.66 77.61 72.08 81.08 75.01 72.65 76.34
mag 81.24 75.37 68.77 79.94 74.40 74.76 67.51 68.66 79.81 78.43 76.89
new 94.60 93.12 86.14 92.74 94.79 94.42 84.28 90.70 91.16 90.23 94.33
nur 79.69 89.48 81.62 90.23 92.90 84.80 76.67 79.17 89.04 83.67 88.95
pen 88.58 95.73 67.75 65.84 91.15 62.00 61.85 66.91 89.36 87.20 85.71
pim 74.66 71.90 69.63 73.27 70.83 71.87 67.03 67.32 74.09 72.84 70.05
rin 91.84 68.62 78.46 88.14 79.89 83.38 49.22 79.59 82.70 84.92 81.41
tic 85.32 99.77 81.21 95.11 90.08 72.65 94.00 70.56 85.80 86.01 97.16
veh 72.70 62.39 52.33 66.46 70.33 65.36 56.72 55.79 71.87 64.47 70.76
win 96.83 95.26 72.70 92.66 92.08 92.11 79.47 85.95 94.90 94.90 90.06
wis 96.60 96.96 95.81 95.55 93.06 93.27 83.33 95.47 95.03 95.03 95.96
zoo 88.88 95.25 93.66 93.66 96.05 91.05 91.45 87.10 94.10 93.10 91.11
Mean 77.81 74.65 70.42 77.78 76.58 73.91 67.77 72.80 77.77 76.59 73.96

 

Table 14. Results with the kappa measure for GBML algorithms for rule induction and classical non-evolutionary algorithms in standard classification

Data-set XCS SIA OCEC GASSIST Oblique-DT CART AQ CN2 C4.5 C4.5-Rules RIPPER
aba .0701 .0966 .0701 .1255 .0674 .0918 .0590 .1590 .0873 .0941 .0935
aus .7092 .2752 .7162 .7042 .6345 .6947 .3876 .5884 .6799 .6933 .6300
bal .6851 .6790 .5113 .6031 .8347 .3024 .2160 .6069 .5922 .6439 .3178
bre .2781 .2191 .2082 .3099 .1051 .0000 .2166 .1417 .2330 .2700 .1716
bup .2662 .2560 -.0047 .2667 .2409 .3465 .0184 .0538 .2912 .2763 .2839
car .4321 .8550 .5535 .8238 .9647 .5534 .6689 .3230 .7986 .7011 .7591
cle .2794 .1720 .3067 .2448 .2213 .1655 .1288 .0918 .2257 .2372 .2068
con .2761 .1853 .1943 .2806 .1909 .2517 .0105 .0652 .2568 .2358 .2723
crx .7146 .2879 .7262 .7158 .5991 .6864 .5832 .5751 .7043 .7051 .6387
der .9446 .7890 .7543 .9033 .9227 .6567 .8341 .8448 .9048 .8829 .8513
eco .7043 .5654 .4768 .6711 .6732 .6367 .5349 .5816 .6998 .6886 .6559
fla .3931 .3479 .6290 .6631 .6561 .6702 .5552 .5596 .6716 .6137 .5891
ger .3082 .1104 .2400 .2953 .2268 .2591 .1858 .1254 .2826 .2926 .2510
gla .5350 .6404 .3376 .4517 .5546 .5738 .4477 .4360 .5742 .4920 .5288
hab .0835 .1304 .0862 .0880 .0701 .0387 .0018 .0051 .1521 .1087 .1432
hea .5644 .3379 .5583 .5948 .4738 .4459 .5577 .5097 .5866 .5528 .5017
hep .5389 .1283 .3646 .4230 .4171 .3124 .3916 .2169 .1240 .2705 .3191
iri .9160 .9160 .8220 .9380 .8920 .9300 .7220 .8600 .9000 .9100 .8960
lym .6044 .6099 .5508 .5918 .4597 .5611 .5040 .6325 .5367 .4895 .5627
mag .5643 .4462 .1623 .5430 .4384 .4480 .1092 .1524 .5246 .5241 .4990
new .8833 .8454 .7132 .8460 .8871 .8807 .6981 .7877 .8140 .7991 .8769
nur .6988 .8467 .7353 .8553 .8962 .7764 .6853 .6903 .8382 .7605 .8386
pen .8731 .9525 .6415 .6201 .9016 .5768 .5756 .6318 .8818 .8577 .8412
pim .4233 .3564 .2303 .3896 .3540 .4002 .0862 .1239 .4133 .4047 .3795
rin .8367 .3692 .5680 .7627 .5977 .6675 -.0076 .5907 .6538 .6986 .6286
tic .6487 .9949 .5909 .8891 .7796 .4222 .8613 .1881 .6766 .7014 .9375
veh .6361 .4983 .3647 .5529 .6043 .5400 .4252 .4083 .6248 .5259 .6104
win .9520 .9287 .5876 .8885 .8802 .8800 .6885 .7847 .9222 .9223 .8504
wis .9258 .9335 .9072 .9020 .8462 .8509 .6903 .8984 .8904 .8903 .9122
zoo .8527 .9377 .9166 .9167 .9477 .8815 .8855 .8197 .9217 .9088 .8828
Mean .5866 .5237 .4840 .5953 .5779 .5167 .4240 .4484 .5821 .5717 .5643

Analysis of the GBML algorithms for Rule Induction in Imbalanced Data-sets

In this section, we analyse the effect of imbalance on the classification quality of all the algorithms selected in our study, following the same methodology as in the previous section. We decided not to restrict the analysis to the representatives of the different families of GBML algorithms for rule induction, because we cannot assume that the algorithms that perform best in standard classification will also be the ones that behave best on imbalanced data sets.

This study is divided into two parts. First, we present the results obtained directly on the original data sets, in order to determine which algorithms are best adapted to the imbalanced framework. Then, we apply preprocessing to the training partitions to rebalance the distribution of examples among the classes. Our aim is to analyse whether class rebalancing improves the performance of the algorithms used in this study, and of the GBML methods in particular.

Empirical study with the original data-sets

Table 15 shows the results for all data sets, using the geometric mean of the true rates, for all the GBML algorithms, together with their average ranking. We stress in boldface the best value within each family.
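
For reference, the geometric mean of the true rates can be computed as in the Python/scikit-learn sketch below, where the label vectors are made up and label 1 denotes the minority (positive) class.

  import numpy as np
  from sklearn.metrics import confusion_matrix

  y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
  y_pred = [1, 0, 1, 0, 0, 0, 0, 1, 0, 0]

  tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
  tpr = tp / (tp + fn)  # sensitivity: true rate of the minority class
  tnr = tn / (tn + fp)  # specificity: true rate of the majority class
  print("geometric mean:", np.sqrt(tpr * tnr))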

This table can also be downloaded as an Excel document by clicking on the corresponding link.

Table 15. Results with the geometric mean of the true rates for GBML algorithms for rule induction in classification with imbalanced data-sets

  Michigan Approach IRL Approach GCCL Approach Pittsburgh Approach HEDT Approach
Data-set XCS UCS SIA HIDER CORE OCEC COGIN GIL Pitts-GIRLA DMEL GASSIST OIGA ILGA DT-GA Oblique-DT TARGET
Glass1 75.07 55.40 74.51 32.87 48.75 50.79 43.04 53.14 65.69 19.76 70.07 37.34 25.54 65.62 68.57 64.38
Ecoli0vs1 97.67 20.00 96.90 18.55 95.97 87.73 79.53 89.20 97.61 12.65 97.97 96.99 94.51 98.31 96.90 98.28
Wisconsin 96.97 96.22 96.61 76.37 93.77 95.38 94.66 96.57 52.93 53.18 95.23 91.74 90.51 94.05 93.21 93.93
Pima 68.05 69.14 63.17 0.00 60.13 50.38 36.95 50.65 0.00 16.39 65.64 62.45 61.99 69.79 66.94 55.33
Iris0 100.00 98.97 100.00 69.50 100.00 89.83 91.16 90.65 98.97 72.46 99.49 97.95 98.48 98.97 97.89 98.97
Glass0 83.29 30.93 77.20 41.55 63.07 68.40 56.51 69.32 59.51 0.00 81.03 49.17 56.05 81.02 74.53 31.19
Yeast1 41.59 68.76 57.08 69.58 48.64 41.15 8.21 40.96 27.85 0.00 61.23 47.72 38.58 61.25 61.20 14.27
Vehicle2 97.06 92.63 85.09 6.30 36.29 88.41 83.97 88.31 17.23 50.17 94.75 88.14 86.77 94.23 94.48 49.68
Vehicle1 64.06 62.74 57.71 0.00 34.92 58.95 51.36 59.84 0.00 3.56 56.23 43.77 32.51 58.28 66.92 8.42
Vehicle3 59.82 69.14 53.18 0.00 11.02 65.12 50.85 62.87 5.30 12.96 58.83 44.75 38.16 57.25 65.21 0.00
Haberman 36.55 47.87 52.29 0.00 37.30 45.26 13.90 45.18 17.59 33.56 45.26 5.00 9.89 35.63 59.77 36.49
Glass0123vs456 93.11 87.07 86.97 89.90 80.87 85.35 79.87 83.49 80.16 63.04 88.57 79.99 77.59 89.24 87.19 89.53
Vehicle0 95.37 88.64 80.54 60.81 12.83 76.49 64.90 75.70 15.29 1.76 90.77 89.00 79.73 92.58 90.07 58.10
Ecoli1 87.29 58.09 73.85 76.56 88.79 58.72 50.63 62.07 85.60 15.53 86.22 81.44 87.52 84.91 81.55 86.64
New-Thyroid2 96.49 91.59 90.37 87.57 88.83 93.51 81.94 93.51 92.45 78.13 97.67 86.43 90.67 92.17 93.29 91.66
New-Thyroid1 90.00 89.55 91.87 92.32 91.25 83.68 83.96 83.77 37.98 83.63 96.21 91.35 94.30 90.86 95.65 93.77
Ecoli2 30.82 50.81 86.90 59.37 78.00 53.49 37.37 53.17 82.88 0.00 87.91 79.80 76.20 77.55 86.47 82.28
Segment0 98.98 98.88 99.44 80.39 0.00 93.37 65.13 93.31 0.00 0.00 98.00 98.26 98.08 98.19 98.41 80.54
Glass6 94.00 88.49 90.63 76.25 73.70 89.15 84.87 88.31 91.98 50.68 84.74 83.50 83.30 79.42 82.29 89.66
Yeast3 17.18 89.46 81.71 52.27 70.10 90.55 54.06 90.36 83.67 0.00 85.25 80.50 63.65 82.38 82.05 53.89
Ecoli3 31.91 46.29 61.69 45.04 48.79 63.29 56.70 66.32 42.14 7.30 67.11 50.84 40.23 72.68 67.15 0.00
Page-Blocks0 84.58 85.04 86.59 89.27 64.92 75.14 55.01 74.73 45.75 0.00 80.87 81.62 62.49 92.18 90.07 59.99
Vowel0 90.81 96.68 100.00 95.81 72.60 62.38 39.67 62.77 60.33 16.86 93.60 84.52 81.71 96.83 95.59 71.07
Glass2 0.00 9.61 19.34 63.23 10.00 65.99 0.00 67.11 0.00 0.00 0.00 0.00 0.00 52.59 32.05 0.00
Ecoli4 44.37 68.73 65.21 100.00 78.40 76.06 27.18 73.38 63.74 7.56 78.02 75.28 58.00 82.33 88.09 67.70
Glass4 85.97 36.12 95.41 69.63 56.47 69.72 35.79 66.51 11.40 7.63 62.51 27.67 48.49 30.09 62.70 0.00
Abalone9-18 14.14 41.36 40.17 96.63 32.73 48.65 20.56 53.50 21.19 44.67 40.41 20.36 7.07 44.78 54.11 0.00
Glass5 0.00 33.97 67.79 64.12 73.47 88.34 19.75 72.01 0.00 0.00 13.97 67.62 0.00 42.25 93.65 0.00
Yeast2vs8 69.65 56.57 10.00 42.31 72.66 71.45 65.46 71.23 10.00 25.55 62.74 0.00 0.00 10.00 69.05 72.83
Yeast4 0.00 59.96 26.21 75.16 6.31 58.31 8.91 58.91 11.21 0.00 17.16 12.34 12.31 39.92 55.56 0.00
Yeast5 0.00 78.14 77.71 44.69 9.43 78.04 26.56 79.44 28.01 0.00 72.67 41.32 13.69 83.67 80.82 0.00
Yeast6 0.00 75.53 51.47 43.77 0.00 54.24 18.20 56.43 0.00 0.00 0.00 0.00 0.00 56.60 62.80 0.00
Abalone19 0.00 0.00 0.00 32.31 0.00 61.72 0.00 68.96 0.00 8.04 0.00 0.00 0.00 0.00 7.53 0.00
Mean 58.93 64.92 69.62 56.13 52.73 70.88 48.08 70.96 39.59 20.76 67.58 57.48 51.76 69.87 75.81 46.93
 

We present a list of algorithms that have been selected as representative of each family for imbalanced classification. The criterion for choosing each method is exactly the same as for standard classification, that is, we select the best performing method according to the differences supported by the statistical analysis; when no significant differences were found, we selected the best ranking method, as in the case of UCS, GASSIST and Oblique-DT (please refer to the paper for details).

  1. Chromosome = Rule:
    1. Michigan approach: UCS.
    2. IRL approach: SIA.
    3. GCCL approach: OCEC.
  2. Chromosome = Rule Set (Pittsburgh): GASSIST.
  3. HEDTs: Oblique-DT.

Once the representative algorithms of each family have been selected, we now compare their classification ability on imbalanced data sets against the classical non-evolutionary algorithms. The results using the geometric mean, together with the associated ranking for each method, are shown in Table 16.
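
The average rankings reported together with these results can be obtained as in the following Python/SciPy sketch, where the geometric-mean values are invented for illustration; the Friedman test is then a natural check for overall differences among the methods, in the spirit of the statistical analysis of the paper.

  import numpy as np
  from scipy.stats import friedmanchisquare, rankdata

  # rows = data sets, columns = methods; higher geometric mean is better
  gm = np.array([[75.1, 70.2, 68.9],
                 [97.7, 96.9, 95.0],
                 [68.1, 63.2, 66.9],
                 [83.3, 77.2, 74.5]])

  ranks = np.apply_along_axis(lambda row: rankdata(-row), 1, gm)  # rank 1 = best
  print("average ranking per method:", ranks.mean(axis=0))
  print(friedmanchisquare(*gm.T))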

This table can be downloaded as an Excel document by clicking on the corresponding link.

Table 16. Results for imbalanced data sets classification with the best GBML algorithms for rule induction and state-of-the-art non-evolutionary algorithms

Data-set UCS SIA OCEC GASSIST Oblique-DT CART AQ CN2 C4.5 C4.5-Rules RIPPER
Abalone9-18 41.36 40.17 48.65 40.41 54.11 25.14 19.79 20.78 44.82 47.55 53.37
Abalone19 0.00 0.00 61.72 0.00 7.53 0.00 56.64 0.00 0.00 0.00 23.20
Ecoli4 68.73 65.21 76.06 78.02 88.09 77.88 55.98 63.98 82.33 85.51 88.41
Glass2 9.61 19.34 65.99 0.00 32.05 25.08 65.72 0.00 60.08 60.08 36.66
Yeast4 59.96 26.21 58.31 17.16 55.56 10.84 38.39 21.21 41.97 55.08 63.82
Vowel0 96.68 100.00 62.38 93.60 95.59 84.68 60.06 48.83 96.83 96.83 95.73
Yeast2vs8 56.57 10.00 71.45 62.74 69.05 50.86 53.66 58.08 10.00 28.13 71.89
Glass4 36.12 95.41 69.72 62.51 62.70 76.37 74.23 58.55 55.37 58.22 78.89
Glass5 33.97 67.79 88.34 13.97 93.65 93.90 71.86 0.00 88.04 88.04 73.47
Yeast5 78.14 77.71 78.04 72.67 80.82 71.45 68.12 41.33 87.51 84.89 87.39
Yeast6 75.53 51.47 54.24 0.00 62.80 68.03 55.87 0.00 56.60 73.27 79.99
Ecoli1 58.09 73.85 58.72 86.22 81.55 82.91 42.12 55.93 85.38 85.70 91.52
Glass0123vs456 87.07 86.97 85.35 88.57 87.19 91.01 76.51 82.19 91.31 90.98 90.53
New-Thyroid1 89.55 91.87 83.68 96.21 95.65 94.62 86.56 85.66 90.84 91.78 92.88
New-Thyroid2 91.59 90.37 93.51 97.67 93.29 91.18 92.89 85.66 93.53 93.27 92.75
Page-Blocks0 85.04 86.59 75.14 80.87 90.07 71.04 11.70 59.29 91.95 93.09 93.57
Segment0 98.88 99.44 93.37 98.00 98.41 98.60 92.83 63.88 98.24 98.39 98.82
Vehicle0 88.64 80.54 76.49 90.77 90.07 92.28 69.86 29.88 92.90 92.83 90.88
Ecoli2 50.81 86.90 53.49 87.91 86.47 81.24 55.42 51.73 85.14 86.08 86.18
Yeast3 89.46 81.71 90.55 85.25 82.05 80.82 81.17 18.56 85.10 87.23 91.81
Ecoli3 46.29 61.69 63.29 67.11 67.15 68.12 45.38 56.04 67.38 69.56 80.69
Glass6 88.49 90.63 89.15 84.74 82.29 80.89 88.10 81.14 79.42 79.21 84.49
Ecoli0vs1 20.00 96.90 87.73 97.97 96.90 96.86 87.05 88.50 98.31 98.31 95.82
Haberman 47.87 52.29 45.26 45.26 59.77 25.44 13.85 23.20 35.63 35.06 55.54
Iris0 98.97 100.00 89.83 99.49 97.89 100.00 92.25 91.73 98.97 98.97 97.89
Pima 69.14 63.17 50.38 65.64 66.94 70.25 14.77 38.62 68.72 68.95 69.66
Vehicle2 92.63 85.09 88.41 94.75 94.48 93.52 89.08 49.64 95.57 93.64 95.33
Wisconsin 96.22 96.61 95.38 95.23 93.21 92.23 85.36 94.84 94.50 94.81 96.38
Yeast1 68.76 57.08 41.15 61.23 61.20 53.97 35.03 10.14 62.83 68.96 67.72
Glass0 30.93 77.20 68.40 81.03 74.53 74.60 63.11 21.62 81.36 78.30 76.80
Glass1 55.40 74.51 50.79 70.07 68.57 73.36 44.10 45.38 71.45 69.35 73.96
Vehicle1 62.74 57.71 58.95 56.23 66.92 53.63 43.78 30.94 64.70 61.75 71.28
Vehicle3 69.14 53.18 65.12 58.83 65.21 50.07 42.55 39.67 61.37 67.99 71.01
Mean 64.92 69.62 70.88 67.58 75.81 69.72 59.81 45.97 73.28 75.21 79.34

Empirical study applying preprocessing

In the previous subsection we have studied the behaviour of the GBML and non-evolutionary approaches by means of an empirical study on the original imbalanced data-sets. Our aim now is to study whether a preprocessing method (G. E. A. P. A. Batista, R. C. Prati, and M. C. Monard, "A study of the behaviour of several methods for balancing machine learning training data," SIGKDD Explorations, vol. 6, no. 1, pp. 20–29, 2004 and A. Estabrooks, T. Jo, and N. Japkowicz, "A multiple resampling method for learning from imbalanced data sets," Computational Intelligence, vol. 20, no. 1, pp. 18–36, 2004) that rebalances the class distribution of the data set helps to improve the performance, and in which cases. We use the SMOTE method (N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: Synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002), as it is one of the best known preprocessing methods for compensating class imbalance.
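
A minimal sketch of this rebalancing step is given below, using the SMOTE implementation of the imbalanced-learn Python library as a stand-in for the preprocessing actually applied to the KEEL partitions; the synthetic data set and the parameter values are illustrative only.

  from collections import Counter
  from imblearn.over_sampling import SMOTE
  from sklearn.datasets import make_classification

  # Synthetic two-class problem with roughly a 9:1 class imbalance
  X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)
  print("before:", Counter(y))

  # SMOTE creates synthetic minority examples by interpolating among k nearest neighbours
  X_res, y_res = SMOTE(k_neighbors=5, random_state=0).fit_resample(X, y)
  print("after:", Counter(y_res))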

Table 17 shows the results of all the GBML methods for all data sets. The results shown are the geometric mean of the true rates and the average ranking of each method. We stress in boldface the best value within each family. This table can also be downloaded as an Excel document by clicking on the corresponding link.

Table 17. Results with the geometric mean of the true rates for GBML algorithms for rule induction in classification with imbalanced data-sets applying SMOTE preprocessing

  Michigan Approach IRL Approach GCCL Approach Pittsburgh Approach HEDT Approach
Data-set XCS UCS SIA HIDER CORE OCEC COGIN GIL Pitts-GIRLA DMEL GASSIST OIGA ILGA DT-GA Oblique-DT TARGET
Glass1 75.53 58.40 82.33 68.45 62.43 63.01 46.65 50.73 0.00 35.43 80.07 68.46 64.91 70.03 71.31 61.16
Ecoli0vs1 97.63 97.35 95.15 90.17 97.97 97.26 85.65 86.97 95.95 5.16 97.94 97.67 96.57 98.31 96.90 95.10
Wisconsin 96.73 95.90 96.05 96.34 94.72 95.05 94.92 95.76 19.08 37.15 95.10 91.59 92.21 94.80 93.84 94.80
Pima 70.27 71.49 67.08 68.13 70.36 55.99 36.37 46.83 30.31 9.34 73.98 69.78 70.56 70.97 66.60 68.99
Iris0 100.00 100.00 99.49 99.49 100.00 94.91 93.31 93.82 0.00 74.24 100.00 98.47 96.96 98.97 97.89 100.00
Glass0 81.30 34.39 83.92 70.54 69.70 71.01 62.42 71.00 0.00 0.00 84.27 76.66 77.45 77.90 75.36 65.71
Yeast1 69.74 64.28 63.95 66.22 65.85 47.12 25.30 32.33 31.63 0.00 72.73 68.56 67.98 70.76 67.88 65.99
Vehicle2 96.88 94.30 77.41 83.36 39.73 90.60 73.59 75.46 0.00 57.46 95.24 90.97 90.71 94.29 93.91 76.22
Vehicle1 74.04 65.30 64.71 69.24 62.71 66.40 62.68 61.04 29.36 7.13 76.28 62.27 64.26 67.85 71.04 62.29
Vehicle3 74.18 68.02 66.66 67.93 19.55 69.17 60.55 64.54 0.00 15.78 73.67 65.95 65.30 66.85 68.04 65.14
Haberman 56.47 45.49 50.86 63.85 65.93 53.50 37.30 38.40 36.04 16.71 60.69 52.09 40.97 61.59 50.79 54.34
Glass0123vs456 89.48 76.13 93.48 83.16 86.03 79.71 62.79 80.92 88.04 65.18 93.13 85.71 87.09 92.35 91.77 92.42
Vehicle0 93.99 92.54 84.52 85.73 75.47 90.45 74.42 76.12 33.26 12.42 94.42 91.62 90.31 90.86 92.59 78.24
Ecoli1 89.04 48.42 84.38 85.70 89.44 37.15 37.39 41.04 0.00 14.90 85.25 86.00 87.04 90.01 82.01 87.75
New-Thyroid2 96.57 96.23 98.60 96.28 89.60 85.23 78.10 84.91 13.09 79.53 95.72 93.29 95.67 96.49 96.85 93.44
New-Thyroid1 97.41 97.96 98.30 96.56 93.46 86.97 75.41 85.35 38.26 78.38 94.84 96.47 93.40 96.06 98.03 93.55
Ecoli2 89.29 79.09 91.00 90.43 87.39 58.77 42.72 53.40 87.91 2.65 89.77 83.00 86.02 87.25 89.38 82.30
Segment0 98.62 99.06 99.19 92.33 73.04 93.92 89.77 92.10 0.00 14.11 98.98 98.53 98.96 99.11 98.91 94.08
Glass6 86.88 80.98 87.11 83.93 88.46 91.85 77.05 91.42 91.33 36.66 89.46 91.88 85.59 89.02 88.73 87.94
Yeast3 91.84 89.35 87.16 86.23 86.84 80.41 64.60 76.20 0.00 0.00 92.03 90.14 91.60 88.77 84.19 88.27
Ecoli3 83.79 59.29 84.43 82.71 82.86 50.18 45.68 43.24 18.44 15.49 83.08 83.69 72.06 81.43 75.53 85.03
Page-Blocks0 94.46 85.77 92.24 92.17 73.41 92.17 48.04 73.00 34.51 0.00 88.87 88.28 86.35 94.86 93.71 83.47
Vowel0 99.44 89.05 99.94 84.96 79.89 65.64 61.54 60.17 17.80 42.02 96.45 92.26 86.14 93.72 98.04 88.46
Glass2 64.09 0.00 68.23 26.02 60.78 49.30 63.77 64.85 0.00 0.00 54.69 58.43 44.04 51.36 66.13 59.21
Ecoli4 90.65 68.33 90.70 76.54 90.56 74.63 64.05 74.59 0.00 29.47 86.35 86.85 80.97 74.55 84.65 83.11
Glass4 83.20 34.05 87.44 59.47 60.98 78.92 58.57 73.48 0.00 61.14 68.92 62.63 78.75 87.50 83.37 85.97
Abalone9-18 69.27 38.36 69.18 69.01 69.45 50.03 45.92 48.02 0.00 4.17 70.24 59.51 61.19 52.79 65.42 68.02
Glass5 91.98 14.14 91.39 72.98 86.63 91.52 86.79 76.99 0.00 34.96 91.47 46.80 33.05 86.78 91.38 84.50
Yeast2vs8 68.61 59.65 70.61 61.74 72.11 69.11 64.57 69.37 0.00 23.05 76.77 30.90 33.96 81.01 77.11 71.15
Yeast4 80.56 56.29 72.36 66.27 80.57 48.67 29.10 37.67 32.96 1.18 76.72 81.71 72.36 66.66 62.44 82.07
Yeast5 96.40 84.04 94.79 97.35 93.70 64.70 50.63 68.92 0.00 13.84 95.58 85.88 89.77 90.93 89.14 93.04
Yeast6 87.04 77.37 83.10 84.71 77.62 66.92 51.40 60.27 0.00 0.00 80.81 72.55 76.80 80.57 75.20 85.27
Abalone19 67.08 0.00 23.35 64.58 67.01 55.84 55.14 55.43 0.00 0.70 48.18 62.42 47.34 15.58 18.67 63.51
Mean 84.92 67.30 81.79 78.26 76.19 71.70 60.79 66.80 21.15 23.89 83.69 77.91 75.95 80.61 80.51 80.02
 

Finally, we present the list of algorithms that have been selected as representative of each family for imbalanced classification when applying SMOTE preprocessing. The criterion for choosing each method is the same as we defined in the previous section of the study.

  1. Chromosome = Rule:
    1. Michigan approach: XCS.
    2. IRL approach: SIA.
    3. GCCL approach: CORE.
  2. Chromosome = Rule Set (Pittsburgh): GASSIST.
  3. HEDTs: DT-GA.

Table 18 shows the results with SMOTE preprocessing for the selected GBML algorithms and the classical non-evolutionary algorithms. The best methods in this case are XCS and GASSIST among the GBML algorithms, followed by C4.5 and C4.5-Rules among the non-evolutionary approaches.

As with the previous tables of results, this table can be downloaded as an Excel document by clicking on the corresponding link.

Table 18. Results for imbalanced data sets classification with the best GBML algorithms for rule induction and state-of-the-art non-evolutionary algorithms applying SMOTE preprocessing

Data-set XCS SIA CORE GASSIST DT-GA CART AQ CN2 C4.5 C4.5-Rules RIPPER
Glass1 75.53 82.33 62.43 80.07 70.03 73.36 47.25 40.77 70.49 72.04 68.37
Ecoli0vs1 97.63 95.15 97.97 97.94 98.31 96.86 86.65 86.65 97.24 97.58 95.13
Wisconsin 96.73 96.05 94.72 95.10 94.80 92.23 94.55 93.98 96.19 95.58 94.41
Pima 70.27 67.08 70.36 73.98 70.97 75.96 8.78 47.90 72.92 70.47 69.44
Iris0 100.00 99.49 100.00 100.00 98.97 100.00 95.34 94.34 98.97 98.97 97.89
Glass0 81.30 83.92 69.70 84.27 77.90 74.60 58.84 72.23 79.02 75.76 79.52
Yeast1 69.74 63.95 65.85 72.73 70.76 53.97 24.93 27.93 72.23 69.94 67.92
Vehicle2 96.88 77.41 39.73 95.24 94.29 93.52 71.49 68.12 95.79 95.46 96.70
Vehicle1 74.04 64.71 62.71 76.28 67.85 53.63 41.24 52.38 71.90 72.13 66.52
Vehicle3 74.18 66.66 19.55 73.67 66.85 50.07 35.35 56.74 72.62 69.18 64.85
Haberman 56.47 50.86 65.93 60.69 61.59 47.07 2.98 39.58 64.03 66.90 34.78
Glass0123vs456 89.48 93.48 86.03 93.13 92.35 91.01 55.14 80.57 90.05 87.11 85.53
Vehicle0 93.99 84.52 75.47 94.42 90.86 92.28 75.29 76.30 92.62 92.25 93.52
Ecoli1 89.04 84.38 89.44 85.25 90.01 82.91 40.79 42.14 88.26 88.26 85.09
New-Thyroid2 96.57 98.60 89.60 95.72 96.49 91.18 83.27 86.12 95.93 97.70 97.11
New-Thyroid1 97.41 98.30 93.46 94.84 96.06 94.62 83.25 85.35 93.39 94.85 94.34
Ecoli2 89.29 91.00 87.39 89.77 87.25 81.24 51.91 49.77 89.08 88.64 84.06
Segment0 98.62 99.19 73.04 98.98 99.11 98.60 91.96 78.02 99.54 98.91 99.14
Glass6 86.88 87.11 88.46 89.46 89.02 80.89 60.39 86.59 92.62 89.58 94.95
Yeast3 91.84 87.16 86.84 92.03 88.77 80.82 75.14 76.40 92.26 93.55 88.26
Ecoli3 83.79 84.43 82.86 83.08 81.43 68.12 39.98 19.99 85.78 81.83 87.16
Page-Blocks0 94.46 92.24 73.41 88.87 94.86 71.04 0.64 73.43 94.51 94.55 94.36
Vowel0 99.44 99.94 79.89 96.45 93.72 84.68 55.36 49.86 94.69 96.59 97.47
Glass2 64.09 68.23 60.78 54.69 51.36 25.08 62.66 65.12 64.04 68.04 62.33
Ecoli4 90.65 90.70 90.56 86.35 74.55 77.88 55.61 74.59 86.48 84.28 82.39
Glass4 83.20 87.44 60.98 68.92 87.50 76.37 71.09 76.96 72.13 68.00 68.94
Abalone9-18 69.27 69.18 69.45 70.24 52.79 25.14 7.08 49.71 63.19 66.46 73.80
Glass5 91.98 91.39 86.63 91.47 86.78 93.90 90.61 71.89 91.40 91.14 77.77
Yeast2vs8 68.61 70.61 72.11 76.77 81.01 50.86 41.69 69.44 82.33 74.18 58.75
Yeast4 80.56 72.36 80.57 76.72 66.66 10.84 28.71 22.59 73.39 75.83 65.67
Yeast5 96.40 94.79 93.70 95.58 90.93 71.45 34.16 68.16 95.06 95.07 84.71
Yeast6 87.04 83.10 77.62 80.81 80.57 68.03 42.71 60.22 84.18 80.31 73.25
Abalone19 67.08 23.35 67.01 48.18 15.58 0.00 18.26 55.41 8.00 7.99 38.80
Mean 84.92 81.79 76.19 83.69 80.61 70.55 52.52 63.61 82.43 81.79 79.48

Page Maintained by A. Fernández