Analyzing the Presence of Noise in Multi-class Problems: Alleviating its Influence with the One-vs-One Decomposition

This website contains the complementary material for the paper:

José A. Sáez, Mikel Galar, Julián Luengo, Francisco Herrera, Analyzing the Presence of Noise in Multi-class Problems: Alleviating its Influence with the One-vs-One Decomposition. Knowledge and Information Systems, submitted.

The website is organized into the following sections:

  1. Abstract
  2. Base Datasets
  3. Performance Results

Abstract

The presence of noise in data is a common problem that produces several negative consequences in classification problems. In multi-class problems, these consequences are aggravated in terms of accuracy, building time and complexity of the classifiers. In these cases, an interesting approach to reduce the effect of noise is to decompose the problem into several binary subproblems, reducing the complexity and, consequently, dividing the effects caused by noise into each of these subproblems.

This paper analyzes the usage of decomposition strategies, and more specifically the One-vs-One scheme, to deal with noisy multi-class datasets. In order to investigate whether the decomposition is able to reduce the effect of noise or not, a large number of datasets are created introducing different levels and types of noise, as suggested in the literature. Several well-known classification algorithms, with or without decomposition, are trained on them in order to check when decomposition is advantageous. The results obtained show that methods using the One-vs-One strategy lead to better performances and more robust classifiers when dealing with noisy data, especially with the most disruptive noise schemes.
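As an illustration of the decomposition described above, the following minimal sketch splits a multi-class dataset into one binary subproblem per pair of classes and combines the pairwise predictions by majority voting. It is not the implementation used in the paper: the function names (`ovo_subproblems`, `ovo_predict`), the toy base learner `train_centroid` and the voting aggregation are assumptions chosen only to keep the example short.

```python
from collections import Counter
from itertools import combinations


def ovo_subproblems(X, y):
    """Split a multi-class dataset into one binary subproblem per class pair."""
    classes = sorted(set(y))
    subproblems = {}
    for a, b in combinations(classes, 2):
        # Keep only the examples that belong to one of the two classes.
        subproblems[(a, b)] = [(x, lab) for x, lab in zip(X, y) if lab in (a, b)]
    return subproblems


def ovo_predict(binary_classifiers, x):
    """Aggregate the pairwise predictions by simple majority voting (an assumption)."""
    votes = Counter(clf(x) for clf in binary_classifiers.values())
    return votes.most_common(1)[0][0]


def train_centroid(rows):
    """Hypothetical toy base learner: nearest class mean on 1-D inputs."""
    by_class = {}
    for x, lab in rows:
        by_class.setdefault(lab, []).append(x)
    means = {lab: sum(v) / len(v) for lab, v in by_class.items()}
    return lambda x: min(means, key=lambda lab: abs(x - means[lab]))


# Toy data with three classes (made up for illustration only).
X = [0.0, 0.2, 1.0, 1.1, 2.0, 2.3]
y = ["a", "a", "b", "b", "c", "c"]

subproblems = ovo_subproblems(X, y)
classifiers = {pair: train_centroid(rows) for pair, rows in subproblems.items()}
prediction = ovo_predict(classifiers, 1.05)
```

Each binary classifier only sees the examples of its two classes, so a mislabeled example can affect at most the subproblems that involve its assigned class.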

Base Datasets

The experimentation is based on twenty real-world multi-class classification problems from the KEEL dataset repository. The following table lists the datasets sorted by the number of classes (#CLA). For each dataset, the number of examples (#EXA) and the number of attributes (#ATT) are also given, along with the number of real, integer and nominal attributes (R/I/N).

All these datasets can also be downloaded as a single ZIP file.

Performance Results

1. Results of OVO with Class Noise

a) Datasets with Class Noise:

  • Random class noise scheme
  • Pairwise class noise scheme

b) Results on Datasets with Class Noise:

The following table provides the XLS files with the results for each noise scheme considered. The Accuracy column contains the XLS file with the test accuracy, whereas the Robustness column contains the XLS file with the relative loss of accuracy (RLA) of each classification algorithm at each level of induced noise for each dataset.

Class Noise    Accuracy    Robustness
Random         XLS file    XLS file
Pairwise       XLS file    XLS file
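The RLA values in the Robustness files can be read with the following sketch in mind. It assumes that RLA stands for relative loss of accuracy, computed as the accuracy lost at a given noise level divided by the accuracy obtained without induced noise. The function name and the numbers in the usage lines are hypothetical and are not taken from the result files.

```python
def relative_loss_of_accuracy(acc_clean, acc_noisy):
    """Return (acc_clean - acc_noisy) / acc_clean.

    acc_clean: test accuracy without induced noise.
    acc_noisy: test accuracy at a given level of induced noise.
    """
    if acc_clean <= 0:
        raise ValueError("acc_clean must be positive")
    return (acc_clean - acc_noisy) / acc_clean


# Hypothetical accuracies (in percent), for illustration only.
rla_example = relative_loss_of_accuracy(80.0, 60.0)
```

Under this definition, lower values indicate a more robust classifier, and a value of zero means that the induced noise did not reduce the test accuracy.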


2. Results of OVO with Attribute Noise

a) Datasets with Attribute Noise:

  • Random attribute noise scheme
  • Gaussian attribute noise scheme
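The two attribute noise schemes can be sketched for a single numeric attribute as follows. The definitions are assumptions, since this page does not state them: random attribute noise replaces a fraction of the values with uniform draws from the attribute's range, and Gaussian attribute noise adds a zero-mean Gaussian value with standard deviation (max - min)/5 to a fraction of the values. Clipping to the observed range, the function names and the parameters are choices made for this example only.

```python
import random


def random_attribute_noise(column, level, rng):
    """Replace a fraction `level` of the values with uniform random values.

    Assumption: the attribute domain is approximated by the observed range.
    """
    lo, hi = min(column), max(column)
    noisy = list(column)
    for i in rng.sample(range(len(column)), round(level * len(column))):
        noisy[i] = rng.uniform(lo, hi)
    return noisy


def gaussian_attribute_noise(column, level, rng):
    """Add zero-mean Gaussian noise to a fraction `level` of the values.

    Assumptions: standard deviation of (max - min) / 5, and results
    clipped back into the observed range.
    """
    lo, hi = min(column), max(column)
    sigma = (hi - lo) / 5
    noisy = list(column)
    for i in rng.sample(range(len(column)), round(level * len(column))):
        noisy[i] = min(hi, max(lo, column[i] + rng.gauss(0.0, sigma)))
    return noisy


# Toy numeric attribute (made up for illustration only).
toy_column = [float(v) for v in range(10)]
toy_uniform = random_attribute_noise(toy_column, 0.3, random.Random(1))
toy_gaussian = gaussian_attribute_noise(toy_column, 0.3, random.Random(1))
```

To corrupt a whole dataset under these assumptions, the same function would be applied to each attribute independently. Nominal attributes would need separate handling, which this sketch omits.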

b) Results on Datasets with Attribute Noise:

Attribute Noise    Accuracy    Robustness
Random             XLS file    XLS file
Gaussian           XLS file    XLS file