Analyzing the Presence of Noise in Multi-class Problems: Alleviating its Influence with the One-vs-One Decomposition
This website contains the complementary material for the paper:
José A. Sáez, Mikel Galar, Julián Luengo, Francisco Herrera, Analyzing the Presence of Noise in Multi-class Problems: Alleviating its Influence with the One-vs-One Decomposition. Knowledge and Information Systems, submitted.
The website is organized into the following sections:
Abstract
The presence of noise in data is a common problem that produces several negative consequences in classification problems. In multi-class problems, these consequences are aggravated in terms of accuracy, building time and complexity of the classifiers. In these cases, an interesting approach to reduce the effect of noise is to decompose the problem into several binary subproblems, reducing the complexity and, consequently, dividing the effects caused by noise into each of these subproblems.
This paper analyzes the usage of decomposition strategies, and more specifically the One-vs-One scheme, to deal with noisy multi-class datasets. In order to investigate whether the decomposition is able to reduce the effect of noise or not, a large number of datasets are created introducing different levels and types of noise, as suggested in the literature. Several well-known classification algorithms, with or without decomposition, are trained on them in order to check when decomposition is advantageous. The results obtained show that methods using the One-vs-One strategy lead to better performances and more robust classifiers when dealing with noisy data, especially with the most disruptive noise schemes.
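To make the decomposition concrete, the following is a minimal Python sketch of the One-vs-One idea: one binary subproblem per pair of classes, with the binary outputs combined at prediction time. The function names `ovo_subproblems` and `ovo_vote` are hypothetical, and plain majority voting is only an assumed aggregation; it is not necessarily the one used in the paper.

```python
from collections import Counter
from itertools import combinations


def ovo_subproblems(X, y):
    """Build one binary subproblem for every unordered pair of classes."""
    classes = sorted(set(y))
    subproblems = {}
    for a, b in combinations(classes, 2):
        # Keep only the examples that belong to one of the two classes.
        subproblems[(a, b)] = [
            (x, label) for x, label in zip(X, y) if label in (a, b)
        ]
    return subproblems


def ovo_vote(pairwise_predictions):
    """Combine the binary outputs by majority voting (assumed aggregation)."""
    votes = Counter(pairwise_predictions.values())
    return votes.most_common(1)[0][0]


# Toy data: 3 classes give 3 * (3 - 1) / 2 = 3 binary subproblems.
X = [[0.1], [0.2], [0.9], [1.1], [2.0], [2.2]]
y = ["a", "a", "b", "b", "c", "c"]
subs = ovo_subproblems(X, y)
```

Each subproblem contains only the examples of its two classes, so a mislabeled example can only affect the binary classifiers whose class pair includes its (noisy) label.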
Base Datasets
The experimentation is based on twenty real-world multi-class classification problems from the KEEL dataset repository. The following table lists the datasets sorted by number of classes (#CLA). For each dataset, it also gives the number of examples (#EXA) and the number of attributes (#ATT), along with the number of real, integer and nominal attributes (R/I/N).
You can also download all these datasets by clicking here:
Performance Results
1. Results of OVO with Class Noise
a) Datasets with Class Noise:
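This page does not describe the noise schemes. As an illustration only, a random class noise scheme could be sketched as below, assuming that a fraction `noise_level` of the examples is relabeled with a different class chosen uniformly at random. The function name `add_random_class_noise`, its parameters and the exact selection rule are assumptions, not the procedure used to build the datasets offered here.

```python
import random


def add_random_class_noise(y, noise_level, seed=0):
    """Return a copy of y with a fraction of the labels replaced.

    Assumed scheme: round(noise_level * len(y)) examples are picked at
    random and each receives a different class chosen uniformly.
    Requires at least two distinct classes in y.
    """
    rng = random.Random(seed)
    classes = sorted(set(y))
    noisy = list(y)
    n_noisy = round(noise_level * len(y))
    for i in rng.sample(range(len(y)), n_noisy):
        # Exclude the current label so every selected example really changes.
        alternatives = [c for c in classes if c != y[i]]
        noisy[i] = rng.choice(alternatives)
    return noisy


labels = ["a", "b", "c", "d"] * 5          # 20 clean labels
noisy_labels = add_random_class_noise(labels, noise_level=0.2)
changed = sum(old != new for old, new in zip(labels, noisy_labels))
```

With 20 labels and a noise level of 0.2, exactly 4 labels are changed in this sketch, because the original label is excluded from the candidates.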
b) Results on Datasets with Class Noise:
The following tables provide the XLS files with the results for each noise scheme considered. The Accuracy column contains the XLS file with the test accuracy, whereas the Robustness column contains the XLS file with the RLA results of each classification algorithm at each level of induced noise for each dataset.
Class Noise | Accuracy | Robustness |
---|---|---|
Random | | |
Pairwise | | |
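The RLA measure reported in the Robustness column is not defined on this page. Assuming it denotes the relative loss of accuracy, that is, the accuracy lost at a given noise level relative to the accuracy without induced noise, it could be computed as in the following sketch. The function name `rla` and its argument names are hypothetical.

```python
def rla(acc_clean, acc_noisy):
    """Relative loss of accuracy (assumed definition).

    acc_clean: test accuracy without induced noise (must be > 0).
    acc_noisy: test accuracy at a given noise level.
    Lower values indicate a more robust classifier.
    """
    return (acc_clean - acc_noisy) / acc_clean


# Example: accuracy drops from 0.80 to 0.60, a relative loss of about 0.25.
loss = rla(0.80, 0.60)
```

Under this definition, a value of 0 means the classifier is unaffected by the noise, and a negative value means accuracy improved on the noisy data.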
2. Results of OVO with Attribute Noise
a) Datasets with Attribute Noise:
b) Results on Datasets with Attribute Noise:
Attribute Noise | Accuracy | Robustness |
---|---|---|
Random | | |
Gaussian | | |