Estimating classifier behavior with noisy data: the Equalized Loss of Accuracy measure
This website contains complementary material to the paper:
José A. Sáez, Julián Luengo, Francisco Herrera, Estimating classifier behavior with noisy data: the Equalized Loss of Accuracy measure. Neurocomputing, submitted.
The website is organized into the following sections:
Abstract
Noise is common in any real-world data set and may adversely affect classifiers built under this type of disturbance. Some of these classifiers are widely recognized for their good performance when dealing with imperfect data. However, the noise robustness of the classifiers is an important issue in noisy environments and it must be carefully studied. Performance and robustness are two independent concepts that are usually considered separately, but the conclusions reached with one of these metrics do not necessarily imply the same conclusions with the other. Therefore, involving both concepts seems to be crucial in order to determine the expected behavior of the classifiers against noise. This paper proposes a new measure to establish the expected behavior of a classifier with noisy data, trying to minimize the problems of considering performance and robustness individually: the Equalized Loss of Accuracy (ELA). The advantages of ELA over other robustness metrics are studied, and all of them are also compared in a case study which considers the results of several classifiers with different noise tolerance over numerous data sets. Both the analysis of the distinct measures and the empirical results show that ELA is able to overcome some of the problems that the rest of the robustness metrics could produce, being useful to represent the behavior of the classifiers against noise.
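The abstract does not state the formula behind ELA. As a minimal sketch, the Python snippet below assumes the definitions ELA = (100 - A_x) / A_0 and, for comparison, the Relative Loss of Accuracy RLA = (A_0 - A_x) / A_0, where A_0 is the accuracy (in percent) without added noise and A_x is the accuracy at noise level x%. These definitions should be checked against the paper. The function names and the accuracy values are hypothetical and serve only to illustrate how a pure robustness measure and ELA can rank two classifiers differently.

```python
def rla(acc_clean, acc_noisy):
    """Relative Loss of Accuracy (assumed definition): share of the
    clean-data accuracy that is lost when noise is present."""
    return (acc_clean - acc_noisy) / acc_clean


def ela(acc_clean, acc_noisy):
    """Equalized Loss of Accuracy (assumed definition): error rate under
    noise, scaled by the clean-data accuracy. Inputs are percentages."""
    return (100.0 - acc_noisy) / acc_clean


# Hypothetical accuracies (clean, noisy) in percent, not taken from the paper.
classifiers = {
    "A": (90.0, 85.0),  # accurate, loses 5 points under noise
    "B": (60.0, 58.0),  # less accurate, loses only 2 points
}

for name, (a0, ax) in classifiers.items():
    print(f"{name}: RLA={rla(a0, ax):.4f}  ELA={ela(a0, ax):.4f}")

# Lower is better for both measures.
best_by_rla = min(classifiers, key=lambda n: rla(*classifiers[n]))
best_by_ela = min(classifiers, key=lambda n: ela(*classifiers[n]))
```

With these made-up numbers, RLA favors classifier B because it degrades less, while ELA favors classifier A because it remains far more accurate under noise. This is the kind of disagreement between robustness and performance that the abstract refers to.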
Datasets
The experimentation has been based on 32 data sets taken from the KEEL-dataset repository. The following table summarizes the properties of the originally selected data sets. For each data set, the number of instances (#EX), the number of numeric attributes (#AT) and the number of classes (#CL) are presented.
You can also download all these datasets by clicking here: ZIP file
Experimental results
You can download the file with the experimental results of the case study of the paper by clicking here: