Workshop: Data Preprocessing in Data Mining

Organizers: Julián Luengo, Salvador García and Francisco Herrera.

This workshop is proposed for the IEEE International Conference on Data Mining (ICDM 2016), which will be held in Barcelona, Spain, on December 13-15, 2016.

Data preprocessing for Data Mining (DM) addresses one of the most important stages of the well-known Knowledge Discovery from Data process. Real-world data are likely to contain inconsistencies, errors, out-of-range values, impossible combinations of values and missing values. Most importantly, raw data are often not suitable for starting a DM process at all. In addition, the growing amount of data in business, science, industry and academia demands more complex mechanisms to analyze it. Data preprocessing makes the impractical possible by adapting the data to meet the input requirements of each DM algorithm.

Data preprocessing includes two groups of tasks. The first is data preparation, composed of the integration, cleaning, normalization and transformation of data. The second is data reduction, which aims to reduce the complexity of the data by detecting or removing irrelevant and noisy elements through feature selection, instance selection or discretization. The expected outcome of a reliable chain of preprocessing steps is a final data set that can be considered correct and useful for subsequent DM algorithms.

Objectives and Scope

This workshop aims to bring together researchers interested in the area described above. Specifically, we welcome contributions on the development of novel preprocessing techniques for DM problems, as well as approaches for emerging areas of DM such as Big Data.

Contributions to this workshop are expected to pay special attention to the rigorous motivation of the proposed approaches and to support all aspects of the developed models with a sound theoretical framework. Straightforward approaches lacking such a scientific foundation are discouraged.

An indicative, but not exhaustive, list of topics covered by this workshop includes:

  • Data preprocessing for classical DM problems: classification, regression, association rules, time series, etc.
  • Feature and instance selection
  • Noise filtering and correction
  • Missing values treatment
  • Data transformation
  • Discretization
  • Instance generation
  • Imbalanced data treatment: oversampling and undersampling
  • Data preprocessing for multilabel, multi-instance and ordinal classification
  • Data stream preprocessing
  • Data preprocessing for subgroup discovery
  • Big Data preprocessing
  • Data preprocessing for Deep Learning

Organizers and Contact

  • Julián Luengo. Contact information:
    Email address: julianlm@decsai.ugr.es
    Postal address: Department of Computer Science and Artificial Intelligence, University of Granada, E-18071 Granada, Spain
    Telephone number: +34-958-244258
    Fax number: +34-958-243317
  • Salvador García. Contact information:
    Email address: salvagl@decsai.ugr.es
    Postal address: Department of Computer Science and Artificial Intelligence, University of Granada, E-18071 Granada, Spain
    Telephone number: +34-958-244258
    Fax number: +34-958-243317
  • Francisco Herrera. Contact information:
    Email address: herrera@decsai.ugr.es
    Postal address: Department of Computer Science and Artificial Intelligence, University of Granada, E-18071 Granada, Spain
    Telephone number: +34-958-244258
    Fax number: +34-958-243317