BigDaPR Library

BigDaPTOOLS Project


This website presents five data preprocessing libraries. Four of them address specific problems: data reduction with autoencoders, and preprocessing for imbalanced, ordinal, and noisy data. The fifth, smartdata, is a general-purpose data preprocessing library that collects state-of-the-art algorithms in R and acts as a container, providing a uniform interface to other libraries.


smartdata: Data Preprocessing (August 2018)

Eases data preprocessing tasks by providing a data flow based on a pipe operator, which simplifies cleaning, transformation, oversampling, instance and feature selection, and the handling of imperfect data (noisy data and missing values). This library does not implement the algorithms itself; it offers a uniform interface to the most important data preprocessing libraries (infotheo, discretization, outliers, NoiseFiltersR, Boruta, FSelector, unbalanced, RoughSets, Amelia, imbalance, DMwR, missForest, missMDA, denoiseR, VIM).
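The idea of a pipe-based flow over a uniform interface can be sketched in a few lines. The example below is a minimal Python illustration, not smartdata's R API: the registry, the `Step` class, and the task and method names are all hypothetical, and the two registered functions stand in for calls that a wrapper library would delegate to third-party packages.

```python
from statistics import mean

# Hypothetical registry mapping (task, method) to an implementation.
# A wrapper library would delegate these entries to other packages.
REGISTRY = {}

def register(task, method):
    def decorator(fn):
        REGISTRY[(task, method)] = fn
        return fn
    return decorator

@register("impute_missing", "mean")
def impute_mean(rows):
    """Replace None with the mean of the observed values in the same column."""
    columns = list(zip(*rows))
    means = [mean(v for v in col if v is not None) for col in columns]
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]

@register("normalize", "min_max")
def min_max(rows):
    """Rescale every column to the [0, 1] range."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)]
            for row in rows]

class Step:
    """One preprocessing step, selected by task and method name."""
    def __init__(self, task, method):
        self.fn = REGISTRY[(task, method)]

    def __ror__(self, rows):
        # Makes `data | Step(...)` apply the step to the data.
        return self.fn(rows)

data = [[1.0, 10.0], [None, 20.0], [3.0, None]]
result = data | Step("impute_missing", "mean") | Step("normalize", "min_max")
print(result)
```

Every step has the same shape (data in, data out), so steps chain freely and a new backend only needs one more registry entry.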

ruta: Implementation of Unsupervised Neural Architectures (May 2018)

Implementation of several unsupervised neural networks, covering the construction of their architecture, their training, and their evaluation. The available networks are autoencoders and their main variants: sparse, contractive, denoising, robust, and variational.
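To make the build, train, and evaluate cycle concrete, here is a minimal pure-Python sketch of the simplest possible autoencoder: linear, with one hidden unit, trained by stochastic gradient descent on the squared reconstruction error. It does not use ruta or reproduce its API; the function name, the toy data, and the hyperparameters are assumptions chosen for illustration.

```python
import random

def train_linear_autoencoder(data, epochs=300, lr=0.02, seed=0):
    """Train a linear autoencoder with a single hidden unit by plain SGD."""
    rng = random.Random(seed)
    n_in = len(data[0])
    enc = [rng.uniform(-0.5, 0.5) for _ in range(n_in)]  # input -> code
    dec = [rng.uniform(-0.5, 0.5) for _ in range(n_in)]  # code -> output

    def mean_error():
        # Mean squared reconstruction error over the data set.
        total = 0.0
        for x in data:
            code = sum(w * v for w, v in zip(enc, x))
            total += sum((code * d - v) ** 2 for d, v in zip(dec, x))
        return total / len(data)

    history = [mean_error()]
    for _ in range(epochs):
        for x in data:
            code = sum(w * v for w, v in zip(enc, x))      # encode
            err = [code * d - v for d, v in zip(dec, x)]   # decode and compare
            # Gradients of 0.5 * ||err||^2, computed before either update.
            grad_code = sum(e * d for e, d in zip(err, dec))
            dec = [d - lr * e * code for d, e in zip(dec, err)]
            enc = [w - lr * grad_code * v for w, v in zip(enc, x)]
        history.append(mean_error())
    return enc, dec, history

# Toy data lying on the line y = 2x, so a one-dimensional code suffices.
data = [[x, 2.0 * x] for x in (-1.0, -0.5, 0.5, 1.0)]
enc, dec, history = train_linear_autoencoder(data)
print(history[0], history[-1])
```

The variants named above change this basic recipe, for example by corrupting the input before encoding (denoising) or by adding a penalty on the code (sparse, contractive).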

imbalance: Preprocessing Algorithms for Imbalanced Datasets (February 2018)

This library includes recent relevant oversampling algorithms to improve the quality of data in imbalanced datasets, prior to performing a learning task.
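As an illustration of what oversampling does, the sketch below generates synthetic minority instances by interpolating between a minority point and one of its nearest minority neighbours, in the style of SMOTE. This technique is used here only because it is compact; it is not claimed to be one of the algorithms shipped in the library, and the function name and parameters are hypothetical.

```python
import random

def interpolate_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points between minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` (squared Euclidean distance).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic

minority = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]]
synthetic = interpolate_oversample(minority, n_new=6)
balanced_minority = minority + synthetic
print(len(balanced_minority))
```

The synthetic points are added to the training set only, before the learning task; the test data should stay untouched.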

Associated paper:

OCAPIS (October 2018)

Package for ordinal data classification and data preprocessing implemented in Scala.

Associated paper:

NoiseFiltersR: Label Noise Filters for Data Preprocessing in Classification (July 2016)

This library includes an extensive implementation of state-of-the-art and classical algorithms to preprocess label noise in classification problems.
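A label noise filter can be sketched with a simple nearest-neighbour vote: an instance is kept only if its label agrees with the majority label of its k nearest neighbours, in the spirit of edited nearest neighbours. The code below is an illustrative pure-Python sketch under that assumption, not the library's implementation; the function name and the toy data are hypothetical.

```python
from collections import Counter

def knn_label_filter(points, labels, k=3):
    """Return the indices of instances whose label matches their k-NN majority."""
    keep = []
    for i, (p, y) in enumerate(zip(points, labels)):
        # Squared distances to every other instance, nearest first.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points) if j != i
        )
        votes = Counter(labels[j] for _, j in dists[:k])
        majority = votes.most_common(1)[0][0]
        if majority == y:
            keep.append(i)
    return keep

# Two well-separated groups; index 2 carries a suspicious label.
points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (0.3, 0.0),
          (5.0, 0.0), (5.1, 0.0), (5.2, 0.0), (5.3, 0.0)]
labels = ["a", "a", "b", "a", "b", "b", "b", "b"]

keep = knn_label_filter(points, labels)
clean_points = [points[i] for i in keep]
clean_labels = [labels[i] for i in keep]
print(keep)  # index 2 is filtered out
```

With an odd k and two classes the vote cannot tie; with more classes a tie-breaking rule would have to be chosen.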

Associated paper: