BigDaPR Library

BigDaPTOOLS Project


This website presents five libraries for data preprocessing. They address data reduction with autoencoders, preprocessing of imbalanced datasets, and the handling of ordinal and noisy data. The collection also includes smartdata, a general-purpose R library that gathers state-of-the-art data preprocessing algorithms and acts as a container offering a uniform interface to other libraries.



smartdata: Data Preprocessing (August 2018)

Eases data preprocessing tasks by providing a data flow based on a pipe operator. It covers data cleaning, transformation, oversampling, instance and feature selection, and the treatment of imperfect data (noisy data and missing values). This library does not implement the algorithms itself; instead, it provides a uniform interface to the most important data preprocessing libraries (infotheo, discretization, outliers, NoiseFiltersR, Boruta, FSelector, unbalanced, RoughSets, Amelia, imbalance, DMwR, missForest, missMDA, denoiseR, VIM).

https://cran.r-project.org/web/packages/smartdata
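The pipe-based data flow can be illustrated outside R. The following is a minimal Python sketch of the idea only: each preprocessing step is a function from dataset to dataset, and a pipe feeds the output of one step into the next. The names `impute_mean`, `min_max_scale` and `pipe` are hypothetical and are not part of smartdata's API.

```python
from functools import reduce
from statistics import mean

# Hypothetical preprocessing steps for illustration; not smartdata functions.
def impute_mean(rows):
    """Replace None in each column with that column's mean
    (assumes every column has at least one observed value)."""
    cols = list(zip(*rows))
    means = [mean(v for v in col if v is not None) for col in cols]
    return [[means[j] if x is None else x for j, x in enumerate(row)] for row in rows]

def min_max_scale(rows):
    """Rescale each column to [0, 1]; constant columns become 0.0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(x - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
             for j, x in enumerate(row)] for row in rows]

def pipe(data, *steps):
    """Feed data through the steps from left to right, like a pipe operator."""
    return reduce(lambda acc, step: step(acc), steps, data)

data = [[1.0, 10.0], [None, 20.0], [3.0, None]]
result = pipe(data, impute_mean, min_max_scale)
print(result)  # [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]
```

Keeping every step with the same "dataset in, dataset out" signature is what lets steps be chained in any order.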

ruta: Implementation of Unsupervised Neural Architectures (May 2018)

Implements several unsupervised neural networks, covering everything from building their architecture to training and evaluating them. The available networks are autoencoders and their main variants: sparse, contractive, denoising, robust and variational.

https://ruta.software

https://cran.r-project.org/web/packages/ruta

imbalance: Preprocessing Algorithms for Imbalanced Datasets (February 2018)

This library includes recent, relevant oversampling algorithms that improve the quality of imbalanced datasets before a learning task is performed.

https://cran.r-project.org/web/packages/imbalance/

Associated paper: https://doi.org/10.1016/j.knosys.2018.07.035
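To make the idea of oversampling concrete, here is a minimal Python sketch of SMOTE-style interpolation, the classic technique that many oversampling methods build on. It is an illustration under stated assumptions, not one of the specific algorithms in the imbalance library: each synthetic sample lies on the segment between a randomly chosen minority instance and one of its `k` nearest minority neighbours. The function name `smote_like` and its parameters are hypothetical.

```python
import math
import random

def smote_like(minority, n_new, k=3, seed=0):
    """Hypothetical SMOTE-style oversampler: create n_new synthetic minority
    samples by interpolating between a random minority point and one of its
    k nearest minority neighbours. Requires at least two minority points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        idx = rng.randrange(len(minority))
        base = minority[idx]
        # k nearest neighbours of base among the other minority points
        others = [p for j, p in enumerate(minority) if j != idx]
        neighbours = sorted(others, key=lambda p: math.dist(p, base))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> nb, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic

minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
synthetic_points = smote_like(minority, n_new=4, k=2)
print(len(synthetic_points))  # 4
```

Because new points are interpolated rather than copied, the minority class grows without exact duplicates, which is the main motivation for this family of methods over random oversampling.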

OCAPIS (October 2018)

Package for ordinal data classification and data preprocessing implemented in Scala.

https://cristinahg.github.io/OCAPIS/

Associated paper: https://arxiv.org/abs/1810.09733

NoiseFiltersR: Label Noise Filters for Data Preprocessing in Classification (July 2016)

This library provides an extensive collection of classical and state-of-the-art algorithms for filtering label noise in classification problems.

https://cran.r-project.org/web/packages/NoiseFiltersR

Associated paper: https://journal.r-project.org/archive/2017/RJ-2017-027/RJ-2017-027.pdf
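As an example of what a label noise filter does, the following Python sketch implements Edited Nearest Neighbours (Wilson's editing), a classical filter of the kind this library collects. It is an independent illustration, not NoiseFiltersR's code or API: an instance is removed when its label disagrees with the majority label of its `k` nearest neighbours. The function name `enn_filter` is hypothetical.

```python
import math
from collections import Counter

def enn_filter(X, y, k=3):
    """Hypothetical Edited Nearest Neighbours filter: keep an instance only if
    its label matches the majority label of its k nearest neighbours
    (the instance itself is excluded from its own neighbourhood)."""
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        dists = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        votes = Counter(y[j] for _, j in dists[:k])
        majority, _ = votes.most_common(1)[0]
        if majority == yi:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]

# The point [0.15] labelled 'b' sits among 'a' instances and acts as label noise.
X = [[0.0], [0.1], [0.2], [0.3], [1.0], [1.1], [1.2], [0.15]]
y = ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']
X_clean, y_clean = enn_filter(X, y, k=3)
print(len(X_clean))  # 7
```

Choosing an odd `k` avoids voting ties in two-class problems; with more classes, ties are broken by `Counter.most_common` ordering.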