BigDaPR Library
BigDaPTOOLS Project
This website presents five libraries for data preprocessing. Four of them address specific problems: data reduction with autoencoders, preprocessing for imbalanced datasets, ordinal data, and noisy data. The fifth, smartdata, is a general-purpose library that gathers state-of-the-art data preprocessing algorithms in R; it acts as a container that provides a uniform interface to other libraries.
smartdata: Data Preprocessing (August 2018)
Eases data preprocessing tasks by providing a data flow based on a pipe operator, which simplifies chaining cleaning, transformation, oversampling, and instance/feature selection algorithms, as well as the treatment of imperfect data (noisy data and missing values). This library does not implement the algorithms itself; it offers a uniform interface to the most important data preprocessing libraries (infotheo, discretization, outliers, NoiseFiltersR, Boruta, FSelector, unbalanced, RoughSets, Amelia, imbalance, DMwR, missForest, missMDA, denoiseR, VIM).
https://cran.r-project.org/web/packages/smartdata
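The pipe-based data flow described above can be illustrated with a minimal sketch. It is written in Python purely for illustration and does not use smartdata's R interface; the names `Pipe`, `impute_mean`, and `min_max_scale` are hypothetical helpers invented for this example.

```python
from statistics import mean


class Pipe:
    """Wraps a one-argument step so that `data | step` applies it to data."""

    def __init__(self, func):
        self.func = func

    def __ror__(self, data):
        # Called for `data | self` when `data` does not handle `|` itself.
        return self.func(data)


def impute_mean(column):
    """Hypothetical step: replace missing values (None) with the column mean."""
    def step(rows):
        fill = mean(r[column] for r in rows if r[column] is not None)
        return [{**r, column: fill if r[column] is None else r[column]}
                for r in rows]
    return Pipe(step)


def min_max_scale(column):
    """Hypothetical step: rescale a column to the range [0, 1]."""
    def step(rows):
        values = [r[column] for r in rows]
        low, high = min(values), max(values)
        span = (high - low) or 1.0  # avoid division by zero on constant columns
        return [{**r, column: (r[column] - low) / span} for r in rows]
    return Pipe(step)


rows = [{"x": 2.0}, {"x": None}, {"x": 4.0}, {"x": 6.0}]

# Each step receives the output of the previous one, left to right.
clean = rows | impute_mean("x") | min_max_scale("x")
print(clean)
```

Every step has the same shape (data in, data out), which is what lets steps backed by different implementations be chained in any order.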
ruta: Implementation of Unsupervised Neural Architectures (May 2018)
Implementation of several unsupervised neural networks, from building their architecture to their training and evaluation. The available networks are autoencoders, including their main variants: sparse, contractive, denoising, robust, and variational.
https://cran.r-project.org/web/packages/ruta
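To make the autoencoder idea concrete, the following is a minimal sketch of a basic linear autoencoder trained by stochastic gradient descent. It is plain Python for illustration only and is not ruta's API; the function `train_autoencoder`, its parameters, and the toy data are assumptions made for this example, and none of the listed variants (sparse, contractive, denoising, robust, variational) is implemented.

```python
import random


def train_autoencoder(data, code_size=1, epochs=200, lr=0.01, seed=0):
    """Train a linear autoencoder; return encoder, decoder and error history."""
    rng = random.Random(seed)
    n_in = len(data[0])
    # Encoder W maps the input to the code; decoder V maps the code back.
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(code_size)]
    V = [[rng.uniform(-0.5, 0.5) for _ in range(code_size)] for _ in range(n_in)]
    history = []
    for _ in range(epochs):
        total = 0.0
        for x in data:
            code = [sum(W[j][i] * x[i] for i in range(n_in))
                    for j in range(code_size)]
            recon = [sum(V[i][j] * code[j] for j in range(code_size))
                     for i in range(n_in)]
            err = [recon[i] - x[i] for i in range(n_in)]
            total += sum(e * e for e in err)
            # Backpropagate the squared reconstruction error.
            grad_code = [sum(err[i] * V[i][j] for i in range(n_in))
                         for j in range(code_size)]
            for i in range(n_in):
                for j in range(code_size):
                    V[i][j] -= lr * err[i] * code[j]
            for j in range(code_size):
                for i in range(n_in):
                    W[j][i] -= lr * grad_code[j] * x[i]
        history.append(total / len(data))  # mean squared error this epoch
    return W, V, history


# Toy data: 2-D points on a line, so a 1-D code can represent them.
data = [(t, 2.0 * t) for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]
W, V, history = train_autoencoder(data)
print(history[0], history[-1])
```

The reconstruction error falls as training proceeds because the two-dimensional points can be compressed into a single code value without losing information; this is the data-reduction use of autoencoders.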
imbalance: Preprocessing Algorithms for Imbalanced Datasets (February 2018)
This library includes recent, relevant oversampling algorithms that improve data quality in imbalanced datasets before a learning task is performed.
https://cran.r-project.org/web/packages/imbalance/
Associated paper: https://doi.org/10.1016/j.knosys.2018.07.035
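As a rough illustration of what oversampling does, the sketch below generates synthetic minority-class points by interpolating between a point and one of its nearest minority neighbours, in the style of the classical SMOTE technique. It is plain Python, not the imbalance package's R interface, and SMOTE is used here only as a well-known example; the function name `smote_like` and its parameters are hypothetical.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points between minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority points to `base`, excluding `base` itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, mate)))
    return synthetic


minority = [(1.0, 1.0), (1.5, 1.2), (2.0, 1.0)]
new_points = smote_like(minority, n_new=4)
print(new_points)
```

The synthetic points are appended to the minority class before training, so the classifier sees a more balanced class distribution.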
OCAPIS (October 2018)
Package for ordinal data classification and data preprocessing implemented in Scala.
https://cristinahg.github.io/OCAPIS/
Associated paper: https://arxiv.org/abs/1810.09733
NoiseFiltersR: Label Noise Filters for Data Preprocessing in Classification (July 2016)
This library implements an extensive set of state-of-the-art and classical algorithms for handling label noise in classification problems.
https://cran.r-project.org/web/packages/NoiseFiltersR
Associated paper: https://journal.r-project.org/archive/2017/RJ-2017-027/RJ-2017-027.pdf
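The idea behind a label noise filter can be sketched with a simple nearest-neighbour rule: an instance is kept only if its label agrees with the majority label of its k nearest neighbours. This is a plain Python illustration in the spirit of edited nearest neighbour filtering, not NoiseFiltersR's R interface; the function `knn_label_filter` and the toy data are assumptions made for this example.

```python
from collections import Counter


def knn_label_filter(points, labels, k=3):
    """Return indices of instances whose label matches their k-NN majority."""
    keep = []
    for i, p in enumerate(points):
        # Indices of all other points, ordered by squared distance to p.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(p, points[j])),
        )
        votes = Counter(labels[j] for j in others[:k])
        majority = votes.most_common(1)[0][0]
        if majority == labels[i]:
            keep.append(i)
    return keep


points = [(0, 0), (0, 1), (1, 0), (1, 1),      # cluster "a"
          (5, 5), (5, 6), (6, 5), (6, 6),      # cluster "b"
          (0.5, 0.5)]                          # sits inside cluster "a"
labels = ["a"] * 4 + ["b"] * 4 + ["b"]         # last label is suspicious

kept = knn_label_filter(points, labels)
print(kept)
```

Here the last instance is flagged because all of its nearest neighbours carry the other label. A filter can then remove such instances or relabel them, depending on the strategy chosen.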