spark-MDLP-discretization

Minimum Description Length Discretizer

This package implements Fayyad and Irani's discretizer, based on the Minimum Description Length Principle (MDLP), to discretize continuous datasets in a distributed setting. It is a distributed adaptation of the original algorithm, with some important changes.
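To give an idea of the criterion behind MDLP, here is a minimal single-machine sketch in Python of the Fayyad and Irani acceptance test for one candidate cut point. It is not the distributed implementation shipped in this package: the function names `entropy` and `mdlp_accepts` are hypothetical and chosen only for illustration, and the class labels on each side of the cut are assumed to fit in plain Python lists.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mdlp_accepts(left, right):
    """Return True if cutting a set into `left` and `right` passes the MDLP test.

    `left` and `right` are the class labels of the points on each side of
    the candidate cut point (hypothetical helper, not this package's API).
    """
    if not left or not right:
        raise ValueError("both sides of the cut must be non-empty")
    whole = left + right
    n = len(whole)
    # Number of distinct classes in the whole set and in each side.
    k, k1, k2 = len(set(whole)), len(set(left)), len(set(right))
    ent, ent1, ent2 = entropy(whole), entropy(left), entropy(right)
    # Information gain obtained by the cut.
    gain = ent - (len(left) / n) * ent1 - (len(right) / n) * ent2
    # Cost term of the MDLP criterion.
    delta = math.log2(3 ** k - 2) - (k * ent - k1 * ent1 - k2 * ent2)
    return gain > (math.log2(n - 1) + delta) / n


# A cut that separates the classes cleanly is accepted ...
print(mdlp_accepts(["a"] * 10, ["b"] * 10))
# ... while a cut that leaves both sides equally mixed is rejected.
print(mdlp_accepts(["a", "b"] * 5, ["a", "b"] * 5))
```

In the original algorithm this test is applied recursively: the best cut point of an interval is kept only if it passes, and each resulting sub-interval is then examined in the same way.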

Use

To include this package in your Spark application via spark-shell or pySpark, pass it with the --packages option, for example:

$SPARK_HOME/bin/spark-shell --packages sramirez:spark-MDLP-discretization:1.4.1

where $SPARK_HOME is your Spark installation directory.

Release

Latest version: 1.4.1 / Date: 2017-09-25 / Scala version: 2.11

Reference

S. Ramírez‐Gallego, S. García, H. Mouriño‐Talín, D. Martínez‐Rego, V. Bolón‐Canedo, A. Alonso‐Betanzos, J.M. Benítez, F. Herrera. "Data discretization: taxonomy and big data challenge". WIREs Data Mining Knowl Discov 2016, 6: 5-21.