spark-MDLP-discretization
Minimum Description Length Discretizer
This package implements Fayyad's discretizer, which is based on the Minimum Description Length Principle (MDLP), to discretize continuous (non-discrete) datasets in a distributed setting. It is a distributed version of the original algorithm, with some important changes.
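To make the underlying idea concrete, here is a minimal single-machine sketch of the classic Fayyad and Irani MDLP criterion for one continuous feature. It is not this package's distributed implementation and does not reflect its changes. The function names (entropy, mdlp_accepts, cut_points) are hypothetical, and placing each cut at the midpoint between neighbouring values is an assumption made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mdlp_accepts(left, right):
    """True if splitting into `left` and `right` passes the MDLP criterion."""
    whole = list(left) + list(right)
    n = len(whole)
    k, k1, k2 = len(set(whole)), len(set(left)), len(set(right))
    ent, ent1, ent2 = entropy(whole), entropy(left), entropy(right)
    # Information gain of the candidate split.
    gain = ent - (len(left) / n) * ent1 - (len(right) / n) * ent2
    # Cost term from the MDLP derivation.
    delta = math.log2(3 ** k - 2) - (k * ent - k1 * ent1 - k2 * ent2)
    return gain > (math.log2(n - 1) + delta) / n


def _split(pairs):
    """Recursively split a list of (value, label) pairs sorted by value."""
    n = len(pairs)
    best = None  # (weighted entropy, split index)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # a cut is only possible between distinct values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        weighted = (i * entropy(left) + (n - i) * entropy(right)) / n
        if best is None or weighted < best[0]:
            best = (weighted, i)
    if best is None:
        return []
    i = best[1]
    left = [label for _, label in pairs[:i]]
    right = [label for _, label in pairs[i:]]
    if not mdlp_accepts(left, right):
        return []  # MDLP stopping rule: the split is not worth its cost
    cut = (pairs[i - 1][0] + pairs[i][0]) / 2  # assumed midpoint placement
    return _split(pairs[:i]) + [cut] + _split(pairs[i:])


def cut_points(values, labels):
    """MDLP cut points for one continuous feature, in ascending order."""
    return _split(sorted(zip(values, labels)))


print(cut_points([1, 2, 3, 4, 10, 11, 12, 13], [0, 0, 0, 0, 1, 1, 1, 1]))  # [7.0]
```

The sketch evaluates every boundary between distinct values and recurses on both halves until the criterion rejects a split, so a feature with no class signal yields no cut points at all.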
Use
To include this package in your Spark application via spark-shell or pySpark, pass it with the --packages option, for example:
$SPARK_HOME/bin/spark-shell --packages sramirez:spark-MDLP-discretization:1.4.1
where $SPARK_HOME is the path to your Spark installation.
Release
The latest version is 1.4.1 (date: 2017-09-25, Scala version: 2.11).
Reference
S. Ramírez‐Gallego, S. García, H. Mouriño‐Talín, D. Martínez‐Rego, V. Bolón‐Canedo, A. Alonso‐Betanzos, J.M. Benítez, F. Herrera. "Data discretization: taxonomy and big data challenge". WIREs Data Mining Knowl Discov 2016, 6: 5-21.