spark-infotheoretic-feature-selection

Spark Infotheoretic Feature Selection Framework

A Feature Selection (FS) framework for distributed environments, integrated with Apache Spark MLlib. It contains a generic implementation of several information theory-based FS methods, such as mRMR (minimum redundancy maximum relevance), CMIM (conditional mutual information maximization) and JMI (joint mutual information). It is based on the information-theoretic framework proposed by Brown, adapted to the Big Data environment.
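To give an idea of what a criterion such as mRMR computes, here is a minimal single-machine sketch in plain Python. It is not this package's code and it is not distributed. It assumes discrete (already discretized) features, and the names `mutual_information` and `mrmr_select` are hypothetical helpers written for illustration only. At each step it greedily picks the feature whose relevance to the label, minus its mean redundancy with the features already selected, is highest.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y), in bits, of two discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), expressed with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def mrmr_select(columns, labels, k):
    """Greedy mRMR sketch (hypothetical helper, not the package API).

    columns: list of feature columns, each a sequence of discrete values
    labels:  sequence of class labels
    k:       number of features to select
    Returns the indices of the selected columns, in selection order.
    """
    relevance = [mutual_information(col, labels) for col in columns]
    selected = []
    remaining = list(range(len(columns)))
    while remaining and len(selected) < k:
        def score(j):
            if not selected:
                return relevance[j]
            redundancy = sum(
                mutual_information(columns[j], columns[s]) for s in selected
            )
            # relevance minus mean redundancy with the selected set
            return relevance[j] - redundancy / len(selected)

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: the label is determined by two independent bits, a and b.
a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 1, 1, 0, 0, 1, 1]
noise = [0, 1, 0, 1, 0, 1, 0, 1]
labels = [2 * ai + bi for ai, bi in zip(a, b)]

# Column 1 duplicates column 0, so it is fully redundant once column 0 is chosen.
columns = [a, list(a), b, noise]
print(mrmr_select(columns, labels, 2))  # -> [0, 2]
```

In the toy data the duplicated column has the same relevance as the original, but its redundancy penalty cancels that out, so the second pick is the independent bit `b`.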

Use

To include this package in your Spark application via spark-shell or PySpark, pass it with the --packages option:

$SPARK_HOME/bin/spark-shell \
  --packages sramirez:spark-infotheoretic-feature-selection:1.4.4

where $SPARK_HOME is the path to your Spark installation.

Release

Latest version: 1.4.4 / Date: 2017-09-25 / Scala version: 2.11

Reference

S. Ramírez-Gallego, H. Mouriño-Talín, D. Martínez-Rego, V. Bolón-Canedo, J.M. Benítez, A. Alonso-Betanzos, F. Herrera. An Information Theory-Based Feature Selection Framework for Big Data under Apache Spark. IEEE Transactions on Systems, Man, and Cybernetics: Systems (2017), in press. doi: 10.1109/TSMC.2017.2670926