spark-infotheoretic-feature-selection
Spark Infotheoretic Feature Selection Framework
A feature selection (FS) framework for distributed computing, integrated with Apache Spark MLlib. It provides a generic implementation of several information theory-based FS methods, such as mRMR (minimum redundancy maximum relevance), CMIM (conditional mutual information maximization), and JMI (joint mutual information). It builds on the information theory-based framework proposed by Brown and adapts it to the Big Data environment.
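To make the criteria above concrete, here is a minimal single-machine sketch in plain Python. It is not this package's Spark implementation or API. It assumes discrete features and the linear form of Brown's framework, J(Xk) = I(Xk;Y) - beta * sum_j I(Xj;Xk) + gamma * sum_j I(Xj;Xk|Y), where mRMR corresponds to beta = 1/|S|, gamma = 0 and JMI to beta = gamma = 1/|S|, with S the set of already selected features. CMIM is left out because it takes a maximum over S rather than a sum. All function names (`entropy`, `mutual_info`, `cond_mutual_info`, `select_features`) and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy (in bits) of one or more discrete columns."""
    n = len(columns[0])
    counts = Counter(zip(*columns))
    return -sum(c / n * log2(c / n) for c in counts.values())


def mutual_info(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def cond_mutual_info(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def select_features(features, label, n_select, criterion="mrmr"):
    """Greedy forward selection; returns the chosen feature indices in order.

    `features` is a list of columns (hypothetical layout, one list per feature).
    """
    selected = []
    remaining = list(range(len(features)))
    while remaining and len(selected) < n_select:
        size = max(len(selected), 1)
        beta = 1.0 / size                                  # redundancy weight
        gamma = 1.0 / size if criterion == "jmi" else 0.0  # conditional term weight

        def score(k):
            relevance = mutual_info(features[k], label)
            redundancy = sum(mutual_info(features[j], features[k]) for j in selected)
            conditional = sum(
                cond_mutual_info(features[j], features[k], label) for j in selected
            )
            return relevance - beta * redundancy + gamma * conditional

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: the label encodes two independent bits a and b (y = 2a + b).
a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 1, 1, 0, 0, 1, 1]
y = [2 * ai + bi for ai, bi in zip(a, b)]
features = [a, list(a), b]  # feature 1 duplicates feature 0

print(select_features(features, y, 2, "mrmr"))  # [0, 2]
print(select_features(features, y, 2, "jmi"))   # [0, 2]
```

In this toy case all three features are equally relevant on their own, but once feature 0 is chosen the redundancy term drives the score of its duplicate (feature 1) to zero, so feature 2 is picked next. The package performs this kind of selection over distributed data in Spark; the sketch only illustrates the scoring idea.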
Use
To include this package in your Spark application via spark-shell or pySpark, pass it with the --packages option:
$SPARK_HOME/bin/spark-shell --packages sramirez:spark-infotheoretic-feature-selection:1.4.4
where $SPARK_HOME is your Spark path.
Release
Latest version: 1.4.4 / Date: 2017-09-25 / Scala version: 2.11
Reference
S. Ramírez-Gallego, H. Mouriño-Talín, D. Martínez-Rego, V. Bolón-Canedo, J.M. Benítez, A. Alonso-Betanzos, F. Herrera. An Information Theory-Based Feature Selection Framework for Big Data under Apache Spark. IEEE Transactions on Systems, Man, and Cybernetics: Systems (2017), in press. doi: 10.1109/TSMC.2017.2670926