DiReliefF

Feature selection (FS) is a key area of machine learning: removing irrelevant and redundant features usually reduces the effort required to process a dataset while maintaining or even improving the processing algorithm's accuracy. However, traditional algorithms lack the scalability needed to deal with the increasing amount of data that has become available in the current Big Data era. ReliefF is one of the most important FS algorithms and has been successfully used in many applications. DiReliefF is a completely redesigned distributed version of the popular ReliefF algorithm, based on the Spark cluster computing model.
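For readers unfamiliar with ReliefF, the following is a minimal single-machine sketch of the weight-update idea in plain Scala. It is not the DiReliefF implementation and does not use Spark; the names `ReliefFSketch`, `Instance`, `distance` and `weights` are hypothetical and exist only for this illustration. It assumes numeric features already scaled to [0, 1], uses every instance as a reference instead of a random sample, and averages over however many neighbors are found when a class has fewer than `k`.

```scala
// Illustrative single-machine ReliefF sketch (NOT the DiReliefF implementation).
// All names here are hypothetical; features are assumed numeric and scaled to [0, 1].
object ReliefFSketch {
  final case class Instance(features: Array[Double], label: Int)

  // Manhattan distance between two feature vectors of equal length.
  def distance(a: Array[Double], b: Array[Double]): Double =
    a.indices.map(i => math.abs(a(i) - b(i))).sum

  // Returns one weight per feature; a larger weight suggests a more relevant feature.
  def weights(data: Seq[Instance], k: Int): Array[Double] = {
    val numFeatures = data.head.features.length
    val w = Array.fill(numFeatures)(0.0)
    // Prior probability of each class, estimated from the data.
    val priors = data.groupBy(_.label).map { case (c, xs) => c -> xs.size.toDouble / data.size }

    for (r <- data) {
      // Exclude the reference instance itself (by reference, so duplicates are kept).
      val others = data.filter(_ ne r)
      for ((c, group) <- others.groupBy(_.label)) {
        val nearest = group.sortBy(x => distance(r.features, x.features)).take(k)
        // Nearest hits (same class) lower the weight; nearest misses raise it,
        // scaled by the prior of the miss class.
        val sign = if (c == r.label) -1.0 else priors(c) / (1.0 - priors(r.label))
        for (n <- nearest; f <- 0 until numFeatures)
          w(f) += sign * math.abs(r.features(f) - n.features(f)) / (data.size * nearest.size)
      }
    }
    w
  }
}
```

On a toy dataset where only the first feature separates the two classes, for example instances (0.0, 0.3) and (0.1, 0.9) in class 0 and (0.9, 0.4) and (1.0, 0.8) in class 1, `ReliefFSketch.weights(data, k = 1)` gives the first feature a higher weight than the second. The nearest-neighbor search over all instance pairs is the expensive step, which is why a distributed design matters for large datasets.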

Use

To run DiReliefF, the project's .sbt build file must contain the following lines:

name := "spark-relieff"
version := "0.1.0"
organization := "rauljosepalma"
scalaVersion := "2.10.5"
val sparkVersion = "1.6.0"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-mllib" % sparkVersion)

Release

The latest version of DiReliefF is:

Reference

Palma-Mendoza, Raul-Jose, Daniel Rodriguez, and Luis de-Marcos. "Distributed ReliefF-based feature selection in Spark." Knowledge and Information Systems (2018): 1-20.