PCARD

PCARD is a distributed extension of the method presented in A. Ahmad and G. Brown, "Random Projection Random Discretization Ensembles - Ensembles of Linear Multivariate Decision Trees". The algorithm applies Random Discretization and PCA to the input data, joins the results, and trains a decision tree on the combined features. It has been tested on five large real-world datasets, including Poker, SUSY, HIGGS and Epsilon.
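
The feature-construction step described above can be sketched in plain Python. This is a minimal single-machine illustration, not the package's distributed implementation: the function names are hypothetical, the cut points are assumed to be sampled from the observed values of each feature, PCA is reduced to the first principal component computed by power iteration, and the decision tree training is left out.

```python
import bisect
import math
import random


def random_discretization(rows, n_bins, rng):
    """Discretize each feature using cut points sampled from its own values."""
    discretized_columns = []
    for column in zip(*rows):
        # Assumption: n_bins - 1 cut points are drawn from the observed values.
        cuts = sorted(rng.sample(column, n_bins - 1))
        # bisect_right gives the bin index, from 0 to n_bins - 1.
        discretized_columns.append([bisect.bisect_right(cuts, v) for v in column])
    return [list(row) for row in zip(*discretized_columns)]


def first_principal_component(rows, iterations=200):
    """Project centered rows onto the top principal axis (power iteration)."""
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[v - m for v, m in zip(row, means)] for row in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    vec = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break  # constant data: no direction of variance to find
        vec = [x / norm for x in nxt]
    return [[sum(c * w for c, w in zip(row, vec))] for row in centered]


# Toy input: six instances with two numeric features (made-up values).
rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 35.0],
        [4.0, 38.0], [5.0, 52.0], [6.0, 61.0]]
rng = random.Random(42)

discretized = random_discretization(rows, n_bins=3, rng=rng)
projected = first_principal_component(rows)

# Join both views column-wise; a decision tree would be trained on `joined`.
joined = [d + p for d, p in zip(discretized, projected)]
```

Each row of `joined` holds the discretized features followed by the PCA feature, so a tree trained on it can split on both kinds of attribute.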

Use

To include this package in your Spark application, pass it with the --packages option when launching spark-shell or pyspark:

$SPARK_HOME/bin/spark-shell --packages djgg:PCARD:1.1

where $SPARK_HOME is the path to your Spark installation.

Release

Latest version: 1.1 / Date: 2016-03-02 / Scala version: 2.10