PCARD
PCARD is a distributed upgrade of the method presented in A. Ahmad and G. Brown, "Random Projection Random Discretization Ensembles - Ensembles of Linear Multivariate Decision Trees". The algorithm applies Random Discretization and PCA to the input data, joins the results, and trains a decision tree on the combined features. It has been tested on five large real-world datasets, including Poker, SUSY, HIGGS and Epsilon.
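The feature-building step described above can be illustrated with a small, single-machine sketch in plain Python. It is not the package's Spark implementation: the function names (`random_discretize`, `pca_project`, `pcard_features`), the choice of drawing cut points at random from the data values, and the use of power iteration for PCA are all assumptions made for illustration. Training the decision tree on the joined features is left out.

```python
import bisect
import random


def random_discretize(rows, n_bins, rng):
    """Replace each feature value with the index of a randomly bounded bin."""
    n_features = len(rows[0])
    cuts_per_feature = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        # Assumption: cut points are values drawn at random from the data itself.
        cuts_per_feature.append(sorted(rng.sample(column, n_bins - 1)))
    return [
        [float(bisect.bisect_right(cuts_per_feature[j], row[j]))
         for j in range(n_features)]
        for row in rows
    ]


def center(rows):
    """Subtract the per-feature mean from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already centered rows."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]


def top_component(cov, rng, iterations=200):
    """Approximate the dominant eigenvector of cov by power iteration."""
    d = len(cov)
    v = [rng.random() + 0.1 for _ in range(d)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def pca_project(rows, n_components, rng):
    """Project centered rows onto the leading principal components."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for _ in range(n_components):
        v = top_component(cov, rng)
        eigenvalue = sum(
            v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)
        )
        components.append(v)
        # Deflation: remove the direction just found before the next pass.
        cov = [
            [cov[a][b] - eigenvalue * v[a] * v[b] for b in range(d)]
            for a in range(d)
        ]
    return [
        [sum(x * c for x, c in zip(row, comp)) for comp in components]
        for row in centered
    ]


def pcard_features(rows, n_bins, n_components, seed=0):
    """Join the PCA projection and the discretized features row by row."""
    rng = random.Random(seed)
    discretized = random_discretize(rows, n_bins, rng)
    projected = pca_project(rows, n_components, rng)
    return [p + d for p, d in zip(projected, discretized)]


# Toy data: five rows, two nearly collinear features.
rows = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2], [5.0, 9.8]]
features = pcard_features(rows, n_bins=3, n_components=1)
```

Each row of `features` holds the PCA coordinates followed by the bin indices of the original features. Any decision tree learner could then be trained on these rows; repeating the process with different seeds would give different random discretizations.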
Use
To include this package in your Spark application, load it through spark-shell or pyspark with the --packages option:
$SPARK_HOME/bin/spark-shell --packages djgg:PCARD:1.1
where $SPARK_HOME is the path to your Spark installation.
Release
The latest version is 1.1 (date: 2016-03-02, Scala version: 2.10).