Smart_Imputation

This contribution implements two approaches to k Nearest Neighbor (kNN) imputation, designed for scalability so that they can handle big datasets: k Nearest Neighbor - Local Imputation and k Nearest Neighbor - Global Imputation. The global approach takes all instances into account when computing the k nearest neighbors. The local approach considers only the instances in the same partition, which gives faster runtimes but loses information because it does not consider all the samples.
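The difference between the two approaches can be illustrated with a minimal plain-Python sketch. This is not the package's API and does not use Spark: the function names (`knn_impute`, `local_impute`, `global_impute`) are hypothetical, partitions are modeled as plain lists, and the sketch assumes Euclidean distance over the observed features, complete rows as neighbor candidates, and the neighbor mean as the imputed value. The package itself may make different choices.

```python
import math


def knn_impute(rows, candidates, k):
    """Fill None entries in each row with the mean of that feature over the
    k nearest complete rows in `candidates` (illustrative assumption)."""
    complete = [c for c in candidates if None not in c]
    imputed = []
    for row in rows:
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing or not complete:
            imputed.append(list(row))
            continue
        observed = [j for j, v in enumerate(row) if v is not None]

        def dist(c):
            # Euclidean distance restricted to the features present in `row`.
            return math.sqrt(sum((row[j] - c[j]) ** 2 for j in observed))

        neighbors = sorted(complete, key=dist)[:k]
        filled = list(row)
        for j in missing:
            filled[j] = sum(n[j] for n in neighbors) / len(neighbors)
        imputed.append(filled)
    return imputed


def local_impute(partitions, k):
    # Local: neighbors are searched only inside the row's own partition.
    return [knn_impute(p, p, k) for p in partitions]


def global_impute(partitions, k):
    # Global: neighbors are searched across every partition.
    all_rows = [r for p in partitions for r in p]
    return [knn_impute(p, all_rows, k) for p in partitions]


# Two toy partitions; the last row of the first one has a missing value.
partitions = [
    [[1.0, 10.0], [2.0, 20.0], [9.0, None]],
    [[8.5, 85.0], [12.0, 120.0]],
]
local_result = local_impute(partitions, k=2)
global_result = global_impute(partitions, k=2)
print(local_result[0][2])   # [9.0, 15.0]
print(global_result[0][2])  # [9.0, 102.5]
```

In this toy data the true nearest neighbors of the incomplete row live in the other partition, so the local variant imputes from farther rows (15.0) while the global variant finds the closer ones (102.5). That is the trade-off described above: the local search does less work per row, at the cost of ignoring samples outside the partition.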

Use

Include this package in your Spark Applications using:

spark-shell, pyspark, or spark-submit

> $SPARK_HOME/bin/spark-shell --packages JMailloH:Smart_Imputation:1.0

sbt

If you use the sbt-spark-package plugin, in your sbt build file, add:

spDependencies += "JMailloH/Smart_Imputation:1.0"

Otherwise,

resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"

libraryDependencies += "JMailloH" % "Smart_Imputation" % "1.0"

Maven

In your pom.xml, add:

<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>JMailloH</groupId>
    <artifactId>Smart_Imputation</artifactId>
    <version>1.0</version>
  </dependency>
</dependencies>
<repositories>
  <!-- list of other repositories -->
  <repository>
    <id>SparkPackagesRepo</id>
    <url>http://dl.bintray.com/spark-packages/maven</url>
  </repository>
</repositories>

Release

The latest version is 1.0 / Date: 2017-09-25 / Scala version: 2.11