Smart_Imputation
This contribution implements two scalable approaches to k Nearest Neighbor imputation, designed to handle big datasets: k Nearest Neighbor - Local Imputation and k Nearest Neighbor - Global Imputation. The global approach takes all instances into account when computing the k nearest neighbors. The local approach considers only the instances in the same partition, which gives faster runtimes but loses information because not all samples are considered.
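The difference between the two approaches can be illustrated with a minimal sketch in plain Python. It is not this package's API: the function names (knn_impute, global_impute, local_impute) are hypothetical, and the choices of Euclidean distance over the features observed in both rows and of the mean of the neighbors' values as the imputed value are assumptions made for illustration only.

```python
import math


def knn_impute(rows, candidates, k):
    """Fill None entries in `rows` with the mean of the k nearest `candidates`.

    Assumption: Euclidean distance over features observed in both rows.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf  # no common feature, treat as infinitely far
        return math.sqrt(sum((x - y) ** 2 for x, y in shared))

    imputed = []
    for row in rows:
        filled = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors must have feature j observed, so the row itself never qualifies.
            donors = [c for c in candidates if c[j] is not None]
            donors.sort(key=lambda c: distance(row, c))
            nearest = donors[:k]
            if nearest:
                filled[j] = sum(c[j] for c in nearest) / len(nearest)
        imputed.append(filled)
    return imputed


def global_impute(partitions, k):
    # Global: neighbors are searched among the instances of every partition.
    all_rows = [row for part in partitions for row in part]
    return [knn_impute(part, all_rows, k) for part in partitions]


def local_impute(partitions, k):
    # Local: neighbors are searched only inside the row's own partition.
    return [knn_impute(part, part, k) for part in partitions]


# Two partitions; the second row of the first partition has a missing value.
partitions = [
    [[1.0, 1.0], [1.2, None], [9.0, 9.0]],
    [[1.1, 1.2], [9.1, 8.8]],
]
print(global_impute(partitions, k=1)[0][1])  # [1.2, 1.2]
print(local_impute(partitions, k=1)[0][1])   # [1.2, 1.0]
```

In this toy data the closest neighbor of the incomplete row lives in the other partition, so the global variant finds it while the local variant falls back to the best neighbor in its own partition. That is the information loss the local approach trades for speed.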
Use
Include this package in your Spark Applications using:
spark-shell, pyspark, or spark-submit
> $SPARK_HOME/bin/spark-shell --packages JMailloH:Smart_Imputation:1.0
sbt
If you use the sbt-spark-package plugin, in your sbt build file, add:
spDependencies += "JMailloH/Smart_Imputation:1.0"
Otherwise,
resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"

libraryDependencies += "JMailloH" % "Smart_Imputation" % "1.0"
Maven
In your pom.xml, add:
<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>JMailloH</groupId>
    <artifactId>Smart_Imputation</artifactId>
    <version>1.0</version>
  </dependency>
</dependencies>
<repositories>
  <!-- list of other repositories -->
  <repository>
    <id>SparkPackagesRepo</id>
    <url>http://dl.bintray.com/spark-packages/maven</url>
  </repository>
</repositories>
Release
The latest version is 1.0 / Date: 2017-09-25 / Scala version: 2.11