GPU-SME-kNN: Scalable and Memory Efficient kNN and Lazy Learning using GPUs
This website contains complementary material to the paper:
P.D. Gutiérrez, M. Lastra, Jaume Bacardit, J.M. Benítez and F. Herrera, GPU-SME-kNN: Scalable and Memory Efficient kNN and Lazy Learning using GPUs. Information Sciences, submitted
Summary:
P.D. Gutiérrez, M. Lastra, Jaume Bacardit, J.M. Benítez and F. Herrera, GPU-SME-kNN: Scalable and Memory Efficient kNN and Lazy Learning using GPUs.
Abstract
The k nearest neighbor (kNN) rule is one of the most widely used techniques in data mining and pattern recognition due to its simplicity and low classification error. However, the computational effort it requires grows with dataset size, leading to poor performance on large datasets. Graphics processing units (GPUs) have proven to improve the performance of the kNN rule, but the computational requirements of current proposals limit their performance as the dataset size increases. In this paper we propose a new scalable and memory efficient design for a GPU-based kNN rule, called GPU-SME-kNN, that decouples memory footprint from dataset size while delivering high performance. An experimental study of GPU-SME-kNN is presented showing high performance, even in cases that other proposals cannot address, while keeping computational requirements suitable for most commercial GPU devices. Our design has also been applied to kNN-based lazy learning algorithms, reducing run-times significantly.
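As a point of reference for the discussion above, the sketch below shows the brute-force kNN rule that GPU-based proposals accelerate: every test instance is compared against every training instance, which is why run time and memory grow with dataset size. This is a minimal CPU illustration in Python/NumPy under our own naming, not the GPU-SME-kNN implementation.

```python
import numpy as np

def knn_classify(train_X, train_y, test_X, k=3):
    """Classify each test instance by majority vote among its k nearest
    training instances (Euclidean distance), computed by brute force."""
    predictions = []
    for x in test_X:
        # Distance from this test instance to every training instance.
        dists = np.sqrt(((train_X - x) ** 2).sum(axis=1))
        # Indices of the k smallest distances (unordered among themselves).
        neighbors = np.argpartition(dists, k)[:k]
        # Majority vote among the neighbors' class labels.
        labels, counts = np.unique(train_y[neighbors], return_counts=True)
        predictions.append(labels[np.argmax(counts)])
    return np.array(predictions)
```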
Datasets
The proposed GPU-based kNN rule has been tested on datasets from the UCI Machine Learning Repository. Here we provide the 5-fold cross-validation partitions used in the experiments at every dataset size.
Poker dataset
The poker dataset has 1,025,009 instances, 10 attributes and 10 classes. The dataset has been subsampled at sizes ranging from 50,000 to 1,000,000 instances in steps of 50,000 instances.
KDDCup 1999 dataset
The KDDCup 1999 dataset has 4,898,431 instances, 41 attributes and 5 classes. This dataset has been subsampled in steps of 250,000 instances from 250,000 to 1,500,000 instances and in steps of 500,000 instances for larger sizes.
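For illustration, the sketch below shows how subsamples of increasing size and 5-fold cross-validation partitions of the kind described above could be generated. It is a hypothetical Python/NumPy example under our own naming, not the scripts used to build the partitions provided here.

```python
import numpy as np

def subsample_sizes(X, y, sizes, seed=0):
    """Yield random subsamples of the dataset at each requested size,
    e.g. sizes = range(50_000, 1_000_001, 50_000) for the Poker dataset."""
    rng = np.random.default_rng(seed)
    for n in sizes:
        idx = rng.choice(len(X), size=n, replace=False)
        yield n, X[idx], y[idx]

def five_fold_partitions(n_instances, seed=0):
    """Return a list of (train_idx, test_idx) pairs for 5-fold cross validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_instances)
    folds = np.array_split(order, 5)
    return [(np.concatenate(folds[:i] + folds[i + 1:]), folds[i])
            for i in range(5)]
```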
Source code
The source code is available as open source. It can be downloaded here: source.zip
Results obtained
In this section, we report the results obtained in the experimental study. We present the results for the kNN rule on the Poker and KDDCup 1999 datasets, as well as the results of the lazy learning algorithms on the Poker dataset.
These data are available for download as xls and ods files.
| | xls file | ods file |
|---|---|---|
| kNN rule Poker results | | |
| kNN rule KDDCup 1999 results | | |
| Center kNN rule Poker results | | |
| kNN Adaptive rule Poker results | | |
| kNN Symmetric rule Poker results | | |