Partition Incremental Discretization algorithm (PiD)

PiD performs incremental discretization in two layers. The first layer receives the input stream and keeps summary statistics over many more intervals than the final discretization requires. The second layer builds the final discretization from the statistics stored by the first layer. This architecture processes streaming examples in a single scan, using constant time and space per example, even for unbounded sequences of examples.
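The two-layer idea can be illustrated with a small, self-contained sketch. It is not this library's implementation: the class name TwoLayerSketch, its parameters, the rule of splitting an interval once it holds more than an alpha fraction of the examples seen, and the equal-frequency merge in the second layer are all assumptions chosen for illustration.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical illustration of a two-layer discretizer; not the library's API.
final class TwoLayerSketch(min: Double, max: Double, l1Bins: Int,
                           alpha: Double, minExamples: Int) {
  // Layer 1: upper bound and (possibly fractional) count of each fine interval.
  private val step = (max - min) / l1Bins
  private val uppers = ArrayBuffer.tabulate(l1Bins)(i => min + (i + 1) * step)
  private val counts = ArrayBuffer.fill(l1Bins)(0.0)
  private var total = 0L

  def layer1Size: Int = uppers.length

  // Process one example: find its interval, count it, split if it is too heavy.
  def update(x: Double): Unit = {
    val idx = uppers.indexWhere(x <= _)
    val i = if (idx < 0) uppers.length - 1 else idx // clamp values above max
    counts(i) += 1
    total += 1
    // Assumed split rule: the interval holds more than alpha of all examples.
    if (total >= minExamples && counts(i) > alpha * total) split(i)
  }

  // Split interval i at its midpoint and share its count between both halves.
  private def split(i: Int): Unit = {
    val lower = if (i == 0) min else uppers(i - 1)
    val mid = (lower + uppers(i)) / 2
    val half = counts(i) / 2
    counts(i) = half
    uppers.insert(i, mid)
    counts.insert(i, half)
  }

  // Layer 2: merge layer-1 intervals into roughly equal-frequency bins.
  // Returns at most bins - 1 interior cut points, in increasing order.
  def cutPoints(bins: Int): Vector[Double] = {
    val target = total.toDouble / bins
    val cuts = Vector.newBuilder[Double]
    var acc = 0.0
    var emitted = 0
    for (i <- uppers.indices) {
      acc += counts(i)
      if (emitted < bins - 1 && acc >= target * (emitted + 1)) {
        cuts += uppers(i)
        emitted += 1
      }
    }
    cuts.result()
  }
}

object TwoLayerDemo {
  def main(args: Array[String]): Unit = {
    val sketch = new TwoLayerSketch(0.0, 1.0, l1Bins = 10, alpha = 0.2, minExamples = 50)
    // Skewed stream: every value falls in the first layer-1 interval.
    (0 until 100).foreach(i => sketch.update((i + 0.5) / 1000.0))
    println(s"layer-1 intervals: ${sketch.layer1Size}")
    println(s"cut points: ${sketch.cutPoints(4)}")
  }
}
```

In this sketch the first layer does a bounded amount of work per example, and the second layer can be recomputed on demand from the stored counts without revisiting the stream. Halving the count on a split is an approximation, so the final bins are only roughly equal in frequency.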

Use

// PiD requires normalized data, so a MinMaxScaler is chained before it with chainTransformer.
val pid = PIDiscretizerTransformer()
  .setAlpha(.10)
  .setUpdateExamples(50)
  .setL1Bins(5)
val scaler = MinMaxScaler()
val pipeline = scaler chainTransformer pid

pipeline fit dataSet
val result = pipeline transform dataSet

Release

Latest version: 0.1.0 / Date: 2018-09-28 / Scala version: 2.11.12

Reference

J. Gama, C. Pinto, Discretization from data streams: Applications to histograms and data mining, in: Proceedings of the 2006 ACM Symposium on Applied Computing, SAC '06, 2006, pp. 662–667.
