public class AssocRuleMining
extends java.lang.Object
Set of utilities to support various Association Rule Mining (ARM) algorithms.
Modifier and Type | Class and Description |
---|---|
protected class |
AssocRuleMining.RuleNode
Inner class for storing linked list of ARs or CARs as appropriate.
|
Modifier and Type | Field and Description |
---|---|
protected double |
confidence
% confidence.
|
protected int[][] |
conversionArray
2-D array used to renumber columns for input data in terms of
frequency of single attributes (reordering will enhance performance
for some ARM algorithms).
|
protected short[][] |
dataArray
2-D array to hold input data from data file.
|
protected myDataset |
dataset
Dataset.
|
protected boolean |
isOrderedFlag
Flag to indicate whether input data has been sorted or not.
|
protected boolean |
isPrunedFlag
Flag to indicate whether input data has been sorted and pruned or
not.
|
protected double |
minSupport
Minimum support value in terms of number of rows.
|
protected int |
numCols
Number of columns.
|
protected int |
numOneItemSets
The number of one itemsets (singletons).
|
protected int |
numRows
Number of rows.
|
protected short[] |
reconversionArray
1-D array used to reconvert input data column numbers to their
original numbering where the input data has been ordered to enhance
computational efficiency.
|
protected AssocRuleMining.RuleNode |
startRulelist
The reference to start of the rule list.
|
protected double |
support
% support.
|
Constructor and Description |
---|
AssocRuleMining(myDataset ds,
double sup,
double conf)
Constructor to process dataset and parameters.
|
Modifier and Type | Method and Description |
---|---|
protected short[] |
complement(short[] itemSet1,
short[] itemSet2)
Returns complement of first itemset with respect to second itemset.
|
protected short[] |
copyItemSet(short[] itemSet)
Makes a copy of a given itemSet.
|
protected int[][] |
countSingles()
Counts number of occurrences of each single attribute in the
input data.
|
protected void |
defConvertArrays(int[][] countArray)
Defines conversion and reconversion arrays.
|
protected int |
getNumSupOneItemSets()
Gets number of supported single item sets (note this is not necessarily
the same as the number of columns/attributes in the input set).
|
java.util.ArrayList<AssociationRule> |
getRulesSet()
Constructs the rule set once the algorithm has been carried out.
|
void |
idInputDataOrdering()
Reorders input data according to frequency of
single attributes.
|
protected void |
insertRuleintoRulelist(short[] antecedent,
short[] consequent,
double confidenceForRule,
double supportForRule,
double supportForAntecedent)
Inserts an (association/classification) rule into the linked list of
rules pointed at by startRulelist.
|
protected boolean |
notMemberOf(short number,
short[] itemSet)
Checks whether a particular element/attribute identified by a
column number is not a member of the given item set.
|
protected void |
orderFirstNofCountArray(int[][] countArray,
int endIndex)
Bubble sorts first N elements in count array produced by
countSingles method so that array is ordered according to
frequency of single items.
|
void |
outputDataArray()
Outputs stored input data set; initially read from input data file, but
may be reordered or pruned if desired by a particular application.
|
protected void |
outputItemSet(short[] itemSet)
Outputs a given item set.
|
void |
outputRules()
Outputs contents of rule linked list (if any) assuming that the list
represents a set of ARs.
|
void |
outputRules(AssocRuleMining.RuleNode ruleList)
Outputs given rule list.
|
protected short[] |
realloc1(short[] oldItemSet,
short newElement)
Resizes given item set so that its length is increased by one
and appends the new element (identical to append method).
|
protected short[] |
realloc2_new(short[] oldItemSet,
short newElement)
Resizes given array so that its length is increased by one element
and the new element is added to the front.
|
protected short[] |
realloc2(short[] oldItemSet,
short newElement)
Resizes given array so that its length is increased by one element
and the new element is added to the front.
|
protected short[] |
reallocInsert(short[] oldItemSet,
short newElement)
Resizes given item set so that its length is increased by one
and new element inserted.
|
void |
recastInputData()
Recasts the contents of the data array so that each record is ordered
according to conversion array.
|
void |
recastInputDataAndPruneUnsupportedAtts()
Recasts the contents of the data array so that each record is
ordered according to ColumnCounts array and excludes non-supported
elements.
|
protected short |
reconvertItem(short item)
Reconvert single item if appropriate.
|
protected short[] |
reconvertItemSet(short[] itemSet)
Reconverts given item set according to contents of reconversion array.
|
protected short[] |
removeElementN(short[] oldItemSet,
int n)
Removes the nth element/attribute from the given item set.
|
protected void |
sortItemSet(short[] itemSet)
Sorts an unordered item set.
|
protected double |
twoDecPlaces(double number)
Converts given real number to real number rounded up to two decimal
places.
|
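The item set helpers summarised above are small array manipulations. The following is a minimal sketch of how a few of them might behave. The class name `ItemSetSketch` and all method bodies are assumptions inferred from the one-line descriptions, not the actual source. In particular, `reallocInsert` is assumed to keep an ascending item set in ascending order, and `complement` is assumed to return the elements of the second item set that are absent from the first.

```java
import java.util.Arrays;

// Illustrative sketch only: bodies are assumptions inferred from the
// method descriptions, not the actual AssocRuleMining source.
final class ItemSetSketch {

    // Append newElement to the end of a copy of oldItemSet (cf. realloc1).
    static short[] realloc1(short[] oldItemSet, short newElement) {
        short[] result = Arrays.copyOf(oldItemSet, oldItemSet.length + 1);
        result[oldItemSet.length] = newElement;
        return result;
    }

    // Add newElement to the front of a copy of oldItemSet (cf. realloc2).
    static short[] realloc2(short[] oldItemSet, short newElement) {
        short[] result = new short[oldItemSet.length + 1];
        result[0] = newElement;
        System.arraycopy(oldItemSet, 0, result, 1, oldItemSet.length);
        return result;
    }

    // Insert newElement so that an ascending item set stays ascending
    // (assumed behaviour of reallocInsert).
    static short[] reallocInsert(short[] oldItemSet, short newElement) {
        int pos = 0;
        while (pos < oldItemSet.length && oldItemSet[pos] < newElement) {
            pos++;
        }
        short[] result = new short[oldItemSet.length + 1];
        System.arraycopy(oldItemSet, 0, result, 0, pos);
        result[pos] = newElement;
        System.arraycopy(oldItemSet, pos, result, pos + 1, oldItemSet.length - pos);
        return result;
    }

    // Remove the element at index n, where the first index is 0 (cf. removeElementN).
    static short[] removeElementN(short[] oldItemSet, int n) {
        short[] result = new short[oldItemSet.length - 1];
        System.arraycopy(oldItemSet, 0, result, 0, n);
        System.arraycopy(oldItemSet, n + 1, result, n, oldItemSet.length - n - 1);
        return result;
    }

    // True if number does not occur in itemSet (cf. notMemberOf).
    static boolean notMemberOf(short number, short[] itemSet) {
        for (short item : itemSet) {
            if (item == number) {
                return false;
            }
        }
        return true;
    }

    // Elements of itemSet2 that do not occur in itemSet1 (assumed meaning of
    // "complement of first itemset with respect to second itemset").
    static short[] complement(short[] itemSet1, short[] itemSet2) {
        short[] result = new short[0];
        for (short item : itemSet2) {
            if (notMemberOf(item, itemSet1)) {
                result = realloc1(result, item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        short[] itemSet = {1, 3, 5};
        System.out.println(Arrays.toString(realloc1(itemSet, (short) 7)));      // [1, 3, 5, 7]
        System.out.println(Arrays.toString(realloc2(itemSet, (short) 7)));      // [7, 1, 3, 5]
        System.out.println(Arrays.toString(reallocInsert(itemSet, (short) 4))); // [1, 3, 4, 5]
        System.out.println(Arrays.toString(removeElementN(itemSet, 1)));        // [1, 5]
    }
}
```

Each helper returns a new array and leaves its argument untouched, which matches the return types listed in the summary but is otherwise a design assumption of this sketch.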
protected AssocRuleMining.RuleNode startRulelist
protected short[][] dataArray
protected int[][] conversionArray
protected short[] reconversionArray
protected int numCols
protected int numRows
protected double support
protected double minSupport
Set when input data is read and the number of records is known.
protected double confidence
protected int numOneItemSets
protected myDataset dataset
protected boolean isOrderedFlag
protected boolean isPrunedFlag
public AssocRuleMining(myDataset ds, double sup, double conf)
ds
- The instance of the dataset for dealing with its records
sup
- The user-specified minimum support for the mined association rules
conf
- The user-specified minimum confidence for the mined association rules
public void idInputDataOrdering()
Example, given the data set:
1 2 5
1 2 3
2 4 5
1 2 5
2 3 5
This would produce a countArray (ignore index 0):
+---+---+---+---+---+---+
|   | 1 | 2 | 3 | 4 | 5 |
+---+---+---+---+---+---+
|   | 3 | 5 | 2 | 1 | 4 |
+---+---+---+---+---+---+
Which sorts to:
+---+---+---+---+---+---+
|   | 2 | 5 | 1 | 3 | 4 |
+---+---+---+---+---+---+
|   | 5 | 4 | 3 | 2 | 1 |
+---+---+---+---+---+---+
Giving rise to the conversion array of the form (no index 0):
+---+---+---+---+---+---+
|   | 3 | 1 | 4 | 5 | 2 |
+---+---+---+---+---+---+
|   | 3 | 5 | 2 | 1 | 4 |
+---+---+---+---+---+---+
Note that the second row here holds the counts, which no longer play a role in the conversion exercise. Thus the new column number for column 1 is column 3 (i.e. the first value at index 1). The reconversion array is of the form:
+---+---+---+---+---+---+
|   | 2 | 5 | 1 | 3 | 4 |
+---+---+---+---+---+---+
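The worked example can be reproduced with a short sketch. The class name `OrderingSketch`, the method names, and the exact array layout (index 0 unused, `[i][0]` holding a column number and `[i][1]` a count) are assumptions made for illustration; the real `countSingles`, `orderFirstNofCountArray` and `defConvertArrays` methods may differ in detail.

```java
import java.util.Arrays;

// Illustrative sketch only: names and array layout are assumptions based on
// the worked example, not the actual AssocRuleMining source.
final class OrderingSketch {

    // countArray[i][0] = column number, countArray[i][1] = occurrences (index 0 unused).
    static int[][] countSingles(short[][] data, int numCols) {
        int[][] countArray = new int[numCols + 1][2];
        for (int col = 1; col <= numCols; col++) {
            countArray[col][0] = col;
        }
        for (short[] row : data) {
            for (short item : row) {
                countArray[item][1]++;
            }
        }
        return countArray;
    }

    // Bubble sort entries 1..endIndex into descending order of count.
    static void orderFirstN(int[][] countArray, int endIndex) {
        boolean swapped = true;
        while (swapped) {
            swapped = false;
            for (int i = 1; i < endIndex; i++) {
                if (countArray[i][1] < countArray[i + 1][1]) {
                    int[] tmp = countArray[i];
                    countArray[i] = countArray[i + 1];
                    countArray[i + 1] = tmp;
                    swapped = true;
                }
            }
        }
    }

    // conversion[old][0] = new column number, conversion[old][1] = count.
    static int[][] conversionArray(int[][] sorted) {
        int[][] conversion = new int[sorted.length][2];
        for (int newCol = 1; newCol < sorted.length; newCol++) {
            int oldCol = sorted[newCol][0];
            conversion[oldCol][0] = newCol;
            conversion[oldCol][1] = sorted[newCol][1];
        }
        return conversion;
    }

    // reconversion[new] = original column number.
    static short[] reconversionArray(int[][] sorted) {
        short[] reconversion = new short[sorted.length];
        for (int newCol = 1; newCol < sorted.length; newCol++) {
            reconversion[newCol] = (short) sorted[newCol][0];
        }
        return reconversion;
    }

    public static void main(String[] args) {
        short[][] data = {{1, 2, 5}, {1, 2, 3}, {2, 4, 5}, {1, 2, 5}, {2, 3, 5}};
        int[][] counts = countSingles(data, 5);
        orderFirstN(counts, 5);
        // prints [[0, 0], [3, 3], [1, 5], [4, 2], [5, 1], [2, 4]]
        System.out.println(Arrays.deepToString(conversionArray(counts)));
        // prints [0, 2, 5, 1, 3, 4]
        System.out.println(Arrays.toString(reconversionArray(counts)));
    }
}
```

Ignoring the unused index 0, the printed arrays match the conversion and reconversion tables in the example.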
protected int[][] countSingles()
protected void orderFirstNofCountArray(int[][] countArray, int endIndex)
Used when ordering classification input data.
countArray
- The 2-D array returned by the countSingles method.
endIndex
- the index of the Nth element.
protected void defConvertArrays(int[][] countArray)
countArray
- The 2-D array sorted by the orderCountArray method.
public void recastInputData()
Proceed as follows:
1) For each record in the data array, create an empty new itemSet array.
2) Place into this array the attribute/column numbers that correspond to the appropriate equivalents contained in the conversion array.
3) Reorder this itemSet and return it to the data array.
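The three steps can be sketched as follows. This is an assumption-laden illustration rather than the actual method: the class name `RecastSketch` is hypothetical, the arrays are passed as parameters instead of being read from the class fields, and the conversion array is taken to hold the new column number at `[old][0]` with index 0 unused.

```java
import java.util.Arrays;

// Illustrative sketch only: not the actual recastInputData implementation.
final class RecastSketch {

    // Renumber every record through the conversion array, then re-sort it.
    static void recast(short[][] dataArray, int[][] conversionArray) {
        for (int row = 0; row < dataArray.length; row++) {
            // 1) New item set for this record.
            short[] itemSet = new short[dataArray[row].length];
            // 2) Replace each column number with its converted equivalent.
            for (int i = 0; i < itemSet.length; i++) {
                itemSet[i] = (short) conversionArray[dataArray[row][i]][0];
            }
            // 3) Reorder the item set and put it back into the data array.
            Arrays.sort(itemSet);
            dataArray[row] = itemSet;
        }
    }

    public static void main(String[] args) {
        // Conversion array for the data set used in the idInputDataOrdering example.
        int[][] conversion = {{0, 0}, {3, 3}, {1, 5}, {4, 2}, {5, 1}, {2, 4}};
        short[][] data = {{1, 2, 5}, {1, 2, 3}, {2, 4, 5}, {1, 2, 5}, {2, 3, 5}};
        recast(data, conversion);
        // prints [[1, 2, 3], [1, 3, 4], [1, 2, 5], [1, 2, 3], [1, 2, 4]]
        System.out.println(Arrays.deepToString(data));
    }
}
```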
public void recastInputDataAndPruneUnsupportedAtts()
Proceed as follows:
1) For each record in the data array, create an empty new itemSet array.
2) Place into this array any column numbers in the record that are supported at the index contained in the conversion array.
3) Assign the new itemSet back into the data array.
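A minimal sketch of the pruning variant is shown below. It assumes that a column counts as supported when its count, stored at `[old][1]` in the conversion array, is at least `minSupport` expressed as a number of rows. The class name `PruneSketch` and the parameter-passing style are hypothetical, and the sketch fills a fixed-size buffer and trims it rather than growing the item set one element at a time.

```java
import java.util.Arrays;

// Illustrative sketch only: the support test and array layout are assumptions.
final class PruneSketch {

    // conversionArray[old][0] = new column number, conversionArray[old][1] = count
    // (index 0 unused); minSupport is a number of rows.
    static void recastAndPrune(short[][] dataArray, int[][] conversionArray, double minSupport) {
        for (int row = 0; row < dataArray.length; row++) {
            short[] itemSet = new short[dataArray[row].length];
            int size = 0;
            for (short oldCol : dataArray[row]) {
                // Keep only supported columns, renumbered via the conversion array.
                if (conversionArray[oldCol][1] >= minSupport) {
                    itemSet[size++] = (short) conversionArray[oldCol][0];
                }
            }
            itemSet = Arrays.copyOf(itemSet, size); // drop unused slots
            Arrays.sort(itemSet);
            dataArray[row] = itemSet;
        }
    }

    public static void main(String[] args) {
        int[][] conversion = {{0, 0}, {3, 3}, {1, 5}, {4, 2}, {5, 1}, {2, 4}};
        short[][] data = {{1, 2, 5}, {1, 2, 3}, {2, 4, 5}, {1, 2, 5}, {2, 3, 5}};
        recastAndPrune(data, conversion, 2.0);
        // Original column 4 occurs only once, so it is pruned from the third record.
        // prints [[1, 2, 3], [1, 3, 4], [1, 2], [1, 2, 3], [1, 2, 4]]
        System.out.println(Arrays.deepToString(data));
    }
}
```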
protected int getNumSupOneItemSets()
protected short[] reconvertItemSet(short[] itemSet)
itemSet
- the given itemset.
protected short reconvertItem(short item)
item
- the given item (attribute).
protected void insertRuleintoRulelist(short[] antecedent, short[] consequent, double confidenceForRule, double supportForRule, double supportForAntecedent)
The list is ordered so that rules with highest confidence are listed first. If two rules have the same confidence the new rule will be placed after the existing rule. Thus, if using an Apriori approach to generating rules, more general rules will appear first in the list with more specific rules (i.e. rules with a larger antecedent) appearing later as the more general rules will be generated first.
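The ordering policy described above (highest confidence first, with a new rule placed after existing rules of equal confidence) can be sketched with a small singly linked list. The class `RuleListSketch`, its nested `RuleNode` and the field names are hypothetical stand-ins for illustration; the real `AssocRuleMining.RuleNode` also carries support values, which are omitted here.

```java
// Illustrative sketch only: a simplified stand-in for the rule list,
// not the actual AssocRuleMining.RuleNode implementation.
final class RuleListSketch {

    static final class RuleNode {
        final short[] antecedent;
        final short[] consequent;
        final double confidence;
        RuleNode next;

        RuleNode(short[] antecedent, short[] consequent, double confidence) {
            this.antecedent = antecedent;
            this.consequent = consequent;
            this.confidence = confidence;
        }
    }

    RuleNode start; // head of the list, highest confidence first

    void insert(short[] antecedent, short[] consequent, double confidence) {
        RuleNode node = new RuleNode(antecedent, consequent, confidence);
        // A strictly higher confidence than the head goes to the front.
        if (start == null || confidence > start.confidence) {
            node.next = start;
            start = node;
            return;
        }
        // Otherwise skip every node whose confidence is greater than or equal,
        // so that ties are placed after the existing rules.
        RuleNode current = start;
        while (current.next != null && current.next.confidence >= confidence) {
            current = current.next;
        }
        node.next = current.next;
        current.next = node;
    }

    public static void main(String[] args) {
        RuleListSketch list = new RuleListSketch();
        list.insert(new short[]{1}, new short[]{9}, 80.0);
        list.insert(new short[]{2}, new short[]{9}, 90.0);
        list.insert(new short[]{1, 2}, new short[]{9}, 80.0); // tie: goes after {1}
        for (RuleNode n = list.start; n != null; n = n.next) {
            System.out.println(java.util.Arrays.toString(n.antecedent) + " " + n.confidence);
        }
    }
}
```

Using `>=` in the traversal is what keeps the insertion stable, so rules generated earlier stay ahead of later rules with the same confidence.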
antecedent
- the antecedent (LHS) of the rule.
consequent
- the consequent (RHS) of the rule.
confidenceForRule
- the associated confidence value.
supportForRule
- the associated support value.
supportForAntecedent
- the antecedent support value.
protected short[] reallocInsert(short[] oldItemSet, short newElement)
oldItemSet
- the original item set
newElement
- the new element/attribute to be inserted
protected short[] realloc1(short[] oldItemSet, short newElement)
oldItemSet
- the original item set
newElement
- the new element/attribute to be appended
protected short[] realloc2(short[] oldItemSet, short newElement)
oldItemSet
- the original item set
newElement
- the new element/attribute to be added to the front
protected short[] realloc2_new(short[] oldItemSet, short newElement)
oldItemSet
- the original item set
newElement
- the new element/attribute to be added to the front
protected short[] removeElementN(short[] oldItemSet, int n)
oldItemSet
- the given item set.
n
- the index of the element to be removed (first index is 0).
protected short[] complement(short[] itemSet1, short[] itemSet2)
itemSet1
- the first given item set.
itemSet2
- the second given item set.
protected void sortItemSet(short[] itemSet)
itemSet
- the given item set.
protected boolean notMemberOf(short number, short[] itemSet)
number
- the attribute identifier (column number).
itemSet
- the given item set.
protected short[] copyItemSet(short[] itemSet)
itemSet
- the given item set.
public void outputDataArray()
protected void outputItemSet(short[] itemSet)
itemSet
- the given item set.
public void outputRules()
public void outputRules(AssocRuleMining.RuleNode ruleList)
ruleList
- the given rule list.
protected double twoDecPlaces(double number)
number
- the given number.
public java.util.ArrayList<AssociationRule> getRulesSet()