public class SMO extends java.lang.Object implements WeightedInstancesHandler, TechnicalInformationHandler
@incollection{Platt1998,
  author = {J. Platt},
  booktitle = {Advances in Kernel Methods - Support Vector Learning},
  editor = {B. Schoelkopf and C. Burges and A. Smola},
  publisher = {MIT Press},
  title = {Fast Training of Support Vector Machines using Sequential Minimal Optimization},
  year = {1998},
  URL = {http://research.microsoft.com/~jplatt/smo.html},
  PS = {http://research.microsoft.com/~jplatt/smo-book.ps.gz},
  PDF = {http://research.microsoft.com/~jplatt/smo-book.pdf}
}

@article{Keerthi2001,
  author = {S.S. Keerthi and S.K. Shevade and C. Bhattacharyya and K.R.K. Murthy},
  journal = {Neural Computation},
  number = {3},
  pages = {637-649},
  title = {Improvements to Platt's SMO Algorithm for SVM Classifier Design},
  volume = {13},
  year = {2001},
  PS = {http://guppy.mpe.nus.edu.sg/~mpessk/svm/smo_mod_nc.ps.gz}
}

@inproceedings{Hastie1998,
  author = {Trevor Hastie and Robert Tibshirani},
  booktitle = {Advances in Neural Information Processing Systems},
  editor = {Michael I. Jordan and Michael J. Kearns and Sara A. Solla},
  publisher = {MIT Press},
  title = {Classification by Pairwise Coupling},
  volume = {10},
  year = {1998},
  PS = {http://www-stat.stanford.edu/~hastie/Papers/2class.ps}
}

Valid options are:
-D If set, classifier is run in debug mode and may output additional info to the console
-no-checks Turns off all checks - use with caution! Turning them off assumes that data is purely numeric, doesn't contain any missing values, and has a nominal class. Turning them off also means that no header information will be stored if the machine is linear. Finally, it also assumes that no instance has a weight equal to 0. (default: checks on)
-C <double> The complexity constant C. (default 1)
-N Whether to 0=normalize/1=standardize/2=neither. (default 0=normalize)
-L <double> The tolerance parameter. (default 1.0e-3)
-P <double> The epsilon for round-off error. (default 1.0e-12)
-M Fit logistic models to SVM outputs.
-V <double> The number of folds for the internal cross-validation. (default -1, use training data)
-W <double> The random number seed. (default 1)
-K <classname and parameters> The Kernel to use. (default: weka.classifiers.functions.supportVector.PolyKernel)
Options specific to kernel weka.classifiers.functions.supportVector.PolyKernel:
-D Enables debugging output (if available) to be printed. (default: off)
-no-checks Turns off all checks - use with caution! (default: checks on)
-C <num> The size of the cache (a prime number). (default: 250007)
-E <num> The Exponent to use. (default: 1.0)
-L Use lower-order terms. (default: no)
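
To make the `-E` and `-L` kernel options above concrete, here is a minimal, self-contained sketch of a polynomial kernel on dense vectors. It is not the library's `PolyKernel` class: the class name `PolyKernelSketch` and its methods are hypothetical, and the sketch assumes that "lower-order terms" means adding 1 to the dot product before raising it to the exponent.

```java
public class PolyKernelSketch {

    /** Dot product of two dense feature vectors of equal length. */
    public static double dot(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }

    /**
     * Polynomial kernel: (x.y)^exponent, or (x.y + 1)^exponent when
     * lower-order terms are requested (ASSUMED meaning of the -L flag).
     */
    public static double polyKernel(double[] x, double[] y,
                                    double exponent, boolean lowerOrder) {
        double base = dot(x, y);
        if (lowerOrder) {
            base += 1.0;
        }
        // With the default exponent of 1.0 the kernel is linear.
        return exponent == 1.0 ? base : Math.pow(base, exponent);
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0};
        double[] b = {3.0, 4.0};
        // Defaults from the option list: exponent 1.0, no lower-order terms.
        System.out.println(polyKernel(a, b, 1.0, false));
        // Quadratic kernel with lower-order terms.
        System.out.println(polyKernel(a, b, 2.0, true));
    }
}
```

With the default options (`-E 1.0`, no `-L`) the kernel reduces to a plain dot product, which is consistent with the `m_KernelIsLinear` field listed further down.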
Modifier and Type | Class and Description |
---|---|
class | SMO.BinarySMO - Class for building a binary support vector machine. |
Modifier and Type | Field and Description |
---|---|
static int | FILTER_NONE - filter: No normalization/standardization |
static int | FILTER_NORMALIZE - filter: Normalize training data |
static int | FILTER_STANDARDIZE - filter: Standardize training data |
protected java.lang.String | input_test_name - Test dataset filename. |
protected java.lang.String | input_train_name - Training dataset filename. |
protected java.lang.String | input_validation_name - Validation dataset filename. |
protected double | m_C - The complexity parameter. |
protected boolean | m_checksTurnedOff - Turn off all checks and conversions? |
protected Attribute | m_classAttribute - The class attribute |
protected SMO.BinarySMO[][] | m_classifiers - The binary classifier(s) |
protected int | m_classIndex - The class index from the training data |
protected static double | m_Del - Precision constant for updating sets |
protected double | m_eps - Epsilon for rounding. |
protected int | m_filterType - Whether to normalize/standardize/neither |
protected boolean | m_fitLogisticModels - Whether logistic models are to be fit |
protected Kernel | m_kernel - The kernel to use |
protected boolean | m_KernelIsLinear - Whether the kernel is a linear one |
protected boolean | m_nominalToBinary - Whether to convert nominal attributes into binary values |
protected int | m_NumClasses - The number of class labels |
protected int | m_numFolds - The number of folds for the internal cross-validation |
protected int | m_randomSeed - The random number seed |
protected int | m_seed - The seed. |
protected double | m_tol - Tolerance for accuracy of result. |
protected double[] | mean - Variable with the mean of each attribute. |
protected java.lang.String | method_output - Model output filename. |
protected java.lang.String | output_test_name - Test output filename. |
protected java.lang.String | output_train_name - Training output filename. |
double[][] | probabilities - SMO probabilities. |
protected double[] | std_dev - Variable with the std deviation of each attribute. |
static Tag[] | TAGS_FILTER - The filter to apply to the training data |
Constructor and Description |
---|
SMO() - Default constructor |
SMO(java.lang.String fileParam) - Creates a new instance of SMO with a file parameter of KEEL format |
Modifier and Type | Method and Description |
---|---|
java.lang.String[][][] | attributeNames() - Returns the attribute names. |
double[][] | bias() - Returns the bias of each binary SMO. |
void | buildClassifier(Instances insts) - Method for building the classifier. |
java.lang.String | buildLogisticModelsTipText() - Returns the tip text for this property |
java.lang.String | checksTurnedOffTipText() - Returns the tip text for this property |
java.lang.String[] | classAttributeNames() - Returns the names of the class attributes. |
protected void | computeStats(InstanceSet IS) - Compute the mean and std. deviation of each attribute of Attributes. |
java.lang.String | cTipText() - Returns the tip text for this property |
double[] | distributionForInstance(Instance inst) - Estimates class probabilities for given instance. |
java.lang.String | epsilonTipText() - Returns the tip text for this property |
java.lang.String | filterTypeTipText() - Returns the tip text for this property |
boolean | getBuildLogisticModels() - Get the value of buildLogisticModels. |
double | getC() - Get the value of C. |
boolean | getChecksTurnedOff() - Returns whether the checks are turned off or not. |
double | getEpsilon() - Get the value of epsilon. |
SelectedTag | getFilterType() - Gets how the training data will be transformed. |
Kernel | getKernel() - Returns the kernel to use |
int | getNumFolds() - Get the value of numFolds. |
int | getRandomSeed() - Get the value of randomSeed. |
TechnicalInformation | getTechnicalInformation() - Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on. |
double | getToleranceParameter() - Get the value of tolerance parameter. |
java.lang.String | globalInfo() - Returns a string describing classifier |
protected Instances | InstancesKEEL2Weka(InstanceSet is, int preprocessType, boolean nominal2binary) - Creates a newly allocated WEKA set of Instances from a KEEL InstanceSet. |
java.lang.String | kernelTipText() - Returns the tip text for this property |
protected double | normalize(double value, Attribute a) - Normalize the input value according to the provided attribute |
int | numClassAttributeValues() - Returns the number of values of the class attribute. |
java.lang.String | numFoldsTipText() - Returns the tip text for this property |
int[] | obtainVotes(Instance inst) - Returns an array of votes for the given instance. |
double[] | pairwiseCoupling(double[][] n, double[][] r) - Implements pairwise coupling. |
protected void | printSVs() - Prints the support vectors to file |
java.lang.String | randomSeedTipText() - Returns the tip text for this property |
void | runModel() - Run the model once the parameters have been set by the method config_read() |
void | runModel(InstanceSet train, InstanceSet test) - Run the model once the parameters have been set by the method config_read() |
void | setBuildLogisticModels(boolean newbuildLogisticModels) - Set the value of buildLogisticModels. |
void | setC(double v) - Set the value of C. |
void | setChecksTurnedOff(boolean value) - Disables or enables the checks (which could be time-consuming). |
void | setEpsilon(double v) - Set the value of epsilon. |
void | setFilterType(SelectedTag newType) - Sets how the training data will be transformed. |
void | setKernel(Kernel value) - Sets the kernel to use |
void | setNumFolds(int newnumFolds) - Set the value of numFolds. |
void | setRandomSeed(int newrandomSeed) - Set the value of randomSeed. |
void | setToleranceParameter(double v) - Set the value of tolerance parameter. |
int[][][] | sparseIndices() - Returns the indices in sparse format. |
double[][][] | sparseWeights() - Returns the weights in sparse format. |
protected double | standardize(double value, int j) - Standardize the provided value, converting it to a new double value from a normal distribution with mean = 0 and std. deviation = 1 |
java.lang.String | toleranceParameterTipText() - Returns the tip text for this property |
void | turnChecksOff() - Turns off checks for missing values, etc. |
void | turnChecksOn() - Turns on checks for missing values, etc. |
static void | writeOutput(java.lang.String fileName, java.lang.String[] instancesIN, java.lang.String[] instancesOUT, Attribute[] inputs, Attribute output, int nInputs, java.lang.String relation) - Creates the output file in KEEL format of this method |
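
The `m_classifiers` field holds one binary machine per pair of classes, and `obtainVotes` returns an array of votes for an instance. The sketch below illustrates how such one-vs-one voting can work in principle. It is an illustration only, not this class's code: `PairwiseVotingSketch` and its methods are hypothetical names, the binary machines are replaced by a plain matrix of signed outputs, and the sign convention (a positive output favours the second class of the pair) is an assumption.

```java
public class PairwiseVotingSketch {

    /**
     * Counts one-vs-one votes. outputs[i][j] (only i < j is read) stands in for
     * the signed output of the binary machine trained on classes i and j.
     * ASSUMPTION: a positive output favours class j, anything else class i.
     */
    public static int[] obtainVotes(double[][] outputs) {
        int numClasses = outputs.length;
        int[] votes = new int[numClasses];
        for (int i = 0; i < numClasses; i++) {
            for (int j = i + 1; j < numClasses; j++) {
                if (outputs[i][j] > 0) {
                    votes[j]++;
                } else {
                    votes[i]++;
                }
            }
        }
        return votes;
    }

    /** Index of the class with the most votes; ties go to the lowest index. */
    public static int argMax(int[] votes) {
        int best = 0;
        for (int c = 1; c < votes.length; c++) {
            if (votes[c] > votes[best]) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Three classes, so three pairwise machines: (0,1), (0,2), (1,2).
        double[][] outputs = {
            {0.0, -1.2, 0.7},
            {0.0,  0.0, 0.4},
            {0.0,  0.0, 0.0}
        };
        int[] votes = obtainVotes(outputs);
        System.out.println(java.util.Arrays.toString(votes));
        System.out.println("predicted class index: " + argMax(votes));
    }
}
```

For k classes this scheme evaluates k(k-1)/2 binary machines per instance, which is why `m_classifiers` is a two-dimensional array.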
public double[][] probabilities
public static final int FILTER_NORMALIZE
public static final int FILTER_STANDARDIZE
public static final int FILTER_NONE
public static final Tag[] TAGS_FILTER
protected SMO.BinarySMO[][] m_classifiers
protected double m_C
protected double m_eps
protected double m_tol
protected int m_filterType
protected int m_classIndex
protected Attribute m_classAttribute
protected boolean m_KernelIsLinear
protected boolean m_checksTurnedOff
protected static double m_Del
protected boolean m_fitLogisticModels
protected int m_numFolds
protected int m_randomSeed
protected Kernel m_kernel
protected int m_NumClasses
protected int m_seed
protected boolean m_nominalToBinary
protected java.lang.String input_train_name
protected java.lang.String input_validation_name
protected java.lang.String input_test_name
protected java.lang.String output_train_name
protected java.lang.String output_test_name
protected java.lang.String method_output
protected double[] mean
protected double[] std_dev
public SMO(java.lang.String fileParam)
fileParam - The path to the configuration file with all the parameters in KEEL format
public SMO()
Default constructor
public java.lang.String globalInfo()
public TechnicalInformation getTechnicalInformation()
Specified by: getTechnicalInformation in interface TechnicalInformationHandler
public void turnChecksOff()
public void turnChecksOn()
public void buildClassifier(Instances insts) throws java.lang.Exception
insts - the set of training instances
java.lang.Exception - if the classifier can't be built successfully
public double[] distributionForInstance(Instance inst) throws java.lang.Exception
inst - the instance to compute the probabilities for
java.lang.Exception - in case of an error
public double[] pairwiseCoupling(double[][] n, double[][] r)
n - the sum of weights used to train each model
r - the probability estimate from each model
public int[] obtainVotes(Instance inst) throws java.lang.Exception
inst - the instance
java.lang.Exception - if something goes wrong
public double[][][] sparseWeights()
public int[][][] sparseIndices()
public double[][] bias()
public int numClassAttributeValues()
public java.lang.String[] classAttributeNames()
public java.lang.String[][][] attributeNames()
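
The `pairwiseCoupling(double[][] n, double[][] r)` method documented above combines the pairwise probability estimates `r` (weighted by `n`) into a single class distribution, following the idea in the Hastie and Tibshirani reference. The following is a simplified sketch of that kind of iteration, not the class's actual implementation: the class name `PairwiseCouplingSketch`, the iteration cap of 1000, and the convergence threshold of 1e-10 are assumptions, and it expects full matrices with `n[i][j] == n[j][i]` and `r[j][i] == 1 - r[i][j]`.

```java
import java.util.Arrays;

public class PairwiseCouplingSketch {

    /**
     * @param n n[i][j] is the weight behind the model for classes i and j
     * @param r r[i][j] is the estimated probability of class i, given that
     *          the true class is either i or j
     * @return class probability estimates that sum to 1
     */
    public static double[] pairwiseCoupling(double[][] n, double[][] r) {
        int k = r.length;
        double[] p = new double[k];
        Arrays.fill(p, 1.0 / k); // start from the uniform distribution

        for (int iter = 0; iter < 1000; iter++) { // iteration cap is an assumption
            double[] next = new double[k];
            double total = 0.0;
            for (int i = 0; i < k; i++) {
                double observed = 0.0; // sum over j of n_ij * r_ij
                double expected = 0.0; // sum over j of n_ij * p_i / (p_i + p_j)
                for (int j = 0; j < k; j++) {
                    if (j == i) {
                        continue;
                    }
                    observed += n[i][j] * r[i][j];
                    expected += n[i][j] * p[i] / (p[i] + p[j]);
                }
                // Scale p_i so the model's pairwise predictions move towards r.
                next[i] = expected > 0.0 ? p[i] * observed / expected : 0.0;
                total += next[i];
            }
            double change = 0.0;
            for (int i = 0; i < k; i++) {
                next[i] /= total; // renormalize to a probability distribution
                change = Math.max(change, Math.abs(next[i] - p[i]));
            }
            p = next;
            if (change < 1e-10) { // threshold is an assumption
                break;
            }
        }
        return p;
    }

    public static void main(String[] args) {
        double[][] n = {{0, 1}, {1, 0}};
        double[][] r = {{0, 0.8}, {0.2, 0}};
        System.out.println(Arrays.toString(pairwiseCoupling(n, r)));
    }
}
```

When the pairwise estimates are mutually consistent, that is, when they all derive from one underlying distribution, that distribution is a fixed point of the update.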
public void setChecksTurnedOff(boolean value)
value - if true turns off all checks
public boolean getChecksTurnedOff()
public java.lang.String checksTurnedOffTipText()
public java.lang.String kernelTipText()
public void setKernel(Kernel value)
value - the kernel to use
public Kernel getKernel()
public java.lang.String cTipText()
public double getC()
public void setC(double v)
v - Value to assign to C.
public java.lang.String toleranceParameterTipText()
public double getToleranceParameter()
public void setToleranceParameter(double v)
v - Value to assign to tolerance parameter.
public java.lang.String epsilonTipText()
public double getEpsilon()
public void setEpsilon(double v)
v - Value to assign to epsilon.
public java.lang.String filterTypeTipText()
public SelectedTag getFilterType()
public void setFilterType(SelectedTag newType)
newType - the new filtering mode
public java.lang.String buildLogisticModelsTipText()
public boolean getBuildLogisticModels()
public void setBuildLogisticModels(boolean newbuildLogisticModels)
newbuildLogisticModels - Value to assign to buildLogisticModels.
public java.lang.String numFoldsTipText()
public int getNumFolds()
public void setNumFolds(int newnumFolds)
newnumFolds - Value to assign to numFolds.
public java.lang.String randomSeedTipText()
public int getRandomSeed()
public void setRandomSeed(int newrandomSeed)
newrandomSeed - Value to assign to randomSeed.
public void runModel()
public void runModel(InstanceSet train, InstanceSet test)
train - training instances set.
test - test instances set.
public static void writeOutput(java.lang.String fileName, java.lang.String[] instancesIN, java.lang.String[] instancesOUT, Attribute[] inputs, Attribute output, int nInputs, java.lang.String relation)
fileName - Name of the content file
instancesIN - Vector with the original output values
instancesOUT - Vector with the predicted output values
inputs - Input attributes
output - Output attribute
nInputs - Number of input attributes
relation - Name of the data set
protected void printSVs()
protected Instances InstancesKEEL2Weka(InstanceSet is, int preprocessType, boolean nominal2binary)
is - The KEEL Instance set
preprocessType - An integer with the type of preprocessing done before exporting data to Weka format (0 = normalize, 1 = standardize, 2 = do nothing).
nominal2binary - True if the nominal values must be converted into a set of binary ones (one bit per value of the nominal attribute).
protected double normalize(double value, Attribute a)
value - The value to be normalized
a - The attribute to which value belongs
protected double standardize(double value, int j)
value - The value to be standardized
j - The index of the attribute in the mean and std_dev member arrays (previously filled)
protected void computeStats(InstanceSet IS)
IS - The InstanceSet with the instances from which the statistics are computed
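
The `normalize`, `standardize` and `computeStats` methods above correspond to the three preprocessing modes (0 = normalize, 1 = standardize, 2 = do nothing). As a rough illustration of the arithmetic involved, the sketch below works on plain `double` arrays rather than on KEEL's `InstanceSet` and `Attribute` types. All names in it are hypothetical, and two details are assumptions: normalization is taken to be min-max scaling to [0, 1], and the standard deviation uses the population form (division by the number of instances).

```java
public class ScalingSketch {

    /** Mean of each column (attribute) of a dense, non-empty data matrix. */
    public static double[] columnMeans(double[][] data) {
        int cols = data[0].length;
        double[] mean = new double[cols];
        for (double[] row : data) {
            for (int j = 0; j < cols; j++) {
                mean[j] += row[j];
            }
        }
        for (int j = 0; j < cols; j++) {
            mean[j] /= data.length;
        }
        return mean;
    }

    /** Standard deviation of each column. ASSUMPTION: population form (divide by n). */
    public static double[] columnStdDevs(double[][] data, double[] mean) {
        int cols = data[0].length;
        double[] stdDev = new double[cols];
        for (double[] row : data) {
            for (int j = 0; j < cols; j++) {
                double diff = row[j] - mean[j];
                stdDev[j] += diff * diff;
            }
        }
        for (int j = 0; j < cols; j++) {
            stdDev[j] = Math.sqrt(stdDev[j] / data.length);
        }
        return stdDev;
    }

    /** Min-max scaling to [0, 1]; a constant attribute maps to 0 (ASSUMED convention). */
    public static double normalize(double value, double min, double max) {
        return max > min ? (value - min) / (max - min) : 0.0;
    }

    /** Z-score; a zero-variance attribute maps to 0 (ASSUMED convention). */
    public static double standardize(double value, double mean, double stdDev) {
        return stdDev > 0.0 ? (value - mean) / stdDev : 0.0;
    }

    public static void main(String[] args) {
        double[][] data = {{1.0, 10.0}, {3.0, 30.0}};
        double[] mean = columnMeans(data);
        double[] stdDev = columnStdDevs(data, mean);
        // Standardize the first attribute of the second instance.
        System.out.println(standardize(data[1][0], mean[0], stdDev[0]));
        // Normalize the same value using the attribute's observed range.
        System.out.println(normalize(data[1][0], 1.0, 3.0));
    }
}
```

As the description of `j` above notes, the statistics must be filled in before any value is standardized, which is why the sketch computes the means and deviations first.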