FST3 Documentation

This is the Doxygen-generated documentation of the C++ Feature Selection Toolbox 3 (FST3) library. The library implements several state-of-the-art feature selection methods and feature selection criteria, together with supporting data modeling tools and classifiers.

The library makes extensive use of templates and the Boost library (http://www.boost.org).

This is the documentation of library version 3.1.0.beta.


Data format(s)
Since FST3 version 3.0.1, the ARFF format (the data format of the Weka machine learning library) is supported.

The original FST3 data file format is TRN. TRN is a simple text format in which a short textual header precedes a class-ordered collection of vectors of C-style numerical values separated by whitespace. The data section (everything after the header) must contain exactly N whitespace-separated numerical values and nothing else, where N=D*(S1+S2+...+Sc) for D-dimensional data representing c classes of sizes S1,...,Sc. An example of a TRN file is shown below.

    #datafile 
    #title Medical data 
    ; 2-class 33-dimensional data representing tissue samples 
    ; 128 samples of benign tissue, 222 samples of malignant tissue 
    #features       33 
    #classes        2       128,222 
    #data 
    13.54  14.36  87.46  566.3  0.09779  0.08129 
    0.06664  0.04781  0.1885  0.05766  0.2699  0.7886 
    2.058  23.56  0.008462  0.0146  0.02387  0.01315 
    . 
    . 

All keywords must be placed at the beginning of lines. The first line must contain the #datafile keyword. The #title line is optional. The #features and #classes lines are mandatory. The #features keyword must be followed by whitespace and then by the number of features. The #classes keyword must be followed by the number of classes, then by whitespace, and then by a series of class sizes separated by commas. The ";" character at the beginning of a line marks a comment. Comments may appear anywhere inside the header, but not after it. No keywords, comments or special characters may occur after the #data keyword, which marks the start of the actual data.
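
The header rules above can be sketched as a small stand-alone reader. This is only an illustration of the format, not the FST3 parser: the names TrnHeader, read_trn_header and example_value_count are hypothetical, and tolerating blank header lines is an assumption that the format description does not state.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical holder for the header fields; this is not an FST3 type.
struct TrnHeader {
    std::string title;                     // optional #title text
    std::size_t features = 0;              // D, from #features
    std::vector<std::size_t> class_sizes;  // S1..Sc, from #classes

    // N = D * (S1 + S2 + ... + Sc): number of values expected after #data.
    std::size_t expected_values() const {
        std::size_t samples = 0;
        for (std::size_t s : class_sizes) samples += s;
        return features * samples;
    }
};

// Reads header lines up to and including #data. Returns false if the
// header breaks one of the rules stated above.
bool read_trn_header(std::istream& in, TrnHeader& h) {
    std::string line, key;
    if (!std::getline(in, line) || line.compare(0, 9, "#datafile") != 0)
        return false;  // the first line must start with #datafile
    bool have_features = false, have_classes = false;
    while (std::getline(in, line)) {
        // Comment line, or blank line (accepting blanks is an assumption).
        if (line.empty() || line[0] == ';') continue;
        if (line[0] != '#') return false;  // keywords start the line
        std::istringstream ls(line);
        ls >> key;
        if (key == "#title") {
            std::getline(ls >> std::ws, h.title);
            // Drop trailing whitespace from the title text.
            while (!h.title.empty() &&
                   std::isspace(static_cast<unsigned char>(h.title.back())))
                h.title.pop_back();
        } else if (key == "#features") {
            if (!(ls >> h.features)) return false;
            have_features = true;
        } else if (key == "#classes") {
            std::size_t count = 0;
            if (!(ls >> count)) return false;
            h.class_sizes.clear();
            std::string item;
            while (std::getline(ls, item, ',')) {  // comma-separated sizes
                std::istringstream is(item);
                std::size_t size = 0;
                if (!(is >> size)) return false;
                h.class_sizes.push_back(size);
            }
            if (h.class_sizes.size() != count) return false;
            have_classes = true;
        } else if (key == "#data") {
            return have_features && have_classes;  // both are mandatory
        } else {
            return false;  // unknown keyword
        }
    }
    return false;  // the #data keyword was never reached
}

// Usage sketch with the header values from the example file above.
std::size_t example_value_count() {
    std::istringstream in(
        "#datafile\n#features 33\n#classes 2 128,222\n#data\n");
    TrnHeader h;
    const bool ok = read_trn_header(in, h);
    assert(ok);
    (void)ok;  // avoids an unused-variable warning when NDEBUG is set
    return h.expected_values();  // 33 * (128 + 222)
}
```

After a successful call the stream is positioned just past the #data line, so a caller can read expected_values() numbers with ordinary stream extraction and treat any shortfall or surplus as a malformed file.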

General FST3 template parameter naming conventions:

Basic numeric types:

IDXTYPE
index values for enumeration of data samples - (nonnegative) integers, required range depends on the number of samples in the data
DIMTYPE
index values for enumeration of features (dimensions) or classes (not class sizes) - (nonnegative) integers, usually of narrower range than IDXTYPE, but beware of expressions like _classes*_features*_features (the type should be able to address linearized representations of feature matrices for all classes)
BINTYPE
feature selection marker type - should be able to represent fewer than about 10 different feature states (selected, deselected, temporarily selected/deselected in the 1st nested loop, in the 2nd nested loop, ...)
REALTYPE
must be real numbers - for representing intermediate results of calculations such as mean, covariance, etc.
DATATYPE
data sample values - usually real numbers, but may be integers in text processing etc.
RETURNTYPE
criterion value: real numerical value, but may be extended in future to support multiple values
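
As an illustration of the range requirement on DIMTYPE, the sketch below picks one possible set of basic types and checks whether a candidate type can hold _classes*_features*_features. The typedef choices and the names fits_linearized_index and check_example_types are assumptions made for this example; FST3 does not prescribe them.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical type choices for a small problem; FST3 does not mandate them.
typedef unsigned int   IDXTYPE;     // sample indices
typedef unsigned short DIMTYPE;     // feature and class indices
typedef unsigned char  BINTYPE;     // a handful of selection states
typedef double         REALTYPE;    // means, covariances, ...
typedef float          DATATYPE;    // data sample values
typedef double         RETURNTYPE;  // criterion value

// True if integer type T can hold classes * features * features, i.e. the
// size of the linearized feature matrices for all classes. The product is
// computed in 64 bits; inputs large enough to overflow that are not handled.
template <typename T>
bool fits_linearized_index(std::uint64_t classes, std::uint64_t features) {
    const std::uint64_t needed = classes * features * features;
    return needed <=
           static_cast<std::uint64_t>(std::numeric_limits<T>::max());
}

// Usage sketch: 2 classes of 33-dimensional data need 2*33*33 = 2178.
void check_example_types() {
    assert(fits_linearized_index<DIMTYPE>(2, 33));         // 2178 <= 65535
    assert(!fits_linearized_index<unsigned char>(2, 33));  // 2178 > 255
}
```

The point of the check is that a type wide enough to count features or classes individually may still be too narrow for products of those counts.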

Class types:

SUBSET
class of class type Subset
CLASSIFIER
class implementing interface defined in abstract class Classifier
EVALUATOR
class implementing interface defined in abstract class SubsetEvaluator
DISTANCE
class implementing interface defined in abstract class Distance
DATAACCESSOR
class implementing interface defined in abstract class TDataAccessor
INTERVALCONTAINER
class of class type TIntervaller
CONTAINER
STL container of class type TInterval

Maintainer:

Dept. of Pattern Recognition
UTIA, Institute of Information Theory and Automation
Pod vodarenskou vezi 4
18208 Praha 8
Czech Republic
(see Contacts at http://fst.utia.cz)

Copyright:

Institute of Information Theory and Automation (UTIA), Academy of Sciences of the Czech Republic, Prague. All rights reserved.


Generated on Thu Mar 31 11:34:36 2011 for FST3Library by  doxygen 1.6.1