Version history
- 3.1.1.beta
- Fixed a bug in Sparse ARFF filter that prevented correct Sparse ARFF files from being read.
- Standard ARFF filter is now less sensitive to header formatting and accepts more ARFF files straight away.
Remark: FST 3.1.0 and 3.1.1 differ only in the ARFF filter (_src_dataio/data_file_ARFF.cpp) and the Reuters ARFF sample data.
- 3.1.0.beta
- optimal Branch & Bound methods
- BBB, Basic Branch & Bound
- IBB, Improved Branch & Bound
- BBPP, Branch & Bound with Partial Prediction (averaging predictor)
- FBB, Fast Branch & Bound (averaging predictor)
- DAF, Dependency-Aware Feature Ranking, is a new highly efficient method for very-high-dimensional FS; unlike BIF it does not ignore contextual information and, consequently, is capable of yielding considerably better results (enables wrapper-based feature selection with dimensionality on the order of 10^5 to 10^6; works with an arbitrary wrapper)
- DAF0 (standard)
- DAF1 (normalized)
- SFRS/SBRS, the Sequential Retreating Search algorithms, are related to Floating Search but more thorough; also suitable for use with a secondary criterion (result regularization)
- 'generalized' variants of all sequential methods, enabling a more thorough search by testing feature g-tuples instead of just single features per step (see the 1982 book by Devijver and Kittler)
- (G)SFS, (G)SBS,
- (G)SFFS, (G)SBFS,
- (G)OS,
- (G)DOS,
- (G)SFRS, (G)SBRS
- all sequential methods now allow start from arbitrary subset (useful for tuning of results using several different methods)
- threaded implementation of individual feature ranking (BIF), handy in very-high-dimensional tasks
- Monte Carlo and threaded Monte Carlo methods select the best from a random sequence of feature subsets (see the first sketch below the version list)
- SFS/SBS, SFFS/SBFS, and SFRS/SBRS now enable post-search retrieval of the best result of each subset size as observed in the course of the search
- modified SFFS implementation to fit the original definition more closely (now runs faster)
- re-implemented threading in sequential methods, now more efficient due to a reduced number of thread creations/destructions
- search method output is now redirectable to arbitrary output stream
- search method output can be switched off (introduced output levels SILENT, NORMAL, DETAILED)
- improved result trackers (cloning, joining, etc.)
- arbitrary data part access substitution (TEST for TRAIN, etc.) to enable bias estimation
- bias estimating wrapper
- cleaner stopwatch implementation
- now permits missing values in data; such values are substituted per feature by the mean over that feature's valid values (see the second sketch below the version list)
- classifiers now implement method classify(), enabling classification of an arbitrary sample
- refactored directory structure
- lots of new demos showing broader variety of usage scenarios
- demos grouped according to purpose (for easier orientation especially of novice users)
- various minor improvements and additions (e.g., alternative random initialization of subsets, etc.)
- corrected several bugs and minor issues
- 3.0.2.beta
- added Exhaustive Search procedure in both sequential and threaded implementations to enable optimal feature selection
- corrected minor issues to support LibSVM 3.0
- result trackers now support cloning and memory usage limits
- added logfile with captured output of all demos for verification purposes (rundemos.log)
- corrected several minor issues
- 3.0.1.beta
- added support for reading ARFF (Waikato Weka) data files
- corrected minor issues to enable compilation in Visual C++
- 3.0.0.beta
- initial public release
- templated C++ code, using Boost library
- feature selection criteria
- classification accuracy estimation based (wrappers); see the data access options below
- normal Bayes classifier
- k-Nearest Neighbor classifier (based on various L-distances)
- Support Vector Machine (optional, depends on the external LibSVM library)
- normal model based (filter)
- Bhattacharyya distance
- Divergence
- Generalized Mahalanobis distance
- multinomial model based (filter) - Bhattacharyya, Mutual Information
- criteria ensembles
- hybrids
- feature selection methods
- ranking (BIF, best individual features)
- sequential search (hill-climbing)
- sequential selection (SFS/SBS, restricted/unrestricted)
- floating search (SFFS/SBFS, restricted/unrestricted)
- oscillating search (OS, deterministic, randomized, restricted/unrestricted)
- dynamic oscillating search (DOS, deterministic, randomized, restricted/unrestricted)
- in any of the above: threaded, sequential, hybrid, or ensemble-based feature preference evaluation
- supporting techniques (freely combinable with methods above)
- subset size optimization vs. subset size as user parameter
- result regularization (preference of solutions with slightly lower criterion value to counter over-fitting; see the third sketch below the version list)
- feature acquisition cost minimization
- feature selection process stability evaluation
- two-process similarity evaluation (to determine the impact of a parameter change, etc.)
- flexible data processing
- nested multi-level sampling (splitting into training, validation, test, and possibly other data parts)
- sampling through extendable objects (includes re-substitution, cross-validation, hold-out, leave-one-out, random sampling, etc.)
- normalization through extendable objects (interval shrinking, whitening)
- support for textual flat data format TRN (see FST1)
- pre-3.0.0
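
First sketch: a minimal illustration of the Monte Carlo idea from the 3.1.0 entry above. It draws a random sequence of feature subsets, evaluates each with the chosen criterion, and keeps the best. The function name, the criterion callback, and the 0.5 inclusion probability are assumptions made for this sketch, not FST3 interfaces.

    // Sketch only: evaluate random feature subsets, keep the best.
    #include <cstddef>
    #include <functional>
    #include <limits>
    #include <random>
    #include <vector>

    std::vector<bool> monte_carlo_search(
        std::size_t dim,     // number of features
        std::size_t trials,  // number of random subsets to evaluate
        const std::function<double(const std::vector<bool>&)>& criterion,
        unsigned seed = 1)
    {
        std::mt19937 rng(seed);
        std::bernoulli_distribution include(0.5);  // each feature in/out with p = 0.5
        std::vector<bool> best(dim, false);
        double best_value = -std::numeric_limits<double>::infinity();  // assumes higher = better
        for (std::size_t t = 0; t < trials; ++t) {
            std::vector<bool> subset(dim);
            for (std::size_t f = 0; f < dim; ++f) subset[f] = include(rng);
            const double value = criterion(subset);
            if (value > best_value) { best_value = value; best = subset; }
        }
        return best;
    }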
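
Second sketch: a minimal illustration of the per-feature mean substitution for missing values (3.1.0 entry above). Each missing entry is replaced by the mean over the valid values of the same feature. Marking missing values as NaN and the row-major data layout are assumptions of this sketch; FST3's data structures differ.

    // Sketch only: per-feature mean imputation of missing (NaN) values.
    #include <cmath>
    #include <cstddef>
    #include <vector>

    void impute_missing_by_mean(std::vector<std::vector<double>>& data)  // data[sample][feature]
    {
        if (data.empty()) return;
        const std::size_t dim = data.front().size();
        for (std::size_t f = 0; f < dim; ++f) {
            double sum = 0.0;
            std::size_t valid = 0;
            for (const auto& sample : data)
                if (!std::isnan(sample[f])) { sum += sample[f]; ++valid; }
            if (valid == 0) continue;  // no valid values for this feature; leave it untouched
            const double mean = sum / static_cast<double>(valid);
            for (auto& sample : data)
                if (std::isnan(sample[f])) sample[f] = mean;
        }
    }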
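
Third sketch: one possible reading of result regularization (supporting techniques under 3.0.0 above). Among candidate subsets, it prefers a smaller one whose criterion value lies within a tolerance of the best found, trading a slightly lower criterion value for robustness against over-fitting. The Candidate structure, the epsilon tolerance rule, and the smaller-is-preferred tie-break are assumptions of this sketch, not FST3's exact formulation.

    // Sketch only: prefer the smallest subset within epsilon of the best criterion value.
    #include <cstddef>
    #include <vector>

    struct Candidate {
        std::vector<int> features;  // indices of the selected features
        double value;               // criterion value (higher assumed better)
    };

    const Candidate* regularized_pick(const std::vector<Candidate>& results, double epsilon)
    {
        if (results.empty()) return nullptr;
        double best = results.front().value;
        for (const auto& c : results)
            if (c.value > best) best = c.value;
        const Candidate* pick = nullptr;
        for (const auto& c : results)
            if (c.value >= best - epsilon &&
                (pick == nullptr || c.features.size() < pick->features.size()))
                pick = &c;
        return pick;
    }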