Partly abstract class; defines support for data splitting.
#include <data_accessor_splitting.hpp>
Classes
class | DataSplit
    Data splitting support structure; holds one set of intervals (train, test) per splitting depth and (data) class.
Public Types
typedef boost::shared_ptr<Data_Splitter<INTERVALCONTAINER, IDXTYPE> > | PSplitter
typedef boost::shared_ptr<std::vector<PSplitter> > | PSplitters
typedef const DATATYPE * | PPattern
Public Member Functions
Data_Accessor_Splitting (const PSplitters _dsp)
virtual unsigned int | getNoOfClasses () const
    returns the number of classes
virtual unsigned int | getNoOfFeatures () const
    returns the data dimensionality
virtual IDXTYPE | getClassSize (const unsigned int c) const
    returns the size (number of samples) of class c
virtual IDXTYPE | getClassSizeSum () const
    returns the summed size of all classes, i.e., the total number of patterns in the data
virtual void | setClass (const int c)
    sets the active class; from then on, only data from class c are considered
virtual int | getClass () const
    returns the active class
void | setSplittingDepth (const unsigned int depth)
unsigned int | getSplittingDepth () const
virtual unsigned int | getNoOfSplits () const
    data access iteration (to support, e.g., loops in cross-validation)
virtual bool | getFirstSplit ()
    data access iteration (to support, e.g., loops in cross-validation)
virtual bool | getNextSplit ()
    data access iteration (to support, e.g., loops in cross-validation)
virtual unsigned int | getSplitIndex () const
    data access iteration (to support, e.g., loops in cross-validation)
virtual IDXTYPE | getNoOfBlocks (const DataPart ofwhat) const
virtual bool | getFirstBlock (const DataPart ofwhat, PPattern &firstpattern, IDXTYPE &patterns, const unsigned int loopdepth=0)=0
    returns a pointer to the first consecutive block of data of the requested DataPart type in the current split (access iteration)
virtual bool | getNextBlock (const DataPart ofwhat, PPattern &firstpattern, IDXTYPE &patterns, const unsigned int loopdepth=0)=0
    returns a pointer to the next consecutive block of data of the requested DataPart type in the current split (access iteration)
virtual IDXTYPE | getBlockIndex (const unsigned int loopdepth=0) const
    returns the index of the current consecutive block of data of the requested DataPart type in the current split (access iteration)
virtual IDXTYPE | getNoOfPatterns (const DataPart ofwhat) const
    returns the number of patterns in all consecutive blocks of data of the requested DataPart type in the current split (access iteration)
virtual void | substitute (const DataPart source, const DataPart target)
    enables changing the meaning of DataPart types, for use in specialized data access scenarios such as bias-predicting wrappers
virtual void | resubstitute ()
    restores the standard meaning of DataPart types
virtual std::ostream & | print (std::ostream &os) const
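The split and block iteration methods listed above are meant to be nested: an outer loop over splits and an inner loop over the consecutive blocks of one DataPart. The following is a minimal sketch of that loop, assuming a hypothetical in-memory stand-in (`MockSplittingAccessor`, a two-value `DataPart` enum, patterns stored row-major with `features` values each, intervals as (first pattern, count) pairs). Only the method names `getFirstSplit`, `getNextSplit`, `getFirstBlock` and `getNextBlock` mirror the listing; the class itself is not FST's.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins; FST defines its own DataPart enum and accessor classes.
enum DataPart { TRAIN, TEST };
using Interval = std::pair<std::size_t, std::size_t>;  // (first pattern, pattern count)

struct Split {
    std::vector<Interval> train, test;  // blocks of consecutive patterns
};

class MockSplittingAccessor {
public:
    MockSplittingAccessor(std::vector<double> data, unsigned features, std::vector<Split> splits)
        : data_(std::move(data)), features_(features), splits_(std::move(splits)) {}

    bool getFirstSplit() { split_ = 0; return split_ < splits_.size(); }
    bool getNextSplit() { ++split_; return split_ < splits_.size(); }

    // On success, `first` points to the first value of the block's first
    // pattern and `patterns` holds the number of patterns in the block.
    bool getFirstBlock(DataPart part, const double*& first, std::size_t& patterns) {
        block_ = 0;
        return currentBlock(part, first, patterns);
    }
    bool getNextBlock(DataPart part, const double*& first, std::size_t& patterns) {
        ++block_;
        return currentBlock(part, first, patterns);
    }

private:
    bool currentBlock(DataPart part, const double*& first, std::size_t& patterns) const {
        if (split_ >= splits_.size()) return false;
        const std::vector<Interval>& blocks =
            (part == TRAIN) ? splits_[split_].train : splits_[split_].test;
        if (block_ >= blocks.size()) return false;
        first = data_.data() + blocks[block_].first * features_;
        patterns = blocks[block_].second;
        return true;
    }

    std::vector<double> data_;
    unsigned features_;
    std::vector<Split> splits_;
    std::size_t split_ = 0, block_ = 0;
};

// Cross-validation style traversal: for every split, visit every block of
// the requested DataPart; here it only counts the visited patterns.
std::size_t countPatterns(MockSplittingAccessor& da, DataPart part) {
    std::size_t total = 0;
    for (bool s = da.getFirstSplit(); s; s = da.getNextSplit()) {
        const double* first = nullptr;
        std::size_t n = 0;
        for (bool b = da.getFirstBlock(part, first, n); b; b = da.getNextBlock(part, first, n))
            total += n;  // a real criterion would read n patterns starting at `first`
    }
    return total;
}
```

In a two-fold split of four patterns, each pattern is visited once as training data and once as test data, so both counts come to four.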
Protected Types
typedef std::vector<unsigned int> | CLASSSIZES
typedef const Data_Interval<IDXTYPE> * | DATAINTERVAL
typedef boost::shared_ptr<INTERVALCONTAINER> | PIntervaller
Protected Member Functions
Data_Accessor_Splitting (const Data_Accessor_Splitting &da)
void | initialize (const unsigned int _features, const CLASSSIZES &_classes)
DataPart | mappedDataPart (const DataPart ofwhat) const
virtual bool | getFirstBlock (const DataPart ofwhat, DATAINTERVAL &tmp, const unsigned int loopdepth=0)
    returns the Data_Interval record representing the first consecutive block of data of the requested DataPart type in the current split (access iteration)
virtual bool | getNextBlock (const DataPart ofwhat, DATAINTERVAL &tmp, const unsigned int loopdepth=0)
    returns the Data_Interval record representing the next consecutive block of data of the requested DataPart type in the current split (access iteration)
bool | is_initialized () const
void | assert_splits (const int splitting_check=-1) const
Protected Attributes
CLASSSIZES | classes
unsigned int | features
DataPart | mappedTRAIN
DataPart | mappedTEST
DataPart | mappedTRAINTEST
DataPart | mappedALL
PSplitters | dsp
std::vector<std::vector<DataSplit> > | splits
    one DataSplit (set of train/test interval lists) per splitting depth and class
std::vector<IDXTYPE> | enum_split
    current split; 0 = none
std::vector<std::vector<IDXTYPE> > | enum_block
    current block loop; 0 = none
std::vector<std::vector<DataPart> > | tt_phase
    in the current block loop, for DataPart==TRAINTEST, indicates: 0 = no loop, 1 = train loop, 2 = test loop
unsigned int | splitting_depth
    switches the get*Train* and get*Test* functionality between the inner and outer loops
int | active_class
    denotes the class from which the get*Train*, get*Validate*, and get*Test* methods return patterns; set using setClass()
bool | _initialize_called
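The mappedTRAIN, mappedTEST, mappedTRAINTEST and mappedALL attributes, together with substitute(), resubstitute() and mappedDataPart(), form a small remapping table. The following is a minimal sketch under two assumptions the listing does not confirm: that `substitute(source, target)` makes requests for `source` be served with the data of `target`, and that the four DataPart values are TRAIN, TEST, TRAINTEST and ALL. `DataPartMapper` is a hypothetical class, and it keeps the table in a `std::array` rather than four separate members.

```cpp
#include <array>
#include <cassert>

// Hypothetical mirror of the DataPart values referenced by the mapped* attributes.
enum DataPart { TRAIN = 0, TEST = 1, TRAINTEST = 2, ALL = 3 };

class DataPartMapper {
public:
    // Assumed semantics: afterwards, requests for `source` are served with `target` data.
    void substitute(DataPart source, DataPart target) { map_[source] = target; }

    // Restores the identity mapping, i.e. the standard meaning of every DataPart.
    void resubstitute() { map_ = {{TRAIN, TEST, TRAINTEST, ALL}}; }

    // Data access routines would translate each request through this first.
    DataPart mappedDataPart(DataPart ofwhat) const { return map_[ofwhat]; }

private:
    std::array<DataPart, 4> map_{{TRAIN, TEST, TRAINTEST, ALL}};
};
```

Under this assumed reading, a bias-estimating wrapper could hypothetically call `substitute(TEST, TRAIN)` so that code asking for test data is evaluated on training data, then call `resubstitute()` afterwards.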
Partly abstract class; defines support for data splitting.
The data structures in Data_Accessor_Splitting are used directly by the splitting mechanism that enables structured access to data. To keep the implementation of data splitters (specializations of Data_Splitter) as simple as possible for the user, we moved most of the technicalities here. The correct state of the key data structures in Data_Accessor_Splitting is as follows:
The splitting mechanism needs not one but two train-test pairs of data structures; the second pair is denoted _reduced_train and _reduced_test. The "reduced" pair is constructed from the "base" pair separately at each splitting level, so that it correctly represents the subset of data defined by the respective splitter (each deeper level further restricts access to the data visible at the preceding level). The actual access to data through getFirstBlock() and getNextBlock() is governed only by the data intervals stored in the "reduced" pair of interval lists.
The two pairs of interval lists exist separately for each splitting level and each data class. In Data_Accessor_Splitting they are collected in the nested class DataSplit, of which the required number of instances is kept in the "splits" container. In a correct state, "splits" must contain [number of classes]*[number of splitting levels] DataSplit instances.
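The requirement that "splits" hold [number of classes]*[number of splitting levels] DataSplit instances can be sketched with a simplified DataSplit. This assumes that the outer index of `splits` is the splitting level and the inner index the data class, and models intervals as (first index, count) pairs; both choices are assumptions, and `makeSplits`, `shapeIsCorrect` and `currentSplit` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Interval = std::pair<std::size_t, std::size_t>;  // (first index, count)
using IntervalList = std::vector<Interval>;

// Simplified, hypothetical DataSplit: the "base" pair is filled by a splitter,
// the "reduced" pair is what block access actually reads.
struct DataSplit {
    IntervalList train, test;
    IntervalList reduced_train, reduced_test;
};

// Assumed layout: splits[level][class].
using Splits = std::vector<std::vector<DataSplit>>;

Splits makeSplits(std::size_t levels, std::size_t classes) {
    return Splits(levels, std::vector<DataSplit>(classes));
}

// True if every level holds one DataSplit per class, i.e. levels*classes in total.
bool shapeIsCorrect(const Splits& s, std::size_t levels, std::size_t classes) {
    if (s.size() != levels) return false;
    for (const auto& level : s)
        if (level.size() != classes) return false;
    return true;
}

// The record consulted for the current splitting depth and active class.
const DataSplit& currentSplit(const Splits& s, std::size_t depth, std::size_t activeClass) {
    assert(depth < s.size() && activeClass < s[depth].size());
    return s[depth][activeClass];
}
```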
The DataSplit instances that represent the top splitting level differ from those at deeper levels: their "reduced" pair of lists merely references the "base" pair. This is because at the top level, the data indexes produced by splitters are valid indexes that can be used to access the data directly. At deeper splitting levels this is not the case, because splitter indexes must be treated as relative to the data possibly restricted at a higher level. Transforming the relative indexes to absolute indexes is achieved through the "reduce" method implemented in Data_Intervaller. At non-top splitting levels, before data can be accessed, the "base" indexes/intervals are first transformed using the "reduce" method, and the result is stored in the "_reduced" train and test lists.
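The relative-to-absolute transformation can be illustrated on plain interval lists. This is a minimal sketch, assuming intervals are (first index, count) pairs and that a deeper level's splitter indexes the parent's visible data as if the parent intervals were laid end to end. `reduceIntervals` is a hypothetical free function, not Data_Intervaller's actual reduce method. At the top level the parent is the whole range [0, N), so the mapping is the identity, which is consistent with the top-level "reduced" pair simply referencing the "base" pair.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Interval = std::pair<std::size_t, std::size_t>;  // (first index, count)
using IntervalList = std::vector<Interval>;

// Maps intervals given relative to the data visible at the parent level
// (the parent intervals laid end to end) onto absolute data indexes.
// A relative interval that straddles a parent boundary is split in two.
IntervalList reduceIntervals(const IntervalList& relative, const IntervalList& parent) {
    IntervalList absolute;
    for (const Interval& rel : relative) {
        std::size_t start = rel.first;       // next relative index to map
        std::size_t remaining = rel.second;  // relative indexes still to map
        std::size_t offset = 0;              // relative index at which parent interval p begins
        for (const Interval& p : parent) {
            if (remaining == 0) break;
            if (start < offset + p.second) {
                const std::size_t skip = start - offset;
                const std::size_t take = std::min(remaining, p.second - skip);
                absolute.emplace_back(p.first + skip, take);
                start += take;
                remaining -= take;
            }
            offset += p.second;
        }
        assert(remaining == 0 && "relative interval exceeds the parent's data");
    }
    return absolute;
}
```

For example, if the parent level sees indexes 0 to 2 and 10 to 14, relative indexes 2 to 4 map to absolute index 2 and absolute indexes 10 to 11. Adjacent output intervals are not merged; an implementation could add that step.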
The "base" train and test lists allocated here in Data_Accessor_Splitting need to be interlinked with the respective data splitters. The splitters do not hold any allocated structures; instead, they redirect their output to the "base" train and test lists kept in Data_Accessor_Splitting. This enables the data access routines to transform the indexes by means of "reduce" whenever needed and then to access the correct subset of data.
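The ownership arrangement described above, in which the accessor allocates the "base" lists and the splitter only writes into them, might look as follows. `KFoldSplitter`, `link()` and `makeSplit()` are hypothetical names chosen for this sketch. `std::shared_ptr` stands in for the `boost::shared_ptr` used by PIntervaller, and intervals are again (first index, count) pairs relative to the data visible at the current level.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

using Interval = std::pair<std::size_t, std::size_t>;  // (first index, count)
using IntervalList = std::vector<Interval>;
using PIntervaller = std::shared_ptr<IntervalList>;    // FST uses boost::shared_ptr

// Hypothetical k-fold splitter: it allocates no interval storage itself and
// only writes into the lists the accessor has linked to it.
class KFoldSplitter {
public:
    explicit KFoldSplitter(std::size_t k) : k_(k) { assert(k_ > 0); }

    void link(PIntervaller train, PIntervaller test) {
        train_ = std::move(train);
        test_ = std::move(test);
    }

    // Fills the linked lists for fold `fold` out of `n` visible patterns:
    // one contiguous test block, everything else goes to training.
    void makeSplit(std::size_t n, std::size_t fold) {
        assert(train_ && test_ && fold < k_);
        const std::size_t begin = n * fold / k_;
        const std::size_t end = n * (fold + 1) / k_;
        train_->clear();
        test_->clear();
        if (begin > 0) train_->emplace_back(0, begin);
        if (end > begin) test_->emplace_back(begin, end - begin);
        if (end < n) train_->emplace_back(end, n - end);
    }

private:
    std::size_t k_;
    PIntervaller train_, test_;
};
```

Because both sides share the same pointers, whatever the splitter writes is immediately visible to the accessor, which can then apply the reduce step before any block access.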
IDXTYPE FST::Data_Accessor_Splitting< DATATYPE, IDXTYPE, INTERVALCONTAINER >::getNoOfBlocks (const DataPart ofwhat) const [inline, virtual]
Implements FST::Data_Accessor< DATATYPE, IDXTYPE >.
References FST::Data_Accessor_Splitting< DATATYPE, IDXTYPE, INTERVALCONTAINER >::active_class, FST::Data_Accessor_Splitting< DATATYPE, IDXTYPE, INTERVALCONTAINER >::splits, and FST::Data_Accessor_Splitting< DATATYPE, IDXTYPE, INTERVALCONTAINER >::splitting_depth.
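Since getNoOfBlocks() consults splits, splitting_depth and active_class, its result presumably derives from the interval lists of the DataSplit selected by the current depth and class. The sketch below shows that counting, along with the related getNoOfPatterns() quantity, on a hypothetical `ReducedPair` holding the "reduced" train and test lists. The assumption that TRAINTEST covers the train blocks followed by the test blocks is based on the tt_phase description. `noOfBlocks` and `noOfPatterns` are illustrative free functions, not FST code.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

enum DataPart { TRAIN, TEST, TRAINTEST };  // hypothetical subset of FST's DataPart

using Interval = std::pair<std::size_t, std::size_t>;  // (first index, count)
using IntervalList = std::vector<Interval>;

// The "reduced" lists of the DataSplit selected by depth and active class.
struct ReducedPair {
    IntervalList train, test;
};

// One block per interval; TRAINTEST visits the train blocks, then the test blocks.
std::size_t noOfBlocks(const ReducedPair& r, DataPart part) {
    switch (part) {
        case TRAIN: return r.train.size();
        case TEST: return r.test.size();
        case TRAINTEST: return r.train.size() + r.test.size();
    }
    return 0;
}

// Total number of patterns over all blocks of the requested part.
std::size_t noOfPatterns(const ReducedPair& r, DataPart part) {
    auto sum = [](const IntervalList& list) {
        std::size_t n = 0;
        for (const Interval& iv : list) n += iv.second;
        return n;
    };
    switch (part) {
        case TRAIN: return sum(r.train);
        case TEST: return sum(r.test);
        case TRAINTEST: return sum(r.train) + sum(r.test);
    }
    return 0;
}
```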
void FST::Data_Accessor_Splitting< DATATYPE, IDXTYPE, INTERVALCONTAINER >::initialize (const unsigned int _features, const CLASSSIZES &_classes) [inline, protected]
Sets up the memory structures needed by the splitting mechanism. Data access is not needed here: the structures work with indexes only, so the only information required is the data dimensionality and the sizes of the data classes.
References FST::Data_Accessor_Splitting< DATATYPE, IDXTYPE, INTERVALCONTAINER >::splits.
Referenced by FST::Data_Accessor_Splitting_Mem< DATATYPE, IDXTYPE, INTERVALCONTAINER >::initialize().
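Because initialize() needs only the dimensionality and the class sizes, the index-only bookkeeping behind getNoOfFeatures(), getNoOfClasses(), getClassSize() and getClassSizeSum() can be sketched without any data values. `SplittingBookkeeping` is a hypothetical class. It returns `unsigned int`/`unsigned long` where FST uses IDXTYPE, and it omits the allocation of the splits structures.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

using CLASSSIZES = std::vector<unsigned int>;  // same definition as the protected typedef

// Hypothetical index-only bookkeeping: no data values are stored, only the
// dimensionality and the number of samples in each class.
class SplittingBookkeeping {
public:
    void initialize(unsigned int features, const CLASSSIZES& classes) {
        assert(features > 0 && !classes.empty());
        features_ = features;
        classes_ = classes;
        initialized_ = true;
    }
    bool is_initialized() const { return initialized_; }
    unsigned int getNoOfFeatures() const { return features_; }
    unsigned int getNoOfClasses() const { return static_cast<unsigned int>(classes_.size()); }
    unsigned int getClassSize(unsigned int c) const {
        assert(c < classes_.size());
        return classes_[c];
    }
    // Total number of patterns in the data.
    unsigned long getClassSizeSum() const {
        return std::accumulate(classes_.begin(), classes_.end(), 0UL);
    }

private:
    unsigned int features_ = 0;
    CLASSSIZES classes_;
    bool initialized_ = false;
};
```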