gatb.core-API-0.0.0
Algorithm that builds a hash table whose keys are kmers and values are kmer abundances.
#include <MPHFAlgorithm.hpp>
Public Types | |
typedef kmer::impl::Kmer< span >::Type | Type |
typedef tools::collections::impl::MapMPHF< Type, Abundance_t > | AbundanceMap |
typedef tools::collections::impl::MapMPHF< Type, NodeState_t > | NodeStateMap |
typedef u_int8_t | Adjacency_t |
Public Member Functions | |
MPHFAlgorithm (tools::storage::impl::Group &group, const std::string &name, tools::collections::Iterable< Count > *solidCounts, tools::collections::Iterable< Type > *solidKmers, unsigned int nbCores, bool buildOrLoad, tools::misc::IProperties *options=0) | |
~MPHFAlgorithm () | |
void | execute () |
float | getNbBitsPerKmer () const |
AbundanceMap * | getAbundanceMap () const |
Public Member Functions inherited from Algorithm | |
Algorithm (const std::string &name, int nbCores=-1, gatb::core::tools::misc::IProperties *input=0) | |
virtual | ~Algorithm () |
std::string | getName () const |
void | run () |
virtual IProperties * | getInput () |
virtual IProperties * | getOutput () |
virtual IProperties * | getInfo () |
virtual dp::IDispatcher * | getDispatcher () |
virtual TimeInfo & | getTimeInfo () |
virtual IProperties * | getSystemInfo () |
template<typename Item > | |
dp::Iterator< Item > * | createIterator (dp::Iterator< Item > *iter, size_t nbIterations=0, const char *message=0, dp::IteratorListener *listener=0) |
virtual dp::IteratorListener * | createIteratorListener (size_t nbIterations, const char *message) |
Public Member Functions inherited from SmartPointer | |
void | use () |
void | forget () |
Public Member Functions inherited from ISmartPointer | |
virtual | ~ISmartPointer () |
Static Public Attributes | |
static const Abundance_t | MAX_ABUNDANCE = std::numeric_limits<Abundance_t>::max() |
Additional Inherited Members | |
Static Public Member Functions inherited from Algorithm | |
template<template< size_t > class Functor> | |
static int | mainloop (tools::misc::IOptionsParser *parser, int argc, char *argv[]) |
Protected Member Functions inherited from Algorithm | |
std::string | getUriByKey (const std::string &key) |
std::string | getUri (const std::string &str) |
void | setInput (IProperties *input) |
Protected Member Functions inherited from SmartPointer | |
SmartPointer () | |
virtual | ~SmartPointer () |
Algorithm that builds a hash table whose keys are kmers and values are kmer abundances.
This class implements a [kmer,abundance] mapping with a minimal perfect hash function (MPHF). For N kmers (i.e. the keys), the hash function gives each kmer a unique integer value between 0 and N-1.
It uses three template parameters: 1) span : gives the max usable size for kmers 2) Abundance_t : type of the abundance values (1 byte by default) 3) NodeState_t : type of the node state values (half a byte by default, packed two per byte)
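The "half a byte, grouped by two per byte" storage mentioned for the node states can be sketched as a packed vector of 4-bit values. This is a minimal illustration only: `NibbleVector` is a hypothetical name, and the choice of which half of the byte holds the even index is an assumption, not the layout used by the library.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: stores n values of 4 bits each, two per byte.
// Assumed layout: even indices in the low nibble, odd indices in the high nibble.
class NibbleVector {
public:
    explicit NibbleVector(std::size_t n) : bytes_((n + 1) / 2, 0) {}

    // Writes the low 4 bits of v at position i, leaving the neighbour untouched.
    void set(std::size_t i, std::uint8_t v) {
        std::uint8_t& b = bytes_[i / 2];
        if (i % 2 == 0) {
            b = static_cast<std::uint8_t>((b & 0xF0) | (v & 0x0F));
        } else {
            b = static_cast<std::uint8_t>((b & 0x0F) | ((v & 0x0F) << 4));
        }
    }

    // Reads the 4-bit value at position i.
    std::uint8_t get(std::size_t i) const {
        const std::uint8_t b = bytes_[i / 2];
        return static_cast<std::uint8_t>((i % 2 == 0) ? (b & 0x0F) : (b >> 4));
    }

    // Number of bytes actually allocated.
    std::size_t bytes() const { return bytes_.size(); }

private:
    std::vector<std::uint8_t> bytes_;
};
```

With this layout, N node states occupy (N + 1) / 2 bytes, which is the saving the default NodeState_t size aims for.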
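Storing the values (i.e. the abundances) is done by creating a vector of size N. Querying the abundance of a kmer consists in: 1) computing the hash code H of the kmer with the MPHF 2) reading the value stored at index H in the vector of values.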
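The value-vector scheme described above can be sketched as follows. This is not the MapMPHF implementation: `ToyPerfectHash` and `ToyAbundanceMap` are hypothetical names, and the hash is a plain stand-in (the rank of a key in the sorted key set), which is minimal and perfect for a fixed set but, unlike a real MPHF, keeps the keys in memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for an MPHF: for a fixed key set, the rank of a key
// in the sorted key list is a unique integer in [0, N-1].
class ToyPerfectHash {
public:
    explicit ToyPerfectHash(std::vector<std::uint64_t> keys) : keys_(std::move(keys)) {
        std::sort(keys_.begin(), keys_.end());
        keys_.erase(std::unique(keys_.begin(), keys_.end()), keys_.end());
    }

    std::size_t size() const { return keys_.size(); }

    // Only meaningful for keys that belong to the original set.
    std::size_t operator()(std::uint64_t key) const {
        return static_cast<std::size_t>(
            std::lower_bound(keys_.begin(), keys_.end(), key) - keys_.begin());
    }

private:
    std::vector<std::uint64_t> keys_;
};

// Values live in a plain vector of size N, indexed by the hash code.
class ToyAbundanceMap {
public:
    explicit ToyAbundanceMap(const std::vector<std::uint64_t>& keys)
        : hash_(keys), values_(hash_.size(), 0) {}

    void set(std::uint64_t kmer, std::uint8_t abundance) { values_[hash_(kmer)] = abundance; }
    std::uint8_t get(std::uint64_t kmer) const { return values_[hash_(kmer)]; }
    std::size_t size() const { return values_.size(); }

private:
    ToyPerfectHash hash_;               // declared first: values_ is sized from it
    std::vector<std::uint8_t> values_;
};
```

Because the hash codes cover exactly 0 to N-1, the value vector has no empty slots, so the memory cost per kmer is the value size plus the size of the hash function itself.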
The MPHF is built from a list of kmer values of type Kmer<span>::Type. Since building the MPHF may take a while, it is saved in a Storage object; more precisely, it is saved in a collection identified by a couple [group,name]. The group is likely to be the group of the SortingCount algorithm, and the name is "mphf" by convention.
Once the MPHF is built, it is populated with the kmer abundance values, which means that the value of each key of the hash table is set. The abundances are clipped to a maximum value so as not to exceed the capacity of the Abundance_t type (provided as a template parameter of the MPHFAlgorithm class). The maximum value is computed through the std::numeric_limits traits.
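The clipping step can be illustrated with a small helper. `clipAbundance` is a hypothetical function written for this example, and the 64-bit input type for raw counts is an assumption; only the use of std::numeric_limits comes from the description above.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: saturates a raw count to the capacity of Abundance_t.
// Assumes Abundance_t is an unsigned integer type no wider than 64 bits.
template <typename Abundance_t>
Abundance_t clipAbundance(std::uint64_t count) {
    const std::uint64_t maxValue =
        static_cast<std::uint64_t>(std::numeric_limits<Abundance_t>::max());
    return static_cast<Abundance_t>(count > maxValue ? maxValue : count);
}
```

With a 1-byte Abundance_t, any count above 255 is stored as 255, so a stored value equal to the maximum should be read as "at least this many".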
Once the abundance map is built and populated, it is available through the 'getAbundanceMap' method. It may be used for instance by the Graph class in order to get the abundance of any node (ie. kmer) of the de Bruijn graph.
Note: the keys of the hash table are of type Kmer<span>::Type, but the abundance information is only available through the Kmer<span>::Count type. That is why 2 Iterable instances are needed, one of type Kmer<span>::Count and one of type Kmer<span>::Type.
Some statistics about the MPHF building are gathered and put into the Properties 'info'.
typedef tools::collections::impl::MapMPHF<Type,Abundance_t> AbundanceMap |
We define the type of the hash table of couples [kmer/abundance].
typedef u_int8_t Adjacency_t |
We define the type of the hash table of couples [kmer/graph adjacency information].
typedef tools::collections::impl::MapMPHF<Type,NodeState_t> NodeStateMap |
We define the type of the hash table of couples [kmer/node state].
typedef kmer::impl::Kmer<span>::Type Type |
Shortcuts.
MPHFAlgorithm (tools::storage::impl::Group &group, const std::string &name, tools::collections::Iterable< Count > *solidCounts, tools::collections::Iterable< Type > *solidKmers, unsigned int nbCores, bool buildOrLoad, tools::misc::IProperties *options=0)
Constructor.
[in] | group | : storage group where to save the MPHF once built |
[in] | name | : name of the collection in the group where the MPHF will be saved |
[in] | solidCounts | : iterable on couples [kmers/abundance] |
[in] | solidKmers | : iterable on kmers |
[in] | buildOrLoad | : true for build/save the MPHF, false for load only |
[in] | options | : extra options for configuration (may be empty) |
~MPHFAlgorithm | ( | ) |
Destructor.
void execute | ( | ) | virtual |
Implementation of the Algorithm::execute method.
Implements Algorithm.
AbundanceMap * getAbundanceMap | ( | ) | const | inline |
Accessor to the map. Note: if clients get this map and hold it (as a SmartPointer), the map instance stays alive (i.e. is not deleted) even if the MPHFAlgorithm instance that built it is deleted first.
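The lifetime note above relies on reference counting through the use() and forget() methods listed for SmartPointer. The sketch below shows the general idea with hypothetical classes (`Counted`, `ToyMap`, `ToyBuilder`); it assumes that use() increments a counter and that forget() decrements it and deletes the object when it reaches zero, which may differ in detail from the library's SmartPointer.

```cpp
// Hypothetical intrusive reference counter, assumed semantics:
// use() takes a reference, forget() releases it and deletes on the last release.
class Counted {
public:
    Counted() : refs_(0) {}
    virtual ~Counted() {}
    void use() { ++refs_; }
    void forget() { if (--refs_ <= 0) delete this; }
private:
    int refs_;
};

// Stand-in for the abundance map; the flag records when it is destroyed.
struct ToyMap : Counted {
    explicit ToyMap(bool* destroyedFlag) : destroyed(destroyedFlag) {}
    ~ToyMap() { *destroyed = true; }
    bool* destroyed;
};

// Stand-in for the algorithm that builds and owns the map.
struct ToyBuilder {
    explicit ToyBuilder(ToyMap* m) : map(m) { map->use(); }
    ~ToyBuilder() { map->forget(); }   // releases only the builder's reference
    ToyMap* getMap() const { return map; }
    ToyMap* map;
};
```

A client that calls use() on the map before the builder is destroyed keeps the map alive, and must call forget() once it is done with it.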
float getNbBitsPerKmer | ( | ) | const |
Get the number of bits of a value.
const Abundance_t MAX_ABUNDANCE | static |
We define the maximum abundance according to the provided type (value set in the cpp file).
An earlier attempt to set the constant in the hpp file failed with the following error: "error: a function call cannot appear in a constant-expression". Defining it in the cpp file solved the problem. See http://stackoverflow.com/questions/2738435/using-numeric-limitsmax-in-constant-expressions
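The workaround described above follows a common pattern: declare the static constant inside the class without an initializer, and define it outside the class. The sketch below uses a hypothetical `AbundanceLimits` struct rather than the MPHFAlgorithm class itself.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical class used only to illustrate the pattern.
template <typename Abundance_t>
struct AbundanceLimits {
    // Declaration only: before C++11, std::numeric_limits<T>::max() is an
    // ordinary function call and cannot be used as an in-class initializer.
    static const Abundance_t MAX_ABUNDANCE;
};

// Out-of-class definition, where a function call is allowed as initializer.
template <typename Abundance_t>
const Abundance_t AbundanceLimits<Abundance_t>::MAX_ABUNDANCE =
    std::numeric_limits<Abundance_t>::max();
```

Since C++11, std::numeric_limits<T>::max() is constexpr, so a compiler in that mode also accepts an in-class `static constexpr` initializer.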