gatb.core-API-0.0.0
BankBinary Class Reference

Implementation of IBank for compressed format.

#include <BankBinary.hpp>

Inheritance diagram for BankBinary: (diagram not reproduced)

Classes

class  Iterator
 Specific Iterator implementation for the BankBinary class.
 

Public Member Functions

 BankBinary (const std::string &filename, size_t nbValidLetters=0)
 
 ~BankBinary ()
 
std::string getId ()
 
tools::dp::Iterator< Sequence > * iterator ()
 
int64_t getNbItems ()
 
void insert (const Sequence &item)
 
void flush ()
 
u_int64_t getSize ()
 
void estimate (u_int64_t &number, u_int64_t &totalSize, u_int64_t &maxSize)
 
void remove ()
 
- Public Member Functions inherited from AbstractBank
 AbstractBank ()
 
std::string getIdNb (int i)
 
int64_t estimateNbItemsBanki (int i)
 
const std::vector< IBank * > getBanks () const
 
int64_t estimateNbItems ()
 
u_int64_t estimateSequencesSize ()
 
u_int64_t getEstimateThreshold ()
 
void setEstimateThreshold (u_int64_t nbSeq)
 
void finalize ()
 
size_t getCompositionNb ()
 
- Public Member Functions inherited from Iterable< Sequence >
void iterate (Functor f)
 
virtual Sequence * getItems (Sequence *&buffer)
 
virtual size_t getItems (Sequence *&buffer, size_t start, size_t nb)
 
- Public Member Functions inherited from ISmartPointer
virtual ~ISmartPointer ()
 
- Public Member Functions inherited from Bag< Sequence >
virtual void insert (const Sequence &item)=0
 
virtual void insert (const std::vector< Sequence > &items, size_t length=0)
 
virtual void insert (const Sequence *items, size_t length)
 
- Public Member Functions inherited from SmartPointer
void use ()
 
void forget ()
 

Static Public Member Functions

static const char * name ()
 
static void setBufferSize (u_int64_t bufferSize)
 
static bool check (const std::string &uri)
 

Protected Attributes

std::string _filename
 

Additional Inherited Members

- Protected Member Functions inherited from SmartPointer
 SmartPointer ()
 
virtual ~SmartPointer ()
 

Detailed Description

Implementation of IBank for compressed format.

  • a binary file is made of:
    • a magic number
    • a list of blocks
      • a block is:
        • one block size (on 4 bytes)
        • a list of sequences
          • a sequence is:
            • a sequence length (on 4 bytes)
            • the nucleotides of the sequence (4 nucleotides encoded in 1 byte, i.e. 2 bits per nucleotide)
    • number of sequences (on 4 bytes)
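The layout above can be illustrated with a small serializer. This is a minimal sketch, not the BankBinary implementation: the magic number value (`kMagic`), the 2-bit codes (A=0, C=1, G=2, T=3), the bit order inside each byte, little-endian 4-byte integers, and the reading of "block size" as the byte count of the block payload are all hypothetical choices made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical magic number: the real BankBinary value is not given on this page.
const uint32_t kMagic = 0x12345678;

// Append a 32-bit unsigned integer in little-endian order (assumed byte order).
void putU32(std::vector<uint8_t>& out, uint32_t v) {
    for (int i = 0; i < 4; ++i) out.push_back(static_cast<uint8_t>(v >> (8 * i)));
}

// Assumed 2-bit code per nucleotide; any other letter is mapped to A here.
uint8_t code(char c) {
    switch (c) {
        case 'C': case 'c': return 1;
        case 'G': case 'g': return 2;
        case 'T': case 't': return 3;
        default: return 0;
    }
}

// Encode one sequence: 4-byte length, then 4 nucleotides per byte
// (first nucleotide in the high bits; the last byte is zero-padded).
void putSequence(std::vector<uint8_t>& out, const std::string& seq) {
    putU32(out, static_cast<uint32_t>(seq.size()));
    for (size_t i = 0; i < seq.size(); i += 4) {
        uint8_t byte = 0;
        for (size_t j = 0; j < 4; ++j) {
            byte <<= 2;
            if (i + j < seq.size()) byte |= code(seq[i + j]);
        }
        out.push_back(byte);
    }
}

// Build a whole file image: magic number, a single block, then the sequence count.
std::vector<uint8_t> encodeBank(const std::vector<std::string>& seqs) {
    std::vector<uint8_t> block;
    for (const auto& s : seqs) putSequence(block, s);

    std::vector<uint8_t> file;
    putU32(file, kMagic);
    putU32(file, static_cast<uint32_t>(block.size()));  // block size in bytes (assumed meaning)
    file.insert(file.end(), block.begin(), block.end());
    putU32(file, static_cast<uint32_t>(seqs.size()));   // trailing number of sequences
    return file;
}
```

Under these assumptions, the two sequences "ACGT" and "GGA" produce a 22-byte image: 4 bytes of magic number, 4 bytes of block size, 5 + 5 bytes of sequence records (length plus one packed byte each), and 4 bytes of sequence count.
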

Historically, BankBinary was used in the first step of the DSK tool to convert an input FASTA file into a binary format. DSK used to read the input reads several times, so having a binary (and therefore compressed) format reduced the amount of I/O and therefore the execution time.

In the following example, we can see how to convert any kind of bank into a binary bank:

#include <gatb/gatb_core.hpp>
#include <iostream>

int main (int argc, char* argv[])
{
    // We expect an input bank URI and an output file name.
    if (argc < 3) { std::cerr << "usage: " << argv[0] << " <input bank> <output binary bank>" << std::endl; return 1; }
    // We declare an input Bank and use it locally
    IBank* inputBank = Bank::open (argv[1]);
    LOCAL (inputBank);
    // We declare an output Bank
    BankBinary outputBank (argv[2]);
    // We create a sequence iterator on the input bank (with progress information).
    ProgressIterator<Sequence> itSeq (*inputBank, "Converting input file into binary format");
    // We insert each sequence of the input bank into the output bank.
    for (itSeq.first(); !itSeq.isDone(); itSeq.next()) { outputBank.insert (itSeq.item()); }
    // We make sure that the output bank is flushed correctly.
    outputBank.flush ();
    return 0;
}

Constructor & Destructor Documentation

BankBinary ( const std::string &  filename,
size_t  nbValidLetters = 0 
)

Constructor. During a sequence insertion (see method 'insert'), a sequence may be split into sub-sequences if it contains invalid characters (such as 'N'). A sub-sequence is considered valid if its number of consecutive letters is above the threshold given by nbValidLetters. If this threshold is not provided (default value 0), no split is performed during 'insert'.

Parameters
[in] filename: URI of the bank.
[in] nbValidLetters: threshold for sequence splitting in the 'insert' method.

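The splitting rule described for the constructor can be sketched as a free function. This is a minimal sketch, not BankBinary's code: it assumes that the valid letters are A, C, G and T (case-insensitive), that "above the threshold" means strictly greater than nbValidLetters, that shorter runs are simply discarded, and that a threshold of 0 leaves the sequence untouched. `splitOnInvalid` and `isValidLetter` are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Assumed set of valid letters.
bool isValidLetter(char c) {
    switch (c) {
        case 'A': case 'C': case 'G': case 'T':
        case 'a': case 'c': case 'g': case 't': return true;
        default: return false;
    }
}

// Split 'seq' at invalid letters and keep only the runs longer than nbValidLetters.
// With nbValidLetters == 0, the sequence is returned unchanged (no split).
std::vector<std::string> splitOnInvalid(const std::string& seq, size_t nbValidLetters) {
    if (nbValidLetters == 0) return {seq};

    std::vector<std::string> parts;
    size_t start = 0;
    for (size_t i = 0; i <= seq.size(); ++i) {
        // A run of valid letters ends at an invalid letter or at the end of the sequence.
        if (i == seq.size() || !isValidLetter(seq[i])) {
            if (i - start > nbValidLetters) parts.push_back(seq.substr(start, i - start));
            start = i + 1;
        }
    }
    return parts;
}
```

With a threshold of 3, "ACGTNNACNGGGGG" yields "ACGT" and "GGGGG"; the two-letter run "AC" is dropped.
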
~BankBinary ( )

Destructor.

Member Function Documentation

bool check ( const std::string &  uri)
static

Check whether the given URI refers to a valid binary bank.
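Since the file format starts with a magic number, one plausible way to perform such a check is to compare the first bytes of the file with that number. The sketch below assumes a 4-byte little-endian magic number with the hypothetical value `kExpectedMagic`; the actual value and the exact checks done by BankBinary::check are not documented on this page. `looksLikeBinaryBank` is a hypothetical name.

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical magic number; the real BankBinary value is not documented here.
const uint32_t kExpectedMagic = 0x12345678;

// Return true if the file at 'uri' can be opened and starts with the expected magic number.
bool looksLikeBinaryBank(const std::string& uri) {
    std::ifstream in(uri, std::ios::binary);
    if (!in) return false;                                          // missing or unreadable file
    unsigned char buf[4];
    if (!in.read(reinterpret_cast<char*>(buf), 4)) return false;    // file shorter than 4 bytes
    uint32_t magic = 0;
    for (int i = 0; i < 4; ++i) magic |= static_cast<uint32_t>(buf[i]) << (8 * i);  // little-endian
    return magic == kExpectedMagic;
}
```
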

void estimate ( u_int64_t &  number,
u_int64_t &  totalSize,
u_int64_t &  maxSize 
)
virtual

Give an estimate of the sequence information in the bank.

Parameters
[out]number: number of sequences
[out]totalSize: total size of the sequences (in bytes)
[out]maxSize: maximum size of a sequence (in bytes)

Implements IBank.
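To make the three output parameters concrete, the sketch below fills them exactly from sequences held in memory, using the same out-parameter shape as estimate. It only illustrates what each value means, assuming one byte per letter; BankBinary itself returns an estimate, and how it obtains it is not described here. `computeStats` is a hypothetical name, and `uint64_t` stands in for `u_int64_t`.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Same (number, totalSize, maxSize) out-parameters as estimate(), computed exactly.
void computeStats(const std::vector<std::string>& seqs,
                  uint64_t& number, uint64_t& totalSize, uint64_t& maxSize) {
    number = 0;
    totalSize = 0;
    maxSize = 0;
    for (const auto& seq : seqs) {
        number += 1;                                        // one more sequence
        totalSize += seq.size();                            // one byte per letter (assumed)
        maxSize = std::max<uint64_t>(maxSize, seq.size());  // longest sequence so far
    }
}
```

For the sequences "ACGT", "GGA" and "A", this gives number = 3, totalSize = 8 and maxSize = 4.
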

void flush ( )
virtual

Flush the current content. This may be useful for implementations that use a cache.

Implements Bag< Sequence >.
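The sketch below shows why flush matters for a cached writer: inserted sequences accumulate in an in-memory block that is only written out when it reaches the buffer size or when flush is called, so omitting the final flush would lose the last partial block. It is a simplified stand-in, not BankBinary's code: the class name `BlockWriter`, the byte-count buffer limit, storing raw letters instead of packed nucleotides, and writing to an in-memory sink instead of a file are all assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Simplified buffered writer: sequences are cached in 'block_' and emitted to 'sink_'
// as [4-byte block size][block bytes] when the cache reaches 'bufferSize_' or on flush().
class BlockWriter {
public:
    explicit BlockWriter(size_t bufferSize) : bufferSize_(bufferSize) {}

    void insert(const std::string& seq) {
        putU32(block_, static_cast<uint32_t>(seq.size()));    // sequence length
        block_.insert(block_.end(), seq.begin(), seq.end());  // raw letters (no 2-bit packing here)
        if (block_.size() >= bufferSize_) flush();
    }

    // Emit the pending block, if any. Without a final call, cached data never reaches the sink.
    void flush() {
        if (block_.empty()) return;
        putU32(sink_, static_cast<uint32_t>(block_.size()));
        sink_.insert(sink_.end(), block_.begin(), block_.end());
        block_.clear();
        ++blocksWritten_;
    }

    const std::vector<uint8_t>& sink() const { return sink_; }
    size_t blocksWritten() const { return blocksWritten_; }

private:
    static void putU32(std::vector<uint8_t>& out, uint32_t v) {
        for (int i = 0; i < 4; ++i) out.push_back(static_cast<uint8_t>(v >> (8 * i)));
    }

    size_t bufferSize_;
    std::vector<uint8_t> block_;
    std::vector<uint8_t> sink_;
    size_t blocksWritten_ = 0;
};
```

With a 100-byte buffer, inserting "ACGT" and "GG" writes nothing until flush is called; the flush then emits one 14-byte block preceded by its 4-byte size.
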

std::string getId ( )
inline virtual

Implements IBank.

int64_t getNbItems ( )
inline virtual

Return the number of items. If a specific implementation doesn't know the value, it should return -1 by convention.

Returns
the number of items if known, -1 otherwise.

Implements Iterable< Sequence >.
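Because of the -1 convention, callers need to handle the unknown case. A minimal sketch, assuming a caller that falls back on an approximate count when the exact one is unknown; `CountSource` is a hypothetical stand-in interface exposing getNbItems and estimateNbItems (both listed on this page), not the actual GATB class hierarchy, and `bestCount` is a hypothetical helper.

```cpp
#include <cstdint>

// Hypothetical stand-in exposing the two counting methods listed on this page.
struct CountSource {
    virtual ~CountSource() = default;
    virtual int64_t getNbItems() = 0;       // -1 when the exact count is unknown
    virtual int64_t estimateNbItems() = 0;  // approximate count
};

// Prefer the exact count; fall back on the estimate when the implementation returns -1.
int64_t bestCount(CountSource& src) {
    int64_t n = src.getNbItems();
    return (n >= 0) ? n : src.estimateNbItems();
}
```
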

u_int64_t getSize ( )
virtual

Return the size of the bank (comments + data).

The returned value may be an approximation in some cases. For instance, with a zipped bank, an implementation may not be able to give the exact size of the original (uncompressed) file.

Returns
the bank size in bytes.

Implements IBank.

void insert ( const Sequence &  item)
virtual

Insert an item into the bag.

Parameters
[in]item: the item to be inserted.

Implements IBank.

tools::dp::Iterator<Sequence>* iterator ( )
inline virtual

Create an iterator for the given Iterable instance.

Returns
the new iterator.

Implements IBank.
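The returned iterator is driven with the first()/isDone()/next()/item() loop used in the conversion example at the top of this page. The sketch below reproduces that protocol over an in-memory vector so the loop shape can be seen in isolation; `VectorIterator` and `concatAll` are hypothetical names, not GATB's tools::dp::Iterator, and the sketch ignores the reference counting that GATB iterators use.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Minimal iterator with the same four-method protocol as the loop in the conversion example.
template <typename T>
class VectorIterator {
public:
    explicit VectorIterator(const std::vector<T>& items) : items_(items) {}
    void first() { pos_ = 0; }                             // rewind to the first item
    bool isDone() const { return pos_ >= items_.size(); }  // true once past the last item
    void next() { ++pos_; }                                // advance by one item
    const T& item() const { return items_[pos_]; }         // only valid while !isDone()
private:
    const std::vector<T>& items_;
    size_t pos_ = 0;
};

// Typical loop shape: visit every item and concatenate it.
std::string concatAll(VectorIterator<std::string>& it) {
    std::string out;
    for (it.first(); !it.isDone(); it.next()) out += it.item();
    return out;
}
```
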

static const char* name ( )
inline static

Returns the name of the bank format.

void remove ( )
virtual

Reimplemented from AbstractBank.

void setBufferSize ( u_int64_t  bufferSize)
static

Set the default buffer size (static method).

Parameters
[in]bufferSize: size of the buffer.

Member Data Documentation

std::string _filename
protected

URI of the bank.

