gatb.core-API-0.0.0
Sequence Struct Reference

Structure holding genomic information.

#include <Sequence.hpp>

Public Member Functions

 Sequence (tools::misc::Data::Encoding_e encoding=tools::misc::Data::ASCII)
 
 Sequence (char *seq)
 
virtual ~Sequence ()
 
virtual const std::string & getComment () const
 
virtual const std::string getCommentShort () const
 
virtual const std::string & getQuality () const
 
virtual tools::misc::Data & getData ()
 
virtual char * getDataBuffer () const
 
virtual size_t getDataSize () const
 
virtual tools::misc::Data::Encoding_e getDataEncoding () const
 
virtual size_t getIndex () const
 
void setDataRef (tools::misc::Data *ref, int offset, int length)
 
void setIndex (size_t index)
 
std::string toString () const
 
void setComment (const std::string &cmt)
 
void setQuality (const std::string &qual)
 
std::string getRevcomp () const
 

Public Attributes

std::string _comment
 
std::string _quality
 

Detailed Description

Structure holding genomic information.

A sequence holds several pieces of data:

  • comment (as a text)
  • genomic data
  • quality information (for fastq format, empty in other cases).

The genomic data is held in a tools::misc::Data attribute and is expected to contain nucleotides.

The inner format can be of different kinds (ASCII, INTEGER, BINARY), depending on the type of bank that provides the Sequence objects. For instance:

  • a FASTA bank will provide Sequence instances whose data is in ASCII
  • a BINARY bank will provide Sequence instances whose data is in BINARY

The buffer holding the nucleotides lives in the tools::misc::Data attribute; see that class for details on where the buffer can be allocated. Note only that the buffer may be stored in the Data object itself, or may be a reference to a buffer allocated elsewhere.

The class Sequence is closely related to the IBank interface.

Note that end users should not instantiate this class directly; they are more likely to receive such objects by iterating over a bank.

Example of use:

// We create an iterator on the bank
Iterator<Sequence>* it = bank->iterator();
// We iterate the sequences of the bank
for (it->first(); !it->isDone(); it->next())
{
    // We get a shortcut on the current sequence and its data
    Sequence& seq = it->item();
    Data& data = seq.getData();
    // We dump some information about the sequence.
    std::cout << "comment " << seq.getComment() << std::endl;
    // We dump each nucleotide. NOTE: the output depends on the data encoding
    for (size_t i=0; i<data.size(); i++) { std::cout << data[i]; }
    std::cout << std::endl;
}
See also
IBank

Constructor & Destructor Documentation

Sequence ( tools::misc::Data::Encoding_e  encoding = tools::misc::Data::ASCII)

Constructor.

Parameters
[in] encoding: encoding scheme of the genomic data of the sequence
Sequence ( char *  seq)
inline

Constructor. Mainly intended for testing: it sets the genomic data from an ASCII representation. For instance, one can pass "ACTTACGCAGAT" as the argument of this constructor.

Parameters
[in] seq: the genomic data as an ASCII string
virtual ~Sequence ( )
inline virtual

Destructor.

Member Function Documentation

virtual const std::string& getComment ( ) const
inline virtual
Returns
description of the sequence
virtual const std::string getCommentShort ( ) const
inline virtual
Returns
description of the sequence until first space
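
The "until first space" rule can be sketched in a few lines. This is a standalone illustration, not the GATB implementation: the function name shortComment is hypothetical, and returning the whole comment when it contains no space is an assumption of the sketch.

```cpp
#include <string>

// Hypothetical helper (not part of GATB): keeps the comment up to, but not
// including, the first space. ASSUMPTION: if the comment contains no space,
// the whole comment is returned.
std::string shortComment(const std::string& comment)
{
    // find() returns std::string::npos when no space exists, and
    // substr(0, npos) then yields the complete string.
    return comment.substr(0, comment.find(' '));
}
```

For a FASTA-style header such as "read1 length=100", this sketch yields "read1".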
virtual tools::misc::Data& getData ( )
inline virtual
Returns
the data as a Data structure.
virtual char* getDataBuffer ( ) const
inline virtual

Return the raw buffer holding the genomic data. IMPORTANT: getting genomic data this way implies that the user knows the underlying encoding scheme (ASCII, INTEGER or BINARY) in order to decode it.

Returns
buffer holding the genomic data as a raw buffer.
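
To illustrate why the encoding matters when reading a raw buffer, here is a minimal standalone sketch. Everything in it is an assumption made for illustration: the Encoding enum is a stand-in for tools::misc::Data::Encoding_e, decodeBuffer is a hypothetical helper, and the INTEGER layout (one code per byte, with A=0, C=1, T=2, G=3) should be checked against the Data documentation before relying on it. Packed BINARY data is deliberately not decoded here.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for tools::misc::Data::Encoding_e.
enum class Encoding { ASCII, INTEGER, BINARY };

// Converts a raw nucleotide buffer to an ASCII string.
// ASSUMPTION: INTEGER stores one code per byte, with A=0, C=1, T=2, G=3.
std::string decodeBuffer(const char* buffer, std::size_t size, Encoding encoding)
{
    static const char table[] = {'A', 'C', 'T', 'G'};
    switch (encoding)
    {
        case Encoding::ASCII:
            // Bytes are already characters: copy them as they are.
            return std::string(buffer, size);
        case Encoding::INTEGER:
        {
            std::string out;
            out.reserve(size);
            for (std::size_t i = 0; i < size; i++)
            {
                // Keep only the two low bits so the table lookup stays in range.
                out.push_back(table[static_cast<unsigned char>(buffer[i]) & 3]);
            }
            return out;
        }
        default:
            // The packed layout is not specified here, so this sketch refuses to guess.
            throw std::invalid_argument("BINARY decoding is not covered by this sketch");
    }
}
```

With a real Sequence, the three arguments would presumably come from getDataBuffer(), getDataSize() and getDataEncoding().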
virtual tools::misc::Data::Encoding_e getDataEncoding ( ) const
inline virtual
Returns
encoding scheme of the data.
virtual size_t getDataSize ( ) const
inline virtual
Returns
number of nucleotides in the sequence.
virtual size_t getIndex ( ) const
inline virtual

Return the index of the sequence. It may be the index of the sequence in the database that holds the sequence.

Returns
index of the sequence.
virtual const std::string& getQuality ( ) const
inline virtual
Returns
quality of the sequence (set if the underlying bank is a fastq file).
std::string getRevcomp ( ) const
inline

Returns a string that is the reverse complement of the sequence. The Sequence object needs to be in ASCII format.
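
As an illustration of what a reverse complement over ASCII data involves, here is a minimal standalone sketch. It is not the GATB implementation: reverseComplement is a hypothetical name, and copying characters other than A, C, G, T unchanged is a choice made for this sketch only.

```cpp
#include <string>

// Hypothetical helper: reverse complement of an ASCII nucleotide string.
// ASSUMPTION: characters other than upper-case A, C, G, T are copied unchanged.
std::string reverseComplement(const std::string& ascii)
{
    // Step 1: reverse the sequence.
    std::string out(ascii.rbegin(), ascii.rend());
    // Step 2: replace each nucleotide by its complement.
    for (char& c : out)
    {
        switch (c)
        {
            case 'A': c = 'T'; break;
            case 'T': c = 'A'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            default: break;
        }
    }
    return out;
}
```

For the input "ACTTACGCAGAT", this sketch returns "ATCTGCGTAAGT".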

void setComment ( const std::string &  cmt)
inline

Set the comment of the sequence (likely to be called by an IBank iterator).

Parameters
[in] cmt: comment of the sequence
void setDataRef ( tools::misc::Data *  ref,
int  offset,
int  length 
)
inline

Set the genomic data as a reference to a Data object (more precisely, to a range within this data). This method can be used when the genomic data of the sequence should point to an already existing buffer of nucleotides: the sequence then allocates no memory for the genomic data and relies only on data stored elsewhere. This is mainly a shortcut to the gatb::core::tools::misc::Data::setRef method.

Parameters
[in] ref: the referred Data instance holding the genomic data
[in] offset: starting index in the referred data
[in] length: length of the genomic data of the current sequence.
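
The idea of referring to a range of an existing buffer, without copying it, can be sketched as follows. This is a simplified standalone illustration, not the GATB Data class: BufferView and its members are hypothetical names, and the owning buffer is assumed to outlive the view.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical, simplified stand-in for a Data object that refers to a range
// of another buffer instead of owning its own memory.
struct BufferView
{
    const char* ptr = nullptr;
    std::size_t size = 0;

    // Points the view at [offset, offset + length) inside 'owner'.
    // No nucleotide is copied; 'owner' must outlive the view.
    void setRef(const std::string& owner, std::size_t offset, std::size_t length)
    {
        assert(offset <= owner.size() && length <= owner.size() - offset);
        ptr = owner.data() + offset;
        size = length;
    }

    // Copies the referred range into a string (valid for ASCII data only).
    std::string toString() const
    {
        return ptr ? std::string(ptr, size) : std::string();
    }
};
```

With an owner holding "ACGTTTGA", calling setRef(owner, 2, 4) makes the view describe "GTTT" while sharing the owner's memory.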
void setIndex ( size_t  index)
inline

Set the index of the sequence. Typically, it should be called by an IBank iterator that knows the index of the currently iterated sequence.

Parameters
[in] index: index of the sequence
void setQuality ( const std::string &  qual)
inline

Set the quality string of the sequence (likely to be called by a fastq iterator).

Parameters
[in] qual: quality string of the sequence.
std::string toString ( ) const
inline

Get an ASCII representation of the sequence. IMPORTANT: this implementation assumes that the format of the Data attribute is ASCII. No conversion is done for other formats.

Returns
the ascii representation of the sequence.

Member Data Documentation

std::string _comment

Comment attribute (note: should be private with a setter and getter).

std::string _quality

Quality attribute (note: should be private with a setter and getter).


The documentation for this struct was generated from the following file: