gatb.core-API-0.0.0
Storage snippets

Presentation

This page presents code snippets that show how to use the persistence API.

Some of the snippets presented below can be used online.

Additional snippets are available in the directory gatb-core/gatb-core/examples/storage.

Create and save a collection with a Storage object

This snippet shows how to use a Storage object to create a collection of integers. We use the HDF5 format, so we can check the result of our snippet with HDF5 tools.

Code is from example storage1.cpp:

// We include what we need for the test
#include <gatb/gatb_core.hpp>
// We use the required packages
using namespace std;
/********************************************************************************/
/* Create a Collection and save it with the Storage class. */
/* */
/* This snippet shows how to use the Storage layer. In this example, we use the */
/* HDF5 layer, so we can check the result of the test with HDF5 tools (like */
/* h5dump for instance). */
/* */
/* NOTE: GATB provides some HDF5 tools (check 'bin' directory) */
/* */
/********************************************************************************/
int main (int argc, char* argv[])
{
// We create a Storage product "foo" in HDF5 format
Storage* storage = StorageFactory(STORAGE_HDF5).create ("foo", true, false);
// We use this object locally (it is automatically deleted when leaving
// the enclosing block).
LOCAL (storage);
// Shortcut: we get the root of this Storage object
Group& root = storage->root();
// We get a collection of native integers from the storage.
Collection<NativeInt64>& myIntegers = root.getCollection<NativeInt64> ("myIntegers");
// We add some entries into the collection
myIntegers.insert (1);
myIntegers.insert (2);
myIntegers.insert (3);
myIntegers.insert (5);
myIntegers.insert (8);
// We flush the collection to be sure to save the content properly.
myIntegers.flush();
// Now, you can see the content of the collection by launching the following command: h5dump foo.h5
// You should get something like this:
//    HDF5 "foo.h5" {
//    GROUP "/" {
//       DATASET "myIntegers" {
//          DATATYPE  H5T_STD_U8LE
//          DATASPACE  SIMPLE { ( 5 ) / ( H5S_UNLIMITED ) }
//          DATA {
//          (0): 1, 2, 3, 5, 8
//          }
//       }
//    }
//    }
}
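
The comment on LOCAL says the object is deleted when the enclosing block is left. The sketch below illustrates that scoping idea with standard C++ only. It is an analogy, not GATB's implementation: TrackedResource and useResourceLocally are hypothetical names, and std::unique_ptr stands in for whatever mechanism LOCAL uses internally.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a resource such as a Storage instance.
// It records its construction and destruction in a log so that the
// scope-based cleanup can be observed.
struct TrackedResource
{
    std::vector<std::string>& log;
    std::string name;

    TrackedResource (std::vector<std::string>& l, const std::string& n) : log(l), name(n)
    {
        log.push_back ("open " + name);
    }
    ~TrackedResource ()  { log.push_back ("close " + name); }
};

// Opens a resource, uses it inside a block, and returns the recorded events.
std::vector<std::string> useResourceLocally ()
{
    std::vector<std::string> log;
    {
        // Ownership is tied to this block (assumption: similar in effect to LOCAL).
        std::unique_ptr<TrackedResource> res (new TrackedResource (log, "foo"));
        log.push_back ("use " + res->name);
    }   // 'res' goes out of scope here, so the resource is released
    log.push_back ("after block");
    return log;
}
```

The returned log shows that the release happens at the closing brace of the inner block, before any code that follows it.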


Create and save two collections with a Storage object

This snippet shows how to use a Storage object to create two collections of integers stored in two different groups. We use the HDF5 format, so we can check the result of our snippet with HDF5 tools.

Code is from example storage2.cpp:

// We include what we need for the test
#include <gatb/gatb_core.hpp>
// We use the required packages
using namespace std;
/********************************************************************************/
/* Create 2 Collections and save them with the Storage class. */
/********************************************************************************/
int main (int argc, char* argv[])
{
// We create a Storage product "foo" in HDF5 format
Storage* storage = StorageFactory(STORAGE_HDF5).create ("foo", true, false);
LOCAL (storage);
// Shortcut: we get the root of this Storage object
Group& root = storage->root();
// We get two groups from the root
Group& group1 = root.getGroup("group1");
Group& group2 = root.getGroup("group2");
// We get two collections of native integers from the two groups.
// Note that we can use the same 'integers' name since they will be located in
// two different groups.
Collection<NativeInt64>& integers1 = group1.getCollection<NativeInt64> ("integers");
Collection<NativeInt64>& integers2 = group2.getCollection<NativeInt64> ("integers");
// We add some entries into the collections
integers1.insert (1);
integers1.insert (2);
integers1.insert (3);
integers2.insert (5);
integers2.insert (8);
// We flush the collections to be sure to save the content properly.
integers1.flush();
integers2.flush();
// Now, you can see the content of the collections by launching the following command: h5dump foo.h5
// You should get something like this:
//    HDF5 "foo.h5" {
//    GROUP "/" {
//       GROUP "group1" {
//          DATASET "integers" {
//             DATATYPE  H5T_STD_U8LE
//             DATASPACE  SIMPLE { ( 3 ) / ( H5S_UNLIMITED ) }
//             DATA {
//             (0): 1, 2, 3
//             }
//          }
//       }
//       GROUP "group2" {
//          DATASET "integers" {
//             DATATYPE  H5T_STD_U8LE
//             DATASPACE  SIMPLE { ( 2 ) / ( H5S_UNLIMITED ) }
//             DATA {
//             (0): 5, 8
//             }
//          }
//       }
//    }
//    }
}


Load a collection from a Storage object

This snippet shows how to load a Storage object and get a saved collection from it.

Code is from example storage3.cpp:

// We include what we need for the test
#include <gatb/gatb_core.hpp>
#include <iostream>
// We use the required packages
using namespace std;
/********************************************************************************/
/* Read a Collection from a Storage file. */
/* */
/* This snippet reads items created during 'storage1' */
/* */
/********************************************************************************/
int main (int argc, char* argv[])
{
// We load a Storage product "foo" in HDF5 format
// It must have been created with the storage1 snippet
Storage* storage = StorageFactory(STORAGE_HDF5).load ("foo");
LOCAL (storage);
// Shortcut: we get the root of this Storage object
Group& root = storage->root();
// We get a collection of native integers from the storage.
Collection<NativeInt64>& myIntegers = root.getCollection<NativeInt64> ("myIntegers");
// We create an iterator for our collection.
Iterator<NativeInt64>* iter = myIntegers.iterator();
LOCAL (iter);
// Now we can iterate the collection through this iterator.
for (iter->first(); !iter->isDone(); iter->next()) { cout << iter->item() << endl; }
}
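
The loop above relies on four calls: first, isDone, next and item. As a minimal sketch of that protocol, the class below exposes the same four calls over a std::vector. VectorIterator and sumAll are hypothetical names introduced for illustration; this is not the GATB Iterator class.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical iterator exposing first/isDone/next/item,
// backed by a std::vector instead of a storage file.
template<typename T>
class VectorIterator
{
public:
    explicit VectorIterator (const std::vector<T>& items) : _items(items), _idx(0) {}

    void first ()           { _idx = 0; }                      // rewind to the first item
    bool isDone () const    { return _idx >= _items.size(); }  // true once past the last item
    void next ()            { ++_idx; }                        // advance by one item
    const T& item () const  { return _items[_idx]; }           // current item (valid while !isDone())

private:
    const std::vector<T>& _items;
    std::size_t _idx;
};

// Sums all items using the same loop shape as in the snippet.
long long sumAll (const std::vector<long long>& items)
{
    VectorIterator<long long> it (items);
    long long sum = 0;
    for (it.first(); !it.isDone(); it.next())  { sum += it.item(); }
    return sum;
}
```

Calling first() before the loop matters: it makes the same iterator object reusable for a second pass.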


Load collections from a Storage object

This snippet shows how to load a Storage object and get saved collections from it. Note that this example uses lambda expressions, so it requires a compiler that supports C++11.

Code is from example storage4.cpp:

// We include what we need for the test
#include <gatb/gatb_core.hpp>
#include <iostream>
// We use the required packages
using namespace std;
/********************************************************************************/
/* Read two Collections from a Storage file. */
/* */
/* This snippet reads items created during 'storage2' */
/* */
/* WARNING! THIS SNIPPET ALSO SHOWS HOW TO USE LAMBDA EXPRESSIONS, SO YOU NEED */
/* TO USE A COMPILER THAT SUPPORTS THIS FEATURE. */
/* */
/********************************************************************************/
int main (int argc, char* argv[])
{
// We load a Storage product "foo" in HDF5 format
// It should have been created with the storage2 snippet
Storage* storage = StorageFactory(STORAGE_HDF5).load ("foo");
LOCAL (storage);
// Shortcut: we get the root of this Storage object
Group& root = storage->root();
// We get two groups from the root
Group& group1 = root.getGroup("group1");
Group& group2 = root.getGroup("group2");
// We iterate the two collections with lambda expressions.
group1.getCollection<NativeInt64> ("integers").iterate ([] (const NativeInt64& n) { cout << n << endl; });
cout << endl;
group2.getCollection<NativeInt64> ("integers").iterate ([] (const NativeInt64& n) { cout << n << endl; });
}
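
The lambdas above capture nothing and only print. When the visited items have to be kept, the lambda can capture a local variable by reference. The sketch below shows this with a hypothetical SimpleCollection class whose iterate member mirrors the call shape used in the snippet; it is an illustration in standard C++11, not the GATB Collection class.

```cpp
#include <vector>

// Hypothetical container with an iterate() member that applies a functor to each item.
template<typename T>
class SimpleCollection
{
public:
    void insert (const T& item)  { _items.push_back (item); }

    template<typename Functor>
    void iterate (Functor fct) const
    {
        for (typename std::vector<T>::const_iterator it = _items.begin(); it != _items.end(); ++it)
        {
            fct (*it);
        }
    }

private:
    std::vector<T> _items;
};

// Collects the visited items by capturing 'seen' by reference in the lambda.
std::vector<int> collectItems (const SimpleCollection<int>& coll)
{
    std::vector<int> seen;
    coll.iterate ([&seen] (const int& n) { seen.push_back (n); });
    return seen;
}
```

Capturing by reference is safe here because the lambda is only called while 'seen' is still in scope.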


Iterate solid kmers from a HDF5 file

This snippet shows how to use an HDF5 Storage object holding solid kmers and iterate over the kmers.

It also uses a Model instance to convert the solid kmer values into the corresponding nucleotide sequences.
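
To give an idea of what such a conversion involves, here is a minimal stand-alone sketch that decodes a kmer packed at 2 bits per nucleotide. The function decodeKmer is hypothetical, and the encoding (A=0, C=1, T=2, G=3, first nucleotide in the most significant bit pair) is an assumption made for illustration; the Model class remains the reference for the actual convention.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Decodes a kmer packed as 2 bits per nucleotide into a string.
// ASSUMPTION: codes are A=0, C=1, T=2, G=3 and the first nucleotide occupies
// the most significant bit pair. Valid for kmerSize <= 32 with a 64-bit value.
std::string decodeKmer (std::uint64_t value, std::size_t kmerSize)
{
    static const char alphabet[] = { 'A', 'C', 'T', 'G' };
    std::string seq (kmerSize, 'A');
    for (std::size_t i = 0; i < kmerSize; ++i)
    {
        // The i-th bit pair from the right is the i-th nucleotide from the end.
        seq[kmerSize - 1 - i] = alphabet[(value >> (2 * i)) & 3];
    }
    return seq;
}
```

Under these assumptions, the value 6 (binary 00 01 10) with a kmer size of 3 decodes to "ACT".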

The input file is typically generated by dbgh5 or by dsk.

To inspect the structure of the HDF5 file, you can use the h5dump utility, for instance: h5dump -H file.h5

Code is from example storage6.cpp:

// We include what we need for the test
#include <gatb/gatb_core.hpp>
#include <iostream>
#include <map>
#include <memory>
// We use the required packages
using namespace std;
/********************************************************************************/
/* Iterate solid kmers from a HDF5 file */
/* */
/* This snippet shows how to iterate solid kmers from a file generated by DSK */
/* or by dbgh5. It also computes the distribution of kmers. */
/* */
/********************************************************************************/
int main (int argc, char* argv[])
{
OptionsParser parser ("StorageSnippet");
parser.push_back (new OptionOneParam (STR_URI_GRAPH, "graph input", true));
parser.push_back (new OptionOneParam (STR_VERBOSE, "verbosity (0: no display, 1: display kmers, 2: display distrib)", false, "0"));
try
{
IProperties* options = parser.parse (argc, argv);
int display = options->getInt (STR_VERBOSE);
// We get a handle on the HDF5 storage object.
// Note that we use a smart pointer since the StorageFactory dynamically allocates an instance
// (std::unique_ptr replaces std::auto_ptr, which was removed in C++17)
unique_ptr<Storage> storage (StorageFactory(STORAGE_HDF5).load (options->getStr(STR_URI_GRAPH)));
// We get the group for dsk
Group& dskGroup = storage->getGroup("dsk");
// We get the solid kmers 1) from the 'dsk' group 2) from the 'solid' partition
Partition<Kmer<>::Count>& solidKmers = dskGroup.getPartition<Kmer<>::Count> ("solid");
// We can retrieve information (as an XML string) about the construction of the solid kmers
cout << dskGroup.getProperty("xml") << endl;
// We can access each piece of this information through a Properties object
Properties props;
props.readXML (dskGroup.getProperty("xml"));
Properties configProps;
configProps.readXML (storage->getGroup("configuration").getProperty("xml"));
// Now, we can for instance get the kmer size (as an integer)
cout << "kmer size: " << configProps.getInt ("kmer_size") << endl;
cout << "nb solid kmers: " << props.getInt ("kmers_nb_solid") << endl;
// We create a Model instance. It will help to dump the kmers in
// a human readable form (ie as a string of nucleotides)
Kmer<>::ModelCanonical model (configProps.getInt ("kmer_size"));
size_t nbKmers = 0;
// We create an iterator for our [kmer,abundance] values
ProgressIterator<Kmer<>::Count> iter (solidKmers);
Kmer<>::Type checksum;
map<u_int64_t,u_int64_t> distrib;
// We iterate the solid kmers from the retrieved collection
for (iter.first(); !iter.isDone(); iter.next())
{
// shortcut
Kmer<>::Count& count = iter.item();
// We update the checksum.
checksum += count.value;
// We update the distribution
distrib [count.abundance] ++;
// We dump the solid kmer information:
// 1) nucleotides
// 2) raw value (integer)
// 3) abundance
if (display==1)
{
cout << "[" << ++nbKmers << "] " << model.toString(count.value) << " " << count.value << " " << count.abundance << endl;
}
}
cout << "kmer checksum: " << checksum << endl;
if (display==2)
{
for (map<u_int64_t,u_int64_t>::iterator it = distrib.begin(); it != distrib.end(); ++it)
{
cout << it->first << " " << it->second << endl;
}
}
}
catch (OptionFailure& e)
{
return e.displayErrors (std::cout);
}
catch (Exception& e)
{
std::cerr << "EXCEPTION: " << e.getMessage() << std::endl;
}
}
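
The distribution computed above is a histogram that maps each abundance to the number of kmers having that abundance. The following stand-alone sketch isolates that step. KmerCount and abundanceDistribution are hypothetical names; KmerCount is only a simplified stand-in for the [kmer,abundance] pairs iterated in the snippet.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, simplified stand-in for a [kmer, abundance] pair.
struct KmerCount
{
    std::uint64_t value;
    std::uint32_t abundance;
};

// Maps each abundance to the number of kmers having that abundance.
std::map<std::uint64_t, std::uint64_t> abundanceDistribution (const std::vector<KmerCount>& counts)
{
    std::map<std::uint64_t, std::uint64_t> distrib;
    for (std::size_t i = 0; i < counts.size(); ++i)
    {
        // operator[] value-initializes a missing entry to 0 before the increment.
        distrib[counts[i].abundance]++;
    }
    return distrib;
}
```

Because std::map keeps its keys sorted, iterating over the result lists abundances in increasing order.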


Associate metadata with HDF5 collections

This snippet shows how to associate metadata with HDF5 collections.

You can dump such values with h5dump: h5dump -a myIntegers/myData foo.h5

Code is from example storage7.cpp:

// We include what we need for the test
#include <gatb/gatb_core.hpp>
#include <iostream>
// We use the required packages
using namespace std;
/********************************************************************************/
/* Associate metadata with HDF5 collections */
/********************************************************************************/
int main (int argc, char* argv[])
{
// We create a Storage product "foo" in HDF5 format
Storage* storage = StorageFactory(STORAGE_HDF5).create ("foo", true, false);
// We use this object locally (it is automatically deleted when leaving
// the enclosing block).
LOCAL (storage);
// Shortcut: we get the root of this Storage object
Group& root = storage->root();
// We get a collection of native integers from the storage.
Collection<NativeInt64>& myIntegers = root.getCollection<NativeInt64> ("myIntegers");
// We associate custom data with our collection
myIntegers.addProperty ("myData", "test_%d", 147);
// We can retrieve this metadata later with getProperty
cout << "metadata is " << myIntegers.getProperty("myData") << endl;
// You can dump such values with h5dump:
// h5dump -a myIntegers/myData foo.h5
}
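
The addProperty call above takes a format string ("test_%d") followed by a value (147), which suggests printf-style formatting. Assuming that is the case, the stored text can be previewed with the standard library alone. formatProperty is a hypothetical helper written for this illustration and is not part of the GATB API.

```cpp
#include <cstdio>
#include <string>

// Builds the text that a printf-style format and one integer argument produce.
// ASSUMPTION: addProperty formats its arguments the way std::snprintf does.
std::string formatProperty (const char* format, int value)
{
    char buffer[64];
    // snprintf never writes more than sizeof(buffer) bytes, including the terminator.
    std::snprintf (buffer, sizeof(buffer), format, value);
    return std::string (buffer);
}
```

With the arguments used in the snippet, formatProperty ("test_%d", 147) yields the string "test_147".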


Using C++-like streams with HDF5

This snippet shows how to use binary input/output streams with HDF5. There are two types:

  • Storage::ostream : used for saving binary data into an HDF5 collection
  • Storage::istream : used for retrieving binary data from an HDF5 collection

Code is from example storage8.cpp:

// We include what we need for the test
#include <gatb/gatb_core.hpp>
#include <cstring>
#include <iostream>
// We use the required packages
using namespace std;
/********************************************************************************/
/* ostream and istream with HDF5 */
/* */
/* This snippet shows how to use binary input/output streams with HDF5. */
/* */
/* You can use 'h5dump foo.h5' to have a look at the generated binary stream */
/* inside the HDF5 dataset. */
/* */
/* Here, we save 3 float objects, which need 12 bytes, so we can see the */
/* binary representation of these 3 floats. */
/* */
/********************************************************************************/
int main (int argc, char* argv[])
{
float table[] = { 0.577, 3.1415, 2.71 };
// We create a Storage product "foo" in HDF5 format
Storage* storage = StorageFactory(STORAGE_HDF5).create ("foo", true, false);
// We use this object locally (it is automatically deleted when leaving
// the enclosing block).
LOCAL (storage);
// Shortcut: we get the root of this Storage object
Group& root = storage->root();
// We get an output stream in a C++ style
Storage::ostream os (root, "data");
// We write some information in this stream
os.write (reinterpret_cast<char const*>(table), sizeof(table));
// We have to flush the stream to make sure all the data is written
os.flush();
// We get a handle on the HDF5 collection where we put our data
// Note: the collection is typed as NativeInt8, meaning we get binary data
Collection<NativeInt8>& dataCollection = root.getCollection<NativeInt8> ("data");
// We compute the number of floats from the number of bytes in the collection.
size_t nbItems = dataCollection.getNbItems() / sizeof(table[0]);
cout << "nb items : " << nbItems << endl;
// Now we declare an input stream on the collection
Storage::istream is (root, "data");
// We want to read the data, we first need to have a buffer for this
float* buffer = new float [nbItems];
// We read the data from the input stream
is.read (reinterpret_cast<char*>(buffer), nbItems*sizeof(float));
// We check that we read correct values.
cout << "check : " << (memcmp(buffer, table, nbItems*sizeof(float)) == 0) << endl;
// cleanup
delete[] buffer;
}
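
The same write, flush, read and compare sequence can be tried without HDF5 by using a standard in-memory stream. In the sketch below, std::stringstream stands in for the Storage streams and roundTripFloats is a hypothetical helper; it only illustrates the byte-level round trip and says nothing about how the Storage streams are implemented.

```cpp
#include <cstddef>
#include <cstring>
#include <ios>
#include <sstream>
#include <vector>

// Writes the raw bytes of 'values' into a binary stream, reads them back into a
// new buffer, and reports whether the round trip preserved every byte.
// Intended for non-empty input.
bool roundTripFloats (const std::vector<float>& values)
{
    const std::size_t nbBytes = values.size() * sizeof(float);

    // In-memory stream opened for both writing and reading, in binary mode.
    std::stringstream stream (std::ios::in | std::ios::out | std::ios::binary);
    stream.write (reinterpret_cast<const char*>(values.data()), static_cast<std::streamsize>(nbBytes));
    stream.flush ();

    // The buffer is sized in floats, the read is sized in bytes.
    std::vector<float> buffer (values.size());
    stream.read (reinterpret_cast<char*>(buffer.data()), static_cast<std::streamsize>(nbBytes));

    return static_cast<std::size_t>(stream.gcount()) == nbBytes
        && std::memcmp (buffer.data(), values.data(), nbBytes) == 0;
}
```

Checking gcount() in addition to memcmp catches a short read, which a comparison of the buffers alone could miss.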
