Presentation
This page presents some code snippets related to the use of De Bruijn graph API.
Some of the snippets presented below can be used online.
Additional snippets are available in directory: gatb-core/gatb-core/examples/debruijn.
Build / Load De Bruijn graphs
Building a De Bruijn graph from command line options
This snippet shows how to create a Graph object thanks to command line options with at least a mandatory FASTA file URI.
The first thing to do is to get a parser that analyzes the command line options from (argc,argv). Such a parser can be retrieved with a static method from Graph class.
Then, the parsed options can be provided to the Graph::create method and then we get a Graph object on which we can do anything we want.
The only mandatory option is '-in fastafile'. All other options have default values if not set through command line.
In this snippet, we dump information about the Graph object building with Graph::getInfo method.
Code is from example debruijn1.cpp:
int main (int argc, char* argv[])
{
IOptionsParser* parser = Graph::getOptionsParser();
try
{
parser->parse (argc, argv);
Graph graph = Graph::create (parser->getProperties());
std::cout << graph.getInfo() << std::endl;
}
catch (OptionFailure& e)
{
return e.displayErrors (std::cout);
}
catch (Exception& e)
{
std::cerr << "EXCEPTION: " << e.getMessage() << std::endl;
}
return EXIT_SUCCESS;
}
[go back to top]
Building a De Bruijn graph from a command-line-like string
Like the previous snippet, we create a Graph object with command line options, but here the options are directly provided as a string.
Code is from example debruijn2.cpp:
int main (int argc, char* argv[])
{
if (argc < 2)
{
std::cerr << "You must provide a bank uri." << std::endl;
return EXIT_FAILURE;
}
try
{
Graph graph = Graph::create ("-in %s", argv[1]);
std::cout << graph.getInfo() << std::endl;
}
catch (Exception& e)
{
std::cerr << "EXCEPTION: " << e.getMessage() << std::endl;
}
return EXIT_SUCCESS;
}
[go back to top]
Building a De Bruijn graph from a bank object
Here, we create a Graph object by providing a bank object, more precisely a IBank object.
It is therefore possible to provide a Bank instance (ie a FASTA bank), or another kind of bank that implements the IBank interface.
Note in the example that we can provide additional options after the bank object.
Code is from example debruijn3.cpp:
int main (int argc, char* argv[])
{
if (argc < 2)
{
std::cerr << "You must provide a FASTA file uri." << std::endl;
return EXIT_FAILURE;
}
try
{
Graph graph = Graph::create (Bank::open(argv[1]), "-abundance-min %d", 5);
std::cout << graph.getInfo() << std::endl;
}
catch (Exception& e)
{
std::cerr << "EXCEPTION: " << e.getMessage() << std::endl;
}
return EXIT_SUCCESS;
}
[go back to top]
Building a De Bruijn graph from a fake bank object
Like the previous snippet, we create a Graph object by providing a bank object, but here this is a 'fake' bank built "on the fly".
Such banks are often useful for testing purposes.
In such a case, the output file for the graph will be named "noname", unless a specific name is set through the command line option "-out mygraph".
Code is from example debruijn4.cpp:
int main (int argc, char* argv[])
{
IBank* bank = new BankStrings (
"ATCGTACGACGCTAGCTAGCA",
"ACTACGTATCGGTATATATTTTCGATCGATCAG",
"TGACGGTAGCATCGATCAGGATCGA",
NULL
);
try
{
Graph graph = Graph::create (bank, "-kmer-size 5 -abundance-min 1 -out mygraph");
std::cout << graph.getInfo() << std::endl;
}
catch (Exception& e)
{
std::cerr << "EXCEPTION: " << e.getMessage() << std::endl;
}
return EXIT_SUCCESS;
}
[go back to top]
Load a De Bruijn graph from a graph file
Once we have built a graph, it is saved as a file (likely a HDF5 file).
This snippet shows how to load such a file to get a Graph object.
Code is from example debruijn5.cpp:
int main (int argc, char* argv[])
{
if (argc < 2)
{
std::cerr << "You must provide a HDF5 file." << std::endl;
return EXIT_FAILURE;
}
try
{
Graph graph = Graph::load (argv[1]);
}
catch (Exception& e)
{
std::cerr << "EXCEPTION: " << e.getMessage() << std::endl;
}
return EXIT_SUCCESS;
}
[go back to top]
Iterating nodes
Iterate the nodes of a De Bruijn graph
This snippet shows how to iterate all the nodes of a graph (the graph is loaded from a graph file).
The idea is to get an iterator from the graph and use it to get each node of the graph.
Here, the nodes are retrieved regardless of any edges between them.
Code is from example debruijn6.cpp:
int main (int argc, char* argv[])
{
if (argc < 2)
{
std::cerr << "You must provide a HDF5 file." << std::endl;
return EXIT_FAILURE;
}
try
{
Graph graph = Graph::load (argv[1]);
GraphIterator<Node> it = graph.iterator ();
for (it.first(); !it.isDone(); it.next())
{
std::cout << graph.toString (it.item()) << std::endl;
}
}
catch (Exception& e)
{
std::cerr << "EXCEPTION: " << e.getMessage() << std::endl;
}
return EXIT_SUCCESS;
}
[go back to top]
Iterate the nodes of a De Bruijn graph in a multithread way
As the previous example, this snippet shows how to iterate all the nodes of a graph.
The difference here is that the iteration is parallelized, using all possible available cores, which should speed up the iteration.
WARNING ! don't forget this is parallel execution, so you have be careful if you want to modify the same object in different thread execution contexts.
Note: lambda expressions are used here to have code conciseness (which suppose to use an up-to-date compiler). You can have some information at http://stackoverflow.com/questions/7627098/what-is-a-lambda-expression-in-c11
Code is from example debruijn7.cpp:
int main (int argc, char* argv[])
{
if (argc < 2)
{
std::cerr << "You must provide a HDF5 file." << std::endl;
return EXIT_FAILURE;
}
try
{
Graph graph = Graph::load (argv[1]);
GraphIterator<Node> it = graph.iterator ();
size_t nbCores = 4;
IDispatcher::Status status = Dispatcher(nbCores).iterate (it, [&graph] (const Node& node)
{
});
std::cout << "we used " << status.nbCores << " cores, elapsed time is " << status.time << " msec" << std::endl;
}
catch (Exception& e)
{
std::cerr << "EXCEPTION: " << e.getMessage() << std::endl;
}
return EXIT_SUCCESS;
}
[go back to top]
Neighborhoods
Working with neighborhoods of nodes in the De Bruijn graph
This snippet shows how to use some methods related to neighbors in a graph.
We use a fake bank with one sequence of size 5 and use a kmer size of 4, so we will have 2 possible nodes in the graph.
We iterate these two nodes, and for one of them, we ask for its neighbors with the Graph::successors method. We can then check that the retrieved neighbor is the correct one by analyzing the node string representations.
In this example, we use the successors method, but note it is possible to get the incoming neighbors with the Graph::predecessors method. By the way, it is possible to know the in and out degree of a given node with the two corresponding methods Graph::indegree and Graph::outdegree
Code is from example debruijn9.cpp:
#undef NDEBUG
#include <cassert>
int main (int argc, char* argv[])
{
try
{
Graph graph = Graph::create (new BankStrings ("AATGC", NULL), "-kmer-size 4 -abundance-min 1 -verbose 0");
GraphIterator<Node> it = graph.iterator();
assert (it.size() == 2);
for (it.first(); !it.isDone(); it.next())
{
Node& current = it.item();
std::string s = graph.toString (current);
assert (s=="AATG" || s=="ATGC");
if (s=="AATG")
{
GraphVector<Node> neighbors = graph.successors(current);
assert (neighbors.size() == 1);
assert (graph.outdegree(current) == 1);
assert (graph.toString(neighbors[0]) == "ATGC");
}
}
std::cout << "Test OK" << std::endl;
}
catch (Exception& e)
{
std::cerr << "EXCEPTION: " << e.getMessage() << std::endl;
}
return EXIT_SUCCESS;
}
[go back to top]
Working with neighborhoods of nodes in the De Bruijn graph (continued)
This snippet shows how to use some methods related to neighbors in a graph.
We do the same work as the previous example. The only difference is that we retrieve the neighbors as Edge objects rather than Node objects, so we will have the full information about each transition between the source node and the retrieved neighbors (including the transition nucleotide for instance).
In particular, we use some of the Edge attributes (Edge::to, Edge::nt)
Code is from example debruijn10.cpp:
#undef NDEBUG
#include <cassert>
int main (int argc, char* argv[])
{
try
{
Graph graph = Graph::create (new BankStrings ("AATGC", NULL), "-kmer-size 4 -abundance-min 1 -verbose 0");
GraphIterator<Node> it = graph.iterator ();
assert (it.size() == 2);
for (it.first(); !it.isDone(); it.next())
{
Node& current = it.item();
std::string s = graph.toString (current);
assert (s=="AATG" || s=="ATGC");
if (s=="AATG")
{
GraphVector<Edge> neighbors = graph.successorsEdge (current);
assert (neighbors.size() == 1);
assert (graph.toString(neighbors[0].to) == "ATGC");
assert (neighbors[0].nt ==
NUCL_C);
}
}
std::cout << "Test OK" << std::endl;
}
catch (Exception& e)
{
std::cerr << "EXCEPTION: " << e.getMessage() << std::endl;
}
return EXIT_SUCCESS;
}
[go back to top]
Working with a specific neighbor for a specific node
This snippet shows how to get a specific neighbor for a specific node.
Sometimes, it is interesting to ask the graph for only one neighbor for a given node. It may happen when one has already got neighbors information through a Graph::neighbors call and memorized only the transition nucleotides for valid neighbors.
The Graph::neighbor fulfills this need. This method has two forms, with or without check to graph membership, according to performance considerations.
Code is from example debruijn11.cpp:
#undef NDEBUG
#include <cassert>
int main (int argc, char* argv[])
{
try
{
Graph graph = Graph::create (new BankStrings ("AATGC", NULL), "-kmer-size 4 -abundance-min 1 -verbose 0");
GraphIterator<Node> it = graph.iterator ();
assert (it.size() == 2);
for (it.first(); !it.isDone(); it.next())
{
Node& current = it.item();
std::string s = graph.toString (current);
if (s=="AATG")
{
Node neighbor = graph.successor (current,
NUCL_C);
assert (graph.toString (neighbor) == "ATGC");
bool exists;
Node potentialNeighbor;
potentialNeighbor = graph.successor (current,
NUCL_A, exists);
assert (exists == false);
potentialNeighbor = graph.successor (current,
NUCL_C, exists);
assert (exists == true);
assert (graph.toString (potentialNeighbor) == "ATGC");
potentialNeighbor = graph.successor (current,
NUCL_G, exists);
assert (exists == false);
potentialNeighbor = graph.successor (current,
NUCL_T, exists);
assert (exists == false);
}
}
std::cout << "Test OK" << std::endl;
}
catch (Exception& e)
{
std::cerr << "EXCEPTION: " << e.getMessage() << std::endl;
}
return EXIT_SUCCESS;
}
[go back to top]
Branching nodes
In the De Bruijn graph, we can define two types of nodes:
- a node N is 'simple' <=> indegree(N)==1 && outdegree(N)
- a node N is 'branching' <=> N is not simple
Branching nodes set is an important subset in the whole nodes set of the De Bruijn graph, so it is of most interest to have some graph methods that work on such nodes. In particular, we can:
- iterate all the branching nodes of the De Bruijn graph
- get the branching neighbors of some node
Iterate the branching nodes of a De Bruijn graph
This snippet shows how to iterate the branching nodes of a graph (the graph is loaded from a graph file).
The idea is to get an iterator from the graph and use it to get each branching node of the graph.
Code is from example debruijn8.cpp:
int main (int argc, char* argv[])
{
if (argc < 2)
{
std::cerr << "You must provide a HDF5 file." << std::endl;
return EXIT_FAILURE;
}
try
{
Graph graph = Graph::load (argv[1]);
GraphIterator<BranchingNode> it = graph.iteratorBranching ();
for (it.first(); !it.isDone(); it.next())
{
std::cout << "[" << it.rank() << "] " << graph.toString (it.item()) << std::endl;
}
}
catch (Exception& e)
{
std::cerr << "EXCEPTION: " << e.getMessage() << std::endl;
}
return EXIT_SUCCESS;
}
[go back to top]
Working with branching neighbors of a node
This snippet shows how to get the branching neighbors of a node. Such neighbors are computed as follow:
- the immediate neighbors of the node are retrieved
- a simple path is done from each neighbor in order to reach the first non simple node
Here, we use directly the Graph::successors<BranchingNode> method that encapsulates this behavior.
Code is from example debruijn16.cpp:
int main (int argc, char* argv[])
{
size_t kmerSize = 7;
const char* sequences[] =
{
"AGGCGCTAGGGAGAGGATGATGAAA",
"AGGCGCTCGGGAGAGGATGATGAAA",
"AGGCGCTTGGGAGAGGATGATGAAA"
};
try
{
Graph graph = Graph::create (new BankStrings (sequences, ARRAY_SIZE(sequences)), "-kmer-size %d -abundance-min 1 -verbose 0", kmerSize);
Node node = graph.buildNode (sequences[0]);
GraphVector<BranchingNode> branchingNeighbors = graph.successorsBranching (node);
std::cout << "We found " << branchingNeighbors.size() << " branching neighbors from node " << graph.toString(node) << std::endl;
for (size_t i=0; i<branchingNeighbors.size(); i++)
{
std::cout << graph.toString (branchingNeighbors[i]) << std::endl;
}
}
catch (Exception& e)
{
std::cerr << "EXCEPTION: " << e.getMessage() << std::endl;
}
return EXIT_SUCCESS;
}
[go back to top]
Working with branching neighbors of a node (continued)
This snippet shows how to get the branching neighbors of a node.
It is similar to the previous snippet. The difference here is that we retrieve BranchingEdge objects. A BranchingEdge object is made of:
- the source branching node
- the target branching node
- the direction of the neighbors (in/out coming)
- the nucleotide of the transition between the initial branching node and the first neighbor on the simple path between the two branching nodes.
- the number of transitions that link the two branching nodes.
Code is from example debruijn17.cpp:
int main (int argc, char* argv[])
{
size_t kmerSize = 7;
const char* sequences[] =
{
"AGGCGCTAGGGAGAGGATGATGAAA",
"AGGCGCTCGGGAGAGGATGATGAAA",
"AGGCGCTTGGGAGAGGATGATGAAA"
};
try
{
Graph graph = Graph::create (new BankStrings (sequences, ARRAY_SIZE(sequences)), "-kmer-size %d -abundance-min 1 -verbose 0", kmerSize);
Node node = graph.buildNode (sequences[0]);
GraphVector<BranchingEdge> branchingNeighbors = graph.successorsBranchingEdge (node);
std::cout << "We found " << branchingNeighbors.size() << " branching neighbors from node " << graph.toString(node) << std::endl;
for (size_t i=0; i<branchingNeighbors.size(); i++)
{
std::cout << graph.toString (branchingNeighbors[i]) << std::endl;
}
}
catch (Exception& e)
{
std::cerr << "EXCEPTION: " << e.getMessage() << std::endl;
}
return EXIT_SUCCESS;
}
[go back to top]
Simple path
Iterating simple path from a node
As we saw previously, a simple node is defined as having indegree==1 and outdegree==1. It is often useful to iterate successive simple nodes in order to build some path in the De Bruijn graph.
This snippet shows how to iterate such a simple path. Here, the iterated items are the successive nodes of the path.
Code is from example debruijn14.cpp:
int main (int argc, char* argv[])
{
try
{
size_t kmerSize = 11;
const char* seq = "AGGCGCTAGGGTAGAGGATGATGA";
std::cout << "The initial sequence is '" << seq << "', kmer size is " << kmerSize << std::endl;
Graph graph = Graph::create (new BankStrings (seq, NULL), "-kmer-size %d -abundance-min 1 -verbose 0", kmerSize);
Node node = graph.buildNode (seq);
GraphIterator<Node> path = graph.simplePath (node, DIR_OUTCOMING);
for (path.first(); !path.isDone(); path.next())
{
std::cout << " [" << path.rank() << "] current item is " << graph.toString (path.item()) << std::endl;
}
std::cout << "The simple path was " << path.rank() << " long" << std::endl;
}
catch (Exception& e)
{
std::cerr << "EXCEPTION: " << e.getMessage() << std::endl;
}
return EXIT_SUCCESS;
}
[go back to top]
Iterating simple path from a node (continued)
Like the previous example, this snippet shows how to iterate a simple path. Here, the iterated items are the successive edges of the path. If we note E an edge in this path, we will have:
- outdegree(E.from)==1 && indegree(E.to)==1
Code is from example debruijn15.cpp:
int main (int argc, char* argv[])
{
try
{
size_t kmerSize = 11;
char* seq = (char*) "AGGCGCTAGGGTAGAGGATGATGA";
std::cout << "The initial sequence is '" << seq << "', kmer size is " << kmerSize << std::endl;
Graph graph = Graph::create (new BankStrings (seq, NULL), "-kmer-size %d -abundance-min 1 -verbose 0", kmerSize);
Node node = graph.buildNode (seq);
GraphIterator<Edge> path = graph.simplePathEdge (node, DIR_OUTCOMING);
for (path.first(); !path.isDone(); path.next())
{
std::cout << " [" << path.rank() << "] current item is " << graph.toString (path.item()) << std::endl;
}
std::cout << "The simple path was " << path.rank() << " long" << std::endl;
}
catch (Exception& e)
{
std::cerr << "EXCEPTION: " << e.getMessage() << std::endl;
}
return EXIT_SUCCESS;
}
[go back to top]
Miscellanous
Playing with node strands
A Node object is fully defined by a kmer value and a strand that disambiguates how to interpret the kmer.
It is often required to change the reading strand of a node. This can be done with the Graph::reverse method.
Code is from example debruijn12.cpp:
#undef NDEBUG
#include <cassert>
int main (int argc, char* argv[])
{
try
{
Graph graph = Graph::create (new BankStrings ("AATGC", NULL), "-kmer-size 5 -abundance-min 1 -verbose 0");
GraphIterator<Node> it = graph.iterator ();
assert (it.size() == 1);
it.first(); Node& current = it.item();
std::string s = graph.toString (current);
assert (s == "AATGC");
Node other = graph.reverse (current);
assert (graph.toString(other) == "GCATT");
assert (current.kmer == other.kmer);
std::cout << "Test OK" << std::endl;
}
catch (Exception& e)
{
std::cerr << "EXCEPTION: " << e.getMessage() << std::endl;
}
return EXIT_SUCCESS;
}
[go back to top]
Playing with fake nodes
Sometimes, it is useful to build "fake" nodes from a simple sequence, without having a graph holding true data.
It is possible to get an empty Graph object (its kmer size must be nevertheless specified), and then use the Graph::buildNode to get a node based on a Data object.
Code is from example debruijn13.cpp:
#undef NDEBUG
#include <cassert>
int main (int argc, char* argv[])
{
try
{
Graph graph = Graph::create (7);
const char* seq = "ATGCATCGTA";
Node n0 = graph.buildNode (seq+0);
assert (graph.toString(n0) == "ATGCATC");
Node n1 = graph.buildNode (seq+1);
assert (graph.toString(n1) == "TGCATCG");
std::cout << "Test OK" << std::endl;
}
catch (Exception& e)
{
std::cerr << "EXCEPTION: " << e.getMessage() << std::endl;
}
return EXIT_SUCCESS;
}
[go back to top]
To go further...
Getting branching nodes statistics in a parallel way
This example is a little bit harder than the previous ones. Its purpose is to show how to use the graph API for extracting some information. In particular, we will try to take care about time execution by using available cores.
The idea here is to compute information about branching nodes and their in and out degrees. For instance, we want to know how many branching nodes have indegree==2 and outdegree=3. We compute therefore the number of branching nodes having indegree==X and outdegree==Y, with X and Y in [0..4] and the constraint that we can't have X==Y==1 (otherwise the node wouldn't be a branching one).
We can do it easily by using the methods:
- Graph::successors<BranchingNode>
- Graph::predecessors<BranchingNode>
Moreover, we want to do it in a parallel way in order to speed up the computation. The idea is to get an iterator over the branching nodes and iterate it through a Dispatcher object; such a dispatcher will create as many threads as wanted and will feed each threads with branching nodes. Note that this scheme can work here because the processing of one branching node is independant of the others.
We need also some container for the in/out degrees statistics. A natural choice is to use a map, with the key being a unique identifier for a couple (indegree/outdegree) and the value the number of occurrences for the key. The idea is to use one map per thread and to merge the N maps into a global one after the parallel iteration of the branching nodes. We use here a ThreadObject object that allows to do it in a simple way. This object clones N time the global map and each clone is used in a specific thread. The ThreadObject class allows to hide many cumbersome points for the parallelization.
In this example, we also use progress notification feature (with ProgressIterator) in order to have some user feedback during the iteration of the branching nodes.
Code is from example debruijn18.cpp:
#include <map>
#include <algorithm>
#include <cstdio>
#include <cstdlib>
int main (int argc, char* argv[])
{
OptionsParser parser ("GraphStats");
parser.push_back (new OptionOneParam (STR_URI_GRAPH, "graph input", true));
parser.push_back (new OptionOneParam (STR_NB_CORES, "nb cores", false, "0"));
try
{
IProperties* options = parser.parse (argc, argv);
Graph graph = Graph::load (options->getStr(STR_URI_GRAPH));
size_t nbCores = options->getInt(STR_NB_CORES);
ProgressGraphIterator<BranchingNode,ProgressTimer> itBranching (graph.iteratorBranching(), "statistics");
typedef pair<size_t,size_t> InOut_t;
ThreadObject <map <InOut_t, size_t> > topology;
IDispatcher::Status status = Dispatcher(nbCores).iterate (itBranching, [&] (const BranchingNode& node)
{
map <InOut_t,size_t>& localTopology = topology();
GraphVector<BranchingEdge> successors = graph.successorsBranchingEdge ((Node&)node);
GraphVector<BranchingEdge> predecessors = graph.predecessorsBranchingEdge ((Node&)node);
localTopology [make_pair(predecessors.size(), successors.size())] ++;
});
topology.foreach ([&] (const map <InOut_t, size_t>& t)
{
for_each (t.begin(), t.end(), [&] (const pair<InOut_t, size_t>& p) { (*topology)[p.first] += p.second; });
});
vector < pair<InOut_t,size_t> > stats;
for (auto it = topology->begin(); it != topology->end(); it++) { stats.push_back (*it); }
sort (stats.begin(), stats.end(), [=] (const pair<InOut_t,size_t>& a, const pair<InOut_t,size_t>& b) { return a.second > b.second; });
printf ("\nThere are %d branching nodes with the following distribution: \n", itBranching.size());
size_t sum=0;
for (size_t i=0; i<stats.size(); i++)
{
sum += stats[i].second;
printf (" [in=%d out=%d] nb=%7d percent=%5.2f distrib=%5.2f\n",
stats[i].first.first,
stats[i].first.second,
stats[i].second,
100.0*(float)stats[i].second / (float)itBranching.size(),
100.0*(float)sum / (float)itBranching.size()
);
}
printf ("\nDone on %d cores in %.2f sec\n\n", status.nbCores, (float)status.time/1000.0);
}
catch (OptionFailure& e)
{
return e.displayErrors (std::cout);
}
catch (Exception& e)
{
std::cerr << "EXCEPTION: " << e.getMessage() << std::endl;
}
return EXIT_SUCCESS;
}
[go back to top]
Computing connected components of the branching nodes sub graph
Code is from example debruijn19.cpp:
#include <map>
#include <cstdio>
#include <cstdlib>
#include <set>
#undef NDEBUG
#include <cassert>
class GraphMarker
{
public:
GraphMarker (const Graph& graph) : graph(graph)
{
auto iter = [&] (const BranchingNode& item) { this->markMap[item] = false; };
graph.iteratorBranching().iterate (iter);
}
void mark (const BranchingNode& item) { markMap [item] = true; }
void mark (const set<BranchingNode>& items)
{
for (typename set<BranchingNode>::const_iterator it=items.begin(); it != items.end(); ++it) { mark (*it); }
}
bool isMarked (const BranchingNode& item) const { return markMap.find(item)->second; }
private:
const Graph& graph;
map<BranchingNode,bool> markMap;
};
class BFS
{
public:
BFS (const Graph& graph) : graph(graph) {}
const set<BranchingNode>& run (const BranchingNode& node)
{
set<BranchingNode> frontline;
frontline.insert (node);
connectedComponent.clear();
connectedComponent.insert (node);
while (!frontline.empty())
{
set<BranchingNode> neighbors = graph.neighbors (frontline.begin(), frontline.end());
frontline.clear();
for (typename set<BranchingNode>::iterator it = neighbors.begin(); it != neighbors.end(); it++)
{
if (connectedComponent.find (*it) == connectedComponent.end())
{
frontline.insert (*it);
connectedComponent.insert (*it);
}
}
}
return connectedComponent;
}
const set<BranchingNode>& get() const { return connectedComponent; }
private:
const Graph& graph;
set<BranchingNode> connectedComponent;
};
int main (int argc, char* argv[])
{
OptionsParser parser ("GraphStats");
parser.push_back (new OptionOneParam (STR_URI_GRAPH, "graph input", true));
try
{
IProperties* options = parser.parse (argc, argv);
Graph graph = Graph::load (options->getStr(STR_URI_GRAPH));
GraphMarker marker (graph);
BFS bfs (graph);
map<size_t,size_t> distrib;
ProgressGraphIterator<BranchingNode,ProgressTimer> itBranching (graph.iteratorBranching(), "statistics");
size_t nbConnectedComponents = 0;
TimeInfo ti;
ti.start ("compute");
for (itBranching.first(); !itBranching.isDone(); itBranching.next())
{
if (marker.isMarked (*itBranching)) { continue; }
const set<BranchingNode>& component = bfs.run (*itBranching);
marker.mark (component);
distrib[component.size()] ++;
nbConnectedComponents++;
}
ti.stop ("compute");
size_t sum = 0; for (map<size_t,size_t>::iterator it = distrib.begin(); it != distrib.end(); it++) { sum += it->first*it->second; }
assert (sum == itBranching.size());
Properties props ("connected_components");
props.add (1, "graph_name", "%s", graph.getName().c_str());
props.add (1, "nb_branching_nodes", "%d", sum);
props.add (1, "nb_connected_classes", "%d", distrib.size());
props.add (1, "nb_connected_components", "%d", nbConnectedComponents);
for (map<size_t,size_t>::iterator it = distrib.begin(); it!=distrib.end(); it++)
{
props.add (2, "component");
props.add (3, "nb_nodes", "%d", it->first);
props.add (3, "nb_occurs", "%d", it->second);
props.add (3, "freq_nodes", "%f", 100.0*(float)(it->first*it->second) / (float)sum);
}
props.add (1, ti.getProperties("time"));
XmlDumpPropertiesVisitor v (graph.getName() + ".xml", false);
props.accept (&v);
}
catch (OptionFailure& e)
{
return e.displayErrors (std::cout);
}
catch (Exception& e)
{
std::cerr << "EXCEPTION: " << e.getMessage() << std::endl;
}
return EXIT_SUCCESS;
}
[go back to top]
Generate dot file from a graph
This snippet generates a dot file from a de Bruijn graph. It is then possible to generate a pdf output of the graph (dot -Tpdf graph.dot -o graph.pdf)
Code is from example debruijn24.cpp:
#include <fstream>
#include <queue>
#include <stack>
#include <map>
#define DEBUG(a) //a
#define INFO(a) //a
const char* STR_NODE_TYPE = "-type";
class DotGeneratorTool : public Tool
{
public:
DotGeneratorTool () : Tool ("DotGenerator")
{
_parser->push_front (new OptionOneParam (STR_URI_GRAPH, "graph file", true ));
_parser->push_front (new OptionOneParam (STR_URI_OUTPUT, "dot file", false ));
_parser->push_front (new OptionOneParam (STR_NODE_TYPE, "node type (0: all, 1:branching)", false, "1" ));
}
void processNode(Graph &graph, ofstream &output)
{
map<Node, u_int64_t> mapping;
u_int64_t count = 0;
GraphIterator<Node> itMap = graph.iterator ();
for (itMap.first(); !itMap.isDone(); itMap.next()) { mapping[itMap.item()] = count++; }
ProgressGraphIterator<Node,ProgressTimer> it = graph.iterator ();
for (it.first(); !it.isDone(); it.next())
{
Node current = it.item();
GraphVector<Node> neighbors = graph.neighbors(current.kmer);
for (size_t i=0; i<neighbors.size(); i++)
{
output << mapping[current.kmer] << " -> " << mapping[neighbors[i].kmer] << " ;\n";
}
}
output << "}\n";
output.close();
}
void processBranchingNode(Graph &graph, ofstream & output)
{
map<Node, u_int64_t> mapping;
u_int64_t count = 0;
GraphIterator<BranchingNode> itMap = graph.iteratorBranching();
for (itMap.first(); !itMap.isDone(); itMap.next()) { mapping[itMap.item()] = count++; }
ProgressGraphIterator<BranchingNode,ProgressTimer> it = graph.iteratorBranching ();
for (it.first(); !it.isDone(); it.next())
{
BranchingNode current = it.item();
GraphVector<BranchingNode> neighbors = graph.neighborsBranching (current.kmer);
for (size_t i=0; i<neighbors.size(); i++)
{
output << mapping[current.kmer] << " -> " << mapping[neighbors[i].kmer] << " ;\n";
}
}
output << "}\n";
output.close();
}
void execute ()
{
string outputFile = getInput()->get(STR_URI_OUTPUT) ?
getInput()->getStr(STR_URI_OUTPUT) :
(System::file().getBaseName(getInput()->getStr(STR_URI_GRAPH)) + ".dot");
ofstream output (outputFile.c_str());
Graph graph = Graph::load (getInput()->getStr(STR_URI_GRAPH));
switch (getInput()->getInt(STR_NODE_TYPE))
{
case 0:
output << "digraph " << "all" << "{\n";
processNode(graph, output);
break;
case 1:
output << "digraph " << "branching" << "{\n";
processBranchingNode(graph, output);
break;
default: break;
}
}
};
int main (int argc, char* argv[])
{
try
{
DotGeneratorTool().run (argc, argv);
}
catch (Exception& e)
{
std::cout << "EXCEPTION: " << e.getMessage() << std::endl;
return EXIT_FAILURE;
}
return EXIT_SUCCESS;
}
[go back to top]
Retrieve extra information from a node
This snippet shows how to use the 'queryAbundance' method that returns the occurrences number of the node in the initial set of reads.
Code is from example debruijn26.cpp:
int main (int argc, char* argv[])
{
const char* seq = "AAAAACTACATTACCCGTTTGCGAGACAGGTA";
size_t kmerSize = strlen (seq) - 1;
IBank* bank = new BankStrings (seq, seq, seq, seq, 0);
Graph graph = Graph::create (bank, "-kmer-size %d -abundance-min 1 -verbose 0", kmerSize);
Node node = graph.buildNode (seq);
cout << "abundance=" << graph.queryAbundance(node) << endl;
}
[go back to top]