In this review, we present a survey of frequent subgraph mining applications that deal with biomolecular graphs, such as interaction networks, protein graph structures and chemical structures.īy definition, a graph consists of nodes and edges. One common problem in graph theory consists of finding the underlying subgraph patterns in graphs, which are also referred to as network motifs or graphlets.
Graphs or networks are ubiquitous data types, pervasive in multiple domains, from social sciences to medicine, biology and chemistry. We give an overview of various subgraph mining algorithms from a bioinformatics perspective and present several of their potential biomedical applications. In this review, we provide an introduction to subgraph mining for life scientists. Thus far, information about the bioinformatics application of subgraph mining remains scattered over heterogeneous literature. These techniques have seen numerous applications and are able to tackle a range of biological research questions, spanning from the detection of common substructures in sets of biomolecular compounds, to the discovery of network motifs in large-scale molecular interaction networks. The definition of which subgraphs are interesting and which are not is highly dependent on the application. Subgraph mining techniques focus on the discovery of patterns in graphs that exhibit a specific network structure that is deemed interesting within these data sets. Messaging Protocols (Some are no longer needed.Searching for interesting common subgraphs in graph data is a well-studied problem in data mining. tasks for the next stage) which would further be distributed (again maybe randomly) among all the task queues. After processing it will produce many subgraphs(i.e. A slave will pick a task from any of the task queue(maybe randomly) and process it. Each slave also has a task queue which actually consists of subgraphs.This way no slave gets loaded and checking uniqueness is properly distributed. And each slave knows which bloom filter it must contact to check a hash value of a subgraph. Each slave will maintain a bloom filter for a particular range of hash values. Initally, we shard the range of hash values.
We use a distributed bloom filter on these values to maintain only distinct subgraphs.