One of the fundamental problems in computational biology is understanding the inner workings of a cell. Most cellular processes are carried out by protein-protein interactions (PPIs). Thus, analyzing and modeling of large PPI networks is an integral part of this process. Analogous to biological sequence comparison, comparing cellular networks is an important problem that could provide insight into biological understanding and therapeutics. The full-scale comparison of two arbitrary networks is computationally intractable, because it contains the subgraph isomorphism problem, which is NP-complete. Thus, heuristic measures must be developed for network comparison. We devise a highly constraining network comparison metric based on local structural properties that are a direct generalization of the degree distribution. We use this new metric to demonstrate that geometric random graphs better model PPI networks than do Erdos-Renyi, random scale-free, or Barabasi-Albert scale-free networks. Our new systematic measure of a network's local structure imposes a large number of similarity constraints on networks being compared. In particular, we generalize the degree distribution, which measures the number of nodes ``touching'' k edges, into graphlet degree distributions measuring the number of nodes ``touching'' k graphlets, where graphlets are small connected non-isomorphic induced subgraphs of a large network. Clearly, the degree distribution is the first one in the spectrum of graphlet degree distributions, since an edge is the only 2-node graphlet. We design a network ``agreement'' measure as a number in [0,1] that encompasses the 73 graphlet degree distributions of 2-, 3-, 4-, and 5-node graphlets. This measure is easily extendible to a greater number of constraints (i.e., graphlets) and the extensions are limited only by the available CPU.
Natasa Przulj is an Assistant Professor in the Computer Science Department at UC Irvine. Her research involves developing new tools for analyzing and modeling of complex networks in cellular biology. She was a postdoctoral fellow at the Samuel Lunenfeld Research Institute in Toronto working under the supervision of Prof. Jeff Wrana. She completed her Ph.D. in the Department of Computer Science at the University of Toronto in 2005 under the supervision of Prof. Derek Corneil and Prof. Igor Jurisica. She received her M.Sc. from the same department in 2000 and her B.Sc. in Mathematics and Computing Science from Simon Fraser University, BC, Canada in 1997 after starting at the Department of Mathematics of the University of Belgrade in Yugoslavia. She spent two years in industry working for Hughes Aircraft, Westech Information Systems (subsidiary of BC Hydro, Vancouver, Canada), and IBM.