nips nips2010 nips2010-259 nips2010-259-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Benjamin Miller, Nadya Bliss, Patrick J. Wolfe
Abstract: When working with network datasets, the theoretical framework of detection theory for Euclidean vector spaces no longer applies. Nevertheless, it is desirable to determine the detectability of small, anomalous graphs embedded into background networks with known statistical properties. Casting the problem of subgraph detection in a signal processing context, this article provides a framework and empirical results that elucidate a “detection theory” for graph-valued data. Its focus is the detection of anomalies in unweighted, undirected graphs through L1 properties of the eigenvectors of the graph’s so-called modularity matrix. This metric is observed to have relatively low variance for certain categories of randomly-generated graphs, and to reveal the presence of an anomalous subgraph with reasonable reliability when the anomaly is not well-correlated with stronger portions of the background graph. An analysis of subgraphs in real network datasets confirms the efficacy of this approach. 1
[1] J. Sun, J. Qu, D. Chakrabarti, and C. Faloutsos, “Neighborhood formation and anomaly detection in bipartite graphs,” in Proc. IEEE Int’l. Conf. on Data Mining, Nov. 2005.
[2] J. Sun, Y. Xie, H. Zhang, and C. Faloutsos, “Less is more: Compact matrix decomposition for large sparse graphs,” in Proc. SIAM Int’l. Conf. on Data Mining, 2007.
[3] C. C. Noble and D. J. Cook, “Graph-based anomaly detection,” in Proc. ACM SIGKDD Int’l. Conf. on Knowledge Discovery and Data Mining, pp. 631–636, 2003.
[4] W. Eberle and L. Holder, “Anomaly detection in data represented as graphs,” Intelligent Data Analysis, vol. 11, pp. 663–689, December 2007.
[5] C. E. Priebe, J. M. Conroy, D. J. Marchette, and Y. Park, “Scan statistics on enron graphs,” Computational & Mathematical Organization Theory, vol. 11, no. 3, pp. 229–247, 2005.
[6] T. Id´ and H. Kashima, “Eigenspace-based anomaly detection in computer systems,” in Proc. e KDD ’04, pp. 440–449, 2004.
[7] S. Hirose, K. Yamanishi, T. Nakata, and R. Fujimaki, “Network anomaly detection based on eigen equation compression,” in Proc. KDD ’09, pp. 1185–1193, 2009.
[8] M. E. J. Newman, “Finding community structure in networks using the eigenvectors of matrices,” Phys. Rev. E, vol. 74, no. 3, 2006.
[9] J. Ruan and W. Zhang, “An efficient spectral algorithm for network community discovery and its applications to biological and social networks,” in Proc. IEEE Int’l Conf. on Data Mining, pp. 643–648, 2007.
[10] S. White and P. Smyth, “A spectral clustering approach to finding communities in graphs,” in Proc. SIAM Data Mining Conf., 2005.
[11] B. A. Miller, N. T. Bliss, and P. J. Wolfe, “Toward signal processing theory for graphs and other non-Euclidean data,” in Proc. IEEE Int’l Conf. on Acoustics, Speech and Signal Processing, pp. 5414–5417, 2010.
[12] T. Mifflin, “Detection theory on random graphs,” in Proc. Int’l Conf. on Information Fusion, pp. 954–959, 2009.
[13] D. Chakrabarti and C. Faloutsos, “Graph mining: Laws, generators, and algorithms,” ACM Computing Surveys, vol. 38, no. 1, 2006.
[14] F. Chung, L. Lu, and V. Vu, “The spectra of random graphs with given expected degrees,” Proc. of National Academy of Sciences of the USA, vol. 100, no. 11, pp. 6313–6318, 2003.
[15] D. Chakrabarti, Y. Zhan, and C. Faloutsos, “R-MAT: A recursive model for graph mining,” in Proc. Fourth SIAM Int’l Conference on Data Mining, vol. 6, pp. 442–446, 2004.
[16] T. S. Motzkin and E. G. Straus, “Maxima for graphs and a new proof of a theorem of Tur´ n,” a Canad. J. Math., vol. 17, pp. 533–540, 1965.
[17] C. Ding, T. Li, and M. I. Jordan, “Nonnegative matrix factorization for combinatorial optimization: Spectral clustering, graph matching, and clique finding,” in Proc. IEEE Int’l Conf. on Data Mining, pp. 183–192, 2008.
[18] J. Leskovec, “Stanford network analysis package.” http://snap.stanford.edu.
[19] M. Richardson, R. Agrawal, and P. Domingos, “Trust management for the semantic web,” in Proc. ISWC, 2003.
[20] J. Leskovec, J. Kleinberg, and C. Faloutsos, “Graph evolution: Densification and shinking diameters,” ACM Trans. on Knowledge Discovery from Data, vol. 1, no. 1, 2007.
[21] J. Leskovec, J. Kleinberg, and C. Faloutsos, “Graphs over time: Densification laws, shinking diameters and possible explanations,” in Proc. KDD ’05, 2005.
[22] J. Leskovec, K. Lang, A. Dasgupta, and M. Mahoney, “Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters.” arXiv.org:0810.1355, 2008. 9