jmlr jmlr2008 jmlr2008-61 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Edoardo M. Airoldi, David M. Blei, Stephen E. Fienberg, Eric P. Xing
Abstract: Consider data consisting of pairwise measurements, such as presence or absence of links between pairs of objects. These data arise, for instance, in the analysis of protein interactions and gene regulatory networks, collections of author-recipient email, and social networks. Analyzing pairwise measurements with probabilistic models requires special assumptions, since the usual independence or exchangeability assumptions no longer hold. Here we introduce a class of variance allocation models for pairwise measurements: mixed membership stochastic blockmodels. These models combine global parameters that instantiate dense patches of connectivity (blockmodel) with local parameters that instantiate node-specific variability in the connections (mixed membership). We develop a general variational inference algorithm for fast approximate posterior inference. We demonstrate the advantages of mixed membership stochastic blockmodels with applications to social networks and protein interaction networks. Keywords: hierarchical Bayes, latent variables, mean-field approximation, statistical network analysis, social networks, protein interaction networks
Reference: text
sentIndex sentText sentNum sentScore
1 These data arise, for instance, in the analysis of protein interactions and gene regulatory networks, collections of author-recipient email, and social networks. [sent-13, score-0.542]
2 Here we introduce a class of variance allocation models for pairwise measurements: mixed membership stochastic blockmodels. [sent-15, score-0.61]
3 We demonstrate the advantages of mixed membership stochastic blockmodels with applications to social networks and protein interaction networks. [sent-18, score-1.148]
4 Keywords: hierarchical Bayes, latent variables, mean-field approximation, statistical network analysis, social networks, protein interaction networks 1. [sent-19, score-0.702]
5 Rather, the latent stochastic blockmodel (Wang and Wong, 1987; Snijders and Nowicki, 1997) is an adaptation of mixture modeling to relational data. [sent-40, score-0.591]
6 The latent stochastic blockmodel suffers from the limitation that each object can belong to only one cluster, or in other words, play a single latent role. [sent-46, score-0.68]
7 For example, when a protein or a social actor interacts with different partners, different functional or social contexts may apply and thus the protein or the actor may be acting according to different latent roles they can possibly play. [sent-48, score-1.008]
8 In this paper, we relax the assumption of single-latentrole for actors, and develop a mixed membership model for relational data. [sent-49, score-0.711]
9 Mixed membership models, such as latent Dirichlet allocation (Blei et al. [sent-50, score-0.603]
10 The mixed membership model associates each unit of observation with multiple clusters rather than a single cluster, via a membership probability-like vector. [sent-57, score-1.044]
11 The concurrent membership of a data point in different clusters can capture its different aspects, such as different underlying topics for the words constituting each document. [sent-58, score-0.462]
12 This is also a natural idea for relational data, where the objects can bear multiple latent roles or cluster-memberships that influence their relationships to others. [sent-59, score-0.382]
13 As we will demonstrate, a mixed membership approach to relational data lets us describe the interaction between objects playing multiple roles. [sent-60, score-0.819]
14 Existing mixed membership models are not appropriate for relational data because they assume that the data are conditionally independent given their latent membership vectors. [sent-62, score-1.314]
15 In relational data, where each object is described by its relationships to others, we would like to assume that the ensemble of mixed membership vectors help govern the relationships of each object. [sent-63, score-0.711]
16 The conditional independence assumptions of modern mixed membership models do not apply. [sent-64, score-0.582]
17 Figure 1: Two graphical model representations of the mixed membership stochastic blockmodel (MMB). [sent-125, score-0.826]
18 Intuitively, the MMB summarizes the variability of a graph with the blockmodel B and node-specific mixed membership vectors (left). [sent-126, score-0.798]
19 In this paper, we develop mixed membership models for relational data. [sent-131, score-0.711]
20 We develop a fast nested variational inference algorithm that performs well in the relational setting and is parallelizable. [sent-133, score-0.469]
21 We demonstrate the application of our technique to large-scale protein interaction networks and social networks. [sent-134, score-0.453]
22 Mixed membership and the latent block structure can be recovered from relational data (Section 4. [sent-136, score-0.784]
23 The application to a friendship network among students tests the model on a real data set where a well-defined latent block structure exists (Section 4. [sent-138, score-0.511]
24 The application to a protein interaction network tests to what extent our model can reduce the dimensionality of the data, while revealing substantive information about the functionality of proteins that can be used to inform subsequent analyses (Section 4. [sent-140, score-0.548]
25 The Mixed Membership Stochastic Blockmodel In this section, we describe the modeling assumptions of the mixed membership model for relational data. [sent-143, score-0.711]
26 In the context of the monastery example, we assume K factions, that is, latent groups, exist in the monastery, and the observed network is generated according to distributions of group-membership for each monk and a matrix of group-group interaction strength. [sent-155, score-0.527]
27 Each monk is associated with a randomly drawn vector πi for monk i, where πi,g denotes the probability of monk i belonging to group g. [sent-157, score-0.376]
28 The probabilities of interactions between different groups are defined by a matrix of Bernoulli rates B(K×K) , where B(g, h) represents the probability of having a link between a monk from group g and a monk from group h. [sent-159, score-0.457]
29 For each monk, the indicator vector z p→q denotes the group membership of monk p when he responds to survey questions about monk q and z p←q denotes the group membership of monk q when he responds to survey questions about node p. [sent-160, score-1.174]
30 In this abstract setting, the mixed membership stochastic blockmodel (MMB) posits that a graph G = (N ,Y ) is drawn from the following procedure. [sent-163, score-0.826]
31 • For each node p ∈ N : – Draw a K dimensional mixed membership vector π p ∼ Dirichlet ( α ). [sent-164, score-0.582]
32 • For each pair of nodes (p, q) ∈ N × N : – Draw membership indicator for the initiator, z p→q ∼ Multinomial ( π p ). [sent-165, score-0.43]
33 – Draw membership indicator for the receiver, zq→p ∼ Multinomial ( πq ). [sent-166, score-0.385]
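The generative procedure above can be made concrete with a short simulation sketch. The final link-sampling step, drawing Y(p, q) from a Bernoulli whose rate is selected from B by the two membership indicators, is assumed here from the description of B as a matrix of Bernoulli rates; the function and variable names are illustrative rather than taken from the paper.

```python
import numpy as np

def simulate_mmb(N, K, alpha, B, seed=0):
    """Sample a directed graph from the mixed membership stochastic blockmodel.

    alpha: length-K Dirichlet parameter; B: K x K matrix of Bernoulli rates.
    """
    rng = np.random.default_rng(seed)
    pi = rng.dirichlet(alpha, size=N)      # node-specific mixed membership vectors
    Y = np.zeros((N, N), dtype=int)
    for p in range(N):
        for q in range(N):
            if p == q:
                continue
            z_init = rng.choice(K, p=pi[p])               # initiator's role for this pair
            z_recv = rng.choice(K, p=pi[q])               # receiver's role for this pair
            Y[p, q] = rng.binomial(1, B[z_init, z_recv])  # link drawn at the block rate
    return pi, Y

# Example: three groups with dense within-group and sparse between-group connectivity.
B = 0.05 + 0.85 * np.eye(3)
pi, Y = simulate_mmb(N=50, K=3, alpha=np.full(3, 0.1), B=B)
```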
34 In previous work we combined mixed membership and blockmodels to perform analyses of a single collection of binary, paired measurements; namely, hypothesis testing, predicting and de-noising interactions within an unsupervised learning setting (Airoldi et al. [sent-169, score-0.848]
35 An indicator vector is used to denote membership in one of the K groups. [sent-172, score-0.385]
36 Note that the group membership of each node is context dependent. [sent-175, score-0.413]
37 That is, each node may assume a different membership when interacting with, or being contacted by, different peers. [sent-176, score-0.385]
38 Using the algorithms presented in Section 3, we fit MMB models with different numbers of groups to the monk data, providing model estimates of α and B and posterior mixed membership vectors πn for each monk. [sent-228, score-0.731]
39 In Figure 3 we illustrate the posterior means of the mixed membership scores, E[ π|Y ], for the 18 monks in the monastery. [sent-246, score-0.731]
40 Parameter Estimation and Posterior Inference Two computational problems are central to the MMB: posterior inference of the per-node mixed membership vectors and per-pair roles, and parameter estimation of the Dirichlet parameters and Bernoulli rate matrix. [sent-252, score-0.702]
41 1 Posterior Inference The posterior inference problem is to compute the posterior distribution of the latent variables given a collection of observations. [sent-255, score-0.458]
42 A number of approximate inference algorithms for mixed membership models have appeared in recent years, including mean-field variational methods (Blei et al. [sent-258, score-0.85]
43 The main idea behind variational methods is to first posit a distribution of the latent variables with free parameters, and then fit those parameters such that the distribution is close in Kullback-Leibler divergence to the true posterior. [sent-263, score-0.443]
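To make the variational idea explicit, a standard mean-field bound for this model reads as follows; the factorized form of q, with Dirichlet parameters γ and per-pair multinomial parameters φ, is the usual mean-field choice and is stated here as an assumption rather than quoted from the excerpt above.

```latex
\log p(Y \mid \alpha, B)
  \;\geq\; \mathbb{E}_{q}\bigl[\log p(Y, \pi_{1:N}, Z \mid \alpha, B)\bigr]
           - \mathbb{E}_{q}\bigl[\log q(\pi_{1:N}, Z)\bigr],
\quad\text{with}\quad
q(\pi_{1:N}, Z) = \prod_{p} q_{\mathrm{Dir}}(\pi_p \mid \gamma_p)
                  \prod_{p,q} q_{\mathrm{Mult}}(z_{p\to q} \mid \phi_{p\to q})\,
                              q_{\mathrm{Mult}}(z_{p\leftarrow q} \mid \phi_{p\leftarrow q}).
```

Maximizing the right-hand side over the free parameters is equivalent to minimizing the Kullback-Leibler divergence from q to the true posterior.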
44 This latter estimator attributes all the information in the non-interactions to the point mass, that is, to latent sources other than the block model B or the mixed membership vectors π1:N . [sent-362, score-0.852]
45 Experiments and Results We present a study of simulated data and applications to social and protein interaction networks. [sent-371, score-0.453]
46 1 to show that both mixed membership, π1:N , and the latent block structure, B, can be recovered from data, when they exist, and that the nested variational inference algorithm is faster than the naïve implementation while reaching the same peak in the likelihood—all other things being equal. [sent-373, score-0.807]
47 3 tests the model on a real data set where we expect a noisy, vague latent block structure to inform the observed connectivity patterns in the network to some degree. [sent-381, score-0.371]
48 We used different values of α to simulate a range of settings in terms of membership of nodes to clusters—from unique (α = 0. [sent-387, score-0.43]
49 The variational EM algorithm successfully recovers both the latent block model B and the latent mixed membership vectors π1:N . [sent-391, score-1.295]
50 Furthermore, the nested variational algorithm can be parallelized given that the updates for each interaction (i, j) are independent of one another. [sent-404, score-0.405]
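Because each pair's variational updates read only the current node-level parameters γ and the blockmodel B, the inner loop over pairs can be distributed. The sketch below shows the per-pair fixed-point structure only; it is a generic mean-field derivation under the model above and may differ in detail from the exact updates given in the paper.

```python
import numpy as np
from scipy.special import digamma

def update_pair(y_pq, gamma_p, gamma_q, B, n_iter=5):
    """Fixed-point updates for one pair's variational multinomials.

    y_pq is the observed binary interaction, gamma_p / gamma_q are the current
    Dirichlet parameters of the two nodes, and B is the K x K Bernoulli rate matrix.
    """
    K = len(gamma_p)
    phi_init = np.full(K, 1.0 / K)     # initiator's variational multinomial
    phi_recv = np.full(K, 1.0 / K)     # receiver's variational multinomial
    e_log_pi_p = digamma(gamma_p) - digamma(gamma_p.sum())
    e_log_pi_q = digamma(gamma_q) - digamma(gamma_q.sum())
    log_lik = y_pq * np.log(B) + (1 - y_pq) * np.log(1 - B)   # K x K Bernoulli log-likelihoods
    for _ in range(n_iter):
        phi_init = np.exp(e_log_pi_p + log_lik @ phi_recv)
        phi_init /= phi_init.sum()
        phi_recv = np.exp(e_log_pi_q + log_lik.T @ phi_init)
        phi_recv /= phi_recv.sum()
    return phi_init, phi_recv
```

Each call depends only on (γp, γq, B), so the calls for different pairs (i, j) can run in parallel before the node-level parameters are refreshed.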
51 We measure the number of latent clusters. Figure 6: Adjacency matrices corresponding to simulated interaction graphs with 100 nodes and 4 clusters, 300 nodes and 10 clusters, and 600 nodes and 20 clusters (top to bottom), and α equal to 0. [sent-408, score-0.615]
52 In the example shown, the peak identifies the correct number of clusters, K∗ = 10. Figure 8: The posterior mixed membership scores, π, for the 69 students. [sent-434, score-0.659]
53 Each panel corresponds to a student; we order the clusters 1 to 6 on the X axis, and we measure the student's grade of membership to these clusters on the Y axis. [sent-435, score-0.641]
54 Figure 8 shows the expected posterior mixed membership scores for the 69 students in the sample; few students display mixed membership. [sent-480, score-1.106]
55 The rarity of mixed membership in this context is expected, while mixed membership may signal unexpected social situations for further investigation. [sent-481, score-1.295]
56 In Figure 9, we contrast the friendship relation data (left) with the estimates obtained by thresholding the estimated probabilities of a relation, using the blockmodel and the node-specific latent variables (center) and the interaction-specific latent variables (right). [sent-483, score-0.737]
57 The model provides a good summary of the social structure in the school; students tend to befriend other students in the same grade, with a few exceptions. [sent-484, score-0.381]
58 The low degree of mixed membership explains the absence of obvious differences between the model-based reconstructions of the friendship relations with the two model variants (center and right). [sent-485, score-0.723]
59 Figure 9: Original matrix of friendship relations among 69 students in grades 7 to 12 (left), and estimated friendship relations obtained by thresholding the posterior expectations E[ πp⊤ B πq | Y ] (center) and E[ φp⊤ B φq | Y ] (right). [sent-486, score-0.445]
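The center-panel reconstruction can be sketched as follows, assuming pi_hat holds the N x K posterior-mean membership vectors and B_hat the estimated blockmodel; the threshold value is illustrative and not taken from the paper.

```python
import numpy as np

def reconstruct_network(pi_hat, B_hat, threshold=0.5):
    """Estimate relations by thresholding pairwise link probabilities pi_p' B pi_q."""
    prob = pi_hat @ B_hat @ pi_hat.T    # N x N matrix of expected link probabilities
    np.fill_diagonal(prob, 0.0)         # ignore self-relations
    return (prob > threshold).astype(int)
```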
60 The mixed membership vectors provide a mapping between grades and blocks. [sent-489, score-0.628]
61 Conditionally on such a mapping, we assign students to the grade they are most associated with, according to their posterior-mean mixed membership vectors, E[πn |Y ]. [sent-490, score-0.769]
62 Table 1 computes the correspondence of grades to blocks by quoting the number of students in each grade-block pair, for MMB versus the mixture blockmodel (MB) in Doreian et al. [sent-492, score-0.443]
63 The results suggest that the extra-flexibility MMB offers over MB and LSCM reduces bias in the prediction of the membership of students to blocks. [sent-498, score-0.51]
64 In other words, mixed membership does not absorb noise in this example; rather it accommodates variability in the friendship relation that is instrumental in producing better predictions. [sent-499, score-0.667]
65 On the other hand, the mixed membership vectors π1:N provide a collection of node-specific latent vectors, which inform the directed connections in the graph in a symmetric fashion. [sent-502, score-0.875]
66 MMB is the proposed mixed membership stochastic blockmodel, MSB is a simpler stochastic block mixture model (Doreian et al. [sent-507, score-0.69]
67 The goal of the analysis of protein interactions with MMB is to reveal the proteins’ diverse functional roles by analyzing their local and global patterns of interaction. [sent-524, score-0.452]
68 Below, we describe the MIPS protein interactions data and the possible interpretations of the blocks in MMB in terms of biological functions, and we report results of two experiments. [sent-528, score-0.408]
69 It includes a hand-curated collection of protein interactions that does not include interactions obtained with highthroughput technologies. [sent-533, score-0.533]
70 Figure 10 shows the binary representations, b1:871 , of the proteins in our collections; each panel corresponds to a protein; the 15 functional categories are ordered as in Table 2 on the X axis, whereas the presence or absence of the corresponding functional annotation is displayed on the Y axis. [sent-552, score-0.368]
71 2, we fit a mixed membership blockmodel with K = 15, and we explore the direct correspondence between protein-specific mixed memberships to blocks, π1:871 , and MIPS-derived functional annotations, b1:871 . [sent-555, score-1.113]
72 We plot marginal frequencies of proteins’ membership to true functions (left) and to predicted functions (right). [sent-609, score-0.423]
73 In other words, we need to estimate a permutation of the components of πn in order to be able to interpret E[πn (k)|Y ] as the expected degree of membership of protein n in function k of Table 2—rather than simply the expected degree of membership of protein n in block k, out of 15. [sent-613, score-1.25]
74 We predicted membership of the 87 proteins by thresholding their mixed membership representations: b̂n (k) = 1 if πn (k) > τ, and b̂n (k) = 0 otherwise, where τ is the 95th percentile of the ensemble of elements of π1:87 , corresponding to the 87 proteins in the training set. [sent-616, score-1.265]
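A direct sketch of this prediction rule, with pi_hat standing for the estimated membership vectors of the 87 training proteins (the variable names are illustrative):

```python
import numpy as np

def predict_annotations(pi_hat, percentile=95):
    """Binarize membership vectors into predicted functional annotations.

    The threshold tau is the given percentile of all entries of pi_hat,
    following the rule stated in the text.
    """
    tau = np.percentile(pi_hat, percentile)
    return (pi_hat > tau).astype(int)   # b_hat[n, k] = 1 if pi_hat[n, k] > tau
```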
75 We used this mapping to compare predicted versus known functional annotations for all proteins; in Figure 11 we plot marginal frequencies of proteins’ membership to true functions (left panel) and to predicted functions (right panel). [sent-618, score-0.624]
76 Figure 12 shows predicted mixed memberships (dashed, red lines) versus the true annotations (solid, black lines), given the estimated mapping of blocks to functions, for six example proteins. [sent-621, score-0.442]
77 These computations lead to two estimated protein interaction networks with expected probabilities of interactions taking values in [0, 1]. [sent-681, score-0.46]
78 We use the independent set of functional annotations from the gene ontology to decide which interactions are functionally meaningful; namely those between pairs of proteins that share at least one functional annotation (Myers et al. [sent-685, score-0.616]
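A minimal sketch of this criterion, where go_terms is an assumed mapping from each protein to its set of gene ontology annotations:

```python
def is_functionally_meaningful(p, q, go_terms):
    """An interaction (p, q) counts as functionally meaningful when the two
    proteins share at least one functional annotation."""
    return bool(go_terms.get(p, set()) & go_terms.get(q, set()))

def functional_precision(predicted_pairs, go_terms):
    """Fraction of predicted interactions that are functionally meaningful."""
    if not predicted_pairs:
        return 0.0
    hits = sum(is_functionally_meaningful(p, q, go_terms) for p, q in predicted_pairs)
    return hits / len(predicted_pairs)
```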
79 Most importantly, notice that the estimated protein interaction networks (that is, the x's and crosses) corresponding to lower levels of recall feature a more precise functional content than the original. [sent-693, score-0.387]
80 5 In this application, the MMB learned information about (i) the mixed membership of objects to latent groups, and (ii) the connectivity patterns among latent groups. [sent-700, score-1.056]
81 In the latent space models, the latent vectors are drawn from Gaussian distributions and the interaction data is drawn from a Gaussian with mean πp⊤ I πq . [sent-717, score-0.544]
82 In the MMB, the marginal probability of an interaction takes a similar form, πp⊤ B πq , where B is the matrix of probabilities of interactions for each pair of latent groups. [sent-718, score-0.464]
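For side-by-side reference, the two marginal forms just mentioned can be written as follows; the Gaussian noise variance in the latent space model is an assumed placeholder, since only the mean is quoted above.

```latex
\text{latent space model:}\quad Y(p,q) \sim \mathcal{N}\!\left(\pi_p^{\top} I\, \pi_q,\ \sigma^2\right),
\qquad
\text{MMB:}\quad \Pr\!\left(Y(p,q)=1 \mid \pi_p, \pi_q, B\right) = \pi_p^{\top} B\, \pi_q .
```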
83 (It would be interesting to develop a variational algorithm for the latent space models as well. [sent-724, score-0.443]
84 This is not the case, however, as the blockmodel B captures global/asymmetric relations, while the mixed membership vectors π1:N capture local/symmetric relations. [sent-741, score-0.798]
85 A recurring question, which bears relevance to mixed membership models in general, is why we do not integrate out the single membership indicators—(z p→q , z p←q ). [sent-743, score-0.967]
86 3, for example, they encode the interaction-specific memberships of individual proteins to protein complexes. [sent-747, score-0.397]
87 In the relational setting, cross-validation is feasible if the blockmodel estimated on training data can be expected to hold on test data; for this to happen the network must be of reasonable size, so that we can expect members of each block to be in both training and test sets. [sent-748, score-0.428]
88 In this setting, scheduling of variational updates is important; nested variational scheduling leads to efficient and parallelizable inference. [sent-749, score-0.522]
89 From a data analysis perspective, we speculate that the value of MMB in capturing substantive information about a problem will increase in a semi-supervised setting—where, for example, information about the membership of genes to functional contexts is included in the form of prior distributions. [sent-753, score-0.483]
90 To maintain mixed membership of nodes to groups/blocks in such a setting, we need to sample them from a hierarchical Dirichlet process (Teh et al. [sent-757, score-0.627]
91 Conclusions In this paper we introduced mixed membership stochastic blockmodels, a novel class of latent variable models for relational data. [sent-769, score-0.957]
92 The nested variational inference algorithm is parallelizable and allows fast approximate inference on large graphs. [sent-771, score-0.383]
93 General Model Formulation In general, mixed membership stochastic blockmodels can be specified in terms of assumptions at four levels: population, node, latent variable, and sampling scheme level. [sent-778, score-0.913]
94 2 Node Level The components of the membership vector πp = [πp(1), . . . , πp(K)] encode the mixed membership of the p-th node to the various sub-populations. [sent-790, score-0.385] [sent-793, score-0.582]
96 3 Latent Variable Level Assume that the mixed membership vectors π1:N are realizations of a latent variable with distribution Dα , with parameter vector α. [sent-797, score-0.8]
97 This latter estimator attributes all the information in the non-interactions to the point mass, that is, to latent sources other than the block model B or the mixed membership vectors π1:N . [sent-883, score-0.852]
98 Bayesian mixed membership models for soft clustering and classification. [sent-1077, score-0.61]
99 Estimation and prediction for stochastic blockmodels for graphs with latent block structure. [sent-1362, score-0.383]
100 A collapsed variational Bayesian inference algorithm for latent Dirichlet allocation. [sent-1388, score-0.571]
wordName wordTfidf (topN-words)
[('membership', 0.385), ('mmb', 0.286), ('variational', 0.225), ('latent', 0.218), ('blockmodel', 0.216), ('protein', 0.214), ('mixed', 0.197), ('mips', 0.147), ('interactions', 0.138), ('social', 0.131), ('ienberg', 0.131), ('iroldi', 0.131), ('lei', 0.131), ('proteins', 0.13), ('relational', 0.129), ('students', 0.125), ('embership', 0.124), ('lockmodels', 0.124), ('monk', 0.116), ('ym', 0.11), ('interaction', 0.108), ('eq', 0.105), ('annotations', 0.098), ('ixed', 0.094), ('tochastic', 0.094), ('dirichlet', 0.085), ('blockmodels', 0.085), ('friendship', 0.085), ('sampson', 0.079), ('clusters', 0.077), ('posterior', 0.077), ('blei', 0.077), ('handcock', 0.077), ('qg', 0.077), ('go', 0.075), ('monks', 0.072), ('nested', 0.072), ('fienberg', 0.07), ('qh', 0.066), ('functional', 0.065), ('airoldi', 0.062), ('grade', 0.062), ('collections', 0.059), ('relations', 0.056), ('blocks', 0.056), ('doreian', 0.054), ('monastery', 0.054), ('sociometric', 0.054), ('health', 0.054), ('ing', 0.053), ('memberships', 0.053), ('block', 0.052), ('zm', 0.049), ('zs', 0.048), ('annotated', 0.048), ('grades', 0.046), ('nodes', 0.045), ('inference', 0.043), ('collection', 0.043), ('em', 0.042), ('log', 0.042), ('ontology', 0.041), ('panel', 0.04), ('annotation', 0.04), ('functionally', 0.039), ('hy', 0.039), ('wo', 0.039), ('kemp', 0.039), ('connectivity', 0.038), ('measurements', 0.038), ('predicted', 0.038), ('roles', 0.035), ('na', 0.034), ('interpretable', 0.033), ('substantive', 0.033), ('ths', 0.033), ('inform', 0.032), ('rid', 0.032), ('network', 0.031), ('cellular', 0.031), ('biosynthesis', 0.031), ('breiger', 0.031), ('factions', 0.031), ('hoff', 0.031), ('loyal', 0.031), ('lscm', 0.031), ('outcasts', 0.031), ('turks', 0.031), ('groups', 0.031), ('young', 0.03), ('mcmc', 0.03), ('transcription', 0.029), ('isolating', 0.029), ('grif', 0.029), ('clustering', 0.028), ('categories', 0.028), ('likelihood', 0.028), ('stochastic', 0.028), ('group', 0.028), ('cluster', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999994 61 jmlr-2008-Mixed Membership Stochastic Blockmodels
Author: Edoardo M. Airoldi, David M. Blei, Stephen E. Fienberg, Eric P. Xing
Abstract: Consider data consisting of pairwise measurements, such as presence or absence of links between pairs of objects. These data arise, for instance, in the analysis of protein interactions and gene regulatory networks, collections of author-recipient email, and social networks. Analyzing pairwise measurements with probabilistic models requires special assumptions, since the usual independence or exchangeability assumptions no longer hold. Here we introduce a class of variance allocation models for pairwise measurements: mixed membership stochastic blockmodels. These models combine global parameters that instantiate dense patches of connectivity (blockmodel) with local parameters that instantiate node-specific variability in the connections (mixed membership). We develop a general variational inference algorithm for fast approximate posterior inference. We demonstrate the advantages of mixed membership stochastic blockmodels with applications to social networks and protein interaction networks. Keywords: hierarchical Bayes, latent variables, mean-field approximation, statistical network analysis, social networks, protein interaction networks
2 0.081841528 18 jmlr-2008-Bayesian Inference and Optimal Design for the Sparse Linear Model
Author: Matthias W. Seeger
Abstract: The linear model with sparsity-favouring prior on the coefficients has important applications in many different domains. In machine learning, most methods to date search for maximum a posteriori sparse solutions and neglect to represent posterior uncertainties. In this paper, we address problems of Bayesian optimal design (or experiment planning), for which accurate estimates of uncertainty are essential. To this end, we employ expectation propagation approximate inference for the linear model with Laplace prior, giving new insight into numerical stability properties and proposing a robust algorithm. We also show how to estimate model hyperparameters by empirical Bayesian maximisation of the marginal likelihood, and propose ideas in order to scale up the method to very large underdetermined problems. We demonstrate the versatility of our framework on the application of gene regulatory network identification from micro-array expression data, where both the Laplace prior and the active experimental design approach are shown to result in significant improvements. We also address the problem of sparse coding of natural images, and show how our framework can be used for compressive sensing tasks. Part of this work appeared in Seeger et al. (2007b). The gene network identification application appears in Steinke et al. (2007). Keywords: sparse linear model, Laplace prior, expectation propagation, approximate inference, optimal design, Bayesian statistics, gene network recovery, image coding, compressive sensing
3 0.07196033 16 jmlr-2008-Approximations for Binary Gaussian Process Classification
Author: Hannes Nickisch, Carl Edward Rasmussen
Abstract: We provide a comprehensive overview of many recent algorithms for approximate inference in Gaussian process models for probabilistic binary classification. The relationships between several approaches are elucidated theoretically, and the properties of the different algorithms are corroborated by experimental results. We examine both 1) the quality of the predictive distributions and 2) the suitability of the different marginal likelihood approximations for model selection (selecting hyperparameters) and compare to a gold standard based on MCMC. Interestingly, some methods produce good predictive distributions although their marginal likelihood approximations are poor. Strong conclusions are drawn about the methods: The Expectation Propagation algorithm is almost always the method of choice unless the computational budget is very tight. We also extend existing methods in various ways, and provide unifying code implementing all approaches. Keywords: Gaussian process priors, probabilistic classification, Laplaces’s approximation, expectation propagation, variational bounding, mean field methods, marginal likelihood evidence, MCMC
4 0.050682332 29 jmlr-2008-Cross-Validation Optimization for Large Scale Structured Classification Kernel Methods
Author: Matthias W. Seeger
Abstract: We propose a highly efficient framework for penalized likelihood kernel methods applied to multiclass models with a large, structured set of classes. As opposed to many previous approaches which try to decompose the fitting problem into many smaller ones, we focus on a Newton optimization of the complete model, making use of model structure and linear conjugate gradients in order to approximate Newton search directions. Crucially, our learning method is based entirely on matrix-vector multiplication primitives with the kernel matrices and their derivatives, allowing straightforward specialization to new kernels, and focusing code optimization efforts to these primitives only. Kernel parameters are learned automatically, by maximizing the cross-validation log likelihood in a gradient-based way, and predictive probabilities are estimated. We demonstrate our approach on large scale text classification tasks with hierarchical structure on thousands of classes, achieving state-of-the-art results in an order of magnitude less time than previous work. Parts of this work appeared in the conference paper Seeger (2007). Keywords: multi-way classification, kernel logistic regression, hierarchical classification, cross validation optimization, Newton-Raphson optimization
5 0.04847889 31 jmlr-2008-Dynamic Hierarchical Markov Random Fields for Integrated Web Data Extraction
Author: Jun Zhu, Zaiqing Nie, Bo Zhang, Ji-Rong Wen
Abstract: Existing template-independent web data extraction approaches adopt highly ineffective decoupled strategies—attempting to do data record detection and attribute labeling in two separate phases. In this paper, we propose an integrated web data extraction paradigm with hierarchical models. The proposed model is called Dynamic Hierarchical Markov Random Fields (DHMRFs). DHMRFs take structural uncertainty into consideration and define a joint distribution of both model structure and class labels. The joint distribution is an exponential family distribution. As a conditional model, DHMRFs relax the independence assumption as made in directed models. Since exact inference is intractable, a variational method is developed to learn the model’s parameters and to find the MAP model structure and label assignments. We apply DHMRFs to a real-world web data extraction task. Experimental results show that: (1) integrated web data extraction models can achieve significant improvements on both record detection and attribute labeling compared to decoupled models; (2) in diverse web data extraction DHMRFs can potentially address the blocky artifact issue which is suffered by fixed-structured hierarchical models. Keywords: conditional random fields, dynamic hierarchical Markov random fields, integrated web data extraction, statistical hierarchical modeling, blocky artifact issue
6 0.047072563 62 jmlr-2008-Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data
8 0.042679992 38 jmlr-2008-Generalization from Observed to Unobserved Features by Clustering
9 0.039825827 88 jmlr-2008-Structural Learning of Chain Graphs via Decomposition
10 0.03896524 58 jmlr-2008-Max-margin Classification of Data with Absent Features
11 0.0375701 12 jmlr-2008-Algorithms for Sparse Linear Classifiers in the Massive Data Setting
12 0.036610398 47 jmlr-2008-Learning Balls of Strings from Edit Corrections
13 0.036515545 59 jmlr-2008-Maximal Causes for Non-linear Component Extraction
14 0.035941146 39 jmlr-2008-Gradient Tree Boosting for Training Conditional Random Fields
15 0.035717189 20 jmlr-2008-Causal Reasoning with Ancestral Graphs (Special Topic on Causality)
16 0.035708711 70 jmlr-2008-On Relevant Dimensions in Kernel Feature Spaces
17 0.035189703 51 jmlr-2008-Learning Similarity with Operator-valued Large-margin Classifiers
18 0.034004852 84 jmlr-2008-Search for Additive Nonlinear Time Series Causal Models
19 0.033946503 17 jmlr-2008-Automatic PCA Dimension Selection for High Dimensional Data and Small Sample Sizes
20 0.033697505 55 jmlr-2008-Linear-Time Computation of Similarity Measures for Sequential Data
topicId topicWeight
[(0, 0.17), (1, 0.012), (2, -0.04), (3, -0.007), (4, -0.07), (5, 0.122), (6, 0.013), (7, -0.048), (8, -0.026), (9, 0.09), (10, -0.104), (11, 0.074), (12, -0.002), (13, -0.045), (14, 0.069), (15, -0.146), (16, -0.032), (17, 0.027), (18, 0.065), (19, -0.171), (20, -0.103), (21, -0.021), (22, 0.102), (23, -0.147), (24, -0.153), (25, -0.163), (26, -0.115), (27, -0.099), (28, -0.072), (29, -0.017), (30, 0.032), (31, -0.133), (32, -0.027), (33, 0.188), (34, -0.007), (35, 0.177), (36, 0.187), (37, -0.073), (38, -0.017), (39, 0.145), (40, 0.131), (41, -0.216), (42, 0.325), (43, -0.186), (44, -0.056), (45, 0.111), (46, 0.124), (47, 0.042), (48, 0.0), (49, 0.122)]
simIndex simValue paperId paperTitle
same-paper 1 0.96586198 61 jmlr-2008-Mixed Membership Stochastic Blockmodels
Author: Edoardo M. Airoldi, David M. Blei, Stephen E. Fienberg, Eric P. Xing
Abstract: Consider data consisting of pairwise measurements, such as presence or absence of links between pairs of objects. These data arise, for instance, in the analysis of protein interactions and gene regulatory networks, collections of author-recipient email, and social networks. Analyzing pairwise measurements with probabilistic models requires special assumptions, since the usual independence or exchangeability assumptions no longer hold. Here we introduce a class of variance allocation models for pairwise measurements: mixed membership stochastic blockmodels. These models combine global parameters that instantiate dense patches of connectivity (blockmodel) with local parameters that instantiate node-specific variability in the connections (mixed membership). We develop a general variational inference algorithm for fast approximate posterior inference. We demonstrate the advantages of mixed membership stochastic blockmodels with applications to social networks and protein interaction networks. Keywords: hierarchical Bayes, latent variables, mean-field approximation, statistical network analysis, social networks, protein interaction networks
2 0.33508071 18 jmlr-2008-Bayesian Inference and Optimal Design for the Sparse Linear Model
Author: Matthias W. Seeger
Abstract: The linear model with sparsity-favouring prior on the coefficients has important applications in many different domains. In machine learning, most methods to date search for maximum a posteriori sparse solutions and neglect to represent posterior uncertainties. In this paper, we address problems of Bayesian optimal design (or experiment planning), for which accurate estimates of uncertainty are essential. To this end, we employ expectation propagation approximate inference for the linear model with Laplace prior, giving new insight into numerical stability properties and proposing a robust algorithm. We also show how to estimate model hyperparameters by empirical Bayesian maximisation of the marginal likelihood, and propose ideas in order to scale up the method to very large underdetermined problems. We demonstrate the versatility of our framework on the application of gene regulatory network identification from micro-array expression data, where both the Laplace prior and the active experimental design approach are shown to result in significant improvements. We also address the problem of sparse coding of natural images, and show how our framework can be used for compressive sensing tasks. Part of this work appeared in Seeger et al. (2007b). The gene network identification application appears in Steinke et al. (2007). Keywords: sparse linear model, Laplace prior, expectation propagation, approximate inference, optimal design, Bayesian statistics, gene network recovery, image coding, compressive sensing
3 0.26927561 38 jmlr-2008-Generalization from Observed to Unobserved Features by Clustering
Author: Eyal Krupka, Naftali Tishby
Abstract: We argue that when objects are characterized by many attributes, clustering them on the basis of a random subset of these attributes can capture information on the unobserved attributes as well. Moreover, we show that under mild technical conditions, clustering the objects on the basis of such a random subset performs almost as well as clustering with the full attribute set. We prove finite sample generalization theorems for this novel learning scheme that extends analogous results from the supervised learning setting. We use our framework to analyze generalization to unobserved features of two well-known clustering algorithms: k-means and the maximum likelihood multinomial mixture model. The scheme is demonstrated for collaborative filtering of users with movie ratings as attributes and document clustering with words as attributes. Keywords: clustering, unobserved features, learning theory, generalization in clustering, information bottleneck
4 0.24310742 16 jmlr-2008-Approximations for Binary Gaussian Process Classification
Author: Hannes Nickisch, Carl Edward Rasmussen
Abstract: We provide a comprehensive overview of many recent algorithms for approximate inference in Gaussian process models for probabilistic binary classification. The relationships between several approaches are elucidated theoretically, and the properties of the different algorithms are corroborated by experimental results. We examine both 1) the quality of the predictive distributions and 2) the suitability of the different marginal likelihood approximations for model selection (selecting hyperparameters) and compare to a gold standard based on MCMC. Interestingly, some methods produce good predictive distributions although their marginal likelihood approximations are poor. Strong conclusions are drawn about the methods: The Expectation Propagation algorithm is almost always the method of choice unless the computational budget is very tight. We also extend existing methods in various ways, and provide unifying code implementing all approaches. Keywords: Gaussian process priors, probabilistic classification, Laplaces’s approximation, expectation propagation, variational bounding, mean field methods, marginal likelihood evidence, MCMC
5 0.24149302 29 jmlr-2008-Cross-Validation Optimization for Large Scale Structured Classification Kernel Methods
Author: Matthias W. Seeger
Abstract: We propose a highly efficient framework for penalized likelihood kernel methods applied to multiclass models with a large, structured set of classes. As opposed to many previous approaches which try to decompose the fitting problem into many smaller ones, we focus on a Newton optimization of the complete model, making use of model structure and linear conjugate gradients in order to approximate Newton search directions. Crucially, our learning method is based entirely on matrix-vector multiplication primitives with the kernel matrices and their derivatives, allowing straightforward specialization to new kernels, and focusing code optimization efforts to these primitives only. Kernel parameters are learned automatically, by maximizing the cross-validation log likelihood in a gradient-based way, and predictive probabilities are estimated. We demonstrate our approach on large scale text classification tasks with hierarchical structure on thousands of classes, achieving state-of-the-art results in an order of magnitude less time than previous work. Parts of this work appeared in the conference paper Seeger (2007). Keywords: multi-way classification, kernel logistic regression, hierarchical classification, cross validation optimization, Newton-Raphson optimization
6 0.22748893 31 jmlr-2008-Dynamic Hierarchical Markov Random Fields for Integrated Web Data Extraction
8 0.21362203 48 jmlr-2008-Learning Bounded Treewidth Bayesian Networks
9 0.21187682 12 jmlr-2008-Algorithms for Sparse Linear Classifiers in the Massive Data Setting
10 0.20767623 47 jmlr-2008-Learning Balls of Strings from Edit Corrections
11 0.17978264 84 jmlr-2008-Search for Additive Nonlinear Time Series Causal Models
12 0.17801873 96 jmlr-2008-Visualizing Data using t-SNE
13 0.17385267 59 jmlr-2008-Maximal Causes for Non-linear Component Extraction
14 0.17382315 7 jmlr-2008-A Tutorial on Conformal Prediction
15 0.16762108 40 jmlr-2008-Graphical Methods for Efficient Likelihood Inference in Gaussian Covariance Models
16 0.15990575 39 jmlr-2008-Gradient Tree Boosting for Training Conditional Random Fields
17 0.15695789 20 jmlr-2008-Causal Reasoning with Ancestral Graphs (Special Topic on Causality)
18 0.14572063 62 jmlr-2008-Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data
19 0.13815133 73 jmlr-2008-On the Suitable Domain for SVM Training in Image Coding
20 0.13130753 32 jmlr-2008-Estimating the Confidence Interval for Prediction Errors of Support Vector Machine Classifiers
topicId topicWeight
[(0, 0.016), (5, 0.021), (9, 0.011), (31, 0.011), (40, 0.032), (54, 0.026), (58, 0.028), (66, 0.064), (76, 0.044), (78, 0.016), (88, 0.047), (92, 0.034), (94, 0.051), (97, 0.479), (99, 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 0.8468653 61 jmlr-2008-Mixed Membership Stochastic Blockmodels
Author: Edoardo M. Airoldi, David M. Blei, Stephen E. Fienberg, Eric P. Xing
Abstract: Consider data consisting of pairwise measurements, such as presence or absence of links between pairs of objects. These data arise, for instance, in the analysis of protein interactions and gene regulatory networks, collections of author-recipient email, and social networks. Analyzing pairwise measurements with probabilistic models requires special assumptions, since the usual independence or exchangeability assumptions no longer hold. Here we introduce a class of variance allocation models for pairwise measurements: mixed membership stochastic blockmodels. These models combine global parameters that instantiate dense patches of connectivity (blockmodel) with local parameters that instantiate node-specific variability in the connections (mixed membership). We develop a general variational inference algorithm for fast approximate posterior inference. We demonstrate the advantages of mixed membership stochastic blockmodels with applications to social networks and protein interaction networks. Keywords: hierarchical Bayes, latent variables, mean-field approximation, statistical network analysis, social networks, protein interaction networks
2 0.23155811 18 jmlr-2008-Bayesian Inference and Optimal Design for the Sparse Linear Model
Author: Matthias W. Seeger
Abstract: The linear model with sparsity-favouring prior on the coefficients has important applications in many different domains. In machine learning, most methods to date search for maximum a posteriori sparse solutions and neglect to represent posterior uncertainties. In this paper, we address problems of Bayesian optimal design (or experiment planning), for which accurate estimates of uncertainty are essential. To this end, we employ expectation propagation approximate inference for the linear model with Laplace prior, giving new insight into numerical stability properties and proposing a robust algorithm. We also show how to estimate model hyperparameters by empirical Bayesian maximisation of the marginal likelihood, and propose ideas in order to scale up the method to very large underdetermined problems. We demonstrate the versatility of our framework on the application of gene regulatory network identification from micro-array expression data, where both the Laplace prior and the active experimental design approach are shown to result in significant improvements. We also address the problem of sparse coding of natural images, and show how our framework can be used for compressive sensing tasks. Part of this work appeared in Seeger et al. (2007b). The gene network identification application appears in Steinke et al. (2007). Keywords: sparse linear model, Laplace prior, expectation propagation, approximate inference, optimal design, Bayesian statistics, gene network recovery, image coding, compressive sensing
3 0.22396168 48 jmlr-2008-Learning Bounded Treewidth Bayesian Networks
Author: Gal Elidan, Stephen Gould
Abstract: With the increased availability of data for complex domains, it is desirable to learn Bayesian network structures that are sufficiently expressive for generalization while at the same time allow for tractable inference. While the method of thin junction trees can, in principle, be used for this purpose, its fully greedy nature makes it prone to overfitting, particularly when data is scarce. In this work we present a novel method for learning Bayesian networks of bounded treewidth that employs global structure modifications and that is polynomial both in the size of the graph and the treewidth bound. At the heart of our method is a dynamic triangulation that we update in a way that facilitates the addition of chain structures that increase the bound on the model’s treewidth by at most one. We demonstrate the effectiveness of our “treewidth-friendly” method on several real-life data sets and show that it is superior to the greedy approach as soon as the bound on the treewidth is nontrivial. Importantly, we also show that by making use of global operators, we are able to achieve better generalization even when learning Bayesian networks of unbounded treewidth. Keywords: Bayesian networks, structure learning, model selection, bounded treewidth
4 0.22369376 58 jmlr-2008-Max-margin Classification of Data with Absent Features
Author: Gal Chechik, Geremy Heitz, Gal Elidan, Pieter Abbeel, Daphne Koller
Abstract: We consider the problem of learning classifiers in structured domains, where some objects have a subset of features that are inherently absent due to complex relationships between the features. Unlike the case where a feature exists but its value is not observed, here we focus on the case where a feature may not even exist (structurally absent) for some of the samples. The common approach for handling missing features in discriminative models is to first complete their unknown values, and then use a standard classification procedure over the completed data. This paper focuses on features that are known to be non-existing, rather than have an unknown value. We show how incomplete data can be classified directly without any completion of the missing features using a max-margin learning framework. We formulate an objective function, based on the geometric interpretation of the margin, that aims to maximize the margin of each sample in its own relevant subspace. In this formulation, the linearly separable case can be transformed into a binary search over a series of second order cone programs (SOCP), a convex problem that can be solved efficiently. We also describe two approaches for optimizing the general case: an approximation that can be solved as a standard quadratic program (QP) and an iterative approach for solving the exact problem. By avoiding the pre-processing phase in which the data is completed, both of these approaches could offer considerable computational savings. More importantly, we show that the elegant handling of missing values by our approach allows it to both outperform other methods when the missing values have non-trivial structure, and be competitive with other methods when the values are missing at random. We demonstrate our results on several standard benchmarks and two real-world problems: edge prediction in metabolic pathways, and automobile detection in natural images. Keywords: max margin, missing features, network reconstruction, metabolic pa
5 0.22254515 56 jmlr-2008-Magic Moments for Structured Output Prediction
Author: Elisa Ricci, Tijl De Bie, Nello Cristianini
Abstract: Most approaches to structured output prediction rely on a hypothesis space of prediction functions that compute their output by maximizing a linear scoring function. In this paper we present two novel learning algorithms for this hypothesis class, and a statistical analysis of their performance. The methods rely on efficiently computing the first two moments of the scoring function over the output space, and using them to create convex objective functions for training. We report extensive experimental results for sequence alignment, named entity recognition, and RNA secondary structure prediction. Keywords: structured output prediction, discriminative learning, Z-score, discriminant analysis, PAC bound
6 0.22083241 83 jmlr-2008-Robust Submodular Observation Selection
7 0.22052945 34 jmlr-2008-Exponentiated Gradient Algorithms for Conditional Random Fields and Max-Margin Markov Networks
8 0.2198685 57 jmlr-2008-Manifold Learning: The Price of Normalization
9 0.21902169 86 jmlr-2008-SimpleMKL
10 0.21816386 74 jmlr-2008-Online Learning of Complex Prediction Problems Using Simultaneous Projections
12 0.2169627 66 jmlr-2008-Multi-class Discriminant Kernel Learning via Convex Programming (Special Topic on Model Selection)
13 0.21647449 88 jmlr-2008-Structural Learning of Chain Graphs via Decomposition
14 0.21565388 39 jmlr-2008-Gradient Tree Boosting for Training Conditional Random Fields
15 0.21551108 16 jmlr-2008-Approximations for Binary Gaussian Process Classification
16 0.21542312 72 jmlr-2008-On the Size and Recovery of Submatrices of Ones in a Random Binary Matrix
17 0.21434398 9 jmlr-2008-Active Learning by Spherical Subdivision
18 0.2142933 89 jmlr-2008-Support Vector Machinery for Infinite Ensemble Learning
19 0.21268541 36 jmlr-2008-Finite-Time Bounds for Fitted Value Iteration
20 0.21234795 27 jmlr-2008-Consistency of the Group Lasso and Multiple Kernel Learning