jmlr jmlr2010 jmlr2010-24 knowledge-graph by maker-knowledge-mining

24 jmlr-2010-Collective Inference for Extraction MRFs Coupled with Symmetric Clique Potentials


Source: pdf

Author: Rahul Gupta, Sunita Sarawagi, Ajit A. Diwan

Abstract: Many structured information extraction tasks employ collective graphical models that capture inter-instance associativity by coupling them with various clique potentials. We propose tractable families of such potentials that are invariant under permutations of their arguments, and call them symmetric clique potentials. We present three families of symmetric potentials—MAX, SUM, and MAJORITY. We propose cluster message passing for collective inference with symmetric clique potentials, and present message computation algorithms tailored to such potentials. Our first message computation algorithm, called α-pass, is sub-quadratic in the clique size, outputs exact messages for MAX, and computes 13/15-approximate messages for Potts, a popular member of the SUM family. Empirically, it is up to two orders of magnitude faster than existing algorithms based on graph-cuts or belief propagation. Our second algorithm, based on Lagrangian relaxation, operates on MAJORITY potentials and provides close-to-exact solutions while being two orders of magnitude faster. We show that the cluster message passing framework is more principled and accurate, and converges faster, than competing approaches. We extend our collective inference framework to exploit associativity of more general intra-domain properties of instance labelings, which opens up interesting applications in domain adaptation. Our approach leads to significant error reduction on unseen domains without incurring any overhead of model retraining. Keywords: graphical models, collective inference, clique potentials, cluster graphs, message passing

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Department of Computer Science and Engineering, IIT Bombay, Powai, Mumbai 400076, India. Editor: Ben Taskar. Abstract: Many structured information extraction tasks employ collective graphical models that capture inter-instance associativity by coupling them with various clique potentials. [sent-10, score-1.001]

2 We propose tractable families of such potentials that are invariant under permutations of their arguments, and call them symmetric clique potentials. [sent-11, score-0.944]

3 We propose cluster message passing for collective inference with symmetric clique potentials, and present message computation algorithms tailored to such potentials. [sent-13, score-1.45]

4 Our first message computation algorithm, called α-pass, is sub-quadratic in the clique size, outputs exact messages for MAX, and computes 13/15-approximate messages for Potts, a popular member of the SUM family. [sent-14, score-0.808]

5 Keywords: graphical models, collective inference, clique potentials, cluster graphs, message passing. [sent-20, score-1.171]

6 The second category defines an associative potential for each clique that is created from all repetitions of a unigram and the potentials can take more general forms like the Majority function (Krishnan and Manning, 2006). [sent-52, score-1.112]

7 This paper unifies various collective extraction tasks with different forms of associative potentials under a single framework. [sent-55, score-0.897]

8 The cluster graph comprises one cluster per MRF instance, and one cluster per clique with its corresponding associative potential. [sent-58, score-0.87]

9 As in the traditional collective extraction models, we assume that a clique comprises all the occurrences of a particular token. [sent-59, score-0.933]

10 First, it allows us to plug in and study various clique potentials cleanly under the same cluster message passing umbrella. [sent-63, score-1.145]

11 Specifically, we show that most associative potentials used in practice are symmetric clique potentials. [sent-65, score-1.06]

12 Symmetric clique potentials are invariant under any permutation of their arguments. [sent-66, score-0.873]

13 For example in Figure 1(d), the clique potential on ‘Iraq’ would give a low score if its end vertices had different labels, regardless of which ‘Iraq’ vertex gets what label. [sent-68, score-0.985]

14 We present three families of symmetric clique potentials that capture label associativity in different ways—MAX, SUM (which subsumes the popular Potts potential), and MAJORITY. [sent-69, score-1.127]
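
To make the three families concrete, here is a minimal Python sketch that scores a clique labeling purely from its label-count histogram. The per-label choices are illustrative assumptions rather than the paper's exact definitions: identity counts for MAX, the Potts form λ·ny² for SUM, and a weight matrix W applied given the majority label for MAJORITY.

```python
# Illustrative sketch of the three symmetric clique potential families,
# each a function of the label-count histogram only (hence symmetric).
from collections import Counter

def histogram(labels):
    """Label-count histogram (n_1, ..., n_m) of a clique labeling."""
    return Counter(labels)

def max_potential(labels):
    """MAX family with f_y(n_y) = n_y: the size of the largest label block."""
    return max(histogram(labels).values())

def potts_potential(labels, lam=1.0):
    """SUM family, Potts member: lambda * sum_y n_y^2."""
    return lam * sum(n * n for n in histogram(labels).values())

def majority_potential(labels, W):
    """MAJORITY family (one common form, assumed here): sum_y W[a][y] * n_y,
    where a is the majority label of the clique."""
    counts = histogram(labels)
    a = max(counts, key=counts.get)
    return sum(W[a][y] * n_y for y, n_y in counts.items())
```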

15 The most crucial component of the collective inference problem is then to efficiently compute outgoing messages from clique clusters governed by symmetric potentials. [sent-70, score-1.109]

16 We present a suite of efficient combinatorial algorithms tailored to specific symmetric clique potentials for the clique inference sub-problem. [sent-75, score-1.541]

17 We devised a combinatorial algorithm called α-pass that computes exact outgoing messages for cliques with MAX potentials, and also for any arbitrary symmetric clique potential over two labels. [sent-76, score-0.773]

18 We show that α-pass provides a 13/15-approximation for clique inference with the well-known and NP-hard Potts potential from the SUM family. [sent-77, score-0.679]

19 We show that this analysis is tight, and that the corresponding clique inference bounds of alternative schemes like α-expansion, LP-rounding, TRW-S and ICM are either 1/2 or worse. [sent-78, score-0.618]

20 Alternative clique inference approaches such as the graph-cut algorithm of Boykov et al. [sent-82, score-0.618]

21 We also show that decomposing the problem over the cluster graph and employing fast message computation algorithms helps our collective inference scheme converge an order of magnitude faster than alternatives like loopy belief propagation. [sent-87, score-0.675]

22 We then extend our collective inference framework to capture associativity of a more general kind than just labels of unigram repetitions. [sent-89, score-0.634]

23 We use this collective signal to couple together the labelings of intra-domain records, and show how our collective inference framework naturally extends to this scenario and provides significant reduction in error. [sent-95, score-0.897]

24 Figure 1(e) shows the cluster graph for the collective model of (d) with one cluster per sentence (shown as flat chains), and one cluster per associative clique (shown as big circles). [sent-105, score-1.246]

25 This paper presents a detailed treatment of the algorithm and its generalized version, with rigorous proofs of their approximation bounds, including some new bounds for a variant of the clique inference objective with Potts potential. [sent-108, score-0.618]

26 , 2007), there has been an abundance of research on higher order clique potentials in the last few years, primarily by the computer vision community. [sent-111, score-0.873]

27 , 2007a; Komodakis and Paragios, 2009; Kumar and Torr, 2008a; Werner, 2008), reduction of clique potentials to pairwise potentials (Kohli et al. [sent-113, score-1.223]

28 , 2009; Ishikawa, 2009), and special families of clique potentials (Potetz and Lee, 2008; Rother et al. [sent-114, score-0.894]

29 In Section 3, we discuss the cluster message passing algorithm for collective inference, and introduce the clique inference sub-problem which formalizes the task of computing outbound messages from clique clusters. [sent-121, score-1.854]

30 In Section 5 we present the α-pass and LR clique inference algorithms, and analyze their approximation quality. [sent-123, score-0.618]

31 Section 7 contains experimental results of three types: (a) quality of the α-pass and LR algorithms; (b) effect of modeling more general associative properties in a domain adaptation task; and (c) collective inference via cluster message passing vs alternatives. [sent-125, score-0.88]

32 We express the conformance among the labels assigned to positions in D(t) with a higher order clique potential Ct({y_i^k : (k, i) ∈ D(t)}). [sent-147, score-0.728]

33 The clique potential Ct has two important properties: First, it is invariant under any permutation of its arguments—that is, it is a symmetric function of its input. [sent-149, score-0.634]

34 The earliest and the most popular collective model used associative potentials over every pair of occurrences of t (Sutton and McCallum, 2004; Finkel et al. [sent-154, score-0.863]

35 Other known collective models like the ones proposed by Lu and Getoor (2003) and Krishnan and Manning (2006) can also be cast using the ‘Majority’ symmetric clique potential as we shall see in later sections. [sent-161, score-1.01]

36 Having defined our collective model, and shown that it subsumes popular collective models, we are ready to define the collective inference objective. [sent-162, score-1.223]

37 In general terms, collective inference corresponds to labeling the nodes of a generic cyclic graph. [sent-170, score-0.609]

38 However, our graph has a special structure—it is composed of chains and cliques, and although the intra-chain edge potentials φ can be arbitrary, the clique potentials are associative and symmetric. [sent-171, score-1.418]

39 All three clique potentials are Ct ({y, y′ }) = δ(y = y′ ). [sent-178, score-0.873]

40 Thus, the total contribution from clique potentials is E + k. [sent-184, score-0.873]

41 This involves passing messages along edges inside the chains, as well as along the clique edges. [sent-192, score-0.68]

42 First, some symmetric potentials like the Majority potential cannot be decomposed along the clique edges, so we cannot compute the corresponding messages. [sent-194, score-0.984]

43 Hence we adopt message passing on a special cluster graph where every chain and clique corresponds to a cluster. [sent-197, score-0.82]

44 The clique cluster for a unigram is adjacent to all the chains that contain that unigram, as shown via singleton separator vertices in Figure 1(e). [sent-198, score-0.796]

45 This setup of message passing on the cluster graph allows us to exploit potential-specific algorithms at the cliques, and at the same time work with any arbitrary symmetric clique potentials. [sent-199, score-0.845]

46 Let mk→t and mt→k denote message vectors from chain k to an incident clique t and vice-versa. [sent-203, score-0.709]

47 A clique t ′ is incident to chain k via only a singleton vertex so we can easily absorb the message mt ′ →k by including it in the node potential of this vertex. [sent-210, score-1.006]
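
As a small illustration of this absorption step, assuming the chain's node potentials are stored as one per-label score vector per position (names and representation are hypothetical):

```python
# Hypothetical sketch: absorb an incoming clique-to-chain message m_{t'->k}
# into the node potential of the singleton separator vertex it touches.
def absorb_clique_message(node_potentials, vertex_index, message):
    """node_potentials: list of per-label score lists; message: per-label scores."""
    for y, m_y in enumerate(message):
        node_potentials[vertex_index][y] += m_y
    return node_potentials
```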

48 (4) The maximization term is an instance of the general clique inference problem defined as: Definition 3 (Clique Inference) Given a clique over n vertices, with a symmetric clique potential C(y1, ..., yn) [sent-235, score-1.775]

49 and vertex potentials ψ_jy for all j ≤ n and labels y. [sent-238, score-0.674]

50 Compute a labeling of the clique vertices that maximizes: max_{y1,...,yn} ∑_{j=1}^{n} ψ_{j,yj} + C(y1, ..., yn). (5) [sent-239, score-0.826]

51 Thus the second term in Equation 4 can be seen as clique inference by defining ψ_{jy} = m_{j→t}(y) and C = Ct. [sent-246, score-0.648]
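
The objective in Equation 5 is easy to check against a brute-force reference. The sketch below enumerates all m^n labelings, so it is only a correctness baseline for the faster algorithms; the function and argument names are assumptions, not the paper's code.

```python
# Brute-force reference for the clique inference objective of Equation 5:
# maximize sum_j psi[j][y_j] + C(y_1, ..., y_n) over all m^n labelings.
from itertools import product

def brute_force_clique_inference(psi, clique_pot):
    """psi: n x m vertex scores; clique_pot: callable taking a labeling tuple."""
    n, m = len(psi), len(psi[0])
    best_score, best_labeling = float("-inf"), None
    for labeling in product(range(m), repeat=n):
        score = sum(psi[j][labeling[j]] for j in range(n)) + clique_pot(labeling)
        if score > best_score:
            best_score, best_labeling = score, labeling
    return best_score, best_labeling
```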

52 To compute mt→k (y), we can solve the clique inference problem with the easily enforceable constraint yk = y. [sent-247, score-0.733]
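
One simple way to enforce the constraint yk = y mentioned here is to clamp the vertex scores before running any clique inference routine; a hedged sketch (illustrative names, not the paper's implementation):

```python
# Clamp vertex k to label y by giving every other label a -infinity vertex
# score, so any clique inference routine run on the clamped scores respects
# the constraint y_k = y needed for the outgoing message m_{t->k}(y).
def clamp_vertex(psi, k, y):
    clamped = [row[:] for row in psi]  # copy so the original scores survive
    for label in range(len(clamped[k])):
        if label != y:
            clamped[k][label] = float("-inf")
    return clamped
```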

53 From now on, we refer to outbound message computation at the cliques as clique inference. [sent-248, score-0.71]

54 Symmetric Clique Potentials. Having established cluster message passing as our collective inference paradigm, and clique inference as our message computation tool, we turn our attention towards various families of symmetric clique potentials. [sent-250, score-2.089]

55 As seen in Section 2, these associative clique potentials depend only on the histogram of label counts {ny | y = 1, ..., m}. [sent-251, score-1.188]

56 The histogram is taken over the clique, ny being the number of clique vertices labeled y. [sent-254, score-0.722]

57 Here we clarify that the vector (n1, ..., nm) denotes the entire count histogram, ny denotes the count for label y, while n denotes the clique size. [sent-262, score-0.813]

58 An associative symmetric clique potential is thus maximized when ny = n for some y, that is, one label is given to all the clique vertices. [sent-263, score-1.472]

59 In Section 5 we will look at various potential-specific exact and approximate clique inference algorithms that exploit the specific structure of the potential at a clique. [sent-265, score-0.7]

60 In particular, we consider the three types of symmetric clique potentials listed in Table 1. [sent-266, score-0.923]

61 MAX Clique Potentials. These clique potentials are of the form C(n1, ..., nm) = maxy fy(ny).

62 Table 1: Three kinds of symmetric clique potentials considered in this paper. [sent-277, score-0.923]

63 (n1, ..., nm) denotes the counts of the various labels among the clique vertices. [sent-281, score-0.65]

64 When fy(ny) = ny, we get the makespan clique potential, which has roots in the job-scheduling literature. [sent-283, score-0.73]

65 In Section 5.1, we present the α-pass algorithm that computes messages for MAX clique potentials exactly in O(mn log n) time. [sent-288, score-0.938]

66 SUM Clique Potentials. SUM clique potentials are of the form C(n1, ..., nm) = ∑y fy(ny).

67 This family of potentials includes functions that aggregate the histogram skew over bins, for example, the entropy potential where fy(ny) ∝ ny log ny. [sent-295, score-0.653]

68 The summation of these terms over a clique is equivalent (up to a constant) to the clique potential CPotts(n1, ..., nm) = λ ∑y ny².
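
This equivalence is easy to sanity-check numerically: over a complete clique, the number of same-labeled pairs equals (∑y ny² − n)/2, so the pairwise-Potts sum and the histogram form differ only by scale and an additive constant. A small illustrative check:

```python
# Numeric sanity check (illustrative): the sum of pairwise agreements over a
# complete clique equals (sum_y n_y^2 - n) / 2, matching the SUM-family Potts
# potential up to scale and an additive constant.
import random
from collections import Counter
from itertools import combinations

labels = [random.randrange(4) for _ in range(10)]
pairwise = sum(labels[i] == labels[j]
               for i, j in combinations(range(len(labels)), 2))
counts = Counter(labels)
histogram_form = (sum(n * n for n in counts.values()) - len(labels)) // 2
assert pairwise == histogram_form
```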

69 We will show that the clique inference problem is NP-hard with the above potential and provide a 13/15-approximation in Section 5. [sent-303, score-0.679]

70 We write the clique inference objective in Equation 5 as a function of the labeling (y1, ..., yn). [sent-338, score-0.657]

71 The first term is the vertex score (i.e., node potential) of the clique labeling y and the second term is the clique score (i.e., the clique potential). [sent-344, score-1.248]

72 The MAP clique labeling will be denoted by y∗, and ŷ will denote a possibly sub-optimal labeling. [sent-349, score-0.661]

73 We show that clique inference is easy for any C() with just two labels (Section 5). [sent-350, score-0.672]

74 We address the SUM and MAJORITY clique potentials respectively in Sections 5. [sent-353, score-0.873]

75 In Section 5.4 we show how to extend the clique inference algorithms to efficiently batch the computation of multiple max-marginals. [sent-357, score-0.618]

76 Let ŷαk denote the clique labeling thus obtained in the (α, k)th combination. [sent-369, score-0.661]

77 For clique potentials that are decomposable over the edges, as in Potts, this runtime is much better than ordinary belief propagation, which would cost O(m²n²). [sent-375, score-0.873]
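
Reading the (α, k) sweep described above literally gives the following sketch: for each label α, vertices are ranked by how much they prefer α over their best alternative; for each k, the top k get α, the rest get their best non-α label, and the vertex score plus the clique potential of the resulting labeling is recorded. This is an illustrative, unoptimized rendering (the inner loop can be made incremental to reach the stated O(mn log n) for MAX), not the authors' implementation.

```python
# Illustrative, unoptimized rendering of the alpha-pass (alpha, k) sweep:
# exactly k vertices receive label alpha (the ones that prefer it most), every
# other vertex receives its best non-alpha label, and the best combination wins.
def alpha_pass(psi, clique_pot):
    """psi: n x m vertex scores; clique_pot: callable taking a labeling list."""
    n, m = len(psi), len(psi[0])
    best_score, best_labeling = float("-inf"), None
    for alpha in range(m):
        # Best alternative (score, label) for each vertex if it is not given alpha.
        alt = [max((psi[j][y], y) for y in range(m) if y != alpha) for j in range(n)]
        # Rank vertices by the gain of switching them from that alternative to alpha.
        order = sorted(range(n), key=lambda j: psi[j][alpha] - alt[j][0], reverse=True)
        for k in range(n + 1):
            labeling = [alt[j][1] for j in range(n)]
            for j in order[:k]:
                labeling[j] = alpha
            score = sum(psi[j][labeling[j]] for j in range(n)) + clique_pot(labeling)
            if score > best_score:
                best_score, best_labeling = score, labeling
    return best_score, best_labeling
```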

78 Theorem 4 The α-pass algorithm solves the clique inference problem exactly for MAX clique potentials in O(mn log n) time. [sent-390, score-0.618]

79 Proof Let y∗ be the true MAP and let β = argmaxy fy(ny(y∗)), ℓ = nβ(y∗). [sent-391, score-0.652]

80 Although we had initially designed the α-pass algorithm for MAX potentials, a similar argument can be used to show that α-pass performs exact clique inference for any arbitrary symmetric potential when we have two labels. [sent-396, score-0.75]

81 We also note that in some real-life tasks the vertex terms tend to heavily dominate the clique term. [sent-403, score-0.724]

82 Clique Inference for SUM Potentials. We focus on the Potts potential, the most popular member of the SUM family, given by CPotts(y) = λ ∑y ny², and the clique inference objective is: max_{y1,...,yn} ∑_{j} ψ_{j,yj} + λ ∑y ny².

83 Theorem 5 When C(y) = λ ∑y ny², λ > 0, clique inference is NP-hard. [sent-421, score-0.618]

84 We create an instance of clique inference as follows. [sent-424, score-0.618]

85 The MAP score in the constructed clique inference instance will be (2n² + (3/2)n)λ iff we can find an exact cover. [sent-428, score-0.703]

86 We now state the more involved proof for showing that α-pass actually provides a tight approximation bound of 13/15 for clique inference with Potts when the vertex scores are non-negative. [sent-447, score-0.85]

87 Generalized α-pass Algorithm. In α-pass, for each label α, we go over each count k and find the best vertex score with k vertices assigned label α. [sent-454, score-0.716]

88 We explore the behavior of this algorithm for clique inference with Potts potentials. [sent-476, score-0.618]

89 Comparison with Existing Approximation Bounds. As mentioned earlier, the CPotts clique potential is equivalent to the sum of Potts potentials over edges of a complete graph. [sent-499, score-0.958]

90 Symmetrically, we also show that the α-expansion algorithm provides a bound of 1/2 for our original max-version of the clique inference problem. [sent-504, score-0.618]

91 Then the clique energy is 0 and vertex energy is n(2n − 2). [sent-516, score-0.798]

92 This reduces the vertex energy by 2n − 2 but increases the clique energy by the same amount, so no switching is done. [sent-518, score-0.798]

93 If k ≤ n/2, then the clique energy in the optimal labeling is at least γ(n² − nk) = γn(n − k). [sent-535, score-0.698]

94 The main reason α-pass provides a good bound for Potts potentials is that it guarantees a clique potential of at least n1², where n1 is the count of the most dominant label in the optimal solution y∗. [sent-555, score-1.105]

95 If the jth vertex lies in the yth chunk, then let it have a vertex score of log n with label y in A and a vertex score of log n + ε with the jth label in B. [sent-563, score-0.997]

96 Let i be a vertex currently assigned y and let β(i) be its second most preferred label under the vertex potentials ψα . [sent-668, score-0.896]

97 We show that for symmetric potentials we can compute max-marginals in O(m²) calls to the clique inference algorithm in practice, as opposed to the expected nm invocations. [sent-685, score-1.061]

98 For label pairs α, β, define a modified clique potential function Cαβ that adds “1” to the count for label β and subtracts “1” from the count of label α: Cαβ(y) = C(n1(y), ..., nα(y) − 1, ..., nβ(y) + 1, ..., nm(y)). [sent-688, score-1.041]
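
A tiny hedged sketch of this modification, assuming the base symmetric potential is evaluated on a count histogram (the function names are illustrative):

```python
# Illustrative wrapper for the modified potential C_{alpha,beta}: shift one
# unit of count from label alpha to label beta, then apply the base
# symmetric potential to the adjusted histogram.
def shifted_potential(base_pot, alpha, beta):
    def c_alpha_beta(counts):
        adjusted = dict(counts)
        adjusted[alpha] = adjusted.get(alpha, 0) - 1
        adjusted[beta] = adjusted.get(beta, 0) + 1
        return base_pot(adjusted)
    return c_alpha_beta
```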

99 This concludes the description of our various clique inference algorithms and their theoretical properties. [sent-707, score-0.618]

100 We will see that the same cluster message passing setup and collective inference algorithms can be used almost as is in this more general scenario. [sent-710, score-0.743]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('clique', 0.523), ('collective', 0.376), ('potentials', 0.35), ('vertex', 0.201), ('ziy', 0.169), ('potts', 0.167), ('labeling', 0.138), ('associative', 0.137), ('vertices', 0.136), ('message', 0.134), ('iy', 0.123), ('label', 0.115), ('yk', 0.115), ('otentials', 0.107), ('arawagi', 0.098), ('iwan', 0.098), ('upta', 0.098), ('inference', 0.095), ('lique', 0.089), ('ollective', 0.089), ('xtraction', 0.089), ('fy', 0.083), ('ymmetric', 0.076), ('majority', 0.075), ('cluster', 0.07), ('oint', 0.068), ('associativity', 0.068), ('passing', 0.068), ('messages', 0.065), ('score', 0.064), ('ny', 0.063), ('potential', 0.061), ('ct', 0.057), ('count', 0.056), ('nference', 0.055), ('labels', 0.054), ('maxy', 0.053), ('boykov', 0.053), ('cliques', 0.053), ('labelings', 0.05), ('symmetric', 0.05), ('argmaxy', 0.046), ('iraq', 0.044), ('venue', 0.044), ('nm', 0.043), ('token', 0.041), ('unigram', 0.041), ('yn', 0.039), ('ip', 0.038), ('fa', 0.038), ('krishnan', 0.038), ('records', 0.038), ('energy', 0.037), ('cpotts', 0.036), ('homepage', 0.036), ('iter', 0.036), ('yth', 0.036), ('lgorithm', 0.035), ('mt', 0.035), ('extraction', 0.034), ('gupta', 0.033), ('mrf', 0.033), ('yv', 0.033), ('histogram', 0.033), ('mk', 0.033), ('edge', 0.032), ('lr', 0.032), ('manning', 0.032), ('mrfs', 0.031), ('title', 0.031), ('scores', 0.031), ('conservative', 0.031), ('counts', 0.03), ('getoor', 0.03), ('golden', 0.03), ('jy', 0.03), ('sy', 0.03), ('mn', 0.03), ('assigned', 0.029), ('max', 0.029), ('move', 0.027), ('incident', 0.027), ('bibliographic', 0.027), ('iitb', 0.027), ('sarawagi', 0.027), ('sunita', 0.027), ('troops', 0.027), ('wlog', 0.027), ('chains', 0.026), ('sentences', 0.026), ('chain', 0.025), ('ie', 0.025), ('relaxation', 0.025), ('switch', 0.025), ('edges', 0.024), ('ips', 0.023), ('author', 0.021), ('exact', 0.021), ('lp', 0.021), ('families', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 24 jmlr-2010-Collective Inference for Extraction MRFs Coupled with Symmetric Clique Potentials

Author: Rahul Gupta, Sunita Sarawagi, Ajit A. Diwan

Abstract: Many structured information extraction tasks employ collective graphical models that capture inter-instance associativity by coupling them with various clique potentials. We propose tractable families of such potentials that are invariant under permutations of their arguments, and call them symmetric clique potentials. We present three families of symmetric potentials—MAX, SUM, and MAJORITY. We propose cluster message passing for collective inference with symmetric clique potentials, and present message computation algorithms tailored to such potentials. Our first message computation algorithm, called α-pass, is sub-quadratic in the clique size, outputs exact messages for MAX, and computes 13/15-approximate messages for Potts, a popular member of the SUM family. Empirically, it is up to two orders of magnitude faster than existing algorithms based on graph-cuts or belief propagation. Our second algorithm, based on Lagrangian relaxation, operates on MAJORITY potentials and provides close-to-exact solutions while being two orders of magnitude faster. We show that the cluster message passing framework is more principled and accurate, and converges faster, than competing approaches. We extend our collective inference framework to exploit associativity of more general intra-domain properties of instance labelings, which opens up interesting applications in domain adaptation. Our approach leads to significant error reduction on unseen domains without incurring any overhead of model retraining. Keywords: graphical models, collective inference, clique potentials, cluster graphs, message passing

2 0.074077219 39 jmlr-2010-FastInf: An Efficient Approximate Inference Library

Author: Ariel Jaimovich, Ofer Meshi, Ian McGraw, Gal Elidan

Abstract: The FastInf C++ library is designed to perform memory and time efficient approximate inference in large-scale discrete undirected graphical models. The focus of the library is propagation based approximate inference methods, ranging from the basic loopy belief propagation algorithm to propagation based on convex free energies. Various message scheduling schemes that improve on the standard synchronous or asynchronous approaches are included. Also implemented are a clique tree based exact inference, Gibbs sampling, and the mean field algorithm. In addition to inference, FastInf provides parameter estimation capabilities as well as representation and learning of shared parameters. It offers a rich interface that facilitates extension of the basic classes to other inference and learning methods. Keywords: graphical models, Markov random field, loopy belief propagation, approximate inference

3 0.066485502 42 jmlr-2010-Generalized Expectation Criteria for Semi-Supervised Learning with Weakly Labeled Data

Author: Gideon S. Mann, Andrew McCallum

Abstract: In this paper, we present an overview of generalized expectation criteria (GE), a simple, robust, scalable method for semi-supervised training using weakly-labeled data. GE fits model parameters by favoring models that match certain expectation constraints, such as marginal label distributions, on the unlabeled data. This paper shows how to apply generalized expectation criteria to two classes of parametric models: maximum entropy models and conditional random fields. Experimental results demonstrate accuracy improvements over supervised training and a number of other stateof-the-art semi-supervised learning methods for these models. Keywords: generalized expectation criteria, semi-supervised learning, logistic regression, conditional random fields

4 0.061310858 91 jmlr-2010-Posterior Regularization for Structured Latent Variable Models

Author: Kuzman Ganchev, João Graça, Jennifer Gillenwater, Ben Taskar

Abstract: We present posterior regularization, a probabilistic framework for structured, weakly supervised learning. Our framework efficiently incorporates indirect supervision via constraints on posterior distributions of probabilistic models with latent variables. Posterior regularization separates model complexity from the complexity of structural constraints it is desired to satisfy. By directly imposing decomposable regularization on the posterior moments of latent variables during learning, we retain the computational efficiency of the unconstrained model while ensuring desired constraints hold in expectation. We present an efficient algorithm for learning with posterior regularization and illustrate its versatility on a diverse set of structural constraints such as bijectivity, symmetry and group sparsity in several large scale experiments, including multi-view learning, cross-lingual dependency grammar induction, unsupervised part-of-speech induction, and bitext word alignment.1 Keywords: posterior regularization framework, unsupervised learning, latent variables models, prior knowledge, natural language processing

5 0.050363131 88 jmlr-2010-Optimal Search on Clustered Structural Constraint for Learning Bayesian Network Structure

Author: Kaname Kojima, Eric Perrier, Seiya Imoto, Satoru Miyano

Abstract: We study the problem of learning an optimal Bayesian network in a constrained search space; skeletons are compelled to be subgraphs of a given undirected graph called the super-structure. The previously derived constrained optimal search (COS) remains limited even for sparse superstructures. To extend its feasibility, we propose to divide the super-structure into several clusters and perform an optimal search on each of them. Further, to ensure acyclicity, we introduce the concept of ancestral constraints (ACs) and derive an optimal algorithm satisfying a given set of ACs. Finally, we theoretically derive the necessary and sufficient sets of ACs to be considered for finding an optimal constrained graph. Empirical evaluations demonstrate that our algorithm can learn optimal Bayesian networks for some graphs containing several hundreds of vertices, and even for super-structures having a high average degree (up to four), which is a drastic improvement in feasibility over the previous optimal algorithm. Learnt networks are shown to largely outperform state-of-the-art heuristic algorithms both in terms of score and structural hamming distance. Keywords: Bayesian networks, structure learning, constrained optimal search

6 0.044070452 44 jmlr-2010-Graph Kernels

7 0.043479111 114 jmlr-2010-Unsupervised Supervised Learning I: Estimating Classification and Regression Errors without Labels

8 0.040503003 69 jmlr-2010-Lp-Nested Symmetric Distributions

9 0.040113557 75 jmlr-2010-Mean Field Variational Approximation for Continuous-Time Bayesian Networks

10 0.040070385 111 jmlr-2010-Topology Selection in Graphical Models of Autoregressive Processes

11 0.03978223 13 jmlr-2010-Approximate Inference on Planar Graphs using Loop Calculus and Belief Propagation

12 0.03776921 118 jmlr-2010-libDAI: A Free and Open Source C++ Library for Discrete Approximate Inference in Graphical Models

13 0.035856206 61 jmlr-2010-Learning From Crowds

14 0.035581809 76 jmlr-2010-Message-passing for Graph-structured Linear Programs: Proximal Methods and Rounding Schemes

15 0.034304548 89 jmlr-2010-PAC-Bayesian Analysis of Co-clustering and Beyond

16 0.033926196 20 jmlr-2010-Chromatic PAC-Bayes Bounds for Non-IID Data: Applications to Ranking and Stationary β-Mixing Processes

17 0.030279562 32 jmlr-2010-Efficient Algorithms for Conditional Independence Inference

18 0.030279262 7 jmlr-2010-A Streaming Parallel Decision Tree Algorithm

19 0.028302107 12 jmlr-2010-Analysis of Multi-stage Convex Relaxation for Sparse Regularization

20 0.02824422 72 jmlr-2010-Matrix Completion from Noisy Entries


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.134), (1, 0.002), (2, -0.054), (3, -0.033), (4, -0.01), (5, -0.028), (6, 0.016), (7, 0.011), (8, -0.067), (9, 0.191), (10, -0.085), (11, -0.015), (12, 0.174), (13, 0.063), (14, 0.049), (15, -0.006), (16, 0.018), (17, 0.035), (18, -0.015), (19, 0.092), (20, 0.047), (21, -0.004), (22, -0.009), (23, 0.027), (24, 0.076), (25, 0.018), (26, 0.002), (27, -0.044), (28, -0.094), (29, 0.082), (30, 0.163), (31, -0.001), (32, -0.116), (33, -0.14), (34, 0.027), (35, -0.228), (36, -0.076), (37, -0.145), (38, -0.075), (39, 0.151), (40, 0.131), (41, 0.176), (42, 0.02), (43, -0.261), (44, 0.011), (45, -0.009), (46, -0.001), (47, 0.191), (48, -0.036), (49, -0.058)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97108859 24 jmlr-2010-Collective Inference for Extraction MRFs Coupled with Symmetric Clique Potentials

Author: Rahul Gupta, Sunita Sarawagi, Ajit A. Diwan

Abstract: Many structured information extraction tasks employ collective graphical models that capture interinstance associativity by coupling them with various clique potentials. We propose tractable families of such potentials that are invariant under permutations of their arguments, and call them symmetric clique potentials. We present three families of symmetric potentials—MAX, SUM, and MAJORITY . We propose cluster message passing for collective inference with symmetric clique potentials, and present message computation algorithms tailored to such potentials. Our first message computation algorithm, called α-pass, is sub-quadratic in the clique size, outputs exact messages for 13 MAX , and computes 15 -approximate messages for Potts, a popular member of the SUM family. Empirically, it is upto two orders of magnitude faster than existing algorithms based on graph-cuts or belief propagation. Our second algorithm, based on Lagrangian relaxation, operates on MAJORITY potentials and provides close to exact solutions while being two orders of magnitude faster. We show that the cluster message passing framework is more principled, accurate and converges faster than competing approaches. We extend our collective inference framework to exploit associativity of more general intradomain properties of instance labelings, which opens up interesting applications in domain adaptation. Our approach leads to significant error reduction on unseen domains without incurring any overhead of model retraining. Keywords: passing graphical models, collective inference, clique potentials, cluster graphs, message

2 0.535963 88 jmlr-2010-Optimal Search on Clustered Structural Constraint for Learning Bayesian Network Structure

Author: Kaname Kojima, Eric Perrier, Seiya Imoto, Satoru Miyano

Abstract: We study the problem of learning an optimal Bayesian network in a constrained search space; skeletons are compelled to be subgraphs of a given undirected graph called the super-structure. The previously derived constrained optimal search (COS) remains limited even for sparse superstructures. To extend its feasibility, we propose to divide the super-structure into several clusters and perform an optimal search on each of them. Further, to ensure acyclicity, we introduce the concept of ancestral constraints (ACs) and derive an optimal algorithm satisfying a given set of ACs. Finally, we theoretically derive the necessary and sufficient sets of ACs to be considered for finding an optimal constrained graph. Empirical evaluations demonstrate that our algorithm can learn optimal Bayesian networks for some graphs containing several hundreds of vertices, and even for super-structures having a high average degree (up to four), which is a drastic improvement in feasibility over the previous optimal algorithm. Learnt networks are shown to largely outperform state-of-the-art heuristic algorithms both in terms of score and structural hamming distance. Keywords: Bayesian networks, structure learning, constrained optimal search

3 0.4546718 32 jmlr-2010-Efficient Algorithms for Conditional Independence Inference

Author: Remco Bouckaert, Raymond Hemmecke, Silvia Lindner, Milan Studený

Abstract: The topic of the paper is computer testing of (probabilistic) conditional independence (CI) implications by an algebraic method of structural imsets. The basic idea is to transform (sets of) CI statements into certain integral vectors and to verify by a computer the corresponding algebraic relation between the vectors, called the independence implication. We interpret the previous methods for computer testing of this implication from the point of view of polyhedral geometry. However, the main contribution of the paper is a new method, based on linear programming (LP). The new method overcomes the limitation of former methods to the number of involved variables. We recall/describe the theoretical basis for all four methods involved in our computational experiments, whose aim was to compare the efficiency of the algorithms. The experiments show that the LP method is clearly the fastest one. As an example of possible application of such algorithms we show that testing inclusion of Bayesian network structures or whether a CI statement is encoded in an acyclic directed graph can be done by the algebraic method. Keywords: conditional independence inference, linear programming approach

4 0.38612369 39 jmlr-2010-FastInf: An Efficient Approximate Inference Library

Author: Ariel Jaimovich, Ofer Meshi, Ian McGraw, Gal Elidan

Abstract: The FastInf C++ library is designed to perform memory and time efficient approximate inference in large-scale discrete undirected graphical models. The focus of the library is propagation based approximate inference methods, ranging from the basic loopy belief propagation algorithm to propagation based on convex free energies. Various message scheduling schemes that improve on the standard synchronous or asynchronous approaches are included. Also implemented are a clique tree based exact inference, Gibbs sampling, and the mean field algorithm. In addition to inference, FastInf provides parameter estimation capabilities as well as representation and learning of shared parameters. It offers a rich interface that facilitates extension of the basic classes to other inference and learning methods. Keywords: graphical models, Markov random field, loopy belief propagation, approximate inference

5 0.32871521 91 jmlr-2010-Posterior Regularization for Structured Latent Variable Models

Author: Kuzman Ganchev, João Graça, Jennifer Gillenwater, Ben Taskar

Abstract: We present posterior regularization, a probabilistic framework for structured, weakly supervised learning. Our framework efficiently incorporates indirect supervision via constraints on posterior distributions of probabilistic models with latent variables. Posterior regularization separates model complexity from the complexity of structural constraints it is desired to satisfy. By directly imposing decomposable regularization on the posterior moments of latent variables during learning, we retain the computational efficiency of the unconstrained model while ensuring desired constraints hold in expectation. We present an efficient algorithm for learning with posterior regularization and illustrate its versatility on a diverse set of structural constraints such as bijectivity, symmetry and group sparsity in several large scale experiments, including multi-view learning, cross-lingual dependency grammar induction, unsupervised part-of-speech induction, and bitext word alignment.1 Keywords: posterior regularization framework, unsupervised learning, latent variables models, prior knowledge, natural language processing

6 0.32656777 61 jmlr-2010-Learning From Crowds

7 0.27962208 42 jmlr-2010-Generalized Expectation Criteria for Semi-Supervised Learning with Weakly Labeled Data

8 0.25040272 63 jmlr-2010-Learning Instance-Specific Predictive Models

9 0.22277398 118 jmlr-2010-libDAI: A Free and Open Source C++ Library for Discrete Approximate Inference in Graphical Models

10 0.21149901 69 jmlr-2010-Lp-Nested Symmetric Distributions

11 0.20050755 64 jmlr-2010-Learning Non-Stationary Dynamic Bayesian Networks

12 0.19711593 13 jmlr-2010-Approximate Inference on Planar Graphs using Loop Calculus and Belief Propagation

13 0.17745033 75 jmlr-2010-Mean Field Variational Approximation for Continuous-Time Bayesian Networks

14 0.17719144 101 jmlr-2010-Second-Order Bilinear Discriminant Analysis

15 0.16942903 8 jmlr-2010-A Surrogate Modeling and Adaptive Sampling Toolbox for Computer Based Design

16 0.16939819 3 jmlr-2010-A Fast Hybrid Algorithm for Large-Scalel1-Regularized Logistic Regression

17 0.16682033 66 jmlr-2010-Linear Algorithms for Online Multitask Classification

18 0.16258949 37 jmlr-2010-Evolving Static Representations for Task Transfer

19 0.1550881 111 jmlr-2010-Topology Selection in Graphical Models of Autoregressive Processes

20 0.15248775 114 jmlr-2010-Unsupervised Supervised Learning I: Estimating Classification and Regression Errors without Labels


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.029), (4, 0.02), (8, 0.021), (21, 0.023), (32, 0.073), (36, 0.05), (37, 0.062), (63, 0.421), (75, 0.091), (81, 0.011), (85, 0.045), (96, 0.033)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.70554459 24 jmlr-2010-Collective Inference for Extraction MRFs Coupled with Symmetric Clique Potentials

Author: Rahul Gupta, Sunita Sarawagi, Ajit A. Diwan

Abstract: Many structured information extraction tasks employ collective graphical models that capture interinstance associativity by coupling them with various clique potentials. We propose tractable families of such potentials that are invariant under permutations of their arguments, and call them symmetric clique potentials. We present three families of symmetric potentials—MAX, SUM, and MAJORITY . We propose cluster message passing for collective inference with symmetric clique potentials, and present message computation algorithms tailored to such potentials. Our first message computation algorithm, called α-pass, is sub-quadratic in the clique size, outputs exact messages for 13 MAX , and computes 15 -approximate messages for Potts, a popular member of the SUM family. Empirically, it is upto two orders of magnitude faster than existing algorithms based on graph-cuts or belief propagation. Our second algorithm, based on Lagrangian relaxation, operates on MAJORITY potentials and provides close to exact solutions while being two orders of magnitude faster. We show that the cluster message passing framework is more principled, accurate and converges faster than competing approaches. We extend our collective inference framework to exploit associativity of more general intradomain properties of instance labelings, which opens up interesting applications in domain adaptation. Our approach leads to significant error reduction on unseen domains without incurring any overhead of model retraining. Keywords: passing graphical models, collective inference, clique potentials, cluster graphs, message

2 0.31858563 89 jmlr-2010-PAC-Bayesian Analysis of Co-clustering and Beyond

Author: Yevgeny Seldin, Naftali Tishby

Abstract: We derive PAC-Bayesian generalization bounds for supervised and unsupervised learning models based on clustering, such as co-clustering, matrix tri-factorization, graphical models, graph clustering, and pairwise clustering.1 We begin with the analysis of co-clustering, which is a widely used approach to the analysis of data matrices. We distinguish among two tasks in matrix data analysis: discriminative prediction of the missing entries in data matrices and estimation of the joint probability distribution of row and column variables in co-occurrence matrices. We derive PAC-Bayesian generalization bounds for the expected out-of-sample performance of co-clustering-based solutions for these two tasks. The analysis yields regularization terms that were absent in the previous formulations of co-clustering. The bounds suggest that the expected performance of co-clustering is governed by a trade-off between its empirical performance and the mutual information preserved by the cluster variables on row and column IDs. We derive an iterative projection algorithm for finding a local optimum of this trade-off for discriminative prediction tasks. This algorithm achieved stateof-the-art performance in the MovieLens collaborative filtering task. Our co-clustering model can also be seen as matrix tri-factorization and the results provide generalization bounds, regularization terms, and new algorithms for this form of matrix factorization. The analysis of co-clustering is extended to tree-shaped graphical models, which can be used to analyze high dimensional tensors. According to the bounds, the generalization abilities of treeshaped graphical models depend on a trade-off between their empirical data fit and the mutual information that is propagated up the tree levels. We also formulate weighted graph clustering as a prediction problem: given a subset of edge weights we analyze the ability of graph clustering to predict the remaining edge weights. The analysis of co-clustering easily

3 0.31352171 46 jmlr-2010-High Dimensional Inverse Covariance Matrix Estimation via Linear Programming

Author: Ming Yuan

Abstract: This paper considers the problem of estimating a high dimensional inverse covariance matrix that can be well approximated by “sparse” matrices. Taking advantage of the connection between multivariate linear regression and entries of the inverse covariance matrix, we propose an estimating procedure that can effectively exploit such “sparsity”. The proposed method can be computed using linear programming and therefore has the potential to be used in very high dimensional problems. Oracle inequalities are established for the estimation error in terms of several operator norms, showing that the method is adaptive to different types of sparsity of the problem. Keywords: covariance selection, Dantzig selector, Gaussian graphical model, inverse covariance matrix, Lasso, linear programming, oracle inequality, sparsity

4 0.3085264 114 jmlr-2010-Unsupervised Supervised Learning I: Estimating Classification and Regression Errors without Labels

Author: Pinar Donmez, Guy Lebanon, Krishnakumar Balasubramanian

Abstract: Estimating the error rates of classifiers or regression models is a fundamental task in machine learning which has thus far been studied exclusively using supervised learning techniques. We propose a novel unsupervised framework for estimating these error rates using only unlabeled data and mild assumptions. We prove consistency results for the framework and demonstrate its practical applicability on both synthetic and real world data. Keywords: classification and regression, maximum likelihood, latent variable models

5 0.30837128 103 jmlr-2010-Sparse Semi-supervised Learning Using Conjugate Functions

Author: Shiliang Sun, John Shawe-Taylor

Abstract: In this paper, we propose a general framework for sparse semi-supervised learning, which concerns using a small portion of unlabeled data and a few labeled data to represent target functions and thus has the merit of accelerating function evaluations when predicting the output of a new example. This framework makes use of Fenchel-Legendre conjugates to rewrite a convex insensitive loss involving a regularization with unlabeled data, and is applicable to a family of semi-supervised learning methods such as multi-view co-regularized least squares and single-view Laplacian support vector machines (SVMs). As an instantiation of this framework, we propose sparse multi-view SVMs which use a squared ε-insensitive loss. The resultant optimization is an inf-sup problem and the optimal solutions have arguably saddle-point properties. We present a globally optimal iterative algorithm to optimize the problem. We give the margin bound on the generalization error of the sparse multi-view SVMs, and derive the empirical Rademacher complexity for the induced function class. Experiments on artificial and real-world data show their effectiveness. We further give a sequential training approach to show their possibility and potential for uses in large-scale problems and provide encouraging experimental results indicating the efficacy of the margin bound and empirical Rademacher complexity on characterizing the roles of unlabeled data for semi-supervised learning. Keywords: semi-supervised learning, Fenchel-Legendre conjugate, representer theorem, multiview regularization, support vector machine, statistical learning theory

6 0.30646986 109 jmlr-2010-Stochastic Composite Likelihood

7 0.30522665 74 jmlr-2010-Maximum Relative Margin and Data-Dependent Regularization

8 0.30365494 49 jmlr-2010-Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data

9 0.30356506 102 jmlr-2010-Semi-Supervised Novelty Detection

10 0.30330637 95 jmlr-2010-Rademacher Complexities and Bounding the Excess Risk in Active Learning

11 0.303249 104 jmlr-2010-Sparse Spectrum Gaussian Process Regression

12 0.30116972 91 jmlr-2010-Posterior Regularization for Structured Latent Variable Models

13 0.29999384 60 jmlr-2010-Learnability, Stability and Uniform Convergence

14 0.29984733 38 jmlr-2010-Expectation Truncation and the Benefits of Preselection In Training Generative Models

15 0.29952139 106 jmlr-2010-Stability Bounds for Stationary φ-mixing and β-mixing Processes

16 0.29900512 107 jmlr-2010-Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion

17 0.29878265 42 jmlr-2010-Generalized Expectation Criteria for Semi-Supervised Learning with Weakly Labeled Data

18 0.29833573 59 jmlr-2010-Large Scale Online Learning of Image Similarity Through Ranking

19 0.29801735 92 jmlr-2010-Practical Approaches to Principal Component Analysis in the Presence of Missing Values

20 0.29782021 69 jmlr-2010-Lp-Nested Symmetric Distributions