nips nips2010 nips2010-83 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Anton Chechetka, Carlos Guestrin
Abstract: We present a simple and effective approach to learning tractable conditional random fields with structure that depends on the evidence. Our approach retains the advantages of tractable discriminative models, namely efficient exact inference and arbitrarily accurate parameter learning in polynomial time. At the same time, our algorithm does not suffer a large expressive power penalty inherent to fixed tractable structures. On real-life relational datasets, our approach matches or exceeds state of the art accuracy of the dense models, and at the same time provides an order of magnitude speedup. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: We present a simple and effective approach to learning tractable conditional random fields with structure that depends on the evidence. [sent-5, score-0.302]
2 Our approach retains the advantages of tractable discriminative models, namely efficient exact inference and arbitrarily accurate parameter learning in polynomial time. [sent-6, score-0.238]
3 At the same time, our algorithm does not suffer a large expressive power penalty inherent to fixed tractable structures. [sent-7, score-0.18]
4 On real-life relational datasets, our approach matches or exceeds state of the art accuracy of the dense models, and at the same time provides an order of magnitude speedup. [sent-8, score-0.436]
5 A key advantage of CRFs over other probabilistic graphical models (PGMs, [3]) stems from the observation that in almost all applications, some variables are unknown at test time (we will denote such variables X ), but others, called the evidence E, are known at test time. [sent-10, score-0.151]
6 The discriminative approach adopted by CRFs allows for better approximation quality of the learned conditional distribution P (X | E), because the representational power of the model is not “wasted” on modeling P (E). [sent-12, score-0.218]
7 To overcome the extra computational challenges posed by conditional random fields, practitioners usually resort to several of the following approximations throughout the process: • CRF structure is specified by hand, leading to suboptimal structures. [sent-17, score-0.234]
8 • Approximate inference at test time yields suboptimal results [5]. [sent-19, score-0.150]
9-10 • Replacing the CRF conditional likelihood objective with a more tractable one (e.g. [6]) results in suboptimal models (both in terms of learned structure and parameters). [sent-20, score-0.205] [sent-22, score-0.166]
11 For such models, parameter learning and inference can be done exactly; only structure learning involves approximations. [sent-25, score-0.166]
12 Generalize the Chow-Liu algorithm [7] to learn evidence-specific structures for tree CRFs. [sent-31, score-0.165]
13 Generalize tree CRFs with evidence-specific structure (ESS-CRFs) to the relational setting. [sent-32, score-0.494]
14 Demonstrate empirically the superior performance of ESS-CRFs over densely connected models in terms of both accuracy and runtime on real-life relational models. [sent-33, score-0.406]
15 Given the features f and training data D that consists of fully observed assignments to X and E, the optimal feature weights w∗ maximize the conditional log-likelihood (CLLH) of the data: $w^* = \arg\max_w \sum_{(X,E) \in D} \log P(X \mid E, w) = \arg\max_w \sum_{(X,E) \in D} \big( \sum_{ij} \sum_k w_{ijk} f_{ijk}(X_i, X_j, E) - \log Z(E, w) \big)$. [sent-38, score-0.696]
16 Moreover, $\frac{\partial \log P(X \mid E, w)}{\partial w_{ijk}} = f_{ijk}(X_i, X_j, E) - \mathbb{E}_{P(X_i, X_j \mid E, w)}\left[ f_{ijk}(X_i, X_j, E) \right]$, where $\mathbb{E}_P$ denotes expectation with respect to a distribution P. [sent-40, score-0.252]
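A minimal sketch of this gradient computation, assuming illustrative array layouts (a feature tensor indexed by the joint state of the edge, and an exact edge marginal supplied by inference); the names are not from the paper's code:

```python
import numpy as np

def cllh_gradient_edge(f_ij, edge_marginal, x_i, x_j):
    # Gradient (3) of the conditional log-likelihood w.r.t. the weight vector w_ij.
    # f_ij[a, b] is the feature vector f_ij.(a, b, E) for every joint state (a, b),
    # edge_marginal[a, b] is P(X_i = a, X_j = b | E, w) obtained by inference,
    # and (x_i, x_j) is the observed assignment in the training datapoint.
    observed = f_ij[x_i, x_j]                                # f_ijk(X_i, X_j, E)
    expected = np.einsum('ab,abk->k', edge_marginal, f_ij)   # E_P[f_ijk(X_i, X_j, E)]
    return observed - expected
```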
17 However, the gradient (3) contains the conditional distribution over (Xi, Xj), so computing (3) requires inference in the model for every datapoint. [sent-42, score-0.204]
18 The time complexity of exact inference is exponential in the treewidth of the graph defined by the edges T [5]. [sent-43, score-0.312]
19 Therefore, exact evaluation of the CLLH objective (2) and gradient (3), and exact inference at test time, are only feasible for models with low-treewidth T. [sent-44, score-0.272]
20 Unfortunately, restricting the space of models to only those with low treewidth severely decreases the expressive power of CRFs. [sent-45, score-0.292]
21 Complex dependencies of real-life distributions usually cannot be adequately captured by a single tree-like structure, so most of the models used in practice have high treewidth, making exact inference infeasible. [sent-46, score-0.199]
22 Approximate inference is NP-hard [5], so approximate inference algorithms have very few result quality guarantees. [sent-52, score-0.223]
23 Greater expressive power of the models is thus obtained at the expense of worse quality of estimated parameters and inference. [sent-53, score-0.185]
24 Here, we show an alternative way to increase expressive power of tree-like structured CRFs without sacrificing optimal weights learning and exact inference at test time. [sent-54, score-0.341]
25 In practice, our approach is much better suited for relational than for propositional settings, because of the much higher parameter dimensionality in the propositional case. [sent-55, score-0.738]
26 However, we first present the theory in detail for the propositional case to better convey the key high-level ideas. [sent-56, score-0.203]
27 3 Evidence-specific structure for CRFs Observe that, given a particular evidence value E, the set of edges T in the CRF formulation (1) actually can be viewed as a supergraph of the conditional model over X . [sent-57, score-0.279]
28 If T (E) has low treewidth for all values of E, inference and parameter learning using the effective structure are tractable, even if a priori structure T has high treewidth. [sent-62, score-0.44]
29 Unfortunately, in practice the treewidth of T (E) is usually not much smaller than the treewidth of T. [sent-63, score-0.298]
30 Low-treewidth effective structures are rarely used, because treewidth is a global property of the graph (even computing treewidth is NP-complete [13]), while feature design is a local process. [sent-64, score-0.458]
31 Achieving low treewidth for the effective structures requires elaborate feature design, making model construction very difficult. [sent-66, score-0.332]
32 Instead, in this work, we separate the construction of low-treewidth effective structures from feature design and weight learning, to combine the advantages of exact inference and discriminative weight learning, the high expressive power of high-treewidth models, and local feature design. [sent-67, score-0.560]
33 Observe that the CRF definition (1) can be written equivalently as $P(X \mid E, w) = Z^{-1}(E, w) \exp\big( \sum_{ij} \sum_k w_{ijk} \cdot \mathbb{I}((i,j) \in T) \cdot f_{ijk}(X_i, X_j, E) \big)$. [sent-68, score-0.386]
34 In addition to the feature values f, the effective structure of the model is now controlled by the indicator functions I(·). [sent-70, score-0.157]
35 These indicator functions provide us with a way to control the treewidth of the effective structures independently of the features. [sent-71, score-0.277]
36 The resulting model, which we call a CRF with evidence-specific structure (ESS-CRF), defines a conditional distribution P(X | E, w, u) as follows: $P(X \mid E, w, u) = Z^{-1}(E, w, u) \exp\big( \sum_{ij} \sum_k w_{ijk} \cdot \mathbb{I}((i,j) \in T(E, u)) \cdot f_{ijk}(X_i, X_j, E) \big)$. [sent-75, score-0.617]
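A minimal sketch of the unnormalized ESS-CRF log-score in (5); `tree_structure`, `features` and the weight indexing are hypothetical stand-ins for T(E, u), f and w, not the paper's code:

```python
import numpy as np

def ess_crf_log_score(X, E, w, features, tree_structure, u):
    # Unnormalized log-score of an assignment X under the ESS-CRF model (5).
    # tree_structure(E, u) returns the evidence-specific edge set T(E, u);
    # features[(i, j)] returns the feature vector f_ij.(X_i, X_j, E).
    score = 0.0
    for (i, j) in tree_structure(E, u):          # only edges with I((i,j) in T(E,u)) = 1
        f_ij = features[(i, j)](X[i], X[j], E)   # feature vector f_ijk(X_i, X_j, E)
        score += np.dot(w[(i, j)], f_ij)         # sum_k w_ijk * f_ijk
    return score                                 # equals log P(X | E, w, u) + log Z(E, w, u)
```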
37 ESS-CRFs have an important advantage over the traditional parametrization: in (5) the parameters u that determine the model structure are decoupled from the feature weights w. [sent-78, score-0.22]
38 Structure selection (i.e., optimizing u) can be decoupled from feature selection (choosing f) and feature weight learning (optimizing w). [sent-81, score-0.164]
39 Such a decoupling makes it much easier to guarantee that the effective structure of the model has low treewidth by relegating all the necessary global computation to the structure construction algorithm T = T (E, u). [sent-82, score-0.385]
40 Step 2: Optimize weights w to maximize the conditional LLH (2) of the training data. [sent-84, score-0.202]
41 Steps 3-4: for each E in the test data, use the conditional model (1) to define the conditional distribution P(X | E, w). [sent-86, score-0.298]
42 Then use approximate inference to compute the marginals or the most likely assignment to X. [sent-87, score-0.180]
43 Algorithm 2 (CRF with evidence-specific structures approach), Step 1: Define features fijk(Xi, Xj, E). [sent-88, score-0.375]
44 Step 2: Optimize weights w to maximize the conditional LLH log P(X | E, u, w) of the training data. [sent-92, score-0.202]
45 Use exact inference to compute the CLLH objective (2) and gradient (3). [sent-93, score-0.153]
46 Steps 3-4: for each E in the test data, use the conditional model (5) to define the conditional distribution P(X | E, w, u). [sent-94, score-0.298]
47 Then use exact inference to compute the marginals or the most likely assignment to X. [sent-95, score-0.208]
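The contrast with the dense pipeline above can be summarized in a short skeleton of Algorithm 2, with its stages passed in as callables; all names and signatures below are illustrative assumptions, not the paper's code:

```python
def ess_crf_pipeline(train_data, test_evidence, learn_u, learn_w, build_tree, infer):
    # Skeleton of Algorithm 2.  learn_u and learn_w are assumed to fit the structure
    # parameters u and the feature weights w; build_tree(E, u) is assumed to return a
    # low-treewidth evidence-specific tree T(E, u); infer runs exact tree inference.
    u = learn_u(train_data)                 # structure-selection stage (e.g. Alg. 3 below)
    w = learn_w(train_data, u)              # maximize conditional LLH (2) with exact gradients
    results = []
    for E in test_evidence:
        T = build_tree(E, u)                # evidence-specific structure for this datapoint
        results.append(infer(T, w, E))      # exact marginals or Viterbi decoding
    return results
```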
48 Also, $\frac{\partial \log P(X \mid E, w, u)}{\partial w_{ijk}} = \mathbb{I}((i,j) \in T(E,u)) \big( f_{ijk}(X_i, X_j, E) - \mathbb{E}_{P(X_i, X_j \mid E, w, u)}[f_{ijk}(X_i, X_j, E)] \big)$. [sent-97, score-0.252]
49 4 Conditional Chow-Liu algorithm for tractable evidence-specific structures Learning the most likely PGM structure from data is in most cases intractable. [sent-104, score-0.274]
50 Unlike tree CRFs, however, the likelihood of tree MRF structures decomposes into contributions of individual edges: $LLH(T) = \sum_{(i,j) \in T} I(X_i, X_j) - \sum_{X_i \in X} H(X_i)$ (7), where I(·, ·) is the mutual information and H(·) is entropy. [sent-111, score-0.267]
51 Therefore, as shown in [7], the most likely structure can be obtained by taking the maximum spanning tree of a fully connected graph, where the weight of an edge ij is I(Xi , Xj ). [sent-112, score-0.254]
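A compact sketch of that construction for binary variables, using empirical mutual information and a maximum spanning tree (the paper's setting is more general; helper choices here are assumptions):

```python
import numpy as np
import networkx as nx

def chow_liu_tree(data):
    # Chow-Liu [7]: maximum spanning tree with edge weights I(X_i, X_j).
    # data is an (n_samples, n_vars) array of 0/1 assignments.
    n, d = data.shape
    G = nx.Graph()
    for i in range(d):
        for j in range(i + 1, d):
            joint = np.zeros((2, 2))
            for a in (0, 1):
                for b in (0, 1):
                    joint[a, b] = np.mean((data[:, i] == a) & (data[:, j] == b))
            outer = np.outer(joint.sum(axis=1), joint.sum(axis=0))
            nz = joint > 0
            mi = float(np.sum(joint[nz] * np.log(joint[nz] / outer[nz])))
            G.add_edge(i, j, weight=mi)       # empirical mutual information I(X_i, X_j)
    return list(nx.maximum_spanning_tree(G).edges())
```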
52 Given a concrete value of the evidence E, one can write down the conditional version of the tree structure likelihood (7) for that particular value of evidence: $LLH(T \mid E) = \sum_{(i,j) \in T} I_{P(\cdot \mid E)}(X_i, X_j) - \sum_{X_i \in X} H_{P(\cdot \mid E)}(X_i)$ (8). [sent-114, score-0.349]
53 If exact conditional distributions P(Xi, Xj | E) were available, then the same Chow-Liu algorithm would find the optimal conditional structure. [sent-115, score-0.261]
54 However, we can still plug in approximate conditionals P(· | E) learned from data using any standard density estimation technique (Algorithm 3: conditional Chow-Liu algorithm for learning evidence-specific tree structures; in its parameter learning stage, the edge conditionals are fit with L-BFGS using the gradient (12)). [sent-117, score-0.19]
55 In particular, with the same features fijk that are used in the CRF model, one can train a logistic regression model for P(· | E): $P(X_i, X_j \mid E, u_{ij}) = Z_{ij}^{-1}(E, u_{ij}) \exp\big( \sum_k u_{ijk} f_{ijk}(X_i, X_j, E) \big)$. [sent-123, score-0.66]
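A sketch of the resulting conditional Chow-Liu procedure, assuming binary variables and using the raw evidence vector as the pairwise features; the logistic-regression conditionals play the role of P(X_i, X_j | E, u_ij) in (9), and all other choices are illustrative assumptions:

```python
import numpy as np
import networkx as nx
from itertools import combinations
from sklearn.linear_model import LogisticRegression

def fit_edge_conditionals(X, E):
    # One multinomial logistic regression per variable pair, modeling P(X_i, X_j | E);
    # the default lbfgs solver corresponds to the L-BFGS fit in Alg. 3.
    models = {}
    for i, j in combinations(range(X.shape[1]), 2):
        y = 2 * X[:, i] + X[:, j]                               # joint state of (X_i, X_j)
        models[(i, j)] = LogisticRegression(max_iter=1000).fit(E, y)
    return models

def conditional_chow_liu(e, models, n_vars):
    # Evidence-specific structure T(E, u): maximum spanning tree whose edge weights
    # are conditional mutual informations computed from the estimated P(X_i, X_j | E).
    G = nx.Graph()
    for (i, j), m in models.items():
        p = np.zeros(4)
        p[m.classes_] = m.predict_proba(e.reshape(1, -1))[0]    # P(X_i, X_j | E = e)
        joint = p.reshape(2, 2)
        outer = np.outer(joint.sum(axis=1), joint.sum(axis=0))
        nz = joint > 0
        cmi = float(np.sum(joint[nz] * np.log(joint[nz] / outer[nz])))
        G.add_edge(i, j, weight=cmi)
    return list(nx.maximum_spanning_tree(G).edges())
```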
56 Other structure learning algorithms, e.g. [8, 15] for higher-treewidth junction trees, can be used as components in the same way as the Chow-Liu algorithm is used in Alg. 3. [sent-134, score-0.192]
57 5 Relational CRFs with evidence-specific structure Traditional (also called propositional) PGMs are not well suited for dealing with relational data, where every variable is an entity of some type, and entities are related to each other via different types of links. [sent-136, score-0.407]
58 The i.i.d. data assumption of traditional PGMs and the huge dimensionalities of relational datasets preclude learning meaningful propositional models. [sent-142, score-0.535]
59 Instead, several formulations of relational PGMs have been proposed [16] to work with relational data, including relational CRFs. [sent-143, score-1.019]
60 More concretely, in relational CRFs every variable Xi is assigned a type mi out of the set M of possible types. [sent-145, score-0.332]
61 A binary relation R ∈ R, corresponding to a specific type of link between two variables, specifies the types of its input arguments, a set of features $f^R_k(\cdot, \cdot, E)$, and feature weights $w^R_k$. [sent-146, score-0.249]
62 By accounting for parameter sharing, it is straightforward to adapt our ESS-CRF formulation to the relational setting. [sent-149, score-0.332]
63 We define the relational ESS-CRF conditional distribution as $P(X \mid E, \mathcal{R}, w, u) \propto \exp\big( \sum_{R \in \mathcal{R}} \; \sum_{X_i, X_j \in inst(R, X)} \mathbb{I}((i,j) \in T(E,u)) \sum_k w^R_k f^R_k(X_i, X_j, E) \big)$ (11). (Footnote 3: notice that the approximation error from P(· | E) is the only source of approximations in our whole approach.) [sent-150, score-0.515]
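A sketch of the relational score in (11) that makes the parameter tying explicit; the data structures below are illustrative assumptions, not the paper's code:

```python
import numpy as np

def relational_ess_crf_log_score(X, E, relations, w, tree_edges):
    # Unnormalized log-score of the relational ESS-CRF (11).
    # relations[R] is assumed to hold (inst(R, X), feature function f^R), and w[R] is
    # the single templated weight vector w^R shared by every pair that R instantiates,
    # so only O(|R|) weight vectors exist regardless of the dataset size.
    active = set(tree_edges)                       # evidence-specific edge set T(E, u)
    score = 0.0
    for R, (pairs, feat) in relations.items():
        for (i, j) in pairs:                       # (X_i, X_j) in inst(R, X)
            if (i, j) in active:                   # indicator I((i, j) in T(E, u))
                score += np.dot(w[R], feat(X[i], X[j], E))
    return score
```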
64 Given the structure learning algorithm T(·, ·) that is guaranteed to return low-treewidth structures, one can learn optimal feature weights w∗ and perform inference at test time exactly. Observation 4: the relational ESS-CRF log-likelihood is concave with respect to w. [sent-160, score-0.328]
65 The conditional Chow-Liu algorithm (Alg. 3) can also be extended to the relational setting by using templated logistic regression weights for estimating edge conditionals. [sent-163, score-0.436]
66 In the relational setting, one only needs to learn O(|R|) parameters, regardless of the dataset size, for both structure selection and feature weights, as opposed to O(|X |2 ) parameters for the propositional case. [sent-169, score-0.655]
67 Thus, relational ESS-CRFs are typically much less prone to overfitting than propositional ones. [sent-170, score-0.535]
68 6 Experiments We have tested the ESS-CRF approach on both propositional and relational data. [sent-171, score-0.535]
69 With the large number of parameters needed for the propositional case (O(|X |2 )), our approach is only practical for cases of abundant data. [sent-172, score-0.203]
70 So our experiments with propositional data serve only to prove the concept, verifying that ESS-CRF can successfully learn a model better than a single tree baseline. [sent-173, score-0.277]
71 In contrast to the propositional settings, in the relational cases the relatively low parameter space dimensionality (O(|R|2 )) almost eliminates the overfitting problem. [sent-174, score-0.535]
72 As a result, on relational datasets ESS-CRF is a very attractive approach in practice. [sent-175, score-0.332]
73 Our experiments show ESS-CRFs comfortably outperforming state of the art high-treewidth discriminative models on several real-life relational datasets. [sent-176, score-0.429]
74 6.1 Propositional models We compare ESS-CRFs with fixed tree CRFs, where the tree structure is learned by the Chow-Liu algorithm using P(X). [sent-178, score-0.272]
75 The first model, called FACES, aims to improve face recognition in collections of related images using information about similarity between different faces in addition to the standard single-face features. [sent-192, score-0.188]
76 Pairwise features f (Xi , Xj , E), based on blob color similarity, indicate how close two faces are in appearance. [sent-195, score-0.169]
77 The model is used in a semi-supervised way: at test time, a PGM is instantiated jointly over the train and test entities, values of the train entities are fixed to the ground truth, and inference finds the (approximately) most likely labels for the test entities. [sent-197, score-0.303]
78 We compare ESS-CRFs with a dense relational PGM encoded by a Markov logic network (MLN, [20]) using the same features. [sent-221, score-0.394]
79 For the MLN, we had to threshold the pairwise features indicating the likelihood of label agreement and set those under the threshold to 0 to prevent (a) oversmoothing and (b) very long inference times. [sent-223, score-0.221]
80 Also, to prevent oversmoothing by the MLN, we have found it useful to scale down the pairwise feature weights learned during training, thus weakening the smoothing effect of any single edge in the model. [sent-224, score-0.219]
81 We compare ESS-CRFs to high-treewidth relational Markov networks (RMNs, [23]), max-margin Markov networks (M3Ns, [24]) and a standalone SVM classifier. [sent-235, score-0.366]
82 All the relational PGMs use the same single-variable features encoding the webpage text, and pairwise features encoding the link structure. [sent-236, score-0.478]
83 RMNs and ESS-CRFs are trained to maximize the conditional likelihood of the labels, while M3Ns maximize the margin in likelihood between the correct assignment and all of the incorrect ones, explicitly targeting the classification. [sent-238, score-0.235]
84 Observe that ESS-CRF matches the accuracy of high-treewidth RMNs, again showing that the smaller expressive power of tree models can be fully compensated for by exact parameter learning and inference. [sent-241, score-0.308]
85 Still, the RMN results indicate that it may be possible to match the M3N accuracy with much faster tractable ESS models by replacing the CRF conditional likelihood objective with the max-margin objective, which is an important direction of future work. [sent-251, score-0.279]
86 Two cornerstones of our ESS-CRF approach, namely using models that become more sparse when evidence is instantiated, and using multiple tractable models to avoid restrictions on the expressive power inherent to low-treewidth models, have been discussed in the existing literature. [sent-254, score-0.307]
87 However, so far CSI has been treated as a local property of the model, which made reasoning about the resulting treewidth of evidence-specific models impossible. [sent-256, score-0.185]
88 Thus, the full potential of exact inference for models with CSI remained unused. [sent-257, score-0.167]
89 Unlike the mixture models, our approach of selecting a single structure for any given evidence value has the advantage of allowing for efficient exact decoding of the most probable assignment to the unknowns X using the Viterbi algorithm [29]. [sent-262, score-0.248]
90 Both for the mixture models and our approach, joint optimization of the structure and weights (u and w in our notation) is infeasible due to many local optima of the objective. [sent-263, score-0.197]
91 Learning the CRF structure in general is NP-hard, which follows from the hardness results for generative models. [sent-266, score-0.179]
92 Moreover, CRF structure learning is further complicated by the fact that the CRF structure likelihood does not decompose into scores of local graph components, as do the scores for some generative models [3]. [sent-269, score-0.333]
93 In practice, the hardness of CRF structure learning leads to high popularity of heuristics: chain and skip-chain [32] structures are often used, as well as grid-like structures. [sent-271, score-0.207]
94 Finally, one can try to approximate the CRF structure score as a combination of local scores [15, 4] and use an algorithm for learning generative structures (where the score actually decomposes). [sent-277, score-0.264]
95 Learning the weights is straightforward for tractable CRFs, because the log-likelihood is concave [1] and the gradient (3) can be used with mature convex optimization techniques. [sent-280, score-0.195]
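A small sketch of that weight-learning step, assuming a caller-supplied function that returns the negated CLLH (2) and its negated gradient (3); the function name and signature are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def learn_weights(neg_cllh_and_grad, w0):
    # Maximize the concave CLLH (2) with L-BFGS.  neg_cllh_and_grad(w) is assumed to
    # return (-CLLH(w), -gradient (3)), with the gradient computed by exact inference
    # in the tractable model; concavity means the optimum found is global.
    res = minimize(neg_cllh_and_grad, np.asarray(w0, dtype=float),
                   jac=True, method='L-BFGS-B')
    return res.x
```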
96 For dense structures, computing the gradient (3) exactly is intractable as even approximate inference in general models is NP-hard [5]. [sent-282, score-0.2]
97 As a result, approximate inference techniques, such as belief propagation [10, 11] or Gibbs sampling [12] are employed, without guarantees on the quality of the result. [sent-283, score-0.168]
98 Our experiments showed that exact weight learning for tractable models gives an advantage in approximation quality and efficiency over dense structures. [sent-287, score-0.243]
99 Our approach (Alg. 2) has just one source of approximation, namely the conditional structure scores. [sent-292, score-0.192]
100 We have demonstrated on real-life relational datasets that our approach matches or exceeds the accuracy of state-of-the-art dense discriminative models, and at the same time provides more than an order of magnitude speedup. [sent-293, score-0.470]
wordName wordTfidf (topN-words)
[('crf', 0.465), ('mln', 0.392), ('relational', 0.332), ('crfs', 0.265), ('fijk', 0.252), ('propositional', 0.203), ('treewidth', 0.149), ('ess', 0.138), ('faces', 0.137), ('wijk', 0.134), ('llh', 0.117), ('xj', 0.109), ('conditional', 0.104), ('cllh', 0.101), ('structures', 0.091), ('structure', 0.088), ('pgm', 0.084), ('pgms', 0.084), ('xi', 0.082), ('inference', 0.078), ('tree', 0.074), ('tractable', 0.073), ('weights', 0.073), ('expressive', 0.069), ('foreach', 0.06), ('rmn', 0.059), ('inst', 0.059), ('logp', 0.059), ('evidence', 0.055), ('entities', 0.053), ('exact', 0.053), ('csi', 0.05), ('webkb', 0.05), ('pairwise', 0.049), ('fk', 0.046), ('chechetka', 0.044), ('junction', 0.043), ('quality', 0.042), ('suboptimal', 0.042), ('rmns', 0.041), ('dense', 0.039), ('ij', 0.039), ('chow', 0.038), ('accuracy', 0.038), ('power', 0.038), ('guestrin', 0.038), ('effective', 0.037), ('models', 0.036), ('uij', 0.036), ('webpages', 0.036), ('elds', 0.036), ('seconds', 0.035), ('discriminative', 0.034), ('alchemy', 0.034), ('oversmoothing', 0.034), ('standalone', 0.034), ('scores', 0.033), ('wk', 0.033), ('link', 0.033), ('dependencies', 0.032), ('edges', 0.032), ('ep', 0.032), ('features', 0.032), ('people', 0.032), ('feature', 0.032), ('edge', 0.031), ('marginals', 0.03), ('train', 0.03), ('test', 0.03), ('trees', 0.03), ('face', 0.029), ('hardness', 0.028), ('likelihood', 0.028), ('mrfs', 0.028), ('art', 0.027), ('dash', 0.027), ('unknowns', 0.027), ('alg', 0.027), ('decoupled', 0.027), ('traffic', 0.027), ('concave', 0.027), ('generative', 0.027), ('abnormality', 0.025), ('mixtures', 0.025), ('approximate', 0.025), ('maximize', 0.025), ('assignment', 0.025), ('uai', 0.024), ('markov', 0.023), ('logic', 0.023), ('belief', 0.023), ('construction', 0.023), ('formulations', 0.023), ('images', 0.022), ('likely', 0.022), ('max', 0.022), ('entity', 0.022), ('xr', 0.022), ('gradient', 0.022), ('richardson', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999958 83 nips-2010-Evidence-Specific Structures for Rich Tractable CRFs
Author: Anton Chechetka, Carlos Guestrin
Abstract: We present a simple and effective approach to learning tractable conditional random fields with structure that depends on the evidence. Our approach retains the advantages of tractable discriminative models, namely efficient exact inference and arbitrarily accurate parameter learning in polynomial time. At the same time, our algorithm does not suffer a large expressive power penalty inherent to fixed tractable structures. On real-life relational datasets, our approach matches or exceeds state of the art accuracy of the dense models, and at the same time provides an order of magnitude speedup. 1
2 0.19921711 71 nips-2010-Efficient Relational Learning with Hidden Variable Detection
Author: Ni Lao, Jun Zhu, Liu Xinwang, Yandong Liu, William W. Cohen
Abstract: Markov networks (MNs) can incorporate arbitrarily complex features in modeling relational data. However, this flexibility comes at a sharp price of training an exponentially complex model. To address this challenge, we propose a novel relational learning approach, which consists of a restricted class of relational MNs (RMNs) called relation tree-based RMN (treeRMN), and an efficient Hidden Variable Detection algorithm called Contrastive Variable Induction (CVI). On one hand, the restricted treeRMN only considers simple (e.g., unary and pairwise) features in relational data and thus achieves computational efficiency; and on the other hand, the CVI algorithm efficiently detects hidden variables which can capture long range dependencies. Therefore, the resultant approach is highly efficient yet does not sacrifice its expressive power. Empirical results on four real datasets show that the proposed relational learning method can achieve similar prediction quality as the state-of-the-art approaches, but is significantly more efficient in training; and the induced hidden variables are semantically meaningful and crucial to improve the training speed and prediction qualities of treeRMNs.
3 0.18124224 13 nips-2010-A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction
Author: Tamir Hazan, Raquel Urtasun
Abstract: In this paper we propose an approximated structured prediction framework for large scale graphical models and derive message-passing algorithms for learning their parameters efficiently. We first relate CRFs and structured SVMs and show that in CRFs a variant of the log-partition function, known as the soft-max, smoothly approximates the hinge loss function of structured SVMs. We then propose an intuitive approximation for the structured prediction problem, using duality, based on a local entropy approximation and derive an efficient messagepassing algorithm that is guaranteed to converge. Unlike existing approaches, this allows us to learn efficiently graphical models with cycles and very large number of parameters. 1
4 0.15877354 159 nips-2010-Lifted Inference Seen from the Other Side : The Tractable Features
Author: Abhay Jha, Vibhav Gogate, Alexandra Meliou, Dan Suciu
Abstract: Lifted Inference algorithms for representations that combine first-order logic and graphical models have been the focus of much recent research. All lifted algorithms developed to date are based on the same underlying idea: take a standard probabilistic inference algorithm (e.g., variable elimination, belief propagation etc.) and improve its efficiency by exploiting repeated structure in the first-order model. In this paper, we propose an approach from the other side in that we use techniques from logic for probabilistic inference. In particular, we define a set of rules that look only at the logical representation to identify models for which exact efficient inference is possible. Our rules yield new tractable classes that could not be solved efficiently by any of the existing techniques. 1
5 0.1548252 1 nips-2010-(RF)^2 -- Random Forest Random Field
Author: Nadia Payet, Sinisa Todorovic
Abstract: We combine random forest (RF) and conditional random field (CRF) into a new computational framework, called random forest random field (RF)2 . Inference of (RF)2 uses the Swendsen-Wang cut algorithm, characterized by MetropolisHastings jumps. A jump from one state to another depends on the ratio of the proposal distributions, and on the ratio of the posterior distributions of the two states. Prior work typically resorts to a parametric estimation of these four distributions, and then computes their ratio. Our key idea is to instead directly estimate these ratios using RF. RF collects in leaf nodes of each decision tree the class histograms of training examples. We use these class histograms for a nonparametric estimation of the distribution ratios. We derive the theoretical error bounds of a two-class (RF)2 . (RF)2 is applied to a challenging task of multiclass object recognition and segmentation over a random field of input image regions. In our empirical evaluation, we use only the visual information provided by image regions (e.g., color, texture, spatial layout), whereas the competing methods additionally use higher-level cues about the horizon location and 3D layout of surfaces in the scene. Nevertheless, (RF)2 outperforms the state of the art on benchmark datasets, in terms of accuracy and computation time.
6 0.13967353 121 nips-2010-Improving Human Judgments by Decontaminating Sequential Dependencies
7 0.13012308 67 nips-2010-Dynamic Infinite Relational Model for Time-varying Relational Data Analysis
8 0.10538703 144 nips-2010-Learning Efficient Markov Networks
9 0.089752994 32 nips-2010-Approximate Inference by Compilation to Arithmetic Circuits
10 0.087230988 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning
11 0.086833484 135 nips-2010-Label Embedding Trees for Large Multi-Class Tasks
12 0.071984664 169 nips-2010-More data means less inference: A pseudo-max approach to structured learning
13 0.071829945 128 nips-2010-Infinite Relational Modeling of Functional Connectivity in Resting State fMRI
14 0.070637897 118 nips-2010-Implicit Differentiation by Perturbation
15 0.064904436 108 nips-2010-Graph-Valued Regression
16 0.063138053 164 nips-2010-MAP Estimation for Graphical Models by Likelihood Maximization
17 0.063008651 288 nips-2010-Worst-case bounds on the quality of max-product fixed-points
18 0.060173698 276 nips-2010-Tree-Structured Stick Breaking for Hierarchical Data
19 0.055066474 40 nips-2010-Beyond Actions: Discriminative Models for Contextual Group Activities
20 0.054637033 99 nips-2010-Gated Softmax Classification
topicId topicWeight
[(0, 0.18), (1, 0.059), (2, -0.014), (3, -0.021), (4, -0.109), (5, -0.045), (6, -0.022), (7, -0.005), (8, 0.106), (9, -0.008), (10, -0.198), (11, -0.018), (12, 0.075), (13, 0.019), (14, -0.009), (15, -0.086), (16, -0.052), (17, -0.065), (18, -0.049), (19, -0.085), (20, -0.099), (21, 0.019), (22, 0.171), (23, -0.112), (24, -0.108), (25, -0.01), (26, -0.048), (27, -0.045), (28, -0.06), (29, 0.053), (30, -0.171), (31, -0.239), (32, -0.208), (33, -0.128), (34, -0.119), (35, 0.118), (36, -0.033), (37, 0.09), (38, -0.056), (39, -0.043), (40, 0.039), (41, -0.029), (42, 0.018), (43, 0.001), (44, -0.092), (45, 0.029), (46, 0.127), (47, 0.13), (48, -0.081), (49, 0.088)]
simIndex simValue paperId paperTitle
same-paper 1 0.91604644 83 nips-2010-Evidence-Specific Structures for Rich Tractable CRFs
Author: Anton Chechetka, Carlos Guestrin
Abstract: We present a simple and effective approach to learning tractable conditional random fields with structure that depends on the evidence. Our approach retains the advantages of tractable discriminative models, namely efficient exact inference and arbitrarily accurate parameter learning in polynomial time. At the same time, our algorithm does not suffer a large expressive power penalty inherent to fixed tractable structures. On real-life relational datasets, our approach matches or exceeds state of the art accuracy of the dense models, and at the same time provides an order of magnitude speedup. 1
2 0.85459733 71 nips-2010-Efficient Relational Learning with Hidden Variable Detection
Author: Ni Lao, Jun Zhu, Liu Xinwang, Yandong Liu, William W. Cohen
Abstract: Markov networks (MNs) can incorporate arbitrarily complex features in modeling relational data. However, this flexibility comes at a sharp price of training an exponentially complex model. To address this challenge, we propose a novel relational learning approach, which consists of a restricted class of relational MNs (RMNs) called relation tree-based RMN (treeRMN), and an efficient Hidden Variable Detection algorithm called Contrastive Variable Induction (CVI). On one hand, the restricted treeRMN only considers simple (e.g., unary and pairwise) features in relational data and thus achieves computational efficiency; and on the other hand, the CVI algorithm efficiently detects hidden variables which can capture long range dependencies. Therefore, the resultant approach is highly efficient yet does not sacrifice its expressive power. Empirical results on four real datasets show that the proposed relational learning method can achieve similar prediction quality as the state-of-the-art approaches, but is significantly more efficient in training; and the induced hidden variables are semantically meaningful and crucial to improve the training speed and prediction qualities of treeRMNs.
3 0.7241568 159 nips-2010-Lifted Inference Seen from the Other Side : The Tractable Features
Author: Abhay Jha, Vibhav Gogate, Alexandra Meliou, Dan Suciu
Abstract: Lifted Inference algorithms for representations that combine first-order logic and graphical models have been the focus of much recent research. All lifted algorithms developed to date are based on the same underlying idea: take a standard probabilistic inference algorithm (e.g., variable elimination, belief propagation etc.) and improve its efficiency by exploiting repeated structure in the first-order model. In this paper, we propose an approach from the other side in that we use techniques from logic for probabilistic inference. In particular, we define a set of rules that look only at the logical representation to identify models for which exact efficient inference is possible. Our rules yield new tractable classes that could not be solved efficiently by any of the existing techniques. 1
4 0.60711688 121 nips-2010-Improving Human Judgments by Decontaminating Sequential Dependencies
Author: Harold Pashler, Matthew Wilder, Robert Lindsey, Matt Jones, Michael C. Mozer, Michael P. Holmes
Abstract: For over half a century, psychologists have been struck by how poor people are at expressing their internal sensations, impressions, and evaluations via rating scales. When individuals make judgments, they are incapable of using an absolute rating scale, and instead rely on reference points from recent experience. This relativity of judgment limits the usefulness of responses provided by individuals to surveys, questionnaires, and evaluation forms. Fortunately, the cognitive processes that transform internal states to responses are not simply noisy, but rather are influenced by recent experience in a lawful manner. We explore techniques to remove sequential dependencies, and thereby decontaminate a series of ratings to obtain more meaningful human judgments. In our formulation, decontamination is fundamentally a problem of inferring latent states (internal sensations) which, because of the relativity of judgment, have temporal dependencies. We propose a decontamination solution using a conditional random field with constraints motivated by psychological theories of relative judgment. Our exploration of decontamination models is supported by two experiments we conducted to obtain ground-truth rating data on a simple length estimation task. Our decontamination techniques yield an over 20% reduction in the error of human judgments. 1
5 0.55807799 67 nips-2010-Dynamic Infinite Relational Model for Time-varying Relational Data Analysis
Author: Katsuhiko Ishiguro, Tomoharu Iwata, Naonori Ueda, Joshua B. Tenenbaum
Abstract: We propose a new probabilistic model for analyzing dynamic evolutions of relational data, such as additions, deletions and split & merge, of relation clusters like communities in social networks. Our proposed model abstracts observed timevarying object-object relationships into relationships between object clusters. We extend the infinite Hidden Markov model to follow dynamic and time-sensitive changes in the structure of the relational data and to estimate a number of clusters simultaneously. We show the usefulness of the model through experiments with synthetic and real-world data sets.
6 0.55623901 144 nips-2010-Learning Efficient Markov Networks
7 0.50879419 1 nips-2010-(RF)^2 -- Random Forest Random Field
8 0.50174737 13 nips-2010-A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction
9 0.45737201 32 nips-2010-Approximate Inference by Compilation to Arithmetic Circuits
10 0.43489251 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning
11 0.41752163 215 nips-2010-Probabilistic Deterministic Infinite Automata
12 0.41710648 118 nips-2010-Implicit Differentiation by Perturbation
13 0.38340148 84 nips-2010-Exact inference and learning for cumulative distribution functions on loopy graphs
14 0.33908632 255 nips-2010-Static Analysis of Binary Executables Using Structural SVMs
15 0.33482152 211 nips-2010-Predicting Execution Time of Computer Programs Using Sparse Polynomial Regression
16 0.32993925 188 nips-2010-On Herding and the Perceptron Cycling Theorem
17 0.31809622 288 nips-2010-Worst-case bounds on the quality of max-product fixed-points
18 0.31480941 239 nips-2010-Sidestepping Intractable Inference with Structured Ensemble Cascades
19 0.30945331 126 nips-2010-Inference with Multivariate Heavy-Tails in Linear Models
20 0.30695635 9 nips-2010-A New Probabilistic Model for Rank Aggregation
topicId topicWeight
[(13, 0.026), (17, 0.018), (27, 0.046), (30, 0.051), (35, 0.016), (41, 0.018), (45, 0.258), (50, 0.066), (52, 0.031), (59, 0.292), (60, 0.033), (77, 0.033), (90, 0.025)]
simIndex simValue paperId paperTitle
1 0.86224061 4 nips-2010-A Computational Decision Theory for Interactive Assistants
Author: Alan Fern, Prasad Tadepalli
Abstract: We study several classes of interactive assistants from the points of view of decision theory and computational complexity. We first introduce a class of POMDPs called hidden-goal MDPs (HGMDPs), which formalize the problem of interactively assisting an agent whose goal is hidden and whose actions are observable. In spite of its restricted nature, we show that optimal action selection in finite horizon HGMDPs is PSPACE-complete even in domains with deterministic dynamics. We then introduce a more restricted model called helper action MDPs (HAMDPs), where the assistant’s action is accepted by the agent when it is helpful, and can be easily ignored by the agent otherwise. We show classes of HAMDPs that are complete for PSPACE and NP along with a polynomial time class. Furthermore, we show that for general HAMDPs a simple myopic policy achieves a regret, compared to an omniscient assistant, that is bounded by the entropy of the initial goal distribution. A variation of this policy is shown to achieve worst-case regret that is logarithmic in the number of goals for any goal distribution. 1
2 0.83660328 206 nips-2010-Phone Recognition with the Mean-Covariance Restricted Boltzmann Machine
Author: George Dahl, Marc'aurelio Ranzato, Abdel-rahman Mohamed, Geoffrey E. Hinton
Abstract: Straightforward application of Deep Belief Nets (DBNs) to acoustic modeling produces a rich distributed representation of speech data that is useful for recognition and yields impressive results on the speaker-independent TIMIT phone recognition task. However, the first-layer Gaussian-Bernoulli Restricted Boltzmann Machine (GRBM) has an important limitation, shared with mixtures of diagonalcovariance Gaussians: GRBMs treat different components of the acoustic input vector as conditionally independent given the hidden state. The mean-covariance restricted Boltzmann machine (mcRBM), first introduced for modeling natural images, is a much more representationally efficient and powerful way of modeling the covariance structure of speech data. Every configuration of the precision units of the mcRBM specifies a different precision matrix for the conditional distribution over the acoustic space. In this work, we use the mcRBM to learn features of speech data that serve as input into a standard DBN. The mcRBM features combined with DBNs allow us to achieve a phone error rate of 20.5%, which is superior to all published results on speaker-independent TIMIT to date. 1
3 0.80107892 254 nips-2010-Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models
Author: Han Liu, Kathryn Roeder, Larry Wasserman
Abstract: A challenging problem in estimating high-dimensional graphical models is to choose the regularization parameter in a data-dependent way. The standard techniques include K-fold cross-validation (K-CV), Akaike information criterion (AIC), and Bayesian information criterion (BIC). Though these methods work well for low-dimensional problems, they are not suitable in high dimensional settings. In this paper, we present StARS: a new stability-based method for choosing the regularization parameter in high dimensional inference for undirected graphs. The method has a clear interpretation: we use the least amount of regularization that simultaneously makes a graph sparse and replicable under random sampling. This interpretation requires essentially no conditions. Under mild conditions, we show that StARS is partially sparsistent in terms of graph estimation: i.e. with high probability, all the true edges will be included in the selected model even when the graph size diverges with the sample size. Empirically, the performance of StARS is compared with the state-of-the-art model selection procedures, including K-CV, AIC, and BIC, on both synthetic data and a real microarray dataset. StARS outperforms all these competing procedures.
same-paper 4 0.78773773 83 nips-2010-Evidence-Specific Structures for Rich Tractable CRFs
Author: Anton Chechetka, Carlos Guestrin
Abstract: We present a simple and effective approach to learning tractable conditional random fields with structure that depends on the evidence. Our approach retains the advantages of tractable discriminative models, namely efficient exact inference and arbitrarily accurate parameter learning in polynomial time. At the same time, our algorithm does not suffer a large expressive power penalty inherent to fixed tractable structures. On real-life relational datasets, our approach matches or exceeds state of the art accuracy of the dense models, and at the same time provides an order of magnitude speedup. 1
5 0.7490021 63 nips-2010-Distributed Dual Averaging In Networks
Author: Alekh Agarwal, Martin J. Wainwright, John C. Duchi
Abstract: The goal of decentralized optimization over a network is to optimize a global objective formed by a sum of local (possibly nonsmooth) convex functions using only local computation and communication. We develop and analyze distributed algorithms based on dual averaging of subgradients, and provide sharp bounds on their convergence rates as a function of the network size and topology. Our analysis clearly separates the convergence of the optimization algorithm itself from the effects of communication constraints arising from the network structure. We show that the number of iterations required by our algorithm scales inversely in the spectral gap of the network. The sharpness of this prediction is confirmed both by theoretical lower bounds and simulations for various networks. 1
6 0.71305227 207 nips-2010-Phoneme Recognition with Large Hierarchical Reservoirs
7 0.6901266 239 nips-2010-Sidestepping Intractable Inference with Structured Ensemble Cascades
8 0.68918973 13 nips-2010-A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction
9 0.68916029 287 nips-2010-Worst-Case Linear Discriminant Analysis
10 0.68812174 257 nips-2010-Structured Determinantal Point Processes
11 0.6880433 103 nips-2010-Generating more realistic images using gated MRF's
12 0.68766665 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning
13 0.68729895 174 nips-2010-Multi-label Multiple Kernel Learning by Stochastic Approximation: Application to Visual Object Recognition
14 0.68695402 224 nips-2010-Regularized estimation of image statistics by Score Matching
15 0.68689674 241 nips-2010-Size Matters: Metric Visual Search Constraints from Monocular Metadata
16 0.68683964 87 nips-2010-Extended Bayesian Information Criteria for Gaussian Graphical Models
17 0.68607581 217 nips-2010-Probabilistic Multi-Task Feature Selection
18 0.68600988 176 nips-2010-Multiple Kernel Learning and the SMO Algorithm
19 0.68595821 185 nips-2010-Nonparametric Density Estimation for Stochastic Optimization with an Observable State Variable
20 0.68584025 12 nips-2010-A Primal-Dual Algorithm for Group Sparse Regularization with Overlapping Groups