nips nips2010 nips2010-83 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Anton Chechetka, Carlos Guestrin
Abstract: We present a simple and effective approach to learning tractable conditional random fields with structure that depends on the evidence. Our approach retains the advantages of tractable discriminative models, namely efficient exact inference and arbitrarily accurate parameter learning in polynomial time. At the same time, our algorithm does not suffer a large expressive power penalty inherent to fixed tractable structures. On real-life relational datasets, our approach matches or exceeds state of the art accuracy of the dense models, and at the same time provides an order of magnitude speedup. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: We present a simple and effective approach to learning tractable conditional random fields with structure that depends on the evidence. [sent-5, score-0.302]
2 Our approach retains the advantages of tractable discriminative models, namely efficient exact inference and arbitrarily accurate parameter learning in polynomial time. [sent-6, score-0.238]
3 At the same time, our algorithm does not suffer a large expressive power penalty inherent to fixed tractable structures. [sent-7, score-0.18]
4 On real-life relational datasets, our approach matches or exceeds state of the art accuracy of the dense models, and at the same time provides an order of magnitude speedup. [sent-8, score-0.436]
5 A key advantage of CRFs over other probabilistic graphical models (PGMs, [3]) stems from the observation that in almost all applications, some variables are unknown at test time (we will denote such variables X ), but others, called the evidence E, are known at test time. [sent-10, score-0.151]
6 The discriminative approach adopted by CRFs allows for better approximation quality of the learned conditional distribution P (X | E), because the representational power of the model is not “wasted” on modeling P (E). [sent-12, score-0.218]
7 To overcome the extra computational challenges posed by conditional random fields, practitioners usually resort to several of the following approximations throughout the process: • CRF structure is specified by hand, leading to suboptimal structures. [sent-17, score-0.234]
8 • Approximate inference at test time yields suboptimal results [5]. [sent-19, score-0.150]
9-10 • Replacing the CRF conditional likelihood objective with a more tractable one (e.g. [6]) results in suboptimal models (both in terms of learned structure and parameters). [sent-20, score-0.205] [sent-22, score-0.166]
11 For such models, parameter learning and inference can be done exactly; only structure learning involves approximations. [sent-25, score-0.166]
12 Generalize the Chow-Liu algorithm [7] to learn evidence-specific structures for tree CRFs. [sent-31, score-0.165]
13 Generalize tree CRFs with evidence-specific structure (ESS-CRFs) to the relational setting. [sent-32, score-0.494]
14 Demonstrate empirically the superior performance of ESS-CRFs over densely connected models in terms of both accuracy and runtime on real-life relational models. [sent-33, score-0.406]
15 Given the features f and training data D that consists of fully observed assignments to X and E, the optimal feature weights w∗ maximize the conditional log-likelihood (CLLH) of the data: $w^* = \arg\max_w \sum_{(X,E) \in D} \log P(X \mid E, w) = \arg\max_w \sum_{(X,E) \in D} \big( \sum_{ij} \sum_k w_{ijk} f_{ijk}(X_i, X_j, E) - \log Z(E, w) \big)$. [sent-38, score-0.696]
16 Moreover, $\frac{\partial \log P(X \mid E, w)}{\partial w_{ijk}} = f_{ijk}(X_i, X_j, E) - \mathbb{E}_{P(X_i, X_j \mid E, w)}\left[ f_{ijk}(X_i, X_j, E) \right]$, where $\mathbb{E}_P$ denotes expectation with respect to a distribution P. [sent-40, score-0.252]
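A minimal sketch of this gradient computation, assuming illustrative array layouts (a feature tensor indexed by the joint state of the edge, and an exact edge marginal supplied by inference); the names are not from the paper's code:

```python
import numpy as np

def cllh_gradient_edge(f_ij, edge_marginal, x_i, x_j):
    # Gradient (3) of the conditional log-likelihood w.r.t. the weight vector w_ij.
    # f_ij[a, b] is the feature vector f_ij.(a, b, E) for every joint state (a, b),
    # edge_marginal[a, b] is P(X_i = a, X_j = b | E, w) obtained by inference,
    # and (x_i, x_j) is the observed assignment in the training datapoint.
    observed = f_ij[x_i, x_j]                                # f_ijk(X_i, X_j, E)
    expected = np.einsum('ab,abk->k', edge_marginal, f_ij)   # E_P[f_ijk(X_i, X_j, E)]
    return observed - expected
```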
17 However, the gradient (3) contains the conditional distribution over (Xi, Xj), so computing (3) requires inference in the model for every datapoint. [sent-42, score-0.204]
18 The time complexity of exact inference is exponential in the treewidth of the graph defined by the edges T [5]. [sent-43, score-0.312]
19 Therefore, exact evaluation of the CLLH objective (2) and gradient (3), and exact inference at test time, are only feasible for models with low-treewidth T. [sent-44, score-0.272]
20 Unfortunately, restricting the space of models to only those with low treewidth severely decreases the expressive power of CRFs. [sent-45, score-0.292]
21 Complex dependencies of real-life distributions usually cannot be adequately captured by a single tree-like structure, so most of the models used in practice have high treewidth, making exact inference infeasible. [sent-46, score-0.199]
22 Approximate inference is NP-hard [5], so approximate inference algorithms have very few result quality guarantees. [sent-52, score-0.223]
23 Greater expressive power of the models is thus obtained at the expense of worse quality of estimated parameters and inference. [sent-53, score-0.185]
24 Here, we show an alternative way to increase expressive power of tree-like structured CRFs without sacrificing optimal weights learning and exact inference at test time. [sent-54, score-0.341]
25 In practice, our approach is much better suited for relational than for propositional settings, because of the much higher parameter dimensionality in the propositional case. [sent-55, score-0.738]
26 However, we first present the theory in detail for the propositional case to better convey the key high-level ideas. [sent-56, score-0.203]
27 3 Evidence-specific structure for CRFs Observe that, given a particular evidence value E, the set of edges T in the CRF formulation (1) actually can be viewed as a supergraph of the conditional model over X . [sent-57, score-0.279]
28 If T (E) has low treewidth for all values of E, inference and parameter learning using the effective structure are tractable, even if a priori structure T has high treewidth. [sent-62, score-0.44]
29 Unfortunately, in practice the treewidth of T (E) is usually not much smaller than the treewidth of T. [sent-63, score-0.298]
30 Low-treewidth effective structures are rarely used, because treewidth is a global property of the graph (even computing treewidth is NP-complete [13]), while feature design is a local process. [sent-64, score-0.458]
31 Achieving low treewidth for the effective structures requires elaborate feature design, making model construction very difficult. [sent-66, score-0.332]
32 Instead, in this work, we separate the construction of low-treewidth effective structures from feature design and weight learning, to combine the advantages of exact inference and discriminative weight learning, the high expressive power of high-treewidth models, and local feature design. [sent-67, score-0.560]
33 Observe that the CRF definition (1) can be written equivalently as $P(X \mid E, w) = Z^{-1}(E, w) \exp\big( \sum_{ij} \sum_k w_{ijk} \cdot \mathbb{I}((i,j) \in T) \cdot f_{ijk}(X_i, X_j, E) \big)$. [sent-68, score-0.386]
34 In addition to the feature values f, the effective structure of the model is now controlled by the indicator functions I(·). [sent-70, score-0.157]
35 These indicator functions provide us with a way to control the treewidth of the effective structures independently of the features. [sent-71, score-0.277]
36 The resulting model, which we call a CRF with evidence-specific structure (ESS-CRF), defines a conditional distribution P(X | E, w, u) as follows: $P(X \mid E, w, u) = Z^{-1}(E, w, u) \exp\big( \sum_{ij} \sum_k w_{ijk} \cdot \mathbb{I}((i,j) \in T(E, u)) \cdot f_{ijk}(X_i, X_j, E) \big)$. [sent-75, score-0.617]
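A minimal sketch of the unnormalized ESS-CRF log-score in (5); `tree_structure`, `features` and the weight indexing are hypothetical stand-ins for T(E, u), f and w, not the paper's code:

```python
import numpy as np

def ess_crf_log_score(X, E, w, features, tree_structure, u):
    # Unnormalized log-score of an assignment X under the ESS-CRF model (5).
    # tree_structure(E, u) returns the evidence-specific edge set T(E, u);
    # features[(i, j)] returns the feature vector f_ij.(X_i, X_j, E).
    score = 0.0
    for (i, j) in tree_structure(E, u):          # only edges with I((i,j) in T(E,u)) = 1
        f_ij = features[(i, j)](X[i], X[j], E)   # feature vector f_ijk(X_i, X_j, E)
        score += np.dot(w[(i, j)], f_ij)         # sum_k w_ijk * f_ijk
    return score                                 # equals log P(X | E, w, u) + log Z(E, w, u)
```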
37 ESS-CRFs have an important advantage over the traditional parametrization: in (5) the parameters u that determine the model structure are decoupled from the feature weights w. [sent-78, score-0.22]
38 Structure selection (i.e., optimizing u) can be decoupled from feature selection (choosing f) and feature weight learning (optimizing w). [sent-81, score-0.164]
39 Such a decoupling makes it much easier to guarantee that the effective structure of the model has low treewidth by relegating all the necessary global computation to the structure construction algorithm T = T (E, u). [sent-82, score-0.385]
40 Step 2: Optimize weights w to maximize the conditional LLH (2) of the training data. [sent-84, score-0.202]
41 Steps 3-4: for each E in the test data, use the conditional model (1) to define the conditional distribution P(X | E, w). [sent-86, score-0.298]
42 Then use approximate inference to compute the marginals or the most likely assignment to X. [sent-87, score-0.180]
43 Algorithm 2 (CRF with evidence-specific structures approach), Step 1: Define features fijk(Xi, Xj, E). [sent-88, score-0.375]
44 Step 2: Optimize weights w to maximize the conditional LLH log P(X | E, u, w) of the training data. [sent-92, score-0.202]
45 Use exact inference to compute the CLLH objective (2) and gradient (3). [sent-93, score-0.153]
46 Steps 3-4: for each E in the test data, use the conditional model (5) to define the conditional distribution P(X | E, w, u). [sent-94, score-0.298]
47 Then use exact inference to compute the marginals or the most likely assignment to X. [sent-95, score-0.208]
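The contrast with the dense pipeline above can be summarized in a short skeleton of Algorithm 2, with its stages passed in as callables; all names and signatures below are illustrative assumptions, not the paper's code:

```python
def ess_crf_pipeline(train_data, test_evidence, learn_u, learn_w, build_tree, infer):
    # Skeleton of Algorithm 2.  learn_u and learn_w are assumed to fit the structure
    # parameters u and the feature weights w; build_tree(E, u) is assumed to return a
    # low-treewidth evidence-specific tree T(E, u); infer runs exact tree inference.
    u = learn_u(train_data)                 # structure-selection stage (e.g. Alg. 3 below)
    w = learn_w(train_data, u)              # maximize conditional LLH (2) with exact gradients
    results = []
    for E in test_evidence:
        T = build_tree(E, u)                # evidence-specific structure for this datapoint
        results.append(infer(T, w, E))      # exact marginals or Viterbi decoding
    return results
```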
48 Also, $\frac{\partial \log P(X \mid E, w, u)}{\partial w_{ijk}} = \mathbb{I}((i,j) \in T(E,u)) \big( f_{ijk}(X_i, X_j, E) - \mathbb{E}_{P(X_i, X_j \mid E, w, u)}[f_{ijk}(X_i, X_j, E)] \big)$. [sent-97, score-0.252]
49 4 Conditional Chow-Liu algorithm for tractable evidence-specific structures Learning the most likely PGM structure from data is in most cases intractable. [sent-104, score-0.274]
50 Unlike tree CRFs, however, the likelihood of tree MRF structures decomposes into contributions of individual edges: $LLH(T) = \sum_{(i,j) \in T} I(X_i, X_j) - \sum_{X_i \in X} H(X_i)$ (7), where I(·, ·) is the mutual information and H(·) is entropy. [sent-111, score-0.267]
51 Therefore, as shown in [7], the most likely structure can be obtained by taking the maximum spanning tree of a fully connected graph, where the weight of an edge ij is I(Xi , Xj ). [sent-112, score-0.254]
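A compact sketch of that construction for binary variables, using empirical mutual information and a maximum spanning tree (the paper's setting is more general; helper choices here are assumptions):

```python
import numpy as np
import networkx as nx

def chow_liu_tree(data):
    # Chow-Liu [7]: maximum spanning tree with edge weights I(X_i, X_j).
    # data is an (n_samples, n_vars) array of 0/1 assignments.
    n, d = data.shape
    G = nx.Graph()
    for i in range(d):
        for j in range(i + 1, d):
            joint = np.zeros((2, 2))
            for a in (0, 1):
                for b in (0, 1):
                    joint[a, b] = np.mean((data[:, i] == a) & (data[:, j] == b))
            outer = np.outer(joint.sum(axis=1), joint.sum(axis=0))
            nz = joint > 0
            mi = float(np.sum(joint[nz] * np.log(joint[nz] / outer[nz])))
            G.add_edge(i, j, weight=mi)       # empirical mutual information I(X_i, X_j)
    return list(nx.maximum_spanning_tree(G).edges())
```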
52 Given a concrete value of the evidence E, one can write down the conditional version of the tree structure likelihood (7) for that particular value of evidence: $LLH(T \mid E) = \sum_{(i,j) \in T} I_{P(\cdot \mid E)}(X_i, X_j) - \sum_{X_i \in X} H_{P(\cdot \mid E)}(X_i)$ (8). [sent-114, score-0.349]
53 If exact conditional distributions P(Xi, Xj | E) were available, then the same Chow-Liu algorithm would find the optimal conditional structure. [sent-115, score-0.261]
54 However, we can still plug in approximate conditionals P(· | E) learned from data using any standard density estimation technique (Algorithm 3: conditional Chow-Liu algorithm for learning evidence-specific tree structures; in its parameter learning stage, the edge conditionals are fit with L-BFGS using the gradient (12)). [sent-117, score-0.19]
55 In particular, with the same features fijk that are used in the CRF model, one can train a logistic regression model for P(· | E): $P(X_i, X_j \mid E, u_{ij}) = Z_{ij}^{-1}(E, u_{ij}) \exp\big( \sum_k u_{ijk} f_{ijk}(X_i, X_j, E) \big)$. [sent-123, score-0.66]
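A sketch of the resulting conditional Chow-Liu procedure, assuming binary variables and using the raw evidence vector as the pairwise features; the logistic-regression conditionals play the role of P(X_i, X_j | E, u_ij) in (9), and all other choices are illustrative assumptions:

```python
import numpy as np
import networkx as nx
from itertools import combinations
from sklearn.linear_model import LogisticRegression

def fit_edge_conditionals(X, E):
    # One multinomial logistic regression per variable pair, modeling P(X_i, X_j | E);
    # the default lbfgs solver corresponds to the L-BFGS fit in Alg. 3.
    models = {}
    for i, j in combinations(range(X.shape[1]), 2):
        y = 2 * X[:, i] + X[:, j]                               # joint state of (X_i, X_j)
        models[(i, j)] = LogisticRegression(max_iter=1000).fit(E, y)
    return models

def conditional_chow_liu(e, models, n_vars):
    # Evidence-specific structure T(E, u): maximum spanning tree whose edge weights
    # are conditional mutual informations computed from the estimated P(X_i, X_j | E).
    G = nx.Graph()
    for (i, j), m in models.items():
        p = np.zeros(4)
        p[m.classes_] = m.predict_proba(e.reshape(1, -1))[0]    # P(X_i, X_j | E = e)
        joint = p.reshape(2, 2)
        outer = np.outer(joint.sum(axis=1), joint.sum(axis=0))
        nz = joint > 0
        cmi = float(np.sum(joint[nz] * np.log(joint[nz] / outer[nz])))
        G.add_edge(i, j, weight=cmi)
    return list(nx.maximum_spanning_tree(G).edges())
```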
56 Other structure learning algorithms, e.g. [8, 15] for higher-treewidth junction trees, can be used as components in the same way as the Chow-Liu algorithm is used in Alg. 3. [sent-134, score-0.192]
57 5 Relational CRFs with evidence-specific structure Traditional (also called propositional) PGMs are not well suited for dealing with relational data, where every variable is an entity of some type, and entities are related to each other via different types of links. [sent-136, score-0.407]
58 The i.i.d. data assumption of traditional PGMs and the huge dimensionalities of relational datasets preclude learning meaningful propositional models. [sent-142, score-0.535]
59 Instead, several formulations of relational PGMs have been proposed [16] to work with relational data, including relational CRFs. [sent-143, score-1.019]
60 More concretely, in relational CRFs every variable Xi is assigned a type mi out of the set M of possible types. [sent-145, score-0.332]
61 A binary relation R ∈ R, corresponding to a specific type of link between two variables, specifies the types of its input arguments, a set of features $f^R_k(\cdot, \cdot, E)$, and feature weights $w^R_k$. [sent-146, score-0.249]
62 By accounting for parameter sharing, it is straightforward to adapt our ESS-CRF formulation to the relational setting. [sent-149, score-0.332]
63 We define the relational ESS-CRF conditional distribution as $P(X \mid E, \mathcal{R}, w, u) \propto \exp\big( \sum_{R \in \mathcal{R}} \; \sum_{X_i, X_j \in inst(R, X)} \mathbb{I}((i,j) \in T(E,u)) \sum_k w^R_k f^R_k(X_i, X_j, E) \big)$ (11). (Footnote 3: notice that the approximation error from P(· | E) is the only source of approximations in our whole approach.) [sent-150, score-0.515]
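A sketch of the relational score in (11) that makes the parameter tying explicit; the data structures below are illustrative assumptions, not the paper's code:

```python
import numpy as np

def relational_ess_crf_log_score(X, E, relations, w, tree_edges):
    # Unnormalized log-score of the relational ESS-CRF (11).
    # relations[R] is assumed to hold (inst(R, X), feature function f^R), and w[R] is
    # the single templated weight vector w^R shared by every pair that R instantiates,
    # so only O(|R|) weight vectors exist regardless of the dataset size.
    active = set(tree_edges)                       # evidence-specific edge set T(E, u)
    score = 0.0
    for R, (pairs, feat) in relations.items():
        for (i, j) in pairs:                       # (X_i, X_j) in inst(R, X)
            if (i, j) in active:                   # indicator I((i, j) in T(E, u))
                score += np.dot(w[R], feat(X[i], X[j], E))
    return score
```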
64 Given the structure learning algorithm T(·, ·) that is guaranteed to return low-treewidth structures, one can learn optimal feature weights w∗ and perform inference at test time exactly. Observation 4: the relational ESS-CRF log-likelihood is concave with respect to w. [sent-160, score-0.328]
65 The conditional Chow-Liu algorithm (Alg. 3) can also be extended to the relational setting by using templated logistic regression weights for estimating edge conditionals. [sent-163, score-0.436]
66 In the relational setting, one only needs to learn O(|R|) parameters, regardless of the dataset size, for both structure selection and feature weights, as opposed to O(|X |2 ) parameters for the propositional case. [sent-169, score-0.655]
67 Thus, relational ESS-CRFs are typically much less prone to overfitting than propositional ones. [sent-170, score-0.535]
68 6 Experiments We have tested the ESS-CRF approach on both propositional and relational data. [sent-171, score-0.535]
69 With the large number of parameters needed for the propositional case (O(|X |2 )), our approach is only practical for cases of abundant data. [sent-172, score-0.203]
70 So our experiments with propositional data serve only to prove the concept, verifying that ESS-CRF can successfully learn a model better than a single tree baseline. [sent-173, score-0.277]
71 In contrast to the propositional settings, in the relational cases the relatively low parameter space dimensionality (O(|R|2 )) almost eliminates the overfitting problem. [sent-174, score-0.535]
72 As a result, on relational datasets ESS-CRF is a very attractive approach in practice. [sent-175, score-0.332]
73 Our experiments show ESS-CRFs comfortably outperforming state of the art high-treewidth discriminative models on several real-life relational datasets. [sent-176, score-0.429]
74 6.1 Propositional models We compare ESS-CRFs with fixed tree CRFs, where the tree structure is learned by the Chow-Liu algorithm using P(X). [sent-178, score-0.272]
75 The first model, called FACES, aims to improve face recognition in collections of related images using information about similarity between different faces in addition to the standard single-face features. [sent-192, score-0.188]
76 Pairwise features f (Xi , Xj , E), based on blob color similarity, indicate how close two faces are in appearance. [sent-195, score-0.169]
77 The model is used in a semi-supervised way: at test time, a PGM is instantiated jointly over the train and test entities, values of the train entities are fixed to the ground truth, and inference finds the (approximately) most likely labels for the test entities. [sent-197, score-0.303]
78 We compare ESS-CRFs with a dense relational PGM encoded by a Markov logic network (MLN, [20]) using the same features. [sent-221, score-0.394]
79 For the MLN, we had to threshold the pairwise features indicating the likelihood of label agreement and set those under the threshold to 0 to prevent (a) oversmoothing and (b) very long inference times. [sent-223, score-0.221]
80 Also, to prevent oversmoothing by the MLN, we have found it useful to scale down the pairwise feature weights learned during training, thus weakening the smoothing effect of any single edge in the model. [sent-224, score-0.219]
81 We compare ESS-CRFs to high-treewidth relational Markov networks (RMNs, [23]), max-margin Markov networks (M3Ns, [24]) and a standalone SVM classifier. [sent-235, score-0.366]
82 All the relational PGMs use the same single-variable features encoding the webpage text, and pairwise features encoding the link structure. [sent-236, score-0.478]
83 RMNs and ESS-CRFs are trained to maximize the conditional likelihood of the labels, while M3Ns maximize the margin in likelihood between the correct assignment and all of the incorrect ones, explicitly targeting the classification. [sent-238, score-0.235]
84 Observe that ESS-CRF matches the accuracy of high-treewidth RMNs, again showing that the smaller expressive power of tree models can be fully compensated for by exact parameter learning and inference. [sent-241, score-0.308]
85 Still, the RMN results indicate that it may be possible to match the M3N accuracy with much faster tractable ESS models by replacing the CRF conditional likelihood objective with the max-margin objective, which is an important direction of future work. [sent-251, score-0.279]
86 Two cornerstones of our ESS-CRF approach, namely using models that become more sparse when evidence is instantiated, and using multiple tractable models to avoid restrictions on the expressive power inherent to low-treewidth models, have been discussed in the existing literature. [sent-254, score-0.307]
87 However, so far CSI has been treated as a local property of the model, which made reasoning about the resulting treewidth of evidence-specific models impossible. [sent-256, score-0.185]
88 Thus, the full potential of exact inference for models with CSI remained unused. [sent-257, score-0.167]
89 Unlike the mixture models, our approach of selecting a single structure for any given evidence value has the advantage of allowing for efficient exact decoding of the most probable assignment to the unknowns X using the Viterbi algorithm [29]. [sent-262, score-0.248]
90 Both for the mixture models and our approach, joint optimization of the structure and weights (u and w in our notation) is infeasible due to many local optima of the objective. [sent-263, score-0.197]
91 Learning the CRF structure in general is NP-hard, which follows from the hardness results for generative models. [sent-266, score-0.179]
92 Moreover, CRF structure learning is further complicated by the fact that the CRF structure likelihood does not decompose into scores of local graph components, as do the scores for some generative models [3]. [sent-269, score-0.333]
93 In practice, the hardness of CRF structure learning leads to high popularity of heuristics: chain and skip-chain [32] structures are often used, as well as grid-like structures. [sent-271, score-0.207]
94 Finally, one can try to approximate the CRF structure score as a combination of local scores [15, 4] and use an algorithm for learning generative structures (where the score actually decomposes). [sent-277, score-0.264]
95 Learning the weights is straightforward for tractable CRFs, because the log-likelihood is concave [1] and the gradient (3) can be used with mature convex optimization techniques. [sent-280, score-0.195]
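A small sketch of that weight-learning step, assuming a caller-supplied function that returns the negated CLLH (2) and its negated gradient (3); the function name and signature are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def learn_weights(neg_cllh_and_grad, w0):
    # Maximize the concave CLLH (2) with L-BFGS.  neg_cllh_and_grad(w) is assumed to
    # return (-CLLH(w), -gradient (3)), with the gradient computed by exact inference
    # in the tractable model; concavity means the optimum found is global.
    res = minimize(neg_cllh_and_grad, np.asarray(w0, dtype=float),
                   jac=True, method='L-BFGS-B')
    return res.x
```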
96 For dense structures, computing the gradient (3) exactly is intractable as even approximate inference in general models is NP-hard [5]. [sent-282, score-0.2]
97 As a result, approximate inference techniques, such as belief propagation [10, 11] or Gibbs sampling [12] are employed, without guarantees on the quality of the result. [sent-283, score-0.168]
98 Our experiments showed that exact weight learning for tractable models gives an advantage in approximation quality and efficiency over dense structures. [sent-287, score-0.243]
99 Our approach (Alg. 2) has just one source of approximation, namely the conditional structure scores. [sent-292, score-0.192]
100 We have demonstrated on real-life relational datasets that our approach matches or exceeds the accuracy of state-of-the-art dense discriminative models, and at the same time provides more than an order of magnitude speedup. [sent-293, score-0.470]
wordName wordTfidf (topN-words)
[('crf', 0.465), ('mln', 0.392), ('relational', 0.332), ('crfs', 0.265), ('fijk', 0.252), ('propositional', 0.203), ('treewidth', 0.149), ('ess', 0.138), ('faces', 0.137), ('wijk', 0.134), ('llh', 0.117), ('xj', 0.109), ('conditional', 0.104), ('cllh', 0.101), ('structures', 0.091), ('structure', 0.088), ('pgm', 0.084), ('pgms', 0.084), ('xi', 0.082), ('inference', 0.078), ('tree', 0.074), ('tractable', 0.073), ('weights', 0.073), ('expressive', 0.069), ('foreach', 0.06), ('rmn', 0.059), ('inst', 0.059), ('logp', 0.059), ('evidence', 0.055), ('entities', 0.053), ('exact', 0.053), ('csi', 0.05), ('webkb', 0.05), ('pairwise', 0.049), ('fk', 0.046), ('chechetka', 0.044), ('junction', 0.043), ('quality', 0.042), ('suboptimal', 0.042), ('rmns', 0.041), ('dense', 0.039), ('ij', 0.039), ('chow', 0.038), ('accuracy', 0.038), ('power', 0.038), ('guestrin', 0.038), ('effective', 0.037), ('models', 0.036), ('uij', 0.036), ('webpages', 0.036), ('elds', 0.036), ('seconds', 0.035), ('discriminative', 0.034), ('alchemy', 0.034), ('oversmoothing', 0.034), ('standalone', 0.034), ('scores', 0.033), ('wk', 0.033), ('link', 0.033), ('dependencies', 0.032), ('edges', 0.032), ('ep', 0.032), ('features', 0.032), ('people', 0.032), ('feature', 0.032), ('edge', 0.031), ('marginals', 0.03), ('train', 0.03), ('test', 0.03), ('trees', 0.03), ('face', 0.029), ('hardness', 0.028), ('likelihood', 0.028), ('mrfs', 0.028), ('art', 0.027), ('dash', 0.027), ('unknowns', 0.027), ('alg', 0.027), ('decoupled', 0.027), ('traffic', 0.027), ('concave', 0.027), ('generative', 0.027), ('abnormality', 0.025), ('mixtures', 0.025), ('approximate', 0.025), ('maximize', 0.025), ('assignment', 0.025), ('uai', 0.024), ('markov', 0.023), ('logic', 0.023), ('belief', 0.023), ('construction', 0.023), ('formulations', 0.023), ('images', 0.022), ('likely', 0.022), ('max', 0.022), ('entity', 0.022), ('xr', 0.022), ('gradient', 0.022), ('richardson', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999958 83 nips-2010-Evidence-Specific Structures for Rich Tractable CRFs
Author: Anton Chechetka, Carlos Guestrin
Abstract: We present a simple and effective approach to learning tractable conditional random fields with structure that depends on the evidence. Our approach retains the advantages of tractable discriminative models, namely efficient exact inference and arbitrarily accurate parameter learning in polynomial time. At the same time, our algorithm does not suffer a large expressive power penalty inherent to fixed tractable structures. On real-life relational datasets, our approach matches or exceeds state of the art accuracy of the dense models, and at the same time provides an order of magnitude speedup. 1
2 0.19921711 71 nips-2010-Efficient Relational Learning with Hidden Variable Detection
Author: Ni Lao, Jun Zhu, Liu Xinwang, Yandong Liu, William W. Cohen
Abstract: Markov networks (MNs) can incorporate arbitrarily complex features in modeling relational data. However, this flexibility comes at a sharp price of training an exponentially complex model. To address this challenge, we propose a novel relational learning approach, which consists of a restricted class of relational MNs (RMNs) called relation tree-based RMN (treeRMN), and an efficient Hidden Variable Detection algorithm called Contrastive Variable Induction (CVI). On one hand, the restricted treeRMN only considers simple (e.g., unary and pairwise) features in relational data and thus achieves computational efficiency; and on the other hand, the CVI algorithm efficiently detects hidden variables which can capture long range dependencies. Therefore, the resultant approach is highly efficient yet does not sacrifice its expressive power. Empirical results on four real datasets show that the proposed relational learning method can achieve similar prediction quality as the state-of-the-art approaches, but is significantly more efficient in training; and the induced hidden variables are semantically meaningful and crucial to improve the training speed and prediction qualities of treeRMNs.
3 0.18124224 13 nips-2010-A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction
Author: Tamir Hazan, Raquel Urtasun
Abstract: In this paper we propose an approximated structured prediction framework for large scale graphical models and derive message-passing algorithms for learning their parameters efficiently. We first relate CRFs and structured SVMs and show that in CRFs a variant of the log-partition function, known as the soft-max, smoothly approximates the hinge loss function of structured SVMs. We then propose an intuitive approximation for the structured prediction problem, using duality, based on a local entropy approximation and derive an efficient messagepassing algorithm that is guaranteed to converge. Unlike existing approaches, this allows us to learn efficiently graphical models with cycles and very large number of parameters. 1
4 0.15877354 159 nips-2010-Lifted Inference Seen from the Other Side : The Tractable Features
Author: Abhay Jha, Vibhav Gogate, Alexandra Meliou, Dan Suciu
Abstract: Lifted Inference algorithms for representations that combine first-order logic and graphical models have been the focus of much recent research. All lifted algorithms developed to date are based on the same underlying idea: take a standard probabilistic inference algorithm (e.g., variable elimination, belief propagation etc.) and improve its efficiency by exploiting repeated structure in the first-order model. In this paper, we propose an approach from the other side in that we use techniques from logic for probabilistic inference. In particular, we define a set of rules that look only at the logical representation to identify models for which exact efficient inference is possible. Our rules yield new tractable classes that could not be solved efficiently by any of the existing techniques. 1
5 0.1548252 1 nips-2010-(RF)^2 -- Random Forest Random Field
Author: Nadia Payet, Sinisa Todorovic
Abstract: We combine random forest (RF) and conditional random field (CRF) into a new computational framework, called random forest random field (RF)2 . Inference of (RF)2 uses the Swendsen-Wang cut algorithm, characterized by MetropolisHastings jumps. A jump from one state to another depends on the ratio of the proposal distributions, and on the ratio of the posterior distributions of the two states. Prior work typically resorts to a parametric estimation of these four distributions, and then computes their ratio. Our key idea is to instead directly estimate these ratios using RF. RF collects in leaf nodes of each decision tree the class histograms of training examples. We use these class histograms for a nonparametric estimation of the distribution ratios. We derive the theoretical error bounds of a two-class (RF)2 . (RF)2 is applied to a challenging task of multiclass object recognition and segmentation over a random field of input image regions. In our empirical evaluation, we use only the visual information provided by image regions (e.g., color, texture, spatial layout), whereas the competing methods additionally use higher-level cues about the horizon location and 3D layout of surfaces in the scene. Nevertheless, (RF)2 outperforms the state of the art on benchmark datasets, in terms of accuracy and computation time.
6 0.13967353 121 nips-2010-Improving Human Judgments by Decontaminating Sequential Dependencies
7 0.13012308 67 nips-2010-Dynamic Infinite Relational Model for Time-varying Relational Data Analysis
8 0.10538703 144 nips-2010-Learning Efficient Markov Networks
9 0.089752994 32 nips-2010-Approximate Inference by Compilation to Arithmetic Circuits
10 0.087230988 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning
11 0.086833484 135 nips-2010-Label Embedding Trees for Large Multi-Class Tasks
12 0.071984664 169 nips-2010-More data means less inference: A pseudo-max approach to structured learning
13 0.071829945 128 nips-2010-Infinite Relational Modeling of Functional Connectivity in Resting State fMRI
14 0.070637897 118 nips-2010-Implicit Differentiation by Perturbation
15 0.064904436 108 nips-2010-Graph-Valued Regression
16 0.063138053 164 nips-2010-MAP Estimation for Graphical Models by Likelihood Maximization
17 0.063008651 288 nips-2010-Worst-case bounds on the quality of max-product fixed-points
18 0.060173698 276 nips-2010-Tree-Structured Stick Breaking for Hierarchical Data
19 0.055066474 40 nips-2010-Beyond Actions: Discriminative Models for Contextual Group Activities
20 0.054637033 99 nips-2010-Gated Softmax Classification
topicId topicWeight
[(0, 0.18), (1, 0.059), (2, -0.014), (3, -0.021), (4, -0.109), (5, -0.045), (6, -0.022), (7, -0.005), (8, 0.106), (9, -0.008), (10, -0.198), (11, -0.018), (12, 0.075), (13, 0.019), (14, -0.009), (15, -0.086), (16, -0.052), (17, -0.065), (18, -0.049), (19, -0.085), (20, -0.099), (21, 0.019), (22, 0.171), (23, -0.112), (24, -0.108), (25, -0.01), (26, -0.048), (27, -0.045), (28, -0.06), (29, 0.053), (30, -0.171), (31, -0.239), (32, -0.208), (33, -0.128), (34, -0.119), (35, 0.118), (36, -0.033), (37, 0.09), (38, -0.056), (39, -0.043), (40, 0.039), (41, -0.029), (42, 0.018), (43, 0.001), (44, -0.092), (45, 0.029), (46, 0.127), (47, 0.13), (48, -0.081), (49, 0.088)]
simIndex simValue paperId paperTitle
same-paper 1 0.91604644 83 nips-2010-Evidence-Specific Structures for Rich Tractable CRFs
Author: Anton Chechetka, Carlos Guestrin
Abstract: We present a simple and effective approach to learning tractable conditional random fields with structure that depends on the evidence. Our approach retains the advantages of tractable discriminative models, namely efficient exact inference and arbitrarily accurate parameter learning in polynomial time. At the same time, our algorithm does not suffer a large expressive power penalty inherent to fixed tractable structures. On real-life relational datasets, our approach matches or exceeds state of the art accuracy of the dense models, and at the same time provides an order of magnitude speedup. 1
2 0.85459733 71 nips-2010-Efficient Relational Learning with Hidden Variable Detection
Author: Ni Lao, Jun Zhu, Liu Xinwang, Yandong Liu, William W. Cohen
Abstract: Markov networks (MNs) can incorporate arbitrarily complex features in modeling relational data. However, this flexibility comes at a sharp price of training an exponentially complex model. To address this challenge, we propose a novel relational learning approach, which consists of a restricted class of relational MNs (RMNs) called relation tree-based RMN (treeRMN), and an efficient Hidden Variable Detection algorithm called Contrastive Variable Induction (CVI). On one hand, the restricted treeRMN only considers simple (e.g., unary and pairwise) features in relational data and thus achieves computational efficiency; and on the other hand, the CVI algorithm efficiently detects hidden variables which can capture long range dependencies. Therefore, the resultant approach is highly efficient yet does not sacrifice its expressive power. Empirical results on four real datasets show that the proposed relational learning method can achieve similar prediction quality as the state-of-the-art approaches, but is significantly more efficient in training; and the induced hidden variables are semantically meaningful and crucial to improve the training speed and prediction qualities of treeRMNs.
3 0.7241568 159 nips-2010-Lifted Inference Seen from the Other Side : The Tractable Features
Author: Abhay Jha, Vibhav Gogate, Alexandra Meliou, Dan Suciu
Abstract: Lifted Inference algorithms for representations that combine first-order logic and graphical models have been the focus of much recent research. All lifted algorithms developed to date are based on the same underlying idea: take a standard probabilistic inference algorithm (e.g., variable elimination, belief propagation etc.) and improve its efficiency by exploiting repeated structure in the first-order model. In this paper, we propose an approach from the other side in that we use techniques from logic for probabilistic inference. In particular, we define a set of rules that look only at the logical representation to identify models for which exact efficient inference is possible. Our rules yield new tractable classes that could not be solved efficiently by any of the existing techniques. 1
4 0.60711688 121 nips-2010-Improving Human Judgments by Decontaminating Sequential Dependencies
Author: Harold Pashler, Matthew Wilder, Robert Lindsey, Matt Jones, Michael C. Mozer, Michael P. Holmes
Abstract: For over half a century, psychologists have been struck by how poor people are at expressing their internal sensations, impressions, and evaluations via rating scales. When individuals make judgments, they are incapable of using an absolute rating scale, and instead rely on reference points from recent experience. This relativity of judgment limits the usefulness of responses provided by individuals to surveys, questionnaires, and evaluation forms. Fortunately, the cognitive processes that transform internal states to responses are not simply noisy, but rather are influenced by recent experience in a lawful manner. We explore techniques to remove sequential dependencies, and thereby decontaminate a series of ratings to obtain more meaningful human judgments. In our formulation, decontamination is fundamentally a problem of inferring latent states (internal sensations) which, because of the relativity of judgment, have temporal dependencies. We propose a decontamination solution using a conditional random field with constraints motivated by psychological theories of relative judgment. Our exploration of decontamination models is supported by two experiments we conducted to obtain ground-truth rating data on a simple length estimation task. Our decontamination techniques yield an over 20% reduction in the error of human judgments. 1
5 0.55807799 67 nips-2010-Dynamic Infinite Relational Model for Time-varying Relational Data Analysis
Author: Katsuhiko Ishiguro, Tomoharu Iwata, Naonori Ueda, Joshua B. Tenenbaum
Abstract: We propose a new probabilistic model for analyzing dynamic evolutions of relational data, such as additions, deletions and split & merge, of relation clusters like communities in social networks. Our proposed model abstracts observed timevarying object-object relationships into relationships between object clusters. We extend the infinite Hidden Markov model to follow dynamic and time-sensitive changes in the structure of the relational data and to estimate a number of clusters simultaneously. We show the usefulness of the model through experiments with synthetic and real-world data sets.
6 0.55623901 144 nips-2010-Learning Efficient Markov Networks
7 0.50879419 1 nips-2010-(RF)^2 -- Random Forest Random Field
8 0.50174737 13 nips-2010-A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction
9 0.45737201 32 nips-2010-Approximate Inference by Compilation to Arithmetic Circuits
10 0.43489251 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning
11 0.41752163 215 nips-2010-Probabilistic Deterministic Infinite Automata
12 0.41710648 118 nips-2010-Implicit Differentiation by Perturbation
13 0.38340148 84 nips-2010-Exact inference and learning for cumulative distribution functions on loopy graphs
14 0.33908632 255 nips-2010-Static Analysis of Binary Executables Using Structural SVMs
15 0.33482152 211 nips-2010-Predicting Execution Time of Computer Programs Using Sparse Polynomial Regression
16 0.32993925 188 nips-2010-On Herding and the Perceptron Cycling Theorem
17 0.31809622 288 nips-2010-Worst-case bounds on the quality of max-product fixed-points
18 0.31480941 239 nips-2010-Sidestepping Intractable Inference with Structured Ensemble Cascades
19 0.30945331 126 nips-2010-Inference with Multivariate Heavy-Tails in Linear Models
20 0.30695635 9 nips-2010-A New Probabilistic Model for Rank Aggregation
topicId topicWeight
[(13, 0.026), (17, 0.018), (27, 0.046), (30, 0.051), (35, 0.016), (41, 0.018), (45, 0.258), (50, 0.066), (52, 0.031), (59, 0.292), (60, 0.033), (77, 0.033), (90, 0.025)]
simIndex simValue paperId paperTitle
1 0.86224061 4 nips-2010-A Computational Decision Theory for Interactive Assistants
Author: Alan Fern, Prasad Tadepalli
Abstract: We study several classes of interactive assistants from the points of view of decision theory and computational complexity. We first introduce a class of POMDPs called hidden-goal MDPs (HGMDPs), which formalize the problem of interactively assisting an agent whose goal is hidden and whose actions are observable. In spite of its restricted nature, we show that optimal action selection in finite horizon HGMDPs is PSPACE-complete even in domains with deterministic dynamics. We then introduce a more restricted model called helper action MDPs (HAMDPs), where the assistant’s action is accepted by the agent when it is helpful, and can be easily ignored by the agent otherwise. We show classes of HAMDPs that are complete for PSPACE and NP along with a polynomial time class. Furthermore, we show that for general HAMDPs a simple myopic policy achieves a regret, compared to an omniscient assistant, that is bounded by the entropy of the initial goal distribution. A variation of this policy is shown to achieve worst-case regret that is logarithmic in the number of goals for any goal distribution. 1
2 0.83660328 206 nips-2010-Phone Recognition with the Mean-Covariance Restricted Boltzmann Machine
Author: George Dahl, Marc'aurelio Ranzato, Abdel-rahman Mohamed, Geoffrey E. Hinton
Abstract: Straightforward application of Deep Belief Nets (DBNs) to acoustic modeling produces a rich distributed representation of speech data that is useful for recognition and yields impressive results on the speaker-independent TIMIT phone recognition task. However, the first-layer Gaussian-Bernoulli Restricted Boltzmann Machine (GRBM) has an important limitation, shared with mixtures of diagonalcovariance Gaussians: GRBMs treat different components of the acoustic input vector as conditionally independent given the hidden state. The mean-covariance restricted Boltzmann machine (mcRBM), first introduced for modeling natural images, is a much more representationally efficient and powerful way of modeling the covariance structure of speech data. Every configuration of the precision units of the mcRBM specifies a different precision matrix for the conditional distribution over the acoustic space. In this work, we use the mcRBM to learn features of speech data that serve as input into a standard DBN. The mcRBM features combined with DBNs allow us to achieve a phone error rate of 20.5%, which is superior to all published results on speaker-independent TIMIT to date. 1
3 0.80107892 254 nips-2010-Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models
Author: Han Liu, Kathryn Roeder, Larry Wasserman
Abstract: A challenging problem in estimating high-dimensional graphical models is to choose the regularization parameter in a data-dependent way. The standard techniques include K-fold cross-validation (K-CV), Akaike information criterion (AIC), and Bayesian information criterion (BIC). Though these methods work well for low-dimensional problems, they are not suitable in high dimensional settings. In this paper, we present StARS: a new stability-based method for choosing the regularization parameter in high dimensional inference for undirected graphs. The method has a clear interpretation: we use the least amount of regularization that simultaneously makes a graph sparse and replicable under random sampling. This interpretation requires essentially no conditions. Under mild conditions, we show that StARS is partially sparsistent in terms of graph estimation: i.e. with high probability, all the true edges will be included in the selected model even when the graph size diverges with the sample size. Empirically, the performance of StARS is compared with the state-of-the-art model selection procedures, including K-CV, AIC, and BIC, on both synthetic data and a real microarray dataset. StARS outperforms all these competing procedures.
same-paper 4 0.78773773 83 nips-2010-Evidence-Specific Structures for Rich Tractable CRFs
Author: Anton Chechetka, Carlos Guestrin
Abstract: We present a simple and effective approach to learning tractable conditional random fields with structure that depends on the evidence. Our approach retains the advantages of tractable discriminative models, namely efficient exact inference and arbitrarily accurate parameter learning in polynomial time. At the same time, our algorithm does not suffer a large expressive power penalty inherent to fixed tractable structures. On real-life relational datasets, our approach matches or exceeds state of the art accuracy of the dense models, and at the same time provides an order of magnitude speedup. 1
5 0.7490021 63 nips-2010-Distributed Dual Averaging In Networks
Author: Alekh Agarwal, Martin J. Wainwright, John C. Duchi
Abstract: The goal of decentralized optimization over a network is to optimize a global objective formed by a sum of local (possibly nonsmooth) convex functions using only local computation and communication. We develop and analyze distributed algorithms based on dual averaging of subgradients, and provide sharp bounds on their convergence rates as a function of the network size and topology. Our analysis clearly separates the convergence of the optimization algorithm itself from the effects of communication constraints arising from the network structure. We show that the number of iterations required by our algorithm scales inversely in the spectral gap of the network. The sharpness of this prediction is confirmed both by theoretical lower bounds and simulations for various networks. 1
6 0.71305227 207 nips-2010-Phoneme Recognition with Large Hierarchical Reservoirs
7 0.6901266 239 nips-2010-Sidestepping Intractable Inference with Structured Ensemble Cascades
8 0.68918973 13 nips-2010-A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction
9 0.68916029 287 nips-2010-Worst-Case Linear Discriminant Analysis
10 0.68812174 257 nips-2010-Structured Determinantal Point Processes
11 0.6880433 103 nips-2010-Generating more realistic images using gated MRF's
12 0.68766665 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning
13 0.68729895 174 nips-2010-Multi-label Multiple Kernel Learning by Stochastic Approximation: Application to Visual Object Recognition
14 0.68695402 224 nips-2010-Regularized estimation of image statistics by Score Matching
15 0.68689674 241 nips-2010-Size Matters: Metric Visual Search Constraints from Monocular Metadata
16 0.68683964 87 nips-2010-Extended Bayesian Information Criteria for Gaussian Graphical Models
17 0.68607581 217 nips-2010-Probabilistic Multi-Task Feature Selection
18 0.68600988 176 nips-2010-Multiple Kernel Learning and the SMO Algorithm
19 0.68595821 185 nips-2010-Nonparametric Density Estimation for Stochastic Optimization with an Observable State Variable
20 0.68584025 12 nips-2010-A Primal-Dual Algorithm for Group Sparse Regularization with Overlapping Groups