nips nips2008 nips2008-176 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jun Zhu, Eric P. Xing, Bo Zhang
Abstract: Learning graphical models with hidden variables can offer semantic insights into complex data and lead to salient structured predictors without relying on expensive, sometimes unattainable fully annotated training data. While likelihood-based methods have been extensively explored, to our knowledge, learning structured prediction models with latent variables based on the max-margin principle remains largely an open problem. In this paper, we present a partially observed Maximum Entropy Discrimination Markov Network (PoMEN) model that attempts to combine the advantages of Bayesian and margin-based paradigms for learning Markov networks from partially labeled data. PoMEN leads to an averaging prediction rule that resembles a Bayes predictor that is more robust to overfitting, but is also built on desirable discriminative laws resembling those of the M3N. We develop an EM-style algorithm utilizing existing convex optimization algorithms for M3N as a subroutine. We demonstrate the competitive performance of PoMEN over existing methods on a real-world web data extraction task.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Learning graphical models with hidden variables can offer semantic insights into complex data and lead to salient structured predictors without relying on expensive, sometimes unattainable fully annotated training data. [sent-13, score-0.473]
2 While likelihood-based methods have been extensively explored, to our knowledge, learning structured prediction models with latent variables based on the max-margin principle remains largely an open problem. [sent-14, score-0.407]
3 In this paper, we present a partially observed Maximum Entropy Discrimination Markov Network (PoMEN) model that attempts to combine the advantages of Bayesian and margin-based paradigms for learning Markov networks from partially labeled data. [sent-15, score-0.408]
4 PoMEN leads to an averaging prediction rule that resembles a Bayes predictor that is more robust to overfitting, but is also built on desirable discriminative laws resembling those of the M3N. [sent-16, score-0.175]
5 We develop an EM-style algorithm utilizing existing convex optimization algorithms for M3N as a subroutine. [sent-17, score-0.113]
6 We demonstrate the competitive performance of PoMEN over existing methods on a real-world web data extraction task. [sent-18, score-0.310]
7 1 Introduction Inferring structured predictions based on high-dimensional, often multi-modal and hybrid covariates remains a central problem in data mining (e. [sent-19, score-0.204]
8 Several recent approaches to this problem are based on learning discriminative graphical models defined on composite features that explicitly exploit the structured dependencies among input elements and structured interpretational outputs. [sent-26, score-0.504]
9 However, the problem of structured input/output learning can be intriguing and significantly more difficult when there exist hidden substructures in the data, which is not uncommon in realistic problems. [sent-28, score-0.31]
10 Most existing work along this line, such as the hidden CRF for object recognition [9] and scene segmentation [14] and the dynamic hierarchical MRF for web data extraction [18], falls within the likelihood-based learning paradigm. [sent-30, score-0.494]
11 MaxEnDNet offers a general framework to combine Bayesian-style learning and max-margin learning in structured prediction. [sent-35, score-0.251]
12 This elegant combination of probabilistic and maximum margin concepts provides a natural path for incorporating hidden structured variables into learning max-margin Markov networks (M3N), which is the focus of this paper. [sent-37, score-0.471]
13 It has been shown in [20] that, in the fully observed case, MaxEnDNet subsumes the standard M3N [12]. [sent-38, score-0.095]
14 In this paper, we explore yet another advantage of MaxEnDNet, stemming from its Bayesian-style max-margin learning formalism, for incorporating hidden variables. [sent-41, score-0.150]
15 We present the partially observed MaxEnDNet (PoMEN), which offers a principled way to incorporate latent structures carrying domain knowledge and to learn a discriminative model from partially labeled data. [sent-42, score-0.463]
16 The reducibility of MaxEnDNet to M3N renders many existing convex optimization algorithms developed for learning M3N directly applicable as subroutines for learning our proposed model. [sent-43, score-0.141]
17 As a practical application, we apply the proposed model to a web data extraction task: product information extraction, where collecting fully labeled training data is very difficult. [sent-45, score-0.424]
18 The results show the promise of max-margin learning as opposed to likelihood-based estimation in the presence of hidden variables. [sent-46, score-0.106]
19 Section 2 reviews the basic max-margin structured prediction formalism and MaxEnDNet. [sent-48, score-0.286]
20 Section 4 applies the model to real web data extraction, and Section 5 brings this paper to a conclusion. [sent-50, score-0.145]
21 2 Preliminaries Our goal is to learn a predictive function h : X → Y from a structured input x ∈ X to a structured output y ∈ Y, where Y = Y1 × · · · × Yl represents a combinatorial space of structured interpretations of multi-faceted objects. [sent-51, score-0.638]
22 For example, in part-of-speech (POS) tagging, Yi consists of all the POS tags and each label y = (y1 , · · · , yl ) is a sequence of POS tags, and each input x is a sentence (word sequence). [sent-52, score-0.097]
23 We assume that the feasible set of labels Y(x) ⊆ Y is finite for any x. [sent-53, score-0.091]
24 The predictive function takes the form $h(x; w) = \arg\max_{y \in \mathcal{Y}(x)} F(x, y; w)$ (1). By using different loss functions, the parameters w can be estimated by maximizing the conditional likelihood [7] or by maximizing the margin [2, 12, 13] on labeled training data. [sent-57, score-0.169]
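For concreteness, here is a minimal runnable sketch of the arg-max prediction rule (1), assuming a linear discriminant F(x, y; w) = w · f(x, y) over a toy binary-label chain; the feature map `joint_features` and all dimensions below are illustrative assumptions, not the paper's actual features:

```python
import itertools
import numpy as np

def joint_features(x, y):
    """Toy feature map f(x, y) for a length-l chain with binary labels:
    per-position emission products plus pairwise agreement indicators."""
    emit = np.array([xi * yi for xi, yi in zip(x, y)], dtype=float)
    trans = np.array([float(a == b) for a, b in zip(y, y[1:])], dtype=float)
    return np.concatenate([emit, trans])

def predict(x, w):
    """Prediction rule (1): h(x; w) = argmax over y in Y(x) of w . f(x, y).
    Here Y(x) = {0, 1}^l is enumerated directly; real structured models
    replace this loop with dynamic programming (e.g., max-product)."""
    best_y, best_score = None, -np.inf
    for y in itertools.product([0, 1], repeat=len(x)):
        score = float(np.dot(w, joint_features(x, y)))
        if score > best_score:
            best_y, best_score = y, score
    return best_y

# 3-position input; w holds 3 emission weights and 2 transition weights.
print(predict([1.0, -0.5, 2.0], np.array([1.0, 1.0, 1.0, 0.2, 0.2])))
```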
25 Exploring sparse dependencies among individual labels yi in y, as reflected in the specific design of the feature functions (e. [sent-63, score-0.101]
26 , based on pair-wise labeling potentials), efficient optimization algorithms based on cutting-plane [13] or message-passing [12], and various gradient-based methods [3, 10] have been proposed to obtain approximate solutions to P0. [sent-65, score-0.096]
27 In a MaxEnDNet, a prior over w is introduced to regularize its distribution, and the margins resulting from predictor (3) are used to define a feasible distribution subspace. [sent-69, score-0.133]
28 U is also known as an additional “potential” term in the maximum entropy principle. [sent-71, score-0.107]
29 The feasible distribution subspace $\mathcal{F}_1$ is defined as follows: $\mathcal{F}_1 = \{\, p(w) : \int p(w)[\Delta F_i(y; w) - \Delta\ell_i(y)]\, dw \ge -\xi_i,\ \forall i, \forall y \,\}$, where $\Delta F_i(y; w) = F(x^i, y^i; w) - F(x^i, y; w)$. [sent-72, score-0.128]
30 P1 is a variational optimization problem over p(w) in the feasible subspace F1 . [sent-73, score-0.118]
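Written out (a reconstruction from the surrounding definitions, mirroring the partially observed problem P2 of Section 3), P1 takes the form

```latex
\mathrm{P1:}\qquad
\min_{p(\mathbf{w}) \in \mathcal{F}_1,\; \boldsymbol{\xi} \in \mathbb{R}^N_+}\;
KL\big(p(\mathbf{w}) \,\|\, p_0(\mathbf{w})\big) + U(\boldsymbol{\xi})
```

i.e., it seeks the distribution closest to the prior p0(w) in the KL sense that satisfies all the margin constraints in F1 up to the penalized slacks ξ.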
31 Thus, one can apply the calculus of variations to the Lagrangian to obtain a variational extremum, followed by a dual transformation of P1. [sent-75, score-0.107]
32 Leaf nodes correspond to the target attributes (Name, Image, Price, and Description); and inner nodes are the intermediate class labels defined for parts of a web page, e. [sent-88, score-0.264]
33 First, the MaxEnDNet prediction is based on model averaging and therefore enjoys a desirable smoothing effect, with a uniform convergence bound on generalization error, as shown in [20]. [sent-92, score-0.096]
34 Second, MaxEnDNet admits a prior that can be designed to introduce useful regularization effects, such as a sparsity bias, as explored in the Laplace M3 N [19, 20]. [sent-93, score-0.092]
35 Third, as explored in this paper, MaxEnDNet offers a principled way to incorporate hidden generative models underlying the structured predictions, but allows the predictive model to be discriminatively trained based on partially labeled data. [sent-94, score-0.577]
36 In the sequel, we introduce the partially observed MaxEnDNet (PoMEN), which combines a (possibly latent) generative model with discriminative training for structured prediction. [sent-95, score-0.440]
37 3 Partially Observed MaxEnDNet Consider, for example, the problem of web data extraction, which is to identify information of interest from web pages. [sent-96, score-0.290]
38 Each sample is a data record or an entire web page, which is represented as a set of HTML elements. [sent-97, score-0.273]
39 One striking characteristic of web data extraction is that various types of structural dependencies between HTML elements exist, e. [sent-98, score-0.307]
40 In [17], fully observed hierarchical CRFs are shown to have great promise and achieve better performance than flat models like linear-chain CRFs [7]. [sent-101, score-0.201]
41 One way to build a hierarchical model is to first use a parser to construct a so-called vision tree [17]. [sent-102, score-0.110]
42 For example, Figure 1(b) is a part of the vision tree of the page in Figure 1(a). [sent-103, score-0.106]
43 In such a hierarchical extraction model, inner nodes are useful for incorporating long-distance dependencies, and the variables at one level are refinements of the variables at upper levels. [sent-107, score-0.362]
44 Due to concerns over labeling cost and the annotation ambiguity caused by overlapping class labels, as in Figure 1(c), it is desirable to effectively learn a hierarchical extraction model with partially labeled data. [sent-109, score-0.513]
45 Without loss of generality, assume that the structured labeling of a sample consists of two parts—an observed part y and a hidden part z. [sent-110, score-0.42]
46 Let {z} = (z1, · · · , zN) denote the ensemble of hidden labels of all the samples. [sent-118, score-0.149]
47 Analogous to the setup for learning the MaxEnDNet, we specify a prior distribution p0 ({z}) over all the hidden structured labels. [sent-119, score-0.354]
48 Again we learn the optimum p(w, {z}) based on a structured minimum relative entropy principle as in MaxEnDNet. [sent-121, score-0.385]
49 Specifically, letting p0(w, {z}) denote a given joint prior over the parameters and the hidden variables, we define the PoMEN problem that gives rise to the optimum p(w, {z}): P2 (PoMEN): $\min_{p(w,\{z\}) \in \mathcal{F}_2,\; \xi \in \mathbb{R}^N_+}\ KL(p(w, \{z\}) \,\|\, p_0(w, \{z\})) + U(\xi)$ (7). [sent-122, score-0.191]
50 Analogous to P1, P2 is a variational optimization problem over p(w, {z}) in the feasible space F2. [sent-123, score-0.118]
51 3.1 Learning PoMEN For a fully general p(w, {z}) where hidden variables in all samples are coupled, solving P2 based on an extension of Theorem 1 would involve very high-dimensional integration and summation that is intractable in practice. [sent-127, score-0.223]
52 In this paper we consider a simpler case where the hidden labels of different samples are iid and independent of the parameter w in both the prior and the posterior distributions, that is, $p_0(w, \{z\}) = p_0(w) \prod_{i=1}^N p_0(z_i)$ and $p(w, \{z\}) = p(w) \prod_{i=1}^N p(z_i)$. [sent-128, score-0.193]
53 This assumption will hold true in a graphical model where w corresponds to only the observed y variables at the bottom of a hierarchical model. [sent-129, score-0.167]
54 For more general models where dependencies are more global, we can use the above factored model as a generalized mean field approximation to the true distribution, but this extension is beyond the scope of this paper, and will be explored later in the full paper. [sent-131, score-0.106]
55 Thus, we can apply the same convex optimization techniques as those used for solving problem P1. [sent-133, score-0.115]
56 The dual variables α are obtained by solving the dual problem $\max_{\alpha \in \mathcal{P}(C)} \sum_{i,y} \alpha_i(y) \Delta\ell_i(y) - \frac{1}{2} \big\| \sum_{i,y} \alpha_i(y)\, E_{p(z)}[\Delta f_i(y, z)] \big\|^2$ (9), where $\mathcal{P}(C) = \{\alpha : \sum_y \alpha_i(y) = C;\ \alpha_i(y) \ge 0,\ \forall i, \forall y\}$. [sent-137, score-0.221]
57 This dual problem is isomorphic to the dual form of the M3N optimization problem, and we can use existing algorithms developed for M3N, such as those of [12, 3], to solve it. [sent-138, score-0.245]
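Putting the pieces together, the EM-style alternation can be sketched as follows. This is a schematic under the factored-posterior assumption above, not the paper's actual implementation; `expected_margin_features`, `solve_m3n_dual`, and `update_latent_posterior` are hypothetical placeholders for the model-specific feature expectations, an off-the-shelf M3N solver applied to dual (9), and the prior-dependent update of p(z):

```python
def expected_margin_features(x, y_obs, pz):
    """E_{p(z)}[Delta f_i(y, z)]: expected feature differences under the
    current posterior p(z_i); depends on the model's feature functions."""
    raise NotImplementedError  # model-specific

def solve_m3n_dual(expected_feats):
    """Stand-in for an existing M3N solver on dual (9) (e.g., SMO or
    exponentiated-gradient methods); returns the weights w."""
    raise NotImplementedError  # reuse an off-the-shelf M3N solver

def update_latent_posterior(x, y_obs, w, p0_z):
    """Update p(z_i) from the prior p0(z_i) and the current model w."""
    raise NotImplementedError  # closed-form in the factored case

def learn_pomen(samples, p0_z, num_iters=20):
    """EM-style alternation for PoMEN (schematic).
    samples: list of (x, y_observed) pairs; p0_z: prior over hidden labels."""
    pz = [p0_z for _ in samples]  # initialize each p(z_i) at the prior
    w = None
    for _ in range(num_iters):
        # M3N step: with p(z) fixed, solve the M3N whose margin features
        # are the expectations E_{p(z)}[Delta f_i(y, z)] appearing in (9).
        feats = [expected_margin_features(x, y, q)
                 for (x, y), q in zip(samples, pz)]
        w = solve_m3n_dual(feats)
        # Latent step: with w fixed, refresh each posterior p(z_i).
        pz = [update_latent_posterior(x, y, w, p0_z) for (x, y) in samples]
    return w, pz
```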
58 For example, in a pairwise Markov network, we can define the prior model as $p_0(z) \propto \exp\big( \sum_{(i,j) \in E} \sum_k \lambda_k\, g_k(z_i, z_j) \big)$, where $g_k(z_i, z_j)$ are feature functions and $\lambda_k$ are weights. [sent-153, score-0.260]
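As a concrete illustration of this pairwise log-linear prior, here is a small self-contained sketch; the chain topology, binary states, and the single agreement feature g(a, b) = 1[a = b] are illustrative assumptions, and exact enumeration is only feasible for tiny graphs:

```python
import itertools
import math

def log_prior_unnormalized(z, edges, feats, lam):
    """log p0(z) up to the log-partition constant: the sum over edges
    (i, j) and features k of lam[k] * g_k(z[i], z[j])."""
    return sum(l * g(z[i], z[j])
               for (i, j) in edges
               for l, g in zip(lam, feats))

def prior_table(num_nodes, num_states, edges, feats, lam):
    """Exact normalized p0(z) by brute-force enumeration."""
    zs = list(itertools.product(range(num_states), repeat=num_nodes))
    logs = [log_prior_unnormalized(z, edges, feats, lam) for z in zs]
    m = max(logs)  # log-sum-exp trick for numerical stability
    weights = [math.exp(v - m) for v in logs]
    Z = sum(weights)
    return {z: wv / Z for z, wv in zip(zs, weights)}

# 3-node chain, binary states, one agreement feature with weight 0.7.
feats = [lambda a, b: float(a == b)]
table = prior_table(3, 2, edges=[(0, 1), (1, 2)], feats=feats, lam=[0.7])
print(sum(table.values()))  # sums to 1.0
```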
59 However, since the number of constraints in (12) is exponential in the size of the observed labels, the optimization problem cannot be solved efficiently. [sent-168, score-0.114]
60 Thus, we can introduce a set of marginal dual variables and transform the dual problem (12) into an equivalent form with a polynomial number of constraints. [sent-170, score-0.186]
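The construction here follows the standard marginal-dual reparameterization for M3N (a sketch with assumed notation, not taken verbatim from the paper): for each training example i and each clique c of observed labels, define

```latex
\mu_i(y_c) \;=\; \sum_{y' \sim y_c} \alpha_i(y'),
\qquad \sum_{y_c} \mu_i(y_c) = C, \quad \mu_i(y_c) \ge 0,
```

where y' ∼ y_c means that y' agrees with the clique assignment y_c. Because the features and losses decompose over cliques, both the objective and the constraints of (12) can be rewritten in terms of μ, whose dimension is polynomial in the number of observed labels.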
61 4 Experiments We apply PoMEN to the problem of web data extraction and compare it with partially observed hierarchical CRFs (PoHCRF) [9], fully observed hierarchical CRFs (HCRF) [17], and a hierarchical M3N (HM3N) that has the same hierarchical model structure as the HCRF. [sent-172, score-0.619]
62 The evaluation data consists of product web pages generated from 37 different templates. [sent-176, score-0.18]
63 We evaluate all the methods on two different levels of inputs, record level and page level. [sent-178, score-0.128]
64 For record-level evaluation, we assume that data records are given, and we compare different models on the accuracy of extracting attributes in the given records. [sent-179, score-0.207]
65 For page-level evaluation, the inputs are raw web pages and all the models perform [sent-180, score-0.173]
66 Figure 2: (a) The F1 and block instance accuracy of record-level evaluation from 4 models under different amounts of training data. [sent-224, score-0.270]
67 Figure 3: The average F1 and block instance accuracy of different models with different ratios of training data for two types of page-level evaluation: (a) ST1; and (b) ST2. [sent-258, score-0.235]
68 both record detection and attribute extraction simultaneously as in [17]. [sent-259, score-0.227]
69 In the 185 training pages, there are 1585 data records in total; in the 370 testing pages, 3391 data records are collected. [sent-260, score-0.225]
70 The evaluation metrics are average F1 and block instance accuracy, as defined in [17]. [sent-263, score-0.128]
71 We adopt the independent prior described earlier for the latent variables; each factor p0(zi) over a single latent label is assumed to be uniform. [sent-264, score-0.211]
72 4.2 Record-Level Evaluation In this evaluation, the partially observed training data are the data records whose leaf nodes are labeled and whose inner nodes are hidden. [sent-266, score-0.488]
73 We randomly select m = 5, 10, 20, 30, 40, or 50 percent of the training records as training data, and test on all the testing records. [sent-267, score-0.192]
74 From Figure 2(a), it can be seen that HM3N performs slightly better than HCRF trained on fully labeled data. [sent-269, score-0.094]
75 For the two partially observed models, PoMEN performs much better than PoHCRF in both average F1 and block instance accuracy, and with lower variances of the score, especially when the training set is small. [sent-270, score-0.36]
76 For all the models, higher scores and lower variances are achieved with more training data. [sent-275, score-0.087]
77 Overall, for the attributes Image, Price, and Description, although all models generally perform better with more training data, the improvement is small, and the differences between models are small. [sent-277, score-0.176]
78 This is possibly because the features of these attributes are usually consistent and distinctive, and therefore easier to learn and predict. [sent-278, score-0.093]
79 For the attribute Name, however, a large amount of training data is needed to learn a good model because its underlying features vary widely in appearance across web pages. [sent-279, score-0.265]
80 Two different partial labeling strategies are used to generate training data. [sent-282, score-0.112]
81 ST1: label the leaf nodes and the nodes that represent data records; ST2: label more information based on ST1, e. [sent-283, score-0.208]
82 , label also the nodes above the “Data Record” nodes in the hierarchy as in Figure 1(c). [sent-285, score-0.164]
83 Due to space limitations, we only report average F1 and block instance accuracy. [sent-286, score-0.128]
84 For ST1, PoMEN achieves better scores and lower variances than PoHCRF in both average F1 and block instance accuracy. [sent-287, score-0.162]
85 HM3N performs slightly better than HCRF (both trained on full labeling), and PoMEN performs comparably to the fully observed HCRF in block instance accuracy. [sent-288, score-0.223]
86 For ST2, with more supervision information, PoHCRF achieves higher performance that is comparable to that of HM3N in average F1, but slightly lower than HM3N in block instance accuracy. [sent-289, score-0.128]
87 For the latent models, PoHCRF performs slightly better in average F1, and PoMEN does better in block instance accuracy; moreover, the variances of PoMEN are much smaller than those of PoHCRF in both average F1 and block instance accuracy. [sent-290, score-0.353]
88 Thus, the max-margin principle could provide a better paradigm than likelihood-based estimation for learning latent hierarchical models. [sent-292, score-0.177]
89 It is possible that in hierarchical models, since inner variables usually represent overlapping concepts, the initial distributions are already reasonably good at describing confidence in the labeling, owing to the implicit consistency across labels. [sent-295, score-0.203]
90 This is unlike multi-label learning [6], where only one of the multiple labels is true and more probability mass should be redistributed onto the true label during the EM iterations. [sent-296, score-0.084]
91 5 Conclusions We have presented an extension of standard max-margin learning to address the challenging problem of learning Markov networks in the presence of structured hidden variables. [sent-297, score-0.338]
92 Our approach generalizes the maximum entropy discrimination Markov network (MaxEnDNet), which offers a general framework for combining Bayesian-style and max-margin learning and subsumes the standard M3N as a special case, to handle structured hidden variables. [sent-298, score-0.522]
93 For the partially observed MaxEnDNet, we developed an EM-style algorithm that uses existing convex optimization algorithms for the standard M3N as subroutines. [sent-299, score-0.258]
94 We applied the proposed model to a real-world web data extraction task and showed that learning latent hierarchical models based on the max-margin principle can outperform likelihood-based learning with hidden variables. [sent-300, score-0.588]
95 Conditional random fields: Probabilistic models for segmenting and labeling sequence data. [sent-351, score-0.087]
96 Large margin hidden Markov models for automatic speech recognition. [sent-376, score-0.281]
97 Support vector machine learning for interdependent and structured output spaces. [sent-389, score-0.204]
98 Scene segmentation with conditional random fields learned from partially labeled images. [sent-394, score-0.144]
99 Simultaneous record detection and attribute labeling in web data extraction. [sent-417, score-0.299]
100 Dynamic hierarchical Markov random fields and their application to web data extraction. [sent-425, score-0.304]
wordName wordTfidf (topN-words)
[('maxendnet', 0.53), ('pomen', 0.393), ('pohcrf', 0.314), ('hcrf', 0.24), ('structured', 0.204), ('fi', 0.197), ('web', 0.145), ('extraction', 0.132), ('hidden', 0.106), ('partially', 0.094), ('records', 0.086), ('zj', 0.083), ('ep', 0.082), ('markov', 0.081), ('dw', 0.08), ('entropy', 0.078), ('hierarchical', 0.078), ('block', 0.077), ('discrimination', 0.077), ('dual', 0.074), ('page', 0.074), ('attributes', 0.067), ('crfs', 0.066), ('margin', 0.066), ('name', 0.064), ('latent', 0.063), ('price', 0.062), ('labeling', 0.059), ('record', 0.054), ('training', 0.053), ('zhu', 0.053), ('tsinghua', 0.052), ('instance', 0.051), ('observed', 0.051), ('labeled', 0.05), ('nodes', 0.048), ('feasible', 0.048), ('explored', 0.048), ('lagrangian', 0.047), ('html', 0.047), ('offers', 0.047), ('kl', 0.047), ('formalism', 0.044), ('prior', 0.044), ('multipliers', 0.044), ('pos', 0.044), ('fully', 0.044), ('labels', 0.043), ('convex', 0.043), ('zi', 0.042), ('label', 0.041), ('attribute', 0.041), ('predictor', 0.041), ('ratio', 0.041), ('optimum', 0.041), ('ipopt', 0.039), ('nie', 0.039), ('prediction', 0.038), ('discriminative', 0.038), ('variables', 0.038), ('optimization', 0.037), ('icml', 0.036), ('principle', 0.036), ('evaluation', 0.035), ('solving', 0.035), ('variances', 0.034), ('slack', 0.033), ('variational', 0.033), ('existing', 0.033), ('laplace', 0.032), ('tree', 0.032), ('desirable', 0.031), ('lab', 0.031), ('dependencies', 0.03), ('nips', 0.03), ('leaf', 0.03), ('altun', 0.029), ('maximum', 0.029), ('expectations', 0.029), ('yi', 0.028), ('models', 0.028), ('inner', 0.028), ('networks', 0.028), ('subroutines', 0.028), ('merits', 0.028), ('tags', 0.028), ('yl', 0.028), ('tsochantaridis', 0.028), ('xing', 0.028), ('averaging', 0.027), ('hierarchy', 0.027), ('isomorphic', 0.027), ('accuracy', 0.026), ('constraints', 0.026), ('primal', 0.026), ('learn', 0.026), ('gk', 0.025), ('tech', 0.025), ('paradigms', 0.025), ('rn', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks
Author: Jun Zhu, Eric P. Xing, Bo Zhang
Abstract: Learning graphical models with hidden variables can offer semantic insights into complex data and lead to salient structured predictors without relying on expensive, sometimes unattainable fully annotated training data. While likelihood-based methods have been extensively explored, to our knowledge, learning structured prediction models with latent variables based on the max-margin principle remains largely an open problem. In this paper, we present a partially observed Maximum Entropy Discrimination Markov Network (PoMEN) model that attempts to combine the advantages of Bayesian and margin-based paradigms for learning Markov networks from partially labeled data. PoMEN leads to an averaging prediction rule that resembles a Bayes predictor that is more robust to overfitting, but is also built on desirable discriminative laws resembling those of the M3N. We develop an EM-style algorithm utilizing existing convex optimization algorithms for M3N as a subroutine. We demonstrate the competitive performance of PoMEN over existing methods on a real-world web data extraction task.
2 0.12428357 119 nips-2008-Learning a discriminative hidden part model for human action recognition
Author: Yang Wang, Greg Mori
Abstract: We present a discriminative part-based approach for human action recognition from video sequences using motion features. Our model is based on the recently proposed hidden conditional random field (hCRF) for object recognition. Similar to hCRF for object recognition, we model a human action by a flexible constellation of parts conditioned on image observations. Different from object recognition, our model combines both large-scale global features and local patch features to distinguish various actions. Our experimental results show that our model is comparable to other state-of-the-art approaches in action recognition. In particular, our experimental results demonstrate that combining large-scale global features and local patch features performs significantly better than directly applying hCRF on local patches alone. 1
3 0.11996259 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
Author: Xuming He, Richard S. Zemel
Abstract: Extensive labeled data for image annotation systems, which learn to assign class labels to image regions, is difficult to obtain. We explore a hybrid model framework for utilizing partially labeled data that integrates a generative topic model for image appearance with discriminative label prediction. We propose three alternative formulations for imposing a spatial smoothness prior on the image labels. Tests of the new models and some baseline approaches on three real image datasets demonstrate the effectiveness of incorporating the latent structure. 1
4 0.1055593 239 nips-2008-Tighter Bounds for Structured Estimation
Author: Olivier Chapelle, Chuong B. Do, Choon H. Teo, Quoc V. Le, Alex J. Smola
Abstract: Large-margin structured estimation methods minimize a convex upper bound of loss functions. While they allow for efficient optimization algorithms, these convex formulations are not tight and sacrifice the ability to accurately model the true loss. We present tighter non-convex bounds based on generalizing the notion of a ramp loss from binary classification to structured estimation. We show that a small modification of existing optimization algorithms suffices to solve this modified problem. On structured prediction tasks such as protein sequence alignment and web page ranking, our algorithm leads to improved accuracy. 1
5 0.089044042 191 nips-2008-Recursive Segmentation and Recognition Templates for 2D Parsing
Author: Leo Zhu, Yuanhao Chen, Yuan Lin, Chenxi Lin, Alan L. Yuille
Abstract: Language and image understanding are two major goals of artificial intelligence which can both be conceptually formulated in terms of parsing the input signal into a hierarchical representation. Natural language researchers have made great progress by exploiting the 1D structure of language to design efficient polynomialtime parsing algorithms. By contrast, the two-dimensional nature of images makes it much harder to design efficient image parsers and the form of the hierarchical representations is also unclear. Attempts to adapt representations and algorithms from natural language have only been partially successful. In this paper, we propose a Hierarchical Image Model (HIM) for 2D image parsing which outputs image segmentation and object recognition. This HIM is represented by recursive segmentation and recognition templates in multiple layers and has advantages for representation, inference, and learning. Firstly, the HIM has a coarse-to-fine representation which is capable of capturing long-range dependency and exploiting different levels of contextual information. Secondly, the structure of the HIM allows us to design a rapid inference algorithm, based on dynamic programming, which enables us to parse the image rapidly in polynomial time. Thirdly, we can learn the HIM efficiently in a discriminative manner from a labeled dataset. We demonstrate that HIM outperforms other state-of-the-art methods by evaluation on the challenging public MSRC image dataset. Finally, we sketch how the HIM architecture can be extended to model more complex image phenomena. 1
6 0.087627478 184 nips-2008-Predictive Indexing for Fast Search
7 0.077730767 62 nips-2008-Differentiable Sparse Coding
8 0.074907042 98 nips-2008-Hierarchical Semi-Markov Conditional Random Fields for Recursive Sequential Data
9 0.072373353 241 nips-2008-Transfer Learning by Distribution Matching for Targeted Advertising
10 0.063256264 91 nips-2008-Generative and Discriminative Learning with Unknown Labeling Bias
11 0.062637374 112 nips-2008-Kernel Measures of Independence for non-iid Data
12 0.062088341 49 nips-2008-Clusters and Coarse Partitions in LP Relaxations
13 0.061977588 77 nips-2008-Evaluating probabilities under high-dimensional latent variable models
14 0.060629796 161 nips-2008-On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization
15 0.059208557 142 nips-2008-Multi-Level Active Prediction of Useful Image Annotations for Recognition
16 0.058989108 175 nips-2008-PSDBoost: Matrix-Generation Linear Programming for Positive Semidefinite Matrices Learning
17 0.058181275 246 nips-2008-Unsupervised Learning of Visual Sense Models for Polysemous Words
18 0.057251107 234 nips-2008-The Infinite Factorial Hidden Markov Model
19 0.056567572 129 nips-2008-MAS: a multiplicative approximation scheme for probabilistic inference
20 0.056411553 201 nips-2008-Robust Near-Isometric Matching via Structured Learning of Graphical Models
topicId topicWeight
[(0, -0.21), (1, -0.078), (2, 0.016), (3, -0.051), (4, -0.035), (5, -0.012), (6, 0.008), (7, 0.027), (8, -0.019), (9, -0.019), (10, -0.039), (11, 0.053), (12, 0.08), (13, -0.006), (14, 0.001), (15, -0.007), (16, -0.01), (17, -0.073), (18, 0.03), (19, 0.061), (20, 0.016), (21, 0.022), (22, -0.058), (23, -0.079), (24, 0.013), (25, -0.077), (26, 0.008), (27, 0.065), (28, -0.023), (29, -0.062), (30, -0.093), (31, -0.089), (32, -0.097), (33, -0.067), (34, -0.024), (35, -0.049), (36, -0.012), (37, 0.091), (38, 0.037), (39, 0.012), (40, 0.001), (41, 0.089), (42, 0.047), (43, 0.129), (44, 0.054), (45, 0.075), (46, 0.028), (47, 0.113), (48, -0.158), (49, 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.91136825 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks
Author: Jun Zhu, Eric P. Xing, Bo Zhang
Abstract: Learning graphical models with hidden variables can offer semantic insights into complex data and lead to salient structured predictors without relying on expensive, sometimes unattainable fully annotated training data. While likelihood-based methods have been extensively explored, to our knowledge, learning structured prediction models with latent variables based on the max-margin principle remains largely an open problem. In this paper, we present a partially observed Maximum Entropy Discrimination Markov Network (PoMEN) model that attempts to combine the advantages of Bayesian and margin-based paradigms for learning Markov networks from partially labeled data. PoMEN leads to an averaging prediction rule that resembles a Bayes predictor that is more robust to overfitting, but is also built on desirable discriminative laws resembling those of the M3N. We develop an EM-style algorithm utilizing existing convex optimization algorithms for M3N as a subroutine. We demonstrate the competitive performance of PoMEN over existing methods on a real-world web data extraction task.
2 0.66504771 98 nips-2008-Hierarchical Semi-Markov Conditional Random Fields for Recursive Sequential Data
Author: Tran T. Truyen, Dinh Phung, Hung Bui, Svetha Venkatesh
Abstract: Inspired by the hierarchical hidden Markov models (HHMM), we present the hierarchical semi-Markov conditional random field (HSCRF), a generalisation of embedded undirected Markov chains to model complex hierarchical, nested Markov processes. It is parameterised in a discriminative framework and has polynomial time algorithms for learning and inference. Importantly, we develop efficient algorithms for learning and constrained inference in a partially-supervised setting, which is important issue in practice where labels can only be obtained sparsely. We demonstrate the HSCRF in two applications: (i) recognising human activities of daily living (ADLs) from indoor surveillance cameras, and (ii) noun-phrase chunking. We show that the HSCRF is capable of learning rich hierarchical models with reasonable accuracy in both fully and partially observed data cases. 1
3 0.59746617 186 nips-2008-Probabilistic detection of short events, with application to critical care monitoring
Author: Norm Aleks, Stuart Russell, Michael G. Madden, Diane Morabito, Kristan Staudenmayer, Mitchell Cohen, Geoffrey T. Manley
Abstract: We describe an application of probabilistic modeling and inference technology to the problem of analyzing sensor data in the setting of an intensive care unit (ICU). In particular, we consider the arterial-line blood pressure sensor, which is subject to frequent data artifacts that cause false alarms in the ICU and make the raw data almost useless for automated decision making. The problem is complicated by the fact that the sensor data are averaged over fixed intervals whereas the events causing data artifacts may occur at any time and often have durations significantly shorter than the data collection interval. We show that careful modeling of the sensor, combined with a general technique for detecting sub-interval events and estimating their duration, enables detection of artifacts and accurate estimation of the underlying blood pressure values. Our model’s performance identifying artifacts is superior to two other classifiers’ and about as good as a physician’s. 1
4 0.57085133 105 nips-2008-Improving on Expectation Propagation
Author: Manfred Opper, Ulrich Paquet, Ole Winther
Abstract: A series of corrections is developed for the fixed points of Expectation Propagation (EP), which is one of the most popular methods for approximate probabilistic inference. These corrections can lead to improvements of the inference approximation or serve as a sanity check, indicating when EP yields unrealiable results.
5 0.5535745 129 nips-2008-MAS: a multiplicative approximation scheme for probabilistic inference
Author: Ydo Wexler, Christopher Meek
Abstract: We propose a multiplicative approximation scheme (MAS) for inference problems in graphical models, which can be applied to various inference algorithms. The method uses -decompositions which decompose functions used throughout the inference procedure into functions over smaller sets of variables with a known error . MAS translates these local approximations into bounds on the accuracy of the results. We show how to optimize -decompositions and provide a fast closed-form solution for an L2 approximation. Applying MAS to the Variable Elimination inference algorithm, we introduce an algorithm we call DynaDecomp which is extremely fast in practice and provides guaranteed error bounds on the result. The superior accuracy and efficiency of DynaDecomp is demonstrated. 1
6 0.51851779 191 nips-2008-Recursive Segmentation and Recognition Templates for 2D Parsing
7 0.51510543 2 nips-2008-A Convex Upper Bound on the Log-Partition Function for Binary Distributions
8 0.49768823 239 nips-2008-Tighter Bounds for Structured Estimation
9 0.48403105 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
10 0.48131138 82 nips-2008-Fast Computation of Posterior Mode in Multi-Level Hierarchical Models
11 0.47972888 30 nips-2008-Bayesian Experimental Design of Magnetic Resonance Imaging Sequences
12 0.45644397 119 nips-2008-Learning a discriminative hidden part model for human action recognition
13 0.44424766 233 nips-2008-The Gaussian Process Density Sampler
14 0.44328287 127 nips-2008-Logistic Normal Priors for Unsupervised Probabilistic Grammar Induction
15 0.4410733 236 nips-2008-The Mondrian Process
16 0.43114722 183 nips-2008-Predicting the Geometry of Metal Binding Sites from Protein Sequence
17 0.42212781 42 nips-2008-Cascaded Classification Models: Combining Models for Holistic Scene Understanding
18 0.42123908 185 nips-2008-Privacy-preserving logistic regression
19 0.40126446 208 nips-2008-Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes
20 0.40005118 241 nips-2008-Transfer Learning by Distribution Matching for Targeted Advertising
topicId topicWeight
[(6, 0.063), (7, 0.091), (12, 0.04), (28, 0.159), (57, 0.125), (59, 0.019), (63, 0.035), (64, 0.01), (71, 0.035), (77, 0.036), (78, 0.026), (83, 0.091), (86, 0.181)]
simIndex simValue paperId paperTitle
1 0.85950959 175 nips-2008-PSDBoost: Matrix-Generation Linear Programming for Positive Semidefinite Matrices Learning
Author: Chunhua Shen, Alan Welsh, Lei Wang
Abstract: In this work, we consider the problem of learning a positive semidefinite matrix. The critical issue is how to preserve positive semidefiniteness during the course of learning. Our algorithm is mainly inspired by LPBoost [1] and the general greedy convex optimization framework of Zhang [2]. We demonstrate the essence of the algorithm, termed PSDBoost (positive semidefinite Boosting), by focusing on a few different applications in machine learning. The proposed PSDBoost algorithm extends traditional Boosting algorithms in that its parameter is a positive semidefinite matrix with trace being one instead of a classifier. PSDBoost is based on the observation that any trace-one positive semidefinite matrix can be decomposed into linear convex combinations of trace-one rank-one matrices, which serve as base learners of PSDBoost. Numerical experiments are presented. 1
same-paper 2 0.83674461 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks
Author: Jun Zhu, Eric P. Xing, Bo Zhang
Abstract: Learning graphical models with hidden variables can offer semantic insights into complex data and lead to salient structured predictors without relying on expensive, sometimes unattainable fully annotated training data. While likelihood-based methods have been extensively explored, to our knowledge, learning structured prediction models with latent variables based on the max-margin principle remains largely an open problem. In this paper, we present a partially observed Maximum Entropy Discrimination Markov Network (PoMEN) model that attempts to combine the advantages of Bayesian and margin-based paradigms for learning Markov networks from partially labeled data. PoMEN leads to an averaging prediction rule that resembles a Bayes predictor that is more robust to overfitting, but is also built on desirable discriminative laws resembling those of the M3N. We develop an EM-style algorithm utilizing existing convex optimization algorithms for M3N as a subroutine. We demonstrate the competitive performance of PoMEN over existing methods on a real-world web data extraction task.
3 0.7870813 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
Author: Xuming He, Richard S. Zemel
Abstract: Extensive labeled data for image annotation systems, which learn to assign class labels to image regions, is difficult to obtain. We explore a hybrid model framework for utilizing partially labeled data that integrates a generative topic model for image appearance with discriminative label prediction. We propose three alternative formulations for imposing a spatial smoothness prior on the image labels. Tests of the new models and some baseline approaches on three real image datasets demonstrate the effectiveness of incorporating the latent structure. 1
4 0.77546549 208 nips-2008-Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes
Author: Erik B. Sudderth, Michael I. Jordan
Abstract: We develop a statistical framework for the simultaneous, unsupervised segmentation and discovery of visual object categories from image databases. Examining a large set of manually segmented scenes, we show that object frequencies and segment sizes both follow power law distributions, which are well modeled by the Pitman–Yor (PY) process. This nonparametric prior distribution leads to learning algorithms which discover an unknown set of objects, and segmentation methods which automatically adapt their resolution to each image. Generalizing previous applications of PY processes, we use Gaussian processes to discover spatially contiguous segments which respect image boundaries. Using a novel family of variational approximations, our approach produces segmentations which compare favorably to state-of-the-art methods, while simultaneously discovering categories shared among natural scenes. 1
5 0.77451962 95 nips-2008-Grouping Contours Via a Related Image
Author: Praveen Srinivasan, Liming Wang, Jianbo Shi
Abstract: Contours have been established in the biological and computer vision literature as a compact yet descriptive representation of object shape. While individual contours provide structure, they lack the large spatial support of region segments (which lack internal structure). We present a method for further grouping of contours in an image using their relationship to the contours of a second, related image. Stereo, motion, and similarity all provide cues that can aid this task; contours that have similar transformations relating them to their matching contours in the second image likely belong to a single group. To find matches for contours, we rely only on shape, which applies directly to all three modalities without modification, in contrast to the specialized approaches developed for each independently. Visually salient contours are extracted in each image, along with a set of candidate transformations for aligning subsets of them. For each transformation, groups of contours with matching shape across the two images are identified to provide a context for evaluating matches of individual contour points across the images. The resulting contexts of contours are used to perform a final grouping on contours in the original image while simultaneously finding matches in the related image, again by shape matching. We demonstrate grouping results on image pairs consisting of stereo, motion, and similar images. Our method also produces qualitatively better results against a baseline method that does not use the inferred contexts. 1
6 0.77335238 27 nips-2008-Artificial Olfactory Brain for Mixture Identification
7 0.77052838 42 nips-2008-Cascaded Classification Models: Combining Models for Holistic Scene Understanding
8 0.77042717 62 nips-2008-Differentiable Sparse Coding
9 0.76974452 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
10 0.76905507 66 nips-2008-Dynamic visual attention: searching for coding length increments
11 0.76900554 194 nips-2008-Regularized Learning with Networks of Features
12 0.76745552 200 nips-2008-Robust Kernel Principal Component Analysis
13 0.7662214 26 nips-2008-Analyzing human feature learning as nonparametric Bayesian inference
14 0.76569217 197 nips-2008-Relative Performance Guarantees for Approximate Inference in Latent Dirichlet Allocation
15 0.76450157 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning
16 0.76425433 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization
17 0.76389825 64 nips-2008-DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification
18 0.76093566 100 nips-2008-How memory biases affect information transmission: A rational analysis of serial reproduction
19 0.76058435 248 nips-2008-Using matrices to model symbolic relationship
20 0.76024342 192 nips-2008-Reducing statistical dependencies in natural signals using radial Gaussianization