nips nips2008 nips2008-98 knowledge-graph by maker-knowledge-mining

98 nips-2008-Hierarchical Semi-Markov Conditional Random Fields for Recursive Sequential Data


Source: pdf

Author: Tran T. Truyen, Dinh Phung, Hung Bui, Svetha Venkatesh

Abstract: Inspired by the hierarchical hidden Markov models (HHMM), we present the hierarchical semi-Markov conditional random field (HSCRF), a generalisation of embedded undirected Markov chains to model complex hierarchical, nested Markov processes. It is parameterised in a discriminative framework and has polynomial time algorithms for learning and inference. Importantly, we develop efficient algorithms for learning and constrained inference in a partially-supervised setting, which is an important issue in practice, where labels can only be obtained sparsely. We demonstrate the HSCRF in two applications: (i) recognising human activities of daily living (ADLs) from indoor surveillance cameras, and (ii) noun-phrase chunking. We show that the HSCRF is capable of learning rich hierarchical models with reasonable accuracy in both fully and partially observed data cases. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Inspired by the hierarchical hidden Markov models (HHMM), we present the hierarchical semi-Markov conditional random field (HSCRF), a generalisation of embedded undirected Markov chains to model complex hierarchical, nested Markov processes. [sent-13, score-0.341]

2 Importantly, we develop efficient algorithms for learning and constrained inference in a partially-supervised setting, which is an important issue in practice, where labels can only be obtained sparsely. [sent-15, score-0.135]

3 We demonstrate the HSCRF in two applications: (i) recognising human activities of daily living (ADLs) from indoor surveillance cameras, and (ii) noun-phrase chunking. [sent-16, score-0.236]

4 We show that the HSCRF is capable of learning rich hierarchical models with reasonable accuracy in both fully and partially observed data cases. [sent-17, score-0.191]

5 Modelling hierarchical aspects in complex stochastic processes is an important research issue in many application domains, ranging from computer vision, text information extraction, and computational linguistics to bioinformatics. [sent-18, score-0.119]

6 For example, in a syntactic parsing task known as noun-phrase chunking, noun-phrases (NPs) and part-of-speech tags (POS) are two layers of semantics associated with words in the sentence. [sent-19, score-0.18]

7 Previous approaches first tag the POS and then feed these tags as input to the chunker. [sent-20, score-0.282]

8 This may not be optimal, as a noun-phrase is often very informative for inferring the POS tags belonging to the phrase. [sent-22, score-0.141]

9 Thus, it is more desirable to jointly model and infer both the NPs and the POS tags at the same time. [sent-23, score-0.141]

10 hierarchical structures, such as the dynamic CRFs (DCRF) [10], and hierarchical CRFs [5]. [sent-33, score-0.238]

11 For example, in the noun-phrase chunking problem, no prior hierarchical structures are known. [sent-35, score-0.227]

12 In addition, most discriminative structured models are trained in a completely supervised fashion using fully labelled data, and limited research has been devoted to dealing with the partially labelled data (e. [sent-37, score-0.146]

13 We term the process of learning with partial labels partial-supervision, and the process of inference with partial labels constrained inference. [sent-42, score-0.2]

14 To address the above issues, we propose the Hierarchical Semi-Markov Conditional Random Field (HSCRF), which is a recursive, undirected graphical model that generalises the undirected Markov chains and allows hierarchical decomposition. [sent-45, score-0.201]

15 For example, the noun-phrase chunking problem can be modelled as a two-level HSCRF, where the top level represents the NP process and the bottom level the POS process. [sent-47, score-0.43]
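
A minimal sketch of such a two-level structure (a hypothetical data layout with illustrative tokens and labels, not the authors' code): the bottom level carries one POS tag per token, while the top level partitions the positions into labelled segments.

```python
# Hypothetical two-level representation for joint NP/POS modelling: the bottom
# level assigns one POS tag per token, the top level partitions the positions
# into labelled segments (illustrative labels, not the paper's data).
tokens = ["He", "reckons", "the", "current", "account", "deficit"]
pos    = ["PRP", "VBZ", "DT", "JJ", "NN", "NN"]          # bottom-level states
np_segments = [("NP", 0, 0), ("O", 1, 1), ("NP", 2, 5)]  # (label, start, end), inclusive

def check_two_level(tokens, pos, segments):
    """Verify that the top-level segments tile the token sequence exactly once."""
    assert len(tokens) == len(pos)
    covered = [k for _, s, e in segments for k in range(s, e + 1)]
    assert covered == list(range(len(tokens))), "segments must cover every position once"

check_two_level(tokens, pos, np_segments)
```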

16 Rich contextual information such as starting and ending of the phrase, the phrase length, and the distribution of words falling inside the phrase can be effectively encoded. [sent-50, score-0.475]

17 We demonstrate the effectiveness of HSCRFs in two applications: (i) segmenting and labelling activities of daily living (ADLs) in an indoor environment and (ii) jointly modelling noun-phrases and part-of-speech tags in shallow parsing. [sent-52, score-0.31]

18 Our experimental results in the first application show that the HSCRFs are capable of learning rich, hierarchical activities with good accuracy and exhibit better performance when compared to DCRFs and flat-CRFs. [sent-53, score-0.222]

19 Applications for recognition of activities and natural language parsing are presented in section 5. [sent-61, score-0.142]

20 Consider a hierarchically nested Markov process with D levels where, by convention, the top level is the dummy root level that generates all subsequent Markov chains. [sent-64, score-0.339]

21 Then, as in the generative process of the hierarchical HMMs [2], the parent state embeds a child Markov chain whose states may in turn contain grand-child Markov chains. [sent-65, score-0.439]

22 The relation among these nested Markov chains is defined via the model topology, which is a state hierarchy of depth D. [sent-66, score-0.211]

23 It specifies a set of states S^d at each level d, i.e. [sent-67, score-0.155]

24 S^d = {1, ..., |S^d|}, where |S^d| is the number of states at level d and 1 ≤ d ≤ D. [sent-72, score-0.155]

25 For each state s^d ∈ S^d where d ≠ D, the model also defines a set of children associated with it at the next level, ch(s^d) ⊂ S^{d+1}, and thus conversely, each child s^{d+1} is associated with a set of parental states at the upper level, pa(s^{d+1}) ⊂ S^d. [sent-73, score-0.637]
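
As an illustration (with hypothetical state names, not taken from the paper), the topology can be stored as a per-level children map, from which the parental sets pa(·) follow by inversion:

```python
# Hypothetical 3-level topology: ch[d][s] lists the children at level d+1
# of state s at level d.
ch = {
    1: {"root": ["NP", "O"]},                 # dummy root -> top-level states
    2: {"NP": ["DT", "JJ", "NN"],             # children may be shared among parents,
        "O":  ["VBZ", "NN"]},                 # unlike a strict tree-structured hierarchy
}

def parental_sets(ch):
    """Invert the children map to obtain pa(s^{d+1}) for every child state."""
    pa = {}
    for d, mapping in ch.items():
        for parent, children in mapping.items():
            for child in children:
                pa.setdefault((d + 1, child), set()).add(parent)
    return pa

print(parental_sets(ch)[(3, "NN")])           # {'NP', 'O'}: the child 'NN' is shared
```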

26 Unlike the original HHMMs proposed in [2] where tree structure is explicitly enforced on the state hierarchy, the HSCRFs allow arbitrary sharing of children among parental states as addressed in [1]. [sent-74, score-0.185]

27 Starting with the root node at the top level, as soon as a new state is created at a level d ≠ D, it initialises a child state at level d + 1. [sent-77, score-0.575]

28 This child process at level d + 1 continues its execution recursively until it terminates, and when it does, the control of execution returns to its parent at the upper level d. [sent-79, score-0.412]

29 At this point, the parent makes a decision either to transit to a new state at the same level or to return the control to the grand-parent at the upper level d − 1. [sent-80, score-0.448]

30 The key intuition for this hierarchical nesting process is that the lifespan of a child process is a subsegment in the lifespan of its parent. [sent-81, score-0.299]

31 To be more precise, consider the case in which a parent process s^d_{i:j} at level d starts a new state at time i and persists until time j. [sent-82, score-0.448]

32 At time i, the parent initialises a child state s^{d+1}_i which continues until it ends at time k < j, at which point the child state transits to a new child state s^{d+1}_{k+1}. [sent-83, score-0.838]

33 The child process exits at time j, at which point the control from the child level is returned to the parent s^d_{i:j}. [sent-84, score-0.542]

34 Upon receiving the control, the parent state s^d_{i:j} may transit to a new parent state s^d_{j+1:l}, or end at j and return the control to the grand-parent at level d − 1. [sent-85, score-0.788]

35 Figure 1: Graphical presentation for HSCRFs (leftmost). [sent-86, score-1.503]

36 Graph structures for state-persistence (middle-top), initialisation and ending (middle-bottom), and state-transition (rightmost). [sent-87, score-0.238]

37 It starts from the root level, indexed as 1, runs for T time slices, and at each time slice a hierarchy of D states is generated. [sent-90, score-0.288]

38 At each level d and time index i, there is a node representing a state variable x^d_i ∈ S^d = {1, 2, ..., |S^d|}. [sent-91, score-0.576]

39 Associated with each x^d_i is an ending indicator e^d_i, which can be either 1 or 0 to signify whether the state x^d_i terminates or continues its execution to the next time slice. [sent-95, score-1.069]

40 The nesting nature of the HSCRFs is formally realised by imposing specific constraints on the value assignment of ending indicators: • The root state persists during the course of evolution, i.e. [sent-96, score-0.415]

41 when a state ends, all of its descendants must also end, i.e., e^d_i = 1 implies e^{d+1:D}_i = 1; and when a state persists, all of its ancestors must also persist, i.e. [sent-103, score-0.133]

42 • When a state transits, its parent must remain unchanged, i.e. [sent-106, score-0.186]

43 e^d_i = 1, e^{d-1}_i = 0; and states at the bottom level terminate at every single slice, i.e. [sent-108, score-0.24]

44 Thus, specific value assignments of ending indicators provide contexts that realise the evolution of the model states in both hierarchical (vertical) and temporal (horizontal) directions. [sent-111, score-0.427]
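
A small illustrative check of these constraints (my own sketch, not the authors' code; it reads "the root persists" as e[1][i] = 0 for i < T with everything ending at the last slice, which is one natural interpretation), using e indexed as e[d][i] for levels d = 1..D and times i = 1..T:

```python
def valid_ending_indicators(e, D, T):
    """e[d][i] in {0, 1} for levels d = 1..D and times i = 1..T (dict-of-dicts).
    Returns True iff the assignment respects the hierarchical nesting constraints."""
    # Root persists over the whole sequence; every level ends at the last slice.
    if any(e[1][i] != 0 for i in range(1, T)):
        return False
    if any(e[d][T] != 1 for d in range(1, D + 1)):
        return False
    for i in range(1, T + 1):
        # States at the bottom level terminate at every single slice.
        if e[D][i] != 1:
            return False
        for d in range(1, D):
            # When a state ends, all of its descendants end at the same time
            # (equivalently: when a state persists, all of its ancestors persist).
            if e[d][i] == 1 and e[d + 1][i] != 1:
                return False
    return True

# A valid assignment for D = 3, T = 4: the level-2 state ends at times 2 and 4.
e = {1: {1: 0, 2: 0, 3: 0, 4: 1},
     2: {1: 0, 2: 1, 3: 0, 4: 1},
     3: {1: 1, 2: 1, 3: 1, 4: 1}}
print(valid_ending_indicators(e, D=3, T=4))   # True
```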

45 (In HHMMs, the bottom level is also called the production level, in which the states emit observational symbols.) [sent-112, score-0.184]

46 (Our notation s^d_{i:j} denotes the set of variables from time i to j at level d, i.e., s^d_{i:j} = (s^d_i, s^d_{i+1}, ..., s^d_j).) [sent-114, score-0.293]

47 Each context at a level and its associated state variables form a contextual clique, and here we identify four contextual clique types (cf. [sent-120, score-0.669]

48 Fig. 1): • State-persistence: This corresponds to the lifetime of a state at a given level. Specifically, given a context c = (e^d_{i-1:j} = (1, 0, ..., 0, 1)), [sent-122, score-0.341]

49 then σ^{persist,d}_{i:j} = (x^d_{i:j}, c) is a contextual clique that specifies the life span [i, j] of any state s = x^d_{i:j}. [sent-124, score-0.297]

50 • State-transition: This corresponds to a state at level d ∈ [2, D] at time i transiting to a new state. [sent-125, score-0.236]

51 Specifically, given a context c = (e^{d-1}_i = 0, e^d_i = 1), then σ^{transit,d}_i = (x^{d-1}_{i+1}, x^d_{i:i+1}, c) is a contextual clique that specifies the transition of x^d_i to x^d_{i+1} at time i under the same parent x^{d-1}_{i+1}. [sent-126, score-0.46]

52 • State-initialisation: This corresponds to a state at level d ∈ [1, D − 1] initialising a new child state at level d + 1 at time i. [sent-127, score-0.543]

53 Specifically, given a context c = (e^d_{i-1} = 1), then σ^{init,d}_i = (x^d_i, x^{d+1}_i, c) is a contextual clique that specifies the initialisation at time i from the parent x^d_i to the first child x^{d+1}_i. [sent-128, score-0.887]

54 • State-exiting: This corresponds to a state at level d ∈ [1, D − 1] ending at time i. Specifically, given a context c = (e^d_i = 1), then σ^{exit,d}_i = (x^d_i, x^{d+1}_i, c) is a contextual clique that specifies the ending of x^d_i at time i with the last child x^{d+1}_i. [sent-129, score-0.861]
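
For concreteness, a sketch (illustrative only, with states and ending indicators stored as x[d][i] and e[d][i]) that recovers the state-persistence cliques, i.e. the segments [i, j] over which each state persists, from a configuration:

```python
def persistence_cliques(x, e, D, T):
    """x[d][i] is the state and e[d][i] the ending indicator at level d, time i.
    Returns, per level, the segments (s, i, j): state s starts at i and ends at j."""
    segments = {d: [] for d in range(1, D + 1)}
    for d in range(1, D + 1):
        start = 1
        for i in range(1, T + 1):
            if e[d][i] == 1:                      # the current level-d state ends here
                segments[d].append((x[d][start], start, i))
                start = i + 1                     # the next state (if any) starts at i + 1
    return segments

# Hypothetical configuration with D = 3, T = 4 (labels invented for illustration).
x = {1: {1: "root", 2: "root", 3: "root", 4: "root"},
     2: {1: "NP", 2: "NP", 3: "O", 4: "O"},
     3: {1: "DT", 2: "NN", 3: "VBZ", 4: "NN"}}
e = {1: {1: 0, 2: 0, 3: 0, 4: 1},
     2: {1: 0, 2: 1, 3: 0, 4: 1},
     3: {1: 1, 2: 1, 3: 1, 4: 1}}
print(persistence_cliques(x, e, D=3, T=4)[2])     # [('NP', 1, 2), ('O', 3, 4)]
```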

55 In the HSCRF, we are interested in the conditional setting, in which the entire state and ending variables (x^{1:D}_{1:T}, e^{1:D}_{1:T}) are conditioned on an observation sequence z. [sent-130, score-0.324]

56 For example, in computational linguistics, the observation is often the sequence of words, and the state variables might be the part-of-speech tags and the phrases. [sent-131, score-0.248]

57 To capture the correlation between variables and such conditioning, we define a non-negative potential function φ(σ, z) over each contextual clique σ. [sent-132, score-0.352]

58 Figure 2 shows the notations for potentials that correspond to the four contextual clique types we have identified above. [sent-133, score-0.377]

59 State persistence potential: R^{d,s,z}_{i:j} = φ(σ^{persist,d}_{i:j}, z), where s = x^d_{i:j}. [sent-136, score-0.648]

60 State transition potential: A^{d,s,z}_{u,v,i} = φ(σ^{transit,d}_i, z), where s = x^{d-1}_{i+1}, u = x^d_i, and v = x^d_{i+1}. [sent-137, score-0.662]

61 State initialization potential: π^{d,s,z}_{u,i} = φ(σ^{init,d}_i, z), where s = x^d_i, u = x^{d+1}_i. [sent-138, score-0.331]

62 State ending potential: E^{d,s,z}_{u,i} = φ(σ^{exit,d}_i, z), where s = x^d_i, u = x^{d+1}_i. [sent-139, score-0.331]

63 Figure 2: Shorthands for contextual clique potentials. [sent-140, score-0.323]

64 A configuration ζ of the model is a complete assignment of all the states and ending indicators (x^{1:D}_{1:T}, e^{1:D}_{1:T}) which satisfies the set of hierarchical constraints described earlier in this section. [sent-142, score-0.415]

65 Log-linear Parameterisation: In our HSCRF setting, there is a feature vector f^d_σ(σ^d, z) associated with each type of contextual clique σ^d, in that φ(σ^d, z) = exp(θ^d_σ • f^d_σ(σ^d, z)). [sent-145, score-0.323]
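
Each potential is thus the exponentiated inner product of a weight vector with the clique's feature vector; a minimal sketch with sparse feature dicts (the feature names below are hypothetical):

```python
import math

def potential(theta, features):
    """phi(sigma, z) = exp(theta . f(sigma, z)), with features and weights
    stored as sparse dicts {feature_name: value}."""
    return math.exp(sum(theta.get(name, 0.0) * v for name, v in features.items()))

# Hypothetical state-persistence features for an NP segment of duration 3.
f_persist = {"label=NP": 1.0, "duration=3": 1.0, "starts_at_sentence_begin": 1.0}
theta = {"label=NP": 0.7, "duration=3": -0.1}
print(potential(theta, f_persist))   # exp(0.6) ~ 1.822
```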

66 Thus, the features are active only in the context in which the corresponding contextual cliques appear. [sent-147, score-0.133]

67 For the state-persistence contextual clique, the features incorporate state-duration, start time i and end time j of the state. [sent-148, score-0.197]

68 The key insight is the context-specific independence, which is due to hierarchical constraints described in Section 2. [sent-153, score-0.119]

69 Given Π^{d,s}_{i:j}, the set of variables inside the blanket is independent of those outside it. [sent-156, score-0.172]

70 A similar relation holds with respect to the asymmetric Markov blanket, which includes the set of variable assignments Γ^{d,s}_{i:j}(u) = (x^d_{i:j} = s, x^{d+1}_j = u, e^{d:D}_{i-1} = 1, e^{d+1:D}_j = 1, e^d_{i:j-1} = 0). [sent-157, score-0.122]

71 Figure 3 depicts an asymmetric Markov blanket (the covering arrowed line) containing a smaller asymmetric blanket (the left arrowed line) and a symmetric blanket (the double-arrowed line). [sent-158, score-0.577]

72 (Figure 3: Decomposition with respect to symmetric/asymmetric Markov blankets, levels d and d + 1.) Denote by ∆^{d,s}_{i:j} the sum of products of all clique [sent-159, score-0.402]

73 potentials falling inside the symmetric Markov blanket Π^{d,s}_{i:j}. [sent-160, score-0.273]

74 In the same manner, let α_{i:j}(u) be the sum of products of all clique potentials falling inside the asymmetric Markov blanket Γ^{d,s}_{i:j}(u). [sent-162, score-0.519]

75 In general, when we make observations, we observe some states and some ending indicators. [sent-176, score-0.225]

76 Let Ṽ = {x̃, ẽ} be the set of observed state and ending variables, respectively. [sent-177, score-0.132]

77 For example, computing ∆^{d,s}_{i:j} assumes Π^{d,s}_{i:j}, which implies the constraint that the state s at level d starts at i and persists until terminating at j. [sent-179, score-0.213]

78 If observations are made that invalidate this constraint (e.g., there is an x̃^d_k ≠ s for some k ∈ [i, j]), ∆^{d,s}_{i:j} will be zero. [sent-182, score-0.331]
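
A small sketch of this constrained-inference idea (my own illustrative helper; partial labels are assumed to be given as a dict from (level, time) to the observed state): a segment hypothesis that contradicts any observation inside its span contributes zero mass.

```python
def segment_consistent(d, s, i, j, observed):
    """observed maps (level, time) -> state label at the sparsely labelled positions.
    A candidate segment (state s at level d over [i, j]) is kept only if it agrees
    with every observation it covers; otherwise its mass Delta is forced to zero."""
    return all(observed.get((d, k), s) == s for k in range(i, j + 1))

observed = {(2, 4): "prepare_meal"}                            # hypothetical ADL label
print(segment_consistent(2, "prepare_meal", 3, 6, observed))   # True
print(segment_consistent(2, "eat_meal", 3, 6, observed))       # False -> zero mass
```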

79 The persistent activities share some of the 12 sub-trajectories. [sent-193, score-0.14]

80 Thus naturally, the data has a state hierarchy of depth 3: the dummy root for each location sequence, the persistent activities, and the sub-trajectories. [sent-195, score-0.28]

81 At each level d and time t we count an error if the predicted state is not the same as the ground-truth. [sent-197, score-0.245]
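
A sketch of this per-level, per-time error count (my own helper; predictions and ground truth are assumed to be stored as level-by-time label maps):

```python
def per_level_accuracy(pred, gold):
    """pred[d][t] and gold[d][t] hold the predicted and true states at level d, time t.
    Returns, for each level, the fraction of time slices labelled correctly."""
    acc = {}
    for d in pred:
        times = list(pred[d])
        acc[d] = sum(pred[d][t] == gold[d][t] for t in times) / len(times)
    return acc

pred = {2: {1: "NP", 2: "NP", 3: "O"}, 3: {1: "DT", 2: "NN", 3: "VB"}}
gold = {2: {1: "NP", 2: "O",  3: "O"}, 3: {1: "DT", 2: "NN", 3: "VB"}}
print(per_level_accuracy(pred, gold))   # {2: 0.666..., 3: 1.0}
```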

82 Next, we consider partially-supervised learning in which about 50% of start/end times of a state and state labels are observed at the second level. [sent-210, score-0.304]

83 All ending indicators are known at the bottom level. [sent-211, score-0.248]

84 As can be seen, although only 50% of the state labels and state start/end times are observed, the model learned is still performing well with accuracy of 80. [sent-213, score-0.279]

85 We next consider the issue of partially observing labels during decoding and test the effect using degraded learned models. [sent-216, score-0.137]

86 There are 48 POS labels and 3 NP labels (B-NP for beginning of a noun-phrase, I-NP for inside a noun-phrase or O for others). [sent-227, score-0.169]
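
As a reminder of this encoding (the standard BIO scheme, shown with hypothetical segment positions), NP segments map to per-token labels as follows:

```python
def segments_to_bio(n_tokens, np_segments):
    """Convert NP segments given as inclusive (start, end) spans into B-NP/I-NP/O tags."""
    labels = ["O"] * n_tokens
    for start, end in np_segments:
        labels[start] = "B-NP"
        for k in range(start + 1, end + 1):
            labels[k] = "I-NP"
    return labels

print(segments_to_bio(6, [(0, 0), (2, 5)]))
# ['B-NP', 'O', 'B-NP', 'I-NP', 'I-NP', 'I-NP']
```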

87 We build an HSCRF topology of 3 levels, where the root is just a dummy node, the second level has 2 NP states, and the bottom level has 5 POS states. [sent-234, score-0.339]

88 Since the state spaces are relatively small, we are able to run exact inference in the DCRF by collapsing both the NP and POS state spaces to a combined state space of size 3 × 5 = 15. [sent-237, score-0.359]
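
The collapsing trick amounts to taking the Cartesian product of the two label sets (the POS grouping below is a placeholder, since the 5 collapsed POS states are not listed here):

```python
from itertools import product

np_labels = ["B-NP", "I-NP", "O"]                  # 3 NP labels
pos_groups = ["NN", "VB", "DT", "JJ", "OTHER"]     # placeholder for the 5 POS states

combined = list(product(np_labels, pos_groups))    # joint states for the flat DCRF chain
assert len(combined) == 3 * 5 == 15
print(combined[:3])                                # [('B-NP', 'NN'), ('B-NP', 'VB'), ('B-NP', 'DT')]
```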

89 The SCRF and Semi-CRF model only the NP process, taking the POS tags and words as input. [sent-238, score-0.141]

90 Furthermore, we use the contextual window of 5 instead of 7 as in [10]. [sent-244, score-0.133]

91 The model feature is factorised as f (xc , z) = I(xc )gc (z), where I(xc ) is a binary function on the assignment of the clique variables xc , and gc (z) are the raw features. [sent-246, score-0.291]
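
In code, this factorisation means the raw observation features g_c(z) are computed once per position and then paired with (gated by) the label assignment of the clique; a sketch with hypothetical feature names:

```python
def factorised_features(clique_assignment, raw_features):
    """f(x_c, z) = I(x_c) * g_c(z): each raw feature g fires only together with the
    particular label configuration x_c of the clique."""
    label_key = "|".join(clique_assignment)        # e.g. "NP|NN" for a parent-child clique
    return {f"{label_key}::{name}": value for name, value in raw_features.items()}

g = {"word=deficit": 1.0, "suffix=cit": 1.0, "window[-2..+2] contains 'account'": 1.0}
print(factorised_features(("NP", "NN"), g))
```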

92 At test time, since the SCRF and the Semi-CRF are able to use the POS tags as input, it is not fair for the DCRF and HSCRF to predict those labels during inference. [sent-255, score-0.206]

93 Instead, we also give the POS tags to the DCRF and HSCRF and perform constrained inference to predict only the NP labels. [sent-256, score-0.211]

94 Without POS tags given at test time, both the HSCRF and the DCRF perform worse than the SCRF. [sent-265, score-0.141]

95 This is not surprising because the POS tags are always given in the case of SCRF. [sent-266, score-0.141]

96 The HSCRFs presented here are not a standard graphical model, since the clique structures are not predefined. [sent-268, score-0.241]

97 The potentials are defined on-the-fly depending on the assignments of the ending indicators. [sent-269, score-0.27]

98 Furthermore, the state persistence potentials capture duration information that is not available in the DBN representation of the HHMMs in [6]. [sent-271, score-0.186]

99 However, the context-free grammar does not limit the depth of the semantic hierarchy, thus making it unnecessarily difficult to map many hierarchical problems into its form. [sent-274, score-0.151]

100 Learning and detecting activities from movement trajectories using the hierarchical hidden Markov models. [sent-320, score-0.222]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('hscrf', 0.464), ('pos', 0.348), ('xd', 0.331), ('dcrf', 0.232), ('hscrfs', 0.216), ('clique', 0.19), ('ending', 0.176), ('sd', 0.155), ('tags', 0.141), ('contextual', 0.133), ('blanket', 0.133), ('scrf', 0.133), ('np', 0.124), ('hierarchical', 0.119), ('xpos', 0.116), ('state', 0.107), ('level', 0.106), ('activities', 0.103), ('child', 0.085), ('chunking', 0.083), ('hhmm', 0.083), ('parent', 0.079), ('indoor', 0.073), ('bui', 0.073), ('markov', 0.071), ('pr', 0.067), ('hhmms', 0.066), ('phung', 0.066), ('labels', 0.065), ('xnp', 0.058), ('asymmetric', 0.056), ('crfs', 0.056), ('potentials', 0.054), ('persist', 0.053), ('transits', 0.05), ('states', 0.049), ('falling', 0.047), ('partially', 0.047), ('ik', 0.046), ('crf', 0.046), ('modelling', 0.045), ('persists', 0.044), ('ej', 0.043), ('indicators', 0.043), ('conditional', 0.041), ('phrase', 0.04), ('xc', 0.04), ('parameterised', 0.04), ('assignments', 0.04), ('inside', 0.039), ('parsing', 0.039), ('inference', 0.038), ('hierarchy', 0.038), ('initialisation', 0.037), ('persistent', 0.037), ('continues', 0.036), ('labelled', 0.036), ('dummy', 0.035), ('nested', 0.034), ('shallow', 0.034), ('adls', 0.033), ('aio', 0.033), ('arrowed', 0.033), ('curtin', 0.033), ('gc', 0.033), ('hung', 0.033), ('initialises', 0.033), ('lifespan', 0.033), ('parameterisation', 0.033), ('recognising', 0.033), ('truyen', 0.033), ('venkatesh', 0.033), ('time', 0.032), ('topology', 0.032), ('depth', 0.032), ('constrained', 0.032), ('root', 0.031), ('terminates', 0.03), ('ei', 0.029), ('jul', 0.029), ('nesting', 0.029), ('nps', 0.029), ('parental', 0.029), ('sentences', 0.029), ('bottom', 0.029), ('potential', 0.029), ('assignment', 0.028), ('undirected', 0.028), ('segmenting', 0.028), ('levels', 0.027), ('discriminative', 0.027), ('jose', 0.027), ('living', 0.027), ('graphical', 0.026), ('ed', 0.026), ('elds', 0.026), ('observed', 0.025), ('structures', 0.025), ('persistence', 0.025), ('degraded', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.000001 98 nips-2008-Hierarchical Semi-Markov Conditional Random Fields for Recursive Sequential Data

Author: Tran T. Truyen, Dinh Phung, Hung Bui, Svetha Venkatesh

Abstract: Inspired by the hierarchical hidden Markov models (HHMM), we present the hierarchical semi-Markov conditional random field (HSCRF), a generalisation of embedded undirected Markov chains to model complex hierarchical, nested Markov processes. It is parameterised in a discriminative framework and has polynomial time algorithms for learning and inference. Importantly, we develop efficient algorithms for learning and constrained inference in a partially-supervised setting, which is an important issue in practice, where labels can only be obtained sparsely. We demonstrate the HSCRF in two applications: (i) recognising human activities of daily living (ADLs) from indoor surveillance cameras, and (ii) noun-phrase chunking. We show that the HSCRF is capable of learning rich hierarchical models with reasonable accuracy in both fully and partially observed data cases. 1

2 0.10328329 25 nips-2008-An interior-point stochastic approximation method and an L1-regularized delta rule

Author: Peter Carbonetto, Mark Schmidt, Nando D. Freitas

Abstract: The stochastic approximation method is behind the solution to many important, actively-studied problems in machine learning. Despite its far-reaching application, there is almost no work on applying stochastic approximation to learning problems with general constraints. The reason for this, we hypothesize, is that no robust, widely-applicable stochastic approximation method exists for handling such problems. We propose that interior-point methods are a natural solution. We establish the stability of a stochastic interior-point approximation method both analytically and empirically, and demonstrate its utility by deriving an on-line learning algorithm that also performs feature selection via L1 regularization. 1

3 0.10222589 191 nips-2008-Recursive Segmentation and Recognition Templates for 2D Parsing

Author: Leo Zhu, Yuanhao Chen, Yuan Lin, Chenxi Lin, Alan L. Yuille

Abstract: Language and image understanding are two major goals of artificial intelligence which can both be conceptually formulated in terms of parsing the input signal into a hierarchical representation. Natural language researchers have made great progress by exploiting the 1D structure of language to design efficient polynomialtime parsing algorithms. By contrast, the two-dimensional nature of images makes it much harder to design efficient image parsers and the form of the hierarchical representations is also unclear. Attempts to adapt representations and algorithms from natural language have only been partially successful. In this paper, we propose a Hierarchical Image Model (HIM) for 2D image parsing which outputs image segmentation and object recognition. This HIM is represented by recursive segmentation and recognition templates in multiple layers and has advantages for representation, inference, and learning. Firstly, the HIM has a coarse-to-fine representation which is capable of capturing long-range dependency and exploiting different levels of contextual information. Secondly, the structure of the HIM allows us to design a rapid inference algorithm, based on dynamic programming, which enables us to parse the image rapidly in polynomial time. Thirdly, we can learn the HIM efficiently in a discriminative manner from a labeled dataset. We demonstrate that HIM outperforms other state-of-the-art methods by evaluation on the challenging public MSRC image dataset. Finally, we sketch how the HIM architecture can be extended to model more complex image phenomena. 1

4 0.091012798 20 nips-2008-An Extended Level Method for Efficient Multiple Kernel Learning

Author: Zenglin Xu, Rong Jin, Irwin King, Michael Lyu

Abstract: We consider the problem of multiple kernel learning (MKL), which can be formulated as a convex-concave problem. In the past, two efficient methods, i.e., Semi-Infinite Linear Programming (SILP) and Subgradient Descent (SD), have been proposed for large-scale multiple kernel learning. Despite their success, both methods have their own shortcomings: (a) the SD method utilizes the gradient of only the current solution, and (b) the SILP method does not regularize the approximate solution obtained from the cutting plane model. In this work, we extend the level method, which was originally designed for optimizing non-smooth objective functions, to convex-concave optimization, and apply it to multiple kernel learning. The extended level method overcomes the drawbacks of SILP and SD by exploiting all the gradients computed in past iterations and by regularizing the solution via a projection to a level set. Empirical study with eight UCI datasets shows that the extended level method can significantly improve efficiency by saving on average 91.9% of computational time over the SILP method and 70.3% over the SD method. 1

5 0.085860699 139 nips-2008-Modeling the effects of memory on human online sentence processing with particle filters

Author: Roger P. Levy, Florencia Reali, Thomas L. Griffiths

Abstract: Language comprehension in humans is significantly constrained by memory, yet rapid, highly incremental, and capable of utilizing a wide range of contextual information to resolve ambiguity and form expectations about future input. In contrast, most of the leading psycholinguistic models and fielded algorithms for natural language parsing are non-incremental, have run time superlinear in input length, and/or enforce structural locality constraints on probabilistic dependencies between events. We present a new limited-memory model of sentence comprehension which involves an adaptation of the particle filter, a sequential Monte Carlo method, to the problem of incremental parsing. We show that this model can reproduce classic results in online sentence comprehension, and that it naturally provides the first rational account of an outstanding problem in psycholinguistics, in which the preferred alternative in a syntactic ambiguity seems to grow more attractive over time even in the absence of strong disambiguating information. 1

6 0.074907042 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks

7 0.067971185 127 nips-2008-Logistic Normal Priors for Unsupervised Probabilistic Grammar Induction

8 0.055793878 82 nips-2008-Fast Computation of Posterior Mode in Multi-Level Hierarchical Models

9 0.054993629 4 nips-2008-A Scalable Hierarchical Distributed Language Model

10 0.048071314 36 nips-2008-Beyond Novelty Detection: Incongruent Events, when General and Specific Classifiers Disagree

11 0.045843981 112 nips-2008-Kernel Measures of Independence for non-iid Data

12 0.044248436 42 nips-2008-Cascaded Classification Models: Combining Models for Holistic Scene Understanding

13 0.043861646 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data

14 0.041112997 234 nips-2008-The Infinite Factorial Hidden Markov Model

15 0.040419493 152 nips-2008-Non-stationary dynamic Bayesian networks

16 0.039429907 229 nips-2008-Syntactic Topic Models

17 0.039290242 77 nips-2008-Evaluating probabilities under high-dimensional latent variable models

18 0.038056687 81 nips-2008-Extracting State Transition Dynamics from Multiple Spike Trains with Correlated Poisson HMM

19 0.037950482 69 nips-2008-Efficient Exact Inference in Planar Ising Models

20 0.037859574 104 nips-2008-Improved Moves for Truncated Convex Models


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.138), (1, -0.009), (2, 0.056), (3, -0.044), (4, 0.004), (5, -0.03), (6, 0.017), (7, 0.032), (8, -0.015), (9, -0.055), (10, -0.049), (11, 0.026), (12, 0.043), (13, 0.004), (14, 0.004), (15, 0.057), (16, 0.033), (17, -0.095), (18, -0.041), (19, -0.008), (20, 0.035), (21, 0.08), (22, 0.002), (23, -0.028), (24, -0.048), (25, 0.01), (26, 0.021), (27, -0.085), (28, -0.004), (29, -0.058), (30, 0.037), (31, -0.083), (32, -0.196), (33, -0.095), (34, -0.069), (35, -0.124), (36, 0.073), (37, 0.033), (38, 0.027), (39, 0.035), (40, -0.066), (41, -0.013), (42, -0.074), (43, 0.106), (44, -0.026), (45, 0.03), (46, 0.023), (47, 0.012), (48, -0.158), (49, 0.065)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.91636932 98 nips-2008-Hierarchical Semi-Markov Conditional Random Fields for Recursive Sequential Data

Author: Tran T. Truyen, Dinh Phung, Hung Bui, Svetha Venkatesh

Abstract: Inspired by the hierarchical hidden Markov models (HHMM), we present the hierarchical semi-Markov conditional random field (HSCRF), a generalisation of embedded undirected Markov chains to model complex hierarchical, nested Markov processes. It is parameterised in a discriminative framework and has polynomial time algorithms for learning and inference. Importantly, we develop efficient algorithms for learning and constrained inference in a partially-supervised setting, which is important issue in practice where labels can only be obtained sparsely. We demonstrate the HSCRF in two applications: (i) recognising human activities of daily living (ADLs) from indoor surveillance cameras, and (ii) noun-phrase chunking. We show that the HSCRF is capable of learning rich hierarchical models with reasonable accuracy in both fully and partially observed data cases. 1

2 0.63560271 139 nips-2008-Modeling the effects of memory on human online sentence processing with particle filters

Author: Roger P. Levy, Florencia Reali, Thomas L. Griffiths

Abstract: Language comprehension in humans is significantly constrained by memory, yet rapid, highly incremental, and capable of utilizing a wide range of contextual information to resolve ambiguity and form expectations about future input. In contrast, most of the leading psycholinguistic models and fielded algorithms for natural language parsing are non-incremental, have run time superlinear in input length, and/or enforce structural locality constraints on probabilistic dependencies between events. We present a new limited-memory model of sentence comprehension which involves an adaptation of the particle filter, a sequential Monte Carlo method, to the problem of incremental parsing. We show that this model can reproduce classic results in online sentence comprehension, and that it naturally provides the first rational account of an outstanding problem in psycholinguistics, in which the preferred alternative in a syntactic ambiguity seems to grow more attractive over time even in the absence of strong disambiguating information. 1

3 0.54362619 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks

Author: Jun Zhu, Eric P. Xing, Bo Zhang

Abstract: Learning graphical models with hidden variables can offer semantic insights to complex data and lead to salient structured predictors without relying on expensive, sometime unattainable fully annotated training data. While likelihood-based methods have been extensively explored, to our knowledge, learning structured prediction models with latent variables based on the max-margin principle remains largely an open problem. In this paper, we present a partially observed Maximum Entropy Discrimination Markov Network (PoMEN) model that attempts to combine the advantages of Bayesian and margin based paradigms for learning Markov networks from partially labeled data. PoMEN leads to an averaging prediction rule that resembles a Bayes predictor that is more robust to overfitting, but is also built on the desirable discriminative laws resemble those of the M3 N. We develop an EM-style algorithm utilizing existing convex optimization algorithms for M3 N as a subroutine. We demonstrate competent performance of PoMEN over existing methods on a real-world web data extraction task. 1

4 0.50679344 127 nips-2008-Logistic Normal Priors for Unsupervised Probabilistic Grammar Induction

Author: Shay B. Cohen, Kevin Gimpel, Noah A. Smith

Abstract: We explore a new Bayesian model for probabilistic grammars, a family of distributions over discrete structures that includes hidden Markov models and probabilistic context-free grammars. Our model extends the correlated topic model framework to probabilistic grammars, exploiting the logistic normal distribution as a prior over the grammar parameters. We derive a variational EM algorithm for that model, and then experiment with the task of unsupervised grammar induction for natural language dependency parsing. We show that our model achieves superior results over previous models that use different priors. 1

5 0.47637001 82 nips-2008-Fast Computation of Posterior Mode in Multi-Level Hierarchical Models

Author: Liang Zhang, Deepak Agarwal

Abstract: Multi-level hierarchical models provide an attractive framework for incorporating correlations induced in a response variable that is organized hierarchically. Model fitting is challenging, especially for a hierarchy with a large number of nodes. We provide a novel algorithm based on a multi-scale Kalman filter that is both scalable and easy to implement. For Gaussian response, we show our method provides the maximum a-posteriori (MAP) parameter estimates; for non-Gaussian response, parameter estimation is performed through a Laplace approximation. However, the Laplace approximation provides biased parameter estimates that is corrected through a parametric bootstrap procedure. We illustrate through simulation studies and analyses of real world data sets in health care and online advertising.

6 0.47262928 191 nips-2008-Recursive Segmentation and Recognition Templates for 2D Parsing

7 0.46828479 25 nips-2008-An interior-point stochastic approximation method and an L1-regularized delta rule

8 0.45035461 186 nips-2008-Probabilistic detection of short events, with application to critical care monitoring

9 0.44519073 4 nips-2008-A Scalable Hierarchical Distributed Language Model

10 0.44240493 70 nips-2008-Efficient Inference in Phylogenetic InDel Trees

11 0.39758316 35 nips-2008-Bayesian Synchronous Grammar Induction

12 0.39578336 20 nips-2008-An Extended Level Method for Efficient Multiple Kernel Learning

13 0.39356861 129 nips-2008-MAS: a multiplicative approximation scheme for probabilistic inference

14 0.37881589 115 nips-2008-Learning Bounded Treewidth Bayesian Networks

15 0.37520698 36 nips-2008-Beyond Novelty Detection: Incongruent Events, when General and Specific Classifiers Disagree

16 0.35329095 2 nips-2008-A Convex Upper Bound on the Log-Partition Function for Binary Distributions

17 0.35230118 16 nips-2008-Adaptive Template Matching with Shift-Invariant Semi-NMF

18 0.34015328 211 nips-2008-Simple Local Models for Complex Dynamical Systems

19 0.33600098 30 nips-2008-Bayesian Experimental Design of Magnetic Resonance Imaging Sequences

20 0.30874181 175 nips-2008-PSDBoost: Matrix-Generation Linear Programming for Positive Semidefinite Matrices Learning


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(6, 0.036), (7, 0.051), (12, 0.051), (15, 0.011), (28, 0.141), (57, 0.104), (59, 0.019), (63, 0.016), (71, 0.017), (76, 0.316), (77, 0.057), (78, 0.032), (83, 0.059)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.75786924 98 nips-2008-Hierarchical Semi-Markov Conditional Random Fields for Recursive Sequential Data

Author: Tran T. Truyen, Dinh Phung, Hung Bui, Svetha Venkatesh

Abstract: Inspired by the hierarchical hidden Markov models (HHMM), we present the hierarchical semi-Markov conditional random field (HSCRF), a generalisation of embedded undirected Markov chains to model complex hierarchical, nested Markov processes. It is parameterised in a discriminative framework and has polynomial time algorithms for learning and inference. Importantly, we develop efficient algorithms for learning and constrained inference in a partially-supervised setting, which is an important issue in practice, where labels can only be obtained sparsely. We demonstrate the HSCRF in two applications: (i) recognising human activities of daily living (ADLs) from indoor surveillance cameras, and (ii) noun-phrase chunking. We show that the HSCRF is capable of learning rich hierarchical models with reasonable accuracy in both fully and partially observed data cases. 1

2 0.53765357 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data

Author: Xuming He, Richard S. Zemel

Abstract: Extensive labeled data for image annotation systems, which learn to assign class labels to image regions, is difficult to obtain. We explore a hybrid model framework for utilizing partially labeled data that integrates a generative topic model for image appearance with discriminative label prediction. We propose three alternative formulations for imposing a spatial smoothness prior on the image labels. Tests of the new models and some baseline approaches on three real image datasets demonstrate the effectiveness of incorporating the latent structure. 1

3 0.53212166 208 nips-2008-Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes

Author: Erik B. Sudderth, Michael I. Jordan

Abstract: We develop a statistical framework for the simultaneous, unsupervised segmentation and discovery of visual object categories from image databases. Examining a large set of manually segmented scenes, we show that object frequencies and segment sizes both follow power law distributions, which are well modeled by the Pitman–Yor (PY) process. This nonparametric prior distribution leads to learning algorithms which discover an unknown set of objects, and segmentation methods which automatically adapt their resolution to each image. Generalizing previous applications of PY processes, we use Gaussian processes to discover spatially contiguous segments which respect image boundaries. Using a novel family of variational approximations, our approach produces segmentations which compare favorably to state-of-the-art methods, while simultaneously discovering categories shared among natural scenes. 1

4 0.52989638 200 nips-2008-Robust Kernel Principal Component Analysis

Author: Minh H. Nguyen, Fernando Torre

Abstract: Kernel Principal Component Analysis (KPCA) is a popular generalization of linear PCA that allows non-linear feature extraction. In KPCA, data in the input space is mapped to higher (usually) dimensional feature space where the data can be linearly modeled. The feature space is typically induced implicitly by a kernel function, and linear PCA in the feature space is performed via the kernel trick. However, due to the implicitness of the feature space, some extensions of PCA such as robust PCA cannot be directly generalized to KPCA. This paper presents a technique to overcome this problem, and extends it to a unified framework for treating noise, missing data, and outliers in KPCA. Our method is based on a novel cost function to perform inference in KPCA. Extensive experiments, in both synthetic and real data, show that our algorithm outperforms existing methods. 1

5 0.52532011 66 nips-2008-Dynamic visual attention: searching for coding length increments

Author: Xiaodi Hou, Liqing Zhang

Abstract: A visual attention system should respond placidly when common stimuli are presented, while at the same time keep alert to anomalous visual inputs. In this paper, a dynamic visual attention model based on the rarity of features is proposed. We introduce the Incremental Coding Length (ICL) to measure the perspective entropy gain of each feature. The objective of our model is to maximize the entropy of the sampled visual features. In order to optimize energy consumption, the limit amount of energy of the system is re-distributed amongst features according to their Incremental Coding Length. By selecting features with large coding length increments, the computational system can achieve attention selectivity in both static and dynamic scenes. We demonstrate that the proposed model achieves superior accuracy in comparison to mainstream approaches in static saliency map generation. Moreover, we also show that our model captures several less-reported dynamic visual search behaviors, such as attentional swing and inhibition of return. 1

6 0.52453667 100 nips-2008-How memory biases affect information transmission: A rational analysis of serial reproduction

7 0.5241611 216 nips-2008-Sparse probabilistic projections

8 0.52361655 246 nips-2008-Unsupervised Learning of Visual Sense Models for Polysemous Words

9 0.52347445 197 nips-2008-Relative Performance Guarantees for Approximate Inference in Latent Dirichlet Allocation

10 0.52317059 95 nips-2008-Grouping Contours Via a Related Image

11 0.52277124 118 nips-2008-Learning Transformational Invariants from Natural Movies

12 0.52198017 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks

13 0.52073538 235 nips-2008-The Infinite Hierarchical Factor Regression Model

14 0.52044648 42 nips-2008-Cascaded Classification Models: Combining Models for Holistic Scene Understanding

15 0.52014667 27 nips-2008-Artificial Olfactory Brain for Mixture Identification

16 0.51885134 234 nips-2008-The Infinite Factorial Hidden Markov Model

17 0.51853073 26 nips-2008-Analyzing human feature learning as nonparametric Bayesian inference

18 0.51830983 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations

19 0.51821256 138 nips-2008-Modeling human function learning with Gaussian processes

20 0.51756555 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization