nips nips2009 nips2009-171 knowledge-graph by maker-knowledge-mining

171 nips-2009-Nonparametric Bayesian Models for Unsupervised Event Coreference Resolution


Source: pdf

Author: Cosmin Bejan, Matthew Titsworth, Andrew Hickl, Sanda Harabagiu

Abstract: We present a sequence of unsupervised, nonparametric Bayesian models for clustering complex linguistic objects. In this approach, we consider a potentially infinite number of features and categorical outcomes. We evaluated these models for the task of within- and cross-document event coreference on two corpora. All the models we investigated show significant improvements when compared against an existing baseline for this task.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract We present a sequence of unsupervised, nonparametric Bayesian models for clustering complex linguistic objects. [sent-3, score-0.128]

2 We evaluated these models for the task of within- and cross-document event coreference on two corpora. [sent-5, score-0.92]

3 1 Introduction In Natural Language Processing (NLP), the task of event coreference has numerous applications, including question answering, multi-document summarization, and information extraction. [sent-7, score-0.92]

4 Two event mentions are coreferential if they share the same participants and spatio-temporal groundings. [sent-8, score-0.757]

5 Moreover, two event mentions are identical if they have the same causes and effects. [sent-9, score-0.729]

6 For example, the three documents listed in Table 1 contain four mentions of identical events, but only the arrested, apprehended, and arrest mentions from documents 1 and 2 are coreferential. [sent-10, score-0.776]

7 Previous approaches to event coreference resolution [3] relied on lexeme identity or synonymy of the verb describing the event to decide coreference. [sent-12, score-1.46]

8 Event coreference has also been attempted using the semantic types of an ontology [17]. [sent-13, score-0.521]

9 To address these problems, we have explored a sequence of unsupervised, nonparametric Bayesian models that probabilistically infer coreference clusters of event mentions from a collection of unlabeled documents. [sent-15, score-1.248]

10 Our approach is motivated by the recent success of unsupervised approaches for entity coreference resolution [16, 22, 25] and by the advantages of using a large amount of data at no cost. [sent-16, score-0.648]

11 However, employing H&K’s model for tasks that require clustering objects with rich linguistic features (such as event coreference resolution), or extending this model to include additional observable properties, is a challenging task [22, 25]. [sent-18, score-1.144]

12 To counter this limitation, we make a conditional independence assumption between the observable features and propose a generalized framework (Section 3) that can easily incorporate new features. [sent-19, score-0.142]

13 In this paper, we focus on event coreference resolution, though adaptations for event identity resolution can be easily made. [sent-25, score-1.46]

14 We evaluated the models on the ACE 2005 event corpus [18] and on a new annotated corpus encoding within- and cross-document event coreference information (Section 5). [sent-26, score-1.486]

15 Document 2: Despite his arrest on suspicion of driving under the influence yesterday, Chargers receiver Vincent Jackson will play in Sunday’s AFC divisional playoff game at Pittsburgh. [sent-34, score-0.168]

16 2 Event Coreference Resolution Models for solving event coreference and event identity can lead to the generation of ad-hoc event hierarchies from text. [sent-37, score-1.814]

17 Figure 1: A portion of the event hierarchy (generic events, events, and event mentions), with per-mention event properties such as Suspect, Authorities, Time, and Location (for one arrest mention: Suspect: Vincent Jackson, Authorities: police, Time: Tuesday, Location: San Diego; for another: Suspect: sea brigands, Authorities: Navy warship, Time: Saturday, Location: Gulf of Aden). [sent-57, score-0.475]

19 Next, to cluster the mentions that share common event properties (as shown in Figure 1), we briefly describe the linguistic features of event mentions. [sent-59, score-1.332]

20 1 Notation As input for our models, we consider a collection of I documents, each document i having Ji event mentions. [sent-61, score-0.525]

21 Each event mention is characterized by L feature types, FT, and each feature type is represented by a finite number of feature values, fv. [sent-62, score-0.907]

22 Therefore, we can represent the observable properties of an event mention, em, as a vector of pairs (FT_1 : fv_{1i}), ..., (FT_L : fv_{Li}), where each feature value index i ranges over the feature value space associated with its feature type. [sent-63, score-0.521]
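
As an illustration of this notation (names and values below are ours, not the authors'), an event mention can be stored as a map from feature types to feature values, and a corpus as a nested list of such mentions:

```python
# Sketch: event mentions as (feature type : feature value) pairs.
# All identifiers here are illustrative, not the paper's implementation.
from typing import Dict, List

EventMention = Dict[str, str]   # e.g. {"HL": "arrest", "POS": "NN", "FR": "ARREST"}
Document = List[EventMention]   # a document is its sequence of event mentions
Corpus = List[Document]         # I documents, document i holding J_i mentions

corpus: Corpus = [
    [  # document 1
        {"HL": "arrest", "POS": "VBD", "HWC": "verb", "EC": "Justice", "FR": "ARREST"},
        {"HL": "apprehend", "POS": "VBD", "HWC": "verb", "EC": "Justice", "FR": "ARREST"},
    ],
    [  # document 2
        {"HL": "arrest", "POS": "NN", "HWC": "noun", "EC": "Justice", "FR": "ARREST"},
    ],
]
```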

24 Class Features (CF) These features classify mentions into several types of classes: the mention HW’s part-of-speech (POS), the word class of the HW (HWC), which can take one of the values verb, noun, adjective, or other, and the event class of the mention (EC). [sent-69, score-1.139]

25 To extract the event class associated with every event mention, we employed the event identifier described in [6]. [sent-70, score-1.341]

26 WordNet Features (WF) We build three types of clusters over all the words from WordNet [9] and use them as features for the mention HW. [sent-71, score-0.239]

27 Semantic Features (SF) To extract features that characterize participants and properties of event mentions, we use a semantic parser [8] trained on the PropBank (PB) [23] and FrameNet (FN) [4] corpora. [sent-76, score-0.563]

28 (For instance, for the apprehended mention from our example, Jackson is the feature value.) For consistency, we try to preserve the notation of the original models. [sent-77, score-0.311]

29 In this subsection and the following section, the term feature is used in the context of a feature type. [sent-78, score-0.168]

30 Another semantic feature is the semantic frame (FR) that is evoked by an event mention. [sent-80, score-0.661]

31 (For instance, all the emphasized mentions from our example evoke the ARREST frame from FN.) [sent-81, score-0.316]

32 3 Finite Feature Models In this section, we present a sequence of HDP mixture models for solving event coreference. [sent-84, score-0.447]

33 To describe these models, we consider Z the set of indicator random variables for indices of events, φz the set of parameters associated with an event z, φ a notation for all model parameters, and X a notation for all random variables that represent observable features. [sent-86, score-0.521]

34 Given a document collection annotated with event mentions, the goal is to find the best assignment of event indices, Z∗, which maximizes the posterior probability P(Z | X). [sent-87, score-1.007]

35 In this model, which is depicted graphically in Figure 2(a), the observable components are characterized by only one feature. [sent-91, score-0.149]

36 The distribution over events associated with each document, β, is generated by a Dirichlet process with a concentration parameter α > 0. [sent-92, score-0.178]

37 Since this setting enables a clustering of event mentions at the document level, it is desirable that events are shared across documents and the number of events K is inferred from data. [sent-93, score-1.034]

38 The global distribution drawn from this DP prior, denoted as β0 in Figure 2(a), encodes the event mixing weights. [sent-95, score-0.447]

39 Thus, the same global events are used for each document, but each event has a document-specific distribution βi that is drawn from a DP prior centered on β0. [sent-96, score-0.625]
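
To make this two-level prior concrete, here is a small truncated sketch (a finite number K of sticks stands in for the infinite set; the function names, the truncation, and the top-level concentration γ are our assumptions, not the authors'): β0 is drawn by stick breaking, and each document-specific βi is a draw from a DP centered on β0, which for a finite β0 reduces to a Dirichlet.

```python
import numpy as np

def stick_breaking(gamma, K, rng):
    """Truncated stick-breaking draw of the global event distribution beta_0 (K sticks)."""
    v = rng.beta(1.0, gamma, size=K)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

def document_distribution(beta0, alpha, rng):
    """beta_i ~ DP(alpha, beta_0); with a finite beta_0 this is exactly Dirichlet(alpha * beta_0)."""
    return rng.dirichlet(alpha * beta0)

rng = np.random.default_rng(0)
beta0 = stick_breaking(gamma=1.0, K=50, rng=rng)            # global event mixing weights
beta_i = document_distribution(beta0, alpha=1.0, rng=rng)   # document-specific event distribution
```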

40 The formula for sampling an event index for mention j from document i, Z_{i,j}, is given by: P(Z_{i,j} | Z_{−i,j}, HL) ∝ P(Z_{i,j} | Z_{−i,j}) P(HL_{i,j} | Z, HL_{−i,j}), where HL_{i,j} is the head lemma of event mention j from document i. [sent-99, score-1.492]

41 Then, to generate the mention head lemma (in this model, X = HL ), the event z is associated with a multinomial emission distribution over the HL feature values having the parameters φ = φhl . [sent-101, score-0.764]

42 P(HL_{i,j} = hl | Z, HL_{−i,j}) ∝ n_{hl,z} + λ_{HL}, where HL_{i,j} is the head lemma of mention j from document i, and n_{hl,z} is the number of times the feature value hl has been associated with the event index z in (Z, HL_{−i,j}). [sent-112, score-1.329]
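
A minimal sketch of this collapsed Gibbs step, with two simplifications that are ours rather than the paper's: the prior over event indices is approximated by plain CRP-style counts instead of the full hierarchical DP bookkeeping, and the base measure over head lemmas is taken to be uniform. The argument names are hypothetical.

```python
import random

def sample_event_index(hl, doc_counts, global_counts, emission_counts,
                       alpha, lam_hl, vocab_size):
    """One draw of Z_{i,j} for a mention with head lemma `hl` (simplified sketch).

    P(Z_{i,j} = z | ...) ∝ prior(z) * (n_{hl,z} + lam_hl) / (n_{.,z} + vocab_size * lam_hl),
    where prior(z) is approximated here by document-level plus global counts.
    """
    candidates = list(global_counts.keys()) + ["NEW"]
    weights = []
    for z in candidates:
        if z == "NEW":
            prior = alpha                          # weight of opening a new event cluster
            emit = 1.0 / vocab_size                # uniform base measure over head lemmas
        else:
            prior = doc_counts.get(z, 0) + global_counts[z]
            n_z = sum(c for (h, zz), c in emission_counts.items() if zz == z)
            emit = (emission_counts.get((hl, z), 0) + lam_hl) / (n_z + vocab_size * lam_hl)
        weights.append(prior * emit)
    r, acc = random.uniform(0.0, sum(weights)), 0.0
    for z, w in zip(candidates, weights):
        acc += w
        if r <= acc:
            return z
    return candidates[-1]
```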

43 2 Adding More Features A model in which observable components are represented only by one feature has the tendency to cluster these components based on their feature value. [sent-115, score-0.275]

44 To address this limitation, H&K proposed a more complex model that is strictly customized for entity coreference resolution. [sent-116, score-0.527]

45 On the other hand, event coreference involves clustering complex objects characterized by richer features than entity coreference (or topic detection); it is therefore desirable to extend the HDP1f model into a generalized model in which additional features can be easily incorporated. [sent-117, score-1.647]

46 In general, if X consists of L feature variables, the inference formula for the Gibbs sampler is defined as: P(Z_{i,j} | X) ∝ P(Z_{i,j}) ∏_{FT ∈ X} P(FT_{i,j} | Z). The graphical model for this general setting is depicted in Figure 2(c). [sent-124, score-0.154]
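
Under the conditional independence assumption, the unnormalized score of an event index is just the prior times a product of per-feature emission terms. A short sketch follows; `prior` and `emission_prob` are placeholders for the model's count-based estimates, not functions from the paper:

```python
def event_score(z, mention_features, prior, emission_prob):
    """Unnormalized P(Z_{i,j} = z | X) ∝ P(Z_{i,j} = z) * ∏_{FT ∈ X} P(FT_{i,j} | Z = z).

    mention_features maps feature types to observed values, e.g. {"HL": "arrest", "FR": "ARREST"}.
    """
    score = prior(z)
    for ft, value in mention_features.items():
        score *= emission_prob(ft, value, z)
    return score
```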

47 4 Unbounded Feature Models First, we present a generative model called the Markov Indian Buffet Process (mIBP) that provides a mechanism in which each object can be represented by a sparse subset of a potentially unbounded set of latent features [15, 14, 30]. [sent-134, score-0.192]

48 1 The Markov Indian Buffet Process As described in [30], the mIBP defines a distribution over an unbounded set of binary Markov chains, where each chain can be associated with a binary latent feature that evolves over time according to Markov dynamics. [sent-138, score-0.164]

49 An observation y_t contains a subset from the unbounded set of features {f^1, f^2, ...}. [sent-140, score-0.213]

50 Therefore, F decomposes the observations and represents them as feature factors, which can then be associated with hidden variables in an iFHMM as depicted in Figure 3(a). [sent-147, score-0.173]

51 The transition matrix of the binary Markov chain associated with a feature f^m is defined as W^{(m)} = [[1 − a_m, a_m], [1 − b_m, b_m]], where W^{(m)}_{ij} = P(F^m_{t+1} = j | F^m_t = i), the parameters a_m ∼ Beta(α′/M, 1) and b_m ∼ Beta(γ′, δ′), and the initial state F^m_0 = 0. [sent-148, score-0.204]

52 In the generative process, the hidden variable of feature f^m for an object y_t is drawn as F^m_t ∼ Bernoulli(a_m^{1−F^m_{t−1}} · b_m^{F^m_{t−1}}). [sent-149, score-0.268]

53 After all features are sampled for the tth component, a number of Poisson(α′ /t) new features are assigned for this component and M gets incremented accordingly. [sent-154, score-0.136]
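
A rough generative sketch of these mIBP dynamics (simplified: M is taken to be the current number of instantiated features when drawing a_m, and a newly created feature is switched on for the component that created it; parameter names follow the text, everything else is our assumption):

```python
import numpy as np

def mibp_generate(T, alpha_prime=1.0, gamma_prime=1.0, delta_prime=1.0, rng=None):
    """Generate a T x M binary feature matrix under a simplified mIBP.

    Each feature f^m is a binary Markov chain with
    F_t^m ~ Bernoulli(a_m^(1 - F_{t-1}^m) * b_m^(F_{t-1}^m)), i.e. a_m is the
    probability of switching on from 0 and b_m the probability of staying on.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    a, b = [], []                        # per-feature transition parameters
    F = np.zeros((T, 0), dtype=int)      # feature assignments, grown column by column
    for t in range(T):
        # evolve the already-instantiated feature chains (initial state F_0^m = 0)
        for m in range(F.shape[1]):
            prev = F[t - 1, m] if t > 0 else 0
            p_on = a[m] if prev == 0 else b[m]
            F[t, m] = rng.random() < p_on
        # assign Poisson(alpha'/(t+1)) brand-new features to the t-th component
        for _ in range(rng.poisson(alpha_prime / (t + 1))):
            a.append(rng.beta(alpha_prime / (len(a) + 1), 1.0))   # a_m ~ Beta(alpha'/M, 1)
            b.append(rng.beta(gamma_prime, delta_prime))          # b_m ~ Beta(gamma', delta')
            F = np.hstack([F, np.zeros((T, 1), dtype=int)])
            F[t, -1] = 1
    return F, np.array(a), np.array(b)
```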

54 Since one observable component is associated with an unbounded countable set of features, we have to provide a mechanism in which only a finite set of features will represent the component in the HDP inference process. [sent-158, score-0.238]

55 (M → ∞) The idea behind this mechanism is to use slice sampling [21] to derive a finite set of features for y_t. [sent-163, score-0.27]

56 Another point worth mentioning about the way this set is constructed is that only the most representative features of y_t are selected in B_t. [sent-165, score-0.161]
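
As one illustration of the slice-sampling idea only (the paper's exact construction of B_t from the mIBP quantities may differ), a threshold can be sampled and only the features whose weight exceeds it are kept, which yields a finite set biased toward the most representative features of y_t:

```python
def finite_feature_set(weights, rng):
    """Generic slice-style truncation (illustrative, not the paper's exact procedure).

    weights: feature id -> weight for mention y_t, assumed available from the mIBP state.
    """
    u_t = rng.uniform(0.0, max(weights.values()))      # slice threshold for this mention
    return {f for f, w in weights.items() if w >= u_t}
```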

57 To describe the beam sampler for event coreference resolution, we introduce additional notation. [sent-172, score-0.96]

58 We denote by (s_1, ..., s_T) the sequence of hidden states corresponding to the sequence of event mentions (y_1, ..., y_T), where each state s_t belongs to one of the K events, s_t ∈ {1, ..., K}, and each mention y_t is represented by a sequence of latent features F^1_t, F^2_t, .... [sent-176, score-0.78]

61 One element of the transition probability π is defined as π_{ij} = P(s_t = j | s_{t−1} = i), and a mention y_t is generated according to a likelihood model F parameterized by a state-dependent parameter φ_{s_t} (y_t | s_t ∼ F(φ_{s_t})). [sent-186, score-0.42]

62 The beam sampling algorithm combines the ideas of slice sampling and dynamic programming for an efficient sampling of state trajectories. [sent-188, score-0.175]

63 ..., and the auxiliary variable u_t ∼ Uniform(0, π_{s_{t−1} s_t}) is sampled for each mention y_t. [sent-192, score-0.42]

64 Also, in this step, we compute the probabilities P(s_t | y_{1:t}, u_{1:t}) for all t as described in [29]: P(s_t | y_{1:t}, u_{1:t}) ∝ P(y_t | s_t) Σ_{s_{t−1} : u_t < π_{s_{t−1} s_t}} P(s_{t−1} | y_{1:t−1}, u_{1:t−1}). Here, the dependencies involving the parameters π and φ are omitted for clarity. [sent-195, score-0.345]

65 In the backward step, we first sample the event for the last state s_T directly from P(s_T | y_{1:T}, u_{1:T}) and then, for all t = T−1, ..., 1, we sample each state s_t given s_{t+1} using the formula P(s_t | s_{t+1}, y_{1:T}, u_{1:T}) ∝ P(s_t | y_{1:t}, u_{1:t}) P(s_{t+1} | s_t, u_{t+1}). [sent-196, score-0.635]
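
A compact sketch of one beam-sampling sweep, under assumptions that are ours: a finite truncation to K events, a uniform initial state distribution, and a given transition matrix π, per-state log-likelihoods, and previous state sequence; resampling of π, φ, and the infinite-state machinery is omitted.

```python
import numpy as np

def beam_resample_states(pi, loglik, states, rng):
    """One beam-sampling sweep over the event indices s_1..s_T (simplified sketch).

    pi     : K x K transition matrix, pi[i, j] = P(s_t = j | s_{t-1} = i)
    loglik : T x K matrix, loglik[t, k] = log P(y_t | s_t = k)
    states : current state sequence (length T), used to sample the slice variables u_t
    """
    T, K = loglik.shape
    lik = np.exp(loglik - loglik.max(axis=1, keepdims=True))  # rescaled per step; proportionality suffices
    pi0 = np.full(K, 1.0 / K)                                 # assumed uniform initial distribution

    # Forward step: u_t ~ Uniform(0, pi[s_{t-1}, s_t]) and filtered P(s_t | y_1:t, u_1:t).
    u = np.empty(T)
    filt = np.zeros((T, K))
    for t in range(T):
        prev = pi0 if t == 0 else pi[states[t - 1]]
        u[t] = rng.uniform(0.0, prev[states[t]])
        if t == 0:
            filt[t] = lik[t] * (pi0 > u[t])
        else:
            # sum over predecessors s_{t-1} with pi[s_{t-1}, s_t] > u_t (the "beam")
            filt[t] = lik[t] * (filt[t - 1] @ (pi > u[t]).astype(float))
        filt[t] /= filt[t].sum()

    # Backward step: sample s_T from the last filtered distribution, then each s_t given s_{t+1}.
    new_states = np.empty(T, dtype=int)
    new_states[T - 1] = rng.choice(K, p=filt[T - 1])
    for t in range(T - 2, -1, -1):
        w = filt[t] * (pi[:, new_states[t + 1]] > u[t + 1])
        new_states[t] = rng.choice(K, p=w / w.sum())
    return new_states
```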

66 ..., o_K) defined as: o_k = Σ_{t=1}^{T} Σ_{f^m ∈ B_t} n_{mk}, where n_{mk} counts how many times feature f^m was sampled for event k, and B_t stores a finite set of features for y_t as defined in Section 4. [sent-200, score-0.692]

67 This corpus annotates within-document coreference information of specific types of events (such as Conflict, Justice, and Life). [sent-203, score-0.643]

68 After an initial processing phase, we extracted from ACE 6553 event mentions and 4946 events. [sent-204, score-0.729]

69 To increase the diversity of events and to evaluate the models for both within- and cross-document event coreference, we created the EventCorefBank corpus (ECB). [sent-205, score-0.617]

70 This new corpus contains 43 topics, 1744 event mentions, 1302 within-document events, and 339 cross-document events. [sent-206, score-0.489]

71 For a more realistic approach, we trained the models on all the event mentions from the two corpora and not only on the mentions manually annotated for event coreference (the true event mentions). [sent-207, score-2.413]

72 In this regard, we ran the event identifier described in [6] on the ACE and ECB corpora, and extracted 45289 and 21175 system mentions respectively. [sent-208, score-0.729]

73 Since there is no agreement on the best coreference resolution metric, we employed four metrics for our evaluation: the link-based MUC metric [31], the mention-based B3 metric [2], the entity-based CEAF metric [19], and the pairwise F1 (PW) metric. [sent-210, score-0.791]
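
Of these, the pairwise (PW) metric is simple enough to sketch directly: precision and recall are computed over coreference links (unordered mention pairs) rather than over clusters. The MUC, B3, and CEAF scores require their own scorers and are not reproduced here.

```python
from itertools import combinations

def pairwise_f1(gold_clusters, pred_clusters):
    """Pairwise precision, recall, and F1 over coreference links between mentions.

    Each argument is a list of clusters; each cluster is a collection of mention ids.
    """
    def links(clusters):
        return {frozenset(p) for c in clusters for p in combinations(sorted(c), 2)}

    gold, pred = links(gold_clusters), links(pred_clusters)
    if not gold or not pred:
        return 0.0, 0.0, 0.0
    p = len(gold & pred) / len(pred)
    r = len(gold & pred) / len(gold)
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# e.g. pairwise_f1([["m1", "m2", "m3"], ["m4"]], [["m1", "m2"], ["m3", "m4"]]) -> (0.5, 0.333..., 0.4)
```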

74 In the evaluation process, we considered only the true mentions of the ACE test dataset and of the test sets of a 5-fold cross validation scheme on the ECB dataset. [sent-211, score-0.282]

75 For evaluating the cross-document coreference annotations, we adopted the same approach as described in [3] by merging all the documents from the same topic into a meta-document and then scoring this document as performed for within-document evaluation. [sent-212, score-0.551]

76 Also, for both corpora, we considered a set of 132 feature types, where each feature type consists, on average, of 3900 distinct feature values. [sent-213, score-0.252]

77 The Baseline A simple baseline for event coreference is to group events by their event classes [1]. [sent-214, score-0.447]

78 To extract event classes, we employed the event identifier described in [6]. [sent-215, score-0.894]

79 Therefore, this baseline will categorize events into a small number of clusters, since the event identifier is trained to predict the five event classes annotated in TimeBank [26]. [sent-216, score-1.064]
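
A few lines suffice to sketch this baseline (`predicted_class` stands in for the output of the event identifier of [6]; the function name is ours):

```python
from collections import defaultdict

def event_class_baseline(mentions, predicted_class):
    """Baseline: all mentions sharing a predicted event class form one coreference cluster."""
    clusters = defaultdict(list)
    for m in mentions:
        clusters[predicted_class(m)].append(m)
    return list(clusters.values())
```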

80 As has already been observed [20, 11], considering very few categories in coreference resolution tasks results in overestimates by the MUC scorer. [sent-217, score-0.566]

81 For instance, a baseline that groups all entity mentions into the same entity achieves a higher MUC score than any published system for the task of entity coreference. [sent-218, score-0.479]

82 Similar behaviour of the MUC metric is observed for event coreference resolution. [sent-219, score-0.92]

83 For example, for cross-document evaluation on ECB, a baseline that clusters all mentions into one event achieves 73. [sent-220, score-0.792]

84 HDP Extensions Due to memory limitations, we evaluated the HDPflat and HDPstruct models only on a restricted subset of manually selected feature types. [sent-223, score-0.21]

85 In general, as shown in Table 2, the HDPflat model achieved the best performance on the ACE test dataset. [sent-224, score-0.126]

90 Table 2: Evaluation results for within- and cross-document event coreference resolution: MUC, B3, and CEAF recall, precision, and F1 for the Baseline, HDP1f (HL), HDPflat, HDPstruct, mIBP-HDP, and iFHMM-iHMM models on the ACE and ECB corpora. [sent-445, score-0.92]

91 In contrast, the HDPstruct model, which also considers dependencies between feature types, proved more effective on the ECB dataset for both within- and cross-document event coreference evaluation. [sent-446, score-1.037]

92 The set of feature types used to achieve these results consists of combinations of types from all feature categories described in Section 2. [sent-447, score-0.168]

93 As can be observed from Table 2, the results of the HDPflat and HDPstruct models show an F-score increase of 4-10% over the HDP1f model, demonstrating that the HDP extensions provide a more flexible representation for clustering objects characterized by rich properties. [sent-450, score-0.19]

94 When compared with the restricted set of features considered by the HDPflat and HDPstruct models, the percentage of values selected by mIBP-HDP is only 6%. [sent-453, score-0.194]

95 iFHMM-iHMM In spite of the automatic feature selection employed for the iFHMM-iHMM model, its results remain competitive against the results of the HDP extensions (where the feature types were hand tuned). [sent-455, score-0.168]

96 Also, these results indicate that the iFHMM-iHMM model is a better framework than HDP in capturing the event mention dependencies simulated by the mIBP feature sampling scheme. [sent-457, score-0.77]

97 6 Conclusion In this paper, we have described how a sequence of unsupervised, nonparametric Bayesian models can be employed to cluster complex linguistic objects that are characterized by a rich set of features. [sent-462, score-0.171]

98 The experimental results showed that these models are able to handle real-data applications in which the numbers of features and clusters are treated as free parameters and feature selection is performed automatically. [sent-463, score-0.185]

99 While the results for event coreference resolution are promising, we believe that the classes of models proposed in this paper have real utility for a wide range of applications. [sent-464, score-1.013]

100 Bayesian Statistics 8, chapter Bayesian nonparametric latent feature models, pages 201–225. [sent-530, score-0.158]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('coreference', 0.473), ('event', 0.447), ('mentions', 0.282), ('hl', 0.258), ('mention', 0.171), ('st', 0.156), ('mibp', 0.154), ('hdpstruct', 0.14), ('hdpf', 0.126), ('lat', 0.126), ('hdp', 0.115), ('arrest', 0.112), ('events', 0.1), ('ace', 0.098), ('ecb', 0.098), ('ftm', 0.098), ('muc', 0.098), ('resolution', 0.093), ('yt', 0.093), ('ifhmm', 0.084), ('feature', 0.084), ('document', 0.078), ('observable', 0.074), ('wordnet', 0.074), ('fm', 0.07), ('features', 0.068), ('fr', 0.059), ('jackson', 0.056), ('hw', 0.056), ('apprehended', 0.056), ('cosmin', 0.056), ('linguistic', 0.055), ('entity', 0.054), ('ft', 0.053), ('pb', 0.053), ('unbounded', 0.052), ('hidden', 0.051), ('markov', 0.049), ('adrian', 0.049), ('semantic', 0.048), ('nonparametric', 0.046), ('mechanism', 0.044), ('factorial', 0.044), ('zoubin', 0.042), ('corpus', 0.042), ('arrested', 0.042), ('hli', 0.042), ('bm', 0.04), ('beam', 0.04), ('depicted', 0.038), ('characterized', 0.037), ('bt', 0.037), ('ihmm', 0.037), ('ji', 0.036), ('annotated', 0.035), ('baseline', 0.035), ('sampling', 0.035), ('frame', 0.034), ('nite', 0.033), ('head', 0.033), ('dependencies', 0.033), ('cluster', 0.033), ('vincent', 0.032), ('buffet', 0.032), ('formula', 0.032), ('lexical', 0.03), ('whye', 0.03), ('yee', 0.03), ('pos', 0.03), ('slice', 0.03), ('indian', 0.029), ('henceforth', 0.029), ('emission', 0.029), ('gael', 0.029), ('unsupervised', 0.028), ('aden', 0.028), ('annotates', 0.028), ('bagga', 0.028), ('bejan', 0.028), ('breck', 0.028), ('ceaf', 0.028), ('chargers', 0.028), ('coreferential', 0.028), ('crossdocument', 0.028), ('framenet', 0.028), ('fri', 0.028), ('gaizauskas', 0.028), ('gulf', 0.028), ('hwc', 0.028), ('jurgen', 0.028), ('nabbed', 0.028), ('playoff', 0.028), ('sanda', 0.028), ('saturday', 0.028), ('suspicion', 0.028), ('timebank', 0.028), ('tuesday', 0.028), ('warship', 0.028), ('latent', 0.028), ('clustering', 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 171 nips-2009-Nonparametric Bayesian Models for Unsupervised Event Coreference Resolution

Author: Cosmin Bejan, Matthew Titsworth, Andrew Hickl, Sanda Harabagiu

Abstract: We present a sequence of unsupervised, nonparametric Bayesian models for clustering complex linguistic objects. In this approach, we consider a potentially infinite number of features and categorical outcomes. We evaluated these models for the task of within- and cross-document event coreference on two corpora. All the models we investigated show significant improvements when compared against an existing baseline for this task.

2 0.24917534 89 nips-2009-FACTORIE: Probabilistic Programming via Imperatively Defined Factor Graphs

Author: Andrew McCallum, Karl Schultz, Sameer Singh

Abstract: Discriminatively trained undirected graphical models have had wide empirical success, and there has been increasing interest in toolkits that ease their application to complex relational data. The power in relational models is in their repeated structure and tied parameters; at issue is how to define these structures in a powerful and flexible way. Rather than using a declarative language, such as SQL or first-order logic, we advocate using an imperative language to express various aspects of model structure, inference, and learning. By combining the traditional, declarative, statistical semantics of factor graphs with imperative definitions of their construction and operation, we allow the user to mix declarative and procedural domain knowledge, and also gain significant efficiencies. We have implemented such imperatively defined factor graphs in a system we call FACTORIE, a software library for an object-oriented, strongly-typed, functional language. In experimental comparisons to Markov Logic Networks on joint segmentation and coreference, we find our approach to be 3-15 times faster while reducing error by 20-25%—achieving a new state of the art. 1

3 0.095889516 45 nips-2009-Beyond Convexity: Online Submodular Minimization

Author: Elad Hazan, Satyen Kale

Abstract: We consider an online decision problem over a discrete space in which the loss function is submodular. We give algorithms which are computationally efficient and are Hannan-consistent in both the full information and bandit settings. 1

4 0.074481875 24 nips-2009-Adapting to the Shifting Intent of Search Queries

Author: Umar Syed, Aleksandrs Slivkins, Nina Mishra

Abstract: Search engines today present results that are often oblivious to recent shifts in intent. For example, the meaning of the query ‘independence day’ shifts in early July to a US holiday and to a movie around the time of the box office release. While no studies exactly quantify the magnitude of intent-shifting traffic, studies suggest that news events, seasonal topics, pop culture, etc account for 1/2 the search queries. This paper shows that the signals a search engine receives can be used to both determine that a shift in intent happened, as well as find a result that is now more relevant. We present a meta-algorithm that marries a classifier with a bandit algorithm to achieve regret that depends logarithmically on the number of query impressions, under certain assumptions. We provide strong evidence that this regret is close to the best achievable. Finally, via a series of experiments, we demonstrate that our algorithm outperforms prior approaches, particularly as the amount of intent-shifting traffic increases. 1

5 0.074015751 174 nips-2009-Nonparametric Latent Feature Models for Link Prediction

Author: Kurt Miller, Michael I. Jordan, Thomas L. Griffiths

Abstract: As the availability and importance of relational data—such as the friendships summarized on a social networking website—increases, it becomes increasingly important to have good models for such data. The kinds of latent structure that have been considered for use in predicting links in such networks have been relatively limited. In particular, the machine learning community has focused on latent class models, adapting Bayesian nonparametric methods to jointly infer how many latent classes there are while learning which entities belong to each class. We pursue a similar approach with a richer kind of latent variable—latent features—using a Bayesian nonparametric approach to simultaneously infer the number of features at the same time we learn which entities have each feature. Our model combines these inferred features with known covariates in order to perform link prediction. We demonstrate that the greater expressiveness of this approach allows us to improve performance on three datasets. 1

6 0.073674098 217 nips-2009-Sharing Features among Dynamical Systems with Beta Processes

7 0.071611948 57 nips-2009-Conditional Random Fields with High-Order Features for Sequence Labeling

8 0.070400678 53 nips-2009-Complexity of Decentralized Control: Special Cases

9 0.065976709 260 nips-2009-Zero-shot Learning with Semantic Output Codes

10 0.062076498 40 nips-2009-Bayesian Nonparametric Models on Decomposable Graphs

11 0.060043041 65 nips-2009-Decoupling Sparsity and Smoothness in the Discrete Hierarchical Dirichlet Process

12 0.059862737 59 nips-2009-Construction of Nonparametric Bayesian Models from Parametric Bayes Equations

13 0.059206512 96 nips-2009-Filtering Abstract Senses From Image Search Results

14 0.059181157 242 nips-2009-The Infinite Partially Observable Markov Decision Process

15 0.057104547 226 nips-2009-Spatial Normalized Gamma Processes

16 0.055283487 4 nips-2009-A Bayesian Analysis of Dynamics in Free Recall

17 0.054610625 192 nips-2009-Posterior vs Parameter Sparsity in Latent Variable Models

18 0.053579815 250 nips-2009-Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference

19 0.052579071 29 nips-2009-An Infinite Factor Model Hierarchy Via a Noisy-Or Mechanism

20 0.052238014 218 nips-2009-Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.15), (1, -0.022), (2, 0.001), (3, -0.177), (4, 0.053), (5, -0.042), (6, -0.012), (7, 0.015), (8, -0.06), (9, 0.065), (10, 0.045), (11, 0.022), (12, 0.003), (13, -0.02), (14, -0.05), (15, 0.042), (16, 0.121), (17, -0.023), (18, -0.002), (19, 0.017), (20, 0.012), (21, 0.06), (22, -0.003), (23, -0.036), (24, 0.059), (25, 0.164), (26, -0.004), (27, -0.057), (28, 0.13), (29, 0.002), (30, -0.016), (31, 0.101), (32, -0.094), (33, -0.104), (34, 0.128), (35, -0.223), (36, -0.037), (37, 0.028), (38, -0.038), (39, -0.071), (40, 0.033), (41, 0.127), (42, 0.029), (43, 0.003), (44, -0.024), (45, -0.13), (46, -0.008), (47, -0.06), (48, -0.094), (49, 0.035)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93148029 171 nips-2009-Nonparametric Bayesian Models for Unsupervised Event Coreference Resolution

Author: Cosmin Bejan, Matthew Titsworth, Andrew Hickl, Sanda Harabagiu

Abstract: We present a sequence of unsupervised, nonparametric Bayesian models for clustering complex linguistic objects. In this approach, we consider a potentially infinite number of features and categorical outcomes. We evaluated these models for the task of within- and cross-document event coreference on two corpora. All the models we investigated show significant improvements when compared against an existing baseline for this task.

2 0.75312454 89 nips-2009-FACTORIE: Probabilistic Programming via Imperatively Defined Factor Graphs

Author: Andrew McCallum, Karl Schultz, Sameer Singh

Abstract: Discriminatively trained undirected graphical models have had wide empirical success, and there has been increasing interest in toolkits that ease their application to complex relational data. The power in relational models is in their repeated structure and tied parameters; at issue is how to define these structures in a powerful and flexible way. Rather than using a declarative language, such as SQL or first-order logic, we advocate using an imperative language to express various aspects of model structure, inference, and learning. By combining the traditional, declarative, statistical semantics of factor graphs with imperative definitions of their construction and operation, we allow the user to mix declarative and procedural domain knowledge, and also gain significant efficiencies. We have implemented such imperatively defined factor graphs in a system we call FACTORIE, a software library for an object-oriented, strongly-typed, functional language. In experimental comparisons to Markov Logic Networks on joint segmentation and coreference, we find our approach to be 3-15 times faster while reducing error by 20-25%—achieving a new state of the art. 1

3 0.5459615 174 nips-2009-Nonparametric Latent Feature Models for Link Prediction

Author: Kurt Miller, Michael I. Jordan, Thomas L. Griffiths

Abstract: As the availability and importance of relational data—such as the friendships summarized on a social networking website—increases, it becomes increasingly important to have good models for such data. The kinds of latent structure that have been considered for use in predicting links in such networks have been relatively limited. In particular, the machine learning community has focused on latent class models, adapting Bayesian nonparametric methods to jointly infer how many latent classes there are while learning which entities belong to each class. We pursue a similar approach with a richer kind of latent variable—latent features—using a Bayesian nonparametric approach to simultaneously infer the number of features at the same time we learn which entities have each feature. Our model combines these inferred features with known covariates in order to perform link prediction. We demonstrate that the greater expressiveness of this approach allows us to improve performance on three datasets. 1

4 0.4038974 250 nips-2009-Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference

Author: Khashayar Rohanimanesh, Sameer Singh, Andrew McCallum, Michael J. Black

Abstract: Large, relational factor graphs with structure defined by first-order logic or other languages give rise to notoriously difficult inference problems. Because unrolling the structure necessary to represent distributions over all hypotheses has exponential blow-up, solutions are often derived from MCMC. However, because of limitations in the design and parameterization of the jump function, these samplingbased methods suffer from local minima—the system must transition through lower-scoring configurations before arriving at a better MAP solution. This paper presents a new method of explicitly selecting fruitful downward jumps by leveraging reinforcement learning (RL). Rather than setting parameters to maximize the likelihood of the training data, parameters of the factor graph are treated as a log-linear function approximator and learned with methods of temporal difference (TD); MAP inference is performed by executing the resulting policy on held out test data. Our method allows efficient gradient updates since only factors in the neighborhood of variables affected by an action need to be computed—we bypass the need to compute marginals entirely. Our method yields dramatic empirical success, producing new state-of-the-art results on a complex joint model of ontology alignment, with a 48% reduction in error over state-of-the-art in that domain. 1

5 0.40172485 57 nips-2009-Conditional Random Fields with High-Order Features for Sequence Labeling

Author: Nan Ye, Wee S. Lee, Hai L. Chieu, Dan Wu

Abstract: Dependencies among neighbouring labels in a sequence is an important source of information for sequence labeling problems. However, only dependencies between adjacent labels are commonly exploited in practice because of the high computational complexity of typical inference algorithms when longer distance dependencies are taken into account. In this paper, we show that it is possible to design efficient inference algorithms for a conditional random field using features that depend on long consecutive label sequences (high-order features), as long as the number of distinct label sequences used in the features is small. This leads to efficient learning algorithms for these conditional random fields. We show experimentally that exploiting dependencies using high-order features can lead to substantial performance improvements for some problems and discuss conditions under which high-order features can be effective. 1

6 0.39836207 155 nips-2009-Modelling Relational Data using Bayesian Clustered Tensor Factorization

7 0.39095569 196 nips-2009-Quantification and the language of thought

8 0.37564236 68 nips-2009-Dirichlet-Bernoulli Alignment: A Generative Model for Multi-Class Multi-Label Multi-Instance Corpora

9 0.36810723 29 nips-2009-An Infinite Factor Model Hierarchy Via a Noisy-Or Mechanism

10 0.35029909 56 nips-2009-Conditional Neural Fields

11 0.34113571 24 nips-2009-Adapting to the Shifting Intent of Search Queries

12 0.33735004 217 nips-2009-Sharing Features among Dynamical Systems with Beta Processes

13 0.33342162 90 nips-2009-Factor Modeling for Advertisement Targeting

14 0.33036089 197 nips-2009-Randomized Pruning: Efficiently Calculating Expectations in Large Dynamic Programs

15 0.31951967 153 nips-2009-Modeling Social Annotation Data with Content Relevance using a Topic Model

16 0.31926394 218 nips-2009-Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining

17 0.31306732 40 nips-2009-Bayesian Nonparametric Models on Decomposable Graphs

18 0.31082633 195 nips-2009-Probabilistic Relational PCA

19 0.30877447 204 nips-2009-Replicated Softmax: an Undirected Topic Model

20 0.30848405 172 nips-2009-Nonparametric Bayesian Texture Learning and Synthesis


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(7, 0.015), (24, 0.032), (25, 0.034), (27, 0.351), (35, 0.043), (36, 0.108), (39, 0.04), (55, 0.01), (58, 0.041), (61, 0.041), (66, 0.016), (71, 0.107), (81, 0.015), (86, 0.054)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.75885725 171 nips-2009-Nonparametric Bayesian Models for Unsupervised Event Coreference Resolution

Author: Cosmin Bejan, Matthew Titsworth, Andrew Hickl, Sanda Harabagiu

Abstract: We present a sequence of unsupervised, nonparametric Bayesian models for clustering complex linguistic objects. In this approach, we consider a potentially infinite number of features and categorical outcomes. We evaluated these models for the task of within- and cross-document event coreference on two corpora. All the models we investigated show significant improvements when compared against an existing baseline for this task.

2 0.74969625 63 nips-2009-DUOL: A Double Updating Approach for Online Learning

Author: Peilin Zhao, Steven C. Hoi, Rong Jin

Abstract: In most online learning algorithms, the weights assigned to the misclassified examples (or support vectors) remain unchanged during the entire learning process. This is clearly insufficient since when a new misclassified example is added to the pool of support vectors, we generally expect it to affect the weights for the existing support vectors. In this paper, we propose a new online learning method, termed Double Updating Online Learning, or DUOL for short. Instead of only assigning a fixed weight to the misclassified example received in current trial, the proposed online learning algorithm also tries to update the weight for one of the existing support vectors. We show that the mistake bound can be significantly improved by the proposed online learning method. Encouraging experimental results show that the proposed technique is in general considerably more effective than the state-of-the-art online learning algorithms. 1

3 0.66430289 205 nips-2009-Rethinking LDA: Why Priors Matter

Author: Andrew McCallum, David M. Mimno, Hanna M. Wallach

Abstract: Implementations of topic models typically use symmetric Dirichlet priors with fixed concentration parameters, with the implicit assumption that such “smoothing parameters” have little practical effect. In this paper, we explore several classes of structured priors for topic models. We find that an asymmetric Dirichlet prior over the document–topic distributions has substantial advantages over a symmetric prior, while an asymmetric prior over the topic–word distributions provides no real benefit. Approximation of this prior structure through simple, efficient hyperparameter optimization steps is sufficient to achieve these performance gains. The prior structure we advocate substantially increases the robustness of topic models to variations in the number of topics and to the highly skewed word frequency distributions common in natural language. Since this prior structure can be implemented using efficient algorithms that add negligible cost beyond standard inference techniques, we recommend it as a new standard for topic modeling. 1

4 0.47093979 27 nips-2009-Adaptive Regularization of Weight Vectors

Author: Koby Crammer, Alex Kulesza, Mark Dredze

Abstract: We present AROW, a new online learning algorithm that combines several useful properties: large margin training, confidence weighting, and the capacity to handle non-separable data. AROW performs adaptive regularization of the prediction function upon seeing each new instance, allowing it to perform especially well in the presence of label noise. We derive a mistake bound, similar in form to the second order perceptron bound, that does not assume separability. We also relate our algorithm to recent confidence-weighted online learning techniques and show empirically that AROW achieves state-of-the-art performance and notable robustness in the case of non-separable data. 1

5 0.44851813 150 nips-2009-Maximum likelihood trajectories for continuous-time Markov chains

Author: Theodore J. Perkins

Abstract: Continuous-time Markov chains are used to model systems in which transitions between states as well as the time the system spends in each state are random. Many computational problems related to such chains have been solved, including determining state distributions as a function of time, parameter estimation, and control. However, the problem of inferring most likely trajectories, where a trajectory is a sequence of states as well as the amount of time spent in each state, appears unsolved. We study three versions of this problem: (i) an initial value problem, in which an initial state is given and we seek the most likely trajectory until a given final time, (ii) a boundary value problem, in which initial and final states and times are given, and we seek the most likely trajectory connecting them, and (iii) trajectory inference under partial observability, analogous to finding maximum likelihood trajectories for hidden Markov models. We show that maximum likelihood trajectories are not always well-defined, and describe a polynomial time test for well-definedness. When well-definedness holds, we show that each of the three problems can be solved in polynomial time, and we develop efficient dynamic programming algorithms for doing so. 1

6 0.44804475 56 nips-2009-Conditional Neural Fields

7 0.44760764 130 nips-2009-Learning from Multiple Partially Observed Views - an Application to Multilingual Text Categorization

8 0.44548979 104 nips-2009-Group Sparse Coding

9 0.44412225 250 nips-2009-Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference

10 0.43878713 40 nips-2009-Bayesian Nonparametric Models on Decomposable Graphs

11 0.43468153 260 nips-2009-Zero-shot Learning with Semantic Output Codes

12 0.4332149 204 nips-2009-Replicated Softmax: an Undirected Topic Model

13 0.43302581 132 nips-2009-Learning in Markov Random Fields using Tempered Transitions

14 0.43095085 217 nips-2009-Sharing Features among Dynamical Systems with Beta Processes

15 0.43038327 215 nips-2009-Sensitivity analysis in HMMs with application to likelihood maximization

16 0.42819676 18 nips-2009-A Stochastic approximation method for inference in probabilistic graphical models

17 0.42798021 226 nips-2009-Spatial Normalized Gamma Processes

18 0.4261775 41 nips-2009-Bayesian Source Localization with the Multivariate Laplace Prior

19 0.42541599 129 nips-2009-Learning a Small Mixture of Trees

20 0.4229109 11 nips-2009-A General Projection Property for Distribution Families