acl acl2010 acl2010-55 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ivan Titov ; Mikhail Kozhevnikov
Abstract: We argue that groups of unannotated texts with overlapping and non-contradictory semantics represent a valuable source of information for learning semantic representations. A simple and efficient inference method recursively induces joint semantic representations for each group and discovers correspondence between lexical entries and latent semantic concepts. We consider the generative semantics-text correspondence model (Liang et al., 2009) and demonstrate that exploiting the noncontradiction relation between texts leads to substantial improvements over natural baselines on a problem of analyzing human-written weather forecasts.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We argue that groups of unannotated texts with overlapping and non-contradictory semantics represent a valuable source of information for learning semantic representations. [sent-4, score-0.649]
2 A simple and efficient inference method recursively induces joint semantic representations for each group and discovers correspondence between lexical entries and latent semantic concepts. [sent-5, score-0.592]
3 We consider the generative semantics-text correspondence model (Liang et al., 2009) and demonstrate that exploiting the noncontradiction relation between texts leads to substantial improvements over natural baselines on a problem of analyzing human-written weather forecasts. [sent-7, score-0.457]
4 For example, when analyzing weather forecasts it is very hard to discover in an unsupervised way which of the expressions among “south wind”, “wind from west” and “southerly” denote the same wind direction and which do not, as they all have a very similar distribution of their contexts. [sent-15, score-0.746]
5 In this paper, we show that groups of unannotated texts with overlapping and non-contradictory semantics provide a valuable source of information. [sent-17, score-0.546]
6 We assume that each text in a group is independently generated from a full latent semantic state corresponding to the group. [sent-19, score-0.395]
7 Importantly, the texts in each group do not have to be paraphrases of each other, as they can verbalize only specific parts (aspects) of the full semantic state, yet statements about the same aspects must not contradict each other. [sent-20, score-0.494]
8 Simultaneous inference of the semantic state for the noncontradictory and semantically overlapping documents would restrict the space of compatible hypotheses, and, intuitively, ‘easier’ texts in a group will help to analyze the ‘harder’ ones. [sent-21, score-0.764]
9 (Figure 1: weather forecasts and their alignment to the semantic representation.) Note that the semantic representation (the block in the middle) is not observable in training. [sent-29, score-0.898]
10 However, it is important to note that the phrase “wind from west” may still appear in the texts, but in reference to other time periods, underlining the need for modeling alignment between grouped texts and their latent meaning representation. [sent-34, score-0.485]
11 As much of human knowledge is redescribed multiple times, we believe that noncontradictory and semantically overlapping texts are often easy to obtain. [sent-35, score-0.348]
12 Alternatively, if such groupings are not available, it may still be easier to give each semantic representation (or a state) to multiple annotators and ask each of them to provide a textual description, instead of annotating texts with semantic expressions. [sent-38, score-0.396]
13 They investigate a grounded language acquisition set-up and assume that semantics (world state) can be represented as a set of records, each consisting of a set of fields. [sent-46, score-0.384]
14 Their model segments text into utterances and identifies records, fields and field values discussed in each utterance. [sent-47, score-0.449]
15 For example, in the weather forecast domain the field sky cover should get the same value given the expressions “overcast” and “very cloudy”, but a different one if the expressions are “clear” or “sunny”. [sent-51, score-0.682]
16 measuring how well the model predicts the alignment between the text and the observable records describing the entire world state. [sent-55, score-0.584]
17 We follow their set-up, but assume that instead of having access to the full semantic state for every training example, we have a very small amount of data annotated with semantic states and a larger number of unannotated texts with noncontradictory semantics. [sent-56, score-0.694]
18 We study our set-up on the weather forecast data (Liang et al. [sent-57, score-0.436]
19 , 2009) where the original textual weather forecasts were complemented by additional forecasts describing the same weather states (see figure 1 for an example). [sent-58, score-1.011]
20 The average overlap between the verbalized fields in each group of noncontradictory forecasts was below 35%, and more than 60% of fields are mentioned only in a single forecast from a group. [sent-59, score-0.997]
21 Our model, learned from 100 labeled forecasts and 259 groups of unannotated non-contradictory forecasts (750 texts in total), achieved 73. [sent-60, score-0.783]
22 1% shown by a semi-supervised learning approach, though, as expected, it does not reach the score of the model which, in training, observed semantic states for all the 750 documents (77. [sent-63, score-0.292]
23 Statistical models of parsing can often be regarded as defining the probability distribution of meaning m and its alignment a with the given text w, P(m, a, w) = P(a, w | m) P(m). [sent-74, score-0.302]
24 , (Poon and Domingos, 2009)) or as a set of field values if database records are used as a meaning representation (Liang et al., 2009). [sent-77, score-0.493]
25 The alignment a defines how semantics is verbalized in the text w, and it can be represented by a meaning derivation tree in the case of full semantic parsing (Poon and Domingos, 2009) or, e. [sent-79, score-0.721]
26 In semantic parsing, we aim to find the most likely underlying semantics and alignment given the text: (m̂, â) = argmax_{m,a} P(a, w | m) P(m). [sent-82, score-0.379]
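As a minimal illustration of this formulation, the sketch below decodes the most likely (meaning, alignment) pair under the factorization P(m, a, w) = P(a, w | m) P(m). The candidate enumeration and the `prior`/`likelihood` callables are illustrative stand-ins, not part of the original model.

```python
import math

def map_parse(text, candidates, prior, likelihood):
    """Return the (meaning, alignment) pair maximizing
    log P(a, w | m) + log P(m) over the enumerated candidates."""
    best, best_score = None, -math.inf
    for m, alignments in candidates:      # hypothetical enumeration
        for a in alignments:
            score = likelihood(a, text, m) + prior(m)
            if score > best_score:
                best, best_score = (m, a), score
    return best
```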
27 Note that the decision about mi is now conditioned on all the texts wj rather than only on wi. [sent-96, score-0.43]
28 This conditioning is exactly what drives learning, as the information about the likely semantics mj of text j affects the decision about the choice of mi: P(mi | w1, . . . , wK). [sent-97, score-0.263]
29 Note that this probability is different from the probability that mi is actually verbalized in the text. [sent-112, score-0.306]
30 Even though the dependencies are only conveyed via {mj : j ≠ i}, the space of possible meanings mi is very large even for relatively simple semantic representations, and, therefore, we need to resort to efficient approximations. [sent-114, score-0.267]
31 An even simpler technique would be to parse texts in a random order, conditioning each meaning mk on the previously predicted meanings. [sent-118, score-0.293]
32 We propose a simple algorithm which aims to find an appropriate order for the greedy inference by estimating how well each candidate semantics m̂k would explain other texts and at each step selecting k (and m̂k) which explains them best. [sent-124, score-0.437]
33 Then it iteratively predicts meaning representations m̂j conditioned on the list of semantics m̄. [sent-156, score-0.402]
34 The algorithm selects a single meaning m̂j which maximizes the probability of all the remaining texts and excludes the text j from future consideration (lines 6-7). [sent-163, score-0.345]
35 Though the semantics mk (k ∉ n ∪ {j}) used in the estimates (line 6) can be inconsistent with each other, the final list of meanings m̄ is consistent [sent-164, score-0.4]
36 , as the semantics m̂ni was conditioned on the meanings m̄ [sent-168, score-0.379]
37 An important aspect of this algorithm is that, unlike usual greedy inference, the remaining (‘future’) texts do affect the choice of meaning representations made at the earlier stages. [sent-170, score-0.293]
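The ordering heuristic just described might be sketched as follows. Here `parse` (predicting a meaning conditioned on the meanings committed so far) and `explain` (scoring how well a meaning explains another text) are hypothetical stand-ins for the model's inference routines.

```python
import math

def greedy_group_parse(texts, parse, explain):
    """Order-aware greedy inference over a group of non-contradictory
    texts: at each step, parse every remaining text in the context of
    the meanings induced so far and commit to the candidate meaning
    that best explains all the other remaining texts (a sketch)."""
    remaining = list(range(len(texts)))
    context = []                      # consistent meanings committed so far
    meanings = [None] * len(texts)
    while remaining:
        best_j, best_m, best_score = None, None, -math.inf
        for j in remaining:
            m_j = parse(texts[j], context)
            # How well would m_j explain the *other* remaining texts?
            score = sum(explain(m_j, context, texts[k])
                        for k in remaining if k != j)
            if score > best_score:
                best_j, best_m, best_score = j, m_j, score
        meanings[best_j] = best_m
        context.append(best_m)        # later texts condition on it
        remaining.remove(best_j)
    return meanings
```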
38 In this paper, we use semantic structures as a pivot for finding the best alignment in the hope that the presence of meaningful text alignments will improve the quality of the resulting semantic structures by enforcing a form of agreement between them. [sent-183, score-0.432]
39 Liang et al. (2009) considered a scenario where each text was annotated with a world state, even though alignment between the text and the state was not observable. [sent-188, score-0.465]
40 This is a weaker form of supervision than the one traditionally considered in supervised semantic parsing, where the alignment is also usually provided in training (Chen and Mooney, 2008; Zettlemoyer and Collins, 2005). [sent-189, score-0.375]
41 Nevertheless, both in training and testing the world state is observable, and the alignment and the text are conditioned on the state during inference. [sent-190, score-0.527]
42 As explained in the introduction, the world states s are represented by sets of records (see the block in the middle of figure 1 for an example of a world state). [sent-193, score-0.472]
43 Each record is characterized by a record type t ∈ {1, . . . , T}. [sent-194, score-0.346]
44 For example, there may be more than a single record of type wind speed, as they may refer to different time periods, but all these records have the same set of fields, such as minimal, maximal and average wind speeds. [sent-199, score-0.742]
45 We write t_n.f = v to denote that the n-th record of type t has field f set to value v. [sent-201, score-0.344]
46 The probability of meaning mk then equals the probability of this assignment with other state variables left non-observable (and therefore marginalized out). [sent-203, score-0.368]
47 In this formalism checking for contradiction is trivial: two meaning representations contradict each other iff they assign different values to the same state variable. (Figure 3: The semantics-text correspondence model with K documents sharing the same latent semantic state.) [sent-204, score-0.608]
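A sketch of the contradiction test under this formalism: if meanings are partial assignments of state variables, two meanings contradict exactly when they assign different values to a shared variable. The (record type, record index, field) keys below are hypothetical.

```python
def contradicts(m1, m2):
    """m1, m2: dicts mapping state variables to values."""
    return any(m1[v] != m2[v] for v in m1.keys() & m2.keys())

# Two non-contradictory forecasts of the same weather state:
a = {("windSpeed", 0, "max"): 20, ("skyCover", 0, "mode"): "overcast"}
b = {("windSpeed", 0, "max"): 20, ("temperature", 0, "min"): 5}
assert not contradicts(a, b)
```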
48 The semantics-text correspondence model defines a hierarchical segmentation of text: first, it segments the text into fragments discussing different records, then the utterances corresponding to each record are further segmented into fragments verbalizing specific fields of that record. [sent-206, score-0.545]
49 The generative story for texts w1, . . . , wK sharing a semantic state is as follows: Generation of world state s: for each record type τ ∈ {1, . . . , T}, [sent-214, score-0.442]
50 choose field values for all fields f ∈ F(τ) from the type-specific distribution. [sent-223, score-0.357]
51 Then, for each text k ∈ {1, . . . , K}: Record Types: choose a sequence of verbalized record types t = (t1, . . . [sent-227, score-0.36]
52 Records: for each type ti choose a verbalized record ri uniformly from all the records of that type: l ∼ Unif(1, . . . [sent-231, score-0.616]
53 , n(τ)), ri := the l-th record of type ti. Fields: for each record ri choose a sequence of verbalized fields fi = (fi1, . . . [sent-234, score-0.546]
54 (Figure 4: A segmentation of a text fragment into records and fields.) [sent-245, score-0.271]
55 Note that, when generating fields, the Markov chain is defined over fields and the transition parameters are independent of the field values ri.fij. [sent-246, score-0.474]
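The generative story above might be sketched as follows; every callable here (the world-state value priors, the Markov chains over record types and fields, and the per-field word emitter) is an illustrative stand-in for the model's actual parameterization.

```python
def generate_group(K, record_types, fields_of, draw_value,
                   draw_type_seq, draw_field_seq, emit_words):
    """Sample one shared world state, then K texts verbalizing it."""
    # World state s: field values drawn independently per record type
    # (one record per type here, a simplification).
    state = {t: {f: draw_value(t, f) for f in fields_of[t]}
             for t in record_types}
    texts = []
    for _ in range(K):
        words = []
        for t in draw_type_seq():            # Markov chain over record types
            record = state[t]                # Unif over records of type t
            for f in draw_field_seq(t):      # Markov chain over fields
                words += emit_words(t, f, record[f])
        texts.append(words)
    return state, texts
```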
56 The form of the word generation distributions P(w | fij, ri.fij) depends on the type of the field fij. [sent-248, score-0.288]
57 Verbalizations of numerical fields are generated via a perturbation of the field value ri.fij: the value can be perturbed by either rounding it (up or down) or distorting it (up or down, modeled by a geometric distribution). [sent-250, score-0.591]
58 For details on these emission models, as well as for details on modeling record and field transitions, we refer the reader to the original publication (Liang et al., 2009). [sent-252, score-0.344]
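A sketch of the numeric emission described above; the mixture weight and the geometric rate are made-up parameters (the real model learns such parameters).

```python
import math
import random

def perturb_numeric(v, p_round=0.5, geom_p=0.5):
    """Either round v up or down, or shift it by a geometrically
    distributed offset in a random direction, before verbalizing."""
    if random.random() < p_round:
        return math.floor(v) if random.random() < 0.5 else math.ceil(v)
    offset = 1
    while random.random() > geom_p:   # Geometric(geom_p) offset size
        offset += 1
    return v + offset if random.random() < 0.5 else v - offset
```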
59 In our experiments, when choosing a world state s, we generate the field values independently. [sent-254, score-0.371]
60 This is clearly a suboptimal regime, as often there are very strong dependencies between field values: e.g. [sent-255, score-0.27]
61 , in the weather domain many record types contain groups of related fields defining minimal, maximal and average values of some parameter. [sent-257, score-0.705]
62 As explained above, the semantics of a text m is defined by the assignment of state variables s. [sent-261, score-0.318]
63 Analogously, an alignment a between semantics m and a text w is represented by all the remaining latent variables: by the sequence of record types t = (t1, . . . [sent-262, score-0.582]
64 , t|t|), the choice of records ri for each ti, the field sequence fi and the segment length for every field fij. [sent-265, score-0.561]
65 When the world state is observable, learning does not require any approximations, as dynamic programming (a form of the forward-backward algorithm) can be used to infer the posterior distribution on the E-step (Liang et al., 2009). [sent-273, score-0.276]
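For reference, in its simplest chain form the E-step dynamic program is standard forward-backward; the actual model runs a segmental variant over records, fields and segment lengths. A generic log-space sketch with illustrative interfaces:

```python
import math

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def forward_backward(obs, trans, init):
    """obs[t][s], trans[r][s], init[s]: log-probabilities.
    Returns per-position posteriors P(state = s | observations)."""
    T, S = len(obs), len(init)
    fwd = [[init[s] + obs[0][s] for s in range(S)]]
    for t in range(1, T):
        fwd.append([logsumexp([fwd[t - 1][r] + trans[r][s]
                               for r in range(S)]) + obs[t][s]
                    for s in range(S)])
    bwd = [[0.0] * S for _ in range(T)]
    for t in range(T - 2, -1, -1):
        bwd[t] = [logsumexp([trans[s][r] + obs[t + 1][r] + bwd[t + 1][r]
                             for r in range(S)]) for s in range(S)]
    Z = logsumexp(fwd[T - 1])
    return [[math.exp(fwd[t][s] + bwd[t][s] - Z) for s in range(S)]
            for t in range(T)]
```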
66 In the context of the semantics-text correspondence model, as we discussed above, semantics m defines the subset of admissible world states. [sent-277, score-0.358]
67 Summarizing, when predicting the most likely semantics m̂j (line 4), for each span the decoder weighs alternatives of either (1) aligning this span to the previously induced meaning m̄ [sent-281, score-0.371]
68 Instead of predicting the most probable semantics m̂j, we search for the most probable pair (âj, m̂j), thus assuming that the probability mass is mostly concentrated on a single alignment. [sent-284, score-0.284]
69 We use a modification of the beam search algorithm, where we keep a set of candidate meanings (partial semantic representations) and compute an alignment for each of them using a form of the Viterbi algorithm. [sent-288, score-0.285]
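A sketch of this decoding loop; `expand` (proposing extensions of a partial meaning) and `align_score` (the Viterbi-style best-alignment score) are hypothetical interfaces.

```python
def beam_decode(text, expand, align_score, beam_size=10):
    """Keep the beam_size best partial meanings, scoring each by its
    best (Viterbi) alignment to the text; return the best complete one."""
    beam, complete = [{}], []
    while beam:
        candidates = []
        for m in beam:
            exts = expand(m)
            if exts:
                candidates.extend(exts)
            else:
                complete.append(m)       # no extensions: m is complete
        beam = sorted(candidates, key=lambda m: align_score(m, text),
                      reverse=True)[:beam_size]
    return max(complete, key=lambda m: align_score(m, text))
```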
70 , 2009): the state s is no longer latent and we can run efficient inference on the E-step. [sent-291, score-0.264]
71 Though some fields of the state s may still not be specified by m̄ [sent-292, score-0.287]
72 We smooth the distributions of values for numerical fields with convolution smoothing equivalent to the assumption that the fields are affected by distortion in the form of a two-sided geometric distribution with the success rate parameter equal to 0. [sent-295, score-0.408]
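The convolution smoothing just described, sketched with a truncated two-sided geometric kernel. The success rate p is the parameter whose value is cut off above; `width` is an illustrative truncation.

```python
def smooth_numeric(counts, p, width=10):
    """Convolve an empirical distribution over integer values with a
    two-sided geometric kernel P(d) = p / (2 - p) * (1 - p)**abs(d)."""
    kernel = {d: p * (1 - p) ** abs(d) / (2 - p)
              for d in range(-width, width + 1)}
    smoothed = {}
    for v, c in counts.items():
        for d, k in kernel.items():
            smoothed[v + d] = smoothed.get(v + d, 0.0) + c * k
    return smoothed
```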
73 4 Empirical Evaluation In this section, we consider the semi-supervised set-up and present an evaluation of our approach on the problem of aligning weather forecast reports to the formal representation of weather. [sent-299, score-0.494]
74 1 Experiments To perform the experiments we used a subset of the weather dataset introduced in (Liang et al., 2009). [sent-301, score-0.267]
75 We randomly chose 100 texts along with their world states to be used as the labeled data. [sent-306, score-0.388]
76 To produce groups of noncontradictory texts we have randomly selected a subset of weather states, represented them in a visual form (icons accompanied by numerical and …). (Footnote 6: in order to distinguish from completely unlabeled examples, we refer to examples labeled with world states as labeled examples.) [sent-307, score-0.872]
77 Note though that the alignments are not observable even for these labeled examples. [sent-308, score-0.26]
78 These newly-produced forecasts, when combined with the original texts, resulted in 259 groups of non-contradictory texts (650 texts, 2. [sent-311, score-0.269]
79 , number distortions), or due to different perception of the weather by the annotators (e. [sent-316, score-0.267]
80 The overlap between the verbalized fields in each group was estimated to be below 35%. [sent-319, score-0.431]
81 Around 60% of fields are mentioned only in a single forecast from a group; consequently, the texts cannot be regarded as paraphrases of each other. [sent-320, score-0.58]
82 The test set consists of 150 texts, each corresponding to a different weather state. [sent-321, score-0.267]
83 We aimed to preserve approximately the same proportion of new and original examples as we had in the training set; therefore, we combined 50 texts originally present in the weather dataset with an additional 100 newly-produced texts. [sent-323, score-0.457]
84 We annotated these 100 texts by aligning each line to one or more records,7 whereas for the original texts the alignments were already present. [sent-324, score-0.554]
85 Namely, 5 iterations of EM with a basic model (with no segmentation or coherence modeling), followed by 5 iterations of EM with the model which generates fields independently and, finally, 5 iterations with the full model. [sent-329, score-0.3]
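The staged training schedule above, as a sketch; the model interface here (init_from, e_step, m_step) is hypothetical.

```python
def staged_em(data, models, iters=5):
    """Run a few EM iterations with each successively richer model,
    warm-starting every stage from the previous one's parameters."""
    params = None
    for model in models:      # e.g. [basic, independent_fields, full]
        params = model.init_from(params)     # hypothetical warm start
        for _ in range(iters):
            stats = model.e_step(data, params)   # expected counts
            params = model.m_step(stats)         # re-estimate
    return params
```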
86 To speed up training, only a single record of each type is allowed to be generated when running inference for unlabeled examples on the E-step. (Footnote 7: the text was automatically tokenized and segmented into lines, with line breaks at punctuation characters.) [sent-334, score-0.36]
87 (Table 1: Results (precision, recall and F1) on the weather forecast dataset.) [sent-348, score-0.436]
88 Similarly, though we preserved all records which refer to the first time period, for other time periods we removed all the records which declare that the corresponding event (e. [sent-350, score-0.579]
89 We compare our approach (Semi-superv, noncontr) with two baselines: the basic supervised training on 100 labeled forecasts (Supervised BL) and with the semi-supervised training which disregards the non-contradiction relations (Semi-superv BL). [sent-354, score-0.305]
90 The learning regime, the inference procedure and the texts for the semi-supervised baseline were identical to the ones used for our approach; the only difference is that all the documents were modeled as independent. [sent-355, score-0.344]
91 In these experiments, we consider the problem of predicting alignment between text and the corresponding observable world state. [sent-362, score-0.41]
92 semantic parsing) accuracy is not possible on this dataset, as the data does not contain information about which fields are discussed. [sent-365, score-0.289]
93 Even if it would pro- (Table 2: Top 5 words in the word distribution for the field mode of record sky cover; function words and punctuation are omitted.) [sent-366, score-0.455]
94 vide this information, the documents do not verbalize the state at the necessary granularity level to predict the field values. [sent-367, score-0.414]
95 For example, it is not possible to decide to which bucket of the field sky cover the expression ‘cloudy’ refers, as it has a relatively uniform distribution across 3 (out of 4) buckets. [sent-368, score-0.282]
96 To confirm that the model trained by our approach indeed assigns new words to correct fields and records, we visualize the top words for the field characterizing sky cover (table 2). [sent-371, score-0.432]
97 Our approach outperformed the baselines both in predicting text–state correspondence and in the F1 score on the predicted set of field assignments (‘text meanings’). [sent-376, score-0.346]
98 6 Summary and Future Work In this work we studied the use of weak supervision in the form of non-contradictory relations between documents in learning semantic representations. [sent-390, score-0.322]
99 However, exact inference for groups of documents with overlapping semantic representations is generally prohibitively expensive, as the shared latent semantics introduces nonlocal dependencies between the semantic representations of individual documents. [sent-392, score-0.82]
100 , 2009) and evaluated it on a dataset of weather forecasts. [sent-395, score-0.267]
wordName wordTfidf (topN-words)
[('weather', 0.267), ('records', 0.219), ('forecasts', 0.211), ('texts', 0.19), ('verbalized', 0.187), ('fields', 0.186), ('record', 0.173), ('field', 0.171), ('forecast', 0.169), ('semantics', 0.165), ('mk', 0.164), ('liang', 0.159), ('wind', 0.15), ('mi', 0.119), ('rifij', 0.117), ('alignment', 0.111), ('supervision', 0.111), ('meaning', 0.103), ('semantic', 0.103), ('observable', 0.103), ('state', 0.101), ('world', 0.099), ('correspondence', 0.094), ('noncontradictory', 0.094), ('poon', 0.083), ('inference', 0.082), ('latent', 0.081), ('mooney', 0.08), ('groups', 0.079), ('fij', 0.075), ('sky', 0.075), ('zettlemoyer', 0.075), ('documents', 0.072), ('wk', 0.071), ('meanings', 0.071), ('representations', 0.071), ('cloudy', 0.07), ('verbalize', 0.07), ('domingos', 0.069), ('barzilay', 0.066), ('overlapping', 0.064), ('alignments', 0.063), ('conditioned', 0.063), ('raymond', 0.062), ('unif', 0.062), ('bl', 0.061), ('group', 0.058), ('wj', 0.058), ('aligning', 0.058), ('cij', 0.056), ('regime', 0.056), ('states', 0.055), ('fell', 0.053), ('line', 0.053), ('passages', 0.053), ('shinyama', 0.053), ('text', 0.052), ('em', 0.05), ('periods', 0.05), ('supervised', 0.05), ('though', 0.05), ('unannotated', 0.048), ('paraphrase', 0.048), ('ni', 0.048), ('west', 0.048), ('south', 0.048), ('graca', 0.047), ('overcast', 0.047), ('southerly', 0.047), ('verbalizations', 0.047), ('vq', 0.047), ('wnk', 0.047), ('contradiction', 0.046), ('mj', 0.046), ('predicting', 0.045), ('unsupervised', 0.045), ('labeled', 0.044), ('dempster', 0.044), ('dependencies', 0.043), ('sn', 0.043), ('declare', 0.041), ('basu', 0.041), ('fq', 0.041), ('posterior', 0.04), ('utterances', 0.04), ('iterations', 0.038), ('sharing', 0.038), ('mmci', 0.038), ('contradict', 0.038), ('rain', 0.038), ('ub', 0.038), ('probable', 0.037), ('discover', 0.037), ('ti', 0.037), ('expensive', 0.037), ('assignments', 0.036), ('studied', 0.036), ('distribution', 0.036), ('paraphrases', 0.035), ('blum', 0.035)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000007 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts
Author: Ivan Titov ; Mikhail Kozhevnikov
Abstract: We argue that groups of unannotated texts with overlapping and non-contradictory semantics represent a valuable source of information for learning semantic representations. A simple and efficient inference method recursively induces joint semantic representations for each group and discovers correspondence between lexical entries and latent semantic concepts. We consider the generative semantics-text correspondence model (Liang et al., 2009) and demonstrate that exploiting the noncontradiction relation between texts leads to substantial improvements over natural baselines on a problem of analyzing human-written weather forecasts.
2 0.11616134 133 acl-2010-Hierarchical Search for Word Alignment
Author: Jason Riesa ; Daniel Marcu
Abstract: We present a simple yet powerful hierarchical search algorithm for automatic word alignment. Our algorithm induces a forest of alignments from which we can efficiently extract a ranked k-best list. We score a given alignment within the forest with a flexible, linear discriminative model incorporating hundreds of features, and trained on a relatively small amount of annotated data. We report results on Arabic-English word alignment and translation tasks. Our model outperforms a GIZA++ Model-4 baseline by 6.3 points in F-measure, yielding a 1.1 BLEU score increase over a state-of-the-art syntax-based machine translation system.
3 0.11567522 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans
Author: Fei Huang ; Alexander Yates
Abstract: Most supervised language processing systems show a significant drop-off in performance when they are tested on text that comes from a domain significantly different from the domain of the training data. Semantic role labeling techniques are typically trained on newswire text, and in tests their performance on fiction is as much as 19% worse than their performance on newswire text. We investigate techniques for building open-domain semantic role labeling systems that approach the ideal of a train-once, use-anywhere system. We leverage recently-developed techniques for learning representations of text using latent-variable language models, and extend these techniques to ones that provide the kinds of features that are useful for semantic role labeling. In experiments, our novel system reduces error by 16% relative to the previous state of the art on out-of-domain text.
4 0.11481187 246 acl-2010-Unsupervised Discourse Segmentation of Documents with Inherently Parallel Structure
Author: Minwoo Jeong ; Ivan Titov
Abstract: Documents often have inherently parallel structure: they may consist of a text and commentaries, or an abstract and a body, or parts presenting alternative views on the same problem. Revealing relations between the parts by jointly segmenting and predicting links between the segments, would help to visualize such documents and construct friendlier user interfaces. To address this problem, we propose an unsupervised Bayesian model for joint discourse segmentation and alignment. We apply our method to the “English as a second language” podcast dataset where each episode is composed of two parallel parts: a story and an explanatory lecture. The predicted topical links uncover hidden relations between the stories and the lectures. In this domain, our method achieves competitive results, rivaling those of a previously proposed supervised technique.
5 0.10437876 24 acl-2010-Active Learning-Based Elicitation for Semi-Supervised Word Alignment
Author: Vamshi Ambati ; Stephan Vogel ; Jaime Carbonell
Abstract: Semi-supervised word alignment aims to improve the accuracy of automatic word alignment by incorporating full or partial manual alignments. Motivated by standard active learning query sampling frameworks like uncertainty-, margin- and query-by-committee sampling we propose multiple query strategies for the alignment link selection task. Our experiments show that by active selection of uncertain and informative links, we reduce the overall manual effort involved in elicitation of alignment link data for training a semisupervised word aligner.
6 0.10416067 87 acl-2010-Discriminative Modeling of Extraction Sets for Machine Translation
7 0.10168891 170 acl-2010-Letter-Phoneme Alignment: An Exploration
8 0.10063537 226 acl-2010-The Human Language Project: Building a Universal Corpus of the World's Languages
9 0.099351339 206 acl-2010-Semantic Parsing: The Task, the State of the Art and the Future
10 0.095522612 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
11 0.094155967 101 acl-2010-Entity-Based Local Coherence Modelling Using Topological Fields
12 0.093872204 240 acl-2010-Training Phrase Translation Models with Leaving-One-Out
13 0.087351233 220 acl-2010-Syntactic and Semantic Factors in Processing Difficulty: An Integrated Measure
14 0.083748005 129 acl-2010-Growing Related Words from Seed via User Behaviors: A Re-Ranking Based Approach
15 0.083562732 90 acl-2010-Diversify and Combine: Improving Word Alignment for Machine Translation on Low-Resource Languages
16 0.080349468 237 acl-2010-Topic Models for Word Sense Disambiguation and Token-Based Idiom Detection
17 0.076791659 238 acl-2010-Towards Open-Domain Semantic Role Labeling
18 0.076320618 247 acl-2010-Unsupervised Event Coreference Resolution with Rich Linguistic Features
19 0.075710617 158 acl-2010-Latent Variable Models of Selectional Preference
20 0.075219706 262 acl-2010-Word Alignment with Synonym Regularization
topicId topicWeight
[(0, -0.269), (1, 0.008), (2, -0.014), (3, -0.038), (4, 0.041), (5, -0.009), (6, -0.056), (7, -0.01), (8, 0.104), (9, -0.04), (10, -0.11), (11, -0.036), (12, 0.004), (13, 0.002), (14, -0.04), (15, -0.006), (16, 0.025), (17, -0.01), (18, -0.029), (19, -0.005), (20, 0.095), (21, -0.056), (22, -0.088), (23, 0.0), (24, -0.035), (25, -0.086), (26, 0.01), (27, 0.087), (28, -0.08), (29, -0.09), (30, -0.012), (31, -0.057), (32, -0.016), (33, -0.008), (34, 0.041), (35, -0.018), (36, 0.083), (37, -0.015), (38, -0.041), (39, 0.082), (40, -0.09), (41, -0.061), (42, -0.07), (43, -0.104), (44, 0.012), (45, 0.011), (46, -0.001), (47, -0.025), (48, 0.057), (49, -0.032)]
simIndex simValue paperId paperTitle
same-paper 1 0.94875711 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts
Author: Ivan Titov ; Mikhail Kozhevnikov
Abstract: We argue that groups of unannotated texts with overlapping and non-contradictory semantics represent a valuable source of information for learning semantic representations. A simple and efficient inference method recursively induces joint semantic representations for each group and discovers correspondence between lexical entries and latent semantic concepts. We consider the generative semantics-text correspondence model (Liang et al., 2009) and demonstrate that exploiting the noncontradiction relation between texts leads to substantial improvements over natural baselines on a problem of analyzing human-written weather forecasts.
2 0.74516279 246 acl-2010-Unsupervised Discourse Segmentation of Documents with Inherently Parallel Structure
Author: Minwoo Jeong ; Ivan Titov
Abstract: Documents often have inherently parallel structure: they may consist of a text and commentaries, or an abstract and a body, or parts presenting alternative views on the same problem. Revealing relations between the parts by jointly segmenting and predicting links between the segments, would help to visualize such documents and construct friendlier user interfaces. To address this problem, we propose an unsupervised Bayesian model for joint discourse segmentation and alignment. We apply our method to the “English as a second language” podcast dataset where each episode is composed of two parallel parts: a story and an explanatory lecture. The predicted topical links uncover hidden relations between the stories and the lectures. In this domain, our method achieves competitive results, rivaling those of a previously proposed supervised technique.
3 0.56254238 101 acl-2010-Entity-Based Local Coherence Modelling Using Topological Fields
Author: Jackie Chi Kit Cheung ; Gerald Penn
Abstract: One goal of natural language generation is to produce coherent text that presents information in a logical order. In this paper, we show that topological fields, which model high-level clausal structure, are an important component of local coherence in German. First, we show in a sentence ordering experiment that topological field information improves the entity grid model of Barzilay and Lapata (2008) more than grammatical role and simple clausal order information do, particularly when manual annotations of this information are not available. Then, we incorporate the model enhanced with topological fields into a natural language generation system that generates constituent orders for German text, and show that the added coherence component improves performance slightly, though not statistically significantly.
4 0.55603427 248 acl-2010-Unsupervised Ontology Induction from Text
Author: Hoifung Poon ; Pedro Domingos
Abstract: Extracting knowledge from unstructured text is a long-standing goal of NLP. Although learning approaches to many of its subtasks have been developed (e.g., parsing, taxonomy induction, information extraction), all end-to-end solutions to date require heavy supervision and/or manual engineering, limiting their scope and scalability. We present OntoUSP, a system that induces and populates a probabilistic ontology using only dependency-parsed text as input. OntoUSP builds on the USP unsupervised semantic parser by jointly forming ISA and IS-PART hierarchies of lambda-form clusters. The ISA hierarchy allows more general knowledge to be learned, and the use of smoothing for parameter estimation. We evaluate OntoUSP by using it to extract a knowledge base from biomedical abstracts and answer questions. OntoUSP improves on the recall of USP by 47% and greatly outperforms previous state-of-the-art approaches.
5 0.54612845 263 acl-2010-Word Representations: A Simple and General Method for Semi-Supervised Learning
Author: Joseph Turian ; Lev-Arie Ratinov ; Yoshua Bengio
Abstract: If we take an existing supervised NLP system, a simple and general way to improve accuracy is to use unsupervised word representations as extra word features. We evaluate Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings of words on both NER and chunking. We use near state-of-the-art supervised baselines, and find that each of the three word representations improves the accuracy of these baselines. We find further improvements by combining different word representations. You can download our word features, for off-the-shelf use in existing NLP systems, as well as our code, here: http://metaoptimize.com/projects/wordreprs/
6 0.53883833 238 acl-2010-Towards Open-Domain Semantic Role Labeling
7 0.53121823 194 acl-2010-Phrase-Based Statistical Language Generation Using Graphical Models and Active Learning
8 0.53011322 87 acl-2010-Discriminative Modeling of Extraction Sets for Machine Translation
10 0.52360797 170 acl-2010-Letter-Phoneme Alignment: An Exploration
11 0.52269715 40 acl-2010-Automatic Sanskrit Segmentizer Using Finite State Transducers
12 0.52169859 66 acl-2010-Compositional Matrix-Space Models of Language
13 0.52097118 166 acl-2010-Learning Word-Class Lattices for Definition and Hypernym Extraction
14 0.51052868 35 acl-2010-Automated Planning for Situated Natural Language Generation
15 0.50579333 24 acl-2010-Active Learning-Based Elicitation for Semi-Supervised Word Alignment
16 0.50009346 262 acl-2010-Word Alignment with Synonym Regularization
17 0.49960589 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
18 0.48948237 43 acl-2010-Automatically Generating Term Frequency Induced Taxonomies
19 0.48898438 12 acl-2010-A Probabilistic Generative Model for an Intermediate Constituency-Dependency Representation
20 0.48673195 205 acl-2010-SVD and Clustering for Unsupervised POS Tagging
topicId topicWeight
[(13, 0.217), (14, 0.033), (25, 0.072), (28, 0.014), (39, 0.012), (42, 0.03), (44, 0.022), (59, 0.108), (73, 0.056), (78, 0.049), (80, 0.011), (83, 0.097), (84, 0.036), (98, 0.15)]
simIndex simValue paperId paperTitle
1 0.89474332 12 acl-2010-A Probabilistic Generative Model for an Intermediate Constituency-Dependency Representation
Author: Federico Sangati
Abstract: We present a probabilistic model extension to the Tesnière Dependency Structure (TDS) framework formulated in (Sangati and Mazza, 2009). This representation incorporates aspects from both constituency and dependency theory. In addition, it makes use of junction structures to handle coordination constructions. We test our model on parsing the English Penn WSJ treebank using a re-ranking framework. This technique allows us to efficiently test our model without needing a specialized parser, and to use the standard evaluation metric on the original Phrase Structure version of the treebank. We obtain encouraging results: we achieve a small improvement over state-of-the-art results when re-ranking a small number of candidate structures, on all the evaluation metrics except for chunking.
2 0.87271011 212 acl-2010-Simple Semi-Supervised Training of Part-Of-Speech Taggers
Author: Anders Sogaard
Abstract: Most attempts to train part-of-speech taggers on a mixture of labeled and unlabeled data have failed. In this work stacked learning is used to reduce tagging to a classification task. This simplifies semisupervised training considerably. Our preferred semi-supervised method combines tri-training (Li and Zhou, 2005) and disagreement-based co-training. On the Wall Street Journal, we obtain an error reduction of 4.2% with SVMTool (Gimenez and Marquez, 2004).
3 0.84882593 194 acl-2010-Phrase-Based Statistical Language Generation Using Graphical Models and Active Learning
Author: Francois Mairesse ; Milica Gasic ; Filip Jurcicek ; Simon Keizer ; Blaise Thomson ; Kai Yu ; Steve Young
Abstract: Most previous work on trainable language generation has focused on two paradigms: (a) using a statistical model to rank a set of generated utterances, or (b) using statistics to inform the generation decision process. Both approaches rely on the existence of a handcrafted generator, which limits their scalability to new domains. This paper presents BAGEL, a statistical language generator which uses dynamic Bayesian networks to learn from semantically-aligned data produced by 42 untrained annotators. A human evaluation shows that BAGEL can generate natural and informative utterances from unseen inputs in the information presentation domain. Additionally, generation performance on sparse datasets is improved significantly by using certainty-based active learning, yielding ratings close to the human gold standard with a fraction of the data.
same-paper 4 0.84235799 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts
Author: Ivan Titov ; Mikhail Kozhevnikov
Abstract: We argue that groups of unannotated texts with overlapping and non-contradictory semantics represent a valuable source of information for learning semantic representations. A simple and efficient inference method recursively induces joint semantic representations for each group and discovers correspondence between lexical entries and latent semantic concepts. We consider the generative semantics-text correspondence model (Liang et al., 2009) and demonstrate that exploiting the noncontradiction relation between texts leads to substantial improvements over natural baselines on a problem of analyzing human-written weather forecasts.
5 0.71949816 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans
Author: Fei Huang ; Alexander Yates
Abstract: Most supervised language processing systems show a significant drop-off in performance when they are tested on text that comes from a domain significantly different from the domain of the training data. Semantic role labeling techniques are typically trained on newswire text, and in tests their performance on fiction is as much as 19% worse than their performance on newswire text. We investigate techniques for building open-domain semantic role labeling systems that approach the ideal of a train-once, use-anywhere system. We leverage recently-developed techniques for learning representations of text using latent-variable language models, and extend these techniques to ones that provide the kinds of features that are useful for semantic role labeling. In experiments, our novel system reduces error by 16% relative to the previous state of the art on out-of-domain text.
6 0.71873373 71 acl-2010-Convolution Kernel over Packed Parse Forest
7 0.71728951 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
8 0.71698147 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification
9 0.7159996 169 acl-2010-Learning to Translate with Source and Target Syntax
10 0.71504682 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition
11 0.71411341 158 acl-2010-Latent Variable Models of Selectional Preference
12 0.71400809 172 acl-2010-Minimized Models and Grammar-Informed Initialization for Supertagging with Highly Ambiguous Lexicons
13 0.71327108 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese
14 0.71322501 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
15 0.71247125 261 acl-2010-Wikipedia as Sense Inventory to Improve Diversity in Web Search Results
16 0.71136999 162 acl-2010-Learning Common Grammar from Multilingual Corpus
17 0.71109939 245 acl-2010-Understanding the Semantic Structure of Noun Phrase Queries
18 0.71024287 202 acl-2010-Reading between the Lines: Learning to Map High-Level Instructions to Commands
19 0.70944405 113 acl-2010-Extraction and Approximation of Numerical Attributes from the Web
20 0.70941609 248 acl-2010-Unsupervised Ontology Induction from Text