emnlp emnlp2012 emnlp2012-124 knowledge-graph by maker-knowledge-mining

124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction


Source: pdf

Author: Valentin I. Spitkovsky ; Hiyan Alshawi ; Daniel Jurafsky

Abstract: We present a new family of models for unsupervised parsing, Dependency and Boundary models, that use cues at constituent boundaries to inform head-outward dependency tree generation. We build on three intuitions that are explicit in phrase-structure grammars but only implicit in standard dependency formulations: (i) Distributions of words that occur at sentence boundaries, such as English determiners, resemble constituent edges. (ii) Punctuation at sentence boundaries further helps distinguish full sentences from fragments like headlines and titles, allowing us to model grammatical differences between complete and incomplete sentences. (iii) Sentence-internal punctuation boundaries help with longer-distance dependencies, since punctuation correlates with constituent edges. Our models induce state-of-the-art dependency grammars for many languages without special knowledge of optimal input sentence lengths or biased, manually-tuned initializers.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 We present a new family of models for unsupervised parsing, Dependency and Boundary models, that use cues at constituent boundaries to inform head-outward dependency tree generation. [sent-8, score-0.286]

2 We build on three intuitions that are explicit in phrase-structure grammars but only implicit in standard dependency formulations: (i) Distributions of words that occur at sentence boundaries, such as English determiners, resemble constituent edges. [sent-9, score-0.387]

3 (ii) Punctuation at sentence boundaries further helps distinguish full sentences from fragments like headlines and titles, allowing us to model grammatical differences between complete and incomplete sentences. [sent-10, score-0.4]

4 (iii) Sentence-internal punctuation boundaries help with longer-distance dependencies, since punctuation correlates with constituent edges. [sent-11, score-0.53]

5 Our models induce state-of-the-art dependency grammars for many languages without special knowledge of optimal input sentence lengths or biased, manually-tuned initializers. [sent-12, score-0.262]

6 , 2006) and punctuation marks (Seginer, 2007; Ponvert et al. [sent-15, score-0.208]

7 We will show that boundary information can also be useful in dependency grammar induction models, which traditionally focus on head rather than fringe words (Carroll and Charniak, 1992). [sent-17, score-0.735]

8 Similarly, the fact that the noun head (NN) of the object the mail appears at the right edge of the sentence could help identify the noun check as the right edge of the subject NP. [sent-23, score-0.201]

9 Because typical head-driven grammars model valence separately for each class of head, however, they cannot see that the left fringe boundary, The check, of the verb-phrase is shared with its daughter’s, check. [sent-26, score-0.326]

10 Neither of these insights is available to traditional dependency formulations, which could learn from the boundaries of this sentence only that determiners might have no left- and that nouns might have no right-dependents. [sent-27, score-0.238]

11 We propose a family of dependency parsing models that are capable of inducing longer-range implications from sentence edges than just fertilities of their fringe words. [sent-28, score-0.283]

12 Our ideas conveniently lend themselves to implementations that can reuse much of the standard grammar induction machinery, including efficient dynamic programming routines for the relevant expectation-maximization algorithms. [sent-29, score-0.261]

13 Each (head) word ch generates a left-dependent with probability 1− PSTOP( · | L; · · · ), where dots represent additional parameterization on which it may be conditioned. [sent-34, score-0.172]

14 If the child is indeed generated, its identity cd is chosen with probability PATTACH(cd | ch; · · · ), influenced by the identity of the parent ch and possibly other parameters (again represented by dots). [sent-35, score-0.29]

15 The child then generates its own subtree recursively and the whole process continues, moving away from the head, until ch fails to generate a left-dependent. [sent-36, score-0.172]
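The one-sided generative story in the three sentences above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the function name `side_log_prob`, the dictionary layouts, and the choice to condition stopping on (head class, direction, adjacency), as a DMV-style model might, are assumptions made here. Recursion into each child's own subtree is omitted for brevity.

```python
import math


def side_log_prob(head, children, direction, p_stop, p_attach):
    """Log-probability that `head` generates `children` on one side, then stops.

    `children` lists the dependents' classes from nearest to farthest.
    `p_stop[(head, direction, adj)]` is an assumed (hypothetical) layout for
    P_STOP; `p_attach[(child, head, direction)]` is an assumed layout for
    P_ATTACH. Each child's own subtree is not scored here.
    """
    logp = 0.0
    for i, child in enumerate(children):
        adj = (i == 0)  # no child generated yet on this side
        # Decide to continue (probability 1 - P_STOP), then pick the child.
        logp += math.log(1.0 - p_stop[(head, direction, adj)])
        logp += math.log(p_attach[(child, head, direction)])
    # Finally, the head fails to generate another dependent.
    adj = (len(children) == 0)
    logp += math.log(p_stop[(head, direction, adj)])
    return logp
```

With made-up parameters such as `P_STOP = 0.2` in the adjacent case and `0.8` otherwise, a head that takes exactly one left child with attachment probability `0.5` scores `0.8 * 0.5 * 0.8` on that side.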

16 Instances of these split-head automata have been heavily used in grammar induction (Paskin, 2001b; Klein and Manning, 2004; Headden et al. [sent-39, score-0.339]

17 The basic tenet of split-head grammars is that every head word generates its left-dependents independently of its right-dependents. [sent-41, score-0.217]

18 But it does not imply that descendants that are closer to the head cannot influence the generation of farther dependents on the same side. [sent-43, score-0.142]

19 Nevertheless, many popular grammars for unsupervised parsing behave as if a word had to generate all of its children (to one side), or at least their count, before allowing any of these children themselves to recurse. [sent-44, score-0.176]

20 For example, Klein and Manning’s (2004) dependency model with valence (DMV) could be imple— — 1Unrestricted head-outward automata are strictly more powerful (e. [sent-45, score-0.227]

21 , the ordered sequence of all already-generated descendants, on the side of the head that is in the process of spawning off an additional child is not only known but also readily accessible. [sent-56, score-0.178]

22 Taking advantage of this availability, we designed three new models for dependency grammar induction. [sent-57, score-0.281]

23 1 Dependency and Boundary Model One DBM-1 conditions all stopping decisions on adjacency and the identity of the fringe word ce, the currently-farthest descendant (edge) derived by head ch in the given head-outward direction (dir ∈ {L, R}): PSTOP( · | dir; adj, ce). [sent-59, score-0.7]

24 In the adjacent case (adj = T), ch is deciding whether to have any children on a given side: a first child’s subtree would be right next to the head, so the head and the fringe words coincide (ch = ce). [sent-60, score-0.415]

25 In the non-adjacent case (adj = F), these will be different words and their classes will, in general, not be the same. Thus, non-adjacent stopping decisions will be made independently of a head word’s identity. [sent-61, score-0.271]

26 Therefore, all word classes will be equally likely to continue to grow or not, for a specific proposed fringe boundary. [sent-62, score-0.173]

27 For example, production of The check is involves two non-adjacent stopping decisions on the left: one by the noun check and one by the verb is, both of which stop after generating a first child. [sent-63, score-0.248]
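The fringe-word conditioning described above can be made concrete with a small sketch. Everything named here (`Node`, `fringe_tag`, `stop_contexts`) is hypothetical, and the part-of-speech tags DT, NN and VBZ assigned to "The check is" are an assumption for illustration. The code only enumerates the (dir, adj, ce) contexts of successive stopping decisions; it does not estimate any probabilities.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    tag: str
    left: List["Node"] = field(default_factory=list)   # nearest child first
    right: List["Node"] = field(default_factory=list)  # nearest child first


def fringe_tag(node: Node, direction: str) -> str:
    """Class of the farthest descendant of `node` in the given direction."""
    children = node.left if direction == "L" else node.right
    while children:
        node = children[-1]  # farthest child on that side
        children = node.left if direction == "L" else node.right
    return node.tag


def stop_contexts(head: Node, direction: str) -> List[Tuple[str, bool, str, bool]]:
    """List (dir, adj, ce, stopped) for each stopping decision made by `head`."""
    children = head.left if direction == "L" else head.right
    contexts = []
    for k in range(len(children) + 1):
        adj = (k == 0)
        # Adjacent case: head and fringe coincide. Otherwise the fringe is
        # the edge of the farthest child generated so far.
        ce = head.tag if adj else fringe_tag(children[k - 1], direction)
        contexts.append((direction, adj, ce, k == len(children)))
    return contexts


# "The check is", with assumed tags DT NN VBZ.
the = Node("DT")
check = Node("NN", left=[the])
verb = Node("VBZ", left=[check])
print(stop_contexts(check, "L"))
print(stop_contexts(verb, "L"))
```

Under these assumptions, both the noun and the verb make their non-adjacent left stopping decision in the same context (L, F, DT), which is the sharing that a head-conditioned model cannot express.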

28 Figure 2: Our running example (a simple sentence) and its unlabeled dependency parse structure’s probability, as factored by DBM-1; highlighted comments specify the heads associated with non-adjacent stopping probability factors. [sent-69, score-0.329]

29 2 Dependency and Boundary Model Two DBM-2 allows different but related grammars to coexist in a single model. [sent-74, score-0.147]

30 Specifically, we presuppose that all sentences are assigned to one of two classes: complete and incomplete (comp ∈ {T, F}, for now taken as exogenous). [sent-75, score-0.175]

31 However, sentence lengths for which stopping probabilities are responsible and distributions of root words may be different. [sent-79, score-0.208]

32 Consequently, an additional comp parameter is added to the context of two relevant types of factors: PSTOP( · | dir; adj, ce, comp); and PATTACH(cr | ⋄; L, comp). [sent-80, score-0.174]

33 For example, the new stopping factors could capture the fact that incomplete fragments, such as the noun-phrases George Morton, headlines Energy and Odds and Ends, a line item c - Domestic car, dollar quantity Revenue: $3. [sent-81, score-0.401]

34 The new root-attachment factors could further track that incomplete sentences generally lack verbs, in contrast to other short sentences, e. [sent-83, score-0.175]

35 3 Dependency and Boundary Model Three DBM-3 adds further conditioning on punctuation context. [sent-103, score-0.273]

36 We introduce another boolean parameter, cross, which indicates the presence of intervening punctuation between a proposed head word ch and its dependent cd. [sent-104, score-0.533]
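As a rough sketch of how the boolean `cross` could be computed for a proposed head and dependent pair: the helper name `crosses_punctuation` and the representation of punctuation as a set of token positions are assumptions made for illustration, not details taken from the paper.

```python
def crosses_punctuation(punct_positions, head_index, dep_index):
    """Return True if any punctuation token lies strictly between the two words.

    `punct_positions` is an assumed representation: the set of token indices
    in the sentence that hold punctuation marks.
    """
    lo, hi = sorted((head_index, dep_index))
    return any(lo < p < hi for p in punct_positions)
```

For a token sequence such as `Yes , it is .`, where positions 1 and 4 hold punctuation, an arc between positions 3 and 0 crosses the comma, while an arc between positions 3 and 2 does not.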

37 Conditioning on (the absence of) intervening punctuation could help tell true long-distance relations from impostors. [sent-108, score-0.253]

38 Table 1 lists some parameterizations that have since been used by unsupervised dependency grammar inducers sharing their backbone split-head process. [sent-113, score-0.613]

39 In the grammar induction experiments that follow, we will test each model’s incremental contribution to accuracies empirically, across many disparate languages. [sent-120, score-0.306]

40 4 For each data set, we induced a baseline grammar using the DMV. [sent-123, score-0.171]

41 Grammar inducers were initialized using (the same) uniformly-at-random chosen parse trees of training sentences (Cohen and Smith, 2010); thereafter, we applied “add one” smoothing at every training step. [sent-126, score-0.317]
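The "add one" smoothing mentioned above can be illustrated as follows. This is a generic sketch of add-one (Laplace) re-estimation of a single multinomial from expected counts; the function name and the dictionary layout are assumptions, and how the authors' trainer organizes its counts is not specified in this summary.

```python
def add_one_normalize(expected_counts, outcomes):
    """Re-estimate one multinomial from expected counts with add-one smoothing.

    `expected_counts` maps an outcome to its (possibly fractional) count;
    `outcomes` lists every outcome the multinomial ranges over, so that
    unseen outcomes still receive non-zero probability.
    """
    total = sum(expected_counts.get(o, 0.0) + 1.0 for o in outcomes)
    return {o: (expected_counts.get(o, 0.0) + 1.0) / total for o in outcomes}
```

For example, a count of 3 for one of two outcomes yields probabilities 4/5 and 1/5 rather than 1 and 0.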

42 We did not test DBM-3 in this set-up because most sentence-internal punctuation occurs in longer sentences; instead, DBM-3 will be tested later (see §7), using most sentences, in the final training step of a curriculum strategy (Bengio et al. [sent-140, score-0.208]

43 Table 2: Directed dependency accuracies, averaged over all 2006/7 CoNLL evaluation sets (all sentences), for the DMV and two new dependency-and-boundary grammar inducers (DBM-1,2) using two termination strategies. [sent-147, score-0.578]
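For readers unfamiliar with the metric, directed dependency accuracy is commonly computed as the fraction of tokens whose predicted head matches the gold head. The sketch below assumes a per-sentence list of head indices with 0 standing for the root; the function name and this encoding are illustrative assumptions, and the exact evaluation protocol behind the table (for instance, the treatment of punctuation tokens) is not given in this summary.

```python
def directed_accuracy(gold_heads, predicted_heads):
    """Fraction of tokens whose predicted head index equals the gold head index."""
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("gold and predicted head lists must have equal length")
    if not gold_heads:
        return 0.0
    correct = sum(g == p for g, p in zip(gold_heads, predicted_heads))
    return correct / len(gold_heads)
```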

44 4 Dependency and Boundary Model One The primary difference between DBM-1 and traditional models, such as the DMV, is that DBM-1 conditions non-adjacent stopping decisions on the identities of fringe words in partial yields (see §2. [sent-148, score-0.399]

45 1 Analytical Motivation Treebank data suggests that the class of the fringe word (its part-of-speech, ce) is a better predictor of (non-adjacent) stopping decisions, in a given direction dir, than the head’s own class ch. [sent-151, score-0.384]

46 In contrast, using ce in place of ch boosts explanatory power to 24%, keeping the number of parameters the same. [sent-159, score-0.294]

47 Moreover, every sentence exposes two true edges (Hänig, 2010): integrated over many sample sentence beginnings and ends, cumulative knowledge about such markers can guide a grammar inducer inside long inputs, where structure is murky. [sent-163, score-0.171]

48 Table 4: Empirical distributions for non-punctuation part-of-speech tags in WSJ, ordered by overall frequency, as well as distributions for sentence boundaries and for the roots of complete and incomplete sentences. [sent-175, score-0.359]

49 Figure 3: Histograms of lengths (in tokens) for 2,261 non-clausal fragments (red) and other sentences (blue) in WSJ. [sent-188, score-0.185]

50 5 Dependency and Boundary Model Two DBM-2 adapts DBM-1 grammars to two classes of inputs (complete sentences and incomplete fragments) by forking off new, separate multinomials for stopping decisions and root-distributions (see §2. [sent-189, score-0.499]

51 1 Analytical Motivation Unrepresentative short sentences such as headlines and titles are common in news-style data and pose a known nuisance to grammar inducers. [sent-192, score-0.266]

52 8 Table 4 shows that roots of incomplete sentences, which are dominated by nouns, barely resemble the other roots, drawn from more traditional verb and modal types. [sent-196, score-0.196]

53 (complete) roots, suggesting that heads of fragments too may warrant their own multinomial in the model. [sent-206, score-0.152]

54 Further, incomplete sentences are uncharacteristically short (see Figure 3). [sent-207, score-0.175]

55 It is this property that makes them particularly treacherous to grammar inducers, since by offering few options of root positions they increase the chances that a learner will incorrectly induce nouns to be heads. [sent-208, score-0.171]

56 Given that expected lengths are directly related to stopping decisions, it could make sense to also model the stopping probabilities of incomplete sentences separately. [sent-209, score-0.469]

57 2 Experimental Results Since it is not possible to consult parse trees during grammar induction (to check whether an input sentence is clausal), we opted for a proxy: presence of sentence-final punctuation. [sent-211, score-0.377]

58 Using punctuation to divide input sentences into two groups, DBM-2 scored higher: 40. [sent-212, score-0.256]

59 Table 6: A contingency table for clausal sentences and trailing punctuation in WSJ; the mean square contingency coefficient rφ signifies a low degree of correlation. [sent-218, score-0.401]
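The mean square contingency coefficient for a 2×2 table can be computed with the standard formula φ = (ad − bc) / sqrt((a+b)(c+d)(a+c)(b+d)). The sketch below uses hypothetical cell names a, b, c, d; the actual WSJ counts did not survive extraction and are not reproduced here.

```python
import math


def phi_coefficient(a, b, c, d):
    """Mean square contingency coefficient for the 2x2 table [[a, b], [c, d]]."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (a * d - b * c) / denom
```

Values near 0 indicate little association between the two binary properties (here, being clausal and ending in punctuation), while values near ±1 indicate a strong one.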

60 We suspect that identities of punctuation marks (Collins, 2003, Footnote 13), both sentence-final and sentence-initial, could be of extra assistance in grammar induction, specifically for grouping imperatives, questions, and so forth. [sent-225, score-0.437]

61 6 Dependency and Boundary Model Three DBM-3 exploits sentence-internal punctuation contexts by modeling punctuation-crossing dependency arcs separately from other attachments (see §2. [sent-226, score-0.318]

62 Sometimes longer-distance dependencies can be vetted using sentence-internal punctuation marks. [sent-235, score-0.208]

63 It happens that the presence of punctuation between such conjunction (IN) and verb (MD) types serves as a clue that they are not connected (see Table 7a); by contrast, a simpler cue (whether these words are adjacent) is, in this case, hardly of any use (see Table 7b). [sent-236, score-0.246]

64 Conditioning on crossing punctuation could be of help then, playing a role similar to that of comma-counting (Collins, 1997, §2. [sent-237, score-0.208]

65 Table 7: Contingency tables for IN right-attaching MD, among closest ordered pairs of these tokens in WSJ sentences with punctuation, versus: (a) presence of intervening punctuation; and (b) presence of intermediate words. [sent-241, score-0.205]

66 2 Experimental Results Postponed As we mentioned earlier (see §3), there is little point in testing DBM-3 with shorter sentences, since most sentence-internal punctuation occurs in longer inputs. [sent-243, score-0.208]

67 Our grammar inducers will thus be “starting small” in both senses suggested by Elman (1993): simultaneously scaffolding on model- and data-complexity. [sent-251, score-0.468]

68 1 Scaffolding Stage #1: DBM-1 We begin by training DBM-1 on sentences without sentence-internal punctuation but with at least one trailing punctuation mark. [sent-253, score-0.464]
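The data filter for this first stage can be sketched as below. The helper names are hypothetical, and treating a token as punctuation when all of its characters fall in a Unicode "P" category is an assumption made for illustration; treebank-specific punctuation tags would likely be a more faithful test, and some treebank tokens (backtick quotes, for instance) would not be caught by this check.

```python
import unicodedata


def is_punct(token):
    """Assumed test: every character belongs to a Unicode punctuation category."""
    return bool(token) and all(
        unicodedata.category(ch).startswith("P") for ch in token
    )


def stage_one_eligible(tokens):
    """True if the sentence has trailing punctuation but none sentence-internally."""
    body = list(tokens)
    trailing = 0
    # Peel off the run of punctuation tokens at the end of the sentence.
    while body and is_punct(body[-1]):
        body.pop()
        trailing += 1
    # Require at least one trailing mark, some words, and a punctuation-free body.
    return trailing >= 1 and bool(body) and not any(is_punct(t) for t in body)
```

Under this sketch, a bare headline such as `Energy` is held back for a later stage, as is any sentence containing an internal comma.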

69 Table 8: Average accuracies over CoNLL evaluation sets (all sentences), for the DMV baseline and DBM1–3 trained with a curriculum strategy, and state-of-the-art results for systems that: (i) are also POS-agnostic and monolingual, including SCAJ (Spitkovsky et al. [sent-264, score-0.161]

70 , 2011b); (ii) rely on gold POS-tag identities to discourage noun roots (Mareček and Žabokrtský, 2011, MZ) or to encourage verbs (Rasooli and Faili, 2012, RF); and (iii) transfer delexicalized parsers (Søgaard, 2011a, S) from resource-rich languages with translations (McDonald et al. [sent-266, score-0.169]

71 DMV and DBM-1 trained on simple sentences, from uniform; DBM-2 and 3 trained on most sentences, from DBM-1 and 2, respectively; +inference is DBM-3 with punctuation constraints. [sent-268, score-0.208]

72 1), plus uniformly-at-random chosen dependency trees for the new complex and incomplete sentences, subject to punctuation-induced constraints. [sent-276, score-0.237]

73 Here we used the sprawl method, a more relaxed approach than in training, allowing arbitrary words to attach inter-punctuation fragments (provided that each entire fragment still be derived by one of its words), as suggested by Spitkovsky et al. [sent-291, score-0.131]

74 8 Discussion and the State-of-the-Art DBMs come from a long line of head-outward models for dependency grammar induction, yet their generative processes feature important novelties. [sent-297, score-0.371]

75 , of complete and incomplete sentences to coexist in a single model. [sent-301, score-0.208]

76 The second part of our work (the use of a curriculum strategy to train DBM-1 through 3) eliminates having to know tuned cut-offs, such as sentences with up to a predetermined number of tokens. [sent-304, score-0.164]

77 : stage one’s data is dictated by DBM-1 (which ignores punctuation); subsequent stages initialize additional pieces uniformly: uniform-at-random parses for new data and uniform multinomials for new parameters. [sent-306, score-0.132]

78 Other orthogonal dependency grammar induction techniques including ones based on universal rules (Naseem et al. [sent-313, score-0.371]

79 1 Monolingual POS-Agnostic Inducers The first type of grammar inducers, including our own approach, uses standard training and test data sets for each language, with gold part-of-speech tags as anonymized word classes. [sent-318, score-0.171]

80 The progression of scores for DBM-1 through 3 without using punctuation constraints in inference 40. [sent-328, score-0.208]

81 2% fell entirely above this previous state-of-the-art result as well; the DMV baseline, also trained on sentences without internal but with final punctuation, averaged 33. [sent-331, score-0.256]

82 Such grammar inducers generally do better than the first kind e. [sent-338, score-0.402]

83 Of the 10 languages for which we found results in the literature, transferred parsers underperformed the grammar inducers in only one case: on English (see Table 8). [sent-347, score-0.402]

84 For example, modeling of incomplete sentences could help in incremental initialization strategies like baby steps (Spitkovsky et al. [sent-352, score-0.208]

85 Since redundant views of data can make learning easier (Blum and Mitchell, 1998), integrating aspects of both constituency and dependency ought to be able to help grammar induction. [sent-362, score-0.322]

86 We have shown that this insight is correct: dependency grammar inducers can gain from modeling boundary information that is fundamental to constituency (i. [sent-363, score-0.641]

87 DBMs are a step in the direction towards modeling constituent boundaries jointly with head dependencies. [sent-366, score-0.217]

88 Further steps must involve more tightly coupling the two frameworks, as well as showing ways to incorporate both kinds of information in other state-of-the-art grammar induction paradigms. [sent-367, score-0.261]

89 Learning dependency translation models as collections of finite-state head transducers. [sent-378, score-0.213]

90 Boosting unsupervised grammar induction by splitting complex sentences on function words. [sent-412, score-0.371]

91 Two experiments on learning probabilistic dependency grammars from corpora. [sent-475, score-0.224]

92 Shared logistic normal distributions for soft parameter tying in unsupervised grammar induction. [sent-483, score-0.275]

93 Concavity and initialization for unsupervised dependency grammar induction. [sent-577, score-0.343]

94 Improving unsupervised dependency parsing with richer contexts and smoothing. [sent-590, score-0.172]

95 Corpus-based induction of syntactic structure: Models of dependency and constituency. [sent-597, score-0.2]

96 Gibbs sampling with treeness constraint in unsupervised dependency parsing. [sent-619, score-0.172]

97 From ranked words to dependency trees: two-stage unsupervised non-projective dependency parsing. [sent-715, score-0.282]

98 Baby Steps: How “Less is More” in unsupervised dependency parsing. [sent-723, score-0.172]

99 Lateen EM: Unsupervised training with multiple objectives, applied to dependency grammar induction. [sent-731, score-0.281]

100 Bootstrapping dependency grammar inducers from incomplete sentence fragments via austere models the “wabi-sabi” of unsupervised parsing. [sent-747, score-0.8]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('pstop', 0.366), ('spitkovsky', 0.258), ('inducers', 0.231), ('pattach', 0.212), ('punctuation', 0.208), ('dbms', 0.193), ('fringe', 0.173), ('grammar', 0.171), ('alshawi', 0.165), ('dmv', 0.165), ('ch', 0.139), ('dir', 0.138), ('stopping', 0.128), ('incomplete', 0.127), ('vbz', 0.117), ('curriculum', 0.116), ('grammars', 0.114), ('dependency', 0.11), ('head', 0.103), ('fragments', 0.099), ('wsj', 0.092), ('comp', 0.091), ('induction', 0.09), ('boundary', 0.088), ('ce', 0.083), ('nn', 0.08), ('boundaries', 0.079), ('automata', 0.078), ('zabokrtsk', 0.077), ('dt', 0.072), ('roots', 0.069), ('termination', 0.066), ('lateen', 0.066), ('mare', 0.066), ('scaffolding', 0.066), ('em', 0.066), ('conditioning', 0.065), ('unsupervised', 0.062), ('adj', 0.058), ('identities', 0.058), ('mail', 0.058), ('paskin', 0.058), ('rasooli', 0.058), ('heads', 0.053), ('contingency', 0.05), ('cd', 0.05), ('determiners', 0.049), ('sentences', 0.048), ('stage', 0.047), ('headlines', 0.047), ('naseem', 0.047), ('clausal', 0.045), ('accuracies', 0.045), ('intervening', 0.045), ('uniform', 0.043), ('distributions', 0.042), ('side', 0.042), ('bengio', 0.042), ('multinomials', 0.042), ('delexicalized', 0.042), ('determiner', 0.041), ('constituency', 0.041), ('gillenwater', 0.041), ('check', 0.04), ('decisions', 0.04), ('valence', 0.039), ('descendants', 0.039), ('faili', 0.039), ('krueger', 0.039), ('ofparameters', 0.039), ('parameterizations', 0.039), ('ponvert', 0.039), ('steer', 0.039), ('unrepresentative', 0.039), ('presence', 0.038), ('lengths', 0.038), ('parse', 0.038), ('cr', 0.037), ('analytical', 0.037), ('cohen', 0.037), ('tokens', 0.036), ('collins', 0.036), ('constituent', 0.035), ('alia', 0.035), ('inter', 0.035), ('eisner', 0.034), ('identity', 0.034), ('siblings', 0.033), ('gaard', 0.033), ('blum', 0.033), ('dots', 0.033), ('cek', 0.033), ('brent', 0.033), ('baby', 0.033), ('samdani', 0.033), ('explanatory', 0.033), ('hellinger', 0.033), ('coexist', 0.033), ('child', 0.033), 
('fragment', 0.032)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000013 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction

Author: Valentin I. Spitkovsky ; Hiyan Alshawi ; Daniel Jurafsky

Abstract: We present a new family of models for unsupervised parsing, Dependency and Boundary models, that use cues at constituent boundaries to inform head-outward dependency tree generation. We build on three intuitions that are explicit in phrase-structure grammars but only implicit in standard dependency formulations: (i) Distributions of words that occur at sentence boundaries, such as English determiners, resemble constituent edges. (ii) Punctuation at sentence boundaries further helps distinguish full sentences from fragments like headlines and titles, allowing us to model grammatical differences between complete and incomplete sentences. (iii) Sentence-internal punctuation boundaries help with longer-distance dependencies, since punctuation correlates with constituent edges. Our models induce state-of-the-art dependency grammars for many languages without special knowledge of optimal input sentence lengths or biased, manually-tuned initializers.

2 0.1531741 46 emnlp-2012-Exploiting Reducibility in Unsupervised Dependency Parsing

Author: David Mareček ; Zdeněk Žabokrtský

Abstract: The possibility of deleting a word from a sentence without violating its syntactic correctness belongs to traditionally known manifestations of syntactic dependency. We introduce a novel unsupervised parsing approach that is based on a new n-gram reducibility measure. We perform experiments across 18 languages available in CoNLL data and we show that our approach achieves better accuracy for the majority of the languages than previously reported results.

3 0.13525885 130 emnlp-2012-Unambiguity Regularization for Unsupervised Learning of Probabilistic Grammars

Author: Kewei Tu ; Vasant Honavar

Abstract: We introduce a novel approach named unambiguity regularization for unsupervised learning of probabilistic natural language grammars. The approach is based on the observation that natural language is remarkably unambiguous in the sense that only a tiny portion of the large number of possible parses of a natural language sentence are syntactically valid. We incorporate an inductive bias into grammar learning in favor of grammars that lead to unambiguous parses on natural language sentences. The resulting family of algorithms includes the expectation-maximization algorithm (EM) and its variant, Viterbi EM, as well as a so-called softmax-EM algorithm. The softmax-EM algorithm can be implemented with a simple and computationally efficient extension to standard EM. In our experiments of unsupervised dependency grammar learning, we show that unambiguity regularization is beneficial to learning, and in combination with annealing (of the regularization strength) and sparsity priors it leads to improvement over the current state of the art.

4 0.11596425 126 emnlp-2012-Training Factored PCFGs with Expectation Propagation

Author: David Hall ; Dan Klein

Abstract: PCFGs can grow exponentially as additional annotations are added to an initially simple base grammar. We present an approach where multiple annotations coexist, but in a factored manner that avoids this combinatorial explosion. Our method works with linguistically-motivated annotations, induced latent structure, lexicalization, or any mix of the three. We use a structured expectation propagation algorithm that makes use of the factored structure in two ways. First, by partitioning the factors, it speeds up parsing exponentially over the unfactored approach. Second, it minimizes the redundancy of the factors during training, improving accuracy over an independent approach. Using purely latent variable annotations, we can efficiently train and parse with up to 8 latent bits per symbol, achieving F1 scores up to 88.4 on the Penn Treebank while using two orders of magnitude fewer parameters compared to the naïve approach. Combining latent, lexicalized, and unlexicalized annotations, our best parser gets 89.4 F1 on all sentences from section 23 of the Penn Treebank.

5 0.10396685 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon

Author: Greg Durrett ; Adam Pauls ; Dan Klein

Abstract: We consider the problem of using a bilingual dictionary to transfer lexico-syntactic information from a resource-rich source language to a resource-poor target language. In contrast to past work that used bitexts to transfer analyses of specific sentences at the token level, we instead use features to transfer the behavior of words at a type level. In a discriminative dependency parsing framework, our approach produces gains across a range of target languages, using two different low-resource training methodologies (one weakly supervised and one indirectly supervised) and two different dictionary sources (one manually constructed and one automatically constructed).

6 0.099595867 57 emnlp-2012-Generalized Higher-Order Dependency Parsing with Cube Pruning

7 0.090760231 12 emnlp-2012-A Transition-Based System for Joint Part-of-Speech Tagging and Labeled Non-Projective Dependency Parsing

8 0.082348503 105 emnlp-2012-Parser Showdown at the Wall Street Corral: An Empirical Investigation of Error Types in Parser Output

9 0.079272591 37 emnlp-2012-Dynamic Programming for Higher Order Parsing of Gap-Minding Trees

10 0.078276142 88 emnlp-2012-Minimal Dependency Length in Realization Ranking

11 0.076966785 81 emnlp-2012-Learning to Map into a Universal POS Tagset

12 0.074596159 1 emnlp-2012-A Bayesian Model for Learning SCFGs with Discontiguous Rules

13 0.071633279 131 emnlp-2012-Unified Dependency Parsing of Chinese Morphological and Syntactic Structures

14 0.068443671 48 emnlp-2012-Exploring Adaptor Grammars for Native Language Identification

15 0.06688977 65 emnlp-2012-Improving NLP through Marginalization of Hidden Syntactic Structure

16 0.061452854 18 emnlp-2012-An Empirical Investigation of Statistical Significance in NLP

17 0.059173685 129 emnlp-2012-Type-Supervised Hidden Markov Models for Part-of-Speech Tagging with Incomplete Tag Dictionaries

18 0.058831077 119 emnlp-2012-Spectral Dependency Parsing with Latent Variables

19 0.058464315 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers

20 0.057662811 125 emnlp-2012-Towards Efficient Named-Entity Rule Induction for Customizability


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.23), (1, -0.098), (2, 0.11), (3, -0.041), (4, 0.031), (5, 0.072), (6, -0.023), (7, 0.023), (8, -0.0), (9, 0.143), (10, 0.1), (11, 0.141), (12, 0.016), (13, 0.104), (14, -0.131), (15, -0.044), (16, -0.01), (17, 0.004), (18, -0.151), (19, -0.107), (20, 0.071), (21, -0.037), (22, 0.019), (23, 0.139), (24, -0.052), (25, 0.137), (26, 0.064), (27, 0.141), (28, 0.007), (29, 0.063), (30, 0.122), (31, -0.192), (32, -0.016), (33, -0.056), (34, 0.142), (35, -0.042), (36, -0.038), (37, 0.186), (38, -0.132), (39, 0.119), (40, -0.052), (41, 0.004), (42, 0.005), (43, 0.068), (44, -0.028), (45, 0.001), (46, -0.037), (47, 0.046), (48, -0.043), (49, 0.071)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94625074 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction

Author: Valentin I. Spitkovsky ; Hiyan Alshawi ; Daniel Jurafsky

Abstract: We present a new family of models for unsupervised parsing, Dependency and Boundary models, that use cues at constituent boundaries to inform head-outward dependency tree generation. We build on three intuitions that are explicit in phrase-structure grammars but only implicit in standard dependency formulations: (i) Distributions of words that occur at sentence boundaries, such as English determiners, resemble constituent edges. (ii) Punctuation at sentence boundaries further helps distinguish full sentences from fragments like headlines and titles, allowing us to model grammatical differences between complete and incomplete sentences. (iii) Sentence-internal punctuation boundaries help with longer-distance dependencies, since punctuation correlates with constituent edges. Our models induce state-of-the-art dependency grammars for many languages without special knowledge of optimal input sentence lengths or biased, manually-tuned initializers.

2 0.74216664 46 emnlp-2012-Exploiting Reducibility in Unsupervised Dependency Parsing

Author: David Mareček ; Zdeněk Žabokrtský

Abstract: The possibility of deleting a word from a sentence without violating its syntactic correctness belongs to traditionally known manifestations of syntactic dependency. We introduce a novel unsupervised parsing approach that is based on a new n-gram reducibility measure. We perform experiments across 18 languages available in CoNLL data and we show that our approach achieves better accuracy for the majority of the languages than previously reported results.

3 0.72772259 130 emnlp-2012-Unambiguity Regularization for Unsupervised Learning of Probabilistic Grammars

Author: Kewei Tu ; Vasant Honavar

Abstract: We introduce a novel approach named unambiguity regularization for unsupervised learning of probabilistic natural language grammars. The approach is based on the observation that natural language is remarkably unambiguous in the sense that only a tiny portion of the large number of possible parses of a natural language sentence are syntactically valid. We incorporate an inductive bias into grammar learning in favor of grammars that lead to unambiguous parses on natural language sentences. The resulting family of algorithms includes the expectation-maximization algorithm (EM) and its variant, Viterbi EM, as well as a so-called softmax-EM algorithm. The softmax-EM algorithm can be implemented with a simple and computationally efficient extension to standard EM. In our experiments of unsupervised dependency grammar learning, we show that unambiguity regularization is beneficial to learning, and in combination with annealing (of the regularization strength) and sparsity priors it leads to improvement over the current state of the art.

4 0.50189149 126 emnlp-2012-Training Factored PCFGs with Expectation Propagation

Author: David Hall ; Dan Klein

Abstract: PCFGs can grow exponentially as additional annotations are added to an initially simple base grammar. We present an approach where multiple annotations coexist, but in a factored manner that avoids this combinatorial explosion. Our method works with linguistically motivated annotations, induced latent structure, lexicalization, or any mix of the three. We use a structured expectation propagation algorithm that makes use of the factored structure in two ways. First, by partitioning the factors, it speeds up parsing exponentially over the unfactored approach. Second, it minimizes the redundancy of the factors during training, improving accuracy over an independent approach. Using purely latent variable annotations, we can efficiently train and parse with up to 8 latent bits per symbol, achieving F1 scores up to 88.4 on the Penn Treebank while using two orders of magnitude fewer parameters compared to the naïve approach. Combining latent, lexicalized, and unlexicalized annotations, our best parser gets 89.4 F1 on all sentences from section 23 of the Penn Treebank.

5 0.44190285 88 emnlp-2012-Minimal Dependency Length in Realization Ranking

Author: Michael White ; Rajakrishnan Rajkumar

Abstract: Comprehension and corpus studies have found that the tendency to minimize dependency length has a strong influence on constituent ordering choices. In this paper, we investigate dependency length minimization in the context of discriminative realization ranking, focusing on its potential to eliminate egregious ordering errors as well as better match the distributional characteristics of sentence orderings in news text. We find that with a state-of-the-art, comprehensive realization ranking model, dependency length minimization yields statistically significant improvements in BLEU scores and significantly reduces the number of heavy/light ordering errors. Through distributional analyses, we also show that with simpler ranking models, dependency length minimization can go overboard, too often sacrificing canonical word order to shorten dependencies, while richer models manage to better counterbalance the dependency length minimization preference against (sometimes) competing canonical word order preferences.

6 0.42998585 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon

7 0.42632729 48 emnlp-2012-Exploring Adaptor Grammars for Native Language Identification

8 0.37945575 1 emnlp-2012-A Bayesian Model for Learning SCFGs with Discontiguous Rules

9 0.35132986 57 emnlp-2012-Generalized Higher-Order Dependency Parsing with Cube Pruning

10 0.34390378 79 emnlp-2012-Learning Syntactic Categories Using Paradigmatic Representations of Word Context

11 0.33719951 81 emnlp-2012-Learning to Map into a Universal POS Tagset

12 0.3074255 59 emnlp-2012-Generating Non-Projective Word Order in Statistical Linearization

13 0.30579269 119 emnlp-2012-Spectral Dependency Parsing with Latent Variables

14 0.30167758 18 emnlp-2012-An Empirical Investigation of Statistical Significance in NLP

15 0.29811063 12 emnlp-2012-A Transition-Based System for Joint Part-of-Speech Tagging and Labeled Non-Projective Dependency Parsing

16 0.29113442 125 emnlp-2012-Towards Efficient Named-Entity Rule Induction for Customizability

17 0.2898345 74 emnlp-2012-Language Model Rest Costs and Space-Efficient Storage

18 0.28554693 37 emnlp-2012-Dynamic Programming for Higher Order Parsing of Gap-Minding Trees

19 0.27239189 131 emnlp-2012-Unified Dependency Parsing of Chinese Morphological and Syntactic Structures

20 0.2658287 7 emnlp-2012-A Novel Discriminative Framework for Sentence-Level Discourse Analysis


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.034), (11, 0.024), (16, 0.055), (25, 0.022), (29, 0.014), (34, 0.048), (39, 0.012), (45, 0.026), (60, 0.066), (63, 0.056), (64, 0.024), (65, 0.035), (70, 0.02), (73, 0.031), (74, 0.072), (76, 0.072), (80, 0.023), (82, 0.011), (86, 0.027), (89, 0.238), (95, 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.85280615 139 emnlp-2012-Word Salad: Relating Food Prices and Descriptions

Author: Victor Chahuneau ; Kevin Gimpel ; Bryan R. Routledge ; Lily Scherlis ; Noah A. Smith

Abstract: We investigate the use of language in food writing, specifically on restaurant menus and in customer reviews. Our approach is to build predictive models of concrete external variables, such as restaurant menu prices. We make use of a dataset of menus and customer reviews for thousands of restaurants in several U.S. cities. By focusing on prediction tasks and doing our analysis at scale, our methodology allows quantitative, objective measurements of the words and phrases used to describe food in restaurants. We also explore interactions in language use between menu prices and sentiment as expressed in user reviews.

same-paper 2 0.78946549 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction

Author: Valentin I. Spitkovsky ; Hiyan Alshawi ; Daniel Jurafsky

Abstract: We present a new family of models for unsupervised parsing, Dependency and Boundary models, that use cues at constituent boundaries to inform head-outward dependency tree generation. We build on three intuitions that are explicit in phrase-structure grammars but only implicit in standard dependency formulations: (i) Distributions of words that occur at sentence boundaries, such as English determiners, resemble constituent edges. (ii) Punctuation at sentence boundaries further helps distinguish full sentences from fragments like headlines and titles, allowing us to model grammatical differences between complete and incomplete sentences. (iii) Sentence-internal punctuation boundaries help with longer-distance dependencies, since punctuation correlates with constituent edges. Our models induce state-of-the-art dependency grammars for many languages without special knowledge of optimal input sentence lengths or biased, manually-tuned initializers.

3 0.53205162 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers

Author: Jayant Krishnamurthy ; Tom Mitchell

Abstract: We present a method for training a semantic parser using only a knowledge base and an unlabeled text corpus, without any individually annotated sentences. Our key observation is that multiple forms of weak supervision can be combined to train an accurate semantic parser: semantic supervision from a knowledge base, and syntactic supervision from dependency-parsed sentences. We apply our approach to train a semantic parser that uses 77 relations from Freebase in its knowledge representation. This semantic parser extracts instances of binary relations with state-of-the-art accuracy, while simultaneously recovering much richer semantic structures, such as conjunctions of multiple relations with partially shared arguments. We demonstrate recovery of this richer structure by extracting logical forms from natural language queries against Freebase. On this task, the trained semantic parser achieves 80% precision and 56% recall, despite never having seen an annotated logical form.

4 0.50558579 130 emnlp-2012-Unambiguity Regularization for Unsupervised Learning of Probabilistic Grammars

Author: Kewei Tu ; Vasant Honavar

Abstract: We introduce a novel approach named unambiguity regularization for unsupervised learning of probabilistic natural language grammars. The approach is based on the observation that natural language is remarkably unambiguous in the sense that only a tiny portion of the large number of possible parses of a natural language sentence are syntactically valid. We incorporate an inductive bias into grammar learning in favor of grammars that lead to unambiguous parses on natural language sentences. The resulting family of algorithms includes the expectation-maximization algorithm (EM) and its variant, Viterbi EM, as well as a so-called softmax-EM algorithm. The softmax-EM algorithm can be implemented with a simple and computationally efficient extension to standard EM. In our experiments of unsupervised dependency grammar learning, we show that unambiguity regularization is beneficial to learning, and in combination with annealing (of the regularization strength) and sparsity priors it leads to improvement over the current state of the art.

5 0.50210673 51 emnlp-2012-Extracting Opinion Expressions with semi-Markov Conditional Random Fields

Author: Bishan Yang ; Claire Cardie

Abstract: Extracting opinion expressions from text is usually formulated as a token-level sequence labeling task tackled using Conditional Random Fields (CRFs). CRFs, however, do not readily model potentially useful segment-level information like syntactic constituent structure. Thus, we propose a semi-CRF-based approach to the task that can perform sequence labeling at the segment level. We extend the original semi-CRF model (Sarawagi and Cohen, 2004) to allow the modeling of arbitrarily long expressions while accounting for their likely syntactic structure when modeling segment boundaries. We evaluate performance on two opinion extraction tasks, and, in contrast to previous sequence labeling approaches to the task, explore the usefulness of segment-level syntactic parse features. Experimental results demonstrate that our approach outperforms state-of-the-art methods for both opinion expression tasks.

6 0.49405488 14 emnlp-2012-A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts

7 0.49359947 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon

8 0.49317989 82 emnlp-2012-Left-to-Right Tree-to-String Decoding with Prediction

9 0.4916448 46 emnlp-2012-Exploiting Reducibility in Unsupervised Dependency Parsing

10 0.48914969 20 emnlp-2012-Answering Opinion Questions on Products by Exploiting Hierarchical Organization of Consumer Reviews

11 0.48888952 71 emnlp-2012-Joint Entity and Event Coreference Resolution across Documents

12 0.48709419 23 emnlp-2012-Besting the Quiz Master: Crowdsourcing Incremental Classification Games

13 0.48684335 4 emnlp-2012-A Comparison of Vector-based Representations for Semantic Composition

14 0.48532054 81 emnlp-2012-Learning to Map into a Universal POS Tagset

15 0.48505211 109 emnlp-2012-Re-training Monolingual Parser Bilingually for Syntactic SMT

16 0.48291609 127 emnlp-2012-Transforming Trees to Improve Syntactic Convergence

17 0.48253965 77 emnlp-2012-Learning Constraints for Consistent Timeline Extraction

18 0.48235515 7 emnlp-2012-A Novel Discriminative Framework for Sentence-Level Discourse Analysis

19 0.48144755 89 emnlp-2012-Mixed Membership Markov Models for Unsupervised Conversation Modeling

20 0.48110953 64 emnlp-2012-Improved Parsing and POS Tagging Using Inter-Sentence Consistency Constraints