emnlp emnlp2012 emnlp2012-46 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: David Marecek ; Zdenek Zabokrtsky
Abstract: The possibility of deleting a word from a sentence without violating its syntactic correctness belongs to traditionally known manifestations of syntactic dependency. We introduce a novel unsupervised parsing approach that is based on a new n-gram reducibility measure. We perform experiments across 18 languages available in CoNLL data and we show that our approach achieves better accuracy for the majority of the languages than previously reported results.
Reference: text
sentIndex sentText sentNum sentScore
1 We introduce a novel unsupervised parsing approach that is based on a new n-gram reducibility measure. [sent-5, score-0.753]
2 It is undeniable that huge progress has been made in the field of supervised dependency parsing, especially due to the CoNLL shared task series. [sent-9, score-0.118]
3 (2009), one of the traditional linguistic criteria for recognizing dependency relations (including their head-dependent orientation) is that a head H of a construction C determines the syntactic category of C and can often replace C. [sent-14, score-0.148]
4 Of course, all the above works had to respond to the notorious fact that there are many language phenomena precluding the ideal (word by word) sentence reducibility (e. [sent-18, score-0.643]
5 However, we tentatively disregard their solutions and borrow only the very core of the reducibility idea: if a word can be removed from a sentence without damaging it, then it is likely to be dependent on some other (still present) word. [sent-21, score-0.67]
6 That is why we introduce a simple reducibility measure based on n-gram corpus statistics. [sent-23, score-0.643]
7 We employ this reducibility measure as the main feature in our unsupervised parsing procedure. [sent-24, score-0.753]
8 In our sampler, the more reducible a given token is, the more likely it is to be sampled as a dependant and not as a head. [sent-27, score-0.263]
9 After a certain number of sampling iterations, a final dependency tree is created for each sentence (one token per node, including punctuation) that maximizes the product of edge probabilities gathered along the sampling history. [sent-28, score-0.301]
10 While the computationally demanding sampling procedure can be applied only to limited data, the one-time precomputation of statistics for reducibility estimates can easily exploit much larger data. [sent-32, score-0.744]
11 We are not aware of any other published work on unsupervised parsing employing reducibility or a similar idea. [sent-33, score-0.78]
12 Dominant approaches in unsupervised parsing are typically based on repeated patterns, not on the possibility of a deletion inside a pattern. [sent-34, score-0.141]
13 It seems that the two views of dependency (frequent co-occurrence of a head-dependant pair versus reducibility of the dependant) are rather complementary, so fruitful combinations can hopefully be expected in the future. [sent-35, score-0.733]
14 Section 2 briefly outlines the state of the art in unsupervised dependency parsing. [sent-37, score-0.162]
15 Our measure of reducibility based on a large monolingual corpus is presented in Section 3. [sent-38, score-0.643]
16 Section 4 presents our models, which generate the probability estimates for the edge sampling described in Section 5. [sent-39, score-0.141]
17 Experimental parsing results for languages included in CoNLL shared task treebanks are summarized in Section 6. [sent-40, score-0.168]
18 2 Related Work The most popular approach in unsupervised dependency parsing of the recent years is to employ Dependency Model with Valence (DMV), which was introduced by Klein and Manning (2004). [sent-42, score-0.2]
19 Such experiments were done by Spitkovsky (2011b; 2011c), where the parsing algorithm was evaluated on all 19 languages included in CoNLL 2006 (Buchholz and Marsi, 2006). (Footnote 1: The state-of-the-art unsupervised parsers achieve more than 50% attachment score measured on the Penn Treebank.) [sent-49, score-0.256]
20 , 2011a) shows that the unsupervised part-of-speech tags may be more useful for this task than the supervised ones. [sent-53, score-0.118]
21 Another possibility for obtaining dependency structures for languages without any linguistically annotated resources is projection using a parallel treebank with a resource-rich language (typically English). [sent-54, score-0.205]
22 In this paper, we describe a novel approach to unsupervised dependency parsing. [sent-59, score-0.162]
23 Our model differs from DMV, since we employ the reducibility feature and use fertility of nodes instead of generating STOP signs. [sent-60, score-1.073]
24 If we find it, we assume the word was reducible in the original sentence. [sent-67, score-0.232]
25 Since the number of such reducible word sequences found in any corpus will be low, we determine the reducibility scores from their individual types (part-of-speech tags). [sent-68, score-0.922]
26 The verb ‘went’ would be reducible in the context ‘their children went to school’, because the sequence ‘their children to school’ occurs in the second sentence. [sent-72, score-0.423]
27 Using part-of-speech tags instead of word forms is thus not suitable for this search for reducible sequences. [sent-82, score-0.689]
28 Although we search for reducible sequences of word forms in the corpus, we compute reducibility scores for sequences of part-of-speech tags. [sent-83, score-0.922]
29 We denote the number of such reducible occurrences of an n-gram g by r(g). (Footnote 2: We do not take into account sentences with fewer than 10 words, because they could be nominal (without any verb) and might influence the reducibility scores of verbs.) [sent-93, score-0.922]
30 The relative reducibility R(g) of a PoS n-gram g is then computed as R(g) = (1/N) · (r(g) + σ1) / (c(g) + σ2), (1) where c(g) is the total number of occurrences of g in the corpus, σ1 and σ2 are smoothing constants, and the normalization constant N, which expresses the relative reducibility over all the PoS n-grams (denoted by G), causes the scores to be concentrated around the value 1. [sent-97, score-1.333]
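To make the computation above concrete, here is a minimal sketch of how such reducibility scores could be estimated from a raw corpus. It is an illustration only: the corpus interface, the n-gram length limit, the particular choice of the normalization constant N, and the smoothing values are assumptions of the sketch, not details taken from the paper, and a deletion is counted as reducible only if the full shortened sentence occurs elsewhere in the corpus, which simplifies the search described above.

```python
from collections import Counter

def reducibility_scores(word_sents, pos_sents, max_n=3, sigma1=1.0, sigma2=10.0):
    """Estimate R(g) for PoS n-grams g, in the spirit of Eq. (1) (sketch)."""
    # Word-form sentences observed in the corpus; a deletion counts as reducible
    # here only if the shortened word sequence also occurs as a corpus sentence.
    observed = {tuple(s) for s in word_sents}

    count = Counter()      # c(g): total occurrences of PoS n-gram g
    reducible = Counter()  # r(g): reducible occurrences of g

    for words, tags in zip(word_sents, pos_sents):
        if len(words) < 10:            # footnote 2: ignore short sentences
            continue
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                g = tuple(tags[i:i + n])
                count[g] += 1
                if tuple(words[:i] + words[i + n:]) in observed:
                    reducible[g] += 1

    # Normalization constant N: here the overall reducibility ratio, so that the
    # scores end up concentrated around 1 (one plausible choice, assumed).
    N = (sum(reducible.values()) + sigma1) / (sum(count.values()) + sigma2)

    return {g: ((reducible[g] + sigma1) / (count[g] + sigma2)) / N for g in count}
```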
31 Tables 1, 2, and 3 show reducibility scores of the most frequent PoS n-grams of three selected languages: English, German, and Czech. [sent-99, score-0.69]
32 That is desired, because the reducible unigrams will more likely become leaves in dependency trees. [sent-103, score-0.322]
33 However, there are also n-grams such as the German trigram [determiner noun preposition] (ART-NN-APPR) whose reducibility score is undesirably high. [sent-105, score-0.643]
34 , 2009), there is a STOP sign indicating that no more dependents in a given direction will be generated. [sent-108, score-0.157]
35 Given a certain head, all its dependents in left direction are generated first, then the STOP sign in that direction, then all its right dependents and then STOP in the other direction. [sent-109, score-0.285]
36 Our model introduces the fertility of a node, which replaces the STOP sign. [sent-111, score-0.43]
37 For a given head, we first generate the number of its left and right children (fertility model) and then fill these positions by generating its individual dependents (edge model). (Footnote 3: The high reducibility score of ART-NN-APPR was probably caused by German particles, which have the same PoS tag as prepositions.) [sent-112, score-0.776]
38 (Table legend: V* are verbs, N* are nouns, P* are pronouns, R* are prepositions, A* are adjectives, D* are adverbs, C* are numerals, J* are conjunctions, and Z* is punctuation.) [sent-114, score-0.128]
39 If a zero fertility is generated in both directions, the head becomes a leaf. [sent-115, score-0.488]
40 Besides the fertility model and the edge model, we use two more models (a subtree model and a distance model), which force the generated trees to have a more desirable shape. [sent-116, score-0.578]
41 1 Fertility Model We express the fertility of a node by a pair of numbers: the number of its left dependents and the number of its right dependents. [sent-118, score-0.589]
42 For example, fertility “1-3” means that the node has one left and three right dependents, fertility “0-0” indicates that it is a leaf. [sent-119, score-0.891]
43 This means that if a specific fertility has been frequent for a given PoS tag in the past, it is more likely to be generated again. [sent-121, score-0.467]
44 Besides the basic fertility model, we also introduce an extended fertility model, which uses the frequency of a given word form when generating the number of children. [sent-126, score-0.899]
45 Such words tend to have a stable number of children; for example, (i) some function words are exclusively leaves, (ii) prepositions have just one child, and (iii) the attachment of auxiliary verbs depends on the annotation style, but the number of their children is also not very variable. [sent-130, score-0.319]
46 The extended fertility is described by the equation P'f(fi|ti,wi) = (c−i(“ti, fi”) + αe F(wi) P0(fi)) / (c−i(“ti”) + αe F(wi)), (6) where F(wi) is the relative frequency of the word wi, computed as the number of occurrences of wi in our corpus divided by the total number of words. [sent-132, score-0.498]
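For illustration, a small sketch of CRP-style fertility probabilities in the spirit of the description above. The base distribution P0, the exact way the word frequency F(w) scales the hyperparameter, and the basic-model pseudo-count weight are assumptions of the sketch rather than the paper's exact definitions.

```python
from collections import Counter

class FertilityModel:
    """Chinese-restaurant-process style fertility model (illustrative sketch).

    A fertility is a pair (left_children, right_children); (0, 0) marks a leaf.
    """

    def __init__(self, alpha=1.0, alpha_ext=0.01):
        self.alpha = alpha          # basic-model pseudo-count weight (assumed value)
        self.alpha_ext = alpha_ext  # extended-model hyperparameter alpha_e (assumed value)
        self.tag_fert = Counter()   # c("t, f"): tag t observed with fertility f
        self.tag = Counter()        # c("t"):    tag t observed as a head at all

    @staticmethod
    def p0(fert):
        # Base distribution preferring small fertilities -- an assumption of this
        # sketch, not the paper's P0.
        left, right = fert
        return 0.5 ** (left + right + 1)

    def observe(self, fert, tag):
        self.tag_fert[(tag, fert)] += 1
        self.tag[tag] += 1

    def forget(self, fert, tag):    # remove a node's counts before re-sampling it
        self.tag_fert[(tag, fert)] -= 1
        self.tag[tag] -= 1

    def prob_basic(self, fert, tag):
        # Basic model: tag counts plus a fixed pseudo-count alpha * P0(f).
        return (self.tag_fert[(tag, fert)] + self.alpha * self.p0(fert)) / \
               (self.tag[tag] + self.alpha)

    def prob_extended(self, fert, tag, word_freq):
        # Extended model in the spirit of Eq. (6): the pseudo-count is scaled by
        # the relative frequency F(w) of the word form.
        prior = self.alpha_ext * word_freq
        return (self.tag_fert[(tag, fert)] + prior * self.p0(fert)) / \
               (self.tag[tag] + prior)
```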
47 2 Edge Model After the fertility (number of left and right dependents) is generated, the individual slots are filled using the edge model. [sent-134, score-0.501]
48 The part-of-speech tag of each dependent is conditioned on the part-of-speech tag of the head and the edge direction (the position of the dependent relative to the head). [sent-135, score-0.259]
49 (Footnote 5: For the edge model purposes, the PoS tag of the technical root is set to ‘ ’ and it is in the zero-th position in the sentence.) Similarly to the fertility model, we employ the Chinese restaurant process to assign probabilities to individual dependents. [sent-136, score-0.538]
50 Pe(tj|ti,dj) = (c−i(“ti, tj, dj”) + β) / (c−i(“ti, dj”) + β|T|), (7) where ti and tj are the part-of-speech tags of the head and the generated dependent, respectively; dj is the direction of the edge between the words i and j, which can take two values: left and right. [sent-137, score-0.357]
51 c−i(“ti, tj, dj”) stands for the count of edges ti ← tj with the direction dj in the history, |T| is the number of unique tags in the corpus, and β is a Dirichlet hyperparameter. [sent-138, score-0.232]
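A matching sketch of the edge model in Eq. (7); the class interface and the observe/forget bookkeeping used during sampling are illustrative assumptions.

```python
from collections import Counter

class EdgeModel:
    """CRP-style edge model following Eq. (7) (illustrative sketch)."""

    def __init__(self, tagset_size, beta=1.0):
        self.beta = beta            # Dirichlet hyperparameter (assumed value)
        self.T = tagset_size        # |T|: number of unique PoS tags in the corpus
        self.pair = Counter()       # c("t_i, t_j, d_j")
        self.head = Counter()       # c("t_i, d_j")

    def prob(self, dep_tag, head_tag, direction):
        # P_e(t_j | t_i, d_j) = (c("ti,tj,dj") + beta) / (c("ti,dj") + beta * |T|)
        return (self.pair[(head_tag, dep_tag, direction)] + self.beta) / \
               (self.head[(head_tag, direction)] + self.beta * self.T)

    def observe(self, dep_tag, head_tag, direction):
        self.pair[(head_tag, dep_tag, direction)] += 1
        self.head[(head_tag, direction)] += 1

    def forget(self, dep_tag, head_tag, direction):
        self.pair[(head_tag, dep_tag, direction)] -= 1
        self.head[(head_tag, direction)] -= 1
```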
52 3 Distance Model The distance model is an auxiliary model that prevents the resulting trees from being too flat. [sent-140, score-0.105]
53 4 Subtree Model The subtree model uses the reducibility measure. [sent-150, score-0.729]
54 It plays an important role since it forces the reducible words to be leaves and reducible n-grams to be subtrees. [sent-151, score-0.464]
55 Words with low reducibility are forced towards the root of the tree. [sent-152, score-0.643]
56 The probability of such a subtree is proportional to its reducibility R(desc(i)). [sent-159, score-0.729]
57 We multiply the probabilities of fertility, edge, distance from parent, and subtree over all words (nodes) in the corpus. [sent-168, score-0.122]
58 The extended fertility model P'f can be substituted by its basic variant Pf. [sent-169, score-0.469]
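The product of the four factors can be sketched as a log-probability summed over all nodes of a tree; the node attributes and the individual model objects below are placeholders standing in for the models described above, not interfaces defined in the paper.

```python
import math

def tree_log_prob(nodes, fertility, edge, distance, subtree):
    """Sum of log fertility, edge, distance and subtree factors over all nodes.

    Each node is assumed to expose: tag, word_freq, fertility (left/right counts),
    parent_tag, direction, dist_to_parent, and desc_ngram (the PoS n-gram covered
    by the node and its descendants). All of these are assumptions of the sketch.
    """
    logp = 0.0
    for nd in nodes:
        logp += math.log(fertility.prob_extended(nd.fertility, nd.tag, nd.word_freq))
        logp += math.log(edge.prob(nd.tag, nd.parent_tag, nd.direction))
        logp += math.log(distance.prob(nd.dist_to_parent))
        # Subtree factor: proportional to the reducibility R(desc(i)).
        logp += math.log(subtree.prob(nd.desc_ngram))
    return logp
```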
59 In the end, “average” trees based on the whole sampling are built. [sent-174, score-0.111]
60 1 Initialization Before the sampling starts, we initialize the projective trees randomly. [sent-176, score-0.175]
61 If it is not possible to attach a word to one side, we attach it to the other side. [sent-180, score-0.114]
62 Figure 1: Arrow and bracketing notation of a projective dependency tree. [sent-186, score-0.154]
63 Each projective dependency tree consisting of n words can be expressed by n pairs of brackets. [sent-194, score-0.154]
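As an illustration of the bracketing view, a small sketch that produces one pair of brackets per word; the head-array input format and the example sentence are assumptions of the sketch, not the paper's notation in Figure 1.

```python
def to_brackets(words, heads):
    """Render a projective dependency tree with one bracket pair per word.

    heads[i] is the index of the parent of word i, or -1 for the root.
    Example: to_brackets(["a", "dog", "barked"], [1, 2, -1])
             returns "(((a) dog) barked)".
    """
    children = [[] for _ in words]
    roots = []
    for i, h in enumerate(heads):
        (roots if h < 0 else children[h]).append(i)

    def render(i):
        left = [render(c) for c in children[i] if c < i]
        right = [render(c) for c in children[i] if c > i]
        return "(" + " ".join(left + [words[i]] + right) + ")"

    return " ".join(render(r) for r in roots)
```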
64 From the perspective of dependency structures, the small change can be described as follows: 1. [sent-201, score-0.129]
65 When the sampling is finished, we build the final dependency trees based on the edge counts obtained during the sampling. [sent-220, score-0.202]
66 Other possibilities for obtaining the final dependency trees would be using Eisner's projective algorithm (Eisner, 1996) or using an annealing method (favoring more likely changes) at the end of the sampling. [sent-223, score-0.195]
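One simple way to turn the accumulated edge counts into a final tree is sketched below; picking the most frequently sampled head per word is only a rough stand-in for the decoding described above (it ignores cycles and projectivity), and the Counter-of-edges interface is an assumption of the sketch.

```python
from collections import Counter

def final_heads(n_words, edge_counts):
    """Choose, for each word, the head it was attached to most often during sampling.

    edge_counts: Counter over (head, dependent) index pairs collected while
    sampling; index 0 stands for the technical root, words are 1..n_words.
    NOTE: this greedy choice can create cycles; a maximum-spanning-tree decoder
    (e.g. Chu-Liu/Edmonds) over the same counts would guarantee a tree.
    """
    heads = [0] * (n_words + 1)          # heads[0] is unused (technical root)
    for dep in range(1, n_words + 1):
        best = max((h for h in range(n_words + 1) if h != dep),
                   key=lambda h: edge_counts[(h, dep)])
        heads[dep] = best
    return heads
```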
67 6 Experiments and Evaluation We evaluate our parser on 20 treebanks (18 languages) included in CoNLL shared tasks 2006 (Buchholz and Marsi, 2006) and 2007 (Nivre et al. [sent-225, score-0.127]
68 Similarly to some previous papers on unsupervised parsing (Gillenwater et al. [sent-227, score-0.11]
69 The best parser configuration found on the English development data was then used for parsing all other languages. [sent-231, score-0.108]
70 This simulates the situation in which we have only one treebank (English) on which we can tune our parser and we want to parse other languages for which we have no manually annotated treebanks. [sent-232, score-0.124]
71 1 Data We need two kinds of data for our experiments: a smaller treebank, which is used for sampling and for evaluation, and a large corpus, from which we compute n-gram reducibility scores. [sent-236, score-0.713]
72 For obtaining reducibility scores, we used the W2C corpus of Wikipedia articles, which was downloaded by Majliš and Žabokrtský (2012). [sent-242, score-0.643]
73 However, it is sufficient for obtaining good reducibility scores. [sent-248, score-0.643]
74 After several experiments, we observed that the extended fertility model provides better results than the basic fertility model; the parser using the basic fertility model achieved 44. [sent-258, score-1.369]
75 1% attachment score for English, whereas the extended fertility model increased the score to 46. [sent-259, score-0.572]
76 The four hyperparameters αe (extended fertility model), β (edge model), γ (distance model), and δ (subtree model) were set by a grid search algorithm, which found the following optimal values: αe = 0. [sent-261, score-0.464]
77 Therefore, adjusting the hyperparameters on another language would probably change the scores significantly. [sent-264, score-0.12]
78 To be able to compare our parser attachment score to previously published results, the following steps must be done: • We take the testing part of each treebank (the file test. [sent-267, score-0.211]
79 If the punctuation node is not a leaf, its children are attached to the parent of the removed node. [sent-269, score-0.202]
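A minimal sketch of the evaluation preprocessing described in the two steps above: punctuation tokens are removed and any children of a removed token are re-attached to its nearest non-punctuation ancestor. The head-array representation and the is_punct flags are assumptions of the sketch.

```python
def strip_punctuation(heads, is_punct):
    """Remove punctuation nodes, re-attaching their children to the grandparent.

    heads[i]    : parent index of token i, or -1 for the root
    is_punct[i] : True if token i is punctuation
    Returns (kept_indices, new_heads) over the remaining tokens.
    """
    def kept_ancestor(i):
        h = heads[i]
        while h != -1 and is_punct[h]:   # climb over removed punctuation heads
            h = heads[h]
        return h

    kept = [i for i in range(len(heads)) if not is_punct[i]]
    renum = {old: new for new, old in enumerate(kept)}
    new_heads = []
    for i in kept:
        h = kept_ancestor(i)
        new_heads.append(-1 if h == -1 else renum[h])
    return kept, new_heads
```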
80 We define the best configuration as the one with the highest average attachment score across all the tested languages. [sent-290, score-0.133]
81 However, it is important to note that we used an additional source of information, namely large unannotated corpora for computing reducibility scores, while the others used only the CoNLL data. [sent-294, score-0.643]
82 4 Error Analysis Our main motivation for developing an unsupervised dependency parser was that we wanted to be able to parse any language. [sent-296, score-0.202]
83 Auxiliary verbs in Slovenian: In the Slovenian treebank, many verbs are composed of two words: a main verb (marked as Verb-main) and an auxiliary verb (Verb-copula). [sent-299, score-0.208]
84 Our parser chooses the auxiliary verb as the head, and the main verb and all its dependants become its children. [sent-300, score-0.252]
85 The main verb is switched with the auxiliary one, which also causes the wrong attachment of all its dependants. [sent-304, score-0.222]
86 The dependency between a content word and a function word is switched, and the dependants of the content word are attached to the function word. [sent-308, score-0.202]
87 Unlabeled attachment scores for different combinations of model components (fertility model, edge model, distance model, and subtree model). [sent-314, score-0.343]
88 From the perspective of the subtree model, which implements the reducibility feature, we can see that it is the most useful model here. [sent-323, score-0.729]
89 The distance model, which eliminates the possibility of attaching all words to one head word, is also very important. [sent-329, score-0.125]
90 If we omit (Footnote 13: These relatively high baseline scores are caused by the MST algorithm, which chooses the most frequent edges from random trees, i. [sent-330, score-0.156]
91 However, the numbers of reducible words in the CoNLL training sets were very low (at most 50 words in the CoNLL 2006 training data and at most 10 words in the CoNLL 2007 training data). [sent-339, score-0.232]
92 This led to completely unreliable reducibility scores and the consequent poor results. [sent-340, score-0.69]
93 7 Conclusions and Future Work We have shown that employing the reducibility feature is useful in the unsupervised dependency parsing task. [sent-341, score-0.843]
94 We extracted the n-gram reducibility scores from a large corpus, and then performed the computationally demanding inference on smaller data using only these scores. [sent-342, score-0.721]
95 We evaluated our parser on 18 languages included in CoNLL and for 14 of them, we achieved higher attachment scores than previously published results. [sent-343, score-0.26]
96 Most errors were caused by function words, which sometimes take over the dependents of adjacent content words. [sent-344, score-0.165]
97 This can be caused by the fact that the reducibility cannot handle function words correctly, because they must be reduced together with a content word, not one after another. [sent-345, score-0.68]
98 Furthermore, we would like to get rid of manually designed PoS tags and use some kind of unsupervised clusters in order to have all the annotation process completely unsupervised. [sent-347, score-0.118]
99 Software The source code of our unsupervised dependency parser, including the script for computing reducibility scores from large corpora, is available at http://ufal. [sent-350, score-0.892]
100 Improving unsupervised dependency parsing with richer contexts and smoothing. [sent-411, score-0.2]
wordName wordTfidf (topN-words)
[('reducibility', 0.643), ('fertility', 0.43), ('reducible', 0.232), ('abokrtsk', 0.143), ('dependents', 0.128), ('conll', 0.112), ('attachment', 0.103), ('spitkovsky', 0.091), ('dependency', 0.09), ('subtree', 0.086), ('mare', 0.077), ('fi', 0.072), ('unsupervised', 0.072), ('zden', 0.071), ('edge', 0.071), ('sampling', 0.07), ('auxiliary', 0.064), ('projective', 0.064), ('gg', 0.061), ('treebanks', 0.059), ('children', 0.059), ('head', 0.058), ('attach', 0.057), ('dmv', 0.056), ('desc', 0.054), ('slovenian', 0.054), ('ek', 0.052), ('ti', 0.049), ('prepositions', 0.048), ('attached', 0.048), ('pos', 0.047), ('scores', 0.047), ('german', 0.047), ('tags', 0.046), ('went', 0.046), ('conl', 0.046), ('gilks', 0.046), ('numerals', 0.046), ('verbs', 0.045), ('languages', 0.043), ('tj', 0.043), ('park', 0.043), ('alshawi', 0.042), ('hiyan', 0.042), ('hyperparameter', 0.041), ('mst', 0.041), ('dog', 0.041), ('treebank', 0.041), ('trees', 0.041), ('parser', 0.04), ('change', 0.039), ('extended', 0.039), ('gillenwater', 0.038), ('parsing', 0.038), ('punctuation', 0.037), ('caused', 0.037), ('tag', 0.037), ('headden', 0.036), ('valence', 0.036), ('distance', 0.036), ('cuni', 0.036), ('dependants', 0.036), ('evg', 0.036), ('lopatkov', 0.036), ('majli', 0.036), ('popel', 0.036), ('ppgg', 0.036), ('tectomt', 0.036), ('tnt', 0.036), ('stop', 0.035), ('hyperparameters', 0.034), ('adverbs', 0.034), ('conjunctions', 0.034), ('dj', 0.034), ('valentin', 0.034), ('gibbs', 0.032), ('possibility', 0.031), ('edges', 0.031), ('node', 0.031), ('demanding', 0.031), ('dependant', 0.031), ('gerdes', 0.031), ('configuration', 0.03), ('preposition', 0.03), ('ubler', 0.03), ('wi', 0.029), ('direction', 0.029), ('buchholz', 0.029), ('stroudsburg', 0.028), ('shared', 0.028), ('chu', 0.028), ('bracket', 0.028), ('pretrained', 0.028), ('switched', 0.028), ('parent', 0.027), ('verb', 0.027), ('published', 0.027), ('dt', 0.027), ('nn', 0.027), ('dependent', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 46 emnlp-2012-Exploiting Reducibility in Unsupervised Dependency Parsing
Author: David Marecek ; Zdenek Zabokrtsky
Abstract: The possibility of deleting a word from a sentence without violating its syntactic correctness belongs to traditionally known manifestations of syntactic dependency. We introduce a novel unsupervised parsing approach that is based on a new n-gram reducibility measure. We perform experiments across 18 languages available in CoNLL data and we show that our approach achieves better accuracy for the majority of the languages than previously reported results.
2 0.1531741 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction
Author: Valentin I. Spitkovsky ; Hiyan Alshawi ; Daniel Jurafsky
Abstract: We present a new family of models for unsupervised parsing, Dependency and Boundary models, that use cues at constituent boundaries to inform head-outward dependency tree generation. We build on three intuitions that are explicit in phrase-structure grammars but only implicit in standard dependency formulations: (i) Distributions of words that occur at sentence boundaries such as English determiners resemble constituent edges. (ii) Punctuation at sentence boundaries further helps distinguish full sentences from fragments like headlines and titles, allowing us to model grammatical differences between complete and incomplete sentences. (iii) Sentence-internal punctuation boundaries help with longer-distance dependencies, since punctuation correlates with constituent edges. Our models induce state-of-the-art dependency grammars for many languages without — — special knowledge of optimal input sentence lengths or biased, manually-tuned initializers.
3 0.12200933 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon
Author: Greg Durrett ; Adam Pauls ; Dan Klein
Abstract: We consider the problem of using a bilingual dictionary to transfer lexico-syntactic information from a resource-rich source language to a resource-poor target language. In contrast to past work that used bitexts to transfer analyses of specific sentences at the token level, we instead use features to transfer the behavior of words at a type level. In a discriminative dependency parsing framework, our approach produces gains across a range of target languages, using two different lowresource training methodologies (one weakly supervised and one indirectly supervised) and two different dictionary sources (one manually constructed and one automatically constructed).
Author: Bernd Bohnet ; Joakim Nivre
Abstract: Most current dependency parsers presuppose that input words have been morphologically disambiguated using a part-of-speech tagger before parsing begins. We present a transitionbased system for joint part-of-speech tagging and labeled dependency parsing with nonprojective trees. Experimental evaluation on Chinese, Czech, English and German shows consistent improvements in both tagging and parsing accuracy when compared to a pipeline system, which lead to improved state-of-theart results for all languages.
5 0.094744712 37 emnlp-2012-Dynamic Programming for Higher Order Parsing of Gap-Minding Trees
Author: Emily Pitler ; Sampath Kannan ; Mitchell Marcus
Abstract: We introduce gap inheritance, a new structural property on trees, which provides a way to quantify the degree to which intervals of descendants can be nested. Based on this property, two new classes of trees are derived that provide a closer approximation to the set of plausible natural language dependency trees than some alternative classes of trees: unlike projective trees, a word can have descendants in more than one interval; unlike spanning trees, these intervals cannot be nested in arbitrary ways. The 1-Inherit class of trees has exactly the same empirical coverage of natural language sentences as the class of mildly nonprojective trees, yet the optimal scoring tree can be found in an order of magnitude less time. Gap-minding trees (the second class) have the property that all edges into an interval of descendants come from the same node, and thus an algorithm which uses only single in- tervals can produce trees in which a node has descendants in multiple intervals.
6 0.087843738 105 emnlp-2012-Parser Showdown at the Wall Street Corral: An Empirical Investigation of Error Types in Parser Output
7 0.086632378 81 emnlp-2012-Learning to Map into a Universal POS Tagset
8 0.071422063 131 emnlp-2012-Unified Dependency Parsing of Chinese Morphological and Syntactic Structures
9 0.071275242 57 emnlp-2012-Generalized Higher-Order Dependency Parsing with Cube Pruning
10 0.069553889 130 emnlp-2012-Unambiguity Regularization for Unsupervised Learning of Probabilistic Grammars
11 0.068238705 59 emnlp-2012-Generating Non-Projective Word Order in Statistical Linearization
12 0.064284407 70 emnlp-2012-Joint Chinese Word Segmentation, POS Tagging and Parsing
13 0.061861996 64 emnlp-2012-Improved Parsing and POS Tagging Using Inter-Sentence Consistency Constraints
14 0.061329626 129 emnlp-2012-Type-Supervised Hidden Markov Models for Part-of-Speech Tagging with Incomplete Tag Dictionaries
15 0.060162012 43 emnlp-2012-Exact Sampling and Decoding in High-Order Hidden Markov Models
16 0.058105297 55 emnlp-2012-Forest Reranking through Subtree Ranking
17 0.057365905 106 emnlp-2012-Part-of-Speech Tagging for Chinese-English Mixed Texts with Dynamic Features
18 0.05538309 66 emnlp-2012-Improving Transition-Based Dependency Parsing with Buffer Transitions
19 0.052224703 79 emnlp-2012-Learning Syntactic Categories Using Paradigmatic Representations of Word Context
20 0.04936979 138 emnlp-2012-Wiki-ly Supervised Part-of-Speech Tagging
topicId topicWeight
[(0, 0.205), (1, -0.109), (2, 0.143), (3, -0.049), (4, 0.042), (5, 0.07), (6, -0.011), (7, 0.014), (8, -0.0), (9, 0.091), (10, 0.111), (11, 0.061), (12, 0.071), (13, 0.09), (14, -0.072), (15, -0.065), (16, -0.0), (17, -0.045), (18, -0.022), (19, 0.012), (20, 0.012), (21, -0.093), (22, -0.025), (23, 0.123), (24, 0.005), (25, 0.098), (26, -0.03), (27, 0.106), (28, -0.054), (29, 0.039), (30, -0.003), (31, -0.112), (32, -0.117), (33, 0.05), (34, 0.074), (35, -0.163), (36, -0.073), (37, 0.312), (38, -0.181), (39, 0.099), (40, -0.026), (41, 0.047), (42, 0.058), (43, 0.063), (44, 0.067), (45, 0.118), (46, 0.106), (47, 0.076), (48, -0.017), (49, 0.113)]
simIndex simValue paperId paperTitle
same-paper 1 0.89751422 46 emnlp-2012-Exploiting Reducibility in Unsupervised Dependency Parsing
Author: David Marecek ; Zdenek Zabokrtsky
Abstract: The possibility of deleting a word from a sentence without violating its syntactic correctness belongs to traditionally known manifestations of syntactic dependency. We introduce a novel unsupervised parsing approach that is based on a new n-gram reducibility measure. We perform experiments across 18 languages available in CoNLL data and we show that our approach achieves better accuracy for the majority of the languages than previously reported results.
2 0.72268444 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction
Author: Valentin I. Spitkovsky ; Hiyan Alshawi ; Daniel Jurafsky
Abstract: We present a new family of models for unsupervised parsing, Dependency and Boundary models, that use cues at constituent boundaries to inform head-outward dependency tree generation. We build on three intuitions that are explicit in phrase-structure grammars but only implicit in standard dependency formulations: (i) Distributions of words that occur at sentence boundaries such as English determiners resemble constituent edges. (ii) Punctuation at sentence boundaries further helps distinguish full sentences from fragments like headlines and titles, allowing us to model grammatical differences between complete and incomplete sentences. (iii) Sentence-internal punctuation boundaries help with longer-distance dependencies, since punctuation correlates with constituent edges. Our models induce state-of-the-art dependency grammars for many languages without — — special knowledge of optimal input sentence lengths or biased, manually-tuned initializers.
3 0.51224542 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon
Author: Greg Durrett ; Adam Pauls ; Dan Klein
Abstract: We consider the problem of using a bilingual dictionary to transfer lexico-syntactic information from a resource-rich source language to a resource-poor target language. In contrast to past work that used bitexts to transfer analyses of specific sentences at the token level, we instead use features to transfer the behavior of words at a type level. In a discriminative dependency parsing framework, our approach produces gains across a range of target languages, using two different lowresource training methodologies (one weakly supervised and one indirectly supervised) and two different dictionary sources (one manually constructed and one automatically constructed).
4 0.43592644 37 emnlp-2012-Dynamic Programming for Higher Order Parsing of Gap-Minding Trees
Author: Emily Pitler ; Sampath Kannan ; Mitchell Marcus
Abstract: We introduce gap inheritance, a new structural property on trees, which provides a way to quantify the degree to which intervals of descendants can be nested. Based on this property, two new classes of trees are derived that provide a closer approximation to the set of plausible natural language dependency trees than some alternative classes of trees: unlike projective trees, a word can have descendants in more than one interval; unlike spanning trees, these intervals cannot be nested in arbitrary ways. The 1-Inherit class of trees has exactly the same empirical coverage of natural language sentences as the class of mildly nonprojective trees, yet the optimal scoring tree can be found in an order of magnitude less time. Gap-minding trees (the second class) have the property that all edges into an interval of descendants come from the same node, and thus an algorithm which uses only single in- tervals can produce trees in which a node has descendants in multiple intervals.
5 0.40031818 130 emnlp-2012-Unambiguity Regularization for Unsupervised Learning of Probabilistic Grammars
Author: Kewei Tu ; Vasant Honavar
Abstract: We introduce a novel approach named unambiguity regularization for unsupervised learning of probabilistic natural language grammars. The approach is based on the observation that natural language is remarkably unambiguous in the sense that only a tiny portion of the large number of possible parses of a natural language sentence are syntactically valid. We incorporate an inductive bias into grammar learning in favor of grammars that lead to unambiguous parses on natural language sentences. The resulting family of algorithms includes the expectation-maximization algorithm (EM) and its variant, Viterbi EM, as well as a so-called softmax-EM algorithm. The softmax-EM algorithm can be implemented with a simple and computationally efficient extension to standard EM. In our experiments of unsupervised dependency grammar learn- ing, we show that unambiguity regularization is beneficial to learning, and in combination with annealing (of the regularization strength) and sparsity priors it leads to improvement over the current state of the art.
6 0.39130455 59 emnlp-2012-Generating Non-Projective Word Order in Statistical Linearization
7 0.38221416 79 emnlp-2012-Learning Syntactic Categories Using Paradigmatic Representations of Word Context
9 0.34187442 75 emnlp-2012-Large Scale Decipherment for Out-of-Domain Machine Translation
10 0.33232003 81 emnlp-2012-Learning to Map into a Universal POS Tagset
11 0.32153267 121 emnlp-2012-Supervised Text-based Geolocation Using Language Models on an Adaptive Grid
12 0.30460849 88 emnlp-2012-Minimal Dependency Length in Realization Ranking
13 0.30292571 57 emnlp-2012-Generalized Higher-Order Dependency Parsing with Cube Pruning
14 0.28582004 55 emnlp-2012-Forest Reranking through Subtree Ranking
15 0.27595681 43 emnlp-2012-Exact Sampling and Decoding in High-Order Hidden Markov Models
16 0.2745986 7 emnlp-2012-A Novel Discriminative Framework for Sentence-Level Discourse Analysis
17 0.27271226 45 emnlp-2012-Exploiting Chunk-level Features to Improve Phrase Chunking
18 0.26837805 26 emnlp-2012-Building a Lightweight Semantic Model for Unsupervised Information Extraction on Short Listings
19 0.26793122 96 emnlp-2012-Name Phylogeny: A Generative Model of String Variation
20 0.26421407 29 emnlp-2012-Concurrent Acquisition of Word Meaning and Lexical Categories
topicId topicWeight
[(2, 0.335), (11, 0.026), (16, 0.043), (25, 0.017), (29, 0.018), (34, 0.078), (39, 0.016), (45, 0.015), (60, 0.069), (63, 0.054), (64, 0.028), (65, 0.027), (70, 0.017), (73, 0.014), (74, 0.055), (76, 0.048), (80, 0.012), (86, 0.025), (95, 0.013)]
simIndex simValue paperId paperTitle
1 0.91750878 60 emnlp-2012-Generative Goal-Driven User Simulation for Dialog Management
Author: Aciel Eshky ; Ben Allison ; Mark Steedman
Abstract: User simulation is frequently used to train statistical dialog managers for task-oriented domains. At present, goal-driven simulators (those that have a persistent notion of what they wish to achieve in the dialog) require some task-specific engineering, making them impossible to evaluate intrinsically. Instead, they have been evaluated extrinsically by means of the dialog managers they are intended to train, leading to circularity of argument. In this paper, we propose the first fully generative goal-driven simulator that is fully induced from data, without hand-crafting or goal annotation. Our goals are latent, and take the form of topics in a topic model, clustering together semantically equivalent and phonetically confusable strings, implicitly modelling synonymy and speech recognition noise. We evaluate on two standard dialog resources, the Communicator and Let’s Go datasets, and demonstrate that our model has substantially better fit to held out data than competing approaches. We also show that features derived from our model allow significantly greater improvement over a baseline at distinguishing real from randomly permuted dialogs.
2 0.88183975 62 emnlp-2012-Identifying Constant and Unique Relations by using Time-Series Text
Author: Yohei Takaku ; Nobuhiro Kaji ; Naoki Yoshinaga ; Masashi Toyoda
Abstract: Because the real world evolves over time, numerous relations between entities written in presently available texts are already obsolete or will potentially evolve in the future. This study aims at resolving the intricacy in consistently compiling relations extracted from text, and presents a method for identifying constancy and uniqueness of the relations in the context of supervised learning. We exploit massive time-series web texts to induce features on the basis of time-series frequency and linguistic cues. Experimental results confirmed that the time-series frequency distributions contributed much to the recall of constancy identification and the precision of the uniqueness identification.
same-paper 3 0.81475675 46 emnlp-2012-Exploiting Reducibility in Unsupervised Dependency Parsing
Author: David Marecek ; Zdenek Zabokrtsky
Abstract: The possibility of deleting a word from a sentence without violating its syntactic correctness belongs to traditionally known manifestations of syntactic dependency. We introduce a novel unsupervised parsing approach that is based on a new n-gram reducibility measure. We perform experiments across 18 languages available in CoNLL data and we show that our approach achieves better accuracy for the majority of the languages than previously reported results.
4 0.48164874 77 emnlp-2012-Learning Constraints for Consistent Timeline Extraction
Author: David McClosky ; Christopher D. Manning
Abstract: We present a distantly supervised system for extracting the temporal bounds of fluents (relations which only hold during certain times, such as attends school). Unlike previous pipelined approaches, our model does not assume independence between each fluent or even between named entities with known connections (parent, spouse, employer, etc.). Instead, we model what makes timelines of fluents consistent by learning cross-fluent constraints, potentially spanning entities as well. For example, our model learns that someone is unlikely to start a job at age two or to marry someone who hasn’t been born yet. Our system achieves a 36% error reduction over a pipelined baseline.
5 0.47580761 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction
Author: Valentin I. Spitkovsky ; Hiyan Alshawi ; Daniel Jurafsky
Abstract: We present a new family of models for unsupervised parsing, Dependency and Boundary models, that use cues at constituent boundaries to inform head-outward dependency tree generation. We build on three intuitions that are explicit in phrase-structure grammars but only implicit in standard dependency formulations: (i) Distributions of words that occur at sentence boundaries such as English determiners resemble constituent edges. (ii) Punctuation at sentence boundaries further helps distinguish full sentences from fragments like headlines and titles, allowing us to model grammatical differences between complete and incomplete sentences. (iii) Sentence-internal punctuation boundaries help with longer-distance dependencies, since punctuation correlates with constituent edges. Our models induce state-of-the-art dependency grammars for many languages without — — special knowledge of optimal input sentence lengths or biased, manually-tuned initializers.
6 0.46824387 120 emnlp-2012-Streaming Analysis of Discourse Participants
7 0.46646053 23 emnlp-2012-Besting the Quiz Master: Crowdsourcing Incremental Classification Games
8 0.46537763 81 emnlp-2012-Learning to Map into a Universal POS Tagset
9 0.46071294 72 emnlp-2012-Joint Inference for Event Timeline Construction
10 0.46025997 114 emnlp-2012-Revisiting the Predictability of Language: Response Completion in Social Media
11 0.45897245 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon
12 0.45784724 66 emnlp-2012-Improving Transition-Based Dependency Parsing with Buffer Transitions
13 0.45335045 89 emnlp-2012-Mixed Membership Markov Models for Unsupervised Conversation Modeling
14 0.45268321 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers
15 0.4514029 8 emnlp-2012-A Phrase-Discovering Topic Model Using Hierarchical Pitman-Yor Processes
16 0.45137808 130 emnlp-2012-Unambiguity Regularization for Unsupervised Learning of Probabilistic Grammars
17 0.44910488 18 emnlp-2012-An Empirical Investigation of Statistical Significance in NLP
18 0.44403002 71 emnlp-2012-Joint Entity and Event Coreference Resolution across Documents
19 0.44200194 12 emnlp-2012-A Transition-Based System for Joint Part-of-Speech Tagging and Labeled Non-Projective Dependency Parsing
20 0.4411734 14 emnlp-2012-A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts