emnlp emnlp2012 emnlp2012-130 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Kewei Tu ; Vasant Honavar
Abstract: We introduce a novel approach named unambiguity regularization for unsupervised learning of probabilistic natural language grammars. The approach is based on the observation that natural language is remarkably unambiguous in the sense that only a tiny portion of the large number of possible parses of a natural language sentence are syntactically valid. We incorporate an inductive bias into grammar learning in favor of grammars that lead to unambiguous parses on natural language sentences. The resulting family of algorithms includes the expectation-maximization algorithm (EM) and its variant, Viterbi EM, as well as a so-called softmax-EM algorithm. The softmax-EM algorithm can be implemented with a simple and computationally efficient extension to standard EM. In our experiments of unsupervised dependency grammar learning, we show that unambiguity regularization is beneficial to learning, and in combination with annealing (of the regularization strength) and sparsity priors it leads to improvement over the current state of the art.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We introduce a novel approach named unambiguity regularization for unsupervised learning of probabilistic natural language grammars. [sent-2, score-0.884]
2 The approach is based on the observation that natural language is remarkably unambiguous in the sense that only a tiny portion of the large number of possible parses of a natural language sentence are syntactically valid. [sent-3, score-0.335]
3 We incorporate an inductive bias into grammar learning in favor of grammars that lead to unambiguous parses on natural language sentences. [sent-4, score-0.726]
4 In our experiments of unsupervised dependency grammar learning, we show that unambiguity regularization is beneficial to learning, and in combination with annealing (of the regularization strength) and sparsity priors it leads to improvement over the current state of the art. [sent-7, score-1.71]
5 Because of the high cost of manual sentence annotation, there is substantial interest in unsupervised grammar learning, i.e. [sent-9, score-0.273]
6 , the induction of a grammar from a corpus of unannotated sentences. [sent-11, score-0.223]
7 Most existing approaches search for the grammar that maximizes the likelihood of the grammar given the training data, typically using expectation-maximization (EM) (Baker, 1979; Lari and Young, 1990; Klein and Manning, 2004). [sent-15, score-0.223]
8 More recent approaches incorporate additional prior information about the target grammar into learning. [sent-16, score-0.254]
9 (2007) used Dirichlet priors with hyperparameters set to values less than 1 to encourage sparsity of grammar rules. [sent-19, score-0.357]
10 (2008) and Cohen and Smith (2009) employed the logistic normal prior to model the correlations between grammar symbols. [sent-24, score-0.292]
11 Gillenwater et al. (2010) incorporated a sparsity bias on grammar rules into learning by means of posterior regularization. [sent-26, score-0.37]
12 Against this background, we propose the use of a novel type of prior information for unsupervised learning of probabilistic natural language grammars, namely the syntactic unambiguity of natural language. [sent-35, score-0.646]
13 Although it is often possible to correctly parse a natural language sentence in more than one way, natural language is remarkably unambiguous in the sense that the number of plausible parses of a natural language sentence is rather small in comparison with the total number of possible parses. [sent-36, score-0.335]
14 Thus, we incorporate into learning an inductive bias in favor of grammars that lead to unambiguous parses on natural language sentences, by using the posterior regularization framework (Ganchev et al., 2010). [sent-37, score-0.827]
15 The fact that Viterbi EM is a special case of our approach also explains the advantage of Viterbi EM observed in previous work: Viterbi EM implicitly applies unambiguity regularization. [sent-42, score-0.565]
16 In our experiments of unsupervised dependency grammar learning, we show that unambiguity regularization is beneficial to learning, and in combination with annealing (of the regularization strength) and sparsity priors it leads to improvement over the current state of the art. [sent-43, score-1.71]
17 When applied to unsupervised grammar learning, DA has been shown to lead to worse parsing accuracy than standard EM (Smith and Eisner, 2004); in contrast, we show that our approach leads to significantly higher parsing accuracy than standard EM in unsupervised dependency grammar learning. [sent-46, score-0.413]
18 Section 2 analyzes the degree of unambiguity of natural language grammars. [sent-48, score-0.565]
19 Section 3 introduces the unambiguity regularization approach and shows that standard EM, Viterbi EM and softmax-EM are its special cases. [sent-49, score-0.834]
20 2 The (Un)ambiguity of Natural Language Grammars A grammar is said to be ambiguous on a sentence if the sentence can be parsed in more than one way by the grammar. [sent-51, score-0.275]
21 Using the Berkeley parser (Petrov et al., 2006), one of the state-of-the-art English language parsers, we find many alternative parses in addition to the parses shown in (Manning and Schütze, 1999). [sent-55, score-0.25]
22 Indeed, with a probabilistic context-free grammar of only 26 nonterminals (as used in the Berkeley parser), the estimated total number of possible parses of the example sentence is 2 × 10^37. [sent-56, score-0.258]
23 We can see that most of the parses have probabilities that are negligible compared with the probability of the best parse. [sent-59, score-0.208]
24 Quantitatively, we find that the probabilities of the parses decrease roughly exponentially as we go from the best parse to the less likely parses. [sent-62, score-0.208]
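As a rough numerical illustration of this exponential decay (the decay rate below is a made-up placeholder, not a figure from the paper), a short sketch of how quickly the probability mass concentrates on the top parses:

```python
# Hypothetical illustration: parse probabilities that decay geometrically by a
# factor r from the best parse downward (r is a placeholder, not measured data).
r = 0.5
num_parses = 10 ** 6                    # stand-in for the huge space of possible parses
norm = (1 - r ** num_parses) / (1 - r)  # geometric series normalizer
top_10_mass = sum(r ** k for k in range(10)) / norm
print(f"share of probability mass on the 10 best parses: {top_10_mass:.6f}")
# With r = 0.5, the ten best parses already hold about 99.9% of the mass,
# mirroring the concentration effect described in the surrounding sentences.
```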
25 For a natural language sentence, the probability mass of the parses is concentrated in a tiny portion of all possible parses. [sent-67, score-0.222]
26 To highlight the unambiguity of natural language grammars, here we compare the parse probabilities shown in Figure 1 with the parse probabilities produced by two other probabilistic context-free grammars. [sent-69, score-0.731]
27 The random grammar has a similar number of nonterminals as in the Berkeley parser, and its grammar rule probabilities are sampled from a uniform distribution and then normalized. [sent-71, score-0.559]
28 It can be seen that unlike the natural language grammar, the random grammar produces a very uniform probability distribution over parses. [sent-72, score-0.223]
29 Figure 2(b) shows the probabilities of the 100 best parses of the example sentence produced by a maximum-likelihood grammar learned from the unannotated Wall Street Journal corpus of the Penn Treebank using the EM algorithm. [sent-73, score-0.425]
30 This suggests that both the random grammar and the maximum-likelihood grammar are far more ambiguous on natural language sentences than true natural language grammars. [sent-76, score-0.498]
31 3 Learning with Unambiguity Regularization Motivated by the preceding observation, we want to incorporate into learning an inductive bias in favor of grammars that are unambiguous on natural language sentences. [sent-77, score-0.378]
32 First of all, we need a precise definition of the ambiguity of a grammar on a sentence. [sent-78, score-0.264]
33 Assume a grammar with a fixed set of grammar rules and let θ be the rule probabilities. [sent-79, score-0.478]
34 One natural measure of the ambiguity is the information entropy of z conditioned on x and θ: H(z|x, θ) = −∑z pθ(z|x) log pθ(z|x). The lower the entropy, the less ambiguous the grammar is on sentence x. [sent-81, score-0.382]
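For concreteness, a minimal sketch of this ambiguity measure for a sentence whose parses can be enumerated; the parse posteriors below are hypothetical placeholders, and a real grammar would compute this entropy from inside-outside quantities rather than by enumeration:

```python
import math

def parse_entropy(parse_probs):
    """Entropy H(z|x, theta) of a (normalized) distribution over the parses of one sentence."""
    total = sum(parse_probs)
    return -sum((p / total) * math.log(p / total) for p in parse_probs if p > 0)

# Hypothetical posteriors p_theta(z|x) over three candidate parses of one sentence.
ambiguous_grammar = [0.40, 0.35, 0.25]      # high entropy: the grammar is ambiguous on x
unambiguous_grammar = [0.98, 0.015, 0.005]  # low entropy: close to strictly unambiguous
print(parse_entropy(ambiguous_grammar), parse_entropy(unambiguous_grammar))
```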
35 When the entropy reaches 0, the grammar is strictly unambiguous on sentence x, i.e., all of the probability mass falls on a single parse of x. [sent-82, score-0.425]
36 Now we need to modify the objective function of grammar learning to favor low ambiguity of the learned grammar in parsing natural language sentences. [sent-85, score-0.649]
37 Since the likelihood term in the objective function would ensure that the learned grammar will have high probability of generating natural language sentences, combining the likelihood and the prior would lead to low ambiguity of the learned grammar on natural language sentences. [sent-87, score-0.658]
38 Hence, here we adopt an alternative approach using the posterior regularization framework (Ganchev et al., 2010). [sent-89, score-0.324]
39 Posterior regularization biases learning in favor of solutions with desired behavior by constraining the model posteriors on the unlabeled data. [sent-91, score-0.328]
40 In our case, we use the constraint that the probability distributions on the parses of the training sentences given the learned grammar must have low entropy, which is equivalent to requiring the learned grammar to have low ambiguity on the training sentences. [sent-92, score-0.674]
41 (Figure 2: The probabilities of the 100 best parses of the example sentence produced by (a) a random grammar and (b) a maximum-likelihood grammar learned by the EM algorithm.) Let X = {x1, . . . , xn} denote the set of training sentences, Z = {z1, . . . , zn} denote the corresponding set [sent-99, score-0.648]
42 of parses of the training sentences, and θ denote the rule probabilities of the grammar. [sent-100, score-0.203]
43 We use the slack-penalized version of the posterior regularization objective function: J(θ) = log p(θ|X) − min_{q,ξ} (KL(q(Z)||pθ(Z|X)) + σ ∑i ξi) s.t. [sent-101, score-0.373]
44 ∀i, H(zi) = −∑zi q(zi) log q(zi) ≤ ξi, where σ is a nonnegative constant that controls the strength of the regularization term; q is an auxiliary distribution such that q(Z) = ∏i q(zi). [sent-103, score-0.322]
45 The first term in the objective function is the log posterior probability of the grammar parameters given the training corpus, and the second term minimizes the KL-divergence between the auxiliary distribution q and the posterior distribution on Z while constraining q to have low entropy. [sent-104, score-0.465]
46 q∗ = argmax_q F(θ, q) = argmin_q (KL(q(Z)||pθ(Z|X)) + σ ∑i H(zi)). This differs from the E-step of the EM algorithm in that it contains an additional regularization term σ ∑i H(zi). [sent-109, score-0.298]
47 fi(q) is strictly convex on the unit simplex ∆ when 0 < σ < 1. [sent-122, score-0.227]
48 For any two points q1, q2 in the unit simplex ∆ and any t ∈ (0, 1), we can show that tfi(q1) + (1 − t)fi(q2) − fi(tq1 + (1 − t)q2) = (1 − σ) ∑zi [t g(q1(zi)) + (1 − t) g(q2(zi)) − g(tq1(zi) + (1 − t)q2(zi))]. It is easy to prove that g(x) is strictly convex on the interval [0, 1]. [sent-126, score-0.791]
49 To compute q∗, note that pθ(zi|xi) is the product of a set of grammar rule probabilities, so we can raise all the rule probabilities of the grammar to the power of 1/(1 − σ) and then run the normal E-step of the EM algorithm. [sent-129, score-0.594]
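A minimal sketch of the resulting E-step on a toy sentence whose parses are enumerated, assuming 0 ≤ σ < 1; raising the parse probabilities to the power 1/(1 − σ) and renormalizing is the per-sentence effect of exponentiating the rule probabilities before a standard E-step. The numbers are placeholders.

```python
import math

def softmax_em_estep(parse_logprobs, sigma):
    """q*(z) proportional to p_theta(z|x)^(1/(1 - sigma)) for one sentence, with 0 <= sigma < 1."""
    scale = 1.0 / (1.0 - sigma)
    scaled = [scale * lp for lp in parse_logprobs]
    m = max(scaled)                                 # subtract the max for numerical stability
    exp_scaled = [math.exp(s - m) for s in scaled]
    z = sum(exp_scaled)
    return [e / z for e in exp_scaled]

# Hypothetical posteriors of three parses of one sentence.
logps = [math.log(0.5), math.log(0.3), math.log(0.2)]
print(softmax_em_estep(logps, sigma=0.0))   # sigma = 0: the standard EM posterior
print(softmax_em_estep(logps, sigma=0.5))   # sigma = 0.5: a sharper, less ambiguous posterior
```

As σ approaches 1, the distribution approaches a one-hot vector on the best parse, which matches the Viterbi-style E-step discussed next.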
50 fi(q) is strictly concave on the unit simplex ∆ when σ > 1. [sent-134, score-0.24]
51 The minimum of fi(q) is attained at a vertex of the unit simplex ∆. [sent-137, score-0.261]
52 Assume the minimum of fi(q) is attained at a point q∗ that is not a vertex of the unit simplex ∆, so there are at least two assignments of zi, say z1 and z2, such that q∗(z1) and q∗(z2) are nonzero. [sent-139, score-0.261]
53 According to Theorem 2, fi(q) is strictly concave on the unit simplex ∆, so we have fi(q∗) > tfi(q′) + (1 − t)fi (q′′). [sent-145, score-0.24]
54 So the minimum is attained at q∗(zi) = 1 if zi = argmax_{zi} pθ(zi|xi), and q∗(zi) = 0 otherwise. It can be seen that the minimum in the case of σ > 1 is attained at the same point as in the case of σ = 1, at which all the probability mass is assigned to the best parse of the sentence. [sent-151, score-0.222]
55 With this q∗, the objective function becomes F(θ, q∗) = ∑i log p(zi∗, xi|θ) + log p(θ) − log p(X). The first term is the sum of the log probabilities of the best parses of the corpus. [sent-154, score-0.274]
56 Summary Our unambiguity regularization approach is an extension of the EM algorithm. [sent-156, score-0.834]
57 A larger value of σ corresponds to a stronger bias in favor of an unambiguous grammar. [sent-158, score-0.266]
58 The softmax function can be computed by simply exponentiating the grammar rule probabilities before the standard E-step, which does not increase the time complexity of the E-step. [sent-163, score-0.361]
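In implementation terms, a sketch of that trick: exponentiate every rule probability by 1/(1 − σ) and hand the modified (unnormalized) table to an unchanged E-step routine. The toy rule table below is hypothetical and only illustrative of a dependency-style rule inventory.

```python
def exponentiate_rules(rule_probs, sigma):
    """Raise every rule probability to 1/(1 - sigma) before the standard E-step (0 <= sigma < 1).
    The resulting table is unnormalized, which is fine because the E-step normalizes over parses."""
    power = 1.0 / (1.0 - sigma)
    return {rule: prob ** power for rule, prob in rule_probs.items()}

# Toy rule table: (head POS, argument POS, direction) -> probability.
toy_rules = {("VBD", "NN", "left"): 0.6, ("VBD", "IN", "right"): 0.3, ("NN", "DT", "left"): 0.9}
print(exponentiate_rules(toy_rules, sigma=0.25))
```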
59 1 Annealing the Strength of Regularization In unsupervised learning of probabilistic grammars, the initial grammar is typically very ambiguous. [sent-166, score-0.325]
60 On the other hand, natural language grammars do contain some degree of ambiguity, so if the value of σ is too large, then the learned grammar might be excessively unambiguous and thus not a good model of natural languages. [sent-170, score-0.524]
61 We therefore start learning with a large value of σ (e.g., σ = 1) to strongly push the learner away from the highly ambiguous initial grammar; then we gradually reduce the value of σ, possibly ending with σ = 0, to avoid inducing excessive unambiguity in the learned grammar. [sent-175, score-0.724]
62 Note that if the value of σ is annealed to 0, then our approach can be seen as providing an unambiguous initialization for standard EM. [sent-176, score-0.201]
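A sketch of one possible annealing schedule (a linear decay of σ from 1 to 0 across EM iterations); the paper advocates annealing but does not prescribe this exact schedule, so the decay shape and iteration count here are placeholders.

```python
def annealed_sigma(iteration, total_iterations, start=1.0, end=0.0):
    """Linearly decay the regularization strength sigma over training (one possible schedule)."""
    frac = iteration / max(1, total_iterations - 1)
    return start + frac * (end - start)

total_iterations = 10
for it in range(total_iterations):
    sigma = annealed_sigma(it, total_iterations)
    # In a full learner, this sigma would parameterize the E-step of the current EM
    # iteration (a Viterbi E-step when sigma >= 1, a softmax E-step with exponent
    # 1 / (1 - sigma) when sigma < 1), followed by the usual M-step.
    exponent = float("inf") if sigma >= 1.0 else 1.0 / (1.0 - sigma)
    print(f"iteration {it}: sigma = {sigma:.2f}, E-step exponent = {exponent:.2f}")
```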
63 Here we incorporate unambiguity regularization into mean-field variational inference. [sent-183, score-0.919]
64 The objective function with unambiguity regularization for mean-field variational inference is: F(q(θ), q(Z)) = log p(X) − (KL(q(θ)q(Z)||p(θ, Z|X)) + σ ∑i H(zi)), where ∀i, H(zi) = −∑zi q(zi) log q(zi). We can perform coordinate ascent that alternately optimizes q(θ) and q(Z). [sent-184, score-1.066]
65 Since the regularization term does not contain q(θ), the optimization of q(θ) is exactly the same as in the standard mean-field variational inference. [sent-185, score-0.383]
66 Note that if Dirichlet priors are used over grammar rule probabilities θ, then p̃(xi, zi) can be represented as the product of a set of weights in mean-field variational inference (Kurihara and Sato, 2004). [sent-187, score-0.385]
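A sketch of the standard mean-field weight computation for multinomial rule parameters under symmetric Dirichlet priors, with the unambiguity exponent applied on top; the counts and hyperparameter are hypothetical, this is not necessarily the paper's exact update, and NumPy/SciPy are assumed available.

```python
import numpy as np
from scipy.special import digamma

def mean_field_rule_weights(expected_counts, alpha):
    """Weights replacing rule probabilities in the mean-field E-step:
    exp(E_q[log theta_k]) under a symmetric Dirichlet prior with hyperparameter alpha."""
    posterior = expected_counts + alpha
    return np.exp(digamma(posterior) - digamma(posterior.sum()))

# Hypothetical expected counts for the rules sharing one left-hand-side symbol.
counts = np.array([4.0, 1.5, 0.5])
weights = mean_field_rule_weights(counts, alpha=0.25)

sigma = 0.5
regularized = weights ** (1.0 / (1.0 - sigma))  # unambiguity exponent applied before the E-step
print(weights, regularized)
```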
67 4 Experiments We tested the effectiveness of unambiguity regularization in unsupervised learning of a type of dependency grammar called the dependency model with valence (DMV) (Klein and Manning, 2004). [sent-189, score-1.195]
68 To evaluate a learned grammar, we used the grammar to parse the testing corpus and computed the dependency accuracy, which is the percentage of the dependencies that are correctly matched between the parses generated by the grammar and the gold-standard parses. [sent-203, score-0.683]
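A minimal sketch of this metric, directed dependency accuracy: the fraction of tokens whose predicted head index matches the gold head index. The toy head sequences below are placeholders.

```python
def dependency_accuracy(predicted_heads, gold_heads):
    """Fraction of tokens whose predicted head index matches the gold head index."""
    assert len(predicted_heads) == len(gold_heads)
    total = sum(len(sent) for sent in gold_heads)
    correct = sum(
        sum(p == g for p, g in zip(pred_sent, gold_sent))
        for pred_sent, gold_sent in zip(predicted_heads, gold_heads)
    )
    return correct / total

# Hypothetical head indices (0 = root) for two test sentences.
pred = [[0, 1, 1], [2, 0, 2, 3]]
gold = [[0, 1, 2], [2, 0, 2, 2]]
print(f"dependency accuracy: {dependency_accuracy(pred, gold):.2f}")
```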
69 It can be seen that learning with unambiguity regularization (i.e. [sent-215, score-0.834]
70 , with σ > 0) consistently outperforms learning without unambiguity regularization (i.e., with σ = 0). [sent-217, score-0.834]
71 The grammar learned by Viterbi EM has significantly higher dependency accuracy in parsing short sentences. [sent-220, score-0.321]
72 We speculate that this is because short sentences are less ambiguous and therefore a strong unambiguity regularization is especially helpful in learning the grammatical structures of short sentences. [sent-221, score-0.886]
73 Setting σ = 0.25 achieves the best dependency accuracy, which suggests that controlling the strength of unambiguity regularization can contribute to improved performance. [sent-223, score-0.906]
74 The results of this experiment (shown as “URAnnealing” in Table 2) suggest that annealing the value of σ not only helps circumvent the problem of choosing an optimal value of σ, but may also lead to substantial improvements over the results of learning using any fixed value of σ. [sent-237, score-0.234]
75 We added Dirichlet priors over grammar rule probabilities and ran the variational inference version of our approach. [sent-239, score-0.471]
76 A setting of 0.25 consistently boosted the dependency accuracy of the learned grammar by 1–2%. [sent-245, score-0.298]
77 It can be seen that our best result (unambiguity regularization with annealing and prior) clearly outperforms previous results. [sent-248, score-0.425]
78 3 Results on Extended Models It has been pointed out that the DMV model is very simplistic and cannot capture many linguistic phenomena; therefore a few extensions of DMV have been proposed, which achieve significant improvement over DMV in unsupervised grammar learning (Headden et al. [sent-251, score-0.273]
79 We examined the effect of unambiguity regularization on E-DMV, an extension of DMV (with two different settings: (2,2) and (3,3)) (Headden et al., 2009). [sent-253, score-0.834]
80 As shown in the second part of Table 2, unambiguity regularization with annealing on E-DMV achieves better dependency accuracies than the state-of-the-art approaches to unsupervised parsing with extended dependency models. [sent-256, score-1.151]
81 We speculate that the performance of unambiguity regularization can be further improved if applied to more advanced models like LexTSG-DMV (Blunsom and Cohn, 2010). [sent-259, score-0.834]
82 4 Results on More Languages We examined the effect of unambiguity regularization with the DMV model on the corpora of eight additional languages2. [sent-261, score-0.834]
83 It can be seen that learning with unambiguity regularization (i.e. [sent-263, score-0.834]
84 , with σ > 0) outperforms learning without unambiguity regularization (i.e., with σ = 0). [sent-265, score-0.834]
85 5 Related Work Deterministic annealing (DA) (Rose, 1998; Smith and Eisner, 2004) also extends the standard EM algorithm by exponentiating the posterior probabilities of the hidden variables in the E-step. [sent-274, score-0.286]
86 In contrast, the goal of unambiguity regularization is to bias learning in favor of unambiguous grammars, which is achieved by setting the exponent in the E-step (i.e., 1/(1 − σ)) to a value no smaller than one. [sent-276, score-1.116]
87 In contrast, the results of our experiments show that unambiguity regularization leads to significantly higher parsing accuracy than standard EM. [sent-283, score-0.857]
88 However, entropy regularization is either motivated by the theoretical result that unlabeled data samples are informative when classes are well separated (Grandvalet and Bengio, 2005), or derived from the expected conditional log-likelihood (Smith and Eisner, 2007). [sent-285, score-0.302]
89 In contrast, our approach is motivated by the observed unambiguity of natural language grammars. [sent-286, score-0.565]
90 One implication of this difference is that if our approach is applied to semi-supervised learning, the regularization term would be applied to labeled sentences as well (by ignoring the labels) because the target grammar shall be unambiguous on all the training sentences. [sent-287, score-0.659]
91 The sparsity bias, which favors a grammar with fewer grammar rules, has been widely used in unsupervised grammar learning (Chen, 1995; Johnson et al. [sent-288, score-0.768]
92 Although a sparser grammar is often less ambiguous, this is not always the case. [sent-291, score-0.223]
93 We have shown that unambiguity regularization could lead to better performance than approaches utilizing the sparsity bias, and that the two types of biases can be applied together for further improvement in the learning performance. [sent-292, score-0.883]
94 6 Conclusion We have introduced unambiguity regularization, a novel approach to unsupervised learning of probabilistic natural language grammars. [sent-293, score-0.615]
95 It is based on the observation that natural language grammars are remarkably unambiguous in the sense that in parsing natural language sentences they tend to concentrate the probability mass to a tiny portion of all possible parses. [sent-294, score-0.371]
96 By using posterior regularization, we incorporate an inductive bias into learning in favor of grammars that are unambiguous on natural language sentences. [sent-295, score-0.433]
97 In our experiments of unsupervised dependency grammar learning, we show that unambiguity regularization is beneficial to learning, and by incorporating regularization strength annealing and sparsity priors our approach outperforms the current state-of-the-art grammar learning algorithms. [sent-298, score-1.961]
98 For future work, we plan to combine unambiguity regularization with (Table 3, not reproduced here: per-language results for Arabic, Basque, Czech, Danish, Dutch, English, Portuguese, Slovene and Swedish under UR-A and different values of σ) [sent-299, score-0.834]
99 other types of priors and regularizations for unsupervised grammar learning, to apply it to more advanced grammar models, and to explore alternative formulations of unambiguity regularization. [sent-318, score-1.175]
100 Shared logistic normal distributions for soft parameter tying in unsupervised grammar induction. [sent-378, score-0.311]
wordName wordTfidf (topN-words)
[('unambiguity', 0.565), ('zi', 0.441), ('regularization', 0.269), ('grammar', 0.223), ('fi', 0.17), ('em', 0.168), ('annealing', 0.156), ('unambiguous', 0.138), ('parses', 0.125), ('simplex', 0.125), ('viterbi', 0.111), ('grammars', 0.106), ('logp', 0.093), ('dmv', 0.09), ('variational', 0.085), ('priors', 0.085), ('kl', 0.062), ('favor', 0.059), ('iowa', 0.058), ('tfi', 0.058), ('posterior', 0.055), ('ascent', 0.052), ('ambiguous', 0.052), ('unsupervised', 0.05), ('da', 0.05), ('sparsity', 0.049), ('xi', 0.049), ('objective', 0.049), ('ih', 0.049), ('dirichlet', 0.048), ('unit', 0.047), ('gillenwater', 0.046), ('coordinate', 0.046), ('probabilities', 0.046), ('dependency', 0.044), ('haji', 0.044), ('grandvalet', 0.043), ('bias', 0.043), ('pe', 0.042), ('exponent', 0.042), ('ambiguity', 0.041), ('theorem', 0.04), ('normal', 0.038), ('concave', 0.037), ('annealed', 0.037), ('tiny', 0.037), ('honavar', 0.037), ('ganchev', 0.037), ('attained', 0.037), ('parse', 0.037), ('nonterminals', 0.035), ('remarkably', 0.035), ('entropy', 0.033), ('mass', 0.032), ('rule', 0.032), ('smith', 0.032), ('inductive', 0.032), ('prior', 0.031), ('learned', 0.031), ('softmax', 0.031), ('strictly', 0.031), ('headden', 0.029), ('aduriz', 0.029), ('beek', 0.029), ('exponentiating', 0.029), ('exponentiation', 0.029), ('hzeir', 0.029), ('kewei', 0.029), ('kurihara', 0.029), ('lari', 0.029), ('logq', 0.029), ('regularizations', 0.029), ('saergmaxzip', 0.029), ('vasant', 0.029), ('term', 0.029), ('strength', 0.028), ('concentrated', 0.028), ('spitkovsky', 0.028), ('optimize', 0.028), ('minimum', 0.027), ('berkeley', 0.026), ('eisner', 0.026), ('value', 0.026), ('gradually', 0.025), ('vertex', 0.025), ('stationary', 0.025), ('sato', 0.025), ('nonnegative', 0.025), ('afonso', 0.025), ('erjavec', 0.025), ('basque', 0.025), ('tg', 0.025), ('excessive', 0.025), ('ioft', 0.025), ('treebank', 0.025), ('log', 0.025), ('convex', 0.024), ('manning', 0.023), ('rose', 0.023), ('parsing', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999964 130 emnlp-2012-Unambiguity Regularization for Unsupervised Learning of Probabilistic Grammars
Author: Kewei Tu ; Vasant Honavar
Abstract: We introduce a novel approach named unambiguity regularization for unsupervised learning of probabilistic natural language grammars. The approach is based on the observation that natural language is remarkably unambiguous in the sense that only a tiny portion of the large number of possible parses of a natural language sentence are syntactically valid. We incorporate an inductive bias into grammar learning in favor of grammars that lead to unambiguous parses on natural language sentences. The resulting family of algorithms includes the expectation-maximization algorithm (EM) and its variant, Viterbi EM, as well as a so-called softmax-EM algorithm. The softmax-EM algorithm can be implemented with a simple and computationally efficient extension to standard EM. In our experiments of unsupervised dependency grammar learning, we show that unambiguity regularization is beneficial to learning, and in combination with annealing (of the regularization strength) and sparsity priors it leads to improvement over the current state of the art.
2 0.13884684 8 emnlp-2012-A Phrase-Discovering Topic Model Using Hierarchical Pitman-Yor Processes
Author: Robert Lindsey ; William Headden ; Michael Stipicevic
Abstract: Topic models traditionally rely on the bagof-words assumption. In data mining applications, this often results in end-users being presented with inscrutable lists of topical unigrams, single words inferred as representative of their topics. In this article, we present a hierarchical generative probabilistic model of topical phrases. The model simultaneously infers the location, length, and topic of phrases within a corpus and relaxes the bagof-words assumption within phrases by using a hierarchy of Pitman-Yor processes. We use Markov chain Monte Carlo techniques for approximate inference in the model and perform slice sampling to learn its hyperparameters. We show via an experiment on human subjects that our model finds substantially better, more interpretable topical phrases than do competing models.
3 0.13525885 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction
Author: Valentin I. Spitkovsky ; Hiyan Alshawi ; Daniel Jurafsky
Abstract: We present a new family of models for unsupervised parsing, Dependency and Boundary models, that use cues at constituent boundaries to inform head-outward dependency tree generation. We build on three intuitions that are explicit in phrase-structure grammars but only implicit in standard dependency formulations: (i) Distributions of words that occur at sentence boundaries such as English determiners resemble constituent edges. (ii) Punctuation at sentence boundaries further helps distinguish full sentences from fragments like headlines and titles, allowing us to model grammatical differences between complete and incomplete sentences. (iii) Sentence-internal punctuation boundaries help with longer-distance dependencies, since punctuation correlates with constituent edges. Our models induce state-of-the-art dependency grammars for many languages without — — special knowledge of optimal input sentence lengths or biased, manually-tuned initializers.
4 0.10434116 126 emnlp-2012-Training Factored PCFGs with Expectation Propagation
Author: David Hall ; Dan Klein
Abstract: PCFGs can grow exponentially as additional annotations are added to an initially simple base grammar. We present an approach where multiple annotations coexist, but in a factored manner that avoids this combinatorial explosion. Our method works with linguisticallymotivated annotations, induced latent structure, lexicalization, or any mix of the three. We use a structured expectation propagation algorithm that makes use of the factored structure in two ways. First, by partitioning the factors, it speeds up parsing exponentially over the unfactored approach. Second, it minimizes the redundancy of the factors during training, improving accuracy over an independent approach. Using purely latent variable annotations, we can efficiently train and parse with up to 8 latent bits per symbol, achieving F1 scores up to 88.4 on the Penn Treebank while using two orders of magnitudes fewer parameters compared to the na¨ ıve approach. Combining latent, lexicalized, and unlexicalized anno- tations, our best parser gets 89.4 F1 on all sentences from section 23 of the Penn Treebank.
5 0.10407135 93 emnlp-2012-Multi-instance Multi-label Learning for Relation Extraction
Author: Mihai Surdeanu ; Julie Tibshirani ; Ramesh Nallapati ; Christopher D. Manning
Abstract: Distant supervision for relation extraction (RE) gathering training data by aligning a database of facts with text – is an efficient approach to scale RE to thousands of different relations. However, this introduces a challenging learning scenario where the relation expressed by a pair of entities found in a sentence is unknown. For example, a sentence containing Balzac and France may express BornIn or Died, an unknown relation, or no relation at all. Because of this, traditional supervised learning, which assumes that each example is explicitly mapped to a label, is not appropriate. We propose a novel approach to multi-instance multi-label learning for RE, which jointly models all the instances of a pair of entities in text and all their labels using a graphical model with latent variables. Our model performs competitively on two difficult domains. –
6 0.090875365 24 emnlp-2012-Biased Representation Learning for Domain Adaptation
7 0.089176208 81 emnlp-2012-Learning to Map into a Universal POS Tagset
8 0.08052361 1 emnlp-2012-A Bayesian Model for Learning SCFGs with Discontiguous Rules
9 0.080416121 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers
10 0.069845498 48 emnlp-2012-Exploring Adaptor Grammars for Native Language Identification
11 0.069553889 46 emnlp-2012-Exploiting Reducibility in Unsupervised Dependency Parsing
12 0.06937325 94 emnlp-2012-Multiple Aspect Summarization Using Integer Linear Programming
13 0.055813976 104 emnlp-2012-Parse, Price and Cut-Delayed Column and Row Generation for Graph Based Parsers
14 0.05391733 64 emnlp-2012-Improved Parsing and POS Tagging Using Inter-Sentence Consistency Constraints
15 0.052563537 79 emnlp-2012-Learning Syntactic Categories Using Paradigmatic Representations of Word Context
16 0.049464367 90 emnlp-2012-Modelling Sequential Text with an Adaptive Topic Model
17 0.048202816 119 emnlp-2012-Spectral Dependency Parsing with Latent Variables
18 0.047370039 57 emnlp-2012-Generalized Higher-Order Dependency Parsing with Cube Pruning
19 0.04564172 12 emnlp-2012-A Transition-Based System for Joint Part-of-Speech Tagging and Labeled Non-Projective Dependency Parsing
20 0.043631956 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon
topicId topicWeight
[(0, 0.174), (1, -0.049), (2, 0.103), (3, 0.005), (4, -0.091), (5, 0.067), (6, -0.03), (7, 0.027), (8, 0.021), (9, 0.085), (10, 0.07), (11, 0.196), (12, 0.052), (13, 0.01), (14, -0.209), (15, 0.018), (16, -0.075), (17, 0.063), (18, -0.125), (19, -0.166), (20, 0.069), (21, 0.041), (22, 0.053), (23, 0.019), (24, -0.19), (25, 0.006), (26, 0.012), (27, 0.297), (28, -0.069), (29, 0.103), (30, 0.195), (31, -0.093), (32, -0.027), (33, -0.001), (34, 0.046), (35, 0.098), (36, -0.072), (37, 0.032), (38, -0.071), (39, -0.033), (40, -0.071), (41, -0.143), (42, 0.036), (43, 0.14), (44, 0.049), (45, -0.095), (46, -0.038), (47, 0.004), (48, 0.025), (49, 0.004)]
simIndex simValue paperId paperTitle
same-paper 1 0.96497977 130 emnlp-2012-Unambiguity Regularization for Unsupervised Learning of Probabilistic Grammars
Author: Kewei Tu ; Vasant Honavar
Abstract: We introduce a novel approach named unambiguity regularization for unsupervised learning of probabilistic natural language grammars. The approach is based on the observation that natural language is remarkably unambiguous in the sense that only a tiny portion of the large number of possible parses of a natural language sentence are syntactically valid. We incorporate an inductive bias into grammar learning in favor of grammars that lead to unambiguous parses on natural language sentences. The resulting family of algorithms includes the expectation-maximization algorithm (EM) and its variant, Viterbi EM, as well as a so-called softmax-EM algorithm. The softmax-EM algorithm can be implemented with a simple and computationally efficient extension to standard EM. In our experiments of unsupervised dependency grammar learning, we show that unambiguity regularization is beneficial to learning, and in combination with annealing (of the regularization strength) and sparsity priors it leads to improvement over the current state of the art.
2 0.65016258 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction
Author: Valentin I. Spitkovsky ; Hiyan Alshawi ; Daniel Jurafsky
Abstract: We present a new family of models for unsupervised parsing, Dependency and Boundary models, that use cues at constituent boundaries to inform head-outward dependency tree generation. We build on three intuitions that are explicit in phrase-structure grammars but only implicit in standard dependency formulations: (i) Distributions of words that occur at sentence boundaries such as English determiners resemble constituent edges. (ii) Punctuation at sentence boundaries further helps distinguish full sentences from fragments like headlines and titles, allowing us to model grammatical differences between complete and incomplete sentences. (iii) Sentence-internal punctuation boundaries help with longer-distance dependencies, since punctuation correlates with constituent edges. Our models induce state-of-the-art dependency grammars for many languages without — — special knowledge of optimal input sentence lengths or biased, manually-tuned initializers.
3 0.47789001 48 emnlp-2012-Exploring Adaptor Grammars for Native Language Identification
Author: Sze-Meng Jojo Wong ; Mark Dras ; Mark Johnson
Abstract: The task of inferring the native language of an author based on texts written in a second language has generally been tackled as a classification problem, typically using as features a mix of n-grams over characters and part of speech tags (for small and fixed n) and unigram function words. To capture arbitrarily long n-grams that syntax-based approaches have suggested are useful, adaptor grammars have some promise. In this work we investigate their extension to identifying n-gram collocations of arbitrary length over a mix of PoS tags and words, using both maxent and induced syntactic language model approaches to classification. After presenting a new, simple baseline, we show that learned collocations used as features in a maxent model perform better still, but that the story is more mixed for the syntactic language model.
4 0.44421276 126 emnlp-2012-Training Factored PCFGs with Expectation Propagation
Author: David Hall ; Dan Klein
Abstract: PCFGs can grow exponentially as additional annotations are added to an initially simple base grammar. We present an approach where multiple annotations coexist, but in a factored manner that avoids this combinatorial explosion. Our method works with linguisticallymotivated annotations, induced latent structure, lexicalization, or any mix of the three. We use a structured expectation propagation algorithm that makes use of the factored structure in two ways. First, by partitioning the factors, it speeds up parsing exponentially over the unfactored approach. Second, it minimizes the redundancy of the factors during training, improving accuracy over an independent approach. Using purely latent variable annotations, we can efficiently train and parse with up to 8 latent bits per symbol, achieving F1 scores up to 88.4 on the Penn Treebank while using two orders of magnitudes fewer parameters compared to the na¨ ıve approach. Combining latent, lexicalized, and unlexicalized anno- tations, our best parser gets 89.4 F1 on all sentences from section 23 of the Penn Treebank.
5 0.38985714 8 emnlp-2012-A Phrase-Discovering Topic Model Using Hierarchical Pitman-Yor Processes
Author: Robert Lindsey ; William Headden ; Michael Stipicevic
Abstract: Topic models traditionally rely on the bagof-words assumption. In data mining applications, this often results in end-users being presented with inscrutable lists of topical unigrams, single words inferred as representative of their topics. In this article, we present a hierarchical generative probabilistic model of topical phrases. The model simultaneously infers the location, length, and topic of phrases within a corpus and relaxes the bagof-words assumption within phrases by using a hierarchy of Pitman-Yor processes. We use Markov chain Monte Carlo techniques for approximate inference in the model and perform slice sampling to learn its hyperparameters. We show via an experiment on human subjects that our model finds substantially better, more interpretable topical phrases than do competing models.
6 0.3786076 1 emnlp-2012-A Bayesian Model for Learning SCFGs with Discontiguous Rules
7 0.36746183 46 emnlp-2012-Exploiting Reducibility in Unsupervised Dependency Parsing
8 0.34014967 93 emnlp-2012-Multi-instance Multi-label Learning for Relation Extraction
9 0.3004781 81 emnlp-2012-Learning to Map into a Universal POS Tagset
10 0.25905818 94 emnlp-2012-Multiple Aspect Summarization Using Integer Linear Programming
11 0.25868985 74 emnlp-2012-Language Model Rest Costs and Space-Efficient Storage
12 0.24866429 125 emnlp-2012-Towards Efficient Named-Entity Rule Induction for Customizability
13 0.24161367 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers
14 0.23792349 24 emnlp-2012-Biased Representation Learning for Domain Adaptation
15 0.22235253 119 emnlp-2012-Spectral Dependency Parsing with Latent Variables
16 0.21346959 104 emnlp-2012-Parse, Price and Cut-Delayed Column and Row Generation for Graph Based Parsers
17 0.20898776 108 emnlp-2012-Probabilistic Finite State Machines for Regression-based MT Evaluation
18 0.20056203 79 emnlp-2012-Learning Syntactic Categories Using Paradigmatic Representations of Word Context
19 0.19917236 57 emnlp-2012-Generalized Higher-Order Dependency Parsing with Cube Pruning
20 0.1884748 75 emnlp-2012-Large Scale Decipherment for Out-of-Domain Machine Translation
topicId topicWeight
[(2, 0.028), (16, 0.037), (25, 0.02), (34, 0.084), (39, 0.019), (45, 0.29), (60, 0.065), (63, 0.039), (64, 0.01), (65, 0.023), (73, 0.017), (74, 0.144), (76, 0.053), (80, 0.023), (81, 0.012), (86, 0.028), (95, 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 0.82371944 130 emnlp-2012-Unambiguity Regularization for Unsupervised Learning of Probabilistic Grammars
Author: Kewei Tu ; Vasant Honavar
Abstract: We introduce a novel approach named unambiguity regularization for unsupervised learning of probabilistic natural language grammars. The approach is based on the observation that natural language is remarkably unambiguous in the sense that only a tiny portion of the large number of possible parses of a natural language sentence are syntactically valid. We incorporate an inductive bias into grammar learning in favor of grammars that lead to unambiguous parses on natural language sentences. The resulting family of algorithms includes the expectation-maximization algorithm (EM) and its variant, Viterbi EM, as well as a so-called softmax-EM algorithm. The softmax-EM algorithm can be implemented with a simple and computationally efficient extension to standard EM. In our experiments of unsupervised dependency grammar learning, we show that unambiguity regularization is beneficial to learning, and in combination with annealing (of the regularization strength) and sparsity priors it leads to improvement over the current state of the art.
2 0.79663289 4 emnlp-2012-A Comparison of Vector-based Representations for Semantic Composition
Author: William Blacoe ; Mirella Lapata
Abstract: In this paper we address the problem of modeling compositional meaning for phrases and sentences using distributional methods. We experiment with several possible combinations of representation and composition, exhibiting varying degrees of sophistication. Some are shallow while others operate over syntactic structure, rely on parameter learning, or require access to very large corpora. We find that shallow approaches are as good as more computationally intensive alternatives with regards to two particular tests: (1) phrase similarity and (2) paraphrase detection. The sizes of the involved training corpora and the generated vectors are not as important as the fit between the meaning representation and compositional method.
3 0.69798023 135 emnlp-2012-Using Discourse Information for Paraphrase Extraction
Author: Michaela Regneri ; Rui Wang
Abstract: Previous work on paraphrase extraction using parallel or comparable corpora has generally not considered the documents’ discourse structure as a useful information source. We propose a novel method for collecting paraphrases relying on the sequential event order in the discourse, using multiple sequence alignment with a semantic similarity measure. We show that adding discourse information boosts the performance of sentence-level paraphrase acquisition, which consequently gives a tremendous advantage for extracting phraselevel paraphrase fragments from matched sentences. Our system beats an informed baseline by a margin of 50%.
4 0.64706719 138 emnlp-2012-Wiki-ly Supervised Part-of-Speech Tagging
Author: Shen Li ; Joao Graca ; Ben Taskar
Abstract: Despite significant recent work, purely unsupervised techniques for part-of-speech (POS) tagging have not achieved useful accuracies required by many language processing tasks. Use of parallel text between resource-rich and resource-poor languages is one source ofweak supervision that significantly improves accuracy. However, parallel text is not always available and techniques for using it require multiple complex algorithmic steps. In this paper we show that we can build POS-taggers exceeding state-of-the-art bilingual methods by using simple hidden Markov models and a freely available and naturally growing resource, the Wiktionary. Across eight languages for which we have labeled data to evaluate results, we achieve accuracy that significantly exceeds best unsupervised and parallel text methods. We achieve highest accuracy reported for several languages and show that our . approach yields better out-of-domain taggers than those trained using fully supervised Penn Treebank.
5 0.55205494 7 emnlp-2012-A Novel Discriminative Framework for Sentence-Level Discourse Analysis
Author: Shafiq Joty ; Giuseppe Carenini ; Raymond Ng
Abstract: We propose a complete probabilistic discriminative framework for performing sentencelevel discourse analysis. Our framework comprises a discourse segmenter, based on a binary classifier, and a discourse parser, which applies an optimal CKY-like parsing algorithm to probabilities inferred from a Dynamic Conditional Random Field. We show on two corpora that our approach outperforms the state-of-the-art, often by a wide margin.
6 0.54344118 1 emnlp-2012-A Bayesian Model for Learning SCFGs with Discontiguous Rules
7 0.51483279 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction
8 0.51393443 21 emnlp-2012-Assessment of ESL Learners' Syntactic Competence Based on Similarity Measures
9 0.51042461 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers
10 0.49218169 81 emnlp-2012-Learning to Map into a Universal POS Tagset
11 0.48896617 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon
12 0.48486453 51 emnlp-2012-Extracting Opinion Expressions with semi-Markov Conditional Random Fields
13 0.48310375 129 emnlp-2012-Type-Supervised Hidden Markov Models for Part-of-Speech Tagging with Incomplete Tag Dictionaries
14 0.4778221 82 emnlp-2012-Left-to-Right Tree-to-String Decoding with Prediction
15 0.47599658 109 emnlp-2012-Re-training Monolingual Parser Bilingually for Syntactic SMT
16 0.47203249 77 emnlp-2012-Learning Constraints for Consistent Timeline Extraction
17 0.47048914 120 emnlp-2012-Streaming Analysis of Discourse Participants
18 0.46955031 89 emnlp-2012-Mixed Membership Markov Models for Unsupervised Conversation Modeling
19 0.46802109 24 emnlp-2012-Biased Representation Learning for Domain Adaptation
20 0.46606129 88 emnlp-2012-Minimal Dependency Length in Realization Ranking