acl acl2012 acl2012-11 knowledge-graph by maker-knowledge-mining

11 acl-2012-A Feature-Rich Constituent Context Model for Grammar Induction


Source: pdf

Author: Dave Golland ; John DeNero ; Jakob Uszkoreit

Abstract: We present LLCCM, a log-linear variant of the constituent context model (CCM) of grammar induction. LLCCM retains the simplicity of the original CCM but extends robustly to long sentences. On sentences of up to length 40, LLCCM outperforms CCM by 13.9% bracketing F1 and outperforms a right-branching baseline in regimes where CCM does not.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract We present LLCCM, a log-linear variant of the constituent context model (CCM) of grammar induction. [sent-5, score-0.19]

2 LLCCM retains the simplicity of the original CCM but extends robustly to long sentences. [sent-6, score-0.073]

3 On sentences of up to length 40, LLCCM outperforms CCM by 13. [sent-7, score-0.069]

4 9% bracketing F1 and outperforms a right-branching baseline in regimes where CCM does not. [sent-8, score-0.073]

5 1 Introduction Unsupervised grammar induction is a fundamental challenge of statistical natural language processing (Lari and Young, 1990; Pereira and Schabes, 1992; Carroll and Charniak, 1992). [sent-9, score-0.136]

6 The constituent context model (CCM) for inducing constituency parses (Klein and Manning, 2002) was the first unsupervised approach to surpass a right-branching baseline. [sent-10, score-0.306]

7 This paper shows that a simple re-parameterization of the model, which ties together the probabilities of related events, allows the CCM to extend robustly to long sentences. [sent-12, score-0.114]

8 For instance, the dependency model with valence (DMV) of Klein and Manning (2004) has been extended to utilize multilingual information (Berg-Kirkpatrick and Klein, 2010; Cohen et al. [sent-14, score-0.028]

9 Nevertheless, simplistic dependency models like the DMV do not contain information present in a constituency parse, such as the attachment order of object and subject to a verb. [sent-18, score-0.097]

10 Unsupervised constituency parsing is also an active research area. [sent-19, score-0.069]

11 , 2011) have considered the problem of inducing parses over raw lexical items rather than part-of-speech (POS) tags. [sent-21, score-0.05]

12 The CCM scores each parse as a product of probabilities of span and context subsequences. [sent-23, score-0.19]

13 It was originally evaluated only on unpunctuated sentences up to length 10 (Klein and Manning, 2002), which account for only 15% of the WSJ corpus; our experiments confirm the observation in (Klein, 2005) that performance degrades dramatically on longer sentences. [sent-24, score-0.1]

14 This problem is unsurprising: CCM scores each constituent type by a single, isolated multinomial parameter. [sent-25, score-0.132]

15 Our work leverages the idea that sharing information between local probabilities in a structured unsupervised model can lead to substantial accuracy gains, previously demonstrated for dependency grammar induction (Cohen and Smith, 2009; Berg-Kirkpatrick et al. [sent-26, score-0.296]

16 Our model, Log-Linear CCM (LLCCM), shares information between the probabilities of related constituents by expressing them as a log-linear combination of features trained using the gradient-based learning procedure of Berg-Kirkpatrick et al. [sent-28, score-0.125]

17 In this way, the probability of generating a constituent is informed by related constituents. [sent-30, score-0.089]

18 Our model improves unsupervised constituency parsing of sentences longer than 10 words. [sent-31, score-0.192]

19 On sentences of up to length 40 (96% of all sentences in the Penn Treebank), LLCCM outperforms CCM by 13. [sent-32, score-0.097]

20 9% (unlabeled) bracketing F1 and, unlike CCM, outperforms a right-branching baseline on sentences longer than 15 words. [sent-33, score-0.132]

21 2 Model The CCM is a generative model for the unsupervised induction of binary constituency parses over sequences of part-of-speech (POS) tags (Klein and Manning, 2002). [sent-36, score-0.257]

22 Conditioned on the constituency or distituency of each span in the parse, CCM generates both the complete sequence of terminals it contains and the terminals in the surrounding context. [sent-37, score-0.299]

23 Formally, the CCM is a probabilistic model that jointly generates a sentence, s, and a bracketing, B, specifying whether each contiguous subsequence is a constituent or not, in which case the span is called a distituent. [sent-38, score-0.254]

24 Each subsequence of POS tags, or SPAN, α, occurs in a CONTEXT, β, which is an ordered pair of preceding and following tags. [sent-39, score-0.025]

25 A bracketing is a boolean matrix B, indicating which spans (i, j) are constituents (Bij = true) and which are distituents (Bij = false). [sent-40, score-0.317]

26 A bracketing is considered legal if its constituents are nested and form a binary tree T(B). [sent-41, score-0.217]
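To make the legality condition concrete, here is a minimal Python sketch that checks whether a set of spans forms a legal binary bracketing over n terminals; the function name is an assumption, and the convention that single-word spans and the full-sentence span count toward the 2n − 1 constituents follows the closed-form count quoted later in this summary.

```python
def is_legal_bracketing(constituent_spans, n):
    """Legal iff the constituents are nested (no crossing brackets) and they
    form a binary tree over the n terminals, i.e. together with all
    single-word spans and the full-sentence span there are exactly 2n - 1."""
    spans = set(constituent_spans) | {(i, i + 1) for i in range(n)} | {(0, n)}
    for (a, b) in spans:
        for (c, d) in spans:
            if a < c < b < d:  # the two spans cross, so they cannot be nested
                return False
    return len(spans) == 2 * n - 1
```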

27 The joint distribution is given by: P(s, B) = PT(B) · ∏_{(i,j) ∈ T(B)} PS(α(i,j,s)|true) PC(β(i,j,s)|true) · ∏_{(i,j) ∉ T(B)} PS(α(i,j,s)|false) PC(β(i,j,s)|false). The prior over unobserved bracketings PT(B) is fixed to be the uniform distribution over all legal bracketings. [sent-42, score-0.126]
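As a concrete reading of this factorization, the sketch below scores a tagged sentence and bracketing under the CCM. The dictionaries P_span and P_context, the sentence-boundary markers, and the exact set of spans scored (here, all contiguous spans of length at least one) are illustrative assumptions rather than details taken from the paper.

```python
from math import log

def ccm_log_score(tags, constituent_spans, P_span, P_context, log_prior=0.0):
    """log P(s, B) = log PT(B) + sum over spans (i, j) of
    log PS(span | label) + log PC(context | label)."""
    n = len(tags)
    total = log_prior  # log PT(B); the paper fixes PT to be uniform
    for i in range(n):
        for j in range(i + 1, n + 1):
            label = (i, j) in constituent_spans          # True = constituent
            span = tuple(tags[i:j])                      # alpha(i, j, s)
            context = (tags[i - 1] if i > 0 else "<S>",  # beta(i, j, s)
                       tags[j] if j < n else "</S>")
            total += log(P_span[(span, label)]) + log(P_context[(context, label)])
    return total
```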

28 The other distributions, PS(·) and PC(·), are multinomials whose isolated parameters are estimated to maximize the likelihood of a set of observed sentences {sn} using EM (Dempster et al. [sent-43, score-0.103]

29 1 The Log-Linear CCM A fundamental limitation of the CCM is that it contains a single isolated parameter for every span. [sent-46, score-0.043]

30 The number of different possible span types increases exponentially in span length, leading to data sparsity as the sentence length increases. [sent-47, score-0.271]

31 1As mentioned in (Klein and Manning, 2002), the CCM model is deficient because it assigns probability mass to yields and spans that cannot consistently combine to form a valid sentence. [sent-48, score-0.07]

32 The Log-Linear CCM (LLCCM) reparameterizes the distributions in the CCM using intuitive features to address the limitations of CCM while retaining its predictive power. [sent-50, score-0.024]

33 The set of proposed features includes a BASIC feature for each parameter of the original CCM, enabling the LLCCM to retain the full expressive power of the CCM. [sent-51, score-0.022]

34 To introduce features into the CCM, we express each of its local conditional distributions as a multiclass logistic regression model. [sent-53, score-0.075]

35 Each local distribution, Pt(y|x) for t ∈ {SPAN, CONTEXT}, conditions on a label x ∈ {true, false} and generates an event (span or context) y. [sent-54, score-0.052]
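A minimal sketch of one such locally normalized log-linear distribution follows; the weight dictionary w, the feature extractor features(x, y, t), and the candidate set used in the normalizer are hypothetical names, and restricting the normalizer to spans observed in training mirrors the approximation described below.

```python
from math import exp

def local_prob(y, x, t, candidates, w, features):
    """Pt(y | x) = exp(w . f(x, y, t)) / sum over y' of exp(w . f(x, y', t)),
    where x is the label (True/False), t is "SPAN" or "CONTEXT", and
    candidates is the set of events the normalizer sums over."""
    def score(event):
        return sum(w.get(f, 0.0) for f in features(x, event, t))
    z = sum(exp(score(y2)) for y2 in candidates)
    return exp(score(y)) / z
```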

36 This kind of transformation was shown to be effective in unsupervised models for part-of-speech induction, dependency grammar induction, word alignment, and word segmentation (Berg-Kirkpatrick et al. [sent-59, score-0.159]

37 In our case, replacing multinomials with featurized models not only improves model accuracy, but also lets the model apply effectively to a new regime of long sentences. [sent-61, score-0.055]

38 2 Feature Templates In the SPAN model, for each span y = [α1 , . [sent-63, score-0.115]

39 For example, there are many distinct noun phrases with different spans that all begin with DT and end with NN, a fact expressed by the BOUNDARY feature (Table 1). [sent-67, score-0.07]

40 Notice that although the BASIC span features are active for at most one span, the remaining features fire for both spans, effectively sharing information between the local probabilities of these events. [sent-71, score-0.183]

41 The coarser CONTEXT features factor the context pair into its components, allowing the LLCCM to more easily learn, for example, that a constituent is unlikely to immediately follow a determiner. [sent-72, score-0.123]
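For illustration, a hedged sketch of the span and context feature extraction follows; only the BASIC, BOUNDARY, and factored CONTEXT templates mentioned in the text are included, and the string encodings of the feature names are assumptions.

```python
def span_features(span_tags, label):
    """Feature names for a SPAN event conditioned on label in {True, False}."""
    feats = ["BASIC_SPAN=%s|%s" % ("_".join(span_tags), label)]
    # BOUNDARY: first and last tag of the span; e.g. many distinct NPs begin
    # with DT and end with NN even though their full tag sequences differ.
    feats.append("BOUNDARY=%s_%s|%s" % (span_tags[0], span_tags[-1], label))
    return feats

def context_features(prev_tag, next_tag, label):
    """Feature names for a CONTEXT event (preceding tag, following tag)."""
    feats = ["BASIC_CONTEXT=%s_%s|%s" % (prev_tag, next_tag, label)]
    # Coarser features factor the context pair into its components, so the
    # model can learn, e.g., that a constituent rarely follows a determiner.
    feats.append("PREV=%s|%s" % (prev_tag, label))
    feats.append("NEXT=%s|%s" % (next_tag, label))
    return feats
```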

42 3 Training In the EM algorithm for estimating CCM parameters, the E-Step computes posteriors over bracketings using the Inside-Outside algorithm. [sent-73, score-0.032]

43 The M-Step chooses parameters that maximize the expected complete log likelihood of the data. [sent-74, score-0.08]

44 Berg-Kirkpatrick et al. (2010) showed that the data log likelihood gradient is equivalent to the gradient of the expected complete log likelihood (the objective maximized in the M-step of EM) at the point from which expectations are computed. [sent-77, score-0.238]

45 First, we compute the local probabilities of the CCM, Pt(y|x), from the current w using Equation (1). [sent-79, score-0.068]

46 We approximate the normalization over an exponential number of terms by only summing over spans that appeared in the training corpus. [sent-80, score-0.093]

47 We summarize these expected count quantities as: e_xyt = e^S_xy if t = SPAN, and e_xyt = e^C_xy if t = CONTEXT. Finally, we compute the gradient with respect to w, expressed in terms of these expected counts and conditional probabilities: ∇L(w) = Σ_{x,y,t} e_xyt f_xyt − G(w), where G(w) = Σ_{x,t} (Σ_y e_xyt) Σ_{y′} Pt(y′|x) f_xy′t. [sent-82, score-0.148]

48 Following (Klein and Manning, 2002), we initialize the model weights by optimizing against posterior probabilities fixed to the split-uniform distribution, which generates binary trees by randomly choosing a split point and recursing on each side of the split. [sent-83, score-0.153]
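Returning to the gradient expression above, here is a self-contained sketch of its computation, assuming binary indicator features, a feature extractor features(x, y, t), and expected counts e_xyt already accumulated from the Inside-Outside posteriors; all of these names and data structures are hypothetical.

```python
from collections import defaultdict
from math import exp

def llccm_gradient(expected_counts, candidates, w, features):
    """grad L(w) = sum_{x,y,t} e_xyt f_xyt - G(w), with
    G(w) = sum_{x,t} (sum_y e_xyt) sum_{y'} Pt(y'|x) f_xy't.

    expected_counts maps (x, y, t) -> e_xyt; candidates maps t -> the events
    the normalizer sums over (e.g. spans seen in the training corpus)."""
    def score(x, y, t):
        return sum(w.get(f, 0.0) for f in features(x, y, t))

    grad = defaultdict(float)
    totals = defaultdict(float)                 # (x, t) -> sum_y e_xyt
    for (x, y, t), e in expected_counts.items():
        totals[(x, t)] += e
        for f in features(x, y, t):             # observed-expectation term
            grad[f] += e

    for (x, t), mass in totals.items():         # model-expectation term G(w)
        z = sum(exp(score(x, y2, t)) for y2 in candidates[t])
        for y2 in candidates[t]:
            p = exp(score(x, y2, t)) / z
            for f in features(x, y2, t):
                grad[f] -= mass * p
    return grad
```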

49 2, Klein (2005) shows this posterior can be expressed in closed form. [sent-87, score-0.025]

50 As in previous work, we start the initialization optimization with the zero vector, and terminate after 10 iterations to regularize against achieving a local maximum. [sent-88, score-0.027]

51 Each fixed (i, j) and sn pair has exactly one span and context, hence the quantities Σ_y I[α(i, j, sn) = y] and Σ_y I[β(i, j, sn) = y] are both equal to 1. [sent-91, score-0.251]

52 The sum of the posterior probabilities, δ(true), over all positions is equal to the total number of constituents in the tree. [sent-93, score-0.109]

53 Any binary tree over N terminals contains exactly 2N − 1 constituents and ½(N − 2)(N − 1) distituents. [sent-94, score-0.161]

54 γt(x) = Σ_n (2|sn| − 1) if x = true, and γt(x) = Σ_n ½(|sn| − 2)(|sn| − 1) if x = false, where |sn| denotes the length of sentence sn. [sent-95, score-0.041]

55 Thus, G(w) can be precomputed once for the entire dataset at each minimization step. [sent-96, score-0.039]
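A small sketch of that precomputation, using the closed-form counts stated above; the function name and corpus representation are assumptions.

```python
def gamma(sentence_lengths):
    """Total constituent and distituent counts over the corpus: a binary tree
    over N terminals has 2N - 1 constituents and (N - 1)(N - 2) / 2
    distituents, so these totals depend only on the sentence lengths."""
    g_true = sum(2 * n - 1 for n in sentence_lengths)
    g_false = sum((n - 1) * (n - 2) // 2 for n in sentence_lengths)
    return {True: g_true, False: g_false}
```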

56 Moreover, γt(x) can be precomputed once before all iterations. [sent-97, score-0.039]

57 2 Relationship to Smoothing The original CCM uses additive smoothing in its M-step to capture the fact that distituents outnumber constituents. [sent-99, score-0.113]

58 For each span or context, CCM adds 10 counts: 2 as a constituent and 8 as a distituent. [sent-100, score-0.204]

59 We note that these smoothing parameters are tailored to short sentences: in a binary tree, the number of constituents grows linearly with sentence length, whereas the number of distituents grows quadratically. [sent-101, score-0.301]

60 Therefore, the ratio of constituents to distituents is not constant across sentence lengths. [sent-102, score-0.174]
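To put numbers on this (using the closed-form counts above): a 10-word sentence has 2·10 − 1 = 19 constituents and ½·9·8 = 36 distituents, roughly a 1:2 ratio, while a 40-word sentence has 79 constituents and ½·39·38 = 741 distituents, roughly 1:9, so a single additive-smoothing ratio tuned on short sentences is mismatched for long ones.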

61 In contrast, by virtue of the log-linear model, LLCCM assigns positive probability to all spans or contexts without explicit smoothing. [sent-103, score-0.07]

62 4These counts are specified in (Klein, 2005); Klein and Manning (2002) added 10 constituent and 50 distituent counts. [sent-104, score-0.116]

63 Figure 1: CCM and LLCCM trained and tested on sentences of a fixed length; the x-axis is maximum sentence length, and the plot also shows a binary-branching upper bound curve. [sent-105, score-0.099]

64 The binary branching upper bound corresponds to UBOUND from (Klein and Manning, 2002). [sent-107, score-0.064]

65 We report bracketing F1 scores between the binary trees predicted by the models on these sequences and the treebank parses. [sent-110, score-0.105]

66 We train and evaluate both a CCM implementation (Luque, 2011) and our LLCCM on sentences up to a fixed length n, for n ∈ {10, 15, . [sent-111, score-0.099]

67 Figure 1 shows that LLCCM substantially outperforms the CCM on longer sentences. [sent-115, score-0.031]

68 After length 15, CCM accuracy falls below the right branching baseline, whereas LLCCM remains significantly better than right-branching through length 40. [sent-116, score-0.114]

69 5 Conclusion Our log-linear variant of the CCM extends robustly to long sentences, enabling constituent grammar induction to be used in settings that typically include long sentences, such as machine translation reordering (Chiang, 2005; DeNero and Uszkoreit, 2011; Dyer et al. [sent-117, score-0.343]

70 In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 1288–1297, Uppsala, Sweden, July. [sent-124, score-0.028]

71 In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 582–590, Los Angeles, California, June. [sent-129, score-0.028]

72 Two experiments on learning probabilistic dependency grammars from corpora. [sent-137, score-0.028]

73 In Workshop Notes for StatisticallyBased NLP Techniques, AAAI, pages 1–13. [sent-138, score-0.028]

74 In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, pages 263–270, Ann Arbor, Michigan, June. [sent-142, score-0.028]

75 Shared logistic normal distributions for soft parameter tying in unsupervised grammar induction. [sent-148, score-0.179]

76 In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Asso- ciation for Computational Linguistics, pages 74–82, Boulder, Colorado, June. [sent-149, score-0.028]

77 In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 50–61, Edinburgh, Scotland, UK. [sent-156, score-0.028]

78 Maximum likelihood from incomplete data via the EM algorithm. [sent-161, score-0.031]

79 In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 193– 203, Edinburgh, Scotland, UK. [sent-167, score-0.028]

80 In Proceedings of the Sixth Workshop on Statistical Machine Translation, pages 337–343, Edinburgh, Scotland, July. [sent-175, score-0.028]

81 Improving unsupervised dependency parsing with richer contexts and smoothing. [sent-180, score-0.092]

82 In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 101–109, Boulder, Colorado, June. [sent-181, score-0.028]

83 In Proceedings of 40th Annual Meeting of the Association for Computational Linguistics, pages 128–135, Philadelphia, Pennsylvania, USA, July. [sent-187, score-0.028]

84 Corpus-based induction of syntactic structure: Models of dependency and constituency. [sent-192, score-0.097]

85 In Proceedings of the 42nd Meeting of the Association for Computational Linguistics, Main Volume, pages 478–485, Barcelona, Spain, July. [sent-193, score-0.028]

86 The estimation of stochastic context-free grammars using the inside-outside algorithm. [sent-203, score-0.036]

87 In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 1234–1244, Cambridge, MA, October. [sent-227, score-0.028]

88 In Proceedings of the 30th Annual Meeting of the Association for Computational Linguistics, pages 128– 135, Newark, Delaware, USA, June. [sent-232, score-0.028]

89 Simple unsupervised grammar induction from raw text with cascaded finite state models. [sent-236, score-0.2]

90 In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 1077–1086, Portland, Oregon, USA, June. [sent-237, score-0.028]

91 In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 684–693, Cambridge, MA, October. [sent-242, score-0.028]

92 In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 384– 391, Prague, Czech Republic, June. [sent-247, score-0.028]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('ccm', 0.79), ('llccm', 0.316), ('xsn', 0.135), ('span', 0.115), ('klein', 0.102), ('distituents', 0.09), ('constituent', 0.089), ('xij', 0.084), ('constituents', 0.084), ('sn', 0.079), ('bracketing', 0.073), ('dmv', 0.072), ('spans', 0.07), ('induction', 0.069), ('constituency', 0.069), ('bracketings', 0.068), ('grammar', 0.067), ('unsupervised', 0.064), ('manning', 0.059), ('false', 0.054), ('pt', 0.051), ('robustly', 0.05), ('naseem', 0.05), ('gradient', 0.05), ('denero', 0.05), ('terminals', 0.045), ('xy', 0.045), ('bij', 0.045), ('headden', 0.045), ('ponvert', 0.045), ('scpoanntext', 0.045), ('association', 0.044), ('true', 0.044), ('isolated', 0.043), ('pc', 0.043), ('length', 0.041), ('probabilities', 0.041), ('cohen', 0.04), ('appendix', 0.039), ('lari', 0.039), ('mstep', 0.039), ('precomputed', 0.039), ('xxxi', 0.039), ('ps', 0.038), ('grows', 0.036), ('insideoutside', 0.036), ('py', 0.036), ('reichart', 0.036), ('taylor', 0.035), ('context', 0.034), ('tahira', 0.034), ('xxi', 0.034), ('binary', 0.032), ('branching', 0.032), ('multinomials', 0.032), ('posteriors', 0.032), ('wsj', 0.031), ('likelihood', 0.031), ('scotland', 0.031), ('longer', 0.031), ('uszkoreit', 0.03), ('dempster', 0.03), ('fixed', 0.03), ('em', 0.029), ('dt', 0.029), ('shay', 0.029), ('noah', 0.028), ('pages', 0.028), ('templates', 0.028), ('legal', 0.028), ('carroll', 0.028), ('dependency', 0.028), ('sentences', 0.028), ('inducing', 0.027), ('local', 0.027), ('counts', 0.027), ('log', 0.027), ('quantities', 0.027), ('yt', 0.027), ('google', 0.026), ('posterior', 0.025), ('subsequence', 0.025), ('jakob', 0.025), ('generates', 0.025), ('nn', 0.024), ('distributions', 0.024), ('edinburgh', 0.024), ('logistic', 0.024), ('parses', 0.023), ('computational', 0.023), ('smoothing', 0.023), ('colorado', 0.023), ('summing', 0.023), ('long', 0.023), ('annual', 0.023), ('enabling', 0.022), ('dan', 0.022), ('expected', 0.022), ('pereira', 0.021), ('regina', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999964 11 acl-2012-A Feature-Rich Constituent Context Model for Grammar Induction

Author: Dave Golland ; John DeNero ; Jakob Uszkoreit

Abstract: We present LLCCM, a log-linear variant of the constituent context model (CCM) of grammar induction. LLCCM retains the simplicity of the original CCM but extends robustly to long sentences. On sentences of up to length 40, LLCCM outperforms CCM by 13.9% bracketing F1 and outperforms a right-branching baseline in regimes where CCM does not.

2 0.091531672 109 acl-2012-Higher-order Constituent Parsing and Parser Combination

Author: Xiao Chen ; Chunyu Kit

Abstract: This paper presents a higher-order model for constituent parsing aimed at utilizing more local structural context to decide the score of a grammar rule instance in a parse tree. Experiments on English and Chinese treebanks confirm its advantage over its first-order version. It achieves its best F1 scores of 91.86% and 85.58% on the two languages, respectively, and further pushes them to 92.80% and 85.60% via combination with other highperformance parsers.

3 0.071501113 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets

Author: Adam Pauls ; Dan Klein

Abstract: We propose a simple generative, syntactic language model that conditions on overlapping windows of tree context (or treelets) in the same way that n-gram language models condition on overlapping windows of linear context. We estimate the parameters of our model by collecting counts from automatically parsed text using standard n-gram language model estimation techniques, allowing us to train a model on over one billion tokens of data using a single machine in a matter of hours. We evaluate on perplexity and a range of grammaticality tasks, and find that we perform as well or better than n-gram models and other generative baselines. Our model even competes with state-of-the-art discriminative models hand-designed for the grammaticality tasks, despite training on positive data alone. We also show fluency improvements in a preliminary machine translation experiment.

4 0.069196865 5 acl-2012-A Comparison of Chinese Parsers for Stanford Dependencies

Author: Wanxiang Che ; Valentin Spitkovsky ; Ting Liu

Abstract: Stanford dependencies are widely used in natural language processing as a semanticallyoriented representation, commonly generated either by (i) converting the output of a constituent parser, or (ii) predicting dependencies directly. Previous comparisons of the two approaches for English suggest that starting from constituents yields higher accuracies. In this paper, we re-evaluate both methods for Chinese, using more accurate dependency parsers than in previous work. Our comparison of performance and efficiency across seven popular open source parsers (four constituent and three dependency) shows, by contrast, that recent higher-order graph-based techniques can be more accurate, though somewhat slower, than constituent parsers. We demonstrate also that n-way jackknifing is a useful technique for producing automatic (rather than gold) partof-speech tags to train Chinese dependency parsers. Finally, we analyze the relations produced by both kinds of parsing and suggest which specific parsers to use in practice.

5 0.066608898 172 acl-2012-Selective Sharing for Multilingual Dependency Parsing

Author: Tahira Naseem ; Regina Barzilay ; Amir Globerson

Abstract: We present a novel algorithm for multilingual dependency parsing that uses annotations from a diverse set of source languages to parse a new unannotated language. Our motivation is to broaden the advantages of multilingual learning to languages that exhibit significant differences from existing resource-rich languages. The algorithm learns which aspects of the source languages are relevant for the target language and ties model parameters accordingly. The model factorizes the process of generating a dependency tree into two steps: selection of syntactic dependents and their ordering. Being largely languageuniversal, the selection component is learned in a supervised fashion from all the training languages. In contrast, the ordering decisions are only influenced by languages with similar properties. We systematically model this cross-lingual sharing using typological features. In our experiments, the model consistently outperforms a state-of-the-art multilingual parser. The largest improvement is achieved on the non Indo-European languages yielding a gain of 14.4%.1

6 0.065852076 128 acl-2012-Learning Better Rule Extraction with Translation Span Alignment

7 0.060718328 140 acl-2012-Machine Translation without Words through Substring Alignment

8 0.052367173 33 acl-2012-Automatic Event Extraction with Structured Preference Modeling

9 0.049101412 154 acl-2012-Native Language Detection with Tree Substitution Grammars

10 0.047696292 64 acl-2012-Crosslingual Induction of Semantic Roles

11 0.044228196 87 acl-2012-Exploiting Multiple Treebanks for Parsing with Quasi-synchronous Grammars

12 0.042282008 179 acl-2012-Smaller Alignment Models for Better Translations: Unsupervised Word Alignment with the l0-norm

13 0.042151213 123 acl-2012-Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in SMT

14 0.041850671 38 acl-2012-Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing

15 0.041398033 9 acl-2012-A Cost Sensitive Part-of-Speech Tagging: Differentiating Serious Errors from Minor Errors

16 0.04068289 19 acl-2012-A Ranking-based Approach to Word Reordering for Statistical Machine Translation

17 0.040606707 25 acl-2012-An Exploration of Forest-to-String Translation: Does Translation Help or Hurt Parsing?

18 0.039596818 115 acl-2012-Identifying High-Impact Sub-Structures for Convolution Kernels in Document-level Sentiment Classification

19 0.038266011 213 acl-2012-Utilizing Dependency Language Models for Graph-based Dependency Parsing Models

20 0.037270471 131 acl-2012-Learning Translation Consensus with Structured Label Propagation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.132), (1, -0.022), (2, -0.055), (3, -0.047), (4, -0.033), (5, -0.01), (6, -0.009), (7, 0.008), (8, 0.014), (9, 0.002), (10, 0.006), (11, -0.06), (12, -0.028), (13, -0.012), (14, -0.008), (15, -0.033), (16, 0.021), (17, 0.025), (18, 0.005), (19, 0.031), (20, -0.021), (21, 0.024), (22, -0.019), (23, -0.022), (24, 0.017), (25, -0.003), (26, -0.032), (27, 0.011), (28, 0.044), (29, 0.052), (30, -0.043), (31, -0.039), (32, 0.062), (33, 0.013), (34, -0.008), (35, 0.016), (36, -0.051), (37, -0.044), (38, -0.042), (39, 0.044), (40, 0.047), (41, -0.043), (42, -0.045), (43, -0.115), (44, -0.008), (45, -0.056), (46, 0.121), (47, -0.07), (48, -0.028), (49, -0.006)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.870911 11 acl-2012-A Feature-Rich Constituent Context Model for Grammar Induction

Author: Dave Golland ; John DeNero ; Jakob Uszkoreit

Abstract: We present LLCCM, a log-linear variant of the constituent context model (CCM) of grammar induction. LLCCM retains the simplicity of the original CCM but extends robustly to long sentences. On sentences of up to length 40, LLCCM outperforms CCM by 13.9% bracketing F1 and outperforms a right-branching baseline in regimes where CCM does not.

2 0.60817462 172 acl-2012-Selective Sharing for Multilingual Dependency Parsing

Author: Tahira Naseem ; Regina Barzilay ; Amir Globerson

Abstract: We present a novel algorithm for multilingual dependency parsing that uses annotations from a diverse set of source languages to parse a new unannotated language. Our motivation is to broaden the advantages of multilingual learning to languages that exhibit significant differences from existing resource-rich languages. The algorithm learns which aspects of the source languages are relevant for the target language and ties model parameters accordingly. The model factorizes the process of generating a dependency tree into two steps: selection of syntactic dependents and their ordering. Being largely languageuniversal, the selection component is learned in a supervised fashion from all the training languages. In contrast, the ordering decisions are only influenced by languages with similar properties. We systematically model this cross-lingual sharing using typological features. In our experiments, the model consistently outperforms a state-of-the-art multilingual parser. The largest improvement is achieved on the non Indo-European languages yielding a gain of 14.4%.1

3 0.57877177 109 acl-2012-Higher-order Constituent Parsing and Parser Combination

Author: Xiao Chen ; Chunyu Kit

Abstract: This paper presents a higher-order model for constituent parsing aimed at utilizing more local structural context to decide the score of a grammar rule instance in a parse tree. Experiments on English and Chinese treebanks confirm its advantage over its first-order version. It achieves its best F1 scores of 91.86% and 85.58% on the two languages, respectively, and further pushes them to 92.80% and 85.60% via combination with other highperformance parsers.

4 0.55610967 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets

Author: Adam Pauls ; Dan Klein

Abstract: We propose a simple generative, syntactic language model that conditions on overlapping windows of tree context (or treelets) in the same way that n-gram language models condition on overlapping windows of linear context. We estimate the parameters of our model by collecting counts from automatically parsed text using standard n-gram language model estimation techniques, allowing us to train a model on over one billion tokens of data using a single machine in a matter of hours. We evaluate on perplexity and a range of grammaticality tasks, and find that we perform as well or better than n-gram models and other generative baselines. Our model even competes with state-of-the-art discriminative models hand-designed for the grammaticality tasks, despite training on positive data alone. We also show fluency improvements in a preliminary machine translation experiment.

5 0.47976372 30 acl-2012-Attacking Parsing Bottlenecks with Unlabeled Data and Relevant Factorizations

Author: Emily Pitler

Abstract: Prepositions and conjunctions are two of the largest remaining bottlenecks in parsing. Across various existing parsers, these two categories have the lowest accuracies, and mistakes made have consequences for downstream applications. Prepositions and conjunctions are often assumed to depend on lexical dependencies for correct resolution. As lexical statistics based on the training set only are sparse, unlabeled data can help ameliorate this sparsity problem. By including unlabeled data features into a factorization of the problem which matches the representation of prepositions and conjunctions, we achieve a new state-of-the-art for English dependencies with 93.55% correct attachments on the current standard. Furthermore, conjunctions are attached with an accuracy of 90.8%, and prepositions with an accuracy of 87.4%.

6 0.47964391 38 acl-2012-Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing

7 0.47373128 57 acl-2012-Concept-to-text Generation via Discriminative Reranking

8 0.46788216 75 acl-2012-Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing

9 0.46352112 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers

10 0.44704783 189 acl-2012-Syntactic Annotations for the Google Books NGram Corpus

11 0.44481516 154 acl-2012-Native Language Detection with Tree Substitution Grammars

12 0.44130769 34 acl-2012-Automatically Learning Measures of Child Language Development

13 0.43501121 87 acl-2012-Exploiting Multiple Treebanks for Parsing with Quasi-synchronous Grammars

14 0.43379462 175 acl-2012-Semi-supervised Dependency Parsing using Lexical Affinities

15 0.42896554 219 acl-2012-langid.py: An Off-the-shelf Language Identification Tool

16 0.42742738 179 acl-2012-Smaller Alignment Models for Better Translations: Unsupervised Word Alignment with the l0-norm

17 0.42576104 5 acl-2012-A Comparison of Chinese Parsers for Stanford Dependencies

18 0.42367163 123 acl-2012-Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in SMT

19 0.42205268 56 acl-2012-Computational Approaches to Sentence Completion

20 0.41438159 117 acl-2012-Improving Word Representations via Global Context and Multiple Word Prototypes


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(7, 0.018), (15, 0.264), (25, 0.025), (26, 0.035), (28, 0.066), (30, 0.038), (37, 0.054), (39, 0.032), (59, 0.016), (74, 0.036), (82, 0.022), (84, 0.034), (85, 0.024), (90, 0.114), (92, 0.054), (94, 0.025), (99, 0.05)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.76238823 100 acl-2012-Fine Granular Aspect Analysis using Latent Structural Models

Author: Lei Fang ; Minlie Huang

Abstract: In this paper, we present a structural learning model forjoint sentiment classification and aspect analysis of text at various levels of granularity. Our model aims to identify highly informative sentences that are aspect-specific in online custom reviews. The primary advantages of our model are two-fold: first, it performs document-level and sentence-level sentiment polarity classification jointly; second, it is able to find informative sentences that are closely related to some respects in a review, which may be helpful for aspect-level sentiment analysis such as aspect-oriented summarization. The proposed method was evaluated with 9,000 Chinese restaurant reviews. Preliminary experiments demonstrate that our model obtains promising performance. 1

same-paper 2 0.7168079 11 acl-2012-A Feature-Rich Constituent Context Model for Grammar Induction

Author: Dave Golland ; John DeNero ; Jakob Uszkoreit

Abstract: We present LLCCM, a log-linear variant of the constituent context model (CCM) of grammar induction. LLCCM retains the simplicity of the original CCM but extends robustly to long sentences. On sentences of up to length 40, LLCCM outperforms CCM by 13.9% bracketing F1 and outperforms a right-branching baseline in regimes where CCM does not.

3 0.6546309 87 acl-2012-Exploiting Multiple Treebanks for Parsing with Quasi-synchronous Grammars

Author: Zhenghua Li ; Ting Liu ; Wanxiang Che

Abstract: We present a simple and effective framework for exploiting multiple monolingual treebanks with different annotation guidelines for parsing. Several types of transformation patterns (TP) are designed to capture the systematic annotation inconsistencies among different treebanks. Based on such TPs, we design quasisynchronous grammar features to augment the baseline parsing models. Our approach can significantly advance the state-of-the-art parsing accuracy on two widely used target treebanks (Penn Chinese Treebank 5. 1 and 6.0) using the Chinese Dependency Treebank as the source treebank. The improvements are respectively 1.37% and 1.10% with automatic part-of-speech tags. Moreover, an indirect comparison indicates that our approach also outperforms previous work based on treebank conversion.

4 0.57968724 28 acl-2012-Aspect Extraction through Semi-Supervised Modeling

Author: Arjun Mukherjee ; Bing Liu

Abstract: Aspect extraction is a central problem in sentiment analysis. Current methods either extract aspects without categorizing them, or extract and categorize them using unsupervised topic modeling. By categorizing, we mean the synonymous aspects should be clustered into the same category. In this paper, we solve the problem in a different setting where the user provides some seed words for a few aspect categories and the model extracts and clusters aspect terms into categories simultaneously. This setting is important because categorizing aspects is a subjective task. For different application purposes, different categorizations may be needed. Some form of user guidance is desired. In this paper, we propose two statistical models to solve this seeded problem, which aim to discover exactly what the user wants. Our experimental results show that the two proposed models are indeed able to perform the task effectively. 1

5 0.52630329 63 acl-2012-Cross-lingual Parse Disambiguation based on Semantic Correspondence

Author: Lea Frermann ; Francis Bond

Abstract: We present a system for cross-lingual parse disambiguation, exploiting the assumption that the meaning of a sentence remains unchanged during translation and the fact that different languages have different ambiguities. We simultaneously reduce ambiguity in multiple languages in a fully automatic way. Evaluation shows that the system reliably discards dispreferred parses from the raw parser output, which results in a pre-selection that can speed up manual treebanking.

6 0.52449638 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures

7 0.5229255 146 acl-2012-Modeling Topic Dependencies in Hierarchical Text Categorization

8 0.51911837 123 acl-2012-Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in SMT

9 0.51708549 140 acl-2012-Machine Translation without Words through Substring Alignment

10 0.51654279 72 acl-2012-Detecting Semantic Equivalence and Information Disparity in Cross-lingual Documents

11 0.51622772 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers

12 0.5161289 130 acl-2012-Learning Syntactic Verb Frames using Graphical Models

13 0.51589811 175 acl-2012-Semi-supervised Dependency Parsing using Lexical Affinities

14 0.51565671 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning

15 0.5151037 22 acl-2012-A Topic Similarity Model for Hierarchical Phrase-based Translation

16 0.51482892 148 acl-2012-Modified Distortion Matrices for Phrase-Based Statistical Machine Translation

17 0.51381862 219 acl-2012-langid.py: An Off-the-shelf Language Identification Tool

18 0.51236677 191 acl-2012-Temporally Anchored Relation Extraction

19 0.5122329 136 acl-2012-Learning to Translate with Multiple Objectives

20 0.51217997 218 acl-2012-You Had Me at Hello: How Phrasing Affects Memorability