emnlp emnlp2010 emnlp2010-81 knowledge-graph by maker-knowledge-mining

81 emnlp-2010-Modeling Perspective Using Adaptor Grammars


Source: pdf

Author: Eric Hardisty ; Jordan Boyd-Graber ; Philip Resnik

Abstract: Strong indications of perspective can often come from collocations of arbitrary length; for example, someone writing get the government out of my X is typically expressing a conservative rather than progressive viewpoint. However, going beyond unigram or bigram features in perspective classification gives rise to problems of data sparsity. We address this problem using nonparametric Bayesian modeling, specifically adaptor grammars (Johnson et al., 2006). We demonstrate that an adaptive na¨ ıve Bayes model captures multiword lexical usages associated with perspective, and establishes a new state-of-the-art for perspective classification results using the Bitter Lemons corpus, a collection of essays about mid-east issues from Israeli and Palestinian points of view.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 edu Abstract Strong indications of perspective can often come from collocations of arbitrary length; for example, someone writing get the government out of my X is typically expressing a conservative rather than progressive viewpoint. [sent-4, score-0.465]

2 However, going beyond unigram or bigram features in perspective classification gives rise to problems of data sparsity. [sent-5, score-0.259]

3 We address this problem using nonparametric Bayesian modeling, specifically adaptor grammars (Johnson et al. [sent-6, score-0.548]

4 1 Introduction Most work on the computational analysis of sentiment and perspective relies on lexical features. [sent-9, score-0.152]

5 describing healthcare reform as idiotic or wonderful) or to frame a discussion in order to convey a perspective more implicitly (e. [sent-12, score-0.186]

6 However, important indicators of perspective can also be longer (get the government out of my). [sent-24, score-0.184]

7 In this paper, we employ nonparametric Bayesian models (Orbanz and Teh, 2010) in order to address this limitation. [sent-26, score-0.124]

8 In contrast to parametric models, for which a fixed number of parameters are specified in advance, nonparametric models can “grow” to the size best suited to the observed data. [sent-27, score-0.124]

9 , 2006), a formalism for nonparametric Bayesian modeling that has recently proven useful in unsupervised modeling of phonemes (Johnson, 2008), grammar induction (Cohen et al. [sent-33, score-0.278]

10 , 2010), and named entity structure learning (Johnson, 2010), to make supervised na¨ıve Bayes classification nonparametric in order to improve perspective modeling. [sent-34, score-0.339]

11 We introduce adaptive na¨ıve Bayes (ANB), for which in principle the vocabulary can grow as needed to include collocations of arbitrary length, as determined by the data. [sent-36, score-0.292]

12 We show that using adaptive na¨ıve Bayes improves on state of the art classification using the Bitter Lemons corpus (Lin et al. [sent-39, score-0.181]

13 , 2006), a document collection that has been used by a variety of authors to evaluate perspective classification. [sent-40, score-0.219]

14 In Section 2, we review adaptor grammars, show how na¨ıve Bayes can be expressed within the formalism, and describe how and how easily an adaptive na¨ıve Bayes model can be created. [sent-41, score-0.455]

15 2 Adapting Na¨ıve Bayes to be Less Na¨ıve. In this work we apply the adaptor grammar formalism introduced by Johnson, Griffiths, and Goldwater (Johnson et al. [sent-44, score-0.491]

16 Adaptor grammars are a generalization of probabilistic context free grammars (PCFGs) that make it particularly easy to express nonparametric Bayesian models of language simply and readably using context free rules. [sent-46, score-0.298]

17 provide an inference procedure based on Markov Chain Monte Carlo techniques that makes parameter estimation straightforward for all models that can be expressed using adaptor grammars. [sent-48, score-0.371]

18 Variational inference for adaptor grammars has also been recently introduced (Cohen et al. [sent-49, score-0.458]

19 Briefly, adaptor grammars allow nonterminals to be rewritten to entire subtrees. [sent-51, score-0.458]

20 In contrast, a nonterminal in a PCFG rewrites only to a collection of grammar symbols; their subsequent productions are independent of each other. [sent-52, score-0.166]

21 In contrast, an adaptor grammar can learn (or "cache") the production PP → (P up) (NP (DET a) (N tree)). [sent-54, score-0.475]

22 It does this by positing that the distribution over children for an adapted non-terminal comes from a Pitman-Yor distribution. [sent-55, score-0.192]

23 A Pitman-Yor distribution (Pitman and Yor, 1997) is a distribution over distributions. [sent-56, score-0.206]

24 It is parameterized by a discount parameter a, a strength parameter b, and a probability distribution G0 known as the base distribution. [sent-62, score-0.205]

25 Adaptor grammars allow distributions over subtrees to come from a Pitman-Yor distribution with the PCFG’s original distribution over trees as the base distribution. [sent-63, score-0.395]

26 The generative process for obtaining draws from a distribution drawn from a Pitman-Yor distribution can be described by the “Chinese restaurant process” (CRP). [sent-64, score-0.392]

27 Suppose that we have a base distribution Ω that is some distribution over all sequences of words (the exact structure of such a distribution is unimportant; such a distribution will be defined later in Table 1). [sent-66, score-0.514]

28 If she sits at a new table j, that table is assigned a draw yj from the base distribution, Ω; note that, since Ω is a distribution over n-grams, yj is an n-gram. [sent-71, score-0.422]

29 The probability of joining an existing table j, with cj patrons already seated at table j, is (cj − a)/(c· + b), where c· is the number of patrons seated at all tables: c· = Σj′ cj′. [sent-74, score-0.196]
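
In standard Pitman-Yor CRP notation, with discount parameter a and strength parameter b, the seating probabilities can be written as follows; the new-table term is the usual textbook form, added here for completeness rather than quoted from the excerpt:

\[
P(\text{join existing table } j) = \frac{c_j - a}{c_{\cdot} + b},
\qquad
P(\text{open a new table}) = \frac{b + a\,t}{c_{\cdot} + b},
\qquad
c_{\cdot} = \sum_{j'} c_{j'},
\]

where t is the number of currently occupied tables.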

30 However, there is always a chance of drawing from the base distribution, and therefore every word sequence can also always be drawn from φ. [sent-78, score-0.136]
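
To make the restaurant metaphor concrete, the following sketch (not the authors' code) simulates draws from a Pitman-Yor CRP whose base distribution Ω produces n-grams; the toy vocabulary, the geometric stopping rule inside the base distribution, and the hyperparameter values are all illustrative assumptions.

import random

# Illustrative assumptions: a tiny vocabulary and a geometric n-gram length.
VOCAB = ["health", "care", "for", "all", "tax", "cut"]

def base_draw(stop_prob=0.5):
    """Base distribution Omega: build an n-gram one word at a time."""
    ngram = [random.choice(VOCAB)]
    while random.random() > stop_prob:
        ngram.append(random.choice(VOCAB))
    return tuple(ngram)

def pitman_yor_crp_draw(tables, a=0.5, b=1.0):
    """Seat one customer: join existing table j with probability (c_j - a)/(c. + b),
    otherwise open a new table labelled by a fresh draw from the base distribution.
    Tables that happen to share a label are merged here, a slight simplification
    of the true CRP."""
    total = sum(tables.values())
    r = random.random() * (total + b)
    for label, count in tables.items():
        r -= count - a
        if r <= 0:
            tables[label] += 1          # reuse a cached n-gram
            return label
    label = base_draw()                 # new table: y_j ~ Omega
    tables[label] = tables.get(label, 0) + 1
    return label

if __name__ == "__main__":
    random.seed(0)
    tables = {}                         # n-gram -> number of patrons seated
    for _ in range(200):
        pitman_yor_crp_draw(tables)
    # Heavily reused n-grams behave like the cached collocations discussed below.
    print(sorted(tables.items(), key=lambda kv: -kv[1])[:5])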

31 We will then use the PCFG distribution as the base distribution for a Pitman-Yor distribution, adapting the na¨ıve Bayes process to give us a distribution over n-grams, thus learning new language substructures that are useful for modeling the differences in perspective. [sent-80, score-0.411]

32 It posits that there are K distinct categories of text each with a distinct distribution over words and that every document, represented as an exchangeable bag of words, is drawn from one (and only one) of these distributions. [sent-83, score-0.213]

33 1. Draw a global distribution over classes θ ∼ Dir (α). [sent-86, score-0.143]

34 2. For each class i ∈ {1, . . . , K}, draw a word distribution φi ∼ Dir (λ). [sent-90, score-0.167]

35 3. For each document d, draw a class zd ∼ Mult (θ); then for each word position n ∈ {1, . . . , Nd}, draw wd,n ∼ Mult (φzd). A variant of the na¨ıve Bayes generative process can be expressed using the adaptor grammar formalism (Table 1). [sent-97, score-0.61]
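
A minimal numerical sketch of this three-step generative process, using numpy; the dimension sizes and hyperparameter values are placeholders, not the paper's settings.

import numpy as np

rng = np.random.default_rng(0)

K, V, M, N_d = 2, 1000, 5, 50        # classes, vocabulary size, documents, words per document
alpha, lam = 1.0, 0.1                # Dirichlet hyperparameters (placeholders)

theta = rng.dirichlet(alpha * np.ones(K))          # 1. global distribution over classes
phi = rng.dirichlet(lam * np.ones(V), size=K)      # 2. per-class word distributions

docs = []
for d in range(M):
    z_d = rng.choice(K, p=theta)                   # 3a. class label for document d
    words = rng.choice(V, size=N_d, p=phi[z_d])    # 3b. words drawn i.i.d. from phi_{z_d}
    docs.append((z_d, words))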

36 Table 1: A na¨ıve Bayes-inspired model expressed as a PCFG. [sent-109, score-0.276]

37 One can assume a symmetric Dirichlet prior of Dir (1¯) over the production choices unless otherwise specified, as with the DOCd production rule above, where a sparse prior is used. [sent-111, score-0.15]

38 If the distribution over per-sentence labels is sparse (as it is above for DOCd), this will closely approximate na¨ıve Bayes, since it will be very unlikely for the sentences in a document to have different labels. [sent-115, score-0.17]

39 A non-sparse prior leads to behavior more like models that allow parts of a document to express sentiment or perspective differently. [sent-116, score-0.251]

40 2 Moving Beyond the Bag of Words. The na¨ıve Bayes generative distribution posits that when writing a document, the author selects a category zd for the document from θ. [sent-118, score-0.472]

41 The author then generates words one at a time: each word is selected independently from a flat multinomial distribution φzd over the vocabulary. [sent-119, score-0.135]

42 Clearly words are often connected with each other as collocations, and, just as clearly, extending a flat vocabulary to include bigram collocations does not suffice, since sometimes relevant perspective-bearing phrases are longer than two words. [sent-121, score-0.218]

43 Consider phrases like health care for all or government takeover of health care, connected with progressive and conservative positions, respectively, during the national debate on healthcare reform. [sent-122, score-0.438]

44 Simply applying na¨ıve Bayes, or any other model, to a bag of n-grams for high n is … Figure 1: A plate diagram for na¨ıve Bayes (panel a) and adaptive na¨ıve Bayes. [sent-123, score-0.152]

45 Following Johnson (2010), however, we can use adaptor grammars to extend na¨ıve Bayes flexibly to include richer structure like collocations when they improve the model, and to leave them out when they do not. [sent-126, score-0.542]

46 This can be accomplished by introducing adapted nonterminal rules: in a revised generative process, the author can draw from a Pitman-Yor distribution whose base distribution is over word sequences of arbitrary length. [sent-127, score-0.619]

47 Note the following differences between Figures 1(a) and 1(b): zd selects which Pitman-Yor distribution to draw from for document d. [sent-130, score-0.234]

48 Ω is the Pitman-Yor base distribution with τ as its uniform hyperparameter. [sent-133, score-0.205]

49 As defined above, the base distribution is that of the PCFG production rule WORDSi. [sent-134, score-0.318]

50 Returning to the CRP metaphor discussed when we introduced the Pitman-Yor distribution, there are two restaurants, one for the PROGRESSIVE distribution and one for the CONSERVATIVE distribution. [sent-136, score-0.137]

51 There is no such table in the CONSERVATIVE restaurant, so in order to generate those words, the phrase health care for all would need to come from a new table; however, it is more easily explained by three customers sitting at three existing, popular tables: health care, for, and all. [sent-138, score-0.282]
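
A toy computation makes the point: with invented counts and hyperparameters, reusing three popular existing tables is far more probable than opening a new table whose four-word label must come from the base distribution.

# Toy illustration with invented counts; a = discount, b = strength.
a, b = 0.5, 1.0
c_total = 1000.0                       # patrons already seated in this restaurant
counts = {"health care": 60, "for": 200, "all": 150}

def p_join(c_j):
    return (c_j - a) / (c_total + b)

# Reuse three existing tables: "health care", "for", and "all".
p_reuse = p_join(counts["health care"]) * p_join(counts["for"]) * p_join(counts["all"])

# Open a new table: new-table mass times the base probability of the 4-gram.
t = 500                                # occupied tables (invented)
p_base_ngram = (1.0 / 5000) ** 4       # e.g. four uniform draws from a 5000-word vocabulary
p_new = (b + a * t) / (c_total + b) * p_base_ngram

print(p_reuse, p_new)                  # p_reuse is larger by many orders of magnitude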

52 The grammar for adaptive na¨ıve Bayes is shown in Table 2. [sent-140, score-0.213]

53 The adapted COLLOCi rule means that every time we need to generate that nonterminal, we are actually drawing from a distribution drawn from a Pitman-Yor distribution. [sent-141, score-0.226]

54 The distribution over the possible yields of the WORDSi rule serves as the base distribution. [sent-142, score-0.205]

55 For completeness, we also consider the alternative of using a shared base distribution rather than distinguishing the base distributions of the two classes. [sent-145, score-0.307]

56 Table 2: An adaptive na¨ıve Bayes grammar. [sent-153, score-0.67]

57 The COLLOCi nonterminal’s distribution over yields is drawn from a Pitman-Yor distribution rather than a Dirichlet over production rules. [sent-154, score-0.283]

58 Table 3: An adaptive na¨ıve Bayes grammar with a common base distribution for collocations. [sent-162, score-0.694]

59 Briefly, using a shared base distribution posits that the two classes use similar word distributions, but generate collocations unique to each class, whereas using separate base distributions assumes that the distribution of words is unique to each class. [sent-164, score-0.61]

60 1 Corpus Description. We conducted our classification experiments on the Bitter Lemons (BL) corpus, which is a collection of 297 essays averaging 700-800 words in length, on various Middle East issues, written from both the Israeli and Palestinian perspectives. [sent-166, score-0.145]

61 Figure 2: An alternative adaptive na¨ıve Bayes with a common base distribution for both classes. [sent-171, score-0.323]

62 The classification goal for this corpus is to label each document with the perspective of its author, either Israeli or Palestinian. [sent-176, score-0.282]

63 2 Experimental Setup. The vocabulary generator determines the vocabulary used by a given experiment by converting the training set to lower case, stemming with the Porter stemmer, and filtering punctuation. [sent-181, score-0.154]
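
A sketch of that vocabulary-generation step; the paper does not name a toolkit, so the NLTK Porter stemmer and the token-level punctuation stripping below are assumptions.

import string
from nltk.stem import PorterStemmer

def build_vocabulary(training_docs):
    """Lower-case, Porter-stem, and strip punctuation, as described above."""
    stemmer = PorterStemmer()
    vocab = set()
    for doc in training_docs:
        for token in doc.lower().split():
            token = token.strip(string.punctuation)
            if token:
                vocab.add(stemmer.stem(token))
    return sorted(vocab)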

64 The vocabulary is then passed to a grammar generator and a corpus filter. [sent-183, score-0.193]

65 The grammar generator uses the vocabulary to generate the terminating rules of the grammar from the ANB grammar presented in Tables 2 and 3. [sent-184, score-0.383]
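
A hedged sketch of such a grammar generator; the rule syntax and the WORDi nonterminal naming follow the grammar tables described in the text, but the exact file format expected by the inference engine is an assumption.

def generate_terminating_rules(vocab, num_classes=2):
    """Emit one terminating rule per class i and vocabulary item v, mirroring the
    WORDi -> v productions in the grammar tables; the textual rule format that the
    inference engine actually expects is an assumption."""
    rules = []
    for i in range(1, num_classes + 1):
        for v in vocab:
            rules.append(f"WORD{i} --> {v}")
    return rules

# Example: two classes and a three-word vocabulary.
print(generate_terminating_rules(["health", "care", "tax"]))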

66 The test and training set are then sent, along with the grammar, into the adaptor grammar inference engine. [sent-188, score-0.466]

67 We identify that distribution from each of the test-set sentence parses and use it as the sentence-level classification for that sentence. [sent-194, score-0.166]

68 We then use majority rule on the individual sentence classifications in a document to obtain the document classification. [sent-195, score-0.134]
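
A minimal sketch of that majority-rule step; the tie-breaking behavior is an arbitrary choice, since the excerpt does not specify one.

from collections import Counter

def classify_document(sentence_labels):
    """Label a document by majority vote over its per-sentence labels
    (ties broken arbitrarily by Counter ordering)."""
    return Counter(sentence_labels).most_common(1)[0][0]

print(classify_document(["ISRAELI", "PALESTINIAN", "ISRAELI"]))   # -> ISRAELI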

69 ANB* uses the same grammar as Adapted Na¨ıve Bayes, but with adaptation disabled. [sent-206, score-0.13]

70 Com and Sep refer to whether the base distribution was common to both classes or separate. [sent-207, score-0.245]

71 ANB* refers to the grammar from Table 2, but with adaptation disabled. [sent-211, score-0.13]

72 The reported accuracy values for ANB*, ANB with a common base distribution (see Table 3), and ANB with separate base distributions (see Table 2) are the mean values from five separate sampling chains. [sent-212, score-0.307]

73 Consistent with all prior work on this corpus, we found that the classification accuracy for training on editors and testing on guests was lower than in the other direction, since the larger number of contributors in the guest set allows for greater generalization. [sent-215, score-0.238]

74 The difference between ANB* and ANB with a common base distribution is not statistically significant. [sent-216, score-0.205]

75 Also of note is that the classification accuracy improves for testing on Guests when the ANB grammar is allowed to adapt and a separate base distribution is used for the two classes (88. [sent-217, score-0.403]

76 The column labeled unique unigrams cached indicates the number of unique unigrams that appear on the right hand side of the adapted rules. [sent-221, score-0.298]

77 Similarly, unique n-grams cached indicates the number of unique n-grams that appear on the right hand side of the adapted rules. [sent-222, score-0.224]

78 The rightmost column indicates the percentage of terms from the group vocabulary that appear on the right hand side of adapted rules as unigrams. [sent-223, score-0.145]

79 Values less than 100% indicate that the remaining vocabulary terms are cached in n-grams. [sent-224, score-0.135]

80 Inspection of the captured bigrams showed that the model captured sequences that a human might associate with one perspective over the other. [sent-226, score-0.209]

81 Table 6 lists just a few of the more charged bigrams that were captured in the adapted rules. [sent-227, score-0.146]

82 This data clearly demonstrates that raw n-gram frequency alone is not indicative of how many times an n-gram is used as a cached rule. [sent-229, score-0.135]

83 For example, consider the bigram people go, which is used as a cached bigram only three times, yet appears in the corpus 407 times. [sent-230, score-0.223]

84 Compare that with isra palestinian, which is cached the same number of times but appears only 18 times in the corpus. [sent-231, score-0.135]

85 4 Conclusions In this paper, we have applied adaptor grammars in a supervised setting to model lexical properties of text and improve document classification according to perspective, by allowing nonparametric discovery of collocations that aid in perspective classification. [sent-236, score-0.948]

86 The adaptive na¨ıve Bayes model improves on state of the art supervised classification performance in head-to-head comparisons with previous approaches. [sent-237, score-0.181]

87 Although there have been many investigations on the efficacy of using multiword collocations in text classification (Bekkerman and Allan, 2004), usually such approaches depend on a preprocessing step such as computing tf-idf or other measures of frequency based on either word bigrams (Tan et al. [sent-238, score-0.238]

88 (2006) argue, and as we have confirmed here, the adaptor … Table 7: Most frequently used cached bigrams. [sent-245, score-0.472]

89 The first column in each section is the number of times that bigram was used as a cached rule. [sent-246, score-0.179]

90 grammar formalism makes it quite easy to work with latent variable models, in order to automatically discover structures in the data that have predictive value. [sent-248, score-0.187]

91 (2006), which is straightforward to encode using the adaptor grammar formalism simply by introducing two new nonterminals to represent the neutral distribution: SENT → DOCd, d = 1, . . . [sent-251, score-0.56]

92 . . . , m; WORDSi → WORDSi WORDi, i ∈ {1, K}; WORDSi → WORDi, i ∈ {1, K}; WORDi → v, v ∈ V, i ∈ {1, K}; NEUTS → NEUTS NEUT; NEUTS → NEUT; NEUT → v, v ∈ V. Running this grammar did not produce improvements consistent with those reported by Lin et al. [sent-260, score-0.279]

93 We plan to investigate this further, and a natural follow-on would be to experiment with adaptation for this variety of latent structure, to produce an adapted LSPM-like model analogous to adaptive na¨ıve Bayes. [sent-261, score-0.275]

94 , 2008)), it is clear that lexical evidence is one key to understanding how language is used to frame discussion from one perspective or another; Resnik and Greene (2009) have shown that syntactic choices can provide important evidence, as well. [sent-268, score-0.186]

95 Another promising direction for this work is the application of adaptor grammar models as a way to capture both lexical and grammatical aspects of framing in a unified model. [sent-269, score-0.478]

96 We are particularly grateful to Mark Johnson for making his adaptor grammar code available. [sent-275, score-0.432]

97 Improving nonparametric Bayesian inference: experiments on unsupervised word segmentation with adaptor grammars. [sent-308, score-0.386]

98 Adaptor grammars: A framework for specifying compositional nonparametric Bayesian models. [sent-313, score-0.124]

99 Using adaptor grammars to identify synergies in the unsupervised acquisition of linguistic structure. [sent-317, score-0.424]

100 PCFGs, topic models, adaptor grammars and learning topical collocations and the structure of proper names. [sent-321, score-0.542]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('adaptor', 0.337), ('bayes', 0.259), ('wordsi', 0.256), ('na', 0.253), ('ve', 0.234), ('anb', 0.217), ('wordi', 0.217), ('colloci', 0.158), ('spani', 0.158), ('perspective', 0.152), ('cached', 0.135), ('nonparametric', 0.124), ('collocations', 0.118), ('adaptive', 0.118), ('johnson', 0.109), ('distribution', 0.103), ('base', 0.102), ('idd', 0.099), ('grammar', 0.095), ('ii', 0.092), ('adapted', 0.089), ('grammars', 0.087), ('health', 0.087), ('guests', 0.084), ('conservative', 0.084), ('essays', 0.082), ('bitter', 0.079), ('docd', 0.079), ('docdd', 0.079), ('lemons', 0.079), ('neut', 0.079), ('progressive', 0.079), ('nonterminal', 0.071), ('zd', 0.07), ('care', 0.069), ('crp', 0.068), ('document', 0.067), ('political', 0.067), ('draw', 0.064), ('classification', 0.063), ('restaurant', 0.062), ('formalism', 0.059), ('guest', 0.059), ('laver', 0.059), ('lspm', 0.059), ('mullen', 0.059), ('seated', 0.059), ('sits', 0.059), ('bigrams', 0.057), ('pcfg', 0.057), ('greene', 0.056), ('vocabulary', 0.056), ('generative', 0.055), ('resnik', 0.054), ('contributors', 0.051), ('bayesian', 0.049), ('yj', 0.047), ('opus', 0.046), ('framing', 0.046), ('bigram', 0.044), ('production', 0.043), ('generator', 0.042), ('posits', 0.042), ('pcfgs', 0.042), ('lin', 0.04), ('classes', 0.04), ('bekkerman', 0.039), ('hardisty', 0.039), ('imagine', 0.039), ('lowd', 0.039), ('monroe', 0.039), ('orbanz', 0.039), ('patrons', 0.039), ('raskutti', 0.039), ('restaurants', 0.039), ('sitting', 0.039), ('palestinian', 0.039), ('israeli', 0.037), ('rewrite', 0.037), ('umiacs', 0.037), ('unigrams', 0.037), ('adaptation', 0.035), ('neutral', 0.035), ('customer', 0.035), ('draws', 0.035), ('inference', 0.034), ('drawn', 0.034), ('metaphor', 0.034), ('tax', 0.034), ('cache', 0.034), ('discount', 0.034), ('observable', 0.034), ('pitman', 0.034), ('bag', 0.034), ('frame', 0.034), ('nonterminals', 0.034), ('latent', 0.033), ('government', 0.032), ('prior', 0.032), ('author', 0.032)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000007 81 emnlp-2010-Modeling Perspective Using Adaptor Grammars

Author: Eric Hardisty ; Jordan Boyd-Graber ; Philip Resnik

Abstract: Strong indications of perspective can often come from collocations of arbitrary length; for example, someone writing get the government out of my X is typically expressing a conservative rather than progressive viewpoint. However, going beyond unigram or bigram features in perspective classification gives rise to problems of data sparsity. We address this problem using nonparametric Bayesian modeling, specifically adaptor grammars (Johnson et al., 2006). We demonstrate that an adaptive na¨ ıve Bayes model captures multiword lexical usages associated with perspective, and establishes a new state-of-the-art for perspective classification results using the Bitter Lemons corpus, a collection of essays about mid-east issues from Israeli and Palestinian points of view.

2 0.098141022 96 emnlp-2010-Self-Training with Products of Latent Variable Grammars

Author: Zhongqiang Huang ; Mary Harper ; Slav Petrov

Abstract: Mary Harper†‡ ‡HLT Center of Excellence Johns Hopkins University Baltimore, MD mharpe r@ umd .edu Slav Petrov∗ ∗Google Research 76 Ninth Avenue New York, NY s lav@ google . com ting the training data and eventually begins over- fitting (Liang et al., 2007). Moreover, EM is a loWe study self-training with products of latent variable grammars in this paper. We show that increasing the quality of the automatically parsed data used for self-training gives higher accuracy self-trained grammars. Our generative self-trained grammars reach F scores of 91.6 on the WSJ test set and surpass even discriminative reranking systems without selftraining. Additionally, we show that multiple self-trained grammars can be combined in a product model to achieve even higher accuracy. The product model is most effective when the individual underlying grammars are most diverse. Combining multiple grammars that were self-trained on disjoint sets of unlabeled data results in a final test accuracy of 92.5% on the WSJ test set and 89.6% on our Broadcast News test set.

3 0.096843064 58 emnlp-2010-Holistic Sentiment Analysis Across Languages: Multilingual Supervised Latent Dirichlet Allocation

Author: Jordan Boyd-Graber ; Philip Resnik

Abstract: In this paper, we develop multilingual supervised latent Dirichlet allocation (MLSLDA), a probabilistic generative model that allows insights gleaned from one language’s data to inform how the model captures properties of other languages. MLSLDA accomplishes this by jointly modeling two aspects of text: how multilingual concepts are clustered into thematically coherent topics and how topics associated with text connect to an observed regression variable (such as ratings on a sentiment scale). Concepts are represented in a general hierarchical framework that is flexible enough to express semantic ontologies, dictionaries, clustering constraints, and, as a special, degenerate case, conventional topic models. Both the topics and the regression are discovered via posterior inference from corpora. We show MLSLDA can build topics that are consistent across languages, discover sensible bilingual lexical correspondences, and leverage multilingual corpora to better predict sentiment.

4 0.09290877 113 emnlp-2010-Unsupervised Induction of Tree Substitution Grammars for Dependency Parsing

Author: Phil Blunsom ; Trevor Cohn

Abstract: Inducing a grammar directly from text is one of the oldest and most challenging tasks in Computational Linguistics. Significant progress has been made for inducing dependency grammars, however the models employed are overly simplistic, particularly in comparison to supervised parsing models. In this paper we present an approach to dependency grammar induction using tree substitution grammar which is capable of learning large dependency fragments and thereby better modelling the text. We define a hierarchical non-parametric Pitman-Yor Process prior which biases towards a small grammar with simple productions. This approach significantly improves the state-of-the-art, when measured by head attachment accuracy.

5 0.087578051 100 emnlp-2010-Staying Informed: Supervised and Semi-Supervised Multi-View Topical Analysis of Ideological Perspective

Author: Amr Ahmed ; Eric Xing

Abstract: With the proliferation of user-generated articles over the web, it becomes imperative to develop automated methods that are aware of the ideological-bias implicit in a document collection. While there exist methods that can classify the ideological bias of a given document, little has been done toward understanding the nature of this bias on a topical-level. In this paper we address the problem ofmodeling ideological perspective on a topical level using a factored topic model. We develop efficient inference algorithms using Collapsed Gibbs sampling for posterior inference, and give various evaluations and illustrations of the utility of our model on various document collections with promising results. Finally we give a Metropolis-Hasting inference algorithm for a semi-supervised extension with decent results.

6 0.084346019 116 emnlp-2010-Using Universal Linguistic Knowledge to Guide Grammar Induction

7 0.081234269 57 emnlp-2010-Hierarchical Phrase-Based Translation Grammars Extracted from Alignment Posterior Probabilities

8 0.079844572 97 emnlp-2010-Simple Type-Level Unsupervised POS Tagging

9 0.072496586 6 emnlp-2010-A Latent Variable Model for Geographic Lexical Variation

10 0.069599062 33 emnlp-2010-Cross Language Text Classification by Model Translation and Semi-Supervised Learning

11 0.069092102 64 emnlp-2010-Incorporating Content Structure into Text Analysis Applications

12 0.067159198 98 emnlp-2010-Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions

13 0.061105922 114 emnlp-2010-Unsupervised Parse Selection for HPSG

14 0.060643002 34 emnlp-2010-Crouching Dirichlet, Hidden Markov Model: Unsupervised POS Tagging with Context Local Tag Generation

15 0.056596946 118 emnlp-2010-Utilizing Extra-Sentential Context for Parsing

16 0.056239784 17 emnlp-2010-An Efficient Algorithm for Unsupervised Word Segmentation with Branching Entropy and MDL

17 0.054237794 67 emnlp-2010-It Depends on the Translation: Unsupervised Dependency Parsing via Word Alignment

18 0.052086037 79 emnlp-2010-Mining Name Translations from Entity Graph Mapping

19 0.050271124 61 emnlp-2010-Improving Gender Classification of Blog Authors

20 0.049776435 111 emnlp-2010-Two Decades of Unsupervised POS Induction: How Far Have We Come?


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.194), (1, 0.093), (2, 0.011), (3, -0.112), (4, 0.108), (5, -0.029), (6, -0.013), (7, -0.027), (8, 0.012), (9, -0.061), (10, 0.122), (11, -0.008), (12, 0.014), (13, 0.072), (14, -0.006), (15, 0.019), (16, -0.044), (17, 0.011), (18, -0.052), (19, 0.073), (20, 0.052), (21, -0.228), (22, -0.052), (23, 0.026), (24, 0.015), (25, 0.052), (26, -0.009), (27, -0.048), (28, -0.172), (29, 0.027), (30, -0.091), (31, -0.061), (32, -0.129), (33, 0.004), (34, -0.096), (35, 0.131), (36, 0.19), (37, 0.117), (38, -0.032), (39, -0.132), (40, -0.024), (41, -0.146), (42, 0.128), (43, -0.048), (44, -0.162), (45, 0.038), (46, 0.105), (47, -0.219), (48, 0.118), (49, -0.319)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96000463 81 emnlp-2010-Modeling Perspective Using Adaptor Grammars

Author: Eric Hardisty ; Jordan Boyd-Graber ; Philip Resnik

Abstract: Strong indications of perspective can often come from collocations of arbitrary length; for example, someone writing get the government out of my X is typically expressing a conservative rather than progressive viewpoint. However, going beyond unigram or bigram features in perspective classification gives rise to problems of data sparsity. We address this problem using nonparametric Bayesian modeling, specifically adaptor grammars (Johnson et al., 2006). We demonstrate that an adaptive na¨ ıve Bayes model captures multiword lexical usages associated with perspective, and establishes a new state-of-the-art for perspective classification results using the Bitter Lemons corpus, a collection of essays about mid-east issues from Israeli and Palestinian points of view.

2 0.47845164 96 emnlp-2010-Self-Training with Products of Latent Variable Grammars

Author: Zhongqiang Huang ; Mary Harper ; Slav Petrov

Abstract: Mary Harper†‡ ‡HLT Center of Excellence Johns Hopkins University Baltimore, MD mharpe r@ umd .edu Slav Petrov∗ ∗Google Research 76 Ninth Avenue New York, NY s lav@ google . com ting the training data and eventually begins over- fitting (Liang et al., 2007). Moreover, EM is a loWe study self-training with products of latent variable grammars in this paper. We show that increasing the quality of the automatically parsed data used for self-training gives higher accuracy self-trained grammars. Our generative self-trained grammars reach F scores of 91.6 on the WSJ test set and surpass even discriminative reranking systems without selftraining. Additionally, we show that multiple self-trained grammars can be combined in a product model to achieve even higher accuracy. The product model is most effective when the individual underlying grammars are most diverse. Combining multiple grammars that were self-trained on disjoint sets of unlabeled data results in a final test accuracy of 92.5% on the WSJ test set and 89.6% on our Broadcast News test set.

3 0.46080807 113 emnlp-2010-Unsupervised Induction of Tree Substitution Grammars for Dependency Parsing

Author: Phil Blunsom ; Trevor Cohn

Abstract: Inducing a grammar directly from text is one of the oldest and most challenging tasks in Computational Linguistics. Significant progress has been made for inducing dependency grammars, however the models employed are overly simplistic, particularly in comparison to supervised parsing models. In this paper we present an approach to dependency grammar induction using tree substitution grammar which is capable of learning large dependency fragments and thereby better modelling the text. We define a hierarchical non-parametric Pitman-Yor Process prior which biases towards a small grammar with simple productions. This approach significantly improves the state-of-the-art, when measured by head attachment accuracy.

4 0.37073427 6 emnlp-2010-A Latent Variable Model for Geographic Lexical Variation

Author: Jacob Eisenstein ; Brendan O'Connor ; Noah A. Smith ; Eric P. Xing

Abstract: The rapid growth of geotagged social media raises new computational possibilities for investigating geographic linguistic variation. In this paper, we present a multi-level generative model that reasons jointly about latent topics and geographical regions. High-level topics such as “sports” or “entertainment” are rendered differently in each geographic region, revealing topic-specific regional distinctions. Applied to a new dataset of geotagged microblogs, our model recovers coherent topics and their regional variants, while identifying geographic areas of linguistic consistency. The model also enables prediction of an author’s geographic location from raw text, outperforming both text regression and supervised topic models.

5 0.32843396 100 emnlp-2010-Staying Informed: Supervised and Semi-Supervised Multi-View Topical Analysis of Ideological Perspective

Author: Amr Ahmed ; Eric Xing

Abstract: With the proliferation of user-generated articles over the web, it becomes imperative to develop automated methods that are aware of the ideological-bias implicit in a document collection. While there exist methods that can classify the ideological bias of a given document, little has been done toward understanding the nature of this bias on a topical-level. In this paper we address the problem ofmodeling ideological perspective on a topical level using a factored topic model. We develop efficient inference algorithms using Collapsed Gibbs sampling for posterior inference, and give various evaluations and illustrations of the utility of our model on various document collections with promising results. Finally we give a Metropolis-Hasting inference algorithm for a semi-supervised extension with decent results.

6 0.31535926 58 emnlp-2010-Holistic Sentiment Analysis Across Languages: Multilingual Supervised Latent Dirichlet Allocation

7 0.31113124 116 emnlp-2010-Using Universal Linguistic Knowledge to Guide Grammar Induction

8 0.2721405 79 emnlp-2010-Mining Name Translations from Entity Graph Mapping

9 0.24799657 21 emnlp-2010-Automatic Discovery of Manner Relations and its Applications

10 0.24582627 17 emnlp-2010-An Efficient Algorithm for Unsupervised Word Segmentation with Branching Entropy and MDL

11 0.23029585 114 emnlp-2010-Unsupervised Parse Selection for HPSG

12 0.22382198 39 emnlp-2010-EMNLP 044

13 0.22056845 80 emnlp-2010-Modeling Organization in Student Essays

14 0.2193763 111 emnlp-2010-Two Decades of Unsupervised POS Induction: How Far Have We Come?

15 0.21086419 61 emnlp-2010-Improving Gender Classification of Blog Authors

16 0.20057493 57 emnlp-2010-Hierarchical Phrase-Based Translation Grammars Extracted from Alignment Posterior Probabilities

17 0.1917073 118 emnlp-2010-Utilizing Extra-Sentential Context for Parsing

18 0.19007975 4 emnlp-2010-A Game-Theoretic Approach to Generating Spatial Descriptions

19 0.18457021 34 emnlp-2010-Crouching Dirichlet, Hidden Markov Model: Unsupervised POS Tagging with Context Local Tag Generation

20 0.18202557 99 emnlp-2010-Statistical Machine Translation with a Factorized Grammar


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.013), (10, 0.017), (12, 0.041), (26, 0.363), (29, 0.121), (30, 0.047), (32, 0.017), (52, 0.022), (56, 0.073), (62, 0.013), (66, 0.089), (72, 0.049), (76, 0.029), (77, 0.018), (79, 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.73622572 81 emnlp-2010-Modeling Perspective Using Adaptor Grammars

Author: Eric Hardisty ; Jordan Boyd-Graber ; Philip Resnik

Abstract: Strong indications of perspective can often come from collocations of arbitrary length; for example, someone writing get the government out of my X is typically expressing a conservative rather than progressive viewpoint. However, going beyond unigram or bigram features in perspective classification gives rise to problems of data sparsity. We address this problem using nonparametric Bayesian modeling, specifically adaptor grammars (Johnson et al., 2006). We demonstrate that an adaptive na¨ ıve Bayes model captures multiword lexical usages associated with perspective, and establishes a new state-of-the-art for perspective classification results using the Bitter Lemons corpus, a collection of essays about mid-east issues from Israeli and Palestinian points of view.

2 0.43482286 89 emnlp-2010-PEM: A Paraphrase Evaluation Metric Exploiting Parallel Texts

Author: Chang Liu ; Daniel Dahlmeier ; Hwee Tou Ng

Abstract: We present PEM, the first fully automatic metric to evaluate the quality of paraphrases, and consequently, that of paraphrase generation systems. Our metric is based on three criteria: adequacy, fluency, and lexical dissimilarity. The key component in our metric is a robust and shallow semantic similarity measure based on pivot language N-grams that allows us to approximate adequacy independently of lexical similarity. Human evaluation shows that PEM achieves high correlation with human judgments.

3 0.43089503 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar

Author: Kristian Woodsend ; Yansong Feng ; Mirella Lapata

Abstract: The task of selecting information and rendering it appropriately appears in multiple contexts in summarization. In this paper we present a model that simultaneously optimizes selection and rendering preferences. The model operates over a phrase-based representation of the source document which we obtain by merging PCFG parse trees and dependency graphs. Selection preferences for individual phrases are learned discriminatively, while a quasi-synchronous grammar (Smith and Eisner, 2006) captures rendering preferences such as paraphrases and compressions. Based on an integer linear programming formulation, the model learns to generate summaries that satisfy both types of preferences, while ensuring that length, topic coverage and grammar constraints are met. Experiments on headline and image caption generation show that our method obtains state-of-the-art performance using essentially the same model for both tasks without any major modifications.

4 0.42943653 78 emnlp-2010-Minimum Error Rate Training by Sampling the Translation Lattice

Author: Samidh Chatterjee ; Nicola Cancedda

Abstract: Minimum Error Rate Training is the algorithm for log-linear model parameter training most used in state-of-the-art Statistical Machine Translation systems. In its original formulation, the algorithm uses N-best lists output by the decoder to grow the Translation Pool that shapes the surface on which the actual optimization is performed. Recent work has been done to extend the algorithm to use the entire translation lattice built by the decoder, instead of N-best lists. We propose here a third, intermediate way, consisting in growing the translation pool using samples randomly drawn from the translation lattice. We empirically measure a systematic im- provement in the BLEU scores compared to training using N-best lists, without suffering the increase in computational complexity associated with operating with the whole lattice.

5 0.4290362 65 emnlp-2010-Inducing Probabilistic CCG Grammars from Logical Form with Higher-Order Unification

Author: Tom Kwiatkowksi ; Luke Zettlemoyer ; Sharon Goldwater ; Mark Steedman

Abstract: This paper addresses the problem of learning to map sentences to logical form, given training data consisting of natural language sentences paired with logical representations of their meaning. Previous approaches have been designed for particular natural languages or specific meaning representations; here we present a more general method. The approach induces a probabilistic CCG grammar that represents the meaning of individual words and defines how these meanings can be combined to analyze complete sentences. We use higher-order unification to define a hypothesis space containing all grammars consistent with the training data, and develop an online learning algorithm that efficiently searches this space while simultaneously estimating the parameters of a log-linear parsing model. Experiments demonstrate high accuracy on benchmark data sets in four languages with two different meaning representations.

6 0.42617869 87 emnlp-2010-Nouns are Vectors, Adjectives are Matrices: Representing Adjective-Noun Constructions in Semantic Space

7 0.42576498 7 emnlp-2010-A Mixture Model with Sharing for Lexical Semantics

8 0.42543113 57 emnlp-2010-Hierarchical Phrase-Based Translation Grammars Extracted from Alignment Posterior Probabilities

9 0.42517146 18 emnlp-2010-Assessing Phrase-Based Translation Models with Oracle Decoding

10 0.42372027 63 emnlp-2010-Improving Translation via Targeted Paraphrasing

11 0.42364773 34 emnlp-2010-Crouching Dirichlet, Hidden Markov Model: Unsupervised POS Tagging with Context Local Tag Generation

12 0.42263508 116 emnlp-2010-Using Universal Linguistic Knowledge to Guide Grammar Induction

13 0.42075074 98 emnlp-2010-Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions

14 0.42044088 32 emnlp-2010-Context Comparison of Bursty Events in Web Search and Online Media

15 0.41930127 82 emnlp-2010-Multi-Document Summarization Using A* Search and Discriminative Learning

16 0.41808373 58 emnlp-2010-Holistic Sentiment Analysis Across Languages: Multilingual Supervised Latent Dirichlet Allocation

17 0.417907 103 emnlp-2010-Tense Sense Disambiguation: A New Syntactic Polysemy Task

18 0.41783223 100 emnlp-2010-Staying Informed: Supervised and Semi-Supervised Multi-View Topical Analysis of Ideological Perspective

19 0.41640881 69 emnlp-2010-Joint Training and Decoding Using Virtual Nodes for Cascaded Segmentation and Tagging Tasks

20 0.41562834 67 emnlp-2010-It Depends on the Translation: Unsupervised Dependency Parsing via Word Alignment