emnlp emnlp2011 emnlp2011-1 knowledge-graph by maker-knowledge-mining

1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features


Source: pdf

Author: Christos Christodoulopoulos ; Sharon Goldwater ; Mark Steedman

Abstract: In this paper we present a fully unsupervised syntactic class induction system formulated as a Bayesian multinomial mixture model, where each word type is constrained to belong to a single class. By using a mixture model rather than a sequence model (e.g., HMM), we are able to easily add multiple kinds of features, including those at both the type level (morphology features) and token level (context and alignment features, the latter from parallel corpora). Using only context features, our system yields results comparable to the state of the art, far better than a similar model without the one-class-per-type constraint. Using the additional features provides added benefit, and our final system outperforms the best published results on most of the 25 corpora tested.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 In this paper we present a fully unsupervised syntactic class induction system formulated as a Bayesian multinomial mixture model, where each word type is constrained to belong to a single class. [sent-4, score-0.605]

2 By using a mixture model rather than a sequence model (e.g., HMM), we can easily add multiple kinds of features. [sent-5, score-0.162]

3 These include features at both the type level (morphology features) and the token level (context and alignment features, the latter from parallel corpora). [sent-7, score-0.397]

4 Using only context features, our system yields results comparable to the state of the art, far better than a similar model without the one-class-per-type constraint. [sent-8, score-0.135]

5 Using the additional features provides added benefit, and our final system outperforms the best published results on most of the 25 corpora tested. [sent-9, score-0.266]

6 The task is more commonly referred to as part-of-speech induction, but we prefer the term syntactic class induction since the induced classes may not coincide with part-of-speech tags. [sent-13, score-0.329]

7 This fact suggests that we should consider which features of the older systems led to their success, and attempt to combine these features with some of the machine learning methods introduced by the more recent systems. [sent-22, score-0.254]

8 Since this is much better than the performance of current unsupervised syntactic class induction systems, constraining the model in this way seems likely to improve performance by reducing the number of parameters in the model and incorporating useful linguistic knowledge. [sent-27, score-0.423]

9 The difference is that the underlying probabilistic model is a clustering model (specifically, a multinomial mixture model) rather than a sequence model (HMM). [sent-36, score-0.346]

10 One advantage of using a clustering model rather than a sequence model is that the features used for clustering need not be restricted to context words. [sent-46, score-0.405]

11 Additional types of features can easily be incorporated into the model and inference procedure using the same general framework as in the basic model that uses only context word features. [sent-47, score-0.294]

12 The first extension uses morphological features, which serve as cues to syntactic class and seemed to partly explain the success of the two best-performing systems analysed by Christodoulopoulos et al. [sent-49, score-0.301]

13 The second extension to our model uses alignment features gathered from parallel corpora. [sent-51, score-0.272]

14 We evaluate our model on 25 corpora in 20 languages that vary substantially in both syntax and morphology. [sent-54, score-0.183]

15 We find that the one-class-per-type restriction boosts performance considerably over a comparable token-based model and yields results comparable to the state of the art even without the use of morphology or alignment features. [sent-56, score-0.59]

16 Including morphology features yields the best published results on 14 or 15 of our 25 corpora (depending on the measure) and alignment features can improve results further. [sent-57, score-0.881]

17 Our model is a multinomial mixture model with Bayesian priors over the mixing weights θ and the class-specific feature distributions ϕ. Figure 1: Plate diagram of the basic model with a single feature per token (the observed variable f). [sent-58, score-0.655]

18 M, Z, and nj are the number of word types, syntactic classes z, and features (= tokens) per word type, respectively. [sent-59, score-0.28]

19 The model is defined so that all observations associated with a single word type are generated from the same mixing component (syntactic class). [sent-61, score-0.166]

20 In the basic model, these observations are token-level features; the morphology model adds type-level features as well. [sent-62, score-0.54]

21 We begin by describing the simplest version of our model, where each word token is associated with a single feature, for example its left context word (the word that occurs to its left in the corpus). [sent-63, score-0.174]

22 We then show how to generalise the model to multiple token-level features and to type-level features. [sent-64, score-0.146]

23 In the basic model, each word token is represented by a single feature such as its left context word. [sent-66, score-0.321]

24 These features are the observed data; the model explains the data by assuming that it has been generated from some set of latent syntactic classes. [sent-67, score-0.192]

25 The ith class is associated with a multinomial parameter vector ϕi that defines the distribution over features generated from that class, and with a mixing weight θi that defines the prior probability of that class. [sent-68, score-0.408]

26 The generative story goes as follows: First, generate the prior class probabilities θ. [sent-70, score-0.142]

27 Then, for each word type j = 1, . . . , M, choose a class assignment zj from the distribution θ. [sent-74, score-0.573]

28 Finally, for each feature k = 1, . . . , nj of word type j, generate a feature fjk from ϕzj, the distribution associated with the class that word type j is assigned to. [sent-82, score-0.458]
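As a rough illustration of this generative story (not code from the paper; the function name, variable names, and the symmetric Dirichlet hyperparameter values are assumptions), the process can be sketched in Python as follows:

```python
import numpy as np

rng = np.random.default_rng(0)

def generate(M, Z, F, n_per_type, alpha=1.0, beta=0.1):
    """Sketch of the generative story: sample class weights and per-class
    feature distributions, assign one class per word type, then draw one
    feature per token of that type."""
    theta = rng.dirichlet([alpha] * Z)          # prior class probabilities, Dirichlet(alpha)
    phi = rng.dirichlet([beta] * F, size=Z)     # per-class feature distributions, Dirichlet(beta)
    z = rng.choice(Z, size=M, p=theta)          # class assignment z_j for each word type
    feats = [rng.choice(F, size=n_per_type[j], p=phi[z[j]])  # feature f_jk for each token of type j
             for j in range(M)]
    return z, feats
```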

29 The difference from other vector-based syntactic class induction systems is in the method of clustering. [sent-85, score-0.275]

30 Here, we define a Gibbs sampler that samples from the posterior distribution of the clusters given the observed features; other systems have used various standard distance-based vector clustering methods. [sent-86, score-0.236]

31 At inference time we want to sample a syntactic class assignment z from the posterior of the model. [sent-90, score-0.263]

32 Rather than sampling the joint class assignment P(z | f, α, β) directly, the sampler iterates over each word type j, resampling its class assignment zj given the current assignments z−j of all other word types. [sent-92, score-0.796]

33 The posterior over zj can be computed as P(zj | z−j, f, α, β) ∝ P(zj | z−j, α, β) P(fj | f−j, z, α, β) (2) where fj are the features associated with word type j (one feature for each token of j). [sent-93, score-0.946]

34 The first (prior) factor is easy to compute due to the conjugacy between the Dirichlet and multinomial distributions, and is equal to P(zj = z | z−j, α) = (nz + α) / (n· + Zα) (3) where nz is the number of types in class z and n· is the total number of word types in all classes. [sent-94, score-0.243]

35 The second (likelihood) factor is slightly more complex due to the dependencies between the different variables in fj that are induced by integrating out the ϕ parameters. [sent-99, score-0.162]

36 Consider first a simple case where word type j occurs exactly twice in the corpus, so fj contains two features. [sent-100, score-0.233]

37 The probability of the first feature fj1 is equal to P(fj1 = f | zj = z, z−j, f−j, β) = (nf,z + β) / (n·,z + Fβ) (4) where nf,z is the number of times feature f has been seen in class z, n·,z is the total number of feature tokens in the class, and F is the number of different possible features. [sent-101, score-0.304]

38 The probability of the second feature fj2 can be calculated similarly, except that it is conditioned on fj1 in addition to the other variables, so the counts for previously observed features must include the counts due to fj1 as well as those due to f−j. [sent-102, score-0.162]

39 Thus, the probability is P(fj2 = f | fj1, zj = z, z−j, f−j, β) = (nf,z + δ(fj1, fj2) + β) / (n·,z + 1 + Fβ) (5) where δ is the Kronecker delta function, equal to 1 if its arguments are equal and 0 otherwise. [sent-103, score-0.394]
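Putting Equations 2-5 together, one Gibbs update for a word type might look like the following sketch; the variable names and array layout are mine, not the authors' implementation, and the count arrays are assumed to be maintained incrementally:

```python
import numpy as np

def resample_type(j, z, features, n_class, n_fz, n_dot_z, alpha, beta, rng):
    """One collapsed Gibbs step for word type j (cf. Eqs. 2-5)."""
    F, Z = n_fz.shape
    # Remove word type j (and all of its feature tokens) from the counts.
    old = z[j]
    n_class[old] -= 1
    for f in features[j]:
        n_fz[f, old] -= 1
        n_dot_z[old] -= 1

    log_post = np.zeros(Z)
    n_types = n_class.sum()
    for c in range(Z):
        # Prior factor (Eq. 3): (n_z + alpha) / (n_. + Z * alpha).
        log_post[c] = np.log(n_class[c] + alpha) - np.log(n_types + Z * alpha)
        # Likelihood factor (Eqs. 4-5): score j's features one at a time,
        # letting the counts include the features of j already scored.
        added = {}
        for i, f in enumerate(features[j]):
            log_post[c] += np.log(n_fz[f, c] + added.get(f, 0) + beta)
            log_post[c] -= np.log(n_dot_z[c] + i + F * beta)
            added[f] = added.get(f, 0) + 1

    p = np.exp(log_post - log_post.max())
    new = rng.choice(Z, p=p / p.sum())

    # Add word type j back under its newly sampled class.
    z[j] = new
    n_class[new] += 1
    for f in features[j]:
        n_fz[f, new] += 1
        n_dot_z[new] += 1
    return new
```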

40 We can extend the model above in two different ways: by adding more features at the word token level, or by adding features at the type level. [sent-106, score-0.444]

41 To add more token-level features, we simply assume that each word token generates multiple features, one feature from each of several different kinds. [sent-107, score-0.173]

42 For example, the left context word might be one kind of feature and the right context word another. [sent-108, score-0.164]

43 We assume conditional independence between the generated features given the syntactic class, so each kind of feature t has its own output parameters. [sent-109, score-0.267]

44 A plate diagram of the model with T kinds of features is shown in Figure 2 (a type-level feature is also included in this diagram, as described below). [sent-110, score-0.397]

45 Due to the independence assumption between the different kinds of features, the basic Gibbs sampler is easy to extend to this case by simply multiplying in extra factors for the additional kinds of features, with the prior (Equation 3) unchanged. [sent-111, score-0.449]

46 P(fj(1), . . . , fj(T) | f−j, zj = z, z−j, β) = ∏t=1..T P(fj(t) | f−j(t), zj = z, z−j, β) (7) where each factor in the product is computed using Equation 6. [sent-118, score-0.394]
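A hedged sketch of how the per-kind factors of Equation 7 combine; the helper reuses the sequential scoring of Eqs. 4-5, all names are mine, counts_by_kind is assumed to hold a (n_fz, n_dot_z) pair per kind, and all counts are assumed to already exclude word type j:

```python
import numpy as np

def log_lik_one_kind(feats, c, n_fz, n_dot_z, beta, F):
    """Sequential likelihood for one kind of feature (the Eq. 6 form)."""
    total, added = 0.0, {}
    for i, f in enumerate(feats):
        total += np.log(n_fz[f, c] + added.get(f, 0) + beta)
        total -= np.log(n_dot_z[c] + i + F * beta)
        added[f] = added.get(f, 0) + 1
    return total

def log_lik_all_kinds(j, c, feats_by_kind, counts_by_kind, betas, F_by_kind):
    """Eq. 7: the kinds are conditionally independent given the class, so the
    likelihood is a product of per-kind factors (a sum in log space)."""
    return sum(
        log_lik_one_kind(feats_by_kind[t][j], c, *counts_by_kind[t], betas[t], F_by_kind[t])
        for t in range(len(betas))
    )
```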

47 In addition to monolingual context features, we also explore the use of alignment features for those languages where we have parallel corpora. [sent-119, score-0.375]

48 These features are extracted for language ℓ by word-aligning ℓ to another language ℓ′ (details of the alignment procedure are described in Section 3). [sent-120, score-0.234]

49 The features used for each token e in ℓ are the left and right context words of the word token that is aligned to e (if there is one). [sent-122, score-0.452]

50 One could approximate this likelihood term by assuming independence between all nj feature tokens of word type j. [sent-124, score-0.256]

51 However, deficient models have proven useful in other unsupervised NLP tasks (Klein and Manning, 2002; Toutanova and Johnson, 2007). [sent-129, score-0.12]

52 If we remove the part of their model that relies on the dictionary (the morphological ambiguity classes), their model is equivalent to our own, without the restriction of one class per type. [sent-131, score-0.376]

53 The final extension to our model introduces type-level features, specifically morphology features. [sent-133, score-0.377]

54 We assume conditional independence between the morphology features and other features, so again we can simply multiply another factor into the likelihood during inference. [sent-135, score-0.506]

55 There is only one morphological feature per type, so this factor has the form of Equation 4. [sent-136, score-0.167]
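For the type-level morphology feature this amounts to one extra factor of the Equation 4 form, for example (sketch only; the count arrays and names are mine):

```python
import numpy as np

def log_morph_factor(m_j, c, n_mz, n_dot_mz, beta_m, F_m):
    """Type-level morphology factor: a single feature m_j per word type, so it
    has the form of Eq. 4 and is simply added to class c's log-likelihood."""
    return np.log(n_mz[m_j, c] + beta_m) - np.log(n_dot_mz[c] + F_m * beta_m)
```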

56 Since frequent words will have many token-level features contributing to the likelihood and only one morphology feature, the morphology features will have a greater effect for infrequent words (as appropriate, since there is less evidence from context and alignments). [sent-137, score-0.949]

57 As with the other kinds of features, we use only a limited number Fm of morphology features, as described below. [sent-138, score-0.42]

58 We use the F = 100 most frequent words as features, and consider two versions of this model: one with two kinds of features (one left and one right context word) and one with four (two context words on each side). [sent-141, score-0.299]
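The text does not give extraction details, but a minimal sketch of such context features (left/right context words restricted to the F most frequent words; the catch-all OTHER index is my assumption) could look like this:

```python
from collections import Counter

def context_features(tokens, F=100, window=1):
    """One feature per kind per token: the identity of the context word at each
    offset in -window..-1, +1..+window, restricted to the F most frequent words."""
    vocab = {w: i for i, (w, _) in enumerate(Counter(tokens).most_common(F))}
    OTHER = F  # shared index for context words outside the top-F vocabulary
    offsets = [o for o in range(-window, window + 1) if o != 0]
    feats = []
    for i in range(len(tokens)):
        row = []
        for off in offsets:
            k = i + off
            w = tokens[k] if 0 <= k < len(tokens) else None  # None at corpus edges
            row.append(vocab.get(w, OTHER))
        feats.append(row)
    return feats
```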

59 For the model with morphology features we ran the unsupervised morphological segmentation system Morfessor (Creutz and Lagus, 2005). Figure 2: Plate diagram of the extended model with T kinds of token-level features (f(t) variables) and a single kind of type-level feature (morphology, m). [sent-142, score-1.06]

60 This process yielded on average Fm = 110 morphological feature types. [sent-145, score-0.167]

61 We also explore the idea of extending the morphology feature space beyond suffixes, by including features like capitalisation and punctuation. [sent-150, score-0.501]

62 Specifically, we use the features described in Haghighi and Klein (2006), namely initial-capital, contains-hyphen, and contains-digit, and we add an extra feature, contains-punctuation. [sent-151, score-0.22]
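A small sketch of these extended type-level features, following the names given above (the exact string tests are assumptions, not the paper's definitions):

```python
import string

def extended_morph_features(word):
    """Haghighi & Klein (2006)-style orthographic cues plus the added
    contains-punctuation feature."""
    return {
        "initial-capital": word[:1].isupper(),
        "contains-hyphen": "-" in word,
        "contains-digit": any(c.isdigit() for c in word),
        "contains-punctuation": any(c in string.punctuation for c in word),
    }
```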

63 For the model with alignment features, we follow Naseem et al. (2009) in using only bidirectional alignments. [sent-152, score-0.164]

64 Using Giza++ (Och and Ney, 2003), we get the word alignments in both directions between all possible language pairs in our parallel corpora. [sent-153, score-0.149]

65 As described above, we use two kinds of alignment features: the left and right context words of the aligned token in the other language. [sent-159, score-0.432]

66 Instead of fixing the hyperparameters α and β, we used the Metropolis-Hastings sampler presented by Goldwater and Griffiths (2007) to get updated values based on the likelihood of the data with respect to those hyperparameters. [sent-161, score-0.115]
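The exact sampler of Goldwater and Griffiths (2007) is not reproduced here; as a hedged stand-in, a single Metropolis-Hastings move on a hyperparameter (log-normal proposal, flat prior assumed, log_likelihood is an assumed callable) could look like:

```python
import numpy as np

def mh_step(beta, log_likelihood, rng, step=0.1):
    """Propose beta' = beta * exp(step * eps) and accept based on the data
    likelihood; the log(beta'/beta) term corrects for the asymmetric proposal."""
    proposal = beta * np.exp(step * rng.normal())
    log_accept = (log_likelihood(proposal) - log_likelihood(beta)
                  + np.log(proposal) - np.log(beta))
    return proposal if np.log(rng.random()) < log_accept else beta
```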

67 Although unsupervised systems should in principle be language- and corpus-independent, most part-of-speech induction systems (especially in the early literature) have been developed on English. [sent-167, score-0.159]

68 Since we aim to use our system mostly on non-English corpora, and ones that are significantly smaller than the large English treebank corpora, we developed our models using one of the languages of the MULTEXT-East corpus (Erjavec, 2004), namely Bulgarian. [sent-169, score-0.229]

69 Since none of the authors speak any of the languages in the MULTEXT collection [...]. For simplicity, we tied the β parameters for the two or four kinds of context features to the same value, and similarly the β parameters for the two kinds of alignment features. [sent-171, score-0.537]

70 The first is the basic k-means clustering algorithm, which we applied to the same feature vectors we extracted for our system (context + extended morphology), using a Euclidean distance metric. [sent-184, score-0.232]
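For reference, this baseline is essentially the following; the feature matrix and the number of clusters here are placeholders, not the paper's settings:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(5000, 204)   # stand-in for the per-type context + morphology vectors
# k-means with Euclidean distance (scikit-learn's default); one cluster per syntactic class.
labels = KMeans(n_clusters=45, n_init=10, random_state=0).fit_predict(X)
```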

71 The second baseline is a more recent vector-based syntactic class induction method, the SVD approach of Lamar et al. (2009). [sent-186, score-0.275]

72 ... models morphology and has produced very good results on multilingual corpora. [sent-203, score-0.383]

73 The first conclusion that can be drawn from these results is the large difference between the token- and type-based versions of our system, which confirms that the one-class-per-type restriction is helpful for unsupervised syntactic class induction. [sent-211, score-0.305]

74 We therefore used only two context words for all of our additional test languages (below). [sent-213, score-0.141]

75 We can clearly see that morphological features are helpful in both languages; however the extended features of Haghighi and Klein (2006) seem to help only on the English data. [sent-214, score-0.369]

76 This could be due to the fact that Bulgarian has a much richer morphology and thus the extra features contribute little to the overall performance of the model. [sent-215, score-0.505]

77 The contribution of the alignment features on the Bulgarian corpus (aligned with English) is less significant than that of morphology but when combined, the two sets of features yield the best performance. [sent-216, score-0.681]

78 Finally, initialisation method 2 does not yield improvements on the MULTEXT-Bulgarian corpus for various models using either ±1 or ±2 context words as features. [sent-218, score-0.167]

79 (init): Initialisation method 2 (other results use method 1); (ext): Extended morphological features. [sent-220, score-0.113]

80 We tested all possible combinations of two languages to align, and present both the average score over all alignments, and the score under the best choice of aligned language. [sent-227, score-0.137]

81 Also shown are the results of adding morphology features to the basic model (context features only) and to the best alignment model for each language. [sent-228, score-0.812]

82 As with the development results, adding morphology to the basic model is generally useful. [sent-230, score-0.432]

83 The alignment results are mixed: on the one hand, choosing the best possible language to align yields improvements, which can be increased further by adding morphological features, resulting in the best scores of all models for most languages. [sent-231, score-0.281]

84 On the other hand, without knowing which language to choose, alignment features do not help on average. [sent-232, score-0.234]

85 The low average performance of the alignment features is disappointing, but there are many possible variations on our method for extracting these features that we have not yet tested. [sent-234, score-0.342]

86 For example, we used only bidirectional alignments in an effort to improve alignment precision, but these alignments typically cover less than 40% of tokens. [sent-235, score-0.306]

87 Our system, including morphology features in all cases, is listed as BMMM (Bayesian Multinomial Mixture Model). [sent-238, score-0.447]

88 We do not include alignment features for the MULTEXT languages since these features only yielded improvements for the oracle case where we know which aligned language to choose. [sent-239, score-0.479]

89 We have presented a Bayesian model for syntactic class induction that has two important properties. [sent-244, score-0.313]

90 First, it is type-based, assigning the same class to every token of a word type. [sent-245, score-0.261]

91 BASE results use ±1-word context features alone or with morphology. [sent-247, score-0.163]

92 ALIGNMENT results use the standard number of classes and add alignment features, reporting the average score across all possible choices of paired language and the score under the best performing paired language (in parens), alone or with morphology features. [sent-248, score-0.465]

93 Comparison with a token-based version of the model shows that this restriction is very helpful. [sent-258, score-0.166]

94 Second, as a clustering model, it makes it easy to incorporate multiple kinds of features into the model at either the token or the type level. [sent-260, score-0.417]

95 Here, we experimented with token-level context and alignment features and type-level morphology features, showing that morphology features are helpful in nearly all cases, and that alignment features can be helpful if the aligned language is properly chosen. [sent-261, score-1.468]

96 Our results even without these extra features are competitive with the state of the art; with the additional features we achieve the best published results on the majority of the 25 corpora tested. [sent-262, score-0.432]

97 Since it is so easy to add extra features to our model, one direction for future work is to explore other possible features. [sent-263, score-0.166]

98 For example, it could be useful to add dependency features from an unsupervised dependency parser. [sent-264, score-0.18]

99 Finally, our model could be extended by replacing the standard mixture model with an infinite mixture model (Rasmussen, 2000) in order to induce the number of syntactic classes automatically. [sent-266, score-0.426]

100 PDTVALLEX: creating a large-coverage valency lexicon for treebank annotation. [sent-350, score-0.143]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('zj', 0.394), ('morphology', 0.339), ('christodoulopoulos', 0.203), ('fj', 0.162), ('treebank', 0.143), ('class', 0.142), ('lamar', 0.141), ('multext', 0.141), ('alignment', 0.126), ('token', 0.119), ('sampler', 0.115), ('morphological', 0.113), ('initialisation', 0.112), ('simov', 0.112), ('features', 0.108), ('multinomial', 0.101), ('published', 0.099), ('goldwater', 0.099), ('alignments', 0.09), ('abeill', 0.088), ('induction', 0.087), ('languages', 0.086), ('mixture', 0.086), ('clustering', 0.083), ('kinds', 0.081), ('bayesian', 0.078), ('lee', 0.077), ('erjavec', 0.073), ('petr', 0.073), ('vm', 0.073), ('bulgarian', 0.072), ('unsupervised', 0.072), ('nj', 0.072), ('type', 0.071), ('diagram', 0.069), ('tze', 0.069), ('sch', 0.066), ('anne', 0.066), ('kluwer', 0.066), ('haji', 0.063), ('haghighi', 0.061), ('treebanks', 0.059), ('corpora', 0.059), ('independence', 0.059), ('extra', 0.058), ('johnson', 0.057), ('mixing', 0.057), ('bamman', 0.056), ('christos', 0.056), ('civit', 0.056), ('kawata', 0.056), ('oflazer', 0.056), ('ohmov', 0.056), ('pajas', 0.056), ('redington', 0.056), ('smr', 0.056), ('toma', 0.056), ('vmeasure', 0.056), ('zeroski', 0.056), ('context', 0.055), ('basic', 0.055), ('feature', 0.054), ('classes', 0.054), ('naseem', 0.052), ('aligned', 0.051), ('prague', 0.05), ('ravi', 0.049), ('sharon', 0.049), ('cooling', 0.048), ('deficient', 0.048), ('fjk', 0.048), ('tapio', 0.048), ('plate', 0.047), ('suffixes', 0.047), ('syntactic', 0.046), ('snyder', 0.045), ('restriction', 0.045), ('multilingual', 0.044), ('toutanova', 0.044), ('academic', 0.044), ('ancient', 0.044), ('creutz', 0.044), ('morfessor', 0.044), ('ov', 0.044), ('rosenberg', 0.044), ('sinica', 0.044), ('temperature', 0.044), ('tiger', 0.044), ('yields', 0.042), ('morristown', 0.041), ('slovene', 0.041), ('extended', 0.04), ('arabic', 0.04), ('regina', 0.04), ('posterior', 0.038), ('older', 0.038), ('svd', 0.038), ('model', 0.038), ('klein', 0.038), ('assignment', 0.037)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features

Author: Christos Christodoulopoulos ; Sharon Goldwater ; Mark Steedman

Abstract: In this paper we present a fully unsupervised syntactic class induction system formulated as a Bayesian multinomial mixture model, where each word type is constrained to belong to a single class. By using a mixture model rather than a sequence model (e.g., HMM), we are able to easily add multiple kinds of features, including those at both the type level (morphology features) and token level (context and alignment features, the latter from parallel corpora). Using only context features, our system yields results comparable to the state of the art, far better than a similar model without the one-class-per-type constraint. Using the additional features provides added benefit, and our final system outperforms the best published results on most of the 25 corpora tested.

2 0.19480969 140 emnlp-2011-Universal Morphological Analysis using Structured Nearest Neighbor Prediction

Author: Young-Bum Kim ; Joao Graca ; Benjamin Snyder

Abstract: In this paper, we consider the problem of unsupervised morphological analysis from a new angle. Past work has endeavored to design unsupervised learning methods which explicitly or implicitly encode inductive biases appropriate to the task at hand. We propose instead to treat morphological analysis as a structured prediction problem, where languages with labeled data serve as training examples for unlabeled languages, without the assumption of parallel data. We define a universal morphological feature space in which every language and its morphological analysis reside. We develop a novel structured nearest neighbor prediction method which seeks to find the morphological analysis for each unlabeled lan- guage which lies as close as possible in the feature space to a training language. We apply our model to eight inflecting languages, and induce nominal morphology with substantially higher accuracy than a traditional, MDLbased approach. Our analysis indicates that accuracy continues to improve substantially as the number of training languages increases.

3 0.15950106 146 emnlp-2011-Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance

Author: Shay B. Cohen ; Dipanjan Das ; Noah A. Smith

Abstract: We describe a method for prediction of linguistic structure in a language for which only unlabeled data is available, using annotated data from a set of one or more helper languages. Our approach is based on a model that locally mixes between supervised models from the helper languages. Parallel data is not used, allowing the technique to be applied even in domains where human-translated texts are unavailable. We obtain state-of-theart performance for two tasks of structure prediction: unsupervised part-of-speech tagging and unsupervised dependency parsing.

4 0.15255781 3 emnlp-2011-A Correction Model for Word Alignments

Author: J. Scott McCarley ; Abraham Ittycheriah ; Salim Roukos ; Bing Xiang ; Jian-ming Xu

Abstract: Models of word alignment built as sequences of links have limited expressive power, but are easy to decode. Word aligners that model the alignment matrix can express arbitrary alignments, but are difficult to decode. We propose an alignment matrix model as a correction algorithm to an underlying sequencebased aligner. Then a greedy decoding algorithm enables the full expressive power of the alignment matrix formulation. Improved alignment performance is shown for all nine language pairs tested. The improved alignments also improved translation quality from Chinese to English and English to Italian.

5 0.13891144 141 emnlp-2011-Unsupervised Dependency Parsing without Gold Part-of-Speech Tags

Author: Valentin I. Spitkovsky ; Hiyan Alshawi ; Angel X. Chang ; Daniel Jurafsky

Abstract: We show that categories induced by unsupervised word clustering can surpass the performance of gold part-of-speech tags in dependency grammar induction. Unlike classic clustering algorithms, our method allows a word to have different tags in different contexts. In an ablative analysis, we first demonstrate that this context-dependence is crucial to the superior performance of gold tags — requiring a word to always have the same part-ofspeech significantly degrades the performance of manual tags in grammar induction, eliminating the advantage that human annotation has over unsupervised tags. We then introduce a sequence modeling technique that combines the output of a word clustering algorithm with context-colored noise, to allow words to be tagged differently in different contexts. With these new induced tags as input, our state-of- the-art dependency grammar inducer achieves 59. 1% directed accuracy on Section 23 (all sentences) of the Wall Street Journal (WSJ) corpus — 0.7% higher than using gold tags.

6 0.13043705 99 emnlp-2011-Non-parametric Bayesian Segmentation of Japanese Noun Phrases

7 0.12545982 39 emnlp-2011-Discovering Morphological Paradigms from Plain Text Using a Dirichlet Process Mixture Model

8 0.11752988 95 emnlp-2011-Multi-Source Transfer of Delexicalized Dependency Parsers

9 0.081000999 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation

10 0.078001328 75 emnlp-2011-Joint Models for Chinese POS Tagging and Dependency Parsing

11 0.0726256 4 emnlp-2011-A Fast, Accurate, Non-Projective, Semantically-Enriched Parser

12 0.068735868 145 emnlp-2011-Unsupervised Semantic Role Induction with Graph Partitioning

13 0.067967519 28 emnlp-2011-Closing the Loop: Fast, Interactive Semi-Supervised Annotation With Queries on Features and Instances

14 0.065963082 16 emnlp-2011-Accurate Parsing with Compact Tree-Substitution Grammars: Double-DOP

15 0.06397707 122 emnlp-2011-Simple Effective Decipherment via Combinatorial Optimization

16 0.063899539 14 emnlp-2011-A generative model for unsupervised discovery of relations and argument classes from clinical texts

17 0.062655441 103 emnlp-2011-Parser Evaluation over Local and Non-Local Deep Dependencies in a Large Corpus

18 0.062303677 54 emnlp-2011-Exploiting Parse Structures for Native Language Identification

19 0.061634835 128 emnlp-2011-Structured Relation Discovery using Generative Models

20 0.060956001 80 emnlp-2011-Latent Vector Weighting for Word Meaning in Context


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.259), (1, 0.01), (2, -0.082), (3, 0.076), (4, -0.044), (5, 0.136), (6, -0.17), (7, 0.053), (8, -0.294), (9, 0.019), (10, -0.084), (11, 0.081), (12, -0.031), (13, 0.01), (14, -0.074), (15, 0.045), (16, -0.06), (17, 0.094), (18, -0.072), (19, 0.165), (20, -0.09), (21, 0.039), (22, -0.008), (23, 0.051), (24, -0.005), (25, 0.109), (26, 0.019), (27, -0.085), (28, 0.06), (29, -0.044), (30, 0.092), (31, -0.058), (32, -0.003), (33, -0.052), (34, 0.13), (35, -0.11), (36, 0.079), (37, 0.085), (38, -0.09), (39, -0.061), (40, 0.042), (41, 0.066), (42, 0.004), (43, 0.164), (44, -0.012), (45, -0.078), (46, -0.052), (47, 0.012), (48, 0.117), (49, 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93405426 1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features

Author: Christos Christodoulopoulos ; Sharon Goldwater ; Mark Steedman

Abstract: In this paper we present a fully unsupervised syntactic class induction system formulated as a Bayesian multinomial mixture model, where each word type is constrained to belong to a single class. By using a mixture model rather than a sequence model (e.g., HMM), we are able to easily add multiple kinds of features, including those at both the type level (morphology features) and token level (context and alignment features, the latter from parallel corpora). Using only context features, our system yields results comparable to the state of the art, far better than a similar model without the one-class-per-type constraint. Using the additional features provides added benefit, and our final system outperforms the best published results on most of the 25 corpora tested.

2 0.86608052 140 emnlp-2011-Universal Morphological Analysis using Structured Nearest Neighbor Prediction

Author: Young-Bum Kim ; Joao Graca ; Benjamin Snyder

Abstract: In this paper, we consider the problem of unsupervised morphological analysis from a new angle. Past work has endeavored to design unsupervised learning methods which explicitly or implicitly encode inductive biases appropriate to the task at hand. We propose instead to treat morphological analysis as a structured prediction problem, where languages with labeled data serve as training examples for unlabeled languages, without the assumption of parallel data. We define a universal morphological feature space in which every language and its morphological analysis reside. We develop a novel structured nearest neighbor prediction method which seeks to find the morphological analysis for each unlabeled lan- guage which lies as close as possible in the feature space to a training language. We apply our model to eight inflecting languages, and induce nominal morphology with substantially higher accuracy than a traditional, MDLbased approach. Our analysis indicates that accuracy continues to improve substantially as the number of training languages increases.

3 0.61856312 146 emnlp-2011-Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance

Author: Shay B. Cohen ; Dipanjan Das ; Noah A. Smith

Abstract: We describe a method for prediction of linguistic structure in a language for which only unlabeled data is available, using annotated data from a set of one or more helper languages. Our approach is based on a model that locally mixes between supervised models from the helper languages. Parallel data is not used, allowing the technique to be applied even in domains where human-translated texts are unavailable. We obtain state-of-theart performance for two tasks of structure prediction: unsupervised part-of-speech tagging and unsupervised dependency parsing.

4 0.61614549 39 emnlp-2011-Discovering Morphological Paradigms from Plain Text Using a Dirichlet Process Mixture Model

Author: Markus Dreyer ; Jason Eisner

Abstract: We present an inference algorithm that organizes observed words (tokens) into structured inflectional paradigms (types). It also naturally predicts the spelling of unobserved forms that are missing from these paradigms, and discovers inflectional principles (grammar) that generalize to wholly unobserved words. Our Bayesian generative model of the data explicitly represents tokens, types, inflections, paradigms, and locally conditioned string edits. It assumes that inflected word tokens are generated from an infinite mixture of inflectional paradigms (string tuples). Each paradigm is sampled all at once from a graphical model, whose potential functions are weighted finitestate transducers with language-specific parameters to be learned. These assumptions naturally lead to an elegant empirical Bayes inference procedure that exploits Monte Carlo EM, belief propagation, and dynamic programming. Given 50–100 seed paradigms, adding a 10million-word corpus reduces prediction error for morphological inflections by up to 10%.

5 0.53211665 3 emnlp-2011-A Correction Model for Word Alignments

Author: J. Scott McCarley ; Abraham Ittycheriah ; Salim Roukos ; Bing Xiang ; Jian-ming Xu

Abstract: Models of word alignment built as sequences of links have limited expressive power, but are easy to decode. Word aligners that model the alignment matrix can express arbitrary alignments, but are difficult to decode. We propose an alignment matrix model as a correction algorithm to an underlying sequencebased aligner. Then a greedy decoding algorithm enables the full expressive power of the alignment matrix formulation. Improved alignment performance is shown for all nine language pairs tested. The improved alignments also improved translation quality from Chinese to English and English to Italian.

6 0.4669953 95 emnlp-2011-Multi-Source Transfer of Delexicalized Dependency Parsers

7 0.46539485 99 emnlp-2011-Non-parametric Bayesian Segmentation of Japanese Noun Phrases

8 0.44402176 141 emnlp-2011-Unsupervised Dependency Parsing without Gold Part-of-Speech Tags

9 0.34969088 62 emnlp-2011-Generating Subsequent Reference in Shared Visual Scenes: Computation vs Re-Use

10 0.34193647 143 emnlp-2011-Unsupervised Information Extraction with Distributional Prior Knowledge

11 0.33195111 115 emnlp-2011-Relaxed Cross-lingual Projection of Constituent Syntax

12 0.33177438 73 emnlp-2011-Improving Bilingual Projections via Sparse Covariance Matrices

13 0.33012712 67 emnlp-2011-Hierarchical Verb Clustering Using Graph Factorization

14 0.32854924 54 emnlp-2011-Exploiting Parse Structures for Native Language Identification

15 0.32724217 80 emnlp-2011-Latent Vector Weighting for Word Meaning in Context

16 0.31833905 8 emnlp-2011-A Model of Discourse Predictions in Human Sentence Processing

17 0.31093431 48 emnlp-2011-Enhancing Chinese Word Segmentation Using Unlabeled Data

18 0.30996504 28 emnlp-2011-Closing the Loop: Fast, Interactive Semi-Supervised Annotation With Queries on Features and Instances

19 0.30722576 144 emnlp-2011-Unsupervised Learning of Selectional Restrictions and Detection of Argument Coercions

20 0.30161482 145 emnlp-2011-Unsupervised Semantic Role Induction with Graph Partitioning


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(15, 0.026), (23, 0.105), (36, 0.03), (37, 0.023), (45, 0.076), (53, 0.022), (54, 0.029), (57, 0.017), (62, 0.018), (64, 0.074), (66, 0.098), (69, 0.019), (79, 0.058), (81, 0.225), (82, 0.025), (90, 0.022), (94, 0.011), (96, 0.051), (98, 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.74704164 1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features

Author: Christos Christodoulopoulos ; Sharon Goldwater ; Mark Steedman

Abstract: In this paper we present a fully unsupervised syntactic class induction system formulated as a Bayesian multinomial mixture model, where each word type is constrained to belong to a single class. By using a mixture model rather than a sequence model (e.g., HMM), we are able to easily add multiple kinds of features, including those at both the type level (morphology features) and token level (context and alignment features, the latter from parallel corpora). Using only context features, our system yields results comparable to the state of the art, far better than a similar model without the one-class-per-type constraint. Using the additional features provides added benefit, and our final system outperforms the best published results on most of the 25 corpora tested.

2 0.63151044 140 emnlp-2011-Universal Morphological Analysis using Structured Nearest Neighbor Prediction

Author: Young-Bum Kim ; Joao Graca ; Benjamin Snyder

Abstract: In this paper, we consider the problem of unsupervised morphological analysis from a new angle. Past work has endeavored to design unsupervised learning methods which explicitly or implicitly encode inductive biases appropriate to the task at hand. We propose instead to treat morphological analysis as a structured prediction problem, where languages with labeled data serve as training examples for unlabeled languages, without the assumption of parallel data. We define a universal morphological feature space in which every language and its morphological analysis reside. We develop a novel structured nearest neighbor prediction method which seeks to find the morphological analysis for each unlabeled lan- guage which lies as close as possible in the feature space to a training language. We apply our model to eight inflecting languages, and induce nominal morphology with substantially higher accuracy than a traditional, MDLbased approach. Our analysis indicates that accuracy continues to improve substantially as the number of training languages increases.

3 0.61093998 141 emnlp-2011-Unsupervised Dependency Parsing without Gold Part-of-Speech Tags

Author: Valentin I. Spitkovsky ; Hiyan Alshawi ; Angel X. Chang ; Daniel Jurafsky

Abstract: We show that categories induced by unsupervised word clustering can surpass the performance of gold part-of-speech tags in dependency grammar induction. Unlike classic clustering algorithms, our method allows a word to have different tags in different contexts. In an ablative analysis, we first demonstrate that this context-dependence is crucial to the superior performance of gold tags — requiring a word to always have the same part-ofspeech significantly degrades the performance of manual tags in grammar induction, eliminating the advantage that human annotation has over unsupervised tags. We then introduce a sequence modeling technique that combines the output of a word clustering algorithm with context-colored noise, to allow words to be tagged differently in different contexts. With these new induced tags as input, our state-of- the-art dependency grammar inducer achieves 59. 1% directed accuracy on Section 23 (all sentences) of the Wall Street Journal (WSJ) corpus — 0.7% higher than using gold tags.

4 0.61052769 8 emnlp-2011-A Model of Discourse Predictions in Human Sentence Processing

Author: Amit Dubey ; Frank Keller ; Patrick Sturt

Abstract: This paper introduces a psycholinguistic model of sentence processing which combines a Hidden Markov Model noun phrase chunker with a co-reference classifier. Both models are fully incremental and generative, giving probabilities of lexical elements conditional upon linguistic structure. This allows us to compute the information theoretic measure of surprisal, which is known to correlate with human processing effort. We evaluate our surprisal predictions on the Dundee corpus of eye-movement data show that our model achieve a better fit with human reading times than a syntax-only model which does not have access to co-reference information.

5 0.59675062 107 emnlp-2011-Probabilistic models of similarity in syntactic context

Author: Diarmuid O Seaghdha ; Anna Korhonen

Abstract: This paper investigates novel methods for incorporating syntactic information in probabilistic latent variable models of lexical choice and contextual similarity. The resulting models capture the effects of context on the interpretation of a word and in particular its effect on the appropriateness of replacing that word with a potentially related one. Evaluating our techniques on two datasets, we report performance above the prior state of the art for estimating sentence similarity and ranking lexical substitutes.

6 0.59106046 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation

7 0.58933252 146 emnlp-2011-Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance

8 0.58455068 80 emnlp-2011-Latent Vector Weighting for Word Meaning in Context

9 0.58313906 39 emnlp-2011-Discovering Morphological Paradigms from Plain Text Using a Dirichlet Process Mixture Model

10 0.58116335 53 emnlp-2011-Experimental Support for a Categorical Compositional Distributional Model of Meaning

11 0.58008689 85 emnlp-2011-Learning to Simplify Sentences with Quasi-Synchronous Grammar and Integer Programming

12 0.57839465 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax

13 0.57814723 54 emnlp-2011-Exploiting Parse Structures for Native Language Identification

14 0.57646394 97 emnlp-2011-Multiword Expression Identification with Tree Substitution Grammars: A Parsing tour de force with French

15 0.57628244 59 emnlp-2011-Fast and Robust Joint Models for Biomedical Event Extraction

16 0.57406068 136 emnlp-2011-Training a Parser for Machine Translation Reordering

17 0.57332325 56 emnlp-2011-Exploring Supervised LDA Models for Assigning Attributes to Adjective-Noun Phrases

18 0.57306868 95 emnlp-2011-Multi-Source Transfer of Delexicalized Dependency Parsers

19 0.57277048 37 emnlp-2011-Cross-Cutting Models of Lexical Semantics

20 0.57156259 79 emnlp-2011-Lateen EM: Unsupervised Training with Multiple Objectives, Applied to Dependency Grammar Induction