emnlp emnlp2010 emnlp2010-111 knowledge-graph by maker-knowledge-mining

111 emnlp-2010-Two Decades of Unsupervised POS Induction: How Far Have We Come?


Source: pdf

Author: Christos Christodoulopoulos ; Sharon Goldwater ; Mark Steedman

Abstract: Part-of-speech (POS) induction is one of the most popular tasks in research on unsupervised NLP. Many different methods have been proposed, yet comparisons are difficult to make since there is little consensus on evaluation framework, and many papers evaluate against only one or two competitor systems. Here we evaluate seven different POS induction systems spanning nearly 20 years of work, using a variety of measures. We show that some of the oldest (and simplest) systems stand up surprisingly well against more recent approaches. Since most of these systems were developed and tested using data from the WSJ corpus, we compare their generalization abilities by testing on both WSJ and the multilingual Multext-East corpus. Finally, we introduce the idea of evaluating systems based on their ability to produce cluster prototypes that are useful as input to a prototype-driven learner. In most cases, the prototype-driven learner outperforms the unsupervised system used to initialize it, yielding state-of-the-art results on WSJ and improvements on non-English corpora.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Part-of-speech (POS) induction is one of the most popular tasks in research on unsupervised NLP. [sent-5, score-0.244]

2 Here we evaluate seven different POS induction systems spanning nearly 20 years of work, using a variety of measures. [sent-7, score-0.282]

3 We show that some of the oldest (and simplest) systems stand up surprisingly well against more recent approaches. [sent-8, score-0.164]

4 Since most of these systems were developed and tested using data from the WSJ corpus, we compare their generalization abilities by testing on both WSJ and the multilingual Multext-East corpus. [sent-9, score-0.165]

5 Finally, we introduce the idea of evaluating systems based on their ability to produce cluster prototypes that are useful as input to a prototype-driven learner. [sent-10, score-0.678]

6 It is difficult to compare the relative performance of unsupervised POS tagging systems because of differences in evaluation measures, and the fact that no paper includes direct comparisons against more than a few other systems. [sent-20, score-0.253]

7 In this paper, we attempt to remedy that situation by providing a comprehensive evaluation of seven different POS induction systems spanning nearly 20 years of research. [sent-21, score-0.282]

8 We focus specifically on POS induction systems, where no prior knowledge is available, in contrast to POS disambiguation systems (Merialdo, 1994; Toutanova and Johnson, 2007; Naseem et al. [sent-22, score-0.247]

9 , 2009; Ravi and Knight, 2009; Smith and Eisner, 2005), which use a dictionary to provide possible tags for some or all of the words in the corpus, or prototype-driven systems (Haghighi and Klein, 2006), which use a small set of prototypes for each tag class, but no dictionary. [sent-23, score-0.631]

10 Our motivation stems from another part of our own research, in which we are trying to use NLP systems on over 50 low-density languages (some of them dead) where both tagged corpora and language speakers are mostly unavailable. [sent-24, score-0.154]

11 Nevertheless, most recent papers have used mapping-based performance measures (either one-to-one or many-to-one accuracy). [sent-27, score-0.164]

12 Using V-Measure along with several other evaluation measures, we compare the performance of the different induction systems on both WSJ (the data on which most systems were developed and tested) and Multext-East, a corpus of parallel texts in eight different languages. [sent-31, score-0.362]

13 We find that for virtually all measures and datasets, older systems using relatively simple models and algorithms (Brown et al. [sent-32, score-0.281]

14 , 1992; Clark, 2003) work as well or better than systems using newer and often far more sophisticated and time-consuming machine learning methods (Goldwater and Griffiths, 2007; Johnson, 2007; Graca et al. [sent-33, score-0.221]

15 Thus, although these newer methods have introduced potentially useful machine learning techniques, they should not be assumed to provide the best performance for unsupervised POS induction. [sent-36, score-0.218]

16 In addition to our review and comparison, we introduce a new way to both evaluate and potentially improve a POS induction system. [sent-37, score-0.167]

17 Our method is based on the prototype-driven learning system of Haghighi and Klein (2006), which achieves very good performance by using a hand-selected list of prototypes for each syntactic cluster. [sent-38, score-0.482]

18 We instead use the existing POS induction systems to induce prototypes automatically, and evaluate the systems based on the quality of their prototypes. [sent-39, score-0.794]

19 We find that the oldest system tested (Brown et al. [sent-40, score-0.177]

20 , 1992) produces the best prototypes, and that using these prototypes as input to Haghighi and Klein’s system yields state-of-the-art performance on WSJ and improvements on seven of the eight non-English corpora. [sent-41, score-0.552]

21 Each system outputs a set of syntactic clusters C; except where noted, the target number of clusters |C| must be specified as an input parameter. [sent-43, score-0.549]

22 This is the oldest and one of the simplest systems we tested. [sent-47, score-0.197]

23 This system uses a similar model to the previous one, and also clusters word types (rather than tokens, as the rest of the systems do). [sent-55, score-0.378]

24 The main differences between the systems are that clark uses a slightly different approximate search procedure, and that he augments the probabilistic model with a prior that prefers clusterings where morphologically similar words are clustered together. [sent-56, score-0.348]

25 The first clusters the most frequent 10,000 words (target words) based on their context statistics, with contexts formed from the most frequent 150-250 words (feature words) that appear ei… [sent-62, score-0.251]

26 The final clustering is of the clusters obtained in the two previous steps. While the number of target words, feature words, and window size are in principle parameters of the algorithm, they are hard-coded in the implementation we used and we did not change them. [sent-86, score-0.322]

27 The Dirichlet hyperparameters α (which controls the sparsity of the transition probabilities) and β (which controls the sparsity of the emission probabilities) can be fixed or inferred. [sent-90, score-0.199]

28 This system, while utilizing the same bigram HMM, encourages sparsity directly by constraining the posterior distributions using the posterior regularization framework (Ganchev et al. [sent-100, score-0.155]

29 (Section 3: Evaluation Measures) One difficulty in comparing POS induction methods is in finding an appropriate evaluation measure. [sent-111, score-0.167]

30 In addition, some measures with supposed theoretical advantages, such as Variation of Information (VI) (Meilă, 2003), have had little empirical analysis. [sent-113, score-0.165]

31 Our goal in this section is to determine which of these measures is most sensible for evaluating the systems presented above. [sent-114, score-0.25]

32 Except for VI, all measures range from 0 to 1, with higher scores indicating better performance. [sent-116, score-0.174]

33 [many-to-1]: Many-to-one mapping accuracy (also known as cluster purity) maps each cluster to the gold standard tag that is most common for the words in that cluster (henceforth, the preferred tag), and then computes the proportion of words tagged correctly. [sent-117, score-0.591]

34 More than one cluster may be mapped to the same gold standard tag. [sent-118, score-0.271]
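The many-to-one computation described in sentences 33-34 can be sketched directly from token-level (cluster, gold tag) pairs. This is an illustrative implementation (the function name and data layout are our own, not from the paper): each cluster is mapped to its preferred tag, and several clusters may share a tag.

```python
from collections import Counter

def many_to_one_accuracy(clusters, gold_tags):
    """Map each induced cluster to the gold tag most common among its
    tokens (the cluster's "preferred tag"), then score the proportion
    of tokens tagged correctly. Clusters may share a preferred tag."""
    by_cluster = {}
    for c, t in zip(clusters, gold_tags):
        by_cluster.setdefault(c, Counter())[t] += 1
    # Each cluster contributes the count of its majority tag.
    correct = sum(counts.most_common(1)[0][1] for counts in by_cluster.values())
    return correct / len(clusters)
```

For example, token clusters [0, 0, 1, 1, 1] against gold tags [N, N, V, V, N] map cluster 0 to N and cluster 1 to V, giving 4/5 = 0.8.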

35 In this measure, the first half of the corpus is used to obtain the many-to-one mapping of clusters to tags, and this mapping is used to compute the accuracy of the clustering on the second half of the corpus. [sent-122, score-0.396]

36 [1-to-1]: One-to-one mapping accuracy (Haghighi and Klein, 2006) constrains the mapping from clusters to tags, so that at most one cluster can be mapped to any tag. [sent-123, score-0.447]

37 In general, as the number of clusters increases, fewer clusters will be mapped to their preferred tag and scores will decrease (especially if the number of clusters is larger than the number of tags, so that some clusters are unassigned and receive zero credit). [sent-125, score-1.088]
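The 1-to-1 constraint turns the mapping into an assignment problem. A toy sketch (our own code, not the paper's): it brute-forces the optimal assignment, which is only feasible for tiny tag sets; practical evaluations solve the same problem with the Hungarian algorithm. Clusters left unassigned (e.g. when |C| > |T|) receive zero credit, as described above.

```python
from collections import Counter
from itertools import permutations

def one_to_one_accuracy(clusters, gold_tags):
    """Best accuracy under a 1-to-1 mapping: each cluster is assigned
    to at most one gold tag and vice versa; unassigned clusters
    (mapped to None) score zero."""
    cluster_ids = sorted(set(clusters))
    tag_ids = sorted(set(gold_tags))
    counts = Counter(zip(clusters, gold_tags))
    # Pad with None so some clusters can stay unassigned when |C| > |T|.
    padded = tag_ids + [None] * max(0, len(cluster_ids) - len(tag_ids))
    best = 0
    for assignment in permutations(padded, len(cluster_ids)):
        score = sum(counts[(c, t)] for c, t in zip(cluster_ids, assignment))
        best = max(best, score)
    return best / len(clusters)
```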

38 [vi]: Variation of Information (Meilă, 2003) is an information-theoretic measure that regards the system output C and the gold standard tags T as two separate clusterings, and evaluates the amount of information lost in going from C to T and the amount of information gained, i.e. [sent-127, score-0.273]

39 VI and other entropy-based measures have been argued to be superior to accuracy-based measures such as those above, because they consider not only the majority tag in each cluster, but also whether the remainder of the cluster is more or less homogeneous. [sent-134, score-0.419]

40 Unlike the other measures we consider, lower scores are better (since VI measures the difference between clusterings in bits). [sent-135, score-0.387]
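VI can be computed directly from the joint counts of (cluster, gold tag) pairs; the two summands below are H(T|C) and H(C|T), the information lost and gained as described in sentence 38. A minimal sketch in our own code, measuring in bits (log base 2):

```python
from collections import Counter
from math import log2

def variation_of_information(clusters, gold_tags):
    """VI(C, T) = H(C|T) + H(T|C). Lower is better; identical
    clusterings score 0 bits."""
    n = len(clusters)
    joint = Counter(zip(clusters, gold_tags))
    pc = Counter(clusters)
    pt = Counter(gold_tags)
    vi = 0.0
    for (c, t), n_ct in joint.items():
        p_ct = n_ct / n
        # p(c,t) * [log p(c)/p(c,t) + log p(t)/p(c,t)]
        vi += p_ct * (log2((pc[c] / n) / p_ct) + log2((pt[t] / n) / p_ct))
    return vi
```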

41 They noted that V-Measure favors clusterings where the number of clusters |C| is larger than the number of POS tags |T|. [sent-139, score-0.412]
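V-Measure itself (Rosenberg and Hirschberg, 2007) is the harmonic mean of homogeneity (each cluster contains only one gold class) and completeness (each gold class lands in a single cluster). A minimal sketch from contingency counts, in our own code:

```python
from collections import Counter
from math import log2

def v_measure(clusters, gold_tags):
    """Harmonic mean of homogeneity h = 1 - H(T|C)/H(T) and
    completeness c = 1 - H(C|T)/H(C)."""
    n = len(clusters)
    joint = Counter(zip(clusters, gold_tags))
    pc, pt = Counter(clusters), Counter(gold_tags)

    def entropy(counts):
        return -sum((v / n) * log2(v / n) for v in counts.values())

    h_t, h_c = entropy(pt), entropy(pc)
    # Conditional entropies from the joint counts.
    h_t_given_c = -sum((v / n) * log2(v / pc[c]) for (c, t), v in joint.items())
    h_c_given_t = -sum((v / n) * log2(v / pt[t]) for (c, t), v in joint.items())
    homogeneity = 1.0 if h_t == 0 else 1 - h_t_given_c / h_t
    completeness = 1.0 if h_c == 0 else 1 - h_c_given_t / h_c
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)
```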

42 One potential issue with all of the above measures is that they require a gold standard tagging to compute. [sent-144, score-0.326]

43 This is normally available during development of a system, but if the system is deployed on a novel language a gold standard may not be available. [sent-145, score-0.196]

44 In addition, there is the question of whether the gold standard itself is “correct”. [sent-146, score-0.149]

45 (2009) proposed this novel evaluation measure that requires no gold standard, instead using the concept of substitutability to evaluate performance. [sent-148, score-0.16]

46 Instead of comparing the system’s clusters C to gold standard clusters T, they are compared to a set of clusters S created from substitutable frames, i.e. [sent-149, score-1.019]

47 That is, clusters of words that occur in the same syntactic environment. [sent-151, score-0.251]
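The frame-based reference clustering S can be sketched as follows. This is a toy version under our own assumptions: a frame is a (left word, right word) pair, and a frame must be filled by at least two distinct word types to form a cluster; the original evaluation restricts attention to frequent frames.

```python
from collections import defaultdict

def frame_clusters(tokens, min_size=2):
    """Group word types by the (left, right) context frames they fill:
    words sharing a frame are treated as substitutable."""
    frames = defaultdict(set)
    for i in range(1, len(tokens) - 1):
        frames[(tokens[i - 1], tokens[i + 1])].add(tokens[i])
    # Keep only frames shared by at least `min_size` distinct words.
    return [words for words in frames.values() if len(words) >= min_size]
```

On the toy corpus "the cat sat on the mat and the dog sat", the frame (the, sat) groups cat and dog into one substitutability cluster.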

48 First, we examine the effects of varying |C| on the behavior of the evaluation measures, while keeping the number of gold standard tags the same (|T| = 45). [sent-164, score-0.226]

49 Figure 1: Scores for all evaluation measures as a function of the number of clusters returned [model:brown, corpus:wsj, |C| :{ 1-200}, |T|:45]. [sent-175, score-0.38]

50 One might hope that the peak in performance would occur when the number of clusters is approximately equal to the number of gold standard tags; however, the best performance for both 1-to-1 and vi occurs with approximately 25-30 clusters, many fewer than the gold standard 45. [sent-197, score-0.659]

51 Interestingly, although vmb was proposed as a way to correct for the supposed tendency of vm to increase with increasing |C|, we find that vm is actually more stable than vmb over different values of |C|. [sent-199, score-0.487]

52 Thus, if the goal is to compare clusterings with different numbers of clusters (especially important for systems that induce the number of clusters), then vm seems more appropriate than any of the above measures, which are more standard in the literature. [sent-201, score-0.583]

53 It assigns a lower score to the supervised system than to brown, indicating that words in the supervised clusters (which are very close to the gold standard) are actually less substitutable than words in the unsupervised clusters. [sent-204, score-0.641]

54 This is probably due to the fact that the gold standard encodes “pure” syntactic classes, while substitutability also depends on semantic characteristics (which tend to be picked up by unsupervised clustering systems as well). [sent-205, score-0.424]

55 If a gold standard is available, then many-to-1 and 1-to-1 are the most intuitive measures, but should not be used when |C| is variable, and do not account for differences in the errors made. [sent-208, score-0.149]

56 Overall, vm seems to be the best general-purpose measure, as it combines an entropy-based score with an intuitive 0-1 scale and stability over a wide range of |C|. [sent-210, score-0.184]

57 This is a particularly pertinent question since a primary argument in favor of unsupervised systems is that they are easier to port to a new language or domain than supervised systems. [sent-214, score-0.157]

58 Based on our assessment of evaluation measures above, we report VM scores as the most reliable measure across different systems and cluster set sizes. (Figure 2: Performance of the different systems on WSJ, using three different measures [|C|: 45, |T|: 45].) [sent-217, score-0.209]

59 To facilitate comparisons with previous papers, we also report many-to-one and one-to-one accuracy. [sent-221, score-0.424]

60 Even the oldest and perhaps simplest method (brown) outperforms the two BHMMs and posterior regularization on all measures. [sent-226, score-0.155]

61 The cw system returns a total of 568 clusters on this data set, so the many-to-one and one-to-one measures are not strictly comparable to the other systems; on VM this system achieves middling performance. [sent-234, score-0.53]

62 We note that the two best-performing systems, clark and feat, are also the only two to use morphological information. [sent-235, score-0.243]

63 Since the clustering algorithms used by brown and clark are quite similar, the difference in performance between the two can probably be attributed to the extra information provided by the morphology. [sent-236, score-0.407]

64 (Section: Results on other corpora) We now examine whether either the relative or absolute performance of the different systems holds up when tested on a variety of different languages. [sent-239, score-0.168]

65 For the WSJ corpora we experimented with two standardly used tagsets: the original PTB 45-tag gold standard and a coarser set of 17 tags previously used by several researchers working on unsupervised POS tagging (Smith and Eisner, 2005; Goldwater and Griffiths, 2007; Johnson, 2007). [sent-242, score-0.393]

66 , controlling as much as possible for corpus size and number of gold standard tags), we see that despite being developed on WSJ, the systems actually perform better on Multext-East. [sent-248, score-0.265]

67 We tried to make the meanings of the tags as similar as possible between the two corpora; we had to create 13 rather than 14 WSJ tags for this reason. [sent-257, score-0.154]

68 , between different types of punctuation) are collapsed in the gold standard. [sent-265, score-0.151]

69 One might expect that the two systems with morphological features (clark and feat) would show less difference between English and some of the other languages (all of which have complex morphology) than the other systems. [sent-268, score-0.171]

70 However, although clark and feat (along with Brown) are the best performing systems overall, they don’t show any particular benefit for the morphologically complex languages. [sent-269, score-0.525]

71 One difference between the Multext-East results and the WSJ results is that on Multext-East, clark clearly outperforms all the other systems. [sent-270, score-0.184]

72 This is true for both the English and non-English corpora, despite the similar performance of clark and feat on (English) WSJ. [sent-271, score-0.445]

73 This suggests that feat benefits more from the larger corpus size of WSJ. [sent-272, score-0.261]

74 For the other languages clark may be benefiting from somewhat more general morphological features; feat currently contains suffix features but no prefix features (although these could be added). [sent-273, score-0.536]

75 Overall, our experiments on multiple languages support our earlier claim that many of the newer POS induction systems are not as successful as the older methods. [sent-274, score-0.492]

76 Moreover, these experiments underscore the importance of testing unsupervised systems on multiple languages and domains, since both the absolute and relative performance of systems may change on different data sets. [sent-275, score-0.269]

77 (Section 5: Learning from induced prototypes) We now introduce a final novel method of evaluating POS induction systems and potentially improving their performance as well. [sent-277, score-0.762]

78 This model is unsupervised, but requires as input a handful of prototypes (canonical examples) for each word class. [sent-282, score-0.435]

79 Using the most frequent words in each gold standard class as prototypes, the authors report 80.5% accuracy (both many-to-one and one-to-one) on WSJ, considerably higher than any of the induction systems seen here. [sent-284, score-0.149] [sent-285, score-0.247]

81 This raises two questions: If we wish to induce prototypes without a tagged corpus or language-specific knowledge, which induction system will provide the best prototypes (i.e., …)? [sent-286, score-0.631]

82 And, can we use the induced prototypes as input to the prototype-driven model (h&k) to achieve better performance than the system the prototypes were extracted from? [sent-289, score-0.956]

83 For each cluster ci ∈ C, we retain as candidate prototypes the words whose frequency in ci is at least 90% as high as the word with the highest frequency (in ci). [sent-291, score-0.679]

84 We keep as prototypes the top ten words with the highest M scores (M > 0). [sent-298, score-0.475]

85 The cutoff threshold results in some clusters having less than ten prototypes, which is appropriate since some gold standard categories have very few members (e. [sent-300, score-0.149]
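The candidate-selection step can be sketched as follows (our own code and names; it reproduces only the 90%-of-max frequency filter and the per-cluster cap of ten, not the paper's M-score ranking):

```python
from collections import Counter, defaultdict

def candidate_prototypes(tokens, cluster_of, top_k=10):
    """For each induced cluster, keep the word types whose corpus
    frequency is at least 90% of that of the cluster's most frequent
    word, capped at `top_k` candidates per cluster."""
    freq = Counter(tokens)
    by_cluster = defaultdict(set)
    for w in tokens:
        if w in cluster_of:          # ignore words the inducer skipped
            by_cluster[cluster_of[w]].add(w)
    protos = {}
    for c, words in by_cluster.items():
        cutoff = 0.9 * max(freq[w] for w in words)
        kept = sorted((w for w in words if freq[w] >= cutoff),
                      key=lambda w: (-freq[w], w))
        protos[c] = kept[:top_k]
    return protos
```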

86 Results in Table 3 show that the brown system produces the best prototypes. [sent-304, score-0.199]

87 Although not as good as using prototypes from the gold standard (h&k), … (Table 3: Results of the h&k induction system, with prototypes extracted from each of the existing systems [|C|: 45, |T|: 45].) [sent-305, score-1.099]

88 In fact, the brown+proto scores are, to our knowledge, the best reported results for an unsupervised POS induction system on WSJ. [sent-313, score-0.336]

89 We see that brown again yields the best prototypes, and again yields improvements when used as brown+proto (although the improvements are not as large as those on WSJ). [sent-315, score-0.152]

90 Interestingly, clark+proto actually performs worse than clark on the multilingual data, showing that although induced prototypes can in principle improve the performance of a system, not all systems will benefit in all situations. [sent-316, score-0.813]

91 This suggests a need for additional investigation to determine what properties of an existing induction system allow it to produce useful prototypes with the current method and/or to develop a specialized system specifically targeted towards inducing useful prototypes. [sent-317, score-0.696]

92 (Section 6: Conclusion) In this paper, we have attempted to provide a more comprehensive review and comparison of evaluation measures and systems for POS induction than has been done before. [sent-318, score-0.376]

93 We pointed out that most of the commonly used evaluation measures are sensitive to the number of induced clusters, and suggested that V-measure (which is less sensitive) should be used as an alternative or in conjunction with the standard measures. [sent-319, score-0.204]

94 With regard to the systems themselves, we found that many of the newer approaches actually perform worse than older methods that are both simpler and faster. [sent-320, score-0.329]

95 The newer systems have introduced potentially important machine learning tools, but are not necessarily better suited to the POS induction task specifically. [sent-321, score-0.388]

96 Since portability is a distinguishing feature for unsupervised models, we have stressed the importance of testing the systems on corpora that were not used in their development, and especially on different languages. [sent-322, score-0.199]

97 Finally, we introduced the idea of evaluating induction systems based on their ability to produce useful cluster prototypes. [sent-324, score-0.41]

98 , 1992) yielded the best prototypes, and that using these prototypes gave state-of-the-art performance on WSJ, as well as improvements on nearly all of the non-English corpora. [sent-326, score-0.435]

99 These promising results suggest a new direction for future research: improving POS induction by developing methods targeted towards extracting better prototypes, rather than focusing on improving clustering of the entire data set. [sent-327, score-0.238]

100 A comparison of Bayesian estimators for unsupervised hidden Markov model POS taggers. [sent-367, score-0.257]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('prototypes', 0.435), ('feat', 0.261), ('clusters', 0.251), ('wsj', 0.21), ('vm', 0.184), ('clark', 0.184), ('induction', 0.167), ('brown', 0.152), ('newer', 0.141), ('proto', 0.141), ('pos', 0.135), ('measures', 0.129), ('cluster', 0.122), ('substitutable', 0.117), ('gold', 0.113), ('vi', 0.11), ('bhmm', 0.094), ('hmm', 0.091), ('graca', 0.091), ('clusterings', 0.084), ('oldest', 0.084), ('meil', 0.08), ('systems', 0.08), ('unsupervised', 0.077), ('tags', 0.077), ('goldwater', 0.077), ('older', 0.072), ('clustering', 0.071), ('vbhmm', 0.07), ('haghighi', 0.068), ('frank', 0.067), ('johnson', 0.061), ('ci', 0.061), ('runtimes', 0.06), ('morphological', 0.059), ('inf', 0.056), ('ganchev', 0.056), ('cw', 0.056), ('vlachos', 0.054), ('naseem', 0.054), ('rosenberg', 0.054), ('morristown', 0.053), ('ed', 0.051), ('ac', 0.049), ('comparisons', 0.048), ('tagging', 0.048), ('nj', 0.048), ('analogue', 0.047), ('homogeneity', 0.047), ('multext', 0.047), ('stella', 0.047), ('substitutability', 0.047), ('vmb', 0.047), ('sr', 0.047), ('system', 0.047), ('tested', 0.046), ('cc', 0.045), ('scores', 0.045), ('bayesian', 0.045), ('sparsity', 0.044), ('kuzman', 0.044), ('corpora', 0.042), ('evaluating', 0.041), ('gael', 0.04), ('hh', 0.04), ('biemann', 0.04), ('twheo', 0.04), ('whispers', 0.04), ('edinburgh', 0.04), ('ip', 0.04), ('induced', 0.039), ('multilingual', 0.039), ('klein', 0.039), ('tag', 0.039), ('collapsed', 0.038), ('sp', 0.038), ('ftware', 0.038), ('posterior', 0.038), ('mapping', 0.037), ('sharon', 0.037), ('controls', 0.037), ('hyperparameters', 0.037), ('informatics', 0.037), ('pr', 0.037), ('standard', 0.036), ('supposed', 0.036), ('dissimilarity', 0.036), ('zoubin', 0.036), ('actually', 0.036), ('eight', 0.035), ('papers', 0.035), ('bigram', 0.035), ('seven', 0.035), ('dirichlet', 0.034), ('lemmatization', 0.033), ('hirschberg', 0.033), ('simplest', 0.033), ('languages', 0.032), ('induce', 0.032), ('cs', 0.032)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999952 111 emnlp-2010-Two Decades of Unsupervised POS Induction: How Far Have We Come?


2 0.18958651 97 emnlp-2010-Simple Type-Level Unsupervised POS Tagging

Author: Yoong Keok Lee ; Aria Haghighi ; Regina Barzilay

Abstract: Part-of-speech (POS) tag distributions are known to exhibit sparsity: a word is likely to take a single predominant tag in a corpus. Recent research has demonstrated that incorporating this sparsity constraint improves tagging accuracy. However, in existing systems, this expansion comes with a steep increase in model complexity. This paper proposes a simple and effective tagging method that directly models tag sparsity and other distributional properties of valid POS tag assignments. In addition, this formulation results in a dramatic reduction in the number of model parameters, thereby enabling unusually rapid training. Our experiments consistently demonstrate that this model architecture yields substantial performance gains over more complex tagging counterparts. On several languages, we report performance exceeding that of more complex state-of-the-art systems.

3 0.17177151 34 emnlp-2010-Crouching Dirichlet, Hidden Markov Model: Unsupervised POS Tagging with Context Local Tag Generation

Author: Taesun Moon ; Katrin Erk ; Jason Baldridge

Abstract: We define the crouching Dirichlet, hidden Markov model (CDHMM), an HMM for part-of-speech tagging which draws state prior distributions for each local document context. This simple modification of the HMM takes advantage of the dichotomy in natural language between content and function words. In contrast, a standard HMM draws all prior distributions once over all states and it is known to perform poorly in unsupervised and semisupervised POS tagging. This modification significantly improves unsupervised POS tagging performance across several measures on five data sets for four languages. We also show that simply using different hyperparameter values for content and function word states in a standard HMM (which we call HMM+) is surprisingly effective.

4 0.14307366 7 emnlp-2010-A Mixture Model with Sharing for Lexical Semantics

Author: Joseph Reisinger ; Raymond Mooney

Abstract: We introduce tiered clustering, a mixture model capable of accounting for varying degrees of shared (context-independent) feature structure, and demonstrate its applicability to inferring distributed representations of word meaning. Common tasks in lexical semantics such as word relatedness or selectional preference can benefit from modeling such structure: Polysemous word usage is often governed by some common background metaphoric usage (e.g. the senses of line or run), and likewise modeling the selectional preference of verbs relies on identifying commonalities shared by their typical arguments. Tiered clustering can also be viewed as a form of soft feature selection, where features that do not contribute meaningfully to the clustering can be excluded. We demonstrate the applicability of tiered clustering, highlighting particular cases where modeling shared structure is beneficial and where it can be detrimental.

5 0.1428446 71 emnlp-2010-Latent-Descriptor Clustering for Unsupervised POS Induction

Author: Michael Lamar ; Yariv Maron ; Elie Bienenstock

Abstract: We present a novel approach to distributional-only, fully unsupervised, POS tagging, based on an adaptation of the EM algorithm for the estimation of a Gaussian mixture. In this approach, which we call Latent-Descriptor Clustering (LDC), word types are clustered using a series of progressively more informative descriptor vectors. These descriptors, which are computed from the immediate left and right context of each word in the corpus, are updated based on the previous state of the cluster assignments. The LDC algorithm is simple and intuitive. Using standard evaluation criteria for unsupervised POS tagging, LDC shows a substantial improvement in performance over state-of-the-art methods, along with a several-fold reduction in computational cost.

6 0.11251215 115 emnlp-2010-Uptraining for Accurate Deterministic Question Parsing

7 0.10581335 116 emnlp-2010-Using Universal Linguistic Knowledge to Guide Grammar Induction

8 0.10559292 60 emnlp-2010-Improved Fully Unsupervised Parsing with Zoomed Learning

9 0.089547127 114 emnlp-2010-Unsupervised Parse Selection for HPSG

10 0.086800553 124 emnlp-2010-Word Sense Induction Disambiguation Using Hierarchical Random Graphs

11 0.084134817 84 emnlp-2010-NLP on Spoken Documents Without ASR

12 0.084075056 27 emnlp-2010-Clustering-Based Stratified Seed Sampling for Semi-Supervised Relation Classification

13 0.084042341 8 emnlp-2010-A Multi-Pass Sieve for Coreference Resolution

14 0.080561355 17 emnlp-2010-An Efficient Algorithm for Unsupervised Word Segmentation with Branching Entropy and MDL

15 0.078139573 41 emnlp-2010-Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models

16 0.077358134 66 emnlp-2010-Inducing Word Senses to Improve Web Search Result Clustering

17 0.075156875 113 emnlp-2010-Unsupervised Induction of Tree Substitution Grammars for Dependency Parsing

18 0.068986639 67 emnlp-2010-It Depends on the Translation: Unsupervised Dependency Parsing via Word Alignment

19 0.068017416 69 emnlp-2010-Joint Training and Decoding Using Virtual Nodes for Cascaded Segmentation and Tagging Tasks

20 0.067911498 61 emnlp-2010-Improving Gender Classification of Blog Authors


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.258), (1, 0.169), (2, 0.066), (3, -0.114), (4, -0.135), (5, 0.017), (6, -0.222), (7, 0.082), (8, 0.215), (9, 0.085), (10, 0.174), (11, -0.028), (12, 0.079), (13, -0.037), (14, 0.04), (15, -0.065), (16, 0.025), (17, -0.022), (18, 0.098), (19, 0.062), (20, -0.077), (21, 0.065), (22, 0.015), (23, 0.052), (24, 0.078), (25, 0.109), (26, 0.05), (27, 0.08), (28, 0.032), (29, -0.044), (30, -0.046), (31, -0.08), (32, 0.096), (33, 0.107), (34, 0.042), (35, 0.074), (36, -0.026), (37, 0.019), (38, -0.009), (39, -0.036), (40, -0.091), (41, 0.017), (42, -0.03), (43, -0.119), (44, -0.007), (45, -0.086), (46, -0.0), (47, -0.073), (48, 0.1), (49, -0.002)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96433109 111 emnlp-2010-Two Decades of Unsupervised POS Induction: How Far Have We Come?


2 0.67489433 71 emnlp-2010-Latent-Descriptor Clustering for Unsupervised POS Induction

Author: Michael Lamar ; Yariv Maron ; Elie Bienenstock

Abstract: We present a novel approach to distributional-only, fully unsupervised POS tagging, based on an adaptation of the EM algorithm for the estimation of a Gaussian mixture. In this approach, which we call Latent-Descriptor Clustering (LDC), word types are clustered using a series of progressively more informative descriptor vectors. These descriptors, which are computed from the immediate left and right context of each word in the corpus, are updated based on the previous state of the cluster assignments. The LDC algorithm is simple and intuitive. Using standard evaluation criteria for unsupervised POS tagging, LDC shows a substantial improvement in performance over state-of-the-art methods, along with a several-fold reduction in computational cost.
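
A minimal sketch of the descriptor-clustering loop (a hypothetical simplification for illustration, not the authors' implementation): each word type gets a left/right context-count vector over the current cluster labels, types are re-clustered with a k-means step, and the descriptors are recomputed from the new assignment:

```python
import math
import random

def descriptors(corpus, assign, k):
    """One 2k-dim vector per word type: counts of the cluster labels seen
    immediately to the left and right of the type's tokens, L2-normalised."""
    vecs = {w: [0.0] * (2 * k) for w in assign}
    for i, w in enumerate(corpus):
        if i > 0:
            vecs[w][assign[corpus[i - 1]]] += 1.0
        if i + 1 < len(corpus):
            vecs[w][k + assign[corpus[i + 1]]] += 1.0
    for v in vecs.values():
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        for j in range(len(v)):
            v[j] /= norm
    return vecs

def ldc(corpus, k=2, iters=5, seed=0):
    """Alternate between recomputing descriptors and one k-means step."""
    rng = random.Random(seed)
    types = sorted(set(corpus))
    assign = {w: rng.randrange(k) for w in types}
    for _ in range(iters):
        vecs = descriptors(corpus, assign, k)
        # Centroids from the current clusters, then nearest-centroid reassign.
        cents = [[0.0] * (2 * k) for _ in range(k)]
        sizes = [0] * k
        for w, v in vecs.items():
            sizes[assign[w]] += 1
            for j, x in enumerate(v):
                cents[assign[w]][j] += x
        for c in range(k):
            if sizes[c]:
                cents[c] = [x / sizes[c] for x in cents[c]]
        assign = {w: min(range(k), key=lambda c: sum(
            (a - b) ** 2 for a, b in zip(vecs[w], cents[c]))) for w in types}
    return assign

corpus = ("the dog a cat the cat a dog " * 5).split()
clusters = ldc(corpus, k=2)
```

Because the descriptors are defined over cluster labels rather than word identities, each iteration can sharpen the clustering that produced it; the actual paper's descriptor schedule is more elaborate than this sketch.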

3 0.63752425 97 emnlp-2010-Simple Type-Level Unsupervised POS Tagging

Author: Yoong Keok Lee ; Aria Haghighi ; Regina Barzilay

Abstract: Part-of-speech (POS) tag distributions are known to exhibit sparsity: a word is likely to take a single predominant tag in a corpus. Recent research has demonstrated that incorporating this sparsity constraint improves tagging accuracy. However, in existing systems, this expansion comes with a steep increase in model complexity. This paper proposes a simple and effective tagging method that directly models tag sparsity and other distributional properties of valid POS tag assignments. In addition, this formulation results in a dramatic reduction in the number of model parameters, thereby enabling unusually rapid training. Our experiments consistently demonstrate that this model architecture yields substantial performance gains over more complex tagging counterparts. On several languages, we report performance exceeding that of more complex state-of-the-art systems.
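
The tag-sparsity idea can be illustrated independently of the model itself: collapse any token-level tagging to one tag per word type by majority vote (a toy post-hoc version of the constraint, not the paper's generative model, which builds the constraint in directly):

```python
from collections import Counter, defaultdict

def one_tag_per_type(tokens, token_tags):
    """Force each word type to its single most frequent tag."""
    votes = defaultdict(Counter)
    for w, t in zip(tokens, token_tags):
        votes[w][t] += 1
    type_tag = {w: c.most_common(1)[0][0] for w, c in votes.items()}
    return [type_tag[w] for w in tokens]

# "dog" received tags 1, 2, 1 at the token level; the type-level
# constraint resolves every occurrence to the majority tag 1.
tokens = ["the", "dog", "runs", "the", "dog", "barks", "dog"]
tags = [0, 1, 2, 0, 2, 2, 1]
resolved = one_tag_per_type(tokens, tags)
```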

4 0.62274402 7 emnlp-2010-A Mixture Model with Sharing for Lexical Semantics

Author: Joseph Reisinger ; Raymond Mooney

Abstract: We introduce tiered clustering, a mixture model capable of accounting for varying degrees of shared (context-independent) feature structure, and demonstrate its applicability to inferring distributed representations of word meaning. Common tasks in lexical semantics such as word relatedness or selectional preference can benefit from modeling such structure: Polysemous word usage is often governed by some common background metaphoric usage (e.g. the senses of line or run), and likewise modeling the selectional preference of verbs relies on identifying commonalities shared by their typical arguments. Tiered clustering can also be viewed as a form of soft feature selection, where features that do not contribute meaningfully to the clustering can be excluded. We demonstrate the applicability of tiered clustering, highlighting particular cases where modeling shared structure is beneficial and where it can be detrimental.

5 0.60652548 34 emnlp-2010-Crouching Dirichlet, Hidden Markov Model: Unsupervised POS Tagging with Context Local Tag Generation

Author: Taesun Moon ; Katrin Erk ; Jason Baldridge

Abstract: We define the crouching Dirichlet, hidden Markov model (CDHMM), an HMM for part-of-speech tagging which draws state prior distributions for each local document context. This simple modification of the HMM takes advantage of the dichotomy in natural language between content and function words. In contrast, a standard HMM draws all prior distributions once over all states and it is known to perform poorly in unsupervised and semisupervised POS tagging. This modification significantly improves unsupervised POS tagging performance across several measures on five data sets for four languages. We also show that simply using different hyperparameter values for content and function word states in a standard HMM (which we call HMM+) is surprisingly effective.
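
The HMM+ variant amounts to nothing more than different Dirichlet hyperparameters for the two kinds of state. A toy sketch of that asymmetry (the concentration values are illustrative, not taken from the paper; Dirichlet draws are built from standard-library gamma variates):

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalising independent gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(1)
vocab_size = 8
# Small concentration for content-word states (sparse emissions over many
# rare words); larger concentration for function-word states (smoother
# emissions over a few frequent words). Values are made up for illustration.
content_emissions = sample_dirichlet([0.1] * vocab_size, rng)
function_emissions = sample_dirichlet([1.0] * vocab_size, rng)
```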

6 0.52896488 60 emnlp-2010-Improved Fully Unsupervised Parsing with Zoomed Learning

7 0.45744208 84 emnlp-2010-NLP on Spoken Documents Without ASR

8 0.39256132 124 emnlp-2010-Word Sense Induction Disambiguation Using Hierarchical Random Graphs

9 0.38950726 115 emnlp-2010-Uptraining for Accurate Deterministic Question Parsing

10 0.38473046 116 emnlp-2010-Using Universal Linguistic Knowledge to Guide Grammar Induction

11 0.38176072 17 emnlp-2010-An Efficient Algorithm for Unsupervised Word Segmentation with Branching Entropy and MDL

12 0.37820548 8 emnlp-2010-A Multi-Pass Sieve for Coreference Resolution

13 0.36719427 66 emnlp-2010-Inducing Word Senses to Improve Web Search Result Clustering

14 0.35686368 27 emnlp-2010-Clustering-Based Stratified Seed Sampling for Semi-Supervised Relation Classification

15 0.34712818 114 emnlp-2010-Unsupervised Parse Selection for HPSG

16 0.33780047 113 emnlp-2010-Unsupervised Induction of Tree Substitution Grammars for Dependency Parsing

17 0.29565582 61 emnlp-2010-Improving Gender Classification of Blog Authors

18 0.28996706 2 emnlp-2010-A Fast Decoder for Joint Word Segmentation and POS-Tagging Using a Single Discriminative Model

19 0.28543302 41 emnlp-2010-Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models

20 0.27881461 10 emnlp-2010-A Probabilistic Morphological Analyzer for Syriac


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.01), (12, 0.026), (29, 0.121), (30, 0.018), (32, 0.013), (52, 0.028), (56, 0.04), (66, 0.593), (72, 0.04), (76, 0.014), (77, 0.011)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.99751693 33 emnlp-2010-Cross Language Text Classification by Model Translation and Semi-Supervised Learning

Author: Lei Shi ; Rada Mihalcea ; Mingjun Tian

Abstract: In this paper, we introduce a method that automatically builds text classifiers in a new language by training on already labeled data in another language. Our method transfers the classification knowledge across languages by translating the model features and by using an Expectation Maximization (EM) algorithm that naturally takes into account the ambiguity associated with the translation of a word. We further exploit the readily available unlabeled data in the target language via semisupervised learning, and adapt the translated model to better fit the data distribution of the target language.
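
The model-translation step can be sketched on its own (the feature weights and lexicon below are invented, and the EM refinement over ambiguous translations is omitted): per-class source-language feature weights are pushed through a word-translation table, spreading each weight over a word's candidate translations:

```python
def translate_model(weights, lexicon):
    """weights: {class: {source_word: weight}}.
    lexicon: {source_word: {target_word: p}}, p summing to 1 per source word.
    Each source-side weight is spread over the word's candidate translations."""
    translated = {}
    for cls, feats in weights.items():
        tgt = {}
        for src, w in feats.items():
            for t, p in lexicon.get(src, {}).items():
                tgt[t] = tgt.get(t, 0.0) + w * p
        translated[cls] = tgt
    return translated

# Invented one-word example: the English feature "goal" translates to
# French "but" (p=0.7) or "objectif" (p=0.3).
weights = {"sport": {"goal": 2.0}}
lexicon = {"goal": {"but": 0.7, "objectif": 0.3}}
translated = translate_model(weights, lexicon)
```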

2 0.99591184 91 emnlp-2010-Practical Linguistic Steganography Using Contextual Synonym Substitution and Vertex Colour Coding

Author: Ching-Yun Chang ; Stephen Clark

Abstract: Linguistic Steganography is concerned with hiding information in natural language text. One of the major transformations used in Linguistic Steganography is synonym substitution. However, few existing studies have examined the practical application of this approach. In this paper we propose two improvements to the use of synonym substitution for encoding hidden bits of information. First, we use the Web 1T Google n-gram corpus for checking the applicability of a synonym in context, and we evaluate this method using data from the SemEval lexical substitution task. Second, we address the problem that arises from words with more than one sense, which creates a potential ambiguity in terms of which bits are encoded by a particular word. We develop a novel method in which words are the vertices in a graph, synonyms are linked by edges, and the bits assigned to a word are determined by a vertex colouring algorithm. This method ensures that each word encodes a unique sequence of bits, without cutting out a large number of synonyms, thus maintaining a reasonable embedding capacity.
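
The vertex-colouring idea can be sketched with a greedy colouring over a toy synonym graph (the word lists are illustrative, and the paper's actual algorithm and bit-code assignment are more involved than this):

```python
def greedy_colour(adj):
    """Greedy vertex colouring: visit words in a fixed order and give each
    the smallest colour not used by an already-coloured synonym, so that
    interchangeable words never share a colour (and hence a bit code)."""
    colour = {}
    for w in sorted(adj):
        used = {colour[n] for n in adj[w] if n in colour}
        c = 0
        while c in used:
            c += 1
        colour[w] = c
    return colour

# A toy synonym graph: {big, large, huge} are mutually interchangeable,
# as are {fast, quick}.
adj = {
    "big": {"large", "huge"},
    "large": {"big", "huge"},
    "huge": {"big", "large"},
    "fast": {"quick"},
    "quick": {"fast"},
}
colours = greedy_colour(adj)
```

Reading a word's colour index in binary then yields the hidden bits, and because adjacent words are guaranteed different colours, swapping one synonym for another always changes the encoded bits.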

same-paper 3 0.99305922 111 emnlp-2010-Two Decades of Unsupervised POS Induction: How Far Have We Come?

4 0.99288028 70 emnlp-2010-Jointly Modeling Aspects and Opinions with a MaxEnt-LDA Hybrid

Author: Xin Zhao ; Jing Jiang ; Hongfei Yan ; Xiaoming Li

Abstract: Discovering and summarizing opinions from online reviews is an important and challenging task. A commonly-adopted framework generates structured review summaries with aspects and opinions. Recently topic models have been used to identify meaningful review aspects, but existing topic models do not identify aspect-specific opinion words. In this paper, we propose a MaxEnt-LDA hybrid model to jointly discover both aspects and aspect-specific opinion words. We show that with a relatively small amount of training data, our model can effectively identify aspect and opinion words simultaneously. We also demonstrate the domain adaptability of our model.

5 0.98974204 10 emnlp-2010-A Probabilistic Morphological Analyzer for Syriac

Author: Peter McClanahan ; George Busby ; Robbie Haertel ; Kristian Heal ; Deryle Lonsdale ; Kevin Seppi ; Eric Ringger

Abstract: We define a probabilistic morphological analyzer using a data-driven approach for Syriac in order to facilitate the creation of an annotated corpus. Syriac is an under-resourced Semitic language for which there are no available language tools such as morphological analyzers. We introduce novel probabilistic models for segmentation, dictionary linkage, and morphological tagging and connect them in a pipeline to create a probabilistic morphological analyzer requiring only labeled data. We explore the performance of models with varying amounts of training data and find that with about 34,500 labeled tokens, we can outperform a reasonable baseline trained on over 99,000 tokens and achieve an accuracy of just over 80%. When trained on all available training data, our joint model achieves 86.47% accuracy, a 29.7% reduction in error rate over the baseline.

6 0.92133898 104 emnlp-2010-The Necessity of Combining Adaptation Methods

7 0.91679406 50 emnlp-2010-Facilitating Translation Using Source Language Paraphrase Lattices

8 0.91422862 85 emnlp-2010-Negative Training Data Can be Harmful to Text Classification

9 0.91421461 43 emnlp-2010-Enhancing Domain Portability of Chinese Segmentation Model Using Chi-Square Statistics and Bootstrapping

10 0.89849734 119 emnlp-2010-We're Not in Kansas Anymore: Detecting Domain Changes in Streams

11 0.89530832 114 emnlp-2010-Unsupervised Parse Selection for HPSG

12 0.89357734 11 emnlp-2010-A Semi-Supervised Approach to Improve Classification of Infrequent Discourse Relations Using Feature Vector Extension

13 0.88180745 67 emnlp-2010-It Depends on the Translation: Unsupervised Dependency Parsing via Word Alignment

14 0.88146544 44 emnlp-2010-Enhancing Mention Detection Using Projection via Aligned Corpora

15 0.87212205 76 emnlp-2010-Maximum Entropy Based Phrase Reordering for Hierarchical Phrase-Based Translation

16 0.86067069 49 emnlp-2010-Extracting Opinion Targets in a Single and Cross-Domain Setting with Conditional Random Fields

17 0.85902566 92 emnlp-2010-Predicting the Semantic Compositionality of Prefix Verbs

18 0.85573369 86 emnlp-2010-Non-Isomorphic Forest Pair Translation

19 0.85285324 3 emnlp-2010-A Fast Fertility Hidden Markov Model for Word Alignment Using MCMC

20 0.85191476 9 emnlp-2010-A New Approach to Lexical Disambiguation of Arabic Text