emnlp emnlp2013 emnlp2013-53 knowledge-graph by maker-knowledge-mining

53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization


Source: pdf

Author: Kuzman Ganchev ; Dipanjan Das

Abstract: We present a framework for cross-lingual transfer of sequence information from a resource-rich source language to a resource-impoverished target language that incorporates soft constraints via posterior regularization. To this end, we use automatically word aligned bitext between the source and target language pair, and learn a discriminative conditional random field model on the target side. Our posterior regularization constraints are derived from simple intuitions about the task at hand and from cross-lingual alignment information. We show improvements over strong baselines for two tasks: part-of-speech tagging and named-entity segmentation.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract We present a framework for cross-lingual transfer of sequence information from a resource-rich source language to a resource-impoverished target language that incorporates soft constraints via posterior regularization. [sent-2, score-0.439]

2 To this end, we use automatically word aligned bitext between the source and target language pair, and learn a discriminative conditional random field model on the target side. [sent-3, score-0.164]

3 Our posterior regularization constraints are derived from simple intuitions about the task at hand and from cross-lingual alignment information. [sent-4, score-0.333]

4 We show improvements over strong baselines for two tasks: part-of-speech tagging and named-entity segmentation. [sent-5, score-0.128]

5 For a given resource-poor target language of interest, we assume that parallel data with a resource-rich source language exists. [sent-9, score-0.19]

6 With the help of this bitext and a supervised system in the source language, we infer constraints over the label distribution in the target language, and train a discriminative model using posterior regularization (Ganchev et al. [sent-10, score-0.51]

7 Cross-lingual learning of structured prediction models via parallel data has been applied for several natural language processing problems, including part-of-speech (POS) tagging (Yarowsky and Ngai, 2001), syntactic parsing (Hwa et al. [sent-12, score-0.217]

8 (2013) presented a technique for coupling token constraints derived from projected cross-lingual information and type constraints derived from noisy tag dictionaries to learn POS taggers. [sent-22, score-0.509]

9 (2009) presented a framework for learning weakly-supervised systems (in their case, dependency parsers) that incorporated alignment-based information too, but used the crosslingual information only as soft constraints, via posterior regularization. [sent-25, score-0.209]

10 The advantage of this framework lay in the fact that the projections were only trusted to a certain degree, determined by a strength hyperparameter, which unfortunately the authors did not have an elegant way to tune. [sent-26, score-0.213]

11 by treating the alignment-based projections only as soft constraints (see §3. [sent-28, score-0.237]

12 4); second, we choose the constraint strength by utilizing the tag ambiguity of tokens for a given resource-poor language (see §6. [sent-29, score-0.426]

13 task, we present a novel method to perform high-precision phrase-level entity transfer (§5. [sent-34, score-0.151]

14 2); we also provide ways to balance precision and recall with posterior regularization (§6. [sent-36, score-0.243]

15 The first idea utilizes parallel data to create full or partial annotations in the low-resource language and trains from this data. [sent-40, score-0.128]

16 (2009), who also use posterior regularization but focus on dependency parsing alone. [sent-53, score-0.243]

17 Algorithm 1: Cross-Lingual Learning with Posterior Regularization. Require: parallel source and target language data De and Df, source language model Me, task-specific target language constraints C. [sent-60, score-0.253]

18 First, we run word alignment over a large corpus of parallel data between the resource-rich source language and the resource-impoverished target language (see §4. [sent-65, score-0.19]

19 In the second step, we use a supervised model to label the source side of the parallel data (see §5. [sent-67, score-0.289]
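
The steps listed in Algorithm 1 (align the bitext, label the source side with the supervised model, project the labels across the alignments) can be sketched compactly. The snippet below is a toy illustration only: the lookup-table `source_model`, the hand-written alignment links, and the helper name are stand-ins for the paper's CRF tagger and large-scale aligner.

```python
# Toy walk-through of the first steps of Algorithm 1: label the source side with a
# supervised model and project the labels to the target side via word alignments.
# The lookup-table "model", the data, and all names here are illustrative stand-ins.
source_model = {"of": "ADP", "Asian": "ADJ", "sponges": "NOUN"}

def label_and_project(src_tokens, tgt_len, alignment):
    src_labels = [source_model.get(w, "X") for w in src_tokens]   # step 2: tag the source side
    tgt_labels = [None] * tgt_len                                 # None = no projected label
    for s, t in alignment:                                        # step 3: project across links
        tgt_labels[t] = src_labels[s]
    return tgt_labels

# step 1 (word alignment over the bitext) is assumed to have produced these links
print(label_and_project(["of", "Asian", "sponges"], 5, [(0, 2), (2, 1)]))
# -> [None, 'NOUN', 'ADP', None, None]
```

The projected labels are then used not as gold annotations but as the soft constraints fed to posterior regularization, as described in the following sentences.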

20 In the next subsection, we turn to a brief summary of this final step of estimating parameters of a discriminative model with posterior regularization. [sent-80, score-0.125]

21 2 Learning with Posterior Regularization In this work, we utilize discriminative CRF models, and use posterior regularization (PR) to optimize their parameters. [sent-82, score-0.243]

22 As a framework, posterior regularization is described in detail by Ganchev et al. [sent-83, score-0.243]

23 For example, we may know that a particular token could be labeled only by a label inventory licensed by a dictionary, or that a labeling projected from a source language is usually (but not always) correct. [sent-98, score-0.396]

24 Let Q be a set of distributions defined by: Q = {q(Y) : E_q[φ(X, Y)] ≤ b}, (4) where φ is a constraint feature function and b is a vector of non-negative values that serve as upper bounds to the expectations of every constraint feature. [sent-101, score-0.485]
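
Since the paper's equations (2)–(8) are not preserved in this excerpt, the following is a hedged reconstruction of the standard posterior regularization objective in the notation of Ganchev et al. (2010), into which the set Q above plugs; the exact form used in the paper may differ in details such as the regularizer.

```latex
\max_{\theta}\; \mathcal{L}(\theta)
  \;-\; \min_{q \in \mathcal{Q}} \mathrm{KL}\!\left( q(Y) \,\|\, p_{\theta}(Y \mid X) \right),
\qquad
\mathcal{Q} = \left\{\, q(Y) \;:\; \mathbb{E}_{q}\!\left[\phi(X, Y)\right] \le \mathbf{b} \,\right\}.
```

In that formulation the inner minimization has a closed-form dual solution q*(Y) ∝ p_θ(Y|X) exp(−λ⊤φ(X, Y)) with dual variables λ ≥ 0, which is what the gradient computations discussed below rely on.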

25 Note that the constraint features φ are not related to the model features f. [sent-103, score-0.212]

26 By contrast, the constraint features and their corresponding constraint values are used to define our training objective function (and are only used during learning). [sent-105, score-0.464]

27 In the limit, Q = {q(Y) : q(Ŷ) = 1} contains just one distribution, concentrated on a single labeling Ŷ; in this limit, posterior regularization degenerates. [sent-108, score-0.283]

28 Note: To make it easier to reason about constraint values b, we scale constraint features φ(X, Y) to lie in [0, 1] by normalizing by maxY φ(X, Y), computed over the corpus to which φ is applied. [sent-115, score-0.424]
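
The rescaling in the note above is straightforward; a minimal sketch follows (function and variable names are illustrative, not from the paper).

```python
def scale_constraint_features(phi_values):
    """Rescale raw constraint-feature values so they lie in [0, 1],
    dividing by the maximum value observed over the corpus."""
    max_phi = max(phi_values)
    if max_phi == 0:
        return list(phi_values)          # feature never fires; nothing to scale
    return [v / max_phi for v in phi_values]

# e.g. raw counts of projected-tag disagreements per sentence
print(scale_constraint_features([0, 2, 5, 1]))   # -> [0.0, 0.4, 1.0, 0.2]
```

After this normalization a bound such as b = 0.5 has a comparable meaning across constraints, which is the stated motivation for the note.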

29 To optimize Eq. 8 with respect to θ, we need to find expectations of the model features f given the current distribution pθ and the constraint distribution q. [sent-127, score-0.273]

30 To optimize Eq. 8 with respect to λ, we need to find the expectations of the constraint features φ. [sent-131, score-0.273]
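
The two sentences above describe the gradients needed during training. On the dual (λ) side, a common way to enforce E_q[φ] ≤ b is projected gradient ascent on λ ≥ 0 with q(Y) ∝ p_θ(Y|X) exp(−λ φ(X, Y)). The toy example below, with made-up numbers and a single scalar constraint, only illustrates that mechanism; it is not the paper's optimizer.

```python
import math

# Toy posterior over three labelings of one sentence and a single scalar
# constraint feature phi (e.g. number of tags disagreeing with a projection).
labelings = ["Y1", "Y2", "Y3"]
p_theta   = {"Y1": 0.5, "Y2": 0.3, "Y3": 0.2}
phi       = {"Y1": 2.0, "Y2": 1.0, "Y3": 0.0}
b         = 0.5                      # upper bound on E_q[phi]

def q_dist(lam):
    """Constrained posterior q(Y) proportional to p_theta(Y) * exp(-lam * phi(Y))."""
    unnorm = {y: p_theta[y] * math.exp(-lam * phi[y]) for y in labelings}
    z = sum(unnorm.values())
    return {y: v / z for y, v in unnorm.items()}

lam, step = 0.0, 0.5
for _ in range(200):                 # projected gradient ascent on the dual variable
    q = q_dist(lam)
    e_phi = sum(q[y] * phi[y] for y in labelings)
    lam = max(0.0, lam + step * (e_phi - b))   # keep lambda non-negative

q = q_dist(lam)
print(lam, sum(q[y] * phi[y] for y in labelings))  # E_q[phi] is driven toward b
```

Starting from E_p[φ] = 1.3, the loop drives the constrained expectation down to the bound b = 0.5 by increasing λ, which is exactly the behavior PR relies on when a projection is only partially trusted.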

31 In our notation, they define their objective as: max_θ log Σ_{Y ∈ Ŷ(X)} p_θ(Y | X) − γ‖θ‖², (10) where Ŷ(X) are the constrained lattices of label sequences that agree with both a dictionary and cross-lingually projected POS tags for each sentence of the training corpus. [sent-148, score-0.415]

32 Let us define a constraint feature φ(X, Y) which counts the number of tags in Y which are outside the constraint set Ŷ(X) and require φ(X, Y) ≤ 0. [sent-149, score-0.505]
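
A toy version of this constraint feature, with an illustrative per-token representation of the pruned lattice Ŷ(X):

```python
def out_of_lattice_count(tags, allowed):
    """Constraint feature phi(X, Y): number of positions whose tag falls
    outside the pruned lattice (here, a per-token set of allowed tags)."""
    return sum(1 for tag, ok in zip(tags, allowed) if tag not in ok)

allowed = [{"NOUN"}, {"VERB", "NOUN"}, {"ADJ"}]     # pruned lattice for a 3-token sentence
print(out_of_lattice_count(["NOUN", "VERB", "ADJ"], allowed))  # 0: inside the lattice
print(out_of_lattice_count(["NOUN", "ADJ", "ADJ"], allowed))   # 1: one violation
# Requiring E_q[phi] <= 0 recovers the hard lattice constraint; a positive bound b
# relaxes it, letting some probability mass fall outside the pruned lattice.
```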

33 Note that we did not implement regularization of θ in the stochastic optimizer, hence our PR objective (Eq. [sent-152, score-0.158]

34 avoid maintaining a parameter for the constraint, but lose the ability to relax the constraint value and allow some probability mass outside the pruned lattice. [sent-160, score-0.212]

35 Since the objectives are non-convex, the two optimization techniques could lead to different local optima even when the constraint is not relaxed (b = 0). [sent-162, score-0.212]

36 After pruning the search space with the dictionary, we place soft constraints derived by projecting POS tags across word alignments. [sent-170, score-0.255]

37 2), but we also filter any projected tags that are not licensed by the dictionary. [sent-173, score-0.335]

38 The example in Figure 1 illustrates why this dictionary filtering step is important. [sent-174, score-0.153]

39 Our supervised tagger correctly tags Asian with the ADJ tag as shown in the figure. [sent-176, score-0.311]

40 Because the Spanish Wiktionary only allows the NOUN tag for Asia, we do not project the ADJ tag from the English word Asian. [sent-178, score-0.218]

41 By contrast, we do project the NOUN tag from the English word sponges to the Spanish [sent-179, score-0.154]

42 Figure 1: An English (top) – Spanish (bottom) phrase pair from our parallel data: "of [Asian]_MISC sponges" (tagged ADP ADJ NOUN), aligned to "de las esponjas de Asia". [sent-183, score-0.326]

43 word esponjas because this tag is in our dictionary for the latter word. [sent-186, score-0.227]
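
The Wiktionary filtering step behind the Figure 1 example amounts to a simple license check on each projected tag; the dictionary contents below are illustrative.

```python
def license_projection(target_word, projected_tag, dictionary):
    """Keep a projected tag only if the target-side dictionary licenses it for that
    word; words missing from the dictionary accept any projected tag."""
    allowed = dictionary.get(target_word)
    return allowed is None or projected_tag in allowed

wiktionary = {"Asia": {"NOUN"}, "esponjas": {"NOUN"}}
print(license_projection("Asia", "ADJ", wiktionary))       # False: ADJ not licensed, so filtered
print(license_projection("esponjas", "NOUN", wiktionary))  # True: the NOUN projection is kept
```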

44 The first column of Table 1 lists all seventeen languages using their two-letter abbreviation codes from the ISO 639-1 standard. [sent-190, score-0.159]

45 , 1993, with tags mapped to the universal tags) to train our supervised source-side model. [sent-196, score-0.138]

46 The English supervised NE tagger correctly identifies Asian as a named entity of type MISC (miscellaneous). [sent-202, score-0.241]

47 We use English as a source language and train a supervised English named-entity tagger with the labels in place, using the CoNLL 2003 shared task data (Tjong Kim Sang and De Meulder, 2003). [sent-207, score-0.183]

48 3 Parallel Data For both tasks we use parallel data gathered automatically from the web using the method of Uszkoreit et al. [sent-211, score-0.128]

49 (2010), as well as data from Europarl (Koehn, 2005) and the UN parallel corpus (UN, 2006), for languages covered by the latter two corpora. [sent-212, score-0.21]

50 The parallel sentences are word aligned with the aligner of DeNero and Macherey (2011). [sent-213, score-0.221]

51 The size of the parallel corpus is larger than we need for our tasks, so we follow Täckström et al. [sent-214, score-0.128]

52 (2013) in sampling 500k tokens for POS tagging and 10k sentences for named-entity segmentation (see §5. [sent-215, score-0.189]

53 When describing feature sets we refer to features conjoined with just a single tag as emission features and with consecutive tag pairs as transition features. [sent-223, score-0.322]

54 This did not work well because the CoNLL gazetteers do not have good coverage on our parallel datasets, which we use for training. [sent-225, score-0.128]

55 1 Supervised Source-Side Model We tag the English side of our parallel data with a supervised first-order linear-chain CRF POS tagger. [sent-228, score-0.336]

56 We set the number of clusters to 256 for both the source side tagger and all the other languages. [sent-236, score-0.168]

57 On Section 23 of the WSJ section of the Penn Treebank, the source side tagger achieves an accuracy of 96. [sent-237, score-0.168]

58 (2013), we tag the English side of our parallel data using the source-side POS tagger, intersect the word alignments and filter alignments with confidence below 0. [sent-242, score-0.357]
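
Intersecting the two alignment directions is the standard symmetrization step mentioned in the sentence above; a minimal sketch follows (the confidence cut-off is truncated in this excerpt, so it is omitted here, and the link data is illustrative).

```python
def intersect_alignments(forward, reverse):
    """Keep only alignment links present in both directions:
    forward links are (src, tgt) pairs, reverse links are (tgt, src) pairs."""
    return sorted(set(forward) & {(s, t) for (t, s) in reverse})

forward = [(0, 0), (1, 2), (2, 1)]       # source-to-target links
reverse = [(0, 0), (2, 1), (3, 2)]       # target-to-source links
print(intersect_alignments(forward, reverse))   # -> [(0, 0), (1, 2)]
```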

59 The emission features are the same as the supervised model but without the punctuation feature, and we use only the bias transition feature. [sent-249, score-0.161]

60 We have only one constraint feature in our posterior regularization models that fires for the unpruned projected tags on words xi. [sent-256, score-0.755]

61 This feature controls how often our model trusts a projected tag; we explain how its strength is chosen in §6. [sent-257, score-0.285]

62 2 Word-Alignment Filtering Projecting named entities across languages can be error prone for several reasons. [sent-278, score-0.129]

63 Word alignment errors are particularly problematic for entity mentions because of the garbage collector effect (Brown et al. [sent-280, score-0.121]

64 labeling on the source side, which is inaccurate if the parallel corpus is out of domain. [sent-287, score-0.23]

65 We discard sentence pairs where more than 30% of the source language tokens are unaligned, where any source entities are unaligned or where any source entities are more than 4 tokens long. [sent-289, score-0.186]

66 We also compute a confidence score over entity annotations as the minimum posterior over the tags that comprise the entity and discard sentence pairs that have an entity with confidence below 0. [sent-290, score-0.425]

67 Finally, we discard any sentences that contain no projected entities. [sent-292, score-0.18]
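
The three sentences above list the sentence-pair filters applied before training the NE model. A hedged sketch of those checks follows; the 30% and 4-token thresholds come from the text, while the span encoding, the helper name, and the confidence cut-off (truncated above) are illustrative assumptions, and source and projected entity spans are conflated for brevity.

```python
def keep_sentence_pair(src_tokens, entities, alignment, entity_confidences,
                       max_unaligned_frac=0.30, max_entity_len=4, min_confidence=None):
    """Toy sentence-pair filter; entities are [start, end) spans over source tokens."""
    aligned_src = {s for (s, _) in alignment}
    # 1) discard pairs where too many source tokens are unaligned
    unaligned = sum(1 for i in range(len(src_tokens)) if i not in aligned_src)
    if unaligned / len(src_tokens) > max_unaligned_frac:
        return False
    for start, end in entities:
        # 2) discard pairs with overly long source entities
        if end - start > max_entity_len:
            return False
        # 3) discard pairs whose source entities are entirely unaligned
        if not any(start <= s < end for s in aligned_src):
            return False
    # 4) discard pairs containing a low-confidence projected entity
    #    (confidence = minimum posterior over the tags that make up the entity)
    if min_confidence is not None and any(c < min_confidence for c in entity_confidences):
        return False
    # 5) discard sentences with no projected entities at all
    return len(entities) > 0

print(keep_sentence_pair(["The", "European", "Union", "said", "."],
                         [(1, 3)], [(0, 0), (1, 1), (2, 2), (3, 3)], [0.9]))  # -> True
```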

68 We compare our approach (“PR” in Table 2) to a baseline (“BASE” in Table 2) which treats the projected annotations as fully observed. [sent-301, score-0.18]

69 The PR model treats the projected NE spans of a sentence as observed, and allows all labels on the remaining tokens. [sent-302, score-0.18]
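
One way to realize the PR treatment just described, clamping projected spans while leaving all other tokens unconstrained, is a per-token allowed-label table over a BIO-style tag set; since the exact label set is not spelled out in this excerpt, B/I/O here is an assumption.

```python
def allowed_labels(n_tokens, projected_spans, label_set=("B", "I", "O")):
    """Per-token allowed labels: tokens inside a projected entity span are clamped
    to the projected B/I labels; all other tokens may take any label (soft setting)."""
    allowed = [set(label_set) for _ in range(n_tokens)]
    for start, end in projected_spans:                   # [start, end) spans
        allowed[start] = {"B"}
        for i in range(start + 1, end):
            allowed[i] = {"I"}
    return allowed

print(allowed_labels(5, [(1, 3)]))
# -> [{B,I,O}, {B}, {I}, {B,I,O}, {B,I,O}]  (set contents; print order may vary)
```

The BASE model would instead clamp the unprojected tokens to "O" as well, which is consistent with the recall increase for PR discussed later in the results.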

70 We add two features that fire when the current word is tagged “O”: a bias feature and a feature that fires when the automatic POS tag is a proper noun. [sent-304, score-0.148]

71 6 Results In this section, we turn to our experimental results; first, we focus on POS tagging and then turn to the NE segmentation task. [sent-308, score-0.189]

72 1, it is important to filter out projected annotations not licensed by Wiktionary. [sent-311, score-0.254]

73 Figure 2: Correlation between optimal constraint value b and dictionary pruning efficiency (x-axis: 1/TpT). [sent-318, score-0.285]

74 Specifically, for each token, we counted the number of tags licensed by the dictionary, or all tags for word forms not in the dictionary. [sent-321, score-0.236]

75 For each language, we also ran our system with constraint strengths in {0. [sent-322, score-0.212]

76 00}, and computed the optimal constraint strength from this set. [sent-331, score-0.317]

77 We found that the best constraint strength is closely correlated with the average number of tags available for each token. [sent-332, score-0.398]

78 Figure 2 shows the best constraint strength as a function of the inverse of the number of unpruned tags per token. [sent-333, score-0.398]

79 When applying this technique to a new language, we would not be able to estimate the optimal constraint strength, but we could use the linear approximation and knowledge of 1/TpT to estimate it. [sent-336, score-0.212]
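
The preceding sentences describe estimating the constraint strength for a new language from a linear fit against 1/TpT (the inverse number of unpruned tags per token). The sketch below uses made-up tuning points purely to show the mechanics of such a fit; the actual numbers and the fitted slope from the paper are not reproduced here.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + c, used to approximate the best
    PR constraint strength as a linear function of 1/TpT."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# Illustrative (made-up) tuning data: inverse tags-per-token vs. best strength b
inv_tpt       = [0.30, 0.45, 0.60, 0.80]
best_strength = [0.25, 0.40, 0.55, 0.75]
a, c = fit_line(inv_tpt, best_strength)
print(round(a * 0.50 + c, 2))   # predicted strength for a new language with 1/TpT = 0.5
```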

80 , with and without the ‘+’ extension), our estimated constraint strength is usually better than using a constraint strength of 1. [sent-343, score-0.634]

81 For the languages where PR results in large improvements, it stems from the ability to allow the sentential context to sometimes override the tag projected via the parallel data. [sent-355, score-0.499]

82 For example, the phrase “podívali jsme se” translates to “we looked”, and the word jsme would typically be aligned to we; se, which serves as a reflexive pronoun here, remains unaligned. [sent-358, score-0.173]

83 Consequently, in our data, over 7000 occurrences of se appear, but only 17 instances have a tag projection that is not filtered by Wiktionary. [sent-359, score-0.184]

84 6% have the particle annotation projected from the English ’s possessive marker. [sent-369, score-0.18]

85 Alternatively, we could add another constraint to prefer closed-class words over open-class words when both are licensed by the dictionary. [sent-372, score-0.286]

86 When we add such a constraint to Chinese with a constraint value of 0. [sent-373, score-0.424]

87 Named-Entity Segmentation Results: Table 2 shows the results for the named entity segmentation experiments. [sent-378, score-0.22]

88 By having a soft constraint via PR and allowing some segmentations to fall outside of the transferred one, we get an increase in recall. [sent-384, score-0.296]

89 While filtering parallel sentences and using a soft constraint both increase recall, even our strongest model does not get enough information to predict these entities, and they continue to be major sources of error. [sent-427, score-0.504]

90 Note that “No Filtering” still discards sentences with no projected entities. [sent-467, score-0.18]

91 Note that because we focus on named entity segmentation, our results are not directly comparable to those of Täckström (2012), who trains a de-lexicalized named entity recognizer on one language and applies it to other languages. [sent-472, score-0.24]

92 Error Analysis: In order to get a sense for the types of errors made by the baseline which are corrected by the PR model, we collected statistics about the most frequent errors in the segments extracted by the baseline and by our model. [sent-473, score-0.138]

93 Our segmentation system is most useful for the long tail of entity mentions. [sent-477, score-0.173]

94 The Spanish annotation guidelines include enclosing quotes as part of the entity name, and failing to include them accounts for just under 1% of the precision errors of the PR system that uses filtering. [sent-481, score-0.121]

95 7 Conclusions In this paper, we presented a framework for crosslingual transfer of sequence information from a resource-rich source language to a resource-poor target language. [sent-484, score-0.14]

96 Our framework incorporates soft constraints while training with projected information via posterior regularization. [sent-485, score-0.479]

97 The soft constraints used in our work model intuitions about a given task. [sent-487, score-0.174]

98 For the POS tagging problem, we designed constraints that also incorporate projected token-level information, and presented a principled method for choosing the extent to which this information should be trusted within the PR framework. [sent-488, score-0.404]

99 Multilingual named entity recognition using parallel data and metadata from wikipedia. [sent-562, score-0.248]

100 Nudging the envelope of direct transfer methods for multilingual named entity recognition. [sent-621, score-0.198]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('ckstro', 0.468), ('pr', 0.26), ('constraint', 0.212), ('projected', 0.18), ('ta', 0.166), ('ganchev', 0.148), ('parallel', 0.128), ('posterior', 0.125), ('regularization', 0.118), ('tag', 0.109), ('pos', 0.106), ('tjong', 0.106), ('strength', 0.105), ('spanish', 0.105), ('segmentation', 0.1), ('wiktionary', 0.093), ('constraints', 0.09), ('jq', 0.089), ('tagging', 0.089), ('conll', 0.085), ('soft', 0.084), ('sang', 0.082), ('languages', 0.082), ('crf', 0.082), ('petrov', 0.081), ('tags', 0.081), ('filtering', 0.08), ('transfer', 0.078), ('oscar', 0.077), ('seventeen', 0.077), ('das', 0.077), ('kim', 0.076), ('projection', 0.075), ('licensed', 0.074), ('dictionary', 0.073), ('entity', 0.073), ('labelings', 0.067), ('base', 0.066), ('tagger', 0.064), ('projections', 0.063), ('source', 0.062), ('uszkoreit', 0.062), ('mcdonald', 0.062), ('expectations', 0.061), ('ngai', 0.058), ('bitext', 0.058), ('meulder', 0.058), ('ax', 0.058), ('ryan', 0.057), ('supervised', 0.057), ('transition', 0.055), ('de', 0.054), ('slav', 0.053), ('kuzman', 0.053), ('hwa', 0.053), ('abeill', 0.053), ('dipanjan', 0.052), ('yarowsky', 0.052), ('aligner', 0.049), ('bio', 0.049), ('emission', 0.049), ('asia', 0.049), ('errors', 0.048), ('kl', 0.047), ('dual', 0.047), ('named', 0.047), ('esponjas', 0.045), ('jsme', 0.045), ('mila', 0.045), ('sponges', 0.045), ('trusted', 0.045), ('zeman', 0.045), ('aligned', 0.044), ('asian', 0.044), ('pronominal', 0.042), ('segments', 0.042), ('side', 0.042), ('ne', 0.042), ('gradient', 0.042), ('english', 0.041), ('ner', 0.041), ('lattices', 0.041), ('labeling', 0.04), ('token', 0.04), ('objective', 0.04), ('alignments', 0.039), ('adverb', 0.039), ('adj', 0.039), ('jakob', 0.039), ('dutch', 0.039), ('german', 0.039), ('yb', 0.039), ('minq', 0.039), ('christodoulopoulos', 0.039), ('deutsche', 0.039), ('fires', 0.039), ('namedentity', 0.039), ('reflexive', 0.039), ('sourceside', 0.039), ('taskspecific', 0.039)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization

Author: Kuzman Ganchev ; Dipanjan Das

Abstract: We present a framework for cross-lingual transfer of sequence information from a resource-rich source language to a resource-impoverished target language that incorporates soft constraints via posterior regularization. To this end, we use automatically word aligned bitext between the source and target language pair, and learn a discriminative conditional random field model on the target side. Our posterior regularization constraints are derived from simple intuitions about the task at hand and from cross-lingual alignment information. We show improvements over strong baselines for two tasks: part-of-speech tagging and named-entity segmentation.

2 0.16029817 201 emnlp-2013-What is Hidden among Translation Rules

Author: Libin Shen ; Bowen Zhou

Abstract: Most of the machine translation systems rely on a large set of translation rules. These rules are treated as discrete and independent events. In this short paper, we propose a novel method to model rules as observed generation output of a compact hidden model, which leads to better generalization capability. We present a preliminary generative model to test this idea. Experimental results show about one point improvement on TER-BLEU over a strong baseline in Chinese-to-English translation.

3 0.15682609 70 emnlp-2013-Efficient Higher-Order CRFs for Morphological Tagging

Author: Thomas Mueller ; Helmut Schmid ; Hinrich Schutze

Abstract: Training higher-order conditional random fields is prohibitive for huge tag sets. We present an approximated conditional random field using coarse-to-fine decoding and early updating. We show that our implementation yields fast and accurate morphological taggers across six languages with different morphological properties and that across languages higher-order models give significant improvements over 1st-order models.

4 0.14037383 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging

Author: Xiaoqing Zheng ; Hanyang Chen ; Tianyu Xu

Abstract: This study explores the feasibility of performing Chinese word segmentation (CWS) and POS tagging by deep learning. We try to avoid task-specific feature engineering, and use deep layers of neural networks to discover relevant features to the tasks. We leverage large-scale unlabeled data to improve internal representation of Chinese characters, and use these improved representations to enhance supervised word segmentation and POS tagging models. Our networks achieved close to state-of-theart performance with minimal computational cost. We also describe a perceptron-style algorithm for training the neural networks, as an alternative to maximum-likelihood method, to speed up the training process and make the learning algorithm easier to be implemented.

5 0.11233096 83 emnlp-2013-Exploring the Utility of Joint Morphological and Syntactic Learning from Child-directed Speech

Author: Stella Frank ; Frank Keller ; Sharon Goldwater

Abstract: Children learn various levels of linguistic structure concurrently, yet most existing models of language acquisition deal with only a single level of structure, implicitly assuming a sequential learning process. Developing models that learn multiple levels simultaneously can provide important insights into how these levels might interact synergistically during learning. Here, we present a model that jointly induces syntactic categories and morphological segmentations by combining two well-known models for the individual tasks. We test on child-directed utterances in English and Spanish and compare to single-task baselines. In the morphologically poorer language (English), the model improves morphological segmentation, while in the morphologically richer language (Spanish), it leads to better syntactic categorization. These results provide further evidence that joint learning is useful, but also suggest that the benefits may be different for typologically different languages.

6 0.10295743 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation

7 0.10042401 169 emnlp-2013-Semi-Supervised Representation Learning for Cross-Lingual Text Classification

8 0.098296545 198 emnlp-2013-Using Soft Constraints in Joint Inference for Clinical Concept Recognition

9 0.096104085 181 emnlp-2013-The Effects of Syntactic Features in Automatic Prediction of Morphology

10 0.09456297 82 emnlp-2013-Exploring Representations from Unlabeled Data with Co-training for Chinese Word Segmentation

11 0.091237679 167 emnlp-2013-Semi-Markov Phrase-Based Monolingual Alignment

12 0.091122158 2 emnlp-2013-A Convex Alternative to IBM Model 2

13 0.089292146 73 emnlp-2013-Error-Driven Analysis of Challenges in Coreference Resolution

14 0.086436622 86 emnlp-2013-Feature Noising for Log-Linear Structured Prediction

15 0.082610555 84 emnlp-2013-Factored Soft Source Syntactic Constraints for Hierarchical Machine Translation

16 0.079971477 40 emnlp-2013-Breaking Out of Local Optima with Count Transforms and Model Recombination: A Study in Grammar Induction

17 0.077924706 57 emnlp-2013-Dependency-Based Decipherment for Resource-Limited Machine Translation

18 0.077456124 96 emnlp-2013-Identifying Phrasal Verbs Using Many Bilingual Corpora

19 0.076001137 135 emnlp-2013-Monolingual Marginal Matching for Translation Model Adaptation

20 0.075055428 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.277), (1, -0.068), (2, 0.038), (3, -0.081), (4, -0.136), (5, -0.042), (6, 0.033), (7, 0.07), (8, -0.03), (9, 0.035), (10, 0.055), (11, -0.06), (12, 0.086), (13, 0.053), (14, -0.061), (15, -0.048), (16, -0.095), (17, 0.055), (18, 0.016), (19, -0.078), (20, 0.031), (21, 0.074), (22, -0.016), (23, 0.08), (24, 0.165), (25, 0.01), (26, -0.147), (27, -0.177), (28, 0.055), (29, 0.033), (30, -0.033), (31, 0.008), (32, 0.006), (33, 0.023), (34, 0.115), (35, -0.066), (36, 0.102), (37, 0.235), (38, 0.064), (39, 0.131), (40, 0.029), (41, 0.05), (42, 0.064), (43, 0.127), (44, -0.002), (45, -0.066), (46, -0.041), (47, 0.126), (48, 0.056), (49, -0.013)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95336854 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization

Author: Kuzman Ganchev ; Dipanjan Das

Abstract: We present a framework for cross-lingual transfer of sequence information from a resource-rich source language to a resource-impoverished target language that incorporates soft constraints via posterior regularization. To this end, we use automatically word aligned bitext between the source and target language pair, and learn a discriminative conditional random field model on the target side. Our posterior regularization constraints are derived from simple intuitions about the task at hand and from cross-lingual alignment information. We show improvements over strong baselines for two tasks: part-of-speech tagging and named-entity segmentation.

2 0.65939718 70 emnlp-2013-Efficient Higher-Order CRFs for Morphological Tagging

Author: Thomas Mueller ; Helmut Schmid ; Hinrich Schutze

Abstract: Training higher-order conditional random fields is prohibitive for huge tag sets. We present an approximated conditional random field using coarse-to-fine decoding and early updating. We show that our implementation yields fast and accurate morphological taggers across six languages with different morphological properties and that across languages higher-order models give significant improvements over 1st-order models.

3 0.60381812 2 emnlp-2013-A Convex Alternative to IBM Model 2

Author: Andrei Simion ; Michael Collins ; Cliff Stein

Abstract: The IBM translation models have been hugely influential in statistical machine translation; they are the basis of the alignment models used in modern translation systems. Excluding IBM Model 1, the IBM translation models, and practically all variants proposed in the literature, have relied on the optimization of likelihood functions or similar functions that are non-convex, and hence have multiple local optima. In this paper we introduce a convex relaxation of IBM Model 2, and describe an optimization algorithm for the relaxation based on a subgradient method combined with exponentiated-gradient updates. Our approach gives the same level of alignment accuracy as IBM Model 2.

4 0.59232819 86 emnlp-2013-Feature Noising for Log-Linear Structured Prediction

Author: Sida Wang ; Mengqiu Wang ; Stefan Wager ; Percy Liang ; Christopher D. Manning

Abstract: NLP models have many and sparse features, and regularization is key for balancing model overfitting versus underfitting. A recently repopularized form of regularization is to generate fake training data by repeatedly adding noise to real data. We reinterpret this noising as an explicit regularizer, and approximate it with a second-order formula that can be used during training without actually generating fake data. We show how to apply this method to structured prediction using multinomial logistic regression and linear-chain CRFs. We tackle the key challenge of developing a dynamic program to compute the gradient of the regularizer efficiently. The regularizer is a sum over inputs, so we can estimate it more accurately via a semi-supervised or transductive extension. Applied to text classification and NER, our method provides a > 1% absolute performance gain over use of standard L2 regularization.

5 0.55050623 198 emnlp-2013-Using Soft Constraints in Joint Inference for Clinical Concept Recognition

Author: Prateek Jindal ; Dan Roth

Abstract: This paper introduces IQPs (Integer Quadratic Programs) as a way to model joint inference for the task of concept recognition in clinical domain. IQPs make it possible to easily incorporate soft constraints in the optimization framework and still support exact global inference. We show that soft constraints give statistically significant performance improvements when compared to hard constraints.

6 0.50938416 190 emnlp-2013-Ubertagging: Joint Segmentation and Supertagging for English

7 0.50908709 201 emnlp-2013-What is Hidden among Translation Rules

8 0.50526029 195 emnlp-2013-Unsupervised Spectral Learning of WCFG as Low-rank Matrix Completion

9 0.50183481 111 emnlp-2013-Joint Chinese Word Segmentation and POS Tagging on Heterogeneous Annotated Corpora with Multiple Task Learning

10 0.47748727 40 emnlp-2013-Breaking Out of Local Optima with Count Transforms and Model Recombination: A Study in Grammar Induction

11 0.45703509 138 emnlp-2013-Naive Bayes Word Sense Induction

12 0.45677829 181 emnlp-2013-The Effects of Syntactic Features in Automatic Prediction of Morphology

13 0.43237653 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging

14 0.42979786 83 emnlp-2013-Exploring the Utility of Joint Morphological and Syntactic Learning from Child-directed Speech

15 0.4262982 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models

16 0.40522793 72 emnlp-2013-Elephant: Sequence Labeling for Word and Sentence Segmentation

17 0.40282768 26 emnlp-2013-Assembling the Kazakh Language Corpus

18 0.39515114 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation

19 0.38805878 82 emnlp-2013-Exploring Representations from Unlabeled Data with Co-training for Chinese Word Segmentation

20 0.38399968 203 emnlp-2013-With Blinkers on: Robust Prediction of Eye Movements across Readers


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.039), (18, 0.052), (22, 0.035), (26, 0.023), (30, 0.087), (34, 0.134), (45, 0.02), (47, 0.014), (50, 0.034), (51, 0.214), (52, 0.011), (64, 0.019), (66, 0.061), (71, 0.029), (75, 0.039), (77, 0.051), (90, 0.015), (96, 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.92820477 52 emnlp-2013-Converting Continuous-Space Language Models into N-Gram Language Models for Statistical Machine Translation

Author: Rui Wang ; Masao Utiyama ; Isao Goto ; Eiichro Sumita ; Hai Zhao ; Bao-Liang Lu

Abstract: Neural network language models, or continuous-space language models (CSLMs), have been shown to improve the performance of statistical machine translation (SMT) when they are used for reranking n-best translations. However, CSLMs have not been used in the first pass decoding of SMT, because using CSLMs in decoding takes a lot of time. In contrast, we propose a method for converting CSLMs into back-off n-gram language models (BNLMs) so that we can use converted CSLMs in decoding. We show that they outperform the original BNLMs and are comparable with the traditional use of CSLMs in reranking.

same-paper 2 0.9134655 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization

Author: Kuzman Ganchev ; Dipanjan Das

Abstract: We present a framework for cross-lingual transfer of sequence information from a resource-rich source language to a resource-impoverished target language that incorporates soft constraints via posterior regularization. To this end, we use automatically word aligned bitext between the source and target language pair, and learn a discriminative conditional random field model on the target side. Our posterior regularization constraints are derived from simple intuitions about the task at hand and from cross-lingual alignment information. We show improvements over strong baselines for two tasks: part-of-speech tagging and named-entity segmentation.

3 0.85655218 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging

Author: Xiaoqing Zheng ; Hanyang Chen ; Tianyu Xu

Abstract: This study explores the feasibility of performing Chinese word segmentation (CWS) and POS tagging by deep learning. We try to avoid task-specific feature engineering, and use deep layers of neural networks to discover relevant features to the tasks. We leverage large-scale unlabeled data to improve internal representation of Chinese characters, and use these improved representations to enhance supervised word segmentation and POS tagging models. Our networks achieved close to state-of-theart performance with minimal computational cost. We also describe a perceptron-style algorithm for training the neural networks, as an alternative to maximum-likelihood method, to speed up the training process and make the learning algorithm easier to be implemented.

4 0.85544568 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation

Author: Uri Lerner ; Slav Petrov

Abstract: We present a simple and novel classifier-based preordering approach. Unlike existing preordering models, we train feature-rich discriminative classifiers that directly predict the target-side word order. Our approach combines the strengths of lexical reordering and syntactic preordering models by performing long-distance reorderings using the structure of the parse tree, while utilizing a discriminative model with a rich set of features, including lexical features. We present extensive experiments on 22 language pairs, including preordering into English from 7 other languages. We obtain improvements of up to 1.4 BLEU on language pairs in the WMT 2010 shared task. For languages from different families the improvements often exceed 2 BLEU. Many of these gains are also significant in human evaluations.

5 0.85492605 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction

Author: Alla Rozovskaya ; Dan Roth

Abstract: State-of-the-art systems for grammatical error correction are based on a collection of independently-trained models for specific errors. Such models ignore linguistic interactions at the sentence level and thus do poorly on mistakes that involve grammatical dependencies among several words. In this paper, we identify linguistic structures with interacting grammatical properties and propose to address such dependencies via joint inference and joint learning. We show that it is possible to identify interactions well enough to facilitate a joint approach and, consequently, that joint methods correct incoherent predictions that independentlytrained classifiers tend to produce. Furthermore, because the joint learning model considers interacting phenomena during training, it is able to identify mistakes that require mak- ing multiple changes simultaneously and that standard approaches miss. Overall, our model significantly outperforms the Illinois system that placed first in the CoNLL-2013 shared task on grammatical error correction.

6 0.85275483 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation

7 0.85052615 181 emnlp-2013-The Effects of Syntactic Features in Automatic Prediction of Morphology

8 0.84895164 143 emnlp-2013-Open Domain Targeted Sentiment

9 0.84835762 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity

10 0.84749281 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction

11 0.84735501 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models

12 0.84659684 83 emnlp-2013-Exploring the Utility of Joint Morphological and Syntactic Learning from Child-directed Speech

13 0.8459453 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation

14 0.84339935 167 emnlp-2013-Semi-Markov Phrase-Based Monolingual Alignment

15 0.84318942 82 emnlp-2013-Exploring Representations from Unlabeled Data with Co-training for Chinese Word Segmentation

16 0.84301805 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation

17 0.84228826 132 emnlp-2013-Mining Scientific Terms and their Definitions: A Study of the ACL Anthology

18 0.84197891 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk

19 0.84133327 140 emnlp-2013-Of Words, Eyes and Brains: Correlating Image-Based Distributional Semantic Models with Neural Representations of Concepts

20 0.84007603 13 emnlp-2013-A Study on Bootstrapping Bilingual Vector Spaces from Non-Parallel Data (and Nothing Else)