emnlp emnlp2013 emnlp2013-195 knowledge-graph by maker-knowledge-mining

195 emnlp-2013-Unsupervised Spectral Learning of WCFG as Low-rank Matrix Completion


Source: pdf

Author: Raphael Bailly ; Xavier Carreras ; Franco M. Luque ; Ariadna Quattoni

Abstract: We derive a spectral method for unsupervised learning of Weighted Context Free Grammars. We frame WCFG induction as finding a Hankel matrix that has low rank and is linearly constrained to represent a function computed by inside-outside recursions. The proposed algorithm picks the grammar that agrees with a sample and is the simplest with respect to the nuclear norm of the Hankel matrix.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 s Abstract We derive a spectral method for unsupervised learning of Weighted Context Free Grammars. [sent-3, score-0.42]

2 We frame WCFG induction as finding a Hankel matrix that has low rank and is linearly constrained to represent a function computed by inside-outside recursions. [sent-4, score-0.148]

3 The proposed algorithm picks the grammar that agrees with a sample and is the simplest with respect to the nuclear norm of the Hankel matrix. [sent-5, score-0.153]

4 In spirit, the work by Clark (2001; 2007) is probably the most similar to our approach since both approaches share an algebraic view of the problem. [sent-26, score-0.082]

5 In his case the key idea is to work with an algebraic representation of a WCFG. [sent-27, score-0.082]

6 In recent years, multiple spectral learning algorithms have been proposed for a wide range of models (Hsu et al. [sent-29, score-0.36]

7 Since the spectral approach provides a good thinking tool to reason about distributions over Σ∗, the question of whether they can be used for unsupervised learning of WCFG seems natural. [sent-36, score-0.387]

8 Still, while spectral algorithms for unsupervised learning of languages can learn regular languages, tree languages and simple dependency grammars, the frontier to WCFG seems hard to reach. [sent-37, score-0.474]

9 In fact, the most recent theoretical results on spectral learning of WCFG do not seem to be very encouraging. [sent-38, score-0.36]

10 Although, for some simple grammar subclasses (e. [sent-43, score-0.095]

11 In their paper, they propose a spectral algorithm based on a generalization of the method of moments for these restricted subclasses. [sent-48, score-0.385]

12 Thus one open direction for spectral research consists in defining subclasses of context-free languages that can be learned (in the strong sense) from observations of yields. [sent-49, score-0.455]

13 Our main contribution is to present a spectral algorithm for unsupervised learning of WCFG. [sent-52, score-0.42]

14 (2012), the algorithm is framed as a convex optimization where we search for a low-rank matrix satisfying two types of constraints: (1) Constraints derived from observable statistics over yields; and (2) Constraints derived from certain recurrence relations satisfied by a WCFG. [sent-54, score-0.221]

15 Our derivations of the learning algorithm illustrate the main ingredients behind the spectral approach to learning functions over Σ∗, which are: (1) to exploit the recurrence relations satisfied by the target family of functions and (2) to provide algebraic formulations of these relations. [sent-55, score-0.473]

16 Section 3 establishes that spectral methods can learn a WCFG from a Hankel matrix containing statistics about context-free cuts. [sent-62, score-0.426]

17 Section 4 presents the unsupervised algorithm, where we formulate grammar induction as a low-rank optimization. [sent-63, score-0.159]

18 The set of all finite strings over Σ is denoted by Σ∗. [sent-67, score-0.115]

19 We start with a classic definition and then describe an algebraic form of WCFG that will be used throughout the paper. [sent-79, score-0.106]

20 The function maps Σ+ → R (1): f_Ḡ(x) = Σ_{i∈V} w_?(i) β̄_Ḡ(i ⇒∗ x), where we define the inside function β̄_Ḡ recursively. [sent-99, score-0.081]

21 This exploits the fundamental inside recursion in WCFG (Baker, 1979; Lari and Young, 1990). [sent-107, score-0.081]

22 The quantity in (6), summed over i ∈ V, is the weight that the grammar Ḡ assigns to a string xyz that has a cut or bracketing around y. [sent-123, score-0.183]

23 In this paper we will make frequent use of inside and outside quantities. [sent-129, score-0.126]

24 ⟨x; z⟩ will symbolize a cut where we can insert an inside string y. [sent-132, score-0.14]

25 w_?(i) is the probability to start a derivation with non-terminal i; wR(i → j k) is the conditional probability of rewriting non-terminal i into j and k; wT(i → σ) is the probability of rewriting i into symbol σ; Σ_i w_? [sent-134, score-0.126]
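To make this classic parameterization concrete, here is a toy sketch in Python (the grammar, its single non-terminal, and all numbers are invented for illustration): w_start plays the role of w_?(i), w_R of wR(i → j k), and w_T of wT(i → σ).

    w_start = {"S": 1.0}                        # w_?(i): weight of starting a derivation with i
    w_R = {("S", "S", "S"): 0.3}                # wR(i -> j k): here S -> S S
    w_T = {("S", "a"): 0.4, ("S", "b"): 0.3}    # wT(i -> sigma): here S -> a, S -> b

    # In the probabilistic case the outgoing rule weights of each non-terminal sum to 1.
    total_S = sum(w for (lhs, *_), w in list(w_R.items()) + list(w_T.items()) if lhs == "S")
    assert abs(total_S - 1.0) < 1e-9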

26 2 WCFG in Algebraic Form We now define a WCFG in algebraic form. [sent-138, score-0.082]

27 ⊤βG(x) (7), where the inside function βG : Σ+ → R^n is βG(σ) = βσ and βG(x) = Σ_{x = x1x2, x1,x2 ∈ Σ+} A(βG(x1) ⊗ βG(x2)) (8). We will define the outside function over Σ∗ [sent-146, score-0.126]
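A minimal Python sketch of the recursion in (7)-(8), not the authors' code (the dimension n, the two-symbol alphabet, and the random weights are hypothetical): a single symbol gets its terminal inside vector, and a longer string sums A(βG(x1) ⊗ βG(x2)) over all split points.

    import numpy as np

    n = 3                                                # hypothetical number of hidden states
    rng = np.random.default_rng(0)
    A = 0.1 * rng.random((n, n * n))                     # binary-rule operator, maps R^{n*n} to R^n
    beta_sym = {"a": rng.random(n), "b": rng.random(n)}  # terminal inside vectors beta_sigma

    def inside(x):
        """Inside vector beta_G(x) for a tuple of symbols x (naive recursion, no chart)."""
        if len(x) == 1:
            return beta_sym[x[0]]
        total = np.zeros(n)
        for k in range(1, len(x)):                       # all splits x = x1 x2
            total += A @ np.kron(inside(x[:k]), inside(x[k:]))
        return total

    alpha = rng.random(n)                                # start-weight vector
    print("f_G(aba) =", float(alpha @ inside(("a", "b", "a"))))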

28 and y ∈ Σ+ we have that αG(x; z)⊤βG(y) (12) is the weight that the grammar assigns to the string xyz with a cut around y. [sent-152, score-0.183]

29 Let us make clear that a WCFG is the same device in classic or algebraic forms. [sent-154, score-0.106]

30 These matrices explicitly capture insideoutside recursions employed by WCFG functions, and are key to a derivation of a spectral learning algorithm that learns a grammar G using statistics of a training sample. [sent-164, score-0.566]

31 The set of composed inside strings I2 is the set of elements (x, x′), where x, x′ ∈ Σ+. [sent-168, score-0.168]

32 Intuitively, ⟨x; z⟩ represents a context where we can insert an inside element y in between x and z, yielding xyz. [sent-176, score-0.184]

33 To this end we will employ the notion of basis B = (P, S), where {⟨λ, λ⟩} ⊆ P ⊆ O is a set of outside contexts and Σ ⊆ S ⊆ I1 is a set of inside strings. [sent-180, score-0.125]

34 First, a matrix S of inside vectors for all inside strings. [sent-197, score-0.147]

35 Second, a matrix P for all contexts, with row ⟨x; z⟩ taking value P⟨x;z⟩ = αG(x; z). [sent-200, score-0.184]

36 It is easy to see that Hf = PS, since Hf(⟨x; z⟩, y) = P⟨x;z⟩ Sy = αG(x; z)⊤βG(y). [sent-201, score-0.184]
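The factorization Hf = PS is easy to check numerically; a small sketch with invented sizes (6 outside contexts, 5 inside strings, n = 3):

    import numpy as np

    n, num_contexts, num_insides = 3, 6, 5
    rng = np.random.default_rng(1)
    P = rng.random((num_contexts, n))   # row <x; z> holds the outside vector alpha_G(x; z)
    S = rng.random((n, num_insides))    # column y holds the inside vector beta_G(y)
    H = P @ S                           # entry (<x; z>, y) equals alpha_G(x; z)^T beta_G(y)
    print(np.linalg.matrix_rank(H))     # at most n (here 3), however many rows and columns H has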

37 If HB is the sub-block associated with basis B = (P, S), then the sub-blocks PB ∈ R^{P×n} and SB ∈ R^{n×S} of P and S also accomplish that HB = PB SB. [sent-204, score-0.125]

38 (20) We say that a basis B is complete for f if rank(HB) = rank(Hf). [sent-208, score-0.125]

39 The following is a key result for spectral methods. [sent-209, score-0.36]

40 Lemma 1: Let B = (P, S) be a complete basis of dimension n. [sent-211, score-0.125]

41 2 Supervised Spectral Learning of WCFG The spectral learning method directly exploits Lemma 1. [sent-223, score-0.36]

42 Choose a complete basis B = (P, S) and a dimension n. [sent-225, score-0.125]

43 The parameters of the algorithm are the basis and the dimension of the grammar n. [sent-242, score-0.19]
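A generic sketch of the rank-n factorization step behind such spectral methods (synthetic data, not the paper's estimator): a truncated SVD of the Hankel sub-block gives factors of the kind Lemma 1 relies on, and the grammar operators are then typically read off from such a factorization.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 3
    H_B = rng.random((8, n)) @ rng.random((n, 7))   # a synthetic rank-n Hankel sub-block
    U, s, Vt = np.linalg.svd(H_B, full_matrices=False)
    P_hat = U[:, :n] * s[:n]                        # estimated outside factor
    S_hat = Vt[:n, :]                               # estimated inside factor
    print(np.allclose(P_hat @ S_hat, H_B))          # True: truncation at rank n is exact here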

44 This is a practical difficulty of spectral methods, for example when applying evaluation metrics like perplexity, which are only defined for distributions. [sent-248, score-0.36]

45 However, the statistics in the Hankel require access to strings that have information about context-free cuts. [sent-250, score-0.087]

46 We will assume that we only have access to statistics about plain strings of a distribution, i. [sent-251, score-0.087]

47 In this scenario, one natural idea is to search for a Hankel matrix that agrees with the observations. [sent-254, score-0.094]

48 The method we present in this section frames this problem as a low-rank matrix optimization problem. [sent-255, score-0.119]

49 Hankel matrices associated with WCFG that agree with observable statistics. [sent-258, score-0.125]

50 We first describe an inside-outside basis that is an extension of the one in the previous section. [sent-262, score-0.125]

51 Inside elements are the same, namely I = I1 ∪ I2, where I1 elements are strings (x) and I2 elements are composed strings (x, x′). [sent-263, score-0.205]

52 For example, if we consider ⟨x; z′, z⟩ and insert a string y, we obtain x(y, z′)z, where we use (y, z′) to explicitly denote a composed inside string. [sent-268, score-0.293]

53 So, for any string x, we can define the following observable constraint: p(x) = H(⟨λ; λ⟩, (x)) (24). The rest of the entries of H correspond to a string with an inside-outside cut, and these are not observable. [sent-275, score-0.1]

54 Intuitively, we look for H that agrees with the observable statistics and satisfies the inside-outside constraints. [sent-294, score-0.1]

55 The constraint ‖H‖2 ≤ 1 is satisfied by any Hankel matrix derived from a true distribution, and is used to avoid incoherent solutions. [sent-296, score-0.096]
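Putting sentences 3, 53, 54 and 55 together, the shape of the program (rank relaxed to the nuclear norm, the exact constraint sets left to the paper, and p̂ standing here for the observed string statistics) is roughly:

    \min_{H} \; \|H\|_{*}
    \quad \text{s.t.} \quad
    H(\langle\lambda;\lambda\rangle,(x)) = \hat{p}(x) \;\; \text{for observed strings } x,
    \quad H \text{ satisfies the inside--outside constraints},
    \quad \|H\|_{2} \le 1 .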

56 The above optimization problem, however, is computationally hard because of the rank objective. [sent-297, score-0.101]

57 The method alternates between separate projections for the observable constraints, the ℓ2 norm, the inside-outside constraints, and the nuclear norm. [sent-302, score-0.101]
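A minimal sketch of two numerical ingredients named above (all matrices synthetic): the nuclear norm is the sum of singular values, and the standard proximal step for it, used by accelerated methods such as FISTA, is singular value soft-thresholding.

    import numpy as np

    def nuclear_norm(M):
        """Sum of singular values, the convex surrogate of rank."""
        return float(np.linalg.svd(M, compute_uv=False).sum())

    def svt(M, tau):
        """Singular value soft-thresholding: the proximal operator of tau * ||.||_*."""
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

    rng = np.random.default_rng(3)
    M = rng.random((6, 2)) @ rng.random((2, 5)) + 0.01 * rng.random((6, 5))  # nearly rank 2
    print(nuclear_norm(M), np.linalg.matrix_rank(svt(M, 0.05)))              # small singular values are zeroed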

58 One can prove that there is theoretical identifiability of the rank and the parameters of an FST distribution, using a rank minimization formulation. [sent-306, score-0.14]

59 We build the Hankel matrix from the inside basis {(x)}x∈Σ and outside basis [Figure 1: KL divergence for spectral and EM methods, unsupervised and supervised, for different sizes of learning sample, on log-log scales.] [sent-313, score-0.851]

60 The algorithm we use for the unsupervised spectral method is a simplified version: we alternately use a hard projection on the constraints (by [Figure 2: KL divergence for unsupervised and supervised spectral methods, for different sizes of learning sample, on log-log scales.] [sent-322, score-0.929]

61 We finally use the regular spectral method on this matrix to get our model. [sent-326, score-0.426]

62 We compare this method with an unsupervised EM, and also with supervised versions of spectral method and EM. [sent-327, score-0.447]

63 We run 50 optimization steps for the unsupervised spectral method, and 200 iterations for the EM methods. [sent-329, score-0.473]

64 For sample size greater than 10^5, the unsupervised spectral method seems to provide better solutions than both EM and supervised EM. [sent-331, score-0.478]

65 The solution, in terms of KL-divergence, is comparable to the one obtained with the supervised spectral method. [sent-332, score-0.387]

66 The computation time of the unsupervised spectral method is almost constant w. [sent-333, score-0.42]

67 One can see that, for big sample sizes (10^9), the unsupervised spectral method is losing accuracy compared to the supervised method. [sent-343, score-0.503]

68 and could be overcome by considering a greater basis (e. [sent-345, score-0.125]

69 We do experiments with the following models and algorithms: • WFA: a Weighted Finite Automaton learned using spectral methods as described in (Luque et al. [sent-353, score-0.36]

70 Supervised Spectral: a WCFG learned from structured strings using the algorithm of Section 3. [sent-356, score-0.087]

71 We choose as basis the most frequent insides and outsides observed in the training data. [sent-358, score-0.243]

72 The size of the basis is determined by a parameter f called the basis factor, that determines the proportion of total insides and outsides that will be in the basis. [sent-359, score-0.368]

73 Unsupervised Spectral: a WCFG learned from strings using the algorithm of Section 4. [sent-360, score-0.087]

74 The basis is like in the supervised case, but since context-free cuts in the strings are not observed, [sent-361, score-0.239]

75 Table 2: Experiments with the unsupervised spectral method on the WSJ10 corpus. [sent-368, score-0.42]

76 Results are in terms of expected L1 on the training set, for different basis and numbers of states. [sent-369, score-0.125]

77 all possible inside and outsides of the sample (i. [sent-370, score-0.186]

78 We generate a training set by sampling 4,000 strings from the target PCFG and counting the relative frequency of each. [sent-373, score-0.087]

79 For the supervised model, we generate strings paired with their context-free derivation. [sent-374, score-0.114]

80 To measure the quality of the learned models, we use the L1 distance to the target distribution over a fixed set of strings Σ≤n, for n = 7. [sent-375, score-0.087]

81 Figure 3 shows the results for the different models and for different basis sizes (in terms of the basis factor f). [sent-376, score-0.275]

82 We removed lexical items and punctuation and mapped to POS tags (sentences of length ≤ 10 after filtering). 1 Given two functions f1 and f2 over strings, the L1 distance is the sum of the absolute difference over all strings in a set: Σx |f1(x) − f2(x)|. [sent-382, score-0.087]
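The footnote's L1 distance is straightforward to compute; a toy sketch with invented string weights:

    f1 = {"a": 0.5, "ab": 0.3, "ba": 0.2}
    f2 = {"a": 0.4, "ab": 0.4, "ba": 0.2}
    strings = set(f1) | set(f2)
    l1 = sum(abs(f1.get(x, 0.0) - f2.get(x, 0.0)) for x in strings)
    print(l1)   # |0.5-0.4| + |0.3-0.4| + |0.2-0.2| = 0.2, up to floating-point rounding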

83 Table 1 shows the size of the problem for different basis sizes. [sent-385, score-0.125]

84 As described in the previous subsection for the unsupervised case, we obtain the basis by taking the most frequent observed substrings and contexts. [sent-386, score-0.185]

85 We then compute all yields that can be generated with this basis, and close the basis to include all possible insides and outsides via completion operations, such that we create a Hankel matrix as described in Section 4. [sent-387, score-0.272]

86 Table 1 shows, for each basis, the size of H we induce, the number of observable constraints (i. [sent-389, score-0.109]

87 With the current implementation of the optimizer we were only able to run the unsupervised learning for small basis sizes. [sent-392, score-0.185]

88 For a fixed basis, as we increase the number of states we see that the error decreases, showing that the method is inducing a Hankel matrix that explains the observable statistics. [sent-394, score-0.165]

89 Our method combines ingredients of spectral learning with low-rank convex optimization methods. [sent-396, score-0.443]

90 To scale the method to learn languages of the complexity of natural languages we would need to identify optimization algorithms specially suited for this problem. [sent-398, score-0.107]

91 (16) and (17): For the inside function, the base case is trivial. [sent-401, score-0.081]

92 For the outside function, let ei be an n-dimensional vector with coordinate i set to 1 and the rest set to 0. [sent-405, score-0.167]

93 We first show that there exists an invertible matrix M that changes the basis of the operators of G into those of G′. [sent-413, score-0.217]

94 Let h′⟨x1,x2;z⟩ and h′⟨x;z1,z2⟩ be the rows in R^{I1} of h′A for ⟨x1, x2; z⟩ and ⟨x; z1, z2⟩. [sent-440, score-0.184]

95 Towards high speed grammar induction on large text corpora. [sent-470, score-0.099]

96 A spectral approach for probabilistic grammatical inference on trees. [sent-480, score-0.385]

97 Local loss optimization in operator models: A new insight into spectral learning. [sent-502, score-0.413]

98 Unsupervised induction of stochastic context-free grammars using distributional clustering. [sent-519, score-0.086]

99 A spectral algorithm for learning hidden Markov models. [sent-556, score-0.36]

100 The estimation of stochastic context-free grammars using the inside-outside algorithm. [sent-584, score-0.096]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('wcfg', 0.547), ('hankel', 0.37), ('spectral', 0.36), ('hx', 0.195), ('zi', 0.184), ('hb', 0.167), ('bailly', 0.133), ('gg', 0.129), ('basis', 0.125), ('balle', 0.104), ('luque', 0.089), ('hf', 0.087), ('strings', 0.087), ('ha', 0.085), ('algebraic', 0.082), ('inside', 0.081), ('pcfg', 0.078), ('vec', 0.077), ('ariadna', 0.074), ('outsides', 0.074), ('observable', 0.072), ('hsu', 0.07), ('lemma', 0.069), ('wr', 0.067), ('matrix', 0.066), ('grammar', 0.065), ('unsupervised', 0.06), ('borja', 0.059), ('quattoni', 0.059), ('rapha', 0.059), ('xyz', 0.059), ('wt', 0.055), ('optimization', 0.053), ('matrices', 0.053), ('xx', 0.052), ('proof', 0.052), ('grammars', 0.052), ('xavier', 0.049), ('rank', 0.048), ('ps', 0.047), ('outside', 0.045), ('franco', 0.044), ('identifiability', 0.044), ('insideoutside', 0.044), ('insides', 0.044), ('ordoba', 0.044), ('recursions', 0.044), ('rewriting', 0.044), ('sb', 0.041), ('free', 0.038), ('ei', 0.038), ('constraints', 0.037), ('rp', 0.036), ('carreras', 0.035), ('induction', 0.034), ('cohen', 0.034), ('shay', 0.033), ('em', 0.032), ('sample', 0.031), ('el', 0.031), ('px', 0.031), ('cut', 0.031), ('derivations', 0.031), ('convex', 0.03), ('adriaans', 0.03), ('conicet', 0.03), ('dyck', 0.03), ('fista', 0.03), ('idzei', 0.03), ('iesre', 0.03), ('kakade', 0.03), ('kurihara', 0.03), ('minhimize', 0.03), ('mrm', 0.03), ('phx', 0.03), ('sham', 0.03), ('subclasses', 0.03), ('tlo', 0.03), ('tres', 0.03), ('coordinate', 0.029), ('nuclear', 0.029), ('let', 0.029), ('yields', 0.029), ('agrees', 0.028), ('finite', 0.028), ('string', 0.028), ('languages', 0.027), ('completion', 0.027), ('states', 0.027), ('supervised', 0.027), ('rn', 0.026), ('lari', 0.026), ('ndimensional', 0.026), ('wri', 0.026), ('invertible', 0.026), ('xy', 0.025), ('restricted', 0.025), ('grammatical', 0.025), ('sizes', 0.025), ('classic', 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999964 195 emnlp-2013-Unsupervised Spectral Learning of WCFG as Low-rank Matrix Completion

Author: Raphael Bailly ; Xavier Carreras ; Franco M. Luque ; Ariadna Quattoni

Abstract: We derive a spectral method for unsupervised learning of Weighted Context Free Grammars. We frame WCFG induction as finding a Hankel matrix that has low rank and is linearly constrained to represent a function computed by inside-outside recursions. The proposed algorithm picks the grammar that agrees with a sample and is the simplest with respect to the nuclear norm of the Hankel matrix.

2 0.059674151 194 emnlp-2013-Unsupervised Relation Extraction with General Domain Knowledge

Author: Oier Lopez de Lacalle ; Mirella Lapata

Abstract: In this paper we present an unsupervised approach to relational information extraction. Our model partitions tuples representing an observed syntactic relationship between two named entities (e.g., “X was born in Y” and “X is from Y”) into clusters corresponding to underlying semantic relation types (e.g., BornIn, Located). Our approach incorporates general domain knowledge which we encode as First Order Logic rules and automatically combine with a topic model developed specifically for the relation extraction task. Evaluation results on the ACE 2007 English Relation Detection and Categorization (RDC) task show that our model outperforms competitive unsupervised approaches by a wide margin and is able to produce clusters shaped by both the data and the rules.

3 0.059244748 40 emnlp-2013-Breaking Out of Local Optima with Count Transforms and Model Recombination: A Study in Grammar Induction

Author: Valentin I. Spitkovsky ; Hiyan Alshawi ; Daniel Jurafsky

Abstract: Many statistical learning problems in NLP call for local model search methods. But accuracy tends to suffer with current techniques, which often explore either too narrowly or too broadly: hill-climbers can get stuck in local optima, whereas samplers may be inefficient. We propose to arrange individual local optimizers into organized networks. Our building blocks are operators of two types: (i) transform, which suggests new places to search, via non-random restarts from already-found local optima; and (ii) join, which merges candidate solutions to find better optima. Experiments on grammar induction show that pursuing different transforms (e.g., discarding parts of a learned model or ignoring portions of training data) results in improvements. Groups of locally-optimal solutions can be further perturbed jointly, by constructing mixtures. Using these tools, we designed several modular dependency grammar induction networks of increasing complexity. Our complete system achieves 48.6% accuracy (directed dependency macro-average over all 19 languages in the 2006/7 CoNLL data), more than 5% higher than the previous state-of-the-art.

4 0.056823038 85 emnlp-2013-Fast Joint Compression and Summarization via Graph Cuts

Author: Xian Qian ; Yang Liu

Abstract: Extractive summarization typically uses sentences as summarization units. In contrast, joint compression and summarization can use smaller units such as words and phrases, resulting in summaries containing more information. The goal of compressive summarization is to find a subset of words that maximize the total score of concepts and cutting dependency arcs under the grammar constraints and summary length constraint. We propose an efficient decoding algorithm for fast compressive summarization using graph cuts. Our approach first relaxes the length constraint using Lagrangian relaxation. Then we propose to bound the relaxed objective function by the supermodular binary quadratic programming problem, which can be solved efficiently using graph max-flow/min-cut. Since finding the tightest lower bound suffers from local optimality, we use convex relaxation for initialization. Experimental results on TAC2008 dataset demonstrate our method achieves competitive ROUGE score and has good readability, while is much faster than the integer linear programming (ILP) method.

5 0.054487363 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation

Author: Xinyan Xiao ; Deyi Xiong

Abstract: Traditional synchronous grammar induction estimates parameters by maximizing likelihood, which only has a loose relation to translation quality. Alternatively, we propose a max-margin estimation approach to discriminatively inducing synchronous grammars for machine translation, which directly optimizes translation quality measured by BLEU. In the max-margin estimation of parameters, we only need to calculate Viterbi translations. This further facilitates the incorporation of various non-local features that are defined on the target side. We test the effectiveness of our max-margin estimation framework on a competitive hierarchical phrase-based system. Experiments show that our max-margin method significantly outperforms the traditional twostep pipeline for synchronous rule extraction by 1.3 BLEU points and is also better than previous max-likelihood estimation method.

6 0.05366721 19 emnlp-2013-Adaptor Grammars for Learning Non-Concatenative Morphology

7 0.050010961 10 emnlp-2013-A Multi-Teraflop Constituency Parser using GPUs

8 0.049719483 2 emnlp-2013-A Convex Alternative to IBM Model 2

9 0.045376398 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization

10 0.045021947 50 emnlp-2013-Combining PCFG-LA Models with Dual Decomposition: A Case Study with Function Labels and Binarization

11 0.044431023 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity

12 0.039255176 146 emnlp-2013-Optimal Incremental Parsing via Best-First Dynamic Programming

13 0.037113141 186 emnlp-2013-Translating into Morphologically Rich Languages with Synthetic Phrases

14 0.03662318 119 emnlp-2013-Learning Distributions over Logical Forms for Referring Expression Generation

15 0.036590405 148 emnlp-2013-Orthonormal Explicit Topic Analysis for Cross-Lingual Document Matching

16 0.036568042 176 emnlp-2013-Structured Penalties for Log-Linear Language Models

17 0.035812974 36 emnlp-2013-Automatically Determining a Proper Length for Multi-Document Summarization: A Bayesian Nonparametric Approach

18 0.035809711 9 emnlp-2013-A Log-Linear Model for Unsupervised Text Normalization

19 0.035465237 169 emnlp-2013-Semi-Supervised Representation Learning for Cross-Lingual Text Classification

20 0.034160048 106 emnlp-2013-Inducing Document Plans for Concept-to-Text Generation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.127), (1, -0.03), (2, -0.005), (3, 0.013), (4, -0.051), (5, 0.029), (6, 0.03), (7, -0.029), (8, -0.009), (9, 0.013), (10, 0.004), (11, -0.05), (12, -0.048), (13, 0.06), (14, 0.019), (15, -0.066), (16, 0.021), (17, -0.005), (18, -0.01), (19, 0.02), (20, -0.031), (21, -0.0), (22, 0.058), (23, 0.014), (24, -0.045), (25, -0.024), (26, -0.085), (27, -0.083), (28, 0.134), (29, -0.06), (30, -0.049), (31, 0.025), (32, -0.014), (33, -0.043), (34, 0.052), (35, 0.072), (36, -0.041), (37, 0.048), (38, -0.031), (39, 0.016), (40, -0.014), (41, -0.054), (42, 0.031), (43, 0.096), (44, 0.038), (45, -0.127), (46, -0.034), (47, 0.014), (48, -0.024), (49, 0.038)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.92235619 195 emnlp-2013-Unsupervised Spectral Learning of WCFG as Low-rank Matrix Completion

Author: Raphael Bailly ; Xavier Carreras ; Franco M. Luque ; Ariadna Quattoni

Abstract: We derive a spectral method for unsupervised learning of Weighted Context Free Grammars. We frame WCFG induction as finding a Hankel matrix that has low rank and is linearly constrained to represent a function computed by inside-outside recursions. The proposed algorithm picks the grammar that agrees with a sample and is the simplest with respect to the nuclear norm of the Hankel matrix.

2 0.63902241 2 emnlp-2013-A Convex Alternative to IBM Model 2

Author: Andrei Simion ; Michael Collins ; Cliff Stein

Abstract: The IBM translation models have been hugely influential in statistical machine translation; they are the basis of the alignment models used in modern translation systems. Excluding IBM Model 1, the IBM translation models, and practically all variants proposed in the literature, have relied on the optimization of likelihood functions or similar functions that are non-convex, and hence have multiple local optima. In this paper we introduce a convex relaxation of IBM Model 2, and describe an optimization algorithm for the relaxation based on a subgradient method combined with exponentiated-gradient updates. Our approach gives the same level of alignment accuracy as IBM Model 2.

3 0.62557054 40 emnlp-2013-Breaking Out of Local Optima with Count Transforms and Model Recombination: A Study in Grammar Induction

Author: Valentin I. Spitkovsky ; Hiyan Alshawi ; Daniel Jurafsky

Abstract: Many statistical learning problems in NLP call for local model search methods. But accuracy tends to suffer with current techniques, which often explore either too narrowly or too broadly: hill-climbers can get stuck in local optima, whereas samplers may be inefficient. We propose to arrange individual local optimizers into organized networks. Our building blocks are operators of two types: (i) transform, which suggests new places to search, via non-random restarts from already-found local optima; and (ii) join, which merges candidate solutions to find better optima. Experiments on grammar induction show that pursuing different transforms (e.g., discarding parts of a learned model or ignoring portions of training data) results in improvements. Groups of locally-optimal solutions can be further perturbed jointly, by constructing mixtures. Using these tools, we designed several modular dependency grammar induction networks of increasing complexity. Our complete system achieves 48.6% accuracy (directed dependency macro-average over all 19 languages in the 2006/7 CoNLL data), more than 5% higher than the previous state-of-the-art.

4 0.56641108 138 emnlp-2013-Naive Bayes Word Sense Induction

Author: Do Kook Choe ; Eugene Charniak

Abstract: We introduce an extended naive Bayes model for word sense induction (WSI) and apply it to a WSI task. The extended model incorporates the idea that words closer to the target word are more relevant in predicting its sense. The proposed model is very simple yet effective when evaluated on SemEval-2010 WSI data.

5 0.5620231 50 emnlp-2013-Combining PCFG-LA Models with Dual Decomposition: A Case Study with Function Labels and Binarization

Author: Joseph Le Roux ; Antoine Rozenknop ; Jennifer Foster

Abstract: It has recently been shown that different NLP models can be effectively combined using dual decomposition. In this paper we demonstrate that PCFG-LA parsing models are suitable for combination in this way. We experiment with the different models which result from alternative methods of extracting a grammar from a treebank (retaining or discarding function labels, left binarization versus right binarization) and achieve a labeled Parseval F-score of 92.4 on Wall Street Journal Section 23; this represents an absolute improvement of 0.7 and an error reduction rate of 7% over a strong PCFG-LA product-model baseline. Although we experiment only with binarization and function labels in this study, there is much scope for applying this approach to other grammar extraction strategies.

6 0.51539892 10 emnlp-2013-A Multi-Teraflop Constituency Parser using GPUs

7 0.46774605 86 emnlp-2013-Feature Noising for Log-Linear Structured Prediction

8 0.43130141 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization

9 0.42779791 159 emnlp-2013-Regularized Minimum Error Rate Training

10 0.41088986 19 emnlp-2013-Adaptor Grammars for Learning Non-Concatenative Morphology

11 0.40882695 198 emnlp-2013-Using Soft Constraints in Joint Inference for Clinical Concept Recognition

12 0.40875214 44 emnlp-2013-Centering Similarity Measures to Reduce Hubs

13 0.40216199 194 emnlp-2013-Unsupervised Relation Extraction with General Domain Knowledge

14 0.39848435 85 emnlp-2013-Fast Joint Compression and Summarization via Graph Cuts

15 0.3831059 161 emnlp-2013-Rule-Based Information Extraction is Dead! Long Live Rule-Based Information Extraction Systems!

16 0.37073848 176 emnlp-2013-Structured Penalties for Log-Linear Language Models

17 0.35408768 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation

18 0.35398433 184 emnlp-2013-This Text Has the Scent of Starbucks: A Laplacian Structured Sparsity Model for Computational Branding Analytics

19 0.35042357 154 emnlp-2013-Prior Disambiguation of Word Tensors for Constructing Sentence Vectors

20 0.33989185 37 emnlp-2013-Automatically Identifying Pseudepigraphic Texts


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.033), (5, 0.345), (9, 0.015), (18, 0.033), (22, 0.029), (30, 0.081), (43, 0.011), (50, 0.025), (51, 0.159), (66, 0.051), (71, 0.019), (75, 0.018), (77, 0.024), (90, 0.014), (95, 0.015), (96, 0.014)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.73990417 195 emnlp-2013-Unsupervised Spectral Learning of WCFG as Low-rank Matrix Completion

Author: Raphael Bailly ; Xavier Carreras ; Franco M. Luque ; Ariadna Quattoni

Abstract: We derive a spectral method for unsupervised learning of Weighted Context Free Grammars. We frame WCFG induction as finding a Hankel matrix that has low rank and is linearly constrained to represent a function computed by inside-outside recursions. The proposed algorithm picks the grammar that agrees with a sample and is the simplest with respect to the nuclear norm of the Hankel matrix.

2 0.48671976 143 emnlp-2013-Open Domain Targeted Sentiment

Author: Margaret Mitchell ; Jacqui Aguilar ; Theresa Wilson ; Benjamin Van Durme

Abstract: We propose a novel approach to sentiment analysis for a low resource setting. The intuition behind this work is that sentiment expressed towards an entity, targeted sentiment, may be viewed as a span of sentiment expressed across the entity. This representation allows us to model sentiment detection as a sequence tagging problem, jointly discovering people and organizations along with whether there is sentiment directed towards them. We compare performance in both Spanish and English on microblog data, using only a sentiment lexicon as an external resource. By leveraging linguistically informed features within conditional random fields (CRFs) trained to minimize empirical risk, our best models in Spanish significantly outperform a strong baseline, and reach around 90% accuracy on the combined task of named entity recognition and sentiment prediction. Our models in English, trained on a much smaller dataset, are not yet statistically significant against their baselines.

3 0.48656327 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization

Author: Kuzman Ganchev ; Dipanjan Das

Abstract: We present a framework for cross-lingual transfer of sequence information from a resource-rich source language to a resource-impoverished target language that incorporates soft constraints via posterior regularization. To this end, we use automatically word aligned bitext between the source and target language pair, and learn a discriminative conditional random field model on the target side. Our posterior regularization constraints are derived from simple intuitions about the task at hand and from cross-lingual alignment information. We show improvements over strong baselines for two tasks: part-of-speech tagging and named-entity segmentation.

4 0.48650101 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging

Author: Xiaoqing Zheng ; Hanyang Chen ; Tianyu Xu

Abstract: This study explores the feasibility of performing Chinese word segmentation (CWS) and POS tagging by deep learning. We try to avoid task-specific feature engineering, and use deep layers of neural networks to discover relevant features to the tasks. We leverage large-scale unlabeled data to improve internal representation of Chinese characters, and use these improved representations to enhance supervised word segmentation and POS tagging models. Our networks achieved close to state-of-the-art performance with minimal computational cost. We also describe a perceptron-style algorithm for training the neural networks, as an alternative to maximum-likelihood method, to speed up the training process and make the learning algorithm easier to be implemented.

5 0.48395824 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction

Author: Alla Rozovskaya ; Dan Roth

Abstract: State-of-the-art systems for grammatical error correction are based on a collection of independently-trained models for specific errors. Such models ignore linguistic interactions at the sentence level and thus do poorly on mistakes that involve grammatical dependencies among several words. In this paper, we identify linguistic structures with interacting grammatical properties and propose to address such dependencies via joint inference and joint learning. We show that it is possible to identify interactions well enough to facilitate a joint approach and, consequently, that joint methods correct incoherent predictions that independently-trained classifiers tend to produce. Furthermore, because the joint learning model considers interacting phenomena during training, it is able to identify mistakes that require making multiple changes simultaneously and that standard approaches miss. Overall, our model significantly outperforms the Illinois system that placed first in the CoNLL-2013 shared task on grammatical error correction.

6 0.48295254 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation

7 0.48111859 13 emnlp-2013-A Study on Bootstrapping Bilingual Vector Spaces from Non-Parallel Data (and Nothing Else)

8 0.48027489 167 emnlp-2013-Semi-Markov Phrase-Based Monolingual Alignment

9 0.48009953 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation

10 0.47958136 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models

11 0.47942403 83 emnlp-2013-Exploring the Utility of Joint Morphological and Syntactic Learning from Child-directed Speech

12 0.47923341 82 emnlp-2013-Exploring Representations from Unlabeled Data with Co-training for Chinese Word Segmentation

13 0.4791753 110 emnlp-2013-Joint Bootstrapping of Corpus Annotations and Entity Types

14 0.4791092 140 emnlp-2013-Of Words, Eyes and Brains: Correlating Image-Based Distributional Semantic Models with Neural Representations of Concepts

15 0.47884095 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction

16 0.47833693 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation

17 0.47789553 47 emnlp-2013-Collective Opinion Target Extraction in Chinese Microblogs

18 0.47763216 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks

19 0.47711733 132 emnlp-2013-Mining Scientific Terms and their Definitions: A Study of the ACL Anthology

20 0.47680348 81 emnlp-2013-Exploring Demographic Language Variations to Improve Multilingual Sentiment Analysis in Social Media