acl acl2012 acl2012-181 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Shay B. Cohen ; Karl Stratos ; Michael Collins ; Dean P. Foster ; Lyle Ungar
Abstract: We introduce a spectral learning algorithm for latent-variable PCFGs (Petrov et al., 2006). Under a separability (singular value) condition, we prove that the method provides consistent parameter estimates.
Reference: text
sentIndex sentText sentNum sentScore
1 edu Abstract We introduce a spectral learning algorithm for latent-variable PCFGs (Petrov et al. [sent-12, score-0.36]
2 1 Introduction Statistical models with hidden or latent variables are of great importance in natural language processing, speech, and many other fields. [sent-15, score-0.125]
3 The EM algorithm is a remarkably successful method for parameter estimation within these models: it is simple, it is often relatively efficient, and it has well understood formal properties. [sent-16, score-0.141]
4 From a theoretical perspective, this means that the EM algorithm is not guaranteed to give consistent parameter estimates. [sent-18, score-0.096]
5 Recent work has introduced polynomial-time learning algorithms (and consistent estimation methods) for two important cases of hidden-variable models: Gaussian mixture models (Dasgupta, 1999; Vempala and Wang, 2004) and hidden Markov models (Hsu et al. [sent-20, score-0.109]
6 These algorithms use spectral methods: that is, algorithms based on eigenvector decompositions of linear systems, in particular singular value decomposition (SVD). [sent-22, score-0.4]
7 (2009) has a sample complexity that is polynomial in 1/σ, where σ is the minimum singular value of an underlying decomposition. [sent-28, score-0.136]
8 In this paper we derive a spectral algorithm for learning of latent-variable PCFGs (L-PCFGs) (Petrov et al. [sent-30, score-0.36]
9 Under a separation (singular value) condition, our algorithm provides consistent parameter estimates; this is in contrast with previous work, which has used the EM algorithm for parameter estimation, with the usual problems of local optima. [sent-36, score-0.192]
10 The parameter estimation algorithm (see figure 4) is simple and efficient. [sent-37, score-0.141]
11 On test examples, simple (tensor-based) variants of the inside-outside algorithm (figures 2 and 3) can be used to calculate probabilities and marginals of interest. [sent-40, score-0.145]
12 Section 5 shows a tensor form of the inside-outside algorithm for L-PCFGs. [sent-42, score-0.096]
13 Theorem 1 gives conditions under which the tensor form calculates inside and outside terms correctly. [sent-44, score-0.579]
14 Section 6 shows that under a singular-value condition, there is an observable form for the tensors required by the inside-outside algorithm. [sent-46, score-0.113]
15 By an observable form, we follow the terminology of Hsu et al. [sent-47, score-0.141]
16 (2009) in referring to quantities that can be estimated directly from data where values for latent variables are unobserved. [sent-48, score-0.127]
17 Theorem 2 shows that tensors derived from the observable form satisfy the conditions of theorem 1. [sent-49, score-0.621]
18 Section 7 gives an algorithm for estimating parameters of the observable representation from training data. [sent-51, score-0.096]
19 The algorithm is strikingly different from the EM algorithm for L-PCFGs, both in its basic form, and in its consistency guarantees. [sent-53, score-0.192]
20 The tensor form of the inside-outside algorithm gives a new view of basic calculations in PCFGs, and may itself lead to new models. [sent-57, score-0.463]
21 , 2012; Jaeger, 2000), but involves several extensions: in particular in the tensor form of the inside-outside algorithm, and observable representations for the tensor form. [sent-63, score-0.682]
22 (2011) consider spectral learning of finite-state transducers; Luque et al. [sent-65, score-0.264]
23 (2012) considers spectral learning of head automata for dependency parsing. [sent-66, score-0.264]
24 (2011) consider spectral learning algorithms for tree-structured directed Bayes nets. [sent-68, score-0.264]
25 3 Notation Given a matrix A or a vector v, we write A⊤ or v⊤ for the associated transpose. [sent-69, score-0.094]
26 For any row or column vector y ∈ Rm, we use diag(y) to refer to the (m × m) matrix with diagonal elements equal to yh for h = 1 . . . m. [sent-74, score-0.294]
27 We will make (quite limited) use of tensors: Definition 1 A tensor C ∈ R(m×m×m) is a set of m3 parameters Ci,j,k for i,j,k ∈ [m]. [sent-80, score-0.247]
28 Given a tensor C, and a vector y ∈ Rm, we define C(y) to be the (m × m) matrix with components [C(y)]i,j = Σk∈[m] Ci,j,k yk. [sent-81, score-0.365]
29 In addition, for any tensor C ∈ R(m×m×m), we define the tensor C∗ ∈ R(m×m×m) to have values [C∗]i,j,k = Ck,j,i. [sent-83, score-0.562]
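As a concrete illustration of these definitions, here is a minimal numpy sketch (not from the paper; the dimension m and the values are made up):

```python
import numpy as np

m = 3
C = np.random.rand(m, m, m)           # a tensor C with parameters C[i, j, k]
y = np.random.rand(m)                 # a vector y in R^m

# [C(y)]_{i,j} = sum_k C_{i,j,k} y_k : an (m x m) matrix
C_of_y = np.einsum('ijk,k->ij', C, y)

# [C*]_{i,j,k} = C_{k,j,i}
C_star = np.transpose(C, (2, 1, 0))

# diag(y): the (m x m) matrix with y on its diagonal, zeros elsewhere
diag_y = np.diag(y)
```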
30 Assuming that the non-terminals in the grammar can be partitioned this way is relatively benign, and makes the estimation problem cleaner. [sent-99, score-0.12]
31 A skeletal tree (s-tree) is a sequence of rules r1 . [sent-108, score-0.177]
32 rN where each ri is either of the form a → b c or a → x. [sent-111, score-0.178]
33 ) the hidden variable for the left-hand-side of rule ri. [sent-124, score-0.137]
34 Define ai to be the non-terminal on the left-hand-side of rule ri. [sent-126, score-0.184]
35 In the remainder of this paper, we make use of a matrix form of the parameters of an L-PCFG, as follows: • For each a → b c ∈ R, we define Qa→b c ∈ Rm×m to be the matrix with values q(a → b c|h, a) for h = 1, 2, . [sent-155, score-0.115]
36 . . m on its diagonal, and 0 values for its off-diagonal elements. [sent-158, score-0.122]
37 • For each a → b c ∈ R, we define Sa→b c ∈ Rm×m where [Sa→b c]h′,h = s(h′|h, a → b c). [sent-164, score-0.112]
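A small sketch of this matrix form (hypothetical m and arbitrary probability values; in the model these entries come from the L-PCFG parameters q, s and t):

```python
import numpy as np

m = 2  # number of latent states (hypothetical)

# Q^{a->bc}: diagonal matrix with q(a -> b c | h, a) for h = 1..m on its diagonal
q = np.array([0.3, 0.7])
Q_abc = np.diag(q)

# S^{a->bc}: [S]_{h',h} = s(h' | h, a -> b c); each column is a distribution over h'
S_abc = np.array([[0.6, 0.2],
                  [0.4, 0.8]])

# T^{a->bc} (one of the parameters in theorem 1) is analogous: [T]_{h',h} = t(h' | h, a -> b c)
T_abc = np.array([[0.5, 0.1],
                  [0.5, 0.9]])
```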
38 xN, the parsing algorithm of Goodman (1996) can be used to find arg maxτ∈T(x) Σ(a,i,j)∈τ µ(a,i,j). This is the parsing algorithm used by Petrov et al. [sent-189, score-0.192]
39 Variants of the inside-outside algorithm can be used for problems 1 and 2. [sent-192, score-0.096]
40 This is the first step in deriving the spectral estimation method. [sent-194, score-0.36]
41 The following theorem gives conditions under which the algorithms are correct: Theorem 1 Assume that we have an L-PCFG with parameters Qa→x, Qa→b c, Ta→b c, Sa→b c, πa, and that there exist matrices Ga ∈ R(m×m) for all a ∈ N such that each Ga is invertible, and such that: 1. [sent-202, score-0.45]
42 For all a ∈ I, ca1 = Gaπa Then: 1) The algorithm in figure 2 correctly computes p(r1 . [sent-205, score-0.096]
43 2) The algorithm in figure 3 correctly computes the marginals µ(a, i,j) under the L-PCFG. [sent-209, score-0.145]
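A minimal sketch of a tensor-form inside recursion of the kind theorem 1 refers to (the recursion fi = fγ Cri(fβ) is stated in the proof section below; the tree encoding and dictionary names here are hypothetical, and figure 2 of the paper is the authoritative version):

```python
import numpy as np

def inside_value(node, C, c_inf):
    """Compute the (1 x m) row vector f for a node of an s-tree.
    node:  ("a->x",) for a leaf rule, or ("a->b c", left_node, right_node).
    C:     dict: binary rule -> (m x m x m) tensor C^{a->bc}
    c_inf: dict: leaf rule   -> (1 x m) row vector c^{a->x}_inf
    """
    if len(node) == 1:                           # rule of the form a -> x
        return c_inf[node[0]]
    rule, left, right = node
    f_beta = inside_value(left, C, c_inf)        # left child
    f_gamma = inside_value(right, C, c_inf)      # right child
    # C^{r}(f_beta): (m x m) matrix with entries sum_k C[i, j, k] * f_beta[k]
    C_of_f = np.einsum('ijk,k->ij', C[rule], f_beta.ravel())
    return f_gamma @ C_of_f                      # f^i = f^gamma C^{r_i}(f^beta)

# p(r_1 ... r_N) is then inside_value(tree, C, c_inf) @ c1[a_root],
# matching the claim that the algorithm correctly computes p(r1 ... rN).
```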
44 6 Estimating the Tensor Model A crucial result is that it is possible to directly estimate parameters Ca→b c, ca→x∞ and ca1 that satisfy the conditions in theorem 1, from a training sample consisting of s-trees (i. [sent-212, score-0.367]
45 We first describe random variables underlying the approach, then describe observable representations based on these random variables. [sent-215, score-0.308]
46 Each node has an associated rule: for example, node 2 in the tree in figure 1 has the rule NP → D N. [sent-222, score-0.467]
47 are left and right inside trees below the left child and the right child of the rule. [sent-224, score-0.232]
48 For example, for node 2 we have a left inside tree rooted at node 3, and a right inside tree rooted at node 4 (in this case the left and right inside trees both contain only a single rule production, of the form a → x; however in the general case they might be arbitrary subtrees). [sent-225, score-1.339]
49 For node 2, the outside tree is the tree fragment shown in the paper (rooted at S, containing the VP "saw him"); the page also carries the caption of figure 3: the tensor form of the inside-outside algorithm, for calculation of marginal terms µ(a, i,j). [sent-227, score-0.608]
50 The outside tree contains everything in the s-tree r1 . [sent-228, score-0.161]
51 First, we select a random internal node, from a random tree, as follows: • Sample an s-tree r1 . [sent-233, score-0.106]
52 Choose a node i uniformly at random from [N]. [sent-240, score-0.206]
53 If the rule ri for the node i is of the form a → b c, we define random variables as follows: • R1 is equal to the rule ri (e. [sent-241, score-0.676]
54 • T2 is the inside tree rooted at the left child of node i, and T3 is the inside tree rooted at the right child of node i. [sent-247, score-0.886]
55 • H1, H2, H3 are the hidden variables associated with node i, the left child of node i, and the right child of node i, respectively. [sent-248, score-0.603]
56 • A1, A2, A3 are the labels for node i, the left child of node i, and the right child of node i, respectively. [sent-249, score-0.566]
57 If the rule ri for the selected node i is of the form a → x, we have random variables R1, T1, H1, A1, O, B as defined above, but H2 , H3, T2 ,T3 , A2, and A3 are not defined. [sent-258, score-0.518]
58 We assume a function ψ that maps outside trees o to feature vectors ψ(o) ∈ Rd′. [sent-259, score-0.243]
59 For example, the feature vector ψ(o) might track the rule directly above the node in question, the word following the node in question, and so on. [sent-260, score-0.429]
60 We also assume a function φ that maps inside trees t to feature vectors φ(t) ∈ Rd. [sent-261, score-0.282]
61 As one example, the function φ might be an indicator function tracking the rule production at the root of the inside tree. [sent-262, score-0.185]
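A minimal sketch of indicator-style feature functions of the kind described here (the hashing trick, the dimensionalities, and the attributes of the outside-tree object are hypothetical; the paper only requires that φ maps inside trees to Rd and ψ maps outside trees to Rd′):

```python
import numpy as np

d, d_prime = 512, 512   # hypothetical feature dimensionalities

def phi(inside_tree):
    """Indicator of the rule production at the root of the inside tree."""
    v = np.zeros(d)
    root_rule = inside_tree[0]                  # e.g. "NP -> D N"
    v[hash(("root-rule", root_rule)) % d] = 1.0
    return v

def psi(outside_tree):
    """Indicators of the rule above the node and the word following it
    (rule_above / next_word are hypothetical attributes of the outside tree)."""
    v = np.zeros(d_prime)
    v[hash(("rule-above", outside_tree.rule_above)) % d_prime] = 1.0
    v[hash(("next-word", outside_tree.next_word)) % d_prime] = 1.0
    return v
```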
62 In tandem with these definitions, we assume projection matrices Ua ∈ R(d×m) and Va ∈ R(d′×m) for all a ∈ N. [sent-265, score-0.11]
63 We then define additional random variables Y1, Y2, Y3, Z as Y1 = (Ua1)⊤φ(T1), Z = (Va1)⊤ψ(O), Y2 = (Ua2)⊤φ(T2), Y3 = (Ua3)⊤φ(T3), where ai is the value of the random variable Ai. [sent-266, score-0.324]
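In code, these projected variables are plain matrix-vector products (a sketch with made-up placeholder sizes and vectors):

```python
import numpy as np

d, d_prime, m = 512, 512, 8                 # hypothetical sizes
U_a = np.random.rand(d, m)                  # projection matrix U^a  (d  x m)
V_a = np.random.rand(d_prime, m)            # projection matrix V^a  (d' x m)
phi_t = np.zeros(d); phi_t[42] = 1.0        # a feature vector phi(T1)
psi_o = np.zeros(d_prime); psi_o[7] = 1.0   # a feature vector psi(O)

Y1 = U_a.T @ phi_t    # Y1 = (U^{a1})^T phi(T1), an m-dimensional vector
Z  = V_a.T @ psi_o    # Z  = (V^{a1})^T psi(O)
```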
64 Our observable representation then consists of: Ca→b c(y) = Da→b c(y)(Σa)−1 (2), ca→x∞ = da→x∞(Σa)−1 (3), ca1 = E[[[A1 = a]]Y1 |B = 1] (4). We next introduce conditions under which these quantities satisfy the conditions in theorem 1. [sent-270, score-0.674]
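A minimal sketch of how Eqs 2 and 3 combine estimated quantities (all inputs below are random placeholders; in practice they are the empirical estimates produced by the algorithm of figure 4):

```python
import numpy as np

m = 3
D_abc_y = np.random.rand(m, m)               # stand-in for D^{a->bc}(y) at some fixed y
d_inf   = np.random.rand(1, m)               # stand-in for d^{a->x}_inf
Sigma_a = np.random.rand(m, m) + np.eye(m)   # stand-in for Sigma^a (assumed invertible)

Sigma_inv = np.linalg.inv(Sigma_a)
C_abc_y = D_abc_y @ Sigma_inv    # Eq. 2: C^{a->bc}(y) = D^{a->bc}(y) (Sigma^a)^{-1}
c_inf   = d_inf @ Sigma_inv      # Eq. 3: c^{a->x}_inf = d^{a->x}_inf (Sigma^a)^{-1}
```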
65 The correctness of the representation will rely on the following conditions being satisfied (these are parallel to conditions 1 and 2 in Hsu et al. [sent-272, score-0.2]
66 Condition 1 ∀a ∈ N, the matrices Ia and Ja are of full rank (i. [sent-273, score-0.123]
67 Condition 2 ∀a ∈ N, the matrices Ua ∈ R(d×m) and Va ∈ R(d′×m) are such that the matrices Ga = (Ua)⊤Ia and Ka = (Va)⊤Ja are invertible. [sent-277, score-0.123]
68 The remainder is similar to the proof of lemma 2 in Hsu et al. [sent-280, score-0.263]
69 The matrices can be estimated directly from a training set consisting of s-trees, assuming that we have access to the functions φ and ψ. [sent-282, score-0.166]
70 We can now state the following theorem: Theorem 2 Assume conditions 1 and 2 are satisfied. [sent-283, score-0.1]
71 2): Da→b c(y) = Gc Ta→b c diag(y Gb Sa→b c) Qa→b c diag(γa)(Ka)⊤ (6), da→x∞ = 1⊤Qa→x diag(γa)(Ka)⊤ (7), Σa = Ga diag(γa)(Ka)⊤ (8), ca1 = Gaπa (9). Under conditions 1 and 2, Σa is invertible, and (Σa)−1 = ((Ka)⊤)−1(diag(γa))−1(Ga)−1. [sent-291, score-0.1]
72 7 Deriving Empirical Estimates Figure 4 shows an algorithm that derives estimates of the quantities in Eqs 2, 3, and 4. [sent-293, score-0.227]
73 As input, the algorithm takes a sequence of tuples (r(i,1), t(i,1), t(i,2), t(i,3), o(i), b(i)) for i∈ [M]. [sent-294, score-0.138]
74 τM as follows: • ∀i ∈ [M], choose a single node ji uniformly at random from the nodes in τi. [sent-298, score-0.153]
75 If r(i,1) is of the form a → b c, then t(i,2) is the inside tree under the left child of node ji, and t(i,3) is the inside tree under the right child of node ji. [sent-301, score-0.818]
76 b(i) is 1 if node ji is at the root of the tree, 0 otherwise. [sent-304, score-0.153]
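A minimal sketch of how such tuples might be collected from a treebank (the nested-tuple tree encoding, the node enumeration, and the outside-tree representation are all hypothetical; the paper only specifies the resulting tuples):

```python
import random

def enumerate_nodes(tree, idx=0):
    """Pre-order list of (index, subtree); the root gets index 0."""
    nodes = [(idx, tree)]
    for child in tree[1:]:
        nodes.extend(enumerate_nodes(child, idx + len(nodes)))
    return nodes

def collect_tuples(treebank):
    """Trees are nested tuples: ("a -> b c", left, right) or ("a -> x",)."""
    tuples = []
    for tree in treebank:
        nodes = enumerate_nodes(tree)
        j, subtree = random.choice(nodes)       # node j_i chosen uniformly at random
        rule = subtree[0]                       # r^(i,1)
        t1 = subtree                            # inside tree rooted at node j_i
        t2 = subtree[1] if len(subtree) == 3 else None  # left child's inside tree
        t3 = subtree[2] if len(subtree) == 3 else None  # right child's inside tree
        o = (tree, j)                           # outside tree: everything except t1
        b = 1 if j == 0 else 0                  # 1 iff node j_i is the root
        tuples.append((rule, t1, t2, t3, o, b))
    return tuples
```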
77 draws from the joint distribution over the random variables R1, T1, T2 , T3, O, B defined in the previous section. [sent-314, score-0.114]
78 The algorithm first computes estimates of the projection matrices Ua and Va: following lemma 1, this is done by first deriving estimates of Ωa, and then taking SVDs of each Ωa. [sent-315, score-0.561]
79 (Note that Λ is a function of the projection matrices Ua and Va as well as the underlying L-PCFG. [sent-319, score-0.186]
80 ) • For each a ∈ N, σa is the value of the m’th largest singular value of Ωa. [sent-320, score-0.258]
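A minimal sketch of this first step (the exact definition and normalization of Ωa follow lemma 1 and figure 4 in the paper; the simple average over matching nodes below, and the input format, are only an illustration):

```python
import numpy as np

def estimate_projections(samples, a, m):
    """samples: list of (label, phi_vector, psi_vector) for the sampled nodes.
    Returns U^a (d x m), V^a (d' x m), and the m'th largest singular value of the
    estimated Omega^a.  Assumes at least one node has label a and m <= min(d, d')."""
    matched = [(p, q) for label, p, q in samples if label == a]
    Omega_a = sum(np.outer(p, q) for p, q in matched) / len(matched)
    U, s, Vt = np.linalg.svd(Omega_a, full_matrices=False)
    return U[:, :m], Vt[:m, :].T, s[m - 1]
```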
81 We then have the following theorem: Theorem 3 Assume that the inputs to the algorithm in figure 4 are i. [sent-322, score-0.096]
82 draws from the joint distribution over the random variables R1, T1, T2 , T3 , O, B, under an L-PCFG with distribution p(r1 . [sent-325, score-0.114]
83 Assume that the algorithm in figure 4 has projection matrices Uˆa and Vˆa derived as left and right singular vectors of Ωa, as defined in Eq. [sent-330, score-0.539]
84 For any s-tree, the values calculated by the algorithm in figure 3 with inputs ca1, ca→x∞, derived from the algorithm in figure 4. [sent-341, score-0.192]
85 in the input to the algorithm in figure 4 where ri,1 has non-terminal a on its left-hand-side. [sent-344, score-0.096]
86 Ωˆa Proof sketch: The proof is similar to that of Foster et al. [sent-386, score-0.165]
87 In practice, an implementation should most likely use all nodes in all trees in training data; by Rao-Blackwellization we know such an algorithm would be better than the one presented, but the analysis of how much better would be challenging. [sent-394, score-0.14]
88 The sample complexity of the method depends on the minimum singular values of Ωa; these singular values are a measure of how well correlated ψ and φ are with the unobserved hidden variable H1. [sent-404, score-0.336]
89 1 Proof of Theorem 1 First, the following lemma leads directly to the correctness of the algorithm in figure 2. (Footnote 1: Parameters can be estimated using the algorithm in figure 4; for a test sentence x1 . [sent-410, score-0.29]
90 xN we can first use the algorithm in figure 3 to calculate marginals µ(a, i,j), then use the algorithm of Goodman (1996) to find argmaxτ∈T (x) P(a,i,j)∈τ µ(a,i,j). [sent-413, score-0.241]
91 Lemma 2 Assume that conditions 1-3 of theorem 1 are satisfied, and that the input to the algorithm in figure 2 is an s-tree r1 . [sent-416, score-0.423]
92 Define ai for i ∈ [N] to be the non-terminal on the left-hand-side of rule ri, and ti for i ∈ [N] to be the s-tree with rule ri at its root. [sent-420, score-0.315]
93 Finally, for all i ∈ [N], define the row vector bi ∈ R(1×m) to have components bih = P(Ti = ti|Hi = h, Ai = ai) for h ∈ [m]. [sent-421, score-0.293]
94 rN) This lemma shows a direct link between the vectors fi calculated in the algorithm, and the terms bih, which are terms calculated by the conventional inside algorithm: each fi is a linear transformation (through Gai) of the corresponding vector bi. [sent-426, score-0.489]
95 , for any i such that ai ∈ P—we have bih = q(ri |h, ai), and it is easily verified that fi = bi(G(ai))−1 . [sent-431, score-0.2]
96 For all i ∈ [N] such that ai ∈ I, by the definitions in the algorithm, [sent-433, score-0.111]
97 fi = fγ Cri(fβ) = fγ Gaγ Tri diag(fβ Gaβ Sri) Qri (Gai)−1. Assuming by induction that fγ = bγ(G(aγ))−1 and fβ = bβ(G(aβ))−1, this simplifies to fi = κr diag(κl) Qri (Gai)−1 (10) where κr = bγ Tri, and κl = bβ Sri. [sent-434, score-0.15]
98 Next, we give a similar lemma, which implies the correctness of the algorithm in figure 3: Lemma 3 Assume that conditions 1-3 of theorem 1 are satisfied, and that the input to the algorithm in figure 3 is a sentence x1 . [sent-447, score-0.519]
99 The proof is by induction, and is similar to the proof of lemma 2; for reasons of space it is omitted. [sent-466, score-0.428]
100 For reasons of space, we do not give the proofs of identities 7-9: the proofs are similar. [sent-471, score-0.256]
wordName wordTfidf (topN-words)
[('spectral', 0.264), ('tensor', 0.247), ('theorem', 0.227), ('ga', 0.226), ('rn', 0.198), ('hsu', 0.174), ('proof', 0.165), ('rm', 0.163), ('ua', 0.159), ('node', 0.153), ('observable', 0.141), ('singular', 0.136), ('ri', 0.131), ('matrices', 0.123), ('va', 0.122), ('inside', 0.112), ('ai', 0.111), ('ca', 0.102), ('conditions', 0.1), ('ph', 0.099), ('lemma', 0.098), ('algorithm', 0.096), ('qa', 0.095), ('da', 0.094), ('proofs', 0.093), ('bih', 0.089), ('cdiag', 0.089), ('gcta', 0.089), ('skeletal', 0.089), ('bh', 0.088), ('tree', 0.088), ('vectors', 0.079), ('qri', 0.077), ('fi', 0.075), ('outside', 0.073), ('rule', 0.073), ('identities', 0.07), ('define', 0.068), ('balle', 0.066), ('gai', 0.066), ('tensors', 0.066), ('ja', 0.066), ('hb', 0.066), ('quantities', 0.066), ('pcfgs', 0.066), ('child', 0.065), ('estimates', 0.065), ('foster', 0.064), ('hidden', 0.064), ('ka', 0.063), ('projection', 0.063), ('definitions', 0.063), ('condition', 0.062), ('hi', 0.061), ('variables', 0.061), ('rooted', 0.06), ('pmf', 0.058), ('xn', 0.054), ('random', 0.053), ('deriving', 0.051), ('vector', 0.05), ('marginals', 0.049), ('svd', 0.049), ('rh', 0.049), ('xh', 0.049), ('form', 0.047), ('assume', 0.047), ('ha', 0.045), ('estimation', 0.045), ('matrix', 0.044), ('dean', 0.044), ('afondr', 0.044), ('aof', 0.044), ('ehaecrhe', 0.044), ('eja', 0.044), ('fmo', 0.044), ('hpa', 0.044), ('invertible', 0.044), ('lugue', 0.044), ('mach', 0.044), ('parikh', 0.044), ('prdiag', 0.044), ('rha', 0.044), ('separability', 0.044), ('tfhoer', 0.044), ('vempala', 0.044), ('trees', 0.044), ('em', 0.043), ('assuming', 0.043), ('tuples', 0.042), ('eh', 0.042), ('row', 0.042), ('right', 0.042), ('hmms', 0.041), ('bi', 0.041), ('matsuzaki', 0.041), ('satisfy', 0.04), ('th', 0.039), ('diagonal', 0.039), ('ia', 0.039)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 181 acl-2012-Spectral Learning of Latent-Variable PCFGs
Author: Shay B. Cohen ; Karl Stratos ; Michael Collins ; Dean P. Foster ; Lyle Ungar
Abstract: We introduce a spectral learning algorithm for latent-variable PCFGs (Petrov et al., 2006). Under a separability (singular value) condition, we prove that the method provides consistent parameter estimates.
2 0.11629303 139 acl-2012-MIX Is Not a Tree-Adjoining Language
Author: Makoto Kanazawa ; Sylvain Salvati
Abstract: The language MIX consists of all strings over the three-letter alphabet {a, b, c} that contain an equal number of occurrences of each letter. We prove Joshi’s (1985) conjecture that MIX is not a tree-adjoining language.
3 0.10854807 78 acl-2012-Efficient Search for Transformation-based Inference
Author: Asher Stern ; Roni Stern ; Ido Dagan ; Ariel Felner
Abstract: This paper addresses the search problem in textual inference, where systems need to infer one piece of text from another. A prominent approach to this task is attempts to transform one text into the other through a sequence of inference-preserving transformations, a.k.a. a proof, while estimating the proof’s validity. This raises a search challenge of finding the best possible proof. We explore this challenge through a comprehensive investigation of prominent search algorithms and propose two novel algorithmic components specifically designed for textual inference: a gradient-style evaluation function, and a locallookahead node expansion method. Evaluations, using the open-source system, BIUTEE, show the contribution of these ideas to search efficiency and proof quality.
4 0.089422293 36 acl-2012-BIUTEE: A Modular Open-Source System for Recognizing Textual Entailment
Author: Asher Stern ; Ido Dagan
Abstract: This paper introduces BIUTEE, an open-source system for recognizing textual entailment. Its main advantages are its ability to utilize various types of knowledge resources, and its extensibility by which new knowledge resources and inference components can be easily integrated. These abilities make BIUTEE an appealing RTE system for two research communities: (1) researchers of end applications, that can benefit from generic textual inference, and (2) RTE researchers, who can integrate their novel algorithms and knowledge resources into our system, saving the time and effort of developing a complete RTE system from scratch. Notable assistance for these researchers is provided by a visual tracing tool, by which researchers can refine and “debug” their knowledge resources and inference components.
5 0.086180732 106 acl-2012-Head-driven Transition-based Parsing with Top-down Prediction
Author: Katsuhiko Hayashi ; Taro Watanabe ; Masayuki Asahara ; Yuji Matsumoto
Abstract: This paper presents a novel top-down headdriven parsing algorithm for data-driven projective dependency analysis. This algorithm handles global structures, such as clause and coordination, better than shift-reduce or other bottom-up algorithms. Experiments on the English Penn Treebank data and the Chinese CoNLL-06 data show that the proposed algorithm achieves comparable results with other data-driven dependency parsing algorithms.
6 0.08217261 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers
7 0.076879792 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets
8 0.075832501 185 acl-2012-Strong Lexicalization of Tree Adjoining Grammars
9 0.074724764 109 acl-2012-Higher-order Constituent Parsing and Parser Combination
10 0.067115463 12 acl-2012-A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relation Extraction
11 0.062519528 123 acl-2012-Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in SMT
12 0.061720945 38 acl-2012-Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing
13 0.061528504 18 acl-2012-A Probabilistic Model for Canonicalizing Named Entity Mentions
14 0.061175793 19 acl-2012-A Ranking-based Approach to Word Reordering for Statistical Machine Translation
15 0.059590071 94 acl-2012-Fast Online Training with Frequency-Adaptive Learning Rates for Chinese Word Segmentation and New Word Detection
16 0.058920443 177 acl-2012-Sentence Dependency Tagging in Online Question Answering Forums
17 0.056790192 16 acl-2012-A Nonparametric Bayesian Approach to Acoustic Model Discovery
18 0.055899225 145 acl-2012-Modeling Sentences in the Latent Space
19 0.054544091 83 acl-2012-Error Mining on Dependency Trees
20 0.054270048 57 acl-2012-Concept-to-text Generation via Discriminative Reranking
topicId topicWeight
[(0, -0.171), (1, 0.008), (2, -0.06), (3, -0.039), (4, -0.068), (5, 0.029), (6, -0.009), (7, 0.137), (8, 0.065), (9, 0.001), (10, -0.057), (11, -0.028), (12, -0.05), (13, -0.052), (14, 0.074), (15, -0.017), (16, 0.047), (17, 0.048), (18, 0.032), (19, 0.028), (20, 0.027), (21, -0.093), (22, -0.066), (23, -0.025), (24, -0.074), (25, 0.004), (26, -0.074), (27, -0.057), (28, -0.027), (29, 0.135), (30, -0.08), (31, 0.064), (32, -0.127), (33, -0.054), (34, 0.018), (35, 0.099), (36, -0.027), (37, -0.143), (38, -0.217), (39, -0.106), (40, 0.093), (41, -0.008), (42, -0.049), (43, 0.068), (44, 0.127), (45, 0.025), (46, -0.145), (47, 0.081), (48, 0.015), (49, -0.051)]
simIndex simValue paperId paperTitle
same-paper 1 0.95622623 181 acl-2012-Spectral Learning of Latent-Variable PCFGs
Author: Shay B. Cohen ; Karl Stratos ; Michael Collins ; Dean P. Foster ; Lyle Ungar
Abstract: We introduce a spectral learning algorithm for latent-variable PCFGs (Petrov et al., 2006). Under a separability (singular value) condition, we prove that the method provides consistent parameter estimates.
2 0.7459929 139 acl-2012-MIX Is Not a Tree-Adjoining Language
Author: Makoto Kanazawa ; Sylvain Salvati
Abstract: The language MIX consists of all strings over the three-letter alphabet {a, b, c} that contain an equal number of occurrences of each letter. We prove Joshi’s (1985) conjecture that MIX is not a tree-adjoining language.
3 0.60976082 185 acl-2012-Strong Lexicalization of Tree Adjoining Grammars
Author: Andreas Maletti ; Joost Engelfriet
Abstract: Recently, it was shown (KUHLMANN, SATTA: Tree-adjoining grammars are not closed under strong lexicalization. Comput. Linguist., 2012) that finitely ambiguous tree adjoining grammars cannot be transformed into a normal form (preserving the generated tree language), in which each production contains a lexical symbol. A more powerful model, the simple context-free tree grammar, admits such a normal form. It can be effectively constructed and the maximal rank of the nonterminals only increases by 1. Thus, simple context-free tree grammars strongly lexicalize tree adjoining grammars and themselves.
4 0.4974601 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers
Author: Bevan Jones ; Mark Johnson ; Sharon Goldwater
Abstract: Many semantic parsing models use tree transformations to map between natural language and meaning representation. However, while tree transformations are central to several state-of-the-art approaches, little use has been made of the rich literature on tree automata. This paper makes the connection concrete with a tree transducer based semantic parsing model and suggests that other models can be interpreted in a similar framework, increasing the generality of their contributions. In particular, this paper further introduces a variational Bayesian inference algorithm that is applicable to a wide class of tree transducers, producing state-of-the-art semantic parsing results while remaining applicable to any domain employing probabilistic tree transducers.
5 0.4498232 78 acl-2012-Efficient Search for Transformation-based Inference
Author: Asher Stern ; Roni Stern ; Ido Dagan ; Ariel Felner
Abstract: This paper addresses the search problem in textual inference, where systems need to infer one piece of text from another. A prominent approach to this task is attempts to transform one text into the other through a sequence of inference-preserving transformations, a.k.a. a proof, while estimating the proof’s validity. This raises a search challenge of finding the best possible proof. We explore this challenge through a comprehensive investigation of prominent search algorithms and propose two novel algorithmic components specifically designed for textual inference: a gradient-style evaluation function, and a locallookahead node expansion method. Evaluations, using the open-source system, BIUTEE, show the contribution of these ideas to search efficiency and proof quality.
6 0.42292804 83 acl-2012-Error Mining on Dependency Trees
7 0.41642073 34 acl-2012-Automatically Learning Measures of Child Language Development
8 0.41184902 42 acl-2012-Bootstrapping via Graph Propagation
9 0.40268013 145 acl-2012-Modeling Sentences in the Latent Space
10 0.39984968 107 acl-2012-Heuristic Cube Pruning in Linear Time
11 0.37725386 120 acl-2012-Information-theoretic Multi-view Domain Adaptation
12 0.37658712 36 acl-2012-BIUTEE: A Modular Open-Source System for Recognizing Textual Entailment
13 0.37498361 184 acl-2012-String Re-writing Kernel
14 0.36992216 163 acl-2012-Prediction of Learning Curves in Machine Translation
15 0.36406595 137 acl-2012-Lemmatisation as a Tagging Task
16 0.34202 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets
17 0.3315829 108 acl-2012-Hierarchical Chunk-to-String Translation
18 0.33051008 57 acl-2012-Concept-to-text Generation via Discriminative Reranking
19 0.32418653 106 acl-2012-Head-driven Transition-based Parsing with Top-down Prediction
20 0.31491947 172 acl-2012-Selective Sharing for Multilingual Dependency Parsing
topicId topicWeight
[(23, 0.347), (25, 0.026), (26, 0.031), (28, 0.046), (30, 0.04), (37, 0.031), (39, 0.048), (59, 0.018), (74, 0.02), (82, 0.02), (84, 0.024), (85, 0.025), (90, 0.105), (92, 0.088), (94, 0.023), (99, 0.039)]
simIndex simValue paperId paperTitle
same-paper 1 0.80863601 181 acl-2012-Spectral Learning of Latent-Variable PCFGs
Author: Shay B. Cohen ; Karl Stratos ; Michael Collins ; Dean P. Foster ; Lyle Ungar
Abstract: We introduce a spectral learning algorithm for latent-variable PCFGs (Petrov et al., 2006). Under a separability (singular value) condition, we prove that the method provides consistent parameter estimates.
2 0.57682604 73 acl-2012-Discriminative Learning for Joint Template Filling
Author: Einat Minkov ; Luke Zettlemoyer
Abstract: This paper presents a joint model for template filling, where the goal is to automatically specify the fields of target relations such as seminar announcements or corporate acquisition events. The approach models mention detection, unification and field extraction in a flexible, feature-rich model that allows for joint modeling of interdependencies at all levels and across fields. Such an approach can, for example, learn likely event durations and the fact that start times should come before end times. While the joint inference space is large, we demonstrate effective learning with a Perceptron-style approach that uses simple, greedy beam decoding. Empirical results in two benchmark domains demonstrate consistently strong performance on both mention de- tection and template filling tasks.
3 0.42943519 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers
Author: Bevan Jones ; Mark Johnson ; Sharon Goldwater
Abstract: Many semantic parsing models use tree transformations to map between natural language and meaning representation. However, while tree transformations are central to several state-of-the-art approaches, little use has been made of the rich literature on tree automata. This paper makes the connection concrete with a tree transducer based semantic parsing model and suggests that other models can be interpreted in a similar framework, increasing the generality of their contributions. In particular, this paper further introduces a variational Bayesian inference algorithm that is applicable to a wide class of tree transducers, producing state-of-the-art semantic parsing results while remaining applicable to any domain employing probabilistic tree transducers.
4 0.4263944 31 acl-2012-Authorship Attribution with Author-aware Topic Models
Author: Yanir Seroussi ; Fabian Bohnert ; Ingrid Zukerman
Abstract: Authorship attribution deals with identifying the authors of anonymous texts. Building on our earlier finding that the Latent Dirichlet Allocation (LDA) topic model can be used to improve authorship attribution accuracy, we show that employing a previously-suggested Author-Topic (AT) model outperforms LDA when applied to scenarios with many authors. In addition, we define a model that combines LDA and AT by representing authors and documents over two disjoint topic sets, and show that our model outperforms LDA, AT and support vector machines on datasets with many authors.
5 0.42298296 38 acl-2012-Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing
Author: Hiroyuki Shindo ; Yusuke Miyao ; Akinori Fujino ; Masaaki Nagata
Abstract: We propose Symbol-Refined Tree Substitution Grammars (SR-TSGs) for syntactic parsing. An SR-TSG is an extension of the conventional TSG model where each nonterminal symbol can be refined (subcategorized) to fit the training data. We aim to provide a unified model where TSG rules and symbol refinement are learned from training data in a fully automatic and consistent fashion. We present a novel probabilistic SR-TSG model based on the hierarchical Pitman-Yor Process to encode backoff smoothing from a fine-grained SR-TSG to simpler CFG rules, and develop an efficient training method based on Markov Chain Monte Carlo (MCMC) sampling. Our SR-TSG parser achieves an F1 score of 92.4% in the Wall Street Journal (WSJ) English Penn Treebank parsing task, which is a 7.7 point improvement over a conventional Bayesian TSG parser, and better than state-of-the-art discriminative reranking parsers.
6 0.42135242 84 acl-2012-Estimating Compact Yet Rich Tree Insertion Grammars
7 0.41869232 22 acl-2012-A Topic Similarity Model for Hierarchical Phrase-based Translation
8 0.41864741 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning
9 0.4183405 132 acl-2012-Learning the Latent Semantics of a Concept from its Definition
10 0.41830692 167 acl-2012-QuickView: NLP-based Tweet Search
11 0.41683036 36 acl-2012-BIUTEE: A Modular Open-Source System for Recognizing Textual Entailment
12 0.41340271 10 acl-2012-A Discriminative Hierarchical Model for Fast Coreference at Large Scale
13 0.41165048 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures
14 0.41059393 156 acl-2012-Online Plagiarized Detection Through Exploiting Lexical, Syntax, and Semantic Information
15 0.40971291 146 acl-2012-Modeling Topic Dependencies in Hierarchical Text Categorization
16 0.4089345 28 acl-2012-Aspect Extraction through Semi-Supervised Modeling
17 0.40744194 139 acl-2012-MIX Is Not a Tree-Adjoining Language
18 0.40720063 63 acl-2012-Cross-lingual Parse Disambiguation based on Semantic Correspondence
19 0.40690574 98 acl-2012-Finding Bursty Topics from Microblogs
20 0.40633839 182 acl-2012-Spice it up? Mining Refinements to Online Instructions from User Generated Content