emnlp emnlp2010 emnlp2010-65 knowledge-graph by maker-knowledge-mining

65 emnlp-2010-Inducing Probabilistic CCG Grammars from Logical Form with Higher-Order Unification


Source: pdf

Author: Tom Kwiatkowski ; Luke Zettlemoyer ; Sharon Goldwater ; Mark Steedman

Abstract: This paper addresses the problem of learning to map sentences to logical form, given training data consisting of natural language sentences paired with logical representations of their meaning. Previous approaches have been designed for particular natural languages or specific meaning representations; here we present a more general method. The approach induces a probabilistic CCG grammar that represents the meaning of individual words and defines how these meanings can be combined to analyze complete sentences. We use higher-order unification to define a hypothesis space containing all grammars consistent with the training data, and develop an online learning algorithm that efficiently searches this space while simultaneously estimating the parameters of a log-linear parsing model. Experiments demonstrate high accuracy on benchmark data sets in four languages with two different meaning representations.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 This paper addresses the problem of learning to map sentences to logical form, given training data consisting of natural language sentences paired with logical representations of their meaning. [sent-8, score-0.831]

2 The approach induces a probabilistic CCG grammar that represents the meaning of individual words and defines how these meanings can be combined to analyze complete sentences. [sent-10, score-0.253]

3 We use higher-order unification to define a hypothesis space containing all grammars consistent with the training data, and develop an online learning algorithm that efficiently searches this space while simultaneously estimating the parameters of a log-linear parsing model. [sent-11, score-0.395]

4 Recent work has addressed this problem by learning semantic parsers given sentences paired with logical meaning representations (Thompson & Mooney, 2002; Kate et al. [sent-14, score-0.655]

5 Here, we develop an approach that can learn to map any natural language to a wide variety of logical representations of linguistic meaning. [sent-26, score-0.474]

6 The need to learn over multiple representations arises from the fact that there is no standard representation for logical form for natural language. [sent-31, score-0.474]

7 Our approach works by inducing a combinatory categorial grammar (CCG) (Steedman, 1996, 2000). [sent-34, score-0.25]

8 A CCG grammar consists of a language-specific lexicon, whose entries pair individual words and phrases with both syntactic and semantic information, and a universal set of combinatory rules that [sent-35, score-0.329]

9 project that lexicon onto the sentences and meanings of the language via syntactic derivations. [sent-37, score-0.18]

10 The learning process starts by postulating, for each sentence in the training data, a single multi-word lexical item pairing that sentence with its complete logical form. [sent-38, score-0.513]

11 These entries are iteratively refined with a restricted higher-order unification procedure (Huet, 1975) that defines all possible ways to subdivide them, consis- tent with the requirement that each training sentence can still be parsed to yield its labeled meaning. [sent-39, score-0.456]

12 We show that accurate models can be learned for multiple languages with both the variable-free and lambda-calculus meaning representations introduced above. [sent-46, score-0.342]

13 2 Overview of the Approach The goal of our algorithm is to find a function f : x → z that maps sentences x to logical expressions z. [sent-50, score-0.33]

14 The induced grammar consists of two components which the algorithm must learn: • A CCG lexicon, Λ, containing lexical items that define the space of possible parses y for an input sentence x. [sent-55, score-0.398]

15 Each parse contains both syntactic and semantic information, and defines the output logical form z. [sent-56, score-0.514]

16 The lexical induction process (Section 4) uses a restricted form of higher order unification along with the CCG combinatory rules to propose new entries for Λ. [sent-59, score-0.644]

17 The complete learning algorithm (Section 5) integrates this lexical induction with a parameter estimation scheme that learns θ. [sent-60, score-0.186]

18 3 Background This section provides an introduction to the ways in which we will use lambda calculus and higher-order unification to construct meaning representations. [sent-62, score-0.588]

19 1 Lambda Calculus and Higher-Order Unification We assume that sentence meanings are represented as logical expressions, which we will construct from the meaning of individual words by using the operations defined in the lambda calculus. [sent-65, score-0.602]

20 We represent the meaning of words and phrases using lambda-calculus expressions that can contain constants, quantifiers, logical connectors, and lambda abstractions. [sent-70, score-0.628]
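
A minimal sketch (not the authors' implementation) of how such lambda-calculus expressions could be represented as a small term algebra; the class names and example constants are illustrative assumptions only.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Const:        # constants: entities (tex), predicates (state, next_to), connectives (and)
    name: str

@dataclass(frozen=True)
class Var:          # variables bound by lambda abstraction
    name: str

@dataclass(frozen=True)
class Lam:          # lambda abstraction  \x. body
    var: Var
    body: "Term"

@dataclass(frozen=True)
class App:          # function application  fun(arg), curried for multi-argument predicates
    fun: "Term"
    arg: "Term"

Term = Union[Const, Var, Lam, App]

# \x. and(state(x), next_to(x, tex)), built with binary (curried) application only
x = Var("x")
expr = Lam(x, App(App(Const("and"), App(Const("state"), x)),
                  App(App(Const("next_to"), x), Const("tex"))))
```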

21 The meanings of individual words and phrases can be arbitrary lambda expressions, while the final meaning for a sentence can take different forms. [sent-72, score-0.272]

22 It can be a full lambda-calculus expression, a variable-free expression such as answer(state(borders(tex))), or any other logical expression that can be built from the primitive meanings via function application and composition. [sent-73, score-0.599]

23 The higher-order unification problem (Huet, 1975) involves finding a substitution for the free variables in a pair of lambda-calculus expressions that, when applied, makes the expressions equal each other. [sent-74, score-0.426]

24 In this paper, we will guide the grammar induction process using a restricted version of higherorder unification that is tractable. [sent-76, score-0.438]

25 This limited form of the unification problem will allow us to define the ways to split h into subparts that can be recombined with CCG parsing operations, which we will define in the next section, to reconstruct h. [sent-79, score-0.384]

26 For present purposes a CCG grammar includes a lexicon Λ with entries like the following: New York ‘ NP : ny borders ‘ S\NP/NP : λxλy. [sent-82, score-0.456]

27 next to(y, x) Vermont ‘ NP : vt where each lexical item w ‘ X : h has words w, a syntactic category X, and a logical form h expressed as a lambda-calculus expression. [sent-83, score-0.695]

28 For example, given the lexicon above, the sentence New York borders Vermont can be parsed to produce: (CCG derivation omitted) [sent-88, score-0.377]

29 Figure 1 shows two parses where the composition combinators and vertical slashes are used. [sent-97, score-0.193]

30 The joint probability of a logical form z constructed with a parse y, given a sentence x is [sent-102, score-0.726]

31 Figure 1: Two examples of CCG parses with different logical form representations (for "what states border texas" and its Turkish counterpart "hangi eyaletin texas ye siniri vardir"). [sent-112, score-0.398]

32 defined as: P(y, z | x; θ, Λ) = exp(θ · φ(x, y, z)) / Σ_{(y′, z′)} exp(θ · φ(x, y′, z′)) (1) Section 7 defines the features used in the experiments, which include, for example, lexical features that indicate when specific lexical items in Λ are used in the parse y. [sent-113, score-0.488]
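
A small sketch of how the distribution in equation (1) could be computed over an explicitly enumerated list of candidate (parse, logical form) pairs; the sparse feature dictionaries and feature names below are hypothetical, not the paper's actual feature set.

```python
import math

def dot(theta, phi):
    """Sparse dot product theta . phi(x, y, z)."""
    return sum(theta.get(f, 0.0) * v for f, v in phi.items())

def parse_distribution(candidates, theta):
    """candidates: list of (y, z, phi) triples for a single sentence x.
    Returns P(y, z | x; theta) for each candidate, as in equation (1)."""
    scores = [dot(theta, phi) for _, _, phi in candidates]
    m = max(scores)                          # log-sum-exp trick for stability
    exps = [math.exp(s - m) for s in scores]
    Z = sum(exps)
    return [e / Z for e in exps]

# Hypothetical toy example: two candidate analyses of one sentence.
theta = {"lex:texas|-NP:tex": 1.2, "lex:texas|-S:tex": -0.3}
candidates = [("y1", "z1", {"lex:texas|-NP:tex": 1}),
              ("y2", "z2", {"lex:texas|-S:tex": 1})]
print(parse_distribution(candidates, theta))   # two probabilities summing to 1
```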

33 4 Splitting Lexical Items Before presenting a complete learning algorithm, we first describe how to use higher-order unification to define a procedure for splitting CCG lexical entries. [sent-131, score-0.548]

34 This splitting process is used to expand the lexicon during learning. [sent-132, score-0.267]

35 We seed the lexical induction with a multi-word lexical item xi ‘ S : zi for each training example (xi, zi), consisting of the entire sentence xi and its associated meaning representation zi. For example, one initial lexical item might be: [sent-133, score-0.979]

36 New York borders Vermont ‘ S : next to(ny, vt) (5) Although these initial, sentential lexical items can parse the training data, they will not generalize well to unseen data. [sent-134, score-0.521]
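
A sketch of this initialization step, assuming the training data arrives as (sentence string, logical form string) pairs; the LexicalItem container is illustrative, not the paper's data structure.

```python
from collections import namedtuple

# An illustrative lexical entry: word sequence, syntactic category, logical form.
LexicalItem = namedtuple("LexicalItem", ["words", "category", "logical_form"])

def seed_lexicon(training_data):
    """One sentential entry  x_i : S : z_i  per training pair, as in example (5)."""
    return {LexicalItem(tuple(x.split()), "S", z) for x, z in training_data}

data = [("new york borders vermont", "next_to(ny, vt)")]
print(seed_lexicon(data))
```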

37 To learn effectively, we will need to split overly specific entries of this type into pairs of new, smaller, entries that generalize better. [sent-135, score-0.372]

38 For example, one possible split of the lexical entry given in (5) would be the pair: New York borders ‘ S/NP : λx. [sent-136, score-0.229]

39 next to(ny, x), Vermont ‘ NP : vt where we broke the original logical expression into two new ones λx. [sent-137, score-0.495]

40 next to(ny, x) and vt, and paired them with syntactic categories that allow the new lexical entries to be recombined to produce the original analysis. [sent-138, score-0.401]

41 The process is driven by solving a higher-order unification problem that defines all of the ways of splitting the logical expression into two parts, as described in Section 4. [sent-140, score-0.917]

42 2 describes how to construct syntactic categories that are consistent with the two new fragments of logical form and which will allow the new lexical items to recombine. [sent-143, score-0.695]

43 3 defines the full set of lexical entry pairs that can be created by splitting a lexical entry. [sent-145, score-0.433]

44 As we will see, this splitting process is overly prolific for any single language and will yield many lexical items that do not generalize well. [sent-146, score-0.506]

45 Section 5 describes how we estimate the parameters of a probabilistic parsing model and how this parsing model can be used to guide the selection of items to add to the lexicon. [sent-149, score-0.269]

46 1 Restricted Higher-Order Unification The set of possible splits for a logical expression h is defined as the solution to a pair of higherorder unification problems. [sent-151, score-0.824]

47 We find pairs of logical expressions (f, g) such that either f(g) = h or λx. [sent-152, score-0.434]
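
One way to picture the application case f(g) = h is to choose a subterm g of h and abstract it out, setting f = λx. h[g := x]. The toy sketch below enumerates such splits over a nested-tuple encoding of logical expressions; it covers only this case, ignores the paper's additional restrictions and the composition variant, and the encoding itself is an assumption made for the example.

```python
# Toy encoding: a term is a string (constant/variable) or ("app", fun, arg).

def subterms(h):
    """Yield every subterm of h, including h itself."""
    yield h
    if isinstance(h, tuple) and h[0] == "app":
        yield from subterms(h[1])
        yield from subterms(h[2])

def replace(h, target, var):
    """h with every occurrence of the subterm `target` replaced by `var`."""
    if h == target:
        return var
    if isinstance(h, tuple) and h[0] == "app":
        return ("app", replace(h[1], target, var), replace(h[2], target, var))
    return h

def application_splits(h):
    """Pairs (f, g) with f = ("lam", "x", h[g := "x"]); applying f to g rebuilds h."""
    for g in set(subterms(h)) - {h}:
        yield ("lam", "x", replace(h, g, "x")), g

h = ("app", ("app", "next_to", "ny"), "vt")     # next_to(ny, vt)
for f, g in application_splits(h):
    print(f, "  +  ", g)
```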

48 2 Splitting Categories We define the set of possible splits for a category X :h with syntax X and logical form h by enumerating the solution pairs (f, g) to the higher-order unification problems defined above and creating syntactic categories for the resulting expressions. [sent-178, score-0.894]

49 which were constructed by first choosing the syntactic category for g, in this case NP, and then enumerating the possible directions for the new slash in the category containing f. [sent-183, score-0.256]

50 returns the syntactic category of T as follows: C(T) = NP if T = e, S if T = t, and C(T2)|C(T1) if T = ⟨T1, T2⟩. The basic types e and t are assigned syntactic categories NP and S, and all functional types are assigned categories recursively. [sent-190, score-0.252]
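
A direct transcription of this recursive definition, with the basic types written as "e" and "t" and functional types ⟨T1, T2⟩ encoded as Python pairs; the undirected slash is printed as "|", with its direction to be chosen later as the text describes.

```python
def category_of_type(T):
    """C(T): e -> NP, t -> S, <T1, T2> -> C(T2)|C(T1)."""
    if T == "e":
        return "NP"
    if T == "t":
        return "S"
    T1, T2 = T
    return f"{category_of_type(T2)}|{category_of_type(T1)}"

print(category_of_type(("e", "t")))          # S|NP, e.g. for \x.state(x)
print(category_of_type(("e", ("e", "t"))))   # S|NP|NP for a two-place predicate
```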

51 Given words w0, . . . , wn and a CCG category A, define the set SL of splits to be: SL(w0:n ‘ A) = {(w0:i ‘ B, wi+1:n ‘ C) | 0 ≤ i < n ∧ (B, C) ∈ SC(A)}, where we enumerate all ways of splitting the word sequence w0:n and aligning the subsequences with categories in SC(A), as defined in the last section. [sent-216, score-0.355]
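
A sketch of the SL enumeration just defined, assuming a function sc(A) that returns the category splits SC(A) from the previous step; here sc is stubbed with a single hand-written pair so the example runs.

```python
def sl_splits(words, category, sc):
    """Cut the word sequence at every position and pair the two halves
    with every category split (B, C) in SC(category)."""
    for i in range(1, len(words)):
        for B, C in sc(category):
            yield (words[:i], B), (words[i:], C)

def sc(category):
    """Stub standing in for SC(A): pretend S splits only into (S/NP, NP)."""
    return [("S/NP", "NP")] if category == "S" else []

for left, right in sl_splits(["new", "york", "borders", "vermont"], "S", sc):
    print(left, "+", right)
```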

52 5 Learning Algorithm The previous section described how a splitting procedure can be used to break apart overly specific lexical items into smaller ones that may generalize better to unseen data. [sent-217, score-0.506]

53 The space of possible lexical items supported by this splitting procedure is too 1228 large to explicitly enumerate. [sent-218, score-0.433]

54 Instead, we learn the parameters of a PCCG, which is used both to guide the splitting process, and also to select the best parse, given a learned lexicon. [sent-219, score-0.22]

55 First, new lexical items are induced for the training instance by splitting and merging nodes in the best correct parse, given the current parameters. [sent-222, score-0.433]

56 Inputs and Initialization The algorithm takes as input the training set of n (sentence, logical form) pairs {(xi, zi) : i = 1. [sent-224, score-0.369]

57 The lexicon, Λ, is initialized with a single lexical item xi ‘ S : zi for each of the training pairs along with the contents of the NP list. [sent-231, score-0.482]

58 Step 1: Updating the Lexicon In the lexical update step the algorithm first computes the best correct parse tree y∗ for the current training example and then uses y∗ as input to the procedure NEW-LEX, which determines which (if any) new lexical items to add to Λ. [sent-234, score-0.481]

59 For each pair (C, wi:j), NEW-LEX considers introducing a new lexical item wi:j ‘ C, which allows for the possibility of a parse where the subtree rooted at C is replaced with this new entry. [sent-239, score-0.252]

60 NEW-LEX also considers adding each pair of new lexical items that is obtained by splitting wi:j ‘ C as described in Section 4, thereby considering many different ways of reanalyzing the node. [sent-241, score-0.433]

61 This process creates a set of possible new lexicons, where each lexicon expands Λ in a different way by adding the items from either a single split or a single merge of a node in y∗. [sent-242, score-0.292]

62 For each potential new lexicon Λ0, NEW-LEX computes the probability p(y∗ |xi, zi; θ0, Λ0) of the original parse y∗ under Λ0 and parameters θ0 that are the same as θ but have weights for the new lexical items, as described in Section 7. [sent-243, score-0.276]

63 The initial model assigns a conditional likelihood of one to each training example (there is a single lexical item for each sentence xi, and it contains the labeled logical form zi). [sent-250, score-0.513]

64 Although the splitting step often decreases the probability of the data, the new entries it produces are less specific and should generalize better. [sent-251, score-0.312]

65 Since we initially assign positive weights to the parameters for new lexical items, the overall approach prefers splitting; trees with many lexical items will initially be much more likely. [sent-252, score-0.373]

66 However, if the learned lexical items are used in too many incorrect parses, the stochastic gradient updates will down-weight them to the point where the lexical induction step can merge or re-split nodes in the trees that contain them. [sent-253, score-0.536]
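
A sketch of the kind of stochastic gradient step behind this down-weighting, for a conditional log-linear model with the parse treated as a hidden variable; the candidate parses are passed in as plain feature-count dictionaries, and the fixed step size is a simplification, not the paper's learning-rate schedule.

```python
import math

def expected_features(parses, theta):
    """Feature expectation under the log-linear distribution over `parses`,
    each parse given as a dict of feature counts."""
    scores = [sum(theta.get(f, 0.0) * v for f, v in phi.items()) for phi in parses]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    Z = sum(w)
    expect = {}
    for wi, phi in zip(w, parses):
        for f, v in phi.items():
            expect[f] = expect.get(f, 0.0) + (wi / Z) * v
    return expect

def sgd_step(theta, correct_parses, all_parses, step=0.1):
    """theta += step * (E[phi | correct parses] - E[phi | all parses]).
    Features that fire mostly in incorrect parses get negative updates,
    i.e. the down-weighting described above."""
    good = expected_features(correct_parses, theta)
    full = expected_features(all_parses, theta)
    for f in set(good) | set(full):
        theta[f] = theta.get(f, 0.0) + step * (good.get(f, 0.0) - full.get(f, 0.0))
    return theta

theta = {}
correct = [{"lex:texas|-NP:tex": 1}]
all_parses = correct + [{"lex:texas|-S:tex": 1}]
print(sgd_step(theta, correct, all_parses))
```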

67 Several approaches have been designed for the variable-free logical representations shown in examples throughout this paper. [sent-256, score-0.442]

68 For example, Kate & Mooney (2006) present a method (KRISP) that extends an existing SVM learning algorithm to recover logical representations. [sent-257, score-0.33]

69 Definitions: The function NEW-LEX(y) takes a parse y and returns a set of new lexical items found by splitting and merging categories in y, as described in Section 5. [sent-269, score-0.56]

70 For i = 1 . . . n: Step 1: (Update Lexicon) • Let y∗ = arg max_y p(y | xi, zi; θ, Λ). • Set Λ = Λ ∪ NEW-LEX(y∗) and expand the parameter vector θ to contain entries for the new lexical items, as described in Section 7. [sent-284, score-0.373]
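
Putting the pieces together, a highly simplified skeleton of the online loop these algorithm lines describe; best_correct_parse, new_lex, and update_parameters are assumed to be supplied by the caller (e.g. along the lines of the earlier sketches), and real bookkeeping such as initializing weights for new entries is omitted.

```python
def train(training_data, lexicon, theta, epochs,
          best_correct_parse, new_lex, update_parameters):
    """Skeleton of the online algorithm: expand the lexicon from the best
    correct parse (Step 1), then update the parameters (Step 2)."""
    for _ in range(epochs):
        for x, z in training_data:
            # Step 1: y* = argmax_y p(y | x, z; theta, lexicon)
            y_star = best_correct_parse(x, z, lexicon, theta)
            lexicon |= new_lex(y_star)            # Lambda = Lambda U NEW-LEX(y*)
            # Step 2: stochastic gradient update on this example
            theta = update_parameters(theta, x, z, lexicon)
    return lexicon, theta
```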

71 Zettlemoyer & Collins (2005, 2007) developed CCG grammar induction techniques where lexical items are proposed according to a set of hand-engineered lexical templates. [sent-294, score-0.471]

72 Another line of work has focused on recovering meaning representations that are not based on logic. [sent-296, score-0.227]

73 (2004) consider the challenging problem of constructing broad-coverage semantic representations with CCG, but do not learn the lexicon. [sent-303, score-0.183]

74 First, we include a set of lexical features: For each lexical item L ∈ Λ, we include a feature φL that fires when L is used. [sent-305, score-0.279]

75 Second, we include semantic features that are functions of the output logical expression z. [sent-306, score-0.458]

76 The initial weight for each φL is set to ten times the average score over the (word, constant) pairs in L, except for the weights of seed lexical entries in ΛNP which are set to 10 (equivalent to the highest possible co-occurrence score). [sent-311, score-0.284]

77 Data and Evaluation We evaluate our system on the GeoQuery datasets, which contain natural-language queries of a geographical database paired with logical representations of each query’s meaning. [sent-315, score-0.501]

78 We report results for both representations, using the standard measures of Recall (percentage of test sentences assigned correct logical forms), Precision (percentage of logical forms returned that are correct) and F1 (the harmonic mean of Precision and Recall). [sent-322, score-0.66]
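
For completeness, the three reported measures spelled out, assuming exact-match comparison of logical forms and None for sentences the parser fails to analyze.

```python
def evaluate(predicted, gold):
    """predicted: list of logical forms or None (no parse), aligned with gold."""
    returned = [p for p in predicted if p is not None]
    correct = sum(1 for p, g in zip(predicted, gold) if p is not None and p == g)
    precision = correct / len(returned) if returned else 0.0
    recall = correct / len(gold)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

print(evaluate(["next_to(ny, vt)", None], ["next_to(ny, vt)", "state(tex)"]))
# -> (1.0, 0.5, 0.666...)
```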

79 In aggregate, they demonstrate that our algorithm, UBL, learns accurate models across languages and for both meaning representations. [sent-327, score-0.191]

80 These approaches used language-specific templates to propose new lexical items and also required as input a set of hand-engineered lexical entries to model phenomena such as quantification and determiners. [sent-335, score-0.487]

81 However, the use of higher-order unification allows UBL to achieve comparable performance while automatically inducing these types of entries. [sent-336, score-0.331]

82 For a more qualitative evaluation, Table 4 shows a selection of lexical items learned with high weights for the lambda-calculus meaning representations. [sent-337, score-0.424]

83 Language-specific word order is also encoded, using the slash directions of the CCG meaning representations. [sent-340, score-0.189]

84 There is less variation and complexity in the learned lexical items for the variable-free representation. [sent-345, score-0.309]

85 For example, recall that the sentence “what states border texas” would be paired with the meaning answer(state(borders(tex))). [sent-347, score-0.22]

86 For this representation, lexical items such as: what ‘ S/NP : λx. [sent-348, score-0.277]

87 borders(x) texas ‘ NP : tex can be used to construct the desired output. [sent-351, score-0.338]

88 For example, the average length of a learned lexical item for the (lambda-calculus, variable-free) meaning representations is: (1. [sent-358, score-0.442]

89 For both meaning representations the model learns significantly more multiword lexical items for the somewhat analytic Japanese than the agglutinative Turkish. [sent-367, score-0.549]

90 There are also variations in the average number of learned lexical items in the best parses during the final pass of training: 192 for Japanese, 206 for Spanish, 188 for English and 295 for Turkish. [sent-368, score-0.377]

91 Similarly, the approach presented here does not model morphology, and must repeatedly learn the correct categories for the Turkish words “nehri,” “nehir,” “nehirler,” and “nehirlerin”, all of which correspond to the logical form λx. [sent-375, score-0.42]

92 9 Conclusions and Future Work This paper has presented a method for inducing probabilistic CCGs from sentences paired with logical forms. [sent-377, score-0.424]

93 The approach uses higher-order unification to define the space of possible grammars in a language- and representation-independent manner, paired with an algorithm that learns a probabilistic parsing model. [sent-378, score-0.499]

94 We evaluated the approach on four languages with two meaning representations each, achieving high accuracy across all scenarios. [sent-379, score-0.258]

95 One potential limitation is in the constraints we introduced to ensure the tractability of the higher-order unification procedure. [sent-381, score-0.296]

96 These restrictions will not allow the approach to induce lexical items that would be used with, among other things, many of the type-raised combinators commonly employed in CCG grammars. [sent-382, score-0.329]

97 Such an approach would complement ideas for using higher-order unification to model a wider range of language phenomena, such as VP ellipsis (Dalrymple et al. [sent-399, score-0.331]

98 Learning synchronous grammars for semantic parsing with lambda calculus. [sent-536, score-0.256]

99 Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. [sent-549, score-0.399]

100 Online learning of relaxed CCG grammars for parsing to logical form. [sent-555, score-0.429]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('logical', 0.33), ('ccg', 0.315), ('unification', 0.296), ('tex', 0.276), ('np', 0.208), ('items', 0.181), ('zi', 0.163), ('zettlemoyer', 0.16), ('mooney', 0.156), ('splitting', 0.156), ('ubl', 0.138), ('borders', 0.133), ('vermont', 0.121), ('lambda', 0.118), ('meaning', 0.115), ('entries', 0.114), ('representations', 0.112), ('lexicon', 0.111), ('kate', 0.107), ('wong', 0.098), ('xi', 0.097), ('lexical', 0.096), ('combinatory', 0.093), ('expression', 0.089), ('item', 0.087), ('huet', 0.086), ('turkish', 0.08), ('vt', 0.076), ('steedman', 0.076), ('category', 0.076), ('slash', 0.074), ('categorial', 0.069), ('parse', 0.069), ('parses', 0.068), ('expressions', 0.065), ('splits', 0.065), ('texas', 0.062), ('calculus', 0.059), ('paired', 0.059), ('categories', 0.058), ('lu', 0.056), ('grammars', 0.055), ('gradient', 0.054), ('grammar', 0.053), ('combinators', 0.052), ('lambdacalculus', 0.052), ('pccg', 0.052), ('vardir', 0.052), ('collins', 0.05), ('curran', 0.047), ('japanese', 0.047), ('border', 0.046), ('defines', 0.046), ('induction', 0.045), ('learns', 0.045), ('ny', 0.045), ('recombined', 0.044), ('higherorder', 0.044), ('parsing', 0.044), ('generalize', 0.042), ('fc', 0.041), ('wi', 0.04), ('backward', 0.039), ('ep', 0.039), ('update', 0.039), ('meanings', 0.039), ('pairs', 0.039), ('semantic', 0.039), ('composition', 0.038), ('clark', 0.036), ('inducing', 0.035), ('buszkowski', 0.035), ('ccgs', 0.035), ('combinator', 0.035), ('coocurrence', 0.035), ('cooling', 0.035), ('ellipsis', 0.035), ('eyaletin', 0.035), ('geoquery', 0.035), ('hangi', 0.035), ('lecun', 0.035), ('ntexp', 0.035), ('quantifiers', 0.035), ('siniri', 0.035), ('slashes', 0.035), ('wasp', 0.035), ('zelle', 0.035), ('forward', 0.033), ('updates', 0.032), ('learned', 0.032), ('learn', 0.032), ('overly', 0.031), ('bc', 0.031), ('sl', 0.031), ('languages', 0.031), ('syntactic', 0.03), ('carpenter', 0.03), ('esx', 0.03), ('kwiatkowski', 0.03), ('villavicencio', 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 65 emnlp-2010-Inducing Probabilistic CCG Grammars from Logical Form with Higher-Order Unification

Author: Tom Kwiatkowski ; Luke Zettlemoyer ; Sharon Goldwater ; Mark Steedman

Abstract: This paper addresses the problem of learning to map sentences to logical form, given training data consisting of natural language sentences paired with logical representations of their meaning. Previous approaches have been designed for particular natural languages or specific meaning representations; here we present a more general method. The approach induces a probabilistic CCG grammar that represents the meaning of individual words and defines how these meanings can be combined to analyze complete sentences. We use higher-order unification to define a hypothesis space containing all grammars consistent with the training data, and develop an online learning algorithm that efficiently searches this space while simultaneously estimating the parameters of a log-linear parsing model. Experiments demonstrate high accuracy on benchmark data sets in four languages with two different meaning representations.

2 0.18668404 121 emnlp-2010-What a Parser Can Learn from a Semantic Role Labeler and Vice Versa

Author: Stephen Boxwell ; Dennis Mehay ; Chris Brew

Abstract: In many NLP systems, there is a unidirectional flow of information in which a parser supplies input to a semantic role labeler. In this paper, we build a system that allows information to flow in both directions. We make use of semantic role predictions in choosing a single-best parse. This process relies on an averaged perceptron model to distinguish likely semantic roles from erroneous ones. Our system penalizes parses that give rise to low-scoring semantic roles. To explore the consequences of this we perform two experiments. First, we use a baseline generative model to produce n-best parses, which are then re-ordered by our semantic model. Second, we use a modified version of our semantic role labeler to predict semantic roles at parse time. The performance of this modified labeler is weaker than that of our best full SRL, because it is restricted to features that can be computed directly from the parser’s packed chart. For both experiments, the resulting semantic predictions are then used to select parses. Finally, we feed the selected parses produced by each experiment to the full version of our semantic role labeler. We find that SRL performance can be improved over this baseline by selecting parses with likely semantic roles.

3 0.12423925 114 emnlp-2010-Unsupervised Parse Selection for HPSG

Author: Rebecca Dridan ; Timothy Baldwin

Abstract: Parser disambiguation with precision grammars generally takes place via statistical ranking of the parse yield of the grammar using a supervised parse selection model. In the standard process, the parse selection model is trained over a hand-disambiguated treebank, meaning that without a significant investment of effort to produce the treebank, parse selection is not possible. Furthermore, as treebanking is generally streamlined with parse selection models, creating the initial treebank without a model requires more resources than subsequent treebanks. In this work, we show that, by taking advantage of the constrained nature of these HPSG grammars, we can learn a discriminative parse selection model from raw text in a purely unsupervised fashion. This allows us to bootstrap the treebanking process and provide better parsers faster, and with less resources.

4 0.10938933 106 emnlp-2010-Top-Down Nearly-Context-Sensitive Parsing

Author: Eugene Charniak

Abstract: We present a new syntactic parser that works left-to-right and top down, thus maintaining a fully-connected parse tree for a few alternative parse hypotheses. All of the commonly used statistical parsers use context-free dynamic programming algorithms and as such work bottom up on the entire sentence. Thus they only find a complete fully connected parse at the very end. In contrast, both subjective and experimental evidence show that people understand a sentence word-to-word as they go along, or close to it. The constraint that the parser keeps one or more fully connected syntactic trees is intended to operationalize this cognitive fact. Our parser achieves a new best result for topdown parsers of 89.4%,a 20% error reduction over the previous single-parser best result for parsers of this type of 86.8% (Roark, 2001) . The improved performance is due to embracing the very large feature set available in exchange for giving up dynamic programming.

5 0.08869382 118 emnlp-2010-Utilizing Extra-Sentential Context for Parsing

Author: Jackie Chi Kit Cheung ; Gerald Penn

Abstract: Syntactic consistency is the preference to reuse a syntactic construction shortly after its appearance in a discourse. We present an analysis of the WSJ portion of the Penn Treebank, and show that syntactic consistency is pervasive across productions with various lefthand side nonterminals. Then, we implement a reranking constituent parser that makes use of extra-sentential context in its feature set. Using a linear-chain conditional random field, we improve parsing accuracy over the generative baseline parser on the Penn Treebank WSJ corpus, rivalling a similar model that does not make use of context. We show that the context-aware and the context-ignorant rerankers perform well on different subsets of the evaluation data, suggesting a combined approach would provide further improvement. We also compare parses made by models, and suggest that context can be useful for parsing by capturing structural dependencies between sentences as opposed to lexically governed dependencies.

6 0.087282322 42 emnlp-2010-Efficient Incremental Decoding for Tree-to-String Translation

7 0.080186337 98 emnlp-2010-Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions

8 0.078285426 117 emnlp-2010-Using Unknown Word Techniques to Learn Known Words

9 0.076552667 99 emnlp-2010-Statistical Machine Translation with a Factorized Grammar

10 0.072073303 17 emnlp-2010-An Efficient Algorithm for Unsupervised Word Segmentation with Branching Entropy and MDL

11 0.068903096 96 emnlp-2010-Self-Training with Products of Latent Variable Grammars

12 0.068593882 112 emnlp-2010-Unsupervised Discovery of Negative Categories in Lexicon Bootstrapping

13 0.068055831 116 emnlp-2010-Using Universal Linguistic Knowledge to Guide Grammar Induction

14 0.066078871 57 emnlp-2010-Hierarchical Phrase-Based Translation Grammars Extracted from Alignment Posterior Probabilities

15 0.058424205 77 emnlp-2010-Measuring Distributional Similarity in Context

16 0.05788409 87 emnlp-2010-Nouns are Vectors, Adjectives are Matrices: Representing Adjective-Noun Constructions in Semantic Space

17 0.056959093 40 emnlp-2010-Effects of Empty Categories on Machine Translation

18 0.055248752 113 emnlp-2010-Unsupervised Induction of Tree Substitution Grammars for Dependency Parsing

19 0.052041333 111 emnlp-2010-Two Decades of Unsupervised POS Induction: How Far Have We Come?

20 0.051045809 38 emnlp-2010-Dual Decomposition for Parsing with Non-Projective Head Automata


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.208), (1, 0.068), (2, 0.18), (3, 0.079), (4, 0.089), (5, 0.105), (6, 0.005), (7, -0.139), (8, 0.037), (9, -0.058), (10, 0.114), (11, -0.046), (12, 0.002), (13, -0.016), (14, 0.003), (15, -0.082), (16, 0.019), (17, -0.042), (18, -0.094), (19, -0.172), (20, -0.055), (21, 0.015), (22, 0.016), (23, 0.013), (24, -0.032), (25, -0.082), (26, -0.053), (27, -0.318), (28, 0.002), (29, 0.01), (30, 0.054), (31, 0.033), (32, 0.092), (33, -0.271), (34, -0.189), (35, 0.021), (36, -0.099), (37, -0.063), (38, -0.007), (39, 0.036), (40, -0.085), (41, -0.009), (42, 0.1), (43, -0.115), (44, 0.155), (45, 0.007), (46, -0.007), (47, 0.089), (48, 0.082), (49, -0.071)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96174371 65 emnlp-2010-Inducing Probabilistic CCG Grammars from Logical Form with Higher-Order Unification

Author: Tom Kwiatkowski ; Luke Zettlemoyer ; Sharon Goldwater ; Mark Steedman

Abstract: This paper addresses the problem of learning to map sentences to logical form, given training data consisting of natural language sentences paired with logical representations of their meaning. Previous approaches have been designed for particular natural languages or specific meaning representations; here we present a more general method. The approach induces a probabilistic CCG grammar that represents the meaning of individual words and defines how these meanings can be combined to analyze complete sentences. We use higher-order unification to define a hypothesis space containing all grammars consistent with the training data, and develop an online learning algorithm that efficiently searches this space while simultaneously estimating the parameters of a log-linear parsing model. Experiments demonstrate high accuracy on benchmark data sets in four languages with two different meaning representations.

2 0.63727713 121 emnlp-2010-What a Parser Can Learn from a Semantic Role Labeler and Vice Versa

Author: Stephen Boxwell ; Dennis Mehay ; Chris Brew

Abstract: In many NLP systems, there is a unidirectional flow of information in which a parser supplies input to a semantic role labeler. In this paper, we build a system that allows information to flow in both directions. We make use of semantic role predictions in choosing a single-best parse. This process relies on an averaged perceptron model to distinguish likely semantic roles from erroneous ones. Our system penalizes parses that give rise to low-scoring semantic roles. To explore the consequences of this we perform two experiments. First, we use a baseline generative model to produce n-best parses, which are then re-ordered by our semantic model. Second, we use a modified version of our semantic role labeler to predict semantic roles at parse time. The performance of this modified labeler is weaker than that of our best full SRL, because it is restricted to features that can be computed directly from the parser’s packed chart. For both experiments, the resulting semantic predictions are then used to select parses. Finally, we feed the selected parses produced by each experiment to the full version of our semantic role labeler. We find that SRL performance can be improved over this baseline by selecting parses with likely semantic roles.

3 0.56567705 114 emnlp-2010-Unsupervised Parse Selection for HPSG

Author: Rebecca Dridan ; Timothy Baldwin

Abstract: Parser disambiguation with precision grammars generally takes place via statistical ranking of the parse yield of the grammar using a supervised parse selection model. In the standard process, the parse selection model is trained over a hand-disambiguated treebank, meaning that without a significant investment of effort to produce the treebank, parse selection is not possible. Furthermore, as treebanking is generally streamlined with parse selection models, creating the initial treebank without a model requires more resources than subsequent treebanks. In this work, we show that, by taking advantage of the constrained nature of these HPSG grammars, we can learn a discriminative parse selection model from raw text in a purely unsupervised fashion. This allows us to bootstrap the treebanking process and provide better parsers faster, and with less resources.

4 0.49914107 117 emnlp-2010-Using Unknown Word Techniques to Learn Known Words

Author: Kostadin Cholakov ; Gertjan van Noord

Abstract: Unknown words are a hindrance to the performance of hand-crafted computational grammars of natural language. However, words with incomplete and incorrect lexical entries pose an even bigger problem because they can be the cause of a parsing failure despite being listed in the lexicon of the grammar. Such lexical entries are hard to detect and even harder to correct. We employ an error miner to pinpoint words with problematic lexical entries. An automated lexical acquisition technique is then used to learn new entries for those words which allows the grammar to parse previously uncovered sentences successfully. We test our method on a large-scale grammar of Dutch and a set of sentences for which this grammar fails to produce a parse. The application of the method enables the grammar to cover 83.76% of those sentences with an accuracy of 86.15%.

5 0.4458659 118 emnlp-2010-Utilizing Extra-Sentential Context for Parsing

Author: Jackie Chi Kit Cheung ; Gerald Penn

Abstract: Syntactic consistency is the preference to reuse a syntactic construction shortly after its appearance in a discourse. We present an analysis of the WSJ portion of the Penn Treebank, and show that syntactic consistency is pervasive across productions with various lefthand side nonterminals. Then, we implement a reranking constituent parser that makes use of extra-sentential context in its feature set. Using a linear-chain conditional random field, we improve parsing accuracy over the generative baseline parser on the Penn Treebank WSJ corpus, rivalling a similar model that does not make use of context. We show that the context-aware and the context-ignorant rerankers perform well on different subsets of the evaluation data, suggesting a combined approach would provide further improvement. We also compare parses made by models, and suggest that context can be useful for parsing by capturing structural dependencies between sentences as opposed to lexically governed dependencies.

6 0.38241455 106 emnlp-2010-Top-Down Nearly-Context-Sensitive Parsing

7 0.32445657 17 emnlp-2010-An Efficient Algorithm for Unsupervised Word Segmentation with Branching Entropy and MDL

8 0.30823895 112 emnlp-2010-Unsupervised Discovery of Negative Categories in Lexicon Bootstrapping

9 0.28402817 40 emnlp-2010-Effects of Empty Categories on Machine Translation

10 0.28057951 77 emnlp-2010-Measuring Distributional Similarity in Context

11 0.27727464 122 emnlp-2010-WikiWars: A New Corpus for Research on Temporal Expressions

12 0.26610219 42 emnlp-2010-Efficient Incremental Decoding for Tree-to-String Translation

13 0.26051271 4 emnlp-2010-A Game-Theoretic Approach to Generating Spatial Descriptions

14 0.24299212 99 emnlp-2010-Statistical Machine Translation with a Factorized Grammar

15 0.23041819 98 emnlp-2010-Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions

16 0.22423273 13 emnlp-2010-A Simple Domain-Independent Probabilistic Approach to Generation

17 0.22060393 87 emnlp-2010-Nouns are Vectors, Adjectives are Matrices: Representing Adjective-Noun Constructions in Semantic Space

18 0.21225031 120 emnlp-2010-What's with the Attitude? Identifying Sentences with Attitude in Online Discussions

19 0.21015051 7 emnlp-2010-A Mixture Model with Sharing for Lexical Semantics

20 0.20896363 116 emnlp-2010-Using Universal Linguistic Knowledge to Guide Grammar Induction


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.026), (12, 0.036), (29, 0.118), (30, 0.032), (32, 0.011), (52, 0.035), (56, 0.075), (62, 0.022), (64, 0.308), (66, 0.095), (72, 0.074), (76, 0.062), (79, 0.011), (87, 0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.7370609 65 emnlp-2010-Inducing Probabilistic CCG Grammars from Logical Form with Higher-Order Unification

Author: Tom Kwiatkowski ; Luke Zettlemoyer ; Sharon Goldwater ; Mark Steedman

Abstract: This paper addresses the problem of learning to map sentences to logical form, given training data consisting of natural language sentences paired with logical representations of their meaning. Previous approaches have been designed for particular natural languages or specific meaning representations; here we present a more general method. The approach induces a probabilistic CCG grammar that represents the meaning of individual words and defines how these meanings can be combined to analyze complete sentences. We use higher-order unification to define a hypothesis space containing all grammars consistent with the training data, and develop an online learning algorithm that efficiently searches this space while simultaneously estimating the parameters of a log-linear parsing model. Experiments demonstrate high accuracy on benchmark data sets in four languages with two different meaning representations.

2 0.62022334 69 emnlp-2010-Joint Training and Decoding Using Virtual Nodes for Cascaded Segmentation and Tagging Tasks

Author: Xian Qian ; Qi Zhang ; Yaqian Zhou ; Xuanjing Huang ; Lide Wu

Abstract: Many sequence labeling tasks in NLP require solving a cascade of segmentation and tagging subtasks, such as Chinese POS tagging, named entity recognition, and so on. Traditional pipeline approaches usually suffer from error propagation. Joint training/decoding in the cross-product state space could cause too many parameters and high inference complexity. In this paper, we present a novel method which integrates graph structures of two subtasks into one using virtual nodes, and performs joint training and decoding in the factorized state space. Experimental evaluations on CoNLL 2000 shallow parsing data set and Fourth SIGHAN Bakeoff CTB POS tagging data set demonstrate the superiority of our method over cross-product, pipeline and candidate reranking approaches.

3 0.50824243 78 emnlp-2010-Minimum Error Rate Training by Sampling the Translation Lattice

Author: Samidh Chatterjee ; Nicola Cancedda

Abstract: Minimum Error Rate Training is the algorithm for log-linear model parameter training most used in state-of-the-art Statistical Machine Translation systems. In its original formulation, the algorithm uses N-best lists output by the decoder to grow the Translation Pool that shapes the surface on which the actual optimization is performed. Recent work has been done to extend the algorithm to use the entire translation lattice built by the decoder, instead of N-best lists. We propose here a third, intermediate way, consisting in growing the translation pool using samples randomly drawn from the translation lattice. We empirically measure a systematic im- provement in the BLEU scores compared to training using N-best lists, without suffering the increase in computational complexity associated with operating with the whole lattice.

4 0.50567943 32 emnlp-2010-Context Comparison of Bursty Events in Web Search and Online Media

Author: Yunliang Jiang ; Cindy Xide Lin ; Qiaozhu Mei

Abstract: In this paper, we conducted a systematic comparative analysis of language in different contexts of bursty topics, including web search, news media, blogging, and social bookmarking. We analyze (1) the content similarity and predictability between contexts, (2) the coverage of search content by each context, and (3) the intrinsic coherence of information in each context. Our experiments show that social bookmarking is a better predictor to the bursty search queries, but news media and social blogging media have a much more compelling coverage. This comparison provides insights on how the search behaviors and social information sharing behaviors of users are correlated to the professional news media in the context of bursty events.

5 0.50376362 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar

Author: Kristian Woodsend ; Yansong Feng ; Mirella Lapata

Abstract: The task of selecting information and rendering it appropriately appears in multiple contexts in summarization. In this paper we present a model that simultaneously optimizes selection and rendering preferences. The model operates over a phrase-based representation of the source document which we obtain by merging PCFG parse trees and dependency graphs. Selection preferences for individual phrases are learned discriminatively, while a quasi-synchronous grammar (Smith and Eisner, 2006) captures rendering preferences such as paraphrases and compressions. Based on an integer linear programming formulation, the model learns to generate summaries that satisfy both types of preferences, while ensuring that length, topic coverage and grammar constraints are met. Experiments on headline and image caption generation show that our method obtains state-of-the-art performance using essentially the same model for both tasks without any major modifications.

6 0.50266594 18 emnlp-2010-Assessing Phrase-Based Translation Models with Oracle Decoding

7 0.50255358 98 emnlp-2010-Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions

8 0.49782959 87 emnlp-2010-Nouns are Vectors, Adjectives are Matrices: Representing Adjective-Noun Constructions in Semantic Space

9 0.49729633 57 emnlp-2010-Hierarchical Phrase-Based Translation Grammars Extracted from Alignment Posterior Probabilities

10 0.49647737 89 emnlp-2010-PEM: A Paraphrase Evaluation Metric Exploiting Parallel Texts

11 0.49505112 7 emnlp-2010-A Mixture Model with Sharing for Lexical Semantics

12 0.49436533 82 emnlp-2010-Multi-Document Summarization Using A* Search and Discriminative Learning

13 0.49313825 40 emnlp-2010-Effects of Empty Categories on Machine Translation

14 0.49220613 86 emnlp-2010-Non-Isomorphic Forest Pair Translation

15 0.49208745 103 emnlp-2010-Tense Sense Disambiguation: A New Syntactic Polysemy Task

16 0.49164256 60 emnlp-2010-Improved Fully Unsupervised Parsing with Zoomed Learning

17 0.49034986 35 emnlp-2010-Discriminative Sample Selection for Statistical Machine Translation

18 0.48845121 26 emnlp-2010-Classifying Dialogue Acts in One-on-One Live Chats

19 0.48821065 67 emnlp-2010-It Depends on the Translation: Unsupervised Dependency Parsing via Word Alignment

20 0.48812878 63 emnlp-2010-Improving Translation via Targeted Paraphrasing