acl acl2011 acl2011-282 knowledge-graph by maker-knowledge-mining

282 acl-2011-Shift-Reduce CCG Parsing


Source: pdf

Author: Yue Zhang ; Stephen Clark

Abstract: CCGs are directly compatible with binarybranching bottom-up parsing algorithms, in particular CKY and shift-reduce algorithms. While the chart-based approach has been the dominant approach for CCG, the shift-reduce method has been little explored. In this paper, we develop a shift-reduce CCG parser using a discriminative model and beam search, and compare its strengths and weaknesses with the chart-based C&C; parser. We study different errors made by the two parsers, and show that the shift-reduce parser gives competitive accuracies compared to C&C.; Considering our use of a small beam, and given the high ambiguity levels in an automatically-extracted grammar and the amount of information in the CCG lexical categories which form the shift actions, this is a surprising result.

Reference: text


Summary: the most important sentenses genereted by tfidf model

sentIndex sentText sentNum sentScore

1 In this paper, we develop a shift-reduce CCG parser using a discriminative model and beam search, and compare its strengths and weaknesses with the chart-based C&C; parser. [sent-7, score-0.389]

2 We study different errors made by the two parsers, and show that the shift-reduce parser gives competitive accuracies compared to C&C. [sent-8, score-0.352]

3 ; Considering our use of a small beam, and given the high ambiguity levels in an automatically-extracted grammar and the amount of information in the CCG lexical categories which form the shift actions, this is a surprising result. [sent-9, score-0.284]

4 In this paper we fill a gap in the CCG literature by developing a shiftreduce parser for CCG. [sent-20, score-0.316]

5 Shift-reduce parsers have become popular for dependency parsing, building on the initial work of Yamada and Matsumoto (2003) and Nivre and Scholz (2004). [sent-21, score-0.192]

6 One advantage of shift-reduce parsers is that the scoring model can be defined over actions, allowing highly efficient parsing by using a greedy algorithm in which the highest scoring action (or a small number of possible actions) is taken at each step. [sent-22, score-0.407]

7 , 2008), we build our shift-reduce parser using a global linear model, and compare it with the chartbased C&C; parser. [sent-26, score-0.341]

8 Using standard development and test sets from CCGbank, our shift-reduce parser gives a labeled F-measure of 85. [sent-27, score-0.262]

9 45% F-measure of the C&C; parser on recovery of predicate-argument dependencies from CCGbank. [sent-29, score-0.361]

10 Detailed analysis shows that our shift-reduce parser yields a higher precision, lower recall and higher F-score on most of the common CCG dependency types compared to C&C. [sent-34, score-0.465]

11 Unlike the C&C; parser, the shift-reduce parser naturally produces fragmentary analyses when appropriate (Nivre et al. [sent-36, score-0.358]

12 1 Finally, considering this work in the wider parsing context, it provides an interesting comparison between heuristic beam search using a rich set of features, and optimal dynamic programming search where the feature range is restricted. [sent-38, score-0.37]

13 We are able to perform this comparison because the use of the CCG supertagger means that the C&C; parser is able to build the complete chart, from which it can find the optimal derivation, with no pruning whatsoever at the parsing stage. [sent-39, score-0.528]

14 In contrast, the shift-reduce parser uses a simple beam search with a relatively small beam. [sent-40, score-0.427]

15 Perhaps surprisingly, given the ambiguity levels in an automatically-extracted grammar, and the amount of information in the CCG lexical categories which form the shift actions, the shift-reduce parser using heuristic beam search is able to outperform the chart-based parser. [sent-41, score-0.653]

16 During CCG parsing, adjacent categories are combined using CCG’s combinatory rules. [sent-44, score-0.207]

17 (2007) for chartbased parsers which can produce fragmentary analyses. [sent-49, score-0.256]

18 For example, forward type-raising can change a subject NP into a complex category looking to the right for a verb phrase: NP ⇒ S/(S\NP) An example CCG derivation is given in Section 3. [sent-51, score-0.203]

19 a mapping from words to sets of lexical categories, and then manually define the combinatory rule schemas, such as functional application and composition, which combine the categories together. [sent-56, score-0.3]

20 2 The second approach is to read the complete grammar from the derivations, by extracting combinatory rule instances from the local trees consisting of a parent category and one or two child categories, and applying only those instances during parsing. [sent-59, score-0.449]

21 (These rule instances also include rules to deal with punctuation and unary type-changing rules, in addition to instances of the combinatory rule schemas. [sent-60, score-0.347]

22 method, which has the potential to produce a mildlycontext sensitive grammar (given the existence of certain combinatory rules) (Weir, 1988). [sent-63, score-0.215]

23 3 The Shift-reduce CCG Parser Given an input sentence, our parser uses a stack of partial derivations, a queue of incoming words, and a series of actions—derived from the rule instances in CCGbank—to build a derivation tree. [sent-65, score-0.69]

24 The derivation tree can be transformed into CCG dependencies or grammatical relations by a post-processing step, which essentially runs the C&C; parser deterministically over the derivation, interpreting the derivation and generating the required output. [sent-68, score-0.476]

25 The configuration of the parser, at each step of the parsing process, is shown in part (a) of Figure 1, where the stack holds the partial derivation trees that have been built, and the queue contains the incoming words that have not been processed. [sent-69, score-0.49]

26 In the figure, S ( H ) represents a category S on the stack with head word H, while Qi represents a word in the incoming queue. [sent-70, score-0.31]

27 The set of action types used by the parser is as follows: {SHIFT, COMBINE, UNARY, F INI SH}. [sent-71, score-0.414]

28 available to the parser at each step in the process. [sent-73, score-0.262]

29 The SHIFT-X action pushes the next incoming word onto the stack, and assigns the lexical category X to the word (Figure 1(b)). [sent-74, score-0.488]

30 The label X can be any lexical category from the set assigned to the word being shifted by the supertagger. [sent-75, score-0.255]

31 This is in contrast to a shift-reduce dependency parser in which a shift action typically just pushes a word onto the stack. [sent-77, score-0.68]

32 The category of 685 Figure 1: The parser configuration and set of actions. [sent-79, score-0.39]

33 A COMB INE action corresponds to a combinatory rule in the CCG grammar (or one of the additional punctuation or type-changing rules), which is applied to the categories of the top two nodes on the stack. [sent-81, score-0.492]

34 The UNARY-X action pops the top of the stack, transforms it into a new node with category X, and pushes the new node onto the stack. [sent-82, score-0.46]

35 A UNARY action corresponds to a unary type-changing or typeraising rule in the CCG grammar, which is applied to the category on top of the stack. [sent-83, score-0.393]

36 The F INI SH action terminates the parsing process; it can be applied when all input words have been shifted onto the stack. [sent-84, score-0.413]

37 Note that the F INI SH action can be applied when the stack contains more than one node, in which case the parser produces a set of partial derivation trees, each corresponding to a node on the stack. [sent-85, score-0.67]

38 This sometimes happens when a full derivation tree cannot be built due to supertagging errors, and provides a graceful solution to the problem of producing high-quality fragmentary parses when necessary. [sent-86, score-0.216]

39 Figure 2 shows the shift-reduce parsing process for the example sentence “IBM bought Lotus”. [sent-88, score-0.19]

40 First the word “IBM” is shifted onto the stack as an NP; then “bought” is shifted as a transitive verb looking for its object NP on the right and subject NP on the left ((S[dcl]\NP)/NP); and then “Lotus” is shifted as an NP. [sent-89, score-0.388]

41 , 2006) and CFG 686 (Sagae and Lavie, 2006b) parsing is that, for CCG, there are many more shift actions a shift action for each word-lexical category pair. [sent-93, score-0.632]

42 Given the amount of syntactic information in the lexical categories, the choice of correct category, from those supplied by the supertagger, is often a difficult one, and often a choice best left to the parsing model. [sent-94, score-0.185]

43 The C&C; parser solves this problem by building the complete packed chart consistent with the lexical categories supplied by the supertagger, leaving the selection of the lexical categories to the Viterbi algorithm. [sent-95, score-0.514]

44 For the shift-reduce parser the choice is also left to the parsing model, but in contrast to C&C; the correct lexical category could be lost at any point in the heuristic search process. [sent-96, score-0.642]

45 A candidate item is finished if and only if the F INI SH action has been applied to it, and no more actions can be applied to a candidate item after it reaches the finished status. [sent-104, score-0.93]

46 Given an input sentence, we define the start item as the unfinished item with an empty stack and the whole input sentence as the incoming words. [sent-105, score-0.567]

47 A derivation is built from the start item by repeated applications of actions until the item is finished. [sent-106, score-0.473]

48 To apply beam-search, an agenda is used to hold the N-best partial (unfinished) candidate items at each parsing step. [sent-107, score-0.488]

49 A separate candidate output is function DECODE(input, agenda, list, N, grammar, candidate output): agenda. [sent-108, score-0.277]

50 best(N)) Figure 3: The decoding algorithm; N is the agenda size used to record the current best finished item that has been found, since candidate items can be finished at different steps. [sent-121, score-0.64]

51 Initially the agenda contains only the start item, and the candidate output is set to none. [sent-122, score-0.306]

52 At each step during parsing, each candidate item from the agenda is extended in all possible ways by applying one action according to the grammar, and a number of new candidate items are generated. [sent-123, score-0.742]

53 If a newly generated candidate is finished, it is compared with the current candidate output. [sent-124, score-0.291]

54 If the candidate output is none or the score of the newly generated candidate is higher than the score of the candidate output, the candidate output is replaced with the newly generated item; otherwise the newly generated item is discarded. [sent-125, score-0.895]

55 If the newly generated candidate is unfinished, it is appended to a list of newly generated partial candidates. [sent-126, score-0.263]

56 After all candidate items from the agenda have been processed, the agenda is cleared and the N-best items from the list are put on the agenda. [sent-127, score-0.491]

57 Then the list is cleared and the parser moves on to the next step. [sent-128, score-0.262]

58 This process repeats until the agenda is empty (which means that no new items have been generated in the previous step), and the candidate output is the final derivation. [sent-129, score-0.341]

59 Features for a (finished or partial) candidate are extracted from each action that have been applied to build the candidate. [sent-132, score-0.273]

60 The symbols S0, S1, S2 and S3 in the table represent the top four nodes on the stack (if existent), and Q0, Q1, Q2 and Q3 represent the front four words in the incoming queue (if existent). [sent-135, score-0.233]

61 Extracted from the training data, the CCG grammar used by our parser consists of 3070 binary rule instances and 191 unary rule instances. [sent-144, score-0.538]

62 We compute F-scores over labeled CCG dependencies and also lexical category accuracy. [sent-145, score-0.239]

63 For example, the first NP in a transitive verb category is a CCG dependency relation, corresponding to the subject of the verb. [sent-147, score-0.239]

64 There is a mismatch between the grammar that generate uses, which is the same grammar as the C&C; parser, and the grammar we extract from CCGbank, which contains more rule instances. [sent-150, score-0.307]

65 Hence generate is unable to produce dependencies for some of the derivations our shift-reduce parser produces. [sent-151, score-0.399]

66 For each word, the supertagger assigns all lexical categories whose forward-backward probability is above β · max, where max is the highest lbeilxiitcyal i category probability feorre t mhea word, haend h β hise a threshold parameter. [sent-162, score-0.382]

67 To give the parser a reasonable freedom in lexical category disambiguation, we used a small β value of 0. [sent-163, score-0.437]

68 For training, but not testing, we also added the correct lexical category to the list of lexical categories for a word in cases when it was not provided by the supertagger. [sent-166, score-0.301]

69 Increasing the size of the beam in the parser beam search leads to higher accuracies but slower running time. [sent-167, score-0.645]

70 In our development experiments, the accuracy improvement became small when the beam size reached 16, and so we set the size of the beam to 16 for the remainder of the experiments. [sent-168, score-0.254]

71 1 Development test accuracies Table 2 shows the labeled precision (lp), recall (lr), F-score (lf), sentence-level accuracy (lsent) and lexical category accuracy (cats) of our parser and the C&C; parser on the development data. [sent-170, score-0.821]

72 We ran the C&C; parser using the normal-form model (we reproduced the numbers reported in Clark and Curran (2007)), and copied the results of the hybrid model from Clark and Curran (2007), since the hybrid model is not part of the public release. [sent-171, score-0.364]

73 The accuracy of our parser is much better when evaluated on all sentences, partly because C&C; failed on 0. [sent-172, score-0.262]

74 Our shift-reduce parser does not suffer from this problem because it produces fragmentary analyses for those cases. [sent-174, score-0.358]

75 When evaluated on only those sentences that C&C; could analyze, our parser gave 0. [sent-175, score-0.319]

76 Our shift-reduce parser also gave higher accuracies on lexical category assignment. [sent-177, score-0.585]

77 The sentence accuracy of our shift-reduce parser is also higher than C&C;, which confirms that our shift-reduce parser produces reasonable sentence-level analyses, despite the pos- sibility for fragmentary analysis. [sent-178, score-0.651]

78 rneics%po 6 978 5 0 50 0thispaC&peCr51015202530; r%clae 95 86 87 05 0 50 thispaCp&eCr510; 520 530 Precision comparison by dependency length dependency length (bins of 5) Recall comparison by dependency length dependency length (bins of 5) Figure 4: P & R scores relative to dependency length. [sent-204, score-0.555]

79 2 Error comparison with C&C; parser Our shift-reduce parser and the chart-based C&C; parser offer two different solutions to the CCG parsing problem. [sent-206, score-0.924]

80 We follow McDonald and Nivre (2007) and characterize the errors of the two parsers by sentence and dependency length and dependency type. [sent-210, score-0.303]

81 Our shift-reduce parser performed consistently better than C&C; on all sentence lengths, and there was no significant difference in the rate of performance degradation between the parsers as the sentence length increased. [sent-213, score-0.343]

82 The number of dependencies drops when the dependency length increases; there are 141, 180 and 124 dependencies from the gold-standard, C&C; output and our shift-reduce parser output, respectively, when the dependency length is between 21 and 25, inclusive. [sent-220, score-0.647]

83 The recall of our parser drops more quickly as the dependency length grows beyond 15. [sent-222, score-0.403]

84 In contrast, the precision did not drop more quickly than C&C;, and in fact is consistently higher than C&C; across all dependency lengths, which reflects the fact that the long range dependencies our parser managed to recover are comparatively reliable. [sent-224, score-0.5]

85 While our shift-reduce parser gave higher precision for almost all categories, it gave higher recall on only half of them, but higher F-scores for all but one dependency type. [sent-227, score-0.642]

86 Evaluated on all sentences, the accuracies of our parser are much higher than the C&C; parser, since the C&C; parser failed to produce any output for 10 sentences. [sent-231, score-0.65]

87 parsers on the sentences for which C&C; produces an analysis, our parser still gave the highest accuracies. [sent-267, score-0.4]

88 The shift-reduce parser gave higher precision, and lower recall, than C&C; it also gave higher sentencelevel and lexical category accuracy. [sent-268, score-0.613]

89 The last two rows in the table show the accuracies of Fowler and Penn (2010) (F&P;), who applied the CFG parser of Petrov and Klein (2007) to CCG, and the corresponding accuracies for the C&C; parser on the same test sentences. [sent-269, score-0.644]

90 The shiftreduce parser is linear-time (in both sentence length and beam size), and can analyse over 10 sentences per second on a 2GHz CPU, with a beam of 16, which compares very well with other constituency parsers. [sent-272, score-0.57]

91 7 Related Work Sagae and Lavie (2006a) describes a shift-reduce parser for the Penn Treebank parsing task which uses best-first search to allow some ambiguity into the parsing process. [sent-274, score-0.576]

92 (2010)) describe a greedy shift-reduce parser for HPSG, in which a single action is chosen at each parsing step, allowing the possibility of highly efficient parsing. [sent-283, score-0.588]

93 Our approach to this problem was to allow the parser to return a fragmentary analysis; Ninomiya et al. [sent-285, score-0.358]

94 C&C; can perform exhaustive search because the supertagger has already reduced the search space. [sent-291, score-0.204]

95 We also found that approximate heuristic search for shift-reduce parsing, utilising a rich feature space, can match the performance of the optimal chart-based parser, as well as similar error profiles for the two CCG parsers compared to the two dependency parsers. [sent-292, score-0.294]

96 Considered in terms of the wider parsing problem, we have shown that state-of-the-art parsing results can be obtained using a global discriminative model, one of the few papers to do so without using a generative baseline as a feature. [sent-294, score-0.276]

97 The comparison with C&C; also allowed us to compare a shift-reduce parser based on heuristic beam search utilising a rich feature set with an optimal chart-based parser whose features are restricted by dynamic programming, with favourable results for the shift-reduce parser. [sent-295, score-0.753]

98 The complementary errors made by the chartbased and shift-reduce parsers opens the possibility of effective parser combination, following similar work for dependency parsing. [sent-296, score-0.533]

99 The parser code can be downloaded at http : / /www . [sent-297, score-0.262]

100 A tale of two parsers: investigating and combining graph-based and transition-based dependency parsing using beamsearch. [sent-452, score-0.249]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('ccg', 0.568), ('parser', 0.262), ('clark', 0.167), ('item', 0.163), ('curran', 0.154), ('action', 0.152), ('agenda', 0.15), ('parsing', 0.138), ('combinatory', 0.128), ('category', 0.128), ('supertagger', 0.128), ('beam', 0.127), ('candidate', 0.121), ('ccgbank', 0.112), ('dependency', 0.111), ('hockenmaier', 0.109), ('np', 0.108), ('stack', 0.105), ('steedman', 0.101), ('fragmentary', 0.096), ('grammar', 0.087), ('nivre', 0.084), ('parsers', 0.081), ('fowler', 0.08), ('shifted', 0.08), ('ninomiya', 0.08), ('sagae', 0.079), ('categories', 0.079), ('chartbased', 0.079), ('incoming', 0.077), ('derivation', 0.075), ('derivations', 0.073), ('actions', 0.072), ('shift', 0.071), ('finished', 0.069), ('matsuzaki', 0.068), ('unary', 0.067), ('dependencies', 0.064), ('hpsg', 0.063), ('accuracies', 0.06), ('lotus', 0.059), ('subnode', 0.059), ('unfinished', 0.059), ('penn', 0.058), ('ini', 0.057), ('gave', 0.057), ('shiftreduce', 0.054), ('yue', 0.053), ('bought', 0.052), ('dcl', 0.052), ('stephen', 0.052), ('queue', 0.051), ('newly', 0.049), ('lexical', 0.047), ('perceptron', 0.047), ('rule', 0.046), ('supertagging', 0.045), ('partial', 0.044), ('onto', 0.043), ('treebank', 0.041), ('pushes', 0.041), ('zhang', 0.04), ('miyao', 0.039), ('takuya', 0.038), ('search', 0.038), ('lavie', 0.037), ('mstparser', 0.037), ('greedy', 0.036), ('hybrid', 0.036), ('sh', 0.035), ('items', 0.035), ('output', 0.035), ('recovery', 0.035), ('ades', 0.035), ('utilising', 0.035), ('nobuyuki', 0.035), ('shimizu', 0.035), ('rimell', 0.035), ('comb', 0.035), ('existent', 0.035), ('spanning', 0.034), ('bins', 0.034), ('decoding', 0.033), ('mcdonald', 0.033), ('categorial', 0.033), ('iwpt', 0.033), ('node', 0.032), ('pops', 0.032), ('precision', 0.032), ('higher', 0.031), ('yamada', 0.03), ('julia', 0.03), ('recall', 0.03), ('bos', 0.03), ('copied', 0.03), ('scholz', 0.03), ('cfg', 0.03), ('instances', 0.03), ('competitive', 0.03), ('heuristic', 0.029)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 282 acl-2011-Shift-Reduce CCG Parsing

Author: Yue Zhang ; Stephen Clark

Abstract: CCGs are directly compatible with binarybranching bottom-up parsing algorithms, in particular CKY and shift-reduce algorithms. While the chart-based approach has been the dominant approach for CCG, the shift-reduce method has been little explored. In this paper, we develop a shift-reduce CCG parser using a discriminative model and beam search, and compare its strengths and weaknesses with the chart-based C&C; parser. We study different errors made by the two parsers, and show that the shift-reduce parser gives competitive accuracies compared to C&C.; Considering our use of a small beam, and given the high ambiguity levels in an automatically-extracted grammar and the amount of information in the CCG lexical categories which form the shift actions, this is a surprising result.

2 0.4065299 112 acl-2011-Efficient CCG Parsing: A* versus Adaptive Supertagging

Author: Michael Auli ; Adam Lopez

Abstract: We present a systematic comparison and combination of two orthogonal techniques for efficient parsing of Combinatory Categorial Grammar (CCG). First we consider adaptive supertagging, a widely used approximate search technique that prunes most lexical categories from the parser’s search space using a separate sequence model. Next we consider several variants on A*, a classic exact search technique which to our knowledge has not been applied to more expressive grammar formalisms like CCG. In addition to standard hardware-independent measures of parser effort we also present what we believe is the first evaluation of A* parsing on the more realistic but more stringent metric of CPU time. By itself, A* substantially reduces parser effort as measured by the number of edges considered during parsing, but we show that for CCG this does not always correspond to improvements in CPU time over a CKY baseline. Combining A* with adaptive supertagging decreases CPU time by 15% for our best model.

3 0.39508933 5 acl-2011-A Comparison of Loopy Belief Propagation and Dual Decomposition for Integrated CCG Supertagging and Parsing

Author: Michael Auli ; Adam Lopez

Abstract: Via an oracle experiment, we show that the upper bound on accuracy of a CCG parser is significantly lowered when its search space is pruned using a supertagger, though the supertagger also prunes many bad parses. Inspired by this analysis, we design a single model with both supertagging and parsing features, rather than separating them into distinct models chained together in a pipeline. To overcome the resulting increase in complexity, we experiment with both belief propagation and dual decomposition approaches to inference, the first empirical comparison of these algorithms that we are aware of on a structured natural language processing problem. On CCGbank we achieve a labelled dependency F-measure of 88.8% on gold POS tags, and 86.7% on automatic part-of-speeoch tags, the best reported results for this task.

4 0.30236697 309 acl-2011-Transition-based Dependency Parsing with Rich Non-local Features

Author: Yue Zhang ; Joakim Nivre

Abstract: Transition-based dependency parsers generally use heuristic decoding algorithms but can accommodate arbitrarily rich feature representations. In this paper, we show that we can improve the accuracy of such parsers by considering even richer feature sets than those employed in previous systems. In the standard Penn Treebank setup, our novel features improve attachment score form 91.4% to 92.9%, giving the best results so far for transitionbased parsing and rivaling the best results overall. For the Chinese Treebank, they give a signficant improvement of the state of the art. An open source release of our parser is freely available.

5 0.16406831 111 acl-2011-Effects of Noun Phrase Bracketing in Dependency Parsing and Machine Translation

Author: Nathan Green

Abstract: Flat noun phrase structure was, up until recently, the standard in annotation for the Penn Treebanks. With the recent addition of internal noun phrase annotation, dependency parsing and applications down the NLP pipeline are likely affected. Some machine translation systems, such as TectoMT, use deep syntax as a language transfer layer. It is proposed that changes to the noun phrase dependency parse will have a cascading effect down the NLP pipeline and in the end, improve machine translation output, even with a reduction in parser accuracy that the noun phrase structure might cause. This paper examines this noun phrase structure’s effect on dependency parsing, in English, with a maximum spanning tree parser and shows a 2.43%, 0.23 Bleu score, improvement for English to Czech machine translation. .

6 0.16268417 143 acl-2011-Getting the Most out of Transition-based Dependency Parsing

7 0.15684812 236 acl-2011-Optimistic Backtracking - A Backtracking Overlay for Deterministic Incremental Parsing

8 0.14623202 167 acl-2011-Improving Dependency Parsing with Semantic Classes

9 0.14361057 316 acl-2011-Unary Constraints for Efficient Context-Free Parsing

10 0.14331788 39 acl-2011-An Ensemble Model that Combines Syntactic and Semantic Clustering for Discriminative Dependency Parsing

11 0.1365934 58 acl-2011-Beam-Width Prediction for Efficient Context-Free Parsing

12 0.12951437 59 acl-2011-Better Automatic Treebank Conversion Using A Feature-Based Approach

13 0.12684309 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation

14 0.12627283 127 acl-2011-Exploiting Web-Derived Selectional Preference to Improve Statistical Dependency Parsing

15 0.11445749 107 acl-2011-Dynamic Programming Algorithms for Transition-Based Dependency Parsers

16 0.11289877 268 acl-2011-Rule Markov Models for Fast Tree-to-String Translation

17 0.11244203 333 acl-2011-Web-Scale Features for Full-Scale Parsing

18 0.11184202 184 acl-2011-Joint Hebrew Segmentation and Parsing using a PCFGLA Lattice Parser

19 0.11053617 262 acl-2011-Relation Guided Bootstrapping of Semantic Lexicons

20 0.11004216 48 acl-2011-Automatic Detection and Correction of Errors in Dependency Treebanks


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.226), (1, -0.11), (2, -0.079), (3, -0.384), (4, -0.058), (5, -0.058), (6, -0.05), (7, 0.079), (8, 0.03), (9, -0.103), (10, 0.012), (11, 0.114), (12, 0.079), (13, -0.154), (14, -0.051), (15, 0.091), (16, 0.144), (17, -0.083), (18, -0.007), (19, -0.086), (20, 0.096), (21, 0.078), (22, 0.143), (23, -0.02), (24, -0.207), (25, 0.146), (26, 0.089), (27, 0.015), (28, 0.012), (29, 0.041), (30, -0.077), (31, -0.15), (32, -0.045), (33, 0.024), (34, -0.143), (35, 0.032), (36, 0.002), (37, 0.008), (38, -0.025), (39, 0.055), (40, -0.093), (41, 0.005), (42, -0.035), (43, 0.05), (44, -0.077), (45, 0.066), (46, -0.051), (47, 0.099), (48, -0.103), (49, -0.001)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94394112 282 acl-2011-Shift-Reduce CCG Parsing

Author: Yue Zhang ; Stephen Clark

Abstract: CCGs are directly compatible with binarybranching bottom-up parsing algorithms, in particular CKY and shift-reduce algorithms. While the chart-based approach has been the dominant approach for CCG, the shift-reduce method has been little explored. In this paper, we develop a shift-reduce CCG parser using a discriminative model and beam search, and compare its strengths and weaknesses with the chart-based C&C; parser. We study different errors made by the two parsers, and show that the shift-reduce parser gives competitive accuracies compared to C&C.; Considering our use of a small beam, and given the high ambiguity levels in an automatically-extracted grammar and the amount of information in the CCG lexical categories which form the shift actions, this is a surprising result.

2 0.932693 112 acl-2011-Efficient CCG Parsing: A* versus Adaptive Supertagging

Author: Michael Auli ; Adam Lopez

Abstract: We present a systematic comparison and combination of two orthogonal techniques for efficient parsing of Combinatory Categorial Grammar (CCG). First we consider adaptive supertagging, a widely used approximate search technique that prunes most lexical categories from the parser’s search space using a separate sequence model. Next we consider several variants on A*, a classic exact search technique which to our knowledge has not been applied to more expressive grammar formalisms like CCG. In addition to standard hardware-independent measures of parser effort we also present what we believe is the first evaluation of A* parsing on the more realistic but more stringent metric of CPU time. By itself, A* substantially reduces parser effort as measured by the number of edges considered during parsing, but we show that for CCG this does not always correspond to improvements in CPU time over a CKY baseline. Combining A* with adaptive supertagging decreases CPU time by 15% for our best model.

3 0.92555684 5 acl-2011-A Comparison of Loopy Belief Propagation and Dual Decomposition for Integrated CCG Supertagging and Parsing

Author: Michael Auli ; Adam Lopez

Abstract: Via an oracle experiment, we show that the upper bound on accuracy of a CCG parser is significantly lowered when its search space is pruned using a supertagger, though the supertagger also prunes many bad parses. Inspired by this analysis, we design a single model with both supertagging and parsing features, rather than separating them into distinct models chained together in a pipeline. To overcome the resulting increase in complexity, we experiment with both belief propagation and dual decomposition approaches to inference, the first empirical comparison of these algorithms that we are aware of on a structured natural language processing problem. On CCGbank we achieve a labelled dependency F-measure of 88.8% on gold POS tags, and 86.7% on automatic part-of-speeoch tags, the best reported results for this task.

4 0.72571969 236 acl-2011-Optimistic Backtracking - A Backtracking Overlay for Deterministic Incremental Parsing

Author: Gisle Ytrestl

Abstract: This paper describes a backtracking strategy for an incremental deterministic transitionbased parser for HPSG. The method could theoretically be implemented on any other transition-based parser with some adjustments. In this paper, the algorithm is evaluated on CuteForce, an efficient deterministic shiftreduce HPSG parser. The backtracking strategy may serve to improve existing parsers, or to assess if a deterministic parser would benefit from backtracking as a strategy to improve parsing.

5 0.61573243 300 acl-2011-The Surprising Variance in Shortest-Derivation Parsing

Author: Mohit Bansal ; Dan Klein

Abstract: We investigate full-scale shortest-derivation parsing (SDP), wherein the parser selects an analysis built from the fewest number of training fragments. Shortest derivation parsing exhibits an unusual range of behaviors. At one extreme, in the fully unpruned case, it is neither fast nor accurate. At the other extreme, when pruned with a coarse unlexicalized PCFG, the shortest derivation criterion becomes both fast and surprisingly effective, rivaling more complex weighted-fragment approaches. Our analysis includes an investigation of tie-breaking and associated dynamic programs. At its best, our parser achieves an accuracy of 87% F1 on the English WSJ task with minimal annotation, and 90% F1 with richer annotation.

6 0.6069833 143 acl-2011-Getting the Most out of Transition-based Dependency Parsing

7 0.58773828 309 acl-2011-Transition-based Dependency Parsing with Rich Non-local Features

8 0.51474631 111 acl-2011-Effects of Noun Phrase Bracketing in Dependency Parsing and Machine Translation

9 0.50528347 184 acl-2011-Joint Hebrew Segmentation and Parsing using a PCFGLA Lattice Parser

10 0.50037932 107 acl-2011-Dynamic Programming Algorithms for Transition-Based Dependency Parsers

11 0.47439814 267 acl-2011-Reversible Stochastic Attribute-Value Grammars

12 0.47249982 59 acl-2011-Better Automatic Treebank Conversion Using A Feature-Based Approach

13 0.46793348 39 acl-2011-An Ensemble Model that Combines Syntactic and Semantic Clustering for Discriminative Dependency Parsing

14 0.4602423 243 acl-2011-Partial Parsing from Bitext Projections

15 0.44190544 127 acl-2011-Exploiting Web-Derived Selectional Preference to Improve Statistical Dependency Parsing

16 0.43166947 58 acl-2011-Beam-Width Prediction for Efficient Context-Free Parsing

17 0.42637879 167 acl-2011-Improving Dependency Parsing with Semantic Classes

18 0.42302755 230 acl-2011-Neutralizing Linguistically Problematic Annotations in Unsupervised Dependency Parsing Evaluation

19 0.41155988 234 acl-2011-Optimal Head-Driven Parsing Complexity for Linear Context-Free Rewriting Systems

20 0.40787077 106 acl-2011-Dual Decomposition for Natural Language Processing


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.023), (17, 0.063), (26, 0.018), (28, 0.029), (37, 0.115), (39, 0.105), (41, 0.063), (55, 0.028), (59, 0.042), (72, 0.022), (91, 0.034), (96, 0.105), (97, 0.013), (98, 0.263)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.86387926 22 acl-2011-A Probabilistic Modeling Framework for Lexical Entailment

Author: Eyal Shnarch ; Jacob Goldberger ; Ido Dagan

Abstract: Recognizing entailment at the lexical level is an important and commonly-addressed component in textual inference. Yet, this task has been mostly approached by simplified heuristic methods. This paper proposes an initial probabilistic modeling framework for lexical entailment, with suitable EM-based parameter estimation. Our model considers prominent entailment factors, including differences in lexical-resources reliability and the impacts of transitivity and multiple evidence. Evaluations show that the proposed model outperforms most prior systems while pointing at required future improvements. 1 Introduction and Background Textual Entailment was proposed as a generic paradigm for applied semantic inference (Dagan et al., 2006). This task requires deciding whether a tex- tual statement (termed the hypothesis-H) can be inferred (entailed) from another text (termed the textT). Since it was first introduced, the six rounds of the Recognizing Textual Entailment (RTE) challenges1 , currently organized under NIST, have become a standard benchmark for entailment systems. These systems tackle their complex task at various levels of inference, including logical representation (Tatu and Moldovan, 2007; MacCartney and Manning, 2007), semantic analysis (Burchardt et al., 2007) and syntactic parsing (Bar-Haim et al., 2008; Wang et al., 2009). Inference at these levels usually 1http://www.nist.gov/tac/2010/RTE/index.html 558 requires substantial processing and resources (e.g. parsing) aiming at high performance. Nevertheless, simple entailment methods, performing at the lexical level, provide strong baselines which most systems did not outperform (Mirkin et al., 2009; Majumdar and Bhattacharyya, 2010). Within complex systems, lexical entailment modeling is an important component. Finally, there are cases in which a full system cannot be used (e.g. lacking a parser for a targeted language) and one must resort to the simpler lexical approach. While lexical entailment methods are widely used, most of them apply ad hoc heuristics which do not rely on a principled underlying framework. Typically, such methods quantify the degree of lexical coverage of the hypothesis terms by the text’s terms. Coverage is determined either by a direct match of identical terms in T and H or by utilizing lexical semantic resources, such as WordNet (Fellbaum, 1998), that capture lexical entailment relations (denoted here as entailment rules). Common heuristics for quantifying the degree of coverage are setting a threshold on the percentage coverage of H’s terms (Majumdar and Bhattacharyya, 2010), counting absolute number of uncovered terms (Clark and Harrison, 2010), or applying an Information Retrievalstyle vector space similarity score (MacKinlay and Baldwin, 2009). Other works (Corley and Mihalcea, 2005; Zanzotto and Moschitti, 2006) have applied a heuristic formula to estimate the similarity between text fragments based on a similarity function between their terms. These heuristics do not capture several important aspects of entailment, such as varying reliability of Proceedings ofP thoer t4l9atnhd A, Onrnuegaoln M,e Jeuntineg 19 o-f2 t4h,e 2 A0s1s1o.c?i ac t2io0n11 fo Ar Cssoocmiaptuiotanti foonra Clo Lminpguutiast i ocns:aslh Loirntpgaupisetrics , pages 558–563, entailment resources and the impact of rule chaining and multiple evidence on entailment likelihood. An additional observation from these and other systems is that their performance improves only moderately when utilizing lexical resources2. We believe that the textual entailment field would benefit from more principled models for various entailment phenomena. Inspired by the earlier steps in the evolution of Statistical Machine Translation methods (such as the initial IBM models (Brown et al., 1993)), we formulate a concrete generative probabilistic modeling framework that captures the basic aspects of lexical entailment. Parameter estimation is addressed by an EM-based approach, which enables estimating the hidden lexical-level entailment parameters from entailment annotations which are available only at the sentence-level. While heuristic methods are limited in their ability to wisely integrate indications for entailment, probabilistic methods have the advantage of being extendable and enabling the utilization of wellfounded probabilistic methods such as the EM algorithm. We compared the performance of several model variations to previously published results on RTE data sets, as well as to our own implementation of typical lexical baselines. Results show that both the probabilistic model and our percentagecoverage baseline perform favorably relative to prior art. These results support the viability of the probabilistic framework while pointing at certain modeling aspects that need to be improved. 2 Probabilistic Model Under the lexical entailment scope, our modeling goal is obtaining a probabilistic score for the likelihood that all H’s terms are entailed by T. To that end, we model prominent aspects of lexical entailment, which were mostly neglected by previous lexical methods: (1) distinguishing different reliability levels of lexical resources; (2) allowing transitive chains of rule applications and considering their length when estimating their validity; and (3) considering multiple entailments when entailing a term. 2See ablation tests reports in http://aclweb.org/aclwiki/ index.php?title=RTE Knowledge Resources#Ablation Tests 559 Figure 1: The generative process of entailing terms of a hypothesis from a text. Edges represent entailment rules. There are 3 evidences for the entailment of hi :a rule from Resource1 , another one from Resource3 both suggesting that tj entails it, and a chain from t1through an intermediate term t0. 2.1 Model Description For T to entail H it is usually a necessary, but not sufficient, that every term h ∈ H would be entsauiflefidci by ,at t hleatast e one t teerrmm mt h ∈ ∈T (Glickman eet al., 2006). Figure s1t odneescr tiebrmes tth ∈e process komf entailing hypothesis terms. The trivial case is when identical terms, possibly at the stem or lemma level, appear in T and H (a direct match as tn and hm in Figure 1). Alternatively, we can establish entailment based on knowledge of entailing lexical-semantic relations, such as synonyms, hypernyms and morphological derivations, available in lexical resources (e.g the rule inference → reasoning from WordNet). (We.eg d theneo rutel by R(r) cthee → resource nwgh ficroh provided teht)e. rule r. Since entailment is a transitive relation, rules may compose transitive chains that connect a term t ∈ T ctoo a pteosrme rha ∈ Hive through hinatte cromnendeicatte a tteerrmms. t ∈Fo Tr instance, fr hom ∈ t hHe r thurleosu infer → inference armnds inference → reasoning we can dre →duc inef tehreen rcuele a infer → reasoning (were inference dise dthuec ein thteerm rueldeia intef trer →m as t0 in Figure 1). Multiple chains may connect t to h (as for tj and hi in Figure 1) or connect several terms in T to h (as t1 and tj are indicating the entailment of hi in Figure 1), thus providing multiple evidence for h’s entailment. It is reasonable to expect that if a term t indeed entails a term h, it is likely to find evidences for this relation in several resources. Taking a probabilistic perspective, we assume a parameter θR for each resource R, denoting its reliability, i.e. the prior probability that applying a rule from R corresponds to a valid entailment instance. Direct matches are considered as a special “resource”, called MATCH, for which θMATCH is expected to be close to 1. We now present our probabilistic model. For a text term t ∈ T to entail a hypothesis term h by a tcehxatin te c, mde tn ∈ote Td by etn →tcai h, thhyep application mof h every r ∈ c must be valid. N −→ote h ,t thhaet a pruplleic r i onn a cfh eaviner c rco ∈nne cc mtsu tswt ob ete vramlisd (its oleteft t-hhaatnd a- rsuildee ran ind aits c righthand-side, denoted lhs → rhs). The lhs of the first rhualned i-ns c eis, td ∈ oTte adn ldh sth →e r rhhss )o.f T Tthhee l lahsts r oufle t hine ifitr sist rhu ∈ iHn. c W ise t d ∈en Tote a nthde t event so fo a vhael ilda rtu rluel applicathio ∈n by l Whse →dren orhtes. t Sei envceen a-priori a d ru rluel r aips pvliacliadwith probability θR(r) , ancnde assuming independence of all r ∈ c, we obtain Eq. 1 to specify the probability rof ∈ ∈th ce, weveen otb tt i→cn Ehq. Next, pleetc C(h) ede pnroobtethe set of chains which− → suggest txhte, leentt Cail(mhe)n dt eonfo hte. The probability that T does not entail h at all (by any chain), specified in Eq. 2, is the probability that all these chains are not valid. Finally, the probability that T entails all of H, assuming independence of H’s terms, is the probability that every h ∈ H is entailed, as given ien p Eq. a3b. Nityot tihceat t ehvaet yth here ∈ c oHul ids be a term h which is not covered by any available rule chain. Under this formulation, we assume that each such h is covered by a single rule coming from a special “resource” called UNCOVERED (expecting θUNCOVERED to be relatively small). p(t −→c h) = Yp(lhs →r rhs) = Yr∈c p(T 9 h) = Y YθR(r)(1) Yr∈c [1 − p(t− →c h)] (2) c∈YC(h) p(T → H) = Y p(T → h) (3) hY∈H As can be seen, our model indeed distinguishes varying resource reliability, decreases entailment probability as rule chains grow and increases it when entailment of a term is supported by multiple chains. The above treatment of uncovered terms in H, as captured in Eq. 3, assumes that their entailment probability is independent of the rest of the hypothesis. However, when the number of covered hypothesis terms increases the probability that the remaining terms are actually entailed by T increases too 560 (even though we do not have supporting knowledge for their entailment). Thus, an alternative model is to group all uncovered terms together and estimate the overall probability of their joint entailment as a function of the lexical coverage of the hypothesis. We denote Hc as the subset of H’s terms which are covered by some rule chain and Huc as the remaining uncovered part. Eq. 3a then provides a refined entailment model for H, in which the second term specifies the probability that Huc is entailed given that Hc is validly entailed and the corresponding lengths: p(T→H) = [Yp(T→h)]·p(T→Huc hY∈Hc 2.2 | |Hc|,|H|) (3a) Parameter Estimation The difficulty in estimating the θR values is that these are term-level parameters while the RTEtraining entailment annotation is given for the sentence-level. Therefore, we use EM-based estimation for the hidden parameters (Dempster et al., 1977). In the E step we use the current θR values to compute all whcr (T, H) values for each training pair. whcr (T, H) stands for the posterior probability that application of the rule r in the chain c for h ∈ H tish valid, given nth oaft heieth reurl eT r e innta thiles c Hha or not ra hcc ∈ord Hing to the training annotation (see Eq. 4). Remember that a rule r provides an entailment relation between its left-hand-side (lhs) and its right-hand-side (rhs). Therefore Eq. 4 uses the notation lhs →r rhs to designate the application of the rule r (similar htos Eq. 1). wEhc:r(T,H)=   p (lTh9→sH−→ |rlhsrp−→ rh(Tsr→9|hTsH )9→p(lhHs−→ r) =hs)if(4T)9→H After applying Bayes’ rule we get a fraction with Eq. 3 in its denominator and θR(r) as the second term of the numerator. The first numerator term is defined as in Eq. 3 except that for the corresponding rule application we substitute θR(r) by 1(per the conditioning event). The probabilistic model defined by Eq. 1-3 is a loop-free directed acyclic graphical model (aka a Bayesian network). Hence the E-step probabilities can be efficiently calculated using the belief propagation algorithm (Pearl, 1988). The M step uses Eq. 5 to update the parameter set. For each resource R we average the whcr (T, H) val- ues for all its rule applications in the training, whose total number is denoted nR. M : θR=n1RTX,HhX∈Hc∈XC(h)r∈c|RX(r)=wRhcr(T,H) (5) For Eq. 3a we need to estimate also p(T→Huc | |Hc| ,|H|). 3Tah iws eis n ndeoende t directly avteia a amlsaoxi pm(Tu→m Hlikeli-| |hHoo|d, eHst|i)m.a Tthioins over tehe d training set, by calculating the proportion of entailing examples within the set of all examples of a given hypothesis length (|H|) aonfd a a given lneusm ofbe ar goifv ecnov heyrepdo hteersmiss (|Hc|). HA|)s |Hc| we tvaekne tnhuem nbuemrb oefr ocofv videerendtic taelr mtesrm (|sH in| )T. a Ands |HH (exact match) suinmcbee irn o afl imdeonstti caall cases itner Tms a nind H which have an exact match in T are indeed entailed. We also tried initializing the EM algorithm with these direct estimations but did not obtain performance improvements. 3 Evaluations and Results The 5th Recognizing Textual Entailment challenge (RTE-5) introduced a new search task (Bentivogli et al., 2009) which became the main task in RTE6 (Bentivogli et al., 2010). In this task participants should find all sentences that entail a given hypothesis in a given document cluster. This task’s data sets reflect a natural distribution of entailments in a corpus and demonstrate a more realistic scenario than the previous RTE challenges. In our system, sentences are tokenized and stripped of stop words and terms are lemmatized and tagged for part-of-speech. As lexical resources we use WordNet (WN) (Fellbaum, 1998), taking as entailment rules synonyms, derivations, hyponyms and meronyms of the first senses of T and H terms, and the CatVar (Categorial Variation) database (Habash and Dorr, 2003). We allow rule chains of length up to 4 in WordNet (WN4). We compare our model to two types of baselines: (1) RTE published results: the average of the best runs of all systems, the best and second best performing lexical systems and the best full system of each challenge; (2) our implementation of lexical 561 coverage model, tuning the percentage-of-coverage threshold for entailment on the training set. This model uses the same configuration as ourprobabilistic model. We also implemented an Information Re- trieval style baseline3 (both with and without lexical expansions), but given its poorer performance we omit its results here. Table 1 presents the results. We can see that both our implemented models (probabilistic and coverage) outperform all RTE lexical baselines on both data sets, apart from (Majumdar and Bhattacharyya, 2010) which incorporates additional lexical resources, a named entity recognizer and a co-reference system. On RTE-5, the probabilistic model is comparable in performance to the best full system, while the coverage model achieves considerably better results. We notice that our implemented models successfully utilize resources to increase performance, as opposed to typical smaller or less consistent improvements in prior works (see Section 1). ModelRTE-5F1%RTE-6 ERT2b avne sgdst.b floeu fsxl taic slyealxs tisyceysmastlemesyms tem4 3504 . 36.4531 4 34873. 0 .68254 evrcagon+ o CW raeN tsVo4a+urCcaetVr43479685. 25384 4534. 5817 Tabspticrlaoe1:+ Envo CW arlueN tasV4oi+urnCcaetsVularonRTE-5and4 R521 T. 80 E-6.RT4 s25 y. s9635t1ems (1)(MacKinlay and Baldwin, 2009), (2)(Clark and Harrison, 2010), (3)(Mirkin et al., 2009)(2 submitted runs), (4)(Majumdar and Bhattacharyya, 2010) and (5)(Jia et al., 2010). are: While the probabilistic and coverage models are comparable on RTE-6 (with non-significant advantage for the former), on RTE-5 the latter performs 3Utilizing Lucene search engine (http://lucene.apache.org) better, suggesting that the probabilistic model needs to be further improved. In particular, WN4 performs better than the single-step WN only on RTE-5, suggesting the need to improve the modeling of chain- ing. The fluctuations over the data sets and impacts of resources suggest the need for further investigation over additional data sets and resources. As for the coverage model, under our configuration it poses a bigger challenge for RTE systems than perviously reported baselines. It is thus proposed as an easy to implement baseline for future entailment research. 4 Conclusions and Future Work This paper presented, for the first time, a principled and relatively rich probabilistic model for lexical entailment, amenable for estimation of hidden lexicallevel parameters from standard sentence-level annotations. The positive results of the probabilistic model compared to prior art and its ability to exploit lexical resources indicate its future potential. Yet, further investigation is needed. For example, analyzing current model’s limitations, we observed that the multiplicative nature of eqs. 1and 3 (reflecting independence assumptions) is too restrictive, resembling a logical AND. Accordingly we plan to explore relaxing this strict conjunctive behavior through models such as noisy-AND (Pearl, 1988). We also intend to explore the contribution of our model, and particularly its estimated parameter values, within a complex system that integrates multiple levels of inference. Acknowledgments This work was partially supported by the NEGEV Consortium of the Israeli Ministry of Industry, Trade and Labor (www.negev-initiative.org), the PASCAL-2 Network of Excellence of the European Community FP7-ICT-2007-1-216886, the FIRBIsrael research project N. RBIN045PXH and by the Israel Science Foundation grant 1112/08. References Roy Bar-Haim, Jonathan Berant, Ido Dagan, Iddo Greental, Shachar Mirkin, Eyal Shnarch, and Idan Szpektor. 2008. Efficient semantic deduction and approximate matching over compact parse forests. In Proceedings of Text Analysis Conference (TAC). 562 Luisa Bentivogli, Ido Dagan, Hoa Trang Dang, Danilo Giampiccolo, and Bernardo Magnini. 2009. The fifth PASCAL recognizing textual entailment challenge. In Proceedings of Text Analysis Conference (TAC). Luisa Bentivogli, Peter Clark, Ido Dagan, Hoa Trang Dang, and Danilo Giampiccolo. 2010. The sixth PASCAL recognizing textual entailment challenge. In Proceedings of Text Analysis Conference (TAC). Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The mathematics of statistical machine translation: parameter estimation. Computational Linguistics, 19(2):263–3 11, June. Aljoscha Burchardt, Nils Reiter, Stefan Thater, and Anette Frank. 2007. A semantic approach to textual entailment: System evaluation and task analysis. In Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing. Peter Clark and Phil Harrison. 2010. BLUE-Lite: a knowledge-based lexical entailment system for RTE6. In Proceedings of Text Analysis Conference (TAC). Courtney Corley and Rada Mihalcea. 2005. Measuring the semantic similarity of texts. In Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment. Ido Dagan, Oren Glickman, and Bernardo Magnini. 2006. The PASCAL recognising textual entailment challenge. In Lecture Notes in Computer Science, volume 3944, pages 177–190. A. P. Dempster, N. M. Laird, and D. B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the royal statistical society, se- ries [B], 39(1): 1–38. Christiane Fellbaum, editor. 1998. WordNet: An Electronic Lexical Database (Language, Speech, and Communication). The MIT Press. Oren Glickman, Eyal Shnarch, and Ido Dagan. 2006. Lexical reference: a semantic matching subtask. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 172–179. Association for Computational Linguistics. Nizar Habash and Bonnie Dorr. 2003. A categorial variation database for english. In Proceedings of the North American Association for Computational Linguistics. Houping Jia, Xiaojiang Huang, Tengfei Ma, Xiaojun Wan, and Jianguo Xiao. 2010. PKUTM participation at TAC 2010 RTE and summarization track. In Proceedings of Text Analysis Conference (TAC). Bill MacCartney and Christopher D. Manning. 2007. Natural logic for textual inference. In Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing. Andrew MacKinlay and Timothy Baldwin. 2009. A baseline approach to the RTE5 search pilot. In Proceedings of Text Analysis Conference (TAC). Debarghya Majumdar and Pushpak Bhattacharyya. 2010. Lexical based text entailment system for main task of RTE6. In Proceedings of Text Analysis Conference (TAC). Mirkin, Roy Bar-Haim, Jonathan Berant, Ido Eyal Shnarch, Asher Stern, and Idan Szpektor. 2009. Addressing discourse and document structure in the RTE search task. In Proceedings of Text Analysis Conference (TAC). Judea Pearl. 1988. Probabilistic reasoning in intelligent systems: networks ofplausible inference. Morgan Kaufmann. Marta Tatu and Dan Moldovan. 2007. COGEX at RTE 3. In Proceedings of the ACL-PASCAL Workshop on Shachar Dagan, Textual Entailment and Paraphrasing. Rui Wang, Yi Zhang, and Guenter Neumann. 2009. A joint syntactic-semantic representation for recognizing textual relatedness. In Proceedings of Text Analysis Conference (TAC). Fabio Massimo Zanzotto and Alessandro Moschitti. 2006. Automatic learning of textual entailments with cross-pair similarities. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics. 563

same-paper 2 0.80409902 282 acl-2011-Shift-Reduce CCG Parsing

Author: Yue Zhang ; Stephen Clark

Abstract: CCGs are directly compatible with binarybranching bottom-up parsing algorithms, in particular CKY and shift-reduce algorithms. While the chart-based approach has been the dominant approach for CCG, the shift-reduce method has been little explored. In this paper, we develop a shift-reduce CCG parser using a discriminative model and beam search, and compare its strengths and weaknesses with the chart-based C&C; parser. We study different errors made by the two parsers, and show that the shift-reduce parser gives competitive accuracies compared to C&C.; Considering our use of a small beam, and given the high ambiguity levels in an automatically-extracted grammar and the amount of information in the CCG lexical categories which form the shift actions, this is a surprising result.

3 0.78607643 312 acl-2011-Turn-Taking Cues in a Human Tutoring Corpus

Author: Heather Friedberg

Abstract: Most spoken dialogue systems are still lacking in their ability to accurately model the complex process that is human turntaking. This research analyzes a humanhuman tutoring corpus in order to identify prosodic turn-taking cues, with the hopes that they can be used by intelligent tutoring systems to predict student turn boundaries. Results show that while there was variation between subjects, three features were significant turn-yielding cues overall. In addition, a positive relationship between the number of cues present and the probability of a turn yield was demonstrated. 1

4 0.70064688 161 acl-2011-Identifying Word Translations from Comparable Corpora Using Latent Topic Models

Author: Ivan Vulic ; Wim De Smet ; Marie-Francine Moens

Abstract: A topic model outputs a set of multinomial distributions over words for each topic. In this paper, we investigate the value of bilingual topic models, i.e., a bilingual Latent Dirichlet Allocation model for finding translations of terms in comparable corpora without using any linguistic resources. Experiments on a document-aligned English-Italian Wikipedia corpus confirm that the developed methods which only use knowledge from word-topic distributions outperform methods based on similarity measures in the original word-document space. The best results, obtained by combining knowledge from wordtopic distributions with similarity measures in the original space, are also reported.

5 0.68338227 205 acl-2011-Learning to Grade Short Answer Questions using Semantic Similarity Measures and Dependency Graph Alignments

Author: Michael Mohler ; Razvan Bunescu ; Rada Mihalcea

Abstract: In this work we address the task of computerassisted assessment of short student answers. We combine several graph alignment features with lexical semantic similarity measures using machine learning techniques and show that the student answers can be more accurately graded than if the semantic measures were used in isolation. We also present a first attempt to align the dependency graphs of the student and the instructor answers in order to make use of a structural component in the automatic grading of student answers.

6 0.67088401 5 acl-2011-A Comparison of Loopy Belief Propagation and Dual Decomposition for Integrated CCG Supertagging and Parsing

7 0.64942157 112 acl-2011-Efficient CCG Parsing: A* versus Adaptive Supertagging

8 0.62286484 327 acl-2011-Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment

9 0.61504775 309 acl-2011-Transition-based Dependency Parsing with Rich Non-local Features

10 0.60482657 128 acl-2011-Exploring Entity Relations for Named Entity Disambiguation

11 0.60256785 236 acl-2011-Optimistic Backtracking - A Backtracking Overlay for Deterministic Incremental Parsing

12 0.59829748 33 acl-2011-An Affect-Enriched Dialogue Act Classification Model for Task-Oriented Dialogue

13 0.59684467 97 acl-2011-Discovering Sociolinguistic Associations with Structured Sparsity

14 0.59044123 207 acl-2011-Learning to Win by Reading Manuals in a Monte-Carlo Framework

15 0.58944941 126 acl-2011-Exploiting Syntactico-Semantic Structures for Relation Extraction

16 0.58815616 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations

17 0.58812153 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation

18 0.58517385 58 acl-2011-Beam-Width Prediction for Efficient Context-Free Parsing

19 0.5848555 186 acl-2011-Joint Training of Dependency Parsing Filters through Latent Support Vector Machines

20 0.58411032 27 acl-2011-A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging