acl acl2010 acl2010-131 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Adam Pauls ; Dan Klein
Abstract: Hierarchical A∗ (HA∗) uses a hierarchy of coarse grammars to speed up parsing without sacrificing optimality. HA∗ prioritizes search in refined grammars using Viterbi outside costs computed in coarser grammars. We present Bridge Hierarchical A∗ (BHA∗), a modified Hierarchical A∗ algorithm which computes a novel outside cost called a bridge outside cost. These bridge costs mix finer outside scores with coarser inside scores, and thus constitute tighter heuristics than entirely coarse scores. We show that BHA∗ substantially outperforms HA∗ when the hierarchy contains only very coarse grammars, while achieving comparable performance on more refined hierarchies.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Hierarchical A∗ (HA∗) uses a hierarchy of coarse grammars to speed up parsing without sacrificing optimality. [sent-3, score-0.514]
2 HA∗ prioritizes search in refined grammars using Viterbi outside costs computed in coarser grammars. [sent-4, score-0.858]
3 We present Bridge Hierarchical A∗ (BHA∗), a modified Hierarchical A∗ algorithm which computes a novel outside cost called a bridge outside cost. [sent-5, score-1.02]
4 These bridge costs mix finer outside scores with coarser inside scores, and thus constitute tighter heuristics than entirely coarse scores. [sent-6, score-1.742]
5 We show that BHA∗ substantially outperforms HA∗ when the hierarchy contains only very coarse grammars, while achieving comparable performance on more refined hierarchies. [sent-7, score-0.443]
6 1 Introduction The Hierarchical A∗ (HA∗) algorithm of Felzenszwalb and McAllester (2007) allows the use of a hierarchy of coarse grammars to speed up parsing without sacrificing optimality. [sent-8, score-0.514]
7 Pauls and Klein (2009) showed that a hierarchy of coarse grammars outperforms standard A∗ parsing for a range of grammars. [sent-9, score-0.486]
8 HA∗ operates by computing Viterbi inside and outside scores in an agenda-based way, using outside scores computed under coarse grammars as heuristics which guide the search in finer grammars. [sent-10, score-1.64]
9 The outside scores computed by HA∗ are auxiliary quantities, useful only because they form admissible heuristics for search in finer grammars. [sent-11, score-0.688]
10 We show that a modification of the HA∗ algorithm can compute modified bridge outside scores which are tighter bounds on the true outside costs in finer grammars. [sent-12, score-1.421]
11 These bridge outside scores mix inside and outside costs from finer grammars with inside costs from coarser grammars. [sent-13, score-2.283]
12 Because the bridge costs represent tighter estimates of the true outside costs, we expect them to reduce the work of computing inside costs in finer grammars. [sent-14, score-1.455]
13 At the same time, because bridge costs mix computation from coarser and finer levels of the hierarchy, they are more expensive to compute than purely coarse outside costs. [sent-15, score-1.346]
14 Whether the work saved by using tighter estimates outweighs the extra computation needed to compute them is an empirical question. [sent-16, score-0.182]
15 In this paper, we show that the use of bridge outside costs substantially outperforms the HA∗ algorithm when the coarsest levels of the hierarchy are very loose approximations of the target grammar. [sent-17, score-0.722]
16 For hierarchies with tighter estimates, we show that BHA∗ obtains comparable performance to HA∗. [sent-18, score-0.141]
17 We assume an input sentence s0 . . . sn−1 of length n, and a hierarchy of m weighted context-free grammars G1 . . . Gm. [sent-26, score-0.214]
18 We call the most refined grammar Gm the target grammar, and all other (coarser) grammars auxiliary grammars. [sent-30, score-0.297]
19 Without loss of generality, we assume Chomsky normal form, so each non-terminal rule r in Gt has the form r = At → Bt Ct with weight wr . [sent-32, score-0.169]
20 The weight of a derivation is the sum of rule weights in the derivation. [sent-34, score-0.142]
21 The weight of the best (minimum) inside derivation for an edge e is called the Viterbi inside score β(e), and the weight of the best outside derivation is called the Viterbi outside score α(e). [sent-35, score-0.761]
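For reference, the two score definitions can be written out as follows; this is just the min-sum (tropical) reading of the prose above, using the paper's notation for edges and rule weights.

```latex
% Viterbi inside and outside scores of an edge e = (A^t, i, j):
%   beta(e):  weight of the best derivation of s_i ... s_{j-1} from A^t
%   alpha(e): weight of the best derivation of G -> s_0 ... s_{i-1} A^t s_j ... s_{n-1}
\beta(e)  = \min_{d \in \mathrm{inside}(e)}  \sum_{r \in d} w_r
\qquad
\alpha(e) = \min_{d \in \mathrm{outside}(e)} \sum_{r \in d} w_r
```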
22 Figure 1: Representations of the different types of items used in parsing and how they depend on each other. [sent-41, score-0.262]
23 (a) In HA∗, the inside item I(VPt , 3, 5) relies on the coarse outside item O(πt (VPt) , 3, 5) for outside estimates. [sent-42, score-1.563]
24 (b) In BHA∗ , the same inside item relies on the bridge outside item O˜(VPt , 3, 5), which mixes coarse and refined outside costs. [sent-43, score-1.916]
25 The coarseness of an item is indicated with dotted lines. [sent-44, score-0.22]
26 The goal of a 1-best parsing algorithm is to compute the Viterbi inside score of the edge (Gm, 0, n); the actual best parse can be reconstructed from backpointers in the standard way. [sent-52, score-0.385]
27 We assume that each auxiliary grammar Gt−1 forms a relaxed projection of Gt. [sent-53, score-0.171]
28 A grammar Gt−1 is a projection of Gt if there is some onto function πt mapping the symbols of Gt to the symbols of Gt−1. [sent-54, score-0.145]
29 A projection is relaxed if, for every rule r = At → Bt Ct with weight wr, the projected rule r′ = πt(At) → πt(Bt) πt(Ct) has weight wr′ ≤ wr in Gt−1. [sent-56, score-0.441]
30 In other words, the weight of r′ is a lower bound on the weight of all rules r in Gt which project to r′. [sent-57, score-0.162]
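As a concrete illustration of the relaxed-projection property, here is a small Python sketch; the dictionary-of-rules representation and the project function (standing in for πt) are our own assumptions, not the paper's implementation.

```python
def is_relaxed_projection(fine_rules, coarse_rules, project):
    """Check that a coarse grammar is a relaxed projection of a fine one.

    fine_rules / coarse_rules: dicts mapping rules (A, B, C) -> weight.
    project: maps each fine symbol to its coarse symbol (the role of pi_t).
    Relaxedness: the weight of each projected rule r' must lower-bound
    the weight of every fine rule r that projects onto r'.
    """
    for (a, b, c), w in fine_rules.items():
        r_proj = (project(a), project(b), project(c))
        if r_proj not in coarse_rules:
            return False  # every fine rule must project onto some coarse rule
        if coarse_rules[r_proj] > w:
            return False  # coarse weight must be a lower bound
    return True
```

A valid relaxed projection can always be built by giving each coarse rule the minimum weight over all fine rules that project onto it.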
31 2 Deduction Rules HA∗ and our modification BHA∗ can be formulated in terms of prioritized weighted deduction rules (Shieber et al., 1995). [sent-59, score-0.275]
32 A prioritized weighted deduction rule has the form φ1 : w1, . . . , φn : wn ⊢_{p(w1,...,wn)} φ0 : g(w1, . . . , wn), where φ1, . . . [sent-61, score-0.257]
33 , φn are the antecedent items of the deduction rule and φ0 is the conclusion item. [sent-72, score-0.458]
34 A deduction rule states that, given the antecedents φ1, . . . , φn with weights w1, . . . , wn, the conclusion φ0 can be built with weight g(w1, . . . , wn) and placed on the agenda with priority p(w1, . . . , wn). [sent-73, score-0.236]
35 These deduction rules are “executed” within a generic agenda-driven algorithm, which constructs items in a prioritized fashion. [sent-86, score-0.518]
36 The algorithm maintains an agenda (a priority queue of items), as well as a chart of items already processed. [sent-87, score-0.451]
37 The fundamental operation of the algorithm is to pop the highest-priority item φ from the agenda, put it into the chart with its current weight, and apply the deduction rules to form any items which can be built by combining φ with items already in the chart. [sent-88, score-0.996]
38 Resulting items that are new or improved are put on the agenda with priority given by p(·). [sent-89, score-0.407]
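The generic loop just described can be sketched as follows; the deduce callback, which enumerates all conclusions buildable by combining the popped item with the chart, is a hypothetical placeholder for whatever deduction rules (HA∗'s or BHA∗'s) are plugged in.

```python
import heapq
import itertools

def agenda_run(axioms, deduce, is_goal):
    """Generic prioritized weighted deduction (sketch).

    axioms: iterable of (priority, weight, item) start items.
    deduce(item, chart): yields (priority, weight, conclusion) tuples for
        every deduction rule combining `item` with items in the chart.
    """
    counter = itertools.count()                  # tie-breaker for the heap
    agenda = [(p, next(counter), w, it) for (p, w, it) in axioms]
    heapq.heapify(agenda)
    chart = {}                                   # item -> final weight
    while agenda:
        priority, _, weight, item = heapq.heappop(agenda)
        if item in chart:                        # stale duplicate entry; skip
            continue
        chart[item] = weight                     # weight is final on first pop
        if is_goal(item):
            return weight
        for new_prio, new_weight, concl in deduce(item, chart):
            if concl not in chart:               # cheaper copies pop first
                heapq.heappush(agenda, (new_prio, next(counter), new_weight, concl))
    return None                                  # agenda exhausted: no parse
```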
39 Because a deduction rule is not executed until its antecedents are constructed, we sometimes refer to a particular conclusion item as “waiting” on other item(s) before it can be built. [sent-91, score-0.167]
40 Inside items I(At, i,j) represent possible derivations of the edge (At, i,j), while outside items O(At, i,j) represent derivations of G → s0 . . . si−1 At sj . . . sn−1. [sent-94, score-0.897]
41 Inside items are used to compute Viterbi inside scores under grammar Gt, while outside items are used to compute Viterbi outside scores. [sent-102, score-1.636]
42 The deduction rules which construct inside and outside items are given in Table 1. [sent-103, score-1.096]
43 The IN deduction rule combines two inside items over smaller spans with a grammar rule to form an inside item over a larger span. [sent-104, score-1.28]
44 The weight of the resulting item is the sum of the weights of the smaller inside items and the grammar rule. [sent-105, score-0.791]
45 However, the IN rule also requires that an outside score in the coarse grammar (see Footnote 1 below) be computed before an inside item is built. [sent-106, score-1.119]
46 Once constructed, this coarse outside score is added to the weight of the conclusion item to form the priority of the resulting item. [sent-107, score-0.937]
47 In other words, the coarse outside score computed by the algorithm plays the same role as a heuristic in standard A∗ parsing (Klein and Manning, 2003). [sent-108, score-0.673]
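To make the IN rule concrete, the weight and priority of its conclusion can be computed as below; the calling convention is ours, and coarse_outside stands for the already-computed weight of O(πt(At), i, j).

```python
def in_rule(w_left_inside, w_right_inside, w_rule, coarse_outside):
    """IN: I(B^t,i,k) + I(C^t,k,j) + rule A^t -> B^t C^t  =>  I(A^t,i,j).

    The conclusion's weight is the score of the new inside derivation;
    its priority adds the coarse outside score, exactly as in A* search
    (priority = cost so far + admissible heuristic).
    """
    weight = w_left_inside + w_right_inside + w_rule
    priority = weight + coarse_outside
    return priority, weight
```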
48 Outside scores are computed by the OUT-L and OUT-R deduction rules. [sent-109, score-0.259]
49 These rules combine an outside item over a large span and inside items over smaller spans to form outside items over smaller spans. [sent-110, score-1.704]
50 Unlike the IN deduction, the OUT deductions only involve items from the same level of the hierarchy. [sent-111, score-0.271]
51 That is, whereas inside scores wait on coarse outside scores to be constructed, outside scores wait on inside scores at the same level in the hierarchy. [sent-112, score-1.815]
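The corresponding weights for the two outside deductions can be sketched in the same style; again, the function signatures are illustrative rather than the paper's notation.

```python
def out_left(w_parent_outside, w_right_inside, w_rule):
    """OUT-L: O(A^t,i,j) + I(C^t,k,j) + rule A^t -> B^t C^t  =>  O(B^t,i,k).
    All antecedents live at the same level t of the hierarchy."""
    return w_parent_outside + w_right_inside + w_rule

def out_right(w_parent_outside, w_left_inside, w_rule):
    """OUT-R: O(A^t,i,j) + I(B^t,i,k) + rule A^t -> B^t C^t  =>  O(C^t,k,j).
    Again, everything stays at level t; no coarser items are consulted."""
    return w_parent_outside + w_left_inside + w_rule
```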
52 Conceptually, these deduction rules operate by (Footnote 1: For the coarsest grammar G1, the IN rule builds items using 0 as the outside score.) [sent-113, score-0.469]
53 Red underline indicates items constructed under the previous grammar in the hierarchy. [sent-115, score-0.373]
55 first computing inside scores bottom-up in the coarsest grammar, then outside scores top-down in the same grammar, then inside scores in the next finer grammar, and so on. [sent-118, score-1.108]
56 However, the crucial aspect of HA∗ is that items from all levels of the hierarchy compete on the same queue, interleaving the computation of inside and outside scores at all levels. [sent-119, score-1.045]
57 The HA∗ deduction rules come with three important guarantees. [sent-120, score-0.262]
58 The first is a monotonicity guarantee: each item is popped off the agenda in order of its intrinsic priority pˆ (·). [sent-121, score-0.41]
59 For an inside item I(e) over edge e, this priority is pˆ(I(e)) = β(e) + α(e′), where e′ is the projection of e. [sent-122, score-0.205]
60 For outside items O(e) over edge e, this priority is pˆ(O(e)) = α(e) + β(e). [sent-123, score-0.745]
61 The second is a correctness guarantee: when an inside/outside item is popped off the agenda, its weight is its true Viterbi inside/outside cost. [sent-125, score-0.278]
62 Taken together, these two imply an efficiency guarantee, which states that only items x whose intrinsic priority pˆ(x) is less than or equal to the Viterbi inside score of the goal are removed from the agenda. [sent-126, score-0.665]
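In symbols, with e′ the coarse projection of e, the priorities and the efficiency condition read as follows; the outside-item priority is our reading of the truncated sentence above.

```latex
% Intrinsic priorities under HA*:
\hat{p}\bigl(I(e)\bigr) = \beta(e) + \alpha(e')
\qquad
\hat{p}\bigl(O(e)\bigr) = \alpha(e) + \beta(e)
% Efficiency: an item x is removed from the agenda only if
\hat{p}(x) \le \beta(\mathrm{goal})
```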
63 4 HA∗ with Bridge Costs The outside scores computed by HA∗ are useful for prioritizing computation in more refined grammars. [sent-128, score-0.534]
64 The key property of these scores is that they form consistent and admissible heuristic costs for more refined grammars, but coarse outside costs are not the only quantity that satisfies this requirement. [sent-129, score-1.204]
65 As an alternative, we propose a novel “bridge” outside cost α˜ (e). [sent-130, score-0.392]
66 Intuitively, this cost represents the cost of the best derivation where rules “above” and “left” of an edge e come from Gt, and rules “below” and “right” of the edge come from Gt−1; see Figure 2 for a graphical depiction. [sent-131, score-0.35]
67 More formally, let the spine of an edge e = (At, i, j) in a derivation d be the path of nodes from the root symbol G down to At. Figure 2: A concrete example of a possible bridge outside derivation for the bridge item O˜(VPt , 1, 4). [sent-132, score-1.293]
68 The spine of the derivation is shown in bold and colored in blue. [sent-134, score-0.14]
69 Rules from a coarser grammar are shown with dotted lines, and colored in red. [sent-135, score-0.226]
70 A bridge outside derivation of e is a derivation d of G → s0 . . . si−1 At sj . . . [sent-138, score-0.75]
71 sn−1 such that every rule on or left of the spine comes from Gt, and all other rules come from Gt−1. [sent-144, score-0.164]
72 The score of the best such derivation for e is the bridge outside cost α˜(e). [sent-145, score-0.687]
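Written out, the bridge outside cost is just a min over this restricted class of derivations; the following restates the prose in the same min-sum notation used earlier.

```latex
% Bridge outside cost of an edge e = (A^t, i, j):
\tilde{\alpha}(e) = \min_{d \in \mathrm{bridge}(e)} \sum_{r \in d} w_r
% where bridge(e) contains the derivations of
%   G -> s_0 ... s_{i-1} A^t s_j ... s_{n-1}
% whose rules on or left of the spine come from G_t,
% and all of whose other rules come from G_{t-1}.
```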
73 Like ordinary outside costs, bridge outside costs form consistent and admissible estimates of the true Viterbi outside score α(e) of an edge e. [sent-146, score-1.745]
74 Because bridge costs mix rules from the finer and coarser grammar, bridge costs are at least as good an estimate of the true outside score as entirely coarse outside costs, and will in general be much tighter. [sent-147, score-2.196]
75 That is, we have α(e′) ≤ α˜(e) ≤ α(e). In particular, note that the bridge costs become better approximations farther right in the sentence, and the bridge cost of the last word in the sentence is equal to the Viterbi outside cost of that word. [sent-148, score-1.173]
76 To compute bridge outside costs, we introduce bridge outside items O˜(At, i, j), shown graphically in Figure 1(b). [sent-149, score-1.523]
77 The deduction rules which build both inside items and bridge outside items are shown in Table 2. [sent-150, score-1.611]
78 First, inside items wait for bridge outside items at the same level, while outside items wait for inside items from the previous level. [sent-152, score-2.582]
79 Second, the left and right outside deductions are no longer symmetric: bridge outside items can be extended to the left given two coarse inside items, but can only be extended to the right given an exact inside item on the left and a coarse inside item on the right. [sent-153, score-2.887]
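The asymmetry can be made explicit by writing out the weights of the two bridge deductions; as before, the argument names are ours, and π marks antecedents taken from the coarser level.

```python
def bha_out_left(w_bridge_parent, w_coarse_right_inside, w_rule):
    """BHA* OUT-L: O~(A^t,i,j) + coarse insides + rule  =>  O~(B^t,i,k).
    The right sibling C lies right of the spine, so its inside score is
    the coarse beta(pi(C),k,j); both flanking insides may be coarse."""
    return w_bridge_parent + w_coarse_right_inside + w_rule

def bha_out_right(w_bridge_parent, w_exact_left_inside, w_rule):
    """BHA* OUT-R: O~(A^t,i,j) + exact I(B^t,i,k) + rule  =>  O~(C^t,k,j).
    The left sibling B lies left of the spine, so its inside derivation
    must come from G_t: an exact inside item is required on the left."""
    return w_bridge_parent + w_exact_left_inside + w_rule
```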
80 5 Guarantees These deduction rules come with guarantees analogous to those of HA∗. [sent-155, score-0.292]
81 The efficiency guarantee remains the same, though because the intrinsic priorities are different, the set of items processed will be different from those processed by HA∗. [sent-157, score-0.389]
82 The key property of HA∗ needed for these proofs is that coarse outside costs form consistent and admissible heuristics for inside items, and exact inside costs form consistent and admissible heuristics for outside items. [sent-160, score-2.158]
83 BHA∗ also has this property, with bridge outside costs forming admissible and consistent heuristics for inside items, and coarse inside costs forming admissible and consistent heuristics for outside items. [sent-161, score-2.459]
84 In fact, BHA∗ has the potential to be slower than HA∗ – BHA∗ Figure 3 (y-axis: items pushed, in millions): Performance of HA∗ and BHA∗ as a function of increasing refinement of the coarse grammar. [sent-164, score-0.253]
85 Along the x-axis, we show which coarse grammars were used in the hierarchy. [sent-169, score-0.376]
86 For example, 3-5 indicates the 3-,4-, and 5-split grammars were used as coarse grammars. [sent-170, score-0.376]
87 builds both inside and bridge outside items under the target grammar, where HA∗ only builds inside items. [sent-171, score-1.447]
88 It is an empirical, grammar- and hierarchy-dependent question whether the increased tightness of the outside estimates outweighs the additional cost needed to compute them. [sent-172, score-0.489]
89 We demonstrate empirically in this section that for hierarchies with very loosely approximating coarse grammars, BHA∗ can outperform HA∗, while for hierarchies with good approximations, performance of the two algorithms is comparable. [sent-173, score-0.438]
90 We performed experiments with the grammars of Petrov et al. (2006). [sent-174, score-0.123]
91 The training procedure for these grammars produces a hierarchy of increasingly refined grammars through state-splitting, so a natural projection function πt is given. [sent-176, score-0.477]
92 We used the Berkeley Parser to learn such grammars from Sections 2-21 of the Penn Treebank (Marcus et al., 1993). [sent-177, score-0.123]
93 We tested these grammars on 300 sentences of length ≤ 25 of Section 23 of the Treebank. [sent-180, score-0.123]
94 In our first experiment, we construct 2-level hierarchies consisting of one coarse grammar and the target grammar. [sent-184, score-0.398]
95 By varying the coarse grammar from the 0-split (X-bar) through 5-split grammars, we can investigate the performance of each algorithm as a function of the coarseness of the coarse grammar. [sent-185, score-0.609]
96 We follow Pauls and Klein (2009) in using the number of items pushed as a machine- and implementation-independent measure of speed. [sent-186, score-0.28]
97 In Figure 3, we show the performance of HA∗ and BHA∗ as a function of the total number of items pushed onto the agenda. [sent-187, score-0.297]
98 We see that for very coarse approximating grammars, BHA∗ substantially outperforms HA∗, but for more refined approximating grammars the performance is comparable, with HA∗ slightly outperforming BHA∗ on the 3-split grammar. [sent-188, score-0.545]
99 We constructed two multi-level hierarchies: a 4-level hierarchy consisting of the 3-, 4-, 5-, and 6-split grammars, and a 7-level hierarchy consisting of all grammars. [sent-190, score-0.209]
100 Our results echo the results of Pauls and Klein (2009): although the addition of the reasonably refined 4- and 5-split grammars produces modest performance gains, the addition of coarser grammars can actually hurt overall performance. [sent-192, score-0.433]
wordName wordTfidf (topN-words)
[('bha', 0.41), ('outside', 0.356), ('ha', 0.291), ('bridge', 0.272), ('inside', 0.264), ('coarse', 0.253), ('items', 0.243), ('gt', 0.215), ('deduction', 0.181), ('costs', 0.17), ('item', 0.167), ('grammars', 0.123), ('coarser', 0.106), ('finer', 0.101), ('priority', 0.091), ('hierarchy', 0.091), ('wr', 0.088), ('refined', 0.081), ('admissible', 0.077), ('pauls', 0.076), ('hierarchies', 0.075), ('vpt', 0.075), ('agenda', 0.073), ('viterbi', 0.071), ('grammar', 0.07), ('tighter', 0.066), ('felzenszwalb', 0.065), ('derivation', 0.061), ('projection', 0.059), ('scores', 0.056), ('coarsest', 0.056), ('edge', 0.055), ('guarantee', 0.054), ('heuristics', 0.053), ('rules', 0.052), ('spine', 0.049), ('wait', 0.049), ('weight', 0.047), ('mix', 0.045), ('mcallester', 0.045), ('prioritized', 0.042), ('bt', 0.041), ('wn', 0.039), ('outweighs', 0.037), ('pushed', 0.037), ('cost', 0.036), ('klein', 0.036), ('estimates', 0.036), ('approximating', 0.035), ('rule', 0.034), ('sn', 0.034), ('ct', 0.034), ('underline', 0.033), ('coarseness', 0.033), ('sj', 0.031), ('approximations', 0.031), ('guarantees', 0.03), ('colored', 0.03), ('proof', 0.029), ('come', 0.029), ('deductions', 0.028), ('sacrificing', 0.028), ('constructed', 0.027), ('popped', 0.027), ('monotonicity', 0.027), ('intrinsic', 0.025), ('gm', 0.025), ('queue', 0.025), ('consistent', 0.024), ('compute', 0.024), ('builds', 0.024), ('processed', 0.024), ('forming', 0.023), ('auxiliary', 0.023), ('score', 0.023), ('spans', 0.023), ('shieber', 0.022), ('computed', 0.022), ('symbol', 0.021), ('hierarchical', 0.021), ('executed', 0.021), ('antecedents', 0.021), ('dotted', 0.02), ('true', 0.02), ('si', 0.02), ('parsing', 0.019), ('red', 0.019), ('computation', 0.019), ('efficiency', 0.019), ('relaxed', 0.019), ('chart', 0.019), ('petrov', 0.018), ('substantially', 0.018), ('property', 0.017), ('correctness', 0.017), ('onto', 0.017), ('porf', 0.016), ('interleaving', 0.016), ('aic', 0.016), ('adpaul', 0.016)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 131 acl-2010-Hierarchical A* Parsing with Bridge Outside Scores
Author: Adam Pauls ; Dan Klein
Abstract: Hierarchical A∗ (HA∗) uses a hierarchy of coarse grammars to speed up parsing without sacrificing optimality. HA∗ prioritizes search in refined grammars using Viterbi outside costs computed in coarser grammars. We present Bridge Hierarchical A∗ (BHA∗), a modified Hierarchical A∗ algorithm which computes a novel outside cost called a bridge outside cost. These bridge costs mix finer outside scores with coarser inside scores, and thus constitute tighter heuristics than entirely coarse scores. We show that BHA∗ substantially outperforms HA∗ when the hierarchy contains only very coarse grammars, while achieving comparable performance on more refined hierarchies.
2 0.45560482 236 acl-2010-Top-Down K-Best A* Parsing
Author: Adam Pauls ; Dan Klein ; Chris Quirk
Abstract: We propose a top-down algorithm for extracting k-best lists from a parser. Our algorithm, TKA∗ is a variant of the kbest A∗ (KA∗) algorithm of Pauls and Klein (2009). In contrast to KA∗, which performs an inside and outside pass before performing k-best extraction bottom up, TKA∗ performs only the inside pass before extracting k-best lists top down. TKA∗ maintains the same optimality and efficiency guarantees of KA∗, but is simpler to both specify and implement.
3 0.091431871 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
Author: Mohit Bansal ; Dan Klein
Abstract: We present a simple but accurate parser which exploits both large tree fragments and symbol refinement. We parse with all fragments of the training set, in contrast to much recent work on tree selection in data-oriented parsing and treesubstitution grammar learning. We require only simple, deterministic grammar symbol refinement, in contrast to recent work on latent symbol refinement. Moreover, our parser requires no explicit lexicon machinery, instead parsing input sentences as character streams. Despite its simplicity, our parser achieves accuracies of over 88% F1 on the standard English WSJ task, which is competitive with substantially more complicated state-of-theart lexicalized and latent-variable parsers. Additional specific contributions center on making implicit all-fragments parsing efficient, including a coarse-to-fine inference scheme and a new graph encoding.
4 0.08289656 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing
Author: Liang Huang ; Kenji Sagae
Abstract: Incremental parsing techniques such as shift-reduce have gained popularity thanks to their efficiency, but there remains a major problem: the search is greedy and only explores a tiny fraction of the whole space (even with beam search) as opposed to dynamic programming. We show that, surprisingly, dynamic programming is in fact possible for many shift-reduce parsers, by merging “equivalent” stacks based on feature values. Empirically, our algorithm yields up to a five-fold speedup over a state-of-the-art shift-reduce depen- dency parser with no loss in accuracy. Better search also leads to better learning, and our final parser outperforms all previously reported dependency parsers for English and Chinese, yet is much faster.
5 0.060797576 255 acl-2010-Viterbi Training for PCFGs: Hardness Results and Competitiveness of Uniform Initialization
Author: Shay Cohen ; Noah A Smith
Abstract: We consider the search for a maximum likelihood assignment of hidden derivations and grammar weights for a probabilistic context-free grammar, the problem approximately solved by “Viterbi training.” We show that solving and even approximating Viterbi training for PCFGs is NP-hard. We motivate the use of uniformat-random initialization for Viterbi EM as an optimal initializer in absence of further information about the correct model parameters, providing an approximate bound on the log-likelihood.
6 0.055620063 53 acl-2010-Blocked Inference in Bayesian Tree Substitution Grammars
7 0.055470526 87 acl-2010-Discriminative Modeling of Extraction Sets for Machine Translation
8 0.051728822 169 acl-2010-Learning to Translate with Source and Target Syntax
9 0.051712032 195 acl-2010-Phylogenetic Grammar Induction
10 0.050056145 128 acl-2010-Grammar Prototyping and Testing with the LinGO Grammar Matrix Customization System
11 0.048601802 260 acl-2010-Wide-Coverage NLP with Linguistically Expressive Grammars
12 0.044233564 69 acl-2010-Constituency to Dependency Translation with Forests
13 0.043604709 46 acl-2010-Bayesian Synchronous Tree-Substitution Grammar Induction and Its Application to Sentence Compression
14 0.039237637 162 acl-2010-Learning Common Grammar from Multilingual Corpus
15 0.037703238 220 acl-2010-Syntactic and Semantic Factors in Processing Difficulty: An Integrated Measure
16 0.036454845 118 acl-2010-Fine-Grained Tree-to-String Translation Rule Extraction
17 0.036048487 71 acl-2010-Convolution Kernel over Packed Parse Forest
18 0.035981156 88 acl-2010-Discriminative Pruning for Discriminative ITG Alignment
19 0.035978198 265 acl-2010-cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models
20 0.035336453 124 acl-2010-Generating Image Descriptions Using Dependency Relational Patterns
topicId topicWeight
[(0, -0.106), (1, -0.043), (2, 0.029), (3, -0.025), (4, -0.08), (5, -0.075), (6, 0.122), (7, 0.01), (8, 0.01), (9, -0.071), (10, -0.058), (11, -0.055), (12, 0.007), (13, -0.053), (14, -0.051), (15, 0.011), (16, -0.027), (17, 0.013), (18, -0.168), (19, 0.132), (20, -0.054), (21, 0.134), (22, 0.146), (23, -0.064), (24, -0.044), (25, 0.355), (26, -0.181), (27, 0.323), (28, -0.13), (29, -0.054), (30, 0.198), (31, 0.15), (32, 0.042), (33, -0.32), (34, -0.063), (35, 0.091), (36, -0.027), (37, -0.115), (38, -0.044), (39, -0.025), (40, 0.05), (41, -0.006), (42, -0.017), (43, 0.057), (44, -0.052), (45, 0.007), (46, 0.04), (47, -0.019), (48, -0.027), (49, -0.073)]
simIndex simValue paperId paperTitle
same-paper 1 0.99206227 131 acl-2010-Hierarchical A* Parsing with Bridge Outside Scores
Author: Adam Pauls ; Dan Klein
Abstract: Hierarchical A∗ (HA∗) uses a hierarchy of coarse grammars to speed up parsing without sacrificing optimality. HA∗ prioritizes search in refined grammars using Viterbi outside costs computed in coarser grammars. We present Bridge Hierarchical A∗ (BHA∗), a modified Hierarchical A∗ algorithm which computes a novel outside cost called a bridge outside cost. These bridge costs mix finer outside scores with coarser inside scores, and thus constitute tighter heuristics than entirely coarse scores. We show that BHA∗ substantially outperforms HA∗ when the hierarchy contains only very coarse grammars, while achieving comparable performance on more refined hierarchies.
2 0.96286994 236 acl-2010-Top-Down K-Best A* Parsing
Author: Adam Pauls ; Dan Klein ; Chris Quirk
Abstract: We propose a top-down algorithm for extracting k-best lists from a parser. Our algorithm, TKA∗ is a variant of the kbest A∗ (KA∗) algorithm of Pauls and Klein (2009). In contrast to KA∗, which performs an inside and outside pass before performing k-best extraction bottom up, TKA∗ performs only the inside pass before extracting k-best lists top down. TKA∗ maintains the same optimality and efficiency guarantees of KA∗, but is simpler to both specify and implement.
3 0.29759237 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
Author: Mohit Bansal ; Dan Klein
Abstract: We present a simple but accurate parser which exploits both large tree fragments and symbol refinement. We parse with all fragments of the training set, in contrast to much recent work on tree selection in data-oriented parsing and treesubstitution grammar learning. We require only simple, deterministic grammar symbol refinement, in contrast to recent work on latent symbol refinement. Moreover, our parser requires no explicit lexicon machinery, instead parsing input sentences as character streams. Despite its simplicity, our parser achieves accuracies of over 88% F1 on the standard English WSJ task, which is competitive with substantially more complicated state-of-theart lexicalized and latent-variable parsers. Additional specific contributions center on making implicit all-fragments parsing efficient, including a coarse-to-fine inference scheme and a new graph encoding.
4 0.28344223 186 acl-2010-Optimal Rank Reduction for Linear Context-Free Rewriting Systems with Fan-Out Two
Author: Benoit Sagot ; Giorgio Satta
Abstract: Linear Context-Free Rewriting Systems (LCFRSs) are a grammar formalism capable of modeling discontinuous phrases. Many parsing applications use LCFRSs where the fan-out (a measure of the discontinuity of phrases) does not exceed 2. We present an efficient algorithm for optimal reduction of the length of production right-hand side in LCFRSs with fan-out at most 2. This results in asymptotical running time improvement for known parsing algorithms for this class.
5 0.27026048 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing
Author: Liang Huang ; Kenji Sagae
Abstract: Incremental parsing techniques such as shift-reduce have gained popularity thanks to their efficiency, but there remains a major problem: the search is greedy and only explores a tiny fraction of the whole space (even with beam search) as opposed to dynamic programming. We show that, surprisingly, dynamic programming is in fact possible for many shift-reduce parsers, by merging “equivalent” stacks based on feature values. Empirically, our algorithm yields up to a five-fold speedup over a state-of-the-art shift-reduce depen- dency parser with no loss in accuracy. Better search also leads to better learning, and our final parser outperforms all previously reported dependency parsers for English and Chinese, yet is much faster.
6 0.26954594 92 acl-2010-Don't 'Have a Clue'? Unsupervised Co-Learning of Downward-Entailing Operators.
7 0.25290301 255 acl-2010-Viterbi Training for PCFGs: Hardness Results and Competitiveness of Uniform Initialization
8 0.2204017 137 acl-2010-How Spoken Language Corpora Can Refine Current Speech Motor Training Methodologies
9 0.20051168 53 acl-2010-Blocked Inference in Bayesian Tree Substitution Grammars
10 0.20045327 7 acl-2010-A Generalized-Zero-Preserving Method for Compact Encoding of Concept Lattices
11 0.17890513 111 acl-2010-Extracting Sequences from the Web
12 0.17327233 98 acl-2010-Efficient Staggered Decoding for Sequence Labeling
13 0.17201239 222 acl-2010-SystemT: An Algebraic Approach to Declarative Information Extraction
14 0.16556576 126 acl-2010-GernEdiT - The GermaNet Editing Tool
15 0.16307566 61 acl-2010-Combining Data and Mathematical Models of Language Change
16 0.15966533 250 acl-2010-Untangling the Cross-Lingual Link Structure of Wikipedia
17 0.1520153 88 acl-2010-Discriminative Pruning for Discriminative ITG Alignment
18 0.15174177 63 acl-2010-Comparable Entity Mining from Comparative Questions
19 0.15134495 254 acl-2010-Using Speech to Reply to SMS Messages While Driving: An In-Car Simulator User Study
20 0.14937799 128 acl-2010-Grammar Prototyping and Testing with the LinGO Grammar Matrix Customization System
topicId topicWeight
[(14, 0.02), (25, 0.068), (33, 0.029), (34, 0.29), (42, 0.011), (59, 0.097), (68, 0.134), (73, 0.026), (76, 0.014), (78, 0.03), (83, 0.052), (84, 0.054), (98, 0.077)]
simIndex simValue paperId paperTitle
same-paper 1 0.79767436 131 acl-2010-Hierarchical A* Parsing with Bridge Outside Scores
Author: Adam Pauls ; Dan Klein
Abstract: Hierarchical A∗ (HA∗) uses a hierarchy of coarse grammars to speed up parsing without sacrificing optimality. HA∗ prioritizes search in refined grammars using Viterbi outside costs computed in coarser grammars. We present Bridge Hierarchical A∗ (BHA∗), a modified Hierarchical A∗ algorithm which computes a novel outside cost called a bridge outside cost. These bridge costs mix finer outside scores with coarser inside scores, and thus constitute tighter heuristics than entirely coarse scores. We show that BHA∗ substantially outperforms HA∗ when the hierarchy contains only very coarse grammars, while achieving comparable performance on more refined hierarchies.
2 0.72929561 235 acl-2010-Tools for Multilingual Grammar-Based Translation on the Web
Author: Aarne Ranta ; Krasimir Angelov ; Thomas Hallgren
Abstract: This is a system demo for a set of tools for translating texts between multiple languages in real time with high quality. The translation works on restricted languages, and is based on semantic interlinguas. The underlying model is GF (Grammatical Framework), which is an open-source toolkit for multilingual grammar implementations. The demo will cover up to 20 parallel languages. Two related sets of tools are presented: grammarian’s tools helping to build translators for new domains and languages, and translator’s tools helping to translate documents. The grammarian’s tools are designed to make it easy to port the technique to new applications. The translator’s tools are essential in the restricted language context, enabling the author to remain in the fragments recognized by the system. The tools that are demonstrated will be ap- plied and developed further in the European project MOLTO (Multilingual On-Line Translation) which has started in March 2010 and runs for three years. 1 Translation Needs for the Web The best-known translation tools on the web are Google translate1 and Systran2. They are targeted to consumers of web documents: users who want to find out what a given document is about. For this purpose, browsing quality is sufficient, since the user has intelligence and good will, and understands that she uses the translation at her own risk. Since Google and Systran translations can be grammatically and semantically flawed, they don’t reach publication quality, and cannot hence be used by the producers of web documents. For instance, the provider of an e-commerce site cannot take the risk that the product descriptions or selling conditions have errors that change the original intentions. There are very few automatic translation systems actually in use for producers of information. As already 1www .google . com/t rans l e at 2www. systransoft . com noted by Bar-Hillel (1964), machine translation is one of those AI-complete tasks that involves a trade-off between coverage and precision, and the current mainstream systems opt for coverage. This is also what web users expect: they want to be able to throw just anything at the translation system and get something useful back. Precision-oriented approaches, the prime example of which is METEO (Chandioux 1977), have not been popular in recent years. However, from the producer’s point of view, large coverage is not essential: unlike the consumer’s tools, their input is predictable, and can be restricted to very specific domains, and to content that the producers themselves are creating in the first place. But even in such tasks, two severe problems remain: • • The development cost problem: a large amount oTfh ew dorekv eisl onpemedeendt f coors building tmra:n asl laatorgrse afomr new domains and new languages. The authoring problem: since the method does nTohte ew aourkth foorri nalgl input, etmhe: :asu tihnocer othfe eth me source toexest of translation may need special training to write in a way that can be translated at all. These two problems have probably been the main obstacles to making high-quality restricted language translation more wide-spread in tasks where it would otherwise be applicable. We address these problems by providing tools that help developers of translation systems on the one hand, and authors and translators—i.e. the users of the systems—on the other. 
In the MOLTO project (Multilingual On-Line Translation)3, we have the goal to improve both the development and use of restricted language translation by an order of magnitude, as compared with the state of the art. As for development costs, this means that a system for many languages and with adequate quality can be built in a matter of days rather than months. As for authoring, this means that content production does not require the use of manuals or involve trial and error, both of which can easily make the work ten times slower than normal writing. In the proposed system demo, we will show how some of the building blocks for MOLTO can already now be used in web-based translators, although on a 3 www.molto-project .eu 66 UppsalaP,r Sowceeeddenin,g 1s3 o Jfu tlhye 2 A0C1L0. 2 ?c 01200 S1y0s Atesmso Dcieamtioonns ftorart Cioonms,p puatagteiso 6n6a–l7 L1in,guistics Figure 1: A multilingual GF grammar with reversible mappings from a common abstract syntax to the 15 languages currently available in the GF Resource Grammar Library. smaller scale as regards languages and application domains. A running demo system is available at http : / / grammat i cal framework .org : 4 1 9 6. 2 2 Multilingual Grammars The translation tools are based on GF, Grammatical Framework4 (Ranta 2004). GF is a grammar formalism—that is, a mathematical model of natural language, equipped with a formal notation for writing grammars and a computer program implementing parsing and generation which are declaratively defined by grammars. Thus GF is comparable with formalism such as HPSG (Pollard and Sag 1994), LFG (Bresnan 1982) or TAG (Joshi 1985). The novel feature of GF is the notion of multilingual grammars, which describe several languages simultaneously by using a common representation called abstract syntax; see Figure 1. In a multilingual GF grammar, meaning-preserving translation is provided as a composition of parsing and generation via the abstract syntax, which works as an interlingua. This model of translation is different from approaches based on other comparable grammar formalisms, such as synchronous TAGs (Shieber and Schabes 1990), Pargram (Butt & al. 2002, based on LFG), LINGO Matrix (Bender and Flickinger 2005, based on HPSG), and CLE (Core Language Engine, Alshawi 1992). These approaches use transfer rules between individual languages, separate for each pair of languages. Being interlingua-based, GF translation scales up linearly to new languages without the quadratic blowup of transfer-based systems. In transfer-based sys- 4www.grammaticalframework.org tems, as many as n(n − 1) components (transfer functtieomnss), are naeneyde ads nto( cover a)l cl language pairs nisnf bero tfhu ndci-rections. In an interlingua-based system, 2n + 1components are enough: the interlingua itself, plus translations in both directions between each language and the interlingua. However, in GF, n + 1 components are sufficient, because the mappings from the abstract syntax to each language (the concrete syntaxes) are reversible, i.e. usable for both generation and parsing. Multilingual GF grammars can be seen as an implementation of Curry’s distinction between tectogrammatical and phenogrammatical structure (Curry 1961). In GF, the tectogrammatical structure is called abstract syntax, following standard computer science terminology. It is defined by using a logical framework (Harper & al. 1993), whose mathematical basis is in the type theory of Martin-L o¨f (1984). 
Two things can be noted about this architecture, both showing im- provements over state-of-the-art grammar-based translation methods. First, the translation interlingua (the abstract syntax) is a powerful logical formalism, able to express semantical structures such as context-dependencies and anaphora (Ranta 1994). In particular, dependent types make it more expressive than the type theory used in Montague grammar (Montague 1974) and employed in the Rosetta translation project (Rosetta 1998). Second, GF uses a framework for interlinguas, rather than one universal interlingua. This makes the interlingual approach more light-weight and feasible than in systems assuming one universal interlingua, such as Rosetta and UNL, Universal Networking Language5 . It also gives more precision to special-purpose translation: the interlingua of a GF translation system (i.e. the abstract syntax of a multilingual grammar) can encode precisely those structures and distinctions that are relevant for the task at hand. Thus an interlingua for mathematical proofs (Hallgren and Ranta 2000) is different from one for commands for operating an MP3 player (Perera and Ranta 2007). The expressive power of the logical framework is sufficient for both kinds of tasks. One important source of inspiration for GF was the WYSIWYM system (Power and Scott 1998), which used domain-specific interlinguas and produced excellent quality in multilingual generation. But the generation components were hard-coded in the program, instead of being defined declaratively as in GF, and they were not usable in the direction of parsing. 3 Grammars and Ontologies Parallel to the first development efforts of GF in the late 1990’s, another framework idea was emerging in web technology: XML, Extensible Mark-up Language, which unlike HTML is not a single mark-up language but a framework for creating custom mark-up lan5www .undl .org 67 guages. The analogy between GF and XML was seen from the beginning, and GF was designed as a formalism for multilingual rendering of semantic content (Dymetman and al. 2000). XML originated as a format for structuring documents and structured data serialization, but a couple ofits descendants, RDF(S) and OWL, developed its potential to formally express the semantics of data and content, serving as the fundaments of the emerging Semantic Web. Practically any meaning representation format can be converted into GF’s abstract syntax, which can then be mapped to different target languages. In particular the OWL language can be seen as a syntactic sugar for a subset of Martin-L o¨f’s type theory so it is trivial to embed it in GF’s abstract syntax. The translation problem defined in terms of an ontology is radically different from the problem of translating plain text from one language to another. Many of the projects in which GF has been used involve precisely this: a meaning representation formalized as GF abstract syntax. Some projects build on previously existing meaning representation and address mathematical proofs (Hallgren and Ranta 2000), software specifications (Beckert & al. 2007), and mathematical exercises (the European project WebALT6). Other projects start with semantic modelling work to build meaning representations from scratch, most notably ones for dialogue systems (Perera and Ranta 2007) in the European project TALK7. Yet another project, and one closest to web translation, is the multilingual Wiki system presented in (Meza Moreno and Bringert 2008). 
In this system, users can add and modify reviews of restaurants in three languages (English, Spanish, and Swedish). Any change made in any of the languages gets automatically translated to the other languages. To take an example, the OWL-to-GF mapping trans- lates OWL’s classes to GF’s categories and OWL’s properties to GF’s functions that return propositions. As a running example in this and the next section, we will use the class of integers and the two-place property of being divisible (“x is divisible by y”). The correspondences are as follows: Clas s (pp : intege r . . . ) m catm integer Ob j e ctP roperty ( pp :div domain (pp : int ege r ) range ( pp :integer ) ) m funm div : int eger -> 4 int ege r -> prop Grammar Engineer’s Tools In the GF setting, building a multilingual translation system is equivalent to building a multilingual GF 6EDC-22253, webalt .math .he l inki . fi s 7IST-507802, 2004–2006, www .t alk-pro j e ct .org grammar, which in turn consists of two kinds of components: • a language-independent abstract syntax, giving tahe l snegmuaangtei-ci nmdeopdeenl dveinat tw ahbisctrha ctrtan ssylnattiaoxn, gisi performed; • for each language, a concrete syntax mapping abfstorrac eta syntax turaegese ,t oa strings ien s tyhnatta language. While abstract syntax construction is an extra task compared to many other kinds of translation methods, it is technically relatively simple, and its cost is moreover amortized as the system is extended to new languages. Concrete syntax construction can be much more demanding in terms of programming skills and linguistic knowledge, due to the complexity of natural languages. This task is where GF claims perhaps the highest advantage over other approaches to special-purpose grammars. The two main assets are: • • Programming language support: GF is a modern fPuroncgtriaomnaml programming language, w isith a a powerful type system and module system supporting modular and collaborative programming and reuse of code. RGL, the GF Resource Grammar Library, implementing Fthe R bsoausicrc linguistic dre Ltaiiblsr orfy l iamn-guages: inflectional morphology and syntactic combination functions. The RGL covers fifteen languages at the moment, shown in Figure 1; see also Khegai 2006, El Dada and Ranta 2007, Angelov 2008, Ranta 2009a,b, and Enache et al. 2010. To give an example of what the library provides, let us first consider the inflectional morphology. It is presented as a set of lexicon-building functions such as, in English, mkV : St r -> V i.e. function mkV, which takes a string (St r) as its argument and returns a verb (V) as its value. The verb is, internally, an inflection table containing all forms of a verb. The function mkV derives all these forms from its argument string, which is the infinitive form. It predicts all regular variations: (mkV
3 0.584813 17 acl-2010-A Structured Model for Joint Learning of Argument Roles and Predicate Senses
Author: Yotaro Watanabe ; Masayuki Asahara ; Yuji Matsumoto
Abstract: In predicate-argument structure analysis, it is important to capture non-local dependencies among arguments and interdependencies between the sense of a predicate and the semantic roles of its arguments. However, no existing approach explicitly handles both non-local dependencies and semantic dependencies between predicates and arguments. In this paper we propose a structured model that overcomes the limitation of existing approaches; the model captures both types of dependencies simultaneously by introducing four types of factors including a global factor type capturing non-local dependencies among arguments and a pairwise factor type capturing local dependencies between a predicate and an argument. In experiments the proposed model achieved competitive results compared to the stateof-the-art systems without applying any feature selection procedure.
4 0.55933475 236 acl-2010-Top-Down K-Best A* Parsing
Author: Adam Pauls ; Dan Klein ; Chris Quirk
Abstract: We propose a top-down algorithm for extracting k-best lists from a parser. Our algorithm, TKA∗ is a variant of the kbest A∗ (KA∗) algorithm of Pauls and Klein (2009). In contrast to KA∗, which performs an inside and outside pass before performing k-best extraction bottom up, TKA∗ performs only the inside pass before extracting k-best lists top down. TKA∗ maintains the same optimality and efficiency guarantees of KA∗, but is simpler to both specify and implement.
5 0.5240407 44 acl-2010-BabelNet: Building a Very Large Multilingual Semantic Network
Author: Roberto Navigli ; Simone Paolo Ponzetto
Abstract: In this paper we present BabelNet a very large, wide-coverage multilingual semantic network. The resource is automatically constructed by means of a methodology that integrates lexicographic and encyclopedic knowledge from WordNet and Wikipedia. In addition Machine Translation is also applied to enrich the resource with lexical information for all languages. We conduct experiments on new and existing gold-standard datasets to show the high quality and coverage of the resource. –
6 0.45849085 156 acl-2010-Knowledge-Rich Word Sense Disambiguation Rivaling Supervised Systems
7 0.43396378 169 acl-2010-Learning to Translate with Source and Target Syntax
8 0.43393683 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
9 0.43099952 162 acl-2010-Learning Common Grammar from Multilingual Corpus
10 0.4293865 65 acl-2010-Complexity Metrics in an Incremental Right-Corner Parser
11 0.42780855 53 acl-2010-Blocked Inference in Bayesian Tree Substitution Grammars
12 0.42763826 172 acl-2010-Minimized Models and Grammar-Informed Initialization for Supertagging with Highly Ambiguous Lexicons
13 0.42620367 216 acl-2010-Starting from Scratch in Semantic Role Labeling
14 0.4256258 114 acl-2010-Faster Parsing by Supertagger Adaptation
15 0.42470527 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification
16 0.42345601 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
17 0.42332441 59 acl-2010-Cognitively Plausible Models of Human Language Processing
18 0.42156532 185 acl-2010-Open Information Extraction Using Wikipedia
19 0.42129248 237 acl-2010-Topic Models for Word Sense Disambiguation and Token-Based Idiom Detection
20 0.42116585 136 acl-2010-How Many Words Is a Picture Worth? Automatic Caption Generation for News Images