emnlp emnlp2010 emnlp2010-116 knowledge-graph by maker-knowledge-mining

116 emnlp-2010-Using Universal Linguistic Knowledge to Guide Grammar Induction


Source: pdf

Author: Tahira Naseem ; Harr Chen ; Regina Barzilay ; Mark Johnson

Abstract: We present an approach to grammar induction that utilizes syntactic universals to improve dependency parsing across a range of languages. Our method uses a single set of manually-specified language-independent rules that identify syntactic dependencies between pairs of syntactic categories that commonly occur across languages. During inference of the probabilistic model, we use posterior expectation constraints to require that a minimum proportion of the dependencies we infer be instances of these rules. We also automatically refine the syntactic categories given in our coarsely tagged input. Across six languages our approach outperforms state-of-the-art unsupervised methods by a significant margin.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Our method uses a single set of manually-specified language-independent rules that identify syntactic dependencies between pairs of syntactic categories that commonly occur across languages. [sent-2, score-0.457]

2 During inference of the probabilistic model, we use posterior expectation constraints to require that a minimum proportion of the dependencies we infer be instances of these rules. [sent-3, score-0.632]

3 In this paper, we present an alternative grammar induction approach that exploits these structural correspondences by declaratively encoding a small set of universal dependency rules. [sent-24, score-0.776]

4 , high-level part-of-speech tags) and a set of universal rules defined over these categories, such as those in Table 1. [sent-27, score-0.561]

5 These rules incorporate the definitional properties of syntactic categories in terms of their interdependencies and thus are universal across languages. [sent-28, score-0.661]

6 They can potentially help disambiguate structural ambiguities that are difficult to learn from data alone; for example, our rules prefer analyses in which verbs are dependents of auxiliaries, even though analyzing auxiliaries as dependents of verbs is also consistent with the data. [sent-29, score-0.271]

7 Leveraging these universal rules has the potential to improve parsing performance for a large number of human languages; this is particularly relevant to the processing of low-resource languages. [sent-30, score-0.596]

8 Furthermore, these universal rules are compact and well-understood, making them easy to manually construct. [sent-33, score-0.561]

9 In addition to these universal dependencies, each specific language typically possesses its own idiosyncratic set of dependencies. [sent-34, score-0.326]

10 We address this challenge by requiring the universal constraints to only hold in expectation rather than absolutely. [sent-35, score-0.675]

11 We formulate a generative Bayesian model that explains the observed data while accounting for declarative linguistic rules during inference. [sent-38, score-0.383]

12 These rules are used as expectation constraints on the posterior distribution over dependency structures. [sent-39, score-0.803]

13 , 2009), which we apply to a variational inference algorithm for our parsing model. [sent-41, score-0.256]

14 Since the universals guide induction toward linguistically plausible structures, automatic refinement becomes feasible even in the absence of manually annotated syntactic trees. [sent-43, score-0.519]

15 We test the effectiveness of our grammar induction model on six Indo-European languages from three language groups: English, Danish, Portuguese, Slovene, Spanish, and Swedish. [sent-44, score-0.301]

16 Our results demonstrate that universal rules greatly improve the accuracy of dependency parsing across all of these languages, outperforming current state-of-the-art unsupervised grammar induction methods (Headden III et al. [sent-46, score-1.014]

17 The way we apply constraints is closest to the latter two approaches of posterior regularization and generalized expectation criteria. [sent-53, score-0.495]

18 In the posterior regularization framework, constraints are expressed in the form of expectations on posteriors (Graça et al. [sent-54, score-0.257]

19 Our approach also expresses constraints as expectations on the posterior; we utilize the machinery of their framework within a variational inference algorithm with a mean field approximation. [sent-61, score-0.376]

20 Generalized expectation criteria, another technique for declaratively specifying expectation constraints, has previously been successfully applied to the task of dependency parsing (Druck et al. [sent-62, score-0.704]

21 Furthermore, we find that our method outperforms the generalized expectation approach using corpus-specific constraints. [sent-69, score-0.278]

22 As with their work, we also use nonparametric priors for category refinement and employ variational methods for inference. [sent-75, score-0.342]

23 However, our goal is to apply category refinement to dependency parsing, rather than to PCFGs, requiring a substantially different model formulation. [sent-76, score-0.283]

24 On the acquisition side, Daumé III and Campbell (2007) proposed a computational technique for discovering universal implications in typological features. [sent-81, score-0.389]

25 We also argue that cross-language universals are beneficial for automatic language processing; however, our focus is on learning language-specific adaptations of these rules from data. [sent-84, score-0.403]

26 Section 3 (Model): The central hypothesis of this work is that unsupervised dependency grammar induction can be improved using universal linguistic knowledge. [sent-85, score-0.752]

27 Toward this end our approach is comprised of two components: a probabilistic model that explains how sentences are generated from latent dependency structures and a technique for incorporating declarative rules into the inference process. [sent-86, score-0.45]

28 Each node of the dependency tree is comprised of three random variables: an observed coarse symbol s, a hidden refined subsymbol z, and an observed word x. [sent-95, score-0.909]

29 In the following let the parent of the current node have symbol s0 and subsymbol z0; the root node is generated from separate root-specific distributions. [sent-96, score-0.762]
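To make the node structure concrete, here is a minimal Python sketch (not the authors' code) of the three per-node random variables described in the two sentences above; the class and field names are illustrative.

```python
# A minimal sketch (not the authors' code) of the per-node variables described
# above: observed coarse symbol s, hidden refined subsymbol z, observed word x,
# plus left/right dependents forming the dependency tree. Names are illustrative.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    s: str                      # observed coarse symbol (e.g. a coarse POS tag)
    z: Optional[int] = None     # hidden refined subsymbol (to be inferred)
    x: Optional[str] = None     # observed word
    left: List["Node"] = field(default_factory=list)   # left dependents
    right: List["Node"] = field(default_factory=list)  # right dependents

# Example: a root verb with one noun dependent on its left.
root = Node(s="VERB", z=0, x="barks", left=[Node(s="NOUN", z=1, x="dog")])
```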

30 As we explain at the end of this section, without this aspect the generative story closely resembles the classic dependency model with valence (DMV) of Klein and Manning (2004). [sent-98, score-0.275]

31 First we draw symbol s from a finite multinomial with parameters θs0z0c. (Figure 1: Graphical representation of the model and a summary of the notation.) [sent-99, score-0.273]

32 There is a copy of the outer plate for each distinct symbol in the observed coarse tags. [sent-100, score-0.37]

33 As the indices indicate, we have one such set of multinomial parameters for every combination of parent symbol s0 and subsymbol z0 along with a context c. [sent-106, score-0.677]

34 Next we draw the refined syntactic category subsymbol z from an infinite multinomial with parameters πss0z0c. [sent-109, score-0.529]

35 Here the selection of π is indexed by the current node’s coarse symbol s, the symbol s0 and subsymbol z0 of the parent node, and the context c of the current node. [sent-110, score-1.002]

36 For each unique coarse symbol s we tie together the distributions πss0z0c for all possible parent and context combinations. [sent-111, score-0.459]

37 By formulating the generation of z as an HDP, we can share parameters for a single coarse symbol’s subsymbol distribution while allowing for individual variability based on node parent and context. [sent-116, score-0.611]

38 Note that parameters are not shared across different coarse symbols, preserving the distinctions expressed via the coarse tag annotations. [sent-117, score-0.284]
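The parameter-tying idea can be illustrated with a rough Python sketch: a truncated stick-breaking draw provides a shared base distribution over subsymbols for one coarse symbol, and each parent/context combination gets its own Dirichlet-perturbed copy. The truncation level, hyperparameters, and context keys are assumptions for illustration, not the paper's settings, and the paper's actual HDP inference is variational rather than sampled.

```python
# A rough sketch of the tying idea: for one coarse symbol, a truncated
# stick-breaking (GEM) draw gives a shared base distribution over subsymbols,
# and every (parent symbol, parent subsymbol, context) combination draws its
# own Dirichlet-perturbed copy of that base. Truncation level, concentration
# parameters and the context keys are illustrative assumptions only.
import numpy as np

def stick_breaking(gamma: float, truncation: int, rng) -> np.ndarray:
    """Truncated GEM(gamma) draw over `truncation` subsymbols."""
    betas = rng.beta(1.0, gamma, size=truncation)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
    weights = betas * remaining
    return weights / weights.sum()  # renormalize after truncation

rng = np.random.default_rng(0)
base = stick_breaking(gamma=1.0, truncation=8, rng=rng)  # shared for symbol s

# Each parent/context combination gets its own subsymbol distribution centred
# on the shared base, so distributions are tied yet vary individually.
contexts = [("VERB", 0, "left"), ("VERB", 1, "right"), ("ROOT", 0, "right")]
pi = {ctx: rng.dirichlet(10.0 * base) for ctx in contexts}
print({ctx: p.round(2) for ctx, p in pi.items()})
```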

39 Finally, we generate the word x from a finite multinomial with parameters φsz, where s and z are the symbol and subsymbol of the current node. [sent-118, score-0.588]

40 We follow an approach similar to the widely-referenced DMV model (Klein and Manning, 2004), which forms the basis of the current state-of-the-art unsupervised grammar induction model (Headden III et al. [sent-121, score-0.267]

41 We encode more detailed valence information than Klein and Manning (2004) and condition child generation on parent valence. [sent-124, score-0.268]

42 Specifically, after drawing a node we first decide whether to proceed to generate a child or to stop, conditioned on the parent symbol and subsymbol and the current context (direction and valence). [sent-125, score-0.771]

43 We can combine the stopping decision with the generation of the child symbol by including a distinguished STOP symbol as a possible outcome in distribution θ. [sent-127, score-0.53]
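A simplified sketch of this generative story, with a distinguished STOP outcome and a direction/valence context, might look as follows; it omits the refined subsymbols and word emission, and the symbols and probabilities are illustrative only.

```python
# A simplified sketch (assumed, not the authors' code) of the generative story:
# each node repeatedly decides whether to generate another dependent or STOP,
# conditioned on its own symbol and the current context (direction, valence),
# then recurses into the dependent. Subsymbols and word emission are omitted.
import random

STOP = "<STOP>"

# Toy child-symbol distributions theta[(parent_symbol, direction, valence)];
# probabilities are illustrative only.
theta = {
    ("VERB", "left", 0):  {"NOUN": 0.6, STOP: 0.4},
    ("VERB", "left", 1):  {STOP: 1.0},
    ("VERB", "right", 0): {"NOUN": 0.5, STOP: 0.5},
    ("VERB", "right", 1): {STOP: 1.0},
    ("NOUN", "left", 0):  {"ARTICLE": 0.5, STOP: 0.5},
    ("NOUN", "left", 1):  {STOP: 1.0},
    ("NOUN", "right", 0): {STOP: 1.0},
    ("ARTICLE", "left", 0):  {STOP: 1.0},
    ("ARTICLE", "right", 0): {STOP: 1.0},
}

def sample(dist):
    symbols, probs = zip(*dist.items())
    return random.choices(symbols, weights=probs, k=1)[0]

def generate(symbol, depth=0):
    """Generate dependents of `symbol` on each side until STOP is drawn."""
    for direction in ("left", "right"):
        valence = 0  # 0 = no dependent generated yet in this direction
        while True:
            child = sample(theta[(symbol, direction, min(valence, 1))])
            if child == STOP:
                break
            print("  " * depth + f"{symbol} -{direction}-> {child}")
            generate(child, depth + 1)
            valence += 1

random.seed(0)
generate("VERB")  # root symbol; word emission from phi_{s,z} is omitted here
```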

44 No-Split Model Variant: In the absence of subsymbol refinement, i.e. [sent-128, score-0.446]

45 when subsymbol z is set to be identical to coarse symbol s, our model simplifies in some respects. [sent-130, score-0.685]

46 In particular, the HDP generation of z is obviated and word x is drawn from a word distribution φs indexed solely by coarse symbol s. [sent-131, score-0.37]

47 The resulting simplified model closely resembles DMV (Klein and Manning, 2004), except that it 1) explicitly generates words x rather than only part-of-speech tags s, 2) encodes richer context and valence information, and 3) imposes a Dirichlet prior on the symbol distribution θ. [sent-132, score-0.333]

48 4 Inference with Constraints We now describe how to augment our generative model of dependency structure with constraints derived from linguistic knowledge. [sent-133, score-0.327]

49 Incorporating arbitrary linguistic rules directly in the generative story is challenging as it requires careful tuning of either the model structure or priors for each constraint. [sent-134, score-0.33]

50 (2007), we constrain the posterior to satisfy the rules in expectation during inference. [sent-136, score-0.62]

51 In standard variational inference, an intractable true posterior is approximated by a distribution from a tractable set (Bishop, 2006). [sent-138, score-0.278]

52 To incorporate the constraints, we further restrict the set to only include distributions that satisfy the specified expectation constraints over hidden variables. [sent-140, score-0.398]

53 We further constrain q to be from the subset of Q that satisfies the expectation constraint Eq[f(z)] ≤ b, where f is a deterministically computable function of the hidden structures. [sent-149, score-0.268]

54 In our model, for example, f counts the dependency edges that are instances of one of the declaratively specified dependency rules, while b is the proportion of the total dependencies that we expect should fulfill this constraint. [sent-150, score-0.454]
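As a toy illustration of these quantities, the sketch below counts rule-matching edges f(z) for candidate parses and compares the expected rule proportion under a small hand-specified posterior q against a threshold b; the rules, parses, and numbers are invented for the example, and the constraint is stated as a minimum proportion, following the abstract.

```python
# A toy sketch of the constraint quantities described above: f(z) counts the
# edges of a dependency parse z that match a declaratively specified rule, and
# the constraint compares the expected rule proportion under a posterior q over
# parses against a threshold b. Rules, parses and probabilities are illustrative.
RULES = {("VERB", "NOUN"), ("NOUN", "ARTICLE"), ("AUX", "VERB")}

def f(parse):
    """Number of (head_tag, dependent_tag) edges in `parse` matching a rule."""
    return sum(1 for edge in parse if edge in RULES)

# A small posterior q(z) over candidate parses (each parse = list of edges).
candidates = [
    ([("VERB", "NOUN"), ("NOUN", "ARTICLE")], 0.7),   # (parse, q-probability)
    ([("NOUN", "VERB"), ("NOUN", "ARTICLE")], 0.3),
]

expected_rule_edges = sum(q * f(parse) for parse, q in candidates)
expected_total_edges = sum(q * len(parse) for parse, q in candidates)
expected_proportion = expected_rule_edges / expected_total_edges

b = 0.8  # minimum expected proportion of rule-consistent dependencies
print(f"E_q[rule proportion] = {expected_proportion:.2f}, "
      f"constraint satisfied: {expected_proportion >= b}")
```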

55 With the mean field factorization and the expectation constraints in place, solving the maximization of F in (1) separately for each factor yields the following updates: q(θ) = argmin_{q(θ)} KL(…). [sent-151, score-0.349]

56 We can solve (2) by setting q(θ) to q0(θ); since q(z) is held fixed while updating q(θ), the expectation function of the constraint remains constant during this update. [sent-158, score-0.268]

57 …are easily imposed. Variational Updates: We now derive the specific variational updates for our dependency induction model. [sent-164, score-0.465]

58 q(z) of child symbol s and subsymbol z in context c when generated by parent symbol s0 and subsymbol z0. [sent-184, score-1.249]

59 …being generated by the parent symbol s0 and subsymbol z0 in context c, and Cs0z0x is the count of word x being generated by symbol s0 and subsymbol z0. [sent-193, score-1.175]

60 The only factor affected by the expectation constraints is q(z). [sent-194, score-0.349]
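The effect of the constraint on q(z) can be illustrated, under strong simplifying assumptions, by enumerating a few candidate parses and reweighting the unconstrained scores with a single multiplier until the expected rule proportion reaches the threshold; the actual model instead works over all parses with dynamic programming and gradient search, so this is only a conceptual sketch.

```python
# A heavily simplified sketch of the step described above: only q(z) is affected
# by the expectation constraint, so q(z) is reweighted relative to the
# unconstrained update until the expected proportion of rule-consistent edges
# reaches the threshold. Candidate parses, scores, rules and the multiplier
# schedule are illustrative; the paper uses variational inference with gradient
# search rather than this toy enumeration.
import math

RULES = {("VERB", "NOUN"), ("NOUN", "ARTICLE")}
b = 0.8  # minimum expected proportion of rule-consistent dependencies

# Candidate parses with unconstrained log-scores (illustrative numbers).
parses = [
    ([("VERB", "NOUN"), ("NOUN", "ARTICLE")], -1.0),
    ([("NOUN", "VERB"), ("ARTICLE", "NOUN")], -0.5),
]

def rule_fraction(parse):
    return sum(edge in RULES for edge in parse) / len(parse)

def constrained_q(lam):
    """q(z) proportional to exp(score(z) + lam * rule_fraction(z))."""
    weights = [math.exp(score + lam * rule_fraction(p)) for p, score in parses]
    total = sum(weights)
    return [w / total for w in weights]

lam = 0.0
for _ in range(50):  # increase the multiplier until the constraint holds
    q = constrained_q(lam)
    expected = sum(qi * rule_fraction(p) for qi, (p, _) in zip(q, parses))
    if expected >= b:
        break
    lam += 0.5

print(f"lambda = {lam:.1f}, E_q[rule fraction] = {expected:.2f}")
```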

61 Section 5 (Linguistic Constraints), Universal Dependency Rules: We compile a set of 13 universal dependency rules consistent with various linguistic accounts (Carnie, 2002; Newmeyer, 2005), shown in Table 1. [sent-200, score-0.72]

62 These rules are defined over coarse part-of-speech tags: Noun, Verb, Adjective, Adverb, Pronoun, Article, Auxiliary, Preposition, Numeral and Conjunction. [sent-201, score-0.377]
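For concreteness, a rule set over these coarse tags could be represented as head-to-dependent pairs as below; the pairs shown are illustrative examples in the spirit of the paper's Table 1, not a verbatim copy of its 13 rules.

```python
# Illustrative head -> dependent rules over the coarse tags listed above, in
# the spirit of the paper's Table 1 (examples only, not the exact 13 rules).
UNIVERSAL_RULES = {
    ("ROOT", "Verb"),
    ("ROOT", "Auxiliary"),
    ("Verb", "Noun"),
    ("Verb", "Pronoun"),
    ("Verb", "Adverb"),
    ("Auxiliary", "Verb"),
    ("Noun", "Adjective"),
    ("Noun", "Article"),
    ("Noun", "Numeral"),
    ("Preposition", "Noun"),
}

def is_rule_edge(head_tag: str, dependent_tag: str) -> bool:
    """Check whether a proposed dependency edge matches a universal rule."""
    return (head_tag, dependent_tag) in UNIVERSAL_RULES

print(is_rule_edge("Verb", "Noun"), is_rule_edge("Noun", "Verb"))
```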

63 We require that a minimum proportion of the posterior dependencies be instances of these rules in expectation. [sent-203, score-0.473]

64 , 2009), where each rule has a separately specified expectation, we only set a single minimum expectation for the proportion of all dependencies that must match one of the rules. [sent-205, score-0.42]

65 English-specific Dependency Rules: For English, we also consider a small set of hand-crafted dependency rules designed by Michael Collins for deterministic parsing, shown in Table 3. [sent-207, score-0.424]

66 Unlike the universals from Table 1, these rules alone are enough to construct a full dependency tree. [sent-208, score-0.52]

67 Moreover, with this dataset we can assess the additional benefit of using rules tailored to an individual language as opposed to universal rules. [sent-210, score-0.561]

68 …dependencies with the Collins head-finding rules (Collins, 1999); for the other languages we use data from the 2006 CoNLL-X Shared Task (Buchholz and Marsi, 2006). [sent-214, score-0.285]

69 This is computed based on the Viterbi parses produced using the final unnormalized variational distribution q(z) over dependency structures. [sent-219, score-0.293]

70 Hyperparameters and Training Regimes: Unless otherwise stated, in experiments with rule-based constraints the expected proportion of dependencies that must satisfy those constraints is set to 0. [sent-220, score-0.415]

71 This threshold value was chosen based on minimal tuning on a single language and ruleset (English with universal rules) and carried over to all other experimental conditions. [sent-222, score-0.473]

72 We also conduct a set of No-Split experiments to evaluate the importance of syntactic refinement; in these experiments each coarse symbol corresponds to only one refined symbol. [sent-227, score-0.468]

73 This is easily effected during inference by setting the HDP variational approximation truncation level to one. [sent-228, score-0.263]

74 For each experiment we run 50 iterations of variational updates; for each iteration we perform five steps of gradient search to compute the update for the variational distribution q(z) over dependency structures. [sent-229, score-0.572]

75 7 Results In the following section we present our primary cross-lingual results using universal rules (Section 7. [sent-230, score-0.561]

76 …model with universal dependency rules (No-Split and HDP-DEP), compared to DMV (Klein and Manning, 2004) and PGI (Berg-Kirkpatrick and Klein, 2010). [sent-238, score-0.678]

77 1 Main Cross-Lingual Results Table 4 shows the performance of both our full model (HDP-DEP) and its No-Split version using universal dependency rules across six languages. [sent-243, score-0.714]

78 We also provide the performance of two baselines: the dependency model with valence (DMV) (Klein and Manning, 2004) and the phylogenetic grammar induction (PGI) model (Berg-Kirkpatrick and Klein, 2010). [sent-244, score-0.5]

79 This improvement is expected given that DMV does not have access to the additional information provided through the universal rules. [sent-248, score-0.326]

80 PGI is more relevant as a point of comparison, since it is able to leverage multilingual data to learn information similar to what we have declaratively specified using universal rules. [sent-249, score-0.445]

81 For each rule, we evaluate the model using the ruleset excluding that rule, and list the most significant rules for each language. [sent-264, score-0.316]

82 This result attests to how the expectation constraints consistently guide inference toward high-accuracy areas of the search space. [sent-274, score-0.435]

83 Ablation Analysis: Our next experiment seeks to understand the relative importance of the various universal rules from Table 1. [sent-275, score-0.561]

84 “Gold” refers to the setting where each language’s threshold is set independently to the proportion of gold dependencies satisfying the rules; for English this proportion is 70%, while the average proportion across languages is 63%. [sent-281, score-0.627]

85 This result suggests that some rules are harder to learn than others regardless of their frequency, so their presence in the specified ruleset yields stronger performance gains. [sent-283, score-0.316]

86 Impact of Rules Selection: We compare the performance of HDP-DEP using the universal rules versus a set of rules designed for deterministically parsing the Penn Treebank (see Section 5 for details). [sent-302, score-0.865]

87 We also test model performance when no linguistic rules are available. [sent-309, score-0.277]

88 The model performs substantially worse (line 2), confirming that syntactic category refinement in a fully unsupervised setup is challenging. [sent-312, score-0.274]

89 Learning Beyond Provided Rules: Since HDP-DEP is provided with linguistic rules, a legitimate question is whether it improves upon what the rules encode, especially when the rules are complete and language-specific. [sent-313, score-0.512]

90 We can answer this question by comparing the performance of our model seeded with the English-specific rules against a deterministic parser that implements the same rules. [sent-314, score-0.378]

91 Comparison with Alternative Semi-supervised Parser The dependency parser based on the generalized expectation criteria (Druck et al. [sent-318, score-0.43]

92 Note that we do not rely on rule-specific expectation information as they do, instead requiring only a single expectation constraint parameter. [sent-325, score-0.468]

93 Model Stability: It is commonly acknowledged in the literature that unsupervised grammar induction methods exhibit sensitivity to initialization. [sent-326, score-0.313]

94 As in the previous section, we find that the presence of linguistic rules greatly reduces this sensitivity. (Footnote 4: As explained in Section 5, having a single expectation parameter is motivated by our focus on parsing with universal rules.) [sent-327, score-0.595]

95 The presence of linguistic rules greatly reduces this sensitivity: for HDP-DEP, the standard deviation over five randomly initialized runs with the English-specific rules is 1. [sent-328, score-0.546]

96 Section 8 (Conclusions): In this paper we demonstrated that syntactic universals encoded as declarative constraints improve grammar induction. [sent-333, score-0.484]

97 We formulated a generative model for dependency structure that models syntactic category refinement and biases inference to cohere with the provided constraints. [sent-334, score-0.437]

98 We are especially grateful to Michael Collins for inspiring us toward this line of inquiry and providing deterministic rules for English parsing. [sent-342, score-0.348]

99 Semi-supervised learning of dependency parsers using generalized expectation criteria. [sent-407, score-0.395]

100 Corpus-based induction of syntactic structure: Models of dependency and constituency. [sent-447, score-0.296]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('universal', 0.326), ('subsymbol', 0.315), ('rules', 0.235), ('expectation', 0.234), ('symbol', 0.228), ('variational', 0.176), ('universals', 0.168), ('dmv', 0.15), ('coarse', 0.142), ('gra', 0.135), ('refinement', 0.131), ('druck', 0.126), ('pgi', 0.126), ('induction', 0.123), ('dependency', 0.117), ('constraints', 0.115), ('eq', 0.113), ('valence', 0.105), ('posterior', 0.102), ('klein', 0.098), ('grammar', 0.092), ('hdp', 0.09), ('parent', 0.089), ('declaratively', 0.084), ('dirichlet', 0.083), ('ruleset', 0.081), ('headden', 0.079), ('ganchev', 0.075), ('child', 0.074), ('deterministic', 0.072), ('proportion', 0.07), ('threshold', 0.066), ('dependencies', 0.066), ('node', 0.065), ('arqg', 0.063), ('expeq', 0.063), ('inkl', 0.063), ('phylogenetic', 0.063), ('rulesets', 0.063), ('subsymbols', 0.063), ('typological', 0.063), ('logp', 0.06), ('liang', 0.057), ('iii', 0.057), ('syntactic', 0.056), ('gradient', 0.055), ('yq', 0.054), ('ca', 0.053), ('generative', 0.053), ('kuzman', 0.053), ('declarative', 0.053), ('unsupervised', 0.052), ('rule', 0.05), ('languages', 0.05), ('satisfy', 0.049), ('updates', 0.049), ('update', 0.048), ('cohen', 0.046), ('dir', 0.046), ('sensitivity', 0.046), ('inference', 0.045), ('multinomial', 0.045), ('categories', 0.044), ('generalized', 0.044), ('ao', 0.044), ('refined', 0.042), ('linguistic', 0.042), ('carnie', 0.042), ('cnj', 0.042), ('gem', 0.042), ('hdpdep', 0.042), ('mz', 0.042), ('newmeyer', 0.042), ('truncation', 0.042), ('snyder', 0.042), ('jo', 0.042), ('toward', 0.041), ('refine', 0.04), ('expectations', 0.04), ('manning', 0.039), ('zq', 0.036), ('kuhn', 0.036), ('auxiliaries', 0.036), ('tahira', 0.036), ('seeded', 0.036), ('zh', 0.036), ('infinite', 0.036), ('regina', 0.036), ('six', 0.036), ('nj', 0.036), ('parsing', 0.035), ('parser', 0.035), ('multilingual', 0.035), ('dan', 0.035), ('category', 0.035), ('updating', 0.034), ('correspondences', 0.034), ('deterministically', 0.034), ('english', 0.034), ('greatly', 0.034)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999952 116 emnlp-2010-Using Universal Linguistic Knowledge to Guide Grammar Induction

Author: Tahira Naseem ; Harr Chen ; Regina Barzilay ; Mark Johnson

Abstract: We present an approach to grammar induction that utilizes syntactic universals to improve dependency parsing across a range of languages. Our method uses a single set of manually-specified language-independent rules that identify syntactic dependencies between pairs of syntactic categories that commonly occur across languages. During inference of the probabilistic model, we use posterior expectation constraints to require that a minimum proportion of the dependencies we infer be instances of these rules. We also automatically refine the syntactic categories given in our coarsely tagged input. Across six languages our approach outperforms state-of-the-art unsupervised methods by a significant margin.

2 0.17972851 67 emnlp-2010-It Depends on the Translation: Unsupervised Dependency Parsing via Word Alignment

Author: Samuel Brody

Abstract: We reveal a previously unnoticed connection between dependency parsing and statistical machine translation (SMT), by formulating the dependency parsing task as a problem of word alignment. Furthermore, we show that two well known models for these respective tasks (DMV and the IBM models) share common modeling assumptions. This motivates us to develop an alignment-based framework for unsupervised dependency parsing. The framework (which will be made publicly available) is flexible, modular and easy to extend. Using this framework, we implement several algorithms based on the IBM alignment models, which prove surprisingly effective on the dependency parsing task, and demonstrate the potential of the alignment-based approach.

3 0.16240303 113 emnlp-2010-Unsupervised Induction of Tree Substitution Grammars for Dependency Parsing

Author: Phil Blunsom ; Trevor Cohn

Abstract: Inducing a grammar directly from text is one of the oldest and most challenging tasks in Computational Linguistics. Significant progress has been made for inducing dependency grammars, however the models employed are overly simplistic, particularly in comparison to supervised parsing models. In this paper we present an approach to dependency grammar induction using tree substitution grammar which is capable of learning large dependency fragments and thereby better modelling the text. We define a hierarchical non-parametric Pitman-Yor Process prior which biases towards a small grammar with simple productions. This approach significantly improves the state-of-the-art, when measured by head attachment accuracy.

4 0.15706119 97 emnlp-2010-Simple Type-Level Unsupervised POS Tagging

Author: Yoong Keok Lee ; Aria Haghighi ; Regina Barzilay

Abstract: Part-of-speech (POS) tag distributions are known to exhibit sparsity a word is likely to take a single predominant tag in a corpus. Recent research has demonstrated that incorporating this sparsity constraint improves tagging accuracy. However, in existing systems, this expansion come with a steep increase in model complexity. This paper proposes a simple and effective tagging method that directly models tag sparsity and other distributional properties of valid POS tag assignments. In addition, this formulation results in a dramatic reduction in the number of model parameters thereby, enabling unusually rapid training. Our experiments consistently demonstrate that this model architecture yields substantial performance gains over more complex tagging — counterparts. On several languages, we report performance exceeding that of more complex state-of-the art systems.1

5 0.14047694 57 emnlp-2010-Hierarchical Phrase-Based Translation Grammars Extracted from Alignment Posterior Probabilities

Author: Adria de Gispert ; Juan Pino ; William Byrne

Abstract: We report on investigations into hierarchical phrase-based translation grammars based on rules extracted from posterior distributions over alignments of the parallel text. Rather than restrict rule extraction to a single alignment, such as Viterbi, we instead extract rules based on posterior distributions provided by the HMM word-to-word alignmentmodel. We define translation grammars progressively by adding classes of rules to a basic phrase-based system. We assess these grammars in terms of their expressive power, measured by their ability to align the parallel text from which their rules are extracted, and the quality of the translations they yield. In Chinese-to-English translation, we find that rule extraction from posteriors gives translation improvements. We also find that grammars with rules with only one nonterminal, when extracted from posteri- ors, can outperform more complex grammars extracted from Viterbi alignments. Finally, we show that the best way to exploit source-totarget and target-to-source alignment models is to build two separate systems and combine their output translation lattices.

6 0.1063377 86 emnlp-2010-Non-Isomorphic Forest Pair Translation

7 0.10581335 111 emnlp-2010-Two Decades of Unsupervised POS Induction: How Far Have We Come?

8 0.10419252 98 emnlp-2010-Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions

9 0.095819883 6 emnlp-2010-A Latent Variable Model for Geographic Lexical Variation

10 0.095164105 117 emnlp-2010-Using Unknown Word Techniques to Learn Known Words

11 0.093893506 106 emnlp-2010-Top-Down Nearly-Context-Sensitive Parsing

12 0.09004157 72 emnlp-2010-Learning First-Order Horn Clauses from Web Text

13 0.084346019 81 emnlp-2010-Modeling Perspective Using Adaptor Grammars

14 0.083965287 88 emnlp-2010-On Dual Decomposition and Linear Programming Relaxations for Natural Language Processing

15 0.081382528 34 emnlp-2010-Crouching Dirichlet, Hidden Markov Model: Unsupervised POS Tagging with Context Local Tag Generation

16 0.079840295 118 emnlp-2010-Utilizing Extra-Sentential Context for Parsing

17 0.07710284 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar

18 0.076972216 17 emnlp-2010-An Efficient Algorithm for Unsupervised Word Segmentation with Branching Entropy and MDL

19 0.075583756 115 emnlp-2010-Uptraining for Accurate Deterministic Question Parsing

20 0.073399708 64 emnlp-2010-Incorporating Content Structure into Text Analysis Applications


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.266), (1, 0.092), (2, 0.213), (3, -0.135), (4, 0.158), (5, -0.023), (6, -0.1), (7, 0.066), (8, 0.07), (9, -0.058), (10, 0.052), (11, -0.011), (12, 0.05), (13, 0.047), (14, 0.032), (15, -0.01), (16, -0.053), (17, -0.061), (18, -0.019), (19, 0.043), (20, 0.042), (21, -0.239), (22, -0.122), (23, 0.064), (24, -0.17), (25, 0.183), (26, 0.127), (27, 0.004), (28, -0.103), (29, -0.1), (30, 0.014), (31, 0.059), (32, 0.069), (33, 0.111), (34, 0.026), (35, -0.125), (36, -0.123), (37, 0.144), (38, 0.104), (39, -0.047), (40, -0.039), (41, 0.015), (42, -0.007), (43, 0.009), (44, 0.161), (45, 0.116), (46, -0.005), (47, -0.069), (48, 0.03), (49, 0.045)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96231139 116 emnlp-2010-Using Universal Linguistic Knowledge to Guide Grammar Induction

Author: Tahira Naseem ; Harr Chen ; Regina Barzilay ; Mark Johnson

Abstract: We present an approach to grammar induction that utilizes syntactic universals to improve dependency parsing across a range of languages. Our method uses a single set of manually-specified language-independent rules that identify syntactic dependencies between pairs of syntactic categories that commonly occur across languages. During inference of the probabilistic model, we use posterior expectation constraints to require that a minimum proportion of the dependencies we infer be instances of these rules. We also automatically refine the syntactic categories given in our coarsely tagged input. Across six languages our approach outperforms state-of-theart unsupervised methods by a significant margin.1

2 0.73608929 113 emnlp-2010-Unsupervised Induction of Tree Substitution Grammars for Dependency Parsing

Author: Phil Blunsom ; Trevor Cohn

Abstract: Inducing a grammar directly from text is one of the oldest and most challenging tasks in Computational Linguistics. Significant progress has been made for inducing dependency grammars, however the models employed are overly simplistic, particularly in comparison to supervised parsing models. In this paper we present an approach to dependency grammar induction using tree substitution grammar which is capable of learning large dependency fragments and thereby better modelling the text. We define a hierarchical non-parametric Pitman-Yor Process prior which biases towards a small grammar with simple productions. This approach significantly improves the state-of-the-art, when measured by head attachment accuracy.

3 0.501131 117 emnlp-2010-Using Unknown Word Techniques to Learn Known Words

Author: Kostadin Cholakov ; Gertjan van Noord

Abstract: Unknown words are a hindrance to the performance of hand-crafted computational grammars of natural language. However, words with incomplete and incorrect lexical entries pose an even bigger problem because they can be the cause of a parsing failure despite being listed in the lexicon of the grammar. Such lexical entries are hard to detect and even harder to correct. We employ an error miner to pinpoint words with problematic lexical entries. An automated lexical acquisition technique is then used to learn new entries for those words which allows the grammar to parse previously uncovered sentences successfully. We test our method on a large-scale grammar of Dutch and a set of sentences for which this grammar fails to produce a parse. The application of the method enables the grammar to cover 83.76% of those sentences with an accuracy of 86.15%.

4 0.46529073 72 emnlp-2010-Learning First-Order Horn Clauses from Web Text

Author: Stefan Schoenmackers ; Jesse Davis ; Oren Etzioni ; Daniel Weld

Abstract: input. Even the entire Web corpus does not explicitly answer all questions, yet inference can uncover many implicit answers. But where do inference rules come from? This paper investigates the problem of learning inference rules from Web text in an unsupervised, domain-independent manner. The SHERLOCK system, described herein, is a first-order learner that acquires over 30,000 Horn clauses from Web text. SHERLOCK embodies several innovations, including a novel rule scoring function based on Statistical Relevance (Salmon et al., 1971) which is effective on ambiguous, noisy and incomplete Web extractions. Our experiments show that inference over the learned rules discovers three times as many facts (at precision 0.8) as the TEXTRUNNER system which merely extracts facts explicitly stated in Web text.

5 0.45606783 67 emnlp-2010-It Depends on the Translation: Unsupervised Dependency Parsing via Word Alignment

Author: Samuel Brody

Abstract: We reveal a previously unnoticed connection between dependency parsing and statistical machine translation (SMT), by formulating the dependency parsing task as a problem of word alignment. Furthermore, we show that two well known models for these respective tasks (DMV and the IBM models) share common modeling assumptions. This motivates us to develop an alignment-based framework for unsupervised dependency parsing. The framework (which will be made publicly available) is flexible, modular and easy to extend. Using this framework, we implement several algorithms based on the IBM alignment models, which prove surprisingly effective on the dependency parsing task, and demonstrate the potential of the alignment-based approach.

6 0.40935314 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar

7 0.38134348 97 emnlp-2010-Simple Type-Level Unsupervised POS Tagging

8 0.37461677 81 emnlp-2010-Modeling Perspective Using Adaptor Grammars

9 0.36598673 86 emnlp-2010-Non-Isomorphic Forest Pair Translation

10 0.36065248 111 emnlp-2010-Two Decades of Unsupervised POS Induction: How Far Have We Come?

11 0.34235361 6 emnlp-2010-A Latent Variable Model for Geographic Lexical Variation

12 0.34209943 17 emnlp-2010-An Efficient Algorithm for Unsupervised Word Segmentation with Branching Entropy and MDL

13 0.33647203 110 emnlp-2010-Turbo Parsers: Dependency Parsing by Approximate Variational Inference

14 0.32581943 57 emnlp-2010-Hierarchical Phrase-Based Translation Grammars Extracted from Alignment Posterior Probabilities

15 0.32441261 46 emnlp-2010-Evaluating the Impact of Alternative Dependency Graph Encodings on Solving Event Extraction Tasks

16 0.32255632 106 emnlp-2010-Top-Down Nearly-Context-Sensitive Parsing

17 0.30558115 99 emnlp-2010-Statistical Machine Translation with a Factorized Grammar

18 0.30090898 98 emnlp-2010-Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions

19 0.29378247 34 emnlp-2010-Crouching Dirichlet, Hidden Markov Model: Unsupervised POS Tagging with Context Local Tag Generation

20 0.28839549 118 emnlp-2010-Utilizing Extra-Sentential Context for Parsing


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.021), (12, 0.03), (29, 0.163), (30, 0.026), (32, 0.013), (52, 0.03), (56, 0.078), (62, 0.019), (66, 0.101), (72, 0.04), (76, 0.028), (77, 0.334), (79, 0.013), (83, 0.019), (87, 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.91215694 28 emnlp-2010-Collective Cross-Document Relation Extraction Without Labelled Data

Author: Limin Yao ; Sebastian Riedel ; Andrew McCallum

Abstract: We present a novel approach to relation extraction that integrates information across documents, performs global inference and requires no labelled text. In particular, we tackle relation extraction and entity identification jointly. We use distant supervision to train a factor graph model for relation extraction based on an existing knowledge base (Freebase, derived in parts from Wikipedia). For inference we run an efficient Gibbs sampler that leads to linear time joint inference. We evaluate our approach both for an indomain (Wikipedia) and a more realistic outof-domain (New York Times Corpus) setting. For the in-domain setting, our joint model leads to 4% higher precision than an isolated local approach, but has no advantage over a pipeline. For the out-of-domain data, we benefit strongly from joint modelling, and observe improvements in precision of 13% over the pipeline, and 15% over the isolated baseline.

same-paper 2 0.81157351 116 emnlp-2010-Using Universal Linguistic Knowledge to Guide Grammar Induction

Author: Tahira Naseem ; Harr Chen ; Regina Barzilay ; Mark Johnson

Abstract: We present an approach to grammar induction that utilizes syntactic universals to improve dependency parsing across a range of languages. Our method uses a single set of manually-specified language-independent rules that identify syntactic dependencies between pairs of syntactic categories that commonly occur across languages. During inference of the probabilistic model, we use posterior expectation constraints to require that a minimum proportion of the dependencies we infer be instances of these rules. We also automatically refine the syntactic categories given in our coarsely tagged input. Across six languages our approach outperforms state-of-theart unsupervised methods by a significant margin.1

3 0.55246401 6 emnlp-2010-A Latent Variable Model for Geographic Lexical Variation

Author: Jacob Eisenstein ; Brendan O'Connor ; Noah A. Smith ; Eric P. Xing

Abstract: The rapid growth of geotagged social media raises new computational possibilities for investigating geographic linguistic variation. In this paper, we present a multi-level generative model that reasons jointly about latent topics and geographical regions. High-level topics such as “sports” or “entertainment” are rendered differently in each geographic region, revealing topic-specific regional distinctions. Applied to a new dataset of geotagged microblogs, our model recovers coherent topics and their regional variants, while identifying geographic areas of linguistic consistency. The model also enables prediction of an author’s geographic location from raw text, outperforming both text regression and supervised topic models.

4 0.54758793 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar

Author: Kristian Woodsend ; Yansong Feng ; Mirella Lapata

Abstract: The task of selecting information and rendering it appropriately appears in multiple contexts in summarization. In this paper we present a model that simultaneously optimizes selection and rendering preferences. The model operates over a phrase-based representation of the source document which we obtain by merging PCFG parse trees and dependency graphs. Selection preferences for individual phrases are learned discriminatively, while a quasi-synchronous grammar (Smith and Eisner, 2006) captures rendering preferences such as paraphrases and compressions. Based on an integer linear programming formulation, the model learns to generate summaries that satisfy both types of preferences, while ensuring that length, topic coverage and grammar constraints are met. Experiments on headline and image caption generation show that our method obtains state-of-the-art performance using essentially the same model for both tasks without any major modifications.

5 0.54475766 98 emnlp-2010-Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions

Author: Zhongqiang Huang ; Martin Cmejrek ; Bowen Zhou

Abstract: In this paper, we present a novel approach to enhance hierarchical phrase-based machine translation systems with linguistically motivated syntactic features. Rather than directly using treebank categories as in previous studies, we learn a set of linguistically-guided latent syntactic categories automatically from a source-side parsed, word-aligned parallel corpus, based on the hierarchical structure among phrase pairs as well as the syntactic structure of the source side. In our model, each X nonterminal in a SCFG rule is decorated with a real-valued feature vector computed based on its distribution of latent syntactic categories. These feature vectors are utilized at decod- ing time to measure the similarity between the syntactic analysis of the source side and the syntax of the SCFG rules that are applied to derive translations. Our approach maintains the advantages of hierarchical phrase-based translation systems while at the same time naturally incorporates soft syntactic constraints.

6 0.5378958 65 emnlp-2010-Inducing Probabilistic CCG Grammars from Logical Form with Higher-Order Unification

7 0.53535783 67 emnlp-2010-It Depends on the Translation: Unsupervised Dependency Parsing via Word Alignment

8 0.53446066 13 emnlp-2010-A Simple Domain-Independent Probabilistic Approach to Generation

9 0.53230536 81 emnlp-2010-Modeling Perspective Using Adaptor Grammars

10 0.52952313 86 emnlp-2010-Non-Isomorphic Forest Pair Translation

11 0.52414542 89 emnlp-2010-PEM: A Paraphrase Evaluation Metric Exploiting Parallel Texts

12 0.52354431 57 emnlp-2010-Hierarchical Phrase-Based Translation Grammars Extracted from Alignment Posterior Probabilities

13 0.52081716 34 emnlp-2010-Crouching Dirichlet, Hidden Markov Model: Unsupervised POS Tagging with Context Local Tag Generation

14 0.51977789 7 emnlp-2010-A Mixture Model with Sharing for Lexical Semantics

15 0.51834935 78 emnlp-2010-Minimum Error Rate Training by Sampling the Translation Lattice

16 0.51695669 77 emnlp-2010-Measuring Distributional Similarity in Context

17 0.51677185 42 emnlp-2010-Efficient Incremental Decoding for Tree-to-String Translation

18 0.5156461 87 emnlp-2010-Nouns are Vectors, Adjectives are Matrices: Representing Adjective-Noun Constructions in Semantic Space

19 0.51409703 113 emnlp-2010-Unsupervised Induction of Tree Substitution Grammars for Dependency Parsing

20 0.51101577 99 emnlp-2010-Statistical Machine Translation with a Factorized Grammar