acl acl2012 acl2012-130 knowledge-graph by maker-knowledge-mining

130 acl-2012-Learning Syntactic Verb Frames using Graphical Models


Source: pdf

Author: Thomas Lippincott ; Anna Korhonen ; Diarmuid O Seaghdha

Abstract: We present a novel approach for verb subcategorization lexicons using a simple graphical model. In contrast to previous methods, we show how the model can be trained without parsed input or a predefined subcategorization frame inventory. Our method outperforms the state-of-the-art on a verb clustering task, and is easily trained on arbitrary domains. This quantitative evaluation is complemented by a qualitative discussion of verbs and their frames. We discuss the advantages of graphical models for this task, in particular the ease of integrating semantic information about verbs and arguments in a principled fashion. We conclude with future work to augment the approach.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract We present a novel approach for verb subcategorization lexicons using a simple graphical model. [sent-3, score-0.49]

2 In contrast to previous methods, we show how the model can be trained without parsed input or a predefined subcategorization frame inventory. [sent-4, score-0.28]

3 Our method outperforms the state-of-the-art on a verb clustering task, and is easily trained on arbitrary domains. [sent-5, score-0.198]

4 This quantitative evaluation is complemented by a qualitative discussion of verbs and their frames. [sent-6, score-0.128]

5 We discuss the advantages of graphical models for this task, in particular the ease of integrating semantic information about verbs and arguments in a principled fashion. [sent-7, score-0.246]

6 1 Introduction: Subcategorization frames (SCFs) give a compact description of a verb’s syntactic preferences. [sent-9, score-0.206]

7 These two sentences have the same sequence of lexical syntactic categories (VP-NP-SCOMP), but the first is a simple transitive (“X understood Y”), while the second is a ditransitive with a sentential complement (“X persuaded Y that Z”): 1. [sent-10, score-0.174]

8 Kim (VP persuaded (NP the judge) (SCOMP that Sandy was present)) [sent-12, score-0.075]

9 A comprehensive lexicon would also include semantic information about selectional preferences (or restrictions) on argument heads of verbs, diathesis alternations (i. [sent-17, score-0.298]

10 semantically-motivated alternations between pairs of SCFs) and a mapping from surface frames to the underlying predicate-argument structure. [sent-19, score-0.204]

11 Information about verb subcategorization is useful for tasks like information extraction (Cohen and Hunter, 2006; Rupp et al. [sent-20, score-0.358]

12 Large, manually-constructed SCF lexicons mostly target general language (Boguraev and Briscoe, 1987; Grishman et al. [sent-26, score-0.076]

13 However, in many domains verbs exhibit different syntactic behavior (Roland and Jurafsky, 1998; Lippincott et al. [sent-28, score-0.21]

14 For example, the verb “develop” has specific usages in newswire, biomedicine and engineering that dramatically change its probability distribution over SCFs. [sent-30, score-0.203]

15 In a few domains like biomedicine, the need for focused SCF lexicons has led to manually-built resources (Bodenreider, 2004). [sent-31, score-0.11]

16 Such resources, however, are costly, prone to human error, and in domains where new lexical and syntactic constructs are frequently coined, quickly become obsolete (Cohen and Hunter, 2006). [sent-32, score-0.082]

17 ... these problems by building lexicons tailored to new domains with less manual effort, and higher coverage and scalability. [sent-35, score-0.11]

18 Unfortunately, high quality SCF lexicons are difficult to build automatically. [sent-36, score-0.076]

19 , 2000), many genuine frames are also low in frequency. [sent-41, score-0.158]

20 State-of-theart methods for building data-driven SCF lexicons typically rely on parsed input (see section 2). [sent-42, score-0.112]

21 Finally, many SCF acquisition methods operate with predefined SCF inventories. [sent-47, score-0.078]

22 In our model, a Bayesian network describes how verbs choose their arguments in terms of a small number of frames, which are represented as distributions over syntactic relationships. [sent-50, score-0.245]
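
A minimal sketch of that generative structure, assuming Dirichlet priors over the verb-level and frame-level multinomials; the frame count, GR inventory, hyperparameter values and example verbs below are illustrative stand-ins rather than the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

N_SCF = 40                                                # latent frames (illustrative)
GR_TYPES = ["ncsubj", "dobj", "iobj", "xcomp", "ccomp"]   # illustrative GR inventory

def sample_params(verbs, alpha=0.1, beta=0.1, max_args=4):
    """Verb-specific multinomial over SCFs, plus SCF-specific multinomials over
    GR types and over the number of arguments, all drawn from Dirichlet priors."""
    scf_given_verb = {v: rng.dirichlet([alpha] * N_SCF) for v in verbs}
    gr_given_scf = rng.dirichlet([beta] * len(GR_TYPES), size=N_SCF)
    n_args_given_scf = rng.dirichlet([beta] * max_args, size=N_SCF)
    return scf_given_verb, gr_given_scf, n_args_given_scf

def generate_instance(verb, params):
    """Generative story: choose an SCF for the verb, then the argument count,
    then the GR type of each argument."""
    scf_given_verb, gr_given_scf, n_args_given_scf = params
    s = rng.choice(N_SCF, p=scf_given_verb[verb])
    n_args = 1 + rng.choice(len(n_args_given_scf[s]), p=n_args_given_scf[s])
    grs = [GR_TYPES[rng.choice(len(GR_TYPES), p=gr_given_scf[s])] for _ in range(n_args)]
    return s, grs

params = sample_params(["persuade", "understand", "develop"])
print(generate_instance("persuade", params))
```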

23 Second, by replacing the syntactic features with an approximation based on POS tags, we achieve state-of-the-art performance without relying on error-prone unlexicalized or domain-specific lexicalized parsers. [sent-52, score-0.107]

24 Third, we highlight a key advantage of our method compared to previous approaches: the ease of integrating and performing joint inference of additional syntactic and semantic information. [sent-53, score-0.08]

25 GRs express binary dependencies between lexical items, and many parsers produce them as output, with some variation in inventory (Briscoe et al. [sent-56, score-0.113]

26 GR inventories include direct and indirect objects, complements, conjunctions, among other relations. [sent-60, score-0.096]

27 There are several SCF lexicons for general language, such as ANLT (Boguraev and Briscoe, 1987) and COMLEX (Grishman et al. [sent-62, score-0.076]

28 , 2007) provides SCF distributions for 6,397 verbs acquired from a parsed general language corpus via a system that relies on hand-crafted rules. [sent-65, score-0.203]

29 There are also resources which provide information about both syntactic and semantic properties of verbs: VerbNet (Kipper et al. [sent-66, score-0.08]

30 , 1998) provides semantic frames and annotated example sentences for 4,186 verbs. [sent-69, score-0.19]

31 , 2005) is a corpus where each verb is annotated for its arguments and their semantic roles, covering a total of 4,592 verbs. [sent-71, score-0.212]

32 Two state-of-the-art data-driven systems for English verbs are those that produced VALEX, Preiss et al. [sent-82, score-0.128]

33 The Preiss system extracts a verb instance’s GRs using the Rasp general-language unlexicalized parser (Briscoe et al. [sent-85, score-0.213]

34 , 2006) as input, and based on handcrafted rules, maps verb instances to a predefined inventory of 168 SCFs. [sent-86, score-0.299]

35 Filtering is then performed to remove noisy frames, with methods ranging from a simple single threshold to SCF-specific hypothesis tests based on external verb classes and SCF inventories. [sent-87, score-0.203]

36 The BioLexicon system extracts each verb instance’s GRs using the lexicalized Enju parser tuned to the biomedical domain (Miyao, 2005). [sent-88, score-0.29]

37 The BioLexicon system induces its SCF inventory automatically, but requires a lexicalized parsing model, rendering it more sensitive to domain variation. [sent-91, score-0.141]

38 The potential for joint inference of complementary information, such as syntactic verb and semantic argument classes, has a clear and interpretable way forward, in contrast to the pipelined methods described above. [sent-97, score-0.292]

39 (2004), where a Bayesian model was used to jointly induce syntactic and semantic classes for verbs, although that study relied on manually annotated data and a predefined SCF inventory and MLE. [sent-99, score-0.282]

40 Their study employed unsupervised POS tagging and parsing, and measures of selectional preference and argument structure as complementary features for the classifier. [sent-101, score-0.125]

41 Finally, our task-based evaluation, verb clustering with Levin (1993)’s alternation classes as the gold standard, was previously conducted by Joanis and Stevenson (2003), Korhonen et al. [sent-102, score-0.349]

42 Our method is agnostic regarding the exact definition of GR, and for example could use the Stanford inventory (De Marneffe et al. [sent-107, score-0.113]

43 We define the feature set for a verb occurrence as the counts of each GR the verb participates in. [sent-117, score-0.3]

44 Table 2: True-GR features for example sentence: note there are also tGR∗,lim versions of each that only consider subjects, objects and complements and are not shown. [sent-122, score-0.12]

45 An unlexicalized parser cannot distinguish these based just on POS tags, while a lexicalized parser requires a large treebank. [sent-124, score-0.123]

46 We therefore define pseudo-GRs (pGRs), which consider each (distance, POS) pair within a given window of the verb to be a potential tGR. [sent-125, score-0.199]
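
A sketch of pseudo-GR extraction from a POS-tagged sentence, assuming a 5-token window on each side of the verb; the tag set and example sentence are illustrative.

```python
from collections import Counter

def pgr_features(tagged_tokens, verb_index, window=5):
    """Count each (signed distance, POS) pair around the verb as a pseudo-GR."""
    feats = Counter()
    for i, (token, pos) in enumerate(tagged_tokens):
        dist = i - verb_index
        if dist != 0 and abs(dist) <= window:
            feats[(dist, pos)] += 1
    return feats

sentence = [("Kim", "NNP"), ("persuaded", "VBD"), ("the", "DT"), ("judge", "NN"),
            ("that", "IN"), ("Sandy", "NNP"), ("was", "VBD"), ("present", "JJ")]
print(pgr_features(sentence, verb_index=1))
```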

47 We extract instances for the 385 verbs in the union of our two gold standards from the VALEX lexicon’s data set, which was used in previous studies (Sun and Korhonen, 2009; Preiss et al. [sent-129, score-0.172]

48 Its generative story is as follows: when a verb is instantiated, an SCF is chosen according to a verb-specific multinomial. [sent-134, score-0.15]

49 Then, the number and type of syntactic arguments (GRs) are chosen from two SCF-specific multinomials. [sent-135, score-0.078]

50 The model is trained via collapsed Gibbs sampling, where the probability of assigning a particular SCF s to an instance of verb v with GRs (gr1 . [sent-137, score-0.178]
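
The excerpt cuts off before the full conditional, so the sketch below stands in for it with a generic collapsed Dirichlet-multinomial update (verb-to-SCF counts times per-GR emission counts); the argument-count multinomial, the hyperparameter values and the treatment of repeated GRs are simplifications, not the authors' published equation.

```python
import numpy as np
from collections import defaultdict

class CollapsedGibbsSCF:
    """Collapsed Gibbs sampler assigning a latent SCF to each verb instance,
    where an instance is represented as a bag of GR (or pGR) features."""

    def __init__(self, n_scf, gr_vocab, alpha=1.0, beta=1.0):
        self.n_scf, self.gr_vocab = n_scf, list(gr_vocab)
        self.gr_index = {g: i for i, g in enumerate(self.gr_vocab)}
        self.alpha, self.beta = alpha, beta
        self.verb_scf = defaultdict(lambda: np.zeros(n_scf))   # c(v, s)
        self.scf_gr = np.zeros((n_scf, len(self.gr_vocab)))    # c(s, gr)
        self.scf_total = np.zeros(n_scf)                       # sum over gr of c(s, gr)

    def _conditional(self, verb, grs):
        # P(s | v, grs, rest) is proportional to
        #   (c(v,s) + alpha) * prod over gr of (c(s,gr) + beta) / (c(s,.) + |GR| * beta).
        # Simplification: repeated GRs within one instance are treated independently.
        p = self.verb_scf[verb] + self.alpha
        for g in grs:
            j = self.gr_index[g]
            p = p * (self.scf_gr[:, j] + self.beta) / (self.scf_total + len(self.gr_vocab) * self.beta)
        return p / p.sum()

    def resample(self, verb, grs, old_scf=None):
        if old_scf is not None:                                 # remove the old assignment's counts
            self.verb_scf[verb][old_scf] -= 1
            for g in grs:
                self.scf_gr[old_scf, self.gr_index[g]] -= 1
                self.scf_total[old_scf] -= 1
        new_scf = np.random.choice(self.n_scf, p=self._conditional(verb, grs))
        self.verb_scf[verb][new_scf] += 1                       # add the new assignment's counts
        for g in grs:
            self.scf_gr[new_scf, self.gr_index[g]] += 1
            self.scf_total[new_scf] += 1
        return new_scf
```

Repeated sweeps of resample over all verb instances approximate the posterior over frame assignments.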

51 verbs tend to take a small number of SCFs, which in turn are limited to a small number of realizations). [sent-150, score-0.128]

52 For the pGRs we used a window of 5 tokens: a verb’s arguments will fall within a small window in the majority of cases, so there is diminished return in expanding the window at the cost of increased noise. [sent-151, score-0.177]

53 Finally, we set our SCF count to 40, about twice the size of the strictly syntactic general-language gold standard we describe in section 3. [sent-152, score-0.092]

54 This overestimation allows some flexibility for the model to define its inventory based on the data; any supernumerary frames will act as “junk frames” that are rarely assigned and hence will have little influence. [sent-154, score-0.271]

55 We therefore compare the lexicons based on their performance on a task that a good SCF lexicon should be useful for: clustering verbs into lexical-semantic classes. [sent-167, score-0.318]

56 Our gold standard is from (Sun and Korhonen, 2009), where 200 verbs were assigned to 17 classes based on their alternation patterns (Levin, 1993). [sent-168, score-0.279]

57 Previous work (Schulte im Walde, 2009; Sun and Korhonen, 2009) has demonstrated that the quality of an SCF lexicon’s inventory and probability estimates corresponds to its predictive power for membership in such alternation classes. [sent-169, score-0.195]

58 The clusters are then compared to the gold standard clusters with the purity-based F-Score from Sun and Korhonen (2009) and the more familiar Adjusted Rand Index (Hubert and Arabie, 1985). [sent-172, score-0.126]
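
A sketch of the task-based evaluation loop: cluster verbs by their induced SCF distributions and score the clustering against the gold classes with the Adjusted Rand Index. KMeans is used here purely as a stand-in clustering algorithm, the toy data is random, and the purity-based F-Score is omitted.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def cluster_and_score(scf_distributions, gold_labels, n_clusters=17, seed=0):
    """scf_distributions: (n_verbs, n_scf) array of per-verb SCF probabilities.
    gold_labels: one gold class id per verb (e.g. Levin-style alternation classes)."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    predicted = km.fit_predict(scf_distributions)
    return adjusted_rand_score(gold_labels, predicted)

# Toy example: 200 verbs with random distributions over 40 SCFs and 17 gold classes.
rng = np.random.default_rng(0)
X = rng.dirichlet(np.ones(40), size=200)
gold = rng.integers(0, 17, size=200)
print(cluster_and_score(X, gold))
```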

59 Qualitative (manual gold standard): We also want to see how our results line up with a traditional linguistic view of subcategorization, but this requires digging into the unsupervised output and associating anonymous probabilistic objects with established categories. [sent-174, score-0.101]

60 Third, we compared the output for several verbs to a coarsened version of the manually-annotated gold standard used to evaluate VALEX (Preiss et al. [sent-177, score-0.172]

61 We collapsed the original inventory of 168 SCFs to 18 purely syntactic SCFs based on their characteristic GRs and removed frames that depend on semantic distinctions, leaving the detection of finer-grained and semanticallybased frames for future work. [sent-179, score-0.509]

62 1 Verb clustering We evaluated SCF lexicons based on the eight feature sets described in section 3. [sent-181, score-0.124]

63 Table 4 shows the performance of the lexicons in ascending order. [sent-183, score-0.076]

64 It is therefore possible to use this method to build large-scale lexicons for any new domain with sufficient data. [sent-192, score-0.076]

65 Some clusters are immediately intelligible at the semantic level and correspond closely to the lexical-semantic classes found in Levin (1993). [sent-198, score-0.126]

66 For example, clusters 1, 6, and 14 include member verbs of Levin’s SAY, PEER and AMUSE classes, respectively. [sent-199, score-0.169]

67 cluster 2 which groups together verbs related to locations) while others relate semantic classes purely based on their syntactic similarity (e. [sent-202, score-0.261]

68 the verbs in cluster 17 share a strong preference for the ’to’ preposition). [sent-204, score-0.128]

69 The syntactic-semantic nature of the clusters reflects the multimodal nature of verbs and illustrates why a comprehensive subcategorization lexicon should not be limited to syntactic frames. [sent-205, score-0.491]

70 Looking at the verbs most likely to take this SCF using tGRparam,lex,lim (“stimulate”, “conserve”) confirms this. [sent-210, score-0.128]

71 The induced SCF inventory also has some redundancy, such as additional transitive frames beside figure 2, and frames with poor probability estimates. [sent-212, score-0.51]

72 For example, if an SCF gives any weight to indirect objects, it gives non-zero probability to an instance with only indirect objects, an impossible case. [sent-217, score-0.156]

73 indirect objects and prepositional phrases) the model may find it reasonable to create an SCF with all probability focused on that tGR, ignoring all others, such as in figure 4. [sent-220, score-0.149]

74 We conclude that our independence assumption was too strong, and the model would benefit from defining more structure within ... Figure 2: The SCF corresponding to transitive has most probability centered on dobj (e. [sent-221, score-0.081]

75 stimulate, conserve, deepen, eradicate, broaden) Figure 3: The SCF corresponding to verbs taking complements has more probability on xcomp and ccomp (e. [sent-223, score-0.219]

76 The full tables necessary to compare verb SCF distributions from our output with the manual gold standard are omitted for reasons of space, but a few examples reinforce the analysis above. [sent-226, score-0.233]
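
Per-verb SCF distributions (and, symmetrically, per-SCF GR distributions) can be read out of the sampler's collapsed counts with a posterior-mean estimate; this is a hedged sketch of that step, with placeholder counts and prior, not a detail stated in the excerpt.

```python
import numpy as np

def posterior_mean(counts, prior=1.0):
    """Posterior-mean multinomial from Dirichlet-multinomial counts,
    e.g. a verb's SCF assignment counts or one frame's GR counts."""
    c = np.asarray(counts, dtype=float) + prior
    return c / c.sum()

# Toy example: a verb assigned mostly to the first frame across sampled instances.
print(posterior_mean([12, 0, 3, 0, 1]))
```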

77 The verbs “load” and “fill” show particularly high usage of ditransitive SCFs in the gold standard. [sent-227, score-0.216]

78 However, the Rasp parser often conflates indirect objects and prepositional phrases due to its unlexicalized model. [sent-230, score-0.184]

79 While our system correctly gives high probability to ditransitive for both verbs, it inherits this confusion and over-estimates “acquire”’s probability mass for the frame. [sent-231, score-0.1]

80 This is an example of how bad decisions made by the parser cannot be fixed by the graphical model, and an area where pGR features have an advantage. [sent-232, score-0.088]

81 We achieve better results with far less effort than previous approaches by allowing the data to govern the definition of frames while estimating the verb-specific distributions in a fully Bayesian manner. [sent-234, score-0.197]

82 Second, simply treating POS tags within a small window of the verb as pseudo-GRs produces state-of-the-art results without the need for a parsing model. [sent-235, score-0.199]

83 Our initial attempt at applying graphical models to subcategorization also suggested several ways to extend and improve the method. [sent-241, score-0.264]

84 , 2003), diathesis alternations (McCarthy, 2000) and selectional preferences (Ó Séaghdha, 2010). [sent-246, score-0.168]

85 This study targeted high-frequency verbs, but the use of syntactic and semantic classes would also help with data sparsity down the road. [sent-247, score-0.133]

86 These extensions would also call for a more comprehensive evaluation, averaging over several tasks, such as clustering by semantics, syntax, alternations and selectional preferences. [sent-248, score-0.157]

87 In concrete terms, we plan to introduce latent variables corresponding to syntactic, semantic and alternation classes, that will determine a verb’s syntactic arguments, their semantic realization (i. [sent-249, score-0.166]

88 We next plan to test our method on a new domain, such as biomedical text, where verbs are known to take on distinct syntactic behavior (Lippincott et al. [sent-254, score-0.256]

89 We are grateful to Lin Sun and Laura Rimell for the use of their clustering and subcategorization gold standards, and the ACL reviewers for their helpful comments and suggestions. [sent-257, score-0.3]

90 The Unified Medical Language System (UMLS): integrating biomedical terminology. [sent-276, score-0.08]

91 A critical review of PASBio’s argument structures for biomedical verbs. [sent-292, score-0.112]

92 Weakly supervised SVM for Chinese-English crosslingual subcategorization lexicon acquisition. [sent-313, score-0.274]

93 A large subcategorization lexicon for natural language processing applications. [sent-351, score-0.274]

94 The choice of features for classification of verbs in biomedical texts. [sent-359, score-0.208]

95 Unsupervised acquisition of verb subcategorization frames from shallow-parsed corpora. [sent-363, score-0.558]

96 Automatic verb classification based on statistical distributions of argument structure. [sent-383, score-0.221]

97 A system for large-scale acquisition of verbal, nominal and adjectival subcategorization frames from corpora. [sent-412, score-0.408]

98 How verb subcategorization frequencies are affected by corpus choice. [sent-416, score-0.358]

99 A specialised verb lexicon as the basis of fact extraction in the biomedical domain. [sent-426, score-0.296]

100 The induction of verb frames and verb classes from corpora. [sent-430, score-0.511]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('scf', 0.679), ('subcategorization', 0.208), ('scfs', 0.189), ('korhonen', 0.185), ('tgrs', 0.174), ('grs', 0.164), ('frames', 0.158), ('verb', 0.15), ('valex', 0.131), ('verbs', 0.128), ('preiss', 0.116), ('inventory', 0.113), ('pgrs', 0.102), ('tgr', 0.102), ('briscoe', 0.081), ('biomedical', 0.08), ('gr', 0.08), ('anna', 0.08), ('lexicons', 0.076), ('erb', 0.073), ('lippincott', 0.073), ('lexicon', 0.066), ('indirect', 0.064), ('complements', 0.063), ('selectional', 0.063), ('rasp', 0.058), ('objects', 0.057), ('graphical', 0.056), ('alternation', 0.054), ('transitive', 0.053), ('classes', 0.053), ('window', 0.049), ('syntactic', 0.048), ('clustering', 0.048), ('alternations', 0.046), ('cam', 0.046), ('gold', 0.044), ('biolexicon', 0.044), ('ditransitive', 0.044), ('eaghdha', 0.044), ('hunter', 0.044), ('kingdom', 0.044), ('krymolowski', 0.044), ('pgr', 0.044), ('teichert', 0.044), ('acquisition', 0.042), ('clusters', 0.041), ('levin', 0.041), ('ted', 0.04), ('sun', 0.039), ('distributions', 0.039), ('boguraev', 0.038), ('diarmuid', 0.038), ('pos', 0.037), ('predefined', 0.036), ('parsed', 0.036), ('domains', 0.034), ('semantic', 0.032), ('inventories', 0.032), ('parser', 0.032), ('argument', 0.032), ('unlexicalized', 0.031), ('preferences', 0.03), ('complementary', 0.03), ('arguments', 0.03), ('sampling', 0.03), ('bmc', 0.029), ('comlex', 0.029), ('conserve', 0.029), ('csn', 0.029), ('csv', 0.029), ('diathesis', 0.029), ('grn', 0.029), ('hartigan', 0.029), ('joanis', 0.029), ('persuaded', 0.029), ('programmes', 0.029), ('romania', 0.029), ('rupp', 0.029), ('sandy', 0.029), ('scomp', 0.029), ('stimulate', 0.029), ('uzun', 0.029), ('venturi', 0.029), ('han', 0.029), ('lexicalized', 0.028), ('probability', 0.028), ('encouraging', 0.028), ('carroll', 0.027), ('yuval', 0.027), ('daum', 0.026), ('cohen', 0.026), ('hubert', 0.025), ('biomedicine', 0.025), ('schulte', 0.025), ('montemagni', 0.025), ('simonetta', 0.025), ('abend', 0.025), ('gri', 0.025), ('gregor', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999976 130 acl-2012-Learning Syntactic Verb Frames using Graphical Models

Author: Thomas Lippincott ; Anna Korhonen ; Diarmuid O Seaghdha

Abstract: We present a novel approach for verb subcategorization lexicons using a simple graphical model. In contrast to previous methods, we show how the model can be trained without parsed input or a predefined subcategorization frame inventory. Our method outperforms the state-of-the-art on a verb clustering task, and is easily trained on arbitrary domains. This quantitative evaluation is complemented by a qualitative discussion of verbs and their frames. We discuss the advantages of graphical models for this task, in particular the ease of integrating semantic information about verbs and arguments in a principled fashion. We conclude with future work to augment the approach.

2 0.26167253 48 acl-2012-Classifying French Verbs Using French and English Lexical Resources

Author: Ingrid Falk ; Claire Gardent ; Jean-Charles Lamirel

Abstract: We present a novel approach to the automatic acquisition of a Verbnet like classification of French verbs which involves the use (i) of a neural clustering method which associates clusters with features, (ii) of several supervised and unsupervised evaluation metrics and (iii) of various existing syntactic and semantic lexical resources. We evaluate our approach on an established test set and show that it outperforms previous related work with an Fmeasure of 0.70.

3 0.10323995 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures

Author: Danilo Croce ; Alessandro Moschitti ; Roberto Basili ; Martha Palmer

Abstract: In this paper, we propose innovative representations for automatic classification of verbs according to mainstream linguistic theories, namely VerbNet and FrameNet. First, syntactic and semantic structures capturing essential lexical and syntactic properties of verbs are defined. Then, we design advanced similarity functions between such structures, i.e., semantic tree kernel functions, for exploiting distributional and grammatical information in Support Vector Machines. The extensive empirical analysis on VerbNet class and frame detection shows that our models capture meaningful syntactic/semantic structures, which allows for improving the state-of-the-art.

4 0.095180169 64 acl-2012-Crosslingual Induction of Semantic Roles

Author: Ivan Titov ; Alexandre Klementiev

Abstract: We argue that multilingual parallel data provides a valuable source of indirect supervision for induction of shallow semantic representations. Specifically, we consider unsupervised induction of semantic roles from sentences annotated with automatically-predicted syntactic dependency representations and use a stateof-the-art generative Bayesian non-parametric model. At inference time, instead of only seeking the model which explains the monolingual data available for each language, we regularize the objective by introducing a soft constraint penalizing for disagreement in argument labeling on aligned sentences. We propose a simple approximate learning algorithm for our set-up which results in efficient inference. When applied to German-English parallel data, our method obtains a substantial improvement over a model trained without using the agreement signal, when both are tested on non-parallel sentences.

5 0.060238585 208 acl-2012-Unsupervised Relation Discovery with Sense Disambiguation

Author: Limin Yao ; Sebastian Riedel ; Andrew McCallum

Abstract: To discover relation types from text, most methods cluster shallow or syntactic patterns of relation mentions, but consider only one possible sense per pattern. In practice this assumption is often violated. In this paper we overcome this issue by inducing clusters of pattern senses from feature representations of patterns. In particular, we employ a topic model to partition entity pairs associated with patterns into sense clusters using local and global features. We merge these sense clusters into semantic relations using hierarchical agglomerative clustering. We compare against several baselines: a generative latent-variable model, a clustering method that does not disambiguate between path senses, and our own approach but with only local features. Experimental results show our proposed approach discovers dramatically more accurate clusters than models without sense disambiguation, and that incorporating global features, such as the document theme, is crucial.

6 0.058316417 192 acl-2012-Tense and Aspect Error Correction for ESL Learners Using Global Context

7 0.056109708 45 acl-2012-Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging

8 0.053019207 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets

9 0.052877754 71 acl-2012-Dependency Hashing for n-best CCG Parsing

10 0.052388374 189 acl-2012-Syntactic Annotations for the Google Books NGram Corpus

11 0.05169886 5 acl-2012-A Comparison of Chinese Parsers for Stanford Dependencies

12 0.050577797 147 acl-2012-Modeling the Translation of Predicate-Argument Structure for SMT

13 0.049322098 91 acl-2012-Extracting and modeling durations for habits and events from Twitter

14 0.049145684 33 acl-2012-Automatic Event Extraction with Structured Preference Modeling

15 0.048864096 3 acl-2012-A Class-Based Agreement Model for Generating Accurately Inflected Translations

16 0.045629751 175 acl-2012-Semi-supervised Dependency Parsing using Lexical Affinities

17 0.043520078 186 acl-2012-Structuring E-Commerce Inventory

18 0.042095538 61 acl-2012-Cross-Domain Co-Extraction of Sentiment and Topic Lexicons

19 0.040670145 41 acl-2012-Bootstrapping a Unified Model of Lexical and Phonetic Acquisition

20 0.040521406 170 acl-2012-Robust Conversion of CCG Derivations to Phrase Structure Trees


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.149), (1, 0.048), (2, -0.067), (3, -0.024), (4, -0.035), (5, 0.027), (6, -0.01), (7, 0.018), (8, -0.014), (9, 0.002), (10, -0.0), (11, -0.05), (12, 0.123), (13, 0.03), (14, -0.174), (15, 0.034), (16, -0.002), (17, -0.055), (18, -0.06), (19, 0.012), (20, -0.088), (21, -0.015), (22, -0.048), (23, 0.055), (24, 0.12), (25, -0.111), (26, 0.13), (27, -0.131), (28, 0.225), (29, 0.096), (30, -0.025), (31, -0.088), (32, -0.141), (33, 0.158), (34, -0.026), (35, -0.085), (36, 0.182), (37, 0.124), (38, -0.074), (39, -0.103), (40, -0.131), (41, 0.05), (42, 0.069), (43, -0.075), (44, 0.166), (45, 0.028), (46, 0.119), (47, 0.135), (48, 0.013), (49, 0.037)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9468137 130 acl-2012-Learning Syntactic Verb Frames using Graphical Models

Author: Thomas Lippincott ; Anna Korhonen ; Diarmuid O Seaghdha

Abstract: We present a novel approach for verb subcategorization lexicons using a simple graphical model. In contrast to previous methods, we show how the model can be trained without parsed input or a predefined subcategorization frame inventory. Our method outperforms the state-of-the-art on a verb clustering task, and is easily trained on arbitrary domains. This quantitative evaluation is complemented by a qualitative discussion of verbs and their frames. We discuss the advantages of graphical models for this task, in particular the ease of integrating semantic information about verbs and arguments in a principled fashion. We conclude with future work to augment the approach.

2 0.90038657 48 acl-2012-Classifying French Verbs Using French and English Lexical Resources

Author: Ingrid Falk ; Claire Gardent ; Jean-Charles Lamirel

Abstract: We present a novel approach to the automatic acquisition of a Verbnet like classification of French verbs which involves the use (i) of a neural clustering method which associates clusters with features, (ii) of several supervised and unsupervised evaluation metrics and (iii) of various existing syntactic and semantic lexical resources. We evaluate our approach on an established test set and show that it outperforms previous related work with an Fmeasure of 0.70.

3 0.46373793 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures

Author: Danilo Croce ; Alessandro Moschitti ; Roberto Basili ; Martha Palmer

Abstract: In this paper, we propose innovative representations for automatic classification of verbs according to mainstream linguistic theories, namely VerbNet and FrameNet. First, syntactic and semantic structures capturing essential lexical and syntactic properties of verbs are defined. Then, we design advanced similarity functions between such structures, i.e., semantic tree kernel functions, for exploiting distributional and grammatical information in Support Vector Machines. The extensive empirical analysis on VerbNet class and frame detection shows that our models capture meaningful syntactic/semantic structures, which allows for improving the state-of-the-art.

4 0.35575894 166 acl-2012-Qualitative Modeling of Spatial Prepositions and Motion Expressions

Author: Inderjeet Mani ; James Pustejovsky

Abstract: unkown-abstract

5 0.34475723 189 acl-2012-Syntactic Annotations for the Google Books NGram Corpus

Author: Yuri Lin ; Jean-Baptiste Michel ; Erez Aiden Lieberman ; Jon Orwant ; Will Brockman ; Slav Petrov

Abstract: We present a new edition of the Google Books Ngram Corpus, which describes how often words and phrases were used over a period of five centuries, in eight languages; it reflects 6% of all books ever published. This new edition introduces syntactic annotations: words are tagged with their part-of-speech, and headmodifier relationships are recorded. The annotations are produced automatically with statistical models that are specifically adapted to historical text. The corpus will facilitate the study of linguistic trends, especially those related to the evolution of syntax.

6 0.33434546 75 acl-2012-Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing

7 0.31229001 91 acl-2012-Extracting and modeling durations for habits and events from Twitter

8 0.30552322 175 acl-2012-Semi-supervised Dependency Parsing using Lexical Affinities

9 0.29012781 186 acl-2012-Structuring E-Commerce Inventory

10 0.28643712 120 acl-2012-Information-theoretic Multi-view Domain Adaptation

11 0.27766287 195 acl-2012-The Creation of a Corpus of English Metalanguage

12 0.27355236 64 acl-2012-Crosslingual Induction of Semantic Roles

13 0.26963073 45 acl-2012-Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging

14 0.26579148 208 acl-2012-Unsupervised Relation Discovery with Sense Disambiguation

15 0.24869467 49 acl-2012-Coarse Lexical Semantic Annotation with Supersenses: An Arabic Case Study

16 0.23363039 192 acl-2012-Tense and Aspect Error Correction for ESL Learners Using Global Context

17 0.23144542 16 acl-2012-A Nonparametric Bayesian Approach to Acoustic Model Discovery

18 0.22895922 117 acl-2012-Improving Word Representations via Global Context and Multiple Word Prototypes

19 0.22131731 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets

20 0.19789959 190 acl-2012-Syntactic Stylometry for Deception Detection


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(17, 0.011), (25, 0.029), (26, 0.058), (28, 0.035), (30, 0.03), (37, 0.06), (39, 0.064), (45, 0.239), (57, 0.011), (59, 0.014), (71, 0.012), (74, 0.038), (82, 0.022), (84, 0.031), (85, 0.035), (90, 0.086), (92, 0.062), (94, 0.017), (99, 0.062)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.76253581 130 acl-2012-Learning Syntactic Verb Frames using Graphical Models

Author: Thomas Lippincott ; Anna Korhonen ; Diarmuid O Seaghdha

Abstract: We present a novel approach for verb subcategorization lexicons using a simple graphical model. In contrast to previous methods, we show how the model can be trained without parsed input or a predefined subcategorization frame inventory. Our method outperforms the state-of-the-art on a verb clustering task, and is easily trained on arbitrary domains. This quantitative evaluation is complemented by a qualitative discussion of verbs and their frames. We discuss the advantages of graphical models for this task, in particular the ease of integrating semantic information about verbs and arguments in a principled fashion. We conclude with future work to augment the approach.

2 0.60483181 81 acl-2012-Enhancing Statistical Machine Translation with Character Alignment

Author: Ning Xi ; Guangchao Tang ; Xinyu Dai ; Shujian Huang ; Jiajun Chen

Abstract: The dominant practice of statistical machine translation (SMT) uses the same Chinese word segmentation specification in both alignment and translation rule induction steps in building Chinese-English SMT system, which may suffer from a suboptimal problem that word segmentation better for alignment is not necessarily better for translation. To tackle this, we propose a framework that uses two different segmentation specifications for alignment and translation respectively: we use Chinese character as the basic unit for alignment, and then convert this alignment to conventional word alignment for translation rule induction. Experimentally, our approach outperformed two baselines: fully word-based system (using word for both alignment and translation) and fully character-based system, in terms of alignment quality and translation performance. 1Introduction Chinese Word segmentation is a necessary step in Chinese-English statistical machine translation (SMT) because Chinese sentences do not delimit words by spaces. The key characteristic of a Chinese word segmenter is the segmentation specification1. As depicted in Figure 1(a), the dominant practice of SMT uses the same word segmentation for both word alignment and translation rule induction. For brevity, we will refer to the word segmentation of the bilingual corpus as word segmentation for alignment (WSA for short), because it determines the basic tokens for alignment; and refer to the word segmentation of the aligned corpus as word segmentation for rules (WSR for short), because it determines the basic tokens of translation 1 We hereafter use “word segmentation” for short. 285 (a) WSA=WSR (b) WSA≠WSR Figure 1. WSA and WSR in SMT pipeline rules2, which also determines how the translation rules would be matched by the source sentences. It is widely accepted that word segmentation with a higher F-score will not necessarily yield better translation performance (Chang et al., 2008; Zhang et al., 2008; Xiao et al., 2010). Therefore, many approaches have been proposed to learn word segmentation suitable for SMT. These approaches were either complicated (Ma et al., 2007; Chang et al., 2008; Ma and Way, 2009; Paul et al., 2010), or of high computational complexity (Chung and Gildea 2009; Duan et al., 2010). Moreover, they implicitly assumed that WSA and WSR should be equal. This requirement may lead to a suboptimal problem that word segmentation better for alignment is not necessarily better for translation. To tackle this, we propose a framework that uses different word segmentation specifications as WSA and WSR respectively, as shown Figure 1(b). We investigate a solution in this framework: first, we use Chinese character as the basic unit for alignment, viz. character alignment; second, we use a simple method (Elming and Habash, 2007) to convert the character alignment to conventional word alignment for translation rule induction. In the 2 Interestingly, word is also a basic token in syntax-based rules. Proce dJienjgus, R ofep thueb 5lic0t hof A Knonruea ,l M 8-e1e4ti Jnugly o f2 t0h1e2 A.s ?c so2c0ia1t2io Ans fso rc Ciatoiomnp fuotart Cio nmaplu Ltiantgiounisatlic Lsi,n pgaugiestsi2c 8s5–290, experiment, our approach consistently outperformed two baselines with three different word segmenters: fully word-based system (using word for both alignment and translation) and fully character-based system, in terms of alignment quality and translation performance. 
The remainder of this paper is structured as follows: Section 2 analyzes the influences of WSA and WSR on SMT respectively; Section 3 discusses how to convert character alignment to word alignment; Section 4 presents experimental results, followed by conclusions and future work in section 5. 2 Understanding WSA and WSR We propose a solution to tackle the suboptimal problem: using Chinese character for alignment while using Chinese word for translation. Character alignment differs from conventional word alignment in the basic tokens of the Chinese side of the training corpus3. Table 1 compares the token distributions of character-based corpus (CCorpus) and word-based corpus (WCorpus). We see that the WCorpus has a longer-tailed distribution than the CCorpus. More than 70% of the unique tokens appear less than 5 times in WCorpus. However, over half of the tokens appear more than or equal to 5 times in the CCorpus. This indicates that modeling word alignment could suffer more from data sparsity than modeling character alignment. Table 2 shows the numbers of the unique tokens (#UT) and unique bilingual token pairs (#UTP) of the two corpora. Consider two extensively features, fertility and translation features, which are extensively used by many state-of-the-art word aligners. The number of parameters w.r.t. fertility features grows linearly with #UT while the number of parameters w.r.t. translation features grows linearly with #UTP. We compare #UT and #UTP of both corpora in Table 2. As can be seen, CCorpus has less UT and UTP than WCorpus, i.e. character alignment model has a compact parameterization than word alignment model, where the compactness of parameterization is shown very important in statistical modeling (Collins, 1999). Another advantage of character alignment is the reduction in alignment errors caused by word seg3 Several works have proposed to use character (letter) on both sides of the parallel corpus for SMT between similar (European) languages (Vilar et al., 2007; Tiedemann, 2009), however, Chinese is not similar to English. 286 Frequency Characters (%) Words (%) 1 27.22 45.39 2 11.13 14.61 3 6.18 6.47 4 4.26 4.32 5(+) 50.21 29.21 Table 1 Token distribution of CCorpus and WCorpus Stats. Characters Words #UT 9.7K 88.1K #UTP 15.8M 24.2M Table 2 #UT and #UTP in CCorpus and WCorpus mentation errors. For example, “切尼 (Cheney)” and “愿 (will)” are wrongly merged into one word 切 尼 by the word segmenter, and 切 尼 wrongly aligns to a comma in English sentence in the word alignment; However, both 切 and 尼 align to “Cheney” correctly in the character alignment. However, this kind of errors cannot be fixed by methods which learn new words by packing already segmented words, such as word packing (Ma et al., 2007) and Pseudo-word (Duan et al., 2010). As character could preserve more meanings than word in Chinese, it seems that a character can be wrongly aligned to many English words by the aligner. However, we found this can be avoided to a great extent by the basic features (co-occurrence and distortion) used by many alignment models. For example, we observed that the four characters of the non-compositional word “阿拉法特 (Arafat)” align to Arafat correctly, although these characters preserve different meanings from that of Arafat. This can be attributed to the frequent co-occurrence (192 愿 愿 times) of these characters and Arafat in CCorpus. Moreover, 法 usually means France in Chinese, thus it may co-occur very often with France in CCorpus. 
If both France and Arafat appear in the English sentence, 法 may wrongly align to France. However, if 阿 aligns to Arafat, 法 will probably align to Arafat, because aligning 法 to Arafat could result in a lower distortion cost than aligning it to France. Different from alignment, translation is a pattern matching procedure (Lopez, 2008). WSR determines how the translation rules would be matched by the source sentences. For example, if we use translation rules with character as WSR to translate name entities such as the non-compositional word 阿拉法特, i.e. translating literally, we may get a wrong translation. That’s because the linguistic knowledge that the four characters convey a specific meaning different from the characters has been lost, which cannot always be totally recovered even by using phrase in phrase-based SMT systems (see Chang et al. (2008) for detail). Duan et al. (2010) and Paul et al., (2010) further pointed out that coarser-grained segmentation of the source sentence do help capture more contexts in translation. Therefore, rather than using character, using coarser-grained, at least as coarser as the conventional word, as WSR is quite necessary. 3 Converting Character Alignment to Word Alignment In order to use word as WSR, we employ the same method as Elming and Habash (2007)4 to convert the character alignment (CA) to its word-based version (CA ’) for translation rule induction. The conversion is very intuitive: for every English-Chinese word pair ??, ?? in the sentence pair, we align ? to ? as a link in CA ’, if and only if there is at least one Chinese character of ? aligns to ? in CA. Given two different segmentations A and B of the same sentence, it is easy to prove that if every word in A is finer-grained than the word of B at the corresponding position, the conversion is unambiguity (we omit the proof due to space limitation). As character is a finer-grained than its original word, character alignment can always be converted to alignment based on any word segmentation. Therefore, our approach can be naturally scaled to syntax-based system by converting character alignment to word alignment where the word seg- mentation is consistent with the parsers. We compare CA with the conventional word alignment (WA) as follows: We hand-align some sentence pairs as the evaluation set based on characters (ESChar), and converted it to the evaluation set based on word (ESWord) using the above conversion method. It is worth noting that comparing CA and WA by evaluating CA on ESChar and evaluating WA on ESWord is meaningless, because the basic tokens in CA and WA are different. However, based on the conversion method, comparing CA with WA can be accomplished by evaluating both CA ’ and WA on ESWord. 4 They used this conversion for word alignment combination only, no translation results were reported. 287 4 Experiments 4.1 Setup FBIS corpus (LDC2003E14) (210K sentence pairs) was used for small-scale task. A large bilingual corpus of our lab (1.9M sentence pairs) was used for large-scale task. The NIST’06 and NIST’08 test sets were used as the development set and test set respectively. The Chinese portions of all these data were preprocessed by character segmenter (CHAR), ICTCLAS word segmenter5 (ICT) and Stanford word segmenters with CTB and PKU specifications6 respectively. The first 100 sentence pairs of the hand-aligned set in Haghighi et al. (2009) were hand-aligned as ESChar, which is converted to three ESWords based on three segmentations respectively. 
These ESWords were appended to training corpus with the corresponding word segmentation for evaluation purpose. Both character and word alignment were performed by GIZA++ (Och and Ney, 2003) enhanced with gdf heuristics to combine bidirectional alignments (Koehn et al., 2003). A 5-gram language model was trained from the Xinhua portion of Gigaword corpus. A phrase-based MT decoder similar to (Koehn et al., 2007) was used with the decoding weights optimized by MERT (Och, 2003). 4.2 Evaluation We first evaluate the alignment quality. The method discussed in section 3 was used to compare character and word alignment. As can be seen from Table 3, the systems using character as WSA outperformed the ones using word as WSA in both small-scale (row 3-5) and large-scale task (row 6-8) with all segmentations. This gain can be attributed to the small vocabulary size (sparsity) for character alignment. The observation is consistent with Koehn (2005) which claimed that there is a negative correlation between the vocabulary size and translation performance without explicitly distinguishing WSA and WSR. We then evaluated the translation performance. The baselines are fully word-based MT systems (WordSys), i.e. using word as both WSA and WSR, and fully character-based systems (CharSys). Table 5 http://www.ictclas.org/ 6 http://nlp.stanford.edu/software/segmenter.shtml TLSablCIPeKT3BUAlig87 n609P5mW.0162eonrdt8avl52R01ai.g l6489numatieo78n29F t. 46590PrecC87 i1hP28s.oa3027rn(ctPe89)r6R05,.ar7162e3licganm8 (15F62eR.n983)t, TableSL4TWwrcahonSraAdslatioWw no SerdRvalu2Ct31iT.o405Bn1724ofW2P 301Ko.895rU61d Sy2sI03Ca.29nT035d4 proand F-score (F) with ? ? 0.5 (Fraser and Marcu, 2007) posed system using BLEU-SBP (Chiang et al., 2008) 4 compares WordSys to our proposed system. Significant testing was carried out using bootstrap re-sampling method proposed by Koehn (2004) with a 95% confidence level. We see that our proposed systems outperformed WordSys in all segmentation specifications settings. Table 5 lists the results of CharSys in small-scale task. In this setting, we gradually set the phrase length and the distortion limits of the phrase-based decoder (context size) to 7, 9, 11 and 13, in order to remove the disadvantage of shorter context size of using character as WSR for fair comparison with WordSys as suggested by Duan et al. (2010). Comparing Table 4 and 5, we see that all CharSys underperformed WordSys. This observation is consistent with Chang et al. (2008) which claimed that using characters, even with large phrase length (up to 13 in our experiment) cannot always capture everything a Chinese word segmenter can do, and using word for translation is quite necessary. We also see that CharSys underperformed our proposed systems, that’s because the harm of using character as WSR outweighed the benefit of using character as WSA, which indicated that word segmentation better for alignment is not necessarily better for translation, and vice versa. We finally compared our approaches to Ma et al. (2007) and Ma and Way (2009), which proposed “packed word (PW)” and “bilingual motivated word (BS)” respectively. Both methods iteratively learn word segmentation and alignment alternatively, with the former starting from word-based corpus and the latter starting from characters-based corpus. Therefore, PW can be experimented on all segmentations. Table 6 lists their results in small- 288 Context Size 7 9 11 13 BLEU 20.90 21.19 20.89 21.09 Table 5 Translation evaluation of CharSys. 
CWPhrSoayps+TdoPtSaBebWmySdsle6wWPcCBhoWSa rAmdpawWrPBisoWS rRdnwiC2t1hT.2504oB6the2r1P0w9K.2o178U496rk s2I10C.9T547 scale task, we see that both PW and BS underperformed our approach. This may be attributed to the low recall of the learned BS or PW in their approaches. BS underperformed both two baselines, one reason is that Ma and Way (2009) also employed word lattice decoding techniques (Dyer et al., 2008) to tackle the low recall of BS, which was removed from our experiments for fair comparison. Interestingly, we found that using character as WSA and BS as WSR (Char+BS), a moderate gain (+0.43 point) was achieved compared with fully BS-based system; and using character as WSA and PW as WSR (Char+PW), significant gains were achieved compared with fully PW-based system, the result of CTB segmentation in this setting even outperformed our proposed approach (+0.42 point). This observation indicated that in our framework, better combinations of WSA and WSR can be found to achieve better translation performance. 5 Conclusions and Future Work We proposed a SMT framework that uses character for alignment and word for translation, which improved both alignment quality and translation performance. We believe that in this framework, using other finer-grained segmentation, with fewer ambiguities than character, would better parameterize the alignment models, while using other coarser-grained segmentation as WSR can help capture more linguistic knowledge than word to get better translation. We also believe that our approach, if integrated with combination techniques (Dyer et al., 2008; Xi et al., 2011), can yield better results. Acknowledgments We thank ACL reviewers. This work is supported by the National Natural Science Foundation of China (No. 61003 112), the National Fundamental Research Program of China (2010CB327903). References Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Peitra, and Robert L. Mercer. 1993. The mathematics of statistical machine translation: parameter estimation. Computational Linguistics, 19(2), pages 263-3 11. Pi-Chuan Chang, Michel Galley, and Christopher D. Manning. 2008. Optimizing Chinese word segmentation for machine translation performance. In Proceedings of third workshop on SMT, pages 224-232. David Chiang, Steve DeNeefe, Yee Seng Chan and Hwee Tou Ng. 2008. Decomposability of Translation Metrics for Improved Evaluation and Efficient Algorithms. In Proceedings of Conference on Empirical Methods in Natural Language Processing, pages 610-619. Tagyoung Chung and Daniel Gildea. 2009. Unsupervised tokenization for machine translation. In Proceedings of Conference on Empirical Methods in Natural Language Processing, pages 718-726. Michael Collins. 1999. Head-driven statistical models for natural language parsing. Ph.D. thesis, University of Pennsylvania. Xiangyu Duan, Min Zhang, and Haizhou Li. 2010. Pseudo-word for phrase-based machine translation. In Proceedings of the Association for Computational Linguistics, pages 148-156. Christopher Dyer, Smaranda Muresan, and Philip Resnik. 2008. Generalizing word lattice translation. In Proceedings of the Association for Computational Linguistics, pages 1012-1020. Jakob Elming and Nizar Habash. 2007. Combination of statistical word alignments based on multiple preprocessing schemes. In Proceedings of the Association for Computational Linguistics, pages 25-28. Alexander Fraser and Daniel Marcu. 2007. Squibs and Discussions: Measuring Word Alignment Quality for Statistical Machine Translation. 
In Computational Linguistics, 33(3), pages 293-303. Aria Haghighi, John Blitzer, John DeNero, and Dan Klein. 2009. Better word alignments with supervised ITG models. In Proceedings of the Association for Computational Linguistics, pages 923-93 1. Phillip Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan,W. Shen, C. Moran, R. Zens, C. Dyer, O. Bojar, A. Constantin, E. Herbst. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of the Association for Computational Linguistics, pages 177-1 80. 289 Philipp Koehn. 2004. Statistical significance tests for machine translation evaluation. In Proceedings of the Conference on Empirical Methods on Natural Language Processing, pages 388-395. Philipp Koehn. 2005. Europarl: A parallel corpus for statistical machine translation. In Proceedings of the MT Summit. Adam David Lopez. 2008. Machine translation by pattern matching. Ph.D. thesis, University of Maryland. Yanjun Ma, Nicolas Stroppa, and Andy Way. 2007. Bootstrapping word alignment via word packing. In Proceedings of the Association for Computational Linguistics, pages 304-3 11. Yanjun Ma and Andy Way. 2009. Bilingually motivated domain-adapted word segmentation for statistical machine translation. In Proceedings of the Conference of the European Chapter of the ACL, pages 549-557. Franz Josef Och. 2003. Minimum error rate training in statistical machine translation. In Proceedings of the Association for Computational Linguistics, pages 440-447. Franz Josef Och and Hermann Ney. 2003. A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1), pages 19-5 1. Michael Paul, Andrew Finch and Eiichiro Sumita. 2010. Integration of multiple bilingually-learned segmentation schemes into statistical machine translation. In Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR, pages 400-408. Jörg Tiedemann. 2009. Character-based PSMT for closely related languages. In Proceedings of the Annual Conference of the European Association for machine Translation, pages 12-19. David Vilar, Jan-T. Peter and Hermann Ney. 2007. Can we translate letters? In Proceedings of the Second Workshop on Statistical Machine Translation, pages 33-39. Xinyan Xiao, Yang Liu, Young-Sook Hwang, Qun Liu and Shouxun Lin. 2010. Joint tokenization and translation. In Proceedings of the 23rd International Conference on Computational Linguistics, pages 1200-1208. Ning Xi, Guangchao Tang, Boyuan Li, and Yinggong Zhao. 2011. Word alignment combination over multiple word segmentation. In Proceedings of the ACL 2011 Student Session, pages 1-5. Ruiqiang Zhang, Keiji Yasuda, and Eiichiro Sumita. 2008. Improved statistical machine translation by multiple Chinese word segmentation. of the Third Workshop on Statistical Machine Translation, pages 216-223. 290 In Proceedings

3 0.53641689 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning

Author: Jonathan Berant ; Ido Dagan ; Meni Adler ; Jacob Goldberger

Abstract: Learning entailment rules is fundamental in many semantic-inference applications and has been an active field of research in recent years. In this paper we address the problem of learning transitive graphs that describe entailment rules between predicates (termed entailment graphs). We first identify that entailment graphs exhibit a “tree-like” property and are very similar to a novel type of graph termed forest-reducible graph. We utilize this property to develop an iterative efficient approximation algorithm for learning the graph edges, where each iteration takes linear time. We compare our approximation algorithm to a recently-proposed state-of-the-art exact algorithm and show that it is more efficient and scalable both theoretically and empirically, while its output quality is close to that given by the optimal solution of the exact algorithm.

4 0.53634614 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures

Author: Danilo Croce ; Alessandro Moschitti ; Roberto Basili ; Martha Palmer

Abstract: In this paper, we propose innovative representations for automatic classification of verbs according to mainstream linguistic theories, namely VerbNet and FrameNet. First, syntactic and semantic structures capturing essential lexical and syntactic properties of verbs are defined. Then, we design advanced similarity functions between such structures, i.e., semantic tree kernel functions, for exploiting distributional and grammatical information in Support Vector Machines. The extensive empirical analysis on VerbNet class and frame detection shows that our models capture meaningful syntactic/semantic structures, which allows for improving the state-of-the-art.

5 0.53565323 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base

Author: Gerard de Melo ; Gerhard Weikum

Abstract: We present UWN, a large multilingual lexical knowledge base that describes the meanings and relationships of words in over 200 languages. This paper explains how link prediction, information integration and taxonomy induction methods have been used to build UWN based on WordNet and extend it with millions of named entities from Wikipedia. We additionally introduce extensions to cover lexical relationships, frame-semantic knowledge, and language data. An online interface provides human access to the data, while a software API enables applications to look up over 16 million words and names.

6 0.53502703 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers

7 0.53005284 84 acl-2012-Estimating Compact Yet Rich Tree Insertion Grammars

8 0.52939218 191 acl-2012-Temporally Anchored Relation Extraction

9 0.52647138 63 acl-2012-Cross-lingual Parse Disambiguation based on Semantic Correspondence

10 0.52558041 21 acl-2012-A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle

11 0.52463251 219 acl-2012-langid.py: An Off-the-shelf Language Identification Tool

12 0.52444923 187 acl-2012-Subgroup Detection in Ideological Discussions

13 0.52241832 48 acl-2012-Classifying French Verbs Using French and English Lexical Resources

14 0.52162218 146 acl-2012-Modeling Topic Dependencies in Hierarchical Text Categorization

15 0.52076858 167 acl-2012-QuickView: NLP-based Tweet Search

16 0.52044284 31 acl-2012-Authorship Attribution with Author-aware Topic Models

17 0.52043515 175 acl-2012-Semi-supervised Dependency Parsing using Lexical Affinities

18 0.52033955 132 acl-2012-Learning the Latent Semantics of a Concept from its Definition

19 0.51989746 184 acl-2012-String Re-writing Kernel

20 0.51911116 139 acl-2012-MIX Is Not a Tree-Adjoining Language