acl acl2013 acl2013-186 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Veronika Vincze ; Istvan Nagy T. ; Richard Farkas
Abstract: Here, we introduce a machine learning-based approach that allows us to identify light verb constructions (LVCs) in Hungarian and English free texts. We also present the results of our experiments on the SzegedParalellFX English–Hungarian parallel corpus where LVCs were manually annotated in both languages. With our approach, we were able to contrast the performance of our method and define language-specific features for these typologically different languages. Our presented method proved to be sufficiently robust as it achieved approximately the same scores on the two typologically different languages.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: Here, we introduce a machine learning-based approach that allows us to identify light verb constructions (LVCs) in Hungarian and English free texts. [sent-6, score-0.662]
2 We also present the results of our experiments on the SzegedParalellFX English–Hungarian parallel corpus where LVCs were manually annotated in both languages. [sent-7, score-0.046]
3 With our approach, we were able to contrast the performance of our method and define language-specific features for these typologically different languages. [sent-8, score-0.093]
4 Our presented method proved to be sufficiently robust as it achieved approximately the same scores on the two typologically different languages. [sent-9, score-0.093]
5 However, the investigation of languages that are typologically different from English is also essential since it can lead to innovations that might be usefully integrated into systems developed for English. [sent-11, score-0.089]
6 In this paper, we focus on the task of identifying light verb constructions (LVCs) in English and Hungarian free texts. [sent-13, score-0.63]
7 Thus, the same task is carried out for English and for a morphologically rich language, Hungarian. [sent-14, score-0.075]
8 We examine whether the same set of features can be used for both languages, investigate the benefits of integrating language-specific features into the systems, and explore how the systems could be further improved. [sent-15, score-0.07]
9 For this purpose, we make use of the English–Hungarian parallel corpus SzegedParalellFX (Vincze, 2012), where LVCs have been manually annotated. [sent-16, score-0.046]
10 2 Light Verb Constructions. Light verb constructions (e.g. [sent-17, score-0.34]
11 to give advice) are a subtype of multiword expressions (Sag et al. [sent-19, score-0.422]
12 They consist of a nominal and a verbal component where the verb functions as the syntactic head, but the semantic head is the noun. [sent-21, score-0.275]
13 The verbal component (also called a light verb) usually loses its original sense to some extent. [sent-22, score-0.293]
14 LVCs are usually distinguished from productive or literal verb + noun constructions on the one hand and idiomatic verb + noun expressions on the other (Fazly and Stevenson, 2007). [sent-25, score-0.872]
15 Variativity and omitting the verb play the most significant role in distinguishing LVCs from productive constructions and idioms (Vincze, 2011). [sent-26, score-0.44]
16 Variativity reflects the fact that LVCs can often be substituted by a verb derived from the same root as the nominal component within the construction (e.g. make a decision can be replaced by decide), whereas productive constructions and idioms can rarely be substituted by a single verb. [sent-27, score-0.65]
17 Omitting the verb exploits the fact that it is the nominal component that mostly bears the semantic content of the LVC, hence the event denoted by the construction can be determined even without the verb in most cases. [sent-28, score-0.466]
18 Furthermore, the very same noun + verb combination may function as an LVC in certain contexts while it is just a productive construction in others; compare He gave her a ring made of gold (non-LVC) and He gave her a ring because he wanted to hear her voice (LVC). Hence, it is important to identify them in context. [sent-29, score-0.325]
19 Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 255–261, Sofia, Bulgaria, August 4-9 2013. © 2013 Association for Computational Linguistics. [sent-31, score-0.083]
20 In theoretical linguistics, Kearns (2002) distinguishes between two subtypes of light verb constructions. [sent-32, score-0.434]
21 True light verb constructions such as to give a wipe or to have a laugh and vague action verbs such as to make an agreement or to do the ironing differ in some syntactic and semantic features and can be separated by various tests, e.g. [sent-33, score-0.688]
22 This distinction also manifests itself in natural language processing, as several authors focus only on the identification of true light verb constructions, e.g. [sent-36, score-0.472]
23 However, here we do not make such a distinction and aim to identify all types of light verb constructions both in English and in Hungarian, in accordance with the annotation principles of SZPFX. [sent-39, score-0.574]
24 The canonical form of a Hungarian light verb construction is a bare noun + third person singular verb. [sent-40, score-0.516]
25 However, they may occur in non-canonical versions as well: the verb may precede the noun, or the noun and the verb may not be adjacent due to the free word order. [sent-41, score-0.459]
26 Moreover, as Hungarian is a morphologically rich language, the verb may occur in different surface forms inflected for tense, mood, person and number. [sent-42, score-0.228]
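To make the canonical-form idea concrete, here is a minimal Python sketch. It assumes that each token already carries a surface form, a lemma and a coarse POS tag; the `Token` class and the tag names are hypothetical and do not reflect the authors' representation. The sketch keeps the bare noun exactly as it appears and replaces the verb with its lemma, which in Hungarian is the third person singular present form. As a result, word order, adjacency and verbal inflection no longer matter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    form: str   # surface form in the sentence
    lemma: str  # dictionary form; for Hungarian verbs this is 3sg present
    pos: str    # coarse POS tag (hypothetical tag set: "NOUN", "VERB", ...)


def canonical_form(tokens, noun_idx, verb_idx):
    """Return 'bare noun + 3sg verb' for a noun-verb pair, whatever their order or distance."""
    noun, verb = tokens[noun_idx], tokens[verb_idx]
    if noun.pos != "NOUN" or verb.pos != "VERB":
        raise ValueError("expected a NOUN and a VERB token")
    # The noun keeps its surface form (it may carry case, e.g. accusative -t);
    # the verb is reduced to its lemma, discarding tense, mood, person and number.
    return f"{noun.form.lower()} {verb.lemma}"


# 'A bizottság döntést hozott' (the committee made a decision): past tense
s1 = [Token("A", "a", "DET"), Token("bizottság", "bizottság", "NOUN"),
      Token("döntést", "döntés", "NOUN"), Token("hozott", "hoz", "VERB")]
# 'Hoznak majd döntést' (they will make a decision): verb first, not adjacent
s2 = [Token("Hoznak", "hoz", "VERB"), Token("majd", "majd", "ADV"),
      Token("döntést", "döntés", "NOUN")]

print(canonical_form(s1, 2, 3))  # döntést hoz
print(canonical_form(s2, 2, 0))  # döntést hoz
```

Both inflected, differently ordered occurrences map to the same key. Counting or dictionary lookups can then treat them as one construction.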
27 Parallel corpora are of high importance in the automatic identification of multiword expressions: it is usually one-to-many correspondence that is exploited when designing methods for detecting multiword expressions. [sent-48, score-0.646]
28 (2010) developed an alignment-based method for extracting multiword expressions from Portuguese–English parallel corpora. [sent-50, score-0.468]
29 Samardžić and Merlo (2010) analyzed English and German light verb constructions in parallel corpora, paying special attention to their manual and automatic alignment. [sent-51, score-0.675]
30 Zarrieß and Kuhn (2009) argued that multiword expressions can be reliably detected in parallel corpora by using dependency-parsed, word-aligned sentences. [sent-52, score-0.468]
31 a combination of a light verb and a noun, a verb or an adjective) in a Hindi–English parallel corpus by identifying a mismatch of the Hindi light verb meaning in the aligned English sentence. [sent-55, score-1.07]
32 (2010) when identifying Arabic multiword expressions, relying on asymmetries between parallel entry titles of Wikipedia. [sent-57, score-0.45]
33 Tsvetkov and Wintner (2010) identified Hebrew multiword expressions by searching for misalignments in an English–Hebrew parallel corpus. [sent-58, score-0.468]
34 To the best of our knowledge, parallel corpora have not been used for testing the efficiency of an MWE-detecting method for two languages at the same time. [sent-59, score-0.077]
35 Here, we investigate the performance of our base LVC detector on English and Hungarian and pay special attention to the added value of language-specific features. [sent-60, score-0.056]
36 4 Experiments. In our investigations, we made use of the SzegedParalellFX English–Hungarian parallel corpus, which consists of 14,000 sentences and contains about 1,370 LVCs for each language. [sent-61, score-0.046]
37 This binary classifier was based on a rich feature set described below. [sent-67, score-0.053]
38 The candidate extraction method investigated the dependency relations between verbs and nouns. [sent-68, score-0.151]
39 The dependency labels were provided by the Bohnet parser (Bohnet, 2010) for English and by magyarlanc 2 . [sent-70, score-0.055]
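The dependency-based candidate extraction can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that a parse is given as parallel lists of lemmas and POS tags plus (head, dependent, label) edges. It uses the English labels that the paper lists for candidate extraction (dobj, prep, rcmod, partmod, nsubjpass). Because rcmod and partmod attach the verb under the noun, the sketch accepts a verb-noun pair in either direction.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    head: int   # index of the governing token
    dep: int    # index of the dependent token
    label: str  # dependency label


# Labels mentioned in the paper for English candidate extraction.
ENGLISH_CANDIDATE_LABELS = frozenset({"dobj", "prep", "rcmod", "partmod", "nsubjpass"})


def extract_candidates(lemmas, pos, edges, labels=ENGLISH_CANDIDATE_LABELS):
    """Return (verb_lemma, noun_lemma, label) for every verb-noun edge with an allowed label."""
    candidates = []
    for e in edges:
        if e.label not in labels:
            continue
        roles = {pos[e.head]: e.head, pos[e.dep]: e.dep}
        # Keep the edge only if it links exactly one VERB and one NOUN, in either direction.
        if set(roles) == {"VERB", "NOUN"}:
            candidates.append((lemmas[roles["VERB"]], lemmas[roles["NOUN"]], e.label))
    return candidates


# "He made a decision": dobj(make, decision)
print(extract_candidates(["he", "make", "a", "decision"],
                         ["PRON", "VERB", "DET", "NOUN"],
                         [Edge(1, 0, "nsubj"), Edge(1, 3, "dobj"), Edge(3, 2, "det")]))
# [('make', 'decision', 'dobj')]
```

For Hungarian, the same function could be called with the att, obj, obl and subj relations that the paper names, assuming the parser emits those label strings.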
40 verb + noun) the candidate was marked as true; otherwise as false. [sent-77, score-0.224]
41 The English auxiliary verbs do and have often occur as light verbs; hence, we defined a feature for these two verbs denoting whether or not they functioned as auxiliary verbs in a given sentence. [sent-78, score-0.441]
42 The POS code of the word following the LVC candidate was also used as a feature. [sent-79, score-0.048]
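Below is a minimal sketch of the two English-specific features just described. It assumes that each token has a lemma, a POS tag and a dependency label. It also assumes that auxiliary uses of do and have are recognisable by an aux or auxpass label; this is an assumption about the parser's label set, not something the paper states.

```python
AUX_LABELS = frozenset({"aux", "auxpass"})  # assumed labels for auxiliary verbs


def english_specific_features(lemmas, pos, dep_labels, verb_idx, last_idx):
    """Auxiliary flag for do/have and the POS of the word after the candidate.

    verb_idx: index of the candidate's verb; last_idx: index of its last token.
    """
    feats = {}
    if lemmas[verb_idx] in {"do", "have"}:
        feats["is_auxiliary"] = dep_labels[verb_idx] in AUX_LABELS
    nxt = last_idx + 1
    feats["pos_after"] = pos[nxt] if nxt < len(pos) else "<END>"
    return feats


# "They have a laugh ." : 'have' is a main (light) verb here
print(english_specific_features(["they", "have", "a", "laugh", "."],
                                ["PRON", "VERB", "DET", "NOUN", "PUNCT"],
                                ["nsubj", "root", "det", "dobj", "punct"], 1, 3))
# {'is_auxiliary': False, 'pos_after': 'PUNCT'}
```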
43 As Hungarian is a morphologically rich language, we were able to define various morphology-based features, such as the case or number of the noun. [sent-80, score-0.166]
44 Whether the noun was historically derived from a verb but not analysed as a derivation by the Hungarian morphological parser was also added as a feature. [sent-81, score-0.117]
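For Hungarian, the morphology-based features might be assembled as below. The sketch makes two assumptions. First, the morphological analysis has already been converted into a feature dictionary with Case and Number keys; this is a Universal Dependencies-style convention chosen here for readability, and magyarlanc's own output format is not reproduced. Second, the list of historically deverbal nouns is supplied by the user, so its contents in the example are placeholders.

```python
def hungarian_morph_features(noun_lemma, morph, historical_deverbals):
    """Case and number of the noun plus a flag for historically deverbal nouns.

    morph: e.g. {"Case": "Acc", "Number": "Sing"} (assumed representation).
    historical_deverbals: nouns historically derived from verbs that the
    morphological parser does not analyse as derivations (user-supplied).
    """
    return {
        "case": morph.get("Case", "NONE"),
        "number": morph.get("Number", "NONE"),
        "historical_deverbal": noun_lemma in historical_deverbals,
    }


# Accusative singular noun of 'döntést hoz' (make a decision)
print(hungarian_morph_features("döntés", {"Case": "Acc", "Number": "Sing"}, set()))
# {'case': 'Acc', 'number': 'Sing', 'historical_deverbal': False}
```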
45 Semantic features: This feature also exploited the fact that the nominal component is usually derived from verbs. [sent-82, score-0.117]
46 Consequently, the activity or event semantic senses were searched for among the upper-level hypernyms of the head of the noun phrase in English WordNet 3. [sent-83, score-0.079]
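The semantic feature amounts to walking up the hypernym hierarchy of the head noun and checking whether an activity or event sense is reached. The sketch below uses a tiny, made-up single-parent hierarchy so that it runs without any external data. It is not WordNet content: real WordNet nouns can have several senses, and each sense can have several hypernyms. With NLTK's WordNet interface, one would instead iterate over `wn.synsets(noun, pos=wn.NOUN)` and their hypernym closures.

```python
TARGET_SENSES = frozenset({"activity", "event"})

# Toy hypernym links for illustration only (NOT taken from WordNet).
TOY_HYPERNYM = {
    "walk": "activity",
    "party": "event",
    "ring": "jewelry",
    "jewelry": "artifact",
}


def has_activity_or_event_hypernym(noun, hypernym_of=TOY_HYPERNYM,
                                   targets=TARGET_SENSES, max_depth=20):
    """Follow hypernym links upwards; True if an activity/event node is reached."""
    node = noun
    for _ in range(max_depth):  # depth limit guards against cycles in bad data
        node = hypernym_of.get(node)
        if node is None:
            return False
        if node in targets:
            return True
    return False


print(has_activity_or_event_hypernym("walk"), has_activity_or_event_hypernym("ring"))
# True False
```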
47 Orthographic features: The suffix feature is also based on the fact that many nominal components in LVCs are derived from verbs. [sent-86, score-0.066]
48 This feature checks whether the lemma of the noun ends in a given character bigram or trigram. [sent-87, score-0.105]
49 The number of words of the candidate LVC was also noted and applied as a feature. [sent-88, score-0.048]
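The two orthographic features can be computed directly from strings. In this sketch, the suffix list is a hypothetical example, since the paper does not list which bigrams and trigrams were used. Suffixes that are not two or three characters long are skipped, to match the bi-/trigram description.

```python
def orthographic_features(noun_lemma, candidate_words, suffixes):
    """One boolean per suffix (character bigram/trigram) plus the candidate's length in words."""
    feats = {f"suffix={s}": noun_lemma.endswith(s) for s in suffixes if 2 <= len(s) <= 3}
    feats["n_words"] = len(candidate_words)
    return feats


# Hypothetical suffix list; "-ion" often marks deverbal nouns in English.
print(orthographic_features("decision", ["make", "a", "decision"], ["ion", "al", "ment"]))
# {'suffix=ion': True, 'suffix=al': False, 'n_words': 3}
```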
50 Statistical features: Potential English LVCs and their occurrences were collected from 10,000 English Wikipedia pages by the candidate extraction method. [sent-89, score-0.072]
51 The number of occurrences was used as a feature when the candidate was one of the syntactic phrases collected. [sent-90, score-0.074]
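Below is a sketch of the frequency feature. It assumes that the candidates extracted from the Wikipedia pages are available as (verb lemma, noun lemma) pairs. Returning 0 for candidates that were never collected is a choice made here; the paper only says that the count was used when the candidate was among the collected phrases.

```python
from collections import Counter


def build_frequency_table(extracted_pairs):
    """Count how often each (verb, noun) candidate was extracted from the reference pages."""
    return Counter(extracted_pairs)


def frequency_feature(candidate, table):
    # Counter returns 0 for unseen keys, so unseen candidates get frequency 0.
    return table[candidate]


table = build_frequency_table([("make", "decision"), ("give", "advice"), ("make", "decision")])
print(frequency_feature(("make", "decision"), table), frequency_feature(("take", "walk"), table))
# 2 0
```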
52 Lexical features: We exploit the fact that the most common verbs are typically light verbs. [sent-91, score-0.313]
53 Therefore, fifteen typical light verbs were selected from the list of the most frequent verbs taken from the Wiki50 (Vincze et al. [sent-92, score-0.418]
54 Then, we investigated whether the lemmatised verbal component of the candidate was one of these fifteen verbs. [sent-96, score-0.16]
55 The lemma of the noun was also applied as a lexical feature. [sent-97, score-0.079]
56 Afterwards, we constructed lists of lemmatised LVCs obtained from the other corpora. [sent-99, score-0.074]
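The lexical features reduce to set lookups. The verb set below is an illustrative handful of common English light verbs. It is not the fifteen verbs selected from Wiki50, which are not listed in this excerpt. Likewise, the LVC list is a stand-in for the lists built from the other corpora.

```python
# Illustrative light verbs only; NOT the paper's list of fifteen.
ILLUSTRATIVE_LIGHT_VERBS = frozenset({"make", "take", "give", "have", "do"})


def lexical_features(verb_lemma, noun_lemma, lvc_list, light_verbs=ILLUSTRATIVE_LIGHT_VERBS):
    """Light-verb membership, the noun lemma itself, and membership in a known LVC list."""
    return {
        "is_light_verb": verb_lemma in light_verbs,
        "noun_lemma": noun_lemma,  # used as a categorical (lexical) feature
        "in_lvc_list": (verb_lemma, noun_lemma) in lvc_list,
    }


known_lvcs = {("make", "decision"), ("give", "advice")}  # stand-in LVC list
print(lexical_features("make", "decision", known_lvcs))
# {'is_light_verb': True, 'noun_lemma': 'decision', 'in_lvc_list': True}
```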
57 Syntactic features: As the candidate extraction methods basically depended on the dependency relation between the noun and the verb, they could also be utilised in identifying LVCs. [sent-100, score-0.179]
58 The dobj, prep, rcmod, partmod and nsubjpass dependency labels, which were used in candidate extraction for English, were also defined as features, while the att, obj, obl and subj dependency relations were used for Hungarian. [sent-101, score-0.072]
59 When the noun had a determiner in the candidate LVC, it was also encoded as another syntactic feature. [sent-102, score-0.127]
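The syntactic features reuse the parse: the label of the verb-noun relation, and whether the noun has a determiner dependent. This sketch assumes that edges are (head, dependent, label) tuples and that determiners carry the label det; the latter is an assumption about the parser output.

```python
def syntactic_features(relation_label, noun_idx, edges):
    """Dependency label linking verb and noun, and whether the noun governs a determiner."""
    has_det = any(head == noun_idx and label == "det" for head, _dep, label in edges)
    return {"dep_label": relation_label, "noun_has_determiner": has_det}


# "He made a decision": dobj(made, decision), det(decision, a)
edges = [(1, 0, "nsubj"), (1, 3, "dobj"), (3, 2, "det")]
print(syntactic_features("dobj", 3, edges))
# {'dep_label': 'dobj', 'noun_has_determiner': True}
```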
60 Our feature set includes language-independent and language-specific features as well. [sent-103, score-0.061]
61 Language-independent features seek to capture general properties of LVCs, while language-specific features exploit the different grammatical characteristics of the two languages or the availability of different resources. [sent-104, score-0.136]
62 The potential LVCs that were extracted by the candidate extraction method but not marked as positive in the gold standard were classed as negative. [sent-115, score-0.109]
63 The candidate extraction methods could not detect all LVCs in the corpus data, so some positive elements in the corpora were not covered. [sent-117, score-0.072]
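Turning candidates into training instances can be sketched as a lookup against the gold annotation. Here, candidates and gold LVCs are assumed to share an identifier such as (sentence id, verb lemma, noun lemma). The coverage value shows how many gold LVCs the extractor found at all. This caps the recall that any classifier built on top of the extractor can reach.

```python
def label_candidates(candidates, gold):
    """Mark candidates found in the gold standard as positive, all others as negative.

    Also returns the share of gold LVCs covered by the candidate extractor.
    """
    gold = set(gold)
    labelled = [(c, c in gold) for c in candidates]
    coverage = len(gold & set(candidates)) / len(gold) if gold else 0.0
    return labelled, coverage


cands = [(1, "make", "decision"), (1, "make", "cake"), (2, "give", "advice")]
gold = [(1, "make", "decision"), (2, "give", "advice"), (3, "take", "walk")]
labelled, coverage = label_candidates(cands, gold)
print([y for _, y in labelled], round(coverage, 3))
# [True, False, True] 0.667
```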
64 Features Base English Hungarian Orthographical•–– VerbalStem •• POS pattern •• LVC list •• Light verb list •• Semantic features •• Syntactic features •• Auxiliary verb –•• Determiner •• Noun list •• POS After •• LVC freq. [sent-119, score-0.422]
65 5 Results. As a baseline, a context-free dictionary matching method was applied. [sent-131, score-0.064]
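A context-free dictionary matcher ignores everything except the lemma pair. This is exactly why it cannot separate the two uses of give a ring discussed earlier. The sketch assumes that the dictionary is a set of (verb lemma, noun lemma) pairs; the source of the paper's actual dictionary is not described in this excerpt.

```python
def dictionary_match(candidates, lvc_dictionary):
    """Predict LVC for every candidate whose (verb, noun) lemma pair is in the dictionary."""
    return [(verb, noun) in lvc_dictionary for verb, noun in candidates]


lvc_dictionary = {("give", "ring"), ("make", "decision")}
# "gave her a ring made of gold" vs. "gave her a ring (phone call)": same lemmas, same answer
print(dictionary_match([("give", "ring"), ("give", "ring"), ("bake", "cake")], lvc_dictionary))
# [True, True, False]
```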
66 Table 2 lists the results obtained on the two parts of SZPFX using the machine learning-based approach and the baseline dictionary matching. [sent-134, score-0.143]
67 However, the machine learning-based approach proved to be the most successful, as it achieved an F-score that was 18. [sent-137, score-0.095]
68 At the same time, the machine learning and dictionary matching methods achieved roughly the same precision on the Hungarian part of SZPFX, but again the machine learning-based approach achieved the best F-score. [sent-140, score-0.083]
69 In the case of English, the dictionary matching method achieved a higher precision score, but the machine learning approach proved to be more effective overall. [sent-141, score-0.118]
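The precision/F-score contrast described above follows from how the measures are defined. Below is a minimal sketch, assuming that predictions and gold annotations are sets of LVC identifiers. A method can reach high precision by predicting few LVCs, while its recall, and hence its F-score, stays low.

```python
def precision_recall_f1(predicted, gold):
    """Set-based precision, recall and F1 (harmonic mean of precision and recall)."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# A cautious matcher: perfect precision, poor recall.
print(precision_recall_f1({1}, {1, 2, 3, 4}))
# (1.0, 0.25, 0.4)
```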
70 An ablation analysis was carried out to examine the effectiveness of each individual feature of the machine learning-based candidate classification [...] terms of F-score using the SZPFX corpus. [sent-142, score-0.097]
71 For each feature type, a J48 classifier was trained with all of the features except that one. [sent-144, score-0.061]
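The leave-one-out ablation loop can be written independently of the learner. In this sketch, `evaluate` is a hypothetical callback. It stands in for training J48 (Weka's C4.5 implementation) on the given feature groups and returning the F-score. The toy evaluator in the example just sums made-up per-group contributions. Note that scikit-learn's DecisionTreeClassifier implements an optimised version of CART rather than C4.5, so it would only approximate J48.

```python
def ablation(feature_groups, evaluate):
    """Drop each feature group in turn; return the score loss relative to using all groups."""
    full_score = evaluate(feature_groups)
    return {
        group: full_score - evaluate([g for g in feature_groups if g != group])
        for group in feature_groups
    }


# Toy evaluator (hypothetical numbers): score is a sum of per-group contributions.
contrib = {"lexical": 10.0, "syntactic": 3.0, "semantic": 1.0}
print(ablation(list(contrib), lambda groups: sum(contrib[g] for g in groups)))
# {'lexical': 10.0, 'syntactic': 3.0, 'semantic': 1.0}
```

A large drop when a group is removed suggests that the group carries information the others do not.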
72 We also investigated how language-specific features improved the performance compared to the base feature set. [sent-145, score-0.086]
73 We then compared the performance to that obtained with all the features. [sent-146, score-0.047]
74 6 Discussion. According to the results, our base system is robust enough to achieve approximately the same results on two typologically different languages. [sent-150, score-0.083]
75 It should also be mentioned that some of the base features (e.g. [sent-152, score-0.06]
76 they were included in the base feature set) since it was also effective in the case of the other language. [sent-156, score-0.051]
77 This may be related to the fact that the distribution of light verbs is quite different in the two languages. [sent-160, score-0.313]
78 While the top 15 verbs cover more than 80% of the English LVCs, in Hungarian this number is only 63% (and 38 verbs would be needed to reach the same coverage). [sent-161, score-0.158]
79 Another difference is that there are 102 different verbs in English, whose frequencies follow a Zipfian distribution, whereas there are 157 Hungarian verbs with a more balanced distributional pattern. [sent-162, score-0.158]
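The coverage figures can be computed from the verb lemmas of the annotated LVCs. The sketch below uses a small invented sample, not SZPFX counts. It returns the share of LVCs covered by the k most frequent verbs, and the number of verbs needed to reach a target coverage.

```python
from collections import Counter


def topk_coverage(lvc_verbs, k):
    """Fraction of LVC occurrences whose verb is among the k most frequent verbs."""
    if not lvc_verbs:
        return 0.0
    top = Counter(lvc_verbs).most_common(k)
    return sum(count for _, count in top) / len(lvc_verbs)


def verbs_needed(lvc_verbs, target):
    """Smallest number of most frequent verbs whose LVCs reach the target coverage."""
    counts = Counter(lvc_verbs)
    covered = 0
    for i, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered / len(lvc_verbs) >= target:
            return i
    return len(counts)


sample = ["make"] * 5 + ["take"] * 3 + ["give", "do"]  # invented data
print(topk_coverage(sample, 2), verbs_needed(sample, 0.9))
# 0.8 3
```

A Zipfian verb distribution reaches a given coverage with far fewer verbs than a flatter one. This is the contrast the figures above draw between English and Hungarian.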
80 Thus, fewer verbs cover a greater share of LVCs in English than in Hungarian, which also explains why lexical features contribute more to the overall performance in English. [sent-163, score-0.114]
81 This fact also indicates that if verb lists are further extended, still better recall scores may be achieved for both languages. [sent-164, score-0.176]
82 As for the effectiveness of morphological and syntactic features, morphological features perform better on a language with a rich morphological representation (Hungarian). [sent-165, score-0.176]
83 Among the general errors, we found that LVCs with a rare light verb were difficult to recognize (e.g. [sent-171, score-0.41]
84 In other cases, an originally deverbal noun was used in a lexicalised sense together with a typical light verb (e.g. [sent-174, score-0.489]
85 buildings are given (something)) and these candidates were falsely classed as LVCs. [sent-176, score-0.037]
86 As for language-specific errors, English verb-particle combinations (VPCs) followed by a noun were often labeled as LVCs, such as make up his mind or give in his notice. [sent-178, score-0.079]
87 In Hungarian, verb + proper noun constructions (Hamletet játsszák (Hamlet-ACC play-3PL. [sent-179, score-0.419]
88 DEF) “they are playing Hamlet”) were sometimes regarded as LVCs since the morphological analysis does not make a distinction between proper and common nouns. [sent-180, score-0.038]
89 This may be explained by the fact that different authors aimed to identify a different scope of linguistic phenomena and thus interpreted the concept of “light verb construction” slightly differently. [sent-183, score-0.176]
90 (2006) focused only on true light verb constructions while only object–verb pairs are considered in other studies (Stevenson et al. [sent-185, score-0.574]
91 Several other studies report results only on light verb constructions formed with certain light verbs (Stevenson et al. [sent-189, score-0.887]
92 7 Conclusions. In this paper, we introduced our machine learning-based approach for identifying LVCs in Hungarian and English free texts. [sent-197, score-0.116]
93 The method proved to be sufficiently robust as it achieved approximately the same scores on two typologically different languages. [sent-198, score-0.093]
94 In addition, some language-independent features were inspired by one of the languages, so a multilingual approach proved to be fruitful in the case of monolingual LVC detection as well. [sent-200, score-0.07]
95 Later, we also plan to adapt the tool to other types of multiword expressions and conduct further experiments on languages other than English and Hungarian, the results of which may further lead to a more robust, general LVC system. [sent-202, score-0.453]
96 A measure of syntactic flexibility for automatically identifying multiword expressions in corpora. [sent-226, score-0.45]
97 Pulling their weight: exploiting syntactic forms for the automatic identification of idiomatic expressions in context. [sent-239, score-0.186]
98 Cross-lingual variation of light verb constructions: Using parallel corpora and automatic alignment for linguistic research. [sent-286, score-0.456]
99 Extending corpus-based identification of light verb constructions using a supervised learning framework. [sent-309, score-0.605]
100 Multiword expressions and named entities in the Wiki50 corpus. [sent-349, score-0.128]
wordName wordTfidf (topN-words)
[('lvcs', 0.589), ('hungarian', 0.314), ('multiword', 0.294), ('lvc', 0.244), ('light', 0.234), ('vincze', 0.195), ('verb', 0.176), ('constructions', 0.164), ('szpfx', 0.129), ('expressions', 0.128), ('veronika', 0.099), ('szeged', 0.098), ('fazly', 0.081), ('noun', 0.079), ('verbs', 0.079), ('szegedparalellfx', 0.074), ('learningbased', 0.06), ('typologically', 0.058), ('mwe', 0.056), ('bor', 0.055), ('magyarlanc', 0.055), ('tan', 0.054), ('stevenson', 0.054), ('tu', 0.052), ('afsaneh', 0.049), ('english', 0.049), ('candidate', 0.048), ('got', 0.047), ('parallel', 0.046), ('istv', 0.045), ('nagy', 0.045), ('productive', 0.043), ('csirik', 0.042), ('nos', 0.042), ('nominal', 0.04), ('broader', 0.039), ('morphological', 0.038), ('hindi', 0.038), ('attila', 0.037), ('classed', 0.037), ('eged', 0.037), ('gurrutxaga', 0.037), ('ltz', 0.037), ('samard', 0.037), ('sanrom', 0.037), ('tsvetkov', 0.037), ('variativity', 0.037), ('verbalstem', 0.037), ('zsibrita', 0.037), ('roth', 0.037), ('dictionary', 0.036), ('proved', 0.035), ('verbal', 0.035), ('features', 0.035), ('suzanne', 0.035), ('zarrie', 0.033), ('bego', 0.033), ('caseli', 0.033), ('moir', 0.033), ('pay', 0.031), ('languages', 0.031), ('identification', 0.031), ('organizing', 0.03), ('morristown', 0.03), ('alonso', 0.03), ('cruys', 0.03), ('ofenglish', 0.03), ('omitting', 0.03), ('mih', 0.03), ('attia', 0.03), ('ring', 0.03), ('sz', 0.03), ('cook', 0.029), ('aline', 0.028), ('free', 0.028), ('identifying', 0.028), ('exploited', 0.027), ('rich', 0.027), ('construction', 0.027), ('bond', 0.027), ('bannard', 0.027), ('idioms', 0.027), ('idiomatic', 0.027), ('lemmatised', 0.027), ('feature', 0.026), ('fifteen', 0.026), ('sag', 0.026), ('evert', 0.026), ('perspective', 0.026), ('base', 0.025), ('morphologically', 0.025), ('mel', 0.025), ('basque', 0.025), ('component', 0.024), ('extraction', 0.024), ('zi', 0.024), ('subtypes', 0.024), ('hence', 0.023), ('nj', 0.023), ('carried', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 186 acl-2013-Identifying English and Hungarian Light Verb Constructions: A Contrastive Approach
2 0.13434674 302 acl-2013-Robust Automated Natural Language Processing with Multiword Expressions and Collocations
Author: Valia Kordoni ; Markus Egg
Abstract: unkown-abstract
3 0.092440829 366 acl-2013-Understanding Verbs based on Overlapping Verbs Senses
Author: Kavitha Rajan
Abstract: Natural language can be easily understood by everyone irrespective of their differences in age or region or qualification. The existence of a conceptual base that underlies all natural languages is an accepted claim as pointed out by Schank in his Conceptual Dependency (CD) theory. Inspired by the CD theory and theories in Indian grammatical tradition, we propose a new set of meaning primitives in this paper. We claim that this new set of primitives captures the meaning inherent in verbs and help in forming an inter-lingual and computable ontological classification of verbs. We have identified seven primitive overlapping verb senses which substantiate our claim. The percentage of coverage of these primitives is 100% for all verbs in Sanskrit and Hindi and 3750 verbs in English.
4 0.087243326 59 acl-2013-Automated Pyramid Scoring of Summaries using Distributional Semantics
Author: Rebecca J. Passonneau ; Emily Chen ; Weiwei Guo ; Dolores Perin
Abstract: The pyramid method for content evaluation of automated summarizers produces scores that are shown to correlate well with manual scores used in educational assessment of students’ summaries. This motivates the development of a more accurate automated method to compute pyramid scores. Of three methods tested here, the one that performs best relies on latent semantics.
5 0.087070204 58 acl-2013-Automated Collocation Suggestion for Japanese Second Language Learners
Author: Lis Pereira ; Erlyn Manguilimotan ; Yuji Matsumoto
Abstract: This study addresses issues of Japanese language learning concerning word combinations (collocations). Japanese learners may be able to construct grammatically correct sentences, however, these may sound “unnatural”. In this work, we analyze correct word combinations using different collocation measures and word similarity methods. While other methods use well-formed text, our approach makes use of a large Japanese language learner corpus for generating collocation candidates, in order to build a system that is more sensitive to constructions that are difficult for learners. Our results show that we get better results compared to other methods that use only well-formed text.
6 0.067100927 213 acl-2013-Language Acquisition and Probabilistic Models: keeping it simple
7 0.063250929 270 acl-2013-ParGramBank: The ParGram Parallel Treebank
8 0.062621213 8 acl-2013-A Learner Corpus-based Approach to Verb Suggestion for ESL
9 0.060885608 119 acl-2013-Diathesis alternation approximation for verb clustering
10 0.059491526 29 acl-2013-A Visual Analytics System for Cluster Exploration
11 0.05929612 192 acl-2013-Improved Lexical Acquisition through DPP-based Verb Clustering
13 0.054283407 378 acl-2013-Using subcategorization knowledge to improve case prediction for translation to German
14 0.052125569 98 acl-2013-Cross-lingual Transfer of Semantic Role Labeling Models
15 0.050220303 116 acl-2013-Detecting Metaphor by Contextual Analogy
16 0.050073199 368 acl-2013-Universal Dependency Annotation for Multilingual Parsing
17 0.050030921 45 acl-2013-An Empirical Study on Uncertainty Identification in Social Media Context
18 0.04887059 258 acl-2013-Neighbors Help: Bilingual Unsupervised WSD Using Context
19 0.04526281 121 acl-2013-Discovering User Interactions in Ideological Discussions
20 0.043839291 303 acl-2013-Robust multilingual statistical morphological generation models
topicId topicWeight
[(0, 0.125), (1, 0.017), (2, -0.017), (3, -0.057), (4, -0.052), (5, -0.035), (6, -0.051), (7, 0.029), (8, 0.059), (9, -0.022), (10, -0.035), (11, 0.007), (12, 0.001), (13, 0.021), (14, -0.072), (15, -0.025), (16, -0.021), (17, -0.057), (18, 0.018), (19, -0.006), (20, -0.01), (21, -0.009), (22, 0.078), (23, -0.052), (24, 0.1), (25, 0.017), (26, -0.085), (27, -0.016), (28, 0.013), (29, 0.004), (30, 0.026), (31, -0.017), (32, -0.049), (33, -0.077), (34, -0.031), (35, 0.01), (36, -0.067), (37, -0.024), (38, 0.09), (39, -0.055), (40, -0.0), (41, -0.107), (42, 0.035), (43, 0.015), (44, 0.051), (45, 0.006), (46, 0.074), (47, -0.065), (48, -0.029), (49, -0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.91984582 186 acl-2013-Identifying English and Hungarian Light Verb Constructions: A Contrastive Approach
2 0.75482303 302 acl-2013-Robust Automated Natural Language Processing with Multiword Expressions and Collocations
3 0.71770191 366 acl-2013-Understanding Verbs based on Overlapping Verbs Senses
4 0.65083253 8 acl-2013-A Learner Corpus-based Approach to Verb Suggestion for ESL
Author: Yu Sawai ; Mamoru Komachi ; Yuji Matsumoto
Abstract: We propose a verb suggestion method which uses candidate sets and domain adaptation to incorporate error patterns produced by ESL learners. The candidate sets are constructed from a large scale learner corpus to cover various error patterns made by learners. Furthermore, the model is trained using both a native corpus and the learner corpus via a domain adaptation technique. Experiments on two learner corpora show that the candidate sets increase the coverage of error patterns and domain adaptation improves the performance for verb suggestion.
5 0.63802224 58 acl-2013-Automated Collocation Suggestion for Japanese Second Language Learners
6 0.60702646 344 acl-2013-The Effects of Lexical Resource Quality on Preference Violation Detection
8 0.5933463 119 acl-2013-Diathesis alternation approximation for verb clustering
9 0.58378971 213 acl-2013-Language Acquisition and Probabilistic Models: keeping it simple
10 0.55816531 270 acl-2013-ParGramBank: The ParGram Parallel Treebank
11 0.53946412 364 acl-2013-Typesetting for Improved Readability using Lexical and Syntactic Information
12 0.52804542 161 acl-2013-Fluid Construction Grammar for Historical and Evolutionary Linguistics
13 0.52681369 122 acl-2013-Discriminative Approach to Fill-in-the-Blank Quiz Generation for Language Learners
14 0.51260412 378 acl-2013-Using subcategorization knowledge to improve case prediction for translation to German
15 0.51021385 227 acl-2013-Learning to lemmatise Polish noun phrases
16 0.50025052 192 acl-2013-Improved Lexical Acquisition through DPP-based Verb Clustering
17 0.47404751 28 acl-2013-A Unified Morpho-Syntactic Scheme of Stanford Dependencies
18 0.47364402 299 acl-2013-Reconstructing an Indo-European Family Tree from Non-native English Texts
19 0.47031224 231 acl-2013-Linggle: a Web-scale Linguistic Search Engine for Words in Context
20 0.46911061 76 acl-2013-Building and Evaluating a Distributional Memory for Croatian
topicId topicWeight
[(0, 0.031), (6, 0.021), (11, 0.064), (13, 0.274), (15, 0.014), (24, 0.056), (26, 0.044), (31, 0.015), (35, 0.073), (42, 0.103), (48, 0.041), (64, 0.012), (70, 0.038), (88, 0.019), (90, 0.026), (95, 0.08)]
simIndex simValue paperId paperTitle
1 0.79147243 386 acl-2013-What causes a causal relation? Detecting Causal Triggers in Biomedical Scientific Discourse
Author: Claudiu Mihaila ; Sophia Ananiadou
Abstract: Current domain-specific information extraction systems represent an important resource for biomedical researchers, who need to process vaster amounts of knowledge in short times. Automatic discourse causality recognition can further improve their workload by suggesting possible causal connections and aiding in the curation of pathway models. We here describe an approach to the automatic identification of discourse causality triggers in the biomedical domain using machine learning. We create several baselines and experiment with various parameter settings for three algorithms, i.e., Conditional Random Fields (CRF), Support Vector Machines (SVM) and Random Forests (RF). Also, we evaluate the impact of lexical, syntactic and semantic features on each of the algorithms and look at errors. The best performance of 79.35% F-score is achieved by CRFs when using all three feature types.
same-paper 2 0.77540189 186 acl-2013-Identifying English and Hungarian Light Verb Constructions: A Contrastive Approach
3 0.75475514 322 acl-2013-Simple, readable sub-sentences
Author: Sigrid Klerke ; Anders Søgaard
Abstract: We present experiments using a new unsupervised approach to automatic text simplification, which builds on sampling and ranking via a loss function informed by readability research. The main idea is that a loss function can distinguish good simplification candidates among randomly sampled sub-sentences of the input sentence. Our approach is rated as equally grammatical and beginner reader appropriate as a supervised SMT-based baseline system by native speakers, but our setup performs more radical changes that better resembles the variation observed in human generated simplifications.
4 0.66791362 68 acl-2013-Bilingual Data Cleaning for SMT using Graph-based Random Walk
Author: Lei Cui ; Dongdong Zhang ; Shujie Liu ; Mu Li ; Ming Zhou
Abstract: The quality of bilingual data is a key factor in Statistical Machine Translation (SMT). Low-quality bilingual data tends to produce incorrect translation knowledge and also degrades translation modeling performance. Previous work often used supervised learning methods to filter low-quality data, but these require a fair amount of human-labeled examples, which are not easy to obtain. To reduce the reliance on labeled examples, we propose an unsupervised method to clean bilingual data. The method leverages the mutual reinforcement between the sentence pairs and the extracted phrase pairs, based on the observation that better sentence pairs often lead to better phrase extraction and vice versa. End-to-end experiments show that the proposed method substantially improves performance in large-scale Chinese-to-English translation tasks.
5 0.55583972 38 acl-2013-Additive Neural Networks for Statistical Machine Translation
Author: Lemao Liu ; Taro Watanabe ; Eiichiro Sumita ; Tiejun Zhao
Abstract: Most statistical machine translation (SMT) systems are modeled using a log-linear framework. Although the log-linear model has been successful in SMT, it still suffers from some limitations: (1) the features are required to be linear with respect to the model itself; (2) features cannot be further interpreted to reach their potential. A neural network is a reasonable way to address these pitfalls. However, modeling SMT with a neural network is not trivial, especially when decoding efficiency is taken into consideration. In this paper, we propose a variant of a neural network, i.e., additive neural networks, for SMT to go beyond the log-linear translation model. In addition, word embeddings are employed as the input to the neural network, encoding each word as a feature vector. Our model outperforms the log-linear translation models with and without embedding features on Chinese-to-English and Japanese-to-English translation tasks.
6 0.55521792 223 acl-2013-Learning a Phrase-based Translation Model from Monolingual Data with Application to Domain Adaptation
7 0.55481416 132 acl-2013-Easy-First POS Tagging and Dependency Parsing with Beam Search
8 0.55385411 56 acl-2013-Argument Inference from Relevant Event Mentions in Chinese Argument Extraction
9 0.55006158 98 acl-2013-Cross-lingual Transfer of Semantic Role Labeling Models
10 0.54935265 181 acl-2013-Hierarchical Phrase Table Combination for Machine Translation
11 0.54838008 80 acl-2013-Chinese Parsing Exploiting Characters
12 0.54819441 383 acl-2013-Vector Space Model for Adaptation in Statistical Machine Translation
13 0.54731882 82 acl-2013-Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation
14 0.547203 172 acl-2013-Graph-based Local Coherence Modeling
15 0.54705882 127 acl-2013-Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation
16 0.54621398 201 acl-2013-Integrating Translation Memory into Phrase-Based Machine Translation during Decoding
17 0.54616714 226 acl-2013-Learning to Prune: Context-Sensitive Pruning for Syntactic MT
18 0.54595554 174 acl-2013-Graph Propagation for Paraphrasing Out-of-Vocabulary Words in Statistical Machine Translation
19 0.5454796 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing
20 0.54487646 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing