302 acl-2013-Robust Automated Natural Language Processing with Multiword Expressions and Collocations


Source: pdf

Author: Valia Kordoni ; Markus Egg

Abstract: unknown-abstract

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Robust Automated Natural Language Processing with Multiword Expressions and Collocations Valia Kordoni and Markus Egg Humboldt-Universität zu Berlin (Germany) kordonie @ angl i ik . [sent-1, score-0.102]

2 Our target audience are researchers and practitioners in language technology, not necessarily experts in MWEs, who are interested in tasks that involve or could benefit from considering MWEs as a pervasive phenomenon in human language and communication. [sent-7, score-0.025]

3 2 Topic Overview Multiword expressions (MWEs) like break down, bus stop and make ends meet, are expressions consisting of two or more lexical units that correspond to some conventional way of saying things (Sag et al. [sent-8, score-0.321]

4 They range over linguistic constructions such as fixed phrases (per se, by and large), noun compounds (telephone booth, cable car), compound verbs (give a presentation), idioms (a frog in the throat, kill some time), etc. [sent-10, score-0.084]

5 While easily mastered by native speakers, their treatment and interpretation involves considerable effort for computational systems (and nonnative speakers), due to their idiosyncratic, flexible and heterogeneous nature (Rayson et al. [sent-14, score-0.132]

6 For a given MWE, there is also the problem of determining whether it forms a compositional (take away the dishes), semi-idiomatic (boil up the beans) or idiomatic combination (roll up your sleeves) (Kim and Nakov, 2011; Shutova et al. [sent-22, score-0.047]

7 Furthermore, MWEs may also be polysemous: bring up as carrying (bring up the bags), raising (bring up the children) and mentioning (bring up the subject). [sent-24, score-0.055]

8 the idiomatic use of spill in spilling beans as revealing secrets vs. [sent-27, score-0.232]

9 3 Content Overview This tutorial consists of four parts. [sent-29, score-0.066]

10 Part I starts with a thorough introduction to different types of MWEs and collocations, their linguistic dimensions (idiomaticity, syntactic and semantic fixedness, specificity, etc. [sent-30, score-0.037]

11 This part concludes with an overview of linguistic and psycholinguistic theories of MWEs to date. [sent-33, score-0.166]

12 For MWEs to be useful for language technology, they must be recognisable automatically. [sent-34, score-0.029]

13 Proceedings of the 51st Annual Meeting o. Sofia, Bulgaria, August 4-9 2013. [sent-35, score-0.024]

14 We will also review token identification and disambiguation of MWEs in context (e. [sent-38, score-0.035]

15 The bus stop is here) and methods for the automatic detection of the degree of compositionality of MWEs and their interpretation. [sent-42, score-0.176]

16 Part IV concludes with a list of future possibilities and open challenges in the computational treatment of MWEs in current NLP models and techniques. [sent-45, score-0.075]

17 PART I General overview: – (a) Introduction (b) Types and examples of MWEs and collocations (c) Linguistic dimensions of MWEs: idiomaticity, syntactic and semantic fixedness, specificity, etc. [sent-47, score-0.148]

18 (d) Statistical dimensions of MWEs: variability, recurrence, association, etc. [sent-48, score-0.037]

19 PART III – Resources, tasks and applications: (a) MWEs in resources: corpora, lexica and ontologies (e. [sent-51, score-0.024]

20 Large-scale noun compound interpretation using bootstrapping and the web as a corpus. [sent-63, score-0.063]

21 An evaluation of methods for the extraction of multiword expressions. [sent-81, score-0.278]

22 Multiword expressions: A pain in the neck for NLP. [sent-94, score-0.05]

23 Introduction to the special issue on multiword expressions: Having a crack at a hard nut. [sent-109, score-0.278]

24 Validation and evaluation of automatically acquired multiword expressions for grammar engineering. [sent-113, score-0.37]
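The page does not document how sentScore is computed. As a minimal sketch, assuming a sentence's score is the sum of the tf-idf weights of its distinct, case-sensitive, whitespace-separated tokens that appear in the topN-words table, several of the listed scores can be reproduced (for example sentences 8 and 22). The function name `score_sentence` and the `WEIGHTS` subset are illustrative, not part of the original pipeline.

```python
# Hypothetical re-implementation of the sentence scoring; the real
# pipeline is not documented on this page.

# A small subset of the (word, tf-idf) pairs listed for this paper.
WEIGHTS = {
    "multiword": 0.278, "expressions": 0.092,
    "idiomatic": 0.047, "spill": 0.026, "spilling": 0.065,
    "beans": 0.065, "secrets": 0.029,
    "pain": 0.024, "neck": 0.026,
}


def score_sentence(sentence, weights):
    """Sum the weights of the distinct whitespace tokens in `sentence`.

    Assumption: tokens are matched exactly (case-sensitive, punctuation
    kept attached), so "Multiword" and "expressions:" do not match.
    """
    tokens = set(sentence.split())
    return sum(weights.get(token, 0.0) for token in tokens)


sent_8 = ("the idiomatic use of spill in spilling beans "
          "as revealing secrets vs.")
sent_22 = "Multiword expressions: A pain in the neck for NLP."

print(round(score_sentence(sent_8, WEIGHTS), 3))   # 0.232
print(round(score_sentence(sent_22, WEIGHTS), 3))  # 0.05
```

Under this assumption, capitalised or punctuation-attached tokens contribute nothing, which would explain why sentence 22 scores only 0.05 despite containing "Multiword expressions".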


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('mwes', 0.809), ('multiword', 0.278), ('mwe', 0.174), ('collocations', 0.111), ('aline', 0.099), ('expressions', 0.092), ('bus', 0.091), ('ramisch', 0.086), ('villavicencio', 0.079), ('valia', 0.075), ('evert', 0.068), ('tutorial', 0.066), ('angl', 0.065), ('beans', 0.065), ('egoire', 0.065), ('idiomaticity', 0.065), ('krenn', 0.065), ('rayson', 0.065), ('spilling', 0.065), ('recurrence', 0.057), ('idiart', 0.057), ('kordoni', 0.057), ('fixedness', 0.057), ('carlos', 0.055), ('bring', 0.055), ('nicole', 0.053), ('brigitte', 0.053), ('egg', 0.05), ('variability', 0.048), ('bond', 0.047), ('idiomatic', 0.047), ('stop', 0.046), ('treatment', 0.046), ('shutova', 0.045), ('sag', 0.045), ('specificity', 0.044), ('psycholinguistic', 0.042), ('gr', 0.041), ('francis', 0.04), ('compositionality', 0.039), ('stefan', 0.038), ('recognising', 0.038), ('dimensions', 0.037), ('ik', 0.037), ('markus', 0.035), ('disambiguation', 0.035), ('part', 0.032), ('interpretation', 0.032), ('overview', 0.032), ('compound', 0.031), ('theories', 0.031), ('iv', 0.03), ('presentation', 0.03), ('concludes', 0.029), ('frog', 0.029), ('boil', 0.029), ('piao', 0.029), ('secrets', 0.029), ('apps', 0.029), ('recognisable', 0.029), ('ucs', 0.029), ('dishes', 0.029), ('bego', 0.029), ('moir', 0.029), ('villada', 0.029), ('mastered', 0.029), ('cortex', 0.029), ('pie', 0.029), ('seretan', 0.029), ('anna', 0.028), ('green', 0.027), ('paulo', 0.026), ('spill', 0.026), ('genia', 0.026), ('nsp', 0.026), ('sharoff', 0.026), ('neck', 0.026), ('roll', 0.026), ('booth', 0.025), ('nonnative', 0.025), ('practitioners', 0.025), ('serge', 0.025), ('spence', 0.025), ('colloquial', 0.025), ('marco', 0.024), ('pain', 0.024), ('lexica', 0.024), ('korkontzelos', 0.024), ('routines', 0.024), ('idioms', 0.024), ('ekaterina', 0.024), ('idiosyncratic', 0.024), ('acsiasoticoinat', 0.024), ('anungu', 0.024), ('aslt', 0.024), ('casges', 0.024), ('cpoumtaptuiotantaioln', 0.024), ('fio', 0.024), ('lainlg', 0.024), 
('luinisgtuicis', 0.024)]
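For orientation, a word list like the one above could be produced along the following lines. This is a sketch under stated assumptions, not the site's actual method: it uses raw term frequency, the idf variant log(N / df), and L2 normalisation of the document vector. The function name `tfidf_top_words` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tfidf_top_words(doc_tokens, corpus_tokens, top_n=5):
    """Return the top_n (word, weight) pairs of an L2-normalised tf-idf vector.

    Assumptions: tf is the raw count, idf is log(N / df), and the
    document itself is part of `corpus_tokens`.
    """
    n_docs = len(corpus_tokens)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in corpus_tokens:
        df.update(set(tokens))

    tf = Counter(doc_tokens)
    raw = {t: c * math.log(n_docs / df[t]) for t, c in tf.items() if df[t] > 0}

    norm = math.sqrt(sum(w * w for w in raw.values()))
    if norm == 0:
        return []
    weights = {t: w / norm for t, w in raw.items()}
    # Sort by descending weight, breaking ties alphabetically.
    return sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Toy corpus (hypothetical data, for illustration only).
corpus = [
    ["mwes", "multiword", "expressions", "mwes"],
    ["parsing", "expressions"],
    ["translation", "parsing"],
]
top = tfidf_top_words(corpus[0], corpus)
print([word for word, _ in top])  # ['mwes', 'multiword', 'expressions']
```

Terms that occur in many documents (here "expressions") are pushed down the ranking, while terms frequent in this document but rare elsewhere (here "mwes") rise to the top.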

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999958 302 acl-2013-Robust Automated Natural Language Processing with Multiword Expressions and Collocations

Author: Valia Kordoni ; Markus Egg

Abstract: unknown-abstract

2 0.13434674 186 acl-2013-Identifying English and Hungarian Light Verb Constructions: A Contrastive Approach

Author: Veronika Vincze ; Istvan Nagy T. ; Richard Farkas

Abstract: Here, we introduce a machine learning-based approach that allows us to identify light verb constructions (LVCs) in Hungarian and English free texts. We also present the results of our experiments on the SzegedParalellFX English–Hungarian parallel corpus where LVCs were manually annotated in both languages. With our approach, we were able to contrast the performance of our method and define language-specific features for these typologically different languages. Our presented method proved to be sufficiently robust as it achieved approximately the same scores on the two typologically different languages.

3 0.042227834 116 acl-2013-Detecting Metaphor by Contextual Analogy

Author: Eirini Florou

Abstract: As one of the most challenging issues in NLP, metaphor identification and its interpretation have seen many models and methods proposed. This paper presents a study on metaphor identification based on the semantic similarity between literal and non literal meanings of words that can appear at the same context.

4 0.039363276 146 acl-2013-Exploiting Social Media for Natural Language Processing: Bridging the Gap between Language-centric and Real-world Applications

Author: Simone Paolo Ponzetto ; Andrea Zielinski

Abstract: unknown-abstract

5 0.034222726 121 acl-2013-Discovering User Interactions in Ideological Discussions

Author: Arjun Mukherjee ; Bing Liu

Abstract: Online discussion forums are a popular platform for people to voice their opinions on any subject matter and to discuss or debate any issue of interest. In forums where users discuss social, political, or religious issues, there are often heated debates among users or participants. Existing research has studied mining of user stances or camps on certain issues, opposing perspectives, and contention points. In this paper, we focus on identifying the nature of interactions among user pairs. The central questions are: How does each pair of users interact with each other? Does the pair of users mostly agree or disagree? What is the lexicon that people often use to express agreement and disagreement? We present a topic model based approach to answer these questions. Since agreement and disagreement expressions are usually multiword phrases, we propose to employ a ranking method to identify highly relevant phrases prior to topic modeling. After modeling, we use the modeling results to classify the nature of interaction of each user pair. Our evaluation results using real-life discussion/debate posts demonstrate the effectiveness of the proposed techniques.

6 0.033476703 58 acl-2013-Automated Collocation Suggestion for Japanese Second Language Learners

7 0.028796745 238 acl-2013-Measuring semantic content in distributional vectors

8 0.027576007 347 acl-2013-The Role of Syntax in Vector Space Models of Compositional Semantics

9 0.026075684 253 acl-2013-Multilingual Affect Polarity and Valence Prediction in Metaphor-Rich Texts

10 0.025116535 22 acl-2013-A Structured Distributional Semantic Model for Event Co-reference

11 0.02496594 213 acl-2013-Language Acquisition and Probabilistic Models: keeping it simple

12 0.023549695 62 acl-2013-Automatic Term Ambiguity Detection

13 0.022501145 202 acl-2013-Is a 204 cm Man Tall or Small ? Acquisition of Numerical Common Sense from the Web

14 0.022126045 364 acl-2013-Typesetting for Improved Readability using Lexical and Syntactic Information

15 0.021428389 67 acl-2013-Bi-directional Inter-dependencies of Subjective Expressions and Targets and their Value for a Joint Model

16 0.021166118 212 acl-2013-Language-Independent Discriminative Parsing of Temporal Expressions

17 0.020926012 108 acl-2013-Decipherment

18 0.020868232 306 acl-2013-SPred: Large-scale Harvesting of Semantic Predicates

19 0.020753903 32 acl-2013-A relatedness benchmark to test the role of determiners in compositional distributional semantics

20 0.0204571 231 acl-2013-Linggle: a Web-scale Linguistic Search Engine for Words in Context
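The simValue column is consistent with a cosine similarity between tf-idf vectors (the paper's similarity to itself is approximately 1), although the page does not state the measure. A minimal sketch of such a ranking, assuming sparse vectors stored as dictionaries, follows. The names `cosine` and `rank_similar` and the toy paper ids are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors given as {term: weight}."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_similar(query, corpus, top_k=20):
    """Return up to top_k (similarity, paper_id) pairs, most similar first."""
    scored = [(cosine(query, vec), paper_id) for paper_id, vec in corpus.items()]
    return sorted(scored, reverse=True)[:top_k]


# Toy data (hypothetical paper ids and weights, for illustration only).
query = {"mwes": 0.809, "multiword": 0.278}
corpus = {
    "paper_a": {"mwes": 0.809, "multiword": 0.278},  # identical to the query
    "paper_b": {"multiword": 0.5, "verb": 0.5},      # partial overlap
    "paper_c": {"parsing": 1.0},                     # no shared terms
}
ranked = rank_similar(query, corpus)
print([paper_id for _, paper_id in ranked])  # ['paper_a', 'paper_b', 'paper_c']
```

The same ranking function would apply unchanged to the dense topic-weight vectors of the LSI and LDA sections if they were stored as {topicId: weight} dictionaries.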


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.061), (1, 0.02), (2, -0.003), (3, -0.025), (4, -0.028), (5, -0.021), (6, -0.026), (7, 0.021), (8, 0.022), (9, 0.016), (10, -0.035), (11, 0.012), (12, 0.021), (13, 0.001), (14, -0.033), (15, -0.032), (16, -0.016), (17, -0.021), (18, 0.011), (19, -0.007), (20, 0.005), (21, -0.022), (22, 0.037), (23, -0.052), (24, 0.034), (25, 0.013), (26, -0.052), (27, -0.004), (28, -0.013), (29, -0.02), (30, 0.02), (31, -0.042), (32, 0.001), (33, -0.039), (34, 0.005), (35, 0.038), (36, 0.02), (37, 0.039), (38, 0.089), (39, -0.036), (40, -0.004), (41, -0.022), (42, 0.037), (43, 0.033), (44, 0.045), (45, 0.035), (46, 0.063), (47, -0.037), (48, -0.008), (49, -0.003)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.85141802 302 acl-2013-Robust Automated Natural Language Processing with Multiword Expressions and Collocations

Author: Valia Kordoni ; Markus Egg

Abstract: unknown-abstract

2 0.68845797 186 acl-2013-Identifying English and Hungarian Light Verb Constructions: A Contrastive Approach

Author: Veronika Vincze ; Istvan Nagy T. ; Richard Farkas

Abstract: Here, we introduce a machine learning-based approach that allows us to identify light verb constructions (LVCs) in Hungarian and English free texts. We also present the results of our experiments on the SzegedParalellFX English–Hungarian parallel corpus where LVCs were manually annotated in both languages. With our approach, we were able to contrast the performance of our method and define language-specific features for these typologically different languages. Our presented method proved to be sufficiently robust as it achieved approximately the same scores on the two typologically different languages.

3 0.52521449 366 acl-2013-Understanding Verbs based on Overlapping Verbs Senses

Author: Kavitha Rajan

Abstract: Natural language can be easily understood by everyone irrespective of their differences in age or region or qualification. The existence of a conceptual base that underlies all natural languages is an accepted claim as pointed out by Schank in his Conceptual Dependency (CD) theory. Inspired by the CD theory and theories in Indian grammatical tradition, we propose a new set of meaning primitives in this paper. We claim that this new set of primitives captures the meaning inherent in verbs and helps in forming an inter-lingual and computable ontological classification of verbs. We have identified seven primitive overlapping verb senses which substantiate our claim. The percentage of coverage of these primitives is 100% for all verbs in Sanskrit and Hindi and 3750 verbs in English.

4 0.5095765 58 acl-2013-Automated Collocation Suggestion for Japanese Second Language Learners

Author: Lis Pereira ; Erlyn Manguilimotan ; Yuji Matsumoto

Abstract: This study addresses issues of Japanese language learning concerning word combinations (collocations). Japanese learners may be able to construct grammatically correct sentences; however, these may sound “unnatural”. In this work, we analyze correct word combinations using different collocation measures and word similarity methods. While other methods use well-formed text, our approach makes use of a large Japanese language learner corpus for generating collocation candidates, in order to build a system that is more sensitive to constructions that are difficult for learners. Our results show that we get better results compared to other methods that use only well-formed text.

5 0.49622491 88 acl-2013-Computational considerations of comparisons and similes

Author: Vlad Niculae ; Victoria Yaneva

Abstract: This paper presents work in progress towards automatic recognition and classification of comparisons and similes. Among possible applications, we discuss the place of this task in text simplification for readers with Autism Spectrum Disorders (ASD), who are known to have deficits in comprehending figurative language. We propose an approach to comparison recognition through the use of syntactic patterns. Keeping in mind the requirements of autistic readers, we discuss the properties relevant for distinguishing semantic criteria like figurativeness and abstractness.

6 0.47625852 116 acl-2013-Detecting Metaphor by Contextual Analogy

7 0.45482609 8 acl-2013-A Learner Corpus-based Approach to Verb Suggestion for ESL

8 0.45356688 364 acl-2013-Typesetting for Improved Readability using Lexical and Syntactic Information

9 0.4365961 238 acl-2013-Measuring semantic content in distributional vectors

10 0.43266153 344 acl-2013-The Effects of Lexical Resource Quality on Preference Violation Detection

11 0.42674285 286 acl-2013-Psycholinguistically Motivated Computational Models on the Organization and Processing of Morphologically Complex Words

12 0.41947913 190 acl-2013-Implicatures and Nested Beliefs in Approximate Decentralized-POMDPs

13 0.4152995 161 acl-2013-Fluid Construction Grammar for Historical and Evolutionary Linguistics

14 0.41055372 122 acl-2013-Discriminative Approach to Fill-in-the-Blank Quiz Generation for Language Learners

15 0.39835688 185 acl-2013-Identifying Bad Semantic Neighbors for Improving Distributional Thesauri

16 0.39818296 61 acl-2013-Automatic Interpretation of the English Possessive

17 0.38894713 270 acl-2013-ParGramBank: The ParGram Parallel Treebank

18 0.3860887 253 acl-2013-Multilingual Affect Polarity and Valence Prediction in Metaphor-Rich Texts

19 0.38538429 95 acl-2013-Crawling microblogging services to gather language-classified URLs. Workflow and case study

20 0.37173638 227 acl-2013-Learning to lemmatise Polish noun phrases


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.028), (6, 0.019), (7, 0.01), (11, 0.048), (13, 0.023), (15, 0.011), (24, 0.03), (26, 0.035), (28, 0.019), (35, 0.076), (42, 0.422), (48, 0.025), (56, 0.026), (63, 0.011), (64, 0.012), (70, 0.037), (88, 0.027), (95, 0.035)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.98614097 86 acl-2013-Combining Referring Expression Generation and Surface Realization: A Corpus-Based Investigation of Architectures

Author: Sina Zarriess ; Jonas Kuhn

Abstract: We suggest a generation task that integrates discourse-level referring expression generation and sentence-level surface realization. We present a data set of German articles annotated with deep syntax and referents, including some types of implicit referents. Our experiments compare several architectures varying the order of a set of trainable modules. The results suggest that a revision-based pipeline, with intermediate linearization, significantly outperforms standard pipelines or a parallel architecture.

2 0.98237485 372 acl-2013-Using CCG categories to improve Hindi dependency parsing

Author: Bharat Ram Ambati ; Tejaswini Deoskar ; Mark Steedman

Abstract: We show that informative lexical categories from a strongly lexicalised formalism such as Combinatory Categorial Grammar (CCG) can improve dependency parsing of Hindi, a free word order language. We first describe a novel way to obtain a CCG lexicon and treebank from an existing dependency treebank, using a CCG parser. We use the output of a supertagger trained on the CCGbank as a feature for a state-of-the-art Hindi dependency parser (Malt). Our results show that using CCG categories improves the accuracy of Malt on long distance dependencies, for which it is known to have weak rates of recovery.

3 0.9804076 125 acl-2013-Distortion Model Considering Rich Context for Statistical Machine Translation

Author: Isao Goto ; Masao Utiyama ; Eiichiro Sumita ; Akihiro Tamura ; Sadao Kurohashi

Abstract: This paper proposes new distortion models for phrase-based SMT. In decoding, a distortion model estimates the source word position to be translated next (NP) given the last translated source word position (CP). We propose a distortion model that can consider the word at the CP, a word at an NP candidate, and the context of the CP and the NP candidate simultaneously. Moreover, we propose a further improved model that considers richer context by discriminating label sequences that specify spans from the CP to NP candidates. It enables our model to learn the effect of relative word order among NP candidates as well as to learn the effect of distances from the training data. In our experiments, our model improved 2.9 BLEU points for Japanese-English and 2.6 BLEU points for Chinese-English translation compared to the lexical reordering models.

4 0.95081925 64 acl-2013-Automatically Predicting Sentence Translation Difficulty

Author: Abhijit Mishra ; Pushpak Bhattacharyya ; Michael Carl

Abstract: In this paper we introduce Translation Difficulty Index (TDI), a measure of difficulty in text translation. We first define and quantify translation difficulty in terms of TDI. We realize that any measure of TDI based on direct input by translators is fraught with subjectivity and adhocism. We, rather, rely on cognitive evidences from eye tracking. TDI is measured as the sum of fixation (gaze) and saccade (rapid eye movement) times of the eye. We then establish that TDI is correlated with three properties of the input sentence, viz. length (L), degree of polysemy (DP) and structural complexity (SC). We train a Support Vector Regression (SVR) system to predict TDIs for new sentences using these features as input. The prediction done by our framework is well correlated with the empirical gold standard data, which is a repository of < L, DP, SC > and TDI pairs for a set of sentences. The primary use of our work is a way of “binning” sentences (to be translated) in “easy”, “medium” and “hard” categories as per their predicted TDI. This can decide pricing of any translation task, especially useful in a scenario where parallel corpora for Machine Translation are built through translation crowdsourcing/outsourcing. This can also provide a way of monitoring progress of second language learners.

same-paper 5 0.94517297 302 acl-2013-Robust Automated Natural Language Processing with Multiword Expressions and Collocations

Author: Valia Kordoni ; Markus Egg

Abstract: unknown-abstract

6 0.94030309 11 acl-2013-A Multi-Domain Translation Model Framework for Statistical Machine Translation

7 0.9302932 206 acl-2013-Joint Event Extraction via Structured Prediction with Global Features

8 0.91641545 40 acl-2013-Advancements in Reordering Models for Statistical Machine Translation

9 0.75256109 166 acl-2013-Generalized Reordering Rules for Improved SMT

10 0.74264419 56 acl-2013-Argument Inference from Relevant Event Mentions in Chinese Argument Extraction

11 0.74005884 77 acl-2013-Can Markov Models Over Minimal Translation Units Help Phrase-Based SMT?

12 0.73949111 281 acl-2013-Post-Retrieval Clustering Using Third-Order Similarity Measures

13 0.71143049 38 acl-2013-Additive Neural Networks for Statistical Machine Translation

14 0.69525146 199 acl-2013-Integrating Multiple Dependency Corpora for Inducing Wide-coverage Japanese CCG Resources

15 0.69169867 69 acl-2013-Bilingual Lexical Cohesion Trigger Model for Document-Level Machine Translation

16 0.68681115 363 acl-2013-Two-Neighbor Orientation Model with Cross-Boundary Global Contexts

17 0.68450475 127 acl-2013-Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation

18 0.67269564 68 acl-2013-Bilingual Data Cleaning for SMT using Graph-based Random Walk

19 0.66551572 181 acl-2013-Hierarchical Phrase Table Combination for Machine Translation

20 0.65687543 208 acl-2013-Joint Inference for Heterogeneous Dependency Parsing