acl acl2010 acl2010-85 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Keun Chan Park ; Yoonjae Jeong ; Sung Hyon Myaeng
Abstract: Weblogs are a source of human activity knowledge comprising valuable information such as facts, opinions and personal experiences. In this paper, we propose a method for mining personal experiences from a large set of weblogs. We define experience as knowledge embedded in a collection of activities or events which an individual or group has actually undergone. Based on an observation that experience-revealing sentences have a certain linguistic style, we formulate the problem of detecting experience as a classification task using various features including tense, mood, aspect, modality, experiencer, and verb classes. We also present an activity verb lexicon construction method based on theories of lexical semantics. Our results demonstrate that the activity verb lexicon plays a pivotal role among selected features in the classification performance and show that our proposed method outperforms the baseline significantly.
Reference: text
sentIndex sentText sentNum sentScore
1 In this paper, we propose a method for mining personal experiences from a large set of weblogs. [sent-2, score-0.477]
2 We define experience as knowledge embedded in a collection of activities or events which an individual or group has actually undergone. [sent-3, score-0.443]
3 Based on an observation that experience-revealing sentences have a certain linguistic style, we formulate the problem of detecting experience as a classification task using various features including tense, mood, aspect, modality, experiencer, and verb classes. [sent-4, score-0.798]
4 We also present an activity verb lexicon construction method based on theories of lexical semantics. [sent-5, score-0.608]
5 Our results demonstrate that the activity verb lexicon plays a pivotal role among selected features in the classification performance and show that our proposed method outperforms the baseline significantly. [sent-6, score-0.704]
6 Reasoning allows us to draw a conclusion based on evidence, but people tend to believe it firmly when they experience or observe it in the physical world. [sent-8, score-0.382]
7 Despite the fact that direct experiences play a crucial role in making a firm decision and solving a problem, people often resort to indirect experiences by reading written materials or asking around. [sent-9, score-0.751]
8 While Web documents contain various types of information including facts, encyclopedic knowledge, opinions, and experiences in general, [sent-11, score-0.394]
9 personal experiences tend to be found in weblogs more often than in other web documents like news articles, home pages, and scientific papers. [sent-13, score-0.532]
10 Mined experiences can be of practical use in wide application areas. [sent-17, score-0.357]
11 For example, a collection of experiences from the people who visited a resort area would help planning what to do and how to do things correctly without having to spend time sifting through a variety of resources or rely on commercially-oriented sources. [sent-18, score-0.394]
12 Therefore attributes such as location, time, and activity and their relations must be extracted by devising a method for selecting experience-containing sentences based on verbs that have a particular linguistic case frame or belong to a “do” class (Kurashima et al. [sent-21, score-0.506]
13 None of the sentences contain actual experiences because hypotheses, questions, and orders have not actually happened in the real world. [sent-27, score-0.394]
14 For experience mining, it is important to ensure a sentence mentions an event or passes a factuality test to contain experience (Inui et al. [sent-28, score-0.862]
15 In this paper, we focus on the problem of detecting experiences from weblogs. [sent-30, score-0.357]
16 We formulate the problem as a classification task using various linguistic features including tense, mood, aspect, modality, experiencer, and verb classes. [sent-34, score-0.379]
17 Another issue addressed in this paper is automatic construction of a lexicon for verbs related to activities and events. [sent-38, score-0.364]
18 While there have been well-known studies about classifying verbs based on aspectual features (Vendler, 1967), thematic roles and selectional restrictions (Fillmore, 1968; Somers, 1987; Kipper et al. [sent-39, score-0.474]
19 We introduce a method for constructing an activity/event verb lexicon based on Vendler’s theory and statistics obtained by utilizing a web search engine. [sent-43, score-0.361]
20 We define experience as knowledge embedded in a collection of activities or events which an individual or group has actually undergone. It can be subjective, as in opinions, as well as objective, but our focus in this article lies in objective knowledge. [sent-44, score-0.48]
21 Section 3 describes the experience detection method, including experimental setup, evaluation, and results. [sent-59, score-0.443]
22 2 Lexicon Construction Since our definition of experience is based on activities and events, it is critical to determine whether a sentence contains a predicate describing an activity or an event. [sent-61, score-0.685]
23 To this end, it is quite conceivable that a lexicon containing activity / event verbs would play a key role. [sent-62, score-0.535]
24 Given that our ultimate goal is to extract experiences from a large amount of weblogs, we opt for increased coverage by automatically constructing a lexicon rather than the high precision obtainable with a manually crafted lexicon. [sent-63, score-0.462]
25 Based on the theory of Vendler (1967), we classify a given verb or a verb phrase into one of the two categories: activity and state. [sent-64, score-0.674]
26 We consider all the verbs and verb phrases in WordNet (Fellbaum, 1998) which is the largest electronic lexical database. [sent-65, score-0.369]
27 In addition to the linguistic schemata features based on Vendler’s theory, we used thematic role features and an external knowledge feature. [sent-66, score-0.416]
28 1 Background Vendler (1967) proposes that verb meanings can be categorized into four basic classes, states, activities, achievements, and accomplishments, depending on interactions between the verbs and their aspectual and temporal modifiers. [sent-68, score-0.416]
29 * The schemata are not perfect because verbs can shift classes due to various contextual factors such as arguments and senses. [sent-88, score-0.335]
30 On the other hand, activity and accomplishment are processes (transeunt operations) in traditional philosophy. [sent-93, score-0.305]
31 We henceforth call the first genus activity and the latter state. [sent-94, score-0.276]
32 ’ represents the verb at hand, to a search engine, we can get an estimate about how the verb is likely to belong to state. [sent-101, score-0.432]
33 Based on the query matrix in table 2, we issued queries for all the verbs and verb phrases from WordNet to a search engine. [sent-112, score-0.431]
34 The basic statistics we consider are hit count, candidate sentence count and correct sentence count, for which we use the notations Hij(w), Sij(w), and Cij(w), respectively, where w is a word, i the linguistic schema, and j the verb form from the query matrix in Table 2. [sent-117, score-0.7]
35 For example, the progressive schema for a verb “build” can retrieve the following sentences. [sent-121, score-0.439]
36 For each linguistic schema, we derived three features: Absolute hit ratio, Relative hit ratio and Valid ratio, for which we use the notations Ai(w), Ri(w) and Vi(w), respectively, where w is a word and i a linguistic schema. [sent-126, score-0.548]
37 $A_i(w) = \frac{\sum_j H_{ij}(w)}{H_i(*)}, \quad R_i(w) = \frac{\sum_j H_{ij}(w)}{\sum_j H_{\mathrm{NoScheme},j}(w)}, \quad V_i(w) = \frac{\sum_j C_{ij}(w)}{\sum_j S_{ij}(w)}$ (1) Absolute hit ratio computes the extent to which the target word w occurs with the i-th schema over all occurrences of the schema. [sent-129, score-0.334]
38 The denominator is the hit count of wild card “*” matching any single word with the schema pattern from Google (e. [sent-130, score-0.293]
39 Relative hit ratio computes the extent to which the target word w occurs with the i-th schema over all occurrences of the word. [sent-134, score-0.334]
40 The weight of a linguistic schema increases as the valid ratio gets high. [sent-137, score-0.254]
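As a rough illustration of the three schema features described above, the following Python sketch computes them for one word and one schema from per-verb-form counts. The function name `schema_features`, its argument names, and the numbers in the usage note are hypothetical. Reading the denominator of the relative hit ratio as the schema-free hit count of the word summed over verb forms is an assumption, not something this summary states explicitly.

```python
from typing import Sequence, Tuple


def schema_features(
    hits: Sequence[int],            # H_ij(w): hit count per verb form j
    candidates: Sequence[int],      # S_ij(w): candidate sentence count per verb form j
    correct: Sequence[int],         # C_ij(w): correct sentence count per verb form j
    wildcard_hits: int,             # hit count of the schema with "*" in the verb slot
    no_schema_hits: Sequence[int],  # assumed: hit count of w without any schema, per form
) -> Tuple[float, float, float]:
    """Return (absolute hit ratio, relative hit ratio, valid ratio) for one schema."""

    def ratio(num: float, den: float) -> float:
        # Rare verbs can have zero hits; return 0.0 rather than divide by zero.
        return num / den if den else 0.0

    total_hits = sum(hits)
    absolute = ratio(total_hits, wildcard_hits)
    relative = ratio(total_hits, sum(no_schema_hits))
    valid = ratio(sum(correct), sum(candidates))
    return absolute, relative, valid
```

For example, `schema_features([30, 10], [8, 2], [6, 2], 400, [150, 50])` divides 40 schema hits by 400 wildcard hits and by 200 schema-free hits, and 8 correct sentences by 10 candidates.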
41 , when the query becomes too long in case of long verb phrases), we also consider additional features. [sent-142, score-0.278]
42 While our initial observation indicated that the existing lexical resources would not be sufficient for our goal, it occurred to us that the linguistic theory behind them would be worth exploring as generating additional features for categorizing verbs for the two classes. [sent-143, score-0.257]
43 The subject of a state verb is dative (D) as in [12] whereas the subject for an action verb takes the agent (A) role. [sent-146, score-0.602]
44 In addition, a verb with the instrument (I) role tends to be an action verb. [sent-147, score-0.289]
45 Activity verbs are expected to have a higher frequency of agent and instrument roles than state verbs. [sent-149, score-0.276]
46 Although a verb may have more than one case frame, it is possible to determine which thematic roles are used more dominantly. [sent-150, score-0.445]
47 Levin (1993) demonstrated that syntactic alternations can be the basis for groupings of verbs semantically and accord reasonably well with linguistic intuitions. [sent-154, score-0.267]
48 VerbNet provides 274 verb classes with 23 thematic roles covering 3,769 verbs based on their alternation behaviors with thematic roles annotated. [sent-155, score-0.881]
49 By the mapping, we obtained distributions of the thematic roles for 2,868 unique verbs that exist in both of the resources. [sent-161, score-0.382]
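One way to picture a thematic role distribution is as the relative frequency of each role across a verb's case frames. The sketch below assumes a hypothetical input format (one list of role labels per frame) and hypothetical role names. It is not the resource-mapping procedure used in the paper.

```python
from collections import Counter
from typing import Dict, Iterable, List


def role_distribution(frames: Iterable[List[str]], roles: List[str]) -> Dict[str, float]:
    """Relative frequency of each thematic role over all frames of one verb."""
    counts = Counter(role for frame in frames for role in frame)
    # Only roles in the given inventory contribute to the normalizer.
    total = sum(counts[r] for r in roles)
    return {r: (counts[r] / total if total else 0.0) for r in roles}
```

A verb whose frames mostly carry agent and instrument roles would, under this view, lean toward the activity class.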
50 The authors attempted to extract actions comprising a verb and some ingredients like an object entity from the documents based on syntactic patterns and a CRF based model. [sent-166, score-0.26]
51 Since each extracted action has its probability, we can use the value as a feature for state / activity verb classification. [sent-167, score-0.531]
52 5 Classification For training, we selected 80 seed verbs from Dowty’s list (1979) which are representative verbs for each Vendler (1967) class. [sent-178, score-0.306]
53 The features we considered are a total of 42 real values: 18 from linguistic schemata, 23 thematic role distributions, and one from eHow. [sent-186, score-0.243]
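The layout of the 42-value vector can be sketched as a simple concatenation. The six schema names below are an assumption inferred from the schemata mentioned elsewhere in this summary (six schemata times three ratios gives 18). The function and argument names are hypothetical, and the 23 role names are left to the caller.

```python
from typing import Dict, List, Sequence

# Assumed schema names; the summary does not list them in one place.
SCHEMATA = ("progressive", "force", "persuade", "stop", "for", "carefully")
RATIOS = ("absolute", "relative", "valid")


def build_feature_vector(
    schema_feats: Dict[str, Dict[str, float]],
    role_dist: Dict[str, float],
    role_names: Sequence[str],
    ehow_prob: float,
) -> List[float]:
    """Concatenate schema ratios, the role distribution, and the eHow value."""
    vec = [schema_feats[s][r] for s in SCHEMATA for r in RATIOS]  # 6 x 3 = 18 values
    vec += [role_dist.get(name, 0.0) for name in role_names]      # one value per role
    vec.append(ehow_prob)                                         # 1 value
    return vec
```

With 23 role names this yields 18 + 23 + 1 = 42 real values per verb.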
54 Note that the precision and recall are macro-averaged values across the two classes, activity and state. [sent-191, score-0.242]
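Macro-averaging computes precision and recall separately for each class and then takes their unweighted mean. A minimal sketch follows, with a hypothetical function name and toy labels rather than the paper's data.

```python
from typing import Sequence, Tuple


def macro_precision_recall(
    gold: Sequence[str],
    pred: Sequence[str],
    classes: Sequence[str] = ("activity", "state"),
) -> Tuple[float, float]:
    """Unweighted mean of per-class precision and recall."""
    precisions, recalls = [], []
    for c in classes:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        predicted = sum(1 for p in pred if p == c)
        actual = sum(1 for g in gold if g == c)
        # A class that is never predicted (or never occurs) scores 0.0 here.
        precisions.append(tp / predicted if predicted else 0.0)
        recalls.append(tp / actual if actual else 0.0)
    return sum(precisions) / len(classes), sum(recalls) / len(classes)
```

Because each class counts equally, a classifier that does well only on the more frequent class is penalized more than it would be under micro-averaging.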
55 The most discriminative features were absolute ratio and relative ratio in conjunction with the force, stop, progressive, and persuade schemata, the role distribution of experiencer, and the eHow evidence. [sent-192, score-0.252]
56 Many of the verbs had zero hit counts for the for and carefully schemata. [sent-199, score-0.324]
57 We finally trained our model with the top 10 features and classified all WordNet verbs and verb phrases. [sent-201, score-0.461]
58 For actual construction of the lexicon, 11,416 verbs and verb phrases were classified into the two classes roughly equally. [sent-202, score-0.515]
59 Since many features are computed based on Web resources, rare verbs cannot be classified correctly when their hit ratios are very low. [sent-211, score-0.384]
60 Having converted the problem of experience detection for sentences to a classification task, we focus on the extent to which various linguistic features contribute to the performance of the binary classifier for sentences. [sent-214, score-0.643]
61 1 Linguistic features In addition to the verb class feature available in the verb lexicon constructed automatically, we used tense, mood, aspect, modality, and experiencer features. [sent-217, score-0.723]
62 Verb class: The feature comes directly from the lexicon since a verb has been classified into a state or activity verb. [sent-218, score-0.61]
63 The predicate part of the sentence to be classified for experience is looked up in the lexicon without sense disambiguation. [sent-219, score-0.534]
64 , 2003) for tense determination, but since the Penn tagset provides no future tenses, they are determined by exploiting modal verbs such as “will” and future expressions such as “going to”. [sent-223, score-0.304]
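The tense rule described here can be sketched as a small heuristic over Penn-tagged tokens. The function name, the tag groupings, and the exact future cues below are assumptions for illustration. The sketch is deliberately crude and is not the paper's implementation.

```python
from typing import List, Tuple

FUTURE_MODALS = {"will", "'ll", "wo"}  # "wo" covers the Penn tokenization of "won't"


def detect_tense(tagged: List[Tuple[str, str]]) -> str:
    """Guess the tense of a sentence given (token, Penn POS tag) pairs."""
    tokens = [word.lower() for word, _ in tagged]
    # Future is not marked in the Penn tagset, so look for lexical cues first.
    for i, (_, tag) in enumerate(tagged):
        if tag == "MD" and tokens[i] in FUTURE_MODALS:
            return "future"
        if (tokens[i:i + 2] == ["going", "to"]
                and i + 2 < len(tagged) and tagged[i + 2][1] == "VB"):
            return "future"
    # Otherwise fall back to the tag of the first finite verb.
    for _, tag in tagged:
        if tag == "VBD":
            return "past"
        if tag in ("VBP", "VBZ"):
            return "present"
    return "unknown"
```

Requiring a base-form verb after "going to" keeps a sentence such as "We are going to Seoul" from being labeled future.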
65 Aspect: It defines the temporal flow of a verb in the activity or state. [sent-227, score-0.458]
66 Experiencer: A sentence can or cannot be treated as containing an experience depending on the subject or experiencer of the verb (note that this is different from the experiencer role in a case frame). [sent-236, score-0.912]
67 The first sentence is considered an experience since the subject is a person. [sent-239, score-0.414]
68 However, the second sentence with the same verb is not, because the subject is a non-animate abstract concept. [sent-240, score-0.248]
69 In selecting experience-containing blog posts, we used location names such as Central Park, SOHO, Seoul and general place names such as airport, subway station, and restaurant because blog posts with some places are expected to describe experiences rather than facts or thoughts. [sent-247, score-0.7]
70 We randomly sampled 1,000 sentences and asked three annotators to judge whether or not individual sentences are considered to contain an experience based on our definition. [sent-250, score-0.419]
71 First, the candidates that do not have an objective case (Fillmore, 1968) are eliminated because they define experience as “action + object”. [sent-263, score-0.382]
72 Finally, the candidate sentences including a verb that indicates a movement are eliminated because the main interest was to identify an activity in a place. [sent-266, score-0.53]
73 Although their definition of experience is somewhat different from ours (i. [sent-267, score-0.382]
74 , “action + object”), they used the method to generate candidate sentences from which various experience attributes are extracted. [sent-269, score-0.486]
75 From this perspective, the method functioned like our experience detection. [sent-270, score-0.382]
76 The authors propose a lexicon of experience expression by collecting hyponyms from a hierarchically structured dictionary. [sent-277, score-0.519]
77 We not only compared our results with the baseline in terms of precision and recall but also (Footnote 5: This is based on our observation that the three annotators did not find their task of identifying experience sentences difficult, resulting in a high degree of agreement.) [sent-280, score-0.419]
78 evaluated individual features for their importance in experience detection (classification). [sent-281, score-0.488]
79 It seems that the linguistic styles shown in experience expressions are different from each other. [sent-286, score-0.441]
80 , using the WordNet) contains more errors than our activity lexicon for activity verbs. [sent-289, score-0.589]
81 Some hyponyms of an activity verb may not be activity verbs. [sent-290, score-0.732]
82 This result is very encouraging for the automatic lexicon construction work because the lexicon plays a pivotal role in the overall performance. [sent-300, score-0.292]
83 Similar to Table 5, the aspect and experiencer features contributed the least, as the performance drops are almost negligible. [sent-304, score-0.475]
84 While opinion mining or sentiment analysis, which can be considered an important part of experience mining, has been studied quite extensively (see Pang and Lee’s excellent survey (2008)), another sub-area, factuality analysis, begins to gain some popularity (Inui et al. [sent-306, score-0.525]
85 Very few studies have focused explicitly on extracting various entities that constitute experiences (Kurashima et al. [sent-308, score-0.357]
86 , 2009) or detecting experience-containing parts of text although many NLP research areas such as named entity recognition and verb classification are strongly related. [sent-309, score-0.275]
87 The previous work on experience detection relies on a handcrafted lexicon. [sent-310, score-0.443]
88 There have been a number of studies for verb classification (Fillmore, 1968; Vendler, 1967; Somers, 1982; Levin, 1993; Fillmore and Baker, 2001 ; Kipper et al. [sent-311, score-0.275]
89 , 2008) that are essential for construction of an activity verb lexicon, which in turn is important for experience detection. [sent-312, score-0.885]
90 Most similar to our work was done by Siegel and McKeown (2000), who attempted to categorize verbs into state or event classes based on 14 tests similar to those of Vendler’s. [sent-313, score-0.286]
91 Similarly, Zarcone and Lenci (2008) attempted to categorize verbs in Italian into the four Vendler classes using the Vendler tests with a tagged corpus. [sent-316, score-0.251]
92 Since our work is specifically geared toward domain-independent experience detection, we attempted to maximize the coverage by using all the verbs in WordNet, as opposed to the verbs appearing in a particular domain-specific corpus (e. [sent-319, score-0.732]
93 5 Conclusion and Future Work We defined experience detection as an essential task for experience mining, which is restated as determining whether individual sentences contain experience or not. [sent-323, score-1.244]
94 Viewing the task as a classification problem, we focused on identification and examination of various linguistic features such as verb class, tense, aspect, mood, modality, and experiencer, all of which were computed automatically. [sent-324, score-0.379]
95 For verb classes, in particular, we devised a method for classifying all the verbs and verb phrases in WordNet into the activity and state classes. [sent-325, score-0.827]
96 The experimental results show that the verb and verb phrase classification method is reasonably accurate, with 91% precision and 78% recall on a manually constructed gold standard consisting of 80 verbs and 82% accuracy for a random sample of all the WordNet entries. [sent-326, score-0.644]
97 For experience detection, the performance was very promising, close to 92% in precision and recall when all the features were used. [sent-327, score-0.427]
98 Among the features, the verb classes, or the lexicon we constructed, contributed the most. [sent-328, score-0.321]
99 Given that experience mining is a relatively new research area, there are many areas to explore. [sent-332, score-0.462]
100 In addition to refinements of our work, our next step is to develop a method for representing and extracting actual experiences from experience-revealing sentences. [sent-333, score-0.357]
wordName wordTfidf (topN-words)
[('experience', 0.382), ('experiences', 0.357), ('activity', 0.242), ('vendler', 0.23), ('verb', 0.216), ('kurashima', 0.188), ('verbs', 0.153), ('experiencer', 0.141), ('thematic', 0.139), ('hit', 0.139), ('schemata', 0.128), ('blog', 0.119), ('schema', 0.119), ('modality', 0.119), ('mood', 0.11), ('lexicon', 0.105), ('progressive', 0.104), ('tense', 0.099), ('weblogs', 0.095), ('verbnet', 0.094), ('roles', 0.09), ('fillmore', 0.084), ('mining', 0.08), ('kipper', 0.078), ('ratio', 0.076), ('framenet', 0.075), ('action', 0.073), ('inui', 0.071), ('participle', 0.063), ('accomplishment', 0.063), ('ehow', 0.063), ('factuality', 0.063), ('query', 0.062), ('activities', 0.061), ('detection', 0.061), ('linguistic', 0.059), ('classification', 0.059), ('wordnet', 0.058), ('persuade', 0.055), ('alternations', 0.055), ('john', 0.055), ('levin', 0.054), ('classes', 0.054), ('achievements', 0.052), ('modal', 0.052), ('baker', 0.05), ('aspect', 0.048), ('aspectual', 0.047), ('classified', 0.047), ('construction', 0.045), ('force', 0.045), ('features', 0.045), ('takeshi', 0.045), ('attempted', 0.044), ('frame', 0.042), ('awkward', 0.042), ('hij', 0.042), ('katsumi', 0.042), ('messed', 0.042), ('precursor', 0.042), ('tezuka', 0.042), ('personal', 0.04), ('web', 0.04), ('harry', 0.039), ('posts', 0.039), ('marker', 0.039), ('opinions', 0.037), ('google', 0.037), ('sentences', 0.037), ('resort', 0.037), ('pivotal', 0.037), ('myaeng', 0.037), ('siegel', 0.037), ('sung', 0.037), ('taro', 0.037), ('pang', 0.036), ('candidate', 0.035), ('event', 0.035), ('vi', 0.035), ('count', 0.035), ('past', 0.034), ('logistic', 0.034), ('facts', 0.034), ('cij', 0.034), ('achievement', 0.034), ('genus', 0.034), ('hsu', 0.034), ('saur', 0.034), ('tenses', 0.034), ('agent', 0.033), ('location', 0.032), ('subject', 0.032), ('attributes', 0.032), ('carefully', 0.032), ('park', 0.032), ('hyponyms', 0.032), ('respectfully', 0.031), ('korea', 0.031), ('loper', 0.031), ('sij', 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000007 85 acl-2010-Detecting Experiences from Weblogs
Author: Keun Chan Park ; Yoonjae Jeong ; Sung Hyon Myaeng
Abstract: Weblogs are a source of human activity knowledge comprising valuable information such as facts, opinions and personal experiences. In this paper, we propose a method for mining personal experiences from a large set of weblogs. We define experience as knowledge embedded in a collection of activities or events which an individual or group has actually undergone. Based on an observation that experience-revealing sentences have a certain linguistic style, we formulate the problem of detecting experience as a classification task using various features including tense, mood, aspect, modality, experiencer, and verb classes. We also present an activity verb lexicon construction method based on theories of lexical semantics. Our results demonstrate that the activity verb lexicon plays a pivotal role among selected features in the classification performance and show that our proposed method outperforms the baseline significantly.
2 0.16441378 258 acl-2010-Weakly Supervised Learning of Presupposition Relations between Verbs
Author: Galina Tremper
Abstract: Presupposition relations between verbs are not very well covered in existing lexical semantic resources. We propose a weakly supervised algorithm for learning presupposition relations between verbs that distinguishes five semantic relations: presupposition, entailment, temporal inclusion, antonymy and other/no relation. We start with a number of seed verb pairs selected manually for each semantic relation and classify unseen verb pairs. Our algorithm achieves an overall accuracy of 36% for type-based classification.
3 0.11785912 123 acl-2010-Generating Focused Topic-Specific Sentiment Lexicons
Author: Valentin Jijkoun ; Maarten de Rijke ; Wouter Weerkamp
Abstract: We present a method for automatically generating focused and accurate topic-specific subjectivity lexicons from a general purpose polarity lexicon that allow users to pin-point subjective on-topic information in a set of relevant documents. We motivate the need for such lexicons in the field of media analysis, describe a bootstrapping method for generating a topic-specific lexicon from a general purpose polarity lexicon, and evaluate the quality of the generated lexicons both manually and using a TREC Blog track test set for opinionated blog post retrieval. Although the generated lexicons can be an order of magnitude more selective than the general purpose lexicon, they maintain, or even improve, the performance of an opinion retrieval system.
4 0.11773949 108 acl-2010-Expanding Verb Coverage in Cyc with VerbNet
Author: Clifton McFate
Abstract: A robust dictionary of semantic frames is an essential element of natural language understanding systems that use ontologies. However, creating lexical resources that accurately capture semantic representations en masse is a persistent problem. Where the sheer amount of content makes hand creation inefficient, computerized approaches often suffer from over generality and difficulty with sense disambiguation. This paper describes a semi-automatic method to create verb semantic frames in the Cyc ontology by converting the information contained in VerbNet into a Cyc usable format. This method captures the differences in meaning between types of verbs, and uses existing connections between WordNet, VerbNet, and Cyc to specify distinctions between individual verbs when available. This method provides 27,909 frames to OpenCyc which currently has none and can be used to extend ResearchCyc as well. We show that these frames lead to a 20% increase in sample sentences parsed over the Research Cyc verb lexicon.
5 0.089740917 125 acl-2010-Generating Templates of Entity Summaries with an Entity-Aspect Model and Pattern Mining
Author: Peng Li ; Jing Jiang ; Yinglin Wang
Abstract: In this paper, we propose a novel approach to automatic generation of summary templates from given collections of summary articles. This kind of summary templates can be useful in various applications. We first develop an entity-aspect LDA model to simultaneously cluster both sentences and words into aspects. We then apply frequent subtree pattern mining on the dependency parse trees of the clustered and labeled sentences to discover sentence patterns that well represent the aspects. Key features of our method include automatic grouping of semantically related sentence patterns and automatic identification of template slots that need to be filled in. We apply our method on five Wikipedia entity categories and compare our method with two baseline methods. Both quantitative evaluation based on human judgment and qualitative comparison demonstrate the effectiveness and advantages of our method.
6 0.089710973 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification
7 0.07891763 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
8 0.078443706 247 acl-2010-Unsupervised Event Coreference Resolution with Rich Linguistic Features
9 0.07808584 216 acl-2010-Starting from Scratch in Semantic Role Labeling
10 0.073194817 245 acl-2010-Understanding the Semantic Structure of Noun Phrase Queries
11 0.072668321 59 acl-2010-Cognitively Plausible Models of Human Language Processing
12 0.072645158 200 acl-2010-Profiting from Mark-Up: Hyper-Text Annotations for Guided Parsing
13 0.071936809 176 acl-2010-Mood Patterns and Affective Lexicon Access in Weblogs
14 0.06962987 22 acl-2010-A Unified Graph Model for Sentence-Based Opinion Retrieval
15 0.069590881 23 acl-2010-Accurate Context-Free Parsing with Combinatory Categorial Grammar
16 0.068923041 238 acl-2010-Towards Open-Domain Semantic Role Labeling
17 0.067844525 208 acl-2010-Sentence and Expression Level Annotation of Opinions in User-Generated Discourse
18 0.066143684 141 acl-2010-Identifying Text Polarity Using Random Walks
19 0.066004619 198 acl-2010-Predicate Argument Structure Analysis Using Transformation Based Learning
20 0.065709174 41 acl-2010-Automatic Selectional Preference Acquisition for Latin Verbs
topicId topicWeight
[(0, -0.215), (1, 0.123), (2, -0.03), (3, 0.041), (4, 0.009), (5, -0.006), (6, -0.011), (7, 0.044), (8, 0.011), (9, -0.019), (10, 0.015), (11, 0.045), (12, -0.051), (13, 0.008), (14, 0.049), (15, 0.052), (16, 0.073), (17, 0.098), (18, 0.054), (19, 0.054), (20, 0.062), (21, -0.022), (22, 0.041), (23, -0.022), (24, 0.075), (25, -0.023), (26, -0.007), (27, 0.043), (28, 0.028), (29, -0.033), (30, 0.1), (31, -0.013), (32, 0.155), (33, 0.018), (34, 0.084), (35, -0.076), (36, -0.086), (37, 0.118), (38, 0.074), (39, -0.141), (40, -0.017), (41, 0.019), (42, -0.025), (43, -0.081), (44, 0.008), (45, -0.046), (46, -0.181), (47, -0.019), (48, 0.02), (49, -0.172)]
simIndex simValue paperId paperTitle
same-paper 1 0.94560927 85 acl-2010-Detecting Experiences from Weblogs
Author: Keun Chan Park ; Yoonjae Jeong ; Sung Hyon Myaeng
Abstract: Weblogs are a source of human activity knowledge comprising valuable information such as facts, opinions and personal experiences. In this paper, we propose a method for mining personal experiences from a large set of weblogs. We define experience as knowledge embedded in a collection of activities or events which an individual or group has actually undergone. Based on an observation that experience-revealing sentences have a certain linguistic style, we formulate the problem of detecting experience as a classification task using various features including tense, mood, aspect, modality, experiencer, and verb classes. We also present an activity verb lexicon construction method based on theories of lexical semantics. Our results demonstrate that the activity verb lexicon plays a pivotal role among selected features in the classification performance and show that our proposed method outperforms the baseline significantly.
2 0.83087587 108 acl-2010-Expanding Verb Coverage in Cyc with VerbNet
Author: Clifton McFate
Abstract: A robust dictionary of semantic frames is an essential element of natural language understanding systems that use ontologies. However, creating lexical resources that accurately capture semantic representations en masse is a persistent problem. Where the sheer amount of content makes hand creation inefficient, computerized approaches often suffer from over generality and difficulty with sense disambiguation. This paper describes a semi-automatic method to create verb semantic frames in the Cyc ontology by converting the information contained in VerbNet into a Cyc usable format. This method captures the differences in meaning between types of verbs, and uses existing connections between WordNet, VerbNet, and Cyc to specify distinctions between individual verbs when available. This method provides 27,909 frames to OpenCyc which currently has none and can be used to extend ResearchCyc as well. We show that these frames lead to a 20% increase in sample sentences parsed over the Research Cyc verb lexicon.
3 0.69397378 258 acl-2010-Weakly Supervised Learning of Presupposition Relations between Verbs
Author: Galina Tremper
Abstract: Presupposition relations between verbs are not very well covered in existing lexical semantic resources. We propose a weakly supervised algorithm for learning presupposition relations between verbs that distinguishes five semantic relations: presupposition, entailment, temporal inclusion, antonymy and other/no relation. We start with a number of seed verb pairs selected manually for each semantic relation and classify unseen verb pairs. Our algorithm achieves an overall accuracy of 36% for type-based classification.
4 0.63895434 41 acl-2010-Automatic Selectional Preference Acquisition for Latin Verbs
Author: Barbara McGillivray
Abstract: We present a system that automatically induces Selectional Preferences (SPs) for Latin verbs from two treebanks by using Latin WordNet. Our method overcomes some of the problems connected with data sparseness and the small size of the input corpora. We also suggest a way to evaluate the acquired SPs on unseen events extracted from other Latin corpora.
5 0.56846792 126 acl-2010-GernEdiT - The GermaNet Editing Tool
Author: Verena Henrich ; Erhard Hinrichs
Abstract: GernEdiT (short for: GermaNet Editing Tool) offers a graphical interface for the lexicographers and developers of GermaNet to access and modify the underlying GermaNet resource. GermaNet is a lexical-semantic wordnet that is modeled after the Princeton WordNet for English. The traditional lexicographic development of GermaNet was error prone and time-consuming, mainly due to a complex underlying data format and no opportunity of automatic consistency checks. GernEdiT replaces the earlier development by a more user-friendly tool, which facilitates automatic checking of internal consistency and correctness of the linguistic resource. This paper presents all these core functionalities of GernEdiT along with details about its usage and usability.
6 0.51676059 176 acl-2010-Mood Patterns and Affective Lexicon Access in Weblogs
7 0.50318575 139 acl-2010-Identifying Generic Noun Phrases
8 0.47720948 225 acl-2010-Temporal Information Processing of a New Language: Fast Porting with Minimal Resources
9 0.45925856 256 acl-2010-Vocabulary Choice as an Indicator of Perspective
10 0.45282352 141 acl-2010-Identifying Text Polarity Using Random Walks
11 0.43726593 111 acl-2010-Extracting Sequences from the Web
12 0.43164939 121 acl-2010-Generating Entailment Rules from FrameNet
13 0.43049675 35 acl-2010-Automated Planning for Situated Natural Language Generation
14 0.42399007 165 acl-2010-Learning Script Knowledge with Web Experiments
15 0.41949123 247 acl-2010-Unsupervised Event Coreference Resolution with Rich Linguistic Features
16 0.41869026 92 acl-2010-Don't 'Have a Clue'? Unsupervised Co-Learning of Downward-Entailing Operators.
17 0.41853794 76 acl-2010-Creating Robust Supervised Classifiers via Web-Scale N-Gram Data
18 0.4164187 166 acl-2010-Learning Word-Class Lattices for Definition and Hypernym Extraction
19 0.41334197 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification
20 0.41238514 238 acl-2010-Towards Open-Domain Semantic Role Labeling
topicId topicWeight
[(7, 0.01), (14, 0.018), (25, 0.061), (39, 0.02), (42, 0.051), (44, 0.014), (59, 0.114), (64, 0.278), (73, 0.074), (76, 0.014), (78, 0.027), (80, 0.013), (83, 0.1), (84, 0.028), (98, 0.094)]
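The topic row above is a sparse list of (topicId, topicWeight) pairs. As a minimal sketch, assuming the row is kept as a plain string, it could be read into a mapping and sorted by weight as follows. The helper names `parse_topic_weights` and `dominant_topics` are hypothetical and not part of the source listing.

```python
import ast

# The topicId/topicWeight row exactly as it appears in the listing.
TOPIC_ROW = (
    "[(7, 0.01), (14, 0.018), (25, 0.061), (39, 0.02), (42, 0.051), "
    "(44, 0.014), (59, 0.114), (64, 0.278), (73, 0.074), (76, 0.014), "
    "(78, 0.027), (80, 0.013), (83, 0.1), (84, 0.028), (98, 0.094)]"
)


def parse_topic_weights(row):
    """Parse a '[(topicId, weight), ...]' string into a {topicId: weight} dict.

    Hypothetical helper: ast.literal_eval evaluates only Python literals,
    so it is safe to use on text scraped from a page.
    """
    pairs = ast.literal_eval(row)
    return {int(topic_id): float(weight) for topic_id, weight in pairs}


def dominant_topics(weights, k=3):
    """Return the k (topicId, weight) pairs with the largest weight."""
    return sorted(weights.items(), key=lambda item: item[1], reverse=True)[:k]


weights = parse_topic_weights(TOPIC_ROW)
top_three = dominant_topics(weights, k=3)
total_weight = sum(weights.values())

print(top_three)
print(round(total_weight, 3))
```

For this row the three heaviest topics are 64, 59 and 83, and the listed weights sum to about 0.916. The sum falling short of 1 suggests that low-weight topics were left out of the row, although the listing itself does not say so.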
simIndex simValue paperId paperTitle
same-paper 1 0.78483278 85 acl-2010-Detecting Experiences from Weblogs
Author: Keun Chan Park ; Yoonjae Jeong ; Sung Hyon Myaeng
Abstract: Weblogs are a source of human activity knowledge comprising valuable information such as facts, opinions and personal experiences. In this paper, we propose a method for mining personal experiences from a large set of weblogs. We define experience as knowledge embedded in a collection of activities or events which an individual or group has actually undergone. Based on an observation that experience-revealing sentences have a certain linguistic style, we formulate the problem of detecting experience as a classification task using various features including tense, mood, aspect, modality, experiencer, and verb classes. We also present an activity verb lexicon construction method based on theories of lexical semantics. Our results demonstrate that the activity verb lexicon plays a pivotal role among selected features in the classification performance and show that our proposed method outperforms the baseline significantly.
2 0.67735082 144 acl-2010-Improved Unsupervised POS Induction through Prototype Discovery
Author: Omri Abend ; Roi Reichart ; Ari Rappoport
Abstract: We present a novel fully unsupervised algorithm for POS induction from plain text, motivated by the cognitive notion of prototypes. The algorithm first identifies landmark clusters of words, serving as the cores of the induced POS categories. The rest of the words are subsequently mapped to these clusters. We utilize morphological and distributional representations computed in a fully unsupervised manner. We evaluate our algorithm on English and German, achieving the best reported results for this task.
3 0.59035563 22 acl-2010-A Unified Graph Model for Sentence-Based Opinion Retrieval
Author: Binyang Li ; Lanjun Zhou ; Shi Feng ; Kam-Fai Wong
Abstract: There is a growing research interest in opinion retrieval as on-line users’ opinions are becoming more and more popular in business, social networks, etc. Practically speaking, the goal of opinion retrieval is to retrieve documents, which entail opinions or comments, relevant to a target subject specified by the user’s query. A fundamental challenge in opinion retrieval is information representation. Existing research focuses on document-based approaches and documents are represented by bag-of-word. However, due to loss of contextual information, this representation fails to capture the associative information between an opinion and its corresponding target. It cannot distinguish different degrees of a sentiment word when associated with different targets. This in turn seriously affects opinion retrieval performance. In this paper, we propose a sentence-based approach based on a new information representation, namely topic-sentiment word pair, to capture intra-sentence contextual information between an opinion and its target. Additionally, we consider inter-sentence information to capture the relationships among the opinions on the same topic. Finally, the two types of information are combined in a unified graph-based model, which can effectively rank the documents. Compared with existing approaches, experimental results on the COAE08 dataset showed that our graph-based model achieved significant improvement.
4 0.57744795 219 acl-2010-Supervised Noun Phrase Coreference Research: The First Fifteen Years
Author: Vincent Ng
Abstract: The research focus of computational coreference resolution has exhibited a shift from heuristic approaches to machine learning approaches in the past decade. This paper surveys the major milestones in supervised coreference research since its inception fifteen years ago.
5 0.575252 251 acl-2010-Using Anaphora Resolution to Improve Opinion Target Identification in Movie Reviews
Author: Niklas Jakob ; Iryna Gurevych
Abstract: unknown-abstract
6 0.57335299 230 acl-2010-The Manually Annotated Sub-Corpus: A Community Resource for and by the People
7 0.57235783 158 acl-2010-Latent Variable Models of Selectional Preference
8 0.56940174 214 acl-2010-Sparsity in Dependency Grammar Induction
9 0.56906891 134 acl-2010-Hierarchical Sequential Learning for Extracting Opinions and Their Attributes
10 0.56865597 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts
11 0.56754249 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans
12 0.56729281 76 acl-2010-Creating Robust Supervised Classifiers via Web-Scale N-Gram Data
13 0.56669915 101 acl-2010-Entity-Based Local Coherence Modelling Using Topological Fields
14 0.56665319 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
15 0.56663644 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification
16 0.56603277 148 acl-2010-Improving the Use of Pseudo-Words for Evaluating Selectional Preferences
17 0.56586885 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition
18 0.56555212 150 acl-2010-Inducing Domain-Specific Semantic Class Taggers from (Almost) Nothing
19 0.56552994 252 acl-2010-Using Parse Features for Preposition Selection and Error Detection
20 0.56545782 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns