acl acl2010 acl2010-63 knowledge-graph by maker-knowledge-mining

63 acl-2010-Comparable Entity Mining from Comparative Questions


Source: pdf

Author: Shasha Li ; Chin-Yew Lin ; Young-In Song ; Zhoujun Li

Abstract: Comparing one thing with another is a typical part of the human decision-making process. However, it is not always easy to know what to compare and what the alternatives are. To address this difficulty, we present a novel way to automatically mine comparable entities from comparative questions that users posted online. To ensure high precision and high recall, we develop a weakly-supervised bootstrapping method for comparative question identification and comparable entity extraction by leveraging a large online question archive. The experimental results show our method achieves an F1-measure of 82.5% in comparative question identification and 83.3% in comparable entity extraction. Both significantly outperform an existing state-of-the-art method.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 However, it is not always easy to know what to compare and what the alternatives are. [sent-7, score-0.028]

2 To address this difficulty, we present a novel way to automatically mine comparable entities from comparative questions that users posted online. [sent-8, score-1.135]

3 To ensure high precision and high recall, we develop a weakly-supervised bootstrapping method for comparative question identification and comparable entity extraction by leveraging a large online question archive. [sent-9, score-1.186]

4 The experimental results show our method achieves an F1-measure of 82.5% in comparative question identification and 83.3% in comparable entity extraction. [sent-10, score-0.027]

5 For example, if someone is interested in certain products such as digital cameras, he or she would want to know what the alternatives are and compare different cameras before making a purchase. [sent-15, score-0.203]

6 This type of comparison activity is very common in our daily life but requires considerable knowledge and skill. [sent-16, score-0.059]

7 Magazines such as Consumer Reports and PC Magazine and online media such as CNet.com strive to provide editorial comparison content and surveys to satisfy this need. [sent-17, score-0.03] [sent-18, score-0.078]

9 In the World Wide Web era, a comparison activity typically involves searching for relevant web pages containing information about the targeted products, finding competing products, reading reviews, and identifying pros and cons. [sent-19, score-0.059]

10 In this paper, we focus on finding a set of comparable entities given a user's input entity. [sent-20, score-0.176]

11 For example, given an entity, Nokia N95 (a cellphone), we want to find comparable entities such as Nokia N82, iPhone and so on. [sent-21, score-0.205]

12 In general, it is difficult to decide if two entities are comparable or not, since people do compare apples and oranges for various reasons. [sent-22, score-0.282]

13 For example, “Ford” and “BMW” might be comparable as “car manufacturers” or as “market segments that their products are targeting”, but we rarely see people comparing “Ford Focus” (car model) and “BMW 328i”. [sent-23, score-0.188]

14 Things also get more complicated when an entity has several functionalities. [sent-24, score-0.055]

15 For example, one might compare “iPhone” and “PSP” as “portable game player” while compare “iPhone” and “Nokia N95” as “mobile phone”. [sent-25, score-0.085]

16 Fortunately, plenty of comparative questions are posted online, which provide evidence of what people want to compare, e.g. [sent-26, score-0.908]

17 In this paper, we define comparative questions and comparators as follows. Comparative question: a question that intends to compare two or more entities and mentions these entities explicitly. [sent-31, score-1.23]

18 Comparator: an entity that is a target of comparison in a comparative question. [sent-32, score-0.698]

19 According to these definitions, Q1 and Q2 below are not comparative questions while Q3 is. [sent-33, score-0.709]

20 The goal of this work is mining comparators from comparative questions. [sent-38, score-0.914]

21 The results would be very useful in helping users' exploration of alternative choices by suggesting comparable entities based on other users' prior requests. [sent-39, score-0.036] [sent-41, score-0.202]

(Page footer: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 650–658, Uppsala, Sweden, 11–16 July 2010. ©2010 Association for Computational Linguistics.)

23 To mine comparators from comparative questions, we first have to detect whether a question is comparative or not. [sent-42, score-1.574]

24 According to our definition, a comparative question has to be a question with intent to compare at least two entities. [sent-43, score-0.814]

25 Please note that a question containing at least two entities is not a comparative question if it does not have comparison intent. [sent-44, score-0.878]

26 However, we observe that a question is very likely to be a comparative question if it contains at least two entities. [sent-45, score-0.762]

27 We leverage this insight and develop a weakly supervised bootstrapping method to identify comparative questions and extract comparators simultaneously. [sent-46, score-1.137]

28 To the best of our knowledge, this is the first attempt to specifically address the problem of finding good comparators to support users' comparison activity. [sent-47, score-0.298]

29 We are also the first to propose using comparative questions posted online that reflect what users truly care about as the medium from which we mine comparable entities. [sent-48, score-1.052]

30 8% in end-to-end comparative question identification and comparator extraction, which significantly outperforms the most relevant state-of-the-art method by Jindal & Liu (2006b). [sent-52, score-1.127]

31 Section 3 presents our weakly-supervised method for comparator mining. [sent-55, score-0.31]

32 Related Work Overview: In terms of discovering related items for an entity, our work is similar to the research on recommender systems, which recommend items to a user. [sent-58, score-0.189]

33 Recommender systems mainly rely on similarities between items and/or their statistical correlations in user log data (Linden et al.). [sent-59, score-0.078]

34 For example, Amazon recommends products to its customers based on their own purchase histories, similar customers' purchase histories, and similarity between products. [sent-61, score-0.26]

35 However, recommending an item is not equivalent to finding a comparable item. [sent-62, score-0.158]

36 In the case of Amazon, the purpose of recommendation is to entice their customers to add more items to their shopping carts by suggesting similar or related items. [sent-63, score-0.21]

37 In the case of comparison, by contrast, we would like to help users explore alternatives, i.e. [sent-64, score-0.083]

38 For example, it is reasonable to recommend “iPod speaker” or “iPod batteries” if a user is interested in “iPod”, but we would not compare them with “iPod”. [sent-67, score-0.065]

39 However, items that are comparable with “iPod”, such as “iPhone” or “PSP”, which were found in comparative questions posted by users, are difficult to predict based simply on item similarity. [sent-68, score-1.066]

40 Although they are all music players, “iPhone” is mainly a mobile phone, and “PSP” is mainly a portable game device. [sent-69, score-0.159]

41 They are similar but also different, and therefore beg comparison with each other. [sent-70, score-0.029]

42 It is clear that comparator mining and item recommendation are related but not the same. [sent-71, score-0.449]

43 Our work on comparator mining is related to the research on entity and relation extraction in information extraction (Cardie, 1997; Califf and Mooney, 1999; Soderland, 1999; Radev et al.). [sent-72, score-0.504]

44 Specifically, the most relevant work is by Jindal and Liu (2006a and 2006b) on mining comparative sentences and relations. [sent-75, score-0.694]

45 Their methods applied class sequential rules (CSR) (Chapter 2, Liu 2006) and label sequential rules (LSR) (Chapter 2, Liu 2006) learned from annotated corpora to identify comparative sentences and extract comparative relations respectively in the news and review domains. [sent-76, score-1.401]

46 The same techniques can be applied to comparative question identification and comparator mining from questions. [sent-77, score-1.112]

47 However, ensuring high recall is crucial in our intended application scenario where users can issue arbitrary queries. [sent-79, score-0.122]

48 To address this problem, we develop a weakly-supervised bootstrapping pattern learning method by effectively leveraging unlabeled questions. [sent-80, score-0.172]
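
The pattern-bootstrapping idea described above can be sketched as a toy loop. Everything concrete here — the regex representation of patterns, the pattern-induction heuristic, and the example questions — is invented for illustration; this is not the authors' actual implementation, which additionally scores patterns and extraction candidates:

```python
import re

def bootstrap(questions, seed_patterns, max_iters=5):
    """Toy weakly supervised bootstrapping loop (illustrative only).

    A pattern is a regex whose two capture groups mark comparator slots.
    Each iteration extracts comparator pairs with the current patterns,
    then induces new patterns from questions that mention a known pair,
    stopping once nothing new is found."""
    patterns, pairs = set(seed_patterns), set()
    for _ in range(max_iters):
        # Extraction step: apply every current pattern to every question.
        new_pairs = set()
        for q in questions:
            for p in patterns:
                m = re.match(p, q)
                if m:
                    new_pairs.add(m.groups())
        # Induction step: generalize a question that mentions a known
        # comparator pair into a new pattern by replacing the pair
        # with capture groups.
        new_patterns = set()
        for q in questions:
            for a, b in new_pairs:
                if a in q and b in q:
                    pat = re.escape(q).replace(re.escape(a), r"(\w+)", 1)
                    new_patterns.add(pat.replace(re.escape(b), r"(\w+)", 1))
        if new_pairs <= pairs and new_patterns <= patterns:
            break  # converged: no new pairs or patterns
        pairs |= new_pairs
        patterns |= new_patterns
    return patterns, pairs

questions = [
    "which is better, iphone or n95?",
    "which is better, ipod or zune?",
    "should i buy ipod or zune?",
]
seed = {r"which is better, (\w+) or (\w+)\?"}
patterns, pairs = bootstrap(questions, seed)
```

Starting from a single seed template, the loop extracts two comparator pairs and also induces the previously uncovered "should i buy X or Y" template, which illustrates how unlabeled questions can raise recall beyond the seeds.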

49 Bootstrapping methods have been shown to be very effective in previous information extraction research (Riloff, 1996; Riloff and Jones, 1999; Ravichandran and Hovy, 2002; Mooney and Bunescu, 2005; Kozareva et al.). [sent-81, score-0.043]

50 Our work is similar to them in terms of methodology, using a bootstrapping technique to extract entities with a specific relation. [sent-83, score-0.147]

51 However, our task is different from theirs in that it requires not only extracting entities (comparator extraction) but also ensuring that the entities are extracted from comparative questions (comparative question identification), which is generally not required in IE tasks. [sent-84, score-0.996]

52 Jindal & Liu 2006: In this subsection, we provide a brief summary of the comparative mining method proposed by Jindal and Liu (2006a and 2006b), which is used as the baseline for comparison and represents the state of the art in this area. [sent-86, score-0.75]

53 We first introduce the definitions of the CSR and LSR rules used in their approach, and then describe their comparative mining method. [sent-87, score-0.694]

54 In our problem, C is either comparative or non-comparative. [sent-95, score-0.614]

55 Given a collection of sequences with class information, every CSR is associated with two parameters: support and confidence. [sent-96, score-0.097]

56 Support is the proportion of sequences in the collection containing S as a subsequence. [sent-97, score-0.051]

57 Confidence is the proportion of sequences labeled as C among the sequences containing S. [sent-98, score-0.102]
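
These support and confidence definitions can be made concrete with a small sketch. The POS-tag sequences and labels below are toy data invented for illustration; the actual CSR pattern-mining machinery is omitted:

```python
def is_subsequence(s, seq):
    """True if s occurs as a (not necessarily contiguous) subsequence of seq."""
    it = iter(seq)
    return all(item in it for item in s)

def csr_support_confidence(s, c, data):
    """Support: fraction of all sequences containing s as a subsequence.
    Confidence: fraction of those containing sequences whose class is c.
    data is a list of (sequence, class_label) pairs."""
    labels = [label for seq, label in data if is_subsequence(s, seq)]
    support = len(labels) / len(data)
    confidence = (sum(1 for label in labels if label == c) / len(labels)
                  if labels else 0.0)
    return support, confidence

# Toy POS-tag sequences labeled comparative / non-comparative.
data = [
    (["WDT", "VBZ", "JJR", "NN", "CC", "NN"], "comparative"),
    (["WDT", "VBZ", "JJR", "NN"], "comparative"),
    (["WDT", "VBZ", "DT", "NN"], "non-comparative"),
    (["NN", "VBZ", "JJR"], "non-comparative"),
]
sup, conf = csr_support_confidence(["VBZ", "JJR"], "comparative", data)
# sup = 0.75 (3 of 4 sequences contain VBZ ... JJR), conf = 2/3
```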

58 An LSR maps an input sequence to a labeled sequence, replacing the anchor in the input sequence with a designated label. [sent-117, score-0.070]

59 The anchor in the input sequence could be extracted if its corresponding label in the labeled sequence is what we want (in our case, a comparator). [sent-121, score-0.133]

60 LSRs are also mined from an annotated corpus; therefore, each LSR also has two parameters: support and confidence. [sent-122, score-0.023]
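
A minimal sketch of applying one such rule follows. The rule representation here — a contiguous POS pattern plus an anchor offset and its label — is a deliberate simplification invented for illustration; J&L's LSRs are more general sequential rules:

```python
def apply_lsr(rule, tagged_tokens):
    """rule: (pos_pattern, anchor_offset, label); tagged_tokens: list of
    (word, pos) pairs. If the POS pattern matches a contiguous span,
    return (anchor_word, label); otherwise return None."""
    pos_pattern, anchor_offset, label = rule
    tags = [pos for _, pos in tagged_tokens]
    n = len(pos_pattern)
    for i in range(len(tags) - n + 1):
        if tags[i:i + n] == pos_pattern:
            return tagged_tokens[i + anchor_offset][0], label
    return None

# Rule: in an "NN CC NN" span ("X or Y"), label the first NN as $ES1.
rule = (["NN", "CC", "NN"], 0, "$ES1")
tokens = [("which", "WDT"), ("is", "VBZ"), ("better", "JJR"),
          (",", ","), ("iphone", "NN"), ("or", "CC"), ("n95", "NN")]
result = apply_lsr(rule, tokens)
# result == ("iphone", "$ES1"): the anchor's label is a comparator
# label, so "iphone" would be extracted.
```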

61 Supervised Comparative Mining Method: J&L treated comparative sentence identification as a classification problem and comparative relation extraction as an information extraction problem. [sent-124, score-1.375]

62 They first manually created a set of 83 keywords such as beat, exceed, and outperform that are likely indicators of comparative sentences. [sent-125, score-0.707]

63 These keywords were then used as pivots to create part-of-speech (POS) sequence data. [sent-126, score-0.152]
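
One plausible reading of this pivot construction, sketched as code: keyword pivots are kept as tokens while surrounding words are replaced by their POS tags. The window radius, keyword set, and example sentence are invented for illustration:

```python
def pivot_sequence(tagged_tokens, keywords, radius=3):
    """Return the POS sequence within `radius` tokens of the first
    keyword pivot, keeping the keyword itself as the pivot token."""
    for i, (word, _) in enumerate(tagged_tokens):
        if word.lower() in keywords:
            window = tagged_tokens[max(0, i - radius): i + radius + 1]
            return [w.lower() if w.lower() in keywords else pos
                    for w, pos in window]
    return None  # no pivot keyword found

keywords = {"beat", "beats", "exceed", "outperform"}
tokens = [("Nokia", "NNP"), ("N95", "NNP"), ("beats", "VBZ"),
          ("iPhone", "NNP"), ("easily", "RB")]
seq = pivot_sequence(tokens, keywords)
# seq == ["NNP", "NNP", "beats", "NNP", "RB"]
```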

64 comparative or non-comparative, was used to create sequences and CSRs were mined. [sent-129, score-0.665]

65 The classifier was then used to identify comparative sentences. [sent-131, score-0.614]

66 Given a set of comparative sentences, J&L manually annotated two comparators with labels $ES1 and $ES2, and the feature compared with label $FT, for each sentence. [sent-132, score-0.87]

67 J&L's method was applied only to nouns and pronouns. [sent-133, score-0.027]

68 To differentiate nouns and pronouns that are not comparators or features, they added a fourth label, $NEF, i.e. [sent-134, score-0.256]

69 These labels were used as pivots, together with special tokens li & rj1 (token position), #start (beginning of a sentence), and #end (end of a sentence), to generate sequence data; sequences with a single label and minimum support greater than 1% were retained, and then LSRs were created. [sent-137, score-0.194]

70 J&L's method has been proven effective in their experimental setups. [sent-139, score-0.027]

71 However, it has the following weaknesses: the performance of J&L's method relies heavily on a set of keywords indicative of comparative sentences. [sent-140, score-0.641]

72 These keywords were manually created, and the authors offered no guidelines for selecting keywords for inclusion. [sent-141, score-0.136]

73 Users can express comparative sentences or questions in many different ways. [sent-143, score-0.709]

74 It is a surprise that their rules achieved high precision but low recall. [sent-147, score-0.023]

75 However, we suspect that their rules might be too specific and overfit their small training set (about 2,600 sentences). [sent-149, score-0.023]

76 We would like to increase recall, avoid overfitting, and allow rules to include discriminative lexical tokens to retain precision. [sent-150, score-0.023]

77 In the next section, we introduce our method to address these shortcomings. [sent-151, score-0.053]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('comparative', 0.614), ('comparator', 0.283), ('ipod', 0.283), ('comparators', 0.22), ('iphone', 0.193), ('jindal', 0.188), ('lsrs', 0.188), ('csr', 0.165), ('lsr', 0.157), ('csrs', 0.126), ('questions', 0.095), ('psp', 0.094), ('posted', 0.089), ('comparable', 0.089), ('entities', 0.087), ('users', 0.083), ('mining', 0.08), ('nokia', 0.076), ('liu', 0.075), ('question', 0.074), ('products', 0.071), ('keywords', 0.068), ('customers', 0.067), ('bmw', 0.063), ('zune', 0.063), ('weakly', 0.062), ('identification', 0.061), ('bootstrapping', 0.06), ('entity', 0.055), ('histories', 0.055), ('mine', 0.052), ('items', 0.051), ('sequences', 0.051), ('pivots', 0.05), ('recommender', 0.05), ('token', 0.048), ('purchase', 0.047), ('cameras', 0.045), ('touch', 0.045), ('item', 0.045), ('extraction', 0.043), ('china', 0.041), ('pivot', 0.041), ('recommendation', 0.041), ('ford', 0.039), ('ensuring', 0.039), ('portable', 0.039), ('phone', 0.037), ('mobile', 0.037), ('recommend', 0.037), ('hd', 0.036), ('helping', 0.036), ('mooney', 0.036), ('label', 0.036), ('sequence', 0.034), ('sequential', 0.034), ('beijing', 0.032), ('car', 0.032), ('riloff', 0.031), ('supervised', 0.03), ('amazon', 0.03), ('activity', 0.03), ('leveraging', 0.03), ('online', 0.03), ('alternatives', 0.03), ('want', 0.029), ('game', 0.029), ('develop', 0.029), ('comparison', 0.029), ('marks', 0.029), ('compare', 0.028), ('evidences', 0.028), ('rj', 0.028), ('recommends', 0.028), ('batteries', 0.028), ('oranges', 0.028), ('people', 0.028), ('method', 0.027), ('mainly', 0.027), ('ft', 0.026), ('address', 0.026), ('suggesting', 0.026), ('outperform', 0.025), ('strive', 0.025), ('apples', 0.025), ('califf', 0.025), ('intends', 0.025), ('manufacturers', 0.025), ('pare', 0.025), ('plenty', 0.025), ('shopping', 0.025), ('end', 0.025), ('shasha', 0.024), ('recommending', 0.024), ('camera', 0.024), ('editorial', 0.024), ('intent', 0.024), ('support', 0.023), ('class', 0.023), ('rules', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000005 63 acl-2010-Comparable Entity Mining from Comparative Questions

Author: Shasha Li ; Chin-Yew Lin ; Young-In Song ; Zhoujun Li

Abstract: Comparing one thing with another is a typical part of the human decision-making process. However, it is not always easy to know what to compare and what the alternatives are. To address this difficulty, we present a novel way to automatically mine comparable entities from comparative questions that users posted online. To ensure high precision and high recall, we develop a weakly-supervised bootstrapping method for comparative question identification and comparable entity extraction by leveraging a large online question archive. The experimental results show our method achieves an F1-measure of 82.5% in comparative question identification and 83.3% in comparable entity extraction. Both significantly outperform an existing state-of-the-art method.

2 0.083039403 215 acl-2010-Speech-Driven Access to the Deep Web on Mobile Devices

Author: Taniya Mishra ; Srinivas Bangalore

Abstract: The Deep Web is the collection of information repositories that are not indexed by search engines. These repositories are typically accessible through web forms and contain dynamically changing information. In this paper, we present a system that allows users to access such rich repositories of information on mobile devices using spoken language.

3 0.062138781 89 acl-2010-Distributional Similarity vs. PU Learning for Entity Set Expansion

Author: Xiao-Li Li ; Lei Zhang ; Bing Liu ; See-Kiong Ng

Abstract: Distributional similarity is a classic technique for entity set expansion, where the system is given a set of seed entities of a particular class, and is asked to expand the set using a corpus to obtain more entities of the same class as represented by the seeds. This paper shows that a machine learning model called positive and unlabeled learning (PU learning) can model the set expansion problem better. Based on the test results of 10 corpora, we show that a PU learning technique outperformed distributional similarity significantly. 1

4 0.055357084 150 acl-2010-Inducing Domain-Specific Semantic Class Taggers from (Almost) Nothing

Author: Ruihong Huang ; Ellen Riloff

Abstract: This research explores the idea of inducing domain-specific semantic class taggers using only a domain-specific text collection and seed words. The learning process begins by inducing a classifier that only has access to contextual features, forcing it to generalize beyond the seeds. The contextual classifier then labels new instances, to expand and diversify the training set. Next, a cross-category bootstrapping process simultaneously trains a suite of classifiers for multiple semantic classes. The positive instances for one class are used as negative instances for the others in an iterative bootstrapping cycle. We also explore a one-semantic-class-per-discourse heuristic, and use the classifiers to dynam- ically create semantic features. We evaluate our approach by inducing six semantic taggers from a collection of veterinary medicine message board posts.

5 0.053314447 125 acl-2010-Generating Templates of Entity Summaries with an Entity-Aspect Model and Pattern Mining

Author: Peng Li ; Jing Jiang ; Yinglin Wang

Abstract: In this paper, we propose a novel approach to automatic generation of summary templates from given collections of summary articles. This kind of summary templates can be useful in various applications. We first develop an entity-aspect LDA model to simultaneously cluster both sentences and words into aspects. We then apply frequent subtree pattern mining on the dependency parse trees of the clustered and labeled sentences to discover sentence patterns that well represent the aspects. Key features of our method include automatic grouping of semantically related sentence patterns and automatic identification of template slots that need to be filled in. We apply our method on five Wikipedia entity categories and compare our method with two baseline methods. Both quantitative evaluation based on human judgment and qualitative comparison demonstrate the effectiveness and advantages of our method.

6 0.052531801 28 acl-2010-An Entity-Level Approach to Information Extraction

7 0.051284201 129 acl-2010-Growing Related Words from Seed via User Behaviors: A Re-Ranking Based Approach

8 0.046928875 204 acl-2010-Recommendation in Internet Forums and Blogs

9 0.042622656 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns

10 0.042162284 171 acl-2010-Metadata-Aware Measures for Answer Summarization in Community Question Answering

11 0.040033691 113 acl-2010-Extraction and Approximation of Numerical Attributes from the Web

12 0.039649226 122 acl-2010-Generating Fine-Grained Reviews of Songs from Album Reviews

13 0.039632268 78 acl-2010-Cross-Language Text Classification Using Structural Correspondence Learning

14 0.038711876 245 acl-2010-Understanding the Semantic Structure of Noun Phrase Queries

15 0.038003121 80 acl-2010-Cross Lingual Adaptation: An Experiment on Sentiment Classifications

16 0.036826771 38 acl-2010-Automatic Evaluation of Linguistic Quality in Multi-Document Summarization

17 0.036810067 209 acl-2010-Sentiment Learning on Product Reviews via Sentiment Ontology Tree

18 0.03603876 174 acl-2010-Modeling Semantic Relevance for Question-Answer Pairs in Web Social Communities

19 0.033332318 85 acl-2010-Detecting Experiences from Weblogs

20 0.033321768 132 acl-2010-Hierarchical Joint Learning: Improving Joint Parsing and Named Entity Recognition with Non-Jointly Labeled Data


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.105), (1, 0.043), (2, -0.04), (3, -0.003), (4, -0.009), (5, -0.017), (6, 0.001), (7, -0.013), (8, -0.015), (9, 0.005), (10, -0.018), (11, 0.015), (12, -0.025), (13, -0.122), (14, 0.037), (15, 0.074), (16, -0.009), (17, -0.014), (18, 0.001), (19, 0.009), (20, 0.034), (21, -0.01), (22, 0.055), (23, 0.012), (24, -0.021), (25, 0.032), (26, -0.042), (27, -0.012), (28, -0.022), (29, 0.002), (30, 0.052), (31, 0.041), (32, 0.003), (33, 0.005), (34, -0.004), (35, 0.007), (36, -0.048), (37, 0.076), (38, 0.114), (39, 0.017), (40, 0.002), (41, 0.017), (42, -0.057), (43, -0.001), (44, -0.032), (45, -0.115), (46, 0.08), (47, -0.018), (48, 0.077), (49, 0.065)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93360138 63 acl-2010-Comparable Entity Mining from Comparative Questions

Author: Shasha Li ; Chin-Yew Lin ; Young-In Song ; Zhoujun Li

Abstract: Comparing one thing with another is a typical part of the human decision-making process. However, it is not always easy to know what to compare and what the alternatives are. To address this difficulty, we present a novel way to automatically mine comparable entities from comparative questions that users posted online. To ensure high precision and high recall, we develop a weakly-supervised bootstrapping method for comparative question identification and comparable entity extraction by leveraging a large online question archive. The experimental results show our method achieves an F1-measure of 82.5% in comparative question identification and 83.3% in comparable entity extraction. Both significantly outperform an existing state-of-the-art method.

2 0.60625505 122 acl-2010-Generating Fine-Grained Reviews of Songs from Album Reviews

Author: Swati Tata ; Barbara Di Eugenio

Abstract: Music Recommendation Systems often recommend individual songs, as opposed to entire albums. The challenge is to generate reviews for each song, since only full album reviews are available on-line. We developed a summarizer that combines information extraction and generation techniques to produce summaries of reviews of individual songs. We present an intrinsic evaluation of the extraction components, and of the informativeness of the summaries; and a user study of the impact of the song review summaries on users’ decision making processes. Users were able to make quicker and more informed decisions when presented with the summary as compared to the full album review.

3 0.58942437 254 acl-2010-Using Speech to Reply to SMS Messages While Driving: An In-Car Simulator User Study

Author: Yun-Cheng Ju ; Tim Paek

Abstract: Speech recognition affords automobile drivers a hands-free, eyes-free method of replying to Short Message Service (SMS) text messages. Although a voice search approach based on template matching has been shown to be more robust to the challenging acoustic environment of automobiles than using dictation, users may have difficulties verifying whether SMS response templates match their intended meaning, especially while driving. Using a high-fidelity driving simulator, we compared dictation for SMS replies versus voice search in increasingly difficult driving conditions. Although the two approaches did not differ in terms of driving performance measures, users made about six times more errors on average using dictation than voice search. 1

4 0.52364916 2 acl-2010-"Was It Good? It Was Provocative." Learning the Meaning of Scalar Adjectives

Author: Marie-Catherine de Marneffe ; Christopher D. Manning ; Christopher Potts

Abstract: Texts and dialogues often express information indirectly. For instance, speakers’ answers to yes/no questions do not always straightforwardly convey a ‘yes’ or ‘no’ answer. The intended reply is clear in some cases (Was it good? It was great!) but uncertain in others (Was it acceptable? It was unprecedented.). In this paper, we present methods for interpreting the answers to questions like these which involve scalar modifiers. We show how to ground scalar modifier meaning based on data collected from the Web. We learn scales between modifiers and infer the extent to which a given answer conveys ‘yes’ or ‘no’ . To evaluate the methods, we collected examples of question–answer pairs involving scalar modifiers from CNN transcripts and the Dialog Act corpus and use response distributions from Mechanical Turk workers to assess the degree to which each answer conveys ‘yes’ or ‘no’ . Our experimental results closely match the Turkers’ response data, demonstrating that meanings can be learned from Web data and that such meanings can drive pragmatic inference.

5 0.52053559 113 acl-2010-Extraction and Approximation of Numerical Attributes from the Web

Author: Dmitry Davidov ; Ari Rappoport

Abstract: We present a novel framework for automated extraction and approximation of numerical object attributes such as height and weight from the Web. Given an object-attribute pair, we discover and analyze attribute information for a set of comparable objects in order to infer the desired value. This allows us to approximate the desired numerical values even when no exact values can be found in the text. Our framework makes use of relation defining patterns and WordNet similarity information. First, we obtain from the Web and WordNet a list of terms similar to the given object. Then we retrieve attribute values for each term in this list, and information that allows us to compare different objects in the list and to infer the attribute value range. Finally, we combine the retrieved data for all terms from the list to select or approximate the requested value. We evaluate our method using automated question answering, WordNet enrichment, and comparison with answers given in Wikipedia and by leading search engines. In all of these, our framework provides a significant improvement.

6 0.51787698 215 acl-2010-Speech-Driven Access to the Deep Web on Mobile Devices

7 0.48976254 89 acl-2010-Distributional Similarity vs. PU Learning for Entity Set Expansion

8 0.44961137 174 acl-2010-Modeling Semantic Relevance for Question-Answer Pairs in Web Social Communities

9 0.43791991 111 acl-2010-Extracting Sequences from the Web

10 0.43662247 129 acl-2010-Growing Related Words from Seed via User Behaviors: A Re-Ranking Based Approach

11 0.43382546 189 acl-2010-Optimizing Question Answering Accuracy by Maximizing Log-Likelihood

12 0.42720938 125 acl-2010-Generating Templates of Entity Summaries with an Entity-Aspect Model and Pattern Mining

13 0.41422668 92 acl-2010-Don't 'Have a Clue'? Unsupervised Co-Learning of Downward-Entailing Operators.

14 0.407217 171 acl-2010-Metadata-Aware Measures for Answer Summarization in Community Question Answering

15 0.40321243 28 acl-2010-An Entity-Level Approach to Information Extraction

16 0.40044504 80 acl-2010-Cross Lingual Adaptation: An Experiment on Sentiment Classifications

17 0.3986291 138 acl-2010-Hunting for the Black Swan: Risk Mining from Text

18 0.39043438 204 acl-2010-Recommendation in Internet Forums and Blogs

19 0.38521245 157 acl-2010-Last but Definitely Not Least: On the Role of the Last Sentence in Automatic Polarity-Classification

20 0.37602994 150 acl-2010-Inducing Domain-Specific Semantic Class Taggers from (Almost) Nothing


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(23, 0.369), (25, 0.069), (42, 0.035), (44, 0.016), (59, 0.082), (72, 0.02), (73, 0.033), (78, 0.048), (80, 0.01), (83, 0.084), (84, 0.023), (98, 0.1)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.71955979 63 acl-2010-Comparable Entity Mining from Comparative Questions

Author: Shasha Li ; Chin-Yew Lin ; Young-In Song ; Zhoujun Li

Abstract: Comparing one thing with another is a typical part of the human decision-making process. However, it is not always easy to know what to compare and what the alternatives are. To address this difficulty, we present a novel way to automatically mine comparable entities from comparative questions that users posted online. To ensure high precision and high recall, we develop a weakly-supervised bootstrapping method for comparative question identification and comparable entity extraction by leveraging a large online question archive. The experimental results show our method achieves an F1-measure of 82.5% in comparative question identification and 83.3% in comparable entity extraction. Both significantly outperform an existing state-of-the-art method.

2 0.66059113 122 acl-2010-Generating Fine-Grained Reviews of Songs from Album Reviews

Author: Swati Tata ; Barbara Di Eugenio

Abstract: Music Recommendation Systems often recommend individual songs, as opposed to entire albums. The challenge is to generate reviews for each song, since only full album reviews are available on-line. We developed a summarizer that combines information extraction and generation techniques to produce summaries of reviews of individual songs. We present an intrinsic evaluation of the extraction components, and of the informativeness of the summaries; and a user study of the impact of the song review summaries on users’ decision making processes. Users were able to make quicker and more informed decisions when presented with the summary as compared to the full album review.

3 0.65975231 107 acl-2010-Exemplar-Based Models for Word Meaning in Context

Author: Katrin Erk ; Sebastian Pado

Abstract: This paper describes ongoing work on distributional models for word meaning in context. We abandon the usual one-vectorper-word paradigm in favor of an exemplar model that activates only relevant occurrences. On a paraphrasing task, we find that a simple exemplar model outperforms more complex state-of-the-art models.

4 0.45144409 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans

Author: Fei Huang ; Alexander Yates

Abstract: Most supervised language processing systems show a significant drop-off in performance when they are tested on text that comes from a domain significantly different from the domain of the training data. Semantic role labeling techniques are typically trained on newswire text, and in tests their performance on fiction is as much as 19% worse than their performance on newswire text. We investigate techniques for building open-domain semantic role labeling systems that approach the ideal of a train-once, use-anywhere system. We leverage recently-developed techniques for learning representations of text using latent-variable language models, and extend these techniques to ones that provide the kinds of features that are useful for semantic role labeling. In experiments, our novel system reduces error by 16% relative to the previous state of the art on out-of-domain text.

5 0.43084151 71 acl-2010-Convolution Kernel over Packed Parse Forest

Author: Min Zhang ; Hui Zhang ; Haizhou Li

Abstract: This paper proposes a convolution forest kernel to effectively explore rich structured features embedded in a packed parse forest. As opposed to the convolution tree kernel, the proposed forest kernel does not have to commit to a single best parse tree, is thus able to explore very large object spaces and much more structured features embedded in a forest. This makes the proposed kernel more robust against parsing errors and data sparseness issues than the convolution tree kernel. The paper presents the formal definition of convolution forest kernel and also illustrates the computing algorithm to fast compute the proposed convolution forest kernel. Experimental results on two NLP applications, relation extraction and semantic role labeling, show that the proposed forest kernel significantly outperforms the baseline of the convolution tree kernel. 1

6 0.43081346 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition

7 0.43064606 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese

8 0.43008712 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification

9 0.42979586 169 acl-2010-Learning to Translate with Source and Target Syntax

10 0.42897654 158 acl-2010-Latent Variable Models of Selectional Preference

11 0.42642707 248 acl-2010-Unsupervised Ontology Induction from Text

12 0.42615807 101 acl-2010-Entity-Based Local Coherence Modelling Using Topological Fields

13 0.42515659 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models

14 0.42487878 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation

15 0.42451948 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns

16 0.42363101 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar

17 0.42340955 261 acl-2010-Wikipedia as Sense Inventory to Improve Diversity in Web Search Results

18 0.4232057 198 acl-2010-Predicate Argument Structure Analysis Using Transformation Based Learning

19 0.42319369 113 acl-2010-Extraction and Approximation of Numerical Attributes from the Web

20 0.42219967 172 acl-2010-Minimized Models and Grammar-Informed Initialization for Supertagging with Highly Ambiguous Lexicons