51 emnlp-2012-Extracting Opinion Expressions with semi-Markov Conditional Random Fields


Source: pdf

Author: Bishan Yang ; Claire Cardie

Abstract: Extracting opinion expressions from text is usually formulated as a token-level sequence labeling task tackled using Conditional Random Fields (CRFs). CRFs, however, do not readily model potentially useful segment-level information like syntactic constituent structure. Thus, we propose a semi-CRF-based approach to the task that can perform sequence labeling at the segment level. We extend the original semi-CRF model (Sarawagi and Cohen, 2004) to allow the modeling of arbitrarily long expressions while accounting for their likely syntactic structure when modeling segment boundaries. We evaluate performance on two opinion extraction tasks, and, in contrast to previous sequence labeling approaches to the task, explore the usefulness of segment-level syntactic parse features. Experimental results demonstrate that our approach outperforms state-of-the-art methods for both opinion expression tasks.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract Extracting opinion expressions from text is usually formulated as a token-level sequence labeling task tackled using Conditional Random Fields (CRFs). [sent-3, score-0.753]

2 Thus, we propose a semi-CRF-based approach to the task that can perform sequence labeling at the segment level. [sent-5, score-0.631]

3 We extend the original semi-CRF model (Sarawagi and Cohen, 2004) to allow the modeling of arbitrarily long expressions while accounting for their likely syntactic structure when modeling segment boundaries. [sent-6, score-0.823]

4 We evaluate performance on two opinion extraction tasks, and, in contrast to previous sequence labeling approaches to the task, explore the usefulness of segment-level syntactic parse features. [sent-7, score-0.748]

5 Experimental results demonstrate that our approach outperforms state-of-the-art methods for both opinion expression tasks. [sent-8, score-0.631]

6 1 Introduction Accurate opinion expression identification is crucial for tasks that benefit from fine-grained opinion analysis (Wiebe et al. [sent-9, score-1.118]

7 , it is a first step in characterizing the sentiment and intensity of the opinion; it provides a textual anchor for identifying the opinion holder and the target or topic of an opinion; and these, in turn, form the basis of opinion-oriented question answering and opinion summarization systems. [sent-12, score-0.998]

8 In this paper, we focus on opinion expressions as defined in Wiebe et al. [sent-13, score-0.671]

9 These include direct subjective expressions (DSEs): explicit mentions of private states or speech events expressing private states; and expressive subjective expressions (ESEs): expressions that indicate sentiment, emotion, etc. [sent-18, score-0.894]

10 As a type of information extraction task, opinion expression extraction has been successfully tackled in the past via sequence tagging methods: Choi et al. [sent-23, score-0.792]

11 Our goal in this work is to extract opinion expressions at the segment level with semi-Markov conditional random fields (semi-CRFs). [sent-33, score-1.302]

12 However, to the best of our knowledge, semi-CRF techniques have not been investigated for opinion expression extraction. [sent-38, score-0.631]

13 The contribution of this paper is a semi-CRF-based approach for opinion expression extraction that leverages parsing information to provide better modeling of opinion expressions. [sent-39, score-1.216]

14 We also explore the impact of syntactic features for extracting opinion expressions. [sent-42, score-0.634]

15 We evaluate our model on two opinion extraction tasks: identifying direct subjective expressions (DSEs) and expressive subjective expressions (ESEs). [sent-43, score-1.103]

16 More recent studies tackle opinion expression extraction at the expression level. [sent-50, score-0.839]

17 Others extend the token-level approach to jointly identify opinion holders (Choi et al. [sent-53, score-0.536]

18 , 2006), and to determine the polarity and intensity of the opinion expressions (Choi and Cardie, 2010). [sent-54, score-0.671]

19 Reranking the output of a simple sequence labeler has been shown to further improve the extraction of opinion expressions (Johansson and Moschitti, 2010; Johansson and Moschitti, 2011); importantly, their reranking approach relied on features that encoded syntactic structure. [sent-55, score-0.957]

20 Semi-CRFs (Sarawagi and Cohen, 2004) are general CRFs that relax the Markovian assumptions to allow sequence labeling at the segment level. [sent-57, score-0.631]

21 The task of opinion expression extraction is known to be harder than traditional NER since subjective expressions exhibit substantial lexical variation and their recognition requires more attention to linguistic structure. [sent-60, score-0.971]

22 In opinion mining, numerous studies have shown that syntactic parsing features are very helpful for opinion analysis. [sent-62, score-1.102]

23 A lot of work uses syntactic features to identify opinion holders and opinion topics (Bethard et al. [sent-63, score-1.117]

24 (2010) recently employed dependency path features for the extraction of opinion targets. [sent-69, score-0.58]

25 Johansson and Moschitti (2010; 2011) also successfully employed syntactic features that indicate dependency relations between opinion expressions for the task of opinion expression extraction. [sent-70, score-1.396]

26 3 Approach We formulate the extraction of opinion expressions as a sequence labeling problem. [sent-72, score-0.817]

27 As a result, we explore the use of semi-CRFs, which can assign labels to segments instead of tokens; hence, features can be defined at the segment level. [sent-77, score-0.689]

28 In the following subsections, we first introduce standard semi-CRFs and then describe our semi-CRF-based approach for opinion expression extraction. [sent-79, score-0.631]

29 , $s_n\rangle$, where $s_i$ is a triple $s_i = (t_i, u_i, y_i)$, $t_i$ denotes the start position of segment $s_i$, $u_i$ denotes the end position, and $y_i$ denotes the label of the segment. [sent-84, score-0.681]

30 Features in semi-CRFs are defined at the segment level rather than the word level. [sent-86, score-0.549]

31 The feature function g(i, x, s) is a function of x, the current segment $s_i$, and the label $y_{i-1}$ of the previous segment $s_{i-1}$ (we consider the usual first-order Markovian assumption). [sent-87, score-1.098]

32 The conditional probability of a segmentation s given a sequence x is defined as $p(s \mid x) = \frac{1}{Z(x)} \exp\left(\sum_i \sum_k \lambda_k g_k(i, x, s)\right)$ (1), where $Z(x) = \sum_{s' \in S} \exp\left(\sum_i \sum_k \lambda_k g_k(i, x, s')\right)$ and the set S contains all possible segmentations obtained from segment candidates with length ranging from 1 to the maximum length L. [sent-89, score-0.759]
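As a rough illustration of Equation (1), the sketch below computes p(s|x) by brute-force enumeration of every labeled segmentation with segments of at most L words. It is only practical for toy inputs. The two feature functions, the weights, the label names "O" and "OP", and all function names are hypothetical choices made for this example and do not come from the paper.

```python
import math
from itertools import product

def segmentations(n, max_len):
    """Yield every split of positions 0..n-1 into inclusive spans of length <= max_len."""
    if n == 0:
        yield []
        return
    for length in range(1, min(max_len, n) + 1):
        for rest in segmentations(n - length, max_len):
            yield rest + [(n - length, n - 1)]

def labeled_segmentations(n, max_len, labels):
    """Yield segmentations as lists of (start, end, label) triples."""
    for spans in segmentations(n, max_len):
        for ys in product(labels, repeat=len(spans)):
            yield [(t, u, y) for (t, u), y in zip(spans, ys)]

def log_score(x, s, weights, feature_fns):
    """Unnormalised log-score: sum_i sum_k lambda_k * g_k(i, x, s)."""
    return sum(w * g(i, x, s)
               for i in range(len(s))
               for w, g in zip(weights, feature_fns))

def probability(x, s, weights, feature_fns, max_len, labels):
    """p(s|x) with the partition function Z(x) computed by enumeration."""
    z = sum(math.exp(log_score(x, s2, weights, feature_fns))
            for s2 in labeled_segmentations(len(x), max_len, labels))
    return math.exp(log_score(x, s, weights, feature_fns)) / z

# Hypothetical toy features (assumptions for illustration only).
def g_two_word_opinion(i, x, s):
    t, u, y = s[i]
    return 1.0 if y == "OP" and u - t + 1 == 2 else 0.0

def g_same_label_as_previous(i, x, s):
    # Uses the previous segment's label, i.e. the first-order Markov dependency.
    return 1.0 if i > 0 and s[i][2] == s[i - 1][2] else 0.0

x = ["a", "b", "c"]
LABELS = ["O", "OP"]
WEIGHTS = [1.5, 0.3]
FEATURES = [g_two_word_opinion, g_same_label_as_previous]
```

In practice Z(x) would be computed with dynamic programming rather than enumeration; the brute-force version is used here only because it mirrors the formula directly.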

33 2 Semi-CRF-based Approach for Opinion Expression Extraction In this section, we present an extended version of semi-CRFs in which we can make use of parsing information in learning entity boundaries and labels for opinion expression extraction. [sent-97, score-0.716]

34 1, the maximum entity length L is fixed during training to generate segment candidates in the standard semi-CRFs. [sent-99, score-0.621]

35 In opinion expression extraction, L is unbounded since opinion expressions may be clauses or whole sentences, which can be arbitrarily long. [sent-100, score-1.327]

36 Thus, fixing an upper bound on segment length based on the observed entities may lead to an incorrect removal of segments during inference. [sent-101, score-0.66]

37 Also note that possible segment candidates are generated based on the length constraint, which means any span of the text consisting of no more than L words would be considered as a possible segment. [sent-102, score-0.597]

38 , “The Chief” in sentence (2) is an incorrect segment within the multi-word expression “The Chief Minister”. [sent-105, score-0.693]
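To make the length constraint concrete, the sketch below enumerates every span of at most L words, which is how candidates are formed in the standard semi-CRF setting described above. The function name `length_bounded_spans` and the three-word input are assumptions for illustration; with L = 2 the implausible fragment "The Chief" appears among the candidates.

```python
def length_bounded_spans(n_words, max_len):
    """Return all inclusive (start, end) spans covering at most max_len words."""
    return [(start, end)
            for start in range(n_words)
            for end in range(start, min(start + max_len, n_words))]

words = ["The", "Chief", "Minister"]
span_texts = [" ".join(words[s:e + 1])
              for s, e in length_bounded_spans(len(words), 2)]
```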

39 More specifically, we construct segment units from the parse tree of each sentence1 , and then build up possible segment candidates based on those units. [sent-107, score-1.254]

40 In the parse tree, each leaf phrase or leaf word is considered to be a segment unit. [sent-108, score-0.599]

41 Each segment unit performs as the smallest unit in the model (words within a segment unit will be automatically assigned the same label). [sent-109, score-1.254]

42 The segment units are highlighted in rectangles in the parse tree example in Figure 1. [sent-110, score-0.657]

43 As the segment units are not separable, we avoid implausible segments, which truncate multi-word expressions. [sent-111, score-0.634]

44 For example, “both ridiculous and”, would not be considered a possible segment in our model. [sent-112, score-0.59]

45 To generate segment candidates for the model, we consider meaningful combinations of consecutive segment units. [sent-113, score-1.146]

46 The shaded regions correspond to segment groups, where $G_i$ represents the segment group starting from segment unit $U_i$. [sent-123, score-1.699]

47 We consider each segment unit to belong to a meaningful group defined by the span of its parent node. [sent-124, score-0.626]

48 Two consecutive segment units are considered to belong to the same group if the subtrees rooted in their parent nodes have the same rightmost child. [sent-125, score-0.632]

49 For example, in Figure 1, segment units “are” and “both ridiculous and odd” belong to the same group, while “I” and “found” belong to different groups. [sent-126, score-0.698]

50 Algorithm 1 Construction of segment candidates Input: A training sentence x Output: A set of segment candidates S 1: Obtain the segment units U = (U1, . [sent-127, score-1.801]

51 , U_{k+t}); S ← S ∪ s; Return S. Following this idea, we generate possible segment candidates by Algorithm 1. [sent-136, score-0.597]

52 Starting from each segment unit Ui, we first find the rightmost segment unit Uj that belongs to the same group as Ui. [sent-137, score-1.202]

53 Then we enumerate all possible combinations of segment units Ui, . [sent-148, score-0.607]

54 , Uj) denotes the segment obtained by concatenating words in the consecutive segment units Ui,. [sent-155, score-1.156]

55 This way, segment candidates are generated without constraints on length and are meaningful for learning entity boundaries. [sent-159, score-0.621]
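The candidate construction described above can be sketched as follows, assuming the segment units and, for each unit, the index of the rightmost unit in its group have already been read off the parse tree. The `group_end` values and the four-unit input are hypothetical and only loosely follow the Figure 1 example ("are" and "both ridiculous and odd" share a group; "I" and "found" do not).

```python
def segment_candidates(units, group_end):
    """Build candidates by concatenating consecutive units within one group.

    units:     list of segment units, each a list of words
    group_end: group_end[i] is the index of the rightmost unit in the same
               group as unit i (assumed to be precomputed from the parse)
    """
    candidates = []
    for i in range(len(units)):
        for k in range(i, group_end[i] + 1):
            words = [w for unit in units[i:k + 1] for w in unit]
            candidates.append((i, k, " ".join(words)))
    return candidates

# Hypothetical units and groups for illustration.
units = [["I"], ["found"], ["are"], ["both", "ridiculous", "and", "odd"]]
group_end = [0, 1, 3, 3]
cands = segment_candidates(units, group_end)
cand_texts = [text for _, _, text in cands]
```

Because units are never split, a fragment such as "both ridiculous and" cannot be produced, and no fixed maximum length is needed.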

56 Based on the generated segment candidates, the correct segmentation for each training sentence can be obtained as follows. [sent-160, score-0.61]

57 For opinion expressions that do not match any segment candidate, we break them down into smaller segments using a greedy matching process. [sent-161, score-1.331]
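The extract does not spell out the greedy matching process, so the following is only one plausible reading: scan the expression left to right and, at each position, take the longest segment candidate that fits inside it, falling back to a single word when nothing matches. The function name and the span encoding are assumptions.

```python
def greedy_cover(start, end, candidates):
    """Split the inclusive word span [start, end] into candidate spans.

    At each position the longest candidate that starts there and stays
    inside the span is chosen; a single word is used as a fallback.
    """
    cand = set(candidates)
    pieces, pos = [], start
    while pos <= end:
        for stop in range(end, pos - 1, -1):
            if (pos, stop) in cand:
                pieces.append((pos, stop))
                pos = stop + 1
                break
        else:
            pieces.append((pos, pos))  # no candidate starts here
            pos += 1
    return pieces

toy_candidates = [(0, 0), (1, 1), (1, 2), (2, 2), (3, 3)]
pieces = greedy_cover(0, 3, toy_candidates)
```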

58 Note that here non-entities correspond to segment units instead of single-word segments in the original semi-CRF model. [sent-165, score-0.66]

59 After obtaining the set of possible segment candidates and the correct segmentation s for each training sentence, the semi-CRF model can be trained. [sent-166, score-0.658]

60 We use the limited [...] Footnote 2: There are cases where words within a segment unit have different labels. [sent-168, score-0.601]

61 In such cases, we consider each word within the segment unit as a segment. [sent-170, score-0.601]

62 Then we have $V(j, y) = \max_{(i,j) \in s_{:,j}} \max_{y'} \phi(x, i, j, y, y') \, V(i-1, y')$, where $\phi(x, i, j, y, y') = \exp\left(\sum_k \lambda_k g_k(x, i, j, y, y')\right)$ and $s_{:,j}$ denotes the set of the generated segment candidates ending at position j. [sent-177, score-0.623]
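The recursion for V(j, y) can be sketched as a Viterbi-style search over the generated candidates. The version below works in log space, so products of φ become sums of log φ. The function names, the `log_phi` callback signature, and the toy scores are assumptions made for this example.

```python
import math

def semi_markov_viterbi(n, candidates, labels, log_phi):
    """Return the best labeled segmentation of n words as (start, end, label) triples.

    candidates: iterable of inclusive (i, j) spans allowed as segments
    log_phi:    function (i, j, y, y_prev) -> float; y_prev is None for a
                segment that starts the sentence
    """
    ending_at = {}
    for i, j in candidates:
        ending_at.setdefault(j, []).append(i)

    best = {}  # (j, y) -> (log-score, (segment start, previous label))
    for j in range(n):
        for y in labels:
            top = (-math.inf, None)
            for i in ending_at.get(j, []):
                if i == 0:
                    prevs = [(None, 0.0)]
                else:
                    prevs = [(yp, best[(i - 1, yp)][0]) for yp in labels]
                for yp, prev_score in prevs:
                    total = prev_score + log_phi(i, j, y, yp)
                    if total > top[0]:
                        top = (total, (i, yp))
            best[(j, y)] = top

    y = max(labels, key=lambda lab: best[(n - 1, lab)][0])
    if best[(n - 1, y)][0] == -math.inf:
        raise ValueError("candidates do not cover the whole sentence")

    # Follow back-pointers from the last word to the first.
    segments, j = [], n - 1
    while j >= 0:
        _, (i, yp) = best[(j, y)]
        segments.append((i, j, y))
        j, y = i - 1, yp
    return segments[::-1]

def toy_log_phi(i, j, y, y_prev):
    # Hypothetical scores: favour span (1, 2) as an opinion expression.
    if (i, j) == (1, 2) and y == "OP":
        return 2.0
    if i == j and y == "O":
        return 0.5
    return 0.0

decoded = semi_markov_viterbi(3, [(0, 0), (1, 1), (2, 2), (1, 2)],
                              ["O", "OP"], toy_log_phi)
```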

63 To employ them in our model, we simply extend the feature definition to the segment level. [sent-189, score-0.549]

64 such as the length of the segment, the position of the segment in the current segmentation (at the beginning or at the end), indicators for the start word and end word within the segment, and indicators for words before and after the segment. [sent-192, score-0.636]

65 However, we only found the position of the segment to be helpful for the extraction of opinion expressions, probably due to the lack of patterns in the length distribution and word choices of opinion expressions. [sent-195, score-1.613]
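A minimal sketch of the segment-level features listed above, assuming a segment is given as an inclusive word span. The feature names and boundary markers are hypothetical, and the position feature is approximated here by the segment's position in the sentence.

```python
def segment_features(words, start, end):
    """Return simple segment-level features for the inclusive span [start, end]."""
    return {
        "length": end - start + 1,
        "at_beginning": start == 0,
        "at_end": end == len(words) - 1,
        "start_word": words[start].lower(),
        "end_word": words[end].lower(),
        # "<s>" and "</s>" are assumed boundary markers.
        "prev_word": words[start - 1].lower() if start > 0 else "<s>",
        "next_word": words[end + 1].lower() if end + 1 < len(words) else "</s>",
    }

sentence = ["They", "take", "a", "negative", "stand"]
feats = segment_features(sentence, 1, 4)
```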

66 Besides the above features, we design new segment-level syntactic features to capture the syntactic patterns of opinion expressions. [sent-196, score-0.646]

67 In our task, we found that the majority of opinion expressions involve verb phrases. [sent-198, score-0.716]

68 Denote the head of VPLEAF as the predicate, and its next segment unit as the argument. [sent-202, score-0.601]

69 If a segment consists of words in the VP nodes visited by the preorder constituents. [sent-203, score-0.581]

70 Footnote 3: The percentages of opinion expressions involving VP/NP/PP are 64. [sent-204, score-0.671]

71 If a segment consists of a verb cluster and the argument in VPLEAF, we consider it as a VP segment. [sent-213, score-0.622]

72 VPcluster: Indicates whether or not the segment matches the verb-cluster structure. [sent-215, score-0.549]

73 For example, if “warned” is the head of VPLEAF rather than “informed”, the chance of the segment being an opinion expression increases. [sent-218, score-1.18]

74 The argument in the verb phrase (could be a noun phrase, adjectival phrase or prepositional phrase) may convey some relevant information for identifying opinion expressions. [sent-221, score-0.61]

75 VPsubj: Whether the verb clusters or the argument in the segment contains an entry from the subjectivity lexicon. [sent-222, score-0.68]

76 For example, the word “negative” is in the lexicon, so the segment “take a negative stand” has a feature ISVPSUBJ. [sent-223, score-0.549]
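The VPsubj example can be sketched as a simple lexicon lookup. This is a simplification: it checks every word in the segment rather than only the verb cluster and argument, and the one-word lexicon is a hypothetical stand-in for a real subjectivity lexicon.

```python
# Hypothetical stand-in for a subjectivity lexicon.
SUBJECTIVITY_LEXICON = {"negative"}

def vp_subj_feature(segment_words, lexicon=SUBJECTIVITY_LEXICON):
    """Fire ISVPSUBJ if any word of the segment appears in the lexicon."""
    if any(w.lower() in lexicon for w in segment_words):
        return {"ISVPSUBJ": 1}
    return {}

fired = vp_subj_feature("take a negative stand".split())
not_fired = vp_subj_feature("take a stand".split())
```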

77 We focus on the task of extracting two types of opinion expressions: direct subjective expressions (DSEs) and expressive subjective expressions (ESEs). [sent-228, score-1.092]

78 Table 1: Statistics of opinion expressions in the MPQA Corpus. [sent-239, score-0.671]

79 F-measure is computed as Because the boundaries of opinion expressions are hard to define even for human annotators (Wiebe et al. [sent-243, score-0.698]
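The definition of the overlap metric is cut off in this extract, so the sketch below relies on a common reading of binary overlap that is assumed here: a predicted expression counts as correct if it overlaps any gold expression, and a gold expression counts as found if any prediction overlaps it. F-measure is taken to be the usual harmonic mean of precision and recall. Spans are inclusive word offsets, and all names are illustrative.

```python
def overlaps(a, b):
    """True if inclusive spans a and b share at least one word."""
    return a[0] <= b[1] and b[0] <= a[1]

def binary_overlap_prf(predicted, gold):
    """Precision, recall and F-measure under the assumed binary-overlap rule."""
    p_hits = sum(any(overlaps(p, g) for g in gold) for p in predicted)
    g_hits = sum(any(overlaps(g, p) for p in predicted) for g in gold)
    precision = p_hits / len(predicted) if predicted else 0.0
    recall = g_hits / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

scores = binary_overlap_prf([(0, 1), (5, 6)], [(1, 3), (8, 9)])
```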

80 Table 2: Results for extracting opinion expressions with Binary-Overlap metric. [sent-272, score-0.724]

81 Results of new-semi-CRF that are statistically significantly than semi-CRF according to a two-tailed t-test are indicated with ∗(p < results are also shown for new-semi-CRF(w/ Table 3: Results for extracting opinion expressions < 0. [sent-274, score-0.724]

82 Segment-CRF treats segment units obtained from the parser as word tokens. [sent-282, score-0.607]

83 For example, in Figure 1, the segment units “the statement” and “both ridiculous and odd” will be treated as word tokens. [sent-283, score-0.648]

84 We consider the VP-related segment features introduced in Section 3. [sent-285, score-0.578]

85 To the best of our knowledge, our work is the first to explore the use of semi-CRFs on the extraction of opinion expressions. [sent-289, score-0.551]

86 For segment features, we used the same features as in our approach (see Section 3. [sent-293, score-0.578]

87 However, adding segment-level syntactic features into standard CRF yields slightly reduced performance. Table 4: Effect of syntactic features on extracting opinion expressions with Binary-Overlap metric. [sent-304, score-0.94]

88 The promising F-measure results obtained by semi-CRF and new-semi-CRF confirm that relaxing the Markovian assumption on segments leads to better modeling of opinion expressions. [sent-307, score-0.598]

89 This indicates that syntactic information does not help if learning and inference take place on segment candidates generated without accounting for parse information. [sent-315, score-0.712]

90 In Table 5, we compare our results to the previous work on opinion expression extraction (here we also focus on the Binary Overlap metric due to the similar trend demonstrated by the Proportional Overlap metric). [sent-327, score-0.751]

91 Table 5: Comparison of our work with previous work on opinion expression extraction using the Binary-Overlap metric. [sent-345, score-0.723]

92 By comparing the extraction results across different methods, we see that full parsing provides many benefits for modeling segment boundaries and improving the prediction precision for opinion expression extraction. [sent-360, score-1.305]

93 And we also found many cases where the original semi-CRF cannot extract the opinion expressions while our approach can. [sent-368, score-0.671]

94 5 Conclusion In this paper we propose a semi-CRF-based approach for extracting opinion expressions that takes into account during learning and inference the structural information available from syntactic parsing. [sent-374, score-0.789]

95 Our approach allows opinion expressions to be identified at the segment level and their boundaries to be influenced by their probable syntactic structure. [sent-375, score-1.312]

96 Experimental evaluations show that our model outperforms the best existing approaches on two opinion extraction tasks. [sent-376, score-0.551]

97 Also, we will apply our model to additional opinion analysis tasks such as fine-grained opinion summarization and relation extraction. [sent-380, score-0.974]

98 Extracting opinion propositions and opinion holders using syntactic and lexical cues. [sent-389, score-1.088]

99 Joint extraction of entities and relations for opinion recognition. [sent-402, score-0.551]

100 Extracting opinion targets in a single- and cross-domain setting with conditional random fields. [sent-413, score-0.523]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('segment', 0.549), ('opinion', 0.487), ('expressions', 0.184), ('dse', 0.175), ('crf', 0.161), ('expression', 0.144), ('dses', 0.143), ('ese', 0.132), ('breck', 0.115), ('segments', 0.111), ('ui', 0.106), ('eses', 0.095), ('vpleaf', 0.095), ('subjective', 0.092), ('wiebe', 0.091), ('crfs', 0.085), ('johansson', 0.081), ('sarawagi', 0.081), ('private', 0.079), ('vparg', 0.079), ('choi', 0.079), ('moschitti', 0.069), ('uj', 0.068), ('syntactic', 0.065), ('claire', 0.065), ('extraction', 0.064), ('vppre', 0.064), ('vpsubj', 0.064), ('segmentation', 0.061), ('units', 0.058), ('subjectivity', 0.058), ('reranking', 0.054), ('extracting', 0.053), ('unit', 0.052), ('janyce', 0.051), ('yejin', 0.051), ('vp', 0.05), ('parse', 0.05), ('okanohara', 0.049), ('mpqa', 0.049), ('holders', 0.049), ('syn', 0.049), ('chief', 0.049), ('markovian', 0.049), ('labeling', 0.049), ('candidates', 0.048), ('kgk', 0.048), ('fields', 0.046), ('cardie', 0.046), ('verb', 0.045), ('labeler', 0.041), ('ridiculous', 0.041), ('semicrf', 0.041), ('wilson', 0.041), ('conditional', 0.036), ('cohen', 0.035), ('opinions', 0.035), ('parsing', 0.034), ('sequence', 0.033), ('overlap', 0.032), ('segmentations', 0.032), ('eat', 0.032), ('commongroup', 0.032), ('fbeo', 0.032), ('oses', 0.032), ('preorder', 0.032), ('reared', 0.032), ('semicrfs', 0.032), ('traversal', 0.032), ('vpcluster', 0.032), ('vproot', 0.032), ('xixk', 0.032), ('theresa', 0.03), ('features', 0.029), ('metric', 0.028), ('trend', 0.028), ('argument', 0.028), ('corne', 0.027), ('interannotator', 0.027), ('kobayashi', 0.027), ('bethard', 0.027), ('implausible', 0.027), ('minister', 0.027), ('boundaries', 0.027), ('proportional', 0.026), ('position', 0.026), ('arbitrarily', 0.025), ('ner', 0.025), ('phrase', 0.025), ('belong', 0.025), ('tsochantaridis', 0.025), ('thh', 0.025), ('munson', 0.025), ('cornell', 0.025), ('emotions', 0.025), ('riloff', 0.025), ('stand', 0.025), ('entity', 0.024), ('sentiment', 0.024), ('framenet', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 51 emnlp-2012-Extracting Opinion Expressions with semi-Markov Conditional Random Fields

Author: Bishan Yang ; Claire Cardie

Abstract: Extracting opinion expressions from text is usually formulated as a token-level sequence labeling task tackled using Conditional Random Fields (CRFs). CRFs, however, do not readily model potentially useful segment-level information like syntactic constituent structure. Thus, we propose a semi-CRF-based approach to the task that can perform sequence labeling at the segment level. We extend the original semi-CRF model (Sarawagi and Cohen, 2004) to allow the modeling of arbitrarily long expressions while accounting for their likely syntactic structure when modeling segment boundaries. We evaluate performance on two opinion extraction tasks, and, in contrast to previous sequence labeling approaches to the task, explore the usefulness of segment-level syntactic parse features. Experimental results demonstrate that our approach outperforms state-of-the-art methods for both opinion expression tasks.

2 0.42927575 101 emnlp-2012-Opinion Target Extraction Using Word-Based Translation Model

Author: Kang Liu ; Liheng Xu ; Jun Zhao

Abstract: This paper proposes a novel approach to extract opinion targets based on word-based translation model (WTM). At first, we apply WTM in a monolingual scenario to mine the associations between opinion targets and opinion words. Then, a graph-based algorithm is exploited to extract opinion targets, where candidate opinion relevance estimated from the mined associations, is incorporated with candidate importance to generate a global measure. By using WTM, our method can capture opinion relations more precisely, especially for long-span relations. In particular, compared with previous syntax-based methods, our method can effectively avoid noises from parsing errors when dealing with informal texts in large Web corpora. By using graph-based algorithm, opinion targets are extracted in a global process, which can effectively alleviate the problem of error propagation in traditional bootstrap-based methods, such as Double Propagation. The experimental results on three real world datasets in different sizes and languages show that our approach is more effective and robust than state-of-art methods.

3 0.17918116 20 emnlp-2012-Answering Opinion Questions on Products by Exploiting Hierarchical Organization of Consumer Reviews

Author: Jianxing Yu ; Zheng-Jun Zha ; Tat-Seng Chua

Abstract: This paper proposes to generate appropriate answers for opinion questions about products by exploiting the hierarchical organization of consumer reviews. The hierarchy organizes product aspects as nodes following their parent-child relations. For each aspect, the reviews and corresponding opinions on this aspect are stored. We develop a new framework for opinion Questions Answering, which enables accurate question analysis and effective answer generation by making use of the hierarchy. In particular, we first identify the (explicit/implicit) product aspects asked in the questions and their sub-aspects by referring to the hierarchy. We then retrieve the corresponding review fragments relevant to the aspects from the hierarchy. In order to generate appropriate answers from the review fragments, we develop a multi-criteria optimization approach for answer generation by simultaneously taking into account review salience, coherence, diversity, and parent-child relations among the aspects. We conduct evaluations on 11 popular products in four domains. The evaluated corpus contains 70,359 consumer reviews and 220 questions on these products. Experimental results demonstrate the effectiveness of our approach.

4 0.11227107 90 emnlp-2012-Modelling Sequential Text with an Adaptive Topic Model

Author: Lan Du ; Wray Buntine ; Huidong Jin

Abstract: Topic models are increasingly being used for text analysis tasks, often times replacing earlier semantic techniques such as latent semantic analysis. In this paper, we develop a novel adaptive topic model with the ability to adapt topics from both the previous segment and the parent document. For this proposed model, a Gibbs sampler is developed for doing posterior inference. Experimental results show that with topic adaptation, our model significantly improves over existing approaches in terms of perplexity, and is able to uncover clear sequential structure on, for example, Herman Melville’s book “Moby Dick”.

5 0.094431877 14 emnlp-2012-A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts

Author: Lizhen Qu ; Rainer Gemulla ; Gerhard Weikum

Abstract: We propose the weakly supervised MultiExperts Model (MEM) for analyzing the semantic orientation of opinions expressed in natural language reviews. In contrast to most prior work, MEM predicts both opinion polarity and opinion strength at the level of individual sentences; such fine-grained analysis helps to understand better why users like or dislike the entity under review. A key challenge in this setting is that it is hard to obtain sentence-level training data for both polarity and strength. For this reason, MEM is weakly supervised: It starts with potentially noisy indicators obtained from coarse-grained training data (i.e., document-level ratings), a small set of diverse base predictors, and, if available, small amounts of fine-grained training data. We integrate these noisy indicators into a unified probabilistic framework using ideas from ensemble learning and graph-based semi-supervised learning. Our experiments indicate that MEM outperforms state-of-the-art methods by a significant margin.

6 0.082941413 70 emnlp-2012-Joint Chinese Word Segmentation, POS Tagging and Parsing

7 0.081205651 137 emnlp-2012-Why Question Answering using Sentiment Analysis and Word Classes

8 0.074966066 28 emnlp-2012-Collocation Polarity Disambiguation Using Web-based Pseudo Contexts

9 0.068774067 32 emnlp-2012-Detecting Subgroups in Online Discussions by Modeling Positive and Negative Relations among Participants

10 0.061807945 7 emnlp-2012-A Novel Discriminative Framework for Sentence-Level Discourse Analysis

11 0.056944985 89 emnlp-2012-Mixed Membership Markov Models for Unsupervised Conversation Modeling

12 0.055715062 105 emnlp-2012-Parser Showdown at the Wall Street Corral: An Empirical Investigation of Error Types in Parser Output

13 0.053534549 131 emnlp-2012-Unified Dependency Parsing of Chinese Morphological and Syntactic Structures

14 0.052375831 45 emnlp-2012-Exploiting Chunk-level Features to Improve Phrase Chunking

15 0.051941894 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers

16 0.049099222 21 emnlp-2012-Assessment of ESL Learners' Syntactic Competence Based on Similarity Measures

17 0.048664022 3 emnlp-2012-A Coherence Model Based on Syntactic Patterns

18 0.045940414 112 emnlp-2012-Resolving Complex Cases of Definite Pronouns: The Winograd Schema Challenge

19 0.045354761 93 emnlp-2012-Multi-instance Multi-label Learning for Relation Extraction

20 0.04363098 12 emnlp-2012-A Transition-Based System for Joint Part-of-Speech Tagging and Labeled Non-Projective Dependency Parsing


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.193), (1, 0.063), (2, 0.063), (3, 0.252), (4, 0.203), (5, -0.122), (6, -0.233), (7, -0.093), (8, -0.242), (9, -0.035), (10, -0.033), (11, 0.042), (12, 0.089), (13, -0.027), (14, -0.114), (15, 0.098), (16, -0.015), (17, -0.485), (18, -0.022), (19, 0.003), (20, 0.017), (21, -0.05), (22, -0.138), (23, -0.163), (24, -0.081), (25, 0.024), (26, -0.11), (27, -0.036), (28, -0.102), (29, -0.032), (30, 0.018), (31, 0.007), (32, -0.004), (33, 0.014), (34, -0.016), (35, -0.05), (36, 0.018), (37, 0.022), (38, 0.021), (39, 0.059), (40, -0.074), (41, 0.005), (42, 0.02), (43, 0.047), (44, 0.012), (45, -0.024), (46, -0.064), (47, -0.068), (48, 0.002), (49, -0.033)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97399992 51 emnlp-2012-Extracting Opinion Expressions with semi-Markov Conditional Random Fields

Author: Bishan Yang ; Claire Cardie

Abstract: Extracting opinion expressions from text is usually formulated as a token-level sequence labeling task tackled using Conditional Random Fields (CRFs). CRFs, however, do not readily model potentially useful segment-level information like syntactic constituent structure. Thus, we propose a semi-CRF-based approach to the task that can perform sequence labeling at the segment level. We extend the original semi-CRF model (Sarawagi and Cohen, 2004) to allow the modeling of arbitrarily long expressions while accounting for their likely syntactic structure when modeling segment boundaries. We evaluate performance on two opinion extraction tasks, and, in contrast to previous sequence labeling approaches to the task, explore the usefulness of segment-level syntactic parse features. Experimental results demonstrate that our approach outperforms state-of-the-art methods for both opinion expression tasks.

2 0.94485843 101 emnlp-2012-Opinion Target Extraction Using Word-Based Translation Model

Author: Kang Liu ; Liheng Xu ; Jun Zhao

Abstract: This paper proposes a novel approach to extract opinion targets based on word-based translation model (WTM). At first, we apply WTM in a monolingual scenario to mine the associations between opinion targets and opinion words. Then, a graph-based algorithm is exploited to extract opinion targets, where candidate opinion relevance estimated from the mined associations, is incorporated with candidate importance to generate a global measure. By using WTM, our method can capture opinion relations more precisely, especially for long-span relations. In particular, compared with previous syntax-based methods, our method can effectively avoid noises from parsing errors when dealing with informal texts in large Web corpora. By using graph-based algorithm, opinion targets are extracted in a global process, which can effectively alleviate the problem of error propagation in traditional bootstrap-based methods, such as Double Propagation. The experimental results on three real world datasets in different sizes and languages show that our approach is more effective and robust than state-of-art methods.

3 0.44108513 20 emnlp-2012-Answering Opinion Questions on Products by Exploiting Hierarchical Organization of Consumer Reviews

Author: Jianxing Yu ; Zheng-Jun Zha ; Tat-Seng Chua

Abstract: This paper proposes to generate appropriate answers for opinion questions about products by exploiting the hierarchical organization of consumer reviews. The hierarchy organizes product aspects as nodes following their parent-child relations. For each aspect, the reviews and corresponding opinions on this aspect are stored. We develop a new framework for opinion Questions Answering, which enables accurate question analysis and effective answer generation by making use of the hierarchy. In particular, we first identify the (explicit/implicit) product aspects asked in the questions and their sub-aspects by referring to the hierarchy. We then retrieve the corresponding review fragments relevant to the aspects from the hierarchy. In order to generate appropriate answers from the review fragments, we develop a multi-criteria optimization approach for answer generation by simultaneously taking into account review salience, coherence, diversity, and parent-child relations among the aspects. We conduct evaluations on 11 popular products in four domains. The evaluated corpus contains 70,359 consumer reviews and 220 questions on these products. Experimental results demonstrate the effectiveness of our approach.

4 0.23858105 45 emnlp-2012-Exploiting Chunk-level Features to Improve Phrase Chunking

Author: Junsheng Zhou ; Weiguang Qu ; Fen Zhang

Abstract: Most existing systems solved the phrase chunking task with the sequence labeling approaches, in which the chunk candidates cannot be treated as a whole during parsing process so that the chunk-level features cannot be exploited in a natural way. In this paper, we formulate phrase chunking as a joint segmentation and labeling task. We propose an efficient dynamic programming algorithm with pruning for decoding, which allows the direct use of the features describing the internal characteristics of chunk and the features capturing the correlations between adjacent chunks. A relaxed, online maximum margin training algorithm is used for learning. Within this framework, we explored a variety of effective feature representations for Chinese phrase chunking. The experimental results show that the use of chunk-level features can lead to significant performance improvement, and that our approach achieves state-of-the-art performance. In particular, our approach is much better at recognizing long and complicated phrases.

5 0.2272462 14 emnlp-2012-A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts

Author: Lizhen Qu ; Rainer Gemulla ; Gerhard Weikum

Abstract: We propose the weakly supervised Multi-Experts Model (MEM) for analyzing the semantic orientation of opinions expressed in natural language reviews. In contrast to most prior work, MEM predicts both opinion polarity and opinion strength at the level of individual sentences; such fine-grained analysis helps to understand better why users like or dislike the entity under review. A key challenge in this setting is that it is hard to obtain sentence-level training data for both polarity and strength. For this reason, MEM is weakly supervised: It starts with potentially noisy indicators obtained from coarse-grained training data (i.e., document-level ratings), a small set of diverse base predictors, and, if available, small amounts of fine-grained training data. We integrate these noisy indicators into a unified probabilistic framework using ideas from ensemble learning and graph-based semi-supervised learning. Our experiments indicate that MEM outperforms state-of-the-art methods by a significant margin.

6 0.20557246 9 emnlp-2012-A Sequence Labelling Approach to Quote Attribution

7 0.20264968 7 emnlp-2012-A Novel Discriminative Framework for Sentence-Level Discourse Analysis

8 0.18857616 90 emnlp-2012-Modelling Sequential Text with an Adaptive Topic Model

9 0.18727434 55 emnlp-2012-Forest Reranking through Subtree Ranking

10 0.16456865 26 emnlp-2012-Building a Lightweight Semantic Model for Unsupervised Information Extraction on Short Listings

11 0.16206269 100 emnlp-2012-Open Language Learning for Information Extraction

12 0.15893179 89 emnlp-2012-Mixed Membership Markov Models for Unsupervised Conversation Modeling

13 0.15866317 21 emnlp-2012-Assessment of ESL Learners' Syntactic Competence Based on Similarity Measures

14 0.15806669 32 emnlp-2012-Detecting Subgroups in Online Discussions by Modeling Positive and Negative Relations among Participants

15 0.15753964 70 emnlp-2012-Joint Chinese Word Segmentation, POS Tagging and Parsing

16 0.15193827 3 emnlp-2012-A Coherence Model Based on Syntactic Patterns

17 0.15127175 120 emnlp-2012-Streaming Analysis of Discourse Participants

18 0.14832529 28 emnlp-2012-Collocation Polarity Disambiguation Using Web-based Pseudo Contexts

19 0.14733636 105 emnlp-2012-Parser Showdown at the Wall Street Corral: An Empirical Investigation of Error Types in Parser Output

20 0.14461409 46 emnlp-2012-Exploiting Reducibility in Unsupervised Dependency Parsing


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.024), (16, 0.044), (25, 0.016), (29, 0.017), (34, 0.054), (45, 0.015), (59, 0.284), (60, 0.082), (63, 0.053), (64, 0.024), (65, 0.024), (70, 0.016), (73, 0.021), (74, 0.087), (76, 0.08), (80, 0.02), (86, 0.033), (95, 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.77610928 51 emnlp-2012-Extracting Opinion Expressions with semi-Markov Conditional Random Fields

Author: Bishan Yang ; Claire Cardie

Abstract: Extracting opinion expressions from text is usually formulated as a token-level sequence labeling task tackled using Conditional Random Fields (CRFs). CRFs, however, do not readily model potentially useful segment-level information like syntactic constituent structure. Thus, we propose a semi-CRF-based approach to the task that can perform sequence labeling at the segment level. We extend the original semi-CRF model (Sarawagi and Cohen, 2004) to allow the modeling of arbitrarily long expressions while accounting for their likely syntactic structure when modeling segment boundaries. We evaluate performance on two opinion extraction tasks, and, in contrast to previous sequence labeling approaches to the task, explore the usefulness of segment-level syntactic parse features. Experimental results demonstrate that our approach outperforms state-of-the-art methods for both opinion expression tasks.

2 0.70104027 23 emnlp-2012-Besting the Quiz Master: Crowdsourcing Incremental Classification Games

Author: Jordan Boyd-Graber ; Brianna Satinoff ; He He ; Hal Daume III

Abstract: Cost-sensitive classification, where the features used in machine learning tasks have a cost, has been explored as a means of balancing knowledge against the expense of incrementally obtaining new features. We introduce a setting where humans engage in classification with incrementally revealed features: the collegiate trivia circuit. By providing the community with a web-based system to practice, we collected tens of thousands of implicit word-by-word ratings of how useful features are for eliciting correct answers. Observing humans’ classification process, we improve the performance of a state-of-the-art classifier. We also use the dataset to evaluate a system to compete in the incremental classification task through a reduction of reinforcement learning to classification. Our system learns when to answer a question, performing better than baselines and most human players.

3 0.49975634 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers

Author: Jayant Krishnamurthy ; Tom Mitchell

Abstract: We present a method for training a semantic parser using only a knowledge base and an unlabeled text corpus, without any individually annotated sentences. Our key observation is that multiple forms of weak supervision can be combined to train an accurate semantic parser: semantic supervision from a knowledge base, and syntactic supervision from dependency-parsed sentences. We apply our approach to train a semantic parser that uses 77 relations from Freebase in its knowledge representation. This semantic parser extracts instances of binary relations with state-of-the-art accuracy, while simultaneously recovering much richer semantic structures, such as conjunctions of multiple relations with partially shared arguments. We demonstrate recovery of this richer structure by extracting logical forms from natural language queries against Freebase. On this task, the trained semantic parser achieves 80% precision and 56% recall, despite never having seen an annotated logical form.

4 0.48433867 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction

Author: Valentin I. Spitkovsky ; Hiyan Alshawi ; Daniel Jurafsky

Abstract: We present a new family of models for unsupervised parsing, Dependency and Boundary models, that use cues at constituent boundaries to inform head-outward dependency tree generation. We build on three intuitions that are explicit in phrase-structure grammars but only implicit in standard dependency formulations: (i) Distributions of words that occur at sentence boundaries, such as English determiners, resemble constituent edges. (ii) Punctuation at sentence boundaries further helps distinguish full sentences from fragments like headlines and titles, allowing us to model grammatical differences between complete and incomplete sentences. (iii) Sentence-internal punctuation boundaries help with longer-distance dependencies, since punctuation correlates with constituent edges. Our models induce state-of-the-art dependency grammars for many languages without special knowledge of optimal input sentence lengths or biased, manually-tuned initializers.

5 0.47432396 130 emnlp-2012-Unambiguity Regularization for Unsupervised Learning of Probabilistic Grammars

Author: Kewei Tu ; Vasant Honavar

Abstract: We introduce a novel approach named unambiguity regularization for unsupervised learning of probabilistic natural language grammars. The approach is based on the observation that natural language is remarkably unambiguous in the sense that only a tiny portion of the large number of possible parses of a natural language sentence are syntactically valid. We incorporate an inductive bias into grammar learning in favor of grammars that lead to unambiguous parses on natural language sentences. The resulting family of algorithms includes the expectation-maximization algorithm (EM) and its variant, Viterbi EM, as well as a so-called softmax-EM algorithm. The softmax-EM algorithm can be implemented with a simple and computationally efficient extension to standard EM. In our experiments of unsupervised dependency grammar learning, we show that unambiguity regularization is beneficial to learning, and in combination with annealing (of the regularization strength) and sparsity priors it leads to improvement over the current state of the art.

6 0.47391891 7 emnlp-2012-A Novel Discriminative Framework for Sentence-Level Discourse Analysis

7 0.47276697 109 emnlp-2012-Re-training Monolingual Parser Bilingually for Syntactic SMT

8 0.47252789 14 emnlp-2012-A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts

9 0.46587199 122 emnlp-2012-Syntactic Surprisal Affects Spoken Word Duration in Conversational Contexts

10 0.46457091 82 emnlp-2012-Left-to-Right Tree-to-String Decoding with Prediction

11 0.4638972 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon

12 0.46125838 20 emnlp-2012-Answering Opinion Questions on Products by Exploiting Hierarchical Organization of Consumer Reviews

13 0.45780286 71 emnlp-2012-Joint Entity and Event Coreference Resolution across Documents

14 0.45712256 3 emnlp-2012-A Coherence Model Based on Syntactic Patterns

15 0.45563704 77 emnlp-2012-Learning Constraints for Consistent Timeline Extraction

16 0.45457527 120 emnlp-2012-Streaming Analysis of Discourse Participants

17 0.45227739 89 emnlp-2012-Mixed Membership Markov Models for Unsupervised Conversation Modeling

18 0.45159391 81 emnlp-2012-Learning to Map into a Universal POS Tagset

19 0.44956174 107 emnlp-2012-Polarity Inducing Latent Semantic Analysis

20 0.44889346 64 emnlp-2012-Improved Parsing and POS Tagging Using Inter-Sentence Consistency Constraints