emnlp emnlp2011 emnlp2011-88 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Anna Kazantseva ; Stan Szpakowicz
Abstract: This paper presents a new algorithm for linear text segmentation. It is an adaptation of Affinity Propagation, a state-of-the-art clustering algorithm in the framework of factor graphs. Affinity Propagation for Segmentation, or APS, receives a set of pairwise similarities between data points and produces segment boundaries and segment centres: data points which best describe all other data points within the segment. APS iteratively passes messages in a cyclic factor graph, until convergence. Each iteration works with information on all available similarities, resulting in high-quality results. APS scales linearly for realistic segmentation tasks. We derive the algorithm from the original Affinity Propagation formulation, and evaluate its performance on topical text segmentation in comparison with two state-of-the-art segmenters. The results suggest that APS performs on par with or outperforms these two very competitive baselines.
Reference: text
sentIndex sentText sentNum sentScore
1 It is an adaptation of Affinity Propagation, a state-of-the-art clustering algorithm in the framework of factor graphs. [sent-4, score-0.206]
2 Affinity Propagation for Segmentation, or APS, receives a set of pairwise similarities between data points and produces segment boundaries and segment centres: data points which best describe all other data points within the segment. [sent-5, score-1.052]
3 APS iteratively passes messages in a cyclic factor graph, until convergence. [sent-6, score-0.361]
4 We derive the algorithm from the original Affinity Propagation formulation, and evaluate its performance on topical text segmentation in comparison with two state-of-the-art segmenters. [sent-9, score-0.288]
5 Topical text segmentation identifies the more noticeable topic shifts. [sent-13, score-0.226]
6 A topical segmenter’s output is a very simple picture of the document’s structure. [sent-14, score-0.101]
7 That is why improved quality of text segmentation can benefit other language-processing tasks. [sent-16, score-0.19]
8 We present Affinity Propagation for Segmentation (APS), an adaptation of a state-of-the-art clustering algorithm, Affinity Propagation (Frey and Dueck, 2007; Givoni and Frey, 2009). [sent-19, score-0.082]
9 The original AP algorithm considerably improved exemplar-based clustering both in terms of speed and the quality of solutions. [sent-20, score-0.106]
10 At its core, APS is suitable for segmenting any sequence of data, but we present it in the context of segmenting documents. [sent-22, score-0.05]
11 APS takes as input a matrix of pairwise similarities between sentences and, for each sentence, a preference value which indicates an a priori belief in how likely a sentence is to be chosen as a segment centre. [sent-23, score-0.541]
12 APS outputs segment assignments and segment centres: data points which best explain all other points in a segment. [sent-24, score-0.793]
13 The algorithm attempts to maximize net similarity: the sum of similarities between all data points and their respective segment centres. [sent-25, score-0.707]
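As an illustration of this objective, the following minimal sketch (our own, not from the paper; the name net_similarity and the toy assignment are assumptions) scores a given assignment of sentences to segment centres; APS searches for the assignment that maximizes this score while keeping segments contiguous.

```python
import numpy as np

def net_similarity(sim, centre_of):
    """Sum of similarities between every data point and its assigned segment centre.
    For a centre j assigned to itself, sim[j, j] is its preference value."""
    return sum(sim[i, centre_of[i]] for i in range(len(centre_of)))

# Toy example: 5 sentences forming segments [0, 1] and [2, 4] with centres 1 and 3.
sim = np.random.rand(5, 5)
assignment = [1, 1, 3, 3, 3]
print(net_similarity(sim, assignment))
```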
14 APS operates by iteratively passing messages in a factor graph (Kschischang, Frey, and Loeliger, 2001) until a good set of segments emerges. [sent-26, score-0.467]
15 Each iteration considers all similarities, that is, it takes into account all available information. [sent-27, score-0.142]
16 For the majority of realistic segmentation tasks, however, the upper bound is O(MN) messages, where M is a constant. [sent-29, score-0.214]
17 This is more computationally expensive than the requirements of locally informed segmentation algorithms such as those based on HMM or CRF (see Section 2), but for a globally informed algorithm the requirements are very reasonable. [sent-30, score-0.367]
18 APS is an instance of loopy belief propagation (belief propagation on cyclic graphs) which has – 1 An implementation of APS in Java, and the data sets, can be downloaded at www. [sent-31, score-0.375]
19 Theoretically, such algorithms are not guaranteed to converge or to maximize the objective function. [sent-38, score-0.119]
20 The desired number of segments can be set by adjusting preferences. [sent-41, score-0.038]
21 We evaluate the performance of APS on three tasks: finding topical boundaries in transcripts of course lectures (Malioutov and Barzilay, 2006), identifying sections in medical textbooks (Eisenstein and Barzilay, 2008) and identifying chapter breaks in novels. [sent-42, score-0.118]
22 We compare APS with two recent systems: the Minimum Cut segmenter (Malioutov and Barzilay, 2006) and the Bayesian segmenter (Eisenstein and Barzilay, 2008). [sent-43, score-0.104]
23 Section 2 of the paper outlines relevant research on topical text segmentation. [sent-46, score-0.109]
24 Section 3 briefly covers the framework of factor graphs and outlines the original Affinity Propagation algorithm for clustering. [sent-47, score-0.251]
25 Section 4 contains the derivation of the new update messages for APSeg. [sent-48, score-0.198]
26 2 Related Work This section discusses selected text segmentation methods and positions the proposed APS algorithm in that context. [sent-50, score-0.237]
27 Most research on automatic text segmentation revolves around a simple idea: when the topic shifts, so does the vocabulary (Youmans, 1991). [sent-51, score-0.226]
28 We can roughly subdivide existing approaches into two categories: locally informed and globally informed. [sent-52, score-0.143]
29 Locally informed segmenters attempt to identify topic shifts by considering only a small portion of the complete document. [sent-53, score-0.193]
30 Other examples include text segmentation using Hidden Markov Models (Blei and Moreno, 2001) or Conditional Random Fields (Lafferty, McCallum, and Pereira, 2001). [sent-57, score-0.19]
31 Locally informed methods are often very efficient because of lean memory and CPU time requirements. [sent-58, score-0.066]
32 Globally informed methods consider “the big picture” when determining the most likely location of segment boundaries. [sent-60, score-0.356]
33 Malioutov and Barzilay (2006) show that the knowledge about long-range similarities between sentences improves segmentation quality. [sent-62, score-0.299]
34 The document is represented as a graph: nodes are sentences and edges are weighted using a measure of lexical similarity. [sent-64, score-0.075]
35 The graph is cut in a way which maximizes the net edge weight within each segment and minimizes the net weight of severed edges. [sent-65, score-0.526]
36 Of the methods mentioned in this paper, Minimum Cut segmentation resembles APS the most. [sent-66, score-0.19]
37 Another notable direction in text segmentation uses generative models to find segment boundaries. [sent-68, score-0.48]
38 Segment boundaries are assigned so as to maximize the likelihood of observing the complete sequence. [sent-70, score-0.109]
39 (2009) use a Latent Dirichlet allocation topic model (Blei, Ng, and Jordan, 2003) to find coherent segment boundaries. [sent-72, score-0.326]
40 Such methods output segment boundaries and suggest the lexical distribution associated with each segment. [sent-73, score-0.334]
41 Globally informed models generally perform better, especially on more challenging datasets such as speech recordings, but they have unsurprisingly higher memory and CPU time requirements. [sent-75, score-0.066]
42 It is unsupervised and, unlike most other segmenters, does not require specifying the desired number of segments as an input parameter. [sent-77, score-0.038]
43 Because APS operates on a precompiled matrix of pairwise sentence similarities, it is easy to incorporate new kinds of information, such as synonymy or adjacency. [sent-79, score-0.065]
44 It also provides some information as to what the segment is about, because each segment is associated with a segment centre. [sent-80, score-0.87]
45 3 Factor graphs and affinity propagation for clustering [sent-81, score-0.575]
46 3.1 Factor graphs and the max-sum algorithm. The APS algorithm is an instance of belief propagation on a cyclic factor graph. [sent-82, score-0.492]
47 In order to explain the derivation of the algorithm, we will first briefly introduce factor graphs as a framework. [sent-83, score-0.192]
48 Many computational problems can be reduced to maximizing the value of a multi-variate function F(x1, …, xn). [sent-84, score-0.064]
49 In Equation 1, H is a set of discrete indices and fh is a local function with arguments Xh ⊂ {x1, …, xn}. [sent-88, score-0.087]
50 F(x1, …, xn) = Σ_{h∈H} fh(Xh) (1) Factor graphs offer a concise graphical representation for such problems. [sent-94, score-0.092]
51 A global function F which can be decomposed into a sum of M local functions fh can be represented as a bipartite graph with M function nodes and N variable nodes (M = |H|). [sent-95, score-0.471]
52 Figure 1 shows an example of a factor graph. [sent-96, score-0.059]
53 The factor (or function) nodes are dark squares, the variable nodes are light circles. [sent-98, score-0.323]
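As a concrete illustration of such a decomposition, here is a small sketch (the variable names, factor names and local-function tables are ours, purely illustrative) of a global function stored as a bipartite factor graph:

```python
# Toy factored function F(x1, x2, x3) = f_a(x1, x2) + f_b(x2, x3), stored as a
# bipartite structure: each factor node keeps the variable nodes it is connected to.
factors = {
    "f_a": (("x1", "x2"), lambda a: 1.0 if a["x1"] == a["x2"] else 0.0),
    "f_b": (("x2", "x3"), lambda a: 0.5 * (a["x2"] + a["x3"])),
}

def F(assignment):
    """Evaluate the global function as the sum of its local functions (Equation 1);
    each local function only sees its own argument subset Xh."""
    return sum(table({v: assignment[v] for v in args}) for args, table in factors.values())

print(F({"x1": 1, "x2": 1, "x3": 0}))  # f_a contributes 1.0, f_b contributes 0.5
```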
54 The well-known max-sum algorithm (Bishop, 2006) seeks a configuration of variables which maximizes the objective function. [sent-99, score-0.13]
55 It finds the maximum in acyclic factor graphs, but in graphs with cycles neither convergence nor optimality is guaranteed (Pearl, 1982). [sent-100, score-0.215]
56 The max-sum algorithm amounts to propagating messages from function nodes to variable nodes and from variable nodes to function nodes. [sent-102, score-0.635]
57 N(x) is the set of all function nodes which are x's neighbours. [sent-104, score-0.107]
58 The message reflects the evidence about the distribution of x from all functions which have x as an argument, except for the function corresponding to the receiving node f. [sent-105, score-0.34]
59 A message µf→x from function f to variable x is computed as follows: µf→x = max_{N(f)\x} ( f(x1, …, xm) + Σ_{x′∈N(f)\x} µx′→f ) (3) [sent-106, score-0.283]
60 N(f) is the set of all variable nodes which are f's neighbours. [sent-109, score-0.148]
61 The message reflects the evidence about the distribution of x from function f and its neighbours other than x. [sent-110, score-0.324]
62 A common message-passing schedule on cyclic factor graphs is flooding: iteratively passing all variable-to-function messages, then all function-to-variable messages. [sent-111, score-0.31]
63 Upon convergence, the summary message reflecting final beliefs about the maximizing configuration of variables is computed as a sum of all incoming function-to-variable messages. [sent-112, score-0.368]
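The sketch below illustrates the flooding schedule and the two max-sum updates on a tiny factor graph with binary variables; the graph, the factor tables and the fixed number of iterations are our own assumptions, not the paper's code.

```python
from itertools import product

# Tiny illustrative factor graph over binary variables.
variables = ["x1", "x2", "x3"]
factors = {
    "f_a": (("x1", "x2"), lambda a: 1.0 if a["x1"] == a["x2"] else -1.0),
    "f_b": (("x2", "x3"), lambda a: 0.5 if a["x2"] != a["x3"] else 0.0),
}
var_neigh = {v: [f for f, (args, _) in factors.items() if v in args] for v in variables}

# Messages map a variable state (0 or 1) to a score.
f2v = {(f, v): {0: 0.0, 1: 0.0} for f, (args, _) in factors.items() for v in args}
v2f = {(v, f): {0: 0.0, 1: 0.0} for (f, v) in f2v}

for _ in range(10):  # flooding: all variable-to-function, then all function-to-variable messages
    for (v, f) in v2f:
        # variable-to-function: sum of incoming f->v messages from all other neighbours of v
        v2f[(v, f)] = {s: sum(f2v[(g, v)][s] for g in var_neigh[v] if g != f) for s in (0, 1)}
    for (f, v) in f2v:
        # function-to-variable: maximize f plus incoming v->f messages over the other arguments (Equation 3)
        args, table = factors[f]
        others = [u for u in args if u != v]
        for s in (0, 1):
            f2v[(f, v)][s] = max(
                table({**dict(zip(others, vals)), v: s})
                + sum(v2f[(u, f)][vals[i]] for i, u in enumerate(others))
                for vals in product((0, 1), repeat=len(others)))

# Summary: each variable takes the state with the larger sum of incoming function-to-variable messages.
beliefs = {v: max((0, 1), key=lambda s: sum(f2v[(f, v)][s] for f in var_neigh[v])) for v in variables}
print(beliefs)
```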
64 3.2 Affinity Propagation. The APS algorithm described in this paper is a modification of the original Affinity Propagation algorithm intended for exemplar-based clustering (Frey and Dueck, 2007; Givoni and Frey, 2009). [sent-114, score-0.13]
65 This section describes the binary variable formulation proposed by Givoni and Frey, and lays the groundwork for deriving the new update messages (Section 4). [sent-115, score-0.271]
66 Affinity Propagation for exemplar-based clustering is formulated as follows: to cluster N data points, one must specify a matrix of pairwise similarities {SIM(i, j)}, i, j ∈ {1, …, N}, i ≠ j, [sent-116, score-0.27]
67 and a set of self-similarities (so-called preferences) SIM(j, j) which reflect a priori beliefs in how likely each data point is to be selected as an exemplar. [sent-119, score-0.073]
68 The algorithm then assigns each data point to an exemplar so as to maximize net similarity: the sum of similarities between all points and their respective exemplars; this is expressed by Equation 7. [sent-121, score-0.734]
69 Figure 2: Factor graph for affinity propagation (factor nodes E1 … Ej … EN). [sent-122, score-0.19]
70 Figure 2 shows a schematic factor graph for this problem, with N² binary variables. [sent-123, score-0.159]
71 Function nodes Ej enforce a coherence constraint: a data point cannot exemplify another point unless it is an exemplar for itself: Ej(c1j, …, cNj) = −∞ if cjj = 0 and cij = 1 for some i ≠ j; 0 otherwise. (4) [sent-125, score-0.368]
72 An I node encodes a single-cluster constraint: each data point must belong to exactly one exemplar and therefore to one cluster: Ii(ci1, …, ciN) = −∞ if Σ_j cij ≠ 1; 0 otherwise. (5) [sent-126, score-0.295]
73 According to Equation 3, the computation of a single factor-to-variable message involves maximizing over 2^n configurations. [sent-139, score-0.21]
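For illustration, these two constraints can be evaluated on a binary assignment matrix as in the following sketch (the matrix layout and the toy configuration, in the style of Figure 3(a), are our assumptions):

```python
import numpy as np

NEG_INF = float("-inf")

def E_j(c, j):
    """Coherence constraint (Equation 4): column j may only contain 1s if c[j, j] == 1."""
    return NEG_INF if (c[:, j].sum() > 0 and c[j, j] == 0) else 0.0

def I_i(c, i):
    """Single-cluster constraint (Equation 5): row i must contain exactly one 1."""
    return NEG_INF if c[i, :].sum() != 1 else 0.0

# A valid clustering configuration: points 0-1 with exemplar 1, points 2-4 with exemplar 3.
c = np.zeros((5, 5), dtype=int)
c[[0, 1], 1] = 1
c[[2, 3, 4], 3] = 1
print(all(E_j(c, j) == 0.0 for j in range(5)), all(I_i(c, i) == 0.0 for i in range(5)))  # True True
```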
74 Since Ej evaluates to −∞ for most configurations, this drastically reduces the number of configurations which can maximize the message values. [sent-141, score-0.243]
75 Given this simple fact, Givoni and Frey (2009) show how to reduce the necessary update messages to only two types of scalar ones: availabilities (α) and responsibilities (ρ). [sent-142, score-0.293]
76 Instead of sending two-valued messages (corresponding to the two possible values of the binary variables), we can send the difference for the two possible configurations: γij = γij(1) − γij(0), effectively a log-likelihood ratio. [sent-144, score-0.242]
77 2 Normally, each iteration of the algorithm sends five types of two-valued messages: to and from functions E and I, and a message from functions S. [sent-145, score-0.235]
78 Fortunately, the messages sent to and from the E factors subsume the three other message types, and it is not necessary to compute them explicitly. [sent-146, score-0.573]
79 Figure 3: Examples of valid configurations of hidden variables {cij} for clustering and segmentation. [sent-149, score-0.184]
80 (a) Clustering; (b) Segmentation. The algorithm converges when the set of points labelled as exemplars remains unchanged for a predetermined number of iterations. [sent-150, score-0.224]
81 When the algorithm terminates, messages to each variable are added together. [sent-151, score-0.273]
82 A positive final message indicates that the most likely value of a variable cij is 1 (point j is an exemplar for i); a negative message indicates that it is 0 (j is not an exemplar for i). [sent-152, score-1.005]
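These scalar updates are the standard Affinity Propagation updates of Frey and Dueck (2007), which the α and ρ messages reduce to for clustering. The sketch below is a rough, illustrative version of them (simple damping, a fixed number of iterations, and a conventional final assignment step are our choices; it assumes at least one exemplar emerges and it is not the APS variant derived in Section 4):

```python
import numpy as np

def affinity_propagation(S, iterations=200, damping=0.5):
    """Illustrative sketch of the scalar AP updates (Frey and Dueck, 2007).
    S is an N x N similarity matrix whose diagonal holds the preferences."""
    N = S.shape[0]
    R = np.zeros((N, N))  # responsibilities rho(i, k)
    A = np.zeros((N, N))  # availabilities  alpha(i, k)
    for _ in range(iterations):
        R_new = np.empty_like(R)
        A_new = np.empty_like(A)
        AS = A + S
        for i in range(N):
            for k in range(N):
                # rho(i, k) = S(i, k) - max_{k' != k} (alpha(i, k') + S(i, k'))
                R_new[i, k] = S[i, k] - np.max(np.delete(AS[i], k))
        R = damping * R + (1 - damping) * R_new
        Rp = np.maximum(R, 0)
        for i in range(N):
            for k in range(N):
                if i == k:   # alpha(k, k): sum of positive responsibilities from the other points
                    A_new[k, k] = np.sum(np.delete(Rp[:, k], k))
                else:        # alpha(i, k): bounded above by zero
                    A_new[i, k] = min(0.0, R[k, k] + np.sum(np.delete(Rp[:, k], [i, k])))
        A = damping * A + (1 - damping) * A_new
    # A positive summed message on the diagonal marks point k as an exemplar.
    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    return np.array([i if i in exemplars else exemplars[np.argmax(S[i, exemplars])]
                     for i in range(N)])
```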
83 4 Affinity Propagation for Segmentation This section explains how we adapt the Affinity Propagation clustering algorithm to segmentation. [sent-153, score-0.106]
84 In this setting, sentences are data points and we refer to exemplars as segment centres. [sent-154, score-0.49]
85 Given a document, we want to assign each sentence to a segment centre so as to maximize net similarity. [sent-155, score-0.508]
86 The new formulation relies on the same underlying factor graph (Figure 2). [sent-156, score-0.159]
87 A binary variable node cij is set to 1 iff sentence j is the segment centre for sentence i. [sent-157, score-0.77]
88 When clustering is the objective, a cluster may consist of points coming from anywhere in the data sequence. [sent-158, score-0.185]
89 When segmentation is the objective, a segment must consist of a solid block of points around the segment centre. [sent-159, score-0.851]
90 Figure 3 shows, for a toy problem with 5 data points, possible valid configurations of variables {cij} for clustering (3a) and for segmentation (3b). [sent-160, score-0.375]
91 Case 1 is the original coherence constraint. [sent-163, score-0.04]
92 Case 2 states that no point k may be in the segment with centre j if k lies before the start of the segment (the sequence c(s−1)j = 0, csj = 1 necessarily corresponds to the start of the segment). [sent-165, score-0.709]
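A small sketch of a validity check corresponding to this modified constraint is shown below (the function name and the toy configuration, which mirrors Figure 3(b), are our assumptions):

```python
import numpy as np

def valid_segmentation(c):
    """Check that a binary matrix {c_ij} encodes a valid linear segmentation: each point
    chooses exactly one segment centre, a used centre is its own centre, and the points
    assigned to a centre form a single contiguous block containing it."""
    N = c.shape[0]
    if not np.all(c.sum(axis=1) == 1):    # one segment centre per point
        return False
    for j in range(N):
        members = np.flatnonzero(c[:, j])
        if len(members) == 0:
            continue
        if c[j, j] != 1:                  # coherence: a used centre must exemplify itself
            return False
        if not np.array_equal(members, np.arange(members[0], members[-1] + 1)):
            return False                  # the segment must be a solid block
    return True

# Segments [0, 1] with centre 1 and [2, 4] with centre 3, as in Figure 3(b).
c = np.zeros((5, 5), dtype=int)
c[[0, 1], 1] = 1
c[[2, 3, 4], 3] = 1
print(valid_segmentation(c))  # True
```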
93 The E function nodes are the only changed part of the factor graph, so we only need to re-derive the α messages (availabilities) sent from factors E to variable nodes. [sent-167, score-0.527]
94 A function-to-variable message is computed as shown in Equation 11 (an elaboration of Equation 3), and the only incoming messages to E nodes are responsibilities (ρ messages): µEj→cij = max_{ckj, k≠i} ( Ej(c1j, …, cNj) + Σ_{k≠i} ρkj(ckj) ) (11) [sent-168, score-0.518]
95 We need to compute the message values for the two possible settings of the binary variables, denoted αij(1) and αij(0), and propagate the difference αij = αij(1) − αij(0). [sent-173, score-0.21]
96 Consider the case of factor Ej sending an α message to the variable node cjj (i.e., computing the message αjj). [sent-174, score-0.535]
97 If cjj = 0 then point j is not its own segment centre and the only valid configuration is to set all other cij to 0: αjj(0) = max_{cij, i≠j} ( Ej(c1j, …, cNj) + Σ_{i≠j} ρij(cij) ) = Σ_{i≠j} ρij(0) (12) [sent-177, score-0.798]
98 To compute αjj(1) (point j is its own segment centre), we only need to maximize over configurations which do not correspond to cases 2 and 3 in Equation 10 (other assignments are trivially non-optimal because they would evaluate Ej to −∞). [sent-179, score-0.399]
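For illustration, the two settings of this message can be computed by brute force over contiguous candidate segments containing j; the sketch below follows Equation 12 for αjj(0) and the constraint just described for αjj(1). The function name and the O(N²) enumeration are ours, and the paper goes on to derive a more efficient form.

```python
import numpy as np

def alpha_jj(rho0, rho1, j):
    """Brute-force sketch of the availability message from factor E_j to variable c_jj.
    rho0[i] and rho1[i] are the incoming responsibilities rho_ij(0) and rho_ij(1)."""
    N = len(rho0)
    others = [i for i in range(N) if i != j]
    # alpha_jj(0): the only valid configuration sets every other c_ij to 0 (Equation 12).
    a0 = sum(rho0[i] for i in others)
    # alpha_jj(1): maximize over contiguous segments [s, e] with s <= j <= e; points inside
    # the segment contribute rho_ij(1), points outside contribute rho_ij(0).
    a1 = float("-inf")
    for s in range(j + 1):
        for e in range(j, N):
            inside = range(s, e + 1)
            a1 = max(a1, sum(rho1[i] if i in inside else rho0[i] for i in others))
    return a1 - a0  # the propagated message is the difference of the two settings

rho1 = np.random.randn(6)
rho0 = np.zeros(6)  # with difference messages, the rho_ij(0) component is often taken as 0
print(alpha_jj(rho0, rho1, j=2))
```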
99 Let the start of a segment be s, 1 ≤ s ≤ j, [sent-180, score-0.29]
100 and the end of the segment be e, j ≤ e ≤ N. [sent-181, score-0.29]
wordName wordTfidf (topN-words)
[('aps', 0.461), ('segment', 0.29), ('cij', 0.258), ('ij', 0.253), ('affinity', 0.242), ('ej', 0.199), ('segmentation', 0.19), ('message', 0.178), ('messages', 0.176), ('frey', 0.176), ('propagation', 0.159), ('exemplar', 0.159), ('givoni', 0.127), ('exemplars', 0.119), ('similarities', 0.109), ('factor', 0.1), ('graphs', 0.092), ('centre', 0.082), ('clustering', 0.082), ('points', 0.081), ('malioutov', 0.076), ('nodes', 0.075), ('topical', 0.074), ('variable', 0.073), ('sent', 0.071), ('net', 0.071), ('sim', 0.069), ('barzilay', 0.068), ('node', 0.067), ('informed', 0.066), ('sending', 0.066), ('maximize', 0.065), ('graph', 0.059), ('cyclic', 0.057), ('awa', 0.055), ('fh', 0.055), ('equation', 0.053), ('segmenter', 0.052), ('anka', 0.051), ('availabilities', 0.051), ('centres', 0.051), ('cjj', 0.051), ('dueck', 0.051), ('igiven', 0.051), ('neighbours', 0.051), ('point', 0.047), ('shifts', 0.047), ('incoming', 0.045), ('boundaries', 0.044), ('configurations', 0.044), ('electrical', 0.044), ('segmenters', 0.044), ('responsibilities', 0.044), ('configuration', 0.043), ('eisenstein', 0.043), ('reflects', 0.041), ('locally', 0.041), ('coherence', 0.04), ('xn', 0.039), ('segments', 0.038), ('xj', 0.038), ('sum', 0.038), ('topic', 0.036), ('belief', 0.036), ('globally', 0.036), ('cut', 0.035), ('outlines', 0.035), ('cpu', 0.035), ('xh', 0.035), ('ottawa', 0.035), ('iteration', 0.033), ('operates', 0.033), ('passing', 0.033), ('function', 0.032), ('maximizing', 0.032), ('variables', 0.032), ('matrix', 0.032), ('objective', 0.031), ('hearst', 0.031), ('similarity', 0.029), ('iteratively', 0.028), ('fa', 0.027), ('picture', 0.027), ('valid', 0.027), ('priori', 0.026), ('pairwise', 0.025), ('competitive', 0.025), ('segmenting', 0.025), ('algorithm', 0.024), ('realistic', 0.024), ('discusses', 0.023), ('guaranteed', 0.023), ('requirements', 0.023), ('blei', 0.023), ('preference', 0.023), ('cluster', 0.022), ('update', 0.022), ('evidence', 0.022), ('xx', 0.022), ('encodes', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 88 emnlp-2011-Linear Text Segmentation Using Affinity Propagation
Author: Anna Kazantseva ; Stan Szpakowicz
Abstract: This paper presents a new algorithm for linear text segmentation. It is an adaptation of Affinity Propagation, a state-of-the-art clustering algorithm in the framework of factor graphs. Affinity Propagation for Segmentation, or APS, receives a set of pairwise similarities between data points and produces segment boundaries and segment centres: data points which best describe all other data points within the segment. APS iteratively passes messages in a cyclic factor graph, until convergence. Each iteration works with information on all available similarities, resulting in high-quality results. APS scales linearly for realistic segmentation tasks. We derive the algorithm from the original Affinity Propagation formulation, and evaluate its performance on topical text segmentation in comparison with two state-of-the-art segmenters. The results suggest that APS performs on par with or outperforms these two very competitive baselines.
2 0.1145964 99 emnlp-2011-Non-parametric Bayesian Segmentation of Japanese Noun Phrases
Author: Yugo Murawaki ; Sadao Kurohashi
Abstract: A key factor of high quality word segmentation for Japanese is a high-coverage dictionary, but it is costly to manually build such a lexical resource. Although external lexical resources for human readers are potentially good knowledge sources, they have not been utilized due to differences in segmentation criteria. To supplement a morphological dictionary with these resources, we propose a new task of Japanese noun phrase segmentation. We apply non-parametric Bayesian language models to segment each noun phrase in these resources according to the statistical behavior of its supposed constituents in text. For inference, we propose a novel block sampling procedure named hybrid type-based sampling, which has the ability to directly escape a local optimum that is not too distant from the global optimum. Experiments show that the proposed method efficiently corrects the initial segmentation given by a morphological analyzer.
3 0.084008865 48 emnlp-2011-Enhancing Chinese Word Segmentation Using Unlabeled Data
Author: Weiwei Sun ; Jia Xu
Abstract: This paper investigates improving supervised word segmentation accuracy with unlabeled data. Both large-scale in-domain data and small-scale document text are considered. We present a unified solution to include features derived from unlabeled data to a discriminative learning model. For the large-scale data, we derive string statistics from Gigaword to assist a character-based segmenter. In addition, we introduce the idea about transductive, document-level segmentation, which is designed to improve the system recall for out-of-vocabulary (OOV) words which appear more than once inside a document. Novel features result in relative error reductions of 13.8% and 15.4% in terms of F-score and the recall of OOV words respectively.
4 0.071548074 104 emnlp-2011-Personalized Recommendation of User Comments via Factor Models
Author: Deepak Agarwal ; Bee-Chung Chen ; Bo Pang
Abstract: In recent years, the amount of user-generated opinionated texts (e.g., reviews, user comments) continues to grow at a rapid speed: featured news stories on a major event easily attract thousands of user comments on a popular online News service. How to consume subjective information of this volume becomes an interesting and important research question. In contrast to previous work on review analysis that tried to filter or summarize information for a generic average user, we explore a different direction of enabling personalized recommendation of such information. For each user, our task is to rank the comments associated with a given article according to personalized user preference (i.e., whether the user is likely to like or dislike the comment). To this end, we propose a factor model that incorporates rater-comment and rater-author interactions simultaneously in a principled way. Our full model significantly outperforms strong baselines as well as related models that have been considered in previous work.
5 0.071212068 3 emnlp-2011-A Correction Model for Word Alignments
Author: J. Scott McCarley ; Abraham Ittycheriah ; Salim Roukos ; Bing Xiang ; Jian-ming Xu
Abstract: Models of word alignment built as sequences of links have limited expressive power, but are easy to decode. Word aligners that model the alignment matrix can express arbitrary alignments, but are difficult to decode. We propose an alignment matrix model as a correction algorithm to an underlying sequencebased aligner. Then a greedy decoding algorithm enables the full expressive power of the alignment matrix formulation. Improved alignment performance is shown for all nine language pairs tested. The improved alignments also improved translation quality from Chinese to English and English to Italian.
6 0.065933205 92 emnlp-2011-Minimally Supervised Event Causality Identification
7 0.062813625 145 emnlp-2011-Unsupervised Semantic Role Induction with Graph Partitioning
8 0.055772685 96 emnlp-2011-Multilayer Sequence Labeling
9 0.04712452 26 emnlp-2011-Class Label Enhancement via Related Instances
10 0.046196789 67 emnlp-2011-Hierarchical Verb Clustering Using Graph Factorization
11 0.045414537 27 emnlp-2011-Classifying Sentences as Speech Acts in Message Board Posts
12 0.044824108 101 emnlp-2011-Optimizing Semantic Coherence in Topic Models
13 0.044725668 135 emnlp-2011-Timeline Generation through Evolutionary Trans-Temporal Summarization
14 0.043208316 116 emnlp-2011-Robust Disambiguation of Named Entities in Text
15 0.040550958 119 emnlp-2011-Semantic Topic Models: Combining Word Distributional Statistics and Dictionary Definitions
16 0.039884329 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation
17 0.038228963 58 emnlp-2011-Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training
18 0.037882015 61 emnlp-2011-Generating Aspect-oriented Multi-Document Summarization with Event-aspect model
19 0.035169415 53 emnlp-2011-Experimental Support for a Categorical Compositional Distributional Model of Meaning
20 0.034000404 98 emnlp-2011-Named Entity Recognition in Tweets: An Experimental Study
topicId topicWeight
[(0, 0.133), (1, -0.055), (2, -0.045), (3, -0.036), (4, 0.011), (5, 0.013), (6, -0.03), (7, -0.06), (8, -0.105), (9, -0.021), (10, -0.031), (11, -0.011), (12, -0.015), (13, 0.028), (14, -0.06), (15, -0.047), (16, -0.153), (17, 0.091), (18, 0.201), (19, 0.115), (20, 0.059), (21, 0.014), (22, -0.166), (23, 0.029), (24, 0.021), (25, -0.05), (26, -0.127), (27, 0.044), (28, 0.076), (29, 0.144), (30, 0.123), (31, 0.043), (32, 0.073), (33, 0.008), (34, 0.04), (35, 0.016), (36, -0.09), (37, -0.085), (38, -0.031), (39, -0.206), (40, -0.146), (41, 0.122), (42, 0.158), (43, 0.116), (44, 0.236), (45, 0.226), (46, 0.005), (47, 0.055), (48, -0.232), (49, -0.104)]
simIndex simValue paperId paperTitle
same-paper 1 0.9806754 88 emnlp-2011-Linear Text Segmentation Using Affinity Propagation
Author: Anna Kazantseva ; Stan Szpakowicz
Abstract: This paper presents a new algorithm for linear text segmentation. It is an adaptation of Affinity Propagation, a state-of-the-art clustering algorithm in the framework of factor graphs. Affinity Propagation for Segmentation, or APS, receives a set of pairwise similarities between data points and produces segment boundaries and segment centres: data points which best describe all other data points within the segment. APS iteratively passes messages in a cyclic factor graph, until convergence. Each iteration works with information on all available similarities, resulting in high-quality results. APS scales linearly for realistic segmentation tasks. We derive the algorithm from the original Affinity Propagation formulation, and evaluate its performance on topical text segmentation in comparison with two state-of-the-art segmenters. The results suggest that APS performs on par with or outperforms these two very competitive baselines.
2 0.4530305 99 emnlp-2011-Non-parametric Bayesian Segmentation of Japanese Noun Phrases
Author: Yugo Murawaki ; Sadao Kurohashi
Abstract: A key factor of high quality word segmentation for Japanese is a high-coverage dictionary, but it is costly to manually build such a lexical resource. Although external lexical resources for human readers are potentially good knowledge sources, they have not been utilized due to differences in segmentation criteria. To supplement a morphological dictionary with these resources, we propose a new task of Japanese noun phrase segmentation. We apply non-parametric Bayesian language models to segment each noun phrase in these resources according to the statistical behavior of its supposed constituents in text. For inference, we propose a novel block sampling procedure named hybrid type-based sampling, which has the ability to directly escape a local optimum that is not too distant from the global optimum. Experiments show that the proposed method efficiently corrects the initial segmentation given by a morphological analyzer.
3 0.42873776 48 emnlp-2011-Enhancing Chinese Word Segmentation Using Unlabeled Data
Author: Weiwei Sun ; Jia Xu
Abstract: This paper investigates improving supervised word segmentation accuracy with unlabeled data. Both large-scale in-domain data and small-scale document text are considered. We present a unified solution to include features derived from unlabeled data to a discriminative learning model. For the large-scale data, we derive string statistics from Gigaword to assist a character-based segmenter. In addition, we introduce the idea about transductive, document-level segmentation, which is designed to improve the system recall for out-of-vocabulary (OOV) words which appear more than once inside a document. Novel features result in relative error reductions of 13.8% and 15.4% in terms of F-score and the recall of OOV words respectively.
4 0.3633644 26 emnlp-2011-Class Label Enhancement via Related Instances
Author: Zornitsa Kozareva ; Konstantin Voevodski ; Shanghua Teng
Abstract: Class-instance label propagation algorithms have been successfully used to fuse information from multiple sources in order to enrich a set of unlabeled instances with class labels. Yet, nobody has explored the relationships between the instances themselves to enhance an initial set of class-instance pairs. We propose two graph-theoretic methods (centrality and regularization), which start with a small set of labeled class-instance pairs and use the instance-instance network to extend the class labels to all instances in the network. We carry out a comparative study with a state-of-the-art knowledge harvesting algorithm and show that our approach can learn additional class labels while maintaining high accuracy. We conduct a comparative study between class-instance and instance-instance graphs used to propagate the class labels and show that the latter one achieves higher accuracy.
5 0.29663649 3 emnlp-2011-A Correction Model for Word Alignments
Author: J. Scott McCarley ; Abraham Ittycheriah ; Salim Roukos ; Bing Xiang ; Jian-ming Xu
Abstract: Models of word alignment built as sequences of links have limited expressive power, but are easy to decode. Word aligners that model the alignment matrix can express arbitrary alignments, but are difficult to decode. We propose an alignment matrix model as a correction algorithm to an underlying sequencebased aligner. Then a greedy decoding algorithm enables the full expressive power of the alignment matrix formulation. Improved alignment performance is shown for all nine language pairs tested. The improved alignments also improved translation quality from Chinese to English and English to Italian.
6 0.28031626 67 emnlp-2011-Hierarchical Verb Clustering Using Graph Factorization
7 0.2662783 96 emnlp-2011-Multilayer Sequence Labeling
8 0.24150378 104 emnlp-2011-Personalized Recommendation of User Comments via Factor Models
9 0.21776578 145 emnlp-2011-Unsupervised Semantic Role Induction with Graph Partitioning
10 0.20871013 85 emnlp-2011-Learning to Simplify Sentences with Quasi-Synchronous Grammar and Integer Programming
11 0.19085398 92 emnlp-2011-Minimally Supervised Event Causality Identification
12 0.18564026 143 emnlp-2011-Unsupervised Information Extraction with Distributional Prior Knowledge
13 0.18077224 64 emnlp-2011-Harnessing different knowledge sources to measure semantic relatedness under a uniform model
14 0.17560858 27 emnlp-2011-Classifying Sentences as Speech Acts in Message Board Posts
15 0.17055595 1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features
16 0.16952439 112 emnlp-2011-Refining the Notions of Depth and Density in WordNet-based Semantic Similarity Measures
17 0.16938265 46 emnlp-2011-Efficient Subsampling for Training Complex Language Models
18 0.16902597 79 emnlp-2011-Lateen EM: Unsupervised Training with Multiple Objectives, Applied to Dependency Grammar Induction
19 0.16065159 135 emnlp-2011-Timeline Generation through Evolutionary Trans-Temporal Summarization
topicId topicWeight
[(15, 0.011), (23, 0.115), (32, 0.348), (36, 0.059), (37, 0.026), (45, 0.045), (54, 0.014), (57, 0.01), (62, 0.032), (64, 0.024), (66, 0.036), (69, 0.014), (79, 0.03), (82, 0.029), (90, 0.02), (96, 0.055), (98, 0.034)]
simIndex simValue paperId paperTitle
same-paper 1 0.80669975 88 emnlp-2011-Linear Text Segmentation Using Affinity Propagation
Author: Anna Kazantseva ; Stan Szpakowicz
Abstract: This paper presents a new algorithm for linear text segmentation. It is an adaptation of Affinity Propagation, a state-of-the-art clustering algorithm in the framework of factor graphs. Affinity Propagation for Segmentation, or APS, receives a set of pairwise similarities between data points and produces segment boundaries and segment centres: data points which best describe all other data points within the segment. APS iteratively passes messages in a cyclic factor graph, until convergence. Each iteration works with information on all available similarities, resulting in high-quality results. APS scales linearly for realistic segmentation tasks. We derive the algorithm from the original Affinity Propagation formulation, and evaluate its performance on topical text segmentation in comparison with two state-of-the-art segmenters. The results suggest that APS performs on par with or outperforms these two very competitive baselines.
2 0.62998462 17 emnlp-2011-Active Learning with Amazon Mechanical Turk
Author: Florian Laws ; Christian Scheible ; Hinrich Schutze
Abstract: Supervised classification needs large amounts of annotated training data that is expensive to create. Two approaches that reduce the cost of annotation are active learning and crowdsourcing. However, these two approaches have not been combined successfully to date. We evaluate the utility of active learning in crowdsourcing on two tasks, named entity recognition and sentiment detection, and show that active learning outperforms random selection of annotation examples in a noisy crowdsourcing scenario.
3 0.40932634 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation
Author: Kevin Gimpel ; Noah A. Smith
Abstract: We present a quasi-synchronous dependency grammar (Smith and Eisner, 2006) for machine translation in which the leaves of the tree are phrases rather than words as in previous work (Gimpel and Smith, 2009). This formulation allows us to combine structural components of phrase-based and syntax-based MT in a single model. We describe a method of extracting phrase dependencies from parallel text using a target-side dependency parser. For decoding, we describe a coarse-to-fine approach based on lattice dependency parsing of phrase lattices. We demonstrate performance improvements for Chinese-English and UrduEnglish translation over a phrase-based baseline. We also investigate the use of unsupervised dependency parsers, reporting encouraging preliminary results.
4 0.40272132 137 emnlp-2011-Training dependency parsers by jointly optimizing multiple objectives
Author: Keith Hall ; Ryan McDonald ; Jason Katz-Brown ; Michael Ringgaard
Abstract: We present an online learning algorithm for training parsers which allows for the inclusion of multiple objective functions. The primary example is the extension of a standard supervised parsing objective function with additional loss-functions, either based on intrinsic parsing quality or task-specific extrinsic measures of quality. Our empirical results show how this approach performs for two dependency parsing algorithms (graph-based and transition-based parsing) and how it achieves increased performance on multiple target tasks including reordering for machine translation and parser adaptation.
5 0.40083483 138 emnlp-2011-Tuning as Ranking
Author: Mark Hopkins ; Jonathan May
Abstract: We offer a simple, effective, and scalable method for statistical machine translation parameter tuning based on the pairwise approach to ranking (Herbrich et al., 1999). Unlike the popular MERT algorithm (Och, 2003), our pairwise ranking optimization (PRO) method is not limited to a handful of parameters and can easily handle systems with thousands of features. Moreover, unlike recent approaches built upon the MIRA algorithm of Crammer and Singer (2003) (Watanabe et al., 2007; Chiang et al., 2008b), PRO is easy to implement. It uses off-the-shelf linear binary classifier software and can be built on top of an existing MERT framework in a matter of hours. We establish PRO’s scalability and effectiveness by comparing it to MERT and MIRA and demonstrate parity on both phrase-based and syntax-based systems in a variety of language pairs, using large scale data scenarios.
6 0.3992725 59 emnlp-2011-Fast and Robust Joint Models for Biomedical Event Extraction
7 0.39919645 126 emnlp-2011-Structural Opinion Mining for Graph-based Sentiment Representation
8 0.39874992 136 emnlp-2011-Training a Parser for Machine Translation Reordering
9 0.39743862 69 emnlp-2011-Identification of Multi-word Expressions by Combining Multiple Linguistic Information Sources
10 0.39524943 1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features
11 0.3928934 28 emnlp-2011-Closing the Loop: Fast, Interactive Semi-Supervised Annotation With Queries on Features and Instances
12 0.39227584 79 emnlp-2011-Lateen EM: Unsupervised Training with Multiple Objectives, Applied to Dependency Grammar Induction
13 0.3915855 68 emnlp-2011-Hypotheses Selection Criteria in a Reranking Framework for Spoken Language Understanding
14 0.3907097 6 emnlp-2011-A Generate and Rank Approach to Sentence Paraphrasing
15 0.38990593 134 emnlp-2011-Third-order Variational Reranking on Packed-Shared Dependency Forests
16 0.38892379 132 emnlp-2011-Syntax-Based Grammaticality Improvement using CCG and Guided Search
17 0.38742146 65 emnlp-2011-Heuristic Search for Non-Bottom-Up Tree Structure Prediction
18 0.38715726 128 emnlp-2011-Structured Relation Discovery using Generative Models
19 0.38632584 58 emnlp-2011-Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training
20 0.38584334 70 emnlp-2011-Identifying Relations for Open Information Extraction