acl acl2013 acl2013-157 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Miguel Almeida ; Andre Martins
Abstract: We present a dual decomposition framework for multi-document summarization, using a model that jointly extracts and compresses sentences. Compared with previous work based on integer linear programming, our approach does not require external solvers, is significantly faster, and is modular in the three qualities a summary should have: conciseness, informativeness, and grammaticality. In addition, we propose a multi-task learning framework to take advantage of existing data for extractive summarization and sentence compression. Experiments in the TAC2008 dataset yield the highest published ROUGE scores to date, with runtimes that rival those of extractive summarizers.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We present a dual decomposition framework for multi-document summarization, using a model that jointly extracts and compresses sentences. [sent-6, score-0.317]
2 Compared with previous work based on integer linear programming, our approach does not require external solvers, is significantly faster, and is modular in the three qualities a summary should have: conciseness, informativeness, and grammaticality. [sent-7, score-0.219]
3 In addition, we propose a multi-task learning framework to take advantage of existing data for extractive summarization and sentence compression. [sent-8, score-0.464]
4 Experiments in the TAC2008 dataset yield the highest published ROUGE scores to date, with runtimes that rival those of extractive summarizers. [sent-9, score-0.339]
5 1 Introduction Automatic text summarization is a seminal problem in information retrieval and natural language processing (Luhn, 1958; Baxendale, 1958; Edmundson, 1969). [sent-10, score-0.231]
6 Today, with the overwhelming amount of information available on the Web, the demand for fast, robust, and scalable summarization systems is stronger than ever. [sent-11, score-0.231]
7 Up to now, extractive systems have been the most popular in multi-document summarization. [sent-12, score-0.233]
8 However, extractive systems are rather limited in the summaries they can produce. [sent-22, score-0.273]
9 This has motivated research in compressive summarization (Lin, 2003; Zajic et al. [sent-24, score-0.728]
10 All approaches above are based on integer linear programming (ILP), suffering from slow runtimes when compared to extractive systems. [sent-28, score-0.308]
11 Having a compressive summarizer which is both fast and expressive remains an open problem. [sent-31, score-0.624]
12 For example, such solvers are unable to take advantage of efficient dynamic programming routines for sentence compression (McDonald, 2006). [sent-33, score-0.333]
13 This paper makes progress on two fronts: • We derive a dual decomposition framework for extractive and compressive summarization (§2–3). [sent-36, score-0.814]
14 We also contribute a novel knapsack factor, along with a linear-time algorithm for the corresponding dual decomposition subproblem. [sent-38, score-0.461]
15 We propose multi-task learning (§4) as a principled way to train compressive summarizers, using auxiliary data for extractive summarization and sentence compression. [sent-39, score-0.961]
16 Experiments on TAC data (§5) yield state-of-the-art results, with runtimes similar to those of extractive systems. [sent-41, score-0.233]
17 To the best of our knowledge, this had never been achieved by compressive summarizers. [sent-42, score-0.497]
18 2 Extractive Summarization In extractive summarization, we are given a set of sentences D := {s1, . [sent-43, score-0.233]
19 , sN} belonging to one or more documents, and the goal is to extract a subset S ⊆ D that conveys a good summary of D and whose total number of words does not exceed a prespecified budget B. [sent-46, score-0.195]
20 We use an indicator vector y := ⟨yn⟩_{n=1..N} to represent an extractive summary, where yn = 1 if sn ∈ S, and yn = 0 otherwise. [sent-47, score-0.117]
21 By designing a quality score function g : {0, 1}N → R, this can be cast as a global optimization problem with a knapsack constraint: maximize g(y) over y ∈ {0, 1}N s.t. Σ_{n=1..N} Ln yn ≤ B, where Ln is the number of words of sn. [sent-49, score-0.182]
22 However, extending these models to allow for sentence compression (as will be detailed in §3) breaks the diminishing returns property, making submodular optimization no longer applicable. [sent-62, score-0.262]
23 2.1 Coverage-Based Summarization Coverage-based extractive summarization can be formalized as follows. [sent-64, score-0.464]
24 Then, the following quality score function is defined: g(y) = Σ_{m=1..M} σm um(y), (2) where um(y) := ⋁_{n∈Im} yn is a Boolean function that indicates whether the mth concept is present in the summary. [sent-77, score-0.186]
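As an illustration of the coverage-based score just defined, here is a minimal Python sketch with hypothetical names (not the authors' implementation): g(y) sums the weight σm of every concept whose index set Im contains at least one selected sentence.

def coverage_score(y, concept_weights, concept_sentence_sets):
    # y: list of 0/1 sentence indicators; concept_weights: sigma_m for each concept;
    # concept_sentence_sets: for each concept m, the set I_m of sentence indices containing it.
    score = 0.0
    for sigma_m, I_m in zip(concept_weights, concept_sentence_sets):
        u_m = any(y[n] == 1 for n in I_m)  # Boolean OR over the sentences in I_m
        score += sigma_m * float(u_m)
    return score

# Example with 3 sentences and 2 concepts: both concepts are covered by the selection.
print(coverage_score([1, 0, 1], [0.7, 0.3], [{0, 1}, {1, 2}]))  # 1.0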
25 Even though existing commercial solvers can solve most instances at moderate speed, they still exhibit poor worst-case behaviour; this is exacerbated when there is a need to combine an extractive component with other modules, as in compressive summarization (§3). [sent-95, score-1.058]
26 Eq. 3 can be addressed with dual decomposition, a class of optimization techniques that tackle the dual of combinatorial problems in a modular, extensible, and parallelizable manner (Komodakis et al. [sent-101, score-0.4]
27 In particular, we employ alternating directions dual decomposition (AD3; Martins et al. [sent-104, score-0.317]
28 , 2011a, 2012) for solving a linear relaxation of Eq. [sent-105, score-0.143]
29 Both algorithms split the original problem into several components, and then iterate between solving independent local subproblems at each component and adjusting multipliers to promote an agreement. [sent-109, score-0.134]
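The loop below is a schematic sketch of this "solve local subproblems, then adjust multipliers toward agreement" structure, in the projected-subgradient flavor; it is not the AD3 algorithm itself (which solves quadratic local subproblems within an augmented-Lagrangian scheme), and all names are illustrative.

def dual_decomposition(components, num_vars, iterations=100, eta=1.0):
    # components: list of callables; each takes its vector of multipliers over the shared
    # variables and returns a locally optimal 0/1 assignment (list of length num_vars).
    multipliers = [[0.0] * num_vars for _ in components]
    avg = [0.0] * num_vars
    for t in range(iterations):
        assignments = [comp(lam) for comp, lam in zip(components, multipliers)]
        if all(a == assignments[0] for a in assignments):
            return assignments[0]  # all components agree: a certificate of optimality
        avg = [sum(a[i] for a in assignments) / len(assignments) for i in range(num_vars)]
        step = eta / (t + 1.0)
        for a, lam in zip(assignments, multipliers):
            for i in range(num_vars):
                lam[i] -= step * (a[i] - avg[i])  # push each component toward the consensus
    return [int(round(v)) for v in avg]  # fall back to rounding the averaged solution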
30 We will see that AD3 can also handle budget and knapsack constraints efficiently. [sent-114, score-0.25]
31 To address Eq. 3 with dual decomposition, we split the coverage-based summarizer into the following M + 1 components (one per constraint): 1. [sent-116, score-0.308]
32 For each of the M concepts in C(D), one component for imposing the logic constraint in Eq. [sent-117, score-0.143]
33 (2011b); the AD3 subproblem for the mth factor can be solved in time O(|Im|). [sent-120, score-0.192]
34 3 Compressive Summarization We now turn to compressive summarization, which does not limit the summary sentences to be verbatim extracts from the original documents. (For details about dual decomposition and Lagrangian relaxation, see the recent tutorial by Rush and Collins (2012).) [sent-125, score-0.903]
35 The AD3 subproblem in this case corresponds to computing a Euclidean projection onto the knapsack polytope (Eq. [sent-126, score-0.257]
36 Others addressed the related, but much harder, integer quadratic knapsack problem (McDonald, 2007). [sent-128, score-0.224]
37 We represent a compression of sn as an [sent-131, score-0.261]
38 indicator vector zn := ⟨zn,ℓ⟩, where zn,ℓ = 1 if the ℓth word is included in the compression. [sent-132, score-0.207]
39 By convention, the dummy symbol is included if and only if the remaining compression is non-empty. [sent-133, score-0.261]
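A small sketch of this representation (hypothetical names, not the authors' code): zn marks which words of sn survive, with position 0 reserved for the dummy symbol, included exactly when the compression is non-empty.

def compression_vector(num_words, kept_positions):
    # num_words: L, the number of words in sentence s_n;
    # kept_positions: 1-based positions of the words kept in the compression.
    z_n = [0] * (num_words + 1)          # index 0 is the dummy symbol
    for pos in kept_positions:
        z_n[pos] = 1
    z_n[0] = 1 if kept_positions else 0  # dummy on iff the compression is non-empty
    return z_n

print(compression_vector(5, [1, 3, 4]))  # [1, 1, 0, 1, 1, 0]
print(compression_vector(5, []))         # [0, 0, 0, 0, 0, 0]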
40 (Here, n ∈ [N] indexes sentences and ℓ ∈ {0} ∪ [L] indexes words.) Models for compressive summarization were proposed by Martins and Smith (2009) and Berg-Kirkpatrick et al. [sent-138, score-0.497]
41 Here, we follow the latter work, by combining a coverage score function g with sentence-level compression score functions h1, . [sent-140, score-0.224]
42 2, but taking a compressive summary z as argument: g(z) = Σ_{m=1..M} σm um(z), where we redefine um as follows. [sent-152, score-0.586]
43 First, we parametrize each occurrence of the mth concept (assumed to be a k-gram) as a triple ⟨n, ℓs, ℓe⟩, where n indexes a sentence, ℓs indexes a start position within the sentence, and ℓe indexes the end position. [sent-153, score-0.146]
44 We denote by Tm the set of triples representing all occurrences of the mth concept in the original text, and we associate an indicator variable zn,ℓs:ℓe to each member of this set. [sent-154, score-0.146]
45 We then define um(z) via the following logic constraints: • A concept type is selected if some of its k-gram tokens are selected: um(z) := ⋁_{⟨n,ℓs,ℓe⟩∈Tm} zn,ℓs:ℓe. [sent-155, score-0.121]
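The following sketch evaluates um(z) from the occurrence triples in Tm, mirroring the logic constraints above and the OR/AND factors described below: a k-gram token ⟨n, ℓs, ℓe⟩ counts as selected only if every word in its span is kept, and the concept type is selected if some token is. Names are illustrative, not the authors' code.

def concept_selected(z, T_m):
    # z: dict mapping sentence index n to its indicator list (index 0 = dummy, 1-based words);
    # T_m: list of (n, ls, le) triples, one per occurrence of concept m.
    for n, ls, le in T_m:
        token_on = all(z[n][l] == 1 for l in range(ls, le + 1))  # AND over the span
        if token_on:
            return True                                          # OR over occurrences
    return False

z = {0: [1, 1, 1, 0, 1, 1]}                 # sentence 0: 5 words, word 3 deleted
print(concept_selected(z, [(0, 2, 3)]))     # False: the bigram spanning words 2-3 is broken
print(concept_selected(z, [(0, 4, 5)]))     # True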
46 Figure 1: Components of our compressive summarizer. [sent-157, score-0.497]
47 Finally, the budget factor, in green, is connected to the word nodes; it ensures that the summary fits the word limit. [sent-160, score-0.227]
48 We will exploit this fact in the dual decomposition framework described next. [sent-171, score-0.317]
49 Here, we employ the AD3 algorithm. (The same framework can be readily adapted to other compression models that are efficiently decodable, such as the semi-Markov model of McDonald (2006), which would allow incorporating a language model for the compression.) [sent-176, score-0.224]
50 For each of the N sentences, one component for the compression model. [sent-179, score-0.255]
51 The AD3 quadratic subproblem for this factor can be addressed by solving a sequence of linear subproblems, as described by Martins et al. [sent-180, score-0.215]
52 For each of the M concept types in C(D), one OR-WITH-OUTPUT factor for the logic constraint in Eq. [sent-186, score-0.203]
53 This is analogous to the one described for the extractive case. [sent-188, score-0.233]
54 For each k-gram concept token in Tm, one AND-WITH-OUTPUT factor that imposes the constraint in Eq. [sent-190, score-0.165]
55 (2011b) and its AD3 subproblem can be solved in time linear in k. [sent-193, score-0.111]
56 The runtime of this AD3 subproblem is linear in the number of words. [sent-196, score-0.144]
57 We chose instead to adopt a fast and simple rounding procedure for obtaining a summary from a fractional solution. [sent-207, score-0.136]
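The excerpt does not spell out the rounding procedure, so the sketch below is only one plausible heuristic, not necessarily the authors': rank sentences by their fractional value and keep them greedily while the word budget B allows.

def round_fractional(frac_y, sentence_lengths, budget):
    # frac_y: fractional values in [0, 1] from the relaxation; sentence_lengths: word counts L_n.
    order = sorted(range(len(frac_y)), key=lambda n: frac_y[n], reverse=True)
    y, used = [0] * len(frac_y), 0
    for n in order:
        if used + sentence_lengths[n] <= budget:
            y[n] = 1
            used += sentence_lengths[n]
    return y

print(round_fractional([0.9, 0.2, 0.7], [40, 30, 50], 100))  # [1, 0, 1]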
58 In addition, we included hard constraints to prevent the deletion of certain arcs, following previous work in sentence compression (Clarke and Lapata, 2008). [sent-224, score-0.224]
59 , allowed to come); arcs pointing to negation words, cardinal numbers, or determiners; and arcs connecting two proper nouns or words within quotation marks. [sent-233, score-0.178]
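These hard constraints can be pictured as a predicate over dependency arcs; the sketch below encodes the rules listed above with illustrative tag names and a hypothetical token schema.

def arc_is_protected(head, modifier):
    # head / modifier: dicts with 'word', 'pos', and 'in_quotes' fields (hypothetical schema).
    negations = {"not", "n't", "never", "no"}
    if modifier["word"].lower() in negations:
        return True                                        # arcs pointing to negation words
    if modifier["pos"] in {"CD", "DT"}:
        return True                                        # cardinal numbers and determiners
    if head["pos"] == "NNP" and modifier["pos"] == "NNP":
        return True                                        # arcs connecting two proper nouns
    if head["in_quotes"] and modifier["in_quotes"]:
        return True                                        # words within quotation marks
    return False

print(arc_is_protected({"word": "said", "pos": "VBD", "in_quotes": False},
                       {"word": "not", "pos": "RB", "in_quotes": False}))   # True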
60 Prior work in compressive summarization has followed one of two strategies: Martins and Smith (2009) and Woodsend and Lapata (2012) learn the extraction and compression models separately, and then post-combine them, circumventing the lack of fully annotated data. [sent-235, score-0.952]
61 With this in mind, we put together a multi-task learning framework for compressive summarization (which we name task #1). [sent-240, score-0.728]
62 The goal is to take advantage of existing data for related tasks, such as extractive summarization (task #2), and sentence compression (task #3). [sent-241, score-0.688]
63 , 2007), and for all of them we assume feature-based models that decompose over “parts”: • For the compressive summarization task, the parts correspond to concept features (§3. [sent-253, score-0.811]
64 For the extractive summarization task, there are parts for concept features only. [sent-256, score-0.547]
65 For the sentence compression task, the parts correspond to arc-deletion features only. [sent-257, score-0.224]
66 , 2004), where the corresponding cost functions are concept recall (for task #2), precision of arc deletions (for task #3), and a combination thereof (for task #1). [sent-270, score-0.133]
67 As a result, the optimal v1 will be a vector of zeros; since tasks #2 and #3 have no parts in common, the objective will decouple into a sum of two independent terms. (Note that, by substituting uk := w + vk and solving for w, the problem in Eq. [sent-284, score-0.125]
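A minimal sketch of the shared/task-specific parametrization uk := w + vk from the note above: each task k scores a part with the shared weights w plus its own offsets vk. Feature and task names are made up for illustration.

def task_score(features, w, v_k):
    # features: dict feature_name -> value; w: shared weights; v_k: task-specific weights.
    return sum(value * (w.get(f, 0.0) + v_k.get(f, 0.0)) for f, value in features.items())

w = {"concept:budget_deficit": 0.5, "arc_del:det": -0.1}
v1 = {"concept:budget_deficit": 0.2}   # offsets for task #1 (compressive summarization)
v2 = {}                                # task #2 (extractive) relies on the shared weights only
print(task_score({"concept:budget_deficit": 1.0}, w, v1))  # 0.7
print(task_score({"concept:budget_deficit": 1.0}, w, v2))  # 0.5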
68 5.1 Experimental setup We evaluated our compressive summarizers on data from the Text Analysis Conference (TAC) evaluations. [sent-295, score-0.537]
69 The test partition contains 48 multi-document summarization problems; each provides 10 related news articles as input, and asks for a summary with up to 100 words, which is evaluated against four manually written abstracts. [sent-299, score-0.32]
70 In the single-task experiments, we trained a compressive summarizer on the dataset disclosed by Berg-Kirkpatrick et al. [sent-302, score-0.624]
71 (2011), which contains manual compressive summaries for the TAC-2009 data. [sent-303, score-0.537]
72 Our choice of a dependency parser was motivated by our desire for a fast system; in particular, TurboParser attains top accuracies at a rate of 1,200 words per second, keeping parsing times below 1 second for each summarization problem. [sent-309, score-0.231]
73 (2011), but we augmented the training data with extractive summarization and sentence compression datasets, to help train the model. (We use the AD3 implementation at http://www. [sent-312, score-0.688]
74 We extended the code to handle the knapsack and budget factors; the modified code will be part of the next release (AD3 2. [sent-317, score-0.25]
75 For extractive summarization, we used the DUC 2003 and 2004 datasets (a total of 80 multi-document summarization problems). [sent-324, score-0.464]
76 The top rows refer to three strong baselines: the ICSI-1 extractive coverage-based system of Gillick et al. [sent-331, score-0.233]
77 (2008), which achieved the best ROUGE scores in the TAC-2008 evaluation; the compressive summarizer of Berg-Kirkpatrick et al. [sent-332, score-0.624]
78 (2011), denoted BGK’11; and the multi-aspect compressive summarizer of Woodsend and Lapata (2012), denoted WL’12. [sent-333, score-0.624]
79 The bottom rows show the results achieved by our implementation of a pure extractive system (similar to the learned extractive summarizer of Berg-Kirkpatrick et al. [sent-335, score-0.593]
80 , 2011); a system that post-combines extraction and compression components trained separately, as in Martins and Smith (2009); and our compressive summarizer trained as a single task, and in the multi-task setting. [sent-336, score-0.848]
81 The ROUGE and Pyramid scores show that the compressive summarizers (when properly trained) yield considerable benefits in content coverage over extractive systems, confirming the results of Berg-Kirkpatrick et al. [sent-337, score-0.77]
82 30%) is, to our knowledge, the highest reported on the TAC-2008 dataset, with little harm in grammaticality with respect to an extractive system that preserves the original sentences. [sent-341, score-0.233]
83 5.3 Runtimes We conducted another set of experiments to compare the runtime of our compressive summarizer based on AD3 with the runtimes achieved by GLPK, the ILP solver used by Berg-Kirkpatrick et al. [sent-344, score-0.808]
84 For each decoder, we show the average time taken to solve a summarization problem in TAC-2008. [sent-391, score-0.231]
85 The reported runtimes of AD3 and LP-Relax include the time taken to round the solution (§3. [sent-392, score-0.106]
86 The runtimes obtained with the exact ILP solver seem slower than those reported by Berg-Kirkpatrick et al. [sent-400, score-0.151]
87 Figure 2: Example summary from our compressive system. [sent-408, score-0.586]
88 To our knowledge, this is the first time a compressive summarizer achieves such a favorable accuracy/speed tradeoff. [sent-415, score-0.624]
89 6 Conclusions We presented a multi-task learning framework for compressive summarization, leveraging data for related tasks in a principled manner. [sent-416, score-0.497]
90 We decode with AD3, a fast and modular dual decomposition algorithm which is orders of magnitude faster than ILP-based approaches. [sent-417, score-0.402]
91 Results show that the state of the art is improved in automatic and manual metrics, with speeds close to those of extractive systems. [sent-418, score-0.233]
92 For example, a different compression model could incorporate rewriting rules to enable compressions that go beyond word deletion, as in Cohn and Lapata (2008). [sent-420, score-0.224]
93 Other aspects may be added as additional components in our dual decomposition framework, such as query information (Schilder and Kondadadi, 2008), discourse constraints (Clarke and Lapata, 2007), or lexical preferences (Woodsend and Lapata, 2012). [sent-421, score-0.317]
94 Our multitask approach may be used to jointly learn parameters for these aspects; the dual decomposition algorithm ensures that optimization remains tractable even with many components. [sent-422, score-0.387]
95 This includes as special cases the problems of projecting onto a budget constraint (Ln = 1, ∀n) and onto the simplex (same, plus B = 1). [sent-429, score-0.206]
96 Projection algorithm (final steps):
Update working set: W ← W \ (WL ∪ WR ∪ WM)
Update tight-sum: stight ← stight + Σ_{n∈WL} Ln(1 − an) − Σ_{n∈WR} Ln an
Update slack-sum: ξ ← ξ + Σ_{n∈WM} Ln²
end while
Define τ* ← (B − Σ_{i=1..N} Li ai − stight) / ξ
Set zn ← clip(an + τ* Ln), ∀n ∈ [N]
output: z := ⟨zn⟩_{n=1..N}
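For intuition, here is a simple bisection-based sketch of the projection the steps above compute, onto {z : 0 ≤ zn ≤ 1, Σn Ln zn ≤ B}; it assumes Ln > 0, is slower than the linear-time procedure outlined above, and all names are illustrative.

def clip01(x):
    return max(0.0, min(1.0, x))

def project_knapsack(a, L, B, tol=1e-9):
    # Project the point a onto {z : 0 <= z_n <= 1, sum_n L_n z_n <= B}, assuming L_n > 0.
    if sum(Ln * clip01(an) for an, Ln in zip(a, L)) <= B:
        return [clip01(an) for an in a]          # budget slack: clipping to the box suffices
    # Otherwise find tau <= 0 with sum_n L_n * clip(a_n + tau * L_n) = B by bisection.
    lo = -max(abs(an) / Ln + 1.0 for an, Ln in zip(a, L))   # low enough that the sum is 0
    hi = 0.0
    while hi - lo > tol:
        tau = 0.5 * (lo + hi)
        total = sum(Ln * clip01(an + tau * Ln) for an, Ln in zip(a, L))
        if total > B:
            hi = tau
        else:
            lo = tau
    return [clip01(an + lo * Ln) for an, Ln in zip(a, L)]

print(project_knapsack([0.9, 0.8, 0.7], [10.0, 20.0, 30.0], 25.0))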
97 Multi-document summarization via budgeted maximization of submodular functions. [sent-604, score-0.292]
98 Centroid-based summarization of multiple documents: sentence extraction, utility-based evaluation, and user studies. [sent-717, score-0.231]
99 On dual decomposition and linear programming relaxations for natural language processing. [sent-732, score-0.392]
100 Sentence compression as a component of a multidocument summarization system. [sent-799, score-0.486]
wordName wordTfidf (topN-words)
[('compressive', 0.497), ('extractive', 0.233), ('summarization', 0.231), ('compression', 0.224), ('zn', 0.207), ('martins', 0.188), ('dual', 0.181), ('knapsack', 0.144), ('decomposition', 0.136), ('woodsend', 0.134), ('pnn', 0.127), ('summarizer', 0.127), ('runtimes', 0.106), ('budget', 0.106), ('ln', 0.103), ('stight', 0.091), ('arcs', 0.089), ('ilp', 0.089), ('summary', 0.089), ('concept', 0.083), ('subproblem', 0.079), ('gillick', 0.076), ('lnzn', 0.073), ('pmm', 0.073), ('lk', 0.071), ('vk', 0.071), ('rush', 0.069), ('lapata', 0.067), ('solvers', 0.066), ('rouge', 0.063), ('mth', 0.063), ('submodular', 0.061), ('bergkirkpatrick', 0.059), ('pardalos', 0.059), ('relaxation', 0.057), ('wl', 0.056), ('kovoor', 0.054), ('solving', 0.054), ('smith', 0.053), ('clip', 0.053), ('taskar', 0.052), ('yk', 0.051), ('factor', 0.05), ('arc', 0.05), ('modular', 0.05), ('lagrangian', 0.049), ('subproblems', 0.049), ('integer', 0.048), ('tac', 0.048), ('rounding', 0.047), ('pyramid', 0.045), ('solver', 0.045), ('daum', 0.044), ('cov', 0.044), ('filatova', 0.044), ('um', 0.043), ('programming', 0.043), ('bilmes', 0.042), ('lisboa', 0.042), ('mum', 0.042), ('concepts', 0.042), ('yn', 0.04), ('summaries', 0.04), ('summarizers', 0.04), ('modifier', 0.039), ('clarke', 0.039), ('compressed', 0.038), ('aguiar', 0.038), ('logic', 0.038), ('optimization', 0.038), ('dummy', 0.037), ('sn', 0.037), ('bgk', 0.036), ('klk', 0.036), ('sipos', 0.036), ('subgradients', 0.036), ('im', 0.036), ('portugal', 0.035), ('wm', 0.035), ('faster', 0.035), ('wr', 0.034), ('carbonell', 0.034), ('onto', 0.034), ('comp', 0.033), ('duc', 0.033), ('runtime', 0.033), ('hn', 0.033), ('structured', 0.032), ('quadratic', 0.032), ('ensures', 0.032), ('lin', 0.032), ('priberam', 0.032), ('maximizing', 0.032), ('tm', 0.032), ('constraint', 0.032), ('linear', 0.032), ('deleted', 0.032), ('interval', 0.032), ('cm', 0.031), ('component', 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 157 acl-2013-Fast and Robust Compressive Summarization with Dual Decomposition and Multi-Task Learning
Author: Miguel Almeida ; Andre Martins
Abstract: We present a dual decomposition framework for multi-document summarization, using a model that jointly extracts and compresses sentences. Compared with previous work based on integer linear programming, our approach does not require external solvers, is significantly faster, and is modular in the three qualities a summary should have: conciseness, informativeness, and grammaticality. In addition, we propose a multi-task learning framework to take advantage of existing data for extractive summarization and sentence compression. Experiments in the TAC2008 dataset yield the highest published ROUGE scores to date, with runtimes that rival those of extractive summarizers.
2 0.28797209 18 acl-2013-A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization
Author: Lu Wang ; Hema Raghavan ; Vittorio Castelli ; Radu Florian ; Claire Cardie
Abstract: We consider the problem of using sentence compression techniques to facilitate queryfocused multi-document summarization. We present a sentence-compression-based framework for the task, and design a series of learning-based compression models built on parse trees. An innovative beam search decoder is proposed to efficiently find highly probable compressions. Under this framework, we show how to integrate various indicative metrics such as linguistic motivation and query relevance into the compression process by deriving a novel formulation of a compression scoring function. Our best model achieves statistically significant improvement over the state-of-the-art systems on several metrics (e.g. 8.0% and 5.4% improvements in ROUGE-2 respectively) for the DUC 2006 and 2007 summarization task. ,
3 0.24676585 362 acl-2013-Turning on the Turbo: Fast Third-Order Non-Projective Turbo Parsers
Author: Andre Martins ; Miguel Almeida ; Noah A. Smith
Abstract: We present fast, accurate, direct nonprojective dependency parsers with thirdorder features. Our approach uses AD3, an accelerated dual decomposition algorithm which we extend to handle specialized head automata and sequential head bigram models. Experiments in fourteen languages yield parsing speeds competitive to projective parsers, with state-ofthe-art accuracies for the largest datasets (English, Czech, and German).
4 0.21261436 377 acl-2013-Using Supervised Bigram-based ILP for Extractive Summarization
Author: Chen Li ; Xian Qian ; Yang Liu
Abstract: In this paper, we propose a bigram based supervised method for extractive document summarization in the integer linear programming (ILP) framework. For each bigram, a regression model is used to estimate its frequency in the reference summary. The regression model uses a variety ofindicative features and is trained discriminatively to minimize the distance between the estimated and the ground truth bigram frequency in the reference summary. During testing, the sentence selection problem is formulated as an ILP problem to maximize the bigram gains. We demonstrate that our system consistently outperforms the previous ILP method on different TAC data sets, and performs competitively compared to the best results in the TAC evaluations. We also conducted various analysis to show the impact of bigram selection, weight estimation, and ILP setup.
5 0.16843782 332 acl-2013-Subtree Extractive Summarization via Submodular Maximization
Author: Hajime Morita ; Ryohei Sasano ; Hiroya Takamura ; Manabu Okumura
Abstract: This study proposes a text summarization model that simultaneously performs sentence extraction and compression. We translate the text summarization task into a problem of extracting a set of dependency subtrees in the document cluster. We also encode obligatory case constraints as must-link dependency constraints in order to guarantee the readability of the generated summary. In order to handle the subtree extraction problem, we investigate a new class of submodular maximization problem, and a new algorithm that has the approximation ratio 1/2 (1 − 1/e). Our experiments with the NTCIR ACLIA test collections show that our approach outperforms a state-of-the-art algorithm.
6 0.15172468 333 acl-2013-Summarization Through Submodularity and Dispersion
7 0.13677043 353 acl-2013-Towards Robust Abstractive Multi-Document Summarization: A Caseframe Analysis of Centrality and Domain
8 0.12901531 5 acl-2013-A Decade of Automatic Content Evaluation of News Summaries: Reassessing the State of the Art
9 0.12779506 129 acl-2013-Domain-Independent Abstract Generation for Focused Meeting Summarization
10 0.12088507 143 acl-2013-Exact Maximum Inference for the Fertility Hidden Markov Model
11 0.112572 334 acl-2013-Supervised Model Learning with Feature Grouping based on a Discrete Constraint
12 0.10833866 237 acl-2013-Margin-based Decomposed Amortized Inference
13 0.10687752 50 acl-2013-An improved MDL-based compression algorithm for unsupervised word segmentation
14 0.10262214 210 acl-2013-Joint Word Alignment and Bilingual Named Entity Recognition Using Dual Decomposition
15 0.10062008 283 acl-2013-Probabilistic Domain Modelling With Contextualized Distributional Semantic Vectors
16 0.10046921 260 acl-2013-Nonconvex Global Optimization for Latent-Variable Models
17 0.094938792 375 acl-2013-Using Integer Linear Programming in Concept-to-Text Generation to Produce More Compact Texts
18 0.094332039 131 acl-2013-Dual Training and Dual Prediction for Polarity Classification
19 0.079313762 319 acl-2013-Sequential Summarization: A New Application for Timely Updated Twitter Trending Topics
20 0.076464474 142 acl-2013-Evolutionary Hierarchical Dirichlet Process for Timeline Summarization
topicId topicWeight
[(0, 0.183), (1, -0.005), (2, -0.046), (3, -0.043), (4, -0.024), (5, 0.02), (6, 0.171), (7, -0.025), (8, -0.243), (9, -0.105), (10, -0.053), (11, -0.043), (12, -0.223), (13, -0.116), (14, -0.084), (15, 0.189), (16, 0.212), (17, -0.131), (18, 0.051), (19, 0.032), (20, 0.018), (21, -0.015), (22, -0.047), (23, 0.086), (24, -0.031), (25, -0.058), (26, -0.033), (27, -0.019), (28, 0.061), (29, -0.002), (30, 0.065), (31, 0.0), (32, -0.032), (33, 0.081), (34, 0.035), (35, 0.107), (36, 0.025), (37, -0.063), (38, 0.115), (39, -0.005), (40, -0.059), (41, -0.055), (42, -0.006), (43, 0.003), (44, -0.079), (45, 0.017), (46, -0.019), (47, -0.001), (48, 0.009), (49, -0.09)]
simIndex simValue paperId paperTitle
same-paper 1 0.94335359 157 acl-2013-Fast and Robust Compressive Summarization with Dual Decomposition and Multi-Task Learning
Author: Miguel Almeida ; Andre Martins
Abstract: We present a dual decomposition framework for multi-document summarization, using a model that jointly extracts and compresses sentences. Compared with previous work based on integer linear programming, our approach does not require external solvers, is significantly faster, and is modular in the three qualities a summary should have: conciseness, informativeness, and grammaticality. In addition, we propose a multi-task learning framework to take advantage of existing data for extractive summarization and sentence compression. Experiments in the TAC2008 dataset yield the highest published ROUGE scores to date, with runtimes that rival those of extractive summarizers.
2 0.7942211 332 acl-2013-Subtree Extractive Summarization via Submodular Maximization
Author: Hajime Morita ; Ryohei Sasano ; Hiroya Takamura ; Manabu Okumura
Abstract: This study proposes a text summarization model that simultaneously performs sentence extraction and compression. We translate the text summarization task into a problem of extracting a set of dependency subtrees in the document cluster. We also encode obligatory case constraints as must-link dependency constraints in order to guarantee the readability of the generated summary. In order to handle the subtree extraction problem, we investigate a new class of submodular maximization problem, and a new algorithm that has the approximation ratio 1/2 (1 − 1/e). Our experiments with the NTCIR ACLIA test collections show that our approach outperforms a state-of-the-art algorithm.
3 0.78866559 377 acl-2013-Using Supervised Bigram-based ILP for Extractive Summarization
Author: Chen Li ; Xian Qian ; Yang Liu
Abstract: In this paper, we propose a bigram based supervised method for extractive document summarization in the integer linear programming (ILP) framework. For each bigram, a regression model is used to estimate its frequency in the reference summary. The regression model uses a variety ofindicative features and is trained discriminatively to minimize the distance between the estimated and the ground truth bigram frequency in the reference summary. During testing, the sentence selection problem is formulated as an ILP problem to maximize the bigram gains. We demonstrate that our system consistently outperforms the previous ILP method on different TAC data sets, and performs competitively compared to the best results in the TAC evaluations. We also conducted various analysis to show the impact of bigram selection, weight estimation, and ILP setup.
4 0.75561875 333 acl-2013-Summarization Through Submodularity and Dispersion
Author: Anirban Dasgupta ; Ravi Kumar ; Sujith Ravi
Abstract: We propose a new optimization framework for summarization by generalizing the submodular framework of (Lin and Bilmes, 2011). In our framework the summarization desideratum is expressed as a sum of a submodular function and a nonsubmodular function, which we call dispersion; the latter uses inter-sentence dissimilarities in different ways in order to ensure non-redundancy of the summary. We consider three natural dispersion functions and show that a greedy algorithm can obtain an approximately optimal summary in all three cases. We conduct experiments on two corpora—DUC 2004 and user comments on news articles—and show that the performance of our algorithm outperforms those that rely only on submodularity.
5 0.73206365 18 acl-2013-A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization
Author: Lu Wang ; Hema Raghavan ; Vittorio Castelli ; Radu Florian ; Claire Cardie
Abstract: We consider the problem of using sentence compression techniques to facilitate queryfocused multi-document summarization. We present a sentence-compression-based framework for the task, and design a series of learning-based compression models built on parse trees. An innovative beam search decoder is proposed to efficiently find highly probable compressions. Under this framework, we show how to integrate various indicative metrics such as linguistic motivation and query relevance into the compression process by deriving a novel formulation of a compression scoring function. Our best model achieves statistically significant improvement over the state-of-the-art systems on several metrics (e.g. 8.0% and 5.4% improvements in ROUGE-2 respectively) for the DUC 2006 and 2007 summarization task. ,
7 0.58633333 5 acl-2013-A Decade of Automatic Content Evaluation of News Summaries: Reassessing the State of the Art
8 0.55845392 362 acl-2013-Turning on the Turbo: Fast Third-Order Non-Projective Turbo Parsers
9 0.54645193 260 acl-2013-Nonconvex Global Optimization for Latent-Variable Models
10 0.53983939 237 acl-2013-Margin-based Decomposed Amortized Inference
11 0.53756917 129 acl-2013-Domain-Independent Abstract Generation for Focused Meeting Summarization
12 0.53180772 375 acl-2013-Using Integer Linear Programming in Concept-to-Text Generation to Produce More Compact Texts
13 0.51730061 334 acl-2013-Supervised Model Learning with Feature Grouping based on a Discrete Constraint
14 0.50682682 142 acl-2013-Evolutionary Hierarchical Dirichlet Process for Timeline Summarization
15 0.48990163 50 acl-2013-An improved MDL-based compression algorithm for unsupervised word segmentation
16 0.48130056 59 acl-2013-Automated Pyramid Scoring of Summaries using Distributional Semantics
17 0.44522527 382 acl-2013-Variational Inference for Structured NLP Models
18 0.42254326 319 acl-2013-Sequential Summarization: A New Application for Timely Updated Twitter Trending Topics
19 0.42056572 178 acl-2013-HEADY: News headline abstraction through event pattern clustering
20 0.41748884 143 acl-2013-Exact Maximum Inference for the Fertility Hidden Markov Model
topicId topicWeight
[(0, 0.112), (6, 0.082), (11, 0.065), (15, 0.015), (16, 0.212), (24, 0.034), (26, 0.053), (35, 0.065), (42, 0.042), (48, 0.056), (64, 0.014), (70, 0.042), (88, 0.024), (90, 0.023), (95, 0.049)]
simIndex simValue paperId paperTitle
same-paper 1 0.8539207 157 acl-2013-Fast and Robust Compressive Summarization with Dual Decomposition and Multi-Task Learning
Author: Miguel Almeida ; Andre Martins
Abstract: We present a dual decomposition framework for multi-document summarization, using a model that jointly extracts and compresses sentences. Compared with previous work based on integer linear programming, our approach does not require external solvers, is significantly faster, and is modular in the three qualities a summary should have: conciseness, informativeness, and grammaticality. In addition, we propose a multi-task learning framework to take advantage of existing data for extractive summarization and sentence compression. Experiments in the TAC2008 dataset yield the highest published ROUGE scores to date, with runtimes that rival those of extractive summarizers.
2 0.80783027 191 acl-2013-Improved Bayesian Logistic Supervised Topic Models with Data Augmentation
Author: Jun Zhu ; Xun Zheng ; Bo Zhang
Abstract: Supervised topic models with a logistic likelihood have two issues that potentially limit their practical use: 1) response variables are usually over-weighted by document word counts; and 2) existing variational inference methods make strict mean-field assumptions. We address these issues by: 1) introducing a regularization constant to better balance the two parts based on an optimization formulation of Bayesian inference; and 2) developing a simple Gibbs sampling algorithm by introducing auxiliary Polya-Gamma variables and collapsing out Dirichlet variables. Our augment-and-collapse sampling algorithm has analytical forms of each conditional distribution without making any restricting assumptions and can be easily parallelized. Empirical results demonstrate significant improvements on prediction performance and time efficiency.
3 0.77500695 123 acl-2013-Discriminative Learning with Natural Annotations: Word Segmentation as a Case Study
Author: Wenbin Jiang ; Meng Sun ; Yajuan Lu ; Yating Yang ; Qun Liu
Abstract: Structural information in web text provides natural annotations for NLP problems such as word segmentation and parsing. In this paper we propose a discriminative learning algorithm to take advantage of the linguistic knowledge in large amounts of natural annotations on the Internet. It utilizes the Internet as an external corpus with massive (although slight and sparse) natural annotations, and enables a classifier to evolve on the large-scaled and real-time updated web text. With Chinese word segmentation as a case study, experiments show that the segmenter enhanced with the Chinese wikipedia achieves sig- nificant improvement on a series of testing sets from different domains, even with a single classifier and local features.
4 0.71882039 193 acl-2013-Improving Chinese Word Segmentation on Micro-blog Using Rich Punctuations
Author: Longkai Zhang ; Li Li ; Zhengyan He ; Houfeng Wang ; Ni Sun
Abstract: Micro-blog is a new kind of medium which is short and informal. While no segmented corpus of micro-blogs is available to train Chinese word segmentation model, existing Chinese word segmentation tools cannot perform equally well as in ordinary news texts. In this paper we present an effective yet simple approach to Chinese word segmentation of micro-blog. In our approach, we incorporate punctuation information of unlabeled micro-blog data by introducing characters behind or ahead of punctuations, for they indicate the beginning or end of words. Meanwhile a self-training framework to incorporate confident instances is also used, which prove to be helpful. Ex- periments on micro-blog data show that our approach improves performance, especially in OOV-recall. 1 INTRODUCTION Micro-blog (also known as tweets in English) is a new kind of broadcast medium in the form of blogging. A micro-blog differs from a traditional blog in that it is typically smaller in size. Furthermore, texts in micro-blogs tend to be informal and new words occur more frequently. These new features of micro-blogs make the Chinese Word Segmentation (CWS) models trained on the source domain, such as news corpus, fail to perform equally well when transferred to texts from micro-blogs. For example, the most widely used Chinese segmenter ”ICTCLAS” yields 0.95 f-score in news corpus, only gets 0.82 f-score on micro-blog data. The poor segmentation results will hurt subsequent analysis on micro-blog text. ∗Corresponding author Manually labeling the texts of micro-blog is time consuming. Luckily, punctuations provide useful information because they are used as indicators of the end of previous sentence and the beginning of the next one, which also indicate the start and the end of a word. These ”natural boundaries” appear so frequently in micro-blog texts that we can easily make good use of them. TABLE 1 shows some statistics of the news corpus vs. the micro-blogs. Besides, English letters and digits are also more than those in news corpus. They all are natural delimiters of Chinese characters and we treat them just the same as punctuations. We propose a method to enlarge the training corpus by using punctuation information. We build a semi-supervised learning (SSL) framework which can iteratively incorporate newly labeled instances from unlabeled micro-blog data during the training process. We test our method on microblog texts and experiments show good results. This paper is organized as follows. In section 1 we introduce the problem. Section 2 gives detailed description of our approach. We show the experi- ment and analyze the results in section 3. Section 4 gives the related works and in section 5 we conclude the whole work. 2 Our method 2.1 Punctuations Chinese word segmentation problem might be treated as a character labeling problem which gives each character a label indicating its position in one word. To be simple, one can use label ’B’ to indicate a character is the beginning of a word, and use ’N’ to indicate a character is not the beginning of a word. We also use the 2-tag in our work. Other tag sets like the ’BIES’ tag set are not suiteable because the puctuation information cannot decide whether a character after punctuation should be labeled as ’B’ or ’S’(word with Single 177 ProceedingSsof oifa, th Beu 5l1gsarti Aan,An uuaglu Mste 4e-ti9n2g 0 o1f3 t.he ?c A2s0s1o3ci Aatsiosonc fioartio Cno fmorpu Ctoamtiopnuatalt Lioinngauli Lsitnicgsu,i psatgicess 177–182, micNreow-bslogC68h56i. 
n73e%%seE10n1.g6.8l%i%shN20u. m76%%berPu1n13c9.t u03a%%tion Table 1: Percentage of Chinese, English, number, punctuation in the news corpus vs. the micro-blogs. character). Punctuations can serve as implicit labels for the characters before and after them. The character right after punctuations must be the first character of a word, meanwhile the character right before punctuations must be the last character of a word. An example is given in TABLE 2. 2.2 Algorithm Our algorithm “ADD-N” is shown in TABLE 3. The initially selected character instances are those right after punctuations. By definition they are all labeled with ’B’ . In this case, the number of training instances with label ’B’ is increased while the number with label ’N’ remains unchanged. Because of this, the model trained on this unbalanced corpus tends to be biased. This problem can become even worse when there is inexhaustible supply of texts from the target domain. We assume that labeled corpus of the source domain can be treated as a balanced reflection of different labels. Therefore we choose to estimate the balanced point by counting characters labeling ’B’ and ’N’ and calculate the ratio which we denote as η . We assume the enlarged corpus is also balanced if and only if the ratio of ’B’ to ’N’ is just the same to η of the source domain. Our algorithm uses data from source domain to make the labels balanced. When enlarging corpus using characters behind punctuations from texts in target domain, only characters labeling ’B’ are added. We randomly reuse some characters labeling ’N’ from labeled data until ratio η is reached. We do not use characters ahead of punctuations, because the single-character words ahead of punctuations take the label of ’B’ instead of ’N’ . In summary our algorithm tackles the problem by duplicating labeled data in source domain. We denote our algorithm as ”ADD-N”. We also use baseline feature templates include the features described in previous works (Sun and Xu, 2011; Sun et al., 2012). Our algorithm is not necessarily limited to a specific tagger. For simplicity and reliability, we use a simple MaximumEntropy tagger. 3 Experiment 3.1 Data set We evaluate our method using the data from weibo.com, which is the biggest micro-blog service in China. We use the API provided by weibo.com1 to crawl 500,000 micro-blog texts of weibo.com, which contains 24,243,772 characters. To keep the experiment tractable, we first randomly choose 50,000 of all the texts as unlabeled data, which contain 2,420,037 characters. We manually segment 2038 randomly selected microblogs.We follow the segmentation standard as the PKU corpus. In micro-blog texts, the user names and URLs have fixed format. User names start with ’ @ ’, followed by Chinese characters, English letters, numbers and ’ ’, and terminated when meeting punctuations or blanks. URLs also match fixed patterns, which are shortened using ”http : / /t . cn /” plus six random English letters or numbers. Thus user names and URLs can be pre-processed separately. We follow this principle in following experiments. We use the benchmark datasets provided by the second International Chinese Word Segmentation Bakeoff2 as the labeled data. We choose the PKU data in our experiment because our baseline methods use the same segmentation standard. We compare our method with three baseline methods. The first two are both famous Chinese word segmentation tools: ICTCLAS3 and Stanford Chinese word segmenter4, which are widely used in NLP related to word segmentation. 
Stanford Chinese word segmenter is a CRF-based segmentation tool and its segmentation standard is chosen as the PKU standard, which is the same to ours. ICTCLAS, on the other hand, is a HMMbased Chinese word segmenter. Another baseline is Li and Sun (2009), which also uses punctuation in their semi-supervised framework. F-score 1http : / / open . we ibo .com/wiki 2http : / /www . s ighan .org/bakeo f f2 0 0 5 / 3http : / / i c l .org/ ct as 4http : / / nlp . st an ford . edu /pro j ect s / chine s e-nlp . shtml \ # cws 178 评B论-是-风-格-,-评B论-是-能-力-。- BNBBNBBNBBNB Table 2: The first line represents the original text. The second line indicates whether each character is the Beginning of sentence. The third line is the tag sequence using ”BN” tag set. is used as the accuracy measure. The recall of out-of-vocabulary is also taken into consideration, which measures the ability of the model to correctly segment out of vocabulary words. 3.2 Main results methods on the development data. TABLE 4 summarizes the segmentation results. In TABLE 4, Li-Sun is the method in Li and Sun (2009). Maxent only uses the PKU data for training, with neither punctuation information nor self-training framework incorporated. The next 4 methods all require a 100 iteration of self-training. No-punc is the method that only uses self-training while no punctuation information is added. Nobalance is similar to ADD N. The only difference between No-balance and ADD-N is that the former does not balance label ’B’ and label ’N’ . The comparison of Maxent and No-punctuation shows that naively adding confident unlabeled instances does not guarantee to improve performance. The writing style and word formation of the source domain is different from target domain. When segmenting texts of the target domain using models trained on source domain, the performance will be hurt with more false segmented instances added into the training set. The comparison of Maxent, No-balance and ADD-N shows that considering punctuation as well as self-training does improve performance. Both the f-score and OOV-recall increase. By comparing No-balance and ADD-N alone we can find that we achieve relatively high f-score if we ignore tag balance issue, while slightly hurt the OOV-Recall. However, considering it will improve OOV-Recall by about +1.6% and the fscore +0.2%. We also experimented on different size of unlabeled data to evaluate the performance when adding unlabeled target domain data. TABLE 5 shows different f-scores and OOV-Recalls on different unlabeled data set. We note that when the number of texts changes from 0 to 50,000, the f-score and OOV both are improved. However, when unlabeled data changes to 200,000, the performance is a bit decreased, while still better than not using unlabeled data. This result comes from the fact that the method ’ADD-N’ only uses characters behind punctua179 Tabl152S0eiz 0:Segm0.8nP67ta245ion0p.8Rer6745f9om0a.8nF57c6e1witOh0 .d7Vi65f-2394Rernt size of unlabeled data tions from target domain. Taking more texts into consideration means selecting more characters labeling ’N’ from source domain to simulate those in target domain. If too many ’N’s are introduced, the training data will be biased against the true distribution of target domain. 
3.3 Characters ahead of punctuations In the ”BN” tagging method mentioned above, we incorporate characters after punctuations from texts in micro-blog to enlarge training set.We also try an opposite approach, ”EN” tag, which uses ’E’ to represent ”End of word”, and ’N’ to rep- resent ”Not the end of word”. In this contrasting method, we only use charactersjust ahead ofpunctuations. We find that the two methods show similar results. Experiment results with ADD-N are shown in TABLE 6 . 5DU0an0lt a b0Tsiealzbe lde6:0.C8Fo7”m5BNpa”rO0itsOa.o7gVn7-3oRfBN0.8aFn”7E0dNEN”Ot0.aO.g7V6-3R 4 Related Work Recent studies show that character sequence labeling is an effective formulation of Chinese word segmentation (Low et al., 2005; Zhao et al., 2006a,b; Chen et al., 2006; Xue, 2003). These supervised methods show good results, however, are unable to incorporate information from new domain, where OOV problem is a big challenge for the research community. On the other hand unsupervised word segmentation Peng and Schuurmans (2001); Goldwater et al. (2006); Jin and Tanaka-Ishii (2006); Feng et al. (2004); Maosong et al. (1998) takes advantage of the huge amount of raw text to solve Chinese word segmentation problems. However, they usually are less accurate and more complicated than supervised ones. Meanwhile semi-supervised methods have been applied into NLP applications. Bickel et al. (2007) learns a scaling factor from data of source domain and use the distribution to resemble target domain distribution. Wu et al. (2009) uses a Domain adaptive bootstrapping (DAB) framework, which shows good results on Named Entity Recognition. Similar semi-supervised applications include Shen et al. (2004); Daum e´ III and Marcu (2006); Jiang and Zhai (2007); Weinberger et al. (2006). Besides, Sun and Xu (201 1) uses a sequence labeling framework, while unsupervised statistics are used as discrete features in their model, which prove to be effective in Chinese word segmentation. There are previous works using punctuations as implicit annotations. Riley (1989) uses it in sentence boundary detection. Li and Sun (2009) proposed a compromising solution to by using a clas- sifier to select the most confident characters. We do not follow this approach because the initial errors will dramatically harm the performance. Instead, we only add the characters after punctuations which are sure to be the beginning of words (which means labeling ’B’) into our training set. Sun and Xu (201 1) uses punctuation information as discrete feature in a sequence labeling framework, which shows improvement compared to the pure sequence labeling approach. Our method is different from theirs. We use characters after punctuations directly. 5 Conclusion In this paper we have presented an effective yet simple approach to Chinese word segmentation on micro-blog texts. In our approach, punctuation information of unlabeled micro-blog data is used, as well as a self-training framework to incorporate confident instances. Experiments show that our approach improves performance, especially in OOV-recall. Both the punctuation information and the self-training phase contribute to this improve- ment. Acknowledgments This research was partly supported by National High Technology Research and Development Program of China (863 Program) (No. 2012AA01 1101), National Natural Science Foundation of China (No.91024009) and Major National Social Science Fund of China(No. 12&ZD227;). 180 References Bickel, S., Br¨ uckner, M., and Scheffer, T. (2007). 
Discriminative learning for differing training and test distributions. In Proceedings ofthe 24th international conference on Machine learning, pages 81–88. ACM. Chen, W., Zhang, Y., and Isahara, H. (2006). Chinese named entity recognition with conditional random fields. In 5th SIGHAN Workshop on Chinese Language Processing, Australia. Daum e´ III, H. and Marcu, D. (2006). Domain adaptation for statistical classifiers. Journal of Artificial Intelligence Research, 26(1): 101–126. Feng, H., Chen, K., Deng, X., and Zheng, W. (2004). Accessor variety criteria for chinese word extraction. Computational Linguistics, 30(1):75–93. Goldwater, S., Griffiths, T., and Johnson, M. (2006). Contextual dependencies in unsupervised word segmentation. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, pages 673–680. Association for Computational Linguistics. Jiang, J. and Zhai, C. (2007). Instance weighting for domain adaptation in nlp. In Annual Meeting-Association For Computational Linguistics, volume 45, page 264. Jin, Z. and Tanaka-Ishii, K. (2006). Unsupervised segmentation of chinese text by use of branching entropy. In Proceedings of the COLING/ACL on Main conference poster sessions, pages 428–435. Association for Computational Linguistics. Li, Z. and Sun, M. (2009). Punctuation as implicit annotations for chinese word segmentation. Computational Linguistics, 35(4):505– 512. Low, J., Ng, H., and Guo, W. (2005). A maximum entropy approach to chinese word segmentation. In Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing, volume 164. Jeju Island, Korea. Maosong, S., Dayang, S., and Tsou, B. (1998). Chinese word segmentation without using lexicon and hand-crafted training data. In Proceedings of the 1 7th international conference on Computational linguistics-Volume 2, pages 1265–1271 . Association for Computational Linguistics. Pan, S. and Yang, Q. (2010). A survey on transfer learning. Knowledge and Data Engineering, IEEE Transactions on, 22(10): 1345–1359. Peng, F. and Schuurmans, D. (2001). Selfsupervised chinese word segmentation. Advances in Intelligent Data Analysis, pages 238– 247. Riley, M. (1989). Some applications of tree-based modelling to speech and language. In Proceedings of the workshop on Speech and Natural Language, pages 339–352. Association for Computational Linguistics. Shen, D., Zhang, J., Su, J., Zhou, G., and Tan, C. (2004). Multi-criteria-based active learning for named entity recognition. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, page 589. Association for Computational Linguistics. Sun, W. and Xu, J. (201 1). Enhancing chinese word segmentation using unlabeled data. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 970–979. Association for Computational Linguistics. Sun, X., Wang, H., and Li, W. (2012). Fast online training with frequency-adaptive learning rates for chinese word segmentation and new word detection. In Proceedings of the 50th Annual Meeting of the Association for Compu- tational Linguistics (Volume 1: Long Papers), pages 253–262, Jeju Island, Korea. Association for Computational Linguistics. Weinberger, K., Blitzer, J., and Saul, L. (2006). Distance metric learning for large margin nearest neighbor classification. In In NIPS. Citeseer. Wu, D., Lee, W., Ye, N., and Chieu, H. (2009). Domain adaptive bootstrapping for named entity recognition. 
In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3-Volume 3, pages 1523–1532. Association for Computational Linguistics. Xue, N. (2003). Chinese word segmentation as character tagging. Computational Linguistics and Chinese Language Processing, 8(1):29–48. Zhao, H., Huang, C., and Li, M. (2006a). An improved chinese word segmentation system with 181 conditional random field. In Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing, volume 117. Sydney: July. Zhao, H., Huang, C., Li, M., and Lu, B. (2006b). Effective tag set selection in chinese word segmentation via conditional random field modeling. In Proceedings pages of PACLIC, volume 20, 87–94. 182
5 0.65530902 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model
Author: Ulle Endriss ; Raquel Fernandez
Abstract: Crowdsourcing, which offers new ways of cheaply and quickly gathering large amounts of information contributed by volunteers online, has revolutionised the collection of labelled data. Yet, to create annotated linguistic resources from this data, we face the challenge of having to combine the judgements of a potentially large group of annotators. In this paper we investigate how to aggregate individual annotations into a single collective annotation, taking inspiration from the field of social choice theory. We formulate a general formal model for collective annotation and propose several aggregation methods that go beyond the commonly used majority rule. We test some of our methods on data from a crowdsourcing experiment on textual entailment annotation.
6 0.64258307 237 acl-2013-Margin-based Decomposed Amortized Inference
7 0.63524139 210 acl-2013-Joint Word Alignment and Bilingual Named Entity Recognition Using Dual Decomposition
8 0.63299209 297 acl-2013-Recognizing Partial Textual Entailment
9 0.63152593 82 acl-2013-Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation
10 0.63000387 36 acl-2013-Adapting Discriminative Reranking to Grounded Language Learning
11 0.62642044 333 acl-2013-Summarization Through Submodularity and Dispersion
12 0.62591255 70 acl-2013-Bilingually-Guided Monolingual Dependency Grammar Induction
13 0.62459099 275 acl-2013-Parsing with Compositional Vector Grammars
14 0.62333924 369 acl-2013-Unsupervised Consonant-Vowel Prediction over Hundreds of Languages
15 0.62294322 59 acl-2013-Automated Pyramid Scoring of Summaries using Distributional Semantics
16 0.61790091 276 acl-2013-Part-of-Speech Induction in Dependency Trees for Statistical Machine Translation
17 0.61732751 43 acl-2013-Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity
18 0.61723572 260 acl-2013-Nonconvex Global Optimization for Latent-Variable Models
19 0.61717188 131 acl-2013-Dual Training and Dual Prediction for Polarity Classification
20 0.61616611 304 acl-2013-SEMILAR: The Semantic Similarity Toolkit