acl acl2012 acl2012-23 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Dong Wang ; Xian Qian ; Yang Liu
Abstract: This paper presents a two-step approach to compress spontaneous spoken utterances. In the first step, we use a sequence labeling method to determine if a word in the utterance can be removed, and generate n-best compressed sentences. In the second step, we use a discriminative training approach to capture sentence level global information from the candidates and rerank them. For evaluation, we compare our system output with multiple human references. Our results show that the new features we introduced in the first compression step improve performance upon the previous work on the same data set, and reranking is able to yield additional gain, especially when training is performed to take into account multiple references.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract This paper presents a two-step approach to compress spontaneous spoken utterances. [sent-3, score-0.226]
2 In the first step, we use a sequence labeling method to determine if a word in the utterance can be removed, and generate n-best compressed sentences. [sent-4, score-0.359]
3 In the second step, we use a discriminative training approach to capture sentence level global information from the candidates and rerank them. [sent-5, score-0.308]
4 Our results show that the new features we introduced in the first compression step improve performance upon the previous work on the same data set, and reranking is able to yield additional gain, especially when training is performed to take into account multiple references. [sent-7, score-1.013]
5 1 Introduction Sentence compression aims to preserve the most important information in the original sentence with fewer words. [sent-8, score-0.911]
6 It can be used for abstractive summarization where extracted important sentences often need to be compressed and merged. [sent-9, score-0.265]
7 For summarization of spontaneous speech, sentence compression is especially important, since unlike fluent and well-structured written text, spontaneous speech contains many disfluencies and much redundancy. [sent-10, score-1.232]
8 The following shows an example of a pair of source and compressed spoken sentences from human annotation (removed words shown in bold): [original sentence] For speech domains, “sentences” are not clearly defined. [sent-11, score-0.309]
9 We use sentences and utterances interchangeably when there is no ambiguity. [sent-12, score-0.06]
10 and then um in terms of the source the things uh the only things that we had on there I believe were whether. [sent-13, score-0.086]
11 [compressed sentence] and then in terms of the source the only things that we had on there were whether. [sent-16, score-0.043]
12 In this study we investigate sentence compression of spoken utterances in order to remove redundant or unnecessary words while trying to preserve the information in the original sentence. [sent-19, score-0.997]
13 Sentence compression has been studied in domains ranging from formal written text to speech. [sent-20, score-0.812]
14 (Galley and McKeown, 2007) proposes a synchronous context-free grammar (SCFG) based method to compress the sentence. [sent-22, score-0.089]
15 In the speech domain, (Clarke and Lapata, 2008) investigates sentence compression in broadcast news using an integer linear programming approach. [sent-24, score-0.925]
16 There is little existing work in spontaneous speech domains. [sent-25, score-0.157]
17 (Liu and Liu, 2010) modeled it as a sequence labeling problem using a conditional random field model. [sent-26, score-0.131]
18 (Liu and Liu, 2009) compared the effect of different compression methods on a meeting summarization task, but did not evaluate sentence compression itself. [sent-27, score-1.693]
19 We propose to use a two-step approach in this paper for sentence compression of spontaneous speech utterances. [sent-28, score-1.018]
20 In the first step, we adopt a sequence labeling method similar to that used in (Liu and Liu, 2010), but expand the feature set. [sent-30, score-0.19]
21 In the second step, we use discriminative reranking to incorporate global information about the compressed sentence candidates, which cannot be accomplished by word level labeling. [sent-33, score-0.58]
22 • We evaluate our methods using different metrics including word-level accuracy and F1-measure by comparing to one reference compression, and BLEU scores comparing with multiple references. [sent-34, score-0.105]
23 We also demonstrate that training in the reranking module can be tailored to the evaluation metrics to optimize system performance. [sent-35, score-0.239]
24 2 Corpus We use the same corpus as (Liu and Liu, 2010) where they annotated 2,860 summary sentences in 26 meetings from the ICSI meeting corpus (Murray et al. [sent-36, score-0.072]
25 In their annotation procedure, filled pauses such as “uh/um” and incomplete words are removed before annotation. [sent-38, score-0.323]
26 In the first step, 8 annotators were asked to select words to be removed to compress the sentences. [sent-39, score-0.212]
27 In the second step, 6 annotators (different from the first step) were asked to pick the best one from the 8 compressions from the previous step. [sent-40, score-0.348]
28 Therefore for each sentence, we have 8 human compressions, as well as a best one selected by the majority of the 6 annotators in the second step. [sent-41, score-0.077]
29 The compression ratio of the best human reference is 63. [sent-42, score-0.96]
30 In the first step of our sentence compression approach (described below), for model training we need the reference label for each word, which represents whether it is preserved or deleted in the compressed sentence. [sent-44, score-1.394]
31 In (Liu and Liu, 2010), they used the labels from the annotators directly. [sent-45, score-0.102]
32 For each sentence, we still use the best compression as the gold standard, but we realign the pair of the source sentence and the compressed sentence, instead of using the labels provided by annotators. [sent-47, score-1.163]
33 This is because when there are repeated words, annotators sometimes arbitrarily pick which occurrence to remove. [sent-48, score-0.199]
34 Because we want to keep the patterns consistent for model training, we always label the last appearance of the repeated words as ‘preserved’, and the earlier ones as ‘deleted’. [sent-49, score-0.079]
35 Another difference in our processing of the corpus from the previous work is that when aligning the original and the compressed sentence, we keep filled pauses and incomplete words since they tend to appear together with disfluencies and thus provide useful information for compression. [sent-50, score-0.545]
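The realignment rule above (label the last appearance of a repeated word as ‘preserved’ and the earlier appearances as ‘deleted’) can be illustrated with a short sketch. This is not the authors' code; it is a minimal Python illustration that assumes the compressed sentence is a word subsequence of the source and aligns it from right to left.

```python
def label_words(source, compressed):
    """Label each source token 'preserved' or 'deleted'.

    Aligning from right to left matches every compressed token to its
    rightmost feasible occurrence in the source, so a repeated word keeps
    its last appearance and loses the earlier ones.
    """
    labels = ["deleted"] * len(source)
    j = len(compressed) - 1                      # next compressed token to match
    for i in range(len(source) - 1, -1, -1):
        if j >= 0 and source[i] == compressed[j]:
            labels[i] = "preserved"
            j -= 1
    return labels

source = "the things the only things that we had".split()
compressed = "the only things that we had".split()
print(list(zip(source, label_words(source, compressed))))
```

On this toy pair, the first occurrences of “the” and “things” come out ‘deleted’ while their later occurrences are ‘preserved’, which is exactly the consistency the training labels are meant to enforce.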
36 3 Sentence Compression Approach Our compression approach has two steps: in the first step, we use Conditional Random Fields (CRFs) to model this problem as a sequence labeling task, where the label indicates whether the word should be removed or not. [sent-51, score-1.01]
37 We select n-best candidates (n = 25 in our work) from this step. [sent-52, score-0.062]
38 In the second step we use discriminative training based on a maximum entropy model to rerank the candidate compressions, in order to select the best one based on the quality of the whole candidate sentence, which cannot be done in the first step. [sent-53, score-0.339]
39 3.1 Generate N-best Candidates In the first step, we cast sentence compression as a sequence labeling problem. [sent-55, score-0.966]
40 Each sentence with n words can be viewed as a word sequence X1, X2, ..., Xn, and our task is to find the best label sequence Y1, Y2, ..., Yn. [sent-59, score-0.136] [sent-62, score-0.106]
42 Similar to (Liu and Liu, 2010), for sequence labeling we use linear-chain first-order CRFs. [sent-66, score-0.105]
43 To train the model for this step, we use the best reference compression to obtain the reference labels (as described in Section 2). [sent-68, score-1.065]
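As a rough sketch of this first step (not the authors' implementation), a linear-chain CRF over per-word feature dictionaries can be trained with an off-the-shelf toolkit. The sketch below assumes the third-party package sklearn-crfsuite; the feature templates and the exact BIO-style label inventory are illustrative placeholders, and producing an n-best list would additionally require a decoder that supports n-best output.

```python
import sklearn_crfsuite  # assumed third-party CRF toolkit

def word_features(words, pos_tags, i):
    """Illustrative per-word features: surface form, POS tag, and local context."""
    return {
        "word": words[i].lower(),
        "pos": pos_tags[i],
        "prev_word": words[i - 1].lower() if i > 0 else "<s>",
        "next_word": words[i + 1].lower() if i + 1 < len(words) else "</s>",
    }

def sentence_features(words, pos_tags):
    return [word_features(words, pos_tags, i) for i in range(len(words))]

# Toy training example; the labels follow an assumed BIO-style scheme.
words = "and then um in terms of the source".split()
pos_tags = ["CC", "RB", "UH", "IN", "NNS", "IN", "DT", "NN"]
labels = ["B-keep", "I-keep", "B-del", "B-keep", "I-keep", "I-keep", "I-keep", "I-keep"]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
crf.fit([sentence_features(words, pos_tags)], [labels])
print(crf.predict([sentence_features(words, pos_tags)]))
```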
44 In the CRF compression model, each word is represented by a feature vector. [sent-69, score-0.802]
45 In this work, we further expand the feature set in order to represent the characteristics of disfluencies in spontaneous speech as well as to model the adjacent output labels. [sent-72, score-0.25]
46 a binary feature to indicate if there is a filled pause or incomplete word in the following 4-word window. [sent-74, score-0.153]
47 We add this feature since filled pauses or incomplete words often appear after disfluent words. [sent-75, score-0.251]
48 transition features: a combination of the current output label and the previous one, together with some observation features such as the unigram and bigrams of word or POS tag. [sent-79, score-0.089]
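A minimal sketch of the filled-pause window feature described above (again, not the authors' code): it assumes filled pauses are the tokens “uh”/“um” and that incomplete words are marked with a trailing hyphen, both of which are assumptions about the data format.

```python
FILLED_PAUSES = {"uh", "um"}  # illustrative inventory of filled pauses

def filler_in_window(words, i, window=4):
    """Binary feature: does a filled pause or an incomplete word (assumed to
    be marked with a trailing '-') occur in the following `window` words?"""
    following = words[i + 1 : i + 1 + window]
    return any(w.lower() in FILLED_PAUSES or w.endswith("-") for w in following)

words = "the things uh the only things we had".split()
print([filler_in_window(words, i) for i in range(len(words))])
```

Note that a first-order linear-chain CRF already models plain label-to-label transitions; the label–observation conjunctions described above would need to be added as explicit features in most toolkits.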
49 In this work, we propose to use discriminative training to rerank the candidates generated in the first step. [sent-82, score-0.192]
50 Reranking has been used in many tasks to find better global solutions, such as machine translation (Wang et al., 2007), parsing (Charniak and Johnson, 2005), and disfluency detection (Zwarts and Johnson, 2011). [sent-83, score-0.025] [sent-84, score-0.055]
52 We use a maximum entropy reranker to learn distributions over a set of candidates such that the probability of the best compression is maximized. [sent-85, score-0.862]
53 The conditional probability of output y given observation x in the maximum entropy model is defined as: $p(y|x) = \frac{1}{Z(x)} \exp\left( \sum_{i=1}^{k} \lambda_i f_i(x, y) \right)$, where $f_i(x, y)$ are feature functions and $\lambda_i$ are their weighting parameters; $Z(x)$ is the normalization factor. [sent-86, score-0.081]
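A small sketch of how this conditional probability can be evaluated over an n-best list (a generic log-linear scorer, not the authors' code; the learned weights and the sparse feature dictionaries are assumed inputs):

```python
import math

def maxent_probs(candidate_feats, weights):
    """p(y|x) = exp(w . f(x, y)) / Z(x) over the candidates of one sentence.

    candidate_feats: list of sparse feature dicts, one per candidate y.
    weights: dict mapping feature names to learned weights.
    """
    scores = [sum(weights.get(name, 0.0) * value for name, value in feats.items())
              for feats in candidate_feats]
    m = max(scores)                              # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

print(maxent_probs([{"rank": 1.0}, {"rank": 2.0}], {"rank": -0.5}))
```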
54 In this reranking model, every compression candidate is represented by the following features (a feature-extraction sketch follows the list): • All the bigrams and trigrams of words and POS tags in the candidate sentence. [sent-87, score-1.144]
55 • Bigrams and trigrams of words and POS tags in the original sentence in combination with their binary labels in the candidate sentence (delete the word or not). [sent-88, score-0.268]
56 For example, if the original sentence is “so I should go”, and the candidate compression sentence is “I should go”, then “so I 10”, “so I should 100” are included in the features (1 means the word is deleted). [sent-89, score-0.996]
57 • The log likelihood of the candidate sentence based on the language model. [sent-90, score-0.209]
58 • The absolute difference of the compression ratio of the candidate sentence with that of the first ranked candidate. [sent-91, score-0.792]
59 This is because we try to avoid a very large or small compression ratio, and the first candidate is generally a good candidate with reasonable length. [sent-92, score-0.896]
60 • The probability of the label sequence of the candidate sentence given by the first-step CRFs. [sent-93, score-0.184]
61 • The rank of the candidate sentence in the 25-best list. [sent-94, score-0.184]
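Below is the feature-extraction sketch referenced above. It is an approximation, not the paper's exact feature set: the n-gram templates are simplified, the label-conjoined n-grams on the original sentence are omitted, and the language-model and CRF scores are assumed to be supplied by the earlier components.

```python
def candidate_features(cand_words, cand_pos, src_len, first_cand_ratio,
                       lm_loglik, crf_seq_logprob, rank):
    """Candidate-level features for reranking (illustrative names only)."""
    feats = {}
    for n in (2, 3):                                    # word and POS n-grams
        for i in range(len(cand_words) - n + 1):
            feats["w:" + "_".join(cand_words[i:i + n])] = 1.0
            feats["p:" + "_".join(cand_pos[i:i + n])] = 1.0
    ratio = len(cand_words) / src_len if src_len else 0.0
    feats["ratio_diff_vs_first"] = abs(ratio - first_cand_ratio)
    feats["lm_loglik"] = lm_loglik
    feats["crf_seq_logprob"] = crf_seq_logprob
    feats["rank"] = float(rank)
    return feats

print(candidate_features("I should go".split(), ["PRP", "MD", "VB"],
                         src_len=4, first_cand_ratio=0.75,
                         lm_loglik=-12.3, crf_seq_logprob=-1.8, rank=1))
```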
62 For discriminative training using the n-best candidates, we need to identify the best candidate from the n-best list, which can be either the reference compression (if it exists on the list), or the most similar candidate to the reference. [sent-95, score-1.088]
63 Since we have 8 human compressions and also want to evaluate system performance using all of them (see experiments later), we try to use multiple references in this reranking step. [sent-96, score-0.435]
64 If no reference compression appears in the 25-best list, we just keep the entire list and label the instance that is most similar to the best reference compression as positive. [sent-98, score-1.87]
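A sketch of how the positive training instance might be chosen from an n-best list under the rule just described. This is a rough stand-in: the similarity function here is word-overlap F1, which is an assumption, since the excerpt does not spell out the measure used.

```python
def pick_positive(candidates, references):
    """Return the index of the oracle candidate: an exact match to any
    reference if one is present, otherwise the candidate with the highest
    word-overlap F1 against the best (first) reference."""
    reference_set = {tuple(ref) for ref in references}
    for idx, cand in enumerate(candidates):
        if tuple(cand) in reference_set:
            return idx

    def overlap_f1(cand, ref):
        common = len(set(cand) & set(ref))
        if common == 0:
            return 0.0
        precision = common / len(set(cand))
        recall = common / len(set(ref))
        return 2 * precision * recall / (precision + recall)

    best_ref = references[0]
    return max(range(len(candidates)),
               key=lambda i: overlap_f1(candidates[i], best_ref))

candidates = ["I should go".split(), "so I should go".split()]
references = ["I should go now".split()]
print(pick_positive(candidates, references))  # 0: closest to the reference
```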
65 4 Experiments We perform a cross-validation evaluation where one meeting is used for testing and the rest of them are used as the training set. [sent-99, score-0.024]
66 When evaluating the system performance, we do not consider filled pauses and incomplete words since they can be easily identified and removed. [sent-100, score-0.219]
67 We use two different performance metrics in this study. [sent-101, score-0.025]
68 These measures are obtained by comparing with the best compression. [sent-104, score-0.03]
69 • In evaluation we map the result using ‘BIO’ labels from the first-step compression to binary labels that indicate whether a word is removed or not. [sent-105, score-1.012]
70 Since there is a great variation in human compression results, and we have 8 reference compressions, we explore using BLEU for our sentence compression task. [sent-110, score-1.736]
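For the multi-reference BLEU evaluation, a sketch using NLTK (an assumed dependency; the excerpt does not say which BLEU implementation was used, and the references here are toy stand-ins for the 8 human compressions):

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Toy stand-ins for the human reference compressions and one system output.
references = ["and then in terms of the source the only things we had".split(),
              "in terms of the source the only things that we had".split()]
hypothesis = "and then in terms of the source the only things that we had".split()

smooth = SmoothingFunction().method1   # avoids zero scores on short sentences
print(round(sentence_bleu(references, hypothesis, smoothing_function=smooth), 3))
```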
71 Table 1 shows the averaged scores of the cross validation evaluation using the above metrics for several methods. [sent-113, score-0.025]
72 Also shown in the table is the compression ratio of the system output. [sent-114, score-0.825]
73 For “reference”, we randomly choose one compression from 8 references, and use the rest of them as references in calculating the BLEU score. [sent-115, score-0.77]
74 The row “basic features” shows the result of using all features in (Liu and Liu, 2010) except discourse parsing tree based features, and using binary labels (removed or not). [sent-117, score-0.114]
75 The next row uses this same basic feature set and “BIO” labels. [sent-118, score-0.063]
76 Row “expanded features” shows the result of our expanded feature set using the “BIO” label set from the first step of compression. [sent-119, score-0.116]
77 The last two rows show the results after reranking, trained using one best reference or 8 reference compressions, respectively. [sent-120, score-0.24]
78 We find no gain from reranking when training uses only the one best compression; however, training with multiple references improves BLEU scores. [sent-121, score-0.03]
79 This indicates that the discriminative training used in maximum entropy reranking is consistent with the performance metrics. [sent-122, score-0.277]
80 Another reason for the performance gain for this condition is that there is less data imbalance in model training (since we split the n-best list, each containing fewer negative examples). [sent-123, score-0.049]
81 We also notice that the compression ratio after reranking is closer to that of the reference. [sent-124, score-1.015]
82 As pointed out in (Napoles et al., 2011), it is not appropriate to compare compression systems with different compression ratios, especially when considering grammar and meaning. [sent-126, score-1.568]
83 Therefore for the compression system without reranking, we generated results with the same compression ratio (77.15%), and found that using reranking still outperforms this result. [sent-127, score-1.595] [sent-128, score-0.19]
85 For an analysis, we check how often our system output contains reference compressions based on the 8 references. [sent-130, score-0.35]
86 8% of system-generated compressions appear in the 8 references when using CRF output with a compression ratio of 77. [sent-132, score-1.037]
87 In 7% of sentences, the 25-best list contains one or more reference sentences. [sent-136, score-0.138]
[Table 1: word accuracy, F1-measure, BLEU score, and compression ratio of the reference and of each system configuration (basic features, basic features with BIO labels, expanded features, reranking with 1 or 8 references).]
89 Our result using the basic feature set is similar to that in (Liu and Liu, 2010) (their accuracy is 76.7), though the experimental setups are different: they used 6 meetings as the test set while we performed cross validation. [sent-147, score-0.056] [sent-149, score-0.048]
91 Using the “BIO” label set instead of binary labels yields a marginal improvement on the three scores. [sent-150, score-0.114]
92 From the table, we can see that our expanded feature set is able to significantly improve the result, suggesting the effectiveness of the newly introduced features. [sent-151, score-0.085]
93 Regarding the two training settings in reranking, we find that there is no gain from reranking when training uses only the one best compression. To summarize our approach: we compress each source sentence using a sequence labeling method, then rerank the n-best candidates to select the best one based on the quality of the whole candidate sentence using discriminative training. [sent-152, score-0.787]
94 Our results show that our expanded feature set improves the performance across multiple metrics, and reranking is able to improve the BLEU score. [sent-154, score-0.275]
95 In future work, we will incorporate more syntactic information in the model to better evaluate sentence quality. [sent-155, score-0.091]
96 We also plan to perform a human evaluation for the compressed sentences, and use sentence compression in summarization. [sent-156, score-1.078]
97 Global inference for sentence compression: an integer linear programming approach. [sent-168, score-0.883]
98 From extractive to abstractive meeting summaries: can it be done by sentence compression? [sent-185, score-0.199]
99 Using spoken utterance compression for meeting summarization: a pilot study. [sent-189, score-0.881]
100 The impact of language models and loss functions on repair disfluency detection. [sent-207, score-0.079]
wordName wordTfidf (topN-words)
[('compression', 0.77), ('compressions', 0.245), ('compressed', 0.217), ('reranking', 0.19), ('liu', 0.117), ('spontaneous', 0.115), ('reference', 0.105), ('removed', 0.104), ('bio', 0.101), ('pauses', 0.095), ('sentence', 0.091), ('bleu', 0.079), ('rerank', 0.073), ('filled', 0.071), ('deleted', 0.067), ('candidate', 0.063), ('candidates', 0.062), ('disfluencies', 0.061), ('compress', 0.061), ('labeling', 0.06), ('discriminative', 0.057), ('yk', 0.057), ('labels', 0.055), ('ratio', 0.055), ('disfluency', 0.055), ('zwarts', 0.055), ('incomplete', 0.053), ('expanded', 0.053), ('step', 0.053), ('spoken', 0.05), ('abstractive', 0.048), ('meetings', 0.048), ('murray', 0.048), ('napoles', 0.048), ('tbhieg', 0.048), ('annotators', 0.047), ('sequence', 0.045), ('things', 0.043), ('speech', 0.042), ('bigrams', 0.042), ('summarization', 0.038), ('utterance', 0.037), ('extractive', 0.036), ('preserved', 0.036), ('utterances', 0.036), ('clarke', 0.035), ('list', 0.033), ('feature', 0.032), ('label', 0.031), ('oef', 0.031), ('trigrams', 0.031), ('row', 0.031), ('crfs', 0.03), ('best', 0.03), ('entropy', 0.03), ('cohn', 0.029), ('grammars', 0.028), ('preserve', 0.028), ('binary', 0.028), ('conditional', 0.026), ('keep', 0.026), ('pick', 0.026), ('observation', 0.025), ('fragment', 0.025), ('gain', 0.025), ('metrics', 0.025), ('global', 0.025), ('meeting', 0.024), ('fei', 0.024), ('interchangeably', 0.024), ('imbalance', 0.024), ('anc', 0.024), ('ethle', 0.024), ('iare', 0.024), ('ibn', 0.024), ('ishould', 0.024), ('jfj', 0.024), ('nle', 0.024), ('renals', 0.024), ('repair', 0.024), ('tailed', 0.024), ('thoede', 0.024), ('wdo', 0.024), ('galley', 0.024), ('mirella', 0.024), ('darpa', 0.023), ('lapata', 0.023), ('charniak', 0.023), ('integer', 0.022), ('repeated', 0.022), ('yang', 0.022), ('ain', 0.022), ('courtney', 0.022), ('dallas', 0.022), ('didate', 0.022), ('lse', 0.022), ('oefn', 0.022), ('ration', 0.022), ('original', 0.022), ('unigram', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 23 acl-2012-A Two-step Approach to Sentence Compression of Spoken Utterances
Author: Dong Wang ; Xian Qian ; Yang Liu
Abstract: This paper presents a two-step approach to compress spontaneous spoken utterances. In the first step, we use a sequence labeling method to determine if a word in the utterance can be removed, and generate n-best compressed sentences. In the second step, we use a discriminative training approach to capture sentence level global information from the candidates and rerank them. For evaluation, we compare our system output with multiple human references. Our results show that the new features we introduced in the first compression step improve performance upon the previous work on the same data set, and reranking is able to yield additional gain, especially when training is performed to take into account multiple references.
2 0.35826874 176 acl-2012-Sentence Compression with Semantic Role Constraints
Author: Katsumasa Yoshikawa ; Ryu Iida ; Tsutomu Hirao ; Manabu Okumura
Abstract: For sentence compression, we propose new semantic constraints to directly capture the relations between a predicate and its arguments, whereas the existing approaches have focused on relatively shallow linguistic properties, such as lexical and syntactic information. These constraints are based on semantic roles and superior to the constraints of syntactic dependencies. Our empirical evaluation on the Written News Compression Corpus (Clarke and Lapata, 2008) demonstrates that our system achieves results comparable to other state-of-the-art techniques.
3 0.097343996 101 acl-2012-Fully Abstractive Approach to Guided Summarization
Author: Pierre-Etienne Genest ; Guy Lapalme
Abstract: This paper shows that full abstraction can be accomplished in the context of guided summarization. We describe a work in progress that relies on Information Extraction, statistical content selection and Natural Language Generation. Early results already demonstrate the effectiveness of the approach.
4 0.086425312 177 acl-2012-Sentence Dependency Tagging in Online Question Answering Forums
Author: Zhonghua Qu ; Yang Liu
Abstract: Online forums are becoming a popular resource in the state of the art question answering (QA) systems. Because of its nature as an online community, it contains more updated knowledge than other places. However, going through tedious and redundant posts to look for answers could be very time consuming. Most prior work focused on extracting only question answering sentences from user conversations. In this paper, we introduce the task of sentence dependency tagging. Finding dependency structure can not only help find answer quickly but also allow users to trace back how the answer is concluded through user conversations. We use linear-chain conditional random fields (CRF) for sentence type tagging, and a 2D CRF to label the dependency relation between sentences. Our experimental results show that our proposed approach performs well for sentence dependency tagging. This dependency information can benefit other tasks such as thread ranking and answer summarization in online forums.
5 0.073940068 141 acl-2012-Maximum Expected BLEU Training of Phrase and Lexicon Translation Models
Author: Xiaodong He ; Li Deng
Abstract: This paper proposes a new discriminative training method in constructing phrase and lexicon translation models. In order to reliably learn a myriad of parameters in these models, we propose an expected BLEU score-based utility function with KL regularization as the objective, and train the models on a large parallel dataset. For training, we derive growth transformations for phrase and lexicon translation probabilities to iteratively improve the objective. The proposed method, evaluated on the Europarl German-to-English dataset, leads to a 1.1 BLEU point improvement over a state-of-the-art baseline translation system. In IWSLT 201 1 Benchmark, our system using the proposed method achieves the best Chinese-to-English translation result on the task of translating TED talks.
6 0.061528265 57 acl-2012-Concept-to-text Generation via Discriminative Reranking
7 0.056127295 146 acl-2012-Modeling Topic Dependencies in Hierarchical Text Categorization
8 0.055743691 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets
9 0.054392107 154 acl-2012-Native Language Detection with Tree Substitution Grammars
10 0.052761249 38 acl-2012-Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing
11 0.051550083 131 acl-2012-Learning Translation Consensus with Structured Label Propagation
12 0.04643723 109 acl-2012-Higher-order Constituent Parsing and Parser Combination
13 0.046266578 67 acl-2012-Deciphering Foreign Language by Combining Language Models and Context Vectors
14 0.045485158 123 acl-2012-Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in SMT
15 0.045385554 52 acl-2012-Combining Coherence Models and Machine Translation Evaluation Metrics for Summarization Evaluation
16 0.045108166 178 acl-2012-Sentence Simplification by Monolingual Machine Translation
17 0.044671129 158 acl-2012-PORT: a Precision-Order-Recall MT Evaluation Metric for Tuning
18 0.043863039 3 acl-2012-A Class-Based Agreement Model for Generating Accurately Inflected Translations
19 0.042395897 2 acl-2012-A Broad-Coverage Normalization System for Social Media Language
20 0.042246576 19 acl-2012-A Ranking-based Approach to Word Reordering for Statistical Machine Translation
topicId topicWeight
[(0, -0.151), (1, -0.013), (2, -0.039), (3, 0.006), (4, -0.003), (5, 0.007), (6, -0.02), (7, 0.031), (8, -0.022), (9, 0.085), (10, -0.034), (11, -0.078), (12, -0.045), (13, 0.019), (14, -0.145), (15, 0.071), (16, 0.129), (17, -0.062), (18, 0.068), (19, 0.024), (20, 0.177), (21, 0.028), (22, 0.157), (23, -0.082), (24, 0.079), (25, 0.121), (26, 0.087), (27, 0.022), (28, -0.288), (29, -0.156), (30, 0.084), (31, 0.126), (32, -0.195), (33, 0.056), (34, 0.012), (35, 0.182), (36, 0.265), (37, -0.067), (38, 0.16), (39, 0.127), (40, 0.033), (41, 0.251), (42, 0.123), (43, -0.032), (44, 0.056), (45, 0.012), (46, 0.053), (47, 0.178), (48, 0.074), (49, 0.002)]
simIndex simValue paperId paperTitle
same-paper 1 0.93790382 23 acl-2012-A Two-step Approach to Sentence Compression of Spoken Utterances
Author: Dong Wang ; Xian Qian ; Yang Liu
Abstract: This paper presents a two-step approach to compress spontaneous spoken utterances. In the first step, we use a sequence labeling method to determine if a word in the utterance can be removed, and generate n-best compressed sentences. In the second step, we use a discriminative training approach to capture sentence level global information from the candidates and rerank them. For evaluation, we compare our system output with multiple human references. Our results show that the new features we introduced in the first compression step improve performance upon the previous work on the same data set, and reranking is able to yield additional gain, especially when training is performed to take into account multiple references.
2 0.79741651 176 acl-2012-Sentence Compression with Semantic Role Constraints
Author: Katsumasa Yoshikawa ; Ryu Iida ; Tsutomu Hirao ; Manabu Okumura
Abstract: For sentence compression, we propose new semantic constraints to directly capture the relations between a predicate and its arguments, whereas the existing approaches have focused on relatively shallow linguistic properties, such as lexical and syntactic information. These constraints are based on semantic roles and superior to the constraints of syntactic dependencies. Our empirical evaluation on the Written News Compression Corpus (Clarke and Lapata, 2008) demonstrates that our system achieves results comparable to other state-of-the-art techniques.
3 0.32478625 195 acl-2012-The Creation of a Corpus of English Metalanguage
Author: Shomir Wilson
Abstract: Metalanguage is an essential linguistic mechanism which allows us to communicate explicit information about language itself. However, it has been underexamined in research in language technologies, to the detriment of the performance of systems that could exploit it. This paper describes the creation of the first tagged and delineated corpus of English metalanguage, accompanied by an explicit definition and a rubric for identifying the phenomenon in text. This resource will provide a basis for further studies of metalanguage and enable its utilization in language technologies.
4 0.3209123 101 acl-2012-Fully Abstractive Approach to Guided Summarization
Author: Pierre-Etienne Genest ; Guy Lapalme
Abstract: This paper shows that full abstraction can be accomplished in the context of guided summarization. We describe a work in progress that relies on Information Extraction, statistical content selection and Natural Language Generation. Early results already demonstrate the effectiveness of the approach.
5 0.30776766 133 acl-2012-Learning to "Read Between the Lines" using Bayesian Logic Programs
Author: Sindhu Raghavan ; Raymond Mooney ; Hyeonseo Ku
Abstract: Most information extraction (IE) systems identify facts that are explicitly stated in text. However, in natural language, some facts are implicit, and identifying them requires “reading between the lines”. Human readers naturally use common sense knowledge to infer such implicit information from the explicitly stated facts. We propose an approach that uses Bayesian Logic Programs (BLPs), a statistical relational model combining firstorder logic and Bayesian networks, to infer additional implicit information from extracted facts. It involves learning uncertain commonsense knowledge (in the form of probabilistic first-order rules) from natural language text by mining a large corpus of automatically extracted facts. These rules are then used to derive additional facts from extracted information using BLP inference. Experimental evaluation on a benchmark data set for machine reading demonstrates the efficacy of our approach.
6 0.26868728 75 acl-2012-Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing
7 0.25084803 177 acl-2012-Sentence Dependency Tagging in Online Question Answering Forums
8 0.24859037 2 acl-2012-A Broad-Coverage Normalization System for Social Media Language
9 0.23380373 113 acl-2012-INPROwidth.3emiSS: A Component for Just-In-Time Incremental Speech Synthesis
10 0.23032002 178 acl-2012-Sentence Simplification by Monolingual Machine Translation
11 0.20972687 56 acl-2012-Computational Approaches to Sentence Completion
12 0.19792506 141 acl-2012-Maximum Expected BLEU Training of Phrase and Lexicon Translation Models
13 0.19525969 156 acl-2012-Online Plagiarized Detection Through Exploiting Lexical, Syntax, and Semantic Information
14 0.19219695 154 acl-2012-Native Language Detection with Tree Substitution Grammars
15 0.18980053 32 acl-2012-Automated Essay Scoring Based on Finite State Transducer: towards ASR Transcription of Oral English Speech
16 0.18794015 43 acl-2012-Building Trainable Taggers in a Web-based, UIMA-Supported NLP Workbench
17 0.18729831 83 acl-2012-Error Mining on Dependency Trees
18 0.18160225 57 acl-2012-Concept-to-text Generation via Discriminative Reranking
19 0.18034564 95 acl-2012-Fast Syntactic Analysis for Statistical Language Modeling via Substructure Sharing and Uptraining
20 0.18026094 123 acl-2012-Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in SMT
topicId topicWeight
[(26, 0.031), (28, 0.044), (30, 0.01), (37, 0.023), (39, 0.033), (44, 0.245), (74, 0.034), (82, 0.046), (84, 0.029), (85, 0.031), (90, 0.207), (92, 0.055), (94, 0.05), (99, 0.054)]
simIndex simValue paperId paperTitle
same-paper 1 0.78898668 23 acl-2012-A Two-step Approach to Sentence Compression of Spoken Utterances
Author: Dong Wang ; Xian Qian ; Yang Liu
Abstract: This paper presents a two-step approach to compress spontaneous spoken utterances. In the first step, we use a sequence labeling method to determine if a word in the utterance can be removed, and generate n-best compressed sentences. In the second step, we use a discriminative training approach to capture sentence level global information from the candidates and rerank them. For evaluation, we compare our system output with multiple human references. Our results show that the new features we introduced in the first compression step improve performance upon the previous work on the same data set, and reranking is able to yield additional gain, especially when training is performed to take into account multiple references.
2 0.77081287 172 acl-2012-Selective Sharing for Multilingual Dependency Parsing
Author: Tahira Naseem ; Regina Barzilay ; Amir Globerson
Abstract: We present a novel algorithm for multilingual dependency parsing that uses annotations from a diverse set of source languages to parse a new unannotated language. Our motivation is to broaden the advantages of multilingual learning to languages that exhibit significant differences from existing resource-rich languages. The algorithm learns which aspects of the source languages are relevant for the target language and ties model parameters accordingly. The model factorizes the process of generating a dependency tree into two steps: selection of syntactic dependents and their ordering. Being largely languageuniversal, the selection component is learned in a supervised fashion from all the training languages. In contrast, the ordering decisions are only influenced by languages with similar properties. We systematically model this cross-lingual sharing using typological features. In our experiments, the model consistently outperforms a state-of-the-art multilingual parser. The largest improvement is achieved on the non Indo-European languages yielding a gain of 14.4%.1
Author: Weiwei Sun ; Hans Uszkoreit
Abstract: From the perspective of structural linguistics, we explore paradigmatic and syntagmatic lexical relations for Chinese POS tagging, an important and challenging task for Chinese language processing. Paradigmatic lexical relations are explicitly captured by word clustering on large-scale unlabeled data and are used to design new features to enhance a discriminative tagger. Syntagmatic lexical relations are implicitly captured by constituent parsing and are utilized via system combination. Experiments on the Penn Chinese Treebank demonstrate the importance of both paradigmatic and syntagmatic relations. Our linguistically motivated approaches yield a relative error reduction of 18% in total over a stateof-the-art baseline.
4 0.68869531 9 acl-2012-A Cost Sensitive Part-of-Speech Tagging: Differentiating Serious Errors from Minor Errors
Author: Hyun-Je Song ; Jeong-Woo Son ; Tae-Gil Noh ; Seong-Bae Park ; Sang-Jo Lee
Abstract: All types of part-of-speech (POS) tagging errors have been equally treated by existing taggers. However, the errors are not equally important, since some errors affect the performance of subsequent natural language processing (NLP) tasks seriously while others do not. This paper aims to minimize these serious errors while retaining the overall performance of POS tagging. Two gradient loss functions are proposed to reflect the different types of errors. They are designed to assign a larger cost to serious errors and a smaller one to minor errors. Through a set of POS tagging experiments, it is shown that the classifier trained with the proposed loss functions reduces serious errors compared to state-of-the-art POS taggers. In addition, the experimental result on text chunking shows that fewer serious errors help to improve the performance of sub- sequent NLP tasks.
5 0.68727452 3 acl-2012-A Class-Based Agreement Model for Generating Accurately Inflected Translations
Author: Spence Green ; John DeNero
Abstract: When automatically translating from a weakly inflected source language like English to a target language with richer grammatical features such as gender and dual number, the output commonly contains morpho-syntactic agreement errors. To address this issue, we present a target-side, class-based agreement model. Agreement is promoted by scoring a sequence of fine-grained morpho-syntactic classes that are predicted during decoding for each translation hypothesis. For English-to-Arabic translation, our model yields a +1.04 BLEU average improvement over a state-of-the-art baseline. The model does not require bitext or phrase table annotations and can be easily implemented as a feature in many phrase-based decoders. 1
6 0.68653631 150 acl-2012-Multilingual Named Entity Recognition using Parallel Data and Metadata from Wikipedia
7 0.68603528 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets
9 0.68500602 25 acl-2012-An Exploration of Forest-to-String Translation: Does Translation Help or Hurt Parsing?
10 0.68408823 140 acl-2012-Machine Translation without Words through Substring Alignment
11 0.68306804 131 acl-2012-Learning Translation Consensus with Structured Label Propagation
12 0.68063259 119 acl-2012-Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese
13 0.68037778 118 acl-2012-Improving the IBM Alignment Models Using Variational Bayes
14 0.67760843 156 acl-2012-Online Plagiarized Detection Through Exploiting Lexical, Syntax, and Semantic Information
15 0.67715335 213 acl-2012-Utilizing Dependency Language Models for Graph-based Dependency Parsing Models
16 0.67702329 73 acl-2012-Discriminative Learning for Joint Template Filling
17 0.67685783 116 acl-2012-Improve SMT Quality with Automatically Extracted Paraphrase Rules
18 0.67663044 168 acl-2012-Reducing Approximation and Estimation Errors for Chinese Lexical Processing with Heterogeneous Annotations
19 0.676292 55 acl-2012-Community Answer Summarization for Multi-Sentence Question with Group L1 Regularization
20 0.6752829 217 acl-2012-Word Sense Disambiguation Improves Information Retrieval