acl acl2011 acl2011-51 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Fahad Alotaiby
Abstract: Arabic is a morphologically complex language. Affixes and clitics are regularly attached to stems, which makes direct comparison between words impractical. In this paper we propose a new automatic headline generation technique that utilizes character cross-correlation to extract the best headlines and to overcome Arabic's complex morphology. The system that uses character cross-correlation achieves a ROUGE-L score of 0.19384, while exact word matching scores only 0.17252 for the same set of documents.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: Arabic is a morphologically complex language. [sent-5, score-0.066]
2 Affixes and clitics are regularly attached to stems, which makes direct comparison between words impractical. [sent-6, score-0.097]
3 In this paper we propose a new automatic headline generation technique that utilizes character cross-correlation to extract the best headlines and to overcome Arabic's complex morphology. [sent-7, score-1.18]
4 The system that uses character cross-correlation achieves a ROUGE-L score of 0.19384, [sent-8, score-0.104]
5 while exact word matching scores only 0.17252 for the same set of documents. [sent-9, score-0.041]
6 1 Introduction: A headline is a condensed summary of a document. [sent-11, score-0.691]
7 The necessity for automatic headline generation arises from the need to handle huge amounts of documents, which is a tedious and time-consuming process. [sent-13, score-0.714]
8 Instead of reading every document, the headline can be used to decide which of them contains important information. [sent-14, score-0.69]
9 There are two major approaches to automatic headline generation: extractive and abstractive. [sent-15, score-0.696]
10 In the work of Douzidia and Lapalme (2004), an extractive method was used to produce a 10-word summary (which can be considered a headline) of an Arabic document, which was then automatically translated into English. [sent-16, score-0.068]
11 Therefore, the reported score reflects the accuracy of the generation and translation together, which makes it difficult to evaluate the headline generation process of this system. [sent-17, score-0.774]
12 Hedge Trimmer (Dorr et al., 2003) is a system that creates a headline for an English newspaper story using linguistically motivated heuristics to choose a potential headline. [sent-19, score-0.691]
13 1 Headline Length: One of the tasks of the Document Understanding Conference 2004 (DUC 2004) was generating a very short summary, which can be considered a headline. [sent-22, score-0.041]
14 Knowing that the average word size in Arabic is 5 characters (Alotaiby et al., 2009), [sent-24, score-0.048]
15 in addition to space characters, the specified summary size was roughly equivalent to 12 Arabic words. [sent-25, score-0.041]
16 Meanwhile, the average headline length in the Arabic Gigaword corpus (Graff, 2007), a collection of articles and their headlines, was about 8 words. [sent-26, score-0.366]
17 In this work, a 10-word headline is considered an appropriate length. [sent-27, score-0.65]
18 Each of the 28 letters of the Arabic alphabet represents a single consonant. [sent-30, score-0.021]
19 To overcome the problem of different pronunciations of consonants in Arabic text, graphical signs known as diacritics were invented in the seventh century. [sent-31, score-0.035]
21 Currently, in Modern Standard Arabic (MSA), diacritics are omitted from written text almost all the time. [sent-34, score-0.052]
22 As a result, this omission increases the number of homographs (words with the same written form). [sent-35, score-0.042]
23 However, Arab readers normally differentiate between homographs using the surrounding context. [sent-36, score-0.042]
24 An Arabic word may be constructed out of a stem plus affixes and clitics. [sent-38, score-0.054]
25 Furthermore, some parts of the stem may be deleted or modified when a clitic is appended to it, according to specific orthographic rules. [sent-39, score-0.121]
26 As a result of omitted diacritics, complex morphology and differing orthographic rules, two instances of the same word may be regarded as different if compared literally. [sent-41, score-0.091]
27 2 Evaluation Tools: Correctly evaluating the automatically generated headlines is an important phase. [sent-42, score-0.363]
28 Automatic methods for evaluating machine-generated headlines are preferred over human evaluations because they are faster, cost-effective and can be performed repeatedly. [sent-43, score-0.363]
29 However, they are not trivial because of various factors such as the readability of headlines and the adequacy of headlines (whether headlines indicate the main content of the news story). [sent-44, score-1.052]
30 Nevertheless, there are some automatic metrics available for headline evaluation. [sent-46, score-0.669]
31 ROUGE is a system for measuring the quality of a summary by comparing it to a correct summary created by humans. [sent-50, score-0.082]
32 ROUGE provides several different measures, namely ROUGE-n (usually n = 1, 2, 3, 4), ROUGE-L, ROUGE-W, ROUGE-S and ROUGE-SU. [sent-51, score-0.02]
33 The Arabic Gigaword is a collection of text data extracted from newswire archives of Arabic news sources and their titles that have been gathered over several years by the Linguistic Data Consortium (LDC) at the University of Pennsylvania. [sent-54, score-0.038]
34 The Arabic Gigaword corpus contains almost two million documents with nearly 600 million words. [sent-56, score-0.026]
35 For this work, 260 documents were selected from the corpus based on the following steps: • 3,170 documents were selected automatically according to the following criteria: i. [sent-57, score-0.052]
36 The length of the document body is between 300 and 1,000 words; ii. [sent-58, score-0.183]
37 The length of the headline (hereafter called the original headline) is between 7 and 15 words; iii. [sent-59, score-0.672]
38 All words in the original headline must be found in the document body. [sent-61, score-0.735]
39 • 260 documents were then randomly selected from the 3,170 documents. [sent-62, score-0.026]
40 After automatically generating the headlines, 3 native Arabic-speaking examiners were hired to evaluate one of the generated headlines as well as the original headline. [sent-63, score-0.403]
41 Also, they were asked to generate one headline each for every document. [sent-64, score-0.69]
42 These 3 new headlines were used as reference headlines in ROUGE to evaluate all automatically generated headlines and the original headline. [sent-65, score-1.051]
43 4 Headline Extraction Techniques: The main idea of the proposed method is to extract the most appropriate sequence of consecutive words (a phrase) from the document body to serve as an adequate headline for that document. [sent-66, score-0.811]
44 Those headlines are then evaluated by calculating ROUGE scores against a set of 3 reference headlines. [sent-67, score-0.419]
45 To do so, first, a list of nominated headlines was created from the document body. [sent-68, score-0.827]
46 After this, four different scoring methods were applied to choose, from the nominated list, the headline that best reflects the idea of the document. [sent-69, score-1.153]
47 The task of these methods is to find the most suitable headline for the document. [sent-70, score-0.65]
48 The idea is to choose the headline that contains the largest number of the most frequent words in the document, while ignoring stop words and giving earlier sentences in the document more weight. [sent-71, score-0.761]
49 1 Nominating a List of Headlines: A window of 10 words was passed over the paragraphs word by word to generate chunks of consecutive words that could be used as headlines. [sent-73, score-0.046]
50 Moving the window one word at a time may corrupt the fluency of the sentences. [sent-74, score-0.019]
51 Therefore, the document body was divided into smaller paragraphs at new-line, comma, colon and period characters. [sent-76, score-0.185]
52 This step increased the number of nominated headlines with proper start and end points. [sent-77, score-0.742]
53 The result is a list of nominated headlines, each 10 words long. [sent-78, score-0.764]
54 For a paragraph shorter than 10 words, there will be only one nominated headline, with the same length as that paragraph. [sent-79, score-1.119]
55 Table 1 shows an example of a nominated headline list, where (a) is the selected paragraph, (b) is the first nominated headline and (c) is the second nominated headline. [sent-80, score-2.167]
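To make the nominating step concrete, the following is a minimal Python sketch of the procedure as described above (split the body at new-line, comma, colon and period characters, then slide a 10-word window one word at a time). The function name and signature are our own illustration, not from the paper.

```python
import re

def nominate_headlines(body: str, window: int = 10) -> list[str]:
    """Generate the list of nominated (candidate) headlines for a document.

    The document body is divided into smaller paragraphs at new-line,
    comma, colon and period characters; a sliding window of `window`
    words is then moved one word at a time over each paragraph.
    """
    candidates = []
    for paragraph in re.split(r"[\n,.:]+", body):
        words = paragraph.split()
        if not words:
            continue
        if len(words) <= window:
            # A paragraph shorter than the window yields a single
            # candidate of the same length as the paragraph.
            candidates.append(" ".join(words))
        else:
            for i in range(len(words) - window + 1):
                candidates.append(" ".join(words[i:i + window]))
    return candidates
```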
56 2 Calculating Word Matching Scores: The most basic way to compute a matching score between two words in the document body is to give a score of 1 if the two words match exactly, or 0 if there is even one mismatched character; this is the exact word matching (EWM) method. [sent-84, score-0.344]
57 Unfortunately, Arabic contains clitics and is morphologically rich. [sent-86, score-0.092]
58 This means the same word could appear with a single clitic attached to it and yet be considered a different word by the EWM method. [sent-87, score-0.069]
59 In the character cross-correlation (CCC) method, by contrast, a variable score in the range of 0 to 1 is calculated, depending on how many characters match with each other. [sent-89, score-0.11]
60 For example, if the word "وكتبها" ("and he wrote it") is compared with the word "كتب" ("he wrote") using the EWM method, the resulting score will be 0, but when using the CCC method it will be greater than 0. [sent-90, score-0.071]
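The extracted text does not preserve the paper's exact cross-correlation formula (the CCC value for this example is truncated in the source), so the sketch below shows only one plausible reading: EWM returns 1 for an exact string match and 0 otherwise, while CCC slides the shorter word over the longer one, counts position-wise character matches at each shift, and normalizes the best count by the longer word's length. The normalization choice is our assumption.

```python
def ewm_score(w1: str, w2: str) -> float:
    """Exact word matching (EWM): 1 for identical strings, 0 otherwise."""
    return 1.0 if w1 == w2 else 0.0

def ccc_score(w1: str, w2: str) -> float:
    """Character cross-correlation (CCC), one plausible variant.

    Slide the shorter word over the longer one, count position-wise
    character matches at each shift, and normalize the best count by
    the length of the longer word, giving a score in [0, 1].
    """
    short, long_ = sorted((w1, w2), key=len)
    if not long_:
        return 0.0
    best = 0
    for shift in range(len(long_) - len(short) + 1):
        matches = sum(1 for i, c in enumerate(short) if c == long_[shift + i])
        best = max(best, matches)
    return best / len(long_)

# Under these assumptions, "كتب" aligned inside "وكتبها" matches
# 3 of 6 characters, so ccc_score gives 0.5 where ewm_score gives 0.
```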
61 3 Calculating the Best Headline Score: After preparing the two tables of word matching scores, they are used to select the best headline. [sent-95, score-0.074]
62 Excluding stop words, every word in the document body (wd) is matched against every word in the nominated headline (wh) using the CCC and EWM methods, and a score is registered for every nominated sentence. [sent-96, score-1.803]
63 The matching score for every sentence is calculated in two ways. [sent-98, score-0.121]
64 The first way is the SUM method, defined by the following equation: $\mathrm{SUM}_p = \sum_{k=1}^{K}\sum_{l=1}^{L} \mathrm{score}(w_d^k, w_h^l)$ (3), where SUMp is the SUM-method score for the nominated headline p, K is the number of unique words in the document body and L is the number of words in the nominated headline (excluding stop words). [sent-99, score-2.297]
65 In this method, the cross-correlation score of every word in the document body against every word in the headline is added up. [sent-100, score-0.931]
66 In a similar way, in the MAX method (MAXp), the maximum score between every word in the document body and the words of the nominated headline is added up. [sent-101, score-1.289]
67 That is, for every word in the document, its maximum matching score is added, in either case, CCC or EWM. [sent-102, score-0.121]
68 It can be defined by the following equation: $\mathrm{MAX}_p = \sum_{k=1}^{K} \max_{1 \le l \le L} \mathrm{score}(w_d^k, w_h^l)$ (4). SUMp and MAXp were calculated using both the EWM and CCC methods, resulting in four variants of the algorithm, namely SUM-EWM, SUM-CCC, MAX-EWM and MAX-CCC. [sent-103, score-0.02]
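A minimal sketch of equations (3) and (4), reusing ewm_score and ccc_score from the sketch above; the generic score-function argument and the handling of stop words follow the text's description, while the helper's name and signature are our own.

```python
def headline_score(doc_words, headline_words, score_fn, stop_words,
                   method="MAX"):
    """Score one nominated headline against the document body.

    SUM (equation 3) adds up score(w_d, w_h) over all word pairs;
    MAX (equation 4) adds, for each unique document word, its best
    match within the headline. Stop words are ignored on both sides.
    """
    doc_vocab = {w for w in doc_words if w not in stop_words}
    content_headline = [w for w in headline_words if w not in stop_words]
    total = 0.0
    for wd in doc_vocab:
        pair_scores = [score_fn(wd, wh) for wh in content_headline]
        if pair_scores:
            total += max(pair_scores) if method == "MAX" else sum(pair_scores)
    return total
```

Calling this with score_fn set to ewm_score or ccc_score and method set to "SUM" or "MAX" yields the four variants SUM-EWM, SUM-CCC, MAX-EWM and MAX-CCC.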
69 4 Weighting Early Nominated Headlines: In news articles, the early sentences usually capture the subject of the article (Wasson, 1998). [sent-105, score-0.02]
70 To reflect that, a nonlinear multiplicative scaling factor was applied. [sent-106, score-0.096]
71 The suggested scaling factor is inspired by sigmoid functions and is described by the following equations. [sent-108, score-0.096]
72 $f(r) = \dfrac{1}{1 + e^{\,b\,(r - (S-1)/2)}}$ (5), where $b = \dfrac{5}{S-1}$ (6), r is the rank of the nominated headline and S is the total number of sentences. [sent-109, score-1.048]
73 Figure 1: Scaling function of a document with 1,000 nominated headlines. [sent-110, score-1.048]
74 According to the nominating mechanism, hundreds of sentences could be nominated as possible headlines. [sent-111, score-0.469]
75 Figure 1 shows the scaling function for one thousand nominated headlines. [sent-112, score-0.471]
76 After applying the scaling factor, the headline with the maximum score was chosen. [sent-113, score-0.763]
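Since equations (5) and (6) are garbled in the extracted text, the code below implements the sigmoid-shaped reconstruction given above; both the exact functional form and the constant 5 should be treated as assumptions recovered from the surviving fragments. The intended behavior is clear from the text: early candidates receive weights near 1, late candidates near 0, with the midpoint at 0.5.

```python
import math

def scaling_factor(r: int, s: int) -> float:
    """Sigmoid-shaped weight for the r-th of s nominated headlines.

    Reconstructed form: f(r) = 1 / (1 + exp(b * (r - (s - 1) / 2)))
    with b = 5 / (s - 1), so earlier nominated headlines are weighted
    more heavily, as in Figure 1.
    """
    if s <= 1:
        return 1.0
    b = 5.0 / (s - 1)
    return 1.0 / (1.0 + math.exp(b * (r - (s - 1) / 2)))

# The final choice multiplies each candidate's matching score by its
# scaling factor and keeps the maximum, e.g.:
# best = max(range(len(cands)),
#            key=lambda r: scaling_factor(r, len(cands)) * scores[r])
```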
77 ROUGE-1 measures the co-occurrence of unigrams, whereas ROUGE-L is based on the longest common subsequence (LCS) of an automatically generated headline and the reference headlines. [sent-115, score-0.669]
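ROUGE-L reduces to a longest common subsequence computation between a candidate and each reference; the standard dynamic-programming sketch below computes the quantity it is built on (the actual ROUGE toolkit additionally derives an F-measure from LCS-based precision and recall).

```python
def lcs_length(candidate: list[str], reference: list[str]) -> int:
    """Length of the longest common subsequence of two word sequences,
    the statistic underlying ROUGE-L."""
    m, n = len(candidate), len(reference)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if candidate[i - 1] == reference[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]
```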
78 It is clear that MAX-CCC achieves the highest score among the automatically generated headlines. [sent-116, score-0.019]
79 Unfortunately, there are no published results for an Arabic headline generation system to compare with, and it would not be fair to compare these results with systems applied to other languages or different datasets. [sent-117, score-0.695]
80 So, to put the ROUGE scores in context, the original headline was evaluated in addition to 10 randomly selected words (Rand-10) and the first 10 words (Lead-10) of the document. [sent-118, score-0.69]
81 From the recorded results it is clear that MAX-CCC has overcome the problem posed by the abundance of clitics and the rich morphology. [sent-120, score-0.12]
82 6 Conclusions: We have shown the effectiveness of using character cross-correlation to choose the best headline from among sentences nominated from an Arabic document. [sent-121, score-1.112]
83 The advantage of using character cross-correlation is that it overcomes the complex morphology of the Arabic language. [sent-122, score-0.151]
84 The MAX-CCC method achieved ROUGE-L = 0.19384 and outperformed the exact word matching, which got ROUGE-L = 0.17252. [sent-124, score-0.045]
85 Therefore, we conclude that character cross-correlation is effective when comparing words in morphologically complex languages such as Arabic. [sent-126, score-0.13]
86 Fouad Douzidia and Guy Lapalme. 2004. Lakhas, an Arabic summarization system. [sent-147, score-0.035]
87 Mark Wasson. 1998. Using Lead Text for News Summaries: Evaluation Results and Implications for Commercial Summarization Applications. [sent-156, score-0.02]
wordName wordTfidf (topN-words)
[('headline', 0.65), ('nominated', 0.398), ('headlines', 0.344), ('arabic', 0.288), ('ccc', 0.166), ('rouge', 0.144), ('ewm', 0.119), ('document', 0.085), ('body', 0.076), ('scaling', 0.073), ('alotaiby', 0.071), ('nominating', 0.071), ('character', 0.064), ('gigaword', 0.057), ('diacritics', 0.052), ('clitics', 0.049), ('characters', 0.048), ('alkharashi', 0.048), ('arab', 0.048), ('douzidia', 0.048), ('fahad', 0.048), ('hauptmann', 0.048), ('saud', 0.048), ('generation', 0.045), ('morphologically', 0.043), ('duc', 0.042), ('lapalme', 0.042), ('clitic', 0.042), ('homographs', 0.042), ('salah', 0.042), ('sump', 0.042), ('trimmer', 0.042), ('matching', 0.041), ('summary', 0.041), ('score', 0.04), ('every', 0.04), ('ibrahim', 0.039), ('hedge', 0.039), ('orthographical', 0.039), ('maxp', 0.036), ('registered', 0.036), ('calculating', 0.035), ('overcome', 0.035), ('summarization', 0.035), ('iwould', 0.034), ('preparing', 0.033), ('affixes', 0.032), ('graff', 0.032), ('wrote', 0.031), ('morphology', 0.029), ('jin', 0.029), ('king', 0.028), ('paragraph', 0.027), ('attached', 0.027), ('extractive', 0.027), ('documents', 0.026), ('paragraphs', 0.024), ('complex', 0.023), ('got', 0.023), ('factor', 0.023), ('understanding', 0.023), ('newspaper', 0.022), ('stem', 0.022), ('match', 0.022), ('length', 0.022), ('examiners', 0.021), ('academia', 0.021), ('alphabets', 0.021), ('arabia', 0.021), ('gisting', 0.021), ('meantime', 0.021), ('regularly', 0.021), ('riyadh', 0.021), ('zajic', 0.021), ('lin', 0.021), ('title', 0.02), ('reflects', 0.02), ('namely', 0.02), ('equation', 0.02), ('news', 0.02), ('story', 0.019), ('saudi', 0.019), ('corrupt', 0.019), ('eration', 0.019), ('hired', 0.019), ('lcs', 0.019), ('rong', 0.019), ('supervisors', 0.019), ('understudy', 0.019), ('automatic', 0.019), ('dorr', 0.019), ('generated', 0.019), ('electrical', 0.018), ('guy', 0.018), ('cairo', 0.018), ('archives', 0.018), ('appending', 0.018), ('atus', 0.018), ('doecnitat', 0.018), ('egypt', 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 51 acl-2011-Automatic Headline Generation using Character Cross-Correlation
Author: Fahad Alotaiby
Abstract: Arabic is a morphologically complex language. Affixes and clitics are regularly attached to stems, which makes direct comparison between words impractical. In this paper we propose a new automatic headline generation technique that utilizes character cross-correlation to extract the best headlines and to overcome Arabic's complex morphology. The system that uses character cross-correlation achieves a ROUGE-L score of 0.19384, while exact word matching scores only 0.17252 for the same set of documents.
2 0.18917251 201 acl-2011-Learning From Collective Human Behavior to Introduce Diversity in Lexical Choice
Author: Vahed Qazvinian ; Dragomir R. Radev
Abstract: We analyze collective discourse, a collective human behavior in content generation, and show that it exhibits diversity, a property of general collective systems. Using extensive analysis, we propose a novel paradigm for designing summary generation systems that reflect the diversity of perspectives seen in real-life collective summarization. We analyze 50 sets of summaries written by humans about the same story or artifact and investigate the diversity of perspectives across these summaries. We show how different summaries use various phrasal information units (i.e., nuggets) to express the same atomic semantic units, called factoids. Finally, we present a ranker that employs distributional similarities to build a network of words, and captures the diversity of perspectives by detecting communities in this network. Our experiments show how our system outperforms a wide range of other document ranking systems that leverage diversity.
3 0.15323062 7 acl-2011-A Corpus for Modeling Morpho-Syntactic Agreement in Arabic: Gender, Number and Rationality
Author: Sarah Alkuhlani ; Nizar Habash
Abstract: We present an enriched version of the Penn Arabic Treebank (Maamouri et al., 2004), where latent features necessary for modeling morpho-syntactic agreement in Arabic are manually annotated. We describe our process for efficient annotation, and present the first quantitative analysis of Arabic morphosyntactic phenomena.
4 0.13736849 329 acl-2011-Using Deep Morphology to Improve Automatic Error Detection in Arabic Handwriting Recognition
Author: Nizar Habash ; Ryan Roth
Abstract: Arabic handwriting recognition (HR) is a challenging problem due to Arabic's connected letter forms, consonantal diacritics and rich morphology. In this paper we isolate the task of identification of erroneous words in HR from the task of producing corrections for these words. We consider a variety of linguistic (morphological and syntactic) and non-linguistic features to automatically identify these errors. Our best approach achieves a roughly ∼15% absolute increase in F-score over a simple but reasonable baseline. A detailed error analysis shows that linguistic features, such as lemma (i.e., citation form) models, help improve HR-error detection precisely where we expect them to: semantically incoherent error words.
Author: Omar F. Zaidan ; Chris Callison-Burch
Abstract: The written form of Arabic, Modern Standard Arabic (MSA), differs quite a bit from the spoken dialects of Arabic, which are the true “native” languages of Arabic speakers used in daily life. However, due to MSA’s prevalence in written form, almost all Arabic datasets have predominantly MSA content. We present the Arabic Online Commentary Dataset, a 52M-word monolingual dataset rich in dialectal content, and we describe our long-term annotation effort to identify the dialect level (and dialect itself) in each sentence of the dataset. So far, we have labeled 108K sentences, 41% of which as having dialectal content. We also present experimental results on the task of automatic dialect identification, using the collected labels for training and evaluation.
6 0.10088336 164 acl-2011-Improving Arabic Dependency Parsing with Form-based and Functional Morphological Features
7 0.080249242 289 acl-2011-Subjectivity and Sentiment Analysis of Modern Standard Arabic
8 0.067854404 318 acl-2011-Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models
9 0.067200728 251 acl-2011-Probabilistic Document Modeling for Syntax Removal in Text Summarization
10 0.062835738 187 acl-2011-Jointly Learning to Extract and Compress
11 0.058103912 326 acl-2011-Using Bilingual Information for Cross-Language Document Summarization
12 0.054351345 83 acl-2011-Contrasting Multi-Lingual Prosodic Cues to Predict Verbal Feedback for Rapport
13 0.053141575 92 acl-2011-Data point selection for cross-language adaptation of dependency parsers
14 0.052871566 21 acl-2011-A Pilot Study of Opinion Summarization in Conversations
15 0.048704799 162 acl-2011-Identifying the Semantic Orientation of Foreign Words
16 0.04703844 76 acl-2011-Comparative News Summarization Using Linear Programming
17 0.046532381 98 acl-2011-Discovery of Topically Coherent Sentences for Extractive Summarization
18 0.042938046 270 acl-2011-SciSumm: A Multi-Document Summarization System for Scientific Articles
19 0.041528862 109 acl-2011-Effective Measures of Domain Similarity for Parsing
20 0.041013304 4 acl-2011-A Class of Submodular Functions for Document Summarization
topicId topicWeight
[(0, 0.097), (1, 0.018), (2, -0.01), (3, 0.032), (4, -0.049), (5, -0.004), (6, 0.049), (7, 0.065), (8, 0.041), (9, 0.057), (10, -0.09), (11, 0.018), (12, -0.192), (13, 0.05), (14, 0.011), (15, -0.123), (16, 0.052), (17, -0.01), (18, -0.089), (19, -0.056), (20, -0.055), (21, 0.035), (22, 0.092), (23, 0.064), (24, -0.047), (25, -0.002), (26, -0.053), (27, -0.054), (28, 0.12), (29, -0.029), (30, -0.102), (31, -0.015), (32, 0.012), (33, -0.012), (34, 0.012), (35, -0.055), (36, 0.019), (37, 0.0), (38, 0.11), (39, -0.029), (40, 0.01), (41, 0.027), (42, -0.025), (43, 0.148), (44, -0.062), (45, 0.003), (46, -0.032), (47, 0.073), (48, -0.044), (49, 0.07)]
simIndex simValue paperId paperTitle
same-paper 1 0.93707955 51 acl-2011-Automatic Headline Generation using Character Cross-Correlation
Author: Fahad Alotaiby
Abstract: Arabic is a morphologically complex language. Affixes and clitics are regularly attached to stems, which makes direct comparison between words impractical. In this paper we propose a new automatic headline generation technique that utilizes character cross-correlation to extract the best headlines and to overcome Arabic's complex morphology. The system that uses character cross-correlation achieves a ROUGE-L score of 0.19384, while exact word matching scores only 0.17252 for the same set of documents.
Author: Omar F. Zaidan ; Chris Callison-Burch
Abstract: The written form of Arabic, Modern Standard Arabic (MSA), differs quite a bit from the spoken dialects of Arabic, which are the true “native” languages of Arabic speakers used in daily life. However, due to MSA’s prevalence in written form, almost all Arabic datasets have predominantly MSA content. We present the Arabic Online Commentary Dataset, a 52M-word monolingual dataset rich in dialectal content, and we describe our long-term annotation effort to identify the dialect level (and dialect itself) in each sentence of the dataset. So far, we have labeled 108K sentences, 41% of which as having dialectal content. We also present experimental results on the task of automatic dialect identification, using the collected labels for training and evaluation.
3 0.73898917 7 acl-2011-A Corpus for Modeling Morpho-Syntactic Agreement in Arabic: Gender, Number and Rationality
Author: Sarah Alkuhlani ; Nizar Habash
Abstract: We present an enriched version of the Penn Arabic Treebank (Maamouri et al., 2004), where latent features necessary for modeling morpho-syntactic agreement in Arabic are manually annotated. We describe our process for efficient annotation, and present the first quantitative analysis of Arabic morphosyntactic phenomena.
4 0.64283562 329 acl-2011-Using Deep Morphology to Improve Automatic Error Detection in Arabic Handwriting Recognition
Author: Nizar Habash ; Ryan Roth
Abstract: Arabic handwriting recognition (HR) is a challenging problem due to Arabic's connected letter forms, consonantal diacritics and rich morphology. In this paper we isolate the task of identification of erroneous words in HR from the task of producing corrections for these words. We consider a variety of linguistic (morphological and syntactic) and non-linguistic features to automatically identify these errors. Our best approach achieves a roughly ∼15% absolute increase in F-score over a simple but reasonable baseline. A detailed error analysis shows that linguistic features, such as lemma (i.e., citation form) models, help improve HR-error detection precisely where we expect them to: semantically incoherent error words.
5 0.5782479 164 acl-2011-Improving Arabic Dependency Parsing with Form-based and Functional Morphological Features
Author: Yuval Marton ; Nizar Habash ; Owen Rambow
Abstract: We explore the contribution of morphological features, both lexical and inflectional, to dependency parsing of Arabic, a morphologically rich language. Using controlled experiments, we find that definiteness, person, number, gender, and the undiacritized lemma are most helpful for parsing on automatically tagged input. We further contrast the contribution of form-based and functional features, and show that functional gender and number (e.g., "broken plurals") and the related rationality feature improve over form-based features. It is the first time functional morphological features are used for Arabic NLP.
6 0.50461298 201 acl-2011-Learning From Collective Human Behavior to Introduce Diversity in Lexical Choice
7 0.44996062 289 acl-2011-Subjectivity and Sentiment Analysis of Modern Standard Arabic
8 0.37852195 326 acl-2011-Using Bilingual Information for Cross-Language Document Summarization
9 0.36555737 83 acl-2011-Contrasting Multi-Lingual Prosodic Cues to Predict Verbal Feedback for Rapport
10 0.36275554 308 acl-2011-Towards a Framework for Abstractive Summarization of Multimodal Documents
11 0.36077595 251 acl-2011-Probabilistic Document Modeling for Syntax Removal in Text Summarization
12 0.34866738 47 acl-2011-Automatic Assessment of Coverage Quality in Intelligence Reports
13 0.34836212 321 acl-2011-Unsupervised Discovery of Rhyme Schemes
14 0.33557886 270 acl-2011-SciSumm: A Multi-Document Summarization System for Scientific Articles
15 0.31781101 255 acl-2011-Query Snowball: A Co-occurrence-based Approach to Multi-document Summarization for Question Answering
16 0.3169831 187 acl-2011-Jointly Learning to Extract and Compress
17 0.31041044 4 acl-2011-A Class of Submodular Functions for Document Summarization
18 0.30509144 162 acl-2011-Identifying the Semantic Orientation of Foreign Words
19 0.29730815 92 acl-2011-Data point selection for cross-language adaptation of dependency parsers
20 0.28686616 76 acl-2011-Comparative News Summarization Using Linear Programming
topicId topicWeight
[(5, 0.026), (17, 0.035), (26, 0.013), (31, 0.022), (37, 0.033), (39, 0.021), (41, 0.054), (55, 0.038), (59, 0.391), (72, 0.019), (75, 0.016), (91, 0.031), (96, 0.164)]
simIndex simValue paperId paperTitle
1 0.96092075 322 acl-2011-Unsupervised Learning of Semantic Relation Composition
Author: Eduardo Blanco ; Dan Moldovan
Abstract: This paper presents an unsupervised method for deriving inference axioms by composing semantic relations. The method is independent of any particular relation inventory. It relies on describing semantic relations using primitives and manipulating these primitives according to an algebra. The method was tested using a set of eight semantic relations yielding 78 inference axioms which were evaluated over PropBank.
2 0.92317009 102 acl-2011-Does Size Matter - How Much Data is Required to Train a REG Algorithm?
Author: Mariet Theune ; Ruud Koolen ; Emiel Krahmer ; Sander Wubben
Abstract: In this paper we investigate how much data is required to train an algorithm for attribute selection, a subtask of Referring Expressions Generation (REG). To enable comparison between different-sized training sets, a systematic training method was developed. The results show that depending on the complexity of the domain, training on 10 to 20 items may already lead to a good performance.
3 0.90174592 293 acl-2011-Template-Based Information Extraction without the Templates
Author: Nathanael Chambers ; Dan Jurafsky
Abstract: Standard algorithms for template-based information extraction (IE) require predefined template schemas, and often labeled data, to learn to extract their slot fillers (e.g., an embassy is the Target of a Bombing template). This paper describes an approach to template-based IE that removes this requirement and performs extraction without knowing the template structure in advance. Our algorithm instead learns the template structure automatically from raw text, inducing template schemas as sets of linked events (e.g., bombings include detonate, set off, and destroy events) associated with semantic roles. We also solve the standard IE task, using the induced syntactic patterns to extract role fillers from specific documents. We evaluate on the MUC-4 terrorism dataset and show that we induce template structure very similar to handcreated gold structure, and we extract role fillers with an F1 score of .40, approaching the performance of algorithms that require full knowledge of the templates.
4 0.8981747 279 acl-2011-Semi-supervised latent variable models for sentence-level sentiment analysis
Author: Oscar Tackstrom ; Ryan McDonald
Abstract: We derive two variants of a semi-supervised model for fine-grained sentiment analysis. Both models leverage abundant natural supervision in the form of review ratings, as well as a small amount of manually crafted sentence labels, to learn sentence-level sentiment classifiers. The proposed model is a fusion of a fully supervised structured conditional model and its partially supervised counterpart. This allows for highly efficient estimation and inference algorithms with rich feature definitions. We describe the two variants as well as their component models and verify experimentally that both variants give significantly improved results for sentence-level sentiment analysis compared to all baselines. 1 Sentence-level sentiment analysis In this paper, we demonstrate how combining coarse-grained and fine-grained supervision benefits sentence-level sentiment analysis, an important task in the field of opinion classification and retrieval (Pang and Lee, 2008). Typical supervised learning approaches to sentence-level sentiment analysis rely on sentence-level supervision. While such fine-grained supervision rarely exists naturally, and thus requires labor-intensive manual annotation effort (Wiebe et al., 2005), coarse-grained supervision is naturally abundant in the form of online review ratings. This coarse-grained supervision is, of course, less informative compared to fine-grained supervision; however, by combining a small amount of sentence-level supervision with a large amount of document-level supervision, we are able to substantially improve on the sentence-level classification task. Our work combines two strands of research: models for sentiment analysis that take document structure into account, and models that use latent variables to learn unobserved phenomena from that which can be observed. Exploiting document structure for sentiment analysis has attracted research attention since the early work of Pang and Lee (2004), who performed minimal cuts in a sentence graph to select subjective sentences. McDonald et al. (2007) later showed that jointly learning fine-grained (sentence) and coarse-grained (document) sentiment improves predictions at both levels. More recently, Yessenalina et al. (2010) described how sentence-level latent variables can be used to improve document-level prediction and Nakagawa et al. (2010) used latent variables over syntactic dependency trees to improve sentence-level prediction, using only labeled sentences for training. In a similar vein, Sauper et al. (2010) integrated generative content structure models with discriminative models for multi-aspect sentiment summarization and ranking. These approaches all rely on the availability of fine-grained annotations, but Täckström and McDonald (2011) showed that latent variables can be used to learn fine-grained sentiment using only coarse-grained supervision. While this model was shown to beat a set of natural baselines with quite a wide margin, it has its shortcomings. Most notably, due to the loose constraints provided by the coarse supervision, it tends to only predict the two dominant fine-grained sentiment categories well for each document sentiment category, so that almost all sentences in positive documents are deemed positive or neutral, and vice versa for negative documents. As a way of overcoming these shortcomings, we propose to fuse a coarsely supervised model with a fully supervised model.
Below, we describe two ways of achieving such a combined model in the framework of structured conditional latent variable models. Contrary to (generative) topic models (Mei et al., 2007; Titov and McDonald, 2008; Lin and He, 2009), structured conditional models can handle rich and overlapping features and allow for exact inference and simple gradient-based estimation. The former models are largely orthogonal to the one we propose in this work and combining their merits might be fruitful. As shown by Sauper et al. (2010), it is possible to fuse generative document structure models and task-specific structured conditional models. While we do model document structure in terms of sentiment transitions, we do not model topical structure. An interesting avenue for future work would be to extend the model of Sauper et al. (2010) to take coarse-grained task-specific supervision into account, while modeling fine-grained task-specific aspects with latent variables. Note also that the proposed approach is orthogonal to semi-supervised and unsupervised induction of context-independent (prior polarity) lexicons (Turney, 2002; Kim and Hovy, 2004; Esuli and Sebastiani, 2009; Rao and Ravichandran, 2009; Velikovich et al., 2010). The output of such models could readily be incorporated as features in the proposed model. Figure 1: a) Factor graph of the fully observed graphical model. b) Factor graph of the corresponding latent variable model. During training, shaded nodes are observed, while non-shaded nodes are unobserved. The input sentences $s_i$ are always observed. Note that there are no factors connecting the document node, $y^d$, with the input nodes, $s$, so that the sentence-level variables, $\mathbf{y}^s$, in effect form a bottleneck between the document sentiment and the input sentences. 1.1 Preliminaries Let d be a document consisting of n sentences, $s = (s_i)_{i=1}^n$, with a document–sentence-sequence pair denoted $d = (d, s)$. Let $\mathbf{y}^d = (y^d, \mathbf{y}^s)$ denote random variables: the document-level sentiment, $y^d$, and the sequence of sentence-level sentiments, $\mathbf{y}^s = (y_i^s)_{i=1}^n$. (We are abusing notation throughout by using the same symbols to refer to random variables and their particular assignments.) In what follows, we assume that we have access to two training sets: a small set of fully labeled instances, $\mathcal{D}_F = \{(d_j, \mathbf{y}_j^d)\}_{j=1}^{m_f}$, and a large set of coarsely labeled instances, $\mathcal{D}_C = \{(d_j, y_j^d)\}_{j=m_f+1}^{m_f+m_c}$. Furthermore, we assume that $y^d$ and all $y_i^s$ take values in {POS, NEG, NEU}. We focus on structured conditional models in the exponential family, with the standard parametrization $p_\theta(y^d, \mathbf{y}^s \mid s) = \exp\{\langle \phi(y^d, \mathbf{y}^s, s), \theta \rangle - A_\theta(s)\}$.
5 0.88279456 224 acl-2011-Models and Training for Unsupervised Preposition Sense Disambiguation
Author: Dirk Hovy ; Ashish Vaswani ; Stephen Tratz ; David Chiang ; Eduard Hovy
Abstract: We present a preliminary study on unsupervised preposition sense disambiguation (PSD), comparing different models and training techniques (EM, MAP-EM with L0 norm, Bayesian inference using Gibbs sampling). To our knowledge, this is the first attempt at unsupervised preposition sense disambiguation. Our best accuracy reaches 56%, a significant improvement (at p <.001) of 16% over the most-frequent-sense baseline.
6 0.8539564 329 acl-2011-Using Deep Morphology to Improve Automatic Error Detection in Arabic Handwriting Recognition
same-paper 7 0.81714708 51 acl-2011-Automatic Headline Generation using Character Cross-Correlation
8 0.68523794 262 acl-2011-Relation Guided Bootstrapping of Semantic Lexicons
9 0.66900539 164 acl-2011-Improving Arabic Dependency Parsing with Form-based and Functional Morphological Features
10 0.65515196 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization
11 0.64547539 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering
12 0.64192128 7 acl-2011-A Corpus for Modeling Morpho-Syntactic Agreement in Arabic: Gender, Number and Rationality
13 0.62484384 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing
14 0.62191665 269 acl-2011-Scaling up Automatic Cross-Lingual Semantic Role Annotation
15 0.62188661 174 acl-2011-Insights from Network Structure for Text Mining
16 0.61794806 244 acl-2011-Peeling Back the Layers: Detecting Event Role Fillers in Secondary Contexts
17 0.61162305 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
18 0.60662985 86 acl-2011-Coreference for Learning to Extract Relations: Yes Virginia, Coreference Matters
19 0.60452223 198 acl-2011-Latent Semantic Word Sense Induction and Disambiguation
20 0.60105711 213 acl-2011-Local and Global Algorithms for Disambiguation to Wikipedia