acl acl2013 acl2013-346 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Oliver Ferschke ; Iryna Gurevych ; Marc Rittberger
Abstract: With the increasing amount of user generated reference texts in the web, automatic quality assessment has become a key challenge. However, only a small amount of annotated data is available for training quality assessment systems. Wikipedia contains a large amount of texts annotated with cleanup templates which identify quality flaws. We show that the distribution of these labels is topically biased, since they cannot be applied freely to any arbitrary article. We argue that it is necessary to consider the topical restrictions of each label in order to avoid a sampling bias that results in a skewed classifier and overly optimistic evaluation results. We factor out the topic bias by extracting reliable training instances from the revision history which have a topic distribution similar to the labeled articles. This approach better reflects the situation a classifier would face in a real-life application.
Reference: text
sentIndex sentText sentNum sentScore
1 Wikipedia contains a large amount of texts annotated with cleanup templates which identify quality flaws. [sent-4, score-0.509]
2 We factor out the topic bias by extracting reliable training instances from the revision history which have a topic distribution similar to the labeled articles. [sent-8, score-0.475]
3 Wikipedia contains so-called cleanup templates, which constitute a sophisticated system of user generated labels that mark quality problems in articles. [sent-18, score-0.37]
4 Recently, these cleanup templates have been used for automatically identifying articles with particular quality flaws in order to support the quality assurance process in Wikipedia. [sent-19, score-1.069]
5 In a shared task (Anderka and Stein, 2012b), several systems have shown that it is possible to identify the ten most frequent quality flaws with high recall and fair precision. [sent-20, score-0.418]
6 However, quality flaw detection based on cleanup template recognition suffers from a topic bias that is well known from other text classification applications such as authorship attribution or genre identification. [sent-21, score-1.437]
7 We discovered that cleanup templates have implicit topical restrictions, i.e. they cannot be applied freely to any arbitrary article. [sent-22, score-0.507]
8 As a consequence, corpora of flawed articles based on these templates are biased towards particular topics. [sent-25, score-0.46]
9 We argue that it is therefore not sufficient for evaluating a quality flaw prediction system to measure how well it can separate (topically restricted) flawed articles from a set of random outliers. [sent-26, score-1.059]
10 It is rather necessary to determine reliable negative instances with a topic distribution similar to that of the positive instances in order to factor out the sampling bias. [sent-27, score-0.325]
11 We present an approach for factoring out the bias from quality flaw corpora by mining reliable negative instances for each flaw from the article revision history. [sent-29, score-2.01]
12 Furthermore, we employ the article revision history to extract reliable positive training instances by using the version of each article at the time it was first identified as flawed. [sent-30, score-0.672]
13 This way, we avoid including articles with outdated cleanup templates, a frequent phenomenon. [sent-31, score-0.424]
14 As already noted above, a similar kind of topic bias negatively influences quality flaw detection in Wikipedia. [sent-49, score-0.933]
15 Anderka et al. (2012) automatically identify quality flaws by predicting the cleanup templates in unseen articles with a one-class classification approach. [sent-51, score-1.006]
16 Based on this work, a competition on quality flaw prediction has been established (Anderka and Stein, 2012b). [sent-52, score-0.771]
17 The winning team of the inaugural edition of the task was able to detect the ten most common quality flaws with an average F1-Score of 0. [sent-53, score-0.418]
18 A closer examination of the aforementioned quality flaw detection systems reveals a systematic sampling bias in the training data, which leads to an overly optimistic performance evaluation and classifiers that are biased towards particular article topics. [sent-59, score-1.201]
19 Our approach factors out the topic bias from the training data by mining topically controlled training instances from the Wikipedia revision history. [sent-60, score-0.337]
20 The results show that flaw detection is a much harder problem in a real-life scenario. [sent-61, score-0.718]
21 Other sets of quality criteria are adaptations or relaxations of these standards, such as the good article criteria or the quality grading schemes of individual interest groups in Wikipedia. [sent-64, score-0.348]
22 In this work, we focus on quality flaws regarding neutrality and style problems. [sent-65, score-0.571]
23 Any articles that violate these criteria can be marked with cleanup templates to indicate their need for improvement. [sent-73, score-0.424]
24 These templates can thus be regarded as proxies for quality flaws in Wikipedia. [sent-74, score-0.557]
25 Table 1: Neutrality and style flaw corpora used in this work. Template Clusters: Since several cleanup templates might represent different manifestations of the same quality flaw, there is a 1:n relationship between quality flaws and cleanup templates. [sent-84, score-1.974]
26 For instance, the templates pov-check, pov and npov language can all be mapped to the same flaw concerning the neutral point of view of an article. [sent-85, score-0.841]
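The 1:n mapping from flaws to cleanup templates described above is essentially a lookup structure. The following minimal Python sketch shows one way to represent it; the cluster names and template sets are illustrative examples only, not the paper's full template inventory.

```python
# Illustrative mapping of flaw clusters to the cleanup templates treated as
# manifestations of that flaw (a small example subset, not the full inventory).
FLAW_CLUSTERS = {
    "neutral_point_of_view": {"pov-check", "pov", "npov language"},
    "in_universe": {"in-universe"},
    "advert": {"advert"},
}

# Reverse index: cleanup template name -> flaw cluster it belongs to.
TEMPLATE_TO_FLAW = {
    template: flaw
    for flaw, templates in FLAW_CLUSTERS.items()
    for template in templates
}

def flaw_for_template(template_name):
    """Return the flaw cluster a cleanup template maps to, or None."""
    return TEMPLATE_TO_FLAW.get(template_name)

print(flaw_for_template("pov"))  # -> neutral_point_of_view
```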
27 This aggregation of cleanup templates into flaw-clusters is a subjective task. [sent-86, score-0.421]
28 It is not always clear whether a particular template refers to an existing flaw or should be regarded as a separate class. [sent-87, score-0.8]
29 Too many clusters lead to redundant flaw classes (i.e. similar cleanup templates are assigned to different clusters), while too few clusters will result in unclear flaw definitions, since each flaw receives a wide range of possible manifestations. [sent-90, score-1.809]
30 Template Scope: Another important aspect to be considered is the difference in the scope that cleanup templates can have. [sent-91, score-0.445]
31 The consideration of the template scope is of particular importance for quality flaw recognition problems. [sent-95, score-0.848]
32 For example, the presence of a cleanup template which marks a single section as not notable does not entail that the whole article is not notable. [sent-96, score-0.571]
33 Many cleanup templates can only be applied to articles from certain subject areas. [sent-102, score-0.563]
34 An example with a particularly obvious restriction is the template in-universe (see Table 1), which should only be applied to articles about fiction. [sent-103, score-0.308]
35 This topical restriction is neither explicitly defined nor automatically enforced, but it plays an important role in the quality flaw recognition task, as the remainder of this paper will show. [sent-104, score-0.93]
36 While flaws merely concerning the structural or linguistic properties of an article are less restricted to individual topics, they are still affected by a certain degree of topical preference. [sent-105, score-0.607]
37 Thus, the distribution of cleanup templates regarding structural or grammatical flaws is also biased towards certain topics. [sent-108, score-0.804]
38 Quality Flaw Recognition: Based on the above definition of quality flaws, we define the quality flaw recognition task similarly to Anderka et al. [sent-112, score-0.883]
39 (2012) as follows: Given a sample of articles in which each article has been tagged with any cleanup template τi from a specific template cluster Tf thus marking all articles in the sample with a quality flaw f, it has to be decided whether or not an unseen article suffers from f. [sent-113, score-1.936]
40 4 Data Selection and Corpus Creation: For creating our corpora, we start by selecting all cleanup templates listed under the categories neutrality and style in the typology of cleanup templates provided by Anderka and Stein (2012a). [sent-114, score-0.995]
41 Each of the selected templates serves as the nucleus of a template cluster that potentially represents a quality flaw. [sent-115, score-0.365]
42 Furthermore, we manually inspect the lists of similar templates in the see also sections of the template descriptions and include all templates that refer to the same concept as the other templates in the cluster. [sent-119, score-0.552]
43 As mentioned earlier, this is a subjective task and largely depends on the desired granularity of the flaw definitions. [sent-120, score-0.706]
44 We finally merge semantically similar template clusters to avoid too fine grained flaw distinctions. [sent-121, score-0.822]
45 As a result, we obtain a total number of 94 template clusters representing 60 style flaws and 34 neutrality flaws. [sent-122, score-0.622]
46 We only regard flaws with at least 500 affected articles in the snapshot of the English Wikipedia from January 4, 2012. [sent-129, score-0.472]
47 Table 1 lists the final sets of flaws used in this work. [sent-130, score-0.348]
48 Agreement with Human Rater: Quality flaw detection in Wikipedia is based on the assumption that cleanup templates are valid markers of quality flaws. [sent-160, score-1.227]
49 In order to test the reliability of these user assigned templates as quality flaw markers, we carried out an annotation study in which a human annotator was asked to perform the binary flaw detection task manually. [sent-161, score-1.628]
50 For each of the 12 article scope flaws, we extracted the plain text of 10 random flawed articles and 10 random untagged articles. [sent-164, score-0.566]
51 The annotator had to decide for each flaw individually whether a given text belonged to a flawed article or not. [sent-165, score-0.981]
52 Upon manual inspection, we found a wide range of possible manifestations of this flaw, ranging from an agglomeration of incoherent factoids to well-structured sections that did not exactly match the focus of the article, which is the main reason for the low agreement. [sent-172, score-0.705]
53 In this context, the topical restrictions of cleanup templates have to be taken into account. [sent-178, score-0.533]
54 In the following, we describe our approach to extracting reliable training instances from the quality flaw corpora. [sent-179, score-0.908]
55 5.1 Reliable Positives: In previous work, the latest available versions of flawed articles have been used as positive training instances. [sent-181, score-0.328]
56 However, upon manual inspection of the data we found that a substantial number of articles had been significantly edited between the time tτ, at which the template was first assigned, and the time te, at which the articles were extracted. [sent-182, score-0.401]
57 Using the latest version at time te can thus include articles in which the respective flaw has already been fixed without removing the cleanup template. [sent-183, score-1.152]
58 Therefore, we use the revision of the article at time tτ to assure that the flaw is still present in the training instance. [sent-184, score-0.987]
59 For every article in the corpus of positive examples for flaw f that is marked with template τ ∈ Tf, we backtrack the revision history chronologically, until we find the first revision rtτ−1 that is not tagged with τ . [sent-187, score-1.295]
60 We then add the succeeding revision rtτ to the corpus of reliable positives for flaw f. [sent-188, score-0.927]
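A minimal sketch of this backtracking step is given below. It assumes revisions are available newest first, each with a precomputed set of cleanup template names; the Revision tuple and the helper name are illustrative assumptions, not the paper's implementation or any Wikipedia API.

```python
from collections import namedtuple

# Each revision carries the set of cleanup template names found in its text
# (an assumed, precomputed representation).
Revision = namedtuple("Revision", ["rev_id", "templates"])

def reliable_positive(revisions_newest_first, flaw_templates):
    """Return the revision at which a template of the flaw cluster was first
    attached, walking the history backwards from the newest revision."""
    candidate = None
    for revision in revisions_newest_first:
        if revision.templates & flaw_templates:
            # Still tagged: remember it and keep walking back in time.
            candidate = revision
        else:
            # First untagged revision reached (r_{t_tau - 1}): the previously
            # remembered revision is the reliable positive r_{t_tau}.
            break
    return candidate

# Toy history, newest first: the "pov" template was added in revision 102.
history = [Revision(103, {"pov"}), Revision(102, {"pov"}), Revision(101, set())]
print(reliable_positive(history, {"pov", "pov-check"}).rev_id)  # -> 102
```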
61 In Section 6, we show that the classification performance improves for most flaws when using reliable positives instead of the latest available article versions. [sent-189, score-0.665]
62 5.2 Reliable Negatives and Topical Restriction: A central problem of the quality flaw recognition approach is that no articles are available that are explicitly tagged as not containing a particular quality problem. [sent-191, score-1.025]
63 The authors circumvent this issue by evaluating their classifiers on a set of random untagged instances and a set of featured articles and argue that the actual performance of predicting the quality flaws lies between the two. [sent-196, score-0.755]
64 (2012) follow a two step classification approach (PU learning) that first uses a Naive Bayes classifier trained on positive instances and random untagged articles to pre-classify the data. [sent-198, score-0.364]
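As an illustration of the first step of such a PU-learning setup (the cited system's actual features and thresholds are not described in this summary, so everything below is an assumed, generic realization), one can train a Naive Bayes model on positives versus untagged articles and keep only the untagged articles that are confidently predicted as non-positive:

```python
# Generic sketch of PU-learning step 1: train Naive Bayes on positive vs.
# untagged articles, then retain untagged articles confidently classified as
# negative. This illustrates the general technique, not the cited system.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

def likely_negatives(positive_texts, untagged_texts, threshold=0.9):
    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(positive_texts + untagged_texts)
    y = [1] * len(positive_texts) + [0] * len(untagged_texts)
    model = MultinomialNB().fit(X, y)
    # Column 0 of predict_proba corresponds to the untagged class (label 0).
    prob_negative = model.predict_proba(vectorizer.transform(untagged_texts))[:, 0]
    return [text for text, p in zip(untagged_texts, prob_negative) if p >= threshold]
```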
65 Both approaches sample random negative instances Arnd for any given set of flawed articles Af from a set of untagged articles Au (see Fig. [sent-202, score-0.592]
66 This will avoid the systematic bias. Figure 1: Sampling of negative instances for a given set of flawed articles (Af). [sent-206, score-0.46]
67 Random negatives (Arnd) are sampled from articles without any cleanup templates (Au). [sent-207, score-0.612]
68 In the following, we present our approach to extracting reliable negative training instances that conform with the topical restrictions of the cleanup templates. [sent-209, score-0.573]
69 Without loss of generality, we assume that an article from which a cleanup template τ ∈ Tf is deleted at a point in time dτ no longer suffers from flaw f at that point in time. [sent-210, score-1.082]
70 Thus, the revision rdτ is a reliable negative instance for the flaw f. [sent-211, score-0.936]
71 Additionally, since the article was once tagged with τ ∈ Tf, it belongs to the same restricted topic set Atopic as the positive instances for flaw f. [sent-212, score-1.001]
72 Since there are occasions in which a template is replaced by another template from the same cluster, we additionally ensure that rdτ does not contain any other template from cluster Tf before we finally add the revision to the set of reliable negatives for flaw f. [sent-216, score-1.315]
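A companion sketch for mining reliable negatives, under the same assumptions as the reliable-positive sketch above (the Revision tuple is again illustrative): the revision at which the last template of the flaw cluster disappears is taken as a reliable negative.

```python
from collections import namedtuple

Revision = namedtuple("Revision", ["rev_id", "templates"])

def reliable_negatives(revisions_oldest_first, flaw_templates):
    """Yield revisions in which every template of the flaw cluster T_f has
    just been removed (reliable negatives r_{d_tau})."""
    previously_tagged = False
    for revision in revisions_oldest_first:
        tagged = bool(revision.templates & flaw_templates)
        if previously_tagged and not tagged:
            # A cluster template was deleted and no other template from the
            # same cluster remains, so this revision is a reliable negative.
            yield revision
        previously_tagged = tagged

# Toy history, oldest first: "pov" is replaced by "npov language" (same
# cluster, so not a negative) and finally removed in revision 203.
history = [
    Revision(201, {"pov"}),
    Revision(202, {"npov language"}),
    Revision(203, set()),
]
print([r.rev_id for r in reliable_negatives(history, {"pov", "npov language"})])  # -> [203]
```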
73 In the remainder of this section, we evaluate the topical similarity between the positive and the negative set of articles for each flaw using both our method and the original approach. [sent-217, score-0.987]
74 In order to compare two sets of articles with respect to their topical similarity, we represent each article set as a category frequency vector. [sent-222, score-0.4]
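A minimal sketch of this comparison, assuming each article is represented by the list of Wikipedia categories it belongs to (the toy categories below are purely illustrative):

```python
import math
from collections import Counter

def category_frequency_vector(article_category_lists):
    """Aggregate the categories of all articles in a set into one frequency vector."""
    counts = Counter()
    for categories in article_category_lists:
        counts.update(categories)
    return counts

def cosine_similarity(vec_a, vec_b):
    dot = sum(vec_a[c] * vec_b[c] for c in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Toy example: the flawed set and its reliable negatives share categories,
# while a random untagged set does not.
flawed = category_frequency_vector([["Fiction", "Novels"], ["Fiction"]])
reliable = category_frequency_vector([["Fiction"], ["Novels", "Fiction"]])
random_untagged = category_frequency_vector([["Chemistry"], ["Sports"]])
print(cosine_similarity(flawed, reliable), cosine_similarity(flawed, random_untagged))
```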
75 We can see that the topics of articles in the positive training sets are highly similar to the topics of the corresponding reliable negative articles while they show little similarity to the articles in the random set. [sent-230, score-0.643]
76 That is, a flaw such as in-universe is restricted to a very narrow selection of articles, while a flaw such as copy edit can be applied to most articles and rather shows a topical preference due to reasons outlined in Section 3. [sent-234, score-1.632]
77 Table 3: Cosine similarity scores between the category frequency vectors of the flawed article sets and the respective random or reliable negatives. It is therefore to be expected that flaws with a small Atopic are more prone to the topic bias. [sent-259, score-0.87]
78 Our system for quality flaw detection follows the approach by Ferschke et al. [sent-261, score-0.806]
79 The BASE configuration uses the newest version of each flawed article as positive instances and a random set of untagged articles as negative instances. [sent-266, score-0.656]
80 Features An extensive survey of features for quality flaw recognition has been provided by Anderka et al. [sent-271, score-0.795]
81 Table 5: Average F1-scores over all flaws on RELP using all features. [sent-294, score-1.125]
82 7 Evaluation and Discussion: The SVMs achieve a similar cross-validated performance on all feature sets containing ngrams, showing only minor improvements for individual flaws when adding non-lexical features. [sent-311, score-0.33]
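The evaluation setup sketched below is an assumed reconstruction: the summary specifies SVMs with an RBF kernel, n-gram features and 10-fold cross-validation, but not the exact feature weighting or parameters, so the choices here (TF-IDF weighting, uni- to trigrams, default SVM settings) are placeholders rather than the paper's configuration.

```python
# Hedged sketch of 10-fold cross-validation of an RBF-kernel SVM over n-gram
# features; feature weighting and SVM parameters are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def cross_validated_f1(texts, labels):
    """texts: article plain texts; labels: 1 = flawed, 0 = negative instance."""
    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 3)),  # unigram to trigram features
        SVC(kernel="rbf"),
    )
    scores = cross_val_score(model, texts, labels, cv=10, scoring="f1")
    return scores.mean()
```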
83 While structural quality flaws can be well captured by special purpose features or intensional modeling, as related work has shown, more subtle content flaws such as the neutrality and style flaws are mainly captured by the wording itself. [sent-313, score-1.231]
84 Textual features beyond the ngram level, such as syntactic and semantic qualities of the text, could further improve the classification performance of these flaws and should be addressed in future work. [sent-314, score-0.382]
85 The classifiers trained on reliable positives and random untagged articles (RELP) outperform the respective classifiers based on the BASE dataset for most flaws. [sent-321, score-0.465]
86 This confirms our original hypothesis that using the appropriate revision of each tagged article is superior to using the latest available version from the dump. [sent-322, score-0.33]
87 In the RELALL setting, however, the differences between the positive and negative instances are largely determined by the flaws alone. [sent-329, score-0.487]
88 Hereby, the positive training and test instances remain the same in both settings, while the unbiased data contains negative instances sampled from Arel and the biased data from Arnd (see Figure 1). [sent-335, score-0.352]
89 With the NGRAM feature set, the reliable classifiers outperformed the unreliable classifiers on all flaws that can be well identified with lexical cues, such as Advert or Technical. [sent-336, score-0.519]
90 In the biased case, we found both topic related and flaw specific ngrams among the most highly ranked ngram features. [sent-337, score-0.86]
91 In the unbiased case, most of the informative ngrams were flaw specific expressions. [sent-338, score-0.806]
92 Consequently, biased classifiers fail on the unbiased dataset in which the positive and negative class are sampled from the same topics, which renders the highly ranked topic ngrams unusable. [sent-339, score-0.361]
93 A direct comparison of our results to related work is difficult, since neutrality and style flaws have not been targeted before in a similar manner. [sent-343, score-0.483]
94 However, the Advert flaw was also part of the ten flaw types in the PAN Quality Flaw Recognition Task (Anderka and Stein, 2012b). [sent-344, score-1.366]
95 8 Conclusions: We showed that text classification based on Wikipedia cleanup templates is prone to a topic bias which causes skewed classifiers and overly optimistic cross-validated evaluation results. [sent-347, score-0.672]
96 Table 6: F1 scores for the 10-fold cross-validation of the SVMs with RBF kernel on all datasets using NGRAM features. We demonstrated how to avoid the topic bias when creating quality flaw corpora. [sent-390, score-0.923]
97 Unbiased classifiers better reflect the performance of quality flaw recognition “in the wild”, because they detect actual flawed articles rather than identifying the articles that are prone to certain quality flaws due to their topic or subject matter. [sent-392, score-1.423]
98 In our experiments, we presented a system for identifying Wikipedia articles with style and neutrality flaws, a novel category of quality problems that is of particular importance within and outside of Wikipedia. [sent-393, score-0.383]
99 We showed that selecting a reliable set of positive training instances mined from the revision history improves the classification performance. [sent-394, score-0.353]
100 In future work, we aim to extend our quality flaw detection system to not only find articles that contain a particular flaw, but also to identify the flaws within the articles, which can be achieved by leveraging the positional information of in-line cleanup templates. [sent-395, score-1.56]
wordName wordTfidf (topN-words)
[('flaw', 0.683), ('flaws', 0.33), ('cleanup', 0.282), ('article', 0.172), ('anderka', 0.143), ('articles', 0.142), ('templates', 0.139), ('revision', 0.132), ('flawed', 0.126), ('template', 0.117), ('ferschke', 0.11), ('neutrality', 0.093), ('quality', 0.088), ('topical', 0.086), ('wikipedia', 0.085), ('unbiased', 0.08), ('reliable', 0.079), ('bias', 0.073), ('relall', 0.066), ('relp', 0.066), ('trivia', 0.066), ('untagged', 0.062), ('style', 0.06), ('instances', 0.058), ('rbf', 0.057), ('authorship', 0.056), ('classifiers', 0.055), ('atopic', 0.055), ('topic', 0.054), ('biased', 0.053), ('negatives', 0.049), ('advert', 0.049), ('restriction', 0.049), ('tone', 0.048), ('arel', 0.044), ('arnd', 0.044), ('ferretti', 0.044), ('ngrams', 0.043), ('negative', 0.042), ('weasel', 0.039), ('pov', 0.039), ('af', 0.036), ('confusing', 0.036), ('detection', 0.035), ('stein', 0.035), ('orgy', 0.034), ('positive', 0.034), ('positives', 0.033), ('globalize', 0.033), ('maik', 0.033), ('mikros', 0.033), ('peacock', 0.033), ('stvilia', 0.033), ('iryna', 0.033), ('tf', 0.032), ('gy', 0.03), ('stumps', 0.029), ('ngram', 0.027), ('svms', 0.026), ('rd', 0.026), ('restrictions', 0.026), ('latest', 0.026), ('pu', 0.026), ('oliver', 0.026), ('farkas', 0.025), ('classification', 0.025), ('history', 0.025), ('kernel', 0.025), ('recognition', 0.024), ('scope', 0.024), ('optimistic', 0.023), ('benno', 0.023), ('clef', 0.023), ('classifier', 0.023), ('largely', 0.023), ('labs', 0.022), ('disputed', 0.022), ('luyckx', 0.022), ('manifestations', 0.022), ('clusters', 0.022), ('stylistic', 0.022), ('outliers', 0.022), ('zesch', 0.022), ('cluster', 0.021), ('prone', 0.021), ('topics', 0.021), ('topically', 0.02), ('naive', 0.02), ('random', 0.02), ('miscellaneous', 0.019), ('npov', 0.019), ('outlinks', 0.019), ('restricted', 0.019), ('copy', 0.019), ('systematic', 0.019), ('respective', 0.019), ('pan', 0.018), ('lists', 0.018), ('bayes', 0.018), ('szarvas', 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 346 acl-2013-The Impact of Topic Bias on Quality Flaw Prediction in Wikipedia
Author: Oliver Ferschke ; Iryna Gurevych ; Marc Rittberger
Abstract: With the increasing amount of user generated reference texts in the web, automatic quality assessment has become a key challenge. However, only a small amount of annotated data is available for training quality assessment systems. Wikipedia contains a large amount of texts annotated with cleanup templates which identify quality flaws. We show that the distribution of these labels is topically biased, since they cannot be applied freely to any arbitrary article. We argue that it is necessary to consider the topical restrictions of each label in order to avoid a sampling bias that results in a skewed classifier and overly optimistic evaluation results. We factor out the topic bias by extracting reliable training instances from the revision history which have a topic distribution similar to the labeled articles. This approach better reflects the situation a classifier would face in a real-life application.
2 0.13033755 169 acl-2013-Generating Synthetic Comparable Questions for News Articles
Author: Oleg Rokhlenko ; Idan Szpektor
Abstract: We introduce the novel task of automatically generating questions that are relevant to a text but do not appear in it. One motivating example of its application is for increasing user engagement around news articles by suggesting relevant comparable questions, such as “is Beyonce a better singer than Madonna?”, for the user to answer. We present the first algorithm for the task, which consists of: (a) offline construction of a comparable question template database; (b) ranking of relevant templates to a given article; and (c) instantiation of templates only with entities in the article whose comparison under the template’s relation makes sense. We tested the suggestions generated by our algorithm via a Mechanical Turk experiment, which showed a significant improvement over the strongest baseline of more than 45% in all metrics.
3 0.10931554 232 acl-2013-Linguistic Models for Analyzing and Detecting Biased Language
Author: Marta Recasens ; Cristian Danescu-Niculescu-Mizil ; Dan Jurafsky
Abstract: Unbiased language is a requirement for reference sources like encyclopedias and scientific texts. Bias is, nonetheless, ubiquitous, making it crucial to understand its nature and linguistic realization and hence detect bias automatically. To this end we analyze real instances of human edits designed to remove bias from Wikipedia articles. The analysis uncovers two classes of bias: framing bias, such as praising or perspective-specific words, which we link to the literature on subjectivity; and epistemological bias, related to whether propositions that are presupposed or entailed in the text are uncontroversially accepted as true. We identify common linguistic cues for these classes, including factive verbs, implicatives, hedges, and subjective intensifiers. These insights help us develop features for a model to solve a new prediction task of practical importance: given a biased sentence, identify the bias-inducing word. Our linguistically-informed model performs almost as well as humans tested on the same task.
4 0.078412443 356 acl-2013-Transfer Learning Based Cross-lingual Knowledge Extraction for Wikipedia
Author: Zhigang Wang ; Zhixing Li ; Juanzi Li ; Jie Tang ; Jeff Z. Pan
Abstract: Wikipedia infoboxes are a valuable source of structured knowledge for global knowledge sharing. However, infobox information is very incomplete and imbalanced among the Wikipedias in different languages. It is a promising but challenging problem to utilize the rich structured knowledge from a source language Wikipedia to help complete the missing infoboxes for a target language. In this paper, we formulate the problem of cross-lingual knowledge extraction from multilingual Wikipedia sources, and present a novel framework, called WikiCiKE, to solve this problem. An instancebased transfer learning method is utilized to overcome the problems of topic drift and translation errors. Our experimental results demonstrate that WikiCiKE outperforms the monolingual knowledge extraction method and the translation-based method.
5 0.074940659 376 acl-2013-Using Lexical Expansion to Learn Inference Rules from Sparse Data
Author: Oren Melamud ; Ido Dagan ; Jacob Goldberger ; Idan Szpektor
Abstract: Automatic acquisition of inference rules for predicates is widely addressed by computing distributional similarity scores between vectors of argument words. In this scheme, prior work typically refrained from learning rules for low frequency predicates associated with very sparse argument vectors due to expected low reliability. To improve the learning of such rules in an unsupervised way, we propose to lexically expand sparse argument word vectors with semantically similar words. Our evaluation shows that lexical expansion significantly improves performance in comparison to state-of-the-art baselines.
6 0.057646375 129 acl-2013-Domain-Independent Abstract Generation for Focused Meeting Summarization
7 0.053648517 351 acl-2013-Topic Modeling Based Classification of Clinical Reports
8 0.051589034 121 acl-2013-Discovering User Interactions in Ideological Discussions
9 0.050746024 233 acl-2013-Linking Tweets to News: A Framework to Enrich Short Text Data in Social Media
10 0.048250049 74 acl-2013-Building Comparable Corpora Based on Bilingual LDA Model
11 0.047892224 248 acl-2013-Modelling Annotator Bias with Multi-task Gaussian Processes: An Application to Machine Translation Quality Estimation
12 0.046631638 323 acl-2013-Simpler unsupervised POS tagging with bilingual projections
13 0.045139637 21 acl-2013-A Statistical NLG Framework for Aggregated Planning and Realization
14 0.044981122 55 acl-2013-Are Semantically Coherent Topic Models Useful for Ad Hoc Information Retrieval?
15 0.043591239 67 acl-2013-Bi-directional Inter-dependencies of Subjective Expressions and Targets and their Value for a Joint Model
16 0.040725164 78 acl-2013-Categorization of Turkish News Documents with Morphological Analysis
17 0.038964905 144 acl-2013-Explicit and Implicit Syntactic Features for Text Classification
18 0.03879514 300 acl-2013-Reducing Annotation Effort for Quality Estimation via Active Learning
19 0.038321607 140 acl-2013-Evaluating Text Segmentation using Boundary Edit Distance
20 0.038045738 120 acl-2013-Dirt Cheap Web-Scale Parallel Text from the Common Crawl
topicId topicWeight
[(0, 0.115), (1, 0.059), (2, 0.013), (3, -0.041), (4, 0.028), (5, -0.017), (6, 0.029), (7, -0.021), (8, 0.01), (9, -0.027), (10, 0.009), (11, 0.039), (12, -0.017), (13, 0.032), (14, -0.05), (15, -0.021), (16, -0.016), (17, 0.024), (18, -0.001), (19, -0.011), (20, 0.001), (21, 0.025), (22, -0.011), (23, 0.057), (24, 0.013), (25, 0.028), (26, 0.041), (27, -0.067), (28, -0.027), (29, 0.024), (30, -0.044), (31, 0.008), (32, 0.047), (33, 0.01), (34, -0.07), (35, 0.061), (36, 0.018), (37, 0.046), (38, -0.025), (39, 0.002), (40, -0.011), (41, 0.072), (42, 0.003), (43, -0.012), (44, 0.026), (45, -0.057), (46, 0.046), (47, -0.026), (48, -0.03), (49, 0.035)]
simIndex simValue paperId paperTitle
same-paper 1 0.918275 346 acl-2013-The Impact of Topic Bias on Quality Flaw Prediction in Wikipedia
Author: Oliver Ferschke ; Iryna Gurevych ; Marc Rittberger
Abstract: With the increasing amount of user generated reference texts in the web, automatic quality assessment has become a key challenge. However, only a small amount of annotated data is available for training quality assessment systems. Wikipedia contains a large amount of texts annotated with cleanup templates which identify quality flaws. We show that the distribution of these labels is topically biased, since they cannot be applied freely to any arbitrary article. We argue that it is necessary to consider the topical restrictions of each label in order to avoid a sampling bias that results in a skewed classifier and overly optimistic evaluation results. We factor out the topic bias by extracting reliable training instances from the revision history which have a topic distribution similar to the labeled articles. This approach better reflects the situation a classifier would face in a real-life application.
2 0.65799367 21 acl-2013-A Statistical NLG Framework for Aggregated Planning and Realization
Author: Ravi Kondadadi ; Blake Howald ; Frank Schilder
Abstract: We present a hybrid natural language generation (NLG) system that consolidates macro and micro planning and surface realization tasks into one statistical learning process. Our novel approach is based on deriving a template bank automatically from a corpus of texts from a target domain. First, we identify domain specific entity tags and Discourse Representation Structures on a per sentence basis. Each sentence is then organized into semantically similar groups (representing a domain specific concept) by k-means clustering. After this semi-automatic processing (human review of cluster assignments), a number of corpus-level statistics are compiled and used as features by a ranking SVM to develop model weights from a training corpus. At generation time, a set of input data, the collection of semantically organized templates, and the model weights are used to select optimal templates. Our system is evaluated with automatic, non-expert crowdsourced and expert evaluation metrics. We also introduce a novel automatic metric, syntactic variability, that represents linguistic variation as a measure of unique template sequences across a collection of automatically generated documents. The metrics for generated weather and biography texts fall within acceptable ranges. In sum, we argue that our statistical approach to NLG reduces the need for complicated knowledge-based architectures and readily adapts to different domains with reduced development time. (Ravi Kondadadi is now affiliated with Nuance Communications, Inc.)
3 0.62330592 14 acl-2013-A Novel Classifier Based on Quantum Computation
Author: Ding Liu ; Xiaofang Yang ; Minghu Jiang
Abstract: In this article, we propose a novel classifier based on quantum computation theory. Different from existing methods, we consider the classification as an evolutionary process of a physical system and build the classifier by using the basic quantum mechanics equation. The performance of the experiments on two datasets indicates feasibility and potentiality of the quantum classifier.
4 0.61989492 356 acl-2013-Transfer Learning Based Cross-lingual Knowledge Extraction for Wikipedia
Author: Zhigang Wang ; Zhixing Li ; Juanzi Li ; Jie Tang ; Jeff Z. Pan
Abstract: Wikipedia infoboxes are a valuable source of structured knowledge for global knowledge sharing. However, infobox information is very incomplete and imbalanced among the Wikipedias in different languages. It is a promising but challenging problem to utilize the rich structured knowledge from a source language Wikipedia to help complete the missing infoboxes for a target language. In this paper, we formulate the problem of cross-lingual knowledge extraction from multilingual Wikipedia sources, and present a novel framework, called WikiCiKE, to solve this problem. An instancebased transfer learning method is utilized to overcome the problems of topic drift and translation errors. Our experimental results demonstrate that WikiCiKE outperforms the monolingual knowledge extraction method and the translation-based method.
5 0.61873597 298 acl-2013-Recognizing Rare Social Phenomena in Conversation: Empowerment Detection in Support Group Chatrooms
Author: Elijah Mayfield ; David Adamson ; Carolyn Penstein Rose
Abstract: Automated annotation of social behavior in conversation is necessary for large-scale analysis of real-world conversational data. Important behavioral categories, though, are often sparse and often appear only in specific subsections of a conversation. This makes supervised machine learning difficult, through a combination of noisy features and unbalanced class distributions. We propose within-instance content selection, using cue features to selectively suppress sections of text and biasing the remaining representation towards minority classes. We show the effectiveness of this technique in automated annotation of empowerment language in online support group chatrooms. Our technique is significantly more accurate than multiple baselines, especially when prioritizing high precision.
6 0.6084168 52 acl-2013-Annotating named entities in clinical text by combining pre-annotation and active learning
7 0.58963716 373 acl-2013-Using Conceptual Class Attributes to Characterize Social Media Users
8 0.58820808 337 acl-2013-Tag2Blog: Narrative Generation from Satellite Tag Data
9 0.58006668 232 acl-2013-Linguistic Models for Analyzing and Detecting Biased Language
10 0.56041515 134 acl-2013-Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction
11 0.55973256 144 acl-2013-Explicit and Implicit Syntactic Features for Text Classification
12 0.55920225 129 acl-2013-Domain-Independent Abstract Generation for Focused Meeting Summarization
13 0.55419904 268 acl-2013-PATHS: A System for Accessing Cultural Heritage Collections
15 0.54021031 30 acl-2013-A computational approach to politeness with application to social factors
16 0.53663319 169 acl-2013-Generating Synthetic Comparable Questions for News Articles
17 0.53590834 78 acl-2013-Categorization of Turkish News Documents with Morphological Analysis
18 0.52674389 340 acl-2013-Text-Driven Toponym Resolution using Indirect Supervision
19 0.52075368 63 acl-2013-Automatic detection of deception in child-produced speech using syntactic complexity features
20 0.52016914 65 acl-2013-BRAINSUP: Brainstorming Support for Creative Sentence Generation
topicId topicWeight
[(0, 0.051), (6, 0.046), (7, 0.237), (11, 0.061), (15, 0.041), (24, 0.055), (26, 0.061), (33, 0.015), (35, 0.077), (42, 0.038), (48, 0.024), (70, 0.056), (88, 0.041), (90, 0.023), (95, 0.068)]
simIndex simValue paperId paperTitle
1 0.80206645 253 acl-2013-Multilingual Affect Polarity and Valence Prediction in Metaphor-Rich Texts
Author: Zornitsa Kozareva
Abstract: Metaphor is an important way of conveying the affect of people, hence understanding how people use metaphors to convey affect is important for the communication between individuals and increases cohesion if the perceived affect of the concrete example is the same for the two individuals. Therefore, building computational models that can automatically identify the affect in metaphor-rich texts like “The team captain is a rock.”, “Time is money.”, “My lawyer is a shark.” is an important challenging problem, which has been of great interest to the research community. To solve this task, we have collected and manually annotated the affect of metaphor-rich texts for four languages. We present novel algorithms that integrate triggers for cognitive, affective, perceptual and social processes with stylistic and lexical information. By running evaluations on datasets in English, Spanish, Russian and Farsi, we show that the developed affect polarity and valence prediction technology of metaphor-rich texts is portable and works equally well for different languages.
same-paper 2 0.79965836 346 acl-2013-The Impact of Topic Bias on Quality Flaw Prediction in Wikipedia
Author: Oliver Ferschke ; Iryna Gurevych ; Marc Rittberger
Abstract: With the increasing amount of user generated reference texts in the web, automatic quality assessment has become a key challenge. However, only a small amount of annotated data is available for training quality assessment systems. Wikipedia contains a large amount of texts annotated with cleanup templates which identify quality flaws. We show that the distribution of these labels is topically biased, since they cannot be applied freely to any arbitrary article. We argue that it is necessary to consider the topical restrictions of each label in order to avoid a sampling bias that results in a skewed classifier and overly optimistic evaluation results. We factor out the topic bias by extracting reliable training instances from the revision history which have a topic distribution similar to the labeled articles. This approach better reflects the situation a classifier would face in a real-life application.
3 0.78551286 153 acl-2013-Extracting Events with Informal Temporal References in Personal Histories in Online Communities
Author: Miaomiao Wen ; Zeyu Zheng ; Hyeju Jang ; Guang Xiang ; Carolyn Penstein Rose
Abstract: We present a system for extracting the dates of illness events (year and month of the event occurrence) from posting histories in the context of an online medical support community. A temporal tagger retrieves and normalizes dates mentioned informally in social media to actual month and year referents. Building on this, an event date extraction system learns to integrate the likelihood of candidate dates extracted from time-rich sentences with temporal constraints extracted from eventrelated sentences. Our integrated model achieves 89.7% of the maximum performance given the performance of the temporal expression retrieval step.
4 0.73934352 259 acl-2013-Non-Monotonic Sentence Alignment via Semisupervised Learning
Author: Xiaojun Quan ; Chunyu Kit ; Yan Song
Abstract: This paper studies the problem of nonmonotonic sentence alignment, motivated by the observation that coupled sentences in real bitexts do not necessarily occur monotonically, and proposes a semisupervised learning approach based on two assumptions: (1) sentences with high affinity in one language tend to have their counterparts with similar relatedness in the other; and (2) initial alignment is readily available with existing alignment techniques. They are incorporated as two constraints into a semisupervised learning framework for optimization to produce a globally optimal solution. The evaluation with real-world legal data from a comprehensive legislation corpus shows that while existing alignment algorithms suffer severely from non-monotonicity, this approach can work effectively on both monotonic and non-monotonic data.
5 0.62919325 264 acl-2013-Online Relative Margin Maximization for Statistical Machine Translation
Author: Vladimir Eidelman ; Yuval Marton ; Philip Resnik
Abstract: Recent advances in large-margin learning have shown that better generalization can be achieved by incorporating higher order information into the optimization, such as the spread of the data. However, these solutions are impractical in complex structured prediction problems such as statistical machine translation. We present an online gradient-based algorithm for relative margin maximization, which bounds the spread of the projected data while maximizing the margin. We evaluate our optimizer on Chinese-English and Arabic-English translation tasks, each with small and large feature sets, and show that our learner is able to achieve significant improvements of 1.2-2 BLEU and 1.7-4.3 TER on average over state-of-the-art optimizers with the large feature set.
6 0.59585965 233 acl-2013-Linking Tweets to News: A Framework to Enrich Short Text Data in Social Media
7 0.59156424 318 acl-2013-Sentiment Relevance
8 0.58530432 2 acl-2013-A Bayesian Model for Joint Unsupervised Induction of Sentiment, Aspect and Discourse Representations
9 0.57961893 144 acl-2013-Explicit and Implicit Syntactic Features for Text Classification
10 0.57941955 169 acl-2013-Generating Synthetic Comparable Questions for News Articles
11 0.57693541 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model
12 0.57501948 333 acl-2013-Summarization Through Submodularity and Dispersion
13 0.57305294 272 acl-2013-Paraphrase-Driven Learning for Open Question Answering
14 0.57245177 134 acl-2013-Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction
15 0.57226598 250 acl-2013-Models of Translation Competitions
16 0.57219672 159 acl-2013-Filling Knowledge Base Gaps for Distant Supervision of Relation Extraction
17 0.57168752 18 acl-2013-A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization
18 0.57050633 373 acl-2013-Using Conceptual Class Attributes to Characterize Social Media Users
19 0.57046866 224 acl-2013-Learning to Extract International Relations from Political Context
20 0.56940401 187 acl-2013-Identifying Opinion Subgroups in Arabic Online Discussions