acl acl2013 acl2013-309 knowledge-graph by maker-knowledge-mining

309 acl-2013-Scaling Semi-supervised Naive Bayes with Feature Marginals


Source: pdf

Author: Michael Lucas ; Doug Downey

Abstract: Semi-supervised learning (SSL) methods augment standard machine learning (ML) techniques to leverage unlabeled data. SSL techniques are often effective in text classification, where labeled data is scarce but large unlabeled corpora are readily available. However, existing SSL techniques typically require multiple passes over the entirety of the unlabeled data, meaning the techniques are not applicable to large corpora being produced today. In this paper, we show that improving marginal word frequency estimates using unlabeled data can enable semi-supervised text classification that scales to massive unlabeled data sets. We present a novel learning algorithm, which optimizes a Naive Bayes model to accord with statistics calculated from the unlabeled corpus. In experiments with text topic classification and sentiment analysis, we show that our method is both more scalable and more accurate than SSL techniques from previous work.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract Semi-supervised learning (SSL) methods augment standard machine learning (ML) techniques to leverage unlabeled data. [sent-5, score-0.373]

2 SSL techniques are often effective in text classification, where labeled data is scarce but large unlabeled corpora are readily available. [sent-6, score-0.432]

3 However, existing SSL techniques typically require multiple passes over the entirety of the unlabeled data, meaning the techniques are not applicable to large corpora being produced today. [sent-7, score-0.498]

4 In this paper, we show that improving marginal word frequency estimates using unlabeled data can enable semi-supervised text classification that scales to massive unlabeled data sets. [sent-8, score-1.114]

5 We present a novel learning algorithm, which optimizes a Naive Bayes model to accord with statistics calculated from the unlabeled corpus. [sent-9, score-0.473]

6 In experiments with text topic classification and sentiment analysis, we show that our method is both more scalable and more accurate than SSL techniques from previous work. [sent-10, score-0.266]

7 1 Introduction Semi-supervised Learning (SSL) is a Machine Learning (ML) approach that utilizes large amounts of unlabeled data, combined with a smaller amount of labeled data, to learn a target function (Zhu, 2006; Chapelle et al. [sent-11, score-0.438]

8 Experiments in text classification and other domains have demonstrated that by leveraging unlabeled data, SSL techniques improve machine learning performance when human input is limited. [sent-14, score-0.468]

9 Typically, for each target concept to be learned, a semi-supervised classifier is trained using iterative techniques that execute multiple passes over the unlabeled data. [sent-20, score-0.477]

10 This is problematic for text classification over large unlabeled corpora like the Web: new target concepts (new tasks and new topics of interest) arise frequently, and performing even a single pass over a large corpus for each new target concept is intractable. [sent-24, score-0.428]

11 In this paper, we present a new SSL text classification approach that scales to large corpora. [sent-25, score-0.13]

12 Instead of utilizing unlabeled examples directly for each given target concept, our approach is to precompute a small set of statistics over the unlabeled data in advance. [sent-26, score-0.769]

13 Then, for a given target class and labeled data set, we utilize the statistics to improve a classifier. [sent-27, score-0.182]

14 Specifically, we introduce a method that extends Multinomial Naive Bayes (MNB) to leverage marginal probability statistics P(w) of each word w, computed over the unlabeled data. [sent-28, score-0.59]

15 The marginal statistics are used as a constraint to improve the class-conditional probability estimates P(w|+) and P(w|−) for the positive and negative classes, which are often noisy when estimated over sparse labeled data sets. [sent-29, score-0.644]

16 MNB-FM improves accuracy, and we find that, surprisingly, MNB-FM is especially useful for improving class-conditional probability estimates for words that never occur in the training set. [sent-35, score-0.256]

17 2 Problem Definition We consider a semi-supervised classification task, in which the goal is to produce a mapping from an instance space X consisting of T-tuples of non-negative integer-valued features w = (w1, ..., wT). [sent-41, score-0.095]

18 We assume the following inputs: • A set of zero or more labeled documents DL = {(wd, yd) | d = 1, ..., nl}. [sent-47, score-0.1]

19 • A large set of unlabeled documents DU = {(wd) | d = 1, … [sent-57, score-0.374]

20 …, nu} drawn from the marginal distribution P(w) = Σy P(w, y). [sent-60, score-0.124]

21 Our semi-supervised technique utilizes statistics computed over the labeled corpus, denoted as follows. [sent-63, score-0.203]

22 We use Nw+ to denote the sum of the occurrences of word w over all documents in the positive class in the labeled data DL. [sent-64, score-0.171]

23 Also, let N+ = Σw∈DL Nw+ be the sum of all word counts in the labeled positive documents. [sent-65, score-0.145]

24 The count of the remaining words in the positive documents is represented as N¬w+ = N+ − Nw+. [sent-66, score-0.127]
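
To make the notation above concrete, here is a minimal Python sketch (not the authors' code; the Counter-based document representation and all function names are illustrative assumptions) that computes the labeled-corpus counts Nw+ and N+, and, in a single pass, the unlabeled-corpus word marginals P(w):

```python
from collections import Counter

def labeled_counts(labeled_docs):
    """Compute Nw+ (occurrences of each word w in positive documents) and
    N+ (their sum), from (word-count Counter, label) pairs with label +1/-1."""
    n_w_pos = Counter()
    for words, label in labeled_docs:
        if label == +1:
            n_w_pos.update(words)
    n_pos = sum(n_w_pos.values())
    return n_w_pos, n_pos

def unlabeled_marginals(unlabeled_docs):
    """Single pass over the unlabeled corpus:
    P(w) = (token count of w) / (total number of tokens)."""
    token_counts = Counter()
    for words in unlabeled_docs:
        token_counts.update(words)
    total = sum(token_counts.values())
    return {w: c / total for w, c in token_counts.items()}, total
```

The negative-class counts Nw− and N− are obtained the same way from the negative labeled documents.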

25 3 MNB with Feature Marginals We now introduce our algorithm, which scalably utilizes large unlabeled data stores for classification tasks. [sent-68, score-0.474]

26 3.1 MNB-FM Method In the text classification setting, each feature value wd represents the count of observations of word w in document d. [sent-71, score-0.127]

27 Let P(+) denote the prior probability that a document is of the positive class, and P(−) = 1 − P(+) the prior for the negative class. [sent-75, score-0.214]

28 Then MNB represents the class probability of an example as: P(+|d) = [ ∏w∈d (θw+)^wd · P(+) ] / [ ∏w∈d (θw−)^wd · P(−) + ∏w∈d (θw+)^wd · P(+) ] (1) MNB estimates the parameters θw+ from the corresponding counts in the training set. [sent-76, score-0.298]

29 The maximum-likelihood estimate of θw+ is Nw+/N+, and to prevent zero-probability estimates we employ “add-1” smoothing (typical in MNB) to obtain the estimate: θw+ = (Nw+ + 1) / (N+ + |T|). [sent-77, score-0.236]
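
Below is a hedged sketch of standard MNB with these add-1 estimates and the scoring rule of Equation 1, computed in log space for numerical stability (not the authors' implementation; the function names and vocabulary handling are illustrative assumptions):

```python
import math

def mnb_parameters(n_w_pos, n_pos, n_w_neg, n_neg, vocab):
    """Add-1 smoothed estimates: theta_w^+ = (Nw+ + 1) / (N+ + |T|), and likewise
    for the negative class, over a fixed vocabulary of size |T| = len(vocab)."""
    T = len(vocab)
    theta_pos = {w: (n_w_pos.get(w, 0) + 1) / (n_pos + T) for w in vocab}
    theta_neg = {w: (n_w_neg.get(w, 0) + 1) / (n_neg + T) for w in vocab}
    return theta_pos, theta_neg

def mnb_posterior_pos(doc, theta_pos, theta_neg, prior_pos):
    """Equation 1: P(+|d) for a document given as a Counter of word counts.
    Out-of-vocabulary words are ignored."""
    log_pos, log_neg = math.log(prior_pos), math.log(1.0 - prior_pos)
    for w, c in doc.items():
        if w in theta_pos:
            log_pos += c * math.log(theta_pos[w])
            log_neg += c * math.log(theta_neg[w])
    m = max(log_pos, log_neg)  # shift before exponentiating to avoid underflow
    return math.exp(log_pos - m) / (math.exp(log_pos - m) + math.exp(log_neg - m))
```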

30 MNB-FM attempts to improve MNB’s estimates of θw+ and θw−, using statistics computed over the unlabeled data. [sent-79, score-0.596]

31 Formally, MNB-FM leverages the equality: P(w) = θw+Pt(+) + θw−Pt(−) (2) The left-hand-side of Equation 2, P(w), represents the probability that a given randomly drawn token from the unlabeled data happens to be the word w. [sent-80, score-0.368]

32 Note that Pt(+) can differ from P(+), the prior probability that a document is positive, due to variations in document length. [sent-84, score-0.099]

33 MNB-FM is motivated by the insight that the left-hand side of Equation 2 can be estimated in advance, without knowledge of the target class, simply by counting the number of tokens of each word in the unlabeled data. [sent-86, score-0.333]

34 MNB-FM attempts to improve the noisy estimates θw+ and θw− utilizing the robust estimate for P(w) computed over unlabeled data. [sent-90, score-0.611]

35 Specifically, MNB-FM proceeds by assuming the MLEs for P(w) (computed over unlabeled data), Pt(+), and Pt(−) are correct, and re-estimates θw+ and θw− under the constraint in Equation 2. [sent-91, score-0.356]

36 In that case (i.e., when no valid re-estimate exists under the constraint), we default to the add-1 smoothing estimates used by MNB. [sent-96, score-0.165]

37 Finally, after optimizing the values θw+ and θw− for each word w as described above, we normalize the estimates to obtain valid conditional probability distributions, i.e., ones that sum to one over the vocabulary. [sent-97, score-0.2]
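
The per-word re-estimation described in sentences 35 to 37 can be sketched as a simple numerical maximization of the constrained likelihood; this is an illustrative stand-in, not the paper's exact optimization procedure. The feasible interval for θw+ follows from requiring θw− = (P(w) − θw+·Pt(+)) / Pt(−) to lie in (0, 1), and the fallback to the add-1 estimates mirrors sentence 36. All names below are assumptions.

```python
import numpy as np

def mnb_fm_reestimate(p_w, pt_pos, n_w_pos, n_notw_pos, n_w_neg, n_notw_neg,
                      fallback_pos, fallback_neg, grid=10_000):
    """Re-estimate (theta_w^+, theta_w^-) subject to
    P(w) = theta_w^+ * Pt(+) + theta_w^- * Pt(-), by maximizing
    Nw+ ln(t+) + N_notw+ ln(1 - t+) + Nw- ln(t-) + N_notw- ln(1 - t-)
    over the feasible interval for t+ (numeric grid search)."""
    pt_neg = 1.0 - pt_pos
    lo = max(0.0, (p_w - pt_neg) / pt_pos)   # keeps theta_w^- below 1
    hi = min(1.0, p_w / pt_pos)              # keeps theta_w^- above 0
    if not lo < hi:
        return fallback_pos, fallback_neg    # no feasible value: default to add-1 smoothing
    eps = (hi - lo) * 1e-6
    t_pos = np.linspace(lo + eps, hi - eps, grid)
    t_neg = (p_w - t_pos * pt_pos) / pt_neg
    loglik = (n_w_pos * np.log(t_pos) + n_notw_pos * np.log(1.0 - t_pos)
              + n_w_neg * np.log(t_neg) + n_notw_neg * np.log(1.0 - t_neg))
    best = int(np.argmax(loglik))
    return float(t_pos[best]), float(t_neg[best])
```

After every word has been re-estimated this way, the θ+ and θ− vectors are normalized so that each sums to one, as stated in sentence 37.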

38 3.2 MNB-FM Example The following concrete example illustrates how MNB-FM can improve MNB parameters using the statistic P(w) computed over unlabeled data. [sent-100, score-0.379]

39 The example comes from the Reuters Aptemod text classification task addressed in Section 4, using bag-of-words features for the Earnings class. [sent-101, score-0.095]

40 In one experiment with 10 labeled training examples, we observed 5 positive and 5 negative examples, with the word “resources” occurring three times in the set (once in the positive class, twice in the negative class). [sent-102, score-0.38]

41 MNB uses add-1 smoothing to estimate the conditional probability of the word “resources” in each class, yielding the estimates θw+ and θw−. [sent-103, score-0.177]

42 Yet because MNB estimates its parameters from only the sparse training data, it can be inaccurate. [sent-114, score-0.229]

43 The optimization in MNB-FM seeks to accord its parameter estimates with the feature frequency P(w) computed from the unlabeled data. [sent-115, score-0.608]

44 We see that, compared with P(w), the θw+ and θw− that MNB estimates from the training data can be inaccurate; MNB-FM instead maximizes the constrained likelihood, argmax over θw+ of [ Nw+ ln(θw+) + N¬w+ ln(1 − θw+) + Nw− ln(K + Lθw+) + N¬w− ln(1 − K − Lθw+) ], where θw− = K + Lθw+ expresses the linear constraint of Equation 2. [sent-117, score-0.192]
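
As a purely hypothetical illustration of this adjustment (all numbers below are invented for the sketch and are not the Reuters values from this example):

```python
# Invented toy counts: the word appears once among 200 positive tokens and twice
# among 220 negative tokens; its unlabeled marginal is P(w) = 1e-4; |T| = 50,000.
theta_pos, theta_neg = mnb_fm_reestimate(
    p_w=1e-4, pt_pos=0.5,
    n_w_pos=1, n_notw_pos=199,
    n_w_neg=2, n_notw_neg=218,
    fallback_pos=(1 + 1) / (200 + 50_000),
    fallback_neg=(2 + 1) / (220 + 50_000))
```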

45 The above example illustrates how MNB-FM can leverage frequency marginal statistics computed over unlabeled data to improve MNB’s conditional probability estimates. [sent-125, score-0.619]

46 We analyze how frequently MNB-FM succeeds in improving MNB’s estimates in practice, and the resulting impact on classification accuracy, below. [sent-126, score-0.26]

47 4.1 Data Sets We evaluate on two text classification tasks: topic classification and sentiment detection. [sent-130, score-0.188]

48 , in a binary classification setting) for each topic and measure classification performance for each class individually. [sent-134, score-0.292]

49 The sentiment detection task is to determine whether a document is written with a positive or negative sentiment. [sent-135, score-0.241]

50 4.1.1 RCV1 The Reuters RCV1 corpus is a standard large corpus used for topic classification evaluations (Lewis et al.). [sent-139, score-0.126]

51 We consider the 5 largest base classes; punctuation and stopwords were removed. [sent-142, score-0.103]

52 4.1.2 Reuters Aptemod While MNB-FM is designed to improve the scalability of SSL to large corpora, some of the comparison methods from previous work were not tractable on the large topic classification data set RCV1. [sent-147, score-0.207]

53 In the Amazon Sentiment Classification data set, the task is to determine whether a review is positive or negative based solely on the reviewer’s submitted text. [sent-176, score-0.147]

54 As such, the positive and negative … (flattened table residue: Class, # Instances, # Positive, Vocabulary; Music: 124362 instances, 113997 positive). [sent-177, score-0.147]

55 For our metrics, we calculate the scores for both the positive and negative classes and report the average of the two (in contrast to the Reuters data sets, in which we only report the scores for the positive class). [sent-188, score-0.304]
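
A minimal sketch of this metric, assuming the standard scikit-learn API (macro-averaging over the two classes is exactly the average of the positive-class and negative-class F1 scores):

```python
from sklearn.metrics import f1_score

def averaged_f1(y_true, y_pred):
    # Mean of the per-class F1 scores for the two classes.
    return f1_score(y_true, y_pred, average="macro")
```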

56 We also experimented with different weighting factors to assign to the unlabeled data. [sent-197, score-0.357]

57 While performing per-data-split cross-validation was computationally prohibitive for NB+EM, we performed experiments on one class from each data set, which revealed that weighting unlabeled examples at 1/5 the weight of a labeled example performed best. [sent-198, score-0.513]

58 3 Label Propagation For our large unlabeled data set sizes, we found that a standard Label Propagation (LP) approach, which considers propagating information between all pairs of unlabeled examples, was not tractable. [sent-209, score-0.666]

59 Even with these aggressive constraints, Label Propagation was intractable to execute on some of the larger data sets, so we do not report LP results for the RCV1 dataset or for the 5 largest Amazon categories. [sent-215, score-0.119]

60 SFE also augments multinomial Naive Bayes with the frequency information P(w), although in a manner distinct from MNB-FM. [sent-223, score-0.101]

61 In particular, SFE uses the equality P(+|w) = P(+, w)/P(w) and estimates the right-hand side using P(w) computed over all the unlabeled data, rather than using only labeled data as in standard MNB. [sent-224, score-0.462]

62 The primary distinction between MNB-FM and SFE is that SFE adjusts sparse estimates P(+, w) in the same way as non-sparse estimates, whereas MNB-FM is designed to adjust sparse estimates more than nonsparse ones. [sent-225, score-0.404]

63 Further, it can be shown that as P(w) of a word w in the unlabeled data becomes larger than that in the labeled data, SFE’s estimate of the ratio P(w|+)/P(w|−) approaches one. [sent-226, score-0.434]

64 Each set included at least one positive and one negative document. [sent-232, score-0.147]

65 These experiments are limited to the 5 largest base classes and show the F1 performance of MNB-FM and the various comparison methods, excluding Label Propagation which was intractable on this data set. [sent-292, score-0.104]

66 The results show the runtimes of the SSL methods discussed in this paper as the size of the unlabeled dataset grows. [sent-376, score-0.407]

67 As expected, we find that MNB-FM has runtime similar to MNB, and scales much better than methods that take multiple passes over the unlabeled data. [sent-377, score-0.418]

68 MNB-FM improves the con- ditional probability estimates in MNB and, surprisingly, we found that it can often improve these estimates for words that do not even occur in the training set. [sent-379, score-0.421]

69 Tables 8 and 9 show the details of the improvements MNB-FM makes on the feature marginal estimates. [sent-380, score-0.156]

70 We ran MNB-FM and MNB on the RCV1 class MCAT and stored the computed feature marginals for direct comparison. [sent-381, score-0.189]

71 From the data, we can see that MNB-FM improves the estimates for many words not seen in the training set as well as the most common words, even with small training sets. [sent-387, score-0.248]

72 fraction of positive documents classified correctly) of the R highest-ranked test documents, where R is the total number of positive test documents. [sent-392, score-0.213]
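
A small sketch of R-precision as defined here (illustrative names; scores are higher-is-more-positive, labels are booleans):

```python
def r_precision(scores, labels):
    """Precision among the R highest-scoring test documents,
    where R is the total number of positive test documents."""
    R = sum(labels)
    if R == 0:
        return 0.0
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:R]) / R
```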

73 “Known” indicates words occurring in both positive and negative training examples (from Table 8, |DL| = 10). [sent-405, score-0.174]

74 “Half-Known” indicates words occurring in only positive or negative training examples, while “Unknown” indicates words that never occur in labelled examples. [sent-406, score-0.174]

75 MNB-FM improves estimates by a substantial amount for unknown words and also the most common known and half-known words. [sent-408, score-0.194]

76 However, these experiments show that MNB-FM offers more advantages in document classification than in document ranking. [sent-424, score-0.159]

77 However, LR underperforms in classification tasks (in terms of F1, Tables 4-6). [sent-426, score-0.095]

78 The reason for this is that LR’s learned classification threshold becomes less accurate when datasets are small and classes are highly skewed. [sent-427, score-0.137]

79 6 Related Work To our knowledge, MNB-FM is the first approach that utilizes a small set of statistics computed over … [sent-485, score-0.144]

80 … a large unlabeled data set as constraints to improve a semi-supervised classifier (Table 12: RCV1 R-Precision, |DL| = 100). [sent-547, score-0.333]

81 Our experiments demonstrate that MNB-FM outperforms previous approaches across multiple text classification tasks, including topic classification and sentiment analysis. [sent-548, score-0.323]

82 identifying a small number of representative unlabeled examples (Liu et al. [sent-555, score-0.359]

83 In general, these techniques require passes over the entirety of the unlabeled data for each new learning task, which is intractable for massive unlabeled data sets. [sent-557, score-0.827]

84 Naive implementations of LP cannot scale to large unlabeled data sets, as they have time complexity that increases quadratically with the number of unlabeled examples. [sent-558, score-0.666]

85 Recent LP techniques have achieved greater scalability through the use of parallel processing and heuristics such as Approximate-Nearest Neighbor (Subramanya and Bilmes, 2009), or by decomposing the similarity matrix (Lin and Cohen, 2011). [sent-559, score-0.121]

86 Our approach, by contrast, is to pre-compute a small set of marginal statistics over the unlabeled data, which eliminates the need to scan unlabeled data for each new task. [sent-560, score-0.842]
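
In code terms, continuing the illustrative sketches above (the tiny corpora here are invented placeholders), the marginals are computed once and then reused for every new target concept:

```python
from collections import Counter

# Hypothetical tiny corpora; in practice these would be large document collections.
unlabeled_docs = [Counter({"market": 2, "shares": 1}), Counter({"film": 3})]
labeled_sets = {"Earnings": [(Counter({"market": 1}), +1), (Counter({"film": 1}), -1)]}

# One pass over the unlabeled corpus, done once and cached for all future tasks.
p_u, _ = unlabeled_marginals(unlabeled_docs)

# Each new target concept touches only its small labeled set plus the cached marginals.
for target, docs in labeled_sets.items():
    n_w_pos, n_pos = labeled_counts(docs)
    # ...estimate MNB parameters, then adjust each theta_w via mnb_fm_reestimate(p_u[w], ...)
```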

87 propose the Semisupervised Frequency Estimate (SFE), which like MNB-FM utilizes the marginal probabilities of features computed from unlabeled data to improve the Multinomial Naive Bayes (MNB) classifier (Su et al. [sent-563, score-0.573]

88 However, unlike our approach, SFE does not compute maximumlikelihood estimates using the marginal statistics as a constraint. [sent-566, score-0.341]

89 A distinct method for pre-processing unlabeled data in order to help scale semi-supervised learning techniques involves dimensionality reduction or manifold learning (Belkin and Niyogi, 2004), and for NLP tasks, identifying word representations from unlabeled data (Turian et al. [sent-568, score-0.706]

90 In contrast to these approaches, MNB-FM preserves the original feature set and is more scalable (the marginal statistics can be computed in a single pass over the unlabeled data set). [sent-570, score-0.593]

91 7 Conclusion We presented a novel algorithm for efficiently leveraging large unlabeled data sets for semisupervised learning. [sent-571, score-0.388]

92 Our MNB-FM technique optimizes a Multinomial Naive Bayes model to accord with statistics of the unlabeled corpus. [sent-572, score-0.473]

93 In experiments across topic classification and sentiment analysis, MNB-FM was found to be more accurate and more scalable than several supervised and semi-supervised baselines from previous work. [sent-573, score-0.226]

94 In future work, we plan to explore utilizing richer statistics from the unlabeled data, beyond word marginals. [sent-574, score-0.41]

95 Further, we plan to experiment with techniques for unlabeled data sets that also include continuous-valued features. [sent-575, score-0.373]

96 Lastly, we also wish to explore ensemble approaches that combine the best supervised classifiers with the improved class-conditional estimates provided by MNB-FM. [sent-576, score-0.165]

97 Transductive inference for text classification using support vector machines. [sent-595, score-0.095]

98 Text classification from labeled and unlabeled documents using em. [sent-621, score-0.528]

99 Large scale text classification using semisupervised multinomial naive bayes. [sent-627, score-0.331]

100 Learning from labeled and unlabeled data with label propagation. [sent-648, score-0.421]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('mnb', 0.507), ('unlabeled', 0.333), ('ssl', 0.247), ('sfe', 0.227), ('estimates', 0.165), ('aptemod', 0.15), ('ecat', 0.15), ('gpol', 0.15), ('mcat', 0.15), ('amzn', 0.129), ('nbem', 0.129), ('sfemnb', 0.129), ('dl', 0.128), ('marginal', 0.124), ('apte', 0.114), ('nw', 0.11), ('naive', 0.109), ('gcat', 0.107), ('classification', 0.095), ('nigam', 0.087), ('positive', 0.086), ('nb', 0.086), ('reuters', 0.084), ('pt', 0.082), ('scalability', 0.081), ('bayes', 0.079), ('multinomial', 0.072), ('marginals', 0.072), ('class', 0.071), ('propagation', 0.067), ('lp', 0.065), ('accord', 0.064), ('mnbfm', 0.064), ('amazon', 0.064), ('sentiment', 0.062), ('negative', 0.061), ('ln', 0.061), ('labeled', 0.059), ('semisupervised', 0.055), ('lr', 0.055), ('statistics', 0.052), ('logistic', 0.051), ('em', 0.051), ('passes', 0.05), ('earnings', 0.047), ('runtimes', 0.047), ('utilizes', 0.046), ('computed', 0.046), ('lprop', 0.043), ('wdp', 0.043), ('estimate', 0.042), ('classes', 0.042), ('equation', 0.041), ('regression', 0.041), ('documents', 0.041), ('techniques', 0.04), ('su', 0.038), ('scalable', 0.038), ('sparse', 0.037), ('intractable', 0.036), ('mann', 0.036), ('stopwords', 0.035), ('scales', 0.035), ('entirety', 0.035), ('tables', 0.035), ('probability', 0.035), ('belkin', 0.033), ('document', 0.032), ('details', 0.032), ('chapelle', 0.031), ('topic', 0.031), ('ml', 0.031), ('execute', 0.03), ('yiming', 0.03), ('frequency', 0.029), ('label', 0.029), ('improves', 0.029), ('gw', 0.029), ('subramanya', 0.029), ('transductive', 0.029), ('smoothing', 0.029), ('zhu', 0.028), ('wy', 0.028), ('lk', 0.028), ('ghahramani', 0.028), ('training', 0.027), ('dataset', 0.027), ('turian', 0.026), ('examples', 0.026), ('largest', 0.026), ('utilizing', 0.025), ('constraint', 0.025), ('icml', 0.025), ('optimizes', 0.024), ('classifier', 0.024), ('weighting', 0.024), ('equality', 0.024), ('lewis', 0.024), ('blitzer', 0.023), ('proceeds', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999997 309 acl-2013-Scaling Semi-supervised Naive Bayes with Feature Marginals

Author: Michael Lucas ; Doug Downey

Abstract: Semi-supervised learning (SSL) methods augment standard machine learning (ML) techniques to leverage unlabeled data. SSL techniques are often effective in text classification, where labeled data is scarce but large unlabeled corpora are readily available. However, existing SSL techniques typically require multiple passes over the entirety of the unlabeled data, meaning the techniques are not applicable to large corpora being produced today. In this paper, we show that improving marginal word frequency estimates using unlabeled data can enable semi-supervised text classification that scales to massive unlabeled data sets. We present a novel learning algorithm, which optimizes a Naive Bayes model to accord with statistics calculated from the unlabeled corpus. In experiments with text topic classification and sentiment analysis, we show that our method is both more scalable and more accurate than SSL techniques from previous work.

2 0.19387057 173 acl-2013-Graph-based Semi-Supervised Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging

Author: Xiaodong Zeng ; Derek F. Wong ; Lidia S. Chao ; Isabel Trancoso

Abstract: This paper introduces a graph-based semisupervised joint model of Chinese word segmentation and part-of-speech tagging. The proposed approach is based on a graph-based label propagation technique. One constructs a nearest-neighbor similarity graph over all trigrams of labeled and unlabeled data for propagating syntactic information, i.e., label distributions. The derived label distributions are regarded as virtual evidences to regularize the learning of linear conditional random fields (CRFs) on unlabeled data. An inductive character-based joint model is obtained eventually. Empirical results on Chinese tree bank (CTB-7) and Microsoft Research corpora (MSR) reveal that the proposed model can yield better results than the supervised baselines and other competitive semi-supervised CRFs in this task.

3 0.18548104 342 acl-2013-Text Classification from Positive and Unlabeled Data using Misclassified Data Correction

Author: Fumiyo Fukumoto ; Yoshimi Suzuki ; Suguru Matsuyoshi

Abstract: This paper addresses the problem of dealing with a collection of labeled training documents, especially annotating negative training documents and presents a method of text classification from positive and unlabeled data. We applied an error detection and correction technique to the results of positive and negative documents classified by the Support Vector Machines (SVM). The results using Reuters documents showed that the method was comparable to the current state-of-the-art biasedSVM method as the F-score obtained by our method was 0.627 and biased-SVM was 0.614.

4 0.16883054 315 acl-2013-Semi-Supervised Semantic Tagging of Conversational Understanding using Markov Topic Regression

Author: Asli Celikyilmaz ; Dilek Hakkani-Tur ; Gokhan Tur ; Ruhi Sarikaya

Abstract: Finding concepts in natural language utterances is a challenging task, especially given the scarcity of labeled data for learning semantic ambiguity. Furthermore, data mismatch issues, which arise when the expected test (target) data does not exactly match the training data, aggravate this scarcity problem. To deal with these issues, we describe an efficient semisupervised learning (SSL) approach which has two components: (i) Markov Topic Regression is a new probabilistic model to cluster words into semantic tags (concepts). It can efficiently handle semantic ambiguity by extending standard topic models with two new features. First, it encodes word n-gram features from labeled source and unlabeled target data. Second, by going beyond a bag-of-words approach, it takes into account the inherent sequential nature of utterances to learn semantic classes based on context. (ii) Retrospective Learner is a new learning technique that adapts to the unlabeled target data. Our new SSL approach improves semantic tagging performance by 3% absolute over the baseline models, and also compares favorably on semi-supervised syntactic tagging.

5 0.11726753 82 acl-2013-Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation

Author: Xiaodong Zeng ; Derek F. Wong ; Lidia S. Chao ; Isabel Trancoso

Abstract: This paper presents a semi-supervised Chinese word segmentation (CWS) approach that co-regularizes character-based and word-based models. Similarly to multi-view learning, the “segmentation agreements” between the two different types of view are used to overcome the scarcity of the label information on unlabeled data. The proposed approach trains a character-based and word-based model on labeled data, respectively, as the initial models. Then, the two models are constantly updated using unlabeled examples, where the learning objective is maximizing their segmentation agreements. The agreements are regarded as a set of valuable constraints for regularizing the learning of both models on unlabeled data. The segmentation for an input sentence is decoded by using a joint scoring function combining the two induced models. The evaluation on the Chinese tree bank reveals that our model results in better gains over the state-of-the-art semi-supervised models reported in the literature.

6 0.10834722 211 acl-2013-LABR: A Large Scale Arabic Book Reviews Dataset

7 0.10223159 81 acl-2013-Co-Regression for Cross-Language Review Rating Prediction

8 0.089714311 193 acl-2013-Improving Chinese Word Segmentation on Micro-blog Using Rich Punctuations

9 0.086473271 112 acl-2013-Dependency Parser Adaptation with Subtrees from Auto-Parsed Target Domain Data

10 0.077383973 78 acl-2013-Categorization of Turkish News Documents with Morphological Analysis

11 0.075698152 174 acl-2013-Graph Propagation for Paraphrasing Out-of-Vocabulary Words in Statistical Machine Translation

12 0.074503526 351 acl-2013-Topic Modeling Based Classification of Clinical Reports

13 0.074099667 188 acl-2013-Identifying Sentiment Words Using an Optimization-based Model without Seed Words

14 0.071702801 294 acl-2013-Re-embedding words

15 0.068962842 318 acl-2013-Sentiment Relevance

16 0.064981043 325 acl-2013-Smoothed marginal distribution constraints for language modeling

17 0.059186604 345 acl-2013-The Haves and the Have-Nots: Leveraging Unlabelled Corpora for Sentiment Analysis

18 0.056839868 74 acl-2013-Building Comparable Corpora Based on Bilingual LDA Model

19 0.055633448 134 acl-2013-Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction

20 0.055019535 121 acl-2013-Discovering User Interactions in Ideological Discussions


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.162), (1, 0.066), (2, -0.034), (3, 0.051), (4, 0.062), (5, -0.094), (6, 0.021), (7, -0.003), (8, -0.083), (9, 0.035), (10, 0.073), (11, -0.004), (12, 0.011), (13, -0.029), (14, -0.058), (15, 0.023), (16, -0.067), (17, 0.078), (18, -0.015), (19, 0.035), (20, 0.052), (21, 0.044), (22, 0.044), (23, 0.066), (24, 0.034), (25, -0.016), (26, 0.018), (27, -0.009), (28, -0.088), (29, -0.044), (30, -0.085), (31, 0.051), (32, -0.025), (33, 0.249), (34, 0.048), (35, 0.0), (36, -0.118), (37, -0.097), (38, -0.004), (39, -0.035), (40, 0.002), (41, -0.053), (42, 0.029), (43, 0.08), (44, 0.031), (45, 0.065), (46, 0.085), (47, -0.022), (48, 0.143), (49, -0.049)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94423395 309 acl-2013-Scaling Semi-supervised Naive Bayes with Feature Marginals

Author: Michael Lucas ; Doug Downey

Abstract: Semi-supervised learning (SSL) methods augment standard machine learning (ML) techniques to leverage unlabeled data. SSL techniques are often effective in text classification, where labeled data is scarce but large unlabeled corpora are readily available. However, existing SSL techniques typically require multiple passes over the entirety of the unlabeled data, meaning the techniques are not applicable to large corpora being produced today. In this paper, we show that improving marginal word frequency estimates using unlabeled data can enable semi-supervised text classification that scales to massive unlabeled data sets. We present a novel learning algorithm, which optimizes a Naive Bayes model to accord with statistics calculated from the unlabeled corpus. In experiments with text topic classification and sentiment analysis, we show that our method is both more scalable and more accurate than SSL techniques from previous work.

2 0.84483647 342 acl-2013-Text Classification from Positive and Unlabeled Data using Misclassified Data Correction

Author: Fumiyo Fukumoto ; Yoshimi Suzuki ; Suguru Matsuyoshi

Abstract: This paper addresses the problem of dealing with a collection of labeled training documents, especially annotating negative training documents and presents a method of text classification from positive and unlabeled data. We applied an error detection and correction technique to the results of positive and negative documents classified by the Support Vector Machines (SVM). The results using Reuters documents showed that the method was comparable to the current state-of-the-art biasedSVM method as the F-score obtained by our method was 0.627 and biased-SVM was 0.614.

3 0.67884302 81 acl-2013-Co-Regression for Cross-Language Review Rating Prediction

Author: Xiaojun Wan

Abstract: The task of review rating prediction can be well addressed by using regression algorithms if there is a reliable training set of reviews with human ratings. In this paper, we aim to investigate a more challenging task of crosslanguage review rating prediction, which makes use of only rated reviews in a source language (e.g. English) to predict the rating scores of unrated reviews in a target language (e.g. German). We propose a new coregression algorithm to address this task by leveraging unlabeled reviews. Evaluation results on several datasets show that our proposed co-regression algorithm can consistently improve the prediction results. 1

4 0.66914886 315 acl-2013-Semi-Supervised Semantic Tagging of Conversational Understanding using Markov Topic Regression

Author: Asli Celikyilmaz ; Dilek Hakkani-Tur ; Gokhan Tur ; Ruhi Sarikaya

Abstract: Finding concepts in natural language utterances is a challenging task, especially given the scarcity of labeled data for learning semantic ambiguity. Furthermore, data mismatch issues, which arise when the expected test (target) data does not exactly match the training data, aggravate this scarcity problem. To deal with these issues, we describe an efficient semisupervised learning (SSL) approach which has two components: (i) Markov Topic Regression is a new probabilistic model to cluster words into semantic tags (concepts). It can efficiently handle semantic ambiguity by extending standard topic models with two new features. First, it encodes word n-gram features from labeled source and unlabeled target data. Second, by going beyond a bag-of-words approach, it takes into account the inherent sequential nature of utterances to learn semantic classes based on context. (ii) Retrospective Learner is a new learning technique that adapts to the unlabeled target data. Our new SSL approach improves semantic tagging performance by 3% absolute over the baseline models, and also compares favorably on semi-supervised syntactic tagging.

5 0.66845095 173 acl-2013-Graph-based Semi-Supervised Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging

Author: Xiaodong Zeng ; Derek F. Wong ; Lidia S. Chao ; Isabel Trancoso

Abstract: This paper introduces a graph-based semisupervised joint model of Chinese word segmentation and part-of-speech tagging. The proposed approach is based on a graph-based label propagation technique. One constructs a nearest-neighbor similarity graph over all trigrams of labeled and unlabeled data for propagating syntactic information, i.e., label distributions. The derived label distributions are regarded as virtual evidences to regularize the learning of linear conditional random fields (CRFs) on unlabeled data. An inductive character-based joint model is obtained eventually. Empirical results on Chinese tree bank (CTB-7) and Microsoft Research corpora (MSR) reveal that the proposed model can yield better results than the supervised baselines and other competitive semi-supervised CRFs in this task.

6 0.65718561 182 acl-2013-High-quality Training Data Selection using Latent Topics for Graph-based Semi-supervised Learning

7 0.57720554 277 acl-2013-Part-of-speech tagging with antagonistic adversaries

8 0.50173891 341 acl-2013-Text Classification based on the Latent Topics of Important Sentences extracted by the PageRank Algorithm

9 0.49404269 174 acl-2013-Graph Propagation for Paraphrasing Out-of-Vocabulary Words in Statistical Machine Translation

10 0.4848243 72 acl-2013-Bridging Languages through Etymology: The case of cross language text categorization

11 0.47939169 294 acl-2013-Re-embedding words

12 0.47442552 14 acl-2013-A Novel Classifier Based on Quantum Computation

13 0.46478847 78 acl-2013-Categorization of Turkish News Documents with Morphological Analysis

14 0.4612433 112 acl-2013-Dependency Parser Adaptation with Subtrees from Auto-Parsed Target Domain Data

15 0.44534585 91 acl-2013-Connotation Lexicon: A Dash of Sentiment Beneath the Surface Meaning

16 0.43636894 356 acl-2013-Transfer Learning Based Cross-lingual Knowledge Extraction for Wikipedia

17 0.43316045 295 acl-2013-Real-World Semi-Supervised Learning of POS-Taggers for Low-Resource Languages

18 0.43172404 82 acl-2013-Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation

19 0.42231882 217 acl-2013-Latent Semantic Matching: Application to Cross-language Text Categorization without Alignment Information

20 0.41893575 298 acl-2013-Recognizing Rare Social Phenomena in Conversation: Empowerment Detection in Support Group Chatrooms


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.049), (4, 0.063), (6, 0.02), (11, 0.099), (24, 0.046), (26, 0.056), (35, 0.057), (42, 0.042), (48, 0.07), (67, 0.196), (70, 0.038), (88, 0.034), (90, 0.05), (95, 0.062)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.82109058 138 acl-2013-Enriching Entity Translation Discovery using Selective Temporality

Author: Gae-won You ; Young-rok Cha ; Jinhan Kim ; Seung-won Hwang

Abstract: This paper studies named entity translation and proposes “selective temporality” as a new feature, as using temporal features may be harmful for translating “atemporal” entities. Our key contribution is building an automatic classifier to distinguish temporal and atemporal entities then align them in separate procedures to boost translation accuracy by 6. 1%.

same-paper 2 0.81226754 309 acl-2013-Scaling Semi-supervised Naive Bayes with Feature Marginals

Author: Michael Lucas ; Doug Downey

Abstract: Semi-supervised learning (SSL) methods augment standard machine learning (ML) techniques to leverage unlabeled data. SSL techniques are often effective in text classification, where labeled data is scarce but large unlabeled corpora are readily available. However, existing SSL techniques typically require multiple passes over the entirety of the unlabeled data, meaning the techniques are not applicable to large corpora being produced today. In this paper, we show that improving marginal word frequency estimates using unlabeled data can enable semi-supervised text classification that scales to massive unlabeled data sets. We present a novel learning algorithm, which optimizes a Naive Bayes model to accord with statistics calculated from the unlabeled corpus. In experiments with text topic classification and sentiment analysis, we show that our method is both more scalable and more accurate than SSL techniques from previous work.

3 0.78408509 388 acl-2013-Word Alignment Modeling with Context Dependent Deep Neural Network

Author: Nan Yang ; Shujie Liu ; Mu Li ; Ming Zhou ; Nenghai Yu

Abstract: In this paper, we explore a novel bilingual word alignment approach based on DNN (Deep Neural Network), which has been proven to be very effective in various machine learning tasks (Collobert et al., 2011). We describe in detail how we adapt and extend the CD-DNNHMM (Dahl et al., 2012) method introduced in speech recognition to the HMMbased word alignment model, in which bilingual word embedding is discriminatively learnt to capture lexical translation information, and surrounding words are leveraged to model context information in bilingual sentences. While being capable to model the rich bilingual correspondence, our method generates a very compact model with much fewer parameters. Experiments on a large scale EnglishChinese word alignment task show that the proposed method outperforms the HMM and IBM model 4 baselines by 2 points in F-score.

4 0.7648381 100 acl-2013-Crowdsourcing Interaction Logs to Understand Text Reuse from the Web

Author: Martin Potthast ; Matthias Hagen ; Michael Volske ; Benno Stein

Abstract: unkown-abstract

5 0.76015365 110 acl-2013-Deepfix: Statistical Post-editing of Statistical Machine Translation Using Deep Syntactic Analysis

Author: Rudolf Rosa ; David Marecek ; Ales Tamchyna

Abstract: Deepfix is a statistical post-editing system for improving the quality of statistical machine translation outputs. It attempts to correct errors in verb-noun valency using deep syntactic analysis and a simple probabilistic model of valency. On the English-to-Czech translation pair, we show that statistical post-editing of statistical machine translation leads to an improvement of the translation quality when helped by deep linguistic knowledge.

6 0.74338025 252 acl-2013-Multigraph Clustering for Unsupervised Coreference Resolution

7 0.70868742 294 acl-2013-Re-embedding words

8 0.68367922 173 acl-2013-Graph-based Semi-Supervised Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging

9 0.67301655 216 acl-2013-Large tagset labeling using Feed Forward Neural Networks. Case study on Romanian Language

10 0.66715407 38 acl-2013-Additive Neural Networks for Statistical Machine Translation

11 0.66034073 275 acl-2013-Parsing with Compositional Vector Grammars

12 0.64910847 315 acl-2013-Semi-Supervised Semantic Tagging of Conversational Understanding using Markov Topic Regression

13 0.64387161 82 acl-2013-Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation

14 0.64232844 121 acl-2013-Discovering User Interactions in Ideological Discussions

15 0.641527 7 acl-2013-A Lattice-based Framework for Joint Chinese Word Segmentation, POS Tagging and Parsing

16 0.63972157 254 acl-2013-Multimodal DBN for Predicting High-Quality Answers in cQA portals

17 0.63892609 273 acl-2013-Paraphrasing Adaptation for Web Search Ranking

18 0.63701594 156 acl-2013-Fast and Adaptive Online Training of Feature-Rich Translation Models

19 0.63585615 174 acl-2013-Graph Propagation for Paraphrasing Out-of-Vocabulary Words in Statistical Machine Translation

20 0.63441503 188 acl-2013-Identifying Sentiment Words Using an Optimization-based Model without Seed Words