nips nips2002 nips2002-162 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Naonori Ueda, Kazumi Saito
Abstract: We propose probabilistic generative models, called parametric mixture models (PMMs), for multiclass, multi-labeled text categorization problem. Conventionally, the binary classification approach has been employed, in which whether or not text belongs to a category is judged by the binary classifier for every category. In contrast, our approach can simultaneously detect multiple categories of text using PMMs. We derive efficient learning and prediction algorithms for PMMs. We also empirically show that our method could significantly outperform the conventional binary methods when applied to multi-labeled text categorization using real World Wide Web pages. 1
Reference: text
sentIndex sentText sentNum sentScore
1 jp Abstract We propose probabilistic generative models, called parametric mixture models (PMMs), for multiclass, multi-labeled text categorization problem. [sent-5, score-0.552]
2 Conventionally, the binary classification approach has been employed, in which whether or not text belongs to a category is judged by the binary classifier for every category. [sent-6, score-0.571]
3 In contrast, our approach can simultaneously detect multiple categories of text using PMMs. [sent-7, score-0.317]
4 We derive efficient learning and prediction algorithms for PMMs. [sent-8, score-0.062]
5 We also empirically show that our method could significantly outperform the conventional binary methods when applied to multi-labeled text categorization using real World Wide Web pages. [sent-9, score-0.407]
6 1 Introduction Recently, as the number of online documents has been rapidly increasing, automatic text categorization is becoming a more important and fundamental task in information retrieval and text mining. [sent-10, score-0.649]
7 Since a document often belongs to multiple categories, the task of text categorization is generally defined as assigning one or more category labels to new text. [sent-11, score-0.716]
8 Hence, this type of categorization problem has become a challenging research theme in the field of machine learning. [sent-14, score-0.156]
9 Conventionally, a binary classification approach has been used, in which the multicategory detection problem is decomposed into independent binary classification problems. [sent-15, score-0.09]
10 However, since the binary approach does not consider a generative model of multi-labeled text, we think that it has an important limitation when applied to multi-labeled text categorization. [sent-17, score-0.348]
11 In this paper, using independent word-based representation, known as Bag-of-Words (BOW) representation [3], we present two types of probabilistic generative models for multi-labeled text called parametric mixture models (PMM1, PMM2), where PMM2 is a more flexible version of PMM1. [sent-18, score-0.418]
12 The basic assumption underlying PMMs is that multi-labeled text has a mixture of the characteristic words appearing in single-labeled text belonging to each of its categories. [sent-19, score-0.823]
13 This assumption leads us to construct quite simple generative models with a good feature: the objective function of PMM1 has a single global optimum (i.e., no local optima). [sent-20, score-0.133]
14 We also show the actual benefits of PMMs through an application of WWW page categorization, focusing on pages from the “yahoo.com” domain. [sent-24, score-0.091]
15 1 Parametric Mixture Models Multi-labeled Text According to the BOW representation, which ignores the order of word occurrence in a document, the nth document, $d_n$, can be represented by a word-frequency vector $\mathbf{x}^n = (x^n_1, \ldots, x^n_V)$, where $x^n_i$ denotes the frequency of occurrence of word $w_i$ in $d_n$ among the vocabulary $\mathcal{V} = \langle w_1, \ldots, w_V \rangle$. [sent-27, score-0.213] [sent-30, score-0.339]
17 Here, $V$ is the total number of words in the vocabulary. [sent-34, score-0.044]
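To make this representation concrete, here is a small sketch (not from the paper; the function and variable names are our own) that builds such word-frequency vectors with NumPy:

```python
import numpy as np

def bow_vectors(tokenized_docs, vocabulary):
    """Map each tokenized document d_n to a word-frequency vector x^n of length V."""
    index = {w: i for i, w in enumerate(vocabulary)}   # word -> position in the vocabulary
    X = np.zeros((len(tokenized_docs), len(vocabulary)))
    for n, doc in enumerate(tokenized_docs):
        for w in doc:
            if w in index:                             # words outside the vocabulary are ignored
                X[n, index[w]] += 1.0
    return X

docs = [["music", "sports", "music"], ["sports", "news"]]
vocab = ["music", "news", "sports"]
print(bow_vectors(docs, vocab))                        # [[2. 0. 1.] [0. 1. 1.]]
```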
18 Let $\mathbf{y}^n = (y^n_1, \ldots, y^n_L)$ be a category vector for $d_n$, where $y^n_l$ takes a value of 1 (0) when $d_n$ belongs (does not belong) to the $l$th category. [sent-38, score-0.673]
19 Note that the $L$ categories are pre-defined and that a document always belongs to at least one category (i.e., $\sum_{l=1}^{L} y^n_l \ge 1$). [sent-40, score-0.443]
20 In the case of multi-class and single-labeled text, it is natural that $\mathbf{x}$ in the $l$th category should be generated from a multinomial distribution: $P(\mathbf{x}|l) \propto \prod_{i=1}^{V} (\theta_{l,i})^{x_i}$. Here, $\theta_{l,i} \ge 0$ and $\sum_{i=1}^{V} \theta_{l,i} = 1$. [sent-43, score-0.123]
21 $\theta_{l,i}$ is the probability that the $i$th word $w_i$ appears in a document belonging to the $l$th class. [sent-44, score-0.217]
22 We generalize this to multi-class and multi-labeled text as: $P(\mathbf{x}|\mathbf{y}) \propto \prod_{i=1}^{V} (\varphi_i(\mathbf{y}))^{x_i}$, where $\varphi_i(\mathbf{y}) \ge 0$ and $\sum_{i=1}^{V} \varphi_i(\mathbf{y}) = 1$. (1) [sent-45, score-0.228]
23 Here, $\varphi_i(\mathbf{y})$ is a class-dependent probability that the $i$th word appears in a document belonging to class $\mathbf{y}$. [sent-46, score-0.201]
24 Clearly, it is impractical to independently set a multinomial parameter vector for each distinct $\mathbf{y}$, since there are $2^L - 1$ possible classes. [sent-47, score-0.063]
25 2 PMM1 In general, words in a document belonging to a multi-category class can be regarded as a mixture of characteristic words related to each of the categories. [sent-50, score-0.352]
26 For example, a document that belongs to both “sports” and “music” would consist of a mixture of characteristic words mainly related to both categories. [sent-51, score-0.312]
27 The parameter vector $\boldsymbol{\varphi}(\mathbf{y})\ (= (\varphi_1(\mathbf{y}), \ldots, \varphi_V(\mathbf{y})))$ can be represented by the following parametric mixture: $\boldsymbol{\varphi}(\mathbf{y}) = \sum_{l=1}^{L} h_l(\mathbf{y})\,\boldsymbol{\theta}_l$, where $h_l(\mathbf{y}) = 0$ for $l$ such that $y_l = 0$. (2) [sent-59, score-0.773]
28 Here, $h_l(\mathbf{y})\ (> 0)$ is a mixing proportion ($\sum_{l=1}^{L} h_l(\mathbf{y}) = 1$). [sent-60, score-0.403]
29 Intuitively, $h_l(\mathbf{y})$ can also be interpreted as the degree to which $\mathbf{x}$ has the $l$th category. [sent-61, score-0.284]
30 Based on the parametric mixture assumption, we can construct a simple parametric mixture model, PMM1, in which the degree is uniform: $h_l(\mathbf{y}) = y_l / \sum_{l'=1}^{L} y_{l'}$. [sent-63, score-1.135]
31 Substituting this into Eqs. (2) and (1), PMM1 can be defined by $P(\mathbf{x}|\mathbf{y}, \Theta) \propto \prod_{i=1}^{V} \left( \frac{\sum_{l=1}^{L} y_l \theta_{l,i}}{\sum_{l'=1}^{L} y_{l'}} \right)^{x_i}$. (3) [sent-67, score-0.692]
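As an illustration of Eq. (3), the following sketch (our own code; `pmm1_log_likelihood`, the toy parameters, and the category names in the comments are assumptions, not the authors') computes $\log P(\mathbf{x}|\mathbf{y},\Theta)$ up to the multinomial normalizing constant:

```python
import numpy as np

def pmm1_log_likelihood(x, y, theta):
    """log P(x | y, Theta) for PMM1, up to the multinomial normalizing constant.

    x     : (V,) word-frequency vector
    y     : (L,) binary category vector with at least one entry equal to 1
    theta : (L, V) matrix; row l is the multinomial parameter of category l
    """
    phi = y @ theta / y.sum()          # Eq. (3): phi_i(y) = sum_l y_l * theta_{l,i} / sum_l y_l
    return float(x @ np.log(phi))

theta = np.array([[0.7, 0.2, 0.1],     # e.g. a "sports"-like category
                  [0.1, 0.2, 0.7]])    # e.g. a "music"-like category
x = np.array([3.0, 1.0, 2.0])
print(pmm1_log_likelihood(x, np.array([1, 1]), theta))   # document assigned to both categories
```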
32 Of course, multi-category text may sometimes be weighted more toward one of its categories than toward the rest. [sent-69, score-0.532]
33 PMMs are different from usual distributional mixture models in the sense that the mixing is performed in the parameter space, whereas in the latter several distributional components are mixed. [sent-72, score-0.208]
34 On the other hand, PMM1 can represent $2^L - 1$ multi-category classes with only $L$ parameter vectors. [sent-74, score-0.035]
35 We consider the second-order model, PMM2, as a more flexible model, in which duplicate-category parameter vectors, $\boldsymbol{\theta}_{l,m}$, are also used to approximate $\boldsymbol{\varphi}(\mathbf{y})$. [sent-78, score-0.035]
36 $\boldsymbol{\varphi}(\mathbf{y}) = \sum_{l=1}^{L} \sum_{m=1}^{L} h_l(\mathbf{y}) h_m(\mathbf{y})\,\boldsymbol{\theta}_{l,m}$, where $\boldsymbol{\theta}_{l,m} = \alpha_{l,m} \boldsymbol{\theta}_l + \alpha_{m,l} \boldsymbol{\theta}_m$. (4) [sent-79, score-0.189]
37 Here, $\alpha_{l,m}$ is a non-negative bias parameter satisfying $\alpha_{l,m} + \alpha_{m,l} = 1,\ \forall l, m$. [sent-80, score-0.035]
38 In PMM2, unlike in PMM1, the category biases themselves can be estimated from given training data. [sent-84, score-0.26]
39 Substituting Eq. (4) into Eq. (1), PMM2 can be defined by $P(\mathbf{x}|\mathbf{y}; \Theta) \propto \prod_{i=1}^{V} \left( \frac{\sum_{l=1}^{L} \sum_{m=1}^{L} y_l y_m \theta_{l,m,i}}{\sum_{l=1}^{L} y_l \sum_{m=1}^{L} y_m} \right)^{x_i}$. (5) The set of unknown parameters in PMM2 becomes $\Theta = \{\boldsymbol{\theta}_l, \alpha_{l,m}\}_{l=1,m=1}^{L,L}$. [sent-86, score-0.786]
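The next sketch (again illustrative code of ours, not the authors' implementation; `pmm2_phi` and the toy numbers are assumptions) spells out Eqs. (4)–(5): it builds the duplicate-category vectors $\boldsymbol{\theta}_{l,m}$ from the $\alpha$ biases and mixes them uniformly over the active categories:

```python
import numpy as np

def pmm2_phi(y, theta, alpha):
    """phi(y) for PMM2 (Eqs. 4-5).

    y     : (L,) binary category vector
    theta : (L, V) per-category multinomial parameters theta_l
    alpha : (L, L) non-negative biases with alpha[l, m] + alpha[m, l] = 1
    """
    # theta_{l,m} = alpha_{l,m} * theta_l + alpha_{m,l} * theta_m
    theta_lm = alpha[:, :, None] * theta[:, None, :] + alpha.T[:, :, None] * theta[None, :, :]
    h = y / y.sum()                                    # uniform mixing over active categories
    return np.einsum("l,m,lmv->v", h, h, theta_lm)     # sum_{l,m} h_l h_m theta_{l,m}

theta = np.array([[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]])
alpha = np.array([[0.5, 0.6], [0.4, 0.5]])             # alpha[l,m] + alpha[m,l] = 1 for all l, m
print(pmm2_phi(np.array([1.0, 1.0]), theta, alpha))    # a word distribution summing to 1
```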
40 Blei et al. also perform single-labeled text categorization using LDA, in which an individual LDA model is fitted to each class. [sent-91, score-0.362]
41 Moreover, unlike in LDA, it is feasible to compute the objective functions for PMMs exactly as shown below. [sent-95, score-0.032]
42 1 Objective functions Let $D = \{(\mathbf{x}^n, \mathbf{y}^n)\}_{n=1}^{N}$ denote the given training data ($N$ labeled documents). [sent-97, score-0.035]
43 The unknown parameter $\Theta$ is estimated by maximizing the posterior $p(\Theta|D)$. [sent-98, score-0.059]
44 Assuming that $P(\mathbf{y})$ is independent of $\Theta$, $\hat{\Theta}_{\mathrm{MAP}} = \arg\max_{\Theta} \{\sum_{n=1}^{N} \log P(\mathbf{x}^n|\mathbf{y}^n, \Theta) + \log p(\Theta)\}$. [sent-99, score-0.127]
45 Consequently, the objective function to find $\hat{\Theta}_{\mathrm{MAP}}$ is given by $J(\Theta; D) = \mathcal{L}(\Theta; D) + (\xi - 1) \sum_{l=1}^{L} \sum_{i=1}^{V} \log \theta_{l,i} + (\zeta - 1) \sum_{l=1}^{L} \sum_{m=1}^{L} \log \alpha_{l,m}$. (6) [sent-103, score-0.234]
46 The likelihood term, $\mathcal{L}$, is given by
PMM1: $\mathcal{L}(\Theta; D) = \sum_{n=1}^{N} \sum_{i=1}^{V} x_{n,i} \log \sum_{l=1}^{L} h^n_l \theta_{l,i}$, (7)
PMM2: $\mathcal{L}(\Theta; D) = \sum_{n=1}^{N} \sum_{i=1}^{V} x_{n,i} \log \sum_{l=1}^{L} \sum_{m=1}^{L} h^n_l h^n_m \theta_{l,m,i}$. (8) [sent-106, score-1.895]
47 Note that $\theta_{l,m,i} = \alpha_{l,m} \theta_{l,i} + \alpha_{m,l} \theta_{m,i}$. [sent-107, score-0.101]
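For PMM1 the $\alpha$ term of Eq. (6) is absent, so the MAP objective is Eq. (7) plus the $(\xi-1)\sum\log\theta$ term. A sketch of how it could be evaluated (our code; the hyperparameter value `xi=2.0` is an arbitrary choice, not taken from the paper):

```python
import numpy as np

def pmm1_objective(X, Y, theta, xi=2.0):
    """J(Theta; D) for PMM1: Eq. (7) likelihood plus the (xi - 1) * sum log theta prior term.

    X : (N, V) word-frequency vectors,  Y : (N, L) binary label vectors,
    theta : (L, V) with rows on the simplex,  xi : prior hyperparameter (> 1)
    """
    H = Y / Y.sum(axis=1, keepdims=True)        # h_l^n = y_l^n / sum_l y_l^n
    mix = H @ theta                             # (N, V): sum_l h_l^n * theta_{l,i}
    log_lik = np.sum(X * np.log(mix))           # Eq. (7)
    return log_lik + (xi - 1.0) * np.sum(np.log(theta))

X = np.array([[3.0, 1.0, 2.0], [0.0, 4.0, 1.0]])
Y = np.array([[1.0, 1.0], [0.0, 1.0]])
theta = np.array([[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]])
print(pmm1_objective(X, Y, theta))
```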
48 Although steepest-ascent algorithms and Newton’s method are available, here we derive an efficient algorithm in a manner similar to the EM algorithm [2]. [sent-111, score-0.035]
49 First, we derive parameter update formulae for PMM2 because they are more general than those for PMM1. [sent-112, score-0.166]
50 We then attempt to derive $\Theta^{(t+1)}$ by using $\Theta^{(t)}$. [sent-115, score-0.035]
51 Define $g^n_{l,m,i}(\Theta) = \dfrac{h^n_l h^n_m \theta_{l,m,i}}{\sum_{l=1}^{L} \sum_{m=1}^{L} h^n_l h^n_m \theta_{l,m,i}}$, (9) and $\lambda_{l,m,i}(\boldsymbol{\theta}_{l,m}) = \alpha_{l,m} \theta_{l,i} / \theta_{l,m,i}$, $\lambda_{m,l,i}(\boldsymbol{\theta}_{l,m}) = \alpha_{m,l} \theta_{m,i} / \theta_{l,m,i}$. (10) [sent-117, score-2.392]
52 Noting that $\sum_{l=1}^{L} \sum_{m=1}^{L} g^n_{l,m,i}(\Theta) = 1$, $\mathcal{L}$ for PMM2 can be rewritten as $\mathcal{L}(\Theta; D) = \sum_{n,i} x_{n,i} \sum_{l,m} g^n_{l,m,i}(\Theta^{(t)}) \log\bigl(h^n_l h^n_m \theta_{l,m,i}\bigr) - \sum_{n,i} x_{n,i} \sum_{l,m} g^n_{l,m,i}(\Theta^{(t)}) \log g^n_{l,m,i}(\Theta)$. (11) [sent-118, score-3.79]
53 Moreover, noting that $\lambda_{l,m,i}(\boldsymbol{\theta}_{l,m}) + \lambda_{m,l,i}(\boldsymbol{\theta}_{l,m}) = 1$, we rewrite the first term on the RHS of Eq. (11), obtaining an auxiliary function $U(\Theta|\Theta^{(t)})$ whose maximization w.r.t. $\theta_{l,i}$ and $\alpha_{l,m}$ under their constraints yields closed-form update formulae (up to Eq. (17); the $\alpha_{l,m}$ update involves terms of the form $\sum_{i=1}^{V} \sum_{n=1}^{N} x_{n,i}\, q^n_{l,m,i}(\Theta^{(t)}) + \zeta - 1$). [sent-119, score-1.244]
54 These parameter updates always converge to a local optimum of $J$ given by Eq. (6). [sent-130, score-0.144]
55 In PMM1, since the unknown parameter is just $\{\boldsymbol{\theta}_l\}$, by modifying Eq. (9) as $g^n_{l,i}(\Theta) = \dfrac{h^n_l \theta_{l,i}}{\sum_{l'=1}^{L} h^n_{l'} \theta_{l',i}}$ (18) and rewriting Eq. (7) in a similar manner, we obtain $\mathcal{L}(\Theta; D) = \sum_{n,i} x_{n,i} \sum_{l} g^n_{l,i}(\Theta^{(t)}) \log\bigl(h^n_l \theta_{l,i}\bigr) - \sum_{n,i} x_{n,i} \sum_{l} g^n_{l,i}(\Theta^{(t)}) \log g^n_{l,i}(\Theta)$. (19) [sent-132, score-0.059] [sent-133, score-0.787] [sent-134, score-0.8]
58 In this case, $U$ becomes a simpler form: $U(\Theta|\Theta^{(t)}) = \sum_{n=1}^{N} \sum_{i=1}^{V} x_{n,i} \sum_{l=1}^{L} g^n_{l,i}(\Theta^{(t)}) \log\bigl(h^n_l \theta_{l,i}\bigr)$. (20) [sent-135, score-0.699]
59 Therefore, maximizing $U(\Theta|\Theta^{(t)}) + (\xi - 1) \sum_{l=1}^{L} \sum_{i=1}^{V} \log \theta_{l,i}$ w.r.t. $\boldsymbol{\theta}_l$ under the constraint $\sum_{i=1}^{V} \theta_{l,i} = 1$ gives the update formula $\theta^{(t+1)}_{l,i} = \dfrac{\sum_{n=1}^{N} x_{n,i}\, g^n_{l,i}(\Theta^{(t)}) + \xi - 1}{\sum_{i'=1}^{V} \sum_{n=1}^{N} x_{n,i'}\, g^n_{l,i'}(\Theta^{(t)}) + V(\xi - 1)}$. (21) [sent-136, score-0.101]
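A compact sketch of the resulting PMM1 learning loop (our reconstruction of Eqs. (18)–(21); the uniform initialization and the fixed iteration count are arbitrary choices rather than details from the paper):

```python
import numpy as np

def fit_pmm1(X, Y, xi=2.0, n_iter=100):
    """EM-like MAP estimation of theta for PMM1 (Eqs. 18-21).

    X : (N, V) word frequencies,  Y : (N, L) binary labels,  xi > 1.
    Returns theta of shape (L, V), each row summing to 1.
    """
    N, V = X.shape
    L = Y.shape[1]
    H = Y / Y.sum(axis=1, keepdims=True)                  # h_l^n
    theta = np.full((L, V), 1.0 / V)                      # uniform initialization
    for _ in range(n_iter):
        mix = H @ theta                                   # (N, V): sum_l h_l^n theta_{l,i}
        # g[n, l, i] = h_l^n theta_{l,i} / sum_{l'} h_{l'}^n theta_{l',i}   (Eq. 18)
        g = H[:, :, None] * theta[None, :, :] / mix[:, None, :]
        # theta_{l,i} proportional to sum_n x_{n,i} g[n, l, i] + xi - 1     (Eq. 21)
        theta = np.einsum("nv,nlv->lv", X, g) + (xi - 1.0)
        theta /= theta.sum(axis=1, keepdims=True)
    return theta
```

Because both $g^n_{l,i}$ and the $\theta$ update are closed-form, each iteration is a single pass over the word-count matrix.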
60 Theorem: The learning algorithm based on the update of Eq. (21) of PMM1 always converges to the global optimum solution. [sent-141, score-0.093]
61 Proof: The Hessian matrix, $H$, of the objective function, $J$, of PMM1 becomes $H = \Phi^{T} \dfrac{\partial^2 J(\Theta; D)}{\partial \Theta\, \partial \Theta^{T}}\, \Phi = \left.\dfrac{d^2 J(\Theta + \kappa \Phi; D)}{d\kappa^2}\right|_{\kappa=0} = -\sum_{n,i} x^n_i \dfrac{\bigl(\sum_l h^n_l \phi_{l,i}\bigr)^2}{\bigl(\sum_l h^n_l \theta_{l,i}\bigr)^2} - (\xi - 1) \sum_{l,i} \left(\dfrac{\phi_{l,i}}{\theta_{l,i}}\right)^2$. [sent-142, score-0.188]
62 Noting that $x^n_i \ge 0$, $\xi > 1$ and $\Phi \neq 0$, $H$ is negative definite; therefore $J$ is a strictly concave function of $\Theta$. [sent-144, score-0.138]
63 Moreover, since the feasible region defined by the domain of $J$ and the constraints $\sum_{i=1}^{V} \theta_{l,i} = 1, \forall l$, is a convex set, the maximization problem here becomes a convex programming problem and has a unique global solution. [sent-145, score-0.117]
64 Since the update of Eq. (21) always increases $J$ at each iteration, the learning algorithm given above always converges to the global optimum solution, irrespective of the initial parameter value. [sent-147, score-0.128]
65 Then, applying Bayes’ rule, the optimum category vector $\mathbf{y}^*$ for $\mathbf{x}^*$ of a new document is defined as $\mathbf{y}^* = \arg\max_{\mathbf{y}} P(\mathbf{y}|\mathbf{x}^*; \hat{\Theta})$ under a uniform class prior assumption. [sent-150, score-0.41]
66 This maximization problem belongs to the class of zero-one integer programming problems (i.e., it is NP-hard in general). [sent-151, score-0.084]
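For small $L$ the maximization can still be carried out exactly by brute force; the sketch below (our illustration, not the authors' search procedure) enumerates all $2^L - 1$ candidate label vectors under the PMM1 model:

```python
import itertools
import numpy as np

def predict_labels(x, theta):
    """Return the binary label vector y maximizing the PMM1 posterior P(y | x)
    under a uniform class prior, by exhaustive search over the 2^L - 1 candidates.
    Feasible only for small L; the exact problem is a 0-1 integer program."""
    L = theta.shape[0]
    best_y, best_score = None, -np.inf
    for bits in itertools.product([0, 1], repeat=L):
        y = np.array(bits, dtype=float)
        if y.sum() == 0:                       # a document has at least one category
            continue
        phi = y @ theta / y.sum()
        score = float(x @ np.log(phi))         # under a uniform prior, argmax P(x|y) = argmax P(y|x)
        if score > best_score:
            best_y, best_score = y, score
    return best_y
```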
67 1 Automatic Web Page Categorization We tried to categorize real Web pages linked from the “yahoo.com” domain. [sent-161, score-0.061]
68 More specifically, Yahoo consists of 14 top-level categories (i.e., “Arts & Humanities,” “Business & Economy,” “Computers & Internet,” and so on), and each category is classified into a number of second-level subcategories. [sent-163, score-0.089] [sent-165, score-0.192]
70 By focusing on the second-level categories, we can make 14 independent text categorization problems. [sent-166, score-0.391]
71 To collect a set of related Web pages for each problem, we used a software robot called ”GNU Wget (version 1. [sent-170, score-0.062]
72 A text multi-label can be obtained by following its hyperlinks in reverse toward the page of origin. [sent-173, score-0.337]
73 For the SVMs we used SVMlight, tuning the C (penalty cost) and J (cost-factor for negative and positive samples) parameters for each binary classification to improve the SVM results [6]. [sent-176, score-0.045]
74 In addition, it is worth mentioning that when performing the SVM, each $\mathbf{x}^n$ was normalized so that $\sum_{i=1}^{V} x^n_i = 1$, because discrimination is much easier in the $(V-1)$-dimensional simplex than in the original $V$-dimensional space. [sent-177, score-0.232]
75 In other words, classification is generally not determined by the number of words on the page; actually, normalization could also significantly improve the performance. [sent-178, score-0.044]
76 1 This domain is a famous portal site and most related pages linked from the domain are registered by site recommendation and therefore category labels would be reliable. [sent-179, score-0.307]
77 2 We could not collect enough pages for three categories due to our communication network security. [sent-180, score-0.151]
78 3 Since the ratio of the number of positive samples to negative samples per category was quite small in our web pages, SVM without the J option provided poor results. [sent-182, score-0.351]
79 Table 1: Performance for 3000 test data using 2000 training data (problems 1–11). [sent-184, score-0.035]
80 As for NNs, an NN consists of V input units and L output units for estimating a category vector from each frequency vector. [sent-318, score-0.192]
81 An NN was trained to maximize the sum of cross-entropy functions for target and estimated category vectors of training samples, together with a regularization term consisting of a sum of squared NN weights. [sent-320, score-0.227]
82 Let $\mathbf{y}^n = (y^n_1, \ldots, y^n_L)$ and $\hat{\mathbf{y}}^n = (\hat{y}^n_1, \ldots, \hat{y}^n_L)$ be the actual and predicted category vectors for $\mathbf{x}^n$, respectively. [sent-331, score-0.192]
83 Subsequently, the sample-wise F-measure is $F_n = 2 P_n R_n / (P_n + R_n)$, where $P_n = \sum_{l=1}^{L} y^n_l \hat{y}^n_l / \sum_{l=1}^{L} \hat{y}^n_l$ and $R_n = \sum_{l=1}^{L} y^n_l \hat{y}^n_l / \sum_{l=1}^{L} y^n_l$. [sent-332, score-1.95]
84 We evaluated the performance by $\bar{F} = \frac{1}{3000} \sum_{n=1}^{3000} F_n$ using 3000 test data independent of the training data. [sent-333, score-0.059]
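A sketch of this evaluation measure in code (our variable names; the guard against documents with no predicted labels is our addition):

```python
import numpy as np

def sample_f_measure(Y_true, Y_pred):
    """Mean sample-based F-measure over test documents.

    Y_true, Y_pred : (N, L) binary arrays of actual and predicted category vectors.
    F_n = 2 P_n R_n / (P_n + R_n), with P_n and R_n as defined in the text."""
    tp = (Y_true * Y_pred).sum(axis=1)             # correctly predicted labels per document
    P = tp / np.maximum(Y_pred.sum(axis=1), 1)     # precision P_n (guard against empty predictions)
    R = tp / Y_true.sum(axis=1)                    # recall R_n (each document has >= 1 true label)
    F = np.where(P + R > 0, 2 * P * R / np.maximum(P + R, 1e-12), 0.0)
    return F.mean()

Y_true = np.array([[1, 0, 1], [0, 1, 0]])
Y_pred = np.array([[1, 1, 1], [0, 1, 0]])
print(sample_f_measure(Y_true, Y_pred))            # (0.8 + 1.0) / 2 = 0.9
```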
85 Although micro- and macro-averages can be used, we think that the sample-based $F$-measure is the most suitable for evaluating the generalization performance, since it is natural to consider the i.i.d. assumption for the test documents. [sent-334, score-0.033]
86 2 Results For each of the 11 problems, we used five pairs of training and test data sets. [sent-339, score-0.035]
87 In Table 1 (Table 2) we compared the mean of the $\bar{F}$ values over five trials by using 2000 (500) training documents. [sent-340, score-0.035]
88 PMMs took about five minutes for training (2000 data) and only about one minute for the test (3000 data) on 2. [sent-342, score-0.061]
89 In the binary approach, SVMs with optimally tuned parameters produced rather better results than the NB method. [sent-345, score-0.045]
90 These experimental results support the importance of considering generative models of multi-category text. [sent-347, score-0.042]
91 When the training sample size was 2000, kNN provided comparable results to the NB method. [sent-348, score-0.035]
92 On the other hand, when the training sample size was 500, the kNN method obtained results similar to or slightly better than those of SVM. [sent-349, score-0.035]
93 We think that the memory-based approach is limited in its generalization ability for multi-labeled text categorization. [sent-351, score-0.261]
94 The results of the well-regularized NN were fair, although it took an intolerable amount of training time, indicating that flexible discrimination would not be necessary for this task. [sent-352, score-0.085]
Table 2: Performance for 3000 test data using 500 training data.
95 5 Concluding Remarks We have proposed new types of mixture models (PMMs) for multi-labeled text categorization, and also efficient algorithms for both learning and prediction. [sent-489, score-0.306]
96 Moreover, we have confirmed that studying the generative model for multi-labeled text is beneficial in improving the performance. [sent-491, score-0.27]
97 Text categorization with support vector machines: Learning with many relevant features. [sent-523, score-0.134]
98 A comparison of two learning algorithms for text categorization. [sent-529, score-0.228]
99 Text classification from labeled and unlabeled documents using EM. [sent-546, score-0.036]
100 A comparative study on feature selection in text categorization. [sent-551, score-0.228]
wordName wordTfidf (topN-words)
[('hn', 0.598), ('pmms', 0.326), ('yl', 0.325), ('text', 0.228), ('category', 0.192), ('hl', 0.189), ('categorization', 0.134), ('lda', 0.129), ('knn', 0.129), ('nn', 0.112), ('nb', 0.108), ('xn', 0.104), ('log', 0.101), ('document', 0.101), ('lth', 0.095), ('categories', 0.089), ('web', 0.087), ('mixture', 0.078), ('parametric', 0.07), ('optimum', 0.067), ('formulae', 0.065), ('page', 0.062), ('belongs', 0.061), ('classi', 0.057), ('li', 0.054), ('svm', 0.053), ('noting', 0.048), ('binary', 0.045), ('words', 0.044), ('www', 0.043), ('fn', 0.043), ('xi', 0.042), ('generative', 0.042), ('word', 0.041), ('dirichlet', 0.041), ('ve', 0.041), ('conventionally', 0.04), ('bow', 0.04), ('dn', 0.04), ('rn', 0.038), ('documents', 0.036), ('moreover', 0.036), ('rhs', 0.036), ('derive', 0.035), ('parameter', 0.035), ('training', 0.035), ('rmed', 0.035), ('ym', 0.035), ('distributional', 0.035), ('pn', 0.035), ('exible', 0.034), ('convex', 0.034), ('blei', 0.033), ('collect', 0.033), ('biases', 0.033), ('think', 0.033), ('linked', 0.032), ('objective', 0.032), ('belonging', 0.032), ('exhaustive', 0.031), ('update', 0.031), ('pages', 0.029), ('cation', 0.029), ('focusing', 0.029), ('svms', 0.028), ('multinomial', 0.028), ('occurrence', 0.028), ('characteristic', 0.028), ('site', 0.027), ('ith', 0.027), ('prediction', 0.027), ('took', 0.026), ('hi', 0.026), ('arg', 0.026), ('global', 0.026), ('bayes', 0.025), ('regarded', 0.025), ('assumption', 0.025), ('mixing', 0.025), ('su', 0.025), ('samples', 0.024), ('unknown', 0.024), ('discrimination', 0.024), ('hikaridai', 0.024), ('naonori', 0.024), ('formance', 0.024), ('maxy', 0.024), ('nns', 0.024), ('hyperlinks', 0.024), ('option', 0.024), ('economy', 0.024), ('svmlight', 0.024), ('ned', 0.023), ('maximization', 0.023), ('toward', 0.023), ('retrieval', 0.023), ('naive', 0.023), ('cient', 0.022), ('wi', 0.022), ('theme', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999976 162 nips-2002-Parametric Mixture Models for Multi-Labeled Text
Author: Naonori Ueda, Kazumi Saito
Abstract: We propose probabilistic generative models, called parametric mixture models (PMMs), for multiclass, multi-labeled text categorization problem. Conventionally, the binary classification approach has been employed, in which whether or not text belongs to a category is judged by the binary classifier for every category. In contrast, our approach can simultaneously detect multiple categories of text using PMMs. We derive efficient learning and prediction algorithms for PMMs. We also empirically show that our method could significantly outperform the conventional binary methods when applied to multi-labeled text categorization using real World Wide Web pages. 1
2 0.12858841 109 nips-2002-Improving a Page Classifier with Anchor Extraction and Link Analysis
Author: William W. Cohen
Abstract: Most text categorization systems use simple models of documents and document collections. In this paper we describe a technique that improves a simple web page classifier’s performance on pages from a new, unseen web site, by exploiting link structure within a site as well as page structure within hub pages. On real-world test cases, this technique significantly and substantially improves the accuracy of a bag-of-words classifier, reducing error rate by about half, on average. The system uses a variant of co-training to exploit unlabeled data from a new site. Pages are labeled using the base classifier; the results are used by a restricted wrapper-learner to propose potential “main-category anchor wrappers”; and finally, these wrappers are used as features by a third learner to find a categorization of the site that implies a simple hub structure, but which also largely agrees with the original bag-of-words classifier.
3 0.11644772 1 nips-2002-"Name That Song!" A Probabilistic Approach to Querying on Music and Text
Author: Brochu Eric, Nando de Freitas
Abstract: We present a novel, flexible statistical approach for modelling music and text jointly. The approach is based on multi-modal mixture models and maximum a posteriori estimation using EM. The learned models can be used to browse databases with documents containing music and text, to search for music using queries consisting of music and text (lyrics and other contextual information), to annotate text documents with music, and to automatically recommend or identify similar songs.
4 0.094859511 48 nips-2002-Categorization Under Complexity: A Unified MDL Account of Human Learning of Regular and Irregular Categories
Author: David Fass, Jacob Feldman
Abstract: We present an account of human concept learning-that is, learning of categories from examples-based on the principle of minimum description length (MDL). In support of this theory, we tested a wide range of two-dimensional concept types, including both regular (simple) and highly irregular (complex) structures, and found the MDL theory to give a good account of subjects' performance. This suggests that the intrinsic complexity of a concept (that is, its description -length) systematically influences its leamability. 1- The Structure of Categories A number of different principles have been advanced to explain the manner in which humans learn to categorize objects. It has been variously suggested that the underlying principle might be the similarity structure of objects [1], the manipulability of decision bound~ aries [2], or Bayesian inference [3][4]. While many of these theories are mathematically well-grounded and have been successful in explaining a range of experimental findings, they have commonly only been tested on a narrow collection of concept types similar to the simple unimodal categories of Figure 1(a-e). (a) (b) (c) (d) (e) Figure 1: Categories similar to those previously studied. Lines represent contours of equal probability. All except (e) are unimodal. ~http://ruccs.rutgers.edu/~jacob/feldman.html Moreover, in the scarce research that has ventured to look beyond simple category types, the goal has largely been to investigate categorization performance for isolated irregular distributions, rather than to present a survey of performance across a range of interesting distributions. For example, Nosofsky has previously examined the
5 0.089802332 115 nips-2002-Informed Projections
Author: David Tax
Abstract: Low rank approximation techniques are widespread in pattern recognition research — they include Latent Semantic Analysis (LSA), Probabilistic LSA, Principal Components Analysus (PCA), the Generative Aspect Model, and many forms of bibliometric analysis. All make use of a low-dimensional manifold onto which data are projected. Such techniques are generally “unsupervised,” which allows them to model data in the absence of labels or categories. With many practical problems, however, some prior knowledge is available in the form of context. In this paper, I describe a principled approach to incorporating such information, and demonstrate its application to PCA-based approximations of several data sets. 1
6 0.089348711 24 nips-2002-Adaptive Scaling for Feature Selection in SVMs
7 0.085658394 112 nips-2002-Inferring a Semantic Representation of Text via Cross-Language Correlation Analysis
8 0.077168472 143 nips-2002-Mean Field Approach to a Probabilistic Model in Information Retrieval
9 0.076069951 64 nips-2002-Data-Dependent Bounds for Bayesian Mixture Methods
10 0.075679533 52 nips-2002-Cluster Kernels for Semi-Supervised Learning
11 0.075060144 63 nips-2002-Critical Lines in Symmetry of Mixture Models and its Application to Component Splitting
12 0.075005293 88 nips-2002-Feature Selection and Classification on Matrix Data: From Large Margins to Small Covering Numbers
13 0.072826125 163 nips-2002-Prediction and Semantic Association
14 0.068107024 135 nips-2002-Learning with Multiple Labels
15 0.06790141 86 nips-2002-Fast Sparse Gaussian Process Methods: The Informative Vector Machine
16 0.064873159 59 nips-2002-Constraint Classification for Multiclass Classification and Ranking
17 0.062537454 125 nips-2002-Learning Semantic Similarity
18 0.059774544 19 nips-2002-Adapting Codes and Embeddings for Polychotomies
19 0.059528232 83 nips-2002-Extracting Relevant Structures with Side Information
20 0.05843563 157 nips-2002-On the Dirichlet Prior and Bayesian Regularization
topicId topicWeight
[(0, -0.177), (1, -0.089), (2, 0.031), (3, -0.006), (4, -0.037), (5, 0.066), (6, -0.095), (7, -0.112), (8, 0.046), (9, -0.068), (10, -0.093), (11, -0.026), (12, 0.033), (13, -0.054), (14, 0.073), (15, -0.044), (16, -0.005), (17, 0.044), (18, 0.047), (19, 0.061), (20, 0.025), (21, 0.026), (22, 0.012), (23, -0.087), (24, -0.005), (25, -0.016), (26, -0.177), (27, -0.021), (28, -0.03), (29, 0.046), (30, -0.058), (31, 0.064), (32, -0.059), (33, 0.049), (34, -0.029), (35, 0.119), (36, -0.031), (37, 0.067), (38, 0.038), (39, 0.221), (40, -0.126), (41, 0.217), (42, -0.139), (43, 0.05), (44, -0.02), (45, 0.016), (46, 0.106), (47, 0.015), (48, -0.136), (49, 0.07)]
simIndex simValue paperId paperTitle
same-paper 1 0.9291988 162 nips-2002-Parametric Mixture Models for Multi-Labeled Text
Author: Naonori Ueda, Kazumi Saito
Abstract: We propose probabilistic generative models, called parametric mixture models (PMMs), for multiclass, multi-labeled text categorization problem. Conventionally, the binary classification approach has been employed, in which whether or not text belongs to a category is judged by the binary classifier for every category. In contrast, our approach can simultaneously detect multiple categories of text using PMMs. We derive efficient learning and prediction algorithms for PMMs. We also empirically show that our method could significantly outperform the conventional binary methods when applied to multi-labeled text categorization using real World Wide Web pages. 1
2 0.7361322 109 nips-2002-Improving a Page Classifier with Anchor Extraction and Link Analysis
Author: William W. Cohen
Abstract: Most text categorization systems use simple models of documents and document collections. In this paper we describe a technique that improves a simple web page classifier’s performance on pages from a new, unseen web site, by exploiting link structure within a site as well as page structure within hub pages. On real-world test cases, this technique significantly and substantially improves the accuracy of a bag-of-words classifier, reducing error rate by about half, on average. The system uses a variant of co-training to exploit unlabeled data from a new site. Pages are labeled using the base classifier; the results are used by a restricted wrapper-learner to propose potential “main-category anchor wrappers”; and finally, these wrappers are used as features by a third learner to find a categorization of the site that implies a simple hub structure, but which also largely agrees with the original bag-of-words classifier.
3 0.57246792 1 nips-2002-"Name That Song!" A Probabilistic Approach to Querying on Music and Text
Author: Brochu Eric, Nando de Freitas
Abstract: We present a novel, flexible statistical approach for modelling music and text jointly. The approach is based on multi-modal mixture models and maximum a posteriori estimation using EM. The learned models can be used to browse databases with documents containing music and text, to search for music using queries consisting of music and text (lyrics and other contextual information), to annotate text documents with music, and to automatically recommend or identify similar songs.
Author: David Fass, Jacob Feldman
Abstract: We present an account of human concept learning-that is, learning of categories from examples-based on the principle of minimum description length (MDL). In support of this theory, we tested a wide range of two-dimensional concept types, including both regular (simple) and highly irregular (complex) structures, and found the MDL theory to give a good account of subjects' performance. This suggests that the intrinsic complexity of a concept (that is, its description -length) systematically influences its leamability. 1- The Structure of Categories A number of different principles have been advanced to explain the manner in which humans learn to categorize objects. It has been variously suggested that the underlying principle might be the similarity structure of objects [1], the manipulability of decision bound~ aries [2], or Bayesian inference [3][4]. While many of these theories are mathematically well-grounded and have been successful in explaining a range of experimental findings, they have commonly only been tested on a narrow collection of concept types similar to the simple unimodal categories of Figure 1(a-e). (a) (b) (c) (d) (e) Figure 1: Categories similar to those previously studied. Lines represent contours of equal probability. All except (e) are unimodal. ~http://ruccs.rutgers.edu/~jacob/feldman.html Moreover, in the scarce research that has ventured to look beyond simple category types, the goal has largely been to investigate categorization performance for isolated irregular distributions, rather than to present a survey of performance across a range of interesting distributions. For example, Nosofsky has previously examined the
5 0.55318904 115 nips-2002-Informed Projections
Author: David Tax
Abstract: Low rank approximation techniques are widespread in pattern recognition research — they include Latent Semantic Analysis (LSA), Probabilistic LSA, Principal Components Analysus (PCA), the Generative Aspect Model, and many forms of bibliometric analysis. All make use of a low-dimensional manifold onto which data are projected. Such techniques are generally “unsupervised,” which allows them to model data in the absence of labels or categories. With many practical problems, however, some prior knowledge is available in the form of context. In this paper, I describe a principled approach to incorporating such information, and demonstrate its application to PCA-based approximations of several data sets. 1
6 0.435238 143 nips-2002-Mean Field Approach to a Probabilistic Model in Information Retrieval
7 0.4347041 63 nips-2002-Critical Lines in Symmetry of Mixture Models and its Application to Component Splitting
8 0.43454993 64 nips-2002-Data-Dependent Bounds for Bayesian Mixture Methods
9 0.4040294 40 nips-2002-Bayesian Models of Inductive Generalization
10 0.38470301 8 nips-2002-A Maximum Entropy Approach to Collaborative Filtering in Dynamic, Sparse, High-Dimensional Domains
11 0.34923437 112 nips-2002-Inferring a Semantic Representation of Text via Cross-Language Correlation Analysis
12 0.3446686 190 nips-2002-Stochastic Neighbor Embedding
13 0.33345163 86 nips-2002-Fast Sparse Gaussian Process Methods: The Informative Vector Machine
14 0.32347244 37 nips-2002-Automatic Derivation of Statistical Algorithms: The EM Family and Beyond
15 0.3211531 52 nips-2002-Cluster Kernels for Semi-Supervised Learning
16 0.31305701 108 nips-2002-Improving Transfer Rates in Brain Computer Interfacing: A Case Study
18 0.30263481 68 nips-2002-Discriminative Densities from Maximum Contrast Estimation
19 0.29993704 150 nips-2002-Multiple Cause Vector Quantization
20 0.29845467 157 nips-2002-On the Dirichlet Prior and Bayesian Regularization
topicId topicWeight
[(11, 0.018), (23, 0.01), (42, 0.068), (54, 0.103), (55, 0.023), (64, 0.01), (68, 0.01), (74, 0.539), (92, 0.043), (98, 0.083)]
simIndex simValue paperId paperTitle
1 0.99432957 57 nips-2002-Concurrent Object Recognition and Segmentation by Graph Partitioning
Author: Stella X. Yu, Ralph Gross, Jianbo Shi
Abstract: Segmentation and recognition have long been treated as two separate processes. We propose a mechanism based on spectral graph partitioning that readily combine the two processes into one. A part-based recognition system detects object patches, supplies their partial segmentations as well as knowledge about the spatial configurations of the object. The goal of patch grouping is to find a set of patches that conform best to the object configuration, while the goal of pixel grouping is to find a set of pixels that have the best low-level feature similarity. Through pixel-patch interactions and between-patch competition encoded in the solution space, these two processes are realized in one joint optimization problem. The globally optimal partition is obtained by solving a constrained eigenvalue problem. We demonstrate that the resulting object segmentation eliminates false positives for the part detection, while overcoming occlusion and weak contours for the low-level edge detection.
2 0.98152196 114 nips-2002-Information Regularization with Partially Labeled Data
Author: Martin Szummer, Tommi S. Jaakkola
Abstract: Classification with partially labeled data requires using a large number of unlabeled examples (or an estimated marginal P (x)), to further constrain the conditional P (y|x) beyond a few available labeled examples. We formulate a regularization approach to linking the marginal and the conditional in a general way. The regularization penalty measures the information that is implied about the labels over covering regions. No parametric assumptions are required and the approach remains tractable even for continuous marginal densities P (x). We develop algorithms for solving the regularization problem for finite covers, establish a limiting differential equation, and exemplify the behavior of the new regularization approach in simple cases.
3 0.97945338 78 nips-2002-Efficient Learning Equilibrium
Author: Ronen I. Brafman, Moshe Tennenholtz
Abstract: We introduce efficient learning equilibrium (ELE), a normative approach to learning in non cooperative settings. In ELE, the learning algorithms themselves are required to be in equilibrium. In addition, the learning algorithms arrive at a desired value after polynomial time, and deviations from a prescribed ELE become irrational after polynomial time. We prove the existence of an ELE in the perfect monitoring setting, where the desired value is the expected payoff in a Nash equilibrium. We also show that an ELE does not always exist in the imperfect monitoring case. Yet, it exists in the special case of common-interest games. Finally, we extend our results to general stochastic games. 1
Author: Terry Elliott, Jörg Kramer
Abstract: A neurotrophic model for the co-development of topography and ocular dominance columns in the primary visual cortex has recently been proposed. In the present work, we test this model by driving it with the output of a pair of neuronal vision sensors stimulated by disparate moving patterns. We show that the temporal correlations in the spike trains generated by the two sensors elicit the development of refined topography and ocular dominance columns, even in the presence of significant amounts of spontaneous activity and fixed-pattern noise in the sensors.
5 0.97164661 90 nips-2002-Feature Selection in Mixture-Based Clustering
Author: Martin H. Law, Anil K. Jain, Mário Figueiredo
Abstract: There exist many approaches to clustering, but the important issue of feature selection, i.e., selecting the data attributes that are relevant for clustering, is rarely addressed. Feature selection for clustering is difficult due to the absence of class labels. We propose two approaches to feature selection in the context of Gaussian mixture-based clustering. In the first one, instead of making hard selections, we estimate feature saliencies. An expectation-maximization (EM) algorithm is derived for this task. The second approach extends Koller and Sahami’s mutual-informationbased feature relevance criterion to the unsupervised case. Feature selection is then carried out by a backward search scheme. This scheme can be classified as a “wrapper”, since it wraps mixture estimation in an outer layer that performs feature selection. Experimental results on synthetic and real data show that both methods have promising performance.
same-paper 6 0.93292844 162 nips-2002-Parametric Mixture Models for Multi-Labeled Text
7 0.80257642 175 nips-2002-Reinforcement Learning to Play an Optimal Nash Equilibrium in Team Markov Games
8 0.76758444 39 nips-2002-Bayesian Image Super-Resolution
9 0.74089956 152 nips-2002-Nash Propagation for Loopy Graphical Games
10 0.7383846 89 nips-2002-Feature Selection by Maximum Marginal Diversity
11 0.73368263 132 nips-2002-Learning to Detect Natural Image Boundaries Using Brightness and Texture
12 0.73145062 157 nips-2002-On the Dirichlet Prior and Bayesian Regularization
13 0.72942525 74 nips-2002-Dynamic Structure Super-Resolution
14 0.72053117 182 nips-2002-Shape Recipes: Scene Representations that Refer to the Image
15 0.71828276 122 nips-2002-Learning About Multiple Objects in Images: Factorial Learning without Factorial Search
16 0.70145875 173 nips-2002-Recovering Intrinsic Images from a Single Image
17 0.6881187 2 nips-2002-A Bilinear Model for Sparse Coding
18 0.68703401 135 nips-2002-Learning with Multiple Labels
19 0.68431002 100 nips-2002-Half-Lives of EigenFlows for Spectral Clustering
20 0.67912877 105 nips-2002-How to Combine Color and Shape Information for 3D Object Recognition: Kernels do the Trick