nips nips2002 nips2002-109 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: William W. Cohen
Abstract: Most text categorization systems use simple models of documents and document collections. In this paper we describe a technique that improves a simple web page classifier’s performance on pages from a new, unseen web site, by exploiting link structure within a site as well as page structure within hub pages. On real-world test cases, this technique significantly and substantially improves the accuracy of a bag-of-words classifier, reducing error rate by about half, on average. The system uses a variant of co-training to exploit unlabeled data from a new site. Pages are labeled using the base classifier; the results are used by a restricted wrapper-learner to propose potential “main-category anchor wrappers”; and finally, these wrappers are used as features by a third learner to find a categorization of the site that implies a simple hub structure, but which also largely agrees with the original bag-of-words classifier.
Reference: text
sentIndex sentText sentNum sentScore
1 com Abstract Most text categorization systems use simple models of documents and document collections. [sent-3, score-0.197]
2 In this paper we describe a technique that improves a simple web page classifier’s performance on pages from a new, unseen web site, by exploiting link structure within a site as well as page structure within hub pages. [sent-4, score-2.71]
3 On real-world test cases, this technique significantly and substantially improves the accuracy of a bag-of-words classifier, reducing error rate by about half, on average. [sent-5, score-0.134]
4 The system uses a variant of co-training to exploit unlabeled data from a new site. [sent-6, score-0.204]
5 1 Introduction Most text categorization systems use simple models of documents and document collections. [sent-8, score-0.197]
6 For instance, it is common to model documents as “bags of words”, and to model a collection as a set of documents drawn from some fixed distribution. [sent-9, score-0.148]
7 An interesting question is how to exploit more detailed information about the structure of individual documents, or the structure of a collection of documents. [sent-10, score-0.168]
8 For web page categorization, a frequently-used approach is to use hyperlink information to improve classification accuracy (e.g., [7, 9, 15]). [sent-11, score-1.098]
9 Often hyperlink structure is used to “smooth” the predictions of a learned classifier, so that documents that (say) are pointed to by the same “hub” page will be more likely to have the same classification after smoothing. [sent-14, score-0.934]
10 This smoothing can be done either explicitly [15] or implicitly (for instance, by representing examples so that the distance between examples depends on hyperlink connectivity [7, 9]). [sent-15, score-0.265]
11 The structure of individual pages, as represented by HTML markup structure or linguistic structure, is less commonly used in web page classification; however, page structure is often used in extracting information from web pages. [sent-16, score-1.868]
12 Page structure seems to be particularly important in finding site-specific extraction rules (“wrappers”), since on a given site, formatting information is frequently an excellent indication of content [6, 10, 12]. [sent-17, score-0.189]
13 This paper is based on two practical observations about web page classification. [sent-18, score-0.812]
14 The first observation is that for many categories of interest (e.g., product pages, job-posting pages, and press releases) many sites contain “hub” or index pages that point to essentially all pages in that category on a site. [sent-21, score-0.341]
15 These hubs rarely link exclusively to pages of a single category—instead the hubs will contain a number of additional links, such as links back to a home page and links to related hubs. [sent-22, score-1.213]
16 However, the page structure of a hub page often gives strong indications of which links are to pages from the “main” category associated with the hub, and which are ancillary links that exist for other (e.g., navigational) purposes. [sent-23, score-2.042]
17 Links to pages in the main category associated with this hub (previous NIPS conference homepages) are in the left-hand column of the table, and hence can be easily identified by the page structure. [sent-27, score-1.147]
18 The second observation is that it is relatively easy to learn to extract links from hub pages to main-category pages using existing wrapper-learning methods [8, 6]. [sent-28, score-0.896]
19 Wrapper-learning techniques interactively learn to extract data of some type from a single site using user-provided training examples. [sent-29, score-0.225]
20 Our experience in a number of domains indicates that main-category links on hub pages (like the NIPS-homepage links from Figure 1) can almost always be learned from two or three positive examples. [sent-30, score-0.888]
21 Exploiting these observations, we describe in this paper a web page categorization system that exploits link structure within a site, as well as page structure within hub pages, to improve classification accuracy of a traditional bag-of-words classifier on pages from a previously unseen site. [sent-31, score-2.299]
22 The system uses a variant of co-training [3] to exploit unlabeled data from a new, previously unseen site. [sent-32, score-0.25]
23 Specifically, pages are labeled using a simple bag-of-words classifier, and the results are used by a restricted wrapper-learner to propose potential “main-category link wrappers”. [sent-33, score-0.285]
24 These wrappers are then used as features by a decision tree learner to find a categorization of the pages on the site that implies a simple hub structure, but which also largely agrees with the original bag-of-words classifier. [sent-34, score-1.108]
25 2 One-step co-training and hyperlink structure Consider a binary bag-of-words classifier f that has been learned from some set of labeled web pages Dl. [sent-35, score-0.755]
26 We wish to improve the performance of f on pages from an unknown web site S, by smoothing its predictions in a way that is plausible given the hyperlink structure of S, and the page structure of potential hub pages in S. [sent-36, score-1.973]
27 As background for the algorithm, let us consider first co-training, a well-studied approach for improving classifier performance using unlabeled data [3]. [sent-37, score-0.144]
(Figure 1 residue: extracted text from a screenshot of a NIPS-related hub page, listing links to NIPS conference homepages and proceedings volumes, with notes that abstracts, papers, and BibTeX entries are available on-line.)
34 Links to pages in the main category associated with this hub are in the left-hand column of the table. [sent-73, score-0.602]
35 In this setting, a large amount of unlabeled data Du can be used to improve on the accuracy obtainable from a small set of labeled data Dl, as follows. [sent-74, score-0.296]
36 First, use A1 to learn f1 from the labeled data Dl; then use f1 to label the examples in Du, and use A2 to learn from this training set. [sent-76, score-0.096]
37 Given the assumptions above, f1 ’s errors on Du will appear to A2 as random, uncorrelated noise, and A2 can in principle learn an arbitrarily good approximation to f , given enough unlabeled data in Du . [sent-77, score-0.182]
38 Now, consider a set DS of unlabeled pages from an unseen web site S. [sent-79, score-0.762]
39 It seems not unreasonable to assume that the words x1 on a page x ∈ S and the hub pages x2 ∈ S that hyperlink to x are independent, given the class of x. [sent-80, score-1.275]
40 Let S be a web site, f1 be a bag-of-words page classifier, and DS be the pages on the site S. [sent-83, score-1.09]
41 For each page xi ∈ DS, represent xi as a vector of all pages in S that hyperlink to xi. [sent-86, score-1.005]
42 Label each page xi with yi = f1(xi), and use a learner A2 to learn f2 from the labeled examples D2 = {(xi, yi)}i. [sent-91, score-0.217]
43 Use f2 (x) as the final label for each page x ∈ DS . [sent-94, score-0.547]
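As a concrete illustration, a minimal Python sketch of Algorithm 1 might look as follows; the helpers `bow_classifier` (playing the role of f1) and `inlinks` (returning the site pages that link to a given page) are hypothetical, and the choice of scikit-learn's decision tree as A2 is illustrative rather than the paper's implementation.

```python
# Minimal sketch of Algorithm 1 (one-step co-training).  `bow_classifier`
# and `inlinks` are hypothetical stand-ins for f1 and the site link graph.
from sklearn.tree import DecisionTreeClassifier

def one_step_cotrain(site_pages, bow_classifier, inlinks):
    index = {p: i for i, p in enumerate(site_pages)}
    # View x2: represent each page by the set of site pages linking to it.
    X2 = [[0] * len(site_pages) for _ in site_pages]
    for p in site_pages:
        for q in inlinks(p, site_pages):   # pages q in S that link to p
            X2[index[p]][index[q]] = 1
    # Tentative labels yi = f1(xi) from the bag-of-words classifier.
    y = [bow_classifier(p) for p in site_pages]
    f2 = DecisionTreeClassifier().fit(X2, y)   # learner A2 on the link view
    return f2.predict(X2)                      # final labels for DS
```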
44 In experimental studies, co-training is usually done iteratively, alternating between using f1 and f2 for tagging the unlabeled data. [sent-96, score-0.144]
45 The one-step version seems more appropriate in this setting, in which there are a limited number of unlabeled examples over which each x2 is defined. [sent-97, score-0.173]
46 3.1 Learning to extract anchors from web pages Algorithm 1 has some shortcomings. [sent-99, score-0.557]
47 Co-training assumes a large pool of unlabeled data: however, if the informative hubs for pages on S are mostly within S (a very plausible assumption) then the amount of useful unlabeled data is limited by the size of S. [sent-100, score-0.493]
48 With limited amounts of unlabeled data, it is very important that A2 has a strong (and appropriate) statistical bias, and that A2 has some effective method for avoiding overfitting. [sent-101, score-0.144]
49 As suggested by Figure 1, the informativeness of hub features can be improved by using knowledge of the structure of hub pages themselves. [sent-102, score-1.082]
50 To make use of hub page structure, we used a wrapper-learning system called WL2 , which has experimentally proven to be effective at learning substructures of web pages [6]. [sent-103, score-1.385]
51 The output of WL2 is an extraction predicate: a binary relation p between pages x and substrings a within x. [sent-104, score-0.235]
52 As an example, WL2 might output p = {(x, a) : x is the page of Figure 1 and a is an anchor appearing in the first column of the table}. [sent-105, score-0.641]
53 (An anchor is a substring of a web page that defines a hyperlink.) [sent-106, score-0.968]
54 This suggests a modification of Algorithm 1, in which one-step co-training is carried out on the problem of extracting anchors rather than the problem of labeling web pages. [sent-107, score-0.488]
55 Specifically, one might map f1's predictions from web pages to anchors, by giving a positive label to anchor a iff a links to a page x such that f1(x) = 1; then use the WL2 algorithm A2 to learn a predicate p2; and finally, map the predictions of p2 from anchors back to web pages. [sent-108, score-1.862]
56 Another problem is that it is unclear how to map class labels from anchors back to web pages, since a page might be pointed to by many different anchors. [sent-110, score-0.994]
57 3.2 Bridging the gap between anchors and pages Based on these observations we modified Algorithm 1 as follows. [sent-112, score-0.239]
58 As suggested, we map the predictions about page labels made by f1 to anchors. [sent-113, score-0.55]
59 Using these anchor labels, we then produce many small training sets that are passed to WL2 . [sent-114, score-0.123]
60 Finally, we use the many wrappers produced by WL2 as features in a representation of a page x, and again use a learner to combine the wrapper-features and produce a single classification for a page. [sent-116, score-0.788]
61 Let S be a web site, f1 be a bag-of-words page classifier, and DS be the pages on the site. [sent-119, score-0.927]
62 For each anchor a on a page x ∈ S, label a as tentatively-positive if a points to a page x′ such that x′ ∈ S and f1(x′) = 1. [sent-122, score-1.188]
63 Let P be the set of all pairs (x, a) where a is a tentatively-positive link and x is the page on which a is found. [sent-125, score-0.615]
64 Split P into subsets D1, . . . , Dk containing such pairs, and for each subset Di, use WL2 to produce a number of possible extraction predicates pi,1, pi,2, . . . . [sent-129, score-0.15]
65 We will say that the “wrapper predicate” pij links to x iff pij includes some pair (x′, a) such that x′ ∈ DS and a is a hyperlink to page x. [sent-136, score-0.94]
66 For each page xi ∈ DS, represent xi as a vector of all wrappers pij that link to xi. [sent-137, score-0.933]
67 Label each page xi with yi = f1(xi), and use a learner A2 to learn f2 from the labeled examples {(xi, yi)}i. [sent-142, score-0.217]
68 Use f2 (x) as the final label for each page x ∈ DS . [sent-145, score-0.547]
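A hedged Python sketch of Algorithm 2, under the same caveats: `anchors_of`, `target_of`, and `learn_wrappers` are invented stand-ins for the page parser and for WL2's wrapper-proposal step, and each learned wrapper is modeled as a function from a page to the anchors it extracts.

```python
# Hedged sketch of Algorithm 2: anchors pointing to f1-positive pages seed
# many small WL2 training sets; the proposed wrappers become page features
# for a final decision-tree learner.
import random
from sklearn.tree import DecisionTreeClassifier

def algorithm2(site_pages, anchors_of, target_of, bow_classifier,
               learn_wrappers, k=20, subset_size=3):
    pages = set(site_pages)
    # Step 1: tentatively-positive pairs (x, a) with f1(target of a) = 1.
    P = [(x, a) for x in site_pages for a in anchors_of(x)
         if target_of(a) in pages and bow_classifier(target_of(a)) == 1]
    # Step 2: feed the wrapper learner many small subsets Di of P.
    wrappers = []
    for _ in range(k):
        Di = random.sample(P, min(subset_size, len(P)))
        wrappers.extend(learn_wrappers(Di))
    y = [bow_classifier(x) for x in site_pages]
    if not wrappers:                 # nothing to smooth with; keep f1 labels
        return y
    # Step 3: feature j of page x is 1 iff wrapper j extracts, from some
    # page on the site, an anchor that links to x.
    X = []
    for x in site_pages:
        X.append([int(any(target_of(a) == x
                          for hub in site_pages for a in w(hub)))
                  for w in wrappers])
    # Step 4: learner A2 (a decision tree here) finds labels that imply a
    # simple hub structure yet largely agree with f1.
    f2 = DecisionTreeClassifier().fit(X, y)
    return list(f2.predict(X))
```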
69 In this case, in building a page classifier, one would like to exploit knowledge about the related problem of link extraction. [sent-147, score-0.651]
70 The bagging-like approach of “feeding” WL2 many small training sets, and the use of a second learning algorithm to aggregate the results of WL2, are a means of exploiting prior experimental results, in lieu of more precise statistical assumptions. [sent-153, score-0.069]
71 4 Experimental results To evaluate the technique, we used the task of categorizing web pages from company sites as executive biography or other. [sent-154, score-0.586]
72 We selected nine company web sites with non-trivial hub structures. [sent-155, score-0.885]
73 These were crawled using a heuristic spidering strategy intended to find executive biography pages with high recall. [sent-156, score-0.197]
74 The crawl found 879 pages, of which 128 were labeled positive. [sent-157, score-0.073]
75 A simple bag-of-words classifier f1 was trained using a disjoint set of sites (different from the nine above), obtaining an average accuracy of 91. [sent-158, score-0.171]
76 Algorithm 2 improves over the baseline classifier f1 on six of the nine sites, and obtains the same accuracy on two more. [sent-164, score-0.139]
77 This difference is significant at the 98% level with a 2-tailed paired sign test, and at the 95% level with a 2-tailed paired t test. [sent-165, score-0.068]
78 We used a C4.5-like decision tree learning algorithm [14] for learner A2. [sent-167, score-0.101]
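For reference, the two-tailed paired sign test cited above can be computed exactly from win/loss counts over the non-tied sites; the counts in this sketch are placeholders for illustration, not the paper's per-site results.

```python
# Exact two-tailed paired sign test; ties are discarded beforehand.
from math import comb

def sign_test_two_tailed(wins, losses):
    n = wins + losses
    k = min(wins, losses)
    # P(at most k successes in n fair coin flips), doubled for two tails.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Placeholder counts: 8 wins, 0 losses over non-tied comparisons.
print(sign_test_two_tailed(8, 0))   # -> 0.0078125, significant at 98%
```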
79 5 Related work The introduction discusses the relationship between this work and a number of previous techniques for using hyperlink structure in web page classification [7, 9, 15]. [sent-232, score-1.085]
80 The WL2-based method for finding document structure has antecedents in other techniques for learning [10, 12] and automatically detecting [4, 5] structure in web pages. [sent-233, score-0.46]
81 Blei et al. do not address the specific problem considered here, of using both page structure and hyperlink structure in web page classification. [sent-235, score-1.727]
82 However, they do apply their technique to two closely related problems: they augment a page classification method with local features based on the page’s URL, and also augment content-based classification of “text nodes” (specific substrings of a web page) with page-structure-based local features. [sent-236, score-0.949]
83 (In fact, Blei et al. generated page-structure-based features for their extraction task in exactly this way: the only difference is that WL2 was parameterized differently.) [sent-238, score-0.185]
84 6 Conclusions We have described a technique that improves a simple web page classifier by exploiting link structure within a site, as well as page structure within hub pages. [sent-241, score-2.092]
85 The system uses a variant of co-training called “one-step co-training” to exploit unlabeled data from a new site. [sent-242, score-0.227]
86 Pages on the new site are first labeled using the bag-of-words classifier; next, the results of this labeling are propagated to the links that point to labeled pages, and these labeled links are used by a wrapper-learner called WL2 to propose potential “main-category link wrappers”. [sent-244, score-0.628]
87 Finally, these wrappers are used as features by another learner A2 to find a categorization of the site that implies a simple hub structure, but which also largely agrees with the original bag-of-words classifier. [sent-245, score-0.993]
88 On a real-world benchmark problem, this technique substantially improved the accuracy of a simple bag-of-words classifier, reducing error rate by about half. [sent-247, score-0.107]
89 Appendix A: Details on “Wrapper Proposal” Extraction predicates are constructed by WL2 using a rule-learning algorithm and a configurable set of components called builders. [sent-250, score-0.101]
90 Each builder B corresponds to a language LB of extraction predicates. [sent-251, score-0.132]
91 Given a set of pairs D = {(xi, ai)} such that each ai is a substring of xi, LGGB(D) is the least general p ∈ LB such that (x, a) ∈ D ⇒ (x, a) ∈ p. [sent-253, score-0.088]
92 Depending on B, these properties might be membership in a particular syntactic HTML structure (e.g., appearing in a particular column of a table). [sent-255, score-0.066]
93 These heuristics were based on the observation that in most extraction tasks, the items to be extracted are close together. [sent-261, score-0.122]
94 (We also note that these heuristics were initially developed to support a different set of experiments [1], and were not substantially modified for the experiments in this paper.) [sent-263, score-0.056]
95 In our use of WL2, we simply applied each builder Bj to a dataset Di, to get the set of predicates {pij} = {LGGBj(Di)}, instead of running the full WL2 learning algorithm. [sent-265, score-0.09]
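To make the appendix concrete, here is a toy sketch of this wrapper-proposal step; the Page/Anchor objects and the single "same table column" builder are invented for illustration, whereas WL2's actual builders cover a configurable family of HTML substructures.

```python
# Toy illustration of wrapper proposal: each builder contributes the least
# general generalization LGG_B(Di) of a small example set Di, instead of
# running the full WL2 learner.

def lgg_table_column(examples):
    """examples: pairs (page, anchor), where each anchor records the table
    column it appears in.  Returns the least general 'same column' wrapper
    covering all examples, or None if no single column covers them."""
    cols = {anchor.column for _page, anchor in examples}
    if len(cols) != 1:
        return None                 # no predicate in this language fits
    (col,) = cols
    # Proposed wrapper: extract every anchor in that column of a page.
    return lambda page: [a for a in page.anchors if a.column == col]

def propose_wrappers(builders, subsets):
    """Apply every builder's LGG to every small subset Di, collecting the
    proposed extraction predicates {pij} and dropping failed proposals."""
    return [p for Di in subsets for lgg in builders
            if (p := lgg(Di)) is not None]
```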
96 Learning with scope, with application to information extraction and classification. [sent-270, score-0.096]
97 Automatically extracting features for concept learning from the web. [sent-280, score-0.077]
98 Learning page-independent heuristics for extracting data from web pages. [sent-285, score-0.366]
99 The missing link - a probabilistic model of document content and hypertext connectivity. [sent-293, score-0.189]
100 A structured wrapper induction system for extracting information from semi-structured documents. [sent-299, score-0.183]
wordName wordTfidf (topN-words)
[('page', 0.518), ('hub', 0.435), ('web', 0.294), ('hyperlink', 0.207), ('links', 0.169), ('site', 0.163), ('wrappers', 0.162), ('unlabeled', 0.144), ('classi', 0.142), ('anchors', 0.124), ('anchor', 0.123), ('pages', 0.115), ('wrapper', 0.108), ('ds', 0.103), ('link', 0.097), ('extraction', 0.096), ('er', 0.095), ('learner', 0.077), ('blei', 0.076), ('documents', 0.074), ('labeled', 0.073), ('william', 0.071), ('structure', 0.066), ('categorization', 0.065), ('nips', 0.065), ('du', 0.063), ('hubs', 0.062), ('nine', 0.061), ('sites', 0.059), ('wl', 0.058), ('xi', 0.055), ('predicates', 0.054), ('winnow', 0.054), ('category', 0.052), ('accuracy', 0.051), ('papers', 0.051), ('html', 0.049), ('predicate', 0.049), ('abstracts', 0.049), ('extracting', 0.046), ('pij', 0.046), ('unseen', 0.046), ('exploiting', 0.045), ('cohen', 0.043), ('biography', 0.041), ('builders', 0.041), ('executive', 0.041), ('lggb', 0.041), ('maintained', 0.041), ('learn', 0.038), ('proceedings', 0.038), ('pointed', 0.037), ('forthcoming', 0.036), ('company', 0.036), ('builder', 0.036), ('exploit', 0.036), ('al', 0.035), ('paired', 0.034), ('agrees', 0.034), ('document', 0.034), ('substring', 0.033), ('di', 0.032), ('predictions', 0.032), ('se', 0.032), ('features', 0.031), ('hypertext', 0.031), ('cation', 0.031), ('substantially', 0.03), ('induction', 0.029), ('label', 0.029), ('examples', 0.029), ('improve', 0.028), ('assumes', 0.028), ('augment', 0.028), ('douglas', 0.028), ('content', 0.027), ('conference', 0.027), ('improves', 0.027), ('technique', 0.026), ('heuristics', 0.026), ('largely', 0.026), ('baker', 0.025), ('available', 0.025), ('substrings', 0.024), ('labeling', 0.024), ('algorithm', 0.024), ('variant', 0.024), ('text', 0.024), ('extract', 0.024), ('jensen', 0.024), ('attributes', 0.024), ('instance', 0.023), ('andrew', 0.023), ('et', 0.023), ('called', 0.023), ('june', 0.022), ('international', 0.022), ('cally', 0.022), ('back', 0.021), ('subsets', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 109 nips-2002-Improving a Page Classifier with Anchor Extraction and Link Analysis
Author: William W. Cohen
Abstract: Most text categorization systems use simple models of documents and document collections. In this paper we describe a technique that improves a simple web page classifier’s performance on pages from a new, unseen web site, by exploiting link structure within a site as well as page structure within hub pages. On real-world test cases, this technique significantly and substantially improves the accuracy of a bag-of-words classifier, reducing error rate by about half, on average. The system uses a variant of co-training to exploit unlabeled data from a new site. Pages are labeled using the base classifier; the results are used by a restricted wrapper-learner to propose potential “main-category anchor wrappers”; and finally, these wrappers are used as features by a third learner to find a categorization of the site that implies a simple hub structure, but which also largely agrees with the original bag-of-words classifier.
2 0.12946475 52 nips-2002-Cluster Kernels for Semi-Supervised Learning
Author: Olivier Chapelle, Jason Weston, Bernhard Schölkopf
Abstract: We propose a framework to incorporate unlabeled data in kernel classifier, based on the idea that two points in the same cluster are more likely to have the same label. This is achieved by modifying the eigenspectrum of the kernel matrix. Experimental results assess the validity of this approach. 1
3 0.12858841 162 nips-2002-Parametric Mixture Models for Multi-Labeled Text
Author: Naonori Ueda, Kazumi Saito
Abstract: We propose probabilistic generative models, called parametric mixture models (PMMs), for multiclass, multi-labeled text categorization problem. Conventionally, the binary classification approach has been employed, in which whether or not text belongs to a category is judged by the binary classifier for every category. In contrast, our approach can simultaneously detect multiple categories of text using PMMs. We derive efficient learning and prediction algorithms for PMMs. We also empirically show that our method could significantly outperform the conventional binary methods when applied to multi-labeled text categorization using real World Wide Web pages. 1
4 0.10384692 59 nips-2002-Constraint Classification for Multiclass Classification and Ranking
Author: Sariel Har-Peled, Dan Roth, Dav Zimak
Abstract: The constraint classification framework captures many flavors of multiclass classification including winner-take-all multiclass classification, multilabel classification and ranking. We present a meta-algorithm for learning in this framework that learns via a single linear classifier in high dimension. We discuss distribution independent as well as margin-based generalization bounds and present empirical and theoretical evidence showing that constraint classification benefits over existing methods of multiclass classification.
5 0.10330726 115 nips-2002-Informed Projections
Author: David Cohn
Abstract: Low rank approximation techniques are widespread in pattern recognition research — they include Latent Semantic Analysis (LSA), Probabilistic LSA, Principal Components Analysus (PCA), the Generative Aspect Model, and many forms of bibliometric analysis. All make use of a low-dimensional manifold onto which data are projected. Such techniques are generally “unsupervised,” which allows them to model data in the absence of labels or categories. With many practical problems, however, some prior knowledge is available in the form of context. In this paper, I describe a principled approach to incorporating such information, and demonstrate its application to PCA-based approximations of several data sets. 1
6 0.09427914 88 nips-2002-Feature Selection and Classification on Matrix Data: From Large Margins to Small Covering Numbers
7 0.077268802 92 nips-2002-FloatBoost Learning for Classification
8 0.072419368 45 nips-2002-Boosted Dyadic Kernel Discriminants
9 0.067652367 83 nips-2002-Extracting Relevant Structures with Side Information
10 0.066992514 24 nips-2002-Adaptive Scaling for Feature Selection in SVMs
11 0.064194836 19 nips-2002-Adapting Codes and Embeddings for Polychotomies
12 0.063557483 112 nips-2002-Inferring a Semantic Representation of Text via Cross-Language Correlation Analysis
13 0.063275509 86 nips-2002-Fast Sparse Gaussian Process Methods: The Informative Vector Machine
14 0.055544853 55 nips-2002-Combining Features for BCI
15 0.055441897 72 nips-2002-Dyadic Classification Trees via Structural Risk Minimization
16 0.055108823 90 nips-2002-Feature Selection in Mixture-Based Clustering
17 0.054677814 143 nips-2002-Mean Field Approach to a Probabilistic Model in Information Retrieval
18 0.054143526 149 nips-2002-Multiclass Learning by Probabilistic Embeddings
19 0.053849217 120 nips-2002-Kernel Design Using Boosting
20 0.053233534 196 nips-2002-The RA Scanner: Prediction of Rheumatoid Joint Inflammation Based on Laser Imaging
topicId topicWeight
[(0, -0.164), (1, -0.093), (2, 0.05), (3, -0.021), (4, 0.037), (5, 0.043), (6, -0.053), (7, -0.138), (8, 0.019), (9, -0.055), (10, -0.128), (11, 0.024), (12, 0.005), (13, 0.008), (14, 0.063), (15, -0.036), (16, 0.009), (17, 0.023), (18, 0.049), (19, 0.052), (20, 0.018), (21, 0.044), (22, -0.02), (23, -0.04), (24, 0.066), (25, -0.028), (26, -0.125), (27, -0.05), (28, -0.049), (29, -0.011), (30, -0.022), (31, 0.044), (32, 0.004), (33, 0.132), (34, -0.031), (35, 0.033), (36, 0.038), (37, 0.047), (38, 0.017), (39, 0.15), (40, 0.02), (41, 0.227), (42, -0.129), (43, 0.0), (44, 0.112), (45, 0.013), (46, 0.074), (47, 0.026), (48, -0.073), (49, -0.047)]
simIndex simValue paperId paperTitle
same-paper 1 0.93347013 109 nips-2002-Improving a Page Classifier with Anchor Extraction and Link Analysis
Author: William W. Cohen
Abstract: Most text categorization systems use simple models of documents and document collections. In this paper we describe a technique that improves a simple web page classifier’s performance on pages from a new, unseen web site, by exploiting link structure within a site as well as page structure within hub pages. On real-world test cases, this technique significantly and substantially improves the accuracy of a bag-of-words classifier, reducing error rate by about half, on average. The system uses a variant of co-training to exploit unlabeled data from a new site. Pages are labeled using the base classifier; the results are used by a restricted wrapper-learner to propose potential “main-category anchor wrappers”; and finally, these wrappers are used as features by a third learner to find a categorization of the site that implies a simple hub structure, but which also largely agrees with the original bag-of-words classifier.
2 0.77636421 162 nips-2002-Parametric Mixture Models for Multi-Labeled Text
Author: Naonori Ueda, Kazumi Saito
Abstract: We propose probabilistic generative models, called parametric mixture models (PMMs), for multiclass, multi-labeled text categorization problem. Conventionally, the binary classification approach has been employed, in which whether or not text belongs to a category is judged by the binary classifier for every category. In contrast, our approach can simultaneously detect multiple categories of text using PMMs. We derive efficient learning and prediction algorithms for PMMs. We also empirically show that our method could significantly outperform the conventional binary methods when applied to multi-labeled text categorization using real World Wide Web pages. 1
3 0.56894666 115 nips-2002-Informed Projections
Author: David Cohn
Abstract: Low rank approximation techniques are widespread in pattern recognition research — they include Latent Semantic Analysis (LSA), Probabilistic LSA, Principal Components Analysus (PCA), the Generative Aspect Model, and many forms of bibliometric analysis. All make use of a low-dimensional manifold onto which data are projected. Such techniques are generally “unsupervised,” which allows them to model data in the absence of labels or categories. With many practical problems, however, some prior knowledge is available in the form of context. In this paper, I describe a principled approach to incorporating such information, and demonstrate its application to PCA-based approximations of several data sets. 1
Author: David Fass, Jacob Feldman
Abstract: We present an account of human concept learning-that is, learning of categories from examples-based on the principle of minimum description length (MDL). In support of this theory, we tested a wide range of two-dimensional concept types, including both regular (simple) and highly irregular (complex) structures, and found the MDL theory to give a good account of subjects' performance. This suggests that the intrinsic complexity of a concept (that is, its description length) systematically influences its learnability. 1 The Structure of Categories A number of different principles have been advanced to explain the manner in which humans learn to categorize objects. It has been variously suggested that the underlying principle might be the similarity structure of objects [1], the manipulability of decision boundaries [2], or Bayesian inference [3][4]. While many of these theories are mathematically well-grounded and have been successful in explaining a range of experimental findings, they have commonly only been tested on a narrow collection of concept types similar to the simple unimodal categories of Figure 1(a-e). (a) (b) (c) (d) (e) Figure 1: Categories similar to those previously studied. Lines represent contours of equal probability. All except (e) are unimodal. http://ruccs.rutgers.edu/~jacob/feldman.html Moreover, in the scarce research that has ventured to look beyond simple category types, the goal has largely been to investigate categorization performance for isolated irregular distributions, rather than to present a survey of performance across a range of interesting distributions. For example, Nosofsky has previously examined the
5 0.4535833 108 nips-2002-Improving Transfer Rates in Brain Computer Interfacing: A Case Study
Author: Peter Meinicke, Matthias Kaper, Florian Hoppe, Manfred Heumann, Helge Ritter
Abstract: In this paper we present results of a study on brain computer interfacing. We adopted an approach of Farwell & Donchin [4], which we tried to improve in several aspects. The main objective was to improve the transfer rates based on offline analysis of EEG-data but within a more realistic setup closer to an online realization than in the original studies. The objective was achieved along two different tracks: on the one hand we used state-of-the-art machine learning techniques for signal classification and on the other hand we augmented the data space by using more electrodes for the interface. For the classification task we utilized SVMs and, as motivated by recent findings on the learning of discriminative densities, we accumulated the values of the classification function in order to combine several classifications, which finally lead to significantly improved rates as compared with techniques applied in the original work. In combination with the data space augmentation, we achieved competitive transfer rates at an average of 50.5 bits/min and with a maximum of 84.7 bits/min.
6 0.45332801 1 nips-2002-"Name That Song!" A Probabilistic Approach to Querying on Music and Text
7 0.43990102 59 nips-2002-Constraint Classification for Multiclass Classification and Ranking
8 0.4394201 55 nips-2002-Combining Features for BCI
9 0.41282219 196 nips-2002-The RA Scanner: Prediction of Rheumatoid Joint Inflammation Based on Laser Imaging
10 0.39668667 68 nips-2002-Discriminative Densities from Maximum Contrast Estimation
11 0.38545573 40 nips-2002-Bayesian Models of Inductive Generalization
12 0.38026774 8 nips-2002-A Maximum Entropy Approach to Collaborative Filtering in Dynamic, Sparse, High-Dimensional Domains
13 0.3782447 52 nips-2002-Cluster Kernels for Semi-Supervised Learning
14 0.3639051 149 nips-2002-Multiclass Learning by Probabilistic Embeddings
15 0.35883322 190 nips-2002-Stochastic Neighbor Embedding
16 0.35771307 45 nips-2002-Boosted Dyadic Kernel Discriminants
17 0.35518768 88 nips-2002-Feature Selection and Classification on Matrix Data: From Large Margins to Small Covering Numbers
18 0.35412994 92 nips-2002-FloatBoost Learning for Classification
19 0.33194527 86 nips-2002-Fast Sparse Gaussian Process Methods: The Informative Vector Machine
20 0.32134867 62 nips-2002-Coulomb Classifiers: Generalizing Support Vector Machines via an Analogy to Electrostatic Systems
topicId topicWeight
[(11, 0.014), (14, 0.013), (21, 0.271), (23, 0.014), (42, 0.086), (54, 0.091), (55, 0.041), (57, 0.022), (67, 0.021), (68, 0.025), (74, 0.134), (87, 0.01), (92, 0.026), (98, 0.138)]
simIndex simValue paperId paperTitle
same-paper 1 0.81073344 109 nips-2002-Improving a Page Classifier with Anchor Extraction and Link Analysis
Author: William W. Cohen
Abstract: Most text categorization systems use simple models of documents and document collections. In this paper we describe a technique that improves a simple web page classifier’s performance on pages from a new, unseen web site, by exploiting link structure within a site as well as page structure within hub pages. On real-world test cases, this technique significantly and substantially improves the accuracy of a bag-of-words classifier, reducing error rate by about half, on average. The system uses a variant of co-training to exploit unlabeled data from a new site. Pages are labeled using the base classifier; the results are used by a restricted wrapper-learner to propose potential “main-category anchor wrappers”; and finally, these wrappers are used as features by a third learner to find a categorization of the site that implies a simple hub structure, but which also largely agrees with the original bag-of-words classifier.
2 0.63328874 135 nips-2002-Learning with Multiple Labels
Author: Rong Jin, Zoubin Ghahramani
Abstract: In this paper, we study a special kind of learning problem in which each training instance is given a set of (or distribution over) candidate class labels and only one of the candidate labels is the correct one. Such a problem can occur, e.g., in an information retrieval setting where a set of words is associated with an image, or if class labels are organized hierarchically. We propose a novel discriminative approach for handling the ambiguity of class labels in the training examples. The experiments with the proposed approach over five different UCI datasets show that our approach is able to find the correct label among the set of candidate labels and actually achieve performance close to the case when each training instance is given a single correct label. In contrast, naive methods degrade rapidly as more ambiguity is introduced into the labels. 1
3 0.62179238 132 nips-2002-Learning to Detect Natural Image Boundaries Using Brightness and Texture
Author: David R. Martin, Charless C. Fowlkes, Jitendra Malik
Abstract: The goal of this work is to accurately detect and localize boundaries in natural scenes using local image measurements. We formulate features that respond to characteristic changes in brightness and texture associated with natural boundaries. In order to combine the information from these features in an optimal way, a classifier is trained using human labeled images as ground truth. We present precision-recall curves showing that the resulting detector outperforms existing approaches.
4 0.61808622 127 nips-2002-Learning Sparse Topographic Representations with Products of Student-t Distributions
Author: Max Welling, Simon Osindero, Geoffrey E. Hinton
Abstract: We propose a model for natural images in which the probability of an image is proportional to the product of the probabilities of some filter outputs. We encourage the system to find sparse features by using a Student-t distribution to model each filter output. If the t-distribution is used to model the combined outputs of sets of neurally adjacent filters, the system learns a topographic map in which the orientation, spatial frequency and location of the filters change smoothly across the map. Even though maximum likelihood learning is intractable in our model, the product form allows a relatively efficient learning procedure that works well even for highly overcomplete sets of filters. Once the model has been learned it can be used as a prior to derive the “iterated Wiener filter” for the purpose of denoising images.
5 0.61740232 110 nips-2002-Incremental Gaussian Processes
Author: Joaquin Quiñonero-Candela, Ole Winther
Abstract: In this paper, we consider Tipping’s relevance vector machine (RVM) [1] and formalize an incremental training strategy as a variant of the expectation-maximization (EM) algorithm that we call Subspace EM (SSEM). Working with a subset of active basis functions, the sparsity of the RVM solution will ensure that the number of basis functions and thereby the computational complexity is kept low. We also introduce a mean field approach to the intractable classification model that is expected to give a very good approximation to exact Bayesian inference and contains the Laplace approximation as a special case. We test the algorithms on two large data sets with O(10^3-10^4) examples. The results indicate that Bayesian learning of large data sets, e.g. the MNIST database is realistic.
6 0.61684042 52 nips-2002-Cluster Kernels for Semi-Supervised Learning
7 0.61618429 2 nips-2002-A Bilinear Model for Sparse Coding
8 0.6152482 74 nips-2002-Dynamic Structure Super-Resolution
9 0.61492276 89 nips-2002-Feature Selection by Maximum Marginal Diversity
10 0.61311758 163 nips-2002-Prediction and Semantic Association
11 0.61256105 31 nips-2002-Application of Variational Bayesian Approach to Speech Recognition
12 0.612499 124 nips-2002-Learning Graphical Models with Mercer Kernels
13 0.61126447 204 nips-2002-VIBES: A Variational Inference Engine for Bayesian Networks
14 0.61064589 10 nips-2002-A Model for Learning Variance Components of Natural Images
15 0.61020958 122 nips-2002-Learning About Multiple Objects in Images: Factorial Learning without Factorial Search
16 0.61020559 41 nips-2002-Bayesian Monte Carlo
17 0.60803068 48 nips-2002-Categorization Under Complexity: A Unified MDL Account of Human Learning of Regular and Irregular Categories
18 0.60779744 3 nips-2002-A Convergent Form of Approximate Policy Iteration
19 0.6074239 141 nips-2002-Maximally Informative Dimensions: Analyzing Neural Responses to Natural Signals
20 0.60596466 53 nips-2002-Clustering with the Fisher Score