nips nips2004 nips2004-145 knowledge-graph by maker-knowledge-mining

145 nips-2004-Parametric Embedding for Class Visualization


Source: pdf

Author: Tomoharu Iwata, Kazumi Saito, Naonori Ueda, Sean Stromsten, Thomas L. Griffiths, Joshua B. Tenenbaum

Abstract: In this paper, we propose a new method, Parametric Embedding (PE), for visualizing the posteriors estimated over a mixture model. PE simultaneously embeds both objects and their classes in a low-dimensional space. PE takes as input a set of class posterior vectors for given data points, and tries to preserve the posterior structure in an embedding space by minimizing a sum of Kullback-Leibler divergences, under the assumption that samples are generated by a Gaussian mixture with equal covariances in the embedding space. PE has many potential uses depending on the source of the input data, providing insight into the classifier’s behavior in supervised, semi-supervised and unsupervised settings. The PE algorithm has a computational advantage over conventional embedding methods based on pairwise object relations since its complexity scales with the product of the number of objects and the number of classes. We demonstrate PE by visualizing supervised categorization of web pages, semi-supervised categorization of digits, and the relations of words and latent topics found by an unsupervised algorithm, Latent Dirichlet Allocation. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract In this paper, we propose a new method, Parametric Embedding (PE), for visualizing the posteriors estimated over a mixture model. [sent-8, score-0.162]

2 PE simultaneously embeds both objects and their classes in a low-dimensional space. [sent-9, score-0.389]

3 PE has many potential uses depending on the source of the input data, providing insight into the classifier’s behavior in supervised, semi-supervised and unsupervised settings. [sent-11, score-0.123]

4 The PE algorithm has a computational advantage over conventional embedding methods based on pairwise object relations since its complexity scales with the product of the number of objects and the number of classes. [sent-12, score-0.606]

5 We demonstrate PE by visualizing supervised categorization of web pages, semi-supervised categorization of digits, and the relations of words and latent topics found by an unsupervised algorithm, Latent Dirichlet Allocation. [sent-13, score-0.675]

6 1 Introduction Recently there has been great interest in algorithms for constructing low-dimensional feature-space embeddings of high-dimensional data sets. [sent-14, score-0.078]

7 These algorithms seek to capture some aspect of the data set’s intrinsic structure in a low-dimensional representation that is easier to visualize or more efficient to process by other learning algorithms. [sent-15, score-0.138]

8 Typical embedding algorithms take as input a matrix of data coordinates in a high-dimensional ambient space (e.g. [sent-16, score-0.386]

9 , PCA [5]), or a matrix of metric relations between pairs of data points (MDS [7], Isomap [6], SNE [4]). [sent-18, score-0.11]

10 The algorithms generally attempt to map all and only nearby input points onto nearby points in the output embedding. [sent-19, score-0.202]

11 Here we consider a different sort of embedding problem with two sets of points: objects X = {x1 , . . . , xN } and classes {c1 , . . . , cK }. [sent-20, score-0.306]

12 The input consists of conditional probabilities p(ck |xn ) associating each object xn with each class ck . [sent-27, score-0.861]

13 Typically, the number of classes is much smaller than the number of objects, K << N . [sent-29, score-0.162]

14 We seek a low-dimensional embedding of both objects and classes such that the distance between object n and class k is monotonically related to the probability p(ck |xn ). [sent-30, score-0.773]

15 This embedding simultaneously represents not only the relations between objects and classes, but also the relations within the set of objects and within the set of classes – each defined in terms of relations to points in the other set. [sent-31, score-1.052]

16 That is, objects that tend to be associated with the same classes should be embedded nearby, as should classes that tend to have the same objects associated with them. [sent-32, score-0.774]

17 Our primary goals are visualization and structure discovery, so we typically work with two- or three-dimensional embeddings. [sent-33, score-0.068]

18 Object-class embeddings have many potential uses, depending on the source of the input data. [sent-34, score-0.116]

19 The probabilities p(ck |xn ) may also be the product of unsupervised or semi-supervised learning, where the classes ck represent components in a generative mixture model. [sent-37, score-0.788]

20 Then an object-class embedding shows how well the intrinsic structure of the objects (and, in a semi-supervised setting, any given labels) accords with the clustering assumptions of the mixture model. [sent-38, score-0.544]

21 Our specific formulation of the embedding problem assumes that each class ck can be represented by a spherical Gaussian distribution in the embedding space, so that the embedding as a whole represents a simple Gaussian mixture model for each object xn . [sent-39, score-1.014]

22 We seek an embedding that matches the posterior probabilities for each object under this Gaussian mixture model to the input probabilities p(ck |xn ). [sent-40, score-0.706]

23 Minimizing the Kullback-Leibler (KL) divergence between these two posterior distributions leads to an efficient algorithm, which we call Parametric Embedding (PE). [sent-41, score-0.085]

24 PE can be seen as a generalization of stochastic neighbor embedding (SNE). [sent-42, score-0.264]

25 SNE corresponds to a special case of PE where the objects and classes are identical sets. [sent-43, score-0.352]

26 In SNE, the class posterior probabilities p(ck |xn ) are replaced by the probability p(xm |xn ) of object xn under a Gaussian distribution centered on xm . [sent-44, score-0.466]

27 When the inputs (posterior probabilities) to PE come from an unsupervised mixture model, PE performs unsupervised dimensionality reduction just like SNE. [sent-45, score-0.178]

28 However, it has several advantages over SNE and other methods for embedding a single set of data points based on their pairwise relations. [sent-46, score-0.408]

29 It can be applied in supervised or semi-supervised modes, when class labels are available. [sent-49, score-0.151]

30 Because its computational complexity scales with N K, the product of the number of objects and the number of classes, it can be applied efficiently to data sets with very many objects (as long as the number of classes remains small). [sent-50, score-0.542]

31 In this sense, PE is closely related to landmark MDS (LMDS) [2], if we equate classes with landmarks, objects with data points, and − log p(ck |xn ) with the squared distances input to LMDS. [sent-51, score-0.428]

32 However, LMDS lacks a probabilistic semantics and is only suitable for unsupervised settings. [sent-52, score-0.106]
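To make the landmark analogy above concrete, the object-to-landmark input LMDS would need can be written down directly from the PE input. The short NumPy fragment below only illustrates that mapping under the stated equivalence (classes as landmarks, objects as data points, −log p(ck |xn ) as squared distances); it is not an LMDS implementation.

```python
import numpy as np

def lmds_style_distances(P, eps=1e-12):
    """Squared object-to-landmark distances implied by the analogy:
    classes play the role of landmarks and d^2(n, k) = -log p(c_k | x_n).
    P is the (N, K) matrix of class posteriors; eps guards against log(0)."""
    return -np.log(np.clip(P, eps, 1.0))
```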

33 Lastly, even if hard classifications are not available, it is often the relations of the objects to the classes, rather than to each other, that we are interested in. [sent-53, score-0.258]

34 After describing the mathematical formulation and optimization procedures used in PE (Section 2), we present applications to visualizing the structure of several kinds of class posteriors. [sent-54, score-0.203]

35 In section 3, we look at supervised classifiers of hand-labeled web pages. [sent-55, score-0.145]

36 Lastly, in section 5, we apply PE to an unsupervised probabilistic topics model, treating latent topics as classes, and words as objects. [sent-57, score-0.541]

37 PE handles these datasets easily, in the last producing an embedding for over 26,000 objects in a little over a minute (on a 2GHz Pentium computer). [sent-58, score-0.454]

38 Each object xn is given coordinates rn and each class ck a mean φk in the embedding space, yielding embedding-space posteriors p(ck |rn ) = p(ck ) exp(−‖rn − φk ‖2 /2) / Σ_{l=1}^{K} p(cl ) exp(−‖rn − φl ‖2 /2). (1) Here ‖·‖ is the Euclidean norm in the embedding space. [sent-60, score-0.264]

39 When the conditional probabilities p(ck |xn ) arise as posterior probabilities from a mixture model, we will also typically be given priors p(ck ) as input; otherwise the p(ck ) terms above may be assumed equal. [sent-61, score-0.377]

40 It is natural to measure the degree of correspondence between input probabilities and embedding-space probabilities using a sum of KL divergences, one for each object: E = Σ_{n=1}^{N} KL(p(ck |xn ) || p(ck |rn )). (2) [sent-62, score-0.248]

41 Derivatives of E are: ∂E/∂rn = Σ_{k=1}^{K} αn,k (rn − φk ) and ∂E/∂φk = Σ_{n=1}^{N} αn,k (φk − rn ), (3) where αn,k = p(ck |xn ) − p(ck |rn ). [sent-72, score-0.192]

42 These learning rules have an intuitive interpretation (analogous to those in SNE) as a sum of forces pulling or pushing rn (φk ) depending on the sign of αn,k . [sent-73, score-0.192]

43 In our experiments, we found that optimization proceeded more smoothly with a regularized objective function, J = E + ηr Σ_{n=1}^{N} ‖rn ‖2 + ηφ Σ_{k=1}^{K} ‖φk ‖2 , where ηr , ηφ > 0. [sent-83, score-0.192]
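To make Eqs. (1)-(3) concrete, here is a minimal NumPy sketch of PE as described above; it is not the authors' code, and the plain gradient-descent optimizer, learning rate, initialization scale, and iteration count are assumptions not given in the extracted text.

```python
import numpy as np

def parametric_embedding(P, prior=None, dim=2, eta_r=1e-4, eta_phi=1e-4,
                         lr=0.1, n_iter=500, seed=0):
    """Minimal sketch of Parametric Embedding.

    P     : (N, K) array of input posteriors p(c_k | x_n); rows sum to 1.
    prior : (K,) class priors p(c_k); assumed uniform if None.
    Returns object coordinates R (N, dim) and class means Phi (K, dim).
    """
    N, K = P.shape
    prior = np.full(K, 1.0 / K) if prior is None else np.asarray(prior, dtype=float)
    rng = np.random.default_rng(seed)
    R = 0.01 * rng.standard_normal((N, dim))     # r_n
    Phi = 0.01 * rng.standard_normal((K, dim))   # phi_k

    for _ in range(n_iter):
        # Eq. (1): embedding-space posteriors under unit-variance spherical Gaussians.
        diff = R[:, None, :] - Phi[None, :, :]                   # (N, K, dim)
        logits = np.log(prior)[None, :] - 0.5 * (diff ** 2).sum(-1)
        logits -= logits.max(axis=1, keepdims=True)              # numerical stability
        Q = np.exp(logits)
        Q /= Q.sum(axis=1, keepdims=True)                        # p(c_k | r_n)

        # Eq. (3), plus the gradients of the regularization terms in J.
        A = P - Q                                                # alpha_{n,k}
        grad_R = (A[:, :, None] * diff).sum(axis=1) + 2.0 * eta_r * R
        grad_Phi = -(A[:, :, None] * diff).sum(axis=0) + 2.0 * eta_phi * Phi

        R -= lr * grad_R
        Phi -= lr * grad_Phi
    return R, Phi
```

Usage would be along the lines of R, Phi = parametric_embedding(P) with P an (N, K) stack of posterior vectors; in practice a quasi-Newton or adaptive-step optimizer might be preferable, but the extracted text does not describe one.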

44 3 Analyzing supervised classifiers on web data In this section, we show how PE can be used to visualize the structure of labeled data (web pages) in a supervised classification task. [sent-84, score-0.355]

45 MDS seeks a low-dimensional embedding that preserves the input distances between objects. [sent-86, score-0.382]

46 It does not normally use class labels for data points, although below we discuss a way to apply MDS to label probabilities that arise in classification. [sent-87, score-0.219]

47 It seeks a linear projection of the objects’ coordinates in a high-dimensional ambient space that maximizes between-class variance and minimizes within-class variance. [sent-89, score-0.126]

48 The set of objects comprised 5500 human-classified web pages: 500 pages sampled from each of 11 top level classes in Japanese directories of Open Directory (http://dmoz. [sent-90, score-0.49]

49 A Naive Bayes (NB) classifier was trained on the full data (represented as word frequency vectors). [sent-93, score-0.11]

50 Posterior probabilities p(ck |xn ) were calculated for classifying each object (web page), assuming its true class label was unknown. [sent-94, score-0.219]
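A hedged sketch of how such posteriors might be produced with scikit-learn's multinomial Naive Bayes and fed to the PE sketch above; word_counts and labels below are hypothetical stand-ins for the 5500-page corpus, not the authors' data pipeline.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical stand-ins: word-frequency vectors and labels for 5500 pages, 11 classes.
rng = np.random.default_rng(0)
word_counts = rng.integers(0, 5, size=(5500, 2000))
labels = rng.integers(0, 11, size=5500)

nb = MultinomialNB()
nb.fit(word_counts, labels)
P = nb.predict_proba(word_counts)     # (5500, 11) posteriors p(c_k | x_n)
# R, Phi = parametric_embedding(P)    # embed pages and the 11 classes as sketched above
```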

51 Pages belonging to the same class tend to cluster well in the embedding, which makes sense given the large sample of labeled data. [sent-98, score-0.2]

52 Distinctive pages are evident as well: a few pages that are scattered among the objects of another category might be misclassified. [sent-106, score-0.266]

53 Pages located between clusters are likely to be categorized in multiple classes; arcs between two classes show subsets of objects that distribute their probability among those two classes and no others. [sent-107, score-0.609]

54 Figure 1(b) shows the result of MDS applied to cosine distances between web pages. [sent-109, score-0.138]

55 No labeled information is used (only word frequency vectors for the pages), and consequently no class structure is visible. [sent-110, score-0.27]

56 To stabilize the calculation, FLDA was applied only after word frequencies were smoothed via SVD. [sent-113, score-0.11]

57 FLDA uses label information, and clusters together the objects in each class better than MDS does. [sent-114, score-0.331]

58 However, most clusters are highly overlapping, and the separation of classes is much poorer than with PE. [sent-115, score-0.223]

59 Figure 1(d) shows another way of embedding the data using MDS, but this time applied to Euclidean distances in the (K − 1)-dimensional space of posterior distributions p(ck |xn ). [sent-118, score-0.387]

60 Pages belonging to the same class are definitely more clustered in this mode, but still the clusters are highly overlapping and provide little insight into the classifier’s behavior. [sent-119, score-0.196]

61 This version of MDS uses the same inputs as PE, rather than any high-dimensional word frequency vectors, but its computations are not explicitly probabilistic. [sent-120, score-0.11]
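For comparison setups like Figure 1(d), one plausible recipe is to take Euclidean distances between the posterior vectors themselves and run metric MDS on them. The snippet below, using SciPy and scikit-learn, is an assumed reconstruction of that baseline, with a random P standing in for the real posteriors.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

# Hypothetical stand-in for the (N, K) posterior matrix p(c_k | x_n).
P = np.random.default_rng(0).dirichlet(np.ones(11), size=500)

D = squareform(pdist(P))              # Euclidean distances in posterior space
mds = MDS(n_components=2, dissimilarity='precomputed')
coords = mds.fit_transform(D)         # 2-D layout comparable to the PE embedding
```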

62 4 Application to semi-supervised classification The utility of PE for analyzing classifier performance may best be illustrated in a semi-supervised setting, with a large unlabeled set of objects and a smaller set of labeled objects. [sent-123, score-0.27]

63 Each of the 5500 web pages is shown by a particle with shape indicating the page’s class. [sent-125, score-0.138]

64 We constructed a simple probabilistic classifier for 2558 handwritten digits (classes 0-4) from the MNIST database. [sent-127, score-0.132]

65 The classifier was based on a mixture model for the density of each class, defined by selecting either 10 or 100 digits uniformly at random from each class and centering a fixed-covariance Gaussian (in pixel space) on each of these examples – essentially a soft nearest-neighbor method. [sent-128, score-0.217]

66 The posterior distribution over this classifier for all 2558 digits was submitted as input to PE. [sent-129, score-0.196]
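A sketch of the kind of soft nearest-neighbor posterior described here (one fixed-covariance Gaussian per labeled example, mixed within each class). The bandwidth sigma and the implicit equal-prior assumption are mine; the extracted text does not give them.

```python
import numpy as np

def soft_nn_posteriors(X, X_labeled, y_labeled, n_classes, sigma=3.0):
    """Class posteriors p(c_k | x_n) from per-class mixtures of spherical
    Gaussians centered on the labeled pixel vectors (soft nearest neighbor).
    X (N, D), X_labeled (M, D), y_labeled (M,); sigma is an assumed bandwidth."""
    # Squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b (avoids a 3-D array).
    d2 = ((X ** 2).sum(1)[:, None] + (X_labeled ** 2).sum(1)[None, :]
          - 2.0 * X @ X_labeled.T)
    d2 -= d2.min(axis=1, keepdims=True)                  # per-row shift; cancels after normalization
    resp = np.exp(-d2 / (2.0 * sigma ** 2))              # kernel responses, (N, M)
    P = np.stack([resp[:, y_labeled == k].sum(axis=1)    # class-k mixture density
                  for k in range(n_classes)], axis=1)
    return P / P.sum(axis=1, keepdims=True)              # normalize to posteriors
```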

67 The resulting embeddings allow us to predict the classifiers’ patterns of confusions, calculated based on the true labels for all 2558 objects. [sent-130, score-0.104]

68 Figure 2 shows embeddings for both 10 labels/class and 100 labels/class. [sent-132, score-0.078]

69 In both cases we see five clouds of points corresponding to the five classes. [sent-133, score-0.079]

70 The clouds are elongated and oriented roughly towards a common center, forming a star shape (also seen to some extent in our other applications). [sent-134, score-0.088]

71 Figure 2: Parametric embeddings for handwritten digit classification. [sent-2393, score-0.110]

72 Each dot represents the coordinates rn of one image. [sent-2394, score-0.246]

73 Images of several unlabeled digits are shown for each class. [sent-2397, score-0.099]

74 Moving towards the center of the plot, objects become increasingly confused with other classes. [sent-2399, score-0.295]

75 Relative to using only 10 labels/class, using 100 labels yields clusters that are more distinct, reflecting better between-class discrimination. [sent-2400, score-0.087]

76 In both cases, the ‘1’ class is much closer than any other to the center of the plot, reflecting the fact that instances of other classes tend to be mistaken for ‘1’s. [sent-2402, score-0.359]

77 Instances of other classes near the ‘1’ center also tend to look rather “one-like” – thinner and more elongated. [sent-2403, score-0.223]

78 The dense cluster of points just outside the mean for ‘1’ reflects the fact that ‘1’s are rarely mistaken for other digits. [sent-2404, score-0.167]

79 In Figure 2(a), the ‘0’ and ‘3’ distributions are particularly overlapping, reflecting that those two digits are most readily confused with each other (apart from 1). [sent-2406, score-0.129]

80 In Figure 2(b), that ‘webbing’ persists, consistent with the observation that (again, apart from many mistaken responses of 1) the confusion of ‘2’s for ‘3’s is the only large-scale error these larger data permit. [sent-2409, score-0.078]

81 5 Application to unsupervised latent class models In the applications above, PE was applied to visualize the structure of classes based at least to some degree on labeled examples. [sent-2410, score-0.535]

82 The algorithm can also be used in a completely unsupervised setting, to visualize the structure of a probabilistic generative model based on latent classes. [sent-2411, score-0.266]

83 The problem of mapping a large vocabulary is particularly challenging, and, with over 26,000 objects (word types), prohibitively expensive for pairwise methods. [sent-2413, score-0.224]

84 In LDA (not to be confused with FLDA above), each topic defines a probability distribution over words. [sent-2415, score-0.128]

85 Each dot represents the coordinates rn of one word. [sent-3281, score-0.246]

86 Large phrases indicate the positions of topic means φk (with topics labeled intuitively). [sent-3282, score-0.294]

87 Examples of words that belong to one or more topics are also shown. [sent-3283, score-0.218]

88 This model can be inverted to give the probability that topic ck was responsible for generating word xn ; these probabilities p(ck |xn ) provide the input needed to construct a space of word and topic meanings in PE. [sent-3285, score-1.137]

89 Then, for each word type, we computed its posterior distribution restricted to a subset of 5 topics, and input these conditional probabilities to PE (with N = 26,243, K = 5). [sent-3287, score-0.348]
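A small sketch of the Bayes-rule inversion described above: p(ck |xn ) ∝ p(xn |ck ) p(ck ). Here topic_word (rows p(xn |ck )) and topic_prior are hypothetical names for the fitted LDA parameters, and the optional topic subset mirrors the restriction to 5 topics.

```python
import numpy as np

def word_topic_posteriors(topic_word, topic_prior, topics=None, eps=1e-12):
    """p(c_k | x_n) for every word type by Bayes' rule over a fitted LDA model.
    topic_word  : (K_all, V) array, row k is the word distribution p(x_n | c_k).
    topic_prior : (K_all,) topic weights p(c_k).
    topics      : optional list of topic indices to restrict (and renormalize) to."""
    joint = topic_word * topic_prior[:, None]            # p(x_n, c_k)
    if topics is not None:
        joint = joint[np.asarray(topics)]
    P = (joint / np.maximum(joint.sum(axis=0, keepdims=True), eps)).T
    return P                                             # (V, K) rows p(c_k | x_n), PE input
```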

90 As in Figures 1 and 2, the topics are arranged roughly in a star shape, with a tight cluster of points at each corner of the star corresponding to words that place almost all of their probability mass on that topic. [sent-3291, score-0.393]

91 Semantically, the words in these extreme clusters often (though not always) have a fairly specialized meaning particular to the nearest topic. [sent-3292, score-0.111]

92 Moving towards the center of the plot, words take on increasingly general meanings. [sent-3293, score-0.099]

93 This embedding shows other structures not visible in previous figures: in particular, dense curves of points connecting every pair of clusters. [sent-3294, score-0.367]

94 This pattern reflects the characteristic probabilistic structure of topic models of semantics: in addition to the clusters of words that associate with just one topic, there are many words that associate with just two topics, or just three, and so on. [sent-3295, score-0.344]

95 The curves in Figure 3 show that for any pair of topics in this corpus, there exists a substantial subset of words that associate with just those topics. [sent-3297, score-0.247]

96 For words with probability sharply concentrated on two topics, points along these curves minimize the sum of the KL and regularization terms. [sent-3298, score-0.115]

97 In Figure 3, points along the curves connecting two apparently unrelated topics often have multiple meanings or senses that join them to each topic: ‘deposit’ has both a geological and a financial sense, ‘phase’ has both an everyday and a chemical sense, and so on. [sent-3301, score-0.301]

98 6 Conclusions We have proposed a probabilistic embedding method, PE, that embeds objects and classes simultaneously. [sent-3302, score-0.68]

99 PE takes as input a probability distribution for objects over classes, or more generally of one set of points over another set, and attempts to fit that distribution with a simple class-conditional parametric mixture in the embedding space. [sent-3303, score-0.652]

100 PE provides a tractable, principled, and effective visualization method for large volumes of such data, for which pairwise methods are not appropriate. [sent-3313, score-0.076]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('pe', 0.51), ('ck', 0.416), ('embedding', 0.264), ('rn', 0.192), ('objects', 0.19), ('mds', 0.184), ('topics', 0.168), ('xn', 0.162), ('classes', 0.162), ('flda', 0.149), ('sne', 0.13), ('word', 0.11), ('web', 0.1), ('probabilities', 0.089), ('posterior', 0.085), ('visualize', 0.085), ('class', 0.08), ('embeddings', 0.078), ('digits', 0.073), ('classi', 0.072), ('topic', 0.072), ('latent', 0.071), ('meanings', 0.068), ('visualizing', 0.068), ('relations', 0.068), ('mixture', 0.064), ('clusters', 0.061), ('er', 0.059), ('lda', 0.059), ('unsupervised', 0.057), ('confused', 0.056), ('mistaken', 0.056), ('labeled', 0.054), ('parametric', 0.054), ('coordinates', 0.054), ('star', 0.051), ('xx', 0.051), ('object', 0.05), ('words', 0.05), ('ntt', 0.047), ('dirichlet', 0.047), ('ecting', 0.047), ('supervised', 0.045), ('lmds', 0.043), ('webbing', 0.043), ('xxx', 0.043), ('seeks', 0.042), ('visualization', 0.042), ('points', 0.042), ('nearby', 0.04), ('kl', 0.039), ('dense', 0.038), ('input', 0.038), ('pages', 0.038), ('distances', 0.038), ('isomap', 0.038), ('categories', 0.037), ('clouds', 0.037), ('sean', 0.037), ('tasa', 0.037), ('embeds', 0.037), ('tend', 0.035), ('pairwise', 0.034), ('chemistry', 0.034), ('categorized', 0.034), ('handwritten', 0.032), ('divergences', 0.032), ('cluster', 0.031), ('sports', 0.03), ('ambient', 0.03), ('posteriors', 0.03), ('associate', 0.029), ('kinds', 0.029), ('intrinsically', 0.028), ('spherical', 0.028), ('insight', 0.028), ('overlapping', 0.027), ('lastly', 0.027), ('arm', 0.027), ('probabilistic', 0.027), ('seek', 0.027), ('structure', 0.026), ('center', 0.026), ('conditional', 0.026), ('labels', 0.026), ('silva', 0.026), ('unlabeled', 0.026), ('plot', 0.024), ('allocation', 0.024), ('categorization', 0.024), ('arise', 0.024), ('increasingly', 0.023), ('tenenbaum', 0.023), ('conversely', 0.023), ('curves', 0.023), ('re', 0.023), ('apart', 0.022), ('page', 0.022), ('semantics', 0.022), ('semantic', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000006 145 nips-2004-Parametric Embedding for Class Visualization

Author: Tomoharu Iwata, Kazumi Saito, Naonori Ueda, Sean Stromsten, Thomas L. Griffiths, Joshua B. Tenenbaum

Abstract: In this paper, we propose a new method, Parametric Embedding (PE), for visualizing the posteriors estimated over a mixture model. PE simultaneously embeds both objects and their classes in a low-dimensional space. PE takes as input a set of class posterior vectors for given data points, and tries to preserve the posterior structure in an embedding space by minimizing a sum of Kullback-Leibler divergences, under the assumption that samples are generated by a Gaussian mixture with equal covariances in the embedding space. PE has many potential uses depending on the source of the input data, providing insight into the classifier’s behavior in supervised, semi-supervised and unsupervised settings. The PE algorithm has a computational advantage over conventional embedding methods based on pairwise object relations since its complexity scales with the product of the number of objects and the number of classes. We demonstrate PE by visualizing supervised categorization of web pages, semi-supervised categorization of digits, and the relations of words and latent topics found by an unsupervised algorithm, Latent Dirichlet Allocation. 1

2 0.25893986 62 nips-2004-Euclidean Embedding of Co-Occurrence Data

Author: Amir Globerson, Gal Chechik, Fernando Pereira, Naftali Tishby

Abstract: Embedding algorithms search for low dimensional structure in complex data, but most algorithms only handle objects of a single type for which pairwise distances are specified. This paper describes a method for embedding objects of different types, such as images and text, into a single common Euclidean space based on their co-occurrence statistics. The joint distributions are modeled as exponentials of Euclidean distances in the low-dimensional embedding space, which links the problem to convex optimization over positive semidefinite matrices. The local structure of our embedding corresponds to the statistical correlations via random walks in the Euclidean space. We quantify the performance of our method on two text datasets, and show that it consistently and significantly outperforms standard methods of statistical correspondence modeling, such as multidimensional scaling and correspondence analysis. 1

3 0.17130238 125 nips-2004-Multiple Relational Embedding

Author: Roland Memisevic, Geoffrey E. Hinton

Abstract: We describe a way of using multiple different types of similarity relationship to learn a low-dimensional embedding of a dataset. Our method chooses different, possibly overlapping representations of similarity by individually reweighting the dimensions of a common underlying latent space. When applied to a single similarity relation that is based on Euclidean distances between the input data points, the method reduces to simple dimensionality reduction. If additional information is available about the dataset or about subsets of it, we can use this information to clean up or otherwise improve the embedding. We demonstrate the potential usefulness of this form of semi-supervised dimensionality reduction on some simple examples. 1

4 0.14130652 87 nips-2004-Integrating Topics and Syntax

Author: Thomas L. Griffiths, Mark Steyvers, David M. Blei, Joshua B. Tenenbaum

Abstract: Statistical approaches to language learning typically focus on either short-range syntactic dependencies or long-range semantic dependencies between words. We present a generative model that uses both kinds of dependencies, and can be used to simultaneously find syntactic classes and semantic topics despite having no representation of syntax or semantics beyond statistical dependency. This model is competitive on tasks like part-of-speech tagging and document classification with models that exclusively use short- and long-range dependencies respectively. 1

5 0.12450123 134 nips-2004-Object Classification from a Single Example Utilizing Class Relevance Metrics

Author: Michael Fink

Abstract: We describe a framework for learning an object classifier from a single example. This goal is achieved by emphasizing the relevant dimensions for classification using available examples of related classes. Learning to accurately classify objects from a single training example is often unfeasible due to overfitting effects. However, if the instance representation provides that the distance between each two instances of the same class is smaller than the distance between any two instances from different classes, then a nearest neighbor classifier could achieve perfect performance with a single training example. We therefore suggest a two stage strategy. First, learn a metric over the instances that achieves the distance criterion mentioned above, from available examples of other related classes. Then, using the single examples, define a nearest neighbor classifier where distance is evaluated by the learned class relevance metric. Finding a metric that emphasizes the relevant dimensions for classification might not be possible when restricted to linear projections. We therefore make use of a kernel based metric learning algorithm. Our setting encodes object instances as sets of locality based descriptors and adopts an appropriate image kernel for the class relevance metric learning. The proposed framework for learning from a single example is demonstrated in a synthetic setting and on a character classification task. 1

6 0.10583295 186 nips-2004-The Correlated Correspondence Algorithm for Unsupervised Registration of Nonrigid Surfaces

7 0.087086141 78 nips-2004-Hierarchical Distributed Representations for Statistical Language Modeling

8 0.087027572 142 nips-2004-Outlier Detection with One-class Kernel Fisher Discriminants

9 0.084518239 163 nips-2004-Semi-parametric Exponential Family PCA

10 0.083744831 111 nips-2004-Maximal Margin Labeling for Multi-Topic Text Categorization

11 0.08034379 160 nips-2004-Seeing through water

12 0.076628 54 nips-2004-Distributed Information Regularization on Graphs

13 0.075304829 167 nips-2004-Semi-supervised Learning with Penalized Probabilistic Clustering

14 0.074620135 166 nips-2004-Semi-supervised Learning via Gaussian Processes

15 0.074099064 169 nips-2004-Sharing Clusters among Related Groups: Hierarchical Dirichlet Processes

16 0.073318876 15 nips-2004-Active Learning for Anomaly and Rare-Category Detection

17 0.071248539 127 nips-2004-Neighbourhood Components Analysis

18 0.06928958 164 nips-2004-Semi-supervised Learning by Entropy Minimization

19 0.067537211 115 nips-2004-Maximum Margin Clustering

20 0.067429975 165 nips-2004-Semi-supervised Learning on Directed Graphs


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.21), (1, 0.084), (2, -0.074), (3, -0.097), (4, 0.05), (5, 0.12), (6, -0.124), (7, 0.118), (8, 0.19), (9, 0.065), (10, 0.023), (11, -0.058), (12, -0.085), (13, -0.057), (14, -0.02), (15, 0.131), (16, 0.135), (17, -0.084), (18, -0.084), (19, 0.083), (20, -0.113), (21, -0.07), (22, 0.162), (23, -0.018), (24, -0.119), (25, -0.165), (26, -0.04), (27, -0.023), (28, -0.051), (29, -0.125), (30, 0.036), (31, 0.02), (32, 0.078), (33, -0.16), (34, 0.013), (35, 0.037), (36, -0.073), (37, -0.044), (38, 0.03), (39, -0.03), (40, 0.153), (41, 0.047), (42, 0.142), (43, 0.086), (44, 0.017), (45, 0.014), (46, -0.027), (47, -0.079), (48, 0.172), (49, -0.004)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9675492 145 nips-2004-Parametric Embedding for Class Visualization

Author: Tomoharu Iwata, Kazumi Saito, Naonori Ueda, Sean Stromsten, Thomas L. Griffiths, Joshua B. Tenenbaum

Abstract: In this paper, we propose a new method, Parametric Embedding (PE), for visualizing the posteriors estimated over a mixture model. PE simultaneously embeds both objects and their classes in a low-dimensional space. PE takes as input a set of class posterior vectors for given data points, and tries to preserve the posterior structure in an embedding space by minimizing a sum of Kullback-Leibler divergences, under the assumption that samples are generated by a Gaussian mixture with equal covariances in the embedding space. PE has many potential uses depending on the source of the input data, providing insight into the classifier’s behavior in supervised, semi-supervised and unsupervised settings. The PE algorithm has a computational advantage over conventional embedding methods based on pairwise object relations since its complexity scales with the product of the number of objects and the number of classes. We demonstrate PE by visualizing supervised categorization of web pages, semi-supervised categorization of digits, and the relations of words and latent topics found by an unsupervised algorithm, Latent Dirichlet Allocation. 1

2 0.81661856 62 nips-2004-Euclidean Embedding of Co-Occurrence Data

Author: Amir Globerson, Gal Chechik, Fernando Pereira, Naftali Tishby

Abstract: Embedding algorithms search for low dimensional structure in complex data, but most algorithms only handle objects of a single type for which pairwise distances are specified. This paper describes a method for embedding objects of different types, such as images and text, into a single common Euclidean space based on their co-occurrence statistics. The joint distributions are modeled as exponentials of Euclidean distances in the low-dimensional embedding space, which links the problem to convex optimization over positive semidefinite matrices. The local structure of our embedding corresponds to the statistical correlations via random walks in the Euclidean space. We quantify the performance of our method on two text datasets, and show that it consistently and significantly outperforms standard methods of statistical correspondence modeling, such as multidimensional scaling and correspondence analysis. 1

3 0.5901556 125 nips-2004-Multiple Relational Embedding

Author: Roland Memisevic, Geoffrey E. Hinton

Abstract: We describe a way of using multiple different types of similarity relationship to learn a low-dimensional embedding of a dataset. Our method chooses different, possibly overlapping representations of similarity by individually reweighting the dimensions of a common underlying latent space. When applied to a single similarity relation that is based on Euclidean distances between the input data points, the method reduces to simple dimensionality reduction. If additional information is available about the dataset or about subsets of it, we can use this information to clean up or otherwise improve the embedding. We demonstrate the potential usefulness of this form of semi-supervised dimensionality reduction on some simple examples. 1

4 0.57227272 87 nips-2004-Integrating Topics and Syntax

Author: Thomas L. Griffiths, Mark Steyvers, David M. Blei, Joshua B. Tenenbaum

Abstract: Statistical approaches to language learning typically focus on either short-range syntactic dependencies or long-range semantic dependencies between words. We present a generative model that uses both kinds of dependencies, and can be used to simultaneously find syntactic classes and semantic topics despite having no representation of syntax or semantics beyond statistical dependency. This model is competitive on tasks like part-of-speech tagging and document classification with models that exclusively use short- and long-range dependencies respectively. 1

5 0.43338847 127 nips-2004-Neighbourhood Components Analysis

Author: Jacob Goldberger, Geoffrey E. Hinton, Sam T. Roweis, Ruslan Salakhutdinov

Abstract: In this paper we propose a novel method for learning a Mahalanobis distance measure to be used in the KNN classification algorithm. The algorithm directly maximizes a stochastic variant of the leave-one-out KNN score on the training set. It can also learn a low-dimensional linear embedding of labeled data that can be used for data visualization and fast classification. Unlike other methods, our classification model is non-parametric, making no assumptions about the shape of the class distributions or the boundaries between them. The performance of the method is demonstrated on several data sets, both for metric learning and linear dimensionality reduction. 1

6 0.40447357 78 nips-2004-Hierarchical Distributed Representations for Statistical Language Modeling

7 0.37706482 134 nips-2004-Object Classification from a Single Example Utilizing Class Relevance Metrics

8 0.37528953 142 nips-2004-Outlier Detection with One-class Kernel Fisher Discriminants

9 0.36360887 186 nips-2004-The Correlated Correspondence Algorithm for Unsupervised Registration of Nonrigid Surfaces

10 0.36124441 111 nips-2004-Maximal Margin Labeling for Multi-Topic Text Categorization

11 0.35758761 54 nips-2004-Distributed Information Regularization on Graphs

12 0.3490158 166 nips-2004-Semi-supervised Learning via Gaussian Processes

13 0.34212047 101 nips-2004-Learning Syntactic Patterns for Automatic Hypernym Discovery

14 0.3355056 169 nips-2004-Sharing Clusters among Related Groups: Hierarchical Dirichlet Processes

15 0.32908589 100 nips-2004-Learning Preferences for Multiclass Problems

16 0.32616255 15 nips-2004-Active Learning for Anomaly and Rare-Category Detection

17 0.32318604 196 nips-2004-Triangle Fixing Algorithms for the Metric Nearness Problem

18 0.31343943 160 nips-2004-Seeing through water

19 0.30517378 163 nips-2004-Semi-parametric Exponential Family PCA

20 0.29808027 165 nips-2004-Semi-supervised Learning on Directed Graphs


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(9, 0.041), (13, 0.084), (15, 0.097), (21, 0.01), (26, 0.04), (31, 0.034), (33, 0.179), (35, 0.039), (39, 0.035), (50, 0.039), (54, 0.016), (56, 0.068), (87, 0.039), (94, 0.191)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.91079462 152 nips-2004-Real-Time Pitch Determination of One or More Voices by Nonnegative Matrix Factorization

Author: Fei Sha, Lawrence K. Saul

Abstract: An auditory “scene”, composed of overlapping acoustic sources, can be viewed as a complex object whose constituent parts are the individual sources. Pitch is known to be an important cue for auditory scene analysis. In this paper, with the goal of building agents that operate in human environments, we describe a real-time system to identify the presence of one or more voices and compute their pitch. The signal processing in the front end is based on instantaneous frequency estimation, a method for tracking the partials of voiced speech, while the pattern-matching in the back end is based on nonnegative matrix factorization, an unsupervised algorithm for learning the parts of complex objects. While supporting a framework to analyze complicated auditory scenes, our system maintains real-time operability and state-of-the-art performance in clean speech.

same-paper 2 0.87214702 145 nips-2004-Parametric Embedding for Class Visualization

Author: Tomoharu Iwata, Kazumi Saito, Naonori Ueda, Sean Stromsten, Thomas L. Griffiths, Joshua B. Tenenbaum

Abstract: In this paper, we propose a new method, Parametric Embedding (PE), for visualizing the posteriors estimated over a mixture model. PE simultaneously embeds both objects and their classes in a low-dimensional space. PE takes as input a set of class posterior vectors for given data points, and tries to preserve the posterior structure in an embedding space by minimizing a sum of Kullback-Leibler divergences, under the assumption that samples are generated by a Gaussian mixture with equal covariances in the embedding space. PE has many potential uses depending on the source of the input data, providing insight into the classifier’s behavior in supervised, semi-supervised and unsupervised settings. The PE algorithm has a computational advantage over conventional embedding methods based on pairwise object relations since its complexity scales with the product of the number of objects and the number of classes. We demonstrate PE by visualizing supervised categorization of web pages, semi-supervised categorization of digits, and the relations of words and latent topics found by an unsupervised algorithm, Latent Dirichlet Allocation. 1

3 0.84376854 96 nips-2004-Learning, Regularization and Ill-Posed Inverse Problems

Author: Lorenzo Rosasco, Andrea Caponnetto, Ernesto D. Vito, Francesca Odone, Umberto D. Giovannini

Abstract: Many works have shown that strong connections relate learning from examples to regularization techniques for ill-posed inverse problems. Nevertheless by now there was no formal evidence neither that learning from examples could be seen as an inverse problem nor that theoretical results in learning theory could be independently derived using tools from regularization theory. In this paper we provide a positive answer to both questions. Indeed, considering the square loss, we translate the learning problem in the language of regularization theory and show that consistency results and optimal regularization parameter choice can be derived by the discretization of the corresponding inverse problem. 1

4 0.82348859 19 nips-2004-An Application of Boosting to Graph Classification

Author: Taku Kudo, Eisaku Maeda, Yuji Matsumoto

Abstract: This paper presents an application of Boosting for classifying labeled graphs, general structures for modeling a number of real-world data, such as chemical compounds, natural language texts, and bio sequences. The proposal consists of i) decision stumps that use subgraph as features, and ii) a Boosting algorithm in which subgraph-based decision stumps are used as weak learners. We also discuss the relation between our algorithm and SVMs with convolution kernels. Two experiments using natural language data and chemical compounds show that our method achieves comparable or even better performance than SVMs with convolution kernels as well as improves the testing efficiency. 1

5 0.76949865 62 nips-2004-Euclidean Embedding of Co-Occurrence Data

Author: Amir Globerson, Gal Chechik, Fernando Pereira, Naftali Tishby

Abstract: Embedding algorithms search for low dimensional structure in complex data, but most algorithms only handle objects of a single type for which pairwise distances are specified. This paper describes a method for embedding objects of different types, such as images and text, into a single common Euclidean space based on their co-occurrence statistics. The joint distributions are modeled as exponentials of Euclidean distances in the low-dimensional embedding space, which links the problem to convex optimization over positive semidefinite matrices. The local structure of our embedding corresponds to the statistical correlations via random walks in the Euclidean space. We quantify the performance of our method on two text datasets, and show that it consistently and significantly outperforms standard methods of statistical correspondence modeling, such as multidimensional scaling and correspondence analysis. 1

6 0.76672709 46 nips-2004-Constraining a Bayesian Model of Human Visual Speed Perception

7 0.76132762 89 nips-2004-Joint MRI Bias Removal Using Entropy Minimization Across Images

8 0.74274182 25 nips-2004-Assignment of Multiplicative Mixtures in Natural Images

9 0.74145985 31 nips-2004-Blind One-microphone Speech Separation: A Spectral Learning Approach

10 0.73751891 54 nips-2004-Distributed Information Regularization on Graphs

11 0.73325145 125 nips-2004-Multiple Relational Embedding

12 0.73171681 32 nips-2004-Boosting on Manifolds: Adaptive Regularization of Base Classifiers

13 0.72995597 163 nips-2004-Semi-parametric Exponential Family PCA

14 0.72800207 100 nips-2004-Learning Preferences for Multiclass Problems

15 0.7264573 87 nips-2004-Integrating Topics and Syntax

16 0.72559136 204 nips-2004-Variational Minimax Estimation of Discrete Distributions under KL Loss

17 0.72507232 131 nips-2004-Non-Local Manifold Tangent Learning

18 0.72441 174 nips-2004-Spike Sorting: Bayesian Clustering of Non-Stationary Data

19 0.72432125 102 nips-2004-Learning first-order Markov models for control

20 0.72346598 95 nips-2004-Large-Scale Prediction of Disulphide Bond Connectivity