nips nips2002 nips2002-34 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Nicholas P. Hughes, David Lowe
Abstract: We consider the problem of illusory or artefactual structure from the visualisation of high-dimensional structureless data. In particular we examine the role of the distance metric in the use of topographic mappings based on the statistical field of multidimensional scaling. We show that the use of a squared Euclidean metric (i.e. the SS TRESS measure) gives rise to an annular structure when the input data is drawn from a high-dimensional isotropic distribution, and we provide a theoretical justification for this observation.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We consider the problem of illusory or artefactual structure from the visualisation of high-dimensional structureless data. [sent-8, score-0.727]
2 In particular we examine the role of the distance metric in the use of topographic mappings based on the statistical field of multidimensional scaling. [sent-9, score-0.479]
3 We show that the use of a squared Euclidean metric (i.e. [sent-10, score-0.214]
4 the SS TRESS measure) gives rise to an annular structure when the input data is drawn from a high-dimensional isotropic distribution, and we provide a theoretical justification for this observation. [sent-12, score-0.426]
5 One approach to the aforementioned problem then is to find a faithful1 representation of the data in a lower dimensional space. [sent-15, score-0.059]
6 Typically this space is chosen to be two- or three-dimensional, thus facilitating the visualisation and exploratory analysis of the intrinsic low-dimensional structure in the data (which would otherwise be masked by the dimensionality of the data space). [sent-16, score-0.436]
7 In this context then, an effective dimensionality reduction algorithm should seek to extract the underlying relationships in the data with minimum loss of information. [sent-17, score-0.107]
8 Conversely, any interesting patterns which are present in the visualisation space should be representative of similar patterns in the original data space, and not artefacts of the dimensionality reduction process. [sent-18, score-0.41]
9 1 By “faithful” we mean that the underlying geometric structure in the data space, which characterises the informative relationships in the data, is preserved in the visualisation space. [sent-19, score-0.391]
10 Although much effort has been focused on the former problem of optimal structure elucidation (see [7, 10] for recent approaches to dimensionality reduction), comparatively little work has been undertaken on the latter (and equally important) problem of artefactual structure. [sent-20, score-0.354]
11 This shortcoming was recently highlighted in a controversial example of the application of visualisation techniques to neuroanatomical connectivity data derived from the primate visual cortex [12, 9, 13, 3]. [sent-21, score-0.376]
12 In this paper we attempt to redress the balance by considering the visualisation of high-dimensional structureless data through the use of topographic mappings based on the statistical field of multidimensional scaling (MDS). [sent-22, score-0.839]
13 This is an important class of mappings which have recently been brought into the neural network domain [5], and have significant connections to modern kernel-based algorithms such as kernel PCA [11]. [sent-23, score-0.097]
14 The organisation of the remainder of this paper is as follows: In section 2 we introduce the technique of multidimensional scaling and relate this to the field of topographic mappings. [sent-24, score-0.296]
15 In section 3 we show how under certain conditions such mappings can give rise to artefactual structure. [sent-25, score-0.399]
16 A theoretical analysis of this effect is then presented in section 4. [sent-26, score-0.026]
17 2 Multidimensional Scaling and Topographic Mappings The visualisation of experimental data which is characterised by pairwise proximity values is a common problem in areas such as psychology, molecular biology and linguistics. [sent-27, score-0.408]
18 Multidimensional scaling (MDS) is a statistical technique which can be used to construct a spatial configuration of points in a (typically) two- or three-dimensional space given a matrix of pairwise proximity values between objects. [sent-28, score-0.263]
19 The proximity matrix provides a measure of the similarity or dissimilarity between the objects, and the geometric layout of the resulting MDS configuration reflects the relationships between the objects as defined by this matrix. [sent-29, score-0.22]
20 In this way the information contained within the proximity matrix can be captured by a more succinct spatial model which aids visualisation of the data and improves understanding of the processes that generated it. [sent-30, score-0.406]
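To make this construction concrete, a minimal sketch of classical (Torgerson) scaling is given below: it recovers a low-dimensional point configuration from a matrix of pairwise dissimilarities via double-centring and an eigendecomposition. This is offered only as an illustration of the MDS pipeline, not the least squares scaling studied in this paper, and the function and variable names are ours rather than anything from the source.

```python
import numpy as np

def classical_mds(D, n_components=2):
    """Classical (Torgerson) scaling: embed objects in n_components
    dimensions from a matrix D of pairwise dissimilarities."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centred squared dissimilarities
    eigvals, eigvecs = np.linalg.eigh(B)
    idx = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[idx], 0.0, None))
    return eigvecs[:, idx] * scale               # n x n_components configuration

# Toy usage: recover a 2-D layout from the pairwise distances of 3-D points.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
Y = classical_mds(D, n_components=2)
print(Y.shape)        # (50, 2)
```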
21 In many situations, the raw dissimilarities will not be representative of actual inter-point distances between the objects, and thus will not be suitable for embedding in a low-dimensional space. [sent-31, score-0.256]
22 The aim of metric MDS then is that the transformed dissimilarities (the disparities) should correspond as closely as possible to the inter-point distances in the resulting configuration2. [sent-33, score-0.243]
23 Metric MDS can be formulated as a continuous optimisation problem through the definition of an appropriate error function. [sent-34, score-0.096]
24 In particular, least squares scaling algorithms directly seek to minimise the sum-of-squares error between the disparities and the inter-point distances. [sent-35, score-0.367]
25 S TRESS $= \frac{1}{C}\sum_{i<j} w_{ij}\,(\hat d_{ij} - d_{ij})^2$ (1)
2 This is in contrast to nonmetric MDS which requires that only the ordering of the disparities corresponds to the ordering of the inter-point distances (and thus that the disparities are some arbitrary monotonically increasing function of the distances). [sent-37, score-0.412]
26 where $C$ is a normalising constant which reduces the sensitivity of the measure to the number of points and the scaling of the disparities, and the $w_{ij}$ are the weighting factors. [sent-40, score-0.243]
27 It is straightforward to differentiate this S TRESS measure with respect to the configuration points and minimise the error through the use of standard nonlinear optimisation techniques. [sent-41, score-0.261]
28 An alternative and commonly used error function, which is referred to as SS TRESS, is given by: [sent-42, score-0.049]
29 SS TRESS $= \sum_{i<j} \left(\hat d_{ij}^{\,2} - d_{ij}^{\,2}\right)^2$ (2), which represents the sum-of-squares error between squared disparities and squared distances. [sent-43, score-0.37]
30 The primary advantage of the SS TRESS measure is that it can be efficiently minimised through the use of an alternating least squares procedure4 [1]. [sent-44, score-0.144]
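To make the two objectives concrete, the sketch below evaluates unnormalised, unit-weight S TRESS and SS TRESS for a candidate configuration in NumPy. The dropped normaliser and weights, and all function and variable names, are our own simplifications rather than the exact forms used in the paper.

```python
import numpy as np

def pairwise_dist(Z):
    """Matrix of Euclidean inter-point distances for the rows of Z."""
    return np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)

def stress(Y, disparities, weights=None):
    """Unnormalised least squares scaling S TRESS: weighted squared error
    between disparities and map distances, summed over pairs i < j."""
    d = pairwise_dist(Y)
    iu = np.triu_indices_from(d, k=1)
    w = np.ones_like(d) if weights is None else weights
    return np.sum(w[iu] * (disparities[iu] - d[iu]) ** 2)

def sstress(Y, disparities):
    """SS TRESS: squared error between squared disparities and squared
    map distances, summed over pairs i < j (cf. equation (2))."""
    d2 = pairwise_dist(Y) ** 2
    iu = np.triu_indices_from(d2, k=1)
    return np.sum((disparities[iu] ** 2 - d2[iu]) ** 2)

# Example: disparities taken to be the raw distances of some 30-D data.
rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 30))        # structureless high-dimensional data
Y = 0.1 * rng.normal(size=(100, 2))    # an initial 2-D configuration
delta = pairwise_dist(X)
print(stress(Y, delta), sstress(Y, delta))
```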
31 Closely related to the field of Metric MDS is Sammon’s mapping [8], which takes as its input a set of high-dimensional vectors and seeks to produce a set of lower dimensional vectors such that the following error measure is minimised: $E = \frac{1}{\sum_{i<j} d^{*}_{ij}} \sum_{i<j} \frac{(d^{*}_{ij} - d_{ij})^2}{d^{*}_{ij}}$ [sent-45, score-0.164]
32 where the $d^{*}_{ij} = \lVert \mathbf{x}_i - \mathbf{x}_j \rVert$ are the inter-point Euclidean distances in the data space, and the $d_{ij} = \lVert \mathbf{y}_i - \mathbf{y}_j \rVert$ are the corresponding inter-point Euclidean distances in the feature or map space. [sent-46, score-0.344]
33 Ignoring the normalising constant, Sammon’s mapping is thus equivalent to least squares metric MDS with the disparities taken to be the raw inter-point distances in the data space and the weighting factors given by $w_{ij} = 1/d^{*}_{ij}$. [sent-48, score-0.59]
34 Lowe (1993) termed such a mapping based on the minimisation of an error measure of this form a topographic mapping, since this constraint “optimally preserves the geometric structure in the data” [5]. [sent-49, score-0.349]
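As a rough illustration of fitting such a topographic mapping, the sketch below minimises the Sammon error over a 2-D configuration with a general-purpose optimiser. The data set size, initialisation and the choice of SciPy's L-BFGS-B routine with numerical gradients are our own assumptions made for the sake of a short, runnable example, not the optimisation scheme used in the paper.

```python
import numpy as np
from scipy.optimize import minimize

def sammon_error(y_flat, Dstar, n, q=2, eps=1e-12):
    """Sammon stress: (1 / sum d*) * sum (d* - d)^2 / d* over pairs i < j."""
    Y = y_flat.reshape(n, q)
    d = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    iu = np.triu_indices(n, k=1)
    dstar, dmap = Dstar[iu], d[iu]
    return np.sum((dstar - dmap) ** 2 / (dstar + eps)) / np.sum(dstar)

rng = np.random.default_rng(0)
X = rng.uniform(size=(60, 10))                               # 10-D structureless data
Dstar = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
n = X.shape[0]
y0 = 1e-2 * rng.normal(size=(n, 2)).ravel()                  # small random initial map
res = minimize(sammon_error, y0, args=(Dstar, n), method="L-BFGS-B")
Y = res.x.reshape(n, 2)
print("final Sammon error:", res.fun)
```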
35 Interestingly the choice of the S TRESS or SS TRESS measure in MDS has a more natural interpretation when viewed within the framework of Sammon’s mapping. [sent-51, score-0.047]
36 In particular, S TRESS corresponds to the use of the standard Euclidean distance metric whereas SS TRESS corresponds to the use of the squared Euclidean distance metric. [sent-52, score-0.274]
37 In the next section we show that this choice of metric can lead to markedly different results when the input data is sampled from a high-dimensional isotropic distribution. [sent-53, score-0.308]
38 Such data can be generated by sampling from an isotropic distribution (such as a spherical Gaussian), which is characterised by a covariance matrix that is proportional to the identity matrix, and a skewness of zero. [sent-55, score-0.317]
39 We created four structureless data sets by randomly sampling 1000 i.i.d. [sent-56, score-0.155]
40 points from unit hypercubes of dimensions 5, 10, 30 and 100. [sent-59, score-0.14]
41 For each data set, we generated a pair … 4 The SS TRESS measure now forms the basis of the ALSCAL implementation of MDS, which is included as part of the SPSS software package for statistical data analysis. [sent-60, score-0.109]
42 Figure 1: Final map configurations produced by S TRESS mappings of data uniformly randomly distributed in unit hypercubes of dimension 5, 10, 30 and 100. [sent-88, score-0.327]
43 of 2-D configurations by minimising5 the S TRESS and SS TRESS error measures given above, respectively. [sent-89, score-0.049]
44 The process was repeated fifty times (for each individual error function and data set) using different initial configurations of the map points, and the configuration with the lowest final error was retained. [sent-90, score-0.246]
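A scaled-down sketch of this protocol is given below: it draws structureless data from unit hypercubes of increasing dimension, minimises unnormalised S TRESS and SS TRESS objectives with SciPy's conjugate gradients routine from several random initial configurations, and keeps the lowest-error run. The point count, number of restarts, unit weights and omitted normaliser are our simplifications of the experiment described above.

```python
import numpy as np
from scipy.optimize import minimize

def pairwise_dist(Z):
    """Matrix of Euclidean inter-point distances for the rows of Z."""
    return np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)

def make_objective(delta, n, q, squared):
    """Return f(y) -> (error, gradient) for unit-weight, unnormalised
    S TRESS (squared=False) or SS TRESS (squared=True)."""
    def objective(y_flat):
        Y = y_flat.reshape(n, q)
        d = pairwise_dist(Y)
        if squared:
            M = delta ** 2 - d ** 2
            err = 0.5 * np.sum(M ** 2)                      # sum over i < j
            grad = -4.0 * (M.sum(axis=1, keepdims=True) * Y - M @ Y)
        else:
            R = delta - d
            with np.errstate(divide="ignore", invalid="ignore"):
                W = np.where(d > 0, R / d, 0.0)
            err = 0.5 * np.sum(R ** 2)                      # sum over i < j
            grad = -2.0 * (W.sum(axis=1, keepdims=True) * Y - W @ Y)
        return err, grad.ravel()
    return objective

rng = np.random.default_rng(0)
n, q, restarts = 200, 2, 5        # scaled down from 1000 points and 50 restarts
for dim in (5, 10, 30, 100):
    X = rng.uniform(size=(n, dim))                 # structureless unit-hypercube data
    delta = pairwise_dist(X)
    for name, squared in (("S TRESS", False), ("SS TRESS", True)):
        obj = make_objective(delta, n, q, squared)
        runs = [minimize(obj, 0.1 * rng.normal(size=n * q), jac=True, method="CG")
                for _ in range(restarts)]
        best = min(runs, key=lambda r: r.fun)
        Y = best.x.reshape(n, q)
        print(f"dim={dim:3d}  {name:8s}: best error {best.fun:.3g}, "
              f"map variance {Y.var(axis=0).sum():.3f}")
```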
45 As previously noted, the choice of the S TRESS or SS TRESS error measure is best viewed as a choice of distance metric, where S TRESS corresponds to the standard Euclidean metric and SS TRESS corresponds to the squared Euclidean metric. [sent-93, score-0.34]
46 It is clear that each configuration has captured the isotropic nature of the associated data set, and there are no spurious patterns or clusters evident in the final visualisation plots. [sent-95, score-0.52]
47 Figure 2: Final map configurations produced by SS TRESS mappings of data uniformly randomly distributed in unit hypercubes of dimension 5, 10, 30 and 100. [sent-131, score-0.327]
48 The configurations exhibit significant artefactual structure, which is characterised by a tendency for the map points to cluster in a circular fashion. [sent-133, score-0.648]
49 Furthermore, the degree of clustering increases with increasing dimensionality of the data space (and is clearly evident for dimensions as low as 10). [sent-134, score-0.141]
50 Although the tendency for SS TRESS configurations to cluster in a circular fashion has been noted in the MDS literature [2], the connection between artefactual structure and the choice of distance metric has not been made. [sent-135, score-0.65]
51 Indeed, in the next section we show analytically that the use of the squared Euclidean metric leads to a globally optimal solution corresponding to an annular structure. [sent-136, score-0.288]
52 To date, the most significant work on this problem is that of Klock and Buhmann [4], who proposed a novel transformation of the dissimilarities (i.e. [sent-137, score-0.112]
53 the squared inter-point distances 5 We used a conjugate gradients optimisation algorithm. [sent-139, score-0.227]
54 in the data space) such that “the final disparities are more suitable for Euclidean embedding”. [sent-140, score-0.211]
55 However this transformation assumes that the input data are drawn from a spherical Gaussian distribution6, which is inappropriate for most real-world data sets of interest. [sent-141, score-0.153]
56 4 Theoretical Analysis of Artefactual Structure In this section we present a theoretical analysis of the artefactual structure problem. [sent-142, score-0.343]
57 A $q$-dimensional map configuration is considered to be the result of a SS TRESS mapping of a data set of i.i.d. [sent-143, score-0.216]
58 points drawn from a $p$-dimensional isotropic distribution (where $q \ll p$). [sent-146, score-0.285]
59 The set of data points is given by the $N \times p$ matrix $\mathbf{X}$ and similarly the set of map points is given by the $N \times q$ matrix $\mathbf{Y}$. [sent-147, score-0.374]
60 In this case the squared inter-point distances will follow a … [sent-167, score-0.18]
61 Thus at a stationary point of the error (i.e. where the gradient with respect to every map vector vanishes), the following holds. [sent-169, score-0.1]
62 To see this, we define the derivative of the SS TRESS error measure with respect to a particular map vector $\mathbf{y}_k$, $\partial E/\partial \mathbf{y}_k = 4\sum_{j}\left(\lVert\mathbf{y}_k-\mathbf{y}_j\rVert^2-\lVert\mathbf{x}_k-\mathbf{x}_j\rVert^2\right)(\mathbf{y}_k-\mathbf{y}_j)$, and expand equation (4) accordingly. [sent-173, score-0.213]
63 Since the error is a function of the inter-point distances only, we can centre both the data points and the map points on the origin without loss of generality. [sent-177, score-0.457]
64 For large $N$ we have: … [sent-178, score-0.092]
66 where $\mathbf{0}$ is the zero matrix, $\mathbf{C}_{yy}$ is the covariance matrix of the map vectors, $\mathbf{C}_{yx}$ is the covariance matrix of the map vectors and the data vectors, and tr is the matrix trace operator. [sent-187, score-0.453]
67 This represents a general expression for the value of the map vector at a stationary point of the SS TRESS error, regardless of the nature of the input data distribution. [sent-191, score-0.337]
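The derivative of the SS TRESS error with respect to a map vector is easy to verify numerically. The sketch below checks the analytic gradient (written in the notation used above, with unit weights, which is our simplification) against central finite differences; all names and sizes are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 30, 20, 2
X = rng.normal(size=(n, p))
Y = rng.normal(size=(n, q))
X -= X.mean(axis=0)                 # centre data and map points on the origin
Y -= Y.mean(axis=0)
DX2 = np.linalg.norm(X[:, None] - X[None, :], axis=-1) ** 2   # squared data distances

def sstress(Y):
    DY2 = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1) ** 2
    return 0.5 * np.sum((DX2 - DY2) ** 2)        # sum over i < j

def sstress_grad(Y):
    DY2 = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1) ** 2
    M = DX2 - DY2
    # dE/dy_k = -4 * sum_j (||x_k - x_j||^2 - ||y_k - y_j||^2) (y_k - y_j)
    return -4.0 * (M.sum(axis=1, keepdims=True) * Y - M @ Y)

# Central finite-difference check of the analytic gradient.
eps = 1e-5
G = sstress_grad(Y)
G_fd = np.zeros_like(Y)
for k in range(n):
    for m in range(q):
        Yp, Ym = Y.copy(), Y.copy()
        Yp[k, m] += eps
        Ym[k, m] -= eps
        G_fd[k, m] = (sstress(Yp) - sstress(Ym)) / (2 * eps)
print("max relative error:", np.abs(G - G_fd).max() / np.abs(G).max())
```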
68 However we are interested in the case where the input data is drawn from a high-dimensional isotropic distribution. [sent-192, score-0.207]
69 If the data space is isotropic then a stationary point of the error will correspond to a similarly isotropic map space7. [sent-193, score-0.561]
70 Thus, at a stationary point, we have for large $N$ that $\mathbf{C}_{yy} \approx \sigma_y^2 \mathbf{I}_q$ and $\mathbf{C}_{xx} \approx \sigma_x^2 \mathbf{I}_p$, [sent-194, score-0.051]
71 where $\mathbf{I}_q$ and $\mathbf{I}_p$ are identity matrices and $\sigma_y^2$ and $\sigma_x^2$ are the per-dimension variances in the map space and the data space respectively. [sent-195, score-0.054]
72 Accordingly, tr $\mathbf{C}_{yy}$ and tr $\mathbf{C}_{xx}$ are the total variances in the map space and the data space. Finally, consider the expression: … [sent-196, score-0.355]
73 The first term is the third order moment, which is zero for an isotropic distribution [6]. [sent-199, score-0.145]
74 This leads to equation (7). 7 This is true regardless of the initial distribution of the map points, although a highly non-uniform initial configuration would take significantly longer to reach a local minimum of the error function. [sent-206, score-0.189]
75 Thus, for large $p$, the variance of the map points is related to the variance of the data points by a fixed factor, given by equation (8). [sent-208, score-0.31]
76 Table 1 shows the values of the observed and predicted map variances for 1000 data points sampled randomly from uniform distributions in the unit interval (i.e. from unit hypercubes, as in section 3). [sent-209, score-0.283]
77 Clearly as the dimension of the data space increases, so too does the accuracy of the approximation given by equation (7), and therefore the accuracy of equation (8). [sent-212, score-0.077]
78 Number of points: 1000, 1000, 1000, 1000; Dimension: 5, 10, 30, 100; predicted map variance: …; observed map variance: …. [sent-217, score-0.104]
79 Table 1: A comparison of the predicted and observed map variances. [sent-225, score-0.14]
80 We can show that this mismatch in variances in the two spaces results in the map points clustering in a circular fashion by considering the expected squared distance of the map points from the origin (i.e. the squared radius of the annulus): … [sent-226, score-0.652]
81 where the expectation separates, since the individual map coordinates will be uncorrelated due to the isotropic nature of the map space. [sent-231, score-0.168]
82 In general for a $q$-dimensional map space we have that $\mathbb{E}\left[\lVert\mathbf{y}\rVert^2\right] = q\,\sigma_y^2$. [sent-232, score-0.159]
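The clustering of the map points in a circular fashion can be probed directly on final configurations: if the optimum is an annulus, the distances of the centred map points from the origin concentrate around a single radius. The sketch below is a minimal diagnostic of this kind; the configuration names in the commented usage are hypothetical placeholders for maps produced elsewhere, and the synthetic blob and ring are included only so the snippet runs on its own.

```python
import numpy as np

def radial_profile(Y):
    """Mean, standard deviation and relative spread of the radii of a
    centred configuration; a small relative spread indicates a ring."""
    Yc = Y - Y.mean(axis=0)
    r = np.linalg.norm(Yc, axis=1)
    return r.mean(), r.std(), r.std() / r.mean()

# Hypothetical usage on configurations computed earlier, e.g.
#   print(radial_profile(Y_stress)); print(radial_profile(Y_sstress))

# Self-contained demonstration on synthetic configurations:
rng = np.random.default_rng(2)
blob = rng.normal(size=(1000, 2))                             # isotropic blob
angles = rng.uniform(0.0, 2.0 * np.pi, size=1000)
ring = np.column_stack([np.cos(angles), np.sin(angles)]) * 2.0
ring += 0.1 * rng.normal(size=(1000, 2))                      # noisy annulus, radius ~2
for name, Y in (("blob", blob), ("annulus", ring)):
    mean_r, std_r, rel = radial_profile(Y)
    print(f"{name:8s}: mean radius {mean_r:.2f}, std {std_r:.2f}, relative spread {rel:.2f}")
```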
83 Hence the optimal configuration will be an annulus or ring shape, as observed. 5 Conclusions We have investigated the problem of artefactual or illusory structure from topographic mappings based upon least squares scaling algorithms from multidimensional scaling. [sent-235, score-0.846]
84 In particular we have shown that the use of a squared Euclidean distance metric (i.e. [sent-236, score-0.244]
85 the SS TRESS measure) gives rise to an annular structure when the input data is drawn from a high-dimensional isotropic distribution. [sent-238, score-0.4]
86 A theoretical analysis of this problem was presented and a simple relationship between the variance of the map and the data points was derived. [sent-239, score-0.255]
87 Finally we showed that this relationship results in an optimal configuration which is characterised by the map points clustering in a circular fashion. [sent-240, score-0.367]
88 An evaluation of the use of multidimensional scaling for understanding brain connectivity. [sent-262, score-0.206]
89 Neuroscale: Novel topographic feature extraction with radial basis function networks. [sent-280, score-0.09]
90 On a connection between kernel PCA and metric multidimensional scaling. [sent-340, score-0.289]
91 Objective analysis of the topological organization of the primate cortical visual system. [sent-351, score-0.049]
92 Non-metric multidimensional scaling in the analysis of neuroanatomical connection data and the organization of the primate cortical visual system. [sent-364, score-0.362]
wordName wordTfidf (topN-words)
[('tress', 0.624), ('artefactual', 0.272), ('ss', 0.262), ('visualisation', 0.247), ('mds', 0.173), ('disparities', 0.157), ('isotropic', 0.145), ('guration', 0.136), ('metric', 0.132), ('multidimensional', 0.13), ('structureless', 0.124), ('map', 0.117), ('gurations', 0.103), ('euclidean', 0.102), ('distances', 0.098), ('mappings', 0.097), ('tr', 0.092), ('con', 0.091), ('topographic', 0.09), ('squared', 0.082), ('dissimilarities', 0.082), ('points', 0.081), ('characterised', 0.079), ('scaling', 0.076), ('annular', 0.074), ('sammon', 0.074), ('circular', 0.066), ('lowe', 0.065), ('hypercubes', 0.059), ('proximity', 0.051), ('stationary', 0.051), ('error', 0.049), ('annulus', 0.049), ('klock', 0.049), ('minimised', 0.049), ('neuroanatomical', 0.049), ('simmen', 0.049), ('primate', 0.049), ('squares', 0.048), ('optimisation', 0.047), ('measure', 0.047), ('structure', 0.045), ('highdimensional', 0.044), ('goodhill', 0.043), ('mapping', 0.04), ('illusory', 0.039), ('normalising', 0.039), ('relationships', 0.039), ('dimensionality', 0.037), ('philosophical', 0.037), ('minimise', 0.037), ('pq', 0.033), ('tendency', 0.033), ('matrix', 0.032), ('variances', 0.031), ('embedding', 0.031), ('drawn', 0.031), ('editors', 0.031), ('data', 0.031), ('transformation', 0.03), ('spherical', 0.03), ('rise', 0.03), ('distance', 0.03), ('geometric', 0.029), ('transformed', 0.029), ('dimensional', 0.028), ('connection', 0.027), ('termed', 0.027), ('theoretical', 0.026), ('evident', 0.026), ('gi', 0.026), ('patterns', 0.025), ('discovery', 0.025), ('final', 0.024), ('clustering', 0.024), ('eld', 0.024), ('space', 0.023), ('nature', 0.023), ('predicted', 0.023), ('regardless', 0.023), ('suitable', 0.023), ('captured', 0.023), ('fashion', 0.023), ('cant', 0.023), ('dimension', 0.023), ('objects', 0.022), ('ma', 0.022), ('noted', 0.022), ('transactions', 0.022), ('manifold', 0.022), ('raw', 0.022), ('aston', 0.022), ('birmingham', 0.022), ('aids', 0.022), ('artefacts', 0.022), ('facilitating', 0.022), ('kent', 0.022), ('mardia', 0.022), ('minimisation', 0.022), ('multichannel', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 34 nips-2002-Artefactual Structure from Least-Squares Multidimensional Scaling
Author: Nicholas P. Hughes, David Lowe
Abstract: We consider the problem of illusory or artefactual structure from the visualisation of high-dimensional structureless data. In particular we examine the role of the distance metric in the use of topographic mappings based on the statistical field of multidimensional scaling. We show that the use of a squared Euclidean metric (i.e. the SS TRESS measure) gives rise to an annular structure when the input data is drawn from a highdimensional isotropic distribution, and we provide a theoretical justification for this observation.
2 0.12440532 97 nips-2002-Global Versus Local Methods in Nonlinear Dimensionality Reduction
Author: Vin D. Silva, Joshua B. Tenenbaum
Abstract: Recently proposed algorithms for nonlinear dimensionality reduction fall broadly into two categories which have different advantages and disadvantages: global (Isomap [1]), and local (Locally Linear Embedding [2], Laplacian Eigenmaps [3]). We present two variants of Isomap which combine the advantages of the global approach with what have previously been exclusive advantages of local methods: computational sparsity and the ability to invert conformal maps.
3 0.11316101 70 nips-2002-Distance Metric Learning with Application to Clustering with Side-Information
Author: Eric P. Xing, Michael I. Jordan, Stuart Russell, Andrew Y. Ng
Abstract: Many algorithms rely critically on being given a good metric over their inputs. For instance, data can often be clustered in many “plausible” ways, and if a clustering algorithm such as K-means initially fails to find one that is meaningful to a user, the only recourse may be for the user to manually tweak the metric until sufficiently good clusters are found. For these and other applications requiring good metrics, it is desirable that we provide a more systematic way for users to indicate what they consider “similar.” For instance, we may ask them to provide examples. In this paper, we present an algorithm that, given examples of similar (and, , learns a distance metric over if desired, dissimilar) pairs of points in that respects these relationships. Our method is based on posing metric learning as a convex optimization problem, which allows us to give efficient, local-optima-free algorithms. We also demonstrate empirically that the learned metrics can be used to significantly improve clustering performance. £ ¤¢ £ ¥¢
4 0.09497112 80 nips-2002-Exact MAP Estimates by (Hyper)tree Agreement
Author: Martin J. Wainwright, Tommi S. Jaakkola, Alan S. Willsky
Abstract: We describe a method for computing provably exact maximum a posteriori (MAP) estimates for a subclass of problems on graphs with cycles. The basic idea is to represent the original problem on the graph with cycles as a convex combination of tree-structured problems. A convexity argument then guarantees that the optimal value of the original problem (i.e., the log probability of the MAP assignment) is upper bounded by the combined optimal values of the tree problems. We prove that this upper bound is met with equality if and only if the tree problems share an optimal configuration in common. An important implication is that any such shared configuration must also be the MAP configuration for the original problem. Next we develop a tree-reweighted max-product algorithm for attempting to find convex combinations of tree-structured problems that share a common optimum. We give necessary and sufficient conditions for a fixed point to yield the exact MAP estimate. An attractive feature of our analysis is that it generalizes naturally to convex combinations of hypertree-structured distributions.
5 0.088588931 98 nips-2002-Going Metric: Denoising Pairwise Data
Author: Volker Roth, Julian Laub, Klaus-Robert Müller, Joachim M. Buhmann
Abstract: Pairwise data in empirical sciences typically violate metricity, either due to noise or due to fallible estimates, and therefore are hard to analyze by conventional machine learning technology. In this paper we therefore study ways to work around this problem. First, we present an alternative embedding to multi-dimensional scaling (MDS) that allows us to apply a variety of classical machine learning and signal processing algorithms. The class of pairwise grouping algorithms which share the shift-invariance property is statistically invariant under this embedding procedure, leading to identical assignments of objects to clusters. Based on this new vectorial representation, denoising methods are applied in a second step. Both steps provide a theoretically well controlled setup to translate from pairwise data to the respective denoised metric representation. We demonstrate the practical usefulness of our theoretical reasoning by discovering structure in protein sequence data bases, visibly improving performance upon existing automatic methods. 1
6 0.066071965 24 nips-2002-Adaptive Scaling for Feature Selection in SVMs
7 0.064711034 49 nips-2002-Charting a Manifold
8 0.056931816 190 nips-2002-Stochastic Neighbor Embedding
9 0.05678523 27 nips-2002-An Impossibility Theorem for Clustering
10 0.054890849 124 nips-2002-Learning Graphical Models with Mercer Kernels
11 0.054586183 131 nips-2002-Learning to Classify Galaxy Shapes Using the EM Algorithm
12 0.048084334 158 nips-2002-One-Class LP Classifiers for Dissimilarity Representations
13 0.047003329 115 nips-2002-Informed Projections
14 0.045107935 113 nips-2002-Information Diffusion Kernels
15 0.04475205 36 nips-2002-Automatic Alignment of Local Representations
16 0.044547107 138 nips-2002-Manifold Parzen Windows
17 0.042559374 66 nips-2002-Developing Topography and Ocular Dominance Using Two aVLSI Vision Sensors and a Neurotrophic Model of Plasticity
18 0.041251533 88 nips-2002-Feature Selection and Classification on Matrix Data: From Large Margins to Small Covering Numbers
19 0.04120332 203 nips-2002-Using Tarjan's Red Rule for Fast Dependency Tree Construction
20 0.039579619 19 nips-2002-Adapting Codes and Embeddings for Polychotomies
topicId topicWeight
[(0, -0.139), (1, -0.024), (2, 0.009), (3, 0.025), (4, -0.057), (5, 0.038), (6, -0.029), (7, -0.06), (8, -0.088), (9, 0.144), (10, 0.06), (11, 0.006), (12, -0.017), (13, -0.028), (14, 0.043), (15, 0.112), (16, -0.048), (17, 0.039), (18, -0.001), (19, -0.083), (20, -0.042), (21, 0.082), (22, -0.004), (23, 0.012), (24, -0.028), (25, -0.065), (26, 0.1), (27, 0.053), (28, -0.018), (29, -0.025), (30, -0.017), (31, -0.011), (32, -0.034), (33, 0.106), (34, 0.038), (35, -0.072), (36, -0.069), (37, 0.01), (38, -0.081), (39, 0.06), (40, 0.008), (41, -0.083), (42, 0.184), (43, 0.183), (44, 0.073), (45, -0.105), (46, -0.108), (47, 0.125), (48, -0.121), (49, 0.187)]
simIndex simValue paperId paperTitle
same-paper 1 0.94292921 34 nips-2002-Artefactual Structure from Least-Squares Multidimensional Scaling
Author: Nicholas P. Hughes, David Lowe
Abstract: We consider the problem of illusory or artefactual structure from the visualisation of high-dimensional structureless data. In particular we examine the role of the distance metric in the use of topographic mappings based on the statistical field of multidimensional scaling. We show that the use of a squared Euclidean metric (i.e. the SS TRESS measure) gives rise to an annular structure when the input data is drawn from a highdimensional isotropic distribution, and we provide a theoretical justification for this observation.
2 0.59545922 97 nips-2002-Global Versus Local Methods in Nonlinear Dimensionality Reduction
Author: Vin D. Silva, Joshua B. Tenenbaum
Abstract: Recently proposed algorithms for nonlinear dimensionality reduction fall broadly into two categories which have different advantages and disadvantages: global (Isomap [1]), and local (Locally Linear Embedding [2], Laplacian Eigenmaps [3]). We present two variants of Isomap which combine the advantages of the global approach with what have previously been exclusive advantages of local methods: computational sparsity and the ability to invert conformal maps.
3 0.51804161 70 nips-2002-Distance Metric Learning with Application to Clustering with Side-Information
Author: Eric P. Xing, Michael I. Jordan, Stuart Russell, Andrew Y. Ng
Abstract: Many algorithms rely critically on being given a good metric over their inputs. For instance, data can often be clustered in many “plausible” ways, and if a clustering algorithm such as K-means initially fails to find one that is meaningful to a user, the only recourse may be for the user to manually tweak the metric until sufficiently good clusters are found. For these and other applications requiring good metrics, it is desirable that we provide a more systematic way for users to indicate what they consider “similar.” For instance, we may ask them to provide examples. In this paper, we present an algorithm that, given examples of similar (and, , learns a distance metric over if desired, dissimilar) pairs of points in that respects these relationships. Our method is based on posing metric learning as a convex optimization problem, which allows us to give efficient, local-optima-free algorithms. We also demonstrate empirically that the learned metrics can be used to significantly improve clustering performance. £ ¤¢ £ ¥¢
4 0.50994682 117 nips-2002-Intrinsic Dimension Estimation Using Packing Numbers
Author: Balázs Kégl
Abstract: We propose a new algorithm to estimate the intrinsic dimension of data sets. The method is based on geometric properties of the data and requires neither parametric assumptions on the data generating model nor input parameters to set. The method is compared to a similar, widelyused algorithm from the same family of geometric techniques. Experiments show that our method is more robust in terms of the data generating distribution and more reliable in the presence of noise. 1
5 0.49018842 158 nips-2002-One-Class LP Classifiers for Dissimilarity Representations
Author: Elzbieta Pekalska, David Tax, Robert Duin
Abstract: Problems in which abnormal or novel situations should be detected can be approached by describing the domain of the class of typical examples. These applications come from the areas of machine diagnostics, fault detection, illness identification or, in principle, refer to any problem where little knowledge is available outside the typical class. In this paper we explain why proximities are natural representations for domain descriptors and we propose a simple one-class classifier for dissimilarity representations. By the use of linear programming an efficient one-class description can be found, based on a small number of prototype objects. This classifier can be made (1) more robust by transforming the dissimilarities and (2) cheaper to compute by using a reduced representation set. Finally, a comparison to a comparable one-class classifier by Campbell and Bennett is given.
6 0.47226727 80 nips-2002-Exact MAP Estimates by (Hyper)tree Agreement
7 0.40855911 190 nips-2002-Stochastic Neighbor Embedding
8 0.3955828 131 nips-2002-Learning to Classify Galaxy Shapes Using the EM Algorithm
9 0.38149995 98 nips-2002-Going Metric: Denoising Pairwise Data
10 0.37074447 124 nips-2002-Learning Graphical Models with Mercer Kernels
11 0.36842459 107 nips-2002-Identity Uncertainty and Citation Matching
12 0.35436919 49 nips-2002-Charting a Manifold
13 0.33917484 203 nips-2002-Using Tarjan's Red Rule for Fast Dependency Tree Construction
14 0.30223835 179 nips-2002-Scaling of Probability-Based Optimization Algorithms
15 0.29340094 24 nips-2002-Adaptive Scaling for Feature Selection in SVMs
16 0.2918739 113 nips-2002-Information Diffusion Kernels
17 0.27985111 5 nips-2002-A Digital Antennal Lobe for Pattern Equalization: Analysis and Design
18 0.27322394 27 nips-2002-An Impossibility Theorem for Clustering
19 0.2716659 174 nips-2002-Regularized Greedy Importance Sampling
20 0.26011613 22 nips-2002-Adaptive Nonlinear System Identification with Echo State Networks
topicId topicWeight
[(11, 0.036), (23, 0.015), (42, 0.097), (43, 0.268), (54, 0.19), (55, 0.035), (67, 0.011), (68, 0.031), (74, 0.078), (92, 0.024), (98, 0.106)]
simIndex simValue paperId paperTitle
1 0.91819459 42 nips-2002-Bias-Optimal Incremental Problem Solving
Author: Jürgen Schmidhuber
Abstract: Given is a problem sequence and a probability distribution (the bias) on programs computing solution candidates. We present an optimally fast way of incrementally solving each task in the sequence. Bias shifts are computed by program prefixes that modify the distribution on their suffixes by reusing successful code for previous tasks (stored in non-modifiable memory). No tested program gets more runtime than its probability times the total search time. In illustrative experiments, ours becomes the first general system to learn a universal solver for arbitrary disk Towers of Hanoi tasks (minimal solution size ). It demonstrates the advantages of incremental learning by profiting from previously solved, simpler tasks involving samples of a simple context free language. ¦ ¤ ¢ §¥£¡ 1 Brief Introduction to Optimal Universal Search Consider an asymptotically optimal method for tasks with quickly verifiable solutions: ¦ ¦ © £ £¨ © © © © ¦ ¦ ¦ Method 1.1 (L SEARCH ) View the -th binary string as a potential program for a universal Turing machine. Given some problem, for all do: every steps on average execute (if possible) one instruction of the -th program candidate, until one of the programs has computed a solution. ! © © © ¢
same-paper 2 0.8296771 34 nips-2002-Artefactual Structure from Least-Squares Multidimensional Scaling
Author: Nicholas P. Hughes, David Lowe
Abstract: We consider the problem of illusory or artefactual structure from the visualisation of high-dimensional structureless data. In particular we examine the role of the distance metric in the use of topographic mappings based on the statistical field of multidimensional scaling. We show that the use of a squared Euclidean metric (i.e. the SS TRESS measure) gives rise to an annular structure when the input data is drawn from a highdimensional isotropic distribution, and we provide a theoretical justification for this observation.
3 0.68196535 190 nips-2002-Stochastic Neighbor Embedding
Author: Geoffrey E. Hinton, Sam T. Roweis
Abstract: We describe a probabilistic approach to the task of placing objects, described by high-dimensional vectors or by pairwise dissimilarities, in a low-dimensional space in a way that preserves neighbor identities. A Gaussian is centered on each object in the high-dimensional space and the densities under this Gaussian (or the given dissimilarities) are used to define a probability distribution over all the potential neighbors of the object. The aim of the embedding is to approximate this distribution as well as possible when the same operation is performed on the low-dimensional “images” of the objects. A natural cost function is a sum of Kullback-Leibler divergences, one per object, which leads to a simple gradient for adjusting the positions of the low-dimensional images. Unlike other dimensionality reduction methods, this probabilistic framework makes it easy to represent each object by a mixture of widely separated low-dimensional images. This allows ambiguous objects, like the document count vector for the word “bank”, to have versions close to the images of both “river” and “finance” without forcing the images of outdoor concepts to be located close to those of corporate concepts.
4 0.68109965 127 nips-2002-Learning Sparse Topographic Representations with Products of Student-t Distributions
Author: Max Welling, Simon Osindero, Geoffrey E. Hinton
Abstract: We propose a model for natural images in which the probability of an image is proportional to the product of the probabilities of some filter outputs. We encourage the system to find sparse features by using a Studentt distribution to model each filter output. If the t-distribution is used to model the combined outputs of sets of neurally adjacent filters, the system learns a topographic map in which the orientation, spatial frequency and location of the filters change smoothly across the map. Even though maximum likelihood learning is intractable in our model, the product form allows a relatively efficient learning procedure that works well even for highly overcomplete sets of filters. Once the model has been learned it can be used as a prior to derive the “iterated Wiener filter” for the purpose of denoising images.
5 0.6809898 52 nips-2002-Cluster Kernels for Semi-Supervised Learning
Author: Olivier Chapelle, Jason Weston, Bernhard SchĂślkopf
Abstract: We propose a framework to incorporate unlabeled data in kernel classifier, based on the idea that two points in the same cluster are more likely to have the same label. This is achieved by modifying the eigenspectrum of the kernel matrix. Experimental results assess the validity of this approach. 1
6 0.67791802 88 nips-2002-Feature Selection and Classification on Matrix Data: From Large Margins to Small Covering Numbers
7 0.67569834 169 nips-2002-Real-Time Particle Filters
8 0.67565042 119 nips-2002-Kernel Dependency Estimation
9 0.67485946 21 nips-2002-Adaptive Classification by Variational Kalman Filtering
10 0.67381251 53 nips-2002-Clustering with the Fisher Score
11 0.67341876 24 nips-2002-Adaptive Scaling for Feature Selection in SVMs
12 0.67322135 124 nips-2002-Learning Graphical Models with Mercer Kernels
13 0.67316735 14 nips-2002-A Probabilistic Approach to Single Channel Blind Signal Separation
14 0.6725142 46 nips-2002-Boosting Density Estimation
15 0.67132127 3 nips-2002-A Convergent Form of Approximate Policy Iteration
16 0.6687004 70 nips-2002-Distance Metric Learning with Application to Clustering with Side-Information
17 0.66846573 68 nips-2002-Discriminative Densities from Maximum Contrast Estimation
18 0.66840059 65 nips-2002-Derivative Observations in Gaussian Process Models of Dynamic Systems
19 0.66769803 106 nips-2002-Hyperkernels
20 0.66681218 120 nips-2002-Kernel Design Using Boosting