jmlr jmlr2010 jmlr2010-49 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Miloš Radovanović, Alexandros Nanopoulos, Mirjana Ivanović
Abstract: Different aspects of the curse of dimensionality are known to present serious challenges to various machine-learning methods and tasks. This paper explores a new aspect of the dimensionality curse, referred to as hubness, that affects the distribution of k-occurrences: the number of times a point appears among the k nearest neighbors of other points in a data set. Through theoretical and empirical analysis involving synthetic and real data sets we show that under commonly used assumptions this distribution becomes considerably skewed as dimensionality increases, causing the emergence of hubs, that is, points with very high k-occurrences which effectively represent “popular” nearest neighbors. We examine the origins of this phenomenon, showing that it is an inherent property of data distributions in high-dimensional vector space, discuss its interaction with dimensionality reduction, and explore its influence on a wide range of machine-learning tasks directly or indirectly based on measuring distances, belonging to supervised, semi-supervised, and unsupervised learning families. Keywords: nearest neighbors, curse of dimensionality, classification, semi-supervised learning, clustering
Reference: text
sentIndex sentText sentNum sentScore
1 Unlike distance concentration, hubness and its influence on machine learning have not been explored in depth. [sent-34, score-0.562]
2 As will be described in Section 4, the phenomena of distance concentration and hubness are related, but distinct. [sent-36, score-0.591]
3 Our main finding, which builds on the theoretical results of Newman et al. (1983) and Newman and Rinott (1985), is that the hubness phenomenon is an inherent property of data distributions in high-dimensional space under widely used assumptions, and not an artefact of a finite sample or specific properties of a particular data set. [sent-44, score-0.544]
4 Moreover, the hubness phenomenon has recently started to be observed in application fields like music retrieval (Aucouturier and Pachet, 2007), speech recognition (Doddington et al.), and other areas involving sound and image data. [sent-46, score-0.541]
5 First, we demonstrate the emergence of hubness on synthetic and real data in Section 3. [sent-51, score-0.491]
6 The following section provides a comprehensive explanation of the origins of the phenomenon, through empirical and theoretical analysis of artificial data distributions, as well as observations on a large collection of real data sets, linking hubness with the intrinsic dimensionality of data. [sent-52, score-0.696]
7 Section 7 explores the impact of hubness on common supervised, semi-supervised, and unsupervised machine-learning algorithms, showing that the information provided by hubness can be used to significantly affect the success of the generated models. [sent-55, score-0.936]
8 Related Work The hubness phenomenon has been recently observed in several application areas involving sound and image data (Aucouturier and Pachet, 2007; Doddington et al. [sent-58, score-0.516]
9 (2009) briefly mention hubness in the context of graph construction for semi-supervised learning. [sent-62, score-0.468]
10 In addition, there have been attempts to avoid the influence of hubs in 1-NN time-series classification, apparently without clear awareness about the existence of the phenomenon (Islam et al. [sent-63, score-0.574]
11 None of the mentioned papers, however, successfully analyze the causes of hubness or generalize it to other applications. [sent-66, score-0.468]
12 All these results imply that no hubness is to be expected within the settings in question. [sent-73, score-0.468]
13 It is worth noting that in ε-neighborhood graphs, that is, graphs where two points are connected if the distance between them is less than a given limit ε, the hubness phenomenon does not occur. [sent-87, score-0.687]
14 In this section we will empirically demonstrate the emergence of hubness through increasing skewness of the distribution of Nk on synthetic (Section 3.1) and real (Section 3.2) data. [sent-115, score-0.614]
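To make the quantity under study concrete, here is a minimal Python sketch (not taken from the paper) that computes Nk with scikit-learn for i.i.d. normal data and reports the empirical skewness of its distribution; the sample size n = 2000, k = 10, and the chosen dimensionalities are illustrative assumptions.

```python
import numpy as np
from scipy.stats import skew
from sklearn.neighbors import NearestNeighbors

def k_occurrences(X, k=10):
    """N_k(x): number of times each point appears among the k nearest neighbors of the other points."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    neigh = nn.kneighbors(X, return_distance=False)[:, 1:]  # drop the self-neighbor in column 0
    return np.bincount(neigh.ravel(), minlength=X.shape[0])

rng = np.random.default_rng(0)
n, k = 2000, 10
for d in (3, 20, 100):
    X = rng.standard_normal((n, d))          # i.i.d. standard normal data
    Nk = k_occurrences(X, k)
    print(f"d={d:3d}  skewness of N_k = {skew(Nk):.2f}  max N_k = {Nk.max()}")
```

The skewness should grow noticeably with d, and the largest Nk values (the hubs) should pull far away from the mean, which equals k by construction.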
15 In most practical settings, however, such situations are not expected, and a thorough discussion of the necessary conditions for hubness to occur in high dimensions will be given in Section 5. [sent-183, score-0.509]
16 3.2 Hubness in Real Data. To illustrate the hubness phenomenon on real data, let us consider the empirical distribution of Nk (k = 10) for three real data sets, given in Figure 2. [sent-186, score-0.516]
17 It can be observed that some SNk values in Table 1 are quite high, indicating strong hubness in the corresponding data sets. [sent-211, score-0.468]
18 62), signifying that the relationship between dimensionality and hubness extends from synthetic to real data in general. [sent-213, score-0.587]
19 On the other hand, careful scrutiny of the charts in Figure 2 and SNk values in Table 1 reveals that for real data the impact of dimensionality on hubness may not be as strong as could be expected after viewing hubness on synthetic data in Figure 1. [sent-214, score-1.055]
20 4. The Origins of Hubness. This section moves on to exploring the causes of hubness and the mechanisms through which hubs emerge. [sent-650, score-0.994]
21 Section 4.2 explains the mechanism through which hubs emerge as points in high-dimensional space that become closer to other points than their low-dimensional counterparts, outlining our main theoretical result. [sent-654, score-0.714]
22 The emergence of hubness in real data is studied in Section 4.3. [sent-655, score-0.491]
23 Section 4.4 discusses hubs and their opposites—antihubs—and the relationships between hubs, antihubs, and different notions of outliers. [sent-657, score-0.526]
24 We made analogous observations with other values of k, and combinations of data distributions and distance measures for which hubness occurs. [sent-669, score-0.59]
25 It is important to note that proximity to one global data-set mean correlates with hubness in high dimensions when the underlying data distribution is unimodal. [sent-670, score-0.536]
26 For multimodal data distributions, for example those obtained through a mixture of unimodal distributions, hubs tend to appear close to the means of individual component distributions of the mixture. [sent-671, score-0.554]
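As a rough check of this observation (a sketch under assumed settings, not code from the paper), one can correlate Nk with the Euclidean distance to the data-set mean; the Spearman correlation should become increasingly negative as the dimensionality grows.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.neighbors import NearestNeighbors

def k_occurrences(X, k=10):
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    neigh = nn.kneighbors(X, return_distance=False)[:, 1:]  # drop the self-neighbor
    return np.bincount(neigh.ravel(), minlength=X.shape[0])

rng = np.random.default_rng(0)
n, k = 2000, 10
for d in (3, 20, 100):
    X = rng.standard_normal((n, d))
    Nk = k_occurrences(X, k)
    dist_to_mean = np.linalg.norm(X - X.mean(axis=0), axis=1)
    rho, _ = spearmanr(Nk, dist_to_mean)
    print(f"d={d:3d}  Spearman(N_k, distance to mean) = {rho:.2f}")
```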
27 To understand why points closer to the data mean become hubs in high dimensions, let us consider the following example. [sent-735, score-0.664]
28 We considered two points drawn from i.i.d. normal data, with specific distances from the origin expressed in terms of the expected distance and the deviation from it, and tracked the analogues of the two points for increasing values of dimensionality d. [sent-779, score-0.488]
29 The next section takes a closer look at the hubness phenomenon in real data sets. [sent-807, score-0.55]
30 4.3 Hubness in Real Data. Results describing the origins of hubness given in the previous sections were obtained by examining data sets that follow specific distributions and are generated as i.i.d. samples. [sent-809, score-0.537]
31 For the vast majority of high-dimensional data sets, S_{N_k} is considerably higher than S^S_{N_k}, indicating that hubness actually depends on the intrinsic rather than the embedding dimensionality. [sent-820, score-0.536]
32 This provides an explanation for the apparent weaker influence of d on hubness in real data than in synthetic data sets, which was observed in Section 3. [sent-821, score-0.468]
33 Consequently, in real data, hubs tend to be closer than other points to their respective cluster centers (which we verified by examining the individual scatter plots). [sent-826, score-0.671]
34 Moreover, intrinsic dimensionality positively affects the correlations between Nk and the distance to the dataset mean / closest cluster mean, implying that in higher (intrinsic) dimensions the positions of hubs become increasingly localized to the proximity of centers. [sent-830, score-0.909]
35 Section 6, which discusses the interaction of hubness with dimensionality reduction, will provide even more support to the observation that hubness depends on intrinsic, rather than embedding dimensionality. [sent-831, score-1.055]
36 To further support the connection between antihubs and distance-based outliers, let us consider a common outlier score of a point: the distance to its kth nearest neighbor (Tan et al.). [sent-870, score-0.47]
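The following sketch (illustrative, with assumed sample size, dimensionality, and k) contrasts this k-NN-distance outlier score with Nk on i.i.d. normal data; the two should be strongly negatively correlated, with antihubs (lowest Nk) receiving the highest outlier scores.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
n, d, k = 2000, 100, 10
X = rng.standard_normal((n, d))

nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
dist, neigh = nn.kneighbors(X)               # column 0 is each point itself
outlier_score = dist[:, k]                   # distance to the k-th nearest neighbor
Nk = np.bincount(neigh[:, 1:].ravel(), minlength=n)

rho, _ = spearmanr(outlier_score, Nk)
antihubs = Nk <= np.percentile(Nk, 5)        # points with the lowest k-occurrences
hubs = Nk >= np.percentile(Nk, 95)           # points with the highest k-occurrences
print(f"Spearman(outlier score, N_k) = {rho:.2f}")
print(f"mean outlier score: antihubs {outlier_score[antihubs].mean():.2f}, "
      f"hubs {outlier_score[hubs].mean():.2f}")
```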
37 In high dimensions, hubs actually are outliers in the probabilistic sense, being points that are closer to the distribution (or cluster) mean than the majority of other points. [sent-889, score-0.705]
38 Therefore, somewhat counterintuitively, it can be said that hubs are points that reside in low-density regions of the high-dimensional data space, and are at the same time close to many other points. [sent-890, score-0.603]
39 5. Proofs and Discussion. This section is predominantly devoted to the theoretical aspects of the behavior of distances in high-dimensional space and the hubness phenomenon. [sent-898, score-0.587]
40 On the other hand, for higher values of d hubness may or may not occur, and the geometric bounds, besides providing “room” for hubness (even for values of k as low as 1) do not contribute much in fully characterizing the hubness phenomenon. [sent-1087, score-1.404]
41 Although it does not directly fit into the framework of Theorem 12, the same principle can be used to explain the absence of hubness for normally distributed data and cosine distance from Section 3. [sent-1151, score-0.562]
42 Figure 8: (a) Out-link and (b) in-link densities of groups of hubs with increasing size; (c) ratio of the number of in-links originating from points within the group to in-links originating from outside points, for i.i.d. data. [sent-1180, score-0.68]
43 Figure 8(a, b) shows the out-link and in-link densities of groups of the strongest hubs in i.i.d. data. [sent-1202, score-0.526]
44 It can be seen in Figure 8(a) that hubs are more cohesive in high dimensions, with more of their out-links leading to other hubs. [sent-1206, score-0.526]
45 On the other hand, Figure 8(b) suggests that hubs also receive more in-links from non-hub points in high dimensions than in low dimensions. [sent-1207, score-0.644]
46 Moreover, Figure 8(c), which plots the ratio of the number of in-links that originate within the group to the number of in-links that originate outside it, shows that hubs receive a larger proportion of in-links from non-hub points in high dimensions than in low dimensions. [sent-1208, score-0.714]
47 Overall, it can be said that in high dimensions hubs receive more in-links than in low dimensions from both hubs and non-hubs, and that the range of influence of hubs gradually widens as dimensionality increases. [sent-1210, score-1.779]
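A rough reproduction of this link analysis can be sketched as follows (on i.i.d. normal data; the group size of 100 hubs and the two dimensionalities are arbitrary choices, not the paper's settings).

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
n, k, group_size = 2000, 10, 100

for d in (3, 100):
    X = rng.standard_normal((n, d))
    neigh = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X, return_distance=False)[:, 1:]
    Nk = np.bincount(neigh.ravel(), minlength=n)

    hub_set = set(np.argsort(Nk)[-group_size:])              # the strongest hubs
    hub_to_hub = sum(1 for i in hub_set for j in neigh[i] if j in hub_set)
    nonhub_to_hub = sum(1 for i in range(n) if i not in hub_set
                        for j in neigh[i] if j in hub_set)
    print(f"d={d:3d}  hub->hub out-links: {hub_to_hub:4d}   non-hub->hub in-links: {nonhub_to_hub:5d}")
```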
48 We can therefore conclude that the transition of hubness from low to high dimensionalities is “smooth,” both in the sense of the change in the overall distribution of Nk , and the change in the degree of influence of data points, as expressed by the above analysis of links. [sent-1211, score-0.493]
49 So far, we have viewed hubs primarily through their exhibited high values of Nk, that is, high degree centrality in the directed k-NN graph. [sent-1212, score-0.592]
50 This indicates that real data sets which exhibit strong skewness in the distribution of N10 also tend to have a strong correlation between N10 and the betweenness centrality of nodes, giving hubs a broader significance for the structure of k-NN graphs. [sent-1228, score-0.775]
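This correlation can be checked on any data set with, for example, NetworkX; the sketch below (with an assumed small sample so that exact betweenness is cheap to compute) builds the directed k-NN graph and compares node betweenness with Nk.

```python
import networkx as nx
import numpy as np
from scipy.stats import spearmanr
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
n, d, k = 500, 50, 10                      # small n keeps exact betweenness feasible
X = rng.standard_normal((n, d))

neigh = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X, return_distance=False)[:, 1:]
Nk = np.bincount(neigh.ravel(), minlength=n)

G = nx.DiGraph()                           # directed k-NN graph: edge i -> j if j is a k-NN of i
G.add_nodes_from(range(n))
G.add_edges_from((i, int(j)) for i in range(n) for j in neigh[i])
bc = nx.betweenness_centrality(G)

rho, _ = spearmanr(Nk, [bc[i] for i in range(n)])
print(f"Spearman(N_k, betweenness centrality) = {rho:.2f}")
```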
51 These asymptotic results were previously considered to be of limited practical relevance (Newman et al., 1983; Newman and Rinott, 1985), citing empirically observed slow convergence (Maloney, 1983), even to the extent of not observing significant differences between hubness in the Poisson process and i.i.d. data. [sent-1232, score-0.468]
52 However, results in the preceding sections of this paper suggest that this convergence is fast enough to produce notable hubness in high-dimensional data. [sent-1236, score-0.468]
53 We leave a more detailed and general investigation of the interaction between hubness and dimensionality reduction as a point of future work. [sent-1320, score-0.587]
54 7. The Impact of Hubness on Machine Learning. The impact of hubness on machine-learning applications has not been thoroughly investigated so far. [sent-1367, score-0.468]
55 Our main objective is to demonstrate that hubs (as well as their opposites, antihubs) can have a significant effect on these methods. [sent-1372, score-0.526]
56 The presented results highlight the need to take hubs into account in a way equivalent to other factors, such as the existence of outliers, the role of which has been well studied. [sent-1373, score-0.526]
57 7.1 Supervised Learning. To investigate possible implications of hubness for supervised learning, we first study the interaction of k-occurrences with the information provided by labels (Section 7.1.1). [sent-1375, score-0.468]
58 We then move on to examine the effects of hubness on several well-known classification algorithms in Section 7.1.2. [sent-1378, score-0.468]
59 To understand the origins of “bad” hubs in real data, we rely on the notion of the cluster assumption from semi-supervised learning (Chapelle et al.). [sent-1394, score-0.601]
60 7.1.2 Influence on Classification Algorithms. We now examine how the skewness of Nk and the existence of (“bad”) hubs affect well-known classification techniques, focusing on the k-nearest neighbor classifier (k-NN), support vector machines (SVM), and AdaBoost. [sent-1412, score-0.718]
61 For each point x, we compute its standardized “bad” hubness score: h_B(x, k) = (BN_k(x) − µ_{BN_k}) / σ_{BN_k}, where µ_{BN_k} and σ_{BN_k} are the mean and standard deviation of BN_k, respectively. [sent-1419, score-0.495]
62 During majority voting in the k-NN classification phase, when point x participates in the k-NN list of the point being classified, the vote of x is weighted by wk (x) = exp(−hB (x, k)), thus lowering the influence of “bad” hubs on the classification decision. [sent-1420, score-0.526]
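A compact sketch of this weighting scheme follows; it is our reconstruction from the description above, not the authors' code, and the function name, the handling of ties, and all parameters are our own choices.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def hubness_weighted_knn_predict(X_train, y_train, X_test, k=10):
    """k-NN voting in which the vote of x is weighted by w_k(x) = exp(-h_B(x, k))."""
    X_train, y_train = np.asarray(X_train), np.asarray(y_train)
    n = len(X_train)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_train)
    neigh = nn.kneighbors(X_train, return_distance=False)[:, 1:]

    # BN_k(x): how many points have x among their k NNs while carrying a different label
    BNk = np.zeros(n)
    for i in range(n):
        for j in neigh[i]:
            if y_train[j] != y_train[i]:
                BNk[j] += 1

    hB = (BNk - BNk.mean()) / (BNk.std() + 1e-12)   # standardized "bad" hubness score
    w = np.exp(-hB)                                  # vote weight of each training point

    test_neigh = nn.kneighbors(X_test, n_neighbors=k, return_distance=False)
    classes = np.unique(y_train)
    preds = []
    for row in test_neigh:
        votes = {c: w[row[y_train[row] == c]].sum() for c in classes}
        preds.append(max(votes, key=votes.get))
    return np.array(preds)
```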
63 Although “bad” hubs tend to carry more information about the location of class boundaries than other points, the “model” created by the k-NN classifier places the emphasis on describing non-borderline regions of the space occupied by each class. [sent-1425, score-0.526]
64 For this reason, it can be said that “bad” hubs are truly bad for k-NN classification, creating the need to penalize their influence on the classification decision. [sent-1426, score-0.583]
65 On the other hand, for classifiers that explicitly model the borders between classes, such as support vector machines, “bad” hubs can represent points which contribute information to the model in a positive way, as will be discussed next. [sent-1427, score-0.603]
66 To examine the influence of “bad” hubs on SVMs, Figure 12 illustrates 10-fold cross-validation accuracy results of SVMs trained using sequential minimal optimization (Platt, 1999; Keerthi et al.). [sent-1440, score-0.526]
67 Accuracy drops when training points are removed in order of decreasing BN_k, indicating that “bad” hubs are important for support vector machines. [sent-1442, score-0.583]
68 We define for each point x its standardized hubness score: h(x, k) = (N_k(x) − µ_{N_k}) / σ_{N_k}, (24) where µ_{N_k} and σ_{N_k} are the mean and standard deviation of N_k, respectively. [sent-1497, score-0.495]
69 The motivation behind the weighting scheme is to assign less importance to both hubs and outliers than other points (this is why we take the absolute value of h(x, k)). [sent-1499, score-0.68]
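The extracted text does not give the exact functional form of the AdaBoost instance weights, so the sketch below uses an assumed form, w(x) proportional to 1 / (1 + |h(x, k)|), purely to illustrate the idea of down-weighting both hubs and antihubs at initialization; the function name and all parameters are ours.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.neighbors import NearestNeighbors

def hubness_sample_weights(X, k=10):
    """Initial AdaBoost weights that de-emphasize both hubs and antihubs via |h(x, k)|."""
    X = np.asarray(X)
    neigh = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X, return_distance=False)[:, 1:]
    Nk = np.bincount(neigh.ravel(), minlength=len(X))
    h = (Nk - Nk.mean()) / (Nk.std() + 1e-12)       # standardized hubness score, Equation (24)
    w = 1.0 / (1.0 + np.abs(h))                     # assumed weighting form, not from the paper
    return w / w.sum()

# Hypothetical usage on some labeled data (X, y):
# clf = AdaBoostClassifier(n_estimators=200, random_state=0)
# clf.fit(X, y, sample_weight=hubness_sample_weights(X, k=10))
```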
70 Since outliers are known to adversely affect AdaBoost, the improved accuracy suggests that hubs should be regarded in an analogous manner, that is, both hubs and antihubs are intrinsically more difficult to classify correctly, and the attention of the weak learners should initially be focused on “regular” points. [sent-1507, score-1.208]
71 The discussion in Section 4.4, about hubs corresponding to probabilistic outliers in high-dimensional data, offers an explanation for the observed good performance of the weighting scheme, as both hubs and antihubs can be regarded as (probabilistic) outliers. [sent-1509, score-1.285]
72 It illustrates how in earlier phases of ensemble training the generalization power with hubs and/or antihubs is worse than with regular points. [sent-1511, score-0.682]
73 Moreover, for the considered data sets it is actually the hubs that appear to cause more problems for AdaBoost than antihubs (that is, distance-based outliers). [sent-1512, score-0.682]
74 Therefore, following an approach that resembles active learning, selecting the initial labeled point set from hubs could be more beneficial in terms of classification accuracy than arbitrarily selecting the initial points to be labeled. [sent-1524, score-0.603]
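A sketch of this hub-based selection follows (the function name, the 5% labeling budget, and the choice of downstream classifier are illustrative assumptions, not the paper's protocol).

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors

def hub_based_label_selection(X, budget, k=10):
    """Indices of the `budget` strongest hubs, to be labeled first."""
    X = np.asarray(X)
    neigh = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X, return_distance=False)[:, 1:]
    Nk = np.bincount(neigh.ravel(), minlength=len(X))
    return np.argsort(Nk)[::-1][:budget]

# Hypothetical usage: label only the selected points and compare against a random selection
# of the same size, e.g., with a k-NN classifier trained on the labeled subset:
# labeled = hub_based_label_selection(X, budget=int(0.05 * len(X)))
# clf = KNeighborsClassifier(n_neighbors=10).fit(X[labeled], y[labeled])
# print(clf.score(X, y))
```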
[Figure: classification accuracy (%) vs. percentage of labeled points (1 to 10%) for the mfeat-factors, mfeat-fourier, optdigits, and vehicle data sets, comparing selection of the initial labeled points by hubness (“N hubs sel.”) with alternative strategies.]
79 3 Unsupervised Learning This section will discuss the interaction of the hubness phenomenon with unsupervised learning, specifically the tasks of clustering (Section 7. [sent-1574, score-0.547]
80 Like outliers, hubs do not cluster well, but for a different reason: they have low inter-cluster distance, because they are close to many points, thus also to points from other clusters. [sent-1596, score-0.637]
81 In contrast to outliers, the influence of hubs on clustering has not attracted significant attention. [sent-1597, score-0.557]
82 We select as hubs those points x with h(x, k) > 2, that is, Nk (x) more than two standard deviations higher than the mean (note that h(x, k), defined by Equation 24, ignores labels). [sent-1606, score-0.63]
83 To compare hubs and antihubs against random points, we measure the relative SC of hubs (antihubs): the mean SC of hubs (antihubs) divided by the mean SC of random points. [sent-1610, score-1.788]
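The sketch below imitates this measurement on synthetic clustered data (make_blobs with arbitrary parameters); it illustrates the procedure only and is not a reproduction of the paper's experiments.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples
from sklearn.neighbors import NearestNeighbors

X, _ = make_blobs(n_samples=2000, n_features=50, centers=5, cluster_std=6.0, random_state=0)
n, k = len(X), 10

labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
sc = silhouette_samples(X, labels)

neigh = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X, return_distance=False)[:, 1:]
Nk = np.bincount(neigh.ravel(), minlength=n)
h = (Nk - Nk.mean()) / Nk.std()

rng = np.random.default_rng(0)
hubs = np.flatnonzero(h > 2)                 # as in the text: N_k more than 2 std above the mean
n_sel = max(len(hubs), 1)
antihubs = np.argsort(Nk)[:n_sel]            # the same number of points with the lowest N_k
random_pts = rng.choice(n, size=n_sel, replace=False)
print(f"mean SC: hubs {sc[hubs].mean():.3f}  antihubs {sc[antihubs].mean():.3f}  "
      f"random {sc[random_pts].mean():.3f}")
```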
84 To gain further insight, Figure 16 plots with lines (referring to the right vertical axes) the relative mean values of ai and bi for hubs and outliers (dividing by those of randomly selected points). [sent-1614, score-0.63]
85 In conclusion, when clustering high-dimensional data, hubs should receive analogous attention as outliers. [sent-1617, score-0.557]
86 7.3.2 Outlier Detection. This section will briefly discuss possible implications of high dimensionality on distance-based outlier detection, in light of the findings concerning the hubness phenomenon presented in previous sections. [sent-1620, score-0.689]
87 The statistical significance of differences between the SC of hubs and randomly selected points has been verified with the paired t-test at 0. [sent-1625, score-0.603]
88 Figure 16: Relative silhouette coefficients for hubs (gray filled bars) and outliers (empty bars). [sent-1627, score-0.626]
89 Based on the observations regarding hubness and the behavior of distances discussed earlier, we believe that the true problem actually lies in the opposite extreme: high dimensionality induces antihubs that can represent “artificial” outliers. [sent-1631, score-0.862]
90 This is because, from the point of view of common distance-based outlier scoring schemes, antihubs may appear to be stronger outliers in high dimensions than in low dimensions, only due to the effects of increasing dimensionality of data. [sent-1632, score-0.447]
91 We also discussed the interaction of hubness with dimensionality reduction. [sent-1660, score-0.587]
92 , 2005), identifying hubness within data and methods from other fields can be considered an important aspect of future work, as well as designing application-specific methods to mitigate or take advantage of the phenomenon. [sent-1664, score-0.468]
93 We have already established the existence of hubness and its dependence on data dimensionality on collaborative filtering data with commonly used variants of cosine distance (Nanopoulos et al.). [sent-1665, score-0.681]
94 In the immediate future we plan to perform a more detailed investigation of hubness in the fields of outlier detection and image mining. [sent-1670, score-0.522]
95 Another application area that could directly benefit from an investigation into hubness is reverse k-NN queries, which retrieve data points that have the query point q as one of their k nearest neighbors (Tao et al.). [sent-1671, score-0.642]
96 A further direction is the interaction between approximate k-NN graph construction and hubness, in both directions: to what degree do approximate k-NN graphs preserve hubness information, and can hubness information be used to enhance the computation of approximate k-NN graphs for high-dimensional data (in terms of both speed and accuracy)? [sent-1675, score-0.936]
97 Since we determined that for K-means clustering of high-dimensional data hubs tend to be close to cluster centers, it would be interesting to explore whether this can be used to improve iterative clustering algorithms, like K-means or self-organizing maps (Kohonen, 2001). [sent-1680, score-0.622]
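As a purely exploratory sketch of this idea (the data set, the hub-based seeding heuristic, and all parameters are our assumptions; a more careful variant would enforce diversity among the selected hubs):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

X, _ = make_blobs(n_samples=2000, n_features=50, centers=10, cluster_std=6.0, random_state=0)
k, n_clusters = 10, 10

neigh = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X, return_distance=False)[:, 1:]
Nk = np.bincount(neigh.ravel(), minlength=len(X))

seeds = X[np.argsort(Nk)[::-1][:n_clusters]]          # seed K-means with the strongest hubs
km_hub = KMeans(n_clusters=n_clusters, init=seeds, n_init=1, random_state=0).fit(X)
km_rand = KMeans(n_clusters=n_clusters, init="random", n_init=1, random_state=0).fit(X)
print("inertia, hub-seeded init: ", km_hub.inertia_)
print("inertia, random init:     ", km_rand.inertia_)
```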
98 Nearest-neighbor clustering (Bubeck and von Luxburg, 2009) of high-dimensional data may also directly benefit from hubness information. [sent-1681, score-0.499]
99 Topics that could also be worth further study are the interplay of hubness with learned metrics (Weinberger and Saul, 2009) and with dimensionality reduction, including supervised variants (Vlassis et al.). [sent-1682, score-0.587]
100 Finally, as we determined a high correlation between intrinsic dimensionality and the skewness of Nk, it would be interesting to see whether some measure of skewness of the distribution of Nk can be used for estimation of the intrinsic dimensionality of a data set. [sent-1687, score-0.655]
wordName wordTfidf (topN-words)
[('hubs', 0.526), ('hubness', 0.468), ('nanopoulos', 0.191), ('nk', 0.174), ('accurracy', 0.173), ('stan', 0.16), ('antihubs', 0.156), ('adovanovi', 0.127), ('ubs', 0.127), ('vanovi', 0.127), ('skewness', 0.123), ('dimensionality', 0.119), ('distances', 0.119), ('nearest', 0.097), ('distance', 0.094), ('uci', 0.092), ('francois', 0.085), ('radovanovi', 0.081), ('outliers', 0.077), ('points', 0.077), ('pace', 0.075), ('mfeat', 0.075), ('neighbor', 0.069), ('snk', 0.069), ('intrinsic', 0.068), ('centrality', 0.066), ('bn', 0.064), ('unweighted', 0.062), ('tversky', 0.059), ('ivanovi', 0.058), ('laguerre', 0.058), ('rinott', 0.058), ('spectrometer', 0.058), ('bad', 0.057), ('chi', 0.057), ('outlier', 0.054), ('xd', 0.054), ('zd', 0.053), ('aggarwal', 0.052), ('cdm', 0.052), ('milo', 0.052), ('mirjana', 0.052), ('iid', 0.051), ('noncentral', 0.049), ('phenomenon', 0.048), ('poisson', 0.047), ('alexandros', 0.046), ('cav', 0.046), ('origin', 0.046), ('origins', 0.041), ('dimensions', 0.041), ('ccm', 0.04), ('noncentrality', 0.04), ('optdigits', 0.04), ('cos', 0.04), ('newman', 0.04), ('neighbors', 0.04), ('cube', 0.04), ('spearman', 0.036), ('correlation', 0.035), ('cbn', 0.035), ('originate', 0.035), ('curse', 0.034), ('closer', 0.034), ('cluster', 0.034), ('euclidean', 0.034), ('normal', 0.033), ('bins', 0.032), ('lim', 0.032), ('clustering', 0.031), ('hypercubes', 0.03), ('sne', 0.03), ('adaboost', 0.029), ('aucouturier', 0.029), ('cbc', 0.029), ('dmle', 0.029), ('kr', 0.029), ('concentration', 0.029), ('distributions', 0.028), ('freedom', 0.028), ('decreasing', 0.028), ('mean', 0.027), ('vehicle', 0.027), ('rem', 0.027), ('diffusion', 0.026), ('betweenness', 0.025), ('dimensionalities', 0.025), ('tan', 0.025), ('music', 0.025), ('beyer', 0.023), ('doddington', 0.023), ('hicklin', 0.023), ('hinneburg', 0.023), ('limd', 0.023), ('novi', 0.023), ('pachet', 0.023), ('silhouette', 0.023), ('isomap', 0.023), ('emergence', 0.023), ('degrees', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999976 49 jmlr-2010-Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data
Author: Miloš Radovanović, Alexandros Nanopoulos, Mirjana Ivanović
Abstract: Different aspects of the curse of dimensionality are known to present serious challenges to various machine-learning methods and tasks. This paper explores a new aspect of the dimensionality curse, referred to as hubness, that affects the distribution of k-occurrences: the number of times a point appears among the k nearest neighbors of other points in a data set. Through theoretical and empirical analysis involving synthetic and real data sets we show that under commonly used assumptions this distribution becomes considerably skewed as dimensionality increases, causing the emergence of hubs, that is, points with very high k-occurrences which effectively represent “popular” nearest neighbors. We examine the origins of this phenomenon, showing that it is an inherent property of data distributions in high-dimensional vector space, discuss its interaction with dimensionality reduction, and explore its influence on a wide range of machine-learning tasks directly or indirectly based on measuring distances, belonging to supervised, semi-supervised, and unsupervised learning families. Keywords: nearest neighbors, curse of dimensionality, classification, semi-supervised learning, clustering
2 0.07683821 30 jmlr-2010-Dimensionality Estimation, Manifold Learning and Function Approximation using Tensor Voting
Author: Philippos Mordohai, Gérard Medioni
Abstract: We address instance-based learning from a perceptual organization standpoint and present methods for dimensionality estimation, manifold learning and function approximation. Under our approach, manifolds in high-dimensional spaces are inferred by estimating geometric relationships among the input instances. Unlike conventional manifold learning, we do not perform dimensionality reduction, but instead perform all operations in the original input space. For this purpose we employ a novel formulation of tensor voting, which allows an N-D implementation. Tensor voting is a perceptual organization framework that has mostly been applied to computer vision problems. Analyzing the estimated local structure at the inputs, we are able to obtain reliable dimensionality estimates at each instance, instead of a global estimate for the entire data set. Moreover, these local dimensionality and structure estimates enable us to measure geodesic distances and perform nonlinear interpolation for data sets with varying density, outliers, perturbation and intersections, that cannot be handled by state-of-the-art methods. Quantitative results on the estimation of local manifold structure using ground truth data are presented. In addition, we compare our approach with several leading methods for manifold learning at the task of measuring geodesic distances. Finally, we show competitive function approximation results on real data. Keywords: dimensionality estimation, manifold learning, geodesic distance, function approximation, high-dimensional processing, tensor voting
3 0.069220006 95 jmlr-2010-Rademacher Complexities and Bounding the Excess Risk in Active Learning
Author: Vladimir Koltchinskii
Abstract: Sequential algorithms of active learning based on the estimation of the level sets of the empirical risk are discussed in the paper. Localized Rademacher complexities are used in the algorithms to estimate the sample sizes needed to achieve the required accuracy of learning in an adaptive way. Probabilistic bounds on the number of active examples have been proved and several applications to binary classification problems are considered. Keywords: active learning, excess risk, Rademacher complexities, capacity function, disagreement coefficient
4 0.053940024 54 jmlr-2010-Information Retrieval Perspective to Nonlinear Dimensionality Reduction for Data Visualization
Author: Jarkko Venna, Jaakko Peltonen, Kristian Nybo, Helena Aidos, Samuel Kaski
Abstract: Nonlinear dimensionality reduction methods are often used to visualize high-dimensional data, although the existing methods have been designed for other related tasks such as manifold learning. It has been difficult to assess the quality of visualizations since the task has not been well-defined. We give a rigorous definition for a specific visualization task, resulting in quantifiable goodness measures and new visualization methods. The task is information retrieval given the visualization: to find similar data based on the similarities shown on the display. The fundamental tradeoff between precision and recall of information retrieval can then be quantified in visualizations as well. The user needs to give the relative cost of missing similar points vs. retrieving dissimilar points, after which the total cost can be measured. We then introduce a new method NeRV (neighbor retrieval visualizer) which produces an optimal visualization by minimizing the cost. We further derive a variant for supervised visualization; class information is taken rigorously into account when computing the similarity relationships. We show empirically that the unsupervised version outperforms existing unsupervised dimensionality reduction methods in the visualization task, and the supervised version outperforms existing supervised methods. Keywords: information retrieval, manifold learning, multidimensional scaling, nonlinear dimensionality reduction, visualization
5 0.046421196 22 jmlr-2010-Classification Using Geometric Level Sets
Author: Kush R. Varshney, Alan S. Willsky
Abstract: A variational level set method is developed for the supervised classification problem. Nonlinear classifier decision boundaries are obtained by minimizing an energy functional that is composed of an empirical risk term with a margin-based loss and a geometric regularization term new to machine learning: the surface area of the decision boundary. This geometric level set classifier is analyzed in terms of consistency and complexity through the calculation of its ε-entropy. For multicategory classification, an efficient scheme is developed using a logarithmic number of decision functions in the number of classes rather than the typical linear number of decision functions. Geometric level set classification yields performance results on benchmark data sets that are competitive with well-established methods. Keywords: level set methods, nonlinear classification, geometric regularization, consistency, complexity
6 0.046003897 89 jmlr-2010-PAC-Bayesian Analysis of Co-clustering and Beyond
7 0.043572642 29 jmlr-2010-Covariance in Unsupervised Learning of Probabilistic Grammars
8 0.043407798 90 jmlr-2010-Permutation Tests for Studying Classifier Performance
9 0.041603368 40 jmlr-2010-Fast and Scalable Local Kernel Machines
10 0.038814645 47 jmlr-2010-Hilbert Space Embeddings and Metrics on Probability Measures
11 0.037667677 42 jmlr-2010-Generalized Expectation Criteria for Semi-Supervised Learning with Weakly Labeled Data
12 0.036190636 62 jmlr-2010-Learning Gradients: Predictive Models that Infer Geometry and Statistical Dependence
13 0.036149599 55 jmlr-2010-Information Theoretic Measures for Clusterings Comparison: Variants, Properties, Normalization and Correction for Chance
14 0.034245722 10 jmlr-2010-An Exponential Model for Infinite Rankings
15 0.033780586 88 jmlr-2010-Optimal Search on Clustered Structural Constraint for Learning Bayesian Network Structure
16 0.032952182 44 jmlr-2010-Graph Kernels
17 0.031132516 48 jmlr-2010-How to Explain Individual Classification Decisions
18 0.030143155 97 jmlr-2010-Regret Bounds and Minimax Policies under Partial Monitoring
19 0.030104298 63 jmlr-2010-Learning Instance-Specific Predictive Models
20 0.029761391 7 jmlr-2010-A Streaming Parallel Decision Tree Algorithm
topicId topicWeight
[(0, -0.156), (1, 0.002), (2, -0.023), (3, 0.059), (4, -0.054), (5, 0.04), (6, -0.035), (7, 0.04), (8, 0.015), (9, 0.052), (10, 0.029), (11, 0.095), (12, 0.038), (13, -0.044), (14, 0.02), (15, 0.005), (16, -0.239), (17, 0.039), (18, -0.015), (19, -0.132), (20, 0.032), (21, -0.078), (22, 0.117), (23, 0.09), (24, -0.058), (25, 0.041), (26, -0.103), (27, -0.021), (28, 0.031), (29, -0.126), (30, 0.108), (31, 0.057), (32, 0.131), (33, 0.013), (34, 0.149), (35, 0.044), (36, -0.152), (37, -0.136), (38, 0.089), (39, 0.046), (40, -0.077), (41, -0.039), (42, 0.08), (43, -0.042), (44, -0.053), (45, 0.103), (46, -0.163), (47, -0.067), (48, -0.171), (49, -0.039)]
simIndex simValue paperId paperTitle
same-paper 1 0.91624904 49 jmlr-2010-Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data
Author: Miloš Radovanović, Alexandros Nanopoulos, Mirjana Ivanović
Abstract: Different aspects of the curse of dimensionality are known to present serious challenges to various machine-learning methods and tasks. This paper explores a new aspect of the dimensionality curse, referred to as hubness, that affects the distribution of k-occurrences: the number of times a point appears among the k nearest neighbors of other points in a data set. Through theoretical and empirical analysis involving synthetic and real data sets we show that under commonly used assumptions this distribution becomes considerably skewed as dimensionality increases, causing the emergence of hubs, that is, points with very high k-occurrences which effectively represent “popular” nearest neighbors. We examine the origins of this phenomenon, showing that it is an inherent property of data distributions in high-dimensional vector space, discuss its interaction with dimensionality reduction, and explore its influence on a wide range of machine-learning tasks directly or indirectly based on measuring distances, belonging to supervised, semi-supervised, and unsupervised learning families. Keywords: nearest neighbors, curse of dimensionality, classification, semi-supervised learning, clustering
2 0.64249253 30 jmlr-2010-Dimensionality Estimation, Manifold Learning and Function Approximation using Tensor Voting
Author: Philippos Mordohai, Gérard Medioni
Abstract: We address instance-based learning from a perceptual organization standpoint and present methods for dimensionality estimation, manifold learning and function approximation. Under our approach, manifolds in high-dimensional spaces are inferred by estimating geometric relationships among the input instances. Unlike conventional manifold learning, we do not perform dimensionality reduction, but instead perform all operations in the original input space. For this purpose we employ a novel formulation of tensor voting, which allows an N-D implementation. Tensor voting is a perceptual organization framework that has mostly been applied to computer vision problems. Analyzing the estimated local structure at the inputs, we are able to obtain reliable dimensionality estimates at each instance, instead of a global estimate for the entire data set. Moreover, these local dimensionality and structure estimates enable us to measure geodesic distances and perform nonlinear interpolation for data sets with varying density, outliers, perturbation and intersections, that cannot be handled by state-of-the-art methods. Quantitative results on the estimation of local manifold structure using ground truth data are presented. In addition, we compare our approach with several leading methods for manifold learning at the task of measuring geodesic distances. Finally, we show competitive function approximation results on real data. Keywords: dimensionality estimation, manifold learning, geodesic distance, function approximation, high-dimensional processing, tensor voting
3 0.54765522 54 jmlr-2010-Information Retrieval Perspective to Nonlinear Dimensionality Reduction for Data Visualization
Author: Jarkko Venna, Jaakko Peltonen, Kristian Nybo, Helena Aidos, Samuel Kaski
Abstract: Nonlinear dimensionality reduction methods are often used to visualize high-dimensional data, although the existing methods have been designed for other related tasks such as manifold learning. It has been difficult to assess the quality of visualizations since the task has not been well-defined. We give a rigorous definition for a specific visualization task, resulting in quantifiable goodness measures and new visualization methods. The task is information retrieval given the visualization: to find similar data based on the similarities shown on the display. The fundamental tradeoff between precision and recall of information retrieval can then be quantified in visualizations as well. The user needs to give the relative cost of missing similar points vs. retrieving dissimilar points, after which the total cost can be measured. We then introduce a new method NeRV (neighbor retrieval visualizer) which produces an optimal visualization by minimizing the cost. We further derive a variant for supervised visualization; class information is taken rigorously into account when computing the similarity relationships. We show empirically that the unsupervised version outperforms existing unsupervised dimensionality reduction methods in the visualization task, and the supervised version outperforms existing supervised methods. Keywords: information retrieval, manifold learning, multidimensional scaling, nonlinear dimensionality reduction, visualization
4 0.40743494 88 jmlr-2010-Optimal Search on Clustered Structural Constraint for Learning Bayesian Network Structure
Author: Kaname Kojima, Eric Perrier, Seiya Imoto, Satoru Miyano
Abstract: We study the problem of learning an optimal Bayesian network in a constrained search space; skeletons are compelled to be subgraphs of a given undirected graph called the super-structure. The previously derived constrained optimal search (COS) remains limited even for sparse superstructures. To extend its feasibility, we propose to divide the super-structure into several clusters and perform an optimal search on each of them. Further, to ensure acyclicity, we introduce the concept of ancestral constraints (ACs) and derive an optimal algorithm satisfying a given set of ACs. Finally, we theoretically derive the necessary and sufficient sets of ACs to be considered for finding an optimal constrained graph. Empirical evaluations demonstrate that our algorithm can learn optimal Bayesian networks for some graphs containing several hundreds of vertices, and even for super-structures having a high average degree (up to four), which is a drastic improvement in feasibility over the previous optimal algorithm. Learnt networks are shown to largely outperform state-of-the-art heuristic algorithms both in terms of score and structural hamming distance. Keywords: Bayesian networks, structure learning, constrained optimal search
5 0.34550583 22 jmlr-2010-Classification Using Geometric Level Sets
Author: Kush R. Varshney, Alan S. Willsky
Abstract: A variational level set method is developed for the supervised classification problem. Nonlinear classifier decision boundaries are obtained by minimizing an energy functional that is composed of an empirical risk term with a margin-based loss and a geometric regularization term new to machine learning: the surface area of the decision boundary. This geometric level set classifier is analyzed in terms of consistency and complexity through the calculation of its ε-entropy. For multicategory classification, an efficient scheme is developed using a logarithmic number of decision functions in the number of classes rather than the typical linear number of decision functions. Geometric level set classification yields performance results on benchmark data sets that are competitive with well-established methods. Keywords: level set methods, nonlinear classification, geometric regularization, consistency, complexity
6 0.3453286 95 jmlr-2010-Rademacher Complexities and Bounding the Excess Risk in Active Learning
7 0.31364581 32 jmlr-2010-Efficient Algorithms for Conditional Independence Inference
8 0.30298641 38 jmlr-2010-Expectation Truncation and the Benefits of Preselection In Training Generative Models
9 0.29472357 33 jmlr-2010-Efficient Heuristics for Discriminative Structure Learning of Bayesian Network Classifiers
10 0.27400857 40 jmlr-2010-Fast and Scalable Local Kernel Machines
11 0.26862788 97 jmlr-2010-Regret Bounds and Minimax Policies under Partial Monitoring
13 0.23243567 62 jmlr-2010-Learning Gradients: Predictive Models that Infer Geometry and Statistical Dependence
14 0.2303659 19 jmlr-2010-Characterization, Stability and Convergence of Hierarchical Clustering Methods
15 0.22854732 14 jmlr-2010-Approximate Riemannian Conjugate Gradient Learning for Fixed-Form Variational Bayes
16 0.21324705 85 jmlr-2010-On the Foundations of Noise-free Selective Classification
17 0.21312542 26 jmlr-2010-Consensus-Based Distributed Support Vector Machines
18 0.21247572 90 jmlr-2010-Permutation Tests for Studying Classifier Performance
19 0.19888632 89 jmlr-2010-PAC-Bayesian Analysis of Co-clustering and Beyond
20 0.19785097 65 jmlr-2010-Learning Translation Invariant Kernels for Classification
topicId topicWeight
[(3, 0.028), (4, 0.025), (8, 0.015), (21, 0.027), (23, 0.341), (24, 0.015), (32, 0.055), (33, 0.016), (36, 0.035), (37, 0.048), (75, 0.116), (81, 0.061), (85, 0.078), (96, 0.011), (97, 0.012)]
simIndex simValue paperId paperTitle
same-paper 1 0.68446726 49 jmlr-2010-Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data
Author: Miloš Radovanović, Alexandros Nanopoulos, Mirjana Ivanović
Abstract: Different aspects of the curse of dimensionality are known to present serious challenges to various machine-learning methods and tasks. This paper explores a new aspect of the dimensionality curse, referred to as hubness, that affects the distribution of k-occurrences: the number of times a point appears among the k nearest neighbors of other points in a data set. Through theoretical and empirical analysis involving synthetic and real data sets we show that under commonly used assumptions this distribution becomes considerably skewed as dimensionality increases, causing the emergence of hubs, that is, points with very high k-occurrences which effectively represent “popular” nearest neighbors. We examine the origins of this phenomenon, showing that it is an inherent property of data distributions in high-dimensional vector space, discuss its interaction with dimensionality reduction, and explore its influence on a wide range of machine-learning tasks directly or indirectly based on measuring distances, belonging to supervised, semi-supervised, and unsupervised learning families. Keywords: nearest neighbors, curse of dimensionality, classification, semi-supervised learning, clustering
2 0.43312684 10 jmlr-2010-An Exponential Model for Infinite Rankings
Author: Marina Meilă, Le Bao
Abstract: This paper presents a statistical model for expressing preferences through rankings, when the number of alternatives (items to rank) is large. A human ranker will then typically rank only the most preferred items, and may not even examine the whole set of items, or know how many they are. Similarly, a user presented with the ranked output of a search engine, will only consider the highest ranked items. We model such situations by introducing a stagewise ranking model that operates with finite ordered lists called top-t orderings over an infinite space of items. We give algorithms to estimate this model from data, and demonstrate that it has sufficient statistics, being thus an exponential family model with continuous and discrete parameters. We describe its conjugate prior and other statistical properties. Then, we extend the estimation problem to multimodal data by introducing an Exponential-Blurring-Mean-Shift nonparametric clustering algorithm. The experiments highlight the properties of our model and demonstrate that infinite models over permutations can be simple, elegant and practical. Keywords: permutations, partial orderings, Mallows model, distance based ranking model, exponential family, non-parametric clustering, branch-and-bound
3 0.4317942 30 jmlr-2010-Dimensionality Estimation, Manifold Learning and Function Approximation using Tensor Voting
Author: Philippos Mordohai, Gérard Medioni
Abstract: We address instance-based learning from a perceptual organization standpoint and present methods for dimensionality estimation, manifold learning and function approximation. Under our approach, manifolds in high-dimensional spaces are inferred by estimating geometric relationships among the input instances. Unlike conventional manifold learning, we do not perform dimensionality reduction, but instead perform all operations in the original input space. For this purpose we employ a novel formulation of tensor voting, which allows an N-D implementation. Tensor voting is a perceptual organization framework that has mostly been applied to computer vision problems. Analyzing the estimated local structure at the inputs, we are able to obtain reliable dimensionality estimates at each instance, instead of a global estimate for the entire data set. Moreover, these local dimensionality and structure estimates enable us to measure geodesic distances and perform nonlinear interpolation for data sets with varying density, outliers, perturbation and intersections, that cannot be handled by state-of-the-art methods. Quantitative results on the estimation of local manifold structure using ground truth data are presented. In addition, we compare our approach with several leading methods for manifold learning at the task of measuring geodesic distances. Finally, we show competitive function approximation results on real data. Keywords: dimensionality estimation, manifold learning, geodesic distance, function approximation, high-dimensional processing, tensor voting
4 0.42533207 114 jmlr-2010-Unsupervised Supervised Learning I: Estimating Classification and Regression Errors without Labels
Author: Pinar Donmez, Guy Lebanon, Krishnakumar Balasubramanian
Abstract: Estimating the error rates of classifiers or regression models is a fundamental task in machine learning which has thus far been studied exclusively using supervised learning techniques. We propose a novel unsupervised framework for estimating these error rates using only unlabeled data and mild assumptions. We prove consistency results for the framework and demonstrate its practical applicability on both synthetic and real world data. Keywords: classification and regression, maximum likelihood, latent variable models
5 0.42532799 89 jmlr-2010-PAC-Bayesian Analysis of Co-clustering and Beyond
Author: Yevgeny Seldin, Naftali Tishby
Abstract: We derive PAC-Bayesian generalization bounds for supervised and unsupervised learning models based on clustering, such as co-clustering, matrix tri-factorization, graphical models, graph clustering, and pairwise clustering.1 We begin with the analysis of co-clustering, which is a widely used approach to the analysis of data matrices. We distinguish among two tasks in matrix data analysis: discriminative prediction of the missing entries in data matrices and estimation of the joint probability distribution of row and column variables in co-occurrence matrices. We derive PAC-Bayesian generalization bounds for the expected out-of-sample performance of co-clustering-based solutions for these two tasks. The analysis yields regularization terms that were absent in the previous formulations of co-clustering. The bounds suggest that the expected performance of co-clustering is governed by a trade-off between its empirical performance and the mutual information preserved by the cluster variables on row and column IDs. We derive an iterative projection algorithm for finding a local optimum of this trade-off for discriminative prediction tasks. This algorithm achieved stateof-the-art performance in the MovieLens collaborative filtering task. Our co-clustering model can also be seen as matrix tri-factorization and the results provide generalization bounds, regularization terms, and new algorithms for this form of matrix factorization. The analysis of co-clustering is extended to tree-shaped graphical models, which can be used to analyze high dimensional tensors. According to the bounds, the generalization abilities of treeshaped graphical models depend on a trade-off between their empirical data fit and the mutual information that is propagated up the tree levels. We also formulate weighted graph clustering as a prediction problem: given a subset of edge weights we analyze the ability of graph clustering to predict the remaining edge weights. The analysis of co-clustering easily
6 0.42293325 59 jmlr-2010-Large Scale Online Learning of Image Similarity Through Ranking
7 0.42116365 74 jmlr-2010-Maximum Relative Margin and Data-Dependent Regularization
8 0.4197779 46 jmlr-2010-High Dimensional Inverse Covariance Matrix Estimation via Linear Programming
9 0.41833875 103 jmlr-2010-Sparse Semi-supervised Learning Using Conjugate Functions
10 0.4174616 102 jmlr-2010-Semi-Supervised Novelty Detection
11 0.41629228 18 jmlr-2010-Bundle Methods for Regularized Risk Minimization
12 0.41360798 92 jmlr-2010-Practical Approaches to Principal Component Analysis in the Presence of Missing Values
13 0.41238931 54 jmlr-2010-Information Retrieval Perspective to Nonlinear Dimensionality Reduction for Data Visualization
14 0.41154209 109 jmlr-2010-Stochastic Composite Likelihood
15 0.41064692 9 jmlr-2010-An Efficient Explanation of Individual Classifications using Game Theory
16 0.40971959 22 jmlr-2010-Classification Using Geometric Level Sets
17 0.40844333 111 jmlr-2010-Topology Selection in Graphical Models of Autoregressive Processes
18 0.40780097 63 jmlr-2010-Learning Instance-Specific Predictive Models
20 0.40676948 105 jmlr-2010-Spectral Regularization Algorithms for Learning Large Incomplete Matrices