nips nips2010 nips2010-114 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Tim Rogers, Chuck Kalish, Joseph Harrison, Xiaojin Zhu, Bryan R. Gibson
Abstract: When the distribution of unlabeled data in feature space lies along a manifold, the information it provides may be used by a learner to assist classification in a semi-supervised setting. While manifold learning is well-known in machine learning, the use of manifolds in human learning is largely unstudied. We perform a set of experiments which test a human’s ability to use a manifold in a semi-supervised learning task, under varying conditions. We show that humans may be encouraged into using the manifold, overcoming the strong preference for a simple, axis-parallel linear boundary. 1
Reference: text
sentIndex sentText sentNum sentScore
1 When the distribution of unlabeled data in feature space lies along a manifold, the information it provides may be used by a learner to assist classification in a semi-supervised setting. [sent-7, score-0.201]
2 While manifold learning is well-known in machine learning, the use of manifolds in human learning is largely unstudied. [sent-8, score-0.388]
3 We perform a set of experiments which test a human’s ability to use a manifold in a semi-supervised learning task, under varying conditions. [sent-9, score-0.273]
4 In addition, the learner is given some unlabeled items x_{l+1}, . . . , x_{l+u}. [sent-20, score-0.287]
5 The question is: given this knowledge of labeled and unlabeled data, how will the learner classify x_{l+1}, . . . , x_{l+u}? [sent-28, score-0.295]
6 Will the learner ignore the distribution information of the unlabeled data, and simply use the labeled data to form a decision boundary as in Figure 1(b)? [sent-32, score-0.334]
7 Or will the learner propagate labels along the nonlinear manifolds as in Figure 1(c)? [sent-33, score-0.191]
8 Figure 1: On a dataset with manifold structure, supervised learning and manifold learning make dramatically different predictions: (a) the data; (b) supervised learning; (c) manifold learning. [sent-34, score-0.736]
9 The designer of the algorithm can choose to make the manifold assumption, also known as graph-based semi-supervised learning, which states that the labels vary slowly along the manifolds or the discrete graph formed by connecting nearby items. [sent-37, score-0.468]
10 The mathematics of manifold learning is well understood [1, 6, 9, 10]. [sent-39, score-0.236]
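As an illustration of the manifold assumption, here is a minimal harmonic-function label-propagation sketch over an unweighted ǫNN graph; this is a standard graph-based semi-supervised construction, not the paper's exact model, and the function names and ǫ value are ours:

```python
import numpy as np
from scipy.spatial.distance import cdist

def propagate_labels(X, labeled_idx, y_labeled, eps):
    """Harmonic-function label propagation on an unweighted eps-NN graph.

    X: (n, d) item features; labeled_idx: indices of the labeled items;
    y_labeled: their labels in {-1, +1}. Assumes every connected component
    of the graph contains at least one labeled item.
    """
    n = X.shape[0]
    W = (cdist(X, X) <= eps).astype(float)        # unweighted eps-NN adjacency
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W                # unnormalized graph Laplacian
    u = np.setdiff1d(np.arange(n), labeled_idx)   # unlabeled indices
    f = np.zeros(n)
    f[labeled_idx] = y_labeled
    # "Labels vary slowly along the graph": solve L_uu f_u = -L_ul y_l
    f[u] = np.linalg.solve(L[np.ix_(u, u)], -L[np.ix_(u, labeled_idx)] @ y_labeled)
    return np.sign(f)
```

On the two-moons layout this propagates each seed's label along its own moon, as in Figure 1(c), while a learner that ignores W reproduces the boundary of Figure 1(b).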
11 Alternatively, the designer can choose to ignore the unlabeled data and perform supervised learning, which results in Figure 1(b). [sent-40, score-0.191]
12 Consider that the human learner does not directly see how the items are distributed in the feature space (such as Figure 1(a)), but only a set of items (such as those in Figure 2(a)). [sent-42, score-0.315]
13 The underlying manifold structure of the data may not be immediately obvious. [sent-43, score-0.236]
14 Thus there are many possibilities for how the human learner will behave: 1) They may completely ignore the manifold structure and perform supervised learning; 2) They may discover the manifold under some learning conditions and not others; or 3) They may always learn using the manifold. [sent-44, score-0.662]
15 For readers not familiar with manifold learning, the setting might seem artificial. [sent-45, score-0.236]
16 However, if we continuously change the viewing angle, these 2D images will form a one-dimensional manifold in a very high dimensional image space. [sent-49, score-0.236]
17 This example illustrates the importance of a manifold to facilitate learning: if we can form and maintain such a face manifold, then with a single label (e. [sent-50, score-0.284]
18 For example, text documents in a corpus occupy a potentially nonlinear manifold in the otherwise very high dimensional space used to represent them, such as the “bag of words” representation. [sent-55, score-0.236]
19 There exists little empirical evidence addressing the question of whether human beings can learn using manifolds when classifying objects, and the few studies we are aware of come to opposing conclusions. [sent-56, score-0.205]
20 When participants were shown such sequences during training, their ability to match frontal and profile faces during testing was impaired [8]. [sent-58, score-0.247]
21 This might be evidence that people depend on manifold structure stemming from temporal and spatial proximity to perform face recognition. [sent-59, score-0.383]
22 This study seeks to understand under what conditions, if any, people are capable of manifold learning in a semi-supervised setting. [sent-66, score-0.312]
23 Second, if there are reliable methods for encouraging manifold learning in people, these methods can be employed to aid learning in other domains that are structured along manifolds. [sent-68, score-0.254]
24 For machine learning, our study will help in the design of algorithms which can decide when to invoke the manifold learning assumption. [sent-69, score-0.236]
25 2 Human Manifold Learning Experiments. We designed and conducted a set of experiments to study manifold learning in humans, with the following design considerations. [sent-70, score-0.236]
26 First, the task was a “batch learning” paradigm in which participants viewed all labeled and unlabeled items at once (in contrast to an “online” or sequential learning paradigm where items appear one at a time). [sent-71, score-0.65]
27 Second, we avoided using faces or familiar 3D objects as stimuli, despite their natural manifold structures as discussed above, because we wished to avoid any bias resulting from strong prior real-world knowledge. [sent-73, score-0.236]
28 Instead, we used unfamiliar stimuli, from which we could add or remove a manifold structure easily. [sent-74, score-0.236]
29 Unlabeled cards were initially placed in a central white bin, with bins to either side colored red and blue to indicate the two classes y ∈ {−1, 1}. [sent-79, score-0.301]
30 Participants sorted cards by clicking and dragging with a mouse. [sent-81, score-0.305]
31 When a card was clicked, other similar cards could be “highlighted” in gray (depending on condition). [sent-82, score-0.345]
32 Labeled cards were pinned down in their respective red or blue bins and could not be moved, indicated by a “pin” in the corner of the card. [sent-83, score-0.301]
33 The layout of the cards was such that all cards remained visible at all times. [sent-84, score-0.562]
34 Unlabeled cards could be re-categorized at any time by dragging from any bin to any other bin. [sent-85, score-0.305]
35 Upon sorting all cards, participants would click a button to indicate completion. [sent-86, score-0.266]
36 The first task, used solely to acquaint the participants with the interface, consisted of a set of 20 cards with animal line drawings on a white background. [sent-88, score-0.505]
37 Figure 2: Experimental interface (with highlighting shown), and example crosshair stimuli. [sent-99, score-0.255]
38 The participant was asked to sort the set of 20 animal cards into two categories, with the two ends of the continuum (a clown fish and a dachshund) labeled. [sent-103, score-0.484]
39 Participants were told that when they clicked on a card, highlighting of similar cards might occur. [sent-104, score-0.548]
40 In reality, highlighting was always shown for the two nearest-neighboring cards (on the defined continuum) of a clicked card. [sent-105, score-0.525]
41 Importantly, we designed the dataset so that, near the middle of the continuum, cards from opposite biological classes would be highlighted together. [sent-106, score-0.312]
42 The intention was to indicate to the participant that highlighting is not always a clear give-away for class labels. [sent-108, score-0.342]
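For concreteness, the highlighting rule just described amounts to the following, assuming the 20 animal cards are indexed 0–19 along the continuum (the function name is ours):

```python
def highlighted_neighbors(clicked, n_cards=20):
    """The two nearest continuum neighbors of a clicked card are highlighted.

    Near the middle of the continuum the +/-1 neighbors straddle the
    biological class boundary, so highlighting is not a class give-away.
    """
    return [i for i in (clicked - 1, clicked + 1) if 0 <= i < n_cards]
```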
43 Task 2 asked the participant to sort a set of 82 crosshair cards into two categories. [sent-112, score-0.482]
44 The set of cards, the number of labeled cards, and the highlighting of cards depended on condition. [sent-113, score-0.565]
45 The participant was again told that some cards might be highlighted, whether the condition actually provided for highlighting or not. [sent-114, score-0.665]
46 The participant was also told that cards that shared highlighting may not all have the same classification. [sent-115, score-0.646]
47 Each of the 139 participants was randomly assigned to one of 6 conditions, shown in Figure 3, which varied according to three manipulations: The number of labeled items l can be 2 or 4 (2l vs. 4l). [sent-119, score-0.42]
48 For conditions with two labeled items, the labeled items are always (x1, y1 = −1), (x2, y2 = 1); with four labeled items, they are always (x1, y1 = −1), (x2, y2 = 1), (x3, y3 = 1), (x4, y4 = −1). [sent-121, score-0.418]
49 We chose these four labeled points by maximizing the prediction differences made by seven machine learning models, as discussed in the next section. [sent-126, score-0.216]
50 Unlabeled items are distributed on a uniform grid or on manifolds (gridU vs. moonsU). [sent-127, score-0.225]
51 For the moonsU conditions, the neighboring cards of any clicked card may be highlighted. [sent-136, score-0.412]
52 Figure 3: The six experimental conditions (condition labels with participant counts, e.g., 4l moonsU h: 23 participants). [sent-190, score-0.224]
53 3 Model Predictions. We hypothesize that human participants consider a set of models ranging from simple to sophisticated, and that they will perform model selection based on the training data given to them. [sent-193, score-0.304]
54 In particular, we find seven different kernels k to match GP classification to each of the seven model predictions on all 6 conditions. [sent-213, score-0.255]
55 That is, we extend a base RBF kernel k with a graph component: k̃(x, z) = k(x, z) − k_x^⊤ (I + cLK)^{−1} c L k_z (1), where x, z are two arbitrary items (not necessarily on the graph) and k_x = (k(x, x_1), . . . , k(x, x_{l+u}))^⊤. [sent-222, score-0.215]
56 Here x_1, . . . , x_{l+u} are the items in the graph, K is the (l + u) × (l + u) Gram matrix with K_{ij} = k(x_i, x_j), L is the unnormalized graph Laplacian matrix derived from unweighted edges on the ǫNN graph defined earlier for highlighting, and c is the parameter that we tune. [sent-228, score-0.17]
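A direct transcription of Equation (1) might look like this (a sketch; the helper names and the RBF bandwidth are ours, not the paper's):

```python
import numpy as np

def rbf(x, z, gamma=1.0):
    """Base RBF kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

def warped_kernel(x, z, X_graph, L, c, k=rbf):
    """k~(x, z) = k(x, z) - k_x^T (I + c L K)^{-1} c L k_z   (Eq. 1).

    X_graph: the l+u graph items; L: unnormalized graph Laplacian of the
    eps-NN highlighting graph; c: tuned warping strength.
    """
    K = np.array([[k(a, b) for b in X_graph] for a in X_graph])  # Gram matrix
    k_x = np.array([k(x, a) for a in X_graph])
    k_z = np.array([k(z, a) for a in X_graph])
    M = np.eye(len(X_graph)) + c * (L @ K)
    return k(x, z) - k_x @ np.linalg.solve(M, c * (L @ k_z))
```

Setting c = 0 recovers the plain RBF kernel, so c controls how strongly the covariance is deformed along the highlighting graph.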
57 In the end, our seven GPs were able to exactly match the predictions made by the seven models in Figure 4. [sent-232, score-0.25]
58 We first consider the aggregate behavior for all participants within each condition. [sent-288, score-0.24]
59 One way to characterize this aggregate behavior is the “majority vote” of the participants on each item. [sent-289, score-0.24]
60 That is, if more than half of the participants classified an item as y = 1, the majority vote classification for that item is y = 1, and so on. [sent-290, score-0.315]
61 We also compute how well the seven GPs predict human majority votes. [sent-293, score-0.194]
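Both the majority vote and the per-item GP agreement are a few lines of NumPy; a sketch with assumed array shapes:

```python
import numpy as np

def majority_vote(responses):
    """responses: (num_participants, num_items) array of labels in {-1, +1}.
    Ties (possible with an even participant count) are broken toward +1 here."""
    return np.where(responses.sum(axis=0) >= 0, 1, -1)

def gp_agreement(gp_predictions, votes):
    """Fraction of items on which a GP's predictions match the majority vote."""
    return float(np.mean(gp_predictions == votes))
```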
62 Figure 5 (first row): the majority vote of participants within each condition. [sent-344, score-0.315]
63 Behavior on this task varied widely, with some participants simply categorizing cards in the order they appeared on the screen, while others took a much longer, studied approach. [sent-400, score-0.431]
64 Most interestingly, different participants seem to use different models, as the individual participant plots in the bottom three rows of Figure 5 suggest. [sent-401, score-0.374]
65 In order to do this, we compute per-participant accuracies of the seven models on that participant’s classifications. [sent-403, score-0.273]
66 If the highest of these accuracies is above 0.75, we declare that the participant is potentially using the corresponding model M; otherwise no model is deemed a good fit and we say the participant is using some “other” model. [sent-406, score-0.3]
67 We show the proportion of participants in each condition attributed to each of our seven models, plus “other”, in Table 2. [sent-407, score-0.365]
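The attribution rule reduces to an argmax with the 0.75 cutoff stated above; a sketch (the dictionary layout is our assumption):

```python
import numpy as np

def attribute_participant(labels, model_preds, cutoff=0.75):
    """labels: (num_items,) participant classifications in {-1, +1};
    model_preds: dict model_name -> (num_items,) model predictions.
    Returns the best-fitting model name, or 'other' below the cutoff."""
    acc = {name: float(np.mean(p == labels)) for name, p in model_preds.items()}
    best = max(acc, key=acc.get)
    return best if acc[best] > cutoff else "other"
```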
68 Table 2: Percentage of participants potentially using each model. Based on Figure 5, Table 1, and Table 2, we make some observations: [sent-462, score-0.238]
69 1. When there are only two labeled points, the unlabeled distribution does not encourage humans to perform manifold learning (comparing 2l gridU vs. 2l moonsU). [sent-463, score-0.53]
70 2. With two labeled points, even if they are explicitly given the graph structure in the form of highlighting, participants still do not perform manifold learning (comparing 2l moonsU vs. 2l moonsU h). [sent-468, score-0.645]
71 3. When there are four labeled points but no highlighting, the distribution of unlabeled data still does not encourage people to perform manifold learning (comparing 4l gridU vs. 4l moonsU). [sent-472, score-0.546]
72 This further suggests that people cannot easily extract manifold structure from unlabeled data in order to learn when there is no hint to do so. [sent-474, score-0.444]
73 However, most participants gave up the simple single vertical or horizontal decision boundary, because it contradicts the four labeled points. [sent-475, score-0.413]
74 4. Finally, when we provide the graph structure, there is a marked switch to manifold learning (comparing 4l moonsU vs. 4l moonsU h). [sent-477, score-0.314]
75 This suggests that the elimination of preferred, simpler hypotheses, together with a stronger graph hint, finally gives the originally less preferred manifold learning model a chance of being used. [sent-479, score-0.314]
76 It is under this condition that we observed human manifold learning behavior. [sent-480, score-0.305]
77 Because our graph has disconnected components with consistently labeled seeds, this procedure will succeed. [sent-485, score-0.17]
78 First, participants in 2l moonsU h did not learn the manifold while those in 4l moonsU h did, even though the two conditions have the same ǫNN highlighting. [sent-489, score-0.498]
79 Second, a necessary condition for follow-the-highlighting is to always classify an unlabeled x′ according to a labeled highlighted neighbor x. [sent-490, score-0.288]
80 Conversely, if a participant classifies x′ as class y ′ , while all neighbors of x′ are either still unlabeled or have labels other than y ′ , she could not have been using follow-the-highlighting on x′ . [sent-491, score-0.307]
81 The 4l moonsU h participants had an average of 17 leaps-of-faith among about 78 classifications, while a strict follow-the-highlighting procedure would yield zero leaps-of-faith. [sent-493, score-0.224]
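Leaps-of-faith could be counted from a time-ordered log of classification actions roughly as follows (the event format and seed handling are our assumptions; the paper does not specify its bookkeeping):

```python
def count_leaps_of_faith(events, neighbors, seeds):
    """events: time-ordered (card, label) classification actions;
    neighbors: card -> set of highlighted (eps-NN) cards;
    seeds: the pinned labeled cards, card -> label.
    A leap-of-faith assigns a label that no already-labeled neighbor carries."""
    current = dict(seeds)
    leaps = 0
    for card, label in events:
        if label not in {current[n] for n in neighbors[card] if n in current}:
            leaps += 1
        current[card] = label   # re-categorizations overwrite earlier choices
    return leaps
```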
82 Third, the basic challenge of follow-the-highlighting is that the underlying manifold structure of the stimuli may have been irrelevant. [sent-494, score-0.278]
83 Would participants have shown the same behavior, following the highlighting, regardless of the actual stimuli? [sent-495, score-0.224]
84 Take the 4l moonsU h graph, which has 4 labeled nodes, 78 unlabeled nodes, and an adjacency matrix (i.e., the unweighted ǫNN highlighting edges). [sent-497, score-0.296]
85 This creates the random-looking graph in Figure 6(a), which we call the 4l moonsU hR condition (the suffix R stands for random) and which is equivalent to the 4l moonsU h graph in structure. [sent-505, score-0.175]
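One way to realize such a structurally equivalent random condition is to permute which card occupies each unlabeled graph node while keeping the four pinned cards and the adjacency structure fixed; a sketch of that idea, not necessarily the authors' exact construction:

```python
import numpy as np

def randomize_highlighting(A, labeled, seed=0):
    """A: (n, n) 0/1 adjacency of the eps-NN highlighting graph;
    labeled: indices of the 4 pinned cards. Returns an isomorphic adjacency
    in which edges no longer track card similarity."""
    rng = np.random.default_rng(seed)
    perm = np.arange(A.shape[0])
    unlabeled = np.setdiff1d(perm, labeled)
    perm[unlabeled] = rng.permutation(unlabeled)  # shuffle unlabeled nodes only
    return A[np.ix_(perm, perm)]
```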
86 That is, having random highlighting is similar to having no highlighting in how it affects human categorization. [sent-514, score-0.434]
87 6 Discussion. We have presented a set of experiments exploring human manifold learning behaviors. [sent-516, score-0.286]
88 Our results suggest that people can perform manifold learning, but only when there is no alternative, simpler explanation of the data and when they receive strong hints about the graph structure. [sent-517, score-0.472]
89 In addition, if we create a GP from the Laplacian of the random highlighting graph, the GP accuracy in predicting the 4l moonsU hR human majority vote is 0.46, and the percentage of participants in 4l moonsU hR who can be attributed to this model is 0. [sent-523, score-0.333] [sent-524, score-0.238]
91 Figure 6(b): the behavioral evaluation for 4l moonsU hR, where the x-axis is the shortest path length of an unlabeled point to a labeled point, and the y-axis is the fraction of participants who classified that unlabeled point consistently with the nearest labeled point. [sent-555, score-0.714]
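The x-axis quantity in this evaluation is a multi-source breadth-first-search distance; a sketch assuming an adjacency-dict representation of the highlighting graph:

```python
from collections import deque

def hops_to_nearest_labeled(neighbors, labeled):
    """Multi-source BFS: shortest hop count from each node to a labeled node.

    neighbors: dict node -> iterable of adjacent nodes (highlighting graph);
    labeled: collection of labeled nodes."""
    dist = {v: 0 for v in labeled}
    queue = deque(dist)
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist
```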
92 Manifold learning occurred only when strong hints (highlighting) existed and agreed with the underlying unlabeled data manifold structure. [sent-558, score-0.383]
93 Under this assumption, we can then explain the contrast between the lack of manifold learning in 2l moonsU h, and the presence of manifold learning in 4l moonsU h. [sent-559, score-0.472]
94 On one hand, for the 2l moonsU h condition, the evidence for the seven GP models on the two labeled points is: (graph) 0. [sent-560, score-0.252]
95 In any case, there is no reason to prefer the GP with a graph kernel, and we do not expect humans to learn on the manifold in 2l moonsU h. [sent-569, score-0.413]
96 On the other hand, for 4l moonsU h, the evidence for the seven GP models on those four labeled points is: (graph) 0. [sent-570, score-0.268]
97 In particular, it is better than the evidence 1/16 for kernels that treat the four labeled points essentially independently. [sent-579, score-0.165]
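The 1/16 figure is the independence calculation: under a symmetric zero-mean GP prior, a kernel that carries no information between the labeled points makes each binary label equally likely, so

```latex
P(y_1, y_2, y_3, y_4 \mid \text{independent kernel})
  = \left(\tfrac{1}{2}\right)^{4} = \tfrac{1}{16}.
```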
98 Thus, there is a reason to prefer the GP with a graph kernel, and we do expect humans to learn on the manifold in 4l moonsU h. [sent-582, score-0.413]
99 It again suggests that people would perform manifold learning in 4l moonsU h. [sent-597, score-0.312]
100 Mikhail Belkin, Partha Niyogi, and Vikas Sindhwani. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research, 7:2399–2434, 2006. [sent-607, score-0.218]
wordName wordTfidf (topN-words)
[('moonsu', 0.745), ('cards', 0.281), ('manifold', 0.236), ('participants', 0.224), ('highlighting', 0.192), ('participant', 0.15), ('gridu', 0.134), ('unlabeled', 0.126), ('gp', 0.112), ('seven', 0.108), ('hr', 0.104), ('items', 0.104), ('manifolds', 0.102), ('labeled', 0.092), ('graph', 0.078), ('card', 0.064), ('gps', 0.064), ('humans', 0.061), ('people', 0.061), ('xl', 0.06), ('learner', 0.057), ('vote', 0.055), ('clicked', 0.052), ('human', 0.05), ('stimuli', 0.042), ('continuum', 0.039), ('evidence', 0.037), ('crosshair', 0.037), ('majority', 0.036), ('face', 0.034), ('vertical', 0.034), ('kernel', 0.033), ('blindly', 0.032), ('moons', 0.032), ('highlighted', 0.031), ('horizontal', 0.031), ('xiaojin', 0.028), ('shortest', 0.027), ('boundary', 0.027), ('psychology', 0.027), ('behavioral', 0.027), ('click', 0.026), ('interface', 0.026), ('dragging', 0.024), ('shark', 0.024), ('vandist', 0.024), ('wallis', 0.024), ('told', 0.023), ('frontal', 0.023), ('classi', 0.023), ('semisupervised', 0.022), ('prefer', 0.022), ('conditions', 0.022), ('dolphin', 0.021), ('undergraduates', 0.021), ('hint', 0.021), ('hints', 0.021), ('whale', 0.021), ('sh', 0.021), ('kernels', 0.02), ('classify', 0.02), ('designer', 0.02), ('participated', 0.02), ('bins', 0.02), ('rbf', 0.02), ('condition', 0.019), ('nn', 0.019), ('grid', 0.019), ('predictions', 0.019), ('mammal', 0.018), ('elongated', 0.018), ('along', 0.018), ('categorization', 0.017), ('neighbors', 0.017), ('vikas', 0.017), ('mikhail', 0.017), ('partha', 0.017), ('ignore', 0.016), ('learn', 0.016), ('four', 0.016), ('sorting', 0.016), ('behavior', 0.016), ('decision', 0.016), ('batch', 0.016), ('warping', 0.015), ('models', 0.015), ('perform', 0.015), ('neighboring', 0.015), ('capable', 0.015), ('niyogi', 0.015), ('comparing', 0.015), ('label', 0.014), ('attributed', 0.014), ('carl', 0.014), ('unweighted', 0.014), ('zhu', 0.014), ('pro', 0.014), ('supervised', 0.014), ('labels', 0.014), ('asked', 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 114 nips-2010-Humans Learn Using Manifolds, Reluctantly
Author: Tim Rogers, Chuck Kalish, Joseph Harrison, Xiaojin Zhu, Bryan R. Gibson
Abstract: When the distribution of unlabeled data in feature space lies along a manifold, the information it provides may be used by a learner to assist classification in a semi-supervised setting. While manifold learning is well-known in machine learning, the use of manifolds in human learning is largely unstudied. We perform a set of experiments which test a human’s ability to use a manifold in a semi-supervised learning task, under varying conditions. We show that humans may be encouraged into using the manifold, overcoming the strong preference for a simple, axis-parallel linear boundary. 1
2 0.13427962 146 nips-2010-Learning Multiple Tasks using Manifold Regularization
Author: Arvind Agarwal, Samuel Gerber, Hal Daume
Abstract: We present a novel method for multitask learning (MTL) based on manifold regularization: assume that all task parameters lie on a manifold. This is the generalization of a common assumption made in the existing literature: task parameters share a common linear subspace. One proposed method uses the projection distance from the manifold to regularize the task parameters. The manifold structure and the task parameters are learned using an alternating optimization framework. When the manifold structure is fixed, our method decomposes across tasks which can be learnt independently. An approximation of the manifold regularization scheme is presented that preserves the convexity of the single task learning problem, and makes the proposed MTL framework efficient and easy to implement. We show the efficacy of our method on several datasets. 1
3 0.10487509 232 nips-2010-Sample Complexity of Testing the Manifold Hypothesis
Author: Hariharan Narayanan, Sanjoy Mitter
Abstract: The hypothesis that high dimensional data tends to lie in the vicinity of a low dimensional manifold is the basis of a collection of methodologies termed Manifold Learning. In this paper, we study statistical aspects of the question of fitting a manifold with a nearly optimal least squared error. Given upper bounds on the dimension, volume, and curvature, we show that Empirical Risk Minimization can produce a nearly optimal manifold using a number of random samples that is independent of the ambient dimension of the space in which data lie. We obtain an upper bound on the required number of samples that depends polynomially on the curvature, exponentially on the intrinsic dimension, and linearly on the intrinsic volume. For constant error, we prove a matching minimax lower bound on the sample complexity that shows that this dependence on intrinsic dimension, volume and curvature is unavoidable. Whether the known lower bound of O(k/ε² + log(1/δ)/ε²) for the sample complexity of Empirical Risk minimization on k−means applied to data in a unit ball of arbitrary dimension is tight, has been an open question since 1997 [3]. Here ε is the desired bound on the error and δ is a bound on the probability of failure. We improve the best currently known upper bound [14] of O(k²/ε² + log(1/δ)/ε²) to O((k/ε²) min(k, log⁴(k/ε)) + log(1/δ)/ε²). Based on these results, we devise a simple algorithm for k−means and another that uses a family of convex programs to fit a piecewise linear curve of a specified length to high dimensional data, where the sample complexity is independent of the ambient dimension. 1
4 0.095149234 155 nips-2010-Learning the context of a category
Author: Dan Navarro
Abstract: This paper outlines a hierarchical Bayesian model for human category learning that learns both the organization of objects into categories, and the context in which this knowledge should be applied. The model is fit to multiple data sets, and provides a parsimonious method for describing how humans learn context specific conceptual representations.
5 0.07933972 195 nips-2010-Online Learning in The Manifold of Low-Rank Matrices
Author: Uri Shalit, Daphna Weinshall, Gal Chechik
Abstract: When learning models that are represented in matrix forms, enforcing a low-rank constraint can dramatically improve the memory and run time complexity, while providing a natural regularization of the model. However, naive approaches for minimizing functions over the set of low-rank matrices are either prohibitively time consuming (repeated singular value decomposition of the matrix) or numerically unstable (optimizing a factored representation of the low rank matrix). We build on recent advances in optimization over manifolds, and describe an iterative online learning procedure, consisting of a gradient step, followed by a second-order retraction back to the manifold. While the ideal retraction is hard to compute, and so is the projection operator that approximates it, we describe another second-order retraction that can be computed efficiently, with run time and memory complexity of O((n + m)k) for a rank-k matrix of dimension m × n, given rank-one gradients. We use this algorithm, LORETA, to learn a matrix-form similarity measure over pairs of documents represented as high dimensional vectors. LORETA improves the mean average precision over a passive-aggressive approach in a factorized model, and also improves over a full model trained over pre-selected features using the same memory requirements. LORETA also showed consistent improvement over standard methods in a large (1600 classes) multi-label image classification task. 1
6 0.069722742 153 nips-2010-Learning invariant features using the Transformed Indian Buffet Process
7 0.058397904 25 nips-2010-Active Learning by Querying Informative and Representative Examples
8 0.055729985 236 nips-2010-Semi-Supervised Learning with Adversarially Missing Label Information
9 0.054355241 23 nips-2010-Active Instance Sampling via Matrix Partition
10 0.046524178 100 nips-2010-Gaussian Process Preference Elicitation
11 0.045233257 121 nips-2010-Improving Human Judgments by Decontaminating Sequential Dependencies
12 0.042433031 85 nips-2010-Exact learning curves for Gaussian process regression on large random graphs
13 0.041913744 233 nips-2010-Scrambled Objects for Least-Squares Regression
14 0.038694862 275 nips-2010-Transduction with Matrix Completion: Three Birds with One Stone
15 0.038235955 150 nips-2010-Learning concept graphs from text with stick-breaking priors
16 0.038003758 33 nips-2010-Approximate inference in continuous time Gaussian-Jump processes
17 0.037364747 250 nips-2010-Spectral Regularization for Support Estimation
18 0.036691919 162 nips-2010-Link Discovery using Graph Feature Tracking
19 0.036534227 27 nips-2010-Agnostic Active Learning Without Constraints
20 0.036516588 254 nips-2010-Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models
topicId topicWeight
[(0, 0.106), (1, 0.039), (2, 0.014), (3, -0.019), (4, 0.014), (5, 0.017), (6, 0.021), (7, -0.051), (8, -0.028), (9, -0.033), (10, 0.034), (11, -0.032), (12, 0.002), (13, -0.055), (14, 0.054), (15, -0.002), (16, -0.018), (17, -0.104), (18, -0.031), (19, 0.065), (20, -0.114), (21, -0.026), (22, -0.021), (23, -0.023), (24, 0.081), (25, 0.002), (26, -0.165), (27, -0.052), (28, -0.059), (29, 0.079), (30, 0.08), (31, 0.003), (32, 0.107), (33, 0.015), (34, -0.054), (35, -0.133), (36, -0.036), (37, 0.08), (38, 0.022), (39, -0.005), (40, 0.124), (41, 0.04), (42, 0.028), (43, -0.038), (44, 0.041), (45, -0.001), (46, 0.086), (47, 0.036), (48, -0.014), (49, -0.082)]
simIndex simValue paperId paperTitle
same-paper 1 0.91749811 114 nips-2010-Humans Learn Using Manifolds, Reluctantly
Author: Tim Rogers, Chuck Kalish, Joseph Harrison, Xiaojin Zhu, Bryan R. Gibson
Abstract: When the distribution of unlabeled data in feature space lies along a manifold, the information it provides may be used by a learner to assist classification in a semi-supervised setting. While manifold learning is well-known in machine learning, the use of manifolds in human learning is largely unstudied. We perform a set of experiments which test a human’s ability to use a manifold in a semi-supervised learning task, under varying conditions. We show that humans may be encouraged into using the manifold, overcoming the strong preference for a simple, axis-parallel linear boundary. 1
2 0.62196404 146 nips-2010-Learning Multiple Tasks using Manifold Regularization
Author: Arvind Agarwal, Samuel Gerber, Hal Daume
Abstract: We present a novel method for multitask learning (MTL) based on manifold regularization: assume that all task parameters lie on a manifold. This is the generalization of a common assumption made in the existing literature: task parameters share a common linear subspace. One proposed method uses the projection distance from the manifold to regularize the task parameters. The manifold structure and the task parameters are learned using an alternating optimization framework. When the manifold structure is fixed, our method decomposes across tasks which can be learnt independently. An approximation of the manifold regularization scheme is presented that preserves the convexity of the single task learning problem, and makes the proposed MTL framework efficient and easy to implement. We show the efficacy of our method on several datasets. 1
3 0.46686554 232 nips-2010-Sample Complexity of Testing the Manifold Hypothesis
Author: Hariharan Narayanan, Sanjoy Mitter
Abstract: The hypothesis that high dimensional data tends to lie in the vicinity of a low dimensional manifold is the basis of a collection of methodologies termed Manifold Learning. In this paper, we study statistical aspects of the question of fitting a manifold with a nearly optimal least squared error. Given upper bounds on the dimension, volume, and curvature, we show that Empirical Risk Minimization can produce a nearly optimal manifold using a number of random samples that is independent of the ambient dimension of the space in which data lie. We obtain an upper bound on the required number of samples that depends polynomially on the curvature, exponentially on the intrinsic dimension, and linearly on the intrinsic volume. For constant error, we prove a matching minimax lower bound on the sample complexity that shows that this dependence on intrinsic dimension, volume and curvature is unavoidable. Whether the known lower bound of O(k/ε² + log(1/δ)/ε²) for the sample complexity of Empirical Risk minimization on k−means applied to data in a unit ball of arbitrary dimension is tight, has been an open question since 1997 [3]. Here ε is the desired bound on the error and δ is a bound on the probability of failure. We improve the best currently known upper bound [14] of O(k²/ε² + log(1/δ)/ε²) to O((k/ε²) min(k, log⁴(k/ε)) + log(1/δ)/ε²). Based on these results, we devise a simple algorithm for k−means and another that uses a family of convex programs to fit a piecewise linear curve of a specified length to high dimensional data, where the sample complexity is independent of the ambient dimension. 1
4 0.46624207 220 nips-2010-Random Projection Trees Revisited
Author: Aman Dhesi, Purushottam Kar
Abstract: The Random Projection Tree (RPTree) structures proposed in [1] are space partitioning data structures that automatically adapt to various notions of intrinsic dimensionality of data. We prove new results for both the RPTree-Max and the RPTree-Mean data structures. Our result for RPTree-Max gives a near-optimal bound on the number of levels required by this data structure to reduce the size of its cells by a factor s ≥ 2. We also prove a packing lemma for this data structure. Our final result shows that low-dimensional manifolds have bounded Local Covariance Dimension. As a consequence we show that RPTree-Mean adapts to manifold dimension as well.
5 0.41989148 85 nips-2010-Exact learning curves for Gaussian process regression on large random graphs
Author: Matthew Urry, Peter Sollich
Abstract: We study learning curves for Gaussian process regression which characterise performance in terms of the Bayes error averaged over datasets of a given size. Whilst learning curves are in general very difficult to calculate we show that for discrete input domains, where similarity between input points is characterised in terms of a graph, accurate predictions can be obtained. These should in fact become exact for large graphs drawn from a broad range of random graph ensembles with arbitrary degree distributions where each input (node) is connected only to a finite number of others. Our approach is based on translating the appropriate belief propagation equations to the graph ensemble. We demonstrate the accuracy of the predictions for Poisson (Erdos-Renyi) and regular random graphs, and discuss when and why previous approximations of the learning curve fail. 1
6 0.41269735 195 nips-2010-Online Learning in The Manifold of Low-Rank Matrices
7 0.35662085 204 nips-2010-Penalized Principal Component Regression on Graphs for Analysis of Subnetworks
8 0.35633132 155 nips-2010-Learning the context of a category
9 0.35525617 147 nips-2010-Learning Multiple Tasks with a Sparse Matrix-Normal Penalty
10 0.32939103 236 nips-2010-Semi-Supervised Learning with Adversarially Missing Label Information
11 0.3227559 25 nips-2010-Active Learning by Querying Informative and Representative Examples
12 0.31202507 2 nips-2010-A Bayesian Approach to Concept Drift
13 0.30742565 100 nips-2010-Gaussian Process Preference Elicitation
14 0.29106194 23 nips-2010-Active Instance Sampling via Matrix Partition
15 0.29030821 280 nips-2010-Unsupervised Kernel Dimension Reduction
16 0.28558877 233 nips-2010-Scrambled Objects for Least-Squares Regression
17 0.27940583 33 nips-2010-Approximate inference in continuous time Gaussian-Jump processes
18 0.27741992 107 nips-2010-Global seismic monitoring as probabilistic inference
19 0.27519983 105 nips-2010-Getting lost in space: Large sample analysis of the resistance distance
20 0.27416378 27 nips-2010-Agnostic Active Learning Without Constraints
topicId topicWeight
[(6, 0.188), (13, 0.052), (27, 0.066), (30, 0.054), (35, 0.03), (45, 0.197), (50, 0.042), (52, 0.03), (60, 0.044), (64, 0.011), (69, 0.026), (77, 0.045), (78, 0.017), (82, 0.018), (90, 0.043)]
simIndex simValue paperId paperTitle
same-paper 1 0.84330118 114 nips-2010-Humans Learn Using Manifolds, Reluctantly
Author: Tim Rogers, Chuck Kalish, Joseph Harrison, Xiaojin Zhu, Bryan R. Gibson
Abstract: When the distribution of unlabeled data in feature space lies along a manifold, the information it provides may be used by a learner to assist classification in a semi-supervised setting. While manifold learning is well-known in machine learning, the use of manifolds in human learning is largely unstudied. We perform a set of experiments which test a human’s ability to use a manifold in a semi-supervised learning task, under varying conditions. We show that humans may be encouraged into using the manifold, overcoming the strong preference for a simple, axis-parallel linear boundary. 1
2 0.83431357 38 nips-2010-Batch Bayesian Optimization via Simulation Matching
Author: Javad Azimi, Alan Fern, Xiaoli Z. Fern
Abstract: Bayesian optimization methods are often used to optimize unknown functions that are costly to evaluate. Typically, these methods sequentially select inputs to be evaluated one at a time based on a posterior over the unknown function that is updated after each evaluation. In many applications, however, it is desirable to perform multiple evaluations in parallel, which requires selecting batches of multiple inputs to evaluate at once. In this paper, we propose a novel approach to batch Bayesian optimization, providing a policy for selecting batches of inputs with the goal of optimizing the function as efficiently as possible. The key idea is to exploit the availability of high-quality and efficient sequential policies, by using Monte-Carlo simulation to select input batches that closely match their expected behavior. Our experimental results on six benchmarks show that the proposed approach significantly outperforms two baselines and can lead to large advantages over a top sequential approach in terms of performance per unit time. 1
3 0.76691395 63 nips-2010-Distributed Dual Averaging In Networks
Author: Alekh Agarwal, Martin J. Wainwright, John C. Duchi
Abstract: The goal of decentralized optimization over a network is to optimize a global objective formed by a sum of local (possibly nonsmooth) convex functions using only local computation and communication. We develop and analyze distributed algorithms based on dual averaging of subgradients, and provide sharp bounds on their convergence rates as a function of the network size and topology. Our analysis clearly separates the convergence of the optimization algorithm itself from the effects of communication constraints arising from the network structure. We show that the number of iterations required by our algorithm scales inversely in the spectral gap of the network. The sharpness of this prediction is confirmed both by theoretical lower bounds and simulations for various networks. 1
4 0.76683843 155 nips-2010-Learning the context of a category
Author: Dan Navarro
Abstract: This paper outlines a hierarchical Bayesian model for human category learning that learns both the organization of objects into categories, and the context in which this knowledge should be applied. The model is fit to multiple data sets, and provides a parsimonious method for describing how humans learn context specific conceptual representations.
5 0.76635283 117 nips-2010-Identifying graph-structured activation patterns in networks
Author: James Sharpnack, Aarti Singh
Abstract: We consider the problem of identifying an activation pattern in a complex, large-scale network that is embedded in very noisy measurements. This problem is relevant to several applications, such as identifying traces of a biochemical spread by a sensor network, expression levels of genes, and anomalous activity or congestion in the Internet. Extracting such patterns is a challenging task, especially if the network is large (the pattern is very high-dimensional) and the noise is so excessive that it masks the activity at any single node. However, typically there are statistical dependencies in the network activation process that can be leveraged to fuse the measurements of multiple nodes and enable reliable extraction of high-dimensional noisy patterns. In this paper, we analyze an estimator based on the graph Laplacian eigenbasis, and establish the limits of mean square error recovery of noisy patterns arising from a probabilistic (Gaussian or Ising) model based on an arbitrary graph structure. We consider both deterministic and probabilistic network evolution models, and our results indicate that by leveraging the network interaction structure, it is possible to consistently recover high-dimensional patterns even when the noise variance increases with network size. 1
6 0.76470226 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models
7 0.76402247 92 nips-2010-Fast global convergence rates of gradient methods for high-dimensional statistical recovery
8 0.76270878 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing
9 0.7624374 87 nips-2010-Extended Bayesian Information Criteria for Gaussian Graphical Models
10 0.76219803 7 nips-2010-A Family of Penalty Functions for Structured Sparsity
11 0.76191866 265 nips-2010-The LASSO risk: asymptotic results and real world examples
12 0.76112384 12 nips-2010-A Primal-Dual Algorithm for Group Sparse Regularization with Overlapping Groups
13 0.76104361 148 nips-2010-Learning Networks of Stochastic Differential Equations
14 0.76054084 260 nips-2010-Sufficient Conditions for Generating Group Level Sparsity in a Robust Minimax Framework
15 0.76048595 158 nips-2010-Learning via Gaussian Herding
16 0.76000434 196 nips-2010-Online Markov Decision Processes under Bandit Feedback
17 0.75967991 27 nips-2010-Agnostic Active Learning Without Constraints
18 0.75951844 287 nips-2010-Worst-Case Linear Discriminant Analysis
19 0.75945008 280 nips-2010-Unsupervised Kernel Dimension Reduction
20 0.75929445 282 nips-2010-Variable margin losses for classifier design