nips nips2009 nips2009-169 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Kai Yu, Tong Zhang, Yihong Gong
Abstract: This paper introduces a new method for semi-supervised learning on high dimensional nonlinear manifolds, which includes a phase of unsupervised basis learning and a phase of supervised function learning. The learned bases provide a set of anchor points to form a local coordinate system, such that each data point x on the manifold can be locally approximated by a linear combination of its nearby anchor points, and the linear weights become its local coordinate coding. We show that a high dimensional nonlinear function can be approximated by a global linear function with respect to this coding scheme, and the approximation quality is ensured by the locality of such coding. The method turns a difficult nonlinear learning problem into a simple global linear learning problem, which overcomes some drawbacks of traditional local learning methods. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract This paper introduces a new method for semi-supervised learning on high dimensional nonlinear manifolds, which includes a phase of unsupervised basis learning and a phase of supervised function learning. [sent-7, score-0.353]
2 The learned bases provide a set of anchor points to form a local coordinate system, such that each data point x on the manifold can be locally approximated by a linear combination of its nearby anchor points, and the linear weights become its local coordinate coding. [sent-8, score-2.149]
3 We show that a high dimensional nonlinear function can be approximated by a global linear function with respect to this coding scheme, and the approximation quality is ensured by the locality of such coding. [sent-9, score-1.185]
4 The method turns a difficult nonlinear learning problem into a simple global linear learning problem, which overcomes some drawbacks of traditional local learning methods. [sent-10, score-0.426]
5 1 Introduction Consider the problem of learning a nonlinear function f (x) on a high dimensional space x ∈ Rd . [sent-11, score-0.243]
6 This is because although data are physically represented in a high-dimensional space, they often lie on a manifold which has a much smaller intrinsic dimensionality. [sent-21, score-0.422]
7 This paper proposes a new method that can take advantage of the manifold geometric structure to learn a nonlinear function in high dimension. [sent-22, score-0.455]
8 The main idea is to locally embed points on the manifold into a lower dimensional space, expressed as coordinates with respect to a set of anchor points. [sent-23, score-0.685]
9 Our main observation is simple but very important: we show that a nonlinear function on the manifold can be effectively approximated by a linear function with such a coding under appropriate localization conditions. [sent-24, score-1.143]
10 Therefore using Local Coordinate Coding, we turn a very difficult high dimensional nonlinear learning problem into a much simpler linear learning problem, which has been extensively studied in the literature. [sent-25, score-0.279]
11 This idea may also be considered as a high dimensional generalization of low dimensional local smoothing methods in the traditional statistical literature. [sent-26, score-0.454]
12 However, in many practical applications, one often observes that the data we are interested in approximately lie on a manifold M which is embedded into Rd. [sent-35, score-0.361]
13 Although d is large, the intrinsic dimensionality of M can be much smaller. [sent-36, score-0.191]
14 Therefore if we are only interested in learning f (x) on M, then the complexity should depend on the intrinsic dimensionality of M instead of d. [sent-37, score-0.212]
15 In this paper, we approach this problem by introducing the idea of localized coordinate coding. [sent-38, score-0.381]
16 The formal definition of (non-localized) coordinate coding is given below, where we represent a point in Rd by a linear combination of a set of “anchor points”. [sent-39, score-1.052]
17 Later we show it is sufficient to choose a set of “anchor points” with cardinality depending on the intrinsic dimensionality of the manifold rather than d. [sent-40, score-0.458]
18 Definition 2.2 (Coordinate Coding) A coordinate coding is a pair (γ, C), where C ⊂ Rd is a set of anchor points, and γ is a map of x ∈ Rd to [γv(x)]v∈C ∈ R|C| such that Σv∈C γv(x) = 1. [sent-42, score-1.232]
19 Moreover, for all x ∈ Rd, we define the corresponding coding norm as ‖x‖γ = (Σv∈C γv(x)²)^{1/2}. [sent-44, score-0.673]
20 The condition Σv∈C γv(x) = 1 follows from the shift-invariance requirement, which means that the coding should remain the same if we use a different origin of the Rd coordinate system for representing data points. [sent-46, score-1.061]
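As a concrete illustration of this definition, the sketch below computes one possible coding γ(x) of a point given a set of anchor points by solving a constrained least-squares problem (LLE-style reconstruction weights), together with the coding norm. The function names, the regularization term, and the solver choice are our own assumptions, not the paper's algorithm.

```python
import numpy as np

def coordinate_coding(x, anchors, reg=1e-6):
    """One possible coding of x over the anchors (rows of `anchors`),
    with the shift-invariance constraint sum_v gamma_v(x) = 1."""
    k = len(anchors)
    Z = anchors - x                          # center the anchors at x
    G = Z @ Z.T                              # local Gram matrix, shape (k, k)
    G = G + reg * np.trace(G) * np.eye(k)    # small ridge term for numerical stability
    w = np.linalg.solve(G, np.ones(k))
    return w / w.sum()                       # normalize so the weights sum to one

def coding_norm(gamma):
    """Coding norm ||x||_gamma = (sum_v gamma_v(x)^2)^(1/2)."""
    return float(np.sqrt(np.sum(gamma ** 2)))
```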
21 The importance of the coordinate coding concept is that if a coordinate coding is sufficiently localized, then a nonlinear function can be approximated by a linear function with respect to the coding. [sent-48, score-2.263]
22 Lemma 2.1 (Linearization) Let (γ, C) be an arbitrary coordinate coding on Rd and let f be an (α, β, p)-Lipschitz smooth function; then for all x ∈ Rd, |f(x) − Σv∈C γv(x) f(v)| ≤ α ‖x − γ(x)‖ + β Σv∈C |γv(x)| ‖v − γ(x)‖^{1+p}, where γ(x) = Σv∈C γv(x) v is the physical approximation of x. [sent-52, score-1.016]
23 To understand this result, we note that on the left hand side, a nonlinear function f(x) in Rd is approximated by a linear function Σv∈C γv(x) f(v) with respect to the coding γ(x), where [f(v)]v∈C is the set of coefficients to be estimated from data. [sent-55, score-0.893]
24 The quality of this approximation is bounded by the right hand side, which has two terms: the first term ‖x − γ(x)‖ means x should be close to its physical approximation γ(x), and the second term means that the coding should be localized. [sent-56, score-0.703]
25 The quality of a coding γ with respect to C can be measured by the right hand side. [sent-57, score-0.687]
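A toy numerical check of the linearization idea, building on the `coordinate_coding` sketch above: when only nearby anchors receive weight, the linear functional Σv γv(x) f(v) tracks f(x) for a smooth f. The particular function and anchor set below are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(0)
anchors = rng.uniform(-1.0, 1.0, size=(50, 2))        # anchor points in R^2 (illustrative)
f = lambda z: np.sin(z[..., 0]) * np.cos(z[..., 1])   # an arbitrary smooth function

x = np.array([0.3, -0.2])
nearest = anchors[np.argsort(np.linalg.norm(anchors - x, axis=1))[:5]]  # keep the coding local
gamma = coordinate_coding(x, nearest)                 # sketch defined earlier
print(f(x), gamma @ f(nearest))                       # the two values should be close
```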
26 Definition 2.3 (Localization Measure) Given α, β, p, and coding (γ, C), we define Qα,β,p(γ, C) = Ex[ α ‖x − γ(x)‖ + β Σv∈C |γv(x)| ‖v − γ(x)‖^{1+p} ]. [sent-60, score-0.635]
27 Since the quality function Qα,β,p(γ, C) only depends on unlabeled data, in principle, we can find (γ, C) by optimizing this quality using unlabeled data. [sent-62, score-0.22]
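A hedged sketch of how this localization measure could be estimated empirically on unlabeled samples; `coding_fn` stands for any map from a point to its coding, for example the `coordinate_coding` sketch above.

```python
import numpy as np

def localization_measure(X, anchors, coding_fn, alpha=1.0, beta=1.0, p=1.0):
    """Empirical estimate of Q_{alpha,beta,p}(gamma, C) on unlabeled points X of shape (n, d)."""
    total = 0.0
    for x in X:
        gamma = coding_fn(x, anchors)
        x_approx = gamma @ anchors                      # gamma(x) = sum_v gamma_v(x) v
        fit = np.linalg.norm(x - x_approx)              # ||x - gamma(x)||
        locality = np.sum(np.abs(gamma) *
                          np.linalg.norm(anchors - x_approx, axis=1) ** (1.0 + p))
        total += alpha * fit + beta * locality
    return total / len(X)
```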
28 Next we show that if the data lie on a manifold, then the complexity of local coordinate coding depends on the intrinsic manifold dimensionality instead of d. [sent-64, score-1.712]
29 Definition 2.4 (Manifold) A subset M ⊂ Rd is called a p-smooth (p > 0) manifold with intrinsic dimensionality m = m(M) if there exists a constant cp(M) such that, given any x ∈ M, there exist m vectors v1(x), . . . , vm(x) ∈ Rd so that for all x′ ∈ M, inf_{γ∈Rm} ‖x′ − x − Σ_{j=1..m} γj vj(x)‖ ≤ cp(M) ‖x′ − x‖^{1+p}. [sent-67, score-0.474]
30 The smooth manifold structure implies that one can approximate a point in M effectively using local coordinate coding. [sent-72, score-0.832]
31 Note that for a typical manifold with well-defined curvature, we can take p = 1. [sent-73, score-0.243]
32 For a compact manifold with intrinsic dimensionality m, there exists a constant c(M) such that its covering number is bounded by N(ε, M) ≤ c(M) ε^{−m}. [sent-78, score-0.49]
33 The following result shows that there exists a local coordinate coding to a set of anchor points C of cardinality O(m(M) N(ε, M)) such that any (α, β, p)-Lipschitz smooth function can be linearly approximated using local coordinate coding up to the accuracy O(√m(M) ε^{1+p}). [sent-79, score-2.724]
34 Theorem 2.1 (Manifold Coding) If the data points x lie on a compact p-smooth manifold M, and the norm is defined as ‖x‖ = (xᵀAx)^{1/2} for some positive definite matrix A. [sent-81, score-0.436]
35 Then given any ε > 0, there exist anchor points C ⊂ M and coding γ such that |C| ≤ (1 + m(M)) N(ε, M) and Qα,β,p(γ, C) ≤ [α cp(M) + (1 + √m(M) + 2^{1+p} √m(M)) β] ε^{1+p}; moreover, for all x ∈ M, we have ‖x‖²γ ≤ 1 + (1 + √m(M))². [sent-82, score-0.902]
36 Although this result is proved for manifolds, it is important to observe that the coordinate coding method proposed in this paper does not require the data to lie precisely on a manifold, and it does not require knowing m(M). [sent-87, score-1.113]
37 In the next section, we characterize the learning complexity of the local coordinate coding method. [sent-89, score-1.182]
38 It implies that linear prediction methods can be used to effectively learn nonlinear functions on a manifold. [sent-90, score-0.205]
39 The nonlinearity is fully captured by the coordinate coding map γ (which can be a nonlinear function). [sent-91, score-1.185]
40 This approach has some great advantages because the problem of finding a local coordinate coding is much simpler than direct nonlinear learning: • Learning (γ, C) only requires unlabeled data, and the amount of unlabeled data can be significantly larger than the amount of labeled data. [sent-92, score-1.523]
41 • In practice, we do not have to find the optimal coding because the coordinates are merely features for linear supervised learning. [sent-94, score-0.763]
42 Consequently, it is more robust than some standard approaches to nonlinear learning that directly optimize nonlinear functions on labeled data (e.g. [sent-96, score-0.388]
43 The local coordinate coding method considers a linear approximation of functions in Fα,β,p on the data manifold. [sent-101, score-1.264]
44 Consider coordinate coding (γ, C), and the estimation method (1) with random training examples Sn = {(x1, y1), . . . , (xn, yn)}. [sent-110, score-1.016]
45 It shows that the algorithm can learn an arbitrary nonlinear function on the manifold when n → ∞. [sent-123, score-0.412]
46 Theorem 3.1 implies that the convergence only depends on the intrinsic dimensionality of the manifold M, not d. [sent-125, score-0.454]
47 Theorem 3.2 (Consistency) Suppose the data lie on a compact manifold M ⊂ Rd, and the norm ‖·‖ is the Euclidean norm in Rd. [sent-127, score-0.423]
48 Then it is possible to find coding (γ, C) using unlabeled data such that |C|/n → 0 and Qα,β,p (γ, C) → 0. [sent-130, score-0.721]
49 Then the local coordinate coding method (1) with g(v) ≡ 0 is consistent as n → ∞: lim_{n→∞} E_{Sn} E_{x,y} φ(f(ŵ, x), y) = inf_{f: M→R} E_{x,y} φ(f(x), y). [sent-132, score-1.225]
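For concreteness, the sketch below instantiates the supervised step under the assumptions that the loss φ is the square loss and g(v) ≡ 0, in which case method (1) reduces to ridge regression on the coding features [γv(x)]v∈C. This is an illustrative reading, not the paper's exact estimator.

```python
import numpy as np

def fit_linear_on_codes(X, y, anchors, coding_fn, lam=1e-3):
    """Ridge regression on the coding features: f_hat(x) = gamma(x) @ w."""
    Gamma = np.stack([coding_fn(x, anchors) for x in X])     # (n, |C|) feature matrix
    k = Gamma.shape[1]
    w = np.linalg.solve(Gamma.T @ Gamma + lam * np.eye(k), Gamma.T @ y)
    return w

def predict(x, anchors, coding_fn, w):
    return coding_fn(x, anchors) @ w
```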
50 4 Practical Learning of Coding Given a coordinate coding (γ, C), we can use (1) to learn a nonlinear function in Rd. [sent-133, score-1.185]
51 The formulation is related to sparse coding [6], which has no locality constraints and corresponds to setting p = −1. [sent-137, score-0.869]
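The sketch below illustrates the locally constrained sparse coding idea with a simple alternating scheme: a projected-gradient update of γ under a locality-weighted penalty, and a least-squares update of the anchors. The objective, step sizes, and update rules are our own simplifications, not the paper's formulation (2) or its optimizer.

```python
import numpy as np

def lcc_gamma_step(x, C, mu=0.1, n_iter=200, lr=1e-3):
    """Projected gradient on ||x - sum_v g_v v||^2 + mu * sum_v |g_v| ||v - x||^2,
    keeping the shift-invariance constraint sum_v g_v = 1."""
    k = C.shape[0]
    g = np.full(k, 1.0 / k)
    dist2 = np.sum((C - x) ** 2, axis=1)       # locality penalties ||v - x||^2
    for _ in range(n_iter):
        grad = 2.0 * C @ (C.T @ g - x) + mu * np.sign(g) * dist2
        g = g - lr * grad
        g = g - (g.sum() - 1.0) / k            # project back onto {g : sum_v g_v = 1}
    return g

def lcc_anchor_step(X, Gamma):
    """Least-squares update of the anchors given codings: minimize ||X - Gamma C||_F^2."""
    return np.linalg.lstsq(Gamma, X, rcond=None)[0]
```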
52 The first class of related methods is nonlinear manifold learning, such as LLE [8], Isomap [9], and Laplacian Eigenmaps [1]. [sent-144, score-0.412]
53 These methods find global coordinates of the data manifold based on a pre-computed affinity graph of data points. [sent-145, score-0.374]
54 Our method learns a compact set of bases to form local coordinates, which has a linear complexity with respect to data size and can naturally handle unseen data. [sent-147, score-0.424]
55 More importantly, local coordinate coding has a direct connection to nonlinear function approximation on the manifold, and thus provides a theoretically sound unsupervised pre-training method to facilitate further supervised learning tasks. [sent-148, score-1.472]
56 Another set of related models comprises the local models in statistics, such as local kernel smoothing and local regression, e.g. [sent-149, score-0.647]
57 Local kernel smoothing can be regarded as a zero-order method, while local regression is higher-order, with local linear regression as the first-order case. [sent-152, score-0.641]
58 Traditional local methods are not widely used in machine learning practice, because data with a non-uniform distribution on the manifold require the use of adaptive-bandwidth kernels. [sent-153, score-0.434]
59 Our method can be seen as a generalized first-order local method with basis learning and adaptive locality. [sent-157, score-0.196]
60 Compared to local linear regression, learning here is achieved by fitting a single global linear function with respect to a set of learned local coordinates, which is much less prone to overfitting and computationally much cheaper. [sent-158, score-0.455]
61 This means that our method achieves a better balance between local and global aspects of learning. [sent-159, score-0.19]
62 Finally, local coordinate coding draws connections to vector quantization (VQ) coding, e. [sent-161, score-1.182]
63 Learning linear functions of VQ codes can be regarded as a generalized zero-order local method with basis learning. [sent-164, score-0.317]
64 In fact, we can regard local coordinate coding as locally constrained sparse coding. [sent-166, score-1.31]
65 However, to the best of our knowledge, there is no analysis in the literature that directly answers the question why sparse codes can help learning nonlinear functions in high dimensional space. [sent-168, score-0.379]
66 Our work reveals an important finding: a good first-order approximation to a nonlinear function requires the codes to be local, which in turn requires the codes to be sparse. [sent-169, score-0.308]
67 Our experiments demonstrate that sparse coding is helpful for learning only when the codes are local. [sent-171, score-0.771]
68 1 Synthetic Data Our first example is based on a synthetic data set, where a nonlinear function is defined on a Swiss-roll manifold, as shown in Figure 1-(1). [sent-176, score-0.222]
69 We randomly sample 50,000 data points on the manifold for unsupervised basis learning, and 500 labeled points for supervised regression. [sent-192, score-0.505]
70 The learned nonlinear functions are tested on another set of 10,000 data points, with performance evaluated by root mean square error (RMSE). [sent-194, score-0.194]
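A hedged sketch of this synthetic setup using scikit-learn's Swiss-roll generator; the target function f below is a placeholder defined through the manifold parameter t, since the paper's exact nonlinear function is not reproduced here.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll

X_unlab, _ = make_swiss_roll(n_samples=50_000, random_state=0)     # for unsupervised basis learning
X_lab, t_lab = make_swiss_roll(n_samples=500, random_state=1)      # labeled points
X_test, t_test = make_swiss_roll(n_samples=10_000, random_state=2) # held-out evaluation points

f = np.sin                           # placeholder nonlinear target on the manifold parameter t
y_lab, y_test = f(t_lab), f(t_test)

def rmse(y_pred, y_true):
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))
```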
71 In the first setting, we let both coding methods use the same set of fixed bases, which are 128 points randomly sampled from the manifold. [sent-195, score-0.686]
72 The sparse coding based approach fails to capture the nonlinear function, while local coordinate coding behaves much better. [sent-197, score-1.986]
73 We take a closer look at the data representations obtained from the two different encoding methods, by visualizing the distributions of distances from encoded data to bases that have positive, negative, or zero coefficients in Figure 2. [sent-198, score-0.243]
74 It shows that sparse coding lets bases far away from the encoded data have nonzero coefficients, while local coordinate coding allows only nearby bases to get nonzero coefficients. [sent-199, score-2.316]
75 In other words, sparse coding on this data does not ensure good locality and thus fails to facilitate nonlinear function learning. [sent-200, score-1.083]
76 As another interesting phenomenon, local coordinate coding seems to encourage coefficients to be nonnegative, which is intuitively understandable — if we use several bases close to a data point to linearly approximate the point, each basis should have a positive contribution. [sent-201, score-1.38]
77 In the next two experiments, given the random bases as a common initialization, we let the two algorithms learn bases from the 50,000 unlabeled data points. [sent-203, score-0.372]
78 The regression results based on the learned bases are depicted in Figure 1-(4) and (5), which indicate that the regression error is further reduced for local coordinate coding, but remains high for sparse coding. [sent-204, score-0.888]
79 We also make a comparison with local kernel smoothing, which takes a weighted average of the function values of the K nearest training points to make a prediction. [sent-205, score-0.26]
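A minimal sketch of this local kernel smoothing baseline, assuming a Gaussian kernel with bandwidth h restricted to the K nearest labeled points; the kernel choice and parameters are assumptions.

```python
import numpy as np

def kernel_smoothing_predict(x, X_train, y_train, K=10, h=1.0):
    """Zero-order local method: weighted average of the K nearest training labels."""
    d = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(d)[:K]                        # K nearest training points
    w = np.exp(-d[idx] ** 2 / (2.0 * h ** 2))      # Gaussian kernel weights
    return float(np.sum(w * y_train[idx]) / np.sum(w))
```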
80 As shown in Figure 1-(6), the method works very well on this simple low-dimensional data, even outperforming the local coordinate coding approach. [sent-206, score-1.211]
81 However, if we increase the data dimensionality to 256 by adding 253 dimensions of independent Gaussian noise with zero mean and unit variance, local coordinate coding becomes superior to local kernel smoothing, as shown in Figure 1-(7) and (8). [sent-207, score-1.537]
82 This is consistent with our theory, which suggests that local coordinate coding can work well in high dimension; on the other hand, local kernel smoothing is known to suffer from high dimensionality and noise. [sent-208, score-1.631]
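Continuing the Swiss-roll sketch above, this high-dimensional variant can be simulated by padding each 3-D point with 253 independent N(0, 1) noise dimensions to reach d = 256.

```python
import numpy as np

rng = np.random.default_rng(3)
X_lab_hd = np.hstack([X_lab, rng.normal(size=(X_lab.shape[0], 253))])     # 3 + 253 = 256 dims
X_test_hd = np.hstack([X_test, rng.normal(size=(X_test.shape[0], 253))])
```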
83 In our setting, the set C of anchor points is obtained from sparse coding, with the regularization on the coefficients. (Figure 2: Coding locality on the Swiss roll: (a) sparse coding vs. (b) local coordinate coding.) [sent-211, score-1.223]
84 Our focus here is not on anchor point learning, but rather on checking whether a good nonlinear classifier can be obtained if we enforce sparsity and locality in data representation, and then apply simple one-against-all linear SVMs. [sent-214, score-0.593]
85 Since the optimization cost of sparse coding is invariant under flipping the sign of v, we take a postprocessing step to change the sign of v if we find that the corresponding γv(x) is negative for most x. [sent-215, score-0.722]
86 This rectification ensures that the anchor points lie on the data manifold. [sent-216, score-0.292]
87 With the obtained C, for each data point x we solve the local coordinate coding problem (2), by optimizing γ only, to obtain the representation [γv (x)]v∈C . [sent-217, score-1.233]
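A hedged sketch of this MNIST pipeline: anchors learned by a sparse coder (with the sign rectification described above), per-point local coordinate codes, then one-against-all linear SVMs. scikit-learn's dictionary learner and `LinearSVC` are stand-ins for the paper's actual sparse coding formulation and SVM setup, and the parameters are illustrative.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.svm import LinearSVC

def learn_anchors(X_unlabeled, n_anchors=512):
    dico = MiniBatchDictionaryLearning(n_components=n_anchors, alpha=1.0, random_state=0)
    codes = dico.fit_transform(X_unlabeled)       # sparse codes of the unlabeled data
    C = dico.components_.copy()                   # candidate anchor points, shape (n_anchors, d)
    flip = (codes > 0).mean(axis=0) < 0.5         # rectify v if its coefficient is negative for most x
    C[flip] *= -1.0
    return C

def encode(X, C):
    # per-point local coordinate codes, e.g. via the lcc_gamma_step sketch above
    return np.stack([lcc_gamma_step(x, C) for x in X])

# clf = LinearSVC().fit(encode(X_train, C), y_train)   # one-against-all linear SVMs
```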
88 We note that, like most other manifold learning approaches, Laplacian eigenmaps and LLE are transductive methods which have to incorporate both training and testing data in training. [sent-221, score-0.32]
89 Both sparse coding and local coordinate coding perform quite well on this nonlinear classification task, significantly outperforming linear classifiers on raw images. [sent-223, score-2.166]
90 In addition, local coordinate coding is consistently better than sparse coding across various basis sizes. [sent-224, score-1.934]
91 This locality explains why sparse coding works well on MNIST data. [sent-226, score-0.869]
92 On the other hand, local coordinate coding is able to remove the unusual coefficients and further improve the locality. [sent-227, score-1.182]
93 Figure 3: Coding locality on MNIST: (a) sparse coding vs. (b) local coordinate coding. [sent-234, score-0.869]
94 Table: classification error rates of linear SVM with sparse coding vs. linear SVM with local coordinate coding at different basis sizes |C|. [sent-237, score-1.904]
95 Table: error rates of linear SVM with raw images, linear SVM with VQ coding, local kernel smoothing, linear SVM with Laplacian eigenmap, linear SVM with LLE, linear classifier with deep belief network, linear SVM with sparse coding, and linear SVM with local coordinate coding. [sent-246, score-2.787]
96 Conclusion This paper introduces a new method for high dimensional nonlinear learning with data distributed on manifolds. [sent-254, score-0.268]
97 The method can be seen as generalized local linear function approximation, but is achieved by learning a global linear function with respect to coordinates obtained from unsupervised local coordinate coding. [sent-255, score-0.937]
98 Compared to popular manifold learning methods, our approach can naturally handle unseen data and has a linear complexity with respect to data size. [sent-256, score-0.355]
99 The work also generalizes popular VQ coding and sparse coding schemes, and reveals that locality of coding is essential for supervised function learning. [sent-257, score-2.194]
100 The generalization performance depends on the intrinsic dimensionality of the data manifold. [sent-258, score-0.262]
wordName wordTfidf (topN-words)
[('coding', 0.635), ('coordinate', 0.381), ('manifold', 0.243), ('anchor', 0.216), ('nonlinear', 0.169), ('local', 0.166), ('locality', 0.147), ('rd', 0.147), ('bases', 0.143), ('rmse', 0.107), ('smoothing', 0.106), ('intrinsic', 0.103), ('wv', 0.097), ('vq', 0.097), ('dimensionality', 0.088), ('sparse', 0.087), ('lle', 0.066), ('unlabeled', 0.061), ('coordinates', 0.057), ('svm', 0.053), ('lipschitz', 0.053), ('eigenmaps', 0.052), ('dimensional', 0.051), ('lie', 0.051), ('points', 0.051), ('mnist', 0.05), ('linearization', 0.05), ('codes', 0.049), ('cients', 0.047), ('unsupervised', 0.045), ('deep', 0.044), ('regression', 0.044), ('esn', 0.044), ('swissroll', 0.044), ('coef', 0.044), ('inf', 0.043), ('laplacian', 0.043), ('kernel', 0.043), ('smooth', 0.042), ('locally', 0.041), ('cp', 0.04), ('alexis', 0.039), ('rajat', 0.039), ('norm', 0.038), ('curse', 0.038), ('tting', 0.036), ('linear', 0.036), ('regarded', 0.036), ('honglak', 0.036), ('nec', 0.036), ('supervised', 0.035), ('unitary', 0.033), ('localization', 0.033), ('ruslan', 0.031), ('ex', 0.031), ('traditional', 0.031), ('digit', 0.03), ('laboratories', 0.03), ('battle', 0.03), ('basis', 0.03), ('raina', 0.029), ('outperforming', 0.029), ('handwritten', 0.028), ('raw', 0.028), ('covering', 0.028), ('compact', 0.028), ('encoded', 0.028), ('synthetic', 0.028), ('classi', 0.027), ('belief', 0.027), ('approximated', 0.027), ('generalization', 0.026), ('respect', 0.026), ('theorem', 0.026), ('manifolds', 0.026), ('nonzero', 0.026), ('optimizing', 0.026), ('quality', 0.026), ('nition', 0.026), ('pick', 0.025), ('prone', 0.025), ('loss', 0.025), ('data', 0.025), ('labeled', 0.025), ('america', 0.024), ('simpli', 0.024), ('cardinality', 0.024), ('global', 0.024), ('high', 0.023), ('ridge', 0.023), ('distances', 0.022), ('interested', 0.021), ('approximation', 0.021), ('nearby', 0.021), ('observe', 0.021), ('geometric', 0.02), ('depends', 0.02), ('facilitate', 0.02), ('origin', 0.02), ('reveals', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999994 169 nips-2009-Nonlinear Learning using Local Coordinate Coding
Author: Kai Yu, Tong Zhang, Yihong Gong
Abstract: This paper introduces a new method for semi-supervised learning on high dimensional nonlinear manifolds, which includes a phase of unsupervised basis learning and a phase of supervised function learning. The learned bases provide a set of anchor points to form a local coordinate system, such that each data point x on the manifold can be locally approximated by a linear combination of its nearby anchor points, and the linear weights become its local coordinate coding. We show that a high dimensional nonlinear function can be approximated by a global linear function with respect to this coding scheme, and the approximation quality is ensured by the locality of such coding. The method turns a difficult nonlinear learning problem into a simple global linear learning problem, which overcomes some drawbacks of traditional local learning methods. 1
2 0.17398392 163 nips-2009-Neurometric function analysis of population codes
Author: Philipp Berens, Sebastian Gerwinn, Alexander Ecker, Matthias Bethge
Abstract: The relative merits of different population coding schemes have mostly been analyzed in the framework of stimulus reconstruction using Fisher Information. Here, we consider the case of stimulus discrimination in a two alternative forced choice paradigm and compute neurometric functions in terms of the minimal discrimination error and the Jensen-Shannon information to study neural population codes. We first explore the relationship between minimum discrimination error, JensenShannon Information and Fisher Information and show that the discrimination framework is more informative about the coding accuracy than Fisher Information as it defines an error for any pair of possible stimuli. In particular, it includes Fisher Information as a special case. Second, we use the framework to study population codes of angular variables. Specifically, we assess the impact of different noise correlations structures on coding accuracy in long versus short decoding time windows. That is, for long time window we use the common Gaussian noise approximation. To address the case of short time windows we analyze the Ising model with identical noise correlation structure. In this way, we provide a new rigorous framework for assessing the functional consequences of noise correlation structures for the representational accuracy of neural population codes that is in particular applicable to short-time population coding. 1
3 0.16798502 52 nips-2009-Code-specific policy gradient rules for spiking neurons
Author: Henning Sprekeler, Guillaume Hennequin, Wulfram Gerstner
Abstract: Although it is widely believed that reinforcement learning is a suitable tool for describing behavioral learning, the mechanisms by which it can be implemented in networks of spiking neurons are not fully understood. Here, we show that different learning rules emerge from a policy gradient approach depending on which features of the spike trains are assumed to influence the reward signals, i.e., depending on which neural code is in effect. We use the framework of Williams (1992) to derive learning rules for arbitrary neural codes. For illustration, we present policy-gradient rules for three different example codes - a spike count code, a spike timing code and the most general “full spike train” code - and test them on simple model problems. In addition to classical synaptic learning, we derive learning rules for intrinsic parameters that control the excitability of the neuron. The spike count learning rule has structural similarities with established Bienenstock-Cooper-Munro rules. If the distribution of the relevant spike train features belongs to the natural exponential family, the learning rules have a characteristic shape that raises interesting prediction problems. 1
4 0.15962508 164 nips-2009-No evidence for active sparsification in the visual cortex
Author: Pietro Berkes, Ben White, Jozsef Fiser
Abstract: The proposal that cortical activity in the visual cortex is optimized for sparse neural activity is one of the most established ideas in computational neuroscience. However, direct experimental evidence for optimal sparse coding remains inconclusive, mostly due to the lack of reference values on which to judge the measured sparseness. Here we analyze neural responses to natural movies in the primary visual cortex of ferrets at different stages of development and of rats while awake and under different levels of anesthesia. In contrast with prediction from a sparse coding model, our data shows that population and lifetime sparseness decrease with visual experience, and increase from the awake to anesthetized state. These results suggest that the representation in the primary visual cortex is not actively optimized to maximize sparseness. 1
5 0.13798127 104 nips-2009-Group Sparse Coding
Author: Samy Bengio, Fernando Pereira, Yoram Singer, Dennis Strelow
Abstract: Bag-of-words document representations are often used in text, image and video processing. While it is relatively easy to determine a suitable word dictionary for text documents, there is no simple mapping from raw images or videos to dictionary terms. The classical approach builds a dictionary using vector quantization over a large set of useful visual descriptors extracted from a training set, and uses a nearest-neighbor algorithm to count the number of occurrences of each dictionary word in documents to be encoded. More robust approaches have been proposed recently that represent each visual descriptor as a sparse weighted combination of dictionary words. While favoring a sparse representation at the level of visual descriptors, those methods however do not ensure that images have sparse representation. In this work, we use mixed-norm regularization to achieve sparsity at the image level as well as a small overall dictionary. This approach can also be used to encourage using the same dictionary words for all the images in a class, providing a discriminative signal in the construction of image representations. Experimental results on a benchmark image classification dataset show that when compact image or dictionary representations are needed for computational efficiency, the proposed approach yields better mean average precision in classification. 1
7 0.10739796 137 nips-2009-Learning transport operators for image manifolds
8 0.10534361 229 nips-2009-Statistical Analysis of Semi-Supervised Learning: The Limit of Infinite Unlabelled Data
9 0.10026866 157 nips-2009-Multi-Label Prediction via Compressed Sensing
10 0.094932705 83 nips-2009-Estimating image bases for visual image reconstruction from human brain activity
11 0.078861073 142 nips-2009-Locality-sensitive binary codes from shift-invariant kernels
12 0.076518461 253 nips-2009-Unsupervised feature learning for audio classification using convolutional deep belief networks
13 0.076366767 26 nips-2009-Adaptive Regularization for Transductive Support Vector Machine
14 0.075844474 224 nips-2009-Sparse and Locally Constant Gaussian Graphical Models
15 0.073295943 222 nips-2009-Sparse Estimation Using General Likelihoods and Non-Factorial Priors
16 0.07234662 238 nips-2009-Submanifold density estimation
17 0.07139343 213 nips-2009-Semi-supervised Learning using Sparse Eigenfunction Bases
18 0.071077839 146 nips-2009-Manifold Regularization for SIR with Rate Root-n Convergence
19 0.065755561 17 nips-2009-A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds
20 0.061373889 91 nips-2009-Fast, smooth and adaptive regression in metric spaces
topicId topicWeight
[(0, -0.203), (1, 0.001), (2, 0.033), (3, 0.173), (4, -0.077), (5, 0.004), (6, -0.01), (7, 0.025), (8, 0.079), (9, 0.154), (10, 0.005), (11, 0.099), (12, -0.076), (13, 0.022), (14, -0.047), (15, -0.032), (16, -0.001), (17, 0.112), (18, 0.016), (19, -0.073), (20, -0.109), (21, 0.043), (22, 0.059), (23, 0.086), (24, 0.027), (25, -0.007), (26, 0.086), (27, 0.06), (28, -0.056), (29, 0.013), (30, 0.271), (31, -0.126), (32, -0.04), (33, 0.048), (34, -0.16), (35, -0.07), (36, -0.056), (37, 0.017), (38, -0.109), (39, 0.098), (40, 0.007), (41, 0.054), (42, 0.063), (43, -0.073), (44, 0.078), (45, -0.166), (46, 0.032), (47, -0.029), (48, -0.055), (49, -0.014)]
simIndex simValue paperId paperTitle
same-paper 1 0.97387064 169 nips-2009-Nonlinear Learning using Local Coordinate Coding
Author: Kai Yu, Tong Zhang, Yihong Gong
Abstract: This paper introduces a new method for semi-supervised learning on high dimensional nonlinear manifolds, which includes a phase of unsupervised basis learning and a phase of supervised function learning. The learned bases provide a set of anchor points to form a local coordinate system, such that each data point x on the manifold can be locally approximated by a linear combination of its nearby anchor points, and the linear weights become its local coordinate coding. We show that a high dimensional nonlinear function can be approximated by a global linear function with respect to this coding scheme, and the approximation quality is ensured by the locality of such coding. The method turns a difficult nonlinear learning problem into a simple global linear learning problem, which overcomes some drawbacks of traditional local learning methods. 1
2 0.66919535 164 nips-2009-No evidence for active sparsification in the visual cortex
Author: Pietro Berkes, Ben White, Jozsef Fiser
Abstract: The proposal that cortical activity in the visual cortex is optimized for sparse neural activity is one of the most established ideas in computational neuroscience. However, direct experimental evidence for optimal sparse coding remains inconclusive, mostly due to the lack of reference values on which to judge the measured sparseness. Here we analyze neural responses to natural movies in the primary visual cortex of ferrets at different stages of development and of rats while awake and under different levels of anesthesia. In contrast with prediction from a sparse coding model, our data shows that population and lifetime sparseness decrease with visual experience, and increase from the awake to anesthetized state. These results suggest that the representation in the primary visual cortex is not actively optimized to maximize sparseness. 1
3 0.66064566 163 nips-2009-Neurometric function analysis of population codes
Author: Philipp Berens, Sebastian Gerwinn, Alexander Ecker, Matthias Bethge
Abstract: The relative merits of different population coding schemes have mostly been analyzed in the framework of stimulus reconstruction using Fisher Information. Here, we consider the case of stimulus discrimination in a two alternative forced choice paradigm and compute neurometric functions in terms of the minimal discrimination error and the Jensen-Shannon information to study neural population codes. We first explore the relationship between minimum discrimination error, JensenShannon Information and Fisher Information and show that the discrimination framework is more informative about the coding accuracy than Fisher Information as it defines an error for any pair of possible stimuli. In particular, it includes Fisher Information as a special case. Second, we use the framework to study population codes of angular variables. Specifically, we assess the impact of different noise correlations structures on coding accuracy in long versus short decoding time windows. That is, for long time window we use the common Gaussian noise approximation. To address the case of short time windows we analyze the Ising model with identical noise correlation structure. In this way, we provide a new rigorous framework for assessing the functional consequences of noise correlation structures for the representational accuracy of neural population codes that is in particular applicable to short-time population coding. 1
4 0.53839505 104 nips-2009-Group Sparse Coding
Author: Samy Bengio, Fernando Pereira, Yoram Singer, Dennis Strelow
Abstract: Bag-of-words document representations are often used in text, image and video processing. While it is relatively easy to determine a suitable word dictionary for text documents, there is no simple mapping from raw images or videos to dictionary terms. The classical approach builds a dictionary using vector quantization over a large set of useful visual descriptors extracted from a training set, and uses a nearest-neighbor algorithm to count the number of occurrences of each dictionary word in documents to be encoded. More robust approaches have been proposed recently that represent each visual descriptor as a sparse weighted combination of dictionary words. While favoring a sparse representation at the level of visual descriptors, those methods however do not ensure that images have sparse representation. In this work, we use mixed-norm regularization to achieve sparsity at the image level as well as a small overall dictionary. This approach can also be used to encourage using the same dictionary words for all the images in a class, providing a discriminative signal in the construction of image representations. Experimental results on a benchmark image classification dataset show that when compact image or dictionary representations are needed for computational efficiency, the proposed approach yields better mean average precision in classification. 1
Author: Kwang I. Kim, Florian Steinke, Matthias Hein
Abstract: Semi-supervised regression based on the graph Laplacian suffers from the fact that the solution is biased towards a constant and the lack of extrapolating power. Based on these observations, we propose to use the second-order Hessian energy for semi-supervised regression which overcomes both these problems. If the data lies on or close to a low-dimensional submanifold in feature space, the Hessian energy prefers functions whose values vary linearly with respect to geodesic distance. We first derive the Hessian energy for smooth manifolds and continue to give a stable estimation procedure for the common case where only samples of the underlying manifold are given. The preference of ‘’linear” functions on manifolds renders the Hessian energy particularly suited for the task of semi-supervised dimensionality reduction, where the goal is to find a user-defined embedding function given some labeled points which varies smoothly (and ideally linearly) along the manifold. The experimental results suggest superior performance of our method compared with semi-supervised regression using Laplacian regularization or standard supervised regression techniques applied to this task. 1
6 0.48376164 26 nips-2009-Adaptive Regularization for Transductive Support Vector Machine
7 0.43477681 146 nips-2009-Manifold Regularization for SIR with Rate Root-n Convergence
8 0.42605335 138 nips-2009-Learning with Compressible Priors
9 0.40718326 137 nips-2009-Learning transport operators for image manifolds
10 0.3965199 229 nips-2009-Statistical Analysis of Semi-Supervised Learning: The Limit of Infinite Unlabelled Data
11 0.38418752 157 nips-2009-Multi-Label Prediction via Compressed Sensing
12 0.37796605 91 nips-2009-Fast, smooth and adaptive regression in metric spaces
13 0.36897811 238 nips-2009-Submanifold density estimation
14 0.36693463 67 nips-2009-Directed Regression
15 0.36514941 17 nips-2009-A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds
16 0.3608624 224 nips-2009-Sparse and Locally Constant Gaussian Graphical Models
17 0.35911924 52 nips-2009-Code-specific policy gradient rules for spiking neurons
18 0.35328797 180 nips-2009-On the Convergence of the Concave-Convex Procedure
19 0.35196555 222 nips-2009-Sparse Estimation Using General Likelihoods and Non-Factorial Priors
20 0.34097335 253 nips-2009-Unsupervised feature learning for audio classification using convolutional deep belief networks
topicId topicWeight
[(24, 0.046), (25, 0.085), (30, 0.162), (35, 0.041), (36, 0.162), (39, 0.063), (58, 0.097), (61, 0.019), (71, 0.06), (81, 0.023), (86, 0.109), (91, 0.046)]
simIndex simValue paperId paperTitle
1 0.91791093 218 nips-2009-Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining
Author: George Konidaris, Andre S. Barreto
Abstract: We introduce a skill discovery method for reinforcement learning in continuous domains that constructs chains of skills leading to an end-of-task reward. We demonstrate experimentally that it creates appropriate skills and achieves performance benefits in a challenging continuous domain. 1
same-paper 2 0.89593476 169 nips-2009-Nonlinear Learning using Local Coordinate Coding
Author: Kai Yu, Tong Zhang, Yihong Gong
Abstract: This paper introduces a new method for semi-supervised learning on high dimensional nonlinear manifolds, which includes a phase of unsupervised basis learning and a phase of supervised function learning. The learned bases provide a set of anchor points to form a local coordinate system, such that each data point x on the manifold can be locally approximated by a linear combination of its nearby anchor points, and the linear weights become its local coordinate coding. We show that a high dimensional nonlinear function can be approximated by a global linear function with respect to this coding scheme, and the approximation quality is ensured by the locality of such coding. The method turns a difficult nonlinear learning problem into a simple global linear learning problem, which overcomes some drawbacks of traditional local learning methods. 1
3 0.82498807 129 nips-2009-Learning a Small Mixture of Trees
Author: M. P. Kumar, Daphne Koller
Abstract: The problem of approximating a given probability distribution using a simpler distribution plays an important role in several areas of machine learning, for example variational inference and classification. Within this context, we consider the task of learning a mixture of tree distributions. Although mixtures of trees can be learned by minimizing the KL-divergence using an EM algorithm, its success depends heavily on the initialization. We propose an efficient strategy for obtaining a good initial set of trees that attempts to cover the entire observed distribution by minimizing the α-divergence with α = ∞. We formulate the problem using the fractional covering framework and present a convergent sequential algorithm that only relies on solving a convex program at each iteration. Compared to previous methods, our approach results in a significantly smaller mixture of trees that provides similar or better accuracies. We demonstrate the usefulness of our approach by learning pictorial structures for face recognition.
4 0.81979227 174 nips-2009-Nonparametric Latent Feature Models for Link Prediction
Author: Kurt Miller, Michael I. Jordan, Thomas L. Griffiths
Abstract: As the availability and importance of relational data—such as the friendships summarized on a social networking website—increases, it becomes increasingly important to have good models for such data. The kinds of latent structure that have been considered for use in predicting links in such networks have been relatively limited. In particular, the machine learning community has focused on latent class models, adapting Bayesian nonparametric methods to jointly infer how many latent classes there are while learning which entities belong to each class. We pursue a similar approach with a richer kind of latent variable—latent features—using a Bayesian nonparametric approach to simultaneously infer the number of features at the same time we learn which entities have each feature. Our model combines these inferred features with known covariates in order to perform link prediction. We demonstrate that the greater expressiveness of this approach allows us to improve performance on three datasets. 1
5 0.81832302 72 nips-2009-Distribution Matching for Transduction
Author: Novi Quadrianto, James Petterson, Alex J. Smola
Abstract: Many transductive inference algorithms assume that distributions over training and test estimates should be related, e.g. by providing a large margin of separation on both sets. We use this idea to design a transduction algorithm which can be used without modification for classification, regression, and structured estimation. At its heart we exploit the fact that for a good learner the distributions over the outputs on training and test sets should match. This is a classical two-sample problem which can be solved efficiently in its most general form by using distance measures in Hilbert Space. It turns out that a number of existing heuristics can be viewed as special cases of our approach. 1
6 0.81735677 104 nips-2009-Group Sparse Coding
7 0.81588095 132 nips-2009-Learning in Markov Random Fields using Tempered Transitions
8 0.81427354 27 nips-2009-Adaptive Regularization of Weight Vectors
9 0.81413662 70 nips-2009-Discriminative Network Models of Schizophrenia
10 0.81346053 217 nips-2009-Sharing Features among Dynamical Systems with Beta Processes
11 0.81225306 157 nips-2009-Multi-Label Prediction via Compressed Sensing
12 0.81128937 260 nips-2009-Zero-shot Learning with Semantic Output Codes
13 0.81105691 38 nips-2009-Augmenting Feature-driven fMRI Analyses: Semi-supervised learning and resting state activity
14 0.80988836 128 nips-2009-Learning Non-Linear Combinations of Kernels
15 0.80972171 213 nips-2009-Semi-supervised Learning using Sparse Eigenfunction Bases
16 0.80932498 118 nips-2009-Kernel Choice and Classifiability for RKHS Embeddings of Probability Distributions
17 0.80930233 61 nips-2009-Convex Relaxation of Mixture Regression with Efficient Algorithms
18 0.80902815 97 nips-2009-Free energy score space
19 0.80842733 158 nips-2009-Multi-Label Prediction via Sparse Infinite CCA
20 0.80832803 137 nips-2009-Learning transport operators for image manifolds