nips nips2008 nips2008-126 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Qiang Wu, Sayan Mukherjee, Feng Liang
Abstract: We developed localized sliced inverse regression for supervised dimension reduction. It has the advantages of preventing degeneracy, increasing estimation accuracy, and automatic subclass discovery in classification problems. A semisupervised version is proposed for the use of unlabeled data. The utility is illustrated on simulated as well as real data sets.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We developed localized sliced inverse regression for supervised dimension reduction. [sent-9, score-0.383]
2 It has the advantages of preventing degeneracy, increasing estimation accuracy, and automatic subclass discovery in classification problems. [sent-10, score-0.073]
3 A semisupervised version is proposed for the use of unlabeled data. [sent-11, score-0.114]
4 Characterizing this predictive subspace, a task known as supervised dimension reduction, requires both the response and explanatory variables. [sent-14, score-0.243]
5 In the machine learning community, research on nonlinear dimension reduction in the spirit of [19] has developed of late. [sent-17, score-0.241]
6 This has led to a variety of manifold learning algorithms such as isometric mapping (ISOMAP, [16]), local linear embedding (LLE, [14]), Hessian Eigenmaps [5], and Laplacian Eigenmaps [1]. [sent-18, score-0.098]
7 Two key differences exist between the paradigm explored in this approach and that of supervised dimension reduction. [sent-19, score-0.122]
8 The first difference is that the above methods are unsupervised in that the algorithms take into account only the explanatory variables. [sent-20, score-0.079]
9 The bigger problem is that these manifold learning algorithms do not operate on the space of the explanatory variables and hence do not provide a predictive submanifold onto which the data should be projected. [sent-22, score-0.192]
10 These methods are based on embedding the observed data onto a graph and then using spectral properties of the embedded graph for dimension reduction. [sent-23, score-0.168]
11 The key observation in all of these manifold algorithms is that metrics must be local and properties that hold in an ambient Euclidean space are true locally on smooth manifolds. [sent-24, score-0.1]
12 This suggests that incorporating local information into supervised dimension reduction methods may help extend them to the setting of nonlinear subspaces and submanifolds of the ambient space. [sent-25, score-0.532]
13 In this paper we extend SIR by taking into account the local structure of the explanatory variables. [sent-27, score-0.12]
14 This localized variant of SIR, LSIR, can be used for classification as well as regression applications. [sent-28, score-0.085]
15 Though the predictive directions obtained by LSIR are linear ones, they encode nonlinear information. [sent-29, score-0.153]
16 Another advantage of our approach is that ancillary unlabeled data can be easily added to the dimension reduction analysis – semi-supervised learning. [sent-30, score-0.282]
17 The utility with respect to predictive accuracy as well as exploratory data analysis via visualization is demonstrated on a variety of simulated and real data in Sections 4 and 5. [sent-34, score-0.141]
18 Then we propose a generalization of SIR, called localized SIR, by incorporating the localization idea from manifold learning. [sent-37, score-0.13]
19 1 Sliced inverse regression Assume the functional dependence between a response variable Y and an explanatory variable X ∈ Rp is given by Y = f(β_1^T X, . . . , β_L^T X, ε). (1) [sent-40, score-0.173]
20 Estimating B or βl ’s becomes the central problem in supervised dimension reduction. [sent-46, score-0.122]
21 Following [4], we refer to B as the dimension reduction (d.r.) subspace. [sent-48, score-0.194]
22 The sliced inverse regression (SIR) model was introduced in [10] to estimate the d.r. subspace. [sent-53, score-0.087]
23 The idea underlying this approach is that, if X has an identity covariance matrix, the centered inverse regression curve E(X|Y) − EX is contained in the d.r. subspace. [sent-56, score-0.116]
24 The d.r. directions β_l are then given by the top eigenvectors of the covariance matrix Γ = Cov(E(X|Y)). [sent-61, score-0.145]
25 In general when the covariance matrix of X is Σ, the βl ’s can be obtained by solving a generalized eigen decomposition problem Γβ = λΣβ. [sent-62, score-0.078]
26 Compute an empirical estimate of Γ: Γ̂ = Σ_{h=1}^{H} (n_h/n)(m_h − m)(m_h − m)^T, where m_h = (1/n_h) Σ_{j∈G_h} x_j is the sample mean for group G_h, n_h is the group size, and m is the overall sample mean. [sent-70, score-0.285]
27 Estimate the d.r. directions β by solving the generalized eigenproblem Γ̂β = λΣ̂β. (2) [sent-74, score-0.135]
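As a rough illustration of the two estimation steps above (slice means, the between-slice covariance Γ̂, and the generalized eigenproblem (2)), here is a minimal NumPy/SciPy sketch; the function name, the equal-size slicing of a continuous response, and the omission of any regularization of Σ̂ are simplifications of ours, not details taken from the paper.

```python
import numpy as np
from scipy.linalg import eigh

def sir_directions(X, y, n_slices=10, n_dirs=2):
    """Minimal sketch of sliced inverse regression (SIR).

    X: (n, p) explanatory variables, y: (n,) response.
    Returns the top `n_dirs` d.r. directions as columns."""
    n, p = X.shape
    x_bar = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)            # empirical covariance of X

    # Slice the response into groups of roughly equal size.
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)

    # Gamma_hat = sum_h (n_h / n) (m_h - x_bar)(m_h - x_bar)^T
    Gamma = np.zeros((p, p))
    for idx in slices:
        m_h = X[idx].mean(axis=0)
        d = (m_h - x_bar)[:, None]
        Gamma += (len(idx) / n) * (d @ d.T)

    # Generalized eigenproblem Gamma beta = lambda Sigma beta, eq. (2).
    vals, vecs = eigh(Gamma, Sigma)
    return vecs[:, np.argsort(vals)[::-1][:n_dirs]]
```

For classification, one would instead form the slices directly from the class labels, as noted in the next entry.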
28 When Y takes categorical values as in classification problems, it is natural to divide the data into different groups by their group labels. [sent-75, score-0.143]
29 Though SIR has been widely used for dimension reduction and has yielded many useful results in practice, it has some known problems. [sent-77, score-0.194]
30 For example, it is easy to construct a function f such that E(X|Y = y) = 0, in which case SIR fails to retrieve any useful directions [3]. [sent-78, score-0.148]
31 Generalizations of SIR include SAVE [3], SIR-II [12] and covariance inverse regression estimation (CIRE, [2]) that exploit the information from the second moment of the conditional distribution of X|Y . [sent-81, score-0.136]
32 However, in some scenarios the information in each slice cannot be well described by a single global statistic. [sent-82, score-0.067]
33 For example, similar to the multimodal situation considered by [15], the data in a slice may form two clusters; then a good description of the data is not a single summary such as a moment, but rather the two cluster centers. [sent-83, score-0.129]
34 2 Localization A key principle in manifold learning is that the Euclidean representation of a data point in Rp is only meaningful locally. [sent-86, score-0.06]
35 Under this principle, it is dangerous to compute the slice average mh whenever the slice contains data points that are far apart. [sent-87, score-0.228]
36 Motivated by this idea we introduce a localized SIR (LSIR) method for dimension reduction. [sent-89, score-0.149]
37 Let us start with the transformed data set where the empirical covariance is the identity, for example the data set after PCA. [sent-91, score-0.086]
38 In the original SIR method, we shift every data point xi to the corresponding group average, then apply PCA on the new data set to identify SIR directions. [sent-92, score-0.119]
39 The underlying rationale for this approach is that if a direction does not differentiate the groups well, the group means projected onto that direction will be very close, and therefore the variance of the new data set will be small in that direction. [sent-93, score-0.118]
40 A natural way to incorporate the localization idea into this approach is to shift each data point xi to the average of a local neighborhood instead of the average of its global neighborhood (i.e., its entire slice). [sent-94, score-0.258]
41 In manifold learning, the local neighborhood is often chosen via k nearest neighbors (k-NN). [sent-97, score-0.2]
42 Unlike manifold learning, which is designed for unsupervised problems, the neighborhood selection for LSIR, which is designed for supervised learning, also incorporates information from the response variable y. [sent-98, score-0.134]
43 Recall that the group average mh is used in the estimate Γ̂ of Γ = Cov(E(X|Y)). [sent-100, score-0.128]
44 The estimate Γ̂ is equivalent to the sample covariance of the data set {m_i}_{i=1}^n, where m_i = m_h is the average of the group G_h to which x_i belongs. [sent-101, score-0.235]
45 In our LSIR algorithm, we set m_i equal to a local average, and then use the corresponding sample covariance matrix to replace Γ̂ in equation (2). [sent-102, score-0.105]
46 For each sample (x_i, y_i) we compute m_{i,loc} = (1/k) Σ_{j∈s_i} x_j, where, with h being the group such that i ∈ G_h, s_i = {j : x_j belongs to the k nearest neighbors of x_i in G_h}. [sent-107, score-0.141]
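A minimal sketch of this localization step: each point's slice mean is replaced by the mean of its k nearest neighbors within the same slice, and the sample covariance of these local means replaces Γ̂ in the eigenproblem (2). The brute-force neighbor search and the absence of any numerical regularization are simplifications of ours, not choices stated in the paper.

```python
import numpy as np
from scipy.linalg import eigh

def lsir_directions(X, groups, k=10, n_dirs=2):
    """Minimal sketch of localized SIR (LSIR).

    X: (n, p) data; groups: (n,) slice/class label of each point."""
    n, p = X.shape
    M = np.zeros((n, p))                         # local means m_{i,loc}

    for i in range(n):
        same = np.flatnonzero(groups == groups[i])     # indices in G_h
        d2 = np.sum((X[same] - X[i]) ** 2, axis=1)
        s_i = same[np.argsort(d2)[:k]]                 # k-NN of x_i within G_h
        M[i] = X[s_i].mean(axis=0)                     # m_{i,loc}

    Gamma = np.cov(M, rowvar=False)              # replaces Gamma_hat in eq. (2)
    Sigma = np.cov(X, rowvar=False)
    vals, vecs = eigh(Gamma, Sigma)
    return vecs[:, np.argsort(vals)[::-1][:n_dirs]]
```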
47 With a moderate choice of k, LSIR uses the local information within each slice and is expected to retrieve directions lost by SIR in cases where SIR fails due to degeneracy. [sent-114, score-0.256]
48 As shown in one of our examples, this property of LSIR is very useful in data analysis such as cancer subtype discovery using genomic data. [sent-121, score-0.118]
49 3 Connection to Existing Work The idea of localization has been introduced to dimension reduction for classification problems before. [sent-123, score-0.247]
50 For example, the local discriminant information (LDI) introduced by [9] is one of the early works in this area. [sent-124, score-0.096]
51 In LDI, the local information is used to compute a between-group covariance matrix Γ_i over a nearest neighborhood at every data point x_i and then estimate the d.r. [sent-125, score-0.201]
52 directions by the top eigenvectors of the averaged between-group matrix (1/n) Σ_{i=1}^n Γ_i. [sent-127, score-0.101]
53 The local Fisher discriminant analysis (LFDA) introduced by [15] can be regarded as an improvement of LDI with the within-class covariance matrix also being localized. [sent-128, score-0.14]
54 For example, for a problem of C classes, LDI needs to compute nC local mean points and n between-group covariance matrices, while LSIR computes only n local mean points and one covariance matrix. [sent-131, score-0.206]
55 Another advantage is that LSIR can be easily extended to handle unlabeled data in semi-supervised learning, as explained in the next section. [sent-132, score-0.088]
56 Such an extension is less straightforward for the other two approaches that operate on the covariance matrices instead of data points. [sent-133, score-0.065]
57 How to incorporate the information from unlabeled data has been the main focus of research in semi-supervised learning. [sent-142, score-0.088]
58 Our LSIR algorithm can be easily modified to take the unlabeled data into consideration. [sent-143, score-0.088]
59 Since the y of an unlabeled sample could take any possible value, we put the unlabeled data into every slice. [sent-144, score-0.155]
60 So the neighborhood s_i is defined as follows: a point in the k-NN of x_i belongs to s_i if it is unlabeled, or if it is labeled and belongs to the same slice as x_i. [sent-145, score-0.274]
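A sketch of this neighborhood rule, assuming unlabeled points are encoded with the label -1 (an encoding of our choosing, not the paper's): the k nearest neighbors of x_i are found among all points, and a neighbor is kept if it is unlabeled or shares x_i's slice.

```python
import numpy as np

def semi_supervised_neighbors(X, labels, i, k):
    """Return s_i for point i: among the k-NN of x_i (over all points),
    keep those that are unlabeled (label == -1) or share x_i's label."""
    d2 = np.sum((X - X[i]) ** 2, axis=1).astype(float)
    d2[i] = np.inf                      # exclude the point itself from its k-NN
    knn = np.argsort(d2)[:k]
    keep = (labels[knn] == -1) | (labels[knn] == labels[i])
    return knn[keep]
```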
61 The performance of LSIR is compared with other dimension reduction methods including SIR, SAVE, pHd, and LFDA. [sent-147, score-0.194]
62 Table 1: Estimation accuracy (and standard deviation) of various dimension reduction methods for semisupervised learning in Example 1. [sent-156, score-0.277]
63 The data points in red and blue are labeled and the ones in green are unlabeled when the semisupervised setting is considered. [sent-163, score-0.184]
64 (c) Projection of data to the first two LSIR directions when all the n = 400 data points are labeled. [sent-165, score-0.161]
65 (d) Projection of the data to the first two LSIR directions when only 20 points as indicated in (a) are labeled. [sent-166, score-0.14]
66 The d.r. directions are the first two dimensions and the remaining eight dimensions are Gaussian noise. [sent-181, score-0.189]
67 The data in the first two relevant dimensions are plotted in Figure 1(a) with sample size n = 400. [sent-182, score-0.065]
68 SIR fails to find the d.r. directions because the group averages of the two groups are roughly the same for the first two dimensions, due to the symmetry in the data. [sent-185, score-0.18]
69 Using local average instead of group average, LSIR can find both directions, see Figure 1(c). [sent-186, score-0.114]
70 The directions from PCA where one ignores the labels do not agree with the discriminant directions as shown in Figure 1(b). [sent-189, score-0.242]
71 So to retrieve the relevant directions, the information from the labeled points has to be taken into consideration. [sent-190, score-0.096]
72 We evaluate the accuracy of LSIR (the semi-supervised version), SAVE and pHd, where the latter two operate on just the labeled set. [sent-191, score-0.067]
73 The result for one iteration is displayed in Figure 1 where the labeled points are indicated in (a) and the projection to the top two directions from LSIR (with k = 40) is in (d). [sent-194, score-0.171]
74 All the results clearly indicate that LSIR out-performs the other two supervised dimension reduction methods. [sent-195, score-0.22]
75 We first generate a 10-dimensional data set where the first three dimensions are the Swiss roll data [14]: X1 = t cos t, X2 = 21h, X3 = t sin t, where t = (3π/2)(1 + 2θ), θ ∼ Uniform(0, 1) and h ∼ Uniform([0, 1]). [sent-197, score-0.131]
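For concreteness, a small sketch generating the ten explanatory dimensions of this example (Swiss roll in the first three coordinates, Gaussian noise in the rest); the noise standard deviation and the random seed are arbitrary choices of ours, and the definition of the response Y is not given in this excerpt, so it is omitted.

```python
import numpy as np

def swiss_roll_features(n, n_noise=7, noise_sd=1.0, seed=0):
    """First three coordinates: Swiss roll; remaining coordinates: Gaussian noise.
    noise_sd is an arbitrary choice, not a value taken from the paper."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 1.0, n)
    h = rng.uniform(0.0, 1.0, n)
    t = 1.5 * np.pi * (1.0 + 2.0 * theta)    # t = (3*pi/2)(1 + 2*theta)
    X1, X2, X3 = t * np.cos(t), 21.0 * h, t * np.sin(t)
    noise = rng.normal(0.0, noise_sd, size=(n, n_noise))
    return np.column_stack([X1, X2, X3, noise])
```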
76 Figure 2: Estimation accuracy of various dimension reduction methods for Example 2 (x-axis: sample size, from 200 to 1000). [sent-208, score-0.132]
77 We randomly choose n samples as a training set, let n vary from 200 to 1000, and compare the estimation accuracy of LSIR with SIR, SAVE and pHd. [sent-209, score-0.074]
78 Note that the Swiss roll (the first three dimensions) is a benchmark data set in manifold learning, where the goal is to "unroll" the data into the intrinsic two-dimensional space. [sent-212, score-0.108]
79 Since LSIR is a linear dimension reduction method, we do not expect it to unroll the data, but we do expect it to retrieve the dimensions relevant to the prediction of Y. [sent-213, score-0.33]
80 Meanwhile, with the noise, manifold learning algorithms will not unroll the data either, since the dominant directions are now the noise dimensions. [sent-214, score-0.203]
81 The Tai Chi data set was first used as a dimension reduction example in [12, Chapter 14]. [sent-222, score-0.215]
82 By taking the local structure into account, LSIR can easily retrieve the relevant directions. [sent-227, score-0.088]
83 (a) The training data in the first two dimensions; (b) The training data projected onto the first two LSIR directions; (c) Independent test data projected onto the first two LSIR directions. [sent-234, score-0.165]
84 It contains 60,000 images of handwritten digits as training data and 10,000 images as test data. [sent-241, score-0.059]
85 Figure 4: Result for leukemia data by LSIR. [sent-243, score-0.061]
86 Then we project the training data and 10000 test data onto these directions. [sent-249, score-0.075]
87 2 Gene expression data Cancer classification and discovery using gene expression data has become an important technique in modern biology and medical science. [sent-277, score-0.102]
88 In gene expression data the number of genes is huge (usually up to thousands) while the number of samples is quite limited. [sent-278, score-0.07]
89 As a typical large-p, small-n problem, dimension reduction plays an essential role in understanding the data structure and making inference. [sent-279, score-0.215]
90 An interesting point is that LSIR automatically realizes subtype discovery while SIR cannot. [sent-286, score-0.069]
91 By projecting the training data onto the first two directions (Figure 4), we immediately notice that ALL has two subtypes. [sent-287, score-0.155]
92 Note that there are two samples (which are T-cell ALL) that cannot be assigned to either subtype by visualization alone. [sent-289, score-0.058]
93 6 Discussion We developed the LSIR method for dimension reduction by incorporating local information into the original SIR. [sent-291, score-0.25]
94 A semisupervised version is developed for the use of unlabeled data. [sent-294, score-0.129]
95 An extension of LSIR along this direction can be helpful for obtaining nonlinear dimension reduction directions and for reducing the computational complexity when p ≫ n. [sent-297, score-0.327]
96 Further research on LSIR and its kernelized version includes studying their asymptotic properties, such as consistency, and developing statistically more rigorous approaches for choosing k, the size of the local neighborhoods, and L, the dimensionality of the reduced space. [sent-298, score-0.058]
97 Dimension reduction and visualization in discriminant analysis (with discussion). [sent-321, score-0.162]
98 Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. [sent-363, score-0.06]
99 On principal Hessian directions for data visualization and dimension reduction: another application of Stein's lemma. [sent-381, score-0.274]
100 Dimension reduction of multimodal labeled data by local Fisher discriminant analysis. [sent-406, score-0.211]
wordName wordTfidf (topN-words)
[('lsir', 0.757), ('sir', 0.464), ('sliced', 0.121), ('save', 0.113), ('directions', 0.101), ('reduction', 0.098), ('dimension', 0.096), ('explanatory', 0.079), ('chi', 0.076), ('loc', 0.076), ('tai', 0.076), ('gh', 0.073), ('unlabeled', 0.067), ('slice', 0.067), ('fda', 0.066), ('phd', 0.056), ('mh', 0.055), ('group', 0.055), ('localized', 0.053), ('ldi', 0.053), ('neighborhood', 0.05), ('eigenmaps', 0.049), ('retrieve', 0.047), ('semisupervised', 0.047), ('unroll', 0.045), ('covariance', 0.044), ('dimensions', 0.044), ('local', 0.041), ('pb', 0.041), ('classi', 0.04), ('discriminant', 0.04), ('subtype', 0.04), ('leukemia', 0.04), ('degeneracy', 0.04), ('nh', 0.04), ('inverse', 0.04), ('manifold', 0.039), ('subspace', 0.038), ('localization', 0.038), ('digits', 0.038), ('manifolds', 0.036), ('cook', 0.036), ('accuracy', 0.036), ('hessian', 0.034), ('eigen', 0.034), ('onto', 0.033), ('nonlinear', 0.032), ('regression', 0.032), ('gene', 0.031), ('labeled', 0.031), ('aml', 0.03), ('lfda', 0.03), ('roll', 0.03), ('sayan', 0.03), ('semilsir', 0.03), ('rp', 0.029), ('discovery', 0.029), ('cancer', 0.028), ('qiang', 0.026), ('supervised', 0.026), ('subspaces', 0.025), ('subclass', 0.024), ('cation', 0.024), ('visualization', 0.024), ('groups', 0.024), ('divide', 0.023), ('duke', 0.023), ('swiss', 0.023), ('nearest', 0.023), ('principal', 0.022), ('si', 0.022), ('xi', 0.022), ('response', 0.022), ('neighborhoods', 0.021), ('projection', 0.021), ('data', 0.021), ('multimodal', 0.02), ('ambient', 0.02), ('mi', 0.02), ('estimation', 0.02), ('categorical', 0.02), ('predictive', 0.02), ('utility', 0.019), ('nc', 0.019), ('belongs', 0.019), ('embedding', 0.018), ('projected', 0.018), ('pca', 0.018), ('cov', 0.018), ('samples', 0.018), ('average', 0.018), ('points', 0.018), ('moments', 0.018), ('liang', 0.017), ('dimensionality', 0.017), ('euclidean', 0.016), ('introduced', 0.015), ('sin', 0.015), ('developed', 0.015), ('regularization', 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 126 nips-2008-Localized Sliced Inverse Regression
Author: Qiang Wu, Sayan Mukherjee, Feng Liang
Abstract: We developed localized sliced inverse regression for supervised dimension reduction. It has the advantages of preventing degeneracy, increasing estimation accuracy, and automatic subclass discovery in classification problems. A semisupervised version is proposed for the use of unlabeled data. The utility is illustrated on simulated as well as real data sets.
2 0.068276778 51 nips-2008-Convergence and Rate of Convergence of a Manifold-Based Dimension Reduction Algorithm
Author: Andrew Smith, Hongyuan Zha, Xiao-ming Wu
Abstract: We study the convergence and the rate of convergence of a local manifold learning algorithm: LTSA [13]. The main technical tool is the perturbation analysis on the linear invariant subspace that corresponds to the solution of LTSA. We derive a worst-case upper bound of errors for LTSA which naturally leads to a convergence result. We then derive the rate of convergence for LTSA in a special case. 1
3 0.057297949 194 nips-2008-Regularized Learning with Networks of Features
Author: Ted Sandler, John Blitzer, Partha P. Talukdar, Lyle H. Ungar
Abstract: For many supervised learning problems, we possess prior knowledge about which features yield similar information about the target variable. In predicting the topic of a document, we might know that two words are synonyms, and when performing image recognition, we know which pixels are adjacent. Such synonymous or neighboring features are near-duplicates and should be expected to have similar weights in an accurate model. Here we present a framework for regularized learning when one has prior knowledge about which features are expected to have similar and dissimilar weights. The prior knowledge is encoded as a network whose vertices are features and whose edges represent similarities and dissimilarities between them. During learning, each feature’s weight is penalized by the amount it differs from the average weight of its neighbors. For text classification, regularization using networks of word co-occurrences outperforms manifold learning and compares favorably to other recently proposed semi-supervised learning methods. For sentiment analysis, feature networks constructed from declarative human knowledge significantly improve prediction accuracy. 1
4 0.055617575 227 nips-2008-Supervised Exponential Family Principal Component Analysis via Convex Optimization
Author: Yuhong Guo
Abstract: Recently, supervised dimensionality reduction has been gaining attention, owing to the realization that data labels are often available and indicate important underlying structure in the data. In this paper, we present a novel convex supervised dimensionality reduction approach based on exponential family PCA, which is able to avoid the local optima of typical EM learning. Moreover, by introducing a sample-based approximation to exponential family models, it overcomes the limitation of the prevailing Gaussian assumptions of standard PCA, and produces a kernelized formulation for nonlinear supervised dimensionality reduction. A training algorithm is then devised based on a subgradient bundle method, whose scalability can be gained using a coordinate descent procedure. The advantage of our global optimization approach is demonstrated by empirical results over both synthetic and real data. 1
5 0.052935258 245 nips-2008-Unlabeled data: Now it helps, now it doesn't
Author: Aarti Singh, Robert Nowak, Xiaojin Zhu
Abstract: Empirical evidence shows that in favorable situations semi-supervised learning (SSL) algorithms can capitalize on the abundance of unlabeled training data to improve the performance of a learning task, in the sense that fewer labeled training data are needed to achieve a target error bound. However, in other situations unlabeled data do not seem to help. Recent attempts at theoretically characterizing SSL gains only provide a partial and sometimes apparently conflicting explanations of whether, and to what extent, unlabeled data can help. In this paper, we attempt to bridge the gap between the practice and theory of semi-supervised learning. We develop a finite sample analysis that characterizes the value of unlabeled data and quantifies the performance improvement of SSL compared to supervised learning. We show that there are large classes of problems for which SSL can significantly outperform supervised learning, in finite sample regimes and sometimes also in terms of error convergence rates.
6 0.048519179 61 nips-2008-Diffeomorphic Dimensionality Reduction
7 0.048202883 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
8 0.047381606 151 nips-2008-Non-parametric Regression Between Manifolds
9 0.046809237 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization
10 0.046279769 120 nips-2008-Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text
11 0.044855885 60 nips-2008-Designing neurophysiology experiments to optimally constrain receptive field models along parametric submanifolds
12 0.044534776 145 nips-2008-Multi-stage Convex Relaxation for Learning with Sparse Regularization
13 0.043401707 62 nips-2008-Differentiable Sparse Coding
14 0.038888454 135 nips-2008-Model Selection in Gaussian Graphical Models: High-Dimensional Consistency of \boldmath$\ell 1$-regularized MLE
15 0.037419111 142 nips-2008-Multi-Level Active Prediction of Useful Image Annotations for Recognition
16 0.035430692 80 nips-2008-Extended Grassmann Kernels for Subspace-Based Learning
17 0.034621604 64 nips-2008-DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification
18 0.03287274 56 nips-2008-Deep Learning with Kernel Regularization for Visual Recognition
19 0.03280523 193 nips-2008-Regularized Co-Clustering with Dual Supervision
20 0.032435473 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
topicId topicWeight
[(0, -0.112), (1, -0.038), (2, -0.004), (3, 0.02), (4, 0.03), (5, 0.001), (6, 0.008), (7, 0.018), (8, -0.029), (9, 0.003), (10, 0.093), (11, -0.036), (12, -0.074), (13, -0.041), (14, -0.027), (15, -0.028), (16, 0.026), (17, 0.082), (18, 0.026), (19, 0.006), (20, -0.023), (21, 0.014), (22, 0.005), (23, -0.052), (24, -0.029), (25, -0.011), (26, -0.05), (27, 0.009), (28, 0.037), (29, 0.051), (30, 0.031), (31, 0.034), (32, 0.026), (33, 0.015), (34, 0.009), (35, -0.144), (36, -0.02), (37, 0.033), (38, -0.052), (39, -0.025), (40, -0.017), (41, -0.028), (42, -0.042), (43, 0.06), (44, -0.004), (45, 0.009), (46, 0.098), (47, -0.092), (48, 0.033), (49, 0.045)]
simIndex simValue paperId paperTitle
same-paper 1 0.87495679 126 nips-2008-Localized Sliced Inverse Regression
Author: Qiang Wu, Sayan Mukherjee, Feng Liang
Abstract: We developed localized sliced inverse regression for supervised dimension reduction. It has the advantages of preventing degeneracy, increasing estimation accuracy, and automatic subclass discovery in classification problems. A semisupervised version is proposed for the use of unlabeled data. The utility is illustrated on simulated as well as real data sets.
2 0.62292159 51 nips-2008-Convergence and Rate of Convergence of a Manifold-Based Dimension Reduction Algorithm
Author: Andrew Smith, Hongyuan Zha, Xiao-ming Wu
Abstract: We study the convergence and the rate of convergence of a local manifold learning algorithm: LTSA [13]. The main technical tool is the perturbation analysis on the linear invariant subspace that corresponds to the solution of LTSA. We derive a worst-case upper bound of errors for LTSA which naturally leads to a convergence result. We then derive the rate of convergence for LTSA in a special case. 1
Author: Jeremy Lewi, Robert Butera, David M. Schneider, Sarah Woolley, Liam Paninski
Abstract: Sequential optimal design methods hold great promise for improving the efficiency of neurophysiology experiments. However, previous methods for optimal experimental design have incorporated only weak prior information about the underlying neural system (e.g., the sparseness or smoothness of the receptive field). Here we describe how to use stronger prior information, in the form of parametric models of the receptive field, in order to construct optimal stimuli and further improve the efficiency of our experiments. For example, if we believe that the receptive field is well-approximated by a Gabor function, then our method constructs stimuli that optimally constrain the Gabor parameters (orientation, spatial frequency, etc.) using as few experimental trials as possible. More generally, we may believe a priori that the receptive field lies near a known sub-manifold of the full parameter space; in this case, our method chooses stimuli in order to reduce the uncertainty along the tangent space of this sub-manifold as rapidly as possible. Applications to simulated and real data indicate that these methods may in many cases improve the experimental efficiency. 1
4 0.55529279 151 nips-2008-Non-parametric Regression Between Manifolds
Author: Florian Steinke, Matthias Hein
Abstract: This paper discusses non-parametric regression between Riemannian manifolds. This learning problem arises frequently in many application areas ranging from signal processing, computer vision, over robotics to computer graphics. We present a new algorithmic scheme for the solution of this general learning problem based on regularized empirical risk minimization. The regularization functional takes into account the geometry of input and output manifold, and we show that it implements a prior which is particularly natural. Moreover, we demonstrate that our algorithm performs well in a difficult surface registration problem. 1
5 0.53134644 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization
Author: Liu Yang, Rong Jin, Rahul Sukthankar
Abstract: The cluster assumption is exploited by most semi-supervised learning (SSL) methods. However, if the unlabeled data is merely weakly related to the target classes, it becomes questionable whether driving the decision boundary to the low density regions of the unlabeled data will help the classification. In such case, the cluster assumption may not be valid; and consequently how to leverage this type of unlabeled data to enhance the classification accuracy becomes a challenge. We introduce “Semi-supervised Learning with Weakly-Related Unlabeled Data” (SSLW), an inductive method that builds upon the maximum-margin approach, towards a better usage of weakly-related unlabeled information. Although the SSLW could improve a wide range of classification tasks, in this paper, we focus on text categorization with a small training pool. The key assumption behind this work is that, even with different topics, the word usage patterns across different corpora tends to be consistent. To this end, SSLW estimates the optimal wordcorrelation matrix that is consistent with both the co-occurrence information derived from the weakly-related unlabeled documents and the labeled documents. For empirical evaluation, we present a direct comparison with a number of stateof-the-art methods for inductive semi-supervised learning and text categorization. We show that SSLW results in a significant improvement in categorization accuracy, equipped with a small training set and an unlabeled resource that is weakly related to the test domain.
6 0.5291124 128 nips-2008-Look Ma, No Hands: Analyzing the Monotonic Feature Abstraction for Text Classification
7 0.50070322 61 nips-2008-Diffeomorphic Dimensionality Reduction
8 0.47918105 245 nips-2008-Unlabeled data: Now it helps, now it doesn't
9 0.45609757 188 nips-2008-QUIC-SVD: Fast SVD Using Cosine Trees
10 0.42848584 192 nips-2008-Reducing statistical dependencies in natural signals using radial Gaussianization
11 0.42749453 122 nips-2008-Learning with Consistency between Inductive Functions and Kernels
12 0.41649225 90 nips-2008-Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity
13 0.41348279 120 nips-2008-Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text
14 0.41259992 193 nips-2008-Regularized Co-Clustering with Dual Supervision
15 0.40657505 227 nips-2008-Supervised Exponential Family Principal Component Analysis via Convex Optimization
16 0.40498096 146 nips-2008-Multi-task Gaussian Process Learning of Robot Inverse Dynamics
17 0.40090042 80 nips-2008-Extended Grassmann Kernels for Subspace-Based Learning
18 0.36564115 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
19 0.3535428 232 nips-2008-The Conjoint Effect of Divisive Normalization and Orientation Selectivity on Redundancy Reduction
20 0.34684563 5 nips-2008-A Transductive Bound for the Voted Classifier with an Application to Semi-supervised Learning
topicId topicWeight
[(6, 0.047), (7, 0.09), (12, 0.016), (28, 0.558), (57, 0.046), (63, 0.014), (71, 0.011), (77, 0.04), (83, 0.045)]
simIndex simValue paperId paperTitle
1 0.99836332 117 nips-2008-Learning Taxonomies by Dependence Maximization
Author: Matthew Blaschko, Arthur Gretton
Abstract: We introduce a family of unsupervised algorithms, numerical taxonomy clustering, to simultaneously cluster data, and to learn a taxonomy that encodes the relationship between the clusters. The algorithms work by maximizing the dependence between the taxonomy and the original data. The resulting taxonomy is a more informative visualization of complex data than simple clustering; in addition, taking into account the relations between different clusters is shown to substantially improve the quality of the clustering, when compared with state-ofthe-art algorithms in the literature (both spectral clustering and a previous dependence maximization approach). We demonstrate our algorithm on image and text data. 1
2 0.99782288 110 nips-2008-Kernel-ARMA for Hand Tracking and Brain-Machine interfacing During 3D Motor Control
Author: Lavi Shpigelman, Hagai Lalazar, Eilon Vaadia
Abstract: Using machine learning algorithms to decode intended behavior from neural activity serves a dual purpose. First, these tools allow patients to interact with their environment through a Brain-Machine Interface (BMI). Second, analyzing the characteristics of such methods can reveal the relative significance of various features of neural activity, task stimuli, and behavior. In this study we adapted, implemented and tested a machine learning method called Kernel Auto-Regressive Moving Average (KARMA), for the task of inferring movements from neural activity in primary motor cortex. Our version of this algorithm is used in an online learning setting and is updated after a sequence of inferred movements is completed. We first used it to track real hand movements executed by a monkey in a standard 3D reaching task. We then applied it in a closed-loop BMI setting to infer intended movement, while the monkey’s arms were comfortably restrained, thus performing the task using the BMI alone. KARMA is a recurrent method that learns a nonlinear model of output dynamics. It uses similarity functions (termed kernels) to compare between inputs. These kernels can be structured to incorporate domain knowledge into the method. We compare KARMA to various state-of-the-art methods by evaluating tracking performance and present results from the KARMA based BMI experiments. 1
same-paper 3 0.99750543 126 nips-2008-Localized Sliced Inverse Regression
Author: Qiang Wu, Sayan Mukherjee, Feng Liang
Abstract: We developed localized sliced inverse regression for supervised dimension reduction. It has the advantages of preventing degeneracy, increasing estimation accuracy, and automatic subclass discovery in classification problems. A semisupervised version is proposed for the use of unlabeled data. The utility is illustrated on simulated as well as real data sets.
4 0.99676853 206 nips-2008-Sequential effects: Superstition or rational behavior?
Author: Angela J. Yu, Jonathan D. Cohen
Abstract: In a variety of behavioral tasks, subjects exhibit an automatic and apparently suboptimal sequential effect: they respond more rapidly and accurately to a stimulus if it reinforces a local pattern in stimulus history, such as a string of repetitions or alternations, compared to when it violates such a pattern. This is often the case even if the local trends arise by chance in the context of a randomized design, such that stimulus history has no real predictive power. In this work, we use a normative Bayesian framework to examine the hypothesis that such idiosyncrasies may reflect the inadvertent engagement of mechanisms critical for adapting to a changing environment. We show that prior belief in non-stationarity can induce experimentally observed sequential effects in an otherwise Bayes-optimal algorithm. The Bayesian algorithm is shown to be well approximated by linear-exponential filtering of past observations, a feature also apparent in the behavioral data. We derive an explicit relationship between the parameters and computations of the exact Bayesian algorithm and those of the approximate linear-exponential filter. Since the latter is equivalent to a leaky-integration process, a commonly used model of neuronal dynamics underlying perceptual decision-making and trial-to-trial dependencies, our model provides a principled account of why such dynamics are useful. We also show that parameter-tuning of the leaky-integration process is possible, using stochastic gradient descent based only on the noisy binary inputs. This is a proof of concept that not only can neurons implement near-optimal prediction based on standard neuronal dynamics, but that they can also learn to tune the processing parameters without explicitly representing probabilities. 1
5 0.9960655 190 nips-2008-Reconciling Real Scores with Binary Comparisons: A New Logistic Based Model for Ranking
Author: Nir Ailon
Abstract: The problem of ranking arises ubiquitously in almost every aspect of life, and in particular in Machine Learning/Information Retrieval. A statistical model for ranking predicts how humans rank subsets V of some universe U . In this work we define a statistical model for ranking that satisfies certain desirable properties. The model automatically gives rise to a logistic regression based approach to learning how to rank, for which the score and comparison based approaches are dual views. This offers a new generative approach to ranking which can be used for IR. There are two main contexts for this work. The first is the theory of econometrics and study of statistical models explaining human choice of alternatives. In this context, we will compare our model with other well known models. The second context is the problem of ranking in machine learning, usually arising in the context of information retrieval. Here, much work has been done in the discriminative setting, where different heuristics are used to define ranking risk functions. Our model is built rigorously and axiomatically based on very simple desirable properties defined locally for comparisons, and automatically implies the existence of a global score function serving as a natural model parameter which can be efficiently fitted to pairwise comparison judgment data by solving a convex optimization problem. 1
6 0.99555612 115 nips-2008-Learning Bounded Treewidth Bayesian Networks
7 0.99498367 72 nips-2008-Empirical performance maximization for linear rank statistics
8 0.99324799 174 nips-2008-Overlaying classifiers: a practical approach for optimal ranking
9 0.99067134 222 nips-2008-Stress, noradrenaline, and realistic prediction of mouse behaviour using reinforcement learning
10 0.98418176 77 nips-2008-Evaluating probabilities under high-dimensional latent variable models
11 0.97682279 159 nips-2008-On Bootstrapping the ROC Curve
12 0.97008455 93 nips-2008-Global Ranking Using Continuous Conditional Random Fields
13 0.96325582 211 nips-2008-Simple Local Models for Complex Dynamical Systems
14 0.96148956 101 nips-2008-Human Active Learning
15 0.95920271 112 nips-2008-Kernel Measures of Independence for non-iid Data
16 0.95762658 53 nips-2008-Counting Solution Clusters in Graph Coloring Problems Using Belief Propagation
17 0.95758325 107 nips-2008-Influence of graph construction on graph-based clustering measures
18 0.95658362 34 nips-2008-Bayesian Network Score Approximation using a Metagraph Kernel
19 0.95562518 223 nips-2008-Structure Learning in Human Sequential Decision-Making
20 0.94991243 244 nips-2008-Unifying the Sensory and Motor Components of Sensorimotor Adaptation