nips nips2010 nips2010-133 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Liefeng Bo, Xiaofeng Ren, Dieter Fox
Abstract: The design of low-level image features is critical for computer vision algorithms. Orientation histograms, such as those in SIFT [16] and HOG [3], are the most successful and popular features for visual object and scene recognition. We highlight the kernel view of orientation histograms, and show that they are equivalent to a certain type of match kernels over image patches. This novel view allows us to design a family of kernel descriptors which provide a unified and principled framework to turn pixel attributes (gradient, color, local binary pattern, etc.) into compact patch-level features. In particular, we introduce three types of match kernels to measure similarities between image patches, and construct compact low-dimensional kernel descriptors from these match kernels using kernel principal component analysis (KPCA) [23]. Kernel descriptors are easy to design and can turn any type of pixel attribute into patch-level features. They outperform carefully tuned and sophisticated features including SIFT and deep belief networks. We report superior performance on standard image classification benchmarks: Scene-15, Caltech-101, CIFAR10 and CIFAR10-ImageNet.
Reference: text
sentIndex sentText sentNum sentScore
1 We highlight the kernel view of orientation histograms, and show that they are equivalent to a certain type of match kernels over image patches. [sent-3, score-1.085]
2 This novel view allows us to design a family of kernel descriptors which provide a unified and principled framework to turn pixel attributes (gradient, color, local binary pattern, etc. [sent-4, score-1.068]
3 In particular, we introduce three types of match kernels to measure similarities between image patches, and construct compact low-dimensional kernel descriptors from these match kernels using kernel principal component analysis (KPCA) [23]. [sent-6, score-2.044]
4 Kernel descriptors are easy to design and can turn any type of pixel attribute into patch-level features. [sent-7, score-0.49]
5 In this work, we highlight the kernel view of orientation histograms and provide a unified approach to low-level image feature design and learning. [sent-15, score-0.904]
6 We show how our framework is applied to gradient, color, and shape pixel attributes, leading to three effective kernel descriptors. [sent-17, score-0.681]
7 The most relevant work to this paper is that of efficient match kernels (EMK) [1], which provides a kernel view of the frequently used Bag-of-Words representation and forms image-level features by learning compact low-dimensional projections or using random Fourier transformations. [sent-20, score-0.939]
8 Another related work is based on the mathematics of the neural response, which shows that hierarchical architectures motivated by the neuroscience of the visual cortex are associated with a derived kernel [24]. [sent-22, score-0.462]
9 Instead, the goal of this paper is to provide a deeper understanding of how orientation histograms (SIFT and HOG) work, to generalize them, and to design novel low-level image features based on this kernel insight. [sent-23, score-0.985]
10 Our kernel descriptors are general and provide a principled way to convert pixel attributes to patch-level features. [sent-24, score-0.973]
11 To the best of our knowledge, this is the first time that low-level image features are designed and learned from scratch using kernel methods; they can serve as the foundation of many computer vision tasks including object recognition. [sent-25, score-0.714]
12 Our novel kernel descriptors are presented in Section 3, followed by an extensive experimental evaluation in Section 4. [sent-28, score-0.733]
13 Here we describe the kernel view of such orientation histogram features, and show how this kernel view can help overcome issues such as orientation binning. [sent-31, score-1.317]
14 Let θ(z) and m(z) be the orientation and magnitude of the image gradient at a pixel z. [sent-32, score-0.546]
15 In HOG and SIFT, the gradient orientation of each pixel is discretized into a $d$-dimensional indicator vector $\delta(z) = [\delta_1(z), \cdots, \delta_d(z)]$ with $\delta_i(z) = 1$ if $\lfloor d\theta(z)/2\pi \rfloor = i-1$ and $\delta_i(z) = 0$ otherwise (1), where $\lfloor x \rfloor$ denotes the largest integer less than or equal to $x$ (we will describe soft binning further below). [sent-33, score-0.564]
16 Aggregating feature vectors of pixels over an image patch $P$, we obtain the histogram of oriented gradients: $F_h(P) = \sum_{z \in P} \widetilde{m}(z)\,\delta(z)$ (2), where $\widetilde{m}(z) = m(z)/\sqrt{\sum_{z \in P} m(z)^2 + \epsilon_g}$ is the normalized gradient magnitude, with $\epsilon_g$ a small constant. [sent-35, score-0.539]
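To make the construction above concrete, here is a minimal NumPy sketch of Eqs. (1)-(2), assuming orientations are given in radians in [0, 2π); the names d and eps_g are ours, not the paper's:

```python
import numpy as np

def hog_histogram(theta, m, d=8, eps_g=1e-8):
    """Hard-binned orientation histogram F_h(P) of Eqs. (1)-(2).

    theta, m: 1-D arrays holding the gradient orientation (radians) and
    magnitude for every pixel z in a patch P; d is the number of
    orientation bins and eps_g the small normalizing constant.
    """
    # Eq. (1): delta_i(z) = 1 iff floor(d * theta(z) / (2*pi)) = i - 1
    bins = np.floor(d * theta / (2 * np.pi)).astype(int)
    # Normalized magnitude: m(z) / sqrt(sum_z m(z)^2 + eps_g)
    m_norm = m / np.sqrt(np.sum(m ** 2) + eps_g)
    # Eq. (2): accumulate normalized magnitudes into the orientation bins
    return np.bincount(bins, weights=m_norm, minlength=d)
```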
17 This is equivalent to measuring the similarity of image patches using a linear kernel in the feature map $F_h(P)$ in kernel space: $K_h(P, Q) = F_h(P)^\top F_h(Q) = \sum_{z \in P} \sum_{z' \in Q} \widetilde{m}(z)\widetilde{m}(z')\,\delta(z)^\top \delta(z')$ (3), where $P$ and $Q$ are patches, usually from two different images. [sent-39, score-1.277]
18 Therefore, $K_h(P, Q)$ is a match kernel over sets (here the sets are image patches) as in [8, 1, 11, 17, 7]. [sent-42, score-0.712]
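The equivalence between the match kernel of Eq. (3) and the inner product of the two histograms can be checked numerically; in this sketch, random orientations and magnitudes stand in for real image gradients:

```python
import numpy as np

d = 8
rng = np.random.default_rng(0)
theta_P, m_P = rng.uniform(0, 2 * np.pi, 64), rng.uniform(size=64)
theta_Q, m_Q = rng.uniform(0, 2 * np.pi, 64), rng.uniform(size=64)

normalize = lambda m: m / np.sqrt(np.sum(m ** 2) + 1e-8)
mP, mQ = normalize(m_P), normalize(m_Q)
bins_P = np.floor(d * theta_P / (2 * np.pi)).astype(int)
bins_Q = np.floor(d * theta_Q / (2 * np.pi)).astype(int)

# Eq. (3): sum over all pixel pairs; delta(z)^T delta(z') = 1 iff same bin
K_match = np.sum(mP[:, None] * mQ[None, :] * (bins_P[:, None] == bins_Q[None, :]))

# Eq. (2): the same value as an inner product of the two histograms
F_P = np.bincount(bins_P, weights=mP, minlength=d)
F_Q = np.bincount(bins_Q, weights=mQ, minlength=d)
assert np.isclose(K_match, F_P @ F_Q)
```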
19 Eq. 3 provides a kernel view of HOG features over image patches. [sent-44, score-0.69]
20 To get a kernel view of soft binning [13], we only need to replace the delta function in Eq. [sent-48, score-0.619]
21 The L2 distance between $P$ and $Q$ can be expressed as $D(P, Q) = 2 - 2 F(P)^\top F(Q)$, since $F(P)^\top F(P) = 1$, and the kernel view can be provided in the same manner. [sent-51, score-0.465]
22 To measure the similarity between two pixel gradient orientations $\theta$ and $\theta'$, we use the L2 norm between the normalized gradient vectors $\widetilde{\theta} = [\sin(\theta)\ \cos(\theta)]$ and $\widetilde{\theta}' = [\sin(\theta')\ \cos(\theta')]$. [sent-54, score-0.559]
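Spelling out the trigonometric step implied here:

$$\|\widetilde{\theta} - \widetilde{\theta}'\|^2 = (\sin\theta - \sin\theta')^2 + (\cos\theta - \cos\theta')^2 = 2 - 2\cos(\theta - \theta'),$$

so the L2 distance between the normalized gradient vectors depends only on the orientation difference $\theta - \theta'$, varying smoothly and without any binning.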
23 Note that the kernel $k_{\widetilde{m}}(z, z')$ measuring the similarity of the gradient magnitudes of two pixels is linear in gradient magnitude. [sent-60, score-0.764]
24 As can be seen, this kernel introduces quantization errors and could lead to suboptimal performance in subsequent stages of processing. [sent-63, score-0.425]
25 While soft binning results in a smoother kernel function, it still suffers from discretization. [sent-64, score-0.579]
26 This motivates us to search for alternative match kernels which can measure the similarity of image patches more accurately. [sent-65, score-0.615]
27 To estimate the difference between orientations at pixels $z$ and $z'$, we use the following normalized gradient vectors in the kernel function $k_o$: $\widetilde{\theta}(z) = [\sin(\theta(z))\ \cos(\theta(z))]$. [sent-68, score-1.099]
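A naive evaluation of the resulting gradient match kernel then looks as follows. This is a sketch assuming Gaussian kernels $k_o$ and $k_p$ with the hyperparameters reported in Section 4.1; its O(|P||Q|) cost per patch pair is exactly what Section 3.2 sets out to avoid:

```python
import numpy as np

def gaussian_kernel(X, Y, gamma):
    """k(x, y) = exp(-gamma * ||x - y||^2) for all row pairs of X and Y."""
    d2 = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=2)
    return np.exp(-gamma * d2)

def K_grad(theta_P, m_P, pos_P, theta_Q, m_Q, pos_Q,
           gamma_o=5.0, gamma_p=3.0, eps_g=1e-8):
    """Gradient match kernel between patches P and Q; pos_* are n-by-2
    arrays of normalized pixel positions (our layout assumption)."""
    oP = np.stack([np.sin(theta_P), np.cos(theta_P)], axis=1)
    oQ = np.stack([np.sin(theta_Q), np.cos(theta_Q)], axis=1)
    mP = m_P / np.sqrt(np.sum(m_P ** 2) + eps_g)
    mQ = m_Q / np.sqrt(np.sum(m_Q ** 2) + eps_g)
    Ko = gaussian_kernel(oP, oQ, gamma_o)        # orientation kernel k_o
    Kp = gaussian_kernel(pos_P, pos_Q, gamma_p)  # position kernel k_p
    return np.sum(mP[:, None] * mQ[None, :] * Ko * Kp)
```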
28 The kernel view of orientation histograms provides a simple, unified way to turn pixel attributes into patch-level features. [sent-75, score-0.91]
29 One immediate extension is to construct color match kernels over pixel values: $K_{col}(P, Q) = \sum_{z \in P} \sum_{z' \in Q} k_c(c(z), c(z'))\,k_p(z, z')$ (7), where $c(z)$ is the pixel color at position $z$ (intensity for gray images and RGB values for color images). [sent-76, score-1.094]
30 The normalized linear kernel $\widetilde{s}(z)\widetilde{s}(z')$ weighs the contribution of each local binary pattern, and the Gaussian kernel $k_b(b(z), b(z')) = \exp(-\gamma_b \|b(z) - b(z')\|^2)$ measures shape similarity through local binary patterns. [sent-80, score-1.138]
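Hedged sketches of the color and shape match kernels follow the same template; the array shapes assumed here (RGB triples per pixel, 8-bit local binary patterns per pixel, scalar normalized weights s) are our reading of the text:

```python
import numpy as np

def gaussian_kernel(X, Y, gamma):
    d2 = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=2)
    return np.exp(-gamma * d2)

def K_col(c_P, pos_P, c_Q, pos_Q, gamma_c=4.0, gamma_p=3.0):
    """Color match kernel of Eq. (7); c_* are n-by-3 RGB (or n-by-1 gray)."""
    return np.sum(gaussian_kernel(c_P, c_Q, gamma_c) *
                  gaussian_kernel(pos_P, pos_Q, gamma_p))

def K_shape(s_P, b_P, pos_P, s_Q, b_Q, pos_Q, gamma_b=2.0, gamma_p=3.0):
    """Shape match kernel; s_* are normalized per-pixel weights and b_*
    are n-by-8 binary local-binary-pattern vectors."""
    Kb = gaussian_kernel(b_P, b_Q, gamma_b)      # LBP similarity k_b
    Kp = gaussian_kernel(pos_P, pos_Q, gamma_p)  # position kernel k_p
    return np.sum(s_P[:, None] * s_Q[None, :] * Kb * Kp)
```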
31 Match kernels defined over various pixel attributes provide a unified way to generate a rich, diverse visual feature set, which has been shown to be very successful in boosting recognition accuracy [6]. [sent-81, score-0.589]
32 As validated by our own experiments, gradient, color and shape match kernels are strong in their own right and complement one another. [sent-82, score-0.515]
33 2 Learning Compact Features Match kernels provide a principled way to measure the similarity of image patches, but evaluating kernels can be computationally expensive when image patches are large [1]. [sent-85, score-0.811]
34 An important advantage of our approach is that no local minima are involved, unlike constrained kernel singular value decomposition [1]. [sent-87, score-0.453]
35 We now describe how our compact low-dimensional features are extracted from the gradient kernel $K_{grad}$; features for the other kernels can be generated in the same way. [sent-88, score-0.94]
36 Rewriting the kernels in Eq. 5 as inner products $k_o(\widetilde{\theta}(z), \widetilde{\theta}(z')) = \phi_o(\widetilde{\theta}(z))^\top \phi_o(\widetilde{\theta}(z'))$ and $k_p(z, z') = \phi_p(z)^\top \phi_p(z')$, we can derive the following feature over image patches: $F_{grad}(P) = \sum_{z \in P} \widetilde{m}(z)\,\phi_o(\widetilde{\theta}(z)) \otimes \phi_p(z)$ (9), where $\otimes$ is the tensor product. [sent-90, score-0.722]
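Eq. (9) is the key factorization: given any explicit finite-dimensional feature maps for $k_o$ and $k_p$, the match kernel collapses to an inner product of patch-level vectors. A sketch using random Fourier features (one standard way to obtain explicit maps for Gaussian kernels, and one of the techniques mentioned in the EMK discussion above) as stand-ins for $\phi_o$ and $\phi_p$:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_rff(dim, gamma, D, rng):
    """Returns an explicit map phi with phi(x) @ phi(y) approximating
    exp(-gamma * ||x - y||^2) (random Fourier features)."""
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(dim, D))
    b = rng.uniform(0, 2 * np.pi, size=D)
    return lambda X: np.sqrt(2.0 / D) * np.cos(X @ W + b)

phi_o = make_rff(dim=2, gamma=5.0, D=100, rng=rng)  # orientation map
phi_p = make_rff(dim=2, gamma=3.0, D=100, rng=rng)  # position map

def F_grad(theta, m, pos):
    """Eq. (9): the tensor product realized as a flattened outer product."""
    o = np.stack([np.sin(theta), np.cos(theta)], axis=1)
    m_norm = m / np.sqrt(np.sum(m ** 2) + 1e-8)
    # sum_z m(z) * phi_o(theta(z)) (x) phi_p(z)
    return np.einsum('z,zi,zj->ij', m_norm, phi_o(o), phi_p(pos)).ravel()
```

With this, F_grad(P) @ F_grad(Q) approximates the gradient match kernel, so the double sum over pixel pairs never has to be evaluated explicitly.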
37 A straightforward approach to dimensionality reduction is to sample a sufficient number of image patches from training images and perform KPCA on the match kernels. [sent-93, score-0.482]
38 A key issue in this projection process is how to choose a set of basis vectors which makes the finite-dimensional kernel approximate the original kernel well. [sent-99, score-0.573]
39 For example, consider the Gaussian kernel $k_o(\widetilde{\theta}(z), \widetilde{\theta}(z'))$ over gradient orientations. [sent-101, score-0.891]
40 Left: the orientation kernel $k_o(\widetilde{\theta}(z), \widetilde{\theta}(z'))$ and its finite-dimensional approximation. [sent-109, score-0.94]
41 All curves show kernel values as functions of θ(z). [sent-111, score-0.425]
42 The red line is the ground-truth kernel function $k_o$, and the black, green and blue lines are the finite-dimensional approximation kernels with different grid sizes. [sent-112, score-1.022]
43 Right: root mean square error (RMSE) between KPCA approximation and the corresponding match kernel as a function of dimensionality. [sent-113, score-0.576]
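A sketch of the finite-dimensional approximation behind this figure, using the standard projection of $\phi_o$ onto the span of grid basis vectors (a Nyström-type construction; that this matches the authors' exact procedure is our assumption):

```python
import numpy as np

gamma_o = 5.0
k = lambda X, Y: np.exp(-gamma_o * np.sum((X[:, None] - Y[None, :]) ** 2, axis=2))
unit = lambda a: np.stack([np.sin(a), np.cos(a)], axis=1)

# Basis vectors phi_o(x_i) for x_i on a uniform grid over orientations
grid = unit(np.linspace(0, 2 * np.pi, 16, endpoint=False))
K_BB_inv = np.linalg.pinv(k(grid, grid))

def k_approx(X, Y):
    """Projected kernel k_x^T K_BB^{-1} k_y over the basis span."""
    return k(X, grid) @ K_BB_inv @ k(grid, Y)

# Compare true and approximate kernel values, as in Figure 1
t = unit(np.linspace(0, 2 * np.pi, 200))
ref = unit(np.array([0.0]))
rmse = np.sqrt(np.mean((k(t, ref) - k_approx(t, ref)) ** 2))
print(f"RMSE with a 16-point grid: {rmse:.2e}")
```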
44 In a similar manner, we can also approximate the kernels $k_p$, $k_c$ and $k_b$. [sent-117, score-0.498]
45 The finite-dimensional feature for the gradient match kernel is $F_{grad}(P) = \sum_{z \in P} \widetilde{m}(z)\,\phi_o(\widetilde{\theta}(z)) \otimes \phi_p(z)$, and may be efficiently used as features over image patches. [sent-118, score-0.945]
46 When the grid size is larger than 16, the finite kernel and the original kernel become virtually indistinguishable. [sent-122, score-0.903]
47 For the shape kernel over local binary patterns, because the variables are binary, we simply choose the set of all $2^8 = 256$ basis vectors and thus no approximation error is introduced. [sent-123, score-0.702]
48 The number of basis vectors can be further reduced by performing kernel principal component analysis over the joint basis vectors $\{\phi_o(x_1) \otimes \phi_p(y_1), \cdots, \phi_o(x_{d_o}) \otimes \phi_p(y_{d_p})\}$, where $\phi_p(y_s)$ are basis vectors for the position kernel and $d_p$ is the number of position basis vectors. [sent-128, score-1.414]
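Because $\langle a \otimes b, c \otimes d \rangle = \langle a, c \rangle \langle b, d \rangle$, the kernel matrix over the joint basis vectors is the Kronecker product of the two basis kernel matrices, and the KPCA step reduces to an eigendecomposition of that matrix. A sketch with stand-in basis grids (the paper uses 25 orientation and 5 × 5 position basis vectors):

```python
import numpy as np

rng = np.random.default_rng(2)
gk = lambda X, g: np.exp(-g * np.sum((X[:, None] - X[None, :]) ** 2, axis=2))

X_o = rng.normal(size=(25, 2))  # stand-in orientation basis points
Y_p = rng.normal(size=(25, 2))  # stand-in position basis points
K_o, K_p = gk(X_o, 5.0), gk(Y_p, 3.0)

# Kernel matrix over {phi_o(x_i) (x) phi_p(y_j)} is a Kronecker product
K_joint = np.kron(K_o, K_p)

# KPCA: keep the top r eigenvectors; scale so each principal direction
# has unit norm in feature space (||v||^2 = lambda * ||alpha||^2)
r = 200
w, V = np.linalg.eigh(K_joint)
top = np.argsort(w)[::-1][:r]
alpha = V[:, top] / np.sqrt(np.maximum(w[top], 1e-12))
```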
49 (2), match kernels can be approximated rather accurately using the reduced basis vectors by KPCA. [sent-131, score-0.48]
50 Under the framework of kernel principal component analysis, our gradient kernel descriptor (Eq. [sent-132, score-1.118]
51 5) has the form $\overline{F}^t_{grad}(P) = \sum_{i=1}^{d_o} \sum_{j=1}^{d_p} \alpha^t_{ij} \sum_{z \in P} \widetilde{m}(z)\,k_o(\widetilde{\theta}(z), x_i)\,k_p(z, y_j)$ (12). The computational bottleneck in extracting kernel descriptors is evaluating the kernel functions $k_o$ and $k_p$ between pixels. [sent-133, score-1.751]
52 Fortunately, we can compute the two kernel values separately at a cost of $d_o + d_p$, rather than $d_o d_p$. [sent-134, score-0.521]
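Putting the pieces together, a sketch of descriptor extraction via Eq. (12); the expensive per-pixel kernel evaluations cost $d_o + d_p$ as noted, and alpha holds the KPCA coefficients (its r-by-$(d_o d_p)$ layout is our assumption):

```python
import numpy as np

def kernel_descriptor(theta, m, pos, X_o, Y_p, alpha,
                      gamma_o=5.0, gamma_p=3.0, eps_g=1e-8):
    """Eq. (12) for all r descriptor dimensions at once.

    X_o: d_o-by-2 orientation basis, Y_p: d_p-by-2 position basis,
    alpha: r-by-(d_o * d_p) KPCA projection coefficients.
    """
    o = np.stack([np.sin(theta), np.cos(theta)], axis=1)
    m_norm = m / np.sqrt(np.sum(m ** 2) + eps_g)
    # Per pixel, only d_o + d_p kernel evaluations, not d_o * d_p
    Ko = np.exp(-gamma_o * np.sum((o[:, None] - X_o[None, :]) ** 2, axis=2))
    Kp = np.exp(-gamma_p * np.sum((pos[:, None] - Y_p[None, :]) ** 2, axis=2))
    # sum_z m(z) * k_o(theta(z), x_i) * k_p(z, y_j), flattened over (i, j)
    G = np.einsum('z,zi,zj->ij', m_norm, Ko, Kp).ravel()
    return alpha @ G
```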
53 Our most expensive kernel descriptor, the shape kernel, takes about 4 seconds in MATLAB to compute on a typical image (300 × 300 resolution and 16 × 16 image patches over 8×8 grids). [sent-135, score-0.928]
54 5 seconds for the gradient kernel descriptor, compared to about 0. [sent-137, score-0.528]
55 A more efficient GPU-based implementation will certainly reduce the computation time for kernel descriptors such that real time applications become feasible. [sent-139, score-0.733]
56 4 Experiments We compare gradient (KDES-G), color (KDES-C), and shape (KDES-S) kernel descriptors to SIFT and several other state-of-the-art object recognition algorithms on four publicly available datasets: Scene-15, Caltech-101, CIFAR10, and CIFAR10-ImageNet (a subset of ImageNet). [sent-140, score-1.121]
57 For gradient and shape kernel descriptors and SIFT, all images are transformed into grayscale ([0, 1]) and resized to be no larger than 300 × 300 pixels with preserved aspect ratio. [sent-141, score-1.088]
58 For our gradient kernel descriptors we use the same gradient computation as used for SIFT descriptors. [sent-149, score-0.939]
59 We also evaluate the performance of the combination of the three kernel descriptors (KDES-A) by simply concatenating the image-level feature vectors. [sent-150, score-0.822]
60 Instead of spatial pyramid kernels, we compute image-level features using efficient match kernels (EMK), which have been shown to produce more accurate quantization. [sent-151, score-0.525]
61 We consider 1 × 1, 2 × 2 and 4 × 4 pyramid sub-regions (see [1]), and perform constrained kernel singular value decomposition (CKSVD) to form image-level features, using 1,000 visual words (basis vectors in CKSVD) learned by K-means from about 100,000 image patch features. [sent-152, score-0.818]
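As a rough illustration of the image-level stage (plain average pooling is substituted here for the CKSVD-based efficient match kernel actually used; this is a simplification, not the authors' method), patch descriptors can be pooled over the pyramid sub-regions and concatenated:

```python
import numpy as np

def pyramid_pool(descs, centers, levels=(1, 2, 4)):
    """Image-level feature from patch descriptors.

    descs: n-by-r patch kernel descriptors; centers: n-by-2 patch centers
    normalized to [0, 1).
    """
    feats = []
    for L in levels:
        cell = np.floor(centers * L).astype(int)   # pyramid cell per patch
        cell_id = cell[:, 0] * L + cell[:, 1]
        for c in range(L * L):
            mask = cell_id == c
            feats.append(descs[mask].mean(axis=0) if mask.any()
                         else np.zeros(descs.shape[1]))
    return np.concatenate(feats)  # length (1 + 4 + 16) * r for these levels
```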
62 We have experimented with both linear SVMs and Laplacian kernel SVMs and found that Laplacian kernel SVMs over efficient match kernel features are always better than linear SVMs (see §4. [sent-154, score-1.515]
63 We use Laplacian kernel SVMs in our experiments (except for the tiny image dataset CIFAR10). [sent-156, score-0.614]
64 1 Hyperparameter Selection We select kernel parameters using a subset of ImageNet. [sent-158, score-0.425]
65 We choose basis vectors for $k_o$, $k_c$, and $k_p$ from 25, 5 × 5 × 5 and 5 × 5 uniform grids, respectively, which give sufficient approximations to the original kernels (see also Fig. [sent-160, score-0.957]
66 We optimize the dimensionality of KPCA and match kernel parameters jointly using exhaustive grid search. [sent-162, score-0.629]
67 Our experiments suggest that the optimal parameter settings are r = 200 (dimensionality of kernel descriptors), γo = 5, γc = 4, γb = 2, γp = 3, g = 0. [sent-163, score-0.425]
68 As we can see, both gradient and shape kernel descriptors outperform SIFT by a margin. [sent-174, score-0.937]
69 Gradient kernel descriptors and shape kernel descriptors have similar performance. [sent-175, score-1.567]
70 It is not surprising that the intensity kernel descriptor has a lower accuracy, as all the images are grayscale. [sent-176, score-0.673]
71 The combination of the three kernel descriptors further boosts the performance by about 2 percent. [sent-177, score-0.733]
72 Another interesting finding is that Laplacian kernel SVMs are significantly better than linear SVMs, 86. [sent-178, score-0.425]
73 We also tried to replace SIFT features with our gradient and shape kernel descriptors in SPM, and both obtained 83. [sent-183, score-1.026]
74 To the best of our knowledge, our gradient kernel descriptor alone outperforms the best published result of 84. [sent-185, score-0.685]
75 Left: accuracy as a function of feature dimensionality for the orientation kernel (KDES-G) and the shape kernel (KDES-S), respectively. [sent-206, score-1.144]
76 Methods Linear SVM Laplacian kernel SVM SIFT 76.4 [sent-209, score-0.425]
77 Table 1: Comparisons of recognition accuracy on Scene-15: kernel descriptors and their combination vs. SIFT. [sent-229, score-0.049]
78 We compare our kernel descriptors with recently published results obtained by low-level feature learning algorithms: convolutional deep belief networks (CDBN) and sparse coding methods, namely invariant predictive sparse decomposition (IPSD) and locality-constrained linear coding. [sent-236, score-0.955]
79 Both our gradient kernel descriptor and shape kernel descriptor are superior to CDBN by a large margin. [sent-239, score-1.318]
80 Notice that both naive Bayesian nearest neighbor (NBNN) and locality-constrained linear coding should be compared to our kernel descriptors over multiple patch sizes, because both use multiple scales to boost performance. [sent-241, score-0.861]
81 Another finding is that the combination of three kernel descriptors outperforms any single kernel descriptor. [sent-244, score-1.158]
82 Our goal in this paper is to evaluate the strengths of kernel descriptors. [sent-246, score-0.425]
83 To improve accuracy further, kernel descriptors can be combined with other types of image features. [sent-247, score-0.918]
84 This dataset consists of 60,000 32×32 color images in 10 categories, with 5,000 images per category as the training set and 1,000 images per category as the test set. [sent-250, score-0.502]
85 We extract kernel descriptors over 8×8 image patches centered at every pixel. [sent-252, score-0.996]
86 The resulting feature vectors have a length of (1+4+9) × 1,000 (visual words) = 14,000 per kernel descriptor. [sent-254, score-0.56]
87 We compare our kernel descriptors to deep networks [14, 9] and several baselines in Table 3. [sent-316, score-0.83]
88 5%, respectively, over 30 percent lower than deep belief networks and our kernel descriptors. [sent-319, score-0.6]
89 To validate our intuitions, we also evaluated the color kernel descriptor without spatial information (kernel features are extracted on a 1 × 1 spatial grid), and only obtained 38. [sent-326, score-0.852]
90 5% accuracy, 18 percent lower than the color kernel descriptor over pyramid spatial grids. [sent-327, score-0.788]
91 The shape kernel feature, KDES-S, has an accuracy of 68. [sent-329, score-0.575]
92 Combining the three kernel descriptors, we obtain the best performance of 76%, 5 percent higher than the most sophisticated deep network, mcRBM-DBN, which models pixel means and covariances jointly using factorized third-order Boltzmann machines. [sent-331, score-0.725]
93 Both gradient and shape kernel descriptors achieve higher accuracy than SIFT features, which again confirms that our gradient and shape kernel descriptors outperform SIFT features on high-resolution images with the same categories as CIFAR-10. [sent-339, score-2.56]
94 5 Conclusion We have proposed a general framework, kernel descriptors, to extract low-level features from image patches. [sent-343, score-0.65]
95 Kernel descriptors are based on the insight that the inner product of orientation histograms is a particular match kernel over image patches. [sent-345, score-1.255]
96 We have performed extensive comparisons and confirmed that kernel descriptors outperform both SIFT features and hierarchical feature learning, where the former is the default choice for object recognition and the latter is the most popular low-level feature learning technique. [sent-346, score-1.042]
97 To the best of our knowledge, we are the first to show how kernel methods can be applied to extract low-level image features while achieving superior performance. [sent-347, score-0.65]
98 This opens up many possibilities for learning low-level features with other kernel methods. [sent-348, score-0.514]
99 Considering the huge success of kernel methods in the last twenty years, we believe that this direction is worth pursuing. [sent-349, score-0.425]
100 In the future, we plan to investigate alternative kernels for low-level feature learning and learn pixel attributes from large image data collections such as ImageNet. [sent-350, score-0.568]
simIndex simValue paperId paperTitle
same-paper 1 1.000001 133 nips-2010-Kernel Descriptors for Visual Recognition
Author: Liefeng Bo, Xiaofeng Ren, Dieter Fox
Abstract: The design of low-level image features is critical for computer vision algorithms. Orientation histograms, such as those in SIFT [16] and HOG [3], are the most successful and popular features for visual object and scene recognition. We highlight the kernel view of orientation histograms, and show that they are equivalent to a certain type of match kernels over image patches. This novel view allows us to design a family of kernel descriptors which provide a unified and principled framework to turn pixel attributes (gradient, color, local binary pattern, etc.) into compact patch-level features. In particular, we introduce three types of match kernels to measure similarities between image patches, and construct compact low-dimensional kernel descriptors from these match kernels using kernel principal component analysis (KPCA) [23]. Kernel descriptors are easy to design and can turn any type of pixel attribute into patch-level features. They outperform carefully tuned and sophisticated features including SIFT and deep belief networks. We report superior performance on standard image classification benchmarks: Scene-15, Caltech-101, CIFAR10 and CIFAR10-ImageNet.
Author: Serhat Bucak, Rong Jin, Anil K. Jain
Abstract: Recent studies have shown that multiple kernel learning is very effective for object recognition, leading to the popularity of kernel learning in computer vision problems. In this work, we develop an efficient algorithm for multi-label multiple kernel learning (ML-MKL). We assume that all the classes under consideration share the same combination of kernel functions, and the objective is to find the optimal kernel combination that benefits all the classes. Although several algorithms have been developed for ML-MKL, their computational cost is linear in the number of classes, making them unscalable when the number of classes is large, a challenge frequently encountered in visual object recognition. We address this computational challenge by developing a framework for ML-MKL that combines the worst-case analysis with stochastic approximation. Our analysis √ shows that the complexity of our algorithm is O(m1/3 lnm), where m is the number of classes. Empirical studies with object recognition show that while achieving similar classification accuracy, the proposed method is significantly more efficient than the state-of-the-art algorithms for ML-MKL. 1
3 0.22202557 145 nips-2010-Learning Kernels with Radiuses of Minimum Enclosing Balls
Author: Kun Gai, Guangyun Chen, Chang-shui Zhang
Abstract: In this paper, we point out that there exist scaling and initialization problems in most existing multiple kernel learning (MKL) approaches, which employ the large margin principle to jointly learn both a kernel and an SVM classifier. The reason is that the margin itself can not well describe how good a kernel is due to the negligence of the scaling. We use the ratio between the margin and the radius of the minimum enclosing ball to measure the goodness of a kernel, and present a new minimization formulation for kernel learning. This formulation is invariant to scalings of learned kernels, and when learning linear combination of basis kernels it is also invariant to scalings of basis kernels and to the types (e.g., L1 or L2 ) of norm constraints on combination coefficients. We establish the differentiability of our formulation, and propose a gradient projection algorithm for kernel learning. Experiments show that our method significantly outperforms both SVM with the uniform combination of basis kernels and other state-of-art MKL approaches. 1
4 0.19525932 124 nips-2010-Inductive Regularized Learning of Kernel Functions
Author: Prateek Jain, Brian Kulis, Inderjit S. Dhillon
Abstract: In this paper we consider the problem of semi-supervised kernel function learning. We first propose a general regularized framework for learning a kernel matrix, and then demonstrate an equivalence between our proposed kernel matrix learning framework and a general linear transformation learning problem. Our result shows that the learned kernel matrices parameterize a linear transformation kernel function and can be applied inductively to new data points. Furthermore, our result gives a constructive method for kernelizing most existing Mahalanobis metric learning formulations. To make our results practical for large-scale data, we modify our framework to limit the number of parameters in the optimization process. We also consider the problem of kernelized inductive dimensionality reduction in the semi-supervised setting. To this end, we introduce a novel method for this problem by considering a special case of our general kernel learning framework where we select the trace norm function as the regularizer. We empirically demonstrate that our framework learns useful kernel functions, improving the k-NN classification accuracy significantly in a variety of domains. Furthermore, our kernelized dimensionality reduction technique significantly reduces the dimensionality of the feature space while achieving competitive classification accuracies. 1
5 0.19297442 279 nips-2010-Universal Kernels on Non-Standard Input Spaces
Author: Andreas Christmann, Ingo Steinwart
Abstract: During the last years support vector machines (SVMs) have been successfully applied in situations where the input space X is not necessarily a subset of Rd . Examples include SVMs for the analysis of histograms or colored images, SVMs for text classiÄ?Ĺš cation and web mining, and SVMs for applications from computational biology using, e.g., kernels for trees and graphs. Moreover, SVMs are known to be consistent to the Bayes risk, if either the input space is a complete separable metric space and the reproducing kernel Hilbert space (RKHS) H ⊂ Lp (PX ) is dense, or if the SVM uses a universal kernel k. So far, however, there are no kernels of practical interest known that satisfy these assumptions, if X ⊂ Rd . We close this gap by providing a general technique based on Taylor-type kernels to explicitly construct universal kernels on compact metric spaces which are not subset of Rd . We apply this technique for the following special cases: universal kernels on the set of probability measures, universal kernels based on Fourier transforms, and universal kernels for signal processing. 1
6 0.17504241 241 nips-2010-Size Matters: Metric Visual Search Constraints from Monocular Metadata
7 0.15937486 149 nips-2010-Learning To Count Objects in Images
8 0.15734833 17 nips-2010-A biologically plausible network for the computation of orientation dominance
9 0.13531169 250 nips-2010-Spectral Regularization for Support Estimation
10 0.13016942 280 nips-2010-Unsupervised Kernel Dimension Reduction
11 0.12799218 143 nips-2010-Learning Convolutional Feature Hierarchies for Visual Recognition
12 0.12208635 186 nips-2010-Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification
13 0.1188454 103 nips-2010-Generating more realistic images using gated MRF's
14 0.11355127 176 nips-2010-Multiple Kernel Learning and the SMO Algorithm
15 0.11203608 140 nips-2010-Layer-wise analysis of deep networks with Gaussian kernels
16 0.1082212 86 nips-2010-Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach
17 0.10437724 209 nips-2010-Pose-Sensitive Embedding by Nonlinear NCA Regression
18 0.10187822 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision
19 0.099059373 95 nips-2010-Feature Transitions with Saccadic Search: Size, Color, and Orientation Are Not Alike
20 0.098167263 199 nips-2010-Optimal learning rates for Kernel Conjugate Gradient regression
simIndex simValue paperId paperTitle
same-paper 1 0.98476738 133 nips-2010-Kernel Descriptors for Visual Recognition
3 0.73855036 145 nips-2010-Learning Kernels with Radiuses of Minimum Enclosing Balls
4 0.71955091 124 nips-2010-Inductive Regularized Learning of Kernel Functions
5 0.68432456 280 nips-2010-Unsupervised Kernel Dimension Reduction
Author: Meihong Wang, Fei Sha, Michael I. Jordan
Abstract: We apply the framework of kernel dimension reduction, originally designed for supervised problems, to unsupervised dimensionality reduction. In this framework, kernel-based measures of independence are used to derive low-dimensional representations that maximally capture information in covariates in order to predict responses. We extend this idea and develop similarly motivated measures for unsupervised problems where covariates and responses are the same. Our empirical studies show that the resulting compact representation yields meaningful and appealing visualization and clustering of data. Furthermore, when used in conjunction with supervised learners for classification, our methods lead to lower classification errors than state-of-the-art methods, especially when embedding data in spaces of very few dimensions.
6 0.67892659 176 nips-2010-Multiple Kernel Learning and the SMO Algorithm
7 0.67047739 279 nips-2010-Universal Kernels on Non-Standard Input Spaces
8 0.59961027 250 nips-2010-Spectral Regularization for Support Estimation
9 0.59325719 17 nips-2010-A biologically plausible network for the computation of orientation dominance
10 0.58600169 149 nips-2010-Learning To Count Objects in Images
11 0.58193684 245 nips-2010-Space-Variant Single-Image Blind Deconvolution for Removing Camera Shake
12 0.5802002 241 nips-2010-Size Matters: Metric Visual Search Constraints from Monocular Metadata
13 0.57208008 72 nips-2010-Efficient algorithms for learning kernels from multiple similarity matrices with general convex loss functions
14 0.52304417 95 nips-2010-Feature Transitions with Saccadic Search: Size, Color, and Orientation Are Not Alike
15 0.51296687 143 nips-2010-Learning Convolutional Feature Hierarchies for Visual Recognition
16 0.51110482 154 nips-2010-Learning sparse dynamic linear systems using stable spline kernels and exponential hyperpriors
17 0.50008041 186 nips-2010-Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification
18 0.49850219 249 nips-2010-Spatial and anatomical regularization of SVM for brain image analysis
19 0.49847156 153 nips-2010-Learning invariant features using the Transformed Indian Buffet Process
20 0.48613504 266 nips-2010-The Maximal Causes of Natural Scenes are Edge Filters
simIndex simValue paperId paperTitle
same-paper 1 0.99890774 133 nips-2010-Kernel Descriptors for Visual Recognition
2 0.99698806 187 nips-2010-Occlusion Detection and Motion Estimation with Convex Optimization
Author: Alper Ayvaci, Michalis Raptis, Stefano Soatto
Abstract: We tackle the problem of simultaneously detecting occlusions and estimating optical flow. We show that, under standard assumptions of Lambertian reflection and static illumination, the task can be posed as a convex minimization problem. Therefore, the solution, computed using efficient algorithms, is guaranteed to be globally optimal, for any number of independently moving objects, and any number of occlusion layers. We test the proposed algorithm on benchmark datasets, expanded to enable evaluation of occlusion detection performance. 1
3 0.99668318 167 nips-2010-Mixture of time-warped trajectory models for movement decoding
Author: Elaine Corbett, Eric Perreault, Konrad Koerding
Abstract: Applications of Brain-Machine-Interfaces typically estimate user intent based on biological signals that are under voluntary control. For example, we might want to estimate how a patient with a paralyzed arm wants to move based on residual muscle activity. To solve such problems it is necessary to integrate obtained information over time. To do so, state of the art approaches typically use a probabilistic model of how the state, e.g. position and velocity of the arm, evolves over time – a so-called trajectory model. We wanted to further develop this approach using two intuitive insights: (1) At any given point of time there may be a small set of likely movement targets, potentially identified by the location of objects in the workspace or by gaze information from the user. (2) The user may want to produce movements at varying speeds. We thus use a generative model with a trajectory model incorporating these insights. Approximate inference on that generative model is implemented using a mixture of extended Kalman filters. We find that the resulting algorithm allows us to decode arm movements dramatically better than when we use a trajectory model with linear dynamics. 1 In trod u cti on When patients have lost a limb or the ability to communicate with the outside world, brain machine interfaces (BMIs) are often used to enable robotic prostheses or restore communication. To achieve this, the user's intended state of the device must be decoded from biological signals. In the context of Bayesian statistics, two aspects are important for the design of an estimator of a temporally evolving state: the observation model, which describes how measured variables relate to the system’s state and the trajectory model which describes how the state changes over time in a probabilistic manner. Following this logic many recent BMI applications have relied on Bayesian estimation for a wide range of problems including the decoding of intended human [1] and animal [2] movements. In the context of BMIs, Bayesian approaches offer a principled way of formalizing the uncertainty about signals and thus often result in improvements over other signal processing techniques [1]-[3]. Most work on state estimation in dynamical systems has assumed linear dynamics and Gaussian noise. Under these circumstances, efficient algorithms result from belief propagation. The most frequent application uses the Kalman filter (KF), which recursively combines noisy state observations with the probabilistic evolution of state defined by the trajectory model to estimate the marginal distribution over states [4]. Such approaches have been used widely for applications including upper [1] and lower [5] extremity prosthetic 1 devices, functional electric stimulation [6] and human computer interactions [7]. As these algorithms are so commonly used, it seems promising to develop extensions to nonlinear trajectory models that may better describe the probabilistic distribution of movements in everyday life. One salient departure from the standard assumptions is that people tend to produce both slow and fast movements, depending on the situation. Models with linear dynamics only allow such deviation through the noise term, which makes these models poor at describing the natural variation of movement speeds during real world tasks. Explicitly incorporating movement speed into the trajectory model should lead to better movement estimates. Knowledge of the target position should also strongly affect trajectory models. 
After all , we tend to accelerate our arm early during movement and slow down later on. Target information can be linearly incorporated into the trajectory model, and this has greatly improved predictions [8]-[12]. Alternatively, if there are a small number of potential targets then a mixture of trajectory models approach [13] can be used. Here we are interested in the case where available data provide a prior over potential t argets but where movement targets may be anywhere. We want to incorporate target uncertainty and allow generalization to novel targets. Prior information about potential targets could come from a number of sources but would generally be noisy. For example, activity in the dorsal premotor cortex provides information about intended target location prior to movement and may be used where such recordings are available [14]. Target information may also be found noninvasively by tracking eye movements. However, such data will generally provide non-zero priors for a number of possible target locations as the subject saccades over the scene. While subjects almost always look at a target before reaching for it [15], there may be a delay of up to a second between looking at the target and the reach – a time interval over which up to 3 saccades are typically made. Each of these fixations could be the target. Hence, a probabilistic distribution of targets is appropriate when using either neural recordings or eye tracking to estimate potential reach targets Here we present an algorithm that uses a mixture of extended Kalman Filters (EKFs) to combine our insights related to the variation of movement speed and the availability of probabilistic target knowledge. Each of the mixture component s allows the speed of the movement to vary continuously over time. We tested how well we could use EMGs and eye movements to decode hand position of humans performing a three -dimensional large workspace reaching task. We find that using a trajectory model that allows for probabilistic target information and variation of speed leads to dramatic improvements in decoding quality. 2 Gen e ral Decod i n g S etti n g We wanted to test how well different decoding algorithms can decode human movement, over a wide range of dynamics. While many recent studies have looked at more restrictive, two-dimensional movements, a system to restore arm function should produce a wide range of 3D trajectories. We recorded arm kinematics and EMGs of healthy subjects during unconstrained 3D reaches to targets over a large workspace. Two healthy subjects were asked to reach at slow, normal and fast speeds, as they would in everyday life. Subjects were seated as they reached towards 16 LEDs in blocks of 150s, which were located on two planes positioned such that all targets were just reachable (Fig 1A). The target LED was lit for one second prior to an auditory go cue, at which time the subject would reach to the target at the appropriate speed. Slow, normal and fast reaches were allotted 3 s, 1.5s and 1s respectively; however, subjects determined the speed. An approximate total of 450 reaches were performed per subject. The subjects provided informed consent, and the protocol was approved by the Northwestern University Institutional Review Board. EMG signals were measured from the pectoralis major, and the three deltoid muscles of the shoulder. 
This represents a small subset of the muscles involved in reaching, and approximates those muscles retaining some voluntary control following mid-level cervical spinal cord injuries. 2 The EMG signals were band-pass filtered between 10 and 1,000 Hz, and subsequently anti aliased filtered. Hand, wrist, shoulder and head positions were tracked using an Optotrak motion analysis system. We simultaneously recorded eye movements with an ASL EYETRAC-6 head mounted eye tracker. Approximately 25% of the reaches were assigned to the test set, and the rest were used for training. Reaches for which either the motion capture data was incomplete, or there was visible motion artifact on the EMG were removed. As the state we used hand positions and joint angles (3 shoulder, 2 elbow, position, velocity and acceleration, 24 dimensions). Joint angles were calculated from the shoulder and wrist marker data using digitized bony landmarks which defined a coordinate system for the upper limb as detailed by Wu et al. [16]. As the motion data were sampled at 60Hz, the mean absolute value o f the EMG in the corresponding 16.7ms windows was used as an observation of the state at each time-step. Algorithm accuracy was quantified by normalizing the root -mean-squared error by the straight line distance between the first and final position of the endpoint for each reach. We compared the algorithms statistically using repeated measures ANOVAs with Tukey post -hoc tests, treating reach and subject as random effects. In the rest of the paper we will ask how well these reaching movements can be decoded from EMG and eye-tracking data. Figure 1: A Experimental setup and B sample kinematics and processed EMGs for one reach 3 Kal man Fi l ters w i th Target i n f ormati on All models that we consider in this paper assume linear observations with Gaussian noise: (1) where x is the state, y is the observation and v is the measurement noise with p(v) ~ N(0,R), and R is the observation covariance matrix. The model fitted the measured EMGs with an average r2 of 0.55. This highlights the need to integrate information over time. The standard approach also assumes linear dynamics and Gaussian process noise: (2) where, x t represents the hand and joint angle positions, w is the process noise with p(w) ~ N(0,Q), and Q is the state covariance matrix. The Kalman filter does optimal inference for this generative model. This model can effectively capture the dynamics of stereotypical reaches to a single target by appropriately tuning its parameters. However, when used to describe reaches to multiple targets, the model cannot describe target dependent aspects of reaching but boils down to a random drift model. Fast velocities are underestimated as they are unlikely under the trajectory model and there is excessive drift close to the target (Fig. 2A). 3 In many decoding applications we may know the subject’s target. A range of recent studies have addressed the issue of incorporating this information into the trajectory model [8, 13], and we might assume the effect of the target on the dynamics to be linear. This naturally suggests adding the target to the state space, which works well in practice [9, 12]. By appending the target to the state vector (KFT), the simple linear format of the KF may be retained: (3) where xTt is the vector of target positions, with dimensionality less than or equal to that of xt. 
This trajectory model thus allows describing both the rapid acceleration that characterizes the beginning of a reach and the stabilization towards its end. We compared the accuracy of the KF and the KFT to the Single Target Model (STM), a KF trained only on reaches to the target being tested (Fig. 2). The STM represents the best possible prediction that could be obtained with a Kalman filter. Assuming the target is perfectly known, we implemented the KFT by correctly initializing the target state xT at the beginning of the reach. We will relax this assumption below. The initial hand and joint angle positions were also assumed to be known. Figure 2: A Sample reach and predictions and B average accuracies with standard errors for KFT, KF and MTM. Consistent with the recent literature, both methods that incorporated target information produced higher prediction accuracy than the standard KF (both p<0.0001). Interestingly, there was no significant difference between the KFT and the STM (p=0.9). It seems that when we have knowledge of the target, we do not lose much by training a single model over the whole workspace rather than modeling the targets individually. This is encouraging, as we desire a BMI system that can generalize to any target within the workspace, not just specifically to those that are available in the training data. Clearly, adding the target to the state space allows the dynamics of typical movements to be modeled effectively, resulting in dramatic increases in decoding performance. 4 Ti me Warp i n g 4.1 I m p l e m e n t i n g a t i m e - w a r p e d t r a j e c t o r y mo d e l While the KFT above can capture the general reach trajectory profile, it does not allow for natural variability in the speed of movements. Depending on our task objectives, which would not directly be observed by a BMI, we might lazily reach toward a target or move a t maximal speed. We aim to change the trajectory model to explicitly incorporate a warping factor by which the average movement speed is scaled, allowing for such variability. As the movement speed will be positive in all practical cases, we model the logarithm of this factor, 4 and append it to the state vector: (4) We create a time-warped trajectory model by noting that if the average rate of a trajectory is to be scaled by a factor S, the position at time t will equal that of the original trajectory at time St. Differentiating, the velocity will be multiplied by S, and the acceleration by S 2. For simplicity, the trajectory noise is assumed to be additive and Gaussian, and the model is assumed to be stationary: (5) where Ip is the p-dimensional identity matrix and is a p p matrix of zeros. Only the terms used to predict the acceleration states need to be estimated to build the state transition matrix, and they are scaled as a nonlinear function of xs. After adding the variable movement speed to the state space the system is no longer linear. Therefore we need a different solution strategy. Instead of the typical KFT we use the Extended Kalman Filter (EKFT) to implement a nonlinear trajectory model by linearizing the dynamics around the best estimate at each time-step [17]. With this approach we add only small computational overhead to the KFT recursions. 4.2 Tr a i n i n g t h e t i m e w a r p i n g mo d e l The filter parameters were trained using a variant of the Expectation Maximization (EM) algorithm [18]. For extended Kalman filter learning the initialization for the variables may matter. 
S was initialized with the ground truth average reach speeds for each movement relative to the average speed across all movements. The state transition parameters were estimated using nonlinear least squares regression, while C, Q and R were estimated linearly for the new system, using the maximum likelihood solution [18] (M-step). For the E-step we used a standard extended Kalman smoother. We thus found the expected values for t he states given the current filter parameters. For this computation, and later when testing the algorithm, xs was initialized to its average value across all reaches while the remaining states were initialized to their true values. The smoothed estimate fo r xs was then used, along with the true values for the other states, to re-estimate the filter parameters in the M-step as before. We alternated between the E and M steps until the log likelihood converged (which it did in all cases). Following the training procedure, the diagonal of the state covariance matrix Q corresponding to xs was set to the variance of the smoothed xs over all reaches, according to how much this state should be allowed to change during prediction. This allowed the estimate of xs to develop over the course of the reach due to the evidence provided by the observations, better capturing the dynamics of reaches at different speeds. 4.3 P e r f o r ma n c e o f t h e t i m e - w a r p e d E K F T Incorporating time warping explicitly into the trajectory model pro duced a noticeable increase in decoding performance over the KFT. As the speed state xs is estimated throughout the course of the reach, based on the evidence provided by the observations, the trajectory model has the flexibility to follow the dynamics of the reach more accurately (Fig. 3). While at the normal self-selected speed the difference between the algorithms is small, for the slow and fast speeds, where the dynamics deviate from average, there i s a clear advantage to the time warping model. 5 Figure 3: Hand positions and predictions of the KFT and EKFT for sample reaches at A slow, B normal and C fast speeds. Note the different time scales between reaches. The models were first trained using data from all speeds (Fig. 4A). The EKFT was 1.8% more accurate on average (p<0.01), and the effect was significant at the slow (1.9%, p<0.05) and the fast (2.8%, p<0.01), but not at the normal (p=0.3) speed. We also trained the models from data using only reaches at the self-selected normal speed, as we wanted to see if there was enough variation to effectively train the EKFT (Fig. 4B). Interestingly, the performance of the EKFT was reduced by only 0.6%, and the KFT by 1.1%. The difference in performance between the EKFT and KFT was even more pronounced on aver age (2.3%, p<0.001), and for the slow and fast speeds (3.6 and 4.1%, both p< 0.0001). At the normal speed, the algorithms again were not statistically different (p=0.6). This result demonstrates that the EKFT is a practical option for a real BMI system, as it is not necessary to greatly vary the speeds while collecting training data for the model to be effective over a wide range of intended speeds. Explicitly incorporating speed information into the trajectory model helps decoding, by modeling the natural variation in volitional speed. Figure 4: Mean and standard error of EKFT and KFT accuracy at the different subjectselected speeds. Models were trained on reaches at A all speeds and B just normal speed reaches. 
Asterisks indicate statistically significant differences between the algorithms. 5 Mi xtu res of Target s So far, we have assumed that the targets of our reaches are perfectly known. In a real-world system, there will be uncertainty about the intended target of the reach. However, in typical applications there are a small number of possible objectives. Here we address this situation. Drawing on the recent literature, we use a mixture model to consider each of the possible targets [11, 13]. We condition the posterior probability for the state on the N possible targets, T: (6) 6 Using Bayes' Rule, this equation becomes: (7) As we are dealing with a mixture model, we perform the Kalman filter recursion for each possible target, xT, and our solution is a weighted sum of the outputs. The weights are proportional to the prior for that target, , and the likelihood of the model given that target . is independent of the target and does not need to be calculated. We tested mixtures of both algorithms, the mKFT and mEKFT, with real uncert ain priors obtained from eye-tracking in the one-second period preceding movement. As the targets were situated on two planes, the three-dimensional location of the eye gaze was found by projecting its direction onto those planes. The first, middle and last eye samples were selected, and all other samples were assigned to a group according to which of the three was closest. The mean and variance of these three groups were used to initialize three Kalman filters in the mixture model. The priors of the three groups were assigned proportional to the number of samples in them. If the subject looks at multiple positions prior to reaching, this method ensures with a high probability that the correct target was accounted for in one of the filters in the mixture. We also compared the MTM approach of Yu et al. [13], where a different KF model was generated for each target, and a mixture is performed over these models. This approach explicitly captures the dynamics of stereotypical reaches to specific targets. Given perfect target information, it would reduce to the STM described above. Priors for the MTM were found by assigning each valid eye sample to its closest two targets, and weighting the models proportional to the number of samples assigned to the corresponding target, divided by its distance from the mean of those samples. We tried other ways of assigning priors and the one presented gave the best results. We calculated the reduction in decoding quality when instead of perfect priors we provide eye-movement based noisy priors (Fig. 5). The accuracies of the mEKFT, the mKFT and the MTM were only degraded by 0.8, 1.9 and 2.1% respectively, compared to the perfect prior situation. The mEKFT was still close to 10% better than the KF. The mixture model framework is effective in accounting for uncertain priors. Figure 5: Mean and standard errors of accuracy for algorithms with perfect priors, and uncertain priors with full and partial training set. The asterisk indicates a statistically significant effects between the two training types, where real priors are used. Here, only reaches at normal speed were used to train the models, as this is a more realistic training set for a BMI application. This accounts for the degraded performance of the MTM with perfect priors relative to the STM from above (Fig. 2). With even more stereotyped training data for each target, the MTM doesn't generalize as well to new speeds. 
We also wanted to know whether the algorithms could generalize to new targets. In a real application, the available training data will generally not span the entire usable workspace. We therefore compared the algorithms when reaches to all targets except the one being tested had been used to train the models. The performance of the MTM was significantly degraded, unsurprisingly, as it was designed for reaches to a set of known targets. Performance of the mKFT and mEKFT degraded by about 1%, but not significantly (both p>0.7), demonstrating that the continuous approach to target information is preferable when the target could be anywhere in space, not just at locations for which training data is available.

6 Discussion and conclusions

The goal of this work was to design a trajectory model that would improve decoding for BMIs, with an application to reaching. We incorporated two features that prominently influence the dynamics of natural reaches: the movement speed and the target location. Our approach is appropriate where uncertain target information is available. The model generalizes well to new regions of the workspace for which there is no training data, and across a broad range of reaching dynamics to widely spaced targets in three dimensions.

The gains in decoding precision over linear models reported here could in principle also be obtained using mixtures over many targets and speeds. While such mixture models [11, 13] could allow for slow versus fast movements and any number of potential targets, this strategy generally requires many mixture components. It would demand far more training data, as we have shown that it does not generalize well, and it would be run-time intensive, which is problematic for prosthetic devices that rely on low-power controllers. In contrast, the algorithm introduced here adds only a small amount of run time over the standard KF approach: the EKF is only marginally slower than the standard KF, and the algorithm will generally not need to consider more than three mixture components, assuming the subject fixates the target within the second preceding the reach.

In this paper we assumed that subjects always fixate the reach target, possibly along with other, non-target locations. While this is close to the way humans usually coordinate eyes and reaches [15], there may be cases where people do not fixate the reach target. Our approach could easily be extended to deal with such situations by adding a dummy mixture component that allows the description of movements to any target. As an alternative to mixture approaches, a system could explicitly estimate the target position in the state vector [9]. This approach, however, would not straightforwardly accommodate the rich target information available: we look not only at the target but also at other locations, which strongly suggests mixture distributions. A combination of the two approaches could further improve decoding quality, estimating both speed and target position for the EKFT in a continuous manner while retaining the mixture over target priors.

We believe that the issues addressed here are almost universal. Virtually all types of movements are executed at varying speeds, and a probabilistic distribution over a small number of action candidates may be expected in most BMI applications; after all, there are usually only a small number of actions that make sense in a given environment.
While this work is presented in the context of decoding human reaching, it may be applied to a wide range of BMI applications, including lower-limb prosthetic devices and human-computer interaction, as well as to different signal sources such as electrode grid recordings and electroencephalograms. The increased control users gain in conveying their intended movements would significantly improve the functionality of a neuroprosthetic device.

Acknowledgements

The authors thank T. Haswell, E. Krepkovich, and V. Ravichandran for assistance with experiments. This work was funded by the NSF Program in Cyber-Physical Systems.

References

[1] L.R. Hochberg, M.D. Serruya, G.M. Friehs, J.A. Mukand, M. Saleh, A.H. Caplan, A. Branner, D. Chen, R.D. Penn, and J.P. Donoghue, “Neuronal ensemble control of prosthetic devices by a human with tetraplegia,” Nature, vol. 442, 2006, pp. 164–171.
[2] W. Wu, Y. Gao, E. Bienenstock, J.P. Donoghue, and M.J. Black, “Bayesian population decoding of motor cortical activity using a Kalman filter,” Neural Computation, vol. 18, 2006, pp. 80–118.
[3] W. Wu, M.J. Black, Y. Gao, E. Bienenstock, M. Serruya, A. Shaikhouni, and J.P. Donoghue, “Neural decoding of cursor motion using a Kalman filter,” Advances in Neural Information Processing Systems 15: Proceedings of the 2002 Conference, 2003, p. 133.
[4] R.E. Kalman, “A new approach to linear filtering and prediction problems,” Journal of Basic Engineering, vol. 82, 1960, pp. 35–45.
[5] G.G. Scandaroli, G.A. Borges, J.Y. Ishihara, M.H. Terra, A.F.D. Rocha, and F.A.D.O. Nascimento, “Estimation of foot orientation with respect to ground for an above knee robotic prosthesis,” Proceedings of the 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, St. Louis, MO, USA: IEEE Press, 2009, pp. 1112–1117.
[6] I. Cikajlo, Z. Matjačić, and T. Bajd, “Efficient FES triggering applying Kalman filter during sensory supported treadmill walking,” Journal of Medical Engineering & Technology, vol. 32, 2008, pp. 133–144.
[7] S. Kim, J.D. Simeral, L.R. Hochberg, J.P. Donoghue, and M.J. Black, “Neural control of computer cursor velocity by decoding motor cortical spiking activity in humans with tetraplegia,” Journal of Neural Engineering, vol. 5, 2008, pp. 455–476.
[8] L. Srinivasan, U.T. Eden, A.S. Willsky, and E.N. Brown, “A state-space analysis for reconstruction of goal-directed movements using neural signals,” Neural Computation, vol. 18, 2006, pp. 2465–2494.
[9] G.H. Mulliken, S. Musallam, and R.A. Andersen, “Decoding trajectories from posterior parietal cortex ensembles,” Journal of Neuroscience, vol. 28, 2008, p. 12913.
[10] W. Wu, J.E. Kulkarni, N.G. Hatsopoulos, and L. Paninski, “Neural decoding of hand motion using a linear state-space model with hidden states,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 17, 2009, p. 1.
[11] J.E. Kulkarni and L. Paninski, “State-space decoding of goal-directed movements,” IEEE Signal Processing Magazine, vol. 25, 2008, p. 78.
[12] C. Kemere and T. Meng, “Optimal estimation of feed-forward-controlled linear systems,” IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), 2005.
[13] B.M. Yu, C. Kemere, G. Santhanam, A. Afshar, S.I. Ryu, T.H. Meng, M. Sahani, and K.V. Shenoy, “Mixture of trajectory models for neural decoding of goal-directed movements,” Journal of Neurophysiology, vol. 97, 2007, p. 3763.
[14] N. Hatsopoulos, J. Joshi, and J.G. O'Leary, “Decoding continuous and discrete motor behaviors using motor and premotor cortical ensembles,” Journal of Neurophysiology, vol. 92, 2004, p. 1165.
[15] R.S. Johansson, G. Westling, A. Backstrom, and J.R. Flanagan, “Eye-hand coordination in object manipulation,” Journal of Neuroscience, vol. 21, 2001, p. 6917.
[16] G. Wu, F.C. van der Helm, H.E.J. Veeger, M. Makhsous, P. Van Roy, C. Anglin, J. Nagels, A.R. Karduna, and K. McQuade, “ISB recommendation on definitions of joint coordinate systems of various joints for the reporting of human joint motion – Part II: shoulder, elbow, wrist and hand,” Journal of Biomechanics, vol. 38, 2005, pp. 981–992.
[17] D. Simon, Optimal State Estimation: Kalman, H∞, and Nonlinear Approaches, John Wiley and Sons, 2006.
[18] Z. Ghahramani and G.E. Hinton, “Parameter estimation for linear dynamical systems,” University of Toronto Technical Report CRG-TR-96-2, vol. 6, 1996.
4 0.99643052 100 nips-2010-Gaussian Process Preference Elicitation
Author: Shengbo Guo, Scott Sanner, Edwin V. Bonilla
Abstract: Bayesian approaches to preference elicitation (PE) are particularly attractive due to their ability to explicitly model uncertainty in users’ latent utility functions. However, previous approaches to Bayesian PE have ignored the important problem of generalizing from previous users to an unseen user in order to reduce the elicitation burden on new users. In this paper, we address this deficiency by introducing a Gaussian Process (GP) prior over users’ latent utility functions on the joint space of user and item features. We learn the hyper-parameters of this GP on a set of preferences of previous users and use it to aid in the elicitation process for a new user. This approach provides a flexible model of a multi-user utility function, facilitates an efficient value of information (VOI) heuristic query selection strategy, and provides a principled way to incorporate the elicitations of multiple users back into the model. We show the effectiveness of our method in comparison to previous work on a real dataset of user preferences over sushi types. 1
5 0.99596298 255 nips-2010-Static Analysis of Binary Executables Using Structural SVMs
Author: Nikos Karampatziakis
Abstract: We cast the problem of identifying basic blocks of code in a binary executable as learning a mapping from a byte sequence to a segmentation of the sequence. In general, inference in segmentation models, such as semi-CRFs, can be cubic in the length of the sequence. By taking advantage of the structure of our problem, we derive a linear-time inference algorithm which makes our approach practical, given that even small programs are tens or hundreds of thousands bytes long. Furthermore, we introduce two loss functions which are appropriate for our problem and show how to use structural SVMs to optimize the learned mapping for these losses. Finally, we present experimental results that demonstrate the advantages of our method against a strong baseline. 1
6 0.99572313 235 nips-2010-Self-Paced Learning for Latent Variable Models
7 0.99527931 11 nips-2010-A POMDP Extension with Belief-dependent Rewards
8 0.99433529 32 nips-2010-Approximate Inference by Compilation to Arithmetic Circuits
9 0.99416161 108 nips-2010-Graph-Valued Regression
10 0.98699886 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision
11 0.98270702 141 nips-2010-Layered image motion with explicit occlusions, temporal consistency, and depth ordering
12 0.98088056 83 nips-2010-Evidence-Specific Structures for Rich Tractable CRFs
13 0.97887844 169 nips-2010-More data means less inference: A pseudo-max approach to structured learning
14 0.97764736 212 nips-2010-Predictive State Temporal Difference Learning
15 0.97619444 94 nips-2010-Feature Set Embedding for Incomplete Data
16 0.97609073 86 nips-2010-Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach
17 0.9745127 197 nips-2010-Optimal Bayesian Recommendation Sets and Myopically Optimal Choice Query Sets
18 0.97337115 144 nips-2010-Learning Efficient Markov Networks
19 0.97214514 174 nips-2010-Multi-label Multiple Kernel Learning by Stochastic Approximation: Application to Visual Object Recognition
20 0.97204506 103 nips-2010-Generating more realistic images using gated MRF's