nips nips2011 nips2011-165 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ricardo S. Cabral, Fernando Torre, Joao P. Costeira, Alexandre Bernardino
Abstract: Recently, image categorization has been an active research topic due to the urgent need to retrieve and browse digital images via semantic keywords. This paper formulates image categorization as a multi-label classification problem using recent advances in matrix completion. Under this setting, classification of testing data is posed as a problem of completing unknown label entries on a data matrix that concatenates training and testing features with training labels. We propose two convex algorithms for matrix completion based on a Rank Minimization criterion specifically tailored to visual data, and prove its convergence properties. A major advantage of our approach w.r.t. standard discriminative classification methods for image categorization is its robustness to outliers, background noise and partial occlusions both in the feature and label space. Experimental validation on several datasets shows how our method outperforms state-of-the-art algorithms, while effectively capturing semantic concepts of classes. 1
Reference: text
sentIndex sentText sentNum sentScore
1 pt Abstract Recently, image categorization has been an active research topic due to the urgent need to retrieve and browse digital images via semantic keywords. [sent-9, score-0.375]
2 This paper formulates image categorization as a multi-label classification problem using recent advances in matrix completion. [sent-10, score-0.29]
3 Under this setting, classification of testing data is posed as a problem of completing unknown label entries on a data matrix that concatenates training and testing features with training labels. [sent-11, score-0.261]
4 We propose two convex algorithms for matrix completion based on a Rank Minimization criterion specifically tailored to visual data, and prove its convergence properties. [sent-12, score-0.363]
5 standard discriminative classification methods for image categorization is its robustness to outliers, background noise and partial occlusions both in the feature and label space. [sent-16, score-0.408]
6 Experimental validation on several datasets shows how our method outperforms state-of-the-art algorithms, while effectively capturing semantic concepts of classes. [sent-17, score-0.069]
7 1 Introduction With the ever-growing amount of digital image data in multimedia databases, there is a great need for algorithms that can provide effective semantic indexing. [sent-18, score-0.22]
8 Categorizing digital images using keywords, however, is the quintessential example of a challenging classification problem. [sent-19, score-0.087]
9 Several aspects contribute to the difficulty of the image categorization problem, including the large variability in appearance, illumination and pose of different objects. [sent-20, score-0.219]
10 Over the last decade, progress in the image classification problem has been achieved by using more powerful classifiers and building or learning better image representations. [sent-22, score-0.228]
11 On one hand, standard discriminative approaches such as Support Vector Machines or Boosting have been extended to the multi-label case [28, 14] and incorporated under frameworks such as Multiple Instance Learning [31, 33, 32, 20, 27] and Multi-task Learning [26]. [sent-23, score-0.078]
12 However, a major limitation of discriminative approaches is the lack of robustness to outliers and missing data. [sent-24, score-0.108]
13 Recall most discriminative approaches project the data directly onto linear or non-linear spaces, thus lacking a noise model for it. [sent-25, score-0.235]
14 To address this issue, we propose formulating the image classification problem under a matrix completion framework, that has been fueled by recent advances in Rank Minimization [7, 18]. [sent-26, score-0.4]
15 On the other hand, traversal to the use of more powerful classifiers, better image representations, such as SIFT [17] or GIST [21] have boosted recognition and categorization performance. [sent-28, score-0.219]
16 Our algorithms make use of the fact that in this model the histogram of an entire image contains information of all of its subparts. [sent-30, score-0.182]
17 By modeling the error in the histogram, our matrix completion algorithm is able to capture semantically discriminative portions of the image, thus obviating the need for training with precise localization, as required by previous methods [31, 33, 32, 20, 27]. [sent-31, score-0.423]
18 1 Our main contributions are twofold: (1) We propose two new Rank Minimization algorithms, MCPos and MC-Simplex, motivated by the image categorization problem. [sent-32, score-0.219]
19 2 Previous Work This section reviews related work in the area of image categorization and the problem of Matrix Completion using a Rank Minimization criterion, optimized with Nuclear Norm methods. [sent-36, score-0.219]
20 Image semantic understanding is now typically formulated as a multi-label problem. [sent-39, score-0.069]
21 In this setting, each image may be simultaneously categorized into more than one of a set of predefined categories. [sent-40, score-0.114]
22 Additionally, Multiple Instance Learning (MIL) approaches can be used to explicitly model the relations between labels and specific regions of the image, as initially proposed by Maron et al. [sent-43, score-0.066]
23 This framework allows for the localization and classification tasks to benefit from each other, thus reducing noise in the corresponding feature space and making the learned semantic models more accurate [31, 33, 32, 20, 27, 26]. [sent-45, score-0.191]
24 [31] exploit asymmetric loss functions to balance false positives and negatives. [sent-51, score-0.067]
25 This is usually obtained by pre-segmenting images to a small fixed number of parts or applied in settings where detectors perform well, such as the problem of associating faces to captioned names [4]. [sent-53, score-0.08]
26 A major breakthrough by [7] states the minimization of the rank function, under broad conditions, can be achieved using the minimizer obtained with the Nuclear Norm (sum of singular values). [sent-59, score-0.191]
27 In the last few years, incremental matrix completion methods have also been proposed [1, 2, 5]. [sent-62, score-0.286]
28 2 3 Multi-label classification using Matrix Completion In a supervised setting, a classifier learns a mapping1 W : X → Y between the space of features X and the space of labels Y, from Ntr tuples of known features and labels. [sent-64, score-0.139]
29 Linear classifiers define (xj , yj ) ∈ RF × RK , where F is the feature dimension and K the number of classes, and minimize the loss l between the output space and the projection of the input space, as Ntr minimize W,b l yj , [W b] j=1 xj 1 , (1) with parameters W ∈ RK×F , b ∈ RK . [sent-65, score-0.138]
30 For this purpose, they concatenate all labels and features into matrices Ytst ∈ RK×Ntst , Ytr ∈ RK×Ntr , Xtst ∈ RF ×Ntst , Xtr ∈ RF ×Ntr . [sent-68, score-0.069]
31 If the linear model holds, then the matrix Ytr Ytst Z0 = Xtr Xtst , (2) 1 should be rank deficient. [sent-69, score-0.156]
32 The classification process consists in filling the unknown entries in Ytst such that the Nuclear Norm of Z0 , the convex envelope of rank [7], is minimized. [sent-70, score-0.177]
33 Since in practice we may have errors and partial knowledge in the training labels and in the feature space, let us define ΩX and ΩY as the set of known feature and label entries and zero out unknown entries in Z0 . [sent-71, score-0.278]
34 Additionally, let the data matrix Z be defined as a sum of Z0 with an error term E, as Ytr Ytst EY tr 0 ZY (3) Z = ZX = Xtr Xtst + EXtr EXtst = Z0 + E, Z1 1 0 where ZY , ZX , Z1 respectively stand for the label, feature and last rows of Z. [sent-72, score-0.175]
35 Then, classification can be posed as an optimization problem that finds the best label assignment Ytst and error matrix E such that the rank of Z is minimized. [sent-73, score-0.233]
36 The resulting optimization problem, MC-1 [11], is minimize Ytst ,EXtr ,EY tr ,EXtst µ Z ∗ + 1 |ΩX | cx (zij , z0ij ) + ij∈ΩX λ |ΩY | cy (zij , z0ij ) ij∈ΩY Z = Z0 + E subject to (4) Z1 = 1 . [sent-74, score-0.299]
37 The parameters λ, µ are positive trade-off weights between better feature adaptation and label error correction. [sent-77, score-0.111]
38 We note this problem is equivalent to minimize Z subject to µ Z ∗ + 1 |ΩX | cx (zij , z0ij ) + ij∈ΩX λ |ΩY | cy (zij , z0ij ) ij∈ΩY (5) Z1 = 1 which can be solved using a Fixed Point Continuation method [18], described in Sec. [sent-78, score-0.259]
39 ||A||∗ designates the Nuclear Norm (sum of singular values) of A. [sent-92, score-0.083]
40 1k ∈ Rk×1 is a vector of ones, 0k×n ∈ Rk×n is a matrix of zeros and Ik ∈ Rk×k denotes the identity matrix (dimensions are omitted when trivially inferred). [sent-94, score-0.142]
41 3 4 Matrix completion for multi-label classification of visual data In this section, we present the main contributions of this paper: the application of Matrix Completion to the multi-label image classification problem and its convergence proof. [sent-95, score-0.364]
42 In the bag of (visual) words (BoW) model [24], visual data is encoded by the distribution of features among entries from a codebook. [sent-96, score-0.085]
43 The codebook is typically created by clustering local feature representations such as SIFT [17] or GIST [21]. [sent-97, score-0.067]
44 In this setting, the formulation MC-1 (5) is inadequate because it introduces negative values to the histograms in ZX . [sent-98, score-0.075]
45 To address this issue, we replace the penalties used so they reflect the nature of data: we replace the Least-Squares penalty in cx (·) by Pearson’s χ2 distance, that takes into account the asymmetry in histogram data, as F F χ2 (zj , z0j ) = χ2 (zij , z0ij ) = i i=1 i=1 (zij − z0ij )2 . [sent-99, score-0.159]
46 Additionally, we note that the Log label error in cy (·), albeit asymmetric, incurs in unnecessary penalization of entries belonging to the same class as the original entry (see Fig. [sent-101, score-0.295]
47 Therefore, we generalize this loss to progressively resemble smooth version of the Hinge loss, specified by the parameter γ as cy (zij , z0ij ) = 1 log(1 + exp (−γz0ij zij )). [sent-103, score-0.499]
48 8 1 z Figure 1: Comparison of Generalized Log loss with Log loss (γ = 1). [sent-115, score-0.076]
49 1 Fixed Point continuation (FPC) for MC-1 Albeit convex, the Nuclear Norm operator makes (5), (7), (8) not smooth. [sent-117, score-0.093]
50 Since the natural reformulation of a Nuclear Norm minimization is a Semidefinite Program, existing off-the-shelf interior point methods are not applicable due to the large dimension of Z. [sent-118, score-0.095]
51 The FPC method [18], in particular, is comprised by a series of gradient updates h(·) = I(·) − τ g(·) with step size τ and gradient g(·) given by the error penalizations cx (·) and cy (·). [sent-120, score-0.44]
52 These steps are alternated with a shrinkage operator Sν (·) = max (0, · − ν), applied to the singular values of the resulting matrix, so the rank is minimized. [sent-121, score-0.125]
53 In this paper, we prove the convergence of FPC to the constrained problem class by using the fact that projections onto Convex sets are also non-expansive; thus, the composition of gradient, shrinkage and projection steps is also a contraction. [sent-125, score-0.185]
54 Error > do Gradient Descent: A = h(A) = Z − τ g(Z) Shrink: A = UΣV , Z = USτ µ (Σ)V Project onto feasible set: Z1 = 1 end while end for Output: Complete Matrix Z Lemma 1 Let pC (·) be a projection operator onto any given convex set C. [sent-129, score-0.276]
55 These values are λγ easily obtained by noting the gradient of the Log loss function is Lipschitz continuous with L = 0. [sent-148, score-0.077]
56 Key to the feasibility of (7) and (8) within this algorithmic framework, however, is an efficient way to project Z onto the newly defined constraint sets. [sent-150, score-0.157]
57 While for MC-Pos (7) projecting a vector onto the Non-Negative Orthant is done in closed form by truncating negative components to zero, efficiently performing the projection onto the Probability Simplex in MC-Simplex (8) is not straightforward. [sent-151, score-0.234]
58 We note, however, this is a projection onto a convex subset of an 1 ball [9]. [sent-152, score-0.192]
59 Therefore, we can explore the dual of the projection problem and use a sorting procedure to implement this projection in closed form, as described in Alg. [sent-153, score-0.132]
60 Algorithm 2 Projection of a vector onto probability Simplex Input: Vector v ∈ RF to be projected Sort v into µ : µ1 ≥ µ2 ≥ . [sent-158, score-0.084]
61 Error > do Gradient Descent: A = Z − τ g(Z) Shrink: A = UΣV Shrink: Z = USτ µ (Σ)V Project ZX onto P (Alg. [sent-164, score-0.084]
62 We compare our results with MC-1 (5) and standard discriminative and MIL approaches [30, 20, 27, 26, 33, 32] on three datasets: CMU-Face , MSRC and 15 Scene. [sent-167, score-0.078]
63 The continuation steps require a decreasing sequence of µ, which we chose as µk = 0. [sent-169, score-0.093]
64 CMU-Face dataset This dataset consists in 624 images of 20 subject faces with several expressions and poses, under two conditions: wearing sunglasses and not. [sent-174, score-0.122]
65 As in [20], our training set is built using images of the first 8 subjects (126 images with glasses and 128 without), leaving the remainder for testing (370, equally split among the classes). [sent-176, score-0.129]
66 We describe each image by extracting 10000 SIFT features [17] at random scales and positions and quantizing them onto a 1000 visual codebook, obtained by performing hierarchical k-means clustering on 100000 features randomly selected from the training set. [sent-177, score-0.297]
67 For this dataset, note that subjects were captured in a very similar environment, so the most discriminative part is the eye region. [sent-178, score-0.078]
68 Since the face position varies, they propose using a Multiple Instance Learning framework (MIL-SegSVM), that localizes the most discriminative region in each image while learning a classifier to split both classes. [sent-181, score-0.192]
69 For the SVM, we either trained with the entire image information (SVM-Img) or with only the features extracted from the relevant, manually labeled, region of the eyes. [sent-183, score-0.149]
70 We fill Z with the label vector and the BoW histograms of each entire image and leave the test set labels Ytst as unknown entries. [sent-185, score-0.27]
71 For the MCSimplex case, we preprocess Z by 1 -normalizing each histogram in ZX . [sent-186, score-0.068]
72 This is done to avoid the Simplex projection picking a single bin and zeroing out the others, due to scale disparities in the bin counts. [sent-187, score-0.222]
73 These indicate both the fully supervised and the MIL approaches are more robust to the variability introduced by background noise, when compared to what is obtained when training without localization information (SVM-Img). [sent-189, score-0.152]
74 By using Matrix Completion, in turn, we are able to surpass these classification scores by solving a single convex minimization, since our error term E removes noise introduced by non-discriminative parts of the image. [sent-191, score-0.072]
75 We search for the box having the normalized histogram most closely resemblant to the corrected version in ZX according to the χ2 distance, and get the results shown in Fig. [sent-193, score-0.129]
76 These show how the corrected histograms capture the semantic concept being trained. [sent-195, score-0.205]
77 Moreover, MC-1 does not allow to pursue further localization of the class representative since it introduces erroneous negative numbers in the histograms (Fig. [sent-197, score-0.196]
78 bin count Figure 2: Histograms corrected by MC-Pos (7) preserve semantic meaning. [sent-199, score-0.208]
79 96 100 50 0 0 200 400 600 800 1000 600 800 1000 bin count bin 4 2 0 2 4 0 200 400 bin Figure 3: Erroneous histogram correction performed by MC-1 (5). [sent-207, score-0.302]
80 The MSRC dataset consists of 591 real world images distributed among 21 classes, with an average of 3 classes present per image. [sent-211, score-0.116]
81 We mimic the setup of [27] and use as features histograms of Textons [23] concatenated with histograms of colors in the L+U+V space. [sent-212, score-0.185]
82 In this dataset, we compare our formulations to MC-1 and several state-of-the-art approaches for categorization using Multiple-Label Multiple Instance Learning: Multiple Set Kernel MIL SVM (MSK-MIL) by Vijayanarasimhan et al. [sent-215, score-0.137]
83 Again, the fact that feature errors are corrected allows us to achieve good results while training with the entire image. [sent-222, score-0.124]
84 15 Scene dataset Finally, we test the performance of our algorithm for scene classification. [sent-225, score-0.101]
85 The 15 scene dataset is a multi-label dataset with 4485 images. [sent-227, score-0.137]
86 According to the feature study in [30], we use GIST [21], the non-histogram feature achieving best results on this dataset. [sent-228, score-0.068]
87 94 Conclusions We presented two new convex methods for performing semi-supervised multi-label classification of histogram data, with proven convergence properties. [sent-254, score-0.145]
88 Moreover, since histograms of full images contain the information for parts contained therein, the error embedded in our formulation is able to capture intra class variability arising from different backgrounds. [sent-256, score-0.155]
89 Experiments show that our methods perform comparably to state-of-the-art MIL methods in several image datasets, surpassing them in several cases, without the need for precise localization of objects in the training set. [sent-257, score-0.231]
90 Acknowledgements: Support for this research was provided by the Portuguese Foundation for Science and Technology through the Carnegie Mellon Portugal program under the project FCT/CMU/P11. [sent-258, score-0.073]
91 Fast incremental method for matrix completion: an application to trajectory correction. [sent-296, score-0.071]
92 Efficient projections onto the l1-ball for learning in high dimensions. [sent-323, score-0.084]
93 A rank minimization heuristic with application to minimum order system approximation. [sent-330, score-0.151]
94 Transduction with matrix completion: Three birds with one stone. [sent-339, score-0.071]
95 Fixed point and bregman iterative methods for matrix rank minimization. [sent-384, score-0.156]
96 Weakly supervised discriminative localization and classification: a joint learning process. [sent-397, score-0.201]
97 An accelerated proximal gradient algorithm for nuclear norm regularized least squares problems. [sent-429, score-0.212]
98 Towards weakly supervised semantic segmentation by means of multiple instance and multitask learning. [sent-434, score-0.139]
99 Region-based image annotation using asymmetrical support vector machine-based multiple-instance learning. [sent-469, score-0.114]
100 Image tag refinement towards low-rank, content-tag prior and error sparsity. [sent-493, score-0.073]
wordName wordTfidf (topN-words)
[('pc', 0.473), ('zij', 0.293), ('zx', 0.224), ('completion', 0.215), ('mil', 0.187), ('auroc', 0.173), ('ytst', 0.173), ('cy', 0.168), ('fpc', 0.148), ('nuclear', 0.122), ('image', 0.114), ('msrc', 0.108), ('classi', 0.107), ('categorization', 0.105), ('torre', 0.099), ('continuation', 0.093), ('cx', 0.091), ('localization', 0.088), ('ntr', 0.087), ('rank', 0.085), ('onto', 0.084), ('discriminative', 0.078), ('bin', 0.078), ('rk', 0.078), ('fixed', 0.076), ('histograms', 0.075), ('ntst', 0.074), ('xtst', 0.074), ('ytr', 0.074), ('project', 0.073), ('matrix', 0.071), ('semantic', 0.069), ('ij', 0.069), ('histogram', 0.068), ('projection', 0.066), ('minimization', 0.066), ('vijayanarasimhan', 0.065), ('xtr', 0.065), ('scene', 0.065), ('shrink', 0.064), ('corrected', 0.061), ('rf', 0.057), ('cvpr', 0.057), ('bow', 0.056), ('zha', 0.053), ('nguyen', 0.053), ('la', 0.051), ('norm', 0.051), ('images', 0.05), ('entries', 0.05), ('cabral', 0.049), ('costeira', 0.049), ('mcsimplex', 0.049), ('vezhnevets', 0.049), ('label', 0.047), ('simplex', 0.045), ('designates', 0.043), ('tag', 0.043), ('cation', 0.043), ('convex', 0.042), ('singular', 0.04), ('tr', 0.04), ('gist', 0.04), ('zy', 0.04), ('portugal', 0.04), ('penalizations', 0.04), ('maron', 0.04), ('orthant', 0.04), ('gradient', 0.039), ('loss', 0.038), ('goldberg', 0.037), ('digital', 0.037), ('dataset', 0.036), ('segmentation', 0.035), ('supervised', 0.035), ('features', 0.035), ('convergence', 0.035), ('barnard', 0.034), ('labels', 0.034), ('svm', 0.034), ('feature', 0.034), ('sift', 0.033), ('erroneous', 0.033), ('comprised', 0.033), ('allerton', 0.033), ('codebook', 0.033), ('et', 0.032), ('ganesh', 0.031), ('cv', 0.031), ('fernando', 0.031), ('letters', 0.03), ('associating', 0.03), ('classes', 0.03), ('error', 0.03), ('robustness', 0.03), ('asymmetric', 0.029), ('reformulation', 0.029), ('contraction', 0.029), ('candes', 0.029), ('training', 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999994 165 nips-2011-Matrix Completion for Multi-label Image Classification
Author: Ricardo S. Cabral, Fernando Torre, Joao P. Costeira, Alexandre Bernardino
Abstract: Recently, image categorization has been an active research topic due to the urgent need to retrieve and browse digital images via semantic keywords. This paper formulates image categorization as a multi-label classification problem using recent advances in matrix completion. Under this setting, classification of testing data is posed as a problem of completing unknown label entries on a data matrix that concatenates training and testing features with training labels. We propose two convex algorithms for matrix completion based on a Rank Minimization criterion specifically tailored to visual data, and prove its convergence properties. A major advantage of our approach w.r.t. standard discriminative classification methods for image categorization is its robustness to outliers, background noise and partial occlusions both in the feature and label space. Experimental validation on several datasets shows how our method outperforms state-of-the-art algorithms, while effectively capturing semantic concepts of classes. 1
2 0.16388264 168 nips-2011-Maximum Margin Multi-Instance Learning
Author: Hua Wang, Heng Huang, Farhad Kamangar, Feiping Nie, Chris H. Ding
Abstract: Multi-instance learning (MIL) considers input as bags of instances, in which labels are assigned to the bags. MIL is useful in many real-world applications. For example, in image categorization semantic meanings (labels) of an image mostly arise from its regions (instances) instead of the entire image (bag). Existing MIL methods typically build their models using the Bag-to-Bag (B2B) distance, which are often computationally expensive and may not truly reflect the semantic similarities. To tackle this, in this paper we approach MIL problems from a new perspective using the Class-to-Bag (C2B) distance, which directly assesses the relationships between the classes and the bags. Taking into account the two major challenges in MIL, high heterogeneity on data and weak label association, we propose a novel Maximum Margin Multi-Instance Learning (M3 I) approach to parameterize the C2B distance by introducing the class specific distance metrics and the locally adaptive significance coefficients. We apply our new approach to the automatic image categorization tasks on three (one single-label and two multilabel) benchmark data sets. Extensive experiments have demonstrated promising results that validate the proposed method.
3 0.15646395 91 nips-2011-Exploiting spatial overlap to efficiently compute appearance distances between image windows
Author: Bogdan Alexe, Viviana Petrescu, Vittorio Ferrari
Abstract: We present a computationally efficient technique to compute the distance of highdimensional appearance descriptor vectors between image windows. The method exploits the relation between appearance distance and spatial overlap. We derive an upper bound on appearance distance given the spatial overlap of two windows in an image, and use it to bound the distances of many pairs between two images. We propose algorithms that build on these basic operations to efficiently solve tasks relevant to many computer vision applications, such as finding all pairs of windows between two images with distance smaller than a threshold, or finding the single pair with the smallest distance. In experiments on the PASCAL VOC 07 dataset, our algorithms accurately solve these problems while greatly reducing the number of appearance distances computed, and achieve larger speedups than approximate nearest neighbour algorithms based on trees [18] and on hashing [21]. For example, our algorithm finds the most similar pair of windows between two images while computing only 1% of all distances on average. 1
4 0.1557294 211 nips-2011-Penalty Decomposition Methods for Rank Minimization
Author: Yong Zhang, Zhaosong Lu
Abstract: In this paper we consider general rank minimization problems with rank appearing in either objective function or constraint. We first show that a class of matrix optimization problems can be solved as lower dimensional vector optimization problems. As a consequence, we establish that a class of rank minimization problems have closed form solutions. Using this result, we then propose penalty decomposition methods for general rank minimization problems. The convergence results of the PD methods have been shown in the longer version of the paper [19]. Finally, we test the performance of our methods by applying them to matrix completion and nearest low-rank correlation matrix problems. The computational results demonstrate that our methods generally outperform the existing methods in terms of solution quality and/or speed. 1
5 0.14967221 159 nips-2011-Learning with the weighted trace-norm under arbitrary sampling distributions
Author: Rina Foygel, Ohad Shamir, Nati Srebro, Ruslan Salakhutdinov
Abstract: We provide rigorous guarantees on learning with the weighted trace-norm under arbitrary sampling distributions. We show that the standard weighted-trace norm might fail when the sampling distribution is not a product distribution (i.e. when row and column indexes are not selected independently), present a corrected variant for which we establish strong learning guarantees, and demonstrate that it works better in practice. We provide guarantees when weighting by either the true or empirical sampling distribution, and suggest that even if the true distribution is known (or is uniform), weighting by the empirical distribution may be beneficial. 1
6 0.14746937 142 nips-2011-Large-Scale Sparse Principal Component Analysis with Application to Text Data
7 0.14694068 247 nips-2011-Semantic Labeling of 3D Point Clouds for Indoor Scenes
8 0.12617555 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding
9 0.12534322 141 nips-2011-Large-Scale Category Structure Aware Image Categorization
10 0.11862093 271 nips-2011-Statistical Tests for Optimization Efficiency
11 0.11296013 214 nips-2011-PiCoDes: Learning a Compact Code for Novel-Category Recognition
12 0.10065518 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition
13 0.097774528 270 nips-2011-Statistical Performance of Convex Tensor Decomposition
14 0.096945219 118 nips-2011-High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity
15 0.095561124 70 nips-2011-Dimensionality Reduction Using the Sparse Linear Model
16 0.092434406 157 nips-2011-Learning to Search Efficiently in High Dimensions
17 0.091873609 180 nips-2011-Multiple Instance Filtering
18 0.089960985 13 nips-2011-A blind sparse deconvolution method for neural spike identification
19 0.089317426 126 nips-2011-Im2Text: Describing Images Using 1 Million Captioned Photographs
20 0.087496631 154 nips-2011-Learning person-object interactions for action recognition in still images
topicId topicWeight
[(0, 0.249), (1, 0.1), (2, -0.147), (3, 0.028), (4, -0.0), (5, 0.125), (6, 0.045), (7, 0.033), (8, 0.008), (9, 0.1), (10, 0.132), (11, 0.06), (12, -0.063), (13, 0.03), (14, -0.042), (15, -0.045), (16, -0.09), (17, 0.063), (18, -0.013), (19, -0.017), (20, -0.025), (21, 0.016), (22, 0.073), (23, -0.141), (24, 0.039), (25, 0.185), (26, -0.01), (27, -0.076), (28, 0.002), (29, 0.119), (30, 0.029), (31, 0.027), (32, 0.022), (33, 0.101), (34, -0.035), (35, 0.079), (36, 0.03), (37, 0.001), (38, -0.099), (39, -0.115), (40, -0.005), (41, -0.075), (42, 0.037), (43, 0.046), (44, -0.014), (45, 0.035), (46, 0.038), (47, -0.009), (48, 0.031), (49, -0.098)]
simIndex simValue paperId paperTitle
same-paper 1 0.93306738 165 nips-2011-Matrix Completion for Multi-label Image Classification
Author: Ricardo S. Cabral, Fernando Torre, Joao P. Costeira, Alexandre Bernardino
Abstract: Recently, image categorization has been an active research topic due to the urgent need to retrieve and browse digital images via semantic keywords. This paper formulates image categorization as a multi-label classification problem using recent advances in matrix completion. Under this setting, classification of testing data is posed as a problem of completing unknown label entries on a data matrix that concatenates training and testing features with training labels. We propose two convex algorithms for matrix completion based on a Rank Minimization criterion specifically tailored to visual data, and prove its convergence properties. A major advantage of our approach w.r.t. standard discriminative classification methods for image categorization is its robustness to outliers, background noise and partial occlusions both in the feature and label space. Experimental validation on several datasets shows how our method outperforms state-of-the-art algorithms, while effectively capturing semantic concepts of classes. 1
2 0.74124318 211 nips-2011-Penalty Decomposition Methods for Rank Minimization
Author: Yong Zhang, Zhaosong Lu
Abstract: In this paper we consider general rank minimization problems with rank appearing in either objective function or constraint. We first show that a class of matrix optimization problems can be solved as lower dimensional vector optimization problems. As a consequence, we establish that a class of rank minimization problems have closed form solutions. Using this result, we then propose penalty decomposition methods for general rank minimization problems. The convergence results of the PD methods have been shown in the longer version of the paper [19]. Finally, we test the performance of our methods by applying them to matrix completion and nearest low-rank correlation matrix problems. The computational results demonstrate that our methods generally outperform the existing methods in terms of solution quality and/or speed. 1
3 0.66348022 168 nips-2011-Maximum Margin Multi-Instance Learning
Author: Hua Wang, Heng Huang, Farhad Kamangar, Feiping Nie, Chris H. Ding
Abstract: Multi-instance learning (MIL) considers input as bags of instances, in which labels are assigned to the bags. MIL is useful in many real-world applications. For example, in image categorization semantic meanings (labels) of an image mostly arise from its regions (instances) instead of the entire image (bag). Existing MIL methods typically build their models using the Bag-to-Bag (B2B) distance, which are often computationally expensive and may not truly reflect the semantic similarities. To tackle this, in this paper we approach MIL problems from a new perspective using the Class-to-Bag (C2B) distance, which directly assesses the relationships between the classes and the bags. Taking into account the two major challenges in MIL, high heterogeneity on data and weak label association, we propose a novel Maximum Margin Multi-Instance Learning (M3 I) approach to parameterize the C2B distance by introducing the class specific distance metrics and the locally adaptive significance coefficients. We apply our new approach to the automatic image categorization tasks on three (one single-label and two multilabel) benchmark data sets. Extensive experiments have demonstrated promising results that validate the proposed method.
4 0.65427458 91 nips-2011-Exploiting spatial overlap to efficiently compute appearance distances between image windows
Author: Bogdan Alexe, Viviana Petrescu, Vittorio Ferrari
Abstract: We present a computationally efficient technique to compute the distance of highdimensional appearance descriptor vectors between image windows. The method exploits the relation between appearance distance and spatial overlap. We derive an upper bound on appearance distance given the spatial overlap of two windows in an image, and use it to bound the distances of many pairs between two images. We propose algorithms that build on these basic operations to efficiently solve tasks relevant to many computer vision applications, such as finding all pairs of windows between two images with distance smaller than a threshold, or finding the single pair with the smallest distance. In experiments on the PASCAL VOC 07 dataset, our algorithms accurately solve these problems while greatly reducing the number of appearance distances computed, and achieve larger speedups than approximate nearest neighbour algorithms based on trees [18] and on hashing [21]. For example, our algorithm finds the most similar pair of windows between two images while computing only 1% of all distances on average. 1
5 0.61079395 216 nips-2011-Portmanteau Vocabularies for Multi-Cue Image Representation
Author: Fahad S. Khan, Joost Weijer, Andrew D. Bagdanov, Maria Vanrell
Abstract: We describe a novel technique for feature combination in the bag-of-words model of image classification. Our approach builds discriminative compound words from primitive cues learned independently from training images. Our main observation is that modeling joint-cue distributions independently is more statistically robust for typical classification problems than attempting to empirically estimate the dependent, joint-cue distribution directly. We use Information theoretic vocabulary compression to find discriminative combinations of cues and the resulting vocabulary of portmanteau1 words is compact, has the cue binding property, and supports individual weighting of cues in the final image representation. State-of-theart results on both the Oxford Flower-102 and Caltech-UCSD Bird-200 datasets demonstrate the effectiveness of our technique compared to other, significantly more complex approaches to multi-cue image representation. 1
6 0.60837162 141 nips-2011-Large-Scale Category Structure Aware Image Categorization
7 0.58182311 214 nips-2011-PiCoDes: Learning a Compact Code for Novel-Category Recognition
8 0.58049196 235 nips-2011-Recovering Intrinsic Images with a Global Sparsity Prior on Reflectance
9 0.55789536 230 nips-2011-RTRMC: A Riemannian trust-region method for low-rank matrix completion
10 0.55327338 126 nips-2011-Im2Text: Describing Images Using 1 Million Captioned Photographs
11 0.54230571 142 nips-2011-Large-Scale Sparse Principal Component Analysis with Application to Text Data
12 0.52641231 293 nips-2011-Understanding the Intrinsic Memorability of Images
13 0.52537578 51 nips-2011-Clustered Multi-Task Learning Via Alternating Structure Optimization
14 0.52396506 159 nips-2011-Learning with the weighted trace-norm under arbitrary sampling distributions
15 0.5193333 181 nips-2011-Multiple Instance Learning on Structured Data
16 0.51531148 247 nips-2011-Semantic Labeling of 3D Point Clouds for Indoor Scenes
17 0.51482338 290 nips-2011-Transfer Learning by Borrowing Examples for Multiclass Object Detection
18 0.50293547 73 nips-2011-Divide-and-Conquer Matrix Factorization
19 0.49390051 112 nips-2011-Heavy-tailed Distances for Gradient Based Image Descriptors
20 0.4925226 33 nips-2011-An Exact Algorithm for F-Measure Maximization
topicId topicWeight
[(0, 0.047), (4, 0.042), (14, 0.014), (20, 0.05), (26, 0.027), (31, 0.057), (33, 0.301), (43, 0.065), (45, 0.141), (57, 0.033), (65, 0.016), (74, 0.037), (83, 0.03), (99, 0.048)]
simIndex simValue paperId paperTitle
1 0.99010253 91 nips-2011-Exploiting spatial overlap to efficiently compute appearance distances between image windows
Author: Bogdan Alexe, Viviana Petrescu, Vittorio Ferrari
Abstract: We present a computationally efficient technique to compute the distance of highdimensional appearance descriptor vectors between image windows. The method exploits the relation between appearance distance and spatial overlap. We derive an upper bound on appearance distance given the spatial overlap of two windows in an image, and use it to bound the distances of many pairs between two images. We propose algorithms that build on these basic operations to efficiently solve tasks relevant to many computer vision applications, such as finding all pairs of windows between two images with distance smaller than a threshold, or finding the single pair with the smallest distance. In experiments on the PASCAL VOC 07 dataset, our algorithms accurately solve these problems while greatly reducing the number of appearance distances computed, and achieve larger speedups than approximate nearest neighbour algorithms based on trees [18] and on hashing [21]. For example, our algorithm finds the most similar pair of windows between two images while computing only 1% of all distances on average. 1
2 0.94067252 141 nips-2011-Large-Scale Category Structure Aware Image Categorization
Author: Bin Zhao, Fei Li, Eric P. Xing
Abstract: Most previous research on image categorization has focused on medium-scale data sets, while large-scale image categorization with millions of images from thousands of categories remains a challenge. With the emergence of structured large-scale dataset such as the ImageNet, rich information about the conceptual relationships between images, such as a tree hierarchy among various image categories, become available. As human cognition of complex visual world benefits from underlying semantic relationships between object classes, we believe a machine learning system can and should leverage such information as well for better performance. In this paper, we employ such semantic relatedness among image categories for large-scale image categorization. Specifically, a category hierarchy is utilized to properly define loss function and select common set of features for related categories. An efficient optimization method based on proximal approximation and accelerated parallel gradient method is introduced. Experimental results on a subset of ImageNet containing 1.2 million images from 1000 categories demonstrate the effectiveness and promise of our proposed approach. 1
3 0.94035119 173 nips-2011-Modelling Genetic Variations using Fragmentation-Coagulation Processes
Author: Yee W. Teh, Charles Blundell, Lloyd Elliott
Abstract: We propose a novel class of Bayesian nonparametric models for sequential data called fragmentation-coagulation processes (FCPs). FCPs model a set of sequences using a partition-valued Markov process which evolves by splitting and merging clusters. An FCP is exchangeable, projective, stationary and reversible, and its equilibrium distributions are given by the Chinese restaurant process. As opposed to hidden Markov models, FCPs allow for flexible modelling of the number of clusters, and they avoid label switching non-identifiability problems. We develop an efficient Gibbs sampler for FCPs which uses uniformization and the forward-backward algorithm. Our development of FCPs is motivated by applications in population genetics, and we demonstrate the utility of FCPs on problems of genotype imputation with phased and unphased SNP data. 1
4 0.87223619 103 nips-2011-Generalization Bounds and Consistency for Latent Structural Probit and Ramp Loss
Author: Joseph Keshet, David A. McAllester
Abstract: We consider latent structural versions of probit loss and ramp loss. We show that these surrogate loss functions are consistent in the strong sense that for any feature map (finite or infinite dimensional) they yield predictors approaching the infimum task loss achievable by any linear predictor over the given features. We also give finite sample generalization bounds (convergence rates) for these loss functions. These bounds suggest that probit loss converges more rapidly. However, ramp loss is more easily optimized on a given sample. 1
5 0.85708231 99 nips-2011-From Stochastic Nonlinear Integrate-and-Fire to Generalized Linear Models
Author: Skander Mensi, Richard Naud, Wulfram Gerstner
Abstract: Variability in single neuron models is typically implemented either by a stochastic Leaky-Integrate-and-Fire model or by a model of the Generalized Linear Model (GLM) family. We use analytical and numerical methods to relate state-of-theart models from both schools of thought. First we find the analytical expressions relating the subthreshold voltage from the Adaptive Exponential Integrate-andFire model (AdEx) to the Spike-Response Model with escape noise (SRM as an example of a GLM). Then we calculate numerically the link-function that provides the firing probability given a deterministic membrane potential. We find a mathematical expression for this link-function and test the ability of the GLM to predict the firing probability of a neuron receiving complex stimulation. Comparing the prediction performance of various link-functions, we find that a GLM with an exponential link-function provides an excellent approximation to the Adaptive Exponential Integrate-and-Fire with colored-noise input. These results help to understand the relationship between the different approaches to stochastic neuron models. 1 Motivation When it comes to modeling the intrinsic variability in simple neuron models, we can distinguish two traditional approaches. One approach is inspired by the stochastic Leaky Integrate-and-Fire (LIF) hypothesis of Stein (1967) [1], where a noise term is added to the system of differential equations implementing the leaky integration to a threshold. There are multiple versions of such a stochastic LIF [2]. How the noise affects the firing probability is also a function of the parameters of the neuron model. Therefore, it is important to take into account the refinements of simple neuron models in terms of subthreshold resonance [3, 4], spike-triggered adaptation [5, 6] and non-linear spike 1 initiation [7, 5]. All these improvements are encompassed by the Adaptive Exponential Integrateand-Fire model (AdEx [8, 9]). The other approach is to start with some deterministic dynamics for the the state of the neuron (for instance the instantaneous distance from the membrane potential to the threshold) and link the probability intensity of emitting a spike with a non-linear function of the state variable. Under some conditions, this type of model is part of a greater class of statistical models called Generalized Linear Models (GLM [10]). As a single neuron model, the Spike Response Model (SRM) with escape noise is a GLM in which the state variable is explicitly the distance between a deterministic voltage and the threshold. The original SRM could account for subthreshold resonance, refractory effects and spike-frequency adaptation [11]. Mathematically similar models were developed independently in the study of the visual system [12] where spike-frequency adaptation has also been modeled [13]. Recently, this approach has retained increased attention since the probabilistic framework can be linked with the Bayesian theory of neural systems [14] and because Bayesian inference can be applied to the population of neurons [15]. In this paper, we investigate the similarity and differences between the state-of-the-art GLM and the stochastic AdEx. The motivation behind this work is to relate the traditional threshold neuron models to Bayesian theory. Our results extend the work of Plesser and Gerstner (2000) [16] since we include the non-linearity for spike initiation and spike-frequency adaptation. We also provide relationships between the parameters of the AdEx and the equivalent GLM. These precise relationships can be used to relate analog implementations of threshold models [17] to the probabilistic models used in the Bayesian approach. The paper is organized as follows: We first describe the expressions relating the SRM state-variable to the parameters of the AdEx (Sect. 3.1) in the subthreshold regime. Then, we use numerical methods to find the non-linear link-function that models the firing probability (Sect. 3.2). We find a functional form for the SRM link-function that best describes the firing probability of a stochastic AdEx. We then compare the performance of this link-function with the often used exponential or linear-rectifier link-functions (also called half-wave linear rectifier) in terms of predicting the firing probability of an AdEx under complex stimulus (Sect. 3.3). We find that the exponential linkfunction yields almost perfect prediction. Finally, we explore the relations between the statistic of the noise and the sharpness of the non-linearity for spike initiation with the parameters of the SRM. 2 Presentation of the Models In this section we present the general formula for the stochastic AdEx model (Sect. 2.1) and the SRM (Sect 2.2). 2.1 The Stochastic Adaptive Exponential Integrate-and-Fire Model The voltage dynamics of the stochastic AdEx is given by: V −Θ ˙ τm V = El − V + ∆T exp − Rw + RI + R (1) ∆T τw w = a(V − El ) − w ˙ (2) where τm is the membrane time constant, El the reverse potential, R the membrane resistance, Θ is the threshold, ∆T is the shape factor and I(t) the input current which is chosen to be an Ornstein−Θ Uhlenbeck process with correlation time-constant of 5 ms. The exponential term ∆T exp( V∆T ) is a non-linear function responsible for the emission of spikes and is a diffusive white noise with standard deviation σ (i.e. ∼ N (0, σ)). Note that the diffusive white-noise does not imply white noise fluctuations of the voltage V (t), the probability distribution of V (t) will depend on ∆T and Θ. The second variable, w, describes the subthreshold as well as the spike-triggered adaptation both ˆ parametrized by the coupling strength a and the time constant τw . Each time tj the voltage goes to infinity, we assumed that a spike is emitted. Then the voltage is reset to a fixed value Vr and w is increased by a constant value b. 2.2 The Generalized Linear Model In the SRM, The voltage V (t) is given by the convolution of the injected current I(t) with the membrane filter κ(t) plus the additional kernel η(t) that acts after each spikes (here we split the 2 spike-triggered kernel in two η(t) = ηv (t) + ηw (t) for reasons that will become clear later): V (t) = ˆ ˆ ηv (t − tj ) + ηw (t − tj ) El + [κ ∗ I](t) + (3) ˆ {tj } ˆ Then at each time tj a spike is emitted which results in a change of voltage described by η(t) = ηv (t) + ηw (t). Given the deterministic voltage, (Eq. 3) a spike is emitted according to the firing intensity λ(V ): λ(t) = f (V (t)) (4) where f (·) is an arbitrary function called the link-function. Then the firing behavior of the SRM depends on the choice of the link-function and its parameters. The most common link-function used to model single neuron activities are the linear-rectifier and the exponential function. 3 Mapping In order to map the stochastic AdEx to the SRM we follow a two-step procedure. First we derive the filter κ(t) and the kernels ηv (t) and ηw (t) analytically as a function of AdEx parameters. Second, we derive the link-function of the SRM from the stochastic spike emission of the AdEx. Figure 1: Mapping of the subthreshold dynamics of an AdEx to an equivalent SRM. A. Membrane filter κ(t) for three different sets of parameters of the AdEx leading to over-damped, critically damped and under-damped cases (upper, middle and lower panel, respectively). B. Spike-Triggered η(t) (black), ηv (t) (light gray) and ηw (gray) for the three cases. C. Example of voltage trace produced when an AdEx is stimulated with a step of colored noise (black). The corresponding voltage from a SRM stimulated with the same current and where we forced the spikes to match those of the AdEx (red). D. Error in the subthreshold voltage (VAdEx − VGLM ) as a function of the mean voltage of the AdEx, for the three different cases: over-, critically and under-damped (light gray, gray and black, respectively) with ∆T = 1 mV. Red line represents the voltage threshold Θ. E. Root Mean Square Error (RMSE) ratio for the three cases with ∆T = 1 mV. The RMSE ratio is the RMSE between the deterministic VSRM and the stochastic VAdEx divided by the RMSE between repetitions of the stochastic AdEx voltage. The error bar shows a single standard deviation as the RMSE ratio is averaged accross multiple value of σ. 3.1 Subthreshold voltage dynamics We start by assuming that the non-linearity for spike initiation does not affect the mean subthreshold voltage of the stochastic AdEx (see Figure 1 D). This assumption is motivated by the small ∆T 3 observed in in-vitro recordings (from 0.5 to 2 mV [8, 9]) which suggest that the subthreshold dynamics are mainly linear except very close to Θ. Also, we expect that the non-linear link-function will capture some of the dynamics due to the non-linearity for spike initiation. Thus it is possible to rewrite the deterministic subthreshold part of the AdEx (Eq. 1-2 without and without ∆T exp((V − Θ)/∆T )) using matrices: ˙ x = Ax (5) with x = V w and A = − τ1 m a τw − gl1m τ − τ1 w (6) In this form, the dynamics of the deterministic AdEx voltage is a damped oscillator with a driving force. Depending on the eigenvalues of A the system could be over-damped, critically damped or under-damped. The filter κ(t) of the GLM is given by the impulse response of the system of coupled differential equations of the AdEx, described by Eq. 5 and 6. In other words, one has to derive the response of the system when stimulating with a Dirac-delta function. The type of damping gives three different qualitative shapes of the kernel κ(t), which are summarized in Table 3.1 and Figure 1 A. Since the three different filters also affect the nature of the stochastic voltage fluctuations, we will keep the distinction between over-damped, critically damped and under-damped scenarios throughout the paper. This means that our approach is valid for at least 3 types of diffusive voltage-noise (i.e. the white noise in Eq. 1 filtered by 3 different membrane filters κ(t)). To complete the description of the deterministic voltage, we need an expression for the spiketriggered kernels. The voltage reset at each spike brings a spike-triggered jump in voltage of magˆ nitude ∆ = Vr − V (t). This perturbation is superposed to the current fluctuations due to I(t) and can be mediated by a Delta-diract pulse of current. Thus we can write the voltage reset kernel by: ηv (t) = ∆ ∆ [δ ∗ κ] (t) = κ(t) κ(0) κ(0) (7) where δ(t) is the Dirac-delta function. The shape of this kernel depends on κ(t) and can be computed from Table 3.1 (see Figure 1 B). Finally, the AdEx mediates spike-frequency adaptation by the jump of the second variables w. From Eq. 2 we can see that this produces a current wspike (t) = b exp (−t/τw ) that can cumulate over subsequent spikes. The effect of this current on voltage is then given by the convolution of wspike (t) with the membrane filter κ(t). Thus in the SRM framework the spike-frequency adaptation is taken into account by: ηw (t) = [wspike ∗ κ](t) (8) Again the precise form of ηw (t) depends on κ(t) and can be computed from Table 3.1 (see Figure 1 B). At this point, we would like to verify our assumption that the non-linearity for spike emission can be neglected. Fig. 1 C and D shows that the error between the voltage from Eq. 3 and the voltage from the stochastic AdEx is generally small. Moreover, we see that the main contribution to the voltage prediction error is due to the mismatch close to the spikes. However the non-linearity for spike initiation may change the probability distribution of the voltage fluctuations, which in turn influences the probability of spiking. This will influence the choice of the link-function, as we will see in the next section. 3.2 Spike Generation Using κ(t), ηv (t) and ηw (t), we must relate the spiking probability of the stochastic AdEx as a function of its deterministic voltage. According to [2] the probability of spiking in time bin dt given the deterministic voltage V (t) is given by: p(V ) = prob{spike in [t, t + dt]} = 1 − exp (−f (V (t))dt) (9) where f (·) gives the firing intensity as a function of the deterministic V (t) (Eq. 3). Thus to extract the link-function f we have to compute the probability of spiking given V (t) for our SRM. To do so we apply the method proposed by Jolivet et al. (2004) [18], where the probability of spiking is simply given by the distribution of the deterministic voltage estimated at the spike times divided by the distribution of the SRM voltage when there is no spike (see figure 2 A). One can numerically compute these two quantities for our models using N repetitions of the same stimulus. 4 Table 1: Analytical expressions for the membrane filter κ(t) in terms of the parameters of the AdEx for over-, critically-, and under-damped cases. Membrane Filter: κ(t) over-damped if: (τm + τw )2 > 4τm τw (gl +a) gl κ(t) = k1 eλ1 t + k2 eλ2 t λ1 = 1 2τm τw (−(τm + τw ) + critically-damped if: (τm + τw )2 = 4τm τw (gl +a) gl κ(t) = (αt + β)eλt λ= under-damped if: (τm + τw )2 < 4τm τw (gl +a) gl κ(t) = (k1 cos (ωt) + k2 sin (ωt)) eλt −(τm +τw ) 2τm τw λ= −(τm +τw ) 2τm τw (τm + τw )2 − 4 τm τw (gl + a) gl λ2 = 1 2τm τw (−(τm + τw ) − α= τm −τw 2Cτm τw ω= τw −τm 2τm τw 2 − a g l τm τw (τm + τw )2 − 4 τm τw (gl + a) gl k1 = −(1+(τm λ2 )) Cτm (λ1 −λ2 ) k2 = 1+(τm λ1 ) Cτm (λ1 −λ2 ) β= 1 C k1 = k2 = 1 C −(1+τm λ) Cωτm The standard deviation σ of the noise and the parameter ∆T of the AdEx non-linearity may affect the shape of the link-function. We thus extract p(V ) for different σ and ∆T (Fig. 2 B). Then using visual heuristics and previous knowledge about the potential analytical expression of the link-funtion, we try to find a simple analytical function that captures p(V ) for a large range of combinations of σ and ∆T . We observed that the log(− log(p)) is close to linear in most studied conditions Fig. 2 B suggesting the following two distributions of p(V ): V − VT (10) p(V ) = 1 − exp − exp ∆V V − VT p(V ) = exp − exp − (11) ∆V Once we have p(V ), we can use Eq. 4 to obtain the equivalent SRM link-function, which leads to: −1 f (V ) = log (1 − p(V )) (12) dt Then the two potential link-functions of the SRM can be derived from Eq. 10 and Eq. 11 (respectively): V − VT f (V ) = λ0 exp (13) ∆V V − VT (14) f (V ) = −λ0 log 1 − exp − exp − ∆V 1 with λ0 = dt , VT the threshold of the SRM and ∆V the sharpness of the link-function (i.e. the parameters that governs the degree of the stochasticity). Note that the exact value of λ0 has no importance since it is redundant with VT . Eq. 13 is the standard exponential link-function, but we call Eq. 14 the log-exp-exp link-function. 3.3 Prediction The next point is to evaluate the fit quality of each link-function. To do this, we first estimate the parameters VT and ∆V of the GLM link-function that maximize the likelihood of observing a spike 5 Figure 2: SRM link-function. A. Histogram of the SRM voltage at the AdEx firing times (red) and at non-firing times (gray). The ratio of the two distributions gives p(V ) (Eq. 9, dashed lines). Inset, zoom to see the voltage histogram evaluated at the firing time (red). B. log(− log(p)) as a function of the SRM voltage for three different noise levels σ = 0.07, 0.14, 0.18 nA (pale gray, gray, black dots, respectively) and ∆T = 1 mV. The line is a linear fit corresponding to the log-exp-exp linkfunction and the dashed line corresponds to a fit with the exponential link-function. C. Same data and labeling scheme as B, but plotting f (V ) according to Eq. 12. The lines are produced with Eq. 14 with parameters fitted as described in B. and the dashed lines are produced with Eq. 13. Inset, same plot but on a semi-log(y) axis. train generated with an AdEx. Second we look at the predictive power of the resulting SRM in terms of Peri-Stimulus Time Histogram (PSTH). In other words we ask how close the spike trains generated with a GLM are from the spike train generated with a stochastic AdEx when both models are stimulated with the same input current. For any GLM with link-function f (V ) ≡ f (t|I, θ) and parameters θ regulating the shape of κ(t), ˆ ηv (t) and ηw (t), the Negative Log-Likelihood (NLL) of observing a spike-train {t} is given by: NLL = − log(f (t|I, θ)) − f (t|I, θ) (15) t ˆ t It has been shown that the negative log-likelihood is convex in the parameters if f is convex and logconcave [19]. It is easy to show that a linear-rectifier link-function, the exponential link-function and the log-exp-exp link-function all satisfy these conditions. This allows efficient estimation of ˆ ˆ the optimal parameters VT and ∆V using a simple gradient descent. One can thus estimate from a single AdEx spike train the optimal parameters of a given link-function, which is more efficient than the method used in Sect. 3.2. The minimal NLL resulting from the gradient descent gives an estimation of the fit quality. A better estimate of the fit quality is given by the distance between the PSTHs in response to stimuli not used for parameter fitting . Let ν1 (t) be the PSTH of the AdEx, and ν2 (t) be the PSTH of the fitted SRM, 6 Figure 3: PSTH prediction. A. Injected current. B. Voltage traces produced by an AdEx (black) and the equivalent SRM (red), when stimulated with the current in A. C. Raster plot for 20 realizations of AdEx (black tick marks) and equivalent SRM (red tick marks). D. PSTH of the AdEx (black) and the SRM (red) obtained by averaging 10,000 repetitions. E. Optimal log-likelihood for the three cases of the AdEx, using three different link-functions, a linear-rectifier (light gray), an exponential link-function (gray) and the link-function defined by Eq. 14 (dark gray), these values are obtained by averaging over 40 different combinations σ and ∆T (see Fig. 4). Error bars are one standard deviation, the stars denote a significant difference, two-sample t-test with α = 0.01. F. same as E. but for Md (Eq. 16). then we use Md ∈ [0, 1] as a measure of match: Md = 2 2 (ν1 (t) − ν2 (t)) dt ν1 (t)2 dt + ν2 (t)2 dt (16) Md = 1 means that it is impossible to differentiate the SRM from the AdEx in terms of their PSTHs, whereas a Md of 0 means that the two PSTHs are completely different. Thus Md is a normalized similarity measure between two PSTHs. In practice, Md is estimated from the smoothed (boxcar average of 1 ms half-width) averaged spike train of 1 000 repetitions for each models. We use both the NLL and Md to quantify the fit quality for each of the three damping cases and each of the three link-functions. Figure 3 shows the match between the stochastic AdEx used as a reference and the derived GLM when both are stimulated with the same input current (Fig. 3 A). The resulting voltage traces are almost identical (Fig. 3 B) and both models predict almost the same spike trains and so the same PSTHs (Fig. 3 C and D). More quantitalively, we see on Fig. 3 E and F, that the linear-rectifier fits significantly worse than both the exponential and log-exp-exp link-functions, both in terms of NLL and of Md . The exponential link-function performs as well as the log-exp-exp link-function, with a spike train similarity measure Md being almost 1 for both. Finally the likelihood-based method described above gives us the opportunity to look at the relationship between the AdEx parameters σ and ∆T that governs its spike emission and the parameters VT and ∆V of the link-function (Fig. 4). We observe that an increase of the noise level produces a flatter link-function (greater ∆V ) while an increase in ∆T also produces an increase in ∆V and VT (note that Fig. 4 shows ∆V and VT for the exponential link-function only, but equivalent results are obtained with the log-exp-exp link-function). 4 Discussion In Sect. 3.3 we have shown that it is possible to predict with almost perfect accuracy the PSTH of a stochastic AdEx model using an appropriate set of parameters in the SRM. Moreover, since 7 Figure 4: Influence of the AdEx parameters on the parameters of the exponential link-function. A. VT as a function of ∆T and σ. B. ∆V as a function of ∆T and σ. the subthreshold voltage of the AdEx also gives a good match with the deterministic voltage of the SRM, we expect that the AdEx and the SRM will not differ in higher moments of the spike train probability distributions beyond the PSTH. We therefore conclude that diffusive noise models of the type of Eq. 1-2 are equivalent to GLM of the type of Eq. 3-4. Once combined with similar results on other types of stochastic LIF (e.g. correlated noise), we could bridge the gap between the literature on GLM and the literature on diffusive noise models. Another noteworthy observation pertains to the nature of the link-function. The link-function has been hypothesized to be a linear-rectifier, an exponential, a sigmoidal or a Gaussian [16]. We have observed that for the AdEx the link-function follows Eq. 14 that we called the log-exp-exp linkfunction. Although the link-function is log-exp-exp for most of the AdEx parameters, the exponential link-function gives an equivalently good prediction of the PSTH. This can be explained by the fact that the difference between log-exp-exp and exponential link-functions happens mainly at low voltage (i.e. far from the threshold), where the probability of emitting a spike is so low (Figure 2 C, until -50 mv). Therefore, even if the exponential link-function overestimates the firing probability at these low voltages it rarely produces extra spikes. At voltages closer to the threshold, where most of the spikes are emitted, the two link-functions behave almost identically and hence produce the same PSTH. The Gaussian link-function can be seen as lying in-between the exponential link-function and the log-exp-exp link-function in Fig. 2. This means that the work of Plesser and Gerstner (2000) [16] is in agreement with the results presented here. The importance of the time-derivative of the ˙ voltage stressed by Plesser and Gerstner (leading to a two-dimensional link-function f (V, V )) was not studied here to remain consistent with the typical usage of GLM in neural systems [14]. Finally we restricted our study to exponential non-linearity for spike initiation and do not consider other cases such as the Quadratic Integrate-and-fire (QIF, [5]) or other polynomial functional shapes. We overlooked these cases for two reasons. First, there are many evidences that the non-linearity in neurons (estimated from in-vitro recordings of Pyramidal neurons) is well approximated by a single exponential [9]. Second, the exponential non-linearity of the AdEx only affects the subthreshold voltage at high voltage (close to threshold) and thus can be neglected to derive the filters κ(t) and η(t). Polynomial non-linearities on the other hand affect a larger range of the subthreshold voltage so that it would be difficult to justify the linearization of subthreshold dynamics essential to the method presented here. References [1] R. B. Stein, “Some models of neuronal variability,” Biophys J, vol. 7, no. 1, pp. 37–68, 1967. [2] W. Gerstner and W. Kistler, Spiking neuron models. Cambridge University Press New York, 2002. [3] E. Izhikevich, “Resonate-and-fire neurons,” Neural Networks, vol. 14, no. 883-894, 2001. [4] M. J. E. Richardson, N. Brunel, and V. Hakim, “From subthreshold to firing-rate resonance,” Journal of Neurophysiology, vol. 89, pp. 2538–2554, 2003. 8 [5] E. Izhikevich, “Simple model of spiking neurons,” IEEE Transactions on Neural Networks, vol. 14, pp. 1569–1572, 2003. [6] S. Mensi, R. Naud, M. Avermann, C. C. H. Petersen, and W. Gerstner, “Parameter extraction and classification of three neuron types reveals two different adaptation mechanisms,” Under review. [7] N. Fourcaud-Trocme, D. Hansel, C. V. Vreeswijk, and N. Brunel, “How spike generation mechanisms determine the neuronal response to fluctuating inputs,” Journal of Neuroscience, vol. 23, no. 37, pp. 11 628–11 640, 2003. [8] R. Brette and W. Gerstner, “Adaptive exponential integrate-and-fire model as an effective description of neuronal activity,” Journal of Neurophysiology, vol. 94, pp. 3637–3642, 2005. [9] L. Badel, W. Gerstner, and M. Richardson, “Dependence of the spike-triggered average voltage on membrane response properties,” Neurocomputing, vol. 69, pp. 1062–1065, 2007. [10] P. McCullagh and J. A. Nelder, Generalized linear models, 2nd ed. Chapman & Hall/CRC, 1998, vol. 37. [11] W. Gerstner, J. van Hemmen, and J. Cowan, “What matters in neuronal locking?” Neural computation, vol. 8, pp. 1653–1676, 1996. [12] D. Hubel and T. Wiesel, “Receptive fields and functional architecture of monkey striate cortex,” Journal of Physiology, vol. 195, pp. 215–243, 1968. [13] J. Pillow, L. Paninski, V. Uzzell, E. Simoncelli, and E. Chichilnisky, “Prediction and decoding of retinal ganglion cell responses with a probabilistic spiking model,” Journal of Neuroscience, vol. 25, no. 47, pp. 11 003–11 013, 2005. [14] K. Doya, S. Ishii, A. Pouget, and R. P. N. Rao, Bayesian brain: Probabilistic approaches to neural coding. The MIT Press, 2007. [15] S. Gerwinn, J. H. Macke, M. Seeger, and M. Bethge, “Bayesian inference for spiking neuron models with a sparsity prior,” in Advances in Neural Information Processing Systems, 2007. [16] H. Plesser and W. Gerstner, “Noise in integrate-and-fire neurons: From stochastic input to escape rates,” Neural Computation, vol. 12, pp. 367–384, 2000. [17] J. Schemmel, J. Fieres, and K. Meier, “Wafer-scale integration of analog neural networks,” in Neural Networks, 2008. IJCNN 2008. (IEEE World Congress on Computational Intelligence). IEEE International Joint Conference on, june 2008, pp. 431 –438. [18] R. Jolivet, T. Lewis, and W. Gerstner, “Generalized integrate-and-fire models of neuronal activity approximate spike trains of a detailed model to a high degree of accuracy,” Journal of Neurophysiology, vol. 92, pp. 959–976, 2004. [19] L. Paninski, “Maximum likelihood estimation of cascade point-process neural encoding models,” Network: Computation in Neural Systems, vol. 15, pp. 243–262, 2004. 9
same-paper 6 0.851668 165 nips-2011-Matrix Completion for Multi-label Image Classification
7 0.80728686 188 nips-2011-Non-conjugate Variational Message Passing for Multinomial and Binary Regression
8 0.79874814 266 nips-2011-Spatial distance dependent Chinese restaurant processes for image segmentation
9 0.75857365 216 nips-2011-Portmanteau Vocabularies for Multi-Cue Image Representation
10 0.74032432 168 nips-2011-Maximum Margin Multi-Instance Learning
11 0.73905152 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding
12 0.7364676 223 nips-2011-Probabilistic Joint Image Segmentation and Labeling
13 0.72456652 126 nips-2011-Im2Text: Describing Images Using 1 Million Captioned Photographs
14 0.72456193 227 nips-2011-Pylon Model for Semantic Segmentation
15 0.71210188 154 nips-2011-Learning person-object interactions for action recognition in still images
16 0.71053654 214 nips-2011-PiCoDes: Learning a Compact Code for Novel-Category Recognition
17 0.6864711 151 nips-2011-Learning a Tree of Metrics with Disjoint Visual Features
18 0.68206769 127 nips-2011-Image Parsing with Stochastic Scene Grammar
19 0.67418855 116 nips-2011-Hierarchically Supervised Latent Dirichlet Allocation
20 0.67215514 35 nips-2011-An ideal observer model for identifying the reference frame of objects