
266 nips-2011-Spatial distance dependent Chinese restaurant processes for image segmentation


Source: pdf

Author: Soumya Ghosh, Andrei B. Ungureanu, Erik B. Sudderth, David M. Blei

Abstract: The distance dependent Chinese restaurant process (ddCRP) was recently introduced to accommodate random partitions of non-exchangeable data [1]. The ddCRP clusters data in a biased way: each data point is more likely to be clustered with other data that are near it in an external sense. This paper examines the ddCRP in a spatial setting with the goal of natural image segmentation. We explore the biases of the spatial ddCRP model and propose a novel hierarchical extension better suited for producing “human-like” segmentations. We then study the sensitivity of the models to various distance and appearance hyperparameters, and provide the first rigorous comparison of nonparametric Bayesian models in the image segmentation domain. On unsupervised image segmentation, we demonstrate that similar performance to existing nonparametric Bayesian models is possible with substantially simpler models and algorithms.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Spatial distance dependent Chinese restaurant processes for image segmentation. [sent-1, score-0.561]

2 The distance dependent Chinese restaurant process (ddCRP) was recently introduced to accommodate random partitions of non-exchangeable data [1]. [sent-11, score-0.31]

3 This paper examines the ddCRP in a spatial setting with the goal of natural image segmentation. [sent-13, score-0.252]

4 We explore the biases of the spatial ddCRP model and propose a novel hierarchical extension better suited for producing “human-like” segmentations. [sent-14, score-0.179]

5 We then study the sensitivity of the models to various distance and appearance hyperparameters, and provide the first rigorous comparison of nonparametric Bayesian models in the image segmentation domain. [sent-15, score-0.431]

6 On unsupervised image segmentation, we demonstrate that similar performance to existing nonparametric Bayesian models is possible with substantially simpler models and algorithms. [sent-16, score-0.216]

7 1 Introduction The Chinese restaurant process (CRP) is a distribution on partitions of integers [2]. [sent-17, score-0.216]

8 When used in a mixture model, it provides an alternative representation of a Bayesian nonparametric Dirichlet process mixture—the data are clustered and the number of clusters is determined via the posterior distribution. [sent-18, score-0.242]

9 The distance dependent Chinese restaurant process (ddCRP) was recently introduced to model random partitions of non-exchangeable data [1]. [sent-23, score-0.31]

10 We use a spatial distance function between pixels in natural images and cluster them to provide an unsupervised segmentation. [sent-30, score-0.352]

11 The spatial distance encourages the discovery of connected segments. [sent-31, score-0.198]

12 Analogous to the hierarchical Dirichlet process (HDP) [5], the rddCRP clusters groups of data, where cluster components are shared across groups. [sent-33, score-0.191]

13 Each square is a data point (a pixel or superpixel) and each arrow is a customer assignment. [sent-37, score-0.228]

14 …, partially specifying each image’s segmentation via manual user input [7]). [sent-51, score-0.19]

15 Recently, several promising segmentation algorithms have been proposed based on nonparametric Bayesian methods [10, 11, 12]. [sent-52, score-0.253]

16 Moreover, unlike previous nonparametric Bayesian approaches, the structure of the ddCRP allows spatial connectivity of the inferred segments to (optionally) be enforced. [sent-55, score-0.315]

17 We also provide the first rigorous comparison of nonparametric Bayesian image segmentation models. [sent-58, score-0.357]

18 Image segmentation is the problem of partitioning an image into self-similar groups of adjacent pixels. [sent-60, score-0.309]

19 Our goal is to associate the features x_i in the ith superpixel with some cluster z_i; these clusters form the segments of that image. [sent-63, score-0.368]

20 Image segmentation is thus a special kind of clustering problem where the desired solution has two properties. [sent-64, score-0.209]

21 First, we hope to find contiguous regions of the image assigned to the same cluster. [sent-65, score-0.265]

22 Due to physical processes such as occlusion, it may be appropriate to find segments that contain two or three contiguous image regions, but we do not want a cluster that is scattered across individual image pixels. [sent-66, score-0.578]

23 Traditional clustering algorithms, such as k-means or probabilistic mixture models, do not account for external information such as pixel location and are not biased towards contiguous regions. [sent-67, score-0.223]

24 Image locations have been heuristically incorporated into Gaussian mixture models by concatenating positions with appearance features in a vector [16], but the resulting bias towards elliptical regions often produces segmentation artifacts. [sent-68, score-0.256]
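That heuristic can be sketched with scikit-learn; the snippet below is an illustration of the cited idea [16] on synthetic features, not the paper's method:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
h, w = 32, 32
ys, xs = np.mgrid[0:h, 0:w]
appearance = rng.normal(size=(h, w, 3))          # stand-in for color features
# Concatenate (x, y) position with appearance into one feature vector.
feats = np.column_stack([xs.ravel(), ys.ravel(),
                         appearance.reshape(-1, 3)])
labels = GaussianMixture(n_components=4, random_state=0).fit_predict(feats)
# Full-covariance Gaussians over (position, appearance) bias the
# resulting segments toward elliptical image regions.
print(labels.reshape(h, w).shape)
```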

25 Image segmentation algorithms are typically applied to collections of images of widely varying scenes, which are likely to require different numbers of segments. [sent-70, score-0.215]

26 Except in certain restricted domains such as medical image analysis, it is not practical to use an algorithm that requires knowing the number of segments in advance. [sent-71, score-0.244]

27 In the following sections, we develop a Bayesian algorithm for image segmentation based on the distance dependent Chinese restaurant process (ddCRP) mixture model [1]. [sent-72, score-0.598]

28 Our algorithm finds spatially contiguous segments in the image and determines an image-specific number of segments from the observed data. [sent-73, score-0.547]

29 1 Chinese restaurant process mixtures The ddCRP mixture is an extension of the traditional Chinese restaurant process (CRP) mixture. [sent-75, score-0.432]

30 CRP mixtures provide a clustering method that determines the number of clusters from the data— they are an alternative formulation of the Dirichlet process mixture model. [sent-76, score-0.219]

31 The assumed generative process is described by imagining a restaurant with an infinite number of tables, each of which is endowed with a parameter for some family of data generating distributions (in our experiments, Dirichlet). [sent-77, score-0.189]

32 Customers enter the restaurant in sequence and sit at a randomly chosen table. [sent-78, score-0.181]

33 They sit at the previously occupied tables with probability proportional to how many customers are already sitting at each; they sit at an unoccupied table with probability proportional to a scaling parameter. [sent-79, score-0.281]
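To make the seating rule concrete, here is a minimal sketch (our illustration, not the authors' code) of drawing a partition from a CRP with scaling parameter alpha:

```python
import numpy as np

def sample_crp(n_customers, alpha, rng=None):
    """Draw table assignments for n_customers from a CRP(alpha)."""
    rng = np.random.default_rng(rng)
    tables = []          # tables[k] = number of customers at table k
    assignments = []     # assignments[i] = table index of customer i
    for i in range(n_customers):
        # Occupied tables are chosen proportional to their counts,
        # a new table proportional to alpha.
        probs = np.array(tables + [alpha], dtype=float)
        probs /= probs.sum()
        k = rng.choice(len(probs), p=probs)
        if k == len(tables):
            tables.append(1)   # open a new table
        else:
            tables[k] += 1
        assignments.append(k)
    return assignments

print(sample_crp(10, alpha=1.0, rng=0))
```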

34 Finally, each customer draws an observation from a distribution determined by the parameter at the assigned table. [sent-81, score-0.264]

35 Conditioned on observed data, the CRP mixture provides a posterior distribution over table assignments and the parameters attached to those tables. [sent-82, score-0.264]

36 However, exchangeability is not an appropriate assumption in image segmentation problems—the locations of the image pixels are critical to providing contiguous segmentations. [sent-87, score-0.562]

37 2 Distance dependent CRPs The distance dependent Chinese Restaurant Process (ddCRP) is a generalization of the Chinese restaurant process that allows for a non-exchangeable distribution on partitions [1]. [sent-89, score-0.354]

38 Rather than represent a partition by customers assigned to tables, the ddCRP models customers linking to other customers. [sent-90, score-0.232]

39 The seating plan is a byproduct of these links—two customers are sitting at the same table if one can reach the other by traversing the customer assignments. [sent-91, score-0.396]
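The link-to-table conversion can be sketched directly: treat each customer link as an undirected edge and take connected components as tables. This is our illustration, not code from the paper.

```python
import numpy as np

def links_to_tables(links):
    """Map customer links c (links[i] = j) to table labels z via
    connected components of the undirected link graph."""
    n = len(links)
    parent = list(range(n))

    def find(i):                       # union-find with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in enumerate(links):
        parent[find(i)] = find(j)      # union i with the customer it links to

    roots = [find(i) for i in range(n)]
    _, z = np.unique(roots, return_inverse=True)
    return z                           # z[i] = table index of customer i

# Customers 0-2 link in a chain; customer 3 links to itself (its own table).
print(links_to_tables([1, 2, 2, 3]))   # -> [0 0 0 1]
```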

40 Once the partition is determined, the observed data for each customer are generated by the per-table parameters. [sent-93, score-0.267]

41 As illustrated in Figure 1, the generative process is described in terms of customer assignments c_i (as opposed to partition assignments or tables, z_i). [sent-94, score-0.688]

42 The distribution of customer assignments is p(c_i = j | D, f, α) ∝ f(d_{ij}) if j ≠ i, and ∝ α if j = i. [sent-95, score-0.371]
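A hedged sketch of sampling from this prior, assuming a precomputed pairwise distance matrix D, a decay function f, and self-link mass alpha (the helper names are ours):

```python
import numpy as np

def sample_ddcrp_links(D, f, alpha, rng=None):
    """Draw customer links from the ddCRP prior:
    p(c_i = j) proportional to f(D[i, j]) for j != i, and to alpha for j == i."""
    rng = np.random.default_rng(rng)
    n = D.shape[0]
    links = np.empty(n, dtype=int)
    for i in range(n):
        w = f(D[i])            # decayed distances to every customer
        w[i] = alpha           # self-link can open a new table
        links[i] = rng.choice(n, p=w / w.sum())
    return links

# Window decay: only customers within distance a get positive weight.
window = lambda d, a=1.0: (d <= a).astype(float)
D = np.abs(np.subtract.outer(np.arange(5.0), np.arange(5.0)))
print(sample_ddcrp_links(D, window, alpha=1.0, rng=0))
```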

43 The ddCRP is appropriate for image segmentation because it can naturally account for the spatial structure of the superpixels through its distance function. [sent-103, score-0.54]

44 We use a spatial distance between pixels to enforce a bias towards contiguous clusters. [sent-104, score-0.298]

45 3 Figure 2: Comparison of distance-dependent segmentation priors. [sent-106, score-0.166]

46 Restaurants represent images, tables represent segment assignment, and customers represent superpixels. [sent-108, score-0.175]

47 The distance between superpixels is modeled as the number of hops required to reach one superpixel from another, with hops being allowed only amongst spatially neighboring superpixels. [sent-109, score-0.302]
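One plausible way to compute these hop distances is breadth-first search over a superpixel adjacency graph; the sketch below is our construction, not the authors' code.

```python
from collections import deque

def hop_distances(adjacency, source):
    """Breadth-first search: number of hops from `source` to every
    superpixel, hopping only between spatial neighbors."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Toy adjacency: a 4-superpixel strip 0-1-2-3.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(hop_distances(adjacency, 0))   # {0: 0, 1: 1, 2: 2, 3: 3}
```

Combined with a window decay f(d) = 1 if d ≤ a and 0 otherwise, only superpixels within a hops receive positive link probability.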

48 …(i.e., in the same connected component of the customer assignment graph) are in the same image segment. [sent-114, score-0.38]

49 For this special case segments are guaranteed with probability one to form spatially connected subsets of the image, a property not enforced by other Bayesian nonparametric models [10, 11, 12]. [sent-115, score-0.242]

50 The full generative process for the observed features x_{1:N} within an N-superpixel image is as follows: 1. [sent-116, score-0.169]

51 For each customer, sample a customer assignment ci ∼ ddCRP(α, f, D). [sent-119, score-0.353]

52 This indirectly determines the cluster assignments z1:N , and thus the segmentation. [sent-120, score-0.257]

53 The customer assignments are sampled using the spatial distance between pixels. [sent-123, score-0.549]

54 The partition structure, derived from the customer assignments, is used to sample the observed image features. [sent-124, score-0.391]

55 Given an image, the posterior distribution of the customer assignments induces a posterior over the cluster structure; this provides the segmentation. [sent-125, score-0.529]

56 See Figure 1 for an illustration of the customer assignments and their derived table assignments in a segmentation setting. [sent-126, score-0.702]
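Putting the pieces together, the full generative process can be sketched as follows. This is a sketch under our assumptions: it reuses sample_ddcrp_links and links_to_tables from the earlier snippets, and the 20-draw bag-of-words emission is an illustrative stand-in for the paper's quantized color and texture features:

```python
import numpy as np

def generate_image_features(D, f, alpha, lam, n_words, rng=None):
    # Sketch of the spatial ddCRP generative process; assumes
    # sample_ddcrp_links and links_to_tables from the sketches above.
    rng = np.random.default_rng(rng)
    links = sample_ddcrp_links(D, f, alpha, rng)     # step 1: customer links
    z = links_to_tables(links)                       # derived segment labels
    # One Dirichlet-distributed appearance distribution per table.
    theta = {k: rng.dirichlet(lam * np.ones(n_words)) for k in np.unique(z)}
    # Step 2: each superpixel emits a bag of quantized appearance words.
    x = np.array([rng.multinomial(20, theta[k]) for k in z])
    return z, x

# Usage, reusing D and window from the earlier sketches:
# z, x = generate_image_features(D, window, alpha=1.0, lam=0.5, n_words=8, rng=0)
```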

57 3 Region-based hierarchical distance dependent CRPs The ddCRP model, when applied to an image with window size a = 1, produces a collection of contiguous patches (tables) homogeneous in color and texture features (Figure 2). [sent-130, score-0.448]

58 For each customer, sample customer assignments ci ∼ ddCRP (α, f, D). [sent-136, score-0.468]

59 For each table t, sample region assignments k_t ∼ CRP(γ). [sent-139, score-0.212]

60 Note that this region-level rddCRP model is a direct extension of the Chinese restaurant franchise (CRF) representation of the HDP [5], with the image partition being drawn from a ddCRP instead of a CRP. [sent-144, score-0.333]

61 3 Inference with Gibbs Sampling A segmentation of an observed image is found by posterior inference. [sent-148, score-0.343]

62 Notice again that the prior term uses the customer representation to take into account distances between data points; the likelihood term uses the cluster representation. [sent-150, score-0.338]

63 Equation (4) factorizes the likelihood over clusters: p(x_{1:N} | c_{1:N}, λ) = ∏_{k=1}^{K(c_{1:N})} p(x_{z(c_{1:N})=k} | λ). We have introduced notation to more easily move between the customer representation—the primary latent variables of our model—and the cluster representation. [sent-156, score-0.318]

64 Let K(c_{1:N}) denote the number of unique clusters in the customer assignments, z(c_{1:N}) the cluster assignments derived from the customer assignments, and x_{z(c_{1:N})=k} the collection of observations assigned to cluster k. [sent-157, score-0.881]

65 First, we remove the customer link c_i from the current configuration. [sent-163, score-0.325]

66 In the first stage, removing c_i either leaves the cluster structure intact, i.e., customer i remains connected to its table, or splits one table in two. [sent-165, score-0.21]

67 In the second stage, randomly reassigning c_i either leaves the cluster structure intact, i.e., the new link joins a customer at the same table, or merges two tables into one. [sent-168, score-0.21]

68 When a reassignment merges tables ℓ and m, the likelihood term is the ratio p(x_{z(c_{1:N})=ℓ∪m} | λ) / [p(x_{z(c_{1:N})=ℓ} | λ) p(x_{z(c_{1:N})=m} | λ)] (Equations 5 and 6). This defines a Markov chain whose stationary distribution is the posterior of the spatial ddCRP defined in Section 2. [sent-176, score-0.162]
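A hedged sketch of this two-stage Gibbs move, reusing links_to_tables from above; log_marginal stands in for an assumed Dirichlet-multinomial marginal likelihood over a set of feature vectors:

```python
import numpy as np

def resample_link(i, links, D, f, alpha, x, log_marginal, rng):
    # One Gibbs step for customer i (a sketch; assumes links_to_tables
    # from the earlier sketch, x an (N, V) array of feature counts, and
    # log_marginal an assumed marginal likelihood over rows of x).
    links[i] = i                          # stage 1: remove the old link
    z = links_to_tables(links)            # table labels without i's link
    n = len(links)
    logp = np.empty(n)
    for j in range(n):
        logp[j] = np.log(alpha) if j == i else np.log(f(D[i, j]) + 1e-300)
        if z[j] != z[i]:                  # linking i -> j merges two tables
            both = x[(z == z[i]) | (z == z[j])]
            logp[j] += (log_marginal(both)
                        - log_marginal(x[z == z[i]])
                        - log_marginal(x[z == z[j]]))
    p = np.exp(logp - logp.max())
    links[i] = rng.choice(n, p=p / p.sum())   # stage 2: draw the new link
    return links
```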

69 In the rddCRP, the algorithm for sampling the customer indicators is nearly the same, but with two differences. [sent-178, score-0.228]

70 Second, the likelihood term in Equation (4) depends only on the superpixels in the image assigned to the segment in question. [sent-181, score-0.258]

71 In the rddCRP, it also depends on other superpixels assigned to segments that are assigned to the same region. [sent-182, score-0.264]

72 Finally, the rddCRP also requires resampling of region assignments as follows: p(k_t = ℓ | k_{−t}, x_{1:N}, t(c_{1:N}), γ, λ) ∝ m_ℓ^{−t} p(x_t | x_{−t}, λ) if ℓ is used, and ∝ γ p(x_t | λ) if ℓ is new (Equation 7). [sent-183, score-0.19]

73 Here, x_t is the set of customers sitting at table t, x_{−t} is the set of all customers associated with region ℓ excluding x_t, and m_ℓ^{−t} is the number of tables associated with region ℓ excluding x_t. [sent-184, score-0.445]
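The region resampling step of Equation (7) can be sketched in the same spirit; again this is our construction, with log_marginal an assumed marginal likelihood:

```python
import numpy as np

def resample_region(t, k, table_data, gamma, log_marginal, rng):
    # Sketch of Eq. (7); table_data[s] holds the feature vectors of
    # table s, and log_marginal is an assumed marginal likelihood.
    others = [s for s in range(len(k)) if s != t]
    regions = sorted({k[s] for s in others})
    x_t = table_data[t]
    logw = []
    for r in regions:
        members = [table_data[s] for s in others if k[s] == r]
        m = len(members)                  # tables in region r, excluding t
        x_rest = np.concatenate(members)
        # p(x_t | x_rest) = p(x_t, x_rest) / p(x_rest)
        logw.append(np.log(m)
                    + log_marginal(np.concatenate([x_t, x_rest]))
                    - log_marginal(x_rest))
    logw.append(np.log(gamma) + log_marginal(x_t))   # open a new region
    w = np.exp(np.array(logw) - max(logw))
    choice = rng.choice(len(w), p=w / w.sum())
    new_r = max(regions) + 1 if regions else 0
    k[t] = new_r if choice == len(regions) else regions[choice]
    return k
```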

74 4 Empirical Results We compare the performance of the ddCRP to manual segmentations of images drawn from eight natural scene categories [19]. [sent-185, score-0.205]

75 This image collection has been previously used to analyze an image segmentation method based on spatially dependent Pitman-Yor (PY) processes [10], and we compare both methods using an identical feature set. [sent-189, score-0.566]

76 Each image is first divided into approximately 1000 superpixels [15, 20] using the normalized cut algorithm [9]. [sent-190, score-0.214]

77 We also present segmentation results for qualitative evaluation in Figures 3 and 4. [sent-196, score-0.166]

78 Empirically, γ has little impact on the segmentation results, due to the high-dimensional and informative image features; all our experiments set γ = 1. [sent-202, score-0.29]

79 Increasing to a = 2 (ddCRP2) results in fewer segments, but the produced segments are typically spatially fragmented. [sent-207, score-0.205]

80 Because it is hard to recover meaningful partitions if these initial segments are poor, the rddCRP performs best when a = 1. [sent-210, score-0.168]

81 We also compare to two previous models [10]: a PY mixture model with no spatial dependence (pybof20), and a PY mixture with spatial coupling induced via thresholded Gaussian processes (pydist20). [sent-216, score-0.421]

82 From left to right, the columns display natural images, segmentations for the ddCRP with a = 1, the ddCRP with a = 2, the rddCRP with a = 1, and thresholded Gaussian processes (pydist20) [10]. [sent-229, score-0.181]

83 Figure 4: Top left: Average segmentation performance on the database of natural scenes, as measured by the Rand index (larger is better), and those pairs of methods for which a Wilcoxon signed-rank test indicates comparable performance with 95% confidence. [sent-231, score-0.166]
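Both evaluation tools are available off the shelf; the snippet below shows the Rand index and a Wilcoxon signed-rank test on made-up labelings and scores:

```python
import numpy as np
from sklearn.metrics import rand_score
from scipy.stats import wilcoxon

# Toy superpixel labelings: a predicted segmentation vs. a human one.
pred  = np.array([0, 0, 1, 1, 2, 2])
human = np.array([0, 0, 0, 1, 1, 1])
print(rand_score(human, pred))        # larger is better, max 1.0

# Paired per-image scores for two methods -> Wilcoxon signed-rank test.
scores_a = np.array([0.81, 0.79, 0.85, 0.76, 0.83])
scores_b = np.array([0.80, 0.78, 0.86, 0.74, 0.82])
print(wilcoxon(scores_a, scores_b))
```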

84 Note that the rddCRP, spatial PY, and mean shift methods are statistically indistinguishable, and significantly better than all others. [sent-233, score-0.162]

85 Finally, from the vision literature we also compare to the normalized cuts (Ncuts) [8] and mean shift (MS) [6] segmentation algorithms. [sent-237, score-0.247]

86 For normalized cuts the optimal number of segments was determined to be 5. [sent-240, score-0.167]

87 For mean shift we held the spatial … Quantitative performance is summarized in Figure 4. [sent-241, score-0.162]

88 Nevertheless, the patchy ddCRP1 segmentations are interesting for applications where segmentation is an intermediate step rather than the final goal. [sent-243, score-0.274]

89 The scatter plots in Figure 4, which show Rand indexes for each image from the mountain and street categories, provide insights into when one model outperforms the other. [sent-247, score-0.158]

90 For the street images the rddCRP is better, while for images containing mountains the spatial PY is superior. [sent-248, score-0.26]

91 To most fairly compare priors, we have tested a version of the spatial PY model employing a covariance function that depends only on spatial distance. [sent-250, score-0.256]

92 The ddCRP models only require pairwise superpixel distances to be specified, as opposed to the positive definite covariance function required by the spatial PY model. [sent-254, score-0.227]

93 5 Discussion We have studied the properties of spatial distance dependent Chinese restaurant processes, and applied them to the problem of image segmentation. [sent-257, score-0.488]

94 We showed that the spatial ddCRP model is particularly well suited for segmenting an image into a collection of contiguous patches. [sent-258, score-0.355]

95 Unlike previous Bayesian nonparametric models, it can produce segmentations with guaranteed spatial connectivity. [sent-259, score-0.303]

96 This hierarchical model achieves performance similar to state-of-the-art nonparametric Bayesian segmentation algorithms, using a simpler model and a substantially simpler inference algorithm. [sent-261, score-0.33]

97 Spectral Chinese restaurant processes: Nonparametric clustering based on similarities. [sent-282, score-0.297]

98 Shared segmentation of natural scenes using dependent Pitman-Yor processes. [sent-327, score-0.233]

99 A Bayesian model for simultaneous image clustering, annotation and object segmentation. [sent-336, score-0.16]

100 Blobworld: Image segmentation using expectation-maximization and its application to image querying. [sent-362, score-0.29]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('ddcrp', 0.715), ('rddcrp', 0.313), ('customer', 0.228), ('segmentation', 0.166), ('py', 0.147), ('assignments', 0.143), ('restaurant', 0.142), ('spatial', 0.128), ('image', 0.124), ('segments', 0.12), ('crp', 0.118), ('chinese', 0.112), ('segmentations', 0.108), ('ci', 0.097), ('cluster', 0.09), ('customers', 0.088), ('contiguous', 0.085), ('superpixel', 0.079), ('superpixels', 0.072), ('nonparametric', 0.067), ('dirichlet', 0.066), ('tables', 0.061), ('spatially', 0.055), ('xz', 0.054), ('distance', 0.05), ('images', 0.049), ('partitions', 0.048), ('clusters', 0.048), ('region', 0.047), ('mixture', 0.046), ('crps', 0.045), ('dependent', 0.044), ('clustering', 0.043), ('hdp', 0.043), ('window', 0.039), ('sit', 0.039), ('thresholded', 0.038), ('rand', 0.038), ('assigned', 0.036), ('bayesian', 0.036), ('pixels', 0.035), ('processes', 0.035), ('posterior', 0.034), ('street', 0.034), ('shift', 0.034), ('sitting', 0.032), ('mixtures', 0.032), ('zi', 0.031), ('external', 0.031), ('gibbs', 0.03), ('produced', 0.03), ('cuts', 0.029), ('exchangeability', 0.028), ('unchanged', 0.028), ('assignment', 0.028), ('hierarchical', 0.027), ('process', 0.026), ('seating', 0.026), ('segment', 0.026), ('simpler', 0.025), ('scene', 0.024), ('determines', 0.024), ('appearance', 0.024), ('biases', 0.024), ('intact', 0.024), ('joins', 0.024), ('manual', 0.024), ('leaves', 0.023), ('scenes', 0.023), ('hops', 0.023), ('table', 0.022), ('color', 0.022), ('endowed', 0.021), ('decay', 0.021), ('clustered', 0.021), ('texture', 0.021), ('fowlkes', 0.02), ('distances', 0.02), ('xt', 0.02), ('regions', 0.02), ('partition', 0.02), ('encourages', 0.02), ('inference', 0.02), ('samplers', 0.02), ('labelme', 0.02), ('promising', 0.02), ('base', 0.019), ('scatter', 0.019), ('uential', 0.019), ('pami', 0.019), ('adjacent', 0.019), ('observed', 0.019), ('priors', 0.018), ('biased', 0.018), ('collection', 0.018), ('normalized', 0.018), ('traditional', 0.018), ('patches', 0.018), ('clusterings', 0.018), ('dij', 0.018)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 266 nips-2011-Spatial distance dependent Chinese restaurant processes for image segmentation

Author: Soumya Ghosh, Andrei B. Ungureanu, Erik B. Sudderth, David M. Blei

Abstract: The distance dependent Chinese restaurant process (ddCRP) was recently introduced to accommodate random partitions of non-exchangeable data [1]. The ddCRP clusters data in a biased way: each data point is more likely to be clustered with other data that are near it in an external sense. This paper examines the ddCRP in a spatial setting with the goal of natural image segmentation. We explore the biases of the spatial ddCRP model and propose a novel hierarchical extension better suited for producing “human-like” segmentations. We then study the sensitivity of the models to various distance and appearance hyperparameters, and provide the first rigorous comparison of nonparametric Bayesian models in the image segmentation domain. On unsupervised image segmentation, we demonstrate that similar performance to existing nonparametric Bayesian models is possible with substantially simpler models and algorithms.

2 0.19984104 173 nips-2011-Modelling Genetic Variations using Fragmentation-Coagulation Processes

Author: Yee W. Teh, Charles Blundell, Lloyd Elliott

Abstract: We propose a novel class of Bayesian nonparametric models for sequential data called fragmentation-coagulation processes (FCPs). FCPs model a set of sequences using a partition-valued Markov process which evolves by splitting and merging clusters. An FCP is exchangeable, projective, stationary and reversible, and its equilibrium distributions are given by the Chinese restaurant process. As opposed to hidden Markov models, FCPs allow for flexible modelling of the number of clusters, and they avoid label switching non-identifiability problems. We develop an efficient Gibbs sampler for FCPs which uses uniformization and the forward-backward algorithm. Our development of FCPs is motivated by applications in population genetics, and we demonstrate the utility of FCPs on problems of genotype imputation with phased and unphased SNP data. 1

3 0.16443048 119 nips-2011-Higher-Order Correlation Clustering for Image Segmentation

Author: Sungwoong Kim, Sebastian Nowozin, Pushmeet Kohli, Chang D. Yoo

Abstract: For many of the state-of-the-art computer vision algorithms, image segmentation is an important preprocessing step. As such, several image segmentation algorithms have been proposed, however, with certain reservation due to high computational load and many hand-tuning parameters. Correlation clustering, a graphpartitioning algorithm often used in natural language processing and document clustering, has the potential to perform better than previously proposed image segmentation algorithms. We improve the basic correlation clustering formulation by taking into account higher-order cluster relationships. This improves clustering in the presence of local boundary ambiguities. We first apply the pairwise correlation clustering to image segmentation over a pairwise superpixel graph and then develop higher-order correlation clustering over a hypergraph that considers higher-order relations among superpixels. Fast inference is possible by linear programming relaxation, and also effective parameter learning framework by structured support vector machine is possible. Experimental results on various datasets show that the proposed higher-order correlation clustering outperforms other state-of-the-art image segmentation algorithms.

4 0.15224165 223 nips-2011-Probabilistic Joint Image Segmentation and Labeling

Author: Adrian Ion, Joao Carreira, Cristian Sminchisescu

Abstract: We present a joint image segmentation and labeling model (JSL) which, given a bag of figure-ground segment hypotheses extracted at multiple image locations and scales, constructs a joint probability distribution over both the compatible image interpretations (tilings or image segmentations) composed from those segments, and over their labeling into categories. The process of drawing samples from the joint distribution can be interpreted as first sampling tilings, modeled as maximal cliques, from a graph connecting spatially non-overlapping segments in the bag [1], followed by sampling labels for those segments, conditioned on the choice of a particular tiling. We learn the segmentation and labeling parameters jointly, based on Maximum Likelihood with a novel Incremental Saddle Point estimation procedure. The partition function over tilings and labelings is increasingly more accurately approximated by including incorrect configurations that a not-yet-competent model rates probable during learning. We show that the proposed methodology matches the current state of the art in the Stanford dataset [2], as well as in VOC2010, where 41.7% accuracy on the test set is achieved.

5 0.14567004 227 nips-2011-Pylon Model for Semantic Segmentation

Author: Victor Lempitsky, Andrea Vedaldi, Andrew Zisserman

Abstract: Graph cut optimization is one of the standard workhorses of image segmentation since for binary random field representations of the image, it gives globally optimal results and there are efficient polynomial time implementations. Often, the random field is applied over a flat partitioning of the image into non-intersecting elements, such as pixels or super-pixels. In the paper we show that if, instead of a flat partitioning, the image is represented by a hierarchical segmentation tree, then the resulting energy combining unary and boundary terms can still be optimized using graph cut (with all the corresponding benefits of global optimality and efficiency). As a result of such inference, the image gets partitioned into a set of segments that may come from different layers of the tree. We apply this formulation, which we call the pylon model, to the task of semantic segmentation where the goal is to separate an image into areas belonging to different semantic classes. The experiments highlight the advantage of inference on a segmentation tree (over a flat partitioning) and demonstrate that the optimization in the pylon model is able to flexibly choose the level of segmentation across the image. Overall, the proposed system has superior segmentation accuracy on several datasets (Graz-02, Stanford background) compared to previously suggested approaches. 1

6 0.097108103 76 nips-2011-Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials

7 0.095969789 155 nips-2011-Learning to Agglomerate Superpixel Hierarchies

8 0.086210839 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding

9 0.081510231 247 nips-2011-Semantic Labeling of 3D Point Clouds for Indoor Scenes

10 0.081087045 156 nips-2011-Learning to Learn with Compound HD Models

11 0.072149113 162 nips-2011-Lower Bounds for Passive and Active Learning

12 0.07004828 285 nips-2011-The Kernel Beta Process

13 0.064172186 186 nips-2011-Noise Thresholds for Spectral Clustering

14 0.064112827 126 nips-2011-Im2Text: Describing Images Using 1 Million Captioned Photographs

15 0.062593862 127 nips-2011-Image Parsing with Stochastic Scene Grammar

16 0.062050927 66 nips-2011-Crowdclustering

17 0.060276333 221 nips-2011-Priors over Recurrent Continuous Time Processes

18 0.058526602 132 nips-2011-Inferring Interaction Networks using the IBP applied to microRNA Target Prediction

19 0.058113761 91 nips-2011-Exploiting spatial overlap to efficiently compute appearance distances between image windows

20 0.057172187 112 nips-2011-Heavy-tailed Distances for Gradient Based Image Descriptors


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.15), (1, 0.104), (2, -0.06), (3, 0.123), (4, 0.045), (5, -0.087), (6, -0.024), (7, -0.031), (8, 0.097), (9, 0.057), (10, 0.092), (11, 0.065), (12, 0.129), (13, -0.204), (14, -0.155), (15, 0.013), (16, 0.06), (17, -0.14), (18, -0.032), (19, -0.026), (20, 0.051), (21, -0.077), (22, -0.032), (23, -0.075), (24, -0.132), (25, 0.031), (26, 0.049), (27, -0.064), (28, 0.038), (29, 0.058), (30, 0.028), (31, 0.008), (32, -0.088), (33, 0.01), (34, -0.009), (35, -0.014), (36, 0.093), (37, 0.027), (38, 0.061), (39, 0.023), (40, 0.065), (41, -0.057), (42, -0.024), (43, 0.026), (44, 0.066), (45, -0.067), (46, -0.055), (47, 0.051), (48, 0.029), (49, -0.078)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93638843 266 nips-2011-Spatial distance dependent Chinese restaurant processes for image segmentation

Author: Soumya Ghosh, Andrei B. Ungureanu, Erik B. Sudderth, David M. Blei

Abstract: The distance dependent Chinese restaurant process (ddCRP) was recently introduced to accommodate random partitions of non-exchangeable data [1]. The ddCRP clusters data in a biased way: each data point is more likely to be clustered with other data that are near it in an external sense. This paper examines the ddCRP in a spatial setting with the goal of natural image segmentation. We explore the biases of the spatial ddCRP model and propose a novel hierarchical extension better suited for producing “human-like” segmentations. We then study the sensitivity of the models to various distance and appearance hyperparameters, and provide the first rigorous comparison of nonparametric Bayesian models in the image segmentation domain. On unsupervised image segmentation, we demonstrate that similar performance to existing nonparametric Bayesian models is possible with substantially simpler models and algorithms.

2 0.68553567 119 nips-2011-Higher-Order Correlation Clustering for Image Segmentation

Author: Sungwoong Kim, Sebastian Nowozin, Pushmeet Kohli, Chang D. Yoo

Abstract: For many of the state-of-the-art computer vision algorithms, image segmentation is an important preprocessing step. As such, several image segmentation algorithms have been proposed, however, with certain reservation due to high computational load and many hand-tuning parameters. Correlation clustering, a graphpartitioning algorithm often used in natural language processing and document clustering, has the potential to perform better than previously proposed image segmentation algorithms. We improve the basic correlation clustering formulation by taking into account higher-order cluster relationships. This improves clustering in the presence of local boundary ambiguities. We first apply the pairwise correlation clustering to image segmentation over a pairwise superpixel graph and then develop higher-order correlation clustering over a hypergraph that considers higher-order relations among superpixels. Fast inference is possible by linear programming relaxation, and also effective parameter learning framework by structured support vector machine is possible. Experimental results on various datasets show that the proposed higher-order correlation clustering outperforms other state-of-the-art image segmentation algorithms.

3 0.67748511 223 nips-2011-Probabilistic Joint Image Segmentation and Labeling

Author: Adrian Ion, Joao Carreira, Cristian Sminchisescu

Abstract: We present a joint image segmentation and labeling model (JSL) which, given a bag of figure-ground segment hypotheses extracted at multiple image locations and scales, constructs a joint probability distribution over both the compatible image interpretations (tilings or image segmentations) composed from those segments, and over their labeling into categories. The process of drawing samples from the joint distribution can be interpreted as first sampling tilings, modeled as maximal cliques, from a graph connecting spatially non-overlapping segments in the bag [1], followed by sampling labels for those segments, conditioned on the choice of a particular tiling. We learn the segmentation and labeling parameters jointly, based on Maximum Likelihood with a novel Incremental Saddle Point estimation procedure. The partition function over tilings and labelings is increasingly more accurately approximated by including incorrect configurations that a not-yet-competent model rates probable during learning. We show that the proposed methodology matches the current state of the art in the Stanford dataset [2], as well as in VOC2010, where 41.7% accuracy on the test set is achieved.

4 0.6772911 227 nips-2011-Pylon Model for Semantic Segmentation

Author: Victor Lempitsky, Andrea Vedaldi, Andrew Zisserman

Abstract: Graph cut optimization is one of the standard workhorses of image segmentation since for binary random field representations of the image, it gives globally optimal results and there are efficient polynomial time implementations. Often, the random field is applied over a flat partitioning of the image into non-intersecting elements, such as pixels or super-pixels. In the paper we show that if, instead of a flat partitioning, the image is represented by a hierarchical segmentation tree, then the resulting energy combining unary and boundary terms can still be optimized using graph cut (with all the corresponding benefits of global optimality and efficiency). As a result of such inference, the image gets partitioned into a set of segments that may come from different layers of the tree. We apply this formulation, which we call the pylon model, to the task of semantic segmentation where the goal is to separate an image into areas belonging to different semantic classes. The experiments highlight the advantage of inference on a segmentation tree (over a flat partitioning) and demonstrate that the optimization in the pylon model is able to flexibly choose the level of segmentation across the image. Overall, the proposed system has superior segmentation accuracy on several datasets (Graz-02, Stanford background) compared to previously suggested approaches. 1

5 0.63773996 76 nips-2011-Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials

Author: Philipp Krähenbühl, Vladlen Koltun

Abstract: Most state-of-the-art techniques for multi-class image segmentation and labeling use conditional random fields defined over pixels or image regions. While regionlevel models often feature dense pairwise connectivity, pixel-level models are considerably larger and have only permitted sparse graph structures. In this paper, we consider fully connected CRF models defined on the complete set of pixels in an image. The resulting graphs have billions of edges, making traditional inference algorithms impractical. Our main contribution is a highly efficient approximate inference algorithm for fully connected CRF models in which the pairwise edge potentials are defined by a linear combination of Gaussian kernels. Our experiments demonstrate that dense connectivity at the pixel level substantially improves segmentation and labeling accuracy. 1

6 0.59702295 173 nips-2011-Modelling Genetic Variations using Fragmentation-Coagulation Processes

7 0.5542984 155 nips-2011-Learning to Agglomerate Superpixel Hierarchies

8 0.5115186 235 nips-2011-Recovering Intrinsic Images with a Global Sparsity Prior on Reflectance

9 0.50330573 285 nips-2011-The Kernel Beta Process

10 0.45632914 221 nips-2011-Priors over Recurrent Continuous Time Processes

11 0.44451922 127 nips-2011-Image Parsing with Stochastic Scene Grammar

12 0.44447526 132 nips-2011-Inferring Interaction Networks using the IBP applied to microRNA Target Prediction

13 0.41015843 247 nips-2011-Semantic Labeling of 3D Point Clouds for Indoor Scenes

14 0.40700409 43 nips-2011-Bayesian Partitioning of Large-Scale Distance Data

15 0.39643297 55 nips-2011-Collective Graphical Models

16 0.38363644 255 nips-2011-Simultaneous Sampling and Multi-Structure Fitting with Adaptive Reversible Jump MCMC

17 0.38325414 131 nips-2011-Inference in continuous-time change-point models

18 0.37592313 216 nips-2011-Portmanteau Vocabularies for Multi-Cue Image Representation

19 0.37562987 112 nips-2011-Heavy-tailed Distances for Gradient Based Image Descriptors

20 0.35230958 101 nips-2011-Gaussian process modulated renewal processes


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.019), (4, 0.037), (12, 0.145), (20, 0.083), (31, 0.075), (33, 0.155), (43, 0.051), (45, 0.082), (57, 0.047), (60, 0.011), (74, 0.064), (83, 0.056), (84, 0.036), (99, 0.034)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.83922076 266 nips-2011-Spatial distance dependent Chinese restaurant processes for image segmentation

Author: Soumya Ghosh, Andrei B. Ungureanu, Erik B. Sudderth, David M. Blei

Abstract: The distance dependent Chinese restaurant process (ddCRP) was recently introduced to accommodate random partitions of non-exchangeable data [1]. The ddCRP clusters data in a biased way: each data point is more likely to be clustered with other data that are near it in an external sense. This paper examines the ddCRP in a spatial setting with the goal of natural image segmentation. We explore the biases of the spatial ddCRP model and propose a novel hierarchical extension better suited for producing “human-like” segmentations. We then study the sensitivity of the models to various distance and appearance hyperparameters, and provide the first rigorous comparison of nonparametric Bayesian models in the image segmentation domain. On unsupervised image segmentation, we demonstrate that similar performance to existing nonparametric Bayesian models is possible with substantially simpler models and algorithms.

2 0.77740091 99 nips-2011-From Stochastic Nonlinear Integrate-and-Fire to Generalized Linear Models

Author: Skander Mensi, Richard Naud, Wulfram Gerstner

Abstract: Variability in single neuron models is typically implemented either by a stochastic Leaky-Integrate-and-Fire model or by a model of the Generalized Linear Model (GLM) family. We use analytical and numerical methods to relate state-of-theart models from both schools of thought. First we find the analytical expressions relating the subthreshold voltage from the Adaptive Exponential Integrate-andFire model (AdEx) to the Spike-Response Model with escape noise (SRM as an example of a GLM). Then we calculate numerically the link-function that provides the firing probability given a deterministic membrane potential. We find a mathematical expression for this link-function and test the ability of the GLM to predict the firing probability of a neuron receiving complex stimulation. Comparing the prediction performance of various link-functions, we find that a GLM with an exponential link-function provides an excellent approximation to the Adaptive Exponential Integrate-and-Fire with colored-noise input. These results help to understand the relationship between the different approaches to stochastic neuron models. 1 Motivation When it comes to modeling the intrinsic variability in simple neuron models, we can distinguish two traditional approaches. One approach is inspired by the stochastic Leaky Integrate-and-Fire (LIF) hypothesis of Stein (1967) [1], where a noise term is added to the system of differential equations implementing the leaky integration to a threshold. There are multiple versions of such a stochastic LIF [2]. How the noise affects the firing probability is also a function of the parameters of the neuron model. Therefore, it is important to take into account the refinements of simple neuron models in terms of subthreshold resonance [3, 4], spike-triggered adaptation [5, 6] and non-linear spike 1 initiation [7, 5]. All these improvements are encompassed by the Adaptive Exponential Integrateand-Fire model (AdEx [8, 9]). The other approach is to start with some deterministic dynamics for the the state of the neuron (for instance the instantaneous distance from the membrane potential to the threshold) and link the probability intensity of emitting a spike with a non-linear function of the state variable. Under some conditions, this type of model is part of a greater class of statistical models called Generalized Linear Models (GLM [10]). As a single neuron model, the Spike Response Model (SRM) with escape noise is a GLM in which the state variable is explicitly the distance between a deterministic voltage and the threshold. The original SRM could account for subthreshold resonance, refractory effects and spike-frequency adaptation [11]. Mathematically similar models were developed independently in the study of the visual system [12] where spike-frequency adaptation has also been modeled [13]. Recently, this approach has retained increased attention since the probabilistic framework can be linked with the Bayesian theory of neural systems [14] and because Bayesian inference can be applied to the population of neurons [15]. In this paper, we investigate the similarity and differences between the state-of-the-art GLM and the stochastic AdEx. The motivation behind this work is to relate the traditional threshold neuron models to Bayesian theory. Our results extend the work of Plesser and Gerstner (2000) [16] since we include the non-linearity for spike initiation and spike-frequency adaptation. We also provide relationships between the parameters of the AdEx and the equivalent GLM. 
These precise relationships can be used to relate analog implementations of threshold models [17] to the probabilistic models used in the Bayesian approach. The paper is organized as follows: We first describe the expressions relating the SRM state-variable to the parameters of the AdEx (Sect. 3.1) in the subthreshold regime. Then, we use numerical methods to find the non-linear link-function that models the firing probability (Sect. 3.2). We find a functional form for the SRM link-function that best describes the firing probability of a stochastic AdEx. We then compare the performance of this link-function with the often used exponential or linear-rectifier link-functions (also called half-wave linear rectifier) in terms of predicting the firing probability of an AdEx under complex stimulus (Sect. 3.3). We find that the exponential linkfunction yields almost perfect prediction. Finally, we explore the relations between the statistic of the noise and the sharpness of the non-linearity for spike initiation with the parameters of the SRM. 2 Presentation of the Models In this section we present the general formula for the stochastic AdEx model (Sect. 2.1) and the SRM (Sect 2.2). 2.1 The Stochastic Adaptive Exponential Integrate-and-Fire Model The voltage dynamics of the stochastic AdEx is given by: V −Θ ˙ τm V = El − V + ∆T exp − Rw + RI + R (1) ∆T τw w = a(V − El ) − w ˙ (2) where τm is the membrane time constant, El the reverse potential, R the membrane resistance, Θ is the threshold, ∆T is the shape factor and I(t) the input current which is chosen to be an Ornstein−Θ Uhlenbeck process with correlation time-constant of 5 ms. The exponential term ∆T exp( V∆T ) is a non-linear function responsible for the emission of spikes and is a diffusive white noise with standard deviation σ (i.e. ∼ N (0, σ)). Note that the diffusive white-noise does not imply white noise fluctuations of the voltage V (t), the probability distribution of V (t) will depend on ∆T and Θ. The second variable, w, describes the subthreshold as well as the spike-triggered adaptation both ˆ parametrized by the coupling strength a and the time constant τw . Each time tj the voltage goes to infinity, we assumed that a spike is emitted. Then the voltage is reset to a fixed value Vr and w is increased by a constant value b. 2.2 The Generalized Linear Model In the SRM, The voltage V (t) is given by the convolution of the injected current I(t) with the membrane filter κ(t) plus the additional kernel η(t) that acts after each spikes (here we split the 2 spike-triggered kernel in two η(t) = ηv (t) + ηw (t) for reasons that will become clear later): V (t) = ˆ ˆ ηv (t − tj ) + ηw (t − tj ) El + [κ ∗ I](t) + (3) ˆ {tj } ˆ Then at each time tj a spike is emitted which results in a change of voltage described by η(t) = ηv (t) + ηw (t). Given the deterministic voltage, (Eq. 3) a spike is emitted according to the firing intensity λ(V ): λ(t) = f (V (t)) (4) where f (·) is an arbitrary function called the link-function. Then the firing behavior of the SRM depends on the choice of the link-function and its parameters. The most common link-function used to model single neuron activities are the linear-rectifier and the exponential function. 3 Mapping In order to map the stochastic AdEx to the SRM we follow a two-step procedure. First we derive the filter κ(t) and the kernels ηv (t) and ηw (t) analytically as a function of AdEx parameters. Second, we derive the link-function of the SRM from the stochastic spike emission of the AdEx. 
Figure 1: Mapping of the subthreshold dynamics of an AdEx to an equivalent SRM. A. Membrane filter κ(t) for three different sets of parameters of the AdEx leading to over-damped, critically damped and under-damped cases (upper, middle and lower panel, respectively). B. Spike-Triggered η(t) (black), ηv (t) (light gray) and ηw (gray) for the three cases. C. Example of voltage trace produced when an AdEx is stimulated with a step of colored noise (black). The corresponding voltage from a SRM stimulated with the same current and where we forced the spikes to match those of the AdEx (red). D. Error in the subthreshold voltage (VAdEx − VGLM ) as a function of the mean voltage of the AdEx, for the three different cases: over-, critically and under-damped (light gray, gray and black, respectively) with ∆T = 1 mV. Red line represents the voltage threshold Θ. E. Root Mean Square Error (RMSE) ratio for the three cases with ∆T = 1 mV. The RMSE ratio is the RMSE between the deterministic VSRM and the stochastic VAdEx divided by the RMSE between repetitions of the stochastic AdEx voltage. The error bar shows a single standard deviation as the RMSE ratio is averaged accross multiple value of σ. 3.1 Subthreshold voltage dynamics We start by assuming that the non-linearity for spike initiation does not affect the mean subthreshold voltage of the stochastic AdEx (see Figure 1 D). This assumption is motivated by the small ∆T 3 observed in in-vitro recordings (from 0.5 to 2 mV [8, 9]) which suggest that the subthreshold dynamics are mainly linear except very close to Θ. Also, we expect that the non-linear link-function will capture some of the dynamics due to the non-linearity for spike initiation. Thus it is possible to rewrite the deterministic subthreshold part of the AdEx (Eq. 1-2 without and without ∆T exp((V − Θ)/∆T )) using matrices: ˙ x = Ax (5) with x = V w and A = − τ1 m a τw − gl1m τ − τ1 w (6) In this form, the dynamics of the deterministic AdEx voltage is a damped oscillator with a driving force. Depending on the eigenvalues of A the system could be over-damped, critically damped or under-damped. The filter κ(t) of the GLM is given by the impulse response of the system of coupled differential equations of the AdEx, described by Eq. 5 and 6. In other words, one has to derive the response of the system when stimulating with a Dirac-delta function. The type of damping gives three different qualitative shapes of the kernel κ(t), which are summarized in Table 3.1 and Figure 1 A. Since the three different filters also affect the nature of the stochastic voltage fluctuations, we will keep the distinction between over-damped, critically damped and under-damped scenarios throughout the paper. This means that our approach is valid for at least 3 types of diffusive voltage-noise (i.e. the white noise in Eq. 1 filtered by 3 different membrane filters κ(t)). To complete the description of the deterministic voltage, we need an expression for the spiketriggered kernels. The voltage reset at each spike brings a spike-triggered jump in voltage of magˆ nitude ∆ = Vr − V (t). This perturbation is superposed to the current fluctuations due to I(t) and can be mediated by a Delta-diract pulse of current. Thus we can write the voltage reset kernel by: ηv (t) = ∆ ∆ [δ ∗ κ] (t) = κ(t) κ(0) κ(0) (7) where δ(t) is the Dirac-delta function. The shape of this kernel depends on κ(t) and can be computed from Table 3.1 (see Figure 1 B). Finally, the AdEx mediates spike-frequency adaptation by the jump of the second variables w. 
From Eq. 2 we can see that this produces a current wspike (t) = b exp (−t/τw ) that can cumulate over subsequent spikes. The effect of this current on voltage is then given by the convolution of wspike (t) with the membrane filter κ(t). Thus in the SRM framework the spike-frequency adaptation is taken into account by: ηw (t) = [wspike ∗ κ](t) (8) Again the precise form of ηw (t) depends on κ(t) and can be computed from Table 3.1 (see Figure 1 B). At this point, we would like to verify our assumption that the non-linearity for spike emission can be neglected. Fig. 1 C and D shows that the error between the voltage from Eq. 3 and the voltage from the stochastic AdEx is generally small. Moreover, we see that the main contribution to the voltage prediction error is due to the mismatch close to the spikes. However the non-linearity for spike initiation may change the probability distribution of the voltage fluctuations, which in turn influences the probability of spiking. This will influence the choice of the link-function, as we will see in the next section. 3.2 Spike Generation Using κ(t), ηv (t) and ηw (t), we must relate the spiking probability of the stochastic AdEx as a function of its deterministic voltage. According to [2] the probability of spiking in time bin dt given the deterministic voltage V (t) is given by: p(V ) = prob{spike in [t, t + dt]} = 1 − exp (−f (V (t))dt) (9) where f (·) gives the firing intensity as a function of the deterministic V (t) (Eq. 3). Thus to extract the link-function f we have to compute the probability of spiking given V (t) for our SRM. To do so we apply the method proposed by Jolivet et al. (2004) [18], where the probability of spiking is simply given by the distribution of the deterministic voltage estimated at the spike times divided by the distribution of the SRM voltage when there is no spike (see figure 2 A). One can numerically compute these two quantities for our models using N repetitions of the same stimulus. 4 Table 1: Analytical expressions for the membrane filter κ(t) in terms of the parameters of the AdEx for over-, critically-, and under-damped cases. Membrane Filter: κ(t) over-damped if: (τm + τw )2 > 4τm τw (gl +a) gl κ(t) = k1 eλ1 t + k2 eλ2 t λ1 = 1 2τm τw (−(τm + τw ) + critically-damped if: (τm + τw )2 = 4τm τw (gl +a) gl κ(t) = (αt + β)eλt λ= under-damped if: (τm + τw )2 < 4τm τw (gl +a) gl κ(t) = (k1 cos (ωt) + k2 sin (ωt)) eλt −(τm +τw ) 2τm τw λ= −(τm +τw ) 2τm τw (τm + τw )2 − 4 τm τw (gl + a) gl λ2 = 1 2τm τw (−(τm + τw ) − α= τm −τw 2Cτm τw ω= τw −τm 2τm τw 2 − a g l τm τw (τm + τw )2 − 4 τm τw (gl + a) gl k1 = −(1+(τm λ2 )) Cτm (λ1 −λ2 ) k2 = 1+(τm λ1 ) Cτm (λ1 −λ2 ) β= 1 C k1 = k2 = 1 C −(1+τm λ) Cωτm The standard deviation σ of the noise and the parameter ∆T of the AdEx non-linearity may affect the shape of the link-function. We thus extract p(V ) for different σ and ∆T (Fig. 2 B). Then using visual heuristics and previous knowledge about the potential analytical expression of the link-funtion, we try to find a simple analytical function that captures p(V ) for a large range of combinations of σ and ∆T . We observed that the log(− log(p)) is close to linear in most studied conditions Fig. 2 B suggesting the following two distributions of p(V ): V − VT (10) p(V ) = 1 − exp − exp ∆V V − VT p(V ) = exp − exp − (11) ∆V Once we have p(V ), we can use Eq. 4 to obtain the equivalent SRM link-function, which leads to: −1 f (V ) = log (1 − p(V )) (12) dt Then the two potential link-functions of the SRM can be derived from Eq. 
10 and Eq. 11 (respectively): V − VT f (V ) = λ0 exp (13) ∆V V − VT (14) f (V ) = −λ0 log 1 − exp − exp − ∆V 1 with λ0 = dt , VT the threshold of the SRM and ∆V the sharpness of the link-function (i.e. the parameters that governs the degree of the stochasticity). Note that the exact value of λ0 has no importance since it is redundant with VT . Eq. 13 is the standard exponential link-function, but we call Eq. 14 the log-exp-exp link-function. 3.3 Prediction The next point is to evaluate the fit quality of each link-function. To do this, we first estimate the parameters VT and ∆V of the GLM link-function that maximize the likelihood of observing a spike 5 Figure 2: SRM link-function. A. Histogram of the SRM voltage at the AdEx firing times (red) and at non-firing times (gray). The ratio of the two distributions gives p(V ) (Eq. 9, dashed lines). Inset, zoom to see the voltage histogram evaluated at the firing time (red). B. log(− log(p)) as a function of the SRM voltage for three different noise levels σ = 0.07, 0.14, 0.18 nA (pale gray, gray, black dots, respectively) and ∆T = 1 mV. The line is a linear fit corresponding to the log-exp-exp linkfunction and the dashed line corresponds to a fit with the exponential link-function. C. Same data and labeling scheme as B, but plotting f (V ) according to Eq. 12. The lines are produced with Eq. 14 with parameters fitted as described in B. and the dashed lines are produced with Eq. 13. Inset, same plot but on a semi-log(y) axis. train generated with an AdEx. Second we look at the predictive power of the resulting SRM in terms of Peri-Stimulus Time Histogram (PSTH). In other words we ask how close the spike trains generated with a GLM are from the spike train generated with a stochastic AdEx when both models are stimulated with the same input current. For any GLM with link-function f (V ) ≡ f (t|I, θ) and parameters θ regulating the shape of κ(t), ˆ ηv (t) and ηw (t), the Negative Log-Likelihood (NLL) of observing a spike-train {t} is given by:   NLL = − log(f (t|I, θ)) − f (t|I, θ) (15) t ˆ t It has been shown that the negative log-likelihood is convex in the parameters if f is convex and logconcave [19]. It is easy to show that a linear-rectifier link-function, the exponential link-function and the log-exp-exp link-function all satisfy these conditions. This allows efficient estimation of ˆ ˆ the optimal parameters VT and ∆V using a simple gradient descent. One can thus estimate from a single AdEx spike train the optimal parameters of a given link-function, which is more efficient than the method used in Sect. 3.2. The minimal NLL resulting from the gradient descent gives an estimation of the fit quality. A better estimate of the fit quality is given by the distance between the PSTHs in response to stimuli not used for parameter fitting . Let ν1 (t) be the PSTH of the AdEx, and ν2 (t) be the PSTH of the fitted SRM, 6 Figure 3: PSTH prediction. A. Injected current. B. Voltage traces produced by an AdEx (black) and the equivalent SRM (red), when stimulated with the current in A. C. Raster plot for 20 realizations of AdEx (black tick marks) and equivalent SRM (red tick marks). D. PSTH of the AdEx (black) and the SRM (red) obtained by averaging 10,000 repetitions. E. Optimal log-likelihood for the three cases of the AdEx, using three different link-functions, a linear-rectifier (light gray), an exponential link-function (gray) and the link-function defined by Eq. 
14 (dark gray), these values are obtained by averaging over 40 different combinations σ and ∆T (see Fig. 4). Error bars are one standard deviation, the stars denote a significant difference, two-sample t-test with α = 0.01. F. same as E. but for Md (Eq. 16). then we use Md ∈ [0, 1] as a measure of match: Md = 2 2 (ν1 (t) − ν2 (t)) dt ν1 (t)2 dt + ν2 (t)2 dt (16) Md = 1 means that it is impossible to differentiate the SRM from the AdEx in terms of their PSTHs, whereas a Md of 0 means that the two PSTHs are completely different. Thus Md is a normalized similarity measure between two PSTHs. In practice, Md is estimated from the smoothed (boxcar average of 1 ms half-width) averaged spike train of 1 000 repetitions for each models. We use both the NLL and Md to quantify the fit quality for each of the three damping cases and each of the three link-functions. Figure 3 shows the match between the stochastic AdEx used as a reference and the derived GLM when both are stimulated with the same input current (Fig. 3 A). The resulting voltage traces are almost identical (Fig. 3 B) and both models predict almost the same spike trains and so the same PSTHs (Fig. 3 C and D). More quantitalively, we see on Fig. 3 E and F, that the linear-rectifier fits significantly worse than both the exponential and log-exp-exp link-functions, both in terms of NLL and of Md . The exponential link-function performs as well as the log-exp-exp link-function, with a spike train similarity measure Md being almost 1 for both. Finally the likelihood-based method described above gives us the opportunity to look at the relationship between the AdEx parameters σ and ∆T that governs its spike emission and the parameters VT and ∆V of the link-function (Fig. 4). We observe that an increase of the noise level produces a flatter link-function (greater ∆V ) while an increase in ∆T also produces an increase in ∆V and VT (note that Fig. 4 shows ∆V and VT for the exponential link-function only, but equivalent results are obtained with the log-exp-exp link-function). 4 Discussion In Sect. 3.3 we have shown that it is possible to predict with almost perfect accuracy the PSTH of a stochastic AdEx model using an appropriate set of parameters in the SRM. Moreover, since 7 Figure 4: Influence of the AdEx parameters on the parameters of the exponential link-function. A. VT as a function of ∆T and σ. B. ∆V as a function of ∆T and σ. the subthreshold voltage of the AdEx also gives a good match with the deterministic voltage of the SRM, we expect that the AdEx and the SRM will not differ in higher moments of the spike train probability distributions beyond the PSTH. We therefore conclude that diffusive noise models of the type of Eq. 1-2 are equivalent to GLM of the type of Eq. 3-4. Once combined with similar results on other types of stochastic LIF (e.g. correlated noise), we could bridge the gap between the literature on GLM and the literature on diffusive noise models. Another noteworthy observation pertains to the nature of the link-function. The link-function has been hypothesized to be a linear-rectifier, an exponential, a sigmoidal or a Gaussian [16]. We have observed that for the AdEx the link-function follows Eq. 14 that we called the log-exp-exp linkfunction. Although the link-function is log-exp-exp for most of the AdEx parameters, the exponential link-function gives an equivalently good prediction of the PSTH. 
Figure 3: PSTH prediction. A. Injected current. B. Voltage traces produced by an AdEx (black) and the equivalent SRM (red) when stimulated with the current in A. C. Raster plot for 20 realizations of the AdEx (black tick marks) and the equivalent SRM (red tick marks). D. PSTH of the AdEx (black) and the SRM (red), obtained by averaging 10,000 repetitions. E. Optimal log-likelihood for the three cases of the AdEx, using three different link-functions: a linear-rectifier (light gray), an exponential link-function (gray), and the link-function defined by Eq. 14 (dark gray); these values are obtained by averaging over 40 different combinations of σ and ∆T (see Fig. 4). Error bars are one standard deviation; the stars denote a significant difference (two-sample t-test with α = 0.01). F. Same as E, but for Md (Eq. 16).

We use both the NLL and Md to quantify the fit quality for each of the three damping cases and each of the three link-functions. Figure 3 shows the match between the stochastic AdEx used as a reference and the derived GLM when both are stimulated with the same input current (Fig. 3 A). The resulting voltage traces are almost identical (Fig. 3 B), and both models predict almost the same spike trains and hence the same PSTHs (Fig. 3 C and D). More quantitatively, we see in Fig. 3 E and F that the linear-rectifier fits significantly worse than both the exponential and log-exp-exp link-functions, in terms of both NLL and Md. The exponential link-function performs as well as the log-exp-exp link-function, with a spike-train similarity measure Md close to 1 for both.

Finally, the likelihood-based method described above gives us the opportunity to examine the relationship between the AdEx parameters σ and ∆T, which govern its spike emission, and the parameters VT and ∆V of the link-function (Fig. 4). We observe that an increase of the noise level produces a flatter link-function (greater ∆V), while an increase in ∆T produces an increase in both ∆V and VT (note that Fig. 4 shows ∆V and VT for the exponential link-function only, but equivalent results are obtained with the log-exp-exp link-function).

Figure 4: Influence of the AdEx parameters on the parameters of the exponential link-function. A. VT as a function of ∆T and σ. B. ∆V as a function of ∆T and σ.

4 Discussion

In Sect. 3.3 we have shown that it is possible to predict with almost perfect accuracy the PSTH of a stochastic AdEx model using an appropriate set of parameters in the SRM. Moreover, since the subthreshold voltage of the AdEx also gives a good match with the deterministic voltage of the SRM, we expect that the AdEx and the SRM will not differ in higher moments of the spike-train probability distribution beyond the PSTH. We therefore conclude that diffusive noise models of the type of Eq. 1-2 are equivalent to GLMs of the type of Eq. 3-4. Once combined with similar results on other types of stochastic LIF models (e.g. with correlated noise), this could bridge the gap between the literature on GLMs and the literature on diffusive noise models.

Another noteworthy observation pertains to the nature of the link-function. The link-function has been hypothesized to be a linear-rectifier, an exponential, a sigmoidal or a Gaussian [16]. We have observed that for the AdEx the link-function follows Eq. 14, which we call the log-exp-exp link-function. Although the link-function is log-exp-exp for most of the AdEx parameters, the exponential link-function gives an equivalently good prediction of the PSTH. This can be explained by the fact that the difference between the log-exp-exp and exponential link-functions arises mainly at low voltage, i.e. far from the threshold, where the probability of emitting a spike is very low (Figure 2 C, below −50 mV). Therefore, even though the exponential link-function overestimates the firing probability at these low voltages, it rarely produces extra spikes. At voltages closer to the threshold, where most spikes are emitted, the two link-functions behave almost identically and hence produce the same PSTH.
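This low-voltage argument can be checked with a few numbers. The snippet below evaluates the two link-functions of Eqs. 13-14 in units of λ0, at voltages expressed relative to threshold in units of ∆V; the grid of values is purely illustrative, and using the same (VT, ∆V) for both links somewhat exaggerates the near-threshold gap, since in practice each link-function is fitted with its own parameters.

```python
import numpy as np

def f_exp(x):        # Eq. 13 in units of lambda_0, with x = (V - VT) / dV
    return np.exp(x)

def f_logexpexp(x):  # Eq. 14 in units of lambda_0
    return -np.log1p(-np.exp(-np.exp(-x)))

# Far below threshold the exponential link is larger by many orders of
# magnitude; near threshold the two intensities become comparable.
for x in (-5.0, -2.0, 0.0, 1.0):
    print(f"(V - VT)/dV = {x:+.1f}:  exp link = {f_exp(x):.3e}, "
          f"log-exp-exp link = {f_logexpexp(x):.3e}")
```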
The Gaussian link-function can be seen as lying in between the exponential link-function and the log-exp-exp link-function in Fig. 2. This means that the work of Plesser and Gerstner (2000) [16] is in agreement with the results presented here. The importance of the time-derivative of the voltage stressed by Plesser and Gerstner (leading to a two-dimensional link-function f(V, V̇)) was not studied here, in order to remain consistent with the typical usage of GLMs in neural systems [14]. Finally, we restricted our study to an exponential non-linearity for spike initiation and did not consider other cases such as the quadratic integrate-and-fire (QIF, [5]) or other polynomial functional shapes. We overlooked these cases for two reasons. First, there is ample evidence that the non-linearity in neurons (estimated from in-vitro recordings of pyramidal neurons) is well approximated by a single exponential [9]. Second, the exponential non-linearity of the AdEx only affects the subthreshold voltage at high voltage (close to threshold) and can thus be neglected when deriving the filters κ(t) and η(t). Polynomial non-linearities, on the other hand, affect a larger range of the subthreshold voltage, so it would be difficult to justify the linearization of subthreshold dynamics that is essential to the method presented here.

References

[1] R. B. Stein, “Some models of neuronal variability,” Biophys J, vol. 7, no. 1, pp. 37–68, 1967.
[2] W. Gerstner and W. Kistler, Spiking Neuron Models. New York: Cambridge University Press, 2002.
[3] E. Izhikevich, “Resonate-and-fire neurons,” Neural Networks, vol. 14, pp. 883–894, 2001.
[4] M. J. E. Richardson, N. Brunel, and V. Hakim, “From subthreshold to firing-rate resonance,” Journal of Neurophysiology, vol. 89, pp. 2538–2554, 2003.
[5] E. Izhikevich, “Simple model of spiking neurons,” IEEE Transactions on Neural Networks, vol. 14, pp. 1569–1572, 2003.
[6] S. Mensi, R. Naud, M. Avermann, C. C. H. Petersen, and W. Gerstner, “Parameter extraction and classification of three neuron types reveals two different adaptation mechanisms,” under review.
[7] N. Fourcaud-Trocme, D. Hansel, C. V. Vreeswijk, and N. Brunel, “How spike generation mechanisms determine the neuronal response to fluctuating inputs,” Journal of Neuroscience, vol. 23, no. 37, pp. 11628–11640, 2003.
[8] R. Brette and W. Gerstner, “Adaptive exponential integrate-and-fire model as an effective description of neuronal activity,” Journal of Neurophysiology, vol. 94, pp. 3637–3642, 2005.
[9] L. Badel, W. Gerstner, and M. Richardson, “Dependence of the spike-triggered average voltage on membrane response properties,” Neurocomputing, vol. 69, pp. 1062–1065, 2007.
[10] P. McCullagh and J. A. Nelder, Generalized Linear Models, 2nd ed. Chapman & Hall/CRC, 1998, vol. 37.
[11] W. Gerstner, J. van Hemmen, and J. Cowan, “What matters in neuronal locking?” Neural Computation, vol. 8, pp. 1653–1676, 1996.
[12] D. Hubel and T. Wiesel, “Receptive fields and functional architecture of monkey striate cortex,” Journal of Physiology, vol. 195, pp. 215–243, 1968.
[13] J. Pillow, L. Paninski, V. Uzzell, E. Simoncelli, and E. Chichilnisky, “Prediction and decoding of retinal ganglion cell responses with a probabilistic spiking model,” Journal of Neuroscience, vol. 25, no. 47, pp. 11003–11013, 2005.
[14] K. Doya, S. Ishii, A. Pouget, and R. P. N. Rao, Bayesian Brain: Probabilistic Approaches to Neural Coding. The MIT Press, 2007.
[15] S. Gerwinn, J. H. Macke, M. Seeger, and M. Bethge, “Bayesian inference for spiking neuron models with a sparsity prior,” in Advances in Neural Information Processing Systems, 2007.
[16] H. Plesser and W. Gerstner, “Noise in integrate-and-fire neurons: From stochastic input to escape rates,” Neural Computation, vol. 12, pp. 367–384, 2000.
[17] J. Schemmel, J. Fieres, and K. Meier, “Wafer-scale integration of analog neural networks,” in Proc. IEEE International Joint Conference on Neural Networks (IJCNN 2008, IEEE World Congress on Computational Intelligence), June 2008, pp. 431–438.
[18] R. Jolivet, T. Lewis, and W. Gerstner, “Generalized integrate-and-fire models of neuronal activity approximate spike trains of a detailed model to a high degree of accuracy,” Journal of Neurophysiology, vol. 92, pp. 959–976, 2004.
[19] L. Paninski, “Maximum likelihood estimation of cascade point-process neural encoding models,” Network: Computation in Neural Systems, vol. 15, pp. 243–262, 2004.

3 0.7653259 91 nips-2011-Exploiting spatial overlap to efficiently compute appearance distances between image windows

Author: Bogdan Alexe, Viviana Petrescu, Vittorio Ferrari

Abstract: We present a computationally efficient technique to compute the distance between high-dimensional appearance descriptor vectors of image windows. The method exploits the relation between appearance distance and spatial overlap. We derive an upper bound on appearance distance given the spatial overlap of two windows in an image, and use it to bound the distances of many pairs between two images. We propose algorithms that build on these basic operations to efficiently solve tasks relevant to many computer vision applications, such as finding all pairs of windows between two images with distance smaller than a threshold, or finding the single pair with the smallest distance. In experiments on the PASCAL VOC 07 dataset, our algorithms accurately solve these problems while greatly reducing the number of appearance distances computed, and achieve larger speedups than approximate nearest neighbour algorithms based on trees [18] and on hashing [21]. For example, our algorithm finds the most similar pair of windows between two images while computing only 1% of all distances on average.

4 0.76319712 173 nips-2011-Modelling Genetic Variations using Fragmentation-Coagulation Processes

Author: Yee W. Teh, Charles Blundell, Lloyd Elliott

Abstract: We propose a novel class of Bayesian nonparametric models for sequential data called fragmentation-coagulation processes (FCPs). FCPs model a set of sequences using a partition-valued Markov process which evolves by splitting and merging clusters. An FCP is exchangeable, projective, stationary and reversible, and its equilibrium distributions are given by the Chinese restaurant process. As opposed to hidden Markov models, FCPs allow for flexible modelling of the number of clusters, and they avoid label switching non-identifiability problems. We develop an efficient Gibbs sampler for FCPs which uses uniformization and the forward-backward algorithm. Our development of FCPs is motivated by applications in population genetics, and we demonstrate the utility of FCPs on problems of genotype imputation with phased and unphased SNP data.

5 0.76225144 141 nips-2011-Large-Scale Category Structure Aware Image Categorization

Author: Bin Zhao, Fei Li, Eric P. Xing

Abstract: Most previous research on image categorization has focused on medium-scale data sets, while large-scale image categorization with millions of images from thousands of categories remains a challenge. With the emergence of structured large-scale datasets such as ImageNet, rich information about the conceptual relationships between images, such as a tree hierarchy among various image categories, becomes available. As human cognition of the complex visual world benefits from underlying semantic relationships between object classes, we believe a machine learning system can and should leverage such information as well for better performance. In this paper, we employ such semantic relatedness among image categories for large-scale image categorization. Specifically, a category hierarchy is utilized to properly define the loss function and select a common set of features for related categories. An efficient optimization method based on proximal approximation and an accelerated parallel gradient method is introduced. Experimental results on a subset of ImageNet containing 1.2 million images from 1000 categories demonstrate the effectiveness and promise of our proposed approach.

6 0.7528798 103 nips-2011-Generalization Bounds and Consistency for Latent Structural Probit and Ramp Loss

7 0.72908419 227 nips-2011-Pylon Model for Semantic Segmentation

8 0.72791195 223 nips-2011-Probabilistic Joint Image Segmentation and Labeling

9 0.7278598 188 nips-2011-Non-conjugate Variational Message Passing for Multinomial and Binary Regression

10 0.72167498 197 nips-2011-On Tracking The Partition Function

11 0.71408337 165 nips-2011-Matrix Completion for Multi-label Image Classification

12 0.70859444 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding

13 0.6964612 216 nips-2011-Portmanteau Vocabularies for Multi-Cue Image Representation

14 0.69007325 112 nips-2011-Heavy-tailed Distances for Gradient Based Image Descriptors

15 0.68561739 154 nips-2011-Learning person-object interactions for action recognition in still images

16 0.68064731 43 nips-2011-Bayesian Partitioning of Large-Scale Distance Data

17 0.67384356 168 nips-2011-Maximum Margin Multi-Instance Learning

18 0.67087185 116 nips-2011-Hierarchically Supervised Latent Dirichlet Allocation

19 0.67060989 66 nips-2011-Crowdclustering

20 0.67046523 35 nips-2011-An ideal observer model for identifying the reference frame of objects