nips nips2012 nips2012-120 knowledge-graph by maker-knowledge-mining

120 nips-2012-Exact and Stable Recovery of Sequences of Signals with Sparse Increments via Differential 1-Minimization


Source: pdf

Author: Demba Ba, Behtash Babadi, Patrick Purdon, Emery Brown

Abstract: We consider the problem of recovering a sequence of vectors (xk), k = 0, 1, · · · , K, for which the increments xk − xk−1 are Sk-sparse (with Sk typically smaller than S1), based on linear measurements yk = Ak xk + ek, k = 1, 2, · · · , K, where Ak and ek denote the measurement matrix and noise, respectively. Assuming each Ak obeys the restricted isometry property (RIP) of a certain order—depending only on Sk—we show that in the absence of noise a convex program, which minimizes the weighted sum of the ℓ1-norm of successive differences subject to the linear measurement constraints, recovers the sequence (xk), k = 1, 2, · · · , K, exactly. This is an interesting result because this convex program is equivalent to a standard compressive sensing problem with a highly-structured aggregate measurement matrix which does not satisfy the RIP requirements in the standard sense, and yet we can achieve exact recovery. In the presence of bounded noise, we propose a quadratically-constrained convex program for recovery and derive bounds on the reconstruction error of the sequence. We supplement our theoretical analysis with simulations and an application to real video data. These further support the validity of the proposed approach for acquisition and recovery of signals with time-varying sparsity.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 This is an interesting result because this convex program is equivalent to a standard compressive sensing problem with a highly-structured aggregate measurement matrix which does not satisfy the RIP requirements in the standard sense, and yet we can achieve exact recovery. [sent-9, score-0.416]

2 In the presence of bounded noise, we propose a quadratically-constrained convex program for recovery and derive bounds on the reconstruction error of the sequence. [sent-10, score-0.531]

3 We supplement our theoretical analysis with simulations and an application to real video data. [sent-11, score-0.117]

4 These further support the validity of the proposed approach for acquisition and recovery of signals with time-varying sparsity. [sent-12, score-0.35]

5 1 Introduction In the field of theoretical signal processing, compressive sensing (CS) has arguably been one of the major developments of the past decade. [sent-13, score-0.179]

6 This claim is supported in part by the deluge of research efforts (see for example Rice University’s CS repository [1]) which has followed the inception of this field [2, 3, 4]. [sent-14, score-0.031]

7 CS considers the problem of acquiring and recovering signals that are sparse (or compressible) in a given basis using non-adaptive linear measurements, at a rate smaller than what the Shannon-Nyquist theorem would require. [sent-15, score-0.31]

8 The work [2, 4] derived conditions under which a sparse signal can be recovered exactly from a small set of non-adaptive linear measurements. [sent-16, score-0.125]
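For reference, the standard noiseless CS recovery described in sentence 8 can be written as a basis-pursuit (ℓ1-minimization) problem. The sketch below is illustrative only; the dimensions, the Gaussian measurement matrix, and the use of the cvxpy package are assumptions for this example, not details from the paper.

```python
# Minimal sketch of standard noiseless CS: recover an S-sparse signal from
# m << p non-adaptive Gaussian measurements by l1-minimization (basis pursuit).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
p, m, S = 200, 80, 8                           # illustrative sizes, not from the paper

x_true = np.zeros(p)
support = rng.choice(p, S, replace=False)
x_true[support] = rng.standard_normal(S)

A = rng.standard_normal((m, p)) / np.sqrt(m)   # random Gaussian measurement matrix
y = A @ x_true                                 # noiseless measurements

x = cp.Variable(p)
cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == y]).solve()
print("recovery error:", np.linalg.norm(x.value - x_true))
```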

9 In [3], the authors propose a recovery algorithm for the case of measurements contaminated by bounded noise. [sent-17, score-0.387]

10 They show that this algorithm is stable, that is, its reconstruction error is within a constant of the noise tolerance. [sent-18, score-0.061]

11 Recovery of these sparse or compressible signals is performed using convex optimization techniques. [sent-19, score-0.327]

12 temporal or spatial, of the underlying high-dimensional sparse signals of interest. [sent-22, score-0.201]

13 In recent years, the attention has shifted to formulations which incorporate the signal structure into the CS framework. [sent-23, score-0.029]

14 A number of problems and applications of interest deal with time-varying signals which may not only be sparse at any given instant, but may also exhibit sparse changes from one instant to the next. [sent-24, score-0.311]

15 For example, a video of a natural scene consists of a sequence of natural images (compressible signals) which exhibits sparse changes from one frame to the next. [sent-25, score-0.182]

16 It is thus reasonable to hope that one would be able to get away with far fewer measurements than prescribed by conventional CS theory to acquire and recover such time-varying signals as videos. [sent-26, score-0.415]

17 The problem of recovering signals with time-varying sparsity has been referred to in the literature as dynamic CS. [sent-27, score-0.264]

18 A number of empirically-motivated algorithms to solve the dynamic CS problem have been proposed, e.g. [sent-28, score-0.069]

19 To our knowledge, no recovery guarantees have been proved for these algorithms, which typically assume that the support of the signal and/or the amplitudes of the coefficients change smoothly with time. [sent-31, score-0.269]

20 In [5], for instance, the authors propose message-passing algorithms for tracking and smoothing of signals with time-varying sparsity. [sent-32, score-0.136]

21 Simulation results show the superiority of the algorithms compared to one based on applying conventional CS principles at each time instant. [sent-33, score-0.055]

22 Dynamic CS algorithms have potential applications to video processing [7], estimation of sources of brain activity from MEG time-series [8], medical imaging [7], and estimation of time-varying networks [9]. [sent-34, score-0.054]

23 To the best of our knowledge, the dynamic CS problem has not received rigorous, theoretical scrutiny. [sent-35, score-0.095]

24 In this paper, we develop rigorous results for dynamic CS both in the absence and in the presence of noise. [sent-36, score-0.264]

25 In the presence of noise, we derive error bounds for a quadratically-constrained convex program for recovery of the sequence (xk), k = 1, · · · , K. [sent-38, score-0.569]

26 In Section 3, we present our main theoretical results, which we supplement with simulated experiments and an application to real video data in Section 4. [sent-40, score-0.117]

27 In this latter section, we introduce probability-of-recovery surfaces for the dynamic CS problem, which generalize the traditional recovery curves of CS. [sent-41, score-0.283]

28 We say that a vector x ∈ Rp is S-sparse if ||x||0 ≤ S, where ||x||0 := |supp(x)|. [sent-44, score-0.025]

29 We consider the problem of recovering a sequence (xk), k = 0, 1, · · · , K, of Rp vectors such that xk − xk−1 is Sk-sparse, based on linear measurements of the form yk = Ak xk + ek. [sent-45, score-1.51]

30 Here, Ak ∈ Rnk ×p , ek ∈ Rnk and yk ∈ Rnk denote the measurement matrix, measurement noise, and the observation vector, respectively. [sent-46, score-0.458]

31 Typically, Sk < nk ≪ p, which accounts for the compressive nature of the measurements. [sent-47, score-0.147]
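To make the measurement model concrete, here is a hypothetical simulation of a sequence with sparse increments and its compressive measurements. All sizes, distributions, and the reuse of the first nk rows of A1 for A2 = · · · = AK (so that A1 ⊃ A2, as assumed later in the theorems) are illustrative choices, not values from the paper.

```python
import numpy as np

def generate_sequence(p=200, K=5, S1=10, Sk=2, n1=80, nk=30, sigma=0.0, seed=0):
    """Simulate x_1,...,x_K with S_k-sparse increments (x_0 = 0) and
    measurements y_k = A_k x_k + e_k. Sizes and distributions are illustrative."""
    rng = np.random.default_rng(seed)
    # A1 has n1 rows; A2 = ... = AK reuse the first nk rows of A1 (so A1 contains A2),
    # mirroring the measurement-matrix assumption used in the paper's theorems.
    A1 = rng.standard_normal((n1, p)) / np.sqrt(n1)
    A = [A1] + [A1[:nk, :]] * (K - 1)

    x_prev = np.zeros(p)                          # convention x_0 = 0
    xs, ys = [], []
    for k in range(1, K + 1):
        S = S1 if k == 1 else Sk                  # first increment larger, later ones small
        idx = rng.choice(p, S, replace=False)
        inc = np.zeros(p)
        inc[idx] = rng.standard_normal(S)
        x_k = x_prev + inc                        # x_k - x_{k-1} is S_k-sparse
        e_k = sigma * rng.standard_normal(A[k - 1].shape[0])
        xs.append(x_k)
        ys.append(A[k - 1] @ x_k + e_k)
        x_prev = x_k
    return A, xs, ys

A, xs, ys = generate_sequence()
```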

32 We will be dealing with sequences (of sets, matrices, vectors); as such, we let the index k denote the k-th element of any such sequence. [sent-50, score-0.1]

33 For two matrices A1 ∈ Rn1 ×p and A2 ∈ Rn2 ×p , n2 ≤ n1 , we say that A2 ⊂ A1 if the rows of A2 are distinct and each row of A2 coincides with a row of A1 . [sent-53, score-0.025]

34 We say that the matrix A ∈ Rn×p satisfies the restricted isometry property (RIP) of order S if, for all S-sparse x ∈ Rp, we have (1 − δS) ||x||_2^2 ≤ ||Ax||_2^2 ≤ (1 + δS) ||x||_2^2, (1) where δS ∈ (0, 1) is the smallest constant for which Equation 1 is satisfied [2]. [sent-54, score-0.074]
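Computing δS exactly is combinatorial, but a Monte Carlo lower bound on it is easy to sketch and is often used as a sanity check. This snippet is illustrative only and is not a procedure from the paper.

```python
# Monte Carlo *lower bound* on the RIP constant delta_S of a matrix A, obtained
# from the extreme singular values of random S-column submatrices.
import numpy as np

def rip_lower_bound(A, S, trials=2000, seed=0):
    rng = np.random.default_rng(seed)
    n, p = A.shape
    delta = 0.0
    for _ in range(trials):
        cols = rng.choice(p, S, replace=False)
        s = np.linalg.svd(A[:, cols], compute_uv=False)
        # RIP of order S requires sigma_min^2 and sigma_max^2 of every such
        # submatrix to lie in [1 - delta_S, 1 + delta_S].
        delta = max(delta, abs(s[0] ** 2 - 1), abs(s[-1] ** 2 - 1))
    return delta

A_demo = np.random.default_rng(1).standard_normal((80, 200)) / np.sqrt(80)
print("estimated lower bound on delta_8:", rip_lower_bound(A_demo, 8))
```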

35 Consider the following convex optimization programs: minimize the weighted sum Σ_{k=1}^K ||xk − xk−1||_1/√Sk over x1, x2, · · · , xK, subject to [sent-55, score-0.511]

36 yk = Ak xk for (P1), or ||yk − Ak xk||_2^2 ≤ ǫk for (P2), for k = 1, 2, · · · , K. [sent-59, score-0.186]
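As a rough illustration of how the noiseless program (P1) could be posed numerically, here is a hedged sketch using the cvxpy modeling package; the weights 1/√Sk are assumed known, and the function and variable names are invented for this example.

```python
# Sketch of (P1): minimize the weighted sum of l1-norms of successive
# differences subject to the noiseless measurement constraints y_k = A_k x_k.
import numpy as np
import cvxpy as cp

def solve_P1(A, y, S):
    """A, y: lists of length K; S: list of sparsity levels S_1,...,S_K."""
    p = A[0].shape[1]
    K = len(A)
    x = [cp.Variable(p) for _ in range(K)]
    x_prev = np.zeros(p)                       # convention x_0 = 0
    objective, constraints = 0, []
    for k in range(K):
        objective += cp.norm1(x[k] - x_prev) / np.sqrt(S[k])
        constraints.append(A[k] @ x[k] == y[k])
        x_prev = x[k]
    cp.Problem(cp.Minimize(objective), constraints).solve()
    return [xk.value for xk in x]
```

Given measurement matrices A, observations y, and sparsity levels S (for instance from a simulation like the one sketched earlier), this would return estimates of x1, · · · , xK.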

37 What theoretical guarantees can we provide on the performance of the above programs for recovery of sequences of signals with sparse increments, respectively in the absence (P1) and in the presence (P2) of noise? [sent-61, score-0.691]

38 3 Theoretical Results We first present a lemma giving sufficient conditions for the uniqueness of sequences of vectors with sparse increments given linear measurements in the absence of noise. [sent-62, score-0.811]

39 Then, we prove a theorem which shows that, by strengthening the conditions of this lemma, program (P1) can exactly recover every sequence of vectors with sparse increments. [sent-63, score-0.449]

40 Finally, we derive error bounds for program (P2) in the context of recovery of sequences of vectors with sparse increments in the presence of noise. [sent-64, score-0.889]

41 Let xk ∈ Rp supported on Tk ⊆ J be such that ||xk − xk−1||_0 ≤ Sk, for k = 1, 2, · · · , K. [sent-68, score-0.428]

42 Then, given Ak and yk = Ak xk , the sequence of sets (Tk )K , and consequently the sequence of coefficients (xk )K , can be reconstructed uniquely. [sent-74, score-0.74]

43 We prove that there is a unique choice of x1 and x2 such that ||x1 − x0 ||0 ≤ S1 , ||x2 − x1 ||0 ≤ S2 and obeying y1 = A1 x1 , y2 = A2 x2 . [sent-81, score-0.029]

44 We proceed by contradiction, and assume that there exist x1′ ≠ x1 and x2′ ≠ x2, supported on T1′ and T2′, respectively, such that y1 = A1 x1 = A1 x1′, y2 = A2 x2 = A2 x2′, ||x1′ − x0||_0 ≤ S1, and ||x2′ − x1′||_0 ≤ S2. [sent-82, score-0.022]

45 x1 = x1′, thus contradicting our assumption that x1′ ≠ x1. [sent-86, score-0.029]

46 As in Candès and Tao’s work [2], this lemma only suggests what may be possible in terms of recovery of (xk), k = 1, · · · , K, through a combinatorial, brute-force approach. [sent-92, score-0.227]

47 By imposing stricter conditions on the constants (δ2Sk), we can recover (xk), k = 1, · · · , K, by solving a convex program. [sent-93, score-0.146]

48 Let (x̄k), k = 1, · · · , K, be a sequence of Rp vectors such that, for each k, ||x̄k − x̄k−1||_0 ≤ Sk for some Sk < p/2. [sent-96, score-0.025]

49 Suppose that the measurements yk = Ak x̄k ∈ Rnk are given, such that nk < p, A1 ⊃ A2, Ak = A2 for k = 3, · · · , K, and (Ak) satisfies δSk + δ2Sk + δ3Sk < 1 for k = 1, 2, · · · , K. [sent-97, score-0.861]

50 Then, the sequence (x̄k), k = 1, · · · , K, is the unique minimizer to the program (P1). [sent-98, score-0.203]

51 We can re-write the program as follows: min over x1, x2 of ||x1||_1/√S1 + ||x2 − x1||_1/√S2, s.t. y1 = A1 x1, y2 = A2 x2. [sent-102, score-0.14]

52 Key element of the proof: The key element of the proof is the existence of vectors u1 , u2 satisfying the exact reconstruction property (ERP) [10, 11]. [sent-108, score-0.175]

53 Using the lower bounds in the RIP of A1 and A2 leads to 0 = ||A1(x∗1 − x̄1)||_2^2 ≥ (1 − δ2S1) ||x∗1 − x̄1||_2^2 (6) and 0 = ||A2(x∗2 − x∗1 − (x̄2 − x̄1))||_2^2 ≥ (1 − δ2S2) ||x∗2 − x∗1 − (x̄2 − x̄1)||_2^2 (7), so that x∗1 = x̄1 and x∗2 = x̄2. [sent-118, score-0.031]
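For readability, here is the same RIP lower-bound step rendered in LaTeX, as reconstructed from the surrounding argument (x∗k denotes the minimizer and x̄k the true sequence; the pairing of the labels (6) and (7) with the two inequalities is inferred, not confirmed by the source):

```latex
\begin{align*}
0 &= \|A_1(x_1^* - \bar{x}_1)\|_2^2 \ \ge\ (1-\delta_{2S_1})\,\|x_1^* - \bar{x}_1\|_2^2, \tag{6}\\
0 &= \|A_2\bigl(x_2^* - x_1^* - (\bar{x}_2 - \bar{x}_1)\bigr)\|_2^2 \ \ge\ (1-\delta_{2S_2})\,\|x_2^* - x_1^* - (\bar{x}_2 - \bar{x}_1)\|_2^2, \tag{7}
\end{align*}
```

from which $x_1^* = \bar{x}_1$ and $x_2^* = \bar{x}_2$ follow.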

54 First, Theorem 2 effectively asserts that the program (P1) is equivalent to sequentially solving (i.e. [sent-121, score-0.165]

55 for k = 1, 2, · · · , K) the following program, starting with x∗0 the vector of all zeros in Rp: min over xk of ||xk − x∗k−1||_1, s.t. [sent-123, score-0.145]

56 yk − Ak x∗k−1 = Ak (xk − x∗k−1), k = 1, 2, · · · , K. (8) [sent-125, score-0.186]
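A minimal sketch of this sequential reading of (P1): at each step, a standard ℓ1 problem is solved for the increment relative to the previous estimate. The code assumes the noiseless model and uses cvxpy; it is illustrative, not the authors' implementation.

```python
# Sequential form (8): at step k, solve an l1 problem for the increment
# w = x_k - x*_{k-1} against the residual measurements y_k - A_k x*_{k-1}.
import numpy as np
import cvxpy as cp

def solve_sequential(A, y):
    p = A[0].shape[1]
    x_prev = np.zeros(p)                       # x*_0 = 0
    estimates = []
    for Ak, yk in zip(A, y):
        w = cp.Variable(p)                     # increment x_k - x*_{k-1}
        residual = yk - Ak @ x_prev            # y_k - A_k x*_{k-1}
        cp.Problem(cp.Minimize(cp.norm1(w)), [Ak @ w == residual]).solve()
        x_prev = x_prev + w.value
        estimates.append(x_prev.copy())
    return estimates
```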

57 Second, it is interesting and surprising that Theorem 2 holds at all, given what one would expect from naively applying standard CS principles to our problem. [sent-126, score-0.05]

58 To see this, if we let wk = xk − xk−1, then program (P1) becomes: min over w1, · · · , wK of Σ_{k=1}^K ||wk||_1/√Sk, s.t. [sent-127, score-0.623]

59 y = Aw, where y ∈ R^(n1+···+nK) is the concatenation of y1, · · · , yK and w that of w1, · · · , wK, (9) and A is given by the block lower-triangular matrix whose k-th block row is [Ak Ak · · · Ak 0 · · · 0] (k copies of Ak), its last block row being [AK AK · · · AK]. [sent-141, score-0.055]
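The following numpy sketch builds this aggregate matrix for a small illustrative example and measures its column coherence, which grows with K as described next; all sizes are arbitrary placeholders.

```python
# Build the aggregate system matrix A of (9): since x_k = w_1 + ... + w_k,
# the k-th block row repeats A_k k times followed by zero blocks.
import numpy as np

def aggregate_matrix(A_list):
    K = len(A_list)
    p = A_list[0].shape[1]
    rows = []
    for k, Ak in enumerate(A_list, start=1):
        blocks = [Ak] * k + [np.zeros((Ak.shape[0], p))] * (K - k)
        rows.append(np.hstack(blocks))
    return np.vstack(rows)

rng = np.random.default_rng(0)
A_list = [rng.standard_normal((20, 50)) / np.sqrt(20) for _ in range(4)]
A_big = aggregate_matrix(A_list)

# Normalized column coherence: repeated blocks make some columns nearly parallel.
G = A_big / np.linalg.norm(A_big, axis=0)
coherence = np.max(np.abs(G.T @ G) - np.eye(G.shape[1]))
print(A_big.shape, "max column coherence:", round(float(coherence), 3))
```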

60 As K grows large, the columns of A become increasingly correlated or coherent, which intuitively means that A would be far from satisfying RIP of any order. [sent-142, score-0.02]

61 This is an important reminder that the RIP is a sufficient, but not necessary condition for recovery. [sent-144, score-0.029]

62 Third, the assumption that A1 ⊃ A2 , Ak = A2 for k = 3, · · · , K makes practical sense as it allows one to avoid the prohibitive storage and computational cost of generating several distinct measurement matrices. [sent-145, score-0.072]

63 Lastly, the key advantage of dynamic CS recovery (P1) is the smaller number of measurements required compared to the classical approach [2] which would solve K separate ℓ1 -minimization problems. [sent-147, score-0.427]

64 For each k = 1, · · · , K, one would require nk ≥ CSk log(p/Sk ) measurements for dynamic recovery, compared to nk ≥ CS1 log(p/S1 ) for classical recovery. [sent-148, score-0.397]
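A back-of-the-envelope comparison of these two measurement budgets is sketched below; the constant C and the problem sizes are arbitrary placeholders, not numbers from the paper.

```python
# Illustrative comparison of total measurements implied by the bounds above.
import numpy as np

C, p, K = 1.0, 10000, 50
S1, Sk = 200, 5          # large initial support, small increments afterwards

dynamic = C * S1 * np.log(p / S1) + (K - 1) * C * Sk * np.log(p / Sk)
classical = K * C * S1 * np.log(p / S1)
print(f"dynamic: ~{dynamic:.0f} measurements, classical: ~{classical:.0f}")
```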

65 , the sparse increments are small, we conclude that there are less number of measurements required for dynamic CS. [sent-151, score-0.537]

66 We now move to the case where the measurements are perturbed by bounded noise. [sent-152, score-0.166]

67 More specifically, we derive error bounds for a quadratically-constrained convex program for recovery of sequences of vectors with sparse increments in the presence of noise. [sent-153, score-0.936]

68 Let (x̄k) ∈ Rp, k = 1, · · · , K, be as stated in Theorem 2, and x0 be the vector of all zeros in Rp. [sent-155, score-0.032]

69 Suppose that the measurements yk = Ak x̄k + ek ∈ Rnk are given such that ||ek||_2 ≤ ǫk and (Ak), k = 1, · · · , K, satisfy δ3Sk + 3δ4Sk < 2, for each k. [sent-156, score-0.934]

70 Finally, let hk := k k=1 (x∗ − x∗ ) − (¯k − xk−1 ), for k = 1, 2, · · · , K, with the convention that x0 := x∗ := 0 ∈ Rp . [sent-158, score-0.075]
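Finally, a hedged sketch of how the noisy program (P2) could be posed with cvxpy; the noise bounds ǫk are assumed known, and the naming is invented for this example.

```python
# Sketch of (P2): the equality constraints of (P1) are relaxed to quadratic
# constraints ||y_k - A_k x_k||_2^2 <= eps_k, matching the program stated above.
import numpy as np
import cvxpy as cp

def solve_P2(A, y, S, eps):
    p = A[0].shape[1]
    K = len(A)
    x = [cp.Variable(p) for _ in range(K)]
    x_prev = np.zeros(p)                       # convention x_0 = 0
    objective, constraints = 0, []
    for k in range(K):
        objective += cp.norm1(x[k] - x_prev) / np.sqrt(S[k])
        constraints.append(cp.sum_squares(y[k] - A[k] @ x[k]) <= eps[k])
        x_prev = x[k]
    cp.Problem(cp.Minimize(objective), constraints).solve()
    return [xk.value for xk in x]
```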


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('xk', 0.428), ('ak', 0.406), ('rip', 0.279), ('increments', 0.237), ('sk', 0.235), ('cs', 0.227), ('recovery', 0.192), ('yk', 0.186), ('rp', 0.182), ('rnk', 0.167), ('measurements', 0.166), ('program', 0.14), ('signals', 0.136), ('ek', 0.128), ('sgn', 0.092), ('absence', 0.088), ('supp', 0.085), ('nk', 0.081), ('compressible', 0.079), ('sequences', 0.076), ('csk', 0.076), ('demba', 0.076), ('measurement', 0.072), ('presence', 0.072), ('dynamic', 0.069), ('compressive', 0.066), ('sparse', 0.065), ('sequence', 0.063), ('noise', 0.061), ('uniqueness', 0.061), ('recovering', 0.059), ('wk', 0.055), ('video', 0.054), ('hk', 0.052), ('vectors', 0.052), ('isometry', 0.049), ('convex', 0.047), ('instant', 0.045), ('remarks', 0.039), ('recover', 0.037), ('supplement', 0.037), ('programs', 0.036), ('rigorous', 0.035), ('successive', 0.035), ('sensing', 0.035), ('lemma', 0.035), ('tk', 0.034), ('erp', 0.033), ('bcs', 0.033), ('strengthening', 0.033), ('stable', 0.032), ('zeros', 0.032), ('stricter', 0.031), ('emery', 0.031), ('fruit', 0.031), ('inception', 0.031), ('suppose', 0.031), ('bounds', 0.031), ('conditions', 0.031), ('exact', 0.03), ('signal', 0.029), ('specifically', 0.029), ('pain', 0.029), ('contaminated', 0.029), ('obeying', 0.029), ('contradicting', 0.029), ('reminder', 0.029), ('akj', 0.029), ('principles', 0.028), ('theorem', 0.028), ('prescribed', 0.027), ('aw', 0.027), ('amplitudes', 0.027), ('conventional', 0.027), ('satis', 0.026), ('theoretical', 0.026), ('meg', 0.026), ('satisfy', 0.026), ('asserts', 0.025), ('obeys', 0.025), ('say', 0.025), ('reconstruction', 0.025), ('derive', 0.024), ('element', 0.024), ('rice', 0.024), ('patrick', 0.024), ('arguably', 0.023), ('boston', 0.023), ('convention', 0.023), ('validity', 0.022), ('acquiring', 0.022), ('naively', 0.022), ('surfaces', 0.022), ('contradiction', 0.022), ('acquire', 0.022), ('regularity', 0.022), ('tao', 0.022), ('smoothly', 0.021), ('concluding', 0.021), ('satisfying', 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 120 nips-2012-Exact and Stable Recovery of Sequences of Signals with Sparse Increments via Differential 1-Minimization

Author: Demba Ba, Behtash Babadi, Patrick Purdon, Emery Brown

Abstract: We consider the problem of recovering a sequence of vectors (xk), k = 0, 1, · · · , K, for which the increments xk − xk−1 are Sk-sparse (with Sk typically smaller than S1), based on linear measurements yk = Ak xk + ek, k = 1, 2, · · · , K, where Ak and ek denote the measurement matrix and noise, respectively. Assuming each Ak obeys the restricted isometry property (RIP) of a certain order—depending only on Sk—we show that in the absence of noise a convex program, which minimizes the weighted sum of the ℓ1-norm of successive differences subject to the linear measurement constraints, recovers the sequence (xk), k = 1, 2, · · · , K, exactly. This is an interesting result because this convex program is equivalent to a standard compressive sensing problem with a highly-structured aggregate measurement matrix which does not satisfy the RIP requirements in the standard sense, and yet we can achieve exact recovery. In the presence of bounded noise, we propose a quadratically-constrained convex program for recovery and derive bounds on the reconstruction error of the sequence. We supplement our theoretical analysis with simulations and an application to real video data. These further support the validity of the proposed approach for acquisition and recovery of signals with time-varying sparsity.

2 0.37017182 300 nips-2012-Scalable nonconvex inexact proximal splitting

Author: Suvrit Sra

Abstract: We study a class of large-scale, nonsmooth, and nonconvex optimization problems. In particular, we focus on nonconvex problems with composite objectives. This class includes the extensively studied class of convex composite objective problems as a subclass. To solve composite nonconvex problems we introduce a powerful new framework based on asymptotically nonvanishing errors, avoiding the common stronger assumption of vanishing errors. Within our new framework we derive both batch and incremental proximal splitting algorithms. To our knowledge, our work is first to develop and analyze incremental nonconvex proximalsplitting algorithms, even if we were to disregard the ability to handle nonvanishing errors. We illustrate one instance of our general framework by showing an application to large-scale nonsmooth matrix factorization. 1

3 0.23606952 282 nips-2012-Proximal Newton-type methods for convex optimization

Author: Jason Lee, Yuekai Sun, Michael Saunders

Abstract: We seek to solve convex optimization problems in composite form: minimize f (x) := g(x) + h(x), n x∈R where g is convex and continuously differentiable and h : Rn → R is a convex but not necessarily differentiable function whose proximal mapping can be evaluated efficiently. We derive a generalization of Newton-type methods to handle such convex but nonsmooth objective functions. We prove such methods are globally convergent and achieve superlinear rates of convergence in the vicinity of an optimal solution. We also demonstrate the performance of these methods using problems of relevance in machine learning and statistics. 1

4 0.22844251 27 nips-2012-A quasi-Newton proximal splitting method

Author: Stephen Becker, Jalal Fadili

Abstract: A new result in convex analysis on the calculation of proximity operators in certain scaled norms is derived. We describe efficient implementations of the proximity calculation for a useful class of functions; the implementations exploit the piece-wise linear nature of the dual problem. The second part of the paper applies the previous result to acceleration of convex minimization problems, and leads to an elegant quasi-Newton method. The optimization method compares favorably against state-of-the-art alternatives. The algorithm has extensive applications including signal processing, sparse recovery and machine learning and classification. 1

5 0.17958996 79 nips-2012-Compressive neural representation of sparse, high-dimensional probabilities

Author: Xaq Pitkow

Abstract: This paper shows how sparse, high-dimensional probability distributions could be represented by neurons with exponential compression. The representation is a novel application of compressive sensing to sparse probability distributions rather than to the usual sparse signals. The compressive measurements correspond to expected values of nonlinear functions of the probabilistically distributed variables. When these expected values are estimated by sampling, the quality of the compressed representation is limited only by the quality of sampling. Since the compression preserves the geometric structure of the space of sparse probability distributions, probabilistic computation can be performed in the compressed domain. Interestingly, functions satisfying the requirements of compressive sensing can be implemented as simple perceptrons. If we use perceptrons as a simple model of feedforward computation by neurons, these results show that the mean activity of a relatively small number of neurons can accurately represent a highdimensional joint distribution implicitly, even without accounting for any noise correlations. This comprises a novel hypothesis for how neurons could encode probabilities in the brain. 1

6 0.16168031 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models

7 0.15901363 290 nips-2012-Recovery of Sparse Probability Measures via Convex Programming

8 0.13256101 106 nips-2012-Dynamical And-Or Graph Learning for Object Shape Modeling and Detection

9 0.12832078 34 nips-2012-Active Learning of Multi-Index Function Models

10 0.12453844 63 nips-2012-CPRL -- An Extension of Compressive Sensing to the Phase Retrieval Problem

11 0.12104792 340 nips-2012-The representer theorem for Hilbert spaces: a necessary and sufficient condition

12 0.12050375 184 nips-2012-Learning Probability Measures with respect to Optimal Transport Metrics

13 0.10308515 304 nips-2012-Selecting Diverse Features via Spectral Regularization

14 0.098220907 60 nips-2012-Bayesian nonparametric models for ranked data

15 0.097593307 261 nips-2012-Online allocation and homogeneous partitioning for piecewise constant mean-approximation

16 0.092614233 13 nips-2012-A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function

17 0.090233289 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model

18 0.089052461 36 nips-2012-Adaptive Stratified Sampling for Monte-Carlo integration of Differentiable functions

19 0.087656438 351 nips-2012-Transelliptical Component Analysis

20 0.085901693 142 nips-2012-Generalization Bounds for Domain Adaptation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.176), (1, 0.049), (2, 0.15), (3, -0.088), (4, 0.114), (5, 0.165), (6, 0.04), (7, -0.142), (8, 0.298), (9, 0.091), (10, -0.151), (11, 0.134), (12, 0.104), (13, -0.145), (14, -0.159), (15, -0.108), (16, -0.007), (17, 0.102), (18, -0.016), (19, -0.027), (20, -0.151), (21, 0.085), (22, -0.095), (23, 0.008), (24, 0.097), (25, 0.05), (26, -0.056), (27, 0.083), (28, 0.042), (29, -0.008), (30, -0.003), (31, 0.029), (32, 0.001), (33, 0.034), (34, -0.105), (35, -0.018), (36, 0.001), (37, 0.038), (38, 0.016), (39, -0.042), (40, -0.021), (41, 0.041), (42, -0.032), (43, -0.016), (44, -0.046), (45, 0.043), (46, 0.07), (47, -0.022), (48, 0.062), (49, 0.014)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98235297 120 nips-2012-Exact and Stable Recovery of Sequences of Signals with Sparse Increments via Differential 1-Minimization

Author: Demba Ba, Behtash Babadi, Patrick Purdon, Emery Brown

Abstract: We consider the problem of recovering a sequence of vectors (xk), k = 0, 1, · · · , K, for which the increments xk − xk−1 are Sk-sparse (with Sk typically smaller than S1), based on linear measurements yk = Ak xk + ek, k = 1, 2, · · · , K, where Ak and ek denote the measurement matrix and noise, respectively. Assuming each Ak obeys the restricted isometry property (RIP) of a certain order—depending only on Sk—we show that in the absence of noise a convex program, which minimizes the weighted sum of the ℓ1-norm of successive differences subject to the linear measurement constraints, recovers the sequence (xk), k = 1, 2, · · · , K, exactly. This is an interesting result because this convex program is equivalent to a standard compressive sensing problem with a highly-structured aggregate measurement matrix which does not satisfy the RIP requirements in the standard sense, and yet we can achieve exact recovery. In the presence of bounded noise, we propose a quadratically-constrained convex program for recovery and derive bounds on the reconstruction error of the sequence. We supplement our theoretical analysis with simulations and an application to real video data. These further support the validity of the proposed approach for acquisition and recovery of signals with time-varying sparsity.

2 0.835356 300 nips-2012-Scalable nonconvex inexact proximal splitting

Author: Suvrit Sra

Abstract: We study a class of large-scale, nonsmooth, and nonconvex optimization problems. In particular, we focus on nonconvex problems with composite objectives. This class includes the extensively studied class of convex composite objective problems as a subclass. To solve composite nonconvex problems we introduce a powerful new framework based on asymptotically nonvanishing errors, avoiding the common stronger assumption of vanishing errors. Within our new framework we derive both batch and incremental proximal splitting algorithms. To our knowledge, our work is first to develop and analyze incremental nonconvex proximalsplitting algorithms, even if we were to disregard the ability to handle nonvanishing errors. We illustrate one instance of our general framework by showing an application to large-scale nonsmooth matrix factorization. 1

3 0.72401059 27 nips-2012-A quasi-Newton proximal splitting method

Author: Stephen Becker, Jalal Fadili

Abstract: A new result in convex analysis on the calculation of proximity operators in certain scaled norms is derived. We describe efficient implementations of the proximity calculation for a useful class of functions; the implementations exploit the piece-wise linear nature of the dual problem. The second part of the paper applies the previous result to acceleration of convex minimization problems, and leads to an elegant quasi-Newton method. The optimization method compares favorably against state-of-the-art alternatives. The algorithm has extensive applications including signal processing, sparse recovery and machine learning and classification. 1

4 0.72050995 282 nips-2012-Proximal Newton-type methods for convex optimization

Author: Jason Lee, Yuekai Sun, Michael Saunders

Abstract: We seek to solve convex optimization problems in composite form: minimize f (x) := g(x) + h(x), n x∈R where g is convex and continuously differentiable and h : Rn → R is a convex but not necessarily differentiable function whose proximal mapping can be evaluated efficiently. We derive a generalization of Newton-type methods to handle such convex but nonsmooth objective functions. We prove such methods are globally convergent and achieve superlinear rates of convergence in the vicinity of an optimal solution. We also demonstrate the performance of these methods using problems of relevance in machine learning and statistics. 1

5 0.56947821 340 nips-2012-The representer theorem for Hilbert spaces: a necessary and sufficient condition

Author: Francesco Dinuzzo, Bernhard Schölkopf

Abstract: The representer theorem is a property that lies at the foundation of regularization theory and kernel methods. A class of regularization functionals is said to admit a linear representer theorem if every member of the class admits minimizers that lie in the finite dimensional subspace spanned by the representers of the data. A recent characterization states that certain classes of regularization functionals with differentiable regularization term admit a linear representer theorem for any choice of the data if and only if the regularization term is a radial nondecreasing function. In this paper, we extend such result by weakening the assumptions on the regularization term. In particular, the main result of this paper implies that, for a sufficiently large family of regularization functionals, radial nondecreasing functions are the only lower semicontinuous regularization terms that guarantee existence of a representer theorem for any choice of the data. 1

6 0.54146159 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models

7 0.49517915 63 nips-2012-CPRL -- An Extension of Compressive Sensing to the Phase Retrieval Problem

8 0.4657186 20 nips-2012-A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets

9 0.46265608 34 nips-2012-Active Learning of Multi-Index Function Models

10 0.45794779 290 nips-2012-Recovery of Sparse Probability Measures via Convex Programming

11 0.45392117 79 nips-2012-Compressive neural representation of sparse, high-dimensional probabilities

12 0.41311491 184 nips-2012-Learning Probability Measures with respect to Optimal Transport Metrics

13 0.40601367 36 nips-2012-Adaptive Stratified Sampling for Monte-Carlo integration of Differentiable functions

14 0.4005194 85 nips-2012-Convergence and Energy Landscape for Cheeger Cut Clustering

15 0.39410529 35 nips-2012-Adaptive Learning of Smoothing Functions: Application to Electricity Load Forecasting

16 0.38769919 179 nips-2012-Learning Manifolds with K-Means and K-Flats

17 0.36343694 78 nips-2012-Compressive Sensing MRI with Wavelet Tree Sparsity

18 0.3625637 302 nips-2012-Scaling MPE Inference for Constrained Continuous Markov Random Fields with Consensus Optimization

19 0.35503998 106 nips-2012-Dynamical And-Or Graph Learning for Object Shape Modeling and Detection

20 0.34165174 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.024), (17, 0.024), (21, 0.085), (36, 0.015), (38, 0.202), (39, 0.015), (42, 0.033), (54, 0.034), (55, 0.014), (69, 0.093), (74, 0.013), (76, 0.128), (80, 0.094), (92, 0.053), (94, 0.064)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94496524 120 nips-2012-Exact and Stable Recovery of Sequences of Signals with Sparse Increments via Differential 1-Minimization

Author: Demba Ba, Behtash Babadi, Patrick Purdon, Emery Brown

Abstract: We consider the problem of recovering a sequence of vectors (xk), k = 0, 1, · · · , K, for which the increments xk − xk−1 are Sk-sparse (with Sk typically smaller than S1), based on linear measurements yk = Ak xk + ek, k = 1, 2, · · · , K, where Ak and ek denote the measurement matrix and noise, respectively. Assuming each Ak obeys the restricted isometry property (RIP) of a certain order—depending only on Sk—we show that in the absence of noise a convex program, which minimizes the weighted sum of the ℓ1-norm of successive differences subject to the linear measurement constraints, recovers the sequence (xk), k = 1, 2, · · · , K, exactly. This is an interesting result because this convex program is equivalent to a standard compressive sensing problem with a highly-structured aggregate measurement matrix which does not satisfy the RIP requirements in the standard sense, and yet we can achieve exact recovery. In the presence of bounded noise, we propose a quadratically-constrained convex program for recovery and derive bounds on the reconstruction error of the sequence. We supplement our theoretical analysis with simulations and an application to real video data. These further support the validity of the proposed approach for acquisition and recovery of signals with time-varying sparsity.

2 0.90102243 302 nips-2012-Scaling MPE Inference for Constrained Continuous Markov Random Fields with Consensus Optimization

Author: Stephen Bach, Matthias Broecheler, Lise Getoor, Dianne O'leary

Abstract: Probabilistic graphical models are powerful tools for analyzing constrained, continuous domains. However, finding most-probable explanations (MPEs) in these models can be computationally expensive. In this paper, we improve the scalability of MPE inference in a class of graphical models with piecewise-linear and piecewise-quadratic dependencies and linear constraints over continuous domains. We derive algorithms based on a consensus-optimization framework and demonstrate their superior performance over state of the art. We show empirically that in a large-scale voter-preference modeling problem our algorithms scale linearly in the number of dependencies and constraints. 1

3 0.89473325 114 nips-2012-Efficient coding provides a direct link between prior and likelihood in perceptual Bayesian inference

Author: Xue-xin Wei, Alan Stocker

Abstract: A common challenge for Bayesian models of perception is the fact that the two fundamental Bayesian components, the prior distribution and the likelihood function, are formally unconstrained. Here we argue that a neural system that emulates Bayesian inference is naturally constrained by the way it represents sensory information in populations of neurons. More specifically, we show that an efficient coding principle creates a direct link between prior and likelihood based on the underlying stimulus distribution. The resulting Bayesian estimates can show biases away from the peaks of the prior distribution, a behavior seemingly at odds with the traditional view of Bayesian estimation, yet one that has been reported in human perception. We demonstrate that our framework correctly accounts for the repulsive biases previously reported for the perception of visual orientation, and show that the predicted tuning characteristics of the model neurons match the reported orientation tuning properties of neurons in primary visual cortex. Our results suggest that efficient coding is a promising hypothesis in constraining Bayesian models of perceptual inference. 1 Motivation Human perception is not perfect. Biases have been observed in a large number of perceptual tasks and modalities, of which the most salient ones constitute many well-known perceptual illusions. It has been suggested, however, that these biases do not reflect a failure of perception but rather an observer’s attempt to optimally combine the inherently noisy and ambiguous sensory information with appropriate prior knowledge about the world [13, 4, 14]. This hypothesis, which we will refer to as the Bayesian hypothesis, has indeed proven quite successful in providing a normative explanation of perception at a qualitative and, more recently, quantitative level (see e.g. [15]). A major challenge in forming models based on the Bayesian hypothesis is the correct selection of two main components: the prior distribution (belief) and the likelihood function. This has encouraged some to criticize the Bayesian hypothesis altogether, claiming that arbitrary choices for these components always allow for unjustified post-hoc explanations of the data [1]. We do not share this criticism, referring to a number of successful attempts to constrain prior beliefs and likelihood functions based on principled grounds. For example, prior beliefs have been defined as the relative distribution of the sensory variable in the environment in cases where these statistics are relatively easy to measure (e.g. local visual orientations [16]), or where it can be assumed that subjects have learned them over the course of the experiment (e.g. time perception [17]). Other studies have constrained the likelihood function according to known noise characteristics of neurons that are crucially involved in the specific perceptual process (e.g motion tuned neurons in visual cor∗ http://www.sas.upenn.edu/ astocker/lab 1 world neural representation efficient encoding percept Bayesian decoding Figure 1: Encoding-decoding framework. A stimulus representing a sensory variable θ elicits a firing rate response R = {r1 , r2 , ..., rN } in a population of N neurons. The perceptual task is to generate a ˆ good estimate θ(R) of the presented value of the sensory variable based on this population response. Our framework assumes that encoding is efficient, and decoding is Bayesian based on the likelihood p(R|θ), the prior p(θ), and a squared-error loss function. tex [18]). 
However, we agree that finding appropriate constraints is generally difficult and that prior beliefs and likelihood functions have been often selected on the basis of mathematical convenience. Here, we propose that the efficient coding hypothesis [19] offers a joint constraint on the prior and likelihood function in neural implementations of Bayesian inference. Efficient coding provides a normative description of how neurons encode sensory information, and suggests a direct link between measured perceptual discriminability, neural tuning characteristics, and environmental statistics [11]. We show how this link can be extended to a full Bayesian account of perception that includes perceptual biases. We validate our model framework against behavioral as well as neural data characterizing the perception of visual orientation. We demonstrate that we can account not only for the reported perceptual biases away from the cardinal orientations, but also for the specific response characteristics of orientation-tuned neurons in primary visual cortex. Our work is a novel proposal of how two important normative hypotheses in perception science, namely efficient (en)coding and Bayesian decoding, might be linked. 2 Encoding-decoding framework We consider perception as an inference process that takes place along the simplified neural encodingdecoding cascade illustrated in Fig. 11 . 2.1 Efficient encoding Efficient encoding proposes that the tuning characteristics of a neural population are adapted to the prior distribution p(θ) of the sensory variable such that the population optimally represents the sensory variable [19]. Different definitions of “optimally” are possible, and may lead to different results. Here, we assume an efficient representation that maximizes the mutual information between the sensory variable and the population response. With this definition and an upper limit on the total firing activity, the square-root of the Fisher Information must be proportional to the prior distribution [12, 21]. In order to constrain the tuning curves of individual neurons in the population we also impose a homogeneity constraint, requiring that there exists a one-to-one mapping F (θ) that transforms the ˜ physical space with units θ to a homogeneous space with units θ = F (θ) in which the stimulus distribution becomes uniform. This defines the mapping as θ F (θ) = p(χ)dχ , (1) −∞ which is the cumulative of the prior distribution p(θ). We then assume a neural population with identical tuning curves that evenly tiles the stimulus range in this homogeneous space. The population provides an efficient representation of the sensory variable θ according to the above constraints [11]. ˜ The tuning curves in the physical space are obtained by applying the inverse mapping F −1 (θ). Fig. 2 1 In the context of this paper, we consider ‘inferring’, ‘decoding’, and ‘estimating’ as synonymous. 2 stimulus distribution d samples # a Fisher information discriminability and average firing rates and b firing rate [ Hz] efficient encoding F likelihood function F -1 likelihood c symmetric asymmetric homogeneous space physical space Figure 2: Efficient encoding constrains the likelihood function. a) Prior distribution p(θ) derived from stimulus statistics. b) Efficient coding defines the shape of the tuning curves in the physical space by transforming a set of homogeneous neurons using a mapping F −1 that is the inverse of the cumulative of the prior p(θ) (see Eq. (1)). 
c) As a result, the likelihood shape is constrained by the prior distribution showing heavier tails on the side of lower prior density. d) Fisher information, discrimination threshold, and average firing rates are all uniform in the homogeneous space. illustrates the applied efficient encoding scheme, the mapping, and the concept of the homogeneous space for the example of a symmetric, exponentially decaying prior distribution p(θ). The key idea here is that by assuming efficient encoding, the prior (i.e. the stimulus distribution in the world) directly constrains the likelihood function. In particular, the shape of the likelihood is determined by the cumulative distribution of the prior. As a result, the likelihood is generally asymmetric, as shown in Fig. 2, exhibiting heavier tails on the side of the prior with lower density. 2.2 Bayesian decoding Let us consider a population of N sensory neurons that efficiently represents a stimulus variable θ as described above. A stimulus θ0 elicits a specific population response that is characterized by the vector R = [r1 , r2 , ..., rN ] where ri is the spike-count of the ith neuron over a given time-window τ . Under the assumption that the variability in the individual firing rates is governed by a Poisson process, we can write the likelihood function over θ as N p(R|θ) = (τ fi (θ))ri −τ fi (θ) e , ri ! i=1 (2) ˆ with fi (θ) describing the tuning curve of neuron i. We then define a Bayesian decoder θLSE as the estimator that minimizes the expected squared-error between the estimate and the true stimulus value, thus θp(R|θ)p(θ)dθ ˆ θLSE (R) = , (3) p(R|θ)p(θ)dθ where we use Bayes’ rule to appropriately combine the sensory evidence with the stimulus prior p(θ). 3 Bayesian estimates can be biased away from prior peaks Bayesian models of perception typically predict perceptual biases toward the peaks of the prior density, a characteristic often considered a hallmark of Bayesian inference. This originates from the 3 a b prior attraction prior prior attraction likelihood repulsion! likelihood c prior prior repulsive bias likelihood likelihood mean posterior mean posterior mean Figure 3: Bayesian estimates biased away from the prior. a) If the likelihood function is symmetric, then the estimate (posterior mean) is, on average, shifted away from the actual value of the sensory variable θ0 towards the prior peak. b) Efficient encoding typically leads to an asymmetric likelihood function whose normalized mean is away from the peak of the prior (relative to θ0 ). The estimate is determined by a combination of prior attraction and shifted likelihood mean, and can exhibit an overall repulsive bias. c) If p(θ0 ) < 0 and the likelihood is relatively narrow, then (1/p(θ)2 ) > 0 (blue line) and the estimate is biased away from the prior peak (see Eq. (6)). common approach of choosing a parametric description of the likelihood function that is computationally convenient (e.g. Gaussian). As a consequence, likelihood functions are typically assumed to be symmetric (but see [23, 24]), leaving the bias of the Bayesian estimator to be mainly determined by the shape of the prior density, i.e. leading to biases toward the peak of the prior (Fig. 3a). In our model framework, the shape of the likelihood function is constrained by the stimulus prior via efficient neural encoding, and is generally not symmetric for non-flat priors. It has a heavier tail on the side with lower prior density (Fig. 3b). 
The intuition is that due to the efficient allocation of neural resources, the side with smaller prior density will be encoded less accurately, leading to a broader likelihood function on that side. The likelihood asymmetry pulls the Bayes’ least-squares estimate away from the peak of the prior while at the same time the prior pulls it toward its peak. Thus, the resulting estimation bias is the combination of these two counter-acting forces - and both are determined by the prior! 3.1 General derivation of the estimation bias In the following, we will formally derive the mean estimation bias b(θ) of the proposed encodingdecoding framework. Specifically, we will study the conditions for which the bias is repulsive i.e. away from the peak of the prior density. ˆ We first re-write the estimator θLSE (3) by replacing θ with the inverse of its mapping to the homo−1 ˜ geneous space, i.e., θ = F (θ). The motivation for this is that the likelihood in the homogeneous space is symmetric (Fig. 2). Given a value θ0 and the elicited population response R, we can write the estimator as ˜ ˜ ˜ ˜ θp(R|θ)p(θ)dθ F −1 (θ)p(R|F −1 (θ))p(F −1 (θ))dF −1 (θ) ˆ θLSE (R) = = . ˜ ˜ ˜ p(R|θ)p(θ)dθ p(R|F −1 (θ))p(F −1 (θ))dF −1 (θ) Calculating the derivative of the inverse function and noting that F is the cumulative of the prior density, we get 1 1 1 ˜ ˜ ˜ ˜ ˜ ˜ dθ = dθ. dF −1 (θ) = (F −1 (θ)) dθ = dθ = −1 (θ)) ˜ F (θ) p(θ) p(F ˆ Hence, we can simplify θLSE (R) as ˆ θLSE (R) = ˜ ˜ ˜ F −1 (θ)p(R|F −1 (θ))dθ . ˜ ˜ p(R|F −1 (θ))dθ With ˜ K(R, θ) = ˜ p(R|F −1 (θ)) ˜ ˜ p(R|F −1 (θ))dθ 4 we can further simplify the notation and get ˆ θLSE (R) = ˜ ˜ ˜ F −1 (θ)K(R, θ)dθ . (4) ˆ ˜ In order to get the expected value of the estimate, θLSE (θ), we marginalize (4) over the population response space S, ˆ ˜ ˜ ˜ ˜ θLSE (θ) = p(R)F −1 (θ)K(R, θ)dθdR S = F −1 ˜ (θ)( ˜ ˜ p(R)K(R, θ)dR)dθ = ˜ ˜ ˜ F −1 (θ)L(θ)dθ, S where we define ˜ L(θ) = ˜ p(R)K(R, θ)dR. S ˜ ˜ ˜ It follows that L(θ)dθ = 1. Due to the symmetry in this space, it can be shown that L(θ) is ˜0 . Intuitively, L(θ) can be thought as the normalized ˜ symmetric around the true stimulus value θ average likelihood in the homogeneous space. We can then compute the expected bias at θ0 as b(θ0 ) = ˜ ˜ ˜ ˜ F −1 (θ)L(θ)dθ − F −1 (θ0 ) (5) ˜ This is expression is general where F −1 (θ) is defined as the inverse of the cumulative of an arbitrary ˜ prior density p(θ) (see Eq. (1)) and the dispersion of L(θ) is determined by the internal noise level. ˜ ˜ Assuming the prior density to be smooth, we expand F −1 in a neighborhood (θ0 − h, θ0 + h) that is larger than the support of the likelihood function. Using Taylor’s theorem with mean-value forms of the remainder, we get 1 ˜ ˜ ˜ ˜ ˜ ˜ ˜ ˜ F −1 (θ) = F −1 (θ0 ) + F −1 (θ0 ) (θ − θ0 ) + F −1 (θx ) (θ − θ0 )2 , 2 ˜ ˜ ˜ with θx lying between θ0 and θ. By applying this expression to (5), we find ˜ θ0 +h b(θ0 ) = = 1 2 ˜ θ0 −h 1 −1 ˜ ˜ ˜ ˜ ˜ 1 F (θx )θ (θ − θ0 )2 L(θ)dθ = ˜ 2 2 ˜ θ0 +h −( ˜ θ0 −h p(θx )θ ˜ ˜ 2 ˜ ˜ 1 )(θ − θ0 ) L(θ)dθ = p(θx )3 4 ˜ θ0 +h 1 ˜ − θ0 )2 L(θ)dθ ˜ ˜ ˜ ( ) ˜(θ ˜ p(F −1 (θx )) θ ( 1 ˜ ˜ ˜ ˜ ) (θ − θ0 )2 L(θ)dθ. p(θx )2 θ ˜ θ0 −h ˜ θ0 +h ˜ θ0 −h In general, there is no simple rule to judge the sign of b(θ0 ). However, if the prior is monotonic ˜ ˜ on the interval F −1 ((θ0 − h, θ0 + h)), then the sign of ( p(θ1 )2 ) is always the same as the sign of x 1 1 ( p(θ0 )2 ) . 
Also, if the likelihood is sufficiently narrow we can approximate ( p(θ1 )2 ) by ( p(θ0 )2 ) , x and therefore approximate the bias as b(θ0 ) ≈ C( 1 ) , p(θ0 )2 (6) where C is a positive constant. The result is quite surprising because it states that as long as the prior is monotonic over the support of the likelihood function, the expected estimation bias is always away from the peaks of the prior! 3.2 Internal (neural) versus external (stimulus) noise The above derivation of estimation bias is based on the assumption that all uncertainty about the sensory variable is caused by neural response variability. This level of internal noise depends on the response magnitude, and thus can be modulated e.g. by changing stimulus contrast. This contrastcontrolled noise modulation is commonly exploited in perceptual studies (e.g. [18]). Internal noise will always lead to repulsive biases in our framework if the prior is monotonic. If internal noise is low, the likelihood is narrow and thus the bias is small. Increasing internal noise leads to increasingly 5 larger biases up to the point where the likelihood becomes wide enough such that monotonicity of the prior over the support of the likelihood is potentially violated. Stimulus noise is another way to modulate the noise level in perception (e.g. random-dot motion stimuli). Such external noise, however, has a different effect on the shape of the likelihood function as compared to internal noise. It modifies the likelihood function (2) by convolving it with the noise kernel. External noise is frequently chosen as additive and symmetric (e.g. zero-mean Gaussian). It is straightforward to prove that such symmetric external noise does not lead to a change in the mean of the likelihood, and thus does not alter the repulsive effect induced by its asymmetry. However, by increasing the overall width of the likelihood, the attractive influence of the prior increases, resulting in an estimate that is closer to the prior peak than without external noise2 . 4 Perception of visual orientation We tested our framework by modelling the perception of visual orientation. Our choice was based on the fact that i) we have pretty good estimates of the prior distribution of local orientations in natural images, ii) tuning characteristics of orientation selective neurons in visual cortex are wellstudied (monkey/cat), and iii) biases in perceived stimulus orientation have been well characterized. We start by creating an efficient neural population based on measured prior distributions of local visual orientation, and then compare the resulting tuning characteristics of the population and the predicted perceptual biases with reported data in the literature. 4.1 Efficient neural model population for visual orientation Previous studies measured the statistics of the local orientation in large sets of natural images and consistently found that the orientation distribution is multimodal, peaking at the two cardinal orientations as shown in Fig. 4a [16, 20]. We assumed that the visual system’s prior belief over orientation p(θ) follows this distribution and approximate it formally as p(θ) ∝ 2 − | sin(θ)| (black line in Fig. 4b) . (7) Based on this prior distribution we defined an efficient neural representation for orientation. We assumed a population of model neurons (N = 30) with tuning curves that follow a von-Mises distribution in the homogeneous space on top of a constant spontaneous firing rate (5 Hz). 
We then ˜ applied the inverse transformation F −1 (θ) to all these tuning curves to get the corresponding tuning curves in the physical space (Fig. 4b - red curves), where F (θ) is the cumulative of the prior (7). The concentration parameter for the von-Mises tuning curves was set to κ ≈ 1.6 in the homogeneous space in order to match the measured average tuning width (∼ 32 deg) of neurons in area V1 of the macaque [9]. 4.2 Predicted tuning characteristics of neurons in primary visual cortex The orientation tuning characteristics of our model population well match neurophysiological data of neurons in primary visual cortex (V1). Efficient encoding predicts that the distribution of neurons’ preferred orientation follows the prior, with more neurons tuned to cardinal than oblique orientations by a factor of approximately 1.5. A similar ratio has been found for neurons in area V1 of monkey/cat [9, 10]. Also, the tuning widths of the model neurons vary between 25-42 deg depending on their preferred tuning (see Fig. 4c), matching the measured tuning width ratio of 0.6 between neurons tuned to the cardinal versus oblique orientations [9]. An important prediction of our model is that most of the tuning curves should be asymmetric. Such asymmetries have indeed been reported for the orientation tuning of neurons in area V1 [6, 7, 8]. We computed the asymmetry index for our model population as defined in previous studies [6, 7], and plotted it as a function of the preferred tuning of each neuron (Fig. 4d). The overall asymmetry index in our model population is 1.24 ± 0.11, which approximately matches the measured values for neurons in area V1 of the cat (1.26 ± 0.06) [6]. It also predicts that neurons tuned to the cardinal and oblique orientations should show less symmetry than those tuned to orientations in between. Finally, 2 Note, that these predictions are likely to change if the external noise is not symmetric. 6 a b 25 firing rate(Hz) 0 orientation(deg) asymmetry vs. tuning width 1.0 2.0 90 2.0 e asymmetry 1.0 0 asymmetry index 50 30 width (deg) 10 90 preferred tuning(deg) -90 0 d 0 0 90 asymmetry index 0 orientation(deg) tuning width -90 0 0 probability 0 -90 c efficient representation 0.01 0.01 image statistics -90 0 90 preferred tuning(deg) 25 30 35 40 tuning width (deg) Figure 4: Tuning characteristics of model neurons. a) Distribution of local orientations in natural images, replotted from [16]. b) Prior used in the model (black) and predicted tuning curves according to efficient coding (red). c) Tuning width as a function of preferred orientation. d) Tuning curves of cardinal and oblique neurons are more symmetric than those tuned to orientations in between. e) Both narrowly and broadly tuned neurons neurons show less asymmetry than neurons with tuning widths in between. neurons with tuning widths at the lower and upper end of the range are predicted to exhibit less asymmetry than those neurons whose widths lie in between these extremes (illustrated in Fig. 4e). These last two predictions have not been tested yet. 4.3 Predicted perceptual biases Our model framework also provides specific predictions for the expected perceptual biases. Humans show systematic biases in perceived orientation of visual stimuli such as e.g. arrays of Gabor patches (Fig. 5a,d). Two types of biases can be distinguished: First, perceived orientations show an absolute bias away from the cardinal orientations, thus away from the peaks of the orientation prior [2, 3]. 
We refer to these biases as absolute because they are typically measured by adjusting a noise-free reference until it matched the orientation of the test stimulus. Interestingly, these repulsive absolute biases are the larger the smaller the external stimulus noise is (see Fig. 5b). Second, the relative bias between the perceived overall orientations of a high-noise and a low-noise stimulus is toward the cardinal orientations as shown in Fig. 5c, and thus toward the peak of the prior distribution [3, 16]. The predicted perceptual biases of our model are shown Fig. 5e,f. We computed the likelihood function according to (2) and used the prior in (7). External noise was modeled by convolving the stimulus likelihood function with a Gaussian (different widths for different noise levels). The predictions well match both, the reported absolute bias away as well as the relative biases toward the cardinal orientations. Note, that our model framework correctly accounts for the fact that less external noise leads to larger absolute biases (see also discussion in section 3.2). 5 Discussion We have presented a modeling framework for perception that combines efficient (en)coding and Bayesian decoding. Efficient coding imposes constraints on the tuning characteristics of a population of neurons according to the stimulus distribution (prior). It thus establishes a direct link between prior and likelihood, and provides clear constraints on the latter for a Bayesian observer model of perception. We have shown that the resulting likelihoods are in general asymmetric, with 7 absolute bias (data) b c relative bias (data) -4 0 bias(deg) 4 a low-noise stimulus -90 e 90 absolute bias (model) low external noise high external noise 3 high-noise stimulus -90 f 0 90 relative bias (model) 0 bias(deg) d 0 attraction -3 repulsion -90 0 orientation (deg) 90 -90 0 orientation (deg) 90 Figure 5: Biases in perceived orientation: Human data vs. Model prediction. a,d) Low- and highnoise orientation stimuli of the type used in [3, 16]. b) Humans show absolute biases in perceived orientation that are away from the cardinal orientations. Data replotted from [2] (pink squares) and [3] (green (black) triangles: bias for low (high) external noise). c) Relative bias between stimuli with different external noise level (high minus low). Data replotted from [3] (blue triangles) and [16] (red circles). e,f) Model predictions for absolute and relative bias. heavier tails away from the prior peaks. We demonstrated that such asymmetric likelihoods can lead to the counter-intuitive prediction that a Bayesian estimator is biased away from the peaks of the prior distribution. Interestingly, such repulsive biases have been reported for human perception of visual orientation, yet a principled and consistent explanation of their existence has been missing so far. Here, we suggest that these counter-intuitive biases directly follow from the asymmetries in the likelihood function induced by efficient neural encoding of the stimulus. The good match between our model predictions and the measured perceptual biases and orientation tuning characteristics of neurons in primary visual cortex provides further support of our framework. Previous work has suggested that there might be a link between stimulus statistics, neuronal tuning characteristics, and perceptual behavior based on efficient coding principles, yet none of these studies has recognized the importance of the resulting likelihood asymmetries [16, 11]. 
5 Discussion

We have presented a modeling framework for perception that combines efficient (en)coding and Bayesian decoding. Efficient coding imposes constraints on the tuning characteristics of a population of neurons according to the stimulus distribution (prior). It thus establishes a direct link between prior and likelihood, and provides clear constraints on the latter for a Bayesian observer model of perception. We have shown that the resulting likelihoods are in general asymmetric, with heavier tails away from the prior peaks. We demonstrated that such asymmetric likelihoods can lead to the counter-intuitive prediction that a Bayesian estimator is biased away from the peaks of the prior distribution. Interestingly, such repulsive biases have been reported for human perception of visual orientation, yet a principled and consistent explanation of their existence has been missing so far. Here, we suggest that these counter-intuitive biases directly follow from the asymmetries in the likelihood function induced by efficient neural encoding of the stimulus. The good match between our model predictions and the measured perceptual biases and orientation tuning characteristics of neurons in primary visual cortex provides further support for our framework. Previous work has suggested that there might be a link between stimulus statistics, neuronal tuning characteristics, and perceptual behavior based on efficient coding principles, yet none of these studies has recognized the importance of the resulting likelihood asymmetries [16, 11].

We have demonstrated here that such asymmetries can be crucial in explaining perceptual data, even though the resulting estimates appear "anti-Bayesian" at first sight (see also models of sensory adaptation [23]). Note that we do not provide a neural implementation of the Bayesian inference step. However, we and others have proposed various neural decoding schemes that can approximate Bayes' least-squares estimation using efficient coding [26, 25, 22]. It is also worth pointing out that our estimator is set to minimize total squared error, and that other choices of the loss function (e.g., the MAP estimator) could lead to different predictions. Our framework is general and should be directly applicable to other modalities. In particular, it might provide a new explanation for perceptual biases that are hard to reconcile with traditional Bayesian approaches [5].

Acknowledgments

We thank M. Jogan and A. Tank for helpful comments on the manuscript. This work was partially supported by grant ONR N000141110744.

References

[1] M. Jones and B. C. Love. Bayesian fundamentalism or enlightenment? On the explanatory status and theoretical contributions of Bayesian models of cognition. Behavioral and Brain Sciences, 34:169–231, 2011.
[2] D. P. Andrews. Perception of contours in the central fovea. Nature, 205:1218–1220, 1965.
[3] A. Tomassini, M. J. Morgan, and J. A. Solomon. Orientation uncertainty reduces perceived obliquity. Vision Res, 50:541–547, 2010.
[4] W. S. Geisler and D. Kersten. Illusions, perception and Bayes. Nature Neuroscience, 5(6):508–510, 2002.
[5] M. O. Ernst. Perceptual learning: inverting the size-weight illusion. Current Biology, 19:R23–R25, 2009.
[6] G. H. Henry, B. Dreher, and P. O. Bishop. Orientation specificity of cells in cat striate cortex. J Neurophysiol, 37(6):1394–1409, 1974.
[7] D. Rose and C. Blakemore. An analysis of orientation selectivity in the cat's visual cortex. Exp Brain Res, 20(1):1–17, 1974.
[8] N. V. Swindale. Orientation tuning curves: empirical description and estimation of parameters. Biol Cybern, 78(1):45–56, 1998.
[9] R. L. De Valois, E. W. Yund, and N. Hepler. The orientation and direction selectivity of cells in macaque visual cortex. Vision Res, 22:531–544, 1982.
[10] B. Li, M. R. Peterson, and R. D. Freeman. The oblique effect: a neural basis in the visual cortex. J Neurophysiol, 90:204–217, 2003.
[11] D. Ganguli and E. P. Simoncelli. Implicit encoding of prior probabilities in optimal neural populations. In Advances in Neural Information Processing Systems (NIPS) 23, pages 658–666, 2011.
[12] M. D. McDonnell and N. G. Stocks. Maximally informative stimuli and tuning curves for sigmoidal rate-coding neurons and populations. Phys Rev Lett, 101(5):058103, 2008.
[13] H. Helmholtz. Treatise on Physiological Optics (transl.). Thoemmes Press, Bristol, U.K., 2000. Original publication 1867.
[14] Y. Weiss, E. Simoncelli, and E. Adelson. Motion illusions as optimal percepts. Nature Neuroscience, 5(6):598–604, June 2002.
[15] D. C. Knill and W. Richards, editors. Perception as Bayesian Inference. Cambridge University Press, 1996.
[16] A. R. Girshick, M. S. Landy, and E. P. Simoncelli. Cardinal rules: visual orientation perception reflects knowledge of environmental statistics. Nat Neurosci, 14(7):926–932, July 2011.
[17] M. Jazayeri and M. N. Shadlen. Temporal context calibrates interval timing. Nature Neuroscience, 13(8):914–916, 2010.
[18] A. A. Stocker and E. P. Simoncelli. Noise characteristics and prior expectations in human visual speed perception.
Nature Neuroscience, pages 578–585, April 2006.
[19] H. B. Barlow. Possible principles underlying the transformation of sensory messages. In W. A. Rosenblith, editor, Sensory Communication, pages 217–234. MIT Press, Cambridge, MA, 1961.
[20] D. M. Coppola, H. R. Purves, A. N. McCoy, and D. Purves. The distribution of oriented contours in the real world. Proc Natl Acad Sci U S A, 95(7):4002–4006, 1998.
[21] N. Brunel and J.-P. Nadal. Mutual information, Fisher information and population coding. Neural Computation, 10(7):1731–1757, 1998.
[22] X.-X. Wei and A. A. Stocker. Bayesian inference with efficient neural population codes. In Lecture Notes in Computer Science, Artificial Neural Networks and Machine Learning – ICANN 2012, Lausanne, Switzerland, volume 7552, pages 523–530, 2012.
[23] A. A. Stocker and E. P. Simoncelli. Sensory adaptation within a Bayesian framework for perception. In Y. Weiss, B. Schölkopf, and J. Platt, editors, Advances in Neural Information Processing Systems 18, pages 1291–1298. MIT Press, Cambridge, MA, 2006. Oral presentation.
[24] D. C. Knill. Robust cue integration: A Bayesian model and evidence from cue-conflict studies with stereoscopic and figure cues to slant. Journal of Vision, 7(7):1–24, 2007.
[25] D. Ganguli. Efficient coding and Bayesian inference with neural populations. PhD thesis, Center for Neural Science, New York University, New York, NY, September 2012.
[26] B. Fischer. Bayesian estimates from heterogeneous population codes. In Proc. IEEE Intl. Joint Conf. on Neural Networks. IEEE, 2010.

4 0.89203542 241 nips-2012-No-Regret Algorithms for Unconstrained Online Convex Optimization

Author: Brendan McMahan, Matthew Streeter

Abstract: Some of the most compelling applications of online convex optimization, including online prediction and classification, are unconstrained: the natural feasible set is R^n. Existing algorithms fail to achieve sub-linear regret in this setting unless constraints on the comparator point x̊ are known in advance. We present algorithms that, without such prior knowledge, offer near-optimal regret bounds with respect to any choice of x̊. In particular, regret with respect to x̊ = 0 is constant. We then prove lower bounds showing that our guarantees are near-optimal in this setting.

5 0.89114094 300 nips-2012-Scalable nonconvex inexact proximal splitting

Author: Suvrit Sra

Abstract: We study a class of large-scale, nonsmooth, and nonconvex optimization problems. In particular, we focus on nonconvex problems with composite objectives. This class includes the extensively studied class of convex composite objective problems as a subclass. To solve composite nonconvex problems we introduce a powerful new framework based on asymptotically nonvanishing errors, avoiding the common stronger assumption of vanishing errors. Within our new framework we derive both batch and incremental proximal splitting algorithms. To our knowledge, our work is the first to develop and analyze incremental nonconvex proximal-splitting algorithms, even disregarding the ability to handle nonvanishing errors. We illustrate one instance of our general framework by showing an application to large-scale nonsmooth matrix factorization.

6 0.88873541 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes

7 0.8852303 113 nips-2012-Efficient and direct estimation of a neural subunit model for sensory coding

8 0.88499594 216 nips-2012-Mirror Descent Meets Fixed Share (and feels no regret)

9 0.88150018 348 nips-2012-Tractable Objectives for Robust Policy Optimization

10 0.88099962 187 nips-2012-Learning curves for multi-task Gaussian process regression

11 0.87877262 227 nips-2012-Multiclass Learning with Simplex Coding

12 0.87786055 276 nips-2012-Probabilistic Event Cascades for Alzheimer's disease

13 0.8776685 15 nips-2012-A Polylog Pivot Steps Simplex Algorithm for Classification

14 0.87685907 190 nips-2012-Learning optimal spike-based representations

15 0.87579358 105 nips-2012-Dynamic Pruning of Factor Graphs for Maximum Marginal Prediction

16 0.87527299 324 nips-2012-Stochastic Gradient Descent with Only One Projection

17 0.87499154 18 nips-2012-A Simple and Practical Algorithm for Differentially Private Data Release

18 0.87478483 178 nips-2012-Learning Label Trees for Probabilistic Modelling of Implicit Feedback

19 0.87278765 79 nips-2012-Compressive neural representation of sparse, high-dimensional probabilities

20 0.8726061 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration