nips nips2013 nips2013-247 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Praneeth Netrapalli, Prateek Jain, Sujay Sanghavi
Abstract: Phase retrieval problems involve solving linear equations, but with missing sign (or phase, for complex numbers). Over the last two decades, a popular generic empirical approach to the many variants of this problem has been one of alternating minimization; i.e. alternating between estimating the missing phase information and the candidate solution. In this paper, we show that a simple alternating minimization algorithm geometrically converges to the solution of one such problem – finding a vector x from y, A, where y = |AT x| and |z| denotes a vector of element-wise magnitudes of z – under the assumption that A is Gaussian. Empirically, our algorithm performs similarly to recently proposed convex techniques for this variant (which are based on “lifting” to a convex matrix problem) in sample complexity and robustness to noise. However, our algorithm is much more efficient and can scale to large problems. Analytically, we show geometric convergence to the solution, and sample complexity that is off by log factors from obvious lower bounds. We also establish close to optimal scaling for the case when the unknown vector is sparse. Our work represents the only known theoretical guarantee for alternating minimization for any variant of phase retrieval problems in the non-convex setting.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Phase retrieval problems involve solving linear equations, but with missing sign (or phase, for complex numbers). [sent-5, score-0.268]
2 Over the last two decades, a popular generic empirical approach to the many variants of this problem has been one of alternating minimization; i.e. [sent-6, score-0.14]
3 alternating between estimating the missing phase information and the candidate solution. [sent-8, score-0.399]
4 In this paper, we show that a simple alternating minimization algorithm geometrically converges to the solution of one such problem – finding a vector x from y, A, where y = |AT x| and |z| denotes a vector of element-wise magnitudes of z – under the assumption that A is Gaussian. [sent-9, score-0.361]
5 Empirically, our algorithm performs similarly to recently proposed convex techniques for this variant (which are based on “lifting” to a convex matrix problem) in sample complexity and robustness to noise. [sent-10, score-0.195]
6 Our work represents the only known theoretical guarantee for alternating minimization for any variant of phase retrieval problems in the non-convex setting. [sent-14, score-0.668]
7 1 Introduction In this paper we are interested in recovering a complex vector x∗ ∈ Cn from the magnitudes of its linear measurements. [sent-15, score-0.12]
8 That is, for ai ∈ Cn, if yi = |⟨ai, x∗⟩| for i = 1, . . . , m, (1) [sent-16, score-0.203]
9 then the task is to recover x∗ using y and the measurement matrix A = [a1 a2 · · · am]. [sent-19, score-0.18]
10 The above problem arises in many settings where it is hard or infeasible to record the phase of measurements, while recording the magnitudes is significantly easier. [sent-23, score-0.349]
11 This problem, known as phase retrieval, is encountered in several applications in crystallography, optics, spectroscopy and tomography [14]. [sent-24, score-0.259]
12 Moreover, the problem is broadly studied in the following two settings: (i) The measurements in (1) correspond to the Fourier transform (the number of measurements here is equal to n) and there is some a priori information about the signal. [sent-25, score-0.485]
13 (ii) The set of measurements y is overcomplete (i.e., the number of measurements m is larger than n). [sent-29, score-0.2]
14 In the first case, various types of a priori information about the underlying signal, such as positivity, magnitude information on the signal [11], sparsity [25] and so on, have been studied. [sent-32, score-0.208]
15 In the second case, algorithms for various measurement schemes such as Fourier oversampling [21], multiple random illuminations [4, 28] and wavelet transform [28] have been suggested. [sent-33, score-0.168]
16 These algorithms are alternating projection algorithms that iterate between the unknown phases of the measurements and the unknown underlying vector. [sent-35, score-0.376]
17 • We show that our algorithm needs O(n log^3 n) i.i.d. random Gaussian measurements to geometrically converge to the true vector. [sent-46, score-0.234]
18 • Our analytical contribution is the first theoretical guarantee regarding the convergence of alternating minimization for the phase retrieval problem in a non-convex setting. [sent-48, score-0.722]
19 • When the underlying vector is sparse, we design another algorithm that achieves a sample complexity of O((x∗_min)^(−4) log n + log^3 k), where k is the sparsity and x∗_min is the minimum non-zero entry of x∗. [sent-49, score-0.141]
20 However, the methods with the best (or only) analytical guarantees involve convex relaxations (e.g., PhaseLift and PhaseCut). [sent-53, score-0.115]
21 In most of these settings, correctness of alternating minimization is an open question. [sent-56, score-0.255]
22 1.1 Related Work. Phase Retrieval via Non-Convex Procedures: In spite of the huge amount of work it has attracted, phase retrieval has been a long-standing open problem. [sent-65, score-0.457]
23 Early work in this area focused on using holography to capture the phase information along with magnitude measurements [12]. [sent-66, score-0.59]
24 However, computational methods for reconstructing the signal using only magnitude measurements received a lot of attention, owing to their applicability in resolving spurious noise, fringes and optical system aberrations, and to the difficulties of implementing interferometer setups [9]. [sent-67, score-0.292]
25 [21] first suggested the use of multiple magnitude measurements to resolve the phase problem. [sent-72, score-0.521]
26 Following the empirical success of these algorithms, researchers were able to explain their success in some instances [29] using Bregman’s theory of iterated projections onto convex sets [3]. [sent-74, score-0.163]
27 The papers [7, 6] then take the approach of relaxing the rank constraint by a trace norm penalty, making the overall algorithm a convex program (called PhaseLift) over n × n matrices. [sent-78, score-0.131]
28 To date, these convex methods are the only ones with analytical guarantees on statistical performance [5, 28] (i.e. [sent-80, score-0.115]
29 the number m of measurements required to recover x∗) – under an i.i.d. [sent-82, score-0.226]
30 random Gaussian model on the measurement vectors ai. [sent-85, score-0.206]
31 Sparse Phase Retrieval: A special case of the phase retrieval problem which has received a lot of attention recently is when the underlying signal x∗ is known to be sparse. [sent-87, score-0.487]
32 Though this problem is closely related to the compressed sensing problem, the lack of phase information makes it harder. [sent-88, score-0.322]
33 [22, 18] use this observation to design ℓ1 regularized SDP algorithms for phase retrieval of sparse vectors. [sent-91, score-0.511]
34 For random Gaussian measurements, [18] shows that ℓ1-regularized PhaseLift recovers x∗ correctly if the number of measurements is Ω(k^2 log n). [sent-92, score-0.298]
35 Alternating Minimization (a.k.a. ALS): Alternating minimization has been successfully applied to many applications in the low-rank matrix setting. [sent-97, score-0.106]
36 The only exceptions are the results in [16, 15] which give provable guarantees for alternating minimization for the problems of matrix sensing and matrix completion. [sent-101, score-0.35]
37 For every complex vector w ∈ Cn , |w| ∈ Rn denotes its element-wise magnitude vector. [sent-106, score-0.105]
38 We also define Ph(z) def= z/|z| for every z ∈ C, and dist(w1, w2) def= √(1 − |⟨w1, w2⟩|^2 / (‖w1‖2^2 ‖w2‖2^2)) for every w1, w2 ∈ Cn. [sent-113, score-0.339]
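For concreteness, here is a minimal numpy sketch of these two quantities (the helper names ph and dist are ours; this is a sketch of the paper's definitions, not the authors' code):

    import numpy as np

    def ph(z):
        # Element-wise phase Ph(z) = z/|z|; we set Ph(0) = 0 by convention.
        z = np.asarray(z, dtype=complex)
        mag = np.abs(z)
        return np.divide(z, mag, out=np.zeros_like(z), where=mag > 0)

    def dist(w1, w2):
        # dist(w1, w2) = sqrt(1 - |<w1, w2>|^2 / (||w1||_2^2 ||w2||_2^2));
        # invariant to a global phase of either argument.
        c = np.abs(np.vdot(w1, w2)) / (np.linalg.norm(w1) * np.linalg.norm(w2))
        return np.sqrt(max(0.0, 1.0 - c * c))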
39 Finally, we use the shorthand wlog for without loss of generality and whp for with high probability. [sent-114, score-0.116]
40 3 Algorithm In this section, we present our alternating minimization based algorithm for solving the phase retrieval problem. [sent-115, score-0.721]
41 If we had access to the true phase c∗ of Aᵀx∗ (i.e. [sent-119, score-0.259]
42 , c∗_i = Ph(⟨ai, x∗⟩)) and m ≥ n, then our problem reduces to one of solving a system of linear equations: C∗y = Aᵀx∗, where C∗ def= Diag(c∗) is the diagonal matrix of phases. [sent-121, score-0.234]
43 Of course we do not know C∗; hence one approach to recovering x∗ is to solve: argmin_{C,x} ‖Aᵀx − Cy‖2 (2), where x ∈ Cn and C ∈ Cm×m is a diagonal matrix with each diagonal entry of magnitude 1. [sent-122, score-0.208]
44 Note that the above problem is not convex since C is restricted to be a diagonal phase matrix and hence, one cannot use standard convex optimization methods to solve it. [sent-123, score-0.415]
45 Instead, our algorithm uses the well-known alternating minimization approach: alternately update x and C so as to minimize (2); a code sketch follows the initialization discussion below. [sent-124, score-0.166]
46 Since the number of measurements m is larger than the dimensionality n and since each entry of A is sampled from independent Gaussians, A is invertible with probability 1. [sent-126, score-0.227]
47 We address the first key challenge in our AltMinPhase algorithm (Algorithm 1) by initializing x as the largest singular vector of the matrix S = (1/m) Σ_i y_i^2 a_i a_iᵀ. [sent-133, score-0.264]
48 Theorem 4.1 shows that when A is sampled from the standard complex normal distribution, this initialization is accurate. [sent-135, score-0.139]
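Combining the spectral initialization with the two alternating updates gives the following numpy sketch of the AltMinPhase iteration (it reuses ph from the sketch above, takes ⟨a, x⟩ = aᴴx as the inner-product convention, and omits the partitioning of A into fresh samples per iteration that the analyzed Algorithm 2 uses; a sketch, not the authors' implementation):

    def altminphase(A, y, n_iter=50):
        # A: n x m matrix whose columns are the measurement vectors a_i;
        # y: length-m vector of magnitudes |<a_i, x*>|.
        n, m = A.shape
        # Initialization: top eigenvector of S = (1/m) sum_i y_i^2 a_i a_i^H.
        S = (A * y**2) @ A.conj().T / m
        _, V = np.linalg.eigh(S)          # S is Hermitian, so eigh applies
        x = V[:, -1]
        for _ in range(n_iter):
            c = ph(A.conj().T @ x)        # C-update: phases of A^H x
            # x-update: least squares, argmin_x ||A^H x - C y||_2.
            x = np.linalg.lstsq(A.conj().T, c * y, rcond=None)[0]
        return x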
49 Now, since the measurement matrix A is sampled from a Gaussian distribution with m > Cn, it is well conditioned. [sent-143, score-0.154]
50 Moreover, our initialization step increases the likelihood of successful recovery as opposed to a random initialization (which has been considered so far in prior work). [sent-147, score-0.25]
51 Figure 1 (panels (a), (b)): Sample and time complexity of various methods for Gaussian measurement matrices A. [sent-149, score-0.184]
52 Figure 1(a) compares the number of measurements required for successful recovery by various methods. [sent-150, score-0.335]
53 We note that our initialization improves sample complexity over that of random initialization (AltMin (random init)) by a factor of 2. [sent-151, score-0.18]
54 AltMinPhase requires a similar number of measurements to PhaseLift and PhaseCut. [sent-152, score-0.2]
55 4 Main Results: Analysis In this section we describe the main contribution of this paper: provable statistical guarantees for the success of alternating minimization in solving the phase recovery problem. [sent-155, score-0.684]
56 To this end, we consider the setting where the measurement vectors ai are i.i.d. and sampled from the standard complex normal distribution. [sent-156, score-0.275]
57 We would like to stress that all the existing guarantees for phase recovery also use exactly the same setting [6, 5, 28]. [sent-157, score-0.437]
58 Algorithm 2 vs. PhaseLift [5] vs. PhaseCut [28]: sample complexity O(n(log^3 n + log(1/ε) log log(1/ε))) vs. O(n) vs. O(n); computational complexity O(n^2(log^3 n + log^2(1/ε) log log(1/ε))) vs. O(n^3/ε^2) vs. O(n^3/√ε). [sent-159, score-0.184]
59 Table 1: Comparison of Algorithm 2 with PhaseLift and PhaseCut: though the sample complexity of Algorithm 2 is off by log factors from that of PhaseLift and PhaseCut, it is O(n) better than them in computational complexity. [sent-160, score-0.224]
60 Our proof of convergence of alternating minimization can be broken into two key results. [sent-162, score-0.211]
61 We first show that if m ≥ C n log^3 n, then whp the initialization step used by AltMinPhase returns x0 which is at most a constant distance away from x∗. [sent-163, score-0.151]
62 We then show that if xt is a fixed vector such that dist(xt, x∗) < c (small enough) and A is sampled independently of xt with m > Cn (C large enough), then whp xt+1 satisfies dist(xt+1, x∗) < (3/4) dist(xt, x∗) (see Theorem 4.2). [sent-166, score-0.964]
63 Note that our analysis critically requires xt to be “fixed” and independent of the sample matrix A. [sent-168, score-0.423]
64 There exist constants c, ĉ and c̃ such that, in iteration t of Algorithm 2, if dist(xt, x∗) < c and the number of columns of At is greater than ĉ n log(1/η), then, with probability more than 1 − η, we have: dist(xt+1, x∗) < (3/4) dist(xt, x∗), and ‖xt+1 − x∗‖2 < c̃ dist(xt, x∗). [sent-180, score-1.699]
65 For simplicity of notation in the proof of the theorem, we will use A for At+1 , C for Ct+1 , x for xt , x+ for xt+1 , and y for yt+1 . [sent-182, score-0.107]
66 Now consider the update in the (t + 1)-th iteration: x+ = argmin_{x∈Rn} ‖Aᵀx − Cy‖2 = (AAᵀ)^(−1) ACy = (AAᵀ)^(−1) ADAᵀx∗, (3) where D is a diagonal matrix with D_ℓℓ def= Ph(a_ℓᵀx · a_ℓᵀx∗). [sent-183, score-0.12]
67 ∃ a constant c1 such that |⟨x∗, x+⟩| ≥ 1 − c1 dist(x, x∗) (see Lemma A.4). [sent-186, score-0.281]
68 Assuming the above two bounds and choosing c < 1/(100 c1), we can prove the theorem: dist(x+, x∗)^2 < (25/81) · dist(x, x∗)^2 / (1 − c1 dist(x, x∗))^2 ≤ (9/16) dist(x, x∗)^2, proving the first part of the theorem. [sent-192, score-1.124]
69 Intuition and key challenge: If we look at step 6 of Algorithm 2, we see that, for the measurements, we use magnitudes calculated from x∗ and phases calculated from x. [sent-195, score-0.126]
70 The key intuition behind the success of this procedure is that the push towards x∗ is stronger than the push towards x, when x is close to x∗ . [sent-197, score-0.12]
71 Algorithm 3 vs. ℓ1-PhaseLift [18]: sample complexity O(k(k log n + log(1/ε) log log(1/ε))) vs. O(k^2 log n); computational complexity O(k^2(kn log n + log^2(1/ε) log log(1/ε))) vs. O(n^3/ε^2). [sent-206, score-0.28]
72 Table 2: Comparison of Algorithm 3 with ℓ1-PhaseLift when x∗_min = Ω(1/√k). [sent-207, score-0.184]
73 Suppose the measurement vectors in (1) are independent standard complex normal vectors. [sent-217, score-0.188]
74 For every η > 0, there exists a constant c such that if m > c n (log^3 n + log(1/ε) log log(1/ε)), then, with probability greater than 1 − η, Algorithm 2 outputs xt0 such that ‖xt0 − x∗‖2 < ε. [sent-218, score-0.35]
75 A natural and practical question to ask here is: can the sample and computational complexity of the recovery algorithm be improved when k ≪ n? [sent-220, score-0.176]
76 Recently, [18] studied this problem for Gaussian A and showed that for ℓ1 regularized PhaseLift, m = O(k 2 log n) samples suffice for exact recovery of x∗ . [sent-221, score-0.183]
77 Then, the problem reduces to phase retrieval of a k-dimensional signal. [sent-225, score-0.457]
78 The following lemma shows that if the number of measurements is large enough, step 1 of SparseAltMinPhase recovers the support of x∗ correctly. [sent-229, score-0.261]
79 If ai are standard complex Gaussian random vectors and m > (c/(x∗_min)^4) log(n/δ), then Algorithm 3 recovers S with probability greater than 1 − δ, where x∗_min is the minimum non-zero entry of x∗. [sent-233, score-0.258]
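The support-estimation step admits a short sketch; we are not certain of the exact statistic Algorithm 3 thresholds, so this variant (our assumption) ranks coordinates by the energy (1/m) Σ_i y_i^2 |A_ji|^2, which is larger in expectation on the support:

    def estimate_support(A, y, k):
        # Score coordinate j by (1/m) sum_i y_i^2 |A[j, i]|^2 (our choice of
        # statistic; the paper's Algorithm 3 may use a closely related one).
        scores = (np.abs(A) ** 2) @ (y ** 2) / A.shape[1]
        return np.sort(np.argsort(scores)[-k:])  # indices of the k largest scores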
80 If the number of measurements m > c(k^2 log n + k log^2 k + k log(1/ε)), then Algorithm 3 will recover x∗ up to accuracy ε with probability greater than 1 − δ. [sent-243, score-0.324]
81 Figure 2 (panels (a), (b), (c)); (a) & (b): Sample and time complexity for successful recovery using random Gaussian illumination filters. [sent-244, score-0.218]
82 We also empirically demonstrate the advantage of our initialization procedure over random initialization (denoted by AltMin (random init)), which has thus far been considered in the literature [13, 11, 28, 4]. [sent-250, score-0.171]
83 Independent Random Gaussian Measurements: Each measurement vector ai is generated from the standard complex Gaussian distribution. [sent-258, score-0.249]
84 This measurement scheme was first suggested by [6] and, to date, this is the only scheme with theoretical guarantees. [sent-259, score-0.119]
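For completeness, a sketch of how such an ensemble can be generated in simulations (the 1/√2 normalization makes each entry standard complex normal; the dimensions are illustrative choices of ours, not values from the paper):

    rng = np.random.default_rng(0)
    n, m = 64, 6 * 64                    # illustrative dimensions
    x_star = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    A = (rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))) / np.sqrt(2)
    y = np.abs(A.conj().T @ x_star)      # noiseless magnitude measurements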
85 Multiple Random Illumination Filters: We now present our results for the setting where the measurements are obtained using multiple illumination filters; this setting was suggested by [4]. [sent-260, score-0.242]
86 Our measurements will then be the magnitudes of components of the vectors x(1) , · · · , x(J) . [sent-262, score-0.29]
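The sentence defining the vectors x(j) is lost in extraction; one common instantiation of random illumination (our assumption; the masks in [4] may differ) modulates x∗ element-wise with random phase masks d_j and takes the DFT of each modulated copy, reusing rng, n and x_star from the sketch above:

    J = 6                                            # number of filters (illustrative)
    masks = np.exp(2j * np.pi * rng.random((J, n)))  # random phase masks d_j
    # x^(j) = F(d_j * x*); the measurements are the magnitudes |x^(j)|.
    ys = [np.abs(np.fft.fft(masks[j] * x_star)) for j in range(J)]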
87 The above measurement scheme can be implemented by modulating the light beam or by the use of masks; see [4] for more details. [sent-263, score-0.119]
88 We again see that the measurement complexity of AltMinPhase is similar to that of PhaseCut and PhaseLift, but AltMinPhase is orders of magnitude faster than PhaseLift and PhaseCut. [sent-266, score-0.221]
89 Noisy Phase Retrieval: Finally, we study our method in the following noisy measurement scheme: yi = |⟨ai, x∗⟩ + wi| for i = 1, . . . , m, (5) [sent-267, score-0.235]
90 where wi is the noise in the i-th measurement and is sampled from N(0, σ^2). [sent-270, score-0.119]
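An end-to-end usage sketch under this noise model, reusing A, x_star, m and the helpers above (we draw complex Gaussian noise and pick σ arbitrarily; both are our illustrative choices, not the paper's experimental settings):

    sigma = 0.05
    w = sigma * (rng.standard_normal(m) + 1j * rng.standard_normal(m)) / np.sqrt(2)
    y_noisy = np.abs(A.conj().T @ x_star + w)        # model (5)
    x_hat = altminphase(A, y_noisy)
    print("dist(x_hat, x*) =", dist(x_hat, x_star))  # small, up to a global phase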
91 We observe that our method outperforms PhaseLift and has recovery error similar to that of PhaseCut. [sent-276, score-0.11]
92 Solving quadratic equations via PhaseLift when there are about as many equations as unknowns. [sent-313, score-0.462]
93 Phaselift: Exact and stable signal recovery from magnitude measurements via convex programming. [sent-321, score-0.449]
94 A practical algorithm for the determination of phase from image and diffraction plane pictures. [sent-368, score-0.311]
95 Sparse signal recovery from quadratic measurements via convex programming. [sent-396, score-0.387]
96 Invited article: A unified evaluation of iterative projection algorithms for phase retrieval. [sent-401, score-0.259]
97 Extending the methodology of x-ray crystallography to allow imaging of micrometre-sized non-crystalline specimens. [sent-408, score-0.11]
98 A method for the solution of the phase problem in electron microscopy. [sent-412, score-0.259]
99 Compressive phase retrieval from squared output measurements via semidefinite programming. [sent-421, score-0.657]
100 Mathematical considerations for the problem of fourier transform phase retrieval from magnitude. [sent-437, score-0.523]
wordName wordTfidf (topN-words)
[('phaselift', 0.462), ('altminphase', 0.369), ('dist', 0.281), ('phase', 0.259), ('phasecut', 0.231), ('measurements', 0.2), ('retrieval', 0.198), ('cn', 0.178), ('alternating', 0.14), ('measurement', 0.119), ('ph', 0.117), ('sdp', 0.112), ('recovery', 0.11), ('xt', 0.107), ('altmin', 0.092), ('sparsealtminphase', 0.092), ('magnitudes', 0.09), ('ai', 0.087), ('whp', 0.081), ('minimization', 0.071), ('initialization', 0.07), ('crystallography', 0.069), ('cy', 0.069), ('fienup', 0.069), ('gerchberg', 0.069), ('holography', 0.069), ('supp', 0.067), ('aat', 0.064), ('magnitude', 0.062), ('apriori', 0.061), ('arxiv', 0.06), ('def', 0.058), ('init', 0.05), ('eldar', 0.05), ('log', 0.048), ('convex', 0.047), ('ct', 0.047), ('josa', 0.046), ('saxton', 0.046), ('strohmer', 0.046), ('preprint', 0.046), ('austin', 0.045), ('correctness', 0.044), ('complex', 0.043), ('zij', 0.042), ('illumination', 0.042), ('fourier', 0.042), ('push', 0.041), ('imaging', 0.041), ('adat', 0.041), ('complexity', 0.04), ('iterated', 0.04), ('sanghavi', 0.039), ('guarantees', 0.039), ('success', 0.038), ('netrapalli', 0.038), ('candes', 0.037), ('lemma', 0.036), ('phases', 0.036), ('wlog', 0.035), ('matrix', 0.035), ('geometrically', 0.034), ('ece', 0.034), ('compressed', 0.033), ('gaussian', 0.033), ('trace', 0.033), ('resampled', 0.032), ('argminx', 0.031), ('optics', 0.031), ('empirically', 0.031), ('recovering', 0.03), ('india', 0.03), ('signal', 0.03), ('sensing', 0.03), ('analytical', 0.029), ('relaxation', 0.029), ('stress', 0.029), ('lifting', 0.029), ('sparse', 0.029), ('yi', 0.029), ('diag', 0.029), ('greater', 0.028), ('tx', 0.027), ('solving', 0.027), ('diagonal', 0.027), ('entry', 0.027), ('normal', 0.026), ('recover', 0.026), ('determination', 0.026), ('algorithm', 0.026), ('squares', 0.025), ('various', 0.025), ('recovers', 0.025), ('relaxing', 0.025), ('regularized', 0.025), ('regarding', 0.025), ('transform', 0.024), ('aij', 0.024), ('jain', 0.023), ('date', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 247 nips-2013-Phase Retrieval using Alternating Minimization
Author: Praneeth Netrapalli, Prateek Jain, Sujay Sanghavi
Abstract: Phase retrieval problems involve solving linear equations, but with missing sign (or phase, for complex numbers). Over the last two decades, a popular generic empirical approach to the many variants of this problem has been one of alternating minimization; i.e. alternating between estimating the missing phase information and the candidate solution. In this paper, we show that a simple alternating minimization algorithm geometrically converges to the solution of one such problem – finding a vector x from y, A, where y = |AT x| and |z| denotes a vector of element-wise magnitudes of z – under the assumption that A is Gaussian. Empirically, our algorithm performs similarly to recently proposed convex techniques for this variant (which are based on “lifting” to a convex matrix problem) in sample complexity and robustness to noise. However, our algorithm is much more efficient and can scale to large problems. Analytically, we show geometric convergence to the solution, and sample complexity that is off by log factors from obvious lower bounds. We also establish close to optimal scaling for the case when the unknown vector is sparse. Our work represents the only known theoretical guarantee for alternating minimization for any variant of phase retrieval problems in the non-convex setting.
2 0.13559929 188 nips-2013-Memory Limited, Streaming PCA
Author: Ioannis Mitliagkas, Constantine Caramanis, Prateek Jain
Abstract: We consider streaming, one-pass principal component analysis (PCA), in the highdimensional regime, with limited memory. Here, p-dimensional samples are presented sequentially, and the goal is to produce the k-dimensional subspace that best approximates these points. Standard algorithms require O(p2 ) memory; meanwhile no algorithm can do better than O(kp) memory, since this is what the output itself requires. Memory (or storage) complexity is most meaningful when understood in the context of computational and sample complexity. Sample complexity for high-dimensional PCA is typically studied in the setting of the spiked covariance model, where p-dimensional points are generated from a population covariance equal to the identity (white noise) plus a low-dimensional perturbation (the spike) which is the signal to be recovered. It is now well-understood that the spike can be recovered when the number of samples, n, scales proportionally with the dimension, p. Yet, all algorithms that provably achieve this, have memory complexity O(p2 ). Meanwhile, algorithms with memory-complexity O(kp) do not have provable bounds on sample complexity comparable to p. We present an algorithm that achieves both: it uses O(kp) memory (meaning storage of any kind) and is able to compute the k-dimensional spike with O(p log p) samplecomplexity – the first algorithm of its kind. While our theoretical analysis focuses on the spiked covariance model, our simulations show that our algorithm is successful on much more general models for the data. 1
3 0.12082994 304 nips-2013-Sparse nonnegative deconvolution for compressive calcium imaging: algorithms and phase transitions
Author: Eftychios A. Pnevmatikakis, Liam Paninski
Abstract: We propose a compressed sensing (CS) calcium imaging framework for monitoring large neuronal populations, where we image randomized projections of the spatial calcium concentration at each timestep, instead of measuring the concentration at individual locations. We develop scalable nonnegative deconvolution methods for extracting the neuronal spike time series from such observations. We also address the problem of demixing the spatial locations of the neurons using rank-penalized matrix factorization methods. By exploiting the sparsity of neural spiking we demonstrate that the number of measurements needed per timestep is significantly smaller than the total number of neurons, a result that can potentially enable imaging of larger populations at considerably faster rates compared to traditional raster-scanning techniques. Unlike traditional CS setups, our problem involves a block-diagonal sensing matrix and a non-orthogonal sparse basis that spans multiple timesteps. We provide tight approximations to the number of measurements needed for perfect deconvolution for certain classes of spiking processes, and show that this number undergoes a “phase transition,” which we characterize using modern tools relating conic geometry to compressed sensing. 1
4 0.10451426 59 nips-2013-Blind Calibration in Compressed Sensing using Message Passing Algorithms
Author: Christophe Schulke, Francesco Caltagirone, Florent Krzakala, Lenka Zdeborova
Abstract: Compressed sensing (CS) is a concept that allows to acquire compressible signals with a small number of measurements. As such it is very attractive for hardware implementations. Therefore, correct calibration of the hardware is a central issue. In this paper we study the so-called blind calibration, i.e. when the training signals that are available to perform the calibration are sparse but unknown. We extend the approximate message passing (AMP) algorithm used in CS to the case of blind calibration. In the calibration-AMP, both the gains on the sensors and the elements of the signals are treated as unknowns. Our algorithm is also applicable to settings in which the sensors distort the measurements in other ways than multiplication by a gain, unlike previously suggested blind calibration algorithms based on convex relaxations. We study numerically the phase diagram of the blind calibration problem, and show that even in cases where convex relaxation is possible, our algorithm requires a smaller number of measurements and/or signals in order to perform well. 1
5 0.1010721 222 nips-2013-On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization
Author: Ke Hou, Zirui Zhou, Anthony Man-Cho So, Zhi-Quan Luo
Abstract: Motivated by various applications in machine learning, the problem of minimizing a convex smooth loss function with trace norm regularization has received much attention lately. Currently, a popular method for solving such problem is the proximal gradient method (PGM), which is known to have a sublinear rate of convergence. In this paper, we show that for a large class of loss functions, the convergence rate of the PGM is in fact linear. Our result is established without any strong convexity assumption on the loss function. A key ingredient in our proof is a new Lipschitzian error bound for the aforementioned trace norm–regularized problem, which may be of independent interest.
6 0.097843915 269 nips-2013-Regression-tree Tuning in a Streaming Setting
7 0.092453912 88 nips-2013-Designed Measurements for Vector Count Data
8 0.085726902 113 nips-2013-Exact and Stable Recovery of Pairwise Interaction Tensors
9 0.079505511 179 nips-2013-Low-Rank Matrix and Tensor Completion via Adaptive Sampling
10 0.075190306 331 nips-2013-Top-Down Regularization of Deep Belief Networks
11 0.073811479 224 nips-2013-On the Sample Complexity of Subspace Learning
12 0.073052146 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models
13 0.066043362 62 nips-2013-Causal Inference on Time Series using Restricted Structural Equation Models
14 0.065961979 137 nips-2013-High-Dimensional Gaussian Process Bandits
15 0.064049967 185 nips-2013-Matrix Completion From any Given Set of Observations
16 0.063719787 70 nips-2013-Contrastive Learning Using Spectral Methods
17 0.063660048 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
18 0.06310226 194 nips-2013-Model Selection for High-Dimensional Regression under the Generalized Irrepresentability Condition
19 0.060951423 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints
20 0.060453471 306 nips-2013-Speeding up Permutation Testing in Neuroimaging
topicId topicWeight
[(0, 0.172), (1, 0.05), (2, 0.086), (3, 0.057), (4, -0.041), (5, -0.007), (6, -0.044), (7, 0.047), (8, -0.051), (9, -0.062), (10, 0.005), (11, -0.028), (12, -0.018), (13, 0.011), (14, -0.085), (15, -0.003), (16, 0.002), (17, 0.006), (18, -0.016), (19, 0.029), (20, -0.002), (21, 0.003), (22, 0.022), (23, -0.006), (24, -0.024), (25, -0.036), (26, -0.064), (27, -0.001), (28, 0.006), (29, -0.071), (30, 0.012), (31, -0.02), (32, -0.043), (33, 0.099), (34, 0.056), (35, -0.069), (36, 0.117), (37, 0.009), (38, 0.06), (39, -0.079), (40, 0.046), (41, 0.065), (42, -0.044), (43, -0.061), (44, 0.07), (45, -0.025), (46, 0.098), (47, -0.064), (48, -0.001), (49, -0.091)]
simIndex simValue paperId paperTitle
same-paper 1 0.91334403 247 nips-2013-Phase Retrieval using Alternating Minimization
Author: Praneeth Netrapalli, Prateek Jain, Sujay Sanghavi
Abstract: Phase retrieval problems involve solving linear equations, but with missing sign (or phase, for complex numbers). Over the last two decades, a popular generic empirical approach to the many variants of this problem has been one of alternating minimization; i.e. alternating between estimating the missing phase information and the candidate solution. In this paper, we show that a simple alternating minimization algorithm geometrically converges to the solution of one such problem – finding a vector x from y, A, where y = |AT x| and |z| denotes a vector of element-wise magnitudes of z – under the assumption that A is Gaussian. Empirically, our algorithm performs similarly to recently proposed convex techniques for this variant (which are based on “lifting” to a convex matrix problem) in sample complexity and robustness to noise. However, our algorithm is much more efficient and can scale to large problems. Analytically, we show geometric convergence to the solution, and sample complexity that is off by log factors from obvious lower bounds. We also establish close to optimal scaling for the case when the unknown vector is sparse. Our work represents the only known theoretical guarantee for alternating minimization for any variant of phase retrieval problems in the non-convex setting.
2 0.72032177 59 nips-2013-Blind Calibration in Compressed Sensing using Message Passing Algorithms
Author: Christophe Schulke, Francesco Caltagirone, Florent Krzakala, Lenka Zdeborova
Abstract: Compressed sensing (CS) is a concept that allows to acquire compressible signals with a small number of measurements. As such it is very attractive for hardware implementations. Therefore, correct calibration of the hardware is a central issue. In this paper we study the so-called blind calibration, i.e. when the training signals that are available to perform the calibration are sparse but unknown. We extend the approximate message passing (AMP) algorithm used in CS to the case of blind calibration. In the calibration-AMP, both the gains on the sensors and the elements of the signals are treated as unknowns. Our algorithm is also applicable to settings in which the sensors distort the measurements in other ways than multiplication by a gain, unlike previously suggested blind calibration algorithms based on convex relaxations. We study numerically the phase diagram of the blind calibration problem, and show that even in cases where convex relaxation is possible, our algorithm requires a smaller number of measurements and/or signals in order to perform well. 1
3 0.64154184 304 nips-2013-Sparse nonnegative deconvolution for compressive calcium imaging: algorithms and phase transitions
Author: Eftychios A. Pnevmatikakis, Liam Paninski
Abstract: We propose a compressed sensing (CS) calcium imaging framework for monitoring large neuronal populations, where we image randomized projections of the spatial calcium concentration at each timestep, instead of measuring the concentration at individual locations. We develop scalable nonnegative deconvolution methods for extracting the neuronal spike time series from such observations. We also address the problem of demixing the spatial locations of the neurons using rank-penalized matrix factorization methods. By exploiting the sparsity of neural spiking we demonstrate that the number of measurements needed per timestep is significantly smaller than the total number of neurons, a result that can potentially enable imaging of larger populations at considerably faster rates compared to traditional raster-scanning techniques. Unlike traditional CS setups, our problem involves a block-diagonal sensing matrix and a non-orthogonal sparse basis that spans multiple timesteps. We provide tight approximations to the number of measurements needed for perfect deconvolution for certain classes of spiking processes, and show that this number undergoes a “phase transition,” which we characterize using modern tools relating conic geometry to compressed sensing. 1
4 0.62715501 88 nips-2013-Designed Measurements for Vector Count Data
Author: Liming Wang, David Carlson, Miguel Rodrigues, David Wilcox, Robert Calderbank, Lawrence Carin
Abstract: We consider design of linear projection measurements for a vector Poisson signal model. The projections are performed on the vector Poisson rate, X ∈ Rn , and the + observed data are a vector of counts, Y ∈ Zm . The projection matrix is designed + by maximizing mutual information between Y and X, I(Y ; X). When there is a latent class label C ∈ {1, . . . , L} associated with X, we consider the mutual information with respect to Y and C, I(Y ; C). New analytic expressions for the gradient of I(Y ; X) and I(Y ; C) are presented, with gradient performed with respect to the measurement matrix. Connections are made to the more widely studied Gaussian measurement model. Example results are presented for compressive topic modeling of a document corpora (word counting), and hyperspectral compressive sensing for chemical classification (photon counting). 1
5 0.61734855 55 nips-2013-Bellman Error Based Feature Generation using Random Projections on Sparse Spaces
Author: Mahdi Milani Fard, Yuri Grinberg, Amir massoud Farahmand, Joelle Pineau, Doina Precup
Abstract: This paper addresses the problem of automatic generation of features for value function approximation in reinforcement learning. Bellman Error Basis Functions (BEBFs) have been shown to improve policy evaluation, with a convergence rate similar to that of value iteration. We propose a simple, fast and robust algorithm based on random projections, which generates BEBFs for sparse feature spaces. We provide a finite sample analysis of the proposed method, and prove that projections logarithmic in the dimension of the original space guarantee a contraction in the error. Empirical results demonstrate the strength of this method in domains in which choosing a good state representation is challenging. 1
6 0.56030291 354 nips-2013-When in Doubt, SWAP: High-Dimensional Sparse Recovery from Correlated Measurements
7 0.55948865 117 nips-2013-Fast Algorithms for Gaussian Noise Invariant Independent Component Analysis
8 0.54716253 321 nips-2013-Supervised Sparse Analysis and Synthesis Operators
9 0.53469771 269 nips-2013-Regression-tree Tuning in a Streaming Setting
10 0.53267097 131 nips-2013-Geometric optimisation on positive definite matrices for elliptically contoured distributions
11 0.5171513 342 nips-2013-Unsupervised Spectral Learning of Finite State Transducers
12 0.51256889 214 nips-2013-On Algorithms for Sparse Multi-factor NMF
13 0.50571173 188 nips-2013-Memory Limited, Streaming PCA
14 0.49875182 179 nips-2013-Low-Rank Matrix and Tensor Completion via Adaptive Sampling
15 0.498182 212 nips-2013-Non-Uniform Camera Shake Removal Using a Spatially-Adaptive Sparse Penalty
16 0.48215154 52 nips-2013-Bayesian inference as iterated random functions with applications to sequential inference in graphical models
17 0.48185116 186 nips-2013-Matrix factorization with binary components
18 0.48053938 91 nips-2013-Dirty Statistical Models
19 0.47923914 284 nips-2013-Robust Spatial Filtering with Beta Divergence
20 0.47515246 222 nips-2013-On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization
topicId topicWeight
[(2, 0.017), (16, 0.031), (33, 0.168), (34, 0.099), (36, 0.013), (38, 0.232), (41, 0.018), (49, 0.027), (56, 0.132), (70, 0.038), (85, 0.028), (89, 0.044), (93, 0.059), (95, 0.029)]
simIndex simValue paperId paperTitle
1 0.8302235 299 nips-2013-Solving inverse problem of Markov chain with partial observations
Author: Tetsuro Morimura, Takayuki Osogami, Tsuyoshi Ide
Abstract: The Markov chain is a convenient tool to represent the dynamics of complex systems such as traffic and social systems, where probabilistic transition takes place between internal states. A Markov chain is characterized by initial-state probabilities and a state-transition probability matrix. In the traditional setting, a major goal is to study properties of a Markov chain when those probabilities are known. This paper tackles an inverse version of the problem: we find those probabilities from partial observations at a limited number of states. The observations include the frequency of visiting a state and the rate of reaching a state from another. Practical examples of this task include traffic monitoring systems in cities, where we need to infer the traffic volume on single link on a road network from a limited number of observation points. We formulate this task as a regularized optimization problem, which is efficiently solved using the notion of natural gradient. Using synthetic and real-world data sets including city traffic monitoring data, we demonstrate the effectiveness of our method.
same-paper 2 0.80484527 247 nips-2013-Phase Retrieval using Alternating Minimization
Author: Praneeth Netrapalli, Prateek Jain, Sujay Sanghavi
Abstract: Phase retrieval problems involve solving linear equations, but with missing sign (or phase, for complex numbers). Over the last two decades, a popular generic empirical approach to the many variants of this problem has been one of alternating minimization; i.e. alternating between estimating the missing phase information and the candidate solution. In this paper, we show that a simple alternating minimization algorithm geometrically converges to the solution of one such problem – finding a vector x from y, A, where y = |AT x| and |z| denotes a vector of element-wise magnitudes of z – under the assumption that A is Gaussian. Empirically, our algorithm performs similarly to recently proposed convex techniques for this variant (which are based on “lifting” to a convex matrix problem) in sample complexity and robustness to noise. However, our algorithm is much more efficient and can scale to large problems. Analytically, we show geometric convergence to the solution, and sample complexity that is off by log factors from obvious lower bounds. We also establish close to optimal scaling for the case when the unknown vector is sparse. Our work represents the only known theoretical guarantee for alternating minimization for any variant of phase retrieval problems in the non-convex setting.
3 0.79807985 315 nips-2013-Stochastic Ratio Matching of RBMs for Sparse High-Dimensional Inputs
Author: Yann Dauphin, Yoshua Bengio
Abstract: Sparse high-dimensional data vectors are common in many application domains where a very large number of rarely non-zero features can be devised. Unfortunately, this creates a computational bottleneck for unsupervised feature learning algorithms such as those based on auto-encoders and RBMs, because they involve a reconstruction step where the whole input vector is predicted from the current feature values. An algorithm was recently developed to successfully handle the case of auto-encoders, based on an importance sampling scheme stochastically selecting which input elements to actually reconstruct during training for each particular example. To generalize this idea to RBMs, we propose a stochastic ratio-matching algorithm that inherits all the computational advantages and unbiasedness of the importance sampling scheme. We show that stochastic ratio matching is a good estimator, allowing the approach to beat the state-of-the-art on two bag-of-word text classification benchmarks (20 Newsgroups and RCV1), while keeping computational cost linear in the number of non-zeros. 1
4 0.72759902 333 nips-2013-Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent
Author: Tianbao Yang
Abstract: We present and study a distributed optimization algorithm by employing a stochastic dual coordinate ascent method. Stochastic dual coordinate ascent methods enjoy strong theoretical guarantees and often have better performances than stochastic gradient descent methods in optimizing regularized loss minimization problems. It still lacks of efforts in studying them in a distributed framework. We make a progress along the line by presenting a distributed stochastic dual coordinate ascent algorithm in a star network, with an analysis of the tradeoff between computation and communication. We verify our analysis by experiments on real data sets. Moreover, we compare the proposed algorithm with distributed stochastic gradient descent methods and distributed alternating direction methods of multipliers for optimizing SVMs in the same distributed framework, and observe competitive performances. 1
5 0.72709316 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables
Author: Cho-Jui Hsieh, Matyas A. Sustik, Inderjit Dhillon, Pradeep Ravikumar, Russell Poldrack
Abstract: The 1 -regularized Gaussian maximum likelihood estimator (MLE) has been shown to have strong statistical guarantees in recovering a sparse inverse covariance matrix even under high-dimensional settings. However, it requires solving a difficult non-smooth log-determinant program with number of parameters scaling quadratically with the number of Gaussian variables. State-of-the-art methods thus do not scale to problems with more than 20, 000 variables. In this paper, we develop an algorithm B IG QUIC, which can solve 1 million dimensional 1 regularized Gaussian MLE problems (which would thus have 1000 billion parameters) using a single machine, with bounded memory. In order to do so, we carefully exploit the underlying structure of the problem. Our innovations include a novel block-coordinate descent method with the blocks chosen via a clustering scheme to minimize repeated computations; and allowing for inexact computation of specific components. In spite of these modifications, we are able to theoretically analyze our procedure and show that B IG QUIC can achieve super-linear or even quadratic convergence rates. 1
6 0.72641879 318 nips-2013-Structured Learning via Logistic Regression
7 0.72552103 149 nips-2013-Latent Structured Active Learning
8 0.7255103 74 nips-2013-Convex Tensor Decomposition via Structured Schatten Norm Regularization
9 0.72522807 75 nips-2013-Convex Two-Layer Modeling
10 0.72494745 233 nips-2013-Online Robust PCA via Stochastic Optimization
11 0.72363883 142 nips-2013-Information-theoretic lower bounds for distributed statistical estimation with communication constraints
12 0.72339678 251 nips-2013-Predicting Parameters in Deep Learning
13 0.72329557 249 nips-2013-Polar Operators for Structured Sparse Estimation
14 0.72308457 304 nips-2013-Sparse nonnegative deconvolution for compressive calcium imaging: algorithms and phase transitions
15 0.72251791 44 nips-2013-B-test: A Non-parametric, Low Variance Kernel Two-sample Test
16 0.7222563 358 nips-2013-q-OCSVM: A q-Quantile Estimator for High-Dimensional Distributions
17 0.72216094 99 nips-2013-Dropout Training as Adaptive Regularization
18 0.72211134 186 nips-2013-Matrix factorization with binary components
19 0.72176176 15 nips-2013-A memory frontier for complex synapses
20 0.7215395 259 nips-2013-Provable Subspace Clustering: When LRR meets SSC