nips nips2005 nips2005-163 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Michael B. Wakin, Marco F. Duarte, Shriram Sarvotham, Dror Baron, Richard G. Baraniuk
Abstract: Compressed sensing is an emerging field based on the revelation that a small group of linear projections of a sparse signal contains enough information for reconstruction. In this paper we introduce a new theory for distributed compressed sensing (DCS) that enables new distributed coding algorithms for multi-signal ensembles that exploit both intra- and inter-signal correlation structures. The DCS theory rests on a new concept that we term the joint sparsity of a signal ensemble. We study three simple models for jointly sparse signals, propose algorithms for joint recovery of multiple signals from incoherent projections, and characterize theoretically and empirically the number of measurements per sensor required for accurate reconstruction. In some sense DCS is a framework for distributed compression of sources with memory, which has remained a challenging problem in information theory for some time. DCS is immediately applicable to a range of problems in sensor networks and arrays. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Compressed sensing is an emerging field based on the revelation that a small group of linear projections of a sparse signal contains enough information for reconstruction. [sent-9, score-0.622]
2 In this paper we introduce a new theory for distributed compressed sensing (DCS) that enables new distributed coding algorithms for multi-signal ensembles that exploit both intra- and inter-signal correlation structures. [sent-10, score-0.393]
3 The DCS theory rests on a new concept that we term the joint sparsity of a signal ensemble. [sent-11, score-0.37]
4 We study three simple models for jointly sparse signals, propose algorithms for joint recovery of multiple signals from incoherent projections, and characterize theoretically and empirically the number of measurements per sensor required for accurate reconstruction. [sent-12, score-1.336]
5 In such a setting, distributed source coding [8, 13, 14, 17] may allow the sensors to save on communication costs. [sent-18, score-0.253]
6 Because sensor networks and arrays rely on data that often exhibit strong spatial correlations [13, 17], distributed compression can reduce the communication costs substantially, thus enhancing battery life. [sent-20, score-0.285]
7 We propose a new approach for distributed coding of correlated sources whose signal correlations take the form of a sparse structure. [sent-22, score-0.582]
8 This builds on the work of Candès et al. [4] and Donoho [9], who showed that signals that are sparse relative to a known basis can be recovered from a small number of nonadaptive linear projections onto a second basis that is incoherent with the first. [sent-25, score-0.742]
9 Hence CS with random projections is universal — the signals can be reconstructed if they are sparse relative to any known basis. [sent-27, score-0.432]
10 The implications of CS for signal acquisition and compression are very promising. [sent-28, score-0.265]
11 In our framework for distributed compressed sensing (DCS), this advantage is particularly compelling. [sent-30, score-0.25]
12 In a typical DCS scenario, a number of sensors measure signals that are each individually sparse in some basis and also correlated from sensor to sensor. [sent-31, score-0.67]
13 Each sensor independently encodes its signal by projecting it onto another, incoherent basis (such as a random one) and then transmits just a few of the resulting coefficients to a single collection point. [sent-32, score-0.562]
14 Under the right conditions, a decoder at the collection point can reconstruct each of the signals precisely. [sent-33, score-0.301]
15 The DCS theory rests on a concept that we term the joint sparsity of a signal ensemble. [sent-34, score-0.37]
16 We study in detail three simple models for jointly sparse signals, propose tractable algorithms for joint recovery of signal ensembles from incoherent projections, and characterize theoretically and empirically the number of measurements per sensor required for reconstruction. [sent-35, score-1.425]
17 While the sensors operate entirely without collaboration, joint decoding can recover signals using far fewer measurements per sensor than would be required for separable CS recovery. [sent-36, score-0.925]
18 2 Sparse Signal Recovery from Incoherent Projections In the traditional CS setting, we consider a single signal x ∈ RN , which we assume to be sparse in a known orthonormal basis or frame Ψ = [ψ1 , ψ2 , . . . , ψN ]. [sent-38, score-0.467]
19 The signal x is observed indirectly via an M × N measurement matrix Φ, where M < N . [sent-43, score-0.339]
20 The M rows of Φ are the measurement vectors, against which the signal is projected. [sent-45, score-0.362]
21 In principle, remarkably few random measurements are required to recover a K-sparse signal via ℓ0 minimization. [sent-55, score-0.608]
22 Clearly, more than K measurements must be taken to avoid ambiguity; in theory, K + 1 random measurements will suffice [2]. [sent-56, score-0.606]
23 The amazing revelation that supports the CS theory is that a much simpler problem yields an equivalent solution (thanks again to the incoherence of the bases): we need only solve the convex program min ‖θ‖1 subject to y = ΦΨθ. (The ℓ0 "norm" ‖θ‖0 merely counts the number of nonzero entries in the vector θ.) [sent-58, score-0.243]
24 There is no free lunch, however; more than K + 1 measurements will be required in order to recover sparse signals. [sent-64, score-0.575]
25 In general, there exists a constant oversampling factor c = c(K, N ) such that cK measurements suffice to recover x with very high probability [4, 9]. [sent-65, score-0.353]
26 We exploit both BP and greedy algorithms for recovering jointly sparse signals. [sent-69, score-0.296]
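To make the greedy route concrete, the following is a minimal sketch (not the code used in the paper) of single-signal recovery by Orthogonal Matching Pursuit from M random Gaussian measurements, assuming the sparsity basis is the identity (Ψ = I) and using an illustrative oversampling factor c = 4.

```python
import numpy as np

def omp(Phi, y, K):
    """Greedy Orthogonal Matching Pursuit sketch: recover a K-sparse vector
    theta from measurements y = Phi @ theta."""
    M, N = Phi.shape
    residual = y.copy()
    support = []
    theta = np.zeros(N)
    for _ in range(K):
        # Pick the column most correlated with the current residual.
        n = int(np.argmax(np.abs(Phi.T @ residual)))
        if n not in support:
            support.append(n)
        # Least-squares fit on the selected support, then update the residual.
        coeffs, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        theta = np.zeros(N)
        theta[support] = coeffs
        residual = y - Phi @ theta
    return theta

# Toy example: N = 1000, K = 10, M = c*K measurements with c = 4 (illustrative).
rng = np.random.default_rng(0)
N, K, M = 1000, 10, 40
theta_true = np.zeros(N)
theta_true[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # random incoherent measurements
y = Phi @ theta_true
print(np.allclose(omp(Phi, y, K), theta_true, atol=1e-6))
```

Basis Pursuit would replace the greedy loop with the ℓ1 program above, typically solved as a linear program at higher cost but with stronger guarantees.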
27 3 Joint Sparsity Models In this section, we generalize the notion of a signal being sparse in some basis to the notion of an ensemble of signals being jointly sparse. [sent-70, score-0.783]
28 We use the following notation for our signal ensembles and measurement model. [sent-74, score-0.38]
29 Denote the signals in the ensemble by xj , j ∈ {1, 2, . [sent-75, score-0.337]
30 We assume that there exists a known sparse basis Ψ for RN in which the xj can be sparsely represented. [sent-79, score-0.338]
31 Denote by Φj the measurement matrix for signal j; Φj is Mj × N and, in general, the entries of Φj are different for each j. [sent-80, score-0.373]
32 Thus, yj = Φj xj consists of Mj < N incoherent measurements of xj . [sent-81, score-0.699]
33 In the first model (JSM-1), all signals share a common sparse component while each individual signal contains a sparse innovation component; that is, xj = zC + zj , j ∈ {1, 2, . . . , J}. [sent-83, score-1.285]
34 Thus, the signal zC is common to all of the xj and has sparsity K in basis Ψ. [sent-87, score-0.517]
35 The signals zj are the unique portions of the xj and have sparsity Kj in the same basis. [sent-88, score-0.518]
36 Global factors, such as the sun and prevailing winds, could have an effect zC that is both common to all sensors and structured enough to permit sparse representation. [sent-91, score-0.343]
37 More local factors, such as shade, water, or animals, could contribute localized innovations zj that are also structured (and hence sparse). [sent-92, score-0.418]
38 In the second model (JSM-2), all signals are constructed from the same sparse set of basis vectors, but with different coefficients; that is, xj = Ψθj , j ∈ {1, 2, . . . , J}. [sent-95, score-0.51]
39 Hence, all signals have ℓ0 sparsity of K, and all are constructed from the same K basis elements, but with arbitrarily different coefficients. [sent-102, score-0.32]
40 A practical situation well-modeled by JSM-2 is where multiple sensors acquire the same signal but with phase shifts and attenuations caused by signal propagation. [sent-103, score-0.596]
41 The third model (JSM-3) extends JSM-1 so that the common component need no longer be sparse in any basis; that is, xj = zC + zj , j ∈ {1, 2, . . . , J}, [sent-107, score-0.547]
42 with zC = ΨθC and zj = Ψθj , ‖θj ‖0 = Kj , but zC is not necessarily sparse in the basis Ψ. [sent-110, score-0.414]
43 We also consider the case where the supports of the innovations are shared for all signals, which extends JSM-2. [sent-111, score-0.351]
44 A practical situation well-modeled by JSM-3 is where several sources are recorded by different sensors together with a background signal that is not sparse in any basis. [sent-112, score-0.531]
45 For example, it motivates the compression of data such as video, where the innovations or differences between video frames may be sparse, even though a single frame may not be very sparse. [sent-117, score-0.296]
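The three joint sparsity models are easy to instantiate synthetically; the sketch below (with Ψ = I and illustrative sizes, not the data used in the experiments later) generates toy JSM-1, JSM-2, and JSM-3 ensembles and takes incoherent measurements yj = Φj xj of the JSM-3 ensemble.

```python
import numpy as np

rng = np.random.default_rng(1)
N, J = 50, 4            # signal length and ensemble size (illustrative)
K, Kj = 5, 3            # common and innovation sparsities (illustrative)
Mj = 20                 # measurements per sensor (illustrative)

def sparse_vec(N, K):
    """Random vector with K nonzero Gaussian entries at random locations."""
    v = np.zeros(N)
    v[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
    return v

# JSM-1: sparse common component plus per-signal sparse innovations.
zC = sparse_vec(N, K)
x_jsm1 = [zC + sparse_vec(N, Kj) for _ in range(J)]

# JSM-2: common sparse support, arbitrarily different coefficients.
Omega = rng.choice(N, K, replace=False)
x_jsm2 = []
for _ in range(J):
    x = np.zeros(N)
    x[Omega] = rng.standard_normal(K)
    x_jsm2.append(x)

# JSM-3: nonsparse common component plus sparse innovations.
zC_dense = rng.standard_normal(N)
x_jsm3 = [zC_dense + sparse_vec(N, Kj) for _ in range(J)]

# Each sensor takes Mj incoherent (random Gaussian) measurements.
Phis = [rng.standard_normal((Mj, N)) / np.sqrt(Mj) for _ in range(J)]
y_jsm3 = [Phi @ x for Phi, x in zip(Phis, x_jsm3)]
```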
46 4 Recovery of Jointly Sparse Signals In a setting where a network or array of sensors may encounter a collection of jointly sparse signals, and where a centralized reconstruction algorithm is feasible, the number of incoherent measurements required by each sensor can be reduced. [sent-119, score-1.099]
47 For each JSM, we propose algorithms for joint signal recovery from incoherent projections and characterize theoretically and empirically the number of measurements per sensor required for accurate reconstruction. [sent-120, score-1.2]
48 1 JSM-1: Sparse common component + innovations For this model (see also [1, 2]), we have proposed an analytical framework inspired by the principles of information theory. [sent-123, score-0.353]
49 This allows us to characterize the measurement rates Mj required to jointly reconstruct the signals xj . [sent-124, score-0.632]
50 In addition, given the (K + K1 )c measurements for x1 as side information, and assuming that the partitioning of x1 into zC and z1 is known, cK2 measurements that describe z2 should allow reconstruction of x2 . [sent-129, score-0.73]
51 We have also established upper bounds on the required measurement rates Mj by proposing a specific algorithm for reconstruction [1]. [sent-132, score-0.314]
52 The algorithm uses carefully designed measurement matrices Φj (in which some rows are identical and some differ) so that the resulting measurements can be combined to allow step-by-step recovery of the sparse components. [sent-133, score-0.802]
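One way to realize such a design, sketched below under the assumption of Gaussian rows, is to give every sensor an identical block of shared rows (so the corresponding measurements can be combined across sensors) and draw the remaining rows independently; the function name and the M_common/M_unique split are illustrative and not necessarily the exact construction of [1].

```python
import numpy as np

def jsm1_measurement_matrices(N, J, M_common, M_unique, rng):
    """Measurement matrices whose first M_common rows are identical across
    all J sensors while the remaining M_unique rows differ per sensor."""
    scale = np.sqrt(M_common + M_unique)
    shared = rng.standard_normal((M_common, N)) / scale
    Phis = []
    for _ in range(J):
        unique = rng.standard_normal((M_unique, N)) / scale
        Phis.append(np.vstack([shared, unique]))
    return Phis
```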
53 The theoretical rates Mj are below those required for separable CS recovery of each signal xj (see Fig. 1(a)). [sent-134, score-0.578]
54 Figure 1: (a) Converse bounds and achievable measurement rates for J = 2 signals with common sparse component and sparse innovations (JSM-1). [sent-162, score-1.577]
55 We fix signal lengths N = 1000 and sparsities K = 200, K1 = K2 = 50. [sent-163, score-0.253]
56 The measurement rates Rj := Mj /N reflect the number of measurements normalized by the signal length. [sent-164, score-0.678]
57 Blue curves indicate our theoretical and anticipated converse bounds; red indicates a provably achievable region, and pink denotes the rates required for separable CS signal reconstruction. [sent-165, score-0.43]
58 (b) Reconstructing a signal ensemble with common sparse supports (JSM-2). [sent-166, score-0.639]
59 We plot the probability of perfect reconstruction via DCS-SOMP (solid lines) and independent CS reconstruction (dashed lines) as a function of the number of measurements per signal M and the number of signals J . [sent-167, score-1.026]
60 We fix the signal length to N = 50 and the sparsity to K = 5. [sent-168, score-0.308]
61 An oracle encoder that knows the positions of the large coefficients would use 5 measurements per signal. [sent-169, score-0.383]
62 2 JSM-2: Common sparse supports Under the JSM-2 signal ensemble model (see also [2, 11]), independent recovery of each signal via ℓ1 minimization would require cK measurements per signal. [sent-175, score-1.329]
63 To establish a theoretical justification for our approach, we first proposed a simple One-Step Greedy Algorithm (OSGA) [11] that combines all of the measurements and seeks the largest correlations with the columns of the matrices Φj Ψ. [sent-179, score-0.347]
64 If the entries of the Φj and the nonzero coefficients in the θj are i.i.d. Gaussian, then with M ≥ 1 measurements per signal, OSGA recovers Ω with probability approaching 1 as J → ∞. [sent-186, score-0.414]
65 Moreover, with M ≥ K measurements per signal, OSGA recovers all xj with probability approaching 1 as J → ∞. [sent-187, score-0.505]
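A minimal sketch of this one-step strategy follows (with Ψ = I, so the columns of Φj Ψ are simply the columns of Φj): each index is scored by summing its squared correlations with the measurements across the ensemble, and the K best-scoring indices form the common support estimate; the exact statistic and analysis of OSGA in [11] may differ in details.

```python
import numpy as np

def osga_support(ys, Phis, K):
    """Estimate the common sparse support shared by a JSM-2 ensemble by
    summing squared correlations across sensors (one-step greedy sketch)."""
    N = Phis[0].shape[1]
    scores = np.zeros(N)
    for y, Phi in zip(ys, Phis):
        scores += (Phi.T @ y) ** 2      # correlation of y with each column
    return np.argsort(scores)[-K:]      # indices of the K largest scores

def recover_on_support(y, Phi, support, N):
    """Least-squares recovery of one signal once the support is known."""
    coeffs, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
    x = np.zeros(N)
    x[support] = coeffs
    return x
```

Once the common support is identified, each xj follows from the small least-squares fit above, which is why M ≥ K measurements per signal suffice in the second statement of the theorem.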
66 We fix the signal lengths at N = 50 and the sparsity of each signal to K = 5. [sent-193, score-0.528]
67 With DCS-SOMP, for perfect reconstruction of all signals the average number of measurements per signal decreases as a function of J. [sent-194, score-0.902]
68 The trend suggests that, for very large J, close to K measurements per signal should suffice. [sent-195, score-0.578]
69 In contrast, with independent CS reconstruction, for perfect reconstruction of all signals the number of measurements per sensor increases as a function of J. [sent-196, score-0.811]
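DCS-SOMP itself is specified in [2, 11]; the simplified simultaneous pursuit below only captures its main idea: at each iteration the residuals of all sensors jointly vote for the next support index, so a single support is built for the whole ensemble rather than once per signal. Ψ = I and the fixed iteration count are illustrative simplifications, not the exact algorithm.

```python
import numpy as np

def somp(ys, Phis, K):
    """Simplified simultaneous OMP for signals sharing a sparse support
    (sketch in the spirit of DCS-SOMP): ys[j] = Phis[j] @ x[j]."""
    J, N = len(ys), Phis[0].shape[1]
    residuals = [y.copy() for y in ys]
    support = []
    for _ in range(K):
        # All sensors vote: sum absolute correlations across the ensemble.
        votes = np.zeros(N)
        for r, Phi in zip(residuals, Phis):
            votes += np.abs(Phi.T @ r)
        support.append(int(np.argmax(votes)))
        # Update each sensor's least-squares estimate on the shared support.
        for j in range(J):
            coeffs, *_ = np.linalg.lstsq(Phis[j][:, support], ys[j], rcond=None)
            residuals[j] = ys[j] - Phis[j][:, support] @ coeffs
    xs = []
    for j in range(J):
        coeffs, *_ = np.linalg.lstsq(Phis[j][:, support], ys[j], rcond=None)
        x = np.zeros(N)
        x[support] = coeffs
        xs.append(x)
    return xs
```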
70 3 JSM-3: Nonsparse common + sparse innovations The JSM-3 signal ensemble model provides a particularly compelling motivation for joint recovery. [sent-200, score-0.825]
71 Under this model, no individual signal xj is sparse, and so separate signal recovery would require fully N measurements per signal. [sent-201, score-1.059]
72 Our recovery algorithms are based on the observation that if the common component zC were known, then each innovation zj could be estimated using the standard single-signal CS machinery on the adjusted measurements yj −Φj zC = Φj zj . [sent-203, score-1.151]
73 Since zC is not sparse, it cannot be recovered via CS techniques, but when the number of measurements is sufficiently large (∑j Mj ≫ N ), zC can be estimated using standard tools from linear algebra. [sent-206, score-0.362]
74 In the limit, zC can be recovered while still allowing each sensor to operate at the minimum measurement rate dictated by the {zj }. [sent-208, score-0.336]
75 Estimate measurements generated by innovations: Using the previous estimate, subtract the contribution of the common part from the measurements and generate estimates for the measurements caused by the innovations for each signal: yj = yj − Φj zC . [sent-220, score-1.435]
76 Reconstruct innovations: Using a standard single-signal CS reconstruction algorithm, obtain estimates of the innovations zj from the estimated innovation measurements yj . [sent-222, score-1.087]
77 Obtain signal estimates: Sum the above estimates, letting xj = zC + zj . [sent-224, score-0.478]
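A minimal sketch of this estimate-and-cancel strategy is given below; the common component is estimated here by ordinary least squares on the stacked system, a stand-in for the specific TECC estimator of [2], the innovations are recovered with the `omp` sketch from Section 2, and Ψ = I is assumed throughout.

```python
import numpy as np

def tecc_like_recovery(ys, Phis, Kj):
    """Estimate-and-cancel recovery sketch for JSM-3 ensembles."""
    N = Phis[0].shape[1]
    # 1. Estimate the nonsparse common component from all measurements jointly
    #    (requires the total number of measurements sum_j Mj to be large).
    Phi_all = np.vstack(Phis)
    y_all = np.concatenate(ys)
    zC_hat, *_ = np.linalg.lstsq(Phi_all, y_all, rcond=None)
    # 2-3. Cancel the common part and recover each sparse innovation with a
    #      standard single-signal CS algorithm (the omp sketch from Section 2).
    xs = []
    for y, Phi in zip(ys, Phis):
        y_innov = y - Phi @ zC_hat
        zj_hat = omp(Phi, y_innov, Kj)
        xs.append(zC_hat + zj_hat)       # 4. sum the two estimates
    return xs
```

For moderate J the least-squares estimate of zC is only approximate, since the innovations perturb the stacked system; this is exactly the situation the ACIE refinement below is designed to handle.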
78 The following theorem shows that asymptotically, by using the TECC algorithm, each sensor need only measure at the rate dictated by the sparsity Kj . [sent-225, score-0.246]
79 Theorem 1 [2] Assume that the nonzero expansion coefficients of the sparse innovations zj are i.i.d. Gaussian. [sent-226, score-0.635]
80 Then each signal xj can be recovered using the TECC algorithm with probability approaching 1 as J → ∞. [sent-238, score-0.405]
81 Then with probability 1, the signal xj cannot be uniquely recovered by any algorithm for any J. [sent-244, score-0.37]
82 For large J, the measurement rates permitted by Statement 1 are the lowest possible for any reconstruction strategy on JSM-3 signals, even neglecting the presence of the nonsparse component. [sent-245, score-0.363]
83 For example, suppose in the TECC algorithm that the initial estimate zC is not accurate enough to enable correct identification of the sparse innovation supports {Ωj }. [sent-250, score-0.476]
84 In such a case, it may still be possible for a rough approximation of the innovations {zj } to help refine the estimate zC . [sent-251, score-0.281]
85 The Alternating Common and Innovation Estimation (ACIE) algorithm exploits the observation that once the basis vectors comprising the innovation zj have been identified in the index set Ωj , their effect on the measurements yj can be removed to aid in estimating zC . [sent-254, score-0.772]
86 Remove the projection of the measurements onto the aforementioned span to obtain measurements caused exclusively by vectors not in Ωj : letting the columns of Qj span the orthogonal complement, set yj = Qj⊤ yj and Φj = Qj⊤ Φj . [sent-263, score-0.822]
87 Stack the projected matrices and measurements into Φ = [Φ1⊤ Φ2⊤ · · · ΦJ⊤ ]⊤ and Y = [y1⊤ y2⊤ · · · yJ⊤ ]⊤ to refine the estimate of the measurements caused by the common part of the signal, setting zC = Φ† Y , where A† = (A⊤ A)−1 A⊤ denotes the pseudoinverse of matrix A. [sent-270, score-0.42]
88 Estimate innovation supports: For each signal j, subtract zC from the measurements, yj = yj − Φj zC , and estimate the sparse support of each innovation Ωj . [sent-272, score-0.943]
89 Estimate innovation coefficients: For each signal j, estimate the coefficients for the indices in Ωj , setting θj,Ωj = Φ†j,Ωj (yj − Φj zC ), where θj,Ωj is a sampled version of the innovation's sparse coefficient vector estimate θj . [sent-277, score-0.626]
90 Reconstruct signals: Estimate each signal as xj = zC + zj = zC + Ψθj . [sent-279, score-0.478]
91 In the case where the innovation support estimate is correct (Ωj = Ωj ), the measurements yj will describe only the common component zC . [sent-280, score-0.677]
92 If this is true for every signal j and the number of remaining measurements satisfies ∑j (Mj − Kj ) ≥ N , then zC can be perfectly recovered in Step 2. [sent-281, score-0.582]
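The sketch below implements one such alternation, assuming Ψ = I. The support-estimation step uses a simple largest-correlations proxy where ACIE would invoke a full single-signal CS support estimate, and Qj is built from an SVD of the columns of Φj indexed by Ωj; calling the function repeatedly alternates between refining zC and refining the innovation supports.

```python
import numpy as np

def acie_iteration(ys, Phis, Omegas, Kj):
    """One Alternating Common and Innovation Estimation pass (sketch).
    Omegas[j] is the current estimate of innovation j's support."""
    N = Phis[0].shape[1]
    # Step 1: project each sensor's measurements onto the orthogonal
    # complement of span(Phi_j[:, Omega_j]) to suppress its innovation.
    y_tilde, Phi_tilde = [], []
    for y, Phi, Omega in zip(ys, Phis, Omegas):
        B = Phi[:, list(Omega)]
        U, _, _ = np.linalg.svd(B, full_matrices=True)
        Q = U[:, B.shape[1]:]            # orthonormal complement basis
        y_tilde.append(Q.T @ y)
        Phi_tilde.append(Q.T @ Phi)
    # Step 2: re-estimate the common component from the projected system.
    zC_hat = np.linalg.pinv(np.vstack(Phi_tilde)) @ np.concatenate(y_tilde)
    # Steps 3-5: re-estimate each innovation's support and coefficients,
    # then reconstruct the signals.
    new_Omegas, xs = [], []
    for y, Phi in zip(ys, Phis):
        y_innov = y - Phi @ zC_hat
        Omega = np.argsort(np.abs(Phi.T @ y_innov))[-Kj:]   # support proxy
        zj_hat = np.zeros(N)
        zj_hat[Omega] = np.linalg.pinv(Phi[:, Omega]) @ y_innov
        new_Omegas.append(Omega)
        xs.append(zC_hat + zj_hat)
    return zC_hat, new_Omegas, xs
```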
93 Fig. 2(a) shows that, for sufficiently large J, we can recover all of the signals with significantly fewer than N measurements per signal. [sent-284, score-0.602]
94 We believe this is inevitable, because even if zC were known without error, then perfect ensemble recovery would require the successful execution of J independent runs of OMP. [sent-287, score-0.272]
95 We believe this is due to the fact that initial errors in estimating zC may tend to be somewhat sparse (since zC roughly becomes an average of the signals {xj }), and these sparse errors can mislead the subsequent OMP processes. [sent-289, score-0.546]
96 Fig. 2(b) shows that when the sparse innovations share common supports we see even greater savings. [sent-293, score-0.596]
97 As a point of reference, a traditional approach to signal encoding would require 1600 total measurements to reconstruct these J = 32 nonsparse signals of length N = 50. [sent-294, score-0.864]
98 Figure 2: Reconstructing a signal ensemble with nonsparse common component and sparse innovations (JSM-3) using ACIE. [sent-314, score-0.777]
99 Stable signal recovery from incomplete and inaccurate measurements. [sent-356, score-0.39]
100 Near optimal signal recovery from random projections and universal encoding strategies. [sent-363, score-0.463]
wordName wordTfidf (topN-words)
[('zc', 0.465), ('measurements', 0.303), ('innovations', 0.251), ('signal', 0.22), ('sparse', 0.187), ('mj', 0.178), ('signals', 0.172), ('recovery', 0.17), ('cs', 0.17), ('zj', 0.167), ('innovation', 0.159), ('dcs', 0.134), ('incoherent', 0.131), ('sensor', 0.129), ('reconstruction', 0.124), ('measurement', 0.119), ('compressed', 0.111), ('cand', 0.1), ('supports', 0.1), ('sensors', 0.098), ('xj', 0.091), ('sparsity', 0.088), ('kj', 0.087), ('reconstruct', 0.085), ('duarte', 0.084), ('nonsparse', 0.084), ('tecc', 0.084), ('wakin', 0.084), ('yj', 0.083), ('sensing', 0.082), ('ensemble', 0.074), ('projections', 0.073), ('omp', 0.073), ('jointly', 0.07), ('acie', 0.067), ('osga', 0.067), ('sarvotham', 0.067), ('ece', 0.066), ('basis', 0.06), ('recovered', 0.059), ('rice', 0.058), ('baron', 0.058), ('common', 0.058), ('distributed', 0.057), ('coef', 0.057), ('cients', 0.056), ('per', 0.055), ('incoherence', 0.05), ('jsms', 0.05), ('recover', 0.05), ('converse', 0.05), ('compression', 0.045), ('coding', 0.045), ('component', 0.044), ('tropp', 0.044), ('ce', 0.041), ('ensembles', 0.041), ('pursuit', 0.039), ('greedy', 0.039), ('donoho', 0.037), ('rates', 0.036), ('achievable', 0.036), ('joint', 0.035), ('required', 0.035), ('approaching', 0.035), ('entries', 0.034), ('sparsities', 0.033), ('communication', 0.031), ('emerging', 0.031), ('estimate', 0.03), ('nonzero', 0.03), ('acquire', 0.029), ('dictated', 0.029), ('revelation', 0.029), ('caused', 0.029), ('perfect', 0.028), ('rests', 0.027), ('anticipated', 0.027), ('separable', 0.026), ('sources', 0.026), ('encoder', 0.025), ('theoretically', 0.025), ('correlated', 0.024), ('characterize', 0.024), ('orthogonal', 0.023), ('production', 0.023), ('rows', 0.023), ('correlations', 0.023), ('source', 0.022), ('rn', 0.022), ('collection', 0.022), ('decoder', 0.022), ('subtract', 0.022), ('meets', 0.022), ('suf', 0.022), ('fewer', 0.022), ('recovers', 0.021), ('device', 0.021), ('columns', 0.021), ('span', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999976 163 nips-2005-Recovery of Jointly Sparse Signals from Few Random Projections
Author: Michael B. Wakin, Marco F. Duarte, Shriram Sarvotham, Dror Baron, Richard G. Baraniuk
Abstract: Compressed sensing is an emerging field based on the revelation that a small group of linear projections of a sparse signal contains enough information for reconstruction. In this paper we introduce a new theory for distributed compressed sensing (DCS) that enables new distributed coding algorithms for multi-signal ensembles that exploit both intra- and inter-signal correlation structures. The DCS theory rests on a new concept that we term the joint sparsity of a signal ensemble. We study three simple models for jointly sparse signals, propose algorithms for joint recovery of multiple signals from incoherent projections, and characterize theoretically and empirically the number of measurements per sensor required for accurate reconstruction. In some sense DCS is a framework for distributed compression of sources with memory, which has remained a challenging problem in information theory for some time. DCS is immediately applicable to a range of problems in sensor networks and arrays. 1
2 0.095812649 162 nips-2005-Rate Distortion Codes in Sensor Networks: A System-level Analysis
Author: Tatsuto Murayama, Peter Davis
Abstract: This paper provides a system-level analysis of a scalable distributed sensing model for networked sensors. In our system model, a data center acquires data from a bunch of L sensors which each independently encode their noisy observations of an original binary sequence, and transmit their encoded data sequences to the data center at a combined rate R, which is limited. Supposing that the sensors use independent LDGM rate distortion codes, we show that the system performance can be evaluated for any given finite R when the number of sensors L goes to infinity. The analysis shows how the optimal strategy for the distributed sensing problem changes at critical values of the data rate R or the noise level. 1
3 0.095341146 43 nips-2005-Comparing the Effects of Different Weight Distributions on Finding Sparse Representations
Author: Bhaskar D. Rao, David P. Wipf
Abstract: Given a redundant dictionary of basis vectors (or atoms), our goal is to find maximally sparse representations of signals. Previously, we have argued that a sparse Bayesian learning (SBL) framework is particularly well-suited for this task, showing that it has far fewer local minima than other Bayesian-inspired strategies. In this paper, we provide further evidence for this claim by proving a restricted equivalence condition, based on the distribution of the nonzero generating model weights, whereby the SBL solution will equal the maximally sparse representation. We also prove that if these nonzero weights are drawn from an approximate Jeffreys prior, then with probability approaching one, our equivalence condition is satisfied. Finally, we motivate the worst-case scenario for SBL and demonstrate that it is still better than the most widely used sparse representation algorithms. These include Basis Pursuit (BP), which is based on a convex relaxation of the ℓ0 (quasi)-norm, and Orthogonal Matching Pursuit (OMP), a simple greedy strategy that iteratively selects basis vectors most aligned with the current residual. 1
4 0.089585148 101 nips-2005-Is Early Vision Optimized for Extracting Higher-order Dependencies?
Author: Yan Karklin, Michael S. Lewicki
Abstract: Linear implementations of the efficient coding hypothesis, such as independent component analysis (ICA) and sparse coding models, have provided functional explanations for properties of simple cells in V1 [1, 2]. These models, however, ignore the non-linear behavior of neurons and fail to match individual and population properties of neural receptive fields in subtle but important ways. Hierarchical models, including Gaussian Scale Mixtures [3, 4] and other generative statistical models [5, 6], can capture higher-order regularities in natural images and explain nonlinear aspects of neural processing such as normalization and context effects [6,7]. Previously, it had been assumed that the lower level representation is independent of the hierarchy, and had been fixed when training these models. Here we examine the optimal lower-level representations derived in the context of a hierarchical model and find that the resulting representations are strikingly different from those based on linear models. Unlike the the basis functions and filters learned by ICA or sparse coding, these functions individually more closely resemble simple cell receptive fields and collectively span a broad range of spatial scales. Our work unifies several related approaches and observations about natural image structure and suggests that hierarchical models might yield better representations of image structure throughout the hierarchy.
5 0.088899165 20 nips-2005-Affine Structure From Sound
Author: Sebastian Thrun
Abstract: We consider the problem of localizing a set of microphones together with a set of external acoustic events (e.g., hand claps), emitted at unknown times and unknown locations. We propose a solution that approximates this problem under a far field approximation defined in the calculus of affine geometry, and that relies on singular value decomposition (SVD) to recover the affine structure of the problem. We then define low-dimensional optimization techniques for embedding the solution into Euclidean geometry, and further techniques for recovering the locations and emission times of the acoustic events. The approach is useful for the calibration of ad-hoc microphone arrays and sensor networks. 1
6 0.07567443 192 nips-2005-The Information-Form Data Association Filter
7 0.075525247 158 nips-2005-Products of ``Edge-perts
8 0.071187191 15 nips-2005-A Theoretical Analysis of Robust Coding over Noisy Overcomplete Channels
9 0.068924844 168 nips-2005-Rodeo: Sparse Nonparametric Regression in High Dimensions
10 0.068583719 23 nips-2005-An Application of Markov Random Fields to Range Sensing
11 0.066476479 88 nips-2005-Gradient Flow Independent Component Analysis in Micropower VLSI
12 0.066411227 184 nips-2005-Structured Prediction via the Extragradient Method
13 0.064331852 136 nips-2005-Noise and the two-thirds power Law
14 0.061097629 183 nips-2005-Stimulus Evoked Independent Factor Analysis of MEG Data with Large Background Activity
15 0.058167126 174 nips-2005-Separation of Music Signals by Harmonic Structure Modeling
16 0.051554762 113 nips-2005-Learning Multiple Related Tasks using Latent Independent Component Analysis
17 0.05068057 16 nips-2005-A matching pursuit approach to sparse Gaussian process regression
18 0.048954654 29 nips-2005-Analyzing Coupled Brain Sources: Distinguishing True from Spurious Interaction
19 0.048685223 27 nips-2005-Analysis of Spectral Kernel Design based Semi-supervised Learning
20 0.048123199 102 nips-2005-Kernelized Infomax Clustering
topicId topicWeight
[(0, 0.155), (1, 0.009), (2, -0.023), (3, 0.056), (4, -0.009), (5, 0.001), (6, -0.164), (7, -0.098), (8, 0.135), (9, -0.027), (10, -0.116), (11, 0.006), (12, 0.031), (13, -0.031), (14, 0.057), (15, 0.069), (16, -0.045), (17, 0.064), (18, 0.035), (19, -0.091), (20, -0.1), (21, -0.051), (22, 0.162), (23, -0.009), (24, 0.1), (25, -0.062), (26, -0.077), (27, -0.015), (28, 0.036), (29, -0.15), (30, 0.019), (31, 0.002), (32, 0.014), (33, 0.103), (34, 0.054), (35, 0.043), (36, -0.067), (37, 0.052), (38, 0.019), (39, -0.028), (40, -0.007), (41, -0.164), (42, -0.052), (43, 0.058), (44, -0.026), (45, 0.095), (46, -0.072), (47, -0.18), (48, 0.106), (49, -0.018)]
simIndex simValue paperId paperTitle
same-paper 1 0.97382575 163 nips-2005-Recovery of Jointly Sparse Signals from Few Random Projections
Author: Michael B. Wakin, Marco F. Duarte, Shriram Sarvotham, Dror Baron, Richard G. Baraniuk
Abstract: Compressed sensing is an emerging field based on the revelation that a small group of linear projections of a sparse signal contains enough information for reconstruction. In this paper we introduce a new theory for distributed compressed sensing (DCS) that enables new distributed coding algorithms for multi-signal ensembles that exploit both intra- and inter-signal correlation structures. The DCS theory rests on a new concept that we term the joint sparsity of a signal ensemble. We study three simple models for jointly sparse signals, propose algorithms for joint recovery of multiple signals from incoherent projections, and characterize theoretically and empirically the number of measurements per sensor required for accurate reconstruction. In some sense DCS is a framework for distributed compression of sources with memory, which has remained a challenging problem in information theory for some time. DCS is immediately applicable to a range of problems in sensor networks and arrays. 1
2 0.65932274 43 nips-2005-Comparing the Effects of Different Weight Distributions on Finding Sparse Representations
Author: Bhaskar D. Rao, David P. Wipf
Abstract: Given a redundant dictionary of basis vectors (or atoms), our goal is to find maximally sparse representations of signals. Previously, we have argued that a sparse Bayesian learning (SBL) framework is particularly well-suited for this task, showing that it has far fewer local minima than other Bayesian-inspired strategies. In this paper, we provide further evidence for this claim by proving a restricted equivalence condition, based on the distribution of the nonzero generating model weights, whereby the SBL solution will equal the maximally sparse representation. We also prove that if these nonzero weights are drawn from an approximate Jeffreys prior, then with probability approaching one, our equivalence condition is satisfied. Finally, we motivate the worst-case scenario for SBL and demonstrate that it is still better than the most widely used sparse representation algorithms. These include Basis Pursuit (BP), which is based on a convex relaxation of the ℓ0 (quasi)-norm, and Orthogonal Matching Pursuit (OMP), a simple greedy strategy that iteratively selects basis vectors most aligned with the current residual. 1
3 0.60194373 162 nips-2005-Rate Distortion Codes in Sensor Networks: A System-level Analysis
Author: Tatsuto Murayama, Peter Davis
Abstract: This paper provides a system-level analysis of a scalable distributed sensing model for networked sensors. In our system model, a data center acquires data from a bunch of L sensors which each independently encode their noisy observations of an original binary sequence, and transmit their encoded data sequences to the data center at a combined rate R, which is limited. Supposing that the sensors use independent LDGM rate distortion codes, we show that the system performance can be evaluated for any given finite R when the number of sensors L goes to infinity. The analysis shows how the optimal strategy for the distributed sensing problem changes at critical values of the data rate R or the noise level. 1
4 0.51854223 174 nips-2005-Separation of Music Signals by Harmonic Structure Modeling
Author: Yun-gang Zhang, Chang-shui Zhang
Abstract: Separation of music signals is an interesting but difficult problem. It is helpful for many other music researches such as audio content analysis. In this paper, a new music signal separation method is proposed, which is based on harmonic structure modeling. The main idea of harmonic structure modeling is that the harmonic structure of a music signal is stable, so a music signal can be represented by a harmonic structure model. Accordingly, a corresponding separation algorithm is proposed. The main idea is to learn a harmonic structure model for each music signal in the mixture, and then separate signals by using these models to distinguish harmonic structures of different signals. Experimental results show that the algorithm can separate signals and obtain not only a very high Signalto-Noise Ratio (SNR) but also a rather good subjective audio quality. 1
5 0.48706129 158 nips-2005-Products of ``Edge-perts
Author: Max Welling, Peter V. Gehler
Abstract: Images represent an important and abundant source of data. Understanding their statistical structure has important applications such as image compression and restoration. In this paper we propose a particular kind of probabilistic model, dubbed the “products of edge-perts model” to describe the structure of wavelet transformed images. We develop a practical denoising algorithm based on a single edge-pert and show state-ofthe-art denoising performance on benchmark images. 1
6 0.46802136 20 nips-2005-Affine Structure From Sound
7 0.45143253 101 nips-2005-Is Early Vision Optimized for Extracting Higher-order Dependencies?
8 0.42368734 88 nips-2005-Gradient Flow Independent Component Analysis in Micropower VLSI
9 0.42319781 183 nips-2005-Stimulus Evoked Independent Factor Analysis of MEG Data with Large Background Activity
10 0.40717465 180 nips-2005-Spectral Bounds for Sparse PCA: Exact and Greedy Algorithms
11 0.38219061 15 nips-2005-A Theoretical Analysis of Robust Coding over Noisy Overcomplete Channels
12 0.36427146 62 nips-2005-Efficient Estimation of OOMs
13 0.35636806 121 nips-2005-Location-based activity recognition
14 0.3284494 73 nips-2005-Fast biped walking with a reflexive controller and real-time policy searching
15 0.32265711 168 nips-2005-Rodeo: Sparse Nonparametric Regression in High Dimensions
16 0.32041699 29 nips-2005-Analyzing Coupled Brain Sources: Distinguishing True from Spurious Interaction
17 0.31672511 24 nips-2005-An Approximate Inference Approach for the PCA Reconstruction Error
18 0.30953944 136 nips-2005-Noise and the two-thirds power Law
19 0.28507182 96 nips-2005-Inference with Minimal Communication: a Decision-Theoretic Variational Approach
20 0.27489749 184 nips-2005-Structured Prediction via the Extragradient Method
topicId topicWeight
[(3, 0.051), (10, 0.035), (27, 0.039), (31, 0.048), (34, 0.077), (39, 0.024), (50, 0.018), (52, 0.016), (55, 0.043), (69, 0.069), (73, 0.03), (77, 0.019), (88, 0.088), (91, 0.042), (93, 0.307)]
simIndex simValue paperId paperTitle
same-paper 1 0.82351065 163 nips-2005-Recovery of Jointly Sparse Signals from Few Random Projections
Author: Michael B. Wakin, Marco F. Duarte, Shriram Sarvotham, Dror Baron, Richard G. Baraniuk
Abstract: Compressed sensing is an emerging field based on the revelation that a small group of linear projections of a sparse signal contains enough information for reconstruction. In this paper we introduce a new theory for distributed compressed sensing (DCS) that enables new distributed coding algorithms for multi-signal ensembles that exploit both intra- and inter-signal correlation structures. The DCS theory rests on a new concept that we term the joint sparsity of a signal ensemble. We study three simple models for jointly sparse signals, propose algorithms for joint recovery of multiple signals from incoherent projections, and characterize theoretically and empirically the number of measurements per sensor required for accurate reconstruction. In some sense DCS is a framework for distributed compression of sources with memory, which has remained a challenging problem in information theory for some time. DCS is immediately applicable to a range of problems in sensor networks and arrays. 1
2 0.77091336 170 nips-2005-Scaling Laws in Natural Scenes and the Inference of 3D Shape
Author: Tai-sing Lee, Brian R. Potetz
Abstract: This paper explores the statistical relationship between natural images and their underlying range (depth) images. We look at how this relationship changes over scale, and how this information can be used to enhance low resolution range data using a full resolution intensity image. Based on our findings, we propose an extension to an existing technique known as shape recipes [3], and the success of the two methods are compared using images and laser scans of real scenes. Our extension is shown to provide a two-fold improvement over the current method. Furthermore, we demonstrate that ideal linear shape-from-shading filters, when learned from natural scenes, may derive even more strength from shadow cues than from the traditional linear-Lambertian shading cues. 1
3 0.47433102 45 nips-2005-Conditional Visual Tracking in Kernel Space
Author: Cristian Sminchisescu, Atul Kanujia, Zhiguo Li, Dimitris Metaxas
Abstract: We present a conditional temporal probabilistic framework for reconstructing 3D human motion in monocular video based on descriptors encoding image silhouette observations. For computational efficiency we restrict visual inference to low-dimensional kernel induced non-linear state spaces. Our methodology (kBME) combines kernel PCA-based non-linear dimensionality reduction (kPCA) and Conditional Bayesian Mixture of Experts (BME) in order to learn complex multivalued predictors between observations and model hidden states. This is necessary for accurate, inverse, visual perception inferences, where several probable, distant 3D solutions exist due to noise or the uncertainty of monocular perspective projection. Low-dimensional models are appropriate because many visual processes exhibit strong non-linear correlations in both the image observations and the target, hidden state variables. The learned predictors are temporally combined within a conditional graphical model in order to allow a principled propagation of uncertainty. We study several predictors and empirically show that the proposed algorithm positively compares with techniques based on regression, Kernel Dependency Estimation (KDE) or PCA alone, and gives results competitive to those of high-dimensional mixture predictors at a fraction of their computational cost. We show that the method successfully reconstructs the complex 3D motion of humans in real monocular video sequences. 1
These difÄ?Ĺš culties motivate the advent of a complementary class of discriminative algorithms [10, 12, 18, 2], that approximate the state conditional directly, in order to simplify inference. However, inverse, observation-to-state multivalued mappings are difÄ?Ĺš cult to learn (see e.g. Ä?Ĺš g. 1a) and a probabilistic temporal setting is necessary. In an earlier paper [15] we introduced a probabilistic discriminative framework for human motion reconstruction. Because the method operates in the originally selected state and observation spaces that can be task generic, therefore redundant and often high-dimensional, inference is more expensive and can be less robust. To summarize, reconstructing 3D human motion in a Figure 1: (a, Left) Example of 180o ambiguity in predicting 3D human poses from silhouette image features (center). It is essential that multiple plausible solutions (e.g. F 1 and F2 ) are correctly represented and tracked over time. A single state predictor will either average the distant solutions or zig-zag between them, see also tables 1 and 2. (b, Right) A conditional chain model. The local distributions p(yt |yt−1 , zt ) or p(yt |zt ) are learned as in Ä?Ĺš g. 2. For inference, the predicted local state conditional is recursively combined with the Ä?Ĺš ltered prior c.f . (1). conditional temporal framework poses the following difÄ?Ĺš culties: (i) The mapping between temporal observations and states is multivalued (i.e. the local conditional distributions to be learned are multimodal), therefore it cannot be accurately represented using global function approximations. (ii) Human models have multivariate, high-dimensional continuous states of 50 or more human joint angles. The temporal state conditionals are multimodal which makes efÄ?Ĺš cient Kalman Ä?Ĺš ltering algorithms inapplicable. General inference methods (particle Ä?Ĺš lters, mixtures) have to be used instead, but these are expensive for high-dimensional models (e.g. when reconstructing the motion of several people that operate in a joint state space). (iii) The components of the human state and of the silhouette observation vector exhibit strong correlations, because many repetitive human activities like walking or running have low intrinsic dimensionality. It appears wasteful to work with high-dimensional states of 50+ joint angles. Even if the space were truly high-dimensional, predicting correlated state dimensions independently may still be suboptimal. In this paper we present a conditional temporal estimation algorithm that restricts visual inference to low-dimensional, kernel induced state spaces. To exploit correlations among observations and among state variables, we model the local, temporal conditional distributions using ideas from Kernel PCA [11, 19] and conditional mixture modeling [7, 5], here adapted to produce multiple probabilistic predictions. The corresponding predictor is referred to as a Conditional Bayesian Mixture of Low-dimensional Kernel-Induced Experts (kBME). By integrating it within a conditional graphical model framework (Ä?Ĺš g. 1b), we can exploit temporal constraints probabilistically. We demonstrate that this methodology is effective for reconstructing the 3D motion of multiple people in monocular video. Our contribution w.r.t. 
[15] is a probabilistic conditional inference framework that operates over a non-linear, kernel-induced low-dimensional state spaces, and a set of experiments (on both real and artiÄ?Ĺš cial image sequences) that show how the proposed framework positively compares with powerful predictors based on KDE, PCA, or with the high-dimensional models of [15] at a fraction of their cost. 2 Probabilistic Inference in a Kernel Induced State Space We work with conditional graphical models with a chain structure [9], as shown in Ä?Ĺš g. 1b, These have continuous temporal states yt , t = 1 . . . T , observations zt . For compactness, we denote joint states Yt = (y1 , y2 , . . . , yt ) or joint observations Zt = (z1 , . . . , zt ). Learning and inference are based on local conditionals: p(yt |zt ) and p(yt |yt−1 , zt ), with yt and zt being low-dimensional, kernel induced representations of some initial model having state xt and observation rt . We obtain zt , yt from rt , xt using kernel PCA [11, 19]. Inference is performed in a low-dimensional, non-linear, kernel induced latent state space (see Ä?Ĺš g. 1b and Ä?Ĺš g. 2 and (1)). For display or error reporting, we compute the original conditional p(x|r), or a temporally Ä?Ĺš ltered version p(xt |Rt ), Rt = (r1 , r2 , . . . , rt ), using a learned pre-image state map [3]. 2.1 Density Propagation for Continuous Conditional Chains For online Ä?Ĺš ltering, we compute the optimal distribution p(yt |Zt ) for the state yt , conditioned by observations Zt up to time t. The Ä?Ĺš ltered density can be recursively derived as: p(yt |Zt ) = p(yt |yt−1 , zt )p(yt−1 |Zt−1 ) (1) yt−1 We compute using a conditional mixture for p(yt |yt−1 , zt ) (a Bayesian mixture of experts c.f . §2.2) and the prior p(yt−1 |Zt−1 ), each having, say M components. We integrate M 2 pairwise products of Gaussians analytically. The means of the expanded posterior are clustered and the centers are used to initialize a reduced M -component Kullback-Leibler approximation that is reÄ?Ĺš ned using gradient descent [15]. The propagation rule (1) is similar to the one used for discrete state labels [9], but here we work with multivariate continuous state spaces and represent the local multimodal state conditionals using kBME (Ä?Ĺš g. 2), and not log-linear models [9] (these would require intractable normalization). This complex continuous model rules out inference based on Kalman Ä?Ĺš ltering or dynamic programming [9]. 2.2 Learning Bayesian Mixtures over Kernel Induced State Spaces (kBME) In order to model conditional mappings between low-dimensional non-linear spaces we rely on kernel dimensionality reduction and conditional mixture predictors. The authors of KDE [19] propose a powerful structured unimodal predictor. This works by decorrelating the output using kernel PCA and learning a ridge regressor between the input and each decorrelated output dimension. Our procedure is also based on kernel PCA but takes into account the structure of the studied visual problem where both inputs and outputs are likely to be low-dimensional and the mapping between them multivalued. The output variables xi are projected onto the column vectors of the principal space in order to obtain their principal coordinates y i . A z ∈ P(Fr ) O p(y|z) kP CA ĂŽĹšr (r) ⊂ Fr O / y ∈ P(Fx ) O QQQ QQQ QQQ kP CA QQQ Q( ĂŽĹšx (x) ⊂ Fx x ≈ PreImage(y) O ĂŽĹšr ĂŽĹšx r ∈ R ⊂ Rr x ∈ X ⊂ Rx p(x|r) ≈ p(x|y) Figure 2: The learned low-dimensional predictor, kBME, for computing p(x|r) â‰Ä„ p(xt |rt ), ∀t. 
(We similarly learn p(xt |xt−1 , rt ), with input (x, r) instead of r – here we illustrate only p(x|r) for clarity.) The input r and the output x are decorrelated using Kernel PCA to obtain z and y respectively. The kernels used for the input and output are ĂŽĹš r and ĂŽĹšx , with induced feature spaces Fr and Fx , respectively. Their principal subspaces obtained by kernel PCA are denoted by P(Fr ) and P(Fx ), respectively. A conditional Bayesian mixture of experts p(y|z) is learned using the low-dimensional representation (z, y). Using learned local conditionals of the form p(yt |zt ) or p(yt |yt−1 , zt ), temporal inference can be efÄ?Ĺš ciently performed in a low-dimensional kernel induced state space (see e.g. (1) and Ä?Ĺš g. 1b). For visualization and error measurement, the Ä?Ĺš ltered density, e.g. p(yt |Zt ), can be mapped back to p(xt |Rt ) using the pre-image c.f . (3). similar procedure is performed on the inputs ri to obtain zi . In order to relate the reduced feature spaces of z and y (P(Fr ) and P(Fx )), we estimate a probability distribution over mappings from training pairs (zi , yi ). We use a conditional Bayesian mixture of experts (BME) [7, 5] in order to account for ambiguity when mapping similar, possibly identical reduced feature inputs to very different feature outputs, as common in our problem (Ä?Ĺš g. 1a). This gives a model that is a conditional mixture of low-dimensional kernel-induced experts (kBME): M g(z|δ j )N (y|Wj z, ĂŽĹ j ) p(y|z) = (2) j=1 where g(z|δ j ) is a softmax function parameterized by δ j and (Wj , ĂŽĹ j ) are the parameters and the output covariance of expert j, here a linear regressor. As in many Bayesian settings [17, 5], the weights of the experts and of the gates, Wj and δ j , are controlled by hierarchical priors, typically Gaussians with 0 mean, and having inverse variance hyperparameters controlled by a second level of Gamma distributions. We learn this model using a double-loop EM and employ ML-II type approximations [8, 17] with greedy (weight) subset selection [17, 15]. Finally, the kBME algorithm requires the computation of pre-images in order to recover the state distribution x from it’s image y ∈ P(Fx ). This is a closed form computation for polynomial kernels of odd degree. For more general kernels optimization or learning (regression based) methods are necessary [3]. Following [3, 19], we use a sparse Bayesian kernel regressor to learn the pre-image. This is based on training data (xi , yi ): p(x|y) = N (x|AĂŽĹšy (y), â„Ĺš) (3) with parameters and covariances (A, â„Ĺš). Since temporal inference is performed in the low-dimensional kernel induced state space, the pre-image function needs to be calculated only for visualizing results or for the purpose of error reporting. Propagating the result from the reduced feature space P(Fx ) to the output space X pro- duces a Gaussian mixture with M elements, having coefÄ?Ĺš cients g(z|δ j ) and components N (x|AĂŽĹšy (Wj z), AJĂŽĹšy ĂŽĹ j JĂŽĹšy A + â„Ĺš), where JĂŽĹšy is the Jacobian of the mapping ĂŽĹšy . 3 Experiments We run experiments on both real image sequences (Ä?Ĺš g. 5 and Ä?Ĺš g. 6) and on sequences where silhouettes were artiÄ?Ĺš cially rendered. The prediction error is reported in degrees (for mixture of experts, this is w.r.t. the most probable one, but see also Ä?Ĺš g. 4a), and normalized per joint angle, per frame. The models are learned using standard cross-validation. Pre-images are learned using kernel regressors and have average error 1.7o . 
Training Set and Model State Representation: For training we gather pairs of 3D human poses together with their image projections, here silhouettes, using the graphics package Maya. We use realistically rendered computer graphics human surface models which we animate using human motion capture [1]. Our original human representation (x) is based on articulated skeletons with spherical joints and has 56 skeletal d.o.f. including global translation. The database consists of 8000 samples of human activities including walking, running, turns, jumps, gestures in conversations, quarreling and pantomime. Image Descriptors: We work with image silhouettes obtained using statistical background subtraction (with foreground and background models). Silhouettes are informative for pose estimation although prone to ambiguities (e.g. the left / right limb assignment in side views) or occasional lack of observability of some of the d.o.f. (e.g. 180o ambiguities in the global azimuthal orientation for frontal views, e.g. Ä?Ĺš g. 1a). These are multiplied by intrinsic forward / backward monocular ambiguities [16]. As observations r, we use shape contexts extracted on the silhouette [4] (5 radial, 12 angular bins, size range 1/8 to 3 on log scale). The features are computed at different scales and sizes for points sampled on the silhouette. To work in a common coordinate system, we cluster all features in the training set into K = 50 clusters. To compute the representation of a new shape feature (a point on the silhouette), we ‘project’ onto the common basis by (inverse distance) weighted voting into the cluster centers. To obtain the representation (r) for a new silhouette we regularly sample 200 points on it and add all their feature vectors into a feature histogram. Because the representation uses overlapping features of the observation the elements of the descriptor are not independent. However, a conditional temporal framework (Ä?Ĺš g. 1b) Ä?Ĺš‚exibly accommodates this. For experiments, we use Gaussian kernels for the joint angle feature space and dot product kernels for the observation feature space. We learn state conditionals for p(yt |zt ) and p(yt |yt−1 , zt ) using 6 dimensions for the joint angle kernel induced state space and 25 dimensions for the observation induced feature space, respectively. In Ä?Ĺš g. 3b) we show an evaluation of the efÄ?Ĺš cacy of our kBME predictor for different dimensions in the joint angle kernel induced state space (the observation feature space dimension is here 50). On the analyzed dancing sequence, that involves complex motions of the arms and the legs, the non-linear model signiÄ?Ĺš cantly outperforms alternative PCA methods and gives good predictions for compact, low-dimensional models.1 In tables 1 and 2, as well as Ä?Ĺš g. 4, we perform quantitative experiments on artiÄ?Ĺš cially rendered silhouettes. 3D ground truth joint angles are available and this allows a more 1 Running times: On a Pentium 4 PC (3 GHz, 2 GB RAM), a full dimensional BME model with 5 experts takes 802s to train p(xt |xt−1 , rt ), whereas a kBME (including the pre-image) takes 95s to train p(yt |yt−1 , zt ). The prediction time is 13.7s for BME and 8.7s (including the pre-image cost 1.04s) for kBME. The integration in (1) takes 2.67s for BME and 0.31s for kBME. The speed-up for kBME is signiÄ?Ĺš cant and likely to increase with original models having higher dimensionality. 
Prediction Error Number of Clusters 100 1000 100 10 1 1 2 3 4 5 6 7 8 Degree of Multimodality kBME KDE_RVM PCA_BME PCA_RVM 10 1 0 20 40 Number of Dimensions 60 Figure 3: (a, Left) Analysis of ‘multimodality’ for a training set. The input zt dimension is 25, the output yt dimension is 6, both reduced using kPCA. We cluster independently in (yt−1 , zt ) and yt using many clusters (2100) to simulate small input perturbations and we histogram the yt clusters falling within each cluster in (yt−1 , zt ). This gives intuition on the degree of ambiguity in modeling p(yt |yt−1 , zt ), for small perturbations in the input. (b, Right) Evaluation of dimensionality reduction methods for an artiÄ?Ĺš cial dancing sequence (models trained on 300 samples). The kBME is our model §2.2, whereas the KDE-RVM is a KDE model learned with a Relevance Vector Machine (RVM) [17] feature space map. PCA-BME and PCA-RVM are models where the mappings between feature spaces (obtained using PCA) is learned using a BME and a RVM. The non-linearity is signiÄ?Ĺš cant. Kernel-based methods outperform PCA and give low prediction error for 5-6d models. systematic evaluation. Notice that the kernelized low-dimensional models generally outperform the PCA ones. At the same time, they give results competitive to the ones of high-dimensional BME predictors, while being lower-dimensional and therefore signiÄ?Ĺš cantly less expensive for inference, e.g. the integral in (1). In Ä?Ĺš g. 5 and Ä?Ĺš g. 6 we show human motion reconstruction results for two real image sequences. Fig. 5 shows the good quality reconstruction of a person performing an agile jump. (Given the missing observations in a side view, 3D inference for the occluded body parts would not be possible without using prior knowledge!) For this sequence we do inference using conditionals having 5 modes and reduced 6d states. We initialize tracking using p(yt |zt ), whereas for inference we use p(yt |yt−1 , zt ) within (1). In the second sequence in Ä?Ĺš g. 6, we simultaneously reconstruct the motion of two people mimicking domestic activities, namely washing a window and picking an object. Here we do inference over a product, 12-dimensional state space consisting of the joint 6d state of each person. We obtain good 3D reconstruction results, using only 5 hypotheses. Notice however, that the results are not perfect, there are small errors in the elbow and the bending of the knee for the subject at the l.h.s., and in the different wrist orientations for the subject at the r.h.s. This reÄ?Ĺš‚ects the bias of our training set. Walk and turn Conversation Run and turn left KDE-RR 10.46 7.95 5.22 RVM 4.95 4.96 5.02 KDE-RVM 7.57 6.31 6.25 BME 4.27 4.15 5.01 kBME 4.69 4.79 4.92 Table 1: Comparison of average joint angle prediction error for different models. All kPCA-based models use 6 output dimensions. Testing is done on 100 video frames for each sequence, the inputs are artiÄ?Ĺš cially generated silhouettes, not in the training set. 3D joint angle ground truth is used for evaluation. KDE-RR is a KDE model with ridge regression (RR) for the feature space mapping, KDE-RVM uses an RVM. BME uses a Bayesian mixture of experts with no dimensionality reduction. kBME is our proposed model. kPCAbased methods use kernel regressors to compute pre-images. 
[Figure 4 histograms: (a) frequency of being closest to ground truth vs. expert number; (b) frequency vs. current expert, with one series per rank of the previous output (1st to 5th most probable).]

Figure 4: (a, Left) Histogram showing the accuracy of the expert predictors: how many times the expert ranked as the k-th most probable by the model (horizontal axis) is closest to the ground truth. The model is consistent (the most probable expert is indeed the most accurate most frequently), but occasionally less probable experts are better. (b, Right) Histograms showing the dynamics of p(yt | yt−1, zt), i.e. how the probability mass is redistributed among experts between two successive time steps, in a conversation sequence.

Model      Walk and turn back   Run and turn
KDE-RR     7.59                 17.7
RVM        6.9                  16.8
KDE-RVM    7.15                 16.08
BME        3.6                  8.2
kBME       3.72                 8.01

Table 2: Joint angle prediction error computed for two complex sequences with walks, runs and turns, hence more ambiguity (100 frames). Models have 6 state dimensions. Unimodal predictors average competing solutions; kBME has significantly lower error.

Figure 5: Reconstruction of a jump (selected frames). Top: original image sequence. Middle: extracted silhouettes. Bottom: 3D reconstruction seen from a synthetic viewpoint.

Figure 6: Reconstructing the activities of 2 people operating in a 12-d state space (each person has its own 6d state). Top: original image sequence. Bottom: 3D reconstruction seen from a synthetic viewpoint.

4 Conclusion

We have presented a probabilistic framework for conditional inference in latent kernel-induced low-dimensional state spaces. Our approach has the following properties: (a) it accounts for non-linear correlations among input or output variables by using kernel non-linear dimensionality reduction (kPCA); (b) it learns probability distributions over mappings between low-dimensional state spaces using a conditional Bayesian mixture of experts, as required for accurate prediction; in the resulting low-dimensional kBME predictor, the ambiguities and multiple solutions common in visual inverse perception problems are accurately represented; (c) it works in a continuous, conditional temporal probabilistic setting and offers a formal management of uncertainty. We show comparisons demonstrating that the proposed approach outperforms regression, PCA or KDE alone for reconstructing 3D human motion in monocular video. In future work we will investigate scaling to large training sets and alternative structured prediction methods.
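As a sketch of how inference in this setting could propagate uncertainty over time (property (c) above), the snippet below pushes a small set of weighted state hypotheses through a learned conditional mixture of experts and prunes the candidates back to a fixed budget (5 hypotheses were used in the experiments). This is an assumed reading of the propagation step, not the authors' implementation of (1); `gate` and `expert_mean` are hypothetical stand-ins for the learned gating function and expert regressors.

```python
import numpy as np

def propagate_hypotheses(hypotheses, weights, z_t, gate, expert_mean, n_keep=5):
    """One filtering step through a conditional Bayesian mixture of experts.
    hypotheses : list of previous low-dimensional states y_{t-1} (e.g. 6-d vectors)
    weights    : their normalized weights from the previous step
    gate(y_prev, z)           : mixing proportions g_1..g_M of the experts
    expert_mean(m, y_prev, z) : mean prediction of expert m
    Returns the n_keep most probable candidate states and renormalized weights."""
    candidates, cand_weights = [], []
    for y_prev, w in zip(hypotheses, weights):
        g = gate(y_prev, z_t)                         # expert responsibilities for this input
        for m, g_m in enumerate(g):
            candidates.append(expert_mean(m, y_prev, z_t))
            cand_weights.append(w * g_m)              # previous weight times gate probability
    cand_weights = np.asarray(cand_weights)
    keep = np.argsort(cand_weights)[::-1][:n_keep]    # prune to the best hypotheses
    kept = cand_weights[keep]
    return [candidates[i] for i in keep], kept / kept.sum()
```

Tracking would be initialized from p(yt | zt) at the first frame, as described above; for the two-person sequence the same step would run on the product 12-d state.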
References

[1] CMU Human Motion Database. Online at http://mocap.cs.cmu.edu/search.html, 2003.
[2] A. Agarwal and B. Triggs. 3D human pose from silhouettes by Relevance Vector Regression. In CVPR, 2004.
[3] G. Bakir, J. Weston, and B. Scholkopf. Learning to find pre-images. In NIPS, 2004.
[4] S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition using shape contexts. PAMI, 24, 2002.
[5] C. Bishop and M. Svensen. Bayesian mixtures of experts. In UAI, 2003.
[6] J. Deutscher, A. Blake, and I. Reid. Articulated body motion capture by annealed particle filtering. In CVPR, 2000.
[7] M. Jordan and R. Jacobs. Hierarchical mixtures of experts and the EM algorithm. Neural Computation, (6):181–214, 1994.
[8] D. MacKay. Bayesian interpolation. Neural Computation, 4(5):720–736, 1992.
[9] A. McCallum, D. Freitag, and F. Pereira. Maximum entropy Markov models for information extraction and segmentation. In ICML, 2000.
[10] R. Rosales and S. Sclaroff. Learning body pose via specialized maps. In NIPS, 2002.
[11] B. Schölkopf, A. Smola, and K. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10:1299–1319, 1998.
[12] G. Shakhnarovich, P. Viola, and T. Darrell. Fast pose estimation with parameter sensitive hashing. In ICCV, 2003.
[13] L. Sigal, S. Bhatia, S. Roth, M. Black, and M. Isard. Tracking loose-limbed people. In CVPR, 2004.
[14] C. Sminchisescu and A. Jepson. Generative modeling for continuous non-linearly embedded visual inference. In ICML, pages 759–766, Banff, 2004.
[15] C. Sminchisescu, A. Kanaujia, Z. Li, and D. Metaxas. Discriminative density propagation for 3D human motion estimation. In CVPR, 2005.
[16] C. Sminchisescu and B. Triggs. Kinematic jump processes for monocular 3D human tracking. In CVPR, volume 1, pages 69–76, Madison, 2003.
[17] M. Tipping. Sparse Bayesian learning and the Relevance Vector Machine. JMLR, 2001.
[18] C. Tomasi, S. Petrov, and A. Sastry. 3D tracking = classification + interpolation. In ICCV, 2003.
[19] J. Weston, O. Chapelle, A. Elisseeff, B. Scholkopf, and V. Vapnik. Kernel dependency estimation. In NIPS, 2002.
4 0.4719671 43 nips-2005-Comparing the Effects of Different Weight Distributions on Finding Sparse Representations
Author: Bhaskar D. Rao, David P. Wipf
Abstract: Given a redundant dictionary of basis vectors (or atoms), our goal is to find maximally sparse representations of signals. Previously, we have argued that a sparse Bayesian learning (SBL) framework is particularly well-suited for this task, showing that it has far fewer local minima than other Bayesian-inspired strategies. In this paper, we provide further evidence for this claim by proving a restricted equivalence condition, based on the distribution of the nonzero generating model weights, whereby the SBL solution will equal the maximally sparse representation. We also prove that if these nonzero weights are drawn from an approximate Jeffreys prior, then with probability approaching one, our equivalence condition is satisfied. Finally, we motivate the worst-case scenario for SBL and demonstrate that it is still better than the most widely used sparse representation algorithms. These include Basis Pursuit (BP), which is based on a convex relaxation of the ℓ0 (quasi)-norm, and Orthogonal Matching Pursuit (OMP), a simple greedy strategy that iteratively selects basis vectors most aligned with the current residual. 1
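For orientation, the Orthogonal Matching Pursuit baseline mentioned at the end of this abstract can be written in a few lines. This is a generic textbook sketch of OMP, not the implementation evaluated in the paper.

```python
import numpy as np

def omp(D, y, n_nonzero):
    """Orthogonal Matching Pursuit.
    D : (m, n) dictionary with (ideally) unit-norm columns, y : (m,) signal.
    Greedily selects the atom most aligned with the current residual, then
    re-fits the coefficients on the selected support by least squares."""
    residual = y.copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        correlations = D.T @ residual
        support.append(int(np.argmax(np.abs(correlations))))   # most aligned atom
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs                   # orthogonalize the residual
    x[support] = coeffs
    return x
```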
5 0.471641 200 nips-2005-Variable KD-Tree Algorithms for Spatial Pattern Search and Discovery
Author: Jeremy Kubica, Joseph Masiero, Robert Jedicke, Andrew Connolly, Andrew W. Moore
Abstract: In this paper we consider the problem of finding sets of points that conform to a given underlying model from within a dense, noisy set of observations. This problem is motivated by the task of efficiently linking faint asteroid detections, but is applicable to a range of spatial queries. We survey current tree-based approaches, showing a trade-off exists between single tree and multiple tree algorithms. To this end, we present a new type of multiple tree algorithm that uses a variable number of trees to exploit the advantages of both approaches. We empirically show that this algorithm performs well using both simulated and astronomical data.
6 0.46658745 96 nips-2005-Inference with Minimal Communication: a Decision-Theoretic Variational Approach
7 0.46596703 30 nips-2005-Assessing Approximations for Gaussian Process Classification
8 0.4658027 132 nips-2005-Nearest Neighbor Based Feature Selection for Regression and its Application to Neural Activity
9 0.46437988 144 nips-2005-Off-policy Learning with Options and Recognizers
10 0.46314594 74 nips-2005-Faster Rates in Regression via Active Learning
11 0.46204796 92 nips-2005-Hyperparameter and Kernel Learning for Graph Based Semi-Supervised Classification
12 0.46053424 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models
13 0.46042794 78 nips-2005-From Weighted Classification to Policy Search
14 0.46039438 72 nips-2005-Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation
15 0.45945069 184 nips-2005-Structured Prediction via the Extragradient Method
16 0.45758528 50 nips-2005-Convex Neural Networks
17 0.4570545 137 nips-2005-Non-Gaussian Component Analysis: a Semi-parametric Framework for Linear Dimension Reduction
18 0.45701015 23 nips-2005-An Application of Markov Random Fields to Range Sensing
19 0.45649508 102 nips-2005-Kernelized Infomax Clustering
20 0.45638552 32 nips-2005-Augmented Rescorla-Wagner and Maximum Likelihood Estimation