nips nips2007 nips2007-43 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Céline Levy-leduc, Zaïd Harchaoui
Abstract: We propose a new approach for dealing with the estimation of the location of change-points in one-dimensional piecewise constant signals observed in white noise. Our approach consists in reframing this task in a variable selection context. We use a penalized least-squares criterion with an ℓ1-type penalty for this purpose. We prove some theoretical results on the estimated change-points and on the underlying piecewise constant estimated function. Then, we explain how to implement this method in practice by combining the LAR algorithm and a reduced version of the dynamic programming algorithm and we apply it to synthetic and real data.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We propose a new approach for dealing with the estimation of the location of change-points in one-dimensional piecewise constant signals observed in white noise. [sent-2, score-0.243]
2 Our approach consists in reframing this task in a variable selection context. [sent-3, score-0.105]
3 We use a penalized least-squares criterion with an ℓ1-type penalty for this purpose. [sent-4, score-0.177]
4 We prove some theoretical results on the estimated change-points and on the underlying piecewise constant estimated function. [sent-5, score-0.312]
5 Then, we explain how to implement this method in practice by combining the LAR algorithm and a reduced version of the dynamic programming algorithm and we apply it to synthetic and real data. [sent-6, score-0.156]
6 1 Introduction Change-point detection tasks are pervasive in various fields, ranging from audio [10] to EEG segmentation [5]. [sent-7, score-0.14]
7 The goal is to partition a signal into several homogeneous segments of variable durations, in which some quantity remains approximately constant over time. [sent-8, score-0.166]
8 This issue has been addressed in a large literature (see [20], [11]), where the problem was tackled from both an online (sequential) [1] and an off-line (retrospective) [5] point of view. [sent-9, score-0.042]
9 Most off-line approaches rely on a dynamic programming algorithm (DP), which retrieves K change-points within n observations of a signal with a complexity of O(Kn²) in time [11]. [sent-10, score-0.128]
10 Moreover, one often observes a sub-optimal behavior of the raw DP algorithm on real datasets. [sent-12, score-0.048]
11 We suggest here departing slightly from this line of research, focusing instead on a reformulation of change-point estimation in a variable selection framework. [sent-13, score-0.25]
12 Then, estimating change-point locations off-line turns into performing variable selection on dummy variables representing all possible change-point locations. [sent-14, score-0.168]
13 This allows us to take advantage of the latest theoretical [23], [3] and practical [7] advances in regression with Lasso penalty. [sent-15, score-0.086]
14 Indeed, Lasso provides us with a very efficient method for selecting potential change-point locations. [sent-16, score-0.065]
15 This selection is then refined by using the DP algorithm to estimate the change-point locations. [sent-17, score-0.074]
16 In Section 2, we first describe our theoretical reformulation of off-line change-point estimation as regression with a Lasso penalty. [sent-19, score-0.201]
17 Then, we show that the estimated magnitudes of the jumps are close in mean, in a sense to be made precise, to the true magnitudes of the jumps. [sent-20, score-0.257]
18 We also give a non-asymptotic inequality upper-bounding the ℓ2 loss between the true underlying piecewise constant function and the estimated one. [sent-21, score-0.393]
19 1 Theoretical approach Framework We describe, in this section, how off-line change-point estimation can be cast as a variable selection problem. [sent-26, score-0.208]
20 Off-line estimation of change-point locations within a signal (Yt) consists in estimating the τk's in the following model: $Y_t = \mu_k + \varepsilon_t$, $t = 1, \dots, n$, where $\mu_k$ is the constant level on the segment $\tau_{k-1} + 1 \le t \le \tau_k$. [sent-27, score-0.215]
21 The above multiple change-point estimation problem (1) can thus be tackled as a variable selection one: minimize $\|Y_n - X_n\beta\|_n^2$ over $\beta$ subject to $\|\beta\|_1 \le s$, (3) where $\|u\|_1$ and $\|u\|_n$ are defined for a vector $u = (u_1, \dots, u_n) \in \mathbb{R}^n$ by $\|u\|_1 = \sum_{j=1}^{n} |u_j|$ and $\|u\|_n^2 = n^{-1} \sum_{j=1}^{n} u_j^2$, respectively. [sent-44, score-0.046]
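To make the reformulation concrete, here is a minimal sketch (Python/NumPy, not taken from the paper's code) of one standard choice of design matrix consistent with (3): column j of Xn is a step dummy variable, so that the j-th coefficient of β is the jump µj − µj−1 and Xnβ spans exactly the piecewise constant vectors.

```python
import numpy as np

def step_design_matrix(n):
    """Lower-triangular matrix of step (dummy) variables: column j is 0 before
    position j and 1 from position j on, so that (X @ beta)[t] = sum_{j <= t} beta[j]
    is piecewise constant and a non-zero beta[j] marks a change-point at position j."""
    return np.tril(np.ones((n, n)))

# Example: jumps at (0-indexed) positions 3 and 7
n = 10
beta = np.zeros(n)
beta[3], beta[7] = 2.0, -1.5
mu = step_design_matrix(n) @ beta   # levels: 0, 0, 0, 2, 2, 2, 2, 0.5, 0.5, 0.5
```

With this parametrization, the ℓ1 constraint on β in (3) is exactly the constraint on the total magnitude of jumps in (4).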
23 Equivalently, (3) can be written as: minimize over $\mu_1, \dots, \mu_n$ the criterion $n^{-1} \sum_{t=1}^{n} (Y_t - \mu_t)^2$ subject to $\sum_{t=1}^{n-1} |\mu_{t+1} - \mu_t| \le s$, (4) which consists in imposing an ℓ1-constraint on the magnitudes of the jumps. [sent-48, score-0.045]
24 It is related to the popular Least Absolute Shrinkage and Selection Operator (LASSO) in least-squares regression of [21], used for efficient variable selection. [sent-50, score-0.073]
25 In the next section, we provide two results supporting the use of the formulation (3) for off-line multiple change-point estimation. [sent-51, score-0.044]
26 We show that estimates of the jumps minimizing (3) are consistent in mean, and we provide a non-asymptotic upper bound for the ℓ2 loss between the estimated piecewise constant function and the true underlying piecewise constant function. [sent-52, score-0.611]
27 This inequality shows that, at a specified rate, the estimated piecewise constant function tends to the true piecewise constant function with probability tending to one. [sent-53, score-0.563]
28 2 Main results In this section, we shall study the properties of the solutions of the problem (3), defined by $\hat{\beta}^n(\lambda) = \arg\min_{\beta} \left\{ \|Y_n - X_n\beta\|_n^2 + \lambda \|\beta\|_1 \right\}$. [sent-55, score-0.104]
29 It maps a positive entry to 1, a negative entry to −1, and a null entry to zero. [sent-57, score-0.12]
30 Condition (8) is not fulfilled in the change-point framework, implying that we cannot have perfect estimation of the change-points, as is already known; see [13]. [sent-64, score-0.069]
31 In the following, we shall assume that the number of break points is equal to K . [sent-66, score-0.104]
32 The following proposition ensures that for a large enough value of n the estimated change-point locations are close to the true change-points. [sent-67, score-0.241]
33 Assume that the observations (Yn ) are given by (2) and that the εn ’s are centered. [sent-69, score-0.045]
34 For this, we denote by $\beta^n(\lambda)$ the estimator $\hat{\beta}^n(\lambda)$ in the absence of noise, and by $\gamma_n(\lambda)$ the bias associated with the Lasso estimator: $\gamma_n(\lambda) = \beta^n(\lambda) - \beta^n$. [sent-73, score-0.068]
35 For notational simplicity, we shall write γ instead of γn (λ). [sent-74, score-0.104]
36 The following proposition ensures, thanks to a non-asymptotic result, that the estimated underlying piecewise constant function is close to the true piecewise constant function. [sent-79, score-0.72]
37 Assume that the observations (Yn) are given by (2) and that the εt's are centered i.i.d. Gaussian random variables with variance σ² > 0. [sent-81, score-0.082]
38 Then, using the fact that the εi's are i.i.d. zero-mean Gaussian random variables, we obtain $P(\bar{E}) \le n\, P\!\left(\left|n^{-1}\sum_{i} \varepsilon_i\right| > \lambda\right) \le n \exp\!\left(-\frac{n\lambda^2}{2\sigma^2}\right)$. [sent-90, score-0.037]
39 Estimation with a Lasso penalty We compute the first Kmax non-null coefficients $\hat{\beta}_{\hat{\tau}_1}, \dots, \hat{\beta}_{\hat{\tau}_{K_{\max}}}$ on the regularization path of the LASSO problem (3). [sent-99, score-0.105]
41 The LAR/LASSO algorithm, as described in [7], provides an efficient way to compute the entire regularization path for the LASSO problem. [sent-100, score-0.105]
42 Since $\sum_j |\hat{\beta}_j| \le s$ is a sparsity-enforcing constraint, the set $\{j : \hat{\beta}_j \neq 0\} = \{\hat{\tau}_j\}$ becomes larger as we run through the regularization path. [sent-101, score-0.051]
43 We shall denote by S the set of the Kmax selected variables: $S = \{\hat{\tau}_1, \dots, \hat{\tau}_{K_{\max}}\}$. (9) The computational complexity of the Kmax-long regularization path of LASSO solutions is $O(K_{\max}^3 + K_{\max}^2 n)$. [sent-106, score-0.105]
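As an illustration of this step, here is a minimal sketch assuming scikit-learn's lars_path (the function and variable names are ours, not the authors'): it keeps the first Kmax dummy variables that enter the Lasso regularization path as candidate change-point locations.

```python
import numpy as np
from sklearn.linear_model import lars_path

def lasso_candidates(y, k_max):
    """Return up to k_max candidate change-point locations (1-indexed, sorted),
    taken as the first dummy variables entering the Lasso regularization path."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    y_c = y - y.mean()                             # lars_path fits no intercept, so center the signal
    X = np.tril(np.ones((n, n)))[:, 1:]            # drop column 0 (overall level is not a change-point)
    _, _, coefs = lars_path(X, y_c, method="lasso")
    nonzero = coefs != 0                           # shape: (n_features, n_path_steps)
    first_hit = np.where(nonzero.any(axis=1), nonzero.argmax(axis=1), np.inf)
    order = np.argsort(first_hit)                  # variables in order of entry on the path
    entered = order[np.isfinite(first_hit[order])][:k_max]
    return np.sort(entered + 2)                    # reduced column c <-> change at 1-indexed position c + 2
```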
45 Most of the time, we can see that the Lasso effectively catches the true change-points but also irrelevant change-points in the vicinity of the true ones. [sent-107, score-0.2]
46 Therefore, we propose to refine the set of change-points caught by the Lasso by performing a post-selection. [sent-108, score-0.06]
47 Reduced Dynamic Programming algorithm One can consider several strategies to remove irrelevant change-points from the ones retrieved by the Lasso. [sent-109, score-0.116]
48 Among them, since in applications one is usually only interested in change-point estimation up to a given accuracy, we could launch the Lasso on a subsample of the signal. [sent-110, score-0.137]
49 Here, our reduced DP calculation (rDP) scales as $O(K_{\max}\, K_{\max}^2)$, where Kmax is the maximum number of change-points/variables selected by the LAR/LASSO algorithm. [sent-128, score-0.049]
50 Since typically Kmax ≪ n, our method thus provides a reduction of the computational burden associated with the classical change-point detection approach, which consists in running the DP algorithm over all the n observations. [sent-129, score-0.069]
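A minimal sketch of the reduced dynamic programming step (our own illustration; the interface is hypothetical): given the candidate locations caught by the Lasso, it computes, for each k ≤ Kmax, the best least-squares segmentation with exactly k change-points chosen among the candidates, together with the attained empirical risk J(k). With cumulative sums, each segment cost is O(1), so filling the whole table costs on the order of Kmax · Kmax² operations.

```python
import numpy as np

def reduced_dp(y, candidates, k_max):
    """Reduced DP over a small set of candidate change-points.
    `candidates` are 1-indexed positions where a new segment may start
    (e.g. 30 means the level changes between samples 29 and 30).
    Returns a dict J mapping k = 0..k_max to (residual sum of squares,
    sorted list of the k selected change-point positions)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    p = [0] + sorted(c - 1 for c in candidates) + [n]      # 0-indexed segment boundaries
    m = len(p) - 1
    s1 = np.concatenate(([0.0], np.cumsum(y)))
    s2 = np.concatenate(([0.0], np.cumsum(y ** 2)))

    def cost(a, b):                                        # squared error of y[a:b] around its mean
        return (s2[b] - s2[a]) - (s1[b] - s1[a]) ** 2 / (b - a)

    INF = float("inf")
    # D[k][j]: best cost of y[0:p[j]] cut into k segments ending at boundary j
    D = [[INF] * (m + 1) for _ in range(k_max + 2)]
    back = [[0] * (m + 1) for _ in range(k_max + 2)]
    for j in range(1, m + 1):
        D[1][j] = cost(p[0], p[j])
    for k in range(2, k_max + 2):
        for j in range(k, m + 1):
            i_best = min(range(k - 1, j), key=lambda i: D[k - 1][i] + cost(p[i], p[j]))
            D[k][j] = D[k - 1][i_best] + cost(p[i_best], p[j])
            back[k][j] = i_best
    J = {}
    for k in range(0, min(k_max, m - 1) + 1):              # k change-points = k + 1 segments
        cps, j = [], m
        for kk in range(k + 1, 1, -1):
            j = back[kk][j]
            cps.append(p[j] + 1)                           # back to 1-indexed positions
        J[k] = (D[k + 1][m], sorted(cps))
    return J
```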
51 As n → ∞, according to [15], the ratio ρk = J(k + 1)/J(k) should show different qualitative behavior when k ≤ K and when k > K, K being the true number of change-points. [sent-131, score-0.082]
52 Actually we found out that Cn was close to 1, even in small-sample settings, for various experimental designs in terms of noise variance and true number of change-points. [sent-133, score-0.143]
53 Hence, reconciling theoretical guidance in the large-sample setting and experimental findings in the fixed-sample setting, we suggest the following rule of thumb for selecting the number of change-points $\hat{K}$: $\hat{K} = \min_{k \ge 1} \{k : \rho_k \ge 1 - \nu\}$, where $\rho_k = J(k + 1)/J(k)$. [sent-134, score-0.044]
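Assuming the J(k) values returned by the reduced DP sketch above, the rule of thumb can be written as follows (an illustrative helper; the value of ν is the user's choice):

```python
def select_num_changepoints(J, nu):
    """Rule of thumb: the smallest k >= 1 with rho_k = J(k+1)/J(k) >= 1 - nu."""
    k = 1
    while (k + 1) in J and J[k + 1][0] / J[k][0] < 1 - nu:
        k += 1
    return k

# Example usage (with the sketches above):
#   J = reduced_dp(y, lasso_candidates(y, k_max), k_max)
#   k_hat = select_num_changepoints(J, nu=0.05)   # nu = 0.05 is an illustrative value
#   change_points = J[k_hat][1]
```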
54 Cachalot Algorithm. [sent-135, score-0.119]
Input: • Vector of observations $Y \in \mathbb{R}^n$ • Upper bound Kmax on the number of change-points • Model selection threshold ν.
Processing:
1. Compute $(\hat{\beta}_{\hat{\tau}_1}, \dots, \hat{\beta}_{\hat{\tau}_{K_{\max}}})$ on the regularization path with the LAR/LASSO algorithm. [sent-139, score-0.105]
2. Launch the rDP algorithm on the set of potential change-points $(\hat{\tau}_1, \dots, \hat{\tau}_{K_{\max}})$. [sent-141, score-0.033]
3. Select the smallest subset of the potential change-points $(\hat{\tau}_1, \dots, \hat{\tau}_{K_{\max}})$ that satisfies the model selection rule $\hat{K} = \min_{k \ge 1}\{k : \rho_k \ge 1 - \nu\}$. [sent-146, score-0.033]
58 To illustrate our algorithm, we consider observations (Yn) satisfying model (2) with (β30, β50, β70, β90) = (5, −3, 4, −2), the other βj being equal to zero, n = 100 and εn a Gaussian random vector with covariance matrix equal to Id, Id being an n × n identity matrix. [sent-154, score-0.045]
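For reference, this toy signal can be generated as follows (a sketch; the random seed is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
beta = np.zeros(n)
beta[[29, 49, 69, 89]] = [5.0, -3.0, 4.0, -2.0]   # jumps at 1-indexed positions 30, 50, 70, 90
mu = np.cumsum(beta)                              # piecewise constant mean with levels 0, 5, 2, 6, 4
y = mu + rng.standard_normal(n)                   # centered Gaussian noise with covariance Id
```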
59 The set of the first nine active variables caught by the Lasso along the regularization path, i. [sent-155, score-0.111]
60 The set S contains the true change-points but also irrelevant ones close to the true change-points. [sent-158, score-0.17]
61 This supports the use of the reduced version of the DP algorithm hereafter. [sent-160, score-0.049]
62 Table 1 (Toy example) reports the empirical risk J(K) and the estimated change-points $(\hat{\tau}_1, \dots, \hat{\tau}_K)$ as a function of the possible number of change-points K = 0, ..., 9; the corresponding estimated change-point sets are ∅, (30), (30,70), (30,50,69), (30,50,70,90), (30,50,69,70,90), (21,30,50,69,70,90), (21,29,30,50,69,70,90), (21,23,29,30,50,69,70,90) and (21,23,28,29,30,50,69,70,90). [sent-168, score-0.047]
64 The different values of the ratio ρk for k = 0, ..., 8 of the model selection procedure are given in Table 2. [sent-184, score-0.074]
65 We conclude, as expected, that $\hat{K} = 4$ and that the change-points are (30, 50, 70, 90), thanks to the results obtained in Table 1. [sent-187, score-0.068]
66 4 Discussion Off-line multiple change-point estimation has recently received much attention in theoretical works, both in a non-asymptotic and in an asymptotic setting by [17] and [13] respectively. [sent-188, score-0.221]
67 From a practical point of view, retrieving the set of change-point locations {τ1, ..., τK} is challenging. [sent-189, score-0.063]
68 Table 2 (Toy example) gives the values of the ratio ρk = J(k + 1)/J(k) for k = 0, ..., 8. [sent-192, score-0.036]
69 Yet, a dynamic programming algorithm (DP), proposed by [9] and [2], allows one to explore all the configurations with a complexity of O(n³) in time. [sent-206, score-0.107]
70 Then selecting the number of change-points is usually performed thanks to a Schwarz-like penalty λn K, where λn has to be calibrated on data [13] [12], or a penalty K(a + b log(n/K)) as in [17] [14], where a and b are data-driven as well. [sent-207, score-0.23]
71 We should also mention that an abundant literature tackles both change-point estimation and model selection issues from a Bayesian point of view (see [20] [8] and references therein). [sent-208, score-0.211]
72 All approaches cited above rely on DP, or on variants in Bayesian settings, and hence yield a computational complexity of O(n³), which makes them inappropriate for very large-scale signal segmentation. [sent-209, score-0.083]
73 Moreover, despite its theoretical optimality in a maximum likelihood framework, raw DP may sometimes have poor performances when applied to very noisy observations. [sent-210, score-0.149]
74 Our alternative framework for multiple change-point estimation has previously been mentioned only in passing several times, e. [sent-211, score-0.113]
75 However, to our knowledge, neither a successful practical implementation nor a theoretical grounding has been given so far to support such an approach for change-point estimation. [sent-214, score-0.044]
76 Let us also mention [22], where the Fused Lasso is applied in a similar yet different way to perform hot-spot detection. [sent-215, score-0.038]
77 However, this approach includes an additional penalty, penalizing departures from the overall mean of the observations, and should thus rather be considered as an outlier detection method. [sent-216, score-0.069]
78 1 Comparison with other methods Synthetic data We propose to compare our algorithm with a recent method based on a penalized least-squares criterion studied by [12]. [sent-218, score-0.112]
79 We shall use Recall and Precision as relevant performance measures to analyze the previous two algorithms. [sent-226, score-0.104]
80 More precisely, the Recall corresponds to the ratio of the number of true change-points retrieved by a method to the number of change-points really present in the data. [sent-227, score-0.104]
81 As for the Precision, it corresponds to the number of correctly retrieved change-points divided by the number of suggested change-points. [sent-228, score-0.068]
82 We shall also estimate the probability of false alarm corresponding to the number of suggested change-points which are not present in the signal divided by the number of true change-points. [sent-229, score-0.352]
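A sketch of these three measures (our own helper; the tolerance window deciding whether a retrieved change-point matches a true one is an assumption, since the matching rule is not spelled out here):

```python
def recall_precision_false_alarm(true_cps, found_cps, tol=2):
    """Recall, precision and false-alarm rate of a set of estimated change-points.
    A found change-point is counted as correct if it lies within `tol` samples
    of a true one (the tolerance is an illustrative choice)."""
    retrieved = sum(any(abs(f - t) <= tol for f in found_cps) for t in true_cps)
    correct = sum(any(abs(f - t) <= tol for t in true_cps) for f in found_cps)
    recall = retrieved / len(true_cps)
    precision = correct / len(found_cps) if len(found_cps) > 0 else 1.0
    false_alarm = (len(found_cps) - correct) / len(true_cps)
    return recall, precision, false_alarm
```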
83 To compute the precision and the recall of methods A and B, we ran Monte-Carlo experiments. [sent-230, score-0.137]
84 More precisely, we sampled 30 configurations of change-points for each real number of change-points K equal to 5, 10, 15 and 20 within a signal containing 500 observations. [sent-231, score-0.083]
85 We used the following setting for the noise: for each configuration of change-points and levels, we synthesized a Gaussian white noise such that the standard deviation is set to a multiple of the minimum magnitude jump between two contiguous segments, i. [sent-234, score-0.195]
86 Performance in recall is comparable, whereas method A provides better results than method B in terms of precision and false alarm rate. [sent-240, score-0.256]
87 Table 5: False alarm rate of methods A and B. [sent-367, score-0.063]
88 The multiple change-point estimation method should then either be used after a data-cleaning step (median filtering [6]) or explicitly make a heavy-tailed noise distribution assumption. [sent-438, score-0.15]
89 The results given by our method applied to the well-log data processed with a median filter are displayed in Figure 1 for Kmax = 200 and 1 − ν = 0. [sent-440, score-0.138]
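Such a data-cleaning step can be done, for instance, with a running median (sketch assuming SciPy; the variable name and window length are illustrative, not the settings used for Figure 1):

```python
from scipy.signal import medfilt

# y_welllog: hypothetical array holding the raw well-log signal
y_clean = medfilt(y_welllog, kernel_size=7)   # running median to suppress outliers before segmentation
```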
90 A least-squares criterion with a Lasso penalty yields an efficient primary estimation of change-point locations. [sent-465, score-0.109]
91 Yet these change-point location estimates can be further refined thanks to a reduced dynamic programming algorithm. [sent-466, score-0.224]
92 We obtained competitive performances on both artificial and real data, in terms of precision, recall and false alarm. [sent-467, score-0.176]
93 Thus, Cachalot is a computationally efficient multiple change-point estimation method, paving the way for processing large datasets. [sent-468, score-0.113]
94 On the approximation of curves by line segments using dynamic programming. [sent-477, score-0.11]
95 Consistencies and rates of convergence of jump penalized least squares estimators. [sent-491, score-0.111]
96 Exact and efficient Bayesian inference for multiple changepoint problems. [sent-513, score-0.139]
97 On the correlation of automatic audio and visual segmentation of music videos. [sent-524, score-0.071]
98 Least-squares estimation of an unknown number of shifts in a time series. [sent-539, score-0.069]
99 Detecting multiple change-points in the mean of a Gaussian process by model selection. [sent-543, score-0.044]
100 Spatial smoothing and hot spot detection for CGH data using the fused lasso. [sent-584, score-0.159]
wordName wordTfidf (topN-words)
[('kmax', 0.656), ('lasso', 0.356), ('piecewise', 0.174), ('dp', 0.174), ('xn', 0.125), ('yn', 0.117), ('shall', 0.104), ('cachalot', 0.103), ('rdp', 0.103), ('signal', 0.083), ('precision', 0.074), ('selection', 0.074), ('penalized', 0.072), ('detection', 0.069), ('estimation', 0.069), ('launch', 0.068), ('mink', 0.068), ('precized', 0.068), ('retrieved', 0.068), ('median', 0.068), ('thanks', 0.068), ('penalty', 0.065), ('asymptotic', 0.064), ('locations', 0.063), ('recall', 0.063), ('alarm', 0.063), ('non', 0.062), ('caa', 0.06), ('changepoint', 0.06), ('catching', 0.06), ('caught', 0.06), ('fused', 0.06), ('dynamic', 0.058), ('performances', 0.057), ('cn', 0.057), ('false', 0.056), ('proposition', 0.055), ('tending', 0.054), ('path', 0.054), ('segments', 0.052), ('preprint', 0.051), ('regularization', 0.051), ('reduced', 0.049), ('programming', 0.049), ('irrelevant', 0.048), ('raw', 0.048), ('annals', 0.048), ('estimated', 0.047), ('reformulation', 0.046), ('uj', 0.046), ('true', 0.046), ('gurations', 0.045), ('observations', 0.045), ('magnitude', 0.045), ('multiple', 0.044), ('theoretical', 0.044), ('jumps', 0.044), ('regression', 0.042), ('tackled', 0.042), ('id', 0.042), ('entry', 0.04), ('criterion', 0.04), ('table', 0.04), ('shrinkage', 0.039), ('jump', 0.039), ('mention', 0.038), ('processed', 0.038), ('noise', 0.037), ('iid', 0.037), ('yt', 0.036), ('ratio', 0.036), ('audio', 0.036), ('bayesian', 0.035), ('segmentation', 0.035), ('rn', 0.035), ('cast', 0.034), ('potential', 0.033), ('statistics', 0.032), ('selecting', 0.032), ('displayed', 0.032), ('estimator', 0.031), ('society', 0.031), ('variable', 0.031), ('close', 0.03), ('lar', 0.03), ('rosset', 0.03), ('plagued', 0.03), ('depart', 0.03), ('tackles', 0.03), ('durations', 0.03), ('contiguous', 0.03), ('abrupt', 0.03), ('cgh', 0.03), ('consistencies', 0.03), ('designs', 0.03), ('fundamentals', 0.03), ('kempe', 0.03), ('moulines', 0.03), ('munk', 0.03), ('replications', 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 43 nips-2007-Catching Change-points with Lasso
Author: Céline Levy-leduc, Zaïd Harchaoui
Abstract: We propose a new approach for dealing with the estimation of the location of change-points in one-dimensional piecewise constant signals observed in white noise. Our approach consists in reframing this task in a variable selection context. We use a penalized least-squares criterion with an ℓ1-type penalty for this purpose. We prove some theoretical results on the estimated change-points and on the underlying piecewise constant estimated function. Then, we explain how to implement this method in practice by combining the LAR algorithm and a reduced version of the dynamic programming algorithm and we apply it to synthetic and real data.
2 0.16048899 53 nips-2007-Compressed Regression
Author: Shuheng Zhou, Larry Wasserman, John D. Lafferty
Abstract: Recent research has studied the role of sparsity in high dimensional regression and signal reconstruction, establishing theoretical limits for recovering sparse models from sparse data. In this paper we study a variant of this problem where the original n input variables are compressed by a random linear transformation to m ≪ n examples in p dimensions, and establish conditions under which a sparse linear model can be successfully recovered from the compressed data. A primary motivation for this compression procedure is to anonymize the data and preserve privacy by revealing little information about the original data. We characterize the number of random projections that are required for ℓ1-regularized compressed regression to identify the nonzero coefficients in the true model with probability approaching one, a property called "sparsistence." In addition, we show that ℓ1-regularized compressed regression asymptotically predicts as well as an oracle linear model, a property called "persistence." Finally, we characterize the privacy properties of the compression procedure in information-theoretic terms, establishing upper bounds on the rate of information communicated between the compressed and uncompressed data that decay to zero.
3 0.12643793 99 nips-2007-Hierarchical Penalization
Author: Marie Szafranski, Yves Grandvalet, Pierre Morizet-mahoudeaux
Abstract: Hierarchical penalization is a generic framework for incorporating prior information in the fitting of statistical models, when the explicative variables are organized in a hierarchical structure. The penalizer is a convex functional that performs soft selection at the group level, and shrinks variables within each group. This favors solutions with few leading terms in the final combination. The framework, originally derived for taking prior knowledge into account, is shown to be useful in linear regression, when several parameters are used to model the influence of one feature, or in kernel regression, for learning multiple kernels. Keywords – Optimization: constrained and convex optimization. Supervised learning: regression, kernel methods, sparsity and feature selection. 1
4 0.098413572 34 nips-2007-Bayesian Policy Learning with Trans-Dimensional MCMC
Author: Matthew Hoffman, Arnaud Doucet, Nando D. Freitas, Ajay Jasra
Abstract: A recently proposed formulation of the stochastic planning and control problem as one of parameter estimation for suitable artificial statistical models has led to the adoption of inference algorithms for this notoriously hard problem. At the algorithmic level, the focus has been on developing Expectation-Maximization (EM) algorithms. In this paper, we begin by making the crucial observation that the stochastic control problem can be reinterpreted as one of trans-dimensional inference. With this new interpretation, we are able to propose a novel reversible jump Markov chain Monte Carlo (MCMC) algorithm that is more efficient than its EM counterparts. Moreover, it enables us to implement full Bayesian policy search, without the need for gradients and with one single Markov chain. The new approach involves sampling directly from a distribution that is proportional to the reward and, consequently, performs better than classic simulations methods in situations where the reward is a rare event.
5 0.09664803 44 nips-2007-Catching Up Faster in Bayesian Model Selection and Model Averaging
Author: Tim V. Erven, Steven D. Rooij, Peter Grünwald
Abstract: Bayesian model averaging, model selection and their approximations such as BIC are generally statistically consistent, but sometimes achieve slower rates of convergence than other methods such as AIC and leave-one-out cross-validation. On the other hand, these other methods can be inconsistent. We identify the catch-up phenomenon as a novel explanation for the slow convergence of Bayesian methods. Based on this analysis we define the switch-distribution, a modification of the Bayesian model averaging distribution. We prove that in many situations model selection and prediction based on the switch-distribution is both consistent and achieves optimal convergence rates, thereby resolving the AIC-BIC dilemma. The method is practical; we give an efficient algorithm. 1
6 0.096251667 8 nips-2007-A New View of Automatic Relevance Determination
7 0.088291116 192 nips-2007-Testing for Homogeneity with Kernel Fisher Discriminant Analysis
8 0.076894656 173 nips-2007-Second Order Bilinear Discriminant Analysis for single trial EEG analysis
9 0.075386167 179 nips-2007-SpAM: Sparse Additive Models
10 0.067807183 170 nips-2007-Robust Regression with Twinned Gaussian Processes
11 0.061208539 212 nips-2007-Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes
12 0.058839459 186 nips-2007-Statistical Analysis of Semi-Supervised Regression
13 0.058485456 189 nips-2007-Supervised Topic Models
14 0.055647992 203 nips-2007-The rat as particle filter
15 0.055589296 13 nips-2007-A Unified Near-Optimal Estimator For Dimension Reduction in $l \alpha$ ($0<\alpha\leq 2$) Using Stable Random Projections
16 0.051647134 177 nips-2007-Simplified Rules and Theoretical Analysis for Information Bottleneck Optimization and PCA with Spiking Neurons
17 0.050802723 82 nips-2007-Estimating divergence functionals and the likelihood ratio by penalized convex risk minimization
18 0.049692091 65 nips-2007-DIFFRAC: a discriminative and flexible framework for clustering
19 0.049126092 66 nips-2007-Density Estimation under Independent Similarly Distributed Sampling Assumptions
20 0.047356933 112 nips-2007-Learning Monotonic Transformations for Classification
topicId topicWeight
[(0, -0.189), (1, 0.02), (2, -0.027), (3, 0.023), (4, -0.019), (5, 0.039), (6, -0.026), (7, -0.013), (8, -0.112), (9, 0.01), (10, 0.03), (11, 0.016), (12, 0.064), (13, -0.038), (14, 0.065), (15, 0.033), (16, 0.049), (17, 0.054), (18, -0.07), (19, -0.226), (20, -0.027), (21, -0.076), (22, 0.053), (23, 0.04), (24, 0.115), (25, 0.101), (26, -0.015), (27, -0.017), (28, -0.039), (29, 0.18), (30, -0.016), (31, 0.084), (32, -0.114), (33, 0.208), (34, 0.16), (35, 0.029), (36, -0.061), (37, 0.01), (38, 0.145), (39, -0.004), (40, -0.032), (41, -0.021), (42, 0.029), (43, -0.019), (44, 0.009), (45, -0.009), (46, 0.049), (47, 0.064), (48, -0.021), (49, 0.078)]
simIndex simValue paperId paperTitle
same-paper 1 0.92936683 43 nips-2007-Catching Change-points with Lasso
Author: Céline Levy-leduc, Zaïd Harchaoui
Abstract: We propose a new approach for dealing with the estimation of the location of change-points in one-dimensional piecewise constant signals observed in white noise. Our approach consists in reframing this task in a variable selection context. We use a penalized least-squares criterion with an ℓ1-type penalty for this purpose. We prove some theoretical results on the estimated change-points and on the underlying piecewise constant estimated function. Then, we explain how to implement this method in practice by combining the LAR algorithm and a reduced version of the dynamic programming algorithm and we apply it to synthetic and real data.
2 0.8448351 53 nips-2007-Compressed Regression
Author: Shuheng Zhou, Larry Wasserman, John D. Lafferty
Abstract: Recent research has studied the role of sparsity in high dimensional regression and signal reconstruction, establishing theoretical limits for recovering sparse models from sparse data. In this paper we study a variant of this problem where the original n input variables are compressed by a random linear transformation to m ≪ n examples in p dimensions, and establish conditions under which a sparse linear model can be successfully recovered from the compressed data. A primary motivation for this compression procedure is to anonymize the data and preserve privacy by revealing little information about the original data. We characterize the number of random projections that are required for ℓ1-regularized compressed regression to identify the nonzero coefficients in the true model with probability approaching one, a property called "sparsistence." In addition, we show that ℓ1-regularized compressed regression asymptotically predicts as well as an oracle linear model, a property called "persistence." Finally, we characterize the privacy properties of the compression procedure in information-theoretic terms, establishing upper bounds on the rate of information communicated between the compressed and uncompressed data that decay to zero.
3 0.70351571 99 nips-2007-Hierarchical Penalization
Author: Marie Szafranski, Yves Grandvalet, Pierre Morizet-mahoudeaux
Abstract: Hierarchical penalization is a generic framework for incorporating prior information in the fitting of statistical models, when the explicative variables are organized in a hierarchical structure. The penalizer is a convex functional that performs soft selection at the group level, and shrinks variables within each group. This favors solutions with few leading terms in the final combination. The framework, originally derived for taking prior knowledge into account, is shown to be useful in linear regression, when several parameters are used to model the influence of one feature, or in kernel regression, for learning multiple kernels. Keywords – Optimization: constrained and convex optimization. Supervised learning: regression, kernel methods, sparsity and feature selection. 1
4 0.69021666 179 nips-2007-SpAM: Sparse Additive Models
Author: Han Liu, Larry Wasserman, John D. Lafferty, Pradeep K. Ravikumar
Abstract: We present a new class of models for high-dimensional nonparametric regression and classification called sparse additive models (SpAM). Our methods combine ideas from sparse linear modeling and additive nonparametric regression. We derive a method for fitting the models that is effective even when the number of covariates is larger than the sample size. A statistical analysis of the properties of SpAM is given together with empirical results on synthetic and real data, showing that SpAM can be effective in fitting sparse nonparametric models in high dimensional data. 1
5 0.51235163 44 nips-2007-Catching Up Faster in Bayesian Model Selection and Model Averaging
Author: Tim V. Erven, Steven D. Rooij, Peter Grünwald
Abstract: Bayesian model averaging, model selection and their approximations such as BIC are generally statistically consistent, but sometimes achieve slower rates of convergence than other methods such as AIC and leave-one-out cross-validation. On the other hand, these other methods can be inconsistent. We identify the catch-up phenomenon as a novel explanation for the slow convergence of Bayesian methods. Based on this analysis we define the switch-distribution, a modification of the Bayesian model averaging distribution. We prove that in many situations model selection and prediction based on the switch-distribution is both consistent and achieves optimal convergence rates, thereby resolving the AIC-BIC dilemma. The method is practical; we give an efficient algorithm. 1
7 0.45740667 82 nips-2007-Estimating divergence functionals and the likelihood ratio by penalized convex risk minimization
8 0.43675369 8 nips-2007-A New View of Automatic Relevance Determination
9 0.42016828 186 nips-2007-Statistical Analysis of Semi-Supervised Regression
10 0.40246704 173 nips-2007-Second Order Bilinear Discriminant Analysis for single trial EEG analysis
11 0.39771572 159 nips-2007-Progressive mixture rules are deviation suboptimal
12 0.34333751 178 nips-2007-Simulated Annealing: Rigorous finite-time guarantees for optimization on continuous domains
13 0.34098703 28 nips-2007-Augmented Functional Time Series Representation and Forecasting with Gaussian Processes
14 0.33215693 89 nips-2007-Feature Selection Methods for Improving Protein Structure Prediction with Rosetta
15 0.32932553 67 nips-2007-Direct Importance Estimation with Model Selection and Its Application to Covariate Shift Adaptation
16 0.31977352 192 nips-2007-Testing for Homogeneity with Kernel Fisher Discriminant Analysis
17 0.31822437 174 nips-2007-Selecting Observations against Adversarial Objectives
18 0.31428084 34 nips-2007-Bayesian Policy Learning with Trans-Dimensional MCMC
19 0.30710551 130 nips-2007-Modeling Natural Sounds with Modulation Cascade Processes
20 0.30204022 148 nips-2007-Online Linear Regression and Its Application to Model-Based Reinforcement Learning
topicId topicWeight
[(5, 0.03), (13, 0.032), (16, 0.042), (18, 0.027), (19, 0.019), (21, 0.059), (26, 0.234), (31, 0.028), (34, 0.037), (35, 0.035), (47, 0.132), (49, 0.062), (83, 0.115), (90, 0.068)]
simIndex simValue paperId paperTitle
1 0.85798043 143 nips-2007-Object Recognition by Scene Alignment
Author: Bryan Russell, Antonio Torralba, Ce Liu, Rob Fergus, William T. Freeman
Abstract: Current object recognition systems can only recognize a limited number of object categories; scaling up to many categories is the next challenge. We seek to build a system to recognize and localize many different object categories in complex scenes. We achieve this through a simple approach: by matching the input image, in an appropriate representation, to images in a large training set of labeled images. Due to regularities in object identities across similar scenes, the retrieved matches provide hypotheses for object identities and locations. We build a probabilistic model to transfer the labels from the retrieval set to the input image. We demonstrate the effectiveness of this approach and study algorithm component contributions using held-out test sets from the LabelMe database. 1
same-paper 2 0.77421701 43 nips-2007-Catching Change-points with Lasso
Author: Céline Levy-leduc, Zaïd Harchaoui
Abstract: We propose a new approach for dealing with the estimation of the location of change-points in one-dimensional piecewise constant signals observed in white noise. Our approach consists in reframing this task in a variable selection context. We use a penalized least-squares criterion with an ℓ1-type penalty for this purpose. We prove some theoretical results on the estimated change-points and on the underlying piecewise constant estimated function. Then, we explain how to implement this method in practice by combining the LAR algorithm and a reduced version of the dynamic programming algorithm and we apply it to synthetic and real data.
3 0.72904801 46 nips-2007-Cluster Stability for Finite Samples
Author: Ohad Shamir, Naftali Tishby
Abstract: Over the past few years, the notion of stability in data clustering has received growing attention as a cluster validation criterion in a sample-based framework. However, recent work has shown that as the sample size increases, any clustering model will usually become asymptotically stable. This led to the conclusion that stability is lacking as a theoretical and practical tool. The discrepancy between this conclusion and the success of stability in practice has remained an open question, which we attempt to address. Our theoretical approach is that stability, as used by cluster validation algorithms, is similar in certain respects to measures of generalization in a model-selection framework. In such cases, the model chosen governs the convergence rate of generalization bounds. By arguing that these rates are more important than the sample size, we are led to the prediction that stability-based cluster validation algorithms should not degrade with increasing sample size, despite the asymptotic universal stability. This prediction is substantiated by a theoretical analysis as well as some empirical results. We conclude that stability remains a meaningful cluster validation criterion over finite samples. 1
4 0.6399827 210 nips-2007-Unconstrained On-line Handwriting Recognition with Recurrent Neural Networks
Author: Alex Graves, Marcus Liwicki, Horst Bunke, Jürgen Schmidhuber, Santiago Fernández
Abstract: In online handwriting recognition the trajectory of the pen is recorded during writing. Although the trajectory provides a compact and complete representation of the written output, it is hard to transcribe directly, because each letter is spread over many pen locations. Most recognition systems therefore employ sophisticated preprocessing techniques to put the inputs into a more localised form. However these techniques require considerable human effort, and are specific to particular languages and alphabets. This paper describes a system capable of directly transcribing raw online handwriting data. The system consists of an advanced recurrent neural network with an output layer designed for sequence labelling, combined with a probabilistic language model. In experiments on an unconstrained online database, we record excellent results using either raw or preprocessed data, well outperforming a state-of-the-art HMM based system in both cases. 1
5 0.6340332 168 nips-2007-Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods
Author: Alessandro Lazaric, Marcello Restelli, Andrea Bonarini
Abstract: Learning in real-world domains often requires to deal with continuous state and action spaces. Although many solutions have been proposed to apply Reinforcement Learning algorithms to continuous state problems, the same techniques can be hardly extended to continuous action spaces, where, besides the computation of a good approximation of the value function, a fast method for the identification of the highest-valued action is needed. In this paper, we propose a novel actor-critic approach in which the policy of the actor is estimated through sequential Monte Carlo methods. The importance sampling step is performed on the basis of the values learned by the critic, while the resampling step modifies the actor’s policy. The proposed approach has been empirically compared to other learning algorithms into several domains; in this paper, we report results obtained in a control problem consisting of steering a boat across a river. 1
6 0.62795639 122 nips-2007-Locality and low-dimensions in the prediction of natural experience from fMRI
7 0.62379509 148 nips-2007-Online Linear Regression and Its Application to Model-Based Reinforcement Learning
8 0.62295949 34 nips-2007-Bayesian Policy Learning with Trans-Dimensional MCMC
9 0.62016439 100 nips-2007-Hippocampal Contributions to Control: The Third Way
10 0.61884719 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images
11 0.61840594 18 nips-2007-A probabilistic model for generating realistic lip movements from speech
12 0.6174131 174 nips-2007-Selecting Observations against Adversarial Objectives
13 0.61727369 158 nips-2007-Probabilistic Matrix Factorization
14 0.61691165 86 nips-2007-Exponential Family Predictive Representations of State
15 0.61657381 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data
16 0.61606997 155 nips-2007-Predicting human gaze using low-level saliency combined with face detection
17 0.61553884 177 nips-2007-Simplified Rules and Theoretical Analysis for Information Bottleneck Optimization and PCA with Spiking Neurons
18 0.61537445 56 nips-2007-Configuration Estimates Improve Pedestrian Finding
19 0.61429524 63 nips-2007-Convex Relaxations of Latent Variable Training
20 0.61424708 24 nips-2007-An Analysis of Inference with the Universum