nips nips2010 nips2010-76 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: J. Z. Kolter, Siddharth Batra, Andrew Y. Ng
Abstract: Energy disaggregation is the task of taking a whole-home energy signal and separating it into its component appliances. Studies have shown that having device-level energy information can cause users to conserve significant amounts of energy, but current electricity meters only report whole-home data. Thus, developing algorithmic methods for disaggregation presents a key technical challenge in the effort to maximize energy conservation. In this paper, we examine a large scale energy disaggregation task, and apply a novel extension of sparse coding to this problem. In particular, we develop a method, based upon structured prediction, for discriminatively training sparse coding algorithms specifically to maximize disaggregation performance. We show that this significantly improves the performance of sparse coding algorithms on the energy task and illustrate how these disaggregation results can provide useful information about energy usage.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Energy disaggregation is the task of taking a whole-home energy signal and separating it into its component appliances. [sent-7, score-1.095]
2 Studies have shown that having device-level energy information can cause users to conserve significant amounts of energy, but current electricity meters only report whole-home data. [sent-8, score-0.39]
3 Thus, developing algorithmic methods for disaggregation presents a key technical challenge in the effort to maximize energy conservation. [sent-9, score-1.066]
4 In this paper, we examine a large scale energy disaggregation task, and apply a novel extension of sparse coding to this problem. [sent-10, score-1.318]
5 In particular, we develop a method, based upon structured prediction, for discriminatively training sparse coding algorithms specifically to maximize disaggregation performance. [sent-11, score-1.129]
6 We show that this significantly improves the performance of sparse coding algorithms on the energy task and illustrate how these disaggregation results can provide useful information about energy usage. [sent-12, score-1.629]
7 While there are of course numerous facets to the energy problem, there is a growing consensus that many energy and sustainability problems are fundamentally informatics problems, areas where machine learning can play a significant role. [sent-15, score-0.59]
8 This paper looks specifically at the task of energy disaggregation, an informatics task relating to energy efficiency. [sent-16, score-0.622]
9 Energy disaggregation, also called non-intrusive load monitoring [11], involves taking an aggregated energy signal, for example the total power consumption of a house as read by an electricity meter, and separating it into the different electrical appliances being used. [sent-17, score-0.751]
10 In the United States, electricity constitutes 38% of all energy used, and residential and commercial buildings together use 75% of this electricity [28]; thus, this 12% figure accounts for a sizable amount of energy that could potentially be saved. [sent-19, score-0.776]
11 However, the widely-available sensors that provide electricity consumption information, namely the so-called “Smart Meters” that are already becoming ubiquitous, collect energy information only at the whole-home level and at a very low resolution (typically every hour or 15 minutes). [sent-20, score-0.482]
12 Thus, energy disaggregation methods that can take this whole-home data and use it to predict individual appliance usage present an algorithmic challenge where advances can have a significant impact on large-scale energy efficiency issues. [sent-21, score-1.576]
13 The algorithmic approach we present in this paper builds upon sparse coding methods and recent work in single-channel source separation [24, 23, 22]. [sent-25, score-0.345]
14 Specifically, we use a sparse coding algorithm to learn a model of each device’s power consumption over a typical week, then combine these learned models to predict the power consumption of different devices in previously unseen homes, using their aggregate signal alone. [sent-26, score-0.846]
15 While energy disaggregation can naturally be formulated as such a single-channel source separation problem, we know of no previous application of these methods to the energy disaggregation task. [sent-27, score-2.125]
16 As a second major contribution of the paper, we develop a novel approach for discriminatively training sparse coding dictionaries for disaggregation tasks, and show that this significantly improves performance on our energy domain. [sent-29, score-1.398]
17 Specifically, we formulate the task of maximizing disaggregation performance as a structured prediction problem, which leads to a simple and effective algorithm for discriminatively training such sparse representations for disaggregation tasks. [sent-30, score-1.735]
18 The algorithm is similar in spirit to a number of recent approaches to discriminative training of sparse representations [12, 17, 18]. [sent-31, score-0.201]
19 2 Discriminative Disaggregation via Sparse Coding We begin by reviewing sparse coding methods and their application to disaggregation tasks. [sent-33, score-1.023]
20 For concreteness we use the terminology of our energy disaggregation domain throughout this description, but the algorithms can apply equally to other domains. [sent-34, score-1.041]
21 Formally, assume we are given k different classes, which in our setting correspond to device categories such as televisions, refrigerators, heaters, etc. [sent-35, score-0.22]
22 For each class $i = 1, \ldots, k$, we have a matrix $X_i \in \mathbb{R}^{T \times m}$, where each column of $X_i$ contains a week of energy usage (measured every hour) for a particular house and for this particular type of device. [sent-39, score-0.589]
23 Thus, for example, the jth column of $X_1$, which we denote $x_1^{(j)}$, may contain weekly energy consumption for a refrigerator (for a single week in a single house), and $x_2^{(j)}$ could contain weekly energy consumption of a heater (for this same week in the same house). [sent-40, score-1.274]
24 We denote the aggregate power consumption over all device types as $\bar{X} \equiv \sum_{i=1}^{k} X_i$, so that the jth column of $\bar{X}$, $\bar{x}^{(j)}$, contains a week of aggregated energy consumption for all devices in a given house. [sent-41, score-1.179]
25 At training time, we assume we have access to the individual device energy readings $X_1, \ldots, X_k$. [sent-42, score-0.594]
26 At test time, however, we assume that we have access only to the aggregate signal of a new set of data points, $\bar{X}'$ (as would be reported by a smart meter), and the goal is to separate this signal into its components $X'_1, \ldots, X'_k$. [sent-46, score-0.169]
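To make this setup concrete, the following Python/NumPy sketch lays out the per-device matrices $X_i$ and the aggregate signal $\bar{X}$. The dimensions, random data, and variable names are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Illustrative dimensions (assumptions): T hourly readings per week, m example
# homes/weeks, k device classes. One week of hourly data gives T = 24 * 7 = 168.
T, m, k = 24 * 7, 50, 3

rng = np.random.default_rng(0)

# X[i] is the T x m matrix of week-long energy readings for device class i
# (each column is one house/week). Non-negative by construction.
X = [rng.gamma(shape=1.0, scale=0.2, size=(T, m)) for _ in range(k)]

# Aggregate signal X_bar = sum_i X_i: what a whole-home smart meter would report.
X_bar = sum(X)

# At training time the individual X[i] are observed; at test time only an
# aggregate signal from new homes is available, and the goal is to recover
# its k per-device components.
print(X_bar.shape)  # (168, 50)
```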
27 The sparse coding approach to source separation (e.g., [24, 23]), which forms the basis for our disaggregation approach, is to train separate models for each individual class $X_i$, then use these models to separate an aggregate signal. [sent-52, score-0.881]
29 Formally, sparse coding models the ith data matrix using the approximation $X_i \approx B_i A_i$, where the columns of $B_i \in \mathbb{R}^{T \times n}$ contain a set of n basis functions, also called the dictionary, and the columns of $A_i \in \mathbb{R}^{n \times m}$ contain the activations of these basis functions [20]. [sent-53, score-0.505]
30 Sparse coding additionally imposes the constraint that the activations $A_i$ be sparse, i.e., that they contain mostly zero entries. [sent-54, score-0.256]
31 Since energy usage is an inherently non-negative quantity, we impose the further constraint that the activations and bases be non-negative, an extension known as non-negative sparse coding [13, 7]. [sent-58, score-0.927]
32 Specifically, in this paper we will consider the non-negative sparse coding objective $\min_{A_i \ge 0,\, B_i \ge 0}\ \tfrac{1}{2}\|X_i - B_i A_i\|_F^2 + \lambda \sum_{p,q} (A_i)_{pq}$ subject to $\|b_i^{(j)}\|_2 \le 1$, $j = 1, \ldots, n$. [sent-59, score-0.36]
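The excerpt does not specify which solver is used for this objective; the sketch below uses simple alternating projected-gradient updates only to make the objective and its constraints concrete. The function name, step size, and iteration count are assumptions.

```python
import numpy as np

def nn_sparse_coding(Xi, n=40, lam=0.1, iters=500, step=1e-3, seed=0):
    """Approximately solve min_{A,B >= 0} 0.5*||Xi - B A||_F^2 + lam * sum(A)
    subject to ||b^(j)||_2 <= 1 for each column of B, by alternating projected
    gradient steps. An illustrative solver, not the one used in the paper."""
    rng = np.random.default_rng(seed)
    T, m = Xi.shape
    B = np.abs(rng.standard_normal((T, n)))
    B /= np.maximum(np.linalg.norm(B, axis=0, keepdims=True), 1.0)  # ||b^(j)||_2 <= 1
    A = 0.01 * np.abs(rng.standard_normal((n, m)))
    for _ in range(iters):
        R = B @ A - Xi                                      # reconstruction residual
        A = np.maximum(A - step * (B.T @ R + lam), 0.0)     # grad + L1 penalty, clip to >= 0
        R = B @ A - Xi
        B = np.maximum(B - step * (R @ A.T), 0.0)           # grad step, clip to >= 0
        B /= np.maximum(np.linalg.norm(B, axis=0, keepdims=True), 1.0)  # norm constraint
    return B, A
```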
33 Given bases $B_i$ learned for each class $i = 1, \ldots, k$, we can disaggregate a new aggregate signal $\bar{X} \in \mathbb{R}^{T \times m}$ (without providing the algorithm its individual components) using the following procedure, as used by, e.g., the source separation approaches above. [sent-67, score-0.123]
34 We concatenate the bases to form a single joint set of basis functions and solve the optimization problem $\hat{A}_{1:k} = \arg\min_{A_{1:k} \ge 0}\ \tfrac{1}{2}\big\|\bar{X} - [B_1 \cdots B_k]\,[A_1; \ldots; A_k]\big\|_F^2 + \lambda \sum_{i,p,q} (A_i)_{pq}$ (2), and we denote this joint objective by $F(\bar{X}, B_{1:k}, A_{1:k})$. [sent-70, score-0.208]
35 The intuition is that the bases $B_i$ should reconstruct the ith component at lower cost (i.e., require smaller activations) than all other bases $B_j$ for $j \ne i$. [sent-80, score-0.165]
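A sketch of this disaggregation step under the same illustrative assumptions as the earlier code: activations for all classes are fit jointly to the aggregate signal over the concatenated dictionary, and each component is then reconstructed as $B_i \hat{A}_i$ (the prediction evaluated later in the text). The projected-gradient solver, step size, and iteration count are again assumptions.

```python
import numpy as np

def joint_activations(X_bar, B_list, lam=0.1, iters=1000, step=1e-3):
    """Approximately solve A_hat = argmin_{A >= 0} 0.5*||X_bar - [B_1 ... B_k] A||_F^2
    + lam * sum(A) by projected gradient (illustrative, not the paper's solver)."""
    B_cat = np.hstack(B_list)                    # T x (n_1 + ... + n_k)
    A = np.zeros((B_cat.shape[1], X_bar.shape[1]))
    for _ in range(iters):
        grad = B_cat.T @ (B_cat @ A - X_bar) + lam
        A = np.maximum(A - step * grad, 0.0)
    # Split the stacked activations back into per-class blocks A_hat_1, ..., A_hat_k.
    sizes = [B.shape[1] for B in B_list]
    return np.split(A, np.cumsum(sizes)[:-1], axis=0)

def disaggregate(X_bar, B_list, lam=0.1):
    """Predict each device signal as X_hat_i = B_i @ A_hat_i."""
    A_hat = joint_activations(X_bar, B_list, lam)
    return [B @ A for B, A in zip(B_list, A_hat)]
```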
36 1 Structured Prediction for Discriminative Disaggregation Sparse Coding An issue with using sparse coding alone for disaggregation tasks is that the bases are not trained to minimize the disaggregation error. [sent-83, score-1.949]
37 Instead, the method relies on the hope that learning basis functions for each class individually will produce bases that are distinct enough to also produce small disaggregation error. [sent-84, score-0.954]
38 Furthermore, it is very difficult to optimize the disaggregation error directly over B1:k , due to the non-differentiability (and discontinuity) of the argmin operator with a nonnegativity constraint. [sent-85, score-0.746]
39 Instead, we propose in this paper a method for optimizing disaggregation performance based upon structured prediction methods [27]. [sent-88, score-0.794]
40 To describe our approach, we first define the regularized disaggregation error, which is simply the disaggregation error plus a regularization penalty on $\hat{A}_{1:k}$: $E_{\mathrm{reg}}(X_{1:k}, B_{1:k}) \equiv E(X_{1:k}, B_{1:k}) + \lambda \sum_{i,p,q} (\hat{A}_i)_{pq}$ (5), where $\hat{A}_{1:k}$ is defined as in (2). [sent-89, score-0.746]
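The disaggregation error $E$ of (4) is not reproduced in this excerpt; the sketch below assumes it is the summed squared error between each true device signal $X_i$ and its reconstruction $B_i \hat{A}_i$ from the aggregate solve, which is consistent with how the predictions $B_i \hat{A}_i$ are evaluated later in the text. That assumed form, and the variable names, are illustrative.

```python
import numpy as np

def disaggregation_error(X_list, B_list, A_hat_list):
    """Assumed form of E(X_{1:k}, B_{1:k}): summed squared reconstruction error of each
    device signal, using the activations A_hat obtained from the aggregate signal."""
    return 0.5 * sum(np.sum((Xi - Bi @ Ai) ** 2)
                     for Xi, Bi, Ai in zip(X_list, B_list, A_hat_list))

def regularized_disaggregation_error(X_list, B_list, A_hat_list, lam=0.1):
    """E_reg in (5): the disaggregation error plus the L1 penalty on the activations."""
    return (disaggregation_error(X_list, B_list, A_hat_list)
            + lam * sum(np.sum(Ai) for Ai in A_hat_list))
```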
41 This criterion provides a better optimization objective for our algorithm, as we wish to obtain a sparse set of coefficients that can achieve low disaggregation error. [sent-90, score-0.845]
42 Clearly, the best possible value of $\hat{A}_i$ for this objective function is given by $A_i^\star = \arg\min_{A_i \ge 0}\ \tfrac{1}{2}\|X_i - B_i A_i\|_F^2 + \lambda \sum_{p,q} (A_i)_{pq}$ (6), which is precisely the activations obtained after an iteration of sparse coding on the data matrix $X_i$. [sent-91, score-0.379]
43 Motivated by this fact, the first intuition of our algorithm is that, in order to minimize disaggregation error, we can discriminatively optimize the bases $B_{1:k}$ such that performing the optimization (2) produces activations that are as close to $A^\star_{1:k}$ as possible. [sent-92, score-1.039]
44 Of course, changing the bases $B_{1:k}$ to optimize this criterion would also change the resulting optimal coefficients $A^\star_{1:k}$. [sent-93, score-0.165]
45 Thus, the second intuition of our method is that the bases used in the optimization (2) need not be the same as the bases used to reconstruct the signals. [sent-94, score-0.348]
46 Discriminatively training the disaggregation bases $\tilde{B}_{1:k}$ is naturally framed as a structured prediction task: the input is $\bar{X}$, the multivariate desired output is $A^\star_{1:k}$, the model parameters are $\tilde{B}_{1:k}$, and the discriminant function is $F(\bar{X}, \tilde{B}_{1:k}, A_{1:k})$. [sent-96, score-0.989]
47 In other words, we seek bases $\tilde{B}_{1:k}$ such that (ideally) $A^\star_{1:k} = \arg\min_{A_{1:k} \ge 0} F(\bar{X}, \tilde{B}_{1:k}, A_{1:k})$. [sent-97, score-0.207]
48 Our complete method for discriminative disaggregation sparse coding, which we call DDSC, is shown in Algorithm 1. [sent-103, score-0.917]
49 Formally, the goal is to output the desired activations $(a^\star_{1:k})^{(j)}$ for the jth example $\bar{x}^{(j)}$; however, since the function F decomposes across the columns of $\bar{X}$ and $A$, the notation above is equivalent to this more explicit formulation. [sent-105, score-0.114]
50 Algorithm 1 (Discriminative disaggregation sparse coding). Input: data points for each individual source $X_i \in \mathbb{R}^{T \times m}$, $i = 1, \ldots, k$. [sent-106, score-1.06]
51 Sparse coding pre-training — for each $i = 1, \ldots, k$, iterate until convergence: (a) $A_i \leftarrow \arg\min_{A \ge 0} \|X_i - B_i A\|_F^2 + \lambda \sum_{p,q} A_{pq}$; (b) $B_i \leftarrow \arg\min_{B \ge 0,\ \|b^{(j)}\|_2 \le 1} \|X_i - B A_i\|_F^2$. Discriminative disaggregation training: 3. initialize the discriminative bases $\tilde{B}_{1:k}$ (from the pre-trained $B_{1:k}$). [sent-116, score-0.273]
52 Iterate until convergence: (a) $\hat{A}_{1:k} \leftarrow \arg\min_{A_{1:k} \ge 0} F(\bar{X}, \tilde{B}_{1:k}, A_{1:k})$; (b) $\tilde{B} \leftarrow \big[\tilde{B} - \alpha\big((\bar{X} - \tilde{B}\hat{A})\hat{A}^T - (\bar{X} - \tilde{B}A^\star)(A^\star)^T\big)\big]_+$, where $[\cdot]_+$ denotes projection onto the non-negative orthant; (c) for all $i, j$, $\tilde{b}_i^{(j)} \leftarrow \tilde{b}_i^{(j)} / \|\tilde{b}_i^{(j)}\|_2$. [sent-119, score-0.273]
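A minimal sketch of the discriminative basis update (steps 4b-4c as reconstructed above). In the full algorithm, $\hat{A}_{1:k}$ is recomputed at each iteration by re-solving the joint activation problem (step 4a), for which a solver like the earlier sketch could be used; only the perceptron-style basis update and renormalization are shown here, and the step size alpha is an assumption.

```python
import numpy as np

def ddsc_basis_update(X_bar, B_tilde, A_hat, A_star, alpha=1e-4):
    """One discriminative update of the concatenated disaggregation bases B_tilde:
    a gradient step that pushes the activations computed from the aggregate signal
    (A_hat) toward the target activations from individual-source sparse coding
    (A_star), followed by projection to non-negativity and column re-normalization."""
    grad = (X_bar - B_tilde @ A_hat) @ A_hat.T - (X_bar - B_tilde @ A_star) @ A_star.T
    B_new = np.maximum(B_tilde - alpha * grad, 0.0)          # [ . ]_+ projection
    norms = np.maximum(np.linalg.norm(B_new, axis=0, keepdims=True), 1e-12)
    return B_new / norms                                     # unit-norm columns
```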
53 2 Extensions Although, as we show shortly, the discriminative training procedure has made the largest difference in terms of improving disaggregation performance in our domain, a number of other modifications to the standard sparse coding formulation have also proven useful. [sent-124, score-1.125]
54 One deficiency of the sparse coding framework for energy disaggregation is that the optimization objective does not take into consideration the size of an energy signal for determining which class it belongs to, just its shape. [sent-127, score-1.651]
55 Since total energy used is obviously a discriminating factor for different device types, we consider an extension that penalizes the ℓ2 deviation between a device and its mean total energy. [sent-128, score-0.735]
56 Formally, we augment the objective F with the penalty $F_{TEP}(\bar{X}, B_{1:k}, A_{1:k}) = F(\bar{X}, B_{1:k}, A_{1:k}) + \lambda_{TEP} \sum_{i=1}^{k} \big\|\mu_i 1^T - 1^T B_i A_i\big\|_2^2$ (11), where 1 denotes a vector of ones of the appropriate size and $\mu_i$ is the mean total energy of device class i. [sent-129, score-0.515]
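A sketch of the added total-energy-prior term under the same array conventions as the earlier code; mu is assumed to hold the per-class mean total energies, and the weight is illustrative.

```python
import numpy as np

def total_energy_penalty(B_list, A_list, mu, lam_tep=0.01):
    """Added term of F_TEP in (11): for each device class i, penalize the squared
    deviation between the predicted total energy of each example (the column sums
    of B_i A_i) and that class's mean total energy mu[i]."""
    penalty = 0.0
    for Bi, Ai, mu_i in zip(B_list, A_list, mu):
        totals = (Bi @ Ai).sum(axis=0)            # 1^T B_i A_i: total energy per example
        penalty += np.sum((mu_i - totals) ** 2)   # || mu_i 1^T - 1^T B_i A_i ||_2^2
    return lam_tep * penalty
```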
57 Since the data set we consider exhibits some amount of sparsity at the device level (i.e., several examples have zero energy consumed by certain device types, as there is either no such device in the home or it was not being monitored), we also would like to encourage a grouping effect in the activations. [sent-133, score-0.801]
59 To achieve this, we employ the group Lasso [29], which adds an ℓ2 norm penalty to the activations of each device: $F_{GL}(\bar{X}, B_{1:k}, A_{1:k}) = F(\bar{X}, B_{1:k}, A_{1:k}) + \lambda_{GL} \sum_{i=1}^{k} \sum_{j=1}^{m} \|a_i^{(j)}\|_2$. [sent-135, score-0.402]
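A sketch of this group Lasso term, assuming the activations are stored per class as in the earlier sketches; the weight is illustrative. Because the per-example norms are not squared, entire per-device activation blocks are encouraged to be exactly zero.

```python
import numpy as np

def group_lasso_penalty(A_list, lam_gl=0.01):
    """Added term of F_GL: the L2 (not squared) norm of each example's activation
    vector within each device class, summed over classes i and examples j."""
    return lam_gl * sum(np.linalg.norm(Ai, axis=0).sum() for Ai in A_list)
```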
60 Shift-invariant, or convolutional, sparse coding is an extension to the standard sparse coding framework where each basis is convolved over the input data, with a separate activation for each shift position [3, 10]. [sent-137, score-0.647]
61 Such a scheme may intuitively seem to be beneficial for the energy disaggregation task, where a given device might exhibit the same energy signature at different times. [sent-138, score-1.576]
62 However, pure shift invariant bases cannot capture information about when in the week or day each device is typically used, and such information has proven crucial for disaggregation performance. [sent-140, score-1.326]
63 In particular, most of the time spent by the algorithm involves solving sparse optimization problems to find the activation coefficients, namely steps 2a and 4a in Algorithm 1. [sent-143, score-0.099]
64 The data set contains hourly energy readings from 10,165 different devices in 590 homes, collected over more than two years. [sent-148, score-0.508]
65 Each device is labeled with one of 52 device types, which we further reduce to ten broad categories of electrical devices: lighting, TV, computer, other electronics, kitchen appliances, washing machine and dryer, refrigerator and freezer, dishwasher, heating/cooling, and a miscellaneous category. [sent-149, score-0.602]
66 We fit the hyper-parameters of the algorithms (number of bases and regularization parameters) using grid search over each parameter independently on a cross validation set consisting of 20% of the training homes. [sent-152, score-0.195]
67 Figure 1 shows the true energy consumed by two different houses in the test set for two different weeks, along with the energy consumption predicted by our algorithms. [sent-155, score-1.062]
68 The figure shows both the predicted energy of several devices over the whole week, as well as a pie chart that shows the relative energy consumption of different device types over the whole week (a more intuitive display of energy consumed over the week). [sent-156, score-1.601]
69 In many cases, certain devices like the refrigerator, washer/dryer, and computer are predicted quite accurately, both in terms of the total predicted percentage and in terms of the signals themselves. [sent-157, score-0.233]
70 There are also cases where certain devices are not predicted well, such as underestimating the heating component in the example on the left, and predicting a spike in computer usage in the example on the right when it was in fact a dishwasher. [sent-158, score-0.306]
71 Nonetheless, despite some poor predictions at the hourly device level, the breakdown of electric consumption is still quite informative, determining the approximate percentage of many device types and demonstrating the promise of such feedback. [sent-159, score-0.566]
72 In addition to the disaggregation results themselves, sparse coding representations of the different device types are interesting in their own right, as they give a good intuition about how the different devices are typically used. [sent-160, score-1.415]
73 In each plot, the grayscale image on the right shows an intensity map of all basis functions learned for that device category, where each column in the image corresponds to a learned basis. [sent-162, score-0.431]
74 The plot on the left shows examples of seven basis functions for the different device types. [sent-163, score-0.287]
75 Notice, for example, that the bases learned for the washer/dryer devices are nearly all heavily peaked, while the refrigerator bases are much lower in maximum magnitude. [sent-164, score-0.599]
76 Additionally, in the basis images, devices like lighting demonstrate a clear "band" pattern, indicating that these devices are likely to be on and off during certain times of the day (each basis covers a week of energy usage, so the seven bands represent the seven days). [sent-165, score-0.395]
77 Figure 1: actual versus predicted energy (whole-home and per-device panels). Blue lines show the true energy usage, and red the predicted usage, both in units of kWh. [sent-193, score-0.334]
78 Figure 2: Example basis functions learned from three device categories (best viewed in color). [sent-206, score-0.276]
79 The plot on the left shows seven example bases, while the image on the right shows all learned basis functions (one basis per column). [sent-207, score-0.125]
81 There is sufficient training data such that, for devices like washers and dryers, we learn a separate basis for all possible shifts. [sent-210, score-0.244]
82 In contrast, for devices like lighting, where the time of usage is an important factor, simple shift-invariant bases miss key information. [sent-211, score-0.432]
83 While many of the algorithmic elements improve the disaggregation performance, the results in this section show that the discriminative training in particular is crucial for optimizing disaggregation performance. [sent-214, score-1.619]
84 The most natural metric for evaluating disaggregation performance is the disaggregation error in (4). [sent-215, score-1.492]
85 However, average disaggregation error is not a particularly intuitive metric, and so we also evaluate a total-week accuracy of the prediction system, defined formally as $\text{Accuracy} \equiv \dfrac{\sum_{i,q} \min\!\big\{\sum_p (X_i)_{pq},\ \sum_p (B_i \hat{A}_i)_{pq}\big\}}{\sum_{p,q} \bar{X}_{pq}}$. [sent-216, score-0.768]
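A sketch of this total-week accuracy under the same array conventions as the earlier code: for every device class and example, the smaller of the true and predicted total energy is credited, then normalized by the total aggregate energy.

```python
import numpy as np

def total_week_accuracy(X_list, B_list, A_hat_list, X_bar):
    """Total-week accuracy as reconstructed above: sum over classes i and examples q
    of min( sum_p (X_i)_pq, sum_p (B_i A_hat_i)_pq ), divided by sum_pq X_bar_pq."""
    credited = 0.0
    for Xi, Bi, Ai in zip(X_list, B_list, A_hat_list):
        true_totals = Xi.sum(axis=0)           # sum_p (X_i)_pq for each example q
        pred_totals = (Bi @ Ai).sum(axis=0)    # sum_p (B_i A_hat_i)_pq
        credited += np.minimum(true_totals, pred_totals).sum()
    return credited / X_bar.sum()
```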
86 Figure 3: Evolution of training and testing errors for iterations of the discriminative DDSC updates. [sent-278, score-0.102]
87 Despite the complex definition, this quantity simply captures the average amount of energy predicted correctly over the week (i.e., the overlap between the true and predicted energy pie charts). [sent-279, score-0.459]
89 Table 1 shows the disaggregation performance obtained by many different prediction methods. [sent-282, score-0.768]
90 To put these accuracies in context, we note that, separately from the results presented here, we trained an SVM, using a variety of hand-engineered features, to classify individual energy signals into their device category, and were able to achieve at most 59% classification accuracy. [sent-284, score-0.552]
91 It is clear, then, that the discriminative training is crucial to improving the performance of the sparse coding disaggregation procedure within this range, and does provide a significant improvement over the baseline. [sent-286, score-1.125]
92 4 Conclusion Energy disaggregation is a domain where advances in machine learning can have a significant impact on energy use. [sent-288, score-1.041]
93 In this paper we presented an application of sparse coding algorithms to this task, focusing on a large data set that contains the type of low-resolution data readily available from smart meters. [sent-289, score-0.315]
94 We developed the discriminative disaggregation sparse coding (DDSC) algorithm, a novel discriminative training procedure, and showed that this algorithm significantly improves the accuracy of sparse coding for the energy disaggregation task. [sent-290, score-2.515]
95 We are very grateful to Plugwise for providing us with their plug-level energy data set, and in particular we thank Willem Houck for his assistance with this data. [sent-292, score-0.295]
96 A note on the group lasso and a sparse group lasso. [sent-339, score-0.157]
97 Discriminative sparse image models for class-specific edge detection and image interpretation. [sent-400, score-0.099]
98 Emergence of simple-cell receptive field properties by learning a sparse code for natural images. [sent-412, score-0.099]
99 At the flick of a switch: Detecting and classifying unique electrical events on the residential power line. [sent-424, score-0.138]
100 Using appliance signatures for monitoring residential loads at meter panel level. [sent-460, score-0.195]
wordName wordTfidf (topN-words)
[('disaggregation', 0.746), ('energy', 0.295), ('device', 0.22), ('coding', 0.178), ('bases', 0.165), ('devices', 0.155), ('ddsc', 0.137), ('week', 0.125), ('usage', 0.112), ('consumption', 0.11), ('refrigerator', 0.099), ('sparse', 0.099), ('homes', 0.087), ('ai', 0.086), ('bi', 0.083), ('activations', 0.078), ('discriminative', 0.072), ('pq', 0.07), ('electricity', 0.062), ('appliances', 0.062), ('dishwasher', 0.062), ('residential', 0.062), ('tep', 0.062), ('discriminatively', 0.05), ('basis', 0.043), ('gl', 0.042), ('lighting', 0.042), ('house', 0.041), ('power', 0.041), ('aggregate', 0.039), ('predicted', 0.039), ('signal', 0.038), ('smart', 0.038), ('home', 0.038), ('load', 0.038), ('appliance', 0.037), ('leeb', 0.037), ('meter', 0.037), ('plugwise', 0.037), ('monitoring', 0.037), ('ba', 0.036), ('electrical', 0.035), ('shift', 0.034), ('electric', 0.034), ('meters', 0.033), ('aggregated', 0.03), ('training', 0.03), ('hourly', 0.03), ('mina', 0.03), ('weeks', 0.03), ('consumed', 0.028), ('readings', 0.028), ('kitchen', 0.028), ('separation', 0.027), ('electronics', 0.027), ('structured', 0.026), ('disaggregate', 0.025), ('disaggregating', 0.025), ('ereg', 0.025), ('laughman', 0.025), ('nonintrusive', 0.025), ('shaw', 0.025), ('sisc', 0.025), ('wholehome', 0.025), ('algorithmic', 0.025), ('seven', 0.024), ('arg', 0.024), ('pie', 0.022), ('signatures', 0.022), ('weekly', 0.022), ('lasso', 0.022), ('perceptron', 0.022), ('prediction', 0.022), ('jth', 0.021), ('individual', 0.021), ('day', 0.02), ('loses', 0.02), ('signature', 0.02), ('rt', 0.02), ('predict', 0.02), ('tv', 0.019), ('xi', 0.019), ('group', 0.018), ('reconstruct', 0.018), ('discriminant', 0.018), ('contain', 0.017), ('types', 0.017), ('separate', 0.016), ('task', 0.016), ('invariant', 0.016), ('stanford', 0.016), ('ubiquitous', 0.016), ('source', 0.016), ('column', 0.016), ('hour', 0.015), ('columns', 0.015), ('alone', 0.015), ('audio', 0.015), ('receiving', 0.015), ('learned', 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 76 nips-2010-Energy Disaggregation via Discriminative Sparse Coding
Author: J. Z. Kolter, Siddharth Batra, Andrew Y. Ng
Abstract: Energy disaggregation is the task of taking a whole-home energy signal and separating it into its component appliances. Studies have shown that having device-level energy information can cause users to conserve significant amounts of energy, but current electricity meters only report whole-home data. Thus, developing algorithmic methods for disaggregation presents a key technical challenge in the effort to maximize energy conservation. In this paper, we examine a large scale energy disaggregation task, and apply a novel extension of sparse coding to this problem. In particular, we develop a method, based upon structured prediction, for discriminatively training sparse coding algorithms specifically to maximize disaggregation performance. We show that this significantly improves the performance of sparse coding algorithms on the energy task and illustrate how these disaggregation results can provide useful information about energy usage.
2 0.18488233 59 nips-2010-Deep Coding Network
Author: Yuanqing Lin, Zhang Tong, Shenghuo Zhu, Kai Yu
Abstract: This paper proposes a principled extension of the traditional single-layer flat sparse coding scheme, where a two-layer coding scheme is derived based on theoretical analysis of nonlinear functional approximation that extends recent results for local coordinate coding. The two-layer approach can be easily generalized to deeper structures in a hierarchical multiple-layer manner. Empirically, it is shown that the deep coding approach yields improved performance in benchmark datasets.
3 0.075067498 143 nips-2010-Learning Convolutional Feature Hierarchies for Visual Recognition
Author: Koray Kavukcuoglu, Pierre Sermanet, Y-lan Boureau, Karol Gregor, Michael Mathieu, Yann L. Cun
Abstract: We propose an unsupervised method for learning multi-stage hierarchies of sparse convolutional features. While sparse coding has become an increasingly popular method for learning visual features, it is most often trained at the patch level. Applying the resulting filters convolutionally results in highly redundant codes because overlapping patches are encoded in isolation. By training convolutionally over large image windows, our method reduces the redudancy between feature vectors at neighboring locations and improves the efficiency of the overall representation. In addition to a linear decoder that reconstructs the image from sparse features, our method trains an efficient feed-forward encoder that predicts quasisparse features from the input. While patch-based training rarely produces anything but oriented edge detectors, we show that convolutional training produces highly diverse filters, including center-surround filters, corner detectors, cross detectors, and oriented grating detectors. We show that using these filters in multistage convolutional network architecture improves performance on a number of visual recognition and detection tasks. 1
4 0.074145809 37 nips-2010-Basis Construction from Power Series Expansions of Value Functions
Author: Sridhar Mahadevan, Bo Liu
Abstract: This paper explores links between basis construction methods in Markov decision processes and power series expansions of value functions. This perspective provides a useful framework to analyze properties of existing bases, as well as provides insight into constructing more effective bases. Krylov and Bellman error bases are based on the Neumann series expansion. These bases incur very large initial Bellman errors, and can converge rather slowly as the discount factor approaches unity. The Laurent series expansion, which relates discounted and average-reward formulations, provides both an explanation for this slow convergence as well as suggests a way to construct more efficient basis representations. The first two terms in the Laurent series represent the scaled average-reward and the average-adjusted sum of rewards, and subsequent terms expand the discounted value function using powers of a generalized inverse called the Drazin (or group inverse) of a singular matrix derived from the transition matrix. Experiments show that Drazin bases converge considerably more quickly than several other bases, particularly for large values of the discount factor. An incremental variant of Drazin bases called Bellman average-reward bases (BARBs) is described, which provides some of the same benefits at lower computational cost. 1
5 0.067945316 246 nips-2010-Sparse Coding for Learning Interpretable Spatio-Temporal Primitives
Author: Taehwan Kim, Gregory Shakhnarovich, Raquel Urtasun
Abstract: Sparse coding has recently become a popular approach in computer vision to learn dictionaries of natural images. In this paper we extend the sparse coding framework to learn interpretable spatio-temporal primitives. We formulated the problem as a tensor factorization problem with tensor group norm constraints over the primitives, diagonal constraints on the activations that provide interpretability as well as smoothness constraints that are inherent to human motion. We demonstrate the effectiveness of our approach to learn interpretable representations of human motion from motion capture data, and show that our approach outperforms recently developed matching pursuit and sparse coding algorithms. 1
6 0.066018082 109 nips-2010-Group Sparse Coding with a Laplacian Scale Mixture Prior
7 0.055888217 268 nips-2010-The Neural Costs of Optimal Control
8 0.053835623 56 nips-2010-Deciphering subsampled data: adaptive compressive sampling as a principle of brain communication
9 0.052840386 89 nips-2010-Factorized Latent Spaces with Structured Sparsity
10 0.047249768 123 nips-2010-Individualized ROI Optimization via Maximization of Group-wise Consistency of Structural and Functional Profiles
11 0.042527691 3 nips-2010-A Bayesian Framework for Figure-Ground Interpretation
12 0.042007852 260 nips-2010-Sufficient Conditions for Generating Group Level Sparsity in a Robust Minimax Framework
13 0.04162344 65 nips-2010-Divisive Normalization: Justification and Effectiveness as Efficient Coding Transform
14 0.040820312 96 nips-2010-Fractionally Predictive Spiking Neurons
15 0.038773175 35 nips-2010-Auto-Regressive HMM Inference with Incomplete Data for Short-Horizon Wind Forecasting
16 0.038263898 7 nips-2010-A Family of Penalty Functions for Structured Sparsity
17 0.03813111 103 nips-2010-Generating more realistic images using gated MRF's
18 0.036886211 169 nips-2010-More data means less inference: A pseudo-max approach to structured learning
19 0.035772435 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models
20 0.035480428 224 nips-2010-Regularized estimation of image statistics by Score Matching
topicId topicWeight
[(0, 0.113), (1, 0.041), (2, -0.064), (3, 0.022), (4, 0.036), (5, -0.035), (6, -0.004), (7, 0.086), (8, -0.077), (9, -0.014), (10, 0.037), (11, -0.008), (12, 0.032), (13, -0.068), (14, -0.045), (15, -0.063), (16, -0.052), (17, 0.057), (18, -0.037), (19, -0.064), (20, 0.047), (21, -0.036), (22, -0.001), (23, -0.017), (24, -0.056), (25, -0.069), (26, -0.072), (27, -0.071), (28, -0.034), (29, 0.116), (30, 0.027), (31, 0.132), (32, -0.13), (33, 0.035), (34, -0.045), (35, 0.053), (36, 0.048), (37, -0.041), (38, -0.069), (39, -0.051), (40, -0.005), (41, -0.052), (42, 0.139), (43, 0.022), (44, 0.127), (45, 0.035), (46, 0.036), (47, -0.05), (48, -0.034), (49, -0.054)]
simIndex simValue paperId paperTitle
same-paper 1 0.93136448 76 nips-2010-Energy Disaggregation via Discriminative Sparse Coding
Author: J. Z. Kolter, Siddharth Batra, Andrew Y. Ng
Abstract: Energy disaggregation is the task of taking a whole-home energy signal and separating it into its component appliances. Studies have shown that having device-level energy information can cause users to conserve significant amounts of energy, but current electricity meters only report whole-home data. Thus, developing algorithmic methods for disaggregation presents a key technical challenge in the effort to maximize energy conservation. In this paper, we examine a large scale energy disaggregation task, and apply a novel extension of sparse coding to this problem. In particular, we develop a method, based upon structured prediction, for discriminatively training sparse coding algorithms specifically to maximize disaggregation performance. We show that this significantly improves the performance of sparse coding algorithms on the energy task and illustrate how these disaggregation results can provide useful information about energy usage.
2 0.8135134 59 nips-2010-Deep Coding Network
Author: Yuanqing Lin, Zhang Tong, Shenghuo Zhu, Kai Yu
Abstract: This paper proposes a principled extension of the traditional single-layer flat sparse coding scheme, where a two-layer coding scheme is derived based on theoretical analysis of nonlinear functional approximation that extends recent results for local coordinate coding. The two-layer approach can be easily generalized to deeper structures in a hierarchical multiple-layer manner. Empirically, it is shown that the deep coding approach yields improved performance in benchmark datasets.
3 0.67589343 246 nips-2010-Sparse Coding for Learning Interpretable Spatio-Temporal Primitives
Author: Taehwan Kim, Gregory Shakhnarovich, Raquel Urtasun
Abstract: Sparse coding has recently become a popular approach in computer vision to learn dictionaries of natural images. In this paper we extend the sparse coding framework to learn interpretable spatio-temporal primitives. We formulated the problem as a tensor factorization problem with tensor group norm constraints over the primitives, diagonal constraints on the activations that provide interpretability as well as smoothness constraints that are inherent to human motion. We demonstrate the effectiveness of our approach to learn interpretable representations of human motion from motion capture data, and show that our approach outperforms recently developed matching pursuit and sparse coding algorithms. 1
4 0.66208065 109 nips-2010-Group Sparse Coding with a Laplacian Scale Mixture Prior
Author: Pierre Garrigues, Bruno A. Olshausen
Abstract: We propose a class of sparse coding models that utilizes a Laplacian Scale Mixture (LSM) prior to model dependencies among coefficients. Each coefficient is modeled as a Laplacian distribution with a variable scale parameter, with a Gamma distribution prior over the scale parameter. We show that, due to the conjugacy of the Gamma prior, it is possible to derive efficient inference procedures for both the coefficients and the scale parameter. When the scale parameters of a group of coefficients are combined into a single variable, it is possible to describe the dependencies that occur due to common amplitude fluctuations among coefficients, which have been shown to constitute a large fraction of the redundancy in natural images [1]. We show that, as a consequence of this group sparse coding, the resulting inference of the coefficients follows a divisive normalization rule, and that this may be efficiently implemented in a network architecture similar to that which has been proposed to occur in primary visual cortex. We also demonstrate improvements in image coding and compressive sensing recovery using the LSM model. 1
5 0.56587648 143 nips-2010-Learning Convolutional Feature Hierarchies for Visual Recognition
Author: Koray Kavukcuoglu, Pierre Sermanet, Y-lan Boureau, Karol Gregor, Michael Mathieu, Yann L. Cun
Abstract: We propose an unsupervised method for learning multi-stage hierarchies of sparse convolutional features. While sparse coding has become an increasingly popular method for learning visual features, it is most often trained at the patch level. Applying the resulting filters convolutionally results in highly redundant codes because overlapping patches are encoded in isolation. By training convolutionally over large image windows, our method reduces the redudancy between feature vectors at neighboring locations and improves the efficiency of the overall representation. In addition to a linear decoder that reconstructs the image from sparse features, our method trains an efficient feed-forward encoder that predicts quasisparse features from the input. While patch-based training rarely produces anything but oriented edge detectors, we show that convolutional training produces highly diverse filters, including center-surround filters, corner detectors, cross detectors, and oriented grating detectors. We show that using these filters in multistage convolutional network architecture improves performance on a number of visual recognition and detection tasks. 1
6 0.55590528 56 nips-2010-Deciphering subsampled data: adaptive compressive sampling as a principle of brain communication
7 0.53964311 37 nips-2010-Basis Construction from Power Series Expansions of Value Functions
8 0.48649216 65 nips-2010-Divisive Normalization: Justification and Effectiveness as Efficient Coding Transform
9 0.45248598 266 nips-2010-The Maximal Causes of Natural Scenes are Edge Filters
10 0.40406159 268 nips-2010-The Neural Costs of Optimal Control
11 0.37119392 89 nips-2010-Factorized Latent Spaces with Structured Sparsity
12 0.34741482 96 nips-2010-Fractionally Predictive Spiking Neurons
13 0.34563196 45 nips-2010-CUR from a Sparse Optimization Viewpoint
14 0.29891422 3 nips-2010-A Bayesian Framework for Figure-Ground Interpretation
15 0.2967476 35 nips-2010-Auto-Regressive HMM Inference with Incomplete Data for Short-Horizon Wind Forecasting
16 0.29390201 62 nips-2010-Discriminative Clustering by Regularized Information Maximization
17 0.29187676 248 nips-2010-Sparse Inverse Covariance Selection via Alternating Linearization Methods
18 0.28063989 163 nips-2010-Lower Bounds on Rate of Convergence of Cutting Plane Methods
19 0.27545202 13 nips-2010-A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction
20 0.27540147 271 nips-2010-Tiled convolutional neural networks
topicId topicWeight
[(13, 0.042), (17, 0.024), (27, 0.071), (30, 0.04), (35, 0.074), (45, 0.175), (50, 0.065), (52, 0.03), (60, 0.031), (77, 0.028), (90, 0.023), (91, 0.3)]
simIndex simValue paperId paperTitle
same-paper 1 0.72093105 76 nips-2010-Energy Disaggregation via Discriminative Sparse Coding
Author: J. Z. Kolter, Siddharth Batra, Andrew Y. Ng
Abstract: Energy disaggregation is the task of taking a whole-home energy signal and separating it into its component appliances. Studies have shown that having device-level energy information can cause users to conserve significant amounts of energy, but current electricity meters only report whole-home data. Thus, developing algorithmic methods for disaggregation presents a key technical challenge in the effort to maximize energy conservation. In this paper, we examine a large scale energy disaggregation task, and apply a novel extension of sparse coding to this problem. In particular, we develop a method, based upon structured prediction, for discriminatively training sparse coding algorithms specifically to maximize disaggregation performance. We show that this significantly improves the performance of sparse coding algorithms on the energy task and illustrate how these disaggregation results can provide useful information about energy usage.
2 0.64154434 141 nips-2010-Layered image motion with explicit occlusions, temporal consistency, and depth ordering
Author: Deqing Sun, Erik B. Sudderth, Michael J. Black
Abstract: Layered models are a powerful way of describing natural scenes containing smooth surfaces that may overlap and occlude each other. For image motion estimation, such models have a long history but have not achieved the wide use or accuracy of non-layered methods. We present a new probabilistic model of optical flow in layers that addresses many of the shortcomings of previous approaches. In particular, we define a probabilistic graphical model that explicitly captures: 1) occlusions and disocclusions; 2) depth ordering of the layers; 3) temporal consistency of the layer segmentation. Additionally the optical flow in each layer is modeled by a combination of a parametric model and a smooth deviation based on an MRF with a robust spatial prior; the resulting model allows roughness in layers. Finally, a key contribution is the formulation of the layers using an imagedependent hidden field prior based on recent models for static scene segmentation. The method achieves state-of-the-art results on the Middlebury benchmark and produces meaningful scene segmentations as well as detected occlusion regions.
3 0.59109223 109 nips-2010-Group Sparse Coding with a Laplacian Scale Mixture Prior
Author: Pierre Garrigues, Bruno A. Olshausen
Abstract: We propose a class of sparse coding models that utilizes a Laplacian Scale Mixture (LSM) prior to model dependencies among coefficients. Each coefficient is modeled as a Laplacian distribution with a variable scale parameter, with a Gamma distribution prior over the scale parameter. We show that, due to the conjugacy of the Gamma prior, it is possible to derive efficient inference procedures for both the coefficients and the scale parameter. When the scale parameters of a group of coefficients are combined into a single variable, it is possible to describe the dependencies that occur due to common amplitude fluctuations among coefficients, which have been shown to constitute a large fraction of the redundancy in natural images [1]. We show that, as a consequence of this group sparse coding, the resulting inference of the coefficients follows a divisive normalization rule, and that this may be efficiently implemented in a network architecture similar to that which has been proposed to occur in primary visual cortex. We also demonstrate improvements in image coding and compressive sensing recovery using the LSM model. 1
4 0.57797855 260 nips-2010-Sufficient Conditions for Generating Group Level Sparsity in a Robust Minimax Framework
Author: Hongbo Zhou, Qiang Cheng
Abstract: Regularization technique has become a principled tool for statistics and machine learning research and practice. However, in most situations, these regularization terms are not well interpreted, especially on how they are related to the loss function and data. In this paper, we propose a robust minimax framework to interpret the relationship between data and regularization terms for a large class of loss functions. We show that various regularization terms are essentially corresponding to different distortions to the original data matrix. This minimax framework includes ridge regression, lasso, elastic net, fused lasso, group lasso, local coordinate coding, multiple kernel learning, etc., as special cases. Within this minimax framework, we further give mathematically exact definition for a novel representation called sparse grouping representation (SGR), and prove a set of sufficient conditions for generating such group level sparsity. Under these sufficient conditions, a large set of consistent regularization terms can be designed. This SGR is essentially different from group lasso in the way of using class or group information, and it outperforms group lasso when there appears group label noise. We also provide some generalization bounds in a classification setting. 1
5 0.57714027 73 nips-2010-Efficient and Robust Feature Selection via Joint ℓ2,1-Norms Minimization
Author: Feiping Nie, Heng Huang, Xiao Cai, Chris H. Ding
Abstract: Feature selection is an important component of many machine learning applications. Especially in many bioinformatics tasks, efficient and robust feature selection methods are desired to extract meaningful features and eliminate noisy ones. In this paper, we propose a new robust feature selection method with emphasizing joint 2,1 -norm minimization on both loss function and regularization. The 2,1 -norm based loss function is robust to outliers in data points and the 2,1 norm regularization selects features across all data points with joint sparsity. An efficient algorithm is introduced with proved convergence. Our regression based objective makes the feature selection process more efficient. Our method has been applied into both genomic and proteomic biomarkers discovery. Extensive empirical studies are performed on six data sets to demonstrate the performance of our feature selection method. 1
6 0.57652712 97 nips-2010-Functional Geometry Alignment and Localization of Brain Areas
7 0.57594275 59 nips-2010-Deep Coding Network
8 0.5756883 149 nips-2010-Learning To Count Objects in Images
9 0.57255274 55 nips-2010-Cross Species Expression Analysis using a Dirichlet Process Mixture Model with Latent Matchings
10 0.57241762 51 nips-2010-Construction of Dependent Dirichlet Processes based on Poisson Processes
11 0.57104534 7 nips-2010-A Family of Penalty Functions for Structured Sparsity
12 0.57013941 117 nips-2010-Identifying graph-structured activation patterns in networks
13 0.56990474 217 nips-2010-Probabilistic Multi-Task Feature Selection
14 0.56988233 12 nips-2010-A Primal-Dual Algorithm for Group Sparse Regularization with Overlapping Groups
15 0.56818926 26 nips-2010-Adaptive Multi-Task Lasso: with Application to eQTL Detection
16 0.56815124 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing
17 0.56800932 44 nips-2010-Brain covariance selection: better individual functional connectivity models using population prior
18 0.56681603 200 nips-2010-Over-complete representations on recurrent neural networks can support persistent percepts
19 0.56643295 21 nips-2010-Accounting for network effects in neuronal responses using L1 regularized point process models
20 0.56570059 170 nips-2010-Moreau-Yosida Regularization for Grouped Tree Structure Learning