jmlr jmlr2013 jmlr2013-82 knowledge-graph by maker-knowledge-mining

82 jmlr-2013-Optimally Fuzzy Temporal Memory


Source: pdf

Author: Karthik H. Shankar, Marc W. Howard

Abstract: Any learner with the ability to predict the future of a structured time-varying signal must maintain a memory of the recent past. If the signal has a characteristic timescale relevant to future prediction, the memory can be a simple shift register—a moving window extending into the past, requiring storage resources that grow linearly with the timescale to be represented. However, an independent general purpose learner cannot a priori know the characteristic prediction-relevant timescale of the signal. Moreover, many naturally occurring signals show scale-free long range correlations implying that the natural prediction-relevant timescale is essentially unbounded. Hence the learner should maintain information from the longest possible timescale allowed by resource availability. Here we construct a fuzzy memory system that optimally sacrifices the temporal accuracy of information in a scale-free fashion in order to represent prediction-relevant information from exponentially long timescales. Using several illustrative examples, we demonstrate the advantage of the fuzzy memory system over a shift register in time series forecasting of natural signals. When the available storage resources are limited, we suggest that a general purpose learner would be better off committing to such a fuzzy memory system. Keywords: temporal information compression, forecasting long range correlated time series

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 If the signal has a characteristic timescale relevant to future prediction, the memory can be a simple shift register—a moving window extending into the past, requiring storage resources that grow linearly with the timescale to be represented. [sent-6, score-0.809]

2 Here we construct a fuzzy memory system that optimally sacrifices the temporal accuracy of information in a scale-free fashion in order to represent prediction-relevant information from exponentially long timescales. [sent-10, score-0.939]

3 Using several illustrative examples, we demonstrate the advantage of the fuzzy memory system over a shift register in time series forecasting of natural signals. [sent-11, score-1.583]

4 When the available storage resources are limited, we suggest that a general purpose learner would be better off committing to such a fuzzy memory system. [sent-12, score-0.875]

5 A shift register can accurately represent information from the recent past up to a chosen timescale, while consuming resources that grow linearly with that timescale. [sent-18, score-0.795]

6 With an optimal choice of a set of memory nodes, this representation naturally leads to a self-sufficient fuzzy memory system. [sent-39, score-1.085]

7 In Section 4, we illustrate the utility of the fuzzy memory with some simple time series forecasting examples. [sent-40, score-0.95]

8 We show that the fuzzy memory enhances the ability to predict the future in comparison to a shift register with an equal number of nodes. [sent-41, score-1.402]

9 The x-axis labeled as shift register denotes a memory buffer wherein each vn is accurately stored in a separate node. [sent-58, score-1.229]

10 Fuzzy temporal memory is not related to fuzzy logic. [sent-59, score-0.877]

11 The figure contrasts the way in which the information is represented in a shift register and the fuzzy buffer. [sent-64, score-1.113]

12 Each node of the fuzzy buffer linearly combines information from a bin containing multiple shift register nodes. [sent-65, score-1.539]

13 The shift register can thus self sufficiently evolve in real time. [sent-69, score-0.757]
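
A minimal sketch of this self-sufficient update (our illustration, not the authors' code): the buffer contents plus the newest observation are all that is needed to evolve the register at each time step.

```python
from collections import deque

def make_shift_register(n_max):
    """Fixed-capacity window over the past: most recent value first."""
    return deque([0.0] * n_max, maxlen=n_max)

def shift_register_step(buf, v_new):
    """Self-sufficient update: only the buffer and the newest observation
    are needed; the oldest value silently falls off the far end."""
    buf.appendleft(v_new)
    return buf

# Usage: feed a stream one value at a time.
buf = make_shift_register(4)
for v in [1.0, 2.0, 3.0, 4.0, 5.0]:
    shift_register_step(buf, v)
print(list(buf))   # [5.0, 4.0, 3.0, 2.0]: v_n for n = 1..4 steps into the past
```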

14 We will show that this is achieved in the fuzzy buffer shown in Figure 1. [sent-72, score-0.737]

15 Each node of the fuzzy buffer holds the average of the vn values over a bin. [sent-73, score-0.867]

16 It is clear that the fuzzy buffer shown in Figure 1 cannot evolve self sufficiently in real time; information lost during compression of a bin at any moment is required at the next moment to recompute the bin average. [sent-77, score-1.129]

17 Even though such a fuzzy buffer is not self sufficient, we shall analyze it to derive the optimal binning strategy that maximally preserves prediction-relevant information from the past. [sent-79, score-0.914]

18 When the entire V is accurately represented in an infinite shift register, the total PIC contained in the shift register is the sum of all a(n). [sent-100, score-0.851]

19 In a shift register with Nmax nodes, the total PIC is PIC_tot^SR = [∑_{n=1}^{Nmax} a(n)] / [∑_{n=1}^{∞} a(n)] → 1 − Nmax^{−d} ≈ d ln Nmax as d → 0. (4) [sent-103, score-0.74]
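
The small-d limit follows from a one-line expansion; a short check, assuming the reconstruction of the equation above is right:

```latex
\mathrm{PIC}^{SR}_{tot} \simeq 1 - N_{max}^{-d}
  = 1 - e^{-d \ln N_{max}}
  \;\xrightarrow{\, d \to 0 \,}\; d \ln N_{max},
```

so for small d the shift register's total predictive information grows only logarithmically in Nmax, which is the comparison drawn against Equation 7 below.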

20 So the shift register is ill-suited to represent information from a long range correlated series. [sent-116, score-0.729]

21 The PIC_tot for a memory buffer can be increased if each of its nodes stored a linear combination of many vn values rather than a single vn as in the shift register. [sent-117, score-1.181]

22 When the total number of nodes Nmax is finite, and we want to represent information from the longest possible timescale, the straightforward choice is to pick successive bins such that they completely tile up the past time line, as schematically shown by the fuzzy buffer in Figure 1. [sent-142, score-1.125]

23 If we label the nodes of the fuzzy buffer by N, ranging from 1 to Nmax, and denote the starting point of each bin by nN, then nN+1 = (1 + c) nN ⟹ nN = n1 (1 + c)^(N−1), (6) where 1 + c = 2^(1/(1+d)). [sent-143, score-1.015]
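
A small sketch of the bin placement in Equation 6 (our illustration; the function name and parameters are ours):

```python
import numpy as np

def fuzzy_bin_starts(n1, n_nodes, d):
    """Bin starting points per Equation 6:
    n_N = n1 * (1 + c)**(N - 1), with 1 + c = 2**(1 / (1 + d))."""
    ratio = 2.0 ** (1.0 / (1.0 + d))       # 1 + c
    return n1 * ratio ** np.arange(n_nodes)

# Usage: with d = 0.5 the ratio is ~1.59, so 8 nodes already reach back
# ~1.59**7 ≈ 26 steps, versus the 8 steps an 8-node shift register covers.
print(np.round(fuzzy_bin_starts(n1=1.0, n_nodes=8, d=0.5), 2))
```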

24 Note that the fuzzy buffer can represent information from timescales of the order nNmax , which is exponentially large compared to the timescales represented by a shift register with Nmax nodes. [sent-144, score-1.528]

25 PIC_tot^FB = [1 − (1 + c)^(−d·Nmax)] / [1 − (1 + c)^(−d)]. (7) Comparing Equations 4 and 7, note that when d is small, the PIC_tot^FB of the fuzzy buffer grows linearly with Nmax while the PIC_tot^SR of the shift register grows logarithmically with Nmax. [sent-148, score-1.498]

26 … 0.28, while the PIC_tot^SR of the shift register is only 0.… [sent-151, score-0.847]

27 Hence when Nmax is relatively small, the fuzzy buffer represents a lot more predictive information than a shift register. [sent-153, score-0.955]

28 The above description of the fuzzy buffer corresponds to the ideal case wherein the neighboring bins do not overlap and uniform averaging is performed within each bin. [sent-154, score-0.866]

29 However, this ideal fuzzy buffer cannot self sufficiently evolve in real time because at every moment all vn s are explicitly needed for its construction. [sent-156, score-1.048]

30 In the next section, we present a self sufficient memory system that possesses the critical property of the ideal fuzzy buffer, but differs from it by having overlapping bins and non-uniform weighted averaging within the bins. [sent-157, score-0.907]

31 To the extent the self sufficient fuzzy memory system resembles the ideal fuzzy buffer, we can expect it to be useful in representing long range correlated signals in a resource-limited environment. [sent-158, score-1.477]

32 Like the ideal fuzzy buffer described in Section 2, this memory representation will sacrifice temporal accuracy to represent prediction-relevant information over exponential time scales. [sent-162, score-1.232]

33 But unlike the ideal fuzzy buffer, the resulting memory representation will be self sufficient, without requiring additional resources to construct the representation. [sent-163, score-0.945]

34 The activity of the t column is transcribed at each moment by the operator L_k^{-1} to represent the past functional values in a scale-free fuzzy fashion in the T column. [sent-166, score-0.874]

35 Except for the fact that this weighting function is not uniform, the activity of a τ∗ node has the desired property, mimicking a bin of the ideal fuzzy buffer described in Section 2. [sent-208, score-1.084]

36 To the extent the activity in a discrete set of T nodes can be constructed from a discrete set of t nodes, this memory system as a whole can self sufficiently evolve in real time. [sent-213, score-0.799]

37 Since the activity of each τ∗ node in the T column is constructed independently of the activity of other τ∗ nodes, we can choose any discrete set of τ∗ values to form our memory system. [sent-214, score-0.747]

38 In accordance with our analysis of the ideal fuzzy buffer in Section 2, we shall pick the following nodes. [sent-215, score-0.76]

39 (13) Together, these nodes form the fuzzy memory representation with Nmax nodes. [sent-220, score-1.006]

40 Unlike the ideal fuzzy buffer described in Section 2, it is not possible to associate a circumscribed bin with a node because the weighting function (see Equation 12) does not have compact support. [sent-222, score-0.908]

41 Discretized Derivative. Although any set of τ∗ nodes could be picked to form the memory buffer, their activity is ultimately constructed from the nodes in the t column whose s values are given by the one-to-one correspondence s = −k/τ∗. [sent-228, score-0.911]
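
A rough numerical sketch of this construction as we read it, with our own simplifications: τ∗ is treated as a positive lag so that s = k/τ∗ (the paper's s = −k/τ∗ with τ∗ a negative past time), the t column is evolved with an exact one-step exponential update, and the paper's discretized derivative is stood in for by repeated np.gradient calls on the non-uniform s grid. Constants and grid choices are illustrative, not the paper's exact discretization.

```python
import numpy as np
from math import factorial

def make_s_grid(tau_stars, k):
    """One leaky integrator per tau*; here tau* is a positive lag, so
    s = k / tau* (the paper writes s = -k/tau* with tau* negative)."""
    return k / np.asarray(tau_stars, dtype=float)

def t_column_step(F, s, f_t, dt=1.0):
    """Self-sufficient update of dF/dt = -s*F + f(t): exact exponential
    decay over one step with the input held constant (stable for s > 0)."""
    decay = np.exp(-s * dt)
    return F * decay + f_t * (1.0 - decay) / s

def l_k_inverse(F, s, k):
    """Crude L_k^{-1}: T ~ ((-1)^k / k!) * s^(k+1) * d^k F / ds^k, with the
    k-th s-derivative taken by repeated finite differences (np.gradient)."""
    dF = F.copy()
    for _ in range(k):
        dF = np.gradient(dF, s)   # non-uniform grid; rough stand-in
    return ((-1) ** k / factorial(k)) * s ** (k + 1) * dF

# Usage: a delta-like input at t = 0, then silence; T spreads over tau*.
k = 4
tau_stars = 2.0 ** np.arange(8)            # 1, 2, 4, ..., 128 (illustrative)
s = make_s_grid(tau_stars, k)
F = np.zeros_like(s)
for f_t in [1.0] + [0.0] * 30:
    F = t_column_step(F, s, f_t)
T = l_k_inverse(F, s, k)                   # fuzzy activity across tau* nodes
```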

42 In other words, even if the discretized implementation does not accurately match the continuum limit, it still accurately satisfies the basic properties we require from a fuzzy memory system. [sent-259, score-0.796]

43 In the appendix, it is shown that in the presence of scale-free input signals, the mutual information shared by any two neighboring buffer nodes can be a constant only if the τ∗ nodes are distributed according to Equation 13. [sent-294, score-0.785]

44 In Figure 4a, where the τ∗ values of the buffer nodes are chosen to be equidistant, the activity is smeared over more and more nodes as time progresses. [sent-298, score-0.929]

45 In Figure 4b, where the τ∗ values of the buffer nodes are chosen according to Equation 13, the activity pattern does not smear; instead the activity as a whole gets translated, with an overall reduction in size. [sent-300, score-0.819]

46 The translational invariance of the activity pattern as it passes through the buffer nodes explains why the information redundancy between any two neighboring nodes is a constant. [sent-301, score-1.042]

47 The activity of four successive fuzzy memory nodes N − 1, N, N + 1, and N + 2 in response to a delta function input at a past moment τo that falls right in between the timescales of the N-th and the (N + 1)-th nodes. [sent-332, score-1.455]

48 In summary, the fuzzy memory system is the set of T column nodes with τ∗ values given by Equation 13, with the value of k appropriately matched with c to minimize information redundancy and information loss. [sent-356, score-1.122]

49 Time Series Forecasting. We compare the performance of the self-sufficient fuzzy memory to a shift register in time series forecasting with a few simple illustrations. [sent-358, score-1.583]

50 Our goal here is to illustrate the differences between a simple shift register and the self-sufficient fuzzy memory. [sent-359, score-1.113]

51 The time series was sequentially fed into both the shift register and the self-sufficient fuzzy memory and the representations were evolved appropriately at each time step. [sent-377, score-1.555]

52 The values in the shift register nodes were shifted downstream at each time step as discussed in Section 2. [sent-378, score-0.891]

53 The values in the self-sufficient fuzzy memory were evolved as described in Section 3, with τ∗ values taken to be 1, 2, 4, 8, 16, 32, ... [sent-380, score-0.769]
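
A schematic of this forecasting comparison under stated assumptions: the fuzzy features are approximated by exponential moving averages at the listed τ∗ values (a stand-in for the full self-sufficient construction, not the paper's exact representation), the shift register keeps the last 8 values, and one-step-ahead forecasts are fit by ordinary least squares on a toy series. All names and the toy data are ours.

```python
import numpy as np

def fuzzy_features(x, tau_stars):
    """Stand-in fuzzy features: one exponential moving average per tau*,
    each updated self-sufficiently from its own previous value."""
    alphas = 1.0 / np.asarray(tau_stars, dtype=float)
    feats = np.zeros((len(x), len(tau_stars)))
    ema = np.zeros(len(tau_stars))
    for t, v in enumerate(x):
        ema += alphas * (v - ema)
        feats[t] = ema
    return feats

def shift_register_features(x, n_nodes):
    """Plain shift register: the last n_nodes values at each time step."""
    feats = np.zeros((len(x), n_nodes))
    for t in range(len(x)):
        for j in range(min(n_nodes, t + 1)):
            feats[t, j] = x[t - j]
    return feats

def one_step_mse(feats, x):
    """Least-squares one-step-ahead forecast error."""
    X, y = feats[:-1], x[1:]
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.mean((X @ w - y) ** 2)

# Usage on a toy series with slow drift plus white noise (illustrative only).
rng = np.random.default_rng(0)
x = 0.01 * np.cumsum(rng.standard_normal(5000)) + rng.standard_normal(5000)
tau_stars = [1, 2, 4, 8, 16, 32, 64, 128]
print(one_step_mse(fuzzy_features(x, tau_stars), x),
      one_step_mse(shift_register_features(x, 8), x))
```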

54 (h,i,j) Error in forecasting the series a, b and c using either the fuzzy memory or the shift register. [sent-391, score-1.12]

55 Adding more nodes reduces the error for both the shift register and the self-sufficient fuzzy memory. [sent-406, score-1.323]

56 But for a given number of nodes, the fuzzy memory always has a lower error than the shift register. [sent-407, score-0.987]

57 This can be seen from Figure 6h where the curve corresponding to the fuzzy memory falls below that of the shift register. [sent-408, score-0.987]

58 Note that the fuzzy memory approaches this bound with a smaller number of nodes than the shift register. [sent-412, score-1.197]

59 Note from Figure 6i that with additional nodes the fuzzy memory performs better and has a lower forecasting error than a shift register. [sent-421, score-1.349]

60 This is because the fuzzy memory can represent much longer timescales than the shift register of equal size, and thereby exploit the long range correlations that exist. [sent-422, score-1.61]

61 The sunspots time series of length 2820 is extrapolated for 500 time steps into the future using (a) a shift register with 8 nodes, and (b) a fuzzy memory with 8 nodes. [sent-427, score-1.598]

62 As before, with more nodes, the fuzzy memory consistently has a lower error in forecasting than the shift register with an equal number of nodes. [sent-431, score-1.478]

63 Note that when the number of nodes is increased from 4 to 8, the shift register does not improve in accuracy while the fuzzy memory continues to improve in accuracy. [sent-432, score-1.612]

64 With a single node, both fuzzy memory and shift register essentially just store the information from the previous time step. [sent-433, score-1.45]

65 Because most of the variance in the series can be captured by the information in the first node, the difference between the fuzzy memory and the shift register with additional nodes is not numerically overwhelming when viewed in Figure 6j. [sent-434, score-1.669]

66 A shift register with 8 nodes cannot represent these timescales but the fuzzy memory with 8 nodes can. [sent-439, score-1.901]

67 Forecasting error of the fuzzy memory, shift register and subsampled shift register with a node spacing of 5 and 10. [sent-455, score-1.901]

68 Figure 7b shows the series generated by the fuzzy memory with 8 nodes. [sent-459, score-0.826]

69 Note that the series predicted by the fuzzy memory continues in an oscillating fashion with decreasing amplitude for several cycles eventually settling at the mean value. [sent-460, score-0.856]

70 This is possible because the fuzzy memory represents the signal at a sufficiently long time scale to capture the negative correlations in the two-point correlation function. [sent-461, score-1.001]

71 Of course, a shift register with many more nodes can capture the long-range correlations and predict the periodic oscillations in the signal. [sent-462, score-0.912]

72 Other than the noticeable fact that the testing error is slightly higher than the training error, the shapes of the two plots are very similar for both fuzzy memory and the shift register. [sent-470, score-0.987]

73 If our goal were only to capture the oscillatory structure of the sunspot series within a small number of regression coefficients, then we could subsample from a lengthy shift register so that information from both positive and negative correlations can be obtained. [sent-471, score-0.805]

74 Although the subsampled shift register contains relatively few nodes, it cannot self sufficiently evolve in real time; we would need the resources associated with the complete shift register in order to evolve the memory at each moment. [sent-472, score-1.799]

75 By subsampling 8 equidistantly spaced nodes of the shift register—1, 11, 21, 31, ... [sent-473, score-0.843]

76 However, it turns out that the forecasting error for the subsampled shift register is significantly higher than the forecasting error from the fuzzy memory. [sent-477, score-1.322]

77 Figure 8b shows the forecasting error for subsampled shift register with equidistant node spacing of 5 and 10. [sent-478, score-0.864]

78 Even though the subsampled shift register with a node spacing of 10 extends over a similar temporal range as the fuzzy memory, and captures the oscillatory structure in the data, the fuzzy memory outperforms it with a lower error. [sent-479, score-2.198]

79 The advantage of the fuzzy memory over the subsampled shift register comes from the property of averaging over many previous values at long time scales rather than picking a single noisy value and using that for prediction. [sent-480, score-1.514]
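
The averaging advantage can be made concrete with the standard variance argument (our illustration, not a computation from the paper):

```latex
\operatorname{Var}\Big(\frac{1}{M}\sum_{i=1}^{M} v_i\Big) = \frac{\sigma^2}{M}
\quad \text{for i.i.d. noise of variance } \sigma^2,
```

so a node averaging a bin of M past values enjoys roughly a √M signal-to-noise advantage over a single subsampled value at the same lag.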

80 Discussion. The fuzzy memory holds more predictively relevant information than a shift register with the same number of nodes for long-range correlated signals, and hence performs better at forecasting such time series. [sent-483, score-1.829]

81 Since the fuzzy memory discards information about the precise time of a stimulus presentation, the temporal inaccuracy in memory can help the learner make such a generalization. [sent-486, score-1.253]

82 After many learning trials, a learner relying on a shift register memory would be able to sample the entire distribution of delays and learn it precisely. [sent-489, score-0.961]

83 Because the fuzzy memory system represents the past information in a smeared fashion, a single training sample from the distribution will naturally let the learner make a scale-free temporal generalization about the distribution of delays between A and B. [sent-491, score-1.04]

84 It then seems natural to wonder if human and animal memory resembles the fuzzy memory system. [sent-493, score-1.058]

85 Neural Networks with Temporal Memory. Let us now consider the fuzzy memory system in the context of neural networks with temporal memory. [sent-502, score-0.877]

86 If we define a memory function of the reservoir to be the precision with which inputs from each past moment can be reconstructed, it turns out that the net memory, or the area under the memory function over all past times, is bounded by the number of nodes in the reservoir Nmax . [sent-513, score-1.185]

87 For a simple shift register connectivity, the memory function is a step function which is 1 up to Nmax time steps in the past and zero beyond Nmax . [sent-515, score-1.066]

88 The self-sufficient fuzzy memory can be viewed as a special case of a linear reservoir with a specific, tailored connectivity. [sent-527, score-0.826]

89 To conserve storage resources associated with the time dimension, we could replace the shift register with the self-sufficient fuzzy memory. [sent-536, score-1.228]

90 If the components of a time-varying high-dimensional signal are represented in a temporally fuzzy fashion rather than in a shift register, then we could potentially extract the slowly varying parts in an online fashion by examining differences in the activities of the two largest τ∗ nodes. [sent-543, score-0.902]

91 This suggests that the gradually fading temporal memory of the recurrent kernel functions is more effective than the step-function memory of a shift register used in standard SVMs for time series prediction. [sent-551, score-1.469]

92 Training SVMs with standard kernel functions along with fuzzy memory inputs rather than shift register inputs is an alternative strategy for approaching problems involving signals with long range temporal correlations. [sent-552, score-1.598]

93 Moreover, since t nodes contain all the temporal information needed to construct the fuzzy memory, directly training the SVMs with inputs from t nodes could also be very fruitful. [sent-553, score-1.008]

94 An online learning algorithm tailored to act on the fuzzy memory representation could potentially be very useful for autonomous agents with finite memory resources. [sent-559, score-1.085]

95 Although the temporal accuracy of the signal is sacrificed, predictively relevant information from exponentially long timescales is available in the fuzzy memory system when compared to a shift register with the same number of nodes. [sent-565, score-1.663]

96 This is because, in a shift register, the functional value of f at each moment is just passed on to the downstream nodes without being integrated, and the temporal autocorrelation in the activity of any node will simply reflect the temporal correlation in the input function. [sent-594, score-1.443]

97 At any instant, the activity of two different nodes in a shift register will be uncorrelated in response to a white noise input. [sent-596, score-1.139]

98 The different nodes in a shift register carry completely different information, making their mutual information zero. [sent-597, score-0.879]

99 As a point of comparison, it is useful to note that any node in a shift register will also exactly reflect the correlations in the input. [sent-625, score-0.782]

100 In this limit where |τ2 | ≫ |τ1 |, the correlation between the two nodes behaves like the correlation between two shift register nodes. [sent-632, score-0.925]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('fuzzy', 0.48), ('register', 0.415), ('memory', 0.289), ('buffer', 0.257), ('shift', 0.218), ('nodes', 0.21), ('nmax', 0.2), ('activity', 0.176), ('redundancy', 0.117), ('temporal', 0.108), ('tot', 0.107), ('hankar', 0.1), ('oward', 0.1), ('past', 0.096), ('emory', 0.093), ('emporal', 0.093), ('ptimally', 0.093), ('uzzy', 0.093), ('timescale', 0.086), ('self', 0.081), ('node', 0.08), ('timescales', 0.079), ('forecasting', 0.076), ('neighboring', 0.072), ('correlations', 0.069), ('bin', 0.068), ('white', 0.067), ('moment', 0.066), ('delta', 0.059), ('reservoir', 0.057), ('series', 0.057), ('smear', 0.05), ('vn', 0.05), ('time', 0.048), ('resources', 0.045), ('recurrent', 0.045), ('shankar', 0.043), ('sunspots', 0.043), ('spacing', 0.043), ('leaky', 0.043), ('vo', 0.043), ('evolve', 0.043), ('kk', 0.043), ('signal', 0.042), ('correlation', 0.041), ('instantaneous', 0.04), ('learner', 0.039), ('forecast', 0.039), ('snr', 0.039), ('spread', 0.036), ('mutual', 0.036), ('arfima', 0.036), ('correlated', 0.036), ('sr', 0.036), ('sacri', 0.035), ('equation', 0.035), ('bins', 0.034), ('long', 0.032), ('subsampled', 0.032), ('temporally', 0.031), ('fashion', 0.03), ('creutzig', 0.029), ('echo', 0.029), ('ganguli', 0.029), ('hermans', 0.029), ('jaeger', 0.029), ('periodicity', 0.029), ('wagenmakers', 0.029), ('signals', 0.028), ('range', 0.028), ('discretization', 0.028), ('smeared', 0.028), ('fb', 0.028), ('discretized', 0.027), ('year', 0.027), ('uncorrelated', 0.027), ('representation', 0.027), ('noise', 0.026), ('column', 0.026), ('oscillatory', 0.025), ('turns', 0.025), ('connectivity', 0.025), ('integrator', 0.024), ('temperature', 0.024), ('cing', 0.024), ('activities', 0.023), ('ideal', 0.023), ('vm', 0.022), ('storage', 0.022), ('extracted', 0.022), ('fractional', 0.022), ('howard', 0.022), ('linearly', 0.021), ('history', 0.021), ('autocorrelation', 0.021), ('compressing', 0.021), ('fractionally', 0.021), ('gallistel', 0.021), ('integrators', 0.021), ('sunspot', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999958 82 jmlr-2013-Optimally Fuzzy Temporal Memory

Author: Karthik H. Shankar, Marc W. Howard


2 0.083517499 58 jmlr-2013-Language-Motivated Approaches to Action Recognition

Author: Manavender R. Malgireddy, Ifeoma Nwogu, Venu Govindaraju

Abstract: We present language-motivated approaches to detecting, localizing and classifying activities and gestures in videos. In order to obtain statistical insight into the underlying patterns of motions in activities, we develop a dynamic, hierarchical Bayesian model which connects low-level visual features in videos with poses, motion patterns and classes of activities. This process is somewhat analogous to the method of detecting topics or categories from documents based on the word content of the documents, except that our documents are dynamic. The proposed generative model harnesses both the temporal ordering power of dynamic Bayesian networks such as hidden Markov models (HMMs) and the automatic clustering power of hierarchical Bayesian models such as the latent Dirichlet allocation (LDA) model. We also introduce a probabilistic framework for detecting and localizing pre-specified activities (or gestures) in a video sequence, analogous to the use of filler models for keyword detection in speech processing. We demonstrate the robustness of our classification model and our spotting framework by recognizing activities in unconstrained real-life video sequences and by spotting gestures via a one-shot-learning approach. Keywords: dynamic hierarchical Bayesian networks, topic models, activity recognition, gesture spotting, generative models

3 0.060289029 56 jmlr-2013-Keep It Simple And Sparse: Real-Time Action Recognition

Author: Sean Ryan Fanello, Ilaria Gori, Giorgio Metta, Francesca Odone

Abstract: Sparsity has been showed to be one of the most important properties for visual recognition purposes. In this paper we show that sparse representation plays a fundamental role in achieving one-shot learning and real-time recognition of actions. We start off from RGBD images, combine motion and appearance cues and extract state-of-the-art features in a computationally efficient way. The proposed method relies on descriptors based on 3D Histograms of Scene Flow (3DHOFs) and Global Histograms of Oriented Gradient (GHOGs); adaptive sparse coding is applied to capture high-level patterns from data. We then propose a simultaneous on-line video segmentation and recognition of actions using linear SVMs. The main contribution of the paper is an effective realtime system for one-shot action modeling and recognition; the paper highlights the effectiveness of sparse coding techniques to represent 3D actions. We obtain very good results on three different data sets: a benchmark data set for one-shot action learning (the ChaLearn Gesture Data Set), an in-house data set acquired by a Kinect sensor including complex actions and gestures differing by small details, and a data set created for human-robot interaction purposes. Finally we demonstrate that our system is effective also in a human-robot interaction setting and propose a memory game, “All Gestures You Can”, to be played against a humanoid robot. Keywords: real-time action recognition, sparse representation, one-shot action learning, human robot interaction

4 0.052724954 101 jmlr-2013-Sparse Activity and Sparse Connectivity in Supervised Learning

Author: Markus Thom, Günther Palm

Abstract: Sparseness is a useful regularizer for learning in a wide range of applications, in particular in neural networks. This paper proposes a model targeted at classification tasks, where sparse activity and sparse connectivity are used to enhance classification capabilities. The tool for achieving this is a sparseness-enforcing projection operator which finds the closest vector with a pre-defined sparseness for any given vector. In the theoretical part of this paper, a comprehensive theory for such a projection is developed. In conclusion, it is shown that the projection is differentiable almost everywhere and can thus be implemented as a smooth neuronal transfer function. The entire model can hence be tuned end-to-end using gradient-based methods. Experiments on the MNIST database of handwritten digits show that classification performance can be boosted by sparse activity or sparse connectivity. With a combination of both, performance can be significantly better compared to classical non-sparse approaches. Keywords: supervised learning, sparseness projection, sparse activity, sparse connectivity

5 0.045449924 44 jmlr-2013-Finding Optimal Bayesian Networks Using Precedence Constraints

Author: Pekka Parviainen, Mikko Koivisto

Abstract: We consider the problem of finding a directed acyclic graph (DAG) that optimizes a decomposable Bayesian network score. While in a favorable case an optimal DAG can be found in polynomial time, in the worst case the fastest known algorithms rely on dynamic programming across the node subsets, taking time and space 2n , to within a factor polynomial in the number of nodes n. In practice, these algorithms are feasible to networks of at most around 30 nodes, mainly due to the large space requirement. Here, we generalize the dynamic programming approach to enhance its feasibility in three dimensions: first, the user may trade space against time; second, the proposed algorithms easily and efficiently parallelize onto thousands of processors; third, the algorithms can exploit any prior knowledge about the precedence relation on the nodes. Underlying all these results is the key observation that, given a partial order P on the nodes, an optimal DAG compatible with P can be found in time and space roughly proportional to the number of ideals of P , which can be significantly less than 2n . Considering sufficiently many carefully chosen partial orders guarantees that a globally optimal DAG will be found. Aside from the generic scheme, we present and analyze concrete tradeoff schemes based on parallel bucket orders. Keywords: exact algorithm, parallelization, partial order, space-time tradeoff, structure learning

6 0.040290002 92 jmlr-2013-Random Spanning Trees and the Prediction of Weighted Graphs

7 0.035876635 98 jmlr-2013-Segregating Event Streams and Noise with a Markov Renewal Process Model

8 0.035647538 84 jmlr-2013-PC Algorithm for Nonparanormal Graphical Models

9 0.031839725 106 jmlr-2013-Stationary-Sparse Causality Network Learning

10 0.031608019 52 jmlr-2013-How to Solve Classification and Regression Problems on High-Dimensional Data with a Supervised Extension of Slow Feature Analysis

11 0.031358678 115 jmlr-2013-Training Energy-Based Models for Time-Series Imputation

12 0.02932041 110 jmlr-2013-Sub-Local Constraint-Based Learning of Bayesian Networks Using A Joint Dependence Criterion

13 0.027879151 118 jmlr-2013-Using Symmetry and Evolutionary Search to Minimize Sorting Networks

14 0.026534373 120 jmlr-2013-Variational Algorithms for Marginal MAP

15 0.0260693 71 jmlr-2013-Message-Passing Algorithms for Quadratic Minimization

16 0.025979832 111 jmlr-2013-Supervised Feature Selection in Graphs with Path Coding Penalties and Network Flows

17 0.02558668 121 jmlr-2013-Variational Inference in Nonconjugate Models

18 0.024391877 19 jmlr-2013-BudgetedSVM: A Toolbox for Scalable SVM Approximations

19 0.022267822 91 jmlr-2013-Query Induction with Schema-Guided Pruning Strategies

20 0.0220343 28 jmlr-2013-Construction of Approximation Spaces for Reinforcement Learning


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.127), (1, -0.014), (2, -0.118), (3, 0.007), (4, 0.04), (5, 0.067), (6, 0.014), (7, 0.009), (8, -0.017), (9, 0.058), (10, -0.01), (11, 0.014), (12, -0.055), (13, -0.023), (14, 0.001), (15, 0.0), (16, -0.008), (17, 0.021), (18, 0.03), (19, 0.038), (20, 0.091), (21, 0.024), (22, -0.096), (23, 0.005), (24, 0.056), (25, -0.119), (26, -0.117), (27, -0.016), (28, 0.342), (29, 0.13), (30, -0.018), (31, -0.273), (32, -0.02), (33, 0.176), (34, 0.082), (35, 0.026), (36, 0.174), (37, 0.033), (38, 0.013), (39, -0.005), (40, -0.184), (41, 0.137), (42, 0.112), (43, 0.03), (44, -0.084), (45, 0.005), (46, 0.035), (47, -0.042), (48, 0.075), (49, 0.12)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96869129 82 jmlr-2013-Optimally Fuzzy Temporal Memory

Author: Karthik H. Shankar, Marc W. Howard

Abstract: Any learner with the ability to predict the future of a structured time-varying signal must maintain a memory of the recent past. If the signal has a characteristic timescale relevant to future prediction, the memory can be a simple shift register—a moving window extending into the past, requiring storage resources that linearly grows with the timescale to be represented. However, an independent general purpose learner cannot a priori know the characteristic prediction-relevant timescale of the signal. Moreover, many naturally occurring signals show scale-free long range correlations implying that the natural prediction-relevant timescale is essentially unbounded. Hence the learner should maintain information from the longest possible timescale allowed by resource availability. Here we construct a fuzzy memory system that optimally sacrifices the temporal accuracy of information in a scale-free fashion in order to represent prediction-relevant information from exponentially long timescales. Using several illustrative examples, we demonstrate the advantage of the fuzzy memory system over a shift register in time series forecasting of natural signals. When the available storage resources are limited, we suggest that a general purpose learner would be better off committing to such a fuzzy memory system. Keywords: temporal information compression, forecasting long range correlated time series

2 0.51020986 118 jmlr-2013-Using Symmetry and Evolutionary Search to Minimize Sorting Networks

Author: Vinod K. Valsalam, Risto Miikkulainen

Abstract: Sorting networks are an interesting class of parallel sorting algorithms with applications in multiprocessor computers and switching networks. They are built by cascading a series of comparisonexchange units called comparators. Minimizing the number of comparators for a given number of inputs is a challenging optimization problem. This paper presents a two-pronged approach called Symmetry and Evolution based Network Sort Optimization (SENSO) that makes it possible to scale the solutions to networks with a larger number of inputs than previously possible. First, it uses the symmetry of the problem to decompose the minimization goal into subgoals that are easier to solve. Second, it minimizes the resulting greedy solutions further by using an evolutionary algorithm to learn the statistical distribution of comparators in minimal networks. The final solutions improve upon half-century of results published in patents, books, and peer-reviewed literature, demonstrating the potential of the SENSO approach for solving difficult combinatorial problems. Keywords: symmetry, evolution, estimation of distribution algorithms, sorting networks, combinatorial optimization

3 0.47205669 101 jmlr-2013-Sparse Activity and Sparse Connectivity in Supervised Learning

Author: Markus Thom, Günther Palm

Abstract: Sparseness is a useful regularizer for learning in a wide range of applications, in particular in neural networks. This paper proposes a model targeted at classification tasks, where sparse activity and sparse connectivity are used to enhance classification capabilities. The tool for achieving this is a sparseness-enforcing projection operator which finds the closest vector with a pre-defined sparseness for any given vector. In the theoretical part of this paper, a comprehensive theory for such a projection is developed. In conclusion, it is shown that the projection is differentiable almost everywhere and can thus be implemented as a smooth neuronal transfer function. The entire model can hence be tuned end-to-end using gradient-based methods. Experiments on the MNIST database of handwritten digits show that classification performance can be boosted by sparse activity or sparse connectivity. With a combination of both, performance can be significantly better compared to classical non-sparse approaches. Keywords: supervised learning, sparseness projection, sparse activity, sparse connectivity

4 0.39030886 44 jmlr-2013-Finding Optimal Bayesian Networks Using Precedence Constraints

Author: Pekka Parviainen, Mikko Koivisto

Abstract: We consider the problem of finding a directed acyclic graph (DAG) that optimizes a decomposable Bayesian network score. While in a favorable case an optimal DAG can be found in polynomial time, in the worst case the fastest known algorithms rely on dynamic programming across the node subsets, taking time and space 2n , to within a factor polynomial in the number of nodes n. In practice, these algorithms are feasible to networks of at most around 30 nodes, mainly due to the large space requirement. Here, we generalize the dynamic programming approach to enhance its feasibility in three dimensions: first, the user may trade space against time; second, the proposed algorithms easily and efficiently parallelize onto thousands of processors; third, the algorithms can exploit any prior knowledge about the precedence relation on the nodes. Underlying all these results is the key observation that, given a partial order P on the nodes, an optimal DAG compatible with P can be found in time and space roughly proportional to the number of ideals of P , which can be significantly less than 2n . Considering sufficiently many carefully chosen partial orders guarantees that a globally optimal DAG will be found. Aside from the generic scheme, we present and analyze concrete tradeoff schemes based on parallel bucket orders. Keywords: exact algorithm, parallelization, partial order, space-time tradeoff, structure learning

5 0.35678524 58 jmlr-2013-Language-Motivated Approaches to Action Recognition

Author: Manavender R. Malgireddy, Ifeoma Nwogu, Venu Govindaraju

Abstract: We present language-motivated approaches to detecting, localizing and classifying activities and gestures in videos. In order to obtain statistical insight into the underlying patterns of motions in activities, we develop a dynamic, hierarchical Bayesian model which connects low-level visual features in videos with poses, motion patterns and classes of activities. This process is somewhat analogous to the method of detecting topics or categories from documents based on the word content of the documents, except that our documents are dynamic. The proposed generative model harnesses both the temporal ordering power of dynamic Bayesian networks such as hidden Markov models (HMMs) and the automatic clustering power of hierarchical Bayesian models such as the latent Dirichlet allocation (LDA) model. We also introduce a probabilistic framework for detecting and localizing pre-specified activities (or gestures) in a video sequence, analogous to the use of filler models for keyword detection in speech processing. We demonstrate the robustness of our classification model and our spotting framework by recognizing activities in unconstrained real-life video sequences and by spotting gestures via a one-shot-learning approach. Keywords: dynamic hierarchical Bayesian networks, topic models, activity recognition, gesture spotting, generative models

6 0.29457647 115 jmlr-2013-Training Energy-Based Models for Time-Series Imputation

7 0.27573267 106 jmlr-2013-Stationary-Sparse Causality Network Learning

8 0.2552976 91 jmlr-2013-Query Induction with Schema-Guided Pruning Strategies

9 0.24699204 19 jmlr-2013-BudgetedSVM: A Toolbox for Scalable SVM Approximations

10 0.24012528 92 jmlr-2013-Random Spanning Trees and the Prediction of Weighted Graphs

11 0.20300914 98 jmlr-2013-Segregating Event Streams and Noise with a Markov Renewal Process Model

12 0.19179188 85 jmlr-2013-Pairwise Likelihood Ratios for Estimation of Non-Gaussian Structural Equation Models

13 0.19069195 56 jmlr-2013-Keep It Simple And Sparse: Real-Time Action Recognition

14 0.17203327 71 jmlr-2013-Message-Passing Algorithms for Quadratic Minimization

15 0.16817233 43 jmlr-2013-Fast MCMC Sampling for Markov Jump Processes and Extensions

16 0.1657846 52 jmlr-2013-How to Solve Classification and Regression Problems on High-Dimensional Data with a Supervised Extension of Slow Feature Analysis

17 0.15803279 15 jmlr-2013-Bayesian Canonical Correlation Analysis

18 0.14917687 81 jmlr-2013-Optimal Discovery with Probabilistic Expert Advice: Finite Time Analysis and Macroscopic Optimality

19 0.14368585 46 jmlr-2013-GURLS: A Least Squares Library for Supervised Learning

20 0.14315556 110 jmlr-2013-Sub-Local Constraint-Based Learning of Bayesian Networks Using A Joint Dependence Criterion


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.018), (5, 0.103), (6, 0.09), (10, 0.062), (18, 0.395), (20, 0.04), (23, 0.041), (68, 0.019), (70, 0.024), (75, 0.044), (85, 0.02), (87, 0.019), (89, 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.7428776 82 jmlr-2013-Optimally Fuzzy Temporal Memory

Author: Karthik H. Shankar, Marc W. Howard

Abstract: Any learner with the ability to predict the future of a structured time-varying signal must maintain a memory of the recent past. If the signal has a characteristic timescale relevant to future prediction, the memory can be a simple shift register—a moving window extending into the past, requiring storage resources that linearly grows with the timescale to be represented. However, an independent general purpose learner cannot a priori know the characteristic prediction-relevant timescale of the signal. Moreover, many naturally occurring signals show scale-free long range correlations implying that the natural prediction-relevant timescale is essentially unbounded. Hence the learner should maintain information from the longest possible timescale allowed by resource availability. Here we construct a fuzzy memory system that optimally sacrifices the temporal accuracy of information in a scale-free fashion in order to represent prediction-relevant information from exponentially long timescales. Using several illustrative examples, we demonstrate the advantage of the fuzzy memory system over a shift register in time series forecasting of natural signals. When the available storage resources are limited, we suggest that a general purpose learner would be better off committing to such a fuzzy memory system. Keywords: temporal information compression, forecasting long range correlated time series

2 0.37192914 106 jmlr-2013-Stationary-Sparse Causality Network Learning

Author: Yuejia He, Yiyuan She, Dapeng Wu

Abstract: Recently, researchers have proposed penalized maximum likelihood to identify network topology underlying a dynamical system modeled by multivariate time series. The time series of interest are assumed to be stationary, but this restriction is never taken into consideration by existing estimation methods. Moreover, practical problems of interest may have ultra-high dimensionality and obvious node collinearity. In addition, none of the available algorithms provides a probabilistic measure of the uncertainty for the obtained network topology which is informative in reliable network identification. The main purpose of this paper is to tackle these challenging issues. We propose the S2 learning framework, which stands for stationary-sparse network learning. We propose a novel algorithm referred to as the Berhu iterative sparsity pursuit with stationarity (BISPS), where the Berhu regularization can improve the Lasso in detection and estimation. The algorithm is extremely easy to implement, efficient in computation and has a theoretical guarantee to converge to a global optimum. We also incorporate a screening technique into BISPS to tackle ultra-high dimensional problems and enhance computational efficiency. Furthermore, a stationary bootstrap technique is applied to provide connection occurring frequency for reliable topology learning. Experiments show that our method can achieve stationary and sparse causality network learning and is scalable for high-dimensional problems. Keywords: stationarity, sparsity, Berhu, screening, bootstrap

3 0.36170161 98 jmlr-2013-Segregating Event Streams and Noise with a Markov Renewal Process Model

Author: Dan Stowell, Mark D. Plumbley

Abstract: We describe an inference task in which a set of timestamped event observations must be clustered into an unknown number of temporal sequences with independent and varying rates of observations. Various existing approaches to multi-object tracking assume a fixed number of sources and/or a fixed observation rate; we develop an approach to inferring structure in timestamped data produced by a mixture of an unknown and varying number of similar Markov renewal processes, plus independent clutter noise. The inference simultaneously distinguishes signal from noise as well as clustering signal observations into separate source streams. We illustrate the technique via synthetic experiments as well as an experiment to track a mixture of singing birds. Source code is available. Keywords: multi-target tracking, clustering, point processes, flow network, sound

4 0.35709342 120 jmlr-2013-Variational Algorithms for Marginal MAP

Author: Qiang Liu, Alexander Ihler

Abstract: The marginal maximum a posteriori probability (MAP) estimation problem, which calculates the mode of the marginal posterior distribution of a subset of variables with the remaining variables marginalized, is an important inference problem in many models, such as those with hidden variables or uncertain parameters. Unfortunately, marginal MAP can be NP-hard even on trees, and has attracted less attention in the literature compared to the joint MAP (maximization) and marginalization problems. We derive a general dual representation for marginal MAP that naturally integrates the marginalization and maximization operations into a joint variational optimization problem, making it possible to easily extend most or all variational-based algorithms to marginal MAP. In particular, we derive a set of “mixed-product” message passing algorithms for marginal MAP, whose form is a hybrid of max-product, sum-product and a novel “argmax-product” message updates. We also derive a class of convergent algorithms based on proximal point methods, including one that transforms the marginal MAP problem into a sequence of standard marginalization problems. Theoretically, we provide guarantees under which our algorithms give globally or locally optimal solutions, and provide novel upper bounds on the optimal objectives. Empirically, we demonstrate that our algorithms significantly outperform the existing approaches, including a state-of-the-art algorithm based on local search methods. Keywords: graphical models, message passing, belief propagation, variational methods, maximum a posteriori, marginal-MAP, hidden variable models

5 0.35467073 17 jmlr-2013-Belief Propagation for Continuous State Spaces: Stochastic Message-Passing with Quantitative Guarantees

Author: Nima Noorshams, Martin J. Wainwright

Abstract: The sum-product or belief propagation (BP) algorithm is a widely used message-passing technique for computing approximate marginals in graphical models. We introduce a new technique, called stochastic orthogonal series message-passing (SOSMP), for computing the BP fixed point in models with continuous random variables. It is based on a deterministic approximation of the messages via orthogonal series basis expansion, and a stochastic estimation of the basis coefficients via Monte Carlo techniques and damped updates. We prove that the SOSMP iterates converge to a δ-neighborhood of the unique BP fixed point for any tree-structured graph, and for any graphs with cycles in which the BP updates satisfy a contractivity condition. In addition, we demonstrate how to choose the number of basis coefficients as a function of the desired approximation accuracy δ and smoothness of the compatibility functions. We illustrate our theory with both simulated examples and in application to optical flow estimation. Keywords: graphical models, sum-product for continuous state spaces, low-complexity belief propagation, stochastic approximation, Monte Carlo methods, orthogonal basis expansion

6 0.35328513 51 jmlr-2013-Greedy Sparsity-Constrained Optimization

7 0.3506617 25 jmlr-2013-Communication-Efficient Algorithms for Statistical Optimization

8 0.34961018 2 jmlr-2013-A Binary-Classification-Based Metric between Time-Series Distributions and Its Use in Statistical and Learning Problems

9 0.34934184 52 jmlr-2013-How to Solve Classification and Regression Problems on High-Dimensional Data with a Supervised Extension of Slow Feature Analysis

10 0.34757474 99 jmlr-2013-Semi-Supervised Learning Using Greedy Max-Cut

11 0.34600562 28 jmlr-2013-Construction of Approximation Spaces for Reinforcement Learning

12 0.34456563 48 jmlr-2013-Generalized Spike-and-Slab Priors for Bayesian Group Feature Selection Using Expectation Propagation

13 0.34316629 22 jmlr-2013-Classifying With Confidence From Incomplete Information

14 0.34193486 72 jmlr-2013-Multi-Stage Multi-Task Feature Learning

15 0.34104928 5 jmlr-2013-A Near-Optimal Algorithm for Differentially-Private Principal Components

16 0.33993307 10 jmlr-2013-Algorithms and Hardness Results for Parallel Large Margin Learning

17 0.33927721 74 jmlr-2013-Multivariate Convex Regression with Adaptive Partitioning

18 0.33807549 4 jmlr-2013-A Max-Norm Constrained Minimization Approach to 1-Bit Matrix Completion

19 0.33797282 47 jmlr-2013-Gaussian Kullback-Leibler Approximate Inference

20 0.33759242 71 jmlr-2013-Message-Passing Algorithms for Quadratic Minimization