nips nips2012 nips2012-203 knowledge-graph by maker-knowledge-mining

203 nips-2012-Locating Changes in Highly Dependent Data with Unknown Number of Change Points


Source: pdf

Author: Azadeh Khaleghi, Daniil Ryabko

Abstract: The problem of multiple change point estimation is considered for sequences with unknown number of change points. A consistency framework is suggested that is suitable for highly dependent time-series, and an asymptotically consistent algorithm is proposed. In order for the consistency to be established the only assumption required is that the data is generated by stationary ergodic time-series distributions. No modeling, independence or parametric assumptions are made; the data are allowed to be dependent and the dependence can be of arbitrary form. The theoretical results are complemented with experimental evaluations. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract The problem of multiple change point estimation is considered for sequences with unknown number of change points. [sent-4, score-1.456]

2 A consistency framework is suggested that is suitable for highly dependent time-series, and an asymptotically consistent algorithm is proposed. [sent-5, score-0.347]

3 In order for the consistency to be established the only assumption required is that the data is generated by stationary ergodic time-series distributions. [sent-6, score-0.537]

4 However, even nonparametric formulations of the problem typically assume that the data in each segment are independent and identically distributed, and that the change necessarily affects single-dimensional marginal distributions. [sent-33, score-0.84]

5 We propose a change point estimation algorithm that is asymptotically consistent under such minimal assumptions. [sent-37, score-0.856]

6 From a machine learning perspective, change point estimation is a difficult unsupervised learning problem: the objective is to estimate the change points in a given sequence while no labeled examples are available. [sent-41, score-1.547]

7 In time-series clustering, a set of sequences is to be partitioned, whereas in change point estimation the partitioning is done on a sequence of sequences. [sent-43, score-1.014]

8 This makes change point estimation a more challenging problem than time-series clustering. [sent-45, score-0.704]

9 In the general setting of highly-dependent time-series correct estimation of the number of change points is provably impossible, even in the weakest asymptotic sense, and even if there is at most one change [23]. [sent-46, score-1.301]

10 In light of the similarities between clustering and change point analysis, we propose a formulation that is motivated by hierarchical clustering. [sent-48, score-0.696]

11 In change point estimation with an unknown number k of change points, we suggest aiming for a sorted list of change points whose first k elements are some permutation of the true change points. [sent-52, score-2.651]

12 and the change usually refers to a change in the mean. [sent-59, score-1.086]

13 In these frameworks the problem of estimating the number of change points is usually addressed with penalized criteria, see, for example, [19, 18]. [sent-64, score-0.662]

14 In nonparametric settings, the typical assumptions usually impose restrictions on the form of the change or the nature of the dependence. [sent-65, score-0.671]

15 However, as mentioned above, when the number k of change points is unknown, it is provably impossible to estimate it, even under the assumption k ∈ {0, 1} [23]. [sent-72, score-0.732]

16 In particular, if the input k is not the correct number of change points, then the behavior of the algorithm proposed in [13] can be arbitrarily bad. [sent-73, score-0.582]

17 We present a nonparametric change point estimation algorithm for time-series data with unknown number of change points. [sent-75, score-1.431]

18 We consider the most general framework where the only assumption made is that the unknown distributions generating the data are stationary ergodic. [sent-76, score-0.3]

19 Moreover, we do not need the finite-dimensional marginals of any fixed size before and after the change points to be different. [sent-78, score-0.662]

20 We show that the proposed algorithm is asymptotically consistent in the sense that among the change point estimates that it outputs, the first k converge to the true change points. [sent-80, score-1.446]

21 To the best of our knowledge, this work is the first to address the change point problem with an unknown number of change points in such a general framework. [sent-82, score-1.395]

22 To this end, we generate our data by time-series distributions that, while being stationary ergodic, do not belong to any “simpler” class of processes. [sent-85, score-0.279]

23 Through our experiments we show that the algorithm is consistent in the sense that as the length of the input sequence grows, the produced change point estimates converge to the actual change points. [sent-87, score-1.576]

24 A stationary process ρ is called (stationary) ergodic if for all B ∈ B we have $\lim_{n\to\infty} \nu(X_{1..n}, B) = \rho(B)$ a.s. [sent-117, score-0.427]

25 The distributional distance between a pair of process distributions ρ1 and ρ2 is defined as $d(\rho_1, \rho_2) := \sum_{m,l=1}^{\infty} w_m w_l \sum_{B \in B^{m,l}} |\rho_1(B) - \rho_2(B)|$, where $w_i := 2^{-i}$, $i \in \mathbb{N}$. [sent-120, score-0.468]

26 Specifically, the empirical estimate of the distance between a sequence $x = X_{1..n} \in X^n$, $n \in \mathbb{N}$, and a process distribution ρ is defined as [sent-125, score-0.276]

27 $\hat d(x, \rho) := \sum_{m,l=1}^{\infty} w_m w_l \sum_{B \in B^{m,l}} |\nu(x, B) - \rho(B)|$ (2), and similarly for a pair of sequences $x_i \in X^{n_i}$, $n_i \in \mathbb{N}$, $i = 1, 2$. [sent-127, score-0.533]

28 Moreover, the estimates $\hat d(\cdot, \cdot)$ are asymptotically consistent [25]: for any pair of stationary ergodic distributions ρ1, ρ2 generating sequences $x_i \in X^{n_i}$, $i = 1, 2$, we have $\hat d(x_1, x_2) \to d(\rho_1, \rho_2)$ a.s. as $n_1, n_2 \to \infty$. [sent-130, score-0.915]

29 $\check d(x_1, x_2) := \sum_{m=1}^{m_n} \sum_{l=1}^{l_n} w_m w_l \sum_{B \in B^{m,l}} |\nu(x_1, B) - \nu(x_2, B)|$ (6), where $m_n$ and $l_n$ are any sequences of integers that go to infinity with n. [sent-135, score-0.449]
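A minimal sketch of this truncated empirical distance for real-valued data, assuming values in [0, 1) and taking the cells $B^{m,l}$ to be dyadic cubes of side $2^{-l}$ over length-m windows; the helper names and truncation defaults are ours:

```python
import numpy as np
from collections import Counter

def empirical_freqs(x, m, l):
    """Frequencies nu(x, B) of length-m windows of x over dyadic cells of side 2**-l.
    Assumes x takes values in [0, 1)."""
    cells = np.floor(np.asarray(x) * 2 ** l).astype(int)
    n = len(cells) - m + 1
    counts = Counter(tuple(cells[i:i + m]) for i in range(n))
    return {cell: c / n for cell, c in counts.items()}

def d_check(x1, x2, m_n, l_n):
    """Truncated empirical distributional distance, as in (6), with w_i = 2**-i."""
    total = 0.0
    for m in range(1, m_n + 1):
        for l in range(1, l_n + 1):
            f1 = empirical_freqs(x1, m, l)
            f2 = empirical_freqs(x2, m, l)
            tv = sum(abs(f1.get(c, 0.0) - f2.get(c, 0.0)) for c in set(f1) | set(f2))
            total += 2.0 ** (-m) * 2.0 ** (-l) * tv
    return total
```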

30 The single-change point estimator $\Phi_x(a, b, \alpha)$, $\alpha \in (0, 1)$, searches over $t \in [a, b]$ using the segment extended to $X_{a-n\alpha..b+n\alpha}$ (8). 3 Problem Formulation We formalize the multiple change point estimation problem as follows. [sent-158, score-0.704]

31 Each of these sequences is generated by an unknown stationary ergodic process distribution. [sent-172, score-0.67]

32 Moreover, every two consecutive sequences are generated by two different process distributions. [sent-173, score-0.337]

33 The parameters πk are unknown and have to be estimated; they are called change points. [sent-176, score-0.627]

34 A change point estimator is a function that takes a sequence x and a parameter λ ∈ (0, 1) and outputs a sequence of change point estimates $\hat\pi := \hat\pi_1, \hat\pi_2, \dots$ [sent-180, score-1.669]

35 (Note that the total number of estimated change points 1/λ may be larger than the true number of change points κ.) [sent-184, score-1.324]

36 To construct consistent algorithms, we assume that the change points πk are linear in n, i.e., $\pi_k = n\theta_k$ for some $\theta_k \in (0, 1)$. [sent-185, score-0.723]

37 We also define the minimum normalized distance between the change points as $\lambda_{\min} := \min_{k=1..\kappa+1} (\theta_k - \theta_{k-1})$ (9), with $\theta_0 := 0$ and $\theta_{\kappa+1} := 1$. [sent-190, score-0.868]
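In code, eq. (9) is a one-liner; the endpoint conventions $\theta_0 = 0$ and $\theta_{\kappa+1} = 1$ are assumed:

```python
def lambda_min(thetas):
    """Minimum normalized distance between change points, eq. (9).
    Assumes the conventions theta_0 = 0 and theta_{kappa+1} = 1."""
    ts = [0.0] + sorted(thetas) + [1.0]
    return min(b - a for a, b in zip(ts, ts[1:]))
```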

38 If the length of one of the sequences is constant or sublinear in n then asymptotic consistency is impossible in this setting. [sent-194, score-0.402]

39 We define the consistency of a change point estimator as follows. [sent-195, score-0.836]

40 Definition 2 (Consistency of a change point estimator). [sent-196, score-0.649]

41 We call the change point estimator $\hat\pi$ asymptotically consistent if with probability 1 we have $\lim_{n\to\infty} \sup_{k=1..\kappa} |\hat\theta_k - \theta_k| = 0$. [sent-211, score-0.991]
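One concrete way to evaluate the error in Definition 2, realizing "some permutation" by sorting both lists; this matching rule is our simplification, not the paper's exact statement:

```python
def consistency_error(theta_hat, theta):
    """sup_k |theta_hat_k - theta_k| over the first kappa estimates (Definition 2),
    with both lists sorted as a simple way to realize the matching permutation."""
    kappa = len(theta)
    est = sorted(theta_hat[:kappa])
    return max(abs(e - t) for e, t in zip(est, sorted(theta)))
```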

42 4 Theoretical Results In this section we introduce a nonparametric multiple change point estimation algorithm for the case where the number of change points is unknown. [sent-214, score-1.466]

43 The algorithm partitions $X_{1..n}$ into consecutive segments of length nα, where α := λ/3. [sent-222, score-0.323]

44 The single-change point estimator Φ(·, ·, ·) is used to generate a candidate change point within every segment. [sent-223, score-0.951]
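A runnable sketch of such a single-change point estimator, under our reading of (8): search $t \in [a, b]$ for the split maximizing the empirical distance (the d_check of (6) above) between the data to the left and right of t, each side padded by nα so that both keep linear length. The padding rule, the coarse search grid, and the returned score are our assumptions:

```python
def single_change_point(x, a, b, alpha, m_n=3, l_n=5):
    """Sketch of a single-change point estimator in the spirit of Phi(a, b, alpha):
    pick the split t in [a, b] maximizing the empirical distance between the left
    and right sides, each padded by n*alpha. The padding and the coarse search
    grid are our assumptions, not a verbatim transcription of (8)."""
    n = len(x)
    pad = int(n * alpha)
    lo, hi = max(a - pad, 0), min(b + pad, n)
    best_t, best_score = a, -1.0
    step = max(1, (b - a) // 50)  # coarse grid; an exact search would use step 1
    for t in range(a + 1, b, step):
        score = d_check(x[lo:t], x[t:hi], m_n, l_n)
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score
```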

45 The change point candidates are ordered according to the performance-scores of their corresponding segments. [sent-225, score-0.737]

46 The algorithm assumes the input parameter λ to be a lower-bound on the true normalized minimum distance λmin between actual change points. [sent-226, score-0.714]

47 Hence, the sorted list of estimated change points is filtered in such a way that its elements are at least λn apart. [sent-227, score-0.863]

48 The algorithm outputs an ordered sequence $\hat\pi$ of change point estimates, where the ordering is done with respect to the performance scores s(·, ·). [sent-228, score-0.894]

49 The first κ elements $\hat\pi_1, \dots, \hat\pi_\kappa$ of the output $\hat\pi$ converge to some permutation of the true change points $\pi_1, \dots, \pi_\kappa$. [sent-232, score-0.627]

50 Let $x = X_{1..n} \in X^n$, $n \in \mathbb{N}$, be a sequence with change points at least $n\lambda_{\min}$ apart, for some $\lambda_{\min} \in (0, 1)$. [sent-236, score-0.811]

51 Indeed, in (3) all summands corresponding to $m > \max_{i=1,2} n_i$ equal 0; moreover, all summands corresponding to $l > s_{\min}$ are equal, where $s_{\min}$ corresponds to the partition in which each cell has at most one point in it, $s_{\min} := \min_{i,j}\,\dots$ [sent-240, score-0.559]

52 A more efficient implementation can be obtained if one uses $\check d(\cdot, \cdot)$ given by (6) instead of $\hat d(\cdot, \cdot)$, with $m_n = \log n$. [sent-244, score-0.662]

53 Algorithm 1 (Estimating the change points). input: sequence $x = X_{1..n}$, minimum normalized distance λ between the change points; initialize: step size α ← λ/3, output change point sequence $\hat\pi \leftarrow ()$. [sent-246, score-1.311]

54 Generate 2 sets of index-sequences: $b_i^t \leftarrow n\alpha(i + \frac{1}{t+1})$, i = 0, ..., for t = 1, 2. [sent-247, score-0.319]

55 Use the single-change point estimator (given by (8)) to estimate a change point in every segment: $p(t, i) := \Phi_x(b_i^t, b_{i+1}^t, \alpha)$, i = 1, ... [sent-257, score-1.065]

56 Select an available change point estimate of highest score and add it to $\hat\pi$: $(\tau, l) \leftarrow \arg\max_{(t,i) \in U} s(t, i)$, breaking ties arbitrarily; $\hat\pi \leftarrow \hat\pi \oplus p(\tau, l)$. [sent-263, score-0.732]

57 Remove the estimates within a radius of λn/2 of $p(\tau, l)$: $U \leftarrow U \setminus \{(t, i) : p(t, i) \in (p(\tau, l) - \lambda n/2,\, p(\tau, l) + \lambda n/2)\}$; end while; return: a sequence $\hat\pi$ of change point estimates. [sent-266, score-0.849]
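Assembling the pieces, a self-contained sketch of Algorithm 1 built on the helpers above. Here the maximized split distance returned by single_change_point stands in for the intra-subsequence performance score Δ used in the paper, and duplicates are filtered at radius λn/2 as in the pseudocode; both choices are our assumptions:

```python
def estimate_change_points(x, lam, m_n=3, l_n=5):
    """Sketch of Algorithm 1: lam is assumed to lower-bound lambda_min."""
    n = len(x)
    alpha = lam / 3.0
    step = max(1, int(n * alpha))
    scored = []
    for t in (1, 2):  # two shifted grids, so every change point is interior to some segment
        a = int(n * alpha / (t + 1))  # offset n*alpha/(t+1), as in Step 1
        while a + step <= n:
            est, score = single_change_point(x, a, a + step, alpha, m_n, l_n)
            scored.append((score, est))
            a += step
    scored.sort(key=lambda s: -s[0])  # highest performance score first
    out = []
    for score, est in scored:  # greedy filtering: keep estimates >= lam*n/2 apart
        if all(abs(est - p) >= lam * n / 2 for p in out):
            out.append(est)
    return out
```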

58 Here n is the length of the samples; in this case, the consistency results are unaffected, and the computational complexity of calculating the distance becomes $n\,\mathrm{polylog}\,n$, making the complexity of the algorithm $n^2\,\mathrm{polylog}\,n$. [sent-268, score-0.511]
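An illustrative call with the log-truncation just described; the placeholder sequences are ours:

```python
import math
import numpy as np

x1, x2 = np.random.rand(2000), np.random.rand(2000)  # placeholder data in [0, 1)
k = math.ceil(math.log(len(x1)))
dist = d_check(x1, x2, m_n=k, l_n=k)  # each evaluation costs n * polylog(n)
```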

59 First, recall that the empirical distributional distance between a given pair of sequences converges to the distributional distance between the corresponding process distributions. [sent-272, score-0.568]

60 On the other hand, if there is a single change point π within $X_{a..b}$. [sent-287, score-0.649]

61 In this case, the single-change point estimator Φ(a, b, α′) (defined by (8)) produces an estimate that from some n on converges to π, provided that π is the only change point in $X_{a-n\alpha'..b+n\alpha'}$. [sent-297, score-0.901]

62 When λ ≤ λmin, each of the index-sequences generated with α := λ/3 partitions x in such a way that every three consecutive segments of the partition contain at most one change point. [sent-301, score-0.881]

63 In this scenario, from some n on, the change point estimator Φ(·, ·, ·) produces correct candidates within each of the segments that contains a true change point. [sent-303, score-1.517]

64 Moreover, from some n on, the performance scores s(·, ·) of the segments without change points converge to 0, while those corresponding to the segments that encompass a change point converge to a non-zero constant. [sent-304, score-1.837]

65 Thus, from some n on, the κ change point candidates of highest performance score that are at least at a distance λn from one another each converge to a unique change point. [sent-305, score-1.477]

66 A problem occurs if the generated index-sequence is such that it includes some of the change points as elements. [sent-306, score-0.696]

67 This way, every change point is fully encompassed by at least one segment from either of the two partitions. [sent-309, score-0.886]

68 From the above argument we can see that segments with change points will have higher scores, and the change points within will be estimated correctly; finally, this is used to prove the theorem in the next section. [sent-311, score-1.488]

69 For every k = 1, ..., κ and every fixed t = 1, 2 we denote by $L^t(\pi_k)$ and by $R^t(\pi_k)$ the elements of the index-sequence $b_i^t$, i = 1, ... [sent-317, score-0.437]

70 Denote by κ the “unknown” number of change points and assume that for some ζ ∈ (0, 1) and some t = 1, 2 we have $\inf_{k=1..\kappa}\,\dots$ [sent-336, score-0.698]

71 We show that from some n on, for every change point the algorithm selects an appropriate segment satisfying these conditions, and assigns it a performance score s(·, ·) that converges to a non-zero constant. [sent-354, score-0.951]

72 Moreover, the performance scores of the segments without change points converge to 0. [sent-355, score-0.973]

73 Recall that the change point candidates are finally sorted according to their performance scores, and the sorted list is filtered to include only elements that are at least λn apart. [sent-356, score-1.029]

74 For λ ≤ λmin, from some n on, the first κ elements of the output change point sequence $\hat\pi$ are some permutation of the true change points. [sent-357, score-1.427]

75 Recall that the algorithm specifies α := λ/3 and generates sequences of evenly-spaced indices $b_i^t := n\alpha(i + \frac{1}{t+1})$, i = 1, ... [sent-360, score-0.468]

76 For every i = 1, ..., 1/α and t ∈ {1, 2} we have that the index $b_i^t$ is either exactly equal to a change point or has a linear distance from it. [sent-368, score-1.063]

77 For every i = 1, ..., 1/α, a performance score s(t, i) is calculated as the intra-subsequence distance $\Delta_x(b_i^t, b_{i+1}^t)$ of the segment $X_{b_i^t..b_{i+1}^t}$. [sent-384, score-0.648]

78 Therefore, for every t = 1, 2 and every change point πk , k ∈ 1. [sent-399, score-0.779]

79 For i = 1, ..., 1/α and t = 1, 2, denote the change point that is contained within $X_{b_i^t..b_{i+1}^t}$. [sent-408, score-0.968]

80 As specified in Step 3, the change point candidates are obtained as $p(t, i) := \Phi_x(b_i^t, b_{i+1}^t, \alpha)$, i = 1, ... [sent-417, score-0.737]

81 The algorithm obtains $\hat\pi_1, \dots, \hat\pi_{1/\lambda}$ by first sorting the change point candidates according to their performance scores, and then filtering the sorted list so that the remaining elements are at least nλ apart. [sent-428, score-0.938]

82 It remains to see that the corresponding estimate of every change point appears exactly once in $\hat\pi$. [sent-429, score-0.746]

83 The segments $X_{b_i^t..b_{i+1}^t}$, (t, i) ∈ I, are assigned higher scores than the remaining segments. [sent-432, score-0.734]

84 Moreover, by construction for every change point πk , k = 1. [sent-435, score-0.714]

85 By (17) and (18) the duplicate estimates of every change point are filtered, while estimates corresponding to different change points are left untouched. [sent-441, score-1.478]

86 To generate the data we have selected distributions that, while being stationary ergodic, do not belong to any “simpler” class of time-series, and are difficult to approximate by finite-state models. [sent-450, score-0.279]

87 These distributions were used in [26] as examples of stationary ergodic processes which are not B-processes. [sent-452, score-0.476]

88 If α is irrational then x forms a stationary ergodic time-series. [sent-497, score-0.389]
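A sketch of such a generator: an irrational rotation of the circle, thresholded to a binary sequence. The uniform starting point and the 0.5 threshold are our assumptions about details not reproduced in this summary:

```python
import numpy as np

def rotation_process(n, alpha, rng=None):
    """Generate n samples from a thresholded irrational rotation:
    r_i = (r_{i-1} + alpha) mod 1, x_i = 1 if r_i > 0.5 else 0.
    For irrational alpha this is stationary ergodic but not a B-process."""
    rng = np.random.default_rng() if rng is None else rng
    r = rng.uniform()  # stationary start: uniform on the circle (our assumption)
    out = np.empty(n)
    for i in range(n):
        r = (r + alpha) % 1.0
        out[i] = 1.0 if r > 0.5 else 0.0
    return out
```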

89 We fix λmin = 0.23 and randomly generate κ = 3 change points at a minimum distance nλmin. [sent-521, score-0.852]

90 We use four different process distributions to respectively generate the four subsequences between every pair of consecutive change points. [sent-524, score-0.775]

91 Experiment 1: (Convergence with Sequence Length) In this experiment we demonstrate that the estimation error converges to 0 as the sequence length grows. [sent-525, score-0.357]

92 We let the sequence length n grow up to 20000; at every iteration we generate an input sequence of length n as described above. [sent-528, score-0.395]
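A sketch of this experiment loop, combining the generators and estimator sketched above; all parameter values below are placeholders, not the paper's:

```python
import numpy as np

def experiment(lengths, thetas, params, lam, rng):
    """For each length n: build a sequence with change points at n*theta_k from
    kappa+1 rotation processes, run the estimator, and record the normalized error."""
    errors = []
    for n in lengths:
        cps = sorted(int(n * th) for th in thetas)
        bounds = [0] + cps + [n]
        x = np.concatenate([rotation_process(bounds[j + 1] - bounds[j], params[j], rng)
                            for j in range(len(params))])
        est = sorted(estimate_change_points(x, lam)[:len(cps)])
        err = max((abs(e - c) for e, c in zip(est, cps)), default=n) / n
        errors.append(err)
    return errors

rng = np.random.default_rng(0)
errs = experiment([2000, 5000, 10000, 20000], [0.3, 0.55, 0.8],
                  [2 ** 0.5 % 1, 3 ** 0.5 % 1, 5 ** 0.5 % 1, 7 ** 0.5 % 1],  # assumed irrationals
                  lam=0.2, rng=rng)
```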

93 7 Outlook In this work we propose a consistency framework for multiple change point estimation in highly dependent time-series, for the case where the number of change points is unknown. [sent-541, score-1.574]

94 The notion of consistency that we consider requires an algorithm to produce a list of change points such that the first k change points approach the true unknown change points asymptotically. [sent-542, score-2.241]

95 While in the general setting that we consider it is not possible to estimate the number of change points, other related formulations may be of interest. [sent-543, score-0.627]

96 For example, if the number of different time-series distributions is known, but the number of change points is not, it may still be possible to estimate the latter. [sent-544, score-0.749]

97 A simple example of this scenario would be when two distributions generate many segments in alternation. [sent-545, score-0.313]

98 Related formulations have been studied for problems such as clustering, and thus may also prove useful in change point analysis. [sent-548, score-0.649]

99 Nonparametric change-point estimation for data from an ergodic sequence. [sent-592, score-0.312]

100 A nonparametric location-scale statistic for detecting a change point. [sent-669, score-0.643]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('change', 0.543), ('bt', 0.319), ('ergodic', 0.257), ('segments', 0.164), ('sequence', 0.149), ('segment', 0.145), ('stationary', 0.132), ('sequences', 0.125), ('points', 0.119), ('lt', 0.115), ('consistency', 0.114), ('polylog', 0.109), ('khaleghi', 0.109), ('point', 0.106), ('ryabko', 0.101), ('nonparametric', 0.1), ('rt', 0.1), ('scores', 0.096), ('distance', 0.095), ('smin', 0.093), ('asymptotically', 0.091), ('sorted', 0.091), ('wm', 0.09), ('candidates', 0.088), ('wl', 0.086), ('length', 0.084), ('unknown', 0.084), ('ni', 0.08), ('consecutive', 0.075), ('min', 0.074), ('estimator', 0.073), ('distributional', 0.07), ('every', 0.065), ('lim', 0.065), ('das', 0.062), ('azadeh', 0.062), ('consistent', 0.061), ('sort', 0.06), ('xa', 0.058), ('generate', 0.058), ('list', 0.057), ('estimation', 0.055), ('terrence', 0.055), ('mitigation', 0.055), ('distributions', 0.055), ('ltered', 0.054), ('elements', 0.053), ('dependent', 0.053), ('sup', 0.052), ('formulations', 0.052), ('converge', 0.051), ('estimates', 0.051), ('score', 0.051), ('xbt', 0.05), ('changepoint', 0.05), ('daniil', 0.05), ('ri', 0.049), ('summands', 0.047), ('clustering', 0.047), ('moreover', 0.047), ('concatenation', 0.047), ('informal', 0.043), ('converges', 0.041), ('mn', 0.041), ('asymptotic', 0.041), ('bi', 0.041), ('lemma', 0.04), ('input', 0.039), ('prentice', 0.039), ('impossible', 0.038), ('calculated', 0.038), ('process', 0.038), ('minimum', 0.037), ('scenario', 0.036), ('inf', 0.036), ('partitioning', 0.036), ('pair', 0.034), ('belong', 0.034), ('adams', 0.034), ('generated', 0.034), ('permutation', 0.033), ('fix', 0.033), ('ln', 0.033), ('estimate', 0.032), ('processes', 0.032), ('nally', 0.03), ('andrew', 0.03), ('generating', 0.029), ('frequencies', 0.029), ('maxi', 0.029), ('xn', 0.028), ('dependence', 0.028), ('highly', 0.028), ('experiment', 0.028), ('cubes', 0.027), ('cylinders', 0.027), ('abbreviation', 0.027), ('bookstore', 0.027), ('encompassed', 0.027), ('exi', 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 203 nips-2012-Locating Changes in Highly Dependent Data with Unknown Number of Change Points

Author: Azadeh Khaleghi, Daniil Ryabko

Abstract: The problem of multiple change point estimation is considered for sequences with unknown number of change points. A consistency framework is suggested that is suitable for highly dependent time-series, and an asymptotically consistent algorithm is proposed. In order for the consistency to be established the only assumption required is that the data is generated by stationary ergodic time-series distributions. No modeling, independence or parametric assumptions are made; the data are allowed to be dependent and the dependence can be of arbitrary form. The theoretical results are complemented with experimental evaluations. 1

2 0.23506445 291 nips-2012-Reducing statistical time-series problems to binary classification

Author: Daniil Ryabko, Jeremie Mary

Abstract: We show how binary classification methods developed to work on i.i.d. data can be used for solving statistical problems that are seemingly unrelated to classification and concern highly-dependent time series. Specifically, the problems of time-series clustering, homogeneity testing and the three-sample problem are addressed. The algorithms that we construct for solving these problems are based on a new metric between time-series distributions, which can be evaluated using binary classification methods. Universal consistency of the proposed algorithms is proven under most general assumptions. The theoretical results are illustrated with experiments on synthetic and real-world data. 1

3 0.1309157 151 nips-2012-High-Order Multi-Task Feature Learning to Identify Longitudinal Phenotypic Markers for Alzheimer's Disease Progression Prediction

Author: Hua Wang, Feiping Nie, Heng Huang, Jingwen Yan, Sungeun Kim, Shannon Risacher, Andrew Saykin, Li Shen

Abstract: Alzheimer’s disease (AD) is a neurodegenerative disorder characterized by progressive impairment of memory and other cognitive functions. Regression analysis has been studied to relate neuroimaging measures to cognitive status. However, whether these measures have further predictive power to infer a trajectory of cognitive performance over time is still an under-explored but important topic in AD research. We propose a novel high-order multi-task learning model to address this issue. The proposed model explores the temporal correlations existing in imaging and cognitive data by structured sparsity-inducing norms. The sparsity of the model enables the selection of a small number of imaging measures while maintaining high prediction accuracy. The empirical studies, using the longitudinal imaging and cognitive data of the ADNI cohort, have yielded promising results.

4 0.12141824 41 nips-2012-Ancestor Sampling for Particle Gibbs

Author: Fredrik Lindsten, Thomas Schön, Michael I. Jordan

Abstract: We present a novel method in the family of particle MCMC methods that we refer to as particle Gibbs with ancestor sampling (PG-AS). Similarly to the existing PG with backward simulation (PG-BS) procedure, we use backward sampling to (considerably) improve the mixing of the PG kernel. Instead of using separate forward and backward sweeps as in PG-BS, however, we achieve the same effect in a single forward sweep. We apply the PG-AS framework to the challenging class of non-Markovian state-space models. We develop a truncation strategy of these models that is applicable in principle to any backward-simulation-based method, but which is particularly well suited to the PG-AS framework. In particular, as we show in a simulation study, PG-AS can yield an order-of-magnitude improved accuracy relative to PG-BS due to its robustness to the truncation error. Several application examples are discussed, including Rao-Blackwellized particle smoothing and inference in degenerate state-space models. 1

5 0.11641524 153 nips-2012-How Prior Probability Influences Decision Making: A Unifying Probabilistic Model

Author: Yanping Huang, Timothy Hanks, Mike Shadlen, Abram L. Friesen, Rajesh P. Rao

Abstract: How does the brain combine prior knowledge with sensory evidence when making decisions under uncertainty? Two competing descriptive models have been proposed based on experimental data. The first posits an additive offset to a decision variable, implying a static effect of the prior. However, this model is inconsistent with recent data from a motion discrimination task involving temporal integration of uncertain sensory evidence. To explain this data, a second model has been proposed which assumes a time-varying influence of the prior. Here we present a normative model of decision making that incorporates prior knowledge in a principled way. We show that the additive offset model and the time-varying prior model emerge naturally when decision making is viewed within the framework of partially observable Markov decision processes (POMDPs). Decision making in the model reduces to (1) computing beliefs given observations and prior information in a Bayesian manner, and (2) selecting actions based on these beliefs to maximize the expected sum of future rewards. We show that the model can explain both data previously explained using the additive offset model as well as more recent data on the time-varying influence of prior knowledge on decision making. 1

6 0.10109681 299 nips-2012-Scalable imputation of genetic data with a discrete fragmentation-coagulation process

7 0.092585154 292 nips-2012-Regularized Off-Policy TD-Learning

8 0.086445473 110 nips-2012-Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems

9 0.086413719 165 nips-2012-Iterative ranking from pair-wise comparisons

10 0.085725382 218 nips-2012-Mixing Properties of Conditional Markov Chains with Unbounded Feature Functions

11 0.083311148 16 nips-2012-A Polynomial-time Form of Robust Regression

12 0.080918334 42 nips-2012-Angular Quantization-based Binary Codes for Fast Similarity Search

13 0.070369154 175 nips-2012-Learning High-Density Regions for a Generalized Kolmogorov-Smirnov Test in High-Dimensional Data

14 0.068737991 122 nips-2012-Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress

15 0.067784041 124 nips-2012-Factorial LDA: Sparse Multi-Dimensional Text Models

16 0.06219966 68 nips-2012-Clustering Aggregation as Maximum-Weight Independent Set

17 0.061139826 139 nips-2012-Fused sparsity and robust estimation for linear models with unknown variance

18 0.06020255 117 nips-2012-Ensemble weighted kernel estimators for multivariate entropy estimation

19 0.059035096 227 nips-2012-Multiclass Learning with Simplex Coding

20 0.058445089 149 nips-2012-Hierarchical Optimistic Region Selection driven by Curiosity


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.208), (1, -0.008), (2, 0.048), (3, 0.004), (4, -0.001), (5, -0.029), (6, 0.008), (7, 0.051), (8, 0.015), (9, 0.015), (10, 0.038), (11, -0.088), (12, -0.015), (13, -0.087), (14, -0.003), (15, -0.081), (16, -0.007), (17, 0.014), (18, -0.011), (19, -0.037), (20, 0.008), (21, 0.004), (22, -0.098), (23, -0.032), (24, 0.007), (25, -0.008), (26, 0.012), (27, 0.059), (28, 0.104), (29, -0.099), (30, 0.241), (31, 0.005), (32, -0.02), (33, -0.022), (34, 0.07), (35, 0.064), (36, -0.073), (37, 0.008), (38, -0.044), (39, 0.004), (40, -0.029), (41, 0.013), (42, -0.073), (43, -0.272), (44, 0.014), (45, -0.044), (46, -0.126), (47, -0.009), (48, -0.246), (49, 0.075)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97887045 203 nips-2012-Locating Changes in Highly Dependent Data with Unknown Number of Change Points

Author: Azadeh Khaleghi, Daniil Ryabko

Abstract: The problem of multiple change point estimation is considered for sequences with unknown number of change points. A consistency framework is suggested that is suitable for highly dependent time-series, and an asymptotically consistent algorithm is proposed. In order for the consistency to be established the only assumption required is that the data is generated by stationary ergodic time-series distributions. No modeling, independence or parametric assumptions are made; the data are allowed to be dependent and the dependence can be of arbitrary form. The theoretical results are complemented with experimental evaluations. 1

2 0.67923069 291 nips-2012-Reducing statistical time-series problems to binary classification

Author: Daniil Ryabko, Jeremie Mary

Abstract: We show how binary classification methods developed to work on i.i.d. data can be used for solving statistical problems that are seemingly unrelated to classification and concern highly-dependent time series. Specifically, the problems of time-series clustering, homogeneity testing and the three-sample problem are addressed. The algorithms that we construct for solving these problems are based on a new metric between time-series distributions, which can be evaluated using binary classification methods. Universal consistency of the proposed algorithms is proven under most general assumptions. The theoretical results are illustrated with experiments on synthetic and real-world data. 1

3 0.63083464 299 nips-2012-Scalable imputation of genetic data with a discrete fragmentation-coagulation process

Author: Lloyd Elliott, Yee W. Teh

Abstract: We present a Bayesian nonparametric model for genetic sequence data in which a set of genetic sequences is modelled using a Markov model of partitions. The partitions at consecutive locations in the genome are related by the splitting and merging of their clusters. Our model can be thought of as a discrete analogue of the continuous fragmentation-coagulation process [Teh et al 2011], preserving the important properties of projectivity, exchangeability and reversibility, while being more scalable. We apply this model to the problem of genotype imputation, showing improved computational efficiency while maintaining accuracies comparable to other state-of-the-art genotype imputation methods. 1

4 0.60514891 151 nips-2012-High-Order Multi-Task Feature Learning to Identify Longitudinal Phenotypic Markers for Alzheimer's Disease Progression Prediction

Author: Hua Wang, Feiping Nie, Heng Huang, Jingwen Yan, Sungeun Kim, Shannon Risacher, Andrew Saykin, Li Shen

Abstract: Alzheimer’s disease (AD) is a neurodegenerative disorder characterized by progressive impairment of memory and other cognitive functions. Regression analysis has been studied to relate neuroimaging measures to cognitive status. However, whether these measures have further predictive power to infer a trajectory of cognitive performance over time is still an under-explored but important topic in AD research. We propose a novel high-order multi-task learning model to address this issue. The proposed model explores the temporal correlations existing in imaging and cognitive data by structured sparsity-inducing norms. The sparsity of the model enables the selection of a small number of imaging measures while maintaining high prediction accuracy. The empirical studies, using the longitudinal imaging and cognitive data of the ADNI cohort, have yielded promising results.

5 0.53971344 110 nips-2012-Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems

Author: Morteza Ibrahimi, Adel Javanmard, Benjamin V. Roy

Abstract: We study the problem of adaptive control of a high dimensional linear quadratic (LQ) system. Previous work established the asymptotic convergence to an optimal controller for various adaptive control schemes. More recently, for the average √ cost LQ problem, a regret bound of O( T ) was shown, apart form logarithmic factors. However, this bound scales exponentially with p, the dimension of the state space. In this work we consider the case where the matrices describing the dynamic of the LQ system are sparse and their dimensions are large. We present √ an adaptive control scheme that achieves a regret bound of O(p T ), apart from logarithmic factors. In particular, our algorithm has an average cost of (1 + ) times the optimum cost after T = polylog(p)O(1/ 2 ). This is in comparison to previous work on the dense dynamics where the algorithm requires time that scales exponentially with dimension in order to achieve regret of times the optimal cost. We believe that our result has prominent applications in the emerging area of computational advertising, in particular targeted online advertising and advertising in social networks. 1

6 0.52860129 153 nips-2012-How Prior Probability Influences Decision Making: A Unifying Probabilistic Model

7 0.51356 175 nips-2012-Learning High-Density Regions for a Generalized Kolmogorov-Smirnov Test in High-Dimensional Data

8 0.51273775 250 nips-2012-On-line Reinforcement Learning Using Incremental Kernel-Based Stochastic Factorization

9 0.47516203 136 nips-2012-Forward-Backward Activation Algorithm for Hierarchical Hidden Markov Models

10 0.4636806 95 nips-2012-Density-Difference Estimation

11 0.43463659 155 nips-2012-Human memory search as a random walk in a semantic network

12 0.43202817 41 nips-2012-Ancestor Sampling for Particle Gibbs

13 0.42511678 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes

14 0.42433152 122 nips-2012-Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress

15 0.42084992 2 nips-2012-3D Social Saliency from Head-mounted Cameras

16 0.41876268 85 nips-2012-Convergence and Energy Landscape for Cheeger Cut Clustering

17 0.41827142 269 nips-2012-Persistent Homology for Learning Densities with Bounded Support

18 0.4167549 223 nips-2012-Multi-criteria Anomaly Detection using Pareto Depth Analysis

19 0.4041391 338 nips-2012-The Perturbed Variation

20 0.40334326 232 nips-2012-Multiplicative Forests for Continuous-Time Processes


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.066), (21, 0.036), (34, 0.133), (38, 0.143), (42, 0.023), (54, 0.019), (55, 0.016), (74, 0.055), (76, 0.223), (80, 0.106), (92, 0.092)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.97944927 322 nips-2012-Spiking and saturating dendrites differentially expand single neuron computation capacity

Author: Romain Cazé, Mark Humphries, Boris S. Gutkin

Abstract: The integration of excitatory inputs in dendrites is non-linear: multiple excitatory inputs can produce a local depolarization departing from the arithmetic sum of each input’s response taken separately. If this depolarization is bigger than the arithmetic sum, the dendrite is spiking; if the depolarization is smaller, the dendrite is saturating. Decomposing a dendritic tree into independent dendritic spiking units greatly extends its computational capacity, as the neuron then maps onto a two layer neural network, enabling it to compute linearly non-separable Boolean functions (lnBFs). How can these lnBFs be implemented by dendritic architectures in practise? And can saturating dendrites equally expand computational capacity? To address these questions we use a binary neuron model and Boolean algebra. First, we confirm that spiking dendrites enable a neuron to compute lnBFs using an architecture based on the disjunctive normal form (DNF). Second, we prove that saturating dendrites as well as spiking dendrites enable a neuron to compute lnBFs using an architecture based on the conjunctive normal form (CNF). Contrary to a DNF-based architecture, in a CNF-based architecture, dendritic unit tunings do not imply the neuron tuning, as has been observed experimentally. Third, we show that one cannot use a DNF-based architecture with saturating dendrites. Consequently, we show that an important family of lnBFs implemented with a CNF-architecture can require an exponential number of saturating dendritic units, whereas the same family implemented with either a DNF-architecture or a CNF-architecture always require a linear number of spiking dendritic units. This minimization could explain why a neuron spends energetic resources to make its dendrites spike. 1

same-paper 2 0.92508399 203 nips-2012-Locating Changes in Highly Dependent Data with Unknown Number of Change Points

Author: Azadeh Khaleghi, Daniil Ryabko

Abstract: The problem of multiple change point estimation is considered for sequences with unknown number of change points. A consistency framework is suggested that is suitable for highly dependent time-series, and an asymptotically consistent algorithm is proposed. In order for the consistency to be established the only assumption required is that the data is generated by stationary ergodic time-series distributions. No modeling, independence or parametric assumptions are made; the data are allowed to be dependent and the dependence can be of arbitrary form. The theoretical results are complemented with experimental evaluations. 1

3 0.89851141 318 nips-2012-Sparse Approximate Manifolds for Differential Geometric MCMC

Author: Ben Calderhead, Mátyás A. Sustik

Abstract: One of the enduring challenges in Markov chain Monte Carlo methodology is the development of proposal mechanisms to make moves distant from the current point, that are accepted with high probability and at low computational cost. The recent introduction of locally adaptive MCMC methods based on the natural underlying Riemannian geometry of such models goes some way to alleviating these problems for certain classes of models for which the metric tensor is analytically tractable, however computational efficiency is not assured due to the necessity of potentially high-dimensional matrix operations at each iteration. In this paper we firstly investigate a sampling-based approach for approximating the metric tensor and suggest a valid MCMC algorithm that extends the applicability of Riemannian Manifold MCMC methods to statistical models that do not admit an analytically computable metric tensor. Secondly, we show how the approximation scheme we consider naturally motivates the use of 1 regularisation to improve estimates and obtain a sparse approximate inverse of the metric, which enables stable and sparse approximations of the local geometry to be made. We demonstrate the application of this algorithm for inferring the parameters of a realistic system of ordinary differential equations using a biologically motivated robust Student-t error model, for which the Expected Fisher Information is analytically intractable. 1

4 0.89795518 321 nips-2012-Spectral learning of linear dynamics from generalised-linear observations with application to neural population data

Author: Lars Buesing, Maneesh Sahani, Jakob H. Macke

Abstract: Latent linear dynamical systems with generalised-linear observation models arise in a variety of applications, for instance when modelling the spiking activity of populations of neurons. Here, we show how spectral learning methods (usually called subspace identification in this context) for linear systems with linear-Gaussian observations can be extended to estimate the parameters of a generalised-linear dynamical system model despite a non-linear and non-Gaussian observation process. We use this approach to obtain estimates of parameters for a dynamical model of neural population data, where the observed spike-counts are Poisson-distributed with log-rates determined by the latent dynamical process, possibly driven by external inputs. We show that the extended subspace identification algorithm is consistent and accurately recovers the correct parameters on large simulated data sets with a single calculation, avoiding the costly iterative computation of approximate expectation-maximisation (EM). Even on smaller data sets, it provides an effective initialisation for EM, avoiding local optima and speeding convergence. These benefits are shown to extend to real neural data.

5 0.89652765 336 nips-2012-The Coloured Noise Expansion and Parameter Estimation of Diffusion Processes

Author: Simon Lyons, Amos J. Storkey, Simo Särkkä

Abstract: Stochastic differential equations (SDE) are a natural tool for modelling systems that are inherently noisy or contain uncertainties that can be modelled as stochastic processes. Crucial to the process of using SDE to build mathematical models is the ability to estimate parameters of those models from observed data. Over the past few decades, significant progress has been made on this problem, but we are still far from having a definitive solution. We describe a novel method of approximating a diffusion process that we show to be useful in Markov chain Monte-Carlo (MCMC) inference algorithms. We take the ‘white’ noise that drives a diffusion process and decompose it into two terms. The first is a ‘coloured noise’ term that can be deterministically controlled by a set of auxilliary variables. The second term is small and enables us to form a linear Gaussian ‘small noise’ approximation. The decomposition allows us to take a diffusion process of interest and cast it in a form that is amenable to sampling by MCMC methods. We explain why many state-of-the-art inference methods fail on highly nonlinear inference problems, and we demonstrate experimentally that our method performs well in such situations. Our results show that this method is a promising new tool for use in inference and parameter estimation problems. 1

6 0.89584929 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

7 0.89469182 354 nips-2012-Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes

8 0.89468485 294 nips-2012-Repulsive Mixtures

9 0.89436966 220 nips-2012-Monte Carlo Methods for Maximum Margin Supervised Topic Models

10 0.89324319 197 nips-2012-Learning with Recursive Perceptual Representations

11 0.8931551 188 nips-2012-Learning from Distributions via Support Measure Machines

12 0.892968 126 nips-2012-FastEx: Hash Clustering with Exponential Families

13 0.89264101 111 nips-2012-Efficient Sampling for Bipartite Matching Problems

14 0.89173287 277 nips-2012-Probabilistic Low-Rank Subspace Clustering

15 0.8915655 41 nips-2012-Ancestor Sampling for Particle Gibbs

16 0.89155132 163 nips-2012-Isotropic Hashing

17 0.89126325 54 nips-2012-Bayesian Probabilistic Co-Subspace Addition

18 0.88988858 74 nips-2012-Collaborative Gaussian Processes for Preference Learning

19 0.88988394 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model

20 0.88895673 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines