nips nips2012 nips2012-269 knowledge-graph by maker-knowledge-mining

269 nips-2012-Persistent Homology for Learning Densities with Bounded Support


Source: pdf

Author: Florian T. Pokorny, Hedvig Kjellström, Danica Kragic, Carl Ek

Abstract: We present a novel method for learning densities with bounded support which enables us to incorporate ‘hard’ topological constraints. In particular, we show how emerging techniques from computational algebraic topology and the notion of persistent homology can be combined with kernel-based methods from machine learning for the purpose of density estimation. The proposed formalism facilitates learning of models with bounded support in a principled way, and – by incorporating persistent homology techniques in our approach – we are able to encode algebraic-topological constraints which are not addressed in current state of the art probabilistic models. We study the behaviour of our method on two synthetic examples for various sample sizes and exemplify the benefits of the proposed approach on a real-world dataset by learning a motion model for a race car. We show how to learn a model which respects the underlying topological structure of the racetrack, constraining the trajectories of the car. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: We present a novel method for learning densities with bounded support which enables us to incorporate ‘hard’ topological constraints. [sent-4, score-0.461]

2 In particular, we show how emerging techniques from computational algebraic topology and the notion of persistent homology can be combined with kernel-based methods from machine learning for the purpose of density estimation. [sent-5, score-0.696]

3 The proposed formalism facilitates learning of models with bounded support in a principled way, and – by incorporating persistent homology techniques in our approach – we are able to encode algebraic-topological constraints which are not addressed in current state of the art probabilistic models. [sent-6, score-0.575]

4 We show how to learn a model which respects the underlying topological structure of the racetrack, constraining the trajectories of the car. [sent-8, score-0.377]

5 …) they do fall into the class of densities on Rd for which supp f = Rd; i.e. … [sent-12, score-0.165]

6 A simple example would be a probabilistic model of admissible positions of a robot in an indoor environment, where one wants to assign zero – rather than just ‘low’ – probability to positions corresponding to collisions with the environment. [sent-16, score-0.185]

7 In contrast to the above Gaussian models, we consider non-parametric density estimators based on spherical kernels with bounded support. [sent-20, score-0.369]

8 As we shall explain, this enables us to study topological properties of the support region Ωε for such estimators. [sent-21, score-0.421]

9 Kernel-based density estimators are well-established in the statistical literature [6], with the basic idea being that one should put a rescaled version of a given model density over each observed data point to obtain an estimate for the probability density from which the data was sampled. [sent-22, score-0.468]

10 We focus particularly on spherical truncated Gaussian kernels here, which have been somewhat overlooked as a tool for probabilistic modelling. (∗ This work was supported by the EU projects FLEXBOT (FP7-ERC-279933) and TOMSY (IST-FP7-270436) and the Swedish Foundation for Strategic Research.) [sent-24, score-0.187]

11 A different interpretation of a density with support in an ε-ball can be given using the notion of bounded noise. [sent-26, score-0.227]

12 There, one assumes that observations are distorted by noise following a density with bounded support (instead of e. [sent-27, score-0.227]

13 Bounded noise models are used in the signal processing community for robust filtering and estimation [8, 9], but to our knowledge, we are the first to combine densities with bounded support and topology to model the underlying structure of data. [sent-30, score-0.193]

14 Persistent homology is a novel tool for studying topological properties of spaces such as Ωε (S) which has emerged from the field of computational algebraic topology in recent years [10, 11]. [sent-35, score-0.723]

15 Using persistent homology, it becomes possible to study clustering, periodicity and more generally the existence of ‘holes’ of various dimensions in Ωε (S) for ε lying in an interval. [sent-36, score-0.203]

16 Starting from the basic observation that one can construct a kernel-based density estimator f̂ε whose region of support is exactly Ωε(S), this paper investigates the interplay between the topological information contained in Ωε(S) and a corresponding density estimate. [sent-37, score-0.764]

17 Specifically, we make the following contributions: • Given prior topological information about supp f = Ω, we define a topologically admissible bandwidth interval [εmin , εmax ] and propose and evaluate a topological bandwidth selector εtop ∈ [εmin , εmax ]. [sent-38, score-1.461]

18 • Given no prior topological information, we explain how persistent homology can be of use to determine a topologically admissible bandwidth interval. [sent-39, score-1.196]

19 • We describe how additional constraints defining a forbidden subset F ⊂ Rn of the parameter-space can be incorporated into our topological bandwidth estimation framework. [sent-40, score-0.66]

20 • We provide quantitative results on synthetic data in 1D and 2D evaluating the expected L2 errors for density estimators with topologically chosen bandwidth values ε ∈ {εmin , εmid , εmax , εtop }. [sent-41, score-0.554]

21 We carry out this evaluation for various spherical kernels and compare our results to an asymptotically optimal bandwidth choice. [sent-42, score-0.397]

22 1 Background: Kernel-based density estimation. Let S = {X1, … [sent-45, score-0.147]

23 sample arising from a probability density f : Rd → R. [sent-51, score-0.181]

24 Kernel-based density estimation [13, 14, 15] is an approach for reconstructing f from the sample by means of an estimator f̂ε,n(x) = (1/(n ε^d)) Σ_{i=1}^n K((x − Xi)/ε), where the kernel function K : Rd → R is a suitably chosen probability density. [sent-52, score-0.263]

25 Let {εn}_{n=1}^∞ be a sequence of positive bandwidth values depending on the sample size n. [sent-55, score-0.272]

26 It follows from classical results [14, 15] that for any sufficiently well-behaved density K, lim_{n→∞} E[(f̂εn,n(x) − f(x))²] = 0 provided that lim_{n→∞} εn = 0 and lim_{n→∞} n εn^d = ∞. [sent-56, score-0.147]
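
The estimator above is straightforward to implement with a bounded-support kernel. Below is a minimal numpy sketch; the truncated-Gaussian kernel is our assumed reading of the paper's Kt (a Gaussian restricted to the unit ball and renormalised), not a reproduction of the paper's exact constants.

```python
import numpy as np
from math import gamma, pi

def truncated_gaussian_kernel(d, sigma2):
    """Spherical Gaussian restricted to the unit ball B_1(0), renormalised to
    integrate to one over R^d -- an assumed reading of the paper's Kt."""
    s = 2 * pi ** (d / 2) / gamma(d / 2)       # surface area of the unit (d-1)-sphere
    r = np.linspace(0.0, 1.0, 2001)
    g = r ** (d - 1) * np.exp(-r ** 2 / (2 * sigma2))
    z = s * float(np.sum((g[1:] + g[:-1]) * np.diff(r)) / 2)  # trapezoid rule
    def K(u):                                  # u: (..., d) displacement vectors
        rr2 = np.sum(u * u, axis=-1)
        return np.where(rr2 <= 1.0, np.exp(-rr2 / (2 * sigma2)) / z, 0.0)
    return K

def kde(x, sample, eps, K):
    """f_hat_{eps,n}(x) = (1 / (n eps^d)) * sum_i K((x - X_i) / eps),
    for query points x: (m, d) and observations sample: (n, d)."""
    n, d = sample.shape
    u = (x[:, None, :] - sample[None, :, :]) / eps   # (m, n, d)
    return K(u).sum(axis=1) / (n * eps ** d)
```

By construction, f̂ε,n vanishes outside Ωε(S) = ∪i Bε(Xi), which is what makes the topological analysis of the support region possible.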

27 Despite this encouraging result, the question of determining the best bandwidth for a given sample is an ongoing research topic and the interested reader is referred to the review [7] for an in-depth discussion. [sent-57, score-0.272]

28 Due to the availability of a relatively simple explicit formula for AMISE, a large class of bandwidth selection methods attempt to estimate and minimize AMISE instead of working with MISE directly. [sent-65, score-0.238]

29 In this paper, we will work with two synthetic examples of densities for which we can compute εamise numerically in order to benchmark our topological bandwidth selection procedure. [sent-68, score-0.642]

30 … for the indicated kernels and a corresponding estimator f̂4,3 for three sample points. [sent-75, score-0.17]

31 Persistent homology Consider the point cloud S shown in Figure 2(a). [sent-76, score-0.315]

32 One can reformulate the existence of the ‘hole’ in Figure 2(a) in a mathematically precise way using persistent homology [16] which has recently gained increasing traction as a tool for the analysis of structure in point-cloud data [10]. [sent-78, score-0.463]

33 Figure 2: Noisy data concentrated around a circle (a) and corresponding barcodes in dimension zero (d) and one (e). [sent-81, score-0.204]

34 While the vertical axis in the ith barcode has no special meaning, the horizontal axis displays the ε parameter of V2ε . [sent-85, score-0.386]

35 In the approach of [10], one starts with a subset Ω ⊂ Rd and assumes that there exists some probability density f on Rd that is concentrated near Ω. [sent-92, score-0.171]

36 sample S = {X1, · · ·, Xn} from the corresponding probability distribution, one of the aims of persistent homology in this setting is to recover some of the topological structure of Ω – the homology groups Hi(Ω, Z2), for i = 1, … [sent-96, score-1.161]

37 One of the properties of homology is that homology groups are invariant under a large class of deformations (i.e. … [sent-101, score-0.68]

38 The key insight of persistent homology is that we should study not just the homology of Ωε (S) for a fixed value of ε but for all ε ∈ [0, ∞) simultaneously. [sent-111, score-0.778]

39 The idea is then to study how the homology groups Hi(Ωε(S), Z2) change with ε, and one records the changes in Betti numbers using a barcode [10] (see e.g. … [sent-112, score-0.518]

40 Computing the barcode corresponding to Hi(Ωε(S), Z2) directly (via the Čech complex given by our covering of balls Bε(X1), … [sent-115, score-0.153]

41 …, Bε(Xn) [10]) is computationally very expensive, and one hence computes the barcode corresponding to the homology groups of the Vietoris-Rips complex V2ε(S). [sent-118, score-0.495]

42 The homology groups of V2ε(S) are not necessarily isomorphic to the homology groups of Ωε(S), but can serve as an approximation due to the interleaving property of the Vietoris-Rips and Čech complexes, see e.g. … [sent-120, score-0.684]
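
In dimension zero, the Vietoris-Rips barcode can be computed exactly by a union-find pass over edges sorted by length (this is single-linkage clustering). A self-contained sketch, with the convention that bars are recorded in the Rips parameter, i.e. at twice the ball radius ε:

```python
import numpy as np

def h0_barcode(points, max_filtration):
    """Dimension-zero barcode of the Vietoris-Rips filtration on a point
    cloud points: (n, d).  Every point is born at 0; a connected component
    dies at the length of the edge that merges it into another component.
    Components still alive at max_filtration are reported as dying there."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    edges = sorted((dist[i, j], i, j)
                   for i in range(n) for j in range(i + 1, n)
                   if dist[i, j] <= max_filtration)
    parent = list(range(n))
    def find(a):                       # union-find with path compression
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    bars = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                   # two components merge: one bar ends
            parent[rj] = ri
            bars.append((0.0, length))
    bars.extend((0.0, max_filtration) for _ in {find(i) for i in range(n)})
    return bars
```

Higher-dimensional barcodes (b1 and up) require building the full Rips complex and reducing its boundary matrices, for which dedicated libraries such as JavaPlex (referenced in the bibliography) are used.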

43 The computed ith barcode then records the birth and death times of topological features of V2ε in dimension i as we increase ε from zero to some maximal value M , where M is called the maximal filtration value. [sent-125, score-0.523]

44 fashion from an underlying probability distribution with density f : Rd → R with bounded support Ω, we propose to recover f using a kernel density estimator f̂ε,n in a way that respects the algebraic topology of Ω. [sent-132, score-0.564]

45 For this, we consider only f̂ε,n based on kernels K with supp K = B1(0), and in particular, we experiment with Kt, Ku and Kc. [sent-133, score-0.159]

46 For such kernels, supp f̂ε,n = Ωε(S) = ∪_{i=1}^n Bε(Xi), whose topological features we can approximate by computing the barcodes for V2ε. [sent-134, score-0.547]

47 If no prior information on the topological features of Ω is given, we can then inspect these barcodes and search for large intervals in which the Betti numbers do not change. [sent-135, score-0.472]

48 This approach is used in [10], where it was demonstrated that topological features of data can be discovered in this way. [sent-136, score-0.322]

49 In the robotics setting, frequently encountered examples for such forbidden regions are singular points in the joint space of a robot, or positions in space corresponding to collisions with the environment. [sent-141, score-0.167]

50 For a given sample S, we then compute the barcodes for V2ε in each dimension i ∈ {1, . [sent-143, score-0.178]

51 If A is empty, we consider the topological reconstruction to have failed. [sent-147, score-0.322]
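
A minimal sketch of this scan, assuming barcodes are given in the Rips parameter as in the sketch above, so a ball radius ε is queried at filtration value 2ε; the function names and the grid-based search are our own illustration, not the paper's implementation:

```python
def betti_at(t, bars):
    """Betti number at filtration value t: number of bars alive at t."""
    return sum(1 for birth, death in bars if birth <= t < death)

def admissible_interval(barcodes, target_betti, grid):
    """Return the largest interval of ball radii eps on which every Betti
    number of V_{2 eps} matches its prior, or None if the set A is empty
    (topological reconstruction failed).  barcodes: dict mapping dimension
    i -> list of (birth, death) bars; target_betti: dict i -> desired b_i."""
    ok = [all(betti_at(2 * e, barcodes[i]) == b
              for i, b in target_betti.items()) for e in grid]
    best, start = None, None
    for k, flag in enumerate(ok + [False]):      # sentinel flushes last run
        if flag and start is None:
            start = k
        elif not flag and start is not None:
            if best is None or grid[k - 1] - grid[start] > best[1] - best[0]:
                best = (grid[start], grid[k - 1])
            start = None
    return best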

52 For ε ∈ [εmin(n), εmax(n)], the resulting density f̂ε,n then has a support region Ωε(S) with the correct Betti numbers – as approximated by V2ε. [sent-151, score-0.277]

53 …1 provides a motivation for choosing {εtop(n)}_{n=1}^∞ as a topological bandwidth selector since – while it is difficult to analyse εmin(n) asymptotically – at least the second summand of εtop(n) has the same asymptotics in n as the optimal AMISE solution. [sent-160, score-0.583]

54 Furthermore, this choice of bandwidth then corresponds to a support region Ωεtop (n) (S) with the correct Betti numbers (as approximated by the Vietoris-Rips complex) since εtop (n) ∈ [εmin (n), εmax (n)]. [sent-161, score-0.368]

55 If the topologically admissible interval [εmin (n), εmax (n)] is for example determined by the constraint of having three connected components of supp f as in 3(a), εmax (n) will increase if we shift the connected components of supp f further apart. [sent-164, score-0.482]

56 Note however also that the L2 error might not be the right quality measure for applications where the topological features of supp f are most important – we illustrate an example of this situation in our racetrack data experiment. [sent-168, score-0.632]

57 We will show that – in the absence of further problem-specific knowledge – εtop(n) does yield a good bandwidth estimate with respect to the L2 error in our examples. [sent-169, score-0.238]

58 4 Experiments: Results in 1D. We consider the probability density f : R → R displayed in grey in each of the graphs in Figure 3. [sent-170, score-0.227]

59 To benchmark the performance of our topological bandwidth estimators, we then compute the AMISE-optimal bandwidth parameter εamise numerically from the analytic formula for f and for Kt , Ku , Kc and Ke . [sent-171, score-0.821]
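
For reference, εamise in 1D comes from the classical AMISE formula ε* = (R(K) / (μ2(K)² R(f'') n))^(1/5), with R(g) = ∫ g² and μ2(K) = ∫ x²K(x) dx. A numerical sketch, assuming f is known analytically and smooth in the interior (boundary effects of the bounded support are ignored here); K is the univariate kernel profile:

```python
import numpy as np

def amise_bandwidth_1d(f, K, n, grid):
    """Classical AMISE-optimal bandwidth in 1D,
        eps* = ( R(K) / (mu2(K)**2 * R(f'') * n) )**(1/5),
    with all integrals evaluated numerically on a fine, wide grid."""
    dx = grid[1] - grid[0]
    RK = np.sum(K(grid) ** 2) * dx            # R(K)
    mu2 = np.sum(grid ** 2 * K(grid)) * dx    # second moment of K
    f2 = np.gradient(np.gradient(f(grid), dx), dx)   # second derivative of f
    Rf2 = np.sum(f2 ** 2) * dx                # R(f'')
    return (RK / (mu2 ** 2 * Rf2 * n)) ** 0.2
```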

60 Figure 3: Density f (grey) and reconstructions (black) for the indicated sample size, bandwidth and kernel; panels include (b) f̂εamise,10 using Ke and (c) f̂εtop,2500 using Kt. [sent-180, score-0.306]

61 We then find εtop (n) by computing a topologically admissible interval [εmin (n), εmax (n)] from the barcode corresponding to the given sample. [sent-184, score-0.365]

62 Note here that εamise can only be computed if the true density f is known, while, for εtop , we only 5 3. [sent-188, score-0.147]

63 (Figure 4 legend: bandwidth selectors top, amise, min, mid, max.) [sent-194, score-1.886]

64 Figure 4: We generate samples from our 1D density using rejection sampling and consider sample sizes n from 10 to 100 in increments of 10 (small scale) and from 250 to 5000 in increments of 250 (larger scale), resulting in 30 increasing sample sizes n1, …; panels include (d) Ku and (e) Kc. [sent-224, score-0.363]

65 We then compute the corresponding kernel density estimators f̂ε,n and the mean L2 norm of f − f̂εn,n. [sent-229, score-0.207]
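
The evaluation loop is then a small Monte-Carlo routine; a sketch reusing the kde and kernel sketches above, with sampler and eps_selector as hypothetical stand-ins for rejection sampling and the bandwidth selectors:

```python
def mean_l2_error(f, sampler, eps_selector, K, n, grid, trials=10):
    """Monte-Carlo estimate of the mean L2 norm of f - f_hat_{eps,n} on a
    1D grid.  sampler(n) draws n points; eps_selector(sample) returns a
    bandwidth (e.g. eps_min, eps_mid, eps_max, eps_top or eps_amise)."""
    dx = grid[1] - grid[0]
    errs = []
    for _ in range(trials):
        s = sampler(n)                       # fresh sample of size n
        eps = eps_selector(s)                # selected bandwidth
        fhat = kde(grid[:, None], s[:, None], eps, K)
        errs.append(np.sqrt(np.sum((f(grid) - fhat) ** 2) * dx))
    return float(np.mean(errs))
```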

66 Figures (b)-(e) display these mean L2 errors (vertical axis) for the indicated kernel function and bandwidth selectors. [sent-230, score-0.378]

67 Figure (a) displays the bandwidth values (vertical axis) for the given bandwidth selectors. [sent-231, score-0.527]

68 Figure 5: 2D density, samples with inferred support region Ωεtop, topological reconstruction (using Kt, σ² = 1/4) and barcodes with [εmin, εmax] highlighted; panels include (c) f̂εtop,100 using just 100 samples as in 5(b), (d) barcode for b0 and (e) barcode for b1. [sent-237, score-0.846]

69 In our experiments (sample sizes n ≥ 10), we were able to determine a valid interval [εmin(n), εmax(n)] in all cases and did not encounter a case where the topological reconstruction was impossible. [sent-239, score-0.4]

70 Results in 2D: Here, we consider the density f displayed in Figure 5(a). [sent-240, score-0.199]

71 In such scenarios, we might be able to obtain topological information about the unobstructed space X, such as knowing the number of components or holes in X. [sent-242, score-0.367]

72 Such information could be particularly valuable in the case of deformable obstacles since their homology stays invariant under continuous deformations by homotopies. [sent-243, score-0.338]

73 we iterate sampling from the given density for various sample sizes and compute the resulting mean L2 errors to evaluate our results. [sent-246, score-0.251]

74 As we can see from Figure 6, our results indicate that bandwidths ε ∈ [εmin , εmax ] yield errors comparable with the AMISE optimal bandwidth choice. [sent-247, score-0.269]

75 since path-connectedness of supp f is important), making a bounded support kernel a preferable choice (see also our racetrack example). [sent-252, score-0.423]

76 (Figure 6 panel labels: (a) bandwidth values; (b) Kt, σ² = 0.…) [sent-275, score-0.238]

77 Figure 6: We generate samples from our 2D density using rejection sampling and consider sample sizes from 100 to 1500 in increments of 100; panel (e) shows Kc. [sent-278, score-0.255]

78 We perform sampling 10 times for each sample size and compute the corresponding kernel-based density estimator f̂ε,n and the mean L2 norm of f − f̂εn,n. [sent-279, score-0.23]

79 Figures (b)-(e) display these mean L2 errors (vertical axis) for the indicated sample size (horizontal axis) and kernel function. [sent-280, score-0.174]

80 Figure (a) displays the indicated bandwidth values (vertical axis) and sample size (horizontal axis). [sent-281, score-0.357]

81 Figure 7: (a) position component of our racetrack data; (b) projection of inferred support region, generated vector field and sample trajectories; (c) inferred vector field, position likelihood and sample trajectories using GMR. [sent-1939, score-0.416]

82 We exploit the topological information that a racetrack should be connected and ‘circular’ when learning the density. [sent-1942, score-0.555]

83 Application to regression We now consider how our framework can be applied to learn complex dynamics given a topological constraint. [sent-1947, score-0.322]

84 We consider GPS/timestamp data from 10 laps of a race car driving around a racetrack which was provided to us by [20]. [sent-1948, score-0.294]

85 For this dataset (see Figure 7(a)), we are given no information on what the boundaries of the racetrack are. [sent-1949, score-0.204]

86 In order to model the dynamics, one can then employ a Gaussian mixture model [12] in R2n to learn a probability density f̂ for the dataset (usually using the EM-algorithm). [sent-1955, score-0.191]

87 To every position x ∈ Rn, one can then associate the velocity vector given by E(V |P = x) with respect to the learned density f̂ – this uses the idea of Gaussian mixture regression (GMR). [sent-1956, score-0.284]
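
The GMR conditional expectation is standard: for a mixture with weights πk, means μk = (μkP, μkV) and covariances partitioned into blocks ΣkPP, ΣkPV, ΣkVP, ΣkVV, one has E[V | P = x] = Σk hk(x) (μkV + ΣkVP ΣkPP⁻¹ (x − μkP)) with hk(x) ∝ πk N(x; μkP, ΣkPP). A sketch with hypothetical parameter names (as produced by an EM fit on joint position-velocity vectors):

```python
import numpy as np

def gmr(x, weights, means, covs, dp):
    """Gaussian mixture regression E[V | P = x] for a GMM fit jointly on
    (position, velocity) vectors; dp is the position dimension.  Each mean
    is stacked as [mu_P, mu_V] and each covariance is partitioned into
    [[S_PP, S_PV], [S_VP, S_VV]]."""
    h = np.empty(len(weights))
    cond = np.empty((len(weights), dp))
    for k, (w, mu, S) in enumerate(zip(weights, means, covs)):
        mup, muv = mu[:dp], mu[dp:]
        Spp, Svp = S[:dp, :dp], S[dp:, :dp]
        diff = x - mup
        sol = np.linalg.solve(Spp, diff)     # S_PP^{-1} (x - mu_P)
        # responsibility of component k at position x (Gaussian density)
        h[k] = w * np.exp(-0.5 * diff @ sol) / np.sqrt(
            (2 * np.pi) ** dp * np.linalg.det(Spp))
        cond[k] = muv + Svp @ sol            # per-component E[V | P = x]
    h /= h.sum()
    return h @ cond
```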

88 We then experimented with the software [21] to model our racetrack data with a mixture of a varying number of Gaussians. [sent-1961, score-0.275]

89 We observe both an undesired periodic trajectory as well as a trajectory that almost completely traverses the racetrack before converging towards an attractor. [sent-1964, score-0.256]

90 Given that we know that the racetrack is a closed curve, we assume that the data should be modelled by a probability density f : R4 → R whose support region Ω has a single component (b0 (Ω) = 1) and Ω should topologically be a circle (b1 (Ω) = 1). [sent-1969, score-0.597]
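
In the notation of the earlier sketches, this prior plugs directly into the admissible-interval scan; a hypothetical call with stand-in data (X4, M and bars1 are placeholders, since the real dimension-one barcode would come from a persistence library such as JavaPlex):

```python
import numpy as np

# Hypothetical stand-ins: X4 would be the (n, 4) racetrack data
# (positions + rescaled velocities); bars1 would be the dimension-one
# Vietoris-Rips barcode from a persistence library.
X4 = np.random.randn(200, 4)
M = 3.0                                    # maximal filtration value
bars1 = []                                 # placeholder: no 1-dim bars here

bars0 = h0_barcode(X4, max_filtration=M)   # dimension-zero sketch above
interval = admissible_interval({0: bars0, 1: bars1},
                               target_betti={0: 1, 1: 1},   # b0 = b1 = 1
                               grid=np.linspace(0.01, M / 2, 400))
```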

91 In order for the velocities of differing laps around the track not to lie too far apart, and so that the topology of the racetrack dominates in R4, we rescale the velocity components of our data to lie inside the in… [sent-1970, score-0.398]

92 […, 2.97] is the bandwidth interval for which the topological constraints that we just defined are satisfied. [sent-1978, score-0.631]

93 Using the kernel Kt with σ² = 1/4 and the corresponding density estimator f̂εtop, we obtain Ωεtop ⊂ R4 with the correct topological properties. (Figure 8: Barcodes in dimension zero (a) and one (b) and shaded [εmin, εmax] interval for ….) [sent-1979, score-0.615]

94 At the same time, the displayed support region looks like a sensible choice for the position of the racetrack. [sent-1994, score-0.185]

95 5 Conclusion In this paper, we have presented a novel method for learning density models with bounded support. [sent-1995, score-0.183]

96 The proposed topological bandwidth selection approach makes it possible to incorporate topological constraints within a probabilistic modelling framework by combining algebraic-topological information obtained in terms of persistent homology with tools from kernel-based density estimation. [sent-1996, score-1.524]

97 Turlach, “Bandwidth selection in kernel density estimation: A review,” in CORE and Institut de Statistique, pp. [sent-2040, score-0.18]

98 Weinberger, “A topological view of unsupervised learning from noisy data,” SIAM Journal on Computing, vol. [sent-2070, score-0.322]

99 Weinberger, “Finding the homology of submanifolds with high confidence from random samples,” Discrete Comput. [sent-2109, score-0.315]

100 Adams, “JavaPlex: A software package for computing persistent topological invariants.” [sent-2118, score-0.497]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('amise', 0.51), ('topological', 0.322), ('homology', 0.315), ('bandwidth', 0.238), ('racetrack', 0.204), ('mid', 0.189), ('betti', 0.187), ('barcode', 0.153), ('persistent', 0.148), ('density', 0.147), ('kt', 0.141), ('barcodes', 0.119), ('top', 0.114), ('topologically', 0.111), ('spherical', 0.106), ('supp', 0.106), ('kc', 0.104), ('ke', 0.099), ('ku', 0.099), ('limn', 0.093), ('gmr', 0.09), ('min', 0.071), ('forbidden', 0.068), ('rd', 0.065), ('admissible', 0.062), ('ise', 0.06), ('densities', 0.059), ('max', 0.059), ('velocity', 0.059), ('axis', 0.057), ('region', 0.055), ('topology', 0.054), ('kernels', 0.053), ('displayed', 0.052), ('displays', 0.051), ('hess', 0.051), ('javaplex', 0.051), ('laps', 0.051), ('estimator', 0.049), ('holes', 0.045), ('billard', 0.045), ('mixture', 0.044), ('support', 0.044), ('display', 0.042), ('robotics', 0.04), ('sizes', 0.039), ('race', 0.039), ('interval', 0.039), ('vertical', 0.038), ('gaussian', 0.037), ('dx', 0.036), ('bounded', 0.036), ('circle', 0.036), ('increments', 0.035), ('position', 0.034), ('sample', 0.034), ('cech', 0.034), ('hedvig', 0.034), ('homotopies', 0.034), ('stockholm', 0.034), ('indicated', 0.034), ('robot', 0.033), ('trajectories', 0.033), ('kernel', 0.033), ('algebraic', 0.032), ('hi', 0.032), ('constraints', 0.032), ('numbers', 0.031), ('eld', 0.031), ('positions', 0.031), ('errors', 0.031), ('track', 0.03), ('horizontal', 0.03), ('smale', 0.03), ('lying', 0.029), ('connected', 0.029), ('demonstration', 0.028), ('truncated', 0.028), ('collisions', 0.028), ('grey', 0.028), ('groups', 0.027), ('estimators', 0.027), ('software', 0.027), ('kth', 0.026), ('trajectory', 0.026), ('submanifold', 0.026), ('periodicity', 0.026), ('motion', 0.025), ('dimension', 0.025), ('gaussians', 0.025), ('gmm', 0.024), ('concentrated', 0.024), ('numerically', 0.023), ('rn', 0.023), ('deformations', 0.023), ('niyogi', 0.023), ('records', 0.023), ('selector', 0.023), ('respects', 0.022), ('circular', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 269 nips-2012-Persistent Homology for Learning Densities with Bounded Support

Author: Florian T. Pokorny, Hedvig Kjellström, Danica Kragic, Carl Ek

Abstract: We present a novel method for learning densities with bounded support which enables us to incorporate ‘hard’ topological constraints. In particular, we show how emerging techniques from computational algebraic topology and the notion of persistent homology can be combined with kernel-based methods from machine learning for the purpose of density estimation. The proposed formalism facilitates learning of models with bounded support in a principled way, and – by incorporating persistent homology techniques in our approach – we are able to encode algebraic-topological constraints which are not addressed in current state of the art probabilistic models. We study the behaviour of our method on two synthetic examples for various sample sizes and exemplify the benefits of the proposed approach on a real-world dataset by learning a motion model for a race car. We show how to learn a model which respects the underlying topological structure of the racetrack, constraining the trajectories of the car. 1

2 0.091313303 264 nips-2012-Optimal kernel choice for large-scale two-sample tests

Author: Arthur Gretton, Dino Sejdinovic, Heiko Strathmann, Sivaraman Balakrishnan, Massimiliano Pontil, Kenji Fukumizu, Bharath K. Sriperumbudur

Abstract: Given samples from distributions p and q, a two-sample test determines whether to reject the null hypothesis that p = q, based on the value of a test statistic measuring the distance between the samples. One choice of test statistic is the maximum mean discrepancy (MMD), which is a distance between embeddings of the probability distributions in a reproducing kernel Hilbert space. The kernel used in obtaining these embeddings is critical in ensuring the test has high power, and correctly distinguishes unlike distributions with high probability. A means of parameter selection for the two-sample test based on the MMD is proposed. For a given test level (an upper bound on the probability of making a Type I error), the kernel is chosen so as to maximize the test power, and minimize the probability of making a Type II error. The test statistic, test threshold, and optimization over the kernel parameters are obtained with cost linear in the sample size. These properties make the kernel selection and test procedures suited to data streams, where the observations cannot all be stored in memory. In experiments, the new kernel selection approach yields a more powerful test than earlier kernel selection heuristics.

3 0.082222998 117 nips-2012-Ensemble weighted kernel estimators for multivariate entropy estimation

Author: Kumar Sricharan, Alfred O. Hero

Abstract: The problem of estimation of entropy functionals of probability densities has received much attention in the information theory, machine learning and statistics communities. Kernel density plug-in estimators are simple, easy to implement and widely used for estimation of entropy. However, for large feature dimension d, kernel plug-in estimators suffer from the curse of dimensionality: the MSE rate of convergence is glacially slow - of order O(T −γ/d ), where T is the number of samples, and γ > 0 is a rate parameter. In this paper, it is shown that for sufficiently smooth densities, an ensemble of kernel plug-in estimators can be combined via a weighted convex combination, such that the resulting weighted estimator has a superior parametric MSE rate of convergence of order O(T −1 ). Furthermore, it is shown that these optimal weights can be determined by solving a convex optimization problem which does not require training data or knowledge of the underlying density, and therefore can be performed offline. This novel result is remarkable in that, while each of the individual kernel plug-in estimators belonging to the ensemble suffer from the curse of dimensionality, by appropriate ensemble averaging we can achieve parametric convergence rates. 1

4 0.077842377 166 nips-2012-Joint Modeling of a Matrix with Associated Text via Latent Binary Features

Author: Xianxing Zhang, Lawrence Carin

Abstract: A new methodology is developed for joint analysis of a matrix and accompanying documents, with the documents associated with the matrix rows/columns. The documents are modeled with a focused topic model, inferring interpretable latent binary features for each document. A new matrix decomposition is developed, with latent binary features associated with the rows/columns, and with imposition of a low-rank constraint. The matrix decomposition and topic model are coupled by sharing the latent binary feature vectors associated with each. The model is applied to roll-call data, with the associated documents defined by the legislation. Advantages of the proposed model are demonstrated for prediction of votes on a new piece of legislation, based only on the observed text of legislation. The coupling of the text and legislation is also shown to yield insight into the properties of the matrix decomposition for roll-call data. 1

5 0.076503813 346 nips-2012-Topology Constraints in Graphical Models

Author: Marcelo Fiori, Pablo Musé, Guillermo Sapiro

Abstract: Graphical models are a very useful tool to describe and understand natural phenomena, from gene expression to climate change and social interactions. The topological structure of these graphs/networks is a fundamental part of the analysis, and in many cases the main goal of the study. However, little work has been done on incorporating prior topological knowledge onto the estimation of the underlying graphical models from sample data. In this work we propose extensions to the basic joint regression model for network estimation, which explicitly incorporate graph-topological constraints into the corresponding optimization approach. The first proposed extension includes an eigenvector centrality constraint, thereby promoting this important prior topological property. The second developed extension promotes the formation of certain motifs, triangle-shaped ones in particular, which are known to exist for example in genetic regulatory networks. The presentation of the underlying formulations, which serve as examples of the introduction of topological constraints in network estimation, is complemented with examples in diverse datasets demonstrating the importance of incorporating such critical prior knowledge. 1

6 0.061803002 123 nips-2012-Exponential Concentration for Mutual Information Estimation with Application to Forests

7 0.054606371 99 nips-2012-Dip-means: an incremental clustering method for estimating the number of clusters

8 0.053867921 48 nips-2012-Augmented-SVM: Automatic space partitioning for combining multiple non-linear dynamics

9 0.053866237 294 nips-2012-Repulsive Mixtures

10 0.053653814 188 nips-2012-Learning from Distributions via Support Measure Machines

11 0.049865782 30 nips-2012-Accuracy at the Top

12 0.045454565 16 nips-2012-A Polynomial-time Form of Robust Regression

13 0.04450202 96 nips-2012-Density Propagation and Improved Bounds on the Partition Function

14 0.044311367 325 nips-2012-Stochastic optimization and sparse statistical recovery: Optimal algorithms for high dimensions

15 0.04312792 95 nips-2012-Density-Difference Estimation

16 0.042970985 13 nips-2012-A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function

17 0.042228717 289 nips-2012-Recognizing Activities by Attribute Dynamics

18 0.041407 115 nips-2012-Efficient high dimensional maximum entropy modeling via symmetric partition functions

19 0.04071524 261 nips-2012-Online allocation and homogeneous partitioning for piecewise constant mean-approximation

20 0.040148702 349 nips-2012-Training sparse natural image models with a fast Gibbs sampler of an extended state space


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.133), (1, 0.016), (2, 0.0), (3, -0.016), (4, 0.0), (5, 0.01), (6, 0.005), (7, 0.04), (8, -0.027), (9, -0.04), (10, -0.011), (11, -0.036), (12, 0.015), (13, -0.083), (14, 0.006), (15, -0.073), (16, 0.053), (17, -0.002), (18, -0.029), (19, -0.023), (20, 0.006), (21, -0.004), (22, 0.035), (23, -0.054), (24, -0.028), (25, 0.009), (26, 0.025), (27, 0.012), (28, 0.038), (29, -0.008), (30, 0.006), (31, -0.035), (32, -0.039), (33, -0.031), (34, 0.033), (35, -0.003), (36, -0.036), (37, -0.042), (38, 0.048), (39, 0.051), (40, -0.026), (41, -0.005), (42, -0.02), (43, -0.043), (44, 0.01), (45, 0.026), (46, 0.055), (47, 0.04), (48, 0.019), (49, 0.011)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.91672421 269 nips-2012-Persistent Homology for Learning Densities with Bounded Support

Author: Florian T. Pokorny, Hedvig Kjellström, Danica Kragic, Carl Ek

Abstract: We present a novel method for learning densities with bounded support which enables us to incorporate ‘hard’ topological constraints. In particular, we show how emerging techniques from computational algebraic topology and the notion of persistent homology can be combined with kernel-based methods from machine learning for the purpose of density estimation. The proposed formalism facilitates learning of models with bounded support in a principled way, and – by incorporating persistent homology techniques in our approach – we are able to encode algebraic-topological constraints which are not addressed in current state of the art probabilistic models. We study the behaviour of our method on two synthetic examples for various sample sizes and exemplify the benefits of the proposed approach on a real-world dataset by learning a motion model for a race car. We show how to learn a model which respects the underlying topological structure of the racetrack, constraining the trajectories of the car. 1

2 0.7508027 95 nips-2012-Density-Difference Estimation

Author: Masashi Sugiyama, Takafumi Kanamori, Taiji Suzuki, Marthinus D. Plessis, Song Liu, Ichiro Takeuchi

Abstract: We address the problem of estimating the difference between two probability densities. A naive approach is a two-step procedure of first estimating two densities separately and then computing their difference. However, such a two-step procedure does not necessarily work well because the first step is performed without regard to the second step and thus a small estimation error incurred in the first stage can cause a big error in the second stage. In this paper, we propose a single-shot procedure for directly estimating the density difference without separately estimating two densities. We derive a non-parametric finite-sample error bound for the proposed single-shot density-difference estimator and show that it achieves the optimal convergence rate. We then show how the proposed density-difference estimator can be utilized in L2 -distance approximation. Finally, we experimentally demonstrate the usefulness of the proposed method in robust distribution comparison such as class-prior estimation and change-point detection.

3 0.73954612 264 nips-2012-Optimal kernel choice for large-scale two-sample tests

Author: Arthur Gretton, Dino Sejdinovic, Heiko Strathmann, Sivaraman Balakrishnan, Massimiliano Pontil, Kenji Fukumizu, Bharath K. Sriperumbudur

Abstract: Given samples from distributions p and q, a two-sample test determines whether to reject the null hypothesis that p = q, based on the value of a test statistic measuring the distance between the samples. One choice of test statistic is the maximum mean discrepancy (MMD), which is a distance between embeddings of the probability distributions in a reproducing kernel Hilbert space. The kernel used in obtaining these embeddings is critical in ensuring the test has high power, and correctly distinguishes unlike distributions with high probability. A means of parameter selection for the two-sample test based on the MMD is proposed. For a given test level (an upper bound on the probability of making a Type I error), the kernel is chosen so as to maximize the test power, and minimize the probability of making a Type II error. The test statistic, test threshold, and optimization over the kernel parameters are obtained with cost linear in the sample size. These properties make the kernel selection and test procedures suited to data streams, where the observations cannot all be stored in memory. In experiments, the new kernel selection approach yields a more powerful test than earlier kernel selection heuristics.

4 0.68246657 175 nips-2012-Learning High-Density Regions for a Generalized Kolmogorov-Smirnov Test in High-Dimensional Data

Author: Assaf Glazer, Michael Lindenbaum, Shaul Markovitch

Abstract: We propose an efficient, generalized, nonparametric, statistical KolmogorovSmirnov test for detecting distributional change in high-dimensional data. To implement the test, we introduce a novel, hierarchical, minimum-volume sets estimator to represent the distributions to be tested. Our work is motivated by the need to detect changes in data streams, and the test is especially efficient in this context. We provide the theoretical foundations of our test and show its superiority over existing methods. 1

5 0.645989 249 nips-2012-Nyström Method vs Random Fourier Features: A Theoretical and Empirical Comparison

Author: Tianbao Yang, Yu-feng Li, Mehrdad Mahdavi, Rong Jin, Zhi-Hua Zhou

Abstract: Both random Fourier features and the Nyström method have been successfully applied to efficient kernel learning. In this work, we investigate the fundamental difference between these two approaches, and how the difference could affect their generalization performances. Unlike approaches based on random Fourier features where the basis functions (i.e., cosine and sine functions) are sampled from a distribution independent from the training data, basis functions used by the Nyström method are randomly sampled from the training examples and are therefore data dependent. By exploring this difference, we show that when there is a large gap in the eigen-spectrum of the kernel matrix, approaches based on the Nyström method can yield impressively better generalization error bound than random Fourier features based approach. We empirically verify our theoretical findings on a wide range of large data sets. 1

6 0.64256203 123 nips-2012-Exponential Concentration for Mutual Information Estimation with Application to Forests

7 0.64079756 144 nips-2012-Gradient-based kernel method for feature extraction and variable selection

8 0.62089288 117 nips-2012-Ensemble weighted kernel estimators for multivariate entropy estimation

9 0.61895096 231 nips-2012-Multiple Operator-valued Kernel Learning

10 0.61451679 188 nips-2012-Learning from Distributions via Support Measure Machines

11 0.60632086 330 nips-2012-Supervised Learning with Similarity Functions

12 0.60191989 145 nips-2012-Gradient Weights help Nonparametric Regressors

13 0.58804494 177 nips-2012-Learning Invariant Representations of Molecules for Atomization Energy Prediction

14 0.58418345 284 nips-2012-Q-MKL: Matrix-induced Regularization in Multi-Kernel Learning with Applications to Neuroimaging

15 0.58122224 338 nips-2012-The Perturbed Variation

16 0.57910883 294 nips-2012-Repulsive Mixtures

17 0.5680657 167 nips-2012-Kernel Hyperalignment

18 0.56326729 222 nips-2012-Multi-Task Averaging

19 0.53335941 2 nips-2012-3D Social Saliency from Head-mounted Cameras

20 0.51336735 343 nips-2012-Tight Bounds on Profile Redundancy and Distinguishability


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.025), (14, 0.311), (21, 0.051), (38, 0.111), (39, 0.022), (42, 0.04), (54, 0.029), (55, 0.033), (74, 0.054), (76, 0.124), (80, 0.064), (92, 0.049)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.70405579 269 nips-2012-Persistent Homology for Learning Densities with Bounded Support

Author: Florian T. Pokorny, Hedvig Kjellström, Danica Kragic, Carl Ek

Abstract: We present a novel method for learning densities with bounded support which enables us to incorporate ‘hard’ topological constraints. In particular, we show how emerging techniques from computational algebraic topology and the notion of persistent homology can be combined with kernel-based methods from machine learning for the purpose of density estimation. The proposed formalism facilitates learning of models with bounded support in a principled way, and – by incorporating persistent homology techniques in our approach – we are able to encode algebraic-topological constraints which are not addressed in current state of the art probabilistic models. We study the behaviour of our method on two synthetic examples for various sample sizes and exemplify the benefits of the proposed approach on a real-world dataset by learning a motion model for a race car. We show how to learn a model which respects the underlying topological structure of the racetrack, constraining the trajectories of the car. 1

2 0.63654602 19 nips-2012-A Spectral Algorithm for Latent Dirichlet Allocation

Author: Anima Anandkumar, Yi-kai Liu, Daniel J. Hsu, Dean P. Foster, Sham M. Kakade

Abstract: Topic modeling is a generalization of clustering that posits that observations (words in a document) are generated by multiple latent factors (topics), as opposed to just one. This increased representational power comes at the cost of a more challenging unsupervised learning problem of estimating the topic-word distributions when only words are observed, and the topics are hidden. This work provides a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of topic models, including Latent Dirichlet Allocation (LDA). For LDA, the procedure correctly recovers both the topic-word distributions and the parameters of the Dirichlet prior over the topic mixtures, using only trigram statistics (i.e., third order moments, which may be estimated with documents containing just three words). The method, called Excess Correlation Analysis, is based on a spectral decomposition of low-order moments via two singular value decompositions (SVDs). Moreover, the algorithm is scalable, since the SVDs are carried out only on k × k matrices, where k is the number of latent factors (topics) and is typically much smaller than the dimension of the observation (word) space. 1

3 0.6339879 254 nips-2012-On the Sample Complexity of Robust PCA

Author: Matthew Coudron, Gilad Lerman

Abstract: We estimate the rate of convergence and sample complexity of a recent robust estimator for a generalized version of the inverse covariance matrix. This estimator is used in a convex algorithm for robust subspace recovery (i.e., robust PCA). Our model assumes a sub-Gaussian underlying distribution and an i.i.d. sample from it. Our main result shows with high probability that the norm of the difference between the generalized inverse covariance of the underlying distribution and its estimator from an i.i.d. sample of size N is of order O(N^(−0.5+ε)) for arbitrarily small ε > 0 (affecting the probabilistic estimate); this rate of convergence is close to the one of direct covariance estimation, i.e., O(N^(−0.5)). Our precise probabilistic estimate implies for some natural settings that the sample complexity of the generalized inverse covariance estimation when using the Frobenius norm is O(D^(2+δ)) for arbitrarily small δ > 0 (whereas the sample complexity of direct covariance estimation with Frobenius norm is O(D^2)). These results provide similar rates of convergence and sample complexity for the corresponding robust subspace recovery algorithm. To the best of our knowledge, this is the only work analyzing the sample complexity of any robust PCA algorithm. 1

4 0.61866003 6 nips-2012-A Convex Formulation for Learning Scale-Free Networks via Submodular Relaxation

Author: Aaron Defazio, Tibério S. Caetano

Abstract: A key problem in statistics and machine learning is the determination of network structure from data. We consider the case where the structure of the graph to be reconstructed is known to be scale-free. We show that in such cases it is natural to formulate structured sparsity inducing priors using submodular functions, and we use their Lovász extension to obtain a convex relaxation. For tractable classes such as Gaussian graphical models, this leads to a convex optimization problem that can be efficiently solved. We show that our method results in an improvement in the accuracy of reconstructed networks for synthetic data. We also show how our prior encourages scale-free reconstructions on a bioinformatics dataset.

5 0.53783023 18 nips-2012-A Simple and Practical Algorithm for Differentially Private Data Release

Author: Moritz Hardt, Katrina Ligett, Frank Mcsherry

Abstract: We present a new algorithm for differentially private data release, based on a simple combination of the Multiplicative Weights update rule with the Exponential Mechanism. Our MWEM algorithm achieves what are the best known and nearly optimal theoretical guarantees, while at the same time being simple to implement and experimentally more accurate on actual data sets than existing techniques. 1

6 0.53710443 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

7 0.53651369 112 nips-2012-Efficient Spike-Coding with Multiplicative Adaptation in a Spike Response Model

8 0.53481746 113 nips-2012-Efficient and direct estimation of a neural subunit model for sensory coding

9 0.53426367 38 nips-2012-Algorithms for Learning Markov Field Policies

10 0.53308678 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

11 0.531353 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes

12 0.53128201 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video

13 0.52969652 260 nips-2012-Online Sum-Product Computation Over Trees

14 0.52965665 148 nips-2012-Hamming Distance Metric Learning

15 0.52943695 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models

16 0.52927852 105 nips-2012-Dynamic Pruning of Factor Graphs for Maximum Marginal Prediction

17 0.52905166 162 nips-2012-Inverse Reinforcement Learning through Structured Classification

18 0.52904302 117 nips-2012-Ensemble weighted kernel estimators for multivariate entropy estimation

19 0.52891648 111 nips-2012-Efficient Sampling for Bipartite Matching Problems

20 0.52851635 302 nips-2012-Scaling MPE Inference for Constrained Continuous Markov Random Fields with Consensus Optimization