nips nips2010 nips2010-283 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Alexandre Bouchard-côté, Michael I. Jordan
Abstract: Since the discovery of sophisticated fully polynomial randomized algorithms for a range of #P problems [1, 2, 3], theoretical work on approximate inference in combinatorial spaces has focused on Markov chain Monte Carlo methods. Despite their strong theoretical guarantees, the slow running time of many of these randomized algorithms and the restrictive assumptions on the potentials have hindered the applicability of these algorithms to machine learning. Because of this, in applications to combinatorial spaces simple exact models are often preferred to more complex models that require approximate inference [4]. Variational inference would appear to provide an appealing alternative, given the success of variational methods for graphical models [5]; unfortunately, however, it is not obvious how to develop variational approximations for combinatorial objects such as matchings, partial orders, plane partitions and sequence alignments. We propose a new framework that extends variational inference to a wide range of combinatorial spaces. Our method is based on a simple assumption: the existence of a tractable measure factorization, which we show holds in many examples. Simulations on a range of matching models show that the algorithm is more general and empirically faster than a popular fully polynomial randomized algorithm. We also apply the framework to the problem of multiple alignment of protein sequences, obtaining state-of-the-art results on the BAliBASE dataset [6]. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Variational Inference over Combinatorial Spaces. Alexandre Bouchard-Côté, Michael I. Jordan. [sent-1, score-0.074]
2 Despite their strong theoretical guarantees, the slow running time of many of these randomized algorithms and the restrictive assumptions on the potentials have hindered the applicability of these algorithms to machine learning. [sent-3, score-0.145]
3 Because of this, in applications to combinatorial spaces simple exact models are often preferred to more complex models that require approximate inference [4]. [sent-4, score-0.43]
4 We propose a new framework that extends variational inference to a wide range of combinatorial spaces. [sent-6, score-0.671]
5 Our method is based on a simple assumption: the existence of a tractable measure factorization, which we show holds in many examples. [sent-7, score-0.144]
6 Simulations on a range of matching models show that the algorithm is more general and empirically faster than a popular fully polynomial randomized algorithm. [sent-8, score-0.171]
7 We also apply the framework to the problem of multiple alignment of protein sequences, obtaining state-of-the-art results on the BAliBASE dataset [6]. [sent-9, score-0.524]
8 This setup subsumes many probabilistic inference and classical combinatorics problems. [sent-11, score-0.176]
9 It is often intractable to compute this sum, so approximations are used. [sent-12, score-0.08]
10 Each C_i is larger than C, but paradoxically it is often possible to find such a decomposition where for each i, ∑_{x∈C_i} f(x) is tractable. [sent-14, score-0.041]
11 This paper describes an effective way of using this type of decomposition to approximate the original sum. [sent-16, score-0.041]
12 Another way of viewing this setup is in terms of exponential families. [sent-17, score-0.215]
13 In this view, described in detail in Section 2, the decomposition becomes a factorization of the base measure. [sent-18, score-0.308]
14 As we will show, the exponential family view gives a principled way of defining variational approximations. [sent-19, score-0.423]
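In symbols, the exponential-family view with a factorized base measure (a condensed restatement of the setup that appears later in this extract, using the paper's own notation; nothing here goes beyond the definitions quoted below) is:

```latex
p_\theta(x) = \exp\{\langle \phi(x), \theta \rangle - A(\theta)\}\,\nu(x),
\qquad
A(\theta) = \log \sum_{x \in \mathcal{X}} \exp\{\langle \phi(x), \theta \rangle\}\,\nu(x),
```
```latex
\nu(x) = \mathbf{1}[x \in C] = \prod_{i=1}^{I} \nu_i(x),
\qquad
\nabla_j A(\theta) = \mathbb{E}_\theta\big[\phi_j(X)\big].
```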
15 In order to make variational approximations tractable in the combinatorial setup, we use what we call an implicit message representation. [sent-20, score-0.825]
16 The canonical parameter space of the exponential family enables such a representation. [sent-21, score-0.193]
17 We also show how additional approximations can be introduced in cases where the factorization has a large number of factors. [sent-22, score-0.25]
18 These further approximations rely on an outer bound of the partition function, and therefore preserve the guarantees of convex variational objective functions. [sent-23, score-0.392]
19 While previous authors have proposed mean field or loopy belief propagation algorithms to approximate the partition function of a few specific combinatorial models—for example [7, 8] for parsing, [sent-24, score-0.566]
20 and [9, 10] for computing the permanent of a matrix—we are not aware of a general treatment of variational inference in combinatorial spaces. (Footnote: the appendices can be found in the supplementary material.) [sent-25, score-0.83]
21 There has been work on applying variational algorithms to the problem of maximization over combinatorial spaces [11, 12, 13, 14], but maximization over combinatorial spaces is rather different than summation. [sent-26, score-1.107]
22 For example, in the bipartite matching example considered in both [13] and this paper, there is a known polynomial algorithm for maximization, but not for summation. [sent-27, score-0.277]
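A toy illustration of this asymmetry (an assumption-laden sketch, not taken from the paper): with edge log-potentials θ on K_{N,N}, the maximization problem is solved in polynomial time by the Hungarian algorithm, while the summation, i.e. the partition function over perfect matchings, is the permanent of exp(θ) and is evaluated below only by brute-force enumeration over permutations.

```python
import itertools
import numpy as np
from scipy.optimize import linear_sum_assignment

theta = np.random.default_rng(1).normal(size=(4, 4))  # edge log-potentials on K_{4,4}

# Maximization: max-weight perfect matching, polynomial time (Hungarian algorithm).
rows, cols = linear_sum_assignment(-theta)  # minimize the negated weights
print("max-weight matching value:", theta[rows, cols].sum())

# Summation: the partition function over perfect matchings is the permanent of exp(theta),
# a #P-hard quantity; here it is evaluated by brute force over all N! permutations.
n = theta.shape[0]
Z = sum(np.exp(theta[np.arange(n), list(perm)].sum())
        for perm in itertools.permutations(range(n)))
print("log partition function:", np.log(Z))
```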
23 Our approach is also related to agreement-based learning [15, 16], although agreement-based learning is defined within the context of unsupervised learning using EM, while our framework is agnostic with respect to parameter estimation. [sent-28, score-0.04]
24 The paper is organized as follows: in Section 2 we present the measure factorization framework; in Section 3 we show examples of this framework applied to various combinatorial inference problems; and in Section 4 we present empirical results. [sent-29, score-0.633]
25 (Section 2: Variational measure factorization.) In this section, we present the variational measure factorization framework. [sent-30, score-0.731]
26 At a high level, the first step is to construct an equivalent but more convenient exponential family. [sent-31, score-0.099]
27 This exponential family will allow us to transform variational algorithms over graphical models into approximation algorithms over combinatorial spaces. [sent-32, score-0.725]
28 We first describe the techniques needed to do this transformation in the case of a specific variational inference algorithm—loopy belief propagation—and then discuss mean-field and tree-reweighted approximations. [sent-33, score-0.436]
29 To make the exposition more concrete, we use the running example of approximating the value and gradient of the log-partition function of a Bipartite Matching model (BM) over K_{N,N}, a well-known #P problem [17]. [sent-34, score-0.097]
30 Unless we mention otherwise, we will consider bipartite perfect matchings; nonbipartite and non-perfect matchings are discussed in Section 3. [sent-35, score-0.374]
31 The reader should keep in mind, however, that our framework is applicable to a much broader class of combinatorial objects. [sent-37, score-0.342]
32 The link between this setup and the general problem of computing ∑_{x∈C} f(x) is the base measure ν, which is set to the indicator function over C: ν(x) = 1[x ∈ C], where 1[·] is equal to one if its argument holds true, and zero otherwise. [sent-42, score-0.274]
33 The goal is to approximate A(θ) and ∇A(θ) (recall that the j-th coordinate of the gradient, ∇_j A, is equal to the expectation of the sufficient statistic φ_j under the exponential family with base measure ν [5]). [sent-43, score-0.423]
34 We want to exploit situations where the base measure can be written as a product of I measures, ν(x) = ∏_{i=1}^{I} ν_i(x), such that each factor ν_i : X → {0, 1} induces a super-partition function assumed to be tractable: A_i(θ) = log ∑_{x∈X} exp{⟨φ(x), θ⟩} ν_i(x). [sent-44, score-0.158]
35 In the case of BM, the space X is a product of N² binary alignment variables, x = (x_{1,1}, x_{1,2}, …, x_{N,N}). [sent-47, score-0.36]
36 In the Standard Bipartite Matching formulation (which we denote by SBM), the sufficient statistic takes the form φ_j(x) = x_{m,n}. [sent-51, score-0.111]
37 The measure factorization we use to enforce the matching property is ν = ν₁ν₂, where ν₁(x) = ∏_{m=1}^{N} 1[∑_{n=1}^{N} x_{m,n} ≤ 1] and ν₂(x) = ∏_{n=1}^{N} 1[∑_{m=1}^{N} x_{m,n} ≤ 1]. (2) [sent-52, score-0.304]
38 We start by constructing an equivalent but more convenient exponential family. [sent-53, score-0.099]
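To make the tractability assumption concrete, the sketch below (not the paper's implementation; storing θ as an N×N matrix of edge parameters is an assumption of this example) evaluates the two super-partition functions for the SBM factorization in equation (2): each A_i decouples over rows (respectively columns) and has a closed form, whereas the exact A(θ) is obtained here only by brute-force enumeration over all binary matrices.

```python
import itertools
import numpy as np

def log_partition_brute_force(theta):
    """Exact A(theta): sum over binary matrices satisfying both matching constraints."""
    n = theta.shape[0]
    total = 0.0
    for bits in itertools.product([0, 1], repeat=n * n):
        x = np.array(bits).reshape(n, n)
        if (x.sum(axis=1) <= 1).all() and (x.sum(axis=0) <= 1).all():
            total += np.exp((theta * x).sum())
    return np.log(total)

def super_partition_rows(theta):
    """A_1(theta): only the row constraint nu_1 is kept, so rows decouple."""
    return np.log1p(np.exp(theta).sum(axis=1)).sum()

def super_partition_cols(theta):
    """A_2(theta): only the column constraint nu_2 is kept, so columns decouple."""
    return np.log1p(np.exp(theta).sum(axis=0)).sum()

theta = np.random.default_rng(0).normal(size=(3, 3))
print(log_partition_brute_force(theta))                           # exact, exponential cost in N^2
print(super_partition_rows(theta), super_partition_cols(theta))   # each tractable, each >= A(theta)
```

In the framework summarized in this extract, it is these tractable per-factor quantities that the variational approximations are built from.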
39 Table 1: Average SP scores in the ref1/test1 directory of BAliBASE. Columns: short, <25% identity; short, 20%–40% identity; short, >35% identity. [sent-54, score-0.174] [sent-74, score-0.043]
41 BPMF-i denotes the average SP of the BPMF algorithm after i iterations of (parallel) message passing. [sent-75, score-0.091]
42 The experimental setup is based on a generative model over noisy observations of bipartite perfect matchings described in Appendix C. [sent-78, score-0.49]
43 We show in Figure 3(c) the results of a sequence of these experiments for different bipartite component sizes N/2. [sent-80, score-0.228]
44 This experiment demonstrates the scalability of sophisticated factorizations, and their superiority over simpler ones. [sent-81, score-0.075]
45 (Section 4.2: Multiple sequence alignment.) To assess the practical significance of this framework, we also apply it to BAliBASE [6], a standard protein multiple sequence alignment benchmark. [sent-83, score-0.976]
46 We compare against Clustal [24], the most popular multiple alignment tool, and ProbCons 1.12, a state-of-the-art system [25] that also relies on enforcing transitivity constraints, but which is not derived via the optimization of an objective function. [sent-86, score-0.408] [sent-87, score-0.118]
48 Our system uses a basic pair HMM [26] to score pairwise alignments. [sent-88, score-0.039]
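For readers unfamiliar with pair HMMs, the sketch below gives the standard three-state forward recursion (match, gap-in-y, gap-in-x) that sums over all pairwise alignments of two sequences; the toy emission and transition parameters are placeholders and are not the ones used by the system described here, and transitions between the two gap states are omitted for simplicity.

```python
import numpy as np
from scipy.special import logsumexp

def pair_hmm_loglik(x, y, log_match, log_gap, log_delta, log_eps):
    """Forward algorithm for a 3-state pair HMM; returns log P(x, y) summed over alignments."""
    n, m = len(x), len(y)
    fM = np.full((n + 1, m + 1), -np.inf)  # state M: x[i-1] aligned to y[j-1]
    fX = np.full((n + 1, m + 1), -np.inf)  # state X: x[i-1] aligned to a gap
    fY = np.full((n + 1, m + 1), -np.inf)  # state Y: y[j-1] aligned to a gap
    fM[0, 0] = 0.0
    stay_M = np.log1p(-2.0 * np.exp(log_delta))  # M -> M
    close_gap = np.log1p(-np.exp(log_eps))       # X -> M or Y -> M
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                fM[i, j] = log_match(x[i - 1], y[j - 1]) + logsumexp(
                    [stay_M + fM[i - 1, j - 1],
                     close_gap + fX[i - 1, j - 1],
                     close_gap + fY[i - 1, j - 1]])
            if i > 0:
                fX[i, j] = log_gap(x[i - 1]) + logsumexp(
                    [log_delta + fM[i - 1, j], log_eps + fX[i - 1, j]])
            if j > 0:
                fY[i, j] = log_gap(y[j - 1]) + logsumexp(
                    [log_delta + fM[i, j - 1], log_eps + fY[i, j - 1]])
    return logsumexp([fM[n, m], fX[n, m], fY[n, m]])

# Toy parameters: higher probability for identical residues, flat gap emissions.
log_match = lambda a, b: np.log(0.7) if a == b else np.log(0.1)
log_gap = lambda a: np.log(0.25)
print(pair_hmm_loglik("GATTACA", "GACTATA", log_match, log_gap, np.log(0.2), np.log(0.4)))
```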
49 The advantage of our system over the other systems is the better optimization technique, based on the measure factorization described in Section 3. [sent-90, score-0.27]
50 We used a standard technique to transform the pairwise alignment marginals into a single valid multiple sequence alignment (see Appendix C). [sent-92, score-0.834]
51 Our system outperformed both baselines after three BPMF parallel message passing iterations. [sent-94, score-0.171]
52 The algorithm converged in all protein groups, and performance was identical after more than three iterations. [sent-95, score-0.076]
53 Although the overall performance gain is not statistically significant according to a Wilcoxon signed-rank test, the larger gains were obtained in the small identity subset, the “twilight zone” where research on multiple sequence alignment has focused. [sent-96, score-0.532]
54 One caveat of this multiple alignment approach is its running time, which is cubic in the length of the longest sequence, while most multiple sequence alignment approaches are quadratic. [sent-97, score-0.968]
55 For example, the running time for one iteration of BPMF in this experiment was 364.[…]s, compared to […].98s for Clustal—this is why we have restricted the experiments to the short sequences section of BAliBASE. [sent-98, score-0.046] [sent-100, score-0.046]
57 Fortunately, several techniques are available to decrease the computational complexity of this algorithm: the transitivity factors can be subsampled using a coarse pass, or along a phylogenetic tree; and computation of the factors can be entirely parallelized. [sent-101, score-0.158]
58 (Section 5: Conclusion.) Computing the moments of discrete exponential families can be difficult for two reasons: the structure of the sufficient statistic, which can create junction trees of high tree-width, and the structure of the base measures, which can induce an intractable combinatorial space. [sent-103, score-0.712]
59 Most previous work on variational approximations has focused on the first difficulty; however, the second challenge also arises frequently in machine learning. [sent-104, score-0.349]
60 In this work, we have presented a framework that fills this gap. [sent-105, score-0.04]
61 It is based on an intuitive notion of measure factorization, which, as we have shown, applies to a variety of combinatorial spaces. [sent-106, score-0.363]
62 This notion enables variational algorithms to be adapted to the combinatorial setting. [sent-107, score-0.571]
63 A polynomial-time approximation algorithm for the permanent of a matrix with non-negative entries. [sent-114, score-0.199]
64 Mixing times of lozenge tiling and card shuffling Markov chains. [sent-117, score-0.043]
65 BAliBASE: A benchmark alignment database for the evaluation of multiple sequence alignment programs. [sent-128, score-0.517]
66 Belief propagation and loop calculus for the permanent of a non-negative matrix. [sent-141, score-0.265]
67 Tree-reweighted belief propagation algorithms and approximate ML estimation by pseudomoment matching. [sent-187, score-0.173]
68 In Proceedings of Uncertainty in Artificial Intelligence, 2009. [sent-192, score-0.074]
69 CLUSTAL: a package for performing multiple sequence alignment on a microcomputer. [sent-202, score-0.474]
simIndex simValue paperId paperTitle
same-paper 1 1.0 283 nips-2010-Variational Inference over Combinatorial Spaces
2 0.10905297 118 nips-2010-Implicit Differentiation by Perturbation
Author: Justin Domke
Abstract: This paper proposes a simple and efficient finite difference method for implicit differentiation of marginal inference results in discrete graphical models. Given an arbitrary loss function, defined on marginals, we show that the derivatives of this loss with respect to model parameters can be obtained by running the inference procedure twice, on slightly perturbed model parameters. This method can be used with approximate inference, with a loss function over approximate marginals. Convenient choices of loss functions make it practical to fit graphical models with hidden variables, high treewidth and/or model misspecification. 1
3 0.10652796 97 nips-2010-Functional Geometry Alignment and Localization of Brain Areas
Author: Georg Langs, Yanmei Tie, Laura Rigolo, Alexandra Golby, Polina Golland
Abstract: Matching functional brain regions across individuals is a challenging task, largely due to the variability in their location and extent. It is particularly difficult, but highly relevant, for patients with pathologies such as brain tumors, which can cause substantial reorganization of functional systems. In such cases spatial registration based on anatomical data is only of limited value if the goal is to establish correspondences of functional areas among different individuals, or to localize potentially displaced active regions. Rather than rely on spatial alignment, we propose to perform registration in an alternative space whose geometry is governed by the functional interaction patterns in the brain. We first embed each brain into a functional map that reflects connectivity patterns during a fMRI experiment. The resulting functional maps are then registered, and the obtained correspondences are propagated back to the two brains. In application to a language fMRI experiment, our preliminary results suggest that the proposed method yields improved functional correspondences across subjects. This advantage is pronounced for subjects with tumors that affect the language areas and thus cause spatial reorganization of the functional regions. 1
4 0.097070359 33 nips-2010-Approximate inference in continuous time Gaussian-Jump processes
Author: Manfred Opper, Andreas Ruttor, Guido Sanguinetti
Abstract: We present a novel approach to inference in conditionally Gaussian continuous time stochastic processes, where the latent process is a Markovian jump process. We first consider the case of jump-diffusion processes, where the drift of a linear stochastic differential equation can jump at arbitrary time points. We derive partial differential equations for exact inference and present a very efficient mean field approximation. By introducing a novel lower bound on the free energy, we then generalise our approach to Gaussian processes with arbitrary covariance, such as the non-Markovian RBF covariance. We present results on both simulated and real data, showing that the approach is very accurate in capturing latent dynamics and can be useful in a number of real data modelling tasks.
5 0.092435859 268 nips-2010-The Neural Costs of Optimal Control
Author: Samuel Gershman, Robert Wilson
Abstract: Optimal control entails combining probabilities and utilities. However, for most practical problems, probability densities can be represented only approximately. Choosing an approximation requires balancing the benefits of an accurate approximation against the costs of computing it. We propose a variational framework for achieving this balance and apply it to the problem of how a neural population code should optimally represent a distribution under resource constraints. The essence of our analysis is the conjecture that population codes are organized to maximize a lower bound on the log expected utility. This theory can account for a plethora of experimental data, including the reward-modulation of sensory receptive fields, GABAergic effects on saccadic movements, and risk aversion in decisions under uncertainty. 1
6 0.088191092 55 nips-2010-Cross Species Expression Analysis using a Dirichlet Process Mixture Model with Latent Matchings
7 0.082510673 194 nips-2010-Online Learning for Latent Dirichlet Allocation
8 0.075496957 61 nips-2010-Direct Loss Minimization for Structured Prediction
9 0.069508284 288 nips-2010-Worst-case bounds on the quality of max-product fixed-points
10 0.069253325 263 nips-2010-Switching state space model for simultaneously estimating state transitions and nonstationary firing rates
11 0.06825427 90 nips-2010-Fast Large-scale Mixture Modeling with Component-specific Data Partitions
12 0.067963317 193 nips-2010-Online Learning: Random Averages, Combinatorial Parameters, and Learnability
13 0.066790283 32 nips-2010-Approximate Inference by Compilation to Arithmetic Circuits
14 0.066755399 165 nips-2010-MAP estimation in Binary MRFs via Bipartite Multi-cuts
15 0.066421106 164 nips-2010-MAP Estimation for Graphical Models by Likelihood Maximization
16 0.065987736 284 nips-2010-Variational bounds for mixed-data factor analysis
17 0.06499932 13 nips-2010-A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction
18 0.063750498 144 nips-2010-Learning Efficient Markov Networks
19 0.060683101 257 nips-2010-Structured Determinantal Point Processes
20 0.060284376 239 nips-2010-Sidestepping Intractable Inference with Structured Ensemble Cascades
simIndex simValue paperId paperTitle
same-paper 1 0.95667362 283 nips-2010-Variational Inference over Combinatorial Spaces
2 0.62867367 90 nips-2010-Fast Large-scale Mixture Modeling with Component-specific Data Partitions
Author: Bo Thiesson, Chong Wang
Abstract: Remarkably easy implementation and guaranteed convergence has made the EM algorithm one of the most used algorithms for mixture modeling. On the downside, the E-step is linear in both the sample size and the number of mixture components, making it impractical for large-scale data. Based on the variational EM framework, we propose a fast alternative that uses component-specific data partitions to obtain a sub-linear E-step in sample size, while the algorithm still maintains provable convergence. Our approach builds on previous work, but is significantly faster and scales much better in the number of mixture components. We demonstrate this speedup by experiments on large-scale synthetic and real data. 1
3 0.61219561 33 nips-2010-Approximate inference in continuous time Gaussian-Jump processes
4 0.60752016 118 nips-2010-Implicit Differentiation by Perturbation
5 0.5601244 32 nips-2010-Approximate Inference by Compilation to Arithmetic Circuits
Author: Daniel Lowd, Pedro Domingos
Abstract: Arithmetic circuits (ACs) exploit context-specific independence and determinism to allow exact inference even in networks with high treewidth. In this paper, we introduce the first ever approximate inference methods using ACs, for domains where exact inference remains intractable. We propose and evaluate a variety of techniques based on exact compilation, forward sampling, AC structure learning, Markov network parameter learning, variational inference, and Gibbs sampling. In experiments on eight challenging real-world domains, we find that the methods based on sampling and learning work best: one such method (AC2 -F) is faster and usually more accurate than loopy belief propagation, mean field, and Gibbs sampling; another (AC2 -G) has a running time similar to Gibbs sampling but is consistently more accurate than all baselines. 1
6 0.55189204 106 nips-2010-Global Analytic Solution for Variational Bayesian Matrix Factorization
8 0.48908553 144 nips-2010-Learning Efficient Markov Networks
9 0.47611231 284 nips-2010-Variational bounds for mixed-data factor analysis
10 0.46679938 164 nips-2010-MAP Estimation for Graphical Models by Likelihood Maximization
11 0.46581945 35 nips-2010-Auto-Regressive HMM Inference with Incomplete Data for Short-Horizon Wind Forecasting
12 0.46052498 13 nips-2010-A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction
13 0.45667249 84 nips-2010-Exact inference and learning for cumulative distribution functions on loopy graphs
14 0.45582214 126 nips-2010-Inference with Multivariate Heavy-Tails in Linear Models
15 0.44902009 159 nips-2010-Lifted Inference Seen from the Other Side : The Tractable Features
16 0.43811581 226 nips-2010-Repeated Games against Budgeted Adversaries
17 0.42551762 239 nips-2010-Sidestepping Intractable Inference with Structured Ensemble Cascades
18 0.42479447 11 nips-2010-A POMDP Extension with Belief-dependent Rewards
19 0.41891319 251 nips-2010-Sphere Embedding: An Application to Part-of-Speech Induction
20 0.41871265 194 nips-2010-Online Learning for Latent Dirichlet Allocation
topicId topicWeight
[(13, 0.021), (17, 0.019), (27, 0.02), (30, 0.629), (45, 0.146), (50, 0.027), (52, 0.018), (60, 0.018), (77, 0.02)]
simIndex simValue paperId paperTitle
1 0.95586711 232 nips-2010-Sample Complexity of Testing the Manifold Hypothesis
Author: Hariharan Narayanan, Sanjoy Mitter
Abstract: The hypothesis that high dimensional data tends to lie in the vicinity of a low dimensional manifold is the basis of a collection of methodologies termed Manifold Learning. In this paper, we study statistical aspects of the question of fitting a manifold with a nearly optimal least squared error. Given upper bounds on the dimension, volume, and curvature, we show that Empirical Risk Minimization can produce a nearly optimal manifold using a number of random samples that is independent of the ambient dimension of the space in which data lie. We obtain an upper bound on the required number of samples that depends polynomially on the curvature, exponentially on the intrinsic dimension, and linearly on the intrinsic volume. For constant error, we prove a matching minimax lower bound on the sample complexity that shows that this dependence on intrinsic dimension, volume and curvature is unavoidable. Whether the known lower bound of O(k/ε² + log(1/δ)/ε²) for the sample complexity of Empirical Risk minimization on k-means applied to data in a unit ball of arbitrary dimension is tight, has been an open question since 1997 [3]. Here ε is the desired bound on the error and δ is a bound on the probability of failure. We improve the best currently known upper bound [14] of O(k²/ε² + log(1/δ)/ε²) to O((k/ε²)·min(k, log⁴(k)/ε²) + log(1/δ)/ε²). Based on these results, we devise a simple algorithm for k-means and another that uses a family of convex programs to fit a piecewise linear curve of a specified length to high dimensional data, where the sample complexity is independent of the ambient dimension. 1
2 0.93214095 264 nips-2010-Synergies in learning words and their referents
Author: Mark Johnson, Katherine Demuth, Bevan Jones, Michael J. Black
Abstract: This paper presents Bayesian non-parametric models that simultaneously learn to segment words from phoneme strings and learn the referents of some of those words, and shows that there is a synergistic interaction in the acquisition of these two kinds of linguistic information. The models themselves are novel kinds of Adaptor Grammars that are an extension of an embedding of topic models into PCFGs. These models simultaneously segment phoneme sequences into words and learn the relationship between non-linguistic objects to the words that refer to them. We show (i) that modelling inter-word dependencies not only improves the accuracy of the word segmentation but also of word-object relationships, and (ii) that a model that simultaneously learns word-object relationships and word segmentation segments more accurately than one that just learns word segmentation on its own. We argue that these results support an interactive view of language acquisition that can take advantage of synergies such as these. 1
same-paper 3 0.89001596 283 nips-2010-Variational Inference over Combinatorial Spaces
4 0.78557998 58 nips-2010-Decomposing Isotonic Regression for Efficiently Solving Large Problems
Author: Ronny Luss, Saharon Rosset, Moni Shahar
Abstract: A new algorithm for isotonic regression is presented based on recursively partitioning the solution space. We develop efficient methods for each partitioning subproblem through an equivalent representation as a network flow problem, and prove that this sequence of partitions converges to the global solution. These network flow problems can further be decomposed in order to solve very large problems. Success of isotonic regression in prediction and our algorithm’s favorable computational properties are demonstrated through simulated examples as large as 2 × 105 variables and 107 constraints.
5 0.75314885 40 nips-2010-Beyond Actions: Discriminative Models for Contextual Group Activities
Author: Tian Lan, Yang Wang, Weilong Yang, Greg Mori
Abstract: We propose a discriminative model for recognizing group activities. Our model jointly captures the group activity, the individual person actions, and the interactions among them. Two new types of contextual information, group-person interaction and person-person interaction, are explored in a latent variable framework. Different from most of the previous latent structured models which assume a predefined structure for the hidden layer, e.g. a tree structure, we treat the structure of the hidden layer as a latent variable and implicitly infer it during learning and inference. Our experimental results demonstrate that by inferring this contextual information together with adaptive structures, the proposed model can significantly improve activity recognition performance. 1
6 0.7133652 270 nips-2010-Tight Sample Complexity of Large-Margin Learning
7 0.6667341 220 nips-2010-Random Projection Trees Revisited
8 0.58774769 222 nips-2010-Random Walk Approach to Regret Minimization
9 0.58350337 75 nips-2010-Empirical Risk Minimization with Approximations of Probabilistic Grammars
10 0.55589134 193 nips-2010-Online Learning: Random Averages, Combinatorial Parameters, and Learnability
11 0.55494326 88 nips-2010-Extensions of Generalized Binary Search to Group Identification and Exponential Costs
12 0.54347062 285 nips-2010-Why are some word orders more common than others? A uniform information density account
13 0.54062235 260 nips-2010-Sufficient Conditions for Generating Group Level Sparsity in a Robust Minimax Framework
14 0.53091359 221 nips-2010-Random Projections for $k$-means Clustering
15 0.52406794 274 nips-2010-Trading off Mistakes and Don't-Know Predictions
16 0.52195388 155 nips-2010-Learning the context of a category
17 0.51700121 163 nips-2010-Lower Bounds on Rate of Convergence of Cutting Plane Methods
18 0.51308012 233 nips-2010-Scrambled Objects for Least-Squares Regression
19 0.50910652 288 nips-2010-Worst-case bounds on the quality of max-product fixed-points
20 0.50804603 173 nips-2010-Multi-View Active Learning in the Non-Realizable Case