
283 nips-2010-Variational Inference over Combinatorial Spaces


Source: pdf

Author: Alexandre Bouchard-Côté, Michael I. Jordan

Abstract: Since the discovery of sophisticated fully polynomial randomized algorithms for a range of #P problems [1, 2, 3], theoretical work on approximate inference in combinatorial spaces has focused on Markov chain Monte Carlo methods. Despite their strong theoretical guarantees, the slow running time of many of these randomized algorithms and the restrictive assumptions on the potentials have hindered the applicability of these algorithms to machine learning. Because of this, in applications to combinatorial spaces simple exact models are often preferred to more complex models that require approximate inference [4]. Variational inference would appear to provide an appealing alternative, given the success of variational methods for graphical models [5]; unfortunately, however, it is not obvious how to develop variational approximations for combinatorial objects such as matchings, partial orders, plane partitions and sequence alignments. We propose a new framework that extends variational inference to a wide range of combinatorial spaces. Our method is based on a simple assumption: the existence of a tractable measure factorization, which we show holds in many examples. Simulations on a range of matching models show that the algorithm is more general and empirically faster than a popular fully polynomial randomized algorithm. We also apply the framework to the problem of multiple alignment of protein sequences, obtaining state-of-the-art results on the BAliBASE dataset [6]. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Variational Inference over Combinatorial Spaces∗ Alexandre Bouchard-Côté∗ Michael I. Jordan [sent-1, score-0.074]

2 Despite their strong theoretical guarantees, the slow running time of many of these randomized algorithms and the restrictive assumptions on the potentials have hindered the applicability of these algorithms to machine learning. [sent-3, score-0.145]

3 Because of this, in applications to combinatorial spaces simple exact models are often preferred to more complex models that require approximate inference [4]. [sent-4, score-0.43]

4 We propose a new framework that extends variational inference to a wide range of combinatorial spaces. [sent-6, score-0.671]

5 Our method is based on a simple assumption: the existence of a tractable measure factorization, which we show holds in many examples. [sent-7, score-0.144]

6 Simulations on a range of matching models show that the algorithm is more general and empirically faster than a popular fully polynomial randomized algorithm. [sent-8, score-0.171]

7 We also apply the framework to the problem of multiple alignment of protein sequences, obtaining state-of-the-art results on the BAliBASE dataset [6]. [sent-9, score-0.524]

8 This setup subsumes many probabilistic inference and classical combinatorics problems. [sent-11, score-0.176]

9 It is often intractable to compute this sum, so approximations are used. [sent-12, score-0.08]

10 Each C_i is larger than C, but paradoxically it is often possible to find such a decomposition where for each i, ∑_{x∈C_i} f(x) is tractable. [sent-14, score-0.041]

11 This paper describes an effective way of using this type of decomposition to approximate the original sum. [sent-16, score-0.041]
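
As a concrete toy illustration of such a decomposition (a sketch of our own, not code from the paper), consider the bipartite matching example used later in this summary: the target set C of matchings is the intersection of two larger but individually simpler sets, one enforcing only the row constraints and one only the column constraints. All names below are ours, and the brute-force enumeration is only there to make the sets visible on a tiny instance.

```python
# Toy illustration (ours): a combinatorial set C written as the intersection of
# larger sets C_1 and C_2. Here X = {0,1}^(N x N), C_1 = "at most one 1 per row",
# C_2 = "at most one 1 per column", and C = C_1 ∩ C_2 is the set of matchings.
import itertools

N = 4

def all_x():
    """Enumerate every binary N x N matrix, i.e. the whole space X."""
    for bits in itertools.product([0, 1], repeat=N * N):
        yield [bits[m * N:(m + 1) * N] for m in range(N)]

def in_C1(x):  # row constraints only
    return all(sum(row) <= 1 for row in x)

def in_C2(x):  # column constraints only
    return all(sum(col) <= 1 for col in zip(*x))

def in_C(x):   # both at once: the matchings we actually care about
    return in_C1(x) and in_C2(x)

print(sum(map(in_C1, all_x())))  # |C_1| = (N + 1)^N = 625
print(sum(map(in_C2, all_x())))  # |C_2| = (N + 1)^N = 625
print(sum(map(in_C,  all_x())))  # |C|   = 209: each C_i is much larger than C
```

Already at N = 4 each relaxed set has 625 elements against 209 matchings, yet sums of the form ∑_{x∈C_i} f(x) remain cheap because the relaxed constraints decouple; a sketch of that computation appears after equation (2) below.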

12 Another way of viewing this setup is in terms of exponential families. [sent-17, score-0.215]

13 In this view, described in detail in Section 2, the decomposition becomes a factorization of the base measure. [sent-18, score-0.308]

14 As we will show, the exponential family view gives a principled way of defining variational approximations. [sent-19, score-0.423]

15 In order to make variational approximations tractable in the combinatorial setup, we use what we call an implicit message representation. [sent-20, score-0.825]

16 The canonical parameter space of the exponential family enables such a representation. [sent-21, score-0.193]

17 We also show how additional approximations can be introduced in cases where the factorization has a large number of factors. [sent-22, score-0.25]

18 These further approximations rely on an outer bound of the partition function, and therefore preserve the guarantees of convex variational objective functions. [sent-23, score-0.392]

19 While previous authors have proposed mean field or loopy belief propagation algorithms to approximate the partition function of a few specific combinatorial models—for example [7, 8] for parsing, [sent-24, score-0.566]

20 and [9, 10] for computing the permanent of a matrix—we are not aware of a general treatment of variational inference in combinatorial spaces. (The appendices can be found in the supplementary material.) [sent-25, score-0.83]

21 There has been work on applying variational algorithms to the problem of maximization over combinatorial spaces [11, 12, 13, 14], but maximization over combinatorial spaces is rather different than summation. [sent-26, score-1.107]

22 For example, in the bipartite matching example considered in both [13] and this paper, there is a known polynomial algorithm for maximization, but not for summation. [sent-27, score-0.277]

23 Our approach is also related to agreement-based learning [15, 16], although agreement-based learning is defined within the context of unsupervised learning using EM, while our framework is agnostic with respect to parameter estimation. [sent-28, score-0.04]

24 The paper is organized as follows: in Section 2 we present the measure factorization framework; in Section 3 we show examples of this framework applied to various combinatorial inference problems; and in Section 4 we present empirical results. [sent-29, score-0.633]

25 2 Variational measure factorization In this section, we present the variational measure factorization framework. [sent-30, score-0.731]

26 At a high level, the first step is to construct an equivalent but more convenient exponential family. [sent-31, score-0.099]

27 This exponential family will allow us to transform variational algorithms over graphical models into approximation algorithms over combinatorial spaces. [sent-32, score-0.725]

28 We first describe the techniques needed to do this transformation in the case of a specific variational inference algorithm—loopy belief propagation—and then discuss mean-field and tree-reweighted approximations. [sent-33, score-0.436]

29 To make the exposition more concrete, we use the running example of approximating the value and gradient of the log-partition function of a Bipartite Matching model (BM) over K_{N,N}, a well-known #P problem [17]. [sent-34, score-0.097]
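
For perfect bipartite matchings, this log-partition function is exactly the log of the permanent of the matrix with entries e^{θ_{m,n}}, which is one way to see why computing it exactly is #P-hard [17]. The check below is our own illustrative sketch (the example weights and all names are ours, and the enumeration is only feasible for tiny N), not code from the paper.

```python
# Sketch (ours): sum_{x in C} exp(<phi(x), theta>) over perfect matchings of
# K_{N,N} equals the permanent of W with W[m][n] = exp(theta[m][n]).
import itertools
import math

def log_Z_perfect_matchings(theta):
    """Brute force over the N! perfect matchings (perm[m] = column matched to row m)."""
    n = len(theta)
    return math.log(sum(math.exp(sum(theta[m][perm[m]] for m in range(n)))
                        for perm in itertools.permutations(range(n))))

def permanent(w):
    """Naive O(N! * N) permanent, good enough for a tiny sanity check."""
    n = len(w)
    return sum(math.prod(w[m][perm[m]] for m in range(n))
               for perm in itertools.permutations(range(n)))

theta = [[0.1, -0.4, 0.7], [0.0, 0.5, -0.2], [0.3, 0.2, 0.1]]  # arbitrary example weights
W = [[math.exp(t) for t in row] for row in theta]
print(log_Z_perfect_matchings(theta))  # these two numbers coincide
print(math.log(permanent(W)))
```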

30 Unless we mention otherwise, we will consider bipartite perfect matchings; nonbipartite and non-perfect matchings are discussed in Section 3. [sent-35, score-0.374]

31 The reader should keep in mind, however, that our framework is applicable to a much broader class of combinatorial objects. [sent-37, score-0.342]

32 The link between this setup and the general problem of computing ∑_{x∈C} f(x) is the base measure ν, which is set to the indicator function over C: ν(x) = 1[x ∈ C], where 1[·] is equal to one if its argument holds true, and zero otherwise. [sent-42, score-0.274]

33 The goal is to approximate A(θ) and ∇A(θ) (recall that the j-th coordinate of the gradient, ∇_j A, is equal to the expectation of the sufficient statistic φ_j under the exponential family with base measure ν [5]). [sent-43, score-0.423]
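
The identity ∇_j A(θ) = E[φ_j(X)], i.e. that each gradient coordinate is a marginal, can be sanity-checked by brute force on a tiny instance. The sketch below is our own illustration, not the paper's code; the instance size, weights and names are ours.

```python
# Sketch (ours): compare a finite-difference estimate of the j-th gradient
# coordinate of A(theta) with the exact marginal E[phi_j(x)] = E[x_j] under
# p(x) proportional to exp(<phi(x), theta>) * nu(x), nu = matching indicator.
import itertools
import math

N = 3
theta = [0.2 * j - 0.5 for j in range(N * N)]  # flattened parameters, phi_j(x) = x_j

def nu(x):
    """Base measure: 1 if x (flattened N x N) has at most one 1 per row and column."""
    rows_ok = all(sum(x[m * N:(m + 1) * N]) <= 1 for m in range(N))
    cols_ok = all(sum(x[m * N + n] for m in range(N)) <= 1 for n in range(N))
    return 1.0 if rows_ok and cols_ok else 0.0

def A(th):
    return math.log(sum(nu(x) * math.exp(sum(t * xi for t, xi in zip(th, x)))
                        for x in itertools.product([0, 1], repeat=N * N)))

def marginal(th, j):
    Z = math.exp(A(th))
    return sum(nu(x) * math.exp(sum(t * xi for t, xi in zip(th, x))) * x[j]
               for x in itertools.product([0, 1], repeat=N * N)) / Z

j, eps = 4, 1e-5
th_plus, th_minus = theta[:], theta[:]
th_plus[j] += eps
th_minus[j] -= eps
print((A(th_plus) - A(th_minus)) / (2 * eps))  # finite-difference gradient coordinate
print(marginal(theta, j))                      # expectation of phi_j: the two agree
```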

34 We want to exploit situations where the base measure can be written as a product of I measures, ν(x) = ∏_{i=1}^{I} ν_i(x), such that each factor ν_i : X → {0, 1} induces a super-partition function assumed to be tractable: A_i(θ) = log ∑_{x∈X} exp{⟨φ(x), θ⟩} ν_i(x). [sent-44, score-0.158]

35 In the case of BM, the space X is a product of N² binary alignment variables, x = (x_{1,1}, x_{1,2}, …, x_{N,N}). [sent-47, score-0.36]

36 In the Standard Bipartite Matching formulation (which we denote by SBM), the sufficient statistic takes the form φ_j(x) = x_{m,n}. [sent-51, score-0.111]

37 The measure factorization we use to enforce the matching property is ν = ν_1 ν_2, where: ν_1(x) = ∏_{m=1}^{N} 1[∑_{n=1}^{N} x_{m,n} ≤ 1], ν_2(x) = ∏_{n=1}^{N} 1[∑_{m=1}^{N} x_{m,n} ≤ 1]. (2) [sent-52, score-0.304]

38 We start by constructing an equivalent but more convenient exponential family. [sent-53, score-0.099]
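
Given the factorization in (2), each super-partition function has a closed form: under ν_1 alone the rows decouple, so A_1(θ) = ∑_m log(1 + ∑_n e^{θ_{m,n}}), and symmetrically for A_2 over columns. The sketch below is our own illustration (it assumes exactly the ≤ 1 row and column constraints written above; names are ours) and verifies the closed forms against brute-force enumeration on a tiny instance.

```python
# Sketch (ours): closed-form super-partition functions A_1, A_2 for the
# factorization nu = nu_1 * nu_2 of equation (2), checked by brute force.
import itertools
import math

def A1(theta):  # rows independent: each row is either all-zero or has one 1
    n = len(theta)
    return sum(math.log(1.0 + sum(math.exp(theta[m][j]) for j in range(n))) for m in range(n))

def A2(theta):  # columns independent, by symmetry
    n = len(theta)
    return sum(math.log(1.0 + sum(math.exp(theta[i][j]) for i in range(n))) for j in range(n))

def A_i_brute(theta, nu_i):
    """log sum over ALL of X = {0,1}^(N x N) of exp(<phi(x), theta>) * nu_i(x)."""
    n = len(theta)
    total = 0.0
    for bits in itertools.product([0, 1], repeat=n * n):
        x = [bits[m * n:(m + 1) * n] for m in range(n)]
        if nu_i(x):
            total += math.exp(sum(theta[m][j] * x[m][j] for m in range(n) for j in range(n)))
    return math.log(total)

theta = [[0.4, -0.1, 0.2], [0.0, 0.3, -0.5], [0.6, 0.1, 0.0]]  # arbitrary example weights
nu1 = lambda x: all(sum(row) <= 1 for row in x)          # at most one 1 per row
nu2 = lambda x: all(sum(col) <= 1 for col in zip(*x))    # at most one 1 per column
print(A1(theta), A_i_brute(theta, nu1))  # agree
print(A2(theta), A_i_brute(theta, nu2))  # agree
```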

39 [Residue of Table 1; the recoverable column headers are: short, < 25% identity; short, 20%-40% identity; short, > 35% identity. Table values omitted.] [sent-54, score-0.174]

40 Table 1: Average SP scores in the ref1/test1 directory of BAliBASE. [sent-74, score-0.043]

41 BPMF-i denotes the average SP of the BPMF algorithm after i iterations of (parallel) message passing. [sent-75, score-0.091]

42 The experimental setup is based on a generative model over noisy observations of bipartite perfect matchings described in Appendix C. [sent-78, score-0.49]

43 We show in Figure 3(c) the results of a sequence of these experiments for different bipartite component sizes N/2. [sent-80, score-0.228]

44 These experiments demonstrate the scalability of sophisticated factorizations, and their superiority over simpler ones. [sent-81, score-0.075]

45 Multiple sequence alignment. To assess the practical significance of this framework, we also apply it to BAliBASE [6], a standard protein multiple sequence alignment benchmark. [sent-83, score-0.976]

46 We compare against Clustal [24], the most popular multiple alignment tool, and ProbCons 1.12, [sent-86, score-0.408]

47 a state-of-the-art system [25] that also relies on enforcing transitivity constraints, but which is not derived via the optimization of an objective function. [sent-87, score-0.118]

48 Our system uses a basic pair HMM [26] to score pairwise alignments. [sent-88, score-0.039]

49 The advantage of our system over the other systems is the better optimization technique, based on the measure factorization described in Section 3. [sent-90, score-0.27]

50 We used a standard technique to transform the pairwise alignment marginals into a single valid multiple sequence alignment (see Appendix C). [sent-92, score-0.834]

51 Our system outperformed both baselines after three BPMF parallel message passing iterations. [sent-94, score-0.171]

52 The algorithm converged in all protein groups, and performance was identical after more than three iterations. [sent-95, score-0.076]

53 Although the overall performance gain is not statistically significant according to a Wilcoxon signed-rank test, the larger gains were obtained in the small identity subset, the “twilight zone” where research on multiple sequence alignment has focused. [sent-96, score-0.532]

54 One caveat of this multiple alignment approach is its running time, which is cubic in the length of the longest sequence, while most multiple sequence alignment approaches are quadratic. [sent-97, score-0.968]

55 For example, the running time for one iteration of BPMF in this experiment was 364. [sent-98, score-0.046]

56 98s for Clustal—this is why we have restricted the experiments to the short sequences section of BAliBASE. [sent-100, score-0.046]

57 Fortunately, several techniques are available to decrease the computational complexity of this algorithm: the transitivity factors can be subsampled using a coarse pass, or along a phylogenetic tree; and computation of the factors can be entirely parallelized. [sent-101, score-0.158]

58 Conclusion. Computing the moments of discrete exponential families can be difficult for two reasons: the structure of the sufficient statistic that can create junction trees of high tree-width, and the structure of the base measures that can induce an intractable combinatorial space. [sent-103, score-0.712]

59 Most previous work on variational approximations has focused on the first difficulty; however, the second challenge also arises frequently in machine learning. [sent-104, score-0.349]

60 In this work, we have presented a framework that fills this gap. [sent-105, score-0.04]

61 It is based on an intuitive notion of measure factorization, which, as we have shown, applies to a variety of combinatorial spaces. [sent-106, score-0.363]

62 This notion enables variational algorithms to be adapted to the combinatorial setting. [sent-107, score-0.571]

63 A polynomial-time approximation algorithm for the permanent of a matrix with non-negative entries. [sent-114, score-0.199]

64 Mixing times of lozenge tiling and card shuffling Markov chains. [sent-117, score-0.043]

65 BAliBASE: A benchmark alignments database for the evaluation of multiple sequence alignment programs. [sent-128, score-0.517]

66 Belief propagation and loop calculus for the permanent of a non-negative matrix. [sent-141, score-0.265]

67 Tree-reweighted belief propagation algorithms and approximate ML estimation by pseudomoment matching. [sent-187, score-0.173]

68 In Proceedings of Uncertainty in Artificial Intelligence, 2009. [sent-192, score-0.074]

69 CLUSTAL: a package for performing multiple sequence alignment on a microcomputer. [sent-202, score-0.474]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('alignment', 0.36), ('combinatorial', 0.302), ('variational', 0.269), ('permanent', 0.199), ('clustal', 0.197), ('matchings', 0.173), ('factorization', 0.17), ('bipartite', 0.162), ('balibase', 0.148), ('bpmf', 0.148), ('dan', 0.132), ('setup', 0.116), ('statistic', 0.111), ('belief', 0.107), ('exponential', 0.099), ('probcons', 0.099), ('base', 0.097), ('emnlp', 0.097), ('parsing', 0.097), ('michael', 0.095), ('message', 0.091), ('ci', 0.087), ('mol', 0.086), ('ben', 0.083), ('tractable', 0.083), ('approximations', 0.08), ('transitivity', 0.079), ('phylogenetic', 0.079), ('percy', 0.079), ('protein', 0.076), ('sp', 0.074), ('oe', 0.074), ('matching', 0.073), ('alexandre', 0.071), ('david', 0.069), ('spaces', 0.068), ('appendix', 0.068), ('propagation', 0.066), ('sequence', 0.066), ('klein', 0.065), ('measure', 0.061), ('inference', 0.06), ('bm', 0.059), ('taskar', 0.058), ('identity', 0.058), ('north', 0.057), ('liang', 0.057), ('randomized', 0.056), ('family', 0.055), ('daphne', 0.054), ('moments', 0.054), ('approximating', 0.051), ('linguistics', 0.05), ('families', 0.049), ('maximization', 0.049), ('loopy', 0.048), ('multiple', 0.048), ('short', 0.046), ('running', 0.046), ('dp', 0.046), ('partition', 0.043), ('yusuke', 0.043), ('hindered', 0.043), ('brudno', 0.043), ('sbm', 0.043), ('hbm', 0.043), ('thompson', 0.043), ('card', 0.043), ('chuong', 0.043), ('sera', 0.043), ('ashish', 0.043), ('directory', 0.043), ('siepel', 0.043), ('alignments', 0.043), ('watanabe', 0.043), ('eld', 0.043), ('polynomial', 0.042), ('martin', 0.042), ('passing', 0.041), ('decomposition', 0.041), ('wainwright', 0.04), ('wilcoxon', 0.04), ('shuf', 0.04), ('mutation', 0.04), ('zone', 0.04), ('caveat', 0.04), ('carsten', 0.04), ('bert', 0.04), ('graham', 0.04), ('bart', 0.04), ('framework', 0.04), ('canonical', 0.039), ('perfect', 0.039), ('system', 0.039), ('sophisticated', 0.038), ('peterson', 0.037), ('enumeration', 0.037), ('yedidia', 0.037), ('conductance', 0.037), ('superiority', 0.037)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 283 nips-2010-Variational Inference over Combinatorial Spaces

Author: Alexandre Bouchard-Côté, Michael I. Jordan

Abstract: Since the discovery of sophisticated fully polynomial randomized algorithms for a range of #P problems [1, 2, 3], theoretical work on approximate inference in combinatorial spaces has focused on Markov chain Monte Carlo methods. Despite their strong theoretical guarantees, the slow running time of many of these randomized algorithms and the restrictive assumptions on the potentials have hindered the applicability of these algorithms to machine learning. Because of this, in applications to combinatorial spaces simple exact models are often preferred to more complex models that require approximate inference [4]. Variational inference would appear to provide an appealing alternative, given the success of variational methods for graphical models [5]; unfortunately, however, it is not obvious how to develop variational approximations for combinatorial objects such as matchings, partial orders, plane partitions and sequence alignments. We propose a new framework that extends variational inference to a wide range of combinatorial spaces. Our method is based on a simple assumption: the existence of a tractable measure factorization, which we show holds in many examples. Simulations on a range of matching models show that the algorithm is more general and empirically faster than a popular fully polynomial randomized algorithm. We also apply the framework to the problem of multiple alignment of protein sequences, obtaining state-of-the-art results on the BAliBASE dataset [6]. 1

2 0.10905297 118 nips-2010-Implicit Differentiation by Perturbation

Author: Justin Domke

Abstract: This paper proposes a simple and efficient finite difference method for implicit differentiation of marginal inference results in discrete graphical models. Given an arbitrary loss function, defined on marginals, we show that the derivatives of this loss with respect to model parameters can be obtained by running the inference procedure twice, on slightly perturbed model parameters. This method can be used with approximate inference, with a loss function over approximate marginals. Convenient choices of loss functions make it practical to fit graphical models with hidden variables, high treewidth and/or model misspecification. 1

3 0.10652796 97 nips-2010-Functional Geometry Alignment and Localization of Brain Areas

Author: Georg Langs, Yanmei Tie, Laura Rigolo, Alexandra Golby, Polina Golland

Abstract: Matching functional brain regions across individuals is a challenging task, largely due to the variability in their location and extent. It is particularly difficult, but highly relevant, for patients with pathologies such as brain tumors, which can cause substantial reorganization of functional systems. In such cases spatial registration based on anatomical data is only of limited value if the goal is to establish correspondences of functional areas among different individuals, or to localize potentially displaced active regions. Rather than rely on spatial alignment, we propose to perform registration in an alternative space whose geometry is governed by the functional interaction patterns in the brain. We first embed each brain into a functional map that reflects connectivity patterns during a fMRI experiment. The resulting functional maps are then registered, and the obtained correspondences are propagated back to the two brains. In application to a language fMRI experiment, our preliminary results suggest that the proposed method yields improved functional correspondences across subjects. This advantage is pronounced for subjects with tumors that affect the language areas and thus cause spatial reorganization of the functional regions. 1

4 0.097070359 33 nips-2010-Approximate inference in continuous time Gaussian-Jump processes

Author: Manfred Opper, Andreas Ruttor, Guido Sanguinetti

Abstract: We present a novel approach to inference in conditionally Gaussian continuous time stochastic processes, where the latent process is a Markovian jump process. We first consider the case of jump-diffusion processes, where the drift of a linear stochastic differential equation can jump at arbitrary time points. We derive partial differential equations for exact inference and present a very efficient mean field approximation. By introducing a novel lower bound on the free energy, we then generalise our approach to Gaussian processes with arbitrary covariance, such as the non-Markovian RBF covariance. We present results on both simulated and real data, showing that the approach is very accurate in capturing latent dynamics and can be useful in a number of real data modelling tasks.

5 0.092435859 268 nips-2010-The Neural Costs of Optimal Control

Author: Samuel Gershman, Robert Wilson

Abstract: Optimal control entails combining probabilities and utilities. However, for most practical problems, probability densities can be represented only approximately. Choosing an approximation requires balancing the benefits of an accurate approximation against the costs of computing it. We propose a variational framework for achieving this balance and apply it to the problem of how a neural population code should optimally represent a distribution under resource constraints. The essence of our analysis is the conjecture that population codes are organized to maximize a lower bound on the log expected utility. This theory can account for a plethora of experimental data, including the reward-modulation of sensory receptive fields, GABAergic effects on saccadic movements, and risk aversion in decisions under uncertainty. 1

6 0.088191092 55 nips-2010-Cross Species Expression Analysis using a Dirichlet Process Mixture Model with Latent Matchings

7 0.082510673 194 nips-2010-Online Learning for Latent Dirichlet Allocation

8 0.075496957 61 nips-2010-Direct Loss Minimization for Structured Prediction

9 0.069508284 288 nips-2010-Worst-case bounds on the quality of max-product fixed-points

10 0.069253325 263 nips-2010-Switching state space model for simultaneously estimating state transitions and nonstationary firing rates

11 0.06825427 90 nips-2010-Fast Large-scale Mixture Modeling with Component-specific Data Partitions

12 0.067963317 193 nips-2010-Online Learning: Random Averages, Combinatorial Parameters, and Learnability

13 0.066790283 32 nips-2010-Approximate Inference by Compilation to Arithmetic Circuits

14 0.066755399 165 nips-2010-MAP estimation in Binary MRFs via Bipartite Multi-cuts

15 0.066421106 164 nips-2010-MAP Estimation for Graphical Models by Likelihood Maximization

16 0.065987736 284 nips-2010-Variational bounds for mixed-data factor analysis

17 0.06499932 13 nips-2010-A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction

18 0.063750498 144 nips-2010-Learning Efficient Markov Networks

19 0.060683101 257 nips-2010-Structured Determinantal Point Processes

20 0.060284376 239 nips-2010-Sidestepping Intractable Inference with Structured Ensemble Cascades


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.172), (1, 0.035), (2, 0.017), (3, 0.04), (4, -0.106), (5, 0.024), (6, 0.01), (7, 0.002), (8, 0.05), (9, 0.069), (10, -0.081), (11, -0.026), (12, 0.094), (13, 0.001), (14, -0.077), (15, 0.003), (16, -0.023), (17, 0.074), (18, 0.072), (19, -0.049), (20, -0.004), (21, 0.04), (22, 0.055), (23, -0.052), (24, 0.018), (25, -0.13), (26, -0.058), (27, -0.006), (28, -0.075), (29, -0.011), (30, 0.038), (31, -0.09), (32, -0.041), (33, -0.011), (34, 0.074), (35, -0.039), (36, -0.049), (37, -0.027), (38, 0.047), (39, 0.011), (40, 0.005), (41, -0.006), (42, 0.125), (43, -0.075), (44, 0.114), (45, -0.168), (46, -0.066), (47, -0.018), (48, 0.05), (49, -0.118)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95667362 283 nips-2010-Variational Inference over Combinatorial Spaces

Author: Alexandre Bouchard-Côté, Michael I. Jordan

Abstract: Since the discovery of sophisticated fully polynomial randomized algorithms for a range of #P problems [1, 2, 3], theoretical work on approximate inference in combinatorial spaces has focused on Markov chain Monte Carlo methods. Despite their strong theoretical guarantees, the slow running time of many of these randomized algorithms and the restrictive assumptions on the potentials have hindered the applicability of these algorithms to machine learning. Because of this, in applications to combinatorial spaces simple exact models are often preferred to more complex models that require approximate inference [4]. Variational inference would appear to provide an appealing alternative, given the success of variational methods for graphical models [5]; unfortunately, however, it is not obvious how to develop variational approximations for combinatorial objects such as matchings, partial orders, plane partitions and sequence alignments. We propose a new framework that extends variational inference to a wide range of combinatorial spaces. Our method is based on a simple assumption: the existence of a tractable measure factorization, which we show holds in many examples. Simulations on a range of matching models show that the algorithm is more general and empirically faster than a popular fully polynomial randomized algorithm. We also apply the framework to the problem of multiple alignment of protein sequences, obtaining state-of-the-art results on the BAliBASE dataset [6]. 1

2 0.62867367 90 nips-2010-Fast Large-scale Mixture Modeling with Component-specific Data Partitions

Author: Bo Thiesson, Chong Wang

Abstract: Remarkably easy implementation and guaranteed convergence has made the EM algorithm one of the most used algorithms for mixture modeling. On the downside, the E-step is linear in both the sample size and the number of mixture components, making it impractical for large-scale data. Based on the variational EM framework, we propose a fast alternative that uses component-specific data partitions to obtain a sub-linear E-step in sample size, while the algorithm still maintains provable convergence. Our approach builds on previous work, but is significantly faster and scales much better in the number of mixture components. We demonstrate this speedup by experiments on large-scale synthetic and real data. 1

3 0.61219561 33 nips-2010-Approximate inference in continuous time Gaussian-Jump processes

Author: Manfred Opper, Andreas Ruttor, Guido Sanguinetti

Abstract: We present a novel approach to inference in conditionally Gaussian continuous time stochastic processes, where the latent process is a Markovian jump process. We first consider the case of jump-diffusion processes, where the drift of a linear stochastic differential equation can jump at arbitrary time points. We derive partial differential equations for exact inference and present a very efficient mean field approximation. By introducing a novel lower bound on the free energy, we then generalise our approach to Gaussian processes with arbitrary covariance, such as the non-Markovian RBF covariance. We present results on both simulated and real data, showing that the approach is very accurate in capturing latent dynamics and can be useful in a number of real data modelling tasks.

4 0.60752016 118 nips-2010-Implicit Differentiation by Perturbation

Author: Justin Domke

Abstract: This paper proposes a simple and efficient finite difference method for implicit differentiation of marginal inference results in discrete graphical models. Given an arbitrary loss function, defined on marginals, we show that the derivatives of this loss with respect to model parameters can be obtained by running the inference procedure twice, on slightly perturbed model parameters. This method can be used with approximate inference, with a loss function over approximate marginals. Convenient choices of loss functions make it practical to fit graphical models with hidden variables, high treewidth and/or model misspecification. 1

5 0.5601244 32 nips-2010-Approximate Inference by Compilation to Arithmetic Circuits

Author: Daniel Lowd, Pedro Domingos

Abstract: Arithmetic circuits (ACs) exploit context-specific independence and determinism to allow exact inference even in networks with high treewidth. In this paper, we introduce the first ever approximate inference methods using ACs, for domains where exact inference remains intractable. We propose and evaluate a variety of techniques based on exact compilation, forward sampling, AC structure learning, Markov network parameter learning, variational inference, and Gibbs sampling. In experiments on eight challenging real-world domains, we find that the methods based on sampling and learning work best: one such method (AC2 -F) is faster and usually more accurate than loopy belief propagation, mean field, and Gibbs sampling; another (AC2 -G) has a running time similar to Gibbs sampling but is consistently more accurate than all baselines. 1

6 0.55189204 106 nips-2010-Global Analytic Solution for Variational Bayesian Matrix Factorization

7 0.54386216 263 nips-2010-Switching state space model for simultaneously estimating state transitions and nonstationary firing rates

8 0.48908553 144 nips-2010-Learning Efficient Markov Networks

9 0.47611231 284 nips-2010-Variational bounds for mixed-data factor analysis

10 0.46679938 164 nips-2010-MAP Estimation for Graphical Models by Likelihood Maximization

11 0.46581945 35 nips-2010-Auto-Regressive HMM Inference with Incomplete Data for Short-Horizon Wind Forecasting

12 0.46052498 13 nips-2010-A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction

13 0.45667249 84 nips-2010-Exact inference and learning for cumulative distribution functions on loopy graphs

14 0.45582214 126 nips-2010-Inference with Multivariate Heavy-Tails in Linear Models

15 0.44902009 159 nips-2010-Lifted Inference Seen from the Other Side : The Tractable Features

16 0.43811581 226 nips-2010-Repeated Games against Budgeted Adversaries

17 0.42551762 239 nips-2010-Sidestepping Intractable Inference with Structured Ensemble Cascades

18 0.42479447 11 nips-2010-A POMDP Extension with Belief-dependent Rewards

19 0.41891319 251 nips-2010-Sphere Embedding: An Application to Part-of-Speech Induction

20 0.41871265 194 nips-2010-Online Learning for Latent Dirichlet Allocation


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(13, 0.021), (17, 0.019), (27, 0.02), (30, 0.629), (45, 0.146), (50, 0.027), (52, 0.018), (60, 0.018), (77, 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.95586711 232 nips-2010-Sample Complexity of Testing the Manifold Hypothesis

Author: Hariharan Narayanan, Sanjoy Mitter

Abstract: The hypothesis that high dimensional data tends to lie in the vicinity of a low dimensional manifold is the basis of a collection of methodologies termed Manifold Learning. In this paper, we study statistical aspects of the question of fitting a manifold with a nearly optimal least squared error. Given upper bounds on the dimension, volume, and curvature, we show that Empirical Risk Minimization can produce a nearly optimal manifold using a number of random samples that is independent of the ambient dimension of the space in which data lie. We obtain an upper bound on the required number of samples that depends polynomially on the curvature, exponentially on the intrinsic dimension, and linearly on the intrinsic volume. For constant error, we prove a matching minimax lower bound on the sample complexity that shows that this dependence on intrinsic dimension, volume and curvature is unavoidable. Whether the known lower bound of O((k + log(1/δ))/ε²) for the sample complexity of Empirical Risk minimization on k-means applied to data in a unit ball of arbitrary dimension is tight, has been an open question since 1997 [3]. Here ε is the desired bound on the error and δ is a bound on the probability of failure. We improve the best currently known upper bound [14] of O(k²/ε² + log(1/δ)/ε²) to O((k/ε²) min(k, log⁴(k)/ε²) + log(1/δ)/ε²). Based on these results, we devise a simple algorithm for k-means and another that uses a family of convex programs to fit a piecewise linear curve of a specified length to high dimensional data, where the sample complexity is independent of the ambient dimension. 1

2 0.93214095 264 nips-2010-Synergies in learning words and their referents

Author: Mark Johnson, Katherine Demuth, Bevan Jones, Michael J. Black

Abstract: This paper presents Bayesian non-parametric models that simultaneously learn to segment words from phoneme strings and learn the referents of some of those words, and shows that there is a synergistic interaction in the acquisition of these two kinds of linguistic information. The models themselves are novel kinds of Adaptor Grammars that are an extension of an embedding of topic models into PCFGs. These models simultaneously segment phoneme sequences into words and learn the relationship between non-linguistic objects to the words that refer to them. We show (i) that modelling inter-word dependencies not only improves the accuracy of the word segmentation but also of word-object relationships, and (ii) that a model that simultaneously learns word-object relationships and word segmentation segments more accurately than one that just learns word segmentation on its own. We argue that these results support an interactive view of language acquisition that can take advantage of synergies such as these. 1

same-paper 3 0.89001596 283 nips-2010-Variational Inference over Combinatorial Spaces

Author: Alexandre Bouchard-côté, Michael I. Jordan

Abstract: Since the discovery of sophisticated fully polynomial randomized algorithms for a range of #P problems [1, 2, 3], theoretical work on approximate inference in combinatorial spaces has focused on Markov chain Monte Carlo methods. Despite their strong theoretical guarantees, the slow running time of many of these randomized algorithms and the restrictive assumptions on the potentials have hindered the applicability of these algorithms to machine learning. Because of this, in applications to combinatorial spaces simple exact models are often preferred to more complex models that require approximate inference [4]. Variational inference would appear to provide an appealing alternative, given the success of variational methods for graphical models [5]; unfortunately, however, it is not obvious how to develop variational approximations for combinatorial objects such as matchings, partial orders, plane partitions and sequence alignments. We propose a new framework that extends variational inference to a wide range of combinatorial spaces. Our method is based on a simple assumption: the existence of a tractable measure factorization, which we show holds in many examples. Simulations on a range of matching models show that the algorithm is more general and empirically faster than a popular fully polynomial randomized algorithm. We also apply the framework to the problem of multiple alignment of protein sequences, obtaining state-of-the-art results on the BAliBASE dataset [6]. 1

4 0.78557998 58 nips-2010-Decomposing Isotonic Regression for Efficiently Solving Large Problems

Author: Ronny Luss, Saharon Rosset, Moni Shahar

Abstract: A new algorithm for isotonic regression is presented based on recursively partitioning the solution space. We develop efficient methods for each partitioning subproblem through an equivalent representation as a network flow problem, and prove that this sequence of partitions converges to the global solution. These network flow problems can further be decomposed in order to solve very large problems. Success of isotonic regression in prediction and our algorithm’s favorable computational properties are demonstrated through simulated examples as large as 2 × 105 variables and 107 constraints.

5 0.75314885 40 nips-2010-Beyond Actions: Discriminative Models for Contextual Group Activities

Author: Tian Lan, Yang Wang, Weilong Yang, Greg Mori

Abstract: We propose a discriminative model for recognizing group activities. Our model jointly captures the group activity, the individual person actions, and the interactions among them. Two new types of contextual information, group-person interaction and person-person interaction, are explored in a latent variable framework. Different from most of the previous latent structured models which assume a predefined structure for the hidden layer, e.g. a tree structure, we treat the structure of the hidden layer as a latent variable and implicitly infer it during learning and inference. Our experimental results demonstrate that by inferring this contextual information together with adaptive structures, the proposed model can significantly improve activity recognition performance. 1

6 0.7133652 270 nips-2010-Tight Sample Complexity of Large-Margin Learning

7 0.6667341 220 nips-2010-Random Projection Trees Revisited

8 0.58774769 222 nips-2010-Random Walk Approach to Regret Minimization

9 0.58350337 75 nips-2010-Empirical Risk Minimization with Approximations of Probabilistic Grammars

10 0.55589134 193 nips-2010-Online Learning: Random Averages, Combinatorial Parameters, and Learnability

11 0.55494326 88 nips-2010-Extensions of Generalized Binary Search to Group Identification and Exponential Costs

12 0.54347062 285 nips-2010-Why are some word orders more common than others? A uniform information density account

13 0.54062235 260 nips-2010-Sufficient Conditions for Generating Group Level Sparsity in a Robust Minimax Framework

14 0.53091359 221 nips-2010-Random Projections for $k$-means Clustering

15 0.52406794 274 nips-2010-Trading off Mistakes and Don't-Know Predictions

16 0.52195388 155 nips-2010-Learning the context of a category

17 0.51700121 163 nips-2010-Lower Bounds on Rate of Convergence of Cutting Plane Methods

18 0.51308012 233 nips-2010-Scrambled Objects for Least-Squares Regression

19 0.50910652 288 nips-2010-Worst-case bounds on the quality of max-product fixed-points

20 0.50804603 173 nips-2010-Multi-View Active Learning in the Non-Realizable Case