nips nips2010 nips2010-90 knowledge-graph by maker-knowledge-mining

90 nips-2010-Fast Large-scale Mixture Modeling with Component-specific Data Partitions


Source: pdf

Author: Bo Thiesson, Chong Wang

Abstract: Remarkably easy implementation and guaranteed convergence has made the EM algorithm one of the most used algorithms for mixture modeling. On the downside, the E-step is linear in both the sample size and the number of mixture components, making it impractical for large-scale data. Based on the variational EM framework, we propose a fast alternative that uses component-specific data partitions to obtain a sub-linear E-step in sample size, while the algorithm still maintains provable convergence. Our approach builds on previous work, but is significantly faster and scales much better in the number of mixture components. We demonstrate this speedup by experiments on large-scale synthetic and real data. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Fast Large-scale Mixture Modeling with Component-specific Data Partitions Bo Thiesson∗ Microsoft Research Chong Wang∗† Princeton University Abstract Remarkably easy implementation and guaranteed convergence has made the EM algorithm one of the most used algorithms for mixture modeling. [sent-1, score-0.119]

2 Based on the variational EM framework, we propose a fast alternative that uses component-specific data partitions to obtain a sub-linear E-step in sample size, while the algorithm still maintains provable convergence. [sent-3, score-0.312]

3 Our approach builds on previous work, but is significantly faster and scales much better in the number of mixture components. [sent-4, score-0.115]

4 We demonstrate this speedup by experiments on large-scale synthetic and real data. [sent-5, score-0.117]

5 1 Introduction Probabilistic mixture modeling [7] has been widely used for density estimation and clustering applications. [sent-6, score-0.123]

6 In particular, the E-step is linear in both the number of data points and the number of mixture components, and therefore computationally impractical for large-scale applications. [sent-9, score-0.151]

7 Our work was motivated by a large-scale geo-spatial problem, demanding a mixture model of a customer base (a huge number of data points) for competing businesses (a large number of mixture components), as the basis for site evaluation (where to locate a new store). [sent-10, score-0.25]

8 Our work is inspired by the “chunky EM” algorithm in [17, 16], a smart application of the variational EM framework [11], where a lower bound on the objective function increases at each iteration and convergence is guaranteed. [sent-16, score-0.203]

9 The variational EM framework alters the E-step to use sufficient statistics calculated under a variational distribution instead. [sent-18, score-0.306]

10 In chunky EM, the speedup is obtained by using a variational distribution with shared (variational) membership probabilities for blocks of data (in an exhaustive partition for the entire data into non-overlapping blocks of data). [sent-19, score-1.102]

11 The chunky EM starts from a coarse partition of the data and gradually refines the partition until convergence. [sent-20, score-0.729]

12 However, chunky EM does not scale well in the number of components, since all components share the same partition. [sent-21, score-0.493]

13 The individual components are different – in order to obtain membership probabilities of appropriate quality, one component may need fine-grained blocks in one area of the data space, while another component is perfectly fine with coarse blocks in that area. [sent-22, score-0.401]

14 Chunky EM expands the shared partition to match the needed granularity for the most demanding mixture component in any area of the data space, which might unnecessarily increase the computational cost. (*Equal contributors.) [sent-23, score-0.345]

15 We demonstrate a significant performance improvement over standard and chunky EM in experiments on synthetic data and the customer-business data mentioned above. [sent-27, score-0.478]

16 Given data x = {x1, . . . , xN}, we are interested in estimating the parameters θ = {η1:K, π1:K} in the K-component mixture model with log-likelihood function L(θ) = Σn log Σk p(xn|ηk)πk.  (1) [sent-32, score-0.132]
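For concreteness, a minimal sketch of how this log-likelihood could be evaluated for Gaussian components (function and variable names are illustrative, not from the paper):

    import numpy as np
    from scipy.stats import multivariate_normal
    from scipy.special import logsumexp

    def mixture_log_likelihood(x, means, covs, weights):
        # L(theta) = sum_n log sum_k p(x_n | eta_k) * pi_k, computed stably in log space
        log_p = np.stack([multivariate_normal.logpdf(x, means[k], covs[k])
                          for k in range(len(weights))], axis=1)   # (N, K)
        return float(logsumexp(log_p + np.log(weights), axis=1).sum())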

17 For this task, we consider a variational generalization [11] of standard EM [4], which maximizes a lower bound of L(θ) through the introduction of a variational distribution q. [sent-33, score-0.337]

18 We assume that the variational distribution factorizes in accordance with data points, i.e., q = Πn qn. [sent-34, score-0.172]

19 Here, each qn is an arbitrary discrete distribution over the mixture components k = 1, . . . , K. [sent-35, score-0.303]

20 The variational EM algorithm alternates between the following two steps. [sent-40, score-0.153]

21 The variational EM is in this case equivalent to the standard EM, and hence produces the maximum likelihood (ML) estimate. [sent-45, score-0.153]
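A hedged sketch of this unrestricted case for Gaussian components, where the E-step recovers the exact posterior responsibilities and the M-step is the usual weighted update (all names are illustrative):

    import numpy as np
    from scipy.stats import multivariate_normal
    from scipy.special import logsumexp

    def e_step(x, means, covs, weights):
        # unrestricted optimum: q_n(k) proportional to p(x_n | eta_k) * pi_k
        log_p = np.stack([multivariate_normal.logpdf(x, means[k], covs[k])
                          for k in range(len(weights))], axis=1) + np.log(weights)
        return np.exp(log_p - logsumexp(log_p, axis=1, keepdims=True))   # (N, K)

    def m_step(x, q):
        # weighted maximum-likelihood updates given the responsibilities q
        nk = q.sum(axis=0)                       # (K,) effective counts
        weights = nk / len(x)
        means = (q.T @ x) / nk[:, None]          # (K, D)
        covs = []
        for k in range(q.shape[1]):
            d = x - means[k]
            covs.append((q[:, k, None] * d).T @ d / nk[k] + 1e-6 * np.eye(x.shape[1]))
        return means, np.stack(covs), weights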

22 In the following, we consider certain ways of restricting q to attain speedup over standard EM, implying that the minimum KL-divergence between qn and p(·|xn , θ) is not necessarily zero. [sent-46, score-0.172]

23 Still, the variational EM defines a convergent algorithm, which instead optimizes a lower bound of the log-likelihood. [sent-47, score-0.184]
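For reference, a minimal sketch of that lower bound in its usual free-energy form, F(θ, q) = Σn Σk qn(k) [log p(xn|ηk) + log πk − log qn(k)], which equals L(θ) minus the per-point KL terms (Gaussian components; names are illustrative):

    import numpy as np
    from scipy.stats import multivariate_normal

    def lower_bound(x, q, means, covs, weights):
        # F(theta, q) <= L(theta), with equality iff each q_n is the exact posterior
        log_joint = np.stack([multivariate_normal.logpdf(x, means[k], covs[k])
                              for k in range(len(weights))], axis=1) + np.log(weights)
        with np.errstate(divide='ignore', invalid='ignore'):
            entropy_terms = np.where(q > 0, q * np.log(q), 0.0)   # convention: 0 log 0 = 0
        return float((q * log_joint).sum() - entropy_terms.sum())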

24 The chunky EM algorithm [17, 16] falls into the framework of variational EM algorithms. [sent-49, score-0.605]

25 In chunky EM, the variational distribution q = Πn qn is restricted according to a partition into exhaustive and mutually exclusive blocks of the data. [sent-50, score-0.887]

26 The intuition is that data points in the same block are somewhat similar and can be treated in the same way, which leads to computational savings in the E-step. [sent-52, score-0.117]

27 If M is the number of blocks in a given partition, the E-step for chunky EM has cost O(KM ) whereas in standard EM the cost is O(KN ). [sent-53, score-0.534]
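A hedged sketch of where the O(KM) cost comes from: with one shared qB per block, the optimal qB(k) is (up to normalization) πk times the geometric mean of the component likelihoods over the block, and for Gaussian components that geometric mean can be computed from cached block statistics alone, never touching individual points in the E-step (a sketch of the idea, not the authors' code; names are illustrative):

    import numpy as np
    from scipy.special import logsumexp

    def block_stats(x_block):
        # cached once per block: count, sum, and sum of outer products
        return len(x_block), x_block.sum(axis=0), x_block.T @ x_block

    def avg_gauss_loglik(stats, mean, cov):
        # (1/|B|) * sum_{n in B} log N(x_n | mean, cov), from block statistics only
        n, s, ss = stats
        d = len(mean)
        prec = np.linalg.inv(cov)
        _, logdet = np.linalg.slogdet(cov)
        quad = (np.trace(prec @ ss) - 2 * mean @ prec @ s + n * mean @ prec @ mean) / n
        return -0.5 * (d * np.log(2 * np.pi) + logdet + quad)

    def block_e_step(stats_list, means, covs, weights):
        # one q_B per block, shared by all points in the block: cost O(K * M), not O(K * N)
        K = len(weights)
        log_q = np.array([[avg_gauss_loglik(st, means[k], covs[k]) for k in range(K)]
                          for st in stats_list]) + np.log(weights)
        return np.exp(log_q - logsumexp(log_q, axis=1, keepdims=True))   # (M, K)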

28 The speedup is gained by a trade-off between the tightness of the lower bound for the log-likelihood and the restrictiveness of constraints. [sent-55, score-0.122]

29 Chunky EM starts from a coarse partition and iteratively refines it. [sent-56, score-0.139]

30 This refinement process always produces a tighter bound, since restrictions on the variational distribution are gradually relaxed. [sent-57, score-0.153]

31 The chunky EM algorithm stops when refining any block in a partition will not significantly increase the lower bound. [sent-58, score-0.655]

32 3 Component-specific EM In chunky EM, all mixture components share the same data partition. [sent-59, score-0.612]

33 However, for a particular block of data, the variation in membership probabilities differs across components, resulting in varying differences from the equality constrained variational probabilities. [sent-60, score-0.32]

34 Roughly, the variation in membership probabilities is greatest for components closer to a block of data, and, in particular, for components far away the membership probabilities are all so small that the variation is insignificant. [sent-61, score-0.348]

35 Let Mk be the number of data blocks in the partition for component k. [sent-63, score-0.256]

36 The complexity for the E-step is then O(Σk Mk), compared to O(KM) in chunky EM. [sent-64, score-0.452]

37 Since our model maintains different partitions for different mixture components, we call it the component-specific EM algorithm (CS-EM). [sent-66, score-0.198]

38 The bottom-right figure is the corresponding MPT, where {·} indicates the component marks and a, b, c, d, e, f, g enumerate all the marked nodes. [sent-68, score-0.12]

39 Starting from a coarse partition for each component (see Section 4.1 for examples), CS-EM runs variational EM to convergence. [sent-73, score-0.175]

40 It then selectively refines the component-specific partitions. [sent-74, score-0.191]

41 1 Marked Partition Trees It is convenient to organize the data into a pre-computed partition tree, where a node in the tree represents the union of the data represented by its children. [sent-80, score-0.22]

42 Any hierarchical decomposition of the data that ensures some degree of similarity between data in a block is suitable for constructing a partition tree. [sent-84, score-0.225]

43 We will in the following consider tree-consistent partitions, where each data block in a partition corresponds to exactly one node for a cut (possibly across different levels) in the tree; see Figure 1. [sent-87, score-0.244]
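A minimal sketch of such a pre-computed partition tree, here a KD-tree-style binary split on the median coordinate, with the block statistics used above cached at every node (an illustrative construction, not the authors' implementation):

    import numpy as np

    class Node:
        def __init__(self, x, depth=0, min_size=32):
            self.n = len(x)
            self.sum = x.sum(axis=0)
            self.outer = x.T @ x            # cached sufficient statistics for this block
            self.children = []
            if self.n > min_size:
                dim = depth % x.shape[1]    # cycle through dimensions
                median = np.median(x[:, dim])
                left, right = x[x[:, dim] <= median], x[x[:, dim] > median]
                if len(left) and len(right):
                    self.children = [Node(left, depth + 1, min_size),
                                     Node(right, depth + 1, min_size)]

    def leaves(node):
        # one tree-consistent partition is the full set of leaves; any cut through the tree works
        return [node] if not node.children else [l for c in node.children for l in leaves(c)]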

44 Let us now define a marked partition tree (MPT), a simple encoding of all component-specific partitions, as follows. [sent-88, score-0.201]

45 Let Bk be the data partition (a set of blocks) in the tree-consistent partition for mixture component k. [sent-89, score-0.393]

46 In Figure 1, for example, B1 is the partition into data blocks associated with nodes {e, c, d}. [sent-90, score-0.245]

47 In the shared data partition tree used to generate the component-specific partitions, we mark the corresponding nodes for the data blocks in each Bk by the component identifier k. [sent-91, score-0.325]

48 For example, in chunky EM, the component-specific partitions are the same for each mixture component. [sent-97, score-0.65]

49 In this case, only the leaves in the MPT are marked, with each leaf marked by all mixture components. [sent-98, score-0.234]

50 The following important property for an MPT holds since all component-specific partitions are constructed with respect to the same data partition tree. [sent-99, score-0.236]

51 The marked nodes on a path from leaf to root in T mark exactly one data block from each of the K component-specific data partitions. [sent-102, score-0.276]

52 In the following, it becomes important to identify the data block in a component-specific partition, which embeds the block defined by a leaf. [sent-103, score-0.155]

53 Let L denote the set of leaves in T , and let BL denote a partition with data blocks Bl ∈ BL according to these leaves. [sent-104, score-0.242]

54 Example: In Figure 1, the path a → e → g in turn marks the components Ka = {3, 4}, Ke = {1, 2}, and Kg = {5} and we see that each component is marked exactly once on this path, as stated in Property 1. [sent-107, score-0.178]
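A tiny sketch of checking Property 1 on that example path (the marks Ka, Ke, Kg are the ones quoted above; the encoding itself is illustrative):

    from collections import Counter

    # component marks along the leaf-to-root path a -> e -> g in Figure 1
    marks = {'a': {3, 4}, 'e': {1, 2}, 'g': {5}}    # K_a, K_e, K_g
    K = 5

    counts = Counter()
    for node in ('a', 'e', 'g'):
        counts.update(marks[node])

    # Property 1: each of the K components is marked exactly once on the path
    assert all(counts[k] == 1 for k in range(1, K + 1))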

55 2 The Variational Distribution Our variational distribution q assigns the same variational membership probability to mixture component k for all data points in a component-specific block Bk ∈ Bk . [sent-110, score-0.609]

56 That is, qn(k) = qBk for all xn ∈ Bk (Eq. 4), which we denote as the component-specific block constraint. [sent-111, score-0.172]

57 Unlike chunky EM, we do not assume that the data partition Bk is the same across different mixture components. [sent-112, score-0.69]

58 To further drive intuition behind the E-step complication, let us make the sum-to-one constraint for the variational distributions qn(·) explicit. [sent-115, score-0.248]

59 That is, Σk qn(k) = 1 for all data points n, which according to the above block constraint and using Property 1 can be reformulated as the |L| constraints Σk qBk(l) = 1 for all l ∈ L. [sent-116, score-0.211]

60 The closer a data block Bk is to the root of T, the more constraints simultaneously involve the same qBk. [sent-119, score-0.13]

61 For chunky EM, the E-step is straightforward, because Bk(l) = Bl and therefore Σ{l : Bl ⊆ Bk(l)} λl = λl for all k = 1, . . . , K. [sent-137, score-0.468]

62 Consider a leaf node l ∈ L and recall that Kl denotes the components with Bk(l) = Bl in their partitions. [sent-147, score-0.134]

63 Efficient Variational M-step: in the variational M-step, the model parameters θ = {η1:K, π1:K} are updated by maximizing the lower bound F. [sent-170, score-0.153]
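Under the block constraints, the M-step also never needs to revisit individual data points: the cached block statistics and block-level memberships suffice. A hedged Gaussian sketch, simplified to a single shared partition as in chunky EM (names are illustrative):

    import numpy as np

    def block_m_step(stats_list, q_blocks):
        # stats_list[m] = (count, sum, sum-of-outer-products) for block m
        # q_blocks[m, k] = shared membership probability of block m under component k
        counts = np.array([st[0] for st in stats_list], dtype=float)   # (M,)
        sums = np.array([st[1] for st in stats_list])                  # (M, D)
        outers = np.array([st[2] for st in stats_list])                # (M, D, D)
        nk = q_blocks.T @ counts                                       # (K,) effective counts
        weights = nk / counts.sum()
        means = (q_blocks.T @ sums) / nk[:, None]                      # (K, D)
        covs = []
        for k in range(q_blocks.shape[1]):
            second = np.einsum('m,mij->ij', q_blocks[:, k], outers) / nk[k]
            covs.append(second - np.outer(means[k], means[k]) + 1e-6 * np.eye(sums.shape[1]))
        return means, np.stack(covs), weights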

64 Any refinement enlarges the family of variational distributions, and therefore always tightens the optimal lower bound for the log-likelihood. [sent-182, score-0.2]

65 We define a refinement unit as the refinement of one data block in the current partition for one component in the model. [sent-183, score-0.242]

66 This introduces K new free variational parameters, which is similar to a refinement step in chunky EM. [sent-187, score-0.605]

67 However, chunky EM refines the same data block across all components, which is not the case in CS-EM. [sent-188, score-0.539]

68 2: repeat 3: repeat 4: Run variational E-step and M-step. [sent-192, score-0.153]

69 2: for each marked node v in T do 3: Compute q via E-step with constraints as in (14). [sent-199, score-0.122]

70 To approximate the new variational distribution q̄, we fix the value of each q̄Bk(l), with k ∈ Ku and l ∈ L, to the value obtained for the distribution q before the refinement. [sent-210, score-0.153]

71 The improvement to F that is achieved by the refinement-guiding E-step for the refinement unit refining data block v for component k is denoted ∆Fv,k, and can be computed as ∆Fv,k = Σ{u ∈ ch(v)} F(θ, qBk(u)) − F(θ, qBk(v)). [sent-215, score-0.123]
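A small sketch of how such per-unit gains ∆Fv,k could drive the selective refinement, e.g., with a priority queue keyed on the gain (the computation of ∆Fv,k itself is abstracted away; everything here is illustrative):

    import heapq

    def best_refinements(gains, budget):
        # gains: dict mapping (node, component) -> delta F for refining that unit
        heap = [(-g, v, k) for (v, k), g in gains.items()]
        heapq.heapify(heap)
        chosen = []
        while heap and len(chosen) < budget:
            neg_g, v, k = heapq.heappop(heap)
            if -neg_g <= 0:          # refining this unit would not improve the bound
                break
            chosen.append((v, k))    # refine block v in component k's partition only
        return chosen

    # usage sketch: gains = {('e', 1): 0.8, ('e', 2): 0.1, ('a', 3): 0.5}
    # best_refinements(gains, budget=2) -> [('e', 1), ('a', 3)]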

72 Notice that this selective refinement step will most likely not refine the same data block for all components and therefore creates component-specific partitions. [sent-217, score-0.144]

73 4 Experiments: in this section we provide a systematic evaluation of CS-EM, chunky EM, and standard EM on synthetic data, as well as a comparison between CS-EM and chunky EM on the business-customer data mentioned in Section 1. [sent-221, score-0.93]

74 In this way, the locations of competing stores of a particular type correspond to fixed centers for components in a mixture model. [sent-233, score-0.2]

75 After constructing the KD-tree, the first tree level containing at least K nodes (level ⌈log2 K⌉) is used as the initial data partition for both chunky EM and all components in CS-EM. [sent-236, score-0.671]
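A hedged sketch of that initialization rule on a binary tree like the one sketched earlier (level ⌈log2 K⌉ is the first level of a complete binary tree with at least K nodes; names are illustrative):

    import math

    def nodes_at_depth(node, depth):
        if depth == 0 or not node.children:
            return [node]
        return [n for c in node.children for n in nodes_at_depth(c, depth - 1)]

    def initial_partition(root, K):
        level = math.ceil(math.log2(K)) if K > 1 else 0
        blocks = nodes_at_depth(root, level)
        # fall back to deeper levels until at least K blocks are available
        while len(blocks) < K:
            level += 1
            new_blocks = nodes_at_depth(root, level)
            if len(new_blocks) == len(blocks):   # tree cannot be split further
                break
            blocks = new_blocks
        return blocks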

76 For all algorithms (including standard EM), we randomly chose K data blocks from the initial partition and initialized parameters for the individual mixture components accordingly. [sent-237, score-0.39]

77 The experiments on the business-customer data are initialized in the same way, except that the component centers are fixed and the initial data blocks that cover these centers are used for initializing the remaining parameters. [sent-239, score-0.219]

78 It starts from the CS-EM initialization and recursively, according to the KD-tree structure, merges two data blocks in a component-specific partition, if the merge has little effect on that component. [sent-241, score-0.131]

79 We recorded the log-likelihood for the test data at each iteration of the EM algorithm, and before each S-step in chunky EM and CS-EM. [sent-245, score-0.471]

80 We then recorded the run-time for each algorithm to reach this baseline and computed the EM-speedup factors for chunky EM, CS-EM, and CS-EM∗, each defined as the standard EM run-time divided by the run-time of the alternative algorithm. [sent-247, score-0.472]

81 First of all, we see that both CS-EM and CS-EM∗ are significantly faster than chunky EM in all experiments. [sent-250, score-0.467]

82 In general, the Σk Mk variational parameters needed for the CS-EM algorithms are far fewer than the KM parameters needed for chunky EM to reach an estimate of the same quality. [sent-251, score-0.625]

83 This observation can be explained by the fact that the resulting component-specific data partitions greatly refine the initial partitions, and any computational speedup due to the smarter initial partition in CS-EM∗ is therefore overwhelmed. [sent-256, score-0.373]

84 However, the last plot in Figure 4 reveals an important difference between chunky EM and CS-EM: with a fixed ratio between the number of data points and the number of clusters, the EM-speedup declines substantially for chunky EM as the number of clusters and data points increases. [sent-259, score-0.989]

85 To compare run-times for chunky EM […]. (Footnote 3: let µ and Σ be the mean and variance parameters for an initial component, and let µp, µl, and µr denote the sample means for the data in the considered parent, left child, and right child.) [sent-264, score-0.486]

86 Figure 5: A comparison of run-time and final number of variational parameters for chunky EM vs. CS-EM. [sent-270, score-0.153]

87 Figure 5 shows the speedup (time ratio) and the reduction in number of variational parameters (parameter ratio) for CS-EM compared to chunky EM, as evaluated on exemplary types of businesses. [sent-287, score-0.717]

88 Again, CS-EM is significantly faster than chunky EM, and the speedup is achieved by better targeting of the variational distribution through the component-specific partitions. [sent-288, score-0.711]

89 CS-EM combines the best from two major directions in the literature regarding speedup of EM for mixture modeling. [sent-290, score-0.191]

90 In contrast, CS-EM is provably convergent, even for arbitrarily rough partitions, if extreme speedup is needed. [sent-294, score-0.133]

91 The granularity of partitions in [10] is controlled by a user-specified threshold on the minimum and maximum membership probabilities that are reachable within the boundaries of a node in the KD-tree. [sent-295, score-0.257]

92 Finally, [10] “prunes” a component (sets the membership probability to zero) for data far away from the component. [sent-298, score-0.119]

93 The second direction of speedup approaches is based on the variational EM framework [11]. [sent-300, score-0.244]

94 The chunky EM in [17, 16] is the approach most similar to our CS-EM. [sent-305, score-0.452]

95 Both are based on the variational EM framework and therefore guarantee convergence, but CS-EM is faster and scales better in the number of clusters. [sent-306, score-0.184]

96 Sub-sampling therefore has the same computational complexity as chunky EM, and our results suggest that we should expect CS-EM to be much faster than sub-sampling and to scale better in the number of mixture components. [sent-309, score-0.599]

97 Finally, we exemplified our work by using KD-trees as the tree-consistent partition structure for generating the component-specific partitions in CS-EM, which limited its effectiveness in high dimensions. [sent-310, score-0.217]

98 However, any hierarchical partition structure can be used, and the work in [8] therefore suggests that changing to an anchor tree (a special kind of metric tree [15]) will also render CS-EM effective in high dimensions, under the assumption of lower intrinsic dimensionality for the data. [sent-311, score-0.201]

99 […] k-means clustering [6, 13, 5] and 2) general EM applications beyond mixture modeling. [sent-315, score-0.123]

100 Very fast EM-based mixture model clustering using multiresolution kd-trees. [sent-379, score-0.123]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('qbk', 0.595), ('chunky', 0.452), ('bk', 0.32), ('em', 0.312), ('gbk', 0.155), ('variational', 0.153), ('mpt', 0.143), ('partition', 0.119), ('bl', 0.102), ('nement', 0.102), ('mixture', 0.1), ('partitions', 0.098), ('ku', 0.094), ('speedup', 0.091), ('blocks', 0.082), ('qn', 0.081), ('block', 0.068), ('membership', 0.064), ('re', 0.061), ('kl', 0.059), ('marked', 0.057), ('leaf', 0.055), ('mk', 0.051), ('kv', 0.043), ('provable', 0.042), ('components', 0.041), ('fs', 0.041), ('ch', 0.041), ('node', 0.038), ('component', 0.036), ('units', 0.036), ('granularity', 0.036), ('xxt', 0.036), ('ft', 0.034), ('accelerating', 0.029), ('km', 0.028), ('ql', 0.027), ('ke', 0.027), ('stores', 0.027), ('marks', 0.027), ('constraints', 0.027), ('synthetic', 0.026), ('ka', 0.026), ('tree', 0.025), ('nodes', 0.025), ('intertwined', 0.024), ('nail', 0.024), ('salon', 0.024), ('business', 0.023), ('clustering', 0.023), ('xn', 0.023), ('leaves', 0.022), ('probabilities', 0.021), ('exemplary', 0.021), ('thiesson', 0.021), ('priority', 0.021), ('unmarked', 0.021), ('coarse', 0.02), ('reach', 0.02), ('lagrange', 0.02), ('data', 0.019), ('convergence', 0.019), ('unnecessarily', 0.019), ('nements', 0.019), ('queue', 0.019), ('selectively', 0.019), ('nes', 0.018), ('ru', 0.018), ('kb', 0.018), ('stored', 0.018), ('children', 0.018), ('accordingly', 0.017), ('verbeek', 0.017), ('default', 0.017), ('centers', 0.017), ('path', 0.017), ('exp', 0.017), ('therefore', 0.016), ('root', 0.016), ('parent', 0.016), ('notice', 0.016), ('points', 0.016), ('initialization', 0.016), ('customers', 0.016), ('demanding', 0.016), ('lower', 0.016), ('initial', 0.015), ('bound', 0.015), ('ning', 0.015), ('ratio', 0.015), ('competing', 0.015), ('faster', 0.015), ('variation', 0.014), ('merge', 0.014), ('multipliers', 0.014), ('initialized', 0.014), ('microsoft', 0.014), ('intuition', 0.014), ('truly', 0.013), ('log', 0.013)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999952 90 nips-2010-Fast Large-scale Mixture Modeling with Component-specific Data Partitions

Author: Bo Thiesson, Chong Wang

Abstract: Remarkably easy implementation and guaranteed convergence has made the EM algorithm one of the most used algorithms for mixture modeling. On the downside, the E-step is linear in both the sample size and the number of mixture components, making it impractical for large-scale data. Based on the variational EM framework, we propose a fast alternative that uses component-specific data partitions to obtain a sub-linear E-step in sample size, while the algorithm still maintains provable convergence. Our approach builds on previous work, but is significantly faster and scales much better in the number of mixture components. We demonstrate this speedup by experiments on large-scale synthetic and real data. 1

2 0.1895756 164 nips-2010-MAP Estimation for Graphical Models by Likelihood Maximization

Author: Akshat Kumar, Shlomo Zilberstein

Abstract: Computing a maximum a posteriori (MAP) assignment in graphical models is a crucial inference problem for many practical applications. Several provably convergent approaches have been successfully developed using linear programming (LP) relaxation of the MAP problem. We present an alternative approach, which transforms the MAP problem into that of inference in a mixture of simple Bayes nets. We then derive the Expectation Maximization (EM) algorithm for this mixture that also monotonically increases a lower bound on the MAP assignment until convergence. The update equations for the EM algorithm are remarkably simple, both conceptually and computationally, and can be implemented using a graph-based message passing paradigm similar to max-product computation. Experiments on the real-world protein design dataset show that EM’s convergence rate is significantly higher than the previous LP relaxation based approach MPLP. EM also achieves a solution quality within 95% of optimal for most instances. 1

3 0.1296892 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models

Author: Armand Joulin, Jean Ponce, Francis R. Bach

Abstract: Dimensionality reduction is commonly used in the setting of multi-label supervised classification to control the learning capacity and to provide a meaningful representation of the data. We introduce a simple forward probabilistic model which is a multinomial extension of reduced rank regression, and show that this model provides a probabilistic interpretation of discriminative clustering methods with added benefits in terms of number of hyperparameters and optimization. While the expectation-maximization (EM) algorithm is commonly used to learn these probabilistic models, it usually leads to local maxima because it relies on a non-convex cost function. To avoid this problem, we introduce a local approximation of this cost function, which in turn leads to a quadratic non-convex optimization problem over a product of simplices. In order to maximize quadratic functions, we propose an efficient algorithm based on convex relaxations and lowrank representations of the data, capable of handling large-scale problems. Experiments on text document classification show that the new model outperforms other supervised dimensionality reduction methods, while simulations on unsupervised clustering show that our probabilistic formulation has better properties than existing discriminative clustering methods. 1

4 0.087674826 284 nips-2010-Variational bounds for mixed-data factor analysis

Author: Mohammad E. Khan, Guillaume Bouchard, Kevin P. Murphy, Benjamin M. Marlin

Abstract: We propose a new variational EM algorithm for fitting factor analysis models with mixed continuous and categorical observations. The algorithm is based on a simple quadratic bound to the log-sum-exp function. In the special case of fully observed binary data, the bound we propose is significantly faster than previous variational methods. We show that EM is significantly more robust in the presence of missing data compared to treating the latent factors as parameters, which is the approach used by exponential family PCA and other related matrix-factorization methods. A further benefit of the variational approach is that it can easily be extended to the case of mixtures of factor analyzers, as we show. We present results on synthetic and real data sets demonstrating several desirable properties of our proposed method. 1

5 0.06825427 283 nips-2010-Variational Inference over Combinatorial Spaces

Author: Alexandre Bouchard-côté, Michael I. Jordan

Abstract: Since the discovery of sophisticated fully polynomial randomized algorithms for a range of #P problems [1, 2, 3], theoretical work on approximate inference in combinatorial spaces has focused on Markov chain Monte Carlo methods. Despite their strong theoretical guarantees, the slow running time of many of these randomized algorithms and the restrictive assumptions on the potentials have hindered the applicability of these algorithms to machine learning. Because of this, in applications to combinatorial spaces simple exact models are often preferred to more complex models that require approximate inference [4]. Variational inference would appear to provide an appealing alternative, given the success of variational methods for graphical models [5]; unfortunately, however, it is not obvious how to develop variational approximations for combinatorial objects such as matchings, partial orders, plane partitions and sequence alignments. We propose a new framework that extends variational inference to a wide range of combinatorial spaces. Our method is based on a simple assumption: the existence of a tractable measure factorization, which we show holds in many examples. Simulations on a range of matching models show that the algorithm is more general and empirically faster than a popular fully polynomial randomized algorithm. We also apply the framework to the problem of multiple alignment of protein sequences, obtaining state-of-the-art results on the BAliBASE dataset [6]. 1

6 0.064385213 166 nips-2010-Minimum Average Cost Clustering

7 0.063983664 263 nips-2010-Switching state space model for simultaneously estimating state transitions and nonstationary firing rates

8 0.059066318 108 nips-2010-Graph-Valued Regression

9 0.055774912 194 nips-2010-Online Learning for Latent Dirichlet Allocation

10 0.055414621 276 nips-2010-Tree-Structured Stick Breaking for Hierarchical Data

11 0.052924749 33 nips-2010-Approximate inference in continuous time Gaussian-Jump processes

12 0.050940841 268 nips-2010-The Neural Costs of Optimal Control

13 0.049798775 32 nips-2010-Approximate Inference by Compilation to Arithmetic Circuits

14 0.042579826 135 nips-2010-Label Embedding Trees for Large Multi-Class Tasks

15 0.041028623 58 nips-2010-Decomposing Isotonic Regression for Efficiently Solving Large Problems

16 0.039857171 62 nips-2010-Discriminative Clustering by Regularized Information Maximization

17 0.038450092 163 nips-2010-Lower Bounds on Rate of Convergence of Cutting Plane Methods

18 0.036562879 266 nips-2010-The Maximal Causes of Natural Scenes are Edge Filters

19 0.035722036 275 nips-2010-Transduction with Matrix Completion: Three Birds with One Stone

20 0.03508608 273 nips-2010-Towards Property-Based Classification of Clustering Paradigms


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.11), (1, 0.024), (2, 0.024), (3, 0.021), (4, -0.106), (5, 0.011), (6, 0.024), (7, 0.047), (8, 0.071), (9, 0.036), (10, -0.012), (11, -0.018), (12, 0.093), (13, -0.024), (14, 0.001), (15, -0.052), (16, -0.025), (17, 0.076), (18, 0.054), (19, -0.065), (20, 0.03), (21, -0.057), (22, -0.006), (23, 0.079), (24, 0.001), (25, -0.078), (26, -0.128), (27, 0.174), (28, -0.151), (29, -0.063), (30, 0.021), (31, 0.003), (32, 0.026), (33, 0.0), (34, 0.129), (35, 0.04), (36, 0.061), (37, -0.015), (38, 0.012), (39, 0.091), (40, -0.047), (41, 0.058), (42, 0.114), (43, 0.036), (44, 0.066), (45, -0.05), (46, -0.049), (47, -0.02), (48, 0.013), (49, -0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95126152 90 nips-2010-Fast Large-scale Mixture Modeling with Component-specific Data Partitions

Author: Bo Thiesson, Chong Wang

Abstract: Remarkably easy implementation and guaranteed convergence has made the EM algorithm one of the most used algorithms for mixture modeling. On the downside, the E-step is linear in both the sample size and the number of mixture components, making it impractical for large-scale data. Based on the variational EM framework, we propose a fast alternative that uses component-specific data partitions to obtain a sub-linear E-step in sample size, while the algorithm still maintains provable convergence. Our approach builds on previous work, but is significantly faster and scales much better in the number of mixture components. We demonstrate this speedup by experiments on large-scale synthetic and real data. 1

2 0.68856919 164 nips-2010-MAP Estimation for Graphical Models by Likelihood Maximization

Author: Akshat Kumar, Shlomo Zilberstein

Abstract: Computing a maximum a posteriori (MAP) assignment in graphical models is a crucial inference problem for many practical applications. Several provably convergent approaches have been successfully developed using linear programming (LP) relaxation of the MAP problem. We present an alternative approach, which transforms the MAP problem into that of inference in a mixture of simple Bayes nets. We then derive the Expectation Maximization (EM) algorithm for this mixture that also monotonically increases a lower bound on the MAP assignment until convergence. The update equations for the EM algorithm are remarkably simple, both conceptually and computationally, and can be implemented using a graph-based message passing paradigm similar to max-product computation. Experiments on the real-world protein design dataset show that EM’s convergence rate is significantly higher than the previous LP relaxation based approach MPLP. EM also achieves a solution quality within 95% of optimal for most instances. 1

3 0.5545122 284 nips-2010-Variational bounds for mixed-data factor analysis

Author: Mohammad E. Khan, Guillaume Bouchard, Kevin P. Murphy, Benjamin M. Marlin

Abstract: We propose a new variational EM algorithm for fitting factor analysis models with mixed continuous and categorical observations. The algorithm is based on a simple quadratic bound to the log-sum-exp function. In the special case of fully observed binary data, the bound we propose is significantly faster than previous variational methods. We show that EM is significantly more robust in the presence of missing data compared to treating the latent factors as parameters, which is the approach used by exponential family PCA and other related matrix-factorization methods. A further benefit of the variational approach is that it can easily be extended to the case of mixtures of factor analyzers, as we show. We present results on synthetic and real data sets demonstrating several desirable properties of our proposed method. 1

4 0.54951811 263 nips-2010-Switching state space model for simultaneously estimating state transitions and nonstationary firing rates

Author: Ken Takiyama, Masato Okada

Abstract: We propose an algorithm for simultaneously estimating state transitions among neural states, the number of neural states, and nonstationary firing rates using a switching state space model (SSSM). This algorithm enables us to detect state transitions on the basis of not only the discontinuous changes of mean firing rates but also discontinuous changes in temporal profiles of firing rates, e.g., temporal correlation. We construct a variational Bayes algorithm for a non-Gaussian SSSM whose non-Gaussian property is caused by binary spike events. Synthetic data analysis reveals that our algorithm has the high performance for estimating state transitions, the number of neural states, and nonstationary firing rates compared to previous methods. We also analyze neural data that were recorded from the medial temporal area. The statistically detected neural states probably coincide with transient and sustained states that have been detected heuristically. Estimated parameters suggest that our algorithm detects the state transition on the basis of discontinuous changes in the temporal correlation of firing rates, which transitions previous methods cannot detect. This result suggests that our algorithm is advantageous in real-data analysis.

5 0.53901666 283 nips-2010-Variational Inference over Combinatorial Spaces

Author: Alexandre Bouchard-côté, Michael I. Jordan

Abstract: Since the discovery of sophisticated fully polynomial randomized algorithms for a range of #P problems [1, 2, 3], theoretical work on approximate inference in combinatorial spaces has focused on Markov chain Monte Carlo methods. Despite their strong theoretical guarantees, the slow running time of many of these randomized algorithms and the restrictive assumptions on the potentials have hindered the applicability of these algorithms to machine learning. Because of this, in applications to combinatorial spaces simple exact models are often preferred to more complex models that require approximate inference [4]. Variational inference would appear to provide an appealing alternative, given the success of variational methods for graphical models [5]; unfortunately, however, it is not obvious how to develop variational approximations for combinatorial objects such as matchings, partial orders, plane partitions and sequence alignments. We propose a new framework that extends variational inference to a wide range of combinatorial spaces. Our method is based on a simple assumption: the existence of a tractable measure factorization, which we show holds in many examples. Simulations on a range of matching models show that the algorithm is more general and empirically faster than a popular fully polynomial randomized algorithm. We also apply the framework to the problem of multiple alignment of protein sequences, obtaining state-of-the-art results on the BAliBASE dataset [6]. 1

6 0.50456512 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models

7 0.47957185 165 nips-2010-MAP estimation in Binary MRFs via Bipartite Multi-cuts

8 0.44068024 106 nips-2010-Global Analytic Solution for Variational Bayesian Matrix Factorization

9 0.3498235 288 nips-2010-Worst-case bounds on the quality of max-product fixed-points

10 0.34060249 194 nips-2010-Online Learning for Latent Dirichlet Allocation

11 0.32839486 55 nips-2010-Cross Species Expression Analysis using a Dirichlet Process Mixture Model with Latent Matchings

12 0.32786942 268 nips-2010-The Neural Costs of Optimal Control

13 0.32647547 29 nips-2010-An Approximate Inference Approach to Temporal Optimization in Optimal Control

14 0.32413742 171 nips-2010-Movement extraction by detecting dynamics switches and repetitions

15 0.31011763 62 nips-2010-Discriminative Clustering by Regularized Information Maximization

16 0.30768123 139 nips-2010-Latent Variable Models for Predicting File Dependencies in Large-Scale Software Development

17 0.30711564 58 nips-2010-Decomposing Isotonic Regression for Efficiently Solving Large Problems

18 0.30290934 33 nips-2010-Approximate inference in continuous time Gaussian-Jump processes

19 0.30118707 223 nips-2010-Rates of convergence for the cluster tree

20 0.30084094 84 nips-2010-Exact inference and learning for cumulative distribution functions on loopy graphs


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(13, 0.06), (17, 0.01), (27, 0.082), (30, 0.082), (35, 0.011), (45, 0.2), (50, 0.047), (52, 0.019), (53, 0.275), (60, 0.028), (77, 0.05), (78, 0.012), (90, 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.84117043 226 nips-2010-Repeated Games against Budgeted Adversaries

Author: Jacob D. Abernethy, Manfred K. Warmuth

Abstract: We study repeated zero-sum games against an adversary on a budget. Given that an adversary has some constraint on the sequence of actions that he plays, we consider what ought to be the player’s best mixed strategy with knowledge of this budget. We show that, for a general class of normal-form games, the minimax strategy is indeed efficiently computable and relies on a “random playout” technique. We give three diverse applications of this new algorithmic template: a cost-sensitive “Hedge” setting, a particular problem in Metrical Task Systems, and the design of combinatorial prediction markets. 1

2 0.8065083 2 nips-2010-A Bayesian Approach to Concept Drift

Author: Stephen Bach, Mark Maloof

Abstract: To cope with concept drift, we placed a probability distribution over the location of the most-recent drift point. We used Bayesian model comparison to update this distribution from the predictions of models trained on blocks of consecutive observations and pruned potential drift points with low probability. We compare our approach to a non-probabilistic method for drift and a probabilistic method for change-point detection. In our experiments, our approach generally yielded improved accuracy and/or speed over these other methods. 1

same-paper 3 0.77929199 90 nips-2010-Fast Large-scale Mixture Modeling with Component-specific Data Partitions

Author: Bo Thiesson, Chong Wang

Abstract: Remarkably easy implementation and guaranteed convergence has made the EM algorithm one of the most used algorithms for mixture modeling. On the downside, the E-step is linear in both the sample size and the number of mixture components, making it impractical for large-scale data. Based on the variational EM framework, we propose a fast alternative that uses component-specific data partitions to obtain a sub-linear E-step in sample size, while the algorithm still maintains provable convergence. Our approach builds on previous work, but is significantly faster and scales much better in the number of mixture components. We demonstrate this speedup by experiments on large-scale synthetic and real data. 1

4 0.76182812 208 nips-2010-Policy gradients in linearly-solvable MDPs

Author: Emanuel Todorov

Abstract: We present policy gradient results within the framework of linearly-solvable MDPs. For the first time, compatible function approximators and natural policy gradients are obtained by estimating the cost-to-go function, rather than the (much larger) state-action advantage function as is necessary in traditional MDPs. We also develop the first compatible function approximators and natural policy gradients for continuous-time stochastic systems.

5 0.75450796 163 nips-2010-Lower Bounds on Rate of Convergence of Cutting Plane Methods

Author: Xinhua Zhang, Ankan Saha, S.v.n. Vishwanathan

Abstract: In a recent paper Joachims [1] presented SVM-Perf, a cutting plane method (CPM) for training linear Support Vector Machines (SVMs) which converges to an accurate solution in O(1/ 2 ) iterations. By tightening the analysis, Teo et al. [2] showed that O(1/ ) iterations suffice. Given the impressive convergence speed of CPM on a number of practical problems, it was conjectured that these rates could be further improved. In this paper we disprove this conjecture. We present counter examples which are not only applicable for training linear SVMs with hinge loss, but also hold for support vector methods which optimize a multivariate performance score. However, surprisingly, these problems are not inherently hard. By exploiting the structure of the objective function we can devise an algo√ rithm that converges in O(1/ ) iterations. 1

6 0.66637594 196 nips-2010-Online Markov Decision Processes under Bandit Feedback

7 0.66616124 180 nips-2010-Near-Optimal Bayesian Active Learning with Noisy Observations

8 0.66600299 38 nips-2010-Batch Bayesian Optimization via Simulation Matching

9 0.66531134 179 nips-2010-Natural Policy Gradient Methods with Parameter-based Exploration for Control Tasks

10 0.66194141 155 nips-2010-Learning the context of a category

11 0.65704173 64 nips-2010-Distributionally Robust Markov Decision Processes

12 0.65601701 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing

13 0.65514284 160 nips-2010-Linear Complementarity for Regularized Policy Evaluation and Improvement

14 0.65469211 63 nips-2010-Distributed Dual Averaging In Networks

15 0.65394211 117 nips-2010-Identifying graph-structured activation patterns in networks

16 0.65374631 268 nips-2010-The Neural Costs of Optimal Control

17 0.65369153 51 nips-2010-Construction of Dependent Dirichlet Processes based on Poisson Processes

18 0.65334386 194 nips-2010-Online Learning for Latent Dirichlet Allocation

19 0.65200585 4 nips-2010-A Computational Decision Theory for Interactive Assistants

20 0.65159321 85 nips-2010-Exact learning curves for Gaussian process regression on large random graphs