nips nips2012 nips2012-320 knowledge-graph by maker-knowledge-mining

320 nips-2012-Spectral Learning of General Weighted Automata via Constrained Matrix Completion


Source: pdf

Author: Borja Balle, Mehryar Mohri

Abstract: Many tasks in text and speech processing and computational biology require estimating functions mapping strings to real numbers. A broad class of such functions can be defined by weighted automata. Spectral methods based on the singular value decomposition of a Hankel matrix have been recently proposed for learning a probability distribution represented by a weighted automaton from a training sample drawn according to this same target distribution. In this paper, we show how spectral methods can be extended to the problem of learning a general weighted automaton from a sample generated by an arbitrary distribution. The main obstruction to this approach is that, in general, some entries of the Hankel matrix may be missing. We present a solution to this problem based on solving a constrained matrix completion problem. Combining these two ingredients, matrix completion and spectral method, a whole new family of algorithms for learning general weighted automata is obtained. We present generalization bounds for a particular algorithm in this family. The proofs rely on a joint stability analysis of matrix completion and spectral learning. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract Many tasks in text and speech processing and computational biology require estimating functions mapping strings to real numbers. [sent-5, score-0.315]

2 A broad class of such functions can be defined by weighted automata. [sent-6, score-0.162]

3 Spectral methods based on the singular value decomposition of a Hankel matrix have been recently proposed for learning a probability distribution represented by a weighted automaton from a training sample drawn according to this same target distribution. [sent-7, score-0.516]

4 In this paper, we show how spectral methods can be extended to the problem of learning a general weighted automaton from a sample generated by an arbitrary distribution. [sent-8, score-0.555]

5 The main obstruction to this approach is that, in general, some entries of the Hankel matrix may be missing. [sent-9, score-0.148]

6 We present a solution to this problem based on solving a constrained matrix completion problem. [sent-10, score-0.385]

7 Combining these two ingredients, matrix completion and spectral method, a whole new family of algorithms for learning general weighted automata is obtained. [sent-11, score-0.963]

8 The proofs rely on a joint stability analysis of matrix completion and spectral learning. [sent-13, score-0.636]

9 A broad class of such functions can be defined by weighted automata. [sent-15, score-0.162]

10 The mathematical and algorithmic properties of weighted automata have been extensively studied in the most general setting where they are defined in terms of an arbitrary semiring [28, 9, 23]. [sent-16, score-0.487]

11 Weighted automata are widely used in applications ranging from natural text and speech processing [24] to optical character recognition [12] and image processing [1]. [sent-17, score-0.306]

12 This paper addresses the problem of learning weighted automata from a finite set of labeled examples. [sent-18, score-0.408]

13 The particular instance of this problem where the objective is to learn a probabilistic automaton from examples drawn from this same distribution has recently drawn much attention: starting with the seminal work of Hsu et al. [sent-19, score-0.242]

14 [19], the so-called spectral method has proven to be a valuable tool in developing novel and theoretically-sound algorithms for learning HMMs and other related classes of distributions [5, 30, 31, 10, 6, 4]. [sent-20, score-0.194]

15 The spectral term takes its origin from the use of a singular value decomposition in solving those equations. [sent-25, score-0.26]

16 Indeed, in general, there seems to be a large gap separating the scenario of learning a probabilistic automaton using data drawn according to the distribution it generates, from that of learning an arbitrary weighted automaton from labeled data drawn from some unknown distribution. [sent-27, score-0.622]

17 In contrast, the latter involves two distinct objects: a distribution according to which strings are drawn, and a target weighted automaton assigning labels to these strings. [sent-29, score-0.602]

18 It is not difficult in this setting to conceive that, for a particular target, an adversary could find a distribution over strings making the learner’s task insurmountably difficult. [sent-30, score-0.257]

19 In fact, this is the core idea behind the cryptography-based hardness results for learning deterministic finite automata given by Kearns and Valiant [20] – these same results apply to our setting as well. [sent-31, score-0.267]

20 But, even in cases where the distribution “cooperates,” there is still an obstruction in leveraging the spectral method for learning general weighted automata. [sent-32, score-0.35]

21 The statistics used by the spectral method are essentially the probabilities assigned by the target distribution to each string in some fixed finite set B. [sent-33, score-0.355]

22 Thus, it can be safely assumed that the probability of any string from B not present in the sample is zero. [sent-35, score-0.114]

23 When learning arbitrary weighted automata, however, the value assigned by the target to an unseen string is unknown. [sent-36, score-0.322]

24 Furthermore, one cannot expect that a sample would contain the values of the target function for all the strings in B. [sent-37, score-0.304]

25 This observation raises the question of whether it is possible at all to apply the spectral method in a setting with missing data, or, alternatively, whether there is a principled way to “estimate” this missing information and then apply the spectral method. [sent-38, score-0.442]

26 As it turns out, the latter approach can be naturally formulated as a constrained matrix completion problem. [sent-39, score-0.385]

27 When applying the spectral method, the (approximate) values of the target on B are arranged in a matrix H. [sent-40, score-0.314]

28 Thus, the main difference between the two settings can be restated as follows: when learning a weighted automaton representing a distribution, unknown entries of H can be filled in with zeros, while in the general setting there is a priori no straightforward method to fill in the missing values. [sent-41, score-0.384]

29 We propose to use a matrix completion algorithm for solving this last problem. [sent-42, score-0.342]

30 In particular, since H is a Hankel matrix whose entries must satisfy some equality constraints, it turns out that the problem of learning weighted automata under an arbitrary distribution leads to what we call the Hankel matrix completion problem. [sent-43, score-0.882]

31 This is essentially a constrained matrix completion problem where entries of valid hypotheses need to satisfy a set of equalities. [sent-44, score-0.424]

32 Since the set of valid hypotheses for our constrained matrix completion problem is convex, many of these algorithms could also be modified to deal with the Hankel matrix completion problem. [sent-49, score-0.727]

33 In summary, our approach leverages two recent techniques for learning a general weighted automaton: matrix completion and spectral learning. [sent-50, score-0.656]

34 It consists of first predicting the missing entries in H and then applying the spectral method to the resulting matrix. [sent-51, score-0.26]

35 Altogether, this yields a family of algorithms parametrized by the choice of the specific Hankel matrix completion algorithm used. [sent-52, score-0.382]

36 These algorithms are designed for learning an arbitrary weighted automaton from samples generated by an unknown distribution over strings and labels. [sent-53, score-0.638]

37 We study a special instance of this family of algorithms and prove generalization guarantees for its performance based on a stability analysis, under mild conditions on the distribution. [sent-54, score-0.179]

38 In Section 3, we describe a family of algorithms for learning general weighted automata by combining constrained matrix completion and spectral methods. [sent-58, score-1.006]

39 The set of all strings over Σ is denoted by Σ∗ and the length of a string x is denoted by |x|. [sent-73, score-0.371]

40 For any n ≥ 0, Σ≤n denotes the set of all strings of length at most n. [sent-74, score-0.257]

41 Given two sets of strings P, S ⊆ Σ∗, we denote by PS the set of all strings uv obtained by concatenation of a string u ∈ P and a string v ∈ S. [sent-75, score-0.803]

42 A set of strings P is called Σ-complete when P = P′Σ for some set P′. [sent-76, score-0.257]

43 The Hankel matrix H ∈ RP×S associated to a function h ∈ H is the matrix whose entries are defined by H(u, v) = h(uv) for all u ∈ P and v ∈ S. [sent-82, score-0.185]
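To make this definition concrete, the following minimal Python sketch builds a finite Hankel block H(u, v) = h(uv) over a basis (P, S); the function name and the example basis are ours, used only for illustration.

```python
import numpy as np

def hankel_block(h, prefixes, suffixes):
    """Finite Hankel sub-block over a basis B = (P, S): entry (u, v) holds h(uv)."""
    return np.array([[h(u + v) for v in suffixes] for u in prefixes])

# Example usage over Σ = {'a', 'b'}, with the empty string '' as a prefix and suffix:
# H = hankel_block(f, ['', 'a', 'b'], ['', 'a', 'b'])
```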

44 2 Weighted finite automata A widely used class of functions mapping strings to real numbers is that of functions defined by weighted finite automata (WFA) or in short weighted automata [23]. [sent-96, score-1.336]

45 A WFA over Σ with n states can be defined as a tuple A = ⟨α, β, {Aa}a∈Σ⟩, where α, β ∈ Rn are the initial and final weight vectors, and Aa ∈ Rn×n is the transition matrix associated with each alphabet symbol a ∈ Σ. [sent-98, score-0.14]

46 The function fA realized by a WFA A is defined by fA(x) = α⊤ Ax1 · · · Axt β, for any string x = x1 · · · xt ∈ Σ∗ with t = |x| and xi ∈ Σ for all i ∈ [1, t]. [sent-99, score-0.114]

47 This property is convenient to bound the maximum value assigned by a WFA to any string of a given length. [sent-101, score-0.137]

48 Figure 1: Example of a weighted automaton over Σ = {a, b} with 2 states: (a) graph representation; (b) algebraic representation, with α = [1/2, 1/2]⊤, β = [1, −1]⊤, Aa = [[3/4, 0], [0, 1/3]], and Ab = [[6/5, 2/3], [3/4, 1]]. [sent-102, score-0.298]
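As an illustration of the definition fA(x) = α⊤ Ax1 · · · Axt β, a short Python sketch evaluating the two-state automaton of Figure 1 on a string; the matrix values are those read off the figure's algebraic representation, and the function name is ours.

```python
import numpy as np

# Weighted automaton of Figure 1 over Σ = {a, b} with 2 states.
alpha = np.array([0.5, 0.5])                       # initial weight vector
beta = np.array([1.0, -1.0])                       # final weight vector
A = {"a": np.array([[0.75, 0.0], [0.0, 1.0 / 3.0]]),
     "b": np.array([[1.2, 2.0 / 3.0], [0.75, 1.0]])}

def f_A(x):
    """Value assigned by the WFA to the string x: alpha^T A_{x1} ... A_{xt} beta."""
    v = alpha
    for symbol in x:
        v = v @ A[symbol]          # left-multiply by the transition matrix of each symbol
    return float(v @ beta)

print(f_A("ab"))                   # the value the automaton assigns to the string "ab"
```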

49 WFAs can be more generally defined over an arbitrary semiring instead of the field of real numbers and are also known as multiplicity automata (e. [sent-103, score-0.361]

50 To any function f : Σ∗ → R, we can associate its Hankel matrix Hf ∈ RΣ∗×Σ∗ with entries defined by Hf(u, v) = f(uv). [sent-106, score-0.112]

51 Carlyle and Paz [15] and Fliess [17] gave the following characterization of the set of functions f in RΣ∗ defined by a WFA in terms of the rank of their Hankel matrix rank(Hf). [sent-108, score-0.127]

52 Since finite sub-blocks of a Hankel matrix cannot have a larger rank than its bi-infinite extension, this justifies the use of a low-rank-enforcing regularization in the definition of a Hankel matrix completion. [sent-111, score-0.181]

53 Note that deterministic finite automata (DFA) with n states can be represented by a WFA with at most n states. [sent-112, score-0.297]

54 The following is the Hankel matrix of A on this basis, shown with three-digit precision and with rows and columns indexed by strings such as a, b, aa, ab, ba, bb (the numerical entries are omitted in this extract). [sent-119, score-0.222]

55 By Theorem 1, the Hankel matrix of A has rank at most 2. [sent-141, score-0.108]

56 Given HB, the spectral method described in [19] can be used to recover a WFA Â equivalent to A, in the sense that A and Â compute the same function. [sent-142, score-0.194]

57 In general, one may be given a sample of strings labeled using some WFA that does not contain enough information to fully specify a Hankel matrix over a complete basis. [sent-143, score-0.351]

58 In that case, Theorem 1 motivates the use of a low-rank matrix completion algorithm to fill in the missing entries in HB prior to the application of the spectral method. [sent-144, score-0.602]

59 3 The HMC+SM Algorithm In this section we describe our algorithm HMC+SM for learning weighted automata. [sent-146, score-0.12]

60 . . . , zm) containing m examples zi = (xi, yi) ∈ Σ∗ × R. (Footnote: The construction of an equivalent WFA with the minimal number of states from a given WFA was first given by Schützenberger [29].) [sent-150, score-0.123]

61 In the first stage, a constrained matrix completion algorithm with input Z and regularization parameter τ is used to return a Hankel matrix HZ ∈ HB . [sent-158, score-0.458]

62 In the second stage, the spectral method is applied to HZ to compute a WFA AZ with n states. [sent-159, score-0.194]

63 In particular, by combining the spectral method with any algorithm for solving the Hankel matrix completion problem, one can derive a new algorithm for learning WFAs. [sent-162, score-0.536]

64 For concreteness, in the following, we will only consider the Hankel matrix completion algorithm described in Section 3. [sent-163, score-0.342]

65 Through its parametrization by a number 1 ≤ p ≤ ∞ and a convex loss ℓ : R × R → R+, this completion algorithm already gives rise to a family of learning algorithms that we denote by HMCp,ℓ+SM. [sent-165, score-0.362]

66 However, it is important to keep in mind that for each existing matrix completion algorithm that can be modified to solve the Hankel matrix completion problem, a new algorithm for learning WFAs can be obtained via the general scheme we describe below. [sent-166, score-0.684]

67 1 Hankel Matrix Completion We now describe our Hankel matrix completion algorithm. [sent-168, score-0.342]

68 Given a basis B = (P, S) of Σ∗ and a sample Z over Σ∗ × R, the algorithm solves a convex optimization problem and returns a matrix HZ ∈ HB. [sent-169, score-0.146]

69 For each string x ∈ PS, fix a pair of coordinate vectors (ux, vx) ∈ RP × RS such that ux⊤ H vx = H(x) for any H ∈ H. [sent-177, score-0.202]

70 That is, ux and vx are coordinate vectors corresponding respectively to a prefix u ∈ P and a suffix v ∈ S, and such that uv = x. [sent-178, score-0.149]
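To make the constrained completion concrete, here is a small sketch of one instance of the problem, taking p = 1 (nuclear norm) and the squared loss. It uses cvxpy, encodes the Hankel equality constraints through the coordinate pairs described above, and all function and variable names are our own illustration rather than the paper's implementation.

```python
import cvxpy as cp

def hankel_completion(prefixes, suffixes, sample, tau):
    """Nuclear-norm instance of Hankel matrix completion over a basis (P, S).
    `sample` is a list of (string, value) pairs; returns a completed Hankel block."""
    p_idx = {u: i for i, u in enumerate(prefixes)}
    s_idx = {v: j for j, v in enumerate(suffixes)}
    H = cp.Variable((len(prefixes), len(suffixes)))

    # Hankel equality constraints: all cells (u, v) with the same concatenation uv agree.
    first_cell, constraints = {}, []
    for u in prefixes:
        for v in suffixes:
            x = u + v
            if x in first_cell:
                i0, j0 = first_cell[x]
                constraints.append(H[p_idx[u], s_idx[v]] == H[i0, j0])
            else:
                first_cell[x] = (p_idx[u], s_idx[v])

    # Empirical squared loss on the sampled strings that fall inside PS.
    observed = [(first_cell[x], y) for x, y in sample if x in first_cell]
    loss = sum(cp.square(H[i, j] - y) for (i, j), y in observed) / max(len(observed), 1)

    cp.Problem(cp.Minimize(tau * cp.normNuc(H) + loss), constraints).solve()
    return H.value
```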

71 The fact that there may not be a true underlying Hankel matrix makes it somewhat close to the agnostic setting in [18], where matrix completion is also applied under arbitrary distributions. [sent-183, score-0.502]

72 Nonetheless, it is also possible to consider other learning frameworks for WFAs where algorithms for exact matrix completion [14, 27] or noisy matrix completion [13] may be useful. [sent-184, score-0.704]

73 Furthermore, since most algorithms in the literature of matrix completion are based on convex optimization problems, it is likely that most of them can be adapted to solve constrained matrix completion problems such as the one we discuss here. [sent-185, score-0.493]

74 2 Spectral Method for General WFA Here, we describe how the spectral method can be applied to HZ to obtain a WFA. [sent-187, score-0.194]

75 We use the same notation as in [7] and a version of the spectral method working with an arbitrary basis (as in [5, 4, 7]), in contrast to versions restricted to P = Σ≤2 and S = Σ like [19]. [sent-188, score-0.298]

76 Using this notation, the spectral method can be described as follows. [sent-197, score-0.194]

77 Then, using the right singular vectors Vn of H, the next step consists of computing a weighted automaton AZ = ⟨α, β, {Aa}⟩ as follows: α⊤ = h⊤λ,S Vn, β = (H Vn)+ hP,λ, Aa = (H Vn)+ Ha Vn. (SM) [sent-200, score-0.364]

78 The fact that the spectral method is based on a singular value decomposition justifies in part the use of a Schatten p-norm as a regularizer in (HMC-H). [sent-201, score-0.26]
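The (SM) equations translate almost line for line into code. The following numpy sketch assumes the completed Hankel block H (rows indexed by P, columns by S), the shifted blocks Ha whose (u, v) entry is the value on the string u·a·v, and the vectors hλ,S (row of the empty prefix) and hP,λ (column of the empty suffix) are already available as dense arrays; all names are ours.

```python
import numpy as np

def spectral_wfa(H, Ha, h_lambda_S, h_P_lambda, n):
    """Apply the (SM) equations to a (completed) Hankel block.
    H: |P| x |S| block; Ha: dict mapping each symbol a to its shifted |P| x |S| block."""
    _, _, Vt = np.linalg.svd(H, full_matrices=False)
    Vn = Vt[:n].T                          # top-n right singular vectors, shape |S| x n
    pinv = np.linalg.pinv(H @ Vn)          # (H Vn)^+, shape n x |P|
    alpha = Vn.T @ h_lambda_S              # alpha^T = h_{lambda,S}^T Vn
    beta = pinv @ h_P_lambda               # beta = (H Vn)^+ h_{P,lambda}
    A = {a: pinv @ Ha_block @ Vn for a, Ha_block in Ha.items()}
    return alpha, beta, A
```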

79 We give a stability analysis for a special instance of this family of algorithms and use it to derive a generalization bound. [sent-208, score-0.179]

80 The probability that a string x ∼ DΣ belongs to PS is denoted by π = DΣ (PS). [sent-215, score-0.114]

81 We define: σ = E_{Z∼D^m}[σn(H)] and ρ = E_{Z∼D^m}[σn(H)² − σn+1(H)²], where σn(M) denotes the nth singular value of matrix M. [sent-223, score-0.161]
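For intuition, both quantities inside the expectations can be read off the singular values of a given block; a minimal sketch (function name ours):

```python
import numpy as np

def sigma_and_gap(H, n):
    """Return sigma_n(H) and the gap sigma_n(H)^2 - sigma_{n+1}(H)^2.
    Singular values are 1-indexed here, so this requires n < min(H.shape)."""
    s = np.linalg.svd(H, compute_uv=False)   # sorted in non-increasing order
    return s[n - 1], s[n - 1] ** 2 - s[n] ** 2
```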

82 In contrast to previous learning results based on the spectral method, our bound holds in an agnostic setting. [sent-225, score-0.283]

83 6 Second, we assume that the strings generated by the distribution will not be too long. [sent-230, score-0.279]

84 In particular, we assume that the length of the strings generated by DΣ follows a distribution whose tail is slightly lighter than sub-exponential. [sent-231, score-0.304]

85 This seems to be a limitation common to all known learnability proofs based on the spectral method. [sent-241, score-0.194]

86 The proof of this theorem is based on an algorithmic stability analysis. [sent-249, score-0.129]

87 examples drawn from D, and Z′ differing from Z by just one point: say zm in Z = (z1, . . . [sent-253, score-0.125]

88 The new example z′m is an arbitrary point in the support of D. [sent-260, score-0.134]

89 The first step in the analysis is to bound the stability of the matrix completion algorithm. [sent-262, score-0.465]

90 This is done in the following lemma, which gives a sample-dependent and a sample-independent bound for the stability of H. [sent-263, score-0.123]

91 The standard method for deriving generalization bounds from algorithmic stability results could be applied here to obtain a generalization bound for our Hankel matrix completion algorithm. [sent-266, score-0.625]

92 Using the bound on the Frobenius norm ‖H − H′‖F, we are able to analyze the stability of σn(H), σn(H)² − σn+1(H)², and Vn using well-known results on the stability of singular values and singular vectors. [sent-268, score-0.355]

93 Furthermore, to obtain the desired bounds we need to extend the usual tools for analyzing spectral methods to the current setting. [sent-277, score-0.224]

94 Once all this is achieved, it remains to combine these new tools to show an algorithmic stability result for HMCp,ℓ+SM. [sent-280, score-0.129]

95 5 Conclusion We described a new algorithmic solution for learning arbitrary weighted automata from a sample of labeled strings drawn from an unknown distribution. [sent-290, score-0.787]

96 Our approach combines an algorithm for constrained matrix completion with the recently developed spectral learning methods for learning probabilistic automata. [sent-291, score-0.579]

97 Using our general scheme, a broad family of algorithms for learning weighted automata can be obtained. [sent-292, score-0.45]

98 We gave a stability analysis of a particular algorithm in that family and used it to prove generalization bounds that hold for all distributions satisfying two reasonable assumptions. [sent-293, score-0.209]

99 Local loss optimization in operator models: A new insight into spectral learning. [sent-344, score-0.212]

100 Learning with the weighted trace-norm under arbitrary sampling distributions. [sent-417, score-0.161]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('hankel', 0.482), ('wfa', 0.382), ('completion', 0.269), ('automata', 0.267), ('strings', 0.257), ('fz', 0.207), ('spectral', 0.194), ('automaton', 0.178), ('hz', 0.164), ('weighted', 0.12), ('ps', 0.118), ('string', 0.114), ('aa', 0.111), ('stability', 0.1), ('hb', 0.098), ('zm', 0.093), ('az', 0.093), ('hmcp', 0.091), ('wfas', 0.091), ('balle', 0.08), ('vn', 0.077), ('matrix', 0.073), ('sm', 0.068), ('hmc', 0.066), ('singular', 0.066), ('schatten', 0.064), ('hf', 0.063), ('uv', 0.061), ('rz', 0.056), ('ux', 0.056), ('mohri', 0.054), ('boots', 0.048), ('siddiqi', 0.048), ('target', 0.047), ('agnostic', 0.046), ('rp', 0.044), ('constrained', 0.043), ('hsu', 0.042), ('arbitrary', 0.041), ('family', 0.04), ('quattoni', 0.04), ('speech', 0.039), ('generalization', 0.039), ('entries', 0.039), ('hp', 0.038), ('basis', 0.038), ('alphabet', 0.037), ('carlyle', 0.036), ('catalunya', 0.036), ('faz', 0.036), ('hak', 0.036), ('hvx', 0.036), ('obstruction', 0.036), ('convex', 0.035), ('rank', 0.035), ('matrices', 0.035), ('fa', 0.033), ('jacm', 0.032), ('vx', 0.032), ('borja', 0.032), ('drawn', 0.032), ('ha', 0.032), ('mehryar', 0.03), ('semiring', 0.03), ('bounds', 0.03), ('states', 0.03), ('algorithmic', 0.029), ('mcdiarmid', 0.028), ('handbook', 0.027), ('missing', 0.027), ('notation', 0.025), ('tail', 0.025), ('anandkumar', 0.024), ('frobenius', 0.023), ('bound', 0.023), ('deriving', 0.023), ('foster', 0.023), ('multiplicity', 0.023), ('rational', 0.023), ('broad', 0.023), ('nth', 0.022), ('generated', 0.022), ('nite', 0.022), ('ingredients', 0.022), ('cx', 0.022), ('candes', 0.021), ('pre', 0.021), ('kearns', 0.021), ('hidden', 0.021), ('labeled', 0.021), ('frameworks', 0.02), ('holds', 0.02), ('lemma', 0.02), ('unknown', 0.02), ('rs', 0.019), ('functions', 0.019), ('ln', 0.019), ('formed', 0.019), ('un', 0.019), ('loss', 0.018)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 320 nips-2012-Spectral Learning of General Weighted Automata via Constrained Matrix Completion

Author: Borja Balle, Mehryar Mohri

Abstract: Many tasks in text and speech processing and computational biology require estimating functions mapping strings to real numbers. A broad class of such functions can be defined by weighted automata. Spectral methods based on the singular value decomposition of a Hankel matrix have been recently proposed for learning a probability distribution represented by a weighted automaton from a training sample drawn according to this same target distribution. In this paper, we show how spectral methods can be extended to the problem of learning a general weighted automaton from a sample generated by an arbitrary distribution. The main obstruction to this approach is that, in general, some entries of the Hankel matrix may be missing. We present a solution to this problem based on solving a constrained matrix completion problem. Combining these two ingredients, matrix completion and spectral method, a whole new family of algorithms for learning general weighted automata is obtained. We present generalization bounds for a particular algorithm in this family. The proofs rely on a joint stability analysis of matrix completion and spectral learning. 1

2 0.15443379 321 nips-2012-Spectral learning of linear dynamics from generalised-linear observations with application to neural population data

Author: Lars Buesing, Maneesh Sahani, Jakob H. Macke

Abstract: Latent linear dynamical systems with generalised-linear observation models arise in a variety of applications, for instance when modelling the spiking activity of populations of neurons. Here, we show how spectral learning methods (usually called subspace identification in this context) for linear systems with linear-Gaussian observations can be extended to estimate the parameters of a generalised-linear dynamical system model despite a non-linear and non-Gaussian observation process. We use this approach to obtain estimates of parameters for a dynamical model of neural population data, where the observed spike-counts are Poisson-distributed with log-rates determined by the latent dynamical process, possibly driven by external inputs. We show that the extended subspace identification algorithm is consistent and accurately recovers the correct parameters on large simulated data sets with a single calculation, avoiding the costly iterative computation of approximate expectation-maximisation (EM). Even on smaller data sets, it provides an effective initialisation for EM, avoiding local optima and speeding convergence. These benefits are shown to extend to real neural data.

3 0.076885171 307 nips-2012-Semi-Crowdsourced Clustering: Generalizing Crowd Labeling by Robust Distance Metric Learning

Author: Jinfeng Yi, Rong Jin, Shaili Jain, Tianbao Yang, Anil K. Jain

Abstract: One of the main challenges in data clustering is to define an appropriate similarity measure between two objects. Crowdclustering addresses this challenge by defining the pairwise similarity based on the manual annotations obtained through crowdsourcing. Despite its encouraging results, a key limitation of crowdclustering is that it can only cluster objects when their manual annotations are available. To address this limitation, we propose a new approach for clustering, called semi-crowdsourced clustering that effectively combines the low-level features of objects with the manual annotations of a subset of the objects obtained via crowdsourcing. The key idea is to learn an appropriate similarity measure, based on the low-level features of objects and from the manual annotations of only a small portion of the data to be clustered. One difficulty in learning the pairwise similarity measure is that there is a significant amount of noise and inter-worker variations in the manual annotations obtained via crowdsourcing. We address this difficulty by developing a metric learning algorithm based on the matrix completion method. Our empirical study with two real-world image data sets shows that the proposed algorithm outperforms state-of-the-art distance metric learning algorithms in both clustering accuracy and computational efficiency. 1

4 0.076622061 301 nips-2012-Scaled Gradients on Grassmann Manifolds for Matrix Completion

Author: Thanh Ngo, Yousef Saad

Abstract: This paper describes gradient methods based on a scaled metric on the Grassmann manifold for low-rank matrix completion. The proposed methods significantly improve canonical gradient methods, especially on ill-conditioned matrices, while maintaining established global convegence and exact recovery guarantees. A connection between a form of subspace iteration for matrix completion and the scaled gradient descent procedure is also established. The proposed conjugate gradient method based on the scaled gradient outperforms several existing algorithms for matrix completion and is competitive with recently proposed methods. 1

5 0.069578163 64 nips-2012-Calibrated Elastic Regularization in Matrix Completion

Author: Tingni Sun, Cun-hui Zhang

Abstract: This paper concerns the problem of matrix completion, which is to estimate a matrix from observations in a small subset of indices. We propose a calibrated spectrum elastic net method with a sum of the nuclear and Frobenius penalties and develop an iterative algorithm to solve the convex minimization problem. The iterative algorithm alternates between imputing the missing entries in the incomplete matrix by the current guess and estimating the matrix by a scaled soft-thresholding singular value decomposition of the imputed matrix until the resulting matrix converges. A calibration step follows to correct the bias caused by the Frobenius penalty. Under proper coherence conditions and for suitable penalties levels, we prove that the proposed estimator achieves an error bound of nearly optimal order and in proportion to the noise level. This provides a unified analysis of the noisy and noiseless matrix completion problems. Simulation results are presented to compare our proposal with previous ones. 1

6 0.063824117 208 nips-2012-Matrix reconstruction with the local max norm

7 0.063420616 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

8 0.058109712 156 nips-2012-Identifiability and Unmixing of Latent Parse Trees

9 0.057608373 19 nips-2012-A Spectral Algorithm for Latent Dirichlet Allocation

10 0.053303134 304 nips-2012-Selecting Diverse Features via Spectral Regularization

11 0.052498568 180 nips-2012-Learning Mixtures of Tree Graphical Models

12 0.052140642 69 nips-2012-Clustering Sparse Graphs

13 0.049145389 165 nips-2012-Iterative ranking from pair-wise comparisons

14 0.046153188 29 nips-2012-Accelerated Training for Matrix-norm Regularization: A Boosting Approach

15 0.045965847 30 nips-2012-Accuracy at the Top

16 0.043975152 247 nips-2012-Nonparametric Reduced Rank Regression

17 0.043114357 34 nips-2012-Active Learning of Multi-Index Function Models

18 0.042131577 271 nips-2012-Pointwise Tracking the Optimal Regression Function

19 0.042084631 281 nips-2012-Provable ICA with Unknown Gaussian Noise, with Implications for Gaussian Mixtures and Autoencoders

20 0.041458096 277 nips-2012-Probabilistic Low-Rank Subspace Clustering


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.13), (1, 0.02), (2, 0.044), (3, -0.04), (4, 0.012), (5, 0.02), (6, -0.009), (7, 0.001), (8, -0.012), (9, 0.024), (10, 0.029), (11, -0.002), (12, -0.033), (13, -0.007), (14, 0.024), (15, 0.041), (16, 0.056), (17, 0.036), (18, 0.056), (19, -0.052), (20, -0.03), (21, 0.03), (22, -0.052), (23, -0.027), (24, -0.007), (25, 0.082), (26, 0.014), (27, 0.01), (28, 0.014), (29, -0.008), (30, -0.0), (31, 0.095), (32, -0.0), (33, 0.052), (34, -0.1), (35, 0.061), (36, -0.027), (37, -0.05), (38, 0.05), (39, 0.117), (40, 0.057), (41, -0.073), (42, -0.06), (43, 0.04), (44, 0.018), (45, -0.011), (46, 0.001), (47, 0.026), (48, -0.042), (49, -0.013)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.86933225 320 nips-2012-Spectral Learning of General Weighted Automata via Constrained Matrix Completion

Author: Borja Balle, Mehryar Mohri

Abstract: Many tasks in text and speech processing and computational biology require estimating functions mapping strings to real numbers. A broad class of such functions can be defined by weighted automata. Spectral methods based on the singular value decomposition of a Hankel matrix have been recently proposed for learning a probability distribution represented by a weighted automaton from a training sample drawn according to this same target distribution. In this paper, we show how spectral methods can be extended to the problem of learning a general weighted automaton from a sample generated by an arbitrary distribution. The main obstruction to this approach is that, in general, some entries of the Hankel matrix may be missing. We present a solution to this problem based on solving a constrained matrix completion problem. Combining these two ingredients, matrix completion and spectral method, a whole new family of algorithms for learning general weighted automata is obtained. We present generalization bounds for a particular algorithm in this family. The proofs rely on a joint stability analysis of matrix completion and spectral learning. 1

2 0.71027005 17 nips-2012-A Scalable CUR Matrix Decomposition Algorithm: Lower Time Complexity and Tighter Bound

Author: Shusen Wang, Zhihua Zhang

Abstract: The CUR matrix decomposition is an important extension of Nyström approximation to a general matrix. It approximates any data matrix in terms of a small number of its columns and rows. In this paper we propose a novel randomized CUR algorithm with an expected relative-error bound. The proposed algorithm has the advantages over the existing relative-error CUR algorithms that it possesses tighter theoretical bound and lower time complexity, and that it can avoid maintaining the whole data matrix in main memory. Finally, experiments on several real-world datasets demonstrate significant improvement over the existing relative-error algorithms. 1

3 0.66347522 301 nips-2012-Scaled Gradients on Grassmann Manifolds for Matrix Completion

Author: Thanh Ngo, Yousef Saad

Abstract: This paper describes gradient methods based on a scaled metric on the Grassmann manifold for low-rank matrix completion. The proposed methods significantly improve canonical gradient methods, especially on ill-conditioned matrices, while maintaining established global convegence and exact recovery guarantees. A connection between a form of subspace iteration for matrix completion and the scaled gradient descent procedure is also established. The proposed conjugate gradient method based on the scaled gradient outperforms several existing algorithms for matrix completion and is competitive with recently proposed methods. 1

4 0.6325953 64 nips-2012-Calibrated Elastic Regularization in Matrix Completion

Author: Tingni Sun, Cun-hui Zhang

Abstract: This paper concerns the problem of matrix completion, which is to estimate a matrix from observations in a small subset of indices. We propose a calibrated spectrum elastic net method with a sum of the nuclear and Frobenius penalties and develop an iterative algorithm to solve the convex minimization problem. The iterative algorithm alternates between imputing the missing entries in the incomplete matrix by the current guess and estimating the matrix by a scaled soft-thresholding singular value decomposition of the imputed matrix until the resulting matrix converges. A calibration step follows to correct the bias caused by the Frobenius penalty. Under proper coherence conditions and for suitable penalties levels, we prove that the proposed estimator achieves an error bound of nearly optimal order and in proportion to the noise level. This provides a unified analysis of the noisy and noiseless matrix completion problems. Simulation results are presented to compare our proposal with previous ones. 1

5 0.60709244 321 nips-2012-Spectral learning of linear dynamics from generalised-linear observations with application to neural population data

Author: Lars Buesing, Maneesh Sahani, Jakob H. Macke

Abstract: Latent linear dynamical systems with generalised-linear observation models arise in a variety of applications, for instance when modelling the spiking activity of populations of neurons. Here, we show how spectral learning methods (usually called subspace identification in this context) for linear systems with linear-Gaussian observations can be extended to estimate the parameters of a generalised-linear dynamical system model despite a non-linear and non-Gaussian observation process. We use this approach to obtain estimates of parameters for a dynamical model of neural population data, where the observed spike-counts are Poisson-distributed with log-rates determined by the latent dynamical process, possibly driven by external inputs. We show that the extended subspace identification algorithm is consistent and accurately recovers the correct parameters on large simulated data sets with a single calculation, avoiding the costly iterative computation of approximate expectation-maximisation (EM). Even on smaller data sets, it provides an effective initialisation for EM, avoiding local optima and speeding convergence. These benefits are shown to extend to real neural data.

6 0.59366918 281 nips-2012-Provable ICA with Unknown Gaussian Noise, with Implications for Gaussian Mixtures and Autoencoders

7 0.57519591 34 nips-2012-Active Learning of Multi-Index Function Models

8 0.57244825 277 nips-2012-Probabilistic Low-Rank Subspace Clustering

9 0.57054693 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

10 0.53813982 247 nips-2012-Nonparametric Reduced Rank Regression

11 0.53053039 304 nips-2012-Selecting Diverse Features via Spectral Regularization

12 0.52145314 125 nips-2012-Factoring nonnegative matrices with linear programs

13 0.51445299 86 nips-2012-Convex Multi-view Subspace Learning

14 0.51068819 135 nips-2012-Forging The Graphs: A Low Rank and Positive Semidefinite Graph Learning Approach

15 0.4944669 221 nips-2012-Multi-Stage Multi-Task Feature Learning

16 0.49318084 44 nips-2012-Approximating Concavely Parameterized Optimization Problems

17 0.48469839 208 nips-2012-Matrix reconstruction with the local max norm

18 0.48389742 85 nips-2012-Convergence and Energy Landscape for Cheeger Cut Clustering

19 0.48223558 156 nips-2012-Identifiability and Unmixing of Latent Parse Trees

20 0.47360554 29 nips-2012-Accelerated Training for Matrix-norm Regularization: A Boosting Approach


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.049), (14, 0.013), (21, 0.027), (38, 0.167), (39, 0.013), (42, 0.031), (53, 0.02), (54, 0.026), (55, 0.028), (74, 0.021), (76, 0.117), (80, 0.069), (89, 0.293), (92, 0.034)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.74276584 320 nips-2012-Spectral Learning of General Weighted Automata via Constrained Matrix Completion

Author: Borja Balle, Mehryar Mohri

Abstract: Many tasks in text and speech processing and computational biology require estimating functions mapping strings to real numbers. A broad class of such functions can be defined by weighted automata. Spectral methods based on the singular value decomposition of a Hankel matrix have been recently proposed for learning a probability distribution represented by a weighted automaton from a training sample drawn according to this same target distribution. In this paper, we show how spectral methods can be extended to the problem of learning a general weighted automaton from a sample generated by an arbitrary distribution. The main obstruction to this approach is that, in general, some entries of the Hankel matrix may be missing. We present a solution to this problem based on solving a constrained matrix completion problem. Combining these two ingredients, matrix completion and spectral method, a whole new family of algorithms for learning general weighted automata is obtained. We present generalization bounds for a particular algorithm in this family. The proofs rely on a joint stability analysis of matrix completion and spectral learning. 1

2 0.71704715 131 nips-2012-Feature Clustering for Accelerating Parallel Coordinate Descent

Author: Chad Scherrer, Ambuj Tewari, Mahantesh Halappanavar, David Haglin

Abstract: Large-scale `1 -regularized loss minimization problems arise in high-dimensional applications such as compressed sensing and high-dimensional supervised learning, including classification and regression problems. High-performance algorithms and implementations are critical to efficiently solving these problems. Building upon previous work on coordinate descent algorithms for `1 -regularized problems, we introduce a novel family of algorithms called block-greedy coordinate descent that includes, as special cases, several existing algorithms such as SCD, Greedy CD, Shotgun, and Thread-Greedy. We give a unified convergence analysis for the family of block-greedy algorithms. The analysis suggests that block-greedy coordinate descent can better exploit parallelism if features are clustered so that the maximum inner product between features in different blocks is small. Our theoretical convergence analysis is supported with experimental results using data from diverse real-world applications. We hope that algorithmic approaches and convergence analysis we provide will not only advance the field, but will also encourage researchers to systematically explore the design space of algorithms for solving large-scale `1 -regularization problems. 1

3 0.67807364 226 nips-2012-Multiclass Learning Approaches: A Theoretical Comparison with Implications

Author: Amit Daniely, Sivan Sabato, Shai S. Shwartz

Abstract: We theoretically analyze and compare the following five popular multiclass classification methods: One vs. All, All Pairs, Tree-based classifiers, Error Correcting Output Codes (ECOC) with randomly generated code matrices, and Multiclass SVM. In the first four methods, the classification is based on a reduction to binary classification. We consider the case where the binary classifier comes from a class of VC dimension d, and in particular from the class of halfspaces over Rd . We analyze both the estimation error and the approximation error of these methods. Our analysis reveals interesting conclusions of practical relevance, regarding the success of the different approaches under various conditions. Our proof technique employs tools from VC theory to analyze the approximation error of hypothesis classes. This is in contrast to most previous uses of VC theory, which only deal with estimation error. 1

4 0.60256118 27 nips-2012-A quasi-Newton proximal splitting method

Author: Stephen Becker, Jalal Fadili

Abstract: A new result in convex analysis on the calculation of proximity operators in certain scaled norms is derived. We describe efficient implementations of the proximity calculation for a useful class of functions; the implementations exploit the piece-wise linear nature of the dual problem. The second part of the paper applies the previous result to acceleration of convex minimization problems, and leads to an elegant quasi-Newton method. The optimization method compares favorably against state-of-the-art alternatives. The algorithm has extensive applications including signal processing, sparse recovery and machine learning and classification. 1

5 0.60250002 178 nips-2012-Learning Label Trees for Probabilistic Modelling of Implicit Feedback

Author: Andriy Mnih, Yee W. Teh

Abstract: User preferences for items can be inferred from either explicit feedback, such as item ratings, or implicit feedback, such as rental histories. Research in collaborative filtering has concentrated on explicit feedback, resulting in the development of accurate and scalable models. However, since explicit feedback is often difficult to collect it is important to develop effective models that take advantage of the more widely available implicit feedback. We introduce a probabilistic approach to collaborative filtering with implicit feedback based on modelling the user’s item selection process. In the interests of scalability, we restrict our attention to treestructured distributions over items and develop a principled and efficient algorithm for learning item trees from data. We also identify a problem with a widely used protocol for evaluating implicit feedback models and propose a way of addressing it using a small quantity of explicit feedback data. 1

6 0.60215944 86 nips-2012-Convex Multi-view Subspace Learning

7 0.60134393 120 nips-2012-Exact and Stable Recovery of Sequences of Signals with Sparse Increments via Differential 1-Minimization

8 0.60114348 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models

9 0.6003055 319 nips-2012-Sparse Prediction with the $k$-Support Norm

10 0.60024977 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

11 0.59983647 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes

12 0.59932041 263 nips-2012-Optimal Regularized Dual Averaging Methods for Stochastic Optimization

13 0.59917414 15 nips-2012-A Polylog Pivot Steps Simplex Algorithm for Classification

14 0.59913665 187 nips-2012-Learning curves for multi-task Gaussian process regression

15 0.59910542 227 nips-2012-Multiclass Learning with Simplex Coding

16 0.59882843 6 nips-2012-A Convex Formulation for Learning Scale-Free Networks via Submodular Relaxation

17 0.59878123 236 nips-2012-Near-Optimal MAP Inference for Determinantal Point Processes

18 0.59800261 125 nips-2012-Factoring nonnegative matrices with linear programs

19 0.59687608 241 nips-2012-No-Regret Algorithms for Unconstrained Online Convex Optimization

20 0.59626758 361 nips-2012-Volume Regularization for Binary Classification