nips nips2013 nips2013-342 knowledge-graph by maker-knowledge-mining

342 nips-2013-Unsupervised Spectral Learning of Finite State Transducers


Source: pdf

Author: Raphael Bailly, Xavier Carreras, Ariadna Quattoni

Abstract: Finite-State Transducers (FST) are a standard tool for modeling paired input-output sequences and are used in numerous applications, ranging from computational biology to natural language processing. Recently Balle et al. [4] presented a spectral algorithm for learning FST from samples of aligned input-output sequences. In this paper we address the more realistic, yet challenging setting where the alignments are unknown to the learning algorithm. We frame FST learning as finding a low rank Hankel matrix satisfying constraints derived from observable statistics. Under this formulation, we provide identifiability results for FST distributions. Then, following previous work on rank minimization, we propose a regularized convex relaxation of this objective which is based on minimizing a nuclear norm penalty subject to linear constraints and can be solved efficiently. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Finite-State Transducers (FST) are a standard tool for modeling paired input-output sequences and are used in numerous applications, ranging from computational biology to natural language processing. [sent-3, score-0.132]

2 Balle et al. [4] presented a spectral algorithm for learning FST from samples of aligned input-output sequences. [sent-5, score-0.245]

3 In this paper we address the more realistic, yet challenging setting where the alignments are unknown to the learning algorithm. [sent-6, score-0.105]

4 We frame FST learning as finding a low rank Hankel matrix satisfying constraints derived from observable statistics. [sent-7, score-0.321]

5 Then, following previous work on rank minimization, we propose a regularized convex relaxation of this objective which is based on minimizing a nuclear norm penalty subject to linear constraints and can be solved efficiently. [sent-9, score-0.315]

6 A pair of sequences is made of an input sequence, built from an input alphabet, and an output sequence, built from an output alphabet. [sent-11, score-0.198]

7 A variety of algorithms for learning FST have been proposed in the literature, most of them based on EM optimization [9, 11] or grammatical inference techniques [8, 6]. [sent-13, score-0.071]

8 Four example alignments of the DNA pair GAATTCAG / GGATCGA (shown in the paper with gap symbols and match bars). To be able to handle different alignments, a special empty symbol ∗ is added to the input and output alphabets. [sent-17, score-0.088]
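
To make the bi-symbol representation concrete, the following minimal sketch (illustrative only, not taken from the paper) enumerates the alignments of an input/output pair as sequences of bi-symbols, with '*' standing for the empty symbol; the function name and the string-based interface are assumptions made for this example.

```python
# Sketch: enumerate the alignments of (s, t) as tuples of (input, output)
# bi-symbols, where GAP plays the role of the empty symbol *.
# The bi-symbol (GAP, GAP) is never generated.

GAP = "*"

def alignments(s, t):
    """Yield every alignment of s and t as a tuple of bi-symbols."""
    if not s and not t:
        yield ()
        return
    if s:  # consume an input symbol alone
        for rest in alignments(s[1:], t):
            yield ((s[0], GAP),) + rest
    if t:  # consume an output symbol alone
        for rest in alignments(s, t[1:]):
            yield ((GAP, t[0]),) + rest
    if s and t:  # consume an input and an output symbol together
        for rest in alignments(s[1:], t[1:]):
            yield ((s[0], t[0]),) + rest

# Example: list the alignments the model integrates over for a small pair.
if __name__ == "__main__":
    for a in alignments("01", "001"):
        print(a)
```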

9 With this enlarged set of bi-symbols, the model is able to generate an input symbol (resp. an output symbol) paired with the empty symbol on the other side. [sent-18, score-0.068]

10 As an example, the first alignment above corresponds to two possible bi-symbol representations: one whose input row reads G A ∗ A T T C A G ∗ and one whose input row reads G ∗ A A T T C A G ∗, each paired position by position with an output row over the output alphabet and ∗. [sent-23, score-0.069]

11 Under this model the probability of observing a pair of un-aligned input-output sequences is obtained by integrating over all possible alignments. [sent-24, score-0.108]

12 Following a recent trend of work on spectral learning algorithms for finite state machines [14, 2, 17, 18, 7, 16, 10, 5], Balle et al. [sent-25, score-0.105]

13 [4] presented an algorithm for learning FST where the inputs to the algorithm are samples of aligned input-output sequences. [sent-26, score-0.14]

14 As with most spectral methods, the core idea of this algorithm is to exploit low-rank decompositions of some Hankel matrix representing the distribution of aligned sequences. [sent-27, score-0.285]

15 To estimate this Hankel matrix it is assumed that the algorithm can sample aligned sequences, i.e. that the alignment information is observed. [sent-28, score-0.18]

16 While the problem of learning FST from fully aligned sequences (what we sometimes refer to as supervised learning) has been solved, the problem of deriving an unsupervised spectral method that can be trained from samples of input-output sequences alone (i.e. without alignments) remained open. [sent-31, score-0.515]

17 In this paper we address this unsupervised setting and present a spectral algorithm that can approximate the distribution of paired sequences generated by an FST without having access to aligned sequences. [sent-35, score-0.442]

18 To the best of our knowledge this is the first spectral algorithm for this problem. [sent-36, score-0.105]

19 The main challenge in the unsupervised setting is that since the alignment information is not available, the Hankel matrices (as in [4]) can no longer be directly estimated from observable statistics. [sent-37, score-0.231]

20 However, a key observation is that we can nevertheless compute observable statistics that constrain the coefficients of the Hankel matrix. [sent-38, score-0.098]

21 This is because the probability of observing a pair of un-aligned input-output sequences (i.e. [sent-39, score-0.108]

22 an observable statistic) is computed by summing over all possible alignments; i.e. it is a sum of the values of all the aligned sequences compatible with the pair. [sent-41, score-0.097]

23 The main idea of our algorithm is to exploit these constraints and find a Hankel matrix (from which we can directly recover the model) which both agrees with the observed statistics and has a low-rank matrix factorization. [sent-44, score-0.168]

24 In brief, our main contribution is to show that an FST can be approximated by solving an optimization which is based on finding a low-rank matrix satisfying a set of constraints derived from observable statistics and Hankel structure. [sent-45, score-0.189]

25 We provide sample complexity bounds and some identifiability results for this optimization that show that, theoretically, the rank and the parameters of an FST distribution can be identified. [sent-46, score-0.117]

26 Following previous work on rank minimization, we propose a regularized convex relaxation of the proposed objective which is based on minimizing a nuclear norm penalty subject to linear constraints. [sent-47, score-0.266]

27 The proposed relaxation balances a trade-off between model complexity (measured by the nuclear norm penalty) and fitting the observed statistics. [sent-48, score-0.111]

28 Synthetic experiments show that the performance of our unsupervised algorithm approaches that of a supervised method trained from fully aligned sequences. [sent-49, score-0.232]

29 Section 2 gives preliminaries on FST and spectral learning methods, and establishes that an FST can be induced from a Hankel matrix of observable aligned statistics. [sent-51, score-0.367]

30 Section 3 presents a generalized form of Hankel matrices for FST that makes it possible to express observation constraints efficiently. [sent-52, score-0.162]

31 One cannot observe generalized Hankel matrices without assuming access to aligned samples. [sent-53, score-0.251]

32 To solve this problem, Section 4 formulates finding the Hankel matrix of an FST from unaligned samples as a rank minimization problem. [sent-54, score-0.27]

33 This section also presents the main theoretical results of the method, as well as a convex relaxation of the rank minimization problem. [sent-55, score-0.218]

34 A Finite-State Transducer (FST) of rank d is given by: • input and output alphabets Σ+ = {x1 , . . . , xp } and Σ− = {y1 , . . . , yq }, together with rank-d parameters (initial and final weight vectors and one operator matrix per bi-symbol). [sent-59, score-0.146]

35 An alignment of (s, t) is given by a sequence of bi-symbols (x1 /y1 ) . . . (xn /yn ) such that the sequence obtained [sent-67, score-0.087]

36 by removing the empty symbols ∗ from x1 . . . xn (resp. y1 . . . yn ) equals s (resp. t). [sent-77, score-0.059]

37 The set of alignments for a pair of sequences (s, t) is denoted [s, t]. [sent-80, score-0.213]

38 2 Computing with an FST. In order to compute the value of a pair of sequences (s, t), one needs to sum over all possible alignments, whose number is generally exponential in the lengths of s and t. [sent-100, score-0.108]
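
Although the number of alignments is exponential, the sum can be organized as a dynamic program over an edit-distance-style lattice. The sketch below assumes the operator-style parameterization used by spectral methods for FST (an initial vector, a final vector, and one matrix per bi-symbol); the names, shapes and dictionary-based interface are assumptions for illustration, not the paper's notation.

```python
import numpy as np

GAP = "*"

def pair_value(s, t, alpha, beta, M):
    """Sum the FST value over all alignments of (s, t) with O(|s||t|) matrix
    products, instead of enumerating the exponentially many alignments.

    alpha : (d,) initial weight vector
    beta  : (d,) final weight vector
    M     : dict mapping a bi-symbol (x, y) to a (d, d) operator matrix;
            keys are (x, GAP), (GAP, y) and (x, y), never (GAP, GAP).
    """
    m, n = len(s), len(t)
    d = alpha.shape[0]
    # F[i, j] = alpha^T times the sum of operator products over all
    # alignments of the prefixes s[:i] and t[:j].
    F = np.zeros((m + 1, n + 1, d))
    F[0, 0] = alpha
    for i in range(m + 1):
        for j in range(n + 1):
            if i > 0:
                F[i, j] += F[i - 1, j] @ M[(s[i - 1], GAP)]
            if j > 0:
                F[i, j] += F[i, j - 1] @ M[(GAP, t[j - 1])]
            if i > 0 and j > 0:
                F[i, j] += F[i - 1, j - 1] @ M[(s[i - 1], t[j - 1])]
    return float(F[m, n] @ beta)
```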

39 Let us consider the model A which satisfies the constraints above. [sent-114, score-0.049]

40 One has rA ((0/0)(∗/0)(1/1)) = 1/96 and rA ((∗/0)(0/0)(1/1)) = 1/192 and, as those two aligned sequences are the only possible alignments for (01, 001), one has rA ((01, 001)) = 1/96 + 1/192 = 1/64. [sent-115, score-0.334]

41 A Hankel matrix on U and V is a matrix with rows corresponding to elements u ∈ U and columns corresponding to v ∈ V , which satisfies uv = u′ v ′ → H(u, v) = H(u′ , v ′ ). [sent-124, score-0.153]

42 One supposes that U is prefix-closed, V is suffix-closed, and that rank(Hε ) = rank(H). [sent-130, score-0.049]

43 The rank equality comes from the fact that the WA defined above has the same rank as Hε , and that the rank of a mapping f which satisfies f (uv) = H(u, v) is at least the rank of H. [sent-133, score-0.485]

44 The complete Hankel matrix has at least rank 7. [sent-139, score-0.157]

45 3 Inducing FST from Generalized Hankel Matrices. Proposition 2 tells us that if we had access to certain sub-blocks of the Hankel matrix for aligned sequences we could recover the FST model. [sent-140, score-0.298]
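
For concreteness, here is a sketch of the standard SVD-based recovery of a weighted automaton from Hankel sub-blocks, which is the kind of recovery Proposition 2 refers to; the block layout, variable names and exact formulas are assumptions borrowed from the usual spectral-learning construction rather than the paper's own statement.

```python
import numpy as np

def recover_fst(H, H_sigma, h_prefix, h_suffix, d):
    """Sketch of spectral recovery from Hankel blocks over aligned sequences.

    H        : (|U|, |V|) block with H[u, v] = f(u v)
    H_sigma  : dict mapping each bi-symbol sigma to the (|U|, |V|) block
               with entries f(u sigma v)
    h_prefix : (|U|,) vector with entries f(u)
    h_suffix : (|V|,) vector with entries f(v)
    d        : target rank of the FST
    """
    # Rank-d truncated SVD of the base block gives the low-rank factorization.
    _, _, Vt = np.linalg.svd(H, full_matrices=False)
    V = Vt[:d].T                      # (|V|, d)
    P = np.linalg.pinv(H @ V)         # (d, |U|)

    alpha = h_suffix @ V              # initial weights, shape (d,)
    beta = P @ h_prefix               # final weights, shape (d,)
    A = {sigma: P @ Hs @ V for sigma, Hs in H_sigma.items()}  # (d, d) operators
    return alpha, A, beta
```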

46 However, we do not have access to the hidden alignment information: we only have access to the statistics p(s, t), which we will call observations. [sent-141, score-0.152]

47 One natural idea would be to search for a Hankel matrix that agrees with the observations. [sent-142, score-0.079]

48 To do so, we introduce observable constraints, which are linear constraints of the form p(s, t) = Σa∈[s,t] r(a), where the sum runs over all alignments a of (s, t) and each value r(a) is an entry of the Hankel matrix. [sent-143, score-0.131]

49 From a matrix satisfying the hypotheses of Proposition 2 and the observation constraints, one can derive an FST computing a mapping which agrees with the observations. [sent-153, score-0.13]

50 We will denote sets of alignments as follows: an aligned string a = (x1 /y1 ) . . . (xn /yn ) denotes the singleton set {a}; then a[s, t] denotes the set of aligned sequences obtained by extending a with every possible alignment of (s, t). [sent-161, score-0.245]

51 A generalized prefix (resp. generalized suffix) is the empty set or a set of the form a[s, t] (resp. [sent-165, score-0.077]

52 y1 n [s, t]), where the aligned part is possibly empty. [sent-166, score-0.14]

53 3.1 Generalized Hankel. One needs to extend the definition of a Hankel matrix to the generalized prefixes and suffixes. [sent-168, score-0.092]

54 Let U be a set of generalized prefixes, V be a set of generalized suffixes. [sent-170, score-0.104]

55 H is a generalized Hankel matrix if it satisfies: ∀ū, ū′ ⊂ U, ∀v̄, v̄ ′ ⊂ V, ⊍u∈ū,v∈v̄ uv = ⊍u′ ∈ū′ ,v′ ∈v̄′ u′ v ′ ⇒ Σu∈ū,v∈v̄ H(u, v) = Σu′ ∈ū′ ,v′ ∈v̄′ H(u′ , v ′ ), where ⊍ is the disjoint union. [sent-172, score-0.165]

56 Let U be a set of generalized prefixes, V be a set of generalized suffixes. [sent-179, score-0.104]

57 A key result is the following, which is analogous to Proposition 2 for generalized Hankel matrices: [sent-182, score-0.052]

58 Let U and V be two sets of generalized prefixes and generalized suffixes. [sent-183, score-0.104]

59 One supposes that rank(Hε ) = rank(H), U is prefix-closed and V is suffix-closed. [sent-185, score-0.049]

60 The sets U and V contain all the observed pairs of unaligned sequences, and one can check that the sizes of U Σ and ΣV are polynomial in the size of S. [sent-194, score-0.097]

61 4 FST Learning as Non-convex Optimization. Proposition 3 shows that FST models can be recovered from generalized Hankel matrices. [sent-199, score-0.052]

62 In this section we show how FST learning can be framed as an optimization problem where one searches for a low-rank generalized Hankel matrix that agrees with observation constraints derived from a sample. [sent-200, score-0.196]

63 We will denote by z = (p([s, t]))s∈Σ∗+ ,t∈Σ∗− the vector built from observable probabilities, and by zS = (pS ([s, t]))s∈Σ∗+ ,t∈Σ∗− the vector of empirical observable probabilities, where pS is the frequency deduced from an i.i.d. sample S. [sent-202, score-0.188]

64 Let K be the matrix such that K H⃗ = 0 represents the Hankel constraints (cf. [sent-207, score-0.089]

65 Let O be the matrix such that OH⃗ = z represents the observation constraints (i.e. the linear constraints derived from the observable statistics). [sent-209, score-0.105]
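
To illustrate the shape of these constraints, the sketch below builds one row of O for an observed pair (s, t): the row sums the Hankel entries indexed by every alignment of the pair, so that row · H⃗ = p(s, t). It reuses the alignments() enumerator sketched earlier and assumes a dictionary entry_index mapping each aligned string to one representative position in the vectorized Hankel matrix; both are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def observation_row(entry_index, s, t):
    """One row of the observation-constraint matrix O (illustrative sketch).

    entry_index : dict mapping an aligned bi-symbol sequence (a tuple of
                  (x, y) pairs, as produced by alignments()) to its position
                  in the vectorized Hankel matrix.
    """
    row = np.zeros(len(entry_index))
    for a in alignments(s, t):      # alignments() is the enumerator sketched above
        row[entry_index[a]] += 1.0  # the entries of all alignments must sum to p(s, t)
    return row
```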

66 Let us remark that the set of matrices satisfying the constraints is a compact set. [sent-215, score-0.097]

67 Let p be a distribution over i/o sequences computed by an FST. [sent-222, score-0.089]

68 The first one concerns the rank identification, while the second one concerns the consistency of the method. [sent-228, score-0.149]

69 Let p be a rank d distribution computed by an FST. [sent-230, score-0.117]

70 Let p be a rank d distribution computed by an FST. [sent-238, score-0.117]

71 Let us first remark that, among all the values used to build the Hankel matrices in Example 3, some of them correspond to observable statistics, as there is only one possible alignment for them. [sent-247, score-0.181]

72 Then the rank minimization objective is not sufficient, as it allows any value for those variables. [sent-249, score-0.151]

73 It is clear that H ′ has a rank greater than or equal to 2. [sent-258, score-0.117]

74 The only way to reach rank 2 under the constraints is h22 = h51 = 1/72, h52 = 1/216, h63 = 1/192, h92 = 1/96. Thus, the process of rank minimization under linear constraints leads to a single model, which is identical to the original one. [sent-259, score-0.366]

75 Of course, in the general case, the rank minimization objective may lead to several models. [sent-260, score-0.151]

76 One can solve instead a convex relaxation of the problem (1), obtained by replacing the rank objective by the nuclear norm. [sent-263, score-0.219]

77 The nuclear norm ∥ ⋅ ∥∗ plays the same role as the ℓ1 norm in convex relaxations of the ℓ0 norm, used to reach sparsity under linear constraints. [sent-266, score-0.131]
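
A possible rendering of this relaxation with an off-the-shelf convex solver is sketched below; the squared-loss treatment of the observation term, the trade-off weight C (the parameter swept in the experiments) and the use of cvxpy are assumptions made for illustration, not the paper's exact formulation.

```python
import cvxpy as cp

def relaxed_hankel(shape, K, O, z, C):
    """Nuclear-norm relaxation sketch: minimize ||H||_* + C * ||O vec(H) - z||^2
    subject to the Hankel structure constraints K vec(H) = 0.

    shape : dimensions of the generalized Hankel matrix
    K, O  : constraint matrices acting on the (column-major) vectorization of H
    z     : empirical observable probabilities
    C     : trade-off between model complexity and fit to the observed statistics
    """
    H = cp.Variable(shape)
    h = cp.vec(H)                      # column-major vectorization of H
    objective = cp.Minimize(cp.normNuc(H) + C * cp.sum_squares(O @ h - z))
    problem = cp.Problem(objective, [K @ h == 0])
    problem.solve()
    return H.value
```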

78 5 Experiments. We ran synthetic experiments using samples generated from random FST with input-output alphabets of size two. [sent-267, score-0.051]

79 The main goal of our experiments was to compare our algorithm to a supervised spectral algorithm for FST that has access to alignments. [sent-268, score-0.176]

80 Each run consists of generating a target FST, and generating N aligned samples according to the target distribution. [sent-270, score-0.186]

81 Then, we removed the alignment information from each sample (thus obtaining a pair of unaligned strings), and we used them to train an FST with our general learning algorithm, trying different values for a C parameter that trades off the nuclear norm term and the observation term. [sent-272, score-0.258]

82 We ran this experiment for 8 target models of 5 states, for different sampling sizes. [sent-273, score-0.045]

83 We measure the L1 error of the learned models with respect to the target distribution on all unaligned pairs of strings whose sizes sum up to 6. [sent-274, score-0.136]
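
As a small illustration of this evaluation, the sketch below computes the L1 error between a target and a learned pair distribution over all pairs of binary strings whose lengths sum to exactly 6; the two model arguments could be, for instance, the pair_value() helper from the earlier sketch with different parameters, and the alphabet, length convention and function names are assumptions.

```python
from itertools import product

def l1_error(target_model, learned_model, alphabet=("0", "1"), total_len=6):
    """L1 distance between two pair distributions, restricted to all unaligned
    pairs (s, t) with len(s) + len(t) == total_len."""
    err = 0.0
    for ls in range(total_len + 1):
        lt = total_len - ls
        for s in map("".join, product(alphabet, repeat=ls)):
            for t in map("".join, product(alphabet, repeat=lt)):
                err += abs(target_model(s, t) - learned_model(s, t))
    return err
```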

84 First, a factorized method that assumes that the two sequences are generated independently, and learns two separate weighted automata using a spectral method. [sent-277, score-0.217]

85 Figure 1 (x-axis: number of samples, log scale): Learning curves for different methods: SUP, supervised; UNS, unsupervised with different regularizers; EM, expectation maximization. [sent-286, score-0.05]

86 Clearly, our algorithm is able to estimate the target distribution and gets close to the performance of the supervised method, while making use of much simpler statistics. [sent-294, score-0.065]

87 EM performed slightly better than the spectral methods, though at comparable levels of performance. [sent-295, score-0.105]

88 One can find other experimental results for the unsupervised spectral method in [1], where it is shown that, under certain circumstances, the unsupervised spectral method can perform better than supervised EM. [sent-296, score-0.352]

89 6 Conclusion. In this paper we presented a spectral algorithm for learning FST from unaligned sequences. [sent-298, score-0.184]

90 This is the first paper to derive a spectral algorithm for the unsupervised FST learning setting. [sent-299, score-0.155]

91 We prove that there is theoretical identifiability of the rank and the parameters of an FST distribution, using a rank minimization formulation. [sent-300, score-0.268]

92 Classically, rank minimization problems are solved with convex relaxations such as the nuclear norm minimization we have proposed. [sent-302, score-0.291]

93 In experiments, we show that our method is comparable to a fully supervised spectral method, and close to the performance of EM. [sent-303, score-0.147]

94 Our approach follows a similar idea to that of [3] since both works combine classic ideas from spectral learning with classic ideas from low rank matrix completion. [sent-304, score-0.262]

95 The basic idea is to frame learning of distributions over structured objects as a low-rank matrix factorization subject to linear constraints derived from observable statistics. [sent-305, score-0.208]

96 This method applies to other grammatical inference domains, such as unsupervised spectral learning of PCFGs ([1]). [sent-306, score-0.226]

97 Unsupervised spectral learning of wcfg as low-rank matrix completion. [sent-316, score-0.145]

98 Spectral learning of general weighted automata via constrained matrix completion. [sent-328, score-0.063]

99 Local loss optimization in operator models: A new insight into spectral learning. [sent-338, score-0.105]

100 Inference of finite-state transducers by using regular grammars and morphisms. [sent-357, score-0.103]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('fst', 0.631), ('hankel', 0.571), ('pre', 0.193), ('xes', 0.157), ('aligned', 0.14), ('rank', 0.117), ('alignments', 0.105), ('spectral', 0.105), ('balle', 0.097), ('ra', 0.089), ('sequences', 0.089), ('observable', 0.082), ('unaligned', 0.079), ('uv', 0.073), ('transducers', 0.071), ('grammatical', 0.071), ('alignment', 0.069), ('rt', 0.066), ('mx', 0.052), ('generalized', 0.052), ('unsupervised', 0.05), ('nuclear', 0.05), ('constraints', 0.049), ('bailly', 0.049), ('msi', 0.049), ('supposes', 0.049), ('uns', 0.049), ('proposition', 0.046), ('hx', 0.043), ('boots', 0.043), ('symbol', 0.042), ('supervised', 0.042), ('matrix', 0.04), ('carreras', 0.039), ('quattoni', 0.039), ('siddiqi', 0.039), ('agrees', 0.039), ('wa', 0.037), ('relaxation', 0.036), ('nition', 0.035), ('minimization', 0.034), ('xavier', 0.034), ('strings', 0.034), ('xn', 0.033), ('oh', 0.032), ('ariadna', 0.032), ('gaattcag', 0.032), ('pcfgs', 0.032), ('matrices', 0.03), ('access', 0.029), ('paired', 0.029), ('suf', 0.029), ('borja', 0.029), ('edit', 0.029), ('uu', 0.029), ('alphabets', 0.029), ('enlarged', 0.026), ('ps', 0.026), ('norm', 0.025), ('hidden', 0.025), ('empty', 0.025), ('ga', 0.025), ('em', 0.024), ('built', 0.024), ('automata', 0.023), ('kh', 0.023), ('target', 0.023), ('alphabet', 0.022), ('ran', 0.022), ('subject', 0.022), ('computes', 0.022), ('output', 0.021), ('let', 0.021), ('ux', 0.02), ('pair', 0.019), ('regular', 0.018), ('sequence', 0.018), ('symbols', 0.018), ('identi', 0.018), ('satisfying', 0.018), ('check', 0.018), ('mapping', 0.017), ('convex', 0.016), ('observation', 0.016), ('concerns', 0.016), ('linguistics', 0.016), ('yn', 0.016), ('frame', 0.015), ('summing', 0.015), ('song', 0.015), ('relaxations', 0.015), ('presents', 0.015), ('rapha', 0.014), ('universitat', 0.014), ('zwald', 0.014), ('grammars', 0.014), ('morphology', 0.014), ('ability', 0.014), ('language', 0.014), ('completion', 0.014)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 342 nips-2013-Unsupervised Spectral Learning of Finite State Transducers

Author: Raphael Bailly, Xavier Carreras, Ariadna Quattoni

Abstract: Finite-State Transducers (FST) are a standard tool for modeling paired input-output sequences and are used in numerous applications, ranging from computational biology to natural language processing. Recently Balle et al. [4] presented a spectral algorithm for learning FST from samples of aligned input-output sequences. In this paper we address the more realistic, yet challenging setting where the alignments are unknown to the learning algorithm. We frame FST learning as finding a low rank Hankel matrix satisfying constraints derived from observable statistics. Under this formulation, we provide identifiability results for FST distributions. Then, following previous work on rank minimization, we propose a regularized convex relaxation of this objective which is based on minimizing a nuclear norm penalty subject to linear constraints and can be solved efficiently. 1

2 0.065523371 254 nips-2013-Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms

Author: Adrien Todeschini, François Caron, Marie Chavent

Abstract: We propose a novel class of algorithms for low rank matrix completion. Our approach builds on novel penalty functions on the singular values of the low rank matrix. By exploiting a mixture model representation of this penalty, we show that a suitably chosen set of latent variables enables to derive an Expectation-Maximization algorithm to obtain a Maximum A Posteriori estimate of the completed low rank matrix. The resulting algorithm is an iterative soft-thresholded algorithm which iteratively adapts the shrinkage coefficients associated to the singular values. The algorithm is simple to implement and can scale to large matrices. We provide numerical comparisons between our approach and recent alternatives showing the interest of the proposed approach for low rank matrix completion. 1

3 0.061896153 185 nips-2013-Matrix Completion From any Given Set of Observations

Author: Troy Lee, Adi Shraibman

Abstract: In the matrix completion problem the aim is to recover an unknown real matrix from a subset of its entries. This problem comes up in many application areas, and has received a great deal of attention in the context of the netflix prize. A central approach to this problem is to output a matrix of lowest possible complexity (e.g. rank or trace norm) that agrees with the partially specified matrix. The performance of this approach under the assumption that the revealed entries are sampled randomly has received considerable attention (e.g. [1, 2, 3, 4, 5, 6, 7, 8]). In practice, often the set of revealed entries is not chosen at random and these results do not apply. We are therefore left with no guarantees on the performance of the algorithm we are using. We present a means to obtain performance guarantees with respect to any set of initial observations. The first step remains the same: find a matrix of lowest possible complexity that agrees with the partially specified matrix. We give a new way to interpret the output of this algorithm by next finding a probability distribution over the non-revealed entries with respect to which a bound on the generalization error can be proven. The more complex the set of revealed entries according to a certain measure, the better the bound on the generalization error. 1

4 0.054215249 281 nips-2013-Robust Low Rank Kernel Embeddings of Multivariate Distributions

Author: Le Song, Bo Dai

Abstract: Kernel embedding of distributions has led to many recent advances in machine learning. However, latent and low rank structures prevalent in real world distributions have rarely been taken into account in this setting. Furthermore, no prior work in kernel embedding literature has addressed the issue of robust embedding when the latent and low rank information are misspecified. In this paper, we propose a hierarchical low rank decomposition of kernels embeddings which can exploit such low rank structures in data while being robust to model misspecification. We also illustrate with empirical evidence that the estimated low rank embeddings lead to improved performance in density estimation. 1

5 0.053274617 222 nips-2013-On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization

Author: Ke Hou, Zirui Zhou, Anthony Man-Cho So, Zhi-Quan Luo

Abstract: Motivated by various applications in machine learning, the problem of minimizing a convex smooth loss function with trace norm regularization has received much attention lately. Currently, a popular method for solving such problem is the proximal gradient method (PGM), which is known to have a sublinear rate of convergence. In this paper, we show that for a large class of loss functions, the convergence rate of the PGM is in fact linear. Our result is established without any strong convexity assumption on the loss function. A key ingredient in our proof is a new Lipschitzian error bound for the aforementioned trace norm–regularized problem, which may be of independent interest.

6 0.053143673 307 nips-2013-Speedup Matrix Completion with Side Information: Application to Multi-Label Learning

7 0.051790781 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles

8 0.050758235 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints

9 0.049958978 137 nips-2013-High-Dimensional Gaussian Process Bandits

10 0.047316551 179 nips-2013-Low-Rank Matrix and Tensor Completion via Adaptive Sampling

11 0.04301111 295 nips-2013-Simultaneous Rectification and Alignment via Robust Recovery of Low-rank Tensors

12 0.042477556 155 nips-2013-Learning Hidden Markov Models from Non-sequence Data via Tensor Decomposition

13 0.040039986 11 nips-2013-A New Convex Relaxation for Tensor Completion

14 0.039980218 73 nips-2013-Convex Relaxations for Permutation Problems

15 0.039588235 91 nips-2013-Dirty Statistical Models

16 0.038348891 75 nips-2013-Convex Two-Layer Modeling

17 0.036913473 113 nips-2013-Exact and Stable Recovery of Pairwise Interaction Tensors

18 0.036108028 108 nips-2013-Error-Minimizing Estimates and Universal Entry-Wise Error Bounds for Low-Rank Matrix Completion

19 0.034945391 301 nips-2013-Sparse Additive Text Models with Low Rank Background

20 0.033686932 206 nips-2013-Near-Optimal Entrywise Sampling for Data Matrices


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.095), (1, 0.036), (2, 0.04), (3, 0.062), (4, -0.003), (5, -0.028), (6, -0.007), (7, -0.003), (8, -0.027), (9, -0.007), (10, 0.001), (11, 0.005), (12, 0.038), (13, 0.035), (14, 0.004), (15, 0.029), (16, -0.012), (17, 0.021), (18, -0.017), (19, -0.004), (20, -0.028), (21, -0.045), (22, -0.003), (23, 0.028), (24, 0.006), (25, 0.0), (26, -0.024), (27, -0.075), (28, -0.041), (29, 0.009), (30, -0.016), (31, -0.048), (32, -0.025), (33, -0.018), (34, 0.022), (35, 0.056), (36, 0.073), (37, -0.009), (38, 0.039), (39, 0.027), (40, 0.016), (41, -0.011), (42, 0.049), (43, -0.032), (44, 0.022), (45, 0.012), (46, -0.019), (47, -0.004), (48, 0.005), (49, -0.057)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.87975663 342 nips-2013-Unsupervised Spectral Learning of Finite State Transducers

Author: Raphael Bailly, Xavier Carreras, Ariadna Quattoni

Abstract: Finite-State Transducers (FST) are a standard tool for modeling paired input-output sequences and are used in numerous applications, ranging from computational biology to natural language processing. Recently Balle et al. [4] presented a spectral algorithm for learning FST from samples of aligned input-output sequences. In this paper we address the more realistic, yet challenging setting where the alignments are unknown to the learning algorithm. We frame FST learning as finding a low rank Hankel matrix satisfying constraints derived from observable statistics. Under this formulation, we provide identifiability results for FST distributions. Then, following previous work on rank minimization, we propose a regularized convex relaxation of this objective which is based on minimizing a nuclear norm penalty subject to linear constraints and can be solved efficiently. 1

2 0.73080391 254 nips-2013-Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms

Author: Adrien Todeschini, François Caron, Marie Chavent

Abstract: We propose a novel class of algorithms for low rank matrix completion. Our approach builds on novel penalty functions on the singular values of the low rank matrix. By exploiting a mixture model representation of this penalty, we show that a suitably chosen set of latent variables enables to derive an Expectation-Maximization algorithm to obtain a Maximum A Posteriori estimate of the completed low rank matrix. The resulting algorithm is an iterative soft-thresholded algorithm which iteratively adapts the shrinkage coefficients associated to the singular values. The algorithm is simple to implement and can scale to large matrices. We provide numerical comparisons between our approach and recent alternatives showing the interest of the proposed approach for low rank matrix completion. 1

3 0.72378737 185 nips-2013-Matrix Completion From any Given Set of Observations

Author: Troy Lee, Adi Shraibman

Abstract: In the matrix completion problem the aim is to recover an unknown real matrix from a subset of its entries. This problem comes up in many application areas, and has received a great deal of attention in the context of the netflix prize. A central approach to this problem is to output a matrix of lowest possible complexity (e.g. rank or trace norm) that agrees with the partially specified matrix. The performance of this approach under the assumption that the revealed entries are sampled randomly has received considerable attention (e.g. [1, 2, 3, 4, 5, 6, 7, 8]). In practice, often the set of revealed entries is not chosen at random and these results do not apply. We are therefore left with no guarantees on the performance of the algorithm we are using. We present a means to obtain performance guarantees with respect to any set of initial observations. The first step remains the same: find a matrix of lowest possible complexity that agrees with the partially specified matrix. We give a new way to interpret the output of this algorithm by next finding a probability distribution over the non-revealed entries with respect to which a bound on the generalization error can be proven. The more complex the set of revealed entries according to a certain measure, the better the bound on the generalization error. 1

4 0.71982008 307 nips-2013-Speedup Matrix Completion with Side Information: Application to Multi-Label Learning

Author: Miao Xu, Rong Jin, Zhi-Hua Zhou

Abstract: In standard matrix completion theory, it is required to have at least O(n ln² n) observed entries to perfectly recover a low-rank matrix M of size n × n, leading to a large number of observations when n is large. In many real tasks, side information in addition to the observed entries is often available. In this work, we develop a novel theory of matrix completion that explicitly explore the side information to reduce the requirement on the number of observed entries. We show that, under appropriate conditions, with the assistance of side information matrices, the number of observed entries needed for a perfect recovery of matrix M can be dramatically reduced to O(ln n). We demonstrate the effectiveness of the proposed approach for matrix completion in transductive incomplete multi-label learning. 1

5 0.67202818 206 nips-2013-Near-Optimal Entrywise Sampling for Data Matrices

Author: Dimitris Achlioptas, Zohar S. Karnin, Edo Liberty

Abstract: We consider the problem of selecting non-zero entries of a matrix A in order to produce a sparse sketch of it, B, that minimizes ‖A − B‖2. For large m × n matrices, such that n ≫ m (for example, representing n observations over m attributes) we give sampling distributions that exhibit four important properties. First, they have closed forms computable from minimal information regarding A. Second, they allow sketching of matrices whose non-zeros are presented to the algorithm in arbitrary order as a stream, with O(1) computation per non-zero. Third, the resulting sketch matrices are not only sparse, but their non-zero entries are highly compressible. Lastly, and most importantly, under mild assumptions, our distributions are provably competitive with the optimal offline distribution. Note that the probabilities in the optimal offline distribution may be complex functions of all the entries in the matrix. Therefore, regardless of computational complexity, the optimal distribution might be impossible to compute in the streaming model. 1

6 0.65751237 179 nips-2013-Low-Rank Matrix and Tensor Completion via Adaptive Sampling

7 0.62185675 73 nips-2013-Convex Relaxations for Permutation Problems

8 0.60065663 186 nips-2013-Matrix factorization with binary components

9 0.58038723 108 nips-2013-Error-Minimizing Estimates and Universal Entry-Wise Error Bounds for Low-Rank Matrix Completion

10 0.56842142 352 nips-2013-What do row and column marginals reveal about your dataset?

11 0.54329473 259 nips-2013-Provable Subspace Clustering: When LRR meets SSC

12 0.54274082 306 nips-2013-Speeding up Permutation Testing in Neuroimaging

13 0.53979814 91 nips-2013-Dirty Statistical Models

14 0.53951985 296 nips-2013-Sinkhorn Distances: Lightspeed Computation of Optimal Transport

15 0.51776057 222 nips-2013-On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization

16 0.5032655 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA

17 0.4982512 297 nips-2013-Sketching Structured Matrices for Faster Nonlinear Regression

18 0.49717033 247 nips-2013-Phase Retrieval using Alternating Minimization

19 0.49501684 137 nips-2013-High-Dimensional Gaussian Process Bandits

20 0.47120956 113 nips-2013-Exact and Stable Recovery of Pairwise Interaction Tensors


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.01), (10, 0.338), (16, 0.015), (33, 0.126), (34, 0.077), (36, 0.011), (41, 0.026), (49, 0.029), (56, 0.123), (70, 0.022), (85, 0.037), (89, 0.032), (93, 0.033), (95, 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.69579923 342 nips-2013-Unsupervised Spectral Learning of Finite State Transducers

Author: Raphael Bailly, Xavier Carreras, Ariadna Quattoni

Abstract: Finite-State Transducers (FST) are a standard tool for modeling paired input-output sequences and are used in numerous applications, ranging from computational biology to natural language processing. Recently Balle et al. [4] presented a spectral algorithm for learning FST from samples of aligned input-output sequences. In this paper we address the more realistic, yet challenging setting where the alignments are unknown to the learning algorithm. We frame FST learning as finding a low rank Hankel matrix satisfying constraints derived from observable statistics. Under this formulation, we provide identifiability results for FST distributions. Then, following previous work on rank minimization, we propose a regularized convex relaxation of this objective which is based on minimizing a nuclear norm penalty subject to linear constraints and can be solved efficiently. 1

2 0.65792727 276 nips-2013-Reshaping Visual Datasets for Domain Adaptation

Author: Boqing Gong, Kristen Grauman, Fei Sha

Abstract: In visual recognition problems, the common data distribution mismatches between training and testing make domain adaptation essential. However, image data is difficult to manually divide into the discrete domains required by adaptation algorithms, and the standard practice of equating datasets with domains is a weak proxy for all the real conditions that alter the statistics in complex ways (lighting, pose, background, resolution, etc.) We propose an approach to automatically discover latent domains in image or video datasets. Our formulation imposes two key properties on domains: maximum distinctiveness and maximum learnability. By maximum distinctiveness, we require the underlying distributions of the identified domains to be different from each other to the maximum extent; by maximum learnability, we ensure that a strong discriminative model can be learned from the domain. We devise a nonparametric formulation and efficient optimization procedure that can successfully discover domains among both training and test data. We extensively evaluate our approach on object recognition and human activity recognition tasks. 1

3 0.6156925 158 nips-2013-Learning Multiple Models via Regularized Weighting

Author: Daniel Vainsencher, Shie Mannor, Huan Xu

Abstract: We consider the general problem of Multiple Model Learning (MML) from data, from the statistical and algorithmic perspectives; this problem includes clustering, multiple regression and subspace clustering as special cases. A common approach to solving new MML problems is to generalize Lloyd’s algorithm for clustering (or Expectation-Maximization for soft clustering). However this approach is unfortunately sensitive to outliers and large noise: a single exceptional point may take over one of the models. We propose a different general formulation that seeks for each model a distribution over data points; the weights are regularized to be sufficiently spread out. This enhances robustness by making assumptions on class balance. We further provide generalization bounds and explain how the new iterations may be computed efficiently. We demonstrate the robustness benefits of our approach with some experimental results and prove for the important case of clustering that our approach has a non-trivial breakdown point, i.e., is guaranteed to be robust to a fixed percentage of adversarial unbounded outliers. 1

4 0.6042279 95 nips-2013-Distributed Exploration in Multi-Armed Bandits

Author: Eshcar Hillel, Zohar S. Karnin, Tomer Koren, Ronny Lempel, Oren Somekh

Abstract: We study exploration in Multi-Armed Bandits in a setting where k players collaborate in order to identify an ε-optimal arm. Our motivation comes from recent employment of bandit algorithms in computationally intensive, large-scale applications. Our results demonstrate a non-trivial tradeoff between the number of arm pulls required by each of the players, and the amount of communication between them. In particular, our main result shows that by allowing the k players to communicate only once, they are able to learn √k times faster than a single player. That is, distributing learning to k players gives rise to a factor √k parallel speedup. We complement this result with a lower bound showing this is in general the best possible. On the other extreme, we present an algorithm that achieves the ideal factor k speed-up in learning performance, with communication only logarithmic in 1/ε. 1

5 0.51462501 249 nips-2013-Polar Operators for Structured Sparse Estimation

Author: Xinhua Zhang, Yao-Liang Yu, Dale Schuurmans

Abstract: Structured sparse estimation has become an important technique in many areas of data analysis. Unfortunately, these estimators normally create computational difficulties that entail sophisticated algorithms. Our first contribution is to uncover a rich class of structured sparse regularizers whose polar operator can be evaluated efficiently. With such an operator, a simple conditional gradient method can then be developed that, when combined with smoothing and local optimization, significantly reduces training time vs. the state of the art. We also demonstrate a new reduction of polar to proximal maps that enables more efficient latent fused lasso. 1

6 0.5139311 142 nips-2013-Information-theoretic lower bounds for distributed statistical estimation with communication constraints

7 0.5128696 74 nips-2013-Convex Tensor Decomposition via Structured Schatten Norm Regularization

8 0.51201278 344 nips-2013-Using multiple samples to learn mixture models

9 0.51183915 228 nips-2013-Online Learning of Dynamic Parameters in Social Networks

10 0.51158601 233 nips-2013-Online Robust PCA via Stochastic Optimization

11 0.51157057 318 nips-2013-Structured Learning via Logistic Regression

12 0.51105034 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA

13 0.51092279 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching

14 0.51055306 224 nips-2013-On the Sample Complexity of Subspace Learning

15 0.50990993 355 nips-2013-Which Space Partitioning Tree to Use for Search?

16 0.50980562 28 nips-2013-Adaptive Step-Size for Policy Gradient Methods

17 0.50872797 177 nips-2013-Local Privacy and Minimax Bounds: Sharp Rates for Probability Estimation

18 0.50859618 314 nips-2013-Stochastic Optimization of PCA with Capped MSG

19 0.50857919 184 nips-2013-Marginals-to-Models Reducibility

20 0.50855958 137 nips-2013-High-Dimensional Gaussian Process Bandits