jmlr jmlr2012 jmlr2012-15 knowledge-graph by maker-knowledge-mining

15 jmlr-2012-Algebraic Geometric Comparison of Probability Distributions


Source: pdf

Author: Franz J. Király, Paul von Bünau, Frank C. Meinecke, Duncan A.J. Blythe, Klaus-Robert Müller

Abstract: We propose a novel algebraic algorithmic framework for dealing with probability distributions represented by their cumulants such as the mean and covariance matrix. As an example, we consider the unsupervised learning problem of finding the subspace on which several probability distributions agree. Instead of minimizing an objective function involving the estimated cumulants, we show that by treating the cumulants as elements of the polynomial ring we can directly solve the problem, at a lower computational cost and with higher accuracy. Moreover, the algebraic viewpoint on probability distributions allows us to invoke the theory of algebraic geometry, which we demonstrate in a compact proof for an identifiability criterion. Keywords: computational algebraic geometry, approximate algebra, unsupervised learning

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 We propose a novel algebraic algorithmic framework for dealing with probability distributions represented by their cumulants such as the mean and covariance matrix. [sent-17, score-0.762]

2 Moreover, the algebraic viewpoint on probability distributions allows us to invoke the theory of algebraic geometry, which we demonstrate in a compact proof for an identifiability criterion. [sent-20, score-1.02]

3 Instead of minimizing an objective function that involves the polynomials, we consider the polynomials as objects in their own right and then solve the problem by algebraic manipulations. [sent-60, score-0.788]

4 Therefore we present an algorithm which combines the statistical treatment of uncertainty in the coefficients of polynomials with the exactness of algebraic computations to obtain a consistent estimator for v that is computationally efficient. [sent-77, score-0.788]

5 Moreover, an algebraic approach may also be useful in solving certain optimization problems, as the set of extrema of a polynomial objective function can be described by the vanishing set of its gradient. [sent-81, score-0.674]
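
As a small illustration of this remark (a sketch, not code from the paper; the objective p is an arbitrary choice), a computer algebra system can recover the extrema as the vanishing set of the gradient:

    from sympy import symbols, diff, solve

    x, y = symbols('x y')
    p = x**4 + x**2 - 2*x*y + y**2        # an arbitrary polynomial objective
    grad = [diff(p, x), diff(p, y)]       # extrema lie in the vanishing set of the gradient
    print(solve(grad, [x, y], dict=True)) # -> [{x: 0, y: 0}]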

6 The proof hinges on viewing the cumulants as polynomials in the algebraic geometry framework. [sent-136, score-1.386]

7 The polynomials that define the sought-after projection (e.g., Equations 3) generate an ideal in the polynomial ring which corresponds to an algebraic set that contains all possible solutions. [sent-138, score-0.81]

8 We can then show how many independent polynomials are necessary so that, in the generic case, the part of the algebraic set other than its linear part has smaller dimension. [sent-139, score-1.058]

9 We conjecture that these proof techniques are also applicable to other scenarios where we aim to identify a property of a probability distribution from its cumulants using algebraic methods. [sent-140, score-0.75]

10 There is also the nascent field of algebraic statistics which studies the parameter spaces of mainly discrete random variables in terms of commutative algebra and algebraic geometry, see the recent overviews by Sturmfels (2002, Chapter 8) and Drton et al. [sent-144, score-1.032]

11 We will first review the relationship between the probability density function and its cumulants, before translating the cumulants into algebraic objects. [sent-172, score-0.727]
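
For reference, the textbook relationship alluded to here (a standard fact, not quoted from the paper): the cumulants are the Taylor coefficients at zero of the cumulant generating function of the distribution,

    \kappa(t) = \log \mathbb{E}\!\left[e^{\langle t, X \rangle}\right], \qquad
    \nabla \kappa(0) = \mu \ (\text{mean}), \qquad
    \nabla^2 \kappa(0) = \Sigma \ (\text{covariance}),

so the first two cumulants are exactly the mean and covariance matrix named in the abstract.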

12 Now it is only a formal step to arrive in the framework of algebraic geometry: let us think of the left-hand side of each of the quadratic and linear equations as polynomials q1, … [sent-211, score-0.832]

13 Thus S can be rewritten in terms of polynomials, S = { v ∈ R^D | q_i(v) = f_i(v) = 0 for all 1 ≤ i ≤ m − 1 }, which means that S is an algebraic set. [sent-228, score-0.683]
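
One plausible reconstruction of the polynomials behind this set (inferred from the surrounding sentences, not quoted verbatim from the paper; here μ_i and Σ_i denote the mean and covariance of the i-th distribution):

    q_i(v) = v^\top \left( \Sigma_{i+1} - \Sigma_1 \right) v, \qquad
    f_i(v) = \left( \mu_{i+1} - \mu_1 \right)^\top v, \qquad 1 \le i \le m-1,

so that S collects exactly the projection directions on which the first two cumulants of all m distributions agree.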

14 Intuitively, a generic polynomial is a continuous, polynomial-valued random variable which almost surely has no algebraic properties except for those that are logically implied by the conditions on it. [sent-255, score-0.995]

15 An algebraic property is an event in the probability space of polynomials which is defined by the common vanishing of a set of polynomial equations in the coefficients. [sent-256, score-1.026]

16 For example, the property that a quadratic polynomial is the square of a linear polynomial is an algebraic property, since it is described by the vanishing of the discriminants. [sent-257, score-0.84]
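
Concretely, for a homogeneous quadratic in two variables this is the familiar single discriminant condition:

    a x^2 + b x y + c y^2 = (\alpha x + \beta y)^2
    \quad \Longleftrightarrow \quad
    b^2 - 4ac = 0,

one polynomial equation in the coefficients (a, b, c), hence an algebraic property in the sense just defined.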

17 In the context of Problem 7, we will consider the observed polynomials as generic conditioned on the algebraic property that they vanish on a fixed d-dimensional linear subspace S′ . [sent-258, score-1.163]

18 In the last subsection, we have seen how to reformulate our initial Problem 1 about the comparison of cumulants as the completely algebraic Problem 8. [sent-290, score-0.703]

19 Since identifiability is now a completely algebraic statement, it can also be treated in algebraic terms. [sent-293, score-0.966]

20 Let f1, …, fn be generic homogeneous polynomials in D variables (of fixed but arbitrary degree each), vanishing on Z. [sent-312, score-0.912]

21 In the previous section, we derived in Problem 8 an algebraic formulation of our task: given generic quadratic polynomials q1, … [sent-342, score-1.08]

22 …, fm−1, vanishing on an unknown linear subspace S′ of C^D, find S′ as the unique d-dimensional linear subspace in the algebraic set V(q1, …). [sent-348, score-0.717]

23 First of all, note that the linear equations fi can easily be removed from the problem: instead of looking at C^D, we can consider the linear subspace defined by the fi, and examine the algebraic set V(q′1, …). [sent-355, score-0.821]
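
A linear-algebra sketch of this reduction (a hypothetical helper, not the paper's implementation): restrict each quadratic's coefficient matrix to an orthonormal basis of the common null space of the linear forms.

    import numpy as np

    def restrict_quadratics(As, fs, tol=1e-10):
        # As: D x D symmetric matrices of the quadratics q_i
        # fs: length-D coefficient vectors of the linear forms f_i
        F = np.vstack(fs)                    # stack linear forms row-wise
        _, s, Vt = np.linalg.svd(F)
        rank = int(np.sum(s > tol * s.max()))
        N = Vt[rank:].T                      # basis of {v : f_i(v) = 0 for all i}
        return [N.T @ A @ N for A in As]     # quadratics q'_i restricted to that subspace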

24 Given m − 1 ≥ D generic homogeneous quadratic polynomials q1, … [sent-363, score-0.774]

25 The algebraic set S corresponds to an ideal in the polynomial ring C[T1, …]. [sent-379, score-0.81]

26 Let q1, …, qm−1 be generic homogeneous quadratic polynomials vanishing on a d-dimensional linear subspace S ⊆ C^D. [sent-396, score-0.926]

27 Computing the radical of an ideal is a classical problem in computational algebraic geometry, which is known to be difficult (for a more detailed discussion see Section 3). [sent-406, score-0.702]

28 Moreover, slight algebraic modifications of this strategy also allow us to consider data from cumulants of different degree simultaneously, and to reduce the number of polynomials needed to O(D); however, due to its technicality, this is beyond the scope of the paper. [sent-506, score-1.076]

29 Let q1, …, qm with m ≥ ∆(D) − ∆(d) be generic homogeneous quadratic polynomials in s. [sent-535, score-0.89]
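
One consistent reading of this threshold (our interpretation; the definition of ∆ is not quoted in this summary): with ∆(n) = n(n+1)/2 the dimension of the space of homogeneous quadratics in n variables,

    \Delta(D) - \Delta(d) = \binom{D+1}{2} - \binom{d+1}{2}

is the dimension of the space of quadratics vanishing on a fixed d-dimensional subspace (in adapted coordinates, exactly the top d × d block of the coefficient matrix must vanish), so at least this many generic quadratics are needed to span that space.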

30 In this section, we discuss how the algebraic formulation of the cumulant comparison problem given in Problem 14 relates to the classical problems in computational algebraic geometry. [sent-597, score-1.026]

31 Recently, the algebraic geometry community has developed an increasing interest in solving algebraic problems arising from the consideration of real world data. [sent-643, score-1.016]

32 In this section we show how algebraic computations can be applied to polynomials with inexact coefficients obtained from cumulants estimated on finite samples. [sent-659, score-1.008]

33 In algebraic terms, working with inexact polynomials means that the joint vanishing set of q1, … [sent-669, score-0.858]

34 Figure 6 (caption; axes: noise level (log10), number of stationary sources): The left panel shows a comparison of the algebraic method and the SSA algorithm over varying noise levels (five stationary sources in ten dimensions); the two curves show the median log error. [sent-841, score-0.724]

35 In this paper we have shown how a learning problem formulated in terms of cumulants of probability distributions can be addressed in the framework of computational algebraic geometry. [sent-850, score-0.727]

36 To that end, we have introduced the theoretical groundwork for an algebraic treatment of inexact cumulants estimated from data: the concept of polynomials that are generic up to a certain property which we aim to recover from the data. [sent-855, score-1.301]

37 In particular, we have shown how we can find an approximate exact solution to this problem using algebraic manipulation of cumulants estimated on samples drawn from X1, … [sent-856, score-0.703]

38 Moreover, using the algebraic problem formulation in terms of generic polynomials, we have presented compact proofs for a condition on the identifiability of the true solution. [sent-861, score-0.753]

39 This is due to the fact that the algebraic algorithm represents the cumulant polynomials in the vector space of coefficients. [sent-866, score-0.848]
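
A minimal sketch of this representation (hypothetical helper, numpy only; not the paper's code): map each quadratic form to its coefficient vector, after which spans and least-squares fits of cumulant polynomials are plain linear algebra.

    import numpy as np
    from itertools import combinations_with_replacement

    def quad_coeffs(A):
        # Coefficient vector of the quadratic form q(v) = v' A v in the
        # Delta(D) = D*(D+1)/2 dimensional space of quadratic coefficients.
        A = 0.5 * (A + A.T)  # symmetrize
        D = A.shape[0]
        pairs = combinations_with_replacement(range(D), 2)
        return np.array([A[i, j] if i == j else 2 * A[i, j] for i, j in pairs])

    # Stacking quad_coeffs(Sigma_i - Sigma_1) over all distributions turns
    # span, rank, and least-squares questions about the cumulant polynomials
    # into ordinary, numerically stable matrix computations.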

40 Namely, one has to note that q1 and q2 are generic homogeneous polynomials of degree 2, vanishing on S. [sent-908, score-0.89]

41 The solution can then be determined as an ideal generated by generic homogeneous polynomials vanishing on a linear subspace. [sent-948, score-0.951]

42 Since genericity is an algebraic-geometric concept, knowledge of basic algebraic geometry is required to understand this section. [sent-950, score-0.733]

43 In particular, the reader should be at least familiar with the following concepts before reading this section: polynomial rings, ideals, radicals, factor rings, algebraic sets, the algebra-geometry correspondence (including Hilbert’s Nullstellensatz), primary decomposition, height resp. … [sent-951, score-0.658]

44 In the algebraic setting of the paper, we would like to calculate the radical of an ideal I = ⟨q1, …⟩. [sent-956, score-0.702]

45 So, informally speaking, the ideal I is generated by generic homogeneous elements vanishing on S. [sent-968, score-0.646]

46 On the other hand, genericity is a classical concept in algebraic geometry, in particular in the theory of moduli. [sent-971, score-0.683]

47 The interpretation of generic properties as probability-one properties is also a known concept in applied algebraic geometry, for example, algebraic statistics. [sent-972, score-1.236]

48 However, the application of probability distributions and genericity to the setting of generic ideals, in particular in the context of conditional probabilities, is original to the best of our knowledge, though ours is not the first work to involve generic resp. … [sent-973, score-0.764]

49 A collection of results on generic polynomials and ideals which partly overlap with ours may also be found in the recent paper of Pardue (2010). [sent-976, score-0.682]

50 We will now fix notation for the parameter space of polynomials and endow it with algebraic structure. [sent-1000, score-0.788]

51 This dual interpretation will be the main ingredient in our definition of genericity, and will allow us to relate algebraic results on genericity to the probabilistic setting in the applications. [sent-1005, score-0.683]

52 Then an event for X is called an algebraic event or algebraic property if the corresponding event set in Mk is an algebraic set. [sent-1007, score-1.472]

53 By definition, it suffices to prove that the subset of Mk corresponding to those polynomials is an irreducible algebraic set. [sent-1029, score-0.855]

54 For example, a polynomial of degree 4 may factor into two polynomials of degrees 1 and 3, or into two polynomials of degree 2 each. [sent-1039, score-0.935]

55 … (e.g., irreducible) algebraic property, should not fulfill any other algebraic property which is not logically implied by the first one. [sent-1045, score-0.989]

56 Definition 32 We call a generic random variable with values in Mk a generic polynomial of degree k. [sent-1064, score-0.729]

57 We call f1, …, fm generic if they are generic and independent. [sent-1070, score-0.679]

58 We call an ideal generic if it is generated by a set of m generic polynomials. [sent-1071, score-0.669]

59 We call an algebraic set generic if it is the vanishing set of a generic ideal. [sent-1072, score-1.093]

60 Let P be an algebraic property of a polynomial, a set of polynomials, an ideal, or an algebraic set (e.g., …). [sent-1073, score-0.989]

61 If A is a statement about an object (polynomial, ideal, etc.), and P an algebraic property, we will briefly say “a generic P object is A” instead of “a generic P object is A with probability one”. [sent-1078, score-1.199]

62 So for example, when we say “The Z of a generic red polynomial is a generic yellow polynomial”, this is an abbreviation of the statement “Let X be a generic random variable on Mk, let X′ be the yellow conditional of X, …”. [sent-1093, score-0.976]

63 As we will see, genericity is the right concept to model random sampling of polynomials: we will derive special properties of the ideal I which follow from the genericity of the fi. [sent-1102, score-0.657]

64 Furthermore, sums of generic distributions are again generic; also, any continuous probability density dominated by the density of a generic distribution again defines a generic distribution. [sent-1144, score-0.834]

65 Another example where genericity classically occurs is algebraic geometry, where it is defined rather generally for moduli spaces. [sent-1172, score-0.683]

66 … generic properties in the algebraic sense, as any general resp. … [sent-1174, score-0.753]

67 A property A of P is called very generic if it holds on the complement of some countable union of algebraic sets in B. [sent-1181, score-0.776]

68 …, XD−1, 1) is a generic polynomial of degree d in the polynomial ring C[X1, …]. [sent-1218, score-0.657]

69 We will derive a statement about spans of generic polynomials, and generic versions of Krull’s principal ideal and height theorems which will be the main tool in controlling the structure of generic ideals. [sent-1243, score-1.040]

70 Now we present the first result which can be easily formulated in terms of genericity: Proposition 42 Let P be an algebraic property such that the polynomials with property P form a vector space V . [sent-1245, score-0.834]

71 We now proceed to another nontrivial result which will allow us to formulate a generic version of Krull’s principal ideal theorem: Proposition 43 Let Z ⊆ C^D be a non-empty algebraic set, let f ∈ C[X1, …]. [sent-1264, score-0.928]

72 Proof We claim: being a zero divisor in O (Z) is an irreducible algebraic property. [sent-1272, score-0.652]

73 This shows that g1 + αg2 is also a zero divisor, proving that the zero divisors form a linear subspace and thus an irreducible algebraic property. [sent-1276, score-0.709]

74 If one can write f = f′ + c, where f′ is a generic polynomial subject to some property P′, and c is a generic constant, then f is not a zero divisor in C[X1, …]. [sent-1291, score-0.786]

75 Now, as in the proof of Proposition 43, we see that being a zero divisor in O (Z) is an irreducible algebraic property and corresponds to a linear subspace of Mk , where k = deg f . [sent-1303, score-0.807]

76 Note that Proposition 43 is actually a special case of Corollary 44, since we can write any generic polynomial f as f ′ + c, where f ′ is generic of the same degree, and c is a generic constant. [sent-1310, score-0.931]

77 The major tool to deal with the dimension of generic intersections is Krull’s principal ideal theorem: Theorem 45 (Krull’s principal ideal theorem) Let R be a commutative ring with unit, let f ∈ R be non-zero and non-invertible. [sent-1311, score-0.702]
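
The conclusion of the theorem is cut off in the extraction; the standard statement (a textbook fact, usually stated for Noetherian R) finishes:

    \operatorname{ht}(\mathfrak{p}) \le 1
    \quad \text{for every prime ideal } \mathfrak{p} \subseteq R \text{ minimal over } (f),

geometrically, a single equation cuts the dimension of an algebraic set by at most one, which is what the generic versions below exploit.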

78 Together with Proposition 43, one gets a generic version of Krull’s principal ideal theorem: Theorem 47 (Generic principal ideal theorem) Let Z be a non-empty algebraic set in C^D, let R = O(Z), and let f ∈ C[X1, …]. [sent-1320, score-1.103]

79 Corollary 48 Consider an algebraic set Z ⊆ C^D, and the algebraic set V(f) for some generic f ∈ C[X1, …]. [sent-1326, score-1.236]

80 The generic version of the principal ideal theorem straightforwardly generalizes to a generic version of Krull’s height theorem. [sent-1334, score-0.747]

81 The generic version can be derived directly from the generic principal ideal theorem: Theorem 50 (Generic height theorem) Let Z be an algebraic set in C^D, let I = ⟨f1, …⟩. [sent-1346, score-1.252]

82 Note that we could have proved the generic height theorem also directly from the generic principal ideal theorem by induction. [sent-1384, score-0.747]

83 Again, we give the geometric interpretation of Krull’s height theorem: Corollary 51 Let Z1 be an algebraic set in C^D, let Z2 be a generic algebraic set in C^D. [sent-1385, score-1.312]

84 We can now immediately formulate a homogeneous version of Proposition 51: Corollary 52 Let Z1 be a homogeneous algebraic set in C^D, let Z2 be a generic homogeneous algebraic set in C^D. [sent-1388, score-1.789]

85 Proof Note that homogenization and dehomogenization of a non-empty algebraic set do not change its codimension, and homogeneous algebraic sets always contain the origin. [sent-1390, score-1.193]

86 Also, note that by Lemma 41, the dehomogenization of Z2 is a generic algebraic set in C^(D−1). [sent-1391, score-0.803]

87 If one can write f = f′ + c, where f′ is a generic polynomial subject to some property P′, and c is a generic constant, we say that f has an independent constant term. [sent-1397, score-0.684]

88 Using this, we can now formulate the corresponding variant of the generic height theorem: Lemma 54 Let Z be an algebraic set in C^D. [sent-1400, score-0.807]

89 The generic height theorem (Theorem 50) has allowed us to make statements about the structure of ideals generated by generic elements without constraints. [sent-1418, score-0.701]

90 In this subsection, we will use the theory developed so far to study generic ideals and generic ideals subject to some algebraic properties, for example, generic ideals contained in other ideals. [sent-1420, score-1.614]

91 Let f1, …, fm, m ≥ max(D + 1, n), with generic fi ∈ s such that deg fi ≥ max_j (deg gj) for all 1 ≤ i ≤ m. [sent-1433, score-0.715]

92 Proof First note that since the gi form a degree-first Groebner basis, a generic f ∈ s is of the form f = ∑_{k=1}^{n} g_k h_k with generic h_k, where the degrees of the h_k are appropriately chosen, that is, deg h_k ≤ deg f − deg g_k. [sent-1435, score-0.777]
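
A toy illustration of this kind of representation via reduction modulo a Groebner basis (the generators are an illustrative choice, unrelated to the paper's data):

    from sympy import symbols, groebner

    x, y, z = symbols('x y z')
    # illustrative generators g_k: the quadratics vanishing on the line x = y = 0
    G = groebner([x**2, x*y, y**2], x, y, z, order='grevlex')
    hs, remainder = G.reduce(x**2 * z + 3 * x * y)
    # hs are the h_k with f = sum(g_k * h_k) + remainder; ideal members leave remainder 0
    print(hs, remainder)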

93 So we may write f_i = ∑_{k=1}^{n} g_k h_{ki} with generic h_{ki}, where the h_{ki} are generic with appropriate degrees and independently chosen. [sent-1436, score-0.83]

94 Algebraic sets which are not complete intersection sets are still contained in a complete intersection set of the same dimension, so the following similar result holds for arbitrary algebraic sets: Corollary 59 Let V be a fixed algebraic set in C^D, of codimension d; let g1, … [sent-1576, score-1.048]

95 …, fm) = V ∪ U with an algebraic set U whose codimension satisfies codim U = m if m ≤ d, and codim U ≥ min(D + 1 + m − n, m, D + 1) if m ≥ d. [sent-1587, score-0.682]

96 Let Z2 be a generic algebraic set in C^D containing V. [sent-1590, score-0.753]

97 Informally, we have derived a height-theorem-type result for algebraic sets under the constraint that they contain another prescribed algebraic set V. [sent-1592, score-1.02]

98 Let f1, …, fm be generic homogeneous forms in I(V), satisfying the degree condition as in Proposition 57. [sent-1597, score-0.681]

99 Let f1, …, fN be generic homogeneous quadratic or linear polynomials in s. [sent-1637, score-0.774]

100 Let f1, …, fN be generic homogeneous polynomials of degree one or two, vanishing on a linear space S of dimension d > 0. [sent-1651, score-0.89]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('algebraic', 0.483), ('polynomials', 0.305), ('generic', 0.27), ('cumulants', 0.22), ('genericity', 0.2), ('homogenous', 0.177), ('codim', 0.14), ('fm', 0.139), ('ideal', 0.129), ('fi', 0.128), ('td', 0.124), ('polynomial', 0.121), ('aly', 0.12), ('einecke', 0.12), ('istributions', 0.12), ('lgebraic', 0.12), ('lythe', 0.12), ('uller', 0.12), ('unau', 0.12), ('qm', 0.116), ('ideals', 0.107), ('von', 0.103), ('eometric', 0.103), ('robability', 0.103), ('cd', 0.09), ('radical', 0.09), ('subspace', 0.082), ('divisor', 0.081), ('nau', 0.08), ('ssa', 0.08), ('ir', 0.08), ('panel', 0.077), ('ring', 0.077), ('krull', 0.075), ('monomials', 0.075), ('qi', 0.072), ('omparison', 0.071), ('vanishing', 0.07), ('degree', 0.068), ('irreducible', 0.067), ('xd', 0.067), ('mk', 0.065), ('codimension', 0.06), ('cumulant', 0.06), ('stationary', 0.055), ('height', 0.054), ('deg', 0.05), ('dehomogenization', 0.05), ('meinecke', 0.05), ('geometry', 0.05), ('proposition', 0.048), ('hki', 0.045), ('span', 0.044), ('triangular', 0.04), ('algebra', 0.039), ('coef', 0.038), ('covariance', 0.035), ('divisors', 0.035), ('groebner', 0.035), ('divisible', 0.034), ('monomial', 0.034), ('degrees', 0.033), ('doi', 0.033), ('generators', 0.032), ('ti', 0.032), ('cients', 0.032), ('ful', 0.031), ('px', 0.031), ('blythe', 0.03), ('codimu', 0.03), ('kir', 0.03), ('lexicographical', 0.03), ('viewpoint', 0.03), ('row', 0.029), ('gk', 0.027), ('commutative', 0.027), ('forms', 0.027), ('xm', 0.027), ('sources', 0.027), ('subspaces', 0.026), ('homogenously', 0.025), ('pxm', 0.025), ('paul', 0.025), ('principal', 0.024), ('vanishes', 0.024), ('epochs', 0.024), ('probability', 0.024), ('remark', 0.023), ('statement', 0.023), ('projection', 0.023), ('property', 0.023), ('ly', 0.023), ('identi', 0.023), ('fn', 0.022), ('let', 0.022), ('quadratic', 0.022), ('zero', 0.021), ('duncan', 0.021), ('franz', 0.021), ('basis', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 15 jmlr-2012-Algebraic Geometric Comparison of Probability Distributions

Author: Franz J. Király, Paul von Bünau, Frank C. Meinecke, Duncan A.J. Blythe, Klaus-Robert Müller

Abstract: We propose a novel algebraic algorithmic framework for dealing with probability distributions represented by their cumulants such as the mean and covariance matrix. As an example, we consider the unsupervised learning problem of finding the subspace on which several probability distributions agree. Instead of minimizing an objective function involving the estimated cumulants, we show that by treating the cumulants as elements of the polynomial ring we can directly solve the problem, at a lower computational cost and with higher accuracy. Moreover, the algebraic viewpoint on probability distributions allows us to invoke the theory of algebraic geometry, which we demonstrate in a compact proof for an identifiability criterion. Keywords: computational algebraic geometry, approximate algebra, unsupervised learning

2 0.094905242 91 jmlr-2012-Plug-in Approach to Active Learning

Author: Stanislav Minsker

Abstract: We present a new active learning algorithm based on nonparametric estimators of the regression function. Our investigation provides probabilistic bounds for the rates of convergence of the generalization error achievable by proposed method over a broad class of underlying distributions. We also prove minimax lower bounds which show that the obtained rates are almost tight. Keywords: active learning, selective sampling, model selection, classification, confidence bands

3 0.084793501 73 jmlr-2012-Multi-task Regression using Minimal Penalties

Author: Matthieu Solnon, Sylvain Arlot, Francis Bach

Abstract: In this paper we study the kernel multiple ridge regression framework, which we refer to as multitask regression, using penalization techniques. The theoretical analysis of this problem shows that the key element appearing for an optimal calibration is the covariance matrix of the noise between the different tasks. We present a new algorithm to estimate this covariance matrix, based on the concept of minimal penalty, which was previously used in the single-task regression framework to estimate the variance of the noise. We show, in a non-asymptotic setting and under mild assumptions on the target function, that this estimator converges towards the covariance matrix. Then plugging this estimator into the corresponding ideal penalty leads to an oracle inequality. We illustrate the behavior of our algorithm on synthetic examples. Keywords: multi-task, oracle inequality, learning theory

4 0.072848372 76 jmlr-2012-Noise-Contrastive Estimation of Unnormalized Statistical Models, with Applications to Natural Image Statistics

Author: Michael U. Gutmann, Aapo Hyvärinen

Abstract: We consider the task of estimating, from observed data, a probabilistic model that is parameterized by a finite number of parameters. In particular, we are considering the situation where the model probability density function is unnormalized. That is, the model is only specified up to the partition function. The partition function normalizes a model so that it integrates to one for any choice of the parameters. However, it is often impossible to obtain it in closed form. Gibbs distributions, Markov and multi-layer networks are examples of models where analytical normalization is often impossible. Maximum likelihood estimation can then not be used without resorting to numerical approximations which are often computationally expensive. We propose here a new objective function for the estimation of both normalized and unnormalized models. The basic idea is to perform nonlinear logistic regression to discriminate between the observed data and some artificially generated noise. With this approach, the normalizing partition function can be estimated like any other parameter. We prove that the new estimation method leads to a consistent (convergent) estimator of the parameters. For large noise sample sizes, the new estimator is furthermore shown to behave like the maximum likelihood estimator. In the estimation of unnormalized models, there is a trade-off between statistical and computational performance. We show that the new method strikes a competitive trade-off in comparison to other estimation methods for unnormalized models. As an application to real data, we estimate novel two-layer models of natural image statistics with spline nonlinearities. Keywords: statistics unnormalized models, partition function, computation, estimation, natural image

5 0.06423644 68 jmlr-2012-Minimax Manifold Estimation

Author: Christopher Genovese, Marco Perone-Pacifico, Isabella Verdinelli, Larry Wasserman

Abstract: We find the minimax rate of convergence in Hausdorff distance for estimating a manifold M of dimension d embedded in RD given a noisy sample from the manifold. Under certain conditions, we show that the optimal rate of convergence is n−2/(2+d) . Thus, the minimax rate depends only on the dimension of the manifold, not on the dimension of the space in which M is embedded. Keywords: manifold learning, minimax estimation

6 0.049547795 3 jmlr-2012-A Geometric Approach to Sample Compression

7 0.04522679 49 jmlr-2012-Hope and Fear for Discriminative Training of Statistical Translation Models

8 0.037909709 14 jmlr-2012-Activized Learning: Transforming Passive to Active with Improved Label Complexity

9 0.037047766 59 jmlr-2012-Linear Regression With Random Projections

10 0.036183774 2 jmlr-2012-A Comparison of the Lasso and Marginal Regression

11 0.032725938 51 jmlr-2012-Integrating a Partial Model into Model Free Reinforcement Learning

12 0.032563705 67 jmlr-2012-Minimax-Optimal Rates For Sparse Additive Models Over Kernel Classes Via Convex Programming

13 0.032158148 57 jmlr-2012-Learning Symbolic Representations of Hybrid Dynamical Systems

14 0.031546682 96 jmlr-2012-Refinement of Operator-valued Reproducing Kernels

15 0.030441845 99 jmlr-2012-Restricted Strong Convexity and Weighted Matrix Completion: Optimal Bounds with Noise

16 0.030370789 80 jmlr-2012-On Ranking and Generalization Bounds

17 0.028089633 43 jmlr-2012-Fast Approximation of Matrix Coherence and Statistical Leverage

18 0.025680587 83 jmlr-2012-Online Learning in the Embedded Manifold of Low-rank Matrices

19 0.025537161 118 jmlr-2012-Variational Multinomial Logit Gaussian Process

20 0.025416266 46 jmlr-2012-Finite-Sample Analysis of Least-Squares Policy Iteration


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.153), (1, 0.071), (2, -0.033), (3, -0.032), (4, 0.003), (5, -0.094), (6, 0.027), (7, 0.006), (8, 0.043), (9, -0.063), (10, -0.049), (11, 0.003), (12, -0.1), (13, 0.035), (14, -0.089), (15, 0.061), (16, -0.187), (17, -0.061), (18, 0.097), (19, -0.037), (20, -0.155), (21, 0.322), (22, -0.074), (23, 0.167), (24, -0.036), (25, -0.053), (26, 0.144), (27, 0.039), (28, -0.001), (29, -0.003), (30, 0.002), (31, -0.093), (32, 0.007), (33, -0.004), (34, -0.057), (35, 0.077), (36, 0.024), (37, 0.07), (38, -0.121), (39, 0.094), (40, -0.104), (41, 0.119), (42, -0.012), (43, 0.094), (44, -0.025), (45, 0.134), (46, -0.017), (47, 0.004), (48, -0.046), (49, -0.13)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94678676 15 jmlr-2012-Algebraic Geometric Comparison of Probability Distributions

Author: Franz J. Király, Paul von Bünau, Frank C. Meinecke, Duncan A.J. Blythe, Klaus-Robert Müller

Abstract: We propose a novel algebraic algorithmic framework for dealing with probability distributions represented by their cumulants such as the mean and covariance matrix. As an example, we consider the unsupervised learning problem of finding the subspace on which several probability distributions agree. Instead of minimizing an objective function involving the estimated cumulants, we show that by treating the cumulants as elements of the polynomial ring we can directly solve the problem, at a lower computational cost and with higher accuracy. Moreover, the algebraic viewpoint on probability distributions allows us to invoke the theory of algebraic geometry, which we demonstrate in a compact proof for an identifiability criterion. Keywords: computational algebraic geometry, approximate algebra, unsupervised learning

2 0.5148561 73 jmlr-2012-Multi-task Regression using Minimal Penalties

Author: Matthieu Solnon, Sylvain Arlot, Francis Bach

Abstract: In this paper we study the kernel multiple ridge regression framework, which we refer to as multitask regression, using penalization techniques. The theoretical analysis of this problem shows that the key element appearing for an optimal calibration is the covariance matrix of the noise between the different tasks. We present a new algorithm to estimate this covariance matrix, based on the concept of minimal penalty, which was previously used in the single-task regression framework to estimate the variance of the noise. We show, in a non-asymptotic setting and under mild assumptions on the target function, that this estimator converges towards the covariance matrix. Then plugging this estimator into the corresponding ideal penalty leads to an oracle inequality. We illustrate the behavior of our algorithm on synthetic examples. Keywords: multi-task, oracle inequality, learning theory

3 0.4905892 91 jmlr-2012-Plug-in Approach to Active Learning

Author: Stanislav Minsker

Abstract: We present a new active learning algorithm based on nonparametric estimators of the regression function. Our investigation provides probabilistic bounds for the rates of convergence of the generalization error achievable by proposed method over a broad class of underlying distributions. We also prove minimax lower bounds which show that the obtained rates are almost tight. Keywords: active learning, selective sampling, model selection, classification, confidence bands

4 0.37550139 3 jmlr-2012-A Geometric Approach to Sample Compression

Author: Benjamin I.P. Rubinstein, J. Hyam Rubinstein

Abstract: The Sample Compression Conjecture of Littlestone & Warmuth has remained unsolved for a quarter century. While maximum classes (concept classes meeting Sauer’s Lemma with equality) can be compressed, the compression of general concept classes reduces to compressing maximal classes (classes that cannot be expanded without increasing VC dimension). Two promising ways forward are: embedding maximal classes into maximum classes with at most a polynomial increase to VC dimension, and compression via operating on geometric representations. This paper presents positive results on the latter approach and a first negative result on the former, through a systematic investigation of finite maximum classes. Simple arrangements of hyperplanes in hyperbolic space are shown to represent maximum classes, generalizing the corresponding Euclidean result. We show that sweeping a generic hyperplane across such arrangements forms an unlabeled compression scheme of size VC dimension and corresponds to a special case of peeling the one-inclusion graph, resolving a recent conjecture of Kuzmin & Warmuth. A bijection between finite maximum classes and certain arrangements of piecewise-linear (PL) hyperplanes in either a ball or Euclidean space is established. Finally we show that d-maximum classes corresponding to PL-hyperplane arrangements in Rd have cubical complexes homeomorphic to a d-ball, or equivalently complexes that are manifolds with boundary. A main result is that PL arrangements can be swept by a moving hyperplane to unlabeled d-compress any finite maximum class, forming a peeling scheme as conjectured by Kuzmin & Warmuth. A corollary is that some d-maximal classes cannot be embedded into any maximum class of VC-dimension d + k, for any constant k. The construction of the PL sweeping involves Pachner moves on the one-inclusion graph, corresponding to moves of a hyperplane across the intersection of d other hyperplanes. This extends the well known Pachner moves for triangulations to c

5 0.35031006 68 jmlr-2012-Minimax Manifold Estimation

Author: Christopher Genovese, Marco Perone-Pacifico, Isabella Verdinelli, Larry Wasserman

Abstract: We find the minimax rate of convergence in Hausdorff distance for estimating a manifold M of dimension d embedded in RD given a noisy sample from the manifold. Under certain conditions, we show that the optimal rate of convergence is n−2/(2+d) . Thus, the minimax rate depends only on the dimension of the manifold, not on the dimension of the space in which M is embedded. Keywords: manifold learning, minimax estimation

6 0.33026439 49 jmlr-2012-Hope and Fear for Discriminative Training of Statistical Translation Models

7 0.32620367 76 jmlr-2012-Noise-Contrastive Estimation of Unnormalized Statistical Models, with Applications to Natural Image Statistics

8 0.27964222 57 jmlr-2012-Learning Symbolic Representations of Hybrid Dynamical Systems

9 0.26265612 70 jmlr-2012-Multi-Assignment Clustering for Boolean Data

10 0.24024314 25 jmlr-2012-Characterization and Greedy Learning of Interventional Markov Equivalence Classes of Directed Acyclic Graphs

11 0.19858253 96 jmlr-2012-Refinement of Operator-valued Reproducing Kernels

12 0.19839606 2 jmlr-2012-A Comparison of the Lasso and Marginal Regression

13 0.1780449 56 jmlr-2012-Learning Linear Cyclic Causal Models with Latent Variables

14 0.17443463 18 jmlr-2012-An Improved GLMNET for L1-regularized Logistic Regression

15 0.16909531 6 jmlr-2012-A Model of the Perception of Facial Expressions of Emotion by Humans: Research Overview and Perspectives

16 0.15206908 43 jmlr-2012-Fast Approximation of Matrix Coherence and Statistical Leverage

17 0.15081312 51 jmlr-2012-Integrating a Partial Model into Model Free Reinforcement Learning

18 0.1494752 59 jmlr-2012-Linear Regression With Random Projections

19 0.1490123 100 jmlr-2012-Robust Kernel Density Estimation

20 0.1456238 118 jmlr-2012-Variational Multinomial Logit Gaussian Process


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(7, 0.01), (18, 0.396), (21, 0.031), (26, 0.04), (27, 0.014), (29, 0.04), (35, 0.015), (49, 0.023), (56, 0.011), (57, 0.013), (69, 0.019), (75, 0.052), (77, 0.02), (79, 0.021), (92, 0.108), (96, 0.106)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.71851039 15 jmlr-2012-Algebraic Geometric Comparison of Probability Distributions

Author: Franz J. Király, Paul von Bünau, Frank C. Meinecke, Duncan A.J. Blythe, Klaus-Robert Müller

Abstract: We propose a novel algebraic algorithmic framework for dealing with probability distributions represented by their cumulants such as the mean and covariance matrix. As an example, we consider the unsupervised learning problem of finding the subspace on which several probability distributions agree. Instead of minimizing an objective function involving the estimated cumulants, we show that by treating the cumulants as elements of the polynomial ring we can directly solve the problem, at a lower computational cost and with higher accuracy. Moreover, the algebraic viewpoint on probability distributions allows us to invoke the theory of algebraic geometry, which we demonstrate in a compact proof for an identifiability criterion. Keywords: computational algebraic geometry, approximate algebra, unsupervised learning

2 0.39717799 85 jmlr-2012-Optimal Distributed Online Prediction Using Mini-Batches

Author: Ofer Dekel, Ran Gilad-Bachrach, Ohad Shamir, Lin Xiao

Abstract: Online prediction methods are typically presented as serial algorithms running on a single processor. However, in the age of web-scale prediction problems, it is increasingly common to encounter situations where a single processor cannot keep up with the high rate at which inputs arrive. In this work, we present the distributed mini-batch algorithm, a method of converting many serial gradient-based online prediction algorithms into distributed algorithms. We prove a regret bound for this method that is asymptotically optimal for smooth convex loss functions and stochastic inputs. Moreover, our analysis explicitly takes into account communication latencies between nodes in the distributed environment. We show how our method can be used to solve the closely-related distributed stochastic optimization problem, achieving an asymptotically linear speed-up over multiple processors. Finally, we demonstrate the merits of our approach on a web-scale online prediction problem. Keywords: distributed computing, online learning, stochastic optimization, regret bounds, convex optimization

3 0.39682853 64 jmlr-2012-Manifold Identification in Dual Averaging for Regularized Stochastic Online Learning

Author: Sangkyun Lee, Stephen J. Wright

Abstract: Iterative methods that calculate their steps from approximate subgradient directions have proved to be useful for stochastic learning problems over large and streaming data sets. When the objective consists of a loss function plus a nonsmooth regularization term, the solution often lies on a lowdimensional manifold of parameter space along which the regularizer is smooth. (When an ℓ1 regularizer is used to induce sparsity in the solution, for example, this manifold is defined by the set of nonzero components of the parameter vector.) This paper shows that a regularized dual averaging algorithm can identify this manifold, with high probability, before reaching the solution. This observation motivates an algorithmic strategy in which, once an iterate is suspected of lying on an optimal or near-optimal manifold, we switch to a “local phase” that searches in this manifold, thus converging rapidly to a near-optimal point. Computational results are presented to verify the identification property and to illustrate the effectiveness of this approach. Keywords: regularization, dual averaging, partly smooth manifold, manifold identification

4 0.39382219 8 jmlr-2012-A Primal-Dual Convergence Analysis of Boosting

Author: Matus Telgarsky

Abstract: Boosting combines weak learners into a predictor with low empirical risk. Its dual constructs a high entropy distribution upon which weak learners and training labels are uncorrelated. This manuscript studies this primal-dual relationship under a broad family of losses, including the exponential loss of AdaBoost and the logistic loss, revealing: • Weak learnability aids the whole loss family: for any ε > 0, O (ln(1/ε)) iterations suffice to produce a predictor with empirical risk ε-close to the infimum; • The circumstances granting the existence of an empirical risk minimizer may be characterized in terms of the primal and dual problems, yielding a new proof of the known rate O (ln(1/ε)); • Arbitrary instances may be decomposed into the above two, granting rate O (1/ε), with a matching lower bound provided for the logistic loss. Keywords: boosting, convex analysis, weak learnability, coordinate descent, maximum entropy

5 0.39153683 73 jmlr-2012-Multi-task Regression using Minimal Penalties

Author: Matthieu Solnon, Sylvain Arlot, Francis Bach

Abstract: In this paper we study the kernel multiple ridge regression framework, which we refer to as multitask regression, using penalization techniques. The theoretical analysis of this problem shows that the key element appearing for an optimal calibration is the covariance matrix of the noise between the different tasks. We present a new algorithm to estimate this covariance matrix, based on the concept of minimal penalty, which was previously used in the single-task regression framework to estimate the variance of the noise. We show, in a non-asymptotic setting and under mild assumptions on the target function, that this estimator converges towards the covariance matrix. Then plugging this estimator into the corresponding ideal penalty leads to an oracle inequality. We illustrate the behavior of our algorithm on synthetic examples. Keywords: multi-task, oracle inequality, learning theory

6 0.38983786 80 jmlr-2012-On Ranking and Generalization Bounds

7 0.38862485 13 jmlr-2012-Active Learning via Perfect Selective Classification

8 0.38857156 26 jmlr-2012-Coherence Functions with Applications in Large-Margin Classification Methods

9 0.38802367 115 jmlr-2012-Trading Regret for Efficiency: Online Convex Optimization with Long Term Constraints

10 0.3873356 71 jmlr-2012-Multi-Instance Learning with Any Hypothesis Class

11 0.38595685 11 jmlr-2012-A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models

12 0.38473755 82 jmlr-2012-On the Necessity of Irrelevant Variables

13 0.38385144 103 jmlr-2012-Sampling Methods for the Nyström Method

14 0.38338137 7 jmlr-2012-A Multi-Stage Framework for Dantzig Selector and LASSO

15 0.38334772 111 jmlr-2012-Structured Sparsity and Generalization

16 0.38266572 117 jmlr-2012-Variable Selection in High-dimensional Varying-coefficient Models with Global Optimality

17 0.38232839 86 jmlr-2012-Optimistic Bayesian Sampling in Contextual-Bandit Problems

18 0.37983659 105 jmlr-2012-Selective Sampling and Active Learning from Single and Multiple Teachers

19 0.37962836 84 jmlr-2012-Online Submodular Minimization

20 0.37861302 34 jmlr-2012-Dynamic Policy Programming