jmlr jmlr2013 jmlr2013-55 knowledge-graph by maker-knowledge-mining

55 jmlr-2013-Joint Harmonic Functions and Their Supervised Connections


Source: pdf

Author: Mark Vere Culp, Kenneth Joseph Ryan

Abstract: The cluster assumption had a significant impact on the reasoning behind semi-supervised classification methods in graph-based learning. The literature includes numerous applications where harmonic functions provided estimates that conformed to data satisfying this well-known assumption, but the relationship between this assumption and harmonic functions is not as well-understood theoretically. We investigate these matters from the perspective of supervised kernel classification and provide concrete answers to two fundamental questions. (i) Under what conditions do semisupervised harmonic approaches satisfy this assumption? (ii) If such an assumption is satisfied then why precisely would an observation sacrifice its own supervised estimate in favor of the cluster? First, a harmonic function is guaranteed to assign labels to data in harmony with the cluster assumption if a specific condition on the boundary of the harmonic function is satisfied. Second, it is shown that any harmonic function estimate within the interior is a probability weighted average of supervised estimates, where the weight is focused on supervised kernel estimates near labeled cases. We demonstrate that the uniqueness criterion for harmonic estimators is sensitive when the graph is sparse or the size of the boundary is relatively small. This sets the stage for a third contribution, a new regularized joint harmonic function for semi-supervised learning based on a joint optimization criterion. Mathematical properties of this estimator, such as its uniqueness even when the graph is sparse or the size of the boundary is relatively small, are proven. A main selling point is its ability to operate in circumstances where the cluster assumption may not be fully satisfied on real data by compromising between the purely harmonic and purely supervised estimators. The competitive stature of the new regularized joint harmonic approach is established. Keywords: harmonic function, joint training, cluster assumption, s

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 The literature includes numerous applications where harmonic functions provided estimates that conformed to data satisfying this well-known assumption, but the relationship between this assumption and harmonic functions is not as well-understood theoretically. [sent-6, score-0.572]

2 (i) Under what conditions do semisupervised harmonic approaches satisfy this assumption? [sent-8, score-0.303]

3 First, a harmonic function is guaranteed to assign labels to data in harmony with the cluster assumption if a specific condition on the boundary of the harmonic function is satisfied. [sent-10, score-0.656]

4 Second, it is shown that any harmonic function estimate within the interior is a probability weighted average of supervised estimates, where the weight is focused on supervised kernel estimates near labeled cases. [sent-11, score-0.561]

5 We demonstrate that the uniqueness criterion for harmonic estimators is sensitive when the graph is sparse or the size of the boundary is relatively small. [sent-12, score-0.426]

6 This sets the stage for a third contribution, a new regularized joint harmonic function for semi-supervised learning based on a joint optimization criterion. [sent-13, score-0.415]

7 A main selling point is its ability to operate in circumstances where the cluster assumption may not be fully satisfied on real data by compromising between the purely harmonic and purely supervised estimators. [sent-15, score-0.4]

8 The competitive stature of the new regularized joint harmonic approach is established. [sent-16, score-0.366]

9 Keywords: harmonic function, joint training, cluster assumption, semi-supervised learning. [sent-17, score-0.37]

10 The definition of a harmonic function depends on two key terms, that is, the boundary (observed labels) and the interior (unlabeled). [sent-39, score-0.379]

11 With a given function estimate on the boundary, the harmonic solution achieves an equilibrium on the interior. [sent-41, score-0.286]

12 Currently, the authors are aware of only one harmonic approach in the semi-supervised literature. [sent-43, score-0.286]

13 This estimator, referred to as the clamped harmonic estimator, sets the boundary equal to its observed labeling. [sent-44, score-0.403]

14 The clamped harmonic estimator in semi-supervised learning was studied and applied to energy optimization (Chapelle et al. [sent-45, score-0.422]

15 Recent work suggests that the clamped harmonic solution also suffers in circumstances where the size of the boundary is much smaller than that of the interior. [sent-52, score-0.403]

16 The main argument is that the harmonic solution converges to the zero function with spikes within the boundary as the size of the interior grows (Nadler et al. [sent-53, score-0.379]

17 The clamped harmonic estimator has been empirically validated to satisfy the cluster assumption, but this, to our knowledge, has not been established rigorously. [sent-66, score-0.457]

18 A key contribution of this work is a condition on the boundary for when any harmonic function is guaranteed to satisfy the cluster assumption. [sent-67, score-0.37]

19 In the case of harmonic functions, we are primarily interested in articulating how these approaches compare to supervised local smoothing classifiers. [sent-69, score-0.365]

20 A significant contribution of this work is extensive analysis and development of harmonic functions in this capacity. [sent-70, score-0.286]

21 In this regard, we show that any harmonic function, no matter how the boundary estimator is generated, can be decomposed as the reweighted average of soft local supervised estimates consisting only of unlabeled predictions. [sent-71, score-0.569]

22 Local supervised approaches essentially form a weighted average of this labeled adjacent information even when an unlabeled case has small adjacency to each labeled case. [sent-76, score-0.327]

23 The second type of information, which we term labeled connective, exploits interconnectivity within unlabeled cases to find other unlabeled cases that have stronger adjacency to labeled cases. [sent-77, score-0.286]

24 Harmonic functions propagate the local supervised estimates from unlabeled cases with strong adjacency to some labeled cases to the other unlabeled cases. [sent-78, score-0.309]

25 In short, harmonic functions in semi-supervised learning are purely labeled connective, while local supervised approaches are purely labeled adjacent. [sent-79, score-0.477]

26 Another key contribution of this work is a new harmonic function approach based off of a joint optimization criterion. [sent-80, score-0.335]

27 The benefits of regularization in joint harmonic estimation are empirically assessed with strong results. [sent-84, score-0.335]

28 General results on harmonic functions with regard to the cluster assumption and supervised learning are in Section 4. [sent-88, score-0.4]

29 Section 5 includes the definition of the new regularized joint harmonic function approach and characterization of its mathematical properties. [sent-89, score-0.366]

30 The supervised boundary estimator fL = SLL YL in Display (6) is based on the supervised graph (L, WLL). [sent-173, score-0.312]
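
For concreteness, a minimal numeric sketch of this boundary estimator (the toy weight matrix, labels, and variable names are invented for illustration, not taken from the paper): row-normalize WLL and apply the resulting smoother to YL.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy symmetric similarity matrix on 5 labeled cases (invented data).
W_LL = rng.uniform(0.0, 1.0, size=(5, 5))
W_LL = (W_LL + W_LL.T) / 2.0
np.fill_diagonal(W_LL, 0.0)

Y_L = np.array([1.0, 1.0, -1.0, -1.0, 1.0])      # observed labels

# Row-stochastic supervised smoother on the graph (L, W_LL).
S_LL = W_LL / W_LL.sum(axis=1, keepdims=True)

f_L = S_LL @ Y_L                                  # supervised boundary estimate
print(f_L)
```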

31 This adjacency condition is a stringent requirement, especially when the proportion of labeled observations |L|/n is small, and one might correctly guess that such a rigid requirement is not necessary if a semi-supervised harmonic function approach from Section 4 is taken. [sent-181, score-0.37]

32 Matrix QL provides the case-by-case probability weighted average compromise between the supervised and offset stochastic smoothers, that is, the semi-supervised stochastic smoother MLL = QL SLL + (I − QL) SLUL (Semi-Supervised Stochastic Smoother). [sent-213, score-0.265]

33 Adjacencies accumulate in the semi-supervised graph WLL + ∆LU ∆UU^{-1} ∆UL due to exactly two types of connectedness among the labeled observations in graph W: (i) supervised L → L and (ii) offset L → U ↔ U → L. [sent-215, score-0.302]

34 The addition of the offset WLUL to WLL achieves the same level of connectedness in L as W , but more importantly introduces offset adjacencies in the semi-supervised graph not found in the supervised graph. [sent-234, score-0.302]
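
The compromise in item 32 and the semi-supervised graph in item 33 can be checked numerically. The sketch below assumes particular definitions that seem consistent with the notation (QL as the labeled share of each labeled case's total adjacency, WLUL = WLU ∆UU^{-1} WUL, and SLUL as its row-normalization); these are our reading for illustration, not quoted from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, nL = 8, 3                                    # toy sizes: 3 labeled, 5 unlabeled
W = rng.uniform(0.1, 1.0, size=(n, n))
W = (W + W.T) / 2.0
np.fill_diagonal(W, 0.0)

L, U = slice(0, nL), slice(nL, n)
W_LL, W_LU, W_UL, W_UU = W[L, L], W[L, U], W[U, L], W[U, U]
D_UU = np.diag(W[U, :].sum(axis=1))
Delta_UU = D_UU - W_UU                          # interior block of the graph Laplacian

# Offset adjacency and the two row-stochastic smoothers (assumed definitions).
W_LUL = W_LU @ np.linalg.solve(Delta_UU, W_UL)
S_LL  = W_LL  / W_LL.sum(axis=1, keepdims=True)
S_LUL = W_LUL / W_LUL.sum(axis=1, keepdims=True)

# Q_L: proportion of each labeled case's adjacency that points back into L.
Q_L = np.diag(W_LL.sum(axis=1) / W[L, :].sum(axis=1))

M_LL = Q_L @ S_LL + (np.eye(nL) - Q_L) @ S_LUL  # semi-supervised stochastic smoother

# Under these definitions it matches row-normalizing W_LL + W_LUL directly.
M_direct = (W_LL + W_LUL) / (W_LL + W_LUL).sum(axis=1, keepdims=True)
print(np.allclose(M_LL, M_direct))              # True
```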

35 The use of harmonic estimation in semi-supervised learning is discussed extensively in its relation to random walks, electrical networks, and energy optimization (Zhu et al. [sent-251, score-0.286]

36 A function h : V → IR is harmonic with respect to a stochastic matrix S if fi = ∑_{ℓ∈L∪U} Siℓ fℓ for each i ∈ U (Equation 13), where fi = h(i) (Zhu et al. [sent-253, score-0.319]

37 In matrix form, the implication of Equation (13) on a resulting harmonic estimator f ∈ IRn is S f = [SLL fL + SLU fU ; SUL fL + SUU fU] = [(S f)L ; fU] (Equation 14). [sent-255, score-0.367]

38 In the case of a harmonic estimator in Display (14), it follows by Display (12) that (S f)L = fL if and only if fL ∈ N(∆⋆LL). [sent-256, score-0.354]

39 In other words, S f = f holds for a harmonic estimator f if and only if fL is constant within the connected components of W. [sent-257, score-0.376]

40 This precise concept of when S f = f is in tandem with the practical application of a judiciously chosen harmonic estimator under the cluster assumption studied further in Section 4. [sent-258, score-0.389]

41 A question not addressed in the above discussion is the existence and uniqueness of a harmonic estimator f . [sent-260, score-0.383]

42 So with a given estimate fL, a harmonic estimate fU exists, but it is not unique because there is an arbitrary choice for a constant labeling within each pure interior connected component. [sent-267, score-0.352]

43 The assumption ρ(SUU ) < 1 used throughout most of Sections 3 and 4 avoids this arbitrary nature of harmonic estimators when ρ(SUU ) = 1. [sent-268, score-0.311]
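
A direct way to probe this uniqueness assumption on a given graph is to compute ρ(SUU) numerically; the toy graphs below (our own construction, not from the paper) contrast an interior that reaches the boundary with one containing a pure interior component.

```python
import numpy as np

def spectral_radius_SUU(W, labeled):
    """Return rho(S_UU) for the row-stochastic S built from weight matrix W."""
    n = W.shape[0]
    unlabeled = np.setdiff1d(np.arange(n), labeled)
    S = W / W.sum(axis=1, keepdims=True)
    S_UU = S[np.ix_(unlabeled, unlabeled)]
    return np.max(np.abs(np.linalg.eigvals(S_UU)))

# Connected toy graph: every unlabeled case can reach a labeled one, so rho < 1.
W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 1.],
              [1., 1., 0., 1.],
              [0., 1., 1., 0.]])
print(spectral_radius_SUU(W, labeled=np.array([0])))      # < 1

# Pure interior component (cases 2 and 3 only touch each other): rho = 1.
W_bad = np.array([[0., 1., 0., 0.],
                  [1., 0., 0., 0.],
                  [0., 0., 0., 1.],
                  [0., 0., 1., 0.]])
print(spectral_radius_SUU(W_bad, labeled=np.array([0])))  # = 1
```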

44 The maximum principle states that a harmonic solution is bounded above and below by the boundary estimate (Doyle and Snell, 1984). [sent-270, score-0.335]

45 The uniqueness principle, which applies in the case of ρ(SUU ) < 1, states that if two harmonic functions are applied with the same boundary estimate fL then they must produce the same interior estimate fU . [sent-271, score-0.408]

46 One thing that is clear from each of these principles is that a harmonic estimate fU of the interior is a function of the boundary estimate fL . [sent-272, score-0.379]

47 Let f be a harmonic function trained from the weighted graph W and response YL . [sent-282, score-0.352]

48 The simplest boundary estimate for a harmonic estimator is the clamped harmonic estimator fL = YL (Zhu et al. [sent-291, score-0.825]

49 The clamped harmonic estimator can be motivated as solving min_{fL} (YL − fL)^T (YL − fL) to obtain the boundary estimator fL = YL and then enforcing Equation (15) to define a harmonic estimator by setting fU = (I − SUU)^{-1} SUL fL. [sent-293, score-0.893]
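
A minimal numerical sketch of this clamped construction (the Gaussian similarity graph, toy data, and variable names are illustrative assumptions, not the paper's code): clamp fL = YL and solve the linear system (I − SUU) fU = SUL YL rather than forming the inverse explicitly.

```python
import numpy as np

rng = np.random.default_rng(2)
n, nL = 10, 4
X = rng.normal(size=(n, 2))

# Gaussian similarity graph (an assumed kernel choice, just for illustration).
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2)
np.fill_diagonal(W, 0.0)

L, U = np.arange(nL), np.arange(nL, n)
Y_L = np.sign(rng.normal(size=nL))                 # toy +/-1 labels

S = W / W.sum(axis=1, keepdims=True)
S_UU = S[np.ix_(U, U)]
S_UL = S[np.ix_(U, L)]

# Clamped harmonic estimator: f_L = Y_L, f_U = (I - S_UU)^{-1} S_UL Y_L.
f_U = np.linalg.solve(np.eye(len(U)) - S_UU, S_UL @ Y_L)
f = np.concatenate([Y_L, f_U])

# Sanity check of the harmonic property f_i = sum_l S_il f_l on the interior.
print(np.allclose(f[U], S[U, :] @ f))              # True
```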

50 This is not the only possible harmonic estimator because one can use any boundary estimator to develop a harmonic estimator. [sent-294, score-0.757]

51 The solution to Optimization (17) is a harmonic function with the boundary estimate fL = MLL YL from Section 3. [sent-296, score-0.335]

52 The reason why Optimization (17) produces a harmonic function can be seen by studying the optimization of a generalized labeled loss function with penalty, min_f L(YL, fL) + η f^T ∆ f (Equation 18), where L(YL, YL) ≤ L(YL, fL) for any fL. [sent-298, score-0.342]

53 In general, the clamped harmonic estimator is not necessarily optimal among harmonic estimators. [sent-302, score-0.708]

54 Any harmonic estimator with boundary fL has the form fU = (I − SUU )−1 SUL fL . [sent-316, score-0.403]

55 Equation (21) shows that any semi-supervised harmonic function is a probability weighted average of the soft supervised estimators of U, that is, fUi = ∑_{j∈U} Pij fj, where the weights come from the stochastic matrix P = (I − SUU)^{-1} DUU^{-1} DUL. [sent-320, score-0.454]
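
The sketch below verifies this weighted-average representation on a random graph, under our reading of the notation: DUU holds each unlabeled case's total degree, DUL its degree toward the labeled cases, and the soft supervised estimates are the kernel smoothers DUL^{-1} WUL YL. These definitions are assumptions made for illustration, not quoted from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
n, nL = 9, 3
W = rng.uniform(0.05, 1.0, size=(n, n))
W = (W + W.T) / 2.0
np.fill_diagonal(W, 0.0)

L, U = np.arange(nL), np.arange(nL, n)
Y_L = np.sign(rng.normal(size=nL))

S = W / W.sum(axis=1, keepdims=True)
S_UU, S_UL = S[np.ix_(U, U)], S[np.ix_(U, L)]
W_UL = W[np.ix_(U, L)]

D_UU = np.diag(W[U, :].sum(axis=1))          # total degree of each unlabeled case
D_UL = np.diag(W_UL.sum(axis=1))             # degree toward labeled cases only

# Soft supervised (kernel) estimates for the unlabeled cases.
f_bar = np.linalg.solve(D_UL, W_UL @ Y_L)

# Probability weights P = (I - S_UU)^{-1} D_UU^{-1} D_UL.
P = np.linalg.solve(np.eye(len(U)) - S_UU, np.linalg.solve(D_UU, D_UL))

# The harmonic interior estimate equals the weighted average of supervised
# estimates, and the weight matrix is stochastic (rows sum to one).
f_U = np.linalg.solve(np.eye(len(U)) - S_UU, S_UL @ Y_L)
print(np.allclose(f_U, P @ f_bar), np.allclose(P.sum(axis=1), 1.0))
```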

56 Boundary L is treated as an absorbing state, and the harmonic estimator of the interior is fUi = ei^T fU = ∑_{k=0}^{∞} ei^T SUU^k DUU^{-1} DUL f̄U, with elementary vector ei, where f̄U collects the soft supervised estimates from Equation (21). [sent-324, score-0.398]

57 In particular, the regularized joint harmonic estimator is the solution to min_{YU, f} (Y(YU) − f)^T W (Y(YU) − f) + f^T ∆ f + γ YU^T YU (Joint Optimization Problem, Equation 24). [sent-353, score-0.434]
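
One way to compute a minimizer of this criterion is to stack the first-order conditions in (YU, f) into a single linear system, as in the sketch below. The block system is our own derivation from the displayed quadratic form (with Y(YU) stacking YL over YU, and W kept positive semi-definite so the system is invertible), so it is an illustrative reconstruction rather than the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(4)
n, nL = 12, 4
X = rng.normal(size=(n, 2))

# Gaussian kernel weight matrix; the diagonal is kept so W is positive
# semi-definite, in the spirit of the W >= 0 condition cited for uniqueness.
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2)

D = np.diag(W.sum(axis=1))
Delta = D - W                                   # graph Laplacian
Y_L = np.sign(rng.normal(size=nL))
gamma = 0.5

L, U = np.arange(nL), np.arange(nL, n)
E_L = np.eye(n)[:, L]                           # selection matrices: Y(Y_U) = E_L Y_L + E_U Y_U
E_U = np.eye(n)[:, U]
m = len(U)

# First-order conditions of the joint criterion, stacked as one linear
# system in z = (Y_U, f).
A = np.block([[E_U.T @ W @ E_U + gamma * np.eye(m), -E_U.T @ W],
              [-W @ E_U,                            W + Delta]])
b = np.concatenate([-E_U.T @ W @ E_L @ Y_L, W @ E_L @ Y_L])

z = np.linalg.solve(A, b)
Y_U_hat, f_hat = z[:m], z[m:]
print(Y_U_hat)
print(f_hat)
```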

58 The regularized joint harmonic estimator, given in Proposition 12, includes an estimator for both YU and f. [sent-354, score-0.434]

59 The form of the f portion of this estimator is established as harmonic when γ = 0 in Section 5.1. [sent-355, score-0.354]

60 By Proposition 11, W ⪰ 0 is a sufficient condition for the uniqueness of the joint harmonic estimator when γ > 0, but the added condition (∆S)UU ≻ 0 from Proposition 12 is needed if γ = 0. [sent-366, score-0.432]

61 5.1 Joint Harmonic Estimator (γ = 0): Here the joint harmonic function requires (∆S)UU ≻ 0 for its uniqueness (see Proposition 12). [sent-375, score-0.364]

62 Conditions from Proposition 16 are necessary and sufficient for the existence of the smoother ΓLL = (W⋆LL + ∆⋆LL)^{-1} W⋆LL (Joint Harmonic Smoother, Equation 25), and Theorem 18 states that smoother ΓLL is that for the joint harmonic estimator. [sent-379, score-0.455]

63 The work connecting the interior of a harmonic estimator to supervised estimators in Section 4.2. [sent-382, score-0.502]

64 This now applies to the joint harmonic estimator; in particular, recall fU = P SUL ΓLL YL with P from Display (22). [sent-383, score-0.335]

65 Hence, the harmonic result for labeled loss is still preserved, but the loss function is not independent of the unlabeled data. [sent-387, score-0.415]

66 Furthermore, since ΓLL = I − (W⋆LL + ∆⋆LL)^{-1} ∆⋆LL, we have ∆⋆LL ν = 0 ⇐⇒ ΓLL ν = ν, so the joint harmonic estimator is a cluster assumption classifier (refer to Section 4). [sent-389, score-0.438]

67 In cases when ΓLL is stochastic, the stronger condition that |ei^T fU| ≤ |ei^T YL| holds, by the maximum principle of harmonic functions (Doyle and Snell, 1984). [sent-395, score-0.286]

68 In applications such as those in Sections 6 and 7, assumptions for the uniqueness of the γ = 0 joint harmonic estimator are not likely to be satisfied. [sent-396, score-0.432]

69 On the other hand, the γ > 0 regularized joint harmonic estimators in Section 5. [sent-399, score-0.391]

70 The Theorem 21 decomposition is a compromise between the semi-supervised harmonic estimator (labeled connective) and the supervised kernel estimator (labeled adjacent): fUγ = −∆UUγ^{-} ∆UL fLγ (Harmonic Part) + (I − ∆UUγ^{-} ∆UU) SUL YL (Supervised Part). [sent-410, score-0.513]

71 Supervised Part In the case of γ = 0, the harmonic part reduces to the harmonic estimator, and the supervised part equals zero. [sent-411, score-0.651]

72 On the other extreme, as γ → ∞, the harmonic part converges to zero, while the supervised part has limit fγ → f∞ = S [YL ; 0] = [SLL YL ; SUL YL] = [QL SLL YL ; (I − QU) SUL YL] = [QL fL ; (I − QU) fU] (Equation 31). [sent-412, score-0.365]

73 Figure 3: The “two moons” data from Figure 2 with regularized joint harmonic classification boundary curves on left. [sent-423, score-0.415]

74 The black joint harmonic function (γ = 0) and the gray supervised extreme (γ = ∞) borders in the left panel of Figure 3 correspond to the harmonic and supervised borders in Figure 2 as expected. [sent-431, score-0.779]

75 The regularized joint harmonic estimate was computed for each γ and σ over a grid, and unlabeled errors were recorded over this grid assuming the “truth” of a constant labeling by moon in the σ = 0 noiseless data on left. [sent-435, score-0.439]

76 While the joint harmonic function and the supervised solution are optimal for small and large σ, compromise solutions are best for data with an intermediate level of noise. [sent-437, score-0.426]

77 Overall, the regularized joint harmonic estimator is a compromise between the harmonic estimator (which emphasizes unlabeled connectivity to labeled cases) and the supervised estimator (which requires unlabeled adjacency to labeled cases). [sent-438, score-1.245]

78 Figure 4: Regularized joint harmonic analyses of the protein data (panels plotted against the number of labeled examples |L| and log(γ)). [sent-450, score-0.36]

79 A number of analyses of W using the regularized joint harmonic estimator are presented in Figure 4. [sent-468, score-0.434]

80 The clamped harmonic and joint harmonic approaches are singular in each of these analyses, whereas the regularization strategy posed for the joint harmonic estimator provides the practical benefit of a well-defined classifier with a unique solution. [sent-470, score-1.092]

81 Furthermore, the protein interaction graph W was observed directly, so there is no tuning parameter for either harmonic estimator. [sent-471, score-0.348]

82 Since ρ(SUU ) = 1, any harmonic estimator is singular. [sent-473, score-0.354]

83 On the other hand, the regularized joint harmonic estimator is applicable with large enough γ so that (∆S)UU + γI is invertible, that is, when ρ_{|U|}((∆S)UU + γI) > 0 in the top panel. [sent-474, score-0.434]

84 The top right panel shows that the spectral radius uniqueness assumption was violated for any harmonic estimator, for example, the clamped or γ = 0, whereas regularization of the joint harmonic approach identified a well-defined classifier. [sent-481, score-0.718]

85 Five-fold cross-validation was used to estimate (λ, γ) for the regularized joint harmonic function and λ for the clamped harmonic estimator. [sent-487, score-0.72]

86 The clamped harmonic estimator is computationally singular and cannot be computed when ρ(SUU ) ≈ 1 (see Remark 6). [sent-494, score-0.422]

87 The joint harmonic estimator (γ = 0) requires the more stringent assumption (∆S)UU ≻ 0, and it was singular for all λ in the ionosphere data. [sent-500, score-0.431]

88 Error rates near 0.04 in the ionosphere and thyroid applications were obtainable with the regularized joint harmonic estimator. [sent-531, score-0.419]

89 As expected, a substantial performance gap exists between the regularized joint harmonic estimator and the clamped harmonic estimator. [sent-533, score-0.788]

90 The S3VM also outperformed the clamped harmonic estimator. [sent-534, score-0.354]

91 Semi-supervised performance comparisons (Figure 6) of the regularized joint harmonic approach to the S3VM are consistent with the transductive case (Figure 5), and the variability of the error measure increased in the out-of-sample extension as expected. [sent-576, score-0.389]

92 In short, Figures 5 and 6 include real, low labeled sample size, transductive and semi-supervised applications, and the competitive stature of our proposed regularized joint harmonic estimator holds. [sent-577, score-0.513]

93 Remark 6 Assumptions for the uniqueness of the clamped harmonic and the regularized joint harmonic approaches depend on the denseness or sparseness of the WUL component of similarity graph W . [sent-578, score-0.786]

94 As kernel parameter λ decreases, the off-diagonal elements of W approach 0, and this forces computational zeros in matrix WUL leading to less stable estimators for any harmonic estimator. [sent-581, score-0.324]
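
The sketch below illustrates this remark on a toy Gaussian-kernel graph (the kernel form Wij = exp(−||xi − xj||²/λ) and the two-group data are assumptions made for this example): as λ shrinks, WUL fills with computational zeros, ρ(SUU) climbs toward 1, and the clamped harmonic system I − SUU becomes ill-conditioned.

```python
import numpy as np

rng = np.random.default_rng(5)
n, nL = 30, 3
X = np.vstack([rng.normal(0, 1, size=(n // 2, 2)),
               rng.normal(5, 1, size=(n - n // 2, 2))])   # two well-separated groups
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)

U = np.arange(nL, n)                                        # labels only in the first group
for lam in [10.0, 1.0, 0.1, 0.01]:
    W = np.exp(-d2 / lam)
    np.fill_diagonal(W, 0.0)
    S = W / W.sum(axis=1, keepdims=True)
    S_UU = S[np.ix_(U, U)]
    rho = np.max(np.abs(np.linalg.eigvals(S_UU)))
    cond = np.linalg.cond(np.eye(len(U)) - S_UU)
    print(f"lambda={lam:6.2f}  rho(S_UU)={rho:.6f}  cond(I - S_UU)={cond:.2e}")
```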

95 Regularization within the joint harmonic approach has the key advantage of a unique estimator for any λ > 0. [sent-586, score-0.403]

96 Conclusion Semi-supervised harmonic estimation for graph-based semi-supervised learning was examined theoretically and empirically. [sent-588, score-0.286]

97 In addition, harmonic functions were shown to be weighted averages of local supervised estimators applied to the interior. [sent-591, score-0.407]

98 This work further established that harmonic estimators rely primarily on connectivity within the unlabeled network to form predictions using local supervised estimators; supervised estimates near labeled cases are up-weighted while supervised estimates deep within the network are down-weighted. [sent-592, score-0.689]

99 Another key contribution, the development of the regularized joint harmonic function approach, used a joint optimization criterion with regularization to automate the trade-off between labeled connectivity versus labeled adjacency. [sent-593, score-0.539]

100 Empirical results demonstrated the practical benefit gained by regularization of joint harmonic estimation. [sent-594, score-0.335]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('suu', 0.544), ('harmonic', 0.286), ('wll', 0.278), ('slu', 0.262), ('sul', 0.262), ('uu', 0.255), ('dll', 0.202), ('wul', 0.202), ('fl', 0.185), ('ll', 0.165), ('sll', 0.129), ('yl', 0.121), ('wuu', 0.121), ('dul', 0.113), ('ul', 0.1), ('duu', 0.097), ('supervised', 0.079), ('fu', 0.079), ('wlu', 0.077), ('wlul', 0.077), ('unlabeled', 0.073), ('clamped', 0.068), ('estimator', 0.068), ('ulp', 0.065), ('yu', 0.062), ('smoother', 0.06), ('armonic', 0.06), ('offset', 0.057), ('ryan', 0.056), ('labeled', 0.056), ('joint', 0.049), ('boundary', 0.049), ('schur', 0.049), ('mll', 0.048), ('display', 0.047), ('interior', 0.044), ('oint', 0.043), ('unctions', 0.043), ('graph', 0.037), ('adjacencies', 0.036), ('connectedness', 0.036), ('lul', 0.036), ('proposition', 0.035), ('cluster', 0.035), ('lu', 0.034), ('culp', 0.032), ('regularized', 0.031), ('uniqueness', 0.029), ('adjacency', 0.028), ('dlu', 0.028), ('ionosphere', 0.028), ('laplacian', 0.028), ('protein', 0.025), ('thyroid', 0.025), ('estimators', 0.025), ('auu', 0.024), ('slul', 0.024), ('sy', 0.024), ('ql', 0.024), ('vertices', 0.023), ('transductive', 0.023), ('connected', 0.022), ('complements', 0.021), ('stochastic', 0.02), ('ir', 0.02), ('abney', 0.02), ('doyle', 0.02), ('moons', 0.02), ('chapelle', 0.02), ('adjacent', 0.018), ('irn', 0.017), ('semisupervised', 0.017), ('eigenvalue', 0.017), ('weighted', 0.017), ('remark', 0.016), ('llyl', 0.016), ('mlli', 0.016), ('snell', 0.016), ('sulyl', 0.016), ('vertex', 0.014), ('connective', 0.014), ('soft', 0.014), ('eigenvalues', 0.014), ('matrix', 0.013), ('semi', 0.013), ('complement', 0.013), ('wi', 0.013), ('proteins', 0.012), ('connectivity', 0.012), ('response', 0.012), ('absorption', 0.012), ('fui', 0.012), ('kenneth', 0.012), ('kui', 0.012), ('qlii', 0.012), ('solubility', 0.012), ('vere', 0.012), ('yamanishi', 0.012), ('yli', 0.012), ('compromise', 0.012)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 55 jmlr-2013-Joint Harmonic Functions and Their Supervised Connections

Author: Mark Vere Culp, Kenneth Joseph Ryan

Abstract: The cluster assumption had a significant impact on the reasoning behind semi-supervised classification methods in graph-based learning. The literature includes numerous applications where harmonic functions provided estimates that conformed to data satisfying this well-known assumption, but the relationship between this assumption and harmonic functions is not as well-understood theoretically. We investigate these matters from the perspective of supervised kernel classification and provide concrete answers to two fundamental questions. (i) Under what conditions do semisupervised harmonic approaches satisfy this assumption? (ii) If such an assumption is satisfied then why precisely would an observation sacrifice its own supervised estimate in favor of the cluster? First, a harmonic function is guaranteed to assign labels to data in harmony with the cluster assumption if a specific condition on the boundary of the harmonic function is satisfied. Second, it is shown that any harmonic function estimate within the interior is a probability weighted average of supervised estimates, where the weight is focused on supervised kernel estimates near labeled cases. We demonstrate that the uniqueness criterion for harmonic estimators is sensitive when the graph is sparse or the size of the boundary is relatively small. This sets the stage for a third contribution, a new regularized joint harmonic function for semi-supervised learning based on a joint optimization criterion. Mathematical properties of this estimator, such as its uniqueness even when the graph is sparse or the size of the boundary is relatively small, are proven. A main selling point is its ability to operate in circumstances where the cluster assumption may not be fully satisfied on real data by compromising between the purely harmonic and purely supervised estimators. The competitive stature of the new regularized joint harmonic approach is established. Keywords: harmonic function, joint training, cluster assumption, s

2 0.061928492 99 jmlr-2013-Semi-Supervised Learning Using Greedy Max-Cut

Author: Jun Wang, Tony Jebara, Shih-Fu Chang

Abstract: Graph-based semi-supervised learning (SSL) methods play an increasingly important role in practical machine learning systems, particularly in agnostic settings when no parametric information or other prior knowledge is available about the data distribution. Given the constructed graph represented by a weight matrix, transductive inference is used to propagate known labels to predict the values of all unlabeled vertices. Designing a robust label diffusion algorithm for such graphs is a widely studied problem and various methods have recently been suggested. Many of these can be formalized as regularized function estimation through the minimization of a quadratic cost. However, most existing label diffusion methods minimize a univariate cost with the classification function as the only variable of interest. Since the observed labels seed the diffusion process, such univariate frameworks are extremely sensitive to the initial label choice and any label noise. To alleviate the dependency on the initial observed labels, this article proposes a bivariate formulation for graph-based SSL, where both the binary label information and a continuous classification function are arguments of the optimization. This bivariate formulation is shown to be equivalent to a linearly constrained Max-Cut problem. Finally an efficient solution via greedy gradient Max-Cut (GGMC) is derived which gradually assigns unlabeled vertices to each class with minimum connectivity. Once convergence guarantees are established, this greedy Max-Cut based SSL is applied on both artificial and standard benchmark data sets where it obtains superior classification accuracy compared to existing state-of-the-art SSL methods. Moreover, GGMC shows robustness with respect to the graph construction method and maintains high accuracy over extensive experiments with various edge linking and weighting schemes. Keywords: graph transduction, semi-supervised learning, bivariate formulation, mixed integer programming, greedy

3 0.035555962 69 jmlr-2013-Manifold Regularization and Semi-supervised Learning: Some Theoretical Analyses

Author: Partha Niyogi

Abstract: Manifold regularization (Belkin et al., 2006) is a geometrically motivated framework for machine learning within which several semi-supervised algorithms have been constructed. Here we try to provide some theoretical understanding of this approach. Our main result is to expose the natural structure of a class of problems on which manifold regularization methods are helpful. We show that for such problems, no supervised learner can learn effectively. On the other hand, a manifold based learner (that knows the manifold or “learns” it from unlabeled examples) can learn with relatively few labeled examples. Our analysis follows a minimax style with an emphasis on finite sample results (in terms of n: the number of labeled examples). These results allow us to properly interpret manifold regularization and related spectral and geometric algorithms in terms of their potential use in semi-supervised learning. Keywords: semi-supervised learning, manifold regularization, graph Laplacian, minimax rates

4 0.031140659 77 jmlr-2013-On the Convergence of Maximum Variance Unfolding

Author: Ery Arias-Castro, Bruno Pelletier

Abstract: Maximum Variance Unfolding is one of the main methods for (nonlinear) dimensionality reduction. We study its large sample limit, providing specific rates of convergence under standard assumptions. We find that it is consistent when the underlying submanifold is isometric to a convex subset, and we provide some simple examples where it fails to be consistent. Keywords: maximum variance unfolding, isometric embedding, U-processes, empirical processes, proximity graphs.

5 0.029318746 29 jmlr-2013-Convex and Scalable Weakly Labeled SVMs

Author: Yu-Feng Li, Ivor W. Tsang, James T. Kwok, Zhi-Hua Zhou

Abstract: In this paper, we study the problem of learning from weakly labeled data, where labels of the training examples are incomplete. This includes, for example, (i) semi-supervised learning where labels are partially known; (ii) multi-instance learning where labels are implicitly known; and (iii) clustering where labels are completely unknown. Unlike supervised learning, learning with weak labels involves a difficult Mixed-Integer Programming (MIP) problem. Therefore, it can suffer from poor scalability and may also get stuck in local minimum. In this paper, we focus on SVMs and propose the W ELL SVM via a novel label generation strategy. This leads to a convex relaxation of the original MIP, which is at least as tight as existing convex Semi-Definite Programming (SDP) relaxations. Moreover, the W ELL SVM can be solved via a sequence of SVM subproblems that are much more scalable than previous convex SDP relaxations. Experiments on three weakly labeled learning tasks, namely, (i) semi-supervised learning; (ii) multi-instance learning for locating regions of interest in content-based information retrieval; and (iii) clustering, clearly demonstrate improved performance, and W ELL SVM is also readily applicable on large data sets. Keywords: weakly labeled data, semi-supervised learning, multi-instance learning, clustering, cutting plane, convex relaxation

6 0.025079759 111 jmlr-2013-Supervised Feature Selection in Graphs with Path Coding Penalties and Network Flows

7 0.021868793 102 jmlr-2013-Sparse Matrix Inversion with Scaled Lasso

8 0.020946819 11 jmlr-2013-Algorithms for Discovery of Multiple Markov Boundaries

9 0.0208659 64 jmlr-2013-Lovasz theta function, SVMs and Finding Dense Subgraphs

10 0.020824168 93 jmlr-2013-Random Walk Kernels and Learning Curves for Gaussian Process Regression on Random Graphs

11 0.020275736 39 jmlr-2013-Efficient Active Learning of Halfspaces: An Aggressive Approach

12 0.020067923 84 jmlr-2013-PC Algorithm for Nonparanormal Graphical Models

13 0.019613091 92 jmlr-2013-Random Spanning Trees and the Prediction of Weighted Graphs

14 0.019546879 52 jmlr-2013-How to Solve Classification and Regression Problems on High-Dimensional Data with a Supervised Extension of Slow Feature Analysis

15 0.017712276 71 jmlr-2013-Message-Passing Algorithms for Quadratic Minimization

16 0.016960591 70 jmlr-2013-Maximum Volume Clustering: A New Discriminative Clustering Approach

17 0.015992304 101 jmlr-2013-Sparse Activity and Sparse Connectivity in Supervised Learning

18 0.015715569 96 jmlr-2013-Regularization-Free Principal Curve Estimation

19 0.01515268 4 jmlr-2013-A Max-Norm Constrained Minimization Approach to 1-Bit Matrix Completion

20 0.014663391 103 jmlr-2013-Sparse Robust Estimation and Kalman Smoothing with Nonsmooth Log-Concave Densities: Modeling, Computation, and Theory


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.086), (1, 0.037), (2, 0.003), (3, -0.025), (4, 0.048), (5, 0.063), (6, -0.002), (7, 0.003), (8, -0.017), (9, -0.092), (10, 0.008), (11, 0.009), (12, -0.026), (13, 0.009), (14, 0.037), (15, 0.081), (16, 0.053), (17, 0.053), (18, -0.003), (19, 0.005), (20, -0.105), (21, -0.006), (22, 0.045), (23, 0.046), (24, 0.004), (25, -0.065), (26, -0.026), (27, -0.027), (28, -0.021), (29, -0.139), (30, -0.113), (31, 0.067), (32, -0.096), (33, -0.111), (34, 0.009), (35, -0.126), (36, -0.182), (37, -0.047), (38, 0.091), (39, 0.013), (40, -0.349), (41, -0.026), (42, -0.035), (43, 0.035), (44, 0.13), (45, -0.089), (46, -0.551), (47, 0.03), (48, 0.061), (49, -0.034)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97104728 55 jmlr-2013-Joint Harmonic Functions and Their Supervised Connections

Author: Mark Vere Culp, Kenneth Joseph Ryan

Abstract: The cluster assumption had a significant impact on the reasoning behind semi-supervised classification methods in graph-based learning. The literature includes numerous applications where harmonic functions provided estimates that conformed to data satisfying this well-known assumption, but the relationship between this assumption and harmonic functions is not as well-understood theoretically. We investigate these matters from the perspective of supervised kernel classification and provide concrete answers to two fundamental questions. (i) Under what conditions do semisupervised harmonic approaches satisfy this assumption? (ii) If such an assumption is satisfied then why precisely would an observation sacrifice its own supervised estimate in favor of the cluster? First, a harmonic function is guaranteed to assign labels to data in harmony with the cluster assumption if a specific condition on the boundary of the harmonic function is satisfied. Second, it is shown that any harmonic function estimate within the interior is a probability weighted average of supervised estimates, where the weight is focused on supervised kernel estimates near labeled cases. We demonstrate that the uniqueness criterion for harmonic estimators is sensitive when the graph is sparse or the size of the boundary is relatively small. This sets the stage for a third contribution, a new regularized joint harmonic function for semi-supervised learning based on a joint optimization criterion. Mathematical properties of this estimator, such as its uniqueness even when the graph is sparse or the size of the boundary is relatively small, are proven. A main selling point is its ability to operate in circumstances where the cluster assumption may not be fully satisfied on real data by compromising between the purely harmonic and purely supervised estimators. The competitive stature of the new regularized joint harmonic approach is established. Keywords: harmonic function, joint training, cluster assumption, s

2 0.39123109 99 jmlr-2013-Semi-Supervised Learning Using Greedy Max-Cut

Author: Jun Wang, Tony Jebara, Shih-Fu Chang

Abstract: Graph-based semi-supervised learning (SSL) methods play an increasingly important role in practical machine learning systems, particularly in agnostic settings when no parametric information or other prior knowledge is available about the data distribution. Given the constructed graph represented by a weight matrix, transductive inference is used to propagate known labels to predict the values of all unlabeled vertices. Designing a robust label diffusion algorithm for such graphs is a widely studied problem and various methods have recently been suggested. Many of these can be formalized as regularized function estimation through the minimization of a quadratic cost. However, most existing label diffusion methods minimize a univariate cost with the classification function as the only variable of interest. Since the observed labels seed the diffusion process, such univariate frameworks are extremely sensitive to the initial label choice and any label noise. To alleviate the dependency on the initial observed labels, this article proposes a bivariate formulation for graph-based SSL, where both the binary label information and a continuous classification function are arguments of the optimization. This bivariate formulation is shown to be equivalent to a linearly constrained Max-Cut problem. Finally an efficient solution via greedy gradient Max-Cut (GGMC) is derived which gradually assigns unlabeled vertices to each class with minimum connectivity. Once convergence guarantees are established, this greedy Max-Cut based SSL is applied on both artificial and standard benchmark data sets where it obtains superior classification accuracy compared to existing state-of-the-art SSL methods. Moreover, GGMC shows robustness with respect to the graph construction method and maintains high accuracy over extensive experiments with various edge linking and weighting schemes. Keywords: graph transduction, semi-supervised learning, bivariate formulation, mixed integer programming, greedy

3 0.26924351 11 jmlr-2013-Algorithms for Discovery of Multiple Markov Boundaries

Author: Alexander Statnikov, Nikita I. Lytkin, Jan Lemeire, Constantin F. Aliferis

Abstract: Algorithms for Markov boundary discovery from data constitute an important recent development in machine learning, primarily because they offer a principled solution to the variable/feature selection problem and give insight on local causal structure. Over the last decade many sound algorithms have been proposed to identify a single Markov boundary of the response variable. Even though faithful distributions and, more broadly, distributions that satisfy the intersection property always have a single Markov boundary, other distributions/data sets may have multiple Markov boundaries of the response variable. The latter distributions/data sets are common in practical data-analytic applications, and there are several reasons why it is important to induce multiple Markov boundaries from such data. However, there are currently no sound and efficient algorithms that can accomplish this task. This paper describes a family of algorithms TIE* that can discover all Markov boundaries in a distribution. The broad applicability as well as efficiency of the new algorithmic family is demonstrated in an extensive benchmarking study that involved comparison with 26 state-of-the-art algorithms/variants in 15 data sets from a diversity of application domains. Keywords: Markov boundary discovery, variable/feature selection, information equivalence, violations of faithfulness

4 0.20226055 29 jmlr-2013-Convex and Scalable Weakly Labeled SVMs

Author: Yu-Feng Li, Ivor W. Tsang, James T. Kwok, Zhi-Hua Zhou

Abstract: In this paper, we study the problem of learning from weakly labeled data, where labels of the training examples are incomplete. This includes, for example, (i) semi-supervised learning where labels are partially known; (ii) multi-instance learning where labels are implicitly known; and (iii) clustering where labels are completely unknown. Unlike supervised learning, learning with weak labels involves a difficult Mixed-Integer Programming (MIP) problem. Therefore, it can suffer from poor scalability and may also get stuck in local minimum. In this paper, we focus on SVMs and propose the W ELL SVM via a novel label generation strategy. This leads to a convex relaxation of the original MIP, which is at least as tight as existing convex Semi-Definite Programming (SDP) relaxations. Moreover, the W ELL SVM can be solved via a sequence of SVM subproblems that are much more scalable than previous convex SDP relaxations. Experiments on three weakly labeled learning tasks, namely, (i) semi-supervised learning; (ii) multi-instance learning for locating regions of interest in content-based information retrieval; and (iii) clustering, clearly demonstrate improved performance, and W ELL SVM is also readily applicable on large data sets. Keywords: weakly labeled data, semi-supervised learning, multi-instance learning, clustering, cutting plane, convex relaxation

5 0.19917527 39 jmlr-2013-Efficient Active Learning of Halfspaces: An Aggressive Approach

Author: Alon Gonen, Sivan Sabato, Shai Shalev-Shwartz

Abstract: We study pool-based active learning of half-spaces. We revisit the aggressive approach for active learning in the realizable case, and show that it can be made efficient and practical, while also having theoretical guarantees under reasonable assumptions. We further show, both theoretically and experimentally, that it can be preferable to mellow approaches. Our efficient aggressive active learner of half-spaces has formal approximation guarantees that hold when the pool is separable with a margin. While our analysis is focused on the realizable setting, we show that a simple heuristic allows using the same algorithm successfully for pools with low error as well. We further compare the aggressive approach to the mellow approach, and prove that there are cases in which the aggressive approach results in significantly better label complexity compared to the mellow approach. We demonstrate experimentally that substantial improvements in label complexity can be achieved using the aggressive approach, for both realizable and low-error settings. Keywords: active learning, linear classifiers, margin, adaptive sub-modularity

6 0.17245314 13 jmlr-2013-Approximating the Permanent with Fractional Belief Propagation

7 0.16991954 7 jmlr-2013-A Risk Comparison of Ordinary Least Squares vs Ridge Regression

8 0.16776063 67 jmlr-2013-MLPACK: A Scalable C++ Machine Learning Library

9 0.15441507 33 jmlr-2013-Dimension Independent Similarity Computation

10 0.14977229 74 jmlr-2013-Multivariate Convex Regression with Adaptive Partitioning

11 0.13444373 14 jmlr-2013-Asymptotic Results on Adaptive False Discovery Rate Controlling Procedures Based on Kernel Estimators

12 0.13416812 77 jmlr-2013-On the Convergence of Maximum Variance Unfolding

13 0.12546995 69 jmlr-2013-Manifold Regularization and Semi-supervised Learning: Some Theoretical Analyses

14 0.11671212 92 jmlr-2013-Random Spanning Trees and the Prediction of Weighted Graphs

15 0.11553964 52 jmlr-2013-How to Solve Classification and Regression Problems on High-Dimensional Data with a Supervised Extension of Slow Feature Analysis

16 0.11381287 104 jmlr-2013-Sparse Single-Index Model

17 0.11223649 81 jmlr-2013-Optimal Discovery with Probabilistic Expert Advice: Finite Time Analysis and Macroscopic Optimality

18 0.10562007 51 jmlr-2013-Greedy Sparsity-Constrained Optimization

19 0.099132098 101 jmlr-2013-Sparse Activity and Sparse Connectivity in Supervised Learning

20 0.09545622 41 jmlr-2013-Experiment Selection for Causal Discovery


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.013), (5, 0.088), (6, 0.024), (10, 0.044), (23, 0.017), (57, 0.548), (68, 0.033), (75, 0.032), (85, 0.014), (87, 0.02), (90, 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.78894639 55 jmlr-2013-Joint Harmonic Functions and Their Supervised Connections

Author: Mark Vere Culp, Kenneth Joseph Ryan

Abstract: The cluster assumption had a significant impact on the reasoning behind semi-supervised classification methods in graph-based learning. The literature includes numerous applications where harmonic functions provided estimates that conformed to data satisfying this well-known assumption, but the relationship between this assumption and harmonic functions is not as well-understood theoretically. We investigate these matters from the perspective of supervised kernel classification and provide concrete answers to two fundamental questions. (i) Under what conditions do semisupervised harmonic approaches satisfy this assumption? (ii) If such an assumption is satisfied then why precisely would an observation sacrifice its own supervised estimate in favor of the cluster? First, a harmonic function is guaranteed to assign labels to data in harmony with the cluster assumption if a specific condition on the boundary of the harmonic function is satisfied. Second, it is shown that any harmonic function estimate within the interior is a probability weighted average of supervised estimates, where the weight is focused on supervised kernel estimates near labeled cases. We demonstrate that the uniqueness criterion for harmonic estimators is sensitive when the graph is sparse or the size of the boundary is relatively small. This sets the stage for a third contribution, a new regularized joint harmonic function for semi-supervised learning based on a joint optimization criterion. Mathematical properties of this estimator, such as its uniqueness even when the graph is sparse or the size of the boundary is relatively small, are proven. A main selling point is its ability to operate in circumstances where the cluster assumption may not be fully satisfied on real data by compromising between the purely harmonic and purely supervised estimators. The competitive stature of the new regularized joint harmonic approach is established. Keywords: harmonic function, joint training, cluster assumption, s

2 0.56792969 34 jmlr-2013-Distance Preserving Embeddings for General n-Dimensional Manifolds

Author: Nakul Verma

Abstract: Low dimensional embeddings of manifold data have gained popularity in the last decade. However, a systematic finite sample analysis of manifold embedding algorithms largely eludes researchers. Here we present two algorithms that embed a general n-dimensional manifold into Rd (where d only depends on some key manifold properties such as its intrinsic dimension, volume and curvature) that guarantee to approximately preserve all interpoint geodesic distances. Keywords: manifold learning, isometric embeddings, non-linear dimensionality reduction, Nash’s embedding theorem

3 0.20361927 99 jmlr-2013-Semi-Supervised Learning Using Greedy Max-Cut

Author: Jun Wang, Tony Jebara, Shih-Fu Chang

Abstract: Graph-based semi-supervised learning (SSL) methods play an increasingly important role in practical machine learning systems, particularly in agnostic settings when no parametric information or other prior knowledge is available about the data distribution. Given the constructed graph represented by a weight matrix, transductive inference is used to propagate known labels to predict the values of all unlabeled vertices. Designing a robust label diffusion algorithm for such graphs is a widely studied problem and various methods have recently been suggested. Many of these can be formalized as regularized function estimation through the minimization of a quadratic cost. However, most existing label diffusion methods minimize a univariate cost with the classification function as the only variable of interest. Since the observed labels seed the diffusion process, such univariate frameworks are extremely sensitive to the initial label choice and any label noise. To alleviate the dependency on the initial observed labels, this article proposes a bivariate formulation for graph-based SSL, where both the binary label information and a continuous classification function are arguments of the optimization. This bivariate formulation is shown to be equivalent to a linearly constrained Max-Cut problem. Finally an efficient solution via greedy gradient Max-Cut (GGMC) is derived which gradually assigns unlabeled vertices to each class with minimum connectivity. Once convergence guarantees are established, this greedy Max-Cut based SSL is applied on both artificial and standard benchmark data sets where it obtains superior classification accuracy compared to existing state-of-the-art SSL methods. Moreover, GGMC shows robustness with respect to the graph construction method and maintains high accuracy over extensive experiments with various edge linking and weighting schemes. Keywords: graph transduction, semi-supervised learning, bivariate formulation, mixed integer programming, greedy

4 0.19641493 52 jmlr-2013-How to Solve Classification and Regression Problems on High-Dimensional Data with a Supervised Extension of Slow Feature Analysis

Author: Alberto N. Escalante-B., Laurenz Wiskott

Abstract: Supervised learning from high-dimensional data, for example, multimedia data, is a challenging task. We propose an extension of slow feature analysis (SFA) for supervised dimensionality reduction called graph-based SFA (GSFA). The algorithm extracts a label-predictive low-dimensional set of features that can be post-processed by typical supervised algorithms to generate the final label or class estimation. GSFA is trained with a so-called training graph, in which the vertices are the samples and the edges represent similarities of the corresponding labels. A new weighted SFA optimization problem is introduced, generalizing the notion of slowness from sequences of samples to such training graphs. We show that GSFA computes an optimal solution to this problem in the considered function space and propose several types of training graphs. For classification, the most straightforward graph yields features equivalent to those of (nonlinear) Fisher discriminant analysis. Emphasis is on regression, where four different graphs were evaluated experimentally with a subproblem of face detection on photographs. The method proposed is promising particularly when linear models are insufficient as well as when feature selection is difficult. Keywords: slow feature analysis, feature extraction, classification, regression, pattern recognition, training graphs, nonlinear dimensionality reduction, supervised learning, implicitly supervised, high-dimensional data, image analysis

5 0.19537349 25 jmlr-2013-Communication-Efficient Algorithms for Statistical Optimization

Author: Yuchen Zhang, John C. Duchi, Martin J. Wainwright

Abstract: We analyze two communication-efficient algorithms for distributed optimization in statistical settings involving large-scale data sets. The first algorithm is a standard averaging method that distributes the N data samples evenly to m machines, performs separate minimization on each subset, and then averages the estimates. We provide a sharp analysis of this average mixture algorithm, showing that under a reasonable set of conditions, the combined parameter achieves √ mean-squared error (MSE) that decays as O (N −1 + (N/m)−2 ). Whenever m ≤ N, this guarantee matches the best possible rate achievable by a centralized algorithm having access to all N samples. The second algorithm is a novel method, based on an appropriate form of bootstrap subsampling. Requiring only a single round of communication, it has mean-squared error that decays as O (N −1 + (N/m)−3 ), and so is more robust to the amount of parallelization. In addition, we show that a stochastic gradient-based method attains mean-squared error decaying as O (N −1 + (N/m)−3/2 ), easing computation at the expense of a potentially slower MSE rate. We also provide an experimental evaluation of our methods, investigating their performance both on simulated data and on a large-scale regression problem from the internet search domain. In particular, we show that our methods can be used to efficiently solve an advertisement prediction problem from the Chinese SoSo Search Engine, which involves logistic regression with N ≈ 2.4 × 108 samples and d ≈ 740,000 covariates. Keywords: distributed learning, stochastic optimization, averaging, subsampling

6 0.19509538 26 jmlr-2013-Conjugate Relation between Loss Functions and Uncertainty Sets in Classification Problems

7 0.1947276 48 jmlr-2013-Generalized Spike-and-Slab Priors for Bayesian Group Feature Selection Using Expectation Propagation

8 0.19468413 28 jmlr-2013-Construction of Approximation Spaces for Reinforcement Learning

9 0.19349381 72 jmlr-2013-Multi-Stage Multi-Task Feature Learning

10 0.19318078 2 jmlr-2013-A Binary-Classification-Based Metric between Time-Series Distributions and Its Use in Statistical and Learning Problems

11 0.19307442 4 jmlr-2013-A Max-Norm Constrained Minimization Approach to 1-Bit Matrix Completion

12 0.19221051 41 jmlr-2013-Experiment Selection for Causal Discovery

13 0.19208612 29 jmlr-2013-Convex and Scalable Weakly Labeled SVMs

14 0.19185573 73 jmlr-2013-Multicategory Large-Margin Unified Machines

15 0.19154847 59 jmlr-2013-Large-scale SVD and Manifold Learning

16 0.19093175 50 jmlr-2013-Greedy Feature Selection for Subspace Clustering

17 0.19024783 93 jmlr-2013-Random Walk Kernels and Learning Curves for Gaussian Process Regression on Random Graphs

18 0.19000879 17 jmlr-2013-Belief Propagation for Continuous State Spaces: Stochastic Message-Passing with Quantitative Guarantees

19 0.18993297 10 jmlr-2013-Algorithms and Hardness Results for Parallel Large Margin Learning

20 0.18978636 46 jmlr-2013-GURLS: A Least Squares Library for Supervised Learning