nips nips2007 nips2007-175 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Qiuhua Liu, Xuejun Liao, Lawrence Carin
Abstract: A semi-supervised multitask learning (MTL) framework is presented, in which M parameterized semi-supervised classifiers, each associated with one of M partially labeled data manifolds, are learned jointly under the constraint of a softsharing prior imposed over the parameters of the classifiers. The unlabeled data are utilized by basing classifier learning on neighborhoods, induced by a Markov random walk over a graph representation of each manifold. Experimental results on real data sets demonstrate that semi-supervised MTL yields significant improvements in generalization performance over either semi-supervised single-task learning (STL) or supervised MTL. 1
Reference: text
sentIndex sentText sentNum sentScore
1 The unlabeled data are utilized by basing classifier learning on neighborhoods, induced by a Markov random walk over a graph representation of each manifold. [sent-2, score-0.167]
2 Experimental results on real data sets demonstrate that semi-supervised MTL yields significant improvements in generalization performance over either semi-supervised single-task learning (STL) or supervised MTL. [sent-3, score-0.201]
3 1 Introduction Supervised learning has proven an effective technique for learning a classifier when the quantity of labeled data is large enough to represent a sufficient sample from the true labeling function. [sent-4, score-0.165]
4 Unfortunately, a generous provision of labeled data is often not available since acquiring the label of a datum is expensive in many applications. [sent-5, score-0.197]
5 A classifier supervised by a limited amount of labeled data is known to generalize poorly even if it produces zero training errors. [sent-6, score-0.316]
6 There has been much recent work on improving the generalization of classifiers based on using information sources beyond the labeled data. [sent-7, score-0.138]
7 The former (semi-supervised learning) exploits the manifold information provided by the usually abundant unlabeled data, while the latter (multitask learning) leverages information from related tasks. [sent-9, score-0.236]
8 In this paper we attempt to integrate the benefits offered by semi-supervised learning and MTL, by proposing semi-supervised multitask learning. [sent-10, score-0.143]
9 Each classifier provides the solution for a partially labeled data classification task. [sent-12, score-0.195]
10 The solutions for the M tasks are obtained simultaneously under the unified framework. [sent-13, score-0.135]
11 Since the label is a local property of the associated data point, information sharing must be performed at the level of data locations, instead of at the task level. [sent-16, score-0.31]
12 The inductive algorithm in [10] employs a data-dependent prior to encode manifold information. [sent-17, score-0.198]
13 First, the formulation builds a parametric classifier for each task, so that multitask learning can be performed efficiently at the task level, using the parameters of the classifiers. [sent-20, score-0.205]
14 Second, the formulation encodes the manifold information of each task inside the associated likelihood function, sparing the prior for exclusive use by the information from related tasks. [sent-21, score-0.265]
15 Third, the formulation lends itself to a Dirichlet process, allowing the tasks to share information in a complex manner. [sent-22, score-0.182]
16 In the MTL setting, we have M partially labeled data manifolds, each defining a classification task and involving design of a semi-supervised classifier. [sent-24, score-0.254]
17 The M classifiers are designed simultaneously within a unified sharing structure. [sent-25, score-0.14]
18 The key component of the sharing structure is a soft variant of the Dirichlet process (DP), which implements a soft-sharing prior over the parameters of all classifiers. [sent-26, score-0.242]
19 The soft-DP retains the clustering property of DP and yet does not require exact sharing of parameters, which increases flexibility and promotes robustness in information sharing. [sent-27, score-0.14]
20 The weight wij is given by wij = exp(-0.5 ||xi - xj||^2 / σij), where || · || is the Euclidean norm and σij > 0. [sent-32, score-0.121]
21 A Markov random walk on graph G = (X , W) is characterized by a matrix of one-step transition probabilities A = [aij]n×n, where aij is the probability of transiting from xi to xj via a single step and is given by aij = wij / Σ_{k=1}^{n} wik [4]. [sent-33, score-0.278]
22 Then the (i, j)-th element bij of the t-step transition matrix B = [bij]n×n = A^t represents the probability of transiting from xi to xj in t steps. [sent-35, score-0.308]
23 Data point xj is said to be a t-step neighbor of xi if bij > 0. [sent-36, score-0.238]
24 The t-step neighborhood of xi, denoted as Nt(xi), is defined by all t-step neighbors of xi along with the associated t-step transition probabilities, i.e., by the pairs (xj, bij) with bij > 0. [sent-37, score-0.196]
25 A rule for choosing t is given in [12], based on maximizing the margin of the associated classifier on both labeled and unlabeled data points. [sent-41, score-0.29]
26 The σi in specifying wij represents the step-size (distance traversed in a single step) for xi to reach its immediate neighbor, and we have used a distinct σ for each data point. [sent-42, score-0.126]
27 Location-dependent step-sizes allow one to account for possible heterogeneities in the data manifold — at locations with dense data distributions a small step-size is suitable, while at locations with sparse data distributions a large step-size is appropriate. [sent-43, score-0.163]
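The graph construction and t-step random walk described above can be summarized in a short sketch. The Python code below is a minimal illustration, not the authors' implementation; the function name, the use of numpy, and the exact form of the weight denominator (σi σj in place of the generic σij) are assumptions of the sketch.

```python
import numpy as np

def t_step_neighborhoods(X, t, step_scale=3.0):
    """Build the t-step random-walk neighborhoods for one task.

    X is an (n, d) array of (labeled and unlabeled) data points.  The
    per-point step-size sigma_i is the distance to the nearest other
    point divided by step_scale; using sigma_i * sigma_j in the weight
    denominator is an assumption of this sketch.
    """
    n = X.shape[0]
    # Pairwise Euclidean distances.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Location-dependent step-sizes: shortest distance to the remaining points.
    masked = dist + np.diag(np.full(n, np.inf))
    sigma = masked.min(axis=1) / step_scale
    # Graph weights w_ij.
    W = np.exp(-0.5 * dist ** 2 / (sigma[:, None] * sigma[None, :]))
    # One-step transition probabilities a_ij = w_ij / sum_k w_ik.
    A = W / W.sum(axis=1, keepdims=True)
    # t-step transition probabilities B = A^t; x_j with b_ij > 0 is a t-step neighbor of x_i.
    return np.linalg.matrix_power(A, t)
```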
28 2 Formulation of the PNBC Classifier Let p∗ (yi |xi , θ) be a base classifier parameterized by θ, which gives the probability of class label yi of data point xi , given xi alone (which is a zero-step neighborhood of xi ). [sent-47, score-0.487]
29 The base classifier can be implemented by any parameterized probabilistic classifier. [sent-48, score-0.116]
30 Let p(yi |Nt (xi ), θ) denote a neighborhood-based classifier parameterized by θ, representing the probability of class label yi for xi , given the neighborhood of xi . [sent-50, score-0.325]
31 The PNBC classifier is defined as the mixture p(yi | Nt(xi), θ) = Σ_{j=1}^{n} bij p∗(yi | xj, θ) (2), where the j-th component is the base classifier applied to (xj, yi) and the associated mixing proportion is defined by bij, the probability of transiting from xi to xj in t steps. [sent-51, score-0.456]
32 Since the magnitude of bij automatically determines the contribution of xj to the mixture, we let index j run over the entire X for notational simplicity. [sent-52, score-0.195]
33 The utility of unlabeled data in (2) is conspicuous — in order for xi to be labeled yi , each neighbor xj must be labeled consistently with yi , with the strength of consistency proportional to bij ; in such a manner, yi implicitly propagates over the neighborhood of xi . [sent-53, score-1.002]
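As a concrete illustration of (2), the following sketch implements the PNBC mixture with a logistic-regression base classifier (the base classifier used later in the experiments). It assumes binary labels yi in {+1, -1} and a parameter vector θ whose last entry is a bias term; these conventions, the helper names, and the exact form of the labeled-data likelihood are assumptions rather than the authors' code.

```python
import numpy as np

def base_prob(y, x, theta):
    # Base classifier p*(y | x, theta): logistic regression with a bias term,
    # for binary labels y in {+1, -1}.
    return 1.0 / (1.0 + np.exp(-y * (theta[:-1] @ x + theta[-1])))

def pnbc_prob(y_i, i, X, B, theta):
    # PNBC classifier of (2): a b_ij-weighted mixture of the base classifier
    # evaluated at the t-step neighbors x_j of x_i.
    return sum(B[i, j] * base_prob(y_i, X[j], theta) for j in range(X.shape[0]))

def labeled_log_likelihood(theta, X, y, labeled_idx, B, eps=1e-12):
    # Log-likelihood over the labeled index set under the PNBC classifier.
    return sum(np.log(pnbc_prob(y[i], i, X, B, theta) + eps) for i in labeled_idx)
```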
34 By taking neighborhoods into account, it is possible to obtain an accurate estimate of θ, based on a small amount of labeled data. [sent-54, score-0.185]
35 The over-fitting problem associated with limited labeled data is ameliorated in the PNBC formulation, through enforcing consistent labeling over each neighborhood. [sent-55, score-0.19]
36 Let L ⊆ {1, 2, · · · , n} denote the index set of labeled data in X . [sent-56, score-0.187]
37 Each normal distribution represents the prior transferred from a previous task; it is the metaknowledge indicating how the present task should be learned, based on the experience with a previous task. [sent-62, score-0.169]
38 It is through these normal distributions that information sharing between tasks is enforced. [sent-63, score-0.304]
39 The base distribution represents the baseline prior, which is exclusively used when there are no previous tasks available, as is seen from (5) by setting m = 1. [sent-65, score-0.205]
40 When there are m − 1 previous tasks, one uses the baseline prior with probability α/(α + m − 1), and uses the prior transferred from each of the m − 1 previous tasks with probability 1/(α + m − 1). [sent-66, score-0.268]
41 The role of the baseline prior diminishes as m increases, which agrees with intuition, since the information from previous tasks increases with m. [sent-68, score-0.187]
42 While the Dirac delta requires two tasks to have exactly the same θ when sharing occurs, the soft delta only requires sharing tasks to have similar θ’s. [sent-74, score-0.712]
43 Soft sharing may therefore be a better match for the situations encountered in practical applications. [sent-75, score-0.19]
44 Under the sharing prior in (4), the current task is equally influenced by each previous task but is influenced unevenly by future tasks — a distant future task has less influence than a near future task. [sent-78, score-0.504]
45 The ordering of the tasks imposed by (4) may in principle affect performance, although we have not found this to be an issue in the experimental results. [sent-79, score-0.157]
46 Alternatively, one may obtain a sharing prior that does not depend on task ordering, by modifying (5) as p(θm | θ−m) = [α p(θm | Υ) + Σ_{l≠m} N(θm; θl, η^2 I)] / (α + M − 1) (6), where θ−m = {θ1, · · · , θM} \ {θm}. [sent-80, score-0.251]
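The order-independent prior in (6) is straightforward to evaluate pointwise. The sketch below computes its log density; the scipy-based implementation, the helper name, and the default α are assumptions of the sketch (the base prior is taken as N(θ; 0, υ^2 I), as in the landmine experiment below), and a log-sum-exp formulation would be preferred in practice for numerical stability.

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_soft_sharing_prior(theta_m, others, alpha=1.0, upsilon=1.0, eta=1.0):
    """Log density of the order-independent soft-sharing prior in (6).

    others holds the parameter vectors of the remaining M - 1 tasks.  The base
    prior p(theta | Upsilon) is taken as N(theta; 0, upsilon^2 I); the default
    alpha is a placeholder value.
    """
    d = theta_m.shape[0]
    M = len(others) + 1
    base = alpha * multivariate_normal.pdf(theta_m, mean=np.zeros(d), cov=upsilon**2 * np.eye(d))
    soft = sum(multivariate_normal.pdf(theta_m, mean=th_l, cov=eta**2 * np.eye(d)) for th_l in others)
    # With no previous tasks (others empty), this reduces to the baseline prior alone.
    return np.log((base + soft) / (alpha + M - 1))
```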
47 Note that the neighborhoods are built for each task independently of the other tasks; thus a random walk is always restricted to the same task (the one to which the starting data point belongs) and can never traverse multiple tasks. [sent-84, score-0.232]
48 As seen from (5), the prior favors similar θ’s across tasks (similar θ’s increase the prior); however, sharing between unrelated tasks is discouraged, since each task requires a distinct θ to make its likelihood large. [sent-86, score-0.521]
49 As a result, to make the prior and the likelihood large at the same time, one must let related tasks have similar θ’s. [sent-87, score-0.187]
50 Utilization of the manifold information and the information from related tasks has greatly reduced the hypothesis space. [sent-90, score-0.217]
51 Therefore, point MAP estimation in semi-supervised MTL will not suffer as much from overfitting as in supervised STL. [sent-91, score-0.151]
52 This is borne out by the experimental results below, where semi-supervised MTL outperforms both supervised MTL and supervised STL, although the former is based on MAP and the latter two are based on Bayesian learning. [sent-93, score-0.302]
53 With MAP estimation, one obtains the parameters of the base classifier in (1) for each task, which can be employed to predict the class label of any data point in the associated task, regardless of whether the data point has been seen during training. [sent-94, score-0.202]
54 In the special case when predictions are desired only for the unlabeled data points seen during training (transductive learning), one can alternatively employ the PNBC classifier in (2) to perform the predictions. [sent-95, score-0.127]
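The two prediction modes just described can be made concrete with a short sketch, again assuming a logistic base classifier with a bias term; the helper names are hypothetical.

```python
import numpy as np

def predict_inductive(x_new, theta):
    # Inductive mode: the test point is not in the graph, so only the base
    # classifier in (1) is available; returns p(y = +1 | x_new, theta).
    return 1.0 / (1.0 + np.exp(-(theta[:-1] @ x_new + theta[-1])))

def predict_transductive(i, X, B, theta):
    # Transductive mode: x_i was in the graph during training, so the b_ij are
    # available and the PNBC classifier in (2) can be used instead.
    return sum(B[i, j] * predict_inductive(X[j], theta) for j in range(X.shape[0]))
```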
55 Then we demonstrate the performance improvements achieved by semi-supervised MTL, relative to semi-supervised STL and supervised MTL. [sent-97, score-0.174]
56 Throughout this section, the base classifier in (1) is logistic regression. [sent-98, score-0.149]
[Figure residue (Figures 1 and 2): accuracy on unlabeled data and on separated test data, plotted against the number of unlabeled samples, for 10 and 30 labeled samples.]
60 The evaluation is performed in comparison to four existing semi-supervised learning algorithms, namely, the transductive SVM [9], the algorithm of Szummer & Jaakkola [12], GRF [15], and Logistic GRF [10]. [sent-144, score-0.14]
61 In the transductive mode, the test data are the unlabeled data that are used in training the semi-supervised algorithms; in the inductive mode, the test data are a set of holdout data unseen during training. [sent-147, score-0.412]
62 In the transductive mode, we randomly sample XL ⊂ X and assume the associated class labels YL are available; the semi-supervised algorithms are trained by X ∪ YL and tested on X \ XL . [sent-150, score-0.192]
63 In the inductive mode, we randomly sample two disjoint data subsets XL ⊂ X and XU ⊂ X , and assume the class labels YL associated with XL are available; the semisupervised algorithms are trained by XL ∪ YL ∪ XU and tested on 200 data randomly sampled from X \ (XL ∪ XU ). [sent-151, score-0.17]
64 The algorithm of Szummer & Jaakkola [12] and the PNBC use σi = min_{j≠i} ||xi − xj|| / 3 and t = 100; learning of the PNBC is based on MAP estimation. [sent-153, score-0.121]
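As a usage note, these settings correspond to the following call to the hypothetical t_step_neighborhoods helper sketched earlier; X_task below is a random stand-in for one task's data matrix, not one of the benchmark data sets.

```python
import numpy as np

# sigma_i = min_{j != i} ||x_i - x_j|| / 3 and t = 100, as quoted above.
X_task = np.random.randn(200, 5)                          # stand-in for one task's (n, d) data
B = t_step_neighborhoods(X_task, t=100, step_scale=3.0)   # helper from the earlier sketch
print(B.shape, int((B[0] > 0).sum()))                     # number of t-step neighbors of x_0
```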
65 Each curve in the figures is a result averaged from T independent trials, with T = 20 for the transductive results and T = 50 for the inductive results. [sent-154, score-0.204]
66 In the inductive case, the comparison is between the proposed algorithm and the Logistic GRF, as the others are transductive algorithms. [sent-155, score-0.204]
67 For the PNBC, we can either use the base classifier in (1) or the PNBC classifier in (2) to predict the labels of unlabeled data seen in training (the transductive mode). [sent-156, score-0.364]
68 In the inductive mode, however, the {bij} are not available for the test data (unseen in training) since they are not in the graph representation; therefore we can only employ the base classifier. [sent-157, score-0.161]
69 Figures 1 and 2 show that the PNBC outperforms all the competing algorithms in general, regardless of the number of labeled data points. [sent-159, score-0.165]
70 As indicated in Figure 1(c), employing manifold information in testing by using (2) can improve classification accuracy in the transductive learning case. [sent-161, score-0.244]
71 Figure 2 also shows that the advantage of the PNBC becomes more conspicuous with decreasing amount of labeled data considered during training. [sent-163, score-0.192]
72 The tasks are ordered as they were provided to the experimenter (we have randomly permuted the tasks and found that the performance does not change much). [sent-170, score-0.319]
73 A separate t-neighborhood is employed to represent the manifold information (consisting of labeled and unlabeled data points) for each task, where the step-size at each data point is one third of the shortest distance to the remaining points and t is set to half the number of data points. [sent-171, score-0.445]
74 The base prior is p(θm | Υ) = N(θm; 0, υ^2 I) and the soft delta is N(θm; θl, η^2 I), where υ = η = 1. [sent-172, score-0.228]
75 The α balancing the base prior and the soft delta’s is 0. [sent-173, score-0.172]
Figure 3: (a) Average AUC on 19 tasks versus the number of labeled data in each task, comparing the performance of the semi-supervised MTL algorithm on landmine detection to the remaining five algorithms. (b) The Hinton diagram of between-task similarity when there are 140 labeled data in each task.
79 In this problem, there are a total of 29 sets of data, collected from various landmine fields. [sent-185, score-0.119]
80 Of the 19 selected data sets, 1-10 are collected at foliated regions and 11-19 are collected at regions that are bare earth or desert. [sent-195, score-0.161]
81 The AUC averaged over the 19 tasks is presented in Figure 3(a), as a function of the number of labeled data, where each curve represents the mean calculated from the 100 independent trials and the error bars represent the corresponding standard deviations. [sent-198, score-0.332]
82 The results of supervised STL, supervised MTL, and supervised pooling are cited from [13]. [sent-199, score-0.572]
83 Semi-supervised MTL clearly yields the best results up to 80 labeled data points; after that supervised MTL catches up but semi-supervised MTL still outperforms the remaining three algorithms by significant margins. [sent-200, score-0.364]
84 In this example semi-supervised MTL seems relatively insensitive to the amount of labeled data; this may be attributed to the doubly enhanced information provided by the data manifold plus the related tasks, which significantly augments the information available in the limited labeled data. [sent-201, score-0.385]
85 The superiority of supervised pooling over supervised STL on this dataset suggests that there are significant benefits offered by sharing across the tasks, which partially explains why supervised MTL eventually catches up with semi-supervised MTL. [sent-202, score-0.776]
86 We plot in Figure 3(b) the Hinton diagram [8] of the between-task sharing matrix (an average over the 100 trials) found by the semi-supervised MTL when there are 140 labeled data in each task. [sent-203, score-0.33]
87 As seen from Figure 3(b), there is a dominant sharing among tasks 1-10 and another dominant sharing among tasks 11-19. [sent-205, score-0.592]
88 Recall from the beginning of the section that data sets 1-10 are from foliated regions and data sets 11-19 are from regions that are bare earth or desert. [sent-206, score-0.14]
89 Therefore, the sharing is in agreement with the similarity between tasks. [sent-207, score-0.14]
90 Art Images Retrieval We now consider the problem of art image retrieval [14, 13], in which we have a library of 642 art images and want to retrieve the images based on a user’s preference. [sent-208, score-0.16]
91 The preference prediction for each user is treated as a task, with the associated set of ground truth data defined by the images rated by the user. [sent-218, score-0.172]
92 These 69 tasks are used in our experiment to evaluate the performance of semi-supervised MTL. [sent-219, score-0.135]
93 Since two users may give different ratings to exactly the same image, pooling the tasks together can lead to multiple labels for the same data point. [sent-220, score-0.273]
94 For this reason, we exclude supervised pooling and semi-supervised pooling from the performance comparison. [sent-221, score-0.319]
95 The AUC averaged over the 69 tasks is presented in Figure 4, as a function of the number of labeled data (rated images), where each curve represents the mean calculated from the 50 independent trials and the error bars represent the corresponding standard deviations. [sent-235, score-0.359]
96 The results of supervised STL and supervised MTL are cited from [13]. [sent-236, score-0.337]
97 It is noteworthy that the performance improvement achieved by semi-supervised MTL over semi-supervised STL is larger than the corresponding improvement achieved by supervised MTL over supervised STL. [sent-238, score-0.302]
98 The greater improvement demonstrates that unlabeled data can be more valuable when used along with multitask learning. [sent-239, score-0.226]
99 The additional utility of unlabeled data can be attributed to its role in helping to find the appropriate sharing between tasks. [sent-240, score-0.267]
100 We have proposed a soft sharing prior, which allows each task to robustly borrow information from related tasks and is amenable to simple point estimation based on maximum a posteriori. [sent-243, score-0.41]
wordName wordTfidf (topN-words)
[('mtl', 0.526), ('pnbc', 0.489), ('stl', 0.252), ('grf', 0.221), ('supervised', 0.151), ('sharing', 0.14), ('transductive', 0.14), ('labeled', 0.138), ('tasks', 0.135), ('bij', 0.117), ('classi', 0.114), ('unlabeled', 0.1), ('multitask', 0.099), ('landmine', 0.095), ('er', 0.092), ('lm', 0.088), ('pooling', 0.084), ('manifold', 0.082), ('szummer', 0.082), ('logistic', 0.079), ('yi', 0.076), ('nt', 0.074), ('base', 0.07), ('xl', 0.065), ('xi', 0.065), ('inductive', 0.064), ('dirichlet', 0.061), ('task', 0.059), ('ers', 0.058), ('delta', 0.056), ('xj', 0.056), ('dirac', 0.053), ('prior', 0.052), ('auc', 0.051), ('jaakkola', 0.051), ('pima', 0.05), ('semi', 0.05), ('soft', 0.05), ('formulation', 0.047), ('carin', 0.047), ('ionosphere', 0.047), ('transiting', 0.047), ('neighborhoods', 0.047), ('parameterized', 0.046), ('offered', 0.044), ('yl', 0.042), ('art', 0.041), ('mode', 0.041), ('neighborhood', 0.041), ('walk', 0.04), ('rated', 0.039), ('trials', 0.037), ('nm', 0.036), ('aij', 0.035), ('cited', 0.035), ('xm', 0.034), ('wij', 0.034), ('user', 0.034), ('label', 0.032), ('bare', 0.032), ('foliated', 0.032), ('urn', 0.032), ('xu', 0.031), ('hinton', 0.031), ('partially', 0.03), ('normal', 0.029), ('transferred', 0.029), ('retrieval', 0.028), ('xue', 0.027), ('mine', 0.027), ('conspicuous', 0.027), ('wdbc', 0.027), ('polya', 0.027), ('data', 0.027), ('separated', 0.027), ('labels', 0.027), ('amenable', 0.026), ('images', 0.025), ('catches', 0.025), ('diagram', 0.025), ('associated', 0.025), ('map', 0.024), ('collected', 0.024), ('figures', 0.023), ('liao', 0.023), ('element', 0.023), ('improvements', 0.023), ('remaining', 0.023), ('accuracy', 0.022), ('bars', 0.022), ('preference', 0.022), ('imposed', 0.022), ('experimenter', 0.022), ('earth', 0.022), ('index', 0.022), ('dp', 0.022), ('dominant', 0.021), ('svm', 0.021), ('employed', 0.021), ('receiver', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999976 175 nips-2007-Semi-Supervised Multitask Learning
Author: Qiuhua Liu, Xuejun Liao, Lawrence Carin
Abstract: A semi-supervised multitask learning (MTL) framework is presented, in which M parameterized semi-supervised classifiers, each associated with one of M partially labeled data manifolds, are learned jointly under the constraint of a softsharing prior imposed over the parameters of the classifiers. The unlabeled data are utilized by basing classifier learning on neighborhoods, induced by a Markov random walk over a graph representation of each manifold. Experimental results on real data sets demonstrate that semi-supervised MTL yields significant improvements in generalization performance over either semi-supervised single-task learning (STL) or supervised MTL. 1
2 0.34139207 134 nips-2007-Multi-Task Learning via Conic Programming
Author: Tsuyoshi Kato, Hisashi Kashima, Masashi Sugiyama, Kiyoshi Asai
Abstract: When we have several related tasks, solving them simultaneously is shown to be more effective than solving them individually. This approach is called multi-task learning (MTL) and has been studied extensively. Existing approaches to MTL often treat all the tasks as uniformly related to each other and the relatedness of the tasks is controlled globally. For this reason, the existing methods can lead to undesired solutions when some tasks are not highly related to each other, and some pairs of related tasks can have significantly different solutions. In this paper, we propose a novel MTL algorithm that can overcome these problems. Our method makes use of a task network, which describes the relation structure among tasks. This allows us to deal with intricate relation structures in a systematic way. Furthermore, we control the relatedness of the tasks locally, so all pairs of related tasks are guaranteed to have similar solutions. We apply the above idea to support vector machines (SVMs) and show that the optimization problem can be cast as a second order cone program, which is convex and can be solved efficiently. The usefulness of our approach is demonstrated through simulations with protein super-family classification and ordinal regression problems.
3 0.12299958 166 nips-2007-Regularized Boost for Semi-Supervised Learning
Author: Ke Chen, Shihai Wang
Abstract: Semi-supervised inductive learning concerns how to learn a decision rule from a data set containing both labeled and unlabeled data. Several boosting algorithms have been extended to semi-supervised learning with various strategies. To our knowledge, however, none of them takes local smoothness constraints among data into account during ensemble learning. In this paper, we introduce a local smoothness regularizer to semi-supervised boosting algorithms based on the universal optimization framework of margin cost functionals. Our regularizer is applicable to existing semi-supervised boosting algorithms to improve their generalization and speed up their training. Comparative results on synthetic, benchmark and real world tasks demonstrate the effectiveness of our local smoothness regularizer. We discuss relevant issues and relate our regularizer to previous work. 1
4 0.1155507 69 nips-2007-Discriminative Batch Mode Active Learning
Author: Yuhong Guo, Dale Schuurmans
Abstract: Active learning sequentially selects unlabeled instances to label with the goal of reducing the effort needed to learn a good classifier. Most previous studies in active learning have focused on selecting one unlabeled instance to label at a time while retraining in each iteration. Recently a few batch mode active learning approaches have been proposed that select a set of most informative unlabeled instances in each iteration under the guidance of heuristic scores. In this paper, we propose a discriminative batch mode active learning approach that formulates the instance selection task as a continuous optimization problem over auxiliary instance selection variables. The optimization is formulated to maximize the discriminative classification performance of the target classifier, while also taking the unlabeled data into account. Although the objective is not convex, we can manipulate a quasi-Newton method to obtain a good local solution. Our empirical studies on UCI datasets show that the proposed active learning is more effective than current state-of-the art batch mode active learning algorithms. 1
5 0.10589744 212 nips-2007-Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes
Author: Geoffrey E. Hinton, Ruslan Salakhutdinov
Abstract: We show how to use unlabeled data and a deep belief net (DBN) to learn a good covariance kernel for a Gaussian process. We first learn a deep generative model of the unlabeled data using the fast, greedy algorithm introduced by [7]. If the data is high-dimensional and highly-structured, a Gaussian kernel applied to the top layer of features in the DBN works much better than a similar kernel applied to the raw input. Performance at both regression and classification can then be further improved by using backpropagation through the DBN to discriminatively fine-tune the covariance kernel.
6 0.10195161 76 nips-2007-Efficient Convex Relaxation for Transductive Support Vector Machine
7 0.10033339 186 nips-2007-Statistical Analysis of Semi-Supervised Regression
8 0.088403799 201 nips-2007-The Value of Labeled and Unlabeled Examples when the Model is Imperfect
9 0.087779365 32 nips-2007-Bayesian Co-Training
10 0.082811914 136 nips-2007-Multiple-Instance Active Learning
11 0.081277646 147 nips-2007-One-Pass Boosting
12 0.07731808 149 nips-2007-Optimal ROC Curve for a Combination of Classifiers
13 0.06896849 107 nips-2007-Iterative Non-linear Dimensionality Reduction with Manifold Sculpting
14 0.068908289 135 nips-2007-Multi-task Gaussian Process Prediction
15 0.062000755 97 nips-2007-Hidden Common Cause Relations in Relational Learning
16 0.061902571 39 nips-2007-Boosting the Area under the ROC Curve
17 0.060012888 11 nips-2007-A Risk Minimization Principle for a Class of Parzen Estimators
18 0.059469804 161 nips-2007-Random Projections for Manifold Learning
19 0.058555637 12 nips-2007-A Spectral Regularization Framework for Multi-Task Structure Learning
20 0.056838572 211 nips-2007-Unsupervised Feature Selection for Accurate Recommendation of High-Dimensional Image Data
topicId topicWeight
[(0, -0.194), (1, 0.067), (2, -0.144), (3, 0.107), (4, 0.023), (5, 0.098), (6, 0.061), (7, -0.079), (8, 0.117), (9, 0.075), (10, 0.008), (11, -0.038), (12, -0.124), (13, 0.021), (14, 0.028), (15, 0.039), (16, -0.111), (17, -0.05), (18, 0.144), (19, -0.004), (20, -0.077), (21, 0.083), (22, -0.052), (23, -0.03), (24, 0.166), (25, 0.085), (26, 0.153), (27, 0.225), (28, -0.068), (29, 0.153), (30, 0.088), (31, 0.139), (32, 0.108), (33, -0.113), (34, 0.13), (35, 0.145), (36, -0.051), (37, -0.049), (38, -0.086), (39, -0.063), (40, 0.107), (41, 0.079), (42, 0.124), (43, 0.073), (44, -0.08), (45, -0.051), (46, -0.007), (47, -0.219), (48, -0.054), (49, 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 0.92770761 175 nips-2007-Semi-Supervised Multitask Learning
Author: Qiuhua Liu, Xuejun Liao, Lawrence Carin
Abstract: A semi-supervised multitask learning (MTL) framework is presented, in which M parameterized semi-supervised classifiers, each associated with one of M partially labeled data manifolds, are learned jointly under the constraint of a softsharing prior imposed over the parameters of the classifiers. The unlabeled data are utilized by basing classifier learning on neighborhoods, induced by a Markov random walk over a graph representation of each manifold. Experimental results on real data sets demonstrate that semi-supervised MTL yields significant improvements in generalization performance over either semi-supervised single-task learning (STL) or supervised MTL. 1
2 0.77075738 134 nips-2007-Multi-Task Learning via Conic Programming
Author: Tsuyoshi Kato, Hisashi Kashima, Masashi Sugiyama, Kiyoshi Asai
Abstract: When we have several related tasks, solving them simultaneously is shown to be more effective than solving them individually. This approach is called multi-task learning (MTL) and has been studied extensively. Existing approaches to MTL often treat all the tasks as uniformly related to each other and the relatedness of the tasks is controlled globally. For this reason, the existing methods can lead to undesired solutions when some tasks are not highly related to each other, and some pairs of related tasks can have significantly different solutions. In this paper, we propose a novel MTL algorithm that can overcome these problems. Our method makes use of a task network, which describes the relation structure among tasks. This allows us to deal with intricate relation structures in a systematic way. Furthermore, we control the relatedness of the tasks locally, so all pairs of related tasks are guaranteed to have similar solutions. We apply the above idea to support vector machines (SVMs) and show that the optimization problem can be cast as a second order cone program, which is convex and can be solved efficiently. The usefulness of our approach is demonstrated through simulations with protein super-family classification and ordinal regression problems.
3 0.46039724 201 nips-2007-The Value of Labeled and Unlabeled Examples when the Model is Imperfect
Author: Kaushik Sinha, Mikhail Belkin
Abstract: Semi-supervised learning, i.e. learning from both labeled and unlabeled data has received significant attention in the machine learning literature in recent years. Still our understanding of the theoretical foundations of the usefulness of unlabeled data remains somewhat limited. The simplest and the best understood situation is when the data is described by an identifiable mixture model, and where each class comes from a pure component. This natural setup and its implications ware analyzed in [11, 5]. One important result was that in certain regimes, labeled data becomes exponentially more valuable than unlabeled data. However, in most realistic situations, one would not expect that the data comes from a parametric mixture distribution with identifiable components. There have been recent efforts to analyze the non-parametric situation, for example, “cluster” and “manifold” assumptions have been suggested as a basis for analysis. Still, a satisfactory and fairly complete theoretical understanding of the nonparametric problem, similar to that in [11, 5] has not yet been developed. In this paper we investigate an intermediate situation, when the data comes from a probability distribution, which can be modeled, but not perfectly, by an identifiable mixture distribution. This seems applicable to many situation, when, for example, a mixture of Gaussians is used to model the data. the contribution of this paper is an analysis of the role of labeled and unlabeled data depending on the amount of imperfection in the model.
4 0.44615293 166 nips-2007-Regularized Boost for Semi-Supervised Learning
Author: Ke Chen, Shihai Wang
Abstract: Semi-supervised inductive learning concerns how to learn a decision rule from a data set containing both labeled and unlabeled data. Several boosting algorithms have been extended to semi-supervised learning with various strategies. To our knowledge, however, none of them takes local smoothness constraints among data into account during ensemble learning. In this paper, we introduce a local smoothness regularizer to semi-supervised boosting algorithms based on the universal optimization framework of margin cost functionals. Our regularizer is applicable to existing semi-supervised boosting algorithms to improve their generalization and speed up their training. Comparative results on synthetic, benchmark and real world tasks demonstrate the effectiveness of our local smoothness regularizer. We discuss relevant issues and relate our regularizer to previous work. 1
5 0.40428612 186 nips-2007-Statistical Analysis of Semi-Supervised Regression
Author: Larry Wasserman, John D. Lafferty
Abstract: Semi-supervised methods use unlabeled data in addition to labeled data to construct predictors. While existing semi-supervised methods have shown some promising empirical performance, their development has been based largely based on heuristics. In this paper we study semi-supervised learning from the viewpoint of minimax theory. Our first result shows that some common methods based on regularization using graph Laplacians do not lead to faster minimax rates of convergence. Thus, the estimators that use the unlabeled data do not have smaller risk than the estimators that use only labeled data. We then develop several new approaches that provably lead to improved performance. The statistical tools of minimax analysis are thus used to offer some new perspective on the problem of semi-supervised learning. 1
6 0.38821128 139 nips-2007-Nearest-Neighbor-Based Active Learning for Rare Category Detection
7 0.36858284 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data
8 0.36821848 76 nips-2007-Efficient Convex Relaxation for Transductive Support Vector Machine
9 0.3677251 69 nips-2007-Discriminative Batch Mode Active Learning
10 0.36254761 137 nips-2007-Multiple-Instance Pruning For Learning Efficient Cascade Detectors
11 0.34933558 32 nips-2007-Bayesian Co-Training
12 0.34673193 88 nips-2007-Fast and Scalable Training of Semi-Supervised CRFs with Application to Activity Recognition
13 0.34580454 45 nips-2007-Classification via Minimum Incremental Coding Length (MICL)
14 0.32871881 12 nips-2007-A Spectral Regularization Framework for Multi-Task Structure Learning
15 0.32150611 135 nips-2007-Multi-task Gaussian Process Prediction
16 0.31346688 212 nips-2007-Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes
17 0.30912274 149 nips-2007-Optimal ROC Curve for a Combination of Classifiers
18 0.29569328 109 nips-2007-Kernels on Attributed Pointsets with Applications
19 0.29438278 126 nips-2007-McRank: Learning to Rank Using Multiple Classification and Gradient Boosting
20 0.28247473 147 nips-2007-One-Pass Boosting
topicId topicWeight
[(5, 0.041), (13, 0.023), (16, 0.016), (18, 0.016), (19, 0.014), (21, 0.094), (24, 0.231), (31, 0.018), (34, 0.019), (35, 0.07), (47, 0.069), (49, 0.016), (83, 0.138), (85, 0.024), (87, 0.028), (90, 0.059), (98, 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 0.79592979 175 nips-2007-Semi-Supervised Multitask Learning
Author: Qiuhua Liu, Xuejun Liao, Lawrence Carin
Abstract: A semi-supervised multitask learning (MTL) framework is presented, in which M parameterized semi-supervised classifiers, each associated with one of M partially labeled data manifolds, are learned jointly under the constraint of a softsharing prior imposed over the parameters of the classifiers. The unlabeled data are utilized by basing classifier learning on neighborhoods, induced by a Markov random walk over a graph representation of each manifold. Experimental results on real data sets demonstrate that semi-supervised MTL yields significant improvements in generalization performance over either semi-supervised single-task learning (STL) or supervised MTL. 1
2 0.78069985 151 nips-2007-Optimistic Linear Programming gives Logarithmic Regret for Irreducible MDPs
Author: Ambuj Tewari, Peter L. Bartlett
Abstract: We present an algorithm called Optimistic Linear Programming (OLP) for learning to optimize average reward in an irreducible but otherwise unknown Markov decision process (MDP). OLP uses its experience so far to estimate the MDP. It chooses actions by optimistically maximizing estimated future rewards over a set of next-state transition probabilities that are close to the estimates, a computation that corresponds to solving linear programs. We show that the total expected reward obtained by OLP up to time T is within C(P ) log T of the reward obtained by the optimal policy, where C(P ) is an explicit, MDP-dependent constant. OLP is closely related to an algorithm proposed by Burnetas and Katehakis with four key differences: OLP is simpler, it does not require knowledge of the supports of transition probabilities, the proof of the regret bound is simpler, but our regret bound is a constant factor larger than the regret of their algorithm. OLP is also similar in flavor to an algorithm recently proposed by Auer and Ortner. But OLP is simpler and its regret bound has a better dependence on the size of the MDP. 1
3 0.65060198 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning
Author: Kai Yu, Wei Chu
Abstract: This paper aims to model relational data on edges of networks. We describe appropriate Gaussian Processes (GPs) for directed, undirected, and bipartite networks. The inter-dependencies of edges can be effectively modeled by adapting the GP hyper-parameters. The framework suggests an intimate connection between link prediction and transfer learning, which were traditionally two separate research topics. We develop an efficient learning algorithm that can handle a large number of observations. The experimental results on several real-world data sets verify superior learning capacity. 1
4 0.6436727 156 nips-2007-Predictive Matrix-Variate t Models
Author: Shenghuo Zhu, Kai Yu, Yihong Gong
Abstract: It is becoming increasingly important to learn from a partially-observed random matrix and predict its missing elements. We assume that the entire matrix is a single sample drawn from a matrix-variate t distribution and suggest a matrixvariate t model (MVTM) to predict those missing elements. We show that MVTM generalizes a range of known probabilistic models, and automatically performs model selection to encourage sparse predictive models. Due to the non-conjugacy of its prior, it is difficult to make predictions by computing the mode or mean of the posterior distribution. We suggest an optimization method that sequentially minimizes a convex upper-bound of the log-likelihood, which is very efficient and scalable. The experiments on a toy data and EachMovie dataset show a good predictive accuracy of the model. 1
5 0.6418035 16 nips-2007-A learning framework for nearest neighbor search
Author: Lawrence Cayton, Sanjoy Dasgupta
Abstract: Can we leverage learning techniques to build a fast nearest-neighbor (ANN) retrieval data structure? We present a general learning framework for the NN problem in which sample queries are used to learn the parameters of a data structure that minimize the retrieval time and/or the miss rate. We explore the potential of this novel framework through two popular NN data structures: KD-trees and the rectilinear structures employed by locality sensitive hashing. We derive a generalization theory for these data structure classes and present simple learning algorithms for both. Experimental results reveal that learning often improves on the already strong performance of these data structures. 1
6 0.63751656 209 nips-2007-Ultrafast Monte Carlo for Statistical Summations
7 0.63751024 186 nips-2007-Statistical Analysis of Semi-Supervised Regression
8 0.63259494 63 nips-2007-Convex Relaxations of Latent Variable Training
9 0.63252079 18 nips-2007-A probabilistic model for generating realistic lip movements from speech
10 0.63246411 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images
11 0.63107592 172 nips-2007-Scene Segmentation with CRFs Learned from Partially Labeled Images
12 0.63020837 45 nips-2007-Classification via Minimum Incremental Coding Length (MICL)
13 0.63016182 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data
14 0.62903064 212 nips-2007-Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes
15 0.62901837 76 nips-2007-Efficient Convex Relaxation for Transductive Support Vector Machine
16 0.62801075 180 nips-2007-Sparse Feature Learning for Deep Belief Networks
17 0.62773705 97 nips-2007-Hidden Common Cause Relations in Relational Learning
18 0.62708914 69 nips-2007-Discriminative Batch Mode Active Learning
19 0.62679052 196 nips-2007-The Infinite Gamma-Poisson Feature Model
20 0.62559074 153 nips-2007-People Tracking with the Laplacian Eigenmaps Latent Variable Model