nips nips2006 nips2006-118 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Chi-hoon Lee, Shaojun Wang, Feng Jiao, Dale Schuurmans, Russell Greiner
Abstract: We present a novel, semi-supervised approach to training discriminative random fields (DRFs) that efficiently exploits labeled and unlabeled training data to achieve improved accuracy in a variety of image processing tasks. We formulate DRF training as a form of MAP estimation that combines conditional loglikelihood on labeled data, given a data-dependent prior, with a conditional entropy regularizer defined on unlabeled data. Although the training objective is no longer concave, we develop an efficient local optimization procedure that produces classifiers that are more accurate than ones based on standard supervised DRF training. We then apply our semi-supervised approach to train DRFs to segment both synthetic and real data sets, and demonstrate significant improvements over supervised DRFs in each case.
Reference: text
sentIndex sentText sentNum sentScore
1 ca Abstract We present a novel, semi-supervised approach to training discriminative random fields (DRFs) that efficiently exploits labeled and unlabeled training data to achieve improved accuracy in a variety of image processing tasks. [sent-9, score-0.49]
2 We formulate DRF training as a form of MAP estimation that combines conditional loglikelihood on labeled data, given a data-dependent prior, with a conditional entropy regularizer defined on unlabeled data. [sent-10, score-0.634]
3 Although the training objective is no longer concave, we develop an efficient local optimization procedure that produces classifiers that are more accurate than ones based on standard supervised DRF training. [sent-11, score-0.217]
4 We then apply our semi-supervised approach to train DRFs to segment both synthetic and real data sets, and demonstrate significant improvements over supervised DRFs in each case. [sent-12, score-0.173]
5 Classical Markov random fields (MRFs) [2] follow a traditional generative approach, where one models the joint probability of the observed image along with the hidden label field over the pixels. [sent-15, score-0.086]
6 Discriminative random fields (DRFs) [11, 10], on the other hand, directly model the conditional probability over the pixel label field given an observed image. [sent-16, score-0.182]
7 In this sense, a DRF is equivalent to a conditional random field [12] defined over a 2-D lattice. [sent-17, score-0.122]
8 Following the basic tenet of Vapnik [18], it is natural to anticipate that learning an accurate joint model should be more challenging than learning an accurate conditional model. [sent-18, score-0.185]
9 Indeed, recent experimental evidence shows that DRFs tend to produce more accurate image labeling models than MRFs, in many applications like gesture recognition [15] and object detection [11, 10, 19, 17]. [sent-19, score-0.083]
10 Although DRFs tend to produce superior pixel labellings to MRFs, partly by relaxing the assumption of conditional independence of observed images given the labels, the approach relies more heavily on supervised training. [sent-20, score-0.389]
11 DRF training typically uses labeled image data where each pixel label has been assigned. [sent-21, score-0.222]
12 However, it is considerably more difficult to obtain labeled data for image analysis than for other classification tasks, such as document classification, since hand-labeling the individual pixels of each image is much harder than assigning class labels to objects like text documents. [sent-22, score-0.277]
13 (∗ Work done while at University of Alberta.) Recently, semi-supervised training has taken on an important new role in many application areas due to the abundance of unlabeled data. [sent-23, score-0.257]
14 Current work on semi-supervised learning for structured predictors [1, 9] has focused primarily on simple sequence prediction tasks where learning and inference can be efficiently performed using standard dynamic programming. [sent-27, score-0.087]
15 Unfortunately, the problem we address is more challenging, since the spatial correlations in a 2-D grid structure create numerous dependency cycles. [sent-28, score-0.096]
16 (1) We inherit the standard advantage of discriminative conditional versus joint model training, while still being able to exploit unlabeled data. [sent-33, score-0.416]
17 (2) The use of unlabeled data enhances our ability to avoid parameter over-fitting and over-estimation in grid based random fields even when using a learner that uses only approximate inference methods. [sent-34, score-0.251]
18 (3) We are still able to model spatial correlations in a 2-D lattice, despite the fact that this introduces dependency cycles in the model. [sent-35, score-0.12]
19 That is, our semi-supervised training procedure can be interpreted as a MAP estimator, where the parameter prior for the model on labeled data is governed by the conditional entropy of the model on unlabeled data. [sent-36, score-0.536]
20 This allows us to learn local potentials that capture spatial correlations while often avoiding local over-estimation. [sent-37, score-0.152]
21 We demonstrate the robustness of our model by applying it to a pixel denoising problem on synthetic images, and also to the challenging real-world problem of segmenting tumors in magnetic resonance images. [sent-38, score-0.425]
22 2 Semi-Supervised DRFs (SSDRFs) We formulate a new semi-supervised DRF training principle based on the standard supervised formulation of [11, 10]. [sent-40, score-0.179]
23 Let x be an observed input image, represented by $x = \{x_i\}_{i \in S}$, where S is the set of observed image pixels (nodes). [sent-41, score-0.122]
24 Let $y = \{y_i\}_{i \in S}$ be the joint set of labels over all pixels of an image. [sent-42, score-0.129]
25 For example, x might be a magnetic resonance image of a brain and y is a realization of a joint labeling over all pixels that indicates whether each pixel is normal or a tumor. [sent-44, score-0.391]
26 A DRF is a conditional random field defined on the pixel labels, conditioned on the observation x. [sent-48, score-0.182]
27 More explicitly, the joint distribution over the labels y given the observations x is written $p_\theta(y \mid x) = \frac{1}{Z_\theta(x)} \exp\big( \sum_{i \in S} \Phi_w(y_i, x) + \sum_{i \in S} \sum_{j \in N_i} \Psi_\nu(y_i, y_j, x) \big)$ (1). Here $N_i$ denotes the neighboring pixels of i. [sent-49, score-0.228]
28 $\Phi_w(y_i, x) = \log \sigma(y_i w^T h_i(x))$ denotes the node potential at pixel i, which quantifies the belief that the class label is $y_i$ for the pre-defined feature vector $h_i(x)$, where $\sigma(t) = \frac{1}{1+e^{-t}}$. [sent-50, score-0.635]
29 $\Psi_\nu(y_i, y_j, x) = y_i y_j \nu^T \mu_{ij}(x)$ is an edge potential that captures spatial correlations among neighboring pixels (here, the ones at positions i and j), such that $\mu_{ij}(x)$ is the pre-defined feature vector associated with observation x. [sent-51, score-0.573]
30 $Z_\theta(x)$ is the normalizing factor, also known as a (conditional) partition function: $Z_\theta(x) = \sum_y \exp\big( \sum_{i \in S} \Phi_w(y_i, x) + \sum_{i \in S} \sum_{j \in N_i} \Psi_\nu(y_i, y_j, x) \big)$ (2). Finally, $\theta = (w, \nu)$ are the model parameters. [sent-52, score-0.072]
31 When the edge potentials are set to zero, a DRF yields a standard logistic regression classifier. [sent-53, score-0.101]
32 The potentials in a DRF can use properties of the observed image, and thereby relax the conditional independence assumption of MRFs. [sent-54, score-0.17]
33 Moreover, the edge potentials in a DRF can smooth discontinuities between heterogeneous class pixels, and also correct errors made by the node potentials. [sent-55, score-0.115]
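To make the potentials in (1)–(2) concrete, the following is a minimal numpy sketch of a binary DRF with labels $y_i \in \{-1, +1\}$; the feature maps $h_i(x)$ and $\mu_{ij}(x)$ are left as inputs, and all function and variable names are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def node_potential(y_i, h_i, w):
    # Phi_w(y_i, x) = log sigma(y_i * w^T h_i(x)), with y_i in {-1, +1}
    return np.log(sigmoid(y_i * w.dot(h_i)))

def edge_potential(y_i, y_j, mu_ij, v):
    # Psi_nu(y_i, y_j, x) = y_i * y_j * nu^T mu_ij(x)
    return y_i * y_j * v.dot(mu_ij)

def unnormalized_log_prob(y, h, mu, neighbors, w, v):
    """log p_theta(y|x) up to the partition function Z_theta(x) in eq. (2).

    y:         dict {pixel index i: label in {-1, +1}}
    h:         dict {i: node feature vector h_i(x)}
    mu:        dict {(i, j): edge feature vector mu_ij(x)}
    neighbors: dict {i: list of neighboring pixel indices N_i}
    """
    total = sum(node_potential(y[i], h[i], w) for i in y)
    for i in y:
        for j in neighbors[i]:
            total += edge_potential(y[i], y[j], mu[(i, j)], v)
    return total
```

Setting v to zero leaves only the node potentials, which reduces the model to independent per-pixel logistic regression, matching the remark above.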
34 Assume we have a set of independent labeled images, $D^l = \{(x^{(1)}, y^{(1)}), \cdots, (x^{(M)}, y^{(M)})\}$, and a set of independent unlabeled images, $D^u = \{x^{(M+1)}, \cdots, x^{(T)}\}$. [sent-56, score-0.283]
35 Our goal is to build a DRF model from the combined set of labeled and unlabeled examples, $D^l \cup D^u$. [sent-57, score-0.283]
36 The standard supervised DRF training procedure is based on maximizing the log of the posterior probability of the labeled examples in $D^l$: $CL(\theta) = \sum_{k=1}^{M} \log P(y^{(k)} \mid x^{(k)}) - \frac{\nu^T \nu}{2\tau^2}$ (3), where a Gaussian prior over the edge parameters $\nu$ is assumed and a uniform prior over the parameters $w$. [sent-58, score-0.305]
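As a sketch of the supervised criterion (3), assuming a callable `log_conditional(w, v, x, y)` that returns log P(y|x) (which in practice requires approximating $Z_\theta(x)$); the names and defaults below are illustrative.

```python
def supervised_objective(w, v, labeled_data, log_conditional, tau=1.0):
    """CL(theta) = sum_k log P(y^(k) | x^(k)) - nu^T nu / (2 tau^2).

    labeled_data: list of (x, y) pairs from D^l.
    The Gaussian prior penalizes only the edge parameters v; the node
    parameters w carry a uniform prior, as stated in the text.
    """
    cl = sum(log_conditional(w, v, x, y) for x, y in labeled_data)
    cl -= v.dot(v) / (2.0 * tau ** 2)
    return cl
```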
37 There are a few issues regarding the supervised learning criterion (3). [sent-62, score-0.119]
38 Second, the Gaussian prior is data-independent, and is not associated with either the unlabeled or labeled observations a priori. [sent-64, score-0.307]
39 Inspired by the work in [8] and [9], we propose a semi-supervised learning algorithm for DRFs that makes full use of the available data by exploiting a form of entropy regularization as a prior over the parameters on D u . [sent-65, score-0.121]
40 This criterion has been shown to be effective for univariate classification [8], and chain structured CRFs [9]; here we apply it to the 2-D lattice case. [sent-70, score-0.111]
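The exact form of the semi-supervised objective (4) is not reproduced in this excerpt; a hedged sketch, assuming it combines the supervised term (3) with a weighted conditional-entropy term over the unlabeled images, might look as follows (gamma is an assumed trade-off weight, not a value from the paper).

```python
def semi_supervised_objective(w, v, labeled_data, unlabeled_data,
                              log_conditional, conditional_entropy,
                              tau=1.0, gamma=1.0):
    """Labeled conditional log-likelihood with a Gaussian prior on v, minus a
    weighted conditional entropy over unlabeled data (the data-dependent
    prior). Maximizing this prefers parameters that label the unlabeled
    images confidently (low entropy)."""
    cl = sum(log_conditional(w, v, x, y) for x, y in labeled_data)
    cl -= v.dot(v) / (2.0 * tau ** 2)
    h_u = sum(conditional_entropy(w, v, x) for x in unlabeled_data)
    return cl - gamma * h_u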
41 Although criticisms of gradient descent are well taken, it is the most practical approach for optimizing the semi-supervised MAP formulation (4), and it allows us to improve on standard supervised DRF training. [sent-73, score-0.162]
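A plain first-order update loop of the kind referred to here might be sketched as follows; `grad_fn` is assumed to return an (approximate) gradient of objective (4) at theta, and the step size and iteration budget are illustrative. Since the objective is non-concave, only a local optimum is found.

```python
import numpy as np

def gradient_ascent(theta0, grad_fn, step=0.01, n_iters=200, tol=1e-6):
    """Maximize the MAP objective by following its (approximate) gradient."""
    theta = np.array(theta0, dtype=float)
    for _ in range(n_iters):
        new_theta = theta + step * grad_fn(theta)
        if np.linalg.norm(new_theta - theta) < tol:   # converged (locally)
            return new_theta
        theta = new_theta
    return theta
```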
42 To formulate a local optimization procedure, we need to compute the gradient of the objective (4) with respect to the parameters. [sent-74, score-0.081]
43 However, we are not able to represent the gradient of the objective function as compactly as [9], which was able to express the gradient as a product of the covariance matrix of features and the parameter vector θ. [sent-76, score-0.082]
44 Nevertheless, it is straightforward to show that the derivative of the objective function with respect to the node parameters w has an explicit form; the derivatives with respect to the edge parameters ν are computed analogously. [sent-77, score-0.139]
45 Given the lattice structure of the joint labels, it is intractable to compute the exact expectation terms in the above derivatives. [sent-79, score-0.09]
46 It is also intractable to compute the conditional partition function $Z_\theta(x)$. [sent-80, score-0.122]
47 Therefore, as in standard supervised DRFs, we need to incorporate some form of approximation. [sent-81, score-0.119]
48 Assuming this factorization, the true conditional entropy and feature expectations can be computed in terms of local conditional distributions. [sent-83, score-0.333]
49 This allows us to efficiently approximate the global conditional entropy over the unlabeled data. [sent-84, score-0.406]
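A sketch of this factored approximation, using local conditionals P(y_i | y_{N_i}, x) built from the node and edge potentials for binary labels; the data structures mirror the earlier sketch and are assumptions, not the authors' code.

```python
import numpy as np

def local_conditional(i, labels, h, mu, neighbors, w, v):
    """P(y_i | y_{N_i}, x) for y_i in {-1, +1}, holding neighbor labels fixed."""
    scores = {}
    for y_i in (-1, +1):
        s = np.log(1.0 / (1.0 + np.exp(-y_i * w.dot(h[i]))))                 # node term
        s += sum(y_i * labels[j] * v.dot(mu[(i, j)]) for j in neighbors[i])  # edge terms
        scores[y_i] = s
    m = max(scores.values())                      # log-sum-exp shift for stability
    exp_s = {y: np.exp(s - m) for y, s in scores.items()}
    z = sum(exp_s.values())
    return {y: e / z for y, e in exp_s.items()}

def approx_conditional_entropy(labels, h, mu, neighbors, w, v):
    """Approximate the global conditional entropy by summing local entropies."""
    total = 0.0
    for i in labels:
        p = local_conditional(i, labels, h, mu, neighbors, w, v)
        total -= sum(q * np.log(q + 1e-12) for q in p.values())
    return total
```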
50 However, due to the fast and stable performance of this approximation in the supervised case [2, 10], we still employ it, and below we show that the over-smoothing effect is mitigated by our data-dependent prior in the MAP objective (4). [sent-86, score-0.179]
51 That is, for the unlabeled data $X^U$, each time we compute the local conditional covariance (9), we perform inference steps for each node i and its neighboring nodes $N_i$. [sent-88, score-0.457]
52 Our inference is based on iterated conditional modes (ICM) [2], and is given by $y_i^* = \arg\max_{y_i \in Y} P(y_i \mid y_{N_i}, X)$ (10), where, for each position i, we assume that the labels $y_{N_i}$ of all of its neighbors are fixed. [sent-89, score-0.637]
53 We could alternatively compute the marginal conditional probability $P(y_i \mid X) = \sum_{y_{S \setminus i}} P(y_i, y_{S \setminus i} \mid X)$ for each node using the sum-product algorithm (i.e. [sent-90, score-0.157]
54 loopy belief propagation), which iteratively propagates the belief of each node to its neighbors. [sent-92, score-0.122]
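A minimal ICM sketch implementing the update (10), reusing a `local_conditional` function like the one sketched above; the sweep count and convergence test are illustrative choices.

```python
def icm(labels, h, mu, neighbors, w, v, local_conditional, n_sweeps=10):
    """Iterated conditional modes: repeatedly set each y_i to the label that
    maximizes P(y_i | y_{N_i}, X) with its neighbors' labels held fixed."""
    for _ in range(n_sweeps):
        changed = False
        for i in labels:
            p = local_conditional(i, labels, h, mu, neighbors, w, v)
            best = max(p, key=p.get)
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:   # no label changed during a full sweep: local maximum
            break
    return labels
```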
55 5 Experiments Using standard supervised DRF models, Kumar and Hebert [11, 10] reported interesting experimental results for joint classification tasks on a 2-D lattice, which represents an image with a DRF model. [sent-95, score-0.205]
56 Since labeling image data is expensive and tedious, we believe that better results can be obtained by formulating MAP estimation of DRFs that also uses the abundant unlabeled image data. [sent-96, score-0.356]
57 In this section, we present a series of experiments on synthetic and real data sets using our novel semi-supervised DRFs (SSDRFs). [sent-97, score-0.076]
58 In order to evaluate our model, we compare the results with those using maximum likelihood estimation of supervised DRFs [11]. [sent-98, score-0.119]
59 Although there are many accuracy measures available, we used the Jaccard score to penalize false negatives, since many imaging tasks are very imbalanced: that is, only a small percentage of pixels are in the “positive” class. [sent-101, score-0.111]
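The paper's exact score definition is not reproduced in this excerpt; a standard Jaccard score over binary masks, TP / (TP + FP + FN), consistent with the description above, can be computed as follows.

```python
import numpy as np

def jaccard_score(pred, truth):
    """Jaccard = TP / (TP + FP + FN) for binary masks; heavily penalizes
    missed positives when the positive class (e.g. tumor pixels) is rare."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.logical_and(pred, truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    denom = tp + fp + fn
    return float(tp) / denom if denom > 0 else 1.0
```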
60 5.1 Synthetic image sets Our primary goal in using synthetic data sets was to demonstrate how well different models perform binary pixel classification over a 2-D lattice in the presence of noise. [sent-105, score-0.28]
61 The intensities of pixels in each image were independently corrupted by noise generated from a Gaussian N (0, 1). [sent-107, score-0.122]
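A toy version of this corruption process is sketched below; the actual target shapes used in the synthetic sets are not specified in this excerpt, so a centered square is used as a stand-in.

```python
import numpy as np

def make_noisy_image(size=64, square=24, sigma=1.0, seed=0):
    """Binary label field (+1 square on a -1 background) observed through
    independent additive N(0, sigma^2) noise on the pixel intensities."""
    rng = np.random.default_rng(seed)
    labels = -np.ones((size, size), dtype=int)
    lo, hi = (size - square) // 2, (size + square) // 2
    labels[lo:hi, lo:hi] = 1
    observed = labels + sigma * rng.standard_normal((size, size))
    return observed, labels

x_obs, y_true = make_noisy_image()   # one noisy image and its ground-truth labels
```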
62 Figure 1 shows the results of using supervised DRFs, as well as semi-supervised DRFs. [sent-108, score-0.119]
63 However, using our semi-supervised DRF as a MAP formulation, we have dramatically improved performance over the standard supervised DRF. [sent-110, score-0.119]
64 Although the ML approach may learn proper parameters from some of the data sets, its performance has unfortunately not been consistent, since standard DRF training tends to overestimate the edge potential. [sent-112, score-0.076]
65 For instance, the last row shows that the DRF's overestimated parameters cause it to segment almost all pixels into a single class, because of the complicated edges and structures that place non-target regions within the target area, while the semi-supervised DRF's performance is not degraded at all. [sent-113, score-0.066]
66 Overall, by learning more statistics from unlabeled data, our model dominates the standard DRF in most cases. [sent-114, score-0.217]
67 This is because our MAP formulation avoids overestimating the potentials and uses the edge potential to correct the errors made by the node potential. [sent-115, score-0.155]
68 Note that our model converged stably as we increased the ratio (nU/nL) of unlabeled to labeled data sets used in learning. [sent-118, score-0.239]
69 5.2 Brain Tumor Segmentation We have applied our semi-supervised DRF model to the challenging real-world problem of segmenting tumors in medical images. [sent-142, score-0.213]
70 Our goal here is to classify each pixel of a magnetic resonance (MR) image into a pre-defined category: tumor or non-tumor. [sent-143, score-0.367]
71 We applied three models to the classification of 9 studies from brain tumor MR images. [sent-145, score-0.167]
72 For each study², i, we divided the MR images into $D_i^L$, $D_i^U$, and $D_i^S$, where an MR image (a. [sent-146, score-0.089]
73 As with much of the related work on automatic brain tumor segmentation (such as [7, 21]), our training is based on patient-specific data, where training MR images for a classifier are obtained from the patient to be tested. [sent-150, score-0.374]
74 Note that the training sets and testing sets for a classifier are disjoint. [sent-151, score-0.123]
75 L U S Specifically, LR and DRF takes Di as the training set and Di and Di for testing sets, while SSDRF L U U S takes Di and Di for training and Di and Di for testing. [sent-152, score-0.119]
76 We segmented the “enhancing” tumor area, the region that appears hyper-intense after injecting the contrast agent (we also included non-enhancing areas contained within the enhancing contour). [sent-153, score-0.198]
77 Tables 1 and 2 present Jaccard scores on the test sets $D_i^U$ and $D_i^S$ for each study $p_i$, respectively. [sent-154, score-0.067]
78 While the standard supervised DRF improves over its degenerate model LR by 1%, semi-supervised DRF significantly improves over the supervised DRF by 11%, which is significant at p < 0. [sent-155, score-0.238]
79 Considering the fact that MR images contain much noise and the three modalities are not consistent among slices of the same patient, our improvement is considerable. [sent-157, score-0.098]
80 Figure 3 shows the segmentation results by overlaying the testing slices with segmented outputs from the three models. [sent-158, score-0.153]
81 Each row demonstrates the segmentation for a slice, where the white blob areas for the slice correspond to the enhancing tumor area. [sent-159, score-0.274]
82 6 Conclusion We have proposed a new semi-supervised learning algorithm for DRFs, which was formulated as MAP estimation with conditional entropy over unlabeled data as a data-dependent prior regularization. [sent-160, score-0.43]
83 Our approach is motivated by the information-theoretic argument [8, 16] that unlabeled examples can provide the most benefit when classes have small overlap. [sent-161, score-0.217]
84 We introduced a simple approximation approach for this new learning procedure that exploits the local conditional probability to efficiently compute the derivative of the objective function. [sent-162, score-0.199]
85 2 Each study involves a number (typically 21) of images of a single patient – here parallel axial slices through the head. [sent-163, score-0.111]
86 Figure 3: From Left to Right: Human Expert, LR, DRF, and SSDRF. We have applied this new approach to image pixel classification tasks. [sent-226, score-0.116]
87 By exploiting the availability of auxiliary unlabeled data, we are able to improve the performance of the state-of-the-art supervised DRF approach. [sent-227, score-0.336]
88 Our semi-supervised DRF approach shares all of the benefits of the standard DRF training, including the ability to exploit arbitrary potentials in the presence of dependency cycles, while improving accuracy through the use of the unlabeled data. [sent-228, score-0.325]
89 The main drawback is the increased training time involved in computing the derivative of the conditional entropy over unlabeled data. [sent-229, score-0.465]
90 Nevertheless, the algorithm is efficient enough to be trained on unlabeled data sets, and it obtains a significant improvement in classification accuracy over standard supervised training of DRFs as well as over iid logistic regression classifiers. [sent-230, score-0.421]
91 To further improve accuracy, we may apply loopy belief propagation [20] or graph-cuts [4] as the inference tool. [sent-231, score-0.089]
92 Since our model is tightly coupled with inference steps during learning, the proper choice of inference algorithm will most likely further improve segmentation performance. [sent-232, score-0.122]
93 Kernel based method for segmentation and modeling of magnetic resonance images. [sent-279, score-0.172]
94 Semi-supervised conditional random fields for improved sequence segmentation and labeling. [sent-291, score-0.176]
95 Discriminative random fields: A discriminative framework for contextual interaction in classification. [sent-301, score-0.074]
96 Efficient spatial classification using decoupled conditional ı random fields. [sent-313, score-0.15]
97 Text classification from labeled and unlabeled documents using EM. [sent-320, score-0.283]
98 Accelerated training of conditional random fields with stochastic gradient methods. [sent-350, score-0.185]
99 Tumor segmentation from magnetic resonance imaging by learning via one-class support vector machine. [sent-364, score-0.172]
100 Learning from labeled and unlabeled data on a directed o graph. [sent-379, score-0.283]
wordName wordTfidf (topN-words)
[('drf', 0.71), ('drfs', 0.291), ('yi', 0.224), ('unlabeled', 0.217), ('yni', 0.158), ('ssdrf', 0.146), ('tumor', 0.133), ('hi', 0.132), ('conditional', 0.122), ('supervised', 0.119), ('di', 0.098), ('jaccard', 0.079), ('greiner', 0.073), ('yj', 0.072), ('entropy', 0.067), ('lr', 0.067), ('pixels', 0.066), ('labeled', 0.066), ('mr', 0.064), ('lattice', 0.06), ('resonance', 0.06), ('pixel', 0.06), ('magnetic', 0.058), ('image', 0.056), ('ni', 0.056), ('labellings', 0.055), ('rlp', 0.055), ('segmentation', 0.054), ('synthetic', 0.054), ('hx', 0.054), ('elds', 0.051), ('alberta', 0.051), ('kumar', 0.048), ('potentials', 0.048), ('classi', 0.047), ('discriminative', 0.047), ('slice', 0.044), ('enhancing', 0.043), ('patient', 0.04), ('training', 0.04), ('testing', 0.039), ('mrfs', 0.038), ('slices', 0.038), ('icm', 0.036), ('ssdrfs', 0.036), ('objective', 0.036), ('dependency', 0.036), ('node', 0.035), ('inference', 0.034), ('brain', 0.034), ('wt', 0.034), ('map', 0.034), ('labels', 0.033), ('xx', 0.033), ('images', 0.033), ('challenging', 0.033), ('belief', 0.032), ('correlations', 0.032), ('edge', 0.032), ('jiao', 0.032), ('certainty', 0.032), ('joint', 0.03), ('regularization', 0.03), ('structured', 0.03), ('ys', 0.029), ('scores', 0.028), ('spatial', 0.028), ('neighboring', 0.027), ('labeling', 0.027), ('modalities', 0.027), ('contextual', 0.027), ('tp', 0.027), ('segmenting', 0.027), ('eld', 0.026), ('nips', 0.026), ('dale', 0.025), ('pl', 0.025), ('accuracy', 0.024), ('prior', 0.024), ('cycles', 0.024), ('unfortunately', 0.024), ('predictors', 0.023), ('vishwanathan', 0.023), ('factored', 0.023), ('loopy', 0.023), ('gradient', 0.023), ('sets', 0.022), ('segmented', 0.022), ('local', 0.022), ('univariate', 0.021), ('logistic', 0.021), ('false', 0.021), ('potential', 0.02), ('formulation', 0.02), ('held', 0.02), ('positives', 0.02), ('medical', 0.02), ('icml', 0.019), ('zhou', 0.019), ('derivative', 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 118 nips-2006-Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields
Author: Chi-hoon Lee, Shaojun Wang, Feng Jiao, Dale Schuurmans, Russell Greiner
Abstract: We present a novel, semi-supervised approach to training discriminative random fields (DRFs) that efficiently exploits labeled and unlabeled training data to achieve improved accuracy in a variety of image processing tasks. We formulate DRF training as a form of MAP estimation that combines conditional loglikelihood on labeled data, given a data-dependent prior, with a conditional entropy regularizer defined on unlabeled data. Although the training objective is no longer concave, we develop an efficient local optimization procedure that produces classifiers that are more accurate than ones based on standard supervised DRF training. We then apply our semi-supervised approach to train DRFs to segment both synthetic and real data sets, and demonstrate significant improvements over supervised DRFs in each case.
2 0.16770935 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
Author: Dong S. Cheng, Vittorio Murino, Mário Figueiredo
Abstract: This paper proposes a new approach to model-based clustering under prior knowledge. The proposed formulation can be interpreted from two different angles: as penalized logistic regression, where the class labels are only indirectly observed (via the probability density of each class); as finite mixture learning under a grouping prior. To estimate the parameters of the proposed model, we derive a (generalized) EM algorithm with a closed-form E-step, in contrast with other recent approaches to semi-supervised probabilistic clustering which require Gibbs sampling or suboptimal shortcuts. We show that our approach is ideally suited for image segmentation: it avoids the combinatorial nature Markov random field priors, and opens the door to more sophisticated spatial priors (e.g., wavelet-based) in a simple and computationally efficient way. Finally, we extend our formulation to work in unsupervised, semi-supervised, or discriminative modes. 1
3 0.13810469 48 nips-2006-Branch and Bound for Semi-Supervised Support Vector Machines
Author: Olivier Chapelle, Vikas Sindhwani, S. S. Keerthi
Abstract: Semi-supervised SVMs (S3 VM) attempt to learn low-density separators by maximizing the margin over labeled and unlabeled examples. The associated optimization problem is non-convex. To examine the full potential of S3 VMs modulo local minima problems in current implementations, we apply branch and bound techniques for obtaining exact, globally optimal solutions. Empirical evidence suggests that the globally optimal solution can return excellent generalization performance in situations where other implementations fail completely. While our current implementation is only applicable to small datasets, we discuss variants that can potentially lead to practically useful algorithms. 1
4 0.11231327 150 nips-2006-On Transductive Regression
Author: Corinna Cortes, Mehryar Mohri
Abstract: In many modern large-scale learning applications, the amount of unlabeled data far exceeds that of labeled data. A common instance of this problem is the transductive setting where the unlabeled test points are known to the learning algorithm. This paper presents a study of regression problems in that setting. It presents explicit VC-dimension error bounds for transductive regression that hold for all bounded loss functions and coincide with the tight classification bounds of Vapnik when applied to classification. It also presents a new transductive regression algorithm inspired by our bound that admits a primal and kernelized closedform solution and deals efficiently with large amounts of unlabeled data. The algorithm exploits the position of unlabeled points to locally estimate their labels and then uses a global optimization to ensure robust predictions. Our study also includes the results of experiments with several publicly available regression data sets with up to 20,000 unlabeled examples. The comparison with other transductive regression algorithms shows that it performs well and that it can scale to large data sets.
5 0.10168236 104 nips-2006-Large-Scale Sparsified Manifold Regularization
Author: Ivor W. Tsang, James T. Kwok
Abstract: Semi-supervised learning is more powerful than supervised learning by using both labeled and unlabeled data. In particular, the manifold regularization framework, together with kernel methods, leads to the Laplacian SVM (LapSVM) that has demonstrated state-of-the-art performance. However, the LapSVM solution typically involves kernel expansions of all the labeled and unlabeled examples, and is slow on testing. Moreover, existing semi-supervised learning methods, including the LapSVM, can only handle a small number of unlabeled examples. In this paper, we integrate manifold regularization with the core vector machine, which has been used for large-scale supervised and unsupervised learning. By using a sparsified manifold regularizer and formulating as a center-constrained minimum enclosing ball problem, the proposed method produces sparse solutions with low time and space complexities. Experimental results show that it is much faster than the LapSVM, and can handle a million unlabeled examples on a standard PC; while the LapSVM can only handle several thousand patterns. 1
6 0.10089382 195 nips-2006-Training Conditional Random Fields for Maximum Labelwise Accuracy
7 0.090748943 175 nips-2006-Simplifying Mixture Models through Function Approximation
8 0.077111334 54 nips-2006-Comparative Gene Prediction using Conditional Random Fields
9 0.075549647 74 nips-2006-Efficient Structure Learning of Markov Networks using $L 1$-Regularization
10 0.074276082 130 nips-2006-Max-margin classification of incomplete data
11 0.061376326 20 nips-2006-Active learning for misspecified generalized linear models
12 0.061151322 57 nips-2006-Conditional mean field
13 0.057167482 93 nips-2006-Hyperparameter Learning for Graph Based Semi-supervised Learning Algorithms
14 0.056529537 201 nips-2006-Using Combinatorial Optimization within Max-Product Belief Propagation
15 0.056326665 126 nips-2006-Logistic Regression for Single Trial EEG Classification
16 0.056109305 186 nips-2006-Support Vector Machines on a Budget
17 0.053920869 84 nips-2006-Generalized Regularized Least-Squares Learning with Predefined Features in a Hilbert Space
18 0.053539943 7 nips-2006-A Local Learning Approach for Clustering
19 0.051657688 42 nips-2006-Bayesian Image Super-resolution, Continued
20 0.051021136 28 nips-2006-An Efficient Method for Gradient-Based Adaptation of Hyperparameters in SVM Models
topicId topicWeight
[(0, -0.197), (1, 0.058), (2, 0.062), (3, -0.031), (4, -0.008), (5, 0.095), (6, -0.082), (7, -0.014), (8, -0.012), (9, -0.007), (10, -0.077), (11, -0.129), (12, -0.029), (13, 0.096), (14, -0.015), (15, 0.167), (16, 0.17), (17, -0.032), (18, 0.0), (19, 0.041), (20, 0.002), (21, -0.136), (22, -0.018), (23, 0.101), (24, 0.072), (25, 0.082), (26, 0.068), (27, -0.032), (28, -0.06), (29, -0.013), (30, -0.17), (31, 0.033), (32, -0.057), (33, 0.001), (34, 0.034), (35, 0.087), (36, -0.033), (37, -0.059), (38, 0.021), (39, 0.046), (40, -0.093), (41, -0.025), (42, 0.03), (43, -0.017), (44, -0.003), (45, 0.024), (46, -0.072), (47, 0.036), (48, -0.013), (49, -0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.93162179 118 nips-2006-Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields
Author: Chi-hoon Lee, Shaojun Wang, Feng Jiao, Dale Schuurmans, Russell Greiner
Abstract: We present a novel, semi-supervised approach to training discriminative random fields (DRFs) that efficiently exploits labeled and unlabeled training data to achieve improved accuracy in a variety of image processing tasks. We formulate DRF training as a form of MAP estimation that combines conditional loglikelihood on labeled data, given a data-dependent prior, with a conditional entropy regularizer defined on unlabeled data. Although the training objective is no longer concave, we develop an efficient local optimization procedure that produces classifiers that are more accurate than ones based on standard supervised DRF training. We then apply our semi-supervised approach to train DRFs to segment both synthetic and real data sets, and demonstrate significant improvements over supervised DRFs in each case.
2 0.70498645 48 nips-2006-Branch and Bound for Semi-Supervised Support Vector Machines
Author: Olivier Chapelle, Vikas Sindhwani, S. S. Keerthi
Abstract: Semi-supervised SVMs (S3 VM) attempt to learn low-density separators by maximizing the margin over labeled and unlabeled examples. The associated optimization problem is non-convex. To examine the full potential of S3 VMs modulo local minima problems in current implementations, we apply branch and bound techniques for obtaining exact, globally optimal solutions. Empirical evidence suggests that the globally optimal solution can return excellent generalization performance in situations where other implementations fail completely. While our current implementation is only applicable to small datasets, we discuss variants that can potentially lead to practically useful algorithms. 1
3 0.64072359 104 nips-2006-Large-Scale Sparsified Manifold Regularization
Author: Ivor W. Tsang, James T. Kwok
Abstract: Semi-supervised learning is more powerful than supervised learning by using both labeled and unlabeled data. In particular, the manifold regularization framework, together with kernel methods, leads to the Laplacian SVM (LapSVM) that has demonstrated state-of-the-art performance. However, the LapSVM solution typically involves kernel expansions of all the labeled and unlabeled examples, and is slow on testing. Moreover, existing semi-supervised learning methods, including the LapSVM, can only handle a small number of unlabeled examples. In this paper, we integrate manifold regularization with the core vector machine, which has been used for large-scale supervised and unsupervised learning. By using a sparsified manifold regularizer and formulating as a center-constrained minimum enclosing ball problem, the proposed method produces sparse solutions with low time and space complexities. Experimental results show that it is much faster than the LapSVM, and can handle a million unlabeled examples on a standard PC; while the LapSVM can only handle several thousand patterns. 1
4 0.63980597 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
Author: Dong S. Cheng, Vittorio Murino, Mário Figueiredo
Abstract: This paper proposes a new approach to model-based clustering under prior knowledge. The proposed formulation can be interpreted from two different angles: as penalized logistic regression, where the class labels are only indirectly observed (via the probability density of each class); as finite mixture learning under a grouping prior. To estimate the parameters of the proposed model, we derive a (generalized) EM algorithm with a closed-form E-step, in contrast with other recent approaches to semi-supervised probabilistic clustering which require Gibbs sampling or suboptimal shortcuts. We show that our approach is ideally suited for image segmentation: it avoids the combinatorial nature Markov random field priors, and opens the door to more sophisticated spatial priors (e.g., wavelet-based) in a simple and computationally efficient way. Finally, we extend our formulation to work in unsupervised, semi-supervised, or discriminative modes. 1
5 0.63952786 150 nips-2006-On Transductive Regression
Author: Corinna Cortes, Mehryar Mohri
Abstract: In many modern large-scale learning applications, the amount of unlabeled data far exceeds that of labeled data. A common instance of this problem is the transductive setting where the unlabeled test points are known to the learning algorithm. This paper presents a study of regression problems in that setting. It presents explicit VC-dimension error bounds for transductive regression that hold for all bounded loss functions and coincide with the tight classification bounds of Vapnik when applied to classification. It also presents a new transductive regression algorithm inspired by our bound that admits a primal and kernelized closedform solution and deals efficiently with large amounts of unlabeled data. The algorithm exploits the position of unlabeled points to locally estimate their labels and then uses a global optimization to ensure robust predictions. Our study also includes the results of experiments with several publicly available regression data sets with up to 20,000 unlabeled examples. The comparison with other transductive regression algorithms shows that it performs well and that it can scale to large data sets.
6 0.58141583 101 nips-2006-Isotonic Conditional Random Fields and Local Sentiment Flow
7 0.49688241 93 nips-2006-Hyperparameter Learning for Graph Based Semi-supervised Learning Algorithms
8 0.49367324 68 nips-2006-Dirichlet-Enhanced Spam Filtering based on Biased Samples
9 0.48732179 136 nips-2006-Multi-Instance Multi-Label Learning with Application to Scene Classification
10 0.44443473 195 nips-2006-Training Conditional Random Fields for Maximum Labelwise Accuracy
11 0.42169559 54 nips-2006-Comparative Gene Prediction using Conditional Random Fields
12 0.41470954 122 nips-2006-Learning to parse images of articulated bodies
13 0.3949573 28 nips-2006-An Efficient Method for Gradient-Based Adaptation of Hyperparameters in SVM Models
14 0.3916679 130 nips-2006-Max-margin classification of incomplete data
15 0.38357636 182 nips-2006-Statistical Modeling of Images with Fields of Gaussian Scale Mixtures
16 0.37043765 129 nips-2006-Map-Reduce for Machine Learning on Multicore
17 0.36874774 178 nips-2006-Sparse Multinomial Logistic Regression via Bayesian L1 Regularisation
18 0.36834601 74 nips-2006-Efficient Structure Learning of Markov Networks using $L 1$-Regularization
19 0.35992843 20 nips-2006-Active learning for misspecified generalized linear models
20 0.35824656 199 nips-2006-Unsupervised Learning of a Probabilistic Grammar for Object Detection and Parsing
topicId topicWeight
[(1, 0.092), (3, 0.043), (7, 0.077), (9, 0.038), (22, 0.061), (34, 0.018), (44, 0.078), (48, 0.254), (57, 0.105), (64, 0.013), (65, 0.041), (69, 0.052), (71, 0.013), (90, 0.023)]
simIndex simValue paperId paperTitle
1 0.8300584 15 nips-2006-A Switched Gaussian Process for Estimating Disparity and Segmentation in Binocular Stereo
Author: Oliver Williams
Abstract: This paper describes a Gaussian process framework for inferring pixel-wise disparity and bi-layer segmentation of a scene given a stereo pair of images. The Gaussian process covariance is parameterized by a foreground-backgroundocclusion segmentation label to model both smooth regions and discontinuities. As such, we call our model a switched Gaussian process. We propose a greedy incremental algorithm for adding observations from the data and assigning segmentation labels. Two observation schedules are proposed: the first treats scanlines as independent, the second uses an active learning criterion to select a sparse subset of points to measure. We show that this probabilistic framework has comparable performance to the state-of-the-art. 1
same-paper 2 0.76038975 118 nips-2006-Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields
Author: Chi-hoon Lee, Shaojun Wang, Feng Jiao, Dale Schuurmans, Russell Greiner
Abstract: We present a novel, semi-supervised approach to training discriminative random fields (DRFs) that efficiently exploits labeled and unlabeled training data to achieve improved accuracy in a variety of image processing tasks. We formulate DRF training as a form of MAP estimation that combines conditional loglikelihood on labeled data, given a data-dependent prior, with a conditional entropy regularizer defined on unlabeled data. Although the training objective is no longer concave, we develop an efficient local optimization procedure that produces classifiers that are more accurate than ones based on standard supervised DRF training. We then apply our semi-supervised approach to train DRFs to segment both synthetic and real data sets, and demonstrate significant improvements over supervised DRFs in each case.
3 0.6087566 34 nips-2006-Approximate Correspondences in High Dimensions
Author: Kristen Grauman, Trevor Darrell
Abstract: Pyramid intersection is an efficient method for computing an approximate partial matching between two sets of feature vectors. We introduce a novel pyramid embedding based on a hierarchy of non-uniformly shaped bins that takes advantage of the underlying structure of the feature space and remains accurate even for sets with high-dimensional feature vectors. The matching similarity is computed in linear time and forms a Mercer kernel. Whereas previous matching approximation algorithms suffer from distortion factors that increase linearly with the feature dimension, we demonstrate that our approach can maintain constant accuracy even as the feature dimension increases. When used as a kernel in a discriminative classifier, our approach achieves improved object recognition results over a state-of-the-art set kernel. 1
4 0.60166776 8 nips-2006-A Nonparametric Approach to Bottom-Up Visual Saliency
Author: Wolf Kienzle, Felix A. Wichmann, Matthias O. Franz, Bernhard Schölkopf
Abstract: This paper addresses the bottom-up influence of local image information on human eye movements. Most existing computational models use a set of biologically plausible linear filters, e.g., Gabor or Difference-of-Gaussians filters as a front-end, the outputs of which are nonlinearly combined into a real number that indicates visual saliency. Unfortunately, this requires many design parameters such as the number, type, and size of the front-end filters, as well as the choice of nonlinearities, weighting and normalization schemes etc., for which biological plausibility cannot always be justified. As a result, these parameters have to be chosen in a more or less ad hoc way. Here, we propose to learn a visual saliency model directly from human eye movement data. The model is rather simplistic and essentially parameter-free, and therefore contrasts recent developments in the field that usually aim at higher prediction rates at the cost of additional parameters and increasing model complexity. Experimental results show that—despite the lack of any biological prior knowledge—our model performs comparably to existing approaches, and in fact learns image features that resemble findings from several previous studies. In particular, its maximally excitatory stimuli have center-surround structure, similar to receptive fields in the early human visual system. 1
5 0.6011923 112 nips-2006-Learning Nonparametric Models for Probabilistic Imitation
Author: David B. Grimes, Daniel R. Rashid, Rajesh P. Rao
Abstract: Learning by imitation represents an important mechanism for rapid acquisition of new behaviors in humans and robots. A critical requirement for learning by imitation is the ability to handle uncertainty arising from the observation process as well as the imitator’s own dynamics and interactions with the environment. In this paper, we present a new probabilistic method for inferring imitative actions that takes into account both the observations of the teacher as well as the imitator’s dynamics. Our key contribution is a nonparametric learning method which generalizes to systems with very different dynamics. Rather than relying on a known forward model of the dynamics, our approach learns a nonparametric forward model via exploration. Leveraging advances in approximate inference in graphical models, we show how the learned forward model can be directly used to plan an imitating sequence. We provide experimental results for two systems: a biomechanical model of the human arm and a 25-degrees-of-freedom humanoid robot. We demonstrate that the proposed method can be used to learn appropriate motor inputs to the model arm which imitates the desired movements. A second set of results demonstrates dynamically stable full-body imitation of a human teacher by the humanoid robot. 1
6 0.60059506 3 nips-2006-A Complexity-Distortion Approach to Joint Pattern Alignment
7 0.59910429 175 nips-2006-Simplifying Mixture Models through Function Approximation
8 0.5983476 160 nips-2006-Part-based Probabilistic Point Matching using Equivalence Constraints
9 0.59825248 119 nips-2006-Learning to Rank with Nonsmooth Cost Functions
10 0.59726411 32 nips-2006-Analysis of Empirical Bayesian Methods for Neuroelectromagnetic Source Localization
11 0.59621614 43 nips-2006-Bayesian Model Scoring in Markov Random Fields
12 0.59324223 167 nips-2006-Recursive ICA
13 0.59175581 72 nips-2006-Efficient Learning of Sparse Representations with an Energy-Based Model
14 0.59175348 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming
15 0.59171164 188 nips-2006-Temporal and Cross-Subject Probabilistic Models for fMRI Prediction Tasks
16 0.59088808 20 nips-2006-Active learning for misspecified generalized linear models
17 0.5903073 110 nips-2006-Learning Dense 3D Correspondence
18 0.5900619 106 nips-2006-Large Margin Hidden Markov Models for Automatic Speech Recognition
19 0.58941495 101 nips-2006-Isotonic Conditional Random Fields and Local Sentiment Flow
20 0.58881062 65 nips-2006-Denoising and Dimension Reduction in Feature Space