jmlr jmlr2009 jmlr2009-99 knowledge-graph by maker-knowledge-mining

99 jmlr-2009-Using Local Dependencies within Batches to Improve Large Margin Classifiers


Source: pdf

Author: Volkan Vural, Glenn Fung, Balaji Krishnapuram, Jennifer G. Dy, Bharat Rao

Abstract: Most classification methods assume that the samples are drawn independently and identically from an unknown data generating distribution, yet this assumption is violated in several real life problems. In order to relax this assumption, we consider the case where batches or groups of samples may have internal correlations, whereas the samples from different batches may be considered to be uncorrelated. This paper introduces three algorithms for classifying all the samples in a batch jointly: one based on a probabilistic formulation, and two based on mathematical programming analysis. Experiments on three real-life computer aided diagnosis (CAD) problems demonstrate that the proposed algorithms are significantly more accurate than a naive support vector machine which ignores the correlations among the samples. Keywords: batch-wise classification, support vector machine, linear programming, machine learning, statistical methods, unconstrained optimization

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 In order to relax this assumption, we consider the case where batches or groups of samples may have internal correlations, whereas the samples from different batches may be considered to be uncorrelated. [sent-13, score-0.268]

2 This paper introduces three algorithms for classifying all the samples in a batch jointly: one based on a probabilistic formulation, and two based on mathematical programming analysis. [sent-14, score-0.232]

3 Experiments on three real-life computer aided diagnosis (CAD) problems demonstrate that the proposed algorithms are significantly more accurate than a naive support vector machine which ignores the correlations among the samples. [sent-15, score-0.162]

4 The candidates are shown as five circles for the left embolus & six circles for the right embolus. [sent-22, score-0.12]

5 The disease status of spatially overlapping or proximate candidates is highly correlated. [sent-23, score-0.12]

6 A sample problem with the characteristics described above is computer aided diagnosis (CAD), where the goal is to detect structures of interest to physicians in medical images: for example, to identify potentially malignant tumors in computed tomography (CT) scans, X-ray images, etc. [sent-25, score-0.164]

7 Figure 1 displays a CT image of a lung showing circular marks that point to potentially diseased candidate regions detected by a CAD algorithm. [sent-29, score-0.192]

8 There are five candidates on the left and six candidates on the right (marked by circles) in Figure 1. [sent-30, score-0.24]

9 Descriptive features are extracted for each candidate and each candidate region is classified as normal or abnormal. [sent-31, score-0.125]

10 In this setting, correlations exist among the candidates belonging to the same batch (image, in the case of CAD) both in the training data set and in the unseen testing data. [sent-32, score-0.456]

11 Further, the level of correlation is a function of the pairwise distance between candidates: the disease status (class-label) of a candidate is highly correlated with the status of other spatially proximate candidates, but the correlations decrease as the distance is increased. [sent-33, score-0.16]

12 Most conventional CAD algorithms classify one candidate at a time, ignoring the correlations amongst the candidates in an image. [sent-34, score-0.278]

13 We use the term batch-wise classification to mean training and testing classifiers in batches or groups (i. [sent-37, score-0.158]

14 , samples in the same batch are learned and classified collectively). [sent-39, score-0.293]

15 Our experiments on three real CAD problems show that indeed incorporating the local dependencies among samples within a batch improves the classification accuracy significantly compared to standard SVMs. [sent-40, score-0.232]

16 In the CAD example, a batch corresponds to an image and strong correlations among candidate samples are based on spatial proximity within an image. [sent-43, score-0.454]

17 A similar relationship structure holds for geographical classification of regions in satellite images: an image would be a batch, and spatial proximity corresponds to correlations among the pixel instances. [sent-44, score-0.372]

18 In other contexts, a batch may correspond to data from the same hospital, the patients treated by the same doctor or nurse, etc. [sent-45, score-0.223]

19 In non-medical contexts, a batch can be transactions from the same person, or documents from the same author. [sent-46, score-0.201]

20 In this credit card example, it might be preferable to test the recent transactions of the same person collectively as a batch rather than singly. [sent-48, score-0.201]

21 In our methods, on the other hand, we investigate the relations among the data points that are in the same batch only. [sent-78, score-0.201]

22 2 Organization of the Paper After briefly describing the notation in Section 2, we build two SVM based methods for classifying batches of correlated samples in Section 3. [sent-81, score-0.134]

23 1 Standard Support Vector Machine. In a standard SVM hyperplane classifier, f(x) = x′w − γ is learned from the training instances individually, ignoring the correlations among them. [sent-101, score-0.203]
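
As a point of reference for what the batch methods change, here is a minimal sketch (hypothetical function names, NumPy assumed) of applying such a learned hyperplane (w, γ) sample by sample, with no use of batch information:

```python
import numpy as np

def svm_decision(X, w, gamma):
    """Decision values f(x) = x'w - gamma, one row of X per sample."""
    return X @ w - gamma

def svm_predict(X, w, gamma):
    """Predicted labels in {-1, +1}; each sample is classified on its own,
    with no use of batch information."""
    return np.sign(svm_decision(X, w, gamma))
```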

24 , data point) depends on two factors: (a) the features that encode information about each candidate sample in a batch; and (b) additional side-information—not already fully encoded in the features—that is available to describe the level of correlations among the set of samples within a batch. [sent-115, score-0.185]

25 As mentioned above, in CAD applications, R j can be defined in terms of Sj , the matrix of Euclidean distances between candidates inside a medical image (from a patient) as: 1. [sent-117, score-0.224]

26 We define a batch to be a set of possibly correlated samples that occur together naturally, for example, the set of all candidate locations from the image of the same patient. [sent-118, score-0.36]

27 Accordingly, B_j ∈ R^(m_j × n) represents the m_j training points that belong to the jth batch, and the labels of these training points are represented by the m_j × m_j diagonal matrix D_j with positive or negative ones along its diagonal. [sent-122, score-0.291]
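
To make the notation concrete, a small illustrative sketch (names are hypothetical, not the authors' code) that assembles B_j and D_j for one batch from its feature rows and ±1 labels:

```python
import numpy as np

def batch_matrices(X_batch, y_batch):
    """Assemble B_j (the m_j x n feature matrix of one batch) and D_j
    (the m_j x m_j diagonal matrix of its +/-1 labels)."""
    B_j = np.asarray(X_batch, dtype=float)           # m_j x n
    D_j = np.diag(np.asarray(y_batch, dtype=float))  # labels assumed in {-1, +1}
    return B_j, D_j
```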

28 In Equation (3), the class membership prediction for any single sample in batch j is a weighted average of the batch members’ prediction vector B j w − eγ, and the weighting coefficients depend on the pairwise similarity between samples. [sent-140, score-0.425]
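
Equation (3) itself is not reproduced in this summary; the sketch below only illustrates the idea, under the assumption that the weighting coefficients are row-normalized similarities taken from R_j (names are illustrative):

```python
import numpy as np

def batch_prediction(B_j, R_j, w, gamma):
    """Similarity-weighted batch prediction: each sample's score is an
    average of the raw predictions B_j w - gamma of all samples in the
    batch, weighted by similarities from R_j (row-normalization is our
    assumption, not necessarily the paper's exact weighting)."""
    raw = B_j @ w - gamma                            # B_j w - e*gamma
    weights = R_j / R_j.sum(axis=1, keepdims=True)   # row-normalized similarities
    return weights @ raw
```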

29 We refer to f(x) as a pre-classifier that will be used in the next stage to make the final decision on a batch of instances. [sent-150, score-0.201]

30 While testing an arbitrary data point x_i in batch B_j, the BatchSVM algorithm accounts for the pre-classifier prediction w′x_p − γ for every member in the batch. [sent-151, score-0.223]

31 Consider the two-dimensional example in Figure 2, showing batches of training points. [sent-153, score-0.136]

32 The data points that belong to the same batch are indicated by the elliptical boundaries in the figure. [sent-154, score-0.201]

33 Figure 2b displays the correlations amongst the training points given in Figure 2a, each indicated by an edge. [sent-155, score-0.14]

34 Figure 2: An illustrative example for batch learning. [sent-208, score-0.201]

35 In Figure 2c, the hyperplane f_svm(x) is the final decision function for the standard SVM and gives the results displayed in Table 1, where we observe that the fourth and seventh instances are misclassified. [sent-215, score-0.135]

36 In Figure 2d, the pre-classifier produced by BatchSVM, f_batch(x), gives the results displayed in the fifth column of Table 1 for the training data. [sent-216, score-0.292]

37 To yield a faster training method, we present a batch approach to the proximal support vector machine in this section. [sent-257, score-0.301]
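
The proximal SVM referred to here is the formulation of Fung and Mangasarian, whose training reduces to a single linear system; the batch variants in the paper build on it. Below is a minimal sketch of the standard (non-batch) version with illustrative names, not the authors' code:

```python
import numpy as np

def psvm_train(A, y, nu):
    """Standard proximal SVM (Fung & Mangasarian): minimize
    (nu/2)*||D(Aw - e*gamma) - e||^2 + (1/2)*(||w||^2 + gamma^2),
    which reduces to one linear system.  A is m x n, y holds +/-1 labels."""
    m, n = A.shape
    y = np.asarray(y, dtype=float)
    E = np.hstack([A, -np.ones((m, 1))])             # E = [A, -e]
    rhs = E.T @ y                                    # equals E' D e since D = diag(y)
    z = np.linalg.solve(E.T @ E + np.eye(n + 1) / nu, rhs)
    return z[:n], z[n]                               # w, gamma
```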

38 A Probabilistic Batch Classification Model. Let x_i^j ∈ Rn represent the n features for the ith candidate in the jth image, and let w ∈ Rn be the weight parameters and γ ∈ R1 be the offset of some hyperplane classifier. [sent-278, score-0.127]

39 Besides the features x_i^j that are available to influence the classification of a sample point (via u_i^j), we have side information about the extent of correlation amongst the quantities u_i^j. [sent-284, score-0.129]

40 However, besides the above information encoded in the form of features that describe each candidate, we also know that the correlation amongst the samples in an image (batch) is purely based on the proximity of the spatial locations of the samples in the jth image. [sent-287, score-0.259]

41 In CAD applications, this spatial adjacency induces correlations in the predictions for the labels, but in other applications the nature of side-information that accounts for the correlations would be different. [sent-288, score-0.201]

42 P(u^j) = N(u^j | 0, R_j), with u^j ∈ R^(n_j) (8), where n_j is the number of candidates in the jth image, and the covariance matrix R_j encodes the correlations. [sent-290, score-0.144]

43 As mentioned above, in CAD applications, R j can be defined in terms of S, the matrix of Euclidean distances between candidates inside a medical image (from a patient) as R j = exp(−αS), with α > 0 as a model parameter. [sent-291, score-0.224]
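
A minimal sketch of this construction, assuming coords holds the spatial locations of the candidates in one image and alpha is the model parameter α (names are illustrative):

```python
import numpy as np

def correlation_matrix(coords, alpha):
    """R_j = exp(-alpha * S), where S holds the pairwise Euclidean distances
    between the candidate locations (rows of coords) in one image."""
    coords = np.asarray(coords, dtype=float)
    diffs = coords[:, None, :] - coords[None, :, :]
    S = np.sqrt((diffs ** 2).sum(axis=-1))
    return np.exp(-alpha * S)                        # element-wise; alpha > 0
```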

44 Note, however, that for each batch j, the probabilistic method requires calculating two matrix inversions to compute (R_j^{-1}/σ^2 + I)^{-1}. [sent-297, score-0.229]
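
Assuming the per-batch quantity is indeed (R_j^{-1}/σ² + I)^{-1}, as reconstructed above, the per-batch cost looks like the following sketch, with two dense O(m_j³) inversions:

```python
import numpy as np

def batch_inverse_factor(R_j, sigma):
    """Two dense inversions per batch, under the reading that the quantity
    needed is (R_j^{-1} / sigma^2 + I)^{-1}; each inversion costs O(m_j^3),
    which is why training and testing slow down for large batches."""
    m = R_j.shape[0]
    R_inv = np.linalg.inv(R_j)                           # first inversion
    return np.linalg.inv(R_inv / sigma**2 + np.eye(m))   # second inversion
```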

45 Hence, training and testing using this method can be time consuming for large batch sizes. [sent-298, score-0.285]

46 Hence, the intuition presented above is that we predict the classes for all the n j candidates in the jth image together, as a function of the features for all the candidates in the batch (here a batch corresponds to an image from a patient). [sent-316, score-0.789]

47 In every test image, each of the candidates is classified using the features from all the samples in the image. [sent-317, score-0.174]

48 Note that although we classify all the instances in a batch at the same time, our algorithm outputs a class for each instance, and the class memberships of these samples need not be the same. [sent-318, score-0.232]

49 Experiments. In this section, we compare regular SVM, probabilistic batch learning (BatchProb), and BatchSVM. [sent-320, score-0.201]

50 1 The Similarity Function. As mentioned earlier, the matrix R_j represents the level of correlation between all pairs of candidates from a batch (an image in our case), and it is a function of the pairwise similarity between the candidates. [sent-327, score-0.4]

51 In our CAD applications, we have used the Euclidean distances between candidates inside a medical image as our similarity metric to define the covariance matrix R j . [sent-329, score-0.247]

52 In order to make fair comparisons between our methods and standard SVM and PSVM, we added the spatial locations (x,y,z coordinates of the candidates in 3D image) to the feature vectors of the candidates. [sent-349, score-0.188]

53 We report this domain-specific metric in an ROC plot, where the y-axis is a measure of sensitivity and the x-axis is the number of false-marks per patient (in our case, per image is also per patient). [sent-395, score-0.185]
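
A hedged sketch of how such a curve could be computed from per-candidate scores (illustrative names; it counts candidate-level detections, whereas the clinical convention described later for polyps only requires one candidate per lesion to be flagged):

```python
import numpy as np

def froc_points(scores, labels, patient_ids, thresholds):
    """Sensitivity vs. false marks per patient over a range of score
    thresholds (a sketch of the domain-specific ROC-style metric)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    n_patients = len(set(patient_ids))
    n_pos = max(int((labels == 1).sum()), 1)
    points = []
    for t in thresholds:
        flagged = scores >= t
        sensitivity = int((flagged & (labels == 1)).sum()) / n_pos
        fp_per_patient = int((flagged & (labels != 1)).sum()) / n_patients
        points.append((fp_per_patient, sensitivity))
    return points
```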

54 The PSVM, BatchPSVM-cv and BatchPSVM-learned algorithms require solving a set of linear equations while training; therefore, we expect them to be faster than the other methods mentioned in this paper, namely SVM, BatchProb and BatchSVM. [sent-402, score-0.181]

55 On the other hand, the training time for BatchPSVM-learned is the total time divided by the number of iterations that BatchPSVM-learned required while converging to the optimum value of θ. [sent-406, score-0.189]

56 Figure 3: Experimental results for PE data (panels a and b; x-axis: false positives per patient). [sent-425, score-0.288]

57 ROC curves obtained by (a) SVM, BatchProb, BatchSVM and BatchSVMcont, and (b) PSVM, BatchPSVM-cv and BatchPSVM-learned. The features used are the pixel intensity descriptive statistics, texture, contrast, 2D and 3D shape descriptors, and the location of the corresponding structure. [sent-426, score-0.253]

58 2 Example: Colon Cancer Detection. Colorectal cancer is the third most common cancer in both men and women. [sent-447, score-0.158]

59 It is estimated that in 2004, nearly 147,000 cases of colon and rectal cancer would be diagnosed in the US, and more than 56,730 people would die from colon cancer (Jemal et al. [sent-448, score-0.414]

60 In over 90% of the cases, colon cancer progresses from local (polyp adenomas) to advanced stages (colorectal cancer), which has very poor survival rates. [sent-450, score-0.128]

61 Figure 4: Experimental results for Colon Cancer data (panels a and b; x-axis: false positives per patient). [sent-463, score-0.223]

62 ROC curves obtained by (a) SVM, BatchProb, BatchSVM and BatchSVMcont, and (b) PSVM, BatchPSVM-cv and BatchPSVM-learned. [sent-464, score-0.273]

63 The size of a polyp in the colon can vary from 1 mm all the way up to 10 cm. [sent-466, score-0.175]

64 Moreover, for large polyps, or so-called masses, a typical candidate generation algorithm generates several candidates across the polyp surface. [sent-468, score-0.265]

65 For the sake of clinical acceptability, it is sufficient to detect one of these candidates during classification. [sent-470, score-0.142]

66 Unlike a standard classification algorithm where the emphasis is to accurately classify each and every candidate, here we seek to classify at least one of the candidates representing a polyp accurately. [sent-471, score-0.167]

67 3 Example: Lung Cancer. LungCAD is a computer aided detection system for detecting potentially cancerous pulmonary nodules from thin-slice multi-detector computed tomography (CT) scans. [sent-481, score-0.216]

68 The final output of LungCAD is provided by a classifier that classifies a set of candidates as positive or negative. [sent-482, score-0.12]

69 This is a very hard classification problem: most patient lung CTs contain a few thousand structures (candidates). [sent-483, score-0.17]

70 Figure 5: Experimental results for Lung Cancer data (panels a and b; x-axis: false positives per patient). [sent-501, score-0.223]

71 The number of candidates labeled as nodules in the training set is 157 and the total number of candidates is 9987. [sent-504, score-0.31]

72 In this testing set, there are 79 candidates labeled as nodules out of 6159 generated candidates. [sent-506, score-0.179]

73 A total of 15 features were used to describe the shape, size, texture and intensity profile of each candidate region. [sent-507, score-0.147]

74 1 SVM versus Batch Methods. Figures 3, 4, and 5 show the ROC curves for the pulmonary embolism, colon cancer, and lung cancer data, respectively. [sent-512, score-0.435]

75 In our medical applications, high sensitivity is critical, as early detection of lung and colon cancer is believed to greatly improve the chances of successful treatment (Bogoni et al. [sent-513, score-0.352]

76 Furthermore, high specificity is also critical, as a large number of false positives will vastly increase physician load and lead (ultimately) to loss of physician confidence. [sent-515, score-0.163]

77 In Figure 3a, corresponding to the comparison of the ROC curves on the PE data set, we observe that standard SVM can only achieve 53% sensitivity for six false positives. [sent-516, score-0.132]

78 BatchProb also outperforms SVM. [Table comparing SVM, BatchProb, BatchSVM, BatchSVMcont, PSVM, BatchPSVM-cv and BatchPSVM-learned.] [sent-518, score-0.16]

79 When we compare the performances of PSVM and BatchPSVM-learned in Figure 3b, we observe that the proposed batch method (BatchPSVM-learned) dominates PSVM. [sent-543, score-0.262]

80 The colon cancer data is a relatively easier data set than pulmonary embolism, since standard SVM can achieve 54. [sent-545, score-0.257]

81 PSVM’s performance is substantially worse than that of our proposed batch methods, BatchPSVM-cv and BatchPSVM-learned. [sent-551, score-0.3]

82 Although SVM is very accurate for the lung cancer application, Figure 5a shows that BatchProb and BatchSVM could still improve SVM’s performance further. [sent-552, score-0.17]

83 Both BatchProb and BatchSVM outperform SVM in the 2-6 false positives per image range, which is the region of interest for commercial clinical lung CAD systems. [sent-554, score-0.27]

84 When we consider the area under the ROC curve for each of the methods in Table 3, we observe that our proposed methods are better than SVM and PSVM in every case except one where PSVM performs the same as our proposed methods in the lung cancer data. [sent-557, score-0.17]

85 3 BatchPSVM-learned versus BatchPSVM-cv. BatchPSVM-cv requires cross-validated tuning over the parameter θ to achieve the optimum accuracy. [sent-582, score-0.328]

86 In our experiments, BatchPSVM-learned was able to automatically converge to the optimum value of θ in only nine iterations on average while producing ROC plots comparable to cross-validated tuning (BatchPSVM-cv), as seen in Figures 3b, 4b and 5b. [sent-589, score-0.13]

87 When we compare the two methods with respect to training times, we observe that BatchPSVM-cv is faster than BatchPSVM-learned for a fixed value of θ. [sent-592, score-0.214]

88 However, since we repeated cross-validation 21 times for BatchPSVM-cv while BatchPSVM-learned required nine iterations on average, BatchPSVM-learned is actually faster than BatchPSVM-cv overall during training. [sent-593, score-0.341]

89 Experimental results on training times displayed in Table 2 confirm that overall training takes less time for BatchPSVM-learned than for BatchPSVM-cv. [sent-598, score-0.219]

90 However, despite its slower training time, BatchSVM produced better ROC curves than BatchPSVM on the PE and lung cancer data sets, as illustrated in Figures 3 and 5 respectively. [sent-605, score-0.269]

91 For the colon data set, in Figure 4b, we observe that BatchPSVM-cv and BatchSVM performed slightly better than BatchPSVM-learned, where both achieve 86% sensitivity at the one false positive level. [sent-607, score-0.258]

92 Experimental results indicate that the proposed method can substantially improve the diagnosis of (a) early stage cancer in the Lung & Colon, and (b) pulmonary embolisms (which may result in strokes). [sent-614, score-0.216]

93 For data with no clear batch information, clustering can be performed to create the batches. [sent-619, score-0.201]

94 Extending the batch algorithms to other data and automatically learning the similarity metric are topics for future work. [sent-620, score-0.224]

95 This is equivalent to assuming that the summation of all the rows of R is constant, or equivalently, for every batch j: (1/σ^2) R_j e ≈ k e, where k is a constant. [sent-647, score-0.201]

96 Then, solving the problem min_{w,γ,θ} F̄(w, γ, θ) is equivalent to solving the proximal batch formulation presented in Equation (7) for some ν = ν̄ (the constant k only rescales the problem in the variables w and γ). [sent-649, score-0.271]

97 Computerized detection of pulmonary embolism in spiral CT angiography based on volumetric image analysis. [sent-770, score-0.256]

98 Computer aided detection of pulmonary embolism on multi-detector CT. [sent-778, score-0.226]

99 CT angiography of pulmonary embolism: Diagnostic criteria and causes of misdiagnosis. [sent-806, score-0.131]

100 Computerized detection of pulmonary embolism in 3D computed tomographic (CT) images: vessel tracking and segmentation techniques. [sent-830, score-0.178]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('batchpsv', 0.515), ('batchsv', 0.45), ('batchprob', 0.244), ('cad', 0.215), ('batch', 0.201), ('psvm', 0.157), ('colon', 0.128), ('ung', 0.122), ('candidates', 0.12), ('ao', 0.112), ('rishnapuram', 0.112), ('ural', 0.112), ('batches', 0.103), ('mprove', 0.103), ('ocal', 0.103), ('pulmonary', 0.103), ('cv', 0.099), ('lung', 0.091), ('ependencies', 0.088), ('correlations', 0.08), ('svm', 0.08), ('patient', 0.079), ('cancer', 0.079), ('fung', 0.078), ('embolism', 0.075), ('dy', 0.073), ('arge', 0.072), ('lassifiers', 0.072), ('sing', 0.067), ('batchpsvm', 0.066), ('positives', 0.065), ('argin', 0.063), ('roc', 0.063), ('learned', 0.061), ('mcont', 0.056), ('siemens', 0.056), ('sensitivity', 0.056), ('medical', 0.054), ('pe', 0.052), ('candidate', 0.051), ('image', 0.05), ('rj', 0.05), ('aided', 0.048), ('dj', 0.048), ('batchsvm', 0.047), ('krishnapuram', 0.047), ('polyp', 0.047), ('polyps', 0.047), ('proximal', 0.046), ('opt', 0.043), ('false', 0.042), ('spatial', 0.041), ('texture', 0.04), ('artifact', 0.04), ('nodules', 0.037), ('vural', 0.037), ('ct', 0.036), ('tuning', 0.035), ('diagnosis', 0.034), ('optimum', 0.034), ('curves', 0.034), ('training', 0.033), ('intensity', 0.033), ('produced', 0.032), ('classi', 0.032), ('mil', 0.032), ('samples', 0.031), ('rn', 0.031), ('mangasarian', 0.03), ('hyperplane', 0.029), ('consuming', 0.029), ('correlation', 0.029), ('angiography', 0.028), ('balaji', 0.028), ('inversions', 0.028), ('lungcad', 0.028), ('pes', 0.028), ('physician', 0.028), ('tomography', 0.028), ('wopt', 0.028), ('locations', 0.027), ('amongst', 0.027), ('descriptive', 0.026), ('displayed', 0.026), ('ui', 0.025), ('formulation', 0.024), ('images', 0.024), ('jth', 0.024), ('zi', 0.023), ('similarity', 0.023), ('features', 0.023), ('clinical', 0.022), ('bag', 0.022), ('patients', 0.022), ('testing', 0.022), ('crf', 0.021), ('geman', 0.021), ('glenn', 0.021), ('er', 0.021), ('faster', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999934 99 jmlr-2009-Using Local Dependencies within Batches to Improve Large Margin Classifiers

Author: Volkan Vural, Glenn Fung, Balaji Krishnapuram, Jennifer G. Dy, Bharat Rao

Abstract: Most classification methods assume that the samples are drawn independently and identically from an unknown data generating distribution, yet this assumption is violated in several real life problems. In order to relax this assumption, we consider the case where batches or groups of samples may have internal correlations, whereas the samples from different batches may be considered to be uncorrelated. This paper introduces three algorithms for classifying all the samples in a batch jointly: one based on a probabilistic formulation, and two based on mathematical programming analysis. Experiments on three real-life computer aided diagnosis (CAD) problems demonstrate that the proposed algorithms are significantly more accurate than a naive support vector machine which ignores the correlations among the samples. Keywords: batch-wise classification, support vector machine, linear programming, machine learning, statistical methods, unconstrained optimization

2 0.044163805 39 jmlr-2009-Hybrid MPI OpenMP Parallel Linear Support Vector Machine Training

Author: Kristian Woodsend, Jacek Gondzio

Abstract: Support vector machines are a powerful machine learning technology, but the training process involves a dense quadratic optimization problem and is computationally challenging. A parallel implementation of linear Support Vector Machine training has been developed, using a combination of MPI and OpenMP. Using an interior point method for the optimization and a reformulation that avoids the dense Hessian matrix, the structure of the augmented system matrix is exploited to partition data and computations amongst parallel processors efficiently. The new implementation has been applied to solve problems from the PASCAL Challenge on Large-scale Learning. We show that our approach is competitive, and is able to solve problems in the Challenge many times faster than other parallel approaches. We also demonstrate that the hybrid version performs more efficiently than the version using pure MPI. Keywords: linear SVM training, hybrid parallelism, largescale learning, interior point method

3 0.042389378 86 jmlr-2009-Similarity-based Classification: Concepts and Algorithms

Author: Yihua Chen, Eric K. Garcia, Maya R. Gupta, Ali Rahimi, Luca Cazzanti

Abstract: This paper reviews and extends the field of similarity-based classification, presenting new analyses, algorithms, data sets, and a comprehensive set of experimental results for a rich collection of classification problems. Specifically, the generalizability of using similarities as features is analyzed, design goals and methods for weighting nearest-neighbors for similarity-based learning are proposed, and different methods for consistently converting similarities into kernels are compared. Experiments on eight real data sets compare eight approaches and their variants to similarity-based learning. Keywords: similarity, dissimilarity, similarity-based learning, indefinite kernels

4 0.041845109 63 jmlr-2009-On Efficient Large Margin Semisupervised Learning: Method and Theory

Author: Junhui Wang, Xiaotong Shen, Wei Pan

Abstract: In classification, semisupervised learning usually involves a large amount of unlabeled data with only a small number of labeled data. This imposes a great challenge in that it is difficult to achieve good classification performance through labeled data alone. To leverage unlabeled data for enhancing classification, this article introduces a large margin semisupervised learning method within the framework of regularization, based on an efficient margin loss for unlabeled data, which seeks efficient extraction of the information from unlabeled data for estimating the Bayes decision boundary for classification. For implementation, an iterative scheme is derived through conditional expectations. Finally, theoretical and numerical analyses are conducted, in addition to an application to gene function prediction. They suggest that the proposed method enables to recover the performance of its supervised counterpart based on complete data in rates of convergence, when possible. Keywords: difference convex programming, classification, nonconvex minimization, regularization, support vectors

5 0.039071843 23 jmlr-2009-Discriminative Learning Under Covariate Shift

Author: Steffen Bickel, Michael Brückner, Tobias Scheffer

Abstract: We address classification problems for which the training instances are governed by an input distribution that is allowed to differ arbitrarily from the test distribution—problems also referred to as classification under covariate shift. We derive a solution that is purely discriminative: neither training nor test distribution are modeled explicitly. The problem of learning under covariate shift can be written as an integrated optimization problem. Instantiating the general optimization problem leads to a kernel logistic regression and an exponential model classifier for covariate shift. The optimization problem is convex under certain conditions; our findings also clarify the relationship to the known kernel mean matching procedure. We report on experiments on problems of spam filtering, text classification, and landmine detection. Keywords: covariate shift, discriminative learning, transfer learning

6 0.038193494 10 jmlr-2009-Application of Non Parametric Empirical Bayes Estimation to High Dimensional Classification

7 0.03753683 48 jmlr-2009-Learning Nondeterministic Classifiers

8 0.031234212 90 jmlr-2009-Structure Spaces

9 0.030762549 60 jmlr-2009-Nieme: Large-Scale Energy-Based Models    (Machine Learning Open Source Software Paper)

10 0.030520894 87 jmlr-2009-Sparse Online Learning via Truncated Gradient

11 0.030017328 24 jmlr-2009-Distance Metric Learning for Large Margin Nearest Neighbor Classification

12 0.028269723 50 jmlr-2009-Learning When Concepts Abound

13 0.027455714 56 jmlr-2009-Model Monitor (M2): Evaluating, Comparing, and Monitoring Models    (Machine Learning Open Source Software Paper)

14 0.027105331 8 jmlr-2009-An Anticorrelation Kernel for Subsystem Training in Multiple Classifier Systems

15 0.024869373 62 jmlr-2009-Nonlinear Models Using Dirichlet Process Mixtures

16 0.023091953 3 jmlr-2009-A Parameter-Free Classification Method for Large Scale Learning

17 0.023042347 58 jmlr-2009-NEUROSVM: An Architecture to Reduce the Effect of the Choice of Kernel on the Performance of SVM

18 0.022203183 16 jmlr-2009-Classification with Gaussians and Convex Loss

19 0.021972541 92 jmlr-2009-Supervised Descriptive Rule Discovery: A Unifying Survey of Contrast Set, Emerging Pattern and Subgroup Mining

20 0.021605693 69 jmlr-2009-Optimized Cutting Plane Algorithm for Large-Scale Risk Minimization


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.123), (1, -0.042), (2, 0.073), (3, -0.022), (4, 0.05), (5, -0.039), (6, 0.034), (7, 0.074), (8, 0.031), (9, 0.051), (10, 0.031), (11, -0.024), (12, 0.039), (13, -0.104), (14, 0.037), (15, -0.087), (16, -0.027), (17, -0.043), (18, 0.116), (19, -0.103), (20, -0.019), (21, 0.114), (22, -0.025), (23, 0.079), (24, -0.076), (25, 0.045), (26, -0.303), (27, -0.052), (28, 0.079), (29, 0.121), (30, -0.25), (31, -0.008), (32, -0.09), (33, 0.069), (34, 0.1), (35, 0.076), (36, -0.123), (37, -0.045), (38, -0.174), (39, 0.117), (40, 0.076), (41, 0.072), (42, -0.11), (43, -0.071), (44, 0.476), (45, 0.123), (46, 0.244), (47, -0.188), (48, 0.091), (49, -0.103)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.92776108 99 jmlr-2009-Using Local Dependencies within Batches to Improve Large Margin Classifiers

Author: Volkan Vural, Glenn Fung, Balaji Krishnapuram, Jennifer G. Dy, Bharat Rao

Abstract: Most classification methods assume that the samples are drawn independently and identically from an unknown data generating distribution, yet this assumption is violated in several real life problems. In order to relax this assumption, we consider the case where batches or groups of samples may have internal correlations, whereas the samples from different batches may be considered to be uncorrelated. This paper introduces three algorithms for classifying all the samples in a batch jointly: one based on a probabilistic formulation, and two based on mathematical programming analysis. Experiments on three real-life computer aided diagnosis (CAD) problems demonstrate that the proposed algorithms are significantly more accurate than a naive support vector machine which ignores the correlations among the samples. Keywords: batch-wise classification, support vector machine, linear programming, machine learning, statistical methods, unconstrained optimization

2 0.33198178 86 jmlr-2009-Similarity-based Classification: Concepts and Algorithms

Author: Yihua Chen, Eric K. Garcia, Maya R. Gupta, Ali Rahimi, Luca Cazzanti

Abstract: This paper reviews and extends the field of similarity-based classification, presenting new analyses, algorithms, data sets, and a comprehensive set of experimental results for a rich collection of classification problems. Specifically, the generalizability of using similarities as features is analyzed, design goals and methods for weighting nearest-neighbors for similarity-based learning are proposed, and different methods for consistently converting similarities into kernels are compared. Experiments on eight real data sets compare eight approaches and their variants to similarity-based learning. Keywords: similarity, dissimilarity, similarity-based learning, indefinite kernels

3 0.28080595 60 jmlr-2009-Nieme: Large-Scale Energy-Based Models    (Machine Learning Open Source Software Paper)

Author: Francis Maes

Abstract: In this paper we introduce Nieme, a machine learning library for large-scale classification, regression and ranking. Nieme relies on the framework of energy-based models (LeCun et al., 2006) which unifies several learning algorithms ranging from simple perceptrons to recent models such as the pegasos support vector machine or l1-regularized maximum entropy models. This framework also unifies batch and stochastic learning which are both seen as energy minimization problems. Nieme can hence be used in a wide range of situations, but is particularly interesting for large-scale learning tasks where both the examples and the features are processed incrementally. Being able to deal with new incoming features at any time within the learning process is another original feature of the Nieme toolbox. Nieme is released under the GPL license. It is efficiently implemented in C++, it works on Linux, Mac OS X and Windows and provides interfaces for C++, Java and Python. Keywords: large-scale machine learning, classification, ranking, regression, energy-based models, machine learning software

4 0.26090646 63 jmlr-2009-On Efficient Large Margin Semisupervised Learning: Method and Theory

Author: Junhui Wang, Xiaotong Shen, Wei Pan

Abstract: In classification, semisupervised learning usually involves a large amount of unlabeled data with only a small number of labeled data. This imposes a great challenge in that it is difficult to achieve good classification performance through labeled data alone. To leverage unlabeled data for enhancing classification, this article introduces a large margin semisupervised learning method within the framework of regularization, based on an efficient margin loss for unlabeled data, which seeks efficient extraction of the information from unlabeled data for estimating the Bayes decision boundary for classification. For implementation, an iterative scheme is derived through conditional expectations. Finally, theoretical and numerical analyses are conducted, in addition to an application to gene function prediction. They suggest that the proposed method enables to recover the performance of its supervised counterpart based on complete data in rates of convergence, when possible. Keywords: difference convex programming, classification, nonconvex minimization, regularization, support vectors

5 0.21858364 48 jmlr-2009-Learning Nondeterministic Classifiers

Author: Juan José del Coz, Jorge Díez, Antonio Bahamonde

Abstract: Nondeterministic classifiers are defined as those allowed to predict more than one class for some entries from an input space. Given that the true class should be included in predictions and the number of classes predicted should be as small as possible, these kinds of classifiers can be considered as Information Retrieval (IR) procedures. In this paper, we propose a family of IR loss functions to measure the performance of nondeterministic learners. After discussing such measures, we derive an algorithm for learning optimal nondeterministic hypotheses. Given an entry from the input space, the algorithm requires the posterior probabilities to compute the subset of classes with the lowest expected loss. From a general point of view, nondeterministic classifiers provide an improvement in the proportion of predictions that include the true class compared to their deterministic counterparts; the price to be paid for this increase is usually a tiny proportion of predictions with more than one class. The paper includes an extensive experimental study using three deterministic learners to estimate posterior probabilities: a multiclass Support Vector Machine (SVM), a Logistic Regression, and a Naïve Bayes. The data sets considered comprise both UCI multi-class learning tasks and microarray expressions of different kinds of cancer. We successfully compare nondeterministic classifiers with other alternative approaches. Additionally, we shall see how the quality of posterior probabilities (measured by the Brier score) determines the goodness of nondeterministic predictions. Keywords: nondeterministic, multiclassification, reject option, multi-label classification, posterior probabilities

6 0.1994729 39 jmlr-2009-Hybrid MPI OpenMP Parallel Linear Support Vector Machine Training

7 0.18998259 6 jmlr-2009-An Algorithm for Reading Dependencies from the Minimal Undirected Independence Map of a Graphoid that Satisfies Weak Transitivity

8 0.1712662 90 jmlr-2009-Structure Spaces

9 0.16752613 54 jmlr-2009-Markov Properties for Linear Causal Models with Correlated Errors    (Special Topic on Causality)

10 0.16106232 58 jmlr-2009-NEUROSVM: An Architecture to Reduce the Effect of the Choice of Kernel on the Performance of SVM

11 0.15814579 1 jmlr-2009-A Least-squares Approach to Direct Importance Estimation

12 0.15177362 24 jmlr-2009-Distance Metric Learning for Large Margin Nearest Neighbor Classification

13 0.14319803 10 jmlr-2009-Application of Non Parametric Empirical Bayes Estimation to High Dimensional Classification

14 0.14006284 62 jmlr-2009-Nonlinear Models Using Dirichlet Process Mixtures

15 0.13073368 67 jmlr-2009-Online Learning with Sample Path Constraints

16 0.12964737 50 jmlr-2009-Learning When Concepts Abound

17 0.12603538 87 jmlr-2009-Sparse Online Learning via Truncated Gradient

18 0.12511785 23 jmlr-2009-Discriminative Learning Under Covariate Shift

19 0.11632471 12 jmlr-2009-Bi-Level Path Following for Cross Validated Solution of Kernel Quantile Regression

20 0.11572787 61 jmlr-2009-Nonextensive Information Theoretic Kernels on Measures


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(8, 0.026), (11, 0.019), (19, 0.012), (21, 0.01), (26, 0.022), (38, 0.026), (47, 0.016), (52, 0.063), (55, 0.047), (58, 0.041), (66, 0.071), (68, 0.025), (90, 0.08), (96, 0.029), (98, 0.436)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.70406222 99 jmlr-2009-Using Local Dependencies within Batches to Improve Large Margin Classifiers

Author: Volkan Vural, Glenn Fung, Balaji Krishnapuram, Jennifer G. Dy, Bharat Rao

Abstract: Most classification methods assume that the samples are drawn independently and identically from an unknown data generating distribution, yet this assumption is violated in several real life problems. In order to relax this assumption, we consider the case where batches or groups of samples may have internal correlations, whereas the samples from different batches may be considered to be uncorrelated. This paper introduces three algorithms for classifying all the samples in a batch jointly: one based on a probabilistic formulation, and two based on mathematical programming analysis. Experiments on three real-life computer aided diagnosis (CAD) problems demonstrate that the proposed algorithms are significantly more accurate than a naive support vector machine which ignores the correlations among the samples. Keywords: batch-wise classification, support vector machine, linear programming, machine learning, statistical methods, unconstrained optimization

2 0.2913768 97 jmlr-2009-Ultrahigh Dimensional Feature Selection: Beyond The Linear Model

Author: Jianqing Fan, Richard Samworth, Yichao Wu

Abstract: Variable selection in high-dimensional space characterizes many contemporary problems in scientific discovery and decision making. Many frequently-used techniques are based on independence screening; examples include correlation ranking (Fan & Lv, 2008) or feature selection using a twosample t-test in high-dimensional classification (Tibshirani et al., 2003). Within the context of the linear model, Fan & Lv (2008) showed that this simple correlation ranking possesses a sure independence screening property under certain conditions and that its revision, called iteratively sure independent screening (ISIS), is needed when the features are marginally unrelated but jointly related to the response variable. In this paper, we extend ISIS, without explicit definition of residuals, to a general pseudo-likelihood framework, which includes generalized linear models as a special case. Even in the least-squares setting, the new method improves ISIS by allowing feature deletion in the iterative process. Our technique allows us to select important features in high-dimensional classification where the popularly used two-sample t-method fails. A new technique is introduced to reduce the false selection rate in the feature screening stage. Several simulated and two real data examples are presented to illustrate the methodology. Keywords: classification, feature screening, generalized linear models, robust regression, feature selection

3 0.29099566 82 jmlr-2009-Robustness and Regularization of Support Vector Machines

Author: Huan Xu, Constantine Caramanis, Shie Mannor

Abstract: We consider regularized support vector machines (SVMs) and show that they are precisely equivalent to a new robust optimization formulation. We show that this equivalence of robust optimization and regularization has implications for both algorithms, and analysis. In terms of algorithms, the equivalence suggests more general SVM-like algorithms for classification that explicitly build in protection to noise, and at the same time control overfitting. On the analysis front, the equivalence of robustness and regularization provides a robust optimization interpretation for the success of regularized SVMs. We use this new robustness interpretation of SVMs to give a new proof of consistency of (kernelized) SVMs, thus establishing robustness as the reason regularized SVMs generalize well. Keywords: robustness, regularization, generalization, kernel, support vector machine

4 0.29049847 85 jmlr-2009-Settable Systems: An Extension of Pearl's Causal Model with Optimization, Equilibrium, and Learning

Author: Halbert White, Karim Chalak

Abstract: Judea Pearl’s Causal Model is a rich framework that provides deep insight into the nature of causal relations. As yet, however, the Pearl Causal Model (PCM) has had a lesser impact on economics or econometrics than on other disciplines. This may be due in part to the fact that the PCM is not as well suited to analyzing structures that exhibit features of central interest to economists and econometricians: optimization, equilibrium, and learning. We offer the settable systems framework as an extension of the PCM that permits causal discourse in systems embodying optimization, equilibrium, and learning. Because these are common features of physical, natural, or social systems, our framework may prove generally useful for machine learning. Important features distinguishing the settable system framework from the PCM are its countable dimensionality and the use of partitioning and partition-specific response functions to accommodate the behavior of optimizing and interacting agents and to eliminate the requirement of a unique fixed point for the system. Refinements of the PCM include the settable systems treatment of attributes, the causal role of exogenous variables, and the dual role of variables as causes and responses. A series of closely related machine learning examples and examples from game theory and machine learning with feedback demonstrates some limitations of the PCM and motivates the distinguishing features of settable systems. Keywords: causal models, game theory, machine learning, recursive estimation, simultaneous equations

5 0.28998399 62 jmlr-2009-Nonlinear Models Using Dirichlet Process Mixtures

Author: Babak Shahbaba, Radford Neal

Abstract: We introduce a new nonlinear model for classification, in which we model the joint distribution of response variable, y, and covariates, x, non-parametrically using Dirichlet process mixtures. We keep the relationship between y and x linear within each component of the mixture. The overall relationship becomes nonlinear if the mixture contains more than one component, with different regression coefficients. We use simulated data to compare the performance of this new approach to alternative methods such as multinomial logit (MNL) models, decision trees, and support vector machines. We also evaluate our approach on two classification problems: identifying the folding class of protein sequences and detecting Parkinson’s disease. Our model can sometimes improve predictive accuracy. Moreover, by grouping observations into sub-populations (i.e., mixture components), our model can sometimes provide insight into hidden structure in the data. Keywords: mixture models, Dirichlet process, classification

6 0.28886482 27 jmlr-2009-Efficient Online and Batch Learning Using Forward Backward Splitting

7 0.28882983 38 jmlr-2009-Hash Kernels for Structured Data

8 0.2887347 70 jmlr-2009-Particle Swarm Model Selection    (Special Topic on Model Selection)

9 0.28865412 87 jmlr-2009-Sparse Online Learning via Truncated Gradient

10 0.28862002 48 jmlr-2009-Learning Nondeterministic Classifiers

11 0.28771207 32 jmlr-2009-Exploiting Product Distributions to Identify Relevant Variables of Correlation Immune Functions

12 0.28738186 58 jmlr-2009-NEUROSVM: An Architecture to Reduce the Effect of the Choice of Kernel on the Performance of SVM

13 0.28593001 75 jmlr-2009-Provably Efficient Learning with Typed Parametric Models

14 0.28567305 10 jmlr-2009-Application of Non Parametric Empirical Bayes Estimation to High Dimensional Classification

15 0.28552225 52 jmlr-2009-Margin-based Ranking and an Equivalence between AdaBoost and RankBoost

16 0.28279516 69 jmlr-2009-Optimized Cutting Plane Algorithm for Large-Scale Risk Minimization

17 0.2825951 86 jmlr-2009-Similarity-based Classification: Concepts and Algorithms

18 0.28230825 19 jmlr-2009-Controlling the False Discovery Rate of the Association Causality Structure Learned with the PC Algorithm    (Special Topic on Mining and Learning with Graphs and Relations)

19 0.28143919 24 jmlr-2009-Distance Metric Learning for Large Margin Nearest Neighbor Classification

20 0.28027105 18 jmlr-2009-Consistency and Localizability