nips nips2006 nips2006-140 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Murat Dundar, Balaji Krishnapuram, R. B. Rao, Glenn M. Fung
Abstract: Many computer aided diagnosis (CAD) problems can be best modelled as a multiple-instance learning (MIL) problem with unbalanced data: i.e. , the training data typically consists of a few positive bags, and a very large number of negative instances. Existing MIL algorithms are much too computationally expensive for these datasets. We describe CH, a framework for learning a Convex Hull representation of multiple instances that is significantly faster than existing MIL algorithms. Our CH framework applies to any standard hyperplane-based learning algorithm, and for some algorithms, is guaranteed to find the global optimal solution. Experimental studies on two different CAD applications further demonstrate that the proposed algorithm significantly improves diagnostic accuracy when compared to both MIL and traditional classifiers. Although not designed for standard MIL problems (which have both positive and negative bags and relatively balanced datasets), comparisons against other MIL methods on benchmark problems also indicate that the proposed method is competitive with the state-of-the-art.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: Many computer aided diagnosis (CAD) problems can be best modelled as a multiple-instance learning (MIL) problem with unbalanced data, i.e. [sent-7, score-0.114]
2 the training data typically consists of a few positive bags, and a very large number of negative instances. [sent-9, score-0.143]
3 We describe CH, a framework for learning a Convex Hull representation of multiple instances that is significantly faster than existing MIL algorithms. [sent-11, score-0.046]
4 Experimental studies on two different CAD applications further demonstrate that the proposed algorithm significantly improves diagnostic accuracy when compared to both MIL and traditional classifiers. [sent-13, score-0.057]
5 Although not designed for standard MIL problems (which have both positive and negative bags and relatively balanced datasets), comparisons against other MIL methods on benchmark problems also indicate that the proposed method is competitive with the state-of-the-art. [sent-14, score-0.291]
6 1 Introduction In many computer aided diagnosis applications, the goal is to detect potentially malignant tumors and lesions in medical images (CT scans, X-ray, MRI, etc.). [sent-15, score-0.217]
7 In an almost universal paradigm for CAD algorithms, this problem is addressed by a 3-stage system: identification of potentially unhealthy regions of interest (ROI) by a candidate generator, computation of descriptive features for each candidate, and labeling of each candidate (e.g., by a classifier). [sent-16, score-0.205]
8 The training dataset for the classifier is generated as follows: Expert radiologists examine a set of images to mark out tumors. [sent-19, score-0.157]
9 Then, candidate ROIs (with associated computed features) are marked positive if they are sufficiently close to a radiologist mark, and negative otherwise. [sent-20, score-0.363]
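To make this labeling convention concrete, here is a minimal sketch of the candidate-labeling and bag-forming step. The function name, the distance threshold, and the coordinate handling are illustrative assumptions, not details taken from the paper; positives near the same mark share a bag, and each negative candidate stays a singleton bag (as motivated later in the text).

```python
import numpy as np

def label_candidates(cand_xyz, mark_xyz, radius=10.0):
    # cand_xyz: (N, 3) candidate ROI centers; mark_xyz: (M, 3) radiologist marks.
    # radius is a hypothetical "sufficiently close" threshold in image units.
    labels = -np.ones(len(cand_xyz), dtype=int)
    bag_id = np.arange(len(cand_xyz))           # default: singleton negative bags
    for m, mark in enumerate(mark_xyz):
        near = np.linalg.norm(cand_xyz - mark, axis=1) <= radius
        labels[near] = 1
        bag_id[near] = len(cand_xyz) + m        # positives grouped by their mark
    return labels, bag_id
```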
10 In many CAD datasets, fewer than 1-10% of the candidates are positive. [sent-21, score-0.128]
11 In Section 2 we show that CAD data is better modeled in the multiple instance learning (MIL) framework, and subsequently present a novel convex-hull-based MIL algorithm. [sent-23, score-0.045]
12 In Section 3 we provide experimental evidence from two different CAD problems to show that the proposed algorithm is significantly faster than other MIL algorithms, and more accurate when compared to other MIL algorithms and to traditional classifiers. [sent-24, score-0.034]
13 Further—although this is not the main focus of our paper—on traditional benchmarks for MIL, our algorithm is again shown to be competitive with the current state-of-the-art. [sent-25, score-0.034]
14 This property is clearly violated in a CAD dataset: due to the spatial adjacency of the regions identified by a candidate generator, both the features and the class labels of several adjacent candidates (training instances) are highly correlated. [sent-30, score-0.232]
15 Second, because candidates are labelled positive if they are within some pre-determined distance from a radiologist mark, multiple positive candidates could correspond to the same (positive) radiologist mark on the image. [sent-32, score-0.72]
16 Note that some of the positively labelled candidates may actually refer to healthy structures that just happen to be near a mark, thereby introducing an asymmetric labeling error in the training data. [sent-33, score-0.169]
17 In MIL terminology from previous literature, a “bag” may contain many observation instances of the same underlying entity, and every training bag is provided a class label (e.g., positive or negative). [sent-34, score-0.299]
18 The objective in MIL is to learn a classifier that correctly classifies at least one instance from every bag. [sent-37, score-0.087]
19 In particular, even if one of the candidates that refers to the underlying malignant structure (radiologist mark) is correctly highlighted to the radiologist, the malignant structure is detected; i.e., the correct classification of every candidate instance is not as important as the ability to detect at least one candidate that points to a malignant region. [sent-41, score-0.283]
21 Furthermore, we would like to classify every sample that is distant from a radiologist mark as negative; this is easily accomplished by considering each negative candidate as its own bag. [sent-42, score-0.372]
22 Therefore, it would appear that MIL algorithms should outperform traditional classifiers on CAD datasets. [sent-43, score-0.034]
23 In CAD we typically have several thousand mostly negative candidates (instances), and a few hundred positive bags; existing MIL algorithms are simply unable to handle such large datasets due to time or memory requirements. [sent-45, score-0.329]
24 Notation: Let the i-th bag of class j be represented by the matrix $B_j^i \in \Re^{m_j^i \times n}$, $i = 1, \ldots, r_j$, $j \in \{\pm 1\}$, where n is the number of features. The row l of $B_j^i$, denoted by $B_j^{il}$, represents datapoint l of bag i in class j, with $l = 1, \ldots, m_j^i$. [sent-50, score-0.224]
27 2.1 Key idea: Relaxation of MIL via Convex-Hulls. The original MIL problem requires at least one of the samples in a bag to be correctly labeled by the classifier: this corresponds to a set of discrete constraints on the classifier. [sent-57, score-0.266]
28 By contrast, we shall relax this and require that at least one point in the convex hull of a bag of samples (possibly including one of the original samples) has to be correctly classified. [sent-58, score-0.409]
29 As mentioned above, we will consider that a bag $B_j^i$ is correctly classified if any point inside the convex hull of the bag $B_j^i$ (i.e., any convex combination of points of $B_j^i$) is correctly classified. [sent-63, score-0.102]
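Written out, the relaxed per-bag requirement takes roughly the following form (a sketch in our own notation, writing $y_j \in \{\pm 1\}$ for the class label and $(w, \eta)$ for the hyperplane parameters used in the formulations below):

$$\exists\, \lambda_j^i \;\text{with}\; \lambda_j^i \ge 0,\; e'\lambda_j^i = 1 \quad\text{such that}\quad y_j\left(\lambda_j^i B_j^i w - \eta\right) > 0,$$

i.e., some convex combination of the instances in bag $B_j^i$ lands on the correct side of the hyperplane. Choosing $\lambda_j^i$ as a unit vector recovers the original discrete MIL constraint as a special case.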
31 Let $\lambda_j^i$, with $0 \le \lambda_j^i$ and $e'\lambda_j^i = 1$, be the vector containing the coefficients of the convex combination that defines the representative point of bag i in class j. [sent-66, score-0.318]
32 Let r be the total number of representative points, i.e., $r = r_+ + r_-$. [sent-67, score-0.034]
33 Let $\gamma_j$ be the total number of convex hull coefficients corresponding to the representative points in class j, i.e., $\gamma_j = \sum_{i=1}^{r_j} m_j^i$, with $\gamma = \gamma_+ + \gamma_-$. [sent-70, score-0.177]
34 $E : \Re^r \rightarrow \Re$ represents the loss function, $\Phi : \Re^{(n+1)} \rightarrow \Re$ is a regularization function on the hyperplane coefficients [2], and $\Psi$ is a regularization function on the convex combination coefficients $\lambda_j^i$. [sent-82, score-0.089]
35 [Figure 1 caption] Positive and negative classes are represented by blue circles and red diamonds, respectively. [sent-98, score-0.061]
36 Cyan polyhedrons represent the convex hulls of the three positive bags; the points chosen by our algorithm to represent each bag are shown by blue stars. [sent-99, score-0.284]
37 The magenta line represents the linear hyperplane obtained by our algorithm and the black line represents the hyperplane for the SVM. [sent-100, score-0.058]
38 Specific choices of E and Ω recover familiar algorithms: $E(\xi) = \|\xi\|_2^2$ with $\Omega = \Re^r_+$ leads to MIL versions of the corresponding SVM formulation, and $\Omega = \Re^r$ leads to MIL versions of the Least-Squares SVM. [sent-105, score-0.062]
39 $\nu = 1$, $E(\xi) = \|\xi\|_2^2$, $\Omega = \{\xi : e'\xi_j = 0,\ j \in \{\pm\}\}$ leads to MIL versions of the QP formulation for Fisher's linear discriminant (FD) [4]. [sent-106, score-0.088]
40 $$\min_{(\xi, w, \eta, \lambda) \in \Re^{r+n+1+\gamma}} \|\xi\|_2^2 + \Phi(w, \eta) + \Psi(\lambda) \quad \text{s.t.} \quad \xi_j^i = d_j^i - (\lambda_j^i B_j^i w - e\eta),\; e'\xi_j = 0,\; e'\lambda_j^i = 1,\; 0 \le \lambda_j^i \tag{2}$$ The number of variables to be optimized in (2) is $r + n + 1 + \gamma$: this is computationally infeasible when the number of bags is large ($r > 10^4$). [sent-110, score-0.135]
41 Effectively, this results in the MIL version of the traditional FD algorithm. [sent-113, score-0.034]
42 As discussed later in the paper, in addition to the obvious computational gains, this manipulation results in some algorithmic advantages as well (for more information on the equivalence between the single-instance-learning versions of (2) and (3), see [4]). [sent-114, score-0.076]
43 $$\min_{(w, \lambda)} w^T S_W w \quad \text{s.t.} \quad w^T(\mu_+ - \mu_-) = b,\; e'\lambda_j^i = 1,\; 0 \le \lambda_j^i \tag{3}$$ where $S_W = \sum_{j \in \{\pm\}} \frac{1}{r_j}(X_j - e\mu_j')'(X_j - e\mu_j')$ is the within-class scatter matrix and $\mu_j = \frac{1}{r_j} X_j' e$ is the mean for class j. [sent-119, score-0.212]
44 $X_j \in \Re^{r_j \times n}$ is a matrix containing the $r_j$ representative points in n-dimensional space, such that the i-th row of $X_j$, denoted by $b_j^i = \lambda_j^i B_j^i$, is the representative point of bag i in class j, where $i = 1, \ldots, r_j$. [sent-120, score-0.398]
45 When Φ(w) and Ψ(λ) are strongly convex functions, both the original objective function and the two subproblems (for optimizing λ and w) in (3) are strongly convex, meaning that the algorithm converges to a global minimizer [6]. [sent-127, score-0.087]
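To make the alternating scheme concrete, here is a minimal numerical sketch of the two subproblems, assuming the ridge regularizers introduced just below. The λ-update shown is a deliberately simple surrogate (a softmax over instance scores, which stays on the simplex) rather than the paper's exact constrained solve, and all names are ours.

```python
import numpy as np

def ch_fd_train(pos_bags, neg_instances, eps=1e-3, n_iter=20):
    # pos_bags: list of (m_i x n) arrays, one per positive bag.
    # neg_instances: (r_minus x n) array; each row is its own singleton bag.
    n = neg_instances.shape[1]
    # Start from uniform convex-combination weights within each positive bag.
    lambdas = [np.full(B.shape[0], 1.0 / B.shape[0]) for B in pos_bags]
    for _ in range(n_iter):
        # Subproblem 1: fix lambda, solve the regularized FD problem for w.
        X_pos = np.vstack([lam @ B for lam, B in zip(lambdas, pos_bags)])
        mu_p, mu_n = X_pos.mean(axis=0), neg_instances.mean(axis=0)
        S_w = np.cov(X_pos.T, bias=True) + np.cov(neg_instances.T, bias=True)
        w = np.linalg.solve(S_w + eps * np.eye(n), mu_p - mu_n)
        # Subproblem 2: fix w, refresh each positive bag's convex weights,
        # pulling the representative point toward its best-classified instance.
        for i, B in enumerate(pos_bags):
            s = B @ w
            e = np.exp(s - s.max())
            lambdas[i] = e / e.sum()
    eta = 0.5 * w @ (mu_p + mu_n)  # threshold halfway between class means
    return w, eta, lambdas
```

Each pass first re-fits the hyperplane to the current representative points, then moves each positive bag's representative point within its convex hull; under strongly convex Φ and Ψ the true alternating scheme converges to the global minimizer, as stated above.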
46 For computational efficiency, in the remainder of the paper we will use the regularizers $\Phi(w) = \epsilon \|w\|_2^2$ and $\Psi(\lambda) = \epsilon \|\lambda\|_2^2$, where $\epsilon$ is a positive regularization parameter. [sent-128, score-0.053]
47 Since SW is the sum of two covariance matrices, it is guaranteed to be at least positive semidefinite and thus the problem in (4) is convex. [sent-133, score-0.053]
48 When the number of bags is much greater than the dimensionality ($r \gg n$), $S_W$ is positive definite and thus the problem in (4) is strictly convex. [sent-136, score-0.188]
49 This changes the order of complexity from $O(nr^2)$ to $O(n^2 r)$ and brings some computational advantages when dealing with datasets with $r \gg n$. [sent-138, score-0.1]
50 For the positive class, elements $\sum_{k=1}^{i-1} m_+^k + 1$ through $\sum_{k=1}^{i} m_+^k$ of $\bar{b}_j^i$ are nonzero; for the negative class, the nonzero elements are located at $\sum_{k=1}^{r_+} m_+^k + \sum_{k=1}^{i-1} m_-^k + 1$ through $\sum_{k=1}^{r_+} m_+^k + \sum_{k=1}^{i} m_-^k$. [sent-142, score-0.242]
51 Note that $\bar{S}_W$ is also a sum of two covariance matrices, so it is positive semidefinite and thus the problem in (5) is convex. [sent-143, score-0.053]
52 Unlike subproblem 1, the positive definiteness of $\bar{S}_W$ does not depend on the data, since it is always true that $r \le \gamma$. [sent-144, score-0.093]
53 As mentioned before, in CAD applications a bag is defined as a set of candidates that are spatially close to the radiologist-marked ground truth. [sent-146, score-0.545]
54 Any candidate that is spatially far from this location is considered negative in the training data; therefore the concept of a bag for negative examples does not make practical sense in this scenario. [sent-147, score-0.491]
55 Moreover, since ground truth is only available on the training set, there is no concept of a bag on the test set for either positive or negative examples. [sent-148, score-0.367]
56 The learned classifier labels (i.e., classifies) individual instances; the bag information for positive examples is only used to help learn a better classifier from the training data. [sent-149, score-0.352]
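In other words, test-time prediction is plain per-instance scoring against the learned hyperplane; a minimal sketch (names ours):

```python
import numpy as np

def predict_instances(X, w, eta):
    # No bags at test time: score each candidate ROI independently.
    scores = X @ w - eta
    return scores, np.sign(scores)
```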
57 The nonlinear version of the proposed algorithm can be obtained by first transforming the original datapoints to a kernel space spanned by all datapoints through a kernel operator, i.e., by replacing each datapoint x with its vector of kernel evaluations K(x, ·) against the spanning datapoints. [sent-170, score-0.126]
58 However when γ is ¯ large, for computational reasons we can use the technique presented in [7] to limit the number of datapoints spanning this new space. [sent-174, score-0.063]
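A small sketch of this kernelization step, assuming the RBF kernel form used later for the benchmark experiments; the function name and parameter defaults are our own illustrative choices.

```python
import numpy as np

def empirical_kernel_map(X, basis, sigma=0.1):
    # Replace each datapoint x with its vector of kernel evaluations
    # K(x, z) = exp(-sigma * ||x - z||^2) against a set of basis points.
    d2 = ((X[:, None, :] - basis[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sigma * d2)

# When gamma is large, span the kernel space with a random subset of the
# training points (e.g., one tenth, as done for MUSK2 later in the text):
# idx = np.random.choice(len(train_X), size=len(train_X) // 10, replace=False)
# pos_bags_k = [empirical_kernel_map(B, train_X[idx]) for B in pos_bags]
```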
59 All algorithms are trained on the training data and then tested on the sequestered test data. [sent-181, score-0.08]
60 Pulmonary embolism (PE), a potentially life-threatening condition, is a result of underlying venous thromboembolic disease. [sent-185, score-0.089]
61 An early and accurate diagnosis is the key to survival. [sent-186, score-0.047]
62 Computed tomography angiography (CTA) has emerged as an accurate diagnostic tool for PE; however, there are hundreds of CT slices in each CTA study, and manual reading is laborious, time consuming, and complicated by various PE look-alikes. [sent-187, score-0.047]
63 At four different hospitals (two North American sites and two European sites), we collected 72 cases with 242 PE bags comprising 1069 positive candidates marked by expert chest radiologists. [sent-189, score-0.406]
64 The cases were randomly divided into two sets: training (48 cases with 173 PE bags and 3655 candidates) and testing (24 cases with 69 PE bags and 1857 candidates). [sent-190, score-0.299]
65 The test group was sequestered and only used to evaluate the performance of the final system. [sent-191, score-0.051]
66 Colorectal cancer is the third most common cancer in both men and women. [sent-193, score-0.112]
67 It is estimated that in 2004 nearly 147,000 cases of colon and rectal cancer will be diagnosed in the US, and more than 56,730 people will die from colon cancer [13]. [sent-194, score-0.554]
68 CT colonography is emerging as a new procedure to help in early detection of colon polyps. [sent-195, score-0.276]
69 However, reading through a large CT dataset, which typically consists of two CT series of the patient in prone and supine positions, each with several hundred slices, is time-consuming. [sent-196, score-0.072]
70 Colon CAD [14] can play a critical role in helping the radiologist avoid missing colon polyps. [sent-197, score-0.357]
71 Moreover, for large polyps, a typical candidate generation algorithm generates several candidates across the polyp surface. [sent-199, score-0.266]
72 The test group was sequestered and only used to evaluate the performance of the final system. [sent-202, score-0.051]
73 Note that CAD performance is only clinically relevant in the acceptable range: < 10 FP/patient for PE, < 5 FP/volume for Colon (generally there are 2 volumes per patient). [sent-206, score-0.036]
74 Among MIL algorithms, for the PE data, CH-FD was roughly 2 times and 9 times as fast as IAPR and EMDD respectively, and for the much larger colon dataset was roughly 85 times and 2000 times faster, respectively (see Table 1). [sent-234, score-0.253]
75 [Figure 2 caption] ROC curves obtained for (left) PE testing data and (right) Colon testing data. [sent-253, score-0.053]
76 3.2 Experiments on Benchmark Datasets. We compare CH-FD with several state-of-the-art MIL algorithms on 5 benchmark MIL datasets: 2 Musk datasets [9] and 3 Image Annotation datasets [15]. [sent-254, score-0.192]
77 Each of these datasets contains both positive and negative bags. [sent-255, score-0.189]
78 CH-FD (and MICA) use just the positive bag information and ignore the negative bag information, in effect, treating each negative instance as a separate bag. [sent-256, score-0.668]
79 All the other MIL algorithms use both the positive and negative bag information. [sent-257, score-0.338]
80 The Musk datasets contain feature vectors describing the surfaces of low-energy shapes from molecules. [sent-258, score-0.075]
81 The goal is to differentiate molecules that smell “musky” from the rest. [sent-260, score-0.097]
82 Approximately half of the molecules are known to smell musky. [sent-261, score-0.097]
83 MUSK1 contains 92 molecules with a total of 476 instances. [sent-263, score-0.063]
84 MUSK2 contains 102 molecules with a total of 6598 instances. [sent-264, score-0.063]
85 72 of the molecules are shared between the two datasets, but the MUSK2 dataset contains more instances for the shared molecules. [sent-265, score-0.216]
86 Each dataset has 100 positive bags and 100 negative bags. [sent-267, score-0.281]
87 For the Musk datasets our results are based on a Radial Basis Function (RBF) kernel $K(x_i, x_j) = \exp(-\sigma \|x_i - x_j\|^2)$. [sent-269, score-0.181]
88 The kernel space is assumed to be spanned by all the datapoints in the MUSK1 dataset and a subset of the datapoints in the MUSK2 dataset (one tenth of the original training set is randomly selected for this purpose). [sent-270, score-0.219]
89 We follow the benchmark experiment design and report the average accuracy of 10 runs of 10-fold cross-validation. Table 2: Average accuracy on benchmark datasets. [sent-273, score-0.042]
90 Table 2 shows that CH-FD is comparable to other techniques on all datasets, even though it ignores the negative bag information. [sent-313, score-0.285]
91 4 Discussions. Relationship to previous literature on MIL: The Multiple Instance Learning problem described in this paper has been studied widely in the literature [9, 15, 16, 17, 8]. [sent-316, score-0.048]
92 The convex-hull idea presented in this paper to represent each bag is similar in nature to the one presented in [1]. [sent-317, score-0.224]
93 However, in contrast with [1] and many other approaches in the literature [9, 15, 17], our formulation leads to a strongly convex minimization problem that converges to a unique minimizer. [sent-318, score-0.084]
94 Since our algorithm considers each negative instance as an individual bag, its complexity is quadratic in the number of positive instances only, which makes it scalable to large datasets with a large number of negative examples. [sent-319, score-0.341]
95 In our benchmark experiments, the resulting algorithm achieves accuracy that is comparable to the current state of the art, but at a significantly lower run time (typically speed-ups of several orders of magnitude were observed). [sent-326, score-0.042]
96 Related work: Note that as the distance between candidate ROIs increases, the correlations between their features and labels decrease. [sent-327, score-0.092]
97 Thus we jointly classify entire batches of correlated samples both during training and testing. [sent-329, score-0.05]
98 Instead of classifying each sample independently, we use this spatial information along with the features of each candidate to simultaneously classify all the candidate ROIs for a single patient/volume in a joint operation [18]. [sent-330, score-0.205]
99 Computer aided detection of pulmonary embolism on multi-detector CT, 2004. [sent-408, score-0.224]
100 Computerized detection of pulmonary embolism in 3D computed tomographic (CT) images: vessel tracking and segmentation techniques. [sent-421, score-0.157]
wordName wordTfidf (topN-words)
[('mil', 0.646), ('cad', 0.34), ('bag', 0.224), ('colon', 0.221), ('pe', 0.166), ('fd', 0.148), ('candidates', 0.14), ('radiologist', 0.136), ('bags', 0.135), ('rj', 0.106), ('bj', 0.106), ('iapr', 0.102), ('sw', 0.093), ('candidate', 0.092), ('emdd', 0.085), ('hull', 0.083), ('datasets', 0.075), ('fisher', 0.075), ('embolism', 0.068), ('idapr', 0.068), ('musk', 0.068), ('pulmonary', 0.068), ('aided', 0.067), ('molecules', 0.063), ('datapoints', 0.063), ('ct', 0.062), ('mark', 0.062), ('negative', 0.061), ('convex', 0.06), ('discriminant', 0.057), ('auc', 0.056), ('cancer', 0.056), ('malignant', 0.054), ('classi', 0.054), ('positive', 0.053), ('ao', 0.051), ('polyps', 0.051), ('sequestered', 0.051), ('ch', 0.05), ('diagnosis', 0.047), ('instances', 0.046), ('instance', 0.045), ('benchmark', 0.042), ('correctly', 0.042), ('sub', 0.04), ('xj', 0.038), ('patient', 0.038), ('volumes', 0.036), ('representative', 0.034), ('colonography', 0.034), ('cta', 0.034), ('dundar', 0.034), ('elephant', 0.034), ('mica', 0.034), ('polyp', 0.034), ('radiologists', 0.034), ('smell', 0.034), ('supine', 0.034), ('annotation', 0.034), ('traditional', 0.034), ('wisconsin', 0.032), ('dataset', 0.032), ('versions', 0.031), ('ic', 0.031), ('comprised', 0.03), ('ftp', 0.03), ('roi', 0.03), ('rois', 0.03), ('tiger', 0.03), ('hyperplane', 0.029), ('er', 0.029), ('training', 0.029), ('medical', 0.028), ('sites', 0.027), ('receiver', 0.027), ('fung', 0.027), ('subproblems', 0.027), ('krishnapuram', 0.027), ('wt', 0.025), ('brings', 0.025), ('rr', 0.025), ('wc', 0.025), ('spatially', 0.024), ('literature', 0.024), ('generator', 0.024), ('dd', 0.024), ('operating', 0.024), ('slices', 0.024), ('roc', 0.023), ('diagnostic', 0.023), ('madison', 0.023), ('patients', 0.023), ('nonzero', 0.022), ('table', 0.022), ('multi', 0.022), ('fix', 0.022), ('potentially', 0.021), ('classify', 0.021), ('detection', 0.021), ('marked', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 140 nips-2006-Multiple Instance Learning for Computer Aided Diagnosis
Author: Murat Dundar, Balaji Krishnapuram, R. B. Rao, Glenn M. Fung
Abstract: Many computer aided diagnosis (CAD) problems can be best modelled as a multiple-instance learning (MIL) problem with unbalanced data: i.e. , the training data typically consists of a few positive bags, and a very large number of negative instances. Existing MIL algorithms are much too computationally expensive for these datasets. We describe CH, a framework for learning a Convex Hull representation of multiple instances that is significantly faster than existing MIL algorithms. Our CH framework applies to any standard hyperplane-based learning algorithm, and for some algorithms, is guaranteed to find the global optimal solution. Experimental studies on two different CAD applications further demonstrate that the proposed algorithm significantly improves diagnostic accuracy when compared to both MIL and traditional classifiers. Although not designed for standard MIL problems (which have both positive and negative bags and relatively balanced datasets), comparisons against other MIL methods on benchmark problems also indicate that the proposed method is competitive with the state-of-the-art.
2 0.062591121 130 nips-2006-Max-margin classification of incomplete data
Author: Gal Chechik, Geremy Heitz, Gal Elidan, Pieter Abbeel, Daphne Koller
Abstract: We consider the problem of learning classifiers for structurally incomplete data, where some objects have a subset of features inherently absent due to complex relationships between the features. The common approach for handling missing features is to begin with a preprocessing phase that completes the missing features, and then use a standard classification procedure. In this paper we show how incomplete data can be classified directly without any completion of the missing features using a max-margin learning framework. We formulate this task using a geometrically-inspired objective function, and discuss two optimization approaches: The linearly separable case is written as a set of convex feasibility problems, and the non-separable case has a non-convex objective that we optimize iteratively. By avoiding the pre-processing phase in which the data is completed, these approaches offer considerable computational savings. More importantly, we show that by elegantly handling complex patterns of missing values, our approach is both competitive with other methods when the values are missing at random and outperforms them when the missing values have non-trivial structure. We demonstrate our results on two real-world problems: edge prediction in metabolic pathways, and automobile detection in natural images. 1
3 0.055725537 136 nips-2006-Multi-Instance Multi-Label Learning with Application to Scene Classification
Author: Zhi-hua Zhou, Min-ling Zhang
Abstract: In this paper, we formalize multi-instance multi-label learning, where each training example is associated with not only multiple instances but also multiple class labels. Such a problem can occur in many real-world tasks, e.g. an image usually contains multiple patches each of which can be described by a feature vector, and the image can belong to multiple categories since its semantics can be recognized in different ways. We analyze the relationship between multi-instance multi-label learning and the learning frameworks of traditional supervised learning, multiinstance learning and multi-label learning. Then, we propose the M IML B OOST and M IML S VM algorithms which achieve good performance in an application to scene classification. 1
4 0.047979813 78 nips-2006-Fast Discriminative Visual Codebooks using Randomized Clustering Forests
Author: Frank Moosmann, Bill Triggs, Frederic Jurie
Abstract: Some of the most effective recent methods for content-based image classification work by extracting dense or sparse local image descriptors, quantizing them according to a coding rule such as k-means vector quantization, accumulating histograms of the resulting “visual word” codes over the image, and classifying these with a conventional classifier such as an SVM. Large numbers of descriptors and large codebooks are needed for good results and this becomes slow using k-means. We introduce Extremely Randomized Clustering Forests – ensembles of randomly created clustering trees – and show that these provide more accurate results, much faster training and testing and good resistance to background clutter in several state-of-the-art image classification tasks. 1
5 0.047514506 64 nips-2006-Data Integration for Classification Problems Employing Gaussian Process Priors
Author: Mark Girolami, Mingjun Zhong
Abstract: By adopting Gaussian process priors a fully Bayesian solution to the problem of integrating possibly heterogeneous data sets within a classification setting is presented. Approximate inference schemes employing Variational & Expectation Propagation based methods are developed and rigorously assessed. We demonstrate our approach to integrating multiple data sets on a large scale protein fold prediction problem where we infer the optimal combinations of covariance functions and achieve state-of-the-art performance without resorting to any ad hoc parameter tuning and classifier combination. 1
6 0.046075221 186 nips-2006-Support Vector Machines on a Budget
7 0.044310264 153 nips-2006-Online Clustering of Moving Hyperplanes
8 0.041761972 50 nips-2006-Chained Boosting
9 0.041147828 106 nips-2006-Large Margin Hidden Markov Models for Automatic Speech Recognition
10 0.040943954 65 nips-2006-Denoising and Dimension Reduction in Feature Space
11 0.040898178 62 nips-2006-Correcting Sample Selection Bias by Unlabeled Data
12 0.04087083 84 nips-2006-Generalized Regularized Least-Squares Learning with Predefined Features in a Hilbert Space
13 0.039979808 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning
14 0.038718801 74 nips-2006-Efficient Structure Learning of Markov Networks using $L 1$-Regularization
15 0.038185813 75 nips-2006-Efficient sparse coding algorithms
16 0.036602661 66 nips-2006-Detecting Humans via Their Pose
17 0.036349349 164 nips-2006-Randomized PCA Algorithms with Regret Bounds that are Logarithmic in the Dimension
18 0.036083896 138 nips-2006-Multi-Task Feature Learning
19 0.036047783 169 nips-2006-Relational Learning with Gaussian Processes
20 0.035731059 28 nips-2006-An Efficient Method for Gradient-Based Adaptation of Hyperparameters in SVM Models
topicId topicWeight
[(0, -0.134), (1, 0.031), (2, 0.013), (3, -0.004), (4, -0.025), (5, 0.032), (6, -0.048), (7, 0.004), (8, -0.001), (9, -0.007), (10, -0.074), (11, 0.03), (12, -0.014), (13, -0.023), (14, 0.023), (15, 0.048), (16, 0.014), (17, -0.03), (18, 0.003), (19, 0.029), (20, 0.023), (21, -0.002), (22, 0.03), (23, 0.024), (24, 0.034), (25, -0.036), (26, -0.027), (27, -0.04), (28, 0.053), (29, -0.016), (30, -0.001), (31, -0.044), (32, 0.06), (33, 0.032), (34, 0.039), (35, 0.028), (36, 0.072), (37, 0.12), (38, -0.033), (39, -0.05), (40, 0.002), (41, -0.014), (42, 0.086), (43, 0.003), (44, 0.049), (45, 0.016), (46, -0.098), (47, 0.016), (48, 0.141), (49, 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 0.87170702 140 nips-2006-Multiple Instance Learning for Computer Aided Diagnosis
Author: Murat Dundar, Balaji Krishnapuram, R. B. Rao, Glenn M. Fung
Abstract: Many computer aided diagnosis (CAD) problems can be best modelled as a multiple-instance learning (MIL) problem with unbalanced data: i.e. , the training data typically consists of a few positive bags, and a very large number of negative instances. Existing MIL algorithms are much too computationally expensive for these datasets. We describe CH, a framework for learning a Convex Hull representation of multiple instances that is significantly faster than existing MIL algorithms. Our CH framework applies to any standard hyperplane-based learning algorithm, and for some algorithms, is guaranteed to find the global optimal solution. Experimental studies on two different CAD applications further demonstrate that the proposed algorithm significantly improves diagnostic accuracy when compared to both MIL and traditional classifiers. Although not designed for standard MIL problems (which have both positive and negative bags and relatively balanced datasets), comparisons against other MIL methods on benchmark problems also indicate that the proposed method is competitive with the state-of-the-art.
2 0.52283782 130 nips-2006-Max-margin classification of incomplete data
Author: Gal Chechik, Geremy Heitz, Gal Elidan, Pieter Abbeel, Daphne Koller
Abstract: We consider the problem of learning classifiers for structurally incomplete data, where some objects have a subset of features inherently absent due to complex relationships between the features. The common approach for handling missing features is to begin with a preprocessing phase that completes the missing features, and then use a standard classification procedure. In this paper we show how incomplete data can be classified directly without any completion of the missing features using a max-margin learning framework. We formulate this task using a geometrically-inspired objective function, and discuss two optimization approaches: The linearly separable case is written as a set of convex feasibility problems, and the non-separable case has a non-convex objective that we optimize iteratively. By avoiding the pre-processing phase in which the data is completed, these approaches offer considerable computational savings. More importantly, we show that by elegantly handling complex patterns of missing values, our approach is both competitive with other methods when the values are missing at random and outperforms them when the missing values have non-trivial structure. We demonstrate our results on two real-world problems: edge prediction in metabolic pathways, and automobile detection in natural images. 1
3 0.49701625 105 nips-2006-Large Margin Component Analysis
Author: Lorenzo Torresani, Kuang-chih Lee
Abstract: Metric learning has been shown to significantly improve the accuracy of k-nearest neighbor (kNN) classification. In problems involving thousands of features, distance learning algorithms cannot be used due to overfitting and high computational complexity. In such cases, previous work has relied on a two-step solution: first apply dimensionality reduction methods to the data, and then learn a metric in the resulting low-dimensional subspace. In this paper we show that better classification performance can be achieved by unifying the objectives of dimensionality reduction and metric learning. We propose a method that solves for the low-dimensional projection of the inputs, which minimizes a metric objective aimed at separating points in different classes by a large margin. This projection is defined by a significantly smaller number of parameters than metrics learned in input space, and thus our optimization reduces the risks of overfitting. Theory and results are presented for both a linear as well as a kernelized version of the algorithm. Overall, we achieve classification rates similar, and in several cases superior, to those of support vector machines. 1
4 0.47050831 186 nips-2006-Support Vector Machines on a Budget
Author: Ofer Dekel, Yoram Singer
Abstract: The standard Support Vector Machine formulation does not provide its user with the ability to explicitly control the number of support vectors used to define the generated classifier. We present a modified version of SVM that allows the user to set a budget parameter B and focuses on minimizing the loss attained by the B worst-classified examples while ignoring the remaining examples. This idea can be used to derive sparse versions of both L1-SVM and L2-SVM. Technically, we obtain these new SVM variants by replacing the 1-norm in the standard SVM formulation with various interpolation-norms. We also adapt the SMO optimization algorithm to our setting and report on some preliminary experimental results. 1
5 0.46537641 28 nips-2006-An Efficient Method for Gradient-Based Adaptation of Hyperparameters in SVM Models
Author: S. S. Keerthi, Vikas Sindhwani, Olivier Chapelle
Abstract: We consider the task of tuning hyperparameters in SVM models based on minimizing a smooth performance validation function, e.g., smoothed k-fold crossvalidation error, using non-linear optimization techniques. The key computation in this approach is that of the gradient of the validation function with respect to hyperparameters. We show that for large-scale problems involving a wide choice of kernel-based models and validation functions, this computation can be very efficiently done; often within just a fraction of the training time. Empirical results show that a near-optimal set of hyperparameters can be identified by our approach with very few training rounds and gradient computations. . 1
6 0.4617022 119 nips-2006-Learning to Rank with Nonsmooth Cost Functions
7 0.46114722 63 nips-2006-Cross-Validation Optimization for Large Scale Hierarchical Classification Kernel Methods
8 0.43753028 136 nips-2006-Multi-Instance Multi-Label Learning with Application to Scene Classification
9 0.42038399 153 nips-2006-Online Clustering of Moving Hyperplanes
10 0.41944987 78 nips-2006-Fast Discriminative Visual Codebooks using Randomized Clustering Forests
11 0.41379377 24 nips-2006-Aggregating Classification Accuracy across Time: Application to Single Trial EEG
12 0.41254669 178 nips-2006-Sparse Multinomial Logistic Regression via Bayesian L1 Regularisation
13 0.40689585 5 nips-2006-A Kernel Method for the Two-Sample-Problem
14 0.40612069 55 nips-2006-Computation of Similarity Measures for Sequential Data using Generalized Suffix Trees
15 0.40605798 75 nips-2006-Efficient sparse coding algorithms
16 0.39787111 142 nips-2006-Mutagenetic tree Fisher kernel improves prediction of HIV drug resistance from viral genotype
17 0.39515626 50 nips-2006-Chained Boosting
18 0.39145061 166 nips-2006-Recursive Attribute Factoring
19 0.38349172 177 nips-2006-Sparse Kernel Orthonormalized PLS for feature extraction in large data sets
20 0.38253739 47 nips-2006-Boosting Structured Prediction for Imitation Learning
topicId topicWeight
[(1, 0.111), (3, 0.027), (7, 0.063), (9, 0.04), (20, 0.013), (22, 0.07), (44, 0.04), (57, 0.075), (65, 0.057), (69, 0.019), (71, 0.024), (90, 0.022), (99, 0.334)]
simIndex simValue paperId paperTitle
1 0.91626775 89 nips-2006-Handling Advertisements of Unknown Quality in Search Advertising
Author: Sandeep Pandey, Christopher Olston
Abstract: We consider how a search engine should select advertisements to display with search results, in order to maximize its revenue. Under the standard “pay-per-click” arrangement, revenue depends on how well the displayed advertisements appeal to users. The main difficulty stems from new advertisements whose degree of appeal has yet to be determined. Often the only reliable way of determining appeal is exploration via display to users, which detracts from exploitation of other advertisements known to have high appeal. Budget constraints and finite advertisement lifetimes make it necessary to explore as well as exploit. In this paper we study the tradeoff between exploration and exploitation, modeling advertisement placement as a multi-armed bandit problem. We extend traditional bandit formulations to account for budget constraints that occur in search engine advertising markets, and derive theoretical bounds on the performance of a family of algorithms. We measure empirical performance via extensive experiments over real-world data. 1
2 0.80703318 49 nips-2006-Causal inference in sensorimotor integration
Author: Konrad P. Körding, Joshua B. Tenenbaum
Abstract: Many recent studies analyze how data from different modalities can be combined. Often this is modeled as a system that optimally combines several sources of information about the same variable. However, it has long been realized that this information combining depends on the interpretation of the data. Two cues that are perceived by different modalities can have different causal relationships: (1) They can both have the same cause, in this case we should fully integrate both cues into a joint estimate. (2) They can have distinct causes, in which case information should be processed independently. In many cases we will not know if there is one joint cause or two independent causes that are responsible for the cues. Here we model this situation as a Bayesian estimation problem. We are thus able to explain some experiments on visual auditory cue combination as well as some experiments on visual proprioceptive cue integration. Our analysis shows that the problem solved by people when they combine cues to produce a movement is much more complicated than is usually assumed, because they need to infer the causal structure that is underlying their sensory experience.
same-paper 3 0.73221445 140 nips-2006-Multiple Instance Learning for Computer Aided Diagnosis
Author: Murat Dundar, Balaji Krishnapuram, R. B. Rao, Glenn M. Fung
Abstract: Many computer aided diagnosis (CAD) problems can be best modelled as a multiple-instance learning (MIL) problem with unbalanced data: i.e. , the training data typically consists of a few positive bags, and a very large number of negative instances. Existing MIL algorithms are much too computationally expensive for these datasets. We describe CH, a framework for learning a Convex Hull representation of multiple instances that is significantly faster than existing MIL algorithms. Our CH framework applies to any standard hyperplane-based learning algorithm, and for some algorithms, is guaranteed to find the global optimal solution. Experimental studies on two different CAD applications further demonstrate that the proposed algorithm significantly improves diagnostic accuracy when compared to both MIL and traditional classifiers. Although not designed for standard MIL problems (which have both positive and negative bags and relatively balanced datasets), comparisons against other MIL methods on benchmark problems also indicate that the proposed method is competitive with the state-of-the-art.
4 0.47579467 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning
Author: Hamed Valizadegan, Rong Jin
Abstract: Maximum margin clustering was proposed lately and has shown promising performance in recent studies [1, 2]. It extends the theory of support vector machine to unsupervised learning. Despite its good performance, there are three major problems with maximum margin clustering that question its efficiency for real-world applications. First, it is computationally expensive and difficult to scale to large-scale datasets because the number of parameters in maximum margin clustering is quadratic in the number of examples. Second, it requires data preprocessing to ensure that any clustering boundary will pass through the origins, which makes it unsuitable for clustering unbalanced dataset. Third, it is sensitive to the choice of kernel functions, and requires external procedure to determine the appropriate values for the parameters of kernel functions. In this paper, we propose “generalized maximum margin clustering” framework that addresses the above three problems simultaneously. The new framework generalizes the maximum margin clustering algorithm by allowing any clustering boundaries including those not passing through the origins. It significantly improves the computational efficiency by reducing the number of parameters. Furthermore, the new framework is able to automatically determine the appropriate kernel matrix without any labeled data. Finally, we show a formal connection between maximum margin clustering and spectral clustering. We demonstrate the efficiency of the generalized maximum margin clustering algorithm using both synthetic datasets and real datasets from the UCI repository. 1
5 0.47482112 195 nips-2006-Training Conditional Random Fields for Maximum Labelwise Accuracy
Author: Samuel S. Gross, Olga Russakovsky, Chuong B. Do, Serafim Batzoglou
Abstract: We consider the problem of training a conditional random field (CRF) to maximize per-label predictive accuracy on a training set, an approach motivated by the principle of empirical risk minimization. We give a gradient-based procedure for minimizing an arbitrarily accurate approximation of the empirical risk under a Hamming loss function. In experiments with both simulated and real data, our optimization procedure gives significantly better testing performance than several current approaches for CRF training, especially in situations of high label noise. 1
6 0.47459111 65 nips-2006-Denoising and Dimension Reduction in Feature Space
7 0.47133112 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
8 0.46918917 106 nips-2006-Large Margin Hidden Markov Models for Automatic Speech Recognition
9 0.46895269 61 nips-2006-Convex Repeated Games and Fenchel Duality
10 0.46866122 152 nips-2006-Online Classification for Complex Problems Using Simultaneous Projections
11 0.46848851 32 nips-2006-Analysis of Empirical Bayesian Methods for Neuroelectromagnetic Source Localization
12 0.4683153 3 nips-2006-A Complexity-Distortion Approach to Joint Pattern Alignment
13 0.46771577 20 nips-2006-Active learning for misspecified generalized linear models
14 0.46662518 76 nips-2006-Emergence of conjunctive visual features by quadratic independent component analysis
15 0.46569142 84 nips-2006-Generalized Regularized Least-Squares Learning with Predefined Features in a Hilbert Space
16 0.46474531 79 nips-2006-Fast Iterative Kernel PCA
17 0.46414247 175 nips-2006-Simplifying Mixture Models through Function Approximation
18 0.46406776 138 nips-2006-Multi-Task Feature Learning
19 0.46394292 165 nips-2006-Real-time adaptive information-theoretic optimization of neurophysiology experiments
20 0.46345964 8 nips-2006-A Nonparametric Approach to Bottom-Up Visual Saliency