nips nips2010 nips2010-36 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yanjun Han, Qing Tao, Jue Wang
Abstract: In multi-instance learning, there are two kinds of prediction failure, i.e., false negative and false positive. Current research mainly focuses on avoiding the former. We attempt to utilize the geometric distribution of instances inside positive bags to avoid both the former and the latter. Based on kernel principal component analysis, we define a projection constraint for each positive bag to classify its constituent instances far away from the separating hyperplane while placing positive instances and negative instances on opposite sides. We apply the Constrained Concave-Convex Procedure to solve the resulting problem. Empirical results demonstrate that our approach offers improved generalization performance.
Reference: text
sentIndex sentText sentNum sentScore
1 We attempt to utilize the geometric distribution of instances inside positive bags to avoid both the former and the latter. [sent-10, score-0.748]
2 Based on kernel principal component analysis, we define a projection constraint for each positive bag to classify its constituent instances far away from the separating hyperplane while placing positive instances and negative instances on opposite sides. [sent-11, score-1.864]
3 We apply the Constrained Concave-Convex Procedure to solve the resulting problem. [sent-12, score-0.058]
4 Multi-instance learning was first introduced in [1] to predict the binding ability of a drug from its biochemical structure. [sent-16, score-0.139]
5 A certain drug molecule corresponds to a set of conformations which cannot be differentiated via chemical experiments. [sent-17, score-0.185]
6 A drug is labeled positive if any of its constituent conformations has binding ability greater than the threshold, and negative otherwise. [sent-18, score-0.415]
7 Therefore, each sample (a drug) is a bag of instances (its constituent conformations). [sent-19, score-0.589]
8 In multi-instance learning the label information for positive samples is incomplete in that the instances in a certain positive bag are all labeled positive. [sent-20, score-0.798]
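As a concrete, purely illustrative sketch of this setting (the class name, field names and toy feature dimension below are my own assumptions, not taken from the paper):

```python
import numpy as np

# A bag is a set of instances (feature vectors) sharing one bag-level label.
# Only the bag label is observed: a positive bag is assumed to contain at least
# one positive instance, while every instance of a negative bag is negative.
class Bag:
    def __init__(self, instances, label):
        self.instances = np.asarray(instances)  # shape (n_i, p)
        self.label = label                      # +1 or -1, given per bag only

# Toy data: two bags of 3-dimensional instances (dimensions are arbitrary here).
bags = [
    Bag([[0.1, 0.9, 0.2], [0.8, 0.1, 0.7]], label=+1),  # contains some witness
    Bag([[0.2, 0.2, 0.1], [0.3, 0.1, 0.2]], label=-1),  # all instances negative
]
```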
9 Generally, methods for multi-instance learning are modified versions of supervised learning approaches, shifting the focus from discrimination on instances to discrimination on bags. [sent-21, score-0.269]
10 Examples include image classification [11], document categorization [12], computer aided diagnosis [13], etc. [sent-26, score-0.107]
11 As far as positive bags are concerned, current research usually treats them as labyrinths in which witnesses (responsible positive instances) are encaged, and considers nonwitnesses (other instances) therein to be useless or even distracting. [sent-27, score-1.058]
12 The information carried by nonwitnesses is not well utilized. [sent-28, score-0.469]
13 In fact, nonwitnesses are indispensable for characterizing the overall instance distribution, and thus help to improve the learner. [sent-29, score-0.513]
14 Several researchers realized the importance of nonwitnesses and attempted to utilize them. [sent-30, score-0.498]
15 In MI-kernels [5] and reg-SVM [6], nonwitnesses together with witnesses are squeezed into the kernel matrix. [sent-31, score-0.546]
16 In mi-SVM [4], the labels of all nonwitnesses are treated as unknown integer variables to be optimized. [sent-32, score-0.469]
17 mi-SVM tends to misclassify negative instances in positive bags since the resulting margin will be larger. [sent-33, score-0.92]
18 In MissSVM [7] and stMIL [8], multi-instance learning is addressed from the view of semisupervised learning, and nonwitnesses are treated as unlabeled data, whose labels should be assigned to maximize the margin. [sent-36, score-0.469]
19 sbMIL [8] attempts to estimate the ratio of positive instances inside positive bags and utilizes this information in the subsequent classification. [sent-37, score-0.881]
20 1 Figure 1: Illustration of the False Positive Phenomenon: The top image is a positive training sample, and the bottom image is a negative testing sample. [sent-39, score-0.313]
21 The symbols ⊕ and ⊖ denote positive and negative instances, respectively. [sent-40, score-0.227]
22 The point not enveloped is a negative bag of just one instance. [sent-42, score-0.418]
23 The learners f and g are obtained with and without the projection constraint, respectively. [sent-44, score-0.107]
24 The neglect of nonwitnesses in positive bags may lead to false positives and cause a model to misclassify unseen negative samples. [sent-47, score-1.323]
25 For example, in natural scene classification, each image is segmented into a bag of instances beforehand, and each instance is a patch (ROI, Region Of Interest) characterized by one feature vector describing its color. [sent-48, score-0.595]
26 The task is to predict whether an image contains a waterfall or not (Figure 1). [sent-49, score-0.128]
27 A positive image contains some positive instances corresponding to waterfall and some negative instances from other categories such as sky, stone, grass, etc. [sent-50, score-1.026]
28 , while a negative bag exclusively contains negative instances from other categories. [sent-51, score-0.696]
29 Naturally, some negative instances (patches) only exist in positive bags. [sent-52, score-0.496]
30 For instance, the end of a waterfall is often surrounded by mist. [sent-53, score-0.085]
31 The aforementioned approaches tend to misclassify negative instances in positive bags. [sent-54, score-0.571]
32 Given an unseen image with cirrus clouds and without a waterfall, the obtained learner will misclassify this image as positive because cirrus clouds and mist are similar to each other. [sent-56, score-0.564]
33 To avoid both false negatives and false positives, we attempt to classify instances inside positive bags far from the separating hyperplane and place positive and negative instances on opposite sides. [sent-57, score-1.672]
34 We achieve this by introducing projection constraints based on kernel principal component analysis into MI-SVM [4]. [sent-58, score-0.188]
35 Each constraint is defined on a positive bag to encourage large variance of its constituent instances along the normal direction of the separating hyperplane. [sent-59, score-0.862]
36 In Section 3 we present the projection constraint and the corresponding formulation for multi-instance learning. [sent-62, score-0.163]
37 Let $\mathcal{X} \subseteq \mathbb{R}^p$ be the space containing instances and $D = \{(B_i, y_i)\}_{i=1}^{m}$ be the training data, where $B_i$ is the $i$-th bag of instances $\{x_{i1}, \cdots, x_{in_i}\}$ and $y_i \in \mathcal{Y}$ is the label for $B_i$. [sent-67, score-0.777]
38 In addition, denote the index set for instances xij of Bi by Ii . [sent-69, score-0.51]
39 The task is to train a learner to predict the label of an unseen bag. [sent-70, score-0.085]
40 Denote the index sets for positive and negative bags by I+ and I− respectively. [sent-72, score-0.518]
41 Without loss of generality, assume that the instances are ordered in the sequence $\{x_{11}, \cdots, x_{1n_1}, \cdots, x_{m1}, \cdots, x_{mn_m}\}$. [sent-73, score-0.269]
42 We index instances by the function $I(x_{ij}) = \sum_{k=1}^{i-1} n_k + j$. [sent-74, score-0.296]
43 And $I(B_i)$ returns the vector $\big(\sum_{k=1}^{i-1} n_k + 1, \cdots, \sum_{k=1}^{i-1} n_k + n_i\big)$. [sent-75, score-0.2]
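A minimal sketch of this global indexing (helper names are mine; indices are 1-based as in the text):

```python
# n = [n_1, ..., n_m]: number of instances per bag, in the fixed ordering
# {x_11, ..., x_1n_1, ..., x_m1, ..., x_mn_m}.
def instance_index(n, i, j):
    """Flat index I(x_ij) = sum_{k<i} n_k + j."""
    return sum(n[:i - 1]) + j

def bag_indices(n, i):
    """Vector I(B_i) = (sum_{k<i} n_k + 1, ..., sum_{k<i} n_k + n_i)."""
    offset = sum(n[:i - 1])
    return [offset + j for j in range(1, n[i - 1] + 1)]

# Example: with bag sizes [2, 3], instance x_21 has flat index 3.
assert instance_index([2, 3], 2, 1) == 3
assert bag_indices([2, 3], 2) == [3, 4, 5]
```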
44 However, if both the objective function and the constraints take the form of a difference between two convex functions, then a non-convex problem can be solved efficiently by the constrained concave-convex procedure [14]. [sent-78, score-0.094]
45 The problem takes the form $\min_{x}\, f_0(x) - g_0(x)$ s.t. $f_i(x) - g_i(x) \le c_i,\ i = 1, \cdots, m$ (1), where $f_i, g_i$ ($i = 0, \cdots, m$) are real-valued, convex and differentiable functions on $\mathbb{R}^n$. [sent-82, score-0.218]
46 At the $(t+1)$-th iteration, the non-convex parts in the objective and constraints are substituted by their first-order Taylor expansions, and the resulting optimization problem is $\min_{x}\; f_0(x) - \big[g_0(x^{(t)}) + \nabla g_0(x^{(t)})^T (x - x^{(t)})\big]$ (2), subject to [sent-84, score-0.094]
47 $f_i(x) - \big[g_i(x^{(t)}) + \nabla g_i(x^{(t)})^T (x - x^{(t)})\big] \le c_i$, where $x^{(t)}$ is the optimal solution to (2) at the $t$-th iteration. [sent-86, score-0.124]
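As a rough, generic illustration of this iteration (not the paper's implementation; the function and gradient arguments are placeholders, and scipy's SLSQP stands in for whatever convex solver is used on each subproblem):

```python
import numpy as np
from scipy.optimize import minimize

def cccp(f0, g0, grad_g0, dc_constraints, x0, max_iter=50, tol=1e-6):
    """Sketch of the constrained concave-convex procedure.

    Minimizes f0(x) - g0(x) subject to fi(x) - gi(x) <= ci, with all fi, gi convex.
    dc_constraints is a list of tuples (fi, gi, grad_gi, ci).  At each iteration the
    concave parts (-g) are replaced by their first-order Taylor expansion at the
    current iterate and the resulting convex problem is handed to a generic solver.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        xt = x.copy()

        def objective(z):
            return f0(z) - (g0(xt) + grad_g0(xt) @ (z - xt))

        cons = [{"type": "ineq",
                 "fun": lambda z, fi=fi, gi=gi, dgi=dgi, ci=ci:
                     ci - (fi(z) - gi(xt) - dgi(xt) @ (z - xt))}
                for fi, gi, dgi, ci in dc_constraints]

        x = minimize(objective, xt, method="SLSQP", constraints=cons).x
        if np.linalg.norm(x - xt) < tol:
            break
    return x
```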
48 $\max_{j \in I_i}\,(w^T x_{ij} + b) \ge 1 - \xi_i$, $\xi_i \ge 0$, $i \in I_+$; $-w^T x_{ij} - b \ge 1 - \xi_{ij}$, $\xi_{ij} \ge 0$, $j \in I_i$, $i \in I_-$. Compared with the conventional SVM, in MI-SVM the notion of slack variables for positive samples is extended from individual instances to bags while that for negative samples remains unchanged. [sent-92, score-1.303]
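To make the role of the max constraint concrete, a small sketch (with a hypothetical linear scorer and my own names) of how, for a fixed (w, b), only the witness of each positive bag drives the bag-level slack:

```python
import numpy as np

def bag_slacks(w, b, positive_bags):
    """For each positive bag, the constraint max_j(w^T x_ij + b) >= 1 - xi_i is
    determined by the single highest-scoring instance (the witness)."""
    slacks, witnesses = [], []
    for X in positive_bags:            # X: array of shape (n_i, p)
        scores = X @ w + b
        witness = int(np.argmax(scores))
        slacks.append(max(0.0, 1.0 - scores[witness]))
        witnesses.append(witness)
    return slacks, witnesses

# Example: the second instance is the witness; the first one never matters here.
w, b = np.array([1.0, 0.0]), 0.0
print(bag_slacks(w, b, [np.array([[0.2, 0.5], [1.4, -0.3]])]))  # ([0.0], [1])
```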
49 As shown by the first set of max constraints, only the “most positive” instance in a positive bag, or the witness, could affect the margin. [sent-93, score-0.209]
50 And other instances, or nonwitnesses, become irrelevant for determining the position of the separating plane once the witness is specified. [sent-94, score-0.165]
51 At first sight, the max constraint seems to embody the characteristic of multi-instance learning well. [sent-95, score-0.088]
52 Indeed, it helps to avoid false negatives, i.e., misclassifying positive bags as negative. [sent-96, score-0.097]
53 However, it may incur false positives for the following two reasons. [sent-98, score-0.23]
54 Firstly, the max constraint aims at discovering the witness, and tends to skip nonwitnesses. [sent-99, score-0.088]
55 Thus each positive bag is effectively oversimplified to a single pattern, i.e., its witness. [sent-100, score-0.372]
56 Secondly, due to the characteristic of the max function and the greediness of optimization methods, the predictions of nonwitnesses are often adjusted above zero in the learning process. [sent-104, score-0.501]
57 Nevertheless, many nonwitnesses in positive bags are in fact negative instances. [sent-106, score-1.03]
58 For example, in natural scene classification, many image patches in a positive bag are from the irrelevant background; in document categorization, many posts in a positive bag are not from the target category. [sent-107, score-0.787]
59 Hence, many nonwitnesses are mislabeled as positive, and we obtain a falsely large margin. [sent-108, score-0.469]
60 As shown in Figure 1, MI-SVM classifies half of the instances in the training sample as positive, and some negative instances are mislabeled. [sent-109, score-0.632]
61 Any approach that does not properly utilize nonwitnesses has the same problem. [sent-113, score-0.492]
62 Firstly, we treat the labels of all nonwitnesses as unknown integer variables to be optimized. [sent-115, score-0.469]
63 $y_{ij}(w^T x_{ij} + b) \ge 1 - \xi_{ij}$, $\xi_{ij} \ge 0$, $j \in I_i$, $i \in I_+$; $\sum_{j \in I_i} \frac{y_{ij} + 1}{2} \ge 1$, $i \in I_+$; $-w^T x_{ij} - b \ge 1 - \xi_{ij}$, $\xi_{ij} \ge 0$, $j \in I_i$, $i \in I_-$. It seems that assigning labels over all nonwitnesses should lead to a reasonable model. [sent-118, score-1.029]
64 Nevertheless, nonwitnesses are usually labeled positive since the resulting margin will be larger. [sent-119, score-0.626]
65 For every instance in positive bags, two slack variables are introduced, measuring the distances from the instance to the positive boundary f (x) = +1 and the negative boundary f (x) = −1 respectively, and the label of the instance depends on the smaller slack variable. [sent-123, score-0.56]
66 Still, there is no mechanism in sbMIL to avoid false positives. [sent-128, score-0.097]
67 Although misclassification of nonwitnesses is alleviated since at least the “most negative” nonwitness is classified correctly, the information carried by most nonwitnesses is not fully utilized. [sent-130, score-0.938]
68 Besides, this solution is not appropriate for applications in which positive bags contain only positive instances. [sent-132, score-0.557]
69 The third solution is the projection constraint proposed in this paper. [sent-133, score-0.163]
70 In a maximum margin framework we want to classify instances in a positive bag far away from the separating hyperplane while placing positive and negative instances on opposite sides. [sent-134, score-1.575]
71 From another point of view, in the feature (kernel) space, we want to maximize the variance of instances in a positive bag along w, the normal vector of the separating hyperplane. [sent-135, score-0.725]
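Written out (a standard kernel-space identity rather than a formula quoted from the paper; here $f$ is assumed normalized to unit length and $\phi(m_i)$ denotes the centroid of bag $B_i$ in the feature space), the quantity to be maximized for a positive bag $B_i$ is

$$
\mathrm{Var}_i(f) \;=\; \frac{1}{n_i}\sum_{j=1}^{n_i}\big\langle f,\ \phi(x_{ij})-\phi(m_i)\big\rangle^2 \;=\; \frac{1}{n_i}\sum_{j=1}^{n_i} c_j^{\,2},
$$

which ties the geometric picture to the projection distances $|c_j|$ used below.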
72 Points not enveloped are negative bags of just one instance. [sent-146, score-0.47]
73 The learners f and g are obtained with and without the projection constraint, respectively. [sent-148, score-0.161]
74 ⊕ and ⊖ denote positive and negative instances, respectively. [sent-150, score-0.54]
75 Given a positive bag $B_i$, denote its instances by $\{x_{ij}\}_{j=1}^{n_i}$, and denote the normal vector of the separating plane in the RKHS by $f$. [sent-155, score-0.76]
76 $|c_j|$ is the distance from $\phi(m_i)$ to the projection point of $\phi(x_{ij})$. [sent-157, score-0.107]
77 Instead, we average (10) by the bag size ni , and use a common threshold λ to bound the averaged projection distance for different bags from above. [sent-161, score-0.783]
78 We name the obtained inequality “the projection constraint”, as follows: $\frac{1}{n_i}\Big(o_i - \frac{\alpha^T L_i^2 \alpha}{\alpha^T K \alpha}\Big) \le \lambda$ (12). This is equivalent to bounding variance of instances in positive bags along $f$ from below [15]. [sent-162, score-1.027]
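The exact $o_i$ and $L_i$ bookkeeping of (10)–(12) is not reproduced here; instead, this hedged sketch computes the equivalent quantity named in the text, the variance of a bag's projections along f (function and argument names are mine):

```python
import numpy as np

def bag_projection_variance(K, alpha, bag_idx):
    """Variance of a positive bag's instances along f = sum_l alpha_l * phi(x_l).

    K        : full kernel matrix over all training instances
    alpha    : kernel expansion coefficients of f
    bag_idx  : flat indices (rows of K) of one positive bag's instances

    The projection of x_ij onto f is k_{I(x_ij)}^T alpha; dividing the centred
    second moment by ||f||^2 = alpha^T K alpha gives the variance along the unit
    normal direction, which constraint (12) effectively bounds from below.
    """
    proj = K[bag_idx, :] @ alpha        # projections of the bag's instances onto f
    centered = proj - proj.mean()       # signed distances c_j to the projected bag mean
    norm2 = float(alpha @ K @ alpha)    # ||f||^2 in the RKHS
    return float(np.mean(centered ** 2) / norm2)
```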
79 Substituting (7) into (6), and adding the projection constraint (12) for each positive bag to the resulting problem, we arrive at the following optimization problem: $\min_{\alpha, b, \xi}\; \frac{1}{2}\alpha^T K \alpha + C\Big[\sum_{i \in I_+} \xi_i + \sum_{j \in I_i,\, i \in I_-} \xi_{ij}\Big]$ (13), subject to [sent-163, score-0.593]
80 $1 - \xi_i - \max_{j \in I_i}\big(k_{I(x_{ij})}^T \alpha + b\big) \le 0$, $\xi_i \ge 0$, $i \in I_+$; $k_{I(x_{ij})}^T \alpha + b \le -1 + \xi_{ij}$, $\xi_{ij} \ge 0$, $j \in I_i$, $i \in I_-$; $\alpha^T (o_i \cdot K - L_i^2)\alpha - \lambda n_i \cdot \alpha^T K \alpha \le 0$, $i \in I_+$.
81 The first set of constraints are all in the form of a difference of two convex functions since the max function is convex. [sent-167, score-0.092]
82 Thus for any $i \in I_+$, $o_i \cdot K - L_i^2$ is positive semi-definite. [sent-169, score-0.081]
83 Consequently, the third set of constraints are all in the form of a difference of two convex functions. [sent-170, score-0.06]
84 Since the max function in the first set of constraints is nonsmooth, we have to replace gradients with subgradients to use the CCCP. [sent-173, score-0.068]
85 $1 - \xi_i - \big(\sum_{j \in I_i} \beta_{ij}\, k_{I(x_{ij})}^T \alpha + b\big) \le 0$, $\xi_i \ge 0$, $i \in I_+$ (19); $k_{I(x_{ij})}^T \alpha + b \le -1 + \xi_{ij}$, $\xi_{ij} \ge 0$, $j \in I_i$, $i \in I_-$; $\alpha^T S_i \alpha - 2\lambda n_i \cdot \alpha^{(t)T} K (\alpha - \alpha^{(t)}) \le 0$, $i \in I_+$, where $S_i = o_i \cdot K - L_i^2$. [sent-178, score-0.334]
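One common choice of subgradient for the max term, sketched below with hypothetical names, is to put all of the weight $\beta_{ij}$ on the current witness (ties could be split arbitrarily or evenly):

```python
import numpy as np

def witness_weights(K, alpha, bag_idx):
    """Subgradient weights beta_ij for max_j k_{I(x_ij)}^T alpha at the current alpha:
    a one-hot vector selecting the highest-scoring instance (the witness)."""
    scores = K[bag_idx, :] @ alpha
    beta = np.zeros(len(bag_idx))
    beta[int(np.argmax(scores))] = 1.0
    return beta
```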
86 The problem (19) is a quadratically constrained quadratic program (QCQP) with a convex objective function and convex constraints, and thus can be readily solved via interior point methods [18]. [sent-179, score-0.082]
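A sketch of how one such per-iteration subproblem could be assembled with a generic convex solver such as CVXPY (the paper itself uses an interior-point method); all inputs below are assumed to be precomputed, the matrices are assumed PSD as argued above, and the names are mine rather than the paper's code:

```python
import cvxpy as cp
import numpy as np

def solve_subproblem(K, pos_wrows, S_list, lin_list, neg_rows, C):
    """Schematic convex subproblem of one CCCP iteration for (19).

    K         : full kernel matrix, assumed PSD
    pos_wrows : for each positive bag, the beta-weighted kernel row
                sum_j beta_ij * K[I(x_ij), :]  (shape: n_pos_bags x N)
    S_list    : PSD matrices S_i = o_i*K - L_i^2, one per positive bag
    lin_list  : (q_i, r_i) pairs giving the linearized concave term q_i @ alpha + r_i
    neg_rows  : kernel rows K[I(x_ij), :] of all negative-bag instances
    """
    N = K.shape[0]
    alpha, b = cp.Variable(N), cp.Variable()
    xi_pos = cp.Variable(pos_wrows.shape[0], nonneg=True)
    xi_neg = cp.Variable(neg_rows.shape[0], nonneg=True)

    objective = 0.5 * cp.quad_form(alpha, cp.psd_wrap(K)) \
        + C * (cp.sum(xi_pos) + cp.sum(xi_neg))
    constraints = [
        1 - xi_pos - (pos_wrows @ alpha + b) <= 0,   # linearized max constraints
        neg_rows @ alpha + b <= -1 + xi_neg,          # negative-instance constraints
    ]
    for S_i, (q_i, r_i) in zip(S_list, lin_list):
        # linearized projection constraint: convex quadratic minus affine part
        constraints.append(cp.quad_form(alpha, cp.psd_wrap(S_i)) - (q_i @ alpha + r_i) <= 0)

    prob = cp.Problem(cp.Minimize(objective), constraints)
    prob.solve()
    return alpha.value, b.value
```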
87 Each drug molecule is a bag of potential conformations (instances). [sent-185, score-0.424]
88 The Musk 1 data set consists of 47 positive bags, 45 negative bags, and 476 instances in total. [sent-186, score-0.25]
89 The Musk 2 data set consists of 39 positive bags, 63 negative bags, and 6598 instances in total. [sent-187, score-0.25]
90 Elephant, tiger and fox are three data sets from image categorization. [sent-189, score-0.123]
91 A bag here is a group of ROIs (Regions Of Interest) drawn from a certain image. [sent-191, score-0.239]
92 Each data set contains 100 positive bags and 100 negative bags, and each ROI, as an instance, is a 230-dimensional vector. [sent-192, score-0.562]
93 The tunable parameters include the kernel parameter (e.g., γ in the RBF kernel) and the bound parameter λ in the projection constraint. [sent-259, score-0.107]
94 Recall that the difference between our approach and MI-SVM is just the projection constraint. [sent-277, score-0.107]
95 The results in Table 1 demonstrate that the strength of nonwitnesses is well utilized via the projection constraint. [sent-279, score-0.576]
96 Each image is regarded as a bag, and the nine-dimensional ROIs (Regions Of Interest) in it are regarded as its constituent instances. [sent-322, score-0.184]
97 The results again suggest that fully utilizing the nonwitnesses is important for multi-instance classification. [sent-330, score-0.492]
98 5 Conclusion We design a projection constraint to fully exploit nonwitnesses in order to avoid false positives. [sent-331, score-0.729]
99 Since our approach is basically MI-SVM with projection constraints, the improved results on real world data sets validate the strength of nonwitnesses. [sent-332, score-0.107]
100 We will introduce the universal projection constraint into other existing approaches for multi-instance learning, and into related learning tasks, such as multi-instance regression, multi-label multi-instance learning, generalized multi-instance learning, etc. [sent-333, score-0.163]
wordName wordTfidf (topN-words)
[('nonwitnesses', 0.469), ('bags', 0.291), ('instances', 0.269), ('ij', 0.253), ('xij', 0.241), ('bag', 0.239), ('kt', 0.197), ('migraph', 0.15), ('sbmil', 0.149), ('stmil', 0.149), ('ni', 0.146), ('ii', 0.136), ('positive', 0.133), ('musk', 0.128), ('projection', 0.107), ('misssvm', 0.107), ('false', 0.097), ('negative', 0.094), ('bi', 0.092), ('enveloped', 0.085), ('waterfall', 0.085), ('separating', 0.084), ('oi', 0.081), ('constituent', 0.081), ('drug', 0.076), ('conformations', 0.075), ('misclassify', 0.075), ('cccp', 0.067), ('corel', 0.064), ('resulted', 0.058), ('gi', 0.057), ('constraint', 0.056), ('citeseer', 0.056), ('learner', 0.054), ('misclassi', 0.049), ('lt', 0.047), ('rkhs', 0.047), ('witness', 0.046), ('kernel', 0.045), ('ki', 0.045), ('dd', 0.044), ('instance', 0.044), ('image', 0.043), ('cirrus', 0.043), ('factually', 0.043), ('mist', 0.043), ('tiger', 0.042), ('fi', 0.04), ('china', 0.039), ('yij', 0.039), ('fox', 0.038), ('biochemical', 0.037), ('rois', 0.037), ('please', 0.037), ('diverse', 0.036), ('constraints', 0.036), ('plane', 0.035), ('mi', 0.035), ('aided', 0.034), ('centralized', 0.034), ('molecule', 0.034), ('qcqp', 0.034), ('slack', 0.034), ('constrained', 0.034), ('li', 0.032), ('witnesses', 0.032), ('max', 0.032), ('hyperplane', 0.032), ('unseen', 0.031), ('wt', 0.031), ('cj', 0.031), ('elephant', 0.031), ('categorization', 0.03), ('regarded', 0.03), ('taylor', 0.03), ('rows', 0.03), ('benchmark', 0.029), ('roi', 0.029), ('utilize', 0.029), ('classi', 0.029), ('page', 0.028), ('cloud', 0.028), ('nk', 0.027), ('tao', 0.027), ('tth', 0.027), ('ji', 0.027), ('concerned', 0.027), ('opposite', 0.027), ('interests', 0.026), ('binding', 0.026), ('aw', 0.026), ('expansions', 0.026), ('classify', 0.026), ('inside', 0.026), ('dietterich', 0.025), ('labeled', 0.024), ('convex', 0.024), ('totally', 0.023), ('utilizing', 0.023), ('firstly', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999946 36 nips-2010-Avoiding False Positive in Multi-Instance Learning
Author: Yanjun Han, Qing Tao, Jue Wang
Abstract: In multi-instance learning, there are two kinds of prediction failure, i.e., false negative and false positive. Current research mainly focuses on avoiding the former. We attempt to utilize the geometric distribution of instances inside positive bags to avoid both the former and the latter. Based on kernel principal component analysis, we define a projection constraint for each positive bag to classify its constituent instances far away from the separating hyperplane while placing positive instances and negative instances on opposite sides. We apply the Constrained Concave-Convex Procedure to solve the resulting problem. Empirical results demonstrate that our approach offers improved generalization performance.
2 0.32254636 52 nips-2010-Convex Multiple-Instance Learning by Estimating Likelihood Ratio
Author: Fuxin Li, Cristian Sminchisescu
Abstract: We propose an approach to multiple-instance learning that reformulates the problem as a convex optimization on the likelihood ratio between the positive and the negative class for each training instance. This is casted as joint estimation of both a likelihood ratio predictor and the target (likelihood ratio variable) for instances. Theoretically, we prove a quantitative relationship between the risk estimated under the 0-1 classification loss, and under a loss function for likelihood ratio. It is shown that likelihood ratio estimation is generally a good surrogate for the 0-1 loss, and separates positive and negative instances well. The likelihood ratio estimates provide a ranking of instances within a bag and are used as input features to learn a linear classifier on bags of instances. Instance-level classification is achieved from the bag-level predictions and the individual likelihood ratios. Experiments on synthetic and real datasets demonstrate the competitiveness of the approach.
3 0.21720135 151 nips-2010-Learning from Candidate Labeling Sets
Author: Jie Luo, Francesco Orabona
Abstract: In many real world applications we do not have access to fully-labeled training data, but only to a list of possible labels. This is the case, e.g., when learning visual classifiers from images downloaded from the web, using just their text captions or tags as learning oracles. In general, these problems can be very difficult. However most of the time there exist different implicit sources of information, coming from the relations between instances and labels, which are usually dismissed. In this paper, we propose a semi-supervised framework to model this kind of problems. Each training sample is a bag containing multi-instances, associated with a set of candidate labeling vectors. Each labeling vector encodes the possible labels for the instances in the bag, with only one being fully correct. The use of the labeling vectors provides a principled way not to exclude any information. We propose a large margin discriminative formulation, and an efficient algorithm to solve it. Experiments conducted on artificial datasets and a real-world images and captions dataset show that our approach achieves performance comparable to an SVM trained with the ground-truth labels, and outperforms other baselines.
4 0.11770437 23 nips-2010-Active Instance Sampling via Matrix Partition
Author: Yuhong Guo
Abstract: Recently, batch-mode active learning has attracted a lot of attention. In this paper, we propose a novel batch-mode active learning approach that selects a batch of queries in each iteration by maximizing a natural mutual information criterion between the labeled and unlabeled instances. By employing a Gaussian process framework, this mutual information based instance selection problem can be formulated as a matrix partition problem. Although matrix partition is an NP-hard combinatorial optimization problem, we show that a good local solution can be obtained by exploiting an effective local optimization technique on a relaxed continuous optimization problem. The proposed active learning approach is independent of employed classification models. Our empirical studies show this approach can achieve comparable or superior performance to discriminative batch-mode active learning methods. 1
5 0.10433047 25 nips-2010-Active Learning by Querying Informative and Representative Examples
Author: Sheng-jun Huang, Rong Jin, Zhi-hua Zhou
Abstract: Most active learning approaches select either informative or representative unlabeled instances to query their labels. Although several active learning algorithms have been proposed to combine the two criteria for query selection, they are usually ad hoc in finding unlabeled instances that are both informative and representative. We address this challenge by a principled approach, termed Q UIRE, based on the min-max view of active learning. The proposed approach provides a systematic way for measuring and combining the informativeness and representativeness of an instance. Extensive experimental results show that the proposed Q UIRE approach outperforms several state-of -the-art active learning approaches. 1
6 0.087466531 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision
7 0.085272573 161 nips-2010-Linear readout from a neural population with partial correlation data
8 0.078227103 275 nips-2010-Transduction with Matrix Completion: Three Birds with One Stone
9 0.074344195 72 nips-2010-Efficient algorithms for learning kernels from multiple similarity matrices with general convex loss functions
10 0.072497927 235 nips-2010-Self-Paced Learning for Latent Variable Models
11 0.068520382 170 nips-2010-Moreau-Yosida Regularization for Grouped Tree Structure Learning
12 0.060689308 124 nips-2010-Inductive Regularized Learning of Kernel Functions
13 0.060200889 228 nips-2010-Reverse Multi-Label Learning
14 0.058719732 149 nips-2010-Learning To Count Objects in Images
15 0.056974288 22 nips-2010-Active Estimation of F-Measures
16 0.055908654 174 nips-2010-Multi-label Multiple Kernel Learning by Stochastic Approximation: Application to Visual Object Recognition
17 0.055718143 63 nips-2010-Distributed Dual Averaging In Networks
18 0.055522636 169 nips-2010-More data means less inference: A pseudo-max approach to structured learning
19 0.054801766 215 nips-2010-Probabilistic Deterministic Infinite Automata
20 0.054064818 1 nips-2010-(RF)^2 -- Random Forest Random Field
topicId topicWeight
[(0, 0.168), (1, 0.065), (2, 0.053), (3, -0.092), (4, 0.063), (5, 0.05), (6, -0.027), (7, -0.066), (8, 0.022), (9, -0.15), (10, -0.034), (11, 0.025), (12, 0.012), (13, 0.087), (14, 0.022), (15, -0.031), (16, -0.228), (17, 0.019), (18, 0.068), (19, 0.028), (20, 0.035), (21, -0.088), (22, -0.013), (23, 0.121), (24, 0.045), (25, 0.125), (26, 0.139), (27, -0.18), (28, -0.214), (29, 0.025), (30, 0.02), (31, 0.086), (32, 0.132), (33, 0.018), (34, -0.119), (35, 0.074), (36, 0.036), (37, 0.022), (38, -0.173), (39, -0.201), (40, 0.121), (41, 0.064), (42, 0.034), (43, -0.02), (44, -0.056), (45, -0.059), (46, 0.073), (47, -0.03), (48, 0.014), (49, 0.052)]
simIndex simValue paperId paperTitle
same-paper 1 0.96059686 36 nips-2010-Avoiding False Positive in Multi-Instance Learning
Author: Yanjun Han, Qing Tao, Jue Wang
Abstract: In multi-instance learning, there are two kinds of prediction failure, i.e., false negative and false positive. Current research mainly focuses on avoiding the former. We attempt to utilize the geometric distribution of instances inside positive bags to avoid both the former and the latter. Based on kernel principal component analysis, we define a projection constraint for each positive bag to classify its constituent instances far away from the separating hyperplane while placing positive instances and negative instances on opposite sides. We apply the Constrained Concave-Convex Procedure to solve the resulting problem. Empirical results demonstrate that our approach offers improved generalization performance.
2 0.80228782 52 nips-2010-Convex Multiple-Instance Learning by Estimating Likelihood Ratio
Author: Fuxin Li, Cristian Sminchisescu
Abstract: We propose an approach to multiple-instance learning that reformulates the problem as a convex optimization on the likelihood ratio between the positive and the negative class for each training instance. This is casted as joint estimation of both a likelihood ratio predictor and the target (likelihood ratio variable) for instances. Theoretically, we prove a quantitative relationship between the risk estimated under the 0-1 classification loss, and under a loss function for likelihood ratio. It is shown that likelihood ratio estimation is generally a good surrogate for the 0-1 loss, and separates positive and negative instances well. The likelihood ratio estimates provide a ranking of instances within a bag and are used as input features to learn a linear classifier on bags of instances. Instance-level classification is achieved from the bag-level predictions and the individual likelihood ratios. Experiments on synthetic and real datasets demonstrate the competitiveness of the approach.
3 0.74814087 151 nips-2010-Learning from Candidate Labeling Sets
Author: Jie Luo, Francesco Orabona
Abstract: In many real world applications we do not have access to fully-labeled training data, but only to a list of possible labels. This is the case, e.g., when learning visual classifiers from images downloaded from the web, using just their text captions or tags as learning oracles. In general, these problems can be very difficult. However most of the time there exist different implicit sources of information, coming from the relations between instances and labels, which are usually dismissed. In this paper, we propose a semi-supervised framework to model this kind of problems. Each training sample is a bag containing multi-instances, associated with a set of candidate labeling vectors. Each labeling vector encodes the possible labels for the instances in the bag, with only one being fully correct. The use of the labeling vectors provides a principled way not to exclude any information. We propose a large margin discriminative formulation, and an efficient algorithm to solve it. Experiments conducted on artificial datasets and a real-world images and captions dataset show that our approach achieves performance comparable to an SVM trained with the ground-truth labels, and outperforms other baselines.
4 0.44016194 23 nips-2010-Active Instance Sampling via Matrix Partition
Author: Yuhong Guo
Abstract: Recently, batch-mode active learning has attracted a lot of attention. In this paper, we propose a novel batch-mode active learning approach that selects a batch of queries in each iteration by maximizing a natural mutual information criterion between the labeled and unlabeled instances. By employing a Gaussian process framework, this mutual information based instance selection problem can be formulated as a matrix partition problem. Although matrix partition is an NP-hard combinatorial optimization problem, we show that a good local solution can be obtained by exploiting an effective local optimization technique on a relaxed continuous optimization problem. The proposed active learning approach is independent of employed classification models. Our empirical studies show this approach can achieve comparable or superior performance to discriminative batch-mode active learning methods. 1
5 0.39164203 25 nips-2010-Active Learning by Querying Informative and Representative Examples
Author: Sheng-jun Huang, Rong Jin, Zhi-hua Zhou
Abstract: Most active learning approaches select either informative or representative unlabeled instances to query their labels. Although several active learning algorithms have been proposed to combine the two criteria for query selection, they are usually ad hoc in finding unlabeled instances that are both informative and representative. We address this challenge by a principled approach, termed Q UIRE, based on the min-max view of active learning. The proposed approach provides a systematic way for measuring and combining the informativeness and representativeness of an instance. Extensive experimental results show that the proposed Q UIRE approach outperforms several state-of -the-art active learning approaches. 1
6 0.38754171 3 nips-2010-A Bayesian Framework for Figure-Ground Interpretation
7 0.34617451 267 nips-2010-The Multidimensional Wisdom of Crowds
8 0.33593774 1 nips-2010-(RF)^2 -- Random Forest Random Field
9 0.33259228 235 nips-2010-Self-Paced Learning for Latent Variable Models
10 0.31746534 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision
11 0.3085683 228 nips-2010-Reverse Multi-Label Learning
12 0.2937372 192 nips-2010-Online Classification with Specificity Constraints
13 0.29146367 287 nips-2010-Worst-Case Linear Discriminant Analysis
14 0.28839663 22 nips-2010-Active Estimation of F-Measures
15 0.28780794 214 nips-2010-Probabilistic Belief Revision with Structural Constraints
16 0.28399557 236 nips-2010-Semi-Supervised Learning with Adversarially Missing Label Information
17 0.27719858 72 nips-2010-Efficient algorithms for learning kernels from multiple similarity matrices with general convex loss functions
18 0.27267602 114 nips-2010-Humans Learn Using Manifolds, Reluctantly
19 0.25361827 249 nips-2010-Spatial and anatomical regularization of SVM for brain image analysis
20 0.25331205 174 nips-2010-Multi-label Multiple Kernel Learning by Stochastic Approximation: Application to Visual Object Recognition
topicId topicWeight
[(13, 0.055), (17, 0.025), (27, 0.036), (30, 0.039), (35, 0.033), (45, 0.188), (50, 0.042), (52, 0.037), (60, 0.052), (74, 0.251), (77, 0.041), (78, 0.066), (90, 0.043)]
simIndex simValue paperId paperTitle
1 0.87133574 106 nips-2010-Global Analytic Solution for Variational Bayesian Matrix Factorization
Author: Shinichi Nakajima, Masashi Sugiyama, Ryota Tomioka
Abstract: Bayesian methods of matrix factorization (MF) have been actively explored recently as promising alternatives to classical singular value decomposition. In this paper, we show that, despite the fact that the optimization problem is non-convex, the global optimal solution of variational Bayesian (VB) MF can be computed analytically by solving a quartic equation. This is highly advantageous over a popular VBMF algorithm based on iterated conditional modes since it can only find a local optimal solution after iterations. We further show that the global optimal solution of empirical VBMF (hyperparameters are also learned from data) can also be analytically computed. We illustrate the usefulness of our results through experiments.
2 0.7948578 159 nips-2010-Lifted Inference Seen from the Other Side : The Tractable Features
Author: Abhay Jha, Vibhav Gogate, Alexandra Meliou, Dan Suciu
Abstract: Lifted Inference algorithms for representations that combine first-order logic and graphical models have been the focus of much recent research. All lifted algorithms developed to date are based on the same underlying idea: take a standard probabilistic inference algorithm (e.g., variable elimination, belief propagation etc.) and improve its efficiency by exploiting repeated structure in the first-order model. In this paper, we propose an approach from the other side in that we use techniques from logic for probabilistic inference. In particular, we define a set of rules that look only at the logical representation to identify models for which exact efficient inference is possible. Our rules yield new tractable classes that could not be solved efficiently by any of the existing techniques. 1
same-paper 3 0.79239511 36 nips-2010-Avoiding False Positive in Multi-Instance Learning
Author: Yanjun Han, Qing Tao, Jue Wang
Abstract: In multi-instance learning, there are two kinds of prediction failure, i.e., false negative and false positive. Current research mainly focuses on avoiding the former. We attempt to utilize the geometric distribution of instances inside positive bags to avoid both the former and the latter. Based on kernel principal component analysis, we define a projection constraint for each positive bag to classify its constituent instances far away from the separating hyperplane while placing positive instances and negative instances on opposite sides. We apply the Constrained Concave-Convex Procedure to solve the resulting problem. Empirical results demonstrate that our approach offers improved generalization performance.
4 0.66535723 154 nips-2010-Learning sparse dynamic linear systems using stable spline kernels and exponential hyperpriors
Author: Alessandro Chiuso, Gianluigi Pillonetto
Abstract: We introduce a new Bayesian nonparametric approach to identification of sparse dynamic linear systems. The impulse responses are modeled as Gaussian processes whose autocovariances encode the BIBO stability constraint, as defined by the recently introduced “Stable Spline kernel”. Sparse solutions are obtained by placing exponential hyperpriors on the scale factors of such kernels. Numerical experiments regarding estimation of ARMAX models show that this technique provides a definite advantage over a group LAR algorithm and state-of-the-art parametric identification techniques based on prediction error minimization. 1
5 0.65674078 132 nips-2010-Joint Cascade Optimization Using A Product Of Boosted Classifiers
Author: Leonidas Lefakis, Francois Fleuret
Abstract: The standard strategy for efficient object detection consists of building a cascade composed of several binary classifiers. The detection process takes the form of a lazy evaluation of the conjunction of the responses of these classifiers, and concentrates the computation on difficult parts of the image which cannot be trivially rejected. We introduce a novel algorithm to construct jointly the classifiers of such a cascade, which interprets the response of a classifier as the probability of a positive prediction, and the overall response of the cascade as the probability that all the predictions are positive. From this noisy-AND model, we derive a consistent loss and a Boosting procedure to optimize that global probability on the training set. Such a joint learning allows the individual predictors to focus on a more restricted modeling problem, and improves the performance compared to a standard cascade. We demonstrate the efficiency of this approach on face and pedestrian detection with standard data-sets and comparisons with reference baselines. 1
6 0.65639555 222 nips-2010-Random Walk Approach to Regret Minimization
7 0.65404159 112 nips-2010-Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning
8 0.65055329 265 nips-2010-The LASSO risk: asymptotic results and real world examples
9 0.6498118 151 nips-2010-Learning from Candidate Labeling Sets
10 0.64943641 195 nips-2010-Online Learning in The Manifold of Low-Rank Matrices
11 0.64821124 74 nips-2010-Empirical Bernstein Inequalities for U-Statistics
12 0.64820111 63 nips-2010-Distributed Dual Averaging In Networks
13 0.64670664 282 nips-2010-Variable margin losses for classifier design
14 0.64614046 7 nips-2010-A Family of Penalty Functions for Structured Sparsity
15 0.6458323 243 nips-2010-Smoothness, Low Noise and Fast Rates
16 0.64579809 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models
17 0.64574832 92 nips-2010-Fast global convergence rates of gradient methods for high-dimensional statistical recovery
18 0.64567959 87 nips-2010-Extended Bayesian Information Criteria for Gaussian Graphical Models
19 0.6450457 138 nips-2010-Large Margin Multi-Task Metric Learning
20 0.64486194 12 nips-2010-A Primal-Dual Algorithm for Group Sparse Regularization with Overlapping Groups