nips nips2011 nips2011-181 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Dan Zhang, Yan Liu, Luo Si, Jian Zhang, Richard D. Lawrence
Abstract: Most existing Multiple-Instance Learning (MIL) algorithms assume data instances and/or data bags are independently and identically distributed. But there often exists rich additional dependency/structure information between instances/bags within many applications of MIL. Ignoring this structure information limits the performance of existing MIL algorithms. This paper formulates this research problem as multiple instance learning on structured data (MILSD) and proposes a novel framework that considers the additional structure information. In particular, an effective and efficient optimization algorithm is proposed to solve the original non-convex optimization problem, using a combination of the Concave-Convex Constraint Programming (CCCP) method and an adapted Cutting Plane method, which deals with the two sets of constraints caused by learning on instances within individual bags and learning on structured data. Our method has a nice convergence property, with a specified precision on each set of constraints. Experimental results on three different applications, i.e., webpage classification, market targeting, and protein fold identification, clearly demonstrate the advantages of the proposed method over state-of-the-art methods.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: Most existing Multiple-Instance Learning (MIL) algorithms assume data instances and/or data bags are independently and identically distributed. [sent-16, score-0.413]
2 This paper formulates this research problem as multiple instance learning on structured data (MILSD) and proposes a novel framework that considers the additional structure information. [sent-19, score-0.103]
3 Experimental results on three different applications, i.e., webpage classification, market targeting, and protein fold identification, clearly demonstrate the advantages of the proposed method over state-of-the-art methods. [sent-24, score-0.355]
4 Introduction: Multiple Instance Learning (MIL) is a variation of classical learning methods for problems with incomplete knowledge about the instances (or examples) [4]. [sent-25, score-0.116]
5 In MIL, labels are assigned to bags, i.e., sets of instances, rather than to individual instances [1, 4, 5, 13]. [sent-28, score-0.116]
6 One major assumption of most existing MIL methods is that instances (and bags) are independently and identically distributed. [sent-30, score-0.116]
7 For example, in business analytics, big corporations often analyze the websites of different companies to look for potential partnerships. [sent-32, score-0.146]
8 Since not all of the webpages in a website are useful, we can treat the whole website of a specific company as a bag, with each webpage in this website considered as an instance. [sent-33, score-0.48]
9 The hyperlinks between webpages provide important information on various relationships between these companies (e.g., supply-demand or joint-selling). [sent-34, score-0.424]
10 More partner companies can be identified if we follow the hyperlinks of existing partners. [sent-36, score-0.259]
11 Another example is protein fold identification [15], whose goal is to predict protein folds with low conservation in primary sequence, e.g., Trx-fold proteins. [sent-37, score-0.479]
12 MIL algorithms have been applied to identify new Trx-fold proteins, where each protein sequence is considered as a bag, and some of its subsequences are instances. [sent-40, score-0.19]
13 The relational information between protein sequences, such as same organism locations or similar species origins, can be used to help the prediction tasks. [sent-41, score-0.201]
14 Several recent methods have been proposed to model the dependencies between instances in each bag [11, 12, 21]. [sent-42, score-0.296]
15 However, none of them takes into consideration the relational structure between bags or between instances across bags. [sent-43, score-0.512]
16 Furthermore, most existing MIL research only uses content similarity for modeling structures between instances in each bag, but does not consider other types of relational structure information (e.g., hyperlinks between instances). [sent-44, score-0.247]
17 While much research work [16, 22] on traditional single-instance learning has demonstrated that additional structure information (e.g., link structure) can significantly improve learning performance, such structure information has rarely been exploited in MIL. [sent-47, score-0.073]
18 Generally speaking, we summarize three scenarios of structure information in MIL: (1) the relational structures are at the instance level. [sent-50, score-0.174]
19 For example, in the business partner example, the hyperlinks between different webpages can be considered as the relational structure between instances (either in the same bag or across bags). [sent-51, score-0.781]
20 (2) the structure information is available on the bag level. [sent-52, score-0.18]
21 For example, in the protein fold identification task, we can use the phylogenetic tree to capture the evolutionary dependencies between protein sequences. [sent-53, score-0.34]
22 (3) the structure information is available on both instance level and bag level. [sent-54, score-0.226]
23 We refer to these three scenarios of learning problems collectively as multiple instance learning on structured data (MILSD). [sent-55, score-0.105]
24 The model consists of a regularization term that confines the capacity of the classifier, a term that penalizes the difference between the predicted labels of the bags and their true labels, and a graph regularization term based on the structure information. [sent-57, score-0.404]
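Read as a single objective, the description above corresponds to something like the following sketch, assembled from the terms defined later in this section; the exact weighting (whether µ sits inside H_G or multiplies it) is an assumption kept consistent with problems (1) and (8):

\min_{w}\ \frac{1}{2}\lVert w \rVert^2 + C\,H_d(w) + H_G(w), \qquad H_d(w) = \frac{1}{n}\sum_{i=1}^{n} \max\Bigl(0,\ 1 - Y_i \max_{j \in B_i} w^{\top} B_{ij}\Bigr).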
25 The novelty of the proposed variant of the Cutting Plane method lies in modeling dual sets of constraints, i.e., one from modeling instances in individual bags and the other from the structure information, and in its ability to control the precisions (i.e., ε1 and ε2 in Table 1) on the different sets of constraints separately. [sent-62, score-0.094] [sent-64, score-0.263] [sent-66, score-0.097]
28 We need to control the precisions separately because different sets of constraints are normally derived from different sources and have different forms, so their characteristics, as well as the required optimization precisions, are likely to be diverse. [sent-67, score-0.217]
29 Furthermore, we prove an upper bound on the convergence rate of the proposed optimization method, a significant result given that our optimization scheme for dual constraint sets can also be applied to many other learning problems. [sent-68, score-0.123]
30 Methodology (Problem Statement and Notation): Suppose we are given a set of n labeled bags {(Bi, Yi), i = 1, 2, ..., n}, u unlabeled bags {Bi, i = n + 1, n + 2, ..., n + u}, and a directed or undirected graph G = (V, E) that depicts the structure between either bags or instances. [sent-71, score-1.072]
31 Here, the instances in bag Bi are denoted as {Bi1, Bi2, ..., Bi,ni} ⊂ X, where ni is the total number of instances in this bag and Yi ∈ {−1, 1}. [sent-72, score-0.269] [sent-75, score-0.269]
33 Each node v ∈ V corresponds to either a bag or an instance in either the labeled set or the unlabeled set, and the j-th edge ej = (p, q) ∈ E represents a link from node p to node q. [sent-76, score-0.344]
34 The task is to learn a classifier w based on the labeled bags, the unlabeled bags, and the predefined structure graph, so that the unlabeled bags can be correctly classified. [sent-77, score-0.074]
35 The soft label for an instance x can be estimated by f(x) = wT x; in particular, f(Bij) = wT Bij for instance Bij. [sent-78, score-0.075]
36 The soft label of the bag Bi can be modeled as f(Bi) = max_{j∈Bi} wT Bij; if f(Bi) > 0, the bag is labeled positive, and negative otherwise. [sent-79, score-0.386]
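A minimal sketch of this scoring rule, assuming NumPy and representing each bag as an array whose rows are instance feature vectors; the function names are illustrative, not from the paper.

import numpy as np

def instance_soft_labels(w, bag):
    # f(B_ij) = w^T B_ij for every instance (row) of the bag
    return bag @ w

def bag_soft_label(w, bag):
    # f(B_i) = max_{j in B_i} w^T B_ij
    return float(np.max(bag @ w))

def bag_hard_label(w, bag):
    # The bag is labeled positive iff f(B_i) > 0, negative otherwise.
    return 1 if bag_soft_label(w, bag) > 0 else -1

For example, with bag of shape (n_i, d) and w of shape (d,), bag_soft_label returns the score of the best-scoring instance, which is exactly the quantity thresholded above.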
37 Formulation: Our motivation is that labeled bags should be correctly classified and that the soft labels of the bags or instances defined on the graph G should be as smooth as possible. [sent-81, score-0.87]
38 Hd (w) penalizes the difference between the estimated bag labels and the given labels. [sent-88, score-0.183]
39 HG(w) is a graph regularization term based on the given graph G that enforces smoothness of the soft labels of the nodes, defined as HG(w) = (µ/|E|) Σ_{(p,q)∈E} w(p,q) | f(vp)/√d(p) − f(vq)/√d(q) |, where vp and vq are the two nodes joined by edge (p,q) and d(·) denotes node degree. [sent-91, score-0.187]
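Under this reconstruction, the graph term can be evaluated as in the sketch below, assuming edges arrive as (p, q, weight) triples, deg maps each node to its degree, and f maps each node to its soft label (a bag or instance score, depending on the scenario); all names are illustrative.

import math

def graph_regularizer(f, deg, edges, mu):
    # H_G = (mu / |E|) * sum over (p, q) in E of w(p,q) * |f(v_p)/sqrt(d(p)) - f(v_q)/sqrt(d(q))|
    total = 0.0
    for p, q, w_pq in edges:
        total += w_pq * abs(f[p] / math.sqrt(deg[p]) - f[q] / math.sqrt(deg[q]))
    return mu * total / len(edges)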
40 Depending on which of the three scenarios the graph is defined in, we name the formulation with the graph defined on instances I-MILSD, the formulation with the graph defined on bags B-MILSD, and the formulation with the graph defined on both bags and instances BI-MILSD. [sent-95, score-1.175]
41 The optimization problem can be written as: min_w (1/2)||w||² + (C/n) Σ_{i=1}^n ξi + (µ/|E|) Σ_{(p,q)∈E} w(p,q) | max_{j∈Bp} wT Bpj / √d(p) − max_{j∈Bq} wT Bqj / √d(q) |, s.t. ∀i ∈ {1, ..., n}: Yi max_{j∈Bi} wT Bij ≥ 1 − ξi, (1) where ξi is the hinge loss and µ is the trade-off parameter for the graph term. [sent-105, score-0.074]
42 The formulation proposed in [1] is a special case of the proposed formulation with µ equal to zero. [sent-106, score-0.119]
43 Optimization Procedure with CCCP and Multi-Constraint Cutting Plane: The formulation in problem (1) combines both the goodness-of-fit on labeled bags and the structure information embedded in the graph. [sent-111, score-0.415]
44 That is, ∀(p,q) ∈ E, ∀k ∈ {1, ..., np}: wT Bpk / √d(p) − wT B_{q u_q^(t)} / √d(q) ≤ ζ(p,q), [sent-132, score-0.225]
45 and ∀(p,q) ∈ E, ∀k ∈ {np + 1, ..., np + nq}: wT B_{q(k−np)} / √d(q) − wT B_{p u_p^(t)} / √d(p) ≤ ζ(p,q), where u_p^(t) denotes the index of the maximizing instance of bag p at the t-th CCCP iteration. [sent-135, score-0.421]
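These linearized constraints are produced by the CCCP step, which freezes the maximizing instance of each bag at the current iterate and solves the resulting convex subproblem; a structural sketch under that reading, where solve_subproblem stands in for the Cutting Plane solver described below and the stopping rule is an assumption.

import numpy as np

def cccp(bags, Y, edges, C, mu, solve_subproblem, n_outer=20, tol=1e-4):
    w = np.zeros(bags[0].shape[1])
    for t in range(n_outer):
        # u_i^(t): index of the highest-scoring instance of bag i under the current w
        u = [int(np.argmax(bag @ w)) for bag in bags]
        # Convex subproblem with each max term replaced by the fixed instance u_i^(t)
        w_new = solve_subproblem(bags, Y, u, edges, C, mu)
        if np.linalg.norm(w_new - w) < tol:
            return w_new
        w = w_new
    return w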
46 However, in many real-world applications, the number of labeled bags, as well as the number of links between bags, is huge. [sent-137, score-0.645]
47 But unlike the method employed in [6], in this paper we need to deal with two sets of constraints, rather than just one constraint set, each with its own specified precision. [sent-140, score-0.178]
48 The benefit of this transformation is that, as we shall see later, at most two constraints will be added during each Cutting Plane iteration, and therefore the final solution will be extremely sparse, with the number of non-zero dual variables independent of the number of training examples. [sent-148, score-0.214]
49 The problem now becomes how to solve problem (3) efficiently: it is convex, but contains two sets of exponentially many constraints due to the large number of feasible c and τ. [sent-149, score-0.127]
50 We present a novel adaptation of the Cutting Plane method that handles the two sets of constraints simultaneously. [sent-150, score-0.097]
51 With these two sets of selected constraints, the solution of the corresponding relaxed problem satisfies all the constraints from problem (3) up to the two precisions ε1 and ε2, i.e., ∀c ∈ {0,1}^n: (1/n) wT Σ_{i=1}^n ci Yi B_{i u_i^(t)} ≥ (1/n) Σ_{i=1}^n ci − (ξ + ε1), and ∀τ ∈ {0,1}^{|E|×(np+nq)} (∀ej ∈ E, Σ_{k=1}^{np+nq} τjk ≤ 1): (1/|E|) wT Σ_{j=1}^{|E|} [ Σ_{k=1}^{np} τjk ( Bpk/√d(p) − B_{q u_q^(t)}/√d(q) ) + Σ_{k=1}^{nq} τ_{j(k+np)} ( Bqk/√d(q) − B_{p u_p^(t)}/√d(p) ) ] ≤ ζ + ε2. [sent-155, score-0.217] [sent-157, score-0.722]
53 This indicates that the two remaining sets of constraints (those not added to Ω1 and Ω2) are violated by at most ε1 and ε2, respectively, and therefore do not need to be added to Ω1 and Ω2 explicitly. [sent-158, score-0.332]
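The separation step behind this check is cheap for the bag-level family: the most violated c simply sets c_i = 1 wherever bag i's margin falls below 1. A sketch, reusing the fixed indices u from the CCCP step; names are illustrative.

import numpy as np

def most_violated_c(w, bags, Y, u):
    # c_i = 1 iff Y_i * w^T B_{i, u_i} < 1, i.e., bag i violates its margin
    margins = np.array([Y[i] * float(bags[i][u[i]] @ w) for i in range(len(bags))])
    return (margins < 1.0).astype(int)

def c_violation(w, bags, Y, u, c):
    # Slack needed by the aggregated constraint for c:
    # (1/n) * sum_i c_i  -  (1/n) * w^T sum_i c_i * Y_i * B_{i, u_i}
    n = len(bags)
    lhs = sum(c[i] * Y[i] * float(bags[i][u[i]] @ w) for i in range(n)) / n
    return float(np.sum(c)) / n - lhs

H1 then holds exactly when c_violation(...) <= xi + eps1 for the most violated c.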
54 Otherwise, c^{t_s} will be added to Ω1^{t_s} if H1^{t_s} is false, and τ^{t_s} will be added to Ω2^{t_s} if H2^{t_s} is false. [sent-169, score-0.687]
55 min_w (1/2)||w||² + Cξ + µζ, s.t. ∀c ∈ Ω1^{t_s}: (1/n) wT Σ_{i=1}^n ci Yi B_{i u_i^(t)} ≥ (1/n) Σ_{i=1}^n ci − ξ; ∀τ ∈ Ω2^{t_s}: (1/|E|) wT Σ_{j=1}^{|E|} [ Σ_{k=1}^{np} τjk ( Bpk/√d(p) − B_{q u_q^(t)}/√d(q) ) + Σ_{k=1}^{nq} τ_{j(k+np)} ( Bqk/√d(q) − B_{p u_p^(t)}/√d(p) ) ] ≤ ζ. (8) [sent-172, score-0.497]
56 Through the following two theorems, we show that the Cutting Plane iterations with the two different sets of constraints converge in a fixed number of steps. [sent-180, score-0.097]
57 Here, t_s denotes the s-th Cutting Plane iteration for solving the problem from the t-th CCCP iteration. [sent-193, score-0.161]
58 (The graph can be built solely on the labeled bags, or on a union of the labeled and unlabeled bags.) [sent-198, score-0.502]
59 Update Ω1^{t_s} by Ω1^{t_{s+1}} = Ω1^{t_s} ∪ {c^{t_s}} if H1^{t_s} is false; otherwise Ω1^{t_{s+1}} = Ω1^{t_s}. [sent-225, score-0.105]
60 Update Ω2^{t_s} by Ω2^{t_{s+1}} = Ω2^{t_s} ∪ {τ^{t_s}} if H2^{t_s} is false; otherwise Ω2^{t_{s+1}} = Ω2^{t_s}. [sent-227, score-0.128]
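Putting the updates together gives the inner loop below: solve the relaxed problem over the current working sets, query both separation oracles, and add a constraint only from a family that is still violated beyond its own tolerance. This is a structural sketch; solve_relaxed (the small QP of problem (8)) and most_violated_tau (the graph-side oracle, returning a constraint and its violation) are stand-ins, and most_violated_c / c_violation are from the sketch above.

def multi_constraint_cutting_plane(bags, Y, u, edges, C, mu, eps1, eps2,
                                   solve_relaxed, most_violated_tau, max_iter=100):
    omega1, omega2 = [], []  # working sets for the c- and tau-constraints
    w, xi, zeta = solve_relaxed(omega1, omega2, bags, Y, u, edges, C, mu)
    for s in range(max_iter):
        c = most_violated_c(w, bags, Y, u)
        tau, tau_gap = most_violated_tau(w, edges, u)
        h1 = c_violation(w, bags, Y, u, c) <= xi + eps1   # H1: c-side eps1-satisfied
        h2 = tau_gap <= zeta + eps2                       # H2: tau-side eps2-satisfied
        if h1 and h2:
            return w  # all remaining constraints hold up to their own precisions
        if not h1:
            omega1.append(c)
        if not h2:
            omega2.append(tau)
        w, xi, zeta = solve_relaxed(omega1, omega2, bags, Y, u, edges, C, mu)
    return w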
61 The most violated constraints are added to both Ω1^{t_s} and Ω2^{t_s}. [sent-247, score-0.148]
62 However, these works do not explicitly consider the case when several different sets of constraints with specified precisions are involved. [sent-256, score-0.217]
63 The novelty of the proposed method lies in its ability to control these optimization precisions separately, while still enjoying the sparseness of the final solution with respect to the number of dual variables, brought about by the slack-variable transformation. [sent-257, score-0.185]
64 But the major limitation of [18] is that it cannot incorporate the relational information into the formulation, and therefore it cannot be used here. [sent-260, score-0.072]
65 Furthermore, [18] does not consider dual sets of constraints in its optimization, which makes it less appropriate than the proposed optimization method. [sent-261, score-0.162]
66 Webpage Classification: In webpage classification, each webpage can be considered as a bag, and its corresponding passages represent its instances [1]. [sent-263, score-0.294]
67 The hyperlinks between different webpages are treated as the additional relational structure/links between different bags. [sent-264, score-0.438]
68 There are in total 8280 webpages in this dataset. [sent-335, score-0.21]
69 Webpages without any incoming or outgoing links are deleted, leaving 6883 webpages. [sent-336, score-0.446]
70 Three categories, i.e., student, course, and faculty, are used for classification, where each sub-dataset contains all of the webpages/bags from one of the three categories and the same number of negative bags randomly sampled from the remaining six categories in WebKB. [sent-339, score-0.297]
71 The hyperlinks between these webpages are used as the structure/link information. [sent-340, score-0.34]
72 To show the effects of the structure information on the performance of MIL methods, we compare the proposed method with the instance-based multiple instance support vector machine (I-miSVM) as well as the bag-based multiple instance support vector machine (B-miSVM) [1]. [sent-346, score-0.146]
73 The formulations of these two methods can be considered as special cases of the proposed method with µ equal to zero; they are two different heuristic iterative ways of implementing the same formulation. [sent-347, score-0.12]
74 Their parameters are also set by 5-fold cross-validation. [sent-348, score-0.117]
75 We conduct experiments with LC-MF based on the single instances that we extract for the same set of examples, and the corresponding links. [sent-350, score-0.116]
76 After computing the latent factors, a linear SVM is trained on the training set, with the hinge-loss parameter C determined by 5-fold cross-validation. [sent-352, score-0.099]
77 For each experiment, a fixed ratio of bags is chosen as the training set, while the remaining examples are treated as the testing set. [sent-353, score-0.421]
78 The average results over 20 independent runs are reported on the training ratios [0. [sent-354, score-0.069]
79 In Table 2, we further report the performances when the training ratio equals 0. [sent-365, score-0.123]
80 Market Targeting: Market targeting is a popular topic for big corporations. [sent-369, score-0.085]
81 One feasible market-targeting strategy is to analyze the websites of potential partners. [sent-371, score-0.169]
82 But usually not all of the webpages are useful for partner identification. [sent-372, score-0.255]
83 So it is better to formulate this as an MIL problem, in which each website is considered as a bag and its associated webpages are considered as instances. [sent-373, score-0.313]
84 Two related companies may be connected through hyperlinks in some of their webpages. [sent-374, score-0.214]
85 In ASCOT, the webpages of around 225 companies are crawled. [sent-376, score-0.294]
86 25 of the companies/bags are labeled as positive, since they are partners of this corporation, while the remaining 200 companies/bags are treated as negative ones. [sent-377, score-0.077]
87 For each company, webpages with fewer than 100 unique words are removed, and at most 50 webpages with the largest numbers of unique words are selected as instances. [sent-378, score-0.42]
88 The hyperlinks between webpages of different companies are treated as the structure information. [sent-379, score-0.477]
89 For each experiment, we fix the training ratio of positive and negative bags, while the remaining bags are considered as the testing set. [sent-380, score-0.423]
90 For LC-MF, experiments are conducted on single instances obtained by averaging the instances in each bag. [sent-391, score-0.232]
91 In Table 2, we report the performances when the training ratio equals 0. [sent-396, score-0.123]
92 On this dataset, B-MILSD performs much better than the comparison methods, especially when the ratio of training examples is low. [sent-398, score-0.098]
93 This is because the hyperlink information helps substantially when content information is rare in MIL, and the MIL setting helps eliminate useless instances, especially when the supervised information is scarce. [sent-399, score-0.197]
94 Protein Fold Identification: In protein fold identification [15], the low conservation of primary sequence in protein superfamilies such as Thioredoxin-fold (Trx-fold) makes conventional modeling methods, such as Hidden Markov Models, difficult to use. [sent-401, score-0.397]
95 MIL can naturally be used to identify new Trx-fold proteins, in which each protein sequence is considered as a bag and some of its subsequences are considered as instances. [sent-402, score-0.262]
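A toy sketch of this bag construction, assuming fixed-length sliding-window subsequences; the window length is illustrative and any featurization of the windows is omitted.

def protein_bag(sequence, window=8):
    # A protein sequence becomes a bag; its length-`window` subsequences are the instances.
    return [sequence[i:i + window] for i in range(len(sequence) - window + 1)]

For example, protein_bag("MKVLAATGLV", window=8) yields ['MKVLAATG', 'KVLAATGL', 'VLAATGLV'].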
96 Following the experimental setting in [15], we conduct 5-fold cross-validation to test the performance. [sent-409, score-0.117]
97 Conclusions: This paper presents a novel machine learning problem, multiple instance learning on structured data (MILSD), for incorporating additional structure information into multiple instance learning. [sent-449, score-0.149]
98 For future work, we plan to adapt the current framework to solve multi-view multiple instance learning on structured data. [sent-465, score-0.076]
99 Identifying predictive structures in relational data using multiple instance learning. [sent-535, score-0.118]
100 Learning from labeled and unlabeled data on a directed graph. [sent-590, score-0.104]
wordName wordTfidf (topN-words)
[('bags', 0.297), ('misvm', 0.297), ('milsd', 0.26), ('cutting', 0.259), ('np', 0.225), ('mil', 0.211), ('webpages', 0.21), ('plane', 0.198), ('nq', 0.196), ('cccp', 0.191), ('bag', 0.153), ('wts', 0.149), ('hyperlinks', 0.13), ('protein', 0.129), ('ts', 0.128), ('precisions', 0.12), ('wt', 0.116), ('instances', 0.116), ('biu', 0.111), ('cts', 0.105), ('lc', 0.102), ('ascot', 0.093), ('bpk', 0.093), ('bpu', 0.093), ('bqu', 0.093), ('hg', 0.089), ('companies', 0.084), ('bi', 0.084), ('mf', 0.084), ('fold', 0.082), ('maxj', 0.082), ('bij', 0.08), ('webpage', 0.075), ('relational', 0.072), ('constraints', 0.068), ('bq', 0.065), ('jk', 0.064), ('seconds', 0.06), ('targeting', 0.06), ('ratio', 0.058), ('bqk', 0.056), ('faculty', 0.053), ('increment', 0.053), ('unlabeled', 0.053), ('labeled', 0.051), ('graph', 0.05), ('hyperlink', 0.049), ('website', 0.047), ('instance', 0.046), ('violated', 0.045), ('cpu', 0.045), ('partner', 0.045), ('proteins', 0.044), ('yi', 0.044), ('market', 0.042), ('ej', 0.041), ('training', 0.04), ('student', 0.04), ('formulation', 0.04), ('hd', 0.038), ('classi', 0.038), ('dual', 0.038), ('ci', 0.038), ('bpj', 0.037), ('bqj', 0.037), ('websites', 0.037), ('maxk', 0.037), ('cross', 0.035), ('added', 0.035), ('iteration', 0.033), ('porter', 0.033), ('lafayette', 0.033), ('subsequences', 0.033), ('content', 0.032), ('auc', 0.031), ('feasible', 0.03), ('labels', 0.03), ('structured', 0.03), ('ratios', 0.029), ('constraint', 0.029), ('sets', 0.029), ('primary', 0.029), ('scenarios', 0.029), ('soft', 0.029), ('considered', 0.028), ('conservation', 0.028), ('purdue', 0.028), ('vq', 0.028), ('identi', 0.028), ('structure', 0.027), ('proposed', 0.027), ('hr', 0.027), ('vishwanathan', 0.027), ('treated', 0.026), ('min', 0.026), ('company', 0.026), ('outgoing', 0.026), ('equals', 0.025), ('big', 0.025), ('hinge', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 181 nips-2011-Multiple Instance Learning on Structured Data
Author: Dan Zhang, Yan Liu, Luo Si, Jian Zhang, Richard D. Lawrence
Abstract: Most existing Multiple-Instance Learning (MIL) algorithms assume data instances and/or data bags are independently and identically distributed. But there often exists rich additional dependency/structure information between instances/bags within many applications of MIL. Ignoring this structure information limits the performance of existing MIL algorithms. This paper formulates this research problem as multiple instance learning on structured data (MILSD) and proposes a novel framework that considers the additional structure information. In particular, an effective and efficient optimization algorithm is proposed to solve the original non-convex optimization problem, using a combination of the Concave-Convex Constraint Programming (CCCP) method and an adapted Cutting Plane method, which deals with the two sets of constraints caused by learning on instances within individual bags and learning on structured data. Our method has a nice convergence property, with a specified precision on each set of constraints. Experimental results on three different applications, i.e., webpage classification, market targeting, and protein fold identification, clearly demonstrate the advantages of the proposed method over state-of-the-art methods.
2 0.19869985 168 nips-2011-Maximum Margin Multi-Instance Learning
Author: Hua Wang, Heng Huang, Farhad Kamangar, Feiping Nie, Chris H. Ding
Abstract: Multi-instance learning (MIL) considers input as bags of instances, in which labels are assigned to the bags. MIL is useful in many real-world applications. For example, in image categorization semantic meanings (labels) of an image mostly arise from its regions (instances) instead of the entire image (bag). Existing MIL methods typically build their models using the Bag-to-Bag (B2B) distance, which are often computationally expensive and may not truly reflect the semantic similarities. To tackle this, in this paper we approach MIL problems from a new perspective using the Class-to-Bag (C2B) distance, which directly assesses the relationships between the classes and the bags. Taking into account the two major challenges in MIL, high heterogeneity on data and weak label association, we propose a novel Maximum Margin Multi-Instance Learning (M3 I) approach to parameterize the C2B distance by introducing the class specific distance metrics and the locally adaptive significance coefficients. We apply our new approach to the automatic image categorization tasks on three (one single-label and two multilabel) benchmark data sets. Extensive experiments have demonstrated promising results that validate the proposed method.
3 0.14688067 114 nips-2011-Hierarchical Multitask Structured Output Learning for Large-scale Sequence Segmentation
Author: Nico Goernitz, Christian Widmer, Georg Zeller, Andre Kahles, Gunnar Rätsch, Sören Sonnenburg
Abstract: We present a novel regularization-based Multitask Learning (MTL) formulation for Structured Output (SO) prediction for the case of hierarchical task relations. Structured output prediction often leads to difficult inference problems and hence requires large amounts of training data to obtain accurate models. We propose to use MTL to exploit additional information from related learning tasks by means of hierarchical regularization. Training SO models on the combined set of examples from multiple tasks can easily become infeasible for real world applications. To be able to solve the optimization problems underlying multitask structured output learning, we propose an efficient algorithm based on bundle-methods. We demonstrate the performance of our approach in applications from the domain of computational biology addressing the key problem of gene finding. We show that 1) our proposed solver achieves much faster convergence than previous methods and 2) that the Hierarchical SO-MTL approach outperforms considered non-MTL methods. 1
4 0.078622088 165 nips-2011-Matrix Completion for Multi-label Image Classification
Author: Ricardo S. Cabral, Fernando Torre, Joao P. Costeira, Alexandre Bernardino
Abstract: Recently, image categorization has been an active research topic due to the urgent need to retrieve and browse digital images via semantic keywords. This paper formulates image categorization as a multi-label classification problem using recent advances in matrix completion. Under this setting, classification of testing data is posed as a problem of completing unknown label entries on a data matrix that concatenates training and testing features with training labels. We propose two convex algorithms for matrix completion based on a Rank Minimization criterion specifically tailored to visual data, and prove its convergence properties. A major advantage of our approach w.r.t. standard discriminative classification methods for image categorization is its robustness to outliers, background noise and partial occlusions both in the feature and label space. Experimental validation on several datasets shows how our method outperforms state-of-the-art algorithms, while effectively capturing semantic concepts of classes. 1
5 0.073421761 180 nips-2011-Multiple Instance Filtering
Author: Kamil A. Wnuk, Stefano Soatto
Abstract: We propose a robust filtering approach based on semi-supervised and multiple instance learning (MIL). We assume that the posterior density would be unimodal if not for the effect of outliers that we do not wish to explicitly model. Therefore, we seek for a point estimate at the outset, rather than a generic approximation of the entire posterior. Our approach can be thought of as a combination of standard finite-dimensional filtering (Extended Kalman Filter, or Unscented Filter) with multiple instance learning, whereby the initial condition comes with a putative set of inlier measurements. We show how both the state (regression) and the inlier set (classification) can be estimated iteratively and causally by processing only the current measurement. We illustrate our approach on visual tracking problems whereby the object of interest (target) moves and evolves as a result of occlusions and deformations, and partial knowledge of the target is given in the form of a bounding box (training set). 1
6 0.064944431 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition
7 0.063813895 258 nips-2011-Sparse Bayesian Multi-Task Learning
8 0.0591571 45 nips-2011-Beating SGD: Learning SVMs in Sublinear Time
9 0.057177417 119 nips-2011-Higher-Order Correlation Clustering for Image Segmentation
10 0.055115748 199 nips-2011-On fast approximate submodular minimization
11 0.05392839 149 nips-2011-Learning Sparse Representations of High Dimensional Data on Large Scale Dictionaries
12 0.053686619 169 nips-2011-Maximum Margin Multi-Label Structured Prediction
13 0.053524498 247 nips-2011-Semantic Labeling of 3D Point Clouds for Indoor Scenes
14 0.052735411 189 nips-2011-Non-parametric Group Orthogonal Matching Pursuit for Sparse Learning with Multiple Kernels
15 0.050777793 217 nips-2011-Practical Variational Inference for Neural Networks
16 0.050620388 284 nips-2011-The Impact of Unlabeled Patterns in Rademacher Complexity Theory for Kernel Classifiers
17 0.049972799 193 nips-2011-Object Detection with Grammar Models
18 0.043784551 185 nips-2011-Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction
19 0.042429298 141 nips-2011-Large-Scale Category Structure Aware Image Categorization
20 0.041822057 303 nips-2011-Video Annotation and Tracking with Active Learning
topicId topicWeight
[(0, 0.153), (1, 0.032), (2, -0.084), (3, 0.004), (4, 0.009), (5, 0.006), (6, -0.02), (7, -0.039), (8, -0.093), (9, 0.003), (10, 0.001), (11, -0.0), (12, 0.014), (13, 0.014), (14, -0.049), (15, -0.007), (16, -0.074), (17, 0.015), (18, 0.07), (19, -0.03), (20, -0.022), (21, 0.006), (22, 0.072), (23, 0.028), (24, 0.026), (25, 0.06), (26, -0.114), (27, -0.068), (28, -0.061), (29, -0.058), (30, -0.006), (31, -0.082), (32, -0.043), (33, 0.117), (34, 0.004), (35, 0.063), (36, 0.014), (37, 0.071), (38, 0.043), (39, -0.156), (40, -0.037), (41, -0.037), (42, 0.139), (43, -0.142), (44, -0.21), (45, -0.07), (46, 0.062), (47, 0.086), (48, 0.058), (49, -0.1)]
simIndex simValue paperId paperTitle
same-paper 1 0.92878032 181 nips-2011-Multiple Instance Learning on Structured Data
Author: Dan Zhang, Yan Liu, Luo Si, Jian Zhang, Richard D. Lawrence
Abstract: Most existing Multiple-Instance Learning (MIL) algorithms assume data instances and/or data bags are independently and identically distributed. But there often exists rich additional dependency/structure information between instances/bags within many applications of MIL. Ignoring this structure information limits the performance of existing MIL algorithms. This paper formulates this research problem as multiple instance learning on structured data (MILSD) and proposes a novel framework that considers the additional structure information. In particular, an effective and efficient optimization algorithm is proposed to solve the original non-convex optimization problem, using a combination of the Concave-Convex Constraint Programming (CCCP) method and an adapted Cutting Plane method, which deals with the two sets of constraints caused by learning on instances within individual bags and learning on structured data. Our method has a nice convergence property, with a specified precision on each set of constraints. Experimental results on three different applications, i.e., webpage classification, market targeting, and protein fold identification, clearly demonstrate the advantages of the proposed method over state-of-the-art methods.
2 0.72623414 114 nips-2011-Hierarchical Multitask Structured Output Learning for Large-scale Sequence Segmentation
Author: Nico Goernitz, Christian Widmer, Georg Zeller, Andre Kahles, Gunnar Rätsch, Sören Sonnenburg
Abstract: We present a novel regularization-based Multitask Learning (MTL) formulation for Structured Output (SO) prediction for the case of hierarchical task relations. Structured output prediction often leads to difficult inference problems and hence requires large amounts of training data to obtain accurate models. We propose to use MTL to exploit additional information from related learning tasks by means of hierarchical regularization. Training SO models on the combined set of examples from multiple tasks can easily become infeasible for real world applications. To be able to solve the optimization problems underlying multitask structured output learning, we propose an efficient algorithm based on bundle-methods. We demonstrate the performance of our approach in applications from the domain of computational biology addressing the key problem of gene finding. We show that 1) our proposed solver achieves much faster convergence than previous methods and 2) that the Hierarchical SO-MTL approach outperforms considered non-MTL methods. 1
3 0.61093354 169 nips-2011-Maximum Margin Multi-Label Structured Prediction
Author: Christoph H. Lampert
Abstract: We study multi-label prediction for structured output sets, a problem that occurs, for example, in object detection in images, secondary structure prediction in computational biology, and graph matching with symmetries. Conventional multilabel classification techniques are typically not applicable in this situation, because they require explicit enumeration of the label set, which is infeasible in case of structured outputs. Relying on techniques originally designed for single-label structured prediction, in particular structured support vector machines, results in reduced prediction accuracy, or leads to infeasible optimization problems. In this work we derive a maximum-margin training formulation for multi-label structured prediction that remains computationally tractable while achieving high prediction accuracy. It also shares most beneficial properties with single-label maximum-margin approaches, in particular formulation as a convex optimization problem, efficient working set training, and PAC-Bayesian generalization bounds. 1
4 0.56803811 168 nips-2011-Maximum Margin Multi-Instance Learning
Author: Hua Wang, Heng Huang, Farhad Kamangar, Feiping Nie, Chris H. Ding
Abstract: Multi-instance learning (MIL) considers input as bags of instances, in which labels are assigned to the bags. MIL is useful in many real-world applications. For example, in image categorization semantic meanings (labels) of an image mostly arise from its regions (instances) instead of the entire image (bag). Existing MIL methods typically build their models using the Bag-to-Bag (B2B) distance, which are often computationally expensive and may not truly reflect the semantic similarities. To tackle this, in this paper we approach MIL problems from a new perspective using the Class-to-Bag (C2B) distance, which directly assesses the relationships between the classes and the bags. Taking into account the two major challenges in MIL, high heterogeneity on data and weak label association, we propose a novel Maximum Margin Multi-Instance Learning (M3 I) approach to parameterize the C2B distance by introducing the class specific distance metrics and the locally adaptive significance coefficients. We apply our new approach to the automatic image categorization tasks on three (one single-label and two multilabel) benchmark data sets. Extensive experiments have demonstrated promising results that validate the proposed method.
5 0.54468435 51 nips-2011-Clustered Multi-Task Learning Via Alternating Structure Optimization
Author: Jiayu Zhou, Jianhui Chen, Jieping Ye
Abstract: Multi-task learning (MTL) learns multiple related tasks simultaneously to improve generalization performance. Alternating structure optimization (ASO) is a popular MTL method that learns a shared low-dimensional predictive structure on hypothesis spaces from multiple related tasks. It has been applied successfully in many real world applications. As an alternative MTL approach, clustered multi-task learning (CMTL) assumes that multiple tasks follow a clustered structure, i.e., tasks are partitioned into a set of groups where tasks in the same group are similar to each other, and that such a clustered structure is unknown a priori. The objectives in ASO and CMTL differ in how multiple tasks are related. Interestingly, we show in this paper the equivalence relationship between ASO and CMTL, providing significant new insights into ASO and CMTL as well as their inherent relationship. The CMTL formulation is non-convex, and we adopt a convex relaxation to the CMTL formulation. We further establish the equivalence relationship between the proposed convex relaxation of CMTL and an existing convex relaxation of ASO, and show that the proposed convex CMTL formulation is significantly more efficient especially for high-dimensional data. In addition, we present three algorithms for solving the convex CMTL formulation. We report experimental results on benchmark datasets to demonstrate the efficiency of the proposed algorithms. 1
6 0.52847594 277 nips-2011-Submodular Multi-Label Learning
8 0.44636092 27 nips-2011-Advice Refinement in Knowledge-Based SVMs
9 0.43198854 165 nips-2011-Matrix Completion for Multi-label Image Classification
10 0.4311873 150 nips-2011-Learning a Distance Metric from a Network
11 0.41249847 74 nips-2011-Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection
12 0.41192383 254 nips-2011-Similarity-based Learning via Data Driven Embeddings
13 0.4118256 103 nips-2011-Generalization Bounds and Consistency for Latent Structural Probit and Ramp Loss
14 0.39853233 213 nips-2011-Phase transition in the family of p-resistances
15 0.39635679 3 nips-2011-A Collaborative Mechanism for Crowdsourcing Prediction Problems
16 0.39616445 33 nips-2011-An Exact Algorithm for F-Measure Maximization
17 0.39496696 45 nips-2011-Beating SGD: Learning SVMs in Sublinear Time
18 0.38432431 193 nips-2011-Object Detection with Grammar Models
19 0.36257663 180 nips-2011-Multiple Instance Filtering
20 0.35252544 211 nips-2011-Penalty Decomposition Methods for Rank Minimization
topicId topicWeight
[(0, 0.035), (4, 0.023), (9, 0.325), (20, 0.03), (26, 0.021), (31, 0.054), (33, 0.037), (43, 0.056), (45, 0.127), (57, 0.025), (65, 0.04), (74, 0.044), (83, 0.061), (89, 0.01), (99, 0.043)]
simIndex simValue paperId paperTitle
same-paper 1 0.73740405 181 nips-2011-Multiple Instance Learning on Structured Data
Author: Dan Zhang, Yan Liu, Luo Si, Jian Zhang, Richard D. Lawrence
Abstract: Most existing Multiple-Instance Learning (MIL) algorithms assume data instances and/or data bags are independently and identically distributed. But there often exists rich additional dependency/structure information between instances/bags within many applications of MIL. Ignoring this structure information limits the performance of existing MIL algorithms. This paper formulates this research problem as multiple instance learning on structured data (MILSD) and proposes a novel framework that considers the additional structure information. In particular, an effective and efficient optimization algorithm is proposed to solve the original non-convex optimization problem, using a combination of the Concave-Convex Constraint Programming (CCCP) method and an adapted Cutting Plane method, which deals with the two sets of constraints caused by learning on instances within individual bags and learning on structured data. Our method has a nice convergence property, with a specified precision on each set of constraints. Experimental results on three different applications, i.e., webpage classification, market targeting, and protein fold identification, clearly demonstrate the advantages of the proposed method over state-of-the-art methods.
2 0.71465069 132 nips-2011-Inferring Interaction Networks using the IBP applied to microRNA Target Prediction
Author: Hai-son P. Le, Ziv Bar-joseph
Abstract: Determining interactions between entities and the overall organization and clustering of nodes in networks is a major challenge when analyzing biological and social network data. Here we extend the Indian Buffet Process (IBP), a nonparametric Bayesian model, to integrate noisy interaction scores with properties of individual entities for inferring interaction networks and clustering nodes within these networks. We present an application of this method to study how microRNAs regulate mRNAs in cells. Analysis of synthetic and real data indicates that the method improves upon prior methods, correctly recovers interactions and clusters, and provides accurate biological predictions. 1
3 0.70262104 69 nips-2011-Differentially Private M-Estimators
Author: Jing Lei
Abstract: This paper studies privacy preserving M-estimators using perturbed histograms. The proposed approach allows the release of a wide class of M-estimators with both differential privacy and statistical utility without knowing a priori the particular inference procedure. The performance of the proposed method is demonstrated through a careful study of the convergence rates. A practical algorithm is given and applied on a real world data set containing both continuous and categorical variables. 1
4 0.55320925 220 nips-2011-Prediction strategies without loss
Author: Michael Kapralov, Rina Panigrahy
Abstract: Consider a sequence of bits where we are trying to predict the next bit from the previous bits. Assume we are allowed to say ‘predict 0’ or ‘predict 1’, and our payoff is +1 if the prediction is correct and −1 otherwise. We will say that at each point in time the loss of an algorithm is the number of wrong predictions minus the number of right predictions so far. In this paper we are interested in algorithms that have essentially zero (expected) loss over any string at any point in time and yet have small regret with respect to always predicting 0 or always predicting 1. For a sequence of length T our algorithm has regret 14 T and loss √ 2 2 T e− T in expectation for all strings. We show that the tradeoff between loss and regret is optimal up to constant factors. Our techniques extend to the general setting of N experts, where the related problem of trading off regret to the best expert for regret to the ’special’ expert has been studied by Even-Dar et al. (COLT’07). We obtain essentially zero loss with respect to the special expert and optimal loss/regret tradeoff, improving upon the results of Even-Dar et al and settling the main question left open in their paper. The strong loss bounds of the algorithm have some surprising consequences. First, we obtain a parameter free algorithm for the experts problem that has optimal regret bounds with respect to k-shifting optima, i.e. bounds with respect to the optimum that is allowed to change arms multiple times. Moreover, for any window of size n the regret of our algorithm to any expert never exceeds O( n(log N + log T )), where N is the number of experts and T is the time horizon, while maintaining the essentially zero loss property. 1
5 0.48535711 244 nips-2011-Selecting Receptive Fields in Deep Networks
Author: Adam Coates, Andrew Y. Ng
Abstract: Recent deep learning and unsupervised feature learning systems that learn from unlabeled data have achieved high performance in benchmarks by using extremely large architectures with many features (hidden units) at each layer. Unfortunately, for such large architectures the number of parameters can grow quadratically in the width of the network, thus necessitating hand-coded “local receptive fields” that limit the number of connections from lower level features to higher ones (e.g., based on spatial locality). In this paper we propose a fast method to choose these connections that may be incorporated into a wide variety of unsupervised training methods. Specifically, we choose local receptive fields that group together those low-level features that are most similar to each other according to a pairwise similarity metric. This approach allows us to harness the advantages of local receptive fields (such as improved scalability, and reduced data requirements) when we do not know how to specify such receptive fields by hand or where our unsupervised training algorithm has no obvious generalization to a topographic setting. We produce results showing how this method allows us to use even simple unsupervised training algorithms to train successful multi-layered networks that achieve state-of-the-art results on CIFAR and STL datasets: 82.0% and 60.1% accuracy, respectively. 1
6 0.47987562 105 nips-2011-Generalized Lasso based Approximation of Sparse Coding for Visual Recognition
7 0.47924861 113 nips-2011-Hierarchical Matching Pursuit for Image Classification: Architecture and Fast Algorithms
8 0.47924688 149 nips-2011-Learning Sparse Representations of High Dimensional Data on Large Scale Dictionaries
9 0.47893485 186 nips-2011-Noise Thresholds for Spectral Clustering
10 0.47525412 70 nips-2011-Dimensionality Reduction Using the Sparse Linear Model
11 0.47507864 271 nips-2011-Statistical Tests for Optimization Efficiency
12 0.47354972 118 nips-2011-High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity
13 0.47325024 189 nips-2011-Non-parametric Group Orthogonal Matching Pursuit for Sparse Learning with Multiple Kernels
14 0.47299942 202 nips-2011-On the Universality of Online Mirror Descent
15 0.47172037 257 nips-2011-SpaRCS: Recovering low-rank and sparse matrices from compressive measurements
16 0.47111976 185 nips-2011-Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction
17 0.47089198 159 nips-2011-Learning with the weighted trace-norm under arbitrary sampling distributions
18 0.46976054 124 nips-2011-ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning
19 0.46965733 77 nips-2011-Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression
20 0.46952698 265 nips-2011-Sparse recovery by thresholded non-negative least squares