nips nips2009 nips2009-102 knowledge-graph by maker-knowledge-mining

102 nips-2009-Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models


Source: pdf

Author: Jing Gao, Feng Liang, Wei Fan, Yizhou Sun, Jiawei Han

Abstract: Ensemble classifiers such as bagging, boosting and model averaging are known to have improved accuracy and robustness over a single model. Their potential, however, is limited in applications which have no access to raw data but to the meta-level model output. In this paper, we study ensemble learning with output from multiple supervised and unsupervised models, a topic where little work has been done. Although unsupervised models, such as clustering, do not directly generate label prediction for each individual, they provide useful constraints for the joint prediction of a set of related objects. We propose to consolidate a classification solution by maximizing the consensus among both supervised predictions and unsupervised constraints. We cast this ensemble task as an optimization problem on a bipartite graph, where the objective function favors the smoothness of the prediction over the graph, as well as penalizing deviations from the initial labeling provided by supervised models. We solve this problem through iterative propagation of probability estimates among neighboring nodes. Our method can also be interpreted as conducting a constrained embedding in a transformed space, or a ranking on the graph. Experimental results on three real applications demonstrate the benefits of the proposed method over existing alternatives1 . 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 In this paper, we study ensemble learning with output from multiple supervised and unsupervised models, a topic where little work has been done. [sent-6, score-0.457]

2 Although unsupervised models, such as clustering, do not directly generate label prediction for each individual, they provide useful constraints for the joint prediction of a set of related objects. [sent-7, score-0.23]

3 We propose to consolidate a classification solution by maximizing the consensus among both supervised predictions and unsupervised constraints. [sent-8, score-0.51]

4 We cast this ensemble task as an optimization problem on a bipartite graph, where the objective function favors the smoothness of the prediction over the graph, as well as penalizing deviations from the initial labeling provided by supervised models. [sent-9, score-0.397]

5 Traditional ensemble methods such as bagging, boosting and model averaging are known to have improved accuracy and robustness over a single model. [sent-14, score-0.276]

6 In this paper, we consider the general problem of combining output of multiple supervised and unsupervised models to improve prediction accuracy. [sent-20, score-0.336]

7 Although unsupervised models, such as clustering, do not directly generate label predictions, they provide useful constraints for the classification task. [sent-21, score-0.23]

8 The rationale is that objects that are in the same cluster should be more likely to receive the same class label than the ones in different clusters. [sent-22, score-0.296]

9 Furthermore, incorporating the unsupervised clustering models into classification ensembles improves the base model diversity, and thus has the potential of improving prediction accuracy. [sent-23, score-0.485]

10 The output of the four models are: M1 = {1, 1, 1, 2, 3, 3, 2} M2 = {1, 1, 2, 2, 2, 3, 1} M3 = {2, 2, 1, 3, 3, 1, 3} M4 = {1, 2, 3, 1, 2, 1, 1} where M1 and M2 assign each object a class label, whereas M3 and M4 simply partition the objects into three clusters and assign each object a cluster ID. [sent-57, score-0.465]

11 Each model, no matter whether it is supervised or unsupervised, partitions X into groups, and objects in the same group share either the same predicted class label or the same cluster ID. [sent-58, score-0.575]

12 In the graph, nodes at the left denote the groups output by the m models with some labeled ones from the supervised models, nodes at the right denote the n objects, and a group and an object are connected if the object is assigned to the group by one of the models. [sent-60, score-1.122]

13 For the aforementioned toy example, we show the groups obtained from a classifier M1 and a clustering model M3 in Figure 1, as well as the group-object bipartite graph in Figure 2. [sent-61, score-0.438]
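
To make the graph construction concrete, the following sketch (my own illustration, not from the paper; the array names are assumptions) builds the n × v group-object affinity matrix A for the toy outputs above, stacking the c groups of each of the m = 4 models side by side.

```python
import numpy as np

# Toy outputs from the four base models: class labels for M1, M2,
# cluster IDs for M3, M4.  n = 7 objects, c = 3 groups per model.
outputs = [
    [1, 1, 1, 2, 3, 3, 2],  # M1 (classifier)
    [1, 1, 2, 2, 2, 3, 1],  # M2 (classifier)
    [2, 2, 1, 3, 3, 1, 3],  # M3 (clustering)
    [1, 2, 3, 1, 2, 1, 1],  # M4 (clustering)
]
n, c, m = 7, 3, len(outputs)
v = m * c                      # total number of group nodes

A = np.zeros((n, v))
for k, labels in enumerate(outputs):
    for i, g in enumerate(labels):
        # object i links to group (g-1) of model k; a_ij = 1 iff assigned
        A[i, k * c + (g - 1)] = 1.0

print(A.shape)  # (7, 12): each object links to exactly m = 4 groups
```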

14 The objective is to predict the class label of xi ∈ X, which agrees with the base classifiers’ predictions, and meanwhile, satisfies the constraints enforced by the clustering models, as much as possible. [sent-62, score-0.411]

15 To reach maximum consensus among all the models, we define an optimization problem over the bipartite graph whose objective function penalizes deviations from the base classifiers’ predictions, and discrepancies of predicted class labels among nearby nodes. [sent-63, score-0.67]

16 In the toy example, the consensus label predictions for X should be {1, 1, 1, 2, 2, 3, 2}. [sent-64, score-0.385]

17 We summarize various learning problems in Figure 3, where one dimension represents the goal – from unsupervised to supervised, and the other dimension represents the method – single models, ensembles at the raw data, or ensembles at the output level. [sent-66, score-0.383]

18 Our proposed method is a semi-supervised ensemble working at the output level, where little work has been done. [sent-67, score-0.248]

19 Recent studies reveal that unsupervised information can also be utilized to improve the accuracy of supervised learning, which leads to semi-supervised [29, 8] and transductive learning [21]. [sent-69, score-0.304]

20 In Figure 3, we divide ensemble methods into two categories depending on whether they require access to raw data. [sent-73, score-0.26]

21 In unsupervised learning, many clustering ensemble methods [12, 17, 25, 26] have been developed to find a consensus clustering from multiple partitionings without accessing the features. [sent-74, score-0.86]

22 In supervised learning, however, only majority voting type algorithms work on the model output level, and most well-known classification ensemble approaches [2, 11, 19] (e.g. [sent-75, score-0.448]

23 Methods such as mixture of experts [20] and stacked generalization [27] try to obtain a meta-learner on top of the model output; however, they still need the labels of the raw data as feedback, so we position them as intermediate between raw data ensembles and output ensembles. [sent-77, score-0.503]

24 Therefore, it can be regarded as a semi-supervised ensemble requiring access to the raw data. [sent-79, score-0.297]

25 The proposed consensus maximization problem is challenging and cannot be solved by simple majority voting. [sent-81, score-0.388]

26 In Section 2, we formally define the graph-based consensus maximization problem and propose an iterative algorithm to solve it. [sent-83, score-0.319]

27 The proposed solution propagates labeled information among neighboring nodes until stabilization. [sent-84, score-0.275]

28 We also present two different interpretations of the proposed method in Section 3, and discuss how to incorporate feedback obtained from a few labeled target objects into the framework in Section 4. [sent-85, score-0.362]

29 2 Methodology. Suppose we have the output of r classification algorithms and (m − r) clustering algorithms on a data set X. [sent-87, score-0.257]

30 For the sake of simplicity, we assume that each point is assigned to only one class or cluster in each of the m algorithms, and the number of clusters in each clustering algorithm is c, same as the number of classes. [sent-88, score-0.266]

31 So each base algorithm partitions X into c groups and there are in total v = mc groups, where the first s = rc groups are generated by classifiers and the remaining v − s groups are from clustering algorithms. [sent-90, score-0.532]

32 We represent the objects and groups in a bipartite graph as shown in Figure 2, where the object nodes x1 , . [sent-93, score-0.62]

33 The affinity matrix An×v of this graph summarizes the output of m algorithms on X: aij = 1, if xi is assigned to group gj by one of the algorithms; 0, otherwise. [sent-100, score-0.861]

34 As a nuisance parameter, the conditional probabilities at each group node gj are also estimated. [sent-102, score-0.332]

35 These conditional probabilities are denoted by $U_{n \times c}$ for object nodes and $Q_{v \times c}$ for group nodes: $u_{iz} = \hat{P}(y = z \mid x_i)$ and $q_{jz} = \hat{P}(y = z \mid g_j)$. [sent-103, score-1.037]

36 Since the first s = rc groups are obtained from supervised learning models, they have some initial class label estimates denoted by $Y_{v \times c}$ where $y_{jz} = 1$ if $g_j$'s predicted label is $z$, $j = 1, \ldots$

37 Let $k_j = \sum_{z=1}^{c} y_{jz}$, and we formulate the consensus agreement as the following optimization problem on the graph: $\min_{Q,U} f(Q, U) = \min_{Q,U} \sum_{i=1}^{n} \sum_{j=1}^{v} a_{ij}\, \|u_{i\cdot} - q_{j\cdot}\|^2 + \alpha \sum_{j=1}^{v} k_j\, \|q_{j\cdot} - y_{j\cdot}\|^2$ (1) s.t. . . .

38 The first term ensures that if an object xi is assigned to group gj by one of the algorithms, their conditional probability estimates must be close. [sent-113, score-0.392]

39 For $j = 1, \ldots, s$, the group node gj is from a classifier, so kj = 1 and the second term imposes the constraint that group gj's consensus class label estimate should not deviate much from its initial class label prediction. [sent-117, score-0.575]

40 For $j = s+1, \ldots, v$, gj is a group from an unsupervised model with no such constraints, and thus kj = 0 and the weight of the constraint is 0. [sent-122, score-0.52]

41 The corresponding Hessian matrix is diagonal. (Algorithm 1, the BGCM algorithm — Input: group-object affinity matrix A, initial labeling matrix Y, parameters α and ε; Output: consensus matrix U. Table 1 lists the important notations.) [sent-126, score-0.291]

42 $q_{j\cdot}^{(t)} = \frac{\sum_{i=1}^{n} a_{ij}\, u_{i\cdot}^{(t-1)} + \alpha k_j\, y_{j\cdot}}{\sum_{i=1}^{n} a_{ij} + \alpha k_j}, \qquad u_{i\cdot}^{(t)} = \frac{\sum_{j=1}^{v} a_{ij}\, q_{j\cdot}^{(t)}}{\sum_{j=1}^{v} a_{ij}}$ (2) The update formulas in matrix form are given in Algorithm 1. [sent-142, score-0.297]

43 $D_v = \mathrm{diag}\bigl(\sum_{i=1}^{n} a_{ij}\bigr)_{v \times v}$ and $D_n = \mathrm{diag}\bigl(\sum_{j=1}^{v} a_{ij}\bigr)_{n \times n}$ act as the normalization factors. [sent-143, score-0.854]

44 $K_v = \mathrm{diag}\bigl(\sum_{z=1}^{c} y_{jz}\bigr)_{v \times v}$ indicates the existence of constraints on the group nodes. [sent-144, score-0.319]
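
Putting Eq. (2) and the normalizers above together, a minimal sketch of the iterative propagation (my paraphrase of Algorithm 1, not the authors' code; the uniform initialization, the default α, and the convergence test are assumptions) could look like this:

```python
import numpy as np

def bgcm(A, Y, alpha=2.0, tol=1e-6, max_iter=1000):
    """Consensus maximization by propagating probability estimates on the
    group-object bipartite graph (a sketch of Algorithm 1 / Eq. 2)."""
    n, v = A.shape
    c = Y.shape[1]
    k = Y.sum(axis=1)                      # k_j = 1 for supervised groups, 0 otherwise
    Dv = A.sum(axis=0)                     # group-node degrees (diagonal of D_v)
    Dn = A.sum(axis=1)                     # object-node degrees (diagonal of D_n)
    U = np.full((n, c), 1.0 / c)           # assumed uniform initialization
    for _ in range(max_iter):
        # group-node update: weighted average of neighbors plus pull toward Y
        Q = (A.T @ U + alpha * k[:, None] * Y) / (Dv + alpha * k)[:, None]
        # object-node update: average of the groups the object belongs to
        U_new = (A @ Q) / Dn[:, None]
        if np.abs(U_new - U).max() < tol:  # assumed convergence test
            U = U_new
            break
        U = U_new
    return U, Q

# Usage with the toy A above and Y holding one-hot labels for the first
# s = r*c classifier groups (zero rows for clustering groups):
#   U, Q = bgcm(A, Y); consensus = U.argmax(axis=1) + 1
```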

45 When updating Q, each group node receives information from its neighboring object nodes while retaining its initial value Y, and in return the updated probability estimates at the group nodes propagate the information back to their neighboring object nodes when U is updated. [sent-147, score-0.753]

46 In this paper, we introduce the concept of consensus maximization and solve the problem over a bipartite graph representation. [sent-150, score-0.505]

47 To apply SSL algorithms on our problem, we must first fuse all supervised models into one by some ensemble approach, and fuse all unsupervised models into one by defining a similarity function. [sent-158, score-0.549]

48 Such a compression may lead to information loss, whereas the proposed method retains all the information and thus consensus can be reached among all the base model outputs. [sent-159, score-0.292]

49 Now we focus on the “hard” consensus solution, i.e., the case where each object is assigned to exactly one class.

50 So U and Q are indicator matrices: uiz = 1 if the ensemble assigns xi to class z, and 0 otherwise; similar for qjz ’s. [sent-165, score-0.906]

51 For group nodes from classification algorithms, we will treat their entries in Q as known since they have been assigned a class label by one of the classifiers, that is, qjz = yjz for 1 ≤ j ≤ s. [sent-166, score-0.95]

52 Because U represents the consensus, we should let group gj correspond to class z if the majority of the objects in group gj correspond to class z in the consensus solution. [sent-167, score-0.37]

53 The optimization is thus: $\min_{Q,U} \sum_{j=1}^{v} \sum_{z=1}^{c} \Bigl| q_{jz} - \frac{\sum_{i=1}^{n} a_{ij} u_{iz}}{\sum_{i=1}^{n} a_{ij}} \Bigr|$ (3) [sent-168, score-0.38]

54 s.t. $\sum_{z=1}^{c} u_{iz} = 1 \;\forall i \in \{1, \ldots, n\}$, $\sum_{z=1}^{c} q_{jz} = 1 \;\forall j \in \{s+1, \ldots, v\}$, $u_{iz} \in \{0, 1\}$, $q_{jz} \in \{0, 1\}$ (4) [sent-173, score-1.487]

55 $\sum_{z=1}^{c} q_{jz} = 1 \;\forall j \in \{1, \ldots, s\}$, with $q_{jz} = 0$ if $g_j$'s label is not $z$ (5) [sent-176, score-1.067]

56 Here, the two indicator matrices U and Q can be viewed as embedding $x_1, \ldots$ [sent-182, score-0.295]

57 $a_{\cdot j}$ denotes the objects group $g_j$ contains, $q_{j\cdot}$ can be regarded as the group representative in this new space, and thus it should be close to the group mean: $\frac{\sum_{i=1}^{n} a_{ij} u_{i\cdot}}{\sum_{i=1}^{n} a_{ij}}$. [sent-191, score-0.726]

58 When $a_{ij} = 1$, no matter whether $q_{jz}$ is 1 or 0, we have $\bigl| q_{jz} \sum_{i=1}^{n} a_{ij} - \sum_{i=1}^{n} a_{ij} u_{iz} \bigr| = \sum_{i=1}^{n} \bigl| a_{ij} (q_{jz} - u_{iz}) \bigr|$. [sent-197, score-2.194]

59 Therefore, $\sum_{j: a_{ij}=1} \sum_{z=1}^{c} \Bigl| q_{jz} - \frac{\sum_{i=1}^{n} a_{ij} u_{iz}}{\sum_{i=1}^{n} a_{ij}} \Bigr| = \sum_{j: a_{ij}=1} \sum_{z=1}^{c} \frac{\bigl| q_{jz} \sum_{i=1}^{n} a_{ij} - \sum_{i=1}^{n} a_{ij} u_{iz} \bigr|}{\sum_{i=1}^{n} a_{ij}} = \sum_{j: a_{ij}=1} \sum_{z=1}^{c} \frac{\sum_{i=1}^{n} \bigl| a_{ij} (q_{jz} - u_{iz}) \bigr|}{\sum_{i=1}^{n} a_{ij}}$. Suppose the groups found by the base models have balanced size, i.e., $\sum_{i=1}^{n} a_{ij}$ is roughly the same for all groups.

60 Eq. (3) is equivalent to: $\min_{Q,U} \sum_{i=1}^{n} \sum_{j=1}^{v} a_{ij} \sum_{z=1}^{c} |q_{jz} - u_{iz}|$. [sent-204, score-0.707]

61 2) uiz and qjz are relaxed to have values between 0 and 1, instead of either 0 or 1, and quadratic cost functions replace the L1 norms. [sent-207, score-0.687]

62 Therefore, we can view our proposed method as embedding both object nodes and group nodes into a hyperplane so that object nodes are close to the group nodes they link to. [sent-210, score-1.055]

63 The constraints are put on the group nodes from supervised models to penalize embeddings that are far from the “ideal” ones. [sent-211, score-0.534]

64 Our method can also be viewed as conducting ranking with respect to each class on the bipartite graph, where group nodes from supervised models act as queries. [sent-213, score-0.701]

65 Suppose we wish to know the probability of any group gj belonging to class 1, which can be regarded as the relevance score of gj with respect to example queries from class 1. [sent-214, score-0.615]

66 In Algorithm 1, the relevance scores of all the groups are learnt using the following equation: $q_{\cdot 1} = (D_v + \alpha K_v)^{-1} \bigl( A^{T} D_n^{-1} A\, q_{\cdot 1} + \alpha K_v\, y_{\cdot 1} \bigr) = D_{\lambda} \bigl( D_v^{-1} A^{T} D_n^{-1} A \bigr) q_{\cdot 1} + D_{1-\lambda}\, y_{\cdot 1}$ where the $v \times v$ diagonal matrices $D_{\lambda}$ and $D_{1-\lambda}$ have $(j, j)$ entries $\frac{w_j}{w_j + \alpha k_j}$ and $\frac{\alpha k_j}{w_j + \alpha k_j}$. [sent-216, score-0.333]
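
Read as a ranking, the fixed point of the equation above can be computed by iterating on the group nodes only. The sketch below is my illustration (not the paper's code) of this personalized-PageRank-style view; initializing the scores at the query vectors and iterating over all classes at once are assumptions.

```python
import numpy as np

def rank_groups(A, Y, alpha=2.0, n_iter=100):
    """Per-class relevance scores of group nodes on the collapsed graph,
    following q_.z = D_lambda (Dv^-1 A^T Dn^-1 A) q_.z + D_(1-lambda) y_.z."""
    w = A.sum(axis=0)                       # w_j: group-node degree
    k = Y.sum(axis=1)                       # k_j: 1 for labeled (classifier) groups
    Dn_inv = 1.0 / A.sum(axis=1)
    P = (A.T * Dn_inv) @ A / w[:, None]     # Dv^-1 A^T Dn^-1 A: random walk on group nodes
    lam = w / (w + alpha * k)               # personalized damping factor per group
    Q = Y.astype(float).copy()              # assumed initialization at the queries
    for _ in range(n_iter):
        Q = lam[:, None] * (P @ Q) + (1 - lam)[:, None] * Y
    return Q
```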

67 Consider collapsing the original bipartite graph into a graph with group nodes only; then $A^{T} D_n^{-1} A$ is its affinity matrix. [sent-217, score-0.525]

68 The groups that are predicted to be in class 1 by one of the supervised models have 1 at the corresponding entries in y·1 , therefore these group nodes are “queries” and we wish to rank the group nodes according to their relevance to them. [sent-219, score-0.946]

69 2) In PageRank, the weights $D_{\lambda}$ and $D_{1-\lambda}$ are fixed constants λ and 1 − λ, whereas in our model $D_{\lambda}$ and $D_{1-\lambda}$ give personalized damping factors, where each group $g_j$ has a damping factor $\lambda_j = \frac{w_j}{w_j + \alpha k_j}$. [sent-222, score-0.37]

70 When each base model generates balanced groups, both λj and outlinks at each node become constants, and the proposed method simulates the standard personalized PageRank. [sent-224, score-0.294]

71 4 Incorporating Labeled Information. Thus far, we have proposed to combine the output of supervised and unsupervised models by consensus. [sent-263, score-0.31]

72 However, incorporating labels from even a small portion of the objects may greatly refine the final hypothesis. [sent-265, score-0.223]

73 We assume that the labels of the first l objects are known, encoded in an $n \times c$ matrix F: $f_{iz} = 1$ if $x_i$'s observed label is $z$, $i = 1, \ldots$ [sent-266, score-0.32]

74 We extend Eq. (1) to penalize the deviation of $u_{i\cdot}$ of labeled objects from the observed labels: $f(Q, U) = \sum_{i=1}^{n} \sum_{j=1}^{v} a_{ij}\, \|u_{i\cdot} - q_{j\cdot}\|^{2} + \alpha \sum_{j=1}^{v} k_j\, \|q_{j\cdot} - y_{j\cdot}\|^{2} + \beta \sum_{i=1}^{n} h_i\, \|u_{i\cdot} - f_{i\cdot}\|^{2}$ (6) where $h_i = \sum_{z=1}^{c} f_{iz}$. [sent-271, score-0.896]

75 For $i = 1, \ldots, l$, $h_i = 1$, so we enforce the constraint that an object $x_i$'s consensus class label estimate should be close to its observed label, with a shadow price β. [sent-275, score-0.708]

76 To update the conditional probability for the objects, we incorporate their prior labeled information: $u_{i\cdot}^{t} = \frac{\sum_{j=1}^{v} a_{ij}\, q_{j\cdot}^{t} + \beta h_i\, f_{i\cdot}}{\sum_{j=1}^{v} a_{ij} + \beta h_i}$ (7) In matrix form, this is $U^{t} = (D_n + \beta H_n)^{-1}(A Q^{t} + \beta H_n F)$ with $H_n = \mathrm{diag}\bigl(\sum_{z=1}^{c} f_{iz}\bigr)_{n \times n}$. [sent-281, score-0.966]
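
A hedged sketch of the modified object-node update of Eq. (7), with F and β as defined above (the matrix code and function name are mine, not the authors'):

```python
import numpy as np

def update_U_with_labels(A, Q, F, beta=2.0):
    """Object-node update of Eq. (7): U = (Dn + beta*Hn)^-1 (A Q + beta*Hn F).
    F is n x c with f_iz = 1 if x_i's observed label is z (all-zero rows for
    unlabeled objects), so h_i = sum_z f_iz acts as a labeled-object indicator."""
    h = F.sum(axis=1)                        # h_i: 1 if x_i is labeled, else 0
    Dn = A.sum(axis=1)                       # object-node degrees
    return (A @ Q + beta * h[:, None] * F) / (Dn + beta * h)[:, None]
```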

77 The proposed algorithm generates a consolidated classification solution for the target set based on both classification and clustering results. [sent-290, score-0.31]

78 We learn logistic regression [15] and SVM models [7] from the training sets, and apply these models, as well as K-means and min-cut clustering algorithms [22], to the target sets. [sent-418, score-0.352]

79 We apply logistic regression classifiers and K-means clustering algorithms on the two views of the target sets. [sent-424, score-0.333]

80 Logistic regression and K-means clustering algorithms are used to derive the predictions on the target set. [sent-432, score-0.308]

81 On each target set, we apply four models M1 to M4 , where the first two are classification models and the remaining two are clustering models. [sent-435, score-0.339]

82 As shown in Figure 3, only clustering ensembles, majority voting methods, and the proposed BGCM algorithm work at the meta output level where raw data are discarded and only prediction results from multiple models are available. [sent-437, score-0.488]

83 However, majority voting cannot be applied when there are clustering models because the correspondence between clusters and classes is unknown. [sent-438, score-0.288]

84 Therefore, we compare BGCM with two clustering ensemble approaches (MCLA [26] and HBGF [12]), which ignore the label information from supervised models, regard all the base models as unsupervised clustering, and integrate the output of the base models. [sent-439, score-0.869]

85 To evaluate classification accuracy, we map the output of all the clustering algorithms (the base models and the ensembles) to the best possible class predictions with the help of the Hungarian method [5], where cluster IDs are matched with class labels. [sent-441, score-0.514]
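
For this evaluation step, one way to realize the cluster-to-class mapping is sketched below (my illustration using SciPy; the paper cites [5] but gives no code, and 0-indexed cluster IDs and class labels are assumed). It maximizes the number of correctly mapped objects before computing accuracy.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def best_mapped_accuracy(cluster_ids, true_labels, c):
    """Match cluster IDs to class labels with the Hungarian method and
    return the resulting (most favorable) classification accuracy."""
    # Contingency table: counts[i, j] = #objects in cluster i with true class j
    counts = np.zeros((c, c), dtype=int)
    for cid, y in zip(cluster_ids, true_labels):
        counts[cid, y] += 1
    # Hungarian method on the negated counts = maximum-weight matching
    row_ind, col_ind = linear_sum_assignment(-counts)
    return counts[row_ind, col_ind].sum() / len(true_labels)
```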

86 Actually, it is “cheating” because the true class labels are used to do the mapping, and thus it should be able to generate the best accuracy from these unsupervised models. [sent-442, score-0.255]

87 By combining all the base models, the clustering ensemble approaches (MCLA and HBGF) can improve the performance over each single model. [sent-451, score-0.396]

88 Even when taking variance into consideration, the results demonstrate the power of consensus maximization in accuracy improvements. [sent-482, score-0.363]

89 α and β are the shadow prices paid for deviating from the estimated labels of groups and observed labels of objects, so they should be greater than 0. [sent-486, score-0.266]

90 α and β represent the confidence of our belief in the labels of the groups and objects compared with 1. [sent-487, score-0.282]

91 The labels of group nodes are obtained from supervised models and may not be correct, therefore, a smaller α usually achieves better performance. [sent-488, score-0.488]

92 Also, we fix the target set as 80% of all the objects, and use 1% to 20% as the labeled objects to see how the performance varies, and the results are summarized in Figure 4 (c). [sent-492, score-0.303]

93 We vary the number of base models incorporated into the consensus framework. [sent-496, score-0.389]

94 6 Conclusions. In this work, we take advantage of the complementary predictive powers of multiple supervised and unsupervised models to derive a consolidated label assignment for a set of objects jointly. [sent-503, score-0.525]

95 We propose to summarize base model output in a group-object bipartite graph, and maximize the consensus by promoting smoothness of label assignment over the graph and consistency with the initial labeling. [sent-504, score-0.668]

96 The problem is solved by propagating labeled information between group and object nodes through their links iteratively. [sent-505, score-0.435]

97 The proposed method can be interpreted as conducting an embedding of object and group nodes into a new space, as well as an un-normalized personalized PageRank. [sent-506, score-0.539]

98 When a few labeled objects are available, the proposed method uses them to guide the propagation and refine the final hypothesis. [sent-507, score-0.234]

99 In the experiments on 20 newsgroup, Cora and DBLP data, the proposed consensus maximization method improves the best base model accuracy by 2% to 10%. [sent-508, score-0.468]

100 Heterogeneous source consensus learning via decision propagation and negotiation. [sent-597, score-0.265]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('aij', 0.4), ('qjz', 0.38), ('uiz', 0.307), ('consensus', 0.265), ('ensemble', 0.167), ('clustering', 0.151), ('gj', 0.15), ('group', 0.139), ('qj', 0.136), ('nodes', 0.134), ('kj', 0.131), ('bgcm', 0.127), ('dblp', 0.127), ('objects', 0.122), ('bipartite', 0.12), ('supervised', 0.11), ('yjz', 0.108), ('groups', 0.101), ('unsupervised', 0.1), ('target', 0.096), ('pagerank', 0.095), ('raw', 0.093), ('cora', 0.087), ('labeled', 0.085), ('label', 0.085), ('base', 0.078), ('object', 0.077), ('dn', 0.077), ('newsgroup', 0.074), ('classi', 0.073), ('bagging', 0.073), ('ensembles', 0.068), ('dv', 0.067), ('graph', 0.066), ('kv', 0.064), ('personalized', 0.063), ('ui', 0.063), ('ranking', 0.061), ('embedding', 0.06), ('labels', 0.059), ('wj', 0.057), ('output', 0.054), ('fiz', 0.054), ('hbgf', 0.054), ('mcla', 0.054), ('outlinks', 0.054), ('maximization', 0.054), ('hi', 0.052), ('class', 0.052), ('ssl', 0.051), ('ers', 0.051), ('transductive', 0.05), ('voting', 0.049), ('shadow', 0.047), ('models', 0.046), ('constraints', 0.045), ('accuracy', 0.044), ('node', 0.043), ('incorporating', 0.042), ('majority', 0.042), ('indexes', 0.041), ('conducting', 0.039), ('cluster', 0.037), ('regarded', 0.037), ('papers', 0.037), ('stacked', 0.037), ('aqt', 0.036), ('consolidated', 0.036), ('stars', 0.036), ('relevance', 0.035), ('boosting', 0.035), ('predictions', 0.035), ('logistic', 0.033), ('gv', 0.032), ('classifier', 0.032), ('yv', 0.032), ('eleven', 0.032), ('feedbacks', 0.032), ('qv', 0.032), ('averaging', 0.03), ('hn', 0.03), ('predicted', 0.03), ('hungarian', 0.029), ('wrt', 0.029), ('gao', 0.029), ('neighboring', 0.029), ('balanced', 0.029), ('cation', 0.028), ('damping', 0.027), ('fuse', 0.027), ('diag', 0.027), ('proposed', 0.027), ('views', 0.027), ('entries', 0.026), ('multiple', 0.026), ('std', 0.026), ('publication', 0.026), ('assigned', 0.026), ('algorithms', 0.026), ('heterogeneous', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 102 nips-2009-Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models

Author: Jing Gao, Feng Liang, Wei Fan, Yizhou Sun, Jiawei Han

Abstract: Ensemble classifiers such as bagging, boosting and model averaging are known to have improved accuracy and robustness over a single model. Their potential, however, is limited in applications which have no access to raw data but to the meta-level model output. In this paper, we study ensemble learning with output from multiple supervised and unsupervised models, a topic where little work has been done. Although unsupervised models, such as clustering, do not directly generate label prediction for each individual, they provide useful constraints for the joint prediction of a set of related objects. We propose to consolidate a classification solution by maximizing the consensus among both supervised predictions and unsupervised constraints. We cast this ensemble task as an optimization problem on a bipartite graph, where the objective function favors the smoothness of the prediction over the graph, as well as penalizing deviations from the initial labeling provided by supervised models. We solve this problem through iterative propagation of probability estimates among neighboring nodes. Our method can also be interpreted as conducting a constrained embedding in a transformed space, or a ranking on the graph. Experimental results on three real applications demonstrate the benefits of the proposed method over existing alternatives1 . 1

2 0.12188118 122 nips-2009-Label Selection on Graphs

Author: Andrew Guillory, Jeff A. Bilmes

Abstract: We investigate methods for selecting sets of labeled vertices for use in predicting the labels of vertices on a graph. We specifically study methods which choose a single batch of labeled vertices (i.e. offline, non sequential methods). In this setting, we find common graph smoothness assumptions directly motivate simple label selection methods with interesting theoretical guarantees. These methods bound prediction error in terms of the smoothness of the true labels with respect to the graph. Some of these bounds give new motivations for previously proposed algorithms, and some suggest new algorithms which we evaluate. We show improved performance over baseline methods on several real world data sets. 1

3 0.099404454 82 nips-2009-Entropic Graph Regularization in Non-Parametric Semi-Supervised Classification

Author: Amarnag Subramanya, Jeff A. Bilmes

Abstract: We prove certain theoretical properties of a graph-regularized transductive learning objective that is based on minimizing a Kullback-Leibler divergence based loss. These include showing that the iterative alternating minimization procedure used to minimize the objective converges to the correct solution and deriving a test for convergence. We also propose a graph node ordering algorithm that is cache cognizant and leads to a linear speedup in parallel computations. This ensures that the algorithm scales to large data sets. By making use of empirical evaluation on the TIMIT and Switchboard I corpora, we show this approach is able to outperform other state-of-the-art SSL approaches. In one instance, we solve a problem on a 120 million node graph. 1

4 0.089425057 237 nips-2009-Subject independent EEG-based BCI decoding

Author: Siamac Fazli, Cristian Grozea, Marton Danoczy, Benjamin Blankertz, Florin Popescu, Klaus-Robert Müller

Abstract: In the quest to make Brain Computer Interfacing (BCI) more usable, dry electrodes have emerged that get rid of the initial 30 minutes required for placing an electrode cap. Another time consuming step is the required individualized adaptation to the BCI user, which involves another 30 minutes calibration for assessing a subject’s brain signature. In this paper we aim to also remove this calibration proceedure from BCI setup time by means of machine learning. In particular, we harvest a large database of EEG BCI motor imagination recordings (83 subjects) for constructing a library of subject-specific spatio-temporal filters and derive a subject independent BCI classifier. Our offline results indicate that BCI-na¨ve ı users could start real-time BCI use with no prior calibration at only a very moderate performance loss.

5 0.085323296 201 nips-2009-Region-based Segmentation and Object Detection

Author: Stephen Gould, Tianshi Gao, Daphne Koller

Abstract: Object detection and multi-class image segmentation are two closely related tasks that can be greatly improved when solved jointly by feeding information from one task to the other [10, 11]. However, current state-of-the-art models use a separate representation for each task making joint inference clumsy and leaving the classification of many parts of the scene ambiguous. In this work, we propose a hierarchical region-based approach to joint object detection and image segmentation. Our approach simultaneously reasons about pixels, regions and objects in a coherent probabilistic model. Pixel appearance features allow us to perform well on classifying amorphous background classes, while the explicit representation of regions facilitate the computation of more sophisticated features necessary for object detection. Importantly, our model gives a single unified description of the scene—we explain every pixel in the image and enforce global consistency between all random variables in our model. We run experiments on the challenging Street Scene dataset [2] and show significant improvement over state-of-the-art results for object detection accuracy. 1

6 0.082309611 229 nips-2009-Statistical Analysis of Semi-Supervised Learning: The Limit of Infinite Unlabelled Data

7 0.081009932 149 nips-2009-Maximin affinity learning of image segmentation

8 0.079867586 9 nips-2009-A Game-Theoretic Approach to Hypergraph Clustering

9 0.079206251 251 nips-2009-Unsupervised Detection of Regions of Interest Using Iterative Link Analysis

10 0.075718336 87 nips-2009-Exponential Family Graph Matching and Ranking

11 0.070585698 157 nips-2009-Multi-Label Prediction via Compressed Sensing

12 0.070319466 40 nips-2009-Bayesian Nonparametric Models on Decomposable Graphs

13 0.068255521 212 nips-2009-Semi-Supervised Learning in Gigantic Image Collections

14 0.06783127 199 nips-2009-Ranking Measures and Loss Functions in Learning to Rank

15 0.067781828 120 nips-2009-Kernels and learning curves for Gaussian process regression on random graphs

16 0.066064306 182 nips-2009-Optimal Scoring for Unsupervised Learning

17 0.065953732 81 nips-2009-Ensemble Nystrom Method

18 0.065924928 108 nips-2009-Heterogeneous multitask learning with joint sparsity constraints

19 0.063253529 106 nips-2009-Heavy-Tailed Symmetric Stochastic Neighbor Embedding

20 0.063046344 130 nips-2009-Learning from Multiple Partially Observed Views - an Application to Multilingual Text Categorization


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.208), (1, -0.005), (2, -0.104), (3, 0.002), (4, -0.053), (5, 0.01), (6, -0.088), (7, -0.004), (8, -0.016), (9, -0.009), (10, -0.027), (11, 0.003), (12, -0.008), (13, -0.169), (14, -0.006), (15, 0.025), (16, 0.016), (17, -0.057), (18, -0.037), (19, -0.057), (20, 0.021), (21, -0.027), (22, -0.038), (23, 0.002), (24, 0.062), (25, 0.013), (26, -0.032), (27, -0.022), (28, 0.012), (29, -0.143), (30, 0.098), (31, -0.085), (32, 0.099), (33, 0.062), (34, 0.062), (35, 0.027), (36, 0.096), (37, 0.192), (38, 0.056), (39, 0.008), (40, 0.059), (41, 0.016), (42, -0.007), (43, 0.05), (44, 0.03), (45, -0.126), (46, -0.112), (47, 0.071), (48, 0.008), (49, 0.06)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94693464 102 nips-2009-Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models

Author: Jing Gao, Feng Liang, Wei Fan, Yizhou Sun, Jiawei Han

Abstract: Ensemble classifiers such as bagging, boosting and model averaging are known to have improved accuracy and robustness over a single model. Their potential, however, is limited in applications which have no access to raw data but to the meta-level model output. In this paper, we study ensemble learning with output from multiple supervised and unsupervised models, a topic where little work has been done. Although unsupervised models, such as clustering, do not directly generate label prediction for each individual, they provide useful constraints for the joint prediction of a set of related objects. We propose to consolidate a classification solution by maximizing the consensus among both supervised predictions and unsupervised constraints. We cast this ensemble task as an optimization problem on a bipartite graph, where the objective function favors the smoothness of the prediction over the graph, as well as penalizing deviations from the initial labeling provided by supervised models. We solve this problem through iterative propagation of probability estimates among neighboring nodes. Our method can also be interpreted as conducting a constrained embedding in a transformed space, or a ranking on the graph. Experimental results on three real applications demonstrate the benefits of the proposed method over existing alternatives1 . 1

2 0.6585114 122 nips-2009-Label Selection on Graphs

Author: Andrew Guillory, Jeff A. Bilmes

Abstract: We investigate methods for selecting sets of labeled vertices for use in predicting the labels of vertices on a graph. We specifically study methods which choose a single batch of labeled vertices (i.e. offline, non sequential methods). In this setting, we find common graph smoothness assumptions directly motivate simple label selection methods with interesting theoretical guarantees. These methods bound prediction error in terms of the smoothness of the true labels with respect to the graph. Some of these bounds give new motivations for previously proposed algorithms, and some suggest new algorithms which we evaluate. We show improved performance over baseline methods on several real world data sets. 1

3 0.65830815 149 nips-2009-Maximin affinity learning of image segmentation

Author: Kevin Briggman, Winfried Denk, Sebastian Seung, Moritz N. Helmstaedter, Srinivas C. Turaga

Abstract: Images can be segmented by first using a classifier to predict an affinity graph that reflects the degree to which image pixels must be grouped together and then partitioning the graph to yield a segmentation. Machine learning has been applied to the affinity classifier to produce affinity graphs that are good in the sense of minimizing edge misclassification rates. However, this error measure is only indirectly related to the quality of segmentations produced by ultimately partitioning the affinity graph. We present the first machine learning algorithm for training a classifier to produce affinity graphs that are good in the sense of producing segmentations that directly minimize the Rand index, a well known segmentation performance measure. The Rand index measures segmentation performance by quantifying the classification of the connectivity of image pixel pairs after segmentation. By using the simple graph partitioning algorithm of finding the connected components of the thresholded affinity graph, we are able to train an affinity classifier to directly minimize the Rand index of segmentations resulting from the graph partitioning. Our learning algorithm corresponds to the learning of maximin affinities between image pixel pairs, which are predictive of the pixel-pair connectivity. 1

4 0.57742488 82 nips-2009-Entropic Graph Regularization in Non-Parametric Semi-Supervised Classification

Author: Amarnag Subramanya, Jeff A. Bilmes

Abstract: We prove certain theoretical properties of a graph-regularized transductive learning objective that is based on minimizing a Kullback-Leibler divergence based loss. These include showing that the iterative alternating minimization procedure used to minimize the objective converges to the correct solution and deriving a test for convergence. We also propose a graph node ordering algorithm that is cache cognizant and leads to a linear speedup in parallel computations. This ensures that the algorithm scales to large data sets. By making use of empirical evaluation on the TIMIT and Switchboard I corpora, we show this approach is able to outperform other state-of-the-art SSL approaches. In one instance, we solve a problem on a 120 million node graph. 1

5 0.56550622 71 nips-2009-Distribution-Calibrated Hierarchical Classification

Author: Ofer Dekel

Abstract: While many advances have already been made in hierarchical classification learning, we take a step back and examine how a hierarchical classification problem should be formally defined. We pay particular attention to the fact that many arbitrary decisions go into the design of the label taxonomy that is given with the training data. Moreover, many hand-designed taxonomies are unbalanced and misrepresent the class structure in the underlying data distribution. We attempt to correct these problems by using the data distribution itself to calibrate the hierarchical classification loss function. This distribution-based correction must be done with care, to avoid introducing unmanageable statistical dependencies into the learning problem. This leads us off the beaten path of binomial-type estimation and into the unfamiliar waters of geometric-type estimation. In this paper, we present a new calibrated definition of statistical risk for hierarchical classification, an unbiased estimator for this risk, and a new algorithmic reduction from hierarchical classification to cost-sensitive classification.

6 0.55424583 182 nips-2009-Optimal Scoring for Unsupervised Learning

7 0.54732579 130 nips-2009-Learning from Multiple Partially Observed Views - an Application to Multilingual Text Categorization

8 0.52836049 9 nips-2009-A Game-Theoretic Approach to Hypergraph Clustering

9 0.52007747 234 nips-2009-Streaming k-means approximation

10 0.50168699 237 nips-2009-Subject independent EEG-based BCI decoding

11 0.4757373 106 nips-2009-Heavy-Tailed Symmetric Stochastic Neighbor Embedding

12 0.47173697 26 nips-2009-Adaptive Regularization for Transductive Support Vector Machine

13 0.47157344 127 nips-2009-Learning Label Embeddings for Nearest-Neighbor Multi-class Classification with an Application to Speech Recognition

14 0.46741086 81 nips-2009-Ensemble Nystrom Method

15 0.46458998 49 nips-2009-Breaking Boundaries Between Induction Time and Diagnosis Time Active Information Acquisition

16 0.46448967 108 nips-2009-Heterogeneous multitask learning with joint sparsity constraints

17 0.45168948 143 nips-2009-Localizing Bugs in Program Executions with Graphical Models

18 0.44127941 44 nips-2009-Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships

19 0.43904042 251 nips-2009-Unsupervised Detection of Regions of Interest Using Iterative Link Analysis

20 0.43268901 249 nips-2009-Tracking Dynamic Sources of Malicious Activity at Internet Scale


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(24, 0.035), (25, 0.063), (35, 0.035), (36, 0.09), (39, 0.476), (58, 0.068), (71, 0.038), (86, 0.076), (91, 0.014)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.98548573 109 nips-2009-Hierarchical Learning of Dimensional Biases in Human Categorization

Author: Adam Sanborn, Nick Chater, Katherine A. Heller

Abstract: Existing models of categorization typically represent to-be-classified items as points in a multidimensional space. While from a mathematical point of view, an infinite number of basis sets can be used to represent points in this space, the choice of basis set is psychologically crucial. People generally choose the same basis dimensions – and have a strong preference to generalize along the axes of these dimensions, but not “diagonally”. What makes some choices of dimension special? We explore the idea that the dimensions used by people echo the natural variation in the environment. Specifically, we present a rational model that does not assume dimensions, but learns the same type of dimensional generalizations that people display. This bias is shaped by exposing the model to many categories with a structure hypothesized to be like those which children encounter. The learning behaviour of the model captures the developmental shift from roughly “isotropic” for children to the axis-aligned generalization that adults show. 1

2 0.95115125 54 nips-2009-Compositionality of optimal control laws

Author: Emanuel Todorov

Abstract: We present a theory of compositionality in stochastic optimal control, showing how task-optimal controllers can be constructed from certain primitives. The primitives are themselves feedback controllers pursuing their own agendas. They are mixed in proportion to how much progress they are making towards their agendas and how compatible their agendas are with the present task. The resulting composite control law is provably optimal when the problem belongs to a certain class. This class is rather general and yet has a number of unique properties – one of which is that the Bellman equation can be made linear even for non-linear or discrete dynamics. This gives rise to the compositionality developed here. In the special case of linear dynamics and Gaussian noise our framework yields analytical solutions (i.e. non-linear mixtures of LQG controllers) without requiring the final cost to be quadratic. More generally, a natural set of control primitives can be constructed by applying SVD to Green’s function of the Bellman equation. We illustrate the theory in the context of human arm movements. The ideas of optimality and compositionality are both very prominent in the field of motor control, yet they have been difficult to reconcile. Our work makes this possible.

same-paper 3 0.93550938 102 nips-2009-Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models

Author: Jing Gao, Feng Liang, Wei Fan, Yizhou Sun, Jiawei Han

Abstract: Ensemble classifiers such as bagging, boosting and model averaging are known to have improved accuracy and robustness over a single model. Their potential, however, is limited in applications which have no access to raw data but to the meta-level model output. In this paper, we study ensemble learning with output from multiple supervised and unsupervised models, a topic where little work has been done. Although unsupervised models, such as clustering, do not directly generate label prediction for each individual, they provide useful constraints for the joint prediction of a set of related objects. We propose to consolidate a classification solution by maximizing the consensus among both supervised predictions and unsupervised constraints. We cast this ensemble task as an optimization problem on a bipartite graph, where the objective function favors the smoothness of the prediction over the graph, as well as penalizing deviations from the initial labeling provided by supervised models. We solve this problem through iterative propagation of probability estimates among neighboring nodes. Our method can also be interpreted as conducting a constrained embedding in a transformed space, or a ranking on the graph. Experimental results on three real applications demonstrate the benefits of the proposed method over existing alternatives1 . 1

4 0.90792322 251 nips-2009-Unsupervised Detection of Regions of Interest Using Iterative Link Analysis

Author: Gunhee Kim, Antonio Torralba

Abstract: This paper proposes a fast and scalable alternating optimization technique to detect regions of interest (ROIs) in cluttered Web images without labels. The proposed approach discovers highly probable regions of object instances by iteratively repeating the following two functions: (1) choose the exemplar set (i.e. a small number of highly ranked reference ROIs) across the dataset and (2) refine the ROIs of each image with respect to the exemplar set. These two subproblems are formulated as ranking in two different similarity networks of ROI hypotheses by link analysis. The experiments with the PASCAL 06 dataset show that our unsupervised localization performance is better than one of state-of-the-art techniques and comparable to supervised methods. Also, we test the scalability of our approach with five objects in Flickr dataset consisting of more than 200K images. 1

5 0.90239495 110 nips-2009-Hierarchical Mixture of Classification Experts Uncovers Interactions between Brain Regions

Author: Bangpeng Yao, Dirk Walther, Diane Beck, Li Fei-fei

Abstract: The human brain can be described as containing a number of functional regions. These regions, as well as the connections between them, play a key role in information processing in the brain. However, most existing multi-voxel pattern analysis approaches either treat multiple regions as one large uniform region or several independent regions, ignoring the connections between them. In this paper we propose to model such connections in an Hidden Conditional Random Field (HCRF) framework, where the classiďŹ er of one region of interest (ROI) makes predictions based on not only its voxels but also the predictions from ROIs that it connects to. Furthermore, we propose a structural learning method in the HCRF framework to automatically uncover the connections between ROIs. We illustrate this approach with fMRI data acquired while human subjects viewed images of different natural scene categories and show that our model can improve the top-level (the classiďŹ er combining information from all ROIs) and ROI-level prediction accuracy, as well as uncover some meaningful connections between ROIs. 1

6 0.68245018 21 nips-2009-Abstraction and Relational learning

7 0.66419554 148 nips-2009-Matrix Completion from Power-Law Distributed Samples

8 0.63033897 44 nips-2009-Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships

9 0.6284461 154 nips-2009-Modeling the spacing effect in sequential category learning

10 0.62826473 85 nips-2009-Explaining human multiple object tracking as resource-constrained approximate inference in a dynamic probabilistic model

11 0.62239128 155 nips-2009-Modelling Relational Data using Bayesian Clustered Tensor Factorization

12 0.6153962 9 nips-2009-A Game-Theoretic Approach to Hypergraph Clustering

13 0.61240578 188 nips-2009-Perceptual Multistability as Markov Chain Monte Carlo Inference

14 0.61003715 115 nips-2009-Individuation, Identification and Object Discovery

15 0.60088527 133 nips-2009-Learning models of object structure

16 0.6003418 125 nips-2009-Learning Brain Connectivity of Alzheimer's Disease from Neuroimaging Data

17 0.59147608 112 nips-2009-Human Rademacher Complexity

18 0.58514088 99 nips-2009-Functional network reorganization in motor cortex can be explained by reward-modulated Hebbian learning

19 0.58147413 86 nips-2009-Exploring Functional Connectivities of the Human Brain using Multivariate Information Analysis

20 0.57365793 40 nips-2009-Bayesian Nonparametric Models on Decomposable Graphs