180 nips-2011-Multiple Instance Filtering


Source: pdf

Author: Kamil A. Wnuk, Stefano Soatto

Abstract: We propose a robust filtering approach based on semi-supervised and multiple instance learning (MIL). We assume that the posterior density would be unimodal if not for the effect of outliers that we do not wish to explicitly model. Therefore, we seek a point estimate at the outset, rather than a generic approximation of the entire posterior. Our approach can be thought of as a combination of standard finite-dimensional filtering (Extended Kalman Filter, or Unscented Filter) with multiple instance learning, whereby the initial condition comes with a putative set of inlier measurements. We show how both the state (regression) and the inlier set (classification) can be estimated iteratively and causally by processing only the current measurement. We illustrate our approach on visual tracking problems whereby the object of interest (target) moves and evolves as a result of occlusions and deformations, and partial knowledge of the target is given in the form of a bounding box (training set).

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 We assume that the posterior density would be unimodal if not for the effect of outliers that we do not wish to explicitly model. [sent-4, score-0.166]

2 Our approach can be thought of as a combination of standard finite-dimensional filtering (Extended Kalman Filter, or Unscented Filter) with multiple instance learning, whereby the initial condition comes with a putative set of inlier measurements. [sent-6, score-0.385]

3 We show how both the state (regression) and the inlier set (classification) can be estimated iteratively and causally by processing only the current measurement. [sent-7, score-0.337]

4 We illustrate our approach on visual tracking problems whereby the object of interest (target) moves and evolves as a result of occlusions and deformations, and partial knowledge of the target is given in the form of a bounding box (training set). [sent-8, score-0.873]

5 Unfortunately, in many applications of interest, from visual tracking to robotic navigation, the posterior is not unimodal. [sent-13, score-0.207]

6 However, in many applications one has reason to believe that the posterior would be unimodal if not for the effect of outlier measurements, and therefore the interest is in a point estimate, for instance the mode, mean or median, rather than in the entire posterior. [sent-15, score-0.134]

7 Prior related work. Our goal is naturally framed in the classical robust statistical inference setting, whereby classification (inlier/outlier) is solved along with regression (filtering). [sent-20, score-0.134]

8 We assume that an initial condition is available, both for the regressor (state) as well as the inlier distribution. [sent-21, score-0.269]

9 The latter can be thought of as training data in a semi-supervised setting. [sent-23, score-0.125]

10 Our approach relates to recent work in detection-based tracking [3, 10] that uses semi-supervised learning [4, 18, 13], as well as multiple-instance learning [2] and latent-SVM models [8, 20]. [sent-26, score-0.165]

11 In [3] an ensemble of pixel-level weak classifiers is combined on-line via boosting; this is efficient but suffers from drift; [10] improves stability by using a static model trained on the first frame as a prior for labeling new training samples used to update an online classifier. [sent-27, score-0.389]

12 MILTrack [4] addressed the problem of selecting training data for model update so as to maintain maximum discriminative power. [sent-28, score-0.14]

13 We adopt an incremental SVM with a fast approximation of a nonlinear kernel [21] rather than online boosting. [sent-32, score-0.103]

14 Our part based representation and explicit dynamics allow us to better handle scale and shape changes without the need for a multi-scale image search [4, 13]. [sent-33, score-0.218]

15 The P-N tracker [13] combined a median flow tracker with an online random forest. [sent-35, score-0.294]

16 New training samples were collected when detections violated structural constraints based on estimated object position. [sent-36, score-0.287]

17 In an effort to control drift, new training data was not incorporated into the model until the tracked object returned to a previously confirmed appearance with high confidence. [sent-37, score-0.445]

18 This meant that if object appearance never returned to the “key frames,” the online model would never be updated. [sent-38, score-0.367]

19 In the aforementioned works objects are represented as a bounding box. [sent-39, score-0.184]

20 Other approaches have used explicit temporal models together with sparsity constraints to model appearance changes [15]. [sent-42, score-0.312]

21 We propose a semi-supervised approach to filtering, with an explicit temporal model, that assumes imperfect labeling, whereby portions of the image inside the bounding box are “true positives” and others are outliers. [sent-43, score-0.633]

22 This enables us to handle appearance changes, for instance due to partial occlusions or changes of vantage point. [sent-44, score-0.29]

23 This can be thought of as a realization of a stochastic process that evolves via some kind of ordinary difference equation $x(t+1) = f(x(t)) + \nu(t)$, where $\nu(t) \overset{\text{IID}}{\sim} p_\nu$ is a temporally independent and identically distributed process. [sent-48, score-0.156]

24 We denote the set of measurements at time t with $y(t) = \{y_i(t)\}_{i=1}^{m(t)}$, $y_i(t) \in \mathbb{R}^k$. [sent-50, score-0.316]

25 In classical filtering, the measurements are a known function of the state, y(t) = h(x(t)) + n(t), up to the measurement noise, n(t), that is a realization of a stochastic process that is often assumed to be temporally independent and identically distributed, and also independent of ν(t). [sent-52, score-0.274]

26 In our case, however, the components of the measurement process $y_1(t), \ldots$ [sent-53, score-0.117]

27 $\ldots, y_{m(t)}(t)$ are divided into two groups: those that behave like standard measurements in a filtering process, and those that do not. [sent-56, score-0.1]

28 The measurements are thus samples from a stochastic process that includes two independent sources of uncertainty: the measurement noise, n(t), and the selection process χ(t). [sent-61, score-0.267]

29 Our goal is that of determining a point-estimate of the state x(t) given measurements up to time t. [sent-62, score-0.157]

30 In order to design a filter, we first consider the full forward model of how the various samples of the inlier measurements are generated. [sent-65, score-0.379]

31 To this end, we assume that the inlier set is separable from the outlier set by a hyperplane in some feature space, represented by the normal vector $w(t) \in \mathbb{R}^l$. [sent-66, score-0.281]

32 Conversely, if we are given the hyperplane w(t), and state x(t), the measurements can be classified via $\chi(t) = \arg\min_{\chi} E(y(t), w(t), x(t), \chi)$. [sent-71, score-0.219]

33 The energy function, E(y(t), w(t), x(t), χ) depends on how one chooses to model the object and what side information is applied to constrain the selection of training data. [sent-72, score-0.2]

34 In the implementation details we give examples of how appearance continuity can be used as a constraint in this step. [sent-73, score-0.152]

35 Further, motion similarity and occlusion boundaries could also be used. [sent-74, score-0.092]

36 Finally, the forward (data-formation) model for a sample (realization) of the measurement process is given as follows: At time t = 0, we will assume that we have available an initial distribution $p(x_0)$ together with an initial assignment of inliers and outliers $\chi_0$, so $x(0) \sim p(x_0)$; $\chi(0) = \chi_0$. [sent-75, score-0.51]

37 At all subsequent times t, each realization evolves according to: $x(t+1) = f(x(t)) + v(t)$, $w(t+1) = \mathrm{stochSubgradIters}(w(t), y(t), \chi(t))$, $\chi(t) = \arg\min_{\chi} E(y(t), w(t), x(t), \chi)$, $\{y_i(t)\}_{i \in \chi(t)^+} = h(x(t), t) + n(t)$. [sent-77, score-0.109]

38 Note that it is possible for the model above to proceed in open-loop, when no inliers are present. [sent-80, score-0.229]

39 The model (1) can easily be extended to the case when the measurement equation is in implicit form, h(x(t), {yi (t)}i∈χ(t)+ , t) = n(t), since all that matters is the innovation pro. [sent-81, score-0.202]

40 Additional extensions can be entertained where the dynamics f depends on the classifier w, so that $x(t+1) = f(x(t), w(t)) + v(t)$, and similarly for the measurement equation $h(x(t), w(t), t)$, although we will not consider them here. [sent-83, score-0.117]

41 3 Application example: Visual tracking with shape and appearance changes Objects of interest (e. [sent-85, score-0.454]

42 humans, cars) move in ways that result in a deformation of their projection onto the image plane, even when the object is rigid. [sent-87, score-0.204]

43 Further changes of appearance occur due to motion relative to the light source and partial occlusions. [sent-88, score-0.288]

44 For instance, one can fix a bounding box (shape) and model change of appearance inside, including outliers (due to occlusion) and inliers (newly visible portions of the object). [sent-90, score-0.91]

45 Alternatively, one can enforce constancy of the reflectance function, but then shape changes as well as illumination must be modeled explicitly, which is complex [12]. [sent-91, score-0.137]

46 Our approach tracks the motion of a bounding box, enclosing the data inliers. [sent-92, score-0.301]

47 Call $c(t) \in \mathbb{R}^2$ the center of this bounding box, $v_c(t) \in \mathbb{R}^2$ the velocity of the center, $d(t) \in \mathbb{R}^2$ the length of the sides of the bounding box, and $v_d(t) \in \mathbb{R}^2$ its rate of change. [sent-93, score-0.408]
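
The excerpt names the state components but not the motion model f. As a hedged illustration only, the sketch below stacks them into an 8-vector and propagates it with a constant-velocity transition; the constant-velocity choice, the component ordering, and the names `constant_velocity_transition` and `predict` are assumptions made here, not details confirmed by the source.

```python
import numpy as np

def constant_velocity_transition(dt: float = 1.0) -> np.ndarray:
    """Transition matrix for the stacked state [c, v_c, d, v_d], each block in R^2.

    Assumption: constant-velocity dynamics (not stated in the excerpt).
    """
    F = np.eye(8)
    F[0:2, 2:4] = dt * np.eye(2)  # c <- c + dt * v_c
    F[4:6, 6:8] = dt * np.eye(2)  # d <- d + dt * v_d
    return F

def predict(x: np.ndarray, dt: float = 1.0) -> np.ndarray:
    """Noise-free one-step prediction x(t+1) = F x(t)."""
    return constant_velocity_transition(dt) @ x

# Hypothetical state: center (100, 50), center velocity (2, -1),
# side lengths (40, 80), rate of change of the sides (0.5, 0).
x = np.array([100.0, 50.0, 2.0, -1.0, 40.0, 80.0, 0.5, 0.0])
x_next = predict(x)
```

In a full filter the process noise and a covariance update would accompany this step; they are left out to keep the sketch short.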

48 As before χ(t) indicates a binary labeling of the measurement components, where χ(t)+ is the set of samples that correspond to the object of interest. [sent-95, score-0.358]

49 We have tested different versions of our framework where the components are superpixels as well as trajectories of feature points. [sent-96, score-0.139]

50 2 Algorithm development We focus our discussion in this section on the development of the discriminative appearance model at the heart of the inlier/outlier classification, w(t). [sent-103, score-0.152]

51 For simplicity, pretend for now that each frame contains m observations. [sent-104, score-0.166]

52 We assume an object is identified with a subset of the observations (inliers); at time t, we have {yi (t)}i∈χ(t)+ . [sent-105, score-0.122]

53 Also pretend that observations from all frames, $Y = \{y(t)\}_{t=1}^{N_f}$, were available simultaneously; $N_f$ is the number of frames in the video sequence. [sent-106, score-0.265]

54 If all frames were labeled (χ(t) known ∀ t), a maximum margin classifier $\hat{w}$ could be obtained by minimizing the objective (3) over all samples in all frames: $\hat{w} = \arg\min_{w} \left\{ \frac{\lambda}{2} \|w\|^2 + \frac{1}{m N_f} \sum_{t=1}^{N_f} \sum_{i=1}^{m} \ell(w, \phi(y_i(t)), \chi_i(t)) \right\}$. [sent-107, score-0.324]
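
To make the batch objective concrete, here is a minimal sketch that evaluates it for a given w. It assumes the loss is the hinge loss (the text later refers to a hinge loss objective) and that the feature map φ has already been applied, so `features` stacks φ(y_i(t)) over all frames; the function name `svm_objective` is hypothetical.

```python
import numpy as np

def svm_objective(w, features, labels, lam):
    """Regularized average hinge loss: (lam/2)*||w||^2 + mean(max(0, 1 - chi * <w, phi>)).

    features: (n, l) array of already-mapped samples phi(y_i(t)) from all frames.
    labels:   (n,) array with entries in {-1, +1}.
    """
    margins = labels * (features @ w)
    losses = np.maximum(0.0, 1.0 - margins)
    return 0.5 * lam * float(w @ w) + float(losses.mean())

# Toy example: only the second sample violates the margin.
w = np.array([1.0, 0.0])
features = np.array([[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0]])
labels = np.array([1.0, 1.0, -1.0])
value = svm_objective(w, features, labels, lam=0.1)
```

Averaging over all n = m·N_f stacked samples matches the 1/(m N_f) normalization when every frame contributes m samples.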

55 In reality an exact label assignment at every frame is not available, so we must infer the latent labeling χ simultaneously while learning the hyperplane w. [sent-110, score-0.221]

56 Continuing our hypothetical batch processing scenario, pretend we have estimates of some state of the object throughout time, $\hat{X} = \{\hat{x}(t)\}_{t=1}^{N_f}$. [sent-111, score-0.255]

57 This allows us to identify a reduced subset of candidate inliers (in MIL terminology a positive bag), within which we assume all inliers are contained. [sent-112, score-0.458]

58 The specification of a positive bag helps reduce the search space, since we can assume all samples outside of a positive bag are negative. [sent-113, score-0.228]

59 Recently, [19] proposed an efficient incremental scheme, PEGASOS, to solve the hinge loss objective in the primal form. [sent-118, score-0.144]

60 This enables straightforward incremental training of w as new data becomes available. [sent-119, score-0.141]

61 In a nutshell, at each PEGASOS iteration we select a subset of training samples from the current training set $A_j \subseteq T$, and update w according to $w_{j+1} = w_j - \eta_j \nabla_j$. [sent-121, score-0.367]
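
The update above can be sketched as a mini-batch subgradient step on the regularized hinge loss. This is an illustration under stated assumptions, not the authors' implementation: it uses the step size η_j = 1/(λj) from the standard Pegasos schedule, omits the optional projection step, and assumes features have already been passed through φ. The names `pegasos_step` and `train` are hypothetical.

```python
import numpy as np

def pegasos_step(w, feats, labels, lam, j):
    """One mini-batch update w_{j+1} = w_j - eta_j * grad_j, with j counted from 1."""
    eta = 1.0 / (lam * j)                      # assumed step-size schedule
    violators = labels * (feats @ w) < 1.0     # samples inside the margin
    hinge_part = (labels[violators][:, None] * feats[violators]).sum(axis=0)
    grad = lam * w - hinge_part / len(labels)  # subgradient on the mini-batch A_j
    return w - eta * grad

def train(feats, labels, lam, n_iters, batch_size, w0=None, seed=0):
    """Run n_iters updates, each on a random subset A_j of the training set."""
    rng = np.random.default_rng(seed)
    w = np.zeros(feats.shape[1]) if w0 is None else w0.copy()
    for j in range(1, n_iters + 1):
        idx = rng.choice(len(labels), size=batch_size, replace=False)
        w = pegasos_step(w, feats[idx], labels[idx], lam, j)
    return w

# Toy, linearly separable data.
feats = np.array([[2.0, 1.0], [3.0, 2.0], [2.5, 0.5],
                  [-2.0, -1.0], [-3.0, -2.0], [-2.5, -0.5]])
labels = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
w = train(feats, labels, lam=0.1, n_iters=200, batch_size=3)
predictions = np.sign(feats @ w)
```

Passing a previous hyperplane as `w0` gives a warm start, which is one way to read the incremental training described in the text.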

62 (5) seeks a solution to the binary integer program of inlier selection given w and x. [sent-125, score-0.229]

63 Instead of tackling this NP-hard problem, we re-interpret it as a constraint enforcement step based on additional cues within a search area specified by the current state estimate. [sent-126, score-0.124]

64 One example constraint for a superpixel based object representation is to re-interpret the given objective as a graph cut problem, with pairwise terms enforcing appearance consistency. [sent-127, score-0.274]

65 1 Initialization At t = 0 we are given initial observations y(0) and a bounding box indicating the object of interest {c(0) ± d(0)}. [sent-130, score-0.57]

66 We initialize χ(0) with positive indices corresponding to superpixels that have a majority of their area $|y_i(0)|$ within the bounding box: $\chi_i(0) = 1$ if $\frac{|\{c(0) \pm d(0)\} \cap y_i(0)|}{|y_i(0)|} > \epsilon_y$, and $\chi_i(0) = -1$ otherwise. (6) [sent-131, score-0.39]

67 The area threshold is $\epsilon_y$ = 0. [sent-132, score-0.283]
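
The initialization rule can be sketched with boolean pixel masks, one per superpixel. The mask representation, the function name `init_labels`, and the `area_threshold` argument are assumptions for illustration; the threshold value is truncated in the source, so it is left as a caller-supplied parameter rather than given a default.

```python
import numpy as np

def init_labels(superpixel_masks, center, half_size, area_threshold):
    """Label a superpixel +1 if the fraction of its area inside the box exceeds the threshold.

    superpixel_masks: list of boolean (H, W) arrays, one per superpixel (non-empty).
    center, half_size: (x, y) pairs describing the box {c(0) +/- d(0)}.
    """
    h, w = superpixel_masks[0].shape
    ys, xs = np.mgrid[0:h, 0:w]
    inside = (np.abs(xs - center[0]) <= half_size[0]) & \
             (np.abs(ys - center[1]) <= half_size[1])
    labels = []
    for mask in superpixel_masks:
        frac = (mask & inside).sum() / mask.sum()
        labels.append(1 if frac > area_threshold else -1)
    return np.array(labels)

# Toy 4x4 image split into a left and a right superpixel; the box covers the left half.
left = np.zeros((4, 4), dtype=bool)
left[:, :2] = True
right = ~left
chi0 = init_labels([left, right], center=(0.5, 1.5), half_size=(1.0, 2.0),
                   area_threshold=0.5)  # 0.5 is an illustrative value only
```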

68 This represents a bootstrap training set, T1 from which we learn an initial classifier w(1) for distinguishing object appearance. [sent-134, score-0.302]

69 Each element of the training set is a triplet (φ(yi (t)), χi (t), τi = t), where the last element is the time at which the feature is added to the training set. [sent-135, score-0.156]

70 We start by selecting all positive samples and a set number of negatives, $n_f$, sampled randomly from $\chi(0)^-$, giving $T_1 = \{(\phi(y_i(0)), \chi_i(0), 0)\}_{\forall i \in \chi(0)^+} \cup \{(\phi(y_j(0)), \chi_j(0), 0) \mid j \in \tilde{\chi}(0)^- \subseteq_{\text{rand}} \chi(0)^-, |\tilde{\chi}(0)^-| = n_f\}$. [sent-136, score-0.545]
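
A minimal sketch of assembling the bootstrap set follows: it keeps every positive sample, draws a fixed number of negatives uniformly at random without replacement, and stamps each triplet with time 0. The function name `bootstrap_training_set` and the `seed` argument are hypothetical, and features are assumed to be already mapped by φ.

```python
import random

def bootstrap_training_set(features, labels, n_neg, seed=0):
    """Return triplets (phi, chi, tau=0): all positives plus n_neg random negatives."""
    rng = random.Random(seed)
    pos = [i for i, c in enumerate(labels) if c == 1]
    neg = [i for i, c in enumerate(labels) if c == -1]
    sampled_neg = rng.sample(neg, min(n_neg, len(neg)))
    return [(features[i], labels[i], 0) for i in pos + sampled_neg]

# Placeholder features; any per-sample descriptor could stand in here.
features = ["a", "b", "c", "d", "e"]
labels = [1, -1, -1, 1, -1]
T1 = bootstrap_training_set(features, labels, n_neg=2)
```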

71 2 Prediction Step At time t, given the current estimate of the object state and classification χ(t), we add all positive samples and difficult negative samples lying outside of the estimated bounding box to the new training set Tt+1|t . [sent-139, score-0.765]

72 We then propagate the object state with the model of motion dynamics and finally update the decision boundary with the newly updated training set. [sent-140, score-0.429]

73 If τmax = 0, no memory is used and training data for model update consists only of observations from the current image. [sent-146, score-0.14]

74 Such a memory of recent training samples is analogous to the training cache used in [8] for training the latentSVM model. [sent-147, score-0.284]
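
One way to picture the role of τmax is as an age limit on the stored triplets (φ, χ, τ). The excerpt does not spell out the exact retention rule, so the sketch below is an assumption: it keeps a sample while its age t − τ does not exceed τmax, which reduces to "current image only" when τmax = 0. The name `prune_training_set` is hypothetical.

```python
def prune_training_set(training_set, t, tau_max):
    """Keep triplets (phi, chi, tau) whose age t - tau is at most tau_max."""
    return [sample for sample in training_set if t - sample[2] <= tau_max]

# Samples added at times 0, 3, and 5; the current time is 5.
history = [("f0", 1, 0), ("f3", -1, 3), ("f5", 1, 5)]
no_memory = prune_training_set(history, t=5, tau_max=0)
short_memory = prune_training_set(history, t=5, tau_max=2)
```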

75 During each classifier update we perform $N - n_T$ iterations of the stochastic subgradient descent algorithm, starting from the current best estimate of the separating hyperplane $w_{n_T} = w(t)$. [sent-148, score-0.163]

76 The overall number of iterations N is set as $N = 20/\lambda$, where λ is a function of the bootstrap training set size, $\lambda = 1/(10|T_1|)$. [sent-149, score-0.14]
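
Combining the two settings gives N = 20/λ = 200·|T1|, so the iteration budget grows linearly with the bootstrap set size. The short helper below only restates that arithmetic; the name `iteration_budget` is hypothetical.

```python
def iteration_budget(bootstrap_size):
    """Return (lam, N) with lam = 1 / (10 * |T1|) and N = 20 / lam = 200 * |T1|."""
    lam = 1.0 / (10 * bootstrap_size)
    n_iters = 200 * bootstrap_size  # same as 20 / lam, kept in integers to avoid rounding
    return lam, n_iters

lam, n_iters = iteration_budget(50)  # a bootstrap set of 50 samples
```

For a bootstrap set of 50 samples this yields λ = 0.002 and N = 10000.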

77 3 Update Step The innovation is in implicit form with h(yi (t + 1)i∈χ(t+1)+ ) ∈ R4 giving a tight bounding box around the selected foreground regions in the same form as they appear in the state. [sent-155, score-0.56]

78 In the update equations r specifies the size of the search region around the predicted state within which we consider observations as candidates for foreground; ξ specifies the indices of candidate observations (positive bag). [sent-156, score-0.119]
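
The positive bag ξ can be pictured as the observations that fall inside a search region around the predicted box. How r enters is not given in the excerpt; the sketch assumes r multiplicatively scales the predicted half-extent and that each observation is summarized by a centroid. The name `positive_bag` and both of these choices are illustrative assumptions.

```python
import numpy as np

def positive_bag(centroids, center, half_size, r):
    """Indices of observations whose centroid lies in the predicted box scaled by r."""
    centroids = np.asarray(centroids, dtype=float)
    center = np.asarray(center, dtype=float)
    extent = r * np.asarray(half_size, dtype=float)
    inside = np.all(np.abs(centroids - center) <= extent, axis=1)
    return np.flatnonzero(inside)

# Three observation centroids; the predicted box is centered at (5, 5) with half-size (4, 4).
xi = positive_bag([[5, 5], [12, 5], [30, 30]], center=(5, 5), half_size=(4, 4), r=2.0)
```

Everything outside the returned index set would be treated as negative, in line with the positive-bag assumption described earlier.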

79 Figure 1: Ski sequence: Left panel shows frame number, search area (black rectangle), filter prediction (blue), observation (red), and updated filter estimate (green). [sent-159, score-0.157]

80 The algorithm performs well and successfully recovers from missed detection (from frame 349 to 352 shown above). [sent-163, score-0.178]

81 Figure 2: P-N tracker [13] (above) and MILTrack [4] (below) initialized with the same bounding box as our approach. [sent-164, score-0.517]

82 The P-N tracker fails because of the absence of stable low-level tracks on the target and quickly locks onto a patch of trees in the background. [sent-166, score-0.171]

83 3 Experiments To compare with [18, 4, 13], we first evaluate our discriminative model without maintaining any training data history τmax = 0 and updating w every 6 frames, with training data collected between incremental updates. [sent-168, score-0.219]

84 Even with τmax = 0 we can track highly deforming objects (a skier) with significant scale changes through most of the 1496 frames (Fig. [sent-169, score-0.328]

85 We also recover from errors due to the implicit memory in the decision boundary from incremental updating. [sent-171, score-0.159]

86 Two evaluation metrics are reported: the mean center location error in pixels [4], and the percentage of correctly tracked frames as computed by the bounding box overlap criterion $\frac{\mathrm{area}(ROI_D \cap ROI_{GT})}{\mathrm{area}(ROI_D \cup ROI_{GT})} > 0.5$. [sent-176, score-0.637]

87 Figure 3: Convergence of the classifier: Samples from frames 113, 125, 733, and 1435 of the “liquor” sequence. [sent-177, score-0.189]
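
The bounding box overlap criterion used for the "correctly tracked frames" metric can be sketched as an intersection-over-union test with a 0.5 threshold. Boxes are assumed to be axis-aligned and given as (x_min, y_min, x_max, y_max); that representation and the function names are choices made for this illustration.

```python
def box_overlap(a, b):
    """Intersection area divided by union area for two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def fraction_correct(detections, ground_truth, threshold=0.5):
    """Fraction of frames whose detection overlaps the ground truth by more than threshold."""
    hits = sum(box_overlap(d, g) > threshold for d, g in zip(detections, ground_truth))
    return hits / len(ground_truth)

# Two frames: a perfect detection and one shifted by half the box width.
detections = [(0, 0, 2, 2), (0, 0, 2, 2)]
ground_truth = [(0, 0, 2, 2), (1, 0, 3, 2)]
score = fraction_correct(detections, ground_truth)
```

The shifted box has overlap 2/6, below the threshold, so only one of the two frames counts as correctly tracked.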

88 The ground truth for the PROST dataset is reported using a constant sized bounding box. [sent-180, score-0.265]

89 In the liquor sequence our method correctly shrinks the bounding box to the label, since the rest of the bottle is not discriminative. [sent-182, score-0.552]

90 If we modify the criterion to count as valid a detection where > 99% of the detection area lies within the annotated ground truth region, the score becomes 75. [sent-187, score-0.244]

91 If we allow for > 90% of the detected area to lie within the ground truth box, the final pascal result for the liquor sequence becomes 79. [sent-189, score-0.388]

92 The same phenomenon occurs in the box sequence, where our approach adapts to tracking the label at the bottom of the box. [sent-192, score-0.389]

93 Additional results, including failure modes as well as successful tracking where other approaches fail, are reported in the supplementary material, both for the case of superpixels and tracks. [sent-194, score-0.304]

94 ours P-N [13] PROST [18] MILTrack [4] FragTrack [1] Overall pascal 74. [sent-195, score-0.096]

95 Our method and the P-N tracker [13] do not always detect the object. [sent-242, score-0.109]

96 Ground truthed frames in which no location was reported by the method of [13] were not counted into the final distance score. [sent-243, score-0.189]

97 The method of [13] missed 2 detections on the box sequence, 1 detection on the lemming sequence, and 80 on the liquor sequence. [sent-244, score-0.551]

98 When our approach failed to detect the object, we used the predicted bounding box from the state of the filter as our reported result. [sent-245, score-0.465]

99 However, we have provided empirical validation of our approach on challenging visual tracking problems, where it exceeds the state of the art, and illustrated some of its failure modes. [sent-249, score-0.222]

100 Dynamic shape and appearance modeling via moving and deforming layers. [sent-325, score-0.266]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('nf', 0.229), ('inlier', 0.229), ('inliers', 0.229), ('box', 0.224), ('yi', 0.216), ('frames', 0.189), ('bounding', 0.184), ('prost', 0.173), ('ltering', 0.168), ('tracking', 0.165), ('appearance', 0.152), ('liquor', 0.144), ('superpixels', 0.139), ('object', 0.122), ('measurement', 0.117), ('miltrack', 0.116), ('mnf', 0.116), ('tt', 0.114), ('tracker', 0.109), ('measurements', 0.1), ('wj', 0.099), ('pascal', 0.096), ('frame', 0.09), ('bag', 0.089), ('roid', 0.087), ('roigt', 0.087), ('stochsubgraditers', 0.087), ('argmin', 0.085), ('outliers', 0.084), ('changes', 0.081), ('training', 0.078), ('pretend', 0.076), ('aj', 0.073), ('whereby', 0.069), ('labeling', 0.069), ('foreground', 0.067), ('area', 0.067), ('cvpr', 0.066), ('robust', 0.065), ('incremental', 0.063), ('classi', 0.063), ('bootstrap', 0.062), ('tracks', 0.062), ('hyperplane', 0.062), ('update', 0.062), ('deforming', 0.058), ('leistner', 0.058), ('lemming', 0.058), ('wnt', 0.058), ('state', 0.057), ('pegasos', 0.057), ('occlusions', 0.057), ('realization', 0.057), ('iid', 0.057), ('shape', 0.056), ('er', 0.055), ('boundary', 0.055), ('motion', 0.055), ('returned', 0.053), ('evolves', 0.052), ('outlier', 0.052), ('causally', 0.051), ('deformations', 0.051), ('si', 0.05), ('samples', 0.05), ('detection', 0.048), ('thought', 0.047), ('mode', 0.046), ('lter', 0.046), ('particle', 0.046), ('drift', 0.046), ('innovation', 0.044), ('mil', 0.044), ('ground', 0.044), ('nt', 0.043), ('primal', 0.043), ('posterior', 0.042), ('di', 0.042), ('centroid', 0.042), ('deformation', 0.042), ('explicit', 0.041), ('implicit', 0.041), ('online', 0.04), ('image', 0.04), ('missed', 0.04), ('tracked', 0.04), ('vd', 0.04), ('unimodal', 0.04), ('lc', 0.04), ('initial', 0.04), ('subgradient', 0.039), ('temporal', 0.038), ('hinge', 0.038), ('detections', 0.037), ('occlusion', 0.037), ('portions', 0.037), ('rand', 0.037), ('truth', 0.037), ('median', 0.036), ('max', 0.036)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 180 nips-2011-Multiple Instance Filtering

2 0.20659171 303 nips-2011-Video Annotation and Tracking with Active Learning

Author: Carl Vondrick, Deva Ramanan

Abstract: We introduce a novel active learning framework for video annotation. By judiciously choosing which frames a user should annotate, we can obtain highly accurate tracks with minimal user effort. We cast this problem as one of active learning, and show that we can obtain excellent performance by querying frames that, if annotated, would produce a large expected change in the estimated object track. We implement a constrained tracker and compute the expected change for putative annotations with efficient dynamic programming algorithms. We demonstrate our framework on four datasets, including two benchmark datasets constructed with key frame annotations obtained by Amazon Mechanical Turk. Our results indicate that we could obtain equivalent labels for a small fraction of the original cost.

3 0.17845595 275 nips-2011-Structured Learning for Cell Tracking

Author: Xinghua Lou, Fred A. Hamprecht

Abstract: We study the problem of learning to track a large quantity of homogeneous objects such as cell tracking in cell culture study and developmental biology. Reliable cell tracking in time-lapse microscopic image sequences is important for modern biomedical research. Existing cell tracking methods are usually kept simple and use only a small number of features to allow for manual parameter tweaking or grid search. We propose a structured learning approach that allows to learn optimum parameters automatically from a training set. This allows for the use of a richer set of features which in turn affords improved tracking compared to recently reported methods on two public benchmark sequences.

4 0.1571286 193 nips-2011-Object Detection with Grammar Models

Author: Ross B. Girshick, Pedro F. Felzenszwalb, David A. McAllester

Abstract: Compositional models provide an elegant formalism for representing the visual appearance of highly variable objects. While such models are appealing from a theoretical point of view, it has been difficult to demonstrate that they lead to performance advantages on challenging datasets. Here we develop a grammar model for person detection and show that it outperforms previous high-performance systems on the PASCAL benchmark. Our model represents people using a hierarchy of deformable parts, variable structure and an explicit model of occlusion for partially visible objects. To train the model, we introduce a new discriminative framework for learning structured prediction models from weakly-labeled data.

5 0.15356235 247 nips-2011-Semantic Labeling of 3D Point Clouds for Indoor Scenes

Author: Hema S. Koppula, Abhishek Anand, Thorsten Joachims, Ashutosh Saxena

Abstract: Inexpensive RGB-D cameras that give an RGB image together with depth data have become widely available. In this paper, we use this data to build 3D point clouds of full indoor scenes such as an office and address the task of semantic labeling of these 3D point clouds. We propose a graphical model that captures various features and contextual relations, including the local visual appearance and shape cues, object co-occurrence relationships and geometric relationships. With a large number of object classes and relations, the model’s parsimony becomes important and we address that by using multiple types of edge potentials. The model admits efficient approximate inference, and we train it using a maximum-margin learning approach. In our experiments over a total of 52 3D scenes of homes and offices (composed from about 550 views, having 2495 segments labeled with 27 object classes), we get a performance of 84.06% in labeling 17 object classes for offices, and 73.38% in labeling 17 object classes for home scenes. Finally, we applied these algorithms successfully on a mobile robot for the task of finding objects in large cluttered rooms.

6 0.13895975 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding

7 0.13094431 154 nips-2011-Learning person-object interactions for action recognition in still images

8 0.12250048 255 nips-2011-Simultaneous Sampling and Multi-Structure Fitting with Adaptive Reversible Jump MCMC

9 0.12151974 166 nips-2011-Maximal Cliques that Satisfy Hard Constraints with Application to Deformable Object Model Learning

10 0.11667194 168 nips-2011-Maximum Margin Multi-Instance Learning

11 0.11303417 138 nips-2011-Joint 3D Estimation of Objects and Scene Layout

12 0.10886393 290 nips-2011-Transfer Learning by Borrowing Examples for Multiclass Object Detection

13 0.10854119 35 nips-2011-An ideal observer model for identifying the reference frame of objects

14 0.10605154 91 nips-2011-Exploiting spatial overlap to efficiently compute appearance distances between image windows

15 0.10552841 148 nips-2011-Learning Probabilistic Non-Linear Latent Variable Models for Tracking Complex Activities

16 0.10206772 261 nips-2011-Sparse Filtering

17 0.099972174 223 nips-2011-Probabilistic Joint Image Segmentation and Labeling

18 0.093392506 119 nips-2011-Higher-Order Correlation Clustering for Image Segmentation

19 0.091873609 165 nips-2011-Matrix Completion for Multi-label Image Classification

20 0.088784255 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.268), (1, 0.104), (2, -0.101), (3, 0.15), (4, 0.074), (5, 0.07), (6, 0.014), (7, -0.091), (8, -0.028), (9, 0.147), (10, 0.069), (11, -0.103), (12, 0.028), (13, 0.005), (14, -0.014), (15, -0.053), (16, -0.008), (17, 0.014), (18, -0.021), (19, 0.05), (20, -0.014), (21, 0.015), (22, 0.002), (23, -0.004), (24, 0.006), (25, -0.06), (26, -0.277), (27, 0.114), (28, -0.019), (29, -0.066), (30, -0.023), (31, -0.031), (32, 0.177), (33, -0.063), (34, -0.049), (35, 0.038), (36, -0.005), (37, -0.011), (38, 0.101), (39, -0.079), (40, 0.009), (41, -0.152), (42, 0.024), (43, -0.015), (44, 0.036), (45, -0.027), (46, 0.008), (47, -0.023), (48, -0.04), (49, -0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94560444 180 nips-2011-Multiple Instance Filtering

2 0.82524598 275 nips-2011-Structured Learning for Cell Tracking

Author: Xinghua Lou, Fred A. Hamprecht

Abstract: We study the problem of learning to track a large quantity of homogeneous objects such as cell tracking in cell culture study and developmental biology. Reliable cell tracking in time-lapse microscopic image sequences is important for modern biomedical research. Existing cell tracking methods are usually kept simple and use only a small number of features to allow for manual parameter tweaking or grid search. We propose a structured learning approach that allows to learn optimum parameters automatically from a training set. This allows for the use of a richer set of features which in turn affords improved tracking compared to recently reported methods on two public benchmark sequences.

3 0.73805618 193 nips-2011-Object Detection with Grammar Models

Author: Ross B. Girshick, Pedro F. Felzenszwalb, David A. McAllester

Abstract: Compositional models provide an elegant formalism for representing the visual appearance of highly variable objects. While such models are appealing from a theoretical point of view, it has been difficult to demonstrate that they lead to performance advantages on challenging datasets. Here we develop a grammar model for person detection and show that it outperforms previous high-performance systems on the PASCAL benchmark. Our model represents people using a hierarchy of deformable parts, variable structure and an explicit model of occlusion for partially visible objects. To train the model, we introduce a new discriminative framework for learning structured prediction models from weakly-labeled data.

4 0.73372173 303 nips-2011-Video Annotation and Tracking with Active Learning

Author: Carl Vondrick, Deva Ramanan

Abstract: We introduce a novel active learning framework for video annotation. By judiciously choosing which frames a user should annotate, we can obtain highly accurate tracks with minimal user effort. We cast this problem as one of active learning, and show that we can obtain excellent performance by querying frames that, if annotated, would produce a large expected change in the estimated object track. We implement a constrained tracker and compute the expected change for putative annotations with efficient dynamic programming algorithms. We demonstrate our framework on four datasets, including two benchmark datasets constructed with key frame annotations obtained by Amazon Mechanical Turk. Our results indicate that we could obtain equivalent labels for a small fraction of the original cost.

5 0.62968856 290 nips-2011-Transfer Learning by Borrowing Examples for Multiclass Object Detection

Author: Joseph J. Lim, Antonio Torralba, Ruslan Salakhutdinov

Abstract: Despite the recent trend of increasingly large datasets for object detection, there still exist many classes with few training examples. To overcome this lack of training data for certain classes, we propose a novel way of augmenting the training data for each class by borrowing and transforming examples from other classes. Our model learns which training instances from other classes to borrow and how to transform the borrowed examples so that they become more similar to instances from the target class. Our experimental results demonstrate that our new object detector, with borrowed and transformed examples, improves upon the current state-of-the-art detector on the challenging SUN09 object detection dataset.

6 0.56645757 148 nips-2011-Learning Probabilistic Non-Linear Latent Variable Models for Tracking Complex Activities

7 0.56147134 255 nips-2011-Simultaneous Sampling and Multi-Structure Fitting with Adaptive Reversible Jump MCMC

8 0.55731267 138 nips-2011-Joint 3D Estimation of Objects and Scene Layout

9 0.55590385 154 nips-2011-Learning person-object interactions for action recognition in still images

10 0.54573405 169 nips-2011-Maximum Margin Multi-Label Structured Prediction

11 0.54062611 166 nips-2011-Maximal Cliques that Satisfy Hard Constraints with Application to Deformable Object Model Learning

12 0.53549719 247 nips-2011-Semantic Labeling of 3D Point Clouds for Indoor Scenes

13 0.49548095 35 nips-2011-An ideal observer model for identifying the reference frame of objects

14 0.47326204 233 nips-2011-Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound

15 0.46769133 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding

16 0.46493232 223 nips-2011-Probabilistic Joint Image Segmentation and Labeling

17 0.45464125 277 nips-2011-Submodular Multi-Label Learning

18 0.45213994 197 nips-2011-On Tracking The Partition Function

19 0.44778535 232 nips-2011-Ranking annotators for crowdsourced labeling tasks

20 0.44693372 33 nips-2011-An Exact Algorithm for F-Measure Maximization


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.014), (4, 0.072), (6, 0.011), (20, 0.091), (26, 0.04), (31, 0.122), (33, 0.056), (40, 0.149), (43, 0.07), (45, 0.096), (57, 0.037), (65, 0.017), (74, 0.046), (83, 0.049), (84, 0.015), (99, 0.052)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.89509523 107 nips-2011-Global Solution of Fully-Observed Variational Bayesian Matrix Factorization is Column-Wise Independent

Author: Shinichi Nakajima, Masashi Sugiyama, S. D. Babacan

Abstract: Variational Bayesian matrix factorization (VBMF) efficiently approximates the posterior distribution of factorized matrices by assuming matrix-wise independence of the two factors. A recent study on fully-observed VBMF showed that, under a stronger assumption that the two factorized matrices are column-wise independent, the global optimal solution can be analytically computed. However, it was not clear how restrictive the column-wise independence assumption is. In this paper, we prove that the global solution under matrix-wise independence is actually column-wise independent, implying that the column-wise independence assumption is harmless. A practical consequence of our theoretical finding is that the global solution under matrix-wise independence (which is a standard setup) can be obtained analytically in a computationally very efficient way without any iterative algorithms. We experimentally illustrate advantages of using our analytic solution in probabilistic principal component analysis.

same-paper 2 0.86494058 180 nips-2011-Multiple Instance Filtering

Author: Kamil A. Wnuk, Stefano Soatto

Abstract: We propose a robust filtering approach based on semi-supervised and multiple instance learning (MIL). We assume that the posterior density would be unimodal if not for the effect of outliers that we do not wish to explicitly model. Therefore, we seek a point estimate at the outset, rather than a generic approximation of the entire posterior. Our approach can be thought of as a combination of standard finite-dimensional filtering (Extended Kalman Filter, or Unscented Filter) with multiple instance learning, whereby the initial condition comes with a putative set of inlier measurements. We show how both the state (regression) and the inlier set (classification) can be estimated iteratively and causally by processing only the current measurement. We illustrate our approach on visual tracking problems whereby the object of interest (target) moves and evolves as a result of occlusions and deformations, and partial knowledge of the target is given in the form of a bounding box (training set).

3 0.81471485 30 nips-2011-Algorithms for Hyper-Parameter Optimization

Author: James S. Bergstra, Rémi Bardenet, Yoshua Bengio, Balázs Kégl

Abstract: Several recent advances to the state of the art in image classification benchmarks have come from better configurations of existing techniques rather than novel approaches to feature learning. Traditionally, hyper-parameter optimization has been the job of humans because they can be very efficient in regimes where only a few trials are possible. Presently, computer clusters and GPU processors make it possible to run more trials and we show that algorithmic approaches can find better results. We present hyper-parameter optimization results on tasks of training neural networks and deep belief networks (DBNs). We optimize hyper-parameters using random search and two new greedy sequential methods based on the expected improvement criterion. Random search has been shown to be sufficiently efficient for learning neural networks for several datasets, but we show it is unreliable for training DBNs. The sequential algorithms are applied to the most difficult DBN learning problems from [1] and find significantly better results than the best previously reported. This work contributes novel techniques for making response surface models P(y|x) in which many elements of hyper-parameter assignment (x) are known to be irrelevant given particular values of other elements.

4 0.8100881 98 nips-2011-From Bandits to Experts: On the Value of Side-Observations

Author: Shie Mannor, Ohad Shamir

Abstract: We consider an adversarial online learning setting where a decision maker can choose an action in every stage of the game. In addition to observing the reward of the chosen action, the decision maker gets side observations on the reward he would have obtained had he chosen some of the other actions. The observation structure is encoded as a graph, where node i is linked to node j if sampling i provides information on the reward of j. This setting naturally interpolates between the well-known “experts” setting, where the decision maker can view all rewards, and the multi-armed bandits setting, where the decision maker can only view the reward of the chosen action. We develop practical algorithms with provable regret guarantees, which depend on non-trivial graph-theoretic properties of the information feedback structure. We also provide partially-matching lower bounds.

5 0.79617155 303 nips-2011-Video Annotation and Tracking with Active Learning

Author: Carl Vondrick, Deva Ramanan

Abstract: We introduce a novel active learning framework for video annotation. By judiciously choosing which frames a user should annotate, we can obtain highly accurate tracks with minimal user effort. We cast this problem as one of active learning, and show that we can obtain excellent performance by querying frames that, if annotated, would produce a large expected change in the estimated object track. We implement a constrained tracker and compute the expected change for putative annotations with efficient dynamic programming algorithms. We demonstrate our framework on four datasets, including two benchmark datasets constructed with key frame annotations obtained by Amazon Mechanical Turk. Our results indicate that we could obtain equivalent labels for a small fraction of the original cost.

6 0.79118067 227 nips-2011-Pylon Model for Semantic Segmentation

7 0.78912884 266 nips-2011-Spatial distance dependent Chinese restaurant processes for image segmentation

8 0.78681177 223 nips-2011-Probabilistic Joint Image Segmentation and Labeling

9 0.77571887 55 nips-2011-Collective Graphical Models

10 0.77304608 75 nips-2011-Dynamical segmentation of single trials from population neural data

11 0.76966423 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs

12 0.76947105 66 nips-2011-Crowdclustering

13 0.76944083 204 nips-2011-Online Learning: Stochastic, Constrained, and Smoothed Adversaries

14 0.76845264 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding

15 0.76637632 127 nips-2011-Image Parsing with Stochastic Scene Grammar

16 0.76509523 229 nips-2011-Query-Aware MCMC

17 0.76364118 156 nips-2011-Learning to Learn with Compound HD Models

18 0.76186723 154 nips-2011-Learning person-object interactions for action recognition in still images

19 0.75999725 231 nips-2011-Randomized Algorithms for Comparison-based Search

20 0.75898689 17 nips-2011-Accelerated Adaptive Markov Chain for Partition Function Computation