nips nips2009 nips2009-44 knowledge-graph by maker-knowledge-mining

44 nips-2009-Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships


Source: pdf

Author: Tomasz Malisiewicz, Alyosha Efros

Abstract: The use of context is critical for scene understanding in computer vision, where the recognition of an object is driven by both local appearance and the object’s relationship to other elements of the scene (context). Most current approaches rely on modeling the relationships between object categories as a source of context. In this paper we seek to move beyond categories to provide a richer appearance-based model of context. We present an exemplar-based model of objects and their relationships, the Visual Memex, that encodes both local appearance and 2D spatial context between object instances. We evaluate our model on Torralba’s proposed Context Challenge against a baseline category-based system. Our experiments suggest that moving beyond categories for context modeling appears to be quite beneficial, and may be the critical missing ingredient in scene understanding systems. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 The use of context is critical for scene understanding in computer vision, where the recognition of an object is driven by both local appearance and the object’s relationship to other elements of the scene (context). [sent-4, score-0.355]

2 Most current approaches rely on modeling the relationships between object categories as a source of context. [sent-5, score-0.337]

3 In this paper we seek to move beyond categories to provide a richer appearance-based model of context. [sent-6, score-0.129]

4 We present an exemplar-based model of objects and their relationships, the Visual Memex, that encodes both local appearance and 2D spatial context between object instances. [sent-7, score-0.472]

5 Our experiments suggest that moving beyond categories for context modeling appears to be quite beneficial, and may be the critical missing ingredient in scene understanding systems. [sent-9, score-0.237]

6 In real scenes composed of many different objects, the spatial configuration of one object can facilitate recognition of related objects [1], and quite often ambiguities in recognition cannot be resolved without looking beyond the spatial extent of the object in question. [sent-12, score-0.581]

7 Thus, algorithms which jointly recognize many objects at once by taking account of contextual relationships have been quite popular. [sent-13, score-0.237]

8 Building on earlier efforts (e.g. [2, 3]), more modern approaches typically perform inference in a probabilistic graphical model over categories, where object interactions are modeled as higher-order potentials [4, 5, 6, 7, 8, 9, 10]. [sent-16, score-0.294]

9 One important implicit assumption made by all such models is that interactions between object instances can be adequately modeled as relationships between human-defined object categories. [sent-17, score-0.373]

10 In this paper we challenge this “category assumption” for object-object interactions and propose a novel category-free approach for modeling object relationships. [sent-18, score-0.229]

11 We propose a new framework, the Visual Memex Model, for representing and reasoning about object identities and their contextual relationships in an exemplar-based, non-parametric way. [sent-19, score-0.287]

12 2 Motivation. The use of categories (classes) to represent concepts (e.g. visual objects) is so prevalent in computer vision and machine learning that most researchers don’t give it a second thought. [sent-21, score-0.129] [sent-23, score-0.149]

14 Aristotle defined categories as discrete entities characterized by a set of properties shared by all their members [12]. [sent-29, score-0.129]

15 His categories are mutually exclusive, and every member of a category is equal. [sent-30, score-0.224]

16 This classical view is still the most widely accepted way of reasoning about categories and taxonomies in hard sciences. [sent-31, score-0.129]

17 The ground-breaking work of cognitive psychologist Eleanor Rosch [14] demonstrated that humans do not cut up the world into neat categories defined by shared properties, but instead use similarity as the basis of categorization. [sent-41, score-0.198]

18 Such Prototype models have been successfully used for object recognition [15, 16]. [sent-43, score-0.174]

19 This allows for a dynamic definition of categories based on data availability and task (e.g. an object can be a vehicle, a car, a Volvo, or Bob’s Volvo). [sent-45, score-0.129] [sent-47, score-0.133]

21 A recent operationalization of the exemplar model in the visual domain can be found in [19]. [sent-48, score-0.372]

22 But it might not be too productive to concentrate on the various categorization theories without considering the final aim – what do we need categories for? [sent-49, score-0.184]

23 For example, having been attacked once by a tiger, it is critically important to determine whether a newly observed object belongs to the tiger category, so as to utilize the information from the previous encounter. [sent-53, score-0.228]

24 He argues that the goal of visual perception is not to recognize an object in the traditional sense of categorizing it. [sent-56, score-0.282]

25 One particular area where we feel these ideas might prove very useful is in modeling relationships between objects within an image. [sent-70, score-0.187]

26 Therefore, in this paper we propose, in homage to Bush, the Visual Memex Model as a first step towards operationalizing the direct modeling of associations between visual objects, and compare it with more standard tools for the same task. [sent-71, score-0.253]

27 Abandoning rigid object categories, we embrace Bush’s and Bar’s belief in the primary role of associations, but unlike Bush, we aim to discover these associations automatically from the data. [sent-73, score-0.237]

28 The Visual Memex can then be thought of as a vast graph, with nodes representing all the object instances. Figure 1: The Visual Memex graph encodes object similarity (solid black edges) and spatial context (dotted red edges) between pairs of object exemplars. [sent-75, score-0.651]

29 A spatial context feature is stored for each context edge. [sent-76, score-0.291]

30 The Memex graph can be used to interpret a new image (left) by associating image segments with exemplars in the graph (orange edges) and propagating the information. [sent-77, score-0.254]

31 There are two types of arcs in our model, encoding two different relationships between objects: 1) visual similarity (e.g. this car looks like that car), and 2) contextual associations between objects that co-occur in a scene. [sent-80, score-0.293] [sent-82, score-0.295]

33 Once the graph is built, it can be used to interpret a novel image (Figure 1, left) by first connecting segments within the image with similar stored exemplars, and then propagating contextual information between these exemplars through the graph. [sent-85, score-0.333]

34 When an exemplar gets activated, visually similar exemplars as well as other contextually relevant objects get activated as well. [sent-86, score-0.482]

35 For example, in Figure 1, we should be able to infer that a car seen from the rear often co-occurs with an oblique building wall (but not a frontal wall) – something which category-based models would be hard-pressed to achieve. [sent-88, score-0.22]

36 Formally, we define the Visual Memex Model as a graph G = (V, E_S, E_C, {D}, {f}) consisting of N object exemplar nodes V, similarity edges E_S, context edges E_C, N per-exemplar similarity functions {D}, and the spatial features {f} associated with each context edge. [sent-89, score-0.865]
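
The formal definition above maps naturally onto a small data structure. The following is a minimal sketch (not the authors' code; all class and field names are illustrative assumptions) of one way to store the exemplar nodes, the two edge types, and the per-edge spatial features.

```python
# Illustrative sketch of the Visual Memex graph G = (V, E_S, E_C, {D}, {f}).
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Exemplar:
    appearance: np.ndarray   # appearance descriptor S of this object instance
    label: str               # ground-truth category (used only for evaluation)

@dataclass
class VisualMemex:
    exemplars: list                                       # V: N exemplar nodes
    similarity_edges: set = field(default_factory=set)    # E_S: pairs (i, j)
    context_edges: dict = field(default_factory=dict)     # E_C: (i, j) -> spatial feature f
    distance_fns: list = field(default_factory=list)      # {D}: one learned distance function per exemplar

    def add_similarity_edge(self, i, j):
        self.similarity_edges.add((i, j))

    def add_context_edge(self, i, j, f):
        # two exemplars observed in the same image: store their spatial relationship
        self.context_edges[(i, j)] = np.asarray(f, dtype=float)
```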

37 3.1 Similarity Edges. We use the per-exemplar distance-function learning algorithm of Malisiewicz et al. [19] to learn the object similarity edges. [sent-92, score-0.245]

38 For each exemplar, the algorithm learns which other exemplars it is similar to as well as a distance function. [sent-93, score-0.176]

39 For the j-th exemplar, wj is the vector of 14 weights, bj is a scalar bias, and αj ∈ {0, 1}^|C| is a binary indicator vector which encodes which other exemplars the current exemplar is similar to. [sent-96, score-0.399]

40 Let di be the vector of 14 Euclidean distances between the exemplar whose similarity we are learning (the focal exemplar) and the i-th exemplar. [sent-98, score-0.292]

41 C is the set of exemplars that have the same label as the focal exemplar. [sent-99, score-0.176]

42 During learning, the regularization term favors connecting to many similarly-labeled exemplars and the loss term favors separability in distance. [sent-103, score-0.716]

43 Figure 2: Torralba’s Context Challenge: “How far can you go without running a local object detector?” [sent-105, score-0.724]

44 The task is to reason about the identity of the hidden object (denoted by a “?”). [sent-106, score-0.175]

45 In our category-free Visual Memex model, object predictions are generated in the form of exemplar associations for the hidden object. [sent-108, score-0.502]

46 In a category-based model, the category of the hidden object is directly estimated. [sent-109, score-0.27]

47 We create a similarity edge between two exemplars if they are deemed similar by each other’s distance functions. [sent-111, score-0.245]
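
A minimal sketch of how such mutual similarity edges could be formed from learned per-exemplar distance functions of the form wj · d + bj over the 14 elementwise distances described above. The threshold-at-zero rule and the helper names are assumptions for illustration, not the exact procedure of [19].

```python
# Sketch: mutual similarity edges from per-exemplar linear distance functions.
import numpy as np

def elementwise_distances(feats_a, feats_b):
    """14 per-feature Euclidean distances between two exemplars.
    feats_a, feats_b: lists of 14 feature vectors (one per feature type)."""
    return np.array([np.linalg.norm(fa - fb) for fa, fb in zip(feats_a, feats_b)])

def learned_distance(w, b, feats_focal, feats_other):
    """Distance of 'other' under the focal exemplar's function: w . d + b."""
    return float(np.dot(w, elementwise_distances(feats_focal, feats_other)) + b)

def mutual_similarity_edges(features, weights, biases, thresh=0.0):
    """Create edge (i, j) only if each exemplar is deemed similar by the
    other's distance function (distance below an assumed threshold)."""
    n, edges = len(features), set()
    for i in range(n):
        for j in range(i + 1, n):
            d_ij = learned_distance(weights[i], biases[i], features[i], features[j])
            d_ji = learned_distance(weights[j], biases[j], features[j], features[i])
            if d_ij < thresh and d_ji < thresh:
                edges.add((i, j))
    return edges
```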

48 3.2 Context Edges. When two objects occur inside a single image, we encode their 2-D spatial relationship into a context feature vector f ∈ R^10 (visualized as red dotted edges in Figure 1). [sent-115, score-0.306]

49 The context feature vector encodes relative overlap, relative displacement, relative scale, and relative height of the bottom-most pixel between two exemplar regions in a single image. [sent-116, score-0.331]

50 This feature captures the spatial relationship between two regions and does not take into account any appearance information – it is a generalization of the spatial features used in [8]. [sent-117, score-0.223]

51 We measure the similarity between two context features using a Gaussian kernel: K(f, f′) = exp(−α1 ||f − f′||²), with α1 = 1. [sent-118, score-0.177]
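
A sketch of a pairwise spatial feature and the Gaussian kernel above. The kernel follows the stated formula; the exact composition of the 10 feature dimensions is not spelled out here, so the feature construction below is only an assumption built from the listed ingredients (relative overlap, displacement, scale, and bottom-pixel height), and all names are illustrative.

```python
# Sketch: appearance-free spatial context feature and its Gaussian kernel.
import numpy as np

def context_feature(region_a, region_b):
    """region_*: dict with 'bbox' = (x, y, w, h) and 'mask' = boolean array (same image)."""
    xa, ya, wa, ha = region_a['bbox']
    xb, yb, wb, hb = region_b['bbox']
    inter = np.logical_and(region_a['mask'], region_b['mask']).sum()
    union = np.logical_or(region_a['mask'], region_b['mask']).sum()
    overlap = inter / max(union, 1)                          # relative overlap
    scale = np.sqrt(wa * ha) / max(np.sqrt(wb * hb), 1e-6)   # relative scale
    dx = (xa + wa / 2 - xb - wb / 2) / max(wb, 1e-6)         # relative displacement
    dy = (ya + ha / 2 - yb - hb / 2) / max(hb, 1e-6)
    bottom = ((ya + ha) - (yb + hb)) / max(hb, 1e-6)         # relative bottom-pixel height
    # The paper uses a 10-D feature; the five quantities above are an assumed subset.
    return np.array([overlap, scale, dx, dy, bottom])

def context_kernel(f1, f2, alpha1=1.0):
    """K(f, f') = exp(-alpha1 * ||f - f'||^2), as stated in the text."""
    return float(np.exp(-alpha1 * np.sum((f1 - f2) ** 2)))
```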

52 3.3 Building the Visual Memex. We extract a large database of exemplar objects and their ground-truth segmentation masks from the LabelMe [22] dataset and learn the structure of the Visual Memex in an offline setting. [sent-121, score-0.306]

53 We use objects from the 30 most frequently occurring categories in LabelMe. [sent-122, score-0.212]

54 Similarity edges are created using the per-exemplar distance function learning framework of [19], and context edges are created each time two exemplars are observed in the same image. [sent-123, score-0.364]

55 We have a total of N = 87,802 exemplars in the Visual Memex, |E_S| = 276,782 similarity edges, and |E_C| = 989,106 context edges. [sent-124, score-0.353]

56 4 Evaluating on the Context Challenge. The intuition that we would like to evaluate is that many useful regularities of the visual world are lost when dealing solely with categories (e.g. the side view of a building should associate more with a side view of a car than a frontal view of a car). [sent-125, score-0.278] [sent-127, score-0.149]

58 The key motivation behind the Visual Memex is that context should depend on the appearance of an object and not just the category it belongs to. [sent-128, score-0.409]

59 The evaluation task is inspired by the question: “How far can you go without running an object detector?” [sent-131, score-0.133]

60 The goal is to recognize a single object in the image without peeking at pixels belonging to that object. [sent-132, score-0.172]

61 Torralba presented an algorithm for predicting the category and scale of an object using only contextual information [23], but his notion of context is scene-centered (where the appearance of the entire image is used for prediction). [sent-133, score-0.527]

62 While it is not clear if the absolute performance numbers on the Context Challenge are very meaningful in themselves, we feel that it is an ideal task for studying object-centered context and the role of categorization assumptions in such models. [sent-135, score-0.192]

63 In our variant of the Context Challenge, the goal is to predict the category of a hidden object yi solely based on its spatial relationships to some provided objects – without using the pixels belonging to the hidden object at all. [sent-136, score-0.712]

64 For our study, we use manually provided regions and category labels of K supporting objects inside a single image. [sent-137, score-0.246]

65 We refer to the identities of the K supporting objects in the image as {y1, . . . , yK} (each yj ∈ {1, . . . , |C|}), and to the set of K 2D spatial relationship features between each supporting object and the hidden object as {f_i1, . . . , f_iK}. [sent-138, score-0.19] [sent-144, score-0.451]

67 Not making the “category assumption,” the model is defined with respect to exemplar associations for the hidden object. [sent-150, score-0.369]

68 Inference in the model returns a compatibility score between every exemplar and the hidden object, and can be thought of as returning an ordered list of exemplar associations. [sent-151, score-0.521]

69 Because the model works with exemplar associations rather than category assignments, a supporting object can be associated with multiple exemplars rather than a single category. [sent-152, score-0.799]

70 We create soft exemplar associations between each of the supporting objects and the exemplars in the Visual Memex using the similarity functions {D} (see Section 3.1). [sent-153, score-0.723]

71 {S1, . . . , SK} are the appearance features for the K supporting objects. [sent-158, score-0.141]

72 A^j_a is the affinity between exemplar a in the Visual Memex and the j-th supporting object, and is computed by evaluating Sj under a’s distance function: A^j_a = exp(−Da(Sj)). [sent-159, score-0.424]

73 Ψ(ei , ej , f ij ) is the pairwise compatibility between j exemplar ei and ej under the spatial feature f ij . [sent-160, score-0.606]

74 Let Wab be the adjacency matrix representation of the similarity edges (Wuv = [(u, v) ∈ ES ]). [sent-161, score-0.109]

75 Inference in the Visual Memex Model is done by optimizing the following conditional distribution which scores the assignment of an arbitrary exemplar ei to the hidden object based on contextual relations: K N Aa Ψ(ei , ea , f ij ) j p(ei |A1 , . [sent-162, score-0.594]

76 , f iK ) ∝ (2) j=1 a=1 log Ψ(ei , ej , f ij ) = (u,v)∈EC Wiu Wjv K(f ij , f uv ) (u,v)∈EC Wiu Wjv (3) The reason for the summation inside Equation 3 is that it aggregates contextual interactions from similar exemplars. [sent-168, score-0.265]

77 By doing this, we effectively “densify” the contextual interactions in the Visual Memex. [sent-169, score-0.111]

78 We experimented with using a single kernel, Ψ(e_i, e_j | f_ij) = K(f_ij, f_{e_i e_j}), and found that the integration of multiple features via the densification described above is a key ingredient for successful Visual Memex inference. [sent-171, score-0.221]

79 However, since the task we are evaluated on is category-based, we combine the returned exemplars into a vote for categories using Luce’s Axiom of Choice [17], which averages the exemplar responses per category. [sent-174, score-0.528]
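
A small sketch of the per-category vote described above: average the exemplar responses within each category and normalize. Treating the responses as nonnegative compatibilities, and the final normalization into a distribution, are assumptions made for illustration.

```python
# Sketch: Luce-style category vote from per-exemplar compatibility scores.
from collections import defaultdict

def category_vote(responses, labels):
    """responses: nonnegative compatibility per exemplar; labels: its category."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r, c in zip(responses, labels):
        sums[c] += r
        counts[c] += 1
    means = {c: sums[c] / counts[c] for c in sums}   # average response per category
    total = sum(means.values()) or 1.0
    return {c: v / total for c, v in means.items()}  # P(c) proportional to its strength
```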

80 CoLA learns a set of parameters for each pair of categories which correspond to the relative strengths of four different spatial relationships (top, above, below, inside). [sent-178, score-0.204]

81 In the case of dealing with categories directly, we consider a conditional distribution over the category of the hidden object yi that factors as a star graph with K leaves (with the hidden object being connected to all the supporting objects). [sent-179, score-0.676]

82 θ are model parameters, Ψ is a pairwise potential that measures the compatibility of two categories with a specified spatial relationship, and Z is a normalization constant such that the conditional distribution sums to 1. [sent-180, score-0.237]

83 p(y_i \mid y_1, \ldots, y_K, f_{i1}, \ldots, f_{iK}, \theta) = \frac{1}{Z} \prod_{j=1}^{K} \Psi(y_i, y_j, f_{ij}, \theta) \quad (4)

Following [8], we use a feature function h(f) that computes the affinity between feature f and a set of prototypical spatial relationships. [sent-187, score-0.209]

84 We automatically find P prototypical spatial relationships by clustering all spatial feature vectors {f} in the training set via the popular K-means algorithm. [sent-188, score-0.259]

85 θ is the set of all parameters in this model, with θ(yi, yj) ∈ R^P being the parameters associated with the pair of categories (yi, yj). [sent-193, score-0.229]
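
A sketch of the baseline's spatial vocabulary: cluster the training spatial features into P prototypes with K-means and compute h(f) as affinities to those prototypes. The soft Gaussian form of h and the linear form of the pairwise potential are assumptions consistent with θ(yi, yj) ∈ R^P, not necessarily the exact parameterization of [8].

```python
# Sketch: K-means spatial prototypes and the category-pair potential they feed.
import numpy as np
from sklearn.cluster import KMeans

def learn_prototypes(train_features, P=30, seed=0):
    """Cluster all training spatial feature vectors {f} into P prototypes."""
    km = KMeans(n_clusters=P, random_state=seed, n_init=10).fit(np.asarray(train_features))
    return km.cluster_centers_

def h(f, prototypes, alpha=1.0):
    """Affinity of feature f to each of the P prototypes (assumed Gaussian form)."""
    d2 = np.sum((prototypes - f) ** 2, axis=1)
    return np.exp(-alpha * d2)

def log_pairwise_potential(theta_pair, f, prototypes, alpha=1.0):
    """Assumed linear form: log Psi(y_i, y_j, f) = theta(y_i, y_j) . h(f)."""
    return float(np.dot(theta_pair, h(f, prototypes, alpha)))
```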

86 We feel it would also be useful to examine a hybrid model – dubbed the Reduced KDE Memex Model – which uses a nonparametric model of context but operates on object categories. [sent-201, score-0.27]

87 The Reduced KDE Memex Model is created by collapsing all exemplars belonging to a single category into fully-connected components, which can be thought of as adding categories into the Visual Memex graph. [sent-202, score-0.4]

88 The identities of individual exemplars are lost, and thus we lose the fine details of spatial context. [sent-203, score-0.251]

89 By forming categories, we can no longer say a particular spatial relationship is between a blue side view of a car and an oblique brick building; we can only say it is a relationship between a car and a building. [sent-204, score-0.299]

90 Now that we are left with an unordered bag of spatial relationships {f } between two categories, we need a way to measure compatibility between a newly observed f and the stored relationships. [sent-205, score-0.183]

91 \log \Psi(y_i, y_j, f_{ij}) = \frac{\sum_{(u,v) \in E_C} \delta_{y_i y_u} \delta_{y_j y_v} K(f_{ij}, f_{uv})}{\sum_{(u,v) \in E_C} \delta_{y_i y_u} \delta_{y_j y_v}} \quad (7)

The Reduced Memex model, being category-based and nonparametric, aggregates the spatial relationships across many different pairs of exemplars from two categories. [sent-210, score-0.56]
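
A sketch of the Reduced KDE Memex potential in Equation 7 as reconstructed above: the same kernel average as Equation 3, but pooled by category labels rather than by exemplar similarity. The variable names ('labels', 'context_edges') are illustrative.

```python
# Sketch of Eq. 7: category-pooled kernel density over stored spatial features.
def log_psi_kde(y_i, y_j, f_ij, context_edges, labels, kernel):
    """labels[u] is the category of exemplar u; context_edges maps (u, v) -> f_uv."""
    num, den = 0.0, 0.0
    for (u, v), f_uv in context_edges.items():
        if labels[u] == y_i and labels[v] == y_j:
            num += kernel(f_ij, f_uv)
            den += 1.0
    return num / den if den > 0 else 0.0
```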

92 Figure 3: per-category precision for the 30 LabelMe categories (person, car, tree, window, head, building, sky, wall, road, sidewalk, sign, chair, door, mountain, table, floor, streetlight, lamp, plant, pole, balcony, wheel, text, grass, column, pane, trash, blind, ground, arm). [sent-216, score-1.018] [sent-235, score-1.013]

94 For an image with K objects, we solve K Context Challenge problems with one hidden object and K-1 supporting objects. [sent-242, score-0.282]

95 The distributions returned by CoLA tend to degenerate to a single non-zero value (most often on one of the popular categories such as window). [sent-261, score-0.129]

96 We also demonstrate the power of the Visual Memex to predict appearance solely based on contextual interactions with other objects and their visual appearance. [sent-263, score-0.416]

97 In row 3 we see that the appearance of snow on one mountain suggests that the other portion of the image also contains a snowy mountain. [sent-266, score-0.145]

98 In summary, we presented a category-free Visual Memex Model and applied it to the task of contextual object recognition within the experimental framework of the Context Challenge. [sent-267, score-0.253]

99 Our experiments confirm our intuition that moving beyond categories is beneficial for improved modeling of relationships between objects. [sent-268, score-0.204]

100 Textonboost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation. [sent-302, score-0.282]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('memex', 0.686), ('exemplar', 0.223), ('exemplars', 0.176), ('cola', 0.171), ('visual', 0.149), ('object', 0.133), ('categories', 0.129), ('kde', 0.125), ('car', 0.112), ('context', 0.108), ('associations', 0.104), ('category', 0.095), ('co', 0.095), ('bush', 0.092), ('objects', 0.083), ('contextual', 0.079), ('relationships', 0.075), ('spatial', 0.075), ('antonio', 0.075), ('ec', 0.074), ('appearance', 0.073), ('wall', 0.071), ('la', 0.07), ('similarity', 0.069), ('wheel', 0.069), ('door', 0.068), ('supporting', 0.068), ('ei', 0.067), ('gr', 0.067), ('challenge', 0.064), ('kd', 0.063), ('sidewalk', 0.063), ('bli', 0.057), ('flo', 0.057), ('pla', 0.057), ('su', 0.057), ('categorization', 0.055), ('ej', 0.054), ('ee', 0.054), ('torralba', 0.053), ('ij', 0.05), ('sid', 0.05), ('yj', 0.05), ('window', 0.048), ('road', 0.047), ('ex', 0.046), ('bu', 0.046), ('al', 0.043), ('ba', 0.043), ('str', 0.043), ('balcony', 0.043), ('din', 0.043), ('etl', 0.043), ('floor', 0.043), ('lamp', 0.043), ('pane', 0.043), ('sigewa', 0.043), ('streetlight', 0.043), ('unta', 0.043), ('vannevar', 0.043), ('wiu', 0.043), ('wjv', 0.043), ('hidden', 0.042), ('recognition', 0.041), ('edges', 0.04), ('le', 0.04), ('image', 0.039), ('ch', 0.039), ('malisiewicz', 0.038), ('son', 0.038), ('trash', 0.038), ('building', 0.037), ('pe', 0.036), ('po', 0.036), ('sk', 0.035), ('arm', 0.035), ('yi', 0.034), ('prototype', 0.034), ('prototypical', 0.034), ('te', 0.034), ('em', 0.034), ('mountain', 0.033), ('compatibility', 0.033), ('ht', 0.033), ('interactions', 0.032), ('lc', 0.032), ('ig', 0.032), ('plant', 0.032), ('ta', 0.032), ('vi', 0.032), ('person', 0.031), ('chair', 0.03), ('grass', 0.03), ('pole', 0.03), ('sky', 0.029), ('ro', 0.029), ('feel', 0.029), ('labelme', 0.029), ('un', 0.029), ('eleanor', 0.029)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 44 nips-2009-Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships

Author: Tomasz Malisiewicz, Alyosha Efros

Abstract: The use of context is critical for scene understanding in computer vision, where the recognition of an object is driven by both local appearance and the object’s relationship to other elements of the scene (context). Most current approaches rely on modeling the relationships between object categories as a source of context. In this paper we seek to move beyond categories to provide a richer appearancebased model of context. We present an exemplar-based model of objects and their relationships, the Visual Memex, that encodes both local appearance and 2D spatial context between object instances. We evaluate our model on Torralba’s proposed Context Challenge against a baseline category-based system. Our experiments suggest that moving beyond categories for context modeling appears to be quite beneficial, and may be the critical missing ingredient in scene understanding systems. 1

2 0.16741021 201 nips-2009-Region-based Segmentation and Object Detection

Author: Stephen Gould, Tianshi Gao, Daphne Koller

Abstract: Object detection and multi-class image segmentation are two closely related tasks that can be greatly improved when solved jointly by feeding information from one task to the other [10, 11]. However, current state-of-the-art models use a separate representation for each task making joint inference clumsy and leaving the classification of many parts of the scene ambiguous. In this work, we propose a hierarchical region-based approach to joint object detection and image segmentation. Our approach simultaneously reasons about pixels, regions and objects in a coherent probabilistic model. Pixel appearance features allow us to perform well on classifying amorphous background classes, while the explicit representation of regions facilitate the computation of more sophisticated features necessary for object detection. Importantly, our model gives a single unified description of the scene—we explain every pixel in the image and enforce global consistency between all random variables in our model. We run experiments on the challenging Street Scene dataset [2] and show significant improvement over state-of-the-art results for object detection accuracy. 1

3 0.16459911 109 nips-2009-Hierarchical Learning of Dimensional Biases in Human Categorization

Author: Adam Sanborn, Nick Chater, Katherine A. Heller

Abstract: Existing models of categorization typically represent to-be-classified items as points in a multidimensional space. While from a mathematical point of view, an infinite number of basis sets can be used to represent points in this space, the choice of basis set is psychologically crucial. People generally choose the same basis dimensions – and have a strong preference to generalize along the axes of these dimensions, but not “diagonally”. What makes some choices of dimension special? We explore the idea that the dimensions used by people echo the natural variation in the environment. Specifically, we present a rational model that does not assume dimensions, but learns the same type of dimensional generalizations that people display. This bias is shaped by exposing the model to many categories with a structure hypothesized to be like those which children encounter. The learning behaviour of the model captures the developmental shift from roughly “isotropic” for children to the axis-aligned generalization that adults show. 1

4 0.13089217 133 nips-2009-Learning models of object structure

Author: Joseph Schlecht, Kobus Barnard

Abstract: We present an approach for learning stochastic geometric models of object categories from single view images. We focus here on models expressible as a spatially contiguous assemblage of blocks. Model topologies are learned across groups of images, and one or more such topologies is linked to an object category (e.g. chairs). Fitting learned topologies to an image can be used to identify the object class, as well as detail its geometry. The latter goes beyond labeling objects, as it provides the geometric structure of particular instances. We learn the models using joint statistical inference over category parameters, camera parameters, and instance parameters. These produce an image likelihood through a statistical imaging model. We use trans-dimensional sampling to explore topology hypotheses, and alternate between Metropolis-Hastings and stochastic dynamics to explore instance parameters. Experiments on images of furniture objects such as tables and chairs suggest that this is an effective approach for learning models that encode simple representations of category geometry and the statistics thereof, and support inferring both category and geometry on held out single view images. 1

5 0.097771361 211 nips-2009-Segmenting Scenes by Matching Image Composites

Author: Bryan Russell, Alyosha Efros, Josef Sivic, Bill Freeman, Andrew Zisserman

Abstract: In this paper, we investigate how, given an image, similar images sharing the same global description can help with unsupervised scene segmentation. In contrast to recent work in semantic alignment of scenes, we allow an input image to be explained by partial matches of similar scenes. This allows for a better explanation of the input scenes. We perform MRF-based segmentation that optimizes over matches, while respecting boundary information. The recovered segments are then used to re-query a large database of images to retrieve better matches for the target regions. We show improved performance in detecting the principal occluding and contact boundaries for the scene over previous methods on data gathered from the LabelMe database.

6 0.084376492 131 nips-2009-Learning from Neighboring Strokes: Combining Appearance and Context for Multi-Domain Sketch Recognition

7 0.083920009 154 nips-2009-Modeling the spacing effect in sequential category learning

8 0.075138174 175 nips-2009-Occlusive Components Analysis

9 0.074914262 84 nips-2009-Evaluating multi-class learning strategies in a generative hierarchical framework for object detection

10 0.073094539 5 nips-2009-A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation

11 0.072820984 251 nips-2009-Unsupervised Detection of Regions of Interest Using Iterative Link Analysis

12 0.072359048 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition

13 0.071019977 96 nips-2009-Filtering Abstract Senses From Image Search Results

14 0.067955531 85 nips-2009-Explaining human multiple object tracking as resource-constrained approximate inference in a dynamic probabilistic model

15 0.05749676 58 nips-2009-Constructing Topological Maps using Markov Random Fields and Loop-Closure Detection

16 0.056029037 122 nips-2009-Label Selection on Graphs

17 0.05489064 236 nips-2009-Structured output regression for detection with partial truncation

18 0.052373186 179 nips-2009-On the Algorithmics and Applications of a Mixed-norm based Kernel Learning Formulation

19 0.051024161 162 nips-2009-Neural Implementation of Hierarchical Bayesian Inference by Importance Sampling

20 0.050265487 4 nips-2009-A Bayesian Analysis of Dynamics in Free Recall


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.17), (1, -0.121), (2, -0.113), (3, -0.053), (4, -0.011), (5, 0.106), (6, 0.003), (7, 0.044), (8, 0.067), (9, -0.033), (10, 0.044), (11, -0.065), (12, 0.126), (13, -0.151), (14, 0.038), (15, 0.056), (16, -0.002), (17, 0.009), (18, 0.02), (19, -0.044), (20, -0.043), (21, -0.019), (22, 0.01), (23, 0.042), (24, -0.06), (25, -0.093), (26, -0.019), (27, -0.022), (28, -0.056), (29, -0.052), (30, -0.009), (31, 0.042), (32, 0.069), (33, 0.073), (34, -0.107), (35, 0.001), (36, -0.014), (37, 0.1), (38, -0.036), (39, -0.035), (40, 0.032), (41, -0.037), (42, -0.043), (43, -0.059), (44, 0.027), (45, 0.023), (46, 0.018), (47, 0.052), (48, 0.022), (49, -0.053)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94964349 44 nips-2009-Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships

Author: Tomasz Malisiewicz, Alyosha Efros

Abstract: The use of context is critical for scene understanding in computer vision, where the recognition of an object is driven by both local appearance and the object’s relationship to other elements of the scene (context). Most current approaches rely on modeling the relationships between object categories as a source of context. In this paper we seek to move beyond categories to provide a richer appearancebased model of context. We present an exemplar-based model of objects and their relationships, the Visual Memex, that encodes both local appearance and 2D spatial context between object instances. We evaluate our model on Torralba’s proposed Context Challenge against a baseline category-based system. Our experiments suggest that moving beyond categories for context modeling appears to be quite beneficial, and may be the critical missing ingredient in scene understanding systems. 1

2 0.82591987 133 nips-2009-Learning models of object structure

Author: Joseph Schlecht, Kobus Barnard

Abstract: We present an approach for learning stochastic geometric models of object categories from single view images. We focus here on models expressible as a spatially contiguous assemblage of blocks. Model topologies are learned across groups of images, and one or more such topologies is linked to an object category (e.g. chairs). Fitting learned topologies to an image can be used to identify the object class, as well as detail its geometry. The latter goes beyond labeling objects, as it provides the geometric structure of particular instances. We learn the models using joint statistical inference over category parameters, camera parameters, and instance parameters. These produce an image likelihood through a statistical imaging model. We use trans-dimensional sampling to explore topology hypotheses, and alternate between Metropolis-Hastings and stochastic dynamics to explore instance parameters. Experiments on images of furniture objects such as tables and chairs suggest that this is an effective approach for learning models that encode simple representations of category geometry and the statistics thereof, and support inferring both category and geometry on held out single view images. 1

3 0.73883742 201 nips-2009-Region-based Segmentation and Object Detection

Author: Stephen Gould, Tianshi Gao, Daphne Koller

Abstract: Object detection and multi-class image segmentation are two closely related tasks that can be greatly improved when solved jointly by feeding information from one task to the other [10, 11]. However, current state-of-the-art models use a separate representation for each task making joint inference clumsy and leaving the classification of many parts of the scene ambiguous. In this work, we propose a hierarchical region-based approach to joint object detection and image segmentation. Our approach simultaneously reasons about pixels, regions and objects in a coherent probabilistic model. Pixel appearance features allow us to perform well on classifying amorphous background classes, while the explicit representation of regions facilitate the computation of more sophisticated features necessary for object detection. Importantly, our model gives a single unified description of the scene—we explain every pixel in the image and enforce global consistency between all random variables in our model. We run experiments on the challenging Street Scene dataset [2] and show significant improvement over state-of-the-art results for object detection accuracy. 1

4 0.69313556 85 nips-2009-Explaining human multiple object tracking as resource-constrained approximate inference in a dynamic probabilistic model

Author: Ed Vul, George Alvarez, Joshua B. Tenenbaum, Michael J. Black

Abstract: Multiple object tracking is a task commonly used to investigate the architecture of human visual attention. Human participants show a distinctive pattern of successes and failures in tracking experiments that is often attributed to limits on an object system, a tracking module, or other specialized cognitive structures. Here we use a computational analysis of the task of object tracking to ask which human failures arise from cognitive limitations and which are consequences of inevitable perceptual uncertainty in the tracking task. We find that many human performance phenomena, measured through novel behavioral experiments, are naturally produced by the operation of our ideal observer model (a Rao-Blackwelized particle filter). The tradeoff between the speed and number of objects being tracked, however, can only arise from the allocation of a flexible cognitive resource, which can be formalized as either memory or attention. 1

5 0.66750818 109 nips-2009-Hierarchical Learning of Dimensional Biases in Human Categorization

Author: Adam Sanborn, Nick Chater, Katherine A. Heller

Abstract: Existing models of categorization typically represent to-be-classified items as points in a multidimensional space. While from a mathematical point of view, an infinite number of basis sets can be used to represent points in this space, the choice of basis set is psychologically crucial. People generally choose the same basis dimensions – and have a strong preference to generalize along the axes of these dimensions, but not “diagonally”. What makes some choices of dimension special? We explore the idea that the dimensions used by people echo the natural variation in the environment. Specifically, we present a rational model that does not assume dimensions, but learns the same type of dimensional generalizations that people display. This bias is shaped by exposing the model to many categories with a structure hypothesized to be like those which children encounter. The learning behaviour of the model captures the developmental shift from roughly “isotropic” for children to the axis-aligned generalization that adults show. 1

6 0.62800151 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition

7 0.61434746 175 nips-2009-Occlusive Components Analysis

8 0.59911919 115 nips-2009-Individuation, Identification and Object Discovery

9 0.59812415 131 nips-2009-Learning from Neighboring Strokes: Combining Appearance and Context for Multi-Domain Sketch Recognition

10 0.58066458 5 nips-2009-A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation

11 0.57476687 84 nips-2009-Evaluating multi-class learning strategies in a generative hierarchical framework for object detection

12 0.55654854 211 nips-2009-Segmenting Scenes by Matching Image Composites

13 0.49455035 21 nips-2009-Abstraction and Relational learning

14 0.49366644 154 nips-2009-Modeling the spacing effect in sequential category learning

15 0.47833073 155 nips-2009-Modelling Relational Data using Bayesian Clustered Tensor Factorization

16 0.47550711 236 nips-2009-Structured output regression for detection with partial truncation

17 0.46928743 102 nips-2009-Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models

18 0.46580943 251 nips-2009-Unsupervised Detection of Regions of Interest Using Iterative Link Analysis

19 0.42430794 152 nips-2009-Measuring model complexity with the prior predictive

20 0.40440351 196 nips-2009-Quantification and the language of thought


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(7, 0.016), (24, 0.034), (25, 0.117), (35, 0.036), (36, 0.085), (39, 0.114), (44, 0.248), (55, 0.014), (58, 0.065), (61, 0.023), (71, 0.068), (81, 0.023), (86, 0.049), (91, 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.83620065 44 nips-2009-Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships

Author: Tomasz Malisiewicz, Alyosha Efros

Abstract: The use of context is critical for scene understanding in computer vision, where the recognition of an object is driven by both local appearance and the object’s relationship to other elements of the scene (context). Most current approaches rely on modeling the relationships between object categories as a source of context. In this paper we seek to move beyond categories to provide a richer appearancebased model of context. We present an exemplar-based model of objects and their relationships, the Visual Memex, that encodes both local appearance and 2D spatial context between object instances. We evaluate our model on Torralba’s proposed Context Challenge against a baseline category-based system. Our experiments suggest that moving beyond categories for context modeling appears to be quite beneficial, and may be the critical missing ingredient in scene understanding systems. 1

2 0.83414739 121 nips-2009-Know Thy Neighbour: A Normative Theory of Synaptic Depression

Author: Jean-pascal Pfister, Peter Dayan, Máté Lengyel

Abstract: Synapses exhibit an extraordinary degree of short-term malleability, with release probabilities and effective synaptic strengths changing markedly over multiple timescales. From the perspective of a fixed computational operation in a network, this seems like a most unacceptable degree of added variability. We suggest an alternative theory according to which short-term synaptic plasticity plays a normatively-justifiable role. This theory starts from the commonplace observation that the spiking of a neuron is an incomplete, digital, report of the analog quantity that contains all the critical information, namely its membrane potential. We suggest that a synapse solves the inverse problem of estimating the pre-synaptic membrane potential from the spikes it receives, acting as a recursive filter. We show that the dynamics of short-term synaptic depression closely resemble those required for optimal filtering, and that they indeed support high quality estimation. Under this account, the local postsynaptic potential and the level of synaptic resources track the (scaled) mean and variance of the estimated presynaptic membrane potential. We make experimentally testable predictions for how the statistics of subthreshold membrane potential fluctuations and the form of spiking non-linearity should be related to the properties of short-term plasticity in any particular cell type. 1

3 0.82183892 86 nips-2009-Exploring Functional Connectivities of the Human Brain using Multivariate Information Analysis

Author: Barry Chai, Dirk Walther, Diane Beck, Li Fei-fei

Abstract: In this study, we present a new method for establishing fMRI pattern-based functional connectivity between brain regions by estimating their multivariate mutual information. Recent advances in the numerical approximation of highdimensional probability distributions allow us to successfully estimate mutual information from scarce fMRI data. We also show that selecting voxels based on the multivariate mutual information of local activity patterns with respect to ground truth labels leads to higher decoding accuracy than established voxel selection methods. We validate our approach with a 6-way scene categorization fMRI experiment. Multivariate information analysis is able to find strong information sharing between PPA and RSC, consistent with existing neuroscience studies on scenes. Furthermore, an exploratory whole-brain analysis uncovered other brain regions that share information with the PPA-RSC scene network.

4 0.67677134 110 nips-2009-Hierarchical Mixture of Classification Experts Uncovers Interactions between Brain Regions

Author: Bangpeng Yao, Dirk Walther, Diane Beck, Li Fei-fei

Abstract: The human brain can be described as containing a number of functional regions. These regions, as well as the connections between them, play a key role in information processing in the brain. However, most existing multi-voxel pattern analysis approaches either treat multiple regions as one large uniform region or several independent regions, ignoring the connections between them. In this paper we propose to model such connections in an Hidden Conditional Random Field (HCRF) framework, where the classiďŹ er of one region of interest (ROI) makes predictions based on not only its voxels but also the predictions from ROIs that it connects to. Furthermore, we propose a structural learning method in the HCRF framework to automatically uncover the connections between ROIs. We illustrate this approach with fMRI data acquired while human subjects viewed images of different natural scene categories and show that our model can improve the top-level (the classiďŹ er combining information from all ROIs) and ROI-level prediction accuracy, as well as uncover some meaningful connections between ROIs. 1

5 0.63637447 133 nips-2009-Learning models of object structure

Author: Joseph Schlecht, Kobus Barnard

Abstract: We present an approach for learning stochastic geometric models of object categories from single view images. We focus here on models expressible as a spatially contiguous assemblage of blocks. Model topologies are learned across groups of images, and one or more such topologies is linked to an object category (e.g. chairs). Fitting learned topologies to an image can be used to identify the object class, as well as detail its geometry. The latter goes beyond labeling objects, as it provides the geometric structure of particular instances. We learn the models using joint statistical inference over category parameters, camera parameters, and instance parameters. These produce an image likelihood through a statistical imaging model. We use trans-dimensional sampling to explore topology hypotheses, and alternate between Metropolis-Hastings and stochastic dynamics to explore instance parameters. Experiments on images of furniture objects such as tables and chairs suggest that this is an effective approach for learning models that encode simple representations of category geometry and the statistics thereof, and support inferring both category and geometry on held out single view images. 1

6 0.6270507 202 nips-2009-Regularized Distance Metric Learning:Theory and Algorithm

7 0.62399054 154 nips-2009-Modeling the spacing effect in sequential category learning

8 0.61970681 115 nips-2009-Individuation, Identification and Object Discovery

9 0.61650491 70 nips-2009-Discriminative Network Models of Schizophrenia

10 0.61597556 85 nips-2009-Explaining human multiple object tracking as resource-constrained approximate inference in a dynamic probabilistic model

11 0.61144906 25 nips-2009-Adaptive Design Optimization in Experiments with People

12 0.61129475 226 nips-2009-Spatial Normalized Gamma Processes

13 0.60914874 155 nips-2009-Modelling Relational Data using Bayesian Clustered Tensor Factorization

14 0.60761023 126 nips-2009-Learning Bregman Distance Functions and Its Application for Semi-Supervised Clustering

15 0.60720795 211 nips-2009-Segmenting Scenes by Matching Image Composites

16 0.6062367 251 nips-2009-Unsupervised Detection of Regions of Interest Using Iterative Link Analysis

17 0.60582161 188 nips-2009-Perceptual Multistability as Markov Chain Monte Carlo Inference

18 0.6036092 112 nips-2009-Human Rademacher Complexity

19 0.59924996 131 nips-2009-Learning from Neighboring Strokes: Combining Appearance and Context for Multi-Domain Sketch Recognition

20 0.59902126 174 nips-2009-Nonparametric Latent Feature Models for Link Prediction