nips nips2010 nips2010-153 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Joseph L. Austerweil, Thomas L. Griffiths
Abstract: Identifying the features of objects becomes a challenge when those features can change in their appearance. We introduce the Transformed Indian Buffet Process (tIBP), and use it to define a nonparametric Bayesian model that infers features that can transform across instantiations. We show that this model can identify features that are location invariant by modeling a previous experiment on human feature learning. However, allowing features to transform adds new kinds of ambiguity: Are two parts of an object the same feature with different transformations or two unique features? What transformations can features undergo? We present two new experiments in which we explore how people resolve these questions, showing that the tIBP model demonstrates a similar sensitivity to context to that shown by human learners when determining the invariant aspects of features.
Reference: text
sentIndex sentText sentNum sentScore
1 com Abstract Identifying the features of objects becomes a challenge when those features can change in their appearance. [sent-6, score-0.411]
2 We show that this model can identify features that are location invariant by modeling a previous experiment on human feature learning. [sent-8, score-0.461]
3 However, allowing features to transform adds new kinds of ambiguity: Are two parts of an object the same feature with different transformations or two unique features? [sent-9, score-0.773]
4 We present two new experiments in which we explore how people resolve these questions, showing that the tIBP model demonstrates a similar sensitivity to context to that shown by human learners when determining the invariant aspects of features. [sent-11, score-0.291]
5 One explanation for this capability is that the visual system recognizes that the features of an object can occur differently across presentations, but will be transformed in a few predictable ways. [sent-15, score-0.401]
6 Representing objects in terms of invariant features poses a challenge for models of feature learning. [sent-16, score-0.486]
7 Despite this, people are able to identify invariant features (e. [sent-20, score-0.322]
8 Analogous to how the Transformed Dirichlet Process extends the Dirichlet Process [7], the tIBP associates a parameter with each instantiation of a feature that determines how the feature is transformed in the given image. [sent-24, score-0.313]
9 This allows for unsupervised learning of features that are invariant in location, size, or orientation. [sent-25, score-0.242]
10 After defining the generative model for the tIBP and presenting a Gibbs sampling inference algorithm, we show that this model can learn visual features that are location invariant by modeling previous behavioral results (from [6]). [sent-26, score-0.32]
11 (a) Does this object have one feature that contains two vertical bars or two features that each contain one vertical bar? [sent-29, score-0.814]
12 One new issue that arises from inferring invariant features is that it can be ambiguous whether parts of an image are the same feature with different transformations or different features. [sent-32, score-0.735]
13 For example, an object containing two vertical bars has (at least) two representations: a single feature containing two vertical bars a fixed distance apart, or two features each of which is a vertical bar with its own translational transformation (see Figure 1 (a)). [sent-33, score-1.208]
14 Introducing transformational invariance also raises the question of what kinds of transformations a feature can undergo. [sent-36, score-0.448]
15 A classic demonstration of the difficulty of defining a set of permissible transformations is the Mach square/diamond [8]. [sent-37, score-0.317]
16 We extend the tIBP to include variables that select the transformations each feature is allowed to undergo. [sent-40, score-0.445]
17 This raises the question of whether people can infer the permissible transformations of a feature. [sent-41, score-0.506]
18 This provides an interesting new explanation of the Mach square/diamond: People learn the allowed transformations of features for a given shape, not what transformations of features are allowed over all shapes. [sent-43, score-0.947]
19 The Indian Buffet Process (IBP) [4] is a stochastic process that can be used as a prior in nonparametric Bayesian models where each object is represented using an unknown but potentially infinite set of latent features. [sent-47, score-0.277]
20 If there are N objects and K features, then Z is an N × K binary matrix (where object n has feature k if znk = 1) and Y is a K × D matrix (where D is the dimensionality of the observed properties of each object, e. [sent-50, score-0.614]
21 The vector xn representing the properties of object n is generated based on its features zn and the matrix Y. [sent-62, score-0.447]
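To make the IBP construction above concrete, here is a minimal sketch of sampling Z via the usual customers-and-dishes scheme (Python/numpy assumed; the function name and rng handling are ours, not the paper's):

```python
import numpy as np

def sample_ibp(N, alpha, rng=None):
    """Draw a binary feature-assignment matrix Z from the IBP prior.

    Z[n, k] = 1 means object n has feature k; the number of columns K
    is unbounded a priori, so Z grows as new dishes are first sampled.
    """
    rng = np.random.default_rng() if rng is None else rng
    Z = np.zeros((N, 0), dtype=int)
    for n in range(N):
        if Z.shape[1] > 0:
            # Existing feature k is taken with probability m_k / (n + 1),
            # where m_k counts the previous objects that have it.
            m = Z[:n].sum(axis=0)
            Z[n, :] = rng.random(Z.shape[1]) < m / (n + 1)
        # Plus a Poisson(alpha / (n + 1)) number of brand-new features.
        k_new = rng.poisson(alpha / (n + 1))
        if k_new > 0:
            Z = np.hstack([Z, np.zeros((N, k_new), dtype=int)])
            Z[n, -k_new:] = 1
    return Z
```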
22 The transformations are object-specific, so in a sense, when an object takes a feature, the feature is transformed with respect to the object. [sent-68, score-0.671]
23 The following generative process defines the tIBP: Z | α ∼ IBP(α), Y | β ∼ g(β), rnk | η ∼ Φ(η) iid, xn | rn, zn, Y, γ ∼ f(xn | rn(Y), zn, γ). In this paper, we focus on binary images where the transformations are drawn uniformly at random from a finite set (though Section 5. [sent-70, score-0.816]
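A heavily simplified illustration of this generative process for binary images follows; Bernoulli(p) feature images stand in for g(β), uniform grid translations stand in for Φ(η), and a noisy-OR observation model stands in for f, so treat every parameter choice here as an assumption:

```python
import numpy as np

def sample_tibp_images(Z, D1, D2, lam=0.9, eps=0.01, p=0.1, rng=None):
    """Generate binary images from the tIBP generative process (sketch).

    Z: N x K feature assignments (e.g. from sample_ibp above).
    Feature images y_k have Bernoulli(p) pixels, standing in for g(beta);
    translations r_nk are uniform over the grid, standing in for Phi(eta);
    pixels follow a noisy-OR likelihood f with leak eps and weight lam.
    """
    rng = np.random.default_rng() if rng is None else rng
    N, K = Z.shape
    Y = (rng.random((K, D1, D2)) < p).astype(int)
    X = np.zeros((N, D1, D2), dtype=int)
    for n in range(N):
        off = (1 - eps) * np.ones((D1, D2))   # prob. each pixel stays off
        for k in np.flatnonzero(Z[n]):
            r_nk = (rng.integers(D1), rng.integers(D2))  # uniform translation
            shifted = np.roll(Y[k], r_nk, axis=(0, 1))   # r_nk applied to y_k
            off *= (1 - lam) ** shifted                  # noisy-OR factor
        X[n] = rng.random((D1, D2)) < 1 - off
    return X, Y
```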
24 Assuming our data are in {0, 1}D1 ×D2 , a translation shifts the starting place of its feature in each dimension by rnk = (d1 , d2 ). [sent-74, score-0.372]
25 The likelihood p(xnd = 1|Z, Y, R) is then identical to Equation 2, substituting the vector of transformed feature interpretations rn (yd ) for yd . [sent-83, score-0.371]
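Concretely, the transformed noisy-OR likelihood for a single image could be computed as below; this sketch assumes translations with wrap-around shifts, with λ the feature weight and ε the leak probability:

```python
import numpy as np

def noisy_or_likelihood(x, z, Y, R, lam=0.9, eps=0.01):
    """p(x | z, Y, R) for one binary image under the transformed noisy-OR model.

    x: (D1, D2) image; z: (K,) binary assignments; Y: (K, D1, D2) features;
    R: length-K list of translations (d1, d2).  Implements
    p(x_d = 1) = 1 - (1 - eps) * (1 - lam) ** (z . r(y_d)),
    i.e. Equation 2 with the transformed features rn(yd) substituted for yd.
    """
    active = np.zeros(x.shape)                 # z . rn(yd) at each pixel d
    for k in np.flatnonzero(z):
        active += np.roll(Y[k], R[k], axis=(0, 1))
    p_on = 1 - (1 - eps) * (1 - lam) ** active
    return float(np.prod(np.where(x == 1, p_on, 1 - p_on)))
```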
26 3 Inference by Gibbs sampling We sample from the posterior distribution on feature assignments Z, feature interpretations Y, and transformations R given observed properties X using Gibbs sampling [11]. [sent-85, score-0.595]
27 For features with mk > 0 (after removal of the current value of znk ), we draw znk by marginalizing over transformations. [sent-87, score-0.523]
28 If znk = 1, we then sample rnk from p(rnk | znk = 1, Z−(nk), R−(nk), Y, X) ∝ p(xn | zn, Y, R) p(rnk) (4), where the relevant probabilities are also used in computing Equation 3, and can thus be cached. [sent-90, score-0.36]
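A sketch of the combined Gibbs update for one (znk, rnk) pair, assuming the noisy_or_likelihood function above and a uniform p(rnk); the singleton/new-feature moves discussed next are deliberately left to a separate step:

```python
import numpy as np

def gibbs_z_r(n, k, X, Z, Y, R, transformations, lam=0.9, eps=0.01, rng=None):
    """One Gibbs update for (z_nk, r_nk), marginalizing r_nk out of the z draw.

    `transformations` is the finite set Phi of candidate translations with a
    uniform prior p(r_nk).  Features with m_k = 0 are handled by a separate
    new-feature move, as in the text.
    """
    rng = np.random.default_rng() if rng is None else rng
    N = Z.shape[0]
    m_k = Z[:, k].sum() - Z[n, k]              # other objects with feature k
    if m_k == 0:
        Z[n, k] = 0                            # left to the new-feature move
        return
    z_on = Z[n].copy(); z_on[k] = 1
    z_off = Z[n].copy(); z_off[k] = 0
    # Likelihood of x_n for each candidate r_nk, given z_nk = 1 (Equation 3
    # marginalizes these; Equation 4 reuses them, so they can be cached).
    liks = []
    for r in transformations:
        R_try = list(R[n]); R_try[k] = r
        liks.append(noisy_or_likelihood(X[n], z_on, Y, R_try, lam, eps))
    liks = np.array(liks)
    lik0 = noisy_or_likelihood(X[n], z_off, Y, R[n], lam, eps)
    p1 = (m_k / N) * liks.mean()               # uniform p(r_nk) -> average
    p0 = (1 - m_k / N) * lik0
    Z[n, k] = int(rng.random() < p1 / (p1 + p0))
    if Z[n, k] == 1:                           # Equation 4: resample r_nk
        R[n][k] = transformations[rng.choice(len(liks), p=liks / liks.sum())]
```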
29 To compute the first term on the right-hand side, we need to marginalize over the possible new feature images and their transformations (Y(K+1):(K+Kn^new) and Rn,(K+1):(K+Kn^new)). [sent-96, score-0.557]
30 We assume that the first object to take a feature takes it in its canonical form and thus it is not transformed. [sent-97, score-0.293]
31 With no transformations, drawing the new features in the noisy-OR tIBP model is equivalent to drawing the new features in the normal noisy-OR IBP model. [sent-99, score-0.382]
32 Marginalizing over the Kn^new new features, the probability that a pixel is on becomes p(xnd = 1 | zn, Y, R, Kn^new) = 1 − (1 − ǫ)(1 − λ)^(zn · rn(yd)) (1 − pλ)^(Kn^new) (Equations 6–7), where rn(yd) is the vector of transformed feature interpretations along observed dimension d. [sent-106, score-0.363]
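If the reconstruction of Equations 6–7 above is right, the marginal pixel-on probability with the new features integrated out is a one-liner:

```python
def p_on_with_new_features(active_d, K_new, lam=0.9, eps=0.01, p=0.1):
    """Pixel-on probability with K_new untransformed new features summed out.

    active_d is zn . rn(yd) for the existing features; each new feature
    pixel is on with probability p, contributing a factor (1 - p*lam) to the
    probability that the pixel stays off (Equations 6-7 as reconstructed above).
    """
    return 1 - (1 - eps) * (1 - lam) ** active_d * (1 - p * lam) ** K_new
```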
33 4 Prediction To compare the feature representations our model infers to behavioral results, we need the model's judgments for new test objects. [sent-110, score-0.258]
34 This is a prediction problem: computing the probability of a new object xN+1 given the set of N observed objects X. [sent-111, score-0.382]
35 For each sweep of Gibbs sampling, we sample a vector of features zN+1 and corresponding transformations rN+1 for a new object from their conditional distribution given the values of Z, Y, and R in that sweep, under the constraint that no new features are generated. [sent-115, score-0.81]
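A sketch of this Monte Carlo predictive estimate, reusing noisy_or_likelihood from above; drawing zN+1 from the IBP conditional over existing features and rN+1 uniformly is our reading of the text:

```python
import numpy as np

def predictive_probability(x_test, samples, transformations, lam=0.9, eps=0.01,
                           rng=None):
    """Monte Carlo estimate of p(x_{N+1} | X) from stored Gibbs sweeps.

    `samples` holds one (Z, Y, R) triple per sweep.  For each sweep we draw
    z_{N+1} from the IBP conditional restricted to existing features (no new
    features, as in the text) and r_{N+1} uniformly from `transformations`,
    then average the noisy-OR likelihood of the test image.
    """
    rng = np.random.default_rng() if rng is None else rng
    total = 0.0
    for Z, Y, _ in samples:
        N, K = Z.shape
        m = Z.sum(axis=0)
        z_new = (rng.random(K) < m / (N + 1)).astype(int)
        r_new = [transformations[rng.integers(len(transformations))]
                 for _ in range(K)]
        total += noisy_or_likelihood(x_test, z_new, Y, r_new, lam, eps)
    return total / len(samples)
```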
36 3 Demonstration: Learning Translation Invariant Features In many situations learners need to form a feature representation of a set of objects, and the features do not reoccur in the exact same location. [sent-117, score-0.231]
37 The tIBP provides a way for a learner to discover that features are translation invariant, and to infer them directly from the data. [sent-121, score-0.256]
38 Fiser and colleagues [6, 12] showed that when two parts of an image always occur together (forming a “base pair”), people expect the two parts to occur together as if they had one feature representing the pair. [sent-122, score-0.367]
39 In Experiments 1 and 2 of [6], participants viewed 144 scenes, where each scene contained three of the six base pairs in varied spatial locations. [sent-123, score-0.258]
40 Afterwards, participants chose which of two images was more familiar: a base pair (in a never-before-seen location) and a pair of parts that occurred together at least once (but were not a base pair). [sent-125, score-0.506]
41 To demonstrate the ability of the tIBP to infer translation invariant features that are made up of complex parts, we trained the model on scenes with the same structure as those shown to participants. [sent-127, score-0.373]
42 Figure 2 (c) shows the features inferred. (Figure 2 caption: Learning translation invariant features; panels (a)-(c).) [sent-130, score-0.297]
43 To compare the model to people's familiarity judgments, we calculated the model's predictive probability for each base pair in a new location and for a part in that base pair with another part that co-occurred with it at least once (but not in a base pair). [sent-142, score-0.338]
44 4 Experiment 1: One feature or two features transformed? [sent-144, score-0.231]
45 Is an image composed of the same feature occurring multiple times with different instantiations, or is it composed of different features that may or may not be transformed? [sent-146, score-0.321]
46 One way to decide between two possible feature representations for the object is to pick the features that allow you to encode the object and the other objects it is associated with. [sent-147, score-0.796]
47 For example, the object from Figure 1 (a) is the first object (from the top left) in the two sets of objects shown in Figure 3. [sent-148, score-0.535]
48 All of the objects in this set can be represented as translations of one feature that is two vertical bars. [sent-150, score-0.451]
49 Although this object set can also be described in terms of two features (each of which is a vertical bar that can translate independently), it is a surprising coincidence that the two vertical bars are always the same distance apart over all of the objects in the set. [sent-151, score-1.017]
50 Using different feature representations leads to different predictions about what other objects should be expected to be in the set. [sent-154, score-0.354]
51 Representing the objects with a single feature containing two vertical bars predicts new objects that have vertical bars where the two bars are the same distance apart (New Unitized). [sent-155, score-1.165]
52 These objects are also expected under the two-feature representation, in which each feature is a single vertical bar; however, under that representation any object with two vertical bars is expected (New Separate), not just those with the bars a particular distance apart. [sent-156, score-0.981]
53 Thus, interpreting objects with different feature representations has consequences for how to generalize set membership. [sent-157, score-0.306]
54 In the following experiment, we test these predictions by asking people, after viewing either the unitized or separate object set, to judge how likely the New Unitized or New Separate objects are to be part of the object set they viewed. [sent-158, score-1.012]
55 We then compare the behavioral results to the features inferred by the tIBP model and the predictive probability of each of the test objects given each of the object sets. [sent-159, score-0.55]
56 (a) Objects made from spatial translations of the unitized feature. [sent-161, score-0.29]
57 The number of times each vertical bar is present is the same in the two object sets. [sent-163, score-0.415]
58 The unitized group rated highly only those images with two vertical bars close together. [sent-166, score-0.659]
59 The separate group rated highly any image with two vertical bars. [sent-167, score-0.411]
60 Three participants were removed for failing to complete the task, leaving 19 and 18 participants in the separate and unitized conditions, respectively. [sent-171, score-0.653]
61 In the training phase, participants read this cover story (adapted from [13]): “Recently a Mars rover found a cave with a collection of different images on its walls. [sent-173, score-0.343]
62 ” They then looked through the eight images (which were either the unitized or separate object set in a random order) and scrolled down to the next section once they were ready for the test phase. [sent-176, score-0.607]
63 2 Results Figure 4 (a) shows the average ratings made by participants in each group for the nine test images. [sent-181, score-0.234]
64 001) objects higher than the unitized group, but otherwise did not rate any of the other test images significantly differently. [sent-186, score-0.496]
65 As predicted by the above analysis, the unitized group believed the Mars rover was likely to encounter the two images it observed and the New Unit image (the unitized feature in a new horizontal position), but did not think it would encounter the other objects. [sent-187, score-0.92]
66 The separate group rated any image with two vertical bars highly. [sent-188, score-0.448]
67 This indicates that they represent the images using two features each containing a single vertical bar varying in horizontal position. [sent-189, score-0.456]
68 Thus, each group of participants infers a set of features invariant over the set of observed objects (taking into account the different horizontal positions of the features in each object). [sent-190, score-0.811]
69 Figure 4 (b) shows the predictions made by the tIBP model when given each object set. [sent-191, score-0.232]
70 A non-linear monotonic transformation of these probabilities was used for visualization. (Figure 5 caption: Stimuli for investigating how different types of invariances are learned for different object classes; panels show the Rotation set, the Size set, and the New Rotation and New Size test objects.) [sent-193, score-0.265]
71 (c) Two new objects for testing the inferred type of invariance: a New Rotation and a New Size object. [sent-196, score-0.284]
72 Unlike the participants in the separate condition, the model does not infer that each object has two features, and so it does not treat an object with only one feature as a poor member of the set. [sent-202, score-0.72]
73 This suggests that while learning the feature representation for a set of objects, people also learn the number of features each object typically has. [sent-203, score-0.527]
74 Investigating how people infer expectations about the number of features objects have is an interesting phenomenon that demands further study. [sent-204, score-0.478]
75 5 Experiment 2: Learning the type of invariance A natural next step for improving the tIBP would be to make the set of transformations Φ larger and thus extend the number of possible invariants that can be learned. [sent-205, score-0.38]
76 This example teaches a counterintuitive moral: The best approach is not to include as many transformations as possible in the model. [sent-209, score-0.283]
77 Though rotations are not valid transformations for what people commonly consider to be squares, they are appropriate for many objects. [sent-210, score-0.395]
78 This suggests that people infer the set of allowable transformations for different classes of objects. [sent-211, score-0.472]
79 Given the three objects in Figure 5 (a) (the rotation set) it seems clear that the New Rotation object in Figure 5 (c) belongs in the set, but not the New Size object. [sent-212, score-0.516]
80 To explore this phenomenon, we first extend the tIBP to infer the appropriate set of transformations by introducing latent variables for each feature that indicate which transformations it is allowed to use. [sent-214, score-0.833]
81 We demonstrate that this extension to the tIBP predicts the New Rotation object when given the rotation set and predicts the New Size object when given the size set — effectively learning the appropriate type of invariance for a given object class. [sent-215, score-0.865]
82 Finally, we confirm our introspective argument that people infer the type of invariance appropriate to the observed class of objects. [sent-216, score-0.245]
83 1 Learning invariance type using the tIBP It is straightforward to modify the tIBP such that the type of transformations allowed on a feature is inferred as well. [sent-218, score-0.531]
84 The experiment in this section concerns learning whether the feature defining a set of objects is rotation invariant or size invariant. [sent-221, score-0.514]
85 Formally, we model this using a generative process that is the same as the tIBP, but introduces the latent variable tk which determines the type of transformation allowed by feature k. [sent-222, score-0.375]
86 If tk = 1, then rotational transformations are drawn from Φρ (which is the discrete uniform distribution over multiples of fifteen degrees ranging from zero to 45). [sent-223, score-0.425]
87 If tk = 0, then size transformations are drawn iid from Φσ (which is the discrete uniform distribution over [3/8, 3/7, 3/5, 5/7, 1, 7/5, 11/7, 5/3, 11/5, 7/3, 11/3]). [sent-224, score-0.42]
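For reference, the two transformation sets as we read them from the text; how rotations and rescalings are actually applied to binary feature images is not specified here, so this parameterization is an assumption:

```python
import numpy as np

# Candidate transformation sets, as read from the text (assumed encoding).
PHI_RHO = [0, 15, 30, 45]                 # rotations, in degrees
PHI_SIGMA = [3/8, 3/7, 3/5, 5/7, 1, 7/5, 11/7, 5/3, 11/5, 7/3, 11/3]  # scales

def draw_transformation(t_k, rng=None):
    """Draw r_nk uniformly from Phi_rho if t_k = 1, else from Phi_sigma."""
    rng = np.random.default_rng() if rng is None else rng
    phi = PHI_RHO if t_k == 1 else PHI_SIGMA
    return phi[rng.integers(len(phi))]
```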
88 Marginalizing over the transformations rnk, we sample tk from p(tk | X, Y, Z, R−k, t−k) ∝ Σrk ∏n p(xn | rnk, tk, Y, Z, R−k, t−k) p(rk | tk) p(tk). [sent-230, score-0.315]
89 Prediction is as above, except that tk gives the set of transformations each feature is allowed to take. [sent-234, score-0.554]
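A sketch of the resulting Gibbs draw for tk, using the sets above and an abstract per-image likelihood; because the likelihood factorizes over objects, the reconstructed update reduces to a product of per-object averages:

```python
import numpy as np

def gibbs_t_k(k, X, Z, Y, R, likelihood, p_t=0.5, rng=None):
    """One Gibbs draw of t_k, the transformation type allowed for feature k.

    Implements p(t_k | ...) proportional to
    p(t_k) * prod_n sum_{r_nk} p(x_n | r_nk, t_k, ...) p(r_nk | t_k).
    `likelihood(x, z, Y, r_row)` must apply rotations/rescalings to features;
    it is left abstract here, and log-space arithmetic would be used in
    practice to avoid underflow.
    """
    rng = np.random.default_rng() if rng is None else rng
    post = np.array([1.0 - p_t, p_t])           # prior p(t_k = 0), p(t_k = 1)
    for t in (0, 1):
        phi = PHI_RHO if t == 1 else PHI_SIGMA  # sets from the sketch above
        for n in np.flatnonzero(Z[:, k]):
            avg = 0.0
            for r in phi:                       # uniform p(r_nk | t_k)
                r_row = list(R[n]); r_row[k] = r
                avg += likelihood(X[n], Z[n], Y, r_row) / len(phi)
            post[t] *= avg
    return int(rng.random() < post[1] / post.sum())
```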
90 2 Methods A total of 40 participants were recruited online and compensated a small amount, with 20 participants in each training condition (rotation and size). [sent-236, score-0.366]
91 Participants observed the three objects in their training set and then generalized on a scale from 0 to 6 to five test objects: Same Both (the object that is in both training sets), Same Rot (the last object of the rotation set), Same Size (the last object of the size set), New Rot, and New Size. [sent-238, score-0.912]
92 As expected, participants in the rotation condition generalized more to the New Rot object than those in the size condition (unpaired t(38) = 4. [sent-241, score-0.545]
93 This confirms our hypothesis: people infer the appropriate set of transformations (a subset of all transformations) that features are allowed to use for a class of objects. [sent-246, score-0.647]
94 Importantly, the model predicts that only when given the rotation set should participants generalize to the New Rot object and only when given the size set should they generalize to the New Size object. [sent-257, score-0.577]
95 6 Conclusions and Future Directions In this paper, we presented a solution to how people infer feature representations that are invariant over transformations, and in two behavioral experiments we confirmed two predictions of a new model of human unsupervised feature learning. [sent-258, score-1.026]
96 In addition to these contributions, we proposed a first sketch of a new computational theory of shape representation — the features representing an object are transformed relative to the object and the set of transformations a feature is allowed to undergo depends on the object’s context. [sent-259, score-1.116]
97 In the future, we would like to pursue this theory further, expanding the account of learning the types of transformations and exploring how the transformations between features in an object interact (we should expect some interaction due to real world constraints on the transformations, e. [sent-260, score-0.872]
98 Finally, we hope to include other facets of visual perception in our model, like a perceptually realistic prior on feature instantiations and feature relations (e. [sent-263, score-0.263]
99 , the horizontal bar is always ON TOP OF the vertical bar). [sent-265, score-0.262]
100 Infinite latent feature models and the Indian buffet process. [sent-285, score-0.261]
wordName wordTfidf (topN-words)
[('tibp', 0.48), ('transformations', 0.283), ('unitized', 0.257), ('rnk', 0.206), ('object', 0.184), ('participants', 0.168), ('objects', 0.167), ('rotation', 0.165), ('kn', 0.164), ('rot', 0.154), ('znk', 0.154), ('sep', 0.147), ('vertical', 0.142), ('ibp', 0.137), ('buffet', 0.124), ('features', 0.122), ('indian', 0.117), ('bars', 0.115), ('people', 0.112), ('feature', 0.109), ('tk', 0.109), ('transformed', 0.095), ('base', 0.09), ('bar', 0.089), ('invariant', 0.088), ('zn', 0.086), ('yd', 0.082), ('infer', 0.077), ('images', 0.072), ('rover', 0.069), ('znew', 0.069), ('gibbs', 0.068), ('mk', 0.061), ('xnd', 0.06), ('separate', 0.06), ('human', 0.06), ('image', 0.058), ('translation', 0.057), ('invariance', 0.056), ('xn', 0.055), ('allowed', 0.053), ('austerweil', 0.051), ('mach', 0.051), ('ykd', 0.051), ('nk', 0.051), ('transformation', 0.048), ('predictions', 0.048), ('behavioral', 0.047), ('experiment', 0.045), ('parts', 0.044), ('rn', 0.043), ('grif', 0.043), ('seen', 0.042), ('interpretations', 0.042), ('invariants', 0.041), ('fiser', 0.041), ('diamond', 0.041), ('unit', 0.041), ('infers', 0.041), ('drawing', 0.038), ('nonparametric', 0.037), ('sweep', 0.037), ('rated', 0.037), ('location', 0.037), ('group', 0.036), ('cave', 0.034), ('permissable', 0.034), ('scrolled', 0.034), ('unpaired', 0.034), ('judgments', 0.034), ('perceived', 0.034), ('dirichlet', 0.033), ('berkeley', 0.033), ('rotational', 0.033), ('translations', 0.033), ('invariances', 0.033), ('unsupervised', 0.032), ('predicts', 0.032), ('marginalizing', 0.032), ('instantiations', 0.032), ('new', 0.031), ('horizontal', 0.031), ('representations', 0.03), ('apart', 0.03), ('recruited', 0.03), ('nine', 0.03), ('bayesian', 0.03), ('inferred', 0.03), ('scenes', 0.029), ('shape', 0.029), ('process', 0.028), ('ths', 0.028), ('latent', 0.028), ('size', 0.028), ('scientists', 0.028), ('wood', 0.028), ('mars', 0.026), ('undergo', 0.026), ('sampling', 0.026), ('spearman', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 153 nips-2010-Learning invariant features using the Transformed Indian Buffet Process
Author: Joseph L. Austerweil, Thomas L. Griffiths
Abstract: Identifying the features of objects becomes a challenge when those features can change in their appearance. We introduce the Transformed Indian Buffet Process (tIBP), and use it to define a nonparametric Bayesian model that infers features that can transform across instantiations. We show that this model can identify features that are location invariant by modeling a previous experiment on human feature learning. However, allowing features to transform adds new kinds of ambiguity: Are two parts of an object the same feature with different transformations or two unique features? What transformations can features undergo? We present two new experiments in which we explore how people resolve these questions, showing that the tIBP model demonstrates a similar sensitivity to context to that shown by human learners when determining the invariant aspects of features.
Author: Li-jia Li, Hao Su, Li Fei-fei, Eric P. Xing
Abstract: Robust low-level image features have been proven to be effective representations for a variety of visual recognition tasks such as object recognition and scene classification; but pixels, or even local image patches, carry little semantic meaning. For high level visual tasks, such low-level image representations are potentially not enough. In this paper, we propose a high-level image representation, called the Object Bank, where an image is represented as a scale-invariant response map of a large number of pre-trained generic object detectors, blind to the testing dataset or visual task. Leveraging the Object Bank representation, superior performances on high level visual recognition tasks can be achieved with simple off-the-shelf classifiers such as logistic regression and linear SVM. Sparsity algorithms make our representation more efficient and scalable for large scene datasets, and reveal semantically meaningful feature patterns.
3 0.11399239 241 nips-2010-Size Matters: Metric Visual Search Constraints from Monocular Metadata
Author: Mario Fritz, Kate Saenko, Trevor Darrell
Abstract: Metric constraints are known to be highly discriminative for many objects, but if training is limited to data captured from a particular 3-D sensor the quantity of training data may be severely limited. In this paper, we show how a crucial aspect of 3-D information–object and feature absolute size–can be added to models learned from commonly available online imagery, without use of any 3-D sensing or reconstruction at training time. Such models can be utilized at test time together with explicit 3-D sensing to perform robust search. Our model uses a “2.1D” local feature, which combines traditional appearance gradient statistics with an estimate of average absolute depth within the local window. We show how category size information can be obtained from online images by exploiting relatively ubiquitous metadata fields specifying camera intrinsics. We develop an efficient metric branch-and-bound algorithm for our search task, imposing 3-D size constraints as part of an optimal search for a set of features which indicate the presence of a category. Experiments on test scenes captured with a traditional stereo rig are shown, exploiting training data from purely monocular sources with associated EXIF metadata.
4 0.10300761 155 nips-2010-Learning the context of a category
Author: Dan Navarro
Abstract: This paper outlines a hierarchical Bayesian model for human category learning that learns both the organization of objects into categories, and the context in which this knowledge should be applied. The model is fit to multiple data sets, and provides a parsimonious method for describing how humans learn context specific conceptual representations.
5 0.10016263 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision
Author: Matthew Blaschko, Andrea Vedaldi, Andrew Zisserman
Abstract: A standard approach to learning object category detectors is to provide strong supervision in the form of a region of interest (ROI) specifying each instance of the object in the training images [17]. In this work our goal is to learn from heterogeneous labels, in which some images are only weakly supervised, specifying only the presence or absence of the object or a weak indication of object location, whilst others are fully annotated. To this end we develop a discriminative learning approach and make two contributions: (i) we propose a structured output formulation for weakly annotated images where full annotations are treated as latent variables; and (ii) we propose to optimize a ranking objective function, allowing our method to more effectively use negatively labeled images to improve detection average precision performance. The method is demonstrated on the benchmark INRIA pedestrian detection dataset of Dalal and Triggs [14] and the PASCAL VOC dataset [17], and it is shown that for a significant proportion of weakly supervised images the performance achieved is very similar to the fully supervised (state of the art) results.
6 0.096356958 199 nips-2010-Optimal learning rates for Kernel Conjugate Gradient regression
7 0.093685038 149 nips-2010-Learning To Count Objects in Images
8 0.087215982 79 nips-2010-Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces
9 0.086294971 88 nips-2010-Extensions of Generalized Binary Search to Group Identification and Exponential Costs
10 0.082042456 99 nips-2010-Gated Softmax Classification
11 0.079317063 281 nips-2010-Using body-anchored priors for identifying actions in single images
12 0.075820059 137 nips-2010-Large Margin Learning of Upstream Scene Understanding Models
13 0.075206198 86 nips-2010-Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach
14 0.074745625 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models
15 0.069722742 114 nips-2010-Humans Learn Using Manifolds, Reluctantly
16 0.066268824 94 nips-2010-Feature Set Embedding for Incomplete Data
17 0.065832309 276 nips-2010-Tree-Structured Stick Breaking for Hierarchical Data
18 0.063084573 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models
19 0.062559016 40 nips-2010-Beyond Actions: Discriminative Models for Contextual Group Activities
20 0.062113721 133 nips-2010-Kernel Descriptors for Visual Recognition
topicId topicWeight
[(0, 0.166), (1, 0.074), (2, -0.123), (3, -0.131), (4, -0.036), (5, -0.002), (6, 0.01), (7, 0.049), (8, -0.007), (9, 0.036), (10, 0.016), (11, 0.032), (12, -0.087), (13, -0.027), (14, 0.052), (15, -0.013), (16, 0.09), (17, 0.031), (18, 0.097), (19, 0.027), (20, -0.011), (21, 0.085), (22, 0.016), (23, -0.036), (24, 0.061), (25, 0.02), (26, -0.049), (27, 0.014), (28, 0.01), (29, 0.019), (30, -0.075), (31, 0.009), (32, 0.001), (33, 0.065), (34, 0.007), (35, 0.005), (36, -0.007), (37, 0.018), (38, -0.018), (39, 0.018), (40, -0.014), (41, 0.093), (42, 0.084), (43, -0.058), (44, -0.068), (45, 0.02), (46, 0.033), (47, -0.082), (48, -0.026), (49, -0.054)]
simIndex simValue paperId paperTitle
same-paper 1 0.93401414 153 nips-2010-Learning invariant features using the Transformed Indian Buffet Process
Author: Joseph L. Austerweil, Thomas L. Griffiths
Abstract: Identifying the features of objects becomes a challenge when those features can change in their appearance. We introduce the Transformed Indian Buffet Process (tIBP), and use it to define a nonparametric Bayesian model that infers features that can transform across instantiations. We show that this model can identify features that are location invariant by modeling a previous experiment on human feature learning. However, allowing features to transform adds new kinds of ambiguity: Are two parts of an object the same feature with different transformations or two unique features? What transformations can features undergo? We present two new experiments in which we explore how people resolve these questions, showing that the tIBP model demonstrates a similar sensitivity to context to that shown by human learners when determining the invariant aspects of features.
Author: Li-jia Li, Hao Su, Li Fei-fei, Eric P. Xing
Abstract: Robust low-level image features have been proven to be effective representations for a variety of visual recognition tasks such as object recognition and scene classification; but pixels, or even local image patches, carry little semantic meaning. For high level visual tasks, such low-level image representations are potentially not enough. In this paper, we propose a high-level image representation, called the Object Bank, where an image is represented as a scale-invariant response map of a large number of pre-trained generic object detectors, blind to the testing dataset or visual task. Leveraging the Object Bank representation, superior performances on high level visual recognition tasks can be achieved with simple off-the-shelf classifiers such as logistic regression and linear SVM. Sparsity algorithms make our representation more efficient and scalable for large scene datasets, and reveal semantically meaningful feature patterns.
3 0.65717459 241 nips-2010-Size Matters: Metric Visual Search Constraints from Monocular Metadata
Author: Mario Fritz, Kate Saenko, Trevor Darrell
Abstract: Metric constraints are known to be highly discriminative for many objects, but if training is limited to data captured from a particular 3-D sensor the quantity of training data may be severely limited. In this paper, we show how a crucial aspect of 3-D information–object and feature absolute size–can be added to models learned from commonly available online imagery, without use of any 3-D sensing or reconstruction at training time. Such models can be utilized at test time together with explicit 3-D sensing to perform robust search. Our model uses a “2.1D” local feature, which combines traditional appearance gradient statistics with an estimate of average absolute depth within the local window. We show how category size information can be obtained from online images by exploiting relatively ubiquitous metadata fields specifying camera intrinsics. We develop an efficient metric branch-and-bound algorithm for our search task, imposing 3-D size constraints as part of an optimal search for a set of features which indicate the presence of a category. Experiments on test scenes captured with a traditional stereo rig are shown, exploiting training data from purely monocular sources with associated EXIF metadata.
4 0.64681208 79 nips-2010-Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces
Author: Abhinav Gupta, Martial Hebert, Takeo Kanade, David M. Blei
Abstract: There has been a recent push in extraction of 3D spatial layout of scenes. However, none of these approaches model the 3D interaction between objects and the spatial layout. In this paper, we argue for a parametric representation of objects in 3D, which allows us to incorporate volumetric constraints of the physical world. We show that augmenting current structured prediction techniques with volumetric reasoning significantly improves the performance of the state-of-the-art.
5 0.61722761 281 nips-2010-Using body-anchored priors for identifying actions in single images
Author: Leonid Karlinsky, Michael Dinerstein, Shimon Ullman
Abstract: This paper presents an approach to the visual recognition of human actions using only single images as input. The task is easy for humans but difficult for current approaches to object recognition, because instances of different actions may be similar in terms of body pose, and often require detailed examination of relations between participating objects and body parts in order to be recognized. The proposed approach applies a two-stage interpretation procedure to each training and test image. The first stage produces accurate detection of the relevant body parts of the actor, forming a prior for the local evidence needed to be considered for identifying the action. The second stage extracts features that are anchored to the detected body parts, and uses these features and their feature-to-part relations in order to recognize the action. The body anchored priors we propose apply to a large range of human actions. These priors allow focusing on the relevant regions and relations, thereby significantly simplifying the learning process and increasing recognition performance.
6 0.58593559 137 nips-2010-Large Margin Learning of Upstream Scene Understanding Models
7 0.56038165 149 nips-2010-Learning To Count Objects in Images
8 0.55053043 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision
9 0.54780215 6 nips-2010-A Discriminative Latent Model of Image Region and Object Tag Correspondence
10 0.52403831 82 nips-2010-Evaluation of Rarity of Fingerprints in Forensics
11 0.50773412 245 nips-2010-Space-Variant Single-Image Blind Deconvolution for Removing Camera Shake
12 0.49631459 95 nips-2010-Feature Transitions with Saccadic Search: Size, Color, and Orientation Are Not Alike
13 0.48749945 1 nips-2010-(RF)^2 -- Random Forest Random Field
14 0.48746207 88 nips-2010-Extensions of Generalized Binary Search to Group Identification and Exponential Costs
15 0.47732517 86 nips-2010-Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach
16 0.47583622 17 nips-2010-A biologically plausible network for the computation of orientation dominance
17 0.47124457 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models
18 0.46470585 256 nips-2010-Structural epitome: a way to summarize one’s visual experience
19 0.45050925 266 nips-2010-The Maximal Causes of Natural Scenes are Edge Filters
20 0.44942105 156 nips-2010-Learning to combine foveal glimpses with a third-order Boltzmann machine
topicId topicWeight
[(13, 0.033), (17, 0.013), (27, 0.101), (30, 0.06), (35, 0.027), (44, 0.301), (45, 0.188), (50, 0.051), (52, 0.044), (60, 0.03), (77, 0.034), (90, 0.036)]
simIndex simValue paperId paperTitle
same-paper 1 0.74403137 153 nips-2010-Learning invariant features using the Transformed Indian Buffet Process
Author: Joseph L. Austerweil, Thomas L. Griffiths
Abstract: Identifying the features of objects becomes a challenge when those features can change in their appearance. We introduce the Transformed Indian Buffet Process (tIBP), and use it to define a nonparametric Bayesian model that infers features that can transform across instantiations. We show that this model can identify features that are location invariant by modeling a previous experiment on human feature learning. However, allowing features to transform adds new kinds of ambiguity: Are two parts of an object the same feature with different transformations or two unique features? What transformations can features undergo? We present two new experiments in which we explore how people resolve these questions, showing that the tIBP model demonstrates a similar sensitivity to context to that shown by human learners when determining the invariant aspects of features.
2 0.71957779 84 nips-2010-Exact inference and learning for cumulative distribution functions on loopy graphs
Author: Nebojsa Jojic, Chris Meek, Jim C. Huang
Abstract: Many problem domains including climatology and epidemiology require models that can capture both heavy-tailed statistics and local dependencies. Specifying such distributions using graphical models for probability density functions (PDFs) generally leads to intractable inference and learning. Cumulative distribution networks (CDNs) provide a means to tractably specify multivariate heavy-tailed models as a product of cumulative distribution functions (CDFs). Existing algorithms for inference and learning in CDNs are limited to those with tree-structured (nonloopy) graphs. In this paper, we develop inference and learning algorithms for CDNs with arbitrary topology. Our approach to inference and learning relies on recursively decomposing the computation of mixed derivatives based on junction trees over the cumulative distribution functions. We demonstrate that our systematic approach to utilizing the sparsity represented by the junction tree yields significant performance improvements over the general symbolic differentiation programs Mathematica and D*. Using two real-world datasets, we demonstrate that non-tree structured (loopy) CDNs are able to provide significantly better fits to the data as compared to tree-structured and unstructured CDNs and other heavy-tailed multivariate distributions such as the multivariate copula and logistic models.
3 0.71485698 48 nips-2010-Collaborative Filtering in a Non-Uniform World: Learning with the Weighted Trace Norm
Author: Nathan Srebro, Ruslan Salakhutdinov
Abstract: We show that matrix completion with trace-norm regularization can be significantly hurt when entries of the matrix are sampled non-uniformly, but that a properly weighted version of the trace-norm regularizer works well with non-uniform sampling. We show that the weighted trace-norm regularization indeed yields significant gains on the highly non-uniformly sampled Netflix dataset.
4 0.61316967 21 nips-2010-Accounting for network effects in neuronal responses using L1 regularized point process models
Author: Ryan Kelly, Matthew Smith, Robert Kass, Tai S. Lee
Abstract: Activity of a neuron, even in the early sensory areas, is not simply a function of its local receptive field or tuning properties, but depends on global context of the stimulus, as well as the neural context. This suggests the activity of the surrounding neurons and global brain states can exert considerable influence on the activity of a neuron. In this paper we implemented an L1 regularized point process model to assess the contribution of multiple factors to the firing rate of many individual units recorded simultaneously from V1 with a 96-electrode “Utah” array. We found that the spikes of surrounding neurons indeed provide strong predictions of a neuron’s response, in addition to the neuron’s receptive field transfer function. We also found that the same spikes could be accounted for with the local field potentials, a surrogate measure of global network states. This work shows that accounting for network fluctuations can improve estimates of single trial firing rate and stimulus-response transfer functions.
5 0.61278898 268 nips-2010-The Neural Costs of Optimal Control
Author: Samuel Gershman, Robert Wilson
Abstract: Optimal control entails combining probabilities and utilities. However, for most practical problems, probability densities can be represented only approximately. Choosing an approximation requires balancing the benefits of an accurate approximation against the costs of computing it. We propose a variational framework for achieving this balance and apply it to the problem of how a neural population code should optimally represent a distribution under resource constraints. The essence of our analysis is the conjecture that population codes are organized to maximize a lower bound on the log expected utility. This theory can account for a plethora of experimental data, including the reward-modulation of sensory receptive fields, GABAergic effects on saccadic movements, and risk aversion in decisions under uncertainty.
6 0.61169505 51 nips-2010-Construction of Dependent Dirichlet Processes based on Poisson Processes
7 0.60998398 98 nips-2010-Functional form of motion priors in human motion perception
9 0.60841507 109 nips-2010-Group Sparse Coding with a Laplacian Scale Mixture Prior
10 0.60799569 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing
11 0.60751069 161 nips-2010-Linear readout from a neural population with partial correlation data
12 0.60745895 17 nips-2010-A biologically plausible network for the computation of orientation dominance
13 0.60706007 194 nips-2010-Online Learning for Latent Dirichlet Allocation
14 0.60569698 155 nips-2010-Learning the context of a category
15 0.60330182 55 nips-2010-Cross Species Expression Analysis using a Dirichlet Process Mixture Model with Latent Matchings
16 0.60212123 200 nips-2010-Over-complete representations on recurrent neural networks can support persistent percepts
17 0.6010282 131 nips-2010-Joint Analysis of Time-Evolving Binary Matrices and Associated Documents
18 0.60066283 56 nips-2010-Deciphering subsampled data: adaptive compressive sampling as a principle of brain communication
19 0.6002996 117 nips-2010-Identifying graph-structured activation patterns in networks
20 0.5997386 150 nips-2010-Learning concept graphs from text with stick-breaking priors