nips nips2011 nips2011-304 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Joel Z. Leibo, Jim Mutch, Tomaso Poggio
Abstract: Many studies have uncovered evidence that visual cortex contains specialized regions involved in processing faces but not other object classes. Recent electrophysiology studies of cells in several of these specialized regions revealed that at least some of these regions are organized in a hierarchical manner with viewpoint-specific cells projecting to downstream viewpoint-invariant identity-specific cells [1]. A separate computational line of reasoning leads to the claim that some transformations of visual inputs that preserve viewed object identity are class-specific. In particular, the 2D images evoked by a face undergoing a 3D rotation are not produced by the same image transformation (2D) that would produce the images evoked by an object of another class undergoing the same 3D rotation. However, within the class of faces, knowledge of the image transformation evoked by 3D rotation can be reliably transferred from previously viewed faces to help identify a novel face at a new viewpoint. We show, through computational simulations, that an architecture which applies this method of gaining invariance to class-specific transformations is effective when restricted to faces and fails spectacularly when applied to other object classes. We argue here that in order to accomplish viewpoint-invariant face identification from a single example view, visual cortex must separate the circuitry involved in discounting 3D rotations of faces from the generic circuitry involved in processing other objects. The resulting model of the ventral stream of visual cortex is consistent with the recent physiology results showing the hierarchical organization of the face processing network.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: Many studies have uncovered evidence that visual cortex contains specialized regions involved in processing faces but not other object classes. [sent-5, score-0.896]
2 Recent electrophysiology studies of cells in several of these specialized regions revealed that at least some of these regions are organized in a hierarchical manner with viewpoint-specific cells projecting to downstream viewpoint-invariant identity-specific cells [1]. [sent-6, score-0.753]
3 A separate computational line of reasoning leads to the claim that some transformations of visual inputs that preserve viewed object identity are class-specific. [sent-7, score-0.652]
4 In particular, the 2D images evoked by a face undergoing a 3D rotation are not produced by the same image transformation (2D) that would produce the images evoked by an object of another class undergoing the same 3D rotation. [sent-8, score-1.129]
5 However, within the class of faces, knowledge of the image transformation evoked by 3D rotation can be reliably transferred from previously viewed faces to help identify a novel face at a new viewpoint. [sent-9, score-0.765]
6 We show, through computational simulations, that an architecture which applies this method of gaining invariance to class-specific transformations is effective when restricted to faces and fails spectacularly when applied to other object classes. [sent-10, score-0.957]
7 We argue here that in order to accomplish viewpoint-invariant face identification from a single example view, visual cortex must separate the circuitry involved in discounting 3D rotations of faces from the generic circuitry involved in processing other objects. [sent-11, score-1.244]
8 The resulting model of the ventral stream of visual cortex is consistent with the recent physiology results showing the hierarchical organization of the face processing network. [sent-12, score-0.993]
9 1 Introduction. There is increasing evidence that visual cortex contains discrete patches involved in processing faces but not other objects [2, 3, 4, 5, 6, 7]. [sent-13, score-0.874]
10 Though progress has been made recently in characterizing the properties of these brain areas, the computational-level reason the brain adopts this modular architecture has remained unknown. [sent-14, score-0.273]
11 In this paper, we propose a new computational-level explanation for why visual cortex separates face processing from object processing. [sent-15, score-0.814]
12 Our argument does not require us to claim that faces are automatically processed in ways that are inapplicable to objects (e.g., gaze detection, gender detection) or that cortical specialization for faces arises due to perceptual expertise [8], though the perspective that emerges from our model is consistent with both of these claims. [sent-16, score-0.377] [sent-18, score-0.249]
14 We show that the task of identifying individual faces in an optimally viewpoint-invariant way from single training examples requires separate neural circuitry specialized for faces. [sent-19, score-0.818]
15 Invariance to generic transformations, e.g., translation, scaling, and 2D in-plane rotation, can be learned from any object class and usefully applied to any other class [9]. [sent-22, score-0.34] [sent-23, score-0.34]
(Figure 1: Layout of face-selective regions in macaque visual cortex, adapted from [1] with permission.)
17 Other transformations are class-specific; these include changes in viewpoint and illumination. [sent-24, score-0.551]
18 In this paper, we describe a method by which invariance to class-specific transformations can be encoded and used for within-class identification. [sent-27, score-0.502]
19 The resulting model of visual cortex must separate the representations of different classes in order to achieve good performance. [sent-28, score-0.428]
20 Section 2 of this paper describes the recently discovered hierarchical organization of the macaque face processing network [1]. [sent-30, score-0.42]
21 Sections 3 and 4 describe an extension to an existing hierarchical model of object recognition to include invariances for class-specific transformations. [sent-31, score-0.391]
22 The final section explains why the brain should have separate modules and relates the proposed computational model to physiology and neuroimaging evidence that the brain does indeed separate face recognition from object recognition. [sent-32, score-0.843]
23 Consistent with a hierarchical organization involving information passing from ML/MF to AM via AL, electrical stimulation of ML elicited a response in AL and stimulation in AL elicited a response in AM [10]. [sent-36, score-0.348]
24 The firing rates of cells in ML/MF are most strongly modulated by face viewpoint. [sent-37, score-0.424]
25 Further along the hierarchy, in patch AM, cells are highly selective for individual faces but tolerate substantial changes in viewpoint [1]. [sent-38, score-0.912]
26 In this paper, we argue that such a system – with view-tuned cells upstream from view-invariant identity-selective cells – is ideally suited to support face identification. [sent-40, score-0.628]
27 In the subsequent sections, we present a model of the ventral stream that is consistent with a large body of experimental results (see footnote 1) and additionally predicts the existence of discrete face-selective patches organized in this manner. [sent-41, score-0.322]
28 3 Hubel-Wiesel inspired hierarchical models of object recognition. At the end of the ventral visual pathway, cells in the most anterior parts of visual cortex respond selectively to highly complex stimuli and also invariantly over several degrees of visual angle. [sent-43, score-1.651]
29 Hierarchical models inspired by Hubel and Wiesel’s work (H-W models) seek to achieve similar selectivity and invariance properties by subjecting visual inputs to successive tuning and pooling operations [12, 13, 14, 15]. [sent-44, score-0.46]
30 A major algorithmic claim made by these H-W models is that repeated application of this AND-like tuning operation is the source of the selective responses of cells at the end of the ventral stream. [sent-45, score-0.461]
31 Hubel and Wiesel described complex cells as pooling the outputs of simple cells with the same optimal stimuli but receptive fields in different locations [16]. [sent-47, score-0.559]
32 Similar pooling operations can also be employed to gain tolerance to other image transformations, including those induced by changes in viewpoint or illumination. [sent-50, score-0.387]
33 A complex-like cell can gain tolerance to one of these transformations, e.g., viewpoint, simply by connecting to (simple-like) cells that are selective for the appearance of the same feature at different viewpoints. [sent-54, score-0.249]
34 Complex (C) cells pool over S cells by computing the max response of all the S cells with which they are connected. [sent-57, score-0.653]
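As a concrete illustration of this tuning/pooling pair, here is a minimal numpy sketch of one S/C stage. The Gaussian tuning function, its width sigma, and the dense-patch input format are illustrative assumptions rather than the paper's exact choices; the simulations in the paper build on the HMAX model [14].

```python
import numpy as np

def s_layer(image_patches, templates, sigma=1.0):
    """Simple-cell ('S') responses: Gaussian tuning of every image patch
    to every stored template (the AND-like operation).
    image_patches: (n_positions, d); templates: (n_templates, d)."""
    diffs = image_patches[:, None, :] - templates[None, :, :]
    sq_dist = np.sum(diffs ** 2, axis=-1)        # (n_positions, n_templates)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def c_layer(s_responses):
    """Complex-cell ('C') responses: max-pool each template's S responses
    over all positions (the OR-like operation), discarding position."""
    return s_responses.max(axis=0)               # (n_templates,)
```

Because the C stage keeps only each template's maximum, its output does not change when a preferred stimulus appears at a different position within the pooled range.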
35 There is also psychophysical and physiological evidence that visual cortex employs a temporal association strategy (see footnote 2) [23, 24, 25, 26, 27]. [sent-62, score-0.556]
36 4 Invariance to class-specific transformations. H-W models can gain invariance to some transformations in a generic way. [sent-63, score-0.743]
37 If a transformation is generic, e.g., translation, scaling, and in-plane rotation, then the model’s response to any image undergoing the transformation will remain constant no matter what templates were associated with one another to build the model. [sent-66, score-0.308]
38 For example, a face can be encoded invariantly to translation as a vector of similarities to previously viewed template images of any other objects. [sent-67, score-0.563]
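A hedged sketch of that encoding, using a normalized dot product as a stand-in for the model's actual tuning function: the image becomes a vector of its best match to each template, max-pooled over position, so a translated copy yields (up to border effects) the same signature no matter which object class supplied the templates.

```python
import numpy as np

def translation_invariant_signature(image, templates):
    """Encode `image` as a vector of its best normalized-dot-product match
    to each template patch, max-pooled over all spatial positions.
    image: (H, W); templates: (n_templates, h, w) with h <= H, w <= W."""
    n_t, h, w = templates.shape
    t_flat = templates.reshape(n_t, -1)
    t_flat = t_flat / (np.linalg.norm(t_flat, axis=1, keepdims=True) + 1e-9)
    best = np.full(n_t, -np.inf)
    for i in range(image.shape[0] - h + 1):
        for j in range(image.shape[1] - w + 1):
            p = image[i:i + h, j:j + w].ravel()
            p = p / (np.linalg.norm(p) + 1e-9)
            best = np.maximum(best, t_flat @ p)  # max-pool over position
    return best
```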
39 Other transformations are class-specific, that is, they depend on information about the depicted object that is not available in a single image. [sent-70, score-0.442]
40 For example, the 2D image evoked by an object undergoing a change in viewpoint depends on its 3D structure. [sent-71, score-0.633]
41 Likewise, the images evoked by changes in illumination depend on the object’s material properties. [sent-72, score-0.326]
42 These class-specific properties can be learned from one or more exemplars of the class and applied to other objects in the class (see also [28, 29]). [sent-73, score-0.299]
43 For this to work, the object class needs to consist of objects with similar 3D shape and material properties. [sent-74, score-0.401]
44 (Footnote 2) These temporal association algorithms and the evidence for their employment by visual cortex are interesting in their own right. [sent-77, score-0.494]
45 In this paper we sidestep the issue of how visual cortex associates similar features under different transformations in order to focus on the implications of having the representation that results from applying these learning rules. [sent-78, score-0.624]
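Purely as an illustration of what such a learning rule could look like (the toy (timestamp, feature) input format and the gap threshold below are assumptions, not the paper's method), features observed close together in time can be grouped into the template set of a single C-like cell, in the spirit of the trace-rule literature cited above [23, 24, 25, 26, 27]:

```python
def associate_by_time(observations, gap=1):
    """observations: non-empty list of (t, feature) pairs sorted by time t.
    Features whose timestamps are contiguous (within `gap`) are grouped;
    each group becomes the template set pooled by one C-like cell."""
    groups, current = [], [observations[0][1]]
    for (t0, _), (t1, f) in zip(observations, observations[1:]):
        if t1 - t0 <= gap:
            current.append(f)       # same transformation sequence
        else:
            groups.append(current)  # gap in time: start a new group
            current = [f]
    groups.append(current)
    return groups
```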
46 Figure 2: Illustration of an extension to the HMAX model to incorporate class-specific invariance to face viewpoint changes. (The figure shows layers, bottom to top: Input, S1, C1, S2, C2, S3, C3, built from alternating tuning and pooling operations.) [sent-79, score-0.689]
47 r_w(x) is invariant to viewpoint changes of the input face x, as long as the 3D structure of the face depicted in the template images w_t matches the 3D structure of the face depicted in x. [sent-83, score-1.269]
48 Since all human faces have a relatively similar 3D structure, r_w(x) will tolerate substantial viewpoint changes within the domain of faces. [sent-84, score-0.709]
49 It follows that templates derived from a class of objects with the wrong 3D structure give rise to C cells that do not respond invariantly to 3D rotations. [sent-85, score-0.607]
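A sketch of the S3/C3 computation of r_w(x), under the assumptions that the input x and the stored templates are C2-like feature vectors and that S3 tuning is a normalized dot product (the model's exact tuning function may differ): each C3 unit max-pools one template face's S3 responses over its stored viewpoints.

```python
import numpy as np

def c3_signature(x, template_views):
    """x: (d,) feature vector of the input image (e.g., its C2 encoding).
    template_views: (n_templates, n_views, d) feature vectors of each
    template face rendered at several viewpoints.
    Returns r_w(x): one viewpoint-pooled value per template face."""
    xn = x / (np.linalg.norm(x) + 1e-9)
    tv = template_views / (
        np.linalg.norm(template_views, axis=-1, keepdims=True) + 1e-9)
    s3 = tv @ xn            # S3: similarity to every stored (face, view) pair
    return s3.max(axis=-1)  # C3: max-pool each face's responses over views
```

If the template faces are replaced by objects with a different 3D shape, the per-view similarities no longer rise and fall together as x rotates, and the max-pooled signature loses its viewpoint tolerance; this is the failure shown for mismatched template classes in figures 3 and 4.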
50 In the test task, a single view of a target object is encoded and a simple classifier (nearest neighbors) must rank test images depicting the same object as being more similar to the encoded target than to images of any other objects. [sent-88, score-0.666]
51 This task models the common situation of encountering a new face or object at one viewpoint and then being asked to recognize it again later from a different viewpoint. [sent-90, score-0.677]
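The figure captions below describe scoring this one-shot task with a correlation-based nearest-neighbor ranking summarized as an AUC. The snippet below is a hedged reconstruction of that protocol; the inputs are assumed to be model feature vectors (C2 or C3 signatures), and ties are ignored for simplicity.

```python
import numpy as np

def one_shot_auc(target_view, target_other_views, distractor_views):
    """AUC for ranking images of the target (seen once, as `target_view`)
    above distractors, using Pearson correlation as the similarity."""
    def corr(a, b):
        a, b = a - a.mean(), b - b.mean()
        return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
    pos = [corr(target_view, v) for v in target_other_views]
    neg = [corr(target_view, v) for v in distractor_views]
    # Fraction of (positive, negative) pairs ranked correctly.
    return float(np.mean([[p > n for n in neg] for p in pos]))
```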
52 The original HMAX model [14], represented here by the red curves (C2), shows a rapid decline in performance due to changes in viewpoint and illumination. [sent-91, score-0.315]
53 Additionally, the performance of the C3 features is not strongly affected by viewpoint and illumination changes (see the plots along the diagonal in figures 3I and 4I). [sent-93, score-0.395]
54 Class A consists of faces produced using FaceGen (Singular Inversions). [sent-96, score-0.249]
55 Each object in the second class (class B) has a central spike protruding from a sphere and two bumps always in the same location on top of the sphere. [sent-98, score-0.273]
56 Each object in the third class (class C) has a central pyramid on a flat plane and two walls on either side. [sent-101, score-0.273]
57 The S3/C3 templates were obtained from objects in class A in the top row, class B in the middle row and class C in the bottom row. [sent-105, score-0.507]
58 The abscissa of each plot shows the maximum invariance range (maximum deviation from the frontal view in either direction) over which targets and distractors were presented. [sent-106, score-0.31]
59 The ordinate shows the AUC obtained for the task of recognizing an individual novel object despite changes in viewpoint. [sent-107, score-0.31]
60 A simple correlation-based nearest-neighbor classifier must rank all images of the same object at different viewpoints as being more similar to the frontal view than other objects. [sent-109, score-0.325]
61 Simulation details: These simulations used 2000 translation and scaling invariant C2 units tuned to patches of natural images. [sent-111, score-0.326]
62 Each class consists of faces with different light reflectance properties, modeling different materials. [sent-118, score-0.316]
63 S3/C3 templates were obtained from objects in class A (top row), B (middle row), and C (bottom row). [sent-125, score-0.335]
64 As in figure 3, the abscissa of each plot shows the maximum invariance range (maximum distance the light could move in either direction away from a neutral position where the lamp is even with the middle of the head) over which targets and distractors were presented. [sent-127, score-0.348]
65 The ordinate shows the AUC obtained for the task of recognizing an individual novel object despite changes in illumination. [sent-128, score-0.31]
66 A correlation-based nearest-neighbor “classifier” must rank all images of the same object under each illumination condition as being more similar to the neutral view than other objects. [sent-129, score-0.365]
67 Simulation details: These simulations used 80 translation and scaling invariant C2 units tuned to patches of natural images. [sent-131, score-0.326]
68 The above simulations used 1200 S3 units (80 exemplar faces and 15 illumination conditions) and 80 C3 units. [sent-134, score-0.367]
69 5 Conclusion. Everyday visual tasks require reasonably good invariance to non-generic transformations like changes in viewpoint and illumination (see footnote 3). [sent-140, score-0.939]
70 We showed that a broad class of ventral stream models that is well-supported by physiology data (H-W models) requires class-specific modules in order to accomplish these tasks. [sent-141, score-0.388]
71 The responses of cells in an early part of the hierarchy (patches ML and MF) are strongly dependent on viewpoint, while the cells in a downstream area (patch AM) tolerate large changes in viewpoint. [sent-143, score-0.695]
72 Identifying the S3 layer of our extended HMAX model with the ML/MF cells and the C3 layer with the AM cells is an intriguing possibility. [sent-144, score-0.408]
73 Another mapping from the model to the physiology could be to identify the outputs of simple classifiers operating on C2, S3 or C3 layers with the responses of cells in ML/MF and AM. [sent-145, score-0.327]
74 Fundamentally, the 3D rotation of an object class with one 3D structure, e.g., faces, is not the same as the 3D rotation of another class of objects with a different 3D structure. [sent-146, score-0.345] [sent-148, score-0.267]
76 Generic circuitry cannot take into account both transformations at once. [sent-149, score-0.364]
77 Since the brain must take these transformations into account in interpreting the visual world, it follows that visual cortex must have a modular architecture. [sent-151, score-0.929]
78 Object classes that are important enough to require invariance to these transformations of novel exemplars must be encoded by dedicated circuitry. [sent-152, score-0.594]
79 We do not think it is coincidental that, just as for faces, brain areas which are thought to be specialized for visual processing of the human body (the extrastriate body area [32]) and reading (the visual word form area [33, 34]) are consistently found in human fMRI experiments. [sent-155, score-0.737]
80 We have argued in favor of visual cortex implementing a modularity of content rather than process. [sent-156, score-0.388]
81 The only difference across areas is the object class (and the transformations) being encoded. [sent-159, score-0.273]
82 In this view, visual cortex must be modular in order to succeed in the tasks with which it is faced. [sent-160, score-0.433]
83 (Footnote 3) It is sometimes claimed that human vision is not viewpoint invariant [30]. [sent-162, score-0.394]
84 It is certainly true that performance on psychophysical tasks requiring viewpoint invariance is worse than on tasks requiring translation invariance. [sent-163, score-0.613]
85 Many psychophysical experiments on viewpoint invariance were performed with synthetic “paperclip” objects defined entirely by their 3D structure. [sent-167, score-0.659]
86 Chun, “The fusiform face area: a module in human extrastriate cortex specialized for face perception,” The Journal of Neuroscience, vol. [sent-182, score-0.893]
87 Kanwisher, “The fusiform face area subserves face perception, not generic within-category identification,” Nature Neuroscience, vol. [sent-189, score-0.608]
88 Tootell, “An anterior temporal face patch in human cortex, predicted by macaque maps,” Proceedings of the National Academy of Sciences, vol. [sent-213, score-0.592]
89 Gauthier, “FFA: a flexible fusiform area for subordinate-level visual processing automatized by expertise,” Nature Neuroscience, vol. [sent-227, score-0.285]
90 Tsao, “Patches with links: a unified system for processing faces in the macaque temporal lobe,” Science, vol. [sent-240, score-0.413]
91 Poggio, “A theory of object recognition: computations and circuits in the feedforward path of the ventral stream in primate visual cortex,” CBCL Paper #259/AI Memo #2005-036, 2005. [sent-250, score-0.589]
92 Poggio, “Hierarchical models of object recognition in cortex,” Nature Neuroscience, vol. [sent-258, score-0.293]
93 Földiák, “Learning invariance from transformation sequences,” Neural Computation, vol. [sent-293, score-0.272]
94 Rolls, “Invariant object recognition in the visual system with novel views of 3D objects,” Neural Computation, vol. [sent-299, score-0.463]
95 Spratling, “Learning viewpoint invariant perceptual representations from cluttered images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. [sent-315, score-0.348]
96 DiCarlo, “Unsupervised natural experience rapidly alters invariant object representation in visual cortex.” [sent-331, score-0.473]
97 Bülthoff, “Learning illumination- and orientation-invariant representations of objects through temporal association,” Journal of Vision, vol. [sent-356, score-0.291]
98 Poggio, “View-based models of 3D object recognition: invariance to imaging transformations,” Cerebral Cortex, vol. [sent-362, score-0.424]
99 Poggio, “View-dependent object recognition by monkeys,” Current Biology, vol. [sent-382, score-0.293]
100 Jiang, “A cortical area selective for visual processing of the human body,” Science, vol. [sent-388, score-0.308]
wordName wordTfidf (topN-words)
[('viewpoint', 0.251), ('faces', 0.249), ('transformations', 0.236), ('face', 0.22), ('invariance', 0.218), ('cortex', 0.218), ('object', 0.206), ('cells', 0.204), ('visual', 0.17), ('ventral', 0.159), ('hmax', 0.159), ('poggio', 0.159), ('templates', 0.14), ('circuitry', 0.128), ('objects', 0.128), ('anterior', 0.11), ('patches', 0.109), ('evoked', 0.103), ('macaque', 0.098), ('invariant', 0.097), ('freiwald', 0.091), ('lthoff', 0.091), ('tsao', 0.091), ('brain', 0.09), ('recognition', 0.087), ('translation', 0.082), ('illumination', 0.08), ('images', 0.079), ('auc', 0.075), ('undergoing', 0.073), ('rotation', 0.072), ('pooling', 0.072), ('physiology', 0.07), ('extrastriate', 0.068), ('fusiform', 0.068), ('invariantly', 0.068), ('kanwisher', 0.068), ('tootell', 0.068), ('class', 0.067), ('temporal', 0.066), ('template', 0.066), ('changes', 0.064), ('psychophysical', 0.062), ('hubel', 0.06), ('mutch', 0.06), ('serre', 0.06), ('wiesel', 0.055), ('dedicated', 0.055), ('stream', 0.054), ('transformation', 0.054), ('generic', 0.053), ('responses', 0.053), ('specialized', 0.053), ('patch', 0.052), ('dicarlo', 0.052), ('rw', 0.052), ('distractors', 0.052), ('neuroscience', 0.051), ('organization', 0.051), ('hierarchical', 0.051), ('encoded', 0.048), ('architecture', 0.048), ('area', 0.047), ('invariances', 0.047), ('tolerate', 0.047), ('human', 0.046), ('blender', 0.045), ('cbcl', 0.045), ('classspeci', 0.045), ('facegen', 0.045), ('leibo', 0.045), ('macaques', 0.045), ('wallis', 0.045), ('modular', 0.045), ('stimulation', 0.045), ('selective', 0.045), ('receptive', 0.041), ('response', 0.041), ('separate', 0.04), ('mcdermott', 0.04), ('ordinate', 0.04), ('logothetis', 0.04), ('opaque', 0.04), ('abscissa', 0.04), ('viewpoints', 0.04), ('association', 0.04), ('hierarchy', 0.039), ('identi', 0.039), ('academy', 0.038), ('stimuli', 0.038), ('simulations', 0.038), ('accomplish', 0.038), ('middle', 0.038), ('panel', 0.037), ('elicited', 0.037), ('ective', 0.037), ('riesenhuber', 0.037), ('downstream', 0.037), ('exemplars', 0.037), ('ullman', 0.037)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999958 304 nips-2011-Why The Brain Separates Face Recognition From Object Recognition
Author: Joel Z. Leibo, Jim Mutch, Tomaso Poggio
Abstract: (same as above)
2 0.14514107 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding
Author: Congcong Li, Ashutosh Saxena, Tsuhan Chen
Abstract: For most scene understanding tasks (such as object detection or depth estimation), the classifiers need to consider contextual information in addition to the local features. We can capture such contextual information by taking as input the features/attributes from all the regions in the image. However, this contextual dependence also varies with the spatial location of the region of interest, and we therefore need a different set of parameters for each spatial location. This results in a very large number of parameters. In this work, we model the independence properties between the parameters for each location and for each task, by defining a Markov Random Field (MRF) over the parameters. In particular, two sets of parameters are encouraged to have similar values if they are spatially close or semantically close. Our method is, in principle, complementary to other ways of capturing context such as the ones that use a graphical model over the labels instead. In extensive evaluation over two different settings, of multi-class object detection and of multiple scene understanding tasks (scene categorization, depth estimation, geometric labeling), our method beats the state-of-the-art methods in all the four tasks. 1
3 0.13453388 298 nips-2011-Unsupervised learning models of primary cortical receptive fields and receptive field plasticity
Author: Maneesh Bhand, Ritvik Mudur, Bipin Suresh, Andrew Saxe, Andrew Y. Ng
Abstract: The efficient coding hypothesis holds that neural receptive fields are adapted to the statistics of the environment, but is agnostic to the timescale of this adaptation, which occurs on both evolutionary and developmental timescales. In this work we focus on that component of adaptation which occurs during an organism’s lifetime, and show that a number of unsupervised feature learning algorithms can account for features of normal receptive field properties across multiple primary sensory cortices. Furthermore, we show that the same algorithms account for altered receptive field properties in response to experimentally altered environmental statistics. Based on these modeling results we propose these models as phenomenological models of receptive field plasticity during an organism’s lifetime. Finally, due to the success of the same models in multiple sensory areas, we suggest that these algorithms may provide a constructive realization of the theory, first proposed by Mountcastle [1], that a qualitatively similar learning algorithm acts throughout primary sensory cortices. 1
4 0.12713814 154 nips-2011-Learning person-object interactions for action recognition in still images
Author: Vincent Delaitre, Josef Sivic, Ivan Laptev
Abstract: We investigate a discriminatively trained model of person-object interactions for recognizing common human actions in still images. We build on the locally order-less spatial pyramid bag-of-features model, which was shown to perform extremely well on a range of object, scene and human action recognition tasks. We introduce three principal contributions. First, we replace the standard quantized local HOG/SIFT features with stronger discriminatively trained body part and object detectors. Second, we introduce new person-object interaction features based on spatial co-occurrences of individual body parts and objects. Third, we address the combinatorial problem of a large number of possible interaction pairs and propose a discriminative selection procedure using a linear support vector machine (SVM) with a sparsity inducing regularizer. Learning of action-specific body part and object interactions bypasses the difficult problem of estimating the complete human body pose configuration. Benefits of the proposed model are shown on human action recognition in consumer photographs, outperforming the strong bag-of-features baseline. 1
5 0.12534942 166 nips-2011-Maximal Cliques that Satisfy Hard Constraints with Application to Deformable Object Model Learning
Author: Xinggang Wang, Xiang Bai, Xingwei Yang, Wenyu Liu, Longin J. Latecki
Abstract: We propose a novel inference framework for finding maximal cliques in a weighted graph that satisfy hard constraints. The constraints specify the graph nodes that must belong to the solution as well as mutual exclusions of graph nodes, i.e., sets of nodes that cannot belong to the same solution. The proposed inference is based on a novel particle filter algorithm with state permeations. We apply the inference framework to a challenging problem of learning part-based, deformable object models. Two core problems in the learning framework, matching of image patches and finding salient parts, are formulated as two instances of the problem of finding maximal cliques with hard constraints. Our learning framework yields discriminative part based object models that achieve very good detection rate, and outperform other methods on object classes with large deformation. 1
6 0.1189393 244 nips-2011-Selecting Receptive Fields in Deep Networks
7 0.10810569 82 nips-2011-Efficient coding of natural images with a population of noisy Linear-Nonlinear neurons
8 0.10336888 208 nips-2011-Optimistic Optimization of a Deterministic Function without the Knowledge of its Smoothness
9 0.10166218 290 nips-2011-Transfer Learning by Borrowing Examples for Multiclass Object Detection
10 0.099202961 219 nips-2011-Predicting response time and error rates in visual search
11 0.098682277 231 nips-2011-Randomized Algorithms for Comparison-based Search
12 0.094104648 127 nips-2011-Image Parsing with Stochastic Scene Grammar
13 0.090943597 247 nips-2011-Semantic Labeling of 3D Point Clouds for Indoor Scenes
14 0.088673376 70 nips-2011-Dimensionality Reduction Using the Sparse Linear Model
15 0.087298289 126 nips-2011-Im2Text: Describing Images Using 1 Million Captioned Photographs
16 0.084137626 113 nips-2011-Hierarchical Matching Pursuit for Image Classification: Architecture and Fast Algorithms
17 0.083943307 259 nips-2011-Sparse Estimation with Structured Dictionaries
18 0.083559446 141 nips-2011-Large-Scale Category Structure Aware Image Categorization
19 0.082266949 22 nips-2011-Active Ranking using Pairwise Comparisons
20 0.081300408 35 nips-2011-An ideal observer model for identifying the reference frame of objects
topicId topicWeight
[(0, 0.199), (1, 0.161), (2, -0.008), (3, 0.158), (4, 0.086), (5, 0.09), (6, 0.044), (7, 0.065), (8, 0.01), (9, -0.024), (10, 0.012), (11, -0.0), (12, 0.017), (13, 0.051), (14, 0.105), (15, -0.021), (16, 0.04), (17, 0.003), (18, 0.011), (19, 0.06), (20, 0.002), (21, 0.013), (22, -0.084), (23, 0.092), (24, -0.011), (25, -0.008), (26, 0.021), (27, 0.05), (28, 0.041), (29, -0.061), (30, 0.063), (31, 0.133), (32, -0.027), (33, 0.006), (34, 0.01), (35, 0.007), (36, -0.042), (37, 0.012), (38, -0.07), (39, 0.101), (40, -0.024), (41, 0.026), (42, -0.048), (43, -0.103), (44, -0.111), (45, -0.001), (46, 0.053), (47, -0.038), (48, 0.044), (49, 0.032)]
simIndex simValue paperId paperTitle
same-paper 1 0.9688229 304 nips-2011-Why The Brain Separates Face Recognition From Object Recognition
Author: Joel Z. Leibo, Jim Mutch, Tomaso Poggio
Abstract: (same as above)
2 0.69270945 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding
Author: Congcong Li, Ashutosh Saxena, Tsuhan Chen
Abstract: (same as above)
3 0.65434128 35 nips-2011-An ideal observer model for identifying the reference frame of objects
Author: Joseph L. Austerweil, Abram L. Friesen, Thomas L. Griffiths
Abstract: The object people perceive in an image can depend on its orientation relative to the scene it is in (its reference frame). For example, the images of the symbols × and + differ by a 45 degree rotation. Although real scenes have multiple images and reference frames, psychologists have focused on scenes with only one reference frame. We propose an ideal observer model based on nonparametric Bayesian statistics for inferring the number of reference frames in a scene and their parameters. When an ambiguous image could be assigned to two conflicting reference frames, the model predicts two factors should influence the reference frame inferred for the image: The image should be more likely to share the reference frame of the closer object (proximity) and it should be more likely to share the reference frame containing the most objects (alignment). We confirm people use both cues using a novel methodology that allows for easy testing of human reference frame inference. 1
4 0.64851063 126 nips-2011-Im2Text: Describing Images Using 1 Million Captioned Photographs
Author: Vicente Ordonez, Girish Kulkarni, Tamara L. Berg
Abstract: We develop and demonstrate automatic image description methods using a large captioned photo collection. One contribution is our technique for the automatic collection of this new dataset – performing a huge number of Flickr queries and then filtering the noisy results down to 1 million images with associated visually relevant captions. Such a collection allows us to approach the extremely challenging problem of description generation using relatively simple non-parametric methods and produces surprisingly effective results. We also develop methods incorporating many state of the art, but fairly noisy, estimates of image content to produce even more pleasing results. Finally we introduce a new objective performance measure for image captioning. 1
5 0.64304549 138 nips-2011-Joint 3D Estimation of Objects and Scene Layout
Author: Andreas Geiger, Christian Wojek, Raquel Urtasun
Abstract: We propose a novel generative model that is able to reason jointly about the 3D scene layout as well as the 3D location and orientation of objects in the scene. In particular, we infer the scene topology, geometry as well as traffic activities from a short video sequence acquired with a single camera mounted on a moving car. Our generative model takes advantage of dynamic information in the form of vehicle tracklets as well as static information coming from semantic labels and geometry (i.e., vanishing points). Experiments show that our approach outperforms a discriminative baseline based on multiple kernel learning (MKL) which has access to the same image information. Furthermore, as we reason about objects in 3D, we are able to significantly increase the performance of state-of-the-art object detectors in their ability to estimate object orientation. 1
6 0.64011461 154 nips-2011-Learning person-object interactions for action recognition in still images
7 0.61963755 290 nips-2011-Transfer Learning by Borrowing Examples for Multiclass Object Detection
8 0.59902191 293 nips-2011-Understanding the Intrinsic Memorability of Images
10 0.55852461 166 nips-2011-Maximal Cliques that Satisfy Hard Constraints with Application to Deformable Object Model Learning
11 0.55528933 184 nips-2011-Neuronal Adaptation for Sampling-Based Probabilistic Inference in Perceptual Bistability
12 0.55343544 298 nips-2011-Unsupervised learning models of primary cortical receptive fields and receptive field plasticity
13 0.53261757 193 nips-2011-Object Detection with Grammar Models
14 0.52264005 247 nips-2011-Semantic Labeling of 3D Point Clouds for Indoor Scenes
15 0.52136618 127 nips-2011-Image Parsing with Stochastic Scene Grammar
16 0.49756899 244 nips-2011-Selecting Receptive Fields in Deep Networks
17 0.49024999 113 nips-2011-Hierarchical Matching Pursuit for Image Classification: Architecture and Fast Algorithms
18 0.49022502 141 nips-2011-Large-Scale Category Structure Aware Image Categorization
20 0.45574284 82 nips-2011-Efficient coding of natural images with a population of noisy Linear-Nonlinear neurons
topicId topicWeight
[(0, 0.027), (4, 0.037), (20, 0.071), (26, 0.011), (27, 0.276), (31, 0.079), (33, 0.025), (43, 0.042), (45, 0.115), (57, 0.039), (65, 0.052), (74, 0.055), (83, 0.053), (84, 0.026), (99, 0.026)]
simIndex simValue paperId paperTitle
1 0.80268466 6 nips-2011-A Global Structural EM Algorithm for a Model of Cancer Progression
Author: Ali Tofigh, Erik Sjölund, Mattias Höglund, Jens Lagergren
Abstract: Cancer has complex patterns of progression that include converging as well as diverging progressional pathways. Vogelstein’s path model of colon cancer was a pioneering contribution to cancer research. Since then, several attempts have been made at obtaining mathematical models of cancer progression, devising learning algorithms, and applying these to cross-sectional data. Beerenwinkel et al. provided, what they coined, EM-like algorithms for Oncogenetic Trees (OTs) and mixtures of such. Given the small size of current and future data sets, it is important to minimize the number of parameters of a model. For this reason, we too focus on tree-based models and introduce Hidden-variable Oncogenetic Trees (HOTs). In contrast to OTs, HOTs allow for errors in the data and thereby provide more realistic modeling. We also design global structural EM algorithms for learning HOTs and mixtures of HOTs (HOT-mixtures). The algorithms are global in the sense that, during the M-step, they find a structure that yields a global maximum of the expected complete log-likelihood rather than merely one that improves it. The algorithm for single HOTs performs very well on reasonable-sized data sets, while that for HOT-mixtures requires data sets of sizes obtainable only with tomorrow’s more cost-efficient technologies. 1
same-paper 2 0.79976928 304 nips-2011-Why The Brain Separates Face Recognition From Object Recognition
Author: Joel Z. Leibo, Jim Mutch, Tomaso Poggio
Abstract: (same as above)
3 0.77530771 192 nips-2011-Nonstandard Interpretations of Probabilistic Programs for Efficient Inference
Author: David Wingate, Noah Goodman, Andreas Stuhlmueller, Jeffrey M. Siskind
Abstract: Probabilistic programming languages allow modelers to specify a stochastic process using syntax that resembles modern programming languages. Because the program is in machine-readable format, a variety of techniques from compiler design and program analysis can be used to examine the structure of the distribution represented by the probabilistic program. We show how nonstandard interpretations of probabilistic programs can be used to craft efficient inference algorithms: information about the structure of a distribution (such as gradients or dependencies) is generated as a monad-like side computation while executing the program. These interpretations can be easily coded using special-purpose objects and operator overloading. We implement two examples of nonstandard interpretations in two different languages, and use them as building blocks to construct inference algorithms: automatic differentiation, which enables gradient based methods, and provenance tracking, which enables efficient construction of global proposals. 1
4 0.6779142 280 nips-2011-Testing a Bayesian Measure of Representativeness Using a Large Image Database
Author: Joshua T. Abbott, Katherine A. Heller, Zoubin Ghahramani, Thomas L. Griffiths
Abstract: How do people determine which elements of a set are most representative of that set? We extend an existing Bayesian measure of representativeness, which indicates the representativeness of a sample from a distribution, to define a measure of the representativeness of an item to a set. We show that this measure is formally related to a machine learning method known as Bayesian Sets. Building on this connection, we derive an analytic expression for the representativeness of objects described by a sparse vector of binary features. We then apply this measure to a large database of images, using it to determine which images are the most representative members of different sets. Comparing the resulting predictions to human judgments of representativeness provides a test of this measure with naturalistic stimuli, and illustrates how databases that are more commonly used in computer vision and machine learning can be used to evaluate psychological theories. 1
5 0.54450458 244 nips-2011-Selecting Receptive Fields in Deep Networks
Author: Adam Coates, Andrew Y. Ng
Abstract: Recent deep learning and unsupervised feature learning systems that learn from unlabeled data have achieved high performance in benchmarks by using extremely large architectures with many features (hidden units) at each layer. Unfortunately, for such large architectures the number of parameters can grow quadratically in the width of the network, thus necessitating hand-coded “local receptive fields” that limit the number of connections from lower level features to higher ones (e.g., based on spatial locality). In this paper we propose a fast method to choose these connections that may be incorporated into a wide variety of unsupervised training methods. Specifically, we choose local receptive fields that group together those low-level features that are most similar to each other according to a pairwise similarity metric. This approach allows us to harness the advantages of local receptive fields (such as improved scalability, and reduced data requirements) when we do not know how to specify such receptive fields by hand or where our unsupervised training algorithm has no obvious generalization to a topographic setting. We produce results showing how this method allows us to use even simple unsupervised training algorithms to train successful multi-layered networks that achieve state-of-the-art results on CIFAR and STL datasets: 82.0% and 60.1% accuracy, respectively. 1
6 0.54261607 124 nips-2011-ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning
7 0.535604 227 nips-2011-Pylon Model for Semantic Segmentation
8 0.53548515 113 nips-2011-Hierarchical Matching Pursuit for Image Classification: Architecture and Fast Algorithms
9 0.53379816 149 nips-2011-Learning Sparse Representations of High Dimensional Data on Large Scale Dictionaries
10 0.53221691 180 nips-2011-Multiple Instance Filtering
11 0.53154457 156 nips-2011-Learning to Learn with Compound HD Models
12 0.53093702 105 nips-2011-Generalized Lasso based Approximation of Sparse Coding for Visual Recognition
13 0.52993983 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding
14 0.52929181 276 nips-2011-Structured sparse coding via lateral inhibition
15 0.52742255 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs
16 0.52731222 303 nips-2011-Video Annotation and Tracking with Active Learning
17 0.52671838 263 nips-2011-Sparse Manifold Clustering and Embedding
18 0.52668661 43 nips-2011-Bayesian Partitioning of Large-Scale Distance Data
19 0.52637094 35 nips-2011-An ideal observer model for identifying the reference frame of objects
20 0.52481121 223 nips-2011-Probabilistic Joint Image Segmentation and Labeling