nips nips2001 nips2001-54 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Antonio Torralba
Abstract: The most popular algorithms for object detection require the use of exhaustive spatial and scale search procedures. In such approaches, an object is defined by means of local features. In this paper we show that including contextual information in object detection procedures provides an efficient way of cutting down the need for exhaustive search. We present results with real images showing that the proposed scheme is able to accurately predict likely object classes, locations and sizes. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract The most popular algorithms for object detection require the use of exhaustive spatial and scale search procedures. [sent-4, score-0.814]
2 In such approaches, an object is defined by means of local features. [sent-5, score-0.637]
3 In this paper we show that including contextual information in object detection procedures provides an efficient way of cutting down the need for exhaustive search. [sent-6, score-1.207]
4 We present results with real images showing that the proposed scheme is able to accurately predict likely object classes, locations and sizes. [sent-7, score-0.638]
5 1 Introduction Although there is growing evidence of the role of contextual information in human perception [1], research in computational vision is dominated by object-based representations [5,9,10,15]. [sent-8, score-0.475]
6 In real-world scenes, intrinsic object information is often degraded due to occlusion, low contrast, and poor resolution. [sent-9, score-0.626]
7 In such situations, the object recognition problem based on intrinsic object representations is ill-posed. [sent-10, score-1.233]
8 A more comprehensive representation of an object should include contextual information [11,13]. [sent-11, score-1.017]
9 In this representation, an object is defined by 1) a model of the intrinsic properties of the object and 2) a model of the typical contexts in which the object is immersed. [sent-15, score-1.79]
10 Here we show how incorporating contextual models can enhance target object saliency and provide an estimate of its likelihood and intrinsic properties. [sent-16, score-1.385]
11 2 Target saliency and object likelihood Image information can be partitioned into two sets of features: local features, vL, that are intrinsic to an object, and contextual features, vC, which encode structural properties of the background. [sent-17, score-1.333]
12 In a statistical framework, object detection requires evaluation of the likelihood function (target saliency function) P(O | vL, vC), which gives the probability of presence of the object O given a set of local and contextual measurements. [sent-18, score-2.065]
13 O is the set of parameters that define an object immersed in a scene: O = {o_n, x, y, t}, with o_n = object class, (x, y) = location in image coordinates, and t = object appearance parameters. [sent-19, score-0.803]
14 It provides a measure of how unlikely it is to find a set of local measurements vL within the context vC. [sent-21, score-0.241]
15 We can define local saliency as S(x, y) = 1/P(vL(x, y) | vC). [sent-22, score-0.259]
16 The second factor, P(vL | O, vC), gives the likelihood of the local measurements vL when the object is present at such a location in a particular context. [sent-24, score-0.739]
17 We can write P(vL | O, vC) ≈ P(vL | O), which is a convenient approximation when the aspect of the target object is fully determined by the parameters given by the description O. [sent-25, score-0.728]
18 This factor represents the top-down knowledge of the target appearance and how it contributes to the search. [sent-26, score-0.111]
19 Regions of the image with features unlikely to belong to the target object are vetoed. [sent-27, score-0.933]
20 The third factor, the PDF P(O | vC), provides context-based priors on object class, location and scale. [sent-29, score-0.64]
21 It is critically important for ensuring reliable inferences in situations where the local image measurements vL produce ambiguous interpretations. [sent-30, score-0.251]
22 This factor does not depend on local measurements and target models [8,13]. [sent-31, score-0.248]
23 Therefore, the term P(O | vC) modulates the saliency of local image properties when looking for an object of the class o_n. [sent-32, score-1.08]
24 If P(o_n | vC) is very small, then object search need not be initiated (we do not need to look for cars in a living room). [sent-34, score-0.799]
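The multiplicative combination of the three factors above can be sketched in a few lines; the 4x4 grid and all probability values below are illustrative assumptions, not the paper's data.

```python
import numpy as np

# Toy per-location quantities on a 4x4 grid standing in for an image.
p_vl_given_vc = np.full((4, 4), 0.25)   # P(vL | vC): typicality of local features in this context
p_vl_given_o = np.zeros((4, 4))
p_vl_given_o[2, 3] = 0.9                # P(vL | O): appearance match, peaked at the true target...
p_vl_given_o[0, 0] = 0.9                # ...and at a false alarm in a context-inconsistent spot
p_o_given_vc = np.zeros((4, 4))
p_o_given_vc[1:3, 2:4] = 0.25           # P(O | vC): context-based prior over locations

# Target saliency: bottom-up novelty x top-down appearance x contextual prior.
saliency = (1.0 / p_vl_given_vc) * p_vl_given_o * p_o_given_vc
best = np.unravel_index(np.argmax(saliency), saliency.shape)
```

The false alarm at (0, 0) matches the appearance model just as well as the true target, but the contextual prior zeroes it out, so the maximum lands on the context-consistent location.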
25 Contextual control of focus of attention: P(x, y | o_n, vC). This PDF gives the most likely locations for the presence of object o_n given context information, and it allocates computational resources to relevant scene regions. [sent-35, score-0.962]
26 This gives the likely (prototypical) shapes (points of view, size, aspect ratio, object aspect) of the object o_n in the context vC. Here t = {a, p}, with a = scale and p = aspect ratio. [sent-38, score-1.284]
27 Other parameters describing the appearance of an object in an image can be added. [sent-39, score-0.803]
28 In such a representation [8], v(i, k) is the output magnitude at location i of a complex Gabor filter tuned to the spatial frequency f_k. [sent-53, score-0.1]
29 The variable k indexes filters tuned to different spatial frequencies and orientations. [sent-54, score-0.069]
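The local-feature stage can be sketched with a small complex Gabor filter bank; the frequencies, orientations, kernel size, and random test image below are illustrative assumptions rather than the paper's exact parameters.

```python
import numpy as np

def gabor_kernel(freq, theta, size=15, sigma=3.0):
    """Complex Gabor kernel tuned to spatial frequency `freq` and orientation `theta`."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)          # coordinate along the carrier direction
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    carrier = np.exp(2j * np.pi * freq * xr)            # complex sinusoid
    return envelope * carrier

def local_features(image, freqs=(0.1, 0.2, 0.3), thetas=(0, np.pi/4, np.pi/2, 3*np.pi/4)):
    """v(i, k): output magnitude at each location i for each filter k."""
    from numpy.fft import fft2, ifft2
    feats = []
    for f in freqs:
        for th in thetas:
            k = gabor_kernel(f, th)
            # circular convolution via FFT with a zero-padded kernel; keep magnitude only
            kp = np.zeros_like(image, dtype=complex)
            kp[:k.shape[0], :k.shape[1]] = k
            feats.append(np.abs(ifft2(fft2(image) * fft2(kp))))
    return np.stack(feats, axis=-1)   # shape (h, w, n_filters)

rng = np.random.default_rng(0)
img = rng.random((32, 32))
v = local_features(img)
```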
30 On the other hand, contextual features have to summarize the structure of the whole image. [sent-55, score-0.49]
31 It has been shown that a holistic low-dimensional encoding of the local image features conveys enough information for a semantic categorization of the scene/context [8] and can be used for contextual priming in object recognition tasks [13]. [sent-56, score-1.492]
32 Such a representation can be achieved by decomposing the image features into the basis functions provided by PCA: a_n = Σ_x Σ_k v(x, k) ψ_n(x, k), with v(x, k) ≈ Σ_{n=1}^{N} a_n ψ_n(x, k) (4). We propose to use the decomposition coefficients vC = {a_n}_{n=1..N} as context features. [sent-57, score-0.405]
33 By using only a reduced set of components (N = 60 for the rest of the paper), the coefficients {a_n}_{n=1..N} encode the main spectral characteristics of the scene with a coarse description of their spatial arrangement. [sent-59, score-0.203]
34 In essence, {a_n}_{n=1..N} is a holistic representation, as all the regions of the image contribute to all the coefficients and objects are not encoded individually [8]. [sent-60, score-0.318]
35 In the rest of the paper we show the efficacy of this set of features in context modeling for object detection tasks. [sent-61, score-0.879]
36 3 Contextual object priming The PDF P(o_n | vC) gives the probability of presence of the object class o_n given contextual information. [sent-62, score-1.76]
37 In other words, the PDF P(o_n | vC) evaluates the consistency of the object o_n with the context vC. [sent-63, score-0.672]
38 For instance, a car has a high probability of presence in a highway scene but it is inconsistent with an indoor environment. [sent-64, score-0.293]
39 The goal of P(o_n | vC) is to cut down the number of possible object categories to deal with before expending computational resources on the object recognition process. [sent-65, score-1.257]
40 The model parameters (b_{i,n}, v_{i,n}, V_{i,n}) for the object class o_n are obtained using the EM algorithm [3]. [sent-67, score-0.592]
41 presence/absence of four object categories (1-people, 2-furniture, 3-vehicles and 4-trees). [sent-70, score-0.111]
42 Figure 1 shows some typical results from the priming model on the four superordinate categories of objects defined. [sent-72, score-0.261]
43 Note that the probability function P(o_n | vC) provides information about the probable presence of one object without scanning the picture. [sent-74, score-0.65]
44 If P(o_n | vC) > 1 - th, then we can predict that the target is present. [sent-75, score-0.118]
45 On the other hand, if P(o_n | vC) < th, we can predict that the object is likely to be absent before exploring the image. [sent-76, score-0.571]
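The priming decision can be sketched as Bayes' rule over class-conditional densities of vC plus the two thresholds; single Gaussians stand in for the paper's EM-fitted mixtures, and the 2-D "street"/"indoor" contexts are made-up stand-ins for the 60-D context vector.

```python
import numpy as np

def gauss_logpdf(x, mean, cov):
    """Log-density of a multivariate Gaussian at x."""
    d = x - mean
    cov_inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ cov_inv @ d + logdet + len(x) * np.log(2 * np.pi))

# Two toy context classes in a 2-D feature space standing in for vC.
mu = {"street": np.array([2.0, 0.0]), "indoor": np.array([-2.0, 0.0])}
cov = np.eye(2) * 0.5
prior = {"street": 0.5, "indoor": 0.5}

def p_present(vc, obj_context):
    """P(object | vC) under the toy assumption that the object occurs only in `obj_context`."""
    logliks = {c: gauss_logpdf(vc, m, cov) + np.log(prior[c]) for c, m in mu.items()}
    z = max(logliks.values())                      # subtract max for numerical stability
    post = {c: np.exp(l - z) for c, l in logliks.items()}
    return post[obj_context] / sum(post.values())

th = 0.1
vc_street = np.array([2.1, -0.1])                  # a context vector close to the street class
p_car = p_present(vc_street, "street")
decide_present = p_car > 1 - th                    # commit before any local analysis
decide_absent = p_car < th                         # veto the search entirely
```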
46 The number of scenes in which the system may be able to take high-confidence decisions will depend on different factors, such as the strength of the relationship between the target object and its context, and the ability of vC to efficiently characterize the context. [sent-77, score-0.934]
47 Figure 1 shows some typical results from the priming model for a set of super-ordinate categories of objects. [sent-78, score-0.199]
48 With a decision threshold (th = 0.5), the presence/absence of the objects was correctly predicted by the model on 81% of the scenes of the test set. [sent-80, score-0.131]
49 For each object category, high-confidence predictions (th = 0.1) were made in at least 50% of the tested scene pictures, and the presence/absence of each object class was correctly predicted by the model on 95% of those images. [sent-81, score-0.593] [sent-82, score-0.818]
51 Therefore, for those images, we do not need to use local image analysis to decide about the presence/absence of the object. [sent-83, score-0.214]
52 4 Contextual control of focus of attention One of the strategies that biological visual systems use to deal with the analysis of real-world scenes is to focus attention (and, therefore, computational resources) onto the important image regions while neglecting others. [sent-84, score-0.585]
53 Current computational models of visual attention (saliency maps and target detection) rely exclusively on local information or intrinsic object models [6,7,9,14,16]. [sent-85, score-0.961]
54 The control of the focus of attention by contextual information that we propose here is both task driven (looking for object o_n) and context driven (given global context information vC). [sent-86, score-0.598] [sent-87, score-0.731]
56 However, it does not include any model of the target object at this stage. [sent-88, score-0.689]
57 In our framework, the problem of contextual control of the focus of attention involves the evaluation of the PDF P(x, y | o_n, vC). [sent-89, score-0.598]
58 Figure 3: Estimation results of object scale and pose based on contextual features. [sent-121, score-1.247]
59 The training data is {v_t}_{t=1..Nt} and {x_t}_{t=1..Nt}, where v_t are the contextual features of picture t of the training set and x_t is the location of object o_n in the image. [sent-125, score-1.099]
60 We used 1200 pictures for training and a separate set of 1200 pictures for testing. [sent-127, score-0.176]
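The regression from context features to likely target location can be sketched with a single joint Gaussian over (vC, x), i.e. one component of a mixture model; the 1-D context feature, the linear relation, and all constants below are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
# Toy training set: a 1-D context feature vC linearly related to the target's normalized row x.
n = 500
vc_train = rng.normal(size=n)
x_train = 0.5 + 0.3 * vc_train + 0.05 * rng.normal(size=n)

# Fit a joint Gaussian; its conditional gives P(x | vC).
mu_v, mu_x = vc_train.mean(), x_train.mean()
cov = np.cov(np.stack([vc_train, x_train]))   # rows are variables: [vC; x]

def x_given_vc(vc):
    """Conditional mean and variance of the target location given the context."""
    mean = mu_x + cov[1, 0] / cov[0, 0] * (vc - mu_v)
    var = cov[1, 1] - cov[1, 0] ** 2 / cov[0, 0]
    return mean, var

mean, var = x_given_vc(1.0)
# Focus-of-attention band: the +/- 2 sigma region around the predicted location.
band = (mean - 2 * np.sqrt(var), mean + 2 * np.sqrt(var))
```

Restricting subsequent local analysis to `band` is the sketch-level analogue of selecting the highest-probability region of P(x | vC, o_n).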
61 The success of the PDF in narrowing the region of the focus of attention will depend on the consistency of the relationship between the object and the context. [sent-128, score-0.76]
62 Figure 2 shows several examples of images and the selected regions based on contextual features when looking for cars and faces. [sent-130, score-0.889]
63 From the PDF P(x | vC, o_n) we selected the region with the highest probability (33% of the image size on average). [sent-131, score-0.174]
64 87% of the heads present in the test pictures were inside the selected regions. [sent-132, score-0.159]
65 5 Contextual selection of object appearance models One major problem for computational approaches to object detection is the large variability in object appearance. [sent-133, score-1.989]
66 The classical solution is to explore the space of possible shapes looking for the best match. [sent-134, score-0.104]
67 The main sources of variability in object appearance are size, pose and intra-class shape variability (deformations, style, etc.). [sent-135, score-0.821]
68 We show here that including contextual information can reduce at least the first two of these sources of variability. [sent-137, score-0.142]
69 For instance, the expected size of people in an image differs greatly between an indoor environment and a perspective view of a street. [sent-139, score-0.183]
70 Both environments produce different patterns of contextual features vC [8]. [sent-140, score-0.579]
71 For the second factor, pose, in the case of cars, there is a strong relationship between the possible orientations of the object and the scene configuration. [sent-141, score-0.709]
72 For instance, looking down a highway we expect to see the backs of cars, whereas in a street view, looking towards the buildings, lateral views of cars are more likely. [sent-142, score-0.434]
73 The expected scale and pose of the target object can be estimated by a regression procedure. [sent-143, score-0.859]
74 The training database used for building the regression is a set of 1000 images in which the target object On is present. [sent-144, score-0.744]
75 For each training image the target is marked by a selected window. Figure 4: Selection of prototypical object appearances based on contextual cues. [sent-145, score-1.368]
76 For faces and cars we define the scale u as the height of the selected window and the pose p as the ratio between the horizontal and vertical dimensions of the window (Δy/Δx). [sent-147, score-0.451]
77 On average, this definition of pose provides a good estimation of the orientation for cars but not for heads. [sent-148, score-0.365]
78 Here we used regression with a mixture of Gaussians for estimating the conditional PDFs relating scale, pose and contextual features: P(u | vC, o_n) and P(p | vC, o_n). [sent-149, score-0.57]
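The mixture-of-Gaussians regression amounts to a responsibility-weighted sum of per-component conditional means; in this sketch the two joint components over (vC, u) are hand-set (e.g. an "indoor, small targets" component and a "street, larger targets" one) rather than EM-fitted, and all numbers are illustrative.

```python
import numpy as np

# Two hand-set joint Gaussian components over (vC, u).
comps = [
    {"w": 0.5, "mu": np.array([-1.0, 0.2]), "cov": np.array([[0.3, 0.05], [0.05, 0.02]])},
    {"w": 0.5, "mu": np.array([+1.0, 0.8]), "cov": np.array([[0.3, 0.05], [0.05, 0.02]])},
]

def gauss_pdf(x, mu, var):
    """1-D Gaussian density."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def expected_scale(vc):
    """E[u | vC] under the mixture: responsibilities weight each component's conditional mean."""
    resp = np.array([c["w"] * gauss_pdf(vc, c["mu"][0], c["cov"][0, 0]) for c in comps])
    resp /= resp.sum()
    cond_means = [c["mu"][1] + c["cov"][1, 0] / c["cov"][0, 0] * (vc - c["mu"][0])
                  for c in comps]
    return float(resp @ np.array(cond_means))

u_indoor = expected_scale(-1.0)   # context resembling the first component
u_street = expected_scale(+1.0)   # context resembling the second component
```

P(p | vC, o_n) would be handled identically, with pose replacing scale as the conditioned variable.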
79 The results in Fig. 3 show that context is a strong cue for scale selection in the face detection task but less important for the car detection task. [sent-151, score-0.45]
80 On the other hand, context introduces strong constraints on the prototypical points of view of cars but not at all for heads. [sent-152, score-0.422]
81 Once the two parameters (pose and scale) have been estimated, we can build a prototypical model of the target object. [sent-153, score-0.188]
82 In the case of a view-based object representation, the model of the object will consist of a collection of templates that correspond to the possible aspects of the target. [sent-154, score-1.142]
83 For each image the system produces a collection of views, selected among a database of target examples that have the scale and pose given by eqs. [sent-155, score-0.441]
84 In the statistical framework, object detection requires the evaluation of the function P(vL | O, vC). [sent-159, score-0.708]
85 The bottom example is an error in detection due to incorrect context identification. [sent-161, score-0.217]
86 Figures 5 and 6 show the complete chain of operations and some detection results using a simple correlation technique between the image and the generated object models (100 exemplars) at only one scale. [sent-164, score-0.856]
87 The last image of each row shows the total object likelihood obtained by multiplying the object saliency maps (obtained by the correlation) and the contextual control of the focus of attention. [sent-165, score-2.036]
88 The result shows how the use of context helps reduce false alarms. [sent-166, score-0.08]
89 This results in good detection performance despite the simplicity of the matching procedure used. [sent-167, score-0.16]
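The final combination step, multiplying the correlation-based saliency map by the contextual focus-of-attention map, can be sketched as an elementwise product; the 8x8 maps, target position, and false-alarm position are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
h, w = 8, 8
# Appearance-based saliency from template correlation: true target at (2, 5),
# plus an equally strong false alarm at (6, 1) in a context-inconsistent spot.
corr = rng.random((h, w)) * 0.1
corr[2, 5] = 1.0
corr[6, 1] = 1.0

# Contextual focus of attention: probability mass only in the upper image band.
context = np.zeros((h, w))
context[:4, :] = 1.0

likelihood = corr * context               # total object likelihood
best = np.unravel_index(np.argmax(likelihood), likelihood.shape)
```

The product suppresses the false alarm outside the contextually plausible band, which is the mechanism behind the reduction in false alarms reported above.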
90 6 Conclusion The contextual schema of a scene provides the likelihood of presence, typical locations and appearances of objects within the scene. [sent-168, score-0.786]
91 We have proposed a model for incorporating such contextual cues in the task of object detection. [sent-169, score-0.992]
92 The main aspects of our approach are: 1) Progressive reduction of the window of focus of attention: the system reduces the size of the focus of attention by first integrating contextual information and then local information. [sent-170, score-0.732]
93 2) Inhibition of target-like patterns that are in inconsistent locations. [sent-171, score-0.142]
94 3) Faster detection of correctly scaled targets that have a pose in agreement with the context. [sent-172, score-0.263]
95 4) No requirement of parsing a scene into individual objects. [sent-173, score-0.118]
96 Furthermore, once one object has been detected, it can introduce new contextual information for analyzing the rest of the scene. [sent-174, score-1.014]
97 Scene perception: detecting and judging objects undergoing relational violations. [sent-185, score-0.062]
98 A model of saliency-based visual attention for rapid scene analysis. [sent-230, score-0.246]
99 Modeling the Shape of the Scene: A holistic representation of the spatial envelope. [sent-242, score-0.115]
100 Rapid object detection using a boosted cascade of simple features. [sent-295, score-0.708]
wordName wordTfidf (topN-words)
[('object', 0.571), ('contextual', 0.421), ('cars', 0.228), ('vl', 0.197), ('saliency', 0.193), ('va', 0.16), ('image', 0.148), ('pdf', 0.14), ('detection', 0.137), ('priming', 0.128), ('iva', 0.121), ('target', 0.118), ('scene', 0.118), ('pose', 0.106), ('ive', 0.105), ('attention', 0.095), ('vo', 0.089), ('pictures', 0.088), ('appearance', 0.084), ('looking', 0.081), ('context', 0.08), ('prototypical', 0.07), ('torralba', 0.07), ('features', 0.069), ('local', 0.066), ('objects', 0.062), ('vc', 0.057), ('ion', 0.055), ('intrinsic', 0.055), ('vision', 0.054), ('focus', 0.053), ('holistic', 0.053), ('categories', 0.049), ('scenes', 0.049), ('presence', 0.048), ('ve', 0.045), ('heads', 0.045), ('views', 0.044), ('scale', 0.043), ('appearances', 0.04), ('highway', 0.04), ('ivl', 0.04), ('jauai', 0.04), ('oliva', 0.04), ('aspect', 0.039), ('location', 0.038), ('measurements', 0.037), ('spatial', 0.037), ('recognition', 0.036), ('gk', 0.035), ('indoor', 0.035), ('picard', 0.035), ('sinha', 0.035), ('video', 0.034), ('images', 0.034), ('visual', 0.033), ('locations', 0.033), ('filters', 0.032), ('pdfs', 0.032), ('schema', 0.032), ('ieee', 0.031), ('provides', 0.031), ('variability', 0.03), ('oriented', 0.03), ('regions', 0.03), ('resources', 0.03), ('factors', 0.029), ('control', 0.029), ('ei', 0.029), ('society', 0.029), ('car', 0.028), ('unlikely', 0.027), ('likelihood', 0.027), ('factor', 0.027), ('coefficients', 0.026), ('selected', 0.026), ('exhaustive', 0.026), ('vol', 0.026), ('integration', 0.025), ('selection', 0.025), ('representation', 0.025), ('inconsistent', 0.024), ('window', 0.024), ('maps', 0.023), ('performances', 0.023), ('shapes', 0.023), ('rest', 0.022), ('confidence', 0.022), ('typical', 0.022), ('gaussians', 0.022), ('regression', 0.021), ('class', 0.021), ('modulation', 0.021), ('procedures', 0.021), ('consistency', 0.021), ('vt', 0.021), ('cognitive', 0.02), ('correctly', 0.02), ('integrating', 0.02), ('relationship', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999934 54 nips-2001-Contextual Modulation of Target Saliency
Author: Antonio Torralba
Abstract: The most popular algorithms for object detection require the use of exhaustive spatial and scale search procedures. In such approaches, an object is defined by means of local features. In this paper we show that including contextual information in object detection procedures provides an efficient way of cutting down the need for exhaustive search. We present results with real images showing that the proposed scheme is able to accurately predict likely object classes, locations and sizes. 1
2 0.34792215 65 nips-2001-Effective Size of Receptive Fields of Inferior Temporal Visual Cortex Neurons in Natural Scenes
Author: Thomas P. Trappenberg, Edmund T. Rolls, Simon M. Stringer
Abstract: Inferior temporal cortex (IT) neurons have large receptive fields when a single effective object stimulus is shown against a blank background, but have much smaller receptive fields when the object is placed in a natural scene. Thus, translation invariant object recognition is reduced in natural scenes, and this may help object selection. We describe a model which accounts for this by competition within an attractor in which the neurons are tuned to different objects in the scene, and the fovea has a higher cortical magnification factor than the peripheral visual field. Furthermore, we show that top-down object bias can increase the receptive field size, facilitating object search in complex visual scenes, and providing a model of object-based attention. The model leads to the prediction that introduction of a second object into a scene with blank background will reduce the receptive field size to values that depend on the closeness of the second object to the target stimulus. We suggest that mechanisms of this type enable the output of IT to be primarily about one object, so that the areas that receive from IT can select the object as a potential target for action.
3 0.23285052 46 nips-2001-Categorization by Learning and Combining Object Parts
Author: Bernd Heisele, Thomas Serre, Massimiliano Pontil, Thomas Vetter, Tomaso Poggio
Abstract: We describe an algorithm for automatically learning discriminative components of objects with SVM classifiers. It is based on growing image parts by minimizing theoretical bounds on the error probability of an SVM. Component-based face classifiers are then combined in a second stage to yield a hierarchical SVM classifier. Experimental results in face classification show considerable robustness against rotations in depth and suggest performance at significantly better level than other face detection systems. Novel aspects of our approach are: a) an algorithm to learn component-based classification experts and their combination, b) the use of 3-D morphable models for training, and c) a maximum operation on the output of each component classifier which may be relevant for biological models of visual recognition.
4 0.19170409 80 nips-2001-Generalizable Relational Binding from Coarse-coded Distributed Representations
Author: Randall C. O'Reilly, R. S. Busby
Abstract: We present a model of binding of relationship information in a spatial domain (e.g., square above triangle) that uses low-order coarse-coded conjunctive representations instead of more popular temporal synchrony mechanisms. Supporters of temporal synchrony argue that conjunctive representations lack both efficiency (i.e., combinatorial numbers of units are required) and systematicity (i.e., the resulting representations are overly specific and thus do not support generalization to novel exemplars). To counter these claims, we show that our model: a) uses far fewer hidden units than the number of conjunctions represented, by using coarse-coded, distributed representations where each unit has a broad tuning curve through high-dimensional conjunction space, and b) is capable of considerable generalization to novel inputs.
5 0.10432909 3 nips-2001-ACh, Uncertainty, and Cortical Inference
Author: Peter Dayan, Angela J. Yu
Abstract: Acetylcholine (ACh) has been implicated in a wide variety of tasks involving attentional processes and plasticity. Following extensive animal studies, it has previously been suggested that ACh reports on uncertainty and controls hippocampal, cortical and cortico-amygdalar plasticity. We extend this view and consider its effects on cortical representational inference, arguing that ACh controls the balance between bottom-up inference, influenced by input stimuli, and top-down inference, influenced by contextual information. We illustrate our proposal using a hierarchical hidden Markov model.
6 0.099722892 2 nips-2001-3 state neurons for contextual processing
7 0.099298976 191 nips-2001-Transform-invariant Image Decomposition with Similarity Templates
8 0.096181862 193 nips-2001-Unsupervised Learning of Human Motion Models
9 0.094655529 151 nips-2001-Probabilistic principles in unsupervised learning of visual structure: human data and a model
10 0.093055651 77 nips-2001-Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade
11 0.089159146 19 nips-2001-A Rotation and Translation Invariant Discrete Saliency Network
12 0.080645099 89 nips-2001-Grouping with Bias
13 0.078290239 10 nips-2001-A Hierarchical Model of Complex Cells in Visual Cortex for the Binocular Perception of Motion-in-Depth
14 0.0736292 74 nips-2001-Face Recognition Using Kernel Methods
15 0.06832286 7 nips-2001-A Dynamic HMM for On-line Segmentation of Sequential Data
16 0.067618966 124 nips-2001-Modeling the Modulatory Effect of Attention on Human Spatial Vision
17 0.066411905 182 nips-2001-The Fidelity of Local Ordinal Encoding
18 0.059986588 189 nips-2001-The g Factor: Relating Distributions on Features to Distributions on Images
19 0.05886817 111 nips-2001-Learning Lateral Interactions for Feature Binding and Sensory Segmentation
20 0.0583391 153 nips-2001-Product Analysis: Learning to Model Observations as Products of Hidden Variables
topicId topicWeight
[(0, -0.165), (1, -0.09), (2, -0.103), (3, 0.024), (4, -0.142), (5, 0.052), (6, -0.39), (7, -0.002), (8, 0.059), (9, 0.009), (10, 0.009), (11, 0.155), (12, 0.246), (13, 0.041), (14, -0.083), (15, 0.034), (16, 0.239), (17, 0.012), (18, -0.058), (19, -0.042), (20, 0.066), (21, -0.05), (22, -0.08), (23, -0.014), (24, 0.094), (25, -0.161), (26, -0.145), (27, -0.031), (28, -0.06), (29, 0.086), (30, -0.044), (31, -0.062), (32, 0.186), (33, -0.08), (34, -0.013), (35, -0.099), (36, -0.061), (37, -0.04), (38, 0.04), (39, -0.011), (40, -0.047), (41, -0.002), (42, -0.104), (43, 0.006), (44, 0.019), (45, 0.006), (46, -0.039), (47, -0.043), (48, -0.037), (49, -0.085)]
simIndex simValue paperId paperTitle
same-paper 1 0.98956031 54 nips-2001-Contextual Modulation of Target Saliency
Author: Antonio Torralba
Abstract: The most popular algorithms for object detection require the use of exhaustive spatial and scale search procedures. In such approaches, an object is defined by means of local features. In this paper we show that including contextual information in object detection procedures provides an efficient way of cutting down the need for exhaustive search. We present results with real images showing that the proposed scheme is able to accurately predict likely object classes, locations and sizes. 1
2 0.8416099 65 nips-2001-Effective Size of Receptive Fields of Inferior Temporal Visual Cortex Neurons in Natural Scenes
Author: Thomas P. Trappenberg, Edmund T. Rolls, Simon M. Stringer
Abstract: Inferior temporal cortex (IT) neurons have large receptive fields when a single effective object stimulus is shown against a blank background, but have much smaller receptive fields when the object is placed in a natural scene. Thus, translation invariant object recognition is reduced in natural scenes, and this may help object selection. We describe a model which accounts for this by competition within an attractor in which the neurons are tuned to different objects in the scene, and the fovea has a higher cortical magnification factor than the peripheral visual field. Furthermore, we show that top-down object bias can increase the receptive field size, facilitating object search in complex visual scenes, and providing a model of object-based attention. The model leads to the prediction that introduction of a second object into a scene with blank background will reduce the receptive field size to values that depend on the closeness of the second object to the target stimulus. We suggest that mechanisms of this type enable the output of IT to be primarily about one object, so that the areas that receive from IT can select the object as a potential target for action.
3 0.55850905 80 nips-2001-Generalizable Relational Binding from Coarse-coded Distributed Representations
Author: Randall C. O'Reilly, R. S. Busby
Abstract: We present a model of binding of relationship information in a spatial domain (e.g., square above triangle) that uses low-order coarse-coded conjunctive representations instead of more popular temporal synchrony mechanisms. Supporters of temporal synchrony argue that conjunctive representations lack both efficiency (i.e., combinatorial numbers of units are required) and systematicity (i.e., the resulting representations are overly specific and thus do not support generalization to novel exemplars). To counter these claims, we show that our model: a) uses far fewer hidden units than the number of conjunctions represented, by using coarse-coded, distributed representations where each unit has a broad tuning curve through high-dimensional conjunction space, and b) is capable of considerable generalization to novel inputs.
4 0.46884051 46 nips-2001-Categorization by Learning and Combining Object Parts
Author: Bernd Heisele, Thomas Serre, Massimiliano Pontil, Thomas Vetter, Tomaso Poggio
Abstract: We describe an algorithm for automatically learning discriminative components of objects with SVM classifiers. It is based on growing image parts by minimizing theoretical bounds on the error probability of an SVM. Component-based face classifiers are then combined in a second stage to yield a hierarchical SVM classifier. Experimental results in face classification show considerable robustness against rotations in depth and suggest performance at significantly better level than other face detection systems. Novel aspects of our approach are: a) an algorithm to learn component-based classification experts and their combination, b) the use of 3-D morphable models for training, and c) a maximum operation on the output of each component classifier which may be relevant for biological models of visual recognition.
5 0.45164183 151 nips-2001-Probabilistic principles in unsupervised learning of visual structure: human data and a model
Author: Shimon Edelman, Benjamin P. Hiles, Hwajin Yang, Nathan Intrator
Abstract: To find out how the representations of structured visual objects depend on the co-occurrence statistics of their constituents, we exposed subjects to a set of composite images with tight control exerted over (1) the conditional probabilities of the constituent fragments, and (2) the value of Barlow’s criterion of “suspicious coincidence” (the ratio of joint probability to the product of marginals). We then compared the part verification response times for various probe/target combinations before and after the exposure. For composite probes, the speedup was much larger for targets that contained pairs of fragments perfectly predictive of each other, compared to those that did not. This effect was modulated by the significance of their co-occurrence as estimated by Barlow’s criterion. For lone-fragment probes, the speedup in all conditions was generally lower than for composites. These results shed light on the brain’s strategies for unsupervised acquisition of structural information in vision. 1 Motivation How does the human visual system decide for which objects it should maintain distinct and persistent internal representations of the kind typically postulated by theories of object recognition? Consider, for example, the image shown in Figure 1, left. This image can be represented as a monolithic hieroglyph, a pair of Chinese characters (which we shall refer to as a and b), a set of strokes, or, trivially, as a collection of pixels. Note that the second option is only available to a system previously exposed to various combinations of Chinese characters. Indeed, a principled decision whether to represent this image as ab, or otherwise, can only be made on the basis of prior exposure to related images.
According to Barlow’s [1] insight, one useful principle is tallying suspicious coincidences: two candidate fragments a and b should be combined into a composite object if the probability of their joint appearance p(a,b) is much higher than p(a)p(b), which is the probability expected in the case of their statistical independence. This criterion may be compared to the Minimum Description Length (MDL) principle, which has been previously discussed in the context of object representation [2, 3]. In a simplified form [4], MDL calls for representing ab explicitly as a whole if p(a,b) >> p(a)p(b), just as the principle of suspicious coincidences does. While the Barlow/MDL criterion certainly indicates a suspicious coincidence, there are additional probabilistic considerations that may be used. One example is the possible perfect predictability of a from b and vice versa, as measured by p(a|b) and p(b|a). If p(a|b) = p(b|a) = 1, then a and b are perfectly predictive of each other and should really be coded by a single symbol, whereas the MDL criterion may suggest merely that some association between the representation of a and that of b be established. In comparison, if a and b are not perfectly predictive of each other, there is a case to be made in favor of coding them separately to allow for a maximally expressive representation, whereas MDL may actually suggest a high degree of association (if p(a,b) >> p(a)p(b)). In this study we investigated whether the human visual system uses a criterion based on conditional probabilities alongside MDL while learning (in an unsupervised manner) to represent composite objects.
6 0.43287864 182 nips-2001-The Fidelity of Local Ordinal Encoding
7 0.37733588 191 nips-2001-Transform-invariant Image Decomposition with Similarity Templates
8 0.34202394 3 nips-2001-ACh, Uncertainty, and Cortical Inference
9 0.30393955 193 nips-2001-Unsupervised Learning of Human Motion Models
10 0.30102611 158 nips-2001-Receptive field structure of flow detectors for heading perception
11 0.29929256 124 nips-2001-Modeling the Modulatory Effect of Attention on Human Spatial Vision
12 0.29602957 73 nips-2001-Eye movements and the maturation of cortical orientation selectivity
13 0.28856647 89 nips-2001-Grouping with Bias
14 0.26956066 74 nips-2001-Face Recognition Using Kernel Methods
15 0.26508531 19 nips-2001-A Rotation and Translation Invariant Discrete Saliency Network
16 0.26082844 2 nips-2001-3 state neurons for contextual processing
17 0.24177362 53 nips-2001-Constructing Distributed Representations Using Additive Clustering
18 0.24045803 10 nips-2001-A Hierarchical Model of Complex Cells in Visual Cortex for the Binocular Perception of Motion-in-Depth
19 0.23791002 11 nips-2001-A Maximum-Likelihood Approach to Modeling Multisensory Enhancement
20 0.23074606 77 nips-2001-Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade
topicId topicWeight
[(14, 0.024), (17, 0.034), (19, 0.048), (27, 0.1), (30, 0.076), (36, 0.01), (38, 0.04), (58, 0.298), (59, 0.041), (72, 0.039), (79, 0.039), (83, 0.013), (91, 0.145)]
simIndex simValue paperId paperTitle
same-paper 1 0.81522369 54 nips-2001-Contextual Modulation of Target Saliency
Author: Antonio Torralba
Abstract: The most popular algorithms for object detection require the use of exhaustive spatial and scale search procedures. In such approaches, an object is defined by means of local features. In this paper we show that including contextual information in object detection procedures provides an efficient way of cutting down the need for exhaustive search. We present results with real images showing that the proposed scheme is able to accurately predict likely object classes, locations and sizes.
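The core idea, using a context prior to avoid scoring every location, can be caricatured in a few lines. This is an assumption-laden sketch, not the paper's model: the grid, the `prior`, and the toy `detector` are invented for illustration.

```python
# Context-primed search sketch: rank image grid cells by a context-derived
# prior and run the (expensive) local detector only on the top fraction,
# instead of scanning every position exhaustively.

def primed_search(prior, detector, keep=0.2):
    """Run `detector` only on the fraction `keep` of cells ranked highest
    by the context `prior` (a dict mapping cell -> probability)."""
    ranked = sorted(prior, key=prior.get, reverse=True)
    n_keep = max(1, int(len(ranked) * keep))
    return {cell: detector(cell) for cell in ranked[:n_keep]}

# Hypothetical prior over a 4x4 grid: context makes the top rows likely.
prior = {(r, c): 1.0 / (1 + r) for r in range(4) for c in range(4)}
hits = primed_search(prior, detector=lambda cell: cell[0] == 0)
print(len(hits))  # only 3 of the 16 cells were ever scored
```

The same trick extends to scale: a context prior over (location, scale) triples prunes the search pyramid rather than just the image plane.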
2 0.74370182 160 nips-2001-Reinforcement Learning and Time Perception -- a Model of Animal Experiments
Author: Jonathan L. Shapiro, J. Wearden
Abstract: Animal data on delayed-reward conditioning experiments shows a striking property - the data for different time intervals collapses into a single curve when the data is scaled by the time interval. This is called the scalar property of interval timing. Here a simple model of a neural clock is presented and shown to give rise to the scalar property. The model is an accumulator consisting of noisy, linear spiking neurons. It is analytically tractable and contains only three parameters. When coupled with reinforcement learning it simulates peak procedure experiments, producing both the scalar property and the pattern of single trial covariances. 1
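The scalar property can be demonstrated with a much cruder model than the paper's three-parameter spiking accumulator. The sketch below is a simplified caricature under one assumption: the clock rate varies from trial to trial, so the accumulated count's standard deviation grows in proportion to the interval and the coefficient of variation (CV) stays constant.

```python
# Caricature of the scalar property of interval timing: with multiplicative
# (rate) noise, SD/mean of the accumulated count is the same for all
# intervals, so timing data collapse onto a single curve when rescaled.

import random

def simulate_cv(interval, mu=50.0, sigma=5.0, trials=20000, seed=0):
    """Estimate the CV of an accumulator whose rate is drawn per trial."""
    rng = random.Random(seed)
    counts = [rng.gauss(mu, sigma) * interval for _ in range(trials)]
    mean = sum(counts) / trials
    sd = (sum((c - mean) ** 2 for c in counts) / trials) ** 0.5
    return sd / mean  # coefficient of variation

# The CV is the same for short and long intervals: the scalar property.
print(round(simulate_cv(1.0), 3), round(simulate_cv(4.0), 3))
```

A Poisson accumulator, by contrast, would give SD proportional to the square root of the interval and hence a CV that shrinks for longer intervals, violating the scalar property in the animal data.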
3 0.72563565 22 nips-2001-A kernel method for multi-labelled classification
Author: André Elisseeff, Jason Weston
Abstract: This article presents a Support Vector Machine (SVM)-like learning system to handle multi-label problems. Such problems are usually decomposed into many two-class problems, but the expressive power of such a system can be weak [5, 7]. We explore a new direct approach. It is based on a large margin ranking system that shares a lot of common properties with SVMs. We tested it on a Yeast gene functional classification problem with positive results.
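The quantity such a margin-based ranker drives down is the pairwise ranking loss: the fraction of (relevant, irrelevant) label pairs whose scores come out in the wrong order. A small sketch, with hypothetical gene-function labels and scores:

```python
# Pairwise ranking loss for multi-label classification: for one example,
# the fraction of (relevant, irrelevant) label pairs mis-ordered by the
# model's real-valued scores. Label names and scores are invented.

def ranking_loss(scores, relevant):
    """scores: label -> real-valued score; relevant: set of true labels."""
    irrelevant = set(scores) - relevant
    pairs = [(r, i) for r in relevant for i in irrelevant]
    if not pairs:
        return 0.0
    bad = sum(1 for r, i in pairs if scores[r] <= scores[i])
    return bad / len(pairs)

# One relevant label ("transport") is outranked by an irrelevant one.
scores = {"metabolism": 0.9, "transport": 0.2, "transcription": 0.5}
print(ranking_loss(scores, relevant={"metabolism", "transport"}))  # 0.5
```

A perfect ranker scores every relevant label above every irrelevant one, giving a loss of 0; the large-margin formulation additionally demands that the gap exceed a margin.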
4 0.56083322 13 nips-2001-A Natural Policy Gradient
Author: Sham M. Kakade
Abstract: We provide a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space. Although gradient methods cannot make large changes in the values of the parameters, we show that the natural gradient is moving toward choosing a greedy optimal action rather than just a better action. These greedy optimal actions are those that would be chosen under one improvement step of policy iteration with approximate, compatible value functions, as defined by Sutton et al. [9]. We then show drastic performance improvements in simple MDPs and in the more challenging MDP of Tetris. 1
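Numerically, the natural gradient replaces the raw gradient g with F⁻¹g, where F is the Fisher information matrix of the parameterized policy. A minimal sketch with made-up values for F and g:

```python
# Natural gradient sketch: follow F^{-1} g instead of g, where F is the
# Fisher information matrix. The 2x2 values below are hypothetical.

def solve_2x2(F, g):
    """Solve F x = g for a 2x2 matrix F by Cramer's rule."""
    (a, b), (c, d) = F
    det = a * d - b * c
    return ((d * g[0] - b * g[1]) / det, (a * g[1] - c * g[0]) / det)

F = ((4.0, 0.0), (0.0, 1.0))   # Fisher metric: first coordinate is "stiffer"
g = (4.0, 1.0)                 # vanilla gradient
nat = solve_2x2(F, g)          # natural gradient rescales per coordinate
print(nat)  # (1.0, 1.0): equal steps w.r.t. the distribution's own geometry
```

The point of the rescaling is invariance: the step no longer depends on how the policy happens to be parameterized, only on how the distribution itself changes.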
5 0.55898964 89 nips-2001-Grouping with Bias
Author: Stella X. Yu, Jianbo Shi
Abstract: With the optimization of pattern discrimination as a goal, graph partitioning approaches often lack the capability to integrate prior knowledge to guide grouping. In this paper, we consider priors from unitary generative models, partially labeled data and spatial attention. These priors are modelled as constraints in the solution space. By imposing a uniformity condition on the constraints, we restrict the feasible space to one of smooth solutions. A subspace projection method is developed to solve this constrained eigenproblem. We demonstrate that simple priors can greatly improve image segmentation results.
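One way to realize such a subspace projection is to encode each prior as a linear constraint Uᵀx = 0 and project the affinity problem onto the null space of U before the eigensolve. The sketch below is hedged: the toy affinity matrix and single constraint are invented, and this is an illustration of the general projection idea rather than the paper's exact algorithm.

```python
# Grouping-with-bias sketch: impose priors of the form U.T @ x = 0
# (e.g. "items 0 and 1 must share a label") by projecting the affinity
# matrix onto the constraint null space before the eigensolve.

import numpy as np

def constrained_leading_eigvec(W, U):
    """Leading eigenvector of affinity W restricted to {x : U.T @ x = 0}."""
    # Orthogonal projector onto the null space of U.T.
    P = np.eye(W.shape[0]) - U @ np.linalg.pinv(U.T @ U) @ U.T
    vals, vecs = np.linalg.eigh(P @ W @ P)   # symmetric projected problem
    return vecs[:, np.argmax(vals)]

# Toy affinity over 4 items; constrain items 0 and 1 to take equal values.
W = np.array([[2., 1., 0., 0.],
              [1., 2., 0., 0.],
              [0., 0., 2., 1.],
              [0., 0., 1., 2.]])
U = np.array([[1.], [-1.], [0.], [0.]])      # encodes x0 - x1 = 0
v = constrained_leading_eigvec(W, U)
print(np.allclose(v[0], v[1]))  # True: the prior is satisfied by construction
```

Because every candidate vector is projected into the feasible subspace, the returned eigenvector satisfies the priors exactly rather than approximately.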
6 0.55794096 52 nips-2001-Computing Time Lower Bounds for Recurrent Sigmoidal Neural Networks
7 0.55686975 161 nips-2001-Reinforcement Learning with Long Short-Term Memory
8 0.55539799 131 nips-2001-Neural Implementation of Bayesian Inference in Population Codes
9 0.55523598 7 nips-2001-A Dynamic HMM for On-line Segmentation of Sequential Data
10 0.55471456 162 nips-2001-Relative Density Nets: A New Way to Combine Backpropagation with HMM's
12 0.55249226 27 nips-2001-Activity Driven Adaptive Stochastic Resonance
13 0.55248767 169 nips-2001-Small-World Phenomena and the Dynamics of Information
14 0.55208135 57 nips-2001-Correlation Codes in Neuronal Populations
15 0.55058104 100 nips-2001-Iterative Double Clustering for Unsupervised and Semi-Supervised Learning
16 0.55012941 46 nips-2001-Categorization by Learning and Combining Object Parts
17 0.5489583 95 nips-2001-Infinite Mixtures of Gaussian Process Experts
18 0.54863811 182 nips-2001-The Fidelity of Local Ordinal Encoding
19 0.54816443 150 nips-2001-Probabilistic Inference of Hand Motion from Neural Activity in Motor Cortex
20 0.54808617 132 nips-2001-Novel iteration schemes for the Cluster Variation Method