nips nips2008 nips2008-118 knowledge-graph by maker-knowledge-mining

118 nips-2008-Learning Transformational Invariants from Natural Movies


Source: pdf

Author: Charles Cadieu, Bruno A. Olshausen

Abstract: We describe a hierarchical, probabilistic model that learns to extract complex motion from movies of the natural environment. The model consists of two hidden layers: the first layer produces a sparse representation of the image that is expressed in terms of local amplitude and phase variables. The second layer learns the higher-order structure among the time-varying phase variables. After training on natural movies, the top layer units discover the structure of phase-shifts within the first layer. We show that the top layer units encode transformational invariants: they are selective for the speed and direction of a moving pattern, but are invariant to its spatial structure (orientation/spatial-frequency). The diversity of units in both the intermediate and top layers of the model provides a set of testable predictions for representations that might be found in V1 and MT. In addition, the model demonstrates how feedback from higher levels can influence representations at lower levels as a by-product of inference in a graphical model. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 We describe a hierarchical, probabilistic model that learns to extract complex motion from movies of the natural environment. [sent-4, score-0.671]

2 The model consists of two hidden layers: the first layer produces a sparse representation of the image that is expressed in terms of local amplitude and phase variables. [sent-5, score-1.219]

3 The second layer learns the higher-order structure among the time-varying phase variables. [sent-6, score-0.698]

4 After training on natural movies, the top layer units discover the structure of phase-shifts within the first layer. [sent-7, score-0.567]

5 We show that the top layer units encode transformational invariants: they are selective for the speed and direction of a moving pattern, but are invariant to its spatial structure (orientation/spatial-frequency). [sent-8, score-1.005]

6 The diversity of units in both the intermediate and top layers of the model provides a set of testable predictions for representations that might be found in V1 and MT. [sent-9, score-0.318]

7 A key attribute of visual perception is the ability to extract invariances from visual input. [sent-11, score-0.312]

8 In the realm of object recognition, the goal of invariant representation is quite clear: a successful object recognition system must be invariant to image variations resulting from different views of the same object. [sent-12, score-0.436]

9 While spatial invariants are essential for forming a useful representation of the natural environment, there is another, equally important form of visual invariance, namely transformational invariance. [sent-13, score-0.688]

10 A transformational invariant refers to the dynamic visual structure that remains the same when the spatial structure changes. [sent-14, score-0.61]

11 For example, the property that a soccer ball moving through the air shares with a football moving through the air is a transformational invariant; it is specific to how the ball moves but invariant to the shape or form of the object. [sent-15, score-0.463]

12 Here we seek to learn such invariants from the statistics of natural movies. [sent-16, score-0.244]

13 There have been numerous efforts to learn spatial invariants [1, 2, 3] from the statistics of natural images, especially with the goal of producing representations useful for object recognition [4, 5, 6]. [sent-17, score-0.471]

14 However, there have been few attempts to learn transformational invariants from natural sensory data. [sent-18, score-0.468]

15 Furthermore, it is unclear to what extent these models have captured the diversity of transformations in natural visual scenes or to what level of abstraction their representations produce transformational invariants. [sent-20, score-0.544]

16 However, this type of model does not capture the abstract property of motion because each unit is bound to a specific orientation, spatial-frequency and location within the image—i. [sent-22, score-0.318]

17 Here we describe a hierarchical probabilistic generative model that learns transformational invariants from unsupervised exposure to natural movies. [sent-25, score-0.625]

18 The latter approach characterizes most models of form and motion processing in the visual cortical hierarchy [6, 12], but suffers from the fact that information about these properties is not bound together—i. [sent-27, score-0.397]

19 That is, it is not possible to reconstruct an image sequence from a representation in which form and motion have been extracted by separate and independent mechanisms. [sent-29, score-0.439]

20 In the model we propose here, form and motion are factorized, meaning that extracting one property depends upon the other. [sent-31, score-0.318]

21 We show that when such a model is adapted to natural movies, the top layer units learn to extract transformational invariants. [sent-33, score-0.815]

22 The diversity of units in both the intermediate layer and top layer provides a set of testable predictions for representations that might be found in V1 and MT. [sent-34, score-0.949]

23 The model consists of an input layer and two hidden layers as shown in Figure 1. [sent-37, score-0.519]

24 The input layer represents the time-varying image pixel intensities. [sent-38, score-0.518]

25 The first hidden layer is a sparse coding model utilizing complex basis functions, and shares many properties with subspace-ICA [13] and the standard energy model of complex cells [14]. [sent-39, score-0.946]

26 The second hidden layer models the dynamics of the complex basis function phase variables. [sent-40, score-0.947]

27 The sparse coding model imposes a kurtotic, independent prior over the coefficients, and when adapted to natural image patches the Ai (x) converge to a set of localized, oriented, multiscale functions similar to a Gabor wavelet decomposition of images. [sent-43, score-0.546]

28 We propose here a generalization of the sparse coding model to complex variables that is primarily motivated from two observations of natural image statistics. [sent-44, score-0.451]
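
A minimal NumPy sketch of this idea (the function name and the exact image model below are assumptions for illustration, not the authors' code): each coefficient is a complex number $z_i = a_i e^{j\phi_i}$, and a patch is reconstructed from the complex basis functions weighted by these coefficients.

```python
import numpy as np

def reconstruct_patch(A, a, phi):
    """Reconstruct a patch as Re{ sum_i a_i e^{j phi_i} A_i(x) }.

    A   : (n_basis, n_pixels) complex basis functions A_i(x)
    a   : (n_basis,) non-negative amplitudes a_i
    phi : (n_basis,) phases phi_i in radians
    """
    z = a * np.exp(1j * phi)      # complex coefficients z_i
    return np.real(z @ A)         # real-valued patch, shape (n_pixels,)

# toy usage with random values (illustration only)
rng = np.random.default_rng(0)
A = rng.standard_normal((400, 400)) + 1j * rng.standard_normal((400, 400))
patch = reconstruct_patch(A, np.abs(rng.standard_normal(400)),
                          rng.uniform(-np.pi, np.pi, 400)).reshape(20, 20)
```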

29 These are most clearly seen as circularly symmetric, yet kurtotic distributions among pairs of coefficients corresponding to neighboring basis functions, as first described by Zetzsche [17]. [sent-46, score-0.312]

30 As pointed out by Hyvarinen [3], the temporal evolution of a coefficient in response to a movie, $u_i(t)$, can be well described as the product of a smooth amplitude envelope and a quickly changing variable. [sent-51, score-0.304]

31 A similar result from Kording [1] indicates that temporal continuity in amplitude provides a strong cue for learning local invariances. [sent-52, score-0.301]

32 Note that the basis functions are only functions of space. [sent-58, score-0.232]

33 Therefore, the temporal dynamics within image sequences will be expressed in the temporal dynamics of the amplitude and phase. [sent-59, score-0.572]

34 The second term in the exponential imposes temporal stability on the time rate of change of the amplitudes and is given by $S_{la}(a_i(t), a_i(t-1)) = (a_i(t) - a_i(t-1))^2$. [sent-63, score-0.82]
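
As a small illustration (hypothetical helper names), the temporal-stability cost above and a sparse amplitude penalty could be coded as follows; the exact form of the sparse cost $S_p$ is not given in this summary, so the log-penalty below is only a common placeholder.

```python
import numpy as np

def S_la(a_t, a_prev):
    """Temporal stability cost on amplitudes: sum_i (a_i(t) - a_i(t-1))^2."""
    return np.sum((a_t - a_prev) ** 2)

def S_p(a_t, sigma=0.1):
    """Sparse cost on amplitudes (placeholder form; the paper's exact S_p may differ)."""
    return np.sum(np.log1p((a_t / sigma) ** 2))
```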

35 2 Phase Transformations Given the decomposition into amplitude and phase variables, we now have a non-linear representation of image content that enables us to learn its structure in another linear generative model. [sent-67, score-0.709]

36 In particular, the dynamics of objects moving in continuous trajectories through the world over short epochs will be encoded in the population activity of the phase variables φi . [sent-68, score-0.421]

37 Furthermore, because we have encoded these trajectories with an angular variable, many transformations in the image domain that would otherwise be nonlinear in the coefficients ui will now be linearized. [sent-69, score-0.342]

38 This linear relationship allows us to model the time-rate of change of the phase variables with a simple linear generative model. [sent-70, score-0.329]

39 We thus model the first-order time derivative of the phase variables as follows: $\dot{\phi}_i(t) = \sum_k D_{ik} w_k(t) + \nu_i(t)$ (6), where $\dot{\phi}_i(t) = \phi_i(t) - \phi_i(t-1)$ and $D$ is the basis function matrix specifying how the high-level variables $w_k$ influence the phase shifts $\dot{\phi}_i$. [sent-71, score-0.925]
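
A sketch of eq. (6) in NumPy (variable names are assumptions): phase shifts are computed as wrapped differences and predicted linearly from the top-layer coefficients through the basis matrix D.

```python
import numpy as np

def phase_shift(phi_t, phi_prev):
    """phi_dot_i(t) = phi_i(t) - phi_i(t-1), wrapped to (-pi, pi]."""
    return np.angle(np.exp(1j * (phi_t - phi_prev)))

def predicted_phase_shift(D, w_t):
    """Noise-free part of eq. (6): [D w(t)]_i, with D of shape (n_phase, n_top)."""
    return D @ w_t
```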

40 The additive noise term, νi , represents uncertainty or noise in the estimate of the phase time-rate of change. [sent-72, score-0.259]

41 As before, we impose a sparse, independent distribution on the coefficients $w_k$, in this case with a sparse cost function given as $S_w(w_k(t)) = \beta \log\big(1 + (w_k(t)/\sigma)^2\big)$ (7). The uncertainty over the phase shifts is given by a von Mises distribution: $p(\nu_i) \propto \exp(\kappa \cos(\nu_i))$. [sent-73, score-0.554]
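
A direct transcription of these two pieces as a sketch (parameter values are arbitrary):

```python
import numpy as np

def S_w(w, beta=1.0, sigma=1.0):
    """Sparse cost on top-layer coefficients, eq. (7): beta * log(1 + (w/sigma)^2)."""
    return beta * np.log1p((w / sigma) ** 2)

def von_mises_log_density(nu, kappa=2.0):
    """Unnormalized log-density of the von Mises phase noise: kappa * cos(nu)."""
    return kappa * np.cos(nu)
```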

42 Thus, the log-posterior over the second layer units is given by $E_2 = \sum_t \sum_{i \in \{a_i(t) > 0\}} \kappa \cos\big(\dot{\phi}_i(t) - [Dw(t)]_i\big) - \sum_t \sum_k S_w(w_k(t))$ (8). Figure 1: Graph of the hierarchical model showing the relationship among hidden variables. [sent-74, score-0.606]
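
A sketch of how $E_2$ could be evaluated under the sign convention written in eq. (8) above (a hypothetical helper, not the authors' code): the von Mises term is summed only over units with nonzero amplitude, and the sparse cost on w is subtracted.

```python
import numpy as np

def E2(phi_dot, a, D, w, kappa=2.0, beta=1.0, sigma=1.0):
    """Second-layer log-posterior, eq. (8), for a whole sequence.

    phi_dot : (T, n_phase) phase shifts
    a       : (T, n_phase) amplitudes (terms with a_i(t) == 0 are excluded)
    D       : (n_phase, n_top) basis matrix, w : (T, n_top) top-layer coefficients
    """
    total = 0.0
    for t in range(phi_dot.shape[0]):
        pred = D @ w[t]                    # [D w(t)]_i
        active = a[t] > 0                  # angle undefined at zero amplitude
        total += kappa * np.sum(np.cos(phi_dot[t, active] - pred[active]))
        total -= np.sum(beta * np.log1p((w[t] / sigma) ** 2))  # S_w(w_k(t))
    return total
```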

43 Because the angle of a variable with 0 amplitude is undefined, we exclude angles where the corresponding amplitude is 0 from our cost function. [sent-75, score-0.436]

44 Note that in the first layer we did not introduce any prior on the phase variables. [sent-76, score-0.625]

45 With our second hidden layer, $E_2$ can be viewed as a log-prior on the time rate of change of the phase variables, $\dot{\phi}_i(t)$. [sent-77, score-0.311]

46 Activating the $w$ variables moves the prior away from $\dot{\phi}_i(t) = 0$, encouraging certain patterns of phase shifting that will in turn produce patterns of motion in the image domain. [sent-79, score-0.788]

47 The resulting dynamics for the amplitudes and phases in the first layer are given by $\Delta a_i(t) \propto \Re\{b_i(t)\} - S_p(a_i(t)) - S_{la}(a_i(t), a_i(t-1))$ (9) and $\Delta\phi_i(t) \propto \Im\{b_i(t)\}\,a_i(t) - \kappa \sin\big(\dot{\phi}_i(t) - [Dw(t)]_i\big) + \kappa \sin\big(\dot{\phi}_i(t+1) - [Dw(t+1)]_i\big)$ (10), with $b_i(t) = \frac{1}{\sigma_N^2}\, e^{-j\phi_i(t)}$ times the projection of the first-layer reconstruction residual onto $A_i(x)$. [sent-87, score-1.214]
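
The sketch below (hypothetical names; the $S_p$/$S_{la}$ gradient terms are omitted, and $b_i(t)$ is computed under the residual-projection reading given above) shows one gradient step on the amplitudes and phases in the spirit of eqs. (9)-(10).

```python
import numpy as np

def first_layer_step(I_t, A, a, phi, phi_dot, phi_dot_next, Dw_t, Dw_next,
                     kappa=2.0, sigma_n=0.1, eta=0.01):
    """One ascent step on a_i(t) and phi_i(t), eqs. (9)-(10), likelihood terms only.

    I_t : (n_pixels,) current frame/patch;  A : (n_basis, n_pixels) complex basis.
    b_i(t) is taken as (1/sigma_n^2) e^{-j phi_i} <residual, conj(A_i)> -- one
    plausible reading of the extracted equation; the paper's exact form may differ.
    """
    z = a * np.exp(1j * phi)
    residual = I_t - np.real(z @ A)                       # reconstruction error over pixels
    b = np.exp(-1j * phi) * (A.conj() @ residual) / sigma_n ** 2
    da = np.real(b)                                       # eq. (9), without S_p, S_la terms
    dphi = (np.imag(b) * a
            - kappa * np.sin(phi_dot - Dw_t)
            + kappa * np.sin(phi_dot_next - Dw_next))     # eq. (10)
    return a + eta * da, phi + eta * dphi
```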

48 The learning rule for the first layer basis functions is given by the gradient of E1 with respect to Ai (x), using the values of the complex coefficients inferred in eqs. [sent-92, score-0.632]

49 1 Simulation procedures The model was trained on natural image sequences obtained from Hans van Hateren’s repository at http://hlab. [sent-95, score-0.241]

50 They contain a variety of motions due to the movements of animals in the scene, camera motion, tracking (which introduces background motion), and motion borders due to occlusion. [sent-102, score-0.345]

51 We trained the first layer of the model on 20x20 pixel image patches, using 400 complex basis functions $A_i$ initialized to random values. [sent-103, score-1.233]

52 During this initial phase of learning only the terms in E1 are used to infer the ai and φi . [sent-104, score-0.591]

53 Once the first layer reaches convergence, we begin training the second layer, using 100 bases, Di , initialized to random values. [sent-105, score-0.366]
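
For concreteness, the dimensions described in this and the previous sentences can be laid out as follows (a sketch; the initialization scale is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
patch_size = 20                        # 20x20 pixel image patches
n_pixels = patch_size * patch_size
n_basis = 400                          # complex basis functions A_i (first hidden layer)
n_top = 100                            # second-layer bases D_i

A = 0.01 * (rng.standard_normal((n_basis, n_pixels))
            + 1j * rng.standard_normal((n_basis, n_pixels)))   # random initial values
D = 0.01 * rng.standard_normal((n_basis, n_top))               # random initial values
```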

54 The second-layer bases are initially trained on the MAP estimates of the first-layer $\dot{\phi}_i$ inferred using $E_1$ only. [sent-106, score-0.732]

55 After the second layer begins to converge we infer coefficients in both the first layer and the second layer simultaneously using all terms in E1 + E2 (we observed that this improved convergence in the second layer). [sent-107, score-1.098]

56 The bootstrapping of the second layer was used to speed convergence and we did not observe much change in the first layer basis functions after the initial convergence. [sent-109, score-0.917]

57 2 Learned complex basis functions After learning, the first layer complex basis functions converge to a set of localized, oriented, and bandpass functions with real and imaginary parts roughly in quadrature. [sent-113, score-1.026]

58 Figure 2(a) shows the real part, imaginary part, amplitude, and angle of two representative basis functions as a function of space. [sent-116, score-0.266]

59 Examining the amplitude of the basis function we see that it is localized and has a roughly Gaussian envelope. [sent-117, score-0.356]

60 The angle as a function of space reveals a smooth ramping of the phase in the direction perpendicular to the basis functions’ orientation. [sent-118, score-0.435]

61 Figure 2(b) displays the resulting image sequences produced by two representative basis functions as the amplitude and phase follow the indicated time courses. [sent-122, score-0.814]

62 The amplitude controls the presence of the feature within the image, and the phase is related to the position of the edge within the image. [sent-123, score-0.668]

63 Importantly for our hierarchical model, the time derivative (slope) of the phase through time is directly related to the movement of the edge through time. [sent-124, score-0.314]

64 Figure 2(c) shows how the population of complex basis functions tiles the space of position (left) and spatial-frequency (right). [sent-125, score-0.345]

65 Each dot represents a different basis function according to its maximum amplitude in the space domain, or its maximum amplitude in the frequency domain computed via the 2D Fourier transform of each complex pair (which produces a single peak in the spatial-frequency plane). [sent-126, score-0.797]
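
A sketch of how such a tiling plot could be computed (hypothetical helper; it returns, for each basis function, the location of its amplitude peak in space and of its single peak in the 2D Fourier plane):

```python
import numpy as np

def tiling_points(A, patch_size=20):
    """Peak spatial position and peak spatial frequency for each complex basis function."""
    positions, frequencies = [], []
    for Ai in A:                                           # A: (n_basis, patch_size**2), complex
        amp = np.abs(Ai).reshape(patch_size, patch_size)
        positions.append(np.unravel_index(np.argmax(amp), amp.shape))
        F = np.fft.fftshift(np.fft.fft2(Ai.reshape(patch_size, patch_size)))
        frequencies.append(np.unravel_index(np.argmax(np.abs(F)), F.shape))
    return np.array(positions), np.array(frequencies)
```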

66 This visualization will be useful for understanding what the phase shifting components D in the second layer have learned. [sent-128, score-0.677]

67 Some functions we believe arise from aliased temporal structure in the movies (row 1, column 5), and others are unknown (row 2, column 4). [sent-138, score-0.286]

68 Figure 3: Learned phase-shifting components, shown in the spatial and frequency domains. [sent-140, score-0.311]

69 The phase shift components generate movements within the image that are invariant to aspects of the spatial structure such as orientation and spatial-frequency. [sent-141, score-0.679]

70 4 Discussion and conclusions The computational vision community has spent considerable effort on developing motion models. [sent-146, score-0.335]

71 Of particular relevance to our work is the Motion-Energy model [14], which signals motion via the amplitudes of quadrature pair filter outputs, similar to the responses of complex neurons in V1. [sent-147, score-0.469]

72 Simoncelli & Heeger have shown how it is possible to extract motion by pooling over a population of such units lying within a common plane in the 3D Fourier domain [12]. [sent-148, score-0.516]

73 Our model addresses each of these problems: it learns from the statistics of natural movies how to best tile the joint domain of position and motion, and it captures complex motion beyond uniform translation. [sent-151, score-0.799]

74 The use of phase information for computing motion is not new, and was used by Fleet and Jepson [20] to compute optic flow. [sent-153, score-0.546]

75 In addition, as shown in Eero Simoncelli’s Thesis, one can establish a formal equivalence between phase-based methods and motion energy models. [sent-154, score-0.287]

76 Here we argue that phase provides a convenient representation as it linearizes trajectories in coefficient space and thus allows one to capture the higher-order structure via a simple linear generative model. [sent-155, score-0.37]

77 Whether or how phase is represented in V1 is not known. Figure 4: Visualization of learned transformational invariants (best viewed as animations in the movie TransInv Figure4x. [sent-156, score-0.757]

78 Each phase-shift component produces a pattern of motion that is invariant to the spatial structure contained within the image. [sent-158, score-0.573]

79 Each panel displays the induced image transformations for a different basis function, Di . [sent-159, score-0.36]

80 Induced motions are shown for four different image patches with the original static patch displayed in the center position. [sent-160, score-0.384]

81 The final image in each sequence shows the pixel-wise variance of the transformation (white values indicate where image pixels are changing through time, which may be difficult to discern in this static presentation). [sent-162, score-0.304]
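
The variance image described here is straightforward to compute; a one-liner sketch:

```python
import numpy as np

def pixelwise_variance(frames):
    """Variance over time at each pixel; frames has shape (T, H, W).
    Bright values mark pixels that change through the transformation."""
    return np.var(np.asarray(frames, dtype=float), axis=0)
```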

82 The example in (a) produces global motion in the direction of 45 deg. [sent-163, score-0.376]

83 The next example (b) produces local vertical motion in the lower portion of the image patch only. [sent-166, score-0.635]

84 Note that in the first patch the strong edge in the lower portion of the patch moves while the edge in the upper portion remains fixed. [sent-167, score-0.262]

85 Again, this component produces similar transformations irrespective of the spatial structure contained in the image. [sent-168, score-0.272]

86 The example in (c) produces horizontal motion in the left part of the image in the opposite direction of horizontal motion in the right half (the two halves of the image either converge or diverge). [sent-169, score-1.039]

87 Note that the oriented structure in the first two patches becomes more closely spaced in the leftmost patch and is more widely spaced in the rightmost image. [sent-170, score-0.261]

88 This is seen clearly in the third image as the spacing between the vertical structure is most narrow in the leftmost image and widest in the rightmost image. [sent-171, score-0.389]

89 Instead of independent processing streams focused on form perception and motion perception, the two streams may represent complementary aspects of visual information: spatial invariants and transformational invariants. [sent-177, score-0.975]

90 Importantly though, in our model information about form and motion is bound together since it is computed by a process of factorization rather than by independent mechanisms in separate streams. [sent-179, score-0.318]

91 Our model also illustrates a functional role for feedback between higher visual areas and primary visual cortex, not unlike the proposed inference pathways suggested by Lee and Mumford [22]. [sent-180, score-0.251]

92 The first layer units are responsive to visual information in a narrow spatial window and narrow spatial frequency band. [sent-181, score-0.924]

93 However, the top layer units receive input from a diverse population of first layer units and can thus disambiguate local information by providing a bias to the time rate of change of the phase variables. [sent-182, score-1.305]

94 Because the second layer weights D are adapted to the statistics of natural movies, these biases will be consistent with the statistical distribution of motion occurring in the natural environment. [sent-183, score-0.769]

95 Most obviously, the graphical model in Figure 1 begs the question of what would be gained by modeling the joint distribution over the amplitudes, ai , in addition to the phases. [sent-186, score-0.363]

96 To some degree, this line of approach has already been pursued by Karklin & Lewicki [2], and they have shown that the high level units in this case learn spatial invariants within the image. [sent-187, score-0.398]

97 We are thus eager to combine both of these models into a unified model of higher-order form and motion in images. [sent-188, score-0.318]

98 A selection model for motion processing in area MT of primates. [sent-233, score-0.318]

99 Invariant global motion recognition in the dorsal visual system: A unifying theory. [sent-248, score-0.445]

100 Computation of component image velocity from local phase information. [sent-316, score-0.444]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('layer', 0.366), ('ai', 0.332), ('motion', 0.287), ('phase', 0.259), ('transformational', 0.224), ('amplitude', 0.218), ('invariants', 0.186), ('image', 0.152), ('movies', 0.148), ('basis', 0.138), ('wk', 0.119), ('coef', 0.114), ('visual', 0.11), ('spatial', 0.11), ('cients', 0.103), ('units', 0.102), ('dw', 0.094), ('circularly', 0.093), ('patches', 0.093), ('invariant', 0.084), ('imaginary', 0.081), ('kurtotic', 0.081), ('patch', 0.081), ('complex', 0.081), ('coding', 0.072), ('tile', 0.07), ('amplitudes', 0.07), ('layers', 0.07), ('transformations', 0.07), ('phases', 0.063), ('dik', 0.061), ('natural', 0.058), ('motions', 0.058), ('perception', 0.058), ('sin', 0.058), ('sparse', 0.057), ('america', 0.056), ('sw', 0.056), ('hierarchical', 0.055), ('domain', 0.053), ('hidden', 0.052), ('shifting', 0.052), ('optical', 0.051), ('dynamics', 0.051), ('produces', 0.051), ('temporal', 0.05), ('ar', 0.05), ('recognition', 0.048), ('vision', 0.048), ('invariance', 0.047), ('diversity', 0.047), ('functions', 0.047), ('cadieu', 0.047), ('fleet', 0.047), ('sla', 0.047), ('transinv', 0.047), ('oriented', 0.046), ('translation', 0.045), ('simoncelli', 0.045), ('sp', 0.045), ('learned', 0.044), ('movie', 0.044), ('narrow', 0.044), ('zi', 0.043), ('karklin', 0.041), ('aperture', 0.041), ('dilation', 0.041), ('spin', 0.041), ('zetzsche', 0.041), ('cos', 0.041), ('structure', 0.041), ('images', 0.04), ('moving', 0.04), ('population', 0.04), ('generative', 0.039), ('position', 0.039), ('direction', 0.038), ('moves', 0.038), ('frequency', 0.038), ('shares', 0.037), ('disambiguate', 0.037), ('adelson', 0.037), ('hyvarinen', 0.037), ('horizontal', 0.036), ('ui', 0.036), ('imposes', 0.036), ('representations', 0.035), ('polar', 0.035), ('fourier', 0.035), ('object', 0.034), ('extract', 0.034), ('uence', 0.034), ('local', 0.033), ('testable', 0.033), ('orientation', 0.033), ('learns', 0.032), ('emergence', 0.031), ('portion', 0.031), ('trajectories', 0.031), ('model', 0.031)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999928 118 nips-2008-Learning Transformational Invariants from Natural Movies

Author: Charles Cadieu, Bruno A. Olshausen

Abstract: We describe a hierarchical, probabilistic model that learns to extract complex motion from movies of the natural environment. The model consists of two hidden layers: the first layer produces a sparse representation of the image that is expressed in terms of local amplitude and phase variables. The second layer learns the higher-order structure among the time-varying phase variables. After training on natural movies, the top layer units discover the structure of phase-shifts within the first layer. We show that the top layer units encode transformational invariants: they are selective for the speed and direction of a moving pattern, but are invariant to its spatial structure (orientation/spatial-frequency). The diversity of units in both the intermediate and top layers of the model provides a set of testable predictions for representations that might be found in V1 and MT. In addition, the model demonstrates how feedback from higher levels can influence representations at lower levels as a by-product of inference in a graphical model. 1

2 0.22266316 136 nips-2008-Model selection and velocity estimation using novel priors for motion patterns

Author: Shuang Wu, Hongjing Lu, Alan L. Yuille

Abstract: Psychophysical experiments show that humans are better at perceiving rotation and expansion than translation. These findings are inconsistent with standard models of motion integration which predict best performance for translation [6]. To explain this discrepancy, our theory formulates motion perception at two levels of inference: we first perform model selection between the competing models (e.g. translation, rotation, and expansion) and then estimate the velocity using the selected model. We define novel prior models for smooth rotation and expansion using techniques similar to those in the slow-and-smooth model [17] (e.g. Green functions of differential operators). The theory gives good agreement with the trends observed in human experiments. 1

3 0.21536122 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data

Author: Xuming He, Richard S. Zemel

Abstract: Extensive labeled data for image annotation systems, which learn to assign class labels to image regions, is difficult to obtain. We explore a hybrid model framework for utilizing partially labeled data that integrates a generative topic model for image appearance with discriminative label prediction. We propose three alternative formulations for imposing a spatial smoothness prior on the image labels. Tests of the new models and some baseline approaches on three real image datasets demonstrate the effectiveness of incorporating the latent structure. 1

4 0.17974588 157 nips-2008-Nonrigid Structure from Motion in Trajectory Space

Author: Ijaz Akhter, Yaser Sheikh, Sohaib Khan, Takeo Kanade

Abstract: Existing approaches to nonrigid structure from motion assume that the instantaneous 3D shape of a deforming object is a linear combination of basis shapes, which have to be estimated anew for each video sequence. In contrast, we propose that the evolving 3D structure be described by a linear combination of basis trajectories. The principal advantage of this approach is that we do not need to estimate any basis vectors during computation. We show that generic bases over trajectories, such as the Discrete Cosine Transform (DCT) basis, can be used to compactly describe most real motions. This results in a significant reduction in unknowns, and corresponding stability in estimation. We report empirical performance, quantitatively using motion capture data, and qualitatively on several video sequences exhibiting nonrigid motions including piece-wise rigid motion, partially nonrigid motion (such as a facial expression), and highly nonrigid motion (such as a person dancing). 1

5 0.14379662 148 nips-2008-Natural Image Denoising with Convolutional Networks

Author: Viren Jain, Sebastian Seung

Abstract: We present an approach to low-level vision that combines two main ideas: the use of convolutional networks as an image processing architecture and an unsupervised learning procedure that synthesizes training samples from specific noise models. We demonstrate this approach on the challenging problem of natural image denoising. Using a test set with a hundred natural images, we find that convolutional networks provide comparable and in some cases superior performance to state of the art wavelet and Markov random field (MRF) methods. Moreover, we find that a convolutional network offers similar performance in the blind denoising setting as compared to other techniques in the non-blind setting. We also show how convolutional networks are mathematically related to MRF approaches by presenting a mean field theory for an MRF specially designed for image denoising. Although these approaches are related, convolutional networks avoid computational difficulties in MRF approaches that arise from probabilistic learning and inference. This makes it possible to learn image processing architectures that have a high degree of representational power (we train models with over 15,000 parameters), but whose computational expense is significantly less than that associated with inference in MRF approaches with even hundreds of parameters. 1 Background Low-level image processing tasks include edge detection, interpolation, and deconvolution. These tasks are useful both in themselves, and as a front-end for high-level visual tasks like object recognition. This paper focuses on the task of denoising, defined as the recovery of an underlying image from an observation that has been subjected to Gaussian noise. One approach to image denoising is to transform an image from pixel intensities into another representation where statistical regularities are more easily captured. For example, the Gaussian scale mixture (GSM) model introduced by Portilla and colleagues is based on a multiscale wavelet decomposition that provides an effective description of local image statistics [1, 2]. Another approach is to try and capture statistical regularities of pixel intensities directly using Markov random fields (MRFs) to define a prior over the image space. Initial work used handdesigned settings of the parameters, but recently there has been increasing success in learning the parameters of such models from databases of natural images [3, 4, 5, 6, 7, 8]. Prior models can be used for tasks such as image denoising by augmenting the prior with a noise model. Alternatively, an MRF can be used to model the probability distribution of the clean image conditioned on the noisy image. This conditional random field (CRF) approach is said to be discriminative, in contrast to the generative MRF approach. Several researchers have shown that the CRF approach can outperform generative learning on various image restoration and labeling tasks [9, 10]. CRFs have recently been applied to the problem of image denoising as well [5]. 1 The present work is most closely related to the CRF approach. Indeed, certain special cases of convolutional networks can be seen as performing maximum likelihood inference on a CRF [11]. The advantage of the convolutional network approach is that it avoids a general difficulty with applying MRF-based methods to image analysis: the computational expense associated with both parameter estimation and inference in probabilistic models. 
For example, naive methods of learning MRFbased models involve calculation of the partition function, a normalization factor that is generally intractable for realistic models and image dimensions. As a result, a great deal of research has been devoted to approximate MRF learning and inference techniques that meliorate computational difficulties, generally at the cost of either representational power or theoretical guarantees [12, 13]. Convolutional networks largely avoid these difficulties by posing the computational task within the statistical framework of regression rather than density estimation. Regression is a more tractable computation and therefore permits models with greater representational power than methods based on density estimation. This claim will be argued for with empirical results on the denoising problem, as well as mathematical connections between MRF and convolutional network approaches. 2 Convolutional Networks Convolutional networks have been extensively applied to visual object recognition using architectures that accept an image as input and, through alternating layers of convolution and subsampling, produce one or more output values that are thresholded to yield binary predictions regarding object identity [14, 15]. In contrast, we study networks that accept an image as input and produce an entire image as output. Previous work has used such architectures to produce images with binary targets in image restoration problems for specialized microscopy data [11, 16]. Here we show that similar architectures can also be used to produce images with the analog fluctuations found in the intensity distributions of natural images. Network Dynamics and Architecture A convolutional network is an alternating sequence of linear filtering and nonlinear transformation operations. The input and output layers include one or more images, while intermediate layers contain “hidden

6 0.14217412 119 nips-2008-Learning a discriminative hidden part model for human action recognition

7 0.13499831 247 nips-2008-Using Bayesian Dynamical Systems for Motion Template Libraries

8 0.12567566 62 nips-2008-Differentiable Sparse Coding

9 0.12366334 158 nips-2008-Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks

10 0.11162885 175 nips-2008-PSDBoost: Matrix-Generation Linear Programming for Positive Semidefinite Matrices Learning

11 0.1095089 66 nips-2008-Dynamic visual attention: searching for coding length increments

12 0.1077394 56 nips-2008-Deep Learning with Kernel Regularization for Visual Recognition

13 0.10625581 191 nips-2008-Recursive Segmentation and Recognition Templates for 2D Parsing

14 0.094628647 95 nips-2008-Grouping Contours Via a Related Image

15 0.092544451 232 nips-2008-The Conjoint Effect of Divisive Normalization and Orientation Selectivity on Redundancy Reduction

16 0.090752162 75 nips-2008-Estimating vector fields using sparse basis field expansions

17 0.086867064 208 nips-2008-Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes

18 0.083113149 23 nips-2008-An ideal observer model of infant object perception

19 0.082630888 226 nips-2008-Supervised Dictionary Learning

20 0.082137898 202 nips-2008-Robust Regression and Lasso


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.248), (1, -0.051), (2, 0.206), (3, -0.077), (4, 0.076), (5, 0.022), (6, -0.201), (7, -0.115), (8, 0.089), (9, 0.07), (10, -0.011), (11, 0.015), (12, 0.1), (13, 0.226), (14, 0.012), (15, -0.129), (16, -0.117), (17, 0.009), (18, -0.113), (19, -0.072), (20, -0.117), (21, -0.008), (22, -0.15), (23, -0.191), (24, 0.073), (25, -0.063), (26, -0.03), (27, -0.097), (28, 0.065), (29, -0.096), (30, 0.067), (31, 0.077), (32, -0.008), (33, 0.005), (34, 0.049), (35, -0.053), (36, 0.017), (37, -0.063), (38, 0.126), (39, -0.06), (40, -0.024), (41, 0.019), (42, 0.01), (43, -0.016), (44, -0.036), (45, 0.036), (46, -0.025), (47, 0.008), (48, 0.01), (49, -0.036)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97225726 118 nips-2008-Learning Transformational Invariants from Natural Movies

Author: Charles Cadieu, Bruno A. Olshausen

Abstract: We describe a hierarchical, probabilistic model that learns to extract complex motion from movies of the natural environment. The model consists of two hidden layers: the first layer produces a sparse representation of the image that is expressed in terms of local amplitude and phase variables. The second layer learns the higher-order structure among the time-varying phase variables. After training on natural movies, the top layer units discover the structure of phase-shifts within the first layer. We show that the top layer units encode transformational invariants: they are selective for the speed and direction of a moving pattern, but are invariant to its spatial structure (orientation/spatial-frequency). The diversity of units in both the intermediate and top layers of the model provides a set of testable predictions for representations that might be found in V1 and MT. In addition, the model demonstrates how feedback from higher levels can influence representations at lower levels as a by-product of inference in a graphical model. 1

2 0.71660745 136 nips-2008-Model selection and velocity estimation using novel priors for motion patterns

Author: Shuang Wu, Hongjing Lu, Alan L. Yuille

Abstract: Psychophysical experiments show that humans are better at perceiving rotation and expansion than translation. These findings are inconsistent with standard models of motion integration which predict best performance for translation [6]. To explain this discrepancy, our theory formulates motion perception at two levels of inference: we first perform model selection between the competing models (e.g. translation, rotation, and expansion) and then estimate the velocity using the selected model. We define novel prior models for smooth rotation and expansion using techniques similar to those in the slow-and-smooth model [17] (e.g. Green functions of differential operators). The theory gives good agreement with the trends observed in human experiments. 1

3 0.70877719 157 nips-2008-Nonrigid Structure from Motion in Trajectory Space

Author: Ijaz Akhter, Yaser Sheikh, Sohaib Khan, Takeo Kanade

Abstract: Existing approaches to nonrigid structure from motion assume that the instantaneous 3D shape of a deforming object is a linear combination of basis shapes, which have to be estimated anew for each video sequence. In contrast, we propose that the evolving 3D structure be described by a linear combination of basis trajectories. The principal advantage of this approach is that we do not need to estimate any basis vectors during computation. We show that generic bases over trajectories, such as the Discrete Cosine Transform (DCT) basis, can be used to compactly describe most real motions. This results in a significant reduction in unknowns, and corresponding stability in estimation. We report empirical performance, quantitatively using motion capture data, and qualitatively on several video sequences exhibiting nonrigid motions including piece-wise rigid motion, partially nonrigid motion (such as a facial expression), and highly nonrigid motion (such as a person dancing). 1

4 0.60898381 119 nips-2008-Learning a discriminative hidden part model for human action recognition

Author: Yang Wang, Greg Mori

Abstract: We present a discriminative part-based approach for human action recognition from video sequences using motion features. Our model is based on the recently proposed hidden conditional random field (hCRF) for object recognition. Similar to hCRF for object recognition, we model a human action by a flexible constellation of parts conditioned on image observations. Different from object recognition, our model combines both large-scale global features and local patch features to distinguish various actions. Our experimental results show that our model is comparable to other state-of-the-art approaches in action recognition. In particular, our experimental results demonstrate that combining large-scale global features and local patch features performs significantly better than directly applying hCRF on local patches alone. 1

5 0.59722716 66 nips-2008-Dynamic visual attention: searching for coding length increments

Author: Xiaodi Hou, Liqing Zhang

Abstract: A visual attention system should respond placidly when common stimuli are presented, while at the same time keep alert to anomalous visual inputs. In this paper, a dynamic visual attention model based on the rarity of features is proposed. We introduce the Incremental Coding Length (ICL) to measure the perspective entropy gain of each feature. The objective of our model is to maximize the entropy of the sampled visual features. In order to optimize energy consumption, the limit amount of energy of the system is re-distributed amongst features according to their Incremental Coding Length. By selecting features with large coding length increments, the computational system can achieve attention selectivity in both static and dynamic scenes. We demonstrate that the proposed model achieves superior accuracy in comparison to mainstream approaches in static saliency map generation. Moreover, we also show that our model captures several less-reported dynamic visual search behaviors, such as attentional swing and inhibition of return. 1

6 0.58314401 148 nips-2008-Natural Image Denoising with Convolutional Networks

7 0.55154991 247 nips-2008-Using Bayesian Dynamical Systems for Motion Template Libraries

8 0.54788005 158 nips-2008-Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks

9 0.51000488 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data

10 0.49067762 240 nips-2008-Tracking Changing Stimuli in Continuous Attractor Neural Networks

11 0.46740246 23 nips-2008-An ideal observer model of infant object perception

12 0.41040468 30 nips-2008-Bayesian Experimental Design of Magnetic Resonance Imaging Sequences

13 0.40196949 33 nips-2008-Bayesian Model of Behaviour in Economic Games

14 0.39005411 191 nips-2008-Recursive Segmentation and Recognition Templates for 2D Parsing

15 0.38828194 56 nips-2008-Deep Learning with Kernel Regularization for Visual Recognition

16 0.38745126 160 nips-2008-On Computational Power and the Order-Chaos Phase Transition in Reservoir Computing

17 0.38235503 75 nips-2008-Estimating vector fields using sparse basis field expansions

18 0.38075343 62 nips-2008-Differentiable Sparse Coding

19 0.37531465 152 nips-2008-Non-stationary dynamic Bayesian networks

20 0.34980851 232 nips-2008-The Conjoint Effect of Divisive Normalization and Orientation Selectivity on Redundancy Reduction


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(6, 0.065), (7, 0.1), (12, 0.072), (15, 0.014), (28, 0.174), (57, 0.114), (59, 0.031), (60, 0.195), (63, 0.02), (71, 0.024), (77, 0.055), (78, 0.012), (83, 0.041), (94, 0.011)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.96527785 38 nips-2008-Bio-inspired Real Time Sensory Map Realignment in a Robotic Barn Owl

Author: Juan Huo, Zhijun Yang, Alan F. Murray

Abstract: The visual and auditory map alignment in the Superior Colliculus (SC) of barn owl is important for its accurate localization for prey behavior. Prism learning or Blindness may interfere this alignment and cause loss of the capability of accurate prey. However, juvenile barn owl could recover its sensory map alignment by shifting its auditory map. The adaptation of this map alignment is believed based on activity dependent axon developing in Inferior Colliculus (IC). A model is built to explore this mechanism. In this model, axon growing process is instructed by an inhibitory network in SC while the strength of the inhibition adjusted by Spike Timing Dependent Plasticity (STDP). We test and analyze this mechanism by application of the neural structures involved in spatial localization in a robotic system. 1

same-paper 2 0.85040498 118 nips-2008-Learning Transformational Invariants from Natural Movies

Author: Charles Cadieu, Bruno A. Olshausen

Abstract: We describe a hierarchical, probabilistic model that learns to extract complex motion from movies of the natural environment. The model consists of two hidden layers: the first layer produces a sparse representation of the image that is expressed in terms of local amplitude and phase variables. The second layer learns the higher-order structure among the time-varying phase variables. After training on natural movies, the top layer units discover the structure of phase-shifts within the first layer. We show that the top layer units encode transformational invariants: they are selective for the speed and direction of a moving pattern, but are invariant to its spatial structure (orientation/spatial-frequency). The diversity of units in both the intermediate and top layers of the model provides a set of testable predictions for representations that might be found in V1 and MT. In addition, the model demonstrates how feedback from higher levels can influence representations at lower levels as a by-product of inference in a graphical model. 1

3 0.84265804 249 nips-2008-Variational Mixture of Gaussian Process Experts

Author: Chao Yuan, Claus Neubauer

Abstract: Mixture of Gaussian processes models extended a single Gaussian process with ability of modeling multi-modal data and reduction of training complexity. Previous inference algorithms for these models are mostly based on Gibbs sampling, which can be very slow, particularly for large-scale data sets. We present a new generative mixture of experts model. Each expert is still a Gaussian process but is reformulated by a linear model. This breaks the dependency among training outputs and enables us to use a much faster variational Bayesian algorithm for training. Our gating network is more flexible than previous generative approaches as inputs for each expert are modeled by a Gaussian mixture model. The number of experts and number of Gaussian components for an expert are inferred automatically. A variety of tests show the advantages of our method. 1

4 0.77233362 66 nips-2008-Dynamic visual attention: searching for coding length increments

Author: Xiaodi Hou, Liqing Zhang

Abstract: A visual attention system should respond placidly when common stimuli are presented, while at the same time keep alert to anomalous visual inputs. In this paper, a dynamic visual attention model based on the rarity of features is proposed. We introduce the Incremental Coding Length (ICL) to measure the perspective entropy gain of each feature. The objective of our model is to maximize the entropy of the sampled visual features. In order to optimize energy consumption, the limit amount of energy of the system is re-distributed amongst features according to their Incremental Coding Length. By selecting features with large coding length increments, the computational system can achieve attention selectivity in both static and dynamic scenes. We demonstrate that the proposed model achieves superior accuracy in comparison to mainstream approaches in static saliency map generation. Moreover, we also show that our model captures several less-reported dynamic visual search behaviors, such as attentional swing and inhibition of return. 1

5 0.76288122 200 nips-2008-Robust Kernel Principal Component Analysis

Author: Minh H. Nguyen, Fernando Torre

Abstract: Kernel Principal Component Analysis (KPCA) is a popular generalization of linear PCA that allows non-linear feature extraction. In KPCA, data in the input space is mapped to higher (usually) dimensional feature space where the data can be linearly modeled. The feature space is typically induced implicitly by a kernel function, and linear PCA in the feature space is performed via the kernel trick. However, due to the implicitness of the feature space, some extensions of PCA such as robust PCA cannot be directly generalized to KPCA. This paper presents a technique to overcome this problem, and extends it to a unified framework for treating noise, missing data, and outliers in KPCA. Our method is based on a novel cost function to perform inference in KPCA. Extensive experiments, in both synthetic and real data, show that our algorithm outperforms existing methods. 1

6 0.76118684 192 nips-2008-Reducing statistical dependencies in natural signals using radial Gaussianization

7 0.76051712 197 nips-2008-Relative Performance Guarantees for Approximate Inference in Latent Dirichlet Allocation

8 0.76036894 62 nips-2008-Differentiable Sparse Coding

9 0.75653058 208 nips-2008-Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes

10 0.75595146 4 nips-2008-A Scalable Hierarchical Distributed Language Model

11 0.75563657 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations

12 0.75542217 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data

13 0.75514251 27 nips-2008-Artificial Olfactory Brain for Mixture Identification

14 0.75387388 138 nips-2008-Modeling human function learning with Gaussian processes

15 0.75130701 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning

16 0.75058937 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization

17 0.74924338 215 nips-2008-Sparse Signal Recovery Using Markov Random Fields

18 0.74923778 71 nips-2008-Efficient Sampling for Gaussian Process Inference using Control Variables

19 0.74916232 75 nips-2008-Estimating vector fields using sparse basis field expansions

20 0.74866444 221 nips-2008-Stochastic Relational Models for Large-scale Dyadic Data using MCMC