
88 nips-2009-Extending Phase Mechanism to Differential Motion Opponency for Motion Pop-out


Source: pdf

Author: Yicong Meng, Bertram E. Shi

Abstract: We extend the concept of phase tuning, a ubiquitous mechanism among sensory neurons including motion and disparity selective neurons, to motion contrast detection. We demonstrate that motion contrast can be detected by phase shifts between motion neuronal responses in different spatial regions. By constructing a differential motion opponency in response to motions in two different spatial regions, varying motion contrasts can be detected, where similar motion is detected by zero phase shifts and differences in motion by non-zero phase shifts. The model can exhibit either enhancement or suppression of responses by either different or similar motion in the surround. A primary advantage of the model is that the responses are selective to relative motion instead of absolute motion, which could model neurons found in neurophysiological experiments that are responsible for motion pop-out detection.

1 Introduction

Motion discontinuity, or motion contrast, is an important cue for the pop-out of salient moving objects from contextual backgrounds. Although the neural mechanism underlying motion pop-out detection is still unknown, the center-surround receptive field (RF) organization is considered a physiological basis for pop-out detection. The center-surround RF structure is simple and ubiquitous in cortical cells, especially in neurons processing motion and color information. Nakayama and Loomis [1] predicted in 1974 the existence of motion selective neurons with an antagonistic center-surround receptive field organization. Physiological experiments [2][3] show that neurons with center-surround RFs exist in both the middle temporal (MT) and medial superior temporal (MST) areas, which are related to motion processing. This antagonistic mechanism has been suggested to support motion segmentation [4], figure/ground segregation [5] and the differentiation of object motion from ego-motion [6].
There are many related works [7]-[12] on motion pop-out detection. Some works [7]-[9] are based on spatio-temporal filtering outputs, but the interactions between motion neurons are incomplete: they either only inhibit similar motion [7] or only enhance opposite motion [8]. Heeger, et al. [7] proposed a center-surround operator to eliminate the dependence of the response upon rotational motions. However, Heeger's model shows a complete center-surround interaction only for motion direction. With respect to surround speed, the neuronal responses are suppressed when the surround moves at the same speed as the center but are not enhanced by other speeds. A similar problem exists in [8], which only models the suppression of neuronal responses in the classical receptive field (CRF) by similar motions in surrounding regions. Physiological experiments [10][11] show that many neurons in visual cortex are sensitive to motion contrast rather than to the absolute direction and speed of the object motion. Although pooling over motion neurons tuned to different velocities can eliminate the dependence upon absolute velocities, it is computationally inefficient and still cannot give full interactions of both suppression and enhancement by similar and opposite surrounding motions. The model proposed by Dellen, et al. [12] computes differential motion responses directly from complex cells in V1 and does not use responses from direction selective neurons. In this paper, we propose an opponency model which responds directly to differential motions by using the phase shift mechanism. Phase tuning is a ubiquitous mechanism in sensory information processing, including motion, disparity and depth detection. Disparity selective neurons in the visual cortex have been found to detect disparities by adjusting the phase shift between the receptive field organizations in the left and right eyes [13][14].
Motion sensitive cells have been modeled in a similar way to disparity energy neurons: they detect image motion by using the phase shift between the real and imaginary parts of temporal complex valued responses, which are comparable to the images in the left and right eyes [15]. Therefore, differential motion can be modeled by exploiting the similarity between images from different spatial regions and images from different eyes.

The remainder of this paper is organized as follows. Section 2 describes the phase shift motion energy neurons, which estimate image velocities by phase tuning in the imaginary path of the temporal receptive field responses. In Section 3, we extend the concept of phase tuning to the construction of a differential motion opponency. The phase difference determines the preferred velocity difference between adjacent areas in retinal images. Section 4 investigates the motion pop-out detection properties of the proposed motion opponency model. Finally, in Section 5, we relate our proposed model to the neural mechanisms of motion integration and motion segmentation in motion related areas and suggest a possible interpretation for the adaptive center-surround interactions observed in biological experiments.

2 Phase Shift Motion Energy Neurons

Adelson and Bergen [16] proposed the motion energy model for visual motion perception, which measures the spatio-temporal orientation of image sequences in space and time. The motion energy model posits that the responses of direction-selective V1 complex cells can be computed by a combination of two linear spatio-temporal filtering stages, followed by squaring and summation. The motion energy model was extended in [15] to be phase tuned by splitting the complex valued temporal responses into real and imaginary paths and adding a phase shift on the imaginary path. Figure 1(a) shows the schematic diagram of the phase shift motion energy model.
Here we assume an input image sequence $I(x, y, t)$ in two-dimensional space $(x, y)$ and time $t$. Because the spatio-temporal receptive field is separable, the RF can be implemented as a cascade of spatial and temporal filters. Because the temporal RF must be causal, the phase shift motion energy model does not use a Gabor filter for the temporal RF as it does for the spatial RF. The phase shift spatio-temporal RF is modeled with a complex valued function $f(x, y, t) = g(x, y) \cdot h(t, \Phi)$, where the spatial and temporal RFs are denoted by $g(x, y)$ and $h(t, \Phi)$ respectively,

$$g(x, y) = N(x, y \mid 0, C)\, \exp(j\Omega_x x + j\Omega_y y), \qquad h(t, \Phi) = h_{\mathrm{real}}(t) + \exp(j\Phi)\, h_{\mathrm{imag}}(t) \tag{1}$$

and $C$ is the covariance matrix of the spatial Gaussian envelope and $\Phi$ is the phase tuning of the motion energy neuron. The real and imaginary profiles of the temporal receptive field are Gamma modulated sinusoidal functions with quadrature phases,

$$h_{\mathrm{real}}(t) = G(t \mid \alpha, \tau) \cos(\Omega_t t), \qquad h_{\mathrm{imag}}(t) = G(t \mid \alpha, \tau) \sin(\Omega_t t) \tag{2}$$

The envelopes for the complex exponentials are functions of Gaussian and Gamma distributions,

$$N(x, y \mid 0, C) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\left(-\frac{x^2}{2\sigma_x^2} - \frac{y^2}{2\sigma_y^2}\right) \tag{3}$$

$$G(t \mid \alpha, \tau) = \frac{1}{\Gamma(\alpha)\,\tau^{\alpha}}\, t^{\alpha - 1} \exp\left(-\frac{t}{\tau}\right) u(t) \tag{4}$$

where $\Gamma(\alpha)$ is the gamma function and $u(t)$ is the unit step function. The parameters $\alpha$ and $\tau$ determine the temporal RF size.

[Figure 1: schematic diagrams, panels (a), (b) and (c).]

Figure 1. (a) shows the diagram of the phase shift motion energy model adapted from [15]. (b) draws the spatio-temporal representation of the phase shift motion energy neuron, with the real and imaginary receptive fields shown by the two left pictures. (c) illustrates the construction of the differential motion opponency with a phase difference $\Theta$ from two populations of phase shift motion energy neurons in two spatial areas c and s. To avoid clutter, the spatial location $(x, y)$ is not explicitly shown in the phase tuned motion energies.

As derived in [15], the motion energy at location $(x, y)$ can be computed by

$$E_v(x, y, \Phi) = S + P \cos(\Psi - \Phi) \tag{5}$$

where

$$S = |V_{\mathrm{real}}|^2 + |V_{\mathrm{imag}}|^2, \qquad P = 2\,|V_{\mathrm{real}} V_{\mathrm{imag}}^{*}|, \qquad \Psi = \arg\left(V_{\mathrm{real}} V_{\mathrm{imag}}^{*}\right) \tag{6}$$

and the complex valued responses in the real and imaginary paths are obtained as

$$V_{\mathrm{real}}(x, y, t) = \iiint_{\xi,\zeta,\eta} g(\xi, \zeta)\, h_{\mathrm{real}}(\eta)\, I(x - \xi, y - \zeta, t - \eta)\, d\xi\, d\zeta\, d\eta, \qquad V_{\mathrm{imag}}(x, y, t) = \iiint_{\xi,\zeta,\eta} g(\xi, \zeta)\, h_{\mathrm{imag}}(\eta)\, I(x - \xi, y - \zeta, t - \eta)\, d\xi\, d\zeta\, d\eta \tag{7}$$

The superscript $*$ represents complex conjugation, and the phase shift parameter $\Phi$ controls the spatio-temporal orientation tuning. To avoid clutter, the spatial location variables $x$ and $y$ for $S$, $P$, $\Psi$, $V_{\mathrm{real}}$ and $V_{\mathrm{imag}}$ are not explicitly shown in Eq. (5) and (6). Figure 1(b) shows the even and odd profiles of the spatio-temporal RF tuned to a particular phase shift.

[Figure 2: panels (a) and (b), each labeled with phase shifts $\Theta$ and 0.]

Figure 2. Two types of differential motion opponency constructions: (a) center-surround interaction and (b) left-right interaction. Among cells in area MT with surround modulation, 25% of cells have the antagonistic RF structure in the top row and another 50% of cells have the integrative RF structure shown in the bottom row.

3 Extending Phase Opponency Mechanism to Differential Motion

Based on the above phase shift motion energy model, the local image velocity at each spatial location can be represented by the phase shift that leads to the peak response across a population of motion energy neurons. Across regions of different motions, there are clear discontinuities in the estimated velocity map. These motion discontinuities can be detected by applying edge detectors to the velocity map in order to segment different motions.
However, this algorithm for detecting motion discontinuities cannot discriminate between object motion and uniform motion in contextual backgrounds. Here we propose a phase mechanism to detect differential motions, inspired by the disparity energy model, and adopt the center-surround inhibition mechanism to pop out the object motion from contextual background motions. The motion differences between different spatial locations can be modeled in a similar way to the disparity model. The motion energies from two neighboring locations are treated like the retinal images in the left and right eyes. Thus, we can construct a differential motion opponency by placing two populations of phase shift motion energy neurons at different spatial locations. The energy $E_{\Delta v}(\Theta)$ of the opponency is the squared modulus of the phase shift motion energies averaged over space and phase,

$$E_{\Delta v}(\Theta) = \left| \iiint E_v(x, y, \Phi)\, w(x, y, \Phi \mid \Theta)\, dx\, dy\, d\Phi \right|^2 \tag{8}$$

where $w(x, y, \Phi \mid \Theta)$ is the profile for the differential motion opponency and $\Delta v$ is the velocity difference between the two spatial regions defined by the kernel $w(x, y, \Phi \mid \Theta)$. Since $w(x, y, \Phi \mid \Theta)$ is intended to implement the functional role of spatial interactions, it is desired to be a separable function in the space and phase domains and can be modeled by a phase tuned summation of two spatial kernels,

$$w(x, y, \Phi \mid \Theta) = w_c(x, y)\, e^{j\Phi} + e^{j\Theta + j\Phi}\, w_s(x, y) \tag{9}$$

where $w_c(x, y)$ and $w_s(x, y)$ are Gaussian kernels of different spatial sizes $\sigma_c$ and $\sigma_s$, and $\Theta$ is the phase difference representing the velocity difference between the two spatial regions c and s. Substituting Eq. (9) into Eq. (8), the differential motion energy can be reformulated as

$$E_{\Delta v}(\Theta) = \left| K_c + e^{j\Theta} K_s \right|^2 \tag{10}$$

where

$$K_c = \iiint_{x,y,\Phi} E_{v,c}(x, y, \Phi)\, \exp(j\Phi)\, w_c(x, y)\, dx\, dy\, d\Phi, \qquad K_s = \iiint_{x,y,\Phi} E_{v,s}(x, y, \Phi)\, \exp(j\Phi)\, w_s(x, y)\, dx\, dy\, d\Phi \tag{11}$$

$E_{v,c}(x, y, \Phi)$ and $E_{v,s}(x, y, \Phi)$ are the phase shift motion energies at location $(x, y)$ with phase shift $\Phi$. By the same argument that gives Eq. (5) and (6), Eq. (10) and (11) yield

$$E_{\Delta v}(\Theta) = S_{opp} + P_{opp} \cos(\Theta_{opp} - \Theta) \tag{12}$$

where

$$S_{opp} = |K_c|^2 + |K_s|^2, \qquad P_{opp} = 2\,|K_c K_s^{*}|, \qquad \Theta_{opp} = \arg\left(K_c K_s^{*}\right) \tag{13}$$

According to the above derivation, by varying the phase shift $\Theta$ between $-\pi$ and $\pi$, the relative motion energy of the differential motion opponency can be modeled as the responses across a population of phase tuned motion opponencies. The response is completely specified by the three parameters $S_{opp}$, $P_{opp}$ and $\Theta_{opp}$. The schematic diagram of this opponency is illustrated in Figure 1(c).

[Figure 3: two maps, (a) and (b), plotted over Left Velocity (horizontal axis, -3 to 3) and Right Velocity (vertical axis, -3 to 3); the color scale recovered for one map runs from 0.8 to 1.]

Figure 3. (a) Phase map and (b) peak magnitude map obtained from stimuli of two patches of random dots moving with different velocities. The two patches of stimuli are statistically independent but share the same spatial properties: dot size of 2 pixels, dot density of 10% and dot coherence level of 100%. The phase tuned population of motion energy neurons is applied to each patch of random dots with RF parameters: $\Omega_t = 2\pi/8$, $\Omega_t = 2\pi/16$, $\sigma_x = 5$ and $\tau = 5.5$. For each combination of velocities of the left and right patches, the phase shifts averaged over space and time are computed, and so are the magnitudes of the peak responses. The unit for velocities is pixels per frame.

The differential motion opponency consists of three stages. At the first stage, a population of phase shift motion energy neurons is applied so as to be selective to different velocities. At the second stage, the motion energies from the first stage are weighted by kernels tuned to different spatial locations and phase shifts for both spatial regions, and a single differential motion signal for each of region c and region s is obtained by integrating the responses from that region over space and phase tuning. Finally, the differential motion energy is computed as the squared modulus of the sum of the integrated motion signal in region c and the phase shifted motion signal in region s. The subscripts c and s represent two interacting spatial regions, which are not limited to the center and surround regions. The opponency could also be constructed from neighboring left and right spatial regions.

[Figure 4: four panels, (a) to (d), each plotting Responses (0 to 2) against Surrounding Direction (0 to $2\pi$). Recoverable panel titles: "Inhibitive interaction, $\Theta = \pi/2$", "Excitatory interaction, $\Theta = 0$", "Model by Petkov et al. [8]", "Model by Heeger et al. [7]".]

Figure 4. Demonstrations of center-surround differential motion opponency, where (a) shows the excitation by opposite directions outside the CRF and (b) shows the inhibition by surrounding motions in the same direction. The center-surround inhibition models by Petkov, et al. [8] and Heeger, et al. [7] are shown in (c) and (d). Responses above 1 indicate enhancement and responses below 1 indicate suppression.

Figure 2 shows two types of structures for the differential motion opponency. In [17], the authors demonstrate that among cells in area MT with surround modulation, 25% of cells have the antagonistic RF structure shown in Figure 2(a) and another 50% of cells have the integrative RF structure shown in Figure 2(b).
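The last stage of the opponency can be checked numerically. The following is a minimal sketch, assuming the pooled signals $K_c$ and $K_s$ of Eq. (11) are already available as complex numbers; the values used here are hypothetical and the function names are illustrative, not taken from the paper. It evaluates Eq. (10) directly over a population of phase shifts $\Theta$ and compares it with the closed form of Eq. (12) and (13).

```python
import numpy as np

def opponency_energy(K_c, K_s, theta):
    """Eq. (10): squared modulus of K_c plus the phase shifted K_s."""
    return np.abs(K_c + np.exp(1j * theta) * K_s) ** 2

def opponency_parameters(K_c, K_s):
    """Eq. (13): S_opp, P_opp and Theta_opp from the pooled signals."""
    cross = K_c * np.conj(K_s)
    S_opp = np.abs(K_c) ** 2 + np.abs(K_s) ** 2
    P_opp = 2.0 * np.abs(cross)
    Theta_opp = np.angle(cross)
    return S_opp, P_opp, Theta_opp

# Hypothetical pooled signals for regions c and s (assumed values, not data).
K_c = 1.0 * np.exp(1j * 0.3)
K_s = 0.8 * np.exp(-1j * 0.5)

# Population of opponency units with phase shifts between -pi and pi.
thetas = np.linspace(-np.pi, np.pi, 181)
direct = opponency_energy(K_c, K_s, thetas)

# Closed form of Eq. (12).
S_opp, P_opp, Theta_opp = opponency_parameters(K_c, K_s)
closed = S_opp + P_opp * np.cos(Theta_opp - thetas)

# The unit whose phase shift is closest to Theta_opp responds most strongly.
theta_peak = thetas[np.argmax(direct)]
```

In this sketch the population peak sits at the grid value closest to $\Theta_{opp}$, which is the sense in which the phase shift encodes the difference between the two regions.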
The velocity difference tuning of the opponency is determined by the phase shift parameter $\Theta$ combined with the spatial and temporal frequency parameters of the motion energy neurons. A larger phase shift magnitude corresponds to a larger preferred velocity difference. This phase tuning of velocity difference is consistent with the phase tuning of motion energy neurons. Figure 3 shows the phase map obtained by using random dot stimuli with different velocities on two spatial patches (left and right patches, each 128 × 128 pixels). Along the diagonal, the velocities of the left and right patches are equal and therefore the phase estimates are zero. Moving away from the diagonal toward the upper-left and lower-right, the phase magnitudes increase; positive phases indicate larger left velocities and negative phases indicate larger right velocities. The phase tuning gives a good classification of velocity differences.

4 Validation of Differential Motion Opponency

Our derivation and analysis above show that the phase shift between two neighboring spatial regions is a good indicator of the motion difference between these two regions. In this section, we validate the proposed differential motion opponency with two sets of experiments, which show the effects of both surround direction and surround speed on the center motion.

[Figure 5: two panels, (a) and (b), both titled "Inhibitory", each plotting Responses (0 to 2) against Center Speed (-2 to 2).]

Figure 5. The insensitivity of the proposed opponency model to absolute center and surround velocities is demonstrated in (a), where responses are enhanced for all center velocities from -2 to 2 pixels per frame. In (b), the model by Heeger, et al. [7] only shows enhancement when the center speed matches the preferred speed of 1.2 pixels per frame. As before, responses above 1 indicate enhancement and responses below 1 indicate suppression. In both curves, the velocity difference between the center and surround regions is held constant at 3 pixels per frame.

Physiological experiments [2][3] have demonstrated that neuronal activity in the classical receptive field is suppressed by responses outside the CRF when the stimuli in the center and surround regions have similar motion, in both direction and speed. In contrast, visual stimuli of opposite direction or quite different speed outside the CRF enhance the responses in the CRF. These experiments used stimuli of random dots moving at different velocities, with small patches of moving random dots at the center. We tested the properties of the proposed opponency model for motion difference measurement by using similar random dot stimuli. The random dots in the background move with different speeds and in different directions but have the same statistical parameters: dot size of 2 pixels, dot density of 10% and motion coherence level of 100%. The small random dot patches are placed at the center of the background stimuli to stimulate the neurons in the CRF. These small patches share the same statistical parameters as the background random dots but move with a constant velocity of 1 pixel per frame. Figure 4 shows results for the enhanced and suppressed responses in the CRF with varying surround directions. The phase shift motion energy neurons had the same spatial and temporal frequencies and the same receptive field sizes, and were selective to vertical orientations. The preferred spatial frequency was $2\pi/16$ radians per pixel and the temporal frequency was $2\pi/16$ radians per frame. The sizes of the RF in the horizontal and vertical directions were 5 pixels and 10 pixels respectively, corresponding to a spatial bandwidth of 1.96 octaves. The time constant $\tau$ was 5.5 frames, which resulted in a temporal bandwidth of 1.96 octaves.
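To make the temporal parameters above concrete, here is a minimal sketch of the temporal RF of Eq. (2) and (4) with $\Omega_t = 2\pi/16$ and $\tau = 5.5$, followed by the phase tuned energy of Eq. (5). Several things are assumptions made only for illustration: the shape parameter $\alpha = 2$ (its value is not given in this section), the 128-tap filter length, and the simplification that the spatial filtering stage at one pixel outputs a complex exponential in time. The function names are hypothetical.

```python
import math
import numpy as np

def gamma_envelope(t, alpha, tau):
    """Eq. (4): causal Gamma envelope, zero for t < 0 (assumes alpha >= 1)."""
    t = np.asarray(t, dtype=float)
    safe_t = np.where(t >= 0, t, 0.0)
    env = safe_t ** (alpha - 1) * np.exp(-safe_t / tau)
    env = env / (math.gamma(alpha) * tau ** alpha)
    return np.where(t >= 0, env, 0.0)

def temporal_rf(t, omega_t, alpha, tau):
    """Eq. (2): quadrature pair of Gamma modulated sinusoids."""
    env = gamma_envelope(t, alpha, tau)
    return env * np.cos(omega_t * t), env * np.sin(omega_t * t)

def path_responses(omega_in, omega_t, alpha, tau, n_taps=128):
    """Real-path and imaginary-path responses at t = 0, assuming the spatial
    stage outputs exp(j * omega_in * t) at this pixel (an illustrative input)."""
    eta = np.arange(n_taps)
    h_real, h_imag = temporal_rf(eta, omega_t, alpha, tau)
    past_input = np.exp(-1j * omega_in * eta)  # input sampled at time 0 - eta
    return np.sum(h_real * past_input), np.sum(h_imag * past_input)

def motion_energy(v_real, v_imag, phi):
    """Phase tuned energy: imaginary path shifted by phi before squaring."""
    return np.abs(v_real + np.exp(1j * phi) * v_imag) ** 2

OMEGA_T = 2 * np.pi / 16   # temporal frequency from the text, radians per frame
TAU = 5.5                  # time constant from the text, frames
ALPHA = 2.0                # assumed shape parameter (not stated in the text)

v_real, v_imag = path_responses(OMEGA_T, OMEGA_T, ALPHA, TAU)
phis = np.linspace(-np.pi, np.pi, 361)
energy = motion_energy(v_real, v_imag, phis)

# Parameters of Eq. (6) and the closed form of Eq. (5).
S = abs(v_real) ** 2 + abs(v_imag) ** 2
P = 2.0 * abs(v_real * np.conj(v_imag))
Psi = np.angle(v_real * np.conj(v_imag))
closed_form = S + P * np.cos(Psi - phis)
```

Changing `omega_in` in this sketch changes `Psi`, which illustrates how the preferred phase shift of the population can track the temporal frequency of the input at a fixed spatial frequency.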
As shown in Figure 4(a) and (b), surround motion in the opposite direction gives the largest response to the motion in the CRF for the inhibitory interaction and the smallest response for the excitatory interaction. The results shown in Figure 4 are consistent with the physiological results reported in [3], in which inhibitory cells show response enhancement and excitatory cells show response suppression when surround motions are in opposite directions. The 3-dB bandwidth for the surround motion direction is about 135 degrees in the physiological experiments, while the bandwidth is about 180 degrees in the simulation results of our proposed model. The models proposed by Petkov, et al. [8] and Heeger, et al. [7] also show clear inhibition between opposite motions. Petkov's model achieves surround suppression for each point in $(x, y, t)$ space by subtracting the surround response from the response at that point, followed by half-wave rectification,

$$\tilde{E}_{v,\theta}(x, y, t) = \left| E_{v,\theta}(x, y, t) - \alpha \cdot S_{v,\theta}(x, y, t) \right|^{+} \tag{14}$$

where $E_{v,\theta}(x, y, t)$ is the motion energy at location $(x, y)$ and time $t$ for a given preferred speed $v$ and orientation $\theta$, $S_{v,\theta}(x, y, t)$ is the average motion energy in the surround of the point $(x, y, t)$, $\tilde{E}_{v,\theta}(x, y, t)$ is the suppressed motion energy, $|\cdot|^{+}$ denotes half-wave rectification, and the factor $\alpha$ controls the inhibition strength. The inhibition term is computed as a weighted motion energy

$$S_{v,\theta}(x, y, t) = E_{v,\theta}(x, y, t) * w_{v,\theta}(x, y, t) \tag{15}$$

where $*$ denotes convolution and $w_{v,\theta}(x, y, t)$ is the surround weighting function. Heeger's model constructs the center-surround motion opponent by computing the weighted sum of responses from motion selective cells,

$$R_{v,\theta}(t) = \sum_{x,y} \beta(x, y) \left[ E_{v,\theta}(x, y, t) - E_{-v,\theta}(x, y, t) \right] \tag{16}$$

where $\beta(x, y)$ is a center-surround weighting function and the motion energy at each point should be normalized across all cells with different tuning properties.
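The two comparison operators in Eq. (14) to (16) can be sketched on a single frame as follows. This is an illustrative simplification, not the implementation used in [7] or [8]: the energies are treated as 2-D maps at one time instant, the surround kernel is a hypothetical uniform 3 × 3 ring, the normalization across cells mentioned for Eq. (16) is omitted, and all function names are made up for this example.

```python
import numpy as np

def convolve2d_same(image, kernel):
    """Zero-padded 2-D convolution with output the size of `image`
    (odd kernel sizes only)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)))
    flipped = kernel[::-1, ::-1]
    out = np.zeros(image.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += flipped[i, j] * padded[i:i + image.shape[0],
                                          j:j + image.shape[1]]
    return out

def surround_suppress(E, w, alpha):
    """Eq. (14)-(15): subtract the weighted surround energy, then
    half-wave rectify."""
    S = convolve2d_same(E, w)
    return np.maximum(E - alpha * S, 0.0)

def opponent_response(E_pos, E_neg, beta):
    """Eq. (16) without normalization: weighted sum of opponent energies."""
    return float(np.sum(beta * (E_pos - E_neg)))

# Hypothetical surround kernel: uniform ring of 8 neighbors, empty center.
ring = np.full((3, 3), 1.0 / 8.0)
ring[1, 1] = 0.0

# Uniform energy: every interior point is fully cancelled by its surround.
uniform = np.ones((9, 9))
suppressed_uniform = surround_suppress(uniform, ring, alpha=1.0)

# Isolated energy: a point with an empty surround is left unchanged.
isolated = np.zeros((9, 9))
isolated[4, 4] = 1.0
suppressed_isolated = surround_suppress(isolated, ring, alpha=1.0)

# Balanced center-surround weights give zero response to uniform opponent energy.
beta = -ring.copy()
beta[1, 1] = 1.0
R_uniform = opponent_response(np.full((3, 3), 2.0), np.ones((3, 3)), beta)
```

The subtractive form can only reduce the center energy or leave it unchanged, which is in line with the later observation that the model in [8] is generally suppressive.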
For the results of Petkov's and Heeger's models shown in Figure 4(c) and (d), we replaced the conventional frequency tuned motion energy neurons with our proposed phase tuned neurons. The model by Petkov, et al. [8] is generally suppressive and only produces less suppression for opposite motions, which is inconsistent with the results in [3]. The model by Heeger, et al. [7] has properties similar to those of our proposed model with respect to both excitatory and inhibitory interactions.

To evaluate the sensitivity of the proposed opponency model to velocity differences, we ran simulations using stimuli similar to those of the experiment in Figure 4 but maintained a constant velocity difference of 3 pixels per frame between the center and surround random dot patches. As shown in Figure 5, when the velocity of the random dots in the center region is varied, the responses of the proposed model are always enhanced, independent of the absolute velocity of the center stimulus, whereas the responses of Heeger's model are enhanced at a center velocity of 1.2 pixels per frame and remain suppressed at other speeds.

5 Discussion

We proposed a new biologically plausible model of differential motion opponency to model the spatial interaction property of motion energy neurons. The proposed opponency model is motivated by the phase tuning mechanism of disparity energy neurons, which infers disparity information from the phase difference between the complex valued responses to the left and right retinal images. Hence, the two neighboring spatial areas can be considered as left and right images, and the motion difference between these two spatial regions is detected by the phase difference between the complex valued responses at these two regions.
Our experimental results agree with physiological experiments in showing that motions of opposite directions and different speeds outside the CRF can have both inhibitory and excitatory effects on the CRF responses. The inhibitory interaction helps to segment the moving object from the background when fed back to low-level features such as edges, orientations and color information. Besides providing a unifying phase mechanism for understanding neurons with different functional roles in different brain areas, the proposed opponency model could possibly provide a way to understand motion integration and motion segmentation. Integration and segmentation are two opposite motion perception tasks, but they co-exist and constitute two fundamental types of motion processing. Segmentation is achieved by discriminating motion signals from different objects, which is thought to be due to the antagonistic interaction between center and surround RFs. Integration is obtained through the enhancing effect of surround areas on CRF areas. Both types of processing have been found in motion related areas including areas MT and MST. Tadin, et al. [18] found that motion segmentation dominates at high stimulus contrast and gives way to motion integration at low stimulus contrast. Huang, et al. [19] suggest that the surround modulation is adaptive to properties of the visual stimulus such as contrast and noise level. Since our proposed opponency model determines the functional role of neurons by the phase shift parameter alone, it is a natural candidate model for adaptive surround modulation through phase tuning between two spatial regions.

References

[1] K. Nakayama and J. M. Loomis, “Optical velocity patterns, velocity-sensitive neurons, and space perception: A hypothesis,” Perception, vol. 3, pp. 63-80, 1974.
[2] K. Tanaka, K. Hikosaka, H. Saito, M. Yukie, Y. Fukada and E. Iwai, “Analysis of local and wide-field movements in the superior temporal visual areas of the macaque monkey,” Journal of Neuroscience, vol. 6, pp. 134-144, 1986.
[3] R. T. Born and R. B. H. Tootell, “Segregation of global and local motion processing in primate middle temporal visual area,” Nature, vol. 357, pp. 497-499, 1992.
[4] J. Allman, F. Miezin and E. McGuinness, “Stimulus specific responses from beyond the classical receptive field: Neurophysiological mechanisms for local-global comparisons in visual neurons,” Annual Review of Neuroscience, vol. 8, pp. 407-430, 1985.
[5] V. A. F. Lamme, “The neurophysiology of figure-ground segregation in primary visual cortex,” Journal of Neuroscience, vol. 15, pp. 1605-1615, 1995.
[6] D. C. Bradley and R. A. Andersen, “Center-surround antagonism based on disparity in primate area MT,” Journal of Neuroscience, vol. 18, pp. 7552-7565, 1998.
[7] D. J. Heeger, A. D. Jepson and E. P. Simoncelli, “Recovering observer translation with center-surround operators,” Proc. IEEE Workshop on Visual Motion, pp. 95-100, Oct. 1991.
[8] N. Petkov and E. Subramanian, “Motion detection, noise reduction, texture suppression, and contour enhancement by spatiotemporal Gabor filters with surround inhibition,” Biological Cybernetics, vol. 97, pp. 423-439, 2007.
[9] M. Escobar and P. Kornprobst, “Action recognition with a bio-inspired feedforward motion processing model: the richness of center-surround interactions,” ECCV '08: Proceedings of the 10th European Conference on Computer Vision, pp. 186-199, Marseille, France, 2008.
[10] B. J. Frost and K. Nakayama, “Single visual neurons code opposing motion independent of direction,” Science, vol. 200, pp. 744-745, 1983.
[11] A. Cao and P. H. Schiller, “Neural responses to relative speed in the primary visual cortex of rhesus monkey,” Visual Neuroscience, vol. 20, pp. 77-84, 2003.
[12] B. K. Dellen, J. W. Clark and R. Wessel, “Computing relative motion with complex cells,” Visual Neuroscience, vol. 22, pp. 225-236, 2005.
[13] I. Ohzawa, G. C. DeAngelis and R. D. Freeman, “Encoding of binocular disparity by complex cells in the cat’s visual cortex,” Journal of Neurophysiology, vol. 77, pp. 2879-2909, 1997.
[14] D. J. Fleet, H. Wagner and D. J. Heeger, “Neural encoding of binocular disparity: energy model, position shifts and phase shifts,” Vision Research, vol. 26, pp. 1839-1857, 1996.
[15] Y. C. Meng and B. E. Shi, “Normalized Phase Shift Motion Energy Neuron Populations for Image Velocity Estimation,” International Joint Conference on Neural Network, Atlanta, GA, June 14-19, 2009.
[16] E. H. Adelson and J. R. Bergen, “Spatiotemporal energy models for the perception of motion,” J. Opt. Soc. Am. A Opt. Image Sci. Vis., vol. 2, pp. 284-299, 1985.
[17] D. K. Xiao, S. Raiguel, V. Marcar, J. Koenderink and G. A. Orban, “The spatial distribution of the antagonistic surround of MT/V5,” Cereb Cortex, vol. 7, pp. 662-677, 1997.
[18] D. Tadin, J. S. Lappin, L. A. Gilroy and R. Blake, “Perceptual consequences of centre-surround antagonism in visual motion processing,” Nature, vol. 424, pp. 312-315, 2003.
[19] X. Huang, T. D. Albright and G. R. Stoner, “Adaptive surround modulation in cortical area MT,” Neuron, vol. 53, pp. 761-770, 2007.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 We extend the concept of phase tuning, a ubiquitous mechanism among sensory neurons including motion and disparity selective neurons, to the motion contrast detection. [sent-3, score-1.893]

2 We demonstrate that the motion contrast can be detected by phase shifts between motion neuronal responses in different spatial regions. [sent-4, score-1.854]

3 By constructing the differential motion opponency in response to motions in two different spatial regions, varying motion contrasts can be detected, where similar motion is detected by zero phase shifts and differences in motion by non-zero phase shifts. [sent-5, score-3.762]

4 The model can exhibit either enhancement or suppression of responses by either different or similar motion in the surrounding. [sent-6, score-0.916]

5 A primary advantage of the model is that the responses are selective to relative motion instead of absolute motion, which could model neurons found in neurophysiological experiments responsible for motion pop-out detection. [sent-7, score-1.591]

6 Motion discontinuity or motion contrast is an important cue for the pop-out of salient moving objects from contextual backgrounds. [sent-8, score-0.672]

7 Although the neural mechanism underlying the motion pop-out detection is still unknown, the center-surround receptive field (RF) organization is considered as a physiological basis responsible for the pop-out detection. [sent-9, score-0.885]

8 The center-surround RF structure is simple and ubiquitous in cortical cells especially in neurons processing motion and color information. [sent-10, score-0.881]

9 Nakayama and Loomis [1] have predicted the existence of motion selective neurons with antagonistic center-surround receptive field organization in 1974. [sent-11, score-1.018]

10 Recent physiological experiments [2][3] show that neurons with center-surround RFs have been found in both middle temporal (MT) and medial superior temporal (MST) areas related to motion processing. [sent-12, score-0.977]

11 This antagonistic mechanism has been suggested to detect motion segmentation [4], figure/ground segregation [5] and the differentiation of object motion from ego-motion [6]. [sent-13, score-1.446]

12 There are many related works [7]-[12] on motion pop-out detection. [sent-14, score-0.636]

13 Some works [7]-[9] are based on spatio-temporal filtering outputs, but motion neurons are not fully interacted by either only inhibiting similar motion [7] or only enhancing opposite motion [8]. [sent-15, score-2.092]

14 With respect to the surrounding speed effects, the neuronal responses are suppressed by the same speed with the center motion but not enhanced by other speeds. [sent-19, score-1.135]

15 Similar problem existed in [8], which only modeled the suppression of neuronal responses in the classical receptive field (CRF) by similar motions in surrounding regions. [sent-20, score-0.62]

16 Physiological experiments [10][11] show that many neurons in visual cortex are sensitive to the motion contrast rather than depend upon the absolute direction and speed of the object motion. [sent-21, score-0.9]

17 Although pooling over motion neurons tuned to different velocities can eliminate the dependence upon absolute velocities, it is computationally inefficient and still can't give full interactions of both suppression and enhancement by similar and opposite surrounding motions. [sent-22, score-1.308]

18 [12] computed differential motion responses directly from complex cells in V1 and didn't utilize responses from direction selective neurons. [sent-24, score-1.215]

19 In this paper, we propose an opponency model which directly responds to differential motions by utilizing the phase shift mechanism. [sent-25, score-0.95]

20 Disparity selective neurons in the visual cortex have been found to detect disparities by adjusting the phase shift between the receptive field organizations in the left and right eyes [13][14]. [sent-27, score-0.791]

21 Motion sensitive cells have been modeled in a similar way to disparity energy neurons: they detect image motion by utilizing the phase shift between the real and imaginary parts of temporal complex valued responses, which play a role comparable to the images in the left and right eyes [15]. [sent-28, score-1.23]

22 Therefore, the differential motion can be modeled by exploring the similarity between images from different spatial regions and from different eyes. [sent-29, score-0.883]

23 Section 2 illustrates the phase shift motion energy neurons which estimate image velocities by the phase tuning in the imaginary path of the temporal receptive field responses. [sent-31, score-2.046]

24 In section 3, we extend the concept of phase tuning to the construction of differential motion opponency. [sent-32, score-1.057]

25 The phase difference determines the preferred velocity difference between adjacent areas in retinal images. [sent-33, score-0.557]

26 Section 4 investigates properties of motion pop-out detection by the proposed motion opponency model. [sent-34, score-1.566]

27 Finally, in section 5, we relate our proposed model to the neural mechanism of motion integration and motion segmentation in motion related areas and suggest a possible interpretation for adaptive center-surround interactions observed in biological experiments. [sent-35, score-2.04]

28 2 Phase Shift Motion Energy Neurons Adelson and Bergen [16] proposed the motion energy model for visual motion perception by measuring spatio-temporal orientations of image sequences in space and time. [sent-36, score-1.515]

29 The motion energy model posits that the responses of direction-selective V1 complex cells can be computed by a combination of two linear spatio-temporal filtering stages, followed by squaring and summation. [sent-37, score-1.061]

30 The motion energy model was extended in [15] to be phase tuned by splitting the complex valued temporal responses into real and imaginary paths and adding a phase shift on the imaginary path. [sent-38, score-1.889]
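
As a rough illustration of the phase-tuned energy, the sketch below assumes the neuron output is the squared modulus of the real-path response plus the phase-shifted imaginary-path response, and checks numerically that this equals the form S + P cos(Ψ − Φ). The function names and the sample response values are hypothetical and are not taken from [15].

```python
import numpy as np

def phase_shift_energy(v_real, v_imag, phi):
    """Energy of the phase-shifted sum |V_real + exp(j*phi) * V_imag|^2."""
    return np.abs(v_real + np.exp(1j * phi) * v_imag) ** 2

def energy_closed_form(v_real, v_imag, phi):
    """The same energy written as S + P*cos(Psi - phi)."""
    s = np.abs(v_real) ** 2 + np.abs(v_imag) ** 2
    p = 2.0 * np.abs(v_real * np.conj(v_imag))
    psi = np.angle(v_real * np.conj(v_imag))
    return s + p * np.cos(psi - phi)

# Hypothetical complex responses of the two temporal paths at one location.
v_real = 0.8 * np.exp(1j * 0.3)
v_imag = 0.5 * np.exp(-1j * 1.1)

phis = np.linspace(-np.pi, np.pi, 9)          # a small population of phase tunings
direct = phase_shift_energy(v_real, v_imag, phis)
closed = energy_closed_form(v_real, v_imag, phis)
```

Under this assumption the whole population response is described by three numbers (S, P, Ψ), so the preferred phase can be read out without evaluating every phase-tuned unit.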

31 Figure 1(a) demonstrates the schematic diagram of the phase shift motion energy model. [sent-39, score-1.238]

32 The separable spatio-temporal receptive field ensures the cascade implementation of RF with spatial and temporal filters. [sent-41, score-0.287]

33 Because the temporal RF must be causal, the phase shift motion energy model does not use a Gabor filter for the temporal RF as it does for the spatial RF. [sent-42, score-1.366]
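
A minimal sketch of such a causal temporal RF, assuming a Gamma envelope that modulates quadrature sinusoids and a phase shift applied to the imaginary path only. The shape parameter alpha = 3 and the sampling grid are illustrative assumptions; the values τ = 5.5 and Ωt = 2π/8 are used only as plausible examples.

```python
import math
import numpy as np

def gamma_envelope(t, alpha, tau):
    """Causal Gamma envelope G(t | alpha, tau); zero for t < 0. Assumes alpha >= 1."""
    t = np.asarray(t, dtype=float)
    tp = np.clip(t, 0.0, None)                 # avoid negative bases in the power
    env = tp ** (alpha - 1) * np.exp(-tp / tau) / (math.gamma(alpha) * tau ** alpha)
    return np.where(t >= 0, env, 0.0)

def temporal_rf(t, phi, alpha, tau, omega_t):
    """Complex temporal RF h(t, phi) = h_real(t) + exp(j*phi) * h_imag(t)."""
    env = gamma_envelope(t, alpha, tau)
    h_real = env * np.cos(omega_t * t)
    h_imag = env * np.sin(omega_t * t)
    return h_real + np.exp(1j * phi) * h_imag

# Illustrative (assumed) parameter values.
t = np.arange(-10, 200, 0.01)
h = temporal_rf(t, phi=np.pi / 4, alpha=3.0, tau=5.5, omega_t=2 * np.pi / 8)
```

The filter is identically zero for negative time, which is the causality property that a symmetric Gabor envelope would not provide.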

34 (a) shows the diagram of the phase shift motion energy model adapted from [15]. [sent-45, score-1.238]

35 (b) draws the spatiotemporal representation of the phase shift motion energy neuron with the real and imaginary receptive field demonstrated by the two left pictures. [sent-46, score-1.425]

36 (c) illustrates the construction of differential motion opponency with a phase difference Θ from two populations of phase shift motion energy neurons in two spatial areas c and s. [sent-47, score-2.798]

37 To avoid clutter, the space location (x, y) is not explicitly shown in phase tuned motion energies. [sent-48, score-0.94]

38 Figure 1(b) demonstrates the even and odd profiles of the spatio-temporal RF tuned to a particular phase shift. [sent-54, score-0.33]

39 Two types of differential motion opponency constructions of (a) center-surrounding interaction and (b) left-right interaction. [sent-56, score-1.079]

40 Among cells in area MT with surround modulation, 25% have the antagonistic RF structure shown in the top row and another 50% have the integrative RF structure shown in the bottom row. [sent-57, score-0.483]

41 3 Extending Phase Opponency Mechanism to Differential Motion Based on the above phase shift motion energy model, the local image velocity at each spatial location can be represented by a phase shift which leads to the peak response across a population of motion energy neurons. [sent-58, score-2.741]
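
The read-out of the peak phase can be sketched as follows, assuming the population response has the cosine form S + P cos(Ψ − Φ) and the phase tunings are uniformly spaced over a full cycle. The helper name and the numbers are hypothetical; the first Fourier component is one convenient estimator, not necessarily the one used by the authors.

```python
import numpy as np

def preferred_phase(energies, phis):
    """Phase of the first Fourier component of the population response."""
    return np.angle(np.sum(energies * np.exp(1j * phis)))

# Uniformly spaced phase tunings over one cycle (assumed population layout).
phis = np.linspace(-np.pi, np.pi, 16, endpoint=False)

# Hypothetical response parameters at one location.
s, p, psi = 1.3, 0.9, 0.7
energies = s + p * np.cos(psi - phis)

peak_by_argmax = phis[np.argmax(energies)]        # limited by the phase sampling step
peak_by_fourier = preferred_phase(energies, phis)  # recovers psi for cosine tuning
```

The argmax is only accurate to within half a sampling step, whereas the Fourier estimate recovers Ψ exactly for ideal cosine tuning.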

42 The motion discontinuities can be detected by edge detectors on the velocity map to segment different motions. [sent-60, score-0.865]

43 However, this algorithm for motion discontinuities detection can’t discriminate between the object motion and uniform motions in contextual backgrounds. [sent-61, score-1.418]

44 Here we propose a phase mechanism to detect differential motions inspired by the disparity energy model and adopt the center-surround inhibition mechanism to pop out the object motion from contextual background motions. [sent-62, score-1.555]

45 The motion differences between different spatial locations can be modeled in a way similar to the disparity model. [sent-63, score-0.85]

46 The motion energies from two neighboring locations are considered as the retinal images to the left and right eyes. [sent-64, score-0.7]

47 (8), the differential motion energy can be reformulated as $E_{\Delta v}(\Theta) = \left| K_c + e^{j\Theta} K_s \right|^2$ (10) [Figure 3 plot residue: axes Left Velocity and Right Velocity] [sent-69, score-0.917]

48 The two patches of stimuli are statistically independent but share the same spatial properties: dot size of 2 pixels, dot density of 10% and dot coherence level of 100%. [sent-81, score-0.314]

49 The phase tuned population of motion energy neurons are applied to each patch of random dots with RF parameters: Ωt = 2π/8, Ωt = 2π/16, σx = 5 and τ = 5. [sent-82, score-1.328]

50 For each combination of velocities from the left and right patches, the phase shifts averaged over space and time are computed, as are the magnitudes of the peak responses. [sent-84, score-0.435]

51 where $K_c = \iiint_{x,y,\Phi} E_{v,c}(x, y, \Phi) \exp(j\Phi)\, w_c(x, y)\, dx\, dy\, d\Phi$ and $K_s = \iiint_{x,y,\Phi} E_{v,s}(x, y, \Phi) \exp(j\Phi)\, w_s(x, y)\, dx\, dy\, d\Phi$ (11); $E_{v,c}(x, y, \Phi)$ and $E_{v,s}(x, y, \Phi)$ are the phase shift motion energies at location (x, y) with phase shift Φ. [sent-86, score-1.557]
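
A discrete version of the pooled signals K_c and K_s can be sketched as a weighted sum over space and phase. The array layout, the normalized Gaussian kernel, and the uniform test energy below are assumptions made for illustration only.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Normalized 2-D Gaussian weighting kernel w(x, y) that sums to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def pooled_signal(energy, phis, weights):
    """Discrete K = sum over (x, y, phi) of E(x, y, phi) * exp(j*phi) * w(x, y).

    energy: array of shape (n_phi, H, W); phis: (n_phi,); weights: (H, W).
    """
    phasor = np.exp(1j * phis)[:, None, None]
    return np.sum(energy * phasor * weights[None, :, :])

# Hypothetical region with spatially uniform cosine-tuned energies.
phis = np.linspace(-np.pi, np.pi, 16, endpoint=False)
s, p, psi_c = 1.3, 0.9, 0.7
energy_c = (s + p * np.cos(psi_c - phis))[:, None, None] * np.ones((1, 9, 9))
weights_c = gaussian_kernel(9, sigma=2.0)

k_c = pooled_signal(energy_c, phis, weights_c)
```

For uniformly spaced phases the constant term S cancels in the sum, so the argument of K_c equals the region's preferred phase Ψ and its magnitude scales with P.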

52 The schematic diagram of this opponency is illustrated in Figure 1(c). [sent-91, score-0.32]

53 The differential motion opponency is constituted by three stages. [sent-92, score-1.04]

54 At the first stage, a population of phase shift motion energy neurons is applied to be selective to different velocities. [sent-93, score-1.417]

55 Finally, the differential motion energy is computed by the squared modulus of the summation of the integrated motion signal in region c and phase shifted motion signal in region s. [sent-95, score-2.443]
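
The final stage can be sketched as below, taking the pooled complex signals of the two regions as given. The sample values of K_c and K_s are hypothetical; the sketch compares the squared-modulus form with the cosine form S_opp + P_opp cos(Θ_opp − Θ).

```python
import numpy as np

def differential_motion_energy(k_c, k_s, theta):
    """E(theta) = |K_c + exp(j*theta) * K_s|^2 for one or many phase differences."""
    return np.abs(k_c + np.exp(1j * theta) * k_s) ** 2

def opponency_parameters(k_c, k_s):
    """Return (S_opp, P_opp, Theta_opp) describing the population response."""
    s_opp = np.abs(k_c) ** 2 + np.abs(k_s) ** 2
    p_opp = 2.0 * np.abs(k_c * np.conj(k_s))
    theta_opp = np.angle(k_c * np.conj(k_s))
    return s_opp, p_opp, theta_opp

# Hypothetical pooled signals from regions c and s.
k_c = 1.0 * np.exp(1j * 0.9)
k_s = 0.8 * np.exp(1j * 0.2)

thetas = np.linspace(-np.pi, np.pi, 33)
direct = differential_motion_energy(k_c, k_s, thetas)
s_opp, p_opp, theta_opp = opponency_parameters(k_c, k_s)
closed = s_opp + p_opp * np.cos(theta_opp - thetas)

# A unit tuned to theta = 0 responds most when both regions share the same phase.
same = differential_motion_energy(k_c, k_c, 0.0)
opposite = differential_motion_energy(k_c, -k_c, 0.0)
```

In this sketch Θ_opp is simply the difference between the phases of K_c and K_s, which is why the response depends on relative rather than absolute motion.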

56 The opponency could also be constructed by the neighboring left and right spatial regions. [Figure 4 panel titles: Inhibitive interaction, Θ = π/2; Excitatory interaction, Θ = 0] [sent-97, score-0.294]

57 Demonstrations of center-surround differential motion opponency, where (a) shows the excitation by opposite directions outside the CRF and (b) shows the inhibition by surrounding motions in the same directions. [sent-116, score-1.146]

58 Figure 2 shows two types of structures for the differential motion opponency. [sent-122, score-0.746]

59 In [17], the authors demonstrate that among cells in area MT with surround modulation, 25% have the antagonistic RF structure shown in Figure 2(a) and another 50% have the integrative RF structure shown in Figure 2(b). [sent-123, score-0.483]

60 The velocity difference tuning of the opponency is determined by the phase shift parameter Θ combined with parameters of spatial and temporal frequencies for motion energy neurons. [sent-124, score-1.908]

61 A larger phase shift magnitude corresponds to a larger preferred velocity difference. [sent-125, score-0.573]

62 This phase tuning of velocity difference is consistent with the phase tuning of motion energy neurons. [sent-126, score-1.62]

63 Figure 3 shows the phase map obtained by using random dot stimuli with different velocities on two spatial patches (left and right patches, each 128 × 128 pixels). [sent-127, score-0.744]

64 Along the diagonal line, velocities from left and right patches are equal to each other and therefore phase estimates are zeros along this line. [sent-128, score-0.447]

65 Moving away from the diagonal toward the upper left and lower right, the phase magnitudes increase; positive phases indicate larger left velocities and negative phases indicate larger right velocities. [sent-129, score-0.4]

66 The phase tuning can give a good classification of velocity differences. [sent-130, score-0.479]

67 4 Validation of Differential Motion Opponency Our derivation and analysis above show that the phase shift between two neighboring spatial regions is a good indicator of the motion difference between these two regions. [sent-131, score-1.201]

68 In this section, we validate the proposed differential motion opponency by two sets of experiments, which show effects of both surrounding directions and speeds on the center motion. [sent-132, score-1.313]

69 The insensitivity of the proposed opponency model to absolute center and surrounding velocities is demonstrated in (a), where responses are enhanced for all center velocities from -2 to 2 pixels per frame. [sent-150, score-1.055]

70 In both curves, the velocity differences between center and surrounding regions are maintained as a constant of 3 pixels per frame. [sent-155, score-0.453]

71 Physiological experiments [2][3] have demonstrated that the neuronal activities in the classical receptive field are suppressed by responses outside the CRF to stimuli with similar motions including both directions and speeds on the center and surrounding regions. [sent-156, score-0.766]

72 On the contrary, visual stimuli of opposite directions or quite different speeds outside the CRF enhance the responses in the CRF. [sent-157, score-0.344]

73 We tested the properties of the proposed opponency model for motion difference measurement by using similar random dots stimuli. [sent-159, score-1.014]

74 The random dots on the background move with different speeds and in different directions but have the same statistical parameters: dot size of 2 pixels, dot density of 10% and motion coherence level of 100%. [sent-160, score-0.855]
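
A stimulus with these statistics could be generated roughly as follows. This is a sketch under several assumptions not stated in the text: dots are grid-aligned squares, motion is horizontal with an integer velocity in pixels per frame, and the patch wraps around at its borders. The function name is hypothetical.

```python
import numpy as np

def random_dot_movie(height, width, n_frames, velocity, density=0.10,
                     dot_size=2, seed=0):
    """Binary random-dot movie with fully coherent horizontal motion.

    Assumptions: grid-aligned square dots, integer velocity in pixels per
    frame, wrap-around at the patch border.
    """
    rng = np.random.default_rng(seed)
    # Draw dot positions on a coarse grid, then expand each dot to dot_size pixels.
    coarse = rng.random((height // dot_size, width // dot_size)) < density
    frame0 = np.repeat(np.repeat(coarse, dot_size, axis=0), dot_size, axis=1)
    frame0 = frame0.astype(float)
    # 100% coherence: every dot is displaced by the same amount in each frame.
    frames = [np.roll(frame0, velocity * t, axis=1) for t in range(n_frames)]
    return np.stack(frames)

movie = random_dot_movie(128, 128, n_frames=5, velocity=2)
```

Two such patches generated with different seeds are statistically independent while sharing the same dot size, density and coherence.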

75 The small random dots patches are placed on the center of background stimuli to stimulate the neurons in the CRF. [sent-161, score-0.333]

76 Figure 4 shows results for the enhanced and suppressed responses in the CRF with varying surrounding directions. [sent-163, score-0.36]

77 The phase shift motion energy neurons had the same spatial and temporal frequencies and the same receptive field sizes, and were selective to vertical orientations. [sent-164, score-1.681]

78 As shown in Figure 4 (a) and (b), the surrounding motion of opposite direction gives the largest response to the motion in the CRF for the inhibitory interaction and the smallest response for the excitatory interaction. [sent-171, score-1.702]

79 In Born’s paper, inhibitory cells show response enhancement and excitatory cells show response suppression when surrounding motions are in opposite directions. [sent-173, score-0.798]

80 The 3-dB bandwidth for the surrounding moving direction is about 135 degrees for the physiological experiments while the bandwidth is about 180 degrees for the simulation results in our proposed model. [sent-174, score-0.329]

81 The inhibition term is computed by weighted motion energy Sv ,θ ( x, y, t ) = Ev ,θ ( x, y, t ) ∗ wv ,θ ( x, y, t ) (15) where wv ,θ ( x, y, t ) is the surround weighting function. [sent-179, score-0.91]
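
The weighted-energy inhibition term can be sketched for a single frame as a 2-D convolution of the motion energy map with a surround weighting function. The ring-shaped kernel, its parameters, and the circular (FFT-based) boundary handling are illustrative assumptions, not details taken from [7] or [8].

```python
import numpy as np

def surround_weights(size, sigma, inner_radius):
    """Hypothetical ring-shaped surround kernel: a Gaussian with its center removed."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx ** 2 + yy ** 2
    w = np.exp(-r2 / (2.0 * sigma ** 2))
    w[r2 <= inner_radius ** 2] = 0.0
    return w / w.sum()

def surround_term(energy, weights):
    """Circular 2-D convolution of an energy map with the surround kernel."""
    kh, kw = weights.shape
    padded = np.zeros(energy.shape, dtype=float)
    padded[:kh, :kw] = weights
    # Move the kernel center to the origin so the output is not shifted.
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(energy) * np.fft.fft2(padded)))

w = surround_weights(21, sigma=6.0, inner_radius=3)
uniform = surround_term(np.full((64, 64), 2.0), w)

impulse = np.zeros((64, 64))
impulse[30, 30] = 1.0
response = surround_term(impulse, w)
```

Because the kernel center is zeroed, a single active location contributes nothing to its own surround term and only to those of its neighbors.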

82 As shown in Figure 4 (c) and (d) for results of Petkov’s and Heeger’s models, we replace the conventional frequency tuned motion energy neuron with our proposed phase tuned neuron. [sent-181, score-1.161]

83 To evaluate the sensitivity of the proposed opponency model to velocity differences, we did simulations by using similar stimuli with the above experiment in Figure 4 but maintaining a constant velocity difference of 3 pixels per frame between the center and surrounding random dot patches. [sent-186, score-0.984]

84 As shown in Figure 5, by varying the velocities of random dots in the center region, we found that responses of the proposed model are always enhanced, independent of the absolute velocities of the center stimuli, whereas responses of Heeger's model achieve enhancement at a center velocity of 1. [sent-187, score-1.064]

85 5 Discussion We proposed a new biologically plausible model of the differential motion opponency to model the spatial interaction property of motion energy neurons. [sent-189, score-1.987]

86 The proposed opponency model is motivated by the phase tuning mechanism of disparity energy neurons which infers the disparity information from the phase difference between complex valued responses to left and right retinal images. [sent-190, score-1.694]

87 Hence, the two neighboring spatial areas can be considered as left and right images and the motion difference between these two spatial regions is detected by the phase difference between the complex valued responses at these two regions. [sent-191, score-1.443]

88 Our experimental results are consistent with physiological experiments showing that motions of opposite directions and different speeds outside the CRF can have both inhibitory and excitatory effects on the CRF responses. [sent-192, score-0.391]

89 Besides providing a unifying phase mechanism for understanding neurons with different functional roles in different brain areas, the proposed opponency model could possibly provide a way to understand motion integration and motion segmentation. [sent-194, score-2.026]

90 Integration and segmentation are two opposite motion perception tasks but co-exist to constitute two fundamental types of motion processing. [sent-195, score-1.375]

91 Segmentation is achieved by discriminating motion signals from different objects, which is thought to be due to the antagonistic interaction between center and surrounding RFs. [sent-196, score-0.943]

92 Both types of processing have been found in motion related areas including area MT and MST. [sent-198, score-0.671]

93 [18] have found that motion segmentation dominates at high stimulus contrast and gives way to motion integration at low stimulus contrast. [sent-200, score-1.32]

94 Since our proposed opponency model determines the functional role of neurons by the phase shift parameter alone, it is an ideal candidate model for adaptive surround modulation with phase tuning between two spatial regions. [sent-203, score-1.416]

95 Tootell, “Segregation of global and local motion processing in primate middle temporal visual area,” Nature, vol. [sent-226, score-0.733]

96 Kornprobst, “Action recognition with a Bio-inspired feedforward motion processing model: the richness of center-surround interactions,” ECCV '08: Proceedings of the 10th European Conference on Computer Vision, pp. [sent-269, score-0.636]

97 Nakayama, “Single visual neurons code opposing motion independent of direction,” Science, vol. [sent-275, score-0.813]

98 Wessel, “Computing relative motion with complex cells,” Visual Neuroscience, vol. [sent-291, score-0.664]

99 Heeger, “Neural Encoding of binocular disparity: energy model, position shifts and phase shifts,” Vision Research, vol. [sent-309, score-0.46]

100 Blake, “Perceptual consequences of centre-surround antagonism in visual motion processing,” Nature, vol. [sent-351, score-0.706]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('motion', 0.636), ('opponency', 0.294), ('phase', 0.254), ('energy', 0.171), ('velocity', 0.168), ('shift', 0.151), ('surrounding', 0.149), ('velocities', 0.146), ('responses', 0.137), ('heeger', 0.134), ('neurons', 0.133), ('motions', 0.115), ('disparity', 0.113), ('rf', 0.113), ('differential', 0.11), ('ev', 0.106), ('spatial', 0.101), ('petkov', 0.09), ('cells', 0.089), ('enhancement', 0.082), ('receptive', 0.068), ('physiological', 0.067), ('antagonistic', 0.067), ('field', 0.065), ('inhibitory', 0.064), ('crf', 0.062), ('dots', 0.061), ('suppression', 0.061), ('inhibition', 0.058), ('imaginary', 0.058), ('tuning', 0.057), ('temporal', 0.053), ('center', 0.052), ('opposite', 0.051), ('surrouding', 0.051), ('vimag', 0.051), ('vreal', 0.051), ('tuned', 0.05), ('mechanism', 0.049), ('selective', 0.049), ('pixels', 0.048), ('excitatory', 0.048), ('patches', 0.047), ('speeds', 0.045), ('surround', 0.045), ('visual', 0.044), ('ws', 0.043), ('suppressed', 0.043), ('dot', 0.042), ('wc', 0.041), ('stimuli', 0.04), ('valued', 0.039), ('interaction', 0.039), ('dxdyd', 0.038), ('himag', 0.038), ('hreal', 0.038), ('inhibitive', 0.038), ('opp', 0.038), ('popp', 0.038), ('sopp', 0.038), ('mt', 0.037), ('regions', 0.036), ('moving', 0.036), ('areas', 0.035), ('shifts', 0.035), ('energies', 0.035), ('nakayama', 0.034), ('segregation', 0.034), ('enhanced', 0.031), ('discontinuities', 0.031), ('speed', 0.031), ('detected', 0.03), ('direction', 0.029), ('retinal', 0.029), ('perception', 0.028), ('complex', 0.028), ('cortex', 0.027), ('directions', 0.027), ('utilizing', 0.026), ('diagram', 0.026), ('antagonism', 0.026), ('born', 0.026), ('dellen', 0.026), ('loomis', 0.026), ('profiles', 0.026), ('tadin', 0.026), ('vrealvimag', 0.026), ('response', 0.025), ('neuronal', 0.025), ('preferred', 0.025), ('integration', 0.024), ('segmentation', 0.024), ('neuroscience', 0.024), ('bandwidth', 0.024), ('modulation', 0.023), ('ks', 0.023), ('ubiquitous', 0.023), ('difference', 0.023), 
('population', 0.023), ('spatiotemporal', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 88 nips-2009-Extending Phase Mechanism to Differential Motion Opponency for Motion Pop-out

Author: Yicong Meng, Bertram E. Shi

Abstract: We extend the concept of phase tuning, a ubiquitous mechanism among sensory neurons including motion and disparity selective neurons, to the motion contrast detection. We demonstrate that the motion contrast can be detected by phase shifts between motion neuronal responses in different spatial regions. By constructing the differential motion opponency in response to motions in two different spatial regions, varying motion contrasts can be detected, where similar motion is detected by zero phase shifts and differences in motion by non-zero phase shifts. The model can exhibit either enhancement or suppression of responses by either different or similar motion in the surrounding. A primary advantage of the model is that the responses are selective to relative motion instead of absolute motion, which could model neurons found in neurophysiological experiments responsible for motion pop-out detection.
1 Introduction
Motion discontinuity or motion contrast is an important cue for the pop-out of salient moving objects from contextual backgrounds. Although the neural mechanism underlying the motion pop-out detection is still unknown, the center-surround receptive field (RF) organization is considered as a physiological basis responsible for the pop-out detection. The center-surround RF structure is simple and ubiquitous in cortical cells especially in neurons processing motion and color information. Nakayama and Loomis [1] have predicted the existence of motion selective neurons with antagonistic center-surround receptive field organization in 1974. Recent physiological experiments [2][3] show that neurons with center-surround RFs have been found in both middle temporal (MT) and medial superior temporal (MST) areas related to motion processing. This antagonistic mechanism has been suggested to detect motion segmentation [4], figure/ground segregation [5] and the differentiation of object motion from ego-motion [6].
There are many related works [7]-[12] on motion pop-out detection. Some works [7]-[9] are based on spatio-temporal filtering outputs, but the interactions between motion neurons are incomplete: they either only inhibit similar motion [7] or only enhance opposite motion [8]. Heeger et al. [7] proposed a center-surround operator to eliminate the response dependence upon rotational motions. But Heeger's model only shows a complete center-surround interaction for moving directions. With respect to surround speed, the neuronal responses are suppressed when the surround moves at the same speed as the center but are not enhanced by other speeds. A similar problem exists in [8], which only models the suppression of neuronal responses in the classical receptive field (CRF) by similar motions in surrounding regions. Physiological experiments [10][11] show that many neurons in visual cortex are sensitive to motion contrast rather than to the absolute direction and speed of the object motion. Although pooling over motion neurons tuned to different velocities can eliminate the dependence on absolute velocities, it is computationally inefficient and still cannot provide the full interaction of both suppression and enhancement by similar and opposite surrounding motions. The model proposed by Dellen et al. [12] computed differential motion responses directly from complex cells in V1 and did not utilize responses from direction selective neurons. In this paper, we propose an opponency model which directly responds to differential motions by utilizing the phase shift mechanism. Phase tuning is a ubiquitous mechanism in sensory information processing, including motion, disparity and depth detection. Disparity selective neurons in the visual cortex have been found to detect disparities by adjusting the phase shift between the receptive field organizations in the left and right eyes [13][14].
Motion sensitive cells have been modeled in a similar way to disparity energy neurons: they detect image motion by utilizing the phase shift between the real and imaginary parts of temporal complex valued responses, which play a role comparable to the images in the left and right eyes [15]. Therefore, the differential motion can be modeled by exploring the similarity between images from different spatial regions and from different eyes. The remainder of this paper is organized as follows. Section 2 illustrates the phase shift motion energy neurons which estimate image velocities by the phase tuning in the imaginary path of the temporal receptive field responses. In section 3, we extend the concept of phase tuning to the construction of differential motion opponency. The phase difference determines the preferred velocity difference between adjacent areas in retinal images. Section 4 investigates properties of motion pop-out detection by the proposed motion opponency model. Finally, in section 5, we relate our proposed model to the neural mechanism of motion integration and motion segmentation in motion related areas and suggest a possible interpretation for adaptive center-surround interactions observed in biological experiments.
2 Phase Shift Motion Energy Neurons
Adelson and Bergen [16] proposed the motion energy model for visual motion perception by measuring spatio-temporal orientations of image sequences in space and time. The motion energy model posits that the responses of direction-selective V1 complex cells can be computed by a combination of two linear spatio-temporal filtering stages, followed by squaring and summation. The motion energy model was extended in [15] to be phase tuned by splitting the complex valued temporal responses into real and imaginary paths and adding a phase shift on the imaginary path. Figure 1(a) shows the schematic diagram of the phase shift motion energy model.
Here we assume an input image sequence in two-dimensional space (x, y) and time t. The separable spatio-temporal receptive field ensures the cascade implementation of RF with spatial and temporal filters. Because the temporal RF must be causal, the phase shift motion energy model does not use a Gabor filter for the temporal RF as it does for the spatial RF. The phase shift spatio-temporal RF is modeled with a complex valued function $f(x, y, t) = g(x, y) \cdot h(t, \Phi)$, where the spatial and temporal RFs are denoted by $g(x, y)$ and $h(t, \Phi)$ respectively,
$g(x, y) = N(x, y \mid 0, C) \exp(j\Omega_x x + j\Omega_y y)$, $\quad h(t, \Phi) = h_{real}(t) + \exp(j\Phi)\, h_{imag}(t)$ (1)
and C is the covariance matrix of the spatial Gaussian envelope and Φ is the phase tuning of the motion energy neuron. The real and imaginary profiles of the temporal receptive field are Gamma modulated sinusoidal functions with quadrature phases,
$h_{real}(t) = G(t \mid \alpha, \tau) \cos(\Omega_t t)$, $\quad h_{imag}(t) = G(t \mid \alpha, \tau) \sin(\Omega_t t)$ (2)
The envelopes for complex exponentials are functions of Gaussian and Gamma distributions,
$N(x, y \mid 0, C) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\left( -\frac{x^2}{2\sigma_x^2} - \frac{y^2}{2\sigma_y^2} \right)$ (3)
Figure 1. (a) shows the diagram of the phase shift motion energy model adapted from [15]. (b) draws the spatiotemporal representation of the phase shift motion energy neuron with the real and imaginary receptive field demonstrated by the two left pictures. (c) illustrates the construction of differential motion opponency with a phase difference Θ from two populations of phase shift motion energy neurons in two spatial areas c and s.
To avoid clutter, the space location (x, y) is not explicitly shown in phase tuned motion energies.
$G(t \mid \alpha, \tau) = \frac{1}{\Gamma(\alpha)\tau^{\alpha}}\, t^{\alpha-1} \exp\left( -\frac{t}{\tau} \right) u(t)$ (4)
where $\Gamma(\alpha)$ is the gamma function and $u(t)$ is the unit step function. The parameters α and τ determine the temporal RF size. As derived in [15], the motion energy at location (x, y) can be computed by
$E_v(x, y, \Phi) = S + P \cos(\Psi - \Phi)$ (5)
where
$S = |V_{real}|^2 + |V_{imag}|^2$, $\quad P = 2\,|V_{real} V_{imag}^{*}|$, $\quad \Psi = \arg\left( V_{real} V_{imag}^{*} \right)$ (6)
and complex valued responses in real and imaginary paths are obtained as,
$V_{real}(x, y, t) = \iiint_{\xi,\zeta,\eta} g(\xi, \zeta)\, h_{real}(\eta)\, I(x - \xi, y - \zeta, t - \eta)\, d\xi\, d\zeta\, d\eta$, $\quad V_{imag}(x, y, t) = \iiint_{\xi,\zeta,\eta} g(\xi, \zeta)\, h_{imag}(\eta)\, I(x - \xi, y - \zeta, t - \eta)\, d\xi\, d\zeta\, d\eta$ (7)
The superscript * represents the complex conjugation and the phase shift parameter Φ controls the spatio-temporal orientation tuning. To avoid clutter, the spatial location variables x and y for S, P, Ψ, Vreal and Vimag are not explicitly shown in Eq. (5) and (6). Figure 1(b) demonstrates the even and odd profiles of the spatio-temporal RF tuned to a particular phase shift.
Figure 2. Two types of differential motion opponency constructions of (a) center-surrounding interaction and (b) left-right interaction. Among cells in area MT with surrounding modulations, 25% of cells are with the antagonistic RF structure in the top row and another 50% of cells have the integrative RF structure as shown in the bottom row.
3 Extending Phase Opponency Mechanism to Differential Motion
Based on the above phase shift motion energy model, the local image velocity at each spatial location can be represented by a phase shift which leads to the peak response across a population of motion energy neurons. Across regions of different motions, there are clear discontinuities on the estimated velocity map. The motion discontinuities can be detected by edge detectors on the velocity map to segment different motions.
However, this algorithm for motion discontinuities detection can’t discriminate between the object motion and uniform motions in contextual backgrounds. Here we propose a phase mechanism to detect differential motions inspired by the disparity energy model and adopt the center-surround inhibition mechanism to pop out the object motion from contextual background motions. The motion differences between different spatial locations can be modeled in the similar way as the disparity model. The motion energies from two neighboring locations are considered as the retinal images to the left and right eyes. Thus, we can construct a differential motion opponency by placing two populations of phase shift motion energy neurons at different spatial locations and the energy $E_{\Delta v}(\Theta)$ of the opponency is the squared modulus of the averaged phase shift motion energies over space and phase,
$E_{\Delta v}(\Theta) = \left| \iiint E_v(x, y, \Phi) \cdot w(x, y, \Phi \mid \Theta)\, dx\, dy\, d\Phi \right|^2$ (8)
where $w(x, y, \Phi \mid \Theta)$ is the profile for differential motion opponency and Δv is the velocity difference between the two spatial regions defined by the kernel $w(x, y, \Phi \mid \Theta)$. Since $w(x, y, \Phi \mid \Theta)$ is intended to implement the functional role of spatial interactions, it is desired to be a separable function in space and phase domain and can be modeled by phase tuned summation of two spatial kernels,
$w(x, y, \Phi \mid \Theta) = w_c(x, y)\, e^{j\Phi} + e^{j\Theta + j\Phi}\, w_s(x, y)$ (9)
where $w_c(x, y)$ and $w_s(x, y)$ are Gaussian kernels of different spatial sizes $\sigma_c$ and $\sigma_s$, and Θ is the phase difference representing velocity difference between two spatial regions c and s. Substituting Eq. (9) into Eq. (8), the differential motion energy can be reformulated as
$E_{\Delta v}(\Theta) = \left| K_c + e^{j\Theta} K_s \right|^2$ (10)
[Figure 3, panels (a) and (b): axes Left Velocity and Right Velocity]
Figure 3.
(a) Phase map and (b) peak magnitude map are obtained from stimuli of two patches of random dots moving with different velocities. The two patches of stimuli are statistically independent but share the same spatial properties: dot size of 2 pixels, dot density of 10% and dot coherence level of 100%. The phase tuned population of motion energy neurons are applied to each patch of random dots with RF parameters: Ωt = 2π/8, Ωt = 2π/16, σx = 5 and τ = 5.5. For each combination of velocities from the left and right patches, the phase shifts averaged over space and time are computed, as are the magnitudes of the peak responses. The unit for velocities is pixels per frame.
where
$K_c = \iiint_{x,y,\Phi} E_{v,c}(x, y, \Phi) \exp(j\Phi)\, w_c(x, y)\, dx\, dy\, d\Phi$, $\quad K_s = \iiint_{x,y,\Phi} E_{v,s}(x, y, \Phi) \exp(j\Phi)\, w_s(x, y)\, dx\, dy\, d\Phi$ (11)
$E_{v,c}(x, y, \Phi)$ and $E_{v,s}(x, y, \Phi)$ are phase shift motion energies at location (x, y) and with phase shift Φ. Utilizing the results in Eq. (5) and (6), Eq. (10) and (11) generate similar results,
$E_{\Delta v}(\Theta) = S_{opp} + P_{opp} \cos(\Theta_{opp} - \Theta)$ (12)
where
$S_{opp} = |K_c|^2 + |K_s|^2$, $\quad P_{opp} = 2\,|K_c K_s^{*}|$, $\quad \Theta_{opp} = \arg\left( K_c K_s^{*} \right)$ (13)
According to above derivations, by varying the phase shift Θ between -π and π, the relative motion energy of the differential motion opponency can be modeled as population responses across a population of phase tuned motion opponencies. The response is completely specified by three parameters $S_{opp}$, $P_{opp}$ and $\Theta_{opp}$. The schematic diagram of this opponency is illustrated in Figure 1(c). The differential motion opponency is constituted by three stages. At the first stage, a population of phase shift motion energy neurons is applied to be selective to different velocities.
At the second stage, motion energies from the first stage are weighted by kernels tuned to different spatial locations and phase shifts respectively for both spatial regions and two single differential motion signals in region c and region s are achieved by integrating responses from these two regions over space and phase tuning. Finally, the differential motion energy is computed by the squared modulus of the summation of the integrated motion signal in region c and phase shifted motion signal in region s. The subscripts c and s represent two interacted spatial regions which are not limited to the center and surround regions. The opponency could also be constructed by the neighboring left and right spatial regions.
[Figure 4 panel titles: Inhibitive interaction, Θ = π/2; Excitatory interaction, Θ = 0; Model by Petkov et al. [8]; Model by Heeger et al. [7]. Axes: Surrounding Direction (0 to 2π) and Responses]
Figure 4. Demonstrations of center-surround differential motion opponency, where (a) shows the excitation by opposite directions outside the CRF and (b) shows the inhibition by surrounding motions in the same directions. The center-surround inhibition models by Petkov et al. [8] and Heeger et al. [7] are shown in (c) and (d). Responses above 1 indicate enhancement and responses below 1 indicate suppression.
Figure 2 shows two types of structures for the differential motion opponency. In [17], the authors demonstrate that among cells in area MT with surround modulation, 25% have the antagonistic RF structure shown in Figure 2(a) and another 50% have the integrative RF structure shown in Figure 2(b).
The velocity difference tuning of the opponency is determined by the phase shift parameter Θ combined with the spatial and temporal frequency parameters of the motion energy neurons. A larger phase shift magnitude corresponds to a larger preferred velocity difference. This phase tuning of velocity difference is consistent with the phase tuning of motion energy neurons. Figure 3 shows the phase map obtained by using random dot stimuli with different velocities on two spatial patches (left and right patches, each 128 × 128 pixels). Along the diagonal line, the velocities of the left and right patches are equal and therefore the phase estimates are zero. Moving away from the diagonal toward the upper-left and lower-right, the phase magnitudes increase; positive phases indicate larger left velocities and negative phases indicate larger right velocities. The phase tuning therefore gives a good classification of velocity differences.

4 Validation of Differential Motion Opponency

Our derivation and analysis above show that the phase shift between two neighboring spatial regions is a good indicator of the motion difference between these two regions. In this section, we validate the proposed differential motion opponency with two sets of experiments, which show the effects of both surrounding directions and speeds on the center motion.

[Figure 5 plots: responses (0 to 2) versus center speed (-2 to 2), panels (a) and (b).]

Figure 5. The insensitivity of the proposed opponency model to absolute center and surrounding velocities is demonstrated in (a), where responses are enhanced for all center velocities from -2 to 2 pixels per frame. In (b), the model by Heeger, et al. [7] only shows enhancement when the center speed matches the preferred speed of 1.2 pixels per frame. As before, responses above 1 indicate enhancement and responses below 1 indicate suppression.
In both curves, the velocity difference between the center and surrounding regions is held constant at 3 pixels per frame.

Physiological experiments [2][3] have demonstrated that neuronal activity in the classical receptive field (CRF) is suppressed by responses outside the CRF when the center and surrounding regions contain similar motions, in both direction and speed. In contrast, visual stimuli with opposite directions or quite different speeds outside the CRF enhance the responses in the CRF. Those experiments used stimuli of random dots moving at different velocities, with small patches of moving random dots at the center. We tested the properties of the proposed opponency model for motion difference measurement by using similar random dot stimuli. The random dots in the background move with different speeds and in different directions but have the same statistical parameters: dot size of 2 pixels, dot density of 10% and motion coherence level of 100%. The small random dot patches are placed at the center of the background stimuli to stimulate the neurons in the CRF. These small patches share the same statistical parameters with the background random dots but move with a constant velocity of 1 pixel per frame. Figure 4 shows results for the enhanced and suppressed responses in the CRF with varying surrounding directions. The phase shift motion energy neurons had the same spatial and temporal frequencies and the same receptive field sizes, and were selective to vertical orientations. The preferred spatial frequency was 2π/16 radians per pixel and the temporal frequency was 2π/16 radians per frame. The sizes of the RF in the horizontal and vertical directions were 5 pixels and 10 pixels respectively, corresponding to a spatial bandwidth of 1.96 octaves. The time constant τ was 5.5 frames, which resulted in a temporal bandwidth of 1.96 octaves.
As shown in Figure 4 (a) and (b), the surrounding motion of opposite direction gives the largest response to the motion in the CRF for the inhibitory interaction and the smallest response for the excitatory interaction. Results demonstrated in Figure 4 are consistent with physiological results reported in [3]. In Born and Tootell’s paper [3], inhibitory cells show response enhancement and excitatory cells show response suppression when surrounding motions are in opposite directions. The 3-dB bandwidth for the surrounding moving direction is about 135 degrees for the physiological experiments while the bandwidth is about 180 degrees for the simulation results in our proposed model. Models proposed by Petkov, et al. [8] and Heeger, et al. [7] also show clear inhibition between opposite motions. Petkov’s model achieves the surround suppression for each point in (x, y, t) space by subtracting the surround response from the response at that point, followed by a half-wave rectification,

$$\tilde{E}_{v,\theta}(x, y, t) = \left| E_{v,\theta}(x, y, t) - \alpha \cdot S_{v,\theta}(x, y, t) \right|^{+} \qquad (14)$$

where $E_{v,\theta}(x, y, t)$ is the motion energy at location (x, y) and time t for a given preferred speed v and orientation θ, $S_{v,\theta}(x, y, t)$ is the average motion energy in the surrounding of point (x, y, t), $\tilde{E}_{v,\theta}(x, y, t)$ is the suppressed motion energy and the factor α controls the inhibition strength. The inhibition term is computed from the weighted motion energy

$$S_{v,\theta}(x, y, t) = E_{v,\theta}(x, y, t) * w_{v,\theta}(x, y, t) \qquad (15)$$

where $w_{v,\theta}(x, y, t)$ is the surround weighting function. Heeger’s model constructs the center-surround motion opponent by computing the weighted sum of responses from motion selective cells,

$$R_{v,\theta}(t) = \sum_{x,y} \beta(x, y) \left[ E_{v,\theta}(x, y, t) - E_{-v,\theta}(x, y, t) \right] \qquad (16)$$

where $\beta(x, y)$ is a center-surround weighting function and the motion energy at each point should be normalized across all cells with different tuning properties.
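The two comparison models in Eq. (14) to (16) can be sketched in a few lines. This is a simplified illustration, not the implementation used for Figure 4: it works on a single 2D frame instead of the full (x, y, t) volume, uses circular boundary handling, and omits the normalization across cells mentioned for Heeger's model. All function names and the example weights are hypothetical.

```python
import numpy as np

def surround_average(E, w):
    """Circular 2D convolution of an energy map E with surround weights w.

    Stands in for Eq. (15) on one frame; w must have odd height and width.
    """
    S = np.zeros_like(E, dtype=float)
    ry, rx = w.shape[0] // 2, w.shape[1] // 2
    for dy in range(-ry, ry + 1):
        for dx in range(-rx, rx + 1):
            # np.roll shifts E so that S[y, x] accumulates w[dy, dx] * E[y - dy, x - dx]
            S += w[dy + ry, dx + rx] * np.roll(E, shift=(dy, dx), axis=(0, 1))
    return S

def petkov_suppression(E, w, alpha):
    """Eq. (14): subtract the weighted surround, then half-wave rectify."""
    return np.maximum(E - alpha * surround_average(E, w), 0.0)

def heeger_opponent(E_pos, E_neg, beta):
    """Eq. (16): weighted sum of opponent energies (preferred minus opposite)."""
    return float(np.sum(beta * (E_pos - E_neg)))

# Hypothetical 3x3 surround: zero at the center, uniform ring summing to one.
ring = np.full((3, 3), 1.0 / 8.0)
ring[1, 1] = 0.0

uniform = np.ones((5, 5))      # same energy everywhere
isolated = np.zeros((5, 5))    # energy only at one point
isolated[2, 2] = 1.0

suppressed_uniform = petkov_suppression(uniform, ring, alpha=1.0)
suppressed_isolated = petkov_suppression(isolated, ring, alpha=1.0)
```

With these toy inputs a uniform field is fully suppressed, while an isolated response with an empty surround passes through unchanged, which is the qualitative behaviour the subtraction in Eq. (14) is meant to produce.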
For the results of Petkov’s and Heeger’s models shown in Figure 4 (c) and (d), we replace the conventional frequency tuned motion energy neuron with our proposed phase tuned neuron. The model by Petkov, et al. [8] is generally suppressive and only produces less suppression for opposite motions, which is inconsistent with the results from [3]. The model by Heeger, et al. [7] has properties similar to those of our proposed model with respect to both excitatory and inhibitory interactions. To evaluate the sensitivity of the proposed opponency model to velocity differences, we ran simulations using stimuli similar to those of the experiment in Figure 4, but maintaining a constant velocity difference of 3 pixels per frame between the center and surrounding random dot patches. As shown in Figure 5, by varying the velocities of the random dots in the center region, we found that responses of the proposed model are always enhanced, independent of the absolute velocity of the center stimuli, whereas responses of Heeger’s model are enhanced only at a center velocity of 1.2 pixels per frame and remain suppressed at other speeds.

5 Discussion

We proposed a new biologically plausible model of differential motion opponency to model the spatial interaction property of motion energy neurons. The proposed opponency model is motivated by the phase tuning mechanism of disparity energy neurons, which infers the disparity information from the phase difference between complex valued responses to left and right retinal images. Hence, the two neighboring spatial areas can be considered as left and right images, and the motion difference between these two spatial regions is detected by the phase difference between the complex valued responses at these two regions.
Our experimental results are consistent with physiological experiments in showing that motions with opposite directions and different speeds outside the CRF can have both inhibitive and excitatory effects on the CRF responses. The inhibitive interaction helps to segment the moving object from backgrounds when fed back to low-level features such as edges, orientations and color information. Besides providing a unifying phase mechanism for understanding neurons with different functional roles in different brain areas, the proposed opponency model could possibly provide a way to understand motion integration and motion segmentation. Integration and segmentation are two opposite motion perception tasks but co-exist to constitute two fundamental types of motion processing. Segmentation is achieved by discriminating motion signals from different objects, which is thought to be due to the antagonistic interaction between center and surrounding RFs. Integration is obtained by utilizing the enhancing function of surrounding areas on CRF areas. Both types of processing have been found in motion related areas including areas MT and MST. Tadin, et al. [18] have found that motion segmentation dominates at high stimulus contrast and gives way to motion integration at low stimulus contrast. Huang, et al. [19] suggest that the surround modulation is adaptive to properties of the visual stimulus such as contrast and noise level. Since our proposed opponency model determines the functional role of neurons by the phase shift parameter alone, it is a natural candidate model for adaptive surround modulation through phase tuning between two spatial regions.

References [1]. K. Nakayama and J. M. Loomis, “Optical velocity patterns, velocity-sensitive neurons, and space perception: A hypothesis,” Perception, vol. 3, pp. 63-80, 1974. [2]. K. Tanaka, K. Hikosaka, H. Saito, M. Yukie, Y. Fukada and E.
Iwai, “Analysis of local and wide-field movements in the superior temporal visual areas of the macaque monkey,” Journal of Neuroscience, vol. 6, pp. 134-144, 1986. [3]. R. T. Born and R. B. H. Tootell, “Segregation of global and local motion processing in primate middle temporal visual area,” Nature, vol. 357, pp. 497-499, 1992. [4]. J. Allman, F. Miezin and E. McGuinness, “Stimulus specific responses from beyond the classical receptive field: Neurophysiological mechanisms for local-global comparisions in visual neurons,” Annual Review Neuroscience, vol. 8, pp. 407-430, 1985. [5]. V. A. F. Lamme, “The neurophysiology of figure-ground segregation in primary visual cortex,” Journal of Neuroscience, vol. 15, pp. 1605-1615, 1995. [6]. D. C. Bradley and R. A. Andersen, “Center-surround antagonism based on disparity in primate area MT,” Journal of Neuroscience, vol. 18, pp. 7552-65, 1998. [7]. D. J. Heeger, A. D. Jepson and E. P. Simoncelli, “Recovering observer translation with center-surround operators,” Proc IEEE Workshop on Visual Motion, pp. 95-100, Oct 1991. [8]. N. Petkov and E. Subramanian, “Motion detection, noise reduction, texture suppression, and contour enhancement by spatiotemporal Gabor filters with surround inhibition,” Biological Cybernetics, vol. 97, pp. 423-439, 2007. [9]. M. Escobar and P. Kornprobst, “Action recognition with a Bio-inspired feedforward motion processing model: the richness of center-surround interactions,” ECCV '08: Proceedings of the 10th European Conference on Computer Vision, pp. 186-199, Marseille, France, 2008. [10]. B. J. Frost and K. Nakayama, “Single visual neurons code opposing motion independent of direction,” Science, vol. 200, pp. 744-745, 1983. [11]. A. Cao and P. H. Schiller, “Neural responses to relative speed in the primary visual cortex of rhesus monkey,” Visual Neuroscience, vol. 20, pp. 77-84, 2003. [12]. B. K. Dellen, J. W. Clark and R. 
Wessel, “Computing relative motion with complex cells,” Visual Neuroscience, vol. 22, pp. 225-236, 2005. [13]. I. Ohzawa, G. C. Deangelis and R. D. Freeman, “Encoding of binocular disparity by complex cells in the cat’s visual cortex,” Journal of Neurophysiology, vol. 77, pp. 2879-2909, 1997. [14]. D. J. Fleet, H. Wagner and D. J. Heeger, “Neural Encoding of binocular disparity: energy model, position shifts and phase shifts,” Vision Research, vol. 26, pp. 1839-1857, 1996. [15]. Y. C. Meng and B. E. Shi, “Normalized Phase Shift Motion Energy Neuron Populations for Image Velocity Estimation,” International Joint Conference on Neural Network, Atlanta, GA, June 14-19, 2009. [16]. E. H. Adelson and J. R. Bergen, “Spatiotemporal energy models for the perception of motion,” J. Opt. Soc. Am. A Opt. Image Sci. Vis., vol. 2, pp. 284-299, 1985. [17]. D. K. Xiao, S. Raiguel, V. Marcar, J. Koenderink and G. A. Orban, “The spatial distribution of the antagonistic surround of MT/V5,” Cereb Cortex, vol. 7, pp. 662-677, 1997. [18]. D. Tadin, J. S. Lappin, L. A. Gilroy and R. Blake, “Perceptual consequences of centre-surround antagonism in visual motion processing,” Nature, vol. 424, pp. 312-315, 2003. [19]. X. Huang, T. D. Albright and G. R. Stoner, “Adaptive surround modulation in cortical area MT,” Neuron, vol. 53, pp. 761-770, 2007.

2 0.23390618 243 nips-2009-The Ordered Residual Kernel for Robust Motion Subspace Clustering

Author: Tat-jun Chin, Hanzi Wang, David Suter

Abstract: We present a novel and highly effective approach for multi-body motion segmentation. Drawing inspiration from robust statistical model fitting, we estimate putative subspace hypotheses from the data. However, instead of ranking them we encapsulate the hypotheses in a novel Mercer kernel which elicits the potential of two point trajectories to have emerged from the same subspace. The kernel permits the application of well-established statistical learning methods for effective outlier rejection, automatic recovery of the number of motions and accurate segmentation of the point trajectories. The method operates well under severe outliers arising from spurious trajectories or mistracks. Detailed experiments on a recent benchmark dataset (Hopkins 155) show that our method is superior to other state-of-the-art approaches in terms of recovering the number of motions, segmentation accuracy, robustness against gross outliers and computational efficiency.

1 Introduction¹

Multi-body motion segmentation concerns the separation of motions arising from multiple moving objects in a video sequence. The input data is usually a set of points on the surface of the objects which are tracked throughout the video sequence. Motion segmentation can serve as a useful preprocessing step for many computer vision applications. In recent years the case of rigid (i.e. non-articulated) objects for which the motions could be semi-dependent on each other has received much attention [18, 14, 19, 21, 22, 17]. Under this domain the affine projection model is usually adopted. Such a model implies that the point trajectories from a particular motion lie on a linear subspace of dimension at most four, and trajectories from different motions lie on distinct subspaces. Thus multi-body motion segmentation is reduced to the problem of subspace segmentation or clustering.
To realize practical algorithms, motion segmentation approaches should possess four desirable attributes: (1) Accuracy in classifying the point trajectories to the motions they respectively belong to. This is crucial for success in the subsequent vision applications, e.g. object recognition, 3D reconstruction. (2) Robustness against inlier noise (e.g. slight localization error) and gross outliers (e.g. mistracks, spurious trajectories), since getting imperfect data is almost always unavoidable in practical circumstances. (3) Ability to automatically deduce the number of motions in the data. This is pivotal to accomplish fully automated vision applications. (4) Computational efficiency. This is integral for the processing of video sequences, which usually involve large amounts of data. Recent work on multi-body motion segmentation can roughly be divided into algebraic or factorization methods [3, 19, 20], statistical methods [17, 7, 14, 6, 10] and clustering methods [22, 21, 5]. Notable approaches include Generalized PCA (GPCA) [19, 20], an algebraic method based on the idea that one can fit a union of m subspaces with a set of polynomials of degree m. Statistical methods often employ concepts such as random hypothesis generation [4, 17], Expectation-Maximization [14, 6] and geometric model selection [7, 8]. Clustering based methods [22, 21, 5] are also gaining attention due to their effectiveness. They usually include a dimensionality reduction step (e.g. manifold learning [5]) followed by a clustering of the point trajectories (e.g. via spectral clustering in [21]). A recent benchmark [18] indicated that Local Subspace Affinity (LSA) [21] gave the best performance in terms of classification accuracy, although its result was subsequently surpassed by [5, 10].

¹ This work was supported by the Australian Research Council (ARC) under the project DP0878801.
However, we argue that most of the previous approaches do not simultaneously fulfil the qualities desirable of motion segmentation algorithms. Most notably, although some of the approaches have the means to estimate the number of motions, they are generally unreliable in this respect and require manual input of this parameter. In fact this prior knowledge was given to all the methods compared in [18]². Secondly, most of the methods (e.g. [19, 5]) do not explicitly deal with outliers. They will almost always break down when given corrupted data. These deficiencies reduce the usefulness of available motion segmentation algorithms in practical circumstances. In this paper we attempt to bridge the gap between experimental performance and practical usability. Our previous work [2] indicates that robust multi-structure model fitting can be achieved effectively with statistical learning. Here we extend this concept to motion subspace clustering. Drawing inspiration from robust statistical model fitting [4], we estimate random hypotheses of motion subspaces in the data. However, instead of ranking these hypotheses we encapsulate them in a novel Mercer kernel. The kernel can function reliably despite overwhelming sampling imbalance, and it permits the application of non-linear dimensionality reduction techniques to effectively identify and reject outlying trajectories. This is then followed by Kernel PCA [11] to maximize the separation between groups and spectral clustering [13] to recover the number of motions and the clustering. Experiments on the Hopkins 155 benchmark dataset [18] show that our method is superior to other approaches in terms of the qualities described above, including computational efficiency.

1.1 Brief review of affine model multi-body motion segmentation

Let $\{t_{fp} \in \mathbb{R}^2\}_{f=1,\dots,F}^{p=1,\dots,P}$ be the set of 2D coordinates of P trajectories tracked across F frames. In multi-body motion segmentation the $t_{fp}$’s correspond to points on the surface of rigid objects which are moving. The goal is to separate the trajectories into groups corresponding to the motion they belong to. In other words, if we arrange the coordinates in the following data matrix

$$T = \begin{bmatrix} t_{11} & \cdots & t_{1P} \\ \vdots & \ddots & \vdots \\ t_{F1} & \cdots & t_{FP} \end{bmatrix} \in \mathbb{R}^{2F \times P}, \qquad (1)$$

the goal is to find the permutation $\Gamma \in \mathbb{R}^{P \times P}$ such that the columns of $T \cdot \Gamma$ are arranged according to the respective motions they belong to. It turns out that under affine projection [1, 16] trajectories from the same motion lie on a distinct subspace in $\mathbb{R}^{2F}$, and each of these motion subspaces is of dimension 2, 3 or 4. Thus motion segmentation can be accomplished via clustering subspaces in $\mathbb{R}^{2F}$. See [1, 16] for more details. Realistically, actual motion sequences might contain trajectories which do not correspond to valid objects or motions. These trajectories behave as outliers in the data and, if not taken into account, can be seriously detrimental to subspace clustering algorithms.

2 The Ordered Residual Kernel (ORK)

First, we take a statistical model fitting point of view to motion segmentation. Let $\{x_i\}_{i=1,\dots,N}$ be the set of N samples on which we want to perform model fitting. We randomly draw p-subsets from the data and use each of them to fit a hypothesis of the model, where p is the number of parameters that define the model. In motion segmentation, the $x_i$’s are the columns of matrix T, and p = 4 since the model is a four-dimensional subspace³. Assume that M such random hypotheses are drawn. For each data point $x_i$ compute its absolute residual set $r_i = \{r^i_1, \dots, r^i_M\}$ as measured to the M hypotheses. For motion segmentation, the residual is the orthogonal distance to a hypothesis subspace. We sort the elements in $r_i$ to obtain the sorted residual set $\tilde{r}_i = \{r^i_{\lambda^i_1}, \dots, r^i_{\lambda^i_M}\}$, where the permutation $\{\lambda^i_1, \dots, \lambda^i_M\}$ is obtained such that $r^i_{\lambda^i_1} \le \dots \le r^i_{\lambda^i_M}$. Define the following

$$\tilde{\theta}_i := \{\lambda^i_1, \dots, \lambda^i_M\} \qquad (2)$$

as the sorted hypothesis set of point $x_i$, i.e. $\tilde{\theta}_i$ depicts the order in which $x_i$ becomes the inlier of the M hypotheses as a fictitious inlier threshold is increased from 0 to ∞. We define the Ordered Residual Kernel (ORK) between two data points as

$$k_{\tilde{r}}(x_{i_1}, x_{i_2}) := \frac{1}{Z} \sum_{t=1}^{M/h} z_t \cdot k^t_\cap(\tilde{\theta}_{i_1}, \tilde{\theta}_{i_2}), \qquad (3)$$

where $z_t = 1/t$ are the terms of the harmonic series and $Z = \sum_{t=1}^{M/h} z_t$ is the (M/h)-th harmonic number. Without loss of generality assume that M is wholly divisible by h. Step size h is used to obtain the Difference of Intersection Kernel (DOIK)

$$k^t_\cap(\tilde{\theta}_{i_1}, \tilde{\theta}_{i_2}) := \frac{1}{h} \left( |\tilde{\theta}^{1:\alpha_t}_{i_1} \cap \tilde{\theta}^{1:\alpha_t}_{i_2}| - |\tilde{\theta}^{1:\alpha_{t-1}}_{i_1} \cap \tilde{\theta}^{1:\alpha_{t-1}}_{i_2}| \right) \qquad (4)$$

where $\alpha_t = t \cdot h$ and $\alpha_{t-1} = (t-1) \cdot h$. Symbol $\tilde{\theta}^{a:b}_i$ indicates the set formed by the a-th to the b-th elements of $\tilde{\theta}_i$. Since the contents of the sorted hypotheses set are merely permutations of {1 . . . M}, i.e. there are no repeating elements,

$$0 \le k_{\tilde{r}}(x_{i_1}, x_{i_2}) \le 1. \qquad (5)$$

Note that $k_{\tilde{r}}$ is independent of the type of model to be fitted, thus it is applicable to generic statistical model fitting problems. However, we concentrate on motion subspaces in this paper. Let τ be a fictitious inlier threshold. The kernel $k_{\tilde{r}}$ captures the intuition that, if τ is low, two points arising from the same subspace will have high normalized intersection since they share many common hypotheses which correspond to that subspace. If τ is high, implausible hypotheses fitted on outliers start to dominate and decrease the normalized intersection.

² As confirmed through private contact with the authors of [18].

³ Ideally we should also consider degenerate motions with subspace dimensions 2 or 3, but previous work [18] using RANSAC [4] and our results suggest this is not a pressing issue for the Hopkins 155 dataset.
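The kernel in Eq. (2) to (4) can be sketched directly from a matrix of residuals. The code below is a minimal illustration, assuming the residuals of every point to every hypothesis have already been computed (hypothesis sampling and subspace fitting are omitted); the function name and the toy residuals are hypothetical.

```python
import numpy as np

def ordered_residual_kernel(residuals, h):
    """Compute the ORK matrix from an (N, M) array of absolute residuals.

    residuals[i, m] is the residual of point i to hypothesis m; h must divide M.
    """
    N, M = residuals.shape
    if M % h != 0:
        raise ValueError("M must be divisible by the step size h")
    T = M // h

    # Sorted hypothesis sets (Eq. 2): order[i] lists hypothesis indices by residual.
    order = np.argsort(residuals, axis=1)
    # rank[i, m] is the position of hypothesis m in the ordering of point i.
    rank = np.empty_like(order)
    rank[np.arange(N)[:, None], order] = np.arange(M)[None, :]

    z = 1.0 / np.arange(1, T + 1)   # harmonic weights z_t = 1/t
    K = np.zeros((N, N))
    prev = np.zeros((N, N))
    for t in range(1, T + 1):
        # Indicator of membership in the first alpha_t = t*h sorted hypotheses.
        top = (rank < t * h).astype(float)
        inter = top @ top.T          # pairwise intersection sizes
        K += z[t - 1] * (inter - prev) / h   # DOIK term of Eq. (4)
        prev = inter
    return K / z.sum()               # normalization by Z in Eq. (3)

# Toy example: two points with exactly reversed hypothesis orderings, M = 4, h = 2.
toy = np.array([[0.1, 0.2, 0.3, 0.4],
                [0.4, 0.3, 0.2, 0.1]])
K_toy = ordered_residual_kernel(toy, h=2)
```

In the toy case the two points share no hypotheses among their first two, so all of their intersection arrives in the second, lower-weighted step and the off-diagonal value is 2/3, while each point has kernel value 1 with itself.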
Step size h allows us to quantify the rate of change of intersection if τ is increased from 0 to ∞, and since $z_t$ is decreasing, $k_{\tilde{r}}$ will evaluate to a high value for two points from the same subspace. In contrast, $k_{\tilde{r}}$ is always low for points not from the same subspace or that are outliers.

Proof of satisfying Mercer’s condition. Let D be a fixed domain, and P(D) be the power set of D, i.e. the set of all subsets of D. Let S ⊆ P(D), and p, q ∈ S. If µ is a measure on D, then

$$k_\cap(p, q) = \mu(p \cap q), \qquad (6)$$

called the intersection kernel, is provably a valid Mercer kernel [12]. The DOIK can be rewritten as

$$k^t_\cap(\tilde{\theta}_{i_1}, \tilde{\theta}_{i_2}) = \frac{1}{h} \left( |\tilde{\theta}^{(\alpha_{t-1}+1):\alpha_t}_{i_1} \cap \tilde{\theta}^{(\alpha_{t-1}+1):\alpha_t}_{i_2}| + |\tilde{\theta}^{1:\alpha_{t-1}}_{i_1} \cap \tilde{\theta}^{(\alpha_{t-1}+1):\alpha_t}_{i_2}| + |\tilde{\theta}^{(\alpha_{t-1}+1):\alpha_t}_{i_1} \cap \tilde{\theta}^{1:\alpha_{t-1}}_{i_2}| \right). \qquad (7)$$

If we let D = {1 . . . M} be the set of all possible hypothesis indices and µ be uniform on D, each term in Eq. (7) is simply an intersection kernel multiplied by |D|/h. Since multiplying a kernel with a positive constant and adding two kernels respectively produce valid Mercer kernels [12], the DOIK and ORK are also valid Mercer kernels. ∎

Parameter h in $k_{\tilde{r}}$ depends on the number of random hypotheses M, i.e. step size h can be set as a ratio of M. The value of M can be determined based on the size of the p-subset and the size of the data N (e.g. [23, 15]), and thus h is not contingent on knowledge of the true inlier noise scale or threshold. Moreover, our experiments in Sec. 4 show that segmentation performance is relatively insensitive to the settings of h and M.

2.1 Performance under sampling imbalance

Methods based on random sampling (e.g. RANSAC [4]) are usually affected by unbalanced datasets. The probability of simultaneously retrieving p inliers from a particular structure is tiny if points from that structure represent only a small minority in the data.
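To make the scale of this effect concrete, the chance of drawing a "pure" p-subset can be computed in closed form. The sketch below assumes uniform sampling without replacement, which the text does not state explicitly, and the function names and example counts are hypothetical.

```python
from math import comb

def pure_subset_probability(n_total, n_structure, p):
    """Probability that p points drawn without replacement all lie on one structure."""
    if n_structure < p:
        return 0.0
    return comb(n_structure, p) / comb(n_total, p)

def expected_draws(n_total, n_structure, p):
    """Expected number of random p-subsets until the first pure one (geometric mean)."""
    prob = pure_subset_probability(n_total, n_structure, p)
    return float("inf") if prob == 0.0 else 1.0 / prob

# Hypothetical sequence: 300 trajectories, 30 of them on a small foreground object.
prob_small = pure_subset_probability(300, 30, 4)
draws_small = expected_draws(300, 30, 4)
```

Because the probability falls off roughly as the fourth power of the minority fraction when p = 4, halving the size of the small structure multiplies the expected number of draws by well over ten.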
In an unbalanced dataset the “pure” p-subsets in the M randomly drawn samples will be dominated by points from the majority structure in the data. This is a pronounced problem in motion sequences, since there is usually a background “object” whose point trajectories form a large majority in the data. In fact, for motion sequences from the Hopkins 155 dataset [18] with typically about 300 points per sequence, M has to be raised to about 20,000 before a pure p-subset from the non-background objects is sampled. However, ORK can function reliably despite serious sampling imbalance. This is because points from the same subspace are roughly equidistant to the sampled hypotheses in their vicinity, even though these hypotheses might not pass through that subspace. Moreover, since $z_t$ in Eq. (3) is decreasing, only residuals/hypotheses in the vicinity of a point are heavily weighted in the intersection. Fig. 1(a) illustrates this condition. Results in Sec. 4 show that ORK excelled even with M = 1,000.

Figure 1: (a) ORK under sampling imbalance (data in $\mathbb{R}^{2F}$). (b) Data in the RKHS $\mathcal{F}_{k_{\tilde{r}}}$ induced by ORK.

3 Multi-Body Motion Segmentation using ORK

In this section, we describe how ORK is used for multi-body motion segmentation.

3.1 Outlier rejection via non-linear dimensionality reduction

Denote by $\mathcal{F}_{k_{\tilde{r}}}$ the Reproducing Kernel Hilbert Space (RKHS) induced by $k_{\tilde{r}}$. Let matrix $A = [\phi(x_1) \dots \phi(x_N)]$ contain the input data after it is mapped to $\mathcal{F}_{k_{\tilde{r}}}$. The kernel matrix $K = A^T A$ is computed using the kernel function $k_{\tilde{r}}$ as

$$K_{p,q} = \langle \phi(x_p), \phi(x_q) \rangle = k_{\tilde{r}}(x_p, x_q), \quad p, q \in \{1 \dots N\}. \qquad (8)$$

Since $k_{\tilde{r}}$ is a valid Mercer kernel, K is guaranteed to be positive semi-definite [12]. Let $K = Q \Delta Q^T$ be the eigenvalue decomposition (EVD) of K. Then the rank-n Kernel Singular Value Decomposition (Kernel SVD) [12] of A is

$$A_n = [A Q^n (\Delta^n)^{-\frac{1}{2}}]\,[(\Delta^n)^{\frac{1}{2}}]\,[(Q^n)^T] \equiv U_n \Sigma_n (V_n)^T. \qquad (9)$$

Via the Matlab notation, $Q^n = Q_{:,1:n}$ and $\Delta^n = \Delta_{1:n,1:n}$.
The left singular vectors $U_n$ form an orthonormal basis for the n-dimensional principal subspace of the whole dataset in $\mathcal{F}_{k_{\tilde{r}}}$. Projecting the data onto the principal subspace yields

$$B = [A Q^n (\Delta^n)^{-\frac{1}{2}}]^T A = (\Delta^n)^{\frac{1}{2}} (Q^n)^T, \qquad (10)$$

where $B = [b_1 \dots b_N] \in \mathbb{R}^{n \times N}$ is the reduced dimension version of A. Directions of the principal subspace are dominated by inlier points, since $k_{\tilde{r}}$ generally evaluates to a high value for them, but always to a low value for gross outliers. Moreover the kernel ensures that points from the same subspace are mapped to the same cluster and vice versa. Fig. 1(b) illustrates this condition. Fig. 2(a)(left) shows the first frame of sequence “cars10” from the Hopkins 155 dataset [18] with 100 false trajectories of Brownian motion added to the original data (297 points). The corresponding RKHS norm histogram for n = 3 is displayed in Fig. 2(b). The existence of two distinct modes, corresponding respectively to inliers and outliers, is evident. We exploit this observation for outlier rejection by discarding data with low norms in the principal subspace. The cut-off threshold ψ can be determined by analyzing the shape of the distribution. For instance we can fit a 1D Gaussian Mixture Model (GMM) with two components and set ψ as the point of equal Mahalanobis distance between the two components. However, our experimentation shows that an effective threshold can be obtained by simply setting ψ as the average value of all the norms, i.e.

$$\psi = \frac{1}{N} \sum_{i=1}^{N} \|b_i\|. \qquad (11)$$

Figure 2: Demonstration of outlier rejection on sequence “cars10” from Hopkins 155. (a) (left) Before and (right) after outlier removal. Blue dots are inliers while red dots are added outliers. (b) Actual norm histogram of “cars10” (bin count versus vector norm in principal subspace, with separate outlier and inlier modes).

This method was applied uniformly on all the sequences in our experiments in Sec. 4. Fig.
2(a)(right) shows an actual result of the method on Fig. 2(a)(left).

3.2 Recovering the number of motions and subspace clustering

After outlier rejection, we further take advantage of the mapping induced by ORK for recovering the number of motions and subspace clustering. On the remaining data, we perform Kernel PCA [11] to seek the principal components which maximize the variance of the data in the RKHS, as Fig. 1(b) illustrates. Let $\{y_i\}_{i=1,\dots,N'}$ be the N′-point subset of the input data that remains after outlier removal, where N′ < N. Denote by $C = [\phi(y_1) \dots \phi(y_{N'})]$ the data matrix after mapping the data to $\mathcal{F}_{k_{\tilde{r}}}$, and by symbol $\tilde{C}$ the result of adjusting C with the empirical mean of $\{\phi(y_1), \dots, \phi(y_{N'})\}$. The centered kernel matrix $\tilde{K}' = \tilde{C}^T \tilde{C}$ [11] can be obtained as

$$\tilde{K}' = \nu^T K' \nu, \quad \nu = [I_{N'} - \tfrac{1}{N'} 1_{N',N'}], \qquad (12)$$

where $K' = C^T C$ is the uncentered kernel matrix, and $I_s$ and $1_{s,s}$ are respectively the s × s identity matrix and a matrix of ones. If $\tilde{K}' = R \Omega R^T$ is the EVD of $\tilde{K}'$, then we obtain the first-m kernel principal components $P^m$ of $\tilde{C}$ as the first-m left singular vectors of $\tilde{C}$, i.e.

$$P^m = \tilde{C} R^m (\Omega^m)^{-\frac{1}{2}}, \qquad (13)$$

where $R^m = R_{:,1:m}$ and $\Omega^m = \Omega_{1:m,1:m}$; see Eq. (9). Projecting the data on the principal components yields

$$D = [d_1 \dots d_{N'}] = (\Omega^m)^{\frac{1}{2}} (R^m)^T, \qquad (14)$$

where $D \in \mathbb{R}^{m \times N'}$. The affine subspace span($P^m$) maximizes the spread of the centered data in the RKHS, and the projection D offers an effective representation for clustering. Fig. 3(a) shows the Kernel PCA projection results for m = 3 on the sequence in Fig. 2(a). The number of clusters in D is recovered via spectral clustering. More specifically we apply the Normalized Cut (Ncut) [13] algorithm. A fully connected graph is first derived from the data, where its weighted adjacency matrix $W \in \mathbb{R}^{N' \times N'}$ is obtained as

$$W_{p,q} = \exp(-\|d_p - d_q\|^2 / 2\delta^2), \qquad (15)$$

and δ is taken as the average nearest neighbour distance in the Euclidean sense among the vectors in D.
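The linear algebra in Eq. (10) to (12), (14) and (15) can be sketched with a standard symmetric eigensolver. The code below is an illustration under stated assumptions, not the authors' implementation: it takes a precomputed kernel matrix as input, keeps points whose norm is at least the mean norm (the text only says low-norm points are discarded), and all function names and the toy data are hypothetical.

```python
import numpy as np

def principal_projection(K, n):
    """Return (Delta^n)^{1/2} (Q^n)^T from the top-n eigenpairs of K, as in Eq. (10)."""
    evals, evecs = np.linalg.eigh(K)            # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:n]           # indices of the n largest
    lam = np.clip(evals[idx], 0.0, None)        # guard against tiny negative values
    return np.sqrt(lam)[:, None] * evecs[:, idx].T

def reject_outliers(K, n=3):
    """Boolean mask of points whose principal-subspace norm reaches the mean (Eq. 11)."""
    norms = np.linalg.norm(principal_projection(K, n), axis=0)
    return norms >= norms.mean()

def kernel_pca(K, m=3):
    """Centre K as in Eq. (12) and project onto the first m components (Eq. 14)."""
    N = K.shape[0]
    nu = np.eye(N) - np.ones((N, N)) / N
    return principal_projection(nu.T @ K @ nu, m)

def gaussian_affinity(D):
    """Eq. (15), with delta set to the average nearest-neighbour distance."""
    diff = D[:, :, None] - D[:, None, :]
    dist = np.sqrt((diff ** 2).sum(axis=0))
    off_diag = np.where(np.eye(dist.shape[0], dtype=bool), np.inf, dist)
    delta = off_diag.min(axis=1).mean()
    return np.exp(-dist ** 2 / (2.0 * delta ** 2))

# Toy kernel: two groups of 5 inliers (mutual kernel 0.9) and 3 isolated outliers.
F = np.zeros((13, 15))
F[:5, 0] = np.sqrt(0.9)
F[5:10, 1] = np.sqrt(0.9)
F[np.arange(10), 2 + np.arange(10)] = np.sqrt(0.1)
F[np.arange(10, 13), 12 + np.arange(3)] = 1.0
K_toy = F @ F.T

keep = reject_outliers(K_toy, n=2)
D_toy = kernel_pca(K_toy[np.ix_(keep, keep)], m=2)
W_toy = gaussian_affinity(D_toy)
```

In this toy setting the three isolated points have near-zero norm in the principal subspace and fall below the mean threshold, after which the projection and affinity are computed on the remaining ten points only.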
The Laplacian matrix [13] is then derived from W and eigendecomposed. Under Ncut, the number of clusters is revealed as the number of eigenvalues of the Laplacian that are zero or numerically insignificant. With this knowledge, a subsequent k-means step is then performed to cluster the points. Fig. 3(b) shows W for the input data in Fig. 2(a)(left) after outlier removal. It can be seen that strong affinity exists between points from the same cluster, thus allowing accurate clustering. Figs. 3(a) and 3(c) illustrate the final clustering result for the data in Fig. 2(a)(left).

Figure 3: Actual results on the motion sequence in Fig. 2(a)(left). (a) Kernel PCA and Ncut results. (b) W matrix. (c) Final result for “cars10”.

There are several reasons why spectral clustering under our framework is more successful than previous methods. Firstly, we perform an effective outlier rejection step that removes bad trajectories that can potentially mislead the clustering. Secondly, the mapping induced by ORK deliberately separates the trajectories based on their cluster membership. Finally, we perform Kernel PCA to maximize the variance of the data. Effectively this also improves the separation of clusters, thus facilitating an accurate recovery of the number of clusters and also the subsequent segmentation. This distinguishes our work from previous clustering based methods [21, 5] which tend to operate without maximizing the between-class scatter. Results in Sec. 4 validate our claims.

4 Results

Henceforth we refer to the proposed method as “ORK”. We use a recently published benchmark on affine model motion segmentation [18] as a basis of comparison. The benchmark was evaluated on the Hopkins 155 dataset⁴ which contains 155 sequences with tracked point trajectories. A total of 120 sequences have two motions while 35 have three motions.
The sequences contain degenerate and non-degenerate motions, independent and partially dependent motions, articulated motions, nonrigid motions, etc. In terms of video content, three categories exist: checkerboard sequences, traffic sequences (moving cars, trucks) and articulated motions (moving faces, people).

4.1 Details on benchmarking

Four major algorithms were compared in [18]: Generalized PCA (GPCA) [19], Local Subspace Affinity (LSA) [21], Multi-Stage Learning (MSL) [14] and RANSAC [17]. Here we extend the benchmark with newly reported results from Locally Linear Manifold Clustering (LLMC) [5] and Agglomerative Lossy Compression (ALC) [10, 9]. We also compare our method against Kanatani and Matsunaga’s [8] algorithm (henceforth, the “KM” method) in estimating the number of independent motions in the video sequences. Note that KM per se does not perform motion segmentation. For the sake of objective comparisons we use only publicly available implementations (footnote 5).

Following [18], motion segmentation performance is evaluated in terms of the labelling error of the point trajectories, where each point in a sequence has a ground truth label, i.e.

$$\text{classification error} = \frac{\text{number of mislabeled points}}{\text{total number of points}}. \qquad (16)$$

Footnote 4: Available at http://www.vision.jhu.edu/data/hopkins155/.
Footnote 5: For MSL and KM, see http://www.suri.cs.okayama-u.ac.jp/e-program-separate.html/. For GPCA, LSA and RANSAC, refer to the url for the Hopkins 155 dataset.

Unlike [18], we also emphasize the ability of the methods to recover the number of motions. However, although the methods compared in [18] (except RANSAC) theoretically have the means to do so, their estimation of the number of motions is generally unreliable, and the benchmark results in [18] were obtained by revealing the actual number of motions to the algorithms.
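As a small illustration of Eq. (16), the following sketch computes the classification error between ground truth and estimated labels. Because cluster indices returned by a segmentation method are arbitrary, the sketch takes the minimum error over all relabellings of the estimated groups; this matching step, the function name `classification_error`, and the requirement that both labellings contain the same number of groups are assumptions made for the example and are not taken from the benchmark code.

```python
from itertools import permutations

import numpy as np


def classification_error(true_labels, pred_labels):
    """Fraction of mislabeled points (Eq. 16), minimised over all
    one-to-one relabellings of the predicted groups."""
    t = np.asarray(true_labels)
    p = np.asarray(pred_labels)
    t_ids = np.unique(t)
    p_ids = np.unique(p)
    if len(t_ids) != len(p_ids):
        # Assumption of this sketch: equal number of groups on both sides.
        raise ValueError("labelings must contain the same number of groups")
    best = len(t)
    for perm in permutations(t_ids):
        mapping = dict(zip(p_ids, perm))           # predicted id -> true id
        mapped = np.array([mapping[x] for x in p])
        best = min(best, int((mapped != t).sum()))
    return best / len(t)
```

The exhaustive search over permutations is only practical for a handful of groups, which is the case for sequences with two or three motions.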
A similar initialization exists in [5, 10], where the results were obtained by giving LLMC and ALC this knowledge a priori (for LLMC, this was given at least to the variant LLMC 4m during dimensionality reduction [5], where m is the true number of motions). In the following subsections, where variants exist for the compared algorithms, we use results from the best performing variant. The number of random hypotheses M and the step size h for ORK are fixed at 1000 and 300 respectively, and unlike the others, ORK is not given knowledge of the number of motions.

4.2 Data without gross outliers

We apply ORK on the Hopkins 155 dataset. Since ORK uses random sampling, we repeat it 100 times for each sequence and average the results. Table 1 lists the resulting classification error alongside those of previously proposed methods. ORK (column 9) gives comparable results to the other methods for sequences with 2 motions (mean = 7.83%, median = 0.41%). For sequences with 3 motions, ORK (mean = 12.62%, median = 4.75%) outperforms GPCA and RANSAC, but is slightly less accurate than the others. However, bear in mind that unlike the other methods, ORK is not given prior knowledge of the true number of motions and has to estimate this independently.

Table 1: Classification error (%) on Hopkins 155 sequences. REF represents the reference/control method which operates based on knowledge of ground truth segmentation. Refer to [18] for details.

Column    1      2      3      4      5       6      8      9      10
Method    REF    GPCA   LSA    MSL    RANSAC  LLMC   ALC    ORK    ORK∗
Sequences with 2 motions
Mean      2.03   4.59   3.45   4.14   5.56    3.62   3.03   7.83   1.27
Median    0.00   0.38   0.59   0.00   1.18    0.00   0.00   0.41   0.00
Sequences with 3 motions
Mean      5.08   28.66  9.73   8.23   22.94   8.85   6.26   12.62  2.09
Median    2.40   28.26  2.33   1.76   22.03   3.19   1.02   4.75   0.05

We also separately investigate the accuracy of ORK in estimating the number of motions, and compare it against KM [8] which was proposed for this purpose.
Note that such an experiment was not attempted in [18], since the approaches compared therein generally do not perform reliably in estimating the number of motions. The results in Table 2 (columns 1–2) show that for sequences with two motions, KM (80.83%) outperforms ORK (67.37%) by about 13 percentage points. However, for sequences with three motions, ORK (49.66%) vastly outperforms KM (14.29%), with more than double the accuracy. The overall accuracy of KM (65.81%) is slightly better than that of ORK (63.37%), but this is mostly because sequences with two motions form the majority in the dataset (120 out of 155). This leads us to conclude that ORK is actually the superior method here.

Table 2: Accuracy in determining the number of motions in a sequence. Note that in the experiment with outliers (columns 3–4), KM returns a constant number of 3 motions for all sequences.

Dataset      Hopkins 155         Hopkins 155 + Outliers
Column       1         2         3          4
Method       KM        ORK       KM         ORK
2 motions    80.83%    67.37%    0.00%      47.58%
3 motions    14.29%    49.66%    100.00%    50.00%
Overall      65.81%    63.37%    22.58%     48.13%

We re-evaluate the performance of ORK by considering only results on sequences where the number of motions is estimated correctly by ORK (about 98 sequences, i.e. 63.37% of the dataset). The results are tabulated under ORK∗ (column 10) in Table 1. It can be seen that when ORK estimates the number of motions correctly, it is significantly more accurate than the other methods.

Finally, we compare the speed of the methods in Table 3. ORK was implemented and run in Matlab on a Dual Core Pentium 3.00GHz machine with 4GB of main memory (this is much less powerful than the 8 Core Xeon 3.66GHz with 32GB memory used in [18] for the other methods in Table 3). The results show that ORK is comparable to LSA, much faster than MSL and ALC, but slower than GPCA and RANSAC. Timing results of LLMC are not available in the literature.
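The "Overall" row of Table 2 can be checked as the count-weighted average of the two per-category accuracies, using the 120 two-motion and 35 three-motion sequences stated earlier. The helper below is a hypothetical illustration of that arithmetic only; the small discrepancies it shows against the reported figures are assumed to come from rounding of the published percentages.

```python
def overall_accuracy(acc_two, acc_three, n_two=120, n_three=35):
    """Count-weighted average of the per-category accuracies (in %)."""
    return (acc_two * n_two + acc_three * n_three) / (n_two + n_three)


# Per-category accuracies taken from Table 2.
km_clean = overall_accuracy(80.83, 14.29)      # reported overall: 65.81
ork_clean = overall_accuracy(67.37, 49.66)     # reported overall: 63.37
km_outliers = overall_accuracy(0.0, 100.0)     # reported overall: 22.58
ork_outliers = overall_accuracy(47.58, 50.0)   # reported overall: 48.13
```

The weighting makes explicit why the two-motion sequences dominate the overall figure: they account for 120 of the 155 sequences.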
Table 3: Average computation time on Hopkins 155 sequences.

Method      2 motions    3 motions
GPCA        324ms        738ms
LSA         7.584s       15.956s
MSL         11h 4m       1d 23h
RANSAC      175ms        258ms
ALC         10m 32s      10m 32s
ORK         4.249s       8.479s

4.3 Data with gross outliers

We next examine the ability of the proposed method to deal with gross outliers in motion data. For each sequence in Hopkins 155, we add 100 gross outliers by creating trajectories corresponding to mistracks or spuriously occurring points. These are created by randomly initializing 100 locations in the first frame and allowing them to drift throughout the sequence according to Brownian motion. The corrupted sequences are then subjected to the algorithms for motion segmentation. Since only ORK is capable of rejecting outliers, the classification error of Eq. (16) is evaluated on the inlier points only. The results in Table 4 illustrate that ORK (column 4) is the most accurate method by a large margin. Despite being given the true number of motions a priori, GPCA, LSA and RANSAC are unable to provide satisfactory segmentation results.

Table 4: Classification error (%) on Hopkins 155 sequences with 100 gross outliers per sequence.

Column    1       2       3        4       5
Method    GPCA    LSA     RANSAC   ORK     ORK∗
Sequences with 2 motions
Mean      28.66   24.25   30.64    16.50   1.62
Median    30.96   26.51   32.36    10.54   0.00
Sequences with 3 motions
Mean      40.61   30.94   42.24    19.99   2.68
Median    41.30   27.68   43.43    8.49    0.09

In terms of estimating the number of motions, as shown in column 4 of Table 2, the overall accuracy of ORK is reduced to 48.13%. This is mainly due to the deterioration in accuracy on sequences with two motions (47.58%), whereas the accuracy on sequences with three motions is maintained (50.00%). This is not a surprising result, since sequences with three motions generally have more (inlying) point trajectories than sequences with two motions, and thus the outlier rates for sequences with three motions are lower (recall that a fixed number of 100 false trajectories is added).
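The outlier-generation procedure described above (random initial locations in the first frame, followed by Brownian drift) can be sketched as a Gaussian random walk. The paper does not state the image size or the step size of the drift, so `width`, `height` and `step_std` below are placeholder assumptions, as is the function name `brownian_outliers`.

```python
import numpy as np


def brownian_outliers(n_frames, n_outliers=100, width=640, height=480,
                      step_std=2.0, seed=0):
    """Return synthetic outlier trajectories of shape (n_outliers, n_frames, 2).

    Each trajectory starts at a uniformly random image location and then
    drifts by independent Gaussian steps (a discrete Brownian motion).
    width, height and step_std are illustrative values, not from the paper.
    """
    rng = np.random.default_rng(seed)
    start = rng.uniform([0.0, 0.0], [width, height], size=(n_outliers, 2))
    steps = rng.normal(0.0, step_std, size=(n_outliers, n_frames - 1, 2))
    drift = np.concatenate(
        [np.zeros((n_outliers, 1, 2)), np.cumsum(steps, axis=1)], axis=1
    )
    return start[:, None, :] + drift
```

The resulting array can be appended to the tracked trajectories of a sequence before running a segmentation method on the corrupted data.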
On the other hand, the KM method (column 3) is completely overwhelmed by the outliers: for all the sequences with outliers it returned a constant “3” as the number of motions. We again re-evaluate ORK by considering results from sequences (now with gross outliers) where the number of motions is correctly estimated (about 75 sequences, i.e. 48.13% of the dataset). The results tabulated under ORK∗ (column 5) in Table 4 show that the proposed method can accurately segment the point trajectories without being influenced by the gross outliers.

5 Conclusions

In this paper we propose a novel and highly effective approach for multi-body motion segmentation. Our idea is based on encapsulating random hypotheses in a novel Mercer kernel and statistical learning. We evaluated our method on the Hopkins 155 dataset, with results showing that the idea is superior to other state-of-the-art approaches. It is by far the most accurate in terms of estimating the number of motions, and it excels in segmentation accuracy despite lacking prior knowledge of the number of motions. The proposed idea is also highly robust towards outliers in the input data.

Acknowledgements. We are grateful to the authors of [18], especially René Vidal, for discussions and insights which have been immensely helpful.

References

[1] T. Boult and L. Brown. Factorization-based segmentation of motions. In IEEE Workshop on Motion Understanding, 1991.
[2] T.-J. Chin, H. Wang, and D. Suter. Robust fitting of multiple structures: The statistical learning approach. In ICCV, 2009.
[3] J. Costeira and T. Kanade. A multibody factorization method for independently moving objects. IJCV, 29(3):159–179, 1998.
[4] M. A. Fischler and R. C. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Comm. of the ACM, 24:381–395, 1981.
[5] A. Goh and R. Vidal. Segmenting motions of different types by unsupervised manifold clustering. In CVPR, 2007.
[6] A. Gruber and Y. Weiss. Multibody factorization with uncertainty and missing data using the EM algorithm. In CVPR, 2004.
[7] K. Kanatani. Motion segmentation by subspace separation and model selection. In ICCV, 2001.
[8] K. Kanatani and C. Matsunaga. Estimating the number of independent motions for multibody segmentation. In ACCV, 2002.
[9] Y. Ma, H. Derksen, W. Hong, and J. Wright. Segmentation of multivariate mixed data via lossy coding and compression. TPAMI, 29(9):1546–1562, 2007.
[10] S. Rao, R. Tron, Y. Ma, and R. Vidal. Motion segmentation via robust subspace separation in the presence of outlying, incomplete, or corrupted trajectories. In CVPR, 2008.
[11] B. Schölkopf, A. Smola, and K. R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10:1299–1319, 1998.
[12] J. Shawe-Taylor and N. Cristianini. Kernel methods for pattern analysis. Cambridge University Press, 2004.
[13] J. Shi and J. Malik. Normalized cuts and image segmentation. TPAMI, 22(8):888–905, 2000.
[14] Y. Sugaya and K. Kanatani. Geometric structure of degeneracy for multi-body motion segmentation. In Workshop on Statistical Methods in Video Processing, 2004.
[15] R. Toldo and A. Fusiello. Robust multiple structures estimation with J-Linkage. In ECCV, 2008.
[16] C. Tomasi and T. Kanade. Shape and motion from image streams under orthography. IJCV, 9(2):137–154, 1992.
[17] P. Torr. Geometric motion segmentation and model selection. Phil. Trans. Royal Society of London, 356(1740):1321–1340, 1998.
[18] R. Tron and R. Vidal. A benchmark for the comparison of 3-D motion segmentation algorithms. In CVPR, 2007.
[19] R. Vidal and R. Hartley. Motion segmentation with missing data by PowerFactorization and Generalized PCA. In CVPR, 2004.
[20] R. Vidal, Y. Ma, and S. Sastry. Generalized Principal Component Analysis (GPCA). TPAMI, 27(12):1–15, 2005.
[21] J. Yan and M. Pollefeys. A general framework for motion segmentation: independent, articulated, rigid, non-rigid, degenerate and non-degenerate. In ECCV, 2006.
[22] L. Zelnik-Manor and M. Irani. Degeneracies, dependencies and their implications on multibody and multi-sequence factorization. In CVPR, 2003.
[23] W. Zhang and J. Kosecká. Nonparametric estimation of multiple structures with outliers. In Dynamical Vision, ICCV 2005 and ECCV 2006 Workshops, 2006.

3 0.13178243 61 nips-2009-Convex Relaxation of Mixture Regression with Efficient Algorithms

Author: Novi Quadrianto, John Lim, Dale Schuurmans, Tibério S. Caetano

Abstract: We develop a convex relaxation of maximum a posteriori estimation of a mixture of regression models. Although our relaxation involves a semidefinite matrix variable, we reformulate the problem to eliminate the need for general semidefinite programming. In particular, we provide two reformulations that admit fast algorithms. The first is a max-min spectral reformulation exploiting quasi-Newton descent. The second is a min-min reformulation consisting of fast alternating steps of closed-form updates. We evaluate the methods against Expectation-Maximization in a real problem of motion segmentation from video data. 1

4 0.11830802 99 nips-2009-Functional network reorganization in motor cortex can be explained by reward-modulated Hebbian learning

Author: Steven Chase, Andrew Schwartz, Wolfgang Maass, Robert A. Legenstein

Abstract: The control of neuroprosthetic devices from the activity of motor cortex neurons benefits from learning effects where the function of these neurons is adapted to the control task. It was recently shown that tuning properties of neurons in monkey motor cortex are adapted selectively in order to compensate for an erroneous interpretation of their activity. In particular, it was shown that the tuning curves of those neurons whose preferred directions had been misinterpreted changed more than those of other neurons. In this article, we show that the experimentally observed self-tuning properties of the system can be explained on the basis of a simple learning rule. This learning rule utilizes neuronal noise for exploration and performs Hebbian weight updates that are modulated by a global reward signal. In contrast to most previously proposed reward-modulated Hebbian learning rules, this rule does not require extraneous knowledge about what is noise and what is signal. The learning rule is able to optimize the performance of the model system within biologically realistic periods of time and under high noise levels. When the neuronal noise is fitted to experimental data, the model produces learning effects similar to those found in monkey experiments.

5 0.10522333 162 nips-2009-Neural Implementation of Hierarchical Bayesian Inference by Importance Sampling

Author: Lei Shi, Thomas L. Griffiths

Abstract: The goal of perception is to infer the hidden states in the hierarchical process by which sensory data are generated. Human behavior is consistent with the optimal statistical solution to this problem in many tasks, including cue combination and orientation detection. Understanding the neural mechanisms underlying this behavior is of particular importance, since probabilistic computations are notoriously challenging. Here we propose a simple mechanism for Bayesian inference which involves averaging over a few feature detection neurons which fire at a rate determined by their similarity to a sensory stimulus. This mechanism is based on a Monte Carlo method known as importance sampling, commonly used in computer science and statistics. Moreover, a simple extension to recursive importance sampling can be used to perform hierarchical Bayesian inference. We identify a scheme for implementing importance sampling with spiking neurons, and show that this scheme can account for human behavior in cue combination and the oblique effect. 1

6 0.10187681 137 nips-2009-Learning transport operators for image manifolds

7 0.095426477 231 nips-2009-Statistical Models of Linear and Nonlinear Contextual Interactions in Early Visual Processing

8 0.094925173 217 nips-2009-Sharing Features among Dynamical Systems with Beta Processes

9 0.093057461 201 nips-2009-Region-based Segmentation and Object Detection

10 0.082864568 164 nips-2009-No evidence for active sparsification in the visual cortex

11 0.080786414 85 nips-2009-Explaining human multiple object tracking as resource-constrained approximate inference in a dynamic probabilistic model

12 0.078296959 38 nips-2009-Augmenting Feature-driven fMRI Analyses: Semi-supervised learning and resting state activity

13 0.077871583 219 nips-2009-Slow, Decorrelated Features for Pretraining Complex Cell-like Networks

14 0.074504338 241 nips-2009-The 'tree-dependent components' of natural scenes are edge filters

15 0.071642771 19 nips-2009-A joint maximum-entropy model for binary neural population patterns and continuous signals

16 0.067478336 97 nips-2009-Free energy score space

17 0.067219816 6 nips-2009-A Biologically Plausible Model for Rapid Natural Scene Identification

18 0.066944979 200 nips-2009-Reconstruction of Sparse Circuits Using Multi-neuronal Excitation (RESCUME)

19 0.066836402 1 nips-2009-$L 1$-Penalized Robust Estimation for a Class of Inverse Problems Arising in Multiview Geometry

20 0.066367842 50 nips-2009-Canonical Time Warping for Alignment of Human Behavior


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.136), (1, -0.155), (2, 0.072), (3, 0.076), (4, 0.018), (5, 0.105), (6, 0.036), (7, 0.039), (8, 0.094), (9, -0.106), (10, 0.01), (11, -0.0), (12, 0.01), (13, 0.003), (14, -0.035), (15, -0.026), (16, -0.022), (17, -0.045), (18, -0.11), (19, 0.074), (20, 0.103), (21, 0.025), (22, 0.106), (23, -0.325), (24, 0.069), (25, -0.084), (26, 0.154), (27, -0.004), (28, 0.08), (29, 0.086), (30, -0.162), (31, -0.146), (32, -0.108), (33, 0.123), (34, -0.023), (35, 0.026), (36, -0.046), (37, 0.063), (38, -0.116), (39, -0.241), (40, -0.054), (41, 0.079), (42, 0.004), (43, 0.052), (44, -0.071), (45, -0.076), (46, 0.118), (47, 0.079), (48, -0.01), (49, -0.006)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98999149 88 nips-2009-Extending Phase Mechanism to Differential Motion Opponency for Motion Pop-out

Author: Yicong Meng, Bertram E. Shi

Abstract: We extend the concept of phase tuning, a ubiquitous mechanism among sensory neurons including motion and disparity selective neurons, to the motion contrast detection. We demonstrate that the motion contrast can be detected by phase shifts between motion neuronal responses in different spatial regions. By constructing the differential motion opponency in response to motions in two different spatial regions, varying motion contrasts can be detected, where similar motion is detected by zero phase shifts and differences in motion by non-zero phase shifts. The model can exhibit either enhancement or suppression of responses by either different or similar motion in the surrounding. A primary advantage of the model is that the responses are selective to relative motion instead of absolute motion, which could model neurons found in neurophysiological experiments responsible for motion pop-out detection.

1 Introduction

Motion discontinuity or motion contrast is an important cue for the pop-out of salient moving objects from contextual backgrounds. Although the neural mechanism underlying the motion pop-out detection is still unknown, the center-surround receptive field (RF) organization is considered as a physiological basis responsible for the pop-out detection. The center-surround RF structure is simple and ubiquitous in cortical cells especially in neurons processing motion and color information. Nakayama and Loomis [1] have predicted the existence of motion selective neurons with antagonistic center-surround receptive field organization in 1974. Recent physiological experiments [2][3] show that neurons with center-surround RFs have been found in both middle temporal (MT) and medial superior temporal (MST) areas related to motion processing. This antagonistic mechanism has been suggested to detect motion segmentation [4], figure/ground segregation [5] and the differentiation of object motion from ego-motion [6].
There are many related works [7]-[12] on motion pop-out detection. Some works [7]-[9] are based on spatio-temporal filtering outputs, but the interactions between motion neurons in these models are incomplete: they either only inhibit similar motion [7] or only enhance opposite motion [8]. Heeger et al. [7] proposed a center-surround operator to eliminate the response dependence upon rotational motions. However, Heeger's model only shows a complete center-surround interaction for moving directions. With respect to the surrounding speed, the neuronal responses are suppressed when the surround moves at the same speed as the center, but they are not enhanced by other speeds. A similar problem exists in [8], which only models the suppression of neuronal responses in the classical receptive field (CRF) by similar motions in surrounding regions. Physiological experiments [10][11] show that many neurons in visual cortex are sensitive to motion contrast rather than to the absolute direction and speed of the object motion. Although pooling over motion neurons tuned to different velocities can eliminate the dependence upon absolute velocities, it is computationally inefficient and still cannot give full interactions of both suppression and enhancement by similar and opposite surrounding motions. The model proposed by Dellen et al. [12] computes differential motion responses directly from complex cells in V1 and does not utilize responses from direction selective neurons.

In this paper, we propose an opponency model which directly responds to differential motions by utilizing the phase shift mechanism. Phase tuning is a ubiquitous mechanism in sensory information processing, including motion, disparity and depth detection. Disparity selective neurons in the visual cortex have been found to detect disparities by adjusting the phase shift between the receptive field organizations in the left and right eyes [13][14].
Motion sensitive cells have been modeled in a similar way to the disparity energy neurons: they detect image motions by utilizing the phase shift between the real and imaginary parts of temporal complex valued responses, which play a role comparable to the images presented to the left and right eyes [15]. Therefore, differential motion can be modeled by exploiting the similarity between images from different spatial regions and images from different eyes.

The remainder of this paper is organized as follows. Section 2 describes the phase shift motion energy neurons, which estimate image velocities by phase tuning in the imaginary path of the temporal receptive field responses. In Section 3, we extend the concept of phase tuning to the construction of differential motion opponency. The phase difference determines the preferred velocity difference between adjacent areas in retinal images. Section 4 investigates properties of motion pop-out detection by the proposed motion opponency model. Finally, in Section 5, we relate our proposed model to the neural mechanism of motion integration and motion segmentation in motion related areas and suggest a possible interpretation for adaptive center-surround interactions observed in biological experiments.

2 Phase Shift Motion Energy Neurons

Adelson and Bergen [16] proposed the motion energy model for visual motion perception, which measures spatio-temporal orientations of image sequences in space and time. The motion energy model posits that the responses of direction-selective V1 complex cells can be computed by a combination of two linear spatio-temporal filtering stages, followed by squaring and summation. The motion energy model was extended in [15] to be phase tuned by splitting the complex valued temporal responses into real and imaginary paths and adding a phase shift on the imaginary path. Figure 1(a) shows the schematic diagram of the phase shift motion energy model.
Here we assume an input image sequence in two-dimensional space (x, y) and time t. The separable spatio-temporal receptive field allows a cascade implementation of the RF with spatial and temporal filters. Because the temporal RF must be causal, the phase shift motion energy model does not adopt a Gabor filter for the temporal RF as it does for the spatial RF. The phase shift spatio-temporal RF is modeled with a complex valued function $f(x, y, t) = g(x, y) \cdot h(t, \Phi)$, where the spatial and temporal RFs are denoted by $g(x, y)$ and $h(t, \Phi)$ respectively,

$$g(x, y) = N(x, y \mid 0, C) \exp(j\Omega_x x + j\Omega_y y)$$
$$h(t, \Phi) = h_{real}(t) + \exp(j\Phi)\, h_{imag}(t) \qquad (1)$$

and $C$ is the covariance matrix of the spatial Gaussian envelope and $\Phi$ is the phase tuning of the motion energy neuron. The real and imaginary profiles of the temporal receptive field are Gamma modulated sinusoidal functions with quadrature phases,

$$h_{real}(t) = G(t \mid \alpha, \tau) \cos(\Omega_t t)$$
$$h_{imag}(t) = G(t \mid \alpha, \tau) \sin(\Omega_t t) \qquad (2)$$

The envelopes for the complex exponentials are functions of Gaussian and Gamma distributions,

$$N(x, y \mid 0, C) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\left( -\frac{x^2}{2\sigma_x^2} - \frac{y^2}{2\sigma_y^2} \right) \qquad (3)$$

$$G(t \mid \alpha, \tau) = \frac{1}{\Gamma(\alpha)\tau^{\alpha}}\, t^{\alpha-1} \exp\left( -\frac{t}{\tau} \right) u(t) \qquad (4)$$

where $\Gamma(\alpha)$ is the gamma function and $u(t)$ is the unit step function. The parameters $\alpha$ and $\tau$ determine the temporal RF size.

Figure 1. (a) shows the diagram of the phase shift motion energy model adapted from [15]. (b) draws the spatiotemporal representation of the phase shift motion energy neuron, with the real and imaginary receptive fields shown in the two left pictures. (c) illustrates the construction of differential motion opponency with a phase difference Θ from two populations of phase shift motion energy neurons in two spatial areas c and s. To avoid clutter, the space location (x, y) is not explicitly shown in phase tuned motion energies.

As derived in [15], the motion energy at location (x, y) can be computed by

$$E_v(x, y, \Phi) = S + P \cos(\Psi - \Phi) \qquad (5)$$

where

$$S = |V_{real}|^2 + |V_{imag}|^2, \quad P = 2\,|V_{real} V_{imag}^{*}|, \quad \Psi = \arg\left( V_{real} V_{imag}^{*} \right) \qquad (6)$$

and the complex valued responses in the real and imaginary paths are obtained as

$$V_{real}(x, y, t) = \iiint_{\xi,\zeta,\eta} g(\xi, \zeta)\, h_{real}(\eta)\, I(x - \xi, y - \zeta, t - \eta)\, d\xi\, d\zeta\, d\eta$$
$$V_{imag}(x, y, t) = \iiint_{\xi,\zeta,\eta} g(\xi, \zeta)\, h_{imag}(\eta)\, I(x - \xi, y - \zeta, t - \eta)\, d\xi\, d\zeta\, d\eta \qquad (7)$$

The superscript * represents complex conjugation, and the phase shift parameter Φ controls the spatio-temporal orientation tuning. To avoid clutter, the spatial location variables x and y for $S$, $P$, $\Psi$, $V_{real}$ and $V_{imag}$ are not explicitly shown in Eqs. (5) and (6). Figure 1(b) shows the even and odd profiles of the spatio-temporal RF tuned to a particular phase shift.

Figure 2. Two types of differential motion opponency constructions: (a) center-surround interaction and (b) left-right interaction. Among cells in area MT with surrounding modulations, 25% of cells have the antagonistic RF structure in the top row and another 50% of cells have the integrative RF structure shown in the bottom row.

3 Extending Phase Opponency Mechanism to Differential Motion

Based on the above phase shift motion energy model, the local image velocity at each spatial location can be represented by the phase shift which leads to the peak response across a population of motion energy neurons. Across regions of different motions, there are clear discontinuities in the estimated velocity map. The motion discontinuities can be detected by edge detectors on the velocity map to segment different motions.
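The closed form of Eqs. (5) and (6) can be checked numerically against a direct evaluation. The sketch below assumes that the phase tuned energy is the squared modulus of $V_{real} + e^{j\Phi} V_{imag}$, which is how we read Eq. (1) and Figure 1(a); the function names and the scalar test values are hypothetical and stand in for filter responses at a single location and time.

```python
import numpy as np


def motion_energy_direct(v_real, v_imag, phi):
    """Energy as the squared modulus of the phase shifted sum
    (assumed form, following Eq. (1))."""
    return np.abs(v_real + np.exp(1j * phi) * v_imag) ** 2


def motion_energy_closed_form(v_real, v_imag, phi):
    """Energy as S + P cos(Psi - Phi), Eqs. (5) and (6)."""
    s = np.abs(v_real) ** 2 + np.abs(v_imag) ** 2
    cross = v_real * np.conj(v_imag)
    return s + 2.0 * np.abs(cross) * np.cos(np.angle(cross) - phi)


def preferred_phase(v_real, v_imag):
    """Phase shift Psi at which the population response peaks."""
    return np.angle(v_real * np.conj(v_imag))


# Hypothetical complex responses of the real and imaginary paths.
v_real, v_imag = 0.8 - 0.3j, -0.2 + 1.1j
phis = np.linspace(-np.pi, np.pi, 721)
direct = motion_energy_direct(v_real, v_imag, phis)
closed = motion_energy_closed_form(v_real, v_imag, phis)
```

Under this assumption the two evaluations agree for every Φ, and the population response across Φ peaks at Φ = Ψ, which is the phase used to represent the local velocity.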
However, this algorithm for motion discontinuity detection cannot discriminate between the object motion and uniform motions in contextual backgrounds. Here we propose a phase mechanism to detect differential motions, inspired by the disparity energy model, and adopt the center-surround inhibition mechanism to pop out the object motion from contextual background motions. The motion differences between different spatial locations can be modeled in a similar way to the disparity model. The motion energies from two neighboring locations are treated like the retinal images to the left and right eyes. Thus, we can construct a differential motion opponency by placing two populations of phase shift motion energy neurons at different spatial locations, and the energy $E_{\Delta v}(\Theta)$ of the opponency is the squared modulus of the averaged phase shift motion energies over space and phase,

$$E_{\Delta v}(\Theta) = \left| \iiint E_v(x, y, \Phi) \cdot w(x, y, \Phi \mid \Theta)\, dx\, dy\, d\Phi \right|^2 \qquad (8)$$

where $w(x, y, \Phi \mid \Theta)$ is the profile for differential motion opponency and $\Delta v$ is the velocity difference between the two spatial regions defined by the kernel $w(x, y, \Phi \mid \Theta)$. Since $w(x, y, \Phi \mid \Theta)$ is intended to implement the functional role of spatial interactions, it is desired to be a separable function in the space and phase domains and can be modeled by a phase tuned summation of two spatial kernels,

$$w(x, y, \Phi \mid \Theta) = w_c(x, y)\, e^{j\Phi} + e^{j\Theta + j\Phi}\, w_s(x, y) \qquad (9)$$

where $w_c(x, y)$ and $w_s(x, y)$ are Gaussian kernels of different spatial sizes $\sigma_c$ and $\sigma_s$, and $\Theta$ is the phase difference representing the velocity difference between two spatial regions c and s. Substituting Eq. (9) into Eq. (8), the differential motion energy can be reformulated as

$$E_{\Delta v}(\Theta) = \left| K_c + e^{j\Theta} K_s \right|^2 \qquad (10)$$

where

$$K_c = \iiint_{x,y,\Phi} E_{v,c}(x, y, \Phi) \exp(j\Phi)\, w_c(x, y)\, dx\, dy\, d\Phi$$
$$K_s = \iiint_{x,y,\Phi} E_{v,s}(x, y, \Phi) \exp(j\Phi)\, w_s(x, y)\, dx\, dy\, d\Phi \qquad (11)$$

$E_{v,c}(x, y, \Phi)$ and $E_{v,s}(x, y, \Phi)$ are the phase shift motion energies at location (x, y) and with phase shift Φ. Following the same steps as for Eqs. (5) and (6), Eqs. (10) and (11) yield a similar result,

$$E_{\Delta v}(\Theta) = S_{opp} + P_{opp} \cos(\Theta_{opp} - \Theta) \qquad (12)$$

where

$$S_{opp} = |K_c|^2 + |K_s|^2, \quad P_{opp} = 2\,|K_c K_s^{*}|, \quad \Theta_{opp} = \arg\left( K_c K_s^{*} \right) \qquad (13)$$

Figure 3. (a) Phase map and (b) peak magnitude map obtained from stimuli of two patches of random dots moving with different velocities (axes: Left Velocity and Right Velocity, each from -3 to 3). The two patches of stimuli are statistically independent but share the same spatial properties: dot size of 2 pixels, dot density of 10% and dot coherence level of 100%. The phase tuned population of motion energy neurons is applied to each patch of random dots with RF parameters: Ωt = 2π/8, Ωt = 2π/16, σx = 5 and τ = 5.5. For each combination of velocities from the left and right patches, the phase shifts averaged over space and time are computed, as are the magnitudes of the peak responses. The unit for velocities is pixels per frame.

According to the above derivations, by varying the phase shift Θ between –π and π, the relative motion energy of the differential motion opponency can be modeled as population responses across a population of phase tuned motion opponencies. The response is completely specified by three parameters $S_{opp}$, $P_{opp}$ and $\Theta_{opp}$.

The schematic diagram of this opponency is illustrated in Figure 1(c). The differential motion opponency consists of three stages. At the first stage, a population of phase shift motion energy neurons is applied so as to be selective to different velocities.
At the second stage, motion energies from the first stage are weighted by kernels tuned to different spatial locations and phase shifts for both spatial regions, and two differential motion signals, one for region c and one for region s, are obtained by integrating the responses from these two regions over space and phase tuning. Finally, the differential motion energy is computed as the squared modulus of the sum of the integrated motion signal in region c and the phase shifted motion signal in region s. The subscripts c and s denote two interacting spatial regions, which are not limited to the center and surround regions. The opponency could also be constructed from neighboring left and right spatial regions.

Figure 4. Demonstrations of center-surround differential motion opponency (axes: Responses versus Surrounding Direction, from 0 to 2π), where (a) shows the excitation by opposite directions outside the CRF (excitatory interaction, Θ = 0) and (b) shows the inhibition by surrounding motions in the same direction (inhibitive interaction, Θ = π/2). The center-surround inhibition models by Petkov et al. [8] and Heeger et al. [7] are shown in (c) and (d). Responses above 1 indicate enhancement and responses below 1 indicate suppression.

Figure 2 shows two types of structures for the differential motion opponency. In [17], the authors demonstrate that among cells in area MT with surrounding modulations, 25% of cells have the antagonistic RF structure shown in Figure 2(a) and another 50% of cells have the integrative RF structure shown in Figure 2(b).
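The three stages can be illustrated with a reduced sketch that drops the spatial integration and keeps only the phase dimension. Each region is summarised by a hypothetical population response $S + P\cos(\Psi - \Phi)$ in the form of Eq. (5); pooling it against $e^{j\Phi}$ follows the phase part of Eq. (11), and the opponency energy follows Eq. (10). The grid size, the function names and the numerical values are assumptions made for the example.

```python
import numpy as np

# Uniform grid of phase tunings for the first-stage population (assumed size).
PHI = np.linspace(-np.pi, np.pi, 256, endpoint=False)


def population_energy(s, p, psi):
    """Stage 1: phase tuned motion energies in the form of Eq. (5)."""
    return s + p * np.cos(psi - PHI)


def pooled_signal(energy):
    """Stage 2: phase part of Eq. (11), omitting the spatial kernel."""
    d_phi = 2.0 * np.pi / len(PHI)
    return np.sum(energy * np.exp(1j * PHI)) * d_phi


def opponency_energy(k_c, k_s, theta):
    """Stage 3: differential motion energy of Eq. (10)."""
    return np.abs(k_c + np.exp(1j * theta) * k_s) ** 2


def preferred_phase_difference(k_c, k_s):
    """Theta_opp of Eq. (13)."""
    return np.angle(k_c * np.conj(k_s))


# Hypothetical regions whose preferred phases are 0.7 and -0.4 radians.
k_c = pooled_signal(population_energy(2.0, 1.0, 0.7))
k_s = pooled_signal(population_energy(1.5, 0.8, -0.4))
theta_opp = preferred_phase_difference(k_c, k_s)
```

In this reduced setting the pooled signal of each region is proportional to $P e^{j\Psi}$, so $\Theta_{opp}$ equals the difference of the two preferred phases (1.1 radians here) and is zero when both regions carry the same motion, in line with the zero phase shift for similar motion described in the abstract.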
The velocity difference tuning of the opponency is determined by the phase shift parameter Θ combined with the spatial and temporal frequency parameters of the motion energy neurons. A larger phase shift magnitude corresponds to a preference for a larger velocity difference. This phase tuning of velocity difference is consistent with the phase tuning of motion energy neurons. Figure 3 shows the phase map obtained using random dot stimuli with different velocities on two spatial patches (left and right patches, each 128 × 128 pixels). Along the diagonal, the velocities of the left and right patches are equal and therefore the phase estimates are zero. Moving away from the diagonal toward the upper-left and lower-right, the phase magnitudes increase; positive phases indicate larger left velocities and negative phases indicate larger right velocities. The phase tuning therefore gives a good classification of velocity differences.

4 Validation of Differential Motion Opponency

Our derivation and analysis above show that the phase shift between two neighboring spatial regions is a good indicator of the motion difference between these two regions. In this section, we validate the proposed differential motion opponency with two sets of experiments, which show the effects of both surrounding directions and speeds on the center motion.

[Figure 5: two panels, (a) and (b), plotting Responses (0 to 2) against Center Speed (-2 to 2).]

Figure 5. The insensitivity of the proposed opponency model to absolute center and surrounding velocities is demonstrated in (a), where responses are enhanced for all center velocities from -2 to 2 pixels per frame. In (b), the model by Heeger et al. [7] only shows enhancement when the center speed matches the preferred speed of 1.2 pixels per frame. Similarly, responses above 1 indicate enhancement and responses below 1 indicate suppression. In both curves, the velocity difference between the center and surrounding regions is held constant at 3 pixels per frame.

Physiological experiments [2][3] have demonstrated that neuronal activity in the classical receptive field (CRF) is suppressed by stimuli outside the CRF whose motion, in both direction and speed, is similar to that in the center. In contrast, visual stimuli of opposite directions or quite different speeds outside the CRF enhance the responses in the CRF. Those experiments used random dot stimuli moving at different velocities, with small patches of moving random dots placed at the center. We tested the properties of the proposed opponency model for motion difference measurement using similar random dot stimuli. The background random dots move with different speeds and in different directions but have the same statistical parameters: dot size of 2 pixels, dot density of 10% and motion coherence level of 100%. Small random dot patches are placed at the center of the background stimuli to stimulate the neurons in the CRF. These small patches share the same statistical parameters as the background random dots but move with a constant velocity of 1 pixel per frame.

Figure 4 shows results for the enhanced and suppressed responses in the CRF with varying surrounding directions. The phase shift motion energy neurons had the same spatial and temporal frequencies and the same receptive field sizes, and were selective to vertical orientations. The preferred spatial frequency was 2π/16 radians per pixel and the temporal frequency was 2π/16 radians per frame. The RF sizes in the horizontal and vertical directions were 5 pixels and 10 pixels respectively, corresponding to a spatial bandwidth of 1.96 octaves. The time constant τ was 5.5 frames, which resulted in a temporal bandwidth of 1.96 octaves.
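The random dot stimuli described above can be sketched as follows. This is a minimal illustration under several assumptions that the text does not specify: dots are square and lie on a regular grid, "density" is the probability that a grid cell contains a dot, motion is horizontal with wrap-around at the image border, displacements are rounded to whole pixels, and the image size is a multiple of the dot size. The function names are hypothetical.

```python
import numpy as np

def random_dot_sequence(size, n_frames, velocity, dot_size=2, density=0.1, seed=0):
    """Return an (n_frames, size, size) array of a coherently translating dot field.

    All dots share the same horizontal velocity (pixels per frame), which
    corresponds to a motion coherence level of 100%.
    """
    rng = np.random.default_rng(seed)
    cells = size // dot_size
    # Each grid cell holds a dot with probability `density` (assumed definition)
    grid = (rng.random((cells, cells)) < density).astype(float)
    # Expand every cell into a dot_size x dot_size block of pixels
    base = np.kron(grid, np.ones((dot_size, dot_size)))
    # Cumulative displacement per frame, rounded to whole pixels
    shifts = np.round(velocity * np.arange(n_frames)).astype(int)
    return np.stack([np.roll(base, int(s), axis=1) for s in shifts])

def center_surround_stimulus(size, patch, n_frames, v_center, v_surround, seed=0):
    """Background dots moving at v_surround with a central patch moving at v_center."""
    stim = random_dot_sequence(size, n_frames, v_surround, seed=seed)
    center = random_dot_sequence(patch, n_frames, v_center, seed=seed + 1)
    lo = (size - patch) // 2
    stim[:, lo:lo + patch, lo:lo + patch] = center
    return stim
```

A call such as `center_surround_stimulus(128, 32, 16, 1.0, -2.0)` would give a center patch moving at 1 pixel per frame on a background moving in the opposite direction, so the velocity difference is 3 pixels per frame; the image and patch sizes here are arbitrary choices.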
As shown in Figure 4(a) and (b), surrounding motion in the opposite direction gives the largest response to the motion in the CRF for the inhibitory interaction and the smallest response for the excitatory interaction. The results in Figure 4 are consistent with the physiological results reported by Born and Tootell [3], where inhibitory cells show response enhancement and excitatory cells show response suppression when surrounding motions are in opposite directions. The 3-dB bandwidth for the surrounding motion direction is about 135 degrees in the physiological experiments and about 180 degrees in the simulation results of our proposed model.

Models proposed by Petkov et al. [8] and Heeger et al. [7] also show clear inhibition between opposite motions. Petkov's model achieves surround suppression for each point in (x, y, t) space by subtracting the response of the surround from the response at that point, followed by half-wave rectification,

$$\tilde{E}_{v,\theta}(x, y, t) = \left| E_{v,\theta}(x, y, t) - \alpha \cdot S_{v,\theta}(x, y, t) \right|^{+} \qquad (14)$$

where $E_{v,\theta}(x, y, t)$ is the motion energy at location (x, y) and time t for a given preferred speed v and orientation θ, $S_{v,\theta}(x, y, t)$ is the average motion energy in the surround of point (x, y, t), $\tilde{E}_{v,\theta}(x, y, t)$ is the suppressed motion energy and the factor α controls the inhibition strength. The inhibition term is computed as a weighted motion energy

$$S_{v,\theta}(x, y, t) = E_{v,\theta}(x, y, t) * w_{v,\theta}(x, y, t) \qquad (15)$$

where $w_{v,\theta}(x, y, t)$ is the surround weighting function. Heeger's model constructs the center-surround motion opponent by computing the weighted sum of responses from motion selective cells,

$$R_{v,\theta}(t) = \sum_{x,y} \beta(x, y) \left[ E_{v,\theta}(x, y, t) - E_{-v,\theta}(x, y, t) \right] \qquad (16)$$

where β(x, y) is a center-surround weighting function and the motion energy at each point should be normalized across all cells with different tuning properties.
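Equations (14)-(16) translate directly into array operations. The sketch below is a simplified illustration rather than either group's implementation: it assumes motion energies stored as (T, H, W) arrays, uses circular (wrap-around) convolution via the FFT for Eq. (15), and omits the normalization across cells mentioned for Heeger's model. The function names are hypothetical.

```python
import numpy as np

def circular_convolve(E, w):
    """Convolve a (T, H, W) array with a small 3D kernel, wrapping at the borders."""
    kernel = np.zeros_like(E, dtype=float)
    kt, kh, kw = w.shape
    kernel[:kt, :kh, :kw] = w
    # Move the kernel centre to the origin so the output is not shifted
    kernel = np.roll(kernel, (-(kt // 2), -(kh // 2), -(kw // 2)), axis=(0, 1, 2))
    return np.real(np.fft.ifftn(np.fft.fftn(E) * np.fft.fftn(kernel)))

def surround_suppressed_energy(E, w, alpha):
    """Eqs. (14)-(15): subtract the weighted surround energy, then half-wave rectify."""
    S = circular_convolve(E, w)            # Eq. (15)
    return np.maximum(E - alpha * S, 0.0)  # Eq. (14)

def opponent_response(E_pos, E_neg, beta):
    """Eq. (16): weighted sum over space of the energy difference for +v and -v.

    E_pos, E_neg : (T, H, W) motion energies for preferred velocities v and -v
    beta         : (H, W) center-surround weighting function
    """
    return np.sum(beta * (E_pos - E_neg), axis=(1, 2))
```

With a uniform energy field and a surround kernel that sums to one, the surround term equals the center term, so the output of `surround_suppressed_energy` is scaled by (1 − α) and clipped at zero once α exceeds 1.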
For the results of Petkov's and Heeger's models shown in Figure 4(c) and (d), we replaced the conventional frequency tuned motion energy neurons with our proposed phase tuned neurons. The model by Petkov et al. [8] is generally suppressive and only shows weaker suppression for opposite motions, which is inconsistent with the results in [3]. The model by Heeger et al. [7] behaves similarly to our proposed model with respect to both excitatory and inhibitory interactions.

To evaluate the sensitivity of the proposed opponency model to velocity differences, we ran simulations using stimuli similar to those of the experiment in Figure 4 but maintaining a constant velocity difference of 3 pixels per frame between the center and surrounding random dot patches. As shown in Figure 5, when the velocity of the random dots in the center region is varied, the responses of the proposed model are always enhanced, independent of the absolute velocity of the center stimuli, whereas the responses of Heeger's model are enhanced at a center velocity of 1.2 pixels per frame and remain suppressed at other speeds.

5 Discussion

We proposed a new biologically plausible model of differential motion opponency to model the spatial interaction property of motion energy neurons. The proposed opponency model is motivated by the phase tuning mechanism of disparity energy neurons, which infers disparity information from the phase difference between complex valued responses to left and right retinal images. Hence, the two neighboring spatial areas can be treated like left and right images, and the motion difference between the two spatial regions is detected from the phase difference between the complex valued responses at these two regions.
Our experimental results are consistent with physiological experiments in showing that motions of opposite directions and different speeds outside the CRF can have both inhibitive and excitatory effects on the CRF responses. The inhibitive interaction helps to segment the moving object from backgrounds when fed back to low-level features such as edges, orientations and color information.

Besides providing a unifying phase mechanism for understanding neurons with different functional roles in different brain areas, the proposed opponency model could provide a way to understand motion integration and motion segmentation. Integration and segmentation are two opposite motion perception tasks that co-exist as two fundamental types of motion processing. Segmentation is achieved by discriminating motion signals from different objects, which is thought to be due to the antagonistic interaction between center and surrounding RFs. Integration is obtained by utilizing the enhancing effect of surrounding areas on CRF areas. Both types of processing have been found in motion related areas including area MT and MST. Tadin et al. [18] found that motion segmentation dominates at high stimulus contrast and gives way to motion integration at low stimulus contrast. Huang et al. [19] suggest that the surround modulation adapts to properties of the visual stimulus such as contrast and noise level. Since our proposed opponency model determines the functional role of neurons through the phase shift parameter alone, it is an ideal candidate model for adaptive surround modulation via phase tuning between two spatial regions.

References

[1] K. Nakayama and J. M. Loomis, “Optical velocity patterns, velocity-sensitive neurons, and space perception: A hypothesis,” Perception, vol. 3, pp. 63-80, 1974.
[2] K. Tanaka, K. Hikosaka, H. Saito, M. Yukie, Y. Fukada and E. Iwai, “Analysis of local and wide-field movements in the superior temporal visual areas of the macaque monkey,” Journal of Neuroscience, vol. 6, pp. 134-144, 1986.
[3] R. T. Born and R. B. H. Tootell, “Segregation of global and local motion processing in primate middle temporal visual area,” Nature, vol. 357, pp. 497-499, 1992.
[4] J. Allman, F. Miezin and E. McGuinness, “Stimulus specific responses from beyond the classical receptive field: Neurophysiological mechanisms for local-global comparisons in visual neurons,” Annual Review of Neuroscience, vol. 8, pp. 407-430, 1985.
[5] V. A. F. Lamme, “The neurophysiology of figure-ground segregation in primary visual cortex,” Journal of Neuroscience, vol. 15, pp. 1605-1615, 1995.
[6] D. C. Bradley and R. A. Andersen, “Center-surround antagonism based on disparity in primate area MT,” Journal of Neuroscience, vol. 18, pp. 7552-7565, 1998.
[7] D. J. Heeger, A. D. Jepson and E. P. Simoncelli, “Recovering observer translation with center-surround operators,” Proc. IEEE Workshop on Visual Motion, pp. 95-100, Oct 1991.
[8] N. Petkov and E. Subramanian, “Motion detection, noise reduction, texture suppression, and contour enhancement by spatiotemporal Gabor filters with surround inhibition,” Biological Cybernetics, vol. 97, pp. 423-439, 2007.
[9] M. Escobar and P. Kornprobst, “Action recognition with a Bio-inspired feedforward motion processing model: the richness of center-surround interactions,” ECCV '08: Proceedings of the 10th European Conference on Computer Vision, pp. 186-199, Marseille, France, 2008.
[10] B. J. Frost and K. Nakayama, “Single visual neurons code opposing motion independent of direction,” Science, vol. 200, pp. 744-745, 1983.
[11] A. Cao and P. H. Schiller, “Neural responses to relative speed in the primary visual cortex of rhesus monkey,” Visual Neuroscience, vol. 20, pp. 77-84, 2003.
[12] B. K. Dellen, J. W. Clark and R. Wessel, “Computing relative motion with complex cells,” Visual Neuroscience, vol. 22, pp. 225-236, 2005.
[13] I. Ohzawa, G. C. DeAngelis and R. D. Freeman, “Encoding of binocular disparity by complex cells in the cat’s visual cortex,” Journal of Neurophysiology, vol. 77, pp. 2879-2909, 1997.
[14] D. J. Fleet, H. Wagner and D. J. Heeger, “Neural Encoding of binocular disparity: energy model, position shifts and phase shifts,” Vision Research, vol. 26, pp. 1839-1857, 1996.
[15] Y. C. Meng and B. E. Shi, “Normalized Phase Shift Motion Energy Neuron Populations for Image Velocity Estimation,” International Joint Conference on Neural Network, Atlanta, GA, June 14-19, 2009.
[16] E. H. Adelson and J. R. Bergen, “Spatiotemporal energy models for the perception of motion,” J. Opt. Soc. Am. A Opt. Image Sci. Vis., vol. 2, pp. 284-299, 1985.
[17] D. K. Xiao, S. Raiguel, V. Marcar, J. Koenderink and G. A. Orban, “The spatial distribution of the antagonistic surround of MT/V5,” Cereb Cortex, vol. 7, pp. 662-677, 1997.
[18] D. Tadin, J. S. Lappin, L. A. Gilroy and R. Blake, “Perceptual consequences of centre-surround antagonism in visual motion processing,” Nature, vol. 424, pp. 312-315, 2003.
[19] X. Huang, T. D. Albright and G. R. Stoner, “Adaptive surround modulation in cortical area MT,” Neuron, vol. 53, pp. 761-770, 2007.

2 0.78925806 243 nips-2009-The Ordered Residual Kernel for Robust Motion Subspace Clustering

Author: Tat-jun Chin, Hanzi Wang, David Suter

Abstract: We present a novel and highly effective approach for multi-body motion segmentation. Drawing inspiration from robust statistical model fitting, we estimate putative subspace hypotheses from the data. However, instead of ranking them we encapsulate the hypotheses in a novel Mercer kernel which elicits the potential of two point trajectories to have emerged from the same subspace. The kernel permits the application of well-established statistical learning methods for effective outlier rejection, automatic recovery of the number of motions and accurate segmentation of the point trajectories. The method operates well under severe outliers arising from spurious trajectories or mistracks. Detailed experiments on a recent benchmark dataset (Hopkins 155) show that our method is superior to other state-of-the-art approaches in terms of recovering the number of motions, segmentation accuracy, robustness against gross outliers and computational efficiency.

1 Introduction¹

Multi-body motion segmentation concerns the separation of motions arising from multiple moving objects in a video sequence. The input data is usually a set of points on the surface of the objects which are tracked throughout the video sequence. Motion segmentation can serve as a useful preprocessing step for many computer vision applications. In recent years the case of rigid (i.e. non-articulated) objects for which the motions could be semi-dependent on each other has received much attention [18, 14, 19, 21, 22, 17]. Under this domain the affine projection model is usually adopted. Such a model implies that the point trajectories from a particular motion lie on a linear subspace of dimension at most four, and trajectories from different motions lie on distinct subspaces. Thus multi-body motion segmentation is reduced to the problem of subspace segmentation or clustering.
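The rank property behind this reduction is easy to verify numerically. The following sketch is an illustration under stated assumptions, not part of the paper: for each frame, the combined rigid motion and affine camera is drawn as a random 2 × 4 matrix acting on homogeneous 3D points, and the function name `affine_trajectories` is hypothetical.

```python
import numpy as np

def affine_trajectories(n_frames, n_points, rng):
    """Return a (2 * n_frames, n_points) trajectory matrix for one rigid object."""
    X = rng.normal(size=(3, n_points))           # 3D points on the object
    X_h = np.vstack([X, np.ones((1, n_points))])  # homogeneous coordinates, 4 x P
    rows = []
    for _ in range(n_frames):
        # Assumed model: one random 2 x 4 affine map per frame
        # (2 x 3 linear part and a 2 x 1 translation)
        A_f = rng.normal(size=(2, 4))
        rows.append(A_f @ X_h)                    # image coordinates in this frame
    return np.vstack(rows)

rng = np.random.default_rng(0)
T_one = affine_trajectories(10, 50, rng)
T_two = np.hstack([T_one, affine_trajectories(10, 50, rng)])
rank_one = np.linalg.matrix_rank(T_one)  # at most 4 for a single motion
rank_two = np.linalg.matrix_rank(T_two)  # at most 8 for two independent motions
```

Because every column is the stacked per-frame affine maps applied to a 4-vector, the trajectories of one object cannot span more than four dimensions regardless of the number of frames.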
To realize practical algorithms, motion segmentation approaches should possess four desirable attributes: (1) Accuracy in classifying the point trajectories to the motions they respectively belong to. This is crucial for success in subsequent vision applications, e.g. object recognition and 3D reconstruction. (2) Robustness against inlier noise (e.g. slight localization error) and gross outliers (e.g. mistracks, spurious trajectories), since getting imperfect data is almost always unavoidable in practical circumstances. (3) Ability to automatically deduce the number of motions in the data. This is pivotal for fully automated vision applications. (4) Computational efficiency. This is integral to the processing of video sequences, which usually involve large amounts of data.

Recent work on multi-body motion segmentation can roughly be divided into algebraic or factorization methods [3, 19, 20], statistical methods [17, 7, 14, 6, 10] and clustering methods [22, 21, 5]. Notable approaches include Generalized PCA (GPCA) [19, 20], an algebraic method based on the idea that one can fit a union of m subspaces with a set of polynomials of degree m. Statistical methods often employ concepts such as random hypothesis generation [4, 17], Expectation-Maximization [14, 6] and geometric model selection [7, 8]. Clustering based methods [22, 21, 5] are also gaining attention due to their effectiveness. They usually include a dimensionality reduction step (e.g. manifold learning [5]) followed by a clustering of the point trajectories (e.g. via spectral clustering in [21]). A recent benchmark [18] indicated that Local Subspace Affinity (LSA) [21] gave the best performance in terms of classification accuracy, although their result was subsequently surpassed by [5, 10].

¹ This work was supported by the Australian Research Council (ARC) under the project DP0878801.
However, we argue that most of the previous approaches do not simultaneously fulfil the qualities desirable of motion segmentation algorithms. Most notably, although some of the approaches have the means to estimate the number of motions, they are generally unreliable in this respect and require manual input of this parameter. In fact this prior knowledge was given to all the methods compared in [18]². Secondly, most of the methods (e.g. [19, 5]) do not explicitly deal with outliers. They will almost always break down when given corrupted data. These deficiencies reduce the usefulness of available motion segmentation algorithms in practical circumstances.

² As confirmed through private contact with the authors of [18].

In this paper we attempt to bridge the gap between experimental performance and practical usability. Our previous work [2] indicates that robust multi-structure model fitting can be achieved effectively with statistical learning. Here we extend this concept to motion subspace clustering. Drawing inspiration from robust statistical model fitting [4], we estimate random hypotheses of motion subspaces in the data. However, instead of ranking these hypotheses we encapsulate them in a novel Mercer kernel. The kernel can function reliably despite overwhelming sampling imbalance, and it permits the application of non-linear dimensionality reduction techniques to effectively identify and reject outlying trajectories. This is then followed by Kernel PCA [11] to maximize the separation between groups and spectral clustering [13] to recover the number of motions and the clustering. Experiments on the Hopkins 155 benchmark dataset [18] show that our method is superior to other approaches in terms of the qualities described above, including computational efficiency.

1.1 Brief review of affine model multi-body motion segmentation

Let $\{t_{fp} \in \mathbb{R}^2\}_{f=1,\dots,F;\; p=1,\dots,P}$ be the set of 2D coordinates of P trajectories tracked across F frames. In multi-body motion segmentation the $t_{fp}$'s correspond to points on the surface of rigid objects which are moving. The goal is to separate the trajectories into groups corresponding to the motion they belong to. In other words, if we arrange the coordinates in the following data matrix

$$T = \begin{bmatrix} t_{11} & \cdots & t_{1P} \\ \vdots & \ddots & \vdots \\ t_{F1} & \cdots & t_{FP} \end{bmatrix} \in \mathbb{R}^{2F \times P}, \qquad (1)$$

the goal is to find the permutation $\Gamma \in \mathbb{R}^{P \times P}$ such that the columns of $T \cdot \Gamma$ are arranged according to the respective motions they belong to. It turns out that under affine projection [1, 16] trajectories from the same motion lie on a distinct subspace in $\mathbb{R}^{2F}$, and each of these motion subspaces is of dimension 2, 3 or 4. Thus motion segmentation can be accomplished via clustering subspaces in $\mathbb{R}^{2F}$. See [1, 16] for more details.

Realistically, actual motion sequences might contain trajectories which do not correspond to valid objects or motions. These trajectories behave as outliers in the data and, if not taken into account, can be seriously detrimental to subspace clustering algorithms.

2 The Ordered Residual Kernel (ORK)

First, we take a statistical model fitting point of view to motion segmentation. Let $\{x_i\}_{i=1,\dots,N}$ be the set of N samples on which we want to perform model fitting. We randomly draw p-subsets from the data and use each to fit a hypothesis of the model, where p is the number of parameters that define the model. In motion segmentation, the $x_i$'s are the columns of matrix T, and p = 4 since the model is a four-dimensional subspace³. Assume that M such random hypotheses are drawn. For each data point $x_i$ compute its absolute residual set $r_i = \{r^i_1, \dots, r^i_M\}$ as measured to the M hypotheses. For motion segmentation, the residual is the orthogonal distance to a hypothesis subspace. We sort the elements in $r_i$ to obtain the sorted residual set $\tilde{r}_i = \{r^i_{\lambda^i_1}, \dots, r^i_{\lambda^i_M}\}$, where the permutation $\{\lambda^i_1, \dots, \lambda^i_M\}$ is obtained such that $r^i_{\lambda^i_1} \le \cdots \le r^i_{\lambda^i_M}$. Define the following

$$\tilde{\theta}_i := \{\lambda^i_1, \dots, \lambda^i_M\} \qquad (2)$$

as the sorted hypothesis set of point $x_i$, i.e. $\tilde{\theta}_i$ depicts the order in which $x_i$ becomes the inlier of the M hypotheses as a fictitious inlier threshold is increased from 0 to ∞. We define the Ordered Residual Kernel (ORK) between two data points as

$$k_{\tilde{r}}(x_{i_1}, x_{i_2}) := \frac{1}{Z} \sum_{t=1}^{M/h} z_t \cdot k^t_\cap(\tilde{\theta}_{i_1}, \tilde{\theta}_{i_2}), \qquad (3)$$

where $z_t = 1/t$ are the terms of the harmonic series and $Z = \sum_{t=1}^{M/h} z_t$ is the (M/h)-th harmonic number. Without loss of generality assume that M is wholly divisible by h. Step size h is used to obtain the Difference of Intersection Kernel (DOIK)

$$k^t_\cap(\tilde{\theta}_{i_1}, \tilde{\theta}_{i_2}) := \frac{1}{h} \left( \left| \tilde{\theta}^{1:\alpha_t}_{i_1} \cap \tilde{\theta}^{1:\alpha_t}_{i_2} \right| - \left| \tilde{\theta}^{1:\alpha_{t-1}}_{i_1} \cap \tilde{\theta}^{1:\alpha_{t-1}}_{i_2} \right| \right) \qquad (4)$$

where $\alpha_t = t \cdot h$ and $\alpha_{t-1} = (t - 1) \cdot h$. Symbol $\tilde{\theta}^{a:b}_i$ indicates the set formed by the a-th to the b-th elements of $\tilde{\theta}_i$. Since the contents of the sorted hypotheses set are merely permutations of {1 . . . M}, i.e. there are no repeating elements,

$$0 \le k_{\tilde{r}}(x_{i_1}, x_{i_2}) \le 1. \qquad (5)$$

Note that $k_{\tilde{r}}$ is independent of the type of model to be fitted, thus it is applicable to generic statistical model fitting problems. However, we concentrate on motion subspaces in this paper.

³ Ideally we should also consider degenerate motions with subspace dimensions 2 or 3, but previous work [18] using RANSAC [4] and our results suggest this is not a pressing issue for the Hopkins 155 dataset.

Let τ be a fictitious inlier threshold. The kernel $k_{\tilde{r}}$ captures the intuition that, if τ is low, two points arising from the same subspace will have high normalized intersection since they share many common hypotheses which correspond to that subspace. If τ is high, implausible hypotheses fitted on outliers start to dominate and decrease the normalized intersection.
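The definitions in Eqs. (2)-(4) can be turned into a short routine. The code below is a sketch under stated assumptions, not the authors' implementation: hypotheses are linear subspaces spanned by p randomly chosen data points, an orthonormal basis is obtained with a QR decomposition, and the function names and arguments are hypothetical.

```python
import numpy as np

def subspace_residuals(X, M, p=4, seed=0):
    """Orthogonal distances of N points (columns of X) to M random p-dim subspaces.

    Each hypothesis is the span of p randomly drawn columns of X (assumed fitting rule).
    Returns an (N, M) array of absolute residuals.
    """
    rng = np.random.default_rng(seed)
    N = X.shape[1]
    R = np.empty((N, M))
    for m in range(M):
        idx = rng.choice(N, size=p, replace=False)
        Q, _ = np.linalg.qr(X[:, idx])  # orthonormal basis of the hypothesis
        R[:, m] = np.linalg.norm(X - Q @ (Q.T @ X), axis=0)
    return R

def ork_kernel_matrix(residuals, h):
    """Ordered Residual Kernel of Eq. (3) for all pairs of points.

    residuals : (N, M) absolute residuals to M hypotheses
    h         : step size; M must be divisible by h
    """
    N, M = residuals.shape
    if M % h != 0:
        raise ValueError("M must be divisible by h")
    steps = M // h
    order = np.argsort(residuals, axis=1)  # sorted hypothesis sets, Eq. (2)
    z = 1.0 / np.arange(1, steps + 1)      # harmonic weights z_t = 1/t
    member = np.zeros((N, M))              # indicator of the first alpha_t hypotheses
    prev = np.zeros((N, N))
    K = np.zeros((N, N))
    for t in range(1, steps + 1):
        # Add the hypotheses ranked (t-1)h+1 ... th to each point's prefix set
        np.put_along_axis(member, order[:, (t - 1) * h:t * h], 1.0, axis=1)
        inter = member @ member.T           # pairwise prefix intersection sizes
        K += z[t - 1] * (inter - prev) / h  # DOIK of Eq. (4), weighted by z_t
        prev = inter
    return K / z.sum()                      # divide by the harmonic number Z
```

Storing each prefix set as a 0/1 indicator row makes every pairwise intersection size a single matrix product, so the cost per step is one N × M by M × N multiplication.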
Step size h allows us to quantify the rate of change of intersection as τ is increased from 0 to ∞, and since $z_t$ is decreasing, $k_{\tilde{r}}$ will evaluate to a high value for two points from the same subspace. In contrast, $k_{\tilde{r}}$ is always low for points not from the same subspace or that are outliers.

Proof of satisfying Mercer's condition. Let D be a fixed domain, and P(D) be the power set of D, i.e. the set of all subsets of D. Let S ⊆ P(D), and p, q ∈ S. If µ is a measure on D, then

$$k_\cap(p, q) = \mu(p \cap q), \qquad (6)$$

called the intersection kernel, is provably a valid Mercer kernel [12]. The DOIK can be rewritten as

$$k^t_\cap(\tilde{\theta}_{i_1}, \tilde{\theta}_{i_2}) = \frac{1}{h} \Big( \left| \tilde{\theta}^{(\alpha_{t-1}+1):\alpha_t}_{i_1} \cap \tilde{\theta}^{(\alpha_{t-1}+1):\alpha_t}_{i_2} \right| + \left| \tilde{\theta}^{1:\alpha_{t-1}}_{i_1} \cap \tilde{\theta}^{(\alpha_{t-1}+1):\alpha_t}_{i_2} \right| + \left| \tilde{\theta}^{(\alpha_{t-1}+1):\alpha_t}_{i_1} \cap \tilde{\theta}^{1:\alpha_{t-1}}_{i_2} \right| \Big). \qquad (7)$$

If we let D = {1 . . . M} be the set of all possible hypothesis indices and µ be uniform on D, each term in Eq. (7) is simply an intersection kernel multiplied by |D|/h. Since multiplying a kernel by a positive constant and adding two kernels both produce valid Mercer kernels [12], the DOIK and ORK are also valid Mercer kernels. ∎

Parameter h in $k_{\tilde{r}}$ depends on the number of random hypotheses M, i.e. step size h can be set as a ratio of M. The value of M can be determined based on the size of the p-subset and the size of the data N (e.g. [23, 15]), and thus h is not contingent on knowledge of the true inlier noise scale or threshold. Moreover, our experiments in Sec. 4 show that segmentation performance is relatively insensitive to the settings of h and M.

2.1 Performance under sampling imbalance

Methods based on random sampling (e.g. RANSAC [4]) are usually affected by unbalanced datasets. The probability of simultaneously retrieving p inliers from a particular structure is tiny if points from that structure represent only a small minority in the data.
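To make the effect of imbalance concrete, the probability of drawing a pure p-subset can be computed exactly. The sketch below is a rough illustration under stated assumptions: points are drawn uniformly without replacement, the number of draws for a given confidence follows the usual random sampling argument, and the function names and the 99% confidence level are choices made here for illustration.

```python
from math import ceil, comb, log

def pure_subset_probability(n_structure, n_total, p):
    """Probability that p points drawn without replacement all belong to one structure."""
    return comb(n_structure, p) / comb(n_total, p)

def samples_needed(n_structure, n_total, p, confidence=0.99):
    """Number of random p-subsets needed to hit at least one pure subset
    with the given confidence (requires 0 < probability < 1)."""
    q = pure_subset_probability(n_structure, n_total, p)
    return ceil(log(1.0 - confidence) / log(1.0 - q))
```

The required number of draws grows roughly as the inverse p-th power of the structure's share of the data, which is why a small foreground object among a large background is sampled so rarely when p = 4.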
In an unbalanced dataset the “pure” p-subsets in the M randomly drawn samples will be dominated by points from the majority structure in the data. This is a pronounced problem in motion sequences, since there is usually a background “object” whose point trajectories form a large majority in the data. In fact, for motion sequences from the Hopkins 155 dataset [18] with typically about 300 points per sequence, M has to be raised to about 20,000 before a pure p-subset from the non-background objects is sampled.

However, ORK can function reliably despite serious sampling imbalance. This is because points from the same subspace are roughly equidistant to the sampled hypotheses in their vicinity, even though these hypotheses might not pass through that subspace. Moreover, since $z_t$ in Eq. (3) is decreasing, only residuals/hypotheses in the vicinity of a point are heavily weighted in the intersection. Fig. 1(a) illustrates this condition. Results in Sec. 4 show that ORK excelled even with M = 1,000.

[Figure 1 panels: (a) Data in $\mathbb{R}^{2F}$. (b) Data in RKHS $\mathcal{F}_{k_{\tilde{r}}}$.]

Figure 1: (a) ORK under sampling imbalance. (b) Data in RKHS induced by ORK.

3 Multi-Body Motion Segmentation using ORK

In this section, we describe how ORK is used for multi-body motion segmentation.

3.1 Outlier rejection via non-linear dimensionality reduction

Denote by $\mathcal{F}_{k_{\tilde{r}}}$ the Reproducing Kernel Hilbert Space (RKHS) induced by $k_{\tilde{r}}$. Let matrix $A = [\phi(x_1) \dots \phi(x_N)]$ contain the input data after it is mapped to $\mathcal{F}_{k_{\tilde{r}}}$. The kernel matrix $K = A^T A$ is computed using the kernel function $k_{\tilde{r}}$ as

$$K_{p,q} = \langle \phi(x_p), \phi(x_q) \rangle = k_{\tilde{r}}(x_p, x_q), \quad p, q \in \{1 \dots N\}. \qquad (8)$$

Since $k_{\tilde{r}}$ is a valid Mercer kernel, K is guaranteed to be positive semi-definite [12]. Let $K = Q \Delta Q^T$ be the eigenvalue decomposition (EVD) of K. Then the rank-n Kernel Singular Value Decomposition (Kernel SVD) [12] of A is

$$A^n = [A Q^n (\Delta^n)^{-\frac{1}{2}}] [(\Delta^n)^{\frac{1}{2}}] [(Q^n)^T] \equiv U^n \Sigma^n (V^n)^T. \qquad (9)$$

Via the Matlab notation, $Q^n = Q_{:,1:n}$ and $\Delta^n = \Delta_{1:n,1:n}$.
The left singular vectors $U^n$ form an orthonormal basis for the n-dimensional principal subspace of the whole dataset in $\mathcal{F}_{k_{\tilde{r}}}$. Projecting the data onto the principal subspace yields

$$B = [A Q^n (\Delta^n)^{-\frac{1}{2}}]^T A = (\Delta^n)^{\frac{1}{2}} (Q^n)^T, \qquad (10)$$

where $B = [b_1 \dots b_N] \in \mathbb{R}^{n \times N}$ is the reduced dimension version of A. Directions of the principal subspace are dominated by inlier points, since $k_{\tilde{r}}$ evaluates to a high value generally for them, but always to a low value for gross outliers. Moreover the kernel ensures that points from the same subspace are mapped to the same cluster and vice versa. Fig. 1(b) illustrates this condition.

Fig. 2(a)(left) shows the first frame of sequence “Cars10” from the Hopkins 155 dataset [18] with 100 false trajectories of Brownian motion added to the original data (297 points). The corresponding RKHS norm histogram for n = 3 is displayed in Fig. 2(b). The existence of two distinct modes, corresponding respectively to inliers and outliers, is evident. We exploit this observation for outlier rejection by discarding data with low norms in the principal subspace.

[Figure 2 panels: (a) (left) Before and (right) after outlier removal. Blue dots are inliers while red dots are added outliers. (b) Actual norm histogram of “cars10”, plotting bin count against vector norm in principal subspace, with an outlier mode and an inlier mode marked.]

Figure 2: Demonstration of outlier rejection on sequence “cars10” from Hopkins 155.

The cut-off threshold ψ can be determined by analyzing the shape of the distribution. For instance we can fit a 1D Gaussian Mixture Model (GMM) with two components and set ψ as the point of equal Mahalanobis distance between the two components. However, our experimentation shows that an effective threshold can be obtained by simply setting ψ as the average value of all the norms, i.e.

$$\psi = \frac{1}{N} \sum_{i=1}^{N} \| b_i \|. \qquad (11)$$

This method was applied uniformly on all the sequences in our experiments in Sec. 4. Fig. 2(a)(right) shows an actual result of the method on Fig. 2(a)(left).

3.2 Recovering the number of motions and subspace clustering

After outlier rejection, we further take advantage of the mapping induced by ORK for recovering the number of motions and subspace clustering. On the remaining data, we perform Kernel PCA [11] to seek the principal components which maximize the variance of the data in the RKHS, as Fig. 1(b) illustrates. Let $\{y_i\}_{i=1,\dots,N'}$ be the N′-point subset of the input data that remains after outlier removal, where N′ < N. Denote by $C = [\phi(y_1) \dots \phi(y_{N'})]$ the data matrix after mapping the data to $\mathcal{F}_{k_{\tilde{r}}}$, and by $\tilde{C}$ the result of adjusting C with the empirical mean of $\{\phi(y_1), \dots, \phi(y_{N'})\}$. The centered kernel matrix $\tilde{K}' = \tilde{C}^T \tilde{C}$ [11] can be obtained as

$$\tilde{K}' = \nu^T K' \nu, \quad \nu = [I_{N'} - \tfrac{1}{N'} 1_{N',N'}], \qquad (12)$$

where $K' = C^T C$ is the uncentered kernel matrix, and $I_s$ and $1_{s,s}$ are respectively the s × s identity matrix and a matrix of ones. If $\tilde{K}' = R \Omega R^T$ is the EVD of $\tilde{K}'$, then we obtain the first-m kernel principal components $P^m$ of $\tilde{C}$ as the first-m left singular vectors of $\tilde{C}$, i.e.

$$P^m = \tilde{C} R^m (\Omega^m)^{-\frac{1}{2}}, \qquad (13)$$

where $R^m = R_{:,1:m}$ and $\Omega^m = \Omega_{1:m,1:m}$; see Eq. (9). Projecting the data on the principal components yields

$$D = [d_1 \dots d_{N'}] = (\Omega^m)^{\frac{1}{2}} (R^m)^T, \qquad (14)$$

where $D \in \mathbb{R}^{m \times N'}$. The affine subspace span($P^m$) maximizes the spread of the centered data in the RKHS, and the projection D offers an effective representation for clustering. Fig. 3(a) shows the Kernel PCA projection results for m = 3 on the sequence in Fig. 2(a).

The number of clusters in D is recovered via spectral clustering. More specifically we apply the Normalized Cut (Ncut) [13] algorithm. A fully connected graph is first derived from the data, where its weighted adjacency matrix $W \in \mathbb{R}^{N' \times N'}$ is obtained as

$$W_{p,q} = \exp(-\| d_p - d_q \|^2 / 2\delta^2), \qquad (15)$$

and δ is taken as the average nearest neighbour distance in the Euclidean sense among the vectors in D.
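The projection, outlier rejection and affinity steps of Eqs. (10)-(15) can be sketched with a few lines of linear algebra. This is an illustrative outline rather than the authors' code: it takes any symmetric positive semi-definite kernel matrix as input, assumes that a point is kept when its norm is strictly above the mean norm, and uses hypothetical function names. Duplicate points would make δ zero and are not handled.

```python
import numpy as np

def project(K, n):
    """Eq. (10) / Eq. (14): coordinates (n, N) in the top-n principal subspace of K."""
    vals, vecs = np.linalg.eigh(K)        # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n]      # indices of the n largest eigenvalues
    top = np.clip(vals[idx], 0.0, None)   # guard against tiny negative values
    return np.sqrt(top)[:, None] * vecs[:, idx].T

def reject_outliers(K, n=3):
    """Eq. (11): boolean mask of points whose norm exceeds the mean norm psi."""
    norms = np.linalg.norm(project(K, n), axis=0)
    return norms > norms.mean()

def kernel_pca(K, m=3):
    """Eqs. (12)-(14): centre the kernel matrix and project onto m components."""
    N = K.shape[0]
    nu = np.eye(N) - np.ones((N, N)) / N
    return project(nu.T @ K @ nu, m)

def affinity(D):
    """Eq. (15): Gaussian affinities with delta the mean nearest-neighbour distance."""
    diff = D[:, :, None] - D[:, None, :]
    dist = np.sqrt((diff ** 2).sum(axis=0))
    off_diag = np.where(np.eye(dist.shape[0], dtype=bool), np.inf, dist)
    delta = off_diag.min(axis=1).mean()
    return np.exp(-dist ** 2 / (2 * delta ** 2))
```

Given a kernel matrix `K`, one would compute `keep = reject_outliers(K)`, then `D = kernel_pca(K[np.ix_(keep, keep)])` and `W = affinity(D)` before handing `W` to a spectral clustering routine.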
The Laplacian matrix [13] is then derived from W and eigendecomposed. Under Ncut, the number of clusters is revealed as the number of eigenvalues of the Laplacian that are zero or numerically insignificant. With this knowledge, a subsequent k-means step is then performed to cluster the points. Fig. 3(b) shows W for the input data in Fig. 2(a)(left) after outlier removal. It can be seen that strong affinity exists between points from the same cluster, thus allowing accurate clustering. Figs. 3(a) and 3(c) illustrate the final clustering result for the data in Fig. 2(a)(left).

[Figure 3 panels: (a) Kernel PCA and Ncut results. (b) W matrix. (c) Final result for “cars10”.]

Figure 3: Actual results on the motion sequence in Fig. 2(a)(left).

There are several reasons why spectral clustering under our framework is more successful than previous methods. Firstly, we perform an effective outlier rejection step that removes bad trajectories that can potentially mislead the clustering. Secondly, the mapping induced by ORK deliberately separates the trajectories based on their cluster membership. Finally, we perform Kernel PCA to maximize the variance of the data. Effectively this also improves the separation of clusters, thus facilitating an accurate recovery of the number of clusters and also the subsequent segmentation. This distinguishes our work from previous clustering based methods [21, 5] which tend to operate without maximizing the between-class scatter. Results in Sec. 4 validate our claims.

4 Results

Henceforth we indicate the proposed method as “ORK”. We leverage a recently published benchmark on affine model motion segmentation [18] as a basis of comparison. The benchmark was evaluated on the Hopkins 155 dataset⁴ which contains 155 sequences with tracked point trajectories. A total of 120 sequences have two motions while 35 have three motions.
The sequences contain degenerate and non-degenerate motions, independent and partially dependent motions, articulated motions, nonrigid motions etc. In terms of video content three categories exist: checkerboard sequences, traffic sequences (moving cars, trucks) and articulated motions (moving faces, people).

4.1 Details on benchmarking

Four major algorithms were compared in [18]: Generalized PCA (GPCA) [19], Local Subspace Affinity (LSA) [21], Multi-Stage Learning (MSL) [14] and RANSAC [17]. Here we extend the benchmark with newly reported results from Locally Linear Manifold Clustering (LLMC) [5] and Agglomerative Lossy Compression (ALC) [10, 9]. We also compare our method against Kanatani and Matsunaga’s [8] algorithm (henceforth, the “KM” method) in estimating the number of independent motions in the video sequences. Note that KM per se does not perform motion segmentation. For the sake of objective comparisons we use only publicly available implementations (see footnote 5). Following [18], motion segmentation performance is evaluated in terms of the labelling error of the point trajectories, where each point in a sequence has a ground truth label, i.e.

$$\text{classification error} = \frac{\text{number of mislabeled points}}{\text{total number of points}}. \quad (16)$$

Footnote 4: Available at http://www.vision.jhu.edu/data/hopkins155/.
Footnote 5: For MSL and KM, see http://www.suri.cs.okayama-u.ac.jp/e-program-separate.html/. For GPCA, LSA and RANSAC, refer to the url for the Hopkins 155 dataset.

Unlike [18], we also emphasize the ability of the methods to recover the number of motions. However, although the methods compared in [18] (except RANSAC) theoretically have the means to do so, their estimation of the number of motions is generally unreliable, and the benchmark results in [18] were obtained by revealing the actual number of motions to the algorithms.
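As a rough illustration of Eq. (16), the sketch below computes the fraction of mislabeled points. Cluster labels returned by a segmentation method are arbitrary, so the sketch assumes (this is not stated in the text) that the error is taken under the best one-to-one relabeling of predicted clusters, and it assumes integer labels. The function name `classification_error` is hypothetical, and the brute-force search over permutations is only practical for the two or three motions considered here.

```python
import numpy as np
from itertools import permutations


def classification_error(true_labels, pred_labels):
    """Eq. (16), minimized over one-to-one relabelings of predicted clusters."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    true_ids = np.unique(true_labels).tolist()
    pred_ids = np.unique(pred_labels).tolist()
    # If more clusters are predicted than exist, pad with dummy targets that
    # never match a real label, so surplus clusters count as mislabeled.
    n_dummy = max(0, len(pred_ids) - len(true_ids))
    targets = true_ids + [min(true_ids) - 1 - i for i in range(n_dummy)]
    best = len(true_labels)
    for perm in permutations(targets, len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))
        mapped = np.array([mapping[p] for p in pred_labels.tolist()])
        best = min(best, int(np.sum(mapped != true_labels)))
    return best / len(true_labels)


# One of six points ends up in the wrong cluster after the best relabeling.
err = classification_error([0, 0, 0, 1, 1, 1], [1, 1, 0, 0, 0, 0])
```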
A similar initialization exists in [5, 10] where the results were obtained by giving LLMC and ALC this knowledge a priori (for LLMC, this was given at least to the variant LLMC 4m during dimensionality reduction [5], where m is the true number of motions). In the following subsections, where variants exist for the compared algorithms we use results from the best performing variant. In the following the number of random hypotheses M and step size h for ORK are fixed at 1000 and 300 respectively, and unlike the others, ORK is not given knowledge of the number of motions.

4.2 Data without gross outliers

We apply ORK on the Hopkins 155 dataset. Since ORK uses random sampling we repeat it 100 times for each sequence and average the results. Table 1 lists the resulting classification error alongside those of previously proposed methods. ORK (column 9) gives comparable results to the other methods for sequences with 2 motions (mean = 7.83%, median = 0.41%). For sequences with 3 motions, ORK (mean = 12.62%, median = 4.75%) outperforms GPCA and RANSAC, but is slightly less accurate than the others. However, bear in mind that unlike the other methods ORK is not given prior knowledge of the true number of motions and has to estimate this independently.

Column  Method   2 motions         3 motions
                 Mean    Median    Mean    Median
1       REF      2.03    0.00      5.08    2.40
2       GPCA     4.59    0.38      28.66   28.26
3       LSA      3.45    0.59      9.73    2.33
4       MSL      4.14    0.00      8.23    1.76
5       RANSAC   5.56    1.18      22.94   22.03
6       LLMC     3.62    0.00      8.85    3.19
8       ALC      3.03    0.00      6.26    1.02
9       ORK      7.83    0.41      12.62   4.75
10      ORK∗     1.27    0.00      2.09    0.05

Table 1: Classification error (%) on Hopkins 155 sequences. REF represents the reference/control method which operates based on knowledge of ground truth segmentation. Refer to [18] for details.

We also separately investigate the accuracy of ORK in estimating the number of motions, and compare it against KM [8] which was proposed for this purpose.
Note that such an experiment was not attempted in [18] since approaches compared therein generally do not perform reliably in estimating the number of motions. The results in Table 2 (columns 1–2) show that for sequences with two motions, KM (80.83%) outperforms ORK (67.37%) by about 13 percentage points. However, for sequences with three motions, ORK (49.66%) vastly outperforms KM (14.29%), with more than three times the accuracy. The overall accuracy of KM (65.81%) is slightly better than ORK (63.37%), but this is mostly because sequences with two motions form the majority in the dataset (120 out of 155). This leads us to conclude that ORK is actually the superior method here.

Dataset                  Column  Method  2 motions  3 motions  Overall
Hopkins 155              1       KM      80.83%     14.29%     65.81%
Hopkins 155              2       ORK     67.37%     49.66%     63.37%
Hopkins 155 + Outliers   3       KM      0.00%      100.00%    22.58%
Hopkins 155 + Outliers   4       ORK     47.58%     50.00%     48.13%

Table 2: Accuracy in determining the number of motions in a sequence. Note that in the experiment with outliers (columns 3–4), KM returns a constant number of 3 motions for all sequences.

We re-evaluate the performance of ORK by considering only results on sequences where the number of motions is estimated correctly by ORK (about 98 such cases, i.e. 63.37%). The results are tabulated under ORK∗ (column 10) in Table 1. It can be seen that when ORK estimates the number of motions correctly, it is significantly more accurate than the other methods.

Finally, we compare the speed of the methods in Table 3. ORK was implemented and run in Matlab on a Dual Core Pentium 3.00GHz machine with 4GB of main memory (this is much less powerful than the 8 Core Xeon 3.66GHz with 32GB memory used in [18] for the other methods in Table 3). The results show that ORK is comparable to LSA, much faster than MSL and ALC, but slower than GPCA and RANSAC. Timing results of LLMC are not available in the literature.
Method   2 motions   3 motions
GPCA     324ms       738ms
LSA      7.584s      15.956s
MSL      11h 4m      1d 23h
RANSAC   175ms       258ms
ALC      10m 32s     10m 32s
ORK      4.249s      8.479s

Table 3: Average computation time on Hopkins 155 sequences.

4.3 Data with gross outliers

We next examine the ability of the proposed method to deal with gross outliers in motion data. For each sequence in Hopkins 155, we add 100 gross outliers by creating trajectories corresponding to mistracks or spuriously occurring points. These are created by randomly initializing 100 locations in the first frame and allowing them to drift throughout the sequence according to Brownian motion. The corrupted sequences are then subjected to the algorithms for motion segmentation. Since only ORK is capable of rejecting outliers, the classification error of Eq. (16) is evaluated on the inlier points only. The results in Table 4 illustrate that ORK (column 4) is the most accurate method by a large margin. Despite being given the true number of motions a priori, GPCA, LSA and RANSAC are unable to provide satisfactory segmentation results.

Column  Method   2 motions         3 motions
                 Mean    Median    Mean    Median
1       GPCA     28.66   30.96     40.61   41.30
2       LSA      24.25   26.51     30.94   27.68
3       RANSAC   30.64   32.36     42.24   43.43
4       ORK      16.50   10.54     19.99   8.49
5       ORK∗     1.62    0.00      2.68    0.09

Table 4: Classification error (%) on Hopkins 155 sequences with 100 gross outliers per sequence.

In terms of estimating the number of motions, as shown in column 4 in Table 2 the overall accuracy of ORK is reduced to 48.13%. This is mainly due to the deterioration in accuracy on sequences with two motions (47.58%), although the accuracy on sequences with three motions is maintained (50.00%). This is not a surprising result, since sequences with three motions generally have more (inlying) point trajectories than sequences with two motions, thus the outlier rates for sequences with three motions are lower (recall that a fixed number of 100 false trajectories are added).
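The outlier generation procedure described in this subsection (random initial locations in the first frame, followed by Brownian drift) might be sketched as below. This is a hypothetical illustration only: the function name, the image size and the per-frame step standard deviation `step_std` are assumptions, since the text does not specify them.

```python
import numpy as np


def brownian_outliers(num_outliers, num_frames, width, height,
                      step_std=2.0, seed=None):
    """Return an array of shape (num_outliers, num_frames, 2) of (x, y) tracks.

    step_std (pixels per frame) is an assumed value, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    # Random starting locations inside the first frame.
    start = rng.uniform([0.0, 0.0], [width, height], size=(num_outliers, 2))
    # Independent Gaussian increments per frame give a discrete Brownian path.
    steps = rng.normal(0.0, step_std, size=(num_outliers, num_frames - 1, 2))
    drift = np.cumsum(steps, axis=1)
    return np.concatenate(
        [start[:, None, :], start[:, None, :] + drift], axis=1)


# 100 outlier trajectories over 30 frames of an assumed 640 x 480 image.
outliers = brownian_outliers(100, 30, 640, 480, seed=0)
```

The resulting tracks can be appended to the tracked trajectories of a sequence before running a segmentation method; note that this sketch does not clip the drifting points to the image bounds.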
On the other hand, the KM method (column 3) is completely overwhelmed by the outliers: for all the sequences with outliers it returned a constant “3” as the number of motions. We again re-evaluate ORK by considering results from sequences (now with gross outliers) where the number of motions is correctly estimated (about 75 such cases, i.e. 48.13%). The results tabulated under ORK∗ (column 5) in Table 4 show that the proposed method can accurately segment the point trajectories without being influenced by the gross outliers.

5 Conclusions

In this paper we propose a novel and highly effective approach for multi-body motion segmentation. Our idea is based on encapsulating random hypotheses in a novel Mercer kernel and statistical learning. We evaluated our method on the Hopkins 155 dataset with results showing that the idea is superior to other state-of-the-art approaches. It is by far the most accurate in terms of estimating the number of motions, and it excels in segmentation accuracy despite lacking prior knowledge of the number of motions. The proposed idea is also highly robust towards outliers in the input data.

Acknowledgements. We are grateful to the authors of [18] especially René Vidal for discussions and insights which have been immensely helpful.

References

[1] T. Boult and L. Brown. Factorization-based segmentation of motions. In IEEE Workshop on Motion Understanding, 1991.
[2] T.-J. Chin, H. Wang, and D. Suter. Robust fitting of multiple structures: The statistical learning approach. In ICCV, 2009.
[3] J. Costeira and T. Kanade. A multibody factorization method for independently moving objects. IJCV, 29(3):159–179, 1998.
[4] M. A. Fischler and R. C. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Comm. of the ACM, 24:381–395, 1981.
[5] A. Goh and R. Vidal. Segmenting motions of different types by unsupervised manifold clustering. In CVPR, 2007.
[6] A. Gruber and Y. Weiss. Multibody factorization with uncertainty and missing data using the EM algorithm. In CVPR, 2004.
[7] K. Kanatani. Motion segmentation by subspace separation and model selection. In ICCV, 2001.
[8] K. Kanatani and C. Matsunaga. Estimating the number of independent motions for multibody segmentation. In ACCV, 2002.
[9] Y. Ma, H. Derksen, W. Hong, and J. Wright. Segmentation of multivariate mixed data via lossy coding and compression. TPAMI, 29(9):1546–1562, 2007.
[10] S. Rao, R. Tron, Y. Ma, and R. Vidal. Motion segmentation via robust subspace separation in the presence of outlying, incomplete, or corrupted trajectories. In CVPR, 2008.
[11] B. Schölkopf, A. Smola, and K. R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10:1299–1319, 1998.
[12] J. Shawe-Taylor and N. Cristianini. Kernel methods for pattern analysis. Cambridge University Press, 2004.
[13] J. Shi and J. Malik. Normalized cuts and image segmentation. TPAMI, 22(8):888–905, 2000.
[14] Y. Sugaya and K. Kanatani. Geometric structure of degeneracy for multi-body motion segmentation. In Workshop on Statistical Methods in Video Processing, 2004.
[15] R. Toldo and A. Fusiello. Robust multiple structures estimation with J-Linkage. In ECCV, 2008.
[16] C. Tomasi and T. Kanade. Shape and motion from image streams under orthography. IJCV, 9(2):137–154, 1992.
[17] P. Torr. Geometric motion segmentation and model selection. Phil. Trans. Royal Society of London, 356(1740):1321–1340, 1998.
[18] R. Tron and R. Vidal. A benchmark for the comparison of 3-D motion segmentation algorithms. In CVPR, 2007.
[19] R. Vidal and R. Hartley. Motion segmentation with missing data by PowerFactorization and Generalized PCA. In CVPR, 2004.
[20] R. Vidal, Y. Ma, and S. Sastry. Generalized Principal Component Analysis (GPCA). TPAMI, 27(12):1–15, 2005.
[21] J. Yan and M. Pollefeys. A general framework for motion segmentation: independent, articulated, rigid, non-rigid, degenerate and non-degenerate. In ECCV, 2006.
[22] L. Zelnik-Manor and M. Irani. Degeneracies, dependencies and their implications on multibody and multi-sequence factorization. In CVPR, 2003.
[23] W. Zhang and J. Košecká. Nonparametric estimation of multiple structures with outliers. In Dynamical Vision, ICCV 2005 and ECCV 2006 Workshops, 2006.

3 0.43486574 231 nips-2009-Statistical Models of Linear and Nonlinear Contextual Interactions in Early Visual Processing

Author: Ruben Coen-cagli, Peter Dayan, Odelia Schwartz

Abstract: A central hypothesis about early visual processing is that it represents inputs in a coordinate system matched to the statistics of natural scenes. Simple versions of this lead to Gabor-like receptive fields and divisive gain modulation from local surrounds; these have led to influential neural and psychological models of visual processing. However, these accounts are based on an incomplete view of the visual context surrounding each point. Here, we consider an approximate model of linear and non-linear correlations between the responses of spatially distributed Gabor-like receptive fields, which, when trained on an ensemble of natural scenes, unifies a range of spatial context effects. The full model accounts for neural surround data in primary visual cortex (V1), provides a statistical foundation for perceptual phenomena associated with Li’s (2002) hypothesis that V1 builds a saliency map, and fits data on the tilt illusion.

4 0.37275994 61 nips-2009-Convex Relaxation of Mixture Regression with Efficient Algorithms

Author: Novi Quadrianto, John Lim, Dale Schuurmans, Tibério S. Caetano

Abstract: We develop a convex relaxation of maximum a posteriori estimation of a mixture of regression models. Although our relaxation involves a semidefinite matrix variable, we reformulate the problem to eliminate the need for general semidefinite programming. In particular, we provide two reformulations that admit fast algorithms. The first is a max-min spectral reformulation exploiting quasi-Newton descent. The second is a min-min reformulation consisting of fast alternating steps of closed-form updates. We evaluate the methods against Expectation-Maximization in a real problem of motion segmentation from video data.

5 0.3630881 1 nips-2009-$L 1$-Penalized Robust Estimation for a Class of Inverse Problems Arising in Multiview Geometry

Author: Arnak Dalalyan, Renaud Keriven

Abstract: We propose a new approach to the problem of robust estimation in multiview geometry. Inspired by recent advances in the sparse recovery problem of statistics, we define our estimator as a Bayesian maximum a posteriori with multivariate Laplace prior on the vector describing the outliers. This leads to an estimator in which the fidelity to the data is measured by the L∞-norm while the regularization is done by the L1-norm. The proposed procedure is fairly fast since the outlier removal is done by solving one linear program (LP). An important difference compared to existing algorithms is that for our estimator it is not necessary to specify either the number or the proportion of the outliers. We present strong theoretical results assessing the accuracy of our procedure, as well as a numerical example illustrating its efficiency on real data.

6 0.33644941 137 nips-2009-Learning transport operators for image manifolds

7 0.3182835 99 nips-2009-Functional network reorganization in motor cortex can be explained by reward-modulated Hebbian learning

8 0.31191462 164 nips-2009-No evidence for active sparsification in the visual cortex

9 0.30775964 162 nips-2009-Neural Implementation of Hierarchical Bayesian Inference by Importance Sampling

10 0.30730197 85 nips-2009-Explaining human multiple object tracking as resource-constrained approximate inference in a dynamic probabilistic model

11 0.30165806 97 nips-2009-Free energy score space

12 0.29752612 13 nips-2009-A Neural Implementation of the Kalman Filter

13 0.28073403 50 nips-2009-Canonical Time Warping for Alignment of Human Behavior

14 0.27830896 38 nips-2009-Augmenting Feature-driven fMRI Analyses: Semi-supervised learning and resting state activity

15 0.26940864 235 nips-2009-Structural inference affects depth perception in the context of potential occlusion

16 0.26786324 219 nips-2009-Slow, Decorrelated Features for Pretraining Complex Cell-like Networks

17 0.26731378 19 nips-2009-A joint maximum-entropy model for binary neural population patterns and continuous signals

18 0.26405197 188 nips-2009-Perceptual Multistability as Markov Chain Monte Carlo Inference

19 0.25661871 203 nips-2009-Replacing supervised classification learning by Slow Feature Analysis in spiking neural networks

20 0.25299829 216 nips-2009-Sequential effects reflect parallel learning of multiple environmental regularities


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(21, 0.02), (24, 0.014), (25, 0.082), (35, 0.051), (36, 0.054), (37, 0.027), (39, 0.073), (57, 0.343), (58, 0.063), (62, 0.017), (71, 0.03), (81, 0.046), (86, 0.068), (91, 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.85804003 88 nips-2009-Extending Phase Mechanism to Differential Motion Opponency for Motion Pop-out

Author: Yicong Meng, Bertram E. Shi

Abstract: We extend the concept of phase tuning, a ubiquitous mechanism among sensory neurons including motion and disparity selective neurons, to the motion contrast detection. We demonstrate that the motion contrast can be detected by phase shifts between motion neuronal responses in different spatial regions. By constructing the differential motion opponency in response to motions in two different spatial regions, varying motion contrasts can be detected, where similar motion is detected by zero phase shifts and differences in motion by non-zero phase shifts. The model can exhibit either enhancement or suppression of responses by either different or similar motion in the surrounding. A primary advantage of the model is that the responses are selective to relative motion instead of absolute motion, which could model neurons found in neurophysiological experiments responsible for motion pop-out detection.

1 Introduction

Motion discontinuity or motion contrast is an important cue for the pop-out of salient moving objects from contextual backgrounds. Although the neural mechanism underlying the motion pop-out detection is still unknown, the center-surround receptive field (RF) organization is considered as a physiological basis responsible for the pop-out detection. The center-surround RF structure is simple and ubiquitous in cortical cells especially in neurons processing motion and color information. Nakayama and Loomis [1] have predicted the existence of motion selective neurons with antagonistic center-surround receptive field organization in 1974. Recent physiological experiments [2][3] show that neurons with center-surround RFs have been found in both middle temporal (MT) and medial superior temporal (MST) areas related to motion processing. This antagonistic mechanism has been suggested to detect motion segmentation [4], figure/ground segregation [5] and the differentiation of object motion from ego-motion [6].
There are many related works [7]-[12] on motion pop-out detection. Some works [7]-[9] are based on spatio-temporal filtering outputs, but the interaction between motion neurons is incomplete: they either only inhibit similar motion [7] or only enhance opposite motion [8]. Heeger et al. [7] proposed a center-surround operator to eliminate the response dependence upon rotational motions. However, Heeger's model only shows a complete center-surround interaction for moving directions. With respect to the surrounding speed effects, the neuronal responses are suppressed by a surround speed equal to that of the center motion but are not enhanced by other speeds. A similar problem exists in [8], which only modeled the suppression of neuronal responses in the classical receptive field (CRF) by similar motions in surrounding regions. Physiological experiments [10][11] show that many neurons in visual cortex are sensitive to the motion contrast rather than dependent upon the absolute direction and speed of the object motion. Although pooling over motion neurons tuned to different velocities can eliminate the dependence upon absolute velocities, it is computationally inefficient and still cannot give full interactions of both suppression and enhancement by similar and opposite surrounding motions. The model proposed by Dellen et al. [12] computed differential motion responses directly from complex cells in V1 and did not utilize responses from direction selective neurons. In this paper, we propose an opponency model which directly responds to differential motions by utilizing the phase shift mechanism. Phase tuning is a ubiquitous mechanism in sensory information processing, including motion, disparity and depth detection. Disparity selective neurons in the visual cortex have been found to detect disparities by adjusting the phase shift between the receptive field organizations in the left and right eyes [13][14].
Motion sensitive cells have been modeled in a similar way to the disparity energy neurons and detect image motions by utilizing the phase shift between the real and imaginary parts of temporal complex valued responses, which are comparable to images to the left and right eyes [15]. Therefore, the differential motion can be modeled by exploring the similarity between images from different spatial regions and from different eyes.

The remainder of this paper is organized as follows. Section 2 illustrates the phase shift motion energy neurons which estimate image velocities by the phase tuning in the imaginary path of the temporal receptive field responses. In section 3, we extend the concept of phase tuning to the construction of differential motion opponency. The phase difference determines the preferred velocity difference between adjacent areas in retinal images. Section 4 investigates properties of motion pop-out detection by the proposed motion opponency model. Finally, in section 5, we relate our proposed model to the neural mechanism of motion integration and motion segmentation in motion related areas and suggest a possible interpretation for adaptive center-surround interactions observed in biological experiments.

2 Phase Shift Motion Energy Neurons

Adelson and Bergen [16] proposed the motion energy model for visual motion perception by measuring spatio-temporal orientations of image sequences in space and time. The motion energy model posits that the responses of direction-selective V1 complex cells can be computed by a combination of two linear spatio-temporal filtering stages, followed by squaring and summation. The motion energy model was extended in [15] to be phase tuned by splitting the complex valued temporal responses into real and imaginary paths and adding a phase shift on the imaginary path. Figure 1(a) shows the schematic diagram of the phase shift motion energy model.
Here we assume an input image sequence in two-dimensional space (x, y) and time t. The separable spatio-temporal receptive field allows a cascade implementation of the RF with spatial and temporal filters. Because the temporal RF must be causal, the phase shift motion energy model does not use a Gabor filter for the temporal RF as it does for the spatial RF. The phase shift spatio-temporal RF is modeled with a complex valued function $f(x, y, t) = g(x, y) \cdot h(t, \Phi)$, where the spatial and temporal RFs are denoted by $g(x, y)$ and $h(t, \Phi)$ respectively,

$$g(x, y) = N(x, y \mid 0, C) \exp(j\Omega_x x + j\Omega_y y)$$
$$h(t, \Phi) = h_{real}(t) + \exp(j\Phi)\, h_{imag}(t) \quad (1)$$

and $C$ is the covariance matrix of the spatial Gaussian envelope and $\Phi$ is the phase tuning of the motion energy neuron. The real and imaginary profiles of the temporal receptive field are Gamma modulated sinusoidal functions with quadrature phases,

$$h_{real}(t) = G(t \mid \alpha, \tau) \cos(\Omega_t t)$$
$$h_{imag}(t) = G(t \mid \alpha, \tau) \sin(\Omega_t t) \quad (2)$$

The envelopes for complex exponentials are functions of Gaussian and Gamma distributions,

$$N(x, y \mid 0, C) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\left( -\frac{x^2}{2\sigma_x^2} - \frac{y^2}{2\sigma_y^2} \right) \quad (3)$$

$$G(t \mid \alpha, \tau) = \frac{1}{\Gamma(\alpha)\tau^{\alpha}}\, t^{\alpha - 1} \exp\left( -\frac{t}{\tau} \right) u(t) \quad (4)$$

where $\Gamma(\alpha)$ is the gamma function and $u(t)$ is the unit step function. The parameters $\alpha$ and $\tau$ determine the temporal RF size.

[Figure 1. (a) shows the diagram of the phase shift motion energy model adapted from [15]. (b) draws the spatiotemporal representation of the phase shift motion energy neuron with the real and imaginary receptive field demonstrated by the two left pictures. (c) illustrates the construction of differential motion opponency with a phase difference Θ from two populations of phase shift motion energy neurons in two spatial areas c and s. To avoid clutter, the space location (x, y) is not explicitly shown in phase tuned motion energies.]

As derived in [15], the motion energy at location (x, y) can be computed by

$$E_v(x, y, \Phi) = S + P \cos(\Psi - \Phi) \quad (5)$$

where

$$S = |V_{real}|^2 + |V_{imag}|^2$$
$$P = 2\,|V_{real} V_{imag}^*|$$
$$\Psi = \arg\left( V_{real} V_{imag}^* \right) \quad (6)$$

and complex valued responses in real and imaginary paths are obtained as,

$$V_{real}(x, y, t) = \iiint_{\xi,\zeta,\eta} g(\xi, \zeta)\, h_{real}(\eta)\, I(x - \xi, y - \zeta, t - \eta)\, d\xi\, d\zeta\, d\eta$$
$$V_{imag}(x, y, t) = \iiint_{\xi,\zeta,\eta} g(\xi, \zeta)\, h_{imag}(\eta)\, I(x - \xi, y - \zeta, t - \eta)\, d\xi\, d\zeta\, d\eta \quad (7)$$

The superscript * represents the complex conjugation and the phase shift parameter $\Phi$ controls the spatio-temporal orientation tuning. To avoid clutter, the spatial location variables x and y for $S$, $P$, $\Psi$, $V_{real}$ and $V_{imag}$ are not explicitly shown in Eq. (5) and (6). Figure 1(b) demonstrates the even and odd profiles of the spatio-temporal RF tuned to a particular phase shift.

[Figure 2. Two types of differential motion opponency constructions of (a) center-surrounding interaction and (b) left-right interaction. Among cells in area MT with surrounding modulations, 25% of cells are with the antagonistic RF structure in the top row and another 50% of cells have the integrative RF structure as shown in the bottom row.]

3 Extending Phase Opponency Mechanism to Differential Motion

Based on the above phase shift motion energy model, the local image velocity at each spatial location can be represented by a phase shift which leads to the peak response across a population of motion energy neurons. Across regions of different motions, there are clear discontinuities on the estimated velocity map. The motion discontinuities can be detected by edge detectors on the velocity map to segment different motions.
However, this algorithm for motion discontinuity detection cannot discriminate between the object motion and uniform motions in contextual backgrounds. Here we propose a phase mechanism to detect differential motions inspired by the disparity energy model and adopt the center-surround inhibition mechanism to pop out the object motion from contextual background motions. The motion differences between different spatial locations can be modeled in a similar way to the disparity model. The motion energies from two neighboring locations are treated like the retinal images to the left and right eyes. Thus, we can construct a differential motion opponency by placing two populations of phase shift motion energy neurons at different spatial locations, and the energy $E_{\Delta v}(\Theta)$ of the opponency is the squared modulus of the averaged phase shift motion energies over space and phase,

$$E_{\Delta v}(\Theta) = \left| \iiint E_v(x, y, \Phi) \cdot w(x, y, \Phi \mid \Theta)\, dx\, dy\, d\Phi \right|^2 \quad (8)$$

where $w(x, y, \Phi \mid \Theta)$ is the profile for differential motion opponency and $\Delta v$ is the velocity difference between the two spatial regions defined by the kernel $w(x, y, \Phi \mid \Theta)$. Since $w(x, y, \Phi \mid \Theta)$ is intended to implement the functional role of spatial interactions, it is desired to be a separable function in space and phase domain and can be modeled by phase tuned summation of two spatial kernels,

$$w(x, y, \Phi \mid \Theta) = w_c(x, y)\, e^{j\Phi} + e^{j\Theta + j\Phi}\, w_s(x, y) \quad (9)$$

where $w_c(x, y)$ and $w_s(x, y)$ are Gaussian kernels of different spatial sizes $\sigma_c$ and $\sigma_s$, and $\Theta$ is the phase difference representing velocity difference between two spatial regions c and s. Substituting Eq. (9) into Eq. (8), the differential motion energy can be reformulated as

$$E_{\Delta v}(\Theta) = \left| K_c + e^{j\Theta} K_s \right|^2 \quad (10)$$

where

$$K_c = \iiint_{x,y,\Phi} E_{v,c}(x, y, \Phi) \exp(j\Phi)\, w_c(x, y)\, dx\, dy\, d\Phi$$
$$K_s = \iiint_{x,y,\Phi} E_{v,s}(x, y, \Phi) \exp(j\Phi)\, w_s(x, y)\, dx\, dy\, d\Phi \quad (11)$$

$E_{v,c}(x, y, \Phi)$ and $E_{v,s}(x, y, \Phi)$ are phase shift motion energies at location (x, y) and with phase shift $\Phi$. Utilizing the results in Eq. (5) and (6), Eq. (10) and (11) generate similar results,

$$E_{\Delta v}(\Theta) = S_{opp} + P_{opp} \cos(\Theta_{opp} - \Theta) \quad (12)$$

where

$$S_{opp} = |K_c|^2 + |K_s|^2$$
$$P_{opp} = 2\,|K_c K_s^*|$$
$$\Theta_{opp} = \arg\left( K_c K_s^* \right) \quad (13)$$

[Figure 3. (a) Phase map and (b) peak magnitude map, plotted over left velocity and right velocity (each from −3 to 3), are obtained from stimuli of two patches of random dots moving with different velocities. The two patches of stimuli are statistically independent but share the same spatial properties: dot size of 2 pixels, dot density of 10% and dot coherence level of 100%. The phase tuned population of motion energy neurons are applied to each patch of random dots with RF parameters: Ωt = 2π/8, Ωt = 2π/16, σx = 5 and τ = 5.5. For each combination of velocities from left and right patches, averaged phase shifts over space and time are computed, as are the magnitudes of the peak responses. The unit for velocities is pixels per frame.]

According to the above derivations, by varying the phase shift $\Theta$ between $-\pi$ and $\pi$, the relative motion energy of the differential motion opponency can be modeled as population responses across a population of phase tuned motion opponencies. The response is completely specified by three parameters $S_{opp}$, $P_{opp}$ and $\Theta_{opp}$.

The schematic diagram of this opponency is illustrated in Figure 1(c). The differential motion opponency consists of three stages. At the first stage, a population of phase shift motion energy neurons is applied to be selective to different velocities. At the second stage, motion energies from the first stage are weighted by kernels tuned to different spatial locations and phase shifts respectively for both spatial regions, and two single differential motion signals in region c and region s are obtained by integrating responses from these two regions over space and phase tuning. Finally, the differential motion energy is computed by the squared modulus of the summation of the integrated motion signal in region c and the phase shifted motion signal in region s. The subscripts c and s represent two interacting spatial regions which are not limited to the center and surround regions. The opponency could also be constructed from neighboring left and right spatial regions.

[Figure 4. Demonstrations of center-surround differential motion opponency (responses versus surrounding direction, from 0 to 2π), where (a) shows the excitation of opposite directions outside the CRF and (b) shows the inhibition by surrounding motions in same directions. The center-surround inhibition models by Petkov, et al. [8] and Heeger, et al. [7] are shown in (c) and (d). Responses above 1 indicate enhancement and responses below 1 indicate suppressions.]

Figure 2 shows two types of structures for the differential motion opponency. The authors of [17] demonstrate that among cells in area MT with surrounding modulations, 25% of cells have the antagonistic RF structure as shown in Figure 2(a) and another 50% of cells have the integrative RF structure as shown in Figure 2(b).
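The equivalence between Eq. (10) and the cosine form of Eqs. (12) and (13) can be checked numerically with a short sketch. The function names and the two complex values standing in for $K_c$ and $K_s$ are hypothetical. The last function is one possible way (an assumption, not described in the paper) to read the preferred phase $\Theta_{opp}$ out of a population of responses sampled uniformly in $\Theta$, using the first Fourier component of the response curve.

```python
import numpy as np


def opponency_energy(K_c, K_s, theta):
    """Eq. (10): squared modulus of K_c plus the phase shifted K_s."""
    return np.abs(K_c + np.exp(1j * theta) * K_s) ** 2


def opponency_parameters(K_c, K_s):
    """Eq. (13): S_opp, P_opp and Theta_opp from the two pooled signals."""
    cross = K_c * np.conj(K_s)
    return abs(K_c) ** 2 + abs(K_s) ** 2, 2.0 * abs(cross), np.angle(cross)


def read_out_phase(energies, thetas):
    """Assumed readout: phase of the first Fourier component of E(theta)."""
    return np.angle(np.sum(energies * np.exp(1j * thetas)))


# Arbitrary complex values standing in for the pooled signals K_c and K_s.
K_c, K_s = 1.0 + 0.5j, 0.3 - 0.8j
thetas = np.linspace(-np.pi, np.pi, 16, endpoint=False)
energies = opponency_energy(K_c, K_s, thetas)
S_opp, P_opp, theta_opp = opponency_parameters(K_c, K_s)
```

The same algebra underlies Eq. (5), with $V_{real}$ and $V_{imag}$ in place of $K_c$ and $K_s$.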
The velocity difference tuning of the opponency is determined by the phase shift parameter Θ combined with the spatial and temporal frequency parameters of the motion energy neurons. A larger phase shift magnitude corresponds to a preference for a larger velocity difference. This phase tuning of velocity difference is consistent with the phase tuning of motion energy neurons. Figure 3 shows the phase map obtained by using random dot stimuli with different velocities on two spatial patches (left and right patches, each 128 × 128 pixels). Along the diagonal line, the velocities of the left and right patches are equal and therefore the phase estimates are zero along this line. Moving away from the diagonal toward the upper-left and lower-right, the phase magnitudes increase, with positive phases indicating larger left velocities and negative phases indicating larger right velocities. The phase tuning thus gives a good classification of velocity differences.
4 Validation of Differential Motion Opponency
Our derivation and analysis above show that the phase shift between two neighboring spatial regions is a good indicator of the motion difference between these two regions. In this section, we validate the proposed differential motion opponency with two sets of experiments, which show the effects of both surrounding directions and speeds on the center motion.
[Figure 5: two panels (a) and (b), each plotting Responses (0 to 2) against Center Speed (-2 to 2).]
Figure 5. The insensitivity of the proposed opponency model to absolute center and surrounding velocities is demonstrated in (a), where responses are enhanced for all center velocities from -2 to 2 pixels per frame. In (b), the model by Heeger et al. [7] only shows enhancement when the center speed matches the preferred speed of 1.2 pixels per frame. As before, responses above 1 indicate enhancement and responses below 1 indicate suppression.
In both curves, the velocity difference between the center and surrounding regions is held constant at 3 pixels per frame.
Physiological experiments [2][3] have demonstrated that neuronal activity in the classical receptive field (CRF) is suppressed by responses outside the CRF when the stimuli on the center and surrounding regions have similar motions, in both direction and speed. In contrast, visual stimuli of opposite directions or quite different speeds outside the CRF enhance the responses in the CRF. In those experiments, the stimuli were random dots moving at different velocities, with small patches of moving random dots on the center. We tested the properties of the proposed opponency model for motion difference measurement by using similar random dot stimuli. The random dots on the background move with different speeds and in different directions but have the same statistical parameters: dot size of 2 pixels, dot density of 10% and motion coherence level of 100%. The small random dot patches are placed on the center of the background stimuli to stimulate the neurons in the CRF. These small patches share the same statistical parameters as the background random dots but move with a constant velocity of 1 pixel per frame. Figure 4 shows results for the enhanced and suppressed responses in the CRF with varying surrounding directions. The phase shift motion energy neurons had the same spatial and temporal frequencies and the same receptive field sizes, and were selective to vertical orientations. The preferred spatial frequency was 2π/16 radians per pixel and the temporal frequency was 2π/16 radians per frame. The sizes of the RF in the horizontal and vertical directions were 5 pixels and 10 pixels respectively, corresponding to a spatial bandwidth of 1.96 octaves. The time constant τ was 5.5 frames, which resulted in a temporal bandwidth of 1.96 octaves.
As shown in Figure 4 (a) and (b), the surrounding motion of opposite direction gives the largest response to the motion in the CRF for the inhibitory interaction and the smallest response for the excitatory interaction. The results in Figure 4 are consistent with the physiological results reported in [3]. In Born's paper [3], inhibitory cells show response enhancement and excitatory cells show response suppression when surrounding motions are in opposite directions. The 3-dB bandwidth for the surrounding moving direction is about 135 degrees in the physiological experiments, while it is about 180 degrees for the simulation results of our proposed model. Models proposed by Petkov et al. [8] and Heeger et al. [7] also show clear inhibition between opposite motions. Petkov's model achieves surround suppression at each point in (x, y, t) space by subtracting the surround response from the response at that point, followed by half-wave rectification,
$\tilde{E}_{v,\theta}(x, y, t) = \left| E_{v,\theta}(x, y, t) - \alpha \cdot S_{v,\theta}(x, y, t) \right|^{+}$ (14)
where $E_{v,\theta}(x, y, t)$ is the motion energy at location (x, y) and time t for a given preferred speed v and orientation θ, $S_{v,\theta}(x, y, t)$ is the average motion energy in the surrounding of point (x, y, t), $\tilde{E}_{v,\theta}(x, y, t)$ is the suppressed motion energy, $|\cdot|^{+}$ denotes half-wave rectification and the factor α controls the inhibition strength. The inhibition term is computed as the weighted motion energy
$S_{v,\theta}(x, y, t) = E_{v,\theta}(x, y, t) \ast w_{v,\theta}(x, y, t)$ (15)
where $w_{v,\theta}(x, y, t)$ is the surround weighting function. Heeger's model constructs the center-surround motion opponent by computing the weighted sum of responses from motion selective cells,
$R_{v,\theta}(t) = \sum_{x,y} \beta(x, y) \left[ E_{v,\theta}(x, y, t) - E_{-v,\theta}(x, y, t) \right]$ (16)
where $\beta(x, y)$ is a center-surround weighting function and the motion energy at each point should be normalized across all cells with different tuning properties.
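The two baseline operations in Eq. (14) and Eq. (16) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not either original model: energies are plain numbers on a one-dimensional row of positions, the surround energy is passed in directly rather than computed through Eq. (15), the normalization across cells mentioned for Heeger's model is omitted, and the weights in BETA are hypothetical.

```python
def petkov_suppressed_energy(center_energy: float, surround_energy: float, alpha: float) -> float:
    """Eq. (14): subtract the scaled surround energy, then half-wave rectify."""
    return max(0.0, center_energy - alpha * surround_energy)


def heeger_opponent_response(weights, energy_pref, energy_anti) -> float:
    """Eq. (16): weighted sum over positions of E_v minus E_{-v}."""
    return sum(b * (e_p - e_a) for b, e_p, e_a in zip(weights, energy_pref, energy_anti))


# Hypothetical center-surround weights: positive center, negative surround, zero sum.
BETA = [-0.25, -0.25, 1.0, -0.25, -0.25]

if __name__ == "__main__":
    # Center moves in the preferred direction, surround is empty.
    print(heeger_opponent_response(BETA, [0, 0, 1, 0, 0], [0, 0, 0, 0, 0]))
    # Same center, surround moves in the opposite direction.
    print(heeger_opponent_response(BETA, [0, 0, 1, 0, 0], [1, 1, 0, 1, 1]))
    # Uniform motion in the preferred direction everywhere.
    print(heeger_opponent_response(BETA, [1, 1, 1, 1, 1], [0, 0, 0, 0, 0]))
```

With these toy weights the opponent response grows when the surround moves opposite to the center and vanishes under uniform motion, while the rectified subtraction of Eq. (14) can only reduce the center energy.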
For the results of Petkov's and Heeger's models shown in Figure 4 (c) and (d), we replaced the conventional frequency tuned motion energy neurons with our proposed phase tuned neurons. The model by Petkov et al. [8] is generally suppressive and only shows reduced suppression for opposite motions, which is inconsistent with the results in [3]. The model by Heeger et al. [7] has properties similar to those of our proposed model with respect to both excitatory and inhibitory interactions. To evaluate the sensitivity of the proposed opponency model to velocity differences, we ran simulations using stimuli similar to those in the Figure 4 experiment, but maintaining a constant velocity difference of 3 pixels per frame between the center and surrounding random dot patches. As shown in Figure 5, when varying the velocity of the random dots on the center region, the responses of the proposed model are always enhanced regardless of the absolute velocity of the center stimuli, whereas the responses of Heeger's model are enhanced only at a center velocity of 1.2 pixels per frame and remain suppressed at other speeds.
5 Discussion
We proposed a new biologically plausible model of differential motion opponency to model the spatial interaction property of motion energy neurons. The proposed opponency model is motivated by the phase tuning mechanism of disparity energy neurons, which infers disparity information from the phase difference between the complex valued responses to the left and right retinal images. Analogously, the two neighboring spatial areas can be treated as left and right images, and the motion difference between these two spatial regions is detected by the phase difference between the complex valued responses at these two regions.
Our experimental results are consistent with physiological experiments in showing that motions of opposite directions and different speeds outside the CRF can have both inhibitive and excitatory effects on the CRF responses. The inhibitive interaction helps to segment the moving object from the background when fed back to low-level features such as edges, orientations and color information. Besides providing a unifying phase mechanism for understanding neurons with different functional roles in different brain areas, the proposed opponency model could possibly provide a way to understand motion integration and motion segmentation. Integration and segmentation are two opposite motion perception tasks, but they co-exist and constitute two fundamental types of motion processing. Segmentation is achieved by discriminating motion signals from different objects, which is thought to be due to the antagonistic interaction between center and surrounding RFs. Integration is obtained by utilizing the enhancing function of surrounding areas on CRF areas. Both types of processing have been found in motion related areas including areas MT and MST. Tadin et al. [18] found that motion segmentation dominates at high stimulus contrast and gives way to motion integration at low stimulus contrast. Huang et al. [19] suggest that the surrounding modulation is adaptive to visual stimulus properties such as contrast and noise level. Since our proposed opponency model determines the functional role of neurons by the phase shift parameter alone, it is a promising candidate model for adaptive surrounding modulation through phase tuning between two spatial regions.
References
[1]. K. Nakayama and J. M. Loomis, “Optical velocity patterns, velocity-sensitive neurons, and space perception: A hypothesis,” Perception, vol. 3, pp. 63-80, 1974.
[2]. K. Tanaka, K. Hikosaka, H. Saito, M. Yukie, Y. Fukada and E.
Iwai, “Analysis of local and wide-field movements in the superior temporal visual areas of the macaque monkey,” Journal of Neuroscience, vol. 6, pp. 134-144, 1986.
[3]. R. T. Born and R. B. H. Tootell, “Segregation of global and local motion processing in primate middle temporal visual area,” Nature, vol. 357, pp. 497-499, 1992.
[4]. J. Allman, F. Miezin and E. McGuinness, “Stimulus specific responses from beyond the classical receptive field: Neurophysiological mechanisms for local-global comparisons in visual neurons,” Annual Review of Neuroscience, vol. 8, pp. 407-430, 1985.
[5]. V. A. F. Lamme, “The neurophysiology of figure-ground segregation in primary visual cortex,” Journal of Neuroscience, vol. 15, pp. 1605-1615, 1995.
[6]. D. C. Bradley and R. A. Andersen, “Center-surround antagonism based on disparity in primate area MT,” Journal of Neuroscience, vol. 18, pp. 7552-7565, 1998.
[7]. D. J. Heeger, A. D. Jepson and E. P. Simoncelli, “Recovering observer translation with center-surround operators,” Proc. IEEE Workshop on Visual Motion, pp. 95-100, Oct 1991.
[8]. N. Petkov and E. Subramanian, “Motion detection, noise reduction, texture suppression, and contour enhancement by spatiotemporal Gabor filters with surround inhibition,” Biological Cybernetics, vol. 97, pp. 423-439, 2007.
[9]. M. Escobar and P. Kornprobst, “Action recognition with a Bio-inspired feedforward motion processing model: the richness of center-surround interactions,” ECCV '08: Proceedings of the 10th European Conference on Computer Vision, pp. 186-199, Marseille, France, 2008.
[10]. B. J. Frost and K. Nakayama, “Single visual neurons code opposing motion independent of direction,” Science, vol. 220, pp. 744-745, 1983.
[11]. A. Cao and P. H. Schiller, “Neural responses to relative speed in the primary visual cortex of rhesus monkey,” Visual Neuroscience, vol. 20, pp. 77-84, 2003.
[12]. B. K. Dellen, J. W. Clark and R. Wessel, “Computing relative motion with complex cells,” Visual Neuroscience, vol. 22, pp. 225-236, 2005.
[13]. I. Ohzawa, G. C. DeAngelis and R. D. Freeman, “Encoding of binocular disparity by complex cells in the cat’s visual cortex,” Journal of Neurophysiology, vol. 77, pp. 2879-2909, 1997.
[14]. D. J. Fleet, H. Wagner and D. J. Heeger, “Neural encoding of binocular disparity: energy model, position shifts and phase shifts,” Vision Research, vol. 36, pp. 1839-1857, 1996.
[15]. Y. C. Meng and B. E. Shi, “Normalized Phase Shift Motion Energy Neuron Populations for Image Velocity Estimation,” International Joint Conference on Neural Networks, Atlanta, GA, June 14-19, 2009.
[16]. E. H. Adelson and J. R. Bergen, “Spatiotemporal energy models for the perception of motion,” J. Opt. Soc. Am. A Opt. Image Sci. Vis., vol. 2, pp. 284-299, 1985.
[17]. D. K. Xiao, S. Raiguel, V. Marcar, J. Koenderink and G. A. Orban, “The spatial distribution of the antagonistic surround of MT/V5,” Cereb Cortex, vol. 7, pp. 662-677, 1997.
[18]. D. Tadin, J. S. Lappin, L. A. Gilroy and R. Blake, “Perceptual consequences of centre-surround antagonism in visual motion processing,” Nature, vol. 424, pp. 312-315, 2003.
[19]. X. Huang, T. D. Albright and G. R. Stoner, “Adaptive surround modulation in cortical area MT,” Neuron, vol. 53, pp. 761-770, 2007.

2 0.73894858 232 nips-2009-Strategy Grafting in Extensive Games

Author: Kevin Waugh, Nolan Bard, Michael Bowling

Abstract: Extensive games are often used to model the interactions of multiple agents within an environment. Much recent work has focused on increasing the size of an extensive game that can be feasibly solved. Despite these improvements, many interesting games are still too large for such techniques. A common approach for computing strategies in these large games is to first employ an abstraction technique to reduce the original game to an abstract game that is of a manageable size. This abstract game is then solved and the resulting strategy is played in the original game. Most top programs in recent AAAI Computer Poker Competitions use this approach. The trend in this competition has been that strategies found in larger abstract games tend to beat strategies found in smaller abstract games. These larger abstract games have more expressive strategy spaces and therefore contain better strategies. In this paper we present a new method for computing strategies in large games. This method allows us to compute more expressive strategies without increasing the size of abstract games that we are required to solve. We demonstrate the power of the approach experimentally in both small and large games, while also providing a theoretical justification for the resulting improvement.

3 0.6583432 138 nips-2009-Learning with Compressible Priors

Author: Volkan Cevher

Abstract: We describe a set of probability distributions, dubbed compressible priors, whose independent and identically distributed (iid) realizations result in p-compressible signals. A signal $x \in \mathbb{R}^N$ is called p-compressible with magnitude R if its sorted coefficients exhibit a power-law decay as $|x|_{(i)} \lesssim R \cdot i^{-d}$, where the decay rate d is equal to 1/p. p-compressible signals live close to K-sparse signals ($K \ll N$) in the $\ell_r$-norm (r > p) since their best K-sparse approximation error decreases with $O(R \cdot K^{1/r - 1/p})$. We show that the membership of generalized Pareto, Student’s t, log-normal, Fréchet, and log-logistic distributions to the set of compressible priors depends only on the distribution parameters and is independent of N. In contrast, we demonstrate that the membership of the generalized Gaussian distribution (GGD) depends both on the signal dimension and the GGD parameters: the expected decay rate of N-sample iid realizations from the GGD with the shape parameter q is given by 1/[q log(N/q)]. As stylized examples, we show via experiments that the wavelet coefficients of natural images are 1.67-compressible whereas their pixel gradients are 0.95 log(N/0.95)-compressible, on the average. We also leverage the connections between compressible priors and sparse signals to develop new iterative re-weighted sparse signal recovery algorithms that outperform the standard $\ell_1$-norm minimization. Finally, we describe how to learn the hyperparameters of compressible priors in underdetermined regression problems by exploiting the geometry of their order statistics during signal recovery.
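The decay statement in this abstract can be illustrated with a small worked sketch. It assumes an idealized signal whose sorted coefficients follow the power law exactly, and the values of N, R, p and r are hypothetical choices for illustration, not values from the paper. For p = 1 and r = 2 the squared tail is bounded by the integral of x^(-2), so the best K-term error stays below R · K^(1/r - 1/p).

```python
def power_law_signal(n: int, magnitude: float, p: float):
    """Sorted coefficients R * i^(-1/p) for i = 1..n (idealized p-compressible signal)."""
    d = 1.0 / p
    return [magnitude * (i ** -d) for i in range(1, n + 1)]


def best_k_term_error(sorted_coeffs, k: int, r: float) -> float:
    """l_r norm of what remains after keeping the k largest coefficients."""
    tail = sorted_coeffs[k:]
    return sum(abs(c) ** r for c in tail) ** (1.0 / r)


if __name__ == "__main__":
    n, magnitude, p, r = 10000, 1.0, 1.0, 2.0  # hypothetical settings
    x = power_law_signal(n, magnitude, p)
    for k in (10, 100, 1000):
        error = best_k_term_error(x, k, r)
        bound = magnitude * k ** (1.0 / r - 1.0 / p)
        print(f"K={k:5d}  error={error:.5f}  R*K^(1/r-1/p)={bound:.5f}")
```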

4 0.53662574 147 nips-2009-Matrix Completion from Noisy Entries

Author: Raghunandan Keshavan, Andrea Montanari, Sewoong Oh

Abstract: Given a matrix M of low-rank, we consider the problem of reconstructing it from noisy observations of a small, random subset of its entries. The problem arises in a variety of applications, from collaborative filtering (the ‘Netflix problem’) to structure-from-motion and positioning. We study a low complexity algorithm introduced in [1], based on a combination of spectral techniques and manifold optimization, that we call here OptSpace. We prove performance guarantees that are order-optimal in a number of circumstances.

5 0.46350169 243 nips-2009-The Ordered Residual Kernel for Robust Motion Subspace Clustering

Author: Tat-jun Chin, Hanzi Wang, David Suter

Abstract: We present a novel and highly effective approach for multi-body motion segmentation. Drawing inspiration from robust statistical model fitting, we estimate putative subspace hypotheses from the data. However, instead of ranking them we encapsulate the hypotheses in a novel Mercer kernel which elicits the potential of two point trajectories to have emerged from the same subspace. The kernel permits the application of well-established statistical learning methods for effective outlier rejection, automatic recovery of the number of motions and accurate segmentation of the point trajectories. The method operates well under severe outliers arising from spurious trajectories or mistracks. Detailed experiments on a recent benchmark dataset (Hopkins 155) show that our method is superior to other state-of-the-art approaches in terms of recovering the number of motions, segmentation accuracy, robustness against gross outliers and computational efficiency.
1 Introduction
Multi-body motion segmentation concerns the separation of motions arising from multiple moving objects in a video sequence. The input data is usually a set of points on the surface of the objects which are tracked throughout the video sequence. Motion segmentation can serve as a useful preprocessing step for many computer vision applications. In recent years the case of rigid (i.e. nonarticulated) objects for which the motions could be semi-dependent on each other has received much attention [18, 14, 19, 21, 22, 17]. Under this domain the affine projection model is usually adopted. Such a model implies that the point trajectories from a particular motion lie on a linear subspace of dimension at most four, and trajectories from different motions lie on distinct subspaces. Thus multi-body motion segmentation is reduced to the problem of subspace segmentation or clustering.
To realize practical algorithms, motion segmentation approaches should possess four desirable attributes: (1) Accuracy in classifying the point trajectories to the motions they respectively belong to. This is crucial for success in the subsequent vision applications, e.g. object recognition, 3D reconstruction. (2) Robustness against inlier noise (e.g. slight localization error) and gross outliers (e.g. mistracks, spurious trajectories), since getting imperfect data is almost always unavoidable in practical circumstances. (3) Ability to automatically deduce the number of motions in the data. This is pivotal to accomplish fully automated vision applications. (4) Computational efficiency. This is integral for the processing of video sequences which are usually large amounts of data. Recent work on multi-body motion segmentation can roughly be divided into algebraic or factorization methods [3, 19, 20], statistical methods [17, 7, 14, 6, 10] and clustering methods [22, 21, 5]. Notable approaches include Generalized PCA (GPCA) [19, 20], an algebraic method based on the idea that one can fit a union of m subspaces with a set of polynomials of degree m. Statistical methods often employ concepts such as random hypothesis generation [4, 17], Expectation-Maximization [14, 6] and geometric model selection [7, 8]. Clustering based methods [22, 21, 5] are also gaining attention due to their effectiveness. They usually include a dimensionality reduction step (e.g. manifold learning [5]) followed by a clustering of the point trajectories (e.g. via spectral clustering in [21]). A recent benchmark [18] indicated that Local Subspace Affinity (LSA) [21] gave the best performance in terms of classification accuracy, although their result was subsequently surpassed by [5, 10].
Footnote 1 (to the Introduction heading): This work was supported by the Australian Research Council (ARC) under the project DP0878801.
However, we argue that most of the previous approaches do not simultaneously fulfil the qualities desirable of motion segmentation algorithms. Most notably, although some of the approaches have the means to estimate the number of motions, they are generally unreliable in this respect and require manual input of this parameter. In fact this prior knowledge was given to all the methods compared in [18] (footnote 2: as confirmed through private contact with the authors of [18]). Secondly, most of the methods (e.g. [19, 5]) do not explicitly deal with outliers. They will almost always break down when given corrupted data. These deficiencies reduce the usefulness of available motion segmentation algorithms in practical circumstances. In this paper we attempt to bridge the gap between experimental performance and practical usability. Our previous work [2] indicates that robust multi-structure model fitting can be achieved effectively with statistical learning. Here we extend this concept to motion subspace clustering. Drawing inspiration from robust statistical model fitting [4], we estimate random hypotheses of motion subspaces in the data. However, instead of ranking these hypotheses we encapsulate them in a novel Mercer kernel. The kernel can function reliably despite overwhelming sampling imbalance, and it permits the application of non-linear dimensionality reduction techniques to effectively identify and reject outlying trajectories. This is then followed by Kernel PCA [11] to maximize the separation between groups and spectral clustering [13] to recover the number of motions and clustering. Experiments on the Hopkins 155 benchmark dataset [18] show that our method is superior to other approaches in terms of the qualities described above, including computational efficiency.
1.1 Brief review of affine model multi-body motion segmentation
Let $\{t_{fp} \in \mathbb{R}^2\}_{f=1,\dots,F;\ p=1,\dots,P}$ be the set of 2D coordinates of P trajectories tracked across F frames. In multi-body motion segmentation the $t_{fp}$’s correspond to points on the surface of rigid objects which are moving. The goal is to separate the trajectories into groups corresponding to the motion they belong to. In other words, if we arrange the coordinates in the following data matrix
$T = \begin{bmatrix} t_{11} & \cdots & t_{1P} \\ \vdots & \ddots & \vdots \\ t_{F1} & \cdots & t_{FP} \end{bmatrix} \in \mathbb{R}^{2F \times P}$, (1)
the goal is to find the permutation $\Gamma \in \mathbb{R}^{P \times P}$ such that the columns of $T \cdot \Gamma$ are arranged according to the respective motions they belong to. It turns out that under affine projection [1, 16] trajectories from the same motion lie on a distinct subspace in $\mathbb{R}^{2F}$, and each of these motion subspaces is of dimensions 2, 3 or 4. Thus motion segmentation can be accomplished via clustering subspaces in $\mathbb{R}^{2F}$. See [1, 16] for more details. Realistically actual motion sequences might contain trajectories which do not correspond to valid objects or motions. These trajectories behave as outliers in the data and, if not taken into account, can be seriously detrimental to subspace clustering algorithms.
2 The Ordered Residual Kernel (ORK)
First, we take a statistical model fitting point of view to motion segmentation. Let $\{x_i\}_{i=1,\dots,N}$ be the set of N samples on which we want to perform model fitting. We randomly draw p-subsets from the data and use each to fit a hypothesis of the model, where p is the number of parameters that define the model. In motion segmentation, the $x_i$’s are the columns of matrix T, and p = 4 since the model is a four-dimensional subspace (footnote 3: ideally we should also consider degenerate motions with subspace dimensions 2 or 3, but previous work [18] using RANSAC [4] and our results suggest this is not a pressing issue for the Hopkins 155 dataset). Assume that M of such random hypotheses are drawn. For each data point $x_i$ compute its absolute residual set $r_i = \{r^i_1, \dots, r^i_M\}$ as measured to the M hypotheses. For motion segmentation, the residual is the orthogonal distance to a hypothesis subspace. We sort the elements in $r_i$ to obtain the sorted residual set $\tilde{r}_i = \{r^i_{\lambda^i_1}, \dots, r^i_{\lambda^i_M}\}$, where the permutation $\{\lambda^i_1, \dots, \lambda^i_M\}$ is obtained such that $r^i_{\lambda^i_1} \le \cdots \le r^i_{\lambda^i_M}$. Define the following
$\tilde{\theta}_i := \{\lambda^i_1, \dots, \lambda^i_M\}$ (2)
as the sorted hypothesis set of point $x_i$, i.e. $\tilde{\theta}_i$ depicts the order in which $x_i$ becomes the inlier of the M hypotheses as a fictitious inlier threshold is increased from 0 to ∞. We define the Ordered Residual Kernel (ORK) between two data points as
$k_{\tilde{r}}(x_{i_1}, x_{i_2}) := \frac{1}{Z} \sum_{t=1}^{M/h} z_t \cdot k^t_{\cap}(\tilde{\theta}_{i_1}, \tilde{\theta}_{i_2})$, (3)
where $z_t = 1/t$ are the terms of the harmonic series and $Z = \sum_{t=1}^{M/h} z_t$ is the (M/h)-th harmonic number. Without loss of generality assume that M is wholly divisible by h. Step size h is used to obtain the Difference of Intersection Kernel (DOIK)
$k^t_{\cap}(\tilde{\theta}_{i_1}, \tilde{\theta}_{i_2}) := \frac{1}{h} \left( |\tilde{\theta}^{1:\alpha_t}_{i_1} \cap \tilde{\theta}^{1:\alpha_t}_{i_2}| - |\tilde{\theta}^{1:\alpha_{t-1}}_{i_1} \cap \tilde{\theta}^{1:\alpha_{t-1}}_{i_2}| \right)$ (4)
where $\alpha_t = t \cdot h$ and $\alpha_{t-1} = (t-1) \cdot h$. Symbol $\tilde{\theta}^{a:b}_i$ indicates the set formed by the a-th to the b-th elements of $\tilde{\theta}_i$. Since the contents of the sorted hypotheses set are merely permutations of {1 . . . M}, i.e. there are no repeating elements,
$0 \le k_{\tilde{r}}(x_{i_1}, x_{i_2}) \le 1$. (5)
Note that $k_{\tilde{r}}$ is independent of the type of model to be fitted, thus it is applicable to generic statistical model fitting problems. However, we concentrate on motion subspaces in this paper. Let τ be a fictitious inlier threshold. The kernel $k_{\tilde{r}}$ captures the intuition that, if τ is low, two points arising from the same subspace will have high normalized intersection since they share many common hypotheses which correspond to that subspace. If τ is high, implausible hypotheses fitted on outliers start to dominate and decrease the normalized intersection.
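The definitions in Eq. (2) to (4) translate fairly directly into code. The following is a minimal sketch under stated assumptions, not the authors' implementation: hypotheses are identified by integer indices, each point is represented only by its list of residuals to the M hypotheses, and the function names are hypothetical.

```python
def sorted_hypotheses(residuals):
    """Eq. (2): hypothesis indices ordered by increasing absolute residual."""
    return sorted(range(len(residuals)), key=lambda j: abs(residuals[j]))


def ordered_residual_kernel(theta_1, theta_2, h: int) -> float:
    """Eq. (3)-(4): harmonic-weighted sum of differences of prefix intersections."""
    m = len(theta_1)
    if len(theta_2) != m or h <= 0 or m % h != 0:
        raise ValueError("both orderings need length M, with M divisible by h")
    steps = m // h
    z_total = sum(1.0 / t for t in range(1, steps + 1))  # Z, the (M/h)-th harmonic number
    weighted = 0.0
    previous = 0  # size of the prefix intersection at alpha_{t-1}
    for t in range(1, steps + 1):
        alpha_t = t * h
        current = len(set(theta_1[:alpha_t]) & set(theta_2[:alpha_t]))
        doik = (current - previous) / h  # Eq. (4)
        weighted += doik / t             # z_t = 1/t
        previous = current
    return weighted / z_total


if __name__ == "__main__":
    # Toy residuals of two points to M = 4 hypotheses (made-up numbers).
    theta_a = sorted_hypotheses([0.1, 0.2, 0.9, 1.5])
    theta_b = sorted_hypotheses([0.2, 0.1, 1.1, 0.8])
    print(theta_a, theta_b, ordered_residual_kernel(theta_a, theta_b, h=2))
```

Because early prefixes carry the largest weights, two points that agree on their nearest hypotheses score close to 1 even if their orderings diverge later, which matches the intuition given for a low inlier threshold.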
Step size h allows us to quantify the rate of change of intersection if τ is increased from 0 to ∞, and since $z_t$ is decreasing, $k_{\tilde{r}}$ will evaluate to a high value for two points from the same subspace. In contrast, $k_{\tilde{r}}$ is always low for points not from the same subspace or that are outliers.
Proof of satisfying Mercer’s condition. Let D be a fixed domain, and P(D) be the power set of D, i.e. the set of all subsets of D. Let S ⊆ P(D), and p, q ∈ S. If µ is a measure on D, then
$k_{\cap}(p, q) = \mu(p \cap q)$, (6)
called the intersection kernel, is provably a valid Mercer kernel [12]. The DOIK can be rewritten as
$k^t_{\cap}(\tilde{\theta}_{i_1}, \tilde{\theta}_{i_2}) = \frac{1}{h} \left( |\tilde{\theta}^{(\alpha_{t-1}+1):\alpha_t}_{i_1} \cap \tilde{\theta}^{(\alpha_{t-1}+1):\alpha_t}_{i_2}| + |\tilde{\theta}^{1:\alpha_{t-1}}_{i_1} \cap \tilde{\theta}^{(\alpha_{t-1}+1):\alpha_t}_{i_2}| + |\tilde{\theta}^{(\alpha_{t-1}+1):\alpha_t}_{i_1} \cap \tilde{\theta}^{1:\alpha_{t-1}}_{i_2}| \right)$. (7)
If we let D = {1 . . . M} be the set of all possible hypothesis indices and µ be uniform on D, each term in Eq. (7) is simply an intersection kernel multiplied by |D|/h. Since multiplying a kernel with a positive constant and adding two kernels respectively produce valid Mercer kernels [12], the DOIK and ORK are also valid Mercer kernels. ∎
Parameter h in $k_{\tilde{r}}$ depends on the number of random hypotheses M, i.e. step size h can be set as a ratio of M. The value of M can be determined based on the size of the p-subset and the size of the data N (e.g. [23, 15]), and thus h is not contingent on knowledge of the true inlier noise scale or threshold. Moreover, our experiments in Sec. 4 show that segmentation performance is relatively insensitive to the settings of h and M.
2.1 Performance under sampling imbalance
Methods based on random sampling (e.g. RANSAC [4]) are usually affected by unbalanced datasets. The probability of simultaneously retrieving p inliers from a particular structure is tiny if points from that structure represent only a small minority in the data.
In an unbalanced dataset the “pure” p-subsets in the M randomly drawn samples will be dominated by points from the majority structure in the data. This is a pronounced problem in motion sequences, since there is usually a background “object” whose point trajectories form a large majority in the data. In fact, for motion sequences from the Hopkins 155 dataset [18] with typically about 300 points per sequence, M has to be raised to about 20,000 before a pure p-subset from the non-background objects is sampled. However, ORK can function reliably despite serious sampling imbalance. This is because points from the same subspace are roughly equidistant to the sampled hypotheses in their vicinity, even though these hypotheses might not pass through that subspace. Moreover, since $z_t$ in Eq. (3) is decreasing, only residuals/hypotheses in the vicinity of a point are heavily weighted in the intersection. Fig. 1(a) illustrates this condition. Results in Sec. 4 show that ORK excelled even with M = 1,000.
[Figure 1 panels: (a) Data in $\mathbb{R}^{2F}$. (b) Data in RKHS $\mathcal{F}_{k_{\tilde{r}}}$.]
Figure 1: (a) ORK under sampling imbalance. (b) Data in RKHS induced by ORK.
3 Multi-Body Motion Segmentation using ORK
In this section, we describe how ORK is used for multi-body motion segmentation.
3.1 Outlier rejection via non-linear dimensionality reduction
Denote by $\mathcal{F}_{k_{\tilde{r}}}$ the Reproducing Kernel Hilbert Space (RKHS) induced by $k_{\tilde{r}}$. Let matrix $A = [\phi(x_1) \dots \phi(x_N)]$ contain the input data after it is mapped to $\mathcal{F}_{k_{\tilde{r}}}$. The kernel matrix $K = A^T A$ is computed using the kernel function $k_{\tilde{r}}$ as
$K_{p,q} = \langle \phi(x_p), \phi(x_q) \rangle = k_{\tilde{r}}(x_p, x_q), \quad p, q \in \{1 \dots N\}$. (8)
Since $k_{\tilde{r}}$ is a valid Mercer kernel, K is guaranteed to be positive semi-definite [12]. Let $K = Q \Delta Q^T$ be the eigenvalue decomposition (EVD) of K. Then the rank-n Kernel Singular Value Decomposition (Kernel SVD) [12] of A is
$A^n = [A Q^n (\Delta^n)^{-\frac{1}{2}}][(\Delta^n)^{\frac{1}{2}}][(Q^n)^T] \equiv U^n \Sigma^n (V^n)^T$. (9)
Via the Matlab notation, $Q^n = Q_{:,1:n}$ and $\Delta^n = \Delta_{1:n,1:n}$.
The left singular vectors $U^n$ form an orthonormal basis for the n-dimensional principal subspace of the whole dataset in $\mathcal{F}_{k_{\tilde{r}}}$. Projecting the data onto the principal subspace yields
$B = [A Q^n (\Delta^n)^{-\frac{1}{2}}]^T A = (\Delta^n)^{\frac{1}{2}} (Q^n)^T$, (10)
where $B = [b_1 \dots b_N] \in \mathbb{R}^{n \times N}$ is the reduced dimension version of A. Directions of the principal subspace are dominated by inlier points, since $k_{\tilde{r}}$ evaluates to a high value generally for them, but always to a low value for gross outliers. Moreover the kernel ensures that points from the same subspace are mapped to the same cluster and vice versa. Fig. 1(b) illustrates this condition. Fig. 2(a)(left) shows the first frame of sequence “Cars10” from the Hopkins 155 dataset [18] with 100 false trajectories of Brownian motion added to the original data (297 points). The corresponding RKHS norm histogram for n = 3 is displayed in Fig. 2(b). The existence of two distinct modes, corresponding respectively to inliers and outliers, is evident.
[Figure 2 panels: (a) (left) Before and (right) after outlier removal. Blue dots are inliers while red dots are added outliers. (b) Actual norm histogram of “cars10”, plotting Bin count against Vector norm in principal subspace (0 to 0.2), with an outlier mode and an inlier mode marked.]
Figure 2: Demonstration of outlier rejection on sequence “cars10” from Hopkins 155.
We exploit this observation for outlier rejection by discarding data with low norms in the principal subspace. The cut-off threshold ψ can be determined by analyzing the shape of the distribution. For instance we can fit a 1D Gaussian Mixture Model (GMM) with two components and set ψ as the point of equal Mahalanobis distance between the two components. However, our experimentation shows that an effective threshold can be obtained by simply setting ψ as the average value of all the norms, i.e.
$\psi = \frac{1}{N} \sum_{i=1}^{N} \| b_i \|$. (11)
This method was applied uniformly on all the sequences in our experiments in Sec. 4. Fig.
2(a)(right) shows an actual result of the method on Fig. 2(a)(left).

3.2 Recovering the number of motions and subspace clustering

After outlier rejection, we further take advantage of the mapping induced by ORK for recovering the number of motions and subspace clustering. On the remaining data, we perform Kernel PCA [11] to seek the principal components which maximize the variance of the data in the RKHS, as Fig. 1(b) illustrates. Let $\{y_i\}_{i=1,\dots,N'}$ be the $N'$-point subset of the input data that remains after outlier removal, where $N' < N$. Denote by $C = [\phi(y_1) \dots \phi(y_{N'})]$ the data matrix after mapping the data to $\mathcal{F}_{\tilde{k}_r}$, and by the symbol $\tilde{C}$ the result of adjusting $C$ with the empirical mean of $\{\phi(y_1), \dots, \phi(y_{N'})\}$. The centered kernel matrix $\tilde{K}' = \tilde{C}^T \tilde{C}$ [11] can be obtained as

$$\tilde{K}' = \nu^T K' \nu, \quad \nu = \left[ I_{N'} - \frac{1}{N'} 1_{N',N'} \right], \tag{12}$$

where $K' = C^T C$ is the uncentered kernel matrix, and $I_s$ and $1_{s,s}$ are respectively the $s \times s$ identity matrix and a matrix of ones. If $\tilde{K}' = R \Omega R^T$ is the EVD of $\tilde{K}'$, then we obtain the first-m kernel principal components $P^m$ of $C$ as the first-m left singular vectors of $\tilde{C}$, i.e.

$$P^m = \tilde{C} R^m (\Omega^m)^{-1/2}, \tag{13}$$

where $R^m = R_{:,1:m}$ and $\Omega^m = \Omega_{1:m,1:m}$; see Eq. (9). Projecting the data on the principal components yields

$$D = [d_1 \dots d_{N'}] = (\Omega^m)^{1/2} (R^m)^T, \tag{14}$$

where $D \in \mathbb{R}^{m \times N'}$. The affine subspace $\mathrm{span}(P^m)$ maximizes the spread of the centered data in the RKHS, and the projection $D$ offers an effective representation for clustering. Fig. 3(a) shows the Kernel PCA projection results for m = 3 on the sequence in Fig. 2(a).

The number of clusters in $D$ is recovered via spectral clustering. More specifically, we apply the Normalized Cut (Ncut) [13] algorithm. A fully connected graph is first derived from the data, where its weighted adjacency matrix $W \in \mathbb{R}^{N' \times N'}$ is obtained as

$$W_{p,q} = \exp\left( - \| d_p - d_q \|^2 / 2\delta^2 \right), \tag{15}$$

and $\delta$ is taken as the average nearest neighbour distance in the Euclidean sense among the vectors in $D$.
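The centering in Eq. (12), the projection in Eq. (14) and the affinity in Eq. (15) can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions rather than the authors' implementation: the helper names `kernel_pca_projection` and `gaussian_affinity` are hypothetical, and a linear kernel on two synthetic 2-D clusters stands in for the ORK kernel matrix.

```python
import numpy as np

def kernel_pca_projection(K, m):
    """Return D = (Omega^m)^(1/2) (R^m)^T for an uncentered kernel matrix K (Eqs. 12-14)."""
    N = K.shape[0]
    nu = np.eye(N) - np.ones((N, N)) / N       # centering matrix of Eq. (12)
    K_centered = nu.T @ K @ nu
    evals, evecs = np.linalg.eigh(K_centered)  # ascending eigenvalues
    top = np.argsort(evals)[::-1][:m]          # indices of the m largest eigenvalues
    omega_m = np.clip(evals[top], 0.0, None)   # guard against tiny negative round-off
    R_m = evecs[:, top]
    return np.sqrt(omega_m)[:, None] * R_m.T   # shape (m, N)

def gaussian_affinity(D):
    """Return W of Eq. (15), with delta the average nearest-neighbour distance in D."""
    diff = D[:, :, None] - D[:, None, :]
    dist = np.sqrt((diff ** 2).sum(axis=0))    # pairwise Euclidean distances, shape (N, N)
    masked = dist + np.diag(np.full(dist.shape[0], np.inf))  # ignore self-distances
    delta = masked.min(axis=1).mean()
    return np.exp(-dist ** 2 / (2.0 * delta ** 2))

# Assumption: synthetic data and a linear kernel replace the ORK kernel matrix K'.
rng = np.random.default_rng(0)
X = np.hstack([rng.normal(0.0, 0.1, (2, 10)), rng.normal(3.0, 0.1, (2, 10))])
K = X.T @ X
D = kernel_pca_projection(K, m=2)
W = gaussian_affinity(D)
```

In this toy setting the affinity between points of the same cluster is far larger than across clusters, which is the block structure the text attributes to $W$.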
The Laplacian matrix [13] is then derived from $W$ and eigendecomposed. Under Ncut, the number of clusters is revealed as the number of eigenvalues of the Laplacian that are zero or numerically insignificant. With this knowledge, a subsequent k-means step is then performed to cluster the points. Fig. 3(b) shows $W$ for the input data in Fig. 2(a)(left) after outlier removal. It can be seen that strong affinity exists between points from the same cluster, thus allowing accurate clustering. Figs. 3(a) and 3(c) illustrate the final clustering result for the data in Fig. 2(a)(left).

Figure 3: Actual results on the motion sequence in Fig. 2(a)(left). (a) Kernel PCA and Ncut results. (b) W matrix. (c) Final result for “cars10”.

There are several reasons why spectral clustering under our framework is more successful than previous methods. Firstly, we perform an effective outlier rejection step that removes bad trajectories that can potentially mislead the clustering. Secondly, the mapping induced by ORK deliberately separates the trajectories based on their cluster membership. Finally, we perform Kernel PCA to maximize the variance of the data. Effectively, this also improves the separation of clusters, thus facilitating an accurate recovery of the number of clusters and also the subsequent segmentation. This distinguishes our work from previous clustering-based methods [21, 5], which tend to operate without maximizing the between-class scatter. Results in Sec. 4 validate our claims.

4 Results

Henceforth we refer to the proposed method as “ORK”. We use a recently published benchmark on affine model motion segmentation [18] as a basis of comparison. The benchmark was evaluated on the Hopkins 155 dataset (available at http://www.vision.jhu.edu/data/hopkins155/), which contains 155 sequences with tracked point trajectories. A total of 120 sequences have two motions while 35 have three motions.
The sequences contain degenerate and non-degenerate motions, independent and partially dependent motions, articulated motions, non-rigid motions, etc. In terms of video content, three categories exist: checkerboard sequences, traffic sequences (moving cars, trucks) and articulated motions (moving faces, people).

4.1 Details on benchmarking

Four major algorithms were compared in [18]: Generalized PCA (GPCA) [19], Local Subspace Affinity (LSA) [21], Multi-Stage Learning (MSL) [14] and RANSAC [17]. Here we extend the benchmark with newly reported results from Locally Linear Manifold Clustering (LLMC) [5] and Agglomerative Lossy Compression (ALC) [10, 9]. We also compare our method against Kanatani and Matsunaga’s [8] algorithm (henceforth, the “KM” method) in estimating the number of independent motions in the video sequences. Note that KM per se does not perform motion segmentation. For the sake of objective comparison, we use only publicly available implementations (for MSL and KM, see http://www.suri.cs.okayama-u.ac.jp/e-program-separate.html/; for GPCA, LSA and RANSAC, refer to the URL of the Hopkins 155 dataset, http://www.vision.jhu.edu/data/hopkins155/).

Following [18], motion segmentation performance is evaluated in terms of the labelling error of the point trajectories, where each point in a sequence has a ground truth label, i.e.

$$\text{classification error} = \frac{\text{number of mislabeled points}}{\text{total number of points}}. \tag{16}$$

Unlike [18], we also emphasize the ability of the methods to recover the number of motions. However, although the methods compared in [18] (except RANSAC) theoretically have the means to do so, their estimation of the number of motions is generally unreliable, and the benchmark results in [18] were obtained by revealing the actual number of motions to the algorithms.
A similar initialization exists in [5, 10], where the results were obtained by giving LLMC and ALC this knowledge a priori (for LLMC, this was given at least to the variant LLMC 4m during dimensionality reduction [5], where m is the true number of motions). In the following subsections, where variants exist for the compared algorithms, we use results from the best-performing variant. The number of random hypotheses M and the step size h for ORK are fixed at 1000 and 300 respectively, and unlike the others, ORK is not given knowledge of the number of motions.

4.2 Data without gross outliers

We apply ORK on the Hopkins 155 dataset. Since ORK uses random sampling, we repeat it 100 times for each sequence and average the results. Table 1 lists the resulting classification error alongside those of previously proposed methods. ORK (column 9) gives comparable results to the other methods for sequences with 2 motions (mean = 7.83%, median = 0.41%). For sequences with 3 motions, ORK (mean = 12.62%, median = 4.75%) outperforms GPCA and RANSAC, but is slightly less accurate than the others. However, bear in mind that unlike the other methods, ORK is not given prior knowledge of the true number of motions and has to estimate this independently.

Column     1      2      3      4      5       6      8      9      10
Method     REF    GPCA   LSA    MSL    RANSAC  LLMC   ALC    ORK    ORK∗
Sequences with 2 motions
Mean       2.03   4.59   3.45   4.14   5.56    3.62   3.03   7.83   1.27
Median     0.00   0.38   0.59   0.00   1.18    0.00   0.00   0.41   0.00
Sequences with 3 motions
Mean       5.08   28.66  9.73   8.23   22.94   8.85   6.26   12.62  2.09
Median     2.40   28.26  2.33   1.76   22.03   3.19   1.02   4.75   0.05

Table 1: Classification error (%) on Hopkins 155 sequences. REF represents the reference/control method which operates based on knowledge of ground truth segmentation. Refer to [18] for details.

We also separately investigate the accuracy of ORK in estimating the number of motions, and compare it against KM [8], which was proposed for this purpose.
Note that such an experiment was not attempted in [18], since the approaches compared therein generally do not perform reliably in estimating the number of motions. The results in Table 2 (columns 1–2) show that for sequences with two motions, KM (80.83%) outperforms ORK (67.37%) by about 13 percentage points. However, for sequences with three motions, ORK (49.66%) vastly outperforms KM (14.29%), with more than three times the accuracy. The overall accuracy of KM (65.81%) is slightly better than that of ORK (63.37%), but this is mostly because sequences with two motions form the majority in the dataset (120 out of 155). This leads us to conclude that ORK is actually the superior method here.

Dataset      Hopkins 155          Hopkins 155 + Outliers
Column       1         2          3          4
Method       KM        ORK        KM         ORK
2 motions    80.83%    67.37%     0.00%      47.58%
3 motions    14.29%    49.66%     100.00%    50.00%
Overall      65.81%    63.37%     22.58%     48.13%

Table 2: Accuracy in determining the number of motions in a sequence. Note that in the experiment with outliers (columns 3–4), KM returns a constant number of 3 motions for all sequences.

We re-evaluate the performance of ORK by considering only results on sequences where the number of motions is estimated correctly by ORK (about 98 sequences, i.e. 63.37% of the cases). The results are tabulated under ORK∗ (column 10) in Table 1. It can be seen that when ORK estimates the number of motions correctly, it is significantly more accurate than the other methods.

Finally, we compare the speed of the methods in Table 3. ORK was implemented and run in Matlab on a Dual Core Pentium 3.00GHz machine with 4GB of main memory (this is much less powerful than the 8 Core Xeon 3.66GHz with 32GB memory used in [18] for the other methods in Table 3). The results show that ORK is comparable to LSA, much faster than MSL and ALC, but slower than GPCA and RANSAC. Timing results of LLMC are not available in the literature.
Method      GPCA     LSA       MSL       RANSAC   ALC       ORK
2 motions   324ms    7.584s    11h 4m    175ms    10m 32s   4.249s
3 motions   738ms    15.956s   1d 23h    258ms    10m 32s   8.479s

Table 3: Average computation time on Hopkins 155 sequences.

4.3 Data with gross outliers

We next examine the ability of the proposed method to deal with gross outliers in motion data. For each sequence in Hopkins 155, we add 100 gross outliers by creating trajectories corresponding to mistracks or spuriously occurring points. These are created by randomly initializing 100 locations in the first frame and allowing them to drift throughout the sequence according to Brownian motion. The corrupted sequences are then subjected to the algorithms for motion segmentation. Since only ORK is capable of rejecting outliers, the classification error of Eq. (16) is evaluated on the inlier points only. The results in Table 4 illustrate that ORK (column 4) is the most accurate method by a large margin. Despite being given the true number of motions a priori, GPCA, LSA and RANSAC are unable to provide satisfactory segmentation results.

Column     1      2      3       4      5
Method     GPCA   LSA    RANSAC  ORK    ORK∗
Sequences with 2 motions
Mean       28.66  24.25  30.64   16.50  1.62
Median     30.96  26.51  32.36   10.54  0.00
Sequences with 3 motions
Mean       40.61  30.94  42.24   19.99  2.68
Median     41.30  27.68  43.43   8.49   0.09

Table 4: Classification error (%) on Hopkins 155 sequences with 100 gross outliers per sequence.

In terms of estimating the number of motions, as shown in column 4 of Table 2, the overall accuracy of ORK is reduced to 48.13%. This is mainly due to the deterioration in accuracy on sequences with two motions (47.58%), while the accuracy on sequences with three motions is maintained (50.00%). This is not a surprising result, since sequences with three motions generally have more (inlying) point trajectories than sequences with two motions, and thus the outlier rates for sequences with three motions are lower (recall that a fixed number of 100 false trajectories are added).
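As a quick consistency check on Table 2, the overall accuracies can be recomputed as averages of the per-group accuracies weighted by the number of sequences (120 with two motions, 35 with three). The snippet below is only an illustrative check, not part of the original evaluation; the function name `overall_accuracy` is hypothetical, and small differences from the reported values are expected because the per-group percentages are themselves rounded.

```python
def overall_accuracy(acc_two, acc_three, n_two=120, n_three=35):
    """Sequence-count-weighted average of per-group accuracies (all in percent)."""
    return (n_two * acc_two + n_three * acc_three) / (n_two + n_three)

# (2-motion accuracy, 3-motion accuracy, reported overall accuracy), from Table 2.
table2 = {
    "KM, Hopkins 155": (80.83, 14.29, 65.81),
    "ORK, Hopkins 155": (67.37, 49.66, 63.37),
    "KM, Hopkins 155 + outliers": (0.00, 100.00, 22.58),
    "ORK, Hopkins 155 + outliers": (47.58, 50.00, 48.13),
}

recomputed = {name: overall_accuracy(a2, a3) for name, (a2, a3, _) in table2.items()}

for name, (_, _, reported) in table2.items():
    print(f"{name}: recomputed {recomputed[name]:.2f}%, reported {reported:.2f}%")
```

The recomputed values agree with the reported ones to within 0.01 percentage points, which supports the observation that the overall figures are driven by the two-motion majority.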
On the other hand, the KM method (column 3) is completely overwhelmed by the outliers: for all the sequences with outliers it returned a constant “3” as the number of motions.

We again re-evaluate ORK by considering results from sequences (now with gross outliers) where the number of motions is correctly estimated (about 75 sequences, i.e. 48.13% of the cases). The results tabulated under ORK∗ (column 5) in Table 4 show that the proposed method can accurately segment the point trajectories without being influenced by the gross outliers.

5 Conclusions

In this paper we propose a novel and highly effective approach for multi-body motion segmentation. Our idea is based on encapsulating random hypotheses in a novel Mercer kernel and statistical learning. We evaluated our method on the Hopkins 155 dataset, with results showing that the idea is superior to other state-of-the-art approaches. It is by far the most accurate in terms of estimating the number of motions, and it excels in segmentation accuracy despite lacking prior knowledge of the number of motions. The proposed idea is also highly robust towards outliers in the input data.

Acknowledgements. We are grateful to the authors of [18], especially René Vidal, for discussions and insights which have been immensely helpful.

References

[1] T. Boult and L. Brown. Factorization-based segmentation of motions. In IEEE Workshop on Motion Understanding, 1991.
[2] T.-J. Chin, H. Wang, and D. Suter. Robust fitting of multiple structures: The statistical learning approach. In ICCV, 2009.
[3] J. Costeira and T. Kanade. A multibody factorization method for independently moving objects. IJCV, 29(3):159–179, 1998.
[4] M. A. Fischler and R. C. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Comm. of the ACM, 24:381–395, 1981.
[5] A. Goh and R. Vidal. Segmenting motions of different types by unsupervised manifold clustering. In CVPR, 2007.
[6] A. Gruber and Y. Weiss. Multibody factorization with uncertainty and missing data using the EM algorithm. In CVPR, 2004.
[7] K. Kanatani. Motion segmentation by subspace separation and model selection. In ICCV, 2001.
[8] K. Kanatani and C. Matsunaga. Estimating the number of independent motions for multibody segmentation. In ACCV, 2002.
[9] Y. Ma, H. Derksen, W. Hong, and J. Wright. Segmentation of multivariate mixed data via lossy coding and compression. TPAMI, 29(9):1546–1562, 2007.
[10] S. Rao, R. Tron, Y. Ma, and R. Vidal. Motion segmentation via robust subspace separation in the presence of outlying, incomplete, or corrupted trajectories. In CVPR, 2008.
[11] B. Schölkopf, A. Smola, and K. R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10:1299–1319, 1998.
[12] J. Shawe-Taylor and N. Cristianini. Kernel methods for pattern analysis. Cambridge University Press, 2004.
[13] J. Shi and J. Malik. Normalized cuts and image segmentation. TPAMI, 22(8):888–905, 2000.
[14] Y. Sugaya and K. Kanatani. Geometric structure of degeneracy for multi-body motion segmentation. In Workshop on Statistical Methods in Video Processing, 2004.
[15] R. Toldo and A. Fusiello. Robust multiple structures estimation with J-Linkage. In ECCV, 2008.
[16] C. Tomasi and T. Kanade. Shape and motion from image streams under orthography. IJCV, 9(2):137–154, 1992.
[17] P. Torr. Geometric motion segmentation and model selection. Phil. Trans. Royal Society of London, 356(1740):1321–1340, 1998.
[18] R. Tron and R. Vidal. A benchmark for the comparison of 3-D motion segmentation algorithms. In CVPR, 2007.
[19] R. Vidal and R. Hartley. Motion segmentation with missing data by PowerFactorization and Generalized PCA. In CVPR, 2004.
[20] R. Vidal, Y. Ma, and S. Sastry. Generalized Principal Component Analysis (GPCA). TPAMI, 27(12):1–15, 2005.
[21] J. Yan and M. Pollefeys. A general framework for motion segmentation: independent, articulated, rigid, non-rigid, degenerate and non-degenerate. In ECCV, 2006.
[22] L. Zelnik-Manor and M. Irani. Degeneracies, dependencies and their implications on multibody and multi-sequence factorization. In CVPR, 2003.
[23] W. Zhang and J. Košecká. Nonparametric estimation of multiple structures with outliers. In Dynamical Vision, ICCV 2005 and ECCV 2006 Workshops, 2006.

6 0.43202901 231 nips-2009-Statistical Models of Linear and Nonlinear Contextual Interactions in Early Visual Processing

7 0.4225904 162 nips-2009-Neural Implementation of Hierarchical Bayesian Inference by Importance Sampling

8 0.42249921 155 nips-2009-Modelling Relational Data using Bayesian Clustered Tensor Factorization

9 0.4187381 99 nips-2009-Functional network reorganization in motor cortex can be explained by reward-modulated Hebbian learning

10 0.41762289 133 nips-2009-Learning models of object structure

11 0.41360554 211 nips-2009-Segmenting Scenes by Matching Image Composites

12 0.41201594 13 nips-2009-A Neural Implementation of the Kalman Filter

13 0.40828833 131 nips-2009-Learning from Neighboring Strokes: Combining Appearance and Context for Multi-Domain Sketch Recognition

14 0.4073965 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition

15 0.40738785 188 nips-2009-Perceptual Multistability as Markov Chain Monte Carlo Inference

16 0.40726337 19 nips-2009-A joint maximum-entropy model for binary neural population patterns and continuous signals

17 0.40723598 148 nips-2009-Matrix Completion from Power-Law Distributed Samples

18 0.40708482 158 nips-2009-Multi-Label Prediction via Sparse Infinite CCA

19 0.40640309 168 nips-2009-Non-stationary continuous dynamic Bayesian networks

20 0.40478721 70 nips-2009-Discriminative Network Models of Schizophrenia