nips nips2000 nips2000-83 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Arno Schödl, Irfan A. Essa
Abstract: We present techniques for rendering and animation of realistic scenes by analyzing and training on short video sequences. This work extends the new paradigm for computer animation, video textures, which uses recorded video to generate novel animations by replaying the video samples in a new order. Here we concentrate on video sprites, which are a special type of video texture. In video sprites, instead of storing whole images, the object of interest is separated from the background and the video samples are stored as a sequence of alpha-matted sprites with associated velocity information. They can be rendered anywhere on the screen to create a novel animation of the object. We present methods to create such animations by finding a sequence of sprite samples that is both visually smooth and follows a desired path. To estimate visual smoothness, we train a linear classifier to estimate visual similarity between video samples. If the motion path is known in advance, we use beam search to find a good sample sequence. We can specify the motion interactively by precomputing the sequence cost function using Q-learning.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We present techniques for rendering and animation of realistic scenes by analyzing and training on short video sequences. [sent-5, score-0.843]
2 This work extends the new paradigm for computer animation, video textures, which uses recorded video to generate novel animations by replaying the video samples in a new order. [sent-6, score-1.751]
3 Here we concentrate on video sprites, which are a special type of video texture. [sent-7, score-0.856]
4 In video sprites, instead of storing whole images, the object of interest is separated from the background and the video samples are stored as a sequence of alpha-matted sprites with associated velocity information. [sent-8, score-1.388]
5 They can be rendered anywhere on the screen to create a novel animation of the object. [sent-9, score-0.291]
6 We present methods to create such animations by finding a sequence of sprite samples that is both visually smooth and follows a desired path. [sent-10, score-0.991]
7 To estimate visual smoothness, we train a linear classifier to estimate visual similarity between video samples. [sent-11, score-0.636]
8 If the motion path is known in advance, we use beam search to find a good sample sequence. [sent-12, score-0.746]
9 We can specify the motion interactively by precomputing the sequence cost function using Q-learning. [sent-13, score-0.727]
10 1 Introduction Computer animation of realistic characters requires an explicitly defined model with control parameters. [sent-14, score-0.29]
11 The animator defines keyframes for these parameters, which are interpolated to generate the animation. [sent-15, score-0.071]
12 Both the model generation and the motion parameter adjustment are often manual, costly tasks. [sent-16, score-0.301]
13 Recently, researchers in computer graphics and computer vision have proposed efficient methods to generate novel views by analyzing captured images. [sent-17, score-0.265]
14 These techniques, called image-based rendering, require minimal user interaction and allow photorealistic synthesis of still scenes [3]. [sent-18, score-0.07]
15 In [7] we introduced a new paradigm for image synthesis, which we call video textures. [sent-19, score-0.538]
16 In that paper, we extended the paradigm of image-based rendering into video-based rendering, generating novel animations from video. [sent-20, score-0.361]
17 Figure 1: An animation is created from reordered video sprite samples. [sent-21, score-1.625]
18 Transitions between samples that are played out of the original order must be visually smooth. [sent-22, score-0.227]
19 A video texture turns a finite duration video into a continuous infinitely varying stream of images. [sent-23, score-0.464]
20 We treat the video sequence as a collection of image samples, from which we automatically select suitable sequences to form the new animation. [sent-24, score-0.649]
21 Instead of using the image as a whole, we can also record an object against a bluescreen and separate it from the background using background-subtraction. [sent-25, score-0.101]
22 We store the created opacity image (alpha channel) and the motion of the object for every sample. [sent-26, score-0.493]
23 We can then render the object at arbitrary image locations to generate animations, as shown in Figure 1. [sent-27, score-0.172]
24 We call this special type of video texture a video sprite. [sent-28, score-0.885]
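As a rough illustration of the storage just described, each sprite sample could be kept as a small record like the following. This is a hedged sketch, not the paper's actual data structure, and the field names are invented:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class SpriteSample:
    """One alpha-matted video sprite sample cut out of the source video."""
    color: np.ndarray     # (h, w, 3) color image of the segmented object
    alpha: np.ndarray     # (h, w) opacity image from background subtraction
    velocity: np.ndarray  # (2,) estimated 2D motion of the object in this frame
```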
25 A complete description of the video textures paradigm and techniques to generate video textures is presented in [7]. [sent-29, score-1.197]
26 In this paper, we address the controlled animation of video sprites. [sent-30, score-0.643]
27 To generate video textures or video sprites, we have to optimize the sequence of samples so that the resulting animation looks continuous and smooth, even if the samples are not played in their original order. [sent-31, score-1.559]
28 This optimization requires a visual similarity metric between sprite images, which has to be as close as possible to the human perception of similarity. [sent-32, score-0.643]
29 The simple L2 image distance used in [7] gives poor results for our example video sprite, a fish swimming in a tank. [sent-33, score-0.586]
30 In Section 2 we describe how to improve the similarity metric by training a classifier on manually labeled data [1]. [sent-34, score-0.195]
31 Video sprites usually require some form of motion control. [sent-35, score-0.501]
32 We present two techniques to control the sprite motion while preserving the visual smoothness of the sequence. [sent-36, score-0.952]
33 In Section 3 we compute a good sequence of samples for a motion path scripted in advance. [sent-37, score-0.701]
34 Since the number of possible sequences is too large to explore exhaustively, we use beam search to make the optimization manageable. [sent-38, score-0.181]
35 For applications like computer games, we would like to control the motion of the sprite interactively. [sent-39, score-0.864]
36 1.1 Previous work Before the advent of 3D graphics, the idea of creating animations by sequencing 2D sprites showing different poses and actions was widely used in computer games. [sent-42, score-0.426]
37 Almost all characters in fighting and jump-and-run games are animated in this fashion. [sent-43, score-0.112]
38 Game artists had to generate all these animations manually. [sent-44, score-0.221]
39 Figure 2: Relationship between image similarities and transitions. [sent-45, score-0.058]
40 There is very little earlier research on automatically sequencing 2D views for animation. [sent-46, score-0.11]
41 Video Rewrite [2] is the work most closely related to video textures. [sent-47, score-0.428]
42 It creates lip motion for a new audio track from a training video of the subject speaking, by replaying short subsequences of the training video that best fit the sequence of phonemes. [sent-48, score-1.327]
43 To our knowledge, nobody has automatically generated an object animation from video thus far. [sent-49, score-0.719]
44 Of course, we are not the first to apply learning techniques to animation. [sent-50, score-0.022]
45 Neural networks have also been used to improve visual similarity classification [6]. [sent-52, score-0.105]
46 2 Training the similarity metric Video textures reorder the original video samples into a new sequence. [sent-53, score-0.717]
47 If the sequence of samples is not in the original order, we have to ensure that transitions between samples that are out of order are visually smooth. [sent-54, score-0.44]
48 More precisely, in a transition from sample i to j, we substitute the successor of sample i by sample j and the predecessor of sample j by sample i. [sent-55, score-0.614]
49 So sample i should be similar to sample j - 1 and sample i + 1 should be similar to sample j (Figure 2). [sent-56, score-0.432]
50 The distance function D_ij between two samples i and j should be small if we can substitute one image for the other without a noticeable discontinuity or "jump". [sent-57, score-0.203]
51 The simple L2 image distance used in [7] gives poor results for the fish sprite, because it fails to capture important information like the orientation of the fish. [sent-58, score-0.158]
52 Instead of trying to code this information into our system, we train a linear classifier from manually labeled training data. [sent-59, score-0.111]
53 The manual labels for a sprite pair are binary: visually acceptable or unacceptable. [sent-61, score-0.678]
54 To create the labels, we guess a rough estimator and then manually correct the classification of this estimator. [sent-62, score-0.086]
55 Since it is more important to avoid visual glitches than to exploit every possible transition, we penalize false-positives 10 times higher than false-negatives in our training. [sent-63, score-0.135]
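A minimal sketch of this training step, assuming each labeled sprite pair has already been turned into a feature vector; scikit-learn and the feature construction are my assumptions, not the paper's implementation, and the 10:1 penalty is expressed through per-sample weights:

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_similarity_classifier(pair_features, labels):
    """Train a linear classifier on manually labeled sprite pairs.

    pair_features : (n_pairs, n_features) array describing each sprite pair
    labels        : 1 = visually acceptable transition, 0 = unacceptable
    A false positive (accepting a bad pair) produces a visible glitch, so
    unacceptable examples are weighted 10x more than acceptable ones.
    """
    weights = np.where(labels == 0, 10.0, 1.0)
    clf = LinearSVC()
    clf.fit(pair_features, labels, sample_weight=weights)
    return clf
```

For pairs the classifier keeps, the signed value of its linear decision function can then serve as the visual difference D_ij mentioned below.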
56 Figure 3: The components of the path cost function (segment boundary, line segment l_k). [sent-64, score-0.465]
57 All sprite pairs that the classifier rejected are no longer considered for transitions. [sent-65, score-0.546]
58 If the pair of samples i and j is kept, we use the value of the linear classifying function as a measure for visual difference D_ij. [sent-66, score-0.2]
59 The pairs i, j with i = j are treated just as any other pair, but of course they have minimal visual difference. [sent-67, score-0.058]
60 The cost for a transition T_ij from sample i to sample j is then T_ij = (1/2) D_{i,j-1} + (1/2) D_{i+1,j}. [sent-68, score-0.389]
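A small sketch of this computation over a full pairwise difference matrix D (NumPy is assumed; rejected pairs can be given a prohibitively large D beforehand):

```python
import numpy as np

def transition_costs(D):
    """Compute T[i, j] = 0.5 * D[i, j-1] + 0.5 * D[i+1, j] for all valid i, j.

    A transition i -> j substitutes j for the successor of i, so it is cheap
    when sample i resembles sample j-1 and sample i+1 resembles sample j.
    """
    n = D.shape[0]
    T = np.full((n, n), np.inf)        # no predecessor/successor => invalid
    T[:n - 1, 1:] = 0.5 * D[:n - 1, :n - 1] + 0.5 * D[1:, 1:]
    return T
```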
61 3 Motion path scripting A common approach in animation is to specify all constraints before rendering the animation [8]. [sent-69, score-0.826]
62 In this section we describe how to generate a good sequence of sprites from a specified motion path, given as a series of line segments. [sent-70, score-0.76]
63 We specify a cost function for a given path, and starting at the beginning of the first segment, we explore the tree of possible transitions and find the path of least cost. [sent-71, score-0.615]
64 3.1 Sequence cost function The total cost function is a sum of per-frame costs. [sent-73, score-0.266]
65 For every new sequence frame, in addition to the transition cost, as discussed in the previous section, we penalize any deviation from the defined path and movement direction. [sent-74, score-0.433]
66 We only constrain the motion path, not the velocity magnitude or the motion timing, because the fewer constraints we impose, the better the chance of finding a smooth sequence using the limited number of available video samples. [sent-75, score-1.279]
67 The path is composed of line segments, and we keep track of the line segment that the sprite is currently expected to follow. [sent-76, score-1.003]
68 We compute the error function only with respect to this line segment. [sent-77, score-0.072]
69 As soon as the orthogonal projection of the sprite position onto the segment passes the end of the current segment, we switch to the next segment. [sent-78, score-0.707]
70 This avoids the ambiguity of which line segment to follow when paths are self-intersecting. [sent-79, score-0.218]
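A sketch of that segment-advance rule, with 2D points as NumPy arrays; the helper is illustrative only:

```python
import numpy as np

def advance_segment(position, segments, current):
    """Switch to the next segment once the orthogonal projection of the
    sprite position passes the end of the currently followed segment."""
    a, b = segments[current]                     # segment endpoints
    t = np.dot(position - a, b - a) / np.dot(b - a, b - a)
    if t > 1.0 and current + 1 < len(segments):
        return current + 1
    return current
```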
71 We define an animation sequence (i_1, p_1, l_1), (i_2, p_2, l_2), ..., (i_N, p_N, l_N), [sent-80, score-0.309]
72 where i_k, 1 <= k <= N, is the sample shown in frame k, p_k is the position at which it is shown, and l_k is the line segment that it has to follow. [sent-83, score-0.375]
73 Let d(p_k, l_k) be the distance from point p_k to line l_k, v(i_k) the estimated velocity of the sprite at sample i_k, and ∠(v(i_k), l_k) the angle between the velocity vector and the line segment. [sent-84, score-0.972]
74 The cost function C for frame k of this sequence is then (1), where w_1 and w_2 are user-defined weights that trade off visual smoothness against the motion constraints. [sent-85, score-0.675]
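Equation (1) itself is lost in this extraction, but the text pins down its ingredients: the transition cost plus weighted penalties on distance from the path and on deviation of the motion direction. A hedged sketch in that spirit (the exact form in the paper may differ):

```python
import numpy as np

def frame_cost(T, i_prev, i_k, p_k, segment, v_k, w1, w2):
    """Per-frame cost: transition smoothness plus path and direction penalties.

    T       : transition cost matrix from Section 2
    p_k     : sprite position in frame k
    segment : (a, b) endpoints of the line segment l_k to follow
    v_k     : estimated sprite velocity v(i_k) of the shown sample
    """
    a, b = segment
    direction = (b - a) / np.linalg.norm(b - a)
    offset = p_k - a
    # d(p_k, l_k): distance from the point to the line through a and b
    d = np.linalg.norm(offset - np.dot(offset, direction) * direction)
    # angle between the sprite velocity and the segment direction
    cos_angle = np.dot(v_k, direction) / (np.linalg.norm(v_k) + 1e-9)
    angle = np.arccos(np.clip(cos_angle, -1.0, 1.0))
    return T[i_prev, i_k] + w1 * d + w2 * angle
```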
75 3.2 Sequence tree search We seed our search with all possible starting samples and set the sprite position to the starting position of the first line segment. [sent-87, score-0.972]
76 For every sequence, we store the total cost up to the current end of the path, the current position of the sprite, the current sample and the current line segment. [sent-88, score-0.542]
77 Since from any given video sample there can be many possible transitions and it is impossible to explore the whole tree, we employ beam search to prune the set of sequences after advancing the tree depth by one transition. [sent-89, score-0.893]
78 At every depth we keep the 50000 sequences with least accumulated cost. [sent-90, score-0.127]
79 When the sprite reaches the end of the last segment, the sequence with lowest total cost is chosen. [sent-91, score-0.776]
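A compressed sketch of that beam search; the state, successor and cost callbacks stand in for the machinery above, and only the 50000-wide beam follows the text directly:

```python
import heapq

def beam_search_path(start_states, step_cost, successors, finished, beam_width=50000):
    """Beam search over sprite sequences.

    Each hypothesis is (accumulated cost, state); a state carries the current
    sample, sprite position and active line segment. After every depth only
    the beam_width cheapest sequences survive, and the cheapest sequence that
    reaches the end of the last segment is returned.
    """
    beam = [(0.0, s) for s in start_states]
    while beam:
        done = [h for h in beam if finished(h[1])]
        if done:
            return min(done, key=lambda h: h[0])
        expanded = [(cost + step_cost(state, nxt), nxt)
                    for cost, state in beam
                    for nxt in successors(state)]
        beam = heapq.nsmallest(beam_width, expanded, key=lambda h: h[0])
    return None
```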
80 4 Interactive motion control For interactive applications like computer games, video sprites allow us to generate high-quality graphics without the computational burden of high-end modeling and rendering. [sent-93, score-1.175]
81 In this section we show how to control video sprite motion interactively without time-consuming optimization over a planned path. [sent-94, score-1.388]
82 The following observation allows us to compute the path tree in a much more efficient manner: if w_2 in equation (1) is set to zero, the sprite does not adhere to a certain path but still moves in the desired general direction. [sent-95, score-1.016]
83 If we assume the line segment is infinitely long, or in other words is indicating only a general motion direction l, equation (1) is independent of the position p_k of the sprite and only depends on the sample that is currently shown. [sent-96, score-1.24]
84 To solve equation (2), we initialize with F_ij = T_ij for all i and j and then iterate over the equation until convergence. [sent-99, score-0.044]
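Equation (2) is not reproduced in this extraction; the initialization-and-iteration described here has the shape of a value-iteration / Q-learning sweep, so the following is only an assumed stand-in for the actual recurrence:

```python
import numpy as np

def precompute_future_costs(T, alpha=0.99, tol=1e-3):
    """Iterate an assumed Q-learning-style recurrence until convergence.

    Assumed update (a stand-in for the paper's equation (2)):
        F[i, j] = T[i, j] + alpha * min_k F[j, k]
    Start from F = T; afterwards F[i, j] reflects the best achievable future
    cost after taking transition i -> j. T should use large finite values
    (not inf) for rejected transitions so the convergence test stays defined.
    """
    F = T.copy()
    while True:
        F_new = T + alpha * F.min(axis=1)[np.newaxis, :]
        if np.max(np.abs(F_new - F)) < tol:
            return F_new
        F = F_new
```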
85 4.1 Interactive switching between cost functions We described above how to compute a good path for a given motion direction l. [sent-101, score-0.742]
86 To interactively control the sprite, we precompute F_ij for multiple motion directions, for example for the eight compass directions. [sent-102, score-0.437]
87 The user can then interactively specify the motion direction by choosing one of the precomputed cost functions. [sent-103, score-0.722]
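At run time the selection can be as simple as a table lookup; a sketch under the assumption that one precomputed cost table is stored per compass direction:

```python
import numpy as np

DIRECTIONS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]

def next_sample(current, direction, cost_tables):
    """Greedy interactive control: given the currently shown sample and the
    user-selected direction, follow the cheapest precomputed continuation."""
    F = cost_tables[direction]        # (n_samples, n_samples) precomputed costs
    return int(np.argmin(F[current]))
```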
88 Unfortunately, the cost function is precomputed to be optimal only for a certain motion direction, and does not take into account any switching between cost functions, which can cause discontinuous motion when the user changes direction. [sent-104, score-1.017]
89 Note that switching to a motion path without any motion constraint (equation (2) with w_1 = 0) will never cause any additional discontinuities, because the smoothness constraint is the only one left. [sent-105, score-0.928]
90 Thus, we solve our problem by precomputing a cost function that does not constrain the motion for a couple of transitions, and then starts to constrain the motion with the new motion direction. [sent-106, score-1.16]
91 The response delay allows us to gracefully adjust to the new cost function. [sent-107, score-0.133]
wordName wordTfidf (topN-words)
[('sprite', 0.501), ('video', 0.428), ('motion', 0.301), ('path', 0.222), ('animation', 0.215), ('sprites', 0.2), ('animations', 0.15), ('cost', 0.133), ('ik', 0.129), ('rendering', 0.125), ('segment', 0.11), ('sample', 0.108), ('interactively', 0.1), ('textures', 0.098), ('sequence', 0.094), ('visually', 0.086), ('alpha', 0.086), ('velocity', 0.084), ('samples', 0.084), ('beam', 0.075), ('line', 0.072), ('generate', 0.071), ('transitions', 0.069), ('pk', 0.068), ('interactive', 0.059), ('precomputed', 0.059), ('fij', 0.059), ('visual', 0.058), ('image', 0.058), ('smoothness', 0.056), ('graphics', 0.054), ('position', 0.052), ('paradigm', 0.052), ('games', 0.051), ('arno', 0.05), ('fish', 0.05), ('precomputing', 0.05), ('replaying', 0.05), ('sequencing', 0.05), ('tree', 0.049), ('specify', 0.049), ('switching', 0.048), ('similarity', 0.047), ('classifier', 0.045), ('manually', 0.044), ('object', 0.043), ('create', 0.042), ('user', 0.042), ('starting', 0.041), ('transition', 0.04), ('search', 0.04), ('characters', 0.039), ('penalize', 0.039), ('tij', 0.039), ('every', 0.038), ('direction', 0.038), ('constrain', 0.037), ('metric', 0.037), ('control', 0.036), ('sequences', 0.036), ('infinitely', 0.036), ('paths', 0.036), ('smooth', 0.034), ('novel', 0.034), ('played', 0.034), ('substitute', 0.034), ('manual', 0.034), ('automatically', 0.033), ('frame', 0.033), ('pair', 0.032), ('depth', 0.031), ('color', 0.031), ('explore', 0.03), ('texture', 0.029), ('store', 0.029), ('synthesis', 0.028), ('whole', 0.027), ('distance', 0.027), ('views', 0.027), ('analyzing', 0.027), ('computer', 0.026), ('di', 0.026), ('scenes', 0.026), ('track', 0.026), ('lowest', 0.026), ('difference', 0.026), ('labels', 0.025), ('angle', 0.024), ('created', 0.024), ('channel', 0.024), ('original', 0.023), ('poor', 0.023), ('labeled', 0.022), ('equation', 0.022), ('current', 0.022), ('techniques', 0.022), ('least', 0.022), ('end', 0.022), ('section', 0.022), ('fighting', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 83 nips-2000-Machine Learning for Video-Based Rendering
Author: Arno Schödl, Irfan A. Essa
Abstract: We present techniques for rendering and animation of realistic scenes by analyzing and training on short video sequences. This work extends the new paradigm for computer animation, video textures, which uses recorded video to generate novel animations by replaying the video samples in a new order. Here we concentrate on video sprites, which are a special type of video texture. In video sprites, instead of storing whole images, the object of interest is separated from the background and the video samples are stored as a sequence of alpha-matted sprites with associated velocity information. They can be rendered anywhere on the screen to create a novel animation of the object. We present methods to create such animations by finding a sequence of sprite samples that is both visually smooth and follows a desired path. To estimate visual smoothness, we train a linear classifier to estimate visual similarity between video samples. If the motion path is known in advance, we use beam search to find a good sample sequence. We can specify the motion interactively by precomputing the sequence cost function using Q-Iearning.
2 0.22266586 78 nips-2000-Learning Joint Statistical Models for Audio-Visual Fusion and Segregation
Author: John W. Fisher III, Trevor Darrell, William T. Freeman, Paul A. Viola
Abstract: People can understand complex auditory and visual information, often using one to disambiguate the other. Automated analysis, even at a lowlevel, faces severe challenges, including the lack of accurate statistical models for the signals, and their high-dimensionality and varied sampling rates. Previous approaches [6] assumed simple parametric models for the joint distribution which, while tractable, cannot capture the complex signal relationships. We learn the joint distribution of the visual and auditory signals using a non-parametric approach. First, we project the data into a maximally informative, low-dimensional subspace, suitable for density estimation. We then model the complicated stochastic relationships between the signals using a nonparametric density estimator. These learned densities allow processing across signal modalities. We demonstrate, on synthetic and real signals, localization in video of the face that is speaking in audio, and, conversely, audio enhancement of a particular speaker selected from the video.
3 0.21603552 82 nips-2000-Learning and Tracking Cyclic Human Motion
Author: Dirk Ormoneit, Hedvig Sidenbladh, Michael J. Black, Trevor Hastie
Abstract: We present methods for learning and tracking human motion in video. We estimate a statistical model of typical activities from a large set of 3D periodic human motion data by segmenting these data automatically into
4 0.14581093 50 nips-2000-FaceSync: A Linear Operator for Measuring Synchronization of Video Facial Images and Audio Tracks
Author: Malcolm Slaney, Michele Covell
Abstract: FaceSync is an optimal linear algorithm that finds the degree of synchronization between the audio and image recordings of a human speaker. Using canonical correlation, it finds the best direction to combine all the audio and image data, projecting them onto a single axis. FaceSync uses Pearson's correlation to measure the degree of synchronization between the audio and image data. We derive the optimal linear transform to combine the audio and visual information and describe an implementation that avoids the numerical problems caused by computing the correlation matrices. 1 Motivation In many applications, we want to know about the synchronization between an audio signal and the corresponding image data. In a teleconferencing system, we might want to know which of the several people imaged by a camera is heard by the microphones; then, we can direct the camera to the speaker. In post-production for a film, clean audio dialog is often dubbed over the video; we want to adjust the audio signal so that the lip-sync is perfect. When analyzing a film, we want to know when the person talking is in the shot, instead of off camera. When evaluating the quality of dubbed films, we can measure of how well the translated words and audio fit the actor's face. This paper describes an algorithm, FaceSync, that measures the degree of synchronization between the video image of a face and the associated audio signal. We can do this task by synthesizing the talking face, using techniques such as Video Rewrite [1], and then comparing the synthesized video with the test video. That process, however, is expensive. Our solution finds a linear operator that, when applied to the audio and video signals, generates an audio-video-synchronization-error signal. The linear operator gathers information from throughout the image and thus allows us to do the computation inexpensively. Hershey and Movellan [2] describe an approach based on measuring the mutual information between the audio signal and individual pixels in the video. The correlation between the audio signal, x, and one pixel in the image y, is given by Pearson's correlation, r. The mutual information between these two variables is given by f(x,y) = -1/2 log(l-?). They create movies that show the regions of the video that have high correlation with the audio; 1. Currently at IBM Almaden Research, 650 Harry Road, San Jose, CA 95120. 2. Currently at Yes Video. com, 2192 Fortune Drive, San Jose, CA 95131. Standard Deviation of Testing Data FaceSync
5 0.13824679 45 nips-2000-Emergence of Movement Sensitive Neurons' Properties by Learning a Sparse Code for Natural Moving Images
Author: Rafal Bogacz, Malcolm W. Brown, Christophe G. Giraud-Carrier
Abstract: Olshausen & Field demonstrated that a learning algorithm that attempts to generate a sparse code for natural scenes develops a complete family of localised, oriented, bandpass receptive fields, similar to those of 'simple cells' in VI. This paper describes an algorithm which finds a sparse code for sequences of images that preserves information about the input. This algorithm when trained on natural video sequences develops bases representing the movement in particular directions with particular speeds, similar to the receptive fields of the movement-sensitive cells observed in cortical visual areas. Furthermore, in contrast to previous approaches to learning direction selectivity, the timing of neuronal activity encodes the phase of the movement, so the precise timing of spikes is crucially important to the information encoding.
6 0.13063623 103 nips-2000-Probabilistic Semantic Video Indexing
7 0.11688118 30 nips-2000-Bayesian Video Shot Segmentation
8 0.10167533 80 nips-2000-Learning Switching Linear Models of Human Motion
9 0.096085899 98 nips-2000-Partially Observable SDE Models for Image Sequence Recognition Tasks
10 0.085315175 72 nips-2000-Keeping Flexible Active Contours on Track using Metropolis Updates
11 0.063105188 53 nips-2000-Feature Correspondence: A Markov Chain Monte Carlo Approach
12 0.062265914 101 nips-2000-Place Cells and Spatial Navigation Based on 2D Visual Feature Extraction, Path Integration, and Reinforcement Learning
13 0.05253445 76 nips-2000-Learning Continuous Distributions: Simulations With Field Theoretic Priors
14 0.05218992 88 nips-2000-Multiple Timescales of Adaptation in a Neural Code
15 0.050627615 125 nips-2000-Stability and Noise in Biochemical Switches
16 0.046413213 145 nips-2000-Weak Learners and Improved Rates of Convergence in Boosting
17 0.040668748 73 nips-2000-Kernel-Based Reinforcement Learning in Average-Cost Problems: An Application to Optimal Portfolio Choice
18 0.039913714 19 nips-2000-Adaptive Object Representation with Hierarchically-Distributed Memory Sites
19 0.039766006 44 nips-2000-Efficient Learning of Linear Perceptrons
20 0.039053176 10 nips-2000-A Productive, Systematic Framework for the Representation of Visual Structure
topicId topicWeight
[(0, 0.163), (1, -0.103), (2, 0.117), (3, 0.157), (4, -0.142), (5, -0.082), (6, 0.306), (7, 0.167), (8, -0.255), (9, -0.097), (10, 0.021), (11, 0.008), (12, -0.016), (13, 0.03), (14, -0.026), (15, -0.003), (16, 0.104), (17, -0.086), (18, 0.107), (19, -0.034), (20, 0.047), (21, 0.007), (22, 0.057), (23, -0.118), (24, -0.17), (25, -0.008), (26, -0.086), (27, -0.09), (28, -0.082), (29, -0.107), (30, -0.006), (31, 0.039), (32, -0.047), (33, 0.004), (34, -0.068), (35, -0.062), (36, -0.04), (37, 0.021), (38, -0.01), (39, 0.019), (40, -0.027), (41, -0.005), (42, -0.064), (43, -0.027), (44, 0.008), (45, -0.002), (46, -0.013), (47, -0.045), (48, -0.002), (49, 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.98167127 83 nips-2000-Machine Learning for Video-Based Rendering
Author: Arno Schödl, Irfan A. Essa
Abstract: We present techniques for rendering and animation of realistic scenes by analyzing and training on short video sequences. This work extends the new paradigm for computer animation, video textures, which uses recorded video to generate novel animations by replaying the video samples in a new order. Here we concentrate on video sprites, which are a special type of video texture. In video sprites, instead of storing whole images, the object of interest is separated from the background and the video samples are stored as a sequence of alpha-matted sprites with associated velocity information. They can be rendered anywhere on the screen to create a novel animation of the object. We present methods to create such animations by finding a sequence of sprite samples that is both visually smooth and follows a desired path. To estimate visual smoothness, we train a linear classifier to estimate visual similarity between video samples. If the motion path is known in advance, we use beam search to find a good sample sequence. We can specify the motion interactively by precomputing the sequence cost function using Q-Iearning.
2 0.68617541 82 nips-2000-Learning and Tracking Cyclic Human Motion
Author: Dirk Ormoneit, Hedvig Sidenbladh, Michael J. Black, Trevor Hastie
Abstract: We present methods for learning and tracking human motion in video. We estimate a statistical model of typical activities from a large set of 3D periodic human motion data by segmenting these data automatically into
3 0.559524 78 nips-2000-Learning Joint Statistical Models for Audio-Visual Fusion and Segregation
Author: John W. Fisher III, Trevor Darrell, William T. Freeman, Paul A. Viola
Abstract: People can understand complex auditory and visual information, often using one to disambiguate the other. Automated analysis, even at a lowlevel, faces severe challenges, including the lack of accurate statistical models for the signals, and their high-dimensionality and varied sampling rates. Previous approaches [6] assumed simple parametric models for the joint distribution which, while tractable, cannot capture the complex signal relationships. We learn the joint distribution of the visual and auditory signals using a non-parametric approach. First, we project the data into a maximally informative, low-dimensional subspace, suitable for density estimation. We then model the complicated stochastic relationships between the signals using a nonparametric density estimator. These learned densities allow processing across signal modalities. We demonstrate, on synthetic and real signals, localization in video of the face that is speaking in audio, and, conversely, audio enhancement of a particular speaker selected from the video.
4 0.54303116 50 nips-2000-FaceSync: A Linear Operator for Measuring Synchronization of Video Facial Images and Audio Tracks
Author: Malcolm Slaney, Michele Covell
Abstract: FaceSync is an optimal linear algorithm that finds the degree of synchronization between the audio and image recordings of a human speaker. Using canonical correlation, it finds the best direction to combine all the audio and image data, projecting them onto a single axis. FaceSync uses Pearson's correlation to measure the degree of synchronization between the audio and image data. We derive the optimal linear transform to combine the audio and visual information and describe an implementation that avoids the numerical problems caused by computing the correlation matrices. 1 Motivation In many applications, we want to know about the synchronization between an audio signal and the corresponding image data. In a teleconferencing system, we might want to know which of the several people imaged by a camera is heard by the microphones; then, we can direct the camera to the speaker. In post-production for a film, clean audio dialog is often dubbed over the video; we want to adjust the audio signal so that the lip-sync is perfect. When analyzing a film, we want to know when the person talking is in the shot, instead of off camera. When evaluating the quality of dubbed films, we can measure of how well the translated words and audio fit the actor's face. This paper describes an algorithm, FaceSync, that measures the degree of synchronization between the video image of a face and the associated audio signal. We can do this task by synthesizing the talking face, using techniques such as Video Rewrite [1], and then comparing the synthesized video with the test video. That process, however, is expensive. Our solution finds a linear operator that, when applied to the audio and video signals, generates an audio-video-synchronization-error signal. The linear operator gathers information from throughout the image and thus allows us to do the computation inexpensively. Hershey and Movellan [2] describe an approach based on measuring the mutual information between the audio signal and individual pixels in the video. The correlation between the audio signal, x, and one pixel in the image y, is given by Pearson's correlation, r. The mutual information between these two variables is given by f(x,y) = -1/2 log(l-?). They create movies that show the regions of the video that have high correlation with the audio; 1. Currently at IBM Almaden Research, 650 Harry Road, San Jose, CA 95120. 2. Currently at Yes Video. com, 2192 Fortune Drive, San Jose, CA 95131. Standard Deviation of Testing Data FaceSync
5 0.54013491 30 nips-2000-Bayesian Video Shot Segmentation
Author: Nuno Vasconcelos, Andrew Lippman
Abstract: Prior knowledge about video structure can be used both as a means to improve the peiformance of content analysis and to extract features that allow semantic classification. We introduce statistical models for two important components of this structure, shot duration and activity, and demonstrate the usefulness of these models by introducing a Bayesian formulation for the shot segmentation problem. The new formulations is shown to extend standard thresholding methods in an adaptive and intuitive way, leading to improved segmentation accuracy.
6 0.53440851 80 nips-2000-Learning Switching Linear Models of Human Motion
7 0.49300373 103 nips-2000-Probabilistic Semantic Video Indexing
8 0.41748473 45 nips-2000-Emergence of Movement Sensitive Neurons' Properties by Learning a Sparse Code for Natural Moving Images
9 0.350205 125 nips-2000-Stability and Noise in Biochemical Switches
10 0.33337125 98 nips-2000-Partially Observable SDE Models for Image Sequence Recognition Tasks
11 0.26457 53 nips-2000-Feature Correspondence: A Markov Chain Monte Carlo Approach
12 0.25586703 72 nips-2000-Keeping Flexible Active Contours on Track using Metropolis Updates
13 0.22620898 138 nips-2000-The Use of Classifiers in Sequential Inference
15 0.22277923 73 nips-2000-Kernel-Based Reinforcement Learning in Average-Cost Problems: An Application to Optimal Portfolio Choice
16 0.21619438 57 nips-2000-Four-legged Walking Gait Control Using a Neuromorphic Chip Interfaced to a Support Vector Learning Algorithm
17 0.21022446 93 nips-2000-On Iterative Krylov-Dogleg Trust-Region Steps for Solving Neural Networks Nonlinear Least Squares Problems
18 0.20456623 135 nips-2000-The Manhattan World Assumption: Regularities in Scene Statistics which Enable Bayesian Inference
19 0.18912165 88 nips-2000-Multiple Timescales of Adaptation in a Neural Code
20 0.18825258 44 nips-2000-Efficient Learning of Linear Perceptrons
topicId topicWeight
[(10, 0.031), (17, 0.119), (32, 0.026), (33, 0.04), (42, 0.011), (55, 0.031), (62, 0.059), (65, 0.016), (67, 0.053), (70, 0.35), (75, 0.023), (76, 0.036), (79, 0.014), (81, 0.021), (90, 0.031), (91, 0.014), (97, 0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.80848259 83 nips-2000-Machine Learning for Video-Based Rendering
Author: Arno Schödl, Irfan A. Essa
Abstract: We present techniques for rendering and animation of realistic scenes by analyzing and training on short video sequences. This work extends the new paradigm for computer animation, video textures, which uses recorded video to generate novel animations by replaying the video samples in a new order. Here we concentrate on video sprites, which are a special type of video texture. In video sprites, instead of storing whole images, the object of interest is separated from the background and the video samples are stored as a sequence of alpha-matted sprites with associated velocity information. They can be rendered anywhere on the screen to create a novel animation of the object. We present methods to create such animations by finding a sequence of sprite samples that is both visually smooth and follows a desired path. To estimate visual smoothness, we train a linear classifier to estimate visual similarity between video samples. If the motion path is known in advance, we use beam search to find a good sample sequence. We can specify the motion interactively by precomputing the sequence cost function using Q-Iearning.
2 0.40562862 79 nips-2000-Learning Segmentation by Random Walks
Author: Marina Meila, Jianbo Shi
Abstract: We present a new view of image segmentation by pairwise similarities. We interpret the similarities as edge flows in a Markov random walk and study the eigenvalues and eigenvectors of the walk's transition matrix. This interpretation shows that spectral methods for clustering and segmentation have a probabilistic foundation. In particular, we prove that the Normalized Cut method arises naturally from our framework. Finally, the framework provides a principled method for learning the similarity function as a combination of features. 1
3 0.40489727 74 nips-2000-Kernel Expansions with Unlabeled Examples
Author: Martin Szummer, Tommi Jaakkola
Abstract: Modern classification applications necessitate supplementing the few available labeled examples with unlabeled examples to improve classification performance. We present a new tractable algorithm for exploiting unlabeled examples in discriminative classification. This is achieved essentially by expanding the input vectors into longer feature vectors via both labeled and unlabeled examples. The resulting classification method can be interpreted as a discriminative kernel density estimate and is readily trained via the EM algorithm, which in this case is both discriminative and achieves the optimal solution. We provide, in addition, a purely discriminative formulation of the estimation problem by appealing to the maximum entropy framework. We demonstrate that the proposed approach requires very few labeled examples for high classification accuracy.
4 0.40383807 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning
Author: Zoubin Ghahramani, Matthew J. Beal
Abstract: Variational approximations are becoming a widespread tool for Bayesian learning of graphical models. We provide some theoretical results for the variational updates in a very general family of conjugate-exponential graphical models. We show how the belief propagation and the junction tree algorithms can be used in the inference step of variational Bayesian learning. Applying these results to the Bayesian analysis of linear-Gaussian state-space models we obtain a learning procedure that exploits the Kalman smoothing propagation, while integrating over all model parameters. We demonstrate how this can be used to infer the hidden state dimensionality of the state-space model in a variety of synthetic problems and one real high-dimensional data set. 1
5 0.40046734 98 nips-2000-Partially Observable SDE Models for Image Sequence Recognition Tasks
Author: Javier R. Movellan, Paul Mineiro, Ruth J. Williams
Abstract: This paper explores a framework for recognition of image sequences using partially observable stochastic differential equation (SDE) models. Monte-Carlo importance sampling techniques are used for efficient estimation of sequence likelihoods and sequence likelihood gradients. Once the network dynamics are learned, we apply the SDE models to sequence recognition tasks in a manner similar to the way Hidden Markov models (HMMs) are commonly applied. The potential advantage of SDEs over HMMS is the use of continuous state dynamics. We present encouraging results for a video sequence recognition task in which SDE models provided excellent performance when compared to hidden Markov models. 1
6 0.3984375 107 nips-2000-Rate-coded Restricted Boltzmann Machines for Face Recognition
7 0.39835155 122 nips-2000-Sparse Representation for Gaussian Process Models
8 0.39497161 4 nips-2000-A Linear Programming Approach to Novelty Detection
9 0.39327678 60 nips-2000-Gaussianization
10 0.39256057 133 nips-2000-The Kernel Gibbs Sampler
11 0.39189106 7 nips-2000-A New Approximate Maximal Margin Classification Algorithm
12 0.39179823 146 nips-2000-What Can a Single Neuron Compute?
13 0.39172766 130 nips-2000-Text Classification using String Kernels
14 0.3911463 37 nips-2000-Convergence of Large Margin Separable Linear Classification
15 0.38932401 71 nips-2000-Interactive Parts Model: An Application to Recognition of On-line Cursive Script
16 0.3887479 22 nips-2000-Algorithms for Non-negative Matrix Factorization
17 0.38853595 111 nips-2000-Regularized Winnow Methods
18 0.38723484 69 nips-2000-Incorporating Second-Order Functional Knowledge for Better Option Pricing
19 0.38667893 49 nips-2000-Explaining Away in Weight Space
20 0.38641408 104 nips-2000-Processing of Time Series by Neural Circuits with Biologically Realistic Synaptic Dynamics