nips nips2000 nips2000-103 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Milind R. Naphade, Igor Kozintsev, Thomas S. Huang
Abstract: We propose a novel probabilistic framework for semantic video indexing. We define probabilistic multimedia objects (multijects) to map low-level media features to high-level semantic labels. A graphical network of such multijects (multinet) captures scene context by discovering intra-frame as well as inter-frame dependency relations between the concepts. The main contribution is a novel application of a factor graph framework to model this network. We model relations between semantic concepts in terms of their co-occurrence as well as the temporal dependencies between these concepts within video shots. Using the sum-product algorithm [1] for approximate or exact inference in these factor graph multinets, we attempt to correct errors made during isolated concept detection by forcing high-level constraints. This results in a significant improvement in the overall detection performance.
Reference: text
sentIndex sentText sentNum sentScore
1 We propose a novel probabilistic framework for semantic video indexing. [sent-4, score-0.647]
2 We define probabilistic multimedia objects (multijects) to map low-level media features to high-level semantic labels. [sent-5, score-0.643]
3 A graphical network of such multijects (multinet) captures scene context by discovering intra-frame as well as inter-frame dependency relations between the concepts. [sent-6, score-0.516]
4 The main contribution is a novel application of a factor graph framework to model this network. [sent-7, score-0.314]
5 We model relations between semantic concepts in terms of their co-occurrence as well as the temporal dependencies between these concepts within video shots. [sent-8, score-1.317]
6 Using the sum-product algorithm [1] for approximate or exact inference in these factor graph multinets, we attempt to correct errors made during isolated concept detection by forcing high-level constraints. [sent-9, score-0.68]
7 This results in a significant improvement in the overall detection performance. [sent-10, score-0.234]
8 1 Introduction Research in video retrieval has traditionally focussed on the paradigm of query-by-example (QBE) using low-level features [2]. [sent-11, score-0.383]
9 Query by keywords/key-phrases (QBK) (preferably semantic) instead of examples has motivated recent research in semantic video indexing. [sent-12, score-0.514]
10 A QBK system can support semantic retrieval for a small set of keywords and also act as the first step in QBE systems to narrow down the search. [sent-14, score-0.36]
11 The difficulty lies in the gap between low-level media features and high-level semantics. [sent-15, score-0.12]
12 Recent attempts to address this include detection of audio-visual events like explosion [3] and semantic visual templates [4]. [sent-16, score-0.631]
13 We propose a statistical pattern recognition approach for training probabilistic multimedia objects (multijects) which map the high level concepts to low-level audiovisual features. [sent-17, score-0.557]
14 We also propose a probabilistic factor graph framework, which models the interaction between concepts within each video frame as well as across the video frames within each video shot. [sent-18, score-1.58]
15 Factor graphs provide an elegant framework to represent the stochastic relationship between concepts, while the sum-product algorithm provides an efficient tool to perform learning and inference in factor graphs. [sent-19, score-0.237]
16 Using exact as well as approximate inference (through loopy probability propagation) we show that there is significant improvement in the detection performance. [sent-20, score-0.322]
17 2 Proposed Framework To support retrieval based on high-level queries like 'Explosion on a beach', we need models for the event explosion and the site beach. [sent-21, score-0.287]
18 Detection of some of these concepts may be possible, while some others may not be directly observable. [sent-23, score-0.276]
19 To support such queries, we proposed a probabilistic multimedia object (multiject) [3] as shown in Figure 1 (a) , which has a semantic label and which summarizes a time sequence of features from multiple media. [sent-24, score-0.537]
20 Intuitively it is clear that the presence of certain multijects suggests a high possibility of detecting certain other multijects. [sent-26, score-0.478]
21 Similarly some multijects are less likely to occur in the presence of others. [sent-27, score-0.432]
22 The detection of sky and water boosts the chances of detecting a beach, and reduces the chances of detecting Indoor. [sent-28, score-0.469]
23 It might also be possible to detect some concepts and infer more complex concepts based on their relation with the detected ones. [sent-29, score-0.552]
24 Detection of human speech in the audio stream and a face in the video stream may lead to the inference of human talking. [sent-30, score-0.348]
25 To integrate all the multijects and model their interaction, we propose a network of multijects, which we term the multinet. [sent-31, score-0.898]
26 A conceptual figure of a multinet is shown in Figure 1 (b) with positive (negative) signs indicating positive (negative) interaction. [sent-32, score-0.464]
27 Figure 1: (a) A probabilistic multimedia object. [sent-36, score-0.262]
28 In Section 5 we present a factor graph multinet implementation. [sent-38, score-0.643]
29 3 Video segmentation and Feature Extraction We have digitized movies of different genres to create a large database of a few hours of video data. [sent-39, score-0.281]
30 The video clips are segmented into shots using the algorithm in [5]. [sent-40, score-0.273]
31 We then perform spatio-temporal segmentation [2] within each shot to obtain and track regions homogeneous in color and motion separated by strong edges. [sent-41, score-0.208]
32 For sites we use color, texture and structural features (84 elements) and for objects and events we use all features (98 elements). [sent-45, score-0.209]
33 For training our multiject and multinet models we use 1800 frames from different video shots and for testing our framework we use 9400 frames. [sent-47, score-0.852]
34 Since consecutive images within a shot are correlated, the video data is subsampled to create the training and testing without redundancy. [sent-48, score-0.397]
35 4 Modeling semantic concepts using Multijects We use an identical approach to model concepts in video and audio (independently and jointly). [sent-49, score-1.128]
36 The following site multijects are used in our experiments: sky, water, forest, rocks and snow. [sent-50, score-0.565]
37 Audio-only multijects (human-speech, music) can be found in [10] and audio-visual multijects (explosion) in [3]. [sent-51, score-0.864]
38 Detection of multijects is performed on every segmented region within each video frame. [sent-52, score-0.734]
39 We model the semantic concept as a binary random variable and define the two hypotheses $H_0$ and $H_1$ as $H_0: X_j \sim P_0(X_j)$, $H_1: X_j \sim P_1(X_j)$ (1) where $P_0(X_j)$ and $P_1(X_j)$ denote the class conditional probability density functions conditioned on the null hypothesis (concept absent) and the true hypothesis (concept present). [sent-54, score-0.44]
40 $P_0(X_j)$ and $P_1(X_j)$ are modeled using a mixture of Gaussian components for the site multijects. [sent-55, score-0.498]
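The two-hypothesis test with mixture densities can be sketched as follows. This is only an illustration under stated assumptions: the features are one-dimensional, the mixture parameters are invented, and an equal prior on the two hypotheses is assumed (the text does not state the priors). The helper names `gmm_density` and `region_posterior` are hypothetical.

```python
from math import exp, pi, sqrt

def gmm_density(x, components):
    """Density of a 1-D Gaussian mixture; components = [(weight, mean, std), ...]."""
    return sum(w * exp(-0.5 * ((x - m) / s) ** 2) / (s * sqrt(2.0 * pi))
               for w, m, s in components)

def region_posterior(x, p0_components, p1_components, prior_present=0.5):
    """P(concept present | x) via Bayes' rule from the two class-conditional densities.

    prior_present is an assumption of this sketch, not a value from the paper.
    """
    l0 = gmm_density(x, p0_components) * (1.0 - prior_present)
    l1 = gmm_density(x, p1_components) * prior_present
    return l1 / (l0 + l1)

# Hypothetical mixtures: two components for "concept absent", one for "present".
P0 = [(0.5, -2.0, 1.0), (0.5, 0.0, 1.0)]
P1 = [(1.0, 3.0, 1.0)]

near_present = region_posterior(3.0, P0, P1)   # feature close to the "present" mode
near_absent = region_posterior(-2.0, P0, P1)   # feature close to an "absent" mode
print(near_present, near_absent)
```

The returned value is a soft decision rather than a hard label, which is the kind of quantity the later frame-level and multinet stages operate on.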
41 For objects and events (in video and audio), hidden Markov models replace the Gaussian mixture models and feature vectors for all the frames within a shot constitute the time series modeled. [sent-56, score-0.497]
42 The detection performance for the five site multijects on the test-set is given in Table 1. [sent-57, score-0.744]
43 multiject Detection (%) False Alarm (%) Rocks 77 24. [sent-58, score-0.123]
44 1 Frame level semantic features Since multijects are used as semantic feature detectors at a regional level, it is easy to define multiject-based semantic features at the frame level by integrating the region-level classification. [sent-69, score-1.471]
45 We check each region for each concept individually and obtain probabilities of each concept being present or absent in the region. [sent-70, score-0.263]
46 Imperfect segmentation does not hurt us too much since these soft decisions are modified in the multinet based on high-level constraints. [sent-71, score-0.514]
47 In general, models for $H_0$ are represented better with more components than those for $H_1$. An OR function is used to combine soft decisions for each concept from all regions to obtain $F_i$. [sent-78, score-0.216]
48 , $X_M\}$ ($M$ is the number of regions in a frame), then $P(F_i = 0 \mid X) = \prod_{j=1}^{M} P(R_{ij} = 0 \mid X_j)$ and $P(F_i = 1 \mid X) = 1 - P(F_i = 0 \mid X)$ (3). 5 The multinet as a factor graph To model the interaction between multijects in a multinet, we propose to use a factor graph [1] framework. [sent-82, score-1.411]
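Equation 3 says a concept is absent from the frame only if it is absent from every region. A minimal sketch of that OR combination, with made-up region probabilities and a hypothetical function name, might look like this:

```python
from math import prod

def frame_level_probability(region_absent_probs):
    """Combine region-level soft decisions for one concept (Equation 3).

    region_absent_probs[j] is P(R_ij = 0 | X_j), the probability that
    concept i is absent from region j.
    Returns (P(F_i = 0 | X), P(F_i = 1 | X)).
    """
    # The concept is absent from the frame only if absent from every region.
    p_absent = prod(region_absent_probs)
    return p_absent, 1.0 - p_absent

# Hypothetical soft decisions for one concept in a frame with three regions.
p0, p1 = frame_level_probability([0.9, 0.2, 0.7])
print(round(p0, 3), round(p1, 3))  # prints 0.126 0.874
```

A single confident region (here the second one) is enough to push the frame-level probability of presence high, which is the intended behaviour of an OR.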
49 A factor graph visualizes the factorization of a global function f(x). [sent-88, score-0.273]
50 Let $f(x)$ factor as $f(x) = \prod_{i=1}^{m} f_i(x^{(i)})$ (4) where $x^{(i)}$ is the set of variables of the function $f_i$. [sent-89, score-0.12]
51 A factor graph for f is defined as the bipartite graph with two vertex classes Vf and Vv of sizes m and n respectively such that the ith node in Vf is connected to the jth node in Vv iff fi is a function of Xj. [sent-90, score-0.763]
52 Figure 2 (a) shows a simple factor graph representation of $f(x, y, z) = f_1(x, y) f_2(y, z)$ with function nodes $f_1, f_2$ and variable nodes $x, y, z$. [sent-91, score-0.45]
53 The sum-product algorithm works by computing messages at the nodes using a simple rule and then passing the messages between nodes according to a reasonable schedule. [sent-94, score-0.402]
54 A message from a function node to a variable node is the product of all messages incoming to the function node with the function itself, marginalized for the variable associated with the variable node. [sent-95, score-0.732]
55 A message from a variable node to a function node is simply the product of all messages incoming to the variable node from other functions connected to it. [sent-96, score-0.651]
56 Pearl's probability propagation working on a Bayesian net is equivalent to the sum-product algorithm applied to the corresponding factor graph. [sent-97, score-0.12]
57 If the factor graph is a tree, exact inference is possible using a single forward and backward pass of messages. [sent-98, score-0.366]
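The message rules above can be checked on a two-factor chain of the kind shown in Figure 2 (a). The sketch below assumes binary variables and invented function tables; it applies the two message rules once in each direction and compares the resulting marginals with brute-force summation. Because the graph is a tree, the two should agree exactly.

```python
from itertools import product

STATES = (0, 1)

# Hypothetical local functions; any non-negative tables would do.
f1 = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.3, (1, 1): 0.7}  # first factor, over (x, y)
f2 = {(0, 0): 0.6, (0, 1): 0.4, (1, 0): 0.2, (1, 1): 0.8}  # second factor, over (y, z)

def normalize(v):
    s = sum(v)
    return [a / s for a in v]

# Leaf variable nodes x and z have no other neighbours, so they send unit messages.
msg_x_to_f1 = [1.0, 1.0]
msg_z_to_f2 = [1.0, 1.0]

# Function-to-variable: multiply the function by the messages from the other
# variables, then sum those variables out.
msg_f1_to_y = [sum(f1[(x, y)] * msg_x_to_f1[x] for x in STATES) for y in STATES]
msg_f2_to_y = [sum(f2[(y, z)] * msg_z_to_f2[z] for z in STATES) for y in STATES]

# Variable-to-function: product of the messages from the other functions.
msg_y_to_f1 = msg_f2_to_y
msg_y_to_f2 = msg_f1_to_y

msg_f1_to_x = [sum(f1[(x, y)] * msg_y_to_f1[y] for y in STATES) for x in STATES]
msg_f2_to_z = [sum(f2[(y, z)] * msg_y_to_f2[y] for y in STATES) for z in STATES]

# A marginal is the normalized product of all messages arriving at a variable.
marg_x = normalize(msg_f1_to_x)
marg_y = normalize([a * b for a, b in zip(msg_f1_to_y, msg_f2_to_y)])
marg_z = normalize(msg_f2_to_z)

def brute_force(index):
    """Marginal of variable `index` (0=x, 1=y, 2=z) by summing the full product."""
    totals = [0.0, 0.0]
    for x, y, z in product(STATES, repeat=3):
        totals[(x, y, z)[index]] += f1[(x, y)] * f2[(y, z)]
    return normalize(totals)

print(marg_x, marg_y, marg_z)
```

On a graph with cycles the same two rules would be iterated instead of applied once, and the result would only be approximate.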
58 For all other cases inference is approximate and the message passing is iterative [1] leading to loopy probability propagation. [sent-99, score-0.198]
59 This has a direct bearing on our problem because relations between semantic concepts are complicated and in general contain numerous cycles (e. [sent-100, score-0.593]
60 1 Relating semantic concepts in a factor graph We now describe a frame-level factor graph to model the probabilistic relations between various frame-level semantic features Fi obtained using Equation 3. [sent-104, score-1.534]
61 To capture the co-occurrence relationship between the five semantic concepts at the frame-level, we define a function node which is connected to the five variable nodes representing the concepts as shown in Figure 2 (b). [sent-105, score-1.172]
62 This function node represents $P(F_1, F_2, F_3$, . [sent-106, score-0.127]
63 The function nodes below the five variable nodes denote the messages passed by the OR function of Equation 3 (P(Fi = 1), P(Fi = 0)). [sent-109, score-0.341]
64 At the function node the messages are multiplied by the function which is estimated from the co-occurrence of the concepts in the training set. [sent-111, score-0.515]
65 The function node then sends back messages summarized for each variable. [sent-112, score-0.239]
66 This modifies the soft decisions at the variable nodes according to the high-level relationship between the five concepts. [sent-113, score-0.266]
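The effect of the single global function node can be sketched as below. The joint table and the incoming soft decisions are invented for illustration (two concepts only), and `refine_soft_decisions` is a hypothetical name. The sketch multiplies the incoming messages by the co-occurrence function and sums out the other variables for each concept; it enumerates every configuration, so its cost grows as 2^N in the number of concepts.

```python
from itertools import product

def refine_soft_decisions(joint, evidence):
    """One pass through a single global function node.

    joint maps each binary tuple (F_1, ..., F_N) to its co-occurrence
    probability; evidence[i] = (P(F_i = 0), P(F_i = 1)) is the soft decision
    arriving at variable node i. Returns the updated P(F_i = 1) per concept.
    """
    n = len(evidence)
    beliefs = [[0.0, 0.0] for _ in range(n)]
    for config in product((0, 1), repeat=n):
        # Weight of this configuration: function value times all incoming messages.
        weight = joint[config]
        for i, state in enumerate(config):
            weight *= evidence[i][state]
        # Accumulate the weight into the belief of every variable's current state.
        for i, state in enumerate(config):
            beliefs[i][state] += weight
    return [b[1] / (b[0] + b[1]) for b in beliefs]

# Hypothetical co-occurrence table for two concepts that rarely appear together.
joint = {(0, 0): 0.10, (0, 1): 0.40, (1, 0): 0.45, (1, 1): 0.05}
# First concept detected fairly confidently, second concept undecided.
evidence = [(0.2, 0.8), (0.5, 0.5)]

updated = refine_soft_decisions(joint, evidence)
print(updated)
```

In this toy setting the undecided second concept is pulled down from 0.5 because it rarely co-occurs with the confidently detected first concept.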
67 at the function node in Figure 2 (b) is exponential in the number of concepts (N) and the computational cost may increase quickly. [sent-115, score-0.403]
68 This modification to the graph in Figure 2 (b) is shown in Figure 2 (c). [sent-117, score-0.153]
69 The factor graph is no longer a tree and exact inference becomes hard as the number of loops grows. [sent-119, score-0.321]
70 We can also incorporate temporal dependencies. Figure 3: (a) A dynamic multinet with unfactored global distribution for each frame. (b) A dynamic multinet with factored global distribution for each frame. [sent-121, score-0.821]
71 This can be done by replicating the slice of factor graph in Figure 2 (b) or (c) as many times as the number of frames within a single video shot and by introducing a first order Markov chain for each concept. [sent-125, score-0.772]
72 The horizontal links in Figures 3 (a), (b) connect the variable node for each concept in a time slice to the corresponding variable node in the next time slice through a function modeling the transition probability. [sent-127, score-0.563]
73 For inference, messages are iteratively passed locally within each slice. [sent-129, score-0.143]
74 This is followed by message passing across the time slices in the forward direction and then in the backward direction. [sent-130, score-0.185]
75 Accounting for temporal dependencies thus leads to temporal smoothing of the soft decisions within each shot. [sent-131, score-0.463]
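The forward and backward passes across time slices can be sketched for a single concept in isolation. This is a simplification: the dynamic multinet described above also passes messages between concepts within each slice, which is omitted here. The transition table, the uniform prior on the first frame, and the per-frame soft decisions are all assumptions of the sketch, and `smooth_concept` is a hypothetical name.

```python
def smooth_concept(evidence, transition, prior=(0.5, 0.5)):
    """Forward-backward smoothing of one binary concept over the frames of a shot.

    evidence[t] = (P(F=0), P(F=1)) is the per-frame soft decision, used as a
    local likelihood; transition[a][b] = P(F_{t+1} = b | F_t = a).
    Returns the smoothed P(F_t = 1) for each frame.
    """
    T = len(evidence)

    def norm(v):
        s = v[0] + v[1]
        return [v[0] / s, v[1] / s]

    # Forward pass: messages flowing from the first frame to the last.
    fwd = [None] * T
    fwd[0] = norm([prior[s] * evidence[0][s] for s in (0, 1)])
    for t in range(1, T):
        fwd[t] = norm([
            evidence[t][s] * sum(fwd[t - 1][r] * transition[r][s] for r in (0, 1))
            for s in (0, 1)
        ])

    # Backward pass: messages flowing from the last frame to the first.
    bwd = [None] * T
    bwd[T - 1] = [1.0, 1.0]
    for t in range(T - 2, -1, -1):
        bwd[t] = norm([
            sum(transition[s][r] * evidence[t + 1][r] * bwd[t + 1][r] for r in (0, 1))
            for s in (0, 1)
        ])

    return [norm([fwd[t][s] * bwd[t][s] for s in (0, 1)])[1] for t in range(T)]

# Hypothetical shot of three frames: the middle frame has a weak detection.
evidence = [(0.1, 0.9), (0.6, 0.4), (0.1, 0.9)]
transition = [[0.9, 0.1], [0.1, 0.9]]  # concepts tend to persist between frames
smoothed = smooth_concept(evidence, transition)
print(smoothed)
```

With a persistent transition model, the weak middle-frame decision is raised by its confident neighbours, which is the smoothing effect the text refers to.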
76 6 Results We compare detection performance of the multijects with and without accounting for the concept dependencies and temporal dependencies. [sent-132, score-0.995]
77 The reference system performs multiject detection by thresholding soft-decisions (i. [sent-133, score-0.317]
78 The proposed schemes are then evaluated by thresholding the soft decisions obtained after message passing using the structures in Figures 2 (b), (c) (conceptual dependencies) and Figures 3 (a), (b) (conceptual and temporal dependencies). [sent-136, score-0.303]
79 We use receiver operating characteristics (ROC) curves which show a plot of the probability of detection plotted against the probability of false alarms for different values of a parameter (the threshold in our case). [sent-137, score-0.238]
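An ROC curve of this kind can be traced by sweeping the threshold over the soft decisions. The sketch below uses invented scores and labels and a hypothetical function name; it assumes the data contains at least one positive and one negative frame.

```python
def roc_points(scores, labels, thresholds):
    """Return (P_false_alarm, P_detection) pairs, one per threshold.

    scores[k] is the soft decision P(F = 1) for frame k; labels[k] is the
    ground truth (1 = concept present). A frame is declared positive when
    its score is at least the threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for th in thresholds:
        detected = sum(1 for s, y in zip(scores, labels) if s >= th and y == 1)
        false_alarms = sum(1 for s, y in zip(scores, labels) if s >= th and y == 0)
        points.append((false_alarms / negatives, detected / positives))
    return points

# Hypothetical soft decisions for six frames, three of which contain the concept.
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
curve = roc_points(scores, labels, thresholds=[0.25, 0.5, 0.85])
print(curve)
```

Running the same sweep on the soft decisions before and after message passing gives two curves that can be compared at a fixed false alarm rate.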
80 The three curves in Figure 4 (a) correspond to the performance using isolated frame-level classification, the factor graph in Figure 2 (b) and the factor graph in Figure 2 (c) with ten iterations of loopy propagation. [sent-139, score-0.68]
81 The curves in Figure 4 (b) correspond to isolated detection followed by temporal smoothing, the dynamic multinet in Figure 3 (a) and the one in Figure 3 (b) respectively. [sent-140, score-0.796]
82 Figure 4: ROC curves for overall performance using isolated detection and two factor graph representations. [sent-149, score-0.561]
83 From Figure 4 we observe that using the multinet to model the dependencies between concepts yields a significant improvement in detection performance over not using it. [sent-151, score-0.967]
84 This improvement is especially stark for low $P_f$, where the detection rate improves by more than 22% for a threshold corresponding to $P_f$ = 0. [sent-152, score-0.234]
85 Interestingly, detection based on the factorized functions (Figure 2 (c)) performs better than the one based on the unfactorized function. [sent-154, score-0.194]
86 Also by using models in Figure 3, which account for temporal dependencies across video frames and by performing smoothing using the forward backward algorithm, we see further improvement in detection performance in Figure 4 (b). [sent-156, score-0.801]
87 1 is 68 % for the static multinet (Figure 2 (c)) and 72 % for its dynamic counterpart (Figure 3 (b)). [sent-158, score-0.449]
88 Comparison of ROC curves with and without temporal smoothing (not shown here due to lack of space) reveals that temporal smoothing results in better detection irrespective of the threshold or configuration. [sent-159, score-0.542]
89 7 Conclusions and Future Research We propose a probabilistic framework for detecting semantic concepts using multijects and multinets. [sent-160, score-1.163]
90 We present implementations of static and dynamic multinets using factor graphs. [sent-161, score-0.261]
91 We show that there is significant improvement in detection performance by accounting for the interaction between semantic concepts and temporal dependency amongst the concepts. [sent-162, score-1.025]
92 The multinet architecture imposes no restrictions on the classifiers used in the multijects and we can improve performance by using better multiject models. [sent-163, score-0.925]
93 Our framework can be easily expanded to integrate multiple modalities if they have not been integrated in the multijects to account for the loose coupling between audio and visual streams in movies. [sent-164, score-0.535]
94 It can also support inference of concepts that are observed not through media features but through their relation to those concepts which are observed in media features. [sent-165, score-0.779]
95 Chang, "Spatio-temporal video search using the object-based video representation," in Proceedings of the IEEE International Conference on Image Processing, vol. [sent-176, score-0.476]
96 Huang, "Probabilistic multimedia objects (multijects): A novel approach to indexing and retrieval in multimedia systems," in Proceedings of the fifth IEEE International Conference on Image Processing, vol. [sent-185, score-0.514]
97 Sundaram, "Semantic visual templates - linking features to semantics," in Proceedings of the fifth IEEE International Conference on Image Processing, vol. [sent-192, score-0.133]
98 Tekalp, "A high performance shot boundary detection algorithm using multiple cues," in Proceedings of the fifth IEEE International Conference on Image Processing, vol. [sent-204, score-0.33]
99 Huang, "A probabilistic framework for semantic indexing and retrieval in video," to appear in IEEE International Conference on Multimedia and Expo, New York, NY, July 2000 . [sent-230, score-0.518]
100 Huang, "Stochastic modeling of soundtrack for efficient segmentation and indexing of video," in SPIE IS&T Storage and Retrieval for Multimedia Databases, vol. [sent-239, score-0.102]
wordName wordTfidf (topN-words)
[('multijects', 0.432), ('multinet', 0.37), ('concepts', 0.276), ('semantic', 0.276), ('video', 0.238), ('detection', 0.194), ('graph', 0.153), ('multimedia', 0.142), ('node', 0.127), ('multiject', 0.123), ('factor', 0.12), ('concept', 0.115), ('messages', 0.112), ('naphade', 0.103), ('shot', 0.096), ('temporal', 0.092), ('frame', 0.089), ('explosion', 0.089), ('huang', 0.089), ('dependencies', 0.087), ('retrieval', 0.084), ('fi', 0.083), ('rij', 0.082), ('accounting', 0.075), ('roc', 0.071), ('sky', 0.071), ('site', 0.066), ('nodes', 0.064), ('conceptual', 0.064), ('audio', 0.062), ('beach', 0.062), ('chang', 0.062), ('multinets', 0.062), ('oct', 0.062), ('features', 0.061), ('message', 0.06), ('smoothing', 0.06), ('indexing', 0.059), ('media', 0.059), ('probabilistic', 0.058), ('decisions', 0.055), ('chicago', 0.053), ('five', 0.052), ('passing', 0.05), ('isolated', 0.05), ('figures', 0.05), ('variable', 0.049), ('queries', 0.048), ('slice', 0.048), ('inference', 0.048), ('objects', 0.047), ('detecting', 0.046), ('dynamic', 0.046), ('soft', 0.046), ('backward', 0.045), ('frames', 0.045), ('xj', 0.044), ('curves', 0.044), ('dependency', 0.043), ('segmentation', 0.043), ('water', 0.042), ('framework', 0.041), ('relations', 0.041), ('milind', 0.041), ('qbe', 0.041), ('qbk', 0.041), ('replicating', 0.041), ('zhong', 0.041), ('improvement', 0.04), ('events', 0.04), ('loopy', 0.04), ('fifth', 0.04), ('color', 0.038), ('jain', 0.035), ('forest', 0.035), ('chances', 0.035), ('ho', 0.035), ('databases', 0.035), ('shots', 0.035), ('invariants', 0.035), ('helicopter', 0.035), ('jan', 0.035), ('vv', 0.035), ('propose', 0.034), ('static', 0.033), ('region', 0.033), ('consecutive', 0.032), ('outdoor', 0.032), ('rocks', 0.032), ('templates', 0.032), ('marginalized', 0.032), ('international', 0.032), ('within', 0.031), ('figure', 0.03), ('slices', 0.03), ('interaction', 0.029), ('ieee', 0.029), ('graphs', 0.028), ('vf', 0.028), ('frey', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 103 nips-2000-Probabilistic Semantic Video Indexing
2 0.18077993 30 nips-2000-Bayesian Video Shot Segmentation
Author: Nuno Vasconcelos, Andrew Lippman
Abstract: Prior knowledge about video structure can be used both as a means to improve the performance of content analysis and to extract features that allow semantic classification. We introduce statistical models for two important components of this structure, shot duration and activity, and demonstrate the usefulness of these models by introducing a Bayesian formulation for the shot segmentation problem. The new formulation is shown to extend standard thresholding methods in an adaptive and intuitive way, leading to improved segmentation accuracy.
3 0.16586778 78 nips-2000-Learning Joint Statistical Models for Audio-Visual Fusion and Segregation
Author: John W. Fisher III, Trevor Darrell, William T. Freeman, Paul A. Viola
Abstract: People can understand complex auditory and visual information, often using one to disambiguate the other. Automated analysis, even at a low level, faces severe challenges, including the lack of accurate statistical models for the signals, and their high-dimensionality and varied sampling rates. Previous approaches [6] assumed simple parametric models for the joint distribution which, while tractable, cannot capture the complex signal relationships. We learn the joint distribution of the visual and auditory signals using a non-parametric approach. First, we project the data into a maximally informative, low-dimensional subspace, suitable for density estimation. We then model the complicated stochastic relationships between the signals using a nonparametric density estimator. These learned densities allow processing across signal modalities. We demonstrate, on synthetic and real signals, localization in video of the face that is speaking in audio, and, conversely, audio enhancement of a particular speaker selected from the video.
4 0.13063623 83 nips-2000-Machine Learning for Video-Based Rendering
Author: Arno Schödl, Irfan A. Essa
Abstract: We present techniques for rendering and animation of realistic scenes by analyzing and training on short video sequences. This work extends the new paradigm for computer animation, video textures, which uses recorded video to generate novel animations by replaying the video samples in a new order. Here we concentrate on video sprites, which are a special type of video texture. In video sprites, instead of storing whole images, the object of interest is separated from the background and the video samples are stored as a sequence of alpha-matted sprites with associated velocity information. They can be rendered anywhere on the screen to create a novel animation of the object. We present methods to create such animations by finding a sequence of sprite samples that is both visually smooth and follows a desired path. To estimate visual smoothness, we train a linear classifier to estimate visual similarity between video samples. If the motion path is known in advance, we use beam search to find a good sample sequence. We can specify the motion interactively by precomputing the sequence cost function using Q-learning.
5 0.11724125 50 nips-2000-FaceSync: A Linear Operator for Measuring Synchronization of Video Facial Images and Audio Tracks
Author: Malcolm Slaney, Michele Covell
Abstract: FaceSync is an optimal linear algorithm that finds the degree of synchronization between the audio and image recordings of a human speaker. Using canonical correlation, it finds the best direction to combine all the audio and image data, projecting them onto a single axis. FaceSync uses Pearson's correlation to measure the degree of synchronization between the audio and image data. We derive the optimal linear transform to combine the audio and visual information and describe an implementation that avoids the numerical problems caused by computing the correlation matrices. 1 Motivation In many applications, we want to know about the synchronization between an audio signal and the corresponding image data. In a teleconferencing system, we might want to know which of the several people imaged by a camera is heard by the microphones; then, we can direct the camera to the speaker. In post-production for a film, clean audio dialog is often dubbed over the video; we want to adjust the audio signal so that the lip-sync is perfect. When analyzing a film, we want to know when the person talking is in the shot, instead of off camera. When evaluating the quality of dubbed films, we can measure how well the translated words and audio fit the actor's face. This paper describes an algorithm, FaceSync, that measures the degree of synchronization between the video image of a face and the associated audio signal. We can do this task by synthesizing the talking face, using techniques such as Video Rewrite [1], and then comparing the synthesized video with the test video. That process, however, is expensive. Our solution finds a linear operator that, when applied to the audio and video signals, generates an audio-video-synchronization-error signal. The linear operator gathers information from throughout the image and thus allows us to do the computation inexpensively.
Hershey and Movellan [2] describe an approach based on measuring the mutual information between the audio signal and individual pixels in the video. The correlation between the audio signal, x, and one pixel in the image, y, is given by Pearson's correlation, r. The mutual information between these two variables is given by $I(x, y) = -\frac{1}{2} \log(1 - r^2)$. They create movies that show the regions of the video that have high correlation with the audio; 1. Currently at IBM Almaden Research, 650 Harry Road, San Jose, CA 95120. 2. Currently at YesVideo.com, 2192 Fortune Drive, San Jose, CA 95131.
6 0.11121 15 nips-2000-Accumulator Networks: Suitors of Local Probability Propagation
7 0.09034767 140 nips-2000-Tree-Based Modeling and Estimation of Gaussian Processes on Graphs with Cycles
8 0.08217866 16 nips-2000-Active Inference in Concept Learning
9 0.079049475 62 nips-2000-Generalized Belief Propagation
10 0.074007526 41 nips-2000-Discovering Hidden Variables: A Structure-Based Approach
11 0.073446214 82 nips-2000-Learning and Tracking Cyclic Human Motion
12 0.071103543 19 nips-2000-Adaptive Object Representation with Hierarchically-Distributed Memory Sites
13 0.068625599 4 nips-2000-A Linear Programming Approach to Novelty Detection
14 0.066760249 45 nips-2000-Emergence of Movement Sensitive Neurons' Properties by Learning a Sparse Code for Natural Moving Images
15 0.059359495 17 nips-2000-Active Learning for Parameter Estimation in Bayesian Networks
16 0.059279203 148 nips-2000-`N-Body' Problems in Statistical Learning
17 0.058416147 72 nips-2000-Keeping Flexible Active Contours on Track using Metropolis Updates
18 0.056164004 71 nips-2000-Interactive Parts Model: An Application to Recognition of On-line Cursive Script
19 0.055841282 114 nips-2000-Second Order Approximations for Probability Models
20 0.054124881 98 nips-2000-Partially Observable SDE Models for Image Sequence Recognition Tasks
topicId topicWeight
[(0, 0.185), (1, -0.081), (2, 0.147), (3, 0.121), (4, -0.047), (5, -0.065), (6, 0.237), (7, 0.114), (8, -0.193), (9, 0.146), (10, 0.092), (11, 0.071), (12, -0.1), (13, 0.035), (14, 0.135), (15, 0.09), (16, -0.057), (17, 0.056), (18, 0.049), (19, 0.146), (20, 0.026), (21, 0.068), (22, 0.059), (23, -0.03), (24, -0.004), (25, 0.03), (26, 0.032), (27, 0.014), (28, 0.049), (29, 0.034), (30, -0.096), (31, 0.031), (32, -0.08), (33, -0.067), (34, -0.105), (35, 0.061), (36, 0.131), (37, 0.052), (38, -0.138), (39, 0.028), (40, -0.063), (41, -0.14), (42, -0.111), (43, 0.12), (44, -0.05), (45, 0.068), (46, 0.015), (47, 0.091), (48, -0.116), (49, 0.039)]
simIndex simValue paperId paperTitle
same-paper 1 0.96282434 103 nips-2000-Probabilistic Semantic Video Indexing
2 0.78789985 30 nips-2000-Bayesian Video Shot Segmentation
Author: Nuno Vasconcelos, Andrew Lippman
Abstract: Prior knowledge about video structure can be used both as a means to improve the performance of content analysis and to extract features that allow semantic classification. We introduce statistical models for two important components of this structure, shot duration and activity, and demonstrate the usefulness of these models by introducing a Bayesian formulation for the shot segmentation problem. The new formulation is shown to extend standard thresholding methods in an adaptive and intuitive way, leading to improved segmentation accuracy.
3 0.46909344 83 nips-2000-Machine Learning for Video-Based Rendering
Author: Arno Schödl, Irfan A. Essa
Abstract: We present techniques for rendering and animation of realistic scenes by analyzing and training on short video sequences. This work extends the new paradigm for computer animation, video textures, which uses recorded video to generate novel animations by replaying the video samples in a new order. Here we concentrate on video sprites, which are a special type of video texture. In video sprites, instead of storing whole images, the object of interest is separated from the background and the video samples are stored as a sequence of alpha-matted sprites with associated velocity information. They can be rendered anywhere on the screen to create a novel animation of the object. We present methods to create such animations by finding a sequence of sprite samples that is both visually smooth and follows a desired path. To estimate visual smoothness, we train a linear classifier to estimate visual similarity between video samples. If the motion path is known in advance, we use beam search to find a good sample sequence. We can specify the motion interactively by precomputing the sequence cost function using Q-learning.
4 0.42632428 15 nips-2000-Accumulator Networks: Suitors of Local Probability Propagation
Author: Brendan J. Frey, Anitha Kannan
Abstract: One way to approximate inference in richly-connected graphical models is to apply the sum-product algorithm (a.k.a. probability propagation algorithm), while ignoring the fact that the graph has cycles. The sum-product algorithm can be directly applied in Gaussian networks and in graphs for coding, but for many conditional probability functions - including the sigmoid function - direct application of the sum-product algorithm is not possible. We introduce
5 0.38641748 78 nips-2000-Learning Joint Statistical Models for Audio-Visual Fusion and Segregation
Author: John W. Fisher III, Trevor Darrell, William T. Freeman, Paul A. Viola
Abstract: People can understand complex auditory and visual information, often using one to disambiguate the other. Automated analysis, even at a low level, faces severe challenges, including the lack of accurate statistical models for the signals, and their high dimensionality and varied sampling rates. Previous approaches [6] assumed simple parametric models for the joint distribution which, while tractable, cannot capture the complex signal relationships. We learn the joint distribution of the visual and auditory signals using a non-parametric approach. First, we project the data into a maximally informative, low-dimensional subspace, suitable for density estimation. We then model the complicated stochastic relationships between the signals using a nonparametric density estimator. These learned densities allow processing across signal modalities. We demonstrate, on synthetic and real signals, localization in video of the face that is speaking in audio, and, conversely, audio enhancement of a particular speaker selected from the video.
6 0.38305593 16 nips-2000-Active Inference in Concept Learning
7 0.3514199 140 nips-2000-Tree-Based Modeling and Estimation of Gaussian Processes on Graphs with Cycles
8 0.34588137 62 nips-2000-Generalized Belief Propagation
9 0.34247503 50 nips-2000-FaceSync: A Linear Operator for Measuring Synchronization of Video Facial Images and Audio Tracks
10 0.27995563 71 nips-2000-Interactive Parts Model: An Application to Recognition of On-line Cursive Script
11 0.26871291 82 nips-2000-Learning and Tracking Cyclic Human Motion
12 0.2623834 79 nips-2000-Learning Segmentation by Random Walks
13 0.25683576 135 nips-2000-The Manhattan World Assumption: Regularities in Scene Statistics which Enable Bayesian Inference
14 0.25518006 19 nips-2000-Adaptive Object Representation with Hierarchically-Distributed Memory Sites
15 0.2389033 127 nips-2000-Structure Learning in Human Causal Induction
16 0.23079042 98 nips-2000-Partially Observable SDE Models for Image Sequence Recognition Tasks
17 0.22137907 17 nips-2000-Active Learning for Parameter Estimation in Bayesian Networks
18 0.21815915 29 nips-2000-Bayes Networks on Ice: Robotic Search for Antarctic Meteorites
19 0.21567743 4 nips-2000-A Linear Programming Approach to Novelty Detection
20 0.21545544 138 nips-2000-The Use of Classifiers in Sequential Inference
topicId topicWeight
[(10, 0.017), (17, 0.109), (32, 0.013), (33, 0.036), (54, 0.013), (55, 0.039), (62, 0.033), (65, 0.055), (67, 0.028), (75, 0.032), (76, 0.022), (79, 0.026), (81, 0.432), (90, 0.022), (91, 0.032)]
simIndex simValue paperId paperTitle
1 0.97043365 66 nips-2000-Hippocampally-Dependent Consolidation in a Hierarchical Model of Neocortex
Author: Szabolcs Káli, Peter Dayan
Abstract: In memory consolidation, declarative memories that initially require the hippocampus for their recall ultimately become independent of it. Consolidation has been the focus of numerous experimental and qualitative modeling studies, but little quantitative exploration. We present a consolidation model in which hierarchical connections in the cortex, which initially instantiate purely semantic information acquired through probabilistic unsupervised learning, come to instantiate episodic information as well. The hippocampus is responsible for helping complete partial input patterns before consolidation is complete, while also training the cortex to perform appropriate completion by itself.
2 0.94627368 141 nips-2000-Universality and Individuality in a Neural Code
Author: Elad Schneidman, Naama Brenner, Naftali Tishby, Robert R. de Ruyter van Steveninck, William Bialek
Abstract: The problem of neural coding is to understand how sequences of action potentials (spikes) are related to sensory stimuli, motor outputs, or (ultimately) thoughts and intentions. One clear question is whether the same coding rules are used by different neurons, or by corresponding neurons in different individuals. We present a quantitative formulation of this problem using ideas from information theory, and apply this approach to the analysis of experiments in the fly visual system. We find significant individual differences in the structure of the code, particularly in the way that temporal patterns of spikes are used to convey information beyond that available from variations in spike rate. On the other hand, all the flies in our ensemble exhibit a high coding efficiency, so that every spike carries the same amount of information in all the individuals. Thus the neural code has a quantifiable mixture of individuality and universality.
3 0.94363588 137 nips-2000-The Unscented Particle Filter
Author: Rudolph van der Merwe, Arnaud Doucet, Nando de Freitas, Eric A. Wan
Abstract: In this paper, we propose a new particle filter based on sequential importance sampling. The algorithm uses a bank of unscented filters to obtain the importance proposal distribution. This proposal has two very
same-paper 4 0.91542077 103 nips-2000-Probabilistic Semantic Video Indexing
Author: Milind R. Naphade, Igor Kozintsev, Thomas S. Huang
Abstract: We propose a novel probabilistic framework for semantic video indexing. We define probabilistic multimedia objects (multijects) to map low-level media features to high-level semantic labels. A graphical network of such multijects (multinet) captures scene context by discovering intra-frame as well as inter-frame dependency relations between the concepts. The main contribution is a novel application of a factor graph framework to model this network. We model relations between semantic concepts in terms of their co-occurrence as well as the temporal dependencies between these concepts within video shots. Using the sum-product algorithm [1] for approximate or exact inference in these factor graph multinets, we attempt to correct errors made during isolated concept detection by forcing high-level constraints. This results in a significant improvement in the overall detection performance.
5 0.6295436 55 nips-2000-Finding the Key to a Synapse
Author: Thomas Natschläger, Wolfgang Maass
Abstract: Experimental data have shown that synapses are heterogeneous: different synapses respond with different sequences of amplitudes of postsynaptic responses to the same spike train. Neither the role of synaptic dynamics itself nor the role of the heterogeneity of synaptic dynamics for computations in neural circuits is well understood. We present in this article methods that make it feasible to compute for a given synapse with known synaptic parameters the spike train that is optimally fitted to the synapse, for example in the sense that it produces the largest sum of postsynaptic responses. To our surprise we find that most of these optimally fitted spike trains match common firing patterns of specific types of neurons that are discussed in the literature.
6 0.59586614 104 nips-2000-Processing of Time Series by Neural Circuits with Biologically Realistic Synaptic Dynamics
7 0.5461058 131 nips-2000-The Early Word Catches the Weights
8 0.53338516 146 nips-2000-What Can a Single Neuron Compute?
9 0.50827891 88 nips-2000-Multiple Timescales of Adaptation in a Neural Code
10 0.49862844 43 nips-2000-Dopamine Bonuses
11 0.48017091 80 nips-2000-Learning Switching Linear Models of Human Motion
12 0.45937857 71 nips-2000-Interactive Parts Model: An Application to Recognition of On-line Cursive Script
13 0.45734525 30 nips-2000-Bayesian Video Shot Segmentation
14 0.45652571 49 nips-2000-Explaining Away in Weight Space
15 0.44824404 89 nips-2000-Natural Sound Statistics and Divisive Normalization in the Auditory System
16 0.44086546 124 nips-2000-Spike-Timing-Dependent Learning for Oscillatory Networks
17 0.43858323 129 nips-2000-Temporally Dependent Plasticity: An Information Theoretic Account
18 0.43627945 72 nips-2000-Keeping Flexible Active Contours on Track using Metropolis Updates
19 0.43543503 98 nips-2000-Partially Observable SDE Models for Image Sequence Recognition Tasks
20 0.43148524 90 nips-2000-New Approaches Towards Robust and Adaptive Speech Recognition