nips nips2004 nips2004-160 knowledge-graph by maker-knowledge-mining

160 nips-2004-Seeing through water


Source: pdf

Author: Alexei Efros, Volkan Isler, Jianbo Shi, Mirkó Visontai

Abstract: We consider the problem of recovering an underwater image distorted by surface waves. A large amount of video data of the distorted image is acquired. The problem is posed in terms of finding an undistorted image patch at each spatial location. This challenging reconstruction task can be formulated as a manifold learning problem, such that the center of the manifold is the image of the undistorted patch. To compute the center, we present a new technique to estimate global distances on the manifold. Our technique achieves robustness through convex flow computations and solves the “leakage” problem inherent in recent manifold embedding techniques. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 We consider the problem of recovering an underwater image distorted by surface waves. [sent-10, score-0.415]

2 A large amount of video data of the distorted image is acquired. [sent-11, score-0.249]

3 The problem is posed in terms of finding an undistorted image patch at each spatial location. [sent-12, score-0.311]

4 This challenging reconstruction task can be formulated as a manifold learning problem, such that the center of the manifold is the image of the undistorted patch. [sent-13, score-0.616]

5 Our technique achieves robustness through convex flow computations and solves the “leakage” problem inherent in recent manifold embedding techniques. [sent-15, score-0.481]

6 A pool of water is observed by a stationary video camera mounted above the pool and looking straight down. [sent-17, score-0.505]

7 There are waves on the surface of the water and all the camera sees is a series of distorted images of the bottom of the pool, e. [sent-18, score-0.571]

8 The aim is to use these images to recover the undistorted image of the pool floor – as if the water were perfectly still. [sent-21, score-0.409]

9 But once in a while, when the water just happens to be locally flat at that point, we will be looking straight down and seeing exactly the right spot on the ground. [sent-30, score-0.332]

10 Figure 2: Ground truth image and reconstruction results using mean and median. If we could snap the right picture at each spatial location, then recovering the desired ground truth image would simply be a matter of stitching these correct observations together. [sent-34, score-0.567]

11 However, the water surface refracts in accordance with Snell’s Law. [sent-40, score-0.32]

12 If the normal to the water surface directly underneath x is pointing straight up, there is no refraction and V (x) = G(x). [sent-42, score-0.356]

13 By Snell’s Law (sin θ2 = 1.33 sin θ1, with 1.33 the refractive index of water), the viewing ray is bent, so the camera point V (x) will see the light projected from G(x + dx) on the ground plane. [sent-44, score-0.263]

14 This means that, in 2D, what the camera will be seeing over time at point V (x, y, t) are points on the ground plane sampled from a disk centered at G(x, y) and with radius related to the height of the water and the overall roughness of the water surface. [sent-47, score-0.931]
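
The refraction geometry described above can be sketched numerically. The following is a toy model (not the authors' code): a surface tilted by θ1 bends the viewing ray per Snell's Law, producing a ground-plane offset dx that grows with the tilt and with the water depth h; 1.33 is the refractive index of water.

```python
import math

def ground_offset(theta1, h, n_water=1.33):
    """Ground-plane offset dx seen by a downward-looking camera when the
    water surface under the viewing ray is tilted by theta1 (radians).

    Toy model: the ray refracts once at the surface and then travels
    through water of depth h to the ground plane."""
    # Snell's law at the air-water interface
    theta2 = math.asin(math.sin(theta1) / n_water)
    # The refracted ray deviates from the incident one by (theta1 - theta2)
    return h * math.tan(theta1 - theta2)
```

With a locally flat surface (theta1 = 0) the offset vanishes, matching V (x) = G(x) in the text; larger tilts map the camera pixel to points farther from G(x), tracing out the disk described above.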

15 According to Cox-Munk Law [2], the surface normals of rough water are distributed approximately as a Gaussian centered around the vertical, assuming a large surface area and stationary waves. [sent-50, score-0.475]
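
Under the Cox-Munk assumption, the tilt of the surface normal can be simulated as a zero-mean Gaussian; pushing the sampled tilts through a single-refraction model gives ground samples clustered around the true point. A toy sketch (the tilt standard deviation and the single-refraction model are illustrative assumptions):

```python
import math
import random

def sample_ground_offsets(h, sigma, n, n_water=1.33, seed=0):
    """Sample n signed ground-plane offsets induced by surface tilts drawn
    from a zero-mean Gaussian (Cox-Munk-style rough-water assumption)."""
    rng = random.Random(seed)
    offsets = []
    for _ in range(n):
        theta1 = rng.gauss(0.0, sigma)                    # surface tilt
        theta2 = math.asin(math.sin(theta1) / n_water)    # Snell's law
        offsets.append(h * math.tan(theta1 - theta2))     # offset on ground
    return offsets
```

Because the tilt distribution is symmetric around the vertical, the sampled offsets average out near zero – which is why the temporal mean is a sensible, if blurry, baseline reconstruction.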

16 Up to now, we only concerned ourselves with infinitesimally small points on the image or the ground plane. [sent-52, score-0.259]

17 Therefore, we will make an assumption that the surface of the water can be locally approximated by a planar patch. [sent-54, score-0.32]

18 This means that everything that was true for points is now true for local image patches (up to a small affine distortion). [sent-55, score-0.412]

19 If the distribution of a particular ground point on the image plane is unimodal, then one could track feature points in the video sequence over time. [sent-57, score-0.371]

20 Computing their mean positions over the entire video will give an estimate of their true positions on the ground plane. [sent-58, score-0.239]
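
The per-pixel averaging baseline (cf. Figure 2) is a one-liner over the frame stack. A minimal sketch, assuming the input is a hypothetical (T, H, W) stack of aligned grayscale frames:

```python
import numpy as np

def temporal_reconstruction(frames, stat="mean"):
    """Estimate the undistorted image as a per-pixel temporal statistic
    (mean or median) over a stack of distorted frames.

    frames: (T, H, W) array-like of grayscale frames."""
    stack = np.asarray(frames, dtype=float)
    if stat == "median":
        return np.median(stack, axis=0)
    return stack.mean(axis=0)
```

The median is more robust to the occasional large refraction than the mean, but both blur fine detail – the motivation for the manifold-center formulation that follows.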

21 However, since we have a lot of data, we can substitute smoothness in time with smoothness in similarity – for a given patch we are more likely to find a patch similar to it somewhere in time, and will have a better chance to track the transition between them. [sent-60, score-0.332]

22 We know that this set of patches comes from a disk on the ground plane centered around patch G(x, y) – our goal. [sent-62, score-0.832]

23 If the disk was small enough compared to the size of the patch, we could just cluster the patches together, e. [sent-63, score-0.397]

24 Unfortunately, the disk can be rather large, containing patches with no overlap at all, thus making only the local similarity comparisons possible. [sent-66, score-0.441]

25 However, notice that our set of patches lies on a low-dimensional manifold; in fact we know precisely which manifold – it’s the disk on the ground plane centered at G(x, y)! [sent-67, score-0.844]

26 So, if we could use the local patch similarities to find an embedding of the patches in V (x, y, t) on this manifold, the center of the embedding will hold our desired patch G(x, y). [sent-68, score-1.133]

27 The problem of embedding the patches based on local similarity is related to the recent work in manifold learning [4, 5]. [sent-69, score-0.711]

28 Basic ingredients of the embedding algorithms are: defining a distance measure between points, and finding an energy function that optimally places them in the embedding space. [sent-70, score-0.482]

29 The distance can be defined as all-pairs distance matrix, or as distance from a particular reference node. [sent-71, score-0.234]

30 The local similarity measure for our problem turned out to be particularly unreliable, so none of the previous manifold learning techniques were adequate for our purposes. [sent-73, score-0.222]

31 In the following section we will describe our own, robust method for computing a global distance function and finding the right embedding and eventually the center of it. [sent-74, score-0.394]

32 Our goal is to find a center patch to represent the set I. [sent-82, score-0.232]
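
Equation 1 itself is elided from this summary, but the center described here acts like a medoid: the member of I minimizing the summed distance to all other members. A sketch under that assumption (`dist` is any pairwise distance function):

```python
def find_center(items, dist):
    """Return the medoid of `items`: the element minimizing the summed
    distance to all other elements (assumed reading of Equation 1)."""
    def total_distance(i):
        return sum(dist(items[i], items[j])
                   for j in range(len(items)) if j != i)
    return items[min(range(len(items)), key=total_distance)]
```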

33 A common approach is to design a global distance function using the measurable local distances and transitivity [6, 4]. [sent-85, score-0.246]

34 This is equivalent to designing a global distance function of the form: d(Ii , Ij ) = dlocal (Ii , Ij ) if dlocal (Ii , Ij ) ≤ τ , and d(Ii , Ij ) = dtransitive (Ii , Ij ) otherwise. [sent-86, score-0.503]

35 (2) where dlocal is a local distance function, τ is a user-specified threshold and dtransitive is a global, transitive distance function which utilizes dlocal . [sent-87, score-0.592]
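
The piecewise definition in Equation 2 translates directly into code. In this sketch both distance functions are hypothetical callables supplied by the caller:

```python
def global_distance(i, j, d_local, d_transitive, tau):
    """Global distance of Equation 2: trust the local measure when it is
    below the threshold tau, otherwise fall back to the transitive one."""
    d = d_local(i, j)
    return d if d <= tau else d_transitive(i, j)
```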

36 One method for designing such a transitive distance function is to build a graph G = (V, E) whose vertices correspond to the members of I. [sent-93, score-0.236]

37 Afterwards, the length of pairwise shortest paths are used to estimate the true distances on the manifold S. [sent-95, score-0.507]

38 Unfortunately, estimating the distance dtransitive (·, ·) using shortest path computations is not robust to errors in the local distances – which are very common. [sent-97, score-0.561]

39 Since they are different letters, we expect that these patches would be quite distant on the manifold S. [sent-99, score-0.465]

40 However, among the A patches there will inevitably be a very blurry A that would look quite similar to a very blurry B, producing an erroneous local distance measurement. [sent-100, score-0.817]

41 When the transitive global distances are computed using shortest paths, a single erroneous edge will singlehandedly cause all the A patches to be much closer to all the B patches, short-circuiting the graph and completely distorting all the distances. [sent-101, score-0.938]
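
The short-circuiting effect is easy to reproduce on a toy graph: a single spurious low-weight edge between two clusters collapses every cross-cluster shortest-path distance. A self-contained sketch using Dijkstra's algorithm (illustrative graph, not the paper's data):

```python
import heapq

def add_edge(adj, u, v, w):
    """Add an undirected weighted edge to an adjacency dict {u: [(v, w), ...]}."""
    adj.setdefault(u, []).append((v, w))
    adj.setdefault(v, []).append((u, w))

def dijkstra(adj, src):
    """Shortest-path distances from src."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

On two chains joined by a legitimate weight-10 edge, the cross-cluster distance is 14; inserting one erroneous weight-0.1 edge between the chains drops it to 0.1, dragging the clusters together exactly as described above.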

42 Such errors lead to the leakage problem in estimating the global distances of patches. [sent-102, score-0.395]

43 Suppose our local distance function erroneously estimates an edge between the corners of the triangle as shown in the figure. [sent-105, score-0.263]

44 After the erroneous edge is inserted, the shortest paths from the top of the triangle leak through this edge. [sent-106, score-0.612]

45 Therefore, the shortest path distances will fail to reflect the true distance on the manifold. [sent-107, score-0.432]

46 5 Solving the leakage problem Recall that our goal is to find the center of our data set as defined in Equation 1. [sent-108, score-0.337]

47 The leakage problem occurs when we compute the values dI (Ii ) using the shortest path metric. [sent-111, score-0.549]

48 In this case, even a single erroneous edge may reduce the shortest paths from many different patches to Ii – changing the value of dI (Ii ) drastically. [sent-112, score-0.785]

49 Intuitively, in order to prevent the leakage problem we must prevent edges from getting involved in many shortest path computations to the same node (i. [sent-113, score-0.672]

50 Let G = (V, E) be our graph representation such that for each patch Ii ∈ I, there is a vertex vi ∈ V . [sent-117, score-0.288]

51 The edge set E is built as follows: there is an edge (vi , vj ) if dlocal (Ii , Ij ) is less than a threshold. [sent-118, score-0.375]

52 The weight of the edge (vi , vj ) is equal to dlocal (Ii , Ij ). [sent-119, score-0.282]

53 The arcs of the flow network are chosen using the edge set E. [sent-123, score-0.327]

54 For each edge (vj , vk ) ∈ E we add the arcs vj → vk and vk → vj . [sent-124, score-0.549]

55 Both arcs have infinite capacity and the cost of pushing one unit of flow on either arc is equal to the weight of (vj , vk ), as shown in Figure 4 left (top and bottom). [sent-125, score-0.578]

56 Computing the minimum cost flow on N W (Ii ) not only gives us dI (Ii ) but also allows us to compute how many times an edge is involved in the distance computation: the amount of flow through an edge is exactly the number of times that edge is used for the shortest path computations. [sent-129, score-0.744]
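
In the uncapacitated network, the flow on an edge equals the number of shortest paths through it, so the same quantities can be sketched without a flow solver: run Dijkstra from the sink, then route every node to the sink along predecessor links and count edge traversals. A stdlib-only sketch (illustrative, not the authors' implementation):

```python
import heapq
from collections import Counter

def shortest_path_tree(adj, sink):
    """Dijkstra from the sink on an undirected graph {u: [(v, w), ...]};
    returns (distances, predecessor-toward-sink)."""
    dist, pred = {sink: 0.0}, {}
    heap = [(0.0, sink)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], pred[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return dist, pred

def distance_and_edge_usage(adj, sink):
    """Return d_I(sink) = sum of shortest-path distances to the sink, and
    a count of how often each edge carries one of those paths (its 'flow'
    in the uncapacitated network)."""
    dist, pred = shortest_path_tree(adj, sink)
    usage = Counter()
    for u in adj:
        while u != sink:
            usage[frozenset((u, pred[u]))] += 1
            u = pred[u]
    return sum(dist[u] for u in adj), usage
```

An edge with a large usage count is exactly the kind of over-used, potentially erroneous edge that the convex-flow formulation penalizes.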

57 Therefore, if we prevent too much flow going through an edge, we can prevent the leakage problem. [sent-131, score-0.347]

58 Figure 4: The leakage problem (box C: shortest path with capacity error). [sent-132, score-0.271]

59 Left: Equivalence of shortest path leakage and uncapacitated flow leakage problem. [sent-133, score-0.82]

60 Bottom-middle: After the erroneous edge is inserted, the shortest paths from the top of the triangle to vertex v go through this edge. [sent-134, score-0.616]

61 One might think that the leakage problem can simply be avoided by imposing capacity constraints on the arcs of the flow network (Figure 4, box C). [sent-143, score-0.63]

62 Observe that in the minimum cost flow solution of the network N W (Ii ), the amount of flow on the arcs will increase as the arcs get closer to Ii . [sent-145, score-0.533]

63 Therefore, when we are setting up the network N W (Ii ), we must adaptively increase the capacities of arcs “closer” to the sink vi – otherwise, there will be no feasible solution. [sent-146, score-0.369]

64 Therefore imposing capacities on the arcs requires understanding the underlying structure of the graph G as well as the space S – which is in fact the problem we are trying to solve! [sent-149, score-0.278]

65 Our proposed solution to the leakage problem uses the notion of a convex flow. [sent-150, score-0.345]

66 Instead, we impose a convex cost function on the arcs such that the cost of pushing unit flow on arc a increases as the total amount of flow through a increases. [sent-152, score-0.648]

67 The transformation is achieved by applying the following operation on each arc in N W (Ii ): Let a be an arc from u to w in N W (Ii ). [sent-155, score-0.208]

68 The weights and capacities of the other arcs are chosen to reflect the steepness of the desired convexity (Figure 4, box B). [sent-165, score-0.301]
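
The transformation in the last two sentences replaces each arc by parallel arcs with increasing unit costs, so that filling them cheapest-first yields a total cost that is convex in the amount of flow. A sketch of that construction (the particular capacity/cost schedule is an illustrative assumption):

```python
def expand_arc(base_cost, steps):
    """Replace one arc by parallel arcs. `steps` is a list of
    (capacity, extra_cost) pairs; the k-th parallel arc carries up to
    `capacity` units at base_cost + extra_cost per unit."""
    return sorted((base_cost + extra, cap) for cap, extra in steps)

def cost_of_flow(parallel_arcs, flow):
    """Total cost of pushing `flow` units, filling cheapest arcs first
    (what a min-cost flow solver would do with these parallel arcs)."""
    total = 0.0
    for unit_cost, cap in parallel_arcs:
        used = min(flow, cap)
        total += used * unit_cost
        flow -= used
        if flow == 0:
            break
    if flow > 0:
        raise ValueError("flow exceeds total capacity")
    return total
```

With nondecreasing extra costs the marginal cost of each additional unit rises, so an erroneous edge becomes progressively more expensive to overload – the mechanism that prevents leakage.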

69 However, the minimum cost flow will avoid the leakage problem because it will be costly to use an erroneous edge to carry the flow from many different patches. [sent-169, score-0.592]

70 1 Fixing the leakage in Isomap As noted earlier, the Isomap method [4] uses the shortest path measurements to estimate a distance matrix M . [sent-171, score-0.627]

71 Afterwards, M is used to find an embedding of the manifold S via MDS. [sent-172, score-0.38]
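
The MDS step can be sketched with classical multidimensional scaling: double-center the squared distance matrix and keep the top eigenvectors. This is the standard construction, not the paper's code:

```python
import numpy as np

def classical_mds(D, dim=2):
    """Embed points so that Euclidean distances approximate the given
    symmetric (n, n) distance matrix D (e.g. Isomap's shortest-path
    matrix M). Returns (n, dim) coordinates."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)            # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:dim]        # keep the top `dim`
    scale = np.sqrt(np.maximum(vals[idx], 0.0))
    return vecs[:, idx] * scale
```

Feeding in leakage-corrupted shortest-path distances collapses the embedding, while distances from the convex-flow computation keep it intact, as Figure 5 illustrates.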

72 As expected, this method also suffers from the leakage problem as demonstrated in Figure 5. [sent-173, score-0.271]

73 The top-left image in Figure 5 shows our ground truth. [sent-174, score-0.259]

74 In the middle row, we present an embedding of these graphs computed using Isomap which uses the shortest path length as the global distance measure. [sent-175, score-0.644]

75 As illustrated in these figures, even though Isomap does a good job of embedding the ground truth when there are no errors, the embedding (or manifold) collapses after we insert the erroneous edges. [sent-176, score-0.904]

76 In contrast, when we use the convex-flow based technique to estimate the distances, we recover the true embedding – even in the presence of erroneous edges (Figure 5 bottom row). [sent-177, score-0.477]

77 6 Results In our experiments we used 800 image frames to reconstruct the ground truth image. [sent-178, score-0.368]

78 We fixed 30 × 30 size patches in each frame at the same location (see top of Figure 7 for two sets of examples), and for every location we found the center. [sent-179, score-0.319]

79 The middle row of Figure 7 shows embeddings of the patches computed using the distance derived from the convex flow. [sent-180, score-0.535]

80 The transition path and the morphing from selected patches (A, B, C) to the center patch (F) are shown at the bottom. [sent-181, score-0.64]

81 The embedding plot on the left is considered an easier case, with a Gaussian-like embedding (the graph is denser close to the center) and smooth transitions between the patches in a transition path. [sent-182, score-0.728]

82 The plot to the right shows a more difficult example, when the embedding has no longer a Gaussian shape, but rather a triangular one. [sent-183, score-0.239]

83 Also note that the transitions can have jumps connecting non-similar patches which are distant in the embedding space. [sent-184, score-0.489]

84 After sampling points from a triangular disk, a kNN graph is constructed to provide a local measure for the embedding (left). [sent-285, score-0.32]

85 Additional erroneous edges AC and CB are added to perturb the local measure (middle, right). [sent-286, score-0.243]

86 However, all-pairs shortest path can “leak” through AC and CB, resulting in a significant change in the embedding. [sent-289, score-0.278]

87 Convex flow penalizes routing many paths through the same edge – correcting the leakage problem. [sent-291, score-0.423]

88 This results in a ‘folding in’ of the embedding and thus moves the estimated center towards the blurry patches. [sent-294, score-0.396]

89 To solve this problem, we introduced two additional centers, which ideally would represent the blurry patches, allowing the third center to move to the ground truth. [sent-295, score-0.372]

90 Once we have found the centers for all patches we stitched them together to form the complete reconstructed image. [sent-296, score-0.403]

91 In the case of three centers, we use overlapping patches and dynamic programming to determine the best stitching. [sent-297, score-0.287]

92 The better performance of the latter suggests that the two new centers relieve the correct center from the blurry patches. [sent-300, score-0.267]

93 For a graph with n vertices and m edges, the minimum cost flow computation takes O(m log n(m + n log n)) time, therefore finding the center I ∗ of one set of patches can be done in O(mn log n(m + n log n)) time. [sent-301, score-0.466]

94 Figure 7: Top row: sample patches (two different locations) from 800 frames. Middle row: convex flow embedding, showing the transition paths. [sent-304, score-0.287]

95 Bottom row: corresponding patches (A, B, C, A1, A2, B1, B2, C1, C2) and their morphing to the centers F, FA, FB, FC respectively. 7 Conclusion In this paper, we studied the problem of recovering an underwater image from a video sequence. [sent-305, score-0.684]

96 Because of the surface waves, the sequence consists of distorted versions of the image to be recovered. [sent-306, score-0.27]

97 The novelty of our work is in the formulation of the reconstruction problem as a manifold embedding problem. [sent-307, score-0.429]

98 Our contribution also includes a new technique, based on convex flows, to recover global distances on the manifold in a robust fashion. [sent-308, score-0.376]

99 This technique solves the leakage problem inherent in recent embedding methods. [sent-309, score-0.5]

100 Correction of an underwater object image distorted by surface waves. [sent-314, score-0.377]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('ow', 0.389), ('patches', 0.287), ('leakage', 0.271), ('water', 0.205), ('embedding', 0.202), ('shortest', 0.194), ('ii', 0.19), ('arcs', 0.19), ('ground', 0.178), ('manifold', 0.178), ('patch', 0.166), ('erroneous', 0.152), ('blurry', 0.128), ('dlocal', 0.128), ('surface', 0.115), ('disk', 0.11), ('underwater', 0.107), ('arc', 0.104), ('isomap', 0.1), ('edge', 0.093), ('dtransitive', 0.085), ('fc', 0.085), ('camera', 0.085), ('path', 0.084), ('image', 0.081), ('distance', 0.078), ('cost', 0.076), ('distances', 0.076), ('distorted', 0.074), ('ij', 0.074), ('convex', 0.074), ('centers', 0.073), ('truth', 0.07), ('fb', 0.068), ('center', 0.066), ('capacity', 0.065), ('undistorted', 0.064), ('pushing', 0.063), ('vj', 0.061), ('video', 0.061), ('box', 0.06), ('fa', 0.06), ('pool', 0.059), ('paths', 0.059), ('row', 0.058), ('seeing', 0.057), ('plane', 0.051), ('capacities', 0.051), ('transitive', 0.051), ('bottom', 0.049), ('reconstruction', 0.049), ('vk', 0.048), ('triangle', 0.048), ('di', 0.048), ('global', 0.048), ('edges', 0.047), ('vi', 0.047), ('tracking', 0.046), ('local', 0.044), ('network', 0.044), ('dyt', 0.043), ('snell', 0.043), ('stitched', 0.043), ('waves', 0.043), ('centered', 0.04), ('dx', 0.039), ('frames', 0.039), ('prevent', 0.038), ('vertex', 0.038), ('middle', 0.038), ('recovering', 0.038), ('distortion', 0.038), ('efros', 0.037), ('triangular', 0.037), ('optics', 0.037), ('sink', 0.037), ('morphing', 0.037), ('graph', 0.037), ('designing', 0.036), ('straight', 0.036), ('members', 0.034), ('cb', 0.034), ('spot', 0.034), ('leak', 0.034), ('dxt', 0.034), ('knn', 0.034), ('cc', 0.034), ('unfortunately', 0.033), ('amount', 0.033), ('top', 0.032), ('unit', 0.032), ('inserted', 0.032), ('unimodal', 0.032), ('dec', 0.032), ('ak', 0.031), ('afterwards', 0.03), ('ac', 0.03), ('technique', 0.027), ('distortions', 0.027), ('setup', 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 160 nips-2004-Seeing through water

Author: Alexei Efros, Volkan Isler, Jianbo Shi, Mirkó Visontai

Abstract: We consider the problem of recovering an underwater image distorted by surface waves. A large amount of video data of the distorted image is acquired. The problem is posed in terms of finding an undistorted image patch at each spatial location. This challenging reconstruction task can be formulated as a manifold learning problem, such that the center of the manifold is the image of the undistorted patch. To compute the center, we present a new technique to estimate global distances on the manifold. Our technique achieves robustness through convex flow computations and solves the “leakage” problem inherent in recent manifold embedding techniques. 1

2 0.20826381 99 nips-2004-Learning Hyper-Features for Visual Identification

Author: Andras D. Ferencz, Erik G. Learned-miller, Jitendra Malik

Abstract: We address the problem of identifying specific instances of a class (cars) from a set of images all belonging to that class. Although we cannot build a model for any particular instance (as we may be provided with only one “training” example of it), we can use information extracted from observing other members of the class. We pose this task as a learning problem, in which the learner is given image pairs, labeled as matching or not, and must discover which image features are most consistent for matching instances and discriminative for mismatches. We explore a patch based representation, where we model the distributions of similarity measurements defined on the patches. Finally, we describe an algorithm that selects the most salient patches based on a mutual information criterion. This algorithm performs identification well for our challenging dataset of car images, after matching only a few, well chosen patches. 1

3 0.18621863 131 nips-2004-Non-Local Manifold Tangent Learning

Author: Yoshua Bengio, Martin Monperrus

Abstract: We claim and present arguments to the effect that a large class of manifold learning algorithms that are essentially local and can be framed as kernel learning algorithms will suffer from the curse of dimensionality, at the dimension of the true underlying manifold. This observation suggests to explore non-local manifold learning algorithms which attempt to discover shared structure in the tangent planes at different positions. A criterion for such an algorithm is proposed and experiments estimating a tangent plane prediction function are presented, showing its advantages with respect to local manifold learning algorithms: it is able to generalize very far from training data (on learning handwritten character image rotations), where a local non-parametric method fails. 1

4 0.14723663 62 nips-2004-Euclidean Embedding of Co-Occurrence Data

Author: Amir Globerson, Gal Chechik, Fernando Pereira, Naftali Tishby

Abstract: Embedding algorithms search for low dimensional structure in complex data, but most algorithms only handle objects of a single type for which pairwise distances are specified. This paper describes a method for embedding objects of different types, such as images and text, into a single common Euclidean space based on their co-occurrence statistics. The joint distributions are modeled as exponentials of Euclidean distances in the low-dimensional embedding space, which links the problem to convex optimization over positive semidefinite matrices. The local structure of our embedding corresponds to the statistical correlations via random walks in the Euclidean space. We quantify the performance of our method on two text datasets, and show that it consistently and significantly outperforms standard methods of statistical correspondence modeling, such as multidimensional scaling and correspondence analysis. 1

5 0.13158621 179 nips-2004-Surface Reconstruction using Learned Shape Models

Author: Jan E. Solem, Fredrik Kahl

Abstract: We consider the problem of geometrical surface reconstruction from one or several images using learned shape models. While humans can effortlessly retrieve 3D shape information, this inverse problem has turned out to be difficult to perform automatically. We introduce a framework based on level set surface reconstruction and shape models for achieving this goal. Through this merging, we obtain an efficient and robust method for reconstructing surfaces of an object category of interest. The shape model includes surface cues such as point, curve and silhouette features. Based on ideas from Active Shape Models, we show how both the geometry and the appearance of these features can be modelled consistently in a multi-view context. The complete surface is obtained by evolving a level set driven by a PDE, which tries to fit the surface to the inferred 3D features. In addition, an a priori 3D surface model is used to regularize the solution, in particular, where surface features are sparse. Experiments are demonstrated on a database of real face images.

6 0.097850628 91 nips-2004-Joint Tracking of Pose, Expression, and Texture using Conditionally Gaussian Filters

7 0.091271363 4 nips-2004-A Generalized Bradley-Terry Model: From Group Competition to Individual Skill

8 0.090185061 114 nips-2004-Maximum Likelihood Estimation of Intrinsic Dimension

9 0.090097755 192 nips-2004-The power of feature clustering: An application to object detection

10 0.089949556 150 nips-2004-Proximity Graphs for Clustering and Manifold Learning

11 0.083697408 44 nips-2004-Conditional Random Fields for Object Recognition

12 0.08034379 145 nips-2004-Parametric Embedding for Class Visualization

13 0.078629792 83 nips-2004-Incremental Learning for Visual Tracking

14 0.078251541 92 nips-2004-Kernel Methods for Implicit Surface Modeling

15 0.078073606 68 nips-2004-Face Detection --- Efficient and Rank Deficient

16 0.074111812 125 nips-2004-Multiple Relational Embedding

17 0.073374681 17 nips-2004-Adaptive Manifold Learning

18 0.069472849 100 nips-2004-Learning Preferences for Multiclass Problems

19 0.067475885 196 nips-2004-Triangle Fixing Algorithms for the Metric Nearness Problem

20 0.067291282 121 nips-2004-Modeling Nonlinear Dependencies in Natural Images using Mixture of Laplacian Distribution


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.198), (1, 0.054), (2, -0.062), (3, -0.162), (4, 0.105), (5, 0.017), (6, -0.017), (7, -0.116), (8, -0.053), (9, 0.003), (10, 0.019), (11, -0.217), (12, 0.056), (13, -0.161), (14, -0.162), (15, 0.186), (16, 0.05), (17, -0.101), (18, -0.147), (19, 0.036), (20, -0.003), (21, -0.017), (22, -0.181), (23, -0.108), (24, 0.042), (25, 0.069), (26, -0.037), (27, -0.04), (28, 0.015), (29, -0.023), (30, 0.05), (31, -0.012), (32, 0.074), (33, -0.11), (34, 0.068), (35, 0.051), (36, -0.127), (37, -0.068), (38, -0.101), (39, 0.08), (40, 0.0), (41, 0.087), (42, 0.023), (43, 0.104), (44, -0.092), (45, -0.01), (46, -0.006), (47, -0.06), (48, -0.101), (49, -0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97367436 160 nips-2004-Seeing through water

Author: Alexei Efros, Volkan Isler, Jianbo Shi, Mirkó Visontai

Abstract: We consider the problem of recovering an underwater image distorted by surface waves. A large amount of video data of the distorted image is acquired. The problem is posed in terms of finding an undistorted image patch at each spatial location. This challenging reconstruction task can be formulated as a manifold learning problem, such that the center of the manifold is the image of the undistorted patch. To compute the center, we present a new technique to estimate global distances on the manifold. Our technique achieves robustness through convex flow computations and solves the “leakage” problem inherent in recent manifold embedding techniques. 1

2 0.60987794 131 nips-2004-Non-Local Manifold Tangent Learning

Author: Yoshua Bengio, Martin Monperrus

Abstract: We claim and present arguments to the effect that a large class of manifold learning algorithms that are essentially local and can be framed as kernel learning algorithms will suffer from the curse of dimensionality, at the dimension of the true underlying manifold. This observation suggests to explore non-local manifold learning algorithms which attempt to discover shared structure in the tangent planes at different positions. A criterion for such an algorithm is proposed and experiments estimating a tangent plane prediction function are presented, showing its advantages with respect to local manifold learning algorithms: it is able to generalize very far from training data (on learning handwritten character image rotations), where a local non-parametric method fails. 1

3 0.59956545 99 nips-2004-Learning Hyper-Features for Visual Identification

Author: Andras D. Ferencz, Erik G. Learned-miller, Jitendra Malik

Abstract: We address the problem of identifying specific instances of a class (cars) from a set of images all belonging to that class. Although we cannot build a model for any particular instance (as we may be provided with only one “training” example of it), we can use information extracted from observing other members of the class. We pose this task as a learning problem, in which the learner is given image pairs, labeled as matching or not, and must discover which image features are most consistent for matching instances and discriminative for mismatches. We explore a patch based representation, where we model the distributions of similarity measurements defined on the patches. Finally, we describe an algorithm that selects the most salient patches based on a mutual information criterion. This algorithm performs identification well for our challenging dataset of car images, after matching only a few, well chosen patches. 1

4 0.48523661 62 nips-2004-Euclidean Embedding of Co-Occurrence Data

Author: Amir Globerson, Gal Chechik, Fernando Pereira, Naftali Tishby

Abstract: Embedding algorithms search for low dimensional structure in complex data, but most algorithms only handle objects of a single type for which pairwise distances are specified. This paper describes a method for embedding objects of different types, such as images and text, into a single common Euclidean space based on their co-occurrence statistics. The joint distributions are modeled as exponentials of Euclidean distances in the low-dimensional embedding space, which links the problem to convex optimization over positive semidefinite matrices. The local structure of our embedding corresponds to the statistical correlations via random walks in the Euclidean space. We quantify the performance of our method on two text datasets, and show that it consistently and significantly outperforms standard methods of statistical correspondence modeling, such as multidimensional scaling and correspondence analysis. 1

5 0.48207825 150 nips-2004-Proximity Graphs for Clustering and Manifold Learning

Author: Richard S. Zemel, Miguel Á. Carreira-Perpiñán

Abstract: Many machine learning algorithms for clustering or dimensionality reduction take as input a cloud of points in Euclidean space, and construct a graph with the input data points as vertices. This graph is then partitioned (clustering) or used to redefine metric information (dimensionality reduction). There has been much recent work on new methods for graph-based clustering and dimensionality reduction, but not much on constructing the graph itself. Graphs typically used include the fullyconnected graph, a local fixed-grid graph (for image segmentation) or a nearest-neighbor graph. We suggest that the graph should adapt locally to the structure of the data. This can be achieved by a graph ensemble that combines multiple minimum spanning trees, each fit to a perturbed version of the data set. We show that such a graph ensemble usually produces a better representation of the data manifold than standard methods; and that it provides robustness to a subsequent clustering or dimensionality reduction algorithm based on the graph. 1

6 0.46673742 25 nips-2004-Assignment of Multiplicative Mixtures in Natural Images

7 0.41292107 68 nips-2004-Face Detection --- Efficient and Rank Deficient

8 0.39483535 17 nips-2004-Adaptive Manifold Learning

9 0.38965061 179 nips-2004-Surface Reconstruction using Learned Shape Models

10 0.36831939 182 nips-2004-Synergistic Face Detection and Pose Estimation with Energy-Based Models

11 0.36703774 192 nips-2004-The power of feature clustering: An application to object detection

12 0.3475334 186 nips-2004-The Correlated Correspondence Algorithm for Unsupervised Registration of Nonrigid Surfaces

13 0.30628273 114 nips-2004-Maximum Likelihood Estimation of Intrinsic Dimension

14 0.29709932 196 nips-2004-Triangle Fixing Algorithms for the Metric Nearness Problem

15 0.29274076 145 nips-2004-Parametric Embedding for Class Visualization

16 0.27607584 40 nips-2004-Common-Frame Model for Object Recognition

17 0.27241343 81 nips-2004-Implicit Wiener Series for Higher-Order Image Analysis

18 0.27220288 92 nips-2004-Kernel Methods for Implicit Surface Modeling

19 0.26965708 47 nips-2004-Contextual Models for Object Detection Using Boosted Random Fields

20 0.2621873 125 nips-2004-Multiple Relational Embedding


Similar papers computed by LDA model

LDA topics for this paper:

topicId topicWeight

[(13, 0.086), (15, 0.149), (26, 0.039), (31, 0.016), (33, 0.142), (35, 0.017), (39, 0.023), (50, 0.049), (55, 0.299), (56, 0.024), (71, 0.012), (76, 0.017), (77, 0.029), (87, 0.018)]

Similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.80231524 160 nips-2004-Seeing through water

Author: Alexei Efros, Volkan Isler, Jianbo Shi, Mirkó Visontai

Abstract: We consider the problem of recovering an underwater image distorted by surface waves. A large amount of video data of the distorted image is acquired. The problem is posed in terms of finding an undistorted image patch at each spatial location. This challenging reconstruction task can be formulated as a manifold learning problem, such that the center of the manifold is the image of the undistorted patch. To compute the center, we present a new technique to estimate global distances on the manifold. Our technique achieves robustness through convex flow computations and solves the “leakage” problem inherent in recent manifold embedding techniques. 1

2 0.60149282 91 nips-2004-Joint Tracking of Pose, Expression, and Texture using Conditionally Gaussian Filters

Author: Tim K. Marks, J. C. Roddey, Javier R. Movellan, John R. Hershey

Abstract: We present a generative model and stochastic filtering algorithm for simultaneous tracking of 3D position and orientation, non-rigid motion, object texture, and background texture using a single camera. We show that the solution to this problem is formally equivalent to stochastic filtering of conditionally Gaussian processes, a problem for which well known approaches exist [3, 8]. We propose an approach based on Monte Carlo sampling of the nonlinear component of the process (object motion) and exact filtering of the object and background textures given the sampled motion. The smoothness of image sequences in time and space is exploited by using Laplace's method to generate proposal distributions for importance sampling [7]. The resulting inference algorithm encompasses both optic flow and template-based tracking as special cases, and elucidates the conditions under which these methods are optimal. We demonstrate an application of the system to 3D non-rigid face tracking.

1 Background

Recent algorithms track morphable objects by solving optic flow equations, subject to the constraint that the tracked points belong to an object whose non-rigid deformations are linear combinations of a set of basic shapes [10, 2, 11]. These algorithms require precise initialization of the object pose and tend to drift out of alignment on long video sequences. We present G-flow, a generative model and stochastic filtering formulation of tracking that address the problems of initialization and error recovery in a principled manner. We define a non-rigid object by the 3D locations of n vertices. The object is a linear combination of k fixed morph bases, with coefficients c = [c_1, c_2, · · · , c_k]^T. The fixed 3 × k matrix h_i contains the position of the i-th vertex in all k morph bases. The transformation from object-centered to image coordinates consists of a rotation, weak perspective projection, and translation.
Thus x_i, the 2D location of the i-th vertex on the image plane, is

    x_i = g r h_i c + l,    (1)

where r is the 3 × 3 rotation matrix, l is the 2 × 1 translation vector, and g = [1 0 0; 0 1 0] is the 2 × 3 projection matrix. The object pose, u_t, comprises both the rigid motion parameters and the morph parameters at time t:

    u_t = {r(t), l(t), c(t)}.    (2)

1.1 Optic flow

Let y_t represent the current image, and let x_i(u_t) index the image pixel that is rendered by the i-th object vertex when the object assumes pose u_t. Suppose that we know u_{t−1}, the pose at time t − 1, and we want to find u_t, the pose at time t. This problem can be solved by minimizing the following form with respect to u_t:

    û_t = argmin_{u_t} (1/2) Σ_{i=1}^n [y_t(x_i(u_t)) − y_{t−1}(x_i(u_{t−1}))]^2.    (3)

In the special case in which the x_i(u_t) are neighboring points that move with the same 2D displacement, this reduces to the standard Lucas-Kanade optic flow algorithm [9, 1]. Recent work [10, 2, 11] has shown that in the general case, this optimization problem can be solved efficiently using the Gauss-Newton method. We will take advantage of this fact to develop an efficient stochastic inference algorithm within the framework of G-flow.

Notational conventions. Unless otherwise stated, capital letters are used for random variables, small letters for specific values taken by random variables, and Greek letters for fixed model parameters. Subscripted colons indicate sequences: e.g., X_{1:t} = X_1 · · · X_t. The term I_n stands for the n × n identity matrix, E for expected value, Var for the covariance matrix, and Var^{−1} for the inverse of the covariance matrix (precision matrix).

2 The Generative Model for G-Flow

Figure 1: Left: a(U_t) determines which texel (color at a vertex of the object model or a pixel of the background model) is responsible for rendering each image pixel. Right: G-flow video generation model: At time t, the object's 3D pose, U_t, is used to project the object texture, V_t, into 2D.
This projection is combined with the background texture, B_t, to generate the observed image, Y_t. We model the image sequence Y as a stochastic process generated by three hidden causes, U, V, and B, as shown in the graphical model (Figure 1, right). The m × 1 random vector Y_t represents the m-pixel image at time t. The n × 1 random vector V_t and the m × 1 random vector B_t represent the n-texel object texture and the m-texel background texture, respectively. As illustrated in Figure 1, left, the object pose, U_t, determines onto which image pixels the object and background texels project at time t. This is formulated using the projection function a(U_t). For a given pose, u_t, the projection a(u_t) is a block matrix, a(u_t) := [a^v(u_t)  a^b(u_t)]. Here a^v(u_t), the object projection function, is an m × n matrix of 0s and 1s that tells onto which image pixel each object vertex projects; e.g., a 1 at row j, column i means that the i-th object point projects onto image pixel j. Matrix a^b plays the same role for background pixels. Assuming the foreground mapping is one-to-one, we let a^b = I_m − a^v(u_t) a^v(u_t)^T, expressing the simple occlusion constraint that every image pixel is rendered by object or background, but not both. In the G-flow generative model:

    Y_t = a(U_t) [V_t; B_t] + W_t,    W_t ∼ N(0, σ_w I_m), σ_w > 0    (4)
    U_t ∼ p(u_t | u_{t−1})
    V_t = V_{t−1} + Z^v_{t−1},    Z^v_{t−1} ∼ N(0, Ψ^v), Ψ^v is diagonal
    B_t = B_{t−1} + Z^b_{t−1},    Z^b_{t−1} ∼ N(0, Ψ^b), Ψ^b is diagonal

where p(u_t | u_{t−1}) is the pose transition distribution, and Z^v, Z^b, W are independent of each other, of the initial conditions, and over time. The form of the pose distribution is left unspecified since the algorithm proposed here does not require the pose distribution or the pose dynamics to be Gaussian. For the initial conditions, we require that the variance of V_1 and the variance of B_1 are both diagonal.
Non-rigid 3D tracking is a difficult nonlinear filtering problem because changing the pose has a nonlinear effect on the image pixels. Fortunately, the problem has a rich structure that we can exploit: under the G-flow model, video generation is a conditionally Gaussian process [3, 6, 4, 5]. If the specific values taken by the pose sequence, u_{1:t}, were known, then the texture processes, V and B, and the image process, Y, would be jointly Gaussian. This suggests the following scheme: we could use particle filtering to obtain a distribution of pose experts (each expert corresponds to a highly probable sample of pose, u_{1:t}). For each expert we could then use Kalman filtering equations to infer the posterior distribution of texture given the observed images. This method is known in the statistics community as a Monte Carlo filtering solution for conditionally Gaussian processes [3, 4], and in the machine learning community as Rao-Blackwellized particle filtering [6, 5]. We found that in addition to Rao-Blackwellization, it was also critical to use Laplace's method to generate the proposal distributions for importance sampling [7]. In the context of G-flow, we accomplished this by performing an optic flow-like optimization, using an efficient algorithm similar to those in [10, 2].

3 Inference

Our goal is to find an expression for the filtering distribution, p(u_t, v_t, b_t | y_{1:t}). Using the law of total probability, we have the following equation for the filtering distribution:

    p(u_t, v_t, b_t | y_{1:t}) = ∫ p(u_t, v_t, b_t | u_{1:t−1}, y_{1:t}) p(u_{1:t−1} | y_{1:t}) du_{1:t−1},    (5)

where the first factor under the integral is the opinion of expert u_{1:t−1} and the second factor is the credibility of that expert. We can think of the integral in (5) as a sum over a distribution of experts, where each expert corresponds to a single pose history, u_{1:t−1}. Based on its hypothesis about pose history, each expert has an opinion about the current pose of the object, U_t, and the texture maps of the object and background, V_t and B_t.
Each expert also has a credibility, a scalar that measures how well the expert's opinion matches the observed image y_t. Thus, (5) can be interpreted as follows: The filtering distribution at time t is obtained by integrating over the entire ensemble of experts the opinion of each expert weighted by that expert's credibility. The opinion distribution of expert u_{1:t−1} can be factorized into the expert's opinion about the pose U_t times the conditional distribution of texture V_t, B_t given pose:

    p(u_t, v_t, b_t | u_{1:t−1}, y_{1:t}) = p(u_t | u_{1:t−1}, y_{1:t}) p(v_t, b_t | u_{1:t}, y_{1:t}),    (6)

i.e., the opinion of the expert is its pose opinion times its texture opinion given pose. The rest of this section explains how we evaluate each term in (5) and (6). We cover the distribution of texture given pose in 3.1, pose opinion in 3.2, and credibility in 3.3.

3.1 Texture opinion given pose

The distribution of V_t and B_t given the pose history u_{1:t} is Gaussian with mean and covariance that can be obtained using the Kalman filter estimation equations:

    Var^{−1}(V_t, B_t | u_{1:t}, y_{1:t}) = Var^{−1}(V_t, B_t | u_{1:t−1}, y_{1:t−1}) + a(u_t)^T σ_w^{−1} a(u_t)    (7)
    E(V_t, B_t | u_{1:t}, y_{1:t}) = Var(V_t, B_t | u_{1:t}, y_{1:t}) × [Var^{−1}(V_t, B_t | u_{1:t−1}, y_{1:t−1}) E(V_t, B_t | u_{1:t−1}, y_{1:t−1}) + a(u_t)^T σ_w^{−1} y_t]    (8)

This requires p(V_t, B_t | u_{1:t−1}, y_{1:t−1}), which we get from the Kalman prediction equations:

    E(V_t, B_t | u_{1:t−1}, y_{1:t−1}) = E(V_{t−1}, B_{t−1} | u_{1:t−1}, y_{1:t−1})    (9)
    Var(V_t, B_t | u_{1:t−1}, y_{1:t−1}) = Var(V_{t−1}, B_{t−1} | u_{1:t−1}, y_{1:t−1}) + [Ψ^v 0; 0 Ψ^b]    (10)

In (9), the expected value E(V_t, B_t | u_{1:t−1}, y_{1:t−1}) consists of texture maps (templates) for the object and background. In (10), Var(V_t, B_t | u_{1:t−1}, y_{1:t−1}) represents the degree of uncertainty about each texel in these texture maps. Since this is a diagonal matrix, we can refer to the mean and variance of each texel individually.
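Because a(u_t) is a 0/1 pixel-to-texel assignment and all the covariances involved are diagonal, the correction step (7)-(8) and prediction step (9)-(10) decouple into independent scalar Kalman recursions, one per observed texel. A minimal sketch of one such recursion (variable names are ours):

```python
import numpy as np

def texel_update(mu, var, y_obs, sigma_w):
    """Scalar Kalman correction for one texel: prior N(mu, var),
    observation y_obs with noise variance sigma_w.  This is what
    (7)-(8) reduce to per texel in the diagonal, 0/1-assignment case."""
    prec = 1.0 / var + 1.0 / sigma_w                    # posterior precision, cf. (7)
    var_post = 1.0 / prec
    mu_post = var_post * (mu / var + y_obs / sigma_w)   # precision-weighted mean, cf. (8)
    return mu_post, var_post

def texel_predict(mu, var, psi):
    """Prediction (9)-(10): random-walk texture dynamics inflate the variance."""
    return mu, var + psi

mu, var = 0.5, 1.0                               # texel prior at time t-1
mu, var = texel_predict(mu, var, psi=0.1)        # predict to time t
mu, var = texel_update(mu, var, y_obs=0.8, sigma_w=0.2)  # correct with pixel value
```

Unobserved texels (those occluded under the current pose) simply skip the correction step, so their variance grows with each prediction.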
For the i-th texel in the object texture map, we use the following notation:

    μ^v_t(i) := i-th element of E(V_t | u_{1:t−1}, y_{1:t−1})
    σ^v_t(i) := (i, i)-th element of Var(V_t | u_{1:t−1}, y_{1:t−1})

Similarly, define μ^b_t(j) and σ^b_t(j) as the mean and variance of the j-th texel in the background texture map. (This notation leaves the dependency on u_{1:t−1} and y_{1:t−1} implicit.)

3.2 Pose opinion

Based on its current texture template (derived from the history of poses and images up to time t − 1) and the new image y_t, each expert u_{1:t−1} has a pose opinion, p(u_t | u_{1:t−1}, y_{1:t}), a probability distribution representing that expert's beliefs about the pose at time t. Since the effect of u_t on the likelihood function is nonlinear, we will not attempt to find an analytical solution for the pose opinion distribution. However, due to the spatio-temporal smoothness of video signals, it is possible to estimate the peak and variance of an expert's pose opinion.

3.2.1 Estimating the peak of an expert's pose opinion

We want to estimate û_t(u_{1:t−1}), the value of u_t that maximizes the pose opinion. Since

    p(u_t | u_{1:t−1}, y_{1:t}) = [p(y_{1:t−1} | u_{1:t−1}) / p(y_{1:t} | u_{1:t−1})] p(u_t | u_{t−1}) p(y_t | u_{1:t}, y_{1:t−1}),    (11)

    û_t(u_{1:t−1}) := argmax_{u_t} p(u_t | u_{1:t−1}, y_{1:t}) = argmax_{u_t} p(u_t | u_{t−1}) p(y_t | u_{1:t}, y_{1:t−1}).    (12)

We now need an expression for the final term in (12), the predictive distribution p(y_t | u_{1:t}, y_{1:t−1}).
By integrating out the hidden texture variables from p(y_t, v_t, b_t | u_{1:t}, y_{1:t−1}), and using the conditional independence relationships defined by the graphical model (Figure 1, right), we can derive:

    log p(y_t | u_{1:t}, y_{1:t−1}) = −(m/2) log 2π − (1/2) log |Var(Y_t | u_{1:t}, y_{1:t−1})|
        − (1/2) Σ_{i=1}^n (y_t(x_i(u_t)) − μ^v_t(i))^2 / (σ^v_t(i) + σ_w)
        − (1/2) Σ_{j ∉ X(u_t)} (y_t(j) − μ^b_t(j))^2 / (σ^b_t(j) + σ_w),    (13)

where x_i(u_t) is the image pixel rendered by the i-th object vertex when the object assumes pose u_t, and X(u_t) is the set of all image pixels rendered by the object under pose u_t. Combining (12) and (13), we can derive:

    û_t(u_{1:t−1}) = argmin_{u_t} { −log p(u_t | u_{t−1})
        + (1/2) Σ_{i=1}^n [ (y_t(x_i(u_t)) − μ^v_t(i))^2 / (σ^v_t(i) + σ_w)    (foreground term)
        − (y_t(x_i(u_t)) − μ^b_t(x_i(u_t)))^2 / (σ^b_t(x_i(u_t)) + σ_w) − log(σ^b_t(x_i(u_t)) + σ_w) ] }    (background terms)    (14)

Note the similarity between (14) and constrained optic flow (3). For example, focus on the foreground term in (14) and ignore the weights in the denominator. The previous image y_{t−1} from (3) has been replaced by μ^v_t(·), the estimated object texture based on the images and poses up to time t − 1. As in optic flow, we can find the pose estimate û_t(u_{1:t−1}) efficiently using the Gauss-Newton method.

3.2.2 Estimating the distribution of an expert's pose opinion

We estimate the distribution of an expert's pose opinion using a combination of Laplace's method and importance sampling. Suppose at time t − 1 we are given a sample of experts indexed by d, each endowed with a pose sequence u^{(d)}_{1:t−1}, a weight w^{(d)}_{t−1}, and the means and variances of Gaussian distributions for object and background texture. For each expert u^{(d)}_{1:t−1}, we use (14) to compute û^{(d)}_t, the peak of the pose distribution at time t according to that expert. Define σ̂^{(d)}_t as the inverse Hessian matrix of (14) at this peak, the Laplace estimate of the covariance matrix of the expert's opinion.
We then generate a set of s independent samples {u^{(d,e)}_t : e = 1, · · · , s} from a Gaussian distribution with mean û^{(d)}_t and variance proportional to σ̂^{(d)}_t, namely g(· | û^{(d)}_t, α σ̂^{(d)}_t), where the parameter α > 0 determines the sharpness of the sampling distribution. (Note that letting α → 0 would be equivalent to simply setting the new pose equal to the peak of the pose opinion, u^{(d,e)}_t = û^{(d)}_t.) To find the parameters of this Gaussian proposal distribution, we use the Gauss-Newton method, ignoring the second of the two background terms in (14). (This term is not ignored in the importance sampling step.) To refine our estimate of the pose opinion we use importance sampling. We assign each sample from the proposal distribution an importance weight w_t(d, e) that is proportional to the ratio between the posterior distribution and the proposal distribution:

    p̂(u_t | u^{(d)}_{1:t−1}, y_{1:t}) = Σ_{e=1}^s δ(u_t − u^{(d,e)}_t) w_t(d, e) / Σ_{f=1}^s w_t(d, f)    (15)

    w_t(d, e) = p(u^{(d,e)}_t | u^{(d)}_{t−1}) p(y_t | u^{(d)}_{1:t−1}, u^{(d,e)}_t, y_{1:t−1}) / g(u^{(d,e)}_t | û^{(d)}_t, α σ̂^{(d)}_t)    (16)

The numerator of (16) is proportional to p(u^{(d,e)}_t | u^{(d)}_{1:t−1}, y_{1:t}) by (12), and the denominator of (16) is the sampling distribution.

3.3 Estimating an expert's credibility

The credibility of the d-th expert, p(u^{(d)}_{1:t−1} | y_{1:t}), is proportional to the product of a prior term and a likelihood term:

    p(u^{(d)}_{1:t−1} | y_{1:t}) = p(u^{(d)}_{1:t−1} | y_{1:t−1}) p(y_t | u^{(d)}_{1:t−1}, y_{1:t−1}) / p(y_t | y_{1:t−1}).    (17)

Regarding the likelihood,

    p(y_t | u_{1:t−1}, y_{1:t−1}) = ∫ p(y_t, u_t | u_{1:t−1}, y_{1:t−1}) du_t = ∫ p(y_t | u_{1:t}, y_{1:t−1}) p(u_t | u_{t−1}) du_t    (18)

We already generated a set of samples {u^{(d,e)}_t : e = 1, · · · , s} that estimate the pose opinion of the d-th expert, p(u_t | u^{(d)}_{1:t−1}, y_{1:t}).
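The self-normalized importance weighting in (15)-(16) is easy to check on a 1-D toy problem where the posterior is available in closed form. Everything below is our own toy stand-in, not the paper's pose model: a N(0, 1) prior, a Gaussian likelihood centered at 1 with variance 0.25 (so the true posterior mean is 0.8), and a Gaussian proposal playing the role of g(· | û_t, α σ̂_t):

```python
import numpy as np

rng = np.random.default_rng(0)

def log_prior(u):      # stand-in for p(u_t | u_{t-1}): N(0, 1)
    return -0.5 * u ** 2

def log_lik(u):        # stand-in for p(y_t | u_{1:t}, y_{1:t-1}): N(1, 0.25)
    return -0.5 * ((u - 1.0) ** 2) / 0.25

u_hat, sigma_hat, alpha, s = 0.8, 0.3, 1.0, 5000
u = rng.normal(u_hat, np.sqrt(alpha * sigma_hat), size=s)      # proposal samples
log_g = -0.5 * (u - u_hat) ** 2 / (alpha * sigma_hat)          # constants cancel below

log_w = log_prior(u) + log_lik(u) - log_g   # eq (16), in log space for stability
w = np.exp(log_w - log_w.max())
w /= w.sum()                                # self-normalization, as in eq (15)

post_mean = float((w * u).sum())            # closed-form answer is 0.8
```

Normalizing constants of the prior, likelihood, and proposal all drop out of the self-normalized estimate, which is why only the quadratic terms are kept.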
We can now use these samples to estimate the likelihood for the d-th expert:

    p(y_t | u^{(d)}_{1:t−1}, y_{1:t−1}) = ∫ p(y_t | u^{(d)}_{1:t−1}, u_t, y_{1:t−1}) p(u_t | u^{(d)}_{t−1}) du_t    (19)
        = ∫ p(y_t | u^{(d)}_{1:t−1}, u_t, y_{1:t−1}) [p(u_t | u^{(d)}_{t−1}) / g(u_t | û^{(d)}_t, α σ̂^{(d)}_t)] g(u_t | û^{(d)}_t, α σ̂^{(d)}_t) du_t ≈ (1/s) Σ_{e=1}^s w_t(d, e)

3.4 Updating the filtering distribution

Once we have calculated the opinion and credibility of each expert u_{1:t−1}, we evaluate the integral in (5) as a weighted sum over experts. The credibilities of all of the experts are normalized to sum to 1. New experts u_{1:t} (children) are created from the old experts u_{1:t−1} (parents) by appending a pose u_t to the parent's history of poses u_{1:t−1}. Every expert in the new generation is created as follows: One parent is chosen to sire the child. The probability of being chosen is proportional to the parent's credibility. The child's value of u_t is chosen at random from its parent's pose opinion (the weighted samples described in Section 3.2.2).

4 Relation to Optic Flow and Template Matching

In basic template-matching, the same time-invariant texture map is used to track every frame in the video sequence. Optic flow can be thought of as template-matching with a template that is completely reset at each frame for use in the subsequent frame. In most cases, optimal inference under G-flow involves a combination of optic flow-based and template-based tracking, in which the texture template gradually evolves as new images are presented. Pure optic flow and template-matching emerge as special cases.

Optic Flow as a Special Case. Suppose that the pose transition probability p(u_t | u_{t−1}) is uninformative, that the background is uninformative, that every texel in the initial object texture map has equal variance, Var(V_1) = κ I_n, and that the texture transition uncertainty is very high, Ψ^v → diag(∞).
Using (7), (8), and (10), it follows that:

    μ^v_t(i) = ([a^v(u_{t−1})]^T y_{t−1})(i) = y_{t−1}(x_i(u_{t−1})),    (20)

i.e., the object texture map at time t is determined by the pixels from image y_{t−1} that according to pose u_{t−1} were rendered by the object. As a result, (14) reduces to:

    û_t(u_{1:t−1}) = argmin_{u_t} (1/2) Σ_{i=1}^n [y_t(x_i(u_t)) − y_{t−1}(x_i(u_{t−1}))]^2,    (21)

which is identical to (3). Thus constrained optic flow [10, 2, 11] is simply a special case of optimal inference under G-flow, with a single expert and with sampling parameter α → 0. The key assumption that Ψ^v → diag(∞) means that the object's texture is very different in adjacent frames. However, optic flow is typically applied in situations in which the object's texture in adjacent frames is similar. The optimal solution in such situations calls not for optic flow, but for a texture map that integrates information across multiple frames.

Template Matching as a Special Case. Suppose the initial texture map is known precisely, Var(V_1) = 0, and the texture transition uncertainty is very low, Ψ^v → 0. By (7), (8), and (10), it follows that μ^v_t(i) = μ^v_{t−1}(i) = μ^v_1(i), i.e., the texture map does not change over time, but remains fixed at its initial value (it is a texture template). Then (14) becomes:

    û_t(u_{1:t−1}) = argmin_{u_t} (1/2) Σ_{i=1}^n [y_t(x_i(u_t)) − μ^v_1(i)]^2,    (22)

where μ^v_1(i) is the i-th texel of the fixed texture template. This is the error function minimized by standard template-matching algorithms. The key assumption that Ψ^v → 0 means the object's texture is constant from each frame to the next, which is rarely true in real data. G-flow provides a principled way to relax this unrealistic assumption of template methods.

General Case. In general, if the background is uninformative, then minimizing (14) results in a weighted combination of optic flow and template matching, with the weight of each approach depending on the current level of certainty about the object template.
In addition, when there is useful information in the background, G-flow infers a model of the background which is used to improve tracking.

Figure 2: G-flow tracking an outdoor video. Results are shown for frames 1, 81, and 620.

5 Simulations

We collected a video (30 frames/sec) of a subject in an outdoor setting who made a variety of facial expressions while moving her head. A later motion-capture session was used to create a 3D morphable model of her face, consisting of a set of 5 morph bases (k = 5). Twenty experts were initialized randomly near the correct pose on frame 1 of the video and propagated using G-flow inference (assuming an uninformative background). See http://mplab.ucsd.edu for video. Figure 2 shows the distribution of experts for three frames. In each frame, every expert has a hypothesis about the pose (translation, rotation, scale, and morph coefficients). The 38 points in the model are projected into the image according to each expert's pose, yielding 760 red dots in each frame. In each frame, the mean of the experts gives a single hypothesis about the 3D non-rigid deformation of the face (lower right) as well as the rigid pose of the face (rotated 3D axes, lower left). Notice G-flow's ability to recover from error: bad initial hypotheses are weeded out, leaving only good hypotheses. To compare G-flow's performance versus deterministic constrained optic flow algorithms such as [10, 2, 11], we used both G-flow and the method from [2] to track the same video sequence. We ran each tracker several times, introducing small errors in the starting pose.

Figure 3: Average error over time for G-flow (green) and for deterministic optic flow [2] (blue). Results were averaged over 16 runs (deterministic algorithm) or 4 runs (G-flow) and smoothed.

As ground truth, the 2D locations of 6 points were hand-labeled in every 20th frame.
The error at every 20th frame was calculated as the distance from these labeled locations to the inferred (tracked) locations, averaged across several runs. Figure 3 compares this tracking error as a function of time for the deterministic constrained optic flow algorithm and for a 20-expert version of the G-flow tracking algorithm. Notice that the deterministic system has a tendency to drift (increase in error) over time, whereas G-flow can recover from drift.

Acknowledgments

Tim K. Marks was supported by NSF grant IIS-0223052 and NSF grant DGE-0333451 to GWC. John Hershey was supported by the UCDIMI grant D00-10084. J. Cooper Roddey was supported by the Swartz Foundation. Javier R. Movellan was supported by NSF grants IIS-0086107, IIS-0220141, and IIS-0223052, and by the UCDIMI grant D00-10084.

References

[1] Simon Baker and Iain Matthews. Lucas-Kanade 20 years on: A unifying framework. International Journal of Computer Vision, 56(3):221–255, 2002.
[2] M. Brand. Flexible flow for 3D nonrigid tracking and shape recovery. In CVPR, volume 1, pages 315–322, 2001.
[3] H. Chen, P. Kumar, and J. van Schuppen. On Kalman filtering for conditionally Gaussian systems with random matrices. Syst. Contr. Lett., 13:397–404, 1989.
[4] R. Chen and J. Liu. Mixture Kalman filters. J. R. Statist. Soc. B, 62:493–508, 2000.
[5] A. Doucet and C. Andrieu. Particle filtering for partially observed Gaussian state space models. J. R. Statist. Soc. B, 64:827–838, 2002.
[6] A. Doucet, N. de Freitas, K. Murphy, and S. Russell. Rao-Blackwellised particle filtering for dynamic Bayesian networks. In 16th Conference on Uncertainty in AI, pages 176–183, 2000.
[7] A. Doucet, S. J. Godsill, and C. Andrieu. On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing, 10:197–208, 2000.
[8] Zoubin Ghahramani and Geoffrey E. Hinton. Variational learning for switching state-space models. Neural Computation, 12(4):831–864, 2000.
[9] B. Lucas and T. Kanade.
An iterative image registration technique with an application to stereo vision. In Proceedings of the International Joint Conference on Artificial Intelligence, 1981.
[10] L. Torresani, D. Yang, G. Alexander, and C. Bregler. Tracking and modeling non-rigid objects with rank constraints. In CVPR, pages 493–500, 2001.
[11] Lorenzo Torresani, Aaron Hertzmann, and Christoph Bregler. Learning non-rigid 3D shape from 2D motion. In Advances in Neural Information Processing Systems 16. MIT Press, 2004.
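For a single global translation, the objective in (3)/(21) of the excerpt above is the classical Lucas-Kanade problem. A minimal Gauss-Newton sketch for that translation-only case, run on synthetic blob images (this is only the scalar analogue, not the excerpt's morphable-model solver):

```python
import numpy as np

def lk_translation(I0, I1, iters=30):
    """Estimate a global translation d minimizing sum_x [I1(x + d) - I0(x)]^2
    by Gauss-Newton -- the translation-only analogue of eq. (3)."""
    d = np.zeros(2)
    ys, xs = np.mgrid[1:I0.shape[0] - 1, 1:I0.shape[1] - 1]
    for _ in range(iters):
        yy, xx = ys + d[0], xs + d[1]
        y0 = np.clip(np.floor(yy).astype(int), 0, I1.shape[0] - 2)
        x0 = np.clip(np.floor(xx).astype(int), 0, I1.shape[1] - 2)
        wy, wx = yy - y0, xx - x0
        # bilinear warp of I1 by the current translation estimate
        W = (I1[y0, x0] * (1 - wy) * (1 - wx) + I1[y0 + 1, x0] * wy * (1 - wx)
             + I1[y0, x0 + 1] * (1 - wy) * wx + I1[y0 + 1, x0 + 1] * wy * wx)
        r = (W - I0[ys, xs]).ravel()                    # residual, as in (3)
        gy = (I1[y0 + 1, x0] - I1[y0, x0]).ravel()      # crude image gradients
        gx = (I1[y0, x0 + 1] - I1[y0, x0]).ravel()
        J = np.stack([gy, gx], axis=1)
        step, *_ = np.linalg.lstsq(J, -r, rcond=None)   # Gauss-Newton step
        d += step
    return d

yy, xx = np.mgrid[0:50, 0:50]
blob = lambda cy, cx: np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * 6.0 ** 2))
I0, I1 = blob(25, 25), blob(27, 22)   # I1 is I0 shifted by d_true = (2, -3)
d_est = lk_translation(I0, I1)
```

In the full problem the same Gauss-Newton machinery is applied, but the Jacobian is taken with respect to all pose parameters (rotation, translation, scale, morph coefficients) rather than a 2-vector.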

3 0.60048217 60 nips-2004-Efficient Kernel Machines Using the Improved Fast Gauss Transform

Author: Changjiang Yang, Ramani Duraiswami, Larry S. Davis

Abstract: The computation and memory required for kernel machines with N training samples is at least O(N 2 ). Such a complexity is significant even for moderate size problems and is prohibitive for large datasets. We present an approximation technique based on the improved fast Gauss transform to reduce the computation to O(N ). We also give an error bound for the approximation, and provide experimental results on the UCI datasets. 1
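The quantity being approximated is the discrete Gauss transform G(y_j) = Σ_i q_i exp(−((y_j − x_i)/h)²). A 1-D sketch of the core trick (a truncated Hermite expansion about one source-cluster center, so the source-side work O(N·p) decouples from the target-side work O(M·p)); this illustrates the mechanism only, not the paper's full improved transform with space subdivision and error control:

```python
import math
import numpy as np

def direct_gauss_transform(x, q, y, h):
    """G(y_j) = sum_i q_i exp(-((y_j - x_i)/h)^2), computed directly: O(N*M)."""
    return (q[None, :] * np.exp(-(((y[:, None] - x[None, :]) / h) ** 2))).sum(axis=1)

def hermite_gauss_transform(x, q, y, h, c, p=12):
    """Truncated Hermite expansion about a center c:
    exp(-((y-x)/h)^2) = sum_n ((x-c)/h)^n / n! * h_n((y-c)/h),
    with Hermite functions h_n(t) = (-1)^n (d/dt)^n exp(-t^2)."""
    s = (x - c) / h
    t = (y - c) / h
    # source-side moments: O(N*p), independent of the targets
    A = np.array([(q * s ** n).sum() / math.factorial(n) for n in range(p)])
    # target-side Hermite functions via the recurrence
    # h_{n+1}(t) = 2 t h_n(t) - 2 n h_{n-1}(t): O(M*p)
    H = np.empty((p, len(y)))
    H[0] = np.exp(-t ** 2)
    if p > 1:
        H[1] = 2 * t * H[0]
    for n in range(1, p - 1):
        H[n + 1] = 2 * t * H[n] - 2 * n * H[n - 1]
    return A @ H

rng = np.random.default_rng(0)
x = rng.uniform(-0.3, 0.3, size=200)    # sources clustered around c = 0
q = rng.uniform(0.0, 1.0, size=200)
y = np.linspace(-2.0, 2.0, 101)
exact = direct_gauss_transform(x, q, y, h=1.0)
approx = hermite_gauss_transform(x, q, y, h=1.0, c=0.0, p=12)
```

The truncation error shrinks rapidly with p when the sources lie close to the expansion center (|x − c| ≪ h), which is why fast Gauss transforms first partition space into clusters.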

4 0.59988719 178 nips-2004-Support Vector Classification with Input Data Uncertainty

Author: Jinbo Bi, Tong Zhang

Abstract: This paper investigates a new learning model in which the input data is corrupted with noise. We present a general statistical framework to tackle this problem. Based on the statistical reasoning, we propose a novel formulation of support vector classification, which allows uncertainty in input data. We derive an intuitive geometric interpretation of the proposed formulation, and develop algorithms to efficiently solve it. Empirical results are included to show that the newly formed method is superior to the standard SVM for problems with noisy input. 1
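A simple baseline for the noisy-input setting described above (not the paper's formulation, which builds the uncertainty into the SVM constraints themselves) is to average over an assumed noise model by training on noisy replicas of each input. A hinge-loss sketch with a Pegasos-style subgradient solver; all names, defaults, and the toy data are ours:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=30, seed=0):
    """Pegasos-style stochastic subgradient descent on the regularized
    hinge loss (no bias term; the toy classes below are centered)."""
    rng = np.random.default_rng(seed)
    w, t = np.zeros(X.shape[1]), 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            if y[i] * (X[i] @ w) < 1:                    # margin violated
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
            else:
                w = (1 - eta * lam) * w
    return w

def augment_with_noise(X, y, sigma, k=10, seed=1):
    """Replace each uncertain input by k replicas drawn from N(x_i, sigma^2 I)."""
    rng = np.random.default_rng(seed)
    Xa = np.repeat(X, k, axis=0) + rng.normal(scale=sigma, size=(len(X) * k, X.shape[1]))
    return Xa, np.repeat(y, k)

rng = np.random.default_rng(2)
X = np.vstack([rng.normal([-2, 0], 1.0, (100, 2)), rng.normal([2, 0], 1.0, (100, 2))])
y = np.array([-1] * 100 + [1] * 100)
Xa, ya = augment_with_noise(X, y, sigma=0.5)
w = train_linear_svm(Xa, ya)

Xt = np.vstack([rng.normal([-2, 0], 1.0, (100, 2)), rng.normal([2, 0], 1.0, (100, 2))])
yt = np.array([-1] * 100 + [1] * 100)
acc = float((np.sign(Xt @ w) == yt).mean())
```

Replication effectively smooths the hinge loss under the noise distribution; the paper's geometric formulation achieves a related robustness without inflating the training set.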

5 0.59971744 189 nips-2004-The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees

Author: Ofer Dekel, Shai Shalev-shwartz, Yoram Singer

Abstract: Prediction suffix trees (PST) provide a popular and effective tool for tasks such as compression, classification, and language modeling. In this paper we take a decision theoretic view of PSTs for the task of sequence prediction. Generalizing the notion of margin to PSTs, we present an online PST learning algorithm and derive a loss bound for it. The depth of the PST generated by this algorithm scales linearly with the length of the input. We then describe a self-bounded enhancement of our learning algorithm which automatically grows a bounded-depth PST. We also prove an analogous mistake-bound for the self-bounded algorithm. The result is an efficient algorithm that neither relies on a-priori assumptions on the shape or maximal depth of the target PST nor does it require any parameters. To our knowledge, this is the first provably-correct PST learning algorithm which generates a bounded-depth PST while being competitive with any fixed PST determined in hindsight. 1
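The core data structure can be sketched as a suffix-context predictor: store next-symbol counts for every bounded-length context seen in training and predict from the longest matching suffix, backing off to shorter contexts. This shows only the tree/back-off mechanism; the paper's margin-based online updates and self-bounded depth growth are not reproduced:

```python
from collections import defaultdict

class SuffixPredictor:
    """Minimal PST-style predictor: counts of next symbols per context."""

    def __init__(self, max_depth=5):
        self.max_depth = max_depth
        self.counts = defaultdict(lambda: defaultdict(int))

    def fit(self, seq):
        # record, for each position, every context of length 0..max_depth
        for i, sym in enumerate(seq):
            for d in range(self.max_depth + 1):
                if d > i:
                    break
                self.counts[tuple(seq[i - d:i])][sym] += 1
        return self

    def predict(self, history):
        # back off from the longest matching suffix to the empty context
        for d in range(min(self.max_depth, len(history)), -1, -1):
            ctx = tuple(history[len(history) - d:])
            if ctx in self.counts:
                dist = self.counts[ctx]
                return max(dist, key=dist.get)
        return None
```

On a periodic sequence like "abab…", the depth-1 contexts already determine the next symbol exactly; deeper contexts only matter when short ones are ambiguous, which is the case a learned, bounded-depth PST is designed to detect.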

6 0.59678209 16 nips-2004-Adaptive Discriminative Generative Model and Its Applications

7 0.59541726 131 nips-2004-Non-Local Manifold Tangent Learning

8 0.59395552 79 nips-2004-Hierarchical Eigensolver for Transition Matrices in Spectral Methods

9 0.5937326 110 nips-2004-Matrix Exponential Gradient Updates for On-line Learning and Bregman Projection

10 0.59367156 168 nips-2004-Semigroup Kernels on Finite Sets

11 0.59347308 142 nips-2004-Outlier Detection with One-class Kernel Fisher Discriminants

12 0.59320688 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning

13 0.5905087 90 nips-2004-Joint Probabilistic Curve Clustering and Alignment

14 0.5904398 201 nips-2004-Using the Equivalent Kernel to Understand Gaussian Process Regression

15 0.58942807 4 nips-2004-A Generalized Bradley-Terry Model: From Group Competition to Individual Skill

16 0.58906084 9 nips-2004-A Method for Inferring Label Sampling Mechanisms in Semi-Supervised Learning

17 0.58766067 25 nips-2004-Assignment of Multiplicative Mixtures in Natural Images

18 0.58650506 93 nips-2004-Kernel Projection Machine: a New Tool for Pattern Recognition

19 0.58629215 31 nips-2004-Blind One-microphone Speech Separation: A Spectral Learning Approach

20 0.58558881 26 nips-2004-At the Edge of Chaos: Real-time Computations and Self-Organized Criticality in Recurrent Neural Networks