nips nips2002 nips2002-173 knowledge-graph by maker-knowledge-mining

173 nips-2002-Recovering Intrinsic Images from a Single Image


Source: pdf

Author: Marshall F. Tappen, William T. Freeman, Edward H. Adelson

Abstract: We present an algorithm that uses multiple cues to recover shading and reflectance intrinsic images from a single image. Using both color information and a classifier trained to recognize gray-scale patterns, each image derivative is classified as being caused by shading or a change in the surface’s reflectance. Generalized Belief Propagation is then used to propagate information from areas where the correct classification is clear to areas where it is ambiguous. We also show results on real images.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: We present an algorithm that uses multiple cues to recover shading and reflectance intrinsic images from a single image. [sent-7, score-0.768]

2 Using both color information and a classifier trained to recognize gray-scale patterns, each image derivative is classified as being caused by shading or a change in the surface’s reflectance. [sent-8, score-1.251]

3 Generalized Belief Propagation is then used to propagate information from areas where the correct classification is clear to areas where it is ambiguous. [sent-9, score-0.181]

4 1 Introduction: Every image is the product of the characteristics of a scene. [sent-11, score-0.292]

5 Two of the most important characteristics of the scene are its shading and reflectance. [sent-12, score-0.585]

6 The shading of a scene is the interaction of the surfaces in the scene and the illumination. [sent-13, score-0.703]

7 Finding the reflectance of each point in the scene, and how each point is shaded, is important because interpreting an image requires deciding how these two factors affect it. [sent-15, score-0.369]

8 For example, the geometry of an object in the scene cannot be recovered without being able to isolate the shading of every point. [sent-16, score-0.569]

9 In this work, we present a system that finds the shading and reflectance of each point in a scene by decomposing an input image into two images: one containing the shading at each point and another containing the reflectance at each point. [sent-18, score-1.723]

10 These two images are types of a representation known as intrinsic images [1] because each image contains one intrinsic characteristic of the scene. [sent-19, score-0.682]

11 Most prior algorithms for finding shading and reflectance images can be broadly classified as generative or discriminative approaches. [sent-20, score-0.639]

12 In contrast, discriminative approaches attempt to differentiate between changes in the image caused by shading and those caused by a reflectance change. [sent-23, score-1.095]

13 Early algorithms, such as Retinex [8], were based on simple assumptions, for example that the gradients along reflectance changes have much larger magnitudes than those caused by shading. [sent-24, score-0.195]

14 That assumption does not hold for many real images, so recent algorithms have used more complex statistics to separate shading and reflectance. [sent-25, score-0.496]

15 Bell and Freeman [2] trained a classifier to use local image information to classify steerable pyramid coefficients as being due to shading or reflectance. [sent-26, score-0.926]

16 Using steerable pyramid coefficients allowed the algorithm to classify edges at multiple orientations and scales. [sent-27, score-0.151]

17 Without classifying the low-frequency residual, only band-pass filtered copies of the shading and reflectance images can be recovered. [sent-29, score-0.651]

18 In a different direction, Weiss [13] proposed using multiple images where the reflectance is constant, but the illumination changes. [sent-31, score-0.159]

19 This approach was able to create full-frequency images, but required multiple input images of a fixed scene. [sent-32, score-0.178]

20 In this work, we present a system which uses multiple cues to recover full-frequency shading and reflectance intrinsic images from a single image. [sent-33, score-0.786]

21 Our approach is discriminative, using both a classifier based on color information in the image and a classifier trained to recognize local image patterns to distinguish derivatives caused by reflectance changes from derivatives caused by shading. [sent-34, score-1.411]

22 We also address the problem of ambiguous local evidence by using a Markov Random Field to propagate the classifications of those areas where the evidence is clear into ambiguous areas of the image. [sent-35, score-0.348]

23 2 Separating Shading and Reflectance: Our algorithm decomposes an image into shading and reflectance images by classifying each image derivative as being caused by shading or a reflectance change. [sent-36, score-1.892]

24 We assume that the input image, I(x, y), can be expressed as the product of the shading image, S(x, y), and the reflectance image, R(x, y). [sent-37, score-0.511]

25 Considering the images in the log domain, the derivatives of the input image are the sum of the derivatives of the shading and reflectance images. [sent-38, score-1.158]
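To make this log-domain relationship concrete, here is a minimal numpy sketch (not from the paper; the synthetic shading and reflectance images are illustrative assumptions):

import numpy as np

# Synthetic shading and reflectance images (placeholders for illustration only).
rng = np.random.default_rng(0)
S = np.exp(0.1 * rng.normal(size=(64, 64)))          # positive shading values
R = np.where(rng.random((64, 64)) > 0.5, 1.0, 2.0)   # piecewise reflectance values
I = S * R                                            # observed image: product of the two

# In the log domain the product becomes a sum, so every image derivative is the
# sum of a shading derivative and a reflectance derivative.
log_I, log_S, log_R = np.log(I), np.log(S), np.log(R)
dx = lambda img: np.diff(img, axis=1)                # horizontal forward differences
assert np.allclose(dx(log_I), dx(log_S) + dx(log_R))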

26 It is unlikely that significant shading boundaries and reflectance edges occur at the same point, so we make the simplifying assumption that every image derivative is caused by either shading or a reflectance change. [sent-39, score-1.478]

27 This reduces the problem of specifying the shading and reflectance derivatives to that of binary classification of the image’s x and y derivatives. [sent-40, score-0.621]

28 Labelling each x and y derivative produces estimates of the derivatives of the shading and reflectance images. [sent-41, score-0.686]

29 Each derivative represents a set of linear constraints on the image, and using both derivative images results in an over-constrained system. [sent-42, score-0.527]

30 We recover each intrinsic image from its derivatives by using the method introduced by Weiss in [13] to find the pseudo-inverse of the over-constrained system of derivatives. [sent-43, score-0.52]
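The paper recovers each intrinsic image with the pseudo-inverse construction of Weiss [13]; as a rough stand-in, the sketch below solves the same over-constrained derivative system by sparse least squares. The function names and the use of scipy's lsqr are assumptions for illustration, not the authors' implementation:

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def derivative_operators(h, w):
    """Sparse forward-difference operators Dx, Dy for an h-by-w image (row-major)."""
    n = h * w
    idx = np.arange(n).reshape(h, w)
    rows = np.repeat(np.arange(h * (w - 1)), 2)
    cols = np.stack([idx[:, :-1].ravel(), idx[:, 1:].ravel()], axis=1).ravel()
    vals = np.tile([-1.0, 1.0], h * (w - 1))
    Dx = sp.csr_matrix((vals, (rows, cols)), shape=(h * (w - 1), n))
    rows = np.repeat(np.arange((h - 1) * w), 2)
    cols = np.stack([idx[:-1, :].ravel(), idx[1:, :].ravel()], axis=1).ravel()
    vals = np.tile([-1.0, 1.0], (h - 1) * w)
    Dy = sp.csr_matrix((vals, (rows, cols)), shape=((h - 1) * w, n))
    return Dx, Dy

def recover_from_derivatives(sx, sy):
    """Least-squares recovery of a log image whose x and y derivatives are sx, sy.

    sx has shape (h, w-1) and sy has shape (h-1, w); the labelled shading (or
    reflectance) derivatives would be passed in here.  The result is defined
    only up to an additive constant in the log domain.
    """
    h, w = sx.shape[0], sx.shape[1] + 1
    Dx, Dy = derivative_operators(h, w)
    A = sp.vstack([Dx, Dy])
    b = np.concatenate([sx.ravel(), sy.ravel()])
    return spla.lsqr(A, b)[0].reshape(h, w)

In the full system, the labelled x and y derivatives of the shading image would be passed to recover_from_derivatives, and likewise for the reflectance image; exponentiating the results takes them back out of the log domain.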

31 3 Classifying Derivatives: With an architecture for recovering intrinsic images, the next step is to create the classifiers to separate the underlying processes in the image. [sent-46, score-0.159]

32 Our system uses two classifiers: one that uses color information to separate shading and reflectance derivatives, and a second that uses local image patterns to classify each derivative. [sent-47, score-1.25]

33 (Figure 1: Example computed using only color information to classify derivatives; panels: original image, shape image, reflectance image.) [sent-48, score-0.276]

34 To facilitate printing, the intrinsic images have been computed from a gray-scale version of the image. [sent-49, score-0.203]

35 The color information is used solely for classifying derivatives in the gray-scale copy of the image. [sent-50, score-0.418]

36 3.1 Using Color Information: Our system takes advantage of the property that changes in color between pixels indicate a reflectance change [10]. [sent-52, score-0.352]

37 When surfaces are diffuse, any changes in a color image due to shading should affect all three color channels proportionally. [sent-53, score-1.346]

38 Assume two adjacent pixels in the image have values c1 and c2 , where c1 and c2 are RGB triplets. [sent-54, score-0.294]

39 If the change between the two pixels is caused by shading, then only the intensity of the color changes and c2 = αc1 for some scalar α. [sent-55, score-0.482]

40 If c2 ≠ αc1, the chromaticity of the colors has changed and the color change must have been caused by a reflectance change. [sent-56, score-0.462]

41 A chromaticity change in the image indicates that the reflectance must have changed at that point. [sent-57, score-0.355]

42 When the change is caused by shading, the normalized colors ĉ1 and ĉ2 satisfy (ĉ1 · ĉ2) = 1. [sent-60, score-0.165]

43 Using only the color information, this approach is similar to that used in [6]. [sent-62, score-0.234]

44 The primary difference is that our system classifies the vertical and horizontal derivatives independently. [sent-63, score-0.16]
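A small sketch of the chromaticity test described above (sentences 38-42); the 0.999 decision threshold is an illustrative assumption, not a value from the paper:

import numpy as np

def color_classify(c1, c2, thresh=0.999):
    """Classify the change between two adjacent RGB pixels c1 and c2.

    A pure shading change scales the color (c2 ≈ α·c1), so the normalized
    colors have a dot product of (almost) 1; anything else is treated as a
    reflectance change.
    """
    c1_hat = np.asarray(c1, float) / np.linalg.norm(c1)
    c2_hat = np.asarray(c2, float) / np.linalg.norm(c2)
    return "shading" if float(np.dot(c1_hat, c2_hat)) >= thresh else "reflectance"

print(color_classify([100, 80, 60], [50, 40, 30]))    # scaled copy -> shading
print(color_classify([100, 80, 60], [60, 80, 100]))   # chromaticity change -> reflectance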

45 3.2 Using Gray-Scale Information: While color information is useful, it is not sufficient to properly decompose images. [sent-68, score-0.234]

46 A change in color intensity could be caused by either shading or a reflectance change. [sent-69, score-0.915]

47 With only local color information, color intensity changes cannot be classified properly. [sent-70, score-0.572]

48 Fortunately, shading patterns have a unique appearance which can be discriminated from most common reflectance patterns. [sent-71, score-0.532]

49 This allows us to use the local gray-scale image pattern surrounding a derivative to classify it. [sent-72, score-0.422]

50 The filter, w, is the same size as the image patch, Ip, and we only consider the response at the center of Ip. [sent-76, score-0.291]

51 This makes the feature a function from a patch of image data to a scalar response. [sent-77, score-0.301]
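A minimal sketch of such a feature, assuming the filter and patch are already extracted; the function name is a placeholder, and the paper then builds non-linear features on top of responses like this:

import numpy as np

def center_response(patch, w):
    """Scalar response of a linear filter w that is the same size as the patch.

    Keeping only the center of the correlation of w with the patch reduces to
    an inner product between the filter and the patch.
    """
    patch = np.asarray(patch, dtype=float)
    w = np.asarray(w, dtype=float)
    assert patch.shape == w.shape
    return float(np.sum(patch * w))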

52 We use the responses of linear filters. (Figure 2: Example images from the training set.) [sent-79, score-0.14]

53 The first two are examples of reflectance changes and the last three are examples of shading. (Figure 3: Results obtained using the gray-scale classifier; (a) original image, (b) shading image, (c) reflectance image.) [sent-80, score-0.575]

54 The non-linear filters are used to classify derivatives with a classifier similar to that used by Tieu and Viola in [12]. [sent-82, score-0.167]
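As a rough stand-in for that boosted classifier, the sketch below uses off-the-shelf AdaBoost over decision stumps; the random feature matrix and labels are placeholders, and nothing here reproduces the actual filters or training data of the paper:

import numpy as np
from sklearn.ensemble import AdaBoostClassifier

# Placeholder data: one row per derivative, one column per (non-linear) filter
# response; labels are 0 = reflectance change, 1 = shading.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 40))
y = rng.integers(0, 2, size=500)

# AdaBoost over decision stumps (the default base learner), loosely in the
# spirit of the Tieu and Viola style classifier cited above.
clf = AdaBoostClassifier(n_estimators=100)
clf.fit(X, y)
p_shading = clf.predict_proba(X)[:, 1]   # per-derivative probability of "shading"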

55 The training set consists of a mix of images of rendered fractal surfaces and images of shaded ellipses placed randomly in the image. [sent-88, score-0.363]

56 Examples of reflectance changes were created using images of random lines and images of random ellipses painted onto the image. [sent-89, score-0.338]

57 When evaluating test images, the classifier will assume that the test image is also lit from the right. [sent-92, score-0.276]

58 The results can be evaluated by thinking of the shading image as how the scene should appear if it were made entirely of gray plastic. [sent-94, score-0.845]

59 The reflectance image should appear very flat, with the three-dimensional depth cues placed in the shading image. [sent-95, score-0.824]

60 Our system performs well on the image shown in Figure 3. [sent-96, score-0.294]

61 The shading image has a very uniform appearance, with almost all of the effects of the reflectance changes placed in the reflectance image. [sent-97, score-0.838]

62 The examples shown are computed without taking the log of the input image before processing it. [sent-98, score-0.308]

63 The input images are uncalibrated, and ordinary photographic tone scale is very similar to a log transformation. [sent-99, score-0.157]

64 Errors from not taking the log of the input image first would ... (Figure 4: An example where propagation is needed; panels (a)-(d).) [sent-100, score-0.344]

65 The smile from the pillow image in (a) has been enlarged in (b). [sent-101, score-0.34]

66 Figures (c) and (d) contain an example of shading and a reflectance change, respectively. [sent-102, score-0.496]

67 Locally, the center of the mouth in (b) is as similar to the shading example in (c) as it is to the example reflectance change in (d). [sent-103, score-0.607]

68 The correct classification is found by combining the local evidence from the color and gray-scale classifiers, then using Generalized Belief Propagation to propagate local evidence. [sent-105, score-0.407]

69 ... cause one intrinsic image to modulate the local brightness of the other. [sent-106, score-0.397]

70 4 Propagating Evidence: While the classifier works well, there are still areas in the image where the local information is ambiguous. [sent-108, score-0.374]

71 When compared to the example shading and reflectance change in Figure 4(c) and 4(d), the center of the mouth in Figure 4(b) is equally well classified with either label. [sent-110, score-0.607]

72 However, the corners of the mouth can be classified as being caused by a reflectance change with little ambiguity. [sent-111, score-0.243]

73 Since the derivatives in the corner of the mouth and the center all lie on the same image contour, they should have the same classification. [sent-112, score-0.475]

74 A mechanism is needed to propagate information from the corners of the mouth, where the classification is clear, into areas where the local evidence is ambiguous. [sent-113, score-0.212]

75 This will allow areas where the classification is clear to disambiguate those areas where it is not. [sent-114, score-0.133]

76 In order to propagate evidence, we treat each derivative as a node in a Markov Random Field with two possible states, indicating whether the derivative is caused by shading or caused by a reflectance change. [sent-115, score-0.951]

77 Setting the compatibility functions between nodes correctly will force nodes along the same contour to have the same classification. [sent-116, score-0.22]

78 Since derivatives along an image contour should have the same classification, β should be close to 1 when two neighboring derivatives are along a contour and close to 0.5 (imposing no constraint) otherwise. [sent-121, score-0.727]

79 Since β depends on the image at each point, we express it as β(Ixy ), where Ixy is the image information at some point. [sent-123, score-0.552]

80 To ensure that β(Ixy) lies between 0 and 1, it is modelled as β(Ixy) = g(z(Ixy)), where g(·) is the logistic function and z(Ixy) has a large response along image contours. [sent-124, score-0.298]
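One natural way to turn β(Ixy) into a pairwise compatibility, consistent with the description above, is the 2x2 matrix below; treat the exact matrix form as an assumption rather than a transcription of the paper:

import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def compatibility(z_xy):
    """Pairwise compatibility psi(x_i, x_j) for two neighboring derivative labels.

    beta = logistic(z_xy) weights the configurations where the neighbors take
    the same label (both shading or both reflectance); 1 - beta weights the
    disagreeing configurations.
    """
    beta = logistic(z_xy)
    return np.array([[beta, 1.0 - beta],
                     [1.0 - beta, beta]])

print(compatibility(4.0))   # strong contour evidence: neighbors pushed to agree
print(compatibility(0.0))   # beta = 0.5: no constraint between the neighbors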

81 4.2 Learning the Potential Functions: The function z(Ixy) is based on two local image features, the magnitude of the image gradient and the difference in orientation between the gradient and the orientation of the graph edge. [sent-126, score-0.693]

82 These features reflect our heuristic that derivatives along an image contour should have the same classification. [sent-127, score-0.491]

83 The difference in orientation between a horizontal graph edge and the image contour, φ̂, is found from the orientation of the image gradient, φ. [sent-128, score-0.659]

84 Assuming that −π/2 ≤ φ ≤ π/2, the angle between a horizontal edge and the image gradient, φ̂, is φ̂ = |φ|. [sent-129, score-0.338]

85 The function z(·), which relates the image features to ψ(·), is chosen to be linear and is found by maximizing equation 5 over a set of training images similar to those used to train the local classifier. [sent-135, score-0.455]

86 In equation 6, |∇I| is the magnitude of the image gradient, and both φ̂ and |∇I| have been normalized to be between 0 and 1. [sent-140, score-0.316]

87 β(Ixy) is set to 0.5 for regions of the image with a gradient magnitude below a small threshold. [sent-142, score-0.316]

88 The values in equation 6 correspond with our expected results; two derivatives are constrained to have the same value when they are along an edge in the image that has a similar orientation to the edge in the MRF connecting the two nodes. [sent-148, score-0.51]
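A sketch of the shape of z(·) for a horizontal MRF edge; the weights and bias are placeholders standing in for the learned values of equation 6, which are not reproduced here:

import numpy as np

def z_horizontal_edge(grad_mag, phi, weights=(1.0, 1.0), bias=0.0):
    """Linear z(Ixy) for a horizontal graph edge (placeholder coefficients).

    grad_mag : image gradient magnitude, assumed normalized to [0, 1]
    phi      : gradient orientation in [-pi/2, pi/2]; phi_hat = |phi| is the
               angle between a horizontal edge and the gradient, also
               normalized to [0, 1] before weighting.
    """
    phi_hat = np.abs(phi) / (np.pi / 2)
    return weights[0] * grad_mag + weights[1] * phi_hat + bias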

89 The local evidence for each node in the MRF is obtained from the results of the color classifier and from the gray-scale classifier by assuming that the two are statistically independent. [sent-151, score-0.341]
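A minimal sketch of that independence-based fusion; the inputs are assumed to be each classifier's estimated probability that a derivative is caused by shading:

import numpy as np

def combined_local_evidence(p_shading_color, p_shading_gray):
    """Fuse the two classifier outputs assuming they are statistically independent."""
    shading = p_shading_color * p_shading_gray
    reflectance = (1.0 - p_shading_color) * (1.0 - p_shading_gray)
    evidence = np.array([shading, reflectance])
    return evidence / evidence.sum()

print(combined_local_evidence(0.9, 0.6))   # color evidence dominates an ambiguous gray-scale cue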

90 It is necessary to use the color information because propagation cannot help in areas where the gray-scale classifier misses an edge altogether. [sent-152, score-0.374]

91 In Figure 5, the cheek patches on the pillow, which are pink in the color image, are missed by the gray-scale classifier, but caught by the color classifier. [sent-153, score-0.468]

92 For the results shown, we used the results of the AdaBoost classifier to classify the gray-scale images and used the method suggested by Friedman et al. [sent-154, score-0.163]

93 We used the Generalized Belief Propagation algorithm [14] to infer the best label of each node in the MRF because ordinary Belief Propagation performed poorly in areas with both weak local evidence and strong compatibility constraints. [sent-156, score-0.263]
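For orientation only, here is a bare-bones ordinary loopy BP sketch on a grid of binary nodes; it shows the message-update structure but deliberately simplifies (a single shared compatibility matrix, no node clustering), and the paper uses Generalized BP [14] precisely because this simpler variant can fail with weak local evidence and strong compatibilities:

import numpy as np

def loopy_bp_grid(evidence, psi, n_iters=50):
    """Ordinary sum-product loopy BP on a 4-connected grid of binary nodes.

    evidence : array (H, W, 2) of local evidence for each node's two labels
    psi      : array (2, 2) pairwise compatibility shared by every edge
               (a simplification; in the paper psi varies with beta(Ixy))
    Returns approximate marginal beliefs of shape (H, W, 2).
    """
    H, W, K = evidence.shape
    # msgs[d] holds the message arriving at each node from direction d:
    # 0 = from above, 1 = from below, 2 = from left, 3 = from right.
    msgs = np.ones((4, H, W, K))

    def normalized(m):
        return m / m.sum(axis=-1, keepdims=True)

    for _ in range(n_iters):
        belief = evidence * msgs.prod(axis=0)
        new = np.ones_like(msgs)
        # Outgoing messages exclude the message that arrived from the target node.
        down = normalized(np.einsum('hwk,kl->hwl', belief / msgs[1], psi))
        up = normalized(np.einsum('hwk,kl->hwl', belief / msgs[0], psi))
        right = normalized(np.einsum('hwk,kl->hwl', belief / msgs[3], psi))
        left = normalized(np.einsum('hwk,kl->hwl', belief / msgs[2], psi))
        new[0, 1:, :] = down[:-1, :]     # received from above
        new[1, :-1, :] = up[1:, :]       # received from below
        new[2, :, 1:] = right[:, :-1]    # received from the left
        new[3, :, :-1] = left[:, 1:]     # received from the right
        msgs = new
    return normalized(evidence * msgs.prod(axis=0))

# Tiny usage example with random evidence and an agreement-favoring psi.
beliefs = loopy_bp_grid(np.random.default_rng(0).random((8, 8, 2)) + 0.1,
                        np.array([[0.9, 0.1], [0.1, 0.9]]))
labels = beliefs.argmax(axis=-1)   # 0/1 label per node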

94 The ripples on the pillow are correctly identified as being caused by shading, while the face is correctly identified as having been painted on. [sent-158, score-0.301]

95 In a second example, shown in Figure 6, the algorithm correctly identifies the change in reflectance between the sweatshirt and the jersey and correctly identifies the folds in the clothing as being caused by shading. [sent-159, score-0.255]

96 There are some small shading artifacts in the reflectance image, especially around the sleeves of the sweatshirt, presumably caused by particular shapes not present in the training set. [sent-160, score-0.643]

97 5 Discussion: We have presented a system that is able to use multiple cues to produce shading and reflectance intrinsic images from a single image. [sent-162, score-0.767]

98 The most computationally intense steps for recovering the shading and reflectance images are computing the local evidence, which takes about six minutes on a 700MHz Pentium for a 256 × 256 image, and running the Generalized Belief Propagation algorithm. [sent-164, score-0.727]

99 Belief propagation was used on both the x and y derivative images and took around 6 minutes to run 200 iterations on each image. [sent-165, score-0.271]

100 Color vision and image intensities: When are changes material? [sent-251, score-0.362]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('ectance', 0.6), ('shading', 0.496), ('image', 0.276), ('color', 0.234), ('re', 0.164), ('classi', 0.154), ('ixy', 0.143), ('caused', 0.128), ('derivatives', 0.125), ('images', 0.121), ('mrf', 0.083), ('intrinsic', 0.082), ('er', 0.073), ('scene', 0.073), ('fx', 0.071), ('contour', 0.068), ('derivative', 0.065), ('pillow', 0.064), ('surfaces', 0.061), ('mouth', 0.059), ('fy', 0.059), ('areas', 0.059), ('recovering', 0.054), ('propagation', 0.053), ('belief', 0.052), ('painted', 0.051), ('propagate', 0.048), ('evidence', 0.047), ('compatibility', 0.047), ('changes', 0.045), ('classify', 0.042), ('chromaticity', 0.042), ('vision', 0.041), ('local', 0.039), ('pyramid', 0.038), ('change', 0.037), ('steerable', 0.035), ('freeman', 0.035), ('classifying', 0.034), ('adaboost', 0.034), ('retinex', 0.032), ('sweatshirt', 0.032), ('orientation', 0.031), ('cues', 0.031), ('ip', 0.029), ('correctly', 0.029), ('weak', 0.029), ('edge', 0.028), ('ers', 0.028), ('tieu', 0.028), ('nodes', 0.027), ('patch', 0.025), ('copy', 0.025), ('rgb', 0.025), ('adelson', 0.025), ('lter', 0.025), ('gradient', 0.025), ('lters', 0.024), ('create', 0.023), ('along', 0.022), ('labelling', 0.022), ('discriminative', 0.022), ('colors', 0.021), ('ordinary', 0.021), ('neighboring', 0.021), ('placed', 0.021), ('node', 0.021), ('intensity', 0.02), ('patterns', 0.02), ('shaded', 0.02), ('surface', 0.02), ('generalized', 0.019), ('illumination', 0.019), ('corners', 0.019), ('weiss', 0.019), ('recover', 0.019), ('training', 0.019), ('multiple', 0.019), ('xj', 0.019), ('system', 0.018), ('pixels', 0.018), ('identi', 0.018), ('ambiguous', 0.017), ('examples', 0.017), ('minutes', 0.017), ('angle', 0.017), ('convolution', 0.017), ('horizontal', 0.017), ('edges', 0.017), ('characteristics', 0.016), ('bell', 0.016), ('appearance', 0.016), ('field', 0.016), ('friedman', 0.016), ('recognize', 0.015), ('took', 0.015), ('input', 0.015), ('magnitude', 0.015), ('center', 0.015), ('clear', 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 173 nips-2002-Recovering Intrinsic Images from a Single Image

Author: Marshall F. Tappen, William T. Freeman, Edward H. Adelson

Abstract: We present an algorithm that uses multiple cues to recover shading and reflectance intrinsic images from a single image. Using both color information and a classifier trained to recognize gray-scale patterns, each image derivative is classified as being caused by shading or a change in the surface’s reflectance. Generalized Belief Propagation is then used to propagate information from areas where the correct classification is clear to areas where it is ambiguous. We also show results on real images.

2 0.31979755 202 nips-2002-Unsupervised Color Constancy

Author: Kinh Tieu, Erik G. Miller

Abstract: In [1] we introduced a linear statistical model of joint color changes in images due to variation in lighting and certain non-geometric camera parameters. We did this by measuring the mappings of colors in one image of a scene to colors in another image of the same scene under different lighting conditions. Here we increase the flexibility of this color flow model by allowing flow coefficients to vary according to a low order polynomial over the image. This allows us to better fit smoothly varying lighting conditions as well as curved surfaces without endowing our model with too much capacity. We show results on image matching and shadow removal and detection.

3 0.1796267 10 nips-2002-A Model for Learning Variance Components of Natural Images

Author: Yan Karklin, Michael S. Lewicki

Abstract: We present a hierarchical Bayesian model for learning efficient codes of higher-order structure in natural images. The model, a non-linear generalization of independent component analysis, replaces the standard assumption of independence for the joint distribution of coefficients with a distribution that is adapted to the variance structure of the coefficients of an efficient image basis. This offers a novel description of higherorder image structure and provides a way to learn coarse-coded, sparsedistributed representations of abstract image properties such as object location, scale, and texture.

4 0.16581564 105 nips-2002-How to Combine Color and Shape Information for 3D Object Recognition: Kernels do the Trick

Author: B. Caputo, Gy. Dorkó

Abstract: This paper presents a kernel method that allows to combine color and shape information for appearance-based object recognition. It doesn't require to define a new common representation, but use the power of kernels to combine different representations together in an effective manner. These results are achieved using results of statistical mechanics of spin glasses combined with Markov random fields via kernel functions. Experiments show an increase in recognition rate up to 5.92% with respect to conventional strategies. 1

5 0.15442429 39 nips-2002-Bayesian Image Super-Resolution

Author: Michael E. Tipping, Christopher M. Bishop

Abstract: The extraction of a single high-quality image from a set of lowresolution images is an important problem which arises in fields such as remote sensing, surveillance, medical imaging and the extraction of still images from video. Typical approaches are based on the use of cross-correlation to register the images followed by the inversion of the transformation from the unknown high resolution image to the observed low resolution images, using regularization to resolve the ill-posed nature of the inversion process. In this paper we develop a Bayesian treatment of the super-resolution problem in which the likelihood function for the image registration parameters is based on a marginalization over the unknown high-resolution image. This approach allows us to estimate the unknown point spread function, and is rendered tractable through the introduction of a Gaussian process prior over images. Results indicate a significant improvement over techniques based on MAP (maximum a-posteriori) point optimization of the high resolution image and associated registration parameters. 1

6 0.14436685 132 nips-2002-Learning to Detect Natural Image Boundaries Using Brightness and Texture

7 0.14145054 182 nips-2002-Shape Recipes: Scene Representations that Refer to the Image

8 0.1062542 92 nips-2002-FloatBoost Learning for Classification

9 0.10620281 74 nips-2002-Dynamic Structure Super-Resolution

10 0.10332717 133 nips-2002-Learning to Perceive Transparency from the Statistics of Natural Scenes

11 0.10228158 148 nips-2002-Morton-Style Factorial Coding of Color in Primary Visual Cortex

12 0.093969837 57 nips-2002-Concurrent Object Recognition and Segmentation by Graph Partitioning

13 0.089706905 126 nips-2002-Learning Sparse Multiscale Image Representations

14 0.084009193 59 nips-2002-Constraint Classification for Multiclass Classification and Ranking

15 0.081602916 2 nips-2002-A Bilinear Model for Sparse Coding

16 0.077840835 88 nips-2002-Feature Selection and Classification on Matrix Data: From Large Margins to Small Covering Numbers

17 0.072823539 45 nips-2002-Boosted Dyadic Kernel Discriminants

18 0.071733199 127 nips-2002-Learning Sparse Topographic Representations with Products of Student-t Distributions

19 0.065010734 24 nips-2002-Adaptive Scaling for Feature Selection in SVMs

20 0.064343087 65 nips-2002-Derivative Observations in Gaussian Process Models of Dynamic Systems


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.181), (1, -0.014), (2, 0.024), (3, 0.316), (4, 0.13), (5, -0.081), (6, 0.233), (7, -0.012), (8, -0.078), (9, -0.07), (10, -0.133), (11, 0.014), (12, -0.051), (13, -0.071), (14, 0.037), (15, -0.137), (16, -0.209), (17, -0.017), (18, -0.033), (19, -0.041), (20, 0.076), (21, 0.018), (22, -0.029), (23, 0.041), (24, 0.056), (25, 0.034), (26, -0.026), (27, 0.032), (28, 0.061), (29, -0.083), (30, -0.028), (31, 0.006), (32, 0.132), (33, -0.033), (34, 0.084), (35, 0.007), (36, 0.014), (37, 0.007), (38, -0.045), (39, 0.089), (40, 0.08), (41, -0.024), (42, -0.037), (43, 0.049), (44, 0.019), (45, 0.018), (46, -0.031), (47, 0.002), (48, -0.03), (49, 0.013)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95454443 173 nips-2002-Recovering Intrinsic Images from a Single Image

Author: Marshall F. Tappen, William T. Freeman, Edward H. Adelson

Abstract: We present an algorithm that uses multiple cues to recover shading and reflectance intrinsic images from a single image. Using both color information and a classifier trained to recognize gray-scale patterns, each image derivative is classified as being caused by shading or a change in the surface’s reflectance. Generalized Belief Propagation is then used to propagate information from areas where the correct classification is clear to areas where it is ambiguous. We also show results on real images.

2 0.86598682 202 nips-2002-Unsupervised Color Constancy

Author: Kinh Tieu, Erik G. Miller

Abstract: In [1] we introduced a linear statistical model of joint color changes in images due to variation in lighting and certain non-geometric camera parameters. We did this by measuring the mappings of colors in one image of a scene to colors in another image of the same scene under different lighting conditions. Here we increase the flexibility of this color flow model by allowing flow coefficients to vary according to a low order polynomial over the image. This allows us to better fit smoothly varying lighting conditions as well as curved surfaces without endowing our model with too much capacity. We show results on image matching and shadow removal and detection.

3 0.74247932 182 nips-2002-Shape Recipes: Scene Representations that Refer to the Image

Author: William T. Freeman, Antonio Torralba

Abstract: The goal of low-level vision is to estimate an underlying scene, given an observed image. Real-world scenes (eg, albedos or shapes) can be very complex, conventionally requiring high dimensional representations which are hard to estimate and store. We propose a low-dimensional representation, called a scene recipe, that relies on the image itself to describe the complex scene configurations. Shape recipes are an example: these are the regression coefficients that predict the bandpassed shape from image data. We describe the benefits of this representation, and show two uses illustrating their properties: (1) we improve stereo shape estimates by learning shape recipes at low resolution and applying them at full resolution; (2) Shape recipes implicitly contain information about lighting and materials and we use them for material segmentation.

4 0.67567396 105 nips-2002-How to Combine Color and Shape Information for 3D Object Recognition: Kernels do the Trick

Author: B. Caputo, Gy. Dorkó

Abstract: This paper presents a kernel method that allows to combine color and shape information for appearance-based object recognition. It doesn't require to define a new common representation, but use the power of kernels to combine different representations together in an effective manner. These results are achieved using results of statistical mechanics of spin glasses combined with Markov random fields via kernel functions. Experiments show an increase in recognition rate up to 5.92% with respect to conventional strategies. 1

5 0.55278677 132 nips-2002-Learning to Detect Natural Image Boundaries Using Brightness and Texture

Author: David R. Martin, Charless C. Fowlkes, Jitendra Malik

Abstract: The goal of this work is to accurately detect and localize boundaries in natural scenes using local image measurements. We formulate features that respond to characteristic changes in brightness and texture associated with natural boundaries. In order to combine the information from these features in an optimal way, a classifier is trained using human labeled images as ground truth. We present precision-recall curves showing that the resulting detector outperforms existing approaches.

6 0.52005148 133 nips-2002-Learning to Perceive Transparency from the Statistics of Natural Scenes

7 0.49784863 39 nips-2002-Bayesian Image Super-Resolution

8 0.44284117 126 nips-2002-Learning Sparse Multiscale Image Representations

9 0.41237083 10 nips-2002-A Model for Learning Variance Components of Natural Images

10 0.41025212 74 nips-2002-Dynamic Structure Super-Resolution

11 0.34765124 57 nips-2002-Concurrent Object Recognition and Segmentation by Graph Partitioning

12 0.34701225 196 nips-2002-The RA Scanner: Prediction of Rheumatoid Joint Inflammation Based on Laser Imaging

13 0.33917403 92 nips-2002-FloatBoost Learning for Classification

14 0.33066535 2 nips-2002-A Bilinear Model for Sparse Coding

15 0.3298761 59 nips-2002-Constraint Classification for Multiclass Classification and Ranking

16 0.32634932 148 nips-2002-Morton-Style Factorial Coding of Color in Primary Visual Cortex

17 0.31430683 127 nips-2002-Learning Sparse Topographic Representations with Products of Student-t Distributions

18 0.3047846 55 nips-2002-Combining Features for BCI

19 0.30275539 45 nips-2002-Boosted Dyadic Kernel Discriminants

20 0.30054936 150 nips-2002-Multiple Cause Vector Quantization


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.012), (11, 0.011), (14, 0.012), (23, 0.024), (42, 0.055), (45, 0.276), (54, 0.115), (55, 0.021), (57, 0.021), (64, 0.014), (67, 0.013), (68, 0.054), (74, 0.138), (92, 0.018), (98, 0.107)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.79482162 173 nips-2002-Recovering Intrinsic Images from a Single Image

Author: Marshall F. Tappen, William T. Freeman, Edward H. Adelson

Abstract: We present an algorithm that uses multiple cues to recover shading and reflectance intrinsic images from a single image. Using both color information and a classifier trained to recognize gray-scale patterns, each image derivative is classified as being caused by shading or a change in the surface’s reflectance. Generalized Belief Propagation is then used to propagate information from areas where the correct classification is clear to areas where it is ambiguous. We also show results on real images.

2 0.78187877 31 nips-2002-Application of Variational Bayesian Approach to Speech Recognition

Author: Shinji Watanabe, Yasuhiro Minami, Atsushi Nakamura, Naonori Ueda

Abstract: In this paper, we propose a Bayesian framework, which constructs shared-state triphone HMMs based on a variational Bayesian approach, and recognizes speech based on the Bayesian prediction classification; variational Bayesian estimation and clustering for speech recognition (VBEC). An appropriate model structure with high recognition performance can be found within a VBEC framework. Unlike conventional methods, including BIC or MDL criterion based on the maximum likelihood approach, the proposed model selection is valid in principle, even when there are insufficient amounts of data, because it does not use an asymptotic assumption. In isolated word recognition experiments, we show the advantage of VBEC over conventional methods, especially when dealing with small amounts of data.

3 0.76417363 43 nips-2002-Binary Coding in Auditory Cortex

Author: Michael R. Deweese, Anthony M. Zador

Abstract: Cortical neurons have been reported to use both rate and temporal codes. Here we describe a novel mode in which each neuron generates exactly 0 or 1 action potentials, but not more, in response to a stimulus. We used cell-attached recording, which ensured single-unit isolation, to record responses in rat auditory cortex to brief tone pips. Surprisingly, the majority of neurons exhibited binary behavior with few multi-spike responses; several dramatic examples consisted of exactly one spike on 100% of trials, with no trial-to-trial variability in spike count. Many neurons were tuned to stimulus frequency. Since individual trials yielded at most one spike for most neurons, the information about stimulus frequency was encoded in the population, and would not have been accessible to later stages of processing that only had access to the activity of a single unit. These binary units allow a more efficient population code than is possible with conventional rate coding units, and are consistent with a model of cortical processing in which synchronous packets of spikes propagate stably from one neuronal population to the next. 1 Binary coding in auditory cortex We recorded responses of neurons in the auditory cortex of anesthetized rats to pure-tone pips of different frequencies [1, 2]. Each pip was presented repeatedly, allowing us to assess the variability of the neural response to multiple presentations of each stimulus. We first recorded multi-unit activity with conventional tungsten electrodes (Fig. 1a). The number of spikes in response to each pip fluctuated markedly from one trial to the next (Fig. 1e), as though governed by a random mechanism such as that generating the ticks of a Geiger counter. Highly variable responses such as these, which are at least as variable as a Poisson process, are the norm in the cortex [3-7], and have contributed to the widely held view that cortical spike trains are so noisy that only the average firing rate can be used to encode stimuli. Because we were recording the activity of an unknown number of neurons, we could not be sure whether the strong trial-to-trial fluctuations reflected the underlying variability of the single units. We therefore used an alternative technique, cell- a b Single-unit recording method 5mV Multi-unit 1sec Raw cellattached voltage 10 kHz c Single-unit . . . . .. .. ... . . .... . ... . Identified spikes Threshold e 28 kHz d Single-unit 80 120 160 200 Time (msec) N = 29 tones 3 2 1 Poisson N = 11 tones ry 40 4 na bi 38 kHz 0 Response variance/mean (spikes/trial) High-pass filtered 0 0 1 2 3 Mean response (spikes/trial) Figure 1: Multi-unit spiking activity was highly variable, but single units obeyed binomial statistics. a Multi-unit spike rasters from a conventional tungsten electrode recording showed high trial-to-trial variability in response to ten repetitions of the same 50 msec pure tone stimulus (bottom). Darker hash marks indicate spike times within the response period, which were used in the variability analysis. b Spikes recorded in cell-attached mode were easily identified from the raw voltage trace (top) by applying a high-pass filter (bottom) and thresholding (dark gray line). Spike times (black squares) were assigned to the peaks of suprathreshold segments. c Spike rasters from a cell-attached recording of single-unit responses to 25 repetitions of the same tone consisted of exactly one well-timed spike per trial (latency standard deviation = 1.0 msec), unlike the multi-unit responses (Fig. 1a). 
Under the Poisson assumption, this would have been highly unlikely (P ~ 10 -11). d The same neuron as in Fig. 1c responds with lower probability to repeated presentations of a different tone, but there are still no multi-spike responses. e We quantified response variability for each tone by dividing the variance in spike count by the mean spike count across all trials for that tone. Response variability for multi-unit tungsten recording (open triangles) was high for each of the 29 tones (out of 32) that elicited at least one spike on one trial. All but one point lie above one (horizontal gray line), which is the value produced by a Poisson process with any constant or time varying event rate. Single unit responses recorded in cell-attached mode were far less variable (filled circles). Ninety one percent (10/11) of the tones that elicited at least one spike from this neuron produced no multi-spike responses in 25 trials; the corresponding points fall on the diagonal line between (0,1) and (1,0), which provides a strict lower bound on the variability for any response set with a mean between 0 and 1. No point lies above one. attached recording with a patch pipette [8, 9], in order to ensure single unit isolation (Fig. 1b). This recording mode minimizes both of the main sources of error in spike detection: failure to detect a spike in the unit under observation (false negatives), and contamination by spikes from nearby neurons (false positives). It also differs from conventional extracellular recording methods in its selection bias: With cell- attached recording neurons are selected solely on the basis of the experimenter’s ability to form a seal, rather than on the basis of neuronal activity and responsiveness to stimuli as in conventional methods. Surprisingly, single unit responses were far more orderly than suggested by the multi-unit recordings; responses typically consisted of either 0 or 1 spikes per trial, and not more (Fig. 1c-e). In the most dramatic examples, each presentation of the same tone pip elicited exactly one spike (Fig. 1c). In most cases, however, some presentations failed to elicit a spike (Fig. 1d). Although low-variability responses have recently been observed in the cortex [10, 11] and elsewhere [12, 13], the binary behavior described here has not previously been reported for cortical neurons. a 1.4 N = 3055 response sets b 1.2 1 Poisson 28 kHz - 100 msec 0.8 0.6 0.4 0.2 0 0 ry na bi Response variance/mean (spikes/trial) The majority of the neurons (59%) in our study for which statistical significance could be assessed (at the p<0.001 significance level; see Fig. 2, caption) showed noisy binary behavior—“binary” because neurons produced either 0 or 1 spikes, and “noisy” because some stimuli elicited both single spikes and failures. In a substantial fraction of neurons, however, the responses showed more variability. We found no correlation between neuronal variability and cortical layer (inferred from the depth of the recording electrode), cortical area (inside vs. outside of area A1) or depth of anesthesia. Moreover, the binary mode of spiking was not due to the brevity (25 msec) of the stimuli; responses that were binary for short tones were comparably binary when longer (100 msec) tones were used (Fig. 2b). Not assessable Not significant Significant (p<0.001) 0.2 0.4 0.6 0.8 1 1.2 Mean response (spikes/trial) 28 kHz - 25 msec 1.4 0 40 80 120 160 Time (msec) 200 Figure 2: Half of the neuronal population exhibited binary firing behavior. 
a Of the 3055 sets of responses to 25 msec tones, 2588 (gray points) could not be assessed for significance at the p<0.001 level, 225 (open circles) were not significantly binary, and 242 were significantly binary (black points; see Identification methods for group statistics below). All points were jittered slightly so that overlying points could be seen in the figure. 2165 response sets contained no multi-spike responses; the corresponding points fell on the line from [0,1] to [1,0]. b The binary nature of single unit responses was insensitive to tone duration, even for frequencies that elicited the largest responses. Twenty additional spike rasters from the same neuron (and tone frequency) as in Fig. 1c contain no multi-spike responses whether in response to 100 msec tones (above) or 25 msec tones (below). Across the population, binary responses were as prevalent for 100 msec tones as for 25 msec tones (see Identification methods for group statistics). In many neurons, binary responses showed high temporal precision, with latencies sometimes exhibiting standard deviations as low as 1 msec (Fig. 3; see also Fig. 1c), comparable to previous observations in the auditory cortex [14], and only slightly more precise than in monkey visual area MT [5]. High temporal precision was positively correlated with high response probability (Fig. 3). a b N = (44 cells)x(32 tones) 14 N = 32 tones 12 30 Jitter (msec) Jitter (msec) 40 10 8 6 20 10 4 2 0 0 0 0.2 0.4 0.6 0.8 Mean response (spikes/trial) 1 0 0.4 0.8 1.2 1.6 Mean response (spikes/trial) 2 Figure 3: Trial-to-trial variability in latency of response to repeated presentations of the same tone decreased with increasing response probability. a Scatter plot of standard deviation of latency vs. mean response for 25 presentations each of 32 tones for a different neuron as in Figs. 1 and 2 (gray line is best linear fit). Rasters from 25 repeated presentations of a low response tone (upper left inset, which corresponds to left-most data point) display much more variable latencies than rasters from a high response tone (lower right inset; corresponds to right-most data point). b The negative correlation between latency variability and response size was present on average across the population of 44 neurons described in Identification methods for group statistics (linear fit, gray). The low trial-to-trial variability ruled out the possibility that the firing statistics could be accounted for by a simple rate-modulated Poisson process (Fig. 4a1,a2). In other systems, low variability has sometimes been modeled as a Poisson process followed by a post-spike refractory period [10, 12]. In our system, however, the range in latencies of evoked binary responses was often much greater than the refractory period, which could not have been longer than the 2 msec inter-spike intervals observed during epochs of spontaneous spiking, indicating that binary spiking did not result from any intrinsic property of the spike generating mechanism (Fig. 4a3). Moreover, a single stimulus-evoked spike could suppress subsequent spikes for as long as hundreds of milliseconds (e.g. Figs. 1d,4d), supporting the idea that binary spiking arises through a circuit-level, rather than a single-neuron, mechanism. Indeed, the fact that this suppression is observed even in the cortex of awake animals [15] suggests that binary spiking is not a special property of the anesthetized state. It seems surprising that binary spiking in the cortex has not previously been remarked upon. 
In the auditory cortex the explanation may be in part technical: Because firing rates in the auditory cortex tend to be low, multi-unit recording is often used to maximize the total amount of data collected. Moreover, our use of cell-attached recording minimizes the usual bias toward responsive or active neurons. Such explanations are not, however, likely to account for the failure to observe binary spiking in the visual cortex, where spike count statistics have been scrutinized more closely [3-7]. One possibility is that this reflects a fundamental difference between the auditory and visual systems. An alternative interpretation— a1 b Response probability 100 spikes/s 2 kHz Poisson simulation c 100 200 300 400 Time (msec) 500 20 Ratio of pool sizes a2 0 16 12 8 4 0 a3 Poisson with refractory period 0 40 80 120 160 200 Time (msec) d Response probability PSTH 0.2 0.4 0.6 0.8 1 Mean spike count per neuron 1 0.8 N = 32 tones 0.6 0.4 0.2 0 2.0 3.8 7.1 13.2 24.9 46.7 Tone frequency (kHz) Figure 4: a The lack of multi-spike responses elicited by the neuron shown in Fig. 3a were not due to an absolute refractory period since the range of latencies for many tones, like that shown here, was much greater than any reasonable estimate for the neuron’s refractory period. (a1) Experimentally recorded responses. (a2) Using the smoothed post stimulus time histogram (PSTH; bottom) from the set of responses in Fig. 4a, we generated rasters under the assumption of Poisson firing. In this representative example, four double-spike responses (arrows at left) were produced in 25 trials. (a3) We then generated rasters assuming that the neuron fired according to a Poisson process subject to a hard refractory period of 2 msec. Even with a refractory period, this representative example includes one triple- and three double-spike responses. The minimum interspike-interval during spontaneous firing events was less than two msec for five of our neurons, so 2 msec is a conservative upper bound for the refractory period. b. Spontaneous activity is reduced following high-probability responses. The PSTH (top; 0.25 msec bins) of the combined responses from the 25% (8/32) of tones that elicited the largest responses from the same neuron as in Figs. 3a and 4a illustrates a preclusion of spontaneous and evoked activity for over 200 msec following stimulation. The PSTHs from progressively less responsive groups of tones show progressively less preclusion following stimulation. c Fewer noisy binary neurons need to be pooled to achieve the same “signal-to-noise ratio” (SNR; see ref. [24]) as a collection of Poisson neurons. The ratio of the number of Poisson to binary neurons required to achieve the same SNR is plotted against the mean number of spikes elicited per neuron following stimulation; here we have defined the SNR to be the ratio of the mean spike count to the standard deviation of the spike count. d Spike probability tuning curve for the same neuron as in Figs. 1c-e and 2b fit to a Gaussian in tone frequency. and one that we favor—is that the difference rests not in the sensory modality, but instead in the difference between the stimuli used. In this view, the binary responses may not be limited to the auditory cortex; neurons in visual and other sensory cortices might exhibit similar responses to the appropriate stimuli. For example, the tone pips we used might be the auditory analog of a brief flash of light, rather than the oriented moving edges or gratings usually used to probe the primary visual cortex. 
Conversely, auditory stimuli analogous to edges or gratings [16, 17] may be more likely to elicit conventional, rate-modulated Poisson responses in the auditory cortex. Indeed, there may be a continuum between binary and Poisson modes. Thus, even in conventional rate-modulated responses, the first spike is often privileged in that it carries most of the information in the spike train [5, 14, 18]. The first spike may be particularly important as a means of rapidly signaling stimulus transients. Binary responses suggest a mode that complements conventional rate coding. In the simplest rate-coding model, a stimulus parameter (such as the frequency of a tone) governs only the rate at which a neuron generates spikes, but not the detailed positions of the spikes; the actual spike train itself is an instantiation of a random process (such as a Poisson process). By contrast, in the binomial model, the stimulus parameter (frequency) is encoded as the probability of firing (Fig. 4d). Binary coding has implications for cortical computation. In the rate coding model, stimulus encoding is “ergodic”: a stimulus parameter can be read out either by observing the activity of one neuron for a long time, or a population for a short time. By contrast, in the binary model the stimulus value can be decoded only by observing a neuronal population, so that there is no benefit to integrating over long time periods (cf. ref. [19]). One advantage of binary encoding is that it allows the population to signal quickly; the most compact message a neuron can send is one spike [20]. Binary coding is also more efficient in the context of population coding, as quantified by the signal-to-noise ratio (Fig. 4c). The precise organization of both spike number and time we have observed suggests that cortical activity consists, at least under some conditions, of packets of spikes synchronized across populations of neurons. Theoretical work [21-23] has shown how such packets can propagate stably from one population to the next, but only if neurons within each population fire at most one spike per packet; otherwise, the number of spikes per packet—and hence the width of each packet—grows at each propagation step. Interestingly, one prediction of stable propagation models is that spike probability should be related to timing precision, a prediction born out by our observations (Fig. 3). The role of these packets in computation remains an open question. 2 Identification methods for group statistics We recorded responses to 32 different 25 msec tones from each of 175 neurons from the auditory cortices of 16 Sprague-Dawley rats; each tone was repeated between 5 and 75 times (mean = 19). Thus our ensemble consisted of 32x175=5600 response sets, with between 5 and 75 samples in each set. Of these, 3055 response sets contained at least one spike on at least on trial. For each response set, we tested the hypothesis that the observed variability was significantly lower than expected from the null hypothesis of a Poisson process. The ability to assess significance depended on two parameters: the sample size (5-75) and the firing probability. Intuitively, the dependence on firing probability arises because at low firing rates most responses produce only trials with 0 or 1 spikes under both the Poisson and binary models; only at high firing rates do the two models make different predictions, since in that case the Poisson model includes many trials with 2 or even 3 spikes while the binary model generates only solitary spikes (see Fig. 4a1,a2). 
Using a stringent significance criterion of p<0.001, 467 response sets had a sufficient number of repeats to assess significance, given the observed firing probability. Of these, half (242/467=52%) were significantly less variable than expected by chance, five hundred-fold higher than the 467/1000=0.467 response sets expected, based on the 0.001 significance criterion, to yield a binary response set. Seventy-two neurons had at least one response set for which significance could be assessed, and of these, 49 neurons (49/72=68%) had at least one significantly sub-Poisson response set. Of this population of 49 neurons, five achieved low variability through repeatable bursty behavior (e.g., every spike count was either 0 or 3, but not 1 or 2) and were excluded from further analysis. The remaining 44 neurons formed the basis for the group statistics analyses shown in Figs. 2a and 3b. Nine of these neurons were subjected to an additional protocol consisting of at least 10 presentations each of 100 msec tones and 25 msec tones of all 32 frequencies. Of the 100 msec stimulation response sets, 44 were found to be significantly sub-Poisson at the p<0.05 level, in good agreement with the 43 found to be significant among the responses to 25 msec tones. 3 Bibliography 1. Kilgard, M.P. and M.M. Merzenich, Cortical map reorganization enabled by nucleus basalis activity. Science, 1998. 279(5357): p. 1714-8. 2. Sally, S.L. and J.B. Kelly, Organization of auditory cortex in the albino rat: sound frequency. J Neurophysiol, 1988. 59(5): p. 1627-38. 3. Softky, W.R. and C. Koch, The highly irregular firing of cortical cells is inconsistent with temporal integration of random EPSPs. J Neurosci, 1993. 13(1): p. 334-50. 4. Stevens, C.F. and A.M. Zador, Input synchrony and the irregular firing of cortical neurons. Nat Neurosci, 1998. 1(3): p. 210-7. 5. Buracas, G.T., A.M. Zador, M.R. DeWeese, and T.D. Albright, Efficient discrimination of temporal patterns by motion-sensitive neurons in primate visual cortex. Neuron, 1998. 20(5): p. 959-69. 6. Shadlen, M.N. and W.T. Newsome, The variable discharge of cortical neurons: implications for connectivity, computation, and information coding. J Neurosci, 1998. 18(10): p. 3870-96. 7. Tolhurst, D.J., J.A. Movshon, and A.F. Dean, The statistical reliability of signals in single neurons in cat and monkey visual cortex. Vision Res, 1983. 23(8): p. 775-85. 8. Otmakhov, N., A.M. Shirke, and R. Malinow, Measuring the impact of probabilistic transmission on neuronal output. Neuron, 1993. 10(6): p. 1101-11. 9. Friedrich, R.W. and G. Laurent, Dynamic optimization of odor representations by slow temporal patterning of mitral cell activity. Science, 2001. 291(5505): p. 889-94. 10. Kara, P., P. Reinagel, and R.C. Reid, Low response variability in simultaneously recorded retinal, thalamic, and cortical neurons. Neuron, 2000. 27(3): p. 635-46. 11. Gur, M., A. Beylin, and D.M. Snodderly, Response variability of neurons in primary visual cortex (V1) of alert monkeys. J Neurosci, 1997. 17(8): p. 2914-20. 12. Berry, M.J., D.K. Warland, and M. Meister, The structure and precision of retinal spike trains. Proc Natl Acad Sci U S A, 1997. 94(10): p. 5411-6. 13. de Ruyter van Steveninck, R.R., G.D. Lewen, S.P. Strong, R. Koberle, and W. Bialek, Reproducibility and variability in neural spike trains. Science, 1997. 275(5307): p. 1805-8. 14. Heil, P., Auditory cortical onset responses revisited. I. First-spike timing. J Neurophysiol, 1997. 77(5): p. 2616-41. 15. Lu, T., L. Liang, and X. 
Wang, Temporal and rate representations of timevarying signals in the auditory cortex of awake primates. Nat Neurosci, 2001. 4(11): p. 1131-8. 16. Kowalski, N., D.A. Depireux, and S.A. Shamma, Analysis of dynamic spectra in ferret primary auditory cortex. I. Characteristics of single-unit responses to moving ripple spectra. J Neurophysiol, 1996. 76(5): p. 350323. 17. deCharms, R.C., D.T. Blake, and M.M. Merzenich, Optimizing sound features for cortical neurons. Science, 1998. 280(5368): p. 1439-43. 18. Panzeri, S., R.S. Petersen, S.R. Schultz, M. Lebedev, and M.E. Diamond, The role of spike timing in the coding of stimulus location in rat somatosensory cortex. Neuron, 2001. 29(3): p. 769-77. 19. Britten, K.H., M.N. Shadlen, W.T. Newsome, and J.A. Movshon, The analysis of visual motion: a comparison of neuronal and psychophysical performance. J Neurosci, 1992. 12(12): p. 4745-65. 20. Delorme, A. and S.J. Thorpe, Face identification using one spike per neuron: resistance to image degradations. Neural Netw, 2001. 14(6-7): p. 795-803. 21. Diesmann, M., M.O. Gewaltig, and A. Aertsen, Stable propagation of synchronous spiking in cortical neural networks. Nature, 1999. 402(6761): p. 529-33. 22. Marsalek, P., C. Koch, and J. Maunsell, On the relationship between synaptic input and spike output jitter in individual neurons. Proc Natl Acad Sci U S A, 1997. 94(2): p. 735-40. 23. Kistler, W.M. and W. Gerstner, Stable propagation of activity pulses in populations of spiking neurons. Neural Comp., 2002. 14: p. 987-997. 24. Zohary, E., M.N. Shadlen, and W.T. Newsome, Correlated neuronal discharge rate and its implications for psychophysical performance. Nature, 1994. 370(6485): p. 140-3. 25. Abbott, L.F. and P. Dayan, The effect of correlated variability on the accuracy of a population code. Neural Comput, 1999. 11(1): p. 91-101.

4 0.60414571 132 nips-2002-Learning to Detect Natural Image Boundaries Using Brightness and Texture

Author: David R. Martin, Charless C. Fowlkes, Jitendra Malik

Abstract: The goal of this work is to accurately detect and localize boundaries in natural scenes using local image measurements. We formulate features that respond to characteristic changes in brightness and texture associated with natural boundaries. In order to combine the information from these features in an optimal way, a classifier is trained using human labeled images as ground truth. We present precision-recall curves showing that the resulting detector outperforms existing approaches.

5 0.60084236 135 nips-2002-Learning with Multiple Labels

Author: Rong Jin, Zoubin Ghahramani

Abstract: In this paper, we study a special kind of learning problem in which each training instance is given a set of (or distribution over) candidate class labels and only one of the candidate labels is the correct one. Such a problem can occur, e.g., in an information retrieval setting where a set of words is associated with an image, or if classes labels are organized hierarchically. We propose a novel discriminative approach for handling the ambiguity of class labels in the training examples. The experiments with the proposed approach over five different UCI datasets show that our approach is able to find the correct label among the set of candidate labels and actually achieve performance close to the case when each training instance is given a single correct label. In contrast, naIve methods degrade rapidly as more ambiguity is introduced into the labels. 1

6 0.59958351 124 nips-2002-Learning Graphical Models with Mercer Kernels

7 0.59802353 74 nips-2002-Dynamic Structure Super-Resolution

8 0.59622467 89 nips-2002-Feature Selection by Maximum Marginal Diversity

9 0.59539872 39 nips-2002-Bayesian Image Super-Resolution

10 0.59378135 152 nips-2002-Nash Propagation for Loopy Graphical Games

11 0.59367836 175 nips-2002-Reinforcement Learning to Play an Optimal Nash Equilibrium in Team Markov Games

12 0.59189159 52 nips-2002-Cluster Kernels for Semi-Supervised Learning

13 0.59186119 2 nips-2002-A Bilinear Model for Sparse Coding

14 0.59140986 48 nips-2002-Categorization Under Complexity: A Unified MDL Account of Human Learning of Regular and Irregular Categories

15 0.59125757 127 nips-2002-Learning Sparse Topographic Representations with Products of Student-t Distributions

16 0.58986777 162 nips-2002-Parametric Mixture Models for Multi-Labeled Text

17 0.58984244 141 nips-2002-Maximally Informative Dimensions: Analyzing Neural Responses to Natural Signals

18 0.58760756 93 nips-2002-Forward-Decoding Kernel-Based Phone Recognition

19 0.58714861 53 nips-2002-Clustering with the Fisher Score

20 0.58650184 122 nips-2002-Learning About Multiple Objects in Images: Factorial Learning without Factorial Search