nips nips2009 nips2009-259 knowledge-graph by maker-knowledge-mining

259 nips-2009-Who’s Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation


Source: pdf

Author: Jie Luo, Barbara Caputo, Vittorio Ferrari

Abstract: Given a corpus of news items consisting of images accompanied by text captions, we want to find out “who’s doing what”, i.e. associate names and action verbs in the captions to the face and body pose of the persons in the images. We present a joint model for simultaneously solving the image-caption correspondences and learning visual appearance models for the face and pose classes occurring in the corpus. These models can then be used to recognize people and actions in novel images without captions. We demonstrate experimentally that our joint ‘face and pose’ model solves the correspondence problem better than earlier models covering only the face, and that it can perform recognition of new uncaptioned images.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 associate names and action verbs in the captions to the face and body pose of the persons in the images. [sent-8, score-1.485]

2 We present a joint model for simultaneously solving the image-caption correspondences and learning visual appearance models for the face and pose classes occurring in the corpus. [sent-9, score-0.638]

3 A huge number of images with accompanying text captions is available on the Internet. [sent-12, score-0.371]

4 The learned models could then be used in a variety of Computer Vision applications, including face recognition, image search engines, and to annotate new images for which no caption is available. [sent-17, score-0.563]

5 Previous works on news items have focused on associating names in the captions to faces in the images [5, 6, 16, 21]. [sent-21, score-0.696]

6 This is difficult due to the correspondence ambiguity problem: multiple persons appear in the image and the caption. [sent-22, score-0.357]

7 Moreover, persons in the image are not always mentioned in the caption, and not all names in the caption appear in the image. [sent-23, score-0.69]

8 As a result, these methods work well for frequently occurring persons (typical for famous people) appearing in datasets with thousands of news items. [sent-25, score-0.274]

9 In this paper we propose to go beyond the above works, by modeling both names and action verbs jointly. [sent-26, score-0.399]

10 These correspond to faces and body poses in the images (figure 3). [sent-27, score-0.281]

11 The connections between the subject (name) and verb in a caption can be found by well established language analysis techniques [1, 8]. [sent-28, score-0.685]

12 We present a new generative model where the observed variables are names and verbs in the caption as well as detected persons in the image. [sent-30, score-0.829]

13 The image-caption correspondences are carried by latent variables, while the visual appearance of face and pose classes corresponding to different names and verbs are model parameters. [sent-31, score-1.035]

14 The face and upper body of the persons in the image are marked by bounding-boxes. [sent-45, score-0.569]

15 We stress that a caption might contain names and/or verbs not visible in the image, and vice versa. [sent-46, score-0.562]

16 In our joint model, the correspondence ambiguity is reduced because the face and pose information help each other. [sent-47, score-0.604]

17 This paper is most closely related to works on associating names and faces, which we discussed above. [sent-52, score-0.248]

18 There exist also works on associating nouns to image regions [2, 3, 10], starting from images annotated with a list of nouns indicating the objects they contain (typical datasets contain natural scenes and objects such as ‘water’ and ‘tiger’). [sent-53, score-0.255]

19 The news item corpus used to train our face and pose model consists of still images of person(s) performing some action(s). [sent-58, score-0.867]

20 Each image is annotated with a caption describing “who’s doing what” in the image (figure 1). [sent-59, score-0.344]

21 Some names from the caption might not appear in the image, and vice versa some imaged persons might not be mentioned in the caption. [sent-60, score-0.625]

22 The basic units in our model are persons in the image, consisting of their face and upper body. [sent-61, score-0.45]

23 Our system automatically detects them by bounding-boxes in the image using a face detector [23] and an upper body detector [14]. [sent-62, score-0.34]

24 In the rest of the paper, we say “person” to indicate a detected face and the upper body associated with it (including false positive detections). [sent-63, score-0.336]

25 A face and an upper-body are considered to belong to the same person if the face lies near the center of the upper body bounding-box. [sent-64, score-0.601]
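
As a rough illustration of this grouping rule, here is a minimal Python sketch; the (x, y, w, h) box format and the 25% tolerance are assumptions for illustration, not values from the paper:

```python
def face_matches_body(face_box, body_box, tol=0.25):
    """Pair a detected face with an upper-body detection when the face
    center lies near the center of the upper-body bounding-box.
    Boxes are (x, y, w, h); tol is a hypothetical tolerance fraction."""
    fx = face_box[0] + face_box[2] / 2.0  # face center
    fy = face_box[1] + face_box[3] / 2.0
    bx = body_box[0] + body_box[2] / 2.0  # body center
    by = body_box[1] + body_box[3] / 2.0
    return abs(fx - bx) <= tol * body_box[2] and abs(fy - by) <= tol * body_box[3]
```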

26 For each person, we obtain a pose estimate using [11] (figure 3(right)). [sent-65, score-0.32]

27 Our goals are to: (i) associate the persons in the images to the name-verb pairs in the captions, and (ii) learn visual appearance models corresponding to names and verbs. [sent-67, score-0.639]

28 The corpus is a set of documents {D^1, . . . , D^M}, with each document D^i consisting of an image I^i and its caption C^i. [sent-76, score-0.283]

29 These captions implicitly provide the labels of the person(s)’ name(s) and pose(s) in the corresponding images. [sent-77, score-0.262]

30 For each caption C i , we consider only the name-verb pairs ni returned by a language parser [1, 8] and ignore other words. [sent-78, score-0.343]

31 We make the same assumptions as for the name-face problem [5, 6, 16, 21] that the labels can only come from the name-verb pairs in the captions or null (for persons not mentioned in the caption). [sent-79, score-0.766]
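
Under this assumption, the candidate assignment set A^i can be enumerated directly. A minimal sketch (every injective matching of persons to the caption's name-verb pairs, with remaining persons labeled null; the function name is ours):

```python
from itertools import combinations, permutations

def enumerate_assignments(num_persons, name_verb_pairs):
    """Build the assignment set A^i: match any j persons injectively to j
    of the caption's name-verb pairs and label everyone else 'null'."""
    assignments = []
    for j in range(min(num_persons, len(name_verb_pairs)) + 1):
        for persons in combinations(range(num_persons), j):
            for pairs in permutations(name_verb_pairs, j):
                a = ['null'] * num_persons
                for p, nv in zip(persons, pairs):
                    a[p] = nv
                assignments.append(a)
    return assignments

# e.g. 2 persons and 2 name-verb pairs yield 7 candidate assignments
```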

32 Y^i = (y^{i,1}, . . . , y^{i,P^i}), where y^{i,p} is the assignment of the pth person in the ith image; A^i is the set of possible assignments for document i, A^i = {a^i_1, . . . , a^i_{L^i}}. [sent-83, score-0.371]

33 L^i is the number of possible assignments for document D^i; the lth assignment is a^i_l = (a^{i,1}_l, . . . , a^{i,P^i}_l), where a^{i,p}_l is the label for the pth person. [sent-86, score-0.447]

34 Θ = (θ_name, θ_verb) are the appearance models for the face and pose classes; V is the number of different verbs, U the number of different names, and θ^k a set of class representative vectors for class k, e.g. θ^v_verb = {µ^{v,1}_pose, . . . , µ^{v,R^v}_pose}. [sent-89, score-0.964]

35 θ_verb = (θ^1_verb, . . . , θ^V_verb), and analogously θ^u_name = {µ^{u,1}_face, . . . , µ^{u,R^u}_face}. [sent-92, score-0.796]

36 Table I summarizes the mathematical notation used in the paper; Figure 2 shows the graphical plate representation of the generative model (over the variables I, C, W, Y, P, A, L, M). [sent-101, score-0.442]

37 Hence, we replace the captions by the sets of possible assignments A = {A1 , . [sent-104, score-0.358]

38 Let Y^i = (y^{i,1}, . . . , y^{i,P^i}) be the assignment for the P^i persons in the ith image. [sent-116, score-0.256]

39 Each y^{i,p} = (y^{i,p}_face, y^{i,p}_pose) is a pair of indices defining the assignment of a person's face to a name and of their pose to a verb. [sent-117, score-0.836]

40 N/V is the number of different names/verbs over all the captions and null represents unknown names/verbs and false positive person detections. [sent-125, score-0.623]

41 Assuming independence between the multiple persons in an image, the likelihood of an image can be expressed as the product over the likelihoods of its persons: P(I^i | Y^i, Θ) = ∏_{I^{i,p} ∈ I^i} P(I^{i,p} | y^{i,p}, Θ) (2), where y^{i,p} defines the name-verb indices of the pth person in the image. [sent-134, score-0.45]

42 A person I^{i,p} = (I^{i,p}_face, I^{i,p}_pose) is represented by the appearance of her face I^{i,p}_face and pose I^{i,p}_pose. [sent-135, score-0.743]
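
A minimal sketch of the factorized likelihood of Eq. (2), written in log space; the per-class scoring callables log_lik_face and log_lik_pose stand in for the appearance models and are hypothetical names:

```python
def log_image_likelihood(persons, assignment, log_lik_face, log_lik_pose):
    """log P(I^i | Y^i, Θ): under the independence assumption of Eq. (2),
    the sum of per-person face and pose log-likelihoods."""
    total = 0.0
    for p, (face_feat, pose_feat) in enumerate(persons):
        name_k, verb_k = assignment[p]  # y^{i,p} = (y_face, y_pose); may be 'null'
        total += log_lik_face(face_feat, name_k) + log_lik_pose(pose_feat, verb_k)
    return total
```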

43 Each θ^v_verb = (µ^{v,1}_pose, . . . , µ^{v,R^v}_pose, β_verb) is a set of representative vectors modeling the variability within the pose class corresponding to a verb v. [sent-140, score-0.801]

44 For example, the verb “serve” in tennis could correspond to different poses such as holding the ball on the racket, tossing the ball and hitting it. [sent-141, score-0.542]

45 Analogously, θ^u_name models the variability within the face class corresponding to a name u. [sent-142, score-0.396]

46 After detecting faces from the images with the multi-view algorithm [23], we use [12] to detect nine distinctive feature points within the face bounding box (figure 3(left)). [sent-144, score-0.711]

47 A pose E consists of a distribution over the position (x, y and orientation) for each of 6 body parts (head, torso, and the upper and lower arms); figure 3 shows example images with facial features and pose estimates superimposed. [sent-149, score-0.76]

48 Left: facial features (the left and right corners of each eye, the two nostrils, the tip of the nose, and the left and right corners of the mouth) located using [12] in the detected face bounding-box. [sent-150, score-0.259]

49 The pose estimator factors out variations due to clothing and background, so E conveys purely spatial arrangements of body parts. [sent-156, score-0.374]

50 We derive three relatively low-dimensional pose descriptors from E, as proposed in [13]. [sent-157, score-0.349]

51 These descriptors represent pose in different ways, such as the relative position between pairs of body parts, and part-specific soft-segmentations of the image. [sent-158, score-0.51]
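
As a simplified stand-in for the first kind of descriptor, one can stack the relative offsets between the expected positions of all part pairs; this is a sketch in the spirit of [13], not its exact formulation:

```python
import numpy as np

def relative_position_descriptor(part_means):
    """Concatenate the (dx, dy) offsets between every pair of body parts,
    computed from each part's expected position under the estimate E."""
    parts = np.asarray(part_means, dtype=float)  # shape (n_parts, 2)
    n = len(parts)
    offsets = [parts[j] - parts[i] for i in range(n) for j in range(i + 1, n)]
    if not offsets:               # degenerate case: fewer than two parts
        return np.zeros(0)
    return np.concatenate(offsets)
```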

52 The appearance model for a pose class (corresponding to a verb) is defined as: P(I^{i,p}_pose | y^{i,p}_pose, θ_verb) = Σ_{k ∈ {1, . . . , V, null}} δ(y^{i,p}_pose, k) · P(I^{i,p}_pose | θ^k_verb) (4) [sent-165, score-0.417]

53 where θ^k_verb are the parameters of the kth pose class (or β_verb if k = null). [sent-168, score-0.32]

54 We only explain here the model for a pose class, as the face model is derived analogously. [sent-170, score-0.541]

55 Some previous works on names-faces used a Gaussian mixture model [6, 21]: each name is associated with a Gaussian density, plus an additional Gaussian to model the null class. [sent-172, score-0.429]

56 Problems such as face and pose recognition are particularly challenging because they involve complex non-Gaussian multimodal distributions. [sent-175, score-0.566]

57 Figure 3(right) shows a few examples of the variance within the pose class for a verb. [sent-176, score-0.32]

58 Moreover, we cannot easily employ existing pose similarity measures [13]. [sent-177, score-0.32]

59 θ^k_verb = {µ^{k,1}_pose, . . . , µ^{k,R^k}_pose}, where R^k is the number of representative poses for verb k. [sent-181, score-0.887]

60 The scalar β_verb represents the null model; thus poses assigned to null have likelihood (1/Z_{θ_verb}) · e^{−β_verb}. [sent-183, score-0.552]

61 It is important to have this null model, as some detected persons might not correspond to any verb in the caption, or they might be false detections. [sent-184, score-1.626]
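
A hedged sketch of this exemplar-plus-null likelihood in log space; the nearest-representative energy is our assumption consistent with the description above, and the normalizer Z_{θ_verb} is omitted since it is shared when comparing assignments:

```python
import numpy as np

def pose_log_likelihood(x, k, class_reps, beta_verb):
    """Unnormalized log-likelihood of pose descriptor x under class k:
    minus the distance to the nearest representative vector of class k,
    or the constant energy -beta_verb for the null class."""
    if k == 'null':
        return -beta_verb
    return -min(np.linalg.norm(x - mu) for mu in class_reps[k])
```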

62 The name-verb pairs n^i for a document are observed in its caption C^i. [sent-190, score-0.356]

63 Each document has a set of possible assignments A^i = {a^i_1, . . . , a^i_{L^i}} of name-verb pairs to persons in the image. [sent-194, score-0.379]

64 The number of possible assignments L^i depends both on the number of persons and on the number of name-verb pairs. [sent-195, score-0.325]

65 Therefore, given a document with P^i persons and W^i name-verb pairs, the number of possible assignments is L^i = Σ_{j=0}^{min(P^i, W^i)} C(P^i, j) · C(W^i, j) · j!, where j is the number of persons assigned to a name-verb pair instead of null and C(n, j) denotes the binomial coefficient. [sent-198, score-0.581]
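
This count (under the reconstruction above) is easy to verify numerically, and it agrees with the size of the enumerated set A^i sketched earlier. A short sketch (math.comb requires Python 3.8+):

```python
from math import comb, factorial

def num_assignments(P, W):
    """L^i for P detected persons and W name-verb pairs: choose j persons,
    choose j pairs, match them in j! ways; remaining persons are null."""
    return sum(comb(P, j) * comb(W, j) * factorial(j)
               for j in range(min(P, W) + 1))

# num_assignments(2, 2) == 7: all-null, four single matches, two full matches
```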

66 The reported results are based on automatically parsed captions for learning. [sent-274, score-0.282]

67 For each different name/verb, we select all captions containing only this name/verb. [sent-280, score-0.262]

68 If a name/verb only appears in captions with multiple names/verbs, or if the corresponding images always contain multiple persons, this simple initialization cannot be applied. [sent-283, score-0.255]

69 Each point I^{i,p} in a cluster is given a weight w^{i,p}_{Y^i} = P(Y^i | I^{i,p}, A^i, Θ) / Σ_{Y^j ∈ A^i} P(Y^j | I^{i,p}, A^i, Θ) (11), which represents the likelihood that I^{i,p}_face and I^{i,p}_pose belong to the name and verb defined by Y^i. [sent-298, score-0.631]
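
Eq. (11) is a normalization of assignment posteriors; computed in log space it is a standard numerically stable softmax. A minimal sketch:

```python
import numpy as np

def assignment_weights(log_posteriors):
    """Normalized weights w^{i,p}_Y of Eq. (11), computed from the
    log-posteriors log P(Y | I^{i,p}, A^i, Θ) of all candidates Y in A^i."""
    log_posteriors = np.asarray(log_posteriors, dtype=float)
    shifted = np.exp(log_posteriors - log_posteriors.max())  # stable softmax
    return shifted / shifted.sum()
```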

70 Therefore, faces and poses from images with many detections have lower weights and contribute less to the cluster centers, reflecting the larger uncertainty in their assignments. [sent-299, score-0.249]

71 Faces often occupy most of the image so the body pose is not visible. [sent-302, score-0.439]

72 Second, the captions frequently describe the event at an abstract level, rather than using a verb to describe the actions of the persons in the image (compare figure 1 to the figures in [6, 16]). [sent-303, score-1.012]

73 Therefore, we collected a new dataset by querying Google-images using a combination of names and verbs (from sports and social interactions), corresponding to distinct upper body poses. [sent-304, score-0.425]

74 Our dataset contains 1610 images, each with at least one person whose face occupies less than 5% of the image, and with the accompanying snippet of text returned by Google-images. [sent-306, score-0.369]

[Figure 5 annotations: example name-verb associations such as 'R. Sarkozy - embrace', 'Brian Cowen - null', 'Hu Jintao - wave', 'Hu Jintao - shake hands', 'J. Bakjyev - shake hands', 'Kyrgyzstan - null', with F and FP rows comparing the face-only and face-plus-pose associations.]

78 Figure 5: Examples of when modeling pose improves the results at learning time. [sent-353, score-0.32]

79 Below the images we report the name-verb pairs (C) from the caption as returned by the automatic parser and compare the association recovered by a model using only faces (F) and using both faces and poses (FP). [sent-354, score-0.607]

80 The assigned names (left to right) correspond to the detected face bounding-boxes (left to right). [sent-355, score-0.464]

81 Figure 6: Recognition results on images without text captions (using models learned from automatically parsed captions); panel titles include 'Sharapova - hold' and 'Hu Jintao - shake hands'. [sent-362, score-0.369]

82 Left compares face annotation using different models and scenarios (see main text); Right shows a few examples of the labels predicted by the joint face and pose model (without using captions). [sent-363, score-0.806]

83 Annotators extend these snippets into realistic captions when necessary, with varied long sentences, mentioning the action of the persons in the image as well as names/verbs not appearing in the image (as ‘noise’, figure 1). [sent-364, score-0.649]

84 Moreover, they also annotated the ground-truth name-verb pairs mentioned in the captions as well as the location of the target persons in the images, enabling us to evaluate results quantitatively. [sent-365, score-0.556]

85 In our experiments we only consider names and verbs occurring in at least 3 captions for a name and 20 captions for a verb. [sent-367, score-0.895]

86 This leaves 69 names corresponding to 69 face classes and 20 verbs corresponding to 20 pose classes. [sent-368, score-0.912]

87 We used an open source Named Entity recognizer [1] to detect names in the captions and a language parser [8] to find name-verb pairs (or name-null if the language parser could not find a verb associated with a name). [sent-369, score-1.185]

88 By using simple stemming rules, occurrences of the same verb under different tenses and with different possessive adjectives were merged together. [sent-370, score-0.479]

89 For instance “shake their hands”, “is shaking hands” and “shakes hands” all correspond to the action verb “shake hands”. [sent-371, score-0.484]
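A toy version of such stemming rules; the stop-word and suffix lists here are illustrative guesses, not the paper's exact rules:

```python
_SUFFIXES = ('ing', 'es', 'ed', 's')
_STOPWORDS = {'is', 'are', 'was', 'were',
              'my', 'your', 'his', 'her', 'its', 'our', 'their'}

def verb_key(phrase):
    """Map a verb phrase to a crude stem key so that different tenses and
    possessive adjectives collapse together, e.g. 'shake their hands',
    'is shaking hands' and 'shakes hands' all share the key 'shak hand'."""
    words = [w for w in phrase.lower().split() if w not in _STOPWORDS]
    stems = []
    for w in words:
        for suf in _SUFFIXES:
            if w.endswith(suf) and len(w) > len(suf) + 2:
                w = w[:len(w) - len(suf)]
                break
        stems.append(w.rstrip('e'))  # 'shake' and 'shak' get the same key
    return ' '.join(stems)
```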

90 By discarding infrequent names and verbs as explained above, we retain 85 names and 20 verbs to be learned by our model (recall that some of these are false positives rather than actual person names and verbs). [sent-375, score-1.075]

91 We compare experimentally our face and pose model to stripped-down versions using only face or pose information. [sent-377, score-1.082]

92 The accuracy is defined as the percentage of correct assignments over all detected persons, including assignments to null, as in [5, 16]. [sent-384, score-0.23]
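The metric itself is straightforward; a minimal sketch, treating each detected person's label as a name-verb pair or 'null':

```python
def annotation_accuracy(predicted, ground_truth):
    """Fraction of detected persons (false positives included) whose
    predicted label matches the ground-truth label, as defined above."""
    correct = sum(p == t for p, t in zip(predicted, ground_truth))
    return correct / len(ground_truth)
```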

93 As the figure shows, our joint ‘face and pose’ model outperforms both models using face or pose alone in all setups. [sent-385, score-0.541]

94 As a second point, our model with face alone also outperforms the baseline approach using Gaussian mixture appearance models (e.g. [6, 21]). [sent-389, score-0.318]

95 Figure 5 shows a few examples of how including pose improves the learning results and solves some of the correspondence ambiguities. [sent-392, score-0.362]

96 We collected a new set of 100 images and captions from Google-images using five keywords based on names and verbs from the training dataset. [sent-399, score-0.72]

97 In scenario (a), we run inference on the model, recovering the best assignment Y from the set of possible assignments generated from the captions; in scenario (b), the same test images are used but the captions are not given, so the problem degenerates to a standard face and pose recognition task. [sent-401, score-1.017]

98 Figure 6(left) reports face annotation accuracy for three methods using captions (scenario (a)): (⋄) a baseline which randomly assigns a name (or null) from the caption to each face in the image; (x) our face and pose model; ( ) our model using only faces. [sent-402, score-1.655]

99 On scenario (a) all models outperform the baseline, and our joint face and pose model improves significantly on the face-only model for all keywords, especially when there are multiple persons in the image. [sent-404, score-0.77]

100 We present an approach for the joint modeling of faces and poses in images and their association to names and action verbs in accompanying text captions. [sent-406, score-0.669]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('verb', 0.456), ('pose', 0.32), ('captions', 0.262), ('null', 0.233), ('persons', 0.229), ('face', 0.221), ('names', 0.205), ('caption', 0.191), ('name', 0.175), ('verbs', 0.166), ('shake', 0.163), ('ipose', 0.152), ('jintao', 0.152), ('hands', 0.14), ('ai', 0.108), ('person', 0.105), ('appearance', 0.097), ('assignments', 0.096), ('obama', 0.093), ('ypose', 0.093), ('hu', 0.092), ('poses', 0.086), ('iface', 0.082), ('merkel', 0.082), ('faces', 0.075), ('parser', 0.072), ('federer', 0.072), ('clinton', 0.07), ('jankovic', 0.07), ('sarkozy', 0.07), ('images', 0.066), ('nadal', 0.066), ('image', 0.065), ('body', 0.054), ('hit', 0.053), ('pth', 0.051), ('backhand', 0.051), ('ferrari', 0.05), ('dpose', 0.047), ('garnett', 0.047), ('news', 0.045), ('gure', 0.045), ('annotation', 0.044), ('pairs', 0.042), ('correspondence', 0.042), ('torso', 0.041), ('wave', 0.04), ('detected', 0.038), ('language', 0.038), ('centers', 0.036), ('simpose', 0.035), ('yface', 0.035), ('fp', 0.033), ('williams', 0.032), ('forehand', 0.031), ('descriptors', 0.029), ('nouns', 0.029), ('berg', 0.029), ('action', 0.028), ('assignment', 0.027), ('document', 0.027), ('barnard', 0.026), ('arms', 0.026), ('latent', 0.026), ('documents', 0.026), ('representative', 0.025), ('recognition', 0.025), ('setups', 0.024), ('gmm', 0.024), ('people', 0.024), ('annotated', 0.023), ('adjectives', 0.023), ('agassi', 0.023), ('annoation', 0.023), ('bakjyev', 0.023), ('barack', 0.023), ('celtics', 0.023), ('cowen', 0.023), ('deschacht', 0.023), ('gulbis', 0.023), ('idiap', 0.023), ('kiss', 0.023), ('logz', 0.023), ('marin', 0.023), ('mensink', 0.023), ('cvpr', 0.023), ('false', 0.023), ('associating', 0.022), ('accompanying', 0.022), ('detections', 0.022), ('ambiguity', 0.021), ('keywords', 0.021), ('works', 0.021), ('text', 0.021), ('caputo', 0.02), ('websites', 0.02), ('annotate', 0.02), ('garcia', 0.02), ('parsed', 0.02), ('prepositions', 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000006 259 nips-2009-Who’s Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation

Author: Jie Luo, Barbara Caputo, Vittorio Ferrari

Abstract: Given a corpus of news items consisting of images accompanied by text captions, we want to find out “who’s doing what”, i.e. associate names and action verbs in the captions to the face and body pose of the persons in the images. We present a joint model for simultaneously solving the image-caption correspondences and learning visual appearance models for the face and pose classes occurring in the corpus. These models can then be used to recognize people and actions in novel images without captions. We demonstrate experimentally that our joint ‘face and pose’ model solves the correspondence problem better than earlier models covering only the face, and that it can perform recognition of new uncaptioned images.

2 0.23215388 66 nips-2009-Differential Use of Implicit Negative Evidence in Generative and Discriminative Language Learning

Author: Anne Hsu, Thomas L. Griffiths

Abstract: A classic debate in cognitive science revolves around understanding how children learn complex linguistic rules, such as those governing restrictions on verb alternations, without negative evidence. Traditionally, formal learnability arguments have been used to claim that such learning is impossible without the aid of innate language-specific knowledge. However, recently, researchers have shown that statistical models are capable of learning complex rules from only positive evidence. These two kinds of learnability analyses differ in their assumptions about the distribution from which linguistic input is generated. The former analyses assume that learners seek to identify grammatical sentences in a way that is robust to the distribution from which the sentences are generated, analogous to discriminative approaches in machine learning. The latter assume that learners are trying to estimate a generative model, with sentences being sampled from that model. We show that these two learning approaches differ in their use of implicit negative evidence – the absence of a sentence – when learning verb alternations, and demonstrate that human learners can produce results consistent with the predictions of both approaches, depending on how the learning problem is presented.

3 0.16108115 236 nips-2009-Structured output regression for detection with partial truncation

Author: Andrea Vedaldi, Andrew Zisserman

Abstract: We develop a structured output model for object category detection that explicitly accounts for alignment, multiple aspects and partial truncation in both training and inference. The model is formulated as large margin learning with latent variables and slack rescaling, and both training and inference are computationally efficient. We make the following contributions: (i) we note that extending the Structured Output Regression formulation of Blaschko and Lampert [1] to include a bias term significantly improves performance; (ii) that alignment (to account for small rotations and anisotropic scalings) can be included as a latent variable and efficiently determined and implemented; (iii) that the latent variable extends to multiple aspects (e.g. left facing, right facing, front) with the same formulation; and (iv), most significantly for performance, that truncated and untruncated instances can be included in both training and inference with an explicit truncation mask. We demonstrate the method by training and testing on the PASCAL VOC 2007 data set – training includes the truncated examples, and in testing object instances are detected at multiple scales, alignments, and with significant truncations.

4 0.093189128 96 nips-2009-Filtering Abstract Senses From Image Search Results

Author: Kate Saenko, Trevor Darrell

Abstract: We propose an unsupervised method that, given a word, automatically selects non-abstract senses of that word from an online ontology and generates images depicting the corresponding entities. When faced with the task of learning a visual model based only on the name of an object, a common approach is to find images on the web that are associated with the object name and train a visual classifier from the search result. As words are generally polysemous, this approach can lead to relatively noisy models if many examples due to outlier senses are added to the model. We argue that images associated with an abstract word sense should be excluded when training a visual classifier to learn a model of a physical object. While image clustering can group together visually coherent sets of returned images, it can be difficult to distinguish whether an image cluster relates to a desired object or to an abstract sense of the word. We propose a method that uses both image features and the text associated with the images to relate latent topics to particular senses. Our model does not require any human supervision, and takes as input only the name of an object category. We show results of retrieving concrete-sense images in two available multimodal, multi-sense databases, as well as experiment with object classifiers trained on concrete-sense images returned by our method for a set of ten common office objects.

5 0.091000974 8 nips-2009-A Fast, Consistent Kernel Two-Sample Test

Author: Arthur Gretton, Kenji Fukumizu, Zaïd Harchaoui, Bharath K. Sriperumbudur

Abstract: A kernel embedding of probability distributions into reproducing kernel Hilbert spaces (RKHS) has recently been proposed, which allows the comparison of two probability measures P and Q based on the distance between their respective embeddings: for a sufficiently rich RKHS, this distance is zero if and only if P and Q coincide. In using this distance as a statistic for a test of whether two samples are from different distributions, a major difficulty arises in computing the significance threshold, since the empirical statistic has as its null distribution (where P = Q) an infinite weighted sum of χ2 random variables. Prior finite sample approximations to the null distribution include using bootstrap resampling, which yields a consistent estimate but is computationally costly; and fitting a parametric model with the low order moments of the test statistic, which can work well in practice but has no consistency or accuracy guarantees. The main result of the present work is a novel estimate of the null distribution, computed from the eigenspectrum of the Gram matrix on the aggregate sample from P and Q, and having lower computational cost than the bootstrap. A proof of consistency of this estimate is provided. The performance of the null distribution estimate is compared with the bootstrap and parametric approaches on an artificial example, high dimensional multivariate data, and text.

6 0.068763442 129 nips-2009-Learning a Small Mixture of Trees

7 0.065533936 211 nips-2009-Segmenting Scenes by Matching Image Composites

8 0.062592641 133 nips-2009-Learning models of object structure

9 0.058968861 5 nips-2009-A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation

10 0.055797338 131 nips-2009-Learning from Neighboring Strokes: Combining Appearance and Context for Multi-Domain Sketch Recognition

11 0.051787596 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition

12 0.049956948 153 nips-2009-Modeling Social Annotation Data with Content Relevance using a Topic Model

13 0.049789041 201 nips-2009-Region-based Segmentation and Object Detection

14 0.047790773 233 nips-2009-Streaming Pointwise Mutual Information

15 0.045668062 260 nips-2009-Zero-shot Learning with Semantic Output Codes

16 0.042138185 58 nips-2009-Constructing Topological Maps using Markov Random Fields and Loop-Closure Detection

17 0.042120654 251 nips-2009-Unsupervised Detection of Regions of Interest Using Iterative Link Analysis

18 0.041103303 104 nips-2009-Group Sparse Coding

19 0.040979277 202 nips-2009-Regularized Distance Metric Learning:Theory and Algorithm

20 0.039837856 212 nips-2009-Semi-Supervised Learning in Gigantic Image Collections


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.125), (1, -0.07), (2, -0.093), (3, -0.056), (4, -0.021), (5, 0.046), (6, -0.035), (7, 0.043), (8, 0.055), (9, 0.06), (10, 0.002), (11, -0.024), (12, 0.057), (13, -0.039), (14, 0.046), (15, 0.004), (16, 0.04), (17, 0.022), (18, 0.032), (19, 0.021), (20, 0.009), (21, 0.004), (22, -0.072), (23, 0.028), (24, -0.079), (25, 0.148), (26, 0.107), (27, 0.025), (28, 0.024), (29, 0.175), (30, -0.114), (31, -0.063), (32, -0.099), (33, -0.065), (34, 0.046), (35, 0.044), (36, -0.052), (37, -0.261), (38, 0.183), (39, 0.304), (40, 0.004), (41, 0.024), (42, -0.048), (43, 0.169), (44, -0.037), (45, -0.136), (46, 0.16), (47, 0.03), (48, 0.045), (49, 0.076)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95331222 259 nips-2009-Who’s Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation

Author: Jie Luo, Barbara Caputo, Vittorio Ferrari

Abstract: Given a corpus of news items consisting of images accompanied by text captions, we want to find out “who’s doing what”, i.e. associate names and action verbs in the captions to the face and body pose of the persons in the images. We present a joint model for simultaneously solving the image-caption correspondences and learning visual appearance models for the face and pose classes occurring in the corpus. These models can then be used to recognize people and actions in novel images without captions. We demonstrate experimentally that our joint ‘face and pose’ model solves the correspondence problem better than earlier models covering only the face, and that it can perform recognition of new uncaptioned images.

2 0.80764556 66 nips-2009-Differential Use of Implicit Negative Evidence in Generative and Discriminative Language Learning

Author: Anne Hsu, Thomas L. Griffiths

Abstract: A classic debate in cognitive science revolves around understanding how children learn complex linguistic rules, such as those governing restrictions on verb alternations, without negative evidence. Traditionally, formal learnability arguments have been used to claim that such learning is impossible without the aid of innate language-specific knowledge. However, recently, researchers have shown that statistical models are capable of learning complex rules from only positive evidence. These two kinds of learnability analyses differ in their assumptions about the distribution from which linguistic input is generated. The former analyses assume that learners seek to identify grammatical sentences in a way that is robust to the distribution from which the sentences are generated, analogous to discriminative approaches in machine learning. The latter assume that learners are trying to estimate a generative model, with sentences being sampled from that model. We show that these two learning approaches differ in their use of implicit negative evidence – the absence of a sentence – when learning verb alternations, and demonstrate that human learners can produce results consistent with the predictions of both approaches, depending on how the learning problem is presented.

3 0.50265974 236 nips-2009-Structured output regression for detection with partial truncation

Author: Andrea Vedaldi, Andrew Zisserman

Abstract: We develop a structured output model for object category detection that explicitly accounts for alignment, multiple aspects and partial truncation in both training and inference. The model is formulated as large margin learning with latent variables and slack rescaling, and both training and inference are computationally efficient. We make the following contributions: (i) we note that extending the Structured Output Regression formulation of Blaschko and Lampert [1] to include a bias term significantly improves performance; (ii) that alignment (to account for small rotations and anisotropic scalings) can be included as a latent variable and efficiently determined and implemented; (iii) that the latent variable extends to multiple aspects (e.g. left facing, right facing, front) with the same formulation; and (iv), most significantly for performance, that truncated and untruncated instances can be included in both training and inference with an explicit truncation mask. We demonstrate the method by training and testing on the PASCAL VOC 2007 data set – training includes the truncated examples, and in testing object instances are detected at multiple scales, alignments, and with significant truncations.

4 0.48584867 233 nips-2009-Streaming Pointwise Mutual Information

Author: Benjamin V. Durme, Ashwin Lall

Abstract: Recent work has led to the ability to perform space efficient, approximate counting over large vocabularies in a streaming context. Motivated by the existence of data structures of this type, we explore the computation of associativity scores, otherwise known as pointwise mutual information (PMI), in a streaming context. We give theoretical bounds showing the impracticality of perfect online PMI computation, and detail an algorithm with high expected accuracy. Experiments on news articles show our approach gives high accuracy on real world data.

5 0.3617053 153 nips-2009-Modeling Social Annotation Data with Content Relevance using a Topic Model

Author: Tomoharu Iwata, Takeshi Yamada, Naonori Ueda

Abstract: We propose a probabilistic topic model for analyzing and extracting contentrelated annotations from noisy annotated discrete data such as web pages stored in social bookmarking services. In these services, since users can attach annotations freely, some annotations do not describe the semantics of the content, thus they are noisy, i.e. not content-related. The extraction of content-related annotations can be used as a preprocessing step in machine learning tasks such as text classification and image recognition, or can improve information retrieval performance. The proposed model is a generative model for content and annotations, in which the annotations are assumed to originate either from topics that generated the content or from a general distribution unrelated to the content. We demonstrate the effectiveness of the proposed method by using synthetic data and real social annotation data for text and images.

6 0.32678413 196 nips-2009-Quantification and the language of thought

7 0.29977158 5 nips-2009-A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation

8 0.27807763 197 nips-2009-Randomized Pruning: Efficiently Calculating Expectations in Large Dynamic Programs

9 0.27553302 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition

10 0.27438435 8 nips-2009-A Fast, Consistent Kernel Two-Sample Test

11 0.26253808 211 nips-2009-Segmenting Scenes by Matching Image Composites

12 0.24517362 93 nips-2009-Fast Image Deconvolution using Hyper-Laplacian Priors

13 0.24158937 96 nips-2009-Filtering Abstract Senses From Image Search Results

14 0.22701116 46 nips-2009-Bilinear classifiers for visual recognition

15 0.22018433 258 nips-2009-Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise

16 0.21476659 251 nips-2009-Unsupervised Detection of Regions of Interest Using Iterative Link Analysis

17 0.20945454 137 nips-2009-Learning transport operators for image manifolds

18 0.20875885 97 nips-2009-Free energy score space

19 0.20666856 104 nips-2009-Group Sparse Coding

20 0.20556825 32 nips-2009-An Online Algorithm for Large Scale Image Similarity Learning


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(24, 0.017), (25, 0.063), (35, 0.047), (36, 0.057), (39, 0.041), (58, 0.032), (61, 0.014), (71, 0.031), (81, 0.013), (86, 0.063), (91, 0.508)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.92392468 39 nips-2009-Bayesian Belief Polarization

Author: Alan Jern, Kai-min Chang, Charles Kemp

Abstract: Empirical studies have documented cases of belief polarization, where two people with opposing prior beliefs both strengthen their beliefs after observing the same evidence. Belief polarization is frequently offered as evidence of human irrationality, but we demonstrate that this phenomenon is consistent with a fully Bayesian approach to belief revision. Simulation results indicate that belief polarization is not only possible but relatively common within the set of Bayesian models that we consider. Suppose that Carol has requested a promotion at her company and has received a score of 50 on an aptitude test. Alice, one of the company’s managers, began with a high opinion of Carol and became even more confident of her abilities after seeing her test score. Bob, another manager, began with a low opinion of Carol and became even less confident about her qualifications after seeing her score. On the surface, it may appear that either Alice or Bob is behaving irrationally, since the same piece of evidence has led them to update their beliefs about Carol in opposite directions. This situation is an example of belief polarization [1, 2], a widely studied phenomenon that is often taken as evidence of human irrationality [3, 4]. In some cases, however, belief polarization may appear much more sensible when all the relevant information is taken into account. Suppose, for instance, that Alice was familiar with the aptitude test and knew that it was scored out of 60, but that Bob was less familiar with the test and assumed that the score was a percentage. Even though only one interpretation of the score can be correct, Alice and Bob have both made rational inferences given their assumptions about the test. Some instances of belief polarization are almost certain to qualify as genuine departures from rational inference, but we argue in this paper that others will be entirely compatible with a rational approach. Distinguishing between these cases requires a precise normative standard against which human inferences can be compared. We suggest that Bayesian inference provides this normative standard, and present a set of Bayesian models that includes cases where polarization can and cannot emerge. Our work is in the spirit of previous studies that use careful rational analyses in order to illuminate apparently irrational human behavior (e.g. [5, 6, 7]). Previous studies of belief polarization have occasionally taken a Bayesian approach, but often the goal is to show how belief polarization can emerge as a consequence of approximate inference in a Bayesian model that is subject to memory constraints or processing limitations [8]. In contrast, we demonstrate that some examples of polarization are compatible with a fully Bayesian approach. Other formal accounts of belief polarization have relied on complex versions of utility theory [9], or have focused on continuous hypothesis spaces [10] unlike the discrete hypothesis spaces usually considered by psychological studies of belief polarization. We focus on discrete hypothesis spaces and require no additional machinery beyond the basics of Bayesian inference. We begin by introducing the belief revision phenomena considered in this paper and developing a Bayesian approach that clarifies whether and when these phenomena should be considered irrational. We then consider several Bayesian models that are capable of producing belief polarization and illustrate them with concrete examples. 
Having demonstrated that belief polarization is compatible 1 (a) Contrary updating (i) Divergence (ii) (b) Parallel updating Convergence A P (h1 ) 0.5 0.5 0.5 B Prior beliefs Updated beliefs Prior beliefs Updated beliefs Prior beliefs Updated beliefs Figure 1: Examples of belief updating behaviors for two individuals, A (solid line) and B (dashed line). The individuals begin with different beliefs about hypothesis h1 . After observing the same set of evidence, their beliefs may (a) move in opposite directions or (b) move in the same direction. with a Bayesian approach, we present simulations suggesting that this phenomenon is relatively generic within the space of models that we consider. We finish with some general comments on human rationality and normative models. 1 Belief revision phenomena The term “belief polarization” is generally used to describe situations in which two people observe the same evidence and update their respective beliefs in the directions of their priors. A study by Lord, et al. [1] provides one classic example in which participants read about two studies, one of which concluded that the death penalty deters crime and another which concluded that the death penalty has no effect on crime. After exposure to this mixed evidence, supporters of the death penalty strengthened their support and opponents strengthened their opposition. We will treat belief polarization as a special case of contrary updating, a phenomenon where two people update their beliefs in opposite directions after observing the same evidence (Figure 1a). We distinguish between two types of contrary updating. Belief divergence refers to cases in which the person with the stronger belief in some hypothesis increases the strength of his or her belief and the person with the weaker belief in the hypothesis decreases the strength of his or her belief (Figure 1a(i)). Divergence therefore includes cases of traditional belief polarization. The opposite of divergence is belief convergence (Figure 1a(ii)), in which the person with the stronger belief decreases the strength of his or her belief and the person with the weaker belief increases the strength of his or her belief. Contrary updating may be contrasted with parallel updating (Figure 1b), in which the two people update their beliefs in the same direction. Throughout this paper, we consider only situations in which both people change their beliefs after observing some evidence. All such situations can be unambiguously classified as instances of parallel or contrary updating. Parallel updating is clearly compatible with a normative approach, but the normative status of divergence and convergence is less clear. Many authors argue that divergence is irrational, and many of the same authors also propose that convergence is rational [2, 3]. For example, Baron [3] writes that “Normatively, we might expect that beliefs move toward the middle of the range when people are presented with mixed evidence.” (p. 210) The next section presents a formal analysis that challenges the conventional wisdom about these phenomena and clarifies the cases where they can be considered rational. 2 A Bayesian approach to belief revision Since belief revision involves inference under uncertainty, Bayesian inference provides the appropriate normative standard. Consider a problem where two people observe data d that bear on some hypothesis h1 . Let P1 (·) and P2 (·) be distributions that capture the two people’s respective beliefs. 
Contrary updating occurs whenever one person’s belief in h1 increases and the other person’s belief in h1 decreases, or when [P1 (h1 |d) − P1 (h1 )] [P2 (h1 |d) − P2 (h1 )] < 0 . 2 (1) Family 1 (a) H (c) (d) (e) V H D Family 2 (b) V V V D H D H D H (f) (g) V V D H D (h) V H D H D Figure 2: (a) A simple Bayesian network that cannot produce either belief divergence or belief convergence. (b) – (h) All possible three-node Bayes nets subject to the constraints described in the text. Networks in Family 1 can produce only parallel updating, but networks in Family 2 can produce both parallel and contrary updating. We will use Bayesian networks to capture the relationships between H, D, and any other variables that are relevant to the situation under consideration. For example, Figure 2a captures the idea that the data D are probabilistically generated from hypothesis H. The remaining networks in Figure 2 show several other ways in which D and H may be related, and will be discussed later. We assume that the two individuals agree on the variables that are relevant to a problem and agree about the relationships between these variables. We can formalize this idea by requiring that both people agree on the structure and the conditional probability distributions (CPDs) of a network N that captures relationships between the relevant variables, and that they differ only in the priors they assign to the root nodes of N . If N is the Bayes net in Figure 2a, then we assume that the two people must agree on the distribution P (D|H), although they may have different priors P1 (H) and P2 (H). If two people agree on network N but have different priors on the root nodes, we can create a single expanded Bayes net to simulate the inferences of both individuals. The expanded network is created by adding a background knowledge node B that sends directed edges to all root nodes in N , and acts as a switch that sets different root node priors for the two different individuals. Given this expanded network, distributions P1 and P2 in Equation 1 can be recovered by conditioning on the value of the background knowledge node and rewritten as [P (h1 |d, b1 ) − P (h1 |b1 )] [P (h1 |d, b2 ) − P (h1 |b2 )] < 0 (2) where P (·) represents the probability distribution captured by the expanded network. Suppose that there are exactly two mutually exclusive hypotheses. For example, h1 and h0 might state that the death penalty does or does not deter crime. In this case Equation 2 implies that contrary updating occurs when [P (d|h1 , b1 ) − P (d|h0 , b1 )] [P (d|h1 , b2 ) − P (d|h0 , b2 )] < 0 . (3) Equation 3 is derived in the supporting material, and leads immediately to the following result: R1: If H is a binary variable and D and B are conditionally independent given H, then contrary updating is impossible. Result R1 follows from the observation that if D and B are conditionally independent given H, then the product in Equation 3 is equal to (P (d|h1 ) − P (d|h0 ))2 , which cannot be less than zero. R1 implies that the simple Bayes net in Figure 2a is incapable of producing contrary updating, an observation previously made by Lopes [11]. Our analysis may help to explain the common intuition that belief divergence is irrational, since many researchers seem to implicitly adopt a model in which H and D are the only relevant variables. Network 2a, however, is too simple to capture the causal relationships that are present in many real world situations. 
For example, the promotion example at the beginning of this paper is best captured using a network with an additional node that represents the grading scale for the aptitude test. Networks with many nodes may be needed for some real world problems, but here we explore the space of three-node networks. We restrict our attention to connected graphs in which D has no outgoing edges, motivated by the idea that the three variables should be linked and that the data are the final result of some generative process. The seven graphs that meet these conditions are shown in Figures 2b–h, where the additional variable has been labeled V . These Bayes nets illustrate cases in which (b) V is an additional 3 Models Conventional wisdom Family 1 Family 2 Belief divergence Belief convergence Parallel updating Table 1: The first column represents the conventional wisdom about which belief revision phenomena are normative. The models in the remaining columns include all three-node Bayes nets. This set of models can be partitioned into those that support both belief divergence and convergence (Family 2) and those that support neither (Family 1). piece of evidence that bears on H, (c) V informs the prior probability of H, (d)–(e) D is generated by an intervening variable V , (f) V is an additional generating factor of D, (g) V informs both the prior probability of H and the likelihood of D, and (h) H and D are both effects of V . The graphs in Figure 2 have been organized into two families. R1 implies that none of the graphs in Family 1 is capable of producing contrary updating. The next section demonstrates by example that all three of the graphs in Family 2 are capable of producing contrary updating. Table 1 compares the two families of Bayes nets to the informal conclusions about normative approaches that are often found in the psychological literature. As previously noted, the conventional wisdom holds that belief divergence is irrational but that convergence and parallel updating are both rational. Our analysis suggests that this position has little support. Depending on the causal structure of the problem under consideration, a rational approach should allow both divergence and convergence or neither. Although we focus in this paper on Bayes nets with no more than three nodes, the class of all network structures can be partitioned into those that can (Family 2) and cannot (Family 1) produce contrary updating. R1 is true for Bayes nets of any size and characterizes one group of networks that belong to Family 1. Networks where the data provide no information about the hypotheses must also fail to produce contrary updating. Note that if D and H are conditionally independent given B, then the left side of Equation 3 is equal to zero, meaning contrary updating cannot occur. We conjecture that all remaining networks can produce contrary updating if the cardinalities of the nodes and the CPDs are chosen appropriately. Future studies can attempt to verify this conjecture and to precisely characterize the CPDs that lead to contrary updating. 3 Examples of rational belief divergence We now present four scenarios that can be modeled by the three-node Bayes nets in Family 2. Our purpose in developing these examples is to demonstrate that these networks can produce belief divergence and to provide some everyday examples in which this behavior is both normative and intuitive. 3.1 Example 1: Promotion We first consider a scenario that can be captured by Bayes net 2f, in which the data depend on two independent factors. 
Recall the scenario described at the beginning of this paper: Alice and Bob are responsible for deciding whether to promote Carol. For simplicity, we consider a case where the data represent a binary outcome—whether or not Carol’s r´ sum´ indicates that she is included e e in The Directory of Notable People—rather than her score on an aptitude test. Alice believes that The Directory is a reputable publication but Bob believes it is illegitimate. This situation is represented by the Bayes net and associated CPDs in Figure 3a. In the tables, the hypothesis space H = {‘Unqualified’ = 0, ‘Qualified’ = 1} represents whether or not Carol is qualified for the promotion, the additional factor V = {‘Disreputable’ = 0, ‘Reputable’ = 1} represents whether The Directory is a reputable publication, and the data variable D = {‘Not included’ = 0, ‘Included’ = 1} represents whether Carol is featured in it. The actual probabilities were chosen to reflect the fact that only an unqualified person is likely to pad their r´ sum´ by mentioning a disreputable publication, but that e e 4 (a) B Alice Bob (b) P(V=1) 0.01 0.9 B Alice Bob V B Alice Bob P(H=1) 0.6 0.4 V H D V 0 0 1 1 H 0 1 0 1 V 0 1 P(D=1) 0.5 0.1 0.1 0.9 (c) P(H=1) 0.1 0.9 H V 0 0 1 1 D H 0 1 0 1 P(D=1) 0.4 0.01 0.4 0.6 (d) B Alice Bob P(V=0) P(V=1) P(V=2) P(V=3) 0.6 0.2 0.1 0.1 0.1 0.1 0.2 0.6 B Alice Bob P(V1=1) 0.9 0.1 P(H=1) 1 1 0 0 H B Alice Bob V1 V V 0 1 2 3 P(V=1) 0.9 0.1 D V 0 1 2 3 P(D=0) P(D=1) P(D=2) P(D=3) 0.7 0.1 0.1 0.1 0.1 0.7 0.1 0.1 0.1 0.1 0.7 0.1 0.1 0.1 0.1 0.7 V1 0 0 1 1 V2 0 1 0 1 P(H=1) 0.5 0.1 0.5 0.9 P(V2=1) 0.5 0.5 V2 H D V2 0 1 P(D=1) 0.1 0.9 Figure 3: The Bayes nets and conditional probability distributions used in (a) Example 1: Promotion, (b) Example 2: Religious belief, (c) Example 3: Election polls, (d) Example 4: Political belief. only a qualified person is likely to be included in The Directory if it is reputable. Note that Alice and Bob agree on the conditional probability distribution for D, but assign different priors to V and H. Alice and Bob therefore interpret the meaning of Carol’s presence in The Directory differently, resulting in the belief divergence shown in Figure 4a. This scenario is one instance of a large number of belief divergence cases that can be attributed to two individuals possessing different mental models of how the observed evidence was generated. For instance, suppose now that Alice and Bob are both on an admissions committee and are evaluating a recommendation letter for an applicant. Although the letter is positive, it is not enthusiastic. Alice, who has less experience reading recommendation letters interprets the letter as a strong endorsement. Bob, however, takes the lack of enthusiasm as an indication that the author has some misgivings [12]. As in the promotion scenario, the differences in Alice’s and Bob’s experience can be effectively represented by the priors they assign to the H and V nodes in a Bayes net of the form in Figure 2f. 3.2 Example 2: Religious belief We now consider a scenario captured by Bayes net 2g. In our example for Bayes net 2f, the status of an additional factor V affected how Alice and Bob interpreted the data D, but did not shape their prior beliefs about H. In many cases, however, the additional factor V will influence both people’s prior beliefs about H as well as their interpretation of the relationship between D and H. Bayes net 2g captures this situation, and we provide a concrete example inspired by an experiment conducted by Batson [13]. 
Suppose that Alice believes in a “Christian universe:” she believes in the divinity of Jesus Christ and expects that followers of Christ will be persecuted. Bob, on the other hand, believes in a “secular universe.” This belief leads him to doubt Christ’s divinity, but to believe that if Christ were divine, his followers would likely be protected rather than persecuted. Now suppose that both Alice and Bob observe that Christians are, in fact, persecuted, and reassess the probability of Christ’s divinity. This situation is represented by the Bayes net and associated CPDs in Figure 3b. In the tables, the hypothesis space H = {‘Human’ = 0, ‘Divine’ = 1} represents the divinity of Jesus Christ, the additional factor V = {‘Secular’ = 0, ‘Christian’ = 1} represents the nature of the universe, and the data variable D = {‘Not persecuted’ = 0, ‘Persecuted’ = 1} represents whether Christians are subject to persecution. The exact probabilities were chosen to reflect the fact that, regardless of worldview, people will agree on a “base rate” of persecution given that Christ is not divine, but that more persecution is expected if the Christian worldview is correct than if the secular worldview is correct. Unlike in the previous scenario, Alice and Bob agree on the CPDs for both D and H, but 5 (a) (b) P (H = 1) (d) 1 1 1 0.5 1 (c) 0.5 0.5 A 0.5 B 0 0 0 Prior beliefs Updated beliefs Prior beliefs Updated beliefs 0 Prior beliefs Updated beliefs Prior beliefs Updated beliefs Figure 4: Belief revision outcomes for (a) Example 1: Promotion, (b) Example 2: Religious belief, (c) Example 3: Election polls, and (d) Example 4: Political belief. In all four plots, the updated beliefs for Alice (solid line) and Bob (dashed line) are computed after observing the data described in the text. The plots confirm that all four of our example networks can lead to belief divergence. differ in the priors they assign to V . As a result, Alice and Bob disagree about whether persecution supports or undermines a Christian worldview, which leads to the divergence shown in Figure 4b. This scenario is analogous to many real world situations in which one person has knowledge that the other does not. For instance, in a police interrogation, someone with little knowledge of the case (V ) might take a suspect’s alibi (D) as strong evidence of their innocence (H). However, a detective with detailed knowledge of the case may assign a higher prior probability to the subject’s guilt based on other circumstantial evidence, and may also notice a detail in the suspect’s alibi that only the culprit would know, thus making the statement strong evidence of guilt. In all situations of this kind, although two people possess different background knowledge, their inferences are normative given that knowledge, consistent with the Bayes net in Figure 2g. 3.3 Example 3: Election polls We now consider two qualitatively different cases that are both captured by Bayes net 2h. The networks considered so far have all included a direct link between H and D. In our next two examples, we consider cases where the hypotheses and observed data are not directly linked, but are coupled by means of one or more unobserved causal factors. Suppose that an upcoming election will be contested by two Republican candidates, Rogers and Rudolph, and two Democratic candidates, Davis and Daly. Alice and Bob disagree about the various candidates’ chances of winning, with Alice favoring the two Republicans and Bob favoring the two Democrats. 
3.3 Example 3: Election polls

We now consider two qualitatively different cases that are both captured by Bayes net 2h. The networks considered so far have all included a direct link between H and D. In our next two examples, we consider cases where the hypotheses and observed data are not directly linked, but are coupled by means of one or more unobserved causal factors.

Suppose that an upcoming election will be contested by two Republican candidates, Rogers and Rudolph, and two Democratic candidates, Davis and Daly. Alice and Bob disagree about the various candidates’ chances of winning, with Alice favoring the two Republicans and Bob favoring the two Democrats. Two polls were recently released, one indicating that Rogers was most likely to win the election and the other indicating that Daly was most likely to win. After considering these polls, both Alice and Bob assess the likelihood that a Republican will win the election.

This situation is represented by the Bayes net and associated CPDs in Figure 3c. In the tables, the hypothesis space H = {‘Democrat wins’ = 0, ‘Republican wins’ = 1} represents the winning party, the variable V = {‘Rogers’ = 0, ‘Rudolph’ = 1, ‘Davis’ = 2, ‘Daly’ = 3} represents the winning candidate, and the data variables D1 = D2 = {‘Rogers’ = 0, ‘Rudolph’ = 1, ‘Davis’ = 2, ‘Daly’ = 3} represent the results of the two polls. The exact probabilities were chosen to reflect the fact that the polls are likely to reflect the truth with some noise, but that whether a Democrat or a Republican wins is completely determined by the winning candidate V. In Figure 3c, only a single D node is shown because D1 and D2 have identical CPDs. The resulting belief divergence is shown in Figure 4c.

Note that in this scenario, Alice’s and Bob’s different priors cause each of them to discount the poll that disagrees with their existing beliefs as noise, so their prior beliefs are reinforced by the mixed data. This scenario was inspired by the death penalty study [1] alluded to earlier, in which a set of mixed results caused supporters and opponents of the death penalty to strengthen their existing beliefs. We do not claim that people’s behavior in this study can be explained with exactly the model employed here, but our analysis does show that selective interpretation of evidence is sometimes consistent with a rational approach.
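A sketch of this computation is given below. It assumes the structure of Figure 2h, with H a deterministic function of V and two conditionally independent noisy polls; the probability values are reconstructed from the garbled Figure 3c and are assumptions.

```python
# Example 3 (Election polls), structure of Figure 2h: V -> H
# (deterministic) and V -> D1, D2 (independent noisy polls). Values
# recovered from the garbled Figure 3c are assumptions.

# candidates: 0 = Rogers (R), 1 = Rudolph (R), 2 = Davis (D), 3 = Daly (D)
REPUBLICAN = (True, True, False, False)   # H = 1 iff V is 0 or 1
POLL_NOISE = [[0.7, 0.1, 0.1, 0.1],       # row v: P(poll result = d | V = v)
              [0.1, 0.7, 0.1, 0.1],
              [0.1, 0.1, 0.7, 0.1],
              [0.1, 0.1, 0.1, 0.7]]

def p_republican_wins(prior_v, polls=()):
    """P(H = 'Republican wins' | observed poll results)."""
    weights = []
    for v in range(4):
        w = prior_v[v]
        for d in polls:                   # polls are conditionally
            w *= POLL_NOISE[v][d]         # independent given V
        weights.append(w)
    return sum(w for v, w in enumerate(weights) if REPUBLICAN[v]) / sum(weights)

alice = [0.6, 0.2, 0.1, 0.1]              # favors the Republicans
bob = [0.1, 0.1, 0.2, 0.6]                # favors the Democrats

# Mixed evidence: one poll names Rogers (0), the other Daly (3).
print(p_republican_wins(alice), p_republican_wins(alice, polls=(0, 3)))
# -> 0.80, ~0.85 (Alice's belief is reinforced)
print(p_republican_wins(bob), p_republican_wins(bob, polls=(0, 3)))
# -> 0.20, ~0.15 (Bob's belief is reinforced in the other direction)
```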
3.4 Example 4: Political belief

We conclude with a second illustration of Bayes net 2h in which two people agree on the interpretation of an observed piece of evidence but disagree about the implications of that evidence. In this scenario, Alice and Bob are two economists with different philosophies about how the federal government should approach a major recession. Alice believes that the federal government should increase its own spending to stimulate economic activity; Bob believes that the government should decrease its spending and reduce taxes instead, providing taxpayers with more spending money. A new bill has just been proposed, and an independent study found that the bill was likely to increase federal spending. Alice and Bob now assess the likelihood that this piece of legislation will improve the economic climate.

This scenario can be modeled by the Bayes net and associated CPDs in Figure 3d. In the tables, the hypothesis space H = {‘Bad policy’ = 0, ‘Good policy’ = 1} represents whether the new bill is good for the economy, and the data variable D = {‘No spending’ = 0, ‘Spending increase’ = 1} represents the conclusions of the independent study. Unlike in previous scenarios, we introduce two additional factors: V1 = {‘Fiscally conservative’ = 0, ‘Fiscally liberal’ = 1}, which represents the optimal economic philosophy, and V2 = {‘No spending’ = 0, ‘Spending increase’ = 1}, which represents the spending policy of the new bill. The exact probabilities in the tables were chosen to reflect the fact that if the bill does not increase spending, the policy it enacts may still be good for other reasons. A uniform prior was placed on V2 for both people, reflecting the fact that they have no prior expectations about the spending in the bill. However, the priors placed on V1 for Alice and Bob reflect their different beliefs about the best economic policy. The resulting belief divergence is shown in Figure 4d.

The model used in this scenario bears a strong resemblance to the probabilogical model of attitude change developed by McGuire [14], in which V1 and V2 might be logical “premises” that entail the “conclusion” H.
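The same enumeration works with two latent factors. The sketch below assumes the structure of Figure 3d, in which V1 and V2 jointly determine H and D is a noisy report of V2; the probability values are reconstructed from the garbled figure and should be treated as assumptions.

```python
# Example 4 (Political belief), structure of Figure 3d: V1 and V2 jointly
# determine H, and D is a noisy report of V2. CPD values recovered from
# the garbled figure are assumptions.

P_H1 = {(0, 0): 0.5, (0, 1): 0.1, (1, 0): 0.5, (1, 1): 0.9}  # P(H=1 | V1, V2)
P_D1_GIVEN_V2 = {0: 0.1, 1: 0.9}                             # P(D=1 | V2)
P_V2 = {0: 0.5, 1: 0.5}                                      # shared uniform prior

def beliefs(p_v1_is_1):
    """Return (prior, posterior) for P(H=1), after observing D=1."""
    prior = num = den = 0.0
    for v1 in (0, 1):
        p1 = p_v1_is_1 if v1 == 1 else 1.0 - p_v1_is_1
        for v2 in (0, 1):
            w = p1 * P_V2[v2]
            prior += w * P_H1[(v1, v2)]
            num += w * P_D1_GIVEN_V2[v2] * P_H1[(v1, v2)]
            den += w * P_D1_GIVEN_V2[v2]
    return prior, num / den

print(beliefs(0.9))   # Alice: prior ~0.66, posterior ~0.79 (rises)
print(beliefs(0.1))   # Bob:   prior ~0.34, posterior ~0.21 (falls)
```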
4 How common is contrary updating?

We have now described four concrete cases where belief divergence is captured by a normative approach. It is possible, however, that belief divergence is relatively rare within the Bayes nets of Family 2, and that our four examples are exotic special cases that depend on carefully selected CPDs. To rule out this possibility, we ran simulations to explore the space of all possible CPDs for the three networks in Family 2. We initially considered cases where H, D, and V were binary variables, and ran two simulations for each model. In one simulation, the priors and each row of each CPD were sampled from a symmetric Beta distribution with parameter 0.1, resulting in probabilities highly biased toward 0 and 1. In the second simulation, the probabilities were sampled from a uniform distribution. In each trial, a single set of CPDs was generated, and two different priors were then generated for each root node in the graph to simulate two individuals, consistent with our assumption that two individuals may have different priors but must agree about the conditional probabilities. We carried out 20,000 trials in each simulation and computed the proportion of trials that led to convergence and divergence. Trials were counted as instances of convergence or divergence only if |P(H = 1 | D = 1) − P(H = 1)| > ε for both individuals, with ε = 1 × 10^−5.

The results of these simulations are shown in Table 2. The supporting material proves that divergence and convergence are equally common, so the percentages in the table show the frequencies for contrary updating of either type. Our primary question was whether contrary updating is rare or anomalous. In all but the third simulation, contrary updating constituted a substantial proportion of trials, suggesting that the phenomenon is relatively generic. We were also interested in whether this behavior relied on particular settings of the CPDs. The fact that the percentages for the uniform distribution are comparable to or greater than those for the biased distribution indicates that contrary updating does not depend on carefully chosen probability values. More generally, these results directly challenge the suggestion that normative accounts are not suited for modeling belief divergence.

The last two columns of Table 2 show results for two simulations with the same Bayes net, the only difference being whether V was treated as 2-valued (binary) or 4-valued. The 4-valued case is included because Examples 3 and 4 both involve multi-valued additional factor variables V.

            Net 2f    Net 2g    Net 2h (2-valued V)    Net 2h (4-valued V)
  Biased     9.6%     12.7%            0%                    23.3%
  Uniform   18.2%     16.0%            0%                    20.0%

Table 2: Simulation results. The percentages indicate the proportion of trials that produced contrary updating using the specified Bayes net (column) and probability distributions (row). The prior and conditional probabilities were sampled either from a Beta(0.1, 0.1) distribution (biased) or from a Beta(1, 1) distribution (uniform). The probabilities for the simulation results shown in the last column were sampled from a Dirichlet([0.1, 0.1, 0.1, 0.1]) distribution (biased) or a Dirichlet([1, 1, 1, 1]) distribution (uniform).

In Example 4, we used two binary variables, but we could have equivalently used a single 4-valued variable. Belief convergence and divergence are not possible in the binary case, a result that is proved in the supporting material. We believe, however, that convergence and divergence are fairly common whenever V takes three or more values, and the simulation in the last column of the table confirms this claim for the 4-valued case.

Given that belief divergence seems relatively common in the space of all Bayes nets, it is natural to ask whether cases of rational divergence are regularly encountered in the real world. One possible approach is to analyze a large database of networks that capture everyday belief revision problems, and to determine what proportion of these networks lead to rational divergence. Future studies can explore this issue, but our simulations suggest that contrary updating is likely to arise in cases where it is necessary to move beyond a simple model like the one in Figure 2a and consider several causal factors.
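A minimal version of this simulation is sketched below for one member of Family 2 (the structure used in Example 2). The sampling scheme follows the description above; the counting criterion, namely opposite-direction updates whose magnitudes both exceed ε, is our reading of “contrary updating of either type” and is an assumption where the text is ambiguous.

```python
# Simulation sketch for the V -> H, (V, H) -> D network, all variables
# binary. Priors and CPD rows are drawn from a symmetric Beta(a, a); the
# two individuals share the CPDs but draw independent priors on the root
# node V.
import random

def posterior(p_v1, p_h, p_d):
    """P(H=1 | D=1) for the V -> H, (V, H) -> D network."""
    joint = {0: 0.0, 1: 0.0}
    for v in (0, 1):
        pv = p_v1 if v == 1 else 1.0 - p_v1
        for h in (0, 1):
            ph = p_h[v] if h == 1 else 1.0 - p_h[v]
            joint[h] += pv * ph * p_d[(v, h)]
    return joint[1] / (joint[0] + joint[1])

def contrary_fraction(a, trials=20000, eps=1e-5):
    contrary = 0
    for _ in range(trials):
        p_h = {v: random.betavariate(a, a) for v in (0, 1)}   # shared CPDs
        p_d = {(v, h): random.betavariate(a, a)
               for v in (0, 1) for h in (0, 1)}
        deltas = []
        for _person in range(2):
            p_v1 = random.betavariate(a, a)                   # individual prior
            prior = (1.0 - p_v1) * p_h[0] + p_v1 * p_h[1]
            deltas.append(posterior(p_v1, p_h, p_d) - prior)
        # contrary updating: both changes exceed eps, with opposite signs
        if min(abs(d) for d in deltas) > eps and deltas[0] * deltas[1] < 0:
            contrary += 1
    return contrary / trials

print(contrary_fraction(a=0.1))   # biased sampling
print(contrary_fraction(a=1.0))   # uniform sampling
```

The two printed fractions can be compared with the corresponding column of Table 2, although the exact figures depend on the counting criterion assumed above.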
5 Conclusion

This paper presented a family of Bayes nets that can account for belief divergence, a phenomenon that is typically considered to be incompatible with normative accounts. We provided four concrete examples that illustrate how this family of networks can capture a variety of settings where belief divergence emerges from rational statistical inference. We also described a series of simulations suggesting that belief divergence is not only possible but relatively common within the family of networks we considered. Our work suggests that belief polarization should not always be taken as evidence of irrationality, and that researchers who aim to document departures from rationality may wish to consider alternative phenomena instead. One such phenomenon might be called “inevitable belief reinforcement,” which occurs when supporters of a hypothesis update their belief in the same direction for all possible data sets d. For example, a gambler will demonstrate inevitable belief reinforcement if he or she becomes increasingly convinced that a roulette wheel is biased towards red regardless of whether the next spin produces red, black, or green. This phenomenon is provably inconsistent with any fully Bayesian approach, and therefore provides strong evidence of irrationality.

Although we propose that some instances of polarization are compatible with a Bayesian approach, we do not claim that human inferences are always or even mostly rational. We suggest, however, that characterizing normative behavior can require careful thought, and that formal analyses are invaluable for assessing the rationality of human inferences. In some cases, a formal analysis will provide an appropriate baseline for understanding how human inferences depart from rational norms. In other cases, a formal analysis will suggest that an apparently irrational inference makes sense once all of the relevant information is taken into account.

References

[1] C. G. Lord, L. Ross, and M. R. Lepper. Biased assimilation and attitude polarization: The effects of prior theories on subsequently considered evidence. Journal of Personality and Social Psychology, 37(11):2098–2109, 1979.
[2] L. Ross and M. R. Lepper. The perseverance of beliefs: Empirical and normative considerations. In New directions for methodology of social and behavioral science: Fallible judgment in behavioral research. Jossey-Bass, San Francisco, 1980.
[3] J. Baron. Thinking and Deciding. Cambridge University Press, Cambridge, 4th edition, 2008.
[4] A. Gerber and D. Green. Misperceptions about perceptual bias. Annual Review of Political Science, 2:189–210, 1999.
[5] M. Oaksford and N. Chater. A rational analysis of the selection task as optimal data selection. Psychological Review, 101(4):608–631, 1994.
[6] U. Hahn and M. Oaksford. The rationality of informal argumentation: A Bayesian approach to reasoning fallacies. Psychological Review, 114(3):704–732, 2007.
[7] S. Sher and C. R. M. McKenzie. Framing effects and rationality. In N. Chater and M. Oaksford, editors, The probabilistic mind: Prospects for Bayesian cognitive science. Oxford University Press, Oxford, 2008.
[8] B. O’Connor. Biased evidence assimilation under bounded Bayesian rationality. Master’s thesis, Stanford University, 2006.
[9] A. Zimper and A. Ludwig. Attitude polarization. Technical report, Mannheim Research Institute for the Economics of Aging, 2007.
[10] A. K. Dixit and J. W. Weibull. Political polarization. Proceedings of the National Academy of Sciences, 104(18):7351–7356, 2007.
[11] L. L. Lopes. Averaging rules and adjustment processes in Bayesian inference. Bulletin of the Psychonomic Society, 23(6):509–512, 1985.
[12] A. Harris, A. Corner, and U. Hahn. “Damned by faint praise”: A Bayesian account. In Proceedings of the 31st Annual Conference of the Cognitive Science Society, Austin, TX, 2009. Cognitive Science Society.
[13] C. D. Batson. Rational processing or rationalization? The effect of disconfirming information on a stated religious belief. Journal of Personality and Social Psychology, 32(1):176–184, 1975.
[14] W. J. McGuire. The probabilogical model of cognitive structure and attitude change. In R. E. Petty, T. M. Ostrom, and T. C. Brock, editors, Cognitive Responses in Persuasion. Lawrence Erlbaum Associates, 1981.

same-paper 2 0.86482126 259 nips-2009-Who’s Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation


3 0.74435836 52 nips-2009-Code-specific policy gradient rules for spiking neurons

Author: Henning Sprekeler, Guillaume Hennequin, Wulfram Gerstner

Abstract: Although it is widely believed that reinforcement learning is a suitable tool for describing behavioral learning, the mechanisms by which it can be implemented in networks of spiking neurons are not fully understood. Here, we show that different learning rules emerge from a policy gradient approach depending on which features of the spike trains are assumed to influence the reward signals, i.e., depending on which neural code is in effect. We use the framework of Williams (1992) to derive learning rules for arbitrary neural codes. For illustration, we present policy-gradient rules for three different example codes - a spike count code, a spike timing code and the most general “full spike train” code - and test them on simple model problems. In addition to classical synaptic learning, we derive learning rules for intrinsic parameters that control the excitability of the neuron. The spike count learning rule has structural similarities with established Bienenstock-Cooper-Munro rules. If the distribution of the relevant spike train features belongs to the natural exponential family, the learning rules have a characteristic shape that raises interesting prediction problems. 1

4 0.60421288 213 nips-2009-Semi-supervised Learning using Sparse Eigenfunction Bases

Author: Kaushik Sinha, Mikhail Belkin

Abstract: We present a new framework for semi-supervised learning with sparse eigenfunction bases of kernel matrices. It turns out that when the data has clusters, that is, when the high-density regions are sufficiently separated by low-density valleys, each high-density area corresponds to a unique representative eigenvector. Linear combinations of such eigenvectors (or, more precisely, of their Nystrom extensions) provide good candidates for classification functions when the cluster assumption holds. By first choosing an appropriate basis of these eigenvectors from unlabeled data and then using labeled data with Lasso to select a classifier in the span of these eigenvectors, we obtain a classifier which has a very sparse representation in this basis. Importantly, the sparsity corresponds naturally to the cluster assumption. Experimental results on a number of real-world data-sets show that our method is competitive with the state of the art semi-supervised learning algorithms and outperforms the natural base-line algorithm (Lasso in the Kernel PCA basis). 1

5 0.53673506 166 nips-2009-Noisy Generalized Binary Search

Author: Robert Nowak

Abstract: This paper addresses the problem of noisy Generalized Binary Search (GBS). GBS is a well-known greedy algorithm for determining a binary-valued hypothesis through a sequence of strategically selected queries. At each step, a query is selected that most evenly splits the hypotheses under consideration into two disjoint subsets, a natural generalization of the idea underlying classic binary search. GBS is used in many applications, including fault testing, machine diagnostics, disease diagnosis, job scheduling, image processing, computer vision, and active learning. In most of these cases, the responses to queries can be noisy. Past work has provided a partial characterization of GBS, but existing noise-tolerant versions of GBS are suboptimal in terms of query complexity. This paper presents an optimal algorithm for noisy GBS and demonstrates its application to learning multidimensional threshold functions. 1

6 0.39372912 247 nips-2009-Time-rescaling methods for the estimation and assessment of non-Poisson neural encoding models

7 0.39320719 210 nips-2009-STDP enables spiking neurons to detect hidden causes of their inputs

8 0.37693697 99 nips-2009-Functional network reorganization in motor cortex can be explained by reward-modulated Hebbian learning

9 0.37502402 19 nips-2009-A joint maximum-entropy model for binary neural population patterns and continuous signals

10 0.36694038 62 nips-2009-Correlation Coefficients are Insufficient for Analyzing Spike Count Dependencies

11 0.36605698 169 nips-2009-Nonlinear Learning using Local Coordinate Coding

12 0.35474491 107 nips-2009-Help or Hinder: Bayesian Models of Social Goal Inference

13 0.35255903 142 nips-2009-Locality-sensitive binary codes from shift-invariant kernels

14 0.34182915 183 nips-2009-Optimal context separation of spiking haptic signals by second-order somatosensory neurons

15 0.33707729 163 nips-2009-Neurometric function analysis of population codes

16 0.33202723 162 nips-2009-Neural Implementation of Hierarchical Bayesian Inference by Importance Sampling

17 0.32810286 216 nips-2009-Sequential effects reflect parallel learning of multiple environmental regularities

18 0.32486004 29 nips-2009-An Infinite Factor Model Hierarchy Via a Noisy-Or Mechanism

19 0.31925094 38 nips-2009-Augmenting Feature-driven fMRI Analyses: Semi-supervised learning and resting state activity

20 0.31663901 154 nips-2009-Modeling the spacing effect in sequential category learning