nips nips2010 nips2010-82 knowledge-graph by maker-knowledge-mining

82 nips-2010-Evaluation of Rarity of Fingerprints in Forensics

Source: pdf

Author: Chang Su, Sargur Srihari

Abstract: A method for computing the rarity of latent fingerprints represented by minutiae is given. It allows determining the probability of finding a match for an evidence print in a database of n known prints. The probability of random correspondence between evidence and database is determined in three procedural steps. In the registration step the latent print is aligned by finding its core point, which is done using a machine learning procedure based on Gaussian processes. In the evidence probability evaluation step a generative model based on Bayesian networks is used to determine the probability of the evidence; it takes into account both the dependency of each minutia on nearby minutiae and the confidence of their presence in the evidence. In the specific probability of random correspondence step the evidence probability is used to determine the probability of a match among n for a given tolerance; the last evaluation is similar to the birthday correspondence probability for a specific birthday. The generative model is validated using a goodness-of-fit test evaluated with a standard database of fingerprints. The probability of random correspondence for several latent fingerprints is evaluated for varying numbers of minutiae.

Reference: text

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 A method for computing the rarity of latent fingerprints represented by minutiae is given. [sent-2, score-0.642]

2 It allows determining the probability of finding a match for an evidence print in a database of n known prints. [sent-3, score-0.251]

3 In the registration step the latent print is aligned by finding its core point, which is done using a machine learning procedure based on Gaussian processes. [sent-5, score-0.437]

4 In the evidence probability evaluation step a generative model based on Bayesian networks is used to determine the probability of the evidence; it takes into account both the dependency of each minutia on nearby minutiae and the confidence of their presence in the evidence. [sent-6, score-1.153]

5 In the specific probability of random correspondence step the evidence probability is used to determine the probability of a match among n for a given tolerance; the last evaluation is similar to the birthday correspondence probability for a specific birthday. [sent-7, score-0.193]

6 For instance, in the case of DNA, a probability statement is made after a match has been confirmed between the evidence and the known: the chance that a randomly selected person would have the same DNA pattern is 1 in 24,000,000, which is a description of the rarity of the evidence/known [1]. [sent-11, score-0.265]

7 In the case of fingerprint evidence there is uncertainty at two levels: the similarity between the evidence and the known and the rarity of the known. [sent-12, score-0.249]

8 This paper describes a systematic approach for the computation of the rarity of fingerprints in a robust and reliable manner. [sent-20, score-0.206]

9 Due to the varying quality of fingerprints collected from the crime scene, called latent prints, a registration process is needed to determine which area of finger skin the print comes from; Section 2 describes the use of Gaussian processes to predict core points by which prints can be aligned. [sent-22, score-0.678]

10 In Section 3 a generative model based on Bayesian networks is proposed to model the distribution of minutiae as well as the dependencies between them. [sent-23, score-0.437]

11 To measure rarity, a metric for assessing the probability of random correspondence of a specific print against n samples is defined in Section 4. [sent-24, score-0.235]

12 To solve this problem, we first predict the core points and then align the fingerprints by overlapping their core points. [sent-30, score-0.335]

13 For fingerprints that do not contain loop or whorl singularities, the core is usually associated with the point of maximum ridge line curvature [6]. [sent-33, score-0.233]

14 Latent prints are usually small partial prints and do not contain core points. [sent-39, score-0.416]

15 The ridge flow directions reveal the intrinsic features of ridge topologies and thus have a critical impact on core point prediction. [sent-42, score-0.296]

16 The orientation maps are used to predict the core points. [sent-43, score-0.271]

17 Given an orientation map of a fingerprint, the core point is predicted using Gaussian processes. [sent-47, score-0.268]

18 Instead of representing the core point as a single value, the prediction of the core point from the Gaussian process model takes the form of a full predictive distribution. [sent-51, score-0.331]

19 , N }, where g denotes the orientation map of a fingerprint and y denotes the output, which is the core point. [sent-55, score-0.421]

20 The Gaussian predictive distribution of core point y ∗ can be evaluated by conditioning the joint Gaussian prior distribution on the observation (G, y), where G = (g1 , . [sent-61, score-0.211]
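For reference, conditioning a joint Gaussian prior on observations gives the standard Gaussian process regression predictive distribution. The block below states that textbook form for a single scalar output under a zero-mean prior. The kernel k, the noise variance \sigma^2, and the symbols m and s^2 are notation assumed here for illustration; they may differ from the equations that are elided from this summary.

```latex
\begin{align}
p(y^{*} \mid g^{*}, G, \mathbf{y}) &= \mathcal{N}\!\left(y^{*} \mid m(g^{*}),\, s^{2}(g^{*})\right), \\
m(g^{*}) &= \mathbf{k}_{*}^{\top} \left(K + \sigma^{2} I\right)^{-1} \mathbf{y}, \\
s^{2}(g^{*}) &= k(g^{*}, g^{*}) + \sigma^{2} - \mathbf{k}_{*}^{\top} \left(K + \sigma^{2} I\right)^{-1} \mathbf{k}_{*},
\end{align}
% Assumed notation: K_{ij} = k(g_i, g_j) and (\mathbf{k}_{*})_i = k(g_i, g^{*}) for i, j = 1, \dots, N.
```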

21 Note that for some fingerprints, such as latent fingerprints collected from crime scenes, the location within the complete print is unknown. [sent-69, score-0.334]

22 So any g∗ only represents the orientation map of the print in one possible location. [sent-70, score-0.276]

23 In order to predict the core point in the correct location, we list all the possible print locations corresponding to the different translations and rotations. [sent-71, score-0.354]

24 The core point y ∗ should maximize p(y ∗ |gi , G, y) with respect to gi . [sent-77, score-0.189]

25 Note that the minutia features mentioned in the following sections have been aligned first. [sent-80, score-0.656]

26 Previous generative models for fingerprints involve different assumptions: that minutia locations and directions are uniformly distributed [15], and that minutiae are independent of each other [16, 17]. [sent-82, score-1.088]

27 However, minutiae that are spatially close tend to have similar directions [18]. [sent-83, score-0.403]

28 The variance of the minutia directions in different regions of the fingerprint is dependent on both their locations and location variance [19, 20]. [sent-85, score-0.74]

29 These observations on the dependency between minutiae need to be accounted for in eliciting reliable statistical models. [sent-86, score-0.424]

30 The proposed model incorporates the distribution of minutiae and the dependency relationship between them. [sent-87, score-0.423]

31 Each minutia is represented by its location and direction. [sent-90, score-0.712]

32 Automatic fingerprint matching algorithms use minutiae as the salient features [21], since they are stable and are reliably extracted. [sent-92, score-0.387]

33 Each minutia is represented as x = (s, θ) where s = (x1 , x2 ) is its location and θ its direction. [sent-93, score-0.712]

34 In order to capture the distribution of minutiae as well as the dependencies between them, we first propose a method to define a unique sequence for a given set of minutiae. [sent-94, score-0.4]

35 The sequence starts with the minutia x1 whose location is closest to the core point. [sent-96, score-0.857]

36 Each remaining minutia xn is the one spatially closest to the centroid defined by the arithmetic mean of the location coordinates of all the previous minutiae x1 , . [sent-97, score-1.139]

37 Given this sequence, the ﬁngerprint can be represented by a minutia sequence X = (x1 , . [sent-101, score-0.636]

38 The sequence is robust to the variance of the minutiae because the next minutia is decided by all the previous minutiae. [sent-105, score-1.011]
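The sequencing rule described in the sentences above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `order_minutiae`, the tuple representation of locations, and the tie-breaking rule (lowest index wins) are assumptions made here.

```python
import math


def order_minutiae(locations, core):
    """Return indices of `locations` in the sequencing order sketched above.

    locations: list of (x1, x2) minutia locations (hypothetical representation).
    core:      (x1, x2) location of the core point.
    """
    remaining = list(range(len(locations)))
    if not remaining:
        return []

    # The sequence starts with the minutia closest to the core point.
    first = min(remaining, key=lambda i: math.dist(locations[i], core))
    order = [first]
    remaining.remove(first)
    sum_x, sum_y = locations[first]

    while remaining:
        # Centroid = arithmetic mean of all previously selected locations.
        centroid = (sum_x / len(order), sum_y / len(order))
        # Assumption: ties are broken by the lowest index (min keeps the first).
        nxt = min(remaining, key=lambda i: math.dist(locations[i], centroid))
        order.append(nxt)
        remaining.remove(nxt)
        sum_x += locations[nxt][0]
        sum_y += locations[nxt][1]
    return order
```

Because every choice depends on the running centroid rather than on a single neighbor, a small perturbation of one location shifts the centroid only slightly, which is in line with the robustness remark in the text.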

39 Given the observation that spatially closer minutiae are more strongly related, we only model the dependence between xn and its nearest minutia among {x1 , . [sent-106, score-1.046]

40 Figure 1(b) shows the minutia sequence and the minutia [Figure 1 panel (a): Minutiae sequencing.] [sent-115, score-1.272]

41 Figure 1: Minutia dependency modeling: (a) given minutiae {x1 , . [sent-117, score-0.411]

42 , x4 } with centroid c, the next minutia x5 is the one closest to c, and (b) following this procedure, dependencies between seven minutiae are represented by arrows. [sent-120, score-1.064]

43 Based on the characteristics of fingerprint minutiae studied in [18, 19, 20], we know that the minutia direction is related to its location and the neighboring minutiae. [sent-124, score-1.115]

44 The minutia location is conditionally independent of the location of the neighboring minutiae given their directions. [sent-125, score-1.163]

45 To address the probabilistic relationships of the minutiae, Bayesian networks are used to represent the distributions of the minutia features in fingerprints. [sent-126, score-0.636]

46 Figure 2 shows the Bayesian network for the distribution of the minutia set given in Figure 1. [sent-127, score-0.648]

47 The nodes sn and θn represent the location and direction of minutia xn . [sent-128, score-0.841]

48 In general, for a given fingerprint, the joint distribution over its minutia set X is given by $p(X) = p(s_1)\,p(\theta_1 \mid s_1) \prod_{n=2}^{N} p(s_n)\,p(\theta_n \mid s_n, s_{\psi(n)}, \theta_{\psi(n)})$ (10), where $s_{\psi(n)}$ and $\theta_{\psi(n)}$ are the location and direction of the minutia $x_i$ which has the minimal spatial distance to the minutia $x_n$. [sent-130, score-2.059]
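A minimal sketch of how the factorization in (10) could be evaluated in log space is given below. The three density callables are hypothetical placeholders for the models referred to as (12) to (14), which are not reproduced in this summary. Restricting ψ(n) to minutiae that appear earlier in the sequence is an assumption made here, based on the statement that only the dependence on the nearest preceding minutia is modeled.

```python
import math


def nearest_previous(locations):
    """For each position n, return the index psi(n) < n of the nearest earlier
    minutia; the first entry is None because x_1 has no predecessor."""
    if not locations:
        return []
    parents = [None]
    for n in range(1, len(locations)):
        parents.append(
            min(range(n), key=lambda i: math.dist(locations[i], locations[n]))
        )
    return parents


def log_joint(minutiae, log_p_loc, log_p_dir_first, log_p_dir_cond):
    """Log of the product in (10) for an ordered minutia sequence.

    minutiae:        list of (s, theta) with s = (x1, x2), already sequenced.
    log_p_loc:       callable s -> log p(s)                    (placeholder)
    log_p_dir_first: callable (theta, s) -> log p(theta | s)   (placeholder)
    log_p_dir_cond:  callable (theta, s, s_psi, theta_psi)
                     -> log p(theta | s, s_psi, theta_psi)     (placeholder)
    """
    if not minutiae:
        return 0.0
    parents = nearest_previous([s for s, _ in minutiae])
    s1, t1 = minutiae[0]
    total = log_p_loc(s1) + log_p_dir_first(t1, s1)
    for n in range(1, len(minutiae)):
        s_n, t_n = minutiae[n]
        s_psi, t_psi = minutiae[parents[n]]
        total += log_p_loc(s_n) + log_p_dir_cond(t_n, s_n, s_psi, t_psi)
    return total
```

Working with log densities avoids underflow, which matters because the product of many small densities can be far below the smallest positive floating-point number.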

49 It is known that minutiae tend to form clusters [18] and minutiae in different regions of the fingerprint are observed to be associated with different region-specific minutia directions. [sent-132, score-1.386]

50 A mixture of Gaussians is a natural approach to model the minutia location given by (12). [sent-133, score-0.712]

51 Since minutia orientation is a periodic variable, it is modeled by the von Mises distribution which itself is derived from the Gaussian. [sent-134, score-0.744]
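To make the periodic model concrete, here is a small sketch of the standard von Mises density on an interval of length 2π, together with a mixture of such densities. It is the textbook form, not necessarily the exact parameterization used in (13) and (14). The function names and the series-based Bessel evaluation are choices made for this illustration.

```python
import math


def bessel_i0(x, terms=50):
    """Modified Bessel function of the first kind, order 0, via its power
    series sum_k ((x/2)^(2k)) / (k!)^2. Adequate for moderate x."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / 2.0) ** 2 / (k * k)
        total += term
    return total


def von_mises_pdf(theta, mu, kappa):
    """Standard von Mises density with mean direction mu and concentration kappa."""
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(kappa))


def von_mises_mixture_pdf(theta, weights, mus, kappas):
    """Mixture of von Mises densities; weights are assumed to sum to 1."""
    return sum(w * von_mises_pdf(theta, m, k) for w, m, k in zip(weights, mus, kappas))
```

The density depends on the angle only through cos(θ − μ), so it is automatically periodic, which a Gaussian on the raw angle would not be.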

52 The minutia represented by its location and direction is modeled by a mixture of joint Gaussian and von Mises distributions [22] given by (13). [sent-135, score-0.766]

53 Given its location and the nearest minutia, the minutia direction has a mixture of von Mises densities given by (14). [sent-136, score-0.74]

54 The metric of rarity is the specific nPRC, the probability that data with value x coincides with an element in a set of n samples, within a specified tolerance. [sent-140, score-0.196]
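The birthday analogy mentioned in the abstract suggests a simple form for this quantity: if a single random sample matches with probability p, and the n samples are treated as independent, the chance that at least one matches is 1 − (1 − p)^n. The sketch below assumes exactly that form; the defining equation is not reproduced in this summary, and the name `specific_nprc` is hypothetical.

```python
import math


def specific_nprc(p_match, n):
    """Probability that at least one of n independent samples matches.

    p_match: probability that one random sample corresponds to the evidence
             within tolerance (assumed to be supplied by the generative model).
    n:       number of samples in the database.
    """
    if not 0.0 <= p_match <= 1.0:
        raise ValueError("p_match must be a probability")
    if p_match == 1.0:
        return 1.0
    # 1 - (1 - p)^n, computed with log1p/expm1 so that a very small p
    # does not round (1 - p) to exactly 1.0 and lose the result.
    return -math.expm1(n * math.log1p(-p_match))
```

With p = 1/365 and n = 23 this gives the probability that someone in a group of 23 shares one specific birthday, which is the analogy the abstract draws. The log1p/expm1 form matters here because evidence probabilities in this setting can be extremely small.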

55 To compute the specific nPRC, we first define correspondence, or match, between two minutiae as follows. [sent-148, score-0.411]

56 The minutiae are said to correspond if, for tolerance $\epsilon = [\epsilon_s, \epsilon_\theta]$, $\|s_a - s_b\| \le \epsilon_s \wedge |\theta_a - \theta_b| \le \epsilon_\theta$ (16), where $\|s_a - s_b\|$ is the Euclidean distance between the minutia locations. [sent-150, score-1.114]

57 Then, the match between two fingerprints is defined as the existence of at least $\hat{m}$ pairs of matched minutiae between the two fingerprints. [sent-151, score-0.393]
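The tolerance test in (16) and the resulting fingerprint-level match can be sketched as below. The angular difference is taken literally as in (16), without wrap-around at 2π. The greedy one-to-one pairing used to count matched pairs is an assumption of this sketch, since the summary does not say how pairs are formed, and all function names are hypothetical.

```python
import math


def minutiae_correspond(a, b, eps_s, eps_theta):
    """Literal reading of (16): a and b are (s, theta) with s = (x1, x2)."""
    (s_a, theta_a), (s_b, theta_b) = a, b
    return math.dist(s_a, s_b) <= eps_s and abs(theta_a - theta_b) <= eps_theta


def count_matched_pairs(X, Y, eps_s, eps_theta):
    """Greedy one-to-one pairing (an assumption of this sketch): each minutia
    in Y is used at most once, in the order the minutiae of X are visited."""
    used = set()
    count = 0
    for a in X:
        for j, b in enumerate(Y):
            if j not in used and minutiae_correspond(a, b, eps_s, eps_theta):
                used.add(j)
                count += 1
                break
    return count


def fingerprints_match(X, Y, m_hat, eps_s, eps_theta):
    """Two prints match if at least m_hat minutia pairs correspond."""
    return count_matched_pairs(X, Y, eps_s, eps_theta) >= m_hat
```

A greedy pairing can undercount compared with an optimal assignment, so it should be read as a lower bound on the number of matched pairs.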

58 To deal with the widely varying quality of latent fingerprints, it is also important to consider the minutia confidence in the specific nPRC measurement. [sent-153, score-0.741]

59 The confidence of the minutia xn is defined as (dsn , dθn ), where dsn is the confidence of location and dθn is the confidence of direction. [sent-154, score-0.783]

60 Let f be a randomly sampled fingerprint which has minutia set X′ = {x′ , . [sent-156, score-0.636]

61 Let X and X′ be the sets of $\hat{m}$ minutiae randomly picked from X and X′, where $\hat{m} \le N$ and $\hat{m} \le M$. [sent-160, score-0.391]

62 $p_\epsilon(X, \hat{m}) = \sum_{m' \in M} p(m') \binom{m'}{\hat{m}} \cdot \sum_{i=1}^{\binom{N}{\hat{m}}} p_\epsilon(X_i)$ (24), where M contains all possible numbers of minutiae in one fingerprint among n fingerprints, p(m′) is the probability of a random fingerprint having m′ minutiae, minutia set X_i = (x_{i1}, x_{i2}, . [sent-162, score-1.04]

63 , x_{i\hat{m}}) is the subset of X and $p_\epsilon(X_i)$ is the joint probability of minutia set X_i given by (19). [sent-165, score-0.667]

64 The training set consists of the orientation maps of the ﬁngerprints and the corresponding core points which are marked manually. [sent-182, score-0.271]

65 The core point prediction is applied to three groups of latent prints of different quality. [sent-183, score-0.391]

66 Figure 3 shows the results of core point prediction and subsequent latent print localization given two latent ﬁngerprints from NIST27. [sent-184, score-0.535]

67 The test latent prints are extracted and enhanced manually. [sent-186, score-0.23]

68 The true core points of the latent prints are picked from the matching 10-prints. [sent-187, score-0.406]

69 Correct prediction is determined by comparing the location and direction distances between predicted and true core points with the threshold parameters set at Ts = 16 pixels, and Tθ = π/6. [sent-188, score-0.281]

70 The good quality set contains 88 images that mostly contain the core points. [sent-189, score-0.19]

71 The bad and ugly quality sets each contain 85 images that are small and usually do not include core points. [sent-190, score-0.238]

72 The precisions on the bad and ugly quality sets show a distinct difference between the two methods and indicate that the GP-based method provides core point prediction even when the core points cannot be seen in the latent prints. [sent-192, score-0.488]

73 The chi-square [Figure 3 panel (a): Latent print localization of case “g90”.] [sent-196, score-0.185]

74 Figure 3: Latent print localization: Left side images are the latent fingerprints (rectangles) collected from crime scenes. [sent-198, score-0.335]

75 Right side images contain the predicted core points (crosses) and true core points (circles) with the orientation maps of the latent prints. [sent-199, score-0.548]

76 Three different tests were conducted for: (i) distribution of minutia location (12), (ii) joint distribution of minutia location and orientation (13), and (iii) distributions of minutia dependency (14). [sent-210, score-2.246]

77 For minutia location, we partitioned the minutia location space into 16 non-overlapping blocks. [sent-211, score-1.348]

78 For minutia location and orientation, we partitioned the feature space into 16 × 4 non-overlapping blocks. [sent-212, score-0.712]

79 For minutia dependency, the orientation space is divided into 9 non-overlapping blocks. [sent-213, score-0.732]

80 The blocks are combined with adjacent blocks until both observed and expected numbers of minutiae in the block are greater than or equal to 5. [sent-214, score-0.387]

81 $\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$ (25), where $O_i$ is the observed minutia count for the ith block, and $E_i$ is the expected minutia count for the ith block. [sent-216, score-1.272]
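The block-merging rule and the statistic in (25) can be sketched as follows. The summary does not say in which order adjacent blocks of the grid are combined, so this illustration assumes the blocks have been flattened into a one-dimensional list and merges left to right. Both function names are hypothetical.

```python
def merge_sparse_blocks(observed, expected, min_count=5):
    """Merge adjacent blocks until observed and expected counts are both at
    least min_count. Assumption: blocks are in a 1-D order, merged left to right."""
    merged_o, merged_e = [], []
    acc_o = acc_e = 0.0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_o >= min_count and acc_e >= min_count:
            merged_o.append(acc_o)
            merged_e.append(acc_e)
            acc_o = acc_e = 0.0
    # A trailing remainder that never reached min_count is folded into the
    # last merged block (or kept as the only block if none was completed).
    if acc_o or acc_e:
        if merged_o:
            merged_o[-1] += acc_o
            merged_e[-1] += acc_e
        else:
            merged_o.append(acc_o)
            merged_e.append(acc_e)
    return merged_o, merged_e


def chi_square_statistic(observed, expected):
    """Pearson statistic of (25): sum over blocks of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

For example, observed counts [2, 4, 10, 1] with expected counts [3, 3, 8, 2] merge into two blocks, (6, 6) and (11, 10), before the statistic is computed. Turning the statistic into a p-value additionally requires the chi-square distribution with the appropriate degrees of freedom, which is outside this sketch.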

82 To test the models for minutia location, and minutia location and orientation, the numbers of fingerprints with p-values above (corresponding to accepting the model) and below (corresponding to rejecting the model) the significance level are computed. [sent-221, score-1.36]

83 Of the 4000 fingerprints, 3387 are accepted and 613 are rejected for the minutia location model, and 3216 are accepted and 784 are rejected for the minutia location and orientation model. [sent-222, score-1.594]

84 To test the model for minutia dependency, we ﬁrst collect all the linked minutia pairs in the minutia sequences produced from 4000 ﬁngerprints. [sent-223, score-1.908]

85 Then these minutia pairs are separated by the binned locations of both minutiae (32 × 32) and orientation of the leading minutia (4). [sent-224, score-1.757]

86 Finally, the minutia dependency models can be tested on the corresponding minutia pair sets. [sent-225, score-1.308]

87 Figure 4: Two latent cases: The left images are the crime scene photographs containing the latent ﬁngerprints and minutiae. [sent-231, score-0.257]

88 The right images are the preprocessed latent prints with aligned minutiae with predicted core points. [sent-232, score-0.772]

89 6 Fingerprint Rarity Measurement on Latent Prints. The method for assessing fingerprint rarity using the validated model is demonstrated here. [sent-243, score-0.211]

90 The ﬁrst latent print “b115” contains 16 minutiae and the second “g73” contains 39 minutiae. [sent-245, score-0.629]

91 The conﬁdences of minutiae are manually assigned by visual inspection. [sent-246, score-0.375]

92 The specific nPRCs of the two latent prints are given in Table 3. [sent-247, score-0.217]

93 The specific nPRCs are calculated for varying numbers of matching minutia pairs ($\hat{m}$), assuming that the number of fingerprints (n) is 100,000. [sent-248, score-0.66]

94 For a latent print that contains more minutiae, or whose minutiae are more common in the minutia population, the probability that the latent print shares $\hat{m}$ minutiae with a random fingerprint is higher. [sent-251, score-2.286]

95 7 Summary. This work is the first attempt to offer a systematic method to measure the rarity of fingerprints. [sent-254, score-0.193]

96 In order to align the prints, a Gaussian-process-based approach is proposed to predict the core points. [sent-255, score-0.191]

97 It is shown that this approach can predict core points whether or not the prints contain them. [sent-256, score-0.48]

98 Furthermore, a generative model is proposed to model the distribution of minutiae as well as the dependency between them. [sent-257, score-0.46]

99 To further improve the accuracy, minutia confidences are taken into account for specific nPRC calculation. [sent-260, score-0.636]

100 It is shown that the proposed method is capable of estimating the rarity of real-life latent fingerprints. [sent-263, score-0.267]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('minutia', 0.636), ('ngerprint', 0.418), ('minutiae', 0.375), ('ngerprints', 0.288), ('rarity', 0.179), ('print', 0.166), ('core', 0.145), ('prints', 0.129), ('orientation', 0.096), ('nprc', 0.089), ('latent', 0.088), ('sn', 0.08), ('fingerprint', 0.08), ('location', 0.076), ('ridge', 0.062), ('crime', 0.052), ('dsn', 0.05), ('generative', 0.037), ('dependency', 0.036), ('correspondence', 0.036), ('evidence', 0.035), ('gi', 0.031), ('forensic', 0.03), ('forensics', 0.03), ('individuality', 0.03), ('nprcs', 0.03), ('poincare', 0.03), ('ugly', 0.03), ('direction', 0.028), ('precisions', 0.027), ('gaussian', 0.027), ('prabhakar', 0.026), ('nist', 0.026), ('tolerance', 0.025), ('gp', 0.025), ('cance', 0.023), ('dences', 0.023), ('sb', 0.023), ('statistic', 0.022), ('xn', 0.021), ('dence', 0.02), ('aligned', 0.02), ('fingerprints', 0.02), ('srihari', 0.02), ('rejected', 0.019), ('localization', 0.019), ('accepted', 0.018), ('bad', 0.018), ('gm', 0.018), ('registration', 0.018), ('match', 0.018), ('dna', 0.018), ('court', 0.017), ('quality', 0.017), ('probability', 0.017), ('centroid', 0.017), ('ax', 0.017), ('processes', 0.017), ('lr', 0.017), ('points', 0.016), ('ds', 0.016), ('validated', 0.016), ('sa', 0.016), ('picked', 0.016), ('tests', 0.016), ('prediction', 0.016), ('predict', 0.016), ('con', 0.016), ('pattern', 0.016), ('ki', 0.016), ('assessing', 0.016), ('database', 0.015), ('jain', 0.015), ('images', 0.015), ('predictive', 0.015), ('maps', 0.014), ('systematic', 0.014), ('joint', 0.014), ('locations', 0.014), ('heidelberg', 0.014), ('map', 0.014), ('directions', 0.014), ('spatially', 0.014), ('collected', 0.014), ('scene', 0.014), ('contain', 0.013), ('point', 0.013), ('reliable', 0.013), ('goodness', 0.013), ('cartesian', 0.013), ('covariance', 0.013), ('dependencies', 0.013), ('enhanced', 0.013), ('su', 0.013), ('oi', 0.013), ('align', 0.013), ('angle', 0.012), ('distribution', 0.012), ('matching', 0.012), ('numbers', 0.012)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 82 nips-2010-Evaluation of Rarity of Fingerprints in Forensics

Author: Chang Su, Sargur Srihari

2 0.048604727 17 nips-2010-A biologically plausible network for the computation of orientation dominance

Author: Kritika Muralidharan, Nuno Vasconcelos

Abstract: The determination of dominant orientation at a given image location is formulated as a decision-theoretic question. This leads to a novel measure for the dominance of a given orientation θ, which is similar to that used by SIFT. It is then shown that the new measure can be computed with a network that implements the sequence of operations of the standard neurophysiological model of V1. The measure can thus be seen as a biologically plausible version of SIFT, and is denoted as bioSIFT. The network units are shown to exhibit trademark properties of V1 neurons, such as cross-orientation suppression, sparseness and independence. The connection between SIFT and biological vision provides a justification for the success of SIFT-like features and reinforces the importance of contrast normalization in computer vision. We illustrate this by replacing the Gabor units of an HMAX network with the new bioSIFT units. This is shown to lead to significant gains for classification tasks, leading to state-of-the-art performance among biologically inspired network models and performance competitive with the best non-biological object recognition systems.

3 0.046533294 242 nips-2010-Slice sampling covariance hyperparameters of latent Gaussian models

Author: Iain Murray, Ryan P. Adams

Abstract: The Gaussian process (GP) is a popular way to specify dependencies between random variables in a probabilistic model. In the Bayesian framework the covariance structure can be specified using unknown hyperparameters. Integrating over these hyperparameters considers different possible explanations for the data when making predictions. This integration is often performed using Markov chain Monte Carlo (MCMC) sampling. However, with non-Gaussian observations standard hyperparameter sampling approaches require careful tuning and may converge slowly. In this paper we present a slice sampling approach that requires little tuning while mixing well in both strong- and weak-data regimes.

4 0.042637046 89 nips-2010-Factorized Latent Spaces with Structured Sparsity

Author: Yangqing Jia, Mathieu Salzmann, Trevor Darrell

Abstract: Recent approaches to multi-view learning have shown that factorizing the information into parts that are shared across all views and parts that are private to each view could effectively account for the dependencies and independencies between the different input modalities. Unfortunately, these approaches involve minimizing non-convex objective functions. In this paper, we propose an approach to learning such factorized representations inspired by sparse coding techniques. In particular, we show that structured sparsity allows us to address the multiview learning problem by alternately solving two convex optimization problems. Furthermore, the resulting factorized latent spaces generalize over existing approaches in that they allow having latent dimensions shared between any subset of the views instead of between all the views only. We show that our approach outperforms state-of-the-art methods on the task of human pose estimation.

5 0.041427724 185 nips-2010-Nonparametric Density Estimation for Stochastic Optimization with an Observable State Variable

Author: Lauren Hannah, Warren Powell, David M. Blei

Abstract: In this paper we study convex stochastic optimization problems where a noisy objective function value is observed after a decision is made. There are many stochastic optimization problems whose behavior depends on an exogenous state variable which affects the shape of the objective function. Currently, there is no general purpose algorithm to solve this class of problems. We use nonparametric density estimation to take observations from the joint state-outcome distribution and use them to infer the optimal decision for a given query state s. We propose two solution methods that depend on the problem characteristics: function-based and gradient-based optimization. We examine two weighting schemes, kernel based weights and Dirichlet process based weights, for use with the solution methods. The weights and solution methods are tested on a synthetic multi-product newsvendor problem and the hour ahead wind commitment problem. Our results show that in some cases Dirichlet process weights offer substantial benefits over kernel based weights and more generally that nonparametric estimation methods provide good solutions to otherwise intractable problems.

6 0.036571108 103 nips-2010-Generating more realistic images using gated MRF's

7 0.036044687 95 nips-2010-Feature Transitions with Saccadic Search: Size, Color, and Orientation Are Not Alike

8 0.03595829 235 nips-2010-Self-Paced Learning for Latent Variable Models

9 0.031307589 274 nips-2010-Trading off Mistakes and Don't-Know Predictions

10 0.029838938 268 nips-2010-The Neural Costs of Optimal Control

11 0.029776433 79 nips-2010-Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces

12 0.02956938 213 nips-2010-Predictive Subspace Learning for Multi-view Data: a Large Margin Approach

13 0.028994989 133 nips-2010-Kernel Descriptors for Visual Recognition

14 0.028423781 119 nips-2010-Implicit encoding of prior probabilities in optimal neural populations

15 0.027569108 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models

16 0.026580272 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision

17 0.02638663 104 nips-2010-Generative Local Metric Learning for Nearest Neighbor Classification

18 0.025744917 9 nips-2010-A New Probabilistic Model for Rank Aggregation

19 0.024282079 137 nips-2010-Large Margin Learning of Upstream Scene Understanding Models

20 0.02364766 276 nips-2010-Tree-Structured Stick Breaking for Hierarchical Data

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.075), (1, 0.026), (2, -0.033), (3, -0.006), (4, -0.017), (5, -0.007), (6, 0.014), (7, 0.012), (8, -0.003), (9, 0.005), (10, -0.027), (11, -0.011), (12, -0.013), (13, -0.035), (14, 0.003), (15, 0.02), (16, 0.006), (17, 0.022), (18, 0.059), (19, 0.045), (20, -0.018), (21, 0.03), (22, -0.003), (23, 0.001), (24, -0.007), (25, 0.026), (26, 0.013), (27, 0.027), (28, -0.039), (29, -0.02), (30, -0.009), (31, -0.036), (32, 0.048), (33, -0.005), (34, -0.029), (35, -0.016), (36, 0.006), (37, -0.034), (38, 0.081), (39, -0.069), (40, 0.028), (41, 0.029), (42, 0.011), (43, 0.015), (44, -0.012), (45, 0.042), (46, -0.023), (47, -0.039), (48, -0.016), (49, 0.004)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.86363882 82 nips-2010-Evaluation of Rarity of Fingerprints in Forensics

Author: Chang Su, Sargur Srihari

2 0.60770351 101 nips-2010-Gaussian sampling by local perturbations

Author: George Papandreou, Alan L. Yuille

Abstract: We present a technique for exact simulation of Gaussian Markov random fields (GMRFs), which can be interpreted as locally injecting noise to each Gaussian factor independently, followed by computing the mean/mode of the perturbed GMRF. Coupled with standard iterative techniques for the solution of symmetric positive definite systems, this yields a very efficient sampling algorithm with essentially linear complexity in terms of speed and memory requirements, well suited to extremely large scale probabilistic models. Apart from synthesizing data under a Gaussian model, the proposed technique directly leads to an efficient unbiased estimator of marginal variances. Beyond Gaussian models, the proposed algorithm is also very useful for handling highly non-Gaussian continuously-valued MRFs such as those arising in statistical image modeling or in the first layer of deep belief networks describing real-valued data, where the non-quadratic potentials coupling different sites can be represented as finite or infinite mixtures of Gaussians with the help of local or distributed latent mixture assignment variables. The Bayesian treatment of such models most naturally involves a block Gibbs sampler which alternately draws samples of the conditionally independent latent mixture assignments and the conditionally multivariate Gaussian continuous vector and we show that it can directly benefit from the proposed methods.

3 0.58255124 242 nips-2010-Slice sampling covariance hyperparameters of latent Gaussian models

Author: Iain Murray, Ryan P. Adams

4 0.56762701 103 nips-2010-Generating more realistic images using gated MRF's

Author: Marc'aurelio Ranzato, Volodymyr Mnih, Geoffrey E. Hinton

Abstract: Probabilistic models of natural images are usually evaluated by measuring performance on rather indirect tasks, such as denoising and inpainting. A more direct way to evaluate a generative model is to draw samples from it and to check whether statistical properties of the samples match the statistics of natural images. This method is seldom used with high-resolution images, because current models produce samples that are very different from natural images, as assessed by even simple visual inspection. We investigate the reasons for this failure and we show that by augmenting existing models so that there are two sets of latent variables, one set modelling pixel intensities and the other set modelling image-speciﬁc pixel covariances, we are able to generate high-resolution images that look much more realistic than before. The overall model can be interpreted as a gated MRF where both pair-wise dependencies and mean intensities of pixels are modulated by the states of latent variables. Finally, we conﬁrm that if we disallow weight-sharing between receptive ﬁelds that overlap each other, the gated MRF learns more efﬁcient internal representations, as demonstrated in several recognition tasks. 1 Introduction and Prior Work The study of the statistical properties of natural images has a long history and has inﬂuenced many ﬁelds, from image processing to computational neuroscience [1]. In this work we focus on probabilistic models of natural images. These models are useful for extracting representations [2, 3, 4] that can be used for discriminative tasks and they can also provide adaptive priors [5, 6, 7] that can be used in applications like denoising and inpainting. Our main focus, however, will be on improving the quality of the generative model, rather than exploring its possible applications. Markov Random Fields (MRF’s) provide a very general framework for modelling natural images. 
In an MRF, an image is assigned a probability which is a normalized product of potential functions, with each function typically being deﬁned over a subset of the observed variables. In this work we consider a very versatile class of MRF’s in which potential functions are deﬁned over both pixels and latent variables, thus allowing the states of the latent variables to modulate or gate the effective interactions between the pixels. This type of MRF, that we dub gated MRF, was proposed as an image model by Geman and Geman [8]. Welling et al. [9] showed how an MRF in this family1 could be learned for small image patches and their work was extended to high-resolution images by Roth and Black [6] who also demonstrated its success in some practical applications [7]. Besides their practical use, these models were speciﬁcally designed to match the statistical properties of natural images, and therefore, it seems natural to evaluate them in those terms. Indeed, several authors [10, 7] have proposed that these models should be evaluated by generating images and 1 Product of Student’s t models (without pooling) may not appear to have latent variables but each potential can be viewed as an inﬁnite mixture of zero-mean Gaussians where the inverse variance of the Gaussian is the latent variable. 1 checking whether the samples match the statistical properties observed in natural images. It is, therefore, very troublesome that none of the existing models can generate good samples, especially for high-resolution images (see for instance ﬁg. 2 in [7] which is one of the best models of highresolution images reported in the literature so far). In fact, as our experiments demonstrate the generated samples from these models are more similar to random images than to natural images! When MRF’s with gated interactions are applied to small image patches, they actually seem to work moderately well, as demonstrated by several authors [11, 12, 13]. 
The generated patches have some coherent and elongated structure and, like natural image patches, they are predominantly very smooth with sudden outbreaks of strong structure. This is unsurprising because these models have a built-in assumption that images are very smooth with occasional strong violations of smoothness [8, 14, 15]. However, the extension of these patch-based models to high-resolution images by replicating ﬁlters across the image has proven to be difﬁcult. The receptive ﬁelds that are learned no longer resemble Gabor wavelets but look random [6, 16] and the generated images lack any of the long range structure that is so typical of natural images [7]. The success of these methods in applications such as denoising is a poor measure of the quality of the generative model that has been learned: Setting the parameters to random values works almost as well for eliminating independent Gaussian noise [17], because this can be done quite well by just using a penalty for high-frequency variation. In this work, we show that the generative quality of these models can be drastically improved by jointly modelling both pixel mean intensities and pixel covariances. This can be achieved by using two sets of latent variables, one that gates pair-wise interactions between pixels and another one that sets the mean intensities of pixels, as we already proposed in some earlier work [4]. Here, we show that this modelling choice is crucial to make the gated MRF work well on high-resolution images. Finally, we show that the most widely used method of sharing weights in MRF’s for high-resolution images is overly constrained. Earlier work considered homogeneous MRF’s in which each potential is replicated at all image locations. This has the subtle effect of making learning very difﬁcult because of strong correlations at nearby sites. 
Following Gregor and LeCun [18] and also Tang and Eliasmith [19], we keep the number of parameters under control by using local potentials, but unlike Roth and Black [6] we only share weights between potentials that do not overlap.

2 Augmenting Gated MRF's with Mean Hidden Units

A Product of Student's t (PoT) model [15] is a gated MRF defined on small image patches that can be viewed as modelling image-specific, pair-wise relationships between pixel values by using the states of its latent variables. It is very good at representing the fact that two pixels have very similar intensities and no good at all at modelling what these intensities are. Failure to model the mean also leads to impoverished modelling of the covariances when the input images have non-zero mean intensity. The covariance RBM (cRBM) [20] is another model that shares the same limitation since it only differs from PoT in the distribution of its latent variables: the posterior over the latent variables is a product of Bernoulli distributions instead of Gamma distributions as in PoT.

We explain the fundamental limitation of these models by using a simple toy example: modelling two-pixel images using a cRBM with only one binary hidden unit, see fig. 1. This cRBM assumes that the conditional distribution over the input is a zero-mean Gaussian with a covariance that is determined by the state of the latent variable. Since the latent variable is binary, the cRBM can be viewed as a mixture of two zero-mean full covariance Gaussians. The latent variable uses the pairwise relationship between pixels to decide which of the two covariance matrices should be used to model each image. When the input data is pre-processed by making each image have zero mean intensity (the empirical histogram is shown in the first row and first column), most images lie near the origin because most of the time nearby pixels are strongly correlated.
Less frequently we encounter edge images that exhibit strong anti-correlation between the pixels, as shown by the long tails along the anti-diagonal line. A cRBM could model this data by using two Gaussians (first row and second column): one that is spherical and tight at the origin for smooth images and another one that has a covariance elongated along the anti-diagonal for structured images. If, however, the whole set of images is normalized by subtracting from every pixel the mean value of all pixels over all images (second row and first column), the cRBM fails at modelling structured images (second row and second column). It can fit a Gaussian to the smooth images by discovering the direction of strong correlation along the main diagonal, but it is very likely to fail to discover the direction of anti-correlation, which is crucial to represent discontinuities, because structured images with different mean intensity appear to be evenly spread over the whole input space.

Figure 1: In the first row, each image is zero mean. In the second row, the whole set of data points is centered but each image can have non-zero mean. The first column shows 8x8 images picked at random from natural images. The images in the second column are generated by a model that does not account for mean intensity. The images in the third column are generated by a model that has both "mean" and "covariance" hidden units. The contours in the first column show the negative log of the empirical distribution of (tiny) natural two-pixel images (x-axis being the first pixel and the y-axis the second pixel). The plots in the other columns are toy examples showing how each model could represent the empirical distribution using a mixture of Gaussians with components that have one of two possible covariances (corresponding to the state of a binary "covariance" latent variable). Models that can change the means of the Gaussians (mPoT and mcRBM) can better represent structured images (edge images lie along the anti-diagonal and are fitted by the Gaussians shown in red) while the other models (PoT and cRBM) fail, especially when each image can have non-zero mean.

If the model has another set of latent variables that can change the means of the Gaussian distributions in the mixture (as explained more formally below and yielding the mPoT and mcRBM models), then the model can represent both changes of mean intensity and the correlational structure of pixels (see last column). The mean latent variables effectively subtract off the relevant mean from each data-point, letting the covariance latent variable capture the covariance structure of the data. As before, the covariance latent variable needs only to select between two covariance matrices.

In fact, experiments on real 8x8 image patches confirm these conjectures. Fig. 1 shows samples drawn from PoT and mPoT. mPoT (and similarly mcRBM [4]) is not only better at modelling zero-mean images but it can also represent images that have non-zero mean intensity well.

We now describe mPoT, referring the reader to [4] for a detailed description of mcRBM. In PoT [9] the energy function is:

$$E^{PoT}(x, h^c) = \sum_i \left[ h^c_i \left( 1 + \tfrac{1}{2} (C_i^T x)^2 \right) + (1 - \gamma) \log h^c_i \right] \qquad (1)$$

where $x$ is a vectorized image patch, $h^c$ is a vector of Gamma "covariance" latent variables, $C$ is a filter bank matrix and $\gamma$ is a scalar parameter. The joint probability over input pixels and latent variables is proportional to $\exp(-E^{PoT}(x, h^c))$. Therefore, the conditional distribution over the input pixels is a zero-mean Gaussian with covariance equal to:

$$\Sigma^c = \left( C \, \mathrm{diag}(h^c) \, C^T \right)^{-1}. \qquad (2)$$

In order to make the mean of the conditional distribution non-zero, we define mPoT as the normalized product of the above zero-mean Gaussian that models the covariance and a spherical covariance Gaussian that models the mean.
The overall energy function becomes:

$$E^{mPoT}(x, h^c, h^m) = E^{PoT}(x, h^c) + E^m(x, h^m) \qquad (3)$$

where $h^m$ is another set of latent variables that are assumed to be Bernoulli distributed (but other distributions could be used). The new energy term is:

$$E^m(x, h^m) = \tfrac{1}{2} x^T x - \sum_j h^m_j W_j^T x \qquad (4)$$

yielding the following conditional distribution over the input pixels:

$$p(x \mid h^c, h^m) = N\left( \Sigma (W h^m), \Sigma \right), \quad \Sigma = \left( (\Sigma^c)^{-1} + I \right)^{-1} \qquad (5)$$

with $\Sigma^c$ defined in eq. 2. As desired, the conditional distribution has non-zero mean (footnote 2).

Figure 2: Illustration of different choices of weight-sharing scheme for a RBM. Links converging to one latent variable are filters. Filters with the same color share the same parameters. Kinds of weight-sharing scheme: A) Global, B) Local, C) TConv and D) Conv. E) TConv applied to an image. Cells correspond to neighborhoods to which filters are applied. Cells with the same color share the same parameters. F) 256 filters learned by a Gaussian RBM with TConv weight-sharing scheme on high-resolution natural images. Each filter has size 16x16 pixels and it is applied every 16 pixels in both the horizontal and vertical directions. Filters in position (i, j) and (1, 1) are applied to neighborhoods that are (i, j) pixels away from each other. Best viewed in color.

Patch-based models like PoT have been extended to high-resolution images by using spatially localized filters [6]. While we can subtract off the mean intensity from independent image patches to successfully train PoT, we cannot do that on a high-resolution image because overlapping patches might have different means. Unfortunately, replicating potentials over the image while ignoring variations of mean intensity has been the leading strategy to date [6] (footnote 3). This is the major reason why generation of high-resolution images is so poor. Sec. 4 shows that generation can be drastically improved by explicitly accounting for variations of mean intensity, as performed by mPoT and mcRBM.
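As a minimal sketch of eqs. 1, 3 and 4, the following pure-Python functions evaluate the PoT, mean and mPoT energies for a single patch. The function names, the row-wise list layout of the filter banks C and W, and the toy two-pixel values are illustrative assumptions, not the authors' code.

```python
import math


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(u * v for u, v in zip(a, b))


def pot_energy(x, h_c, C, gamma):
    """E^PoT(x, h^c) of eq. 1. C holds one filter per row (assumed layout)."""
    total = 0.0
    for h_i, c_i in zip(h_c, C):
        y_i = dot(c_i, x)  # filter response C_i^T x
        total += h_i * (1.0 + 0.5 * y_i ** 2) + (1.0 - gamma) * math.log(h_i)
    return total


def mean_energy(x, h_m, W):
    """E^m(x, h^m) of eq. 4. W holds one mean filter per row (assumed layout)."""
    return 0.5 * dot(x, x) - sum(h_j * dot(w_j, x) for h_j, w_j in zip(h_m, W))


def mpot_energy(x, h_c, h_m, C, W, gamma):
    """E^mPoT of eq. 3: sum of the covariance and mean energy terms."""
    return pot_energy(x, h_c, C, gamma) + mean_energy(x, h_m, W)


# Toy two-pixel "edge" image with hypothetical filters and latent states.
x = [1.0, -1.0]
C = [[1.0, 1.0], [1.0, -1.0]]   # sum and difference filters
W = [[1.0, -1.0]]               # one mean filter aligned with the edge
energy = mpot_energy(x, h_c=[1.0, 1.0], h_m=[1.0], C=C, W=W, gamma=1.0)
print(energy)  # 3.0
```

Switching the mean unit on lowers the energy of the edge image here (from 5.0 to 3.0), which is the mechanism the text describes: the mean units account for intensity so that the covariance units only need to model pairwise structure.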
Footnote 2: The need to model the means was clearly recognized in [21] but they used conjunctive latent features that simultaneously represented a contribution to the "precision matrix" in a specific direction and the mean along that same direction.

Footnote 3: The success of PoT-like models in Bayesian denoising is not surprising since the noisy image effectively replaces the reconstruction term from the mean hidden units (see eq. 5), providing a set of noisy mean intensities that are cleaned up by the patterns of correlation enforced by the covariance latent variables.

3 Weight-Sharing Schemes

By integrating out the latent variables, we can write the density function of any gated MRF as a normalized product of potential functions (for mPoT refer to eq. 6). In this section we investigate different ways of constraining the parameters of the potentials of a generic MRF.

Global: The obvious way to extend a patch-based model like PoT to high-resolution images is to define potentials over the whole image; we call this scheme global. This is not practical because 1) the number of parameters grows about quadratically with the size of the image making training too slow, 2) we do not need to model interactions between very distant pairs of pixels since their dependence is negligible, and 3) we would not be able to use the model on images of different size.

Conv: The most popular way to handle big images is to define potentials on small subsets of variables (e.g., neighborhoods of size 5x5 pixels) and to replicate these potentials across space while sharing their parameters at each image location [23, 24, 6]. This yields a convolutional weight-sharing scheme, also called homogeneous field in the statistics literature. This choice is justified by the stationarity of natural images. This weight-sharing scheme is extremely concise in terms of number of parameters, but also rather inefficient in terms of latent representation.
First, if there are N filters at each location and these filters are stepped by one pixel then the internal representation is about N times overcomplete. The internal representation not only has high computational cost, but it is also highly redundant. Since the input is mostly smooth and the parameters are the same across space, the latent variables are strongly correlated as well. This inefficiency turns out to be particularly harmful for a model like PoT, causing the learned filters to become "random" looking (see fig. 3-iii). A simple intuition follows from the equivalence between PoT and square ICA [15]. If the filter matrix $C$ of eq. 1 is square and invertible, we can marginalize out the latent variables and write: $p(y) = \prod_i S(y_i)$, where $y_i = C_i^T x$ and $S$ is a Student's t distribution. In other words, there is an underlying assumption that filter outputs are independent. However, if the filters of matrix $C$ are shifted and overlapping versions of each other, this clearly cannot be true. Training PoT with the Conv weight-sharing scheme forces the model to find filters that make filter outputs as independent as possible, which explains the very high-frequency patterns that are usually discovered [6].

Local: The Global and Conv weight-sharing schemes are at the two extremes of a spectrum of possibilities. For instance, we can define potentials on a small subset of input variables but, unlike Conv, each potential can have its own set of parameters, as shown in fig. 2-B. This is called local, or inhomogeneous field. Compared to Conv the number of parameters increases only slightly but the number of latent variables required and their redundancy is greatly reduced. In fact, the model learns different receptive fields at different locations as a better strategy for representing the input, especially when the number of potentials is limited (see also fig. 2-F).
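The marginalization behind the Student's t form mentioned above can be written out for the PoT energy of eq. 1. The derivation below is a sketch, assuming $\gamma > 0$ so that the Gamma integral converges, and using the shorthand $y_i = C_i^T x$ from the text.

```latex
\begin{align*}
\int_0^\infty e^{-E^{PoT}(x, h^c)} \, dh^c
  &= \prod_i \int_0^\infty (h^c_i)^{\gamma - 1}
     \exp\!\left( -h^c_i \left( 1 + \tfrac{1}{2} y_i^2 \right) \right) dh^c_i \\
  &= \prod_i \Gamma(\gamma) \left( 1 + \tfrac{1}{2} y_i^2 \right)^{-\gamma}
\end{align*}
```

Each factor is an unnormalized Student's t density in $y_i$, and its negative logarithm is the term $\gamma \log(1 + \tfrac{1}{2} y_i^2)$ that appears in the free energy of eq. 6. The product form is what encodes the independence assumption on filter outputs discussed in the text.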
TConv: Local would not allow the model to be trained and tested on images of different resolution, and it might seem wasteful not to exploit the translation invariant property of images. We therefore advocate the use of a weight-sharing scheme that we call tiled-convolutional (TConv), shown in fig. 2-C and E [18]. Each filter tiles the image without overlaps with copies of itself (i.e. the stride equals the filter diameter). This reduces spatial redundancy of latent variables and allows the input images to have arbitrary size. At the same time, different filters do overlap with each other in order to avoid tiling artifacts. Fig. 2-F shows filters that were (jointly) learned by a Restricted Boltzmann Machine (RBM) [29] with Gaussian input variables using the TConv weight-sharing scheme.

4 Experiments

We train gated MRF's with and without mean hidden units using different weight-sharing schemes. The training procedure is very similar in all cases. We perform approximate maximum likelihood by using Fast Persistent Contrastive Divergence (FPCD) [25] and we draw samples by using Hybrid Monte Carlo (HMC) [26]. Since all latent variables can be exactly marginalized out we can use HMC on the free energy (negative logarithm of the marginal distribution over the input pixels). For mPoT this is:

$$F^{mPoT}(x) = -\log p(x) + \mathrm{const.} = \sum_{k,i} \gamma \log\left( 1 + \tfrac{1}{2} (C_{ik}^T x_k)^2 \right) + \tfrac{1}{2} x^T x - \sum_{k,j} \log\left( 1 + \exp(W_{jk}^T x_k) \right) \qquad (6)$$

where the index $k$ runs over spatial locations and $x_k$ is the k-th image patch. FPCD keeps samples, called negative particles, that it uses to represent the model distribution. These particles are all updated after each weight update. For each mini-batch of data-points a) we compute the derivative of the free energy w.r.t. the training samples, b) we update the negative particles by running HMC for one HMC step consisting of 20 leapfrog steps.
We start at the previous set of negative particles and use as parameters the sum of the regular parameters and a small perturbation vector, c) we compute the derivative of the free energy at the negative particles, and d) we update the regular parameters by using the difference of gradients between step a) and c) while the perturbation vector is updated using the gradient from c) only. The perturbation is also strongly decayed to zero and is subject to a larger learning rate. The aim is to encourage the negative particles to explore the space more quickly by slightly and temporarily raising the energy at their current position. Note that the use of FPCD as opposed to other estimation methods (like Persistent Contrastive Divergence [27]) turns out to be crucial to achieve good mixing of the sampler even after training.

We train on mini-batches of 32 samples using gray-scale images of approximate size 160x160 pixels randomly cropped from the Berkeley segmentation dataset [28]. We perform 160,000 weight updates, decreasing the learning rate by a factor of 4 by the end of training. The initial learning rate is set to 0.1 for the covariance filters (matrix $C$ of eq. 1), 0.01 for the mean parameters (matrix $W$ of eq. 4), and 0.001 for the other parameters ($\gamma$ of eq. 1).

Figure 3: 160x160 samples drawn by A) mPoT-TConv, B) mHPoT-TConv, C) mcRBM-TConv and D) PoT-TConv. On the side also i) a subset of 8x8 "covariance" filters learned by mPoT-TConv (the plot below shows how the whole set of filters tile a small patch; each bar corresponds to a Gabor fit of a filter and colors identify filters applied at the same 8x8 location, each group is shifted by 2 pixels down the diagonal and a high-resolution image is tiled by replicating this pattern every 8 pixels horizontally and vertically), ii) a subset of 8x8 "mean" filters learned by the same mPoT-TConv, iii) filters learned by PoT-Conv and iv) by PoT-TConv.
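HMC needs both the free energy and its gradient with respect to the pixels. The sketch below evaluates eq. 6 and that gradient for a single patch location (one value of k); the function names, the row-wise filter layout and the restriction to one location are simplifying assumptions made for illustration, and the code is not the authors' implementation.

```python
import math


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(u * v for u, v in zip(a, b))


def free_energy(x, C, W, gamma):
    """Free energy of eq. 6 restricted to a single patch location."""
    cov_term = sum(gamma * math.log(1.0 + 0.5 * dot(c, x) ** 2) for c in C)
    quad_term = 0.5 * dot(x, x)
    mean_term = sum(math.log1p(math.exp(dot(w, x))) for w in W)
    return cov_term + quad_term - mean_term


def free_energy_grad(x, C, W, gamma):
    """Gradient of free_energy with respect to the pixels x."""
    grad = list(x)  # gradient of 0.5 * x^T x
    for c in C:
        y = dot(c, x)
        scale = gamma * y / (1.0 + 0.5 * y * y)
        grad = [g + scale * ci for g, ci in zip(grad, c)]
    for w in W:
        s = 1.0 / (1.0 + math.exp(-dot(w, x)))  # logistic sigmoid of W_j^T x
        grad = [g - s * wi for g, wi in zip(grad, w)]
    return grad


# Hypothetical two-pixel example.
C = [[1.0, 1.0], [1.0, -1.0]]
W = [[1.0, 1.0]]
print(free_energy([1.0, -1.0], C, W, gamma=1.0))
print(free_energy_grad([1.0, -1.0], C, W, gamma=1.0))
```

A leapfrog step would use `free_energy_grad` to update the momentum and `free_energy` in the accept/reject test. For large filter responses `math.exp` can overflow, so a practical version would use a numerically stable softplus.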
During training we condition on the borders and initialize the negative particles at zero in order to avoid artifacts at the border of the image. We learn 8x8 filters and pre-multiply the covariance filters by a whitening transform retaining 99% of the variance; we also normalize the norm of the covariance filters to prevent some of them from decaying to zero during training (footnote 4). Whenever we use the TConv weight-sharing scheme the model learns covariance filters that mostly resemble localized and oriented Gabor functions (see fig. 3-i and iv), while the Conv weight-sharing scheme learns structured but poorly localized high-frequency patterns (see fig. 3-iii) [6]. The TConv models re-use the same 8x8 filters every 8 pixels and apply a diagonal offset of 2 pixels between neighboring filters with different weights in order to reduce tiling artifacts. There are 4 sets of filters, each with 64 filters for a total of 256 covariance filters (see bottom plot of fig. 3). Similarly, we have 4 sets of mean filters, each with 32 filters. These filters usually have non-zero mean and exhibit on-center off-surround and off-center on-surround patterns, see fig. 3-ii.

In order to draw samples from the learned models, we run HMC for a long time (10,000 iterations, each composed of 20 leapfrog steps). Some samples of size 160x160 pixels are reported in fig. 3 A)-D). Without modelling the mean intensity, samples lack structure and do not seem much different from those that would be generated by a simple Gaussian model merely fitting the second order statistics (see fig. 3 in [1] and also fig. 2 in [7]). By contrast, structure, sharp boundaries and some simple texture emerge only from models that have mean latent variables, namely mcRBM, mPoT and mHPoT, which differs from mPoT by having a second layer pooling matrix on the squared covariance filter outputs [11].

A more quantitative comparison is reported in table 1.
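The TConv layout described above (8x8 filters re-used every 8 pixels, 4 filter sets, a diagonal offset of 2 pixels between sets) can be sketched as a list of top-left coordinates per filter set. The function name is hypothetical, and keeping only positions where the filter fits entirely inside the image is an assumption; the paper conditions on the borders and does not spell out the boundary handling.

```python
def tconv_positions(height, width, filter_size=8, num_sets=4, offset=2):
    """Top-left corners at which each filter set is applied under TConv.

    Set s is shifted by s * offset pixels down the diagonal, and within a
    set the stride equals the filter size, so copies of one filter never
    overlap. Only positions fully inside the image are kept (assumption).
    """
    layout = {}
    for s in range(num_sets):
        start = s * offset
        rows = range(start, height - filter_size + 1, filter_size)
        cols = range(start, width - filter_size + 1, filter_size)
        layout[s] = [(r, c) for r in rows for c in cols]
    return layout


layout = tconv_positions(160, 160)
for s, positions in layout.items():
    print(s, positions[0], len(positions))
```

For a 160x160 image the unshifted set covers a 20x20 grid of locations, while each shifted set fits 19x19 locations under the boundary assumption above. Filters from different sets overlap each other, which is what the text relies on to reduce tiling artifacts.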
We first compute marginal statistics of filter responses using the generated images, natural images from the test set, and random images. The statistics are the normalized histogram of individual filter responses to 24 Gabor filters (8 orientations and 3 scales). We then calculate the KL divergence between the histograms on random images and generated images and the KL divergence between the histograms on natural images and generated images. The table also reports the average difference of energies between random images and natural images. All results demonstrate that models that account for mean intensity generate images that are closer to natural images than to random images, whereas models that do not account for the mean (like the widely used PoT-Conv) produce samples that are actually closer to random images.

Footnote 4: The code used in the experiments can be found at the first author's web-page.

MODEL            F(R) - F(T) (10^4)   KL(R || G)   KL(T || G)   KL(R || G) - KL(T || G)
PoT - Conv       2.9                  0.3          0.6          -0.3
PoT - TConv      2.8                  0.4          1.0          -0.6
mPoT - TConv     5.2                  1.0          0.2           0.8
mHPoT - TConv    4.9                  1.7          0.8           0.9
mcRBM - TConv    3.5                  1.5          1.0           0.5

Table 1: Comparing MRF's by measuring: difference of energy (negative log ratio of probabilities) between random images (R) and test natural images (T), the KL divergence between statistics of random images (R) and generated images (G), KL divergence between statistics of test natural images (T) and generated images (G), and difference of these two KL divergences. Statistics are computed using 24 Gabor filters.

4.1 Discriminative Experiments on Weight-Sharing Schemes

In future work, we intend to use the features discovered by the generative model for recognition. To understand how the different weight-sharing schemes affect recognition performance we have done preliminary tests using the discriminative performance of a simpler model on simpler data. We consider one of the simplest and most versatile models, namely the RBM [29].
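The histogram comparison behind the KL columns of table 1 can be sketched as follows. The binning range, the number of bins and the small smoothing constant are assumptions introduced for illustration; the paper does not specify them, and the function names are hypothetical.

```python
import math


def normalized_histogram(values, lo, hi, num_bins):
    """Histogram of filter responses over [lo, hi], normalized to sum to 1."""
    counts = [0] * num_bins
    width = (hi - lo) / num_bins
    for v in values:
        if lo <= v <= hi:
            # Clamp so that v == hi falls into the last bin.
            counts[min(int((v - lo) / width), num_bins - 1)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the histogram range")
    return [c / total for c in counts]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two normalized histograms.

    eps guards against empty bins in q (an assumption, not from the paper).
    """
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0.0)


# Hypothetical responses of one Gabor filter on two image sets.
natural = normalized_histogram([0.1, 0.2, 0.6, 0.9], 0.0, 1.0, 2)
generated = normalized_histogram([0.1, 0.6, 0.7, 0.8], 0.0, 1.0, 2)
print(kl_divergence(natural, generated))
```

In the setting of table 1 one such divergence would be computed per Gabor filter for the pairs (R, G) and (T, G); how the 24 per-filter values are aggregated is not stated in the text, so that step is left out here.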
Since we also aim to test the Global weight-sharing scheme we are constrained to using fairly low resolution datasets such as the MNIST dataset of handwritten digits [30] and the CIFAR 10 dataset of generic object categories [22]. The MNIST dataset has soft binary images of size 28x28 pixels, while the CIFAR 10 dataset has color images of size 32x32 pixels. CIFAR 10 has 10 classes, 5000 training samples per class and 1000 test samples per class. MNIST also has 10 classes with, on average, 6000 training samples per class and 1000 test samples per class. The energy function of the RBM trained on the CIFAR 10 dataset, modelling input pixels with 3 (R,G,B) Gaussian variables [31], is exactly the one shown in eq. 4; while the RBM trained on MNIST uses logistic units for the pixels and the energy function is again the same as before but without any quadratic term. All models are trained in an unsupervised way to approximately maximize the likelihood in the training set using Contrastive Divergence [32]. They are then used to represent each input image with a feature vector (mean of the posterior over the latent variables) which is fed to a multinomial logistic classiﬁer for discrimination. Models are compared in terms of: 1) recognition accuracy, 2) convergence time and 3) dimensionality of the representation. In general, assuming ﬁlters much smaller than the input image and assuming equal number of latent variables, Conv, TConv and Local models process each sample faster than Global by a factor approximately equal to the ratio between the area of the image and the area of the ﬁlters, which can be very large in practice. In the ﬁrst set of experiments reported on the left of ﬁg. 4 we study the internal representation in terms of discrimination and dimensionality using the MNIST dataset. For each choice of dimensionality all models are trained using the same number of operations. 
This is set to the amount necessary to complete one epoch over the training set using the Global model. This experiment shows that: 1) Local outperforms all other weight-sharing schemes for a wide range of dimensionalities, 2) TConv does not perform as well as Local, probably because the translation invariant assumption is clearly violated for these relatively small, centered images, 3) Conv performs well only when the internal representation is very high dimensional (10 times overcomplete), otherwise it severely underfits, 4) Global performs well when the representation is compact but its performance degrades rapidly as this increases because it needs more than the allotted training time. The right hand side of fig. 4 shows how the recognition performance evolves as we increase the number of operations (or training time) using models that produce a twice overcomplete internal representation. With only very few filters Conv still underfits and it does not improve its performance by training for longer, but Global does improve and eventually it reaches the performance of Local. If we look at the crossing of the error rate at 2% we can see that Local is about 4 times faster than Global.

Figure 4: Experiments on MNIST using RBM's with different weight-sharing schemes. Left: Error rate as a function of the dimensionality of the latent representation. Right: Error rate as a function of the number of operations (normalized to those needed to perform one epoch in the Global model); all models have a twice overcomplete latent representation.

To summarize, Local provides more compact representations than Conv, is much faster than Global while achieving similar performance in discrimination.
Also, Local can easily scale to larger images while Global cannot.

Similar experiments are performed using the CIFAR 10 dataset [22] of natural images. Using the same protocol introduced in earlier work by Krizhevsky [22], the RBM's are trained in an unsupervised way on a subset of the 80 million tiny images dataset [33] and then "fine-tuned" on the CIFAR 10 dataset by supervised back-propagation of the error through the linear classifier and feature extractor. All models produce an approximately 10,000 dimensional internal representation to make a fair comparison. Models using local filters learn 16x16 filters that are stepped every pixel. Again, we do not experiment with the TConv weight-sharing scheme because the image is not large enough to allow enough replicas. Similarly to fig. 3-iii the Conv weight-sharing scheme was very difficult to train and did not produce Gabor-like features. Indeed, careful injection of sparsity and long training time seem necessary [31] for these RBM's. By contrast, both Local and Global produce Gabor-like filters similar to those shown in fig. 2 F). The model trained with Conv weight-sharing scheme yields an accuracy equal to 56.6%, while Local and Global yield much better performance, 63.6% and 64.8% [22], respectively. Although Local and Global have similar performance, training with the Local weight-sharing scheme took under an hour while using the Global weight-sharing scheme required more than a day.

5 Conclusions and Future Work

This work is motivated by the poor generative quality of currently popular MRF models of natural images. These models generate images that are actually more similar to white noise than to natural images. Our contribution is to recognize that current models can benefit from 1) the addition of a simple model of the mean intensities and from 2) the use of a less constrained weight-sharing scheme.
By augmenting these models with an extra set of latent variables that model mean intensity we can generate samples that look much more realistic: they are characterized by smooth regions, sharp boundaries and some simple high frequency texture. We validate our approach by comparing the statistics of filter outputs on natural images and generated images. In the future, we plan to integrate these MRF's into deeper hierarchical models and to use their internal representation to perform object recognition in high-resolution images. The hope is to further improve generation by capturing longer range dependencies and to exploit this to better cope with missing values and ambiguous sensory inputs.

References

[1] E.P. Simoncelli. Statistical modeling of photographic images. Handbook of Image and Video Processing, pages 431–441, 2005.
[2] A. Hyvarinen, J. Karhunen, and E. Oja. Independent Component Analysis. John Wiley & Sons, 2001.
[3] G.E. Hinton and R.R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.
[4] M. Ranzato and G.E. Hinton. Modeling pixel means and covariances using factorized third-order Boltzmann machines. In CVPR, 2010.
[5] M.J. Wainwright and E.P. Simoncelli. Scale mixtures of Gaussians and the statistics of natural images. In NIPS, 2000.
[6] S. Roth and M.J. Black. Fields of experts: A framework for learning image priors. In CVPR, 2005.
[7] U. Schmidt, Q. Gao, and S. Roth. A generative perspective on MRFs in low-level vision. In CVPR, 2010.
[8] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. PAMI, 6:721–741, 1984.
[9] M. Welling, G.E. Hinton, and S. Osindero. Learning sparse topographic representations with products of Student-t distributions. In NIPS, 2003.
[10] S.C. Zhu and D. Mumford. Prior learning and Gibbs reaction diffusion. PAMI, pages 1236–1250, 1997.
[11] S. Osindero, M. Welling, and G.E. Hinton. Topographic product models applied to natural scene statistics. Neural Comp., 18:344–381, 2006.
[12] S. Osindero and G.E. Hinton. Modeling image patches with a directed hierarchy of Markov random fields. In NIPS, 2008.
[13] Y. Karklin and M.S. Lewicki. Emergence of complex cell properties by learning to generalize in natural scenes. Nature, 457:83–86, 2009.
[14] B.A. Olshausen and D.J. Field. Sparse coding with an overcomplete basis set: a strategy employed by V1? Vision Research, 37:3311–3325, 1997.
[15] Y.W. Teh, M. Welling, S. Osindero, and G.E. Hinton. Energy-based models for sparse overcomplete representations. JMLR, 4:1235–1260, 2003.
[16] Y. Weiss and W.T. Freeman. What makes a good model of natural images? In CVPR, 2007.
[17] S. Roth and M.J. Black. Fields of experts. Int. Journal of Computer Vision, 82:205–229, 2009.
[18] K. Gregor and Y. LeCun. Emergence of complex-like cells in a temporal product network with local receptive fields. arXiv:1006.0448, 2010.
[19] C. Tang and C. Eliasmith. Deep networks for robust visual recognition. In ICML, 2010.
[20] M. Ranzato, A. Krizhevsky, and G.E. Hinton. Factored 3-way restricted Boltzmann machines for modeling natural images. In AISTATS, 2010.
[21] N. Heess, C.K.I. Williams, and G.E. Hinton. Learning generative texture models with extended fields-of-experts. In BMVC, 2009.
[22] A. Krizhevsky. Learning multiple layers of features from tiny images, 2009. MSc Thesis, Dept. of Comp. Science, Univ. of Toronto.
[23] A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, and K. Lang. Phoneme recognition using time-delay neural networks. IEEE Acoustics Speech and Signal Proc., 37:328–339, 1989.
[24] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
[25] T. Tieleman and G.E. Hinton. Using fast weights to improve persistent contrastive divergence. In ICML, 2009.
[26] R.M. Neal. Bayesian learning for neural networks. Springer-Verlag, 1996.
[27] T. Tieleman. Training restricted Boltzmann machines using approximations to the likelihood gradient. In ICML, 2008.
[28] http://www.cs.berkeley.edu/projects/vision/grouping/segbench/.
[29] M. Welling, M. Rosen-Zvi, and G.E. Hinton. Exponential family harmoniums with an application to information retrieval. In NIPS, 2005.
[30] http://yann.lecun.com/exdb/mnist/.
[31] H. Lee, R. Grosse, R. Ranganath, and A.Y. Ng. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proc. ICML, 2009.
[32] G.E. Hinton. Training products of experts by minimizing contrastive divergence. Neural Computation, 14:1771–1800, 2002.
[33] A. Torralba, R. Fergus, and W.T. Freeman. 80 million tiny images: a large dataset for non-parametric object and scene recognition. PAMI, 30:1958–1970, 2008.

5 0.55096292 95 nips-2010-Feature Transitions with Saccadic Search: Size, Color, and Orientation Are Not Alike

Author: Stella X. Yu

Abstract: Size, color, and orientation have long been considered elementary features whose attributes are extracted in parallel and available to guide the deployment of attention. If each is processed in the same fashion with simply a different set of local detectors, one would expect similar search behaviours on localizing an equivalent flickering change among identically laid out disks. We analyze feature transitions associated with saccadic search and find out that size, color, and orientation are not alike in dynamic attribute processing over time. The Markovian feature transition is attractive for size, repulsive for color, and largely reversible for orientation.

6 0.49959973 245 nips-2010-Space-Variant Single-Image Blind Deconvolution for Removing Camera Shake

7 0.48728794 156 nips-2010-Learning to combine foveal glimpses with a third-order Boltzmann machine

8 0.48665202 149 nips-2010-Learning To Count Objects in Images

9 0.4839491 113 nips-2010-Heavy-Tailed Process Priors for Selective Shrinkage

10 0.48365128 266 nips-2010-The Maximal Causes of Natural Scenes are Edge Filters

11 0.47487152 224 nips-2010-Regularized estimation of image statistics by Score Matching

12 0.46659455 33 nips-2010-Approximate inference in continuous time Gaussian-Jump processes

13 0.44939157 17 nips-2010-A biologically plausible network for the computation of orientation dominance

14 0.44731867 256 nips-2010-Structural epitome: a way to summarize one’s visual experience

15 0.44571596 153 nips-2010-Learning invariant features using the Transformed Indian Buffet Process

16 0.43697947 3 nips-2010-A Bayesian Framework for Figure-Ground Interpretation

17 0.4347403 100 nips-2010-Gaussian Process Preference Elicitation

18 0.43000939 234 nips-2010-Segmentation as Maximum-Weight Independent Set

19 0.42042309 262 nips-2010-Switched Latent Force Models for Movement Segmentation

20 0.41272828 77 nips-2010-Epitome driven 3-D Diffusion Tensor image segmentation: on extracting specific structures

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(1, 0.355), (13, 0.016), (17, 0.023), (27, 0.06), (30, 0.049), (35, 0.03), (45, 0.165), (50, 0.04), (52, 0.034), (60, 0.028), (77, 0.027), (90, 0.037)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.79083467 252 nips-2010-SpikeAnts, a spiking neuron network modelling the emergence of organization in a complex system

Author: Sylvain Chevallier, Hélène Paugam-Moisy, Michele Sebag

Abstract: Many complex systems, ranging from neural cell assemblies to insect societies, involve and rely on some division of labor. How to enforce such a division in a decentralized and distributed way is tackled in this paper, using a spiking neuron network architecture. Specifically, a spatio-temporal model called SpikeAnts is shown to enforce the emergence of synchronized activities in an ant colony. Each ant is modelled from two spiking neurons; the ant colony is a sparsely connected spiking neuron network. Each ant makes its decision (among foraging, sleeping and self-grooming) from the competition between its two neurons, after the signals received from its neighbor ants. Interestingly, three types of temporal patterns emerge in the ant colony: asynchronous, synchronous, and synchronous periodic foraging activities, similar to the actual behavior of some living ant colonies. A phase diagram of the emergent activity patterns with respect to two control parameters, respectively accounting for ant sociability and receptivity, is presented and discussed.

same-paper 2 0.69416726 82 nips-2010-Evaluation of Rarity of Fingerprints in Forensics

Author: Chang Su, Sargur Srihari

3 0.54348993 86 nips-2010-Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach

Author: Alessandro Bergamo, Lorenzo Torresani

Abstract: Most current image categorization methods require large collections of manually annotated training examples to learn accurate visual recognition models. The time-consuming human labeling effort effectively limits these approaches to recognition problems involving a small number of different object classes. In order to address this shortcoming, in recent years several authors have proposed to learn object classifiers from weakly-labeled Internet images, such as photos retrieved by keyword-based image search engines. While this strategy eliminates the need for human supervision, the recognition accuracies of these methods are considerably lower than those obtained with fully-supervised approaches, because of the noisy nature of the labels associated with Web data. In this paper we investigate and compare methods that learn image classifiers by combining very few manually annotated examples (e.g., 1-10 images per class) and a large number of weakly-labeled Web photos retrieved using keyword-based image search. We cast this as a domain adaptation problem: given a few strongly-labeled examples in a target domain (the manually annotated examples) and many source domain examples (the weakly-labeled Web photos), learn classifiers yielding small generalization error on the target domain. Our experiments demonstrate that, for the same number of strongly-labeled examples, our domain adaptation approach produces significant recognition rate improvements over the best published results (e.g., 65% better when using 5 labeled training examples per class) and that our classifiers are one order of magnitude faster to learn and to evaluate than the best competing method, despite our use of large weakly-labeled data sets.

4 0.47950104 51 nips-2010-Construction of Dependent Dirichlet Processes based on Poisson Processes

Author: Dahua Lin, Eric Grimson, John W. Fisher

Abstract: We present a novel method for constructing dependent Dirichlet processes. The approach exploits the intrinsic relationship between Dirichlet and Poisson processes in order to create a Markov chain of Dirichlet processes suitable for use as a prior over evolving mixture models. The method allows for the creation, removal, and location variation of component models over time while maintaining the property that the random measures are marginally DP distributed. Additionally, we derive a Gibbs sampling algorithm for model inference and test it on both synthetic and real data. Empirical results demonstrate that the approach is effective in estimating dynamically varying mixture models.

5 0.47936401 109 nips-2010-Group Sparse Coding with a Laplacian Scale Mixture Prior

Author: Pierre Garrigues, Bruno A. Olshausen

Abstract: We propose a class of sparse coding models that utilizes a Laplacian Scale Mixture (LSM) prior to model dependencies among coefficients. Each coefficient is modeled as a Laplacian distribution with a variable scale parameter, with a Gamma distribution prior over the scale parameter. We show that, due to the conjugacy of the Gamma prior, it is possible to derive efficient inference procedures for both the coefficients and the scale parameter. When the scale parameters of a group of coefficients are combined into a single variable, it is possible to describe the dependencies that occur due to common amplitude fluctuations among coefficients, which have been shown to constitute a large fraction of the redundancy in natural images [1]. We show that, as a consequence of this group sparse coding, the resulting inference of the coefficients follows a divisive normalization rule, and that this may be efficiently implemented in a network architecture similar to that which has been proposed to occur in primary visual cortex. We also demonstrate improvements in image coding and compressive sensing recovery using the LSM model.

6 0.47916475 155 nips-2010-Learning the context of a category

7 0.4769522 186 nips-2010-Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification

8 0.47581825 260 nips-2010-Sufficient Conditions for Generating Group Level Sparsity in a Robust Minimax Framework

9 0.47556546 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing

10 0.47453839 268 nips-2010-The Neural Costs of Optimal Control

11 0.47449872 87 nips-2010-Extended Bayesian Information Criteria for Gaussian Graphical Models

12 0.47362387 282 nips-2010-Variable margin losses for classifier design

13 0.47355744 63 nips-2010-Distributed Dual Averaging In Networks

14 0.4730542 148 nips-2010-Learning Networks of Stochastic Differential Equations

15 0.47299621 117 nips-2010-Identifying graph-structured activation patterns in networks

16 0.47298393 280 nips-2010-Unsupervised Kernel Dimension Reduction

17 0.47269338 7 nips-2010-A Family of Penalty Functions for Structured Sparsity

18 0.47259459 199 nips-2010-Optimal learning rates for Kernel Conjugate Gradient regression

19 0.47254121 85 nips-2010-Exact learning curves for Gaussian process regression on large random graphs

20 0.47240269 80 nips-2010-Estimation of Renyi Entropy and Mutual Information Based on Generalized Nearest-Neighbor Graphs