nips nips2004 nips2004-134 knowledge-graph by maker-knowledge-mining

134 nips-2004-Object Classification from a Single Example Utilizing Class Relevance Metrics


Source: pdf

Author: Michael Fink

Abstract: We describe a framework for learning an object classifier from a single example. This goal is achieved by emphasizing the relevant dimensions for classification using available examples of related classes. Learning to accurately classify objects from a single training example is often unfeasible due to overfitting effects. However, if the instance representation provides that the distance between each two instances of the same class is smaller than the distance between any two instances from different classes, then a nearest neighbor classifier could achieve perfect performance with a single training example. We therefore suggest a two stage strategy. First, learn a metric over the instances that achieves the distance criterion mentioned above, from available examples of other related classes. Then, using the single examples, define a nearest neighbor classifier where distance is evaluated by the learned class relevance metric. Finding a metric that emphasizes the relevant dimensions for classification might not be possible when restricted to linear projections. We therefore make use of a kernel based metric learning algorithm. Our setting encodes object instances as sets of locality based descriptors and adopts an appropriate image kernel for the class relevance metric learning. The proposed framework for learning from a single example is demonstrated in a synthetic setting and on a character classification task. 1

Reference: text


Summary: the most important sentenses genereted by tfidf model

sentIndex sentText sentNum sentScore

1 This goal is achieved by emphasizing the relevant dimensions for classification using available examples of related classes. [sent-4, score-0.383]

2 However, if the instance representation provides that the distance between each two instances of the same class is smaller than the distance between any two instances from different classes, then a nearest neighbor classifier could achieve perfect performance with a single training example. [sent-6, score-1.432]

3 First, learn a metric over the instances that achieves the distance criterion mentioned above, from available examples of other related classes. [sent-8, score-0.452]

4 Then, using the single examples, define a nearest neighbor classifier where distance is evaluated by the learned class relevance metric. [sent-9, score-1.0]

5 Our setting encodes object instances as sets of locality based descriptors and adopts an appropriate image kernel for the class relevance metric learning. [sent-12, score-1.483]

6 1 Introduction We describe a framework for learning to accurately discriminate between two target classes of objects (e. [sent-14, score-0.441]

7 In general, learning to accurately classify object images from a single training example is unfeasible due to overfitting effects of high dimensional data. [sent-17, score-0.545]

8 However, if a certain distance function over the instances guarantees that all within-class distances are smaller than any betweenclass distance, then nearest neighbor classification could achieve perfect performance with a single training example. [sent-18, score-0.794]

9 First, learn from available examples of other related classes (like beavers, skunks and marmots), a metric over the instance space that satisfies the distance criterion mentioned above. [sent-20, score-0.691]

10 Then, define a nearest neighbor classifier based on the single examples. [sent-21, score-0.393]

11 This nearest neighbor classifier calculates distance using the class relevance metric. [sent-22, score-0.93]

12 The difficulty in achieving robust object classification emerges from the instance variety of object appearance. [sent-23, score-0.484]

13 This variability results from both class relevant and class non-relevant dimensions. [sent-24, score-0.451]

14 For example, adding a stroke crossing the digit 7, adds variability due to a class relevant dimension (better discriminating 7’s from 1’s), while italic writing adds variability in a class irrelevant dimension. [sent-25, score-0.541]

15 Since such guiding heuristics may be absent or misleading, object classification systems often use numerous positive examples for training, in an attempt to manage within class variability. [sent-29, score-0.426]

16 We are guided by the observation that in many settings providing an extended training set of certain classes might be costly or impossible due to scarcity of examples, thus motivating methods that suffice with few training examples. [sent-30, score-0.443]

17 Recently, it has been demonstrated that objects share distribution densities on deformation transforms [13], shape or appearance [6]; and that objects could be detected by a common set of reusable features [1, 18]. [sent-35, score-0.427]

18 We suggest that in many visual tasks it is natural to assume that one common set of constraints characterized a common set of relevant and non-relevant dimensions shared by a specific family of related classes [10]. [sent-36, score-0.571]

19 4 a kernel based method for learning a pseudo-metric that is capable of emphasizing the relevant dimensions and diminishing the overfitting effects of non-relevant dimensions. [sent-43, score-0.536]

20 By projecting the single examples using this class relevance pseudo-metric, learning from a single example becomes feasible. [sent-44, score-0.795]

21 5, adopts shape context descriptors [3] of Latin letters to demonstrate the feasibility of learning from a single example. [sent-46, score-0.748]

22 2 Problem Setting Let X be our object instance space and let u and v indicate two classes defined over X . [sent-48, score-0.534]

23 Our goal is to generate a classifier h(x) which discriminates between instances of the two object classes u and v. [sent-49, score-0.622]

24 Thus, every x in our instance space is characterized by the set {lj , pi }k where j j=1 i i 1 i lj is a locality based descriptor calculated at location pj of image i . [sent-52, score-0.643]

25 Our method uses a single instance from classes u and v as well as instances from other related classes. [sent-55, score-0.599]

26 This assumption could be relaxed as demonstrated in [16, 19] • A single example of class u, (x, u) and a single example of class v, (x, v) • An extended sample {(x1 , y1 ), . [sent-62, score-0.595]

27 Recall that our goal is to generate a classifier h(x) which discriminates between instances of the two object classes u and v. [sent-67, score-0.622]

28 In general, learning from a single example is prone to overfitting, yet if a set of classes is γ separated, a single example is sufficient for a nearest neighbor classifier to achieve perfect performance. [sent-68, score-0.815]

29 Learn from the extended sample a distance function d that achieves γ separation on classes y ∈ {u, v}. [sent-70, score-0.613]

30 Learn a nearest neighbor classifier h(x) from the single examples, where the classifier employs d for evaluating distances. [sent-72, score-0.393]

31 We informally state that analogously, if we find a distance function d such that q − 2 classes that form the extended sample are separated by a large γ with respect to d, with high probability classes u and v should exhibit the separation characteristic as well. [sent-77, score-0.979]

32 If these assumptions hold and d indeed induces γ separation on classes u and v, then a nearest neighbor classifier would generalize well from a single training example of the target classes. [sent-78, score-0.955]

33 Classifying instances in the original instance space by comparing them to the target classes’ single examples x and x , leads to overfitting. [sent-82, score-0.494]

34 In contrast, our approach projects the instance space by W and only then applies a nearest neighbor distance measurement to the projected single examples W x and W x . [sent-83, score-0.797]

35 Our method relies on the distance d, parameterized by W , to achieve γ separation on classes u and v. [sent-84, score-0.561]

36 In certain problems it is not possible to achieve γ separation by using a distance function which is based on a linear transformation of the instance space. [sent-85, score-0.385]

37 3 A Principal Angles Image Kernel We dedicate this section to describe a Mercer kernel between sets of locality based imi age features {lj , pi }k encoded as n × k matrices. [sent-87, score-0.375]

38 Although potentially advantageous in j j=1 many applications, one shortcoming in adopting locality based feature descriptors lays in the vagueness of matching two sets of corresponding locations pi , pi selected from difj j ferent object images i and i (see Fig. [sent-88, score-0.796]

39 Recently attempts have been made to tackle this problem [19], we choose to follow [20] by adopting the principal angles kernel approach that implicitly maps x of size n × k to a significantly higher n -dimensional feature space k φ(x) ∈ F . [sent-90, score-0.369]

40 One advantage of the principal angels kernel emerges from its invariance to column permutations of the instance matrices x i and xi , thus circumventing the location matching problem stated above. [sent-97, score-0.424]

41 Extensions of the principal angles kernel that have the additional capacity to incorporate knowledge on the accurate location matching, might enhance the kernel’s descriptive power [16]. [sent-98, score-0.449]

42 4 Learning a Class Relevance Pseudo-Metric In this section we describe the two stage framework for learning from a single example to accurately classify classes u and v. [sent-99, score-0.466]

43 We focus on transferring information from the extended sample of classes y ∈ {u, v} in the form of a learned pseudo-metric over X . [sent-100, score-0.368]

44 For sake of / clarity we will start by temporarily referring to the instance space X as a vector space, but later return to our original definition of instances in X as being matrices which columns i encode a selected set of locality based descriptors {lj , pi }k . [sent-101, score-0.775]

45 We now restate our goal as that of using the extended sample of classes y ∈ {u, v} in order to find a linear projection W that achieves γ separation / by emphasizing the relevant dimensions for classification and diminishing the overfitting effects of non-relevant dimensions. [sent-106, score-0.983]

46 Several linear methods exist for finding a class relevance projection [2, 9], some of which have a kernel based variant [12]. [sent-107, score-0.717]

47 2 demonstrates how a class relevance pseudo-metric enables training a nearest neighbor classifier from a single example of two classes in a synthetic two dimensional setting. [sent-112, score-1.315]

48 Figure 2: A synthetic sample of six obliquely oriented classes in a two dimensional space (left). [sent-113, score-0.486]

49 A class relevance metric is calculated from the (m = 200) examples of the four classes y ∈ {u, v} / marked in gray. [sent-114, score-0.98]

50 The examples of the target classes u and v, indicated in black, are not used in calculating the metric. [sent-115, score-0.51]

51 After learning the pseudo-metric, all the instances of the six classes are projected to the class relevance space. [sent-116, score-1.033]

52 Here distance measurements are performed between the four classes y ∈ {u, v}. [sent-117, score-0.405]

53 Throughout the / paper distance matrix indices are ordered by class so γ separated classes should appear as block diagonal matrices. [sent-119, score-0.612]

54 Although not participating in calculating the pseudo-metric, classes u and v exhibit γ separation (center-bottom). [sent-120, score-0.546]

55 After the class relevance projection, a nearest neighbor classifier will generalize well from any single example of classes u and v (right). [sent-121, score-1.194]

56 These operations have no clear interpretation when applied to our representation of objects as sets of locality based i descriptors {lj , pi }k . [sent-123, score-0.635]

57 3 demonstrates how a class relevance pseudometric enables training from a single example in a classification problem, where nonlinear projection of the instance space is required for achieving a γ margin. [sent-129, score-0.854]

58 e, n, t, f, h and c) are selected as target classes for our experiment (see examples in Fig. [sent-132, score-0.492]

59 Our second representation adopts the shape context descriptors for object encoding. [sent-137, score-0.784]

60 This representation relies on a set of 40 locations pj randomly sampled from the object contour. [sent-138, score-0.369]

61 The descriptor of each location pj is based on a 60-bin histogram (5 radius × 12 orientation bins) summing the number of ”lit” pixels falling in each specific radius and orientation bin (using pj as the origin). [sent-139, score-0.506]

62 Shape Figure 3: A synthetic sample of six co-centric classes in a two dimensional space (left). [sent-143, score-0.486]

63 Two class relevance metrics are calculated from the examples (m = 200) of the four classes y ∈ {u, v} marked / in gray using either a linear or a second degree polynomial kernel. [sent-144, score-0.959]

64 The examples of the target classes u and v, indicated in black, are not used in calculating the metrics. [sent-145, score-0.51]

65 After learning both metrics, all the instances of the six classes are projected using both class relevance metrics. [sent-146, score-1.033]

66 Then distance measurements are performed between the four classes y ∈ {u, v}. [sent-147, score-0.405]

67 Classes u and v, not participating in calculating the pseudo-metric, exhibit γ separation only when using an appropriate kernel (right-bottom). [sent-149, score-0.404]

68 A linear kernel cannot accommodate γ separation between co-centric classes (center-bottom). [sent-150, score-0.538]

69 context descriptors have proven to be robust in many classification tasks [3] and avoid the common reliance on a detection of (the often elusive) interest points. [sent-151, score-0.377]

70 In many writing systems letters tend to share a common underlying set of class relevant and non-relevant dimensions (Fig. [sent-152, score-0.684]

71 We therefore expect that letters should be a good candidate for exhibiting that a class relevance pseudo-metric achieving a large margin γ, would induce the distance separation characteristic on two additional letter classes in the same system. [sent-154, score-1.377]

72 A nearest neighbor classifier is defined by the two examples, in order to assess baseline performance of training from a single example. [sent-158, score-0.498]

73 A linear kernel is applied for the pixel based representation while the principal angles kernel is used for the shape context representation. [sent-159, score-0.933]

74 Baseline results for the shape context and pixel representations are depicted in Fig. [sent-161, score-0.373]

75 4) is implemented and run for 1000 iterations over random pairs selected from the 240 training examples (m = 4 classes × 60 examples). [sent-168, score-0.487]

76 It is observed that the learned pseudo-metric approximates the separation goal on the two unseen target classes u and v (center plot of Fig. [sent-170, score-0.503]

77 A nearest neighbor classifier is then trained using the projected examples (W x,W x ) from class u and v. [sent-172, score-0.663]

78 When training from a single example the lower dimensional pixel representation (of size 1296) displays less of an overfitting effect than the shape context representation paired with a principal angles kernel (implicitly mapped by the kernel from size 60 × 40 to size 60 40 ). [sent-177, score-1.215]

79 The context descriptor at location p is based on a 60-bin histogram (5 radius × 12 orientation bins) of all surrounding pixels, using p as the origin. [sent-182, score-0.36]

80 Three examples of the letter e, depicted with the histogram bin boundaries (top) and three derived shape context histograms plotted as log(radius) × orientation bins (bottom). [sent-183, score-0.661]

81 Note the similarity of the two shape context descriptors sampled from analogous locations on two different examples of the letter e (two bottom-center plots). [sent-184, score-0.772]

82 The shape context of a descriptor sampled from a distant location is evidently different (right). [sent-185, score-0.353]

83 5 A B C D E F Figure 5: Letters in many writing systems, like uppercase Latin, tend to share a common underlying set of class relevant and non-relevant dimensions (left plot adapted from [5]). [sent-196, score-0.699]

84 A class relevance pseudo-metric was calculated from four letters (i. [sent-197, score-0.661]

85 The central plot depicts the distance matrix of the two target letters (i. [sent-200, score-0.371]

86 e and n) after the class relevance pseudo-metric projection. [sent-202, score-0.482]

87 The right plot presents average accuracy of classifiers trained on a single example of lowercase letters (i. [sent-203, score-0.371]

88 Shape Context Representation after a projection derived from uppercase letters D. [sent-208, score-0.359]

89 However, it appears that by projecting the single examples using the class relevance pseudo-metric, the class relevant dimensions are emphasized and hindering effects of the non-relevant dimensions are diminished (displayed at Fig. [sent-214, score-1.358]

90 It should be noted that a simple linear pseudo-metric projection cannot achieve the desired margin on the extended sample, and therefore seems not to generalize well from the single trial training stage. [sent-216, score-0.396]

91 Following the same setting as in the first experiment we randomly selected two lowercase Latin letters for the single trial training task, while applying a pseudo-metric projection derived from uppercase Latin letters. [sent-220, score-0.681]

92 It is observed that utilizing a less relevant pseudo-metric attenuates the benefit in the setting based on the shape context representation paired with the principal angles kernel (Fig. [sent-221, score-0.819]

93 In the linear pixel based setting projecting lowercase letters to the uppercase relevance directions significantly deteriorates performance (Fig. [sent-223, score-0.891]

94 Our approach, first attempts to learn from available examples of other related classes, a class relevance metric where all within class distances are smaller than between class distances. [sent-226, score-0.994]

95 We then, define a nearest neighbor classifier for the two target classes, using the class relevance metric. [sent-227, score-0.869]

96 Our high dimensional representation applied a principal angles kernel [20] to sets of local shape descriptors [3]. [sent-228, score-0.937]

97 However, by learning the class relevance metric from available examples of related objects, relevant dimensions for classification are emphasized and the overfitting effects of irrelevant dimensions are diminished. [sent-230, score-1.132]

98 Varying the choice of local feature descriptors [11, 15], and enhancing the image kernel [16] might further improve the proposed method’s generalization capacity in other object classification settings. [sent-232, score-0.566]

99 We assume that our examples represent a set of classes that originate from a common set of constraints, thus imposing that the classes tend to agree on the relevance and non-relevance of different dimensions. [sent-233, score-1.062]

100 These mechanisms are closely related to our framework when considering the common features as a subset of directions in our class relevance pseudo-metric. [sent-236, score-0.52]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('relevance', 0.317), ('classes', 0.28), ('descriptors', 0.242), ('neighbor', 0.17), ('latin', 0.166), ('class', 0.165), ('nearest', 0.153), ('object', 0.15), ('instances', 0.145), ('letters', 0.143), ('kernel', 0.138), ('shape', 0.136), ('letter', 0.135), ('dimensions', 0.134), ('angles', 0.131), ('distance', 0.125), ('locality', 0.121), ('separation', 0.12), ('lowercase', 0.119), ('uppercase', 0.119), ('lj', 0.114), ('classi', 0.111), ('examples', 0.111), ('er', 0.105), ('instance', 0.104), ('principal', 0.1), ('representation', 0.099), ('tting', 0.098), ('projection', 0.097), ('context', 0.097), ('pixel', 0.094), ('relevant', 0.081), ('effects', 0.079), ('pi', 0.079), ('descriptor', 0.076), ('metric', 0.071), ('fink', 0.071), ('single', 0.07), ('pj', 0.069), ('da', 0.066), ('reusable', 0.065), ('target', 0.064), ('projected', 0.064), ('six', 0.062), ('character', 0.062), ('projecting', 0.062), ('adopts', 0.06), ('permutation', 0.06), ('training', 0.059), ('objects', 0.057), ('emphasizing', 0.057), ('calculating', 0.055), ('radius', 0.055), ('unfeasible', 0.054), ('dimensional', 0.054), ('locations', 0.051), ('orientation', 0.05), ('writing', 0.05), ('metrics', 0.05), ('margin', 0.05), ('bins', 0.048), ('psd', 0.047), ('temporarily', 0.047), ('discriminates', 0.047), ('diminishing', 0.047), ('entail', 0.047), ('participating', 0.047), ('synthetic', 0.047), ('qi', 0.047), ('depicted', 0.046), ('baseline', 0.046), ('extended', 0.045), ('location', 0.044), ('exhibit', 0.044), ('sample', 0.043), ('displayed', 0.042), ('separated', 0.042), ('achieving', 0.042), ('dual', 0.042), ('emphasized', 0.04), ('variability', 0.04), ('accurately', 0.04), ('plot', 0.039), ('classify', 0.039), ('generalize', 0.039), ('histogram', 0.038), ('emerges', 0.038), ('common', 0.038), ('cation', 0.037), ('demonstrated', 0.037), ('setting', 0.037), ('stage', 0.037), ('selected', 0.037), ('share', 0.037), ('sets', 0.037), ('capacity', 0.036), ('perfect', 0.036), ('calculated', 0.036), ('tend', 0.036), ('achieve', 0.036)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999952 134 nips-2004-Object Classification from a Single Example Utilizing Class Relevance Metrics

Author: Michael Fink

Abstract: We describe a framework for learning an object classifier from a single example. This goal is achieved by emphasizing the relevant dimensions for classification using available examples of related classes. Learning to accurately classify objects from a single training example is often unfeasible due to overfitting effects. However, if the instance representation provides that the distance between each two instances of the same class is smaller than the distance between any two instances from different classes, then a nearest neighbor classifier could achieve perfect performance with a single training example. We therefore suggest a two stage strategy. First, learn a metric over the instances that achieves the distance criterion mentioned above, from available examples of other related classes. Then, using the single examples, define a nearest neighbor classifier where distance is evaluated by the learned class relevance metric. Finding a metric that emphasizes the relevant dimensions for classification might not be possible when restricted to linear projections. We therefore make use of a kernel based metric learning algorithm. Our setting encodes object instances as sets of locality based descriptors and adopts an appropriate image kernel for the class relevance metric learning. The proposed framework for learning from a single example is demonstrated in a synthetic setting and on a character classification task. 1

2 0.21607007 85 nips-2004-Instance-Based Relevance Feedback for Image Retrieval

Author: Giorgio Gia\-cin\-to, Fabio Roli

Abstract: High retrieval precision in content-based image retrieval can be attained by adopting relevance feedback mechanisms. These mechanisms require that the user judges the quality of the results of the query by marking all the retrieved images as being either relevant or not. Then, the search engine exploits this information to adapt the search to better meet user’s needs. At present, the vast majority of proposed relevance feedback mechanisms are formulated in terms of search model that has to be optimized. Such an optimization involves the modification of some search parameters so that the nearest neighbor of the query vector contains the largest number of relevant images. In this paper, a different approach to relevance feedback is proposed. After the user provides the first feedback, following retrievals are not based on knn search, but on the computation of a relevance score for each image of the database. This score is computed as a function of two distances, namely the distance from the nearest non-relevant image and the distance from the nearest relevant one. Images are then ranked according to this score and the top k images are displayed. Reported results on three image data sets show that the proposed mechanism outperforms other state-of-the-art relevance feedback mechanisms. 1 In t rod u ct i on A large number of content-based image retrieval (CBIR) systems rely on the vector representation of images in a multidimensional feature space representing low-level image characteristics, e.g., color, texture, shape, etc. [1]. Content-based queries are often expressed by visual examples in order to retrieve from the database the images that are “similar” to the examples. This kind of retrieval is often referred to as K nearest-neighbor retrieval. It is easy to see that the effectiveness of content-based image retrieval systems (CBIR) strongly depends on the choice of the set of visual features, on the choice of the “metric” used to model the user’s perception of image similarity, and on the choice of the image used to query the database [1]. Typically, if we allow different users to mark the images retrieved with a given query as relevant or non-relevant, different subsets of images will be marked as relevant. Accordingly, the need for mechanisms to adapt the CBIR system response based on some feedback from the user is widely recognized. It is interesting to note that while relevance feedback mechanisms have been first introduced in the information retrieval field [2], they are receiving more attention in the CBIR field (Huang). The vast majority of relevance feedback techniques proposed in the literature is based on modifying the values of the search parameters as to better represent the concept the user bears in mind. To this end, search parameters are computed as a function of the relevance values assigned by the user to all the images retrieved so far. As an example, relevance feedback is often formulated in terms of the modification of the query vector, and/or in terms of adaptive similarity metrics. [3]-[7]. Recently, pattern classification paradigms such as SVMs have been proposed [8]. Feedback is thus used to model the concept of relevant images and adjust the search consequently. Concept modeling may be difficult on account of the distribution of relevant images in the selected feature space. “Narrow domain” image databases allows extracting good features, so that images bearing similar concepts belong to compact clusters. On the other hand, “broad domain” databases, such as image collection used by graphic professionals, or those made up of images from the Internet, are more difficult to subdivide in cluster because of the high variability of concepts [1]. In these cases, it is worth extracting only low level, non-specialized features, and image retrieval is better formulated in terms of a search problem rather then concept modeling. The present paper aims at offering an original contribution in this direction. Rather then modeling the concept of “relevance” the user bears in mind, feedback is used to assign each image of the database a relevance score. Such a score depends only from two dissimilarities (distances) computed against the images already marked by the user: the dissimilarity from the set of relevant images, and the dissimilarity from the set of non-relevant images. Despite its computational simplicity, this mechanism allows outperforming state-of-the-art relevance feedback mechanisms both on “narrow domain” databases, and on “broad domain” databases. This paper is organized as follows. Section 2 illustrates the idea behind the proposed mechanism and provides the basic assumptions. Section 3 details the proposed relevance feedback mechanism. Results on three image data sets are presented in Section 4, where performances of other relevance feedback mechanisms are compared. Conclusions are drawn in Section 5. 2 In st an ce- b ased rel evan ce est i m at i on The proposed mechanism has been inspired by classification techniques based on the “nearest case” [9]-[10]. Nearest-case theory provided the mechanism to compute the dissimilarity of each image from the sets of relevant and non–relevant images. The ratio between the nearest relevant image and the nearest non-relevant image has been used to compute the degree of relevance of each image of the database [11]. The present section illustrates the rationale behind the use of the nearest-case paradigm. Let us assume that each image of the database has been represented by a number of low-level features, and that a (dis)similarity measure has been defined so that the proximity between pairs of images represents some kind of “conceptual” similarity. In other words, the chosen feature space and similarity metric is meaningful at least for a restricted number of users. A search in image databases is usually performed by retrieving the k most similar images with respect to a given query. The dimension of k is usually small, to avoid displaying a large number of images at a time. Typical values for k are between 10 and 20. However, as the “relevant” images that the user wishes to retrieve may not fit perfectly with the similarity metric designed for the search engine, the user may be interested in exploring other regions of the feature space. To this end, the user marks the subset of “relevant” images out of the k retrieved. Usually, such relevance feedback is used to perform a new k-nn search by modifying some search parameters, i.e., the position of the query point, the similarity metric, and other tuning parameters [1]-[7]. Recent works proposed the use of support vector machine to learn the distribution of relevant images [8]. These techniques require some assumption about the general form of the distribution of relevant images in the feature space. As it is difficult to make any assumption about such a distribution for broad domain databases, we propose to exploit the information about the relevance of the images retrieved so far in a nearest-neighbor fashion. Nearest-neighbor techniques, as used in statistical pattern recognition, case-based reasoning, or instance-based learning, are effective in all applications where it is difficult to produce a high-level generalization of a “class” of objects [9]-[10],[12][13]. Relevance learning in content base image retrieval may well fit into this definition, as it is difficult to provide a general model that can be adapted to represent different concepts of similarity. In addition, the number of available cases may be too small to estimate the optimal set of parameters for such a general model. On the other hand, it can be more effective to use each “relevant” image as well as each “non-relevant” image, as “cases” or “instances” against which the images of the database should be compared. Consequently, we assume that an image is as much as relevant as much as its dissimilarity from the nearest relevant image is small. Analogously, an image is as much as non-relevant as much as its dissimilarity from the nearest non-relevant image is small. 3 Rel evan ce S core Com p u t ati on According to previous section, each image of the database can be thus characterized by a “degree of relevance” and a “degree of non-relevance” according to the dissimilarities from the nearest relevant image, and from the nearest non-relevant image, respectively. However, it should be noted that these degrees should be treated differently because only “relevant” images represent a “concept” in the user’s mind, while “non-relevant” images may represent a number of other concepts different from user’s interest. In other words, while it is meaningful to treat the degree of relevance as a degree of membership to the class of relevant images, the same does not apply to the degree of non-relevance. For this reason, we propose to use the “degree of non-relevance” to weight the “degree of relevance”. Let us denote with R the subset of indexes j ∈ {1,...,k} related to the set of relevant images retrieved so far and the original query (that is relevant by default), and with NR the subset of indexes j ∈ (1,...,k} related to the set of non-relevant images retrieved so far. For each image I of the database, according to the nearest neighbor rule, let us compute the dissimilarity from the nearest image in R and the dissimilarity from the nearest image in NR. Let us denote these dissimilarities as dR(I) and dNR(I), respectively. The value of dR(I) can be clearly used to measure the degree of relevance of image I, assuming that small values of dR(I) are related to very relevant images. On the other hand, the hypothesis that image I is relevant to the user’s query can be supported by a high value of dNR(I). Accordingly, we defined the relevance score ! dR ( I ) $ relevance ( I ) = # 1 + dN ( I ) &

3 0.12450123 145 nips-2004-Parametric Embedding for Class Visualization

Author: Tomoharu Iwata, Kazumi Saito, Naonori Ueda, Sean Stromsten, Thomas L. Griffiths, Joshua B. Tenenbaum

Abstract: In this paper, we propose a new method, Parametric Embedding (PE), for visualizing the posteriors estimated over a mixture model. PE simultaneously embeds both objects and their classes in a low-dimensional space. PE takes as input a set of class posterior vectors for given data points, and tries to preserve the posterior structure in an embedding space by minimizing a sum of Kullback-Leibler divergences, under the assumption that samples are generated by a Gaussian mixture with equal covariances in the embedding space. PE has many potential uses depending on the source of the input data, providing insight into the classifier’s behavior in supervised, semi-supervised and unsupervised settings. The PE algorithm has a computational advantage over conventional embedding methods based on pairwise object relations since its complexity scales with the product of the number of objects and the number of classes. We demonstrate PE by visualizing supervised categorization of web pages, semi-supervised categorization of digits, and the relations of words and latent topics found by an unsupervised algorithm, Latent Dirichlet Allocation. 1

4 0.12409593 107 nips-2004-Making Latin Manuscripts Searchable using gHMM's

Author: Jaety Edwards, Yee W. Teh, Roger Bock, Michael Maire, Grace Vesom, David A. Forsyth

Abstract: We describe a method that can make a scanned, handwritten mediaeval latin manuscript accessible to full text search. A generalized HMM is fitted, using transcribed latin to obtain a transition model and one example each of 22 letters to obtain an emission model. We show results for unigram, bigram and trigram models. Our method transcribes 25 pages of a manuscript of Terence with fair accuracy (75% of letters correctly transcribed). Search results are very strong; we use examples of variant spellings to demonstrate that the search respects the ink of the document. Furthermore, our model produces fair searches on a document from which we obtained no training data. 1. Intoduction There are many large corpora of handwritten scanned documents, and their number is growing rapidly. Collections range from the complete works of Mark Twain to thousands of pages of zoological notes spanning two centuries. Large scale analyses of such corpora is currently very difficult, because handwriting recognition works poorly. Recently, Rath and Manmatha have demonstrated that one can use small bodies of aligned material as supervised data to train a word spotting mechanism [7]. The result can make scanned handwritten documents searchable. Current techniques assume a closed vocabulary — one can search only for words in the training set — and search for instances of whole words. This approach is particularly unattractive for an inflected language, because individual words can take so many forms that one is unlikely to see all in the training set. Furthermore, one would like the method used to require very little aligned training data, so that it is possible to process documents written by different scribes with little overhead. Mediaeval Latin manuscripts are a natural first corpus for studying this problem, because there are many scanned manuscripts and because the handwriting is relatively regular. We expect the primary user need to be search over a large body of documents — to allow comparisons between documents — rather than transcription of a particular document (which is usually relatively easy to do by hand). Desirable features for a system are: First, that it use little or no aligned training data (an ideal, which we believe may be attainable, is an unsupervised learning system). Second, that one can search the document for an arbitrary string (rather than, say, only complete words that appear in the training data). This would allow a user to determine whether a document contains curious or distinctive spellings, for example (figure 7). We show that, using a statistical model based on a generalized HMM, we can search a medieval manuscript with considerable accuracy, using only one instance each of each letter in the manuscript to train the method (22 instances in total; Latin has no j, k, w, or z). Furthermore, our method allows fairly accurate transcription of the manuscript. We train our system on 22 glyphs taken from a a 12th century latin manuscript of Terence’s Comedies (obtained from a repository of over 80 scanned medieval works maintained by Oxford University [1]). We evaluate searches using a considerable portion of this manuscript aligned by hand; we then show that fair search results are available on a different manuscript (MS. Auct. D. 2. 16, Latin Gospels with beast-headed evangelist portraits made at Landvennec, Brittany, late 9th or early 10th century, from [1]) without change of letter templates. 1.1. Previous Work Handwriting recognition is a traditional problem, too well studied to review in detail here (see [6]). Typically, online handwriting recognition (where strokes can be recorded) works better than offline handwriting recognition. Handwritten digits can now be recognized with high accuracy [2, 5]. Handwritten amounts can be read with fair accuracy, which is significantly improved if one segments the amount into digits at the same time as one recognizes it [4, 5]. Recently several authors have proposed new techniques for search and translation in this unrestricted setting. Manmatha et al [7] introduce the technique of “word spotting,” which segments text into word images, rectifies the word images, and then uses an aligned training set to learn correspondences between rectified word images and strings. The method is not suitable for a heavily inflected language, because words take so many forms. In an inflected language, the natural unit to match to is a subset of a word, rather than a whole word, implying that one should segment the text into blocks — which may be smaller than words — while recognizing. Vinciarelli et al [8] introduce a method for line by line recognition based around an HMM and quite similar to techniques used in the speech recognition community. Their method uses a window that slides along the text to obtain features; this has the difficulty that the same window is in some places too small (and so uninformative) and in others too big (and so spans more than one letter, and is confusing). Their method requires a substantial body of aligned training data, which makes it impractical for our applications. Close in spirit to our work is the approach to machine translation of Koehn and Knight [3]. They demonstrate that the statistics of unaligned corpora may provide as powerful constraints for training models as aligned bitexts. 2. The Model Our models for both search and transcription are based on the generalized HMM and differ only in their choice of transition model. In an HMM, each hidden node ct emits a single evidence node xt . In a generalized HMM, we allow each ct to emit a series of x’s whose length is itself a random variable. In our model, the hidden nodes correspond to letters and each xt is a single column of pixels. Allowing letters to emit sets of columns lets us accomodate letter templates of variable width. In particular, this means that we can unify segmenting ink into letters and recognizing blocks of ink; figure 3 shows an example of how useful this is. 2.1. Generating a line of text Our hidden state consists of a character label c, width w and vertical position y. The statespace of c contains the characters ‘a’-‘z’, a space ‘ ’, and a special end state Ω. Let T c be the template associated with character c, Tch , Tcw be respectively the height and width of that template, and m be the height of the image. Figure 1: Left, a full page of our manuscript, a 12’th century manuscript of Terence’s Comedies obtained from [1]. Top right, a set of lines from a page from that document and bottom right, some words in higher resolution. Note: (a) the richness of page layout; (b) the clear spacing of the lines; (c) the relatively regular handwriting. Figure 2: Left, the 22 instances, one per letter, used to train our emission model. These templates are extracted by hand from the Terence document. Right, the five image channels for a single letter. Beginning at image column 1 (and assuming a dummy space before the first character), • • • • choose character c ∼ p(c|c−1...−n ) (an n-gram letter model) choose length w ∼ Uniform(Tcw − k, Tcw + k) (for some small k) choose vertical position y ∼ Uniform(1, m − Tch ) z,y and Tch now define a bounding box b of pixels. Let i and j be indexed from the top left of that bounding box. – draw pixel (i, j) ∼ N (Tcij , σcij ) for each pixel in b – draw all pixels above and below b from background gaussian N (µ0 , σ0 ) (See 2.2 for greater detail on pixel emission model) • move to column w + 1 and repeat until we enter the end state Ω. Inference on a gHMM is a relatively straighforward business of dynamic programming. We have used unigram, bigram and trigram models, with each transition model fitted using an electronic version of Caesar’s Gallic Wars, obtained from http://www.thelatinlibrary.com. We do not believe that the choice of author should significantly affect the fitted transition model — which is at the level of characters — but have not experimented with this point. The important matter is the emission model. 2.2. The Emission Model Our emission model is as follows: Given the character c and width w, we generate a template of the required length. Each pixel in this template becomes the mean of a gaussian which generates the corresponding pixel in the image. This template has a separate mean image for each pixel channel. The channels are assumed independent given the means. We train the model by cutting out by hand a single instance of each letter from our corpus (figure 2). This forms the central portion of the template. Pixels above and below this Model Perfect transcription unigram bigram trigram matching chars 21019 14603 15572 15788 substitutions 0 5487 4597 4410 insertions 0 534 541 507 deletions 0 773 718 695 Table 1: Edit distance between our transcribed Terence and the editor’s version. Note the trigram model produces significantly fewer letter errors than the unigram model, but that the error rate is still a substantial 25%. central box are generated from a single gaussian used to model background pixels (basically white pixels). We add a third variable yt to our hidden state indicating the vertical position of the central box. However, since we are uninterested in actually recovering this variable, during inference we sum it out of the model. The width of a character is constrained to be close to the width (tw ) of our hand cut example by setting p(w|c) = 0 for w < tw − k and w > tw + k. Here k is a small, user defined integer. Within this range, p(w|c) is distributed uniformly, larger templates are created by appending pixels from the background model to the template and smaller ones by simply removing the right k-most columns of the hand cut example. For features, we generate five image representations, shown in figure 2. The first is a grayscale version of the original color image. The second and third are generated by convolving the grayscale image with a vertical derivative of gaussian filter, separating the positive and negative components of this response, and smoothing each of these gradient images separately. The fourth and fifth are generated similarly but with a horizontal derivative of gaussian filter. We have experimented with different weightings of these 5 channels. In practice we use the gray scale channel and the horizontal gradient channels. We emphasize the horizontal pieces since these seem the more discriminative. 2.3. Transcription For transcription, we model letters as coming from an n-gram language model, with no dependencies between words. Thus, the probability of a letter depends on the k letters before it, where k = n unless this would cross a word boundary in which case the history terminates at this boundary. We chose not to model word to word transition probabilities since, unlike in English, word order in Latin is highly arbitrary. This transition model is fit from a corpus of ascii encoded latin. We have experimented with unigram (i.e. uniform transition probabilities), bigram and trigram letter models. We can perform transcription by fitting the maximum likelihood path through any given line. Some results of this technique are shown in figure 3. 2.4. Search For search, we rank lines by the probability that they contain our search word. We set up a finite state machine like that in figure 4. In this figure, ‘bg’ represents our background model for that portion of the line not generated by our search word. We can use any of the n-gram letter models described for transcription as the transition model for ‘bg’. The probability that the line contains the search word is the probability that this FSM takes path 1. We use this FSM as the transition model for our gHMM, and output the posterior probability of the two arrows leading into the end state. 1 and 2 are user defined weights, but in practice the algorithm does not appear to be particular sensitive to the choice of these parameters. The results presented here use the unigram model. Editorial translation Orator ad vos venio ornatu prologi: unigram b u rt o r a d u o s u em o o r n a t u p r o l o g r b u rt o r a d v o s v em o o r u a t u p r o l o g r fo r a t o r a d v o s v en i o o r n a t u p r o l o g i bigram trigram Figure 3: We transcribe the text by finding the maximum likelihood path through the gHMM. The top line shows the standard version of the line (obtained by consensus among editors who have consulted various manuscripts; we obtained this information in electronic form from http://www.thelatinlibrary.com). Below, we show the line as segmented and transcribed by unigram, bigram and trigram models; the unigram and bigram models transcribe one word as “vemo”, but the stronger trigram model forces the two letters to be segmented and correctly transcribes the word as “venio”, illustrating the considerable benefit to be obtained by segmenting only at recognition time. 1 − ε1 Path 1 1 − ε2 a b bg ε1 Ω bg Path 2 ε2 Figure 4: The finite state machine to search for the word ‘ab.’ ‘bg’ is a place holder for the larger finite state machine defined by our language model’s transition matrix. 3. Results Figure 1 shows a page from our collection. This is a scanned 12th century manuscript of Terence’s Comedies, obtained from the collection at [1]. In preprocessing, we extract individual lines of text by rotating the image to various degrees and projecting the sum of the pixel values onto the y-axis. We choose the orientation whose projection vector has the lowest entropy, and then segment lines by cutting at minima of this projection. Transcription is not our primary task, but methods that produce good transcriptions are going to support good searches. The gHMM can produce a surprisingly good transcription, given how little training data is used to train the emission model. We aligned an editors version of Terence with 25 pages from the manuscript by hand, and computed the edit distance between the transcribed text and the aligned text; as table 1 indicates, approximately 75% of letters are read correctly. Search results are strong. We show results for two documents. The first set of results refers to the edition of Terence’s Comedies, from which we took the 22 letter instances. In particular, for any given search term, our process ranks the complete set of lines. We used a hand alignment of the manuscript to determine which lines contained each term; figure 5 shows an overview of searches performed using every word that appears in the 50 100 150 200 250 300 350 400 450 500 550 Figure 5: Our search ranks 587 manuscript lines, with higher ranking lines more likely to contain the relevant term. This figure shows complete search results for each term that appears more than three times in the 587 lines. Each row represents the ranked search results for a term, and a black mark appears if the search term is actually in the line; a successful search will therefore appear as a row which is wholly dark to the left, and then wholly light. All 587 lines are represented. More common terms are represented by lower rows. More detailed results appear in figure 5 and figure 6; this summary figure suggests almost all searches are highly successful. document more than three times, in particular, showing which of the ranked set of lines actually contained the search term. For almost every search, the term appears mainly in the lines with higher rank. Figure 6 contains more detailed information for a smaller set of words. We do not score the position of a word in a line (for practical reasons). Figure 7 demonstrates (a) that our search respects the ink of the document and (b) that for the Terence document, word positions are accurately estimated. The spelling of mediaeval documents is typically cleaned up by editors; in our manuscript, the scribe reliably spells “michi” for the standard “mihi”. A search on “michi” produces many instances; a search on “mihi” produces none, because the ink doesn’t have any. Notice this phenomenon also in the bottom right line of figure 7, the scribe writes “habet, ut consumat nunc cum nichil obsint doli” and the editor gives “habet, ut consumat nunc quom nil obsint doli.” Figure 8 shows that searches on short strings produce many words containing that string as one would wish. 4. Discussion We have shown that it is possible to make at least some handwritten mediaeval manuscripts accessible to full text search, without requiring an aligned text or much supervisory data. Our documents have very regular letters, and letter frequencies — which can be obtained from transcribed Latin — appear to provide so powerful a cue that relatively little detailed information about letter shapes is required. Linking letter segmentation and recognition has thoroughly beneficial effects. This suggests that the pool of manuscripts that can be made accessible in this way is large. In particular, we have used our method, trained on 22 instances of letters from one document, to search another document. Figure 9 shows the results from two searches of our second document (MS. Auct. D. 2. 16, Latin Gospels with beast-headed evangelist portraits made at Landvennec, Brittany, late 9th or early 10th century, from [1]). No information from this document was used in training at all; but letter 1tu arbitror pater etiam nisi factum primum siet vero illi inter hic michi ibi qui tu ibi michi 0.9 0.8 0.7 qui hic 0.6 inter 0.5 illi 0.4 siet 0.3 vero 0.2 nisi 0.1 50 100 150 200 250 300 350 400 450 500 550 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Figure 6: On the left, search results for selected words (indicated on the leftmost column). Each row represents the ranked search results for a term, and a black mark appears if the search term is actually in the line; a successful search will therefore appear as a row which is wholly dark to the left, and then wholly light. Note only the top 300 results are represented, and that lines containing the search term are almost always at or close to the top of the search results (black marks to the left). On the right, we plot precision against recall for a set of different words by taking the top 10, 20, ... lines returned from the search, and checking them against the aligned manuscript. Note that, once all cases have been found, if the size of the pool is increased the precision will fall with 100% recall; many words work well, with most of the first 20 or so lines returned containing the search term. shapes are sufficiently well shared that the search is still useful. All this suggests that one might be able to use EM to link three processes: one that clusters to determine letter shapes; one that segments letters; and one that imposes a language model. Such a system might be able to make handwritten Latin searchable with no training data. References [1] Early Manuscripts at Oxford University. Bodleian library ms. auct. f. 2.13. http://image.ox.ac.uk/. [2] Serge Belongie, Jitendra Malik, and Jan Puzicha. Shape matching and object recognition using shape contexts. IEEE T. Pattern Analysis and Machine Intelligence, 24(4):509–522, 2002. [3] Philipp Koehn and Kevin Knight. Estimating word translation probabilities from unrelated monolingual corpora. In Proc. of the 17th National Conf. on AI, pages 711–715. AAAI Press / The MIT Press, 2000. [4] Y. LeCun, L. Bottou, and Y. Bengio. Reading checks with graph transformer networks. In International Conference on Acoustics, Speech, and Signal Processing, volume 1, pages 151–154, Munich, 1997. IEEE. [5] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998. [6] R. Plamondon and S.N. Srihari. Online and off-line handwriting recognition: a comprehensive survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1):63–84, 2000. [7] T. M. Rath and R. Manmatha. Word image matching using dynamic time warping. In Proc. of the Conf. on Computer Vision and Pattern Recognition (CVPR), volume 2, pages 521–527, 2003. [8] Alessandro Vinciarelli, Samy Bengio, and Horst Bunke. Offline recognition of unconstrained handwritten texts using hmms and statistical language models. IEEE Trans. Pattern Anal. Mach. Intell., 26(6):709–720, 2004. michi: Spe incerta certum mihi laborem sustuli, mihi: Faciuntne intellegendo ut nil intellegant? michi: Nonnumquam conlacrumabat. placuit tum id mihi. mihi: Placuit: despondi. hic nuptiis dictust dies. michi: Sto exspectans siquid mi imperent. venit una,

5 0.12284677 127 nips-2004-Neighbourhood Components Analysis

Author: Jacob Goldberger, Geoffrey E. Hinton, Sam T. Roweis, Ruslan Salakhutdinov

Abstract: In this paper we propose a novel method for learning a Mahalanobis distance measure to be used in the KNN classification algorithm. The algorithm directly maximizes a stochastic variant of the leave-one-out KNN score on the training set. It can also learn a low-dimensional linear embedding of labeled data that can be used for data visualization and fast classification. Unlike other methods, our classification model is non-parametric, making no assumptions about the shape of the class distributions or the boundaries between them. The performance of the method is demonstrated on several data sets, both for metric learning and linear dimensionality reduction. 1

6 0.11122678 13 nips-2004-A Three Tiered Approach for Articulated Object Action Modeling and Recognition

7 0.11081397 22 nips-2004-An Investigation of Practical Approximate Nearest Neighbor Algorithms

8 0.10988314 44 nips-2004-Conditional Random Fields for Object Recognition

9 0.10775658 142 nips-2004-Outlier Detection with One-class Kernel Fisher Discriminants

10 0.10661503 168 nips-2004-Semigroup Kernels on Finite Sets

11 0.10307293 34 nips-2004-Breaking SVM Complexity with Cross-Training

12 0.10203411 16 nips-2004-Adaptive Discriminative Generative Model and Its Applications

13 0.097151801 125 nips-2004-Multiple Relational Embedding

14 0.093552835 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning

15 0.093338557 115 nips-2004-Maximum Margin Clustering

16 0.08650364 131 nips-2004-Non-Local Manifold Tangent Learning

17 0.086173624 83 nips-2004-Incremental Learning for Visual Tracking

18 0.085662588 59 nips-2004-Efficient Kernel Discriminant Analysis via QR Decomposition

19 0.085656658 40 nips-2004-Common-Frame Model for Object Recognition

20 0.084257744 92 nips-2004-Kernel Methods for Implicit Surface Modeling


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.29), (1, 0.109), (2, -0.074), (3, -0.067), (4, 0.106), (5, 0.083), (6, 0.045), (7, -0.129), (8, 0.111), (9, 0.062), (10, -0.078), (11, -0.023), (12, -0.081), (13, 0.011), (14, -0.063), (15, 0.037), (16, 0.047), (17, -0.042), (18, 0.05), (19, 0.048), (20, 0.072), (21, 0.072), (22, 0.072), (23, 0.055), (24, -0.024), (25, -0.143), (26, -0.003), (27, 0.183), (28, 0.104), (29, 0.013), (30, -0.191), (31, -0.073), (32, -0.184), (33, -0.102), (34, -0.062), (35, 0.014), (36, -0.053), (37, 0.109), (38, 0.062), (39, -0.034), (40, 0.114), (41, -0.113), (42, -0.015), (43, 0.003), (44, 0.13), (45, -0.049), (46, -0.006), (47, -0.052), (48, -0.039), (49, -0.041)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9682073 134 nips-2004-Object Classification from a Single Example Utilizing Class Relevance Metrics

Author: Michael Fink

Abstract: We describe a framework for learning an object classifier from a single example. This goal is achieved by emphasizing the relevant dimensions for classification using available examples of related classes. Learning to accurately classify objects from a single training example is often unfeasible due to overfitting effects. However, if the instance representation provides that the distance between each two instances of the same class is smaller than the distance between any two instances from different classes, then a nearest neighbor classifier could achieve perfect performance with a single training example. We therefore suggest a two stage strategy. First, learn a metric over the instances that achieves the distance criterion mentioned above, from available examples of other related classes. Then, using the single examples, define a nearest neighbor classifier where distance is evaluated by the learned class relevance metric. Finding a metric that emphasizes the relevant dimensions for classification might not be possible when restricted to linear projections. We therefore make use of a kernel based metric learning algorithm. Our setting encodes object instances as sets of locality based descriptors and adopts an appropriate image kernel for the class relevance metric learning. The proposed framework for learning from a single example is demonstrated in a synthetic setting and on a character classification task. 1

2 0.74903882 85 nips-2004-Instance-Based Relevance Feedback for Image Retrieval

Author: Giorgio Gia\-cin\-to, Fabio Roli

Abstract: High retrieval precision in content-based image retrieval can be attained by adopting relevance feedback mechanisms. These mechanisms require that the user judges the quality of the results of the query by marking all the retrieved images as being either relevant or not. Then, the search engine exploits this information to adapt the search to better meet user’s needs. At present, the vast majority of proposed relevance feedback mechanisms are formulated in terms of search model that has to be optimized. Such an optimization involves the modification of some search parameters so that the nearest neighbor of the query vector contains the largest number of relevant images. In this paper, a different approach to relevance feedback is proposed. After the user provides the first feedback, following retrievals are not based on knn search, but on the computation of a relevance score for each image of the database. This score is computed as a function of two distances, namely the distance from the nearest non-relevant image and the distance from the nearest relevant one. Images are then ranked according to this score and the top k images are displayed. Reported results on three image data sets show that the proposed mechanism outperforms other state-of-the-art relevance feedback mechanisms. 1 In t rod u ct i on A large number of content-based image retrieval (CBIR) systems rely on the vector representation of images in a multidimensional feature space representing low-level image characteristics, e.g., color, texture, shape, etc. [1]. Content-based queries are often expressed by visual examples in order to retrieve from the database the images that are “similar” to the examples. This kind of retrieval is often referred to as K nearest-neighbor retrieval. It is easy to see that the effectiveness of content-based image retrieval systems (CBIR) strongly depends on the choice of the set of visual features, on the choice of the “metric” used to model the user’s perception of image similarity, and on the choice of the image used to query the database [1]. Typically, if we allow different users to mark the images retrieved with a given query as relevant or non-relevant, different subsets of images will be marked as relevant. Accordingly, the need for mechanisms to adapt the CBIR system response based on some feedback from the user is widely recognized. It is interesting to note that while relevance feedback mechanisms have been first introduced in the information retrieval field [2], they are receiving more attention in the CBIR field (Huang). The vast majority of relevance feedback techniques proposed in the literature is based on modifying the values of the search parameters as to better represent the concept the user bears in mind. To this end, search parameters are computed as a function of the relevance values assigned by the user to all the images retrieved so far. As an example, relevance feedback is often formulated in terms of the modification of the query vector, and/or in terms of adaptive similarity metrics. [3]-[7]. Recently, pattern classification paradigms such as SVMs have been proposed [8]. Feedback is thus used to model the concept of relevant images and adjust the search consequently. Concept modeling may be difficult on account of the distribution of relevant images in the selected feature space. “Narrow domain” image databases allows extracting good features, so that images bearing similar concepts belong to compact clusters. On the other hand, “broad domain” databases, such as image collection used by graphic professionals, or those made up of images from the Internet, are more difficult to subdivide in cluster because of the high variability of concepts [1]. In these cases, it is worth extracting only low level, non-specialized features, and image retrieval is better formulated in terms of a search problem rather then concept modeling. The present paper aims at offering an original contribution in this direction. Rather then modeling the concept of “relevance” the user bears in mind, feedback is used to assign each image of the database a relevance score. Such a score depends only from two dissimilarities (distances) computed against the images already marked by the user: the dissimilarity from the set of relevant images, and the dissimilarity from the set of non-relevant images. Despite its computational simplicity, this mechanism allows outperforming state-of-the-art relevance feedback mechanisms both on “narrow domain” databases, and on “broad domain” databases. This paper is organized as follows. Section 2 illustrates the idea behind the proposed mechanism and provides the basic assumptions. Section 3 details the proposed relevance feedback mechanism. Results on three image data sets are presented in Section 4, where performances of other relevance feedback mechanisms are compared. Conclusions are drawn in Section 5. 2 In st an ce- b ased rel evan ce est i m at i on The proposed mechanism has been inspired by classification techniques based on the “nearest case” [9]-[10]. Nearest-case theory provided the mechanism to compute the dissimilarity of each image from the sets of relevant and non–relevant images. The ratio between the nearest relevant image and the nearest non-relevant image has been used to compute the degree of relevance of each image of the database [11]. The present section illustrates the rationale behind the use of the nearest-case paradigm. Let us assume that each image of the database has been represented by a number of low-level features, and that a (dis)similarity measure has been defined so that the proximity between pairs of images represents some kind of “conceptual” similarity. In other words, the chosen feature space and similarity metric is meaningful at least for a restricted number of users. A search in image databases is usually performed by retrieving the k most similar images with respect to a given query. The dimension of k is usually small, to avoid displaying a large number of images at a time. Typical values for k are between 10 and 20. However, as the “relevant” images that the user wishes to retrieve may not fit perfectly with the similarity metric designed for the search engine, the user may be interested in exploring other regions of the feature space. To this end, the user marks the subset of “relevant” images out of the k retrieved. Usually, such relevance feedback is used to perform a new k-nn search by modifying some search parameters, i.e., the position of the query point, the similarity metric, and other tuning parameters [1]-[7]. Recent works proposed the use of support vector machine to learn the distribution of relevant images [8]. These techniques require some assumption about the general form of the distribution of relevant images in the feature space. As it is difficult to make any assumption about such a distribution for broad domain databases, we propose to exploit the information about the relevance of the images retrieved so far in a nearest-neighbor fashion. Nearest-neighbor techniques, as used in statistical pattern recognition, case-based reasoning, or instance-based learning, are effective in all applications where it is difficult to produce a high-level generalization of a “class” of objects [9]-[10],[12][13]. Relevance learning in content base image retrieval may well fit into this definition, as it is difficult to provide a general model that can be adapted to represent different concepts of similarity. In addition, the number of available cases may be too small to estimate the optimal set of parameters for such a general model. On the other hand, it can be more effective to use each “relevant” image as well as each “non-relevant” image, as “cases” or “instances” against which the images of the database should be compared. Consequently, we assume that an image is as much as relevant as much as its dissimilarity from the nearest relevant image is small. Analogously, an image is as much as non-relevant as much as its dissimilarity from the nearest non-relevant image is small. 3 Rel evan ce S core Com p u t ati on According to previous section, each image of the database can be thus characterized by a “degree of relevance” and a “degree of non-relevance” according to the dissimilarities from the nearest relevant image, and from the nearest non-relevant image, respectively. However, it should be noted that these degrees should be treated differently because only “relevant” images represent a “concept” in the user’s mind, while “non-relevant” images may represent a number of other concepts different from user’s interest. In other words, while it is meaningful to treat the degree of relevance as a degree of membership to the class of relevant images, the same does not apply to the degree of non-relevance. For this reason, we propose to use the “degree of non-relevance” to weight the “degree of relevance”. Let us denote with R the subset of indexes j ∈ {1,...,k} related to the set of relevant images retrieved so far and the original query (that is relevant by default), and with NR the subset of indexes j ∈ (1,...,k} related to the set of non-relevant images retrieved so far. For each image I of the database, according to the nearest neighbor rule, let us compute the dissimilarity from the nearest image in R and the dissimilarity from the nearest image in NR. Let us denote these dissimilarities as dR(I) and dNR(I), respectively. The value of dR(I) can be clearly used to measure the degree of relevance of image I, assuming that small values of dR(I) are related to very relevant images. On the other hand, the hypothesis that image I is relevant to the user’s query can be supported by a high value of dNR(I). Accordingly, we defined the relevance score ! dR ( I ) $ relevance ( I ) = # 1 + dN ( I ) &

3 0.6315757 22 nips-2004-An Investigation of Practical Approximate Nearest Neighbor Algorithms

Author: Ting Liu, Andrew W. Moore, Ke Yang, Alexander G. Gray

Abstract: This paper concerns approximate nearest neighbor searching algorithms, which have become increasingly important, especially in high dimensional perception areas such as computer vision, with dozens of publications in recent years. Much of this enthusiasm is due to a successful new approximate nearest neighbor approach called Locality Sensitive Hashing (LSH). In this paper we ask the question: can earlier spatial data structure approaches to exact nearest neighbor, such as metric trees, be altered to provide approximate answers to proximity queries and if so, how? We introduce a new kind of metric tree that allows overlap: certain datapoints may appear in both the children of a parent. We also introduce new approximate k-NN search algorithms on this structure. We show why these structures should be able to exploit the same randomprojection-based approximations that LSH enjoys, but with a simpler algorithm and perhaps with greater efficiency. We then provide a detailed empirical evaluation on five large, high dimensional datasets which show up to 31-fold accelerations over LSH. This result holds true throughout the spectrum of approximation levels. 1

4 0.59731448 127 nips-2004-Neighbourhood Components Analysis

Author: Jacob Goldberger, Geoffrey E. Hinton, Sam T. Roweis, Ruslan Salakhutdinov

Abstract: In this paper we propose a novel method for learning a Mahalanobis distance measure to be used in the KNN classification algorithm. The algorithm directly maximizes a stochastic variant of the leave-one-out KNN score on the training set. It can also learn a low-dimensional linear embedding of labeled data that can be used for data visualization and fast classification. Unlike other methods, our classification model is non-parametric, making no assumptions about the shape of the class distributions or the boundaries between them. The performance of the method is demonstrated on several data sets, both for metric learning and linear dimensionality reduction. 1

5 0.56306648 107 nips-2004-Making Latin Manuscripts Searchable using gHMM's

Author: Jaety Edwards, Yee W. Teh, Roger Bock, Michael Maire, Grace Vesom, David A. Forsyth

Abstract: We describe a method that can make a scanned, handwritten mediaeval latin manuscript accessible to full text search. A generalized HMM is fitted, using transcribed latin to obtain a transition model and one example each of 22 letters to obtain an emission model. We show results for unigram, bigram and trigram models. Our method transcribes 25 pages of a manuscript of Terence with fair accuracy (75% of letters correctly transcribed). Search results are very strong; we use examples of variant spellings to demonstrate that the search respects the ink of the document. Furthermore, our model produces fair searches on a document from which we obtained no training data. 1. Intoduction There are many large corpora of handwritten scanned documents, and their number is growing rapidly. Collections range from the complete works of Mark Twain to thousands of pages of zoological notes spanning two centuries. Large scale analyses of such corpora is currently very difficult, because handwriting recognition works poorly. Recently, Rath and Manmatha have demonstrated that one can use small bodies of aligned material as supervised data to train a word spotting mechanism [7]. The result can make scanned handwritten documents searchable. Current techniques assume a closed vocabulary — one can search only for words in the training set — and search for instances of whole words. This approach is particularly unattractive for an inflected language, because individual words can take so many forms that one is unlikely to see all in the training set. Furthermore, one would like the method used to require very little aligned training data, so that it is possible to process documents written by different scribes with little overhead. Mediaeval Latin manuscripts are a natural first corpus for studying this problem, because there are many scanned manuscripts and because the handwriting is relatively regular. We expect the primary user need to be search over a large body of documents — to allow comparisons between documents — rather than transcription of a particular document (which is usually relatively easy to do by hand). Desirable features for a system are: First, that it use little or no aligned training data (an ideal, which we believe may be attainable, is an unsupervised learning system). Second, that one can search the document for an arbitrary string (rather than, say, only complete words that appear in the training data). This would allow a user to determine whether a document contains curious or distinctive spellings, for example (figure 7). We show that, using a statistical model based on a generalized HMM, we can search a medieval manuscript with considerable accuracy, using only one instance each of each letter in the manuscript to train the method (22 instances in total; Latin has no j, k, w, or z). Furthermore, our method allows fairly accurate transcription of the manuscript. We train our system on 22 glyphs taken from a a 12th century latin manuscript of Terence’s Comedies (obtained from a repository of over 80 scanned medieval works maintained by Oxford University [1]). We evaluate searches using a considerable portion of this manuscript aligned by hand; we then show that fair search results are available on a different manuscript (MS. Auct. D. 2. 16, Latin Gospels with beast-headed evangelist portraits made at Landvennec, Brittany, late 9th or early 10th century, from [1]) without change of letter templates. 1.1. Previous Work Handwriting recognition is a traditional problem, too well studied to review in detail here (see [6]). Typically, online handwriting recognition (where strokes can be recorded) works better than offline handwriting recognition. Handwritten digits can now be recognized with high accuracy [2, 5]. Handwritten amounts can be read with fair accuracy, which is significantly improved if one segments the amount into digits at the same time as one recognizes it [4, 5]. Recently several authors have proposed new techniques for search and translation in this unrestricted setting. Manmatha et al [7] introduce the technique of “word spotting,” which segments text into word images, rectifies the word images, and then uses an aligned training set to learn correspondences between rectified word images and strings. The method is not suitable for a heavily inflected language, because words take so many forms. In an inflected language, the natural unit to match to is a subset of a word, rather than a whole word, implying that one should segment the text into blocks — which may be smaller than words — while recognizing. Vinciarelli et al [8] introduce a method for line by line recognition based around an HMM and quite similar to techniques used in the speech recognition community. Their method uses a window that slides along the text to obtain features; this has the difficulty that the same window is in some places too small (and so uninformative) and in others too big (and so spans more than one letter, and is confusing). Their method requires a substantial body of aligned training data, which makes it impractical for our applications. Close in spirit to our work is the approach to machine translation of Koehn and Knight [3]. They demonstrate that the statistics of unaligned corpora may provide as powerful constraints for training models as aligned bitexts. 2. The Model Our models for both search and transcription are based on the generalized HMM and differ only in their choice of transition model. In an HMM, each hidden node ct emits a single evidence node xt . In a generalized HMM, we allow each ct to emit a series of x’s whose length is itself a random variable. In our model, the hidden nodes correspond to letters and each xt is a single column of pixels. Allowing letters to emit sets of columns lets us accomodate letter templates of variable width. In particular, this means that we can unify segmenting ink into letters and recognizing blocks of ink; figure 3 shows an example of how useful this is. 2.1. Generating a line of text Our hidden state consists of a character label c, width w and vertical position y. The statespace of c contains the characters ‘a’-‘z’, a space ‘ ’, and a special end state Ω. Let T c be the template associated with character c, Tch , Tcw be respectively the height and width of that template, and m be the height of the image. Figure 1: Left, a full page of our manuscript, a 12’th century manuscript of Terence’s Comedies obtained from [1]. Top right, a set of lines from a page from that document and bottom right, some words in higher resolution. Note: (a) the richness of page layout; (b) the clear spacing of the lines; (c) the relatively regular handwriting. Figure 2: Left, the 22 instances, one per letter, used to train our emission model. These templates are extracted by hand from the Terence document. Right, the five image channels for a single letter. Beginning at image column 1 (and assuming a dummy space before the first character), • • • • choose character c ∼ p(c|c−1...−n ) (an n-gram letter model) choose length w ∼ Uniform(Tcw − k, Tcw + k) (for some small k) choose vertical position y ∼ Uniform(1, m − Tch ) z,y and Tch now define a bounding box b of pixels. Let i and j be indexed from the top left of that bounding box. – draw pixel (i, j) ∼ N (Tcij , σcij ) for each pixel in b – draw all pixels above and below b from background gaussian N (µ0 , σ0 ) (See 2.2 for greater detail on pixel emission model) • move to column w + 1 and repeat until we enter the end state Ω. Inference on a gHMM is a relatively straighforward business of dynamic programming. We have used unigram, bigram and trigram models, with each transition model fitted using an electronic version of Caesar’s Gallic Wars, obtained from http://www.thelatinlibrary.com. We do not believe that the choice of author should significantly affect the fitted transition model — which is at the level of characters — but have not experimented with this point. The important matter is the emission model. 2.2. The Emission Model Our emission model is as follows: Given the character c and width w, we generate a template of the required length. Each pixel in this template becomes the mean of a gaussian which generates the corresponding pixel in the image. This template has a separate mean image for each pixel channel. The channels are assumed independent given the means. We train the model by cutting out by hand a single instance of each letter from our corpus (figure 2). This forms the central portion of the template. Pixels above and below this Model Perfect transcription unigram bigram trigram matching chars 21019 14603 15572 15788 substitutions 0 5487 4597 4410 insertions 0 534 541 507 deletions 0 773 718 695 Table 1: Edit distance between our transcribed Terence and the editor’s version. Note the trigram model produces significantly fewer letter errors than the unigram model, but that the error rate is still a substantial 25%. central box are generated from a single gaussian used to model background pixels (basically white pixels). We add a third variable yt to our hidden state indicating the vertical position of the central box. However, since we are uninterested in actually recovering this variable, during inference we sum it out of the model. The width of a character is constrained to be close to the width (tw ) of our hand cut example by setting p(w|c) = 0 for w < tw − k and w > tw + k. Here k is a small, user defined integer. Within this range, p(w|c) is distributed uniformly, larger templates are created by appending pixels from the background model to the template and smaller ones by simply removing the right k-most columns of the hand cut example. For features, we generate five image representations, shown in figure 2. The first is a grayscale version of the original color image. The second and third are generated by convolving the grayscale image with a vertical derivative of gaussian filter, separating the positive and negative components of this response, and smoothing each of these gradient images separately. The fourth and fifth are generated similarly but with a horizontal derivative of gaussian filter. We have experimented with different weightings of these 5 channels. In practice we use the gray scale channel and the horizontal gradient channels. We emphasize the horizontal pieces since these seem the more discriminative. 2.3. Transcription For transcription, we model letters as coming from an n-gram language model, with no dependencies between words. Thus, the probability of a letter depends on the k letters before it, where k = n unless this would cross a word boundary in which case the history terminates at this boundary. We chose not to model word to word transition probabilities since, unlike in English, word order in Latin is highly arbitrary. This transition model is fit from a corpus of ascii encoded latin. We have experimented with unigram (i.e. uniform transition probabilities), bigram and trigram letter models. We can perform transcription by fitting the maximum likelihood path through any given line. Some results of this technique are shown in figure 3. 2.4. Search For search, we rank lines by the probability that they contain our search word. We set up a finite state machine like that in figure 4. In this figure, ‘bg’ represents our background model for that portion of the line not generated by our search word. We can use any of the n-gram letter models described for transcription as the transition model for ‘bg’. The probability that the line contains the search word is the probability that this FSM takes path 1. We use this FSM as the transition model for our gHMM, and output the posterior probability of the two arrows leading into the end state. 1 and 2 are user defined weights, but in practice the algorithm does not appear to be particular sensitive to the choice of these parameters. The results presented here use the unigram model. Editorial translation Orator ad vos venio ornatu prologi: unigram b u rt o r a d u o s u em o o r n a t u p r o l o g r b u rt o r a d v o s v em o o r u a t u p r o l o g r fo r a t o r a d v o s v en i o o r n a t u p r o l o g i bigram trigram Figure 3: We transcribe the text by finding the maximum likelihood path through the gHMM. The top line shows the standard version of the line (obtained by consensus among editors who have consulted various manuscripts; we obtained this information in electronic form from http://www.thelatinlibrary.com). Below, we show the line as segmented and transcribed by unigram, bigram and trigram models; the unigram and bigram models transcribe one word as “vemo”, but the stronger trigram model forces the two letters to be segmented and correctly transcribes the word as “venio”, illustrating the considerable benefit to be obtained by segmenting only at recognition time. 1 − ε1 Path 1 1 − ε2 a b bg ε1 Ω bg Path 2 ε2 Figure 4: The finite state machine to search for the word ‘ab.’ ‘bg’ is a place holder for the larger finite state machine defined by our language model’s transition matrix. 3. Results Figure 1 shows a page from our collection. This is a scanned 12th century manuscript of Terence’s Comedies, obtained from the collection at [1]. In preprocessing, we extract individual lines of text by rotating the image to various degrees and projecting the sum of the pixel values onto the y-axis. We choose the orientation whose projection vector has the lowest entropy, and then segment lines by cutting at minima of this projection. Transcription is not our primary task, but methods that produce good transcriptions are going to support good searches. The gHMM can produce a surprisingly good transcription, given how little training data is used to train the emission model. We aligned an editors version of Terence with 25 pages from the manuscript by hand, and computed the edit distance between the transcribed text and the aligned text; as table 1 indicates, approximately 75% of letters are read correctly. Search results are strong. We show results for two documents. The first set of results refers to the edition of Terence’s Comedies, from which we took the 22 letter instances. In particular, for any given search term, our process ranks the complete set of lines. We used a hand alignment of the manuscript to determine which lines contained each term; figure 5 shows an overview of searches performed using every word that appears in the 50 100 150 200 250 300 350 400 450 500 550 Figure 5: Our search ranks 587 manuscript lines, with higher ranking lines more likely to contain the relevant term. This figure shows complete search results for each term that appears more than three times in the 587 lines. Each row represents the ranked search results for a term, and a black mark appears if the search term is actually in the line; a successful search will therefore appear as a row which is wholly dark to the left, and then wholly light. All 587 lines are represented. More common terms are represented by lower rows. More detailed results appear in figure 5 and figure 6; this summary figure suggests almost all searches are highly successful. document more than three times, in particular, showing which of the ranked set of lines actually contained the search term. For almost every search, the term appears mainly in the lines with higher rank. Figure 6 contains more detailed information for a smaller set of words. We do not score the position of a word in a line (for practical reasons). Figure 7 demonstrates (a) that our search respects the ink of the document and (b) that for the Terence document, word positions are accurately estimated. The spelling of mediaeval documents is typically cleaned up by editors; in our manuscript, the scribe reliably spells “michi” for the standard “mihi”. A search on “michi” produces many instances; a search on “mihi” produces none, because the ink doesn’t have any. Notice this phenomenon also in the bottom right line of figure 7, the scribe writes “habet, ut consumat nunc cum nichil obsint doli” and the editor gives “habet, ut consumat nunc quom nil obsint doli.” Figure 8 shows that searches on short strings produce many words containing that string as one would wish. 4. Discussion We have shown that it is possible to make at least some handwritten mediaeval manuscripts accessible to full text search, without requiring an aligned text or much supervisory data. Our documents have very regular letters, and letter frequencies — which can be obtained from transcribed Latin — appear to provide so powerful a cue that relatively little detailed information about letter shapes is required. Linking letter segmentation and recognition has thoroughly beneficial effects. This suggests that the pool of manuscripts that can be made accessible in this way is large. In particular, we have used our method, trained on 22 instances of letters from one document, to search another document. Figure 9 shows the results from two searches of our second document (MS. Auct. D. 2. 16, Latin Gospels with beast-headed evangelist portraits made at Landvennec, Brittany, late 9th or early 10th century, from [1]). No information from this document was used in training at all; but letter 1tu arbitror pater etiam nisi factum primum siet vero illi inter hic michi ibi qui tu ibi michi 0.9 0.8 0.7 qui hic 0.6 inter 0.5 illi 0.4 siet 0.3 vero 0.2 nisi 0.1 50 100 150 200 250 300 350 400 450 500 550 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Figure 6: On the left, search results for selected words (indicated on the leftmost column). Each row represents the ranked search results for a term, and a black mark appears if the search term is actually in the line; a successful search will therefore appear as a row which is wholly dark to the left, and then wholly light. Note only the top 300 results are represented, and that lines containing the search term are almost always at or close to the top of the search results (black marks to the left). On the right, we plot precision against recall for a set of different words by taking the top 10, 20, ... lines returned from the search, and checking them against the aligned manuscript. Note that, once all cases have been found, if the size of the pool is increased the precision will fall with 100% recall; many words work well, with most of the first 20 or so lines returned containing the search term. shapes are sufficiently well shared that the search is still useful. All this suggests that one might be able to use EM to link three processes: one that clusters to determine letter shapes; one that segments letters; and one that imposes a language model. Such a system might be able to make handwritten Latin searchable with no training data. References [1] Early Manuscripts at Oxford University. Bodleian library ms. auct. f. 2.13. http://image.ox.ac.uk/. [2] Serge Belongie, Jitendra Malik, and Jan Puzicha. Shape matching and object recognition using shape contexts. IEEE T. Pattern Analysis and Machine Intelligence, 24(4):509–522, 2002. [3] Philipp Koehn and Kevin Knight. Estimating word translation probabilities from unrelated monolingual corpora. In Proc. of the 17th National Conf. on AI, pages 711–715. AAAI Press / The MIT Press, 2000. [4] Y. LeCun, L. Bottou, and Y. Bengio. Reading checks with graph transformer networks. In International Conference on Acoustics, Speech, and Signal Processing, volume 1, pages 151–154, Munich, 1997. IEEE. [5] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998. [6] R. Plamondon and S.N. Srihari. Online and off-line handwriting recognition: a comprehensive survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1):63–84, 2000. [7] T. M. Rath and R. Manmatha. Word image matching using dynamic time warping. In Proc. of the Conf. on Computer Vision and Pattern Recognition (CVPR), volume 2, pages 521–527, 2003. [8] Alessandro Vinciarelli, Samy Bengio, and Horst Bunke. Offline recognition of unconstrained handwritten texts using hmms and statistical language models. IEEE Trans. Pattern Anal. Mach. Intell., 26(6):709–720, 2004. michi: Spe incerta certum mihi laborem sustuli, mihi: Faciuntne intellegendo ut nil intellegant? michi: Nonnumquam conlacrumabat. placuit tum id mihi. mihi: Placuit: despondi. hic nuptiis dictust dies. michi: Sto exspectans siquid mi imperent. venit una,

6 0.54523277 196 nips-2004-Triangle Fixing Algorithms for the Metric Nearness Problem

7 0.53155589 14 nips-2004-A Topographic Support Vector Machine: Classification Using Local Label Configurations

8 0.48449764 142 nips-2004-Outlier Detection with One-class Kernel Fisher Discriminants

9 0.45424905 168 nips-2004-Semigroup Kernels on Finite Sets

10 0.44156513 18 nips-2004-Algebraic Set Kernels with Application to Inference Over Local Image Representations

11 0.43918189 125 nips-2004-Multiple Relational Embedding

12 0.43776789 156 nips-2004-Result Analysis of the NIPS 2003 Feature Selection Challenge

13 0.43214202 145 nips-2004-Parametric Embedding for Class Visualization

14 0.41865885 40 nips-2004-Common-Frame Model for Object Recognition

15 0.41184586 53 nips-2004-Discriminant Saliency for Visual Recognition from Cluttered Scenes

16 0.40849972 15 nips-2004-Active Learning for Anomaly and Rare-Category Detection

17 0.40569395 111 nips-2004-Maximal Margin Labeling for Multi-Topic Text Categorization

18 0.40427813 93 nips-2004-Kernel Projection Machine: a New Tool for Pattern Recognition

19 0.39305377 3 nips-2004-A Feature Selection Algorithm Based on the Global Minimization of a Generalization Error Bound

20 0.38746792 94 nips-2004-Kernels for Multi--task Learning


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(13, 0.111), (15, 0.121), (26, 0.046), (31, 0.015), (33, 0.162), (35, 0.404), (39, 0.015), (50, 0.037), (76, 0.012)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.99435467 140 nips-2004-Optimal Information Decoding from Neuronal Populations with Specific Stimulus Selectivity

Author: Marcelo A. Montemurro, Stefano Panzeri

Abstract: A typical neuron in visual cortex receives most inputs from other cortical neurons with a roughly similar stimulus preference. Does this arrangement of inputs allow efficient readout of sensory information by the target cortical neuron? We address this issue by using simple modelling of neuronal population activity and information theoretic tools. We find that efficient synaptic information transmission requires that the tuning curve of the afferent neurons is approximately as wide as the spread of stimulus preferences of the afferent neurons reaching the target neuron. By meta analysis of neurophysiological data we found that this is the case for cortico-cortical inputs to neurons in visual cortex. We suggest that the organization of V1 cortico-cortical synaptic inputs allows optimal information transmission. 1

same-paper 2 0.8870213 134 nips-2004-Object Classification from a Single Example Utilizing Class Relevance Metrics

Author: Michael Fink

Abstract: We describe a framework for learning an object classifier from a single example. This goal is achieved by emphasizing the relevant dimensions for classification using available examples of related classes. Learning to accurately classify objects from a single training example is often unfeasible due to overfitting effects. However, if the instance representation provides that the distance between each two instances of the same class is smaller than the distance between any two instances from different classes, then a nearest neighbor classifier could achieve perfect performance with a single training example. We therefore suggest a two stage strategy. First, learn a metric over the instances that achieves the distance criterion mentioned above, from available examples of other related classes. Then, using the single examples, define a nearest neighbor classifier where distance is evaluated by the learned class relevance metric. Finding a metric that emphasizes the relevant dimensions for classification might not be possible when restricted to linear projections. We therefore make use of a kernel based metric learning algorithm. Our setting encodes object instances as sets of locality based descriptors and adopts an appropriate image kernel for the class relevance metric learning. The proposed framework for learning from a single example is demonstrated in a synthetic setting and on a character classification task. 1

3 0.78403431 1 nips-2004-A Cost-Shaping LP for Bellman Error Minimization with Performance Guarantees

Author: Daniela D. Farias, Benjamin V. Roy

Abstract: We introduce a new algorithm based on linear programming that approximates the differential value function of an average-cost Markov decision process via a linear combination of pre-selected basis functions. The algorithm carries out a form of cost shaping and minimizes a version of Bellman error. We establish an error bound that scales gracefully with the number of states without imposing the (strong) Lyapunov condition required by its counterpart in [6]. We propose a path-following method that automates selection of important algorithm parameters which represent counterparts to the “state-relevance weights” studied in [6]. 1

4 0.663234 173 nips-2004-Spike-timing Dependent Plasticity and Mutual Information Maximization for a Spiking Neuron Model

Author: Taro Toyoizumi, Jean-pascal Pfister, Kazuyuki Aihara, Wulfram Gerstner

Abstract: We derive an optimal learning rule in the sense of mutual information maximization for a spiking neuron model. Under the assumption of small fluctuations of the input, we find a spike-timing dependent plasticity (STDP) function which depends on the time course of excitatory postsynaptic potentials (EPSPs) and the autocorrelation function of the postsynaptic neuron. We show that the STDP function has both positive and negative phases. The positive phase is related to the shape of the EPSP while the negative phase is controlled by neuronal refractoriness. 1

5 0.63691229 84 nips-2004-Inference, Attention, and Decision in a Bayesian Neural Architecture

Author: Angela J. Yu, Peter Dayan

Abstract: We study the synthesis of neural coding, selective attention and perceptual decision making. A hierarchical neural architecture is proposed, which implements Bayesian integration of noisy sensory input and topdown attentional priors, leading to sound perceptual discrimination. The model offers an explicit explanation for the experimentally observed modulation that prior information in one stimulus feature (location) can have on an independent feature (orientation). The network’s intermediate levels of representation instantiate known physiological properties of visual cortical neurons. The model also illustrates a possible reconciliation of cortical and neuromodulatory representations of uncertainty. 1

6 0.63412595 181 nips-2004-Synergies between Intrinsic and Synaptic Plasticity in Individual Model Neurons

7 0.63212395 194 nips-2004-Theory of localized synfire chain: characteristic propagation speed of stable spike pattern

8 0.62918061 28 nips-2004-Bayesian inference in spiking neurons

9 0.62831336 76 nips-2004-Hierarchical Bayesian Inference in Networks of Spiking Neurons

10 0.62754601 21 nips-2004-An Information Maximization Model of Eye Movements

11 0.62332982 151 nips-2004-Rate- and Phase-coded Autoassociative Memory

12 0.62160844 153 nips-2004-Reducing Spike Train Variability: A Computational Theory Of Spike-Timing Dependent Plasticity

13 0.61211121 157 nips-2004-Saliency-Driven Image Acuity Modulation on a Reconfigurable Array of Spiking Silicon Neurons

14 0.60722339 172 nips-2004-Sparse Coding of Natural Images Using an Overcomplete Set of Limited Capacity Units

15 0.60486799 118 nips-2004-Methods for Estimating the Computational Power and Generalization Capability of Neural Microcircuits

16 0.60394645 58 nips-2004-Edge of Chaos Computation in Mixed-Mode VLSI - A Hard Liquid

17 0.58581632 131 nips-2004-Non-Local Manifold Tangent Learning

18 0.58525205 170 nips-2004-Similarity and Discrimination in Classical Conditioning: A Latent Variable Account

19 0.57350725 112 nips-2004-Maximising Sensitivity in a Spiking Network

20 0.57326353 26 nips-2004-At the Edge of Chaos: Real-time Computations and Self-Organized Criticality in Recurrent Neural Networks