nips nips2003 nips2003-12 knowledge-graph by maker-knowledge-mining

12 nips-2003-A Model for Learning the Semantics of Pictures


Source: pdf

Author: Victor Lavrenko, R. Manmatha, Jiwoon Jeon

Abstract: We propose an approach to learning the semantics of images which allows us to automatically annotate an image with keywords and to retrieve images based on text queries. We do this using a formalism that models the generation of annotated images. We assume that every image is divided into regions, each described by a continuous-valued feature vector. Given a training set of images with annotations, we compute a joint probabilistic model of image features and words which allows us to predict the probability of generating a word given the image regions. This may be used to automatically annotate and retrieve images given a word as a query. Experiments show that our model significantly outperforms the best of the previously reported results on the tasks of automatic image annotation and retrieval. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 edu Abstract We propose an approach to learning the semantics of images which allows us to automatically annotate an image with keywords and to retrieve images based on text queries. [sent-6, score-1.037]

2 We assume that every image is divided into regions, each described by a continuous-valued feature vector. [sent-8, score-0.346]

3 Given a training set of images with annotations, we compute a joint probabilistic model of image features and words which allows us to predict the probability of generating a word given the image regions. [sent-9, score-1.213]

4 This may be used to automatically annotate and retrieve images given a word as a query. [sent-10, score-0.507]

5 Experiments show that our model significantly outperforms the best of the previously reported results on the tasks of automatic image annotation and retrieval. [sent-11, score-0.874]

6 1 Introduction. Historically, librarians have retrieved images by first manually annotating them with keywords. [sent-12, score-0.389]

7 Underlying this approach is the belief that the words associated (manually) with a picture essentially capture the semantics of the picture and any retrieval based on these keywords will, therefore, retrieve relevant pictures. [sent-14, score-0.558]

8 Since manual image annotation is expensive, there has been great interest in coming up with automatic ways to retrieve images based on content. [sent-15, score-1.099]

9 Queries based on image concepts like color or texture have been proposed for retrieving images by content but most users find it difficult to query using such visual attributes. [sent-16, score-0.807]

10 Most people would prefer to pose text queries and find images relevant to those queries. [sent-17, score-0.356]

11 This is difficult, if not impossible, with many current image retrieval systems, which has hindered their widespread adoption. [sent-19, score-0.43]

12 We propose a model which looks at the probability of associating words with image regions. [sent-20, score-0.447]

13 For example, the association of a region with the word tiger is increased by the fact that there is a grass region and a water region in the same image and should be decreased if instead there is a region corresponding to the interior of an aircraft. [sent-23, score-0.775]

14 Thus the association of different regions provides context while the association of words with image regions provides meaning. [sent-24, score-0.776]

15 Our model computes a joint probability of image features over different regions in an image using a training set and uses this joint probability to annotate and retrieve images. [sent-25, score-1.039]

16 More formally, we propose a statistical generative model to automatically learn the semantics of images - that is, for annotating and retrieving images based on a training set of images. [sent-26, score-0.795]

17 We assume that an image is segmented into regions (although the regions could simply be a partition of the image) and that features are computed over each of these regions. [sent-27, score-0.646]

18 Given a training set of images with annotations, we show that probabilistic models allow us to predict the probability of generating a word given the features computed over different regions in an image. [sent-28, score-0.638]

19 This may be used to automatically annotate and retrieve images given a word as a query. [sent-29, score-0.507]

20 We show that the continuous relevance model - a statistical generative model related to relevance models in information retrieval - allows us to derive these probabilities in a natural way. [sent-30, score-0.52]

21 The model proposed here directly associates continuous features with words and does not require an intermediate clustering stage. [sent-31, score-0.295]

22 Experiments show that the annotation performance of this continuous relevance model is substantially better than any other model tested on the same data set. [sent-32, score-0.698]

23 The model also allows ranked retrieval in response to a text query and again performs much better than any other model in this regard. [sent-35, score-0.499]

24 2 Related Work. Recently, there has been some work on automatically annotating images by looking at the probability of associating words with image regions. [sent-37, score-0.746]

25 Mori et al. [9] proposed a Co-occurrence Model in which they looked at the co-occurrence of words with image regions created using a regular grid. [sent-39, score-0.594]

26 Duygulu et al. [4] proposed to describe images using a vocabulary of blobs. [sent-40, score-0.319]

27 For each region, features are computed and then blobs are generated by clustering the image features for these regions across images. [sent-42, score-0.585]
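
As a rough sketch of this discrete pipeline (not the actual code of [4]; the function names and the choice of 500 clusters are illustrative assumptions), region feature vectors can be quantized into a blob vocabulary with k-means:

    import numpy as np
    from sklearn.cluster import KMeans

    def build_blob_vocabulary(region_features, n_blobs=500):
        # region_features: (num_regions_across_all_images, k) feature matrix.
        # Each cluster centre becomes one "blob" in the discrete vocabulary.
        return KMeans(n_clusters=n_blobs, n_init=10, random_state=0).fit(region_features)

    def image_to_blobs(vocab, image_region_features):
        # Replace each region of one image by the index of its nearest blob.
        return vocab.predict(image_region_features)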

28 Their Translation Model applies one of the classical statistical machine translation models to translate from the set of keywords of an image to the set of blobs forming the image. [sent-44, score-0.423]

29 Jeon et al. [5] instead assumed that this could be viewed as analogous to the cross-lingual retrieval problem and used a cross-media relevance model (CMRM) to perform both image annotation and ranked retrieval. [sent-45, score-1.115]

30 In order to use CMRM for image annotation we have to quantize the continuous features (position, size, texture, shape, color, . . . ). [sent-63, score-0.841]

31 Figure 1: A generative model of annotated images. The figure shows annotation words (tiger = w1, grass = w2, sun = w3) drawn from P(w|J), generator vectors g1, g2, g3 drawn from P(g|J), and regions r1, r2, r3 obtained via P(r|g). [sent-66, score-0.334]

32 In section 4 we will show that CRM performs significantly better than all previously proposed models on the tasks of image annotation and retrieval. [sent-87, score-0.798]

33 3 A Model of Annotated Images. The purpose of this section is to introduce a statistical formalism that will allow us to model a relationship between the contents of a given image and the annotation of that image. [sent-89, score-0.785]

34 We will describe an approach to learning a joint probability distribution P (r, w) over the regions r of some image and the words w in its annotation. [sent-90, score-0.603]

35 Suppose we are given a new image for which no annotation is provided. [sent-93, score-0.746]

36 Having a joint distribution allows us to compute a conditional likelihood P (w|r) which can then be used to guess the most likely annotation w for the image in question. [sent-95, score-0.781]

37 The new annotation can be presented to a user, indexed, or used for retrieval purposes. [sent-96, score-0.618]

38 Suppose we are given a collection of un-annotated images and a text query wqry consisting of a few keywords. [sent-99, score-0.556]

39 Knowing the joint model of images and annotations, we can compute the query likelihood P (wqry |rJ ) for every image J in the dataset. [sent-100, score-0.759]

40 We can then rank images in the collection according to their likelihood of having the query as annotation, resulting in a special case of the popular Language Modeling approach to Information Retrieval [6]. [sent-101, score-0.43]
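
A minimal sketch of this ranking step, assuming some function query_likelihood(query_words, regions) that returns P(wqry|rJ) under the joint model (both names are placeholders, not from the paper):

    def rank_images(query_words, images, query_likelihood):
        # images: list of (image_id, region_feature_vectors) pairs.
        scored = [(query_likelihood(query_words, regions), image_id)
                  for image_id, regions in images]
        scored.sort(reverse=True)      # highest P(wqry | rJ) first
        return [image_id for _, image_id in scored]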

41 Section 3.2 presents a generative framework for relating image regions with image annotations. [sent-106, score-0.763]

42 We assume that C includes one “transparent” color c0 , which will be handy when we have to layer image regions. [sent-111, score-0.321]

43 We assume that each image contains several distinct regions {r1 . . . rn }. [sent-114, score-0.439]

44 The final image is the result of stacking or layering the regions on top of each other, as shown on the right side of Figure 1. [sent-121, score-0.475]

45 In our model of images, a central part will be played by a special function G which maps image regions r ∈ R to real-valued vectors g ∈ ℝk . [sent-122, score-0.507]

46 When we model image generation we will treat the output of G as a generator or a “recipe” for producing a certain type of image. [sent-126, score-0.384]

47 For example, a feature vector g1 = G(r1 ) can be thought of as a generator for any image region resembling a sun-like object in the upper-left corner. [sent-127, score-0.496]
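
The summary does not spell out which features G computes; a toy stand-in using the kinds of features mentioned above (position, size, color) could look like this, with k = 6 (all feature choices here are illustrative assumptions):

    import numpy as np

    def G(mask, rgb):
        # Toy feature map: region -> vector in R^6.
        # mask: boolean H x W array marking the region's pixels; rgb: H x W x 3 image.
        ys, xs = np.nonzero(mask)
        h, w = mask.shape
        position = [ys.mean() / h, xs.mean() / w]          # normalized centroid
        size = [mask.mean()]                               # fraction of image area
        color = list(rgb[mask].mean(axis=0) / 255.0)       # mean RGB of the region
        return np.array(position + size + color)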

48 Finally, an annotation for a given image is a set of words {w1 . . . wm }. [sent-128, score-0.875]

49 We assume that the annotation describes the objects represented by regions {r1 . . . rn }. [sent-132, score-0.627]

50 However, contrary to prior work [4, 3] we do not assume an underlying one-to-one correspondence between the objects in the image and the words in the annotation. [sent-136, score-0.875]

51 Instead, we are interested in modeling a joint probability for observing a set of image regions {r1 . . . rn } together with the set of annotation words {w1 . . . wm }. [sent-137, score-0.506]

53 According to the previous section, J is represented as a set of image regions rJ = {r1 . . . rn } along with the corresponding annotation wJ = {w1 . . . wm }. [sent-146, score-0.439]

55 Let rA = {r1 . . . rnA } denote the regions of some image A, which is not in the training set T . [sent-172, score-0.475]

56 We would like to model P (rA , wB ), the joint probability of observing an image defined by rA together with annotation words wB . [sent-177, score-0.949]

57 1. Pick a training image J ∈ T with some probability PT (J). [sent-181, score-0.315]

58 2. For b = 1 . . . nB : (a) Pick the annotation word wb from the multinomial distribution PV (·|J). [sent-185, score-0.733]

59 (b) Pick the image region ra according to the probability PR (ra |ga ), where ga is a generator vector sampled from PG (·|J). [Footnote 1: The assumptions of a finite colormap and fixed image size can easily be relaxed but require arguments that are beyond the scope of this paper.] [sent-191, score-0.755]
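
Read as a sampler, the generative process just listed might be sketched as follows (PT, PV, PG and PR stand for the distributions defined below; the exact signatures are assumptions made for illustration):

    import numpy as np

    def sample_annotated_image(training_images, PT, PV, PG, PR, nA, nB, rng):
        # Step 1: pick a training image J with probability PT(J).
        idx = rng.choice(len(training_images), p=[PT(J) for J in training_images])
        J = training_images[idx]
        # Step 2: draw nB annotation words from the multinomial PV(.|J).
        words = [PV(J, rng) for _ in range(nB)]
        # Step 3: draw nA regions; note nB need not equal nA.
        regions = []
        for _ in range(nA):
            g = PG(J, rng)              # sample a generator vector g
            regions.append(PR(g, rng))  # map g to an actual image region
        return regions, words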

60 We show the process of generating a simple image consisting of three regions and a corresponding 3-word annotation. [sent-193, score-0.472]

61 Note that the number of words in the annotation nB does not have to be the same as the number of image regions nA . [sent-194, score-1.035]

62 PT (J) is the probability of selecting the underlying model of image J to generate some new observation r, w. [sent-197, score-0.318]

63 PR (r|g) is a global probability distribution responsible for mapping generator vectors g ∈ ℝk to actual image regions r ∈ R. [sent-199, score-0.534]

64 In our case, for every image region r there is only one corresponding generator g = G(r), so we can assume a particularly simple form for the distribution PR : PR (r|g) = 1/Ng if G(r) = g, and 0 otherwise (2), where Ng is the number of all regions r′ in R such that G(r′ ) = g. [sent-200, score-0.57]
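
Equation (2) in code, assuming the regions of R have been grouped by their generator vector in advance (names are illustrative):

    from collections import defaultdict

    def build_generator_index(all_regions, G):
        # Group every region r in R by its generator g = G(r).
        index = defaultdict(list)
        for r in all_regions:
            index[tuple(G(r))].append(r)   # tuple() makes the vector hashable
        return index

    def P_R(r, g, index):
        # P(r|g) = 1/Ng if G(r) = g, and 0 otherwise (equation 2).
        bucket = index.get(tuple(g), [])
        return 1.0 / len(bucket) if r in bucket else 0.0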

65 The generator vectors g1 . . . gn are later mapped to image regions rJ according to PR . [sent-205, score-0.439]

66 Taking {r1 . . . rn } to be the set of regions of image J, we estimate: PG (g|J) = (1/n) Σi=1..n exp{ −(g − G(ri ))ᵀ Σ−1 (g − G(ri )) } / √(2^k π^k |Σ|) (3). Equation (3) arises out of placing a Gaussian kernel over the feature vector G(ri ) of every region of image J. [sent-210, score-0.825]
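
Equation (3) as reconstructed above, assuming a diagonal covariance Σ (the diagonal restriction is an assumption made here for simplicity, not something the text states):

    import numpy as np

    def P_G(g, region_vectors, sigma2):
        # region_vectors: the n vectors G(r_i) of training image J, shape (n, k).
        # sigma2: length-k vector of per-dimension variances (diagonal Sigma).
        G_J = np.asarray(region_vectors)
        n, k = G_J.shape
        diff = np.asarray(g) - G_J                       # (n, k)
        mahal = np.sum(diff * diff / sigma2, axis=1)     # (g-G(ri))' Sigma^-1 (g-G(ri))
        norm = np.sqrt(np.prod(2.0 * np.pi * sigma2))    # sqrt(2^k pi^k |Sigma|)
        return np.mean(np.exp(-mahal) / norm)            # average of the n kernels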

67 PV (·|J) is the multinomial distribution that is assumed to have generated the annotation wJ of image J ∈ T . [sent-215, score-0.788]

68 Here µ is a constant, selected empirically, and pv is the relative frequency of observing the word v in the training set. [sent-219, score-0.296]
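
The exact smoothing formula is not reproduced in this summary; a standard Dirichlet-smoothed multinomial built from exactly these two ingredients (the constant µ and the background frequency pv) would be the following sketch:

    def P_V(v, annotation_J, word_counts, total_words, mu):
        # word_counts / total_words give pv, the relative frequency of v
        # in the training set; annotation_J is the list of words of image J.
        pv = word_counts.get(v, 0) / total_words
        n_vJ = annotation_J.count(v)
        return (n_vJ + mu * pv) / (len(annotation_J) + mu)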

69 Table 1: Comparing recall and precision of the four models on the task of automatic image annotation. [sent-241, score-0.535]

70 Prior to modeling, every image in the dataset is pre-segmented into regions using general-purpose algorithms, such as normalized cuts [11]. [sent-249, score-0.464]

71 We divided the dataset into three parts: a 4,000-image training set, a 500-image evaluation set, and a 500-image test set. [sent-254, score-0.562]

72 After fixing the parameters, we merged the 4,000-image training set and the 500-image evaluation set to make a new training set. [sent-256, score-0.323]

73 This corresponds to the training set of 4,500 images and the test set of 500 images used by Duygulu et al. [4]. [sent-257, score-0.517]

74 4.1 Results: Automatic Image Annotation. In this section we evaluate the performance of our model on the task of automatic image annotation. [sent-259, score-0.367]

75 We are given an un-annotated image J and are asked to automatically produce an annotation wauto . [sent-260, score-0.788]

76 The automatic annotation is then compared to the held-out human annotation wJ . [sent-261, score-0.983]

77 Given a set of image regions rJ we use equation (1) to arrive at the conditional distribution P (w|rJ ). [sent-263, score-0.439]
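
Equation (1) itself does not survive in this summary; under the generative story above, one consistent way to score a word is the mixture over training images sketched here, taking P_G and P_V as per-image lookups (a simplification for illustration, not the paper's exact derivation):

    import numpy as np

    def word_score(w, test_region_vectors, training_set, P_T, P_V, P_G):
        # score(w) proportional to sum_J PT(J) * PV(w|J) * prod_a PG(g_a|J)
        total = 0.0
        for J in training_set:
            region_term = np.prod([P_G(g, J) for g in test_region_vectors])
            total += P_T(J) * P_V(w, J) * region_term
        return total

    def annotate(test_region_vectors, vocabulary, training_set,
                 P_T, P_V, P_G, top=5):
        scores = {w: word_score(w, test_region_vectors, training_set,
                                P_T, P_V, P_G) for w in vocabulary}
        # Normalizing the scores gives P(w|rJ); ranking does not need it.
        return sorted(scores, key=scores.get, reverse=True)[:top]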

78 We take the top 5 words from that distribution and call them the automatic annotation of the image in question. [sent-264, score-0.924]

79 Then, following [4], we compute annotation recall and precision for every word in the testing set. [sent-265, score-0.787]

80 Recall is the number of images correctly annotated with a given word, divided by the number of images that have that word in the human annotation. [sent-266, score-0.688]

81 Precision is the number of correctly annotated images divided by the total number of images annotated with that particular word (correctly or not). [sent-267, score-0.79]
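
These two definitions in code (dictionaries mapping image ids to word sets are an assumed representation):

    def recall_precision(word, auto_annotations, human_annotations):
        # auto_annotations / human_annotations: dicts image_id -> set of words.
        auto = {i for i, ws in auto_annotations.items() if word in ws}
        human = {i for i, ws in human_annotations.items() if word in ws}
        correct = len(auto & human)
        recall = correct / len(human) if human else 0.0
        precision = correct / len(auto) if auto else 0.0
        return recall, precision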

82 We compare the annotation performance of the four models: the Co-occurrence Model [9], the Translation Model [4], CMRM [5] and the model proposed in this paper (CRM). [sent-269, score-0.532]

83 We report the results on two sets of words: the subset of 49 best words which was used by [4, 5], and the complete set of all 260 words that occur in the testing set. [sent-270, score-0.286]

84 The figures clearly show that the model presented here (CRM) substantially outperforms the other models and is the only one of the four capable of producing reasonable mean recall and mean precision numbers when every word in the test set is used. [sent-272, score-0.397]

85 Figure 2: The generative model based on continuous features (CRM) that is proposed here performs substantially better than the discrete cross-media relevance model (CMRM) for annotating images in the test set. [sent-279, score-0.617]

86 Table 2 (layout): query length (1, 2, or 3 words), number of queries (179, 386, 178), relevant images (1,675, 1,647, 542), and precision after 5 retrieved images for CMRM vs. CRM (of the numeric cells, only 0.4471 and +61% are recoverable). [sent-280, score-0.971]

87 Table 2: Comparing our model to the Cross-Media Relevance Model (CMRM) on the task of image retrieval. [sent-296, score-0.41]

88 Our model outperforms the CMRM model by a wide margin on all query sets. [sent-297, score-0.299]

89 In the retrieval setting we are given a text query wqry and a testing collection of un-annotated images. [sent-302, score-0.51]

90 For each testing image J we use equation (1) to get the conditional probability P (wqry |rJ ). [sent-303, score-0.307]

91 All images in the collection are ranked according to the conditional likelihood P (wqry |rJ ). [sent-304, score-0.301]

92 An image is considered relevant to a given query if its manual annotation contains all of the query words. [sent-309, score-1.143]

93 As our evaluation metrics we use precision at 5 retrieved images and non-interpolated average precision (footnote 3), averaged over the entire query set. [sent-310, score-0.664]

94 We observe that our model substantially outperforms the CMRM baseline on every query set. [sent-315, score-0.315]

95 All improvements on 1-, 2- and 3-word queries are statistically significant based on a sign test. [Footnote 3: Average precision is the average of precision values at the ranks where relevant items occur.] [sent-317, score-0.406]
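
Non-interpolated average precision, implementing exactly the footnote's definition:

    def average_precision(ranked_ids, relevant_ids):
        # Average of the precision values at the ranks of relevant items.
        hits, total = 0, 0.0
        for rank, image_id in enumerate(ranked_ids, start=1):
            if image_id in relevant_ids:
                hits += 1
                total += hits / rank    # precision at this relevant rank
        return total / len(relevant_ids) if relevant_ids else 0.0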

96 Figure 3: Example: top 5 images retrieved in response to text query “cars track”. [sent-318, score-0.536]

97 We are also very encouraged by the precision our model shows at 5 retrieved images: precision values around 0.2 suggest that an average query always has a relevant image in the top 5. [sent-320, score-0.41]

99 Figure 3 shows the top 5 images retrieved in response to the text query “cars track”. [sent-322, score-0.536]

100 We showed that this model works significantly better than a number of other models for image annotation and retrieval. [sent-324, score-0.811]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('annotation', 0.467), ('crm', 0.322), ('cmrm', 0.304), ('image', 0.279), ('images', 0.225), ('query', 0.181), ('regions', 0.16), ('retrieval', 0.151), ('pv', 0.149), ('precision', 0.139), ('ra', 0.132), ('words', 0.129), ('wb', 0.113), ('word', 0.111), ('pg', 0.105), ('annotated', 0.102), ('relevance', 0.096), ('rj', 0.094), ('duygulu', 0.093), ('retrieved', 0.093), ('jeon', 0.089), ('wqry', 0.089), ('annotations', 0.079), ('blei', 0.079), ('retrieve', 0.079), ('wj', 0.079), ('annotating', 0.071), ('lavrenko', 0.071), ('sigir', 0.071), ('pr', 0.07), ('generator', 0.066), ('region', 0.065), ('queries', 0.059), ('semantics', 0.058), ('ga', 0.057), ('mori', 0.054), ('retrieving', 0.054), ('tiger', 0.054), ('ranked', 0.052), ('dirichlet', 0.052), ('translation', 0.05), ('annotate', 0.05), ('automatic', 0.049), ('sun', 0.047), ('features', 0.047), ('grass', 0.047), ('generative', 0.045), ('object', 0.044), ('pt', 0.043), ('keywords', 0.042), ('cars', 0.042), ('color', 0.042), ('recall', 0.042), ('feature', 0.042), ('multinomial', 0.042), ('automatically', 0.042), ('outperforms', 0.04), ('barnard', 0.04), ('model', 0.039), ('vocabulary', 0.037), ('text', 0.037), ('training', 0.036), ('croft', 0.036), ('ponte', 0.036), ('stacking', 0.036), ('relevant', 0.035), ('joint', 0.035), ('improvements', 0.034), ('generating', 0.033), ('picture', 0.032), ('acm', 0.032), ('modeling', 0.032), ('na', 0.031), ('doubly', 0.031), ('al', 0.031), ('latent', 0.03), ('ri', 0.03), ('vectors', 0.029), ('substantially', 0.029), ('continuous', 0.028), ('gures', 0.028), ('granularity', 0.028), ('testing', 0.028), ('pixels', 0.027), ('evaluation', 0.026), ('models', 0.026), ('baseline', 0.026), ('blobs', 0.026), ('nb', 0.026), ('proposed', 0.026), ('clustering', 0.026), ('pick', 0.025), ('dataset', 0.025), ('divided', 0.025), ('prominent', 0.025), ('quantize', 0.025), ('topological', 0.025), ('track', 0.025), ('collection', 0.024), ('association', 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 12 nips-2003-A Model for Learning the Semantics of Pictures

Author: Victor Lavrenko, R. Manmatha, Jiwoon Jeon

Abstract: We propose an approach to learning the semantics of images which allows us to automatically annotate an image with keywords and to retrieve images based on text queries. We do this using a formalism that models the generation of annotated images. We assume that every image is divided into regions, each described by a continuous-valued feature vector. Given a training set of images with annotations, we compute a joint probabilistic model of image features and words which allows us to predict the probability of generating a word given the image regions. This may be used to automatically annotate and retrieve images given a word as a query. Experiments show that our model significantly outperforms the best of the previously reported results on the tasks of automatic image annotation and retrieval. 1

2 0.27598774 37 nips-2003-Automatic Annotation of Everyday Movements

Author: Deva Ramanan, David A. Forsyth

Abstract: This paper describes a system that can annotate a video sequence with: a description of the appearance of each actor; when the actor is in view; and a representation of the actor’s activity while in view. The system does not require a fixed background, and is automatic. The system works by (1) tracking people in 2D and then, using an annotated motion capture dataset, (2) synthesizing an annotated 3D motion sequence matching the 2D tracks. The 3D motion capture data is manually annotated off-line using a class structure that describes everyday motions and allows motion annotations to be composed — one may jump while running, for example. Descriptions computed from video of real motions show that the method is accurate.

3 0.19750996 17 nips-2003-A Sampled Texture Prior for Image Super-Resolution

Author: Lyndsey C. Pickup, Stephen J. Roberts, Andrew Zisserman

Abstract: Super-resolution aims to produce a high-resolution image from a set of one or more low-resolution images by recovering or inventing plausible high-frequency image content. Typical approaches try to reconstruct a high-resolution image using the sub-pixel displacements of several lowresolution images, usually regularized by a generic smoothness prior over the high-resolution image space. Other methods use training data to learn low-to-high-resolution matches, and have been highly successful even in the single-input-image case. Here we present a domain-specific image prior in the form of a p.d.f. based upon sampled images, and show that for certain types of super-resolution problems, this sample-based prior gives a significant improvement over other common multiple-image super-resolution techniques. 1

4 0.10087711 73 nips-2003-Feature Selection in Clustering Problems

Author: Volker Roth, Tilman Lange

Abstract: A novel approach to combining clustering and feature selection is presented. It implements a wrapper strategy for feature selection, in the sense that the features are directly selected by optimizing the discriminative power of the used partitioning algorithm. On the technical side, we present an efficient optimization algorithm with guaranteed local convergence property. The only free parameter of this method is selected by a resampling-based stability analysis. Experiments with real-world datasets demonstrate that our method is able to infer both meaningful partitions and meaningful subsets of features. 1

5 0.10033044 86 nips-2003-ICA-based Clustering of Genes from Microarray Expression Data

Author: Su-in Lee, Serafim Batzoglou

Abstract: We propose an unsupervised methodology using independent component analysis (ICA) to cluster genes from DNA microarray data. Based on an ICA mixture model of genomic expression patterns, linear and nonlinear ICA finds components that are specific to certain biological processes. Genes that exhibit significant up-regulation or down-regulation within each component are grouped into clusters. We test the statistical significance of enrichment of gene annotations within each cluster. ICA-based clustering outperformed other leading methods in constructing functionally coherent clusters on various datasets. This result supports our model of genomic expression data as composite effect of independent biological processes. Comparison of clustering performance among various ICA algorithms including a kernel-based nonlinear ICA algorithm shows that nonlinear ICA performed the best for small datasets and natural-gradient maximization-likelihood worked well for all the datasets. 1

6 0.096562542 88 nips-2003-Image Reconstruction by Linear Programming

7 0.095045298 192 nips-2003-Using the Forest to See the Trees: A Graphical Model Relating Features, Objects, and Scenes

8 0.087939821 39 nips-2003-Bayesian Color Constancy with Non-Gaussian Models

9 0.086369656 119 nips-2003-Local Phase Coherence and the Perception of Blur

10 0.085987851 164 nips-2003-Ranking on Data Manifolds

11 0.082549475 139 nips-2003-Nonlinear Filtering of Electron Micrographs by Means of Support Vector Regression

12 0.081682205 133 nips-2003-Mutual Boosting for Contextual Inference

13 0.075222783 9 nips-2003-A Kullback-Leibler Divergence Based Kernel for SVM Classification in Multimedia Applications

14 0.068841368 190 nips-2003-Unsupervised Color Decomposition Of Histologically Stained Tissue Samples

15 0.067059293 138 nips-2003-Non-linear CCA and PCA by Alignment of Local Models

16 0.063751198 152 nips-2003-Pairwise Clustering and Graphical Models

17 0.063411549 195 nips-2003-When Does Non-Negative Matrix Factorization Give a Correct Decomposition into Parts?

18 0.054912493 85 nips-2003-Human and Ideal Observers for Detecting Image Curves

19 0.05384415 6 nips-2003-A Fast Multi-Resolution Method for Detection of Significant Spatial Disease Clusters

20 0.052984089 35 nips-2003-Attractive People: Assembling Loose-Limbed Models using Non-parametric Belief Propagation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.177), (1, -0.074), (2, 0.037), (3, -0.046), (4, -0.286), (5, -0.091), (6, 0.046), (7, 0.006), (8, 0.042), (9, -0.115), (10, -0.075), (11, 0.028), (12, -0.278), (13, 0.171), (14, 0.039), (15, -0.082), (16, -0.128), (17, -0.016), (18, -0.076), (19, -0.036), (20, -0.041), (21, 0.06), (22, -0.053), (23, 0.07), (24, -0.053), (25, 0.057), (26, -0.005), (27, -0.014), (28, 0.086), (29, -0.076), (30, 0.028), (31, 0.1), (32, 0.172), (33, -0.07), (34, -0.037), (35, -0.039), (36, 0.042), (37, 0.021), (38, 0.109), (39, 0.113), (40, 0.032), (41, -0.138), (42, -0.006), (43, -0.014), (44, -0.132), (45, -0.048), (46, -0.098), (47, -0.084), (48, -0.001), (49, 0.073)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95028478 12 nips-2003-A Model for Learning the Semantics of Pictures

Author: Victor Lavrenko, R. Manmatha, Jiwoon Jeon

Abstract: We propose an approach to learning the semantics of images which allows us to automatically annotate an image with keywords and to retrieve images based on text queries. We do this using a formalism that models the generation of annotated images. We assume that every image is divided into regions, each described by a continuous-valued feature vector. Given a training set of images with annotations, we compute a joint probabilistic model of image features and words which allow us to predict the probability of generating a word given the image regions. This may be used to automatically annotate and retrieve images given a word as a query. Experiments show that our model significantly outperforms the best of the previously reported results on the tasks of automatic image annotation and retrieval. 1

2 0.73458171 17 nips-2003-A Sampled Texture Prior for Image Super-Resolution

Author: Lyndsey C. Pickup, Stephen J. Roberts, Andrew Zisserman

Abstract: Super-resolution aims to produce a high-resolution image from a set of one or more low-resolution images by recovering or inventing plausible high-frequency image content. Typical approaches try to reconstruct a high-resolution image using the sub-pixel displacements of several lowresolution images, usually regularized by a generic smoothness prior over the high-resolution image space. Other methods use training data to learn low-to-high-resolution matches, and have been highly successful even in the single-input-image case. Here we present a domain-specific image prior in the form of a p.d.f. based upon sampled images, and show that for certain types of super-resolution problems, this sample-based prior gives a significant improvement over other common multiple-image super-resolution techniques. 1

3 0.67140782 37 nips-2003-Automatic Annotation of Everyday Movements

Author: Deva Ramanan, David A. Forsyth

Abstract: This paper describes a system that can annotate a video sequence with: a description of the appearance of each actor; when the actor is in view; and a representation of the actor’s activity while in view. The system does not require a fixed background, and is automatic. The system works by (1) tracking people in 2D and then, using an annotated motion capture dataset, (2) synthesizing an annotated 3D motion sequence matching the 2D tracks. The 3D motion capture data is manually annotated off-line using a class structure that describes everyday motions and allows motion annotations to be composed — one may jump while running, for example. Descriptions computed from video of real motions show that the method is accurate.

4 0.53565234 39 nips-2003-Bayesian Color Constancy with Non-Gaussian Models

Author: Charles Rosenberg, Alok Ladsariya, Tom Minka

Abstract: We present a Bayesian approach to color constancy which utilizes a nonGaussian probabilistic model of the image formation process. The parameters of this model are estimated directly from an uncalibrated image set and a small number of additional algorithmic parameters are chosen using cross validation. The algorithm is empirically shown to exhibit RMS error lower than other color constancy algorithms based on the Lambertian surface reflectance model when estimating the illuminants of a set of test images. This is demonstrated via a direct performance comparison utilizing a publicly available set of real world test images and code base.

5 0.50411028 88 nips-2003-Image Reconstruction by Linear Programming

Author: Koji Tsuda, Gunnar Rätsch

Abstract: A common way of image denoising is to project a noisy image to the subspace of admissible images made for instance by PCA. However, a major drawback of this method is that all pixels are updated by the projection, even when only a few pixels are corrupted by noise or occlusion. We propose a new method to identify the noisy pixels by 1 -norm penalization and update the identified pixels only. The identification and updating of noisy pixels are formulated as one linear program which can be solved efficiently. Especially, one can apply the ν-trick to directly specify the fraction of pixels to be reconstructed. Moreover, we extend the linear program to be able to exploit prior knowledge that occlusions often appear in contiguous blocks (e.g. sunglasses on faces). The basic idea is to penalize boundary points and interior points of the occluded area differently. We are able to show the ν-property also for this extended LP leading a method which is easy to use. Experimental results impressively demonstrate the power of our approach.

6 0.43404806 139 nips-2003-Nonlinear Filtering of Electron Micrographs by Means of Support Vector Regression

7 0.41328955 192 nips-2003-Using the Forest to See the Trees: A Graphical Model Relating Features, Objects, and Scenes

8 0.41287285 73 nips-2003-Feature Selection in Clustering Problems

9 0.40951872 190 nips-2003-Unsupervised Color Decomposition Of Histologically Stained Tissue Samples

10 0.40327105 54 nips-2003-Discriminative Fields for Modeling Spatial Dependencies in Natural Images

11 0.38645127 195 nips-2003-When Does Non-Negative Matrix Factorization Give a Correct Decomposition into Parts?

12 0.36719546 119 nips-2003-Local Phase Coherence and the Perception of Blur

13 0.36055061 35 nips-2003-Attractive People: Assembling Loose-Limbed Models using Non-parametric Belief Propagation

14 0.3546862 133 nips-2003-Mutual Boosting for Contextual Inference

15 0.34995762 11 nips-2003-A Mixed-Signal VLSI for Real-Time Generation of Edge-Based Image Vectors

16 0.33150455 69 nips-2003-Factorization with Uncertainty and Missing Data: Exploiting Temporal Coherence

17 0.29707778 6 nips-2003-A Fast Multi-Resolution Method for Detection of Significant Spatial Disease Clusters

18 0.28414762 85 nips-2003-Human and Ideal Observers for Detecting Image Curves

19 0.28003332 7 nips-2003-A Functional Architecture for Motion Pattern Processing in MSTd

20 0.27780318 120 nips-2003-Locality Preserving Projections


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.053), (11, 0.14), (25, 0.241), (29, 0.016), (30, 0.033), (35, 0.041), (49, 0.011), (53, 0.076), (66, 0.012), (71, 0.076), (76, 0.049), (85, 0.052), (91, 0.106)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.83455294 12 nips-2003-A Model for Learning the Semantics of Pictures

Author: Victor Lavrenko, R. Manmatha, Jiwoon Jeon

Abstract: We propose an approach to learning the semantics of images which allows us to automatically annotate an image with keywords and to retrieve images based on text queries. We do this using a formalism that models the generation of annotated images. We assume that every image is divided into regions, each described by a continuous-valued feature vector. Given a training set of images with annotations, we compute a joint probabilistic model of image features and words which allow us to predict the probability of generating a word given the image regions. This may be used to automatically annotate and retrieve images given a word as a query. Experiments show that our model significantly outperforms the best of the previously reported results on the tasks of automatic image annotation and retrieval. 1

2 0.68441755 109 nips-2003-Learning a Rare Event Detection Cascade by Direct Feature Selection

Author: Jianxin Wu, James M. Rehg, Matthew D. Mullin

Abstract: Face detection is a canonical example of a rare event detection problem, in which target patterns occur with much lower frequency than nontargets. Out of millions of face-sized windows in an input image, for example, only a few will typically contain a face. Viola and Jones recently proposed a cascade architecture for face detection which successfully addresses the rare event nature of the task. A central part of their method is a feature selection algorithm based on AdaBoost. We present a novel cascade learning algorithm based on forward feature selection which is two orders of magnitude faster than the Viola-Jones approach and yields classifiers of equivalent quality. This faster method could be used for more demanding classification tasks, such as on-line learning. 1

3 0.66601378 173 nips-2003-Semi-supervised Protein Classification Using Cluster Kernels

Author: Jason Weston, Dengyong Zhou, André Elisseeff, William S. Noble, Christina S. Leslie

Abstract: A key issue in supervised protein classification is the representation of input sequences of amino acids. Recent work using string kernels for protein data has achieved state-of-the-art classification performance. However, such representations are based only on labeled data — examples with known 3D structures, organized into structural classes — while in practice, unlabeled data is far more plentiful. In this work, we develop simple and scalable cluster kernel techniques for incorporating unlabeled data into the representation of protein sequences. We show that our methods greatly improve the classification performance of string kernels and outperform standard approaches for using unlabeled data, such as adding close homologs of the positive examples to the training data. We achieve equal or superior performance to previously presented cluster kernel methods while achieving far greater computational efficiency. 1

4 0.6633386 37 nips-2003-Automatic Annotation of Everyday Movements

Author: Deva Ramanan, David A. Forsyth

Abstract: This paper describes a system that can annotate a video sequence with: a description of the appearance of each actor; when the actor is in view; and a representation of the actor’s activity while in view. The system does not require a fixed background, and is automatic. The system works by (1) tracking people in 2D and then, using an annotated motion capture dataset, (2) synthesizing an annotated 3D motion sequence matching the 2D tracks. The 3D motion capture data is manually annotated off-line using a class structure that describes everyday motions and allows motion annotations to be composed — one may jump while running, for example. Descriptions computed from video of real motions show that the method is accurate.

5 0.65354925 88 nips-2003-Image Reconstruction by Linear Programming

Author: Koji Tsuda, Gunnar Rätsch

Abstract: A common way of image denoising is to project a noisy image to the subspace of admissible images made for instance by PCA. However, a major drawback of this method is that all pixels are updated by the projection, even when only a few pixels are corrupted by noise or occlusion. We propose a new method to identify the noisy pixels by 1 -norm penalization and update the identified pixels only. The identification and updating of noisy pixels are formulated as one linear program which can be solved efficiently. Especially, one can apply the ν-trick to directly specify the fraction of pixels to be reconstructed. Moreover, we extend the linear program to be able to exploit prior knowledge that occlusions often appear in contiguous blocks (e.g. sunglasses on faces). The basic idea is to penalize boundary points and interior points of the occluded area differently. We are able to show the ν-property also for this extended LP leading a method which is easy to use. Experimental results impressively demonstrate the power of our approach.

6 0.61985588 11 nips-2003-A Mixed-Signal VLSI for Real-Time Generation of Edge-Based Image Vectors

7 0.59812832 17 nips-2003-A Sampled Texture Prior for Image Super-Resolution

8 0.58311886 168 nips-2003-Salient Boundary Detection using Ratio Contour

9 0.56247139 112 nips-2003-Learning to Find Pre-Images

10 0.56022114 50 nips-2003-Denoising and Untangling Graphs Using Degree Priors

11 0.55886579 54 nips-2003-Discriminative Fields for Modeling Spatial Dependencies in Natural Images

12 0.55727553 35 nips-2003-Attractive People: Assembling Loose-Limbed Models using Non-parametric Belief Propagation

13 0.55380112 113 nips-2003-Learning with Local and Global Consistency

14 0.55083942 164 nips-2003-Ranking on Data Manifolds

15 0.54781526 9 nips-2003-A Kullback-Leibler Divergence Based Kernel for SVM Classification in Multimedia Applications

16 0.5473702 139 nips-2003-Nonlinear Filtering of Electron Micrographs by Means of Support Vector Regression

17 0.54551983 81 nips-2003-Geometric Analysis of Constrained Curves

18 0.54440051 39 nips-2003-Bayesian Color Constancy with Non-Gaussian Models

19 0.54427123 69 nips-2003-Factorization with Uncertainty and Missing Data: Exploiting Temporal Coherence

20 0.54305792 125 nips-2003-Maximum Likelihood Estimation of a Stochastic Integrate-and-Fire Neural Model