nips nips2001 nips2001-30 knowledge-graph by maker-knowledge-mining

30 nips-2001-Agglomerative Multivariate Information Bottleneck


Source: pdf

Author: Noam Slonim, Nir Friedman, Naftali Tishby

Abstract: The information bottleneck method is an unsupervised, model-independent data organization technique. Given a joint distribution P(A, B), this method constructs a new variable T that extracts partitions, or clusters, over the values of A that are informative about B. In a recent paper, we introduced a general principled framework for multivariate extensions of the information bottleneck method that allows us to consider multiple systems of data partitions that are inter-related. In this paper, we present a new family of simple agglomerative algorithms to construct such systems of inter-related clusters. We analyze the behavior of these algorithms and apply them to several real-life datasets.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Agglomerative Multivariate Information Bottleneck Noam Slonim Nir Friedman Naftali Tishby School of Computer Science & Engineering, Hebrew University, Jerusalem 91904, Israel {noamm, nir, tishby}@cs.huji.ac.il [sent-1, score-0.234]

2 Abstract The information bottleneck method is an unsupervised, model-independent data organization technique. [sent-4, score-0.491]

3 Given a joint distribution P(A, B), this method constructs a new variable T that extracts partitions, or clusters, over the values of A that are informative about B. [sent-5, score-0.469]

4 In a recent paper, we introduced a general principled framework for multivariate extensions of the information bottleneck method that allows us to consider multiple systems of data partitions that are inter-related. [sent-6, score-0.89]

5 In this paper, we present a new family of simple agglomerative algorithms to construct such systems of inter-related clusters. [sent-7, score-0.444]

6 We analyze the behavior of these algorithms and apply them to several real-life datasets. [sent-8, score-0.074]

7 1 Introduction The information bottleneck (IB) method of Tishby et al. [14] is an unsupervised, nonparametric data organization technique. [sent-9, score-0.629]

8 Given a joint distribution P(A, B), this method constructs a new variable T that represents partitions of A that (locally) maximize the mutual information about B. [sent-10, score-0.628]

9 In other words, the variable T induces a sufficient partition, or informative features, of the variable A with respect to B. [sent-11, score-0.334]

10 The construction of T finds a tradeoff between the information about A that we try to minimize, I(T; A), and the information about B that we try to maximize, I(T; B). [sent-12, score-0.569]
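Sentence 10 describes the core variational principle, so it may help to state it explicitly. Following the original IB formulation of Tishby et al. [14], the construction of T is usually written as the minimization, over the stochastic map p(t | a), of a single functional in which a nonnegative multiplier beta balances compression against relevance (the exact notational conventions may differ slightly from the paper's):

```latex
\mathcal{L}\bigl[p(t \mid a)\bigr] \;=\; I(T;A) \;-\; \beta\, I(T;B), \qquad \beta \ge 0 .
```

Small beta favors compact partitions of A; large beta favors partitions that retain more information about B.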

11 This approach is particularly useful for co-occurrence data, such as words and documents [12], where we want to capture what information one variable (e. [sent-13, score-0.288]

12 This extension allows us to consider cases where the data partition is relevant with respect to several variables, or where we construct several systems of clusters simultaneously. [sent-20, score-0.368]

13 In this framework, we specify the desired interactions by a pair of Bayesian networks. [sent-21, score-0.114]

14 One network, Gin, represents which variables are compressed versions of the observed variables - each new variable compresses its parents in the network. [sent-22, score-0.518]

15 The second network, Gout, defines the statistical relationship between these new variables and the observed variables that should be maintained. [sent-23, score-0.243]

16 We formulated the general principle as a tradeoff between the (multi) information each network carries. [sent-25, score-0.362]

17 On the one hand, we want to minimize the information maintained by Gin, and on the other to maximize the information maintained by Gout. [sent-26, score-0.431]
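In the same spirit, the multivariate tradeoff of sentences 16-17 is commonly summarized by a functional that penalizes the information carried by Gin and rewards the information carried by Gout (a schematic form; the precise terms are defined by the two network structures):

```latex
\mathcal{L} \;=\; \mathcal{I}^{G_{\mathrm{in}}} \;-\; \beta\, \mathcal{I}^{G_{\mathrm{out}}} ,
```

where \mathcal{I}^{G} denotes the (multi) information maintained with respect to the structure G and beta again sets the tradeoff.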

18 We also provide a characterization of stationary points in this tradeoff as a set of self-consistent equations. [sent-27, score-0.306]

19 Moreover, we prove that iterating these equations converges to a (local) optimum. [sent-28, score-0.061]

20 Then, we describe a deterministic annealing procedure that constructs a solution by tracking the bifurcation of clusters as it traverses the tradeoff curve, similar to the original IB method. [sent-29, score-0.847]

21 In this paper, we consider an alternative approach to solving multivariate IB problems which is motivated by the success of the agglomerative IB of Slonim and Tishby [11]. [sent-30, score-0.625]

22 As shown there, a bottom-up greedy agglomeration is a simple heuristic procedure that can yield good solutions to the original IB problem. [sent-31, score-0.547]

23 Here we extend this idea in the context of multivariate IB problems. [sent-32, score-0.264]

24 We start by analyzing the cost of agglomeration steps within this framework. [sent-33, score-0.41]

25 This both elucidates the criterion that guides greedy agglomeration and provides efficient local evaluation rules for agglomeration steps. [sent-34, score-0.992]
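To make the flavor of such local evaluation rules concrete, the sketch below implements the simplest special case, the original (single-variable) agglomerative IB of Slonim and Tishby [11], in which the cost of merging two clusters is their total weight times the weighted Jensen-Shannon divergence between their conditional distributions over B. This is an illustrative sketch only, not the paper's multivariate algorithm; the function names and the NumPy-based layout are our own assumptions.

```python
import numpy as np

def weighted_js(p, q, w1, w2):
    """Jensen-Shannon divergence between distributions p and q with weights w1, w2."""
    m = w1 * p + w2 * q
    def kl(a, b):
        mask = a > 0
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))
    return w1 * kl(p, m) + w2 * kl(q, m)

def merge_cost(p_t, p_b_given_t, i, j):
    """Loss in I(T; B) incurred by merging clusters i and j (agglomerative IB rule)."""
    p_merged = p_t[i] + p_t[j]
    w1, w2 = p_t[i] / p_merged, p_t[j] / p_merged
    return p_merged * weighted_js(p_b_given_t[i], p_b_given_t[j], w1, w2)

def agglomerative_ib(p_ab, n_clusters):
    """Greedy bottom-up IB: start with one cluster per value of A and repeatedly
    merge the pair whose merger loses the least information about B."""
    p_ab = np.asarray(p_ab, dtype=float)
    p_t = p_ab.sum(axis=1)                    # p(t) = p(a) at initialization
    p_b_given_t = p_ab / p_t[:, None]         # rows are p(b | t); assumes p(a) > 0
    clusters = [[a] for a in range(p_ab.shape[0])]
    while len(clusters) > n_clusters:
        # Evaluate the local merge cost for every pair of current clusters.
        _, i, j = min(
            (merge_cost(p_t, p_b_given_t, i, j), i, j)
            for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        # Merge cluster j into cluster i: average conditionals, add weights.
        w = p_t[i] + p_t[j]
        p_b_given_t[i] = (p_t[i] * p_b_given_t[i] + p_t[j] * p_b_given_t[j]) / w
        p_t[i] = w
        clusters[i].extend(clusters[j])
        p_t = np.delete(p_t, j)
        p_b_given_t = np.delete(p_b_given_t, j, axis=0)
        del clusters[j]
    return clusters
```

In the multivariate setting the same greedy loop applies, but the merge cost is derived from the graphs Gin and Gout rather than from a single I(T; B) term.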

26 This construction results in a novel family of information-theoretic agglomerative clustering algorithms that can be specified using the graphs Gin and Gout. [sent-35, score-0.639]

27 We demonstrate the performance of some of these algorithms for document and word clustering and gene expression analysis. [sent-36, score-0.319]

28 2 Multivariate Information Bottleneck A Bayesian network structure G is a DAG that specifies interactions among variables [8]. [sent-37, score-0.284]
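For reference, the (multi) information that a network structure carries, mentioned in sentence 16, is typically defined as a sum of local mutual-information terms over the parent sets of G (a schematic statement; the paper's formal definitions are given through Gin and Gout):

```latex
\mathcal{I}^{G} \;=\; \sum_{i} I\bigl(X_i ;\, \mathbf{Pa}^{G}_{i}\bigr) ,
```

which reduces to the familiar I(T;A) versus I(T;B) tradeoff of the original IB when Gin only compresses A into T and Gout only asks T to preserve information about B.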


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('agglomeration', 0.371), ('ib', 0.361), ('agglomerative', 0.294), ('multivariate', 0.264), ('bottleneck', 0.261), ('tishby', 0.234), ('tradeoff', 0.225), ('partitions', 0.169), ('constructs', 0.157), ('nir', 0.147), ('friedman', 0.136), ('maintained', 0.108), ('clusters', 0.105), ('informative', 0.095), ('partition', 0.09), ('document', 0.09), ('organization', 0.088), ('variables', 0.085), ('compresses', 0.081), ('guides', 0.081), ('bifurcation', 0.081), ('dag', 0.081), ('try', 0.079), ('variable', 0.077), ('interactions', 0.075), ('noam', 0.073), ('traverses', 0.073), ('pea', 0.068), ('slonim', 0.068), ('network', 0.065), ('construction', 0.065), ('greedy', 0.065), ('unsupervised', 0.065), ('hebrew', 0.064), ('jerusalem', 0.064), ('naftali', 0.064), ('word', 0.064), ('clustering', 0.063), ('family', 0.062), ('gene', 0.061), ('multi', 0.061), ('extension', 0.06), ('compressed', 0.059), ('specifies', 0.059), ('parents', 0.056), ('extracts', 0.054), ('nonparametric', 0.054), ('annealing', 0.052), ('induces', 0.051), ('israel', 0.051), ('maximize', 0.05), ('characterization', 0.049), ('documents', 0.049), ('finds', 0.049), ('want', 0.049), ('tracking', 0.048), ('et', 0.047), ('construct', 0.047), ('theoretic', 0.046), ('joint', 0.045), ('minimize', 0.044), ('defines', 0.042), ('original', 0.042), ('method', 0.041), ('words', 0.041), ('pa', 0.041), ('algorithms', 0.041), ('bayesian', 0.041), ('recent', 0.041), ('versions', 0.04), ('extensions', 0.04), ('analyzing', 0.039), ('specify', 0.039), ('specified', 0.039), ('school', 0.038), ('criteria', 0.038), ('principled', 0.038), ('al', 0.037), ('heuristic', 0.037), ('information', 0.036), ('formulated', 0.036), ('particularly', 0.036), ('mutual', 0.036), ('represents', 0.035), ('graphs', 0.034), ('sufficient', 0.034), ('rules', 0.034), ('locally', 0.034), ('success', 0.034), ('several', 0.033), ('motivated', 0.033), ('stationary', 0.032), ('local', 0.032), ('procedure', 0.032), ('deterministic', 0.032), ('maximizing', 0.032), ('prove', 0.031), ('relationship', 0.031), ('denoted', 0.03), ('converges', 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 30 nips-2001-Agglomerative Multivariate Information Bottleneck

Author: Noam Slonim, Nir Friedman, Naftali Tishby

Abstract: The information bottleneck method is an unsupervised, model-independent data organization technique. Given a joint distribution P(A, B), this method constructs a new variable T that extracts partitions, or clusters, over the values of A that are informative about B. In a recent paper, we introduced a general principled framework for multivariate extensions of the information bottleneck method that allows us to consider multiple systems of data partitions that are inter-related. In this paper, we present a new family of simple agglomerative algorithms to construct such systems of inter-related clusters. We analyze the behavior of these algorithms and apply them to several real-life datasets.

2 0.13538875 100 nips-2001-Iterative Double Clustering for Unsupervised and Semi-Supervised Learning

Author: Ran El-Yaniv, Oren Souroujon

Abstract: We present a powerful meta-clustering technique called Iterative Double Clustering (IDC). The IDC method is a natural extension of the recent Double Clustering (DC) method of Slonim and Tishby that exhibited impressive performance on text categorization tasks [12]. Using synthetically generated data we empirically find that whenever the DC procedure is successful in recovering some of the structure hidden in the data, the extended IDC procedure can incrementally compute a significantly more accurate classification. IDC is especially advantageous when the data exhibits high attribute noise. Our simulation results also show the effectiveness of IDC in text categorization problems. Surprisingly, this unsupervised procedure can be competitive with a (supervised) SVM trained with a small training set. Finally, we propose a simple and natural extension of IDC for semi-supervised and transductive learning where we are given both labeled and unlabeled examples. 1

3 0.082392596 135 nips-2001-On Spectral Clustering: Analysis and an algorithm

Author: Andrew Y. Ng, Michael I. Jordan, Yair Weiss

Abstract: Despite many empirical successes of spectral clustering methods - algorithms that cluster points using eigenvectors of matrices derived from the data - there are several unresolved issues. First, there are a wide variety of algorithms that use the eigenvectors in slightly different ways. Second, many of these algorithms have no proof that they will actually compute a reasonable clustering. In this paper, we present a simple spectral clustering algorithm that can be implemented using a few lines of Matlab. Using tools from matrix perturbation theory, we analyze the algorithm, and give conditions under which it can be expected to do well. We also show surprisingly good experimental results on a number of challenging clustering problems. 1

4 0.080335528 132 nips-2001-Novel iteration schemes for the Cluster Variation Method

Author: Hilbert J. Kappen, Wim Wiegerinck

Abstract: The Cluster Variation method is a class of approximation methods containing the Bethe and Kikuchi approximations as special cases. We derive two novel iteration schemes for the Cluster Variation Method. One is a fixed point iteration scheme which gives a significant improvement over loopy BP, mean field and TAP methods on directed graphical models. The other is a gradient based method, that is guaranteed to converge and is shown to give useful results on random graphs with mild frustration. We conclude that the methods are of significant practical value for large inference problems. 1

5 0.067268856 107 nips-2001-Latent Dirichlet Allocation

Author: David M. Blei, Andrew Y. Ng, Michael I. Jordan

Abstract: We propose a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams [6], and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI) [3]. In the context of text modeling, our model posits that each document is generated as a mixture of topics, where the continuous-valued mixture proportions are distributed as a latent Dirichlet random variable. Inference and learning are carried out efficiently via variational algorithms. We present empirical results on applications of this model to problems in text modeling, collaborative filtering, and text classification. 1

6 0.066348843 171 nips-2001-Spectral Relaxation for K-means Clustering

7 0.062426582 24 nips-2001-Active Information Retrieval

8 0.056954145 149 nips-2001-Probabilistic Abstraction Hierarchies

9 0.051548507 8 nips-2001-A General Greedy Approximation Algorithm with Applications

10 0.047989532 119 nips-2001-Means, Correlations and Bounds

11 0.046222415 21 nips-2001-A Variational Approach to Learning Curves

12 0.045370627 185 nips-2001-The Method of Quantum Clustering

13 0.042890932 87 nips-2001-Group Redundancy Measures Reveal Redundancy Reduction in the Auditory Pathway

14 0.042500626 84 nips-2001-Global Coordination of Local Linear Models

15 0.041688606 78 nips-2001-Fragment Completion in Humans and Machines

16 0.040344652 43 nips-2001-Bayesian time series classification

17 0.039470278 128 nips-2001-Multiagent Planning with Factored MDPs

18 0.038437497 143 nips-2001-PAC Generalization Bounds for Co-training

19 0.038073674 170 nips-2001-Spectral Kernel Methods for Clustering

20 0.037625998 13 nips-2001-A Natural Policy Gradient


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.122), (1, -0.0), (2, 0.01), (3, -0.084), (4, 0.027), (5, -0.102), (6, -0.053), (7, -0.029), (8, -0.065), (9, -0.047), (10, 0.134), (11, -0.028), (12, -0.079), (13, 0.023), (14, -0.017), (15, -0.01), (16, -0.004), (17, -0.044), (18, -0.061), (19, -0.047), (20, 0.017), (21, 0.008), (22, 0.03), (23, -0.032), (24, 0.142), (25, 0.052), (26, 0.101), (27, 0.149), (28, 0.045), (29, -0.003), (30, -0.071), (31, -0.062), (32, -0.052), (33, 0.054), (34, -0.019), (35, 0.091), (36, -0.028), (37, -0.131), (38, 0.18), (39, 0.022), (40, 0.145), (41, -0.005), (42, 0.018), (43, -0.081), (44, -0.027), (45, -0.112), (46, -0.013), (47, -0.02), (48, 0.23), (49, 0.144)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97318912 30 nips-2001-Agglomerative Multivariate Information Bottleneck

Author: Noam Slonim, Nir Friedman, Naftali Tishby

Abstract: The information bottleneck method is an unsupervised model independent data organization technique. Given a joint distribution peA, B), this method constructs a new variable T that extracts partitions, or clusters, over the values of A that are informative about B. In a recent paper, we introduced a general principled framework for multivariate extensions of the information bottleneck method that allows us to consider multiple systems of data partitions that are inter-related. In this paper, we present a new family of simple agglomerative algorithms to construct such systems of inter-related clusters. We analyze the behavior of these algorithms and apply them to several real-life datasets.

2 0.68464464 100 nips-2001-Iterative Double Clustering for Unsupervised and Semi-Supervised Learning

Author: Ran El-Yaniv, Oren Souroujon

Abstract: We present a powerful meta-clustering technique called Iterative Double Clustering (IDC). The IDC method is a natural extension of the recent Double Clustering (DC) method of Slonim and Tishby that exhibited impressive performance on text categorization tasks [12]. Using synthetically generated data we empirically find that whenever the DC procedure is successful in recovering some of the structure hidden in the data, the extended IDC procedure can incrementally compute a significantly more accurate classification. IDC is especially advantageous when the data exhibits high attribute noise. Our simulation results also show the effectiveness of IDC in text categorization problems. Surprisingly, this unsupervised procedure can be competitive with a (supervised) SVM trained with a small training set. Finally, we propose a simple and natural extension of IDC for semi-supervised and transductive learning where we are given both labeled and unlabeled examples. 1

3 0.39588621 194 nips-2001-Using Vocabulary Knowledge in Bayesian Multinomial Estimation

Author: Thomas L. Griffiths, Joshua B. Tenenbaum

Abstract: Estimating the parameters of sparse multinomial distributions is an important component of many statistical learning tasks. Recent approaches have used uncertainty over the vocabulary of symbols in a multinomial distribution as a means of accounting for sparsity. We present a Bayesian approach that allows weak prior knowledge, in the form of a small set of approximate candidate vocabularies, to be used to dramatically improve the resulting estimates. We demonstrate these improvements in applications to text compression and estimating distributions over words in newsgroup data. 1

4 0.3958534 53 nips-2001-Constructing Distributed Representations Using Additive Clustering

Author: Wheeler Ruml

Abstract: If the promise of computational modeling is to be fully realized in higherlevel cognitive domains such as language processing, principled methods must be developed to construct the semantic representations used in such models. In this paper, we propose the use of an established formalism from mathematical psychology, additive clustering, as a means of automatically constructing binary representations for objects using only pairwise similarity data. However, existing methods for the unsupervised learning of additive clustering models do not scale well to large problems. We present a new algorithm for additive clustering, based on a novel heuristic technique for combinatorial optimization. The algorithm is simpler than previous formulations and makes fewer independence assumptions. Extensive empirical tests on both human and synthetic data suggest that it is more effective than previous methods and that it also scales better to larger problems. By making additive clustering practical, we take a significant step toward scaling connectionist models beyond hand-coded examples.

5 0.39026821 186 nips-2001-The Noisy Euclidean Traveling Salesman Problem and Learning

Author: Mikio L. Braun, Joachim M. Buhmann

Abstract: We consider noisy Euclidean traveling salesman problems in the plane, which are random combinatorial problems with underlying structure. Gibbs sampling is used to compute average trajectories, which estimate the underlying structure common to all instances. This procedure requires identifying the exact relationship between permutations and tours. In a learning setting, the average trajectory is used as a model to construct solutions to new instances sampled from the same source. Experimental results show that the average trajectory can in fact estimate the underlying structure and that overfitting effects occur if the trajectory adapts too closely to a single instance. 1

6 0.38855943 171 nips-2001-Spectral Relaxation for K-means Clustering

7 0.38604403 132 nips-2001-Novel iteration schemes for the Cluster Variation Method

8 0.37117293 107 nips-2001-Latent Dirichlet Allocation

9 0.32759622 90 nips-2001-Hyperbolic Self-Organizing Maps for Semantic Navigation

10 0.31882176 135 nips-2001-On Spectral Clustering: Analysis and an algorithm

11 0.30817744 149 nips-2001-Probabilistic Abstraction Hierarchies

12 0.28504488 94 nips-2001-Incremental Learning and Selective Sampling via Parametric Optimization Framework for SVM

13 0.28205639 87 nips-2001-Group Redundancy Measures Reveal Redundancy Reduction in the Auditory Pathway

14 0.27741253 185 nips-2001-The Method of Quantum Clustering

15 0.26436511 78 nips-2001-Fragment Completion in Humans and Machines

16 0.24740073 128 nips-2001-Multiagent Planning with Factored MDPs

17 0.24665609 110 nips-2001-Learning Hierarchical Structures with Linear Relational Embedding

18 0.24227244 24 nips-2001-Active Information Retrieval

19 0.2393232 26 nips-2001-Active Portfolio-Management based on Error Correction Neural Networks

20 0.23653859 109 nips-2001-Learning Discriminative Feature Transforms to Low Dimensions in Low Dimentions


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(14, 0.024), (15, 0.378), (19, 0.015), (27, 0.062), (30, 0.081), (72, 0.07), (79, 0.032), (91, 0.228)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.82545459 30 nips-2001-Agglomerative Multivariate Information Bottleneck

Author: Noam Slonim, Nir Friedman, Naftali Tishby

Abstract: The information bottleneck method is an unsupervised, model-independent data organization technique. Given a joint distribution P(A, B), this method constructs a new variable T that extracts partitions, or clusters, over the values of A that are informative about B. In a recent paper, we introduced a general principled framework for multivariate extensions of the information bottleneck method that allows us to consider multiple systems of data partitions that are inter-related. In this paper, we present a new family of simple agglomerative algorithms to construct such systems of inter-related clusters. We analyze the behavior of these algorithms and apply them to several real-life datasets.

2 0.71575177 104 nips-2001-Kernel Logistic Regression and the Import Vector Machine

Author: Ji Zhu, Trevor Hastie

Abstract: The support vector machine (SVM) is known for its good performance in binary classification, but its extension to multi-class classification is still an on-going research issue. In this paper, we propose a new approach for classification, called the import vector machine (IVM), which is built on kernel logistic regression (KLR). We show that the IVM not only performs as well as the SVM in binary classification, but also can naturally be generalized to the multi-class case. Furthermore, the IVM provides an estimate of the underlying probability. Similar to the “support points” of the SVM, the IVM model uses only a fraction of the training data to index kernel basis functions, typically a much smaller fraction than the SVM. This gives the IVM a computational advantage over the SVM, especially when the size of the training data set is large.

3 0.66796207 143 nips-2001-PAC Generalization Bounds for Co-training

Author: Sanjoy Dasgupta, Michael L. Littman, David A. McAllester

Abstract: The rule-based bootstrapping introduced by Yarowsky, and its cotraining variant by Blum and Mitchell, have met with considerable empirical success. Earlier work on the theory of co-training has been only loosely related to empirically useful co-training algorithms. Here we give a new PAC-style bound on generalization error which justifies both the use of confidences — partial rules and partial labeling of the unlabeled data — and the use of an agreement-based objective function as suggested by Collins and Singer. Our bounds apply to the multiclass case, i.e., where instances are to be assigned one of k labels for k ≥ 2.

4 0.57607043 100 nips-2001-Iterative Double Clustering for Unsupervised and Semi-Supervised Learning

Author: Ran El-Yaniv, Oren Souroujon

Abstract: We present a powerful meta-clustering technique called Iterative Double Clustering (IDC). The IDC method is a natural extension of the recent Double Clustering (DC) method of Slonim and Tishby that exhibited impressive performance on text categorization tasks [12]. Using synthetically generated data we empirically find that whenever the DC procedure is successful in recovering some of the structure hidden in the data, the extended IDC procedure can incrementally compute a significantly more accurate classification. IDC is especially advantageous when the data exhibits high attribute noise. Our simulation results also show the effectiveness of IDC in text categorization problems. Surprisingly, this unsupervised procedure can be competitive with a (supervised) SVM trained with a small training set. Finally, we propose a simple and natural extension of IDC for semi-supervised and transductive learning where we are given both labeled and unlabeled examples. 1

5 0.5542441 111 nips-2001-Learning Lateral Interactions for Feature Binding and Sensory Segmentation

Author: Heiko Wersing

Abstract: We present a new approach to the supervised learning of lateral interactions for the competitive layer model (CLM) dynamic feature binding architecture. The method is based on consistency conditions, which were recently shown to characterize the attractor states of this linear threshold recurrent network. For a given set of training examples the learning problem is formulated as a convex quadratic optimization problem in the lateral interaction weights. An efficient dimension reduction of the learning problem can be achieved by using a linear superposition of basis interactions. We show the successful application of the method to a medical image segmentation problem of fluorescence microscope cell images.

6 0.55366367 144 nips-2001-Partially labeled classification with Markov random walks

7 0.5518778 18 nips-2001-A Rational Analysis of Cognitive Control in a Speeded Discrimination Task

8 0.54968226 66 nips-2001-Efficiency versus Convergence of Boolean Kernels for On-Line Learning Algorithms

9 0.54776722 107 nips-2001-Latent Dirichlet Allocation

10 0.54752082 148 nips-2001-Predictive Representations of State

11 0.54108161 87 nips-2001-Group Redundancy Measures Reveal Redundancy Reduction in the Auditory Pathway

12 0.54073358 174 nips-2001-Spike timing and the coding of naturalistic sounds in a central auditory area of songbirds

13 0.53799212 123 nips-2001-Modeling Temporal Structure in Classical Conditioning

14 0.53743494 189 nips-2001-The g Factor: Relating Distributions on Features to Distributions on Images

15 0.52883238 183 nips-2001-The Infinite Hidden Markov Model

16 0.52307451 182 nips-2001-The Fidelity of Local Ordinal Encoding

17 0.52034158 122 nips-2001-Model Based Population Tracking and Automatic Detection of Distribution Changes

18 0.51997244 68 nips-2001-Entropy and Inference, Revisited

19 0.51975024 160 nips-2001-Reinforcement Learning and Time Perception -- a Model of Animal Experiments

20 0.51956534 7 nips-2001-A Dynamic HMM for On-line Segmentation of Sequential Data