nips nips2003 nips2003-177 knowledge-graph by maker-knowledge-mining

177 nips-2003-Simplicial Mixtures of Markov Chains: Distributed Modelling of Dynamic User Profiles


Source: pdf

Author: Mark Girolami, Ata Kabán

Abstract: To provide a compact generative representation of the sequential activity of a number of individuals within a group there is a tradeoff between the definition of individual-specific and global models. This paper proposes a linear-time distributed model for finite-state symbolic sequences representing traces of individual user activity, by making the assumption that heterogeneous user behavior may be ‘explained’ by a relatively small number of common, structurally simple behavioral patterns which may interleave randomly in a user-specific proportion. The results of an empirical study on three different sources of user traces indicate that this modelling approach provides an efficient representation scheme, reflected by improved prediction performance as well as low-complexity and intuitively interpretable representations.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Ata Kabán, School of Computer Science, University of Birmingham, Birmingham, UK. [sent-4, score-0.036]

2 Abstract: To provide a compact generative representation of the sequential activity of a number of individuals within a group there is a tradeoff between the definition of individual-specific and global models. [sent-8, score-0.337]

3 The results of an empirical study on three different sources of user traces indicate that this modelling approach provides an efficient representation scheme, reflected by improved prediction performance as well as low-complexity and intuitively interpretable representations. [sent-10, score-0.324]

4 1 Introduction. The now commonplace ability to accurately and inexpensively log the activity of individuals in a digital environment makes available a variety of traces of user activity, and with it the necessity to develop efficient representations, or profiles, of individuals. [sent-11, score-0.347]

5 Most often, such recordings take the form of streams of discrete symbols ordered in time. [sent-12, score-0.109]

6 The modelling of time-dependent sequences of discrete symbols employing $n$-th order Markov chains has been extensively studied in a number of domains. [sent-13, score-0.399]

7 The representation provided by such models is global in the sense that it is assumed that one global generating process underlies all observed sequences. [sent-14, score-0.159]

8 To capture the possible heterogeneous nature of the observed sequences a model with a number of differing generating processes needs to be considered. [sent-15, score-0.189]

9 Indeed the notion of a heterogeneous population, characterized for example by occupational mobility and consumer brand preferences, has been captured in the MoverStayer model [3]. [sent-16, score-0.112]

10 This model is a discrete time stochastic process that is a two component mixture of first order Markov chains, one of which is degenerate and possesses an identity transition matrix characterizing the stayers in the population. [sent-17, score-0.324]

11 The original notion of a two-component mixture of Markov chains has recently been extended to the general form of a mixture model of Markov chains in [2]. [sent-18, score-0.689]

12 This is also a computationally attractive model, as simple structural characteristics may be assumed at the generative level, while allowing them to interleave randomly can account for more complex individual behavior. [sent-21, score-0.142]

13 Consistent generative semantics similar to the recently introduced latent Dirichlet allocation (LDA) [1] will be adopted, and by analogy with [8] the resulting model will be referred to as a simplicial mixture. [sent-23, score-0.597]

14 2 Simplicial Mixtures of Markov Chains. Assume that a sequence of $L$ symbols $s_L, s_{L-1}, \dots, s_0$, denoted by $s$, can be drawn from a dictionary $S$ by a process $k$, which has initial state probability $P_1(k)$ and $|S|^{m+1}$ state transition probabilities denoted by $T(s_m, \dots, s_1 \to s_0 \mid k)$. [sent-24, score-0.412]

15 The number of times that the symbol $s_0$ follows the state defined by the $m$-tuple of symbols $s_m, \dots, s_1$ within the $n$-th sequence is denoted $r_n^{s_m,\dots,s_1 \to s_0}$, and so the probability of the sequence of symbols under the $k$'th $m$-th order Markov process is $P(s \mid k) = P_1(k) \prod_{s_m=1}^{|S|} \cdots \prod_{s_0=1}^{|S|} T(s_m, \dots, s_1 \to s_0 \mid k)^{\,r_n^{s_m,\dots,s_1 \to s_0}}$. [sent-25, score-0.832]

16 To introduce a more compact notation we represent the elements of the state transition matrix for the $k$'th Markov process by $T_{m\cdots 0,k}$ and the counts $r_n^{s_m,\dots,s_1 \to s_0}$ within the $n$'th observed sequence as $r_n^{m\cdots 0}$. [sent-26, score-0.458]
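As a concrete illustration of this likelihood (a minimal NumPy sketch, not the authors' code; with $m = 1$ the product over counts reduces to a product over the observed transitions):

```python
import numpy as np

def sequence_log_prob(seq, T, P1):
    """Log-probability of a symbol sequence under a single first-order (m = 1)
    Markov chain: log of the initial-state probability plus the sum of log
    transition probabilities along the sequence."""
    logp = np.log(P1[seq[0]])
    for prev, cur in zip(seq[:-1], seq[1:]):
        logp += np.log(T[prev, cur])
    return logp

# Toy 3-symbol dictionary; each row of T sums to 1.
T = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.3, 0.3, 0.4]])
P1 = np.array([0.5, 0.3, 0.2])
lp = sequence_log_prob([0, 0, 1, 1], T, P1)  # 0.5 * 0.7 * 0.2 * 0.8
```

Working in log space avoids underflow for the long sequences typical of usage traces.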

17 In addition, we employ Start and Stop states in each symbol sequence sn and incorporate the initial state distribution of the Start state as the transition probabilities from this state within the state transition matrix Tk . [sent-27, score-0.749]

18 We denote the set of all state transition matrices {T1 , · · · , Tk , · · · , TK } as T. [sent-28, score-0.148]

19 Suppose that we are given a set of symbolic trajectories {sn }n=1:N over a common finite state space, each having length Ln . [sent-29, score-0.052]

20 To account for this idea, we will adopt a similar modelling strategy to LDA. [sent-31, score-0.078]

21 These are then combined with the individual state-transition probabilities $T_k$, which are model parameters to be estimated, and yield the symbol transition probabilities $T_{m\cdots 0} = \sum_{k=1}^{K} T_{m\cdots 0,k}\, \lambda_k$. [sent-33, score-0.315]

22 As in [7], for each observed sequence in the sample a MAP value for the variable $\lambda$ is iteratively estimated by the multiplicative updates $\tilde\lambda_{kn} = \big(\lambda_{kn} \sum_{s_m=1}^{|S|} \cdots \sum_{s_0=1}^{|S|} r_n^{m\cdots 0}\, T_{m\cdots 0,k} / T_{m\cdots 0} + \alpha_k - 1\big) \big/ \big(L_n + \sum_{k'} (\alpha_{k'} - 1)\big)$ (3), where $L_n = \sum_{s_m \cdots s_0} r_n^{m\cdots 0}$ is the length of the sequence $s_n$. [sent-40, score-0.981]
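A sketch of one such multiplicative update for a first-order model, assuming the standard EM-with-Dirichlet-prior form of (3) (the exact numerator is not fully recoverable from this text, so the responsibility term below is an assumption):

```python
import numpy as np

def map_lambda_update(lmbda, r, T_k, alpha):
    """One multiplicative MAP update of the mixing weights for one sequence.
    lmbda: (K,) current weights; r: (S, S) transition counts of the sequence;
    T_k: (K, S, S) per-factor transition matrices; alpha: (K,) Dirichlet prior."""
    T_mix = np.einsum('kij,k->ij', T_k, lmbda)   # T = sum_k lambda_k T_k
    resp = lmbda[:, None, None] * T_k / T_mix    # per-transition responsibilities
    new = (r[None] * resp).sum(axis=(1, 2)) + alpha - 1.0
    new /= r.sum() + (alpha - 1.0).sum()         # L_n + sum_k (alpha_k - 1)
    return new

T_k = np.array([[[0.9, 0.1], [0.5, 0.5]],
                [[0.2, 0.8], [0.5, 0.5]]])
r = np.array([[3.0, 1.0], [1.0, 2.0]])
lam = map_lambda_update(np.array([0.5, 0.5]), r, T_k, np.ones(2))
```

With a uniform prior (all $\alpha_k = 1$) the update reduces to the plain EM re-estimate of the mixing weights.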

23 It is interesting to note that the MAP estimator under a uniform Dirichlet distribution exactly recovers the aspect mixture model of [5] as a special case of the MAP estimated LDA model. [sent-43, score-0.27]

24 The above (2) can be further lower-bounded by noting that $\log P(s_n \mid T, \lambda) \ge \sum_{s_m=1}^{|S|} \cdots \sum_{s_0=1}^{|S|} r_n^{m\cdots 0} \sum_{k=1}^{K} Q_{m\cdots 0,n}(k) \log \frac{\lambda_k\, T_{m\cdots 0,k}}{Q_{m\cdots 0,n}(k)}$ (5), where $\sum_k Q_{m\cdots 0,n}(k) = 1$, $Q_{m\cdots 0,n}(k) \ge 0$ are additional variational parameters. [sent-47, score-0.391]
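The bound (5) is Jensen's inequality applied to $\log T_{m\cdots 0} = \log \sum_k \lambda_k T_{m\cdots 0,k}$; a quick numerical check for a single transition entry (illustrative only):

```python
import numpy as np

# For one transition entry: log sum_k lambda_k t_k >= sum_k Q_k log(lambda_k t_k / Q_k)
# for any distribution Q, with equality at Q_k proportional to lambda_k t_k.
rng = np.random.default_rng(0)
K = 3
lam = rng.dirichlet(np.ones(K))       # mixing weights lambda_k
t = rng.uniform(0.1, 0.9, K)          # factor-specific probs T_{m...0,k}
Q = rng.dirichlet(np.ones(K))         # arbitrary variational distribution

lhs = np.log(lam @ t)
bound = np.sum(Q * np.log(lam * t / Q))
Q_opt = lam * t / (lam @ t)           # optimal Q makes the bound tight
tight = np.sum(Q_opt * np.log(lam * t / Q_opt))
```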

25 $Q_{m\cdots 0,n}(\cdot)$ can also be understood as a variational distribution on a discrete hidden variable with $K$ possible outcomes that selects which transition matrix is active at each time step of the generative process. [sent-49, score-0.256]

26 The resulting fixed-point updates for the transition matrices take the multiplicative form $\tilde T^{t}_{m\cdots 0,k} = T^{t}_{m\cdots 0,k} \sum_n r_n^{m\cdots 0} \exp\{\psi(\gamma_{kn})\} \,\big/\, \sum_{k'=1}^{K} T^{t}_{m\cdots 0,k'} \exp\{\psi(\gamma_{k'n})\}$, with $T^{t+1}_{m\cdots 0,k}$ obtained by normalizing $\tilde T^{t}_{m\cdots 0,k}$ over $s_0$. [sent-51, score-0.069]

27 2.2 Prediction with Simplicial Mixtures. The predictive probability of observing symbol $s_{\text{next}}$ given a sequence of $L_n$ symbols $s_n = \{s_{L_n}, \dots, s_1\}$ is given as $P(s_{\text{next}} \mid s_n) = E_{P(\lambda \mid s_n)}\{P(s_{\text{next}} \mid s_m \cdots s_1, \lambda)\} \approx \sum_{k=1}^{K} T(s_{\text{next}} \mid s_m \cdots s_1, k)\, E_{Q_n(\lambda)}\{\lambda_k\}$. [sent-52, score-0.707]
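In code, this approximation amounts to mixing each factor's next-symbol distribution with the expected mixing weights (a first-order sketch; the variable names are illustrative):

```python
import numpy as np

def predict_next(prev, T_k, e_lambda):
    """Predictive distribution over the next symbol (first-order sketch):
    P(s_next | s_n) ~= sum_k T_k[prev -> s_next] * E_Q[lambda_k]."""
    return np.einsum('kj,k->j', T_k[:, prev, :], e_lambda)

T_k = np.array([[[0.9, 0.1], [0.5, 0.5]],
                [[0.2, 0.8], [0.5, 0.5]]])
p = predict_next(0, T_k, np.array([0.75, 0.25]))  # -> [0.725, 0.275]
```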

28 It should be noted that while m-th order Markov chains form the basis of the representation, the resulting simplicial mixture is not m-th order Markov with any global transition model. [sent-53, score-0.873]

29 Rather it approximates the individual m-th order models while keeping the generative parameter set compact. [sent-54, score-0.132]

30 The m-th order information of each individual’s past behaviour is embodied in the individual-specific latent variable estimate. [sent-55, score-0.047]

31 In a mixture model, on the other hand, a single component is responsible for generating each sequence, so within a cluster the representation is still a global m-th order model. [sent-56, score-0.38]

32 1 Telephone Usage Modelling. The ability to model the usage of a telephone service is of importance at a number of levels. [sent-62, score-0.219]

33 For example, to obtain a predictive model of customer-specific activity and service usage for the purposes of service provision planning, resource management of switching capacity, and identification of fraudulent usage of services. [sent-64, score-0.708]

34 A representative description can be based on the distribution of the destination numbers dialled and connected by the customer, in which case a multinomial distribution over the dialling codes can be employed. [sent-65, score-0.141]

35 One method of encoding the destination numbers dialled by a customer is to capture the geographic location of the destination, or the mobile service provider if the call is not land-based. [sent-66, score-0.358]

36 This is useful in determining the potential demand placed on telecommunication switches which route traffic from various geographical regions on the service providers network. [sent-67, score-0.088]

37 Two weeks of transactions from a UK telecommunications operator were logged during weekdays, amounting to 36,492,082 and 45,350,654 transactions in each week respectively. [sent-68, score-0.17]

38 All transactions made by commercial customers in the Glasgow region of the UK were considered in this study. [sent-69, score-0.145]

39 This amounts to 1,172,578 transactions from 12,202 high-usage customers in the first week considered, and 1,753,304 transactions made in the following week. [sent-70, score-0.276]

40 The mapping from dialling number to geographic region or mobile operator was encoded with 87 symbols amounting to a possible 7,569 symbol transitions. [sent-71, score-0.309]

41 Each customer's activity is defined by a sequence of symbols representing the calls made over each period considered, and these sequences are employed to encode activity in a customer-specific generative representation. [sent-72, score-0.667]

42 The resulting data set, referred to as WEB, totals 119,667 page requests corresponding to 1,480 web browsing sessions. [sent-78, score-0.189]

43 Figure 1: Left: percentage of incorrect predictions against the number of model factors; right: predictive perplexity of each model against model order for the PHONE dataset. [sent-98, score-0.412]

44 Solid straight line: global first order MC, dash: MAP estimated simplicial mixture, solid line: VB estimated simplicial mixture, dash-dot: mixture model. [sent-99, score-1.131]

45 2 Results. In each experiment the objective assessment of model performance is evaluated by the predictive perplexity, $\exp\{-\frac{1}{N_{\text{test}}} \sum_{m=1}^{N_{\text{test}}} \log P(s_{\text{next}} \mid s_m)\}$. [sent-101, score-0.195]
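This measure can be computed directly from the per-prediction probabilities (a minimal sketch):

```python
import numpy as np

def predictive_perplexity(probs):
    """exp{-(1/N) sum_m log P(s_next | s_m)} over N held-out predictions."""
    probs = np.asarray(probs, dtype=float)
    return float(np.exp(-np.mean(np.log(probs))))

# A uniform predictor over V symbols scores perplexity V:
pp = predictive_perplexity([0.25, 0.25, 0.25, 0.25])  # -> 4.0
```

Lower perplexity means the model assigns higher probability, on average, to the held-out symbols.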

46 In addition, the predictive accuracy of all models is measured under a 0-1 loss. [sent-102, score-0.158]

47 Given a number of previously unobserved truncated sequences, the number of times the model correctly predicts the symbol which follows in the sequence is then counted. [sent-103, score-0.182]

48 In all mixture models naive random initialization of the parameters was employed, and parameter estimation was halted when the relative change fell below 0.001%. [sent-104, score-0.257]

49 Figure 2: Left: distribution of entropy rates for the transition matrices of a 20-factor mixture and simplicial mixture models (VB). [sent-112, score-1.036]

50 Right: the expected value of the Dirichlet variable under the variational approximation for one customer indicating the levels of participation in factor specific behaviors. [sent-113, score-0.258]

51 Estimation was run until the relative change fell below 0.001%; no annealing or early stopping was utilized, and fifteen randomly initialized parameter estimation runs were performed for each model. [sent-115, score-0.071]

52 The number of mixture components for the models ranged from 2 up to 200. [sent-116, score-0.221]

53 The results are summarized in Figure 1; from the predictive perplexity measures it is clear that the simplicial representation provides a statistically significant (tested at the 5% level using a Wilcoxon rank sum test) and practically significant reduction in perplexity over the global and mixture models. [sent-118, score-1.197]

54 This is also reflected in the levels of prediction error under each model; however, the mixture models tend to perform slightly worse than the global model. [sent-119, score-0.295]

55 As expected the MAP estimated simplicial model performs slightly worse than that obtained using VB [1]. [sent-120, score-0.481]

56 This also provides an additional insight as to why LDA models improve upon PLSA, as they are in fact both the same model using different approximations to the likelihood, refer to [10] for an illustrative discussion on the weaknesses of MAP estimators. [sent-121, score-0.063]

57 As a comparison to different structural models, hidden Markov models with a range of hidden state counts were also tested on this data set; the best results were obtained for a ten-state model, which achieved a predictive perplexity score of (mean ± standard deviation) 11.959, considerably poorer than that obtained by the models considered here. [sent-122, score-0.45]

59 The left-hand plot of Figure 2 shows the distribution of the entropy rates for the transition probabilities in twenty-factor simplicial and mixture models; the results are obtained from fifty randomly initialized estimation procedures. [sent-128, score-0.94]

60 The entropy rates for the simplicial mixture are significantly lower than those of a mixture model, indicating that the basis of each representation describes a number of simpler processes. [sent-129, score-0.986]
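The entropy rate underlying this comparison can be computed for any estimated transition matrix; a minimal sketch, assuming an ergodic chain so that a stationary distribution exists (the degenerate identity 'stayer' chain of the Mover-Stayer model trivially attains rate zero):

```python
import numpy as np

def entropy_rate(T):
    """Entropy rate (bits per symbol) of a first-order Markov chain:
    sum_i pi_i * H(T[i, :]), with pi the stationary distribution
    (left eigenvector of T for eigenvalue 1)."""
    vals, vecs = np.linalg.eig(T.T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])
    pi = pi / pi.sum()
    # Convention 0 * log 0 = 0, handled by mapping zero entries to 1 before log.
    row_H = -(T * np.log2(np.where(T > 0, T, 1.0))).sum(axis=1)
    return float(pi @ row_H)

h_uniform = entropy_rate(np.array([[0.5, 0.5], [0.5, 0.5]]))  # -> 1.0
h_stayer = entropy_rate(np.eye(2))                            # -> 0.0
```

Low entropy rates for the simplicial mixture's factors correspond to near-deterministic, structurally simple behavioral patterns.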

61 The results of ten-fold cross-validated predictive perplexities again show a statistically significant improvement obtained with the VB-estimated simplicial mixture (again tested using the rank-sum Wilcoxon test at the 5% level). [sent-131, score-0.758]

62 Five of the estimated transition factors of a twenty-factor model are shown in Figure 4, demonstrating once more that the proposed model creates a low entropy and an easily interpretable dynamic factorial representation. [sent-133, score-0.439]

63 The numbers on the axes of these charts correspond to the 17 page categories enumerated earlier, and the average strength of each of these factors amongst the full set of twenty, computed as $\frac{1}{N}\sum_{n=1}^{N} E_{D(\lambda \mid \gamma_n)}\{\lambda_k\}$, is also given above each chart. [sent-134, score-0.238]

64 The distribution of the entropy rates of the full set of these twenty basistransitions in comparison to those obtained from the mixture model is given on the right hand plot of Figure 3. [sent-136, score-0.403]

65 Clearly, the coding efficiency of a simplicial mixture representation is significantly (statistically tested) superior. [sent-137, score-0.636]

66 Note also these basis-transitions embody correlated transitions (transitions which appear in similar dynamical contexts and so have similar functionality), as can be seen from the multiplicative nature of the equations used for identifying the model. [sent-138, score-0.068]

67 It is not surprising then that state repetitions or transitions which express focused interest in one of the topic categories appear together on distinct factors. [sent-139, score-0.112]

68 We can also see a joint interest in msnnews and msnsport being present together on the 4-th chart of Figure 4 — indeed, as the prefix of these page categories also indicates, these are related page categories. [sent-140, score-0.208]

69 Figure 3: Left: the predictive perplexity for the WEB data (straight line: global first-order Markov chain, dash-dot: mixture of Markov chains, dotted line: simplicial mixture estimated by MAP, solid line: simplicial mixture estimated by VB). [sent-158, score-1.824]

70 4 Conclusions This paper has presented a linear time method to model finite-state sequences of discrete symbols which may arise from user or customer activity traces. [sent-160, score-0.495]

71 The main feature of the proposed approach has been the assumption that heterogeneous user behavior may be ‘explained’ by the interleaved action of some structurally simple common generator processes. [sent-161, score-0.267]

72 Figure 4: State transition matrices of selected factors from a 20-factor run on WEB. [sent-167, score-0.16]

73 An empirical study has been conducted on two real-world collections of user activity, which has demonstrated this to be an efficient representation, as revealed by objective measures of prediction performance, low entropy rates, and interpretable representations of the user profiles provided. [sent-168, score-0.43]

74 White, Model-based clustering and visualisation of navigation patterns on a web site, Journal of data Mining and Knowledge Discovery, in press. [sent-182, score-0.089]

75 Frydman, Maximum likelihood estimation in the mover-stayer model, Journal of the American Statistical Association, 79, 632-638, 1984. [sent-184, score-0.036]

76 Hofmann,Unsupervised learning by probabilistic latent semantic analysis, Machine Learning, 42, 177-196, 2001. [sent-191, score-0.047]

77 Ronning, Maximum likelihood estimation of Dirichlet distributions, Journal of Statistical Computation and Simulation, 32:4, 215-221, 1989. [sent-193, score-0.036]

78 Lafferty, Expectation-propagation for the generative aspect model, Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, 2002. [sent-200, score-0.072]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('tm', 0.404), ('simplicial', 0.404), ('ap', 0.215), ('mixture', 0.193), ('kn', 0.182), ('perplexity', 0.177), ('sm', 0.174), ('sn', 0.166), ('dirichlet', 0.155), ('snext', 0.155), ('qm', 0.135), ('customer', 0.135), ('chains', 0.134), ('predictive', 0.13), ('symbols', 0.109), ('vb', 0.105), ('user', 0.1), ('transition', 0.096), ('lda', 0.094), ('entropy', 0.088), ('variational', 0.088), ('service', 0.088), ('eqn', 0.088), ('symbol', 0.08), ('modelling', 0.078), ('browsing', 0.077), ('heterogeneous', 0.077), ('customers', 0.077), ('qn', 0.073), ('girolami', 0.073), ('generative', 0.072), ('activity', 0.07), ('rn', 0.069), ('usage', 0.067), ('sequence', 0.067), ('glasgow', 0.066), ('markov', 0.064), ('factors', 0.064), ('chart', 0.062), ('map', 0.06), ('behavioral', 0.06), ('users', 0.059), ('week', 0.058), ('page', 0.057), ('tk', 0.056), ('web', 0.055), ('ln', 0.053), ('twenty', 0.053), ('destination', 0.053), ('state', 0.052), ('les', 0.051), ('latent', 0.047), ('global', 0.046), ('interleaved', 0.046), ('sequences', 0.046), ('mixtures', 0.046), ('dialled', 0.044), ('dialling', 0.044), ('plsa', 0.044), ('structurally', 0.044), ('interpretable', 0.044), ('pro', 0.043), ('individuals', 0.042), ('estimated', 0.042), ('multiplicative', 0.04), ('representation', 0.039), ('adopted', 0.039), ('interleave', 0.038), ('geographic', 0.038), ('amounting', 0.038), ('kab', 0.038), ('th', 0.038), ('transactions', 0.037), ('estimation', 0.036), ('uk', 0.036), ('probabilities', 0.036), ('traces', 0.035), ('birmingham', 0.035), ('wilcoxon', 0.035), ('participation', 0.035), ('sl', 0.035), ('model', 0.035), ('dynamic', 0.035), ('rates', 0.034), ('patterns', 0.034), ('individual', 0.032), ('categories', 0.032), ('employing', 0.032), ('news', 0.031), ('commercial', 0.031), ('differing', 0.031), ('phone', 0.031), ('statistically', 0.031), ('log', 0.03), ('telephone', 0.029), ('distributed', 0.028), ('management', 0.028), ('prediction', 0.028), ('models', 0.028), ('transitions', 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 177 nips-2003-Simplicial Mixtures of Markov Chains: Distributed Modelling of Dynamic User Profiles

Author: Mark Girolami, Ata Kabán

Abstract: To provide a compact generative representation of the sequential activity of a number of individuals within a group there is a tradeoff between the definition of individual-specific and global models. This paper proposes a linear-time distributed model for finite-state symbolic sequences representing traces of individual user activity, by making the assumption that heterogeneous user behavior may be ‘explained’ by a relatively small number of common, structurally simple behavioral patterns which may interleave randomly in a user-specific proportion. The results of an empirical study on three different sources of user traces indicate that this modelling approach provides an efficient representation scheme, reflected by improved prediction performance as well as low-complexity and intuitively interpretable representations.

2 0.24048972 195 nips-2003-When Does Non-Negative Matrix Factorization Give a Correct Decomposition into Parts?

Author: David Donoho, Victoria Stodden

Abstract: We interpret non-negative matrix factorization geometrically, as the problem of finding a simplicial cone which contains a cloud of data points and which is contained in the positive orthant. We show that under certain conditions, basically requiring that some of the data are spread across the faces of the positive orthant, there is a unique such simplicial cone. We give examples of synthetic image articulation databases which obey these conditions; these require separated support and factorial sampling. For such databases there is a generative model in terms of ‘parts’ and NMF correctly identifies the ‘parts’. We show that our theoretical results are predictive of the performance of published NMF code, by running the published algorithms on one of our synthetic image articulation databases. 1

3 0.12164918 131 nips-2003-Modeling User Rating Profiles For Collaborative Filtering

Author: Benjamin M. Marlin

Abstract: In this paper we present a generative latent variable model for rating-based collaborative filtering called the User Rating Profile model (URP). The generative process which underlies URP is designed to produce complete user rating profiles, an assignment of one rating to each item for each user. Our model represents each user as a mixture of user attitudes, and the mixing proportions are distributed according to a Dirichlet random variable. The rating for each item is generated by selecting a user attitude for the item, and then selecting a rating according to the preference pattern associated with that attitude. URP is related to several models including a multinomial mixture model, the aspect model [7], and LDA [1], but has clear advantages over each. 1

4 0.099850208 155 nips-2003-Perspectives on Sparse Bayesian Learning

Author: Jason Palmer, Bhaskar D. Rao, David P. Wipf

Abstract: Recently, relevance vector machines (RVM) have been fashioned from a sparse Bayesian learning (SBL) framework to perform supervised learning using a weight prior that encourages sparsity of representation. The methodology incorporates an additional set of hyperparameters governing the prior, one for each weight, and then adopts a specific approximation to the full marginalization over all weights and hyperparameters. Despite its empirical success however, no rigorous motivation for this particular approximation is currently available. To address this issue, we demonstrate that SBL can be recast as the application of a rigorous variational approximation to the full model by expressing the prior in a dual form. This formulation obviates the necessity of assuming any hyperpriors and leads to natural, intuitive explanations of why sparsity is achieved in practice. 1

5 0.099149503 70 nips-2003-Fast Algorithms for Large-State-Space HMMs with Applications to Web Usage Analysis

Author: Pedro F. Felzenszwalb, Daniel P. Huttenlocher, Jon M. Kleinberg

Abstract: In applying Hidden Markov Models to the analysis of massive data streams, it is often necessary to use an artificially reduced set of states; this is due in large part to the fact that the basic HMM estimation algorithms have a quadratic dependence on the size of the state set. We present algorithms that reduce this computational bottleneck to linear or near-linear time, when the states can be embedded in an underlying grid of parameters. This type of state representation arises in many domains; in particular, we show an application to traffic analysis at a high-volume Web site. 1

6 0.088562056 174 nips-2003-Semidefinite Relaxations for Approximate Inference on Graphs with Cycles

7 0.085608132 167 nips-2003-Robustness in Markov Decision Problems with Uncertain Transition Matrices

8 0.08461795 138 nips-2003-Non-linear CCA and PCA by Alignment of Local Models

9 0.082850926 79 nips-2003-Gene Expression Clustering with Functional Mixture Models

10 0.081257105 193 nips-2003-Variational Linear Response

11 0.080086395 83 nips-2003-Hierarchical Topic Models and the Nested Chinese Restaurant Process

12 0.076193683 63 nips-2003-Error Bounds for Transductive Learning via Compression and Clustering

13 0.062475339 77 nips-2003-Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data

14 0.05961524 73 nips-2003-Feature Selection in Clustering Problems

15 0.055171289 115 nips-2003-Linear Dependent Dimensionality Reduction

16 0.054722887 160 nips-2003-Prediction on Spike Data Using Kernel Algorithms

17 0.052346125 103 nips-2003-Learning Bounds for a Generalized Family of Bayesian Posterior Distributions

18 0.05233065 166 nips-2003-Reconstructing MEG Sources with Unknown Correlations

19 0.051694155 58 nips-2003-Efficient Multiscale Sampling from Products of Gaussian Mixtures

20 0.051682167 91 nips-2003-Inferring State Sequences for Non-linear Systems with Embedded Hidden Markov Models


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.184), (1, 0.004), (2, -0.008), (3, 0.047), (4, -0.012), (5, 0.04), (6, 0.138), (7, -0.056), (8, -0.025), (9, -0.048), (10, -0.051), (11, -0.157), (12, -0.108), (13, 0.064), (14, 0.063), (15, 0.052), (16, -0.159), (17, 0.038), (18, 0.07), (19, -0.159), (20, -0.242), (21, 0.108), (22, -0.209), (23, -0.051), (24, -0.035), (25, -0.066), (26, -0.044), (27, -0.035), (28, 0.155), (29, 0.204), (30, 0.178), (31, -0.081), (32, -0.214), (33, 0.075), (34, -0.114), (35, 0.158), (36, 0.04), (37, 0.13), (38, -0.006), (39, -0.032), (40, -0.208), (41, 0.067), (42, -0.056), (43, -0.013), (44, -0.022), (45, -0.056), (46, -0.071), (47, -0.026), (48, -0.03), (49, 0.102)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95096332 177 nips-2003-Simplicial Mixtures of Markov Chains: Distributed Modelling of Dynamic User Profiles

Author: Mark Girolami, Ata Kabán

Abstract: To provide a compact generative representation of the sequential activity of a number of individuals within a group there is a tradeoff between the definition of individual specific and global models. This paper proposes a linear-time distributed model for finite state symbolic sequences representing traces of individual user activity by making the assumption that heterogeneous user behavior may be ‘explained’ by a relatively small number of common structurally simple behavioral patterns which may interleave randomly in a user-specific proportion. The results of an empirical study on three different sources of user traces indicates that this modelling approach provides an efficient representation scheme, reflected by improved prediction performance as well as providing lowcomplexity and intuitively interpretable representations.

2 0.75816858 131 nips-2003-Modeling User Rating Profiles For Collaborative Filtering

Author: Benjamin M. Marlin

Abstract: In this paper we present a generative latent variable model for rating-based collaborative filtering called the User Rating Profile model (URP). The generative process which underlies URP is designed to produce complete user rating profiles, an assignment of one rating to each item for each user. Our model represents each user as a mixture of user attitudes, and the mixing proportions are distributed according to a Dirichlet random variable. The rating for each item is generated by selecting a user attitude for the item, and then selecting a rating according to the preference pattern associated with that attitude. URP is related to several models including a multinomial mixture model, the aspect model [7], and LDA [1], but has clear advantages over each. 1

3 0.71956956 195 nips-2003-When Does Non-Negative Matrix Factorization Give a Correct Decomposition into Parts?

Author: David Donoho, Victoria Stodden

Abstract: We interpret non-negative matrix factorization geometrically, as the problem of finding a simplicial cone which contains a cloud of data points and which is contained in the positive orthant. We show that under certain conditions, basically requiring that some of the data are spread across the faces of the positive orthant, there is a unique such simplicial cone. We give examples of synthetic image articulation databases which obey these conditions; these require separated support and factorial sampling. For such databases there is a generative model in terms of ‘parts’ and NMF correctly identifies the ‘parts’. We show that our theoretical results are predictive of the performance of published NMF code, by running the published algorithms on one of our synthetic image articulation databases. 1

4 0.43404391 83 nips-2003-Hierarchical Topic Models and the Nested Chinese Restaurant Process

Author: Thomas L. Griffiths, Michael I. Jordan, Joshua B. Tenenbaum, David M. Blei

Abstract: We address the problem of learning topic hierarchies from data. The model selection problem in this domain is daunting—which of the large collection of possible trees to use? We take a Bayesian approach, generating an appropriate prior via a distribution on partitions that we refer to as the nested Chinese restaurant process. This nonparametric prior allows arbitrarily large branching factors and readily accommodates growing data collections. We build a hierarchical topic model by combining this prior with a likelihood that is based on a hierarchical variant of latent Dirichlet allocation. We illustrate our approach on simulated data and with an application to the modeling of NIPS abstracts. 1

5 0.38104886 70 nips-2003-Fast Algorithms for Large-State-Space HMMs with Applications to Web Usage Analysis

Author: Pedro F. Felzenszwalb, Daniel P. Huttenlocher, Jon M. Kleinberg

Abstract: In applying Hidden Markov Models to the analysis of massive data streams, it is often necessary to use an artificially reduced set of states; this is due in large part to the fact that the basic HMM estimation algorithms have a quadratic dependence on the size of the state set. We present algorithms that reduce this computational bottleneck to linear or near-linear time, when the states can be embedded in an underlying grid of parameters. This type of state representation arises in many domains; in particular, we show an application to traffic analysis at a high-volume Web site. 1

6 0.33922946 75 nips-2003-From Algorithmic to Subjective Randomness

7 0.33846202 155 nips-2003-Perspectives on Sparse Bayesian Learning

8 0.31714022 25 nips-2003-An MCMC-Based Method of Comparing Connectionist Models in Cognitive Science

9 0.31388301 167 nips-2003-Robustness in Markov Decision Problems with Uncertain Transition Matrices

10 0.29901764 153 nips-2003-Parameterized Novelty Detectors for Environmental Sensor Monitoring

11 0.29680091 174 nips-2003-Semidefinite Relaxations for Approximate Inference on Graphs with Cycles

12 0.29166368 138 nips-2003-Non-linear CCA and PCA by Alignment of Local Models

13 0.27630442 63 nips-2003-Error Bounds for Transductive Learning via Compression and Clustering

14 0.26608473 79 nips-2003-Gene Expression Clustering with Functional Mixture Models

15 0.24535263 146 nips-2003-Online Learning of Non-stationary Sequences

16 0.22913972 68 nips-2003-Eye Movements for Reward Maximization

17 0.22846949 77 nips-2003-Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data

18 0.22494468 181 nips-2003-Statistical Debugging of Sampled Programs

19 0.22412489 14 nips-2003-A Nonlinear Predictive State Representation

20 0.2180201 44 nips-2003-Can We Learn to Beat the Best Stock


similar papers computed by LDA model

LDA for this paper:

topicId topicWeight

[(0, 0.038), (11, 0.025), (29, 0.014), (30, 0.02), (35, 0.08), (53, 0.089), (63, 0.309), (71, 0.068), (76, 0.06), (85, 0.062), (91, 0.092), (99, 0.048)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.8918438 13 nips-2003-A Neuromorphic Multi-chip Model of a Disparity Selective Complex Cell

Author: Bertram E. Shi, Eric K. Tsang

Abstract: The relative depth of objects causes small shifts in the left and right retinal positions of these objects, called binocular disparity. Here, we describe a neuromorphic implementation of a disparity selective complex cell using the binocular energy model, which has been proposed to model the response of disparity selective cells in the visual cortex. Our system consists of two silicon chips containing spiking neurons with monocular Gabor-type spatial receptive fields (RF) and circuits that combine the spike outputs to compute a disparity selective complex cell response. The disparity selectivity of the cell can be adjusted by both position and phase shifts between the monocular RF profiles, which are both used in biology. Our neuromorphic system performs better with phase encoding, because the relative responses of neurons tuned to different disparities by phase shifts are better matched than the responses of neurons tuned by position shifts.
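The binocular energy model the abstract builds on can be sketched numerically: a complex-cell response is the sum of squared responses of a quadrature pair (even and odd Gabor filters), with the left and right monocular inputs combined before squaring, and a phase shift between the two eyes' receptive fields tuning the preferred disparity. Everything below (stimulus shape, filter parameters, names) is an illustrative assumption, not the chip's implementation.

```python
import numpy as np

def gabor(x, sigma, freq, phase):
    """Monocular Gabor-type receptive field."""
    return np.exp(-x**2 / (2 * sigma**2)) * np.cos(2 * np.pi * freq * x + phase)

def binocular_energy(stim_left, stim_right, x, sigma, freq, dphase):
    """Binocular energy model: sum of squared quadrature-pair responses.
    `dphase` shifts the right eye's RF phase, tuning the cell to
    disparity -dphase / (2 * pi * freq)."""
    resp = 0.0
    for phase in (0.0, np.pi / 2):                    # even / odd pair
        left = np.dot(gabor(x, sigma, freq, phase), stim_left)
        right = np.dot(gabor(x, sigma, freq, phase + dphase), stim_right)
        resp += (left + right) ** 2
    return resp

x = np.linspace(-2, 2, 201)
freq, sigma = 1.0, 0.5

def bar(center):                                      # narrow bar stimulus
    return np.exp(-(x - center)**2 / (2 * 0.1**2))

d = 0.25                                              # true disparity of the bar
tuned = binocular_energy(bar(0.0), bar(d), x, sigma, freq,
                         dphase=-2 * np.pi * freq * d)
anti = binocular_energy(bar(0.0), bar(d), x, sigma, freq,
                        dphase=np.pi - 2 * np.pi * freq * d)
```

A cell whose phase shift matches the stimulus disparity (`tuned`) responds much more strongly than one shifted to the opposite phase (`anti`), which is the disparity selectivity the abstract describes.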

2 0.8859151 183 nips-2003-Synchrony Detection by Analogue VLSI Neurons with Bimodal STDP Synapses

Author: Adria Bofill-i-petit, Alan F. Murray

Abstract: We present test results from spike-timing correlation learning experiments carried out with silicon neurons with STDP (Spike Timing Dependent Plasticity) synapses. The weight change scheme of the STDP synapses can be set to either weight-independent or weight-dependent mode. We present results that characterise the learning window implemented for both modes of operation. When presented with spike trains with different types of synchronisation the neurons develop bimodal weight distributions. We also show that a 2-layered network of silicon spiking neurons with STDP synapses can perform hierarchical synchrony detection.
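The learning window characterised in the abstract is commonly modelled as a pair of exponentials, potentiating when the presynaptic spike precedes the postsynaptic one and depressing otherwise; the weight-dependent mode the abstract mentions scales these changes by the synapse's remaining headroom. This is a textbook-style sketch with illustrative constants, not a model of the silicon circuit itself.

```python
import math

def stdp_dw(dt, w, a_plus=0.01, a_minus=0.012,
            tau_plus=20.0, tau_minus=20.0,
            w_max=1.0, weight_dependent=False):
    """Weight change for one spike pair with dt = t_post - t_pre (ms).
    In weight-dependent mode, potentiation scales with the remaining
    headroom (w_max - w) and depression scales with w."""
    if dt >= 0:      # pre before post: potentiation
        dw = a_plus * math.exp(-dt / tau_plus)
        return dw * (w_max - w) if weight_dependent else dw
    else:            # post before pre: depression
        dw = -a_minus * math.exp(dt / tau_minus)
        return dw * w if weight_dependent else dw
```

Under the weight-dependent rule a saturated synapse (`w == w_max`) can no longer potentiate, which is one mechanism that drives weights toward the bimodal distributions reported in the abstract.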

same-paper 3 0.81011081 177 nips-2003-Simplicial Mixtures of Markov Chains: Distributed Modelling of Dynamic User Profiles

Author: Mark Girolami, Ata Kabán

Abstract: To provide a compact generative representation of the sequential activity of a number of individuals within a group, there is a tradeoff between the definition of individual specific and global models. This paper proposes a linear-time distributed model for finite state symbolic sequences representing traces of individual user activity by making the assumption that heterogeneous user behavior may be ‘explained’ by a relatively small number of common structurally simple behavioral patterns which may interleave randomly in a user-specific proportion. The results of an empirical study on three different sources of user traces indicate that this modelling approach provides an efficient representation scheme, reflected by improved prediction performance as well as providing low-complexity and intuitively interpretable representations.
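The generative process the abstract describes — a small set of shared, structurally simple Markov chains that interleave randomly in user-specific proportions — can be sketched as a sampler: each user draws mixing proportions from a Dirichlet, and each transition in that user's trace is taken from one of the shared chains picked according to those proportions. The dimensions, Dirichlet parameters, and names below are illustrative assumptions.

```python
import numpy as np

def sample_user_trace(transition_mats, alpha, length, rng):
    """Sample one user's symbol sequence: draw user-specific mixing
    proportions theta ~ Dirichlet(alpha), then at each step pick a
    shared chain z ~ theta and take one transition from it."""
    K, S, _ = transition_mats.shape          # K shared chains over S symbols
    theta = rng.dirichlet(alpha)             # user-specific proportions
    state = rng.integers(S)
    trace = [state]
    for _ in range(length - 1):
        z = rng.choice(K, p=theta)           # which behavioral pattern fires
        state = rng.choice(S, p=transition_mats[z, state])
        trace.append(state)
    return trace, theta

rng = np.random.default_rng(0)
K, S = 3, 5
T = rng.dirichlet(np.ones(S), size=(K, S))  # K row-stochastic matrices
trace, theta = sample_user_trace(T, alpha=np.ones(K), length=50, rng=rng)
```

Note that `theta` lives on the simplex — hence "simplicial mixture" — and is the per-user profile, while the K transition matrices are global and shared across all users.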

4 0.66668123 101 nips-2003-Large Margin Classifiers: Convex Loss, Low Noise, and Convergence Rates

Author: Peter L. Bartlett, Michael I. Jordan, Jon D. Mcauliffe

Abstract: Many classification algorithms, including the support vector machine, boosting and logistic regression, can be viewed as minimum contrast methods that minimize a convex surrogate of the 0-1 loss function. We characterize the statistical consequences of using such a surrogate by providing a general quantitative relationship between the risk as assessed using the 0-1 loss and the risk as assessed using any nonnegative surrogate loss function. We show that this relationship gives nontrivial bounds under the weakest possible condition on the loss function—that it satisfy a pointwise form of Fisher consistency for classification. The relationship is based on a variational transformation of the loss function that is easy to compute in many applications. We also present a refined version of this result in the case of low noise. Finally, we present applications of our results to the estimation of convergence rates in the general setting of function classes that are scaled hulls of a finite-dimensional base class.

5 0.58608991 27 nips-2003-Analytical Solution of Spike-timing Dependent Plasticity Based on Synaptic Biophysics

Author: Bernd Porr, Ausra Saudargiene, Florentin Wörgötter

Abstract: Spike-timing dependent plasticity (STDP) is a special form of synaptic plasticity where the relative timing of post- and presynaptic activity determines the change of the synaptic weight. On the postsynaptic side, active backpropagating spikes in dendrites seem to play a crucial role in the induction of spike timing dependent plasticity. We argue that postsynaptically the temporal change of the membrane potential determines the weight change. Coming from the presynaptic side, induction of STDP is closely related to the activation of NMDA channels. Therefore, we will calculate analytically the change of the synaptic weight by correlating the derivative of the membrane potential with the activity of the NMDA channel. Thus, for this calculation we utilise biophysical variables of the physiological cell. The final result shows a weight change curve which conforms with measurements from biology. The positive part of the weight change curve is determined by the NMDA activation. The negative part of the weight change curve is determined by the membrane potential change. Therefore, the weight change curve should change its shape depending on the distance from the soma of the postsynaptic cell. We find temporally asymmetric weight change close to the soma and temporally symmetric weight change in the distal dendrite.
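The calculation the abstract describes — correlating the NMDA-channel activation with the derivative of the postsynaptic membrane potential — can be illustrated numerically: when the presynaptic spike precedes the postsynaptic one, the NMDA conductance overlaps the rising edge of the backpropagating spike and the correlation is positive; in the reverse order only the falling edge overlaps and it is negative. The waveforms and time constants below are crude illustrative stand-ins, not the paper's biophysical model.

```python
import numpy as np

dt = 0.1                                    # time step, ms
t = np.arange(0, 100, dt)

def nmda(t, t_pre, tau_rise=2.0, tau_fall=40.0):
    """Difference-of-exponentials NMDA conductance after a presynaptic spike."""
    s = np.clip(t - t_pre, 0, None)
    g = (1 - np.exp(-s / tau_rise)) * np.exp(-s / tau_fall)
    return np.where(t >= t_pre, g, 0.0)

def bp_spike(t, t_post, tau=3.0):
    """Crude backpropagating-spike depolarization (alpha function)."""
    s = np.clip(t - t_post, 0, None)
    v = s / tau * np.exp(1 - s / tau)
    return np.where(t >= t_post, v, 0.0)

def delta_w(t_pre, t_post):
    """Correlate g_NMDA with dV/dt, as in the derivative-based rule."""
    v = bp_spike(t, t_post)
    dv = np.gradient(v, dt)
    return np.sum(nmda(t, t_pre) * dv) * dt

ltp = delta_w(t_pre=20.0, t_post=30.0)      # pre before post
ltd = delta_w(t_pre=30.0, t_post=20.0)      # post before pre
```

The sign flip between the two orderings reproduces the asymmetric window qualitatively; the abstract's point is that the shape of the postsynaptic depolarization (somatic versus distal dendritic) then reshapes this curve.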

6 0.58170855 16 nips-2003-A Recurrent Model of Orientation Maps with Simple and Complex Cells

7 0.52629864 125 nips-2003-Maximum Likelihood Estimation of a Stochastic Integrate-and-Fire Neural Model

8 0.52221912 158 nips-2003-Policy Search by Dynamic Programming

9 0.52039433 189 nips-2003-Tree-structured Approximations by Expectation Propagation

10 0.51789337 78 nips-2003-Gaussian Processes in Reinforcement Learning

11 0.51136214 103 nips-2003-Learning Bounds for a Generalized Family of Bayesian Posterior Distributions

12 0.51089633 122 nips-2003-Margin Maximizing Loss Functions

13 0.50971818 116 nips-2003-Linear Program Approximations for Factored Continuous-State Markov Decision Processes

14 0.50961405 107 nips-2003-Learning Spectral Clustering

15 0.50771999 72 nips-2003-Fast Feature Selection from Microarray Expression Data via Multiplicative Large Margin Algorithms

16 0.50663692 20 nips-2003-All learning is Local: Multi-agent Learning in Global Reward Games

17 0.50437957 30 nips-2003-Approximability of Probability Distributions

18 0.50380915 146 nips-2003-Online Learning of Non-stationary Sequences

19 0.50375861 61 nips-2003-Entrainment of Silicon Central Pattern Generators for Legged Locomotory Control

20 0.50368077 54 nips-2003-Discriminative Fields for Modeling Spatial Dependencies in Natural Images