nips nips2000 nips2000-41 knowledge-graph by maker-knowledge-mining

41 nips-2000-Discovering Hidden Variables: A Structure-Based Approach


Source: pdf

Author: Gal Elidan, Noam Lotner, Nir Friedman, Daphne Koller

Abstract: A serious problem in learning probabilistic models is the presence of hidden variables. These variables are not observed, yet interact with several of the observed variables. As such, they induce seemingly complex dependencies among the latter. In recent years, much attention has been devoted to the development of algorithms for learning parameters, and in some cases structure, in the presence of hidden variables. In this paper, we address the related problem of detecting hidden variables that interact with the observed variables. This problem is of interest both for improving our understanding of the domain and as a preliminary step that guides the learning procedure towards promising models. A very natural approach is to search for

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 A serious problem in learning probabilistic models is the presence of hidden variables. [sent-6, score-0.529]

2 In recent years, much attention has been devoted to the development of algorithms for learning parameters, and in some cases structure, in the presence of hidden variables. [sent-9, score-0.577]

3 In this paper, we address the related problem of detecting hidden variables that interact with the observed variables. [sent-10, score-0.765]

4 A very natural approach is to search for "structural signatures" of hidden variables: substructures in the learned network that tend to suggest the presence of a hidden variable. [sent-12, score-1.598]

5 An important issue is the existence of hidden variables that are never observed, yet interact with observed variables. [sent-18, score-0.675]

6 We can construct a network over the observable variables which is an I-map for the marginal distribution over these variables. [sent-21, score-0.395]

7 It contains 12 edges rather than 6, and the nodes have much bigger families. [sent-29, score-0.378]

8 From the perspective of learning these networks from data, the marginalized network has significant disadvantages. [sent-31, score-0.315]

9 Moreover, with limited amounts of data the induced network will usually omit several of the dependencies in the model. [sent-33, score-0.393]

10 When a hidden variable is known to exist, we can introduce it into the network and apply known BN learning algorithms. [sent-34, score-0.835]

11 If the network structure is known, algorithms such as EM [3, 9] or gradient ascent [2] can learn the parameters. (Figure 1: hidden variable simplifies structure; (a) with hidden variable, (b) no hidden variable.) [sent-35, score-1.933]

12 If the structure is not known, the Structural EM (SEM) algorithm of [4] can be used to perform structure learning with missing data. [sent-36, score-0.496]

13 However, we cannot simply introduce a "floating" hidden variable and expect SEM to place it correctly. [sent-37, score-0.592]

14 Hence, both of these algorithms assume that some other mechanism introduces the hidden variable in approximately the right location in the network. [sent-38, score-0.592]

15 Somewhat surprisingly, little work has been done on the problem of automatically detecting that a hidden variable might be present in a certain position in the network. [sent-39, score-0.725]

16 In this paper, we investigate what is arguably the most straightforward approach for inducing the existence of a hidden variable. [sent-40, score-0.479]

17 We then search the structure for substructures, which we call semi-cliques, that seem as if they might be induced by a hidden variable. [sent-42, score-0.799]

18 We temporarily introduce the hidden variable in a way that breaks up the clique, and then continue learning based on that new structure. [sent-43, score-0.635]

19 If the resulting structure has a better score, we keep the hidden variable. [sent-44, score-0.596]

20 We apply our approach to several synthetic and real datasets, and show that it often provides a good initial placement for the introduced hidden variable. [sent-48, score-0.609]

21 We consider a set {X1, . . . , Xn} of discrete random variables where each variable Xi may take on values from a finite set. [sent-53, score-0.325]

22 The common approach to this problem is to introduce a scoring function that evaluates each network with respect to the training data, and then to search for the best network according to this score. [sent-68, score-0.69]

23 The scoring function most commonly used to learn Bayesian networks is the Bayesian scoring metric [8]. [sent-69, score-0.344]

24 Given a scoring function, the structure learning task reduces to a problem of searching over the combinatorial space of structures for the structure that maximizes the score. [sent-70, score-0.574]
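
Sentences 22-24 describe score-based structure learning as search over candidate networks. The following is a minimal, self-contained sketch of that idea, not the authors' implementation: it uses a BIC-style score as a stand-in for the Bayesian score of [8], represents a network as a dict mapping each variable to its parent set, and assumes the data is a list of {variable: value} dicts; all names are illustrative.

```python
import math
from collections import Counter
from itertools import permutations

def family_bic(data, child, parents):
    """BIC score of the family P(child | parents); data is a list of dicts."""
    n = len(data)
    joint, marg, child_vals = Counter(), Counter(), set()
    for row in data:
        pa = tuple(row[p] for p in sorted(parents))
        joint[(pa, row[child])] += 1
        marg[pa] += 1
        child_vals.add(row[child])
    loglik = sum(c * math.log(c / marg[pa]) for (pa, x), c in joint.items())
    return loglik - 0.5 * len(marg) * (len(child_vals) - 1) * math.log(n)

def creates_cycle(parents, frm, to):
    """Would adding the edge frm -> to create a directed cycle?"""
    stack, seen = [frm], set()
    while stack:
        node = stack.pop()
        if node == to:
            return True          # `to` is already an ancestor of `frm`
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False

def greedy_search(data, variables, max_parents=3):
    """Greedy edge addition over DAGs; a toy stand-in for score-based learning."""
    parents = {x: set() for x in variables}
    score = {x: family_bic(data, x, parents[x]) for x in variables}
    while True:
        best_gain, best_edge = 0.0, None
        for frm, to in permutations(variables, 2):
            if frm in parents[to] or len(parents[to]) >= max_parents:
                continue
            if creates_cycle(parents, frm, to):
                continue
            gain = family_bic(data, to, parents[to] | {frm}) - score[to]
            if gain > best_gain:
                best_gain, best_edge = gain, (frm, to)
        if best_edge is None:
            return parents
        frm, to = best_edge
        parents[to].add(frm)
        score[to] = family_bic(data, to, parents[to])
```

Because the score decomposes over families, each candidate edge addition only requires rescoring the family of the child it touches, which is what makes greedy search practical.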

25 The problem of learning in the presence of partially observable data (or known hidden variables) is computationally and conceptually much harder. [sent-73, score-0.6]

26 In the case of a fixed network structure, the Expectation Maximization (EM) algorithm of [3] can be used to search for a (local) maximum likelihood (or maximum a posteriori) assignment to the parameters. [sent-74, score-0.361]

27 The current model (structure as well as parameters) is used for computing expected sufficient statistics for other candidate structures. [sent-77, score-0.319]

28 3 Detecting Hidden Variables We motivate our approach for detecting hidden variables by considering the simple example discussed in the introduction. [sent-81, score-0.727]

29 Consider the distribution represented by the network shown in Figure 1(a), where H is a hidden variable. [sent-82, score-0.625]

30 We can show that this phenomenon is a typical effect of removing a hidden variable (Proposition 3.1): [sent-87, score-0.425]

31 There is a network G' over X1, . . . , Xn that contains an edge from Xi to Xj whenever G contains such an edge, and in addition: G' contains a clique over the children of H, and G' contains an edge from any parent of H to any child of H. [sent-96, score-0.626]
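
A minimal sketch of the graph transformation the proposition describes, assuming the hypothetical dict-of-parent-sets representation used in the sketch above:

```python
def marginalize_hidden(parents, h):
    """Build G' over the observed variables when the hidden node h is removed:
    keep all edges among observed nodes, connect every parent of h to every
    child of h, and place a clique over the children of h (oriented by an
    arbitrary fixed order so the result stays acyclic)."""
    children = sorted(x for x, ps in parents.items() if h in ps)
    new_parents = {x: set(ps) - {h} for x, ps in parents.items() if x != h}
    for child in children:
        new_parents[child] |= set(parents[h])     # parents of h -> children of h
    for i, later in enumerate(children):
        new_parents[later] |= set(children[:i])   # clique over the children of h
    return new_parents
```

With a hidden variable that has three parents and three children, this construction produces 3x3 = 9 parent-to-child edges plus 3 clique edges, which is consistent with the 12 edges mentioned in sentence 7.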

32 We want to define a procedure that will suggest candidate hidden variables by finding structures of this type in the context of a learning algorithm. [sent-98, score-0.936]

33 We will apply our procedure to networks induced by standard structure learning algorithms [7]. [sent-99, score-0.415]

34 We therefore use a somewhat more flexible definition, which allows us to detect potential hidden variables. [sent-103, score-0.479]

35 These cases are of less interest to us, because they are less likely to arise from the marginalization of a hidden variable. [sent-120, score-0.473]
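
The excerpts do not reproduce the exact semi-clique definition, so the sketch below assumes one plausible reading: a set of nodes in which every node is adjacent (ignoring edge direction) to at least half of the other nodes in the set, grown greedily from small seeds. The threshold, the seeding, and all names here are assumptions, not the authors' algorithm.

```python
from itertools import combinations

def undirected_adj(parents):
    """Undirected adjacency built from a dict-of-parent-sets network."""
    adj = {x: set() for x in parents}
    for child, ps in parents.items():
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
    return adj

def is_semi_clique(adj, nodes, frac=0.5):
    """Every node is adjacent to at least `frac` of the other nodes in the set."""
    return all(len(adj[x] & (nodes - {x})) >= frac * (len(nodes) - 1) for x in nodes)

def find_semi_cliques(parents, min_size=4, frac=0.5):
    """Greedily grow candidate node sets from seed triples; a heuristic sketch.
    The O(n^3) seeding is acceptable for networks of the size considered here."""
    adj = undirected_adj(parents)
    found = []
    seeds = [set(t) for t in combinations(adj, 3) if is_semi_clique(adj, set(t), frac)]
    for q in seeds:
        grew = True
        while grew:
            grew = False
            for x in set().union(*(adj[y] for y in q)) - q:
                if is_semi_clique(adj, q | {x}, frac):
                    q.add(x)
                    grew = True
        if len(q) >= min_size and q not in found:
            found.append(q)
    return found
```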

36 In the second phase, we convert each of the semi-cliques to a structure candidate containing a new hidden node. [sent-121, score-0.786]

37 Our construction introduces a new variable H, and replaces all of the incoming edges into variables in Q by edges from H. [sent-123, score-0.765]
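
A short sketch of the conversion just described, using the same hypothetical representation as above: every node of the semi-clique Q gets the new variable H as its only parent, and all other edges into Q are dropped.

```python
def introduce_hidden(parents, clique_nodes, hidden_name="H"):
    """Candidate structure: H becomes the sole parent of every node in the
    semi-clique; all other incoming edges into the clique are removed."""
    new_parents = {x: set(ps) for x, ps in parents.items()}
    for x in clique_nodes:
        new_parents[x] = {hidden_name}
    new_parents[hidden_name] = set()   # H itself starts with no parents
    return new_parents
```

As later sentences note, this candidate is deliberately crude; it is only a starting point that the subsequent evaluation (optionally with structure search around H) is expected to refine.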

38 In the third phase, we evaluate each of these candidate structures in an attempt to find the most useful hidden variable. [sent-126, score-0.701]

39 The simplest assumes that the network structure, after the introduction of the hidden variable, is fixed. [sent-129, score-0.625]

40 In other words, we assume that the "true" structure of the network is indeed the result of applying our transformation to the input network (which was produced by the first stage of learning). [sent-130, score-0.571]

41 In our construction, we chose to make the hidden variable H the parent of all the nodes in the semi-clique, and eliminate all other incoming edges to variables in the clique. [sent-133, score-1.195]

42 There might well be cases where some of the edges in the clique are warranted even in the presence of the hidden variable. [sent-135, score-0.88]

43 It might also be the case that some of the edges from H to the semi-clique variables should be reversed. [sent-136, score-0.421]

44 We could therefore allow the learning algorithm - the SEM algorithm of [4] - to adapt the structure after the hidden variable is introduced. [sent-138, score-0.888]

45 One approach is to use SEM to fine-tune our model for the part of the network we just changed: the variables in the semi-clique and the new hidden variable. [sent-139, score-0.837]

46 To summarize our approach: In the first phase we analyze the network learned using conventional structure search to find semi-cliques that indicate potential locations of hidden variables. [sent-144, score-1.071]

47 In the second phase we convert these semi-cliques into structure candidates (each containing a new hidden variable). [sent-145, score-0.74]

48 Finally, in the third phase we evaluate each of these structures (possibly using them as a seed for further search) and return the best scoring network we find. [sent-146, score-0.57]
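
Chaining the three phases together, and reusing the hypothetical helpers sketched above, the overall procedure might look as follows. The callback evaluate_with_hidden is a placeholder for scoring a candidate that contains the new hidden variable (which requires EM or Structural EM and is not reproduced here).

```python
def propose_hidden_variables(data, variables, evaluate_with_hidden):
    """Three-phase pipeline: learn a network, find semi-cliques, introduce and
    evaluate one hidden-variable candidate per semi-clique."""
    # Phase 1: conventional structure learning, then semi-clique detection.
    learned = greedy_search(data, variables)
    cliques = find_semi_cliques(learned)
    # Phase 2: one candidate structure per semi-clique.
    candidates = [introduce_hidden(learned, q, hidden_name=f"H{i}")
                  for i, q in enumerate(cliques)]
    # Phase 3: evaluate the candidates and keep the best-scoring network.
    baseline = sum(family_bic(data, x, learned[x]) for x in variables)
    scored = [(evaluate_with_hidden(data, cand), cand) for cand in candidates]
    if not scored:
        return learned
    best_score, best = max(scored, key=lambda t: t[0])
    return best if best_score > baseline else learned
```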

49 The main assumption of our approach is that we can find "structural signatures" of hidden variables via semi-cliques. [sent-147, score-0.637]

50 As we discussed above, it is unrealistic to expect the learned network G to have exactly the structure described in Proposition 3.1. [sent-148, score-0.463]

51 On the one hand, learned networks often have spurious edges resulting from statistical noise, which might cause fragments of the network to resemble these structures even if no hidden variable is involved. [sent-150, score-1.352]

52 On the other hand, there might be edges that are missing or reversed. [sent-151, score-0.333]

53 At worst, they will lead us to propose a spurious hidden variable which will be eliminated by the subsequent evaluation step. [sent-153, score-0.731]

54 4 Experimental Results Our aim is to evaluate the success of our procedure in detecting hidden variables. [sent-156, score-0.659]

55 We chose to hide variables that are "central" in the network. [sent-160, score-0.398]

56 However, the data is generated from a distribution that indeed has only a single hidden variable. [sent-164, score-0.459]

57 Insurance: A 27-node network developed to evaluate driver's insurance applications [2]. [sent-167, score-0.335]

58 Each point in the graph corresponds to a network learned by one of the methods. [sent-200, score-0.342]

59 First, we used a standard model selection procedure to learn a network from the training data (without any hidden variables). [sent-210, score-0.788]

60 We supplied the learned network as input to the clique-detecting algorithm, which returned a set of candidate hidden variables. [sent-212, score-0.906]

61 The Hidden procedure returns the highest-scoring network that results from evaluating the different putative hidden variables. [sent-214, score-0.714]

62 The Naive strawman [4] initializes the learning with a network that has a single hidden variable as parent of all the observed variables. [sent-216, score-1.059]

63 The Original strawman, which applies only to the synthetic data sets, is to use the true generating network on the data set. [sent-221, score-0.387]

64 That is, we take the original network (that contains the variable we hid) and use standard parametric EM to learn parameters for it. [sent-222, score-0.536]

65 First, we computed the Bayesian score of each network on the training data. [sent-225, score-0.331]

66 Thus, a positive score of, say, 100 in Figure 2 indicates a score that is larger by 100 than the score of No-Hidden. [sent-229, score-0.393]

67 Since scores are the logarithm of the Bayesian posterior probability of structures (up to a constant), this implies that such a structure is 2^100 times more probable than the structure found by No-Hidden. [sent-230, score-0.46]
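
Written out explicitly, and assuming (as the sentence states) that the reported score is the logarithm, here base 2, of the structure posterior up to an additive constant shared by both networks:

\[
\frac{P(G_{\text{Hidden}} \mid D)}{P(G_{\text{No-Hidden}} \mid D)}
  = 2^{\,\mathrm{score}(G_{\text{Hidden}}) - \mathrm{score}(G_{\text{No-Hidden}})}
  = 2^{100}.
\]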

68 We can see that, in most cases, the network learned by Hidden outperforms the network learned by No-Hidden. [sent-231, score-0.631]

69 Our algorithm can only evaluate networks according to their score; indeed, the scores of the networks found by Hidden are better than those of Original in 12 out of 13 cases tested. [sent-234, score-0.333]

70 Our approach usually outperforms the network learned by Naive. [sent-236, score-0.393]

71 In all of our experiments, the variant that fixed the candidate structure and learned parameters for it resulted in scores that were significantly worse than the networks found by the variants that employed structure search. [sent-240, score-0.762]

72 This highlights the importance of structure search in evaluating a potential hidden variable. [sent-242, score-0.716]

73 The initial structure candidate is often too simplified; on the one hand, it forces too many independencies among the variables in the semi-clique, and on the other, it can add too many parents to the new hidden variable. [sent-243, score-1.074]

74 In many cases, the variant that gives the SEM complete flexibility in adapting the network structure did not find a better scoring network than the variant that only searches for edges in the area of the new variable. [sent-245, score-1.033]

75 For example, in the stock market data, our procedure constructs a hidden variable that is the parent of several stocks: Microsoft, Intel, Dell, CISCO, and Yahoo. [sent-249, score-0.922]

76 When the hidden variable has the "strong" value, all the stocks have higher probability for going up. [sent-252, score-0.684]

77 When the hidden variable has the "stationary" value, these stocks have much higher probability of being in the "no change" value. [sent-253, score-0.684]

78 We do note that in the learned networks there were still many edges between the individual stocks. [sent-254, score-0.384]

79 Thus, the hidden variable serves as a general market trend, while the additional edges provide a better description of the correlations between individual stocks. [sent-255, score-0.87]

80 One value of the hidden variable captures two highly dominant segments of the population: older, HIV-negative, foreign-born Asians, and younger, HIV-positive, US-born blacks. [sent-257, score-0.592]

81 The hidden variable's children distinguished between the two aggregated subpopulations using the HIV-result variable, which was also a parent of most of them. [sent-258, score-0.623]

82 We believe that, had we allowed the hidden variable to have three values, it would have separated these populations. [sent-259, score-0.592]

83 5 Discussion and Future Work In this paper, we propose a simple and intuitive algorithm for finding plausible locations for hidden variables in BN learning. [sent-260, score-0.713]

84 It attempts to detect structural signatures of a hidden variable in the network learned by standard structure search. [sent-261, score-1.323]

85 To our knowledge, this paper is also the first to provide systematic empirical tests of any approach to the task of discovering hidden variables. [sent-263, score-0.552]

86 The problem of detecting hidden variables has received surprisingly little attention. [sent-264, score-0.717]

87 [11] suggest an approach that detects patterns of conditional independencies that can only be generated in the presence of hidden variables. [sent-266, score-0.696]

88 Second, it only detects hidden variables that are forced by the qualitative independence constraints. [sent-269, score-0.671]

89 It cannot detect situations where the hidden variable provides a more succinct model of a distribution that can be described by a network without a hidden variable (as in the simple example of Figure 1). [sent-270, score-1.438]

90 Then they construct a two-layered network that contains independent hidden variables in the top level, and observables in the bottom layer, such that every dependency between two observed variables is "explained" by at least one common hidden parent. [sent-274, score-1.525]

91 Thus, it forms clusters even in cases where the dependencies can be fully explained by a standard Bayesian network structure. [sent-277, score-0.36]

92 (In this case, it would learn a hidden variable that is the parent of all three variables.) [sent-279, score-0.733]

93 Finally, this approach learns a restricted form of networks that requires many hidden variables to represent dependencies among variables. [sent-280, score-0.785]

94 Thus, it has limited utility in distinguishing "true" hidden variables from artifacts of the representation. [sent-281, score-0.583]

95 First, other possibilities for structural signatures (for example, the structure resulting from a configuration with many parents and many children) may expand the range of variables we can discover. [sent-283, score-0.741]

96 Second, our clique-discovering procedure is based solely on the structure of the network learned. [sent-284, score-0.46]

97 Additional information, such as the confidence of learned edges [6, 5], might help the procedure avoid spurious signatures. [sent-285, score-0.504]

98 Third, we plan to experiment with multi-valued hidden variables and better heuristics for selecting candidates out of the different proposed networks. [sent-286, score-0.655]

99 Information-theoretic measures might provide a more statistical signature for the presence of a hidden variable. [sent-288, score-0.529]

100 Discrete factor analysis: Learning hidden variables in Bayesian networks. [sent-358, score-0.583]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('hidden', 0.425), ('sem', 0.276), ('edges', 0.22), ('network', 0.2), ('structure', 0.171), ('variable', 0.167), ('variables', 0.158), ('candidate', 0.148), ('score', 0.131), ('bayesian', 0.123), ('search', 0.12), ('scoring', 0.116), ('signatures', 0.115), ('hid', 0.107), ('parents', 0.103), ('parent', 0.101), ('structural', 0.099), ('children', 0.097), ('tb', 0.097), ('stocks', 0.092), ('learned', 0.092), ('em', 0.091), ('detecting', 0.09), ('procedure', 0.089), ('synthetic', 0.087), ('nodes', 0.084), ('alarm', 0.083), ('clique', 0.083), ('insurance', 0.08), ('strawman', 0.08), ('dependencies', 0.076), ('contains', 0.074), ('edge', 0.073), ('structures', 0.073), ('networks', 0.072), ('missing', 0.07), ('independencies', 0.069), ('proposition', 0.069), ('variant', 0.063), ('phase', 0.063), ('seed', 0.063), ('substructures', 0.063), ('friedman', 0.062), ('presence', 0.061), ('spurious', 0.06), ('market', 0.058), ('original', 0.055), ('node', 0.055), ('evaluate', 0.055), ('approach', 0.054), ('detect', 0.054), ('agmv', 0.053), ('detects', 0.053), ('hil', 0.053), ('hilv', 0.053), ('nir', 0.053), ('spirtes', 0.053), ('uaj', 0.053), ('greedy', 0.052), ('koller', 0.051), ('graph', 0.05), ('interact', 0.049), ('cases', 0.048), ('outperforms', 0.047), ('annotated', 0.046), ('xi', 0.046), ('scores', 0.045), ('plausible', 0.045), ('surprisingly', 0.044), ('propose', 0.044), ('might', 0.043), ('observed', 0.043), ('learning', 0.043), ('several', 0.043), ('dependency', 0.042), ('convert', 0.042), ('algorithm', 0.041), ('learn', 0.04), ('induced', 0.04), ('chose', 0.04), ('tests', 0.039), ('domain', 0.039), ('pb', 0.039), ('candidates', 0.039), ('age', 0.039), ('stock', 0.039), ('substantially', 0.038), ('graphs', 0.037), ('observable', 0.037), ('statements', 0.036), ('explained', 0.036), ('independence', 0.035), ('evaluation', 0.035), ('conditional', 0.034), ('discovering', 0.034), ('suffers', 0.034), ('data', 0.034), ('concrete', 0.033), ('plan', 0.033), ('generating', 0.032)]
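
The similar-papers lists that follow are computed from sparse tfidf vectors like the (word, weight) list above. Below is a minimal sketch of that computation; the dict format mirrors the pairs shown, while the function names and the surrounding pipeline are assumptions about how the mining tool works.

```python
import math

def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse tfidf vectors given as
    {word: weight} dicts."""
    dot = sum(w * vec_b.get(t, 0.0) for t, w in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def most_similar(target, corpus, top_k=20):
    """Rank the other papers in `corpus` ({paper_id: vector}) by similarity."""
    scores = [(pid, cosine_similarity(target, vec)) for pid, vec in corpus.items()]
    return sorted(scores, key=lambda t: -t[1])[:top_k]

# First few weights listed above for this paper:
paper_41 = {"hidden": 0.425, "sem": 0.276, "edges": 0.220, "network": 0.200}
```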

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999976 41 nips-2000-Discovering Hidden Variables: A Structure-Based Approach

Author: Gal Elidan, Noam Lotner, Nir Friedman, Daphne Koller

Abstract: A serious problem in learning probabilistic models is the presence of hidden variables. These variables are not observed, yet interact with several of the observed variables. As such, they induce seemingly complex dependencies among the latter. In recent years, much attention has been devoted to the development of algorithms for learning parameters, and in some cases structure, in the presence of hidden variables. In this paper, we address the related problem of detecting hidden variables that interact with the observed variables. This problem is of interest both for improving our understanding of the domain and as a preliminary step that guides the learning procedure towards promising models. A very natural approach is to search for

2 0.19097714 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning

Author: Zoubin Ghahramani, Matthew J. Beal

Abstract: Variational approximations are becoming a widespread tool for Bayesian learning of graphical models. We provide some theoretical results for the variational updates in a very general family of conjugate-exponential graphical models. We show how the belief propagation and the junction tree algorithms can be used in the inference step of variational Bayesian learning. Applying these results to the Bayesian analysis of linear-Gaussian state-space models we obtain a learning procedure that exploits the Kalman smoothing propagation, while integrating over all model parameters. We demonstrate how this can be used to infer the hidden state dimensionality of the state-space model in a variety of synthetic problems and one real high-dimensional data set. 1

3 0.16359825 17 nips-2000-Active Learning for Parameter Estimation in Bayesian Networks

Author: Simon Tong, Daphne Koller

Abstract: Bayesian networks are graphical representations of probability distributions. In virtually all of the work on learning these networks, the assumption is that we are presented with a data set consisting of randomly generated instances from the underlying distribution. In many situations, however, we also have the option of active learning, where we have the possibility of guiding the sampling process by querying for certain types of samples. This paper addresses the problem of estimating the parameters of Bayesian networks in an active learning setting. We provide a theoretical framework for this problem, and an algorithm that chooses which active learning queries to generate based on the model learned so far. We present experimental results showing that our active learning algorithm can significantly reduce the need for training data in many situations.

4 0.1354913 114 nips-2000-Second Order Approximations for Probability Models

Author: Hilbert J. Kappen, Wim Wiegerinck

Abstract: In this paper, we derive a second order mean field theory for directed graphical probability models. By using an information theoretic argument it is shown how this can be done in the absense of a partition function. This method is a direct generalisation of the well-known TAP approximation for Boltzmann Machines. In a numerical example, it is shown that the method greatly improves the first order mean field approximation. For a restricted class of graphical models, so-called single overlap graphs, the second order method has comparable complexity to the first order method. For sigmoid belief networks, the method is shown to be particularly fast and effective.

5 0.12958838 107 nips-2000-Rate-coded Restricted Boltzmann Machines for Face Recognition

Author: Yee Whye Teh, Geoffrey E. Hinton

Abstract: We describe a neurally-inspired, unsupervised learning algorithm that builds a non-linear generative model for pairs of face images from the same individual. Individuals are then recognized by finding the highest relative probability pair among all pairs that consist of a test image and an image whose identity is known. Our method compares favorably with other methods in the literature. The generative model consists of a single layer of rate-coded, non-linear feature detectors and it has the property that, given a data vector, the true posterior probability distribution over the feature detector activities can be inferred rapidly without iteration or approximation. The weights of the feature detectors are learned by comparing the correlations of pixel intensities and feature activations in two phases: When the network is observing real data and when it is observing reconstructions of real data generated from the feature activations.

6 0.12173077 108 nips-2000-Recognizing Hand-written Digits Using Hierarchical Products of Experts

7 0.11299686 140 nips-2000-Tree-Based Modeling and Estimation of Gaussian Processes on Graphs with Cycles

8 0.11021888 98 nips-2000-Partially Observable SDE Models for Image Sequence Recognition Tasks

9 0.10550999 15 nips-2000-Accumulator Networks: Suitors of Local Probability Propagation

10 0.10065592 35 nips-2000-Computing with Finite and Infinite Networks

11 0.090359375 142 nips-2000-Using Free Energies to Represent Q-values in a Multiagent Reinforcement Learning Task

12 0.084115513 104 nips-2000-Processing of Time Series by Neural Circuits with Biologically Realistic Synaptic Dynamics

13 0.080926985 38 nips-2000-Data Clustering by Markovian Relaxation and the Information Bottleneck Method

14 0.080318525 135 nips-2000-The Manhattan World Assumption: Regularities in Scene Statistics which Enable Bayesian Inference

15 0.078061558 53 nips-2000-Feature Correspondence: A Markov Chain Monte Carlo Approach

16 0.076542966 85 nips-2000-Mixtures of Gaussian Processes

17 0.076246418 115 nips-2000-Sequentially Fitting ``Inclusive'' Trees for Inference in Noisy-OR Networks

18 0.075757109 148 nips-2000-`N-Body' Problems in Statistical Learning

19 0.074007526 103 nips-2000-Probabilistic Semantic Video Indexing

20 0.072462663 27 nips-2000-Automatic Choice of Dimensionality for PCA


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.287), (1, -0.043), (2, 0.147), (3, -0.066), (4, 0.167), (5, 0.015), (6, 0.073), (7, -0.022), (8, 0.011), (9, 0.123), (10, 0.147), (11, -0.056), (12, 0.121), (13, -0.089), (14, 0.082), (15, 0.062), (16, -0.081), (17, 0.192), (18, -0.073), (19, 0.083), (20, -0.092), (21, -0.048), (22, -0.014), (23, -0.12), (24, 0.029), (25, -0.05), (26, -0.06), (27, -0.049), (28, -0.008), (29, -0.036), (30, 0.051), (31, -0.095), (32, 0.02), (33, 0.054), (34, 0.016), (35, -0.06), (36, -0.081), (37, -0.006), (38, 0.035), (39, -0.039), (40, -0.05), (41, 0.069), (42, -0.001), (43, -0.111), (44, -0.06), (45, -0.054), (46, 0.167), (47, 0.081), (48, -0.003), (49, 0.068)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98411155 41 nips-2000-Discovering Hidden Variables: A Structure-Based Approach

Author: Gal Elidan, Noam Lotner, Nir Friedman, Daphne Koller

Abstract: A serious problem in learning probabilistic models is the presence of hidden variables. These variables are not observed, yet interact with several of the observed variables. As such, they induce seemingly complex dependencies among the latter. In recent years, much attention has been devoted to the development of algorithms for learning parameters, and in some cases structure, in the presence of hidden variables. In this paper, we address the related problem of detecting hidden variables that interact with the observed variables. This problem is of interest both for improving our understanding of the domain and as a preliminary step that guides the learning procedure towards promising models. A very natural approach is to search for

2 0.74146152 17 nips-2000-Active Learning for Parameter Estimation in Bayesian Networks

Author: Simon Tong, Daphne Koller

Abstract: Bayesian networks are graphical representations of probability distributions. In virtually all of the work on learning these networks, the assumption is that we are presented with a data set consisting of randomly generated instances from the underlying distribution. In many situations, however, we also have the option of active learning, where we have the possibility of guiding the sampling process by querying for certain types of samples. This paper addresses the problem of estimating the parameters of Bayesian networks in an active learning setting. We provide a theoretical framework for this problem, and an algorithm that chooses which active learning queries to generate based on the model learned so far. We present experimental results showing that our active learning algorithm can significantly reduce the need for training data in many situations.

3 0.65229255 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning

Author: Zoubin Ghahramani, Matthew J. Beal

Abstract: Variational approximations are becoming a widespread tool for Bayesian learning of graphical models. We provide some theoretical results for the variational updates in a very general family of conjugate-exponential graphical models. We show how the belief propagation and the junction tree algorithms can be used in the inference step of variational Bayesian learning. Applying these results to the Bayesian analysis of linear-Gaussian state-space models we obtain a learning procedure that exploits the Kalman smoothing propagation, while integrating over all model parameters. We demonstrate how this can be used to infer the hidden state dimensionality of the state-space model in a variety of synthetic problems and one real high-dimensional data set. 1

4 0.60438359 98 nips-2000-Partially Observable SDE Models for Image Sequence Recognition Tasks

Author: Javier R. Movellan, Paul Mineiro, Ruth J. Williams

Abstract: This paper explores a framework for recognition of image sequences using partially observable stochastic differential equation (SDE) models. Monte-Carlo importance sampling techniques are used for efficient estimation of sequence likelihoods and sequence likelihood gradients. Once the network dynamics are learned, we apply the SDE models to sequence recognition tasks in a manner similar to the way Hidden Markov models (HMMs) are commonly applied. The potential advantage of SDEs over HMMS is the use of continuous state dynamics. We present encouraging results for a video sequence recognition task in which SDE models provided excellent performance when compared to hidden Markov models. 1

5 0.55456334 15 nips-2000-Accumulator Networks: Suitors of Local Probability Propagation

Author: Brendan J. Frey, Anitha Kannan

Abstract: One way to approximate inference in richly-connected graphical models is to apply the sum-product algorithm (a.k.a. probability propagation algorithm), while ignoring the fact that the graph has cycles. The sum-product algorithm can be directly applied in Gaussian networks and in graphs for coding, but for many conditional probability functions - including the sigmoid function - direct application of the sum-product algorithm is not possible. We introduce

6 0.54711914 115 nips-2000-Sequentially Fitting ``Inclusive'' Trees for Inference in Noisy-OR Networks

7 0.52083546 148 nips-2000-`N-Body' Problems in Statistical Learning

8 0.50940913 108 nips-2000-Recognizing Hand-written Digits Using Hierarchical Products of Experts

9 0.49082148 107 nips-2000-Rate-coded Restricted Boltzmann Machines for Face Recognition

10 0.46239302 140 nips-2000-Tree-Based Modeling and Estimation of Gaussian Processes on Graphs with Cycles

11 0.44617525 142 nips-2000-Using Free Energies to Represent Q-values in a Multiagent Reinforcement Learning Task

12 0.42740354 85 nips-2000-Mixtures of Gaussian Processes

13 0.42605415 53 nips-2000-Feature Correspondence: A Markov Chain Monte Carlo Approach

14 0.4231638 35 nips-2000-Computing with Finite and Infinite Networks

15 0.40441036 114 nips-2000-Second Order Approximations for Probability Models

16 0.39919549 147 nips-2000-Who Does What? A Novel Algorithm to Determine Function Localization

17 0.38842443 29 nips-2000-Bayes Networks on Ice: Robotic Search for Antarctic Meteorites

18 0.38478523 104 nips-2000-Processing of Time Series by Neural Circuits with Biologically Realistic Synaptic Dynamics

19 0.37146086 138 nips-2000-The Use of Classifiers in Sequential Inference

20 0.36129218 135 nips-2000-The Manhattan World Assumption: Regularities in Scene Statistics which Enable Bayesian Inference


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.037), (17, 0.115), (32, 0.015), (33, 0.05), (54, 0.014), (55, 0.416), (62, 0.035), (65, 0.028), (67, 0.041), (76, 0.037), (79, 0.03), (81, 0.021), (90, 0.04), (91, 0.029), (97, 0.017)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.98505199 19 nips-2000-Adaptive Object Representation with Hierarchically-Distributed Memory Sites

Author: Bosco S. Tjan

Abstract: Theories of object recognition often assume that only one representation scheme is used within one visual-processing pathway. Versatility of the visual system comes from having multiple visual-processing pathways, each specialized in a different category of objects. We propose a theoretically simpler alternative, capable of explaining the same set of data and more. A single primary visual-processing pathway, loosely modular, is assumed. Memory modules are attached to sites along this pathway. Object-identity decision is made independently at each site. A site's response time is a monotonic-decreasing function of its confidence regarding its decision. An observer's response is the first-arriving response from any site. The effective representation(s) of such a system, determined empirically, can appear to be specialized for different tasks and stimuli, consistent with recent clinical and functional-imaging findings. This, however, merely reflects a decision being made at its appropriate level of abstraction. The system itself is intrinsically flexible and adaptive.

same-paper 2 0.97537917 41 nips-2000-Discovering Hidden Variables: A Structure-Based Approach

Author: Gal Elidan, Noam Lotner, Nir Friedman, Daphne Koller

Abstract: A serious problem in learning probabilistic models is the presence of hidden variables. These variables are not observed, yet interact with several of the observed variables. As such, they induce seemingly complex dependencies among the latter. In recent years, much attention has been devoted to the development of algorithms for learning parameters, and in some cases structure, in the presence of hidden variables. In this paper, we address the related problem of detecting hidden variables that interact with the observed variables. This problem is of interest both for improving our understanding of the domain and as a preliminary step that guides the learning procedure towards promising models. A very natural approach is to search for

3 0.6182943 10 nips-2000-A Productive, Systematic Framework for the Representation of Visual Structure

Author: Shimon Edelman, Nathan Intrator

Abstract: We describe a unified framework for the understanding of structure representation in primate vision. A model derived from this framework is shown to be effectively systematic in that it has the ability to interpret and associate together objects that are related through a rearrangement of common

4 0.56025016 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning

Author: Zoubin Ghahramani, Matthew J. Beal

Abstract: Variational approximations are becoming a widespread tool for Bayesian learning of graphical models. We provide some theoretical results for the variational updates in a very general family of conjugate-exponential graphical models. We show how the belief propagation and the junction tree algorithms can be used in the inference step of variational Bayesian learning. Applying these results to the Bayesian analysis of linear-Gaussian state-space models we obtain a learning procedure that exploits the Kalman smoothing propagation, while integrating over all model parameters. We demonstrate how this can be used to infer the hidden state dimensionality of the state-space model in a variety of synthetic problems and one real high-dimensional data set. 1

5 0.52361363 104 nips-2000-Processing of Time Series by Neural Circuits with Biologically Realistic Synaptic Dynamics

Author: Thomas Natschläger, Wolfgang Maass, Eduardo D. Sontag, Anthony M. Zador

Abstract: Experimental data show that biological synapses behave quite differently from the symbolic synapses in common artificial neural network models. Biological synapses are dynamic, i.e., their

6 0.51933461 98 nips-2000-Partially Observable SDE Models for Image Sequence Recognition Tasks

7 0.51548105 147 nips-2000-Who Does What? A Novel Algorithm to Determine Function Localization

8 0.50292248 71 nips-2000-Interactive Parts Model: An Application to Recognition of On-line Cursive Script

9 0.49897954 8 nips-2000-A New Model of Spatial Representation in Multimodal Brain Areas

10 0.49130103 15 nips-2000-Accumulator Networks: Suitors of Local Probability Propagation

11 0.49029851 131 nips-2000-The Early Word Catches the Weights

12 0.48862025 17 nips-2000-Active Learning for Parameter Estimation in Bayesian Networks

13 0.48644629 108 nips-2000-Recognizing Hand-written Digits Using Hierarchical Products of Experts

14 0.48080102 52 nips-2000-Fast Training of Support Vector Classifiers

15 0.47270775 97 nips-2000-Overfitting in Neural Nets: Backpropagation, Conjugate Gradient, and Early Stopping

16 0.46890801 107 nips-2000-Rate-coded Restricted Boltzmann Machines for Face Recognition

17 0.46356028 80 nips-2000-Learning Switching Linear Models of Human Motion

18 0.46220976 69 nips-2000-Incorporating Second-Order Functional Knowledge for Better Option Pricing

19 0.46159133 122 nips-2000-Sparse Representation for Gaussian Process Models

20 0.45697254 85 nips-2000-Mixtures of Gaussian Processes