
6 nips-2005-A Connectionist Model for Constructive Modal Reasoning

Source: pdf

Author: Artur Garcez, Luis C. Lamb, Dov M. Gabbay

Abstract: We present a new connectionist model for constructive, intuitionistic modal reasoning. We use ensembles of neural networks to represent intuitionistic modal theories, and show that for each intuitionistic modal program there exists a corresponding neural network ensemble that computes the program. This provides a massively parallel model for intuitionistic modal reasoning, and sets the scene for integrated reasoning, knowledge representation, and learning of intuitionistic theories in neural networks, since the networks in the ensemble can be trained by examples using standard neural learning algorithms.

Reference: text

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract We present a new connectionist model for constructive, intuitionistic modal reasoning. [sent-11, score-1.083]

2 We use ensembles of neural networks to represent intuitionistic modal theories, and show that for each intuitionistic modal program there exists a corresponding neural network ensemble that computes the program. [sent-12, score-2.342]

3 However, while (machine) learning has focused mainly on quantitative and connectionist approaches [16], the reasoning component of intelligent systems has been developed mainly by formalisms of classical and non-classical logics [7, 9]. [sent-15, score-0.287]

4 More recently, the recognition of the need for systems that integrate reasoning and learning into the same foundation, and the evolution of the fields of cognitive and neural computation, has led to a number of proposals that attempt to integrate reasoning and learning [1, 3, 12, 13, 15]. [sent-16, score-0.213]

5 By integrating logic and neural networks, they may provide (i) a sound logical characterisation of a connectionist system, (ii) a connectionist (parallel) implementation of a logic, or (iii) a hybrid learning system bringing together advantages from connectionism and symbolic reasoning. [sent-19, score-0.388]

6 Intuitionistic logical systems have been advocated by many as providing adequate logical foundations for computation (see [2] for a survey). [sent-20, score-0.13]

7 In this paper, we follow the research path outlined in [4, 5], and develop a computational model for integrated reasoning, representation, and learning of intuitionistic modal knowledge. [sent-22, score-0.982]

8 We concentrate on reasoning and knowledge representation issues, which set the scene for connectionist intuitionistic learning, since effective knowledge representation should precede learning [15]. [sent-23, score-0.998]

9 Still, we base the representation on standard, simple neural network architectures, aiming at future work on experimental learning within the model proposed here. [sent-24, score-0.101]

10 We claim that the intuitionistic interpretation introduced here will make sense for a number of problems in neural computation in the same way that intuitionistic logic is more appropriate than classical logic in a number of computational settings. [sent-26, score-1.579]

11 We will start by illustrating the proposed computational model in an appropriate constructive reasoning, distributed knowledge representation scenario, namely, the wise men puzzle [7]. [sent-27, score-0.441]

12 Then, we will show how ensembles of Connectionist Inductive Learning and Logic Programming (C-ILP) networks [3] can compute intuitionistic modal knowledge. [sent-28, score-1.077]

13 A proof that the algorithm produces a neural network ensemble that computes a semantics of its associated intuitionistic modal theory is then given. [sent-30, score-1.27]

14 Furthermore, the networks in the ensemble are kept simple and in a modular structure, and may be trained from examples with the use of standard learning algorithms such as backpropagation [11]. [sent-31, score-0.169]

15 In Section 2, we present the basic concepts of intuitionistic reasoning used in the paper. [sent-32, score-0.775]

16 In Section 3, we motivate the proposed model using the wise men puzzle. [sent-33, score-0.264]

17 In Section 4, we introduce the Intuitionistic Modal Algorithm, which translates intuitionistic modal theories into neural network ensembles, and prove that the ensemble computes a semantics of the theory. [sent-34, score-1.278]

18 2 Background In this section, we present some basic concepts of artificial neural networks and intuitionistic programs used throughout the paper. [sent-36, score-0.804]

19 We concentrate on ensembles of single hidden layer feedforward networks, and on recurrent networks typically with feedback only from the output to the input layer. [sent-37, score-0.27]

20 Feedback is used with the sole purpose of denoting that the output of a neuron should serve as the input of another neuron when we run the network, i.e., [sent-38, score-0.375]

21 the weight of any feedback connection is fixed at 1. [sent-40, score-0.108]

22 Intuitionistic logic was originally developed by Brouwer, and later by Heyting and Kolmogorov [2]. [sent-43, score-0.098]

23 In intuitionistic logics, a statement that there exists a proof of a proposition x is only made if there is a constructive method of the proof of x. [sent-44, score-0.761]

24 One of the consequences of Brouwer’s ideas is the rejection of the law of the excluded middle, namely α ∨ ¬α, since one cannot always state that there is a proof of α or of its negation, as accepted in classical logic and in (classical) mathematics. [sent-45, score-0.115]

25 The development of these ideas and applications in mathematics has led to developments in constructive mathematics and has influenced several lines of research on logic and computing science [2]. [sent-46, score-0.145]

26 An intuitionistic modal language L includes propositional letters (atoms) p, q, r. [sent-47, score-0.982]

27 Definition 1 (Kripke Models for Intuitionistic Modal Logic) Let L be an intuitionistic language. [sent-54, score-0.68]

28 We now define labelled intuitionistic programs as sets of intuitionistic rules, where each rule is labelled by the world at which it holds, similarly to Gabbay’s Labelled Deductive Systems [8]. [sent-56, score-1.643]

29 , An ⇒ A0 (where “,” abbreviates “∧”, as usual), and a finite set of relations R between worlds ωi (1 ≤ i ≤ m) in C, where Ak (0 ≤ k ≤ n) are atoms and ωi is a label representing a world in which the associated rule holds. [sent-60, score-0.215]

30 To deal with intuitionistic negation, we adopt the approach of [10], as follows. [sent-61, score-0.68]

31 We rename any negative literal ¬A as an atom A′ not present originally in the language. [sent-62, score-0.112]

32 This form of renaming allows our deﬁnition of labelled intuitionistic programs above to consider atoms only. [sent-63, score-0.91]

33 Following Definition 1 (intuitionistic negation), A′ will be true in a world ωi if and only if, for every world ωj such that R(ωi, ωj), A does not hold in ωj. [sent-71, score-0.151]
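The negation clause in sentence 33 can be checked on a small, finite Kripke frame. The following is a minimal sketch that assumes the frame is given as a set of accessibility pairs plus a valuation mapping each world to the atoms true there. The names `accessible` and `renamed_negation_holds`, and the worlds t1 to t3, are hypothetical illustrations, not the paper's code. As is usual for intuitionistic Kripke models, R is assumed to be reflexive and transitive, so the clause also checks the current world.

```python
from typing import Dict, Set, Tuple

Relation = Set[Tuple[str, str]]   # accessibility pairs (w_i, w_j)
Valuation = Dict[str, Set[str]]   # atoms that hold at each world


def accessible(world: str, relation: Relation) -> Set[str]:
    """Return every w_j such that R(world, w_j)."""
    return {wj for (wi, wj) in relation if wi == world}


def renamed_negation_holds(world: str, atom: str,
                           relation: Relation, valuation: Valuation) -> bool:
    """A' (the renamed literal for ¬A) holds at `world` iff A holds at no accessible world."""
    return all(atom not in valuation.get(wj, set())
               for wj in accessible(world, relation))


# Toy frame t1 -> t2 -> t3, reflexive and transitive (assumed, as usual for intuitionistic models).
R = {("t1", "t1"), ("t2", "t2"), ("t3", "t3"),
     ("t1", "t2"), ("t2", "t3"), ("t1", "t3")}
V = {"t1": set(), "t2": set(), "t3": {"p2"}}

print(renamed_negation_holds("t1", "p2", R, V))  # False: p2 holds at t3, reachable from t1
print(renamed_negation_holds("t1", "p1", R, V))  # True: p1 holds nowhere
```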

34 Finally, we extend labelled intuitionistic programs to include modalities. [sent-72, score-0.798]

35 Definition 3 (Labelled Intuitionistic Modal Program) A modal atom is of the form M A, where M ∈ {□, ♦} and A is an atom. [sent-73, score-0.391]

36 , M An ⇒ M A0, where M Ak (0 ≤ k ≤ n) are modal atoms and ωi is a label representing a world in which the associated rule holds, and a finite set of (accessibility) relations R between worlds ωi (1 ≤ i ≤ m) in C. [sent-77, score-0.517]
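The following Python sketch shows one possible in-memory representation of a labelled intuitionistic modal program (sentences 28 to 36). Several choices are our own assumptions: the class names, the string tags `"box"` and `"diamond"` for □ and ♦, and the primed names such as `p2'` for renamed negative literals. Reading the knowledge modality K1 as □ follows sentence 54, but the encoding of the agent-1 rule from sentence 58 is only illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

MODALITIES = (None, "box", "diamond")   # None = plain atom, "box" = □, "diamond" = ♦


@dataclass(frozen=True)
class ModalAtom:
    atom: str                        # negative literals assumed already renamed, e.g. "p2'" for ¬p2
    modality: Optional[str] = None

    def __post_init__(self) -> None:
        if self.modality not in MODALITIES:
            raise ValueError(f"unknown modality: {self.modality!r}")


@dataclass(frozen=True)
class LabelledRule:
    world: str                       # label ω_i of the world where the rule holds
    body: Tuple[ModalAtom, ...]      # M A_1, ..., M A_n, read as a conjunction
    head: ModalAtom                  # M A_0


@dataclass
class LabelledProgram:
    rules: List[LabelledRule] = field(default_factory=list)
    relation: Set[Tuple[str, str]] = field(default_factory=set)  # accessibility pairs

    def rules_at(self, world: str) -> List[LabelledRule]:
        """Rules labelled with `world` (the program P_i for network N_i)."""
        return [r for r in self.rules if r.world == world]


# Hypothetical encoding of t1 : K1¬p2 ∧ K1¬p3 ⇒ K1p1, reading K1 as □.
rule = LabelledRule("t1",
                    (ModalAtom("p2'", "box"), ModalAtom("p3'", "box")),
                    ModalAtom("p1", "box"))
program = LabelledProgram(rules=[rule], relation={("t1", "t2"), ("t2", "t3")})
print(len(program.rules_at("t1")), len(program.rules_at("t2")))
```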

37 3 Motivating Scenario In this section, we consider an archetypal testbed for distributed knowledge representation, namely, the wise men puzzle [7], and model it intuitionistically in a neural network ensemble. [sent-78, score-0.459]

38 Our aim is to illustrate the combination of neural networks and intuitionistic modal reasoning. [sent-79, score-1.059]

39 A certain king wishes to test his three wise men. [sent-81, score-0.198]

40 It is also common knowledge among them that there are three red hats and two white hats, five hats in total. [sent-84, score-0.255]

41 The king places a hat on the head of each wise man in a way that they are not able to see the colour of their own hats, and then asks each one whether they know the colour of the hats on their heads. [sent-85, score-0.496]

42 The puzzle illustrates a situation in which intuitionistic implication and intuitionistic negation occur. [sent-86, score-1.614]

43 For example, at the first round it is known that there are at most two white hats on the wise men’s heads. [sent-88, score-0.295]

44 Then, if the wise men get to a second round, it becomes known that there is at most one white hat on their heads. [sent-89, score-0.36]

45 This means that if A ⇒ B is true at a world t1 then A ⇒ B will be true at a world t2 that is related to t1 (intuitionistic implication). [sent-91, score-0.172]

46 Now, in any situation in which a wise man knows that his hat is red, this knowledge - constructed with the use of sound reasoning processes - cannot be refuted. [sent-92, score-0.511]

47 In other words, in this puzzle, if ¬A is true at world t1 then A cannot be true at a world t2 that is related to t1 (intuitionistic negation). [sent-93, score-0.172]

48 We model the wise men puzzle by constructing the relative knowledge of each wise man along time points. [sent-94, score-0.636]

49 This allows us to explicitly represent the relativistic notion of knowledge, which is a principle of intuitionistic reasoning. [sent-95, score-0.699]

50 For simplicity, we refer to wise man 1 (respectively, 2 and 3) as agent 1 (respectively, 2 and 3). [sent-96, score-0.31]

51 The resulting model is a two-dimensional network ensemble (agents × time), containing three networks in each dimension. [sent-97, score-0.209]

52 In addition to pi - denoting the fact that wise man i wears a red hat - to model each agent’s individual knowledge, we need to use a modality Kj , j ∈ {1, 2, 3}, which represents the relative notion of knowledge at each time point t1 , t2 , t3 . [sent-98, score-0.55]

53 Thus, Kj pi denotes the fact that agent j knows that agent i wears a red hat. [sent-99, score-0.263]

54 The K modality above corresponds to the □ modality in intuitionistic modal reasoning, as customary in the logics of knowledge [7], and as exemplified below. [sent-100, score-1.184]

55 First, we model the fact that each agent knows the colour of the others’ hats. [sent-101, score-0.121]

56 For example, if wise man 3 wears a red hat (neuron p3 is active) then wise man 1 knows that wise man 3 wears a red hat (neuron Kp3 is active for wise man 1). [sent-102, score-1.386]

57 We then need to model the reasoning process of each wise man. [sent-103, score-0.268]

58 For agent 1, we have the rule t1 : K1 ¬p2 ∧ K1 ¬p3 ⇒ K1 p1 , which states that agent 1 can deduce that he is wearing a red hat if he knows that the other agents are both wearing white hats. [sent-105, score-0.361]

59 As before, the implication is intuitionistic, so that it persists at t2 and t3 as depicted in Figure 1 for wise man 1 (represented via hidden neuron h1 in each network). [sent-107, score-0.544]

60 In addition, according to the philosophy of intuitionistic negation, we may only conclude that agent 1 knows ¬p2 , if in every world envisaged by agent 1, p2 is not derived. [sent-108, score-0.886]

61 That is, if neuron Kp2 is not active at t3, then neuron K¬p2 will be active at t2. [sent-111, score-0.398]

62 As a result, the network ensemble will never derive p2 (as one should expect), and thus it will derive K1 ¬p2 and K3 ¬p2. [sent-112, score-0.155]

63 4 Connectionist Intuitionistic Modal Reasoning The wise men puzzle example of Section 3 shows that simple, single-hidden-layer neural networks can be combined in a modular structure where each network represents a possible world in the Kripke structure of Definition 1. [sent-113, score-0.604]

64 The way that the networks should then be interconnected can be defined by following a semantics for ⇒ and ¬, and for □ and ♦, from intuitionistic logic. [sent-114, score-0.793]

65 In this section, we see exactly how we construct a network ensemble [sent-115, score-0.354]
Footnote 1: This is because if there were two white hats on their heads, one of them would have known (and have said), in the first round, that his hat was red, for he would have been seeing the other two with white hats.

66 We introduce a translation algorithm, which takes the program as input and produces the ensemble as output by setting the initial architecture, set of weights, and thresholds of the networks according to a Kripke semantics for the program. [sent-120, score-0.291]

67 We then prove that the translation is correct, and thus that the network ensemble can be used to compute the logical consequences of the program in parallel. [sent-121, score-0.268]

68 Each possible world is represented by a single hidden layer neural network. [sent-124, score-0.204]

69 In each network, input and output neurons represent atoms or modal atoms of the form A, ¬A, □A, or ♦A, while each hidden neuron encodes a rule. [sent-125, score-0.799]

70 For example, in Figure 1, hidden neuron h1 encodes a rule of the form A ∧ B ⇒ C. [sent-126, score-0.266]

71 Thresholds and weights must be such that the hidden layer computes a logical and of the input layer, while the output layer computes a logical or of the hidden layer. [sent-127, score-0.465]
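The AND/OR roles of the two layers can be sketched with binary step units. This is a simplification: C-ILP as described in [3] uses semi-linear units with weights and thresholds computed from the program. Here every weight is a hypothetical `w = 1`, and each threshold sits halfway between the input counts that should and should not make the unit fire. The example uses the two rules from footnote 3, A ∧ B ⇒ D and C ⇒ D.

```python
from typing import Dict, List, Tuple

Rule = Tuple[List[str], str]   # (body atoms, head atom), e.g. (["A", "B"], "D")


def step(x: float) -> int:
    """Binary threshold activation (a simplification of C-ILP's semi-linear units)."""
    return 1 if x > 0 else 0


def forward(rules: List[Rule], inputs: Dict[str, int], w: float = 1.0) -> Dict[str, int]:
    """One pass: one hidden AND unit per rule, one output OR unit per head atom."""
    output_potential: Dict[str, float] = {}
    for body, head in rules:
        # AND: the threshold (len(body) - 0.5) * w is only exceeded when every body input is 1.
        h = step(sum(w * inputs.get(a, 0) for a in body) - (len(body) - 0.5) * w)
        output_potential[head] = output_potential.get(head, 0.0) + w * h
    # OR: the threshold 0.5 * w is exceeded as soon as one hidden unit for the head fires.
    return {head: step(p - 0.5 * w) for head, p in output_potential.items()}


example_rules = [(["A", "B"], "D"), (["C"], "D")]   # footnote 3: A ∧ B ⇒ D and C ⇒ D
print(forward(example_rules, {"A": 1, "B": 0, "C": 1}))  # {'D': 1}
print(forward(example_rules, {"A": 1, "B": 0, "C": 0}))  # {'D': 0}
```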

72 Furthermore, in each network, each output neuron is connected to its corresponding input neuron with a weight fixed at 1. [sent-128, score-0.407]
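With the output-to-input feedback fixed at weight 1, running a network amounts to repeating the AND/OR pass until the activation pattern stops changing. The sketch below assumes binary step units and starts with every input off. For a program without negation, this iteration stops at the least set of atoms closed under the rules. The function names and the `max_steps` guard are hypothetical.

```python
from typing import Dict, List, Tuple

Rule = Tuple[List[str], str]   # (body atoms, head atom); an empty body is a fact


def step(x: float) -> int:
    return 1 if x > 0 else 0


def one_pass(rules: List[Rule], state: Dict[str, int], w: float = 1.0) -> Dict[str, int]:
    """Hidden AND units feed output OR units; atoms with no rule get potential 0 and stay off."""
    potential = {a: 0.0 for a in state}
    for body, head in rules:
        h = step(sum(w * state.get(a, 0) for a in body) - (len(body) - 0.5) * w)
        potential[head] = potential.get(head, 0.0) + w * h
    return {a: step(p - 0.5 * w) for a, p in potential.items()}


def run_to_fixpoint(rules: List[Rule], atoms: List[str], max_steps: int = 100) -> Dict[str, int]:
    """Copy outputs back to inputs (the weight-1 feedback) until the pattern is stable."""
    state = {a: 0 for a in atoms}
    for _ in range(max_steps):
        new_state = one_pass(rules, state)
        if new_state == state:
            return state
        state = new_state
    raise RuntimeError("network did not settle within max_steps")


rules = [([], "C"), (["C"], "A"), (["A"], "B"), (["A", "B"], "D")]
print(run_to_fixpoint(rules, ["A", "B", "C", "D"]))
```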

73 Now, in CML, we allow for an ensemble of C-ILP networks, each network representing knowledge in a (learnable) possible world. [sent-131, score-0.198]

74 These are defined as follows: in the case of □, if neuron □A is activated (true) in network (world) ωi, then A must be activated in every network ωj that is related to ωi (this is analogous to the situation in which we activate K1 p3 and K2 p3 whenever p3 is active). [sent-133, score-0.529]

75 Dually, if A is active in every ωj, then □A must be activated [sent-134, score-0.919]
Footnote 3: For example, if A ∧ B ⇒ D and C ⇒ D, then a hidden neuron h1 is used to connect A and B to D, and a hidden neuron h2 is used to connect C to D, such that if h1 or h2 is activated then D is activated.

76 in ωi (this is done with the use of feedback connections and a hidden neuron that computes a logical and, as detailed in the algorithm below). [sent-135, score-0.36]
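The two □ connections described in sentences 74 to 76 can be sketched over a dictionary of networks. The sketch makes three assumptions: each network is reduced to the set of names of its active neurons, the □A neuron has the hypothetical name `box_A`, and the AND unit uses binary step activation with unit weights rather than C-ILP's semi-linear units. When no world is accessible, the AND unit fires, matching the vacuous truth of □A.

```python
from typing import Dict, List, Set, Tuple

Relation = Set[Tuple[str, str]]   # accessibility pairs (w_i, w_j)
Active = Dict[str, Set[str]]      # names of the active neurons in each network


def step(x: float) -> int:
    return 1 if x > 0 else 0


def successors(relation: Relation, wi: str) -> List[str]:
    return sorted(wj for (src, wj) in relation if src == wi)


def box_elimination(relation: Relation, active: Active, atom: str) -> Active:
    """If 'box_<atom>' is active in w_i, activate <atom> in every w_j with R(w_i, w_j)."""
    result = {w: set(s) for w, s in active.items()}
    for wi, neurons in active.items():
        if "box_" + atom in neurons:
            for wj in successors(relation, wi):
                result.setdefault(wj, set()).add(atom)
    return result


def box_introduction(relation: Relation, active: Active, wi: str, atom: str, w: float = 1.0) -> int:
    """AND unit in w_i: fires iff <atom> is active in every w_j with R(w_i, w_j) (vacuously if none)."""
    succ = successors(relation, wi)
    potential = sum(w * (atom in active.get(wj, set())) for wj in succ)
    return step(potential - (len(succ) - 0.5) * w)


R = {("w1", "w2"), ("w1", "w3")}
after = box_elimination(R, {"w1": {"box_A"}, "w2": set(), "w3": set()}, "A")
print(after)
print(box_introduction(R, after, "w1", "A"))
```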

77 In the case of ♦, if ♦A is activated in network ωi then A must be activated in at least one network ωj that is related to ωi (we do this by choosing an arbitrary ωj to make A active). [sent-136, score-0.334]

78 Dually, if A is activated in any ωj that is related to ωi then ♦A must be activated in ωi (this is done with the use of a hidden neuron that computes a logical or, also as detailed in the algorithm below). [sent-137, score-0.55]
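A matching sketch for ♦ (sentences 77 and 78) uses the same assumptions: a hypothetical `dia_A` neuron name and binary step units. The paper only says that an arbitrary ωj is chosen. This sketch takes the first accessible world in sorted order so that runs are reproducible.

```python
from typing import Dict, List, Set, Tuple

Relation = Set[Tuple[str, str]]   # accessibility pairs (w_i, w_j)
Active = Dict[str, Set[str]]      # names of the active neurons in each network


def step(x: float) -> int:
    return 1 if x > 0 else 0


def successors(relation: Relation, wi: str) -> List[str]:
    return sorted(wj for (src, wj) in relation if src == wi)


def diamond_elimination(relation: Relation, active: Active, atom: str) -> Active:
    """If 'dia_<atom>' is active in w_i, activate <atom> in one chosen w_j with R(w_i, w_j)."""
    result = {w: set(s) for w, s in active.items()}
    for wi, neurons in active.items():
        succ = successors(relation, wi)
        if "dia_" + atom in neurons and succ:
            # The paper picks an arbitrary w_j; taking the first in sorted order keeps runs reproducible.
            result.setdefault(succ[0], set()).add(atom)
    return result


def diamond_introduction(relation: Relation, active: Active, wi: str, atom: str, w: float = 1.0) -> int:
    """OR unit in w_i: fires iff <atom> is active in at least one w_j with R(w_i, w_j)."""
    potential = sum(w * (atom in active.get(wj, set())) for wj in successors(relation, wi))
    return step(potential - 0.5 * w)


R = {("w1", "w3"), ("w1", "w2")}
after = diamond_elimination(R, {"w1": {"dia_A"}, "w2": set(), "w3": set()}, "A")
print(after)
print(diamond_introduction(R, after, "w1", "A"))
```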

79 Now, in the case of ⇒, according to the semantics of intuitionistic implication, ωi : A ⇒ B and R(ωi , ωj ) imply ωj : A ⇒ B. [sent-138, score-0.739]
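Persistence of intuitionistic implication (sentence 79, and sentence 59 for the wise men) can be sketched as copying each labelled rule along R until no new copies appear. This also covers chains such as t1 → t2 → t3 when R is given without its transitive closure. The rule encoding and atom names such as `K1_not_p2` are hypothetical.

```python
from typing import Set, Tuple

LabelledRule = Tuple[str, Tuple[str, ...], str]   # (world, body atoms, head atom)


def persist_implications(rules: Set[LabelledRule],
                         relation: Set[Tuple[str, str]]) -> Set[LabelledRule]:
    """Copy each rule labelled w_i to every w_j with R(w_i, w_j), until no new copy appears."""
    closed = set(rules)
    changed = True
    while changed:
        changed = False
        for world, body, head in list(closed):
            for wi, wj in relation:
                if wi == world and (wj, body, head) not in closed:
                    closed.add((wj, body, head))
                    changed = True
    return closed


# Hypothetical atom names for the agent-1 rule of sentence 58.
rule = ("t1", ("K1_not_p2", "K1_not_p3"), "K1_p1")
for labelled in sorted(persist_implications({rule}, {("t1", "t2"), ("t2", "t3")})):
    print(labelled)
```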

80 Finally, in the case of ¬, we need to make sure that ¬A is activated in ωi if, for every ωj such that R(ωi , ωj ), A is not active in ωj . [sent-140, score-0.136]

81 This is implemented with the use of negative weights (to account for the fact that the non-activation of a neuron needs to activate another neuron), as depicted in Figure 1 (dashed arrows), and detailed in the algorithm below. [sent-141, score-0.195]
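The negative-weight construction of sentences 80 and 81 can be sketched as one step unit per negated atom. Every accessible copy of A feeds the unit with weight −w, and its threshold is set just below zero, so it fires only when all of those inputs are off. The names and the unit weight are illustrative assumptions. The example mirrors sentence 61, where K¬p2 becomes active at t2 when Kp2 is inactive at t3.

```python
from typing import Dict, Set, Tuple

Relation = Set[Tuple[str, str]]   # accessibility pairs (w_i, w_j)


def step(x: float) -> int:
    return 1 if x > 0 else 0


def negation_neuron(relation: Relation, active: Dict[str, Set[str]],
                    wi: str, atom: str, w: float = 1.0) -> int:
    """Fires iff `atom` is inactive in every w_j with R(w_i, w_j).

    Each accessible copy of the atom contributes -w; with a threshold of -0.5 * w,
    a single active input is enough to silence the unit.
    """
    successors = [wj for (src, wj) in relation if src == wi]
    potential = sum(-w * (atom in active.get(wj, set())) for wj in successors)
    return step(potential + 0.5 * w)


R = {("t2", "t3")}
print(negation_neuron(R, {"t3": set()}, "t2", "K1p2"))      # 1: K1p2 is off at t3
print(negation_neuron(R, {"t3": {"K1p2"}}, "t2", "K1p2"))   # 0: K1p2 is on at t3
```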

82 , Pn } be a labelled intuitionistic modal program with rules of the form ωi : M A1 , . [sent-146, score-1.137]

83 , Nn } be a neural network ensemble with each network Ni corresponding to program Pi . [sent-153, score-0.286]

84 Consider that the atoms of Pi are numbered from 1 to ηi such that the input and output layers of Ni are vectors of length ηi , where the j-th neuron represents the j-th atom of Pi . [sent-155, score-0.383]

85 Rename each modal atom M Aj by a new atom, not occurring in P, of the form A□j if M = □, or A♦j if M = ♦. [sent-179, score-0.48]

86 Such neurons encode (meta-level) knowledge about negation, while the other hidden neurons encode (object-level) knowledge about the problem domain. [sent-195, score-0.247]

87 Theorem 1 (Correctness of Intuitionistic Modal Algorithm) For any intuitionistic modal program P there exists an ensemble of neural networks N such that N computes the intuitionistic modal semantics of P. [sent-209, score-2.277]

88 Proof The algorithm to build each individual network in the ensemble is that of C-ILP, which we know is provably correct [3]. [sent-210, score-0.155]

89 We need to consider when modalities and intuitionistic negation are to be encoded together. [sent-212, score-0.843]

90 Consider an output neuron A0 with neurons M (encoding modalities) and neurons n (encoding negation) among its predecessors in a network’s hidden layer. [sent-213, score-0.366]

91 (i) Neither neurons M nor neurons n are activated: since the activation function of neurons M and n is the step function, their activation is zero, and thus this case reduces to C-ILP. [sent-215, score-0.269]

92 (ii) Only neurons M are activated: from the algorithm above, A0 will also be activated (with minimum input potential W M + ς, where ς ∈ R). [sent-216, score-0.154]

93 (iii) Only neurons n are activated: as before, A0 will also be activated (now with minimum input potential W I + ς). [sent-217, score-0.154]

94 Since W M > 0 and W I > 0, and since the activation function of A0 , h(x), is monotonically increasing, A0 will be activated whenever both M and n neurons are activated. [sent-219, score-0.218]

95 5 Concluding Remarks In this paper, we have presented a new model of computation that integrates neural networks and constructive, intuitionistic modal reasoning. [sent-221, score-1.059]

96 We have defined labelled intuitionistic modal programs, have presented an algorithm to translate the intuitionistic theories into ensembles of C-ILP neural networks, and have shown that the ensembles compute a semantics of the corresponding theories. [sent-222, score-1.922]

97 As a result, each ensemble can be seen as a new massively parallel model for the computation of intuitionistic modal logic. [sent-223, score-1.1]

98 , backpropagation, one can adapt the network ensemble by training possible world representations from examples. [sent-226, score-0.22]

99 Extensions of this work also include the study of how to represent other non-classical logics such as branching time temporal logics, and conditional logics of normality, which are relevant for cognitive and neural computation. [sent-229, score-0.205]

100 Applying connectionist modal logics to distributed knowledge representation problems. [sent-270, score-0.555]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('intuitionistic', 0.68), ('modal', 0.302), ('amin', 0.238), ('aj', 0.185), ('wise', 0.173), ('neuron', 0.17), ('negation', 0.138), ('ni', 0.125), ('activated', 0.107), ('nu', 0.102), ('connect', 0.101), ('connectionist', 0.101), ('logic', 0.098), ('reasoning', 0.095), ('ensemble', 0.095), ('logics', 0.091), ('men', 0.091), ('atom', 0.089), ('atoms', 0.089), ('man', 0.087), ('hats', 0.079), ('hat', 0.072), ('labelled', 0.071), ('puzzle', 0.069), ('garcez', 0.068), ('hidden', 0.067), ('logical', 0.065), ('world', 0.065), ('activation', 0.064), ('network', 0.06), ('semantics', 0.059), ('avila', 0.057), ('ak', 0.056), ('pi', 0.056), ('networks', 0.054), ('connection', 0.052), ('agent', 0.05), ('layer', 0.049), ('rl', 0.048), ('program', 0.048), ('programs', 0.047), ('neurons', 0.047), ('constructive', 0.047), ('implication', 0.047), ('axp', 0.045), ('gabbay', 0.045), ('kripke', 0.045), ('lamb', 0.045), ('knowledge', 0.043), ('knows', 0.041), ('ensembles', 0.041), ('iff', 0.039), ('wears', 0.036), ('rules', 0.036), ('output', 0.035), ('cml', 0.034), ('modality', 0.034), ('namin', 0.034), ('computes', 0.034), ('add', 0.033), ('weight', 0.032), ('worlds', 0.032), ('nl', 0.032), ('red', 0.03), ('colour', 0.03), ('kq', 0.03), ('na', 0.029), ('rule', 0.029), ('active', 0.029), ('activate', 0.025), ('king', 0.025), ('modalities', 0.025), ('nz', 0.025), ('agents', 0.025), ('theories', 0.025), ('feedback', 0.024), ('threshold', 0.024), ('white', 0.024), ('neural', 0.023), ('artur', 0.023), ('broda', 0.023), ('brouwer', 0.023), ('clarendom', 0.023), ('dually', 0.023), ('formalisation', 0.023), ('massively', 0.023), ('rename', 0.023), ('renaming', 0.023), ('iii', 0.022), ('true', 0.021), ('kl', 0.02), ('deductive', 0.02), ('modular', 0.02), ('wearing', 0.02), ('round', 0.019), ('notion', 0.019), ('representation', 0.018), ('proof', 0.017), ('kj', 0.017), ('pj', 0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999997 6 nips-2005-A Connectionist Model for Constructive Modal Reasoning

Author: Artur Garcez, Luis C. Lamb, Dov M. Gabbay

2 0.10727055 181 nips-2005-Spiking Inputs to a Winner-take-all Network

Author: Matthias Oster, Shih-Chii Liu

Abstract: Recurrent networks that perform a winner-take-all computation have been studied extensively. Although some of these studies include spiking networks, they consider only analog input rates. We present results of this winner-take-all computation on a network of integrate-and-fire neurons which receives spike trains as inputs. We show how we can configure the connectivity in the network so that the winner is selected after a pre-determined number of input spikes. We discuss spiking inputs with both regular frequencies and Poisson-distributed rates. The robustness of the computation was tested by implementing the winner-take-all network on an analog VLSI array of 64 integrate-and-fire neurons which have an innate variance in their operating parameters.

3 0.092101291 164 nips-2005-Representing Part-Whole Relationships in Recurrent Neural Networks

Author: Viren Jain, Valentin Zhigulin, H. S. Seung

Abstract: There is little consensus about the computational function of top-down synaptic connections in the visual system. Here we explore the hypothesis that top-down connections, like bottom-up connections, reflect part-whole relationships. We analyze a recurrent network with bidirectional synaptic interactions between a layer of neurons representing parts and a layer of neurons representing wholes. Within each layer, there is lateral inhibition. When the network detects a whole, it can rigorously enforce part-whole relationships by ignoring parts that do not belong. The network can complete the whole by filling in missing parts. The network can refuse to recognize a whole, if the activated parts do not conform to a stored part-whole relationship. Parameter regimes in which these behaviors happen are identified using the theory of permitted and forbidden sets [3, 4]. The network behaviors are illustrated by recreating Rumelhart and McClelland’s “interactive activation” model [7]. In neural network models of visual object recognition [2, 6, 8], patterns of synaptic connectivity often reflect part-whole relationships between the features that are represented by neurons. For example, the connections of Figure 1 reflect the fact that feature B both contains simpler features A1, A2, and A3, and is contained in more complex features C1, C2, and C3. Such connectivity allows neurons to follow the rule that existence of the part is evidence for existence of the whole. By combining synaptic input from multiple sources of evidence for a feature, a neuron can “decide” whether that feature is present. The synapses shown in Figure 1 are purely bottom-up, directed from simple to complex features. However, there are also top-down connections in the visual system, and there is little consensus about their function. One possibility is that top-down connections also reflect part-whole relationships.
They allow feature detectors to make decisions using the rule that existence of the whole is evidence for existence of its parts. In this paper, we analyze the dynamics of a recurrent network in which part-whole relationships are stored as bidirectional synaptic interactions, rather than the unidirectional interactions of Figure 1. The network has a number of interesting computational capabilities. When the network detects a whole, it can rigorously enforce part-whole relationships by ignoring parts that do not belong. The network can complete the whole by filling in missing parts. The network can refuse to recognize a whole, if the activated parts do not conform to a stored part-whole relationship. Parameter regimes in which these behaviors happen are identified using the recently developed theory of permitted and forbidden sets [3, 4]. Our model is closely related to the interactive activation model of word recognition, which was proposed by McClelland and Rumelhart to explain the word superiority effect studied by visual psychologists [7].
Footnote 1: Synaptic connectivity may reflect other relationships besides part-whole. For example, invariances can be implemented by connecting detectors of several instances of the same feature to the same target, which is consequently an invariant detector of the feature.
Figure 1 (neuron B with simpler features A1, A2, A3 and more complex features C1, C2, C3): The synaptic connections (arrows) of neuron B represent part-whole relationships. Feature B both contains simpler features and is contained in more complex features. The synaptic interactions are drawn one-way, as in most models of visual object recognition. Existence of the part is regarded as evidence for existence of the whole. This paper makes the interactions bidirectional, allowing the existence of the whole to be evidence for the existence of its parts.
Here our concern is not to model a psychological effect, but to characterize mathematically how computations involving part-whole relationships can be carried out by a recurrent network. 1 Network model Suppose that we are given a set of part-whole relationships specified by $\xi_i^a = 1$ if part i is contained in whole a, and $\xi_i^a = 0$ otherwise. We assume that every whole contains at least one part, and every part is contained in at least one whole. The stimulus drives a layer of neurons that detect parts. These neurons also interact with a layer of neurons that detect wholes. We will refer to part-detectors as “P-neurons” and whole-detectors as “W-neurons.” The part-whole relationships are directly stored in the synaptic connections between P and W neurons. If $\xi_i^a = 1$, the ith neuron in the P layer and the ath neuron in the W layer have an excitatory interaction of strength γ. If $\xi_i^a = 0$, the neurons have an inhibitory interaction of strength σ. Furthermore, the P-neurons inhibit each other with strength β, and the W-neurons inhibit each other with strength α. All of these interactions are symmetric, and all activation functions are the rectification nonlinearity $[z]_+ = \max\{z, 0\}$. Then the dynamics of the network takes the form

$$\dot W_a + W_a = \left[\gamma \sum_i P_i \xi_i^a - \sigma \sum_i (1 - \xi_i^a) P_i - \alpha \sum_{b \neq a} W_b\right]_+ \qquad (1)$$

$$\dot P_i + P_i = \left[\gamma \sum_a W_a \xi_i^a - \sigma \sum_a (1 - \xi_i^a) W_a - \beta \sum_{j \neq i} P_j + B_i\right]_+ \qquad (2)$$

where $B_i$ is the input to the P layer from the stimulus. Figure 2 shows an example of a network with two wholes. Each whole contains two parts. One of the parts is contained in both wholes.
Figure 2 (W layer: Wa, Wb; P layer: P1, P2, P3 with inputs B1, B2, B3; excitation γ, inhibition −σ, −α, −β): Model in example configuration: ξ = {(1, 1, 0), (0, 1, 1)}.
When a stimulus is presented, it activates some of the P-neurons, which activate some of the W-neurons. The network eventually converges to a stable steady state. We will assume that α > 1.
In the Appendix, we prove that this leads to unconditional winner-take-all behavior in the W layer. In other words, no more than one W-neuron can be active at a stable steady state. If a single W-neuron is active, then a whole has been detected. Potentially there are also many P-neurons active, indicating detection of parts. This representation may have different properties, depending on the choice of parameters β, γ, and σ. As discussed below, these include rigorous enforcement of part-whole relationships, completion of wholes by “filling in” missing parts, and non-recognition of parts that do not conform to a whole. 2 Enforcement of part-whole relationships Suppose that a single W-neuron is active at a stable steady state, so that a whole has been detected. Part-whole relationships are said to be enforced if the network always ignores parts that are not contained in the detected whole, despite potentially strong bottom-up evidence for them. It can be shown that enforcement follows from the inequality

$$\sigma^2 + \beta^2 + \gamma^2 + 2\sigma\beta\gamma > 1 \qquad (3)$$

which guarantees that neuron i in the P layer is inactive if neuron a in the W layer is active and $\xi_i^a = 0$. When part-whole relations are enforced, prior knowledge about legal combinations of parts strictly constrains what may be perceived. This result is proven in the Appendix, and only an intuitive explanation is given here. Enforcement is easiest to understand when there is interlayer inhibition (σ > 0). In this case, the active W-neuron directly inhibits the forbidden P-neurons. The case of σ = 0 is more subtle. Then enforcement is mediated by lateral inhibition in the P layer. Excitatory feedback from the W-neuron has the effect of counteracting the lateral inhibition between the P-neurons that belong to the whole. As a result, these P-neurons become activated strongly enough to inhibit the rest of the P layer.
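The rectified dynamics of Eqs. (1) and (2) for the W- and P-neurons can be integrated numerically with forward Euler. This is a rough sketch, not the authors' simulation. The step size, the number of steps, and the parameter values (α = 2, β = γ = σ = 0.5) are our own choices, picked so that α > 1 and the steady state is easy to check by hand. The run uses the example configuration of Figure 2, ξ = {(1, 1, 0), (0, 1, 1)}, with parts 1 and 2 stimulated.

```python
from typing import List, Sequence, Tuple


def relu(z: float) -> float:
    """Rectification nonlinearity [z]_+ = max{z, 0}."""
    return max(z, 0.0)


def simulate(xi: Sequence[Sequence[int]], B: Sequence[float],
             alpha: float, beta: float, gamma: float, sigma: float,
             dt: float = 0.05, steps: int = 2000) -> Tuple[List[float], List[float]]:
    """Forward-Euler integration of Eqs. (1)-(2); xi[a][i] = 1 iff part i belongs to whole a."""
    n_w, n_p = len(xi), len(B)
    W = [0.0] * n_w
    P = [0.0] * n_p
    for _ in range(steps):
        # Both updates use the previous state (synchronous Euler step).
        dW = [-W[a] + relu(gamma * sum(xi[a][i] * P[i] for i in range(n_p))
                           - sigma * sum((1 - xi[a][i]) * P[i] for i in range(n_p))
                           - alpha * sum(W[b] for b in range(n_w) if b != a))
              for a in range(n_w)]
        dP = [-P[i] + relu(gamma * sum(xi[a][i] * W[a] for a in range(n_w))
                           - sigma * sum((1 - xi[a][i]) * W[a] for a in range(n_w))
                           - beta * sum(P[j] for j in range(n_p) if j != i)
                           + B[i])
              for i in range(n_p)]
        W = [w + dt * d for w, d in zip(W, dW)]
        P = [p + dt * d for p, d in zip(P, dP)]
    return W, P


# Figure 2 configuration: whole a = {part 1, part 2}, whole b = {part 2, part 3}.
W, P = simulate([(1, 1, 0), (0, 1, 1)], [1.0, 1.0, 0.0],
                alpha=2.0, beta=0.5, gamma=0.5, sigma=0.5)
print([round(x, 3) for x in W], [round(x, 3) for x in P])
```

For these illustrative values the run settles near W = (1, 0) and P = (1, 1, 0): whole a wins the W-layer competition and part 3 stays silent.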
3 Completion of wholes by filling in missing parts If a W-neuron is active, it excites the P-neurons that belong to the whole. As a result, even if one of these P-neurons receives no bottom-up input ($B_i = 0$), it is still active. We call this phenomenon “completion,” and it is guaranteed to happen when

$$\gamma > \sqrt{\beta} \qquad (4)$$

The network may thus “imagine” parts that are consistent with the recognized whole, but are not actually present in the stimulus. As with enforcement, this condition depends on top-down connections. In the special case $\gamma = \sqrt{\beta}$, the interlayer excitation between a W-neuron and its P-neurons exactly cancels out the lateral inhibition between the P-neurons at a steady state. So the recurrent connections effectively vanish, letting the activity of the P-neurons be determined by their feedforward inputs. When the interlayer excitation is stronger than this, the inequality (4) holds, and completion occurs. 4 Non-recognition of a whole If there is no interlayer inhibition (σ = 0), then a single W-neuron is always active, assuming that there is some activity in the P layer. To see this, suppose for the sake of contradiction that all the W-neurons are inactive. Then they receive no inhibition to counteract the excitation from the P layer. This means some of them must be active, which contradicts our assumption. This means that the network always recognizes a whole, even if the stimulus is very different from any part-whole combination that is stored in the network. However, if interlayer inhibition is sufficiently strong (large σ), the network may refuse to recognize a whole. Neurons in the P layer are activated, but there is no activity in the W layer. Formal conditions on σ can be derived, but are not given here because of space limitations. In case of non-recognition, constraints on the P-layer are not enforced. It is possible for the network to detect a configuration of parts that is not consistent with any stored whole.
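Conditions (3) and (4) are simple enough to check directly for a given parameter setting. In this sketch, inequality (4) is read as γ > √β, which is the reading implied by the special case γ = √β in the completion paragraph (the square root is displaced in the extracted text). The function names and sample values are hypothetical.

```python
import math


def enforces_part_whole(sigma: float, beta: float, gamma: float) -> bool:
    """Inequality (3): sigma^2 + beta^2 + gamma^2 + 2*sigma*beta*gamma > 1."""
    return sigma ** 2 + beta ** 2 + gamma ** 2 + 2 * sigma * beta * gamma > 1


def completes_wholes(beta: float, gamma: float) -> bool:
    """Inequality (4), read as gamma > sqrt(beta)."""
    return gamma > math.sqrt(beta)


# Hypothetical parameter settings (sigma, beta, gamma).
for sigma, beta, gamma in [(0.5, 0.5, 0.5), (0.0, 0.5, 1.0)]:
    print((sigma, beta, gamma), enforces_part_whole(sigma, beta, gamma), completes_wholes(beta, gamma))
```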
5 Example: Interactive Activation model To illustrate the computational capabilities of our network, we use it to recreate the interactive activation (IA) model of McClelland and Rumelhart. Figure 3 shows numerical simulations of a network containing three layers of neurons representing strokes, letters, and words, respectively. There are 16 possible strokes in each of four letter positions. For each stroke, there are two neurons, one signaling the presence of the stroke and the other signaling its absence. Letter neurons represent each letter of the alphabet in each of four positions. Word neurons represent each of 1200 common four-letter words. The letter and word layers correspond to the P and W layers that were introduced previously. There are bidirectional interactions between the letter and word layers, and lateral inhibition within the layers. The letter neurons also receive input from the stroke neurons, but this interaction is unidirectional. Our network differs in two ways from the original IA model. First, all interactions involving letter and word neurons are symmetric. In the original model, the interactions between the letter and word layers were asymmetric. In particular, inhibitory connections only ran from letter neurons to word neurons, and not vice versa. Second, the only nonlinearity in our model is rectification. These two aspects allow us to apply the full machinery of the theory of permitted and forbidden sets. Figure 3 shows the result of presenting the stimulus “MO M” for four different settings of parameters. In each of the four cases, the word layer of the network converges to the same result, detecting the word “MOON”, which is the closest stored word to the stimulus. However, the activity in the letter layer is different in the four cases.
[Figure 3 panels: completion, noncompletion, enforcement, non-enforcement; each panel shows the input, the P layer reconstruction, and the W layer.]

Figure 3: Simulation of four different parameter regimes in a letter-word recognition network. Within each panel, the middle column presents a feature-layer reconstruction based on the letter activity shown in the left column. W layer activity is shown in the right column. The top row shows the network state after 10 iterations of the dynamics; the bottom row shows the steady state. In the left-hand panels, the parameters obey inequality (3), so that part-whole relationships are enforced. The activity of the letter layer is visualized by activating the strokes corresponding to each active letter neuron. The activated letters are part of the word “MOON”. In the top left, inequality (4) is satisfied, so the missing “O” in the stimulus is filled in. In the bottom left, completion does not occur. In the right-hand panels, parameters are such that part-whole relationships are not enforced. Consequently, the word layer is much more active. Bottom-up input provides evidence for several other letters, which is not suppressed. In the top right, inequality (4) is satisfied, so the missing “O” in the stimulus is filled in. In the bottom right, the “O” neuron is not activated in the third position, so there is no completion. However, some letter neurons for the third position are activated, due to input from neurons that signal the absence of strokes.

[Figure 4 panels: input; non-recognition event; multi-stability.]

Figure 4: Simulation of a non-recognition event and an example of multistability.

Figure 4 shows simulations for large σ, deep in the enforcement regime where non-recognition is possible. From one initial condition, the network converges to a state in which no W-neurons are active, a non-recognition. From another initial condition, the network detects the word “NORM”.
Deep in the enforcement regime, the top-down feedback can be so strong that the network has multiple stable states, many of which bear little resemblance to the stimulus. This is a problematic aspect of this network. It can be prevented by setting parameters at the edge of the enforcement regime.

6 Discussion

We have analyzed a recurrent network that performs computations involving part-whole relationships. The network can fill in missing parts and suppress parts that do not belong. These two computations are distinct and can be dissociated from each other, as shown in Figure 3. While these two computations can also be performed by associative memory models, they are not typically dissociable in these models. For example, in the Hopfield model, pattern completion and noise suppression both result from recall of one of a finite number of stereotyped activity patterns. We believe that our model is more appropriate for perceptual systems, because its behavior is piecewise linear due to its reliance on the rectification nonlinearity. Therefore, analog aspects of computation can coexist with the part-whole relationships. Furthermore, in our model the stimulus is encoded in maintained synaptic input to the network, rather than as an initial condition of the dynamics.

A Appendix: Permitted and forbidden sets

Our mathematical results depend on the theory of permitted and forbidden sets [3, 4], which is summarized briefly here. The theory is applicable to neural networks with rectification nonlinearity, of the form $\dot{x}_i + x_i = \left[b_i + \sum_j W_{ij} x_j\right]^+$. Neuron i is said to be active when $x_i > 0$. For a network of N neurons, there are $2^N$ possible sets of active neurons. For each active set, consider the submatrix of $W_{ij}$ corresponding to the synapses between active neurons. If all eigenvalues of this submatrix have real parts less than or equal to unity, then the active set is said to be permitted. Otherwise the active set is said to be forbidden.
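The eigenvalue criterion above is easy to apply numerically. The sketch below (NumPy; `is_permitted`, `w_pair`, `whole_matrix` and `mixed_triple` are hypothetical helper names) tests whether an active set is permitted, following the criterion as stated here (all real parts ≤ 1), and applies it to the matrices that appear later in this Appendix: the W-neuron pair of A.1, the part-whole matrix M of Eq. (5) in Theorem 1, and the three-neuron matrix of Eq. (6) in Theorem 2. The parameter values are chosen only to fall on either side of each theorem's condition.

```python
import numpy as np

def is_permitted(W, active):
    """True if every eigenvalue of W restricted to the active neurons has real part <= 1."""
    idx = np.asarray(list(active))
    sub = W[np.ix_(idx, idx)]
    return bool(np.linalg.eigvals(sub).real.max() <= 1.0)

def w_pair(alpha):
    """A.1: two W-neurons inhibiting each other with strength alpha (eigenvalues +/- alpha)."""
    return np.array([[0.0, -alpha], [-alpha, 0.0]])

def whole_matrix(k, beta, gamma):
    """Eq. (5): k P-neurons of one whole plus its W-neuron (last index)."""
    M = np.zeros((k + 1, k + 1))
    M[:k, :k] = -beta * (np.ones((k, k)) - np.eye(k))
    M[:k, k] = M[k, :k] = gamma
    return M

def mixed_triple(sigma, beta, gamma):
    """Eq. (6): neurons ordered (P_i, P_j, W_a), part i belongs to whole a, part j does not."""
    return np.array([[0.0, -beta, gamma],
                     [-beta, 0.0, -sigma],
                     [gamma, -sigma, 0.0]])

print(is_permitted(w_pair(1.5), [0, 1]))                    # False: alpha > 1
print(is_permitted(w_pair(0.5), [0, 1]))                    # True:  alpha < 1
print(is_permitted(whole_matrix(3, 0.2, 0.5), range(4)))    # True:  gamma^2 < beta + (1 - beta)/k
print(is_permitted(whole_matrix(3, 0.2, 0.8), range(4)))    # False: gamma^2 > beta + (1 - beta)/k
print(is_permitted(mixed_triple(0.6, 0.5, 0.5), range(3)))  # False: s^2 + b^2 + g^2 + 2sbg > 1
print(is_permitted(mixed_triple(0.2, 0.3, 0.3), range(3)))  # True:  s^2 + b^2 + g^2 + 2sbg < 1
```

Using `np.linalg.eigvals` rather than a symmetric solver keeps the helper usable for asymmetric W as well, although the nesting property only holds in the symmetric case.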
A set is permitted if and only if there exists an input vector b such that those neurons are active at a stable steady state. Permitted sets can be regarded as memories stored in the synaptic connections $W_{ij}$. If $W_{ij}$ is a symmetric matrix, the nesting property holds: every subset of a permitted set is permitted, and every superset of a forbidden set is forbidden.

The present model can be seen as a general method for storing permitted sets in a recurrent network. This method introduces a neuron for each permitted set, relying on a unary or “grandmother cell” representation. In contrast, Xie et al. [9] used lateral inhibition in a single layer of neurons to store permitted sets. By introducing extra neurons, the present model achieves superior storage capacity, much as unary models of associative memory [1] surpass distributed models [5].

A.1 Unconditional winner-take-all in the W layer

The synapses between two W-neurons have strengths

$$\begin{pmatrix} 0 & -\alpha \\ -\alpha & 0 \end{pmatrix}$$

The eigenvalues of this matrix are $\pm\alpha$. Therefore two W-neurons constitute a forbidden set if $\alpha > 1$. By the nesting property, it follows that any set of more than two W-neurons is also forbidden, so the W layer has the unconditional winner-take-all property.

A.2 Part-whole combinations as permitted sets

Theorem 1. Suppose that $\beta < 1$. If $\gamma^2 < \beta + (1 - \beta)/k$, then any combination of $k \ge 1$ parts consistent with a whole corresponds to a permitted set.

Proof. Consider k parts belonging to a whole. They are represented by one W-neuron and k P-neurons, with synaptic connections given by the $(k + 1) \times (k + 1)$ matrix

$$M = \begin{pmatrix} -\beta(\mathbf{1}\mathbf{1}^T - I) & \gamma\mathbf{1} \\ \gamma\mathbf{1}^T & 0 \end{pmatrix} \qquad (5)$$

where $\mathbf{1}$ is the k-dimensional vector whose elements are all equal to one. Two eigenvectors of M are of the form $(a\mathbf{1}^T, c)^T$, and have the same eigenvalues as the $2 \times 2$ matrix

$$\begin{pmatrix} -\beta(k-1) & \gamma \\ \gamma k & 0 \end{pmatrix}$$

This matrix has eigenvalues less than one when $\gamma^2 < \beta + (1 - \beta)/k$ and $\beta(k - 1) + 2 > 0$. The other $k - 1$ eigenvectors are of the form $(\mathbf{d}^T, 0)^T$, where $\mathbf{d}^T\mathbf{1} = 0$. These have eigenvalue $\beta$.
Therefore all eigenvalues of M are less than one if the condition of the theorem is satisfied (recall that $\beta < 1$).

A.3 Constraints on combining parts

Here, we derive conditions under which the network can enforce the constraint that steady-state activity be confined to parts that constitute a whole.

Theorem 2. Suppose that $\beta > 0$ and $\sigma^2 + \beta^2 + \gamma^2 + 2\sigma\beta\gamma > 1$. If a W-neuron is active, then only P-neurons corresponding to parts contained in the relevant whole can be active at a stable steady state.

Proof. Consider P-neurons $P_i$ and $P_j$ and W-neuron $W_a$. Suppose that $\xi_i^a = 1$ but $\xi_j^a = 0$. As shown in Figure 5, the matrix of connections (rows and columns ordered $P_i, P_j, W_a$) is

$$W = \begin{pmatrix} 0 & -\beta & \gamma \\ -\beta & 0 & -\sigma \\ \gamma & -\sigma & 0 \end{pmatrix} \qquad (6)$$

[Figure 5: $W_a$ is connected to $P_i$ with weight γ and to $P_j$ with weight −σ; $P_i$ and $P_j$ are connected with weight −β.]

Figure 5: A set of one W-neuron and two P-neurons is forbidden if one part belongs to the whole and the other does not.

This set is permitted if all eigenvalues of $W - I$ have negative real parts. The characteristic equation of $W - I$ is $\lambda^3 + b_1\lambda^2 + b_2\lambda + b_3 = 0$, where $b_1 = 3$, $b_2 = 3 - \sigma^2 - \beta^2 - \gamma^2$, and $b_3 = 1 - 2\sigma\beta\gamma - \sigma^2 - \beta^2 - \gamma^2$. According to the Routh-Hurwitz theorem, all the eigenvalues have negative real parts if and only if $b_1 > 0$, $b_3 > 0$, and $b_1 b_2 > b_3$. Clearly, the first condition is always satisfied. The second condition is more restrictive than the third; it is satisfied only when $\sigma^2 + \beta^2 + \gamma^2 + 2\sigma\beta\gamma < 1$. Hence, one of the eigenvalues has a positive real part when this condition is broken, i.e., when $\sigma^2 + \beta^2 + \gamma^2 + 2\sigma\beta\gamma > 1$. By the nesting property, any larger set of P-neurons inconsistent with the W-neuron is also forbidden.

A.4 Completion of wholes

Theorem 3. If $\gamma > \sqrt{\beta}$ and a single W-neuron a is active at a steady state, then $P_i > 0$ for all i such that $\xi_i^a = 1$.

Proof. Suppose that the detected whole has k parts. At the steady state

$$P_i = \left[\frac{\xi_i^a B_i - (\beta - \gamma^2) P_{\mathrm{tot}}}{1 - \beta}\right]^+, \qquad P_{\mathrm{tot}} = \sum_i P_i = \frac{1}{1 - \beta + (\beta - \gamma^2) k} \sum_{i=1}^{k} B_i \xi_i^a \qquad (7)$$

Since $\gamma^2 > \beta$ and $\beta < 1$, the term $-(\beta - \gamma^2) P_{\mathrm{tot}}/(1 - \beta)$ is positive whenever $P_{\mathrm{tot}} > 0$, so $P_i > 0$ even for a part with $B_i = 0$.

A.5 Preventing runaway

If feedback loops cause the network activity to diverge, then the preceding analyses are not relevant.
Here we give a sufficient condition guaranteeing that runaway instability does not happen. It is not a necessary condition. Interestingly, the condition implies the condition of Theorem 1.

Theorem 4. Suppose that P and W obey the dynamics of Eqs. (1) and (2), and define the objective function

$$E = \frac{1-\alpha}{2}\sum_a W_a^2 + \frac{\alpha}{2}\Big(\sum_a W_a\Big)^2 + \frac{1-\beta}{2}\sum_i P_i^2 + \frac{\beta}{2}\Big(\sum_i P_i\Big)^2 - \sum_i B_i P_i - \gamma\sum_{ia} P_i W_a \xi_i^a + \sigma\sum_{ia} (1 - \xi_i^a) P_i W_a \qquad (8)$$

Then E is a Lyapunov-like function that, given $\beta > \gamma^2 - \frac{1 - \gamma^2}{N - 1}$, ensures convergence of the dynamics to a stable steady state.

Proof (sketch). Differentiation of E with respect to time shows that E is nonincreasing in the nonnegative orthant and constant only at steady states of the network dynamics. We must also show that E is radially unbounded, which is true if the quadratic part of E is copositive definite. Note that the last term of E is lower-bounded by zero and the previous term is lower-bounded by $-\gamma\sum_{ia} P_i W_a$. We assume $\alpha > 1$. Thus, we can use Cauchy's inequality, $\sum_i P_i^2 \ge \big(\sum_i P_i\big)^2/N$, and the fact that $\sum_a W_a^2 \le \big(\sum_a W_a\big)^2$ for $W_a \ge 0$, to derive

$$E \ge \frac{1}{2}\left[\Big(\sum_a W_a\Big)^2 + \frac{1 - \beta + \beta N}{N}\Big(\sum_i P_i\Big)^2 - 2\gamma\Big(\sum_a W_a\Big)\Big(\sum_i P_i\Big)\right] - \sum_i B_i P_i \qquad (9)$$

If $\beta > \gamma^2 - \frac{1 - \gamma^2}{N - 1}$, the quadratic form in the inequality is positive definite and E is radially unbounded.

References

[1] E. B. Baum, J. Moody, and F. Wilczek. Internal representations for associative memory. Biol. Cybern., 59:217–228, 1988.
[2] K. Fukushima. Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol. Cybern., 36(4):193–202, 1980.
[3] R. H. Hahnloser, R. Sarpeshkar, M. A. Mahowald, R. J. Douglas, and H. S. Seung. Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature, 405(6789):947–951, June 2000.
[4] R. H. Hahnloser, H. S. Seung, and J.-J. Slotine. Permitted and forbidden sets in symmetric threshold-linear networks. Neural Computation, 15:621–638, 2003.
[5] J. J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. USA, 79(8):2554–2558, April 1982.
[6] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1:541–551, 1989.
[7] J. L. McClelland and D. E. Rumelhart. An interactive activation model of context effects in letter perception: Part I. An account of basic findings. Psychological Review, 88(5):375–407, September 1981.
[8] M. Riesenhuber and T. Poggio. Hierarchical models of object recognition in cortex. Nat. Neurosci., 2(11):1019–1025, November 1999.
[9] X. Xie, R. H. Hahnloser, and H. S. Seung. Selectively grouping neurons in recurrent networks of lateral inhibition. Neural Computation, 14:2627–2646, 2002.

4 0.06447807 64 nips-2005-Efficient estimation of hidden state dynamics from spike trains

Author: Marton G. Danoczy, Richard H. R. Hahnloser

Abstract: Neurons can have rapidly changing spike train statistics dictated by the underlying network excitability or behavioural state of an animal. To estimate the time course of such state dynamics from single- or multiple-neuron recordings, we have developed an algorithm that maximizes the likelihood of observed spike trains by optimizing the state lifetimes and the state-conditional interspike-interval (ISI) distributions. Our nonparametric algorithm is free of time-binning and spike-counting problems and has the computational complexity of a Mixed-state Markov Model operating on a state sequence of length equal to the total number of recorded spikes. As an example, we fit a two-state model to paired recordings of premotor neurons in the sleeping songbird. We find that the two state-conditional ISI functions are highly similar to the ones measured during waking and singing, respectively.

5 0.063419685 99 nips-2005-Integrate-and-Fire models with adaptation are good enough

Author: Renaud Jolivet, Alexander Rauch, Hans-rudolf Lüscher, Wulfram Gerstner

Abstract: Integrate-and-Fire-type models are usually criticized because of their simplicity. On the other hand, the Integrate-and-Fire model is the basis of most of the theoretical studies on spiking neuron models. Here, we develop a sequential procedure to quantitatively evaluate an equivalent Integrate-and-Fire-type model based on intracellular recordings of cortical pyramidal neurons. We find that the resulting effective model is sufficient to predict the spike train of the real pyramidal neuron with high accuracy. In in vivo-like regimes, predicted and recorded traces are almost indistinguishable and a significant part of the spikes can be predicted at the correct timing. Slow processes like spike-frequency adaptation are shown to be a key feature in this context since they are necessary for the model to connect between different driving regimes.

6 0.060078241 114 nips-2005-Learning Rankings via Convex Hull Separation

7 0.056372456 8 nips-2005-A Criterion for the Convergence of Learning with Spike Timing Dependent Plasticity

8 0.054340221 50 nips-2005-Convex Neural Networks

9 0.049764179 122 nips-2005-Logic and MRF Circuitry for Labeling Occluding and Thinline Visual Contours

10 0.04880077 36 nips-2005-Bayesian models of human action understanding

11 0.044192608 20 nips-2005-Affine Structure From Sound

12 0.042191774 97 nips-2005-Inferring Motor Programs from Images of Handwritten Digits

13 0.040166788 157 nips-2005-Principles of real-time computing with feedback applied to cortical microcircuit models

14 0.039658025 187 nips-2005-Temporal Abstraction in Temporal-difference Networks

15 0.037827108 67 nips-2005-Extracting Dynamical Structure Embedded in Neural Activity

16 0.037495974 117 nips-2005-Learning from Data of Variable Quality

17 0.035526946 118 nips-2005-Learning in Silicon: Timing is Everything

18 0.034687489 95 nips-2005-Improved risk tail bounds for on-line algorithms

19 0.034465615 124 nips-2005-Measuring Shared Information and Coordinated Activity in Neuronal Networks

20 0.034154229 104 nips-2005-Laplacian Score for Feature Selection

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.1), (1, -0.097), (2, 0.003), (3, -0.05), (4, 0.01), (5, 0.015), (6, -0.009), (7, 0.028), (8, -0.015), (9, 0.003), (10, 0.001), (11, 0.019), (12, -0.009), (13, -0.012), (14, -0.029), (15, 0.005), (16, 0.052), (17, 0.057), (18, -0.014), (19, 0.046), (20, 0.037), (21, 0.037), (22, 0.043), (23, 0.002), (24, -0.022), (25, -0.064), (26, 0.036), (27, 0.13), (28, 0.113), (29, -0.079), (30, 0.184), (31, 0.098), (32, -0.166), (33, 0.206), (34, -0.049), (35, 0.094), (36, 0.043), (37, -0.068), (38, -0.104), (39, -0.014), (40, 0.084), (41, 0.01), (42, 0.044), (43, 0.049), (44, -0.048), (45, -0.03), (46, -0.068), (47, 0.113), (48, 0.002), (49, -0.0)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95456678 6 nips-2005-A Connectionist Model for Constructive Modal Reasoning

Author: Artur Garcez, Luis C. Lamb, Dov M. Gabbay

2 0.60808456 164 nips-2005-Representing Part-Whole Relationships in Recurrent Neural Networks

Author: Viren Jain, Valentin Zhigulin, H. S. Seung

3 0.49331528 181 nips-2005-Spiking Inputs to a Winner-take-all Network

Author: Matthias Oster, Shih-Chii Liu

4 0.43355948 64 nips-2005-Efficient estimation of hidden state dynamics from spike trains

Author: Marton G. Danoczy, Richard H. R. Hahnloser

5 0.39851815 124 nips-2005-Measuring Shared Information and Coordinated Activity in Neuronal Networks

Author: Kristina Klinkner, Cosma Shalizi, Marcelo Camperi

Abstract: Most nervous systems encode information about stimuli in the responding activity of large neuronal networks. This activity often manifests itself as dynamically coordinated sequences of action potentials. Since multiple electrode recordings are now a standard tool in neuroscience research, it is important to have a measure of such network-wide behavioral coordination and information sharing, applicable to multiple neural spike train data. We propose a new statistic, informational coherence, which measures how much better one unit can be predicted by knowing the dynamical state of another. We argue informational coherence is a measure of association and shared information which is superior to traditional pairwise measures of synchronization and correlation. To find the dynamical states, we use a recently introduced algorithm which reconstructs effective state spaces from stochastic time series. We then extend the pairwise measure to a multivariate analysis of the network by estimating the network multi-information. We illustrate our method by testing it on a detailed model of the transition from gamma to beta rhythms.

Much of the most important information in neural systems is shared over multiple neurons or cortical areas, in such forms as population codes and distributed representations [1]. On behavioral time scales, neural information is stored in temporal patterns of activity as opposed to static markers; therefore, as information is shared between neurons or brain regions, it is physically instantiated as coordination between entire sequences of neural spikes. Furthermore, neural systems and regions of the brain often require coordinated neural activity to perform important functions; acting in concert requires multiple neurons or cortical areas to share information [2].
Thus, if we want to measure the dynamic network-wide behavior of neurons and test hypotheses about them, we need reliable, practical methods to detect and quantify behavioral coordination and the associated information sharing across multiple neural units. These would be especially useful in testing ideas about how particular forms of coordination relate to distributed coding (e.g., that of [3]). Current techniques to analyze relations among spike trains handle only pairs of neurons, so we further need a method which is extendible to analyze the coordination in the network, system, or region as a whole. Here we propose a new measure of behavioral coordination and information sharing, informational coherence, based on the notion of dynamical state.

Section 1 argues that coordinated behavior in neural systems is often not captured by existing measures of synchronization or correlation, and that something sensitive to nonlinear, stochastic, predictive relationships is needed. Section 2 defines informational coherence as the (normalized) mutual information between the dynamical states of two systems and explains how looking at the states, rather than just observables, fulfills the needs laid out in Section 1. Since we rarely know the right states a priori, Section 2.1 briefly describes how we reconstruct effective state spaces from data. Sections 2.2 and 2.3 give some details about how we calculate the informational coherence and approximate the global information stored in the network. Section 3 applies our method to a model system (a biophysically detailed conductance-based model), comparing our results to those of more familiar second-order statistics. In the interest of space, we omit proofs and a full discussion of the existing literature, giving only minimal references here; proofs and references will appear in a longer paper now in preparation.

1 Synchrony or Coherence?
Most hypotheses which involve the idea that information sharing is reflected in coordinated activity across neural units invoke a very specific notion of coordinated activity, namely strict synchrony: the units should be doing exactly the same thing (e.g., spiking) at exactly the same time. Investigators then measure coordination by measuring how close the units come to being strictly synchronized (e.g., variance in spike times).

From an informational point of view, there is no reason to favor strict synchrony over other kinds of coordination. One neuron consistently spiking 50 ms after another is just as informative a relationship as two simultaneously spiking, but such stable phase relations are missed by strict-synchrony approaches. Indeed, whatever the exact nature of the neural code, it uses temporally extended patterns of activity, and so information sharing should be reflected in coordination of those patterns, rather than just the instantaneous activity.

There are three common ways of going beyond strict synchrony: cross-correlation and related second-order statistics, mutual information, and topological generalized synchrony.

The cross-correlation function (the normalized covariance function; this includes, for present purposes, the joint peristimulus time histogram [2]) is one of the most widespread measures of synchronization. It can be efficiently calculated from observable series; it handles statistical as well as deterministic relationships between processes; by incorporating variable lags, it reduces the problem of phase locking. Fourier transformation of the covariance function $\gamma_{XY}(h)$ yields the cross-spectrum $F_{XY}(\nu)$, which in turn gives the spectral coherence $c_{XY}(\nu) = |F_{XY}(\nu)|^2 / \big(F_X(\nu) F_Y(\nu)\big)$, a normalized correlation between the Fourier components of X and Y. Integrated over frequencies, the spectral coherence measures, essentially, the degree of linear cross-predictability of the two series.
([4] applies spectral coherence to coordinated neural activity.) However, such second-order statistics only handle linear relationships. Since neural processes are known to be strongly nonlinear, there is little reason to think these statistics adequately measure coordination and synchrony in neural systems.

Mutual information is attractive because it handles both nonlinear and stochastic relationships and has a very natural and appealing interpretation. Unfortunately, it often seems to fail in practice, being disappointingly small even between signals which are known to be tightly coupled [5]. The major reason is that neural codes use distinct patterns of activity over time, rather than many different instantaneous actions, and the usual approach misses these extended patterns. Consider two neurons, one of which drives the other to spike 50 ms after it does, the driving neuron spiking once every 500 ms. These are very tightly coordinated, but whether the first neuron spiked at time t conveys little information about what the second neuron is doing at t: it’s not spiking, but it’s not spiking most of the time anyway. Mutual information calculated from the direct observations conflates the “no spike” of the second neuron preparing to fire with its just-sitting-around “no spike”. Here, mutual information could find the coordination if we used a 50 ms lag, but that won’t work in general. Take two rate-coding neurons with baseline firing rates of 1 Hz, and suppose that a stimulus excites one to 10 Hz and suppresses the other to 0.1 Hz. The spiking rates thus share a lot of information, but whether the one neuron spiked at t is uninformative about what the other neuron did then, and lagging won’t help.

Generalized synchrony is based on the idea of establishing relationships between the states of the various units.
“State” here is taken in the sense of physics, dynamics and control theory: the state at time t is a variable which fixes the distribution of observables at all times ≥ t, rendering the past of the system irrelevant [6]. Knowing the state allows us to predict, as well as possible, how the system will evolve, and how it will respond to external forces [7]. Two coupled systems are said to exhibit generalized synchrony if the state of one system is given by a mapping from the state of the other. Applications to data employ state-space reconstruction [8]: if the state $x \in X$ evolves according to smooth, d-dimensional deterministic dynamics, and we observe a generic function $y = f(x)$, then the space Y of time-delay vectors $[y(t), y(t - \tau), \ldots, y(t - (k - 1)\tau)]$ is diffeomorphic to X if $k > 2d$, for generic choices of lag τ. The various versions of generalized synchrony differ on how, precisely, to quantify the mappings between reconstructed state spaces, but they all appear to be empirically equivalent to one another and to notions of phase synchronization based on Hilbert transforms [5]. Thus all of these measures accommodate nonlinear relationships, and are potentially very flexible. Unfortunately, there is essentially no reason to believe that neural systems have deterministic dynamics at experimentally accessible levels of detail, much less that there are deterministic relationships among such states for different units. What we want, then, but none of these alternatives provides, is a quantity which measures predictive relationships among states, but allows those relationships to be nonlinear and stochastic. The next section introduces just such a measure, which we call “informational coherence”.

2 States and Informational Coherence

There are alternatives to calculating the “surface” mutual information between the sequences of observations themselves (which, as described, fails to capture coordination).
If we know that the units are phase oscillators, or rate coders, we can estimate their instantaneous phase or rate and, by calculating the mutual information between those variables, see how coordinated the units’ patterns of activity are. However, phases and rates do not exhaust the repertoire of neural patterns, and a more general, common scheme is desirable. The most general notion of “pattern of activity” is simply that of the dynamical state of the system, in the sense mentioned above. We now formalize this.

Assuming the usual notation for Shannon information [9], the information content of a state variable X is H[X] and the mutual information between X and Y is I[X; Y]. As is well known, $I[X;Y] \le \min\{H[X], H[Y]\}$. We use this to normalize the mutual state information to a 0-to-1 scale, and this is the informational coherence (IC):

$$\psi(X, Y) = \frac{I[X;Y]}{\min\{H[X], H[Y]\}}, \quad \text{with } 0/0 = 0. \qquad (1)$$

ψ can be interpreted as follows. I[X; Y] is the Kullback-Leibler divergence between the joint distribution of X and Y and the product of their marginal distributions [9], indicating the error involved in ignoring the dependence between X and Y. The mutual information between predictive, dynamical states thus gauges the error involved in assuming the two systems are independent, i.e., how much predictions could improve by taking into account the dependence. Hence it measures the amount of dynamically relevant information shared between the two systems. ψ simply normalizes this value, and indicates the degree to which two systems have coordinated patterns of behavior (cf. [10], although this only uses directly observable quantities).

2.1 Reconstruction and Estimation of Effective State Spaces

As mentioned, the state space of a deterministic dynamical system can be reconstructed from a sequence of observations. This is the main tool of experimental nonlinear dynamics [8]; but the assumption of determinism is crucial and false for almost any interesting neural system.
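Returning briefly to Eq. (1): once each unit's activity has been turned into a sequence of discrete state labels (for example the causal-state series described in this section), ψ can be estimated with plug-in (histogram) entropies. The sketch below assumes such aligned label sequences are already available; the helper names are hypothetical, entropies are in bits, and the 0/0 = 0 convention is applied explicitly. As noted in Section 2.2, this estimate is biased upward on finite samples, so values should be judged against a null distribution.

```python
import numpy as np

def entropy_bits(labels):
    """Plug-in Shannon entropy (bits) of a 1-D label sequence, or of the rows of a 2-D array."""
    labels = np.asarray(labels)
    axis = 0 if labels.ndim > 1 else None
    _, counts = np.unique(labels, axis=axis, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def mutual_information_bits(x, y):
    """I[X;Y] = H[X] + H[Y] - H[X,Y], using plug-in estimates."""
    return entropy_bits(x) + entropy_bits(y) - entropy_bits(np.column_stack([x, y]))

def informational_coherence(x, y):
    """Eq. (1): psi = I[X;Y] / min{H[X], H[Y]}, with the convention 0/0 = 0."""
    denom = min(entropy_bits(x), entropy_bits(y))
    if denom <= 0.0:
        return 0.0
    return mutual_information_bits(x, y) / denom

a = np.array([0, 1, 2, 0, 1, 2, 0, 1])                      # hypothetical state labels for one unit
print(informational_coherence(a, (a + 1) % 3))              # relabelled copy: psi is 1 up to rounding
print(informational_coherence(a, np.zeros(8, dtype=int)))   # constant series: 0 by the 0/0 convention
print(informational_coherence([0, 0, 1, 1], [0, 1, 0, 1]))  # independent in this sample: 0
```

Because ψ only depends on the joint histogram of labels, it is invariant to renaming states, which is why the relabelled copy scores 1.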
While classical state-space reconstruction won’t work on stochastic processes, such processes do have state-space representations [11], and, in the special case of discrete-valued, discrete-time series, there are ways to reconstruct the state space. Here we use the CSSR algorithm, introduced in [12] (code available at http://bactra.org/CSSR). This produces causal state models, which are stochastic automata capable of statistically optimal nonlinear prediction; the state of the machine is a minimal sufficient statistic for the future of the observable process [13].¹ The basic idea is to form a set of states which should (1) be Markovian, (2) be sufficient statistics for the next observable, and (3) have deterministic transitions (in the automata-theory sense). The algorithm begins with a minimal, one-state, IID model, and checks whether these properties hold, by means of hypothesis tests. If they fail, the model is modified, generally but not always by adding more states, and the new model is checked again. Each state of the model corresponds to a distinct distribution over future events, i.e., to a statistical pattern of behavior. Under mild conditions, which do not involve prior knowledge of the state space, CSSR converges in probability to the unique causal state model of the data-generating process [12]. In practice, CSSR is quite fast (linear in the data size), and generalizes at least as well as training hidden Markov models with the EM algorithm and using cross-validation for selection, the standard heuristic [12].

One advantage of the causal state approach (which it shares with classical state-space reconstruction) is that state estimation is greatly simplified. In the general case of nonlinear state estimation, it is necessary to know not just the form of the stochastic dynamics in the state space and the observation function, but also their precise parametric values and the distribution of observation and driving noises.
Estimating the state from the observable time series then becomes a computationally intensive application of Bayes’s Rule [17]. Due to the way causal states are built as statistics of the data, with probability 1 there is a finite time, t, at which the causal state at time t is certain. This is not just with some degree of belief or confidence: because of the way the states are constructed, it is impossible for the process to be in any other state at that time. Once the causal state has been established, it can be updated recursively, i.e., the causal state at time t + 1 is an explicit function of the causal state at time t and the observation at t + 1. The causal state model can therefore be automatically converted into a finite-state transducer which reads in an observation time series and outputs the corresponding series of states [18, 13]. (Our implementation of CSSR filters its training data automatically.) The result is a new time series of states, from which all non-predictive components have been filtered out.

¹ Causal state models have the same expressive power as observable operator models [14] or predictive state representations [7], and greater power than variable-length Markov models [15, 16].

Figure 1: Rastergrams of neuronal spike-times in the network. Excitatory, pyramidal neurons (numbers 1 to 1000) are shown in green, inhibitory interneurons (numbers 1001 to 1300) in red. During the first 10 seconds (a), the current connections among the pyramidal cells are suppressed and a gamma rhythm emerges (left). At t = 10 s, those connections become active, leading to a beta rhythm (b, right).

2.2 Estimating the Coherence

Our algorithm for estimating the matrix of informational coherences is as follows. For each unit, we reconstruct the causal state model, and filter the observable time series to produce a series of causal states. Then, for each pair of neurons, we construct a joint histogram of
the state distribution, estimate the mutual information between the states, and normalize by the single-unit state informations. This gives a symmetric matrix of ψ values.

Even if two systems are independent, their estimated IC will, on average, be positive, because, while they should have zero mutual information, the empirical estimate of mutual information is non-negative. Thus, the significance of IC values must be assessed against the null hypothesis of system independence. The easiest way to do so is to take the reconstructed state models for the two systems and run them forward, independently of one another, to generate a large number of simulated state sequences; from these, calculate values of the IC. This procedure will approximate the sampling distribution of the IC under a null model which preserves the dynamics of each system, but not their interaction. We can then find p-values as usual. We omit them here to save space.

2.3 Approximating the Network Multi-Information

There is broad agreement [2] that analyses of networks should not just be an analysis of pairs of neurons, averaged over pairs. Ideally, an analysis of information sharing in a network would look at the overall structure of statistical dependence between the various units, reflected in the complete joint probability distribution P of the states. This would then allow us, for instance, to calculate the n-fold multi-information, $I[X_1; X_2; \ldots; X_n] \equiv D(P\|Q)$, the Kullback-Leibler divergence between the joint distribution P and the product of marginal distributions Q, analogous to the pairwise mutual information [19]. Calculated over the predictive states, the multi-information would give the total amount of shared dynamical information in the system. Just as we normalized the mutual information $I[X_1; X_2]$ by its maximum possible value, $\min\{H[X_1], H[X_2]\}$, we normalize the multi-information by its maximum, which is the smallest sum of n − 1 marginal entropies:

$$I[X_1; X_2; \ldots; X_n] \le \min_k \sum_{i \ne k} H[X_i]$$

Unfortunately, P is a distribution over a very high-dimensional space and so is hard to estimate well without strong parametric constraints. We thus consider approximations. The lowest-order approximation treats all the units as independent; this is the distribution Q. One step up are tree distributions, where the global distribution is a function of the joint distributions of pairs of units. Not every pair of units needs to enter into such a distribution, though every unit must be part of some pair. Graphically, a tree distribution corresponds to a spanning tree, with edges linking units whose interactions enter into the global probability, and conversely spanning trees determine tree distributions. Writing $E_T$ for the set of pairs (i, j) and abbreviating $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$ by X = x, one has

$$T(X = x) = \prod_{(i,j) \in E_T} \frac{T(X_i = x_i, X_j = x_j)}{T(X_i = x_i)\, T(X_j = x_j)} \prod_{i=1}^{n} T(X_i = x_i) \qquad (2)$$

where the marginal distributions $T(X_i)$ and the pair distributions $T(X_i, X_j)$ are estimated by the empirical marginal and pair distributions. We must now pick edges $E_T$ so that T best approximates the true global distribution P. A natural approach is to minimize $D(P\|T)$, the divergence between P and its tree approximation. Chow and Liu [20] showed that the maximum-weight spanning tree gives the divergence-minimizing distribution, taking an edge’s weight to be the mutual information between the variables it links.

There are three advantages to using the Chow-Liu approximation. (1) Estimating T from empirical probabilities gives a consistent maximum likelihood estimator of the ideal Chow-Liu tree [20], with reasonable rates of convergence, so T can be reliably known even if P cannot. (2) There are efficient algorithms for constructing maximum-weight spanning trees, such as Prim’s algorithm [21, sec. 23.2], which runs in time $O(n^2 + n \log n)$. Thus, the approximation is computationally tractable.
(3) The KL divergence of the Chow-Liu distribution from Q gives a lower bound on the network multi-information; that bound is just the sum of the mutual informations along the edges in the tree:

$$I[X_1; X_2; \ldots; X_n] \ge D(T\|Q) = \sum_{(i,j) \in E_T} I[X_i; X_j] \tag{3}$$

Even if we knew P exactly, Eq. 3 would be useful as an alternative to calculating $D(P\|Q)$ directly, which requires evaluating $\log P(x)/Q(x)$ for all the exponentially many configurations x.

It is natural to seek higher-order approximations to P, e.g., using three-way interactions not decomposable into pairwise interactions [22, 19]. But it is hard to do so effectively, because finding the optimal approximation to P when such interactions are allowed is NP-hard [23], and analytical formulas like Eq. 3 generally do not exist [19]. We therefore confine ourselves to the Chow-Liu approximation here.

3 Example: A Model of Gamma and Beta Rhythms

We use simulated data as a test case, instead of empirical multiple-electrode recordings. This allows us to try the method on a system of over 1000 neurons and to compare the measure against expected results. The model, taken from [24], was originally designed to study episodes of gamma (30–80 Hz) and beta (12–30 Hz) oscillations in the mammalian nervous system, which often occur successively with a spontaneous transition between them. More concretely, the rhythms studied were those displayed by in vitro hippocampal (CA1) slice preparations and by in vivo neocortical EEGs. The model contains two neuron populations: excitatory (AMPA) pyramidal neurons and inhibitory (GABA_A) interneurons, defined by conductance-based Hodgkin-Huxley-style equations. Simulations were carried out in a network of 1000 pyramidal cells and 300 interneurons. Each cell was modeled as a one-compartment neuron with all-to-all coupling, endowed with the basic sodium and potassium spiking currents, an external applied current, and some Gaussian input noise.
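Before turning to the simulation, the estimation steps of Sections 2.2–2.3 can be condensed into a short sketch. It assumes that each unit's causal-state series has already been obtained by filtering and is an aligned list of hashable labels. It uses plain plug-in (histogram) estimates with no bias correction and no significance testing, and every function name is hypothetical. It is not the authors' CSSR-based implementation. The sketch computes pairwise informational coherence, builds the Chow-Liu tree with an array-based Prim's algorithm, and then evaluates the Eq. 3 lower bound and its normalization. Reading the "global coherence" as that bound divided by min_k Σ_{i≠k} H[X_i] is our interpretation of the normalization described above.

```python
import math
from collections import Counter


def entropy(seq):
    """Plug-in Shannon entropy (bits) of a sequence of discrete state labels."""
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


def mutual_information(a, b):
    """Plug-in mutual information (bits) from the joint histogram of two aligned state series."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


def informational_coherence(a, b):
    """psi = I[A;B] / min(H[A], H[B]); returns 0 if either unit has a single state (our convention)."""
    h_min = min(entropy(a), entropy(b))
    return mutual_information(a, b) / h_min if h_min > 0 else 0.0


def chow_liu_edges(mi):
    """Maximum-weight spanning tree of a symmetric MI matrix via array-based Prim (O(n^2))."""
    n = len(mi)
    in_tree = [False] * n
    best = [-math.inf] * n  # heaviest known edge from each outside node into the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = max((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v] and mi[u][v] > best[v]:
                best[v], parent[v] = mi[u][v], u
    return edges


def tree_coherence(states):
    """Return (Eq. 3 lower bound in bits, that bound / min_k sum_{i != k} H[X_i])."""
    n = len(states)
    mi = [[mutual_information(states[i], states[j]) for j in range(n)] for i in range(n)]
    bound = sum(mi[i][j] for i, j in chow_liu_edges(mi))
    h = [entropy(s) for s in states]
    normalizer = sum(h) - max(h)  # smallest sum of n - 1 marginal entropies
    return bound, (bound / normalizer if normalizer > 0 else 0.0)


x = [0, 1, 0, 1, 1, 0, 0, 1]
y = list(x)                    # a perfectly coordinated copy of x
z = [0, 0, 1, 1, 0, 0, 1, 1]   # independent of x in this sample
print(informational_coherence(x, y))  # 1.0
print(tree_coherence([x, y, z]))      # (1.0, 0.5)
```

The smallest sum of n − 1 entropies is computed as the total minus the largest entropy, which equals min_k Σ_{i≠k} H[X_i]. Prim's algorithm is used in its simple O(n²) matrix form here, which suits a dense mutual-information matrix.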
The first 10 seconds of the simulation correspond to the gamma rhythm, in which only a group of neurons is made to spike via a linearly increasing applied current. The beta rhythm (subsequent 10 seconds) is obtained by activating pyramidal-pyramidal recurrent connections (potentiated by Hebbian preprocessing as a result of synchrony during the gamma rhythm) and a slow outward after-hyperpolarization (AHP) current (the M-current), suppressed during gamma due to the metabotropic activation used in the generation of the rhythm. During the beta rhythm, pyramidal cells, silent during the gamma rhythm, fire on a subset of interneuron cycles (Fig. 1).

Figure 2: Heat-maps of coordination for the network, as measured by zero-lag cross-correlation (top row) and informational coherence (bottom), contrasting the gamma rhythm (left column) with the beta (right). Colors run from red (no coordination) through yellow to pale cream (maximum).

Fig. 2 compares zero-lag cross-correlation, a second-order method of quantifying coordination, with the informational coherence calculated from the reconstructed states. (In this simulation, we could have calculated the actual states of the model neurons directly, rather than reconstructing them, but for purposes of testing our method we did not.) Cross-correlation finds some of the relationships visible in Fig. 1, but is confused by, for instance, the phase shifts between pyramidal cells. (Surface mutual information, not shown, gives similar results.) Informational coherence, however, has no trouble recognizing the two populations as effectively coordinated blocks. The presence of dynamical noise, problematic for ordinary state reconstruction, is not an issue. The average IC is 0.411 (or 0.797 if the inactive, low-numbered neurons are excluded). The tree estimate of the global multi-information is 3243.7 bits, with a global coherence of 0.777. The right half of Fig. 2 repeats this analysis for the beta rhythm. In this stage, the average IC is 0.614 and the tree estimate of the global multi-information is 7377.7 bits, though the estimated global coherence falls very slightly to 0.742. This is because low-numbered neurons which were quiescent before are now active, contributing to the global information, but the overall pattern is somewhat weaker and noisier (as can be seen from Fig. 1b). So, as expected, the total information content is higher, but the overall coordination across the network is lower.

4 Conclusion

Informational coherence provides a measure of neural information sharing and coordinated activity which accommodates nonlinear, stochastic relationships between extended patterns of spiking. It is robust to dynamical noise and leads to a genuinely multivariate measure of global coordination across networks or regions. Applied to data from multi-electrode recordings, it should be a valuable tool in evaluating hypotheses about distributed neural representation and function.

Acknowledgments

Thanks to R. Haslinger, E. Ionides and S. Page; and for support to the Santa Fe Institute (under grants from Intel, the NSF and the MacArthur Foundation, and DARPA agreement F30602-00-2-0583), the Clare Boothe Luce Foundation (KLK) and the James S. McDonnell Foundation (CRS).

References
[1] L. F. Abbott and T. J. Sejnowski, eds. Neural Codes and Distributed Representations. MIT Press, 1998.
[2] E. N. Brown, R. E. Kass, and P. P. Mitra. Nature Neuroscience, 7:456–461, 2004.
[3] D. H. Ballard, Z. Zhang, and R. P. N. Rao. In R. P. N. Rao, B. A. Olshausen, and M. S. Lewicki, eds., Probabilistic Models of the Brain, pp. 273–284. MIT Press, 2002.
[4] D. R. Brillinger and A. E. P. Villa. In D. R. Brillinger, L. T. Fernholz, and S. Morgenthaler, eds., The Practice of Data Analysis, pp. 77–92. Princeton U.P., 1997.
[5] R. Quian Quiroga et al. Physical Review E, 65:041903, 2002.
[6] R. F. Streater. Statistical Dynamics. Imperial College Press, London.
[7] M. L. Littman, R. S. Sutton, and S. Singh. In T. G. Dietterich, S. Becker, and Z. Ghahramani, eds., Advances in Neural Information Processing Systems 14, pp. 1555–1561. MIT Press, 2002.
[8] H. Kantz and T. Schreiber. Nonlinear Time Series Analysis. Cambridge U.P., 1997.
[9] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley, 1991.
[10] M. Palus et al. Physical Review E, 63:046211, 2001.
[11] F. B. Knight. Annals of Probability, 3:573–596, 1975.
[12] C. R. Shalizi and K. L. Shalizi. In M. Chickering and J. Halpern, eds., Uncertainty in Artificial Intelligence: Proceedings of the Twentieth Conference, pp. 504–511. AUAI Press, 2004.
[13] C. R. Shalizi and J. P. Crutchfield. Journal of Statistical Physics, 104:817–819, 2001.
[14] H. Jaeger. Neural Computation, 12:1371–1398, 2000.
[15] D. Ron, Y. Singer, and N. Tishby. Machine Learning, 25:117–149, 1996.
[16] P. Bühlmann and A. J. Wyner. Annals of Statistics, 27:480–513, 1999.
[17] N. U. Ahmed. Linear and Nonlinear Filtering for Scientists and Engineers. World Scientific, 1998.
[18] D. R. Upper. PhD thesis, University of California, Berkeley, 1997.
[19] E. Schneidman, S. Still, M. J. Berry, and W. Bialek. Physical Review Letters, 91:238701, 2003.
[20] C. K. Chow and C. N. Liu. IEEE Transactions on Information Theory, IT-14:462–467, 1968.
[21] T. H. Cormen et al. Introduction to Algorithms. 2nd ed. MIT Press, 2001.
[22] S. Amari. IEEE Transactions on Information Theory, 47:1701–1711, 2001.
[23] S. Kirshner, P. Smyth, and A. Robertson. Tech. Rep. 04-04, UC Irvine, Information and Computer Science, 2004.
[24] M. S. Olufsen et al. Journal of Computational Neuroscience, 14:33–54, 2003.

6 0.33750319 61 nips-2005-Dynamical Synapses Give Rise to a Power-Law Distribution of Neuronal Avalanches

7 0.30537027 36 nips-2005-Bayesian models of human action understanding

8 0.30229291 99 nips-2005-Integrate-and-Fire models with adaptation are good enough

9 0.29778695 114 nips-2005-Learning Rankings via Convex Hull Separation

10 0.27708068 187 nips-2005-Temporal Abstraction in Temporal-difference Networks

11 0.26699665 122 nips-2005-Logic and MRF Circuitry for Labeling Occluding and Thinline Visual Contours

12 0.2644031 143 nips-2005-Off-Road Obstacle Avoidance through End-to-End Learning

13 0.2643635 62 nips-2005-Efficient Estimation of OOMs

14 0.26345614 73 nips-2005-Fast biped walking with a reflexive controller and real-time policy searching

15 0.26196048 50 nips-2005-Convex Neural Networks

16 0.24706256 184 nips-2005-Structured Prediction via the Extragradient Method

17 0.2438091 157 nips-2005-Principles of real-time computing with feedback applied to cortical microcircuit models

18 0.23198994 161 nips-2005-Radial Basis Function Network for Multi-task Learning

19 0.22854741 7 nips-2005-A Cortically-Plausible Inverse Problem Solving Method Applied to Recognizing Static and Kinematic 3D Objects

20 0.22851114 142 nips-2005-Oblivious Equilibrium: A Mean Field Approximation for Large-Scale Dynamic Games

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.047), (10, 0.02), (27, 0.013), (31, 0.049), (34, 0.044), (55, 0.011), (69, 0.577), (73, 0.011), (77, 0.013), (88, 0.053), (91, 0.042)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.97663975 40 nips-2005-CMOL CrossNets: Possible Neuromorphic Nanoelectronic Circuits

Author: Jung Hoon Lee, Xiaolong Ma, Konstantin K. Likharev

Abstract: Hybrid “CMOL” integrated circuits, combining a CMOS subsystem with nanowire crossbars and simple two-terminal nanodevices, promise to extend the exponential Moore’s Law development of microelectronics into the sub-10-nm range. We are developing neuromorphic network (“CrossNet”) architectures for this future technology, in which neural cell bodies are implemented in CMOS, nanowires are used as axons and dendrites, and nanodevices (bistable latching switches) are used as elementary synapses. We have shown how CrossNets may be trained to perform pattern recovery and classification despite the limitations imposed by the CMOL hardware. Preliminary estimates have shown that CMOL CrossNets may be extremely dense (~10⁷ cells per cm²) and operate approximately a million times faster than biological neural networks, at manageable power consumption. In conclusion, we briefly discuss possible short-term and long-term applications of the emerging technology.

1 Introduction: CMOL Circuits

Recent results [1, 2] indicate that the current VLSI paradigm based on CMOS technology can hardly be extended beyond the 10-nm frontier. In this range, the sensitivity of parameters (most importantly, the gate voltage threshold) of silicon field-effect transistors to inevitable fabrication spreads grows exponentially. This sensitivity will probably send fabrication facility costs skyrocketing, and may lead to the end of Moore’s Law some time during the next decade. There is a growing consensus that the impending Moore’s Law crisis may be preempted by a radical paradigm shift from purely CMOS technology to hybrid CMOS/nanodevice circuits, e.g., those of the “CMOL” variety (Fig. 1). Such circuits (see, e.g., Ref. 3 for their recent review) would combine a level of advanced CMOS devices fabricated by lithographic patterning and a two-layer nanowire crossbar formed, e.g., by nanoimprint, with nanowires connected by simple, similar, two-terminal nanodevices at each crosspoint.
For such devices, molecular single-electron latching switches [4] are presently the leading candidates, in particular because they may be fabricated using the self-assembled monolayer (SAM) technique, which has already given reproducible results for simpler molecular devices [5].

Fig. 1. CMOL circuit: (a) schematic side view (nanodevices, nanowiring, interface pins, upper wiring level of the CMOS stack), and (b) top-view zoom-in on several adjacent interface pins, showing βFCMOS, Fnano and the tilt angle α. (For clarity, only two adjacent nanodevices are shown.)

In order to overcome the CMOS/nanodevice interface problems pertinent to earlier proposals of hybrid circuits [6], in CMOL the interface is provided by pins that are distributed all over the circuit area, on top of the CMOS stack. This allows the use of advanced techniques of nanowire patterning (like nanoimprint) which do not have nanoscale accuracy of layer alignment [3]. The vital feature of this interface is the tilt, by angle α = arcsin(Fnano/βFCMOS), of the nanowire crossbar relative to the square arrays of interface pins (Fig. 1b). Here Fnano is the nanowiring half-pitch, FCMOS is the half-pitch of the CMOS subsystem, and β is a dimensionless factor larger than 1 that depends on the CMOS cell complexity. Figure 1b shows that this tilt allows the CMOS subsystem to address each nanodevice even if Fnano << βFCMOS. It has by now been shown that CMOL circuits can combine high performance with high defect tolerance (which is necessary for any circuit using nanodevices) for several digital applications. In particular, CMOL circuits with defect rates below a few percent would enable terabit-scale memories [7], while the performance of FPGA-like CMOL circuits may be several hundred times above that of a purely CMOS FPGA (implemented with the same FCMOS), at acceptable power dissipation and defect tolerance above 20% [8].
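The tilt formula α = arcsin(Fnano/βFCMOS) is easy to evaluate for concrete geometries. The minimal sketch below uses Fnano = 3 nm, the value adopted later in Section 5. The values FCMOS = 32 nm and β = 2 are purely hypothetical illustration choices, not numbers from the text, and the function name is ours as well.

```python
import math


def crossbar_tilt_angle(f_nano, f_cmos, beta):
    """Tilt alpha = arcsin(Fnano / (beta * FCMOS)) in radians; both lengths in the same unit."""
    if beta <= 1:
        raise ValueError("beta is described as a dimensionless factor larger than 1")
    ratio = f_nano / (beta * f_cmos)
    if not 0 < ratio < 1:
        raise ValueError("need 0 < Fnano / (beta * FCMOS) < 1")
    return math.asin(ratio)


# Hypothetical geometry: Fnano = 3 nm, FCMOS = 32 nm, beta = 2.
alpha = crossbar_tilt_angle(3.0, 32.0, 2.0)
print(round(math.degrees(alpha), 2))  # 2.69
```

Because Fnano << βFCMOS in practice, the angle is small, and α ≈ Fnano/(βFCMOS) to first order.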
In addition, the very structure of CMOL circuits makes them uniquely suitable for the implementation of more complex, mixed-signal information processing systems, including ultradense and ultrafast neuromorphic networks. The objective of this paper is to describe briefly the current status of our work on the development of so-called Distributed Crossbar Networks (“CrossNets”) that could provide high performance despite the limitations imposed by CMOL hardware. A more detailed description of our earlier results may be found in Ref. 9.

2 Synapses

The central device of CrossNet is a two-terminal latching switch [3, 4] (Fig. 2a), which is a combination of two single-electron devices, a transistor and a trap [3]. The device may be naturally implemented as a single organic molecule (Fig. 2b). Qualitatively, the device operates as follows. If the voltage V = Vj – Vk applied between the external electrodes (in CMOL, nanowires) is low, the trap island has no net electric charge, and the single-electron transistor is closed. If V approaches a certain threshold value V+ > 0, an additional electron is inserted into the trap island, and its field lifts the Coulomb blockade of the single-electron transistor, thus connecting the nanowires. The switch state may be reset (i.e., the wires disconnected) by applying a lower voltage V < V- < V+. Due to the random character of single-electron tunneling [2], the quantitative description of the switch is by necessity probabilistic: V determines only the rates Γ↑↓ of device switching between its ON and OFF states. The rates, in turn, determine the dynamics of the probability p to have the transistor open (i.e., the wires connected):

$$\frac{dp}{dt} = \Gamma_\uparrow (1 - p) - \Gamma_\downarrow p \tag{1}$$

The theory of single-electron tunneling [2] shows that, to a good approximation, the rates may be written as

$$\Gamma_{\uparrow\downarrow} = \Gamma_0 \exp\left\{\pm \frac{e(V - S)}{k_B T}\right\} \tag{2}$$

where Γ0 and S are constants depending on physical parameters of the latching switches. Note that despite the random character of switching, the strong nonlinearity of Eq. (2) allows one to limit the degree of the device “fuzziness”.

Fig. 2. (a) Schematics and (b) possible molecular implementation of the two-terminal single-electron latching switch.

3 CrossNets

Figure 3a shows the generic structure of a CrossNet. CMOS-implemented somatic cells (within the Fire Rate model, just nonlinear differential amplifiers; see Fig. 3b,c) apply their output voltages to “axonic” nanowires. If the latching switch working as an elementary synapse, at the crosspoint of an axonic wire with the perpendicular “dendritic” wire, is open, some current flows into the latter wire, charging it. Since such currents are injected into each dendritic wire through several (many) open synapses, their addition provides a natural passive analog summation of signals from the corresponding somas, typical for all neural networks. In Fig. 3a, note the open-circuit terminations of axonic and dendritic lines at the borders of the somatic cells. Due to these terminations, the somas do not communicate directly, but only via synapses. The network shown in Fig. 3 is evidently feedforward; recurrent networks are achieved in the evident way by doubling the number of synapses and nanowires per somatic cell (Fig. 3c). Moreover, using a dual-rail (bipolar) representation of the signal, and hence doubling the number of nanowires and elementary synapses once again, one gets a CrossNet with somas coupled by compact 4-switch groups [9]. Using Eqs.
(1) and (2), it is straightforward to show that the average synaptic weight $w_{jk}$ of the group obeys the “quasi-Hebbian” rule:

$$\frac{dw_{jk}}{dt} = -4\Gamma_0 \sinh(\gamma S)\, \sinh(\gamma V_j)\, \sinh(\gamma V_k) \tag{3}$$

Fig. 3. (a) Generic structure of the simplest (feedforward, non-Hebbian) CrossNet. Red lines show “axonic”, and blue lines “dendritic” nanowires. Gray squares are interfaces between nanowires and CMOS-based somas (b, c). Signs show the dendrite input polarities. Green circles denote molecular latching switches forming elementary synapses. Bold red and blue points are open-circuit terminations of the nanowires, which prevent somas from interacting other than through synapses.

In the simplest cases (e.g., quasi-Hopfield networks with finite connectivity), the tri-level synaptic weights of the generic CrossNets are quite satisfactory, leading to only a very modest (~30%) network capacity loss. However, some applications (in particular, pattern classification) may require a larger number of weight quantization levels L (e.g., L ≈ 30 for 1% fidelity [9]). This may be achieved by using compact square arrays (e.g., 4×4) of latching switches (Fig. 4). Various species of CrossNets [9] also differ in the way the somatic cells are distributed around the synaptic field. Figure 5 shows feedforward versions of the two CrossNet types explored most so far: the so-called FlossBar and InBar. The former network is more natural for the implementation of multilayered perceptrons (MLP), while the latter is preferable for recurrent network implementations and also allows a simpler CMOS design of somatic cells. The most important advantage of CrossNets over the hardware neural networks suggested earlier is that they combine enormous density with large cell connectivity M >> 1 in quasi-2D electronic circuits.
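The switching statistics of Eqs. (1) and (2) can be explored numerically. The minimal sketch below assumes room temperature (300 K), an arbitrary rate unit Γ0 = 1, and a hypothetical value of S. The function names are ours, and the sketch uses only the simplified single-constant rates of Eq. (2), not the separate thresholds V+ and V- of the real device. At fixed V, Eq. (1) relaxes to p* = Γ↑/(Γ↑ + Γ↓) = 1/(1 + exp[−2e(V − S)/kBT]). This is a logistic step centered at V = S with a width of order kBT/e.

```python
import math

K_B_OVER_E = 8.617333262e-5  # Boltzmann constant / elementary charge, in V/K


def switching_rates(v, s, gamma0=1.0, temperature=300.0):
    """Eq. (2): Gamma_up = Gamma0 exp(+e(V-S)/kBT), Gamma_down = Gamma0 exp(-e(V-S)/kBT)."""
    x = (v - s) / (K_B_OVER_E * temperature)
    return gamma0 * math.exp(x), gamma0 * math.exp(-x)


def steady_state_on_probability(v, s, temperature=300.0):
    """Fixed point of Eq. (1): p* = Gamma_up / (Gamma_up + Gamma_down) = 1 / (1 + exp(-2x))."""
    x = (v - s) / (K_B_OVER_E * temperature)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-2.0 * x))
    z = math.exp(2.0 * x)  # rearranged form avoids overflow for large negative x
    return z / (1.0 + z)


def relax_on_probability(p0, v, s, dt, steps, gamma0=1.0, temperature=300.0):
    """Forward-Euler integration of Eq. (1), dp/dt = Gamma_up (1 - p) - Gamma_down p, at fixed V."""
    up, down = switching_rates(v, s, gamma0, temperature)
    p = p0
    for _ in range(steps):
        p += dt * (up * (1.0 - p) - down * p)
    return p


S = 0.5  # hypothetical value of the constant S in Eq. (2), in volts
print(steady_state_on_probability(S, S))  # 0.5
p = relax_on_probability(0.0, S + 0.01, S, dt=0.01, steps=5000)
print(p, steady_state_on_probability(S + 0.01, S))
```

A shift of V by only a few kBT/e (about 26 mV each at 300 K) drives p* close to 0 or 1. This is one way to see how the exponential nonlinearity of Eq. (2) limits the switch's fuzziness.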
4 CrossNet training

CrossNet training faces several hardware-imposed challenges. (i) The synaptic weight contribution provided by the elementary latching switch is binary, so for most applications the multi-switch synapses (Fig. 4) are necessary. (ii) The only way to adjust any particular synaptic weight is to turn the corresponding latching switch(es) ON or OFF. This can only be done by applying a certain voltage V = Vj – Vk between the two corresponding nanowires. During this procedure, other nanodevices attached to the same wires should not be disturbed. (iii) As stated above, synapse state switching is a statistical process, so the degree of its “fuzziness” should be carefully controlled.

Fig. 4. Composite synapse providing L = 2n² + 1 discrete levels of the weight in (a) operation and (b) weight adjustment modes. The dark-gray rectangles are resistive metallic strips at soma/nanowire interfaces.

Fig. 5. Two main CrossNet species: (a) FlossBar and (b) InBar, in the generic (feedforward, non-Hebbian, ternary-weight) case for the connectivity parameter M = 9. Only the nanowires and nanodevices coupling one cell (indicated with red dashed lines) to M post-synaptic cells (blue dashed lines) are shown; in fact, all the cells are similarly coupled.

We have shown that these challenges may be met using (at least) the following training methods [9]. (i) Synaptic weight import. This procedure starts with training a homomorphic “precursor” artificial neural network with continuous synaptic weights wjk, implemented in software, using one of the established methods (e.g., error backpropagation). Then the synaptic weights wjk are transferred to the CrossNet, with some “clipping” (rounding) due to the binary nature of elementary synaptic weights. To accomplish the transfer, pairs of somatic cells are sequentially selected via CMOS-level wiring.
Using the flexibility of CMOS circuitry, these cells are reconfigured to apply external voltages ±VW to the axonic and dendritic nanowires leading to a particular synapse, while all other nanowires are grounded. The voltage level VW is selected so that it does not switch the synapses attached to only one of the selected nanowires, while the voltage 2VW applied to the synapse at the crosspoint of the selected wires is sufficient for its reliable switching. (In the composite synapses with quasi-continuous weights (Fig. 4), only a part of the corresponding switches is turned ON or OFF.) (ii) Error backpropagation. The synaptic weight import procedure is straightforward when wjk may be simply calculated, e.g., for Hopfield-type networks. However, for very large CrossNets used, e.g., as pattern classifiers, the precursor network training may take an impracticably long time. In this case, direct training of a CrossNet may become necessary. We have developed two methods of such training, both based on “Hebbian” synapses consisting of 4 elementary synapses (latching switches) whose average weight dynamics obeys Eq. (3). This quasi-Hebbian rule may be used to implement the backpropagation algorithm either using periodic time-multiplexing [9] or in a continuous fashion, using the simultaneous propagation of signals and errors along the same dual-rail channels. As a result, we can presently state that CrossNets may be taught to perform virtually all major functions demonstrated earlier with the usual neural networks. These include corrupted pattern restoration in the recurrent quasi-Hopfield mode and pattern classification in the feedforward MLP mode [11].

5 CrossNet performance estimates

The significance of this result can only be appreciated in the context of the unparalleled physical parameters of CMOL CrossNets. The only fundamental limitation on the half-pitch Fnano (Fig. 1) comes from quantum-mechanical tunneling between nanowires.
If the wires are separated by vacuum, the corresponding specific leakage conductance becomes uncomfortably large (~10⁻¹² Ω⁻¹m⁻¹) only at Fnano = 1.5 nm. However, since realistic insulation materials (SiO2, etc.) provide somewhat lower tunnel barriers, let us use a more conservative value, Fnano = 3 nm. Note that this value corresponds to 10¹² elementary synapses per cm², so for 4M = 10⁴ and n = 4 the areal density of neural cells is close to 2×10⁷ cm⁻². Both numbers are higher than those for the human cerebral cortex, even though the quasi-2D CMOL circuits have to compete with the quasi-3D cerebral cortex. With the typical specific capacitance of 3×10⁻¹⁰ F/m = 0.3 aF/nm, this gives a nanowire capacitance C0 ≈ 1 aF per working elementary synapse, because the corresponding segment has length 4Fnano. The CrossNet operation speed is determined mostly by the time constant τ0 of dendrite nanowire capacitance recharging through the resistances of open nanodevices. Since both the relevant conductance and capacitance increase similarly with M and n, τ0 ≈ R0C0. The possibilities of reducing R0, and hence τ0, are limited mostly by the acceptable power dissipation per unit area, which is close to Vs²/[(2Fnano)² R0]. For room-temperature operation, the voltage scale V0 ≈ Vt should be of the order of at least 30 kBT/e ≈ 1 V to avoid thermally induced errors [9]. With our number for Fnano, and a relatively high but acceptable power consumption of 100 W/cm², we get R0 ≈ 10¹⁰ Ω (which is a very realistic value for single-molecule single-electron devices like the one shown in Fig. 2). With this number, τ0 is as small as ~10 ns. This means that the CrossNet speed may be approximately six orders of magnitude (!) higher than that of biological neural networks. Even scaling R0 up by a factor of 100 to bring power consumption to a more comfortable level of 1 W/cm² would still leave us at least a four-orders-of-magnitude speed advantage.
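The order-of-magnitude chain in this section (power budget, then R0, then τ0 = R0C0) is easy to reproduce. The minimal sketch below takes the power-density expression Vs²/[(2Fnano)² R0] with Vs ≈ 1 V, Fnano = 3 nm and C0 ≈ 1 aF as quoted above. The constant and function names are ours. At 100 W/cm² the sketch gives R0 of a few times 10¹⁰ Ω and τ0 of a few tens of nanoseconds. This agrees at the order-of-magnitude level with the text's R0 ≈ 10¹⁰ Ω and τ0 ~ 10 ns, and both values grow by a factor of 100 at 1 W/cm².

```python
K_B = 1.380649e-23          # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19  # elementary charge, C


def min_voltage_scale(temperature=300.0, margin=30.0):
    """Voltage scale margin * kB*T/e that the text requires to avoid thermally induced errors."""
    return margin * K_B * temperature / E_CHARGE


def open_device_resistance(v_s, f_nano, power_density_w_per_cm2):
    """Solve P/A ~ Vs^2 / ((2 Fnano)^2 R0) for R0; f_nano in metres, result in ohms."""
    power_density_w_per_m2 = power_density_w_per_cm2 * 1e4
    return v_s ** 2 / (power_density_w_per_m2 * (2 * f_nano) ** 2)


F_NANO = 3e-9  # m, the conservative half-pitch chosen in the text
C0 = 1e-18     # F, ~1 aF per working elementary synapse

for power in (100.0, 1.0):  # W/cm^2: "high but acceptable" and "more comfortable" levels
    r0 = open_device_resistance(1.0, F_NANO, power)  # Vs ~ 1 V as in the text
    print(f"{power:g} W/cm^2: R0 ~ {r0:.1e} ohm, tau0 = R0*C0 ~ {r0 * C0:.1e} s")
print(f"30 kB*T/e at 300 K ~ {min_voltage_scale():.2f} V")
```

At 300 K, 30 kBT/e comes out near 0.78 V, which the text rounds to about 1 V.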
6 Discussion: Possible applications

These estimates make us believe that CMOL CrossNet chips may revolutionize neuromorphic network applications. Let us start with the example of relatively small (1-cm²-scale) chips used for recognition of a face in a crowd [11]. The most difficult feature of such recognition is the search for the face location, i.e., the optimal placement of a face on the image relative to the panel providing input for the processing network. The enormous density and speed of CMOL hardware make it possible to time-and-space multiplex this task (Fig. 6). In this approach, the full image (say, formed by CMOS photodetectors on the same chip) is divided into P rectangular panels of h×w pixels, corresponding to the expected size and approximate shape of a single face. A CMOS-implemented communication channel passes input data from each panel to the corresponding CMOL neural network, providing its shift in time, say using the TV scanning pattern (red line in Fig. 6). The standard methods of image classification require the network to have just a few hidden layers. The time interval Δt needed for each mapping position may therefore be short enough that the total pattern recognition time T = hwΔt is acceptable even for online face recognition.

Fig. 6. Scan mapping of the input image on CMOL CrossNet inputs. Red lines show the possible time sequence of image pixels sent to a certain input of the network processing the image from the upper-left panel of the pattern.

Indeed, let us consider a 4-megapixel image partitioned into 4K 32×32-pixel panels (h = w = 32). Each panel will require an MLP net with several (say, four) layers with 1K cells each in order to compare the panel image with ~10³ stored faces.
With the feasible 4-nm nanowire half-pitch, and 65-level synapses (sufficient for better than 99% fidelity [9]), each interlayer crossbar would require a chip area of about (4K×64 nm)² = 64×64 μm², fitting 4×4K of them on a ~0.6 cm² chip. (The CMOS somatic-layer and communication-system overheads are negligible.) With an acceptable power consumption of the order of 10 W/cm², the input-to-output signal propagation in such a network will take only about 50 ns. Δt may then be of the order of 100 ns, and the total time T = hwΔt of processing one frame of the order of 100 microseconds, much shorter than the typical TV frame time of ~10 milliseconds. The remaining two-orders-of-magnitude time gap may be used, for example, for double-checking the results by stopping the scan mapping (Fig. 6) at the most promising position. (For this, a simple feedback from the recognition output to the mapping communication system is necessary.) It is instructive to compare the estimated CMOL chip speed with that of an implementation of a similar parallel network ensemble on a CMOS signal processor (say, also combined on the same chip with an array of CMOS photodetectors). Even assuming an extremely high performance of 30 billion additions/multiplications per second, we would need ~4×4K×1K×(4K)²/(30×10⁹) ≈ 10⁴ seconds ~ 3 hours per frame, evidently incompatible with online image stream processing. Let us finish with a brief (and much more speculative) discussion of possible long-term prospects of CMOL CrossNets. Eventually, large-scale (~30×30 cm²) CMOL circuits may become available. According to the estimates given in the previous section, the integration scale of such a system (in terms of both neural cells and synapses) will be comparable with that of the human cerebral cortex.
Equipped with a set of broadband sensor/actuator interfaces, such a (necessarily hierarchical) system may be capable, after a period of initial supervised training, of further self-training through interaction with its environment, at a speed several orders of magnitude higher than that of its biological prototypes. Needless to say, the successful development of such self-developing systems would have a major impact not only on all information technologies, but also on society as a whole.

Acknowledgments
This work has been supported in part by the AFOSR, MARCO (via FENA Center), and NSF. Valuable contributions made by Simon Fölling, Özgür Türel and Ibrahim Muckra, as well as useful discussions with P. Adams, J. Barhen, D. Hammerstrom, V. Protopopescu, T. Sejnowski, and D. Strukov, are gratefully acknowledged.

References
[1] Frank, D. J. et al. (2001) Device scaling limits of Si MOSFETs and their application dependencies. Proc. IEEE 89(3): 259-288.
[2] Likharev, K. K. (2003) Electronics below 10 nm, in J. Greer et al. (eds.), Nano and Giga Challenges in Microelectronics, pp. 27-68. Amsterdam: Elsevier.
[3] Likharev, K. K. & Strukov, D. B. (2005) CMOL: Devices, circuits, and architectures, in G. Cuniberti et al. (eds.), Introducing Molecular Electronics, Ch. 16. Berlin: Springer.
[4] Fölling, S., Türel, Ö. & Likharev, K. K. (2001) Single-electron latching switches as nanoscale synapses, in Proc. of the 2001 Int. Joint Conf. on Neural Networks, pp. 216-221. Mount Royal, NJ: Int. Neural Network Society.
[5] Wang, W. et al. (2003) Mechanism of electron conduction in self-assembled alkanethiol monolayer devices. Phys. Rev. B 68(3): 035416 1-8.
[6] Stan, M. et al. (2003) Molecular electronics: From devices and interconnect to circuits and architecture. Proc. IEEE 91(11): 1940-1957.
[7] Strukov, D. B. & Likharev, K. K. (2005) Prospects for terabit-scale nanoelectronic memories. Nanotechnology 16(1): 137-148.
[8] Strukov, D. B. & Likharev, K. K. (2005) CMOL FPGA: A reconfigurable architecture for hybrid digital circuits with two-terminal nanodevices. Nanotechnology 16(6): 888-900.
[9] Türel, Ö. et al. (2004) Neuromorphic architectures for nanoelectronic circuits. Int. J. of Circuit Theory and Appl. 32(5): 277-302.
[10] See, e.g., Hertz, J. et al. (1991) Introduction to the Theory of Neural Computation. Cambridge, MA: Perseus.
[11] Lee, J. H. & Likharev, K. K. (2005) CrossNets as pattern classifiers. Lecture Notes in Computer Science 3575: 434-441.
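The chip-area, frame-time, and CMOS-comparison figures in the CMOL CrossNets estimate above can be re-derived with a few lines of arithmetic. The sketch below is only a consistency check of the quoted numbers: it assumes K = 1024 (the text does not say whether K means 10³ or 2¹⁰), takes the 64×64 μm² crossbar area and the 100 ns / 100 μs timing figures as given, and the function name `cmol_estimates` is hypothetical.

```python
def cmol_estimates(K: int = 1024) -> dict:
    """Re-derive the back-of-envelope CMOL CrossNet figures quoted in the text.

    Assumption: K = 1024; the text does not specify decimal or binary kilo.
    """
    # Chip area: 4x4K interlayer crossbars, each quoted as 64x64 um^2.
    n_crossbars = 4 * 4 * K
    crossbar_area_um2 = 64 * 64
    chip_area_cm2 = n_crossbars * crossbar_area_um2 * 1e-8  # 1 um^2 = 1e-8 cm^2

    # Frame time T = h*w*dt with dt ~ 100 ns and T ~ 100 us gives the implied h*w.
    dt_s = 100e-9
    frame_time_s = 100e-6
    scan_positions = frame_time_s / dt_s

    # CMOS comparison: ~4x4K x 1K x (4K)^2 operations at 30e9 operations per second.
    ops_per_frame = 4 * 4 * K * K * (4 * K) ** 2
    cmos_seconds = ops_per_frame / 30e9

    return {
        "chip_area_cm2": chip_area_cm2,
        "scan_positions": scan_positions,
        "cmos_seconds_per_frame": cmos_seconds,
        "cmos_hours_per_frame": cmos_seconds / 3600,
    }


if __name__ == "__main__":
    for name, value in cmol_estimates().items():
        print(f"{name}: {value:.3g}")
```

Under these assumptions the chip area comes out at about 0.67 cm² (consistent with "~0.6 cm²"), the implied number of scan positions hw is about 10³, and the CMOS processor would need roughly 9.4×10³ s, or about 2.6 hours, per frame, in line with the "≈ 10⁴ seconds ~ 3 hours" estimate.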

same-paper 2 0.94878763 6 nips-2005-A Connectionist Model for Constructive Modal Reasoning

Author: Artur Garcez, Luis C. Lamb, Dov M. Gabbay

3 0.93172264 18 nips-2005-Active Learning For Identifying Function Threshold Boundaries

Author: Brent Bryan, Robert C. Nichol, Christopher R. Genovese, Jeff Schneider, Christopher J. Miller, Larry Wasserman

Abstract: We present an efficient algorithm to actively select queries for learning the boundaries separating a function domain into regions where the function is above and below a given threshold. We develop experiment selection methods based on entropy, misclassification rate, variance, and their combinations, and show how they perform on a number of data sets. We then show how these algorithms are used to determine simultaneously valid 1 − α confidence intervals for seven cosmological parameters. Experimentation shows that the algorithm reduces the computation necessary for the parameter estimation problem by an order of magnitude.
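For the threshold-boundary problem in the abstract above, one generic way to score candidate queries by entropy is to assume a Gaussian predictive distribution N(μ(x), σ(x)²) at each candidate (for example from a Gaussian-process fit, which is an assumption here) and pick the point whose above/below-threshold label is most uncertain. The sketch illustrates only that entropy criterion; it is not the authors' algorithm, and the helper names `prob_above`, `binary_entropy`, and `select_query` are hypothetical.

```python
import math
from typing import Sequence


def prob_above(mu: float, sigma: float, threshold: float) -> float:
    """P(f(x) > threshold) under an assumed Gaussian prediction N(mu, sigma^2)."""
    if sigma <= 0.0:
        return 1.0 if mu > threshold else 0.0
    z = (mu - threshold) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binary_entropy(p: float) -> float:
    """Entropy (in nats) of a Bernoulli(p) variable; zero at p = 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def select_query(mus: Sequence[float], sigmas: Sequence[float], threshold: float) -> int:
    """Index of the candidate whose above/below-threshold label is most uncertain."""
    scores = [binary_entropy(prob_above(m, s, threshold)) for m, s in zip(mus, sigmas)]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Three hypothetical candidates: far below, near, and well above threshold 1.0.
    print(select_query([0.0, 0.9, 2.0], [0.1, 0.3, 0.5], threshold=1.0))  # prints 1
```

Candidates whose predicted value is far from the threshold relative to their uncertainty get near-zero entropy, so queries concentrate along the estimated boundary.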

4 0.91764385 180 nips-2005-Spectral Bounds for Sparse PCA: Exact and Greedy Algorithms

Author: Baback Moghaddam, Yair Weiss, Shai Avidan

Abstract: Sparse PCA seeks approximate sparse “eigenvectors” whose projections capture the maximal variance of data. As a cardinality-constrained and non-convex optimization problem, it is NP-hard and is encountered in a wide range of applied fields, from bio-informatics to finance. Recent progress has focused mainly on continuous approximation and convex relaxation of the hard cardinality constraint. In contrast, we consider an alternative discrete spectral formulation based on variational eigenvalue bounds and provide an effective greedy strategy as well as provably optimal solutions using branch-and-bound search. Moreover, the exact methodology used reveals a simple renormalization step that improves approximate solutions obtained by any continuous method. The resulting performance gain of discrete algorithms is demonstrated on real-world benchmark data and in extensive Monte Carlo evaluation trials.
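The greedy strategy mentioned above can be illustrated with a plain forward search: starting from the empty set, repeatedly add the variable that most increases the largest eigenvalue of the corresponding principal submatrix of the covariance matrix. By Cauchy interlacing this eigenvalue never decreases as variables are added, which is the kind of variational eigenvalue bound such methods exploit. This is a minimal NumPy sketch under that assumption, not the paper's exact greedy or branch-and-bound procedure; `greedy_sparse_pca` is a hypothetical name.

```python
import numpy as np


def greedy_sparse_pca(A: np.ndarray, k: int):
    """Greedy forward selection of k variables maximizing lambda_max(A[S, S]).

    A is assumed to be a symmetric (covariance) matrix. Returns the sorted
    support, the variance captured, and the sparse unit-norm loading vector.
    """
    n = A.shape[0]
    selected = []
    for _ in range(k):
        best_j, best_val = -1, -np.inf
        for j in range(n):
            if j in selected:
                continue
            S = selected + [j]
            # eigvalsh returns eigenvalues in ascending order; take the largest.
            val = np.linalg.eigvalsh(A[np.ix_(S, S)])[-1]
            if val > best_val:
                best_j, best_val = j, val
        selected.append(best_j)

    # Leading eigenvector of the chosen submatrix, embedded back into R^n.
    vals, vecs = np.linalg.eigh(A[np.ix_(selected, selected)])
    x = np.zeros(n)
    x[selected] = vecs[:, -1]
    return sorted(selected), float(vals[-1]), x


if __name__ == "__main__":
    # Hypothetical covariance: variables 0 and 1 strongly correlated, 2 and 3 independent.
    A = np.array([[1.0, 0.9, 0.0, 0.0],
                  [0.9, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 0.5, 0.0],
                  [0.0, 0.0, 0.0, 0.3]])
    support, variance, x = greedy_sparse_pca(A, k=2)
    print(support, round(variance, 3))  # [0, 1] 1.9
```

Each step costs one small symmetric eigendecomposition per remaining candidate, which is why exact search over all supports quickly becomes impractical and motivates bounds for pruning.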

5 0.85578108 115 nips-2005-Learning Shared Latent Structure for Image Synthesis and Robotic Imitation

Author: Aaron Shon, Keith Grochow, Aaron Hertzmann, Rajesh P. Rao

Abstract: We propose an algorithm that uses Gaussian process regression to learn common hidden structure shared between corresponding sets of heterogeneous observations. The observation spaces are linked via a single, reduced-dimensionality latent variable space. We present results from two datasets demonstrating the algorithm’s ability to synthesize novel data from learned correspondences. We first show that the method can learn the nonlinear mapping between corresponding views of objects, filling in missing data as needed to synthesize novel views. We then show that the method can learn a mapping between human degrees of freedom and robotic degrees of freedom for a humanoid robot, allowing robotic imitation of human poses from motion capture data.

6 0.6316365 200 nips-2005-Variable KD-Tree Algorithms for Spatial Pattern Search and Discovery

7 0.58267158 181 nips-2005-Spiking Inputs to a Winner-take-all Network

8 0.5521751 149 nips-2005-Optimal cue selection strategy

9 0.51851606 96 nips-2005-Inference with Minimal Communication: a Decision-Theoretic Variational Approach

10 0.51145941 72 nips-2005-Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation

11 0.50758958 99 nips-2005-Integrate-and-Fire models with adaptation are good enough

12 0.50021195 169 nips-2005-Saliency Based on Information Maximization

13 0.49600363 20 nips-2005-Affine Structure From Sound

14 0.49427518 68 nips-2005-Factorial Switching Kalman Filters for Condition Monitoring in Neonatal Intensive Care

15 0.48723072 67 nips-2005-Extracting Dynamical Structure Embedded in Neural Activity

16 0.4850466 93 nips-2005-Ideal Observers for Detecting Motion: Correspondence Noise

17 0.48130602 9 nips-2005-A Domain Decomposition Method for Fast Manifold Learning

18 0.47243136 163 nips-2005-Recovery of Jointly Sparse Signals from Few Random Projections

19 0.47098774 167 nips-2005-Robust design of biological experiments

20 0.46825048 187 nips-2005-Temporal Abstraction in Temporal-difference Networks