nips nips2003 nips2003-25 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Woojae Kim, Daniel J. Navarro, Mark A. Pitt, In J. Myung
Abstract: Despite the popularity of connectionist models in cognitive science, their performance can often be difficult to evaluate. Inspired by the geometric approach to statistical model selection, we introduce a conceptually similar method to examine the global behavior of a connectionist model, by counting the number and types of response patterns it can simulate. The Markov Chain Monte Carlo-based algorithm that we constructed finds these patterns efficiently. We demonstrate the approach using two localist network models of speech perception. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Despite the popularity of connectionist models in cognitive science, their performance can often be difficult to evaluate. [sent-8, score-0.268]
2 Inspired by the geometric approach to statistical model selection, we introduce a conceptually similar method to examine the global behavior of a connectionist model, by counting the number and types of response patterns it can simulate. [sent-9, score-0.432]
3 The Markov Chain Monte Carlo-based algorithm that we constructed finds these patterns efficiently. [sent-10, score-0.122]
4 We demonstrate the approach using two localist network models of speech perception. [sent-11, score-0.214]
5 1 Introduction Connectionist models are popular in some areas of cognitive science, especially language processing. [sent-12, score-0.093]
6 For example, levels of mental representation can be mapped onto layers of nodes in a connectionist network. [sent-14, score-0.235]
7 Although this sort of modeling has enriched our understanding of human cognition, the consequences of the choices made in the design of a model can be difficult to evaluate. [sent-19, score-0.066]
8 While good simulation performance is assumed to support the model and its underlying principles, a drawback of this testing methodology is that it can obscure the role played by a model’s complexity and other reasons why a competing model might simulate human data equally well. [sent-20, score-0.136]
9 A great deal of progress has been made in solving it for statistical models (i. [sent-22, score-0.076]
10 The current paper introduces a technique that can be used to assist in evaluating and choosing between connectionist models of cognition. [sent-28, score-0.228]
11 2 A Complexity Measure for Connectionist Models The ability of a connectionist model to simulate human performance well does not provide conclusive evidence that the network architecture is a good approximation to the human cognitive system that generated the data. [sent-29, score-0.402]
12 Accordingly, we need a “global” view of the model’s behavior to discover all of the qualitatively different patterns it can simulate. [sent-31, score-0.122]
13 A model’s ability to reproduce diverse patterns of data is known as its complexity, an intrinsic property of a model that arises from the interaction between its parameters and functional form. [sent-32, score-0.147]
14 Unfortunately, geometric complexity cannot be applied to connectionist models, because these models rarely possess a likelihood function, much less a well-deÞned Fisher information matrix. [sent-36, score-0.31]
15 A conceptually simple solution to the problem, albeit a computationally demanding one, is first to discretize the data space in some properly defined sense and then to identify all of the data patterns a connectionist model can generate. [sent-40, score-0.32]
16 This approach provides the desired global view of the model’s capabilities and its definition resembles that of geometric complexity: the complexity of a connectionist model is defined in terms of the number of discrete data patterns the model can produce. [sent-41, score-0.379]
17 As such, this reparametrization-invariant complexity measure can be used for virtually all types of network models provided that the discretization of the data space is both justifiable and meaningful. [sent-42, score-0.18]
18 Only a small fraction of these might correspond to a model’s predictions, so it is essential to use an efficient search algorithm, one that will find most or all of these patterns in a reasonable time. [sent-44, score-0.179]
19 It is tailored to exploit the kinds of search spaces that we suspect are typical of localist connectionist models, and we evaluate its performance on two of them. [sent-46, score-0.332]
20 3 Localist Models of Phoneme Perception A central issue in the field of human speech perception is how lexical knowledge influences the perception of speech sounds. [sent-47, score-0.498]
21 That is, how does knowing the word you are hearing influence how you hear the smaller units that make up the word (i.e., the phonemes)? [sent-48, score-0.124]
22 Two localist models have been proposed that represent opposing theoretical positions. [sent-51, score-0.153]
23 Both models were motivated by different theoretical principles. [Figure 1: Network architectures for trace (left: Lexical Layer, Phoneme Layer) and merge (right: Lexical Layer, Phoneme Input, Phoneme Decision).] [sent-52, score-0.725]
24 Arrows indicate excitatory connections between layers; lines with dots indicate inhibitory connections within layers. [sent-53, score-0.111]
25 Proponents of trace [5] argue for bi-directional communication between layers whereas proponents of merge [6] argue against it. [sent-55, score-0.746]
26 At the heart of the controversy is whether activation also flows in the reverse direction, directly affecting how the phonemic input is processed. [sent-60, score-0.126]
27 Instead, the processing performed at the phoneme level in merge is split in two, with an input stage and a phoneme decision stage. [sent-63, score-1.422]
28 The second, lexical layer cannot directly affect phoneme activation. [sent-64, score-0.836]
29 Instead, the two sources of information (phonemic and lexical) are integrated only at the phoneme decision stage. [sent-65, score-0.552]
30 Although the precise details of the models are unnecessary for the purposes of this paper, it will be useful to sketch a few of their technical details. [sent-66, score-0.078]
31 The parameters for the models (denoted θ), of which trace has 7 and merge has 11, correspond to the strength of the excitatory and inhibitory connections between nodes, both within and between layers. [sent-67, score-0.786]
32 In our formulation, a parameter set θ was considered valid only if the final state satisfied certain decision criteria (discussed shortly). [sent-69, score-0.113]
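As a rough illustration of this step, the Python sketch below uses hypothetical function names, a generic interactive-activation style update, and made-up threshold and margin values; it is not the actual trace or merge dynamics or decision criteria, only a sketch of how a parameter set θ might be settled to a final state and then checked for validity.

```python
import numpy as np

def settle(weights, external_input, steps=200, decay=0.1):
    """Iterate a generic localist network to a (near-)steady final state.

    `weights` is a signed connection matrix built from the parameter set theta
    (excitatory entries > 0, inhibitory < 0); `external_input` drives the
    phoneme input nodes. This is a generic interactive-activation style update,
    not the published trace/merge equations.
    """
    act = np.zeros(weights.shape[0])
    for _ in range(steps):
        net = weights @ act + external_input            # net input to every node
        act = np.clip((1.0 - decay) * act + net, 0.0, 1.0)
    return act

def meets_decision_criteria(final_act, decision_nodes, threshold=0.5, margin=0.1):
    """Illustrative criterion: theta counts as valid only if one decision node
    is clearly above threshold and clearly ahead of its competitors."""
    levels = np.sort(final_act[decision_nodes])
    return levels[-1] >= threshold and (levels[-1] - levels[-2]) >= margin
```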
33 Despite the differences in motivation, trace and merge are comparable in their ability to simulate key experimental findings [6], making it quite challenging if not impossible to distinguish between them experimentally. [sent-71, score-0.72]
34 In the experiments, monosyllabic words were presented in which the last phoneme from one word was partially replaced by one from another word (through digital editing) to create word blends that retained residual information about the identity of the phoneme from both words. [sent-76, score-1.162]
35 The six types of blends are listed on the left of Table 1. [sent-77, score-0.103]
36 Listeners had to categorize the last phoneme in one task (phoneme decision) and categorize the entire utterance as a word or a nonsense word in the other task (lexical decision). [sent-78, score-0.662]
37 Three response choices were used in lexical decision to test the models’ ability to distinguish between words, not just words and nonwords. [sent-80, score-0.486]
38 The asterisks in each cell indicate the responses that listeners chose most often. [sent-81, score-0.109]
39 Both trace and merge can simulate this pattern of responses. [sent-82, score-0.733]
40 [Table 1: The six blend conditions (bB, gB, vB, zZ, gZ, vZ) with example stimuli (JOb + joB, JOg + joB, JOv + joB, JOz + joZ, JOg + joZ, JOv + joZ), the phonemic decision alternatives (/b/, /g/, /z/, /v/), and the lexical decision alternatives (job, jog, nonword); asterisks mark the responses listeners chose most often.] Table 2: Two sets of decision rules for trace and merge. [sent-85, score-0.707]
41 The values shown correspond to activation levels of the appropriate decision node. [sent-86, score-0.11]
42 We applied two different sets of decision rules, listed in Table 2, and were interested in determining how many patterns (besides the human-like pattern) each model can generate. [sent-110, score-0.244]
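To make the mapping from parameters to discrete data patterns concrete, here is a hedged Python sketch. `model` and `rules` are placeholders for whichever simulation and decision-rule code is being analysed, and the condition names and response alternatives follow Table 1; nothing here reproduces the paper's actual implementation.

```python
CONDITIONS = ["bB", "gB", "vB", "zZ", "gZ", "vZ"]      # the six blend conditions
PHONEMES   = ["/b/", "/g/", "/z/", "/v/"]              # phoneme decision alternatives
LEXICAL    = ["job", "jog", "nonword"]                 # lexical decision alternatives

def response_pattern(model, theta, rules):
    """Map one parameter set theta to a discrete data pattern, or None.

    `model(theta, condition)` returns the network's final activations for that
    condition; `rules` turns those activations into a winning phoneme and
    lexical response, or None when the decision criteria are not met.
    """
    pattern = []
    for condition in CONDITIONS:
        final_act = model(theta, condition)
        phoneme = rules.phoneme_choice(final_act)      # one of PHONEMES or None
        lexical = rules.lexical_choice(final_act)      # one of LEXICAL or None
        if phoneme is None or lexical is None:
            return None                                # theta yields no valid pattern
        pattern.append((phoneme, lexical))
    return tuple(pattern)                              # hashable, so patterns can be collected in a set
```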
43 4 The Search Algorithm The search problem that we need to solve differs from the standard Monte Carlo counting problem. [sent-112, score-0.107]
44 Ordinarily, Monte Carlo methods are used to discover how much of the search space is covered by some region by counting how often co-ordinates are sampled from that region. [sent-113, score-0.153]
45 In our problem, a high-dimensional parameter space has been partitioned into an unknown number of regions, with each region corresponding to a single data pattern. [sent-114, score-0.075]
46 The task is to find all such regions irrespective of their size. [sent-115, score-0.129]
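For contrast, a plain Monte Carlo count is sketched below in Python; `pattern_of` is a hypothetical function mapping θ to its discrete pattern (for example, a partial application of `response_pattern` from the earlier sketch). Such a count estimates how much of the space each region covers, but it almost never lands in the tiny regions, which is exactly why it is unsuitable here.

```python
import numpy as np

def naive_coverage(pattern_of, n_params, n_samples=100_000, seed=0):
    """Sample theta uniformly and tally how often each pattern appears.

    This estimates region sizes well, but regions occupying a minute fraction
    of the space are essentially never discovered, however long it runs.
    """
    rng = np.random.default_rng(seed)
    counts = {}
    for _ in range(n_samples):
        pat = pattern_of(rng.random(n_params))   # uniform draw from the unit hypercube
        if pat is not None:                      # skip draws that meet no decision criteria
            counts[pat] = counts.get(pat, 0) + 1
    return counts
```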
47 Firstly, on many occasions the network does not converge on a state that meets the decision criteria, so some proportion of the parameter space does not correspond to any data pattern. [sent-122, score-0.17]
48 Secondly, the sizes of the regions vary a great deal. [sent-123, score-0.12]
49 Some data patterns are elicited by a wide range of parameter values, whereas others can be produced only by a small range of values. [sent-124, score-0.179]
50 In these models, there are likely to be regions where the model consistently chooses the dominant phoneme and makes the correspondingly appropriate lexical decision. [sent-126, score-0.881]
51 However, there will also be large regions in which the models always choose “nonword” irrespective of the input. [Figure 2: A parameter space with “grainy” structure.] [sent-127, score-0.179]
52 Each region corresponds to a single data pattern that the model can generate. [sent-128, score-0.084]
53 Regions vary in size, and small regions cluster together. [sent-129, score-0.097]
54 Along the borders between regions, however, there might be lots of smaller “transition regions”, and these regions will tend to be near one another. [sent-131, score-0.097]
55 The consequence of this structure is that the size of the region in which the process is currently located provides extensive information about the number of regions that are likely to lie nearby. [sent-132, score-0.143]
56 In a small region, there will probably be other small regions nearby, so a fine-grained search is required in order to find them. [sent-133, score-0.154]
57 However, a fine-grained search process will get stuck in a large region, taking tiny steps when great leaps are required. [sent-134, score-0.08]
58 Our algorithm exploits this structure by using MCMC to estimate a different parameter sampling distribution p(θ|ri) for every region ri that it encounters, and then cycling through these distributions in order to sample parameter sets. [sent-135, score-0.104]
59 We specify a uniform jumping distribution over a small hypersphere centered on the current point θ in the parameter space, accepting candidate points if and only if they produce the same pattern as θ. [sent-148, score-0.094]
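A minimal Python sketch of this scheme follows; the proposal radius, outer jump scale, and iteration counts are illustrative assumptions rather than the settings used in the paper, and `pattern_of` is the same hypothetical θ-to-pattern function assumed above (the initial θ is assumed to produce a valid pattern).

```python
import numpy as np

def sample_within_region(pattern_of, theta0, radius=0.05, n_steps=500, seed=0):
    """Random walk confined to one region: propose uniformly from a small
    hypersphere around the current point and accept iff the candidate produces
    the same discrete pattern. The retained points form an empirical estimate
    of p(theta | r_i)."""
    rng = np.random.default_rng(seed)
    target = pattern_of(theta0)
    theta = np.asarray(theta0, dtype=float)
    samples = [theta.copy()]
    for _ in range(n_steps):
        direction = rng.normal(size=theta.shape)
        direction /= np.linalg.norm(direction)
        candidate = theta + radius * rng.random() ** (1.0 / theta.size) * direction
        if pattern_of(candidate) == target:          # accept only same-pattern moves
            theta = candidate
        samples.append(theta.copy())
    return samples

def discover_regions(pattern_of, theta_init, n_sweeps=200, jump_scale=0.1, seed=0):
    """Cycle through the regions found so far, perturbing their stored samples
    to probe for neighbouring regions; every region gets equal attention,
    irrespective of its size."""
    rng = np.random.default_rng(seed)
    regions = {pattern_of(theta_init): sample_within_region(pattern_of, theta_init)}
    for _ in range(n_sweeps):
        for pat in list(regions):                                    # equal time per region
            base = regions[pat][rng.integers(len(regions[pat]))]
            probe = base + rng.normal(scale=jump_scale, size=base.shape)
            new_pat = pattern_of(probe)
            if new_pat is not None and new_pat not in regions:       # an unseen region
                regions[new_pat] = sample_within_region(pattern_of, probe)
    return regions
```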
60 Unlike SMC (or even a more standard application of MCMC), our algorithm has the desirable property that it focuses on each region in equal proportion, irrespective of its size. [sent-152, score-0.078]
61 Consequently, we search primarily along the edges of the regions that we have already discovered, paying closer attention to the small regions. [sent-154, score-0.18]
62 The first test applied both to a simple toy problem possessing a grainy structure. [sent-157, score-0.08]
63 Inside a hypercube [0, 1]^d, an assortment of large and small regions (also hypercubes) were defined using unevenly spaced grids so that all the regions neighbored each other (d ranged from 3 to 6). [sent-158, score-0.194]
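As a rough sketch of such a toy problem (the cut positions below are random placeholders; the paper's exact grids and region counts are not reproduced), the regions and the point-to-region mapping could look like this, with `region_of` playing the role of `pattern_of` in the earlier sketches.

```python
import numpy as np

def make_toy_problem(d=3, cells_per_dim=4, seed=0):
    """[0,1]^d cut by unevenly spaced axis-aligned grids into hyper-rectangular
    regions of very different sizes, all of which neighbour one another."""
    rng = np.random.default_rng(seed)
    cuts = [np.sort(rng.random(cells_per_dim - 1)) for _ in range(d)]  # interior cut points per axis

    def region_of(theta):
        # The tuple of per-axis cell indices plays the role of a discrete "data pattern".
        return tuple(int(np.searchsorted(cuts[i], theta[i])) for i in range(d))

    return region_of

region_of = make_toy_problem(d=3)
region_of([0.05, 0.5, 0.95])   # some cell index such as (0, 2, 3), depending on the random cuts
```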
64 Overall, the MCMC-based algorithm is slower than SMC at the beginning of the search due to the time required for region estimation. [sent-161, score-0.103]
65 However, the time required to learn the structure of the parameter space is time well spent because the search becomes more efficient and successful, paying large dividends in time and accuracy in the end. [sent-162, score-0.112]
66 In one reduced model, for instance, only phoneme responses were considered. [sent-164, score-0.505]
67 Weak and strong constraints (Table 2) were imposed on both models. [sent-166, score-0.076]
68 In all cases, MCMC found at least as many patterns as SMC, and all SMC patterns were among the MCMC patterns. [sent-167, score-0.244]
69 6 Application to Models of Phoneme Perception Next we ran the search algorithm on the full versions of trace and merge, using both the strong and weak constraints (Table 2). [sent-168, score-0.43]
70 The number of patterns discovered in each case is summarized in Figure 3. [sent-169, score-0.152]
71 In this experimental design merge is more complex than trace, although the extent of this effect is somewhat dependent on the choice of constraints. [sent-170, score-0.402]
72 When strong constraints are applied trace (27 patterns) is nested within merge (67 patterns), which produces 148% more patterns. [sent-171, score-0.722]
73 However, when these constraints are eased, the nesting relationship disappears, and merge (73 patterns) produces only 40% more patterns than trace (52 patterns). [sent-172, score-0.809]
74 Nevertheless, it is noteworthy that the behavior of each is highly constrained, producing less than 100 of the 4^6 × 3^6 = 2,985,984 patterns available. [sent-173, score-0.122]
75 Also, for both models (under both sets of constraints), the vast majority of the parameter space was occupied by only a few patterns. [sent-174, score-0.108]
76 To answer this, we classified every data pattern in terms of the number of mismatches from the human-like pattern (from 0 to 12), and counted how frequently the model patterns fell into each class. [sent-176, score-0.388]
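A small Python sketch of this classification, assuming each pattern is stored as six (phoneme, lexical) response pairs as in the earlier examples and that `human_pattern` holds the human-like pattern of Table 1; the variable names are hypothetical.

```python
from collections import Counter

def mismatch_profile(model_patterns, human_pattern):
    """Classify each discovered pattern by how many of its 12 responses
    (one phoneme and one lexical decision per condition) differ from the
    human-like pattern, and tally the frequency of each mismatch count."""
    def mismatches(pattern):
        return sum(resp != human_resp
                   for cond, human_cond in zip(pattern, human_pattern)
                   for resp, human_resp in zip(cond, human_cond))
    return Counter(mismatches(p) for p in model_patterns)

# Hypothetical usage: `trace_patterns` is the set of patterns returned by the search.
# profile = mismatch_profile(trace_patterns, human_pattern)
# profile[0] counts patterns identical to the human-like one; profile[12] the maximally different ones.
```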
77 The choice of constraints had little effect, and in both cases the trace distribution (open circles) is a little closer to the human-like pattern than the merge distribution (closed circles). [sent-178, score-0.725]
78 Even so, both models are remarkably human-like when considered in light of the distribution of all possible patterns (cross hairs). [sent-179, score-0.175]
79 In fact, the probability is virtually zero that a “random model” (consisting of a random sample of patterns) would display such a low mismatch frequency. [sent-180, score-0.116]
80 Building on this analysis, we looked for qualitative differences in the types of mismatches made by each model. [sent-181, score-0.215]
81 Since the choice of constraints made no difference, Figure 5 shows the mismatch profiles under weak constraints. [sent-182, score-0.182]
82 Both models produce no mismatches in some conditions (e. [sent-183, score-0.27]
83 Interestingly, trace and merge produce similar mismatch profiles for lexical decision, and a comparable number of mismatches (108 vs. [sent-188, score-0.268]
84 However, striking qualitative differences are evident for [Figure 3: Venn diagrams showing the number of patterns discovered for both models (trace, merge) under both types of constraint, weak and strong.] [sent-190, score-0.23]
85 phoneme decisions, with merge producing mismatches in conditions that trace does not (e. [sent-197, score-1.304]
86 When the two graphs are compared, an asymmetry is evident in the frequency of mismatches across tasks: merge makes phonemic mismatches with about the same frequency as lexical errors (139 vs. [sent-200, score-1.227]
87 124), whereas trace does so less than half as often (56 vs. [sent-201, score-0.244]
88 The mismatch asymmetry accords nicely with the architectures shown in Figure 1. [sent-203, score-0.143]
89 The two models make lexical decisions in an almost identical manner: phonemic information feeds into the lexical decision layer, from which a decision is made. [sent-204, score-0.981]
90 It should then come as no surprise that lexical processing in trace and merge is so similar. [sent-205, score-0.962]
91 In contrast, phoneme processing is split between two layers in merge but confined to one in trace. [sent-206, score-0.93]
92 The two layers dedicated to phoneme processing provide merge an added degree of flexibility (i. [sent-207, score-0.93]
93 This shows up in many ways, not just in merge’s ability to produce mismatches in more conditions than trace. [sent-210, score-0.242]
94 For example, these mismatches yield a wider range of phoneme responses. [sent-211, score-0.658]
95 Shown above each bar in Figure 5 is the phoneme that was misrecognized in the given condition. [sent-212, score-0.508]
96 trace only misrecognized the phoneme as /g/ whereas merge misrecognized it as /g/, /z/, and /v/. [sent-213, score-1.154]
97 , fit) alone, this additional complexity is unnecessary for modeling phoneme perception because the simpler architecture of trace simulates human data as well as merge. [sent-217, score-0.862]
98 7 Conclusions The results of this preliminary evaluation suggest that the MCMC-based algorithm is a promising method for comparing connectionist models. [sent-221, score-0.175]
99 Although it was developed to compare localist models like trace and merge, it may be broadly applicable whenever the search space exhibits this “grainy” structure. [sent-222, score-0.454]
100 Indeed, the algorithm could be a general tool for designing, comparing, and evaluating connectionist models of human cognition. [sent-223, score-0.27]
wordName wordTfidf (topN-words)
[('phoneme', 0.468), ('merge', 0.402), ('lexical', 0.316), ('trace', 0.244), ('jog', 0.2), ('mismatches', 0.19), ('connectionist', 0.175), ('patterns', 0.122), ('jri', 0.12), ('vz', 0.12), ('job', 0.119), ('smc', 0.111), ('localist', 0.1), ('phonemic', 0.1), ('regions', 0.097), ('nw', 0.095), ('vb', 0.095), ('mcmc', 0.093), ('di', 0.093), ('mismatch', 0.088), ('gz', 0.087), ('zz', 0.087), ('decision', 0.084), ('grainy', 0.08), ('joz', 0.08), ('bb', 0.079), ('gb', 0.074), ('word', 0.062), ('layers', 0.06), ('nonword', 0.06), ('search', 0.057), ('weak', 0.053), ('models', 0.053), ('erences', 0.053), ('layer', 0.052), ('rissanen', 0.052), ('counting', 0.05), ('simulate', 0.049), ('ellipsoid', 0.047), ('monte', 0.047), ('region', 0.046), ('complexity', 0.045), ('human', 0.042), ('constraints', 0.041), ('asterisks', 0.04), ('balasubramanian', 0.04), ('blends', 0.04), ('jov', 0.04), ('misrecognized', 0.04), ('myung', 0.04), ('navarro', 0.04), ('pitt', 0.04), ('proponents', 0.04), ('cognitive', 0.04), ('erent', 0.04), ('pattern', 0.038), ('perception', 0.038), ('carlo', 0.038), ('listed', 0.038), ('responses', 0.037), ('geometric', 0.037), ('strong', 0.035), ('ohio', 0.035), ('categorize', 0.035), ('phonemes', 0.035), ('excitatory', 0.033), ('speech', 0.032), ('fisher', 0.032), ('irrespective', 0.032), ('listeners', 0.032), ('schematically', 0.032), ('pro', 0.031), ('discovered', 0.03), ('inhibitory', 0.03), ('phonetic', 0.029), ('asymmetry', 0.029), ('parameter', 0.029), ('network', 0.029), ('psychology', 0.028), ('others', 0.028), ('ows', 0.028), ('virtually', 0.028), ('proportion', 0.028), ('decisions', 0.028), ('les', 0.028), ('produce', 0.027), ('paying', 0.026), ('architectures', 0.026), ('vast', 0.026), ('activation', 0.026), ('ability', 0.025), ('unnecessary', 0.025), ('kim', 0.025), ('types', 0.025), ('connections', 0.024), ('choices', 0.024), ('table', 0.024), ('great', 0.023), ('conceptually', 0.023), ('chain', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 25 nips-2003-An MCMC-Based Method of Comparing Connectionist Models in Cognitive Science
Author: Woojae Kim, Daniel J. Navarro, Mark A. Pitt, In J. Myung
Abstract: Despite the popularity of connectionist models in cognitive science, their performance can often be difficult to evaluate. Inspired by the geometric approach to statistical model selection, we introduce a conceptually similar method to examine the global behavior of a connectionist model, by counting the number and types of response patterns it can simulate. The Markov Chain Monte Carlo-based algorithm that we constructed finds these patterns efficiently. We demonstrate the approach using two localist network models of speech perception. 1
2 0.062786132 173 nips-2003-Semi-supervised Protein Classification Using Cluster Kernels
Author: Jason Weston, Dengyong Zhou, André Elisseeff, William S. Noble, Christina S. Leslie
Abstract: A key issue in supervised protein classification is the representation of input sequences of amino acids. Recent work using string kernels for protein data has achieved state-of-the-art classification performance. However, such representations are based only on labeled data — examples with known 3D structures, organized into structural classes — while in practice, unlabeled data is far more plentiful. In this work, we develop simple and scalable cluster kernel techniques for incorporating unlabeled data into the representation of protein sequences. We show that our methods greatly improve the classification performance of string kernels and outperform standard approaches for using unlabeled data, such as adding close homologs of the positive examples to the training data. We achieve equal or superior performance to previously presented cluster kernel methods while achieving far greater computational efficiency. 1
3 0.056927204 185 nips-2003-The Doubly Balanced Network of Spiking Neurons: A Memory Model with High Capacity
Author: Yuval Aviel, David Horn, Moshe Abeles
Abstract: A balanced network leads to contradictory constraints on memory models, as exemplified in previous work on accommodation of synfire chains. Here we show that these constraints can be overcome by introducing a 'shadow' inhibitory pattern for each excitatory pattern of the model. This is interpreted as a double-balance principle, whereby there exists both global balance between average excitatory and inhibitory currents and local balance between the currents carrying coherent activity at any given time frame. This principle can be applied to networks with Hebbian cell assemblies, leading to a high capacity of the associative memory. The number of possible patterns is limited by a combinatorial constraint that turns out to be P=0.06N within the specific model that we employ. This limit is reached by the Hebbian cell assembly network. To the best of our knowledge this is the first time that such high memory capacities are demonstrated in the asynchronous state of models of spiking neurons. 1
4 0.051827088 196 nips-2003-Wormholes Improve Contrastive Divergence
Author: Max Welling, Andriy Mnih, Geoffrey E. Hinton
Abstract: In models that define probabilities via energies, maximum likelihood learning typically involves using Markov Chain Monte Carlo to sample from the model’s distribution. If the Markov chain is started at the data distribution, learning often works well even if the chain is only run for a few time steps [3]. But if the data distribution contains modes separated by regions of very low density, brief MCMC will not ensure that different modes have the correct relative energies because it cannot move particles from one mode to another. We show how to improve brief MCMC by allowing long-range moves that are suggested by the data distribution. If the model is approximately correct, these long-range moves have a reasonable acceptance rate.
5 0.050513111 48 nips-2003-Convex Methods for Transduction
Author: Tijl D. Bie, Nello Cristianini
Abstract: The 2-class transduction problem, as formulated by Vapnik [1], involves finding a separating hyperplane for a labelled data set that is also maximally distant from a given set of unlabelled test points. In this form, the problem has exponential computational complexity in the size of the working set. So far it has been attacked by means of integer programming techniques [2] that do not scale to reasonable problem sizes, or by local search procedures [3]. In this paper we present a relaxation of this task based on semidefinite programming (SDP), resulting in a convex optimization problem that has polynomial complexity in the size of the data set. The results are very encouraging for mid sized data sets, however the cost is still too high for large scale problems, due to the high dimensional search space. To this end, we restrict the feasible region by introducing an approximation based on solving an eigenproblem. With this approximation, the computational cost of the algorithm is such that problems with more than 1000 points can be treated. 1
6 0.047571469 126 nips-2003-Measure Based Regularization
7 0.046605442 191 nips-2003-Unsupervised Context Sensitive Language Acquisition from a Large Corpus
8 0.046205241 110 nips-2003-Learning a World Model and Planning with a Self-Organizing, Dynamic Neural System
9 0.042988978 6 nips-2003-A Fast Multi-Resolution Method for Detection of Significant Spatial Disease Clusters
10 0.042313632 12 nips-2003-A Model for Learning the Semantics of Pictures
11 0.042278402 142 nips-2003-On the Concentration of Expectation and Approximate Inference in Layered Networks
12 0.040559143 70 nips-2003-Fast Algorithms for Large-State-Space HMMs with Applications to Web Usage Analysis
13 0.038319778 166 nips-2003-Reconstructing MEG Sources with Unknown Correlations
14 0.038158737 43 nips-2003-Bounded Invariance and the Formation of Place Fields
15 0.036744785 177 nips-2003-Simplicial Mixtures of Markov Chains: Distributed Modelling of Dynamic User Profiles
16 0.036419127 139 nips-2003-Nonlinear Filtering of Electron Micrographs by Means of Support Vector Regression
17 0.035744406 8 nips-2003-A Holistic Approach to Compositional Semantics: a connectionist model and robot experiments
18 0.035675921 157 nips-2003-Plasticity Kernels and Temporal Statistics
19 0.034939963 176 nips-2003-Sequential Bayesian Kernel Regression
20 0.034423873 51 nips-2003-Design of Experiments via Information Theory
topicId topicWeight
[(0, -0.134), (1, 0.009), (2, 0.038), (3, 0.005), (4, -0.021), (5, -0.014), (6, 0.022), (7, 0.003), (8, 0.01), (9, 0.029), (10, 0.041), (11, -0.009), (12, 0.01), (13, -0.006), (14, 0.06), (15, -0.008), (16, -0.028), (17, -0.004), (18, 0.047), (19, -0.01), (20, -0.016), (21, -0.012), (22, -0.145), (23, 0.008), (24, -0.103), (25, -0.014), (26, -0.008), (27, 0.061), (28, -0.033), (29, 0.006), (30, 0.072), (31, 0.014), (32, 0.002), (33, 0.079), (34, 0.037), (35, -0.02), (36, 0.025), (37, -0.167), (38, -0.076), (39, -0.019), (40, -0.031), (41, -0.005), (42, -0.104), (43, 0.062), (44, -0.135), (45, 0.025), (46, -0.093), (47, 0.076), (48, -0.089), (49, -0.009)]
simIndex simValue paperId paperTitle
same-paper 1 0.93659812 25 nips-2003-An MCMC-Based Method of Comparing Connectionist Models in Cognitive Science
Author: Woojae Kim, Daniel J. Navarro, Mark A. Pitt, In J. Myung
Abstract: Despite the popularity of connectionist models in cognitive science, their performance can often be difficult to evaluate. Inspired by the geometric approach to statistical model selection, we introduce a conceptually similar method to examine the global behavior of a connectionist model, by counting the number and types of response patterns it can simulate. The Markov Chain Monte Carlo-based algorithm that we constructed finds these patterns efficiently. We demonstrate the approach using two localist network models of speech perception. 1
2 0.62675577 196 nips-2003-Wormholes Improve Contrastive Divergence
Author: Max Welling, Andriy Mnih, Geoffrey E. Hinton
Abstract: In models that define probabilities via energies, maximum likelihood learning typically involves using Markov Chain Monte Carlo to sample from the model’s distribution. If the Markov chain is started at the data distribution, learning often works well even if the chain is only run for a few time steps [3]. But if the data distribution contains modes separated by regions of very low density, brief MCMC will not ensure that different modes have the correct relative energies because it cannot move particles from one mode to another. We show how to improve brief MCMC by allowing long-range moves that are suggested by the data distribution. If the model is approximately correct, these long-range moves have a reasonable acceptance rate.
3 0.5676893 191 nips-2003-Unsupervised Context Sensitive Language Acquisition from a Large Corpus
Author: Zach Solan, David Horn, Eytan Ruppin, Shimon Edelman
Abstract: We describe a pattern acquisition algorithm that learns, in an unsupervised fashion, a streamlined representation of linguistic structures from a plain natural-language corpus. This paper addresses the issues of learning structured knowledge from a large-scale natural language data set, and of generalization to unseen text. The implemented algorithm represents sentences as paths on a graph whose vertices are words (or parts of words). Significant patterns, determined by recursive context-sensitive statistical inference, form new vertices. Linguistic constructions are represented by trees composed of significant patterns and their associated equivalence classes. An input module allows the algorithm to be subjected to a standard test of English as a Second Language (ESL) proficiency. The results are encouraging: the model attains a level of performance considered to be “intermediate” for 9th-grade students, despite having been trained on a corpus (CHILDES) containing transcribed speech of parents directed to small children. 1
4 0.5547123 175 nips-2003-Sensory Modality Segregation
Author: Virginia Sa
Abstract: Why are sensory modalities segregated the way they are? In this paper we show that sensory modalities are well designed for self-supervised cross-modal learning. Using the Minimizing-Disagreement algorithm on an unsupervised speech categorization task with visual (moving lips) and auditory (sound signal) inputs, we show that very informative auditory dimensions actually harm performance when moved to the visual side of the network. It is better to throw them away than to consider them part of the “visual input”. We explain this finding in terms of the statistical structure in sensory inputs. 1
5 0.54823923 6 nips-2003-A Fast Multi-Resolution Method for Detection of Significant Spatial Disease Clusters
Author: Daniel B. Neill, Andrew W. Moore
Abstract: Given an N ×N grid of squares, where each square has a count and an underlying population, our goal is to find the square region with the highest density, and to calculate its significance by randomization. Any density measure D, dependent on the total count and total population of a region, can be used. For example, if each count represents the number of disease cases occurring in that square, we can use Kulldorff’s spatial scan statistic DK to find the most significant spatial disease cluster. A naive approach to finding the maximum density region requires O(N 3 ) time, and is generally computationally infeasible. We present a novel algorithm which partitions the grid into overlapping regions, bounds the maximum score of subregions contained in each region, and prunes regions which cannot contain the maximum density region. For sufficiently dense regions, this method finds the maximum density region in optimal O(N 2 ) time, in practice resulting in significant (10-200x) speedups. 1
6 0.48272085 75 nips-2003-From Algorithmic to Subjective Randomness
7 0.4747844 187 nips-2003-Training a Quantum Neural Network
8 0.46678928 165 nips-2003-Reasoning about Time and Knowledge in Neural Symbolic Learning Systems
9 0.44278815 185 nips-2003-The Doubly Balanced Network of Spiking Neurons: A Memory Model with High Capacity
10 0.43694457 8 nips-2003-A Holistic Approach to Compositional Semantics: a connectionist model and robot experiments
11 0.42469877 173 nips-2003-Semi-supervised Protein Classification Using Cluster Kernels
12 0.40653762 147 nips-2003-Online Learning via Global Feedback for Phrase Recognition
13 0.40195391 181 nips-2003-Statistical Debugging of Sampled Programs
14 0.4008427 126 nips-2003-Measure Based Regularization
15 0.38245508 184 nips-2003-The Diffusion-Limited Biochemical Signal-Relay Channel
16 0.37356865 123 nips-2003-Markov Models for Automated ECG Interval Analysis
17 0.3644256 70 nips-2003-Fast Algorithms for Large-State-Space HMMs with Applications to Web Usage Analysis
18 0.35969409 131 nips-2003-Modeling User Rating Profiles For Collaborative Filtering
19 0.34459803 104 nips-2003-Learning Curves for Stochastic Gradient Descent in Linear Feedforward Networks
20 0.33894941 112 nips-2003-Learning to Find Pre-Images
topicId topicWeight
[(0, 0.052), (11, 0.029), (29, 0.023), (30, 0.025), (35, 0.054), (53, 0.094), (68, 0.345), (69, 0.011), (71, 0.062), (76, 0.025), (85, 0.075), (91, 0.102)]
simIndex simValue paperId paperTitle
same-paper 1 0.78926647 25 nips-2003-An MCMC-Based Method of Comparing Connectionist Models in Cognitive Science
Author: Woojae Kim, Daniel J. Navarro, Mark A. Pitt, In J. Myung
Abstract: Despite the popularity of connectionist models in cognitive science, their performance can often be difficult to evaluate. Inspired by the geometric approach to statistical model selection, we introduce a conceptually similar method to examine the global behavior of a connectionist model, by counting the number and types of response patterns it can simulate. The Markov Chain Monte Carlo-based algorithm that we constructed finds these patterns efficiently. We demonstrate the approach using two localist network models of speech perception. 1
2 0.74295598 152 nips-2003-Pairwise Clustering and Graphical Models
Author: Noam Shental, Assaf Zomet, Tomer Hertz, Yair Weiss
Abstract: Significant progress in clustering has been achieved by algorithms that are based on pairwise affinities between the datapoints. In particular, spectral clustering methods have the advantage of being able to divide arbitrarily shaped clusters and are based on efficient eigenvector calculations. However, spectral methods lack a straightforward probabilistic interpretation which makes it difficult to automatically set parameters using training data. In this paper we use the previously proposed typical cut framework for pairwise clustering. We show an equivalence between calculating the typical cut and inference in an undirected graphical model. We show that for clustering problems with hundreds of datapoints exact inference may still be possible. For more complicated datasets, we show that loopy belief propagation (BP) and generalized belief propagation (GBP) can give excellent results on challenging clustering problems. We also use graphical models to derive a learning algorithm for affinity matrices based on labeled data. 1
3 0.70502394 17 nips-2003-A Sampled Texture Prior for Image Super-Resolution
Author: Lyndsey C. Pickup, Stephen J. Roberts, Andrew Zisserman
Abstract: Super-resolution aims to produce a high-resolution image from a set of one or more low-resolution images by recovering or inventing plausible high-frequency image content. Typical approaches try to reconstruct a high-resolution image using the sub-pixel displacements of several lowresolution images, usually regularized by a generic smoothness prior over the high-resolution image space. Other methods use training data to learn low-to-high-resolution matches, and have been highly successful even in the single-input-image case. Here we present a domain-specific image prior in the form of a p.d.f. based upon sampled images, and show that for certain types of super-resolution problems, this sample-based prior gives a significant improvement over other common multiple-image super-resolution techniques. 1
4 0.49922884 107 nips-2003-Learning Spectral Clustering
Author: Francis R. Bach, Michael I. Jordan
Abstract: Spectral clustering refers to a class of techniques which rely on the eigenstructure of a similarity matrix to partition points into disjoint clusters with points in the same cluster having high similarity and points in different clusters having low similarity. In this paper, we derive a new cost function for spectral clustering based on a measure of error between a given partition and a solution of the spectral relaxation of a minimum normalized cut problem. Minimizing this cost function with respect to the partition leads to a new spectral clustering algorithm. Minimizing with respect to the similarity matrix leads to an algorithm for learning the similarity matrix. We develop a tractable approximation of our cost function that is based on the power method of computing eigenvectors. 1
5 0.47548419 192 nips-2003-Using the Forest to See the Trees: A Graphical Model Relating Features, Objects, and Scenes
Author: Kevin P. Murphy, Antonio Torralba, William T. Freeman
Abstract: Standard approaches to object detection focus on local patches of the image, and try to classify them as background or not. We propose to use the scene context (image as a whole) as an extra source of (global) information, to help resolve local ambiguities. We present a conditional random field for jointly solving the tasks of object detection and scene classification. 1
6 0.4707756 20 nips-2003-All learning is Local: Multi-agent Learning in Global Reward Games
7 0.46686241 113 nips-2003-Learning with Local and Global Consistency
8 0.46528232 78 nips-2003-Gaussian Processes in Reinforcement Learning
9 0.46108881 54 nips-2003-Discriminative Fields for Modeling Spatial Dependencies in Natural Images
10 0.46011549 143 nips-2003-On the Dynamics of Boosting
11 0.45999587 4 nips-2003-A Biologically Plausible Algorithm for Reinforcement-shaped Representational Learning
12 0.45950922 126 nips-2003-Measure Based Regularization
13 0.4593617 81 nips-2003-Geometric Analysis of Constrained Curves
14 0.45874459 93 nips-2003-Information Dynamics and Emergent Computation in Recurrent Circuits of Spiking Neurons
15 0.45853651 109 nips-2003-Learning a Rare Event Detection Cascade by Direct Feature Selection
16 0.45745066 72 nips-2003-Fast Feature Selection from Microarray Expression Data via Multiplicative Large Margin Algorithms
17 0.4569242 147 nips-2003-Online Learning via Global Feedback for Phrase Recognition
18 0.45574158 3 nips-2003-AUC Optimization vs. Error Rate Minimization
19 0.45536852 172 nips-2003-Semi-Supervised Learning with Trees
20 0.455255 138 nips-2003-Non-linear CCA and PCA by Alignment of Local Models