nips nips2001 nips2001-78 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: David Jacobs, Bas Rokers, Archisman Rudra, Zili Liu
Abstract: Partial information can trigger a complete memory. At the same time, human memory is not perfect. A cue can contain enough information to specify an item in memory, but fail to trigger that item. In the context of word memory, we present experiments that demonstrate some basic patterns in human memory errors. We use cues that consist of word fragments. We show that short and long cues are completed more accurately than medium length ones and study some of the factors that lead to this behavior. We then present a novel computational model that shows some of the flexibility and patterns of errors that occur in human memory. This model iterates between bottom-up and top-down computations. These are tied together using a Markov model of words that allows memory to be accessed with a simple feature set, and enables a bottom-up process to compute a probability distribution of possible completions of word fragments, in a manner similar to models of visual perceptual completion.
Reference: text
sentIndex sentText sentNum sentScore
1 A cue can contain enough information to specify an item in memory, but fail to trigger that item. [sent-13, score-0.314]
2 In the context of word memory, we present experiments that demonstrate some basic patterns in human memory errors. [sent-14, score-0.692]
3 We show that short and long cues are completed more accurately than medium length ones and study some of the factors that lead to this behavior. [sent-16, score-0.427]
4 These are tied together using a Markov model of words that allows memory to be accessed with a simple feature set, and enables a bottom-up process to compute a probability distribution of possible completions of word fragments, in a manner similar to models of visual perceptual completion. [sent-19, score-0.785]
5 1 Introduction This paper addresses the problem of retrieving items in memory from partial information. [sent-20, score-0.345]
6 We hypothesize that memory errors occur in part because a trade-off exists between memory accuracy and the complexity of neural hardware needed to perform complicated memory tasks. [sent-23, score-0.729]
7 If this is true, we can gain insight into mechanisms of human memory by studying the patterns of errors humans make, and we can model human memory with systems that produce similar patterns as a result of constraints on computational resources. [sent-24, score-0.763]
8 We experiment with word memory questions of the sort that arise in a game called superghost. [sent-25, score-0.606]
9 They must find a valid English word that matches this query, by replacing each ‘*’ with zero or more letters. [sent-27, score-0.325]
10 In effect, the subject is given a set of letters and must think of a word that contains all of those letters, in that order, with other letters added as needed. [sent-29, score-0.68]
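As a concrete illustration of this matching rule, the sketch below checks whether a word is a valid completion of a superghost cue. The function and variable names are our own, not the paper's, and this is only a minimal reading of the rule described above.

```python
def matches_superghost(cue_letters, word):
    """True if the cue letters occur in `word` in the given order,
    with any number of other letters allowed before, between, or after them."""
    pos = 0
    for letter in cue_letters:
        pos = word.find(letter, pos)
        if pos == -1:
            return False
        pos += 1
    return True

# Example: matches_superghost("plc", "place") -> True
#          matches_superghost("plc", "clap")  -> False (letters out of order)
```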
11 Most of the psychological literature on word completion involves the effects of priming certain responses with recent experience (Shacter and Tulving[18]). [sent-30, score-0.61]
12 However, priming is only able to account for about five percent of the variance in a typical fragment completion task (Olofsson and Nyberg[13], Hintzman and Hartry[6]). [sent-31, score-0.688]
13 This measures the extent to which all the letters in the query are needed to find a valid answer. [sent-33, score-0.346]
14 We show that when we control for the redundancy of queries, we find that the difficulty of answering questions increases with their length; queries with many letters tend to be easy only because they tend to be highly redundant. [sent-34, score-0.475]
15 Our model is based on the idea that a large memory system can gain efficiency by keeping the comparison between input and items in memory as simple as possible. [sent-36, score-0.615]
16 In the latter, the given string of letters may begin at any point in the word, and adjacent letters in the fragment need not be adjacent in the completed word, though they may be. [sent-45, score-1.056]
17 For example, for stem completion the fragment “str” may be completed into “string”, whereas for fragment completion it may also be completed into “satire”. [sent-46, score-1.462]
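The difference between the two cue types can be made concrete with a small sketch; the regular expressions below are only an illustration of the two matching rules, not the procedure used in the cited studies.

```python
import re

def stem_cue(stem, word_len):
    """Word-stem completion: the first letters are given and the word length is fixed."""
    return re.compile(r"^%s.{%d}$" % (re.escape(stem), word_len - len(stem)))

def fragment_cue(letters, word_len):
    """Word-fragment completion: the given letters may start anywhere and need not
    be contiguous, but must appear in order in a word of the given length."""
    body = ".*".join(re.escape(ch) for ch in letters)
    return re.compile(r"^(?=.{%d}$).*%s.*$" % (word_len, body))

# "str" as a stem matches "string" but not "satire"; as a fragment it matches both.
print(bool(stem_cue("str", 6).match("string")), bool(stem_cue("str", 6).match("satire")))
print(bool(fragment_cue("str", 6).match("string")), bool(fragment_cue("str", 6).match("satire")))
```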
18 Performance for word-fragment completion is lower than for word-stem completion (Olofsson and Nyberg[12]). [sent-47, score-0.414]
19 In addition, words for which the ending fragment is given show performance closer to word-stem completion than to word-fragment completion (Olofsson and Nyberg[13]). [sent-48, score-0.853]
20 [21] indicate that assuming orthographic encoding is in most cases sufficient to describe word completion performance in humans. [sent-51, score-0.591]
21 Conflicting evidence exists regarding the influence of fragment length on word completion. [sent-55, score-0.837]
22 Olofsson and Nyberg [12] failed to find a difference between two- and three-letter fragments on words of five to eight letters. [sent-56, score-0.638]
23 However, this might have been because, in their task, each fragment has a unique completion. [sent-57, score-0.442]
24 Many recurrent neural networks have been proposed as models of associative memory (Anderson[1] contains a review). [sent-58, score-0.308]
25 Perhaps most relevant to our work are models that use an input query to activate items from a complete dictionary in memory, and then use these items to alter the activations of the input. [sent-59, score-0.37]
26 For example, in the Interactive Activation model of Rumelhart and McClelland[16], the presence of letters activates words, which boost the activity of the letters they contain. [sent-60, score-0.387]
27 In Adaptive Resonance models (Carpenter and Grossberg[3]) activated memory items are compared to the input query and de-activated if they do not match. [sent-61, score-0.511]
28 [5], Rao and Ballard[14]), although these are not used as part of a memory system with complete items stored in memory. [sent-66, score-0.345]
29 We find that superghost queries seem more natural to people than associative memory word problems (compare the superghost query “think of a word with an a” to the associative memory query “think of a word whose seventh letter is an a”). [sent-69, score-2.338]
30 However, it is not clear how to extend most models of associative memory to handle superghost problems. [sent-70, score-0.452]
31 Simultaneous with our work ([8]) they use a bigram model to solve anagram problems, in which letters are unscrambled to match words in a dictionary. [sent-77, score-0.68]
32 2 Experiments with Human Subjects In our experiments, fragments and matching words were drawn from a large standard corpus of English text. [sent-81, score-0.441]
33 The frequency of a word is the number of times it appears in this corpus. [sent-82, score-0.441]
34 The frequency of a fragment is the sum of the frequencies of all words that the fragment matches. [sent-83, score-1.174]
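A minimal sketch of these two frequency definitions, assuming a hypothetical `word_counts` dictionary mapping each corpus word to its count; this is our own illustration, not the authors' code.

```python
def matches(fragment, word):
    """Subsequence test: the fragment's letters appear in the word, in order."""
    pos = 0
    for ch in fragment:
        pos = word.find(ch, pos) + 1
        if pos == 0:
            return False
    return True

def fragment_frequency(fragment, word_counts):
    """Sum of the corpus counts of every word the fragment matches."""
    return sum(count for word, count in word_counts.items() if matches(fragment, word))
```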
35 We used fragments of length two to eight, discarding any fragments with frequency lower than one thousand. [sent-84, score-0.743]
36 Consequently, shorter fragments tended to match more words, with greater total frequency. [sent-87, score-0.36]
37 In the second experiment, fragments were selected to ensure a uniform distribution of frequencies across all fragment lengths. [sent-88, score-0.717]
38 For example, we used length two fragments that matched unusually few words. [sent-89, score-0.373]
39 A fragment was presented on a computer screen with spaces interspersed, indicating the possibility of letter insertion. [sent-91, score-0.606]
40 The subject was required to enter a word that would fit the fragment. [sent-92, score-0.297]
41 For each session 50 fragments were presented, with a similar number of fragments of each length. [sent-94, score-0.55]
42 Reaction times were recorded by measuring the time elapsed between the fragment first appearing on screen and the subject typing the first character of a matching word. [sent-95, score-0.531]
43 Words that did not match the fragment or did not exist in the corpus were marked as not completed. [sent-96, score-0.532]
44 Figure 1: Fragment completion as a function of fragment length for randomly chosen cues (top-left) and cues of equal frequency (top-right). [sent-111, score-0.168]
45 On the bottom, the equal frequency cues are divided into five groups, from least redundancy (R0) to most (R5). [sent-112, score-0.482]
46 Results For each graph we plot the fraction of fragments completed, that is, the number of fragments completed divided by the number of fragments presented (Figure 1). [sent-114, score-0.734]
47 Controlling for frequency reduces performance because on average lower frequency fragments are selected. [sent-118, score-0.498]
48 The U-shaped curve is flattened, but persists; hence U-shaped performance is not just due to frequency. Finally, we divide the fragments from the two experiments into five groups, according to their redundancy. [sent-119, score-0.43]
49 It is the probability that, if we randomly delete a letter from the fragment and find a matching word, this word will also match the full fragment. [sent-121, score-1.004]
50 Specifically, consider the frequency of a query fragment of a given length (the total frequency of words that match it), [sent-122, score-1.058]
51 together with the frequency of the fragment that results when we delete a given letter from the query (which can only be equal or larger). [sent-123, score-0.868]
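The redundancy measure described above can be sketched as follows, reusing `fragment_frequency` from the earlier sketch. The exact formula is our reading of the text (the average, over deleted positions, of the chance that a frequency-weighted match of the shortened fragment also matches the full one), so treat it as an assumption.

```python
def redundancy(fragment, word_counts):
    """For each position, delete that letter and take the ratio of the full
    fragment's frequency to the shortened fragment's frequency; average the ratios.
    fragment_frequency() is defined in the earlier sketch."""
    f_full = fragment_frequency(fragment, word_counts)
    ratios = []
    for i in range(len(fragment)):
        shortened = fragment[:i] + fragment[i + 1:]
        f_short = fragment_frequency(shortened, word_counts)
        ratios.append(f_full / f_short if f_short > 0 else 0.0)
    return sum(ratios) / len(fragment)
```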
52 3 Using Markov Models for Word Retrieval We now describe a model of word memory in which matching between the query and memory is mediated by a simple set of features. [sent-129, score-1.014]
53 We denote the beginning and end of a word using the symbols ‘0’ and ‘1’, respectively, so that bigram probabilities also indicate how often individual letters begin or end a word. [sent-131, score-0.749]
54 Then bigram probabilities are used to trigger words in memory that might match the query. [sent-133, score-0.684]
55 First, we compute a prior distribution on how likely each word in memory is to match our query. [sent-135, score-0.625]
56 However, this distribution could reflect the frequency with which each word occurs in English. [sent-137, score-0.392]
57 It could also be used to capture priming phenomena; for example, if a word has been recently seen, its prior probability could increase, making it more likely that the model would retrieve this word. [sent-138, score-0.404]
58 Then, using these we compute a probability that each bigram will appear if we randomly select a bigram from a word selected according to our prior distribution. [sent-139, score-0.798]
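A sketch of this first step; `word_prior` is a hypothetical dictionary from word to prior probability (summing to one), and the padding symbols '0' and '1' follow the convention above.

```python
from collections import defaultdict

def bigram_prior(word_prior):
    """Probability of each bigram when a word is drawn from the prior and one of
    its bigrams (including the '0'/'1' padding) is then picked uniformly at random."""
    probs = defaultdict(float)
    for word, p in word_prior.items():
        padded = "0" + word + "1"
        bigrams = [padded[i:i + 2] for i in range(len(padded) - 1)]
        for bg in bigrams:
            probs[bg] += p / len(bigrams)
    return dict(probs)
```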
59 Second, we use these bigram probabilities as a Markov model, and compute the expected number of times each bigram will occur in the answer, conditioned on the query. [sent-140, score-0.502]
60 That is, as a generic model of words we assume that each letter in the word depends on the adjacent letters, but is conditionally independent of all others. [sent-141, score-0.63]
61 Implicitly, each query begins with ‘0’ and ends with ‘1’, so the expected number of times any bigram will appear in the completed word is the sum of the number of times it appears in the completions of the fragments: ‘0*p’, ‘p*l’, ‘l*c’, and ‘c*1’. [sent-144, score-0.96]
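The decomposition of a query into these gap fragments is simple to state in code; the query format assumed here (letters separated by '*') is an illustration.

```python
def gap_fragments(query):
    """Split a superghost query into its gap fragments, e.g. 'p*l*c' ->
    ['0*p', 'p*l', 'l*c', 'c*1'], with '0' and '1' marking word start and end."""
    letters = ["0"] + [ch for ch in query if ch != "*"] + ["1"]
    return ["%s*%s" % (a, b) for a, b in zip(letters, letters[1:])]
```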
62 To compute this, we assume a prior distribution on the number of letters that will replace a ‘*’ in the completed word. [sent-145, score-0.36]
63 Beginning the third step of the algorithm, we know the expected number of times that each bigram appears in the completed cue. [sent-151, score-0.445]
64 Each bigram then votes for all words containing that bigram. [sent-152, score-0.386]
65 The weight of this vote is the expected number of times each bigram appears in the completed cue, divided by the prior probability of each bigram, computed in step 1. [sent-153, score-0.495]
66 We update the prior for each word as the product of these votes with the previous probability. [sent-155, score-0.367]
67 We can view this as an approximate computation of the probability of each word being the correct answer, based on the likelihood that a bigram appears in the completed cue, and our prior on each word being correct. [sent-156, score-1.069]
68 After the third step, we once again have a probability that each word is correct, and can iterate, using this probability to initialize step one. [sent-157, score-0.297]
69 After a small number of iterations, we terminate the algorithm and select the most probable word as our answer. [sent-158, score-0.297]
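The voting and iteration described above might look like the sketch below. Computing the expected bigram counts (step two) requires the Markov-model completion of each '*' gap and is left as an abstract `expected_counts_fn`; the update rule here is our reading of the voting scheme, with `bigram_prior` reused from the earlier sketch, so it is an approximation rather than the authors' implementation.

```python
def vote_and_update(word_probs, expected_counts, bg_prior):
    """Each bigram with a nonzero expected count votes for every word containing it,
    weighted by (expected count) / (prior bigram probability); votes multiply into
    each word's probability, which is then renormalized."""
    updated = {}
    for word, p in word_probs.items():
        padded = "0" + word + "1"
        word_bigrams = {padded[i:i + 2] for i in range(len(padded) - 1)}
        vote = 1.0
        for bg in word_bigrams:
            c = expected_counts.get(bg, 0.0)
            if c > 0.0:
                vote *= c / max(bg_prior.get(bg, 0.0), 1e-12)  # guard against zero prior
        updated[word] = p * vote
    total = sum(updated.values())
    return {w: v / total for w, v in updated.items()} if total > 0 else word_probs

def retrieve(query, word_prior, expected_counts_fn, n_iterations=3):
    """Iterate steps 1-3 and return the most probable word."""
    probs = dict(word_prior)
    for _ in range(n_iterations):
        bg_prior = bigram_prior(probs)                    # step 1 (earlier sketch)
        counts = expected_counts_fn(query, bg_prior)      # step 2 (not shown)
        probs = vote_and_update(probs, counts, bg_prior)  # step 3
    return max(probs, key=probs.get)
```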
70 Figure 2: Performance as a function of cue length, for cues of frequency between 4 and 22 (top-left) and between 1 and 3 (top-right). [sent-179, score-0.505]
71 The result approximates the probability that each word matches the cue, where the main approximation comes from using a small set of features to bring the cue into contact with items in memory. [sent-181, score-0.665]
72 The relevant quantities are the number of distinct features, the number of features in each word (with a bigram representation, [sent-182, score-0.536]
73 the word length plus one), the number of words in memory, and the maximum number of letters replacing a ‘*’. [sent-183, score-0.495]
74 First, we simulated the conditions described in Olofsson and Nyberg[12] comparing word stem and word fragment completion. [sent-189, score-1.079]
75 To match their experiments, we used a modified algorithm that handled cues in which the number of missing letters can be specified. [sent-190, score-0.414]
76 We used cues that specified the first three letters of a word, the last three letters, or three letters scattered throughout the word. [sent-191, score-0.532]
77 Therefore, the fact that it performs better when the first letters of the word are given than when the last are given is due to regularities in English spelling, and is not built into the algorithm. [sent-195, score-0.477]
78 Next we simulated conditions comparable to our own experiments on human subjects, using superghost cues. [sent-196, score-0.272]
79 First we selected cues of varying length that match between four and twenty-two words in the dictionary. [sent-197, score-0.432]
80 We also ran these experiments using cues that matched one to three words (Figure 2, top-right). [sent-200, score-0.299]
81 The algorithm performs differently on fragments with very low frequency because in our corpus the shorter of these cues had especially low redundancy and the longer fragments had especially high redundancy, in comparison to fragments with frequencies between 4 and 22. [sent-202, score-1.381]
82 We can see that performance increases with redundancy and decreases with cue length. [sent-204, score-0.459]
83 Discussion Our experiments indicate two main effects in human word memory that our model also shares. [sent-205, score-0.726]
84 Second, when we control for this, performance drops with cue length. [sent-207, score-0.294]
85 Since redundancy tends to increase with cue length, this creates two conflicting tendencies that result in a U-shaped memory curve. [sent-208, score-0.669]
86 We conjecture that these factors may be present in many memory tasks, leading to U-shaped memory curves in a number of domains. [sent-209, score-0.486]
87 In our model, the fact that performance drops with cue length is a result of our use of a simple feature set to mediate matching the cue to words in memory. [sent-210, score-0.796]
88 This means that not all the information present in the cue is conveyed to items in memory. [sent-211, score-0.34]
89 When the length of a cue increases but its redundancy remains low, all the information in the cue is still needed to find a correct answer, yet the amount of information grows, making it harder to capture it all with a limited feature set. [sent-212, score-1.059]
90 On the other hand, the extent to which redundancy grows with cue length is really a product of the specific words in memory and the cues chosen. [sent-214, score-1.039]
91 So, for example, if we add a letter to a cue that is already highly redundant, the new letter may not be needed to find a correct answer, but that is not reflected by much of an increase in the cue’s redundancy. [sent-219, score-0.543]
92 4 Conclusions We have proposed superghost queries as a domain for experimenting with word memory, because it seems a natural task to people, and requires models that can flexibly handle somewhat complicated questions. [sent-220, score-0.515]
93 We have shown that in human subjects, performance on superghost improves with the redundancy of a query, and otherwise tends to decrease with word length. [sent-221, score-0.763]
94 We have proposed a computational model that uses a simple, generic model of words to map a superghost query onto a simple feature set of bigrams. [sent-223, score-0.525]
95 This means that somewhat complicated questions can be answered while keeping comparisons between the fragments and words in memory very simple. [sent-224, score-0.705]
96 It also does better at word stem completion than word fragment completion, which agrees with previous work on human memory. [sent-226, score-1.369]
97 Item effects in recognition and fragment completion: Contingency relations vary for different sets of words. [sent-256, score-0.473]
98 Swedish norms for completion of word stems and unique word fragments. [sent-286, score-0.783]
99 Sublexical structures in visual word recognition: Access units or orthographic redundancy? [sent-312, score-0.393]
100 The role of syllabic and orthographic properties of letter cues in solving word fragments. [sent-336, score-0.678]
wordName wordTfidf (topN-words)
[('fragment', 0.442), ('word', 0.297), ('fragments', 0.275), ('memory', 0.243), ('bigram', 0.239), ('cue', 0.238), ('completion', 0.189), ('redundancy', 0.188), ('letters', 0.18), ('cues', 0.172), ('query', 0.166), ('completed', 0.157), ('superghost', 0.144), ('letter', 0.137), ('olofsson', 0.108), ('items', 0.102), ('human', 0.101), ('words', 0.1), ('length', 0.098), ('frequency', 0.095), ('nyberg', 0.09), ('queries', 0.074), ('anagram', 0.072), ('orthographic', 0.072), ('answer', 0.072), ('associative', 0.065), ('match', 0.062), ('priming', 0.057), ('english', 0.055), ('bidirectional', 0.054), ('ve', 0.048), ('votes', 0.047), ('bigrams', 0.047), ('subjects', 0.045), ('stem', 0.043), ('fraction', 0.042), ('markov', 0.041), ('trigger', 0.04), ('matching', 0.038), ('cognition', 0.038), ('perceptual', 0.038), ('nd', 0.038), ('psychological', 0.036), ('adjacent', 0.036), ('exibly', 0.036), ('hintzman', 0.036), ('optics', 0.036), ('palm', 0.036), ('rokers', 0.036), ('scandinavian', 0.036), ('shacter', 0.036), ('sommer', 0.036), ('srinivas', 0.036), ('ucla', 0.036), ('wordfragment', 0.036), ('zili', 0.036), ('item', 0.036), ('jacobs', 0.036), ('experiment', 0.033), ('performance', 0.033), ('generic', 0.033), ('beginning', 0.033), ('questions', 0.033), ('carpenter', 0.031), ('grimes', 0.031), ('effects', 0.031), ('correct', 0.031), ('psychology', 0.03), ('answered', 0.028), ('delete', 0.028), ('baum', 0.028), ('completions', 0.028), ('groups', 0.028), ('corpus', 0.028), ('eight', 0.028), ('matches', 0.028), ('feature', 0.028), ('model', 0.027), ('experiments', 0.027), ('divided', 0.027), ('angeles', 0.027), ('rumelhart', 0.027), ('nec', 0.027), ('screen', 0.027), ('comparisons', 0.026), ('especially', 0.025), ('rao', 0.025), ('string', 0.025), ('appears', 0.025), ('visual', 0.024), ('interactive', 0.024), ('pereira', 0.024), ('patterns', 0.024), ('times', 0.024), ('think', 0.023), ('prior', 0.023), ('rst', 0.023), ('shorter', 0.023), ('redundant', 0.023), ('drops', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 78 nips-2001-Fragment Completion in Humans and Machines
Author: David Jacobs, Bas Rokers, Archisman Rudra, Zili Liu
Abstract: Partial information can trigger a complete memory. At the same time, human memory is not perfect. A cue can contain enough information to specify an item in memory, but fail to trigger that item. In the context of word memory, we present experiments that demonstrate some basic patterns in human memory errors. We use cues that consist of word fragments. We show that short and long cues are completed more accurately than medium length ones and study some of the factors that lead to this behavior. We then present a novel computational model that shows some of the flexibility and patterns of errors that occur in human memory. This model iterates between bottom-up and top-down computations. These are tied together using a Markov model of words that allows memory to be accessed with a simple feature set, and enables a bottom-up process to compute a probability distribution of possible completions of word fragments, in a manner similar to models of visual perceptual completion.
2 0.18470325 151 nips-2001-Probabilistic principles in unsupervised learning of visual structure: human data and a model
Author: Shimon Edelman, Benjamin P. Hiles, Hwajin Yang, Nathan Intrator
Abstract: To find out how the representations of structured visual objects depend on the co-occurrence statistics of their constituents, we exposed subjects to a set of composite images with tight control exerted over (1) the conditional probabilities of the constituent fragments, and (2) the value of Barlow’s criterion of “suspicious coincidence” (the ratio of joint probability to the product of marginals). We then compared the part verification response times for various probe/target combinations before and after the exposure. For composite probes, the speedup was much larger for targets that contained pairs of fragments perfectly predictive of each other, compared to those that did not. This effect was modulated by the significance of their co-occurrence as estimated by Barlow’s criterion. For lone-fragment probes, the speedup in all conditions was generally lower than for composites. These results shed light on the brain’s strategies for unsupervised acquisition of structural information in vision. 1 Motivation How does the human visual system decide for which objects it should maintain distinct and persistent internal representations of the kind typically postulated by theories of object recognition? Consider, for example, the image shown in Figure 1, left. This image can be represented as a monolithic hieroglyph, a pair of Chinese characters (which we shall refer to as and ), a set of strokes, or, trivially, as a collection of pixels. Note that the second option is only available to a system previously exposed to various combinations of Chinese characters. Indeed, a principled decision whether to represent this image as , or otherwise can only be made on the basis of prior exposure to related images. £ ¡ £¦ ¡ £ ¥¨§¢ ¥¤¢ ¢ According to Barlow’s [1] insight, one useful principle is tallying suspicious coincidences: two candidate fragments and should be combined into a composite object if the probability of their joint appearance is much higher than , which is the probability expected in the case of their statistical independence. This criterion may be compared to the Minimum Description Length (MDL) principle, which has been previously discussed in the context of object representation [2, 3]. In a simplified form [4], MDL calls for representing explicitly as a whole if , just as the principle of suspicious coincidences does. £ ©¢ £ ¢ ¥¤¥ £¦ ¢ ¥ £ ¢ £¦ ¢ ¥¤¥! ¨§¥ £ ¢ £ ©¢ £¦ £ ¨§¢¥ ¡ ¢ While the Barlow/MDL criterion certainly indicates a suspicious coincidence, there are additional probabilistic considerations that may be used and . One example is the possiin setting the degree of association between ble perfect predictability of from and vice versa, as measured by . If , then and are perfectly predictive of each other and should really be coded by a single symbol, whereas the MDL criterion may suggest merely that some association between the representation of and that of be established. In comparison, if and are not perfectly predictive of each other ( ), there is a case to be made in favor of coding them separately to allow for a maximally expressive representation, whereas MDL may actually suggest a high degree of association ). In this study we investigated whether the human (if visual system uses a criterion based on alongside MDL while learning (in an unsupervised manner) to represent composite objects. £ £ £ ¢ ¥ ¥ © §¥ ¡ ¢ ¨¦¤
3 0.15714338 86 nips-2001-Grammatical Bigrams
Author: Mark A. Paskin
Abstract: Unsupervised learning algorithms have been derived for several statistical models of English grammar, but their computational complexity makes applying them to large data sets intractable. This paper presents a probabilistic model of English grammar that is much simpler than conventional models, but which admits an efficient EM training algorithm. The model is based upon grammatical bigrams, i.e. , syntactic relationships between pairs of words. We present the results of experiments that quantify the representational adequacy of the grammatical bigram model, its ability to generalize from labelled data, and its ability to induce syntactic structure from large amounts of raw text. 1
4 0.12878208 87 nips-2001-Group Redundancy Measures Reveal Redundancy Reduction in the Auditory Pathway
Author: Gal Chechik, Amir Globerson, M. J. Anderson, E. D. Young, Israel Nelken, Naftali Tishby
Abstract: The way groups of auditory neurons interact to code acoustic information is investigated using an information theoretic approach. We develop measures of redundancy among groups of neurons, and apply them to the study of collaborative coding efficiency in two processing stations in the auditory pathway: the inferior colliculus (IC) and the primary auditory cortex (AI). Under two schemes for the coding of the acoustic content, acoustic segments coding and stimulus identity coding, we show differences both in information content and group redundancies between IC and AI neurons. These results provide for the first time a direct evidence for redundancy reduction along the ascending auditory pathway, as has been hypothesized for theoretical considerations [Barlow 1959,2001]. The redundancy effects under the single-spikes coding scheme are significant only for groups larger than ten cells, and cannot be revealed with the redundancy measures that use only pairs of cells. The results suggest that the auditory system transforms low level representations that contain redundancies due to the statistical structure of natural stimuli, into a representation in which cortical neurons extract rare and independent component of complex acoustic signals, that are useful for auditory scene analysis. 1
5 0.11070713 24 nips-2001-Active Information Retrieval
Author: Tommi Jaakkola, Hava T. Siegelmann
Abstract: In classical large information retrieval systems, the system responds to a user initiated query with a list of results ranked by relevance. The users may further refine their query as needed. This process may result in a lengthy correspondence without conclusion. We propose an alternative active learning approach, where the system responds to the initial user's query by successively probing the user for distinctions at multiple levels of abstraction. The system's initiated queries are optimized for speedy recovery and the user is permitted to respond with multiple selections or may reject the query. The information is in each case unambiguously incorporated by the system and the subsequent queries are adjusted to minimize the need for further exchange. The system's initiated queries are subject to resource constraints pertaining to the amount of information that can be presented to the user per iteration. 1
6 0.094113477 184 nips-2001-The Intelligent surfer: Probabilistic Combination of Link and Content Information in PageRank
7 0.081734635 18 nips-2001-A Rational Analysis of Cognitive Control in a Speeded Discrimination Task
8 0.07890448 19 nips-2001-A Rotation and Translation Invariant Discrete Saliency Network
9 0.072506726 12 nips-2001-A Model of the Phonological Loop: Generalization and Binding
10 0.070406735 56 nips-2001-Convolution Kernels for Natural Language
11 0.06565538 28 nips-2001-Adaptive Nearest Neighbor Classification Using Support Vector Machines
12 0.062134031 174 nips-2001-Spike timing and the coding of naturalistic sounds in a central auditory area of songbirds
13 0.060938895 123 nips-2001-Modeling Temporal Structure in Classical Conditioning
14 0.057465829 39 nips-2001-Audio-Visual Sound Separation Via Hidden Markov Models
15 0.057394151 161 nips-2001-Reinforcement Learning with Long Short-Term Memory
16 0.057191212 142 nips-2001-Orientational and Geometric Determinants of Place and Head-direction
17 0.055533752 110 nips-2001-Learning Hierarchical Structures with Linear Relational Embedding
18 0.05538911 37 nips-2001-Associative memory in realistic neuronal networks
19 0.050117515 166 nips-2001-Self-regulation Mechanism of Temporally Asymmetric Hebbian Plasticity
20 0.047691382 5 nips-2001-A Bayesian Model Predicts Human Parse Preference and Reading Times in Sentence Processing
topicId topicWeight
[(0, -0.146), (1, -0.094), (2, -0.047), (3, -0.032), (4, -0.099), (5, -0.046), (6, -0.111), (7, -0.001), (8, -0.176), (9, -0.036), (10, -0.021), (11, 0.004), (12, -0.201), (13, 0.018), (14, 0.098), (15, 0.027), (16, 0.103), (17, -0.039), (18, -0.119), (19, 0.159), (20, -0.017), (21, -0.053), (22, 0.003), (23, -0.015), (24, -0.083), (25, -0.029), (26, -0.138), (27, 0.125), (28, -0.024), (29, 0.135), (30, 0.03), (31, 0.142), (32, -0.018), (33, 0.26), (34, -0.009), (35, 0.091), (36, 0.167), (37, -0.121), (38, -0.145), (39, 0.025), (40, 0.096), (41, 0.01), (42, 0.01), (43, -0.115), (44, 0.08), (45, 0.008), (46, 0.012), (47, -0.225), (48, 0.059), (49, 0.094)]
simIndex simValue paperId paperTitle
same-paper 1 0.97100061 78 nips-2001-Fragment Completion in Humans and Machines
Author: David Jacobs, Bas Rokers, Archisman Rudra, Zili Liu
Abstract: Partial information can trigger a complete memory. At the same time, human memory is not perfect. A cue can contain enough information to specify an item in memory, but fail to trigger that item. In the context of word memory, we present experiments that demonstrate some basic patterns in human memory errors. We use cues that consist of word fragments. We show that short and long cues are completed more accurately than medium length ones and study some of the factors that lead to this behavior. We then present a novel computational model that shows some of the flexibility and patterns of errors that occur in human memory. This model iterates between bottom-up and top-down computations. These are tied together using a Markov model of words that allows memory to be accessed with a simple feature set, and enables a bottom-up process to compute a probability distribution of possible completions of word fragments, in a manner similar to models of visual perceptual completion.
2 0.66037524 151 nips-2001-Probabilistic principles in unsupervised learning of visual structure: human data and a model
Author: Shimon Edelman, Benjamin P. Hiles, Hwajin Yang, Nathan Intrator
Abstract: To find out how the representations of structured visual objects depend on the co-occurrence statistics of their constituents, we exposed subjects to a set of composite images with tight control exerted over (1) the conditional probabilities of the constituent fragments, and (2) the value of Barlow’s criterion of “suspicious coincidence” (the ratio of joint probability to the product of marginals). We then compared the part verification response times for various probe/target combinations before and after the exposure. For composite probes, the speedup was much larger for targets that contained pairs of fragments perfectly predictive of each other, compared to those that did not. This effect was modulated by the significance of their co-occurrence as estimated by Barlow’s criterion. For lone-fragment probes, the speedup in all conditions was generally lower than for composites. These results shed light on the brain’s strategies for unsupervised acquisition of structural information in vision. 1 Motivation How does the human visual system decide for which objects it should maintain distinct and persistent internal representations of the kind typically postulated by theories of object recognition? Consider, for example, the image shown in Figure 1, left. This image can be represented as a monolithic hieroglyph, a pair of Chinese characters (which we shall refer to as and ), a set of strokes, or, trivially, as a collection of pixels. Note that the second option is only available to a system previously exposed to various combinations of Chinese characters. Indeed, a principled decision whether to represent this image as , or otherwise can only be made on the basis of prior exposure to related images. £ ¡ £¦ ¡ £ ¥¨§¢ ¥¤¢ ¢ According to Barlow’s [1] insight, one useful principle is tallying suspicious coincidences: two candidate fragments and should be combined into a composite object if the probability of their joint appearance is much higher than , which is the probability expected in the case of their statistical independence. This criterion may be compared to the Minimum Description Length (MDL) principle, which has been previously discussed in the context of object representation [2, 3]. In a simplified form [4], MDL calls for representing explicitly as a whole if , just as the principle of suspicious coincidences does. £ ©¢ £ ¢ ¥¤¥ £¦ ¢ ¥ £ ¢ £¦ ¢ ¥¤¥! ¨§¥ £ ¢ £ ©¢ £¦ £ ¨§¢¥ ¡ ¢ While the Barlow/MDL criterion certainly indicates a suspicious coincidence, there are additional probabilistic considerations that may be used and . One example is the possiin setting the degree of association between ble perfect predictability of from and vice versa, as measured by . If , then and are perfectly predictive of each other and should really be coded by a single symbol, whereas the MDL criterion may suggest merely that some association between the representation of and that of be established. In comparison, if and are not perfectly predictive of each other ( ), there is a case to be made in favor of coding them separately to allow for a maximally expressive representation, whereas MDL may actually suggest a high degree of association ). In this study we investigated whether the human (if visual system uses a criterion based on alongside MDL while learning (in an unsupervised manner) to represent composite objects. £ £ £ ¢ ¥ ¥ © §¥ ¡ ¢ ¨¦¤
3 0.51967216 86 nips-2001-Grammatical Bigrams
Author: Mark A. Paskin
Abstract: Unsupervised learning algorithms have been derived for several statistical models of English grammar, but their computational complexity makes applying them to large data sets intractable. This paper presents a probabilistic model of English grammar that is much simpler than conventional models, but which admits an efficient EM training algorithm. The model is based upon grammatical bigrams, i.e. , syntactic relationships between pairs of words. We present the results of experiments that quantify the representational adequacy of the grammatical bigram model, its ability to generalize from labelled data, and its ability to induce syntactic structure from large amounts of raw text. 1
4 0.50677258 184 nips-2001-The Intelligent surfer: Probabilistic Combination of Link and Content Information in PageRank
Author: Matthew Richardson, Pedro Domingos
Abstract: The PageRank algorithm, used in the Google search engine, greatly improves the results of Web search by taking into account the link structure of the Web. PageRank assigns to a page a score proportional to the number of times a random surfer would visit that page, if it surfed indefinitely from page to page, following all outlinks from a page with equal probability. We propose to improve PageRank by using a more intelligent surfer, one that is guided by a probabilistic model of the relevance of a page to a query. Efficient execution of our algorithm at query time is made possible by precomputing at crawl time (and thus once for all queries) the necessary terms. Experiments on two large subsets of the Web indicate that our algorithm significantly outperforms PageRank in the (human-rated) quality of the pages returned, while remaining efficient enough to be used in today’s large search engines. 1
5 0.44408095 18 nips-2001-A Rational Analysis of Cognitive Control in a Speeded Discrimination Task
Author: Michael C. Mozer, Michael D. Colagrosso, David E. Huber
Abstract: We are interested in the mechanisms by which individuals monitor and adjust their performance of simple cognitive tasks. We model a speeded discrimination task in which individuals are asked to classify a sequence of stimuli (Jones & Braver, 2001). Response conflict arises when one stimulus class is infrequent relative to another, resulting in more errors and slower reaction times for the infrequent class. How do control processes modulate behavior based on the relative class frequencies? We explain performance from a rational perspective that casts the goal of individuals as minimizing a cost that depends both on error rate and reaction time. With two additional assumptions of rationality—that class prior probabilities are accurately estimated and that inference is optimal subject to limitations on rate of information transmission—we obtain a good fit to overall RT and error data, as well as trial-by-trial variations in performance. Consider the following scenario: While driving, you approach an intersection at which the traffic light has already turned yellow, signaling that it is about to turn red. You also notice that a car is approaching you rapidly from behind, with no indication of slowing. Should you stop or speed through the intersection? The decision is difficult due to the presence of two conflicting signals. Such response conflict can be produced in a psychological laboratory as well. For example, Stroop (1935) asked individuals to name the color of ink on which a word is printed. When the words are color names incongruous with the ink color— e.g., “blue” printed in red—reaction times are slower and error rates are higher. We are interested in the control mechanisms underlying performance of high-conflict tasks. Conflict requires individuals to monitor and adjust their behavior, possibly responding more slowly if errors are too frequent. In this paper, we model a speeded discrimination paradigm in which individuals are asked to classify a sequence of stimuli (Jones & Braver, 2001). The stimuli are letters of the alphabet, A–Z, presented in rapid succession. In a choice task, individuals are asked to press one response key if the letter is an X or another response key for any letter other than X (as a shorthand, we will refer to non-X stimuli as Y). In a go/no-go task, individuals are asked to press a response key when X is presented and to make no response otherwise. We address both tasks because they elicit slightly different decision-making behavior. In both tasks, Jones and Braver (2001) manipulated the relative frequency of the X and Y stimuli; the ratio of presentation frequency was either 17:83, 50:50, or 83:17. Response conflict arises when the two stimulus classes are unbalanced in frequency, resulting in more errors and slower reaction times. For example, when X’s are frequent but Y is presented, individuals are predisposed toward producing the X response, and this predisposition must be overcome by the perceptual evidence from the Y. Jones and Braver (2001) also performed an fMRI study of this task and found that anterior cingulate cortex (ACC) becomes activated in situations involving response conflict. Specifically, when one stimulus occurs infrequently relative to the other, event-related fMRI response in the ACC is greater for the low frequency stimulus. Jones and Braver also extended a neural network model of Botvinick, Braver, Barch, Carter, and Cohen (2001) to account for human performance in the two discrimination tasks. 
The heart of the model is a mechanism that monitors conflict—the posited role of the ACC—and adjusts response biases accordingly. In this paper, we develop a parsimonious alternative account of the role of the ACC and of how control processes modulate behavior when response conflict arises. 1 A RATIONAL ANALYSIS Our account is based on a rational analysis of human cognition, which views cognitive processes as being optimized with respect to certain task-related goals, and being adaptive to the structure of the environment (Anderson, 1990). We make three assumptions of rationality: (1) perceptual inference is optimal but is subject to rate limitations on information transmission, (2) response class prior probabilities are accurately estimated, and (3) the goal of individuals is to minimize a cost that depends both on error rate and reaction time. The heart of our account is an existing probabilistic model that explains a variety of facilitation effects that arise from long-term repetition priming (Colagrosso, in preparation; Mozer, Colagrosso, & Huber, 2000), and more broadly, that addresses changes in the nature of information transmission in neocortex due to experience. We give a brief overview of this model; the details are not essential for the present work. The model posits that neocortex can be characterized by a collection of informationprocessing pathways, and any act of cognition involves coordination among pathways. To model a simple discrimination task, we might suppose a perceptual pathway to map the visual input to a semantic representation, and a response pathway to map the semantic representation to a response. The choice and go/no-go tasks described earlier share a perceptual pathway, but require different response pathways. The model is framed in terms of probability theory: pathway inputs and outputs are random variables and microinference in a pathway is carried out by Bayesian belief revision. To elaborate, consider a pathway whose input at time is a discrete random variable, denoted , which can assume values corresponding to alternative input states. Similarly, the output of the pathway at time is a discrete random variable, denoted , which can assume values . For example, the input to the perceptual pathway in the discrimination task is one of visual patterns corresponding to the letters of the alphabet, and the output is one of letter identities. (This model is highly abstract: the visual patterns are enumerated, but the actual pixel patterns are not explicitly represented in the model. Nonetheless, the similarity structure among inputs can be captured, but we skip a discussion of this issue because it is irrelevant for the current work.) To present a particular input alternative, , to the model for time steps, we clamp for . The model computes a probability distribution over given , i.e., P . ¡ # 4 0 ©2' & 0 ' ! 1)(
6 0.38454297 87 nips-2001-Group Redundancy Measures Reveal Redundancy Reduction in the Auditory Pathway
7 0.3271822 174 nips-2001-Spike timing and the coding of naturalistic sounds in a central auditory area of songbirds
8 0.32271391 24 nips-2001-Active Information Retrieval
9 0.30733627 5 nips-2001-A Bayesian Model Predicts Human Parse Preference and Reading Times in Sentence Processing
10 0.3067576 12 nips-2001-A Model of the Phonological Loop: Generalization and Binding
11 0.29252684 108 nips-2001-Learning Body Pose via Specialized Maps
12 0.27755523 28 nips-2001-Adaptive Nearest Neighbor Classification Using Support Vector Machines
13 0.27309519 30 nips-2001-Agglomerative Multivariate Information Bottleneck
14 0.27187058 19 nips-2001-A Rotation and Translation Invariant Discrete Saliency Network
15 0.26967502 11 nips-2001-A Maximum-Likelihood Approach to Modeling Multisensory Enhancement
16 0.26344621 142 nips-2001-Orientational and Geometric Determinants of Place and Head-direction
17 0.24122319 68 nips-2001-Entropy and Inference, Revisited
18 0.24099439 123 nips-2001-Modeling Temporal Structure in Classical Conditioning
19 0.23732474 53 nips-2001-Constructing Distributed Representations Using Additive Clustering
20 0.23480552 193 nips-2001-Unsupervised Learning of Human Motion Models
topicId topicWeight
[(14, 0.015), (17, 0.019), (19, 0.016), (27, 0.072), (30, 0.065), (38, 0.051), (59, 0.017), (72, 0.052), (79, 0.462), (91, 0.137)]
simIndex simValue paperId paperTitle
1 0.94727898 35 nips-2001-Analysis of Sparse Bayesian Learning
Author: Anita C. Faul, Michael E. Tipping
Abstract: The recent introduction of the 'relevance vector machine' has effectively demonstrated how sparsity may be obtained in generalised linear models within a Bayesian framework. Using a particular form of Gaussian parameter prior, 'learning' is the maximisation, with respect to hyperparameters, of the marginal likelihood of the data. This paper studies the properties of that objective function, and demonstrates that conditioned on an individual hyperparameter, the marginal likelihood has a unique maximum which is computable in closed form. It is further shown that if a derived 'sparsity criterion' is satisfied, this maximum is exactly equivalent to 'pruning' the corresponding parameter from the model. 1
2 0.94086367 2 nips-2001-3 state neurons for contextual processing
Author: Ádám Kepecs, S. Raghavachari
Abstract: Neurons receive excitatory inputs via both fast AMPA and slow NMDA type receptors. We find that neurons receiving input via NMDA receptors can have two stable membrane states which are input dependent. Action potentials can only be initiated from the higher voltage state. Similar observations have been made in several brain areas which might be explained by our model. The interactions between the two kinds of inputs lead us to suggest that some neurons may operate in 3 states: disabled, enabled and firing. Such enabled, but non-firing modes can be used to introduce context-dependent processing in neural networks. We provide a simple example and discuss possible implications for neuronal processing and response variability. 1
3 0.92484891 115 nips-2001-Linear-time inference in Hierarchical HMMs
Author: Kevin P. Murphy, Mark A. Paskin
Abstract: The hierarchical hidden Markov model (HHMM) is a generalization of the hidden Markov model (HMM) that models sequences with structure at many length/time scales [FST98]. Unfortunately, the original infertime, where is ence algorithm is rather complicated, and takes the length of the sequence, making it impractical for many domains. In this paper, we show how HHMMs are a special kind of dynamic Bayesian network (DBN), and thereby derive a much simpler inference algorithm, which only takes time. Furthermore, by drawing the connection between HHMMs and DBNs, we enable the application of many standard approximation techniques to further speed up inference. ¥ ©§ £ ¨¦¥¤¢ © £ ¦¥¤¢
same-paper 4 0.91089696 78 nips-2001-Fragment Completion in Humans and Machines
Author: David Jacobs, Bas Rokers, Archisman Rudra, Zili Liu
Abstract: Partial information can trigger a complete memory. At the same time, human memory is not perfect. A cue can contain enough information to specify an item in memory, but fail to trigger that item. In the context of word memory, we present experiments that demonstrate some basic patterns in human memory errors. We use cues that consist of word fragments. We show that short and long cues are completed more accurately than medium length ones and study some of the factors that lead to this behavior. We then present a novel computational model that shows some of the flexibility and patterns of errors that occur in human memory. This model iterates between bottom-up and top-down computations. These are tied together using a Markov model of words that allows memory to be accessed with a simple feature set, and enables a bottom-up process to compute a probability distribution of possible completions of word fragments, in a manner similar to models of visual perceptual completion.
5 0.88792819 180 nips-2001-The Concave-Convex Procedure (CCCP)
Author: Alan L. Yuille, Anand Rangarajan
Abstract: We introduce the Concave-Convex procedure (CCCP) which constructs discrete time iterative dynamical systems which are guaranteed to monotonically decrease global optimization/energy functions. It can be applied to (almost) any optimization problem and many existing algorithms can be interpreted in terms of CCCP. In particular, we prove relationships to some applications of Legendre transform techniques. We then illustrate CCCP by applications to Potts models, linear assignment, EM algorithms, and Generalized Iterative Scaling (GIS). CCCP can be used both as a new way to understand existing optimization algorithms and as a procedure for generating new algorithms. 1
6 0.62413871 183 nips-2001-The Infinite Hidden Markov Model
7 0.54877174 184 nips-2001-The Intelligent surfer: Probabilistic Combination of Link and Content Information in PageRank
8 0.54829472 118 nips-2001-Matching Free Trees with Replicator Equations
9 0.54698193 162 nips-2001-Relative Density Nets: A New Way to Combine Backpropagation with HMM's
10 0.54026562 86 nips-2001-Grammatical Bigrams
11 0.5391916 3 nips-2001-ACh, Uncertainty, and Cortical Inference
12 0.53552234 169 nips-2001-Small-World Phenomena and the Dynamics of Information
13 0.52753758 172 nips-2001-Speech Recognition using SVMs
14 0.52144384 194 nips-2001-Using Vocabulary Knowledge in Bayesian Multinomial Estimation
15 0.52130044 192 nips-2001-Tree-based reparameterization for approximate inference on loopy graphs
16 0.52124017 123 nips-2001-Modeling Temporal Structure in Classical Conditioning
17 0.5166353 12 nips-2001-A Model of the Phonological Loop: Generalization and Binding
18 0.51199794 27 nips-2001-Activity Driven Adaptive Stochastic Resonance
19 0.51107311 171 nips-2001-Spectral Relaxation for K-means Clustering
20 0.50978011 132 nips-2001-Novel iteration schemes for the Cluster Variation Method