nips nips2000 nips2000-71 knowledge-graph by maker-knowledge-mining

71 nips-2000-Interactive Parts Model: An Application to Recognition of On-line Cursive Script


Source: pdf

Author: Predrag Neskovic, Philip C. Davis, Leon N. Cooper

Abstract: In this work, we introduce an Interactive Parts (IP) model as an alternative to Hidden Markov Models (HMMs). We tested both models on a database of on-line cursive script. We show that implementations of HMMs and the IP model, in which all letters are assumed to have the same average width, give comparable results. However, in contrast to HMMs, the IP model can handle duration modeling without an increase in computational complexity.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 We tested both models on a database of on-line cursive script. [sent-2, score-0.155]

2 We show that implementations of HMMs and the IP model, in which all letters are assumed to have the same average width, give comparable results. [sent-3, score-0.207]

3 However, in contrast to HMMs, the IP model can handle duration modeling without an increase in computational complexity. [sent-4, score-0.092]

4 1 Introduction Hidden Markov models [9] have been a dominant paradigm in speech and handwriting recognition over the past several decades. [sent-5, score-0.195]

5 The success of HMMs is primarily due to their ability to model the statistical and sequential nature of speech and handwriting data. [sent-6, score-0.189]

6 However, HMMs have a number of weaknesses [2]. [sent-7, score-0.066]

7 Although explicit duration HMMs model data more accurately, the computational cost of such modeling is high [5]. [sent-10, score-0.092]

8 In addition, this new objective function can be cast into an NN-based framework [7, 8] and can easily deal with the sequential nature of handwriting. [sent-14, score-0.224]

9 In our approach, we model an object as a set of local parts arranged at specific spatial locations. [sent-15, score-0.144]

10 Figure 1: Effect of shape distortion and spatial distortions applied on the word "act". Figure 2: Some of the non-zero elements of the detection matrix associated with "act". [sent-25, score-0.707]

11 Parts-based representation has been used in face detection systems [3] and has recently been applied to spotting keywords in cursive handwriting data [4]. [sent-27, score-0.458]

12 In our application, an object is a handwritten word and its parts are the letters. [sent-30, score-0.404]

13 2 The Objective Function In our approach, we assume that a handwritten pattern is a distorted version of one of the dictionary words. [sent-32, score-0.693]

14 Furthermore, we assume that any distortion of a word can be expressed as a combination of two types of local distortions [6]: a) shape distortions of one or more letters, and b) spatial distortions, also called domain warping, as illustrated in Figure 1. [sent-33, score-0.723]

15 In the latter case, the shape of each letter is unchanged but the location of one or more letters is perturbed. [sent-34, score-0.68]

16 A number of different techniques can be used to construct letter detectors. [sent-36, score-0.418]

17 The output of a letter detector is in the range [0, 1], where 1 corresponds to the undistorted shape of the corresponding letter. [sent-38, score-0.473]

18 Since it is not known, a priori, where the letters are located in the pattern, letter detectors, for each letter of the alphabet, are arranged over the pattern so that the pattern is completely covered by their (overlapping) receptive fields. [sent-39, score-1.261]

19 The outputs of the letter detectors form a detection matrix, Figure 2. [sent-40, score-0.457]

20 Each row of the detection matrix represents one letter and each column corresponds to the position of the letter within the pattern. [sent-41, score-1.009]
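As an illustration of how such a detection matrix might be assembled from detector outputs, here is a minimal Python sketch; the alphabet constant, the dictionary-style detector outputs, and the function name are hypothetical and not taken from the paper.

```python
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def build_detection_matrix(detector_outputs, n_positions):
    """Arrange letter-detector outputs into a detection matrix.

    detector_outputs: dict mapping (letter, position) -> confidence in [0, 1];
    missing entries are treated as 0 (no evidence for that letter there).
    Rows index letters of the alphabet, columns index positions in the pattern.
    """
    D = np.zeros((len(ALPHABET), n_positions))
    for (letter, pos), confidence in detector_outputs.items():
        D[ALPHABET.index(letter), pos] = confidence
    return D

# Toy example: noisy evidence for the word "act" over a 5-position pattern.
outputs = {("a", 0): 0.9, ("c", 2): 0.7, ("t", 4): 0.8, ("o", 2): 0.4}
D = build_detection_matrix(outputs, n_positions=5)
```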

21 An element of the detection matrix is labeled as d. [sent-42, score-0.173]

22 In general, the detection matrix contains a large number of "false alarms" due to the fact that local segments are often ambiguous. [sent-47, score-0.195]

23 The recognition system segments a pattern by selecting one detection matrix element for each letter of a given dictionary word (footnote 1). [sent-48, score-1.548]

24 To measure spatial distortions, one must first choose a reference point from which distortions are measured. [sent-49, score-0.224]

25 It is clear that for any choice of reference point, the location estimates for letters that are not close to the reference point might be very poor. [sent-50, score-0.348]

26 For this reason, we chose a representation in which each letter serves as a reference point to estimate the position of every other letter. [sent-51, score-0.519]

27 This representation allows translation invariant recognition, is very robust (since it does not depend on any single reference point) and very accurate (since it includes nearest neighbor reference points). [sent-52, score-0.321]

28 To evaluate the level of distortion of a handwritten pattern from a given dictionary word, we introduce an objective function. [sent-53, score-0.746]

29 The value of this function represents the amount of distortion of the pattern from the dictionary word. [sent-54, score-0.646]

30 We require that the objective function reach a minimal value if all the letters that constitute the dictionary word are detected with the highest confidence and are located at the locations with highest expectation values. [sent-55, score-1.077]

31 Furthermore, we require that the dependence of the function on one of its letters be smaller for longer words. [sent-56, score-0.207]

32 One function with similar properties to these is the energy function of a system of interacting particles, Σ_{i,j} q_i U_{i,j}(x_i, x_j) q_j. [sent-57, score-0.066]

33 If we assume that all the letters are of the same size, we can map 1) letter detection estimates into "charge" and 2) choose interaction terms (potentials) to reflect the expected relative positioning of the letters (detection matrix elements). [sent-58, score-1.154]

34 The energy function of the n-th dictionary word is then E^n(x) = Σ_{i,j=1, i≠j}^{L_n} d_i(x_i) U^n_{i,j}(x_i, x_j) d_j(x_j), (1) where L_n is the number of letters in the word, x_i is the location of the i-th letter of the n-th dictionary word, [sent-59, score-1.962]

35 and x = (x_1, ..., x_{L_n}) is a particular configuration of detection matrix elements. [sent-61, score-0.21]
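A direct transcription of Eq. (1) into code could look as follows; this is only a sketch, assuming the detection values d_i(x_i) and the interaction potentials U_{i,j} are supplied as callables, which is not necessarily how the paper stores them.

```python
def word_energy(x, d, U):
    """Energy of one segmentation x of an L_n-letter dictionary word (Eq. 1).

    x : list of chosen detection-matrix positions, one per letter of the word.
    d : d[i](x_i) -> detection value ("charge") of letter i at position x_i.
    U : U[i][j](x_i, x_j) -> interaction potential between letters i and j.
    Sums d_i(x_i) * U_{i,j}(x_i, x_j) * d_j(x_j) over all ordered pairs i != j.
    """
    L = len(x)
    energy = 0.0
    for i in range(L):
        for j in range(L):
            if i != j:
                energy += d[i](x[i]) * U[i][j](x[i], x[j]) * d[j](x[j])
    return energy
```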

36 The interaction terms U_{i,j} are more complex than 1/r, and each "charge", d_i(x_i), does not have a fixed value, but depends on its location. [sent-63, score-0.149]

37 Note that this energy is a function of a specific choice of elements from the detection matrix, x, a specific segmentation of the word. [sent-64, score-0.296]

38 One possibility is to use the EM algorithm [9] and do the training for each dictionary word. [sent-66, score-0.556]

39 Let us denote by the symbol p_{ij}(x_i, x_j) the (pairwise) probability of finding the j-th letter of the n-th dictionary word at distance x = x_j - x_i from the location of the i-th letter. [sent-68, score-1.586]

40 A simple way to approximate pairwise probabilities is to find the probability distribution of letter widths for each letter and then calculate nearest-neighbor pairwise probabilities from the single-letter distributions. [sent-69, score-1.84]

41 Knowing the nearest neighbor probabilities, it is then easy to propagate them and find the pairwise probabilities between any two letters of any dictionary word [7]. [sent-70, score-1.452]

42 Interaction potentials are related to pairwise probabilities (using the Boltzmann distribution and setting β = 1/kT = 1) as U^n_{i,j}(x_i, x_j) = -ln p_{ij}(x_i, x_j) + C. [sent-71, score-0.231]
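The construction described in sentences 40-42 (letter-width distributions, propagation to pairwise probabilities, Boltzmann-style potentials, plus the Pmin cutoff of sentence 48) could be prototyped roughly as below; the exact propagation rule and the cutoff behaviour are assumptions, not the authors' implementation.

```python
import numpy as np

def propagate(neighbor_pmfs):
    """Pairwise displacement distribution between two non-adjacent letters,
    obtained here by convolving the nearest-neighbor displacement pmfs of the
    letters lying between them (an assumed propagation rule)."""
    out = neighbor_pmfs[0]
    for pmf in neighbor_pmfs[1:]:
        out = np.convolve(out, pmf)
    return out / out.sum()

def potential_from_pmf(pmf, p_min=1e-3, C=0.0):
    """Interaction potential U(x) = -ln p(x) + C (Boltzmann form, beta = 1).

    Where p(x) is below p_min the location estimate is treated as unreliable
    and the letter's influence is set to zero (U = 0); where p(x) = 0 the
    region is forbidden (U = +inf). The constant C is left as a free parameter.
    """
    U = np.where(pmf > 0, -np.log(np.clip(pmf, 1e-12, None)) + C, np.inf)
    U[(pmf > 0) & (pmf < p_min)] = 0.0
    return U
```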

43 Since the interaction potentials are defined up to a constant, we can selectively [...] (Footnote 1: Note that this segmentation corresponds to finding the centers of the letters, as opposed to segmenting a word into letters by finding their boundaries.) [sent-72, score-0.754]

44 Figure 3: Solid line: an example of a pairwise probability distribution for neighboring letters. [sent-73, score-0.107]

45 Regions x ≤ a and x ≥ b are the "forbidden" regions for letter locations. [sent-76, score-0.467]

46 In the regions a < x < a' and b' < x < b the interaction term is zero. [sent-77, score-0.198]

47 It is important to stress that the only valid domain for the interaction terms is the region for which U_{i,j} < 0, since for each pair of letters (i, j) we want to simultaneously minimize the interaction term U_{i,j} and maximize the term d_i · d_j (footnote 2). [sent-80, score-0.319]

48 We will assume that there is a value, Pmin, for the pairwise probability below which the estimate of the letter location is not reliable. [sent-81, score-0.564]

49 In practice, this means that there is an effective range of influence for each letter, and beyond that range the influence of the letter is zero. [sent-84, score-0.5]

50 In the limiting case, one can get a nearest neighbor approximation by appropriately setting Pmin. [sent-85, score-0.219]

51 It is clear that the interaction terms put constraints on the possible locations of the letters of a given dictionary word. [sent-86, score-0.886]

52 They define "allowed" regions, where the letters can be found; unimportant regions, where the influence of a letter on other letters is zero; and not-allowed regions (U = ∞), which have zero probability of finding a letter in that region, Fig. [sent-87, score-1.329]

53 The task of recognition can now be formulated as follows. [sent-89, score-0.062]

54 For a given dictionary word, find the configuration of elements from the detection matrix (a specific segmentation of the pattern) such that the energy is minimal. [sent-90, score-0.922]

55 Then, in order to find the best dictionary word, repeat the previous procedure for every dictionary word and associate with the pattern the dictionary word with the lowest energy. [sent-91, score-2.242]

56 If we denote by X the space of all possible segmentations of the pattern, then the final segmentation of the pattern, x*, is given as x* = argmin_{x ∈ X, n ∈ N} E^n(x). [sent-92, score-0.069]

57 (2) where the index n runs through the dictionary words. [sent-93, score-0.53]
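The recognition rule of sentences 54-57 amounts to a double minimization over segmentations and dictionary words. A brute-force sketch is given below, reusing the hypothetical `word_energy` helper from the earlier sketch; the paper itself relies on dynamic programming rather than exhaustive search.

```python
from itertools import product

def recognize(words, candidate_positions, d, U):
    """Pick the dictionary word whose best segmentation has the lowest energy (Eq. 2).

    words               : list of dictionary words
    candidate_positions : word -> list (one entry per letter) of candidate columns
    d, U                : word -> detection values / interaction potentials
    Exhaustive search over segmentations; feasible only for short words.
    """
    best_word, best_x, best_energy = None, None, float("inf")
    for w in words:
        for x in product(*candidate_positions[w]):
            E = word_energy(list(x), d[w], U[w])
            if E < best_energy:
                best_word, best_x, best_energy = w, list(x), E
    return best_word, best_x, best_energy
```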

58 3 Implementation and an Overview of the System An overview of the system is illustrated in Fig. [sent-94, score-0.065]

59 A raw data file, representing a handwritten word, contains x and y positions of the pen recorded every 10 milliseconds. [sent-96, score-0.119]

60 Each stroke is characterized by [...] (Footnote 2: For U_{i,j} > 0, increasing d_i · d_j would increase, rather than decrease, the energy function.) [sent-98, score-0.144]

61 Figure 6: Comparison of recognition results on 10 writers using the IP model and HMMs. [sent-106, score-0.197]

62 The preprocessor extracts these features from each stroke and supplies them to the neural network. [sent-108, score-0.088]

63 In our implementation, the network has one hidden layer with thirty rows of hidden units. [sent-112, score-0.084]

64 The output of the NN, the detection matrix, is then supplied to the HMM-based and IP model-based post-processors, Fig. [sent-114, score-0.164]

65 For both models, we assume that every letter has the same average width. [sent-116, score-0.446]

66 The first approximation for the interaction terms is to assume a "square well" shape. [sent-118, score-0.193]

67 Each interaction term is then defined with only three parameters: the left boundary a, the right boundary b, and the depth of the well, en, which are the same for all the nearest neighbor letters, Fig. [sent-119, score-0.368]

68 The lower and upper limits for the i-th and j-th non-adjacent interaction terms can then be approximated as a_{ij} = |j - i| . [sent-121, score-0.321]
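Under the square-well approximation of sentences 66-68, an interaction term needs only the boundaries and the well depth; how the boundaries scale for non-adjacent letters is only partly recoverable from the extracted text, so the |j - i| scaling of both boundaries below is an assumption.

```python
def square_well_potential(i, j, a, b, depth):
    """Square-well approximation of the interaction term between letters i and j.

    Nearest neighbors share the boundaries (a, b) and the well depth; for
    non-adjacent letters the boundaries are assumed to scale with the letter
    separation, a_ij = |j - i| * a and b_ij = |j - i| * b.
    Returns a function of the displacement x = x_j - x_i.
    """
    sep = abs(j - i)
    a_ij, b_ij = sep * a, sep * b

    def U(x):
        if a_ij < x < b_ij:
            return -depth          # inside the well: the only valid (U < 0) region
        return float("inf")        # outside the well: forbidden region
    return U
```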

69 Since the exact solution of the energy function given by Eq. [sent-125, score-0.066]

70 (2) is often computationally infeasible (the detection matrices can exceed 40 columns in width for long words), one has to use some approximation technique. [sent-126, score-0.136]

71 Another possibility is to revise the energy function by considering only nearest neighbor terms and then solve it exactly using a Dynamic Programming (DP) algorithm. [sent-128, score-0.355]

72 We have used DP to find the optimal segmentation for each word. [sent-129, score-0.35]

73 We then use this "optimal" configuration of letters to calculate the energy given by Eq. [sent-130, score-0.335]
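With only nearest-neighbor terms kept, the minimum-energy segmentation can be found with a standard Viterbi-style dynamic program; the sketch below assumes per-letter candidate columns and callable detection values and potentials, and is not the authors' code.

```python
import numpy as np

def dp_segmentation(candidates, d, U_nn):
    """Minimum-energy segmentation under a nearest-neighbor energy.

    candidates : list (one per letter) of candidate column positions
    d          : d[i](x) -> detection value of letter i at position x
    U_nn       : U_nn[i](x_i, x_next) -> potential between letters i and i+1
    cost[i][k] holds the best partial energy with letter i at candidates[i][k];
    backpointers recover the optimal configuration x*.
    """
    L = len(candidates)
    cost = [np.zeros(len(c)) for c in candidates]
    back = [np.zeros(len(c), dtype=int) for c in candidates]
    for i in range(1, L):
        for k, x in enumerate(candidates[i]):
            steps = [cost[i - 1][m]
                     + d[i - 1](xp) * U_nn[i - 1](xp, x) * d[i](x)
                     for m, xp in enumerate(candidates[i - 1])]
            back[i][k] = int(np.argmin(steps))
            cost[i][k] = min(steps)
    k = int(np.argmin(cost[-1]))
    x_star = [candidates[-1][k]]
    for i in range(L - 1, 0, -1):   # trace the backpointers down to the first letter
        k = back[i][k]
        x_star.append(candidates[i - 1][k])
    return list(reversed(x_star)), float(min(cost[-1]))
```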

74 It is important to mention that we have introduced beginning (B) and end (E) "letters" to mark the beginning and end of the pattern, and their detection probabilities are set to some constant value (footnote 3). [sent-132, score-0.256]

75 The goal of the recognition system is to find the dictionary word with the maximum posterior probability, p(w|O) = p(O|w)p(w)/p(O). (Footnote 3: This is necessary in order to define interaction potentials for single letter words.) [sent-134, score-1.488]

76 Figure 7: Square well approximation of the interaction potential. [sent-135, score-0.185]

77 The allowed region is defined as a < x < b, and the forbidden regions are x < a and x > b. [sent-136, score-0.104]

78 Since p(O) and p(w) are the same for all dictionary words, maximizing p(w|O) is equivalent to maximizing p(O|w). [sent-140, score-0.53]

79 To find p(O|w), we constructed a left-right (or Bakis) HMM [9] for each dictionary word, λ^n, where each letter was represented with one state. [sent-141, score-0.948]

80 Given a dictionary word (a model λ^n), we calculated the maximum likelihood, p(O|λ^n) = Σ_{all Q} P(O, Q|λ^n) = Σ_{all Q} P(O|Q, λ^n) P(Q|λ^n), where the summation is done over all possible state sequences. [sent-142, score-0.857]

81 Emission probabilities were calculated from the detection probabilities using Bayes' rule, P(O_x|q_k) = d_k(x) P(O_x) / P(q_k), where P(q_k) denotes the frequency of the k-th letter in the dictionary and the term P(O_x) is the same for all words and can therefore be omitted. [sent-144, score-1.395]

82 Transition probabilities were adjusted until the best recognition results were obtained. [sent-145, score-0.138]

83 Recall that we assumed that all letter widths are the same and therefore the transition probabilities are independent of letter pairs. [sent-146, score-0.985]
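For comparison, the HMM baseline of sentences 79-83 can be sketched with a plain forward pass over a left-right model with one state per letter; the Bayes-rule emissions drop the common P(O_x) factor, and the single `p_stay` value stands in for the hand-tuned, letter-independent transition probabilities. This is an illustrative reconstruction, not the authors' implementation.

```python
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def word_likelihood(word, D, letter_freq, p_stay=0.6):
    """Approximate p(O | lambda^n) for one dictionary word via the forward pass.

    word        : the dictionary word (one left-right HMM state per letter)
    D           : detection matrix, D[letter_index, column] in [0, 1]
    letter_freq : letter -> frequency P(q_k) of that letter in the dictionary
    p_stay      : self-transition probability, shared by all states since all
                  letter widths are assumed equal (hand-tuned in the paper).
    """
    n_states, n_cols = len(word), D.shape[1]
    # Emission scores via Bayes' rule, with the common P(O_x) term omitted.
    emit = np.array([[D[ALPHABET.index(c), t] / letter_freq[c]
                      for t in range(n_cols)] for c in word])
    alpha = np.zeros((n_states, n_cols))
    alpha[0, 0] = emit[0, 0]                 # the word must start in state 0
    for t in range(1, n_cols):
        for s in range(n_states):
            stay = alpha[s, t - 1] * p_stay
            move = alpha[s - 1, t - 1] * (1 - p_stay) if s > 0 else 0.0
            alpha[s, t] = (stay + move) * emit[s, t]
    return alpha[-1, -1]                     # the word must end in the last state
```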

84 The neural network was trained on 70 writers (70,000 words) and an independent group of writers was used as a cross-validation set. [sent-149, score-0.22]

85 We have tested both the IP model and HMMs on a group of 10 writers (different from the training and cross-validation groups). [sent-150, score-0.135]

86 89% of the time, while HMMs selected the correct word 79. [sent-154, score-0.281]

87 It is important to mention that new dictionary words can be easily added to the dictionary and the IP model does not require retraining on the new words (using the method of calculating interaction terms suggested in this paper). [sent-160, score-1.399]

88 The only information about the new word that has to be supplied to the system is the ordering of the letters. [sent-161, score-0.309]

89 Knowing the nearest neighbor pairwise probabilities, p_{ij}(x_i, x_j), it is easy to calculate the location estimates between any two letters of the new word. [sent-162, score-0.597]

90 Furthermore, the IP model can easily recognize words where many of the letters are highly distorted or missing. [sent-163, score-0.345]

91 In standard first-order HMMs with time-independent transition probabilities, the probability of remaining in the i-th state for exactly d time steps is illustrated in Fig. [sent-164, score-0.132]

92 The real probability distribution of letter widths is actually similar to a Poisson distribution [11], Fig. [sent-166, score-0.47]
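The contrast drawn in sentences 91-92 is between the geometric state-duration distribution implicit in a first-order HMM and the roughly Poisson-shaped distribution of real letter widths. A small sketch, with the self-transition probability and Poisson mean chosen arbitrarily for illustration:

```python
import numpy as np
from scipy.stats import poisson

def geometric_duration(p_stay, d):
    """P(remaining in a state for exactly d steps) in a standard first-order HMM."""
    return (p_stay ** (d - 1)) * (1 - p_stay)

durations = np.arange(1, 21)
hmm_pd = np.array([geometric_duration(0.7, d) for d in durations])  # monotone decay
letter_width_pd = poisson.pmf(durations, mu=6)   # peaked, closer to observed widths
```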

93 It has been shown that explicit duration HMMs can significantly improve recognition accuracy, but at the expense of a significant increase in computational complexity [5] . [sent-168, score-0.106]

94 Our model, on the other hand, can easily model arbitrarily complex pairwise probabilities without increasing the computational complexity (using DP in a nearest neighbor approximation). [sent-169, score-0.456]

95 We believe that including more precise interaction terms will yield significantly better results (as in HMMs) and this work is currently in progress. [sent-171, score-0.149]

96 Using hierarchical shape models to spot keywords in cursive handwriting data. [sent-201, score-0.377]

97 Modeling duration in a hidden markov model with the exponential family. [sent-207, score-0.149]

98 Neural network-based context driven recognition of on-line cursive script. [sent-228, score-0.24]

99 Theory to practice: A case study - recognizing cursive handwriting. [sent-238, score-0.155]

100 On-line cursive script recognition using time delay neural networks and hidden markov models. [sent-247, score-0.367]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('dictionary', 0.53), ('letter', 0.418), ('word', 0.281), ('hmms', 0.232), ('letters', 0.207), ('cursive', 0.155), ('interaction', 0.149), ('detection', 0.136), ('ip', 0.135), ('distortions', 0.135), ('handwriting', 0.133), ('neighbor', 0.115), ('writers', 0.11), ('pairwise', 0.107), ('nearest', 0.104), ('pmin', 0.088), ('th', 0.086), ('probabilities', 0.076), ('handwritten', 0.069), ('segmentation', 0.069), ('olw', 0.066), ('energy', 0.066), ('recognition', 0.062), ('pattern', 0.062), ('pij', 0.06), ('shape', 0.055), ('distortion', 0.054), ('words', 0.052), ('widths', 0.052), ('rumelhart', 0.052), ('reference', 0.051), ('regions', 0.049), ('potentials', 0.048), ('duration', 0.044), ('erms', 0.044), ('neskovic', 0.044), ('numb', 0.044), ('preprocessor', 0.044), ('script', 0.044), ('stroke', 0.044), ('hidden', 0.042), ('overview', 0.04), ('location', 0.039), ('detectors', 0.039), ('burl', 0.038), ('spatial', 0.038), ('markov', 0.038), ('dp', 0.037), ('configuration', 0.037), ('matrix', 0.037), ('en', 0.036), ('di', 0.034), ('lall', 0.034), ('interactive', 0.034), ('nns', 0.034), ('keywords', 0.034), ('forbidden', 0.034), ('qk', 0.034), ('xj', 0.034), ('suggested', 0.032), ('charge', 0.032), ('propagate', 0.032), ('distorted', 0.032), ('objective', 0.031), ('sequential', 0.031), ('parts', 0.031), ('influence', 0.03), ('xi', 0.03), ('easily', 0.029), ('located', 0.028), ('knowing', 0.028), ('emission', 0.028), ('supplied', 0.028), ('hybrid', 0.028), ('every', 0.028), ('arranged', 0.027), ('possibility', 0.026), ('delay', 0.026), ('calculate', 0.025), ('model', 0.025), ('elements', 0.025), ('illustrated', 0.025), ('ox', 0.024), ('brain', 0.024), ('modeling', 0.023), ('brown', 0.023), ('driven', 0.023), ('feedforward', 0.023), ('object', 0.023), ('ln', 0.022), ('act', 0.022), ('er', 0.022), ('beginning', 0.022), ('chose', 0.022), ('ofthe', 0.022), ('positions', 0.022), ('segments', 0.022), ('calculated', 0.021), ('transition', 0.021), ('region', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 71 nips-2000-Interactive Parts Model: An Application to Recognition of On-line Cursive Script

Author: Predrag Neskovic, Philip C. Davis, Leon N. Cooper

Abstract: In this work, we introduce an Interactive Parts (IP) model as an alternative to Hidden Markov Models (HMMs). We tested both models on a database of on-line cursive script. We show that implementations of HMMs and the IP model, in which all letters are assumed to have the same average width, give comparable results. However, in contrast to HMMs, the IP model can handle duration modeling without an increase in computational complexity.

2 0.17301437 6 nips-2000-A Neural Probabilistic Language Model

Author: Yoshua Bengio, Réjean Ducharme, Pascal Vincent

Abstract: A goal of statistical language modeling is to learn the joint probability function of sequences of words. This is intrinsically difficult because of the curse of dimensionality: we propose to fight it with its own weapons. In the proposed approach one learns simultaneously (1) a distributed representation for each word (i.e. a similarity between words) along with (2) the probability function for word sequences, expressed with these representations. Generalization is obtained because a sequence of words that has never been seen before gets high probability if it is made of words that are similar to words forming an already seen sentence. We report on experiments using neural networks for the probability function, showing on two text corpora that the proposed approach very significantly improves on a state-of-the-art trigram model.

3 0.12677464 19 nips-2000-Adaptive Object Representation with Hierarchically-Distributed Memory Sites

Author: Bosco S. Tjan

Abstract: Theories of object recognition often assume that only one representation scheme is used within one visual-processing pathway. Versatility of the visual system comes from having multiple visual-processing pathways, each specialized in a different category of objects. We propose a theoretically simpler alternative, capable of explaining the same set of data and more. A single primary visual-processing pathway, loosely modular, is assumed. Memory modules are attached to sites along this pathway. Object-identity decision is made independently at each site. A site's response time is a monotonic-decreasing function of its confidence regarding its decision. An observer's response is the first-arriving response from any site. The effective representation(s) of such a system, determined empirically, can appear to be specialized for different tasks and stimuli, consistent with recent clinical and functional-imaging findings. This, however, merely reflects a decision being made at its appropriate level of abstraction. The system itself is intrinsically flexible and adaptive.

4 0.085270092 90 nips-2000-New Approaches Towards Robust and Adaptive Speech Recognition

Author: Hervé Bourlard, Samy Bengio, Katrin Weber

Abstract: In this paper, we discuss some new research directions in automatic speech recognition (ASR), which somewhat deviate from the usual approaches. More specifically, we will motivate and briefly describe new approaches based on multi-stream and multi-band ASR. These approaches extend the standard hidden Markov model (HMM) based approach by assuming that the different (frequency) channels representing the speech signal are processed by different (independent)

5 0.08522626 117 nips-2000-Shape Context: A New Descriptor for Shape Matching and Object Recognition

Author: Serge Belongie, Jitendra Malik, Jan Puzicha

Abstract: We develop an approach to object recognition based on matching shapes and using a resulting measure of similarity in a nearest neighbor classifier. The key algorithmic problem here is that of finding pointwise correspondences between an image shape and a stored prototype shape. We introduce a new shape descriptor, the shape context, which makes this possible, using a simple and robust algorithm. The shape context at a point captures the distribution over relative positions of other shape points and thus summarizes global shape in a rich, local descriptor. We demonstrate that shape contexts greatly simplify recovery of correspondences between points of two given shapes. Once shapes are aligned, shape contexts are used to define a robust score for measuring shape similarity. We have used this score in a nearest-neighbor classifier for recognition of hand written digits as well as 3D objects, using exactly the same distance function. On the benchmark MNIST dataset of handwritten digits, this yields an error rate of 0.63%, outperforming other published techniques. 1

6 0.079177096 98 nips-2000-Partially Observable SDE Models for Image Sequence Recognition Tasks

7 0.073152505 23 nips-2000-An Adaptive Metric Machine for Pattern Classification

8 0.073005907 38 nips-2000-Data Clustering by Markovian Relaxation and the Information Bottleneck Method

9 0.072584011 141 nips-2000-Universality and Individuality in a Neural Code

10 0.060859378 130 nips-2000-Text Classification using String Kernels

11 0.056164004 103 nips-2000-Probabilistic Semantic Video Indexing

12 0.054845646 4 nips-2000-A Linear Programming Approach to Novelty Detection

13 0.054689799 10 nips-2000-A Productive, Systematic Framework for the Representation of Visual Structure

14 0.053089194 41 nips-2000-Discovering Hidden Variables: A Structure-Based Approach

15 0.052539244 79 nips-2000-Learning Segmentation by Random Walks

16 0.048824426 51 nips-2000-Factored Semi-Tied Covariance Matrices

17 0.047986254 96 nips-2000-One Microphone Source Separation

18 0.047737956 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning

19 0.047407806 75 nips-2000-Large Scale Bayes Point Machines

20 0.046636049 2 nips-2000-A Comparison of Image Processing Techniques for Visual Speech Recognition Applications


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.182), (1, -0.047), (2, 0.052), (3, 0.034), (4, -0.015), (5, 0.022), (6, 0.009), (7, -0.02), (8, 0.045), (9, 0.231), (10, 0.132), (11, 0.123), (12, 0.167), (13, 0.113), (14, -0.029), (15, 0.127), (16, 0.076), (17, -0.134), (18, 0.107), (19, 0.118), (20, 0.194), (21, -0.168), (22, 0.056), (23, 0.103), (24, 0.097), (25, 0.07), (26, 0.03), (27, -0.032), (28, 0.04), (29, 0.106), (30, 0.001), (31, -0.072), (32, 0.047), (33, -0.031), (34, -0.183), (35, 0.093), (36, 0.002), (37, -0.037), (38, 0.196), (39, 0.072), (40, -0.074), (41, 0.076), (42, -0.058), (43, 0.036), (44, 0.105), (45, 0.029), (46, -0.025), (47, 0.112), (48, -0.047), (49, 0.007)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96152306 71 nips-2000-Interactive Parts Model: An Application to Recognition of On-line Cursive Script

Author: Predrag Neskovic, Philip C. Davis, Leon N. Cooper

Abstract: In this work, we introduce an Interactive Parts (IP) model as an alternative to Hidden Markov Models (HMMs). We tested both models on a database of on-line cursive script. We show that implementations of HMMs and the IP model, in which all letters are assumed to have the same average width, give comparable results. However, in contrast to HMMs, the IP model can handle duration modeling without an increase in computational complexity.

2 0.69170171 6 nips-2000-A Neural Probabilistic Language Model

Author: Yoshua Bengio, Réjean Ducharme, Pascal Vincent

Abstract: A goal of statistical language modeling is to learn the joint probability function of sequences of words. This is intrinsically difficult because of the curse of dimensionality: we propose to fight it with its own weapons. In the proposed approach one learns simultaneously (1) a distributed representation for each word (i.e. a similarity between words) along with (2) the probability function for word sequences, expressed with these representations. Generalization is obtained because a sequence of words that has never been seen before gets high probability if it is made of words that are similar to words forming an already seen sentence. We report on experiments using neural networks for the probability function, showing on two text corpora that the proposed approach very significantly improves on a state-of-the-art trigram model.

3 0.56517339 19 nips-2000-Adaptive Object Representation with Hierarchically-Distributed Memory Sites

Author: Bosco S. Tjan

Abstract: Theories of object recognition often assume that only one representation scheme is used within one visual-processing pathway. Versatility of the visual system comes from having multiple visual-processing pathways, each specialized in a different category of objects. We propose a theoretically simpler alternative, capable of explaining the same set of data and more. A single primary visual-processing pathway, loosely modular, is assumed. Memory modules are attached to sites along this pathway. Object-identity decision is made independently at each site. A site's response time is a monotonic-decreasing function of its confidence regarding its decision. An observer's response is the first-arriving response from any site. The effective representation(s) of such a system, determined empirically, can appear to be specialized for different tasks and stimuli, consistent with recent clinical and functional-imaging findings. This, however, merely reflects a decision being made at its appropriate level of abstraction. The system itself is intrinsically flexible and adaptive.

4 0.40825519 131 nips-2000-The Early Word Catches the Weights

Author: Mark A. Smith, Garrison W. Cottrell, Karen L. Anderson

Abstract: The strong correlation between the frequency of words and their naming latency has been well documented. However, as early as 1973, the Age of Acquisition (AoA) of a word was alleged to be the actual variable of interest, but these studies seem to have been ignored in most of the literature. Recently, there has been a resurgence of interest in AoA. While some studies have shown that frequency has no effect when AoA is controlled for, more recent studies have found independent contributions of frequency and AoA. Connectionist models have repeatedly shown strong effects of frequency, but little attention has been paid to whether they can also show AoA effects. Indeed, several researchers have explicitly claimed that they cannot show AoA effects. In this work, we explore these claims using a simple feed forward neural network. We find a significant contribution of AoA to naming latency, as well as conditions under which frequency provides an independent contribution. 1 Background Naming latency is the time between the presentation of a picture or written word and the beginning of the correct utterance of that word. It is undisputed that there are significant differences in the naming latency of many words, even when controlling word length, syllabic complexity, and other structural variants. The cause of differences in naming latency has been the subject of numerous studies. Earlier studies found that the frequency with which a word appears in spoken English is the best determinant of its naming latency (Oldfield & Wingfield, 1965). More recent psychological studies, however, show that the age at which a word is learned, or its Age of Acquisition (AoA), may be a better predictor of naming latency. Further, in many multiple regression analyses, frequency is not found to be significant when AoA is controlled for (Brown & Watson, 1987; Carroll & White, 1973; Morrison et al. 1992; Morrison & Ellis, 1995). These studies show that frequency and AoA are highly correlated (typically r =-.6) explaining the confound of older studies on frequency. However, still more recent studies question this finding and find that both AoA and frequency are significant and contribute independently to naming latency (Ellis & Morrison, 1998; Gerhand & Barry, 1998,1999). Much like their psychological counterparts, connectionist networks also show very strong frequency effects. However, the ability of a connectionist network to show AoA effects has been doubted (Gerhand & Barry, 1998; Morrison & Ellis, 1995). Most of these claims are based on the well known fact that connectionist networks exhibit

5 0.39501405 23 nips-2000-An Adaptive Metric Machine for Pattern Classification

Author: Carlotta Domeniconi, Jing Peng, Dimitrios Gunopulos

Abstract: Nearest neighbor classification assumes locally constant class conditional probabilities. This assumption becomes invalid in high dimensions with finite samples due to the curse of dimensionality. Severe bias can be introduced under these conditions when using the nearest neighbor rule. We propose a locally adaptive nearest neighbor classification method to try to minimize bias. We use a Chi-squared distance analysis to compute a flexible metric for producing neighborhoods that are elongated along less relevant feature dimensions and constricted along most influential ones. As a result, the class conditional probabilities tend to be smoother in the modified neighborhoods, whereby better classification performance can be achieved. The efficacy of our method is validated and compared against other techniques using a variety of real world data. 1

6 0.34442133 90 nips-2000-New Approaches Towards Robust and Adaptive Speech Recognition

7 0.3214246 98 nips-2000-Partially Observable SDE Models for Image Sequence Recognition Tasks

8 0.29883903 117 nips-2000-Shape Context: A New Descriptor for Shape Matching and Object Recognition

9 0.29734424 103 nips-2000-Probabilistic Semantic Video Indexing

10 0.29231554 141 nips-2000-Universality and Individuality in a Neural Code

11 0.25177339 132 nips-2000-The Interplay of Symbolic and Subsymbolic Processes in Anagram Problem Solving

12 0.25060678 38 nips-2000-Data Clustering by Markovian Relaxation and the Information Bottleneck Method

13 0.24803275 10 nips-2000-A Productive, Systematic Framework for the Representation of Visual Structure

14 0.24761352 15 nips-2000-Accumulator Networks: Suitors of Local Probability Propagation

15 0.23202197 138 nips-2000-The Use of Classifiers in Sequential Inference

16 0.22407925 79 nips-2000-Learning Segmentation by Random Walks

17 0.22133866 86 nips-2000-Model Complexity, Goodness of Fit and Diminishing Returns

18 0.22034067 64 nips-2000-High-temperature Expansions for Learning Models of Nonnegative Data

19 0.21799219 128 nips-2000-Support Vector Novelty Detection Applied to Jet Engine Vibration Spectra

20 0.21480696 112 nips-2000-Reinforcement Learning with Function Approximation Converges to a Region


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(4, 0.019), (10, 0.04), (17, 0.109), (32, 0.019), (33, 0.039), (35, 0.307), (54, 0.01), (55, 0.055), (62, 0.044), (65, 0.017), (67, 0.041), (75, 0.047), (76, 0.037), (79, 0.024), (81, 0.044), (90, 0.027), (97, 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.78787631 71 nips-2000-Interactive Parts Model: An Application to Recognition of On-line Cursive Script

Author: Predrag Neskovic, Philip C. Davis, Leon N. Cooper

Abstract: In this work, we introduce an Interactive Parts (IP) model as an alternative to Hidden Markov Models (HMMs). We tested both models on a database of on-line cursive script. We show that implementations of HMMs and the IP model, in which all letters are assumed to have the same average width, give comparable results. However, in contrast to HMMs, the IP model can handle duration modeling without an increase in computational complexity.

2 0.72203147 36 nips-2000-Constrained Independent Component Analysis

Author: Wei Lu, Jagath C. Rajapakse

Abstract: The paper presents a novel technique of constrained independent component analysis (CICA) to introduce constraints into the classical ICA and solve the constrained optimization problem by using Lagrange multiplier methods. This paper shows that CICA can be used to order the resulted independent components in a specific manner and normalize the demixing matrix in the signal separation procedure. It can systematically eliminate the ICA's indeterminacy on permutation and dilation. The experiments demonstrate the use of CICA in ordering of independent components while providing normalized demixing processes. Keywords: Independent component analysis, constrained independent component analysis, constrained optimization, Lagrange multiplier methods 1

3 0.43678048 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning

Author: Zoubin Ghahramani, Matthew J. Beal

Abstract: Variational approximations are becoming a widespread tool for Bayesian learning of graphical models. We provide some theoretical results for the variational updates in a very general family of conjugate-exponential graphical models. We show how the belief propagation and the junction tree algorithms can be used in the inference step of variational Bayesian learning. Applying these results to the Bayesian analysis of linear-Gaussian state-space models we obtain a learning procedure that exploits the Kalman smoothing propagation, while integrating over all model parameters. We demonstrate how this can be used to infer the hidden state dimensionality of the state-space model in a variety of synthetic problems and one real high-dimensional data set. 1

4 0.435146 122 nips-2000-Sparse Representation for Gaussian Process Models

Author: Lehel Csató, Manfred Opper

Abstract: We develop an approach for a sparse representation for Gaussian Process (GP) models in order to overcome the limitations of GPs caused by large data sets. The method is based on a combination of a Bayesian online algorithm together with a sequential construction of a relevant subsample of the data which fully specifies the prediction of the model. Experimental results on toy examples and large real-world data sets indicate the efficiency of the approach.

5 0.43340731 98 nips-2000-Partially Observable SDE Models for Image Sequence Recognition Tasks

Author: Javier R. Movellan, Paul Mineiro, Ruth J. Williams

Abstract: This paper explores a framework for recognition of image sequences using partially observable stochastic differential equation (SDE) models. Monte-Carlo importance sampling techniques are used for efficient estimation of sequence likelihoods and sequence likelihood gradients. Once the network dynamics are learned, we apply the SDE models to sequence recognition tasks in a manner similar to the way Hidden Markov models (HMMs) are commonly applied. The potential advantage of SDEs over HMMS is the use of continuous state dynamics. We present encouraging results for a video sequence recognition task in which SDE models provided excellent performance when compared to hidden Markov models. 1

6 0.43220821 79 nips-2000-Learning Segmentation by Random Walks

7 0.43021172 104 nips-2000-Processing of Time Series by Neural Circuits with Biologically Realistic Synaptic Dynamics

8 0.42642215 10 nips-2000-A Productive, Systematic Framework for the Representation of Visual Structure

9 0.42462051 107 nips-2000-Rate-coded Restricted Boltzmann Machines for Face Recognition

10 0.42416102 38 nips-2000-Data Clustering by Markovian Relaxation and the Information Bottleneck Method

11 0.42314845 74 nips-2000-Kernel Expansions with Unlabeled Examples

12 0.42103487 146 nips-2000-What Can a Single Neuron Compute?

13 0.4197495 30 nips-2000-Bayesian Video Shot Segmentation

14 0.4176611 7 nips-2000-A New Approximate Maximal Margin Classification Algorithm

15 0.4171018 4 nips-2000-A Linear Programming Approach to Novelty Detection

16 0.41614237 2 nips-2000-A Comparison of Image Processing Techniques for Visual Speech Recognition Applications

17 0.41530195 130 nips-2000-Text Classification using String Kernels

18 0.41490701 102 nips-2000-Position Variance, Recurrence and Perceptual Learning

19 0.41417933 80 nips-2000-Learning Switching Linear Models of Human Motion

20 0.41350374 95 nips-2000-On a Connection between Kernel PCA and Metric Multidimensional Scaling