nips nips2007 nips2007-71 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Fred Richardson, William M. Campbell
Abstract: Many tasks in speech processing involve classification of long term characteristics of a speech segment such as language, speaker, dialect, or topic. A natural technique for determining these characteristics is to first convert the input speech into a sequence of tokens such as words, phones, etc. From these tokens, we can then look for distinctive sequences, keywords, that characterize the speech. In many applications, a set of distinctive keywords may not be known a priori. In this case, an automatic method of building up keywords from short context units such as phones is desirable. We propose a method for the construction of keywords based upon Support Vector Machines. We cast the problem of keyword selection as a feature selection problem for n-grams of phones. We propose an alternating filter-wrapper method that builds successively longer keywords. Application of this method to language recognition and topic recognition tasks shows that the technique produces interesting and significant qualitative and quantitative results.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: Many tasks in speech processing involve classification of long term characteristics of a speech segment such as language, speaker, dialect, or topic. [sent-7, score-0.398]
2 A natural technique for determining these characteristics is to first convert the input speech into a sequence of tokens such as words, phones, etc. [sent-8, score-0.31]
3 In many applications, a set of distinctive keywords may not be known a priori. [sent-10, score-0.382]
4 In this case, an automatic method of building up keywords from short context units such as phones is desirable. [sent-11, score-0.746]
5 We propose a method for the construction of keywords based upon Support Vector Machines. [sent-12, score-0.412]
6 We cast the problem of keyword selection as a feature selection problem for n-grams of phones. [sent-13, score-0.628]
7 Application of this method to language recognition and topic recognition tasks shows that the technique produces interesting and significant qualitative and quantitative results. [sent-15, score-0.735]
8 1 Introduction A common problem in speech processing is to identify properties of a speech segment such as the language, speaker, topic, or dialect. [sent-16, score-0.371]
9 A set of classifiers is applied to a speech segment to produce a decision. [sent-18, score-0.22]
10 For instance, for language recognition, we might construct detectors for English, French, and Spanish. [sent-19, score-0.224]
11 The maximum scoring detector on a speech segment would be the predicted language. [sent-20, score-0.22]
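A minimal sketch of this decision rule (Python; the language names and scores are hypothetical, standing in for the outputs of per-language detectors):

```python
def predict_language(detector_scores):
    """Apply one detector per language to a speech segment and return
    the language whose detector scores highest."""
    return max(detector_scores, key=detector_scores.get)

# Hypothetical detector scores for one speech segment
print(predict_language({"English": 1.2, "French": -0.3, "Spanish": 0.4}))  # English
```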
12 A first approach uses short-term spectral characteristics of the speech and models these with Gaussian mixture models (GMMs) or support vector machines (SVMs) directly producing a decision. [sent-22, score-0.285]
13 A second approach uses high level features of the speech such as phones and words to detect the properties. [sent-24, score-0.435]
14 For example, a particular phone or word sequence might indicate the topic. [sent-26, score-0.22]
15 SVMs have become a common method of extracting high-level properties of sequences of speech tokens [1, 2, 3, 4]. [sent-28, score-0.324]
16 Sequence kernels are constructed by viewing a speech segment as a document of tokens. [sent-29, score-0.22]
17 The SVM feature space in this case is a scaling of co-occurrence probabilities of tokens in an utterance. [sent-30, score-0.237]
18 SVMs have been applied at many linguistic levels of tokens as detectors. [sent-32, score-0.163]
19 Our focus in this paper is at the acoustic phone level. [sent-33, score-0.252]
20 Our goal is to automatically derive long sequences of phones, which we call keywords, that are characteristic of a given class. [sent-34, score-0.295]
21 (∗ This work was sponsored by the Department of Homeland Security under Air Force Contract FA872105-C-0002.) [sent-36, score-0.382]
22 Prior work, for example, in language recognition [6], has shown that certain words are a significant predictor of a language. [sent-37, score-0.396]
23 For instance, the presence of the phrase “you know” in a conversational speech segment is a strong indicator of English. [sent-38, score-0.22]
24 A difficulty in using words as the indicator of the language is that we may not have available a speech-to-text (STT) system in all languages of interest. [sent-39, score-0.311]
25 In this case, we’d like to automatically construct keywords that are indicative of the language. [sent-40, score-0.382]
26 For instance, in topic recognition, proper names not in our STT system dictionary may be a strong indicator of topic. [sent-42, score-0.156]
27 Our basic approach is to view keyword construction as a feature selection problem. [sent-43, score-0.555]
28 Keywords are composed of sequences of phones of length n, i.e., n-grams of phones. [sent-44, score-0.295]
29 In Section 2.1, we review the basic architecture that we use for phone recognition and how it is applied to the problem. [sent-52, score-0.392]
30 Section 3.2 presents our method for constructing long context units of phones to automatically create keywords. [sent-58, score-0.291]
31 We use a novel feature selection approach that attempts to find longer strings that discriminate well between classes. [sent-59, score-0.205]
32 Finally, in Section 4, we show the application of our method to language and topic recognition problems. [sent-60, score-0.505]
33 Quantitatively, we show that the method produces keywords which are good discriminators between classes. [sent-62, score-0.41]
34 1 Phone Recognition The high-level token extraction component of our system is a phone recognizer based upon the Brno University (BUT) design [7]. [sent-64, score-0.419]
35 Second, the BUT recognizer extensively uses discriminatively trained feedforward artificial neural networks (ANNs) to model HMM state posterior probabilities. [sent-69, score-0.152]
36 We developed a phone recognizer for English units using the BUT architecture and automatically generated STT transcripts on the Switchboard 2 Cell corpora [8]. [sent-70, score-0.377]
37 Alternatively, we use the lattice to produce expected counts of tokens and n-grams of tokens. [sent-77, score-0.379]
38 Then bigrams are created by grouping two tokens at a time to form W2 = w1_w2, w2_w3, ..., wn−1_wn. [sent-80, score-0.132]
39 The count function for a given bigram di, count(di|W2), is the number of occurrences of di in the sequence W2. [sent-82, score-0.231]
40 To extend counts to a lattice, L, we find the expected count over all possible hypotheses in the lattice, count(di|L) = E_W[count(di|W)] = Σ_{W∈L} p(W|L) count(di|W). [sent-83, score-0.134]
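The following sketch (Python; the explicit enumeration of weighted hypotheses stands in for the lattice, whereas a real system would use the forward-backward computation described in Section 3.3) illustrates the per-sequence counts and the expected counts of the expression above:

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count n-grams (phones joined with '_') in a single token sequence W."""
    grams = ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(grams)

def expected_ngram_counts(hypotheses, n):
    """Expected n-gram counts over lattice hypotheses:
    count(d_i|L) = sum_W p(W|L) * count(d_i|W).
    `hypotheses` is a list of (posterior, token_list) pairs; enumerating
    whole paths is only for illustration."""
    expected = Counter()
    for posterior, tokens in hypotheses:
        for gram, c in ngram_counts(tokens, n).items():
            expected[gram] += posterior * c
    return expected

# Toy example: two competing phone hypotheses for one utterance
hyps = [(0.7, ["y", "uw", "n", "ow"]), (0.3, ["y", "uw", "m", "ow"])]
print(expected_ngram_counts(hyps, 2))  # y_uw: 1.0, uw_n: 0.7, n_ow: 0.7, uw_m: 0.3, m_ow: 0.3
```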
41 2 Discriminative Language Modeling: SVMs We focus on token-based language recognition with SVMs using the approach from [1, 4]. [sent-89, score-0.396]
42 Similar to [1], a lattice of tokens, L, is modeled using a bag-of-n-grams approach. [sent-90, score-0.184]
43 Joint probabilities of the unique n-grams, dj , on a per conversation basis are calculated, p(dj |L), see (2). [sent-91, score-0.282]
44 A typical choice is of the form Dj = min( Cj, gj( 1 / p(dj|all) ) ) (4), where gj(·) is a function which squashes the dynamic range, and Cj is a constant. [sent-94, score-0.226]
45 Typical choices for gj are gj(x) = x and gj(x) = log(x) + 1. [sent-97, score-0.339]
46 In both cases, the squashing function gj normalizes out the typicality of a feature across all classes. [sent-98, score-0.214]
47 For the experiments in this paper, we use gj(x) = x, which is suited to high-frequency token streams. [sent-101, score-0.145]
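A minimal sketch of assembling the weighted bag-of-n-grams feature vector of (4) is given below; the constant C, the identity squashing g(x) = x, and the `background_probs` input (the across-class probabilities p(dj|all)) are illustrative assumptions rather than the paper's exact settings.

```python
def weighted_ngram_features(counts, background_probs, C=1000.0, g=lambda x: x):
    """Per-conversation n-gram probability p(d_j|L) scaled by
    D_j = min(C, g(1 / p(d_j|all))), as in Eq. (4); the squashing g
    normalizes out how typical a feature is across all classes."""
    total = float(sum(counts.values()))
    features = {}
    for gram, c in counts.items():
        p_doc = c / total                         # p(d_j | L)
        p_all = background_probs.get(gram, 1e-9)  # p(d_j | all), floored
        features[gram] = p_doc * min(C, g(1.0 / p_all))
    return features
```

The SVM kernel between two conversations can then be taken as a dot product of such feature vectors; the exact placement of the scaling is a design choice.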
48 SVM training and scoring require only a method of kernel evaluation between two objects that produces positive definite kernel matrices (the Mercer condition). [sent-109, score-0.191]
49 1 SVM Feature Selection A first step towards an algorithm for automatic keyword generation using phones is to examine feature selection methods. [sent-114, score-0.817]
50 Ideally, we would like to select over all possible n-grams, where n is varying, the most discriminative sequences for determining a property of a speech segment. [sent-115, score-0.249]
51 As a first step, we examine feature selection for fixed n and look for keywords with n or less phones. [sent-118, score-0.546]
52 Since we are already using an SVM, a natural algorithm for discriminative feature selection in this case is to use a wrapper method [13]. [sent-120, score-0.313]
53 Guyon proposes an iterative wrapper method for feature selection for SVMs which has these basic steps: • For a set of features, S, find the SVM solution with model w. [sent-128, score-0.256]
54 Guyon’s algorithm for feature selection can be used for picking significant n-grams as keywords. [sent-133, score-0.164]
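A sketch of this wrapper step, in the spirit of Guyon's recursive feature elimination, is shown below; scikit-learn's LinearSVC, the keep fraction, and the iteration count are illustrative choices, and a binary one-vs-all detector is assumed.

```python
import numpy as np
from sklearn.svm import LinearSVC

def rfe_select(X, y, feature_names, keep_fraction=0.5, n_iters=3):
    """Guyon-style wrapper: train a linear SVM, rank the surviving
    features by |w_j|, keep the top fraction, and repeat."""
    selected = np.arange(X.shape[1])
    for _ in range(n_iters):
        svm = LinearSVC(C=1.0, max_iter=10000).fit(X[:, selected], y)
        ranking = np.argsort(-np.abs(svm.coef_).ravel())
        n_keep = max(1, int(len(selected) * keep_fraction))
        selected = selected[ranking[:n_keep]]
    return [feature_names[j] for j in selected]
```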
55 As an example, we have looked at this feature selection method for a language recognition task with trigrams (to be described in Section 4). [sent-136, score-0.6]
56 Figure 1 provides a motivation for the applicability of Guyon’s feature selection method. [sent-137, score-0.164]
57 This interesting result provides motivation that a small subset of keywords are significant to the task. [sent-147, score-0.382]
58 05 −3 10 −2 10 Threshold −1 10 0 0 10 Figure 1: Feature selection for a trigram language recognition task using Guyon’s method 4 3. [sent-155, score-0.499]
59 Now, suppose we want to find keywords for arbitrary n. [sent-158, score-0.382]
60 One possible hypothesis for keyword selection is that since higher order n-grams are discriminative, lower order n-grams in the keywords will also be discriminative. [sent-159, score-0.846]
61 On the basis of this idea, we propose the following algorithm for keyword construction: Keyword Building Algorithm • Start with an initial value of n = ns. [sent-161, score-0.361]
62 Initialize the set Sn to all possible n-grams of phones, including lower-order grams. [sent-162, score-0.254]
63 A few items should be noted about the proposed keyword building algorithm. [sent-171, score-0.396]
64 First, we call the second feature selection process a filter step, since induction has not been applied to the (n + 1)-gram features. [sent-172, score-0.164]
65 In our experiments and in the algorithm description, we nominally append one phone to the beginning and end of an n-gram. [sent-175, score-0.266]
66 For instance, suppose the keyword is some_people which has phone transcript s_ah_m_p_iy_p_l. [sent-177, score-0.581]
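Putting the two steps together, the outline below sketches the alternating filter-wrapper procedure; `svm_select` is a hypothetical stand-in for the wrapper step (train the SVM on the candidate n-gram features and keep the top- and bottom-ranking ones), and the number of growth iterations is illustrative.

```python
def build_keywords(initial_ngrams, phone_set, svm_select, n_grow=2):
    """Alternating filter-wrapper keyword building:
    1. Wrapper: SVM feature selection over the current n-gram candidates.
    2. Filter: propose (n+1)-grams by appending one phone to the beginning
       and to the end of each surviving n-gram (no SVM retraining yet).
    3. Repeat with the longer candidates."""
    survivors = svm_select(initial_ngrams)
    for _ in range(n_grow):
        candidates = set(survivors)
        for gram in survivors:
            for p in phone_set:
                candidates.add(p + "_" + gram)   # extend at the beginning
                candidates.add(gram + "_" + p)   # extend at the end
        survivors = svm_select(candidates)
    return survivors
```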
67 3 Keyword Implementation The expected n-gram counts were computed from lattices using the forward-backward algorithm. [sent-180, score-0.183]
68 … p(src_nd(a_{j+n})) (9) Equation (9) is attractive because it provides a way of computing the path posteriors locally using only the individual arc and node posteriors along the path. [sent-199, score-0.18]
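Only a fragment of Equation (9) is visible above, so the sketch below relies on the standard lattice identity for a path posterior, the product of the arc posteriors along the path divided by the posteriors of the internal source nodes joining consecutive arcs, which matches the role the text assigns to the arc and node posteriors; the exact form used in the paper should be treated as an assumption here.

```python
def path_posterior(arc_posteriors, node_posteriors):
    """Posterior of an n-gram path computed locally: multiply the arc
    posteriors and divide by each internal source-node posterior
    (assumed form; not necessarily identical to Eq. (9))."""
    post = 1.0
    for i, p_arc in enumerate(arc_posteriors):
        post *= p_arc
        if i > 0:
            post /= node_posteriors[i - 1]   # posterior of src_nd(a_i)
    return post

# Toy 3-arc path with two internal nodes
print(path_posterior([0.9, 0.8, 0.7], [0.95, 0.85]))  # ~0.624
```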
69 1 Language Recognition Experimental Setup The phone recognizer described in Section 2. [sent-202, score-0.34]
70 1 was used to generate lattices across a train and an evaluation data set. [sent-203, score-0.159]
71 The training data set consists of more than 360 hours of telephone speech spanning 13 different languages and coming from a variety of different sources including Callhome, Callfriend and Fisher. [sent-204, score-0.299]
72 We evaluated our system for the 30 and 10 second task under the NIST 2005 closed condition, which limits the evaluation data to 7 languages (English, Hindi, Japanese, Korean, Mandarin, Spanish and Tamil) coming only from the OHSU data source. [sent-206, score-0.167]
73 The training and evaluation data was segmented using an automatic speech activity detector and segments smaller than 0. [sent-207, score-0.228]
74 Lattice arcs with posterior probabilities lower than 10^−6 were removed and lattice expected counts smaller than 10^−3 were ignored. [sent-210, score-0.331]
75 The top and bottom 600 ranking keywords for each language were selected after each training iteration. [sent-211, score-0.688]
76 2 Language Recognition Results (Qualitative and Quantitative) To get a sense of how well our keyword building algorithm was working, we looked at the top ranking keywords from the English model only (since our phone recognizer is trained using the English phone set). [sent-214, score-1.46]
77 Table 1 summarizes a few of the more compelling phone 5-grams, and a possible keyword that corresponds to each one. [sent-215, score-0.613]
78 The equal error rates for our system on the NIST 2005 language recognition evaluation are summarized in Table 2. [sent-217, score-0.482]
79 The 4-gram system gave a relative improvement of 12% on the 10 second task and 9% on the 30 second task, but despite the compelling keywords produced by the 5-gram system, the performance actually degraded significantly compared to the 3-gram and 4-gram systems. [sent-218, score-0.461]
80 Table 1: Top ranking keywords for the 5-gram SVM English language recognition model. Columns: phones, Rank, keyword; visible entries include the phone string SIL_Y_UW_N_OW, the ranks 1, 3, 4, 6, 7, 8, 17, 23, 27, 29, 37, and the keywords "you know" and "yeah ?" [sent-219, score-1.447]
81 NULL_SIL_W_EH_L. Table 2: %EER for the 10 and 30 second NIST language recognition tasks (columns: N = 1, 2, 3, 4, 5; rows: 10sec, 30sec). [sent-227, score-0.396]
82 3 Topic Recognition Experimental Setup Topic recognition was performed using a subset of the phase I Fisher corpus (English) from LDC. [sent-238, score-0.172]
83 The training set was used to find keywords and models for topic detection. [sent-248, score-0.491]
84 4 Topic Recognition Results We first looked at top ranking keywords for several topics; some results are shown in Table 3. [sent-250, score-0.504]
85 We can see that many keywords show a strong correspondence with the topic. [sent-251, score-0.382]
86 Also, there are partial keywords which correspond to what appear to be longer keywords, e.g. [sent-252, score-0.423]
87 As in the language recognition task, we used EER as the performance measure. [sent-255, score-0.396]
88 But, as with the language recognition task, we see a degradation in performance for 5-grams. [sent-258, score-0.396]
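For reference, the equal error rate (EER) is the operating point at which a detector's miss rate equals its false-alarm rate; a minimal sketch of computing it from raw detector scores (sweeping the threshold over the scores themselves, an illustrative choice) is:

```python
def equal_error_rate(scores, labels):
    """EER sketch: sweep a threshold and return the error rate where the
    miss rate and false-alarm rate are closest (assumes both target and
    non-target trials are present)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_gap, eer = float("inf"), None
    for t in sorted(scores):
        miss = sum(1 for s, l in zip(scores, labels) if l and s < t) / n_pos
        fa = sum(1 for s, l in zip(scores, labels) if not l and s >= t) / n_neg
        if abs(miss - fa) < best_gap:
            best_gap, eer = abs(miss - fa), (miss + fa) / 2.0
    return eer

# Toy usage: two target and two non-target scores
print(equal_error_rate([2.0, 1.5, 0.2, -0.5], [1, 1, 0, 0]))  # 0.0
```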
89 5 Conclusions and future work We presented a method for automatic construction of keywords given a discriminative speech classification task. [sent-259, score-0.658]
90 Our method was based upon successively building longer span keywords from shorter span keywords using phones as a fundamental unit. [sent-260, score-1.14]
91 The problem was cast as a feature selection problem, and an alternating filter and wrapper algorithm was proposed. [sent-261, score-0.289]
92 Results showed that reasonable keywords and improved performance could be achieved using this methodology. [sent-262, score-0.382]
93 First, extension and experimentation on other tasks such as dialect and speaker recognition would be interesting. [sent-267, score-0.292]
94 Second, comparison of this method with other feature selection methods may be appropriate [16]. [sent-269, score-0.164]
95 For instance, we might want to consider more general keyword models where skips are allowed (or more general finite state transducers [17]). [sent-271, score-0.361]
96 Leek, “Phonetic speaker recognition with support vector machines,” in Advances in Neural Information Processing Systems 16, Sebastian Thrun, Lawrence Saul, and Bernhard Schölkopf, Eds. [sent-283, score-0.286]
97 Torres-Carrasquillo, “Advanced language recognition using cepstra and phonotactics: MITLL system performance on the NIST 2005 language recognition evaluation,” in Proc. [sent-293, score-0.839]
98 [4] Lu-Feng Zhai, Man hung Siu, Xi Yang, and Herbert Gish, “Discriminatively trained language models using support vector machines for language identification,” in Proc. [sent-296, score-0.555]
99 Reynolds, “Language recognition with word lattices and support vector machines,” in Proceedings of ICASSP, 2007, pp. [sent-305, score-0.332]
100 Vapnik, “Gene selection for cancer classification using support vector machines,” Machine Learning, vol. [sent-343, score-0.143]
wordName wordTfidf (topN-words)
[('keywords', 0.382), ('keyword', 0.361), ('phones', 0.254), ('language', 0.224), ('phone', 0.22), ('lattice', 0.184), ('recognition', 0.172), ('dj', 0.169), ('speech', 0.151), ('aj', 0.141), ('eer', 0.138), ('tokens', 0.132), ('guyon', 0.128), ('lattices', 0.12), ('recognizer', 0.12), ('gj', 0.113), ('topic', 0.109), ('selection', 0.103), ('campbell', 0.1), ('svm', 0.094), ('wrapper', 0.092), ('english', 0.087), ('di', 0.08), ('svms', 0.08), ('nist', 0.077), ('sn', 0.076), ('speaker', 0.074), ('count', 0.071), ('conversation', 0.069), ('hobbies', 0.069), ('reynolds', 0.069), ('stt', 0.069), ('segment', 0.069), ('cdf', 0.068), ('machines', 0.067), ('counts', 0.063), ('kernel', 0.062), ('feature', 0.061), ('arc', 0.06), ('discriminative', 0.057), ('ranking', 0.054), ('cj', 0.053), ('lter', 0.047), ('system', 0.047), ('callfriend', 0.046), ('dialect', 0.046), ('nominally', 0.046), ('odyssey', 0.046), ('pavel', 0.046), ('quicknet', 0.046), ('successively', 0.046), ('wi', 0.045), ('movies', 0.044), ('probabilities', 0.044), ('duration', 0.042), ('longer', 0.041), ('coming', 0.041), ('posteriors', 0.041), ('sequences', 0.041), ('utterance', 0.04), ('icsi', 0.04), ('htk', 0.04), ('monophones', 0.04), ('telephone', 0.04), ('arcs', 0.04), ('squashing', 0.04), ('svmtorch', 0.04), ('support', 0.04), ('looked', 0.04), ('icassp', 0.04), ('languages', 0.04), ('evaluation', 0.039), ('automatic', 0.038), ('node', 0.038), ('units', 0.037), ('spoken', 0.037), ('schools', 0.037), ('package', 0.037), ('alternate', 0.036), ('building', 0.035), ('hmm', 0.034), ('alternating', 0.033), ('token', 0.032), ('compelling', 0.032), ('acoustic', 0.032), ('discriminatively', 0.032), ('detection', 0.031), ('linguistic', 0.031), ('features', 0.03), ('qualitative', 0.03), ('construction', 0.03), ('movie', 0.029), ('distinguishing', 0.029), ('richardson', 0.029), ('top', 0.028), ('produces', 0.028), ('hypothetical', 0.027), ('hours', 0.027), ('entries', 0.027), ('characteristics', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999952 71 nips-2007-Discriminative Keyword Selection Using Support Vector Machines
Author: Fred Richardson, William M. Campbell
Abstract: Many tasks in speech processing involve classification of long term characteristics of a speech segment such as language, speaker, dialect, or topic. A natural technique for determining these characteristics is to first convert the input speech into a sequence of tokens such as words, phones, etc. From these tokens, we can then look for distinctive sequences, keywords, that characterize the speech. In many applications, a set of distinctive keywords may not be known a priori. In this case, an automatic method of building up keywords from short context units such as phones is desirable. We propose a method for the construction of keywords based upon Support Vector Machines. We cast the problem of keyword selection as a feature selection problem for n-grams of phones. We propose an alternating filter-wrapper method that builds successively longer keywords. Application of this method to language recognition and topic recognition tasks shows that the technique produces interesting and significant qualitative and quantitative results.
2 0.12150881 129 nips-2007-Mining Internet-Scale Software Repositories
Author: Erik Linstead, Paul Rigor, Sushil Bajracharya, Cristina Lopes, Pierre F. Baldi
Abstract: Large repositories of source code create new challenges and opportunities for statistical machine learning. Here we first develop Sourcerer, an infrastructure for the automated crawling, parsing, and database storage of open source software. Sourcerer allows us to gather Internet-scale source code. For instance, in one experiment, we gather 4,632 java projects from SourceForge and Apache totaling over 38 million lines of code from 9,250 developers. Simple statistical analyses of the data first reveal robust power-law behavior for package, SLOC, and lexical containment distributions. We then develop and apply unsupervised author-topic, probabilistic models to automatically discover the topics embedded in the code and extract topic-word and author-topic distributions. In addition to serving as a convenient summary for program function and developer activities, these and other related distributions provide a statistical and information-theoretic basis for quantifying and analyzing developer similarity and competence, topic scattering, and document tangling, with direct applications to software engineering. Finally, by combining software textual content with structural information captured by our CodeRank approach, we are able to significantly improve software retrieval performance, increasing the AUC metric to 0.84– roughly 10-30% better than previous approaches based on text alone. Supplementary material may be found at: http://sourcerer.ics.uci.edu/nips2007/nips07.html. 1
3 0.10251486 95 nips-2007-HM-BiTAM: Bilingual Topic Exploration, Word Alignment, and Translation
Author: Bing Zhao, Eric P. Xing
Abstract: We present a novel paradigm for statistical machine translation (SMT), based on a joint modeling of word alignment and the topical aspects underlying bilingual document-pairs, via a hidden Markov Bilingual Topic AdMixture (HM-BiTAM). In this paradigm, parallel sentence-pairs from a parallel document-pair are coupled via a certain semantic-flow, to ensure coherence of topical context in the alignment of mapping words between languages, likelihood-based training of topic-dependent translational lexicons, as well as in the inference of topic representations in each language. The learned HM-BiTAM can not only display topic patterns like methods such as LDA [1], but now for bilingual corpora; it also offers a principled way of inferring optimal translation using document context. Our method integrates the conventional model of HMM — a key component for most of the state-of-the-art SMT systems, with the recently proposed BiTAM model [10]; we report an extensive empirical analysis (in many ways complementary to the description-oriented [10]) of our method in three aspects: bilingual topic representation, word alignment, and translation.
4 0.091310479 210 nips-2007-Unconstrained On-line Handwriting Recognition with Recurrent Neural Networks
Author: Alex Graves, Marcus Liwicki, Horst Bunke, Jürgen Schmidhuber, Santiago Fernández
Abstract: In online handwriting recognition the trajectory of the pen is recorded during writing. Although the trajectory provides a compact and complete representation of the written output, it is hard to transcribe directly, because each letter is spread over many pen locations. Most recognition systems therefore employ sophisticated preprocessing techniques to put the inputs into a more localised form. However these techniques require considerable human effort, and are specific to particular languages and alphabets. This paper describes a system capable of directly transcribing raw online handwriting data. The system consists of an advanced recurrent neural network with an output layer designed for sequence labelling, combined with a probabilistic language model. In experiments on an unconstrained online database, we record excellent results using either raw or preprocessed data, well outperforming a state-of-the-art HMM based system in both cases. 1
5 0.089684762 84 nips-2007-Expectation Maximization and Posterior Constraints
Author: Kuzman Ganchev, Ben Taskar, João Gama
Abstract: The expectation maximization (EM) algorithm is a widely used maximum likelihood estimation procedure for statistical models when the values of some of the variables in the model are not observed. Very often, however, our aim is primarily to find a model that assigns values to the latent variables that have intended meaning for our data and maximizing expected likelihood only sometimes accomplishes this. Unfortunately, it is typically difficult to add even simple a-priori information about latent variables in graphical models without making the models overly complex or intractable. In this paper, we present an efficient, principled way to inject rich constraints on the posteriors of latent variables into the EM algorithm. Our method can be used to learn tractable graphical models that satisfy additional, otherwise intractable constraints. Focusing on clustering and the alignment problem for statistical machine translation, we show that simple, intuitive posterior constraints can greatly improve the performance over standard baselines and be competitive with more complex, intractable models. 1
6 0.085827246 197 nips-2007-The Infinite Markov Model
7 0.081526138 18 nips-2007-A probabilistic model for generating realistic lip movements from speech
8 0.076421618 142 nips-2007-Non-parametric Modeling of Partially Ranked Data
9 0.074582696 183 nips-2007-Spatial Latent Dirichlet Allocation
10 0.068587132 114 nips-2007-Learning and using relational theories
11 0.066663615 9 nips-2007-A Probabilistic Approach to Language Change
12 0.06207737 2 nips-2007-A Bayesian LDA-based model for semi-supervised part-of-speech tagging
13 0.061946474 160 nips-2007-Random Features for Large-Scale Kernel Machines
14 0.061152108 190 nips-2007-Support Vector Machine Classification with Indefinite Kernels
15 0.060984913 134 nips-2007-Multi-Task Learning via Conic Programming
16 0.060120653 79 nips-2007-Efficient multiple hyperparameter learning for log-linear models
17 0.059687167 192 nips-2007-Testing for Homogeneity with Kernel Fisher Discriminant Analysis
18 0.058447678 189 nips-2007-Supervised Topic Models
19 0.05761613 109 nips-2007-Kernels on Attributed Pointsets with Applications
20 0.053501733 62 nips-2007-Convex Learning with Invariances
topicId topicWeight
[(0, -0.18), (1, 0.075), (2, -0.075), (3, -0.117), (4, 0.054), (5, 0.049), (6, 0.097), (7, -0.028), (8, -0.041), (9, 0.022), (10, 0.031), (11, -0.042), (12, 0.026), (13, -0.035), (14, -0.03), (15, -0.138), (16, -0.112), (17, -0.021), (18, 0.028), (19, 0.084), (20, 0.017), (21, 0.024), (22, 0.084), (23, -0.013), (24, 0.01), (25, -0.083), (26, -0.016), (27, 0.04), (28, -0.061), (29, -0.032), (30, -0.034), (31, -0.086), (32, -0.011), (33, 0.118), (34, 0.099), (35, 0.086), (36, 0.057), (37, 0.039), (38, 0.0), (39, 0.135), (40, -0.002), (41, 0.105), (42, 0.014), (43, 0.055), (44, -0.092), (45, 0.018), (46, -0.153), (47, -0.022), (48, -0.023), (49, -0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.94370961 71 nips-2007-Discriminative Keyword Selection Using Support Vector Machines
Author: Fred Richardson, William M. Campbell
Abstract: Many tasks in speech processing involve classification of long term characteristics of a speech segment such as language, speaker, dialect, or topic. A natural technique for determining these characteristics is to first convert the input speech into a sequence of tokens such as words, phones, etc. From these tokens, we can then look for distinctive sequences, keywords, that characterize the speech. In many applications, a set of distinctive keywords may not be known a priori. In this case, an automatic method of building up keywords from short context units such as phones is desirable. We propose a method for the construction of keywords based upon Support Vector Machines. We cast the problem of keyword selection as a feature selection problem for n-grams of phones. We propose an alternating filter-wrapper method that builds successively longer keywords. Application of this method to language recognition and topic recognition tasks shows that the technique produces interesting and significant qualitative and quantitative results.
2 0.63460314 210 nips-2007-Unconstrained On-line Handwriting Recognition with Recurrent Neural Networks
Author: Alex Graves, Marcus Liwicki, Horst Bunke, Jürgen Schmidhuber, Santiago Fernández
Abstract: In online handwriting recognition the trajectory of the pen is recorded during writing. Although the trajectory provides a compact and complete representation of the written output, it is hard to transcribe directly, because each letter is spread over many pen locations. Most recognition systems therefore employ sophisticated preprocessing techniques to put the inputs into a more localised form. However these techniques require considerable human effort, and are specific to particular languages and alphabets. This paper describes a system capable of directly transcribing raw online handwriting data. The system consists of an advanced recurrent neural network with an output layer designed for sequence labelling, combined with a probabilistic language model. In experiments on an unconstrained online database, we record excellent results using either raw or preprocessed data, well outperforming a state-of-the-art HMM based system in both cases. 1
3 0.59218431 129 nips-2007-Mining Internet-Scale Software Repositories
Author: Erik Linstead, Paul Rigor, Sushil Bajracharya, Cristina Lopes, Pierre F. Baldi
Abstract: Large repositories of source code create new challenges and opportunities for statistical machine learning. Here we first develop Sourcerer, an infrastructure for the automated crawling, parsing, and database storage of open source software. Sourcerer allows us to gather Internet-scale source code. For instance, in one experiment, we gather 4,632 java projects from SourceForge and Apache totaling over 38 million lines of code from 9,250 developers. Simple statistical analyses of the data first reveal robust power-law behavior for package, SLOC, and lexical containment distributions. We then develop and apply unsupervised author-topic, probabilistic models to automatically discover the topics embedded in the code and extract topic-word and author-topic distributions. In addition to serving as a convenient summary for program function and developer activities, these and other related distributions provide a statistical and information-theoretic basis for quantifying and analyzing developer similarity and competence, topic scattering, and document tangling, with direct applications to software engineering. Finally, by combining software textual content with structural information captured by our CodeRank approach, we are able to significantly improve software retrieval performance, increasing the AUC metric to 0.84– roughly 10-30% better than previous approaches based on text alone. Supplementary material may be found at: http://sourcerer.ics.uci.edu/nips2007/nips07.html. 1
4 0.55548596 95 nips-2007-HM-BiTAM: Bilingual Topic Exploration, Word Alignment, and Translation
Author: Bing Zhao, Eric P. Xing
Abstract: We present a novel paradigm for statistical machine translation (SMT), based on a joint modeling of word alignment and the topical aspects underlying bilingual document-pairs, via a hidden Markov Bilingual Topic AdMixture (HM-BiTAM). In this paradigm, parallel sentence-pairs from a parallel document-pair are coupled via a certain semantic-flow, to ensure coherence of topical context in the alignment of mapping words between languages, likelihood-based training of topic-dependent translational lexicons, as well as in the inference of topic representations in each language. The learned HM-BiTAM can not only display topic patterns like methods such as LDA [1], but now for bilingual corpora; it also offers a principled way of inferring optimal translation using document context. Our method integrates the conventional model of HMM — a key component for most of the state-of-the-art SMT systems, with the recently proposed BiTAM model [10]; we report an extensive empirical analysis (in many ways complementary to the description-oriented [10]) of our method in three aspects: bilingual topic representation, word alignment, and translation.
5 0.51232272 9 nips-2007-A Probabilistic Approach to Language Change
Author: Alexandre Bouchard-côté, Percy Liang, Dan Klein, Thomas L. Griffiths
Abstract: We present a probabilistic approach to language change in which word forms are represented by phoneme sequences that undergo stochastic edits along the branches of a phylogenetic tree. This framework combines the advantages of the classical comparative method with the robustness of corpus-based probabilistic models. We use this framework to explore the consequences of two different schemes for defining probabilistic models of phonological change, evaluating these schemes by reconstructing ancient word forms of Romance languages. The result is an efficient inference procedure for automatically inferring ancient word forms from modern languages, which can be generalized to support inferences about linguistic phylogenies. 1
6 0.46054024 84 nips-2007-Expectation Maximization and Posterior Constraints
7 0.45421138 152 nips-2007-Parallelizing Support Vector Machines on Distributed Computers
8 0.44648021 79 nips-2007-Efficient multiple hyperparameter learning for log-linear models
9 0.4366813 142 nips-2007-Non-parametric Modeling of Partially Ranked Data
10 0.40316847 114 nips-2007-Learning and using relational theories
11 0.39395875 49 nips-2007-Colored Maximum Variance Unfolding
12 0.39357099 110 nips-2007-Learning Bounds for Domain Adaptation
13 0.39045969 160 nips-2007-Random Features for Large-Scale Kernel Machines
14 0.38730642 1 nips-2007-A Bayesian Framework for Cross-Situational Word-Learning
15 0.38600266 45 nips-2007-Classification via Minimum Incremental Coding Length (MICL)
16 0.37214124 144 nips-2007-On Ranking in Survival Analysis: Bounds on the Concordance Index
17 0.37084863 109 nips-2007-Kernels on Attributed Pointsets with Applications
18 0.37067097 18 nips-2007-A probabilistic model for generating realistic lip movements from speech
19 0.36309844 37 nips-2007-Blind channel identification for speech dereverberation using l1-norm sparse learning
20 0.35845113 197 nips-2007-The Infinite Markov Model
topicId topicWeight
[(5, 0.039), (13, 0.512), (16, 0.025), (18, 0.011), (21, 0.042), (31, 0.016), (34, 0.019), (35, 0.031), (47, 0.05), (49, 0.019), (83, 0.077), (85, 0.023), (87, 0.04), (90, 0.022)]
simIndex simValue paperId paperTitle
1 0.93791229 14 nips-2007-A configurable analog VLSI neural network with spiking neurons and self-regulating plastic synapses
Author: Massimiliano Giulioni, Mario Pannunzi, Davide Badoni, Vittorio Dante, Paolo D. Giudice
Abstract: We summarize the implementation of an analog VLSI chip hosting a network of 32 integrate-and-fire (IF) neurons with spike-frequency adaptation and 2,048 Hebbian plastic bistable spike-driven stochastic synapses endowed with a selfregulating mechanism which stops unnecessary synaptic changes. The synaptic matrix can be flexibly configured and provides both recurrent and AER-based connectivity with external, AER compliant devices. We demonstrate the ability of the network to efficiently classify overlapping patterns, thanks to the self-regulating mechanism.
2 0.89187407 191 nips-2007-Temporal Difference Updating without a Learning Rate
Author: Marcus Hutter, Shane Legg
Abstract: We derive an equation for temporal difference learning from statistical principles. Specifically, we start with the variational principle and then bootstrap to produce an updating rule for discounted state value estimates. The resulting equation is similar to the standard equation for temporal difference learning with eligibility traces, so called TD(λ), however it lacks the parameter α that specifies the learning rate. In the place of this free parameter there is now an equation for the learning rate that is specific to each state transition. We experimentally test this new learning rule against TD(λ) and find that it offers superior performance in various settings. Finally, we make some preliminary investigations into how to extend our new temporal difference algorithm to reinforcement learning. To do this we combine our update equation with both Watkins’ Q(λ) and Sarsa(λ) and find that it again offers superior performance without a learning rate parameter. 1
same-paper 3 0.87686527 71 nips-2007-Discriminative Keyword Selection Using Support Vector Machines
Author: Fred Richardson, William M. Campbell
Abstract: Many tasks in speech processing involve classification of long term characteristics of a speech segment such as language, speaker, dialect, or topic. A natural technique for determining these characteristics is to first convert the input speech into a sequence of tokens such as words, phones, etc. From these tokens, we can then look for distinctive sequences, keywords, that characterize the speech. In many applications, a set of distinctive keywords may not be known a priori. In this case, an automatic method of building up keywords from short context units such as phones is desirable. We propose a method for the construction of keywords based upon Support Vector Machines. We cast the problem of keyword selection as a feature selection problem for n-grams of phones. We propose an alternating filter-wrapper method that builds successively longer keywords. Application of this method to language recognition and topic recognition tasks shows that the technique produces interesting and significant qualitative and quantitative results.
4 0.82975078 22 nips-2007-Agreement-Based Learning
Author: Percy Liang, Dan Klein, Michael I. Jordan
Abstract: The learning of probabilistic models with many hidden variables and nondecomposable dependencies is an important and challenging problem. In contrast to traditional approaches based on approximate inference in a single intractable model, our approach is to train a set of tractable submodels by encouraging them to agree on the hidden variables. This allows us to capture non-decomposable aspects of the data while still maintaining tractability. We propose an objective function for our approach, derive EM-style algorithms for parameter estimation, and demonstrate their effectiveness on three challenging real-world learning tasks. 1
5 0.78587747 62 nips-2007-Convex Learning with Invariances
Author: Choon H. Teo, Amir Globerson, Sam T. Roweis, Alex J. Smola
Abstract: Incorporating invariances into a learning algorithm is a common problem in machine learning. We provide a convex formulation which can deal with arbitrary loss functions and arbitrary losses. In addition, it is a drop-in replacement for most optimization algorithms for kernels, including solvers of the SVMStruct family. The advantage of our setting is that it relies on column generation instead of modifying the underlying optimization problem directly. 1
6 0.63102031 95 nips-2007-HM-BiTAM: Bilingual Topic Exploration, Word Alignment, and Translation
7 0.55482191 117 nips-2007-Learning to classify complex patterns using a VLSI network of spiking neurons
8 0.54573977 84 nips-2007-Expectation Maximization and Posterior Constraints
9 0.47765815 102 nips-2007-Incremental Natural Actor-Critic Algorithms
10 0.47217363 205 nips-2007-Theoretical Analysis of Learning with Reward-Modulated Spike-Timing-Dependent Plasticity
11 0.47129554 9 nips-2007-A Probabilistic Approach to Language Change
12 0.45362246 98 nips-2007-Hierarchical Apprenticeship Learning with Application to Quadruped Locomotion
13 0.43909743 86 nips-2007-Exponential Family Predictive Representations of State
14 0.43371904 151 nips-2007-Optimistic Linear Programming gives Logarithmic Regret for Irreducible MDPs
15 0.43110311 177 nips-2007-Simplified Rules and Theoretical Analysis for Information Bottleneck Optimization and PCA with Spiking Neurons
16 0.42446369 63 nips-2007-Convex Relaxations of Latent Variable Training
17 0.42294094 76 nips-2007-Efficient Convex Relaxation for Transductive Support Vector Machine
18 0.41867045 24 nips-2007-An Analysis of Inference with the Universum
19 0.41159046 2 nips-2007-A Bayesian LDA-based model for semi-supervised part-of-speech tagging
20 0.41116878 210 nips-2007-Unconstrained On-line Handwriting Recognition with Recurrent Neural Networks