nips nips2007 knowledge-graph by maker-knowledge-mining

nips 2007 knowledge graph


similar papers computed by TF-IDF model


similar papers computed by LSI model


similar papers computed by LDA model


papers list:

1 nips-2007-A Bayesian Framework for Cross-Situational Word-Learning

Author: Noah Goodman, Joshua B. Tenenbaum, Michael C. Frank

Abstract: For infants, early word learning is a chicken-and-egg problem. One way to learn a word is to observe that it co-occurs with a particular referent across different situations. Another way is to use the social context of an utterance to infer the intended referent of a word. Here we present a Bayesian model of cross-situational word learning, and an extension of this model that also learns which social cues are relevant to determining reference. We test our model on a small corpus of mother-infant interaction and find it performs better than competing models. Finally, we show that our model accounts for experimental phenomena including mutual exclusivity, fast-mapping, and generalization from social cues.

To understand the difficulty of an infant word-learner, imagine walking down the street with a friend who suddenly says “dax blicket philbin na fivy!” while at the same time wagging her elbow. If you knew any of these words you might infer from the syntax of her sentence that blicket is a novel noun, and hence the name of a novel object. At the same time, if you knew that this friend indicated her attention by wagging her elbow at objects, you might infer that she intends to refer to an object in a nearby show window. On the other hand if you already knew that “blicket” meant the object in the window, you might be able to infer these elements of syntax and social cues. Thus, the problem of early word-learning is a classic chicken-and-egg puzzle: in order to learn word meanings, learners must use their knowledge of the rest of language (including rules of syntax, parts of speech, and other word meanings) as well as their knowledge of social situations. But in order to learn about the facts of their language they must first learn some words, and in order to determine which cues matter for establishing reference (for instance, pointing and looking at an object but normally not waggling your elbow) they must first have a way to know the intended referent in some situations.

For theories of language acquisition, there are two common ways out of this dilemma. The first involves positing a wide range of innate structures which determine the syntax and categories of a language and which social cues are informative. (Though even when all of these elements are innately determined, using them to learn a language from evidence may not be trivial [1].) The other alternative involves bootstrapping: learning some words, then using those words to learn how to learn more. This paper gives a proposal for the second alternative. We first present a Bayesian model of how learners could use a statistical strategy—cross-situational word-learning—to learn how words map to objects, independent of syntactic and social cues. We then extend this model to a true bootstrapping situation: using social cues to learn words while using words to learn social cues. Finally, we examine several important phenomena in word learning: mutual exclusivity (the tendency to assign novel words to novel referents), fast-mapping (the ability to assign a novel word in a linguistic context to a novel referent after only a single use), and social generalization (the ability to use social context to learn the referent of a novel word). Without adding additional specialized machinery, we show how these can be explained within our model as the result of domain-general probabilistic inference mechanisms operating over the linguistic domain.
Figure 1: Graphical model describing the generation of words (W_s) from an intention (I_s) and lexicon (L), and of the intention from the objects present in a situation (O_s). The plate indicates multiple copies of the model for different situation/utterance pairs (s). Dotted portions indicate additions to include the generation of social cues S_s (with parameters r, b) from intentions.

1 The Model

Behind each linguistic utterance is a meaning that the speaker intends to communicate. Our model operates by attempting to infer this intended meaning (which we call the intent) on the basis of the utterance itself and observations of the physical and social context. For the purpose of modeling early word learning—which consists primarily of learning words for simple object categories—in our model we assume that intents are simply groups of objects. To state the model formally, we assume the non-linguistic situation consists of a set O_s of objects and that utterances are unordered sets of words W_s. (Note that, since we ignore word order, the distribution of words in a sentence should be exchangeable given the lexicon and situation. This implies, by de Finetti’s theorem, that they are independent conditioned on a latent state—we assume that the latent state giving rise to words is the intention of the speaker.) The lexicon L is a (many-to-many) map from words to objects, which captures the meaning of those words. (Syntax enters our model only obliquely, by different treatment of words depending on whether they are in the lexicon or not—that is, whether they are common nouns or other types of words.) In this setting the speaker’s intention will be captured by a set of objects in the situation to which she intends to refer: I_s ⊆ O_s. This setup is indicated in the graphical model of Fig. 1.

Different situation-utterance pairs (W_s, O_s) are independent given the lexicon L, giving:

$$P(\mathbf{W} \mid L, \mathbf{O}) = \prod_s \sum_{I_s} P(W_s \mid I_s, L)\, P(I_s \mid O_s). \qquad (1)$$

We further simplify by assuming that P(I_s | O_s) ∝ 1 (which could be refined by adding a more detailed model of the communicative intentions a person is likely to form in different situations). We will assume that words in the utterance are generated independently given the intention and the lexicon, and that the length of the utterance is observed. Each word is then generated from the intention set and lexicon by first choosing whether the word is a referential word or a non-referential word (from a binomial distribution of weight γ), then, for referential words, choosing which object in the intent it refers to (uniformly). This process gives:

$$P(W_s \mid I_s, L) = \prod_{w \in W_s} \Big[ (1-\gamma)\, P_{NR}(w \mid L) + \gamma \sum_{x \in I_s} \frac{1}{|I_s|}\, P_R(w \mid x, L) \Big]. \qquad (2)$$

The probability of word w referring to object x and the probability of word w occurring as a non-referring word are

$$P_R(w \mid x, L) \propto \delta_{x \in L(w)}, \qquad P_{NR}(w \mid L) \propto \begin{cases} 1 & \text{if } L(w) = \emptyset \\ \kappa & \text{otherwise} \end{cases} \qquad (3)$$

(the latter probability is a distribution over all words in the vocabulary, not just those in lexicon L). The constant κ is a penalty for using a word in the lexicon as a non-referring word—this penalty indirectly enforces a light-weight difference between two different groups of words (parts-of-speech): words that refer and words that do not refer.

Because the generative structure of this model exposes the role of speaker’s intentions, it is straightforward to add non-linguistic social cues. We assume that social cues such as pointing are generated from the speaker’s intent independently of the linguistic aspects (as shown in the dotted arrows of Fig. 1). With the addition of social cues S_s, Eq. 1 becomes:

$$P(\mathbf{W} \mid L, \mathbf{O}) = \prod_s \sum_{I_s} P(W_s \mid I_s, L)\, P(S_s \mid I_s)\, P(I_s \mid O_s). \qquad (4)$$

We assume that the social cues are a set S_i(x) of independent binary (cue present or not) feature values for each object x ∈ O_s, which are generated through a noisy-or process:

$$P(S_i(x)=1 \mid I_s, r_i, b_i) = 1 - (1 - b_i)(1 - r_i)^{\delta_{x \in I_s}}. \qquad (5)$$

Here r_i is the relevance of cue i, while b_i is its base rate.

For the model without social cues, the posterior probability of a lexicon given a set of situated utterances is:

$$P(L \mid \mathbf{W}, \mathbf{O}) \propto P(\mathbf{W} \mid L, \mathbf{O})\, P(L). \qquad (6)$$

And for the model with social cues, the joint posterior over lexicon and cue parameters is:

$$P(L, r, b \mid \mathbf{W}, \mathbf{O}) \propto P(\mathbf{W} \mid L, r, b, \mathbf{O})\, P(L)\, P(r, b). \qquad (7)$$

We take the prior probability of a lexicon to be exponential in its size, P(L) ∝ e^{−α|L|}, and the prior probability of social cue parameters to be uniform. Given the model above and the corpus described below, we found the best lexicon (or lexicon and cue parameters) according to Eqs. 6 and 7 by MAP inference using stochastic search. (In order to speed convergence we used a simulated tempering scheme with three temperature chains and a range of data-driven proposals.)

2 Previous work

While cross-situational word-learning has been widely discussed in the empirical literature, e.g., [2], there have been relatively few attempts to model this process computationally. Siskind [3] created an ambitious model which used deductive rules to make hypotheses about propositional word meanings and their use across situations. This model achieved surprising success in learning word meanings in artificial corpora, but was extremely complex and relied on the availability of fully coded representations of the meaning of each sentence, making it difficult to extend to empirical corpus data. More recently, Yu and Ballard [4] have used a machine translation model (similar to IBM Translation Model I) to learn word-object association probabilities. In their study, they used a pre-existing corpus of mother-infant interactions and coded the objects present during each utterance (an example from this corpus—illustrated with our own coding scheme—is shown in Fig. 2). They applied their translation model to estimate the probability of an object given a word, creating a table of associations between words and objects. Using this table, they extracted a lexicon (a group of word-object mappings) which was relatively accurate in its guesses about the names of objects that were being talked about. They further extended their model to incorporate prosodic emphasis on words (a useful cue which we will not discuss here) and joint attention on objects. Joint attention was coded by hand, isolating a subset of objects which were attended to by both mother and infant. Their results reflected a sizable increase in recall with the use of social cues.

3 Materials and Assessment Methods

To test the performance of our model on natural data, we used the Rollins section of the CHILDES corpus [5]. For comparison with the model by Yu and Ballard [4], we chose the files me03 and di06, each of which consisted of approximately ten minutes of interaction between a mother and a preverbal infant playing with objects found in a box of toys. Because we were not able to obtain the exact corpus Yu and Ballard used, we recoded the objects in the videos and added a coding of social cues co-occurring with each utterance. We annotated each utterance with the set of objects visible to the infant and with a social coding scheme (for an illustrated example, see Figure 2). Our social code included seven features: infant’s eyes, infant’s hands, infant’s mouth, infant touching, mother’s hands, mother’s eyes, mother touching. For each utterance, this coding created an object-by-social-feature matrix.
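To make the scoring above concrete, here is a minimal, illustrative Python sketch (not the authors' implementation) of the likelihood in Eqs. (1)-(3) together with the exponential lexicon prior. The toy vocabulary, corpus, and lexicon are invented; γ, κ, and α default to the values reported with the best lexicon in Table 1 further below; and candidate intentions are enumerated exhaustively rather than explored by the paper's stochastic search.

```python
import math
from itertools import chain, combinations

def subsets(objects):
    """All candidate intentions I_s subset of O_s (uniform prior, as in the Eq. 1 simplification)."""
    objs = list(objects)
    return chain.from_iterable(combinations(objs, k) for k in range(len(objs) + 1))

def p_nonref(word, lexicon, vocab, kappa):
    """P_NR(w | L): uniform over the vocabulary, with penalty kappa for in-lexicon words (Eq. 3)."""
    weight = lambda w: 1.0 if not lexicon.get(w) else kappa
    return weight(word) / sum(weight(w) for w in vocab)

def p_ref(word, obj, lexicon):
    """P_R(w | x, L): proportional to the indicator that x is in L(w) (Eq. 3)."""
    norm = sum(1.0 for w, objs in lexicon.items() if obj in objs)
    return (1.0 / norm) if norm and obj in lexicon.get(word, set()) else 0.0

def p_utterance(words, intention, lexicon, vocab, gamma, kappa):
    """P(W_s | I_s, L): mixture of referential and non-referential generation (Eq. 2)."""
    prob = 1.0
    for w in words:
        ref = sum(p_ref(w, x, lexicon) for x in intention) / len(intention) if intention else 0.0
        prob *= (1 - gamma) * p_nonref(w, lexicon, vocab, kappa) + gamma * ref
    return prob

def log_score(corpus, lexicon, vocab, gamma=0.2, kappa=0.01, alpha=11.0):
    """log P(L | W, O) up to a constant: corpus likelihood (Eq. 1) plus prior P(L) ~ exp(-alpha*|L|)."""
    total = -alpha * sum(len(objs) for objs in lexicon.values())
    for words, objects in corpus:
        marg = sum(p_utterance(words, i, lexicon, vocab, gamma, kappa) for i in subsets(objects))
        total += math.log(marg / 2 ** len(objects))   # uniform P(I_s | O_s) over subsets
    return total

# Toy example (invented data, not the CHILDES corpus used in the paper).
vocab = {"look", "at", "the", "moocow", "oink", "pig"}
corpus = [({"look", "at", "the", "moocow"}, {"COW", "BEAR"}),
          ({"oink", "pig"}, {"PIG", "COW"})]
lexicon = {"moocow": {"COW"}, "pig": {"PIG"}, "oink": {"PIG"}}
print(log_score(corpus, lexicon, vocab))
```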
Figure 2: A still frame from our corpus showing the coding of objects and social cues. We coded all mid-sized objects visible to the infant as well as social information including what both mother and infant were touching and looking at.

We evaluated all models based on their coverage of a gold-standard lexicon, computing precision (how many of the word-object mappings in a lexicon were correct relative to the gold standard), recall (how many of the total correct mappings were found), and their harmonic mean, F-score. However, the gold-standard lexicon for word-learning is not obvious. For instance, should it include the mapping between the plural “pigs” or the sound “oink” and the object PIG? Should a gold-standard lexicon include word-object pairings that are correct but were not present in the learning situation? In the results we report, we included those pairings which would be useful for a child to learn (e.g., “oink” → PIG) but excluded those pairings which were not observed to co-occur in the corpus (however, modifying these decisions did not affect the qualitative pattern of results).

4 Results

For the purpose of comparison, we give scores for several other models on the same corpus. We implemented a range of simple associative models based on co-occurrence frequency, conditional probability (both word given object and object given word), and point-wise mutual information. In each of these models, we computed the relevant statistic across the entire corpus and then created a lexicon by including all word-object pairings for which the association statistic met a threshold value. We additionally implemented a translation model (based on Yu and Ballard [4]). Because Yu and Ballard did not include details on how they evaluated their model, we scored it in the same way as the other associative models, by creating an association matrix based on the scores P(O|W) (as given in equation (3) in their paper) and then creating a lexicon based on a threshold value. In order to simulate this type of threshold value for our model, we searched for the MAP lexicon over a range of values of the prior parameter α (the larger the prior value, the less probable a larger lexicon; thus this manipulation served to create more or less selective lexicons).

Base model. In Figure 3, we plot the precision and the recall for lexicons across a range of prior parameter values for our model and the full range of threshold values for the translation model and two of the simple association models (since results for the conditional probability models were very similar but slightly inferior to the performance of mutual information, we did not include them). For our model, we averaged performance at each threshold value across three runs of 5000 search iterations each. Our model performed better than any of the other models on a number of dimensions (best lexicon shown in Table 1), both achieving the highest F-score and showing a better tradeoff between precision and recall at sub-optimal threshold values. The translation model also performed well, increasing precision as the threshold of association was raised.
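The evaluation loop just described (build a lexicon by thresholding an association statistic, then score it against a gold standard) can be sketched in a few lines. This is an illustration assuming toy data in the same format as the earlier sketch, not the paper's evaluation code; the co-occurrence counter stands in for all of the associative baselines, and the standard F-score (harmonic mean of precision and recall) is used.

```python
from collections import Counter
from itertools import product

def prf(lexicon_pairs, gold_pairs):
    """Precision, recall, and F-score of a set of (word, object) pairs against a gold standard."""
    correct = len(lexicon_pairs & gold_pairs)
    precision = correct / len(lexicon_pairs) if lexicon_pairs else 0.0
    recall = correct / len(gold_pairs) if gold_pairs else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

def cooccurrence_lexicons(corpus, thresholds):
    """Associative baseline: count word/object co-occurrences, then threshold to form lexicons."""
    counts = Counter()
    for words, objects in corpus:
        counts.update(product(words, objects))
    for t in thresholds:
        yield t, {pair for pair, c in counts.items() if c >= t}

# Toy illustration (invented data): sweep the threshold and report the precision/recall trade-off.
corpus = [({"look", "at", "the", "moocow"}, {"COW", "BEAR"}),
          ({"oink", "pig"}, {"PIG", "COW"}),
          ({"the", "pig"}, {"PIG"})]
gold = {("moocow", "COW"), ("pig", "PIG"), ("oink", "PIG")}
for t, lex in cooccurrence_lexicons(corpus, thresholds=[1, 2, 3]):
    print(t, prf(lex, gold))
```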
Surprisingly, standard co-occurrence statistics proved to be relatively ineffective at extracting high-scoring lexicons: at any given threshold value, these models included a very large number of incorrect pairs.

Table 1: The best lexicon found by the Bayesian model (α=11, γ=0.2, κ=0.01): baby → book, bigbird → bird, bird → rattle, birdie → duck, book → book, hand → hand, hat → hat, meow → kitty, moocow → cow, oink → pig, on → ring, ring → ring, sheep → sheep.

Figure 3: Comparison of models on corpus data: we plot model precision vs. recall across a range of threshold values for each model (see text); the models compared are co-occurrence frequency, mutual information, the translation model, and the Bayesian model (the plot annotates maximum F-scores of 0.54, 0.44, 0.21, and 0.12). Unlike standard ROC curves for classification tasks, the precision and recall of a lexicon depend on the entire lexicon, and irregularities in the curves reflect the small size of the lexicons.

One additional virtue of our model over other associative models is its ability to determine which objects the speaker intended to refer to. In Table 2, we give some examples of situations in which the model correctly inferred the objects that the speaker was talking about.

Social model. While the addition of social cues did not increase corpus performance above that found in the base model, the lexicons which were found by the social model did have several properties that were not present in the base model. First, the model effectively and quickly converged on the social cues that we found subjectively important in viewing the corpus videos. The two cues which were consistently found relevant by the model were (1) the target of the infant’s gaze and (2) the caregiver’s hand. These data are especially interesting in light of the speculation that infants initially believe their own point of gaze is a good cue to reference, and must learn over the second year that the true cue is the caregiver’s point of gaze, not their own [6]. Second, while the social model did not outperform the base model on the full corpus (where many words were paired with their referents several times), on a smaller corpus (taking every other utterance) the social cue model did slightly outperform a model without social cues (max F-score = 0.43 vs. 0.37). Third, the addition of social cues allowed the model to infer the intent of a speaker even in the absence of a word being used. In the right-hand column of Table 2, we give an example of a situation in which the caregiver simply says “see that?”, but from the direction of the infant’s eyes and the location of her hand, the model correctly infers that she is talking about the COW, not either of the other possible referents. This kind of inference might lead the way in allowing infants to learn words like pronouns, which serve to pick out an unambiguous focus of attention (one that is so obvious based on social and contextual cues that it does not need to be named). Finally, in the next section we show that the addition of social cues to the model allows correct performance in experimental tests of social generalization which only children older than 18 months can pass, suggesting perhaps that the social model is closer to the strategy used by more mature word learners.

Table 2: Intentions inferred by the Bayesian model after having learned a lexicon from the corpus (IE = infant’s eyes, CH = caregiver’s hands).
Words: “look at the moocow” / Objects: COW, GIRL, BEAR / Inferred intention: COW
Words: “see the bear by the rattle?” / Objects: BEAR, RATTLE, COW / Inferred intention: BEAR, RATTLE
Words: “see that?” / Objects: BEAR, RATTLE, COW / Social cues: IE & CH → COW / Inferred intention: COW
(situation: −7.3, corpus: −631.1, total: −638.4)
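The social-cue part of the model (Eq. 5) is what drives the “see that?” inference in the last row of Table 2. Below is a minimal sketch of that noisy-or cue likelihood and of picking the most probable intention from cues alone; the relevance and base-rate values are assumed for illustration, not the parameters fitted in the paper, and in the full model this score would also be multiplied by the word likelihood of Eq. (2).

```python
from itertools import chain, combinations

def cue_likelihood(cues, intention, relevance, base_rate):
    """P(S_s | I_s): noisy-or per cue and object (Eq. 5).
    cues[i][x] is the observed binary value of cue i on object x."""
    prob = 1.0
    for i, per_object in cues.items():
        for x, observed in per_object.items():
            p_on = 1 - (1 - base_rate[i]) * (1 - relevance[i]) ** (1 if x in intention else 0)
            prob *= p_on if observed else (1 - p_on)
    return prob

def infer_intention(objects, cues, relevance, base_rate):
    """Pick the intention I_s (a subset of O_s) with highest score under a uniform P(I_s | O_s)."""
    candidates = chain.from_iterable(combinations(sorted(objects), k)
                                     for k in range(len(objects) + 1))
    return max(candidates, key=lambda i: cue_likelihood(cues, set(i), relevance, base_rate))

# Toy version of the "see that?" example: infant's eyes (IE) and caregiver's hands (CH) are on the COW.
objects = {"BEAR", "RATTLE", "COW"}
cues = {"IE": {"BEAR": 0, "RATTLE": 0, "COW": 1},
        "CH": {"BEAR": 0, "RATTLE": 0, "COW": 1}}
relevance = {"IE": 0.8, "CH": 0.8}   # assumed values, not the fitted r_i from the paper
base_rate = {"IE": 0.1, "CH": 0.1}   # assumed values, not the fitted b_i
print(infer_intention(objects, cues, relevance, base_rate))   # expected: ('COW',)
```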

2 nips-2007-A Bayesian LDA-based model for semi-supervised part-of-speech tagging

Author: Kristina Toutanova, Mark Johnson

Abstract: We present a novel Bayesian model for semi-supervised part-of-speech tagging. Our model extends the Latent Dirichlet Allocation model and incorporates the intuition that words’ distributions over tags, p(t|w), are sparse. In addition we introduce a model for determining the set of possible tags of a word which captures important dependencies in the ambiguity classes of words. Our model outperforms the best previously proposed model for this task on a standard dataset. 1

3 nips-2007-A Bayesian Model of Conditioned Perception

Author: Alan Stocker, Eero P. Simoncelli

Abstract: unknown-abstract

4 nips-2007-A Constraint Generation Approach to Learning Stable Linear Dynamical Systems

Author: Byron Boots, Geoffrey J. Gordon, Sajid M. Siddiqi

Abstract: Stability is a desirable characteristic for linear dynamical systems, but it is often ignored by algorithms that learn these systems from data. We propose a novel method for learning stable linear dynamical systems: we formulate an approximation of the problem as a convex program, start with a solution to a relaxed version of the program, and incrementally add constraints to improve stability. Rather than continuing to generate constraints until we reach a feasible solution, we test stability at each step; because the convex program is only an approximation of the desired problem, this early stopping rule can yield a higher-quality solution. We apply our algorithm to the task of learning dynamic textures from image sequences as well as to modeling biosurveillance drug-sales data. The constraint generation approach leads to noticeable improvement in the quality of simulated sequences. We compare our method to those of Lacy and Bernstein [1, 2], with positive results in terms of accuracy, quality of simulated sequences, and efficiency. 1
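The early-stopping rule relies on a stability test at each step. As a point of reference, here is a minimal sketch (assumed, not the paper's code) of two basic ingredients: a plain least-squares estimate of the dynamics matrix from a state sequence (standing in for the relaxed solution) and the spectral-radius stability check; the convex constraint-generation step itself is not reproduced.

```python
import numpy as np

def least_squares_dynamics(X):
    """Estimate A in x_{t+1} ~ A x_t from a (d, T) state sequence by least squares."""
    X_past, X_next = X[:, :-1], X[:, 1:]
    return X_next @ np.linalg.pinv(X_past)

def is_stable(A, tol=1.0):
    """A linear dynamical system is stable iff the spectral radius of A is at most 1."""
    return np.max(np.abs(np.linalg.eigvals(A))) <= tol

# Synthetic example: states generated by a stable A plus a little noise.
rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])
X = np.zeros((2, 200))
X[:, 0] = rng.normal(size=2)
for t in range(199):
    X[:, t + 1] = A_true @ X[:, t] + 0.01 * rng.normal(size=2)

A_hat = least_squares_dynamics(X)
print(is_stable(A_hat))   # the unconstrained estimate may or may not pass this check
```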

5 nips-2007-A Game-Theoretic Approach to Apprenticeship Learning

Author: Umar Syed, Robert E. Schapire

Abstract: We study the problem of an apprentice learning to behave in an environment with an unknown reward function by observing the behavior of an expert. We follow on the work of Abbeel and Ng [1] who considered a framework in which the true reward function is assumed to be a linear combination of a set of known and observable features. We give a new algorithm that, like theirs, is guaranteed to learn a policy that is nearly as good as the expert’s, given enough examples. However, unlike their algorithm, we show that ours may produce a policy that is substantially better than the expert’s. Moreover, our algorithm is computationally faster, is easier to implement, and can be applied even in the absence of an expert. The method is based on a game-theoretic view of the problem, which leads naturally to a direct application of the multiplicative-weights algorithm of Freund and Schapire [2] for playing repeated matrix games. In addition to our formal presentation and analysis of the new algorithm, we sketch how the method can be applied when the transition function itself is unknown, and we provide an experimental demonstration of the algorithm on a toy video-game environment. 1
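For readers unfamiliar with it, the multiplicative-weights algorithm of Freund and Schapire that the method applies has a very small generic core. A sketch for playing a repeated two-player zero-sum matrix game follows; the payoff matrix, learning rate, and round count are illustrative placeholders, not the apprenticeship-learning game constructed in the paper.

```python
import numpy as np

def multiplicative_weights(payoff, rounds=200, eta=0.1):
    """Row player's multiplicative-weights updates against a best-responding column player.
    Returns the average mixed strategy, which approximates a maximin strategy of the game."""
    n_rows, _ = payoff.shape
    weights = np.ones(n_rows)
    avg = np.zeros(n_rows)
    for _ in range(rounds):
        p = weights / weights.sum()              # current mixed strategy
        col = int(np.argmin(p @ payoff))         # adversary plays the worst column for the row player
        weights *= np.exp(eta * payoff[:, col])  # boost strategies that did well this round
        avg += p
    return avg / rounds

# Matching pennies: the maximin strategy is (0.5, 0.5).
game = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(multiplicative_weights(game))
```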

6 nips-2007-A General Boosting Method and its Application to Learning Ranking Functions for Web Search

Author: Zhaohui Zheng, Hongyuan Zha, Tong Zhang, Olivier Chapelle, Keke Chen, Gordon Sun

Abstract: We present a general boosting method extending functional gradient boosting to optimize complex loss functions that are encountered in many machine learning problems. Our approach is based on optimization of quadratic upper bounds of the loss functions which allows us to present a rigorous convergence analysis of the algorithm. More importantly, this general framework enables us to use a standard regression base learner such as single regression tree for fitting any loss function. We illustrate an application of the proposed method in learning ranking functions for Web search by combining both preference data and labeled data for training. We present experimental results for Web search using data from a commercial search engine that show significant improvements of our proposed methods over some existing methods. 1
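As background, standard functional gradient boosting with a single regression tree as the base learner (the setting this method generalizes through quadratic upper bounds on more complex losses) can be sketched as follows; this is an illustrative squared-loss example on synthetic data using scikit-learn's DecisionTreeRegressor, not the paper's algorithm.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_rounds=50, learning_rate=0.1, max_depth=3):
    """Fit an additive model F(x) = sum_m lr * h_m(x) by repeatedly fitting a small
    regression tree to the negative gradient of the loss (the residuals, for squared loss)."""
    F = np.zeros(len(y))
    trees = []
    for _ in range(n_rounds):
        residuals = y - F                      # negative gradient of 0.5 * (y - F)^2
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        F += learning_rate * tree.predict(X)
        trees.append(tree)
    return trees

def predict(trees, X, learning_rate=0.1):
    return learning_rate * sum(tree.predict(X) for tree in trees)

# Synthetic illustration.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=400)
trees = gradient_boost(X, y)
print(np.mean((predict(trees, X) - y) ** 2))   # training MSE should be small
```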

7 nips-2007-A Kernel Statistical Test of Independence

Author: Arthur Gretton, Kenji Fukumizu, Choon H. Teo, Le Song, Bernhard Schölkopf, Alex J. Smola

Abstract: Although kernel measures of independence have been widely applied in machine learning (notably in kernel ICA), there is as yet no method to determine whether they have detected statistically significant dependence. We provide a novel test of the independence hypothesis for one particular kernel independence measure, the Hilbert-Schmidt independence criterion (HSIC). The resulting test costs O(m^2), where m is the sample size. We demonstrate that this test outperforms established contingency table and functional correlation-based tests, and that this advantage is greater for multivariate data. Finally, we show the HSIC test also applies to text (and to structured data more generally), for which no other independence test presently exists.
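For orientation, the statistic under test is cheap to compute; below is a sketch of the standard biased empirical HSIC with Gaussian kernels on toy data. The paper's actual contribution, the significance test (null distribution and threshold), is not reproduced here, and the kernel bandwidth is an arbitrary placeholder.

```python
import numpy as np

def gaussian_gram(X, sigma=1.0):
    """Gram matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-d2 / (2 * sigma**2))

def hsic_biased(X, Y, sigma=1.0):
    """Biased empirical HSIC: trace(K H L H) / m^2, with H the centering matrix.
    (One common normalization; some presentations divide by (m-1)^2 instead.)"""
    m = X.shape[0]
    K, L = gaussian_gram(X, sigma), gaussian_gram(Y, sigma)
    H = np.eye(m) - np.ones((m, m)) / m
    return np.trace(K @ H @ L @ H) / m**2

# Dependent vs. independent toy samples: HSIC should be noticeably larger in the first case.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
print(hsic_biased(x, x + 0.1 * rng.normal(size=(200, 1))))   # dependent
print(hsic_biased(x, rng.normal(size=(200, 1))))             # independent
```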

8 nips-2007-A New View of Automatic Relevance Determination

Author: David P. Wipf, Srikantan S. Nagarajan

Abstract: Automatic relevance determination (ARD) and the closely-related sparse Bayesian learning (SBL) framework are effective tools for pruning large numbers of irrelevant features leading to a sparse explanatory subset. However, popular update rules used for ARD are either difficult to extend to more general problems of interest or are characterized by non-ideal convergence properties. Moreover, it remains unclear exactly how ARD relates to more traditional MAP estimation-based methods for learning sparse representations (e.g., the Lasso). This paper furnishes an alternative means of expressing the ARD cost function using auxiliary functions that naturally addresses both of these issues. First, the proposed reformulation of ARD can naturally be optimized by solving a series of re-weighted ℓ1 problems. The result is an efficient, extensible algorithm that can be implemented using standard convex programming toolboxes and is guaranteed to converge to a local minimum (or saddle point). Secondly, the analysis reveals that ARD is exactly equivalent to performing standard MAP estimation in weight space using a particular feature- and noise-dependent, non-factorial weight prior. We then demonstrate that this implicit prior maintains several desirable advantages over conventional priors with respect to feature selection. Overall these results suggest alternative cost functions and update procedures for selecting features and promoting sparse solutions in a variety of general situations. In particular, the methodology readily extends to handle problems such as non-negative sparse coding and covariance component estimation. 1
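A generic iteratively re-weighted ℓ1 loop of the kind the reformulation leads to can be sketched with an off-the-shelf lasso solver. The weight update below (1/(|w|+ε)) is the classic re-weighting heuristic, used here only as a placeholder for the ARD-derived weights in the paper; the data are synthetic.

```python
import numpy as np
from sklearn.linear_model import Lasso

def reweighted_l1(X, y, n_iters=5, alpha=0.05, eps=1e-3):
    """Solve a sequence of weighted l1 problems; the penalty weight shrinks where a
    coefficient is large, so the penalty concentrates on coefficients that stay near zero."""
    n_features = X.shape[1]
    weights = np.ones(n_features)
    coef = np.zeros(n_features)
    for _ in range(n_iters):
        Xw = X / weights                      # column rescaling turns weighted l1 into plain l1
        model = Lasso(alpha=alpha, max_iter=10000).fit(Xw, y)
        coef = model.coef_ / weights          # map back to the original parameterization
        weights = 1.0 / (np.abs(coef) + eps)  # placeholder re-weighting rule
    return coef

# Sparse synthetic regression: most recovered coefficients should be (near) zero.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
true = np.zeros(20)
true[[2, 7]] = [1.5, -2.0]
y = X @ true + 0.1 * rng.normal(size=100)
print(np.round(reweighted_l1(X, y), 2))
```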

9 nips-2007-A Probabilistic Approach to Language Change

Author: Alexandre Bouchard-côté, Percy Liang, Dan Klein, Thomas L. Griffiths

Abstract: We present a probabilistic approach to language change in which word forms are represented by phoneme sequences that undergo stochastic edits along the branches of a phylogenetic tree. This framework combines the advantages of the classical comparative method with the robustness of corpus-based probabilistic models. We use this framework to explore the consequences of two different schemes for defining probabilistic models of phonological change, evaluating these schemes by reconstructing ancient word forms of Romance languages. The result is an efficient inference procedure for automatically inferring ancient word forms from modern languages, which can be generalized to support inferences about linguistic phylogenies. 1

10 nips-2007-A Randomized Algorithm for Large Scale Support Vector Learning

Author: Krishnan Kumar, Chiru Bhattacharya, Ramesh Hariharan

Abstract: This paper investigates the application of randomized algorithms to large scale SVM learning. The key contribution of the paper is to show that, by using ideas from random projections, the minimal number of support vectors required to solve almost separable classification problems, such that the solution obtained is near optimal with very high probability, is O(log n); a dataset is called almost separable if it becomes linearly separable on removal of properly chosen O(log n) points. The second contribution is a sampling based algorithm, motivated by randomized algorithms, which solves an SVM problem by considering subsets of the dataset which are greater in size than the number of support vectors for the problem. These two ideas are combined to obtain an algorithm for SVM classification problems which performs the learning by considering only O(log n) points at a time. Experiments done on synthetic and real life datasets show that the algorithm does scale up state of the art SVM solvers in terms of memory required and execution time without loss in accuracy. It is to be noted that the algorithm presented here nicely complements existing large scale SVM learning approaches as it can be used to scale up any SVM solver. 1

11 nips-2007-A Risk Minimization Principle for a Class of Parzen Estimators

Author: Kristiaan Pelckmans, Johan Suykens, Bart D. Moor

Abstract: This paper explores the use of a Maximal Average Margin (MAM) optimality principle for the design of learning algorithms. It is shown that the application of this risk minimization principle results in a class of (computationally) simple learning machines similar to the classical Parzen window classifier. A direct relation with the Rademacher complexities is established, as such facilitating analysis and providing a notion of certainty of prediction. This analysis is related to Support Vector Machines by means of a margin transformation. The power of the MAM principle is illustrated further by application to ordinal regression tasks, resulting in an O(n) algorithm able to process large datasets in reasonable time. 1
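For context, the classical Parzen window classifier that these learning machines resemble can be written in a couple of lines; a sketch with a Gaussian kernel on synthetic data (the bandwidth is chosen arbitrarily, and this is the classical rule, not the paper's MAM machine):

```python
import numpy as np

def parzen_classify(X_train, y_train, X_test, sigma=1.0):
    """Classical Parzen window rule: sign of the kernel-weighted vote of the training labels.
    y_train must be in {-1, +1}."""
    d2 = (np.sum(X_test**2, axis=1)[:, None] + np.sum(X_train**2, axis=1)[None, :]
          - 2 * X_test @ X_train.T)
    K = np.exp(-d2 / (2 * sigma**2))
    return np.sign(K @ y_train)

# Two Gaussian blobs; the rule should separate them almost perfectly.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, size=(100, 2)), rng.normal(2, 1, size=(100, 2))])
y = np.concatenate([-np.ones(100), np.ones(100)])
print(np.mean(parzen_classify(X, y, X) == y))
```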

12 nips-2007-A Spectral Regularization Framework for Multi-Task Structure Learning

Author: Andreas Argyriou, Massimiliano Pontil, Yiming Ying, Charles A. Micchelli

Abstract: Learning the common structure shared by a set of supervised tasks is an important practical and theoretical problem. Knowledge of this structure may lead to better generalization performance on the tasks and may also facilitate learning new tasks. We propose a framework for solving this problem, which is based on regularization with spectral functions of matrices. This class of regularization problems exhibits appealing computational properties and can be optimized efficiently by an alternating minimization algorithm. In addition, we provide a necessary and sufficient condition for convexity of the regularizer. We analyze concrete examples of the framework, which are equivalent to regularization with Lp matrix norms. Experiments on two real data sets indicate that the algorithm scales well with the number of tasks and improves on state of the art statistical performance. 1

13 nips-2007-A Unified Near-Optimal Estimator For Dimension Reduction in $l_\alpha$ ($0<\alpha\leq 2$) Using Stable Random Projections

Author: Ping Li, Trevor J. Hastie

Abstract: Many tasks (e.g., clustering) in machine learning only require the l_α distances instead of the original data. For dimension reductions in the l_α norm (0 < α ≤ 2), the method of stable random projections can efficiently compute the l_α distances in massive datasets (e.g., the Web or massive data streams) in one pass of the data. The estimation task for stable random projections has been an interesting topic. We propose a simple estimator based on the fractional power of the samples (projected data), which is surprisingly near-optimal in terms of the asymptotic variance. In fact, it achieves the Cramér-Rao bound when α = 2 and α = 0+. This new result will be useful when applying stable random projections to distance-based clustering, classifications, kernels, massive data streams, etc.
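For the α = 2 case (where the estimator attains the Cramér-Rao bound), the idea reduces to estimating l2 distances from Gaussian random projections. A sketch of that special case only; the general 0 < α < 2 estimator based on fractional powers of the projected samples is not reproduced here, and the projection dimension is an arbitrary placeholder.

```python
import numpy as np

def project(X, k, seed=0):
    """Project d-dimensional rows of X to k dimensions with i.i.d. N(0, 1/k) entries,
    so that squared l2 distances are preserved in expectation."""
    rng = np.random.default_rng(seed)
    R = rng.normal(scale=1.0 / np.sqrt(k), size=(X.shape[1], k))
    return X @ R

def l2_estimate(P, i, j):
    """Estimate ||x_i - x_j||_2 from the projected data alone."""
    return np.linalg.norm(P[i] - P[j])

# Compare the estimate with the exact distance on synthetic high-dimensional data.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 1000))
P = project(X, k=100)
print(l2_estimate(P, 0, 1), np.linalg.norm(X[0] - X[1]))
```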

14 nips-2007-A configurable analog VLSI neural network with spiking neurons and self-regulating plastic synapses

Author: Massimiliano Giulioni, Mario Pannunzi, Davide Badoni, Vittorio Dante, Paolo D. Giudice

Abstract: We summarize the implementation of an analog VLSI chip hosting a network of 32 integrate-and-fire (IF) neurons with spike-frequency adaptation and 2,048 Hebbian plastic bistable spike-driven stochastic synapses endowed with a selfregulating mechanism which stops unnecessary synaptic changes. The synaptic matrix can be flexibly configured and provides both recurrent and AER-based connectivity with external, AER compliant devices. We demonstrate the ability of the network to efficiently classify overlapping patterns, thanks to the self-regulating mechanism.

15 nips-2007-A general agnostic active learning algorithm

Author: Sanjoy Dasgupta, Claire Monteleoni, Daniel J. Hsu

Abstract: We present an agnostic active learning algorithm for any hypothesis class of bounded VC dimension under arbitrary data distributions. Most previous work on active learning either makes strong distributional assumptions, or else is computationally prohibitive. Our algorithm extends the simple scheme of Cohn, Atlas, and Ladner [1] to the agnostic setting, using reductions to supervised learning that harness generalization bounds in a simple but subtle manner. We provide a fall-back guarantee that bounds the algorithm’s label complexity by the agnostic PAC sample complexity. Our analysis yields asymptotic label complexity improvements for certain hypothesis classes and distributions. We also demonstrate improvements experimentally. 1

16 nips-2007-A learning framework for nearest neighbor search

Author: Lawrence Cayton, Sanjoy Dasgupta

Abstract: Can we leverage learning techniques to build a fast nearest-neighbor (ANN) retrieval data structure? We present a general learning framework for the NN problem in which sample queries are used to learn the parameters of a data structure that minimize the retrieval time and/or the miss rate. We explore the potential of this novel framework through two popular NN data structures: KD-trees and the rectilinear structures employed by locality sensitive hashing. We derive a generalization theory for these data structure classes and present simple learning algorithms for both. Experimental results reveal that learning often improves on the already strong performance of these data structures. 1

17 nips-2007-A neural network implementing optimal state estimation based on dynamic spike train decoding

Author: Omer Bobrowski, Ron Meir, Shy Shoham, Yonina Eldar

Abstract: It is becoming increasingly evident that organisms acting in uncertain dynamical environments often employ exact or approximate Bayesian statistical calculations in order to continuously estimate the environmental state, integrate information from multiple sensory modalities, form predictions and choose actions. What is less clear is how these putative computations are implemented by cortical neural networks. An additional level of complexity is introduced because these networks observe the world through spike trains received from primary sensory afferents, rather than directly. A recent line of research has described mechanisms by which such computations can be implemented using a network of neurons whose activity directly represents a probability distribution across the possible “world states”. Much of this work, however, uses various approximations, which severely restrict the domain of applicability of these implementations. Here we make use of rigorous mathematical results from the theory of continuous time point process filtering, and show how optimal real-time state estimation and prediction may be implemented in a general setting using linear neural networks. We demonstrate the applicability of the approach with several examples, and relate the required network properties to the statistical nature of the environment, thereby quantifying the compatibility of a given network with its environment. 1

18 nips-2007-A probabilistic model for generating realistic lip movements from speech

Author: Gwenn Englebienne, Tim Cootes, Magnus Rattray

Abstract: The present work aims to model the correspondence between facial motion and speech. The face and sound are modelled separately, with phonemes being the link between both. We propose a sequential model and evaluate its suitability for the generation of the facial animation from a sequence of phonemes, which we obtain from speech. We evaluate the results both by computing the error between generated sequences and real video, as well as with a rigorous double-blind test with human subjects. Experiments show that our model compares favourably to other existing methods and that the sequences generated are comparable to real video sequences. 1

19 nips-2007-Active Preference Learning with Discrete Choice Data

Author: Eric Brochu, Nando de Freitas, Abhijeet Ghosh

Abstract: We propose an active learning algorithm that learns a continuous valuation model from discrete preferences. The algorithm automatically decides what items are best presented to an individual in order to find the item that they value highly in as few trials as possible, and exploits quirks of human psychology to minimize time and cognitive burden. To do this, our algorithm maximizes the expected improvement at each query without accurately modelling the entire valuation surface, which would be needlessly expensive. The problem is particularly difficult because the space of choices is infinite. We demonstrate the effectiveness of the new algorithm compared to related active learning methods. We also embed the algorithm within a decision making tool for assisting digital artists in rendering materials. The tool finds the best parameters while minimizing the number of queries. 1
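The acquisition rule at the heart of the approach, expected improvement, has a closed form under a Gaussian posterior. A sketch of EI given a posterior mean and standard deviation follows; the scikit-learn GP surrogate and the toy objective below are assumptions standing in for the paper's preference-based valuation model.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_improvement(mu, sigma, best, xi=0.01):
    """EI(x) = E[max(f(x) - best - xi, 0)] under f(x) ~ N(mu, sigma^2), for maximization."""
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - best - xi) / sigma
    return (mu - best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

# Toy maximization: query the candidate with the largest EI under a GP surrogate.
rng = np.random.default_rng(0)
f = lambda x: -np.sin(3 * x) - x**2 + 0.7 * x
X_obs = rng.uniform(-2, 2, size=(5, 1))
y_obs = f(X_obs[:, 0])
gp = GaussianProcessRegressor().fit(X_obs, y_obs)

X_cand = np.linspace(-2, 2, 200).reshape(-1, 1)
mu, sigma = gp.predict(X_cand, return_std=True)
ei = expected_improvement(mu, sigma, best=y_obs.max())
print(X_cand[np.argmax(ei), 0])   # next point to present as a query
```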

20 nips-2007-Adaptive Embedded Subgraph Algorithms using Walk-Sum Analysis

Author: Venkat Chandrasekaran, Alan S. Willsky, Jason K. Johnson

Abstract: We consider the estimation problem in Gaussian graphical models with arbitrary structure. We analyze the Embedded Trees algorithm, which solves a sequence of problems on tractable subgraphs thereby leading to the solution of the estimation problem on an intractable graph. Our analysis is based on the recently developed walk-sum interpretation of Gaussian estimation. We show that non-stationary iterations of the Embedded Trees algorithm using any sequence of subgraphs converge in walk-summable models. Based on walk-sum calculations, we develop adaptive methods that optimize the choice of subgraphs used at each iteration with a view to achieving maximum reduction in error. These adaptive procedures provide a significant speedup in convergence over stationary iterative methods, and also appear to converge in a larger class of models. 1

21 nips-2007-Adaptive Online Gradient Descent

22 nips-2007-Agreement-Based Learning

23 nips-2007-An Analysis of Convex Relaxations for MAP Estimation

24 nips-2007-An Analysis of Inference with the Universum

25 nips-2007-An in-silico Neural Model of Dynamic Routing through Neuronal Coherence

26 nips-2007-An online Hebbian learning rule that performs Independent Component Analysis

27 nips-2007-Anytime Induction of Cost-sensitive Trees

28 nips-2007-Augmented Functional Time Series Representation and Forecasting with Gaussian Processes

29 nips-2007-Automatic Generation of Social Tags for Music Recommendation

30 nips-2007-Bayes-Adaptive POMDPs

31 nips-2007-Bayesian Agglomerative Clustering with Coalescents

32 nips-2007-Bayesian Co-Training

33 nips-2007-Bayesian Inference for Spiking Neuron Models with a Sparsity Prior

34 nips-2007-Bayesian Policy Learning with Trans-Dimensional MCMC

35 nips-2007-Bayesian binning beats approximate alternatives: estimating peri-stimulus time histograms

36 nips-2007-Better than least squares: comparison of objective functions for estimating linear-nonlinear models

37 nips-2007-Blind channel identification for speech dereverberation using l1-norm sparse learning

38 nips-2007-Boosting Algorithms for Maximizing the Soft Margin

39 nips-2007-Boosting the Area under the ROC Curve

40 nips-2007-Bundle Methods for Machine Learning

41 nips-2007-COFI RANK - Maximum Margin Matrix Factorization for Collaborative Ranking

42 nips-2007-CPR for CSPs: A Probabilistic Relaxation of Constraint Propagation

43 nips-2007-Catching Change-points with Lasso

44 nips-2007-Catching Up Faster in Bayesian Model Selection and Model Averaging

45 nips-2007-Classification via Minimum Incremental Coding Length (MICL)

46 nips-2007-Cluster Stability for Finite Samples

47 nips-2007-Collapsed Variational Inference for HDP

48 nips-2007-Collective Inference on Markov Models for Modeling Bird Migration

49 nips-2007-Colored Maximum Variance Unfolding

50 nips-2007-Combined discriminative and generative articulated pose and non-rigid shape estimation

51 nips-2007-Comparing Bayesian models for multisensory cue combination without mandatory integration

52 nips-2007-Competition Adds Complexity

53 nips-2007-Compressed Regression

54 nips-2007-Computational Equivalence of Fixed Points and No Regret Algorithms, and Convergence to Equilibria

55 nips-2007-Computing Robust Counter-Strategies

56 nips-2007-Configuration Estimates Improve Pedestrian Finding

57 nips-2007-Congruence between model and human attention reveals unique signatures of critical visual events

58 nips-2007-Consistent Minimization of Clustering Objective Functions

59 nips-2007-Continuous Time Particle Filtering for fMRI

60 nips-2007-Contraction Properties of VLSI Cooperative Competitive Neural Networks of Spiking Neurons

61 nips-2007-Convex Clustering with Exemplar-Based Models

62 nips-2007-Convex Learning with Invariances

63 nips-2007-Convex Relaxations of Latent Variable Training

64 nips-2007-Cooled and Relaxed Survey Propagation for MRFs

65 nips-2007-DIFFRAC: a discriminative and flexible framework for clustering

66 nips-2007-Density Estimation under Independent Similarly Distributed Sampling Assumptions

67 nips-2007-Direct Importance Estimation with Model Selection and Its Application to Covariate Shift Adaptation

68 nips-2007-Discovering Weakly-Interacting Factors in a Complex Stochastic Process

69 nips-2007-Discriminative Batch Mode Active Learning

70 nips-2007-Discriminative K-means for Clustering

71 nips-2007-Discriminative Keyword Selection Using Support Vector Machines

72 nips-2007-Discriminative Log-Linear Grammars with Latent Variables

73 nips-2007-Distributed Inference for Latent Dirichlet Allocation

74 nips-2007-EEG-Based Brain-Computer Interaction: Improved Accuracy by Automatic Single-Trial Error Detection

75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs

76 nips-2007-Efficient Convex Relaxation for Transductive Support Vector Machine

77 nips-2007-Efficient Inference for Distributions on Permutations

78 nips-2007-Efficient Principled Learning of Thin Junction Trees

79 nips-2007-Efficient multiple hyperparameter learning for log-linear models

80 nips-2007-Ensemble Clustering using Semidefinite Programming

81 nips-2007-Estimating disparity with confidence from energy neurons

82 nips-2007-Estimating divergence functionals and the likelihood ratio by penalized convex risk minimization

83 nips-2007-Evaluating Search Engines by Modeling the Relationship Between Relevance and Clicks

84 nips-2007-Expectation Maximization and Posterior Constraints

85 nips-2007-Experience-Guided Search: A Theory of Attentional Control

86 nips-2007-Exponential Family Predictive Representations of State

87 nips-2007-Fast Variational Inference for Large-scale Internet Diagnosis

88 nips-2007-Fast and Scalable Training of Semi-Supervised CRFs with Application to Activity Recognition

89 nips-2007-Feature Selection Methods for Improving Protein Structure Prediction with Rosetta

90 nips-2007-FilterBoost: Regression and Classification on Large Datasets

91 nips-2007-Fitted Q-iteration in continuous action-space MDPs

92 nips-2007-Fixing Max-Product: Convergent Message Passing Algorithms for MAP LP-Relaxations

93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data

94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning

95 nips-2007-HM-BiTAM: Bilingual Topic Exploration, Word Alignment, and Translation

96 nips-2007-Heterogeneous Component Analysis

97 nips-2007-Hidden Common Cause Relations in Relational Learning

98 nips-2007-Hierarchical Apprenticeship Learning with Application to Quadruped Locomotion

99 nips-2007-Hierarchical Penalization

100 nips-2007-Hippocampal Contributions to Control: The Third Way

101 nips-2007-How SVMs can estimate quantiles and the median

102 nips-2007-Incremental Natural Actor-Critic Algorithms

103 nips-2007-Inferring Elapsed Time from Stochastic Neural Processes

104 nips-2007-Inferring Neural Firing Rates from Spike Trains Using Gaussian Processes

105 nips-2007-Infinite State Bayes-Nets for Structured Domains

106 nips-2007-Invariant Common Spatial Patterns: Alleviating Nonstationarities in Brain-Computer Interfacing

107 nips-2007-Iterative Non-linear Dimensionality Reduction with Manifold Sculpting

108 nips-2007-Kernel Measures of Conditional Dependence

109 nips-2007-Kernels on Attributed Pointsets with Applications

110 nips-2007-Learning Bounds for Domain Adaptation

111 nips-2007-Learning Horizontal Connections in a Sparse Coding Model of Natural Images

112 nips-2007-Learning Monotonic Transformations for Classification

113 nips-2007-Learning Visual Attributes

114 nips-2007-Learning and using relational theories

115 nips-2007-Learning the 2-D Topology of Images

116 nips-2007-Learning the structure of manifolds using random projections

117 nips-2007-Learning to classify complex patterns using a VLSI network of spiking neurons

118 nips-2007-Learning with Transformation Invariant Kernels

119 nips-2007-Learning with Tree-Averaged Densities and Distributions

120 nips-2007-Linear programming analysis of loopy belief propagation for weighted matching

121 nips-2007-Local Algorithms for Approximate Inference in Minor-Excluded Graphs

122 nips-2007-Locality and low-dimensions in the prediction of natural experience from fMRI

123 nips-2007-Loop Series and Bethe Variational Bounds in Attractive Graphical Models

124 nips-2007-Managing Power Consumption and Performance of Computing Systems Using Reinforcement Learning

125 nips-2007-Markov Chain Monte Carlo with People

126 nips-2007-McRank: Learning to Rank Using Multiple Classification and Gradient Boosting

127 nips-2007-Measuring Neural Synchrony by Message Passing

128 nips-2007-Message Passing for Max-weight Independent Set

129 nips-2007-Mining Internet-Scale Software Repositories

130 nips-2007-Modeling Natural Sounds with Modulation Cascade Processes

131 nips-2007-Modeling homophily and stochastic equivalence in symmetric relational data

132 nips-2007-Modeling image patches with a directed hierarchy of Markov random fields

133 nips-2007-Modelling motion primitives and their timing in biologically executed movements

134 nips-2007-Multi-Task Learning via Conic Programming

135 nips-2007-Multi-task Gaussian Process Prediction

136 nips-2007-Multiple-Instance Active Learning

137 nips-2007-Multiple-Instance Pruning For Learning Efficient Cascade Detectors

138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images

139 nips-2007-Nearest-Neighbor-Based Active Learning for Rare Category Detection

140 nips-2007-Neural characterization in partially observed populations of spiking neurons

141 nips-2007-New Outer Bounds on the Marginal Polytope

142 nips-2007-Non-parametric Modeling of Partially Ranked Data

143 nips-2007-Object Recognition by Scene Alignment

144 nips-2007-On Ranking in Survival Analysis: Bounds on the Concordance Index

145 nips-2007-On Sparsity and Overcompleteness in Image Models

146 nips-2007-On higher-order perceptron algorithms

147 nips-2007-One-Pass Boosting

148 nips-2007-Online Linear Regression and Its Application to Model-Based Reinforcement Learning

149 nips-2007-Optimal ROC Curve for a Combination of Classifiers

150 nips-2007-Optimal models of sound localization by barn owls

151 nips-2007-Optimistic Linear Programming gives Logarithmic Regret for Irreducible MDPs

152 nips-2007-Parallelizing Support Vector Machines on Distributed Computers

153 nips-2007-People Tracking with the Laplacian Eigenmaps Latent Variable Model

154 nips-2007-Predicting Brain States from fMRI Data: Incremental Functional Principal Component Regression

155 nips-2007-Predicting human gaze using low-level saliency combined with face detection

156 nips-2007-Predictive Matrix-Variate t Models

157 nips-2007-Privacy-Preserving Belief Propagation and Sampling

158 nips-2007-Probabilistic Matrix Factorization

159 nips-2007-Progressive mixture rules are deviation suboptimal

160 nips-2007-Random Features for Large-Scale Kernel Machines

161 nips-2007-Random Projections for Manifold Learning

162 nips-2007-Random Sampling of States in Dynamic Programming

163 nips-2007-Receding Horizon Differential Dynamic Programming

164 nips-2007-Receptive Fields without Spike-Triggering

165 nips-2007-Regret Minimization in Games with Incomplete Information

166 nips-2007-Regularized Boost for Semi-Supervised Learning

167 nips-2007-Regulator Discovery from Gene Expression Time Series of Malaria Parasites: a Hierachical Approach

168 nips-2007-Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods

169 nips-2007-Retrieved context and the discovery of semantic structure

170 nips-2007-Robust Regression with Twinned Gaussian Processes

171 nips-2007-Scan Strategies for Meteorological Radars

172 nips-2007-Scene Segmentation with CRFs Learned from Partially Labeled Images

173 nips-2007-Second Order Bilinear Discriminant Analysis for single trial EEG analysis

174 nips-2007-Selecting Observations against Adversarial Objectives

175 nips-2007-Semi-Supervised Multitask Learning

176 nips-2007-Sequential Hypothesis Testing under Stochastic Deadlines

177 nips-2007-Simplified Rules and Theoretical Analysis for Information Bottleneck Optimization and PCA with Spiking Neurons

178 nips-2007-Simulated Annealing: Rigorous finite-time guarantees for optimization on continuous domains

179 nips-2007-SpAM: Sparse Additive Models

180 nips-2007-Sparse Feature Learning for Deep Belief Networks

181 nips-2007-Sparse Overcomplete Latent Variable Decomposition of Counts Data

182 nips-2007-Sparse deep belief net model for visual area V2

183 nips-2007-Spatial Latent Dirichlet Allocation

184 nips-2007-Stability Bounds for Non-i.i.d. Processes

185 nips-2007-Stable Dual Dynamic Programming

186 nips-2007-Statistical Analysis of Semi-Supervised Regression

187 nips-2007-Structured Learning with Approximate Inference

188 nips-2007-Subspace-Based Face Recognition in Analog VLSI

189 nips-2007-Supervised Topic Models

190 nips-2007-Support Vector Machine Classification with Indefinite Kernels

191 nips-2007-Temporal Difference Updating without a Learning Rate

192 nips-2007-Testing for Homogeneity with Kernel Fisher Discriminant Analysis

193 nips-2007-The Distribution Family of Similarity Distances

194 nips-2007-The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information

195 nips-2007-The Generalized FITC Approximation

196 nips-2007-The Infinite Gamma-Poisson Feature Model

197 nips-2007-The Infinite Markov Model

198 nips-2007-The Noisy-Logical Distribution and its Application to Causal Inference

199 nips-2007-The Price of Bandit Information for Online Optimization

200 nips-2007-The Tradeoffs of Large Scale Learning

201 nips-2007-The Value of Labeled and Unlabeled Examples when the Model is Imperfect

202 nips-2007-The discriminant center-surround hypothesis for bottom-up saliency

203 nips-2007-The rat as particle filter

204 nips-2007-Theoretical Analysis of Heuristic Search Methods for Online POMDPs

205 nips-2007-Theoretical Analysis of Learning with Reward-Modulated Spike-Timing-Dependent Plasticity

206 nips-2007-Topmoumoute Online Natural Gradient Algorithm

207 nips-2007-Transfer Learning using Kolmogorov Complexity: Basic Theory and Empirical Evaluations

208 nips-2007-TrueSkill Through Time: Revisiting the History of Chess

209 nips-2007-Ultrafast Monte Carlo for Statistical Summations

210 nips-2007-Unconstrained On-line Handwriting Recognition with Recurrent Neural Networks

211 nips-2007-Unsupervised Feature Selection for Accurate Recommendation of High-Dimensional Image Data

212 nips-2007-Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes

213 nips-2007-Variational Inference for Diffusion Processes

214 nips-2007-Variational inference for Markov jump processes

215 nips-2007-What makes some POMDP problems easy to approximate?