jmlr jmlr2013 jmlr2013-98 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Dan Stowell, Mark D. Plumbley
Abstract: We describe an inference task in which a set of timestamped event observations must be clustered into an unknown number of temporal sequences with independent and varying rates of observations. Various existing approaches to multi-object tracking assume a fixed number of sources and/or a fixed observation rate; we develop an approach to inferring structure in timestamped data produced by a mixture of an unknown and varying number of similar Markov renewal processes, plus independent clutter noise. The inference simultaneously distinguishes signal from noise as well as clustering signal observations into separate source streams. We illustrate the technique via synthetic experiments as well as an experiment to track a mixture of singing birds. Source code is available. Keywords: multi-target tracking, clustering, point processes, flow network, sound
Reference: text
sentIndex sentText sentNum sentScore
1 The inference simultaneously distinguishes signal from noise as well as clustering signal observations into separate source streams. [sent-12, score-0.426]
2 We are given a set of timestamped data, and we assume each datum is produced by one of a set of similar but independent signal processes, or by a “clutter” noise process, with known parameters. [sent-26, score-0.254]
3 In computational audio scene analysis, it is often the case that sound sources emit sound only intermittently during their presence in the scene, yet it is desirable to track their temporal evolution. [sent-38, score-0.409]
4 The PHD filter allows for stochastic missed detections but not for structured intermittency. [sent-46, score-0.241]
5 In the following we develop a model in which an unknown number of point-process sources are assumed to be active as well as Poisson clutter, and describe how to perform a maximum likelihood inference which clusters the signal into individual identified tracks plus clutter noise. [sent-63, score-0.502]
6 The overall system to be considered is not one but a set of such time-limited MRPs, plus a separate Poisson process that generates clutter noise with intensity λc(X). [sent-87, score-0.437]
7 We divide by the likelihood that all data were generated by the noise process, to give the likelihood ratio: L = ∏_{k=1..K} pMRP(k)/pNOISE(k) (1), where for notational simplicity we use pNOISE(k) as the joint likelihood of all observations contained within cluster k under the noise model. [sent-96, score-0.403]
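In display form, equation (1) reads:

```latex
L \;=\; \prod_{k=1}^{K} \frac{p_{\mathrm{MRP}}(k)}{p_{\mathrm{NOISE}}(k)} \tag{1}
```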
8 The term pb(·) refers to the likelihood associated with a single observation under the Poisson process parametrised by λb, and similarly pc(·) refers to the clutter process parametrised by λc. [sent-99, score-0.303]
9 The overall likelihood ratio L tells us the relative likelihood that the observation set was generated by the selected clustering of signals and noise, as opposed to the possibility that all observations were generated by clutter noise. [sent-100, score-0.401]
10 If we construct a directed graph with observations as vertices and possible transitions as arcs, then every possible path in the graph (from any vertex to any other reachable vertex) corresponds to one potential MRP cluster (Figure 1). [sent-112, score-0.236]
11 We then associate birth costs with arcs from the source and death costs with arcs to the sink. [sent-126, score-0.714]
12 This means that all feasible flows in our network will be composed of paths which consist of one single birth cost, plus a sequence of clutter and transition costs, and a single death cost. [sent-127, score-0.837]
13 The clutter noise cost ac(Xi) is associated with this vertex. [sent-130, score-0.285]
14 The birth cost ab(Xi) is associated with each arc Asi. [sent-134, score-0.363]
15 The death cost ad(Xi) is associated with each arc Ait. [sent-136, score-0.24]
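To make the construction concrete, a minimal Python sketch of the network assembly follows. The functions a_b, a_d, a_c and a_trans stand in for the paper's cost terms ab(Xi), ad(Xi), ac(Xi) and the transition cost, whose exact formulas are not reproduced in this excerpt; the in/out node splitting is one standard way to place a cost on a vertex, and is an implementation assumption here.

```python
def build_flow_network(obs, a_b, a_d, a_c, a_trans):
    """Assemble the source/sink flow network over timestamped observations.

    obs is a time-sorted list of (state, time) pairs.  a_b, a_d, a_c and
    a_trans supply the birth, death, clutter and transition costs (the
    paper's ab(Xi), ad(Xi), ac(Xi) terms, whose exact formulas are not
    reproduced here).  Each observation is split into an in/out node pair
    so that the per-vertex clutter cost can sit on an arc.
    """
    arcs = {}  # (u, v) -> cost; 's' is the source and 't' the sink
    for i, x in enumerate(obs):
        arcs[('s', (i, 'in'))] = a_b(x)          # birth cost on arc A_si
        arcs[((i, 'out'), 't')] = a_d(x)         # death cost on arc A_it
        arcs[((i, 'in'), (i, 'out'))] = a_c(x)   # clutter cost at vertex i
        for j, y in enumerate(obs):
            if y[1] > x[1]:  # only forward-in-time transitions get arcs,
                             # so |A| is O(N^2) in the worst case
                arcs[((i, 'out'), (j, 'in'))] = a_trans(x, y)
    return arcs
```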
16 The number of vertices is closely related to the number of observations N; since we generate an arc for every possible transition between a pair of observations, |A| may be on the order of N² in the worst case. [sent-152, score-0.275]
17 However, if the clutter noise model is held constant between two different MMRP inferences, then the two likelihood ratios calculated by (1) can be divided through to give a likelihood ratio between the two. [sent-173, score-0.369]
18 To summarise the MMRP inference described in this section: given a set of observations plus MRP process parameters and noise process parameters, one first represents the data as a flow network, with added source and sink nodes, and with costs representing component likelihoods (Section 3. [sent-175, score-0.422]
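A sketch of the solving step, assuming the arcs dictionary from the construction sketch above; networkx's min_cost_flow is used purely for illustration, and the unit capacities, integer-scaled weights and fixed path count are assumptions (in practice one would sweep the number of paths and keep the most likely overall solution).

```python
import networkx as nx

def extract_tracks(arcs, n_paths):
    """Recover n_paths source-to-sink paths of minimum total cost.

    Illustrative only: this is not the paper's own solver.
    """
    G = nx.DiGraph()
    for (u, v), cost in arcs.items():
        # min_cost_flow expects integer weights, so scale and round.
        G.add_edge(u, v, weight=int(round(1000 * cost)), capacity=1)
    G.nodes['s']['demand'] = -n_paths  # source supplies n_paths units
    G.nodes['t']['demand'] = n_paths   # sink absorbs them
    flow = nx.min_cost_flow(G)
    tracks = []
    for first, f in flow['s'].items():
        if f > 0:  # each unit of flow out of the source starts one track
            track, node = [], first
            while node != 't':
                if node[1] == 'in':
                    track.append(node[0])  # record the observation index
                node = next(v for v, g in flow[node].items() if g > 0)
            tracks.append(track)
    return tracks
```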
19 Experiments. We have described a multiple Markov renewal process (MMRP) inference technique which takes an MRP model, an iid clutter noise model and a set of timestamped data points, and finds a maximum-likelihood partition of the data into zero or more MRP sequences plus clutter noise. [sent-180, score-0.807]
20 We then apply MMRP inference in two experiments based on applications to audio tracking tasks: a synthetic experiment based on a well-known test of auditory “streaming” (Section 4.3), [sent-183, score-0.523]
21 and an experiment to track multiple singing birds in an audio mixture (Section 4. [sent-184, score-0.327]
22 However, the noise cluster is qualitatively different from the MRP clusters, and the transitions within MRP sequences are the latent features of primary interest, so we will focus our evaluation measures on signal/noise separation and transitions. [sent-190, score-0.277]
23 However, the task for which our MMRP inference is designed is not an ordinary classification task: the signal/noise label for each ground-truth datum can be treated as a class label to be inferred, but the individual signal streams to be recovered do not have labels. [sent-192, score-0.273]
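A hedged sketch of how the two evaluation measures could be computed: FSN as an F-measure over per-datum signal/noise labels, and Fsigtrans as an F-measure over the set of recovered within-stream transition pairs. The exact definitions in the paper may differ in detail.

```python
def f1(true_set, est_set):
    """Standard F-measure over two sets of items."""
    tp = len(true_set & est_set)
    if tp == 0:
        return 0.0
    precision = tp / len(est_set)
    recall = tp / len(true_set)
    return 2 * precision * recall / (precision + recall)

# F_SN: indices of observations labelled signal (rather than noise).
def f_sn(true_signal_ids, est_signal_ids):
    return f1(set(true_signal_ids), set(est_signal_ids))

# F_sigtrans: (i, j) pairs of consecutive observations within one stream.
def f_sigtrans(true_transitions, est_transitions):
    return f1(set(true_transitions), set(est_transitions))
```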
24 Synthetic Experiment I: MMRP Generated Data. We designed a synthetic experiment to generate data under the MMRP model described in previous sections, with user-specified parameters including birth intensity, death probability, and clutter noise intensity. [sent-199, score-0.776]
25 To create an observation set, a set of birth events and clutter events were sampled independently from their Poisson distributions, and then each birth event was used as the starting point to sample a single {(X, T)} sequence using the death probability and the transition network. [sent-203, score-1.066]
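A minimal generative sketch consistent with this description; the exponential inter-event gaps, the trans[state] interface and the parameter names are illustrative assumptions (the uniform choice over the alphabet follows the text below).

```python
import random

def sample_mmrp(duration, alphabet, birth_rate, death_prob,
                clutter_rate, trans, gap_rate=1.0):
    """Sample one observation set: MRP streams plus Poisson clutter.

    trans[state] is assumed to give a list of (next_state, prob) pairs;
    inter-event gaps are drawn as exponentials purely for illustration.
    """
    events = []
    # Clutter: a homogeneous Poisson process, uniform over the alphabet.
    t = random.expovariate(clutter_rate)
    while t < duration:
        events.append((random.choice(alphabet), t, 'noise'))
        t += random.expovariate(clutter_rate)
    # Births: each birth event seeds one {(X, T)} sequence.
    t = random.expovariate(birth_rate)
    while t < duration:
        state, now = random.choice(alphabet), t
        while now < duration:
            events.append((state, now, 'signal'))
            if random.random() < death_prob:  # geometric stream length
                break
            pairs = trans[state]
            state = random.choices([s for s, _ in pairs],
                                   weights=[p for _, p in pairs])[0]
            now += random.expovariate(gap_rate)
        t += random.expovariate(birth_rate)
    return sorted(events, key=lambda e: e[1])
```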
26 The intensities for the birth process and the noise process were uniform across the alphabet of states, and so in the following we parametrise them by their intensity along the time axis only. [sent-204, score-0.451]
27 We used a signal-to-noise ratio (SNR) parameter to control the intensity of noise observations (λc) in relation to that of signal observations: λc = λb / (pd · SNR). [sent-207, score-0.314]
28 The upper diagram shows a hypothetical ground-truth transition through a sequence of five observations (circles) accompanied by clutter noise (crosses). [sent-209, score-0.45]
29 The lower diagram shows what would happen if inference missed one of those observations out of the chain, resulting in one false-positive (dashed arrow) for a transition that does not exist in the ground-truth. [sent-210, score-0.356]
30 The factor of pd appears as well as the birth intensity (λb) because the SNR relates to the count of all signal observations (not just births), and for a fixed death probability we have a geometric distribution over the number of detections per birth with expected value 1/pd. [sent-213, score-1.088]
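Explicitly, the geometric stream-length argument is (the second relation restates the SNR definition above):

```latex
\mathbb{E}[\text{detections per birth}]
  \;=\; \sum_{k=1}^{\infty} k \, p_d (1 - p_d)^{k-1} \;=\; \frac{1}{p_d},
\qquad
\lambda_c \;=\; \frac{\lambda_b}{p_d \cdot \mathrm{SNR}} .
```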
31 To evaluate performance of our inference applied to such data, we repeatedly generated observation sets as described, and ran both the greedy and full inference algorithms on the data. [sent-214, score-0.4]
32 Unless otherwise stated, for all synthesis runs we used the following parameters: alphabet size 10, SNR 0 dB, birth intensity 0. [sent-215, score-0.365]
33 Together with the birth intensity and SNR, this implies that a typical generated observation set would consist of 160 observations, half being signal and half noise. [sent-222, score-0.458]
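As a worked check of the half-and-half claim: at 0 dB the clutter intensity matches the signal observation intensity λb/pd, so over a run of duration D the expected counts are

```latex
N_{\text{signal}} \;=\; D \, \frac{\lambda_b}{p_d},
\qquad
N_{\text{noise}} \;=\; D \, \lambda_c \;=\; N_{\text{signal}}
\quad (\mathrm{SNR} = 0~\mathrm{dB}),
```

consistent with a total of around 160 observations split evenly between signal and noise.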
34 Figure 5: Performance of the full and greedy inference algorithms with varying SNR (run time in msecs on a log scale, plotted against SNR in dB; the legend includes “full, birth intensity 0.2”). [sent-256, score-0.271]
36 In this plot, we also compare birth intensities (λb) of 0. [sent-260, score-0.259]
37 In this synthetic experiment, the separation of signal and noise (measured by FSN) is strong at high SNRs and falls off a little as the SNR approaches zero. [sent-267, score-0.238]
38 The Fsigtrans measure shows a milder decline with SNR, but also notable differences between the full and greedy inference, with a consistent benefit in accurate recovery of transitions if the full inference is used. [sent-270, score-0.49]
39 We also show the measured runtime in Figure 5: the increased accuracy of full inference in recovering signal transitions comes at a cost of increased runtime, especially under adverse SNRs (because of the larger number of noise events generated). [sent-271, score-0.433]
40 This is important not only because we seek a robust algorithm, but also because parameters such as the birth density and death probability together imply approximate expectations about the level of polyphony in the signal. [sent-273, score-0.458]
41 We see that both algorithms (greedy and full) are robust to poor estimation of the birth density, death probability and SNR. [sent-276, score-0.404]
42 The advantage of full over greedy inference is maintained at around five percentage points in Fsigtrans through most of these varied conditions. [sent-277, score-0.271]
43 This reflects the fact that the transition probabilities encode the key structural distinction between signal and noise, and the key information that one could use to disambiguate two co-occurring signal streams. [sent-282, score-0.241]
44 We also investigated how inference may degrade when conditions fail to match some of the assumptions of the model: in many applications there may be missed detections, or noise may not be truly independent but exhibit correlations with the signal. [sent-283, score-0.277]
45 Missed detections were simulated by omitting observations at random; noise correlations were simulated by selecting a controllable fraction of the noise observations, and modifying those noise observations to have the same state and very similar time position as a randomly-selected signal datum. [sent-285, score-0.588]
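A sketch of the two perturbations as they might be implemented; the parameter names and the small time jitter used for “very similar time position” are illustrative assumptions.

```python
import random

def perturb(events, miss_prob=0.0, corr_frac=0.0, jitter=0.01):
    """Simulate missed detections and signal-correlated noise.

    events is a list of (state, time, label) triples; miss_prob,
    corr_frac and the jitter magnitude are illustrative parameters.
    """
    # Missed detections: omit observations uniformly at random.
    kept = [e for e in events if random.random() >= miss_prob]
    signal = [e for e in kept if e[2] == 'signal']
    out = []
    for state, t, label in kept:
        if label == 'noise' and signal and random.random() < corr_frac:
            # Correlated noise: give this noise event the same state and
            # a very similar time to a randomly selected signal datum.
            s_state, s_t, _ = random.choice(signal)
            out.append((s_state, s_t + random.uniform(0.0, jitter), 'noise'))
        else:
            out.append((state, t, label))
    return sorted(out, key=lambda e: e[1])
```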
46 This is plotted in Figure 8, showing that correlated noise beyond 25% can lead to run-times which are orders of magnitude longer, even though the data under consideration has the same number of observations and the same ratio of signal and noise observations. [sent-289, score-0.294]
47 Plots are as in Figure 5 but showing how performance varies with mismatch between the true and specified parameters for the birth density, death probability, SNR, and transition density. [sent-399, score-0.513]
48 Figure 7: Sensitivity of inference to missed data and correlated noise (x-axis: amount of signal correlation imposed on noise). [sent-455, score-0.343]
49 Plots are as in Figure 6 but showing how performance varies when some detections are missed, and when noise is not independent but correlated with signal. [sent-456, score-0.238]
50 The most critical parameter for successful MMRP inference appears to be the transition probability structure rather than assumptions about birth/death probabilities, which accords with our intuition that the Markov structure of the sequences is the source of the discriminative power. [sent-464, score-0.321]
51 Figure 10: MRP transition probability densities for the two synthetic models: coherent (left) and segregated (right). [sent-488, score-0.417]
52 For synthesis, we generated four simultaneous sequences each with a random offset in state space, and we also added iid Poisson clutter noise in the same region of state space, whose intensity is held constant within each run to create a given SNR. [sent-502, score-0.451]
53 For MMRP inference we used fixed parameters derived from the SNR value and an arbitrary death probability of 0. [sent-505, score-0.247]
54 The following relationships show how to derive the birth and clutter likelihood parameters from the SNR value expressed as a ratio: pb = SNR · pd / (1 + SNR) and pc = 1 / (1 + SNR). [sent-507, score-0.545]
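These follow because a fraction SNR/(1+SNR) of all observations are then signal and 1/(1+SNR) are noise; in display form:

```latex
p_b \;=\; \frac{\mathrm{SNR}\cdot p_d}{1+\mathrm{SNR}},
\qquad
p_c \;=\; \frac{1}{1+\mathrm{SNR}} .
```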
55 The first column of Figure 11 shows the results of generating data under the locked, coherent and segregated models, with two generated sequences present in each case. [sent-510, score-0.32]
56 Figure 11: Results of generating observations under the locked, coherent or segregated model (in each row), and then analysing them using the coherent model or the segregated model (final two columns). The second column [sent-511, score-0.576]
57 shows the sequences with added clutter noise at an SNR of -12 dB. [sent-513, score-0.345]
58 The final two columns show the maximum-likelihood signal sequences inferred under the coherent and the segregated model. [sent-514, score-0.422]
59 This leads to unlikely emission sequences as judged by the coherent model, and so the coherent model finds the maximum-likelihood solution to be that with no sequences (the blank plot in the figure). [sent-517, score-0.336]
60 Inference using the segregated model extracts traces in all three cases, since the phase-locked drift of the coherent model is not unlikely under the segregated model. [sent-518, score-0.45]
61 Figure 12: F-measure for signal/noise separation (FSN) and transitions (Fsigtrans); compared conditions: 4 items with SNR known, 4 items with SNR unknown, and 4 items with SNR unknown under greedy inference (x-axis: SNR in dB). [sent-537, score-0.298]
62 The ground truth in each case is a combination of four ABABAB streams, generated via the coherent or segregated cases (20 runs of each type). [sent-538, score-0.26]
63 As in the previous experiment, full inference shows a consistent advantage over the greedy inference, though this tails off at -24 dB SNR. [sent-544, score-0.271]
64 As in the previous experiment inspired by auditory streaming, if we model these natural sound sources with an MRP then our inference procedure should be able to separate multiple simultaneous “streams” of emissions. [sent-548, score-0.305]
65 In the following experiment we studied the ability of our inference to perform this separation in data derived from audio signals containing multiple instances of a species of bird common in many European countries, the Common Chiffchaff (Salomon and Hemim, 1992). [sent-549, score-0.509]
66 We located 25 recordings of song of the Chiffchaff (species Phylloscopus collybita) recorded in Europe (excluding any recordings marked as having “deviant” song or uncertain species identity; also excluding calls, which are different from song in sound and function). [sent-559, score-0.279]
67 The recordings ranged in duration from a few seconds to many minutes, so to create a set of independent audio samples which could be mixed together to create mixtures with overlapping bouts of song, audio files were each trimmed automatically to their highest-amplitude 8. [sent-573, score-0.553]
68 Each audio file was analysed separately to create training data; during testing, audio files were digitally mixed in groups of one to five files. [sent-576, score-0.508]
69 In order to convert an audio file into a sequence of events amenable to MMRP inference, we used spectro-temporal cross-correlation to detect individual syllables of song, as used by Osiejuk (2000). [sent-577, score-0.342]
70 Such cross-correlation detection applied to an audio file produces a set of observations, each having a time and frequency offset and a correlation strength (Figure 14). [sent-583, score-0.281]
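A hedged sketch of spectro-temporal cross-correlation detection on a magnitude spectrogram; the scipy-based pipeline, the threshold and the 3x3 peak-picking are assumptions rather than the paper's implementation.

```python
import numpy as np
from scipy.signal import correlate2d

def detect_syllables(spectrogram, template, threshold):
    """Cross-correlate a syllable template against a magnitude
    spectrogram, returning (time, freq_offset, strength) detections."""
    xc = correlate2d(spectrogram, template, mode='valid')
    detections = []
    for f in range(xc.shape[0]):
        for t in range(xc.shape[1]):
            v = xc[f, t]
            # Crude peak picking: local maximum in a 3x3 neighbourhood.
            patch = xc[max(f - 1, 0):f + 2, max(t - 1, 0):t + 2]
            if v >= threshold and v == patch.max():
                detections.append((t, f, float(v)))
    return detections
```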
71 It typically contains one detection for every Chiffchaff syllable, with occasional doubled detections and spurious noise detections. [sent-584, score-0.265]
72 In the lower image, bold lines represent detections treated as “signal” in the filtering used for training, while the fainter lines represent detections used to train the noise model. [sent-600, score-0.39]
73 Note that the noise detections often have relatively strong signal correlations, as seen in Figure 14. [sent-601, score-0.304]
74 In order to derive a Gaussian mixture model (GMM) transition probability model from monophonic Chiffchaff training data, for each audio file in a training set we filtered the observations automatically to keep only the single strongest detection within any 0. [sent-605, score-0.48]
75 This time limit corresponds to the lower limit on the rate of song syllables; such filtering is only appropriate for monophonic training sequences and was not applied to the audio mixtures used for testing. [sent-607, score-0.417]
76 We also trained a separate GMM to create a noise model, taking the set of observations that had been discarded in the above filtering step and training a 10-component GMM with full covariance to fit an iid distribution to the one-dimensional log(frequency) data for the noise observations. [sent-609, score-0.283]
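A sketch of the two model fits using scikit-learn; the transition features (time gap plus the log-frequencies of consecutive detections) and the component count for the transition GMM are assumptions, while the 10-component full-covariance noise GMM on one-dimensional log(frequency) follows the text.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_models(signal_obs, noise_obs, n_trans_components=10):
    """Fit the transition and noise models from monophonic training data.

    signal_obs and noise_obs are lists of (time, frequency) detections;
    the transition feature construction here is an assumption.
    """
    trans_feats = np.array([
        [t2 - t1, np.log(f1), np.log(f2)]
        for (t1, f1), (t2, f2) in zip(signal_obs[:-1], signal_obs[1:])
    ])
    trans_gmm = GaussianMixture(n_components=n_trans_components,
                                covariance_type='full').fit(trans_feats)
    # Noise model as described: 10-component full-covariance GMM on the
    # one-dimensional log(frequency) of the discarded detections.
    noise_feats = np.log(np.array([[f] for _, f in noise_obs]))
    noise_gmm = GaussianMixture(n_components=10,
                                covariance_type='full').fit(noise_feats)
    return trans_gmm, noise_gmm
```

The fitted models' score_samples method then supplies the log-densities that could stand in for the transition and clutter terms in the flow-network costs.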
77 Inference from Audio Mixtures. In order to test whether the MMRP approach could recover syllable sequences from audio mixtures, we performed an experiment using five-fold cross-validation. [sent-612, score-0.415]
78 For each fold we used 20 audio files for training, and then with the remaining five audio files we created audio mixtures of up to five signals, testing recovery in each case. [sent-613, score-0.927]
79 For each mixture file, we applied spectro-temporal cross-correlation as described above, then performed both full and greedy inference using the empirically derived signal and noise GMMs to provide densities/intensities for transition and clutter. [sent-614, score-0.566]
80 Again, though, the full inference shows a general advantage over greedy inference in the correct recovery of transitions. [sent-626, score-0.493]
81 Figure 15: The FSN and Fsigtrans evaluation measures for the Chiffchaff audio analyses (lines: recovery from audio, recovery from audio (greedy), and recovery from audio (baseline); x-axis: number of signals in mixture, 1 to 5). [sent-644, score-0.323]
84 Ideal recovery, synthetic noise: To simulate ideal recovery but with more adverse noise conditions, we proceeded as in the ideal case, but also added extra clutter noise at 0 dB. [sent-650, score-0.659]
85 [Figure legend: ideal recovery trained on test data; ideal recovery; ideal recovery plus synthetic noise; recovery from audio; recovery from audio (greedy); recovery from audio (baseline).] [sent-664, score-1.182]
87 Note that our synthetic noise is temporally decorrelated from the signal, whereas the noise present in recovery from audio mixtures shows quite strong correlations (Figure 14). [sent-678, score-0.639]
88 Our results indicate that in this experiment the noise correlation is not a major impediment to recovery from audio, since the uncorrelated noise induces consistently worse performance in Fsigtrans , and a similar level of performance in FSN at high polyphony. [sent-679, score-0.331]
89 Taken together, these results show that the practical task of retrieving detections from audio mixtures has a significant effect on algorithm performance, but that MMRP inference still performs strongly in simultaneously inferring signal/noise discrimination and clustering signals into tracks. [sent-680, score-0.588]
90 As in the synthetic experiments, in the present experiment the full MMRP inference shows a consistent Fsigtrans benefit over the greedy inference, although this must be balanced against the additional runtime cost. [sent-683, score-0.358]
91 Conclusions. In this paper we have investigated the problem of segregating timestamped data originating in multiple point processes plus clutter noise. [sent-685, score-0.299]
92 We developed an approach to inferring structure in data produced by a mixture of an unknown number of similar Markov renewal processes (MRPs) plus independent clutter noise. [sent-686, score-0.34]
93 The inference simultaneously distinguishes signal from noise as well as clustering signal observations into separate source streams, by solving a network flow problem isomorphic to the MMRP mixture problem. [sent-687, score-0.506]
94 The full optimal MMRP inference incurs a higher complexity than a greedy approach, but generally achieves a more accurate recovery of the event-to-event transitions present in the data. [sent-690, score-0.435]
95 In a synthetic experiment, we explored the robustness of inference, and found that good performance is possible despite misspecification of parameters such as the birth density and noise level. [sent-691, score-0.393]
96 To illustrate applications of the technique, we then conducted two experiments related to audio recognition tasks. [sent-694, score-0.254]
97 In an experiment based on the “auditory streaming” paradigm, we showed that MMRP inference can recover polyphonic event streams from noisy observations, applying different MRP generative models to implement different expectations about the streams to be recovered. [sent-695, score-0.255]
98 Then in an experiment on birdsong audio data we showed strong performance, albeit with a dependence on the quality of the underlying representation to recover events from audio data. [sent-696, score-0.626]
99 Non-negative hidden Markov modeling of audio with application to source separation. [sent-834, score-0.304]
100 Sparse representations in audio and music: From coding to source separation. [sent-858, score-0.304]
wordName wordTfidf (topN-words)
[('fsigtrans', 0.321), ('mmrp', 0.276), ('snr', 0.268), ('birth', 0.259), ('mrp', 0.259), ('audio', 0.254), ('fsn', 0.241), ('clutter', 0.199), ('detections', 0.152), ('segregated', 0.152), ('death', 0.145), ('recovery', 0.12), ('lumbley', 0.116), ('towell', 0.116), ('greedy', 0.114), ('transition', 0.109), ('coherent', 0.108), ('egregating', 0.107), ('treams', 0.107), ('intensity', 0.106), ('flow', 0.103), ('inference', 0.102), ('vent', 0.092), ('chiffchaff', 0.089), ('missed', 0.089), ('noise', 0.086), ('auditory', 0.08), ('arcs', 0.078), ('arc', 0.068), ('signal', 0.066), ('syllable', 0.062), ('renewal', 0.061), ('sequences', 0.06), ('song', 0.058), ('streams', 0.057), ('observations', 0.056), ('db', 0.056), ('full', 0.055), ('gutin', 0.054), ('polyphony', 0.054), ('syllables', 0.054), ('timestamped', 0.054), ('costs', 0.052), ('source', 0.05), ('cluster', 0.049), ('synthetic', 0.048), ('datum', 0.048), ('sources', 0.047), ('plus', 0.046), ('adverse', 0.046), ('network', 0.046), ('doi', 0.045), ('vertex', 0.045), ('mixtures', 0.045), ('pd', 0.045), ('birdsong', 0.045), ('locked', 0.045), ('mrps', 0.045), ('stowell', 0.045), ('misspecification', 0.044), ('transitions', 0.044), ('vertices', 0.042), ('likelihood', 0.042), ('bird', 0.041), ('experiment', 0.039), ('streaming', 0.038), ('plumbley', 0.038), ('drift', 0.038), ('separation', 0.038), ('ideal', 0.037), ('sound', 0.037), ('ab', 0.036), ('inferred', 0.036), ('intermittent', 0.036), ('mahler', 0.036), ('pmrp', 0.036), ('spectrogram', 0.036), ('pb', 0.035), ('signals', 0.035), ('events', 0.034), ('temporal', 0.034), ('recordings', 0.034), ('gmm', 0.034), ('emissions', 0.034), ('mixture', 0.034), ('paths', 0.033), ('flows', 0.03), ('sink', 0.03), ('labelled', 0.029), ('cuts', 0.029), ('markov', 0.029), ('template', 0.028), ('files', 0.028), ('ad', 0.027), ('detection', 0.027), ('observation', 0.027), ('poisson', 0.027), ('ababab', 0.027), ('canto', 0.027), ('collybita', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 98 jmlr-2013-Segregating Event Streams and Noise with a Markov Renewal Process Model
Author: Dan Stowell, Mark D. Plumbley
Abstract: We describe an inference task in which a set of timestamped event observations must be clustered into an unknown number of temporal sequences with independent and varying rates of observations. Various existing approaches to multi-object tracking assume a fixed number of sources and/or a fixed observation rate; we develop an approach to inferring structure in timestamped data produced by a mixture of an unknown and varying number of similar Markov renewal processes, plus independent clutter noise. The inference simultaneously distinguishes signal from noise as well as clustering signal observations into separate source streams. We illustrate the technique via synthetic experiments as well as an experiment to track a mixture of singing birds. Source code is available. Keywords: multi-target tracking, clustering, point processes, flow network, sound
2 0.15053272 111 jmlr-2013-Supervised Feature Selection in Graphs with Path Coding Penalties and Network Flows
Author: Julien Mairal, Bin Yu
Abstract: We consider supervised learning problems where the features are embedded in a graph, such as gene expressions in a gene network. In this context, it is of much interest to automatically select a subgraph with few connected components; by exploiting prior knowledge, one can indeed improve the prediction performance or obtain results that are easier to interpret. Regularization or penalty functions for selecting features in graphs have recently been proposed, but they raise new algorithmic challenges. For example, they typically require solving a combinatorially hard selection problem among all connected subgraphs. In this paper, we propose computationally feasible strategies to select a sparse and well-connected subset of features sitting on a directed acyclic graph (DAG). We introduce structured sparsity penalties over paths on a DAG called “path coding” penalties. Unlike existing regularization functions that model long-range interactions between features in a graph, path coding penalties are tractable. The penalties and their proximal operators involve path selection problems, which we efficiently solve by leveraging network flow optimization. We experimentally show on synthetic, image, and genomic data that our approach is scalable and leads to more connected subgraphs than other regularization functions for graphs. Keywords: convex and non-convex optimization, network flow optimization, graph sparsity
3 0.05561481 16 jmlr-2013-Bayesian Nonparametric Hidden Semi-Markov Models
Author: Matthew J. Johnson, Alan S. Willsky
Abstract: There is much interest in the Hierarchical Dirichlet Process Hidden Markov Model (HDP-HMM) as a natural Bayesian nonparametric extension of the ubiquitous Hidden Markov Model for learning from sequential and time-series data. However, in many settings the HDP-HMM’s strict Markovian constraints are undesirable, particularly if we wish to learn or encode non-geometric state durations. We can extend the HDP-HMM to capture such structure by drawing upon explicit-duration semi-Markov modeling, which has been developed mainly in the parametric non-Bayesian setting, to allow construction of highly interpretable models that admit natural prior information on state durations. In this paper we introduce the explicit-duration Hierarchical Dirichlet Process Hidden semiMarkov Model (HDP-HSMM) and develop sampling algorithms for efficient posterior inference. The methods we introduce also provide new methods for sampling inference in the finite Bayesian HSMM. Our modular Gibbs sampling methods can be embedded in samplers for larger hierarchical Bayesian models, adding semi-Markov chain modeling as another tool in the Bayesian inference toolbox. We demonstrate the utility of the HDP-HSMM and our inference methods on both synthetic and real experiments. Keywords: Bayesian nonparametrics, time series, semi-Markov, sampling algorithms, Hierarchical Dirichlet Process Hidden Markov Model
4 0.044295274 99 jmlr-2013-Semi-Supervised Learning Using Greedy Max-Cut
Author: Jun Wang, Tony Jebara, Shih-Fu Chang
Abstract: Graph-based semi-supervised learning (SSL) methods play an increasingly important role in practical machine learning systems, particularly in agnostic settings when no parametric information or other prior knowledge is available about the data distribution. Given the constructed graph represented by a weight matrix, transductive inference is used to propagate known labels to predict the values of all unlabeled vertices. Designing a robust label diffusion algorithm for such graphs is a widely studied problem and various methods have recently been suggested. Many of these can be formalized as regularized function estimation through the minimization of a quadratic cost. However, most existing label diffusion methods minimize a univariate cost with the classification function as the only variable of interest. Since the observed labels seed the diffusion process, such univariate frameworks are extremely sensitive to the initial label choice and any label noise. To alleviate the dependency on the initial observed labels, this article proposes a bivariate formulation for graph-based SSL, where both the binary label information and a continuous classification function are arguments of the optimization. This bivariate formulation is shown to be equivalent to a linearly constrained Max-Cut problem. Finally an efficient solution via greedy gradient Max-Cut (GGMC) is derived which gradually assigns unlabeled vertices to each class with minimum connectivity. Once convergence guarantees are established, this greedy Max-Cut based SSL is applied on both artificial and standard benchmark data sets where it obtains superior classification accuracy compared to existing state-of-the-art SSL methods. Moreover, GGMC shows robustness with respect to the graph construction method and maintains high accuracy over extensive experiments with various edge linking and weighting schemes. Keywords: graph transduction, semi-supervised learning, bivariate formulation, mixed integer programming, greedy
5 0.043789957 47 jmlr-2013-Gaussian Kullback-Leibler Approximate Inference
Author: Edward Challis, David Barber
Abstract: We investigate Gaussian Kullback-Leibler (G-KL) variational approximate inference techniques for Bayesian generalised linear models and various extensions. In particular we make the following novel contributions: sufficient conditions for which the G-KL objective is differentiable and convex are described; constrained parameterisations of Gaussian covariance that make G-KL methods fast and scalable are provided; the lower bound to the normalisation constant provided by G-KL methods is proven to dominate those provided by local lower bounding methods; complexity and model applicability issues of G-KL versus other Gaussian approximate inference methods are discussed. Numerical results comparing G-KL and other deterministic Gaussian approximate inference methods are presented for: robust Gaussian process regression models with either Student-t or Laplace likelihoods, large scale Bayesian binary logistic regression models, and Bayesian sparse linear models for sequential experimental design. Keywords: generalised linear models, latent linear models, variational approximate inference, large scale inference, sparse learning, experimental design, active learning, Gaussian processes
6 0.04129687 121 jmlr-2013-Variational Inference in Nonconjugate Models
7 0.041066106 43 jmlr-2013-Fast MCMC Sampling for Markov Jump Processes and Extensions
8 0.040443614 93 jmlr-2013-Random Walk Kernels and Learning Curves for Gaussian Process Regression on Random Graphs
9 0.037832417 108 jmlr-2013-Stochastic Variational Inference
10 0.036503494 28 jmlr-2013-Construction of Approximation Spaces for Reinforcement Learning
11 0.035876635 82 jmlr-2013-Optimally Fuzzy Temporal Memory
12 0.033650983 115 jmlr-2013-Training Energy-Based Models for Time-Series Imputation
13 0.032704987 113 jmlr-2013-The CAM Software for Nonnegative Blind Source Separation in R-Java
14 0.032659486 50 jmlr-2013-Greedy Feature Selection for Subspace Clustering
15 0.031565033 84 jmlr-2013-PC Algorithm for Nonparanormal Graphical Models
16 0.03135344 4 jmlr-2013-A Max-Norm Constrained Minimization Approach to 1-Bit Matrix Completion
17 0.029207354 110 jmlr-2013-Sub-Local Constraint-Based Learning of Bayesian Networks Using A Joint Dependence Criterion
18 0.028923294 56 jmlr-2013-Keep It Simple And Sparse: Real-Time Action Recognition
19 0.028858349 119 jmlr-2013-Variable Selection in High-Dimension with Random Designs and Orthogonal Matching Pursuit
20 0.027918708 39 jmlr-2013-Efficient Active Learning of Halfspaces: An Aggressive Approach
topicId topicWeight
[(0, -0.159), (1, -0.053), (2, -0.06), (3, 0.026), (4, 0.062), (5, 0.128), (6, -0.01), (7, 0.047), (8, 0.03), (9, -0.006), (10, 0.046), (11, -0.066), (12, -0.187), (13, -0.002), (14, 0.093), (15, 0.053), (16, -0.023), (17, 0.076), (18, -0.025), (19, 0.056), (20, -0.082), (21, -0.165), (22, -0.272), (23, 0.003), (24, -0.196), (25, 0.2), (26, 0.03), (27, -0.128), (28, 0.159), (29, -0.062), (30, 0.163), (31, -0.076), (32, -0.156), (33, 0.026), (34, -0.136), (35, 0.168), (36, -0.206), (37, -0.188), (38, 0.029), (39, -0.002), (40, 0.032), (41, 0.01), (42, 0.054), (43, -0.035), (44, -0.16), (45, -0.021), (46, 0.066), (47, 0.009), (48, 0.075), (49, 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.94932884 98 jmlr-2013-Segregating Event Streams and Noise with a Markov Renewal Process Model
Author: Dan Stowell, Mark D. Plumbley
Abstract: We describe an inference task in which a set of timestamped event observations must be clustered into an unknown number of temporal sequences with independent and varying rates of observations. Various existing approaches to multi-object tracking assume a fixed number of sources and/or a fixed observation rate; we develop an approach to inferring structure in timestamped data produced by a mixture of an unknown and varying number of similar Markov renewal processes, plus independent clutter noise. The inference simultaneously distinguishes signal from noise as well as clustering signal observations into separate source streams. We illustrate the technique via synthetic experiments as well as an experiment to track a mixture of singing birds. Source code is available. Keywords: multi-target tracking, clustering, point processes, flow network, sound
2 0.66975868 111 jmlr-2013-Supervised Feature Selection in Graphs with Path Coding Penalties and Network Flows
Author: Julien Mairal, Bin Yu
Abstract: We consider supervised learning problems where the features are embedded in a graph, such as gene expressions in a gene network. In this context, it is of much interest to automatically select a subgraph with few connected components; by exploiting prior knowledge, one can indeed improve the prediction performance or obtain results that are easier to interpret. Regularization or penalty functions for selecting features in graphs have recently been proposed, but they raise new algorithmic challenges. For example, they typically require solving a combinatorially hard selection problem among all connected subgraphs. In this paper, we propose computationally feasible strategies to select a sparse and well-connected subset of features sitting on a directed acyclic graph (DAG). We introduce structured sparsity penalties over paths on a DAG called “path coding” penalties. Unlike existing regularization functions that model long-range interactions between features in a graph, path coding penalties are tractable. The penalties and their proximal operators involve path selection problems, which we efficiently solve by leveraging network flow optimization. We experimentally show on synthetic, image, and genomic data that our approach is scalable and leads to more connected subgraphs than other regularization functions for graphs. Keywords: convex and non-convex optimization, network flow optimization, graph sparsity
3 0.35047579 113 jmlr-2013-The CAM Software for Nonnegative Blind Source Separation in R-Java
Author: Niya Wang, Fan Meng, Li Chen, Subha Madhavan, Robert Clarke, Eric P. Hoffman, Jianhua Xuan, Yue Wang
Abstract: We describe a R-Java CAM (convex analysis of mixtures) package that provides comprehensive analytic functions and a graphic user interface (GUI) for blindly separating mixed nonnegative sources. This open-source multiplatform software implements recent and classic algorithms in the literature including Chan et al. (2008), Wang et al. (2010), Chen et al. (2011a) and Chen et al. (2011b). The CAM package offers several attractive features: (1) instead of using proprietary MATLAB, its analytic functions are written in R, which makes the codes more portable and easier to modify; (2) besides producing and plotting results in R, it also provides a Java GUI for automatic progress update and convenient visual monitoring; (3) multi-thread interactions between the R and Java modules are driven and integrated by a Java GUI, assuring that the whole CAM software runs responsively; (4) the package offers a simple mechanism to allow others to plug-in additional R-functions. Keywords: convex analysis of mixtures, blind source separation, affinity propagation clustering, compartment modeling, information-based model selection
4 0.30949283 43 jmlr-2013-Fast MCMC Sampling for Markov Jump Processes and Extensions
Author: Vinayak Rao, Yee Whye Teh
Abstract: Markov jump processes (or continuous-time Markov chains) are a simple and important class of continuous-time dynamical systems. In this paper, we tackle the problem of simulating from the posterior distribution over paths in these models, given partial and noisy observations. Our approach is an auxiliary variable Gibbs sampler, and is based on the idea of uniformization. This sets up a Markov chain over paths by alternately sampling a finite set of virtual jump times given the current path, and then sampling a new path given the set of extant and virtual jump times. The first step involves simulating a piecewise-constant inhomogeneous Poisson process, while for the second, we use a standard hidden Markov model forward filtering-backward sampling algorithm. Our method is exact and does not involve approximations like time-discretization. We demonstrate how our sampler extends naturally to MJP-based models like Markov-modulated Poisson processes and continuous-time Bayesian networks, and show significant computational benefits over state-ofthe-art MCMC samplers for these models. Keywords: Markov jump process, MCMC, Gibbs sampler, uniformization, Markov-modulated Poisson process, continuous-time Bayesian network
5 0.27968359 16 jmlr-2013-Bayesian Nonparametric Hidden Semi-Markov Models
Author: Matthew J. Johnson, Alan S. Willsky
Abstract: There is much interest in the Hierarchical Dirichlet Process Hidden Markov Model (HDP-HMM) as a natural Bayesian nonparametric extension of the ubiquitous Hidden Markov Model for learning from sequential and time-series data. However, in many settings the HDP-HMM’s strict Markovian constraints are undesirable, particularly if we wish to learn or encode non-geometric state durations. We can extend the HDP-HMM to capture such structure by drawing upon explicit-duration semi-Markov modeling, which has been developed mainly in the parametric non-Bayesian setting, to allow construction of highly interpretable models that admit natural prior information on state durations. In this paper we introduce the explicit-duration Hierarchical Dirichlet Process Hidden semiMarkov Model (HDP-HSMM) and develop sampling algorithms for efficient posterior inference. The methods we introduce also provide new methods for sampling inference in the finite Bayesian HSMM. Our modular Gibbs sampling methods can be embedded in samplers for larger hierarchical Bayesian models, adding semi-Markov chain modeling as another tool in the Bayesian inference toolbox. We demonstrate the utility of the HDP-HSMM and our inference methods on both synthetic and real experiments. Keywords: Bayesian nonparametrics, time series, semi-Markov, sampling algorithms, Hierarchical Dirichlet Process Hidden Markov Model
6 0.23132588 51 jmlr-2013-Greedy Sparsity-Constrained Optimization
7 0.2171016 99 jmlr-2013-Semi-Supervised Learning Using Greedy Max-Cut
8 0.21023448 50 jmlr-2013-Greedy Feature Selection for Subspace Clustering
9 0.2096215 93 jmlr-2013-Random Walk Kernels and Learning Curves for Gaussian Process Regression on Random Graphs
10 0.20946509 118 jmlr-2013-Using Symmetry and Evolutionary Search to Minimize Sorting Networks
11 0.1953696 15 jmlr-2013-Bayesian Canonical Correlation Analysis
12 0.19421519 82 jmlr-2013-Optimally Fuzzy Temporal Memory
13 0.19089849 106 jmlr-2013-Stationary-Sparse Causality Network Learning
14 0.18191554 115 jmlr-2013-Training Energy-Based Models for Time-Series Imputation
15 0.1651995 22 jmlr-2013-Classifying With Confidence From Incomplete Information
16 0.16279183 61 jmlr-2013-Learning Theory Analysis for Association Rules and Sequential Event Prediction
18 0.15467095 47 jmlr-2013-Gaussian Kullback-Leibler Approximate Inference
19 0.14941056 76 jmlr-2013-Nonparametric Sparsity and Regularization
20 0.14772156 48 jmlr-2013-Generalized Spike-and-Slab Priors for Bayesian Group Feature Selection Using Expectation Propagation
topicId topicWeight
[(0, 0.022), (5, 0.087), (6, 0.564), (9, 0.01), (10, 0.053), (20, 0.016), (23, 0.024), (44, 0.01), (68, 0.045), (70, 0.026), (75, 0.028), (85, 0.013), (87, 0.011)]
simIndex simValue paperId paperTitle
1 0.94463348 71 jmlr-2013-Message-Passing Algorithms for Quadratic Minimization
Author: Nicholas Ruozzi, Sekhar Tatikonda
Abstract: Gaussian belief propagation (GaBP) is an iterative algorithm for computing the mean (and variances) of a multivariate Gaussian distribution, or equivalently, the minimum of a multivariate positive definite quadratic function. Sufficient conditions, such as walk-summability, that guarantee the convergence and correctness of GaBP are known, but GaBP may fail to converge to the correct solution given an arbitrary positive definite covariance matrix. As was observed by Malioutov et al. (2006), the GaBP algorithm fails to converge if the computation trees produced by the algorithm are not positive definite. In this work, we will show that the failure modes of the GaBP algorithm can be understood via graph covers, and we prove that a parameterized generalization of the min-sum algorithm can be used to ensure that the computation trees remain positive definite whenever the input matrix is positive definite. We demonstrate that the resulting algorithm is closely related to other iterative schemes for quadratic minimization such as the Gauss-Seidel and Jacobi algorithms. Finally, we observe, empirically, that there always exists a choice of parameters such that the above generalization of the GaBP algorithm converges. Keywords: belief propagation, Gaussian graphical models, graph covers
same-paper 2 0.90725845 98 jmlr-2013-Segregating Event Streams and Noise with a Markov Renewal Process Model
Author: Dan Stowell, Mark D. Plumbley
Abstract: We describe an inference task in which a set of timestamped event observations must be clustered into an unknown number of temporal sequences with independent and varying rates of observations. Various existing approaches to multi-object tracking assume a fixed number of sources and/or a fixed observation rate; we develop an approach to inferring structure in timestamped data produced by a mixture of an unknown and varying number of similar Markov renewal processes, plus independent clutter noise. The inference simultaneously distinguishes signal from noise as well as clustering signal observations into separate source streams. We illustrate the technique via synthetic experiments as well as an experiment to track a mixture of singing birds. Source code is available. Keywords: multi-target tracking, clustering, point processes, flow network, sound
3 0.87174505 106 jmlr-2013-Stationary-Sparse Causality Network Learning
Author: Yuejia He, Yiyuan She, Dapeng Wu
Abstract: Recently, researchers have proposed penalized maximum likelihood to identify network topology underlying a dynamical system modeled by multivariate time series. The time series of interest are assumed to be stationary, but this restriction is never taken into consideration by existing estimation methods. Moreover, practical problems of interest may have ultra-high dimensionality and obvious node collinearity. In addition, none of the available algorithms provides a probabilistic measure of the uncertainty for the obtained network topology which is informative in reliable network identification. The main purpose of this paper is to tackle these challenging issues. We propose the S2 learning framework, which stands for stationary-sparse network learning. We propose a novel algorithm referred to as the Berhu iterative sparsity pursuit with stationarity (BISPS), where the Berhu regularization can improve the Lasso in detection and estimation. The algorithm is extremely easy to implement, efficient in computation and has a theoretical guarantee to converge to a global optimum. We also incorporate a screening technique into BISPS to tackle ultra-high dimensional problems and enhance computational efficiency. Furthermore, a stationary bootstrap technique is applied to provide connection occurring frequency for reliable topology learning. Experiments show that our method can achieve stationary and sparse causality network learning and is scalable for high-dimensional problems. Keywords: stationarity, sparsity, Berhu, screening, bootstrap
4 0.52967256 82 jmlr-2013-Optimally Fuzzy Temporal Memory
Author: Karthik H. Shankar, Marc W. Howard
Abstract: Any learner with the ability to predict the future of a structured time-varying signal must maintain a memory of the recent past. If the signal has a characteristic timescale relevant to future prediction, the memory can be a simple shift register—a moving window extending into the past, requiring storage resources that linearly grows with the timescale to be represented. However, an independent general purpose learner cannot a priori know the characteristic prediction-relevant timescale of the signal. Moreover, many naturally occurring signals show scale-free long range correlations implying that the natural prediction-relevant timescale is essentially unbounded. Hence the learner should maintain information from the longest possible timescale allowed by resource availability. Here we construct a fuzzy memory system that optimally sacrifices the temporal accuracy of information in a scale-free fashion in order to represent prediction-relevant information from exponentially long timescales. Using several illustrative examples, we demonstrate the advantage of the fuzzy memory system over a shift register in time series forecasting of natural signals. When the available storage resources are limited, we suggest that a general purpose learner would be better off committing to such a fuzzy memory system. Keywords: temporal information compression, forecasting long range correlated time series
5 0.51994747 120 jmlr-2013-Variational Algorithms for Marginal MAP
Author: Qiang Liu, Alexander Ihler
Abstract: The marginal maximum a posteriori probability (MAP) estimation problem, which calculates the mode of the marginal posterior distribution of a subset of variables with the remaining variables marginalized, is an important inference problem in many models, such as those with hidden variables or uncertain parameters. Unfortunately, marginal MAP can be NP-hard even on trees, and has attracted less attention in the literature compared to the joint MAP (maximization) and marginalization problems. We derive a general dual representation for marginal MAP that naturally integrates the marginalization and maximization operations into a joint variational optimization problem, making it possible to easily extend most or all variational-based algorithms to marginal MAP. In particular, we derive a set of “mixed-product” message passing algorithms for marginal MAP, whose form is a hybrid of max-product, sum-product and a novel “argmax-product” message updates. We also derive a class of convergent algorithms based on proximal point methods, including one that transforms the marginal MAP problem into a sequence of standard marginalization problems. Theoretically, we provide guarantees under which our algorithms give globally or locally optimal solutions, and provide novel upper bounds on the optimal objectives. Empirically, we demonstrate that our algorithms significantly outperform the existing approaches, including a state-of-the-art algorithm based on local search methods. Keywords: graphical models, message passing, belief propagation, variational methods, maximum a posteriori, marginal-MAP, hidden variable models
6 0.47682106 92 jmlr-2013-Random Spanning Trees and the Prediction of Weighted Graphs
7 0.45527685 17 jmlr-2013-Belief Propagation for Continuous State Spaces: Stochastic Message-Passing with Quantitative Guarantees
8 0.45319769 43 jmlr-2013-Fast MCMC Sampling for Markov Jump Processes and Extensions
9 0.44296011 118 jmlr-2013-Using Symmetry and Evolutionary Search to Minimize Sorting Networks
10 0.4342736 74 jmlr-2013-Multivariate Convex Regression with Adaptive Partitioning
11 0.42004415 99 jmlr-2013-Semi-Supervised Learning Using Greedy Max-Cut
12 0.40702525 91 jmlr-2013-Query Induction with Schema-Guided Pruning Strategies
13 0.40675393 22 jmlr-2013-Classifying With Confidence From Incomplete Information
14 0.40179399 110 jmlr-2013-Sub-Local Constraint-Based Learning of Bayesian Networks Using A Joint Dependence Criterion
16 0.38790557 10 jmlr-2013-Algorithms and Hardness Results for Parallel Large Margin Learning
17 0.37878704 72 jmlr-2013-Multi-Stage Multi-Task Feature Learning
18 0.37529194 25 jmlr-2013-Communication-Efficient Algorithms for Statistical Optimization
19 0.37390757 93 jmlr-2013-Random Walk Kernels and Learning Curves for Gaussian Process Regression on Random Graphs