nips nips2011 nips2011-255 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Trung T. Pham, Tat-jun Chin, Jin Yu, David Suter
Abstract: Multi-structure model fitting has traditionally taken a two-stage approach: First, sample a (large) number of model hypotheses, then select the subset of hypotheses that optimise a joint fitting and model selection criterion. This disjoint two-stage approach is arguably suboptimal and inefficient — if the random sampling did not retrieve a good set of hypotheses, the optimised outcome will not represent a good fit. To overcome this weakness we propose a new multi-structure fitting approach based on Reversible Jump MCMC. Instrumental in raising the effectiveness of our method is an adaptive hypothesis generator, whose proposal distribution is learned incrementally and online. We prove that this adaptive proposal satisfies the diminishing adaptation property crucial for ensuring ergodicity in MCMC. Our method effectively conducts hypothesis sampling and optimisation simultaneously, and yields superior computational efficiency over previous two-stage methods. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Multi-structure model fitting has traditionally taken a two-stage approach: First, sample a (large) number of model hypotheses, then select the subset of hypotheses that optimise a joint fitting and model selection criterion. [sent-5, score-0.305]
2 Instrumental in raising the effectiveness of our method is an adaptive hypothesis generator, whose proposal distribution is learned incrementally and online. [sent-8, score-0.439]
3 We prove that this adaptive proposal satisfies the diminishing adaptation property crucial for ensuring ergodicity in MCMC. [sent-9, score-0.316]
4 Our method effectively conducts hypothesis sampling and optimisation simultaneously, and yields superior computational efficiency over previous two-stage methods. [sent-10, score-0.253]
5 The task manifests in applications such as mixture regression [21], motion segmentation [27, 10], and multi-projective estimation [29]. [sent-12, score-0.177]
6 In practical settings the number of structures is usually unknown beforehand, thus model selection is required in conjunction with fitting. [sent-14, score-0.148]
7 A common framework is to optimise a robust goodness-of-fit function jointly with a model selection criterion. [sent-16, score-0.145]
8 For tractability most methods [25, 19, 17, 26, 18, 7, 31] take a “hypothesise-then-select” approach: First, randomly sample from the parameter space a large number of putative model hypotheses, then select a subset of the hypotheses (structures) that optimise the combined objective function. [sent-17, score-0.261]
9 The hypotheses are typically fitted on minimal subsets [9] of the input data. [sent-18, score-0.358]
10 While sampling is crucial for tractability, a disjoint two-stage approach raises an awkward situation: If the sampled hypotheses are inaccurate, or worse, if not all valid structures are sampled, the selection or optimisation step will be affected. [sent-20, score-0.511]
11 This is especially acute for higher-order geometric models (e.g., fundamental matrices in motion segmentation [27]), where enormous sampling effort is required before hitting good hypotheses (those fitted on all-inlier minimal subsets). [sent-23, score-0.577]
12 Thus two-stage approaches are highly vulnerable to sampling inadequacies, even with theoretical assurances on the optimisation step (e.g., global optimality over the sampled hypotheses [19, 7, 31]). [sent-24, score-0.127] [sent-26, score-0.201]
14 One can consider iterative local refinement of the structures or re-sampling after data assignment [7], but the fact remains that if the initial hypotheses are inaccurate, the results of the subsequent fitting and refinement will be affected. [sent-32, score-0.305]
15 Despite their popular use [3], such methods have not been fully explored in multi-structure fitting (a few authors have applied Monte Carlo techniques for robust estimation [28, 8], but mostly to enhance hypothesis sampling on single-structure data). [sent-36, score-0.178]
16 We show how to exploit the reversible jump mechanism to provide a simple and effective framework for multi-structure model selection. [sent-37, score-0.237]
17 The bane of MCMC, however, is the difficulty in designing efficient proposal distributions. [sent-38, score-0.171]
18 Adaptive MCMC techniques [4, 24] promise to alleviate this difficulty by learning the proposal distribution on-the-fly. [sent-39, score-0.171]
19 Instrumental in raising the efficiency of our RJMCMC approach is a recently proposed hypothesis generator [6] that progressively updates the proposal distribution using generated hypotheses. [sent-40, score-0.36]
20 Care must be taken in introducing such adaptive schemes, since a chain propagated based on a non-stationary proposal is non-Markovian, and unless the proposal satisfies certain properties [4, 24], this generally means a loss of asymptotic convergence to the target distribution. [sent-41, score-0.518]
21 Clearing these technical hurdles is one of our major contributions: Using emerging theory from adaptive MCMC [23, 4, 24, 11], we prove that the adaptive proposal, despite its origins in robust estimation [6], satisfies the properties required for convergence, most notably diminishing adaptation. [sent-42, score-0.295]
22 Sec. 3 describes the adaptive hypothesis proposal used in our method, and develops a proof that it is a valid adaptive MCMC sampler. [sent-46, score-0.506]
23 2 Multi-Structure Fitting and Model Selection Given input data X = {x_i}_{i=1}^N, usually with outliers, our goal is to recover the instances or structures θ_k = {θ_c}_{c=1}^k of a geometric model M embedded in X. [sent-50, score-0.159]
24 The number of valid structures k is unknown beforehand and must also be estimated from the data. [sent-51, score-0.17]
25 Assuming Gaussian noise with known variance σ, the above problem is equivalent to minimising the function f(k, θ_k) = Σ_{i=1}^N ρ( min_c r_{ic} / (1.96σ) ) + α n(θ_k), (1) where r_{ic} = g(x_i, θ_c) is the absolute residual of x_i to the c-th structure θ_c in θ_k. [sent-60, score-0.149] [sent-61, score-0.12]
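As a rough illustration (not the authors' code), the objective (1) could be evaluated as in the following Python sketch; the capped form of the robust loss ρ, the residual function g(x, θ) and the choice n(θ_k) = k are assumptions of this sketch:

import numpy as np

def fitting_objective(X, structures, g, sigma, alpha):
    # f(k, theta_k) = sum_i rho( min_c r_ic / (1.96*sigma) ) + alpha * n(theta_k)
    # rho is assumed here to be a capped loss that saturates at 1 beyond the inlier band;
    # n(theta_k) is taken as the number of structures k.
    k = len(structures)
    total = 0.0
    for x in X:
        r_min = min(g(x, theta) for theta in structures)   # min_c r_ic
        total += min(r_min / (1.96 * sigma), 1.0)          # assumed capped robust loss
    return total + alpha * k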
27 1 A reversible jump simulated annealing approach Simulated annealing has proven to be effective for difficult model selection problems [2, 5]. [sent-65, score-0.365]
28 The idea is to propagate a Markov chain for the Boltzmann distribution encapsulating (1), b_T(k, θ_k) ∝ exp(−f(k, θ_k)/T), (2) where the temperature T is progressively lowered until the samples from b_T(k, θ_k) converge to the global minima of f(k, θ_k). [sent-66, score-0.127]
29 Under weak regularity assumptions, there exist cooling schedules [5] that will guarantee that as T tends to zero the samples from the chain will concentrate around the global minima. [sent-68, score-0.111]
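A minimal sketch of this annealing loop, assuming a geometric cooling factor and a helper initialise_state (both placeholders, not the paper's exact settings):

def simulated_annealing(X, simulate_chain, initialise_state, T0=1.0, c=0.99, T_min=1e-3, sweeps=50):
    # Anneal the Boltzmann distribution b_T(k, theta_k) of Eq. (2).
    T = T0
    state = initialise_state(X)                    # e.g. one structure fitted to a random minimal subset
    while T > T_min:
        state = simulate_chain(state, T, sweeps)   # reversible jump kernel of Algorithm 2
        T *= c                                     # geometric cooling (factor c is an assumption)
    return state                                   # approximate minimiser (k, theta_k) of f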
30 Birth and death moves change the number of structures k. [sent-73, score-0.244]
31 These moves effectively cause the chain to jump across parameter spaces θk of different dimensions. [sent-74, score-0.222]
32 It is crucial that these trans-dimensional jumps are reversible to produce correct limiting behaviour of the chain. [sent-75, score-0.135]
33 Algorithm 1 Simulated annealing for multi-structure fitting and model selection 1: Initialise temperature T and state {k, θk }. [sent-77, score-0.146]
34 Algorithm 2 Reversible jump mixture of kernels MCMC to simulate bT (k, θk ) Require: Last visited state {k, θk } of previous chain, probability β (Sec. [sent-80, score-0.139]
35 2: if a ≤ β then 3: With probability rB (k), attempt birth move, else attempt death move. [sent-83, score-0.22]
36 1 Birth and death moves The birth move propagates {k, θ_k} to {k', θ_{k'}}, with k' = k + 1. [sent-90, score-0.35]
37 Applying Green's [12, 22] seminal theorems on RJMCMC, the move is reversible if it is accepted with probability min{1, A}, where A = [ b_T(k', θ_{k'}) (1 − r_B(k'))/k' ] / [ b_T(k, θ_k) r_B(k) q(u) ] · |∂θ_{k'}/∂(θ_k, u)|. (3) [sent-91, score-0.242]
38 The probability of proposing the birth move is r_B(k), where r_B(k) = 1 for k = 1, r_B(k) = 0 for k = k_max, and r_B(k) takes an intermediate value otherwise. [sent-92, score-0.21]
39 In other words, any move that attempts to move k beyond the range [1, kmax ] is disallowed in Step 3 of Algorithm 2. [sent-97, score-0.207]
40 The death move is proposed with probability 1 − rB (k). [sent-98, score-0.164]
41 The death move is accepted with probability min{1, A^{−1}}, with obvious changes to the notations in A^{−1}. [sent-100, score-0.194]
42 In the birth move, the extra degrees of freedom required to specify the new item in θk are given by auxiliary variables u, which are in turn proposed by q(u). [sent-101, score-0.133]
43 Following [18, 7, 31], we estimate parameters of the new item by fitting the geometric model M onto a minimal subset of the data. [sent-102, score-0.164]
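A sketch of the resulting acceptance tests follows; the Jacobian is taken to be 1 because the new structure is specified directly by its minimal subset, bT, rB and q_u are passed in as functions, and states are assumed to expose their number of structures as .k (all placeholder conventions):

def birth_ratio(bT, rB, q_u, state, state_plus, u, T):
    # Green's ratio A of Eq. (3) for a birth from `state` (k structures) to `state_plus` (k+1),
    # where the new structure was generated from the minimal subset u ~ q(u); Jacobian = 1.
    k, k1 = state.k, state_plus.k                  # k1 = k + 1
    return (bT(state_plus, T) * (1.0 - rB(k1)) / k1) / (bT(state, T) * rB(k) * q_u(u))

# Birth: accept with probability min(1, birth_ratio(...)).
# Death from `state_plus` back to `state` (deleting the structure built from u):
#   accept with probability min(1, 1.0 / birth_ratio(...)).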
44 Our approach is equivalently minimising (1) over collections {k, θ_k} of minimal subsets of X, where now θ_k ≡ {u_c}_{c=1}^k. [sent-107, score-0.262]
45 Considering only minimal subsets somewhat simplifies the problem, but there are still a colossal number of possible minimal subsets. [sent-109, score-0.266]
46 Obtaining good overall performance thus hinges on the ability of the proposal q(u) to propose minimal subsets that are relevant, i.e., those fitted purely on inliers of valid structures in the data. [sent-110, score-0.328] [sent-112, score-0.389]
48 We give the details in Sec. 3 and prove that the adaptive proposal preserves ergodicity. [sent-115, score-0.28]
49 The move involves randomly choosing a structure θc in θk to update, making only local adjustments to its minimal subset uc . [sent-119, score-0.24]
50 The outcome is a revised minimal subset u_c', and the move is accepted with probability min{1, A}, where A = [ b_T(k, θ_k') q(u_c | θ_c) ] / [ b_T(k, θ_k) q(u_c' | θ_c) ]. (4) Here θ_k' denotes θ_k with u_c replaced by u_c'. [sent-120, score-0.358]
51 As shown in the above, our local update is also accomplished with the adaptive proposal q(u|θ), but this time conditioned on the selected structure θ_c. [sent-121, score-0.28]
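A corresponding sketch of the local-update test of Eq. (4), with q_cond standing for the conditioned proposal q(·|θ_c) (a placeholder name):

def local_update_accept_prob(bT, q_cond, state, state_new, u_c, u_c_new, theta_c, T):
    # Eq. (4): A = bT(k, theta_k') q(u_c | theta_c) / ( bT(k, theta_k) q(u_c' | theta_c) ),
    # where state_new is state with the minimal subset u_c of theta_c replaced by u_c_new.
    A = (bT(state_new, T) * q_cond(u_c, theta_c)) / (bT(state, T) * q_cond(u_c_new, theta_c))
    return min(1.0, A)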
52 The algorithm maintains a series of sampling weights which are revised incrementally as new hypotheses are generated. [sent-126, score-0.429]
53 1 The Multi-GS algorithm Let {θ_m}_{m=1}^M aggregate the set of hypotheses fitted on the minimal subsets proposed thus far in all birth and local update moves in Algorithm 1. [sent-131, score-0.544]
54 To build the sampling weights, first for each x_i ∈ X we compute its absolute residuals as measured to the M hypotheses, yielding the residual vector r^{(i)} := [ r_1^{(i)} r_2^{(i)} · · · r_M^{(i)} ]. [sent-132, score-0.164]
55 The permutation a^{(i)} essentially ranks the M hypotheses according to the preference of x_i; the higher a hypothesis is ranked, the more likely x_i is an inlier to it. [sent-134, score-0.465]
56 The weight w_{i,j} between the pair x_i and x_j is obtained as w_{i,j} = I_h(x_i, x_j) := (1/h) |a_h^{(i)} ∩ a_h^{(j)}|, (5) where |a_h^{(i)} ∩ a_h^{(j)}| is the number of identical elements shared by the first h elements of a^{(i)} and a^{(j)}. [sent-135, score-0.193]
57 The weight wi,j measures the correlation of the top h preferences of xi and xj , and this value is typically high iff xi and xj are inliers from the same structure; Figs. [sent-138, score-0.388]
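A sketch of this weight computation (the residual matrix and the window size h are inputs assumed by the sketch):

import numpy as np

def preference_weights(residuals, h):
    # residuals: (N, M) array with residuals[i, m] = |g(x_i, theta_m)|.
    # Returns the (N, N) matrix of weights w_{i,j} = |a_h^(i) ∩ a_h^(j)| / h from Eq. (5).
    N, M = residuals.shape
    ranking = np.argsort(residuals, axis=1)   # a^(i): hypotheses sorted by increasing residual
    top_h = ranking[:, :h]                    # first h (most preferred) hypotheses per point
    W = np.zeros((N, N))
    for i in range(N):
        for j in range(i, N):
            overlap = np.intersect1d(top_h[i], top_h[j]).size
            W[i, j] = W[j, i] = overlap / float(h)
    return W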
58 Multi-GS exploits the preference correlations to sample the next minimal subset u = {x_{s_t}}_{t=1}^p, where x_{s_t} ∈ X and s_t ∈ {1, . . . , N}. [sent-145, score-0.255]
59 Beginning from t = 2, the selection of the t-th member s_t considers the weights related to the data s_1, . . . , s_{t−1}. [sent-150, score-0.141]
60 A new hypothesis θM +1 is then fitted on u and the weights are updated in consideration of θM +1 . [sent-160, score-0.134]
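A sketch of the guided minimal-subset selection described here; choosing the first index uniformly and the fallback to uniform weights are assumptions of this illustration:

import numpy as np

def sample_minimal_subset(W, p, rng=None):
    # Sample p distinct indices s_1, ..., s_p; each new index is drawn with probability
    # proportional to the product of its weights to the members chosen so far.
    rng = rng or np.random.default_rng()
    N = W.shape[0]
    s = [int(rng.integers(N))]                  # s_1 chosen uniformly (assumption)
    for _ in range(1, p):
        probs = np.ones(N)
        for t in s:
            probs = probs * W[t]                # combine pairwise weights to chosen members
        probs[s] = 0.0                          # enforce sampling without replacement
        if probs.sum() <= 0:                    # fall back to uniform if all weights vanish
            probs = np.ones(N)
            probs[s] = 0.0
        probs = probs / probs.sum()
        s.append(int(rng.choice(N, p=probs)))
    return s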
61 Experiments on sampling efficiency (e.g., all-inlier minimal subsets produced per unit time) show that Multi-GS is superior to previous guided sampling schemes, especially on multi-structure data; see [6] for details. [sent-163, score-0.212]
62 Our RJMCMC scheme in Algorithm 2 depends on the Multi-GS-inspired adaptive proposals q_M(u) and q_M(u|θ), where we now add the subscript M to make explicit their dependency on the set of aggregated hypotheses {θ_m}_{m=1}^M as well as the weights {w_{i,j}}_{i,j=1}^N they induce. [sent-166, score-0.362]
63 for any z_0, . . . , z_{n−1} ∈ Ξ and for some distribution π(z) on Ξ: Σ_{z_n} π(z_n) T_n(z_n, z_{n+1}) = π(z_{n+1}), (10) |T_{n+k}(z, z') − T_n(z, z')| ≤ a_n c_k, a_n = O(n^{−r_1}), c_k = O(k^{−r_2}), r_1, r_2 > 0, (11) and T_n(z, z') ≥ ε π(z'), ε > 0, where ε does not depend on n, z_0, . . . [sent-174, score-0.166]
64 (11) dictates that the transition kernel, and thus the proposal distribution in the Metropolis-Hastings updates in Eqs. (3) and (4), must change at a diminishing rate (the diminishing adaptation condition). [sent-181, score-0.171]
65 Without loss of generality, assume that b new hypotheses are generated between successive weight updates w_{i,j} and w'_{i,j}. [sent-186, score-0.201]
66 The result is based on the fact that the extension of b hypotheses will only perturb the overlap between the top-k percentile of any two preference vectors by at most b(k + 1) items. [sent-188, score-0.267]
67 It should also be noted that the result is not due to w_{i,j} and w'_{i,j} simultaneously vanishing with increasing M; in general lim_{M→∞} |a_{kM}^{(i)} ∩ a_{kM}^{(j)}|/M ≠ 0, since a^{(i)} and a^{(j)} are extended and revised as M increases and this may increase their mutual overlap. [sent-189, score-0.201]
68 Using the above result, it can be shown that the product of any two weights also converges: lim_{M→∞} |w'_{i,j} w'_{p,q} − w_{i,j} w_{p,q}| = lim_{M→∞} |w'_{i,j}(w'_{p,q} − w_{p,q}) + w_{p,q}(w'_{i,j} − w_{i,j})| ≤ lim_{M→∞} ( |w'_{i,j}| |w'_{p,q} − w_{p,q}| + |w_{p,q}| |w'_{i,j} − w_{i,j}| ) = 0. [sent-192, score-0.391]
69 To show the convergence of the normalisation terms in (8), we first observe that the sum of weights is bounded away from 0: ∀i, 1^T w_i ≥ L, L > 0, due to the offsetting (6) and the constant element w_{i,i} = 1 in w_i (although w_{i,i} will be set to zero to enforce sampling without replacement [6]). [sent-194, score-0.257]
70 It can thus be established that lim_{M→∞} | 1/(1^T w'_i) − 1/(1^T w_i) | = lim_{M→∞} |1^T w_i − 1^T w'_i| / ( (1^T w'_i)(1^T w_i) ) ≤ lim_{M→∞} |1^T w_i − 1^T w'_i| / L^2 = 0, since the sum of the weights also converges. [sent-195, score-0.709]
71 Again, since we simulate the target using Metropolis-Hastings, every proposal has a chance of being rejected, thus implying aperiodicity [3]. [sent-208, score-0.208]
72 The above results apply for the local update proposal qM (u|θ) which differs from qM (u) only in the (stationary) probability to select the first index s1 . [sent-210, score-0.171]
73 Hence qM (u|θ) is also a valid adaptive proposal. [sent-211, score-0.144]
74 4 Experiments We compare our approach (ARJMC) against state-of-the-art methods: message passing [18] (FLOSS), energy minimisation with graph cut [7] (ENERGY), and quadratic programming based on a novel preference feature [31] (QP-MF). [sent-212, score-0.16]
75 Other methods are excluded owing to, e.g., computational inefficiency [19, 17, 26], low accuracy due to greedy search [25], or vulnerability to outliers [17]. [sent-215, score-0.159]
76 In Algorithm 1 the temperature T is initialised to 1 and we apply a geometric cooling schedule T_next = c · T with a constant 0 < c < 1. [sent-219, score-0.159]
77 1 Two-view motion segmentation The goal is to segment point trajectories X matched across two views into distinct motions [27]. [sent-224, score-0.249]
78 Trajectories of a particular motion can be related by a distinct fundamental matrix F ∈ R3×3 [15]. [sent-225, score-0.11]
79 Our task is thus to estimate the number of motions k and the fundamental matrices {F_c}_{c=1}^k corresponding to the motions embedded in data X. [sent-226, score-0.179]
80 We estimate fundamental matrix hypotheses from minimal subsets of size p = 8 using the 8-point method [14]. [sent-228, score-0.393]
81 We test the methods on publicly available two-view motion segmentation datasets [30]. [sent-230, score-0.177]
82 (c)–(g) show the evolution of the matrix of pairwise weights (5) computed from (b) as the number of hypotheses M is increased. [sent-240, score-0.289]
83 The matrices exhibit a four-block pattern, indicating strong mutual preference among inliers from the same structure. [sent-248, score-0.316]
84 This phenomenon allows accurate selection of minimal subsets in Multi-GS [6]. [sent-249, score-0.201]
85 The latter involves assigning each datum xi ∈ X to the nearest structure in θk if the residual is less than the threshold t; else xi is labelled as an outlier. [sent-256, score-0.181]
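A sketch of this labelling rule used for evaluation (g and the threshold t as above; label −1 denotes an outlier):

import numpy as np

def label_points(X, structures, g, t):
    # Assign each datum to the nearest structure if its residual is below t, else mark it an outlier.
    labels = []
    for x in X:
        residuals = [g(x, theta) for theta in structures]
        c = int(np.argmin(residuals))
        labels.append(c if residuals[c] < t else -1)
    return np.array(labels)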
86 We record performance at M = 100, 200, . . . , 1000, the number of hypotheses generated so far in Algorithm 1. [sent-261, score-0.177]
87 Figs. 1(j)–1(m) depict the evolution of the segmentation result of ARJMC as M increases. [sent-266, score-0.138]
88 For the competing methods, M = 100, 200, . . . , 1000 hypotheses are accumulatively generated (using both uniform random sampling [9] and Multi-GS [6]). [sent-270, score-0.256]
89 We ensure that each method returns the true number of structures for all M ; this represents an advantage over ARJMC, since the “online learning” nature of ARJMC means the number of structures is not discovered until closer to convergence. [sent-272, score-0.208]
90 Table 1 reports the median segmentation error (%) of ARJMC, FLOSS, QP-MF and ENERGY at M = 100, 200, . . . , 1000 hypotheses, together with computation time in seconds, on four two-view datasets: breadcartoychip (4 structures; 33, 23, 41 and 58 inliers, 82 outliers), toycubecar (3 structures; 45, 69 and 14 inliers, 72 outliers), carchipscube (3 structures; 19, 33 and 53 inliers, 60 outliers) and biscuitbookbox (3 structures; 67, 41 and 54 inliers, 97 outliers). [sent-429, score-0.362] [sent-451, score-0.203] [sent-549, score-0.506]
93 5 Conclusions By design, our algorithm conducts hypothesis sampling, geometric fitting and model selection simultaneously; it therefore minimises wastage in the sampling process and converges faster than previous two-stage approaches. [sent-552, score-0.28]
94 Underpinning our novel Reversible Jump MCMC method is an efficient hypothesis generator whose proposal distribution is learned online. [sent-554, score-0.316]
95 Drawing from new theory on Adaptive MCMC, we prove that our efficient hypothesis generator satisfies the properties crucial to ensure convergence to the correct target distribution. [sent-555, score-0.145]
96 Our work thus links the latest developments from MCMC optimisation and geometric model fitting. [sent-556, score-0.127]
97 Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. [sent-637, score-0.169]
98 Two-view motion segmentation by mixtures of dirichlet process with model selection and outlier removal. [sent-666, score-0.258]
99 Coupling and ergodicity of adaptive Markov chain Monte Carlo algorithms. [sent-709, score-0.176]
100 Branch-and-bound hypothesis selection for two-view multiple structure and motion segmentation. [sent-727, score-0.201]
wordName wordTfidf (topN-words)
[('arjmc', 0.394), ('akm', 0.263), ('floss', 0.263), ('inliers', 0.25), ('hypotheses', 0.201), ('qm', 0.195), ('proposal', 0.171), ('zn', 0.166), ('outliers', 0.159), ('reversible', 0.135), ('birth', 0.133), ('mcmc', 0.132), ('rb', 0.116), ('lim', 0.113), ('chin', 0.109), ('rjmcmc', 0.109), ('minimal', 0.109), ('adaptive', 0.109), ('minimising', 0.105), ('structures', 0.104), ('segmentation', 0.102), ('jump', 0.102), ('energy', 0.094), ('gs', 0.09), ('tting', 0.088), ('revised', 0.088), ('death', 0.087), ('multi', 0.087), ('bt', 0.086), ('hypothesis', 0.082), ('move', 0.077), ('motion', 0.075), ('motions', 0.072), ('optimisation', 0.072), ('chain', 0.067), ('preference', 0.066), ('breadtoycar', 0.066), ('datum', 0.066), ('generator', 0.063), ('temperature', 0.06), ('optimise', 0.06), ('andrieu', 0.058), ('sampling', 0.055), ('geometric', 0.055), ('tn', 0.054), ('uc', 0.054), ('qp', 0.054), ('moves', 0.053), ('kmax', 0.053), ('wi', 0.053), ('weights', 0.052), ('tted', 0.05), ('mf', 0.05), ('subsets', 0.048), ('ah', 0.047), ('st', 0.045), ('fitting', 0.045), ('biscuitbookbox', 0.044), ('carchipscube', 0.044), ('conducts', 0.044), ('cooling', 0.044), ('cubebreadtoychips', 0.044), ('homographies', 0.044), ('offsetting', 0.044), ('raising', 0.044), ('ric', 0.044), ('sium', 0.044), ('summarises', 0.044), ('toycubecar', 0.044), ('selection', 0.044), ('annealing', 0.042), ('robust', 0.041), ('consensus', 0.039), ('xi', 0.039), ('inlier', 0.038), ('haario', 0.038), ('labelling', 0.038), ('nement', 0.037), ('km', 0.037), ('mixtures', 0.037), ('simulate', 0.037), ('residual', 0.037), ('diminishing', 0.036), ('evolution', 0.036), ('fundamental', 0.035), ('instrumental', 0.035), ('roberts', 0.035), ('aic', 0.035), ('colour', 0.035), ('xst', 0.035), ('valid', 0.035), ('residuals', 0.033), ('incrementally', 0.033), ('vision', 0.032), ('pattern', 0.032), ('planar', 0.031), ('beforehand', 0.031), ('xj', 0.03), ('accepted', 0.03), ('ih', 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 255 nips-2011-Simultaneous Sampling and Multi-Structure Fitting with Adaptive Reversible Jump MCMC
Author: Trung T. Pham, Tat-jun Chin, Jin Yu, David Suter
Abstract: Multi-structure model fitting has traditionally taken a two-stage approach: First, sample a (large) number of model hypotheses, then select the subset of hypotheses that optimise a joint fitting and model selection criterion. This disjoint two-stage approach is arguably suboptimal and inefficient — if the random sampling did not retrieve a good set of hypotheses, the optimised outcome will not represent a good fit. To overcome this weakness we propose a new multi-structure fitting approach based on Reversible Jump MCMC. Instrumental in raising the effectiveness of our method is an adaptive hypothesis generator, whose proposal distribution is learned incrementally and online. We prove that this adaptive proposal satisfies the diminishing adaptation property crucial for ensuring ergodicity in MCMC. Our method effectively conducts hypothesis sampling and optimisation simultaneously, and yields superior computational efficiency over previous two-stage methods. 1
2 0.12250048 180 nips-2011-Multiple Instance Filtering
Author: Kamil A. Wnuk, Stefano Soatto
Abstract: We propose a robust filtering approach based on semi-supervised and multiple instance learning (MIL). We assume that the posterior density would be unimodal if not for the effect of outliers that we do not wish to explicitly model. Therefore, we seek for a point estimate at the outset, rather than a generic approximation of the entire posterior. Our approach can be thought of as a combination of standard finite-dimensional filtering (Extended Kalman Filter, or Unscented Filter) with multiple instance learning, whereby the initial condition comes with a putative set of inlier measurements. We show how both the state (regression) and the inlier set (classification) can be estimated iteratively and causally by processing only the current measurement. We illustrate our approach on visual tracking problems whereby the object of interest (target) moves and evolves as a result of occlusions and deformations, and partial knowledge of the target is given in the form of a bounding box (training set). 1
3 0.10280293 17 nips-2011-Accelerated Adaptive Markov Chain for Partition Function Computation
Author: Stefano Ermon, Carla P. Gomes, Ashish Sabharwal, Bart Selman
Abstract: We propose a novel Adaptive Markov Chain Monte Carlo algorithm to compute the partition function. In particular, we show how to accelerate a flat histogram sampling technique by significantly reducing the number of “null moves” in the chain, while maintaining asymptotic convergence properties. Our experiments show that our method converges quickly to highly accurate solutions on a range of benchmark instances, outperforming other state-of-the-art methods such as IJGP, TRW, and Gibbs sampling both in run-time and accuracy. We also show how obtaining a so-called density of states distribution allows for efficient weight learning in Markov Logic theories. 1
4 0.089360818 275 nips-2011-Structured Learning for Cell Tracking
Author: Xinghua Lou, Fred A. Hamprecht
Abstract: We study the problem of learning to track a large quantity of homogeneous objects such as cell tracking in cell culture study and developmental biology. Reliable cell tracking in time-lapse microscopic image sequences is important for modern biomedical research. Existing cell tracking methods are usually kept simple and use only a small number of features to allow for manual parameter tweaking or grid search. We propose a structured learning approach that allows to learn optimum parameters automatically from a training set. This allows for the use of a richer set of features which in turn affords improved tracking compared to recently reported methods on two public benchmark sequences. 1
5 0.077314079 229 nips-2011-Query-Aware MCMC
Author: Michael L. Wick, Andrew McCallum
Abstract: Traditional approaches to probabilistic inference such as loopy belief propagation and Gibbs sampling typically compute marginals for all the unobserved variables in a graphical model. However, in many real-world applications the user’s interests are focused on a subset of the variables, specified by a query. In this case it would be wasteful to uniformly sample, say, one million variables when the query concerns only ten. In this paper we propose a query-specific approach to MCMC that accounts for the query variables and their generalized mutual information with neighboring variables in order to achieve higher computational efficiency. Surprisingly there has been almost no previous work on query-aware MCMC. We demonstrate the success of our approach with positive experimental results on a wide range of graphical models. 1
6 0.07452099 228 nips-2011-Quasi-Newton Methods for Markov Chain Monte Carlo
7 0.071946055 131 nips-2011-Inference in continuous-time change-point models
8 0.064657539 227 nips-2011-Pylon Model for Semantic Segmentation
9 0.064028807 277 nips-2011-Submodular Multi-Label Learning
10 0.063926958 217 nips-2011-Practical Variational Inference for Neural Networks
11 0.059742138 68 nips-2011-Demixed Principal Component Analysis
12 0.059204064 119 nips-2011-Higher-Order Correlation Clustering for Image Segmentation
13 0.057668343 192 nips-2011-Nonstandard Interpretations of Probabilistic Programs for Efficient Inference
14 0.056802325 76 nips-2011-Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials
15 0.054092191 258 nips-2011-Sparse Bayesian Multi-Task Learning
16 0.053662028 271 nips-2011-Statistical Tests for Optimization Efficiency
17 0.053602133 221 nips-2011-Priors over Recurrent Continuous Time Processes
18 0.052831046 55 nips-2011-Collective Graphical Models
19 0.052207954 12 nips-2011-A Two-Stage Weighting Framework for Multi-Source Domain Adaptation
20 0.05207172 299 nips-2011-Variance Penalizing AdaBoost
topicId topicWeight
[(0, 0.172), (1, 0.014), (2, -0.021), (3, 0.019), (4, -0.008), (5, -0.05), (6, -0.039), (7, -0.093), (8, -0.005), (9, 0.028), (10, 0.018), (11, -0.063), (12, 0.033), (13, -0.109), (14, -0.026), (15, -0.052), (16, -0.02), (17, -0.072), (18, -0.041), (19, 0.02), (20, 0.01), (21, 0.067), (22, -0.004), (23, -0.044), (24, -0.013), (25, 0.03), (26, -0.127), (27, 0.058), (28, 0.021), (29, -0.024), (30, -0.062), (31, -0.141), (32, -0.019), (33, -0.04), (34, -0.048), (35, 0.055), (36, -0.017), (37, 0.01), (38, 0.042), (39, -0.02), (40, -0.004), (41, -0.092), (42, -0.041), (43, -0.046), (44, 0.054), (45, -0.047), (46, -0.1), (47, -0.001), (48, -0.03), (49, 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.916493 255 nips-2011-Simultaneous Sampling and Multi-Structure Fitting with Adaptive Reversible Jump MCMC
Author: Trung T. Pham, Tat-jun Chin, Jin Yu, David Suter
Abstract: Multi-structure model fitting has traditionally taken a two-stage approach: First, sample a (large) number of model hypotheses, then select the subset of hypotheses that optimise a joint fitting and model selection criterion. This disjoint two-stage approach is arguably suboptimal and inefficient — if the random sampling did not retrieve a good set of hypotheses, the optimised outcome will not represent a good fit. To overcome this weakness we propose a new multi-structure fitting approach based on Reversible Jump MCMC. Instrumental in raising the effectiveness of our method is an adaptive hypothesis generator, whose proposal distribution is learned incrementally and online. We prove that this adaptive proposal satisfies the diminishing adaptation property crucial for ensuring ergodicity in MCMC. Our method effectively conducts hypothesis sampling and optimisation simultaneously, and yields superior computational efficiency over previous two-stage methods. 1
2 0.73529965 17 nips-2011-Accelerated Adaptive Markov Chain for Partition Function Computation
Author: Stefano Ermon, Carla P. Gomes, Ashish Sabharwal, Bart Selman
Abstract: We propose a novel Adaptive Markov Chain Monte Carlo algorithm to compute the partition function. In particular, we show how to accelerate a flat histogram sampling technique by significantly reducing the number of “null moves” in the chain, while maintaining asymptotic convergence properties. Our experiments show that our method converges quickly to highly accurate solutions on a range of benchmark instances, outperforming other state-of-the-art methods such as IJGP, TRW, and Gibbs sampling both in run-time and accuracy. We also show how obtaining a so-called density of states distribution allows for efficient weight learning in Markov Logic theories. 1
3 0.62955534 192 nips-2011-Nonstandard Interpretations of Probabilistic Programs for Efficient Inference
Author: David Wingate, Noah Goodman, Andreas Stuhlmueller, Jeffrey M. Siskind
Abstract: Probabilistic programming languages allow modelers to specify a stochastic process using syntax that resembles modern programming languages. Because the program is in machine-readable format, a variety of techniques from compiler design and program analysis can be used to examine the structure of the distribution represented by the probabilistic program. We show how nonstandard interpretations of probabilistic programs can be used to craft efficient inference algorithms: information about the structure of a distribution (such as gradients or dependencies) is generated as a monad-like side computation while executing the program. These interpretations can be easily coded using special-purpose objects and operator overloading. We implement two examples of nonstandard interpretations in two different languages, and use them as building blocks to construct inference algorithms: automatic differentiation, which enables gradient based methods, and provenance tracking, which enables efficient construction of global proposals. 1
4 0.60561019 228 nips-2011-Quasi-Newton Methods for Markov Chain Monte Carlo
Author: Yichuan Zhang, Charles A. Sutton
Abstract: The performance of Markov chain Monte Carlo methods is often sensitive to the scaling and correlations between the random variables of interest. An important source of information about the local correlation and scale is given by the Hessian matrix of the target distribution, but this is often either computationally expensive or infeasible. In this paper we propose MCMC samplers that make use of quasiNewton approximations, which approximate the Hessian of the target distribution from previous samples and gradients generated by the sampler. A key issue is that MCMC samplers that depend on the history of previous states are in general not valid. We address this problem by using limited memory quasi-Newton methods, which depend only on a fixed window of previous samples. On several real world datasets, we show that the quasi-Newton sampler is more effective than standard Hamiltonian Monte Carlo at a fraction of the cost of MCMC methods that require higher-order derivatives. 1
5 0.6045807 55 nips-2011-Collective Graphical Models
Author: Daniel R. Sheldon, Thomas G. Dietterich
Abstract: There are many settings in which we wish to fit a model of the behavior of individuals but where our data consist only of aggregate information (counts or low-dimensional contingency tables). This paper introduces Collective Graphical Models—a framework for modeling and probabilistic inference that operates directly on the sufficient statistics of the individual model. We derive a highlyefficient Gibbs sampling algorithm for sampling from the posterior distribution of the sufficient statistics conditioned on noisy aggregate observations, prove its correctness, and demonstrate its effectiveness experimentally. 1
6 0.59201789 180 nips-2011-Multiple Instance Filtering
7 0.56241637 197 nips-2011-On Tracking The Partition Function
8 0.55898482 275 nips-2011-Structured Learning for Cell Tracking
9 0.49171478 201 nips-2011-On the Completeness of First-Order Knowledge Compilation for Lifted Probabilistic Inference
10 0.47356564 241 nips-2011-Scalable Training of Mixture Models via Coresets
11 0.4690645 131 nips-2011-Inference in continuous-time change-point models
12 0.4654752 120 nips-2011-History distribution matching method for predicting effectiveness of HIV combination therapies
13 0.45946339 173 nips-2011-Modelling Genetic Variations using Fragmentation-Coagulation Processes
14 0.45270979 95 nips-2011-Fast and Accurate k-means For Large Datasets
15 0.4428162 136 nips-2011-Inverting Grice's Maxims to Learn Rules from Natural Language Extractions
16 0.44140014 27 nips-2011-Advice Refinement in Knowledge-Based SVMs
17 0.4363271 155 nips-2011-Learning to Agglomerate Superpixel Hierarchies
18 0.43246785 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs
19 0.4264743 277 nips-2011-Submodular Multi-Label Learning
20 0.4195658 229 nips-2011-Query-Aware MCMC
topicId topicWeight
[(0, 0.025), (4, 0.037), (13, 0.011), (20, 0.038), (26, 0.424), (31, 0.083), (33, 0.018), (43, 0.045), (45, 0.071), (57, 0.021), (74, 0.05), (83, 0.033), (84, 0.016), (99, 0.046)]
simIndex simValue paperId paperTitle
1 0.92424113 20 nips-2011-Active Learning Ranking from Pairwise Preferences with Almost Optimal Query Complexity
Author: Nir Ailon
Abstract: Given a set V of n elements we wish to linearly order them using pairwise preference labels which may be non-transitive (due to irrationality or arbitrary noise). The goal is to linearly order the elements while disagreeing with as few pairwise preference labels as possible. Our performance is measured by two parameters: The number of disagreements (loss) and the query complexity (number of pairwise preference labels). Our algorithm adaptively queries at most O(n poly(log n, ε−1 )) preference labels for a regret of ε times the optimal loss. This is strictly better, and often significantly better than what non-adaptive sampling could achieve. Our main result helps settle an open problem posed by learning-to-rank (from pairwise information) theoreticians and practitioners: What is a provably correct way to sample preference labels? 1
2 0.89271575 162 nips-2011-Lower Bounds for Passive and Active Learning
Author: Maxim Raginsky, Alexander Rakhlin
Abstract: We develop unified information-theoretic machinery for deriving lower bounds for passive and active learning schemes. Our bounds involve the so-called Alexander’s capacity function. The supremum of this function has been recently rediscovered by Hanneke in the context of active learning under the name of “disagreement coefficient.” For passive learning, our lower bounds match the upper bounds of Gin´ and Koltchinskii up to constants and generalize analogous results of Mase sart and N´ d´ lec. For active learning, we provide first known lower bounds based e e on the capacity function rather than the disagreement coefficient. 1
same-paper 3 0.87557232 255 nips-2011-Simultaneous Sampling and Multi-Structure Fitting with Adaptive Reversible Jump MCMC
Author: Trung T. Pham, Tat-jun Chin, Jin Yu, David Suter
Abstract: Multi-structure model fitting has traditionally taken a two-stage approach: First, sample a (large) number of model hypotheses, then select the subset of hypotheses that optimise a joint fitting and model selection criterion. This disjoint two-stage approach is arguably suboptimal and inefficient — if the random sampling did not retrieve a good set of hypotheses, the optimised outcome will not represent a good fit. To overcome this weakness we propose a new multi-structure fitting approach based on Reversible Jump MCMC. Instrumental in raising the effectiveness of our method is an adaptive hypothesis generator, whose proposal distribution is learned incrementally and online. We prove that this adaptive proposal satisfies the diminishing adaptation property crucial for ensuring ergodicity in MCMC. Our method effectively conducts hypothesis sampling and optimisation simultaneously, and yields superior computational efficiency over previous two-stage methods. 1
4 0.84783518 179 nips-2011-Multilinear Subspace Regression: An Orthogonal Tensor Decomposition Approach
Author: Qibin Zhao, Cesar F. Caiafa, Danilo P. Mandic, Liqing Zhang, Tonio Ball, Andreas Schulze-bonhage, Andrzej S. Cichocki
Abstract: A multilinear subspace regression model based on so called latent variable decomposition is introduced. Unlike standard regression methods which typically employ matrix (2D) data representations followed by vector subspace transformations, the proposed approach uses tensor subspace transformations to model common latent variables across both the independent and dependent data. The proposed approach aims to maximize the correlation between the so derived latent variables and is shown to be suitable for the prediction of multidimensional dependent data from multidimensional independent data, where for the estimation of the latent variables we introduce an algorithm based on Multilinear Singular Value Decomposition (MSVD) on a specially defined cross-covariance tensor. It is next shown that in this way we are also able to unify the existing Partial Least Squares (PLS) and N-way PLS regression algorithms within the same framework. Simulations on benchmark synthetic data confirm the advantages of the proposed approach, in terms of its predictive ability and robustness, especially for small sample sizes. The potential of the proposed technique is further illustrated on a real world task of the decoding of human intracranial electrocorticogram (ECoG) from a simultaneously recorded scalp electroencephalograph (EEG). 1
5 0.82559013 289 nips-2011-Trace Lasso: a trace norm regularization for correlated designs
Author: Edouard Grave, Guillaume R. Obozinski, Francis R. Bach
Abstract: Using the 1 -norm to regularize the estimation of the parameter vector of a linear model leads to an unstable estimator when covariates are highly correlated. In this paper, we introduce a new penalty function which takes into account the correlation of the design matrix to stabilize the estimation. This norm, called the trace Lasso, uses the trace norm of the selected covariates, which is a convex surrogate of their rank, as the criterion of model complexity. We analyze the properties of our norm, describe an optimization algorithm based on reweighted least-squares, and illustrate the behavior of this norm on synthetic data, showing that it is more adapted to strong correlations than competing methods such as the elastic net. 1
6 0.64524335 22 nips-2011-Active Ranking using Pairwise Comparisons
7 0.58187366 21 nips-2011-Active Learning with a Drifting Distribution
8 0.55124182 84 nips-2011-EigenNet: A Bayesian hybrid of generative and conditional models for sparse learning
9 0.54640472 297 nips-2011-Universal low-rank matrix recovery from Pauli measurements
10 0.53963417 29 nips-2011-Algorithms and hardness results for parallel large margin learning
11 0.5366891 210 nips-2011-PAC-Bayesian Analysis of Contextual Bandits
12 0.53393966 17 nips-2011-Accelerated Adaptive Markov Chain for Partition Function Computation
13 0.5304926 204 nips-2011-Online Learning: Stochastic, Constrained, and Smoothed Adversaries
14 0.52086765 42 nips-2011-Bayesian Bias Mitigation for Crowdsourcing
15 0.51781607 231 nips-2011-Randomized Algorithms for Comparison-based Search
16 0.51724225 158 nips-2011-Learning unbelievable probabilities
17 0.51210141 229 nips-2011-Query-Aware MCMC
18 0.50800681 226 nips-2011-Projection onto A Nonnegative Max-Heap
19 0.50164992 256 nips-2011-Solving Decision Problems with Limited Information
20 0.49724635 222 nips-2011-Prismatic Algorithm for Discrete D.C. Programming Problem