nips nips2012 nips2012-182 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Nan Du, Le Song, Ming Yuan, Alex J. Smola
Abstract: Information, disease, and influence diffuse over networks of entities in both natural systems and human society. Analyzing these transmission networks plays an important role in understanding the diffusion processes and predicting future events. However, the underlying transmission networks are often hidden and incomplete, and we observe only the time stamps when cascades of events happen. In this paper, we address the challenging problem of uncovering the hidden network only from the cascades. The structure discovery problem is complicated by the fact that the influence between networked entities is heterogeneous, which cannot be described by a simple parametric model. Therefore, we propose a kernel-based method which can capture a diverse range of different types of influence without any prior assumption. On both synthetic and real cascade data, we show that our model can better recover the underlying diffusion network and drastically improve the estimation of the transmission functions among networked entities.
Reference: text
1 Introduction

Networks have been powerful abstractions for modeling a variety of natural and artificial systems that consist of a large collection of interacting entities. Due to the recent increasing availability of large-scale networks, network modeling and analysis have been extensively applied to study the spreading and diffusion of information, ideas, and even viruses in social and information networks (see, e.g., …). However, the process of influence and diffusion often occurs over a hidden network that might not be easily observed and identified directly. For instance, when a disease spreads among people, epidemiologists can know only when a person gets sick, but they can hardly ever know where and from whom he or she got infected. Similarly, when consumers rush to buy some particular product, marketers can know when purchases occurred, but they cannot trace further back where the recommendations originally came from [12]. In all such cases, we can observe only the time stamp at which a piece of information was received by a particular entity, but the exact path of diffusion is missing. Therefore, it is an interesting and challenging question whether we can uncover the diffusion paths based just on the time stamps of the events.
However, in these models, time is treated as a discrete index and not modeled as a random variable. In the diffusion network discovery problem, time is treated explicitly as a continuous variable, and one is interested in capturing how the occurrence of an event at one node affects the time of its occurrence at other nodes.
Specifically, Myers and Leskovec inferred the diffusion network by learning the infection probability between two nodes using convex programming, in a method called CONNIE [14]. Gomez-Rodriguez et al. inferred the network connectivity using submodular optimization, in a method called NETINF [4]. However, both CONNIE and NETINF assume that the transmission model for each pair of nodes is fixed with a predefined transmission rate. Gomez-Rodriguez et al. proposed an elegant method, called NETRATE [3], which uses a continuous temporal dynamics model to allow variable diffusion rates across network edges.
[Figure 1: three panels, (a) Pair 1, (b) Pair 2, and (c) Pair 3, each plotting a pdf against t (hours) with a histogram and fitted exp, Rayleigh, and KernelCascade curves.] Figure 1: The histograms of the interval between the time when a post appeared in one site and the time when a new post in another site links to it. Dotted and dashed lines are densities fitted by NETRATE.
However, NETRATE still restricts each transmission function to a fixed parametric form, such as an exponential, power-law, or Rayleigh distribution, although the model parameters learned from cascades can differ. In practice, the patterns of information diffusion (or of a spreading disease) among entities can be quite complicated and different from each other, going far beyond what a single family of parametric models can capture. For example, on Twitter, an active user can be online for more than 12 hours a day and may instantly respond to any interesting message, whereas an inactive user may just log in and respond once a day. As a result, the spreading pattern of messages between the active user and his friends can be quite different from that of the inactive user. Another example comes from information diffusion in the blogosphere: the hyperlinks between posts can be viewed as a kind of information flow from one media site to another, and the time difference between two linked posts reveals the pattern of diffusion. In Figure 1, we examined three pairs of media sites from the MemeTracker dataset [3, 9] and plotted the histograms of the intervals between the moment when a post first appeared in one site and the moment when it was linked by a new post in another site. We can observe that information can have very different transmission patterns for these pairs. Parametric models fitted by NETRATE may capture the simple pattern in Figure 1(a), but they might miss the multimodal patterns in Figures 1(b) and 1(c).
Our key idea is to model the continuous information diffusion process using survival analysis by kernelizing the hazard function. We obtain a convex optimization problem with group-lasso-type regularization and develop a fast block-coordinate descent algorithm for solving it. The sparsity pattern of the coefficients provides us with the structure of the diffusion network. On both synthetic and real-world data, our method can better recover the underlying diffusion networks and drastically improve the estimation of the transmission functions among networked entities.
2 Preliminary

In this section, we present some basic concepts from survival analysis [7, 8], which are essential for our later modeling. Given a nonnegative random variable T corresponding to the time when an event happens, let f(t) be the probability density function of T and F(t) = Pr(T ≤ t) = ∫_0^t f(x) dx be its cumulative distribution function. The probability that an event does not happen up to time t is thus given by the survival function S(t) = Pr(T ≥ t) = 1 − F(t). The survival function is continuous and monotonically decreasing, with S(0) = 1 and S(∞) = lim_{t→∞} S(t) = 0.
Given f(t) and S(t), we can define the instantaneous risk (or rate) that an event has not happened up to time t but happens at time t by the hazard function

h(t) = lim_{Δt→0} Pr(t ≤ T ≤ t + Δt | T ≥ t) / Δt = f(t) / S(t).   (1)

With this definition, h(t)Δt is the approximate probability that an event happens in [t, t + Δt) given that the event has not happened up to t. Furthermore, the hazard function h(t) is related to the survival function S(t) via the differential equation h(t) = −(d/dt) log S(t), where we have used f(t) = −S′(t).
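As a quick sanity check (a sketch of ours, not from the paper), the two characterizations of the hazard above can be verified numerically for an exponential distribution, whose hazard is the constant rate λ:

```python
import numpy as np

# Numerically check h(t) = f(t) / S(t) and h(t) = -d/dt log S(t)
# for an exponential distribution with rate lam, where
# f(t) = lam * exp(-lam * t) and S(t) = exp(-lam * t), so h(t) = lam.
lam = 1.5
t = np.linspace(0.1, 5.0, 200)

f = lam * np.exp(-lam * t)   # density f(t)
S = np.exp(-lam * t)         # survival function S(t)

h_ratio = f / S                          # hazard via f/S
h_logderiv = -np.gradient(np.log(S), t)  # hazard via -d/dt log S(t)

print(np.allclose(h_ratio, lam), np.allclose(h_logderiv, lam))  # True True
```

The finite-difference derivative is exact here because log S(t) = −λt is linear in t; for a general hazard it would agree only up to discretization error.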
Solving the differential equation with boundary condition S(0) = 1, we can recover the survival function S(t) and the density function f(t) based on the hazard function h(t), i.e.,

S(t) = exp( − ∫_0^t h(x) dx )   and   f(t) = h(t) exp( − ∫_0^t h(x) dx ).   (2)

[Figure 2: three panels, (a) Hidden network, (b) Node e gets infected at time t4, (c) Node e survives.] Figure 2: Cascades over a hidden network.
Solid lines in panel (a) represent connections in the hidden network. In panels (b) and (c), filled circles indicate infected nodes while empty circles represent uninfected ones. Nodes a, b, c, and d are the parents of node e; they got infected at times t0 < t1 < t2 < t3, respectively, and each tends to infect node e. In panel (b), node e survives the influence of nodes a, b, and c, shown with green dashed lines. In panel (c), node e survives even though all its parents got infected.
3 Modeling Cascades using Survival Analysis

We use survival analysis to model information diffusion among networked entities. We assume that there is a fixed population of N nodes connected in a directed network G = (V, E). Neighboring nodes are allowed to directly influence each other; nodes along a directed path may influence each other only through a diffusion process. Because the true underlying network is unknown, our observations are only the time stamps at which events occur at each node in the network. The time stamps are then organized as cascades, each of which corresponds to a particular event. For instance, a piece of news posted on the CNN website about “Facebook went public” can be treated as an event. It can spread across the blogosphere and trigger a sequence of posts from other sites referring to it. Each site then has a time stamp at which this particular piece of news is discussed and cited.
The goal of the model is to capture the interplay between the hidden diffusion network and the cascades of observed event time stamps. More formally, a directed edge, j → i, is associated with a transmission function f_ji(t_i | t_j), which is the conditional likelihood of an event happening at node i at time t_i given that the same event has already happened at node j at time t_j. The transmission function attempts to capture the temporal dependency between the two successive events at nodes i and j. In addition, we focus on shift-invariant transmission functions whose value depends only on the time difference, i.e., f_ji(t_i | t_j) = f_ji(t_i − t_j) = f_ji(Δ_ji), where Δ_ji := t_i − t_j. Given the likelihood function, we can compute the corresponding survival function S_ji(Δ_ji) and hazard function h_ji(Δ_ji). When there is no directed edge j → i, the transmission function and hazard function are both identically zero, i.e., f_ji(Δ_ji) = 0 and h_ji(Δ_ji) = 0, while the survival function is identically one, i.e., S_ji(Δ_ji) = 1. Therefore, the structure of the diffusion network is reflected in the non-zero patterns of the collection of transmission functions (or hazard functions).
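This zero-pattern convention can be illustrated with a small sketch (the container, edge set, and hazard shapes below are our illustrative choices, not the paper's): a network stored as a sparse collection of per-edge hazard functions, with absent edges behaving as h_ji ≡ 0 and S_ji ≡ 1.

```python
import numpy as np

# A toy diffusion network as a dict of per-edge hazard functions h_ji(dt).
# A missing key (j, i) means there is no edge j -> i: hazard 0, survival 1.
hazards = {
    ("a", "e"): lambda dt: 0.5 * np.ones_like(dt),  # constant (exponential) hazard
    ("b", "e"): lambda dt: dt / 4.0,                # linear (Rayleigh) hazard
}

def hazard(j, i, dt):
    dt = np.asarray(dt, dtype=float)
    h = hazards.get((j, i))
    return h(dt) if h is not None else np.zeros_like(dt)

def survival(j, i, dt):
    # S_ji(dt) = exp(-integral_0^dt h_ji(x) dx), via trapezoidal quadrature
    xs = np.linspace(0.0, float(dt), 1001)
    hs = hazard(j, i, xs)
    integral = np.sum(0.5 * (hs[1:] + hs[:-1]) * np.diff(xs))
    return float(np.exp(-integral))

print(round(survival("a", "e", 2.0), 4))  # exp(-1) ~ 0.3679
print(survival("c", "e", 2.0))            # 1.0: no edge c -> e
```

The non-zero keys of `hazards` are exactly the edges of the network, which is the sparsity structure the paper's method recovers.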
A cascade is an N-dimensional vector t^c := (t^c_1, …, t^c_N), with the i-th dimension recording the time stamp when event c occurs at node i. Furthermore, t^c_i ∈ [0, T^c] ∪ {∞}, where the symbol ∞ labels nodes that have not been influenced during the observation window [0, T^c]; it does not imply that those nodes are never influenced. A dataset can contain a collection, C, of cascades {t^1, …, t^{|C|}}.
The time stamps assigned to nodes by a cascade induce a directed acyclic graph (DAG) by defining node j as a parent of node i if t_j < t_i. Thus, it is meaningful to refer to parents and children within a cascade [3], which is different from the parent-child structural relation in the true underlying diffusion network. Since the true network is inferred from many cascades (each of which imposes its own DAG structure), the inferred network is typically not a DAG.
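A minimal sketch of this induced structure (variable names are ours): given one cascade's time stamps, with ∞ marking uninfected nodes, each node's cascade parents follow directly from the strict time ordering.

```python
import numpy as np

def induced_parents(t):
    """Parents induced by one cascade: j is a parent of i iff t[j] < t[i].
    Entries equal to np.inf mark nodes uninfected within the observation window."""
    return {
        i: [j for j, tj in enumerate(t) if tj < ti]
        for i, ti in enumerate(t)
        if np.isfinite(ti)
    }

cascade = np.array([0.0, 1.5, 1.5, 3.0, np.inf])
print(induced_parents(cascade))
# {0: [], 1: [0], 2: [0], 3: [0, 1, 2]} -- node 4 never got infected
```

Note that simultaneously infected nodes (here nodes 1 and 2) are not parents of each other, since the ordering is strict; different cascades generally induce different DAGs over the same node set.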
The likelihood ℓ(t^c) of a cascade induced by event c is then simply a product of the individual likelihoods ℓ_i(t^c) that event c occurs at each node i. Depending on whether event c actually occurs at node i in the data, we can compute this individual likelihood as follows.

Event c did occur at node i. We assume that once an event occurs at node i under the influence of a particular parent j in a cascade, the same event will not happen again. In Figure 2(b), node e is susceptible given its parents a, b, c, and d; however, only node d is the first parent to infect node e.
Because each parent could be equally likely to be the first to influence node i, the likelihood is a sum over the likelihoods of the mutually disjoint events that node i was infected by a first parent j while surviving the influence of all the other parents, i.e.,

ℓ^+_i(t^c) = Σ_{j: t^c_j < t^c_i} f_ji(Δ^c_ji) Π_{k: t^c_k < t^c_i, k ≠ j} S_ki(Δ^c_ki) = ( Π_{k: t^c_k < t^c_i} S_ki(Δ^c_ki) ) ( Σ_{j: t^c_j < t^c_i} h_ji(Δ^c_ji) ).

Combining infected and uninfected nodes, the likelihood of a cascade is

ℓ(t^c) = Π_{i: t^c_i > T^c} Π_{j: t^c_j ≤ T^c} S_ji(T^c − t^c_j) · Π_{i: t^c_i ≤ T^c} ℓ^+_i(t^c),   (5)

where the first double product ranges over uninfected nodes and the second product over infected nodes. Therefore, the likelihood of all cascades is a product of these individual cascade likelihoods.
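To make the structure of this likelihood concrete, here is a hedged sketch that evaluates one cascade's likelihood when every edge carries the same exponential transmission (the rate, edge set, and time stamps are illustrative choices of ours, not the paper's):

```python
import numpy as np

RATE = 0.5
EDGES = {(0, 1), (0, 2), (1, 2)}  # directed edges j -> i of a toy network

def h(j, i, dt):
    # hazard: constant RATE on an edge, identically 0 otherwise
    return RATE if (j, i) in EDGES else 0.0

def S(j, i, dt):
    # survival: exp(-RATE * dt) on an edge, identically 1 otherwise
    return float(np.exp(-RATE * dt)) if (j, i) in EDGES else 1.0

def cascade_likelihood(t, T):
    like = 1.0
    for i, ti in enumerate(t):
        if ti <= T:  # infected node: survive all earlier nodes, sum hazards
            parents = [j for j, tj in enumerate(t) if tj < ti]
            if not parents:  # the seed of the cascade is taken as given
                continue
            surv = np.prod([S(j, i, ti - t[j]) for j in parents])
            like *= surv * sum(h(j, i, ti - t[j]) for j in parents)
        else:        # uninfected node: survived every infected node up to T
            infected = [j for j, tj in enumerate(t) if tj <= T]
            like *= np.prod([S(j, i, T - t[j]) for j in infected]) if infected else 1.0
    return like

t = np.array([0.0, 1.0, np.inf])  # node 0 seeds, node 1 at time 1, node 2 uninfected
print(cascade_likelihood(t, T=2.0))  # 0.5 * exp(-2) ~ 0.0677
```

Non-edges contribute survival 1 and hazard 0, so only true edges affect the value; swapping in richer per-edge hazards (as the paper's kernelized method learns) changes only `h` and `S`.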
In the end, we take the negative log of this likelihood function and regroup all terms associated with edges pointing to node i together to derive

L({t^1, …, t^{|C|}}) = Σ_c Σ_i ( − Σ_j log S_ji(Δ^c_ji) − log Σ_j h_ji(Δ^c_ji) ),   (6)

where, for each cascade c and node i, the sums over j range over the appropriate sets of earlier or infected nodes (the hazard term is absent for uninfected nodes). There are two interesting implications of this negative log-likelihood function. First, the function can be expressed using only the hazard and the survival function. Second, the function decomposes into additive contributions from each node i. We can therefore estimate the hazard and survival functions for each node separately.
Gomez-Rodriguez et al. [3] used parametric hazard and survival functions and estimated the model parameters using convex programming. In contrast, we will instead formulate an algorithm using kernels and grouped parameter regularization, which allows us to estimate complicated hazard and survival functions without overfitting.
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 182 nips-2012-Learning Networks of Heterogeneous Influence
2 0.16732825 336 nips-2012-The Coloured Noise Expansion and Parameter Estimation of Diffusion Processes
Author: Simon Lyons, Amos J. Storkey, Simo Särkkä
Abstract: Stochastic differential equations (SDE) are a natural tool for modelling systems that are inherently noisy or contain uncertainties that can be modelled as stochastic processes. Crucial to the process of using SDE to build mathematical models is the ability to estimate parameters of those models from observed data. Over the past few decades, significant progress has been made on this problem, but we are still far from having a definitive solution. We describe a novel method of approximating a diffusion process that we show to be useful in Markov chain Monte Carlo (MCMC) inference algorithms. We take the ‘white’ noise that drives a diffusion process and decompose it into two terms. The first is a ‘coloured noise’ term that can be deterministically controlled by a set of auxiliary variables. The second term is small and enables us to form a linear Gaussian ‘small noise’ approximation. The decomposition allows us to take a diffusion process of interest and cast it in a form that is amenable to sampling by MCMC methods. We explain why many state-of-the-art inference methods fail on highly nonlinear inference problems, and we demonstrate experimentally that our method performs well in such situations. Our results show that this method is a promising new tool for use in inference and parameter estimation problems. 1
3 0.13749321 205 nips-2012-MCMC for continuous-time discrete-state systems
Author: Vinayak Rao, Yee W. Teh
Abstract: We propose a simple and novel framework for MCMC inference in continuoustime discrete-state systems with pure jump trajectories. We construct an exact MCMC sampler for such systems by alternately sampling a random discretization of time given a trajectory of the system, and then a new trajectory given the discretization. The first step can be performed efficiently using properties of the Poisson process, while the second step can avail of discrete-time MCMC techniques based on the forward-backward algorithm. We show the advantage of our approach compared to particle MCMC and a uniformization-based sampler. 1
4 0.13016556 97 nips-2012-Diffusion Decision Making for Adaptive k-Nearest Neighbor Classification
Author: Yung-kyun Noh, Frank Park, Daniel D. Lee
Abstract: This paper sheds light on some fundamental connections of the diffusion decision making model of neuroscience and cognitive psychology with k-nearest neighbor classification. We show that conventional k-nearest neighbor classification can be viewed as a special problem of the diffusion decision model in the asymptotic situation. By applying the optimal strategy associated with the diffusion decision model, an adaptive rule is developed for determining appropriate values of k in k-nearest neighbor classification. Making use of the sequential probability ratio test (SPRT) and Bayesian analysis, we propose five different criteria for adaptively acquiring nearest neighbors. Experiments with both synthetic and real datasets demonstrate the effectiveness of our classification criteria. 1
5 0.11035477 298 nips-2012-Scalable Inference of Overlapping Communities
Author: Prem Gopalan, Sean Gerrish, Michael Freedman, David M. Blei, David M. Mimno
Abstract: We develop a scalable algorithm for posterior inference of overlapping communities in large networks. Our algorithm is based on stochastic variational inference in the mixed-membership stochastic blockmodel (MMSB). It naturally interleaves subsampling the network with estimating its community structure. We apply our algorithm on ten large, real-world networks with up to 60,000 nodes. It converges several orders of magnitude faster than the state-of-the-art algorithm for MMSB, finds hundreds of communities in large real-world networks, and detects the true communities in 280 benchmark networks with equal or better accuracy compared to other scalable algorithms. 1
6 0.10049191 215 nips-2012-Minimizing Uncertainty in Pipelines
7 0.099827737 339 nips-2012-The Time-Marginalized Coalescent Prior for Hierarchical Clustering
8 0.088683881 276 nips-2012-Probabilistic Event Cascades for Alzheimer's disease
9 0.077782795 194 nips-2012-Learning to Discover Social Circles in Ego Networks
10 0.07758116 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs
11 0.074488737 53 nips-2012-Bayesian Pedigree Analysis using Measure Factorization
12 0.069650985 226 nips-2012-Multiclass Learning Approaches: A Theoretical Comparison with Implications
13 0.068406343 100 nips-2012-Discriminative Learning of Sum-Product Networks
14 0.065456495 191 nips-2012-Learning the Architecture of Sum-Product Networks Using Clustering on Variables
15 0.063944325 356 nips-2012-Unsupervised Structure Discovery for Semantic Analysis of Audio
16 0.058685321 58 nips-2012-Bayesian models for Large-scale Hierarchical Classification
17 0.057330739 180 nips-2012-Learning Mixtures of Tree Graphical Models
18 0.054314438 346 nips-2012-Topology Constraints in Graphical Models
19 0.054091141 207 nips-2012-Mandatory Leaf Node Prediction in Hierarchical Multilabel Classification
20 0.053708769 6 nips-2012-A Convex Formulation for Learning Scale-Free Networks via Submodular Relaxation
simIndex simValue paperId paperTitle
2 0.64113384 215 nips-2012-Minimizing Uncertainty in Pipelines
Author: Nilesh Dalvi, Aditya Parameswaran, Vibhor Rastogi
Abstract: In this paper, we consider the problem of debugging large pipelines by human labeling. We represent the execution of a pipeline using a directed acyclic graph of AND and OR nodes, where each node represents a data item produced by some operator in the pipeline. We assume that each operator assigns a confidence to each of its output data. We want to reduce the uncertainty in the output by issuing queries to a human, where a query consists of checking if a given data item is correct. In this paper, we consider the problem of asking the optimal set of queries to minimize the resulting output uncertainty. We perform a detailed evaluation of the complexity of the problem for various classes of graphs. We give efficient algorithms for the problem for trees, and show that, for a general DAG, the problem is intractable. 1
3 0.59204543 336 nips-2012-The Coloured Noise Expansion and Parameter Estimation of Diffusion Processes
4 0.57074898 219 nips-2012-Modelling Reciprocating Relationships with Hawkes Processes
Author: Charles Blundell, Jeff Beck, Katherine A. Heller
Abstract: We present a Bayesian nonparametric model that discovers implicit social structure from interaction time-series data. Social groups are often formed implicitly, through actions among members of groups. Yet many models of social networks use explicitly declared relationships to infer social structure. We consider a particular class of Hawkes processes, a doubly stochastic point process, that is able to model reciprocity between groups of individuals. We then extend the Infinite Relational Model by using these reciprocating Hawkes processes to parameterise its edges, making events associated with edges co-dependent through time. Our model outperforms general, unstructured Hawkes processes as well as structured Poisson process-based models at predicting verbal and email turn-taking, and military conflicts among nations. 1
5 0.53033066 97 nips-2012-Diffusion Decision Making for Adaptive k-Nearest Neighbor Classification
6 0.50822318 53 nips-2012-Bayesian Pedigree Analysis using Measure Factorization
7 0.50393671 232 nips-2012-Multiplicative Forests for Continuous-Time Processes
8 0.5036239 298 nips-2012-Scalable Inference of Overlapping Communities
9 0.47194257 207 nips-2012-Mandatory Leaf Node Prediction in Hierarchical Multilabel Classification
10 0.4667519 205 nips-2012-MCMC for continuous-time discrete-state systems
11 0.43928167 339 nips-2012-The Time-Marginalized Coalescent Prior for Hierarchical Clustering
12 0.41755402 276 nips-2012-Probabilistic Event Cascades for Alzheimer's disease
13 0.40688381 327 nips-2012-Structured Learning of Gaussian Graphical Models
14 0.40660018 73 nips-2012-Coding efficiency and detectability of rate fluctuations with non-Poisson neuronal firing
15 0.39533415 140 nips-2012-Fusion with Diffusion for Robust Visual Tracking
16 0.38611621 58 nips-2012-Bayesian models for Large-scale Hierarchical Classification
17 0.38551462 118 nips-2012-Entangled Monte Carlo
18 0.37930024 39 nips-2012-Analog readout for optical reservoir computers
19 0.37357101 81 nips-2012-Context-Sensitive Decision Forests for Object Detection
20 0.36255997 10 nips-2012-A Linear Time Active Learning Algorithm for Link Classification
topicId topicWeight
[(0, 0.042), (21, 0.028), (38, 0.077), (39, 0.028), (42, 0.024), (54, 0.017), (55, 0.036), (74, 0.054), (76, 0.126), (80, 0.082), (87, 0.357), (92, 0.04)]
simIndex simValue paperId paperTitle
1 0.76179576 341 nips-2012-The topographic unsupervised learning of natural sounds in the auditory cortex
Author: Hiroki Terashima, Masato Okada
Abstract: The computational modelling of the primary auditory cortex (A1) has been less fruitful than that of the primary visual cortex (V1) due to the less organized properties of A1. Greater disorder has recently been demonstrated for the tonotopy of A1 that has traditionally been considered to be as ordered as the retinotopy of V1. This disorder appears to be incongruous, given the uniformity of the neocortex; however, we hypothesized that both A1 and V1 would adopt an efficient coding strategy and that the disorder in A1 reflects natural sound statistics. To provide a computational model of the tonotopic disorder in A1, we used a model that was originally proposed for the smooth V1 map. In contrast to natural images, natural sounds exhibit distant correlations, which were learned and reflected in the disordered map. The auditory model predicted harmonic relationships among neighbouring A1 cells; furthermore, the same mechanism used to model V1 complex cells reproduced nonlinear responses similar to the pitch selectivity. These results contribute to the understanding of the sensory cortices of different modalities in a novel and integrated manner.
same-paper 2 0.73259324 182 nips-2012-Learning Networks of Heterogeneous Influence
Author: Nan Du, Le Song, Ming Yuan, Alex J. Smola
Abstract: Information, disease, and influence diffuse over networks of entities in both natural systems and human society. Analyzing these transmission networks plays an important role in understanding the diffusion processes and predicting future events. However, the underlying transmission networks are often hidden and incomplete, and we observe only the time stamps at which cascades of events happen. In this paper, we address the challenging problem of uncovering the hidden network from the cascades alone. The structure discovery problem is complicated by the fact that the influence between networked entities is heterogeneous and cannot be described by a simple parametric model. Therefore, we propose a kernel-based method that can capture a diverse range of different types of influence without any prior assumption. On both synthetic and real cascade data, we show that our model can better recover the underlying diffusion network and drastically improve the estimation of the transmission functions among networked entities.
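The heterogeneous-influence point of this abstract can be illustrated with a toy nonparametric estimate: given observed activation-time differences between a pair of nodes across cascades, a kernel estimator recovers a transmission-delay density that a single parametric family (e.g. an exponential) could not. The function below is a hedged sketch of the general idea, not the paper's actual estimator; the Gaussian kernel and bandwidth are illustrative choices.

```python
import numpy as np

def kernel_transmission(delays, bandwidth=0.5):
    """Nonparametric estimate of a pairwise transmission-delay density
    from observed cascade time differences, via a Gaussian kernel.

    Toy illustration of replacing a fixed parametric transmission model
    with a kernel estimate; returns a callable density over delay t.
    """
    delays = np.asarray(delays, dtype=float)

    def density(t):
        t = np.atleast_1d(t).astype(float)
        z = (t[:, None] - delays[None, :]) / bandwidth
        k = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
        return k.mean(axis=1) / bandwidth

    return density

# Delays clustered near 1.0 and 4.0 yield a bimodal transmission
# function that a single exponential model would miss entirely.
f = kernel_transmission([0.9, 1.0, 1.1, 3.9, 4.0, 4.1])
```

The estimate places mass at both delay modes, which is exactly the kind of heterogeneity the paper argues parametric models cannot express.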
3 0.66445416 159 nips-2012-Image Denoising and Inpainting with Deep Neural Networks
Author: Junyuan Xie, Linli Xu, Enhong Chen
Abstract: We present a novel approach to low-level vision problems that combines sparse coding and deep networks pre-trained with a denoising auto-encoder (DA). We propose an alternative training scheme that successfully adapts the DA, originally designed for unsupervised feature learning, to the tasks of image denoising and blind inpainting. Our method’s performance on the image denoising task is comparable to that of KSVD, a widely used sparse coding technique. More importantly, on the blind image inpainting task, the proposed method provides solutions to some complex problems that have not been tackled before. Specifically, we can automatically remove complex patterns such as superimposed text from an image, rather than only simple patterns such as pixels missing at random. Moreover, the proposed method does not require the region to be inpainted to be specified a priori. Experimental results demonstrate the effectiveness of the proposed method on the tasks of image denoising and blind inpainting. We also show that our new training scheme for the DA is more effective and can improve the performance of unsupervised feature learning.
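The denoising auto-encoder training described above can be sketched in a few lines: corrupt the input by zero-masking, encode, decode with tied weights, and descend the reconstruction error against the clean input. This is a minimal illustrative implementation, not the paper's architecture or training schedule; the layer size, learning rate, and masking probability are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class DenoisingAutoencoder:
    """Minimal single-hidden-layer denoising auto-encoder sketch.

    Zero-masks the input, encodes, decodes with tied weights, and takes
    a gradient step on squared reconstruction error vs. the clean input.
    """
    def __init__(self, n_in, n_hid, lr=0.5, mask_p=0.3):
        self.W = rng.normal(0.0, 0.1, (n_in, n_hid))
        self.b = np.zeros(n_hid)   # hidden bias
        self.c = np.zeros(n_in)    # reconstruction bias
        self.lr, self.mask_p = lr, mask_p

    def step(self, x):
        x_tilde = x * (rng.random(x.shape) > self.mask_p)  # corrupt
        h = sigmoid(x_tilde @ self.W + self.b)             # encode
        y = sigmoid(h @ self.W.T + self.c)                 # decode (tied)
        g = (y - x) * y * (1 - y)                          # output delta
        dh = (g @ self.W) * h * (1 - h)                    # hidden delta
        self.W -= self.lr * (np.outer(g, h) + np.outer(x_tilde, dh))
        self.b -= self.lr * dh
        self.c -= self.lr * g
        return np.mean((y - x) ** 2)
```

Training on a handful of binary patterns drives the clean-reconstruction error down, which is the pre-training signal the abstract refers to.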
4 0.48642084 150 nips-2012-Hierarchical spike coding of sound
Author: Yan Karklin, Chaitanya Ekanadham, Eero P. Simoncelli
Abstract: Natural sounds exhibit complex statistical regularities at multiple scales. Acoustic events underlying speech, for example, are characterized by precise temporal and frequency relationships, but they can also vary substantially according to the pitch, duration, and other high-level properties of speech production. Learning this structure from data while capturing the inherent variability is an important first step in building auditory processing systems, as well as understanding the mechanisms of auditory perception. Here we develop Hierarchical Spike Coding, a two-layer probabilistic generative model for complex acoustic structure. The first layer consists of a sparse spiking representation that encodes the sound using kernels positioned precisely in time and frequency. Patterns in the positions of first layer spikes are learned from the data: on a coarse scale, statistical regularities are encoded by a second-layer spiking representation, while fine-scale structure is captured by recurrent interactions within the first layer. When fit to speech data, the second layer acoustic features include harmonic stacks, sweeps, frequency modulations, and precise temporal onsets, which can be composed to represent complex acoustic events. Unlike spectrogram-based methods, the model gives a probability distribution over sound pressure waveforms. This allows us to use the second-layer representation to synthesize sounds directly, and to perform model-based denoising, on which we demonstrate a significant improvement over standard methods.
5 0.4654358 197 nips-2012-Learning with Recursive Perceptual Representations
Author: Oriol Vinyals, Yangqing Jia, Li Deng, Trevor Darrell
Abstract: Linear Support Vector Machines (SVMs) have become very popular in vision as part of state-of-the-art object recognition and other classification tasks, but they require high-dimensional feature spaces for good performance. Deep learning methods can find more compact representations, but current methods employ multilayer perceptrons that require solving a difficult, non-convex optimization problem. We propose a deep non-linear classifier whose layers are SVMs and which incorporates random projection as its core stacking element. Our method learns layers of linear SVMs, recursively transforming the original data manifold through a random projection of the weak prediction computed from each layer. Our method scales as linear SVMs do, does not rely on any kernel computations or non-convex optimization, and exhibits better generalization ability than kernel-based SVMs. This is especially true when the number of training samples is smaller than the dimensionality of the data, a common scenario in many real-world applications. The use of random projections is key to our method: as we show in the experiments section, we observe a consistent improvement over previous, often more complicated, methods on several vision and speech benchmarks.
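The recursive stacking scheme can be sketched as follows, with plain logistic regression standing in for the linear SVM layers (an assumption made for self-containedness): each layer is trained on the current representation, its score is passed through a fixed random projection, and the result is folded back into the original input for the next layer. The recipe below is illustrative of the idea, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_logreg(X, y, lr=0.1, epochs=200):
    """Plain bias-free logistic regression by gradient descent
    (an illustrative stand-in for the paper's linear SVM layers)."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def fit_stack(X, y, n_layers=3, proj_scale=1.0):
    """Recursive stacking sketch: each layer's score is randomly
    projected and folded back into the original input for the next."""
    layers, Z = [], X
    for _ in range(n_layers):
        w = train_logreg(Z, y)
        R = rng.normal(0.0, proj_scale, (1, X.shape[1]))  # fixed projection
        layers.append((w, R))
        Z = X + (Z @ w)[:, None] @ R                      # next-layer input
    return layers

def predict_stack(layers, X):
    """Replay the same layer-by-layer transformation at test time."""
    Z = X
    for w, R in layers:
        score = Z @ w
        Z = X + score[:, None] @ R
    return (score > 0).astype(int)
```

Because the projections are fixed and each layer is a convex linear fit, the whole stack trains without any non-convex optimization, mirroring the property the abstract emphasizes.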
6 0.46334559 188 nips-2012-Learning from Distributions via Support Measure Machines
7 0.46131143 168 nips-2012-Kernel Latent SVM for Visual Recognition
8 0.45997146 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs
9 0.45991173 354 nips-2012-Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes
10 0.4595077 210 nips-2012-Memorability of Image Regions
11 0.45935538 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines
12 0.45917499 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models
13 0.45801041 193 nips-2012-Learning to Align from Scratch
14 0.45706335 274 nips-2012-Priors for Diversity in Generative Latent Variable Models
15 0.45705885 232 nips-2012-Multiplicative Forests for Continuous-Time Processes
16 0.45698881 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models
17 0.45698199 279 nips-2012-Projection Retrieval for Classification
18 0.4566581 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model
19 0.45619947 144 nips-2012-Gradient-based kernel method for feature extraction and variable selection
20 0.45611489 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video