nips nips2010 nips2010-40 knowledge-graph by maker-knowledge-mining

40 nips-2010-Beyond Actions: Discriminative Models for Contextual Group Activities


Source: pdf

Author: Tian Lan, Yang Wang, Weilong Yang, Greg Mori

Abstract: We propose a discriminative model for recognizing group activities. Our model jointly captures the group activity, the individual person actions, and the interactions among them. Two new types of contextual information, group-person interaction and person-person interaction, are explored in a latent variable framework. Different from most of the previous latent structured models which assume a predefined structure for the hidden layer, e.g. a tree structure, we treat the structure of the hidden layer as a latent variable and implicitly infer it during learning and inference. Our experimental results demonstrate that by inferring this contextual information together with adaptive structures, the proposed model can significantly improve activity recognition performance.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract We propose a discriminative model for recognizing group activities. [sent-6, score-0.308]

2 Our model jointly captures the group activity, the individual person actions, and the interactions among them. [sent-7, score-0.497]

3 Two new types of contextual information, group-person interaction and person-person interaction, are explored in a latent variable framework. [sent-8, score-0.385]

4 Our experimental results demonstrate that by inferring this contextual information together with adaptive structures, the proposed model can significantly improve activity recognition performance. [sent-12, score-0.544]

5 1 Introduction Look at the two persons in Fig. 1. [sent-13, score-0.266]

6 If we look at Fig. 1(b) and observe the interaction of each person with the other persons in the group, it is immediately clear that the first person is queuing, while the second person is talking. [sent-16, score-1.005]

7 In this paper, we argue that actions of individual humans often cannot be inferred alone. [sent-17, score-0.26]

8 We instead focus on developing methods for recognizing group activities by modeling the collective behaviors of individuals in the group. [sent-18, score-0.438]

9 We use activity to refer to a more complex scenario that involves a group of people. [sent-21, score-0.486]

10 In Fig. 1(b), each frame describes a group activity (queuing or talking), while each person in a frame performs a lower-level action: talking and facing right, talking and facing left, etc. [sent-23, score-1.168]

11 Our proposed approach is based on exploiting two types of contextual information in group activities. [sent-24, score-0.291]

12 First, the activity of a group and the collective actions of all the individuals serve as context (we call it the group-person interaction) for each other, hence should be modeled jointly in a unified framework. [sent-25, score-0.869]

13 As shown in Fig. 1, knowing the group activity (queuing or talking) helps disambiguate individual human actions which are otherwise hard to recognize. [sent-27, score-0.8]

14 Similarly, knowing most of the persons in the scene are talking (whether facing right or left) allows us to infer the overall group activity (i.e. talking). [sent-28, score-1.119]

15 Second, the action of an individual can also benefit from knowing the actions of other surrounding persons (which we call the person-person interaction). [sent-31, score-0.765]

16 The fact that the first two persons are facing the same direction provides a strong cue that they are queuing. Figure 1 (a)-(c): Role of context in group activities. [sent-34, score-0.588]

17 It is often hard to distinguish actions from each individual person alone (a). [sent-35, score-0.419]

18 However, if we look at the whole scene (b), we can easily recognize the activity of the group and the action of each individual. [sent-36, score-0.687]

19 In this paper, we operationalize this intuition and introduce a model for recognizing group activities by jointly considering the group activity, the action of each individual, and the interaction among certain pairs of individual actions (c). [sent-37, score-1.096]

20 Similarly, the fact that the last two persons are facing each other indicates they are more likely to be talking. [sent-39, score-0.386]

21 For example, work has been done on exploiting contextual information between scenes and objects [13], objects and objects [5, 16], objects and so-called “stuff” (amorphous spatial extent). [sent-42, score-0.332]

22 Most of the previous work in human action recognition focuses on recognizing actions performed by a single person in a video. [sent-45, score-0.797]

23 In this setting, there has been work on exploiting contexts provided by scenes [12] or objects [10] to help action recognition. [sent-48, score-0.252]

24 In still image action recognition, object-action context [6, 9, 23, 24] is a popular type of context used for human-object interaction. [sent-49, score-0.353]

25 In that work, person-person context is exploited by a new feature descriptor extracted from a person and its surrounding area. [sent-51, score-0.354]

26 Our model is directly inspired by some recent work on learning discriminative models that allow the use of latent variables [1, 6, 15, 19, 25], particularly when the latent variables have complex structures. [sent-52, score-0.242]

27 Such latent structured models have been applied to object detection [8, 18], action recognition [14, 19], human-object interaction [6], objects and attributes [21], human poses and actions [22], image region and tag correspondence [20], etc. [sent-55, score-0.882]

28 Our contributions: In this paper, we develop a discriminative model for recognizing group activities. [sent-60, score-0.308]

29 (1) Group activity: most of the work in human activity understanding focuses on single-person action recognition. [sent-62, score-0.568]

30 Instead, we present a model for group activities that dynamically decides on interactions among group members. [sent-63, score-0.459]

31 (2) Group-person and person-person interaction: although contextual information has been exploited for visual recognition problems, ours introduces two new types of contextual information that have not been explored before. [sent-64, score-0.339]

32 If we naively consider the interaction between every pair of persons, the model might try to force two persons to take certain pairs of labels even though these two persons have nothing to do with each other. [sent-66, score-0.741]

33 To this end, we propose to use adaptive structures that automatically decide whether the interaction of two persons should be considered. [sent-69, score-0.617]

34 2 Contextual Representation of Group Activities Our goal is to learn a model that jointly captures the group activity, the individual person actions, and the interactions among them. [sent-71, score-0.497]

35 Group-person interaction represents the co-occurrence between the activity of a group and the actions of all the individuals. [sent-76, score-0.838]

36 Person-person interaction indicates that the action of an individual can benefit from knowing the actions of other people in the same scene. [sent-77, score-0.637]

37 One important difference between our model and previous work is that in addition to learning the parameters in the graphical model, we also automatically infer the graph structures. [sent-79, score-0.319]

38 We assume some pre-processing has been done (e.g. by running a person detector) so the persons in the image have been found. [sent-83, score-0.531]

39 On the training data, each image is associated with a group activity label, and each person in the image is associated with an action label. [sent-84, score-1.026]

40 Let I1 , . . . , Im be the set of persons found in the image I. We extract features x from the image I in the form of x = (x0 , x1 , . . . , xm ). [sent-92, score-0.414]

41 Here x0 is the aggregation of the feature descriptors of all the persons in the image (we call it the root feature vector), and xi (i = 1, 2, . . . , m) is the feature vector extracted from the person Ii . [sent-95, score-0.451]

42 We denote the collective actions of all the persons in the image as h = (h1 , h2 , . . . , hm ). [sent-99, score-0.605]

43 Here hi ∈ H is the action label of the person Ii and H is the set of all possible action labels. [sent-102, score-0.708]

44 The image I is associated with a group activity label y ∈ Y, where Y is the set of all possible activity labels. [sent-103, score-0.968]

45 We assume there are connections between some pairs of action labels (hj , hk ). [sent-104, score-0.362]

46 We define a graph G = (V, E) over h = (h1 , h2 , . . . , hm ), where a vertex vi ∈ V corresponds to the action label hi , and an edge (vj , vk ) ∈ E corresponds to the interactions between hj and hk . [sent-109, score-0.723]

47 We use fw (x, h, y; G) to denote the compatibility of the image feature x, the collective action labels h, the group activity label y, and the graph G = (V, E). [sent-110, score-1.429]

48 The potential functions in Eq. 1 are described in the following. Image-Action Potential w1 φ1 (xj , hj ): This potential function models the compatibility between the j-th person’s action label hj and its image feature xj . [sent-113, score-1.029]

49 It is parameterized as: w1 φ1 (xj , hj ) = Σ_{b∈H} w1b 1(hj = b) · xj (2), where xj is the feature vector extracted from the j-th person and we use 1(·) to denote the indicator function. [sent-114, score-0.548]

50 Action-Activity Potential w2 φ2 (y, hj ): This potential function models the compatibility between the group activity label y and the j-th person’s action label hj . [sent-116, score-1.464]

51 It is parameterized as: w0 φ0 (y, x0 ) = Σ_{a∈Y} w0a 1(y = a) · x0 (5). The parameter w0a can be interpreted as a root filter that measures the compatibility of the class label a and the root feature vector x0 . [sent-119, score-0.346]
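Because of the indicator functions, both unary potentials (Eqs. 2 and 5) reduce to selecting a label-specific weight vector and dotting it with the corresponding feature vector. A minimal numpy sketch of this idea — the matrix layout and function names are assumptions for illustration, not the authors' code:

```python
import numpy as np

def image_action_potential(W1, x_j, h_j):
    """w1 phi1(x_j, h_j), Eq. 2: the weight vector for the assigned action
    label h_j dotted with the j-th person's feature vector x_j.
    W1 is assumed to be a (num_action_labels, feature_dim) matrix."""
    return float(W1[h_j] @ x_j)

def root_potential(W0, x0, y):
    """w0 phi0(y, x0), Eq. 5: the class-specific root filter for activity
    label y applied to the aggregated root feature vector x0."""
    return float(W0[y] @ x0)

# toy example: 3 action labels, 2 activity labels, 4-d features
rng = np.random.default_rng(0)
W1 = rng.standard_normal((3, 4))
W0 = rng.standard_normal((2, 4))
x_j = rng.standard_normal(4)
score = image_action_potential(W1, x_j, h_j=1)
```

The summation over b ∈ H in Eq. 2 collapses to a single row lookup because exactly one indicator 1(hj = b) is non-zero.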

52 If the graph structure G is known and fixed, we can apply standard learning and inference techniques of latent SVMs. [sent-125, score-0.295]

53 For our application, a good graph structure turns out to be crucial, since it determines which persons interact (i.e. which edges are included in the graph). [sent-126, score-0.364]

54 The interaction of individuals turns out to be important for group activity recognition, and fixing the interaction (i.e. using a predefined structure) tends to be too restrictive. [sent-129, score-0.868]

55 We instead develop our own inference and learning algorithms that automatically infer the best graph structure from a particular set. [sent-134, score-0.278]

56 1 Inference Given the model parameters w, the inference problem is to find the best group activity label y ∗ for a new image x. [sent-136, score-0.676]

57 The group activity label of the image x can be inferred as: y ∗ = arg max_y Fw (x, y). [sent-138, score-0.677]

58 Since we can enumerate all the possible y ∈ Y and predict the activity label y ∗ of x, the main difficulty of solving the inference problem is the maximization over Gy and hy . [sent-139, score-0.783]

59 Holding the graph structure Gy fixed, optimize the action labels hy for the (x, y) pair: hy = arg max_h′ w Ψ(x, h′ , y; Gy ) (7) [sent-148, score-1.058]

60 Then, holding hy fixed, optimize the graph structure Gy for the (x, y) pair: Gy = arg max_G′ w Ψ(x, hy , y; G′ ) (8) [sent-149, score-0.814]
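The alternation between Eqs. 7 and 8 is a coordinate-ascent loop. A hedged sketch, where `best_labels` and `best_graph` stand in for the two inner maximizations (solved by belief propagation and the ILP, respectively, in the paper) and `score` stands in for w Ψ — all names here are hypothetical:

```python
def coordinate_ascent(x, y, score, best_labels, best_graph,
                      init_h, init_G, max_iters=20):
    """Alternate between Eq. 7 (optimize action labels h with the graph G
    fixed) and Eq. 8 (optimize the graph G with h fixed). Each step can only
    increase the score, so the loop stops when it no longer improves."""
    h, G = init_h, init_G
    prev = float("-inf")
    for _ in range(max_iters):
        h = best_labels(x, y, G)   # Eq. 7: h = argmax_h' w.Psi(x, h', y; G)
        G = best_graph(x, y, h)    # Eq. 8: G = argmax_G' w.Psi(x, h, y; G')
        cur = score(x, h, y, G)
        if cur <= prev + 1e-9:     # converged
            break
        prev = cur
    return h, G, prev
```

Running this loop once per candidate activity label y and taking the highest-scoring y gives the predicted label y ∗.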

61 Even if we can enumerate all the graph structures, we might want to restrict ourselves to a subset of graph structures that will lead to efficient inference. [sent-154, score-0.451]

62 Another option is to use graph structures that are “sparse”, since sparse graphs tend to have fewer cycles, and loopy BP tends to be efficient on graphs with fewer cycles. [sent-163, score-0.29]

63 When hy is fixed, we can formulate an integer linear program (ILP) to find the optimal graph structure (Eq. 9). [sent-165, score-0.474]

64 The ILP can be written as: max_z Σ_{j∈V} Σ_{k∈V} zjk ψjk , s.t. Σ_{j∈V} zjk ≤ d, Σ_{k∈V} zjk ≤ d, zjk = zkj , zjk ∈ {0, 1}, ∀j, k (9) [sent-168, score-0.405]

65 Here we use ψjk to collectively represent the summation of all the pairwise potential functions. [sent-170, score-0.593]
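Eq. 9 selects a symmetric 0/1 edge matrix that maximizes the total pairwise score while keeping each vertex's degree at most d. A full ILP solver is not needed for a sketch; the following greedy approximation (an assumption for illustration — it is not the paper's exact solver) picks the highest-scoring edges that respect the degree budget:

```python
def greedy_degree_constrained_edges(psi, d):
    """Greedy approximation to Eq. 9: consider candidate edges (j, k) in
    decreasing order of psi[j][k] + psi[k][j], adding an edge only if both
    endpoints still have degree < d and the edge score is positive."""
    m = len(psi)
    candidates = sorted(((psi[j][k] + psi[k][j], j, k)
                         for j in range(m) for k in range(j + 1, m)),
                        reverse=True)
    degree = [0] * m
    edges = set()
    for w, j, k in candidates:
        if w <= 0:                 # remaining edges cannot improve the score
            break
        if degree[j] < d and degree[k] < d:
            edges.add((j, k))      # symmetric: (j, k) stands for zjk = zkj = 1
            degree[j] += 1
            degree[k] += 1
    return edges
```

The symmetry constraint zjk = zkj is enforced by treating each unordered pair once, and the degree constraints correspond to the row/column sums being at most d.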

66 Given a set of N training examples (xn , hn , y n ) (n = 1, . . . , N ), we would like to train the model parameter w that tends to produce the correct group activity y for a new test image x. [sent-181, score-0.56]

67 Note that the action labels h are observed on training data, but the graph structure G (or equivalently the variables z) are unobserved and will be automatically inferred. [sent-182, score-0.45]

68 max_{Gyn} fw (xn , hn , y n ; Gyn ) − max_{Gy} max_{hy} fw (xn , hy , y; Gy ) ≥ ∆(y, y n ) − ξn , ∀n, ∀y (10b), where ∆(y, y n ) is a loss function measuring the cost incurred by predicting y when the ground-truth label is y n . [sent-185, score-1.264]

69 Eq. 10 can be equivalently written as an unconstrained problem: min_w (1/2) ||w||² + C Σ_{n=1}^{N} (Ln − Rn ) (12a), where Ln = max_y max_{hy} max_{Gy} (∆(y, y n ) + fw (xn , hy , y; Gy )) and Rn = max_{Gyn} fw (xn , hn , y n ; Gyn ) (12b). We use the non-convex bundle optimization in [7] to solve Eq. 12. [sent-187, score-1.179]
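Under brute-force assumptions (small label sets and an explicitly enumerable candidate-graph set — the paper instead uses loss-augmented inference with BP and the ILP), the two terms of Eq. 12b can be sketched as:

```python
from itertools import product

def loss_augmented_term(x, y_n, score, loss, labels_Y, labels_H, m, graphs):
    """Ln = max_y max_h max_G ( Delta(y, y_n) + f_w(x, h, y; G) ), Eq. 12b.
    `score` stands in for f_w; `loss` for Delta; both names are hypothetical."""
    return max(loss(y, y_n) + score(x, h, y, G)
               for y in labels_Y
               for h in product(labels_H, repeat=m)
               for G in graphs)

def groundtruth_term(x, h_n, y_n, score, graphs):
    """Rn = max_G f_w(x, h_n, y_n; G), Eq. 12b: only the graph is latent on
    training data, since the action labels h_n are observed."""
    return max(score(x, h_n, y_n, G) for G in graphs)
```

The objective (1/2)||w||² + C Σn (Ln − Rn) is a difference of two convex (pointwise-max) functions of w, which is why a non-convex solver such as the bundle method in [7] is needed.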

70 Let (y ∗ , h∗ , G ∗ ) be the solution to the following optimization problem: max_y max_h max_G ∆(y, y n ) + fw (xn , h, y; G) (13). Figure 3 (a)-(d): Different structures of person-person interaction. [sent-193, score-0.426]

71 Now we describe how to compute ∂w Rn . Let Ĝ be the solution to the following optimization problem: max_G′ fw (xn , hn , y n ; G′ ) (14). Then we can show that the subgradient ∂w Rn can be calculated as ∂w Rn = Ψ(xn , y n , hn ; Ĝ). [sent-207, score-0.371]
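Combining the two subgradients gives ∂w of Eq. 12 as w + C Σn (Ψ(xn, h∗, y∗; G∗) − Ψ(xn, hn, yn; Ĝ)). A hedged sketch of one update — plain subgradient descent here, as an assumption, whereas the paper uses the non-convex bundle method of [7]; `psi`, `loss_augmented_argmax`, and `best_graph` are hypothetical stand-ins for Ψ and the solutions of Eqs. 13 and 14:

```python
import numpy as np

def subgradient_step(w, examples, psi, loss_augmented_argmax, best_graph, C, lr):
    """One subgradient step on Eq. 12. psi(x, h, y, G) returns the joint
    feature vector Psi; examples holds (x_n, h_n, y_n) triples."""
    grad = w.copy()                       # gradient of the (1/2)||w||^2 term
    for x_n, h_n, y_n in examples:
        y_s, h_s, G_s = loss_augmented_argmax(w, x_n, y_n)  # Eq. 13 solution
        G_hat = best_graph(w, x_n, h_n, y_n)                # Eq. 14 solution
        grad += C * (psi(x_n, h_s, y_s, G_s) - psi(x_n, h_n, y_n, G_hat))
    return w - lr * grad
```

Both inner argmax problems are themselves solved with the alternating inference of Eqs. 7 and 8, so each outer step is only approximate.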

72 4 Experiments We demonstrate our model on the collective activity dataset introduced in [3]. [sent-213, score-0.402]

73 In the original dataset, all the persons in every tenth frame of the videos are assigned one of the following five categories: crossing, waiting, queuing, walking and talking, and one of the following eight pose categories: right, front-right, front, front-left, left, back-left, back and back-right. [sent-215, score-0.41]

74 Based on the original dataset, we define five activity categories including crossing, waiting, queuing, walking and talking. [sent-216, score-0.419]

75 We define forty action labels by combining the pose and activity information. [sent-217, score-0.613]

76 That is, the action labels include crossing and facing right, crossing and facing front-right, etc. [sent-219, score-0.844]

77 We assign each frame to one of the five activity categories by taking the majority action of the persons (ignoring their pose categories) in that frame. [sent-220, score-0.861]
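The frame-level activity label is a majority vote over the per-person action labels with the pose component ignored. A small sketch, assuming action labels are represented as (activity, pose) tuples (a representation choice of this sketch, not specified by the paper):

```python
from collections import Counter

def frame_activity(person_actions):
    """Assign the frame the activity performed by the majority of persons,
    ignoring the pose component of each (activity, pose) action label."""
    activities = [activity for activity, _pose in person_actions]
    return Counter(activities).most_common(1)[0][0]

frame = [("crossing", "right"), ("crossing", "front-right"), ("walking", "left")]
label = frame_activity(frame)  # majority activity of this frame
```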

78 We select one fourth of the video clips from each activity category to form the test set, and the rest of the video clips are used for training. [sent-221, score-0.499]

79 Using a standard feature descriptor (e.g. the HOG descriptor [4]) as the feature vector xi in our framework, we train a 40-class SVM classifier based on the HOG descriptor of each individual and their associated action labels. [sent-224, score-0.345]

80 The performance of different structures of person-person interaction is compared in Table 1. Figure 4: Confusion matrices for activity classification: (a) global bag-of-words (b) our approach. [sent-237, score-0.467]

81 Table 1: Comparison of activity classification accuracies of different methods. [sent-254, score-0.355]

82 Since the number of crossing examples is more than twice that of the queuing or talking examples, we report both overall and mean per-class accuracies. [sent-263, score-0.51]

83 Importance of adaptive structures of person-person interaction: In Table 1, the pre-defined structures such as the minimum spanning tree and the ε-neighborhood graph do not perform as well as the one without person-person interaction. [sent-272, score-0.501]

84 However, if we consider the graph structure as part of our model and directly infer it using our learning algorithm, we can make sure that the obtained structures are those useful for differentiating various activities. [sent-275, score-0.327]

85 5 Conclusion We have presented a discriminative model for group activity recognition which jointly captures the group activity, the individual person actions, and the interactions among them. [sent-279, score-1.097]

86 We have exploited two new types of contextual information: group-person interaction and person-person interaction. [sent-280, score-0.323]

87 We also introduce an adaptive structures algorithm that automatically infers the optimal structure of person-person interaction in a latent SVM framework. [sent-281, score-0.483]

88 Figure 5 (a)-(f): Visualization of the weights across pairs of action classes for each of the five activity classes. [sent-283, score-0.524]

89 Consider example (a): under the activity label crossing, the model favors seeing actions of crossing with different poses together (indicated by the area bounded by the red box). [sent-285, score-0.851]

90 Within the crossing category, we can see that the model favors seeing the same pose together, indicated by the light regions along the diagonal. [sent-287, score-0.261]

91 The labels C, S, Q, W, T indicate crossing, waiting, queuing, walking and talking respectively. [sent-294, score-0.265]

92 The yellow lines represent the learned structure of person-person interaction, from which some important interactions for each activity can be obtained. [sent-296, score-0.465]

93 For example, a chain structure which connects persons facing the same direction is “important” for the queuing activity. [sent-298, score-0.593]

94 : Collective activity classification using spatio-temporal relationship among people. [sent-318, score-0.323]

95 Modeling temporal structure of decomposable motion segments for activity classification. [sent-395, score-0.364]

96 Max-margin hidden conditional random fields for human action recognition. [sent-429, score-0.245]

97 A discriminative latent model of image region and object tag correspondence. [sent-440, score-0.289]

98 Recognizing human actions from still images with latent poses. [sent-451, score-0.321]

99 Grouplet: a structured image representation for recognizing human and object interactions. [sent-456, score-0.24]

100 Modeling mutual context of object and human pose in human-object interaction activities. [sent-461, score-0.332]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('activity', 0.323), ('hy', 0.301), ('persons', 0.266), ('hj', 0.231), ('action', 0.201), ('fw', 0.196), ('person', 0.191), ('actions', 0.186), ('gy', 0.185), ('zjk', 0.183), ('crossing', 0.18), ('queuing', 0.166), ('interaction', 0.166), ('talking', 0.164), ('group', 0.163), ('graph', 0.132), ('contextual', 0.128), ('facing', 0.12), ('structures', 0.113), ('compatibility', 0.101), ('gyn', 0.091), ('latent', 0.091), ('label', 0.085), ('recognizing', 0.085), ('collective', 0.079), ('hk', 0.074), ('image', 0.074), ('waiting', 0.074), ('interactions', 0.072), ('hn', 0.068), ('vision', 0.064), ('spanning', 0.063), ('activities', 0.061), ('discriminative', 0.06), ('walking', 0.058), ('ln', 0.056), ('fraser', 0.055), ('recognition', 0.054), ('clips', 0.052), ('objects', 0.051), ('individuals', 0.05), ('root', 0.047), ('svm', 0.046), ('pose', 0.046), ('wang', 0.046), ('loopy', 0.045), ('potential', 0.044), ('human', 0.044), ('xn', 0.044), ('connections', 0.044), ('enumerate', 0.043), ('labels', 0.043), ('poses', 0.042), ('knowing', 0.042), ('individual', 0.042), ('tree', 0.041), ('infer', 0.041), ('structure', 0.041), ('simon', 0.041), ('laptev', 0.04), ('stuff', 0.04), ('frame', 0.04), ('adaptive', 0.039), ('context', 0.039), ('max', 0.039), ('bp', 0.039), ('categories', 0.038), ('object', 0.037), ('ilp', 0.037), ('rn', 0.037), ('video', 0.036), ('descriptor', 0.035), ('favors', 0.035), ('desai', 0.035), ('parameterized', 0.034), ('automatically', 0.033), ('mori', 0.033), ('accuracies', 0.032), ('feature', 0.032), ('inferred', 0.032), ('connection', 0.031), ('yao', 0.031), ('inference', 0.031), ('global', 0.031), ('vertex', 0.03), ('hm', 0.03), ('style', 0.03), ('xj', 0.03), ('pattern', 0.029), ('jointly', 0.029), ('ramanan', 0.029), ('lines', 0.029), ('exploited', 0.029), ('vedaldi', 0.028), ('surrounding', 0.028), ('yang', 0.028), ('tag', 0.027), ('subgradients', 0.027), ('hog', 0.027), ('confusion', 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999964 40 nips-2010-Beyond Actions: Discriminative Models for Contextual Group Activities

Author: Tian Lan, Yang Wang, Weilong Yang, Greg Mori

Abstract: We propose a discriminative model for recognizing group activities. Our model jointly captures the group activity, the individual person actions, and the interactions among them. Two new types of contextual information, group-person interaction and person-person interaction, are explored in a latent variable framework. Different from most of the previous latent structured models which assume a predefined structure for the hidden layer, e.g. a tree structure, we treat the structure of the hidden layer as a latent variable and implicitly infer it during learning and inference. Our experimental results demonstrate that by inferring this contextual information together with adaptive structures, the proposed model can significantly improve activity recognition performance. 1

2 0.17958789 281 nips-2010-Using body-anchored priors for identifying actions in single images

Author: Leonid Karlinsky, Michael Dinerstein, Shimon Ullman

Abstract: This paper presents an approach to the visual recognition of human actions using only single images as input. The task is easy for humans but difficult for current approaches to object recognition, because instances of different actions may be similar in terms of body pose, and often require detailed examination of relations between participating objects and body parts in order to be recognized. The proposed approach applies a two-stage interpretation procedure to each training and test image. The first stage produces accurate detection of the relevant body parts of the actor, forming a prior for the local evidence needed to be considered for identifying the action. The second stage extracts features that are anchored to the detected body parts, and uses these features and their feature-to-part relations in order to recognize the action. The body anchored priors we propose apply to a large range of human actions. These priors allow focusing on the relevant regions and relations, thereby significantly simplifying the learning process and increasing recognition performance. 1

3 0.12856384 6 nips-2010-A Discriminative Latent Model of Image Region and Object Tag Correspondence

Author: Yang Wang, Greg Mori

Abstract: We propose a discriminative latent model for annotating images with unaligned object-level textual annotations. Instead of using the bag-of-words image representation currently popular in the computer vision community, our model explicitly captures more intricate relationships underlying visual and textual information. In particular, we model the mapping that translates image regions to annotations. This mapping allows us to relate image regions to their corresponding annotation terms. We also model the overall scene label as latent information. This allows us to cluster test images. Our training data consist of images and their associated annotations. But we do not have access to the ground-truth regionto-annotation mapping or the overall scene label. We develop a novel variant of the latent SVM framework to model them as latent variables. Our experimental results demonstrate the effectiveness of the proposed model compared with other baseline methods.

4 0.11944751 28 nips-2010-An Alternative to Low-level-Sychrony-Based Methods for Speech Detection

Author: Javier R. Movellan, Paul L. Ruvolo

Abstract: Determining whether someone is talking has applications in many areas such as speech recognition, speaker diarization, social robotics, facial expression recognition, and human computer interaction. One popular approach to this problem is audio-visual synchrony detection [10, 21, 12]. A candidate speaker is deemed to be talking if the visual signal around that speaker correlates with the auditory signal. Here we show that with the proper visual features (in this case movements of various facial muscle groups), a very accurate detector of speech can be created that does not use the audio signal at all. Further we show that this person independent visual-only detector can be used to train very accurate audio-based person dependent voice models. The voice model has the advantage of being able to identify when a particular person is speaking even when they are not visible to the camera (e.g. in the case of a mobile robot). Moreover, we show that a simple sensory fusion scheme between the auditory and visual models improves performance on the task of talking detection. The work here provides dramatic evidence about the efficacy of two very different approaches to multimodal speech detection on a challenging database. 1

5 0.10901958 127 nips-2010-Inferring Stimulus Selectivity from the Spatial Structure of Neural Network Dynamics

Author: Kanaka Rajan, L Abbott, Haim Sompolinsky

Abstract: How are the spatial patterns of spontaneous and evoked population responses related? We study the impact of connectivity on the spatial pattern of fluctuations in the input-generated response, by comparing the distribution of evoked and intrinsically generated activity across the different units of a neural network. We develop a complementary approach to principal component analysis in which separate high-variance directions are derived for each input condition. We analyze subspace angles to compute the difference between the shapes of trajectories corresponding to different network states, and the orientation of the low-dimensional subspaces that driven trajectories occupy within the full space of neuronal activity. In addition to revealing how the spatiotemporal structure of spontaneous activity affects input-evoked responses, these methods can be used to infer input selectivity induced by network dynamics from experimentally accessible measures of spontaneous activity (e.g. from voltage- or calcium-sensitive optical imaging experiments). We conclude that the absence of a detailed spatial map of afferent inputs and cortical connectivity does not limit our ability to design spatially extended stimuli that evoke strong responses.
6 0.098992713 135 nips-2010-Label Embedding Trees for Large Multi-Class Tasks

7 0.095718667 200 nips-2010-Over-complete representations on recurrent neural networks can support persistent percepts

8 0.093225919 89 nips-2010-Factorized Latent Spaces with Structured Sparsity

9 0.0863019 241 nips-2010-Size Matters: Metric Visual Search Constraints from Monocular Metadata

10 0.084764145 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision

11 0.084564246 152 nips-2010-Learning from Logged Implicit Exploration Data

12 0.083090909 186 nips-2010-Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification

13 0.079907648 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models

14 0.078290731 44 nips-2010-Brain covariance selection: better individual functional connectivity models using population prior

15 0.076130167 244 nips-2010-Sodium entry efficiency during action potentials: A novel single-parameter family of Hodgkin-Huxley models

16 0.07572981 235 nips-2010-Self-Paced Learning for Latent Variable Models

17 0.075000279 88 nips-2010-Extensions of Generalized Binary Search to Group Identification and Exponential Costs

18 0.074227601 239 nips-2010-Sidestepping Intractable Inference with Structured Ensemble Cascades

19 0.06860017 192 nips-2010-Online Classification with Specificity Constraints

20 0.067947671 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.225), (1, 0.047), (2, -0.127), (3, -0.062), (4, -0.031), (5, -0.05), (6, -0.085), (7, -0.006), (8, 0.024), (9, 0.064), (10, -0.056), (11, 0.093), (12, -0.042), (13, 0.01), (14, 0.077), (15, -0.082), (16, -0.023), (17, -0.047), (18, 0.013), (19, 0.02), (20, -0.021), (21, 0.013), (22, 0.015), (23, 0.017), (24, 0.075), (25, 0.0), (26, -0.047), (27, 0.041), (28, 0.031), (29, 0.012), (30, 0.154), (31, -0.068), (32, 0.104), (33, 0.088), (34, 0.06), (35, 0.075), (36, -0.113), (37, -0.071), (38, -0.081), (39, 0.083), (40, -0.039), (41, -0.143), (42, -0.012), (43, -0.009), (44, -0.108), (45, 0.022), (46, 0.089), (47, 0.117), (48, -0.02), (49, -0.091)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95558375 40 nips-2010-Beyond Actions: Discriminative Models for Contextual Group Activities

Author: Tian Lan, Yang Wang, Weilong Yang, Greg Mori

Abstract: We propose a discriminative model for recognizing group activities. Our model jointly captures the group activity, the individual person actions, and the interactions among them. Two new types of contextual information, group-person interaction and person-person interaction, are explored in a latent variable framework. Different from most of the previous latent structured models which assume a predefined structure for the hidden layer, e.g. a tree structure, we treat the structure of the hidden layer as a latent variable and implicitly infer it during learning and inference. Our experimental results demonstrate that by inferring this contextual information together with adaptive structures, the proposed model can significantly improve activity recognition performance. 1

2 0.74326682 281 nips-2010-Using body-anchored priors for identifying actions in single images

Author: Leonid Karlinsky, Michael Dinerstein, Shimon Ullman

Abstract: This paper presents an approach to the visual recognition of human actions using only single images as input. The task is easy for humans but difficult for current approaches to object recognition, because instances of different actions may be similar in terms of body pose, and often require detailed examination of relations between participating objects and body parts in order to be recognized. The proposed approach applies a two-stage interpretation procedure to each training and test image. The first stage produces accurate detection of the relevant body parts of the actor, forming a prior for the local evidence needed to be considered for identifying the action. The second stage extracts features that are anchored to the detected body parts, and uses these features and their feature-to-part relations in order to recognize the action. The body anchored priors we propose apply to a large range of human actions. These priors allow focusing on the relevant regions and relations, thereby significantly simplifying the learning process and increasing recognition performance. 1

3 0.58689553 244 nips-2010-Sodium entry efficiency during action potentials: A novel single-parameter family of Hodgkin-Huxley models

Author: Anand Singh, Renaud Jolivet, Pierre Magistretti, Bruno Weber

Abstract: Sodium entry during an action potential determines the energy efficiency of a neuron. The classic Hodgkin-Huxley model of action potential generation is notoriously inefficient in that regard with about 4 times more charges flowing through the membrane than the theoretical minimum required to achieve the observed depolarization. Yet, recent experimental results show that mammalian neurons are close to the optimal metabolic efficiency and that the dynamics of their voltage-gated channels is significantly different than the one exhibited by the classic Hodgkin-Huxley model during the action potential. Nevertheless, the original Hodgkin-Huxley model is still widely used and rarely to model the squid giant axon from which it was extracted. Here, we introduce a novel family of HodgkinHuxley models that correctly account for sodium entry, action potential width and whose voltage-gated channels display a dynamics very similar to the most recent experimental observations in mammalian neurons. We speak here about a family of models because the model is parameterized by a unique parameter the variations of which allow to reproduce the entire range of experimental observations from cortical pyramidal neurons to Purkinje cells, yielding a very economical framework to model a wide range of different central neurons. The present paper demonstrates the performances and discuss the properties of this new family of models. 1

4 0.5587908 28 nips-2010-An Alternative to Low-level-Sychrony-Based Methods for Speech Detection

Author: Javier R. Movellan, Paul L. Ruvolo

Abstract: Determining whether someone is talking has applications in many areas such as speech recognition, speaker diarization, social robotics, facial expression recognition, and human computer interaction. One popular approach to this problem is audio-visual synchrony detection [10, 21, 12]. A candidate speaker is deemed to be talking if the visual signal around that speaker correlates with the auditory signal. Here we show that with the proper visual features (in this case movements of various facial muscle groups), a very accurate detector of speech can be created that does not use the audio signal at all. Further we show that this person independent visual-only detector can be used to train very accurate audio-based person dependent voice models. The voice model has the advantage of being able to identify when a particular person is speaking even when they are not visible to the camera (e.g. in the case of a mobile robot). Moreover, we show that a simple sensory fusion scheme between the auditory and visual models improves performance on the task of talking detection. The work here provides dramatic evidence about the efficacy of two very different approaches to multimodal speech detection on a challenging database. 1

5 0.51770735 6 nips-2010-A Discriminative Latent Model of Image Region and Object Tag Correspondence

Author: Yang Wang, Greg Mori

Abstract: We propose a discriminative latent model for annotating images with unaligned object-level textual annotations. Instead of using the bag-of-words image representation currently popular in the computer vision community, our model explicitly captures more intricate relationships underlying visual and textual information. In particular, we model the mapping that translates image regions to annotations. This mapping allows us to relate image regions to their corresponding annotation terms. We also model the overall scene label as latent information. This allows us to cluster test images. Our training data consist of images and their associated annotations. But we do not have access to the ground-truth regionto-annotation mapping or the overall scene label. We develop a novel variant of the latent SVM framework to model them as latent variables. Our experimental results demonstrate the effectiveness of the proposed model compared with other baseline methods.

6 0.51468688 213 nips-2010-Predictive Subspace Learning for Multi-view Data: a Large Margin Approach

7 0.51146394 89 nips-2010-Factorized Latent Spaces with Structured Sparsity

8 0.4723601 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision

9 0.45645598 200 nips-2010-Over-complete representations on recurrent neural networks can support persistent percepts

10 0.44505802 171 nips-2010-Movement extraction by detecting dynamics switches and repetitions

11 0.43924296 39 nips-2010-Bayesian Action-Graph Games

12 0.43559775 241 nips-2010-Size Matters: Metric Visual Search Constraints from Monocular Metadata

13 0.42920908 153 nips-2010-Learning invariant features using the Transformed Indian Buffet Process

14 0.42864585 235 nips-2010-Self-Paced Learning for Latent Variable Models

15 0.42206693 111 nips-2010-Hallucinations in Charles Bonnet Syndrome Induced by Homeostasis: a Deep Boltzmann Machine Model

16 0.42030996 209 nips-2010-Pose-Sensitive Embedding by Nonlinear NCA Regression

17 0.40942442 168 nips-2010-Monte-Carlo Planning in Large POMDPs

18 0.40523654 262 nips-2010-Switched Latent Force Models for Movement Segmentation

19 0.40498826 88 nips-2010-Extensions of Generalized Binary Search to Group Identification and Exponential Costs

20 0.40484175 149 nips-2010-Learning To Count Objects in Images


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(13, 0.027), (27, 0.067), (30, 0.481), (35, 0.013), (45, 0.196), (50, 0.028), (60, 0.025), (77, 0.033), (78, 0.016), (90, 0.037)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.9896093 232 nips-2010-Sample Complexity of Testing the Manifold Hypothesis

Author: Hariharan Narayanan, Sanjoy Mitter

Abstract: The hypothesis that high dimensional data tends to lie in the vicinity of a low dimensional manifold is the basis of a collection of methodologies termed Manifold Learning. In this paper, we study statistical aspects of the question of fitting a manifold with a nearly optimal least squared error. Given upper bounds on the dimension, volume, and curvature, we show that Empirical Risk Minimization can produce a nearly optimal manifold using a number of random samples that is independent of the ambient dimension of the space in which data lie. We obtain an upper bound on the required number of samples that depends polynomially on the curvature, exponentially on the intrinsic dimension, and linearly on the intrinsic volume. For constant error, we prove a matching minimax lower bound on the sample complexity that shows that this dependence on intrinsic dimension, volume and curvature is unavoidable. Whether the known lower bound of $O(\frac{k}{\epsilon^2} + \frac{\log \frac{1}{\delta}}{\epsilon^2})$ for the sample complexity of Empirical Risk Minimization on $k$-means applied to data in a unit ball of arbitrary dimension is tight has been an open question since 1997 [3]. Here $\epsilon$ is the desired bound on the error and $\delta$ is a bound on the probability of failure. We improve the best currently known upper bound [14] of $O(\frac{k^2}{\epsilon^2} + \frac{\log \frac{1}{\delta}}{\epsilon^2})$ to $O(\frac{k}{\epsilon^2} \min(k, \frac{\log^4 k}{\epsilon^2}) + \frac{\log \frac{1}{\delta}}{\epsilon^2})$. Based on these results, we devise a simple algorithm for $k$-means and another that uses a family of convex programs to fit a piecewise linear curve of a specified length to high dimensional data, where the sample complexity is independent of the ambient dimension.

2 0.98007953 264 nips-2010-Synergies in learning words and their referents

Author: Mark Johnson, Katherine Demuth, Bevan Jones, Michael J. Black

Abstract: This paper presents Bayesian non-parametric models that simultaneously learn to segment words from phoneme strings and learn the referents of some of those words, and shows that there is a synergistic interaction in the acquisition of these two kinds of linguistic information. The models themselves are novel kinds of Adaptor Grammars that are an extension of an embedding of topic models into PCFGs. These models simultaneously segment phoneme sequences into words and learn the relationship between non-linguistic objects to the words that refer to them. We show (i) that modelling inter-word dependencies not only improves the accuracy of the word segmentation but also of word-object relationships, and (ii) that a model that simultaneously learns word-object relationships and word segmentation segments more accurately than one that just learns word segmentation on its own. We argue that these results support an interactive view of language acquisition that can take advantage of synergies such as these. 1

3 0.9477247 283 nips-2010-Variational Inference over Combinatorial Spaces

Author: Alexandre Bouchard-côté, Michael I. Jordan

Abstract: Since the discovery of sophisticated fully polynomial randomized algorithms for a range of #P problems [1, 2, 3], theoretical work on approximate inference in combinatorial spaces has focused on Markov chain Monte Carlo methods. Despite their strong theoretical guarantees, the slow running time of many of these randomized algorithms and the restrictive assumptions on the potentials have hindered the applicability of these algorithms to machine learning. Because of this, in applications to combinatorial spaces simple exact models are often preferred to more complex models that require approximate inference [4]. Variational inference would appear to provide an appealing alternative, given the success of variational methods for graphical models [5]; unfortunately, however, it is not obvious how to develop variational approximations for combinatorial objects such as matchings, partial orders, plane partitions and sequence alignments. We propose a new framework that extends variational inference to a wide range of combinatorial spaces. Our method is based on a simple assumption: the existence of a tractable measure factorization, which we show holds in many examples. Simulations on a range of matching models show that the algorithm is more general and empirically faster than a popular fully polynomial randomized algorithm. We also apply the framework to the problem of multiple alignment of protein sequences, obtaining state-of-the-art results on the BAliBASE dataset [6]. 1

4 0.87781817 58 nips-2010-Decomposing Isotonic Regression for Efficiently Solving Large Problems

Author: Ronny Luss, Saharon Rosset, Moni Shahar

Abstract: A new algorithm for isotonic regression is presented based on recursively partitioning the solution space. We develop efficient methods for each partitioning subproblem through an equivalent representation as a network flow problem, and prove that this sequence of partitions converges to the global solution. These network flow problems can further be decomposed in order to solve very large problems. Success of isotonic regression in prediction and our algorithm’s favorable computational properties are demonstrated through simulated examples as large as 2 × 10^5 variables and 10^7 constraints.
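The abstract's network-flow partitioning solver is not reproduced here. As a point of reference, the same squared-error isotonic regression objective on a simple chain order can be solved with the classic pool-adjacent-violators algorithm (PAVA); the sketch below is that textbook method, not the paper's algorithm, and handles only totally ordered (chain) constraints.

```python
def isotonic_l2(y):
    """Pool-adjacent-violators: minimize sum_i (x_i - y_i)^2 subject to
    x_1 <= x_2 <= ... <= x_n. Classic chain-order solver, not the
    network-flow partitioning method described in the abstract."""
    # Each block stores [sum, count]; its fitted value is sum / count.
    blocks = []
    for v in y:
        blocks.append([v, 1])
        # Merge while the previous block's mean exceeds the last block's mean,
        # i.e. s_prev / c_prev > s_last / c_last, compared cross-multiplied.
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)
    return out
```

For example, `isotonic_l2([3, 1, 2])` pools all three points to their mean. PAVA runs in linear time on a chain, whereas the partial orders targeted by the paper's decomposition are what make the general problem large-scale.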

same-paper 5 0.8589071 40 nips-2010-Beyond Actions: Discriminative Models for Contextual Group Activities

Author: Tian Lan, Yang Wang, Weilong Yang, Greg Mori

Abstract: We propose a discriminative model for recognizing group activities. Our model jointly captures the group activity, the individual person actions, and the interactions among them. Two new types of contextual information, group-person interaction and person-person interaction, are explored in a latent variable framework. Different from most of the previous latent structured models which assume a predefined structure for the hidden layer, e.g. a tree structure, we treat the structure of the hidden layer as a latent variable and implicitly infer it during learning and inference. Our experimental results demonstrate that by inferring this contextual information together with adaptive structures, the proposed model can significantly improve activity recognition performance. 1

6 0.82041776 270 nips-2010-Tight Sample Complexity of Large-Margin Learning

7 0.7841475 220 nips-2010-Random Projection Trees Revisited

8 0.71935302 222 nips-2010-Random Walk Approach to Regret Minimization

9 0.71021742 75 nips-2010-Empirical Risk Minimization with Approximations of Probabilistic Grammars

10 0.69290435 193 nips-2010-Online Learning: Random Averages, Combinatorial Parameters, and Learnability

11 0.69077557 88 nips-2010-Extensions of Generalized Binary Search to Group Identification and Exponential Costs

12 0.68207341 260 nips-2010-Sufficient Conditions for Generating Group Level Sparsity in a Robust Minimax Framework

13 0.68089902 285 nips-2010-Why are some word orders more common than others? A uniform information density account

14 0.66744429 155 nips-2010-Learning the context of a category

15 0.65933383 274 nips-2010-Trading off Mistakes and Don't-Know Predictions

16 0.65291846 163 nips-2010-Lower Bounds on Rate of Convergence of Cutting Plane Methods

17 0.65241039 173 nips-2010-Multi-View Active Learning in the Non-Realizable Case

18 0.64800507 233 nips-2010-Scrambled Objects for Least-Squares Regression

19 0.64708114 288 nips-2010-Worst-case bounds on the quality of max-product fixed-points

20 0.64416045 221 nips-2010-Random Projections for $k$-means Clustering