nips nips2004 nips2004-120 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Tanzeem Choudhury, Sumit Basu
Abstract: In this work, we quantitatively investigate the ways in which a given person influences the joint turn-taking behavior in a conversation. After collecting an auditory database of social interactions among a group of twenty-three people via wearable sensors (66 hours of data each over two weeks), we apply speech and conversation detection methods to the auditory streams. These methods automatically locate the conversations, determine their participants, and mark which participant was speaking when. We then model the joint turn-taking behavior as a Mixed-Memory Markov Model [1] that combines the statistics of the individual subjects' self-transitions and the partners' cross-transitions. The mixture parameters in this model describe how much each person's individual behavior contributes to the joint turn-taking behavior of the pair. By estimating these parameters, we thus estimate how much influence each participant has in determining the joint turn-taking behavior. We show how this measure correlates significantly with betweenness centrality [2], an independent measure of an individual's importance in a social network. This result suggests that our estimate of conversational influence is predictive of social influence.
Reference: text
sentIndex sentText sentNum sentScore
1 In this work, we quantitatively investigate the ways in which a given person influences the joint turn-taking behavior in a conversation. [sent-4, score-0.309]
2 After collecting an auditory database of social interactions among a group of twenty-three people via wearable sensors (66 hours of data each over two weeks), we apply speech and conversation detection methods to the auditory streams. [sent-5, score-1.291]
3 These methods automatically locate the conversations, determine their participants, and mark which participant was speaking when. [sent-6, score-0.097]
4 We then model the joint turn-taking behavior as a Mixed-Memory Markov Model [1] that combines the statistics of the individual subjects' self-transitions and the partners' cross-transitions. [sent-7, score-0.082]
5 The mixture parameters in this model describe how much each person's individual behavior contributes to the joint turn-taking behavior of the pair. [sent-8, score-0.304]
6 By estimating these parameters, we thus estimate how much influence each participant has in determining the joint turn-taking behavior. [sent-9, score-0.476]
7 We show how this measure correlates significantly with betweenness centrality [2], an independent measure of an individual's importance in a social network. [sent-10, score-0.685]
8 This result suggests that our estimate of conversational influence is predictive of social influence. [sent-11, score-0.758]
9 1 Introduction People's relationships are largely determined by their social interactions, and the nature of their conversations plays a large part in defining those interactions. [sent-12, score-0.427]
10 There is a long history of work in the social sciences aimed at understanding the interactions between individuals and the influences they have on each other's behavior. [sent-13, score-0.368]
11 However, existing studies of social network interactions have either been restricted to online communities, where unambiguous measurements about how people interact can be obtained, or have been forced to rely on questionnaires or diaries to get data on face-to-face interactions. [sent-14, score-0.416]
12 Studies show that self-reports correspond poorly to communication behavior as recorded by independent observers [3]. [sent-16, score-0.115]
13 In contrast, we have used wearable sensors and recent advances in speech processing techniques to automatically gather information about conversations: when they occurred, who was involved, and who was speaking when. [sent-17, score-0.416]
14 Our goal was then to see if we could examine the influence a given speaker had on the turn-taking behavior of her conversational partners. [sent-18, score-0.715]
15 Specifically, we wanted to see if we could better explain the turn-taking transitions observed in a given conversation between subjects i and j by combining the transitions typical to i and those typical to j. [sent-19, score-0.639]
16 We could then interpret the contribution from i as her influence on the joint turn-taking behavior. [sent-20, score-0.475]
17 In this paper, we first describe how we extract speech and conversation information from the raw sensor data, and how we can use this to estimate the underlying social network. [sent-21, score-0.746]
18 Finally, we show the performance of our method on our collected data and how it correlates well with other metrics of social influence. [sent-23, score-0.218]
19 2 Sensing and Modeling Face-to-face Communication Networks Although people heavily rely on email, telephone, and other virtual means of communication, high-complexity information is primarily exchanged through face-to-face interaction [4]. [sent-24, score-0.162]
20 Prior work on sensing face-to-face networks has been based on proximity measures [5],[6], a weak approximation of the actual communication network. [sent-25, score-0.129]
21 Our focus is to model the network based on conversations that take place within a community. [sent-26, score-0.275]
22 To do this, we need to gather data from real-world interactions. [sent-27, score-0.037]
23 We thus used an experiment conducted at MIT [7] in which 23 people agreed to wear the sociometer, a wearable data acquisition board [7],[8]. [sent-28, score-0.2]
24 The device stored audio information from a single microphone at 8 kHz. [sent-29, score-0.122]
25 During the experiment the users wore the device both indoors and outdoors for six hours a day for 11 days. [sent-30, score-0.041]
26 The participants were a mix of students, faculty, and administrative support staff who were distributed across different floors of a laboratory building and across different research groups. [sent-31, score-0.103]
27 3 Speech and Conversation Detection Given the set of auditory streams of each subject, we now have the problem of detecting who is speaking when and to whom they are speaking. [sent-32, score-0.233]
28 We break this problem into two parts: voicing/speech detection and conversation detection. [sent-33, score-0.441]
29 1 Voicing and Speech Detection To detect the speech, we use the linked-HMM model for voicing and speech detection presented in [9]. [sent-35, score-0.261]
30 This structure models the speech as two layers (see Figure 1); the lower level hidden state represents whether the current frame of audio is voiced or unvoiced (i.e., [sent-36, score-0.463]
31 whether the audio in the frame has a harmonic structure, as in a vowel), while the second level represents whether we are in a speech or nonspeech segment. [sent-38, score-0.366]
32 The principle behind the model is that while there are many voiced sounds in our environment (car horns, tones, computer sounds, etc.), [sent-39, score-0.114]
33 the dynamics of voiced/unvoiced transitions provide a unique signature for human speech; the higher level is able to capture these dynamics since the lower level's transitions are dependent on this variable. [sent-40, score-0.156]
34 Figure 1: Graphical model for the voicing and speech detector: speech layer (S[t] ∈ {0, 1}), voicing layer (V[t] ∈ {0, 1}), and observation layer (3 features). [sent-41, score-0.885]
35 To apply this model to data, the 8 kHz audio is split into 256-sample frames (32 milliseconds) with a 128-sample overlap. [sent-42, score-0.138]
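The framing step above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `frame_signal` and the list-based representation of the audio are assumptions, while the frame length, overlap, and sample rate come from the text.

```python
SAMPLE_RATE_HZ = 8000   # from the text: 8 kHz audio
FRAME_LEN = 256         # from the text: 256-sample frames
HOP = 128               # a 128-sample overlap means a 128-sample hop


def frame_signal(samples, frame_len=FRAME_LEN, hop=HOP):
    """Split a sequence of samples into overlapping frames (hypothetical helper)."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames


# Derived quantities that the rest of the text relies on.
frame_duration_ms = 1000 * FRAME_LEN / SAMPLE_RATE_HZ   # length of one frame
frames_per_minute = 60 * SAMPLE_RATE_HZ / HOP           # frames in a one-minute window

one_second = [0.0] * SAMPLE_RATE_HZ                     # placeholder audio
frames = frame_signal(one_second)
```

With these settings one frame lasts 32 ms and a one-minute window holds 3750 frames, which matches the window size T quoted for the alignment measure.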
36 The model was then trained on 8000 frames of fully labeled data. [sent-45, score-0.053]
37 We chose this model because of its robustness to noise and distance from the microphone: even at 20 feet away more than 90% of voiced frames were detected with negligible false alarms (see [9]). [sent-46, score-0.213]
38 The results from this model are the binary sequences v[t] and s[t] signifying whether the frame is voiced and whether it is in a speech segment for all frames of the audio. [sent-47, score-0.449]
39 2 Conversation Detection Once the voicing and speech segments are identified, we are still left with the problem of determining who was talking with whom and when. [sent-49, score-0.469]
40 To approach this, we use the method of conversation detection described in [10]. [sent-50, score-0.441]
41 The basic idea is simple: since the speech detection method described above is robust to distance, the voicing segments v[t] of all the participants in the conversation will be picked up by the detector in all of the streams (this is referred to as a "mixed stream" in [10]). [sent-51, score-1.093]
42 We can then examine the mutual information of the binary voicing estimates between each person as a matching measure. [sent-52, score-0.371]
43 Since both voicing streams will be nearly identical, the mutual information should peak when the two participants are either involved in a conversation or are overhearing a conversation from a nearby group. [sent-53, score-1.167]
44 However, we have the added complication that the streams are only roughly aligned in time. [sent-54, score-0.12]
45 We can express the alignment measure a[k] for an offset of k between the two voicing streams as follows: $a[k] = I(v_1[t], v_2[t-k]) = \sum_{i,j} p(v_1[t]=i, v_2[t-k]=j) \log \frac{p(v_1[t]=i, v_2[t-k]=j)}{p(v_1[t]=i)\, p(v_2[t-k]=j)}$ [sent-56, score-0.322]
46 where i and j take on values {0, 1} for unvoiced and voiced states respectively. [sent-73, score-0.126]
47 The distributions for $p(v_1, v_2)$ and its marginals are estimated over a window of one minute (T=3750 frames). [sent-74, score-0.059]
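As a rough sketch of the alignment measure, the snippet below estimates the mutual information between two binary voicing sequences from simple counts and scans it over a range of offsets. The function names and the synthetic test streams are assumptions made for illustration; the real system works on detector output over one-minute windows.

```python
import math
import random


def mutual_information(x, y):
    """Mutual information (in nats) of two equal-length discrete sequences."""
    n = len(x)
    joint, px, py = {}, {}, {}
    for a, b in zip(x, y):
        joint[(a, b)] = joint.get((a, b), 0) + 1
        px[a] = px.get(a, 0) + 1
        py[b] = py.get(b, 0) + 1
    # p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts.
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def alignment_curve(v1, v2, max_offset):
    """Return {k: a[k]} with a[k] = I(v1[t], v2[t-k]) on the overlapping part."""
    curve = {}
    for k in range(-max_offset, max_offset + 1):
        if k >= 0:
            x, y = v1[k:], v2[:len(v2) - k]
        else:
            x, y = v1[:len(v1) + k], v2[-k:]
        curve[k] = mutual_information(x, y)
    return curve


# Synthetic check: v1 is v2 delayed by three frames, so a[k] should peak at k = 3.
rng = random.Random(0)
v2 = [rng.randint(0, 1) for _ in range(500)]
v1 = [0, 0, 0] + v2[:-3]
curve = alignment_curve(v1, v2, max_offset=10)
best_offset = max(curve, key=curve.get)
```

Thresholding the peak value then gives a conversation detector, and the location of the peak gives the time alignment between the two streams.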
48 To see how well this measure performs, we examine an example pair of subjects who had one five-minute conversation over the course of half an hour. [sent-75, score-0.57]
49 The streams are correctly aligned at k=0, and by examining the value of a[k] over a large range we can investigate its utility for conversation detection and for aligning the auditory streams (see Figure 2). [sent-76, score-0.762]
50 The peaks are both strong and unique to the correct alignment (k=0), implying that this is indeed a good measure for detecting conversations and aligning the audio in our setup. [sent-77, score-0.356]
51 By choosing the optimal threshold via the ROC curve, we can achieve 100% detection with no false alarms using time windows T of one minute. [sent-78, score-0.107]
52 For each minute of data in each speaker's stream, we computed a[k] for k ranging over +/- 30 seconds with T=3750 for each of the other 22 subjects in the study. [sent-82, score-0.195]
53 Since the microphones for each subject are pre-calibrated to have approximately equal energy response, we can classify each voicing segment among the speakers by integrating the audio energy over the segment and choosing the argmax over subjects. [sent-85, score-0.476]
54 Since it is still possible that the resulting subject does not correspond to the actual speaker (she could simply be the one nearest to a nonsubject who is speaking), we determine an overall threshold below which the assignment to the speaker is rejected. [sent-86, score-0.195]
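The energy-based speaker assignment with a rejection threshold might look like the sketch below. The helper names, the dictionary layout, and the threshold value are hypothetical; the text specifies only the argmax over integrated energy and the existence of an overall threshold.

```python
def integrated_energy(samples):
    """Sum of squared samples over one voicing segment."""
    return sum(s * s for s in samples)


def assign_speaker(segment_by_subject, min_energy):
    """Pick the subject whose microphone recorded the most energy for a segment.

    segment_by_subject maps a subject id to that subject's samples for the
    same voicing segment. min_energy is a hypothetical rejection threshold:
    below it the segment is attributed to nobody (for example, a nearby
    non-subject was speaking).
    """
    energies = {sid: integrated_energy(x) for sid, x in segment_by_subject.items()}
    best = max(energies, key=energies.get)
    return best if energies[best] >= min_energy else None


# Toy samples, not real data.
segment = {"s1": [0.5, -0.5, 0.4], "s2": [0.1, 0.1, -0.1]}
accepted = assign_speaker(segment, min_energy=0.2)
rejected = assign_speaker(segment, min_energy=1.0)
```

This only makes sense because the microphones are calibrated to a roughly equal energy response, as the text notes.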
55 For this work, we rejected all conversations with more than two participants or those that were simply overheard by the subjects. [sent-88, score-0.406]
56 Finally, we tested the overall performance of our method by comparing with a hand-labeling of conversation occurrence and length from four subjects over 2 days (48 hours of data) and found an 87% agreement with the hand labeling. [sent-89, score-0.58]
57 3 The Turn-Taking Signal $S^i_t$ Finally, given the location of the conversations and who is speaking when, we can create a new signal for each subject i, $S^i_t$, defined over five-second blocks, which is 1 when the subject is holding the turn and 0 otherwise. [sent-92, score-0.473]
58 We define the holder of the turn as whoever has produced more speech during the five-second block. [sent-93, score-0.191]
59 Thus, within a given conversation between subjects i and j, the turn-taking signals are complements of each other, i.e., $S^i_t = \neg S^j_t$. [sent-94, score-0.539]
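A minimal sketch of building the turn-taking signal from labeled speech segments follows. The `(start, end, speaker)` segment format and the function name are assumptions, and so is the tie rule: when both speakers produce equal speech in a block, the subject's signal is set to 0.

```python
def turn_signal(segments, subject, partner, duration, block=5.0):
    """Return the subject's turn-taking signal over consecutive blocks.

    segments: list of (start_sec, end_sec, speaker_id) speech segments.
    A block is assigned to whoever produced more speech in it.
    """
    n_blocks = int(duration // block)
    totals = {subject: [0.0] * n_blocks, partner: [0.0] * n_blocks}
    for start, end, who in segments:
        if who not in totals:
            continue
        b = int(start // block)
        # Spread the segment's duration over every block it overlaps.
        while b < n_blocks and b * block < end:
            lo = max(start, b * block)
            hi = min(end, (b + 1) * block)
            totals[who][b] += max(0.0, hi - lo)
            b += 1
    return [1 if s > p else 0 for s, p in zip(totals[subject], totals[partner])]


# Toy 15-second conversation, not real data.
segments = [(0, 3, "i"), (3, 5, "j"), (5, 6, "i"), (6, 10, "j"), (10, 15, "i")]
s_i = turn_signal(segments, "i", "j", duration=15)
s_j = [1 - v for v in s_i]   # the partner's signal is the complement
```

Deriving the partner's signal as the complement, rather than recomputing it, keeps the two signals consistent even in tied blocks.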
60 4 Estimating the Social Network Structure Once we have detected the pairwise conversations, we can identify the communication that occurs within the community and map the links between individuals. [sent-97, score-0.331]
61 The link structure is calculated from the total number of conversations each subject has with others: interactions with another person that account for less than 5% of the subject's total interactions are removed from the graph. [sent-98, score-0.656]
62 To get an intuitive picture of the interaction pattern within the group, we visualize the network diagram by performing multi-dimensional scaling (MDS) on the geodesic distances (number of hops) between the people (Figure 3). [sent-99, score-0.194]
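The link pruning and the hop-count (geodesic) distances that feed the MDS step can be sketched as below. The 5% rule is from the text, but applying it to both endpoints of a link is an assumption, as are the function names and the toy counts.

```python
from collections import deque


def build_links(counts, min_share=0.05):
    """Build an undirected graph from pairwise conversation counts.

    counts maps (u, v) pairs to a positive number of conversations.
    Assumption: a link is dropped if it accounts for less than min_share
    of the total interactions of either endpoint.
    """
    totals = {}
    for (u, v), c in counts.items():
        totals[u] = totals.get(u, 0) + c
        totals[v] = totals.get(v, 0) + c
    graph = {node: set() for node in totals}
    for (u, v), c in counts.items():
        if c / totals[u] >= min_share and c / totals[v] >= min_share:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def hop_distances(graph, source):
    """Breadth-first search: number of hops from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


# Toy counts, not real data: the single A-C conversation is pruned.
counts = {("A", "B"): 30, ("B", "C"): 20, ("A", "C"): 1, ("C", "D"): 10}
graph = build_links(counts)
dist_from_a = hop_distances(graph, "A")
```

The matrix of such hop distances between all pairs is what a multi-dimensional scaling routine would take as input.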
63 From this we see that people whose offices are in the same general space seem to be close in the communication space as well. [sent-101, score-0.167]
64 Figure 3: Estimated network of subjects 5 Modeling the Influence of Turn-taking Behavior in Conversations When we talk to other people we are influenced by their style of interaction. [sent-102, score-0.432]
65 Sometimes this influence is strong and sometimes insignificant - we are interested in finding a way to quantify this effect. [sent-103, score-0.412]
66 We probably all know people who have a strong effect on our natural interaction style when we talk to them, causing us to change our style as a result. [sent-104, score-0.361]
67 For example, consider someone who never seems to stop talking once it is her turn. [sent-105, score-0.04]
68 She may end up imposing her style on us, and we may consequently end up not having enough of a chance to talk, whereas in most other circumstances we tend to be an active and equal participant. [sent-106, score-0.074]
69 Let us consider the influence subject j has on subject i. [sent-108, score-0.472]
70 We can compute i's average self-transition table, $P(S^i_t \mid S^i_{t-1})$, via simple counts over all conversations for subject i (excluding those with j). [sent-109, score-0.328]
71 Similarly, we can compute j's average cross-transition table, $P(S^k_t \mid S^j_{t-1})$, over all subjects k (excluding i) with which j had conversations. [sent-110, score-0.168]
72 The question now is, for a given conversation between i and j, how much does j's average cross-transition help explain $P(S^i_t \mid S^i_{t-1}, S^j_{t-1})$? [sent-111, score-0.371]
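Estimating the self-transition and cross-transition tables by counting can be sketched as follows. The function name, the pair-of-sequences input format, and the 0.5 fallback for unseen conditioning states are assumptions; the signals are toy values.

```python
def transition_table(pairs):
    """Estimate P(target[t] = b | cond[t-1] = a) from binary sequences.

    pairs: list of (cond, target) sequences of equal length, one per conversation.
    For a self-transition table, cond and target are both the subject's own signal.
    For a cross-transition table, cond is j's signal and target is the partner's.
    Returns table[a][b]; rows with no observations fall back to 0.5 (assumption).
    """
    counts = [[0, 0], [0, 0]]
    for cond, target in pairs:
        for t in range(1, len(target)):
            counts[cond[t - 1]][target[t]] += 1
    table = []
    for row in counts:
        total = sum(row)
        table.append([c / total if total else 0.5 for c in row])
    return table


# Toy signals, not real data.
s_k = [1, 1, 1, 0, 0, 1]          # a partner's turn-taking signal
s_j = [1 - v for v in s_k]        # j's signal is the complement
self_table = transition_table([(s_k, s_k)])
cross_table = transition_table([(s_j, s_k)])
```

In practice the list of pairs would hold every relevant conversation for the subject, leaving out the pair whose influence is being evaluated.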
73 Also, since the $\alpha_{ij}$ sum to one over j, in this case $\alpha_{11} = 1 - \alpha_{12}$. [sent-115, score-0.054]
74 , the contribution of subject 2's average turn-taking behavior on her interactions with subject 1. [sent-118, score-0.354]
75 1 Learning the influence parameters To find the $\alpha_{ij}$ values, we would like to maximize the likelihood of the data. [sent-120, score-0.504]
76 Since we have already estimated the relevant conditional probability tables, we can do this via constrained gradient ascent, where we ensure that $\alpha_{ij} > 0$ [12]. [sent-121, score-0.059]
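77 $\frac{\partial}{\partial \alpha_{ij}}(\cdot) = \frac{P(S^i_t \mid S^j_{t-1}) - P(S^i_t \mid S^i_{t-1})}{\sum_k \alpha_{ik} P(S^i_t \mid S^k_{t-1}) + (1 - \sum_k \alpha_{ik}) P(S^i_t \mid S^i_{t-1})}$ We can show that the log-likelihood is concave in the $\alpha_{ij}$, so we are guaranteed to achieve the global maximum by climbing the gradient. [sent-123, score-0.054]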
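For the two-person case, fitting a single influence weight can be sketched as below. This swaps in plain projected gradient ascent with a fixed step and clipping, which is a simplification and not necessarily the constrained method of [12]; the function name, step size, iteration count, and toy tables are all assumptions.

```python
def fit_alpha(s_i, s_j, self_table, cross_table, steps=500, lr=0.01, eps=1e-6):
    """Fit alpha in  P(S_i[t] | .) = (1 - alpha) * P_self + alpha * P_cross.

    self_table[a][b]  = P(S_i[t] = b | S_i[t-1] = a)
    cross_table[a][b] = P(S_i[t] = b | S_j[t-1] = a)
    Uses projected gradient ascent on the average log-likelihood.
    """
    terms = [(self_table[s_i[t - 1]][s_i[t]], cross_table[s_j[t - 1]][s_i[t]])
             for t in range(1, len(s_i))]
    alpha = 0.5
    for _ in range(steps):
        grad = 0.0
        for p_self, p_cross in terms:
            mix = alpha * p_cross + (1 - alpha) * p_self
            grad += (p_cross - p_self) / max(mix, 1e-12)
        alpha += lr * grad / len(terms)
        alpha = min(1.0 - eps, max(eps, alpha))   # keep alpha inside (0, 1)
    return alpha


# Toy setup, not real data: i holds the turn throughout, j never does.
s_i = [1] * 20
s_j = [0] * 20
flat = [[0.5, 0.5], [0.5, 0.5]]
helpful_cross = [[0.1, 0.9], [0.9, 0.1]]     # predicts the observed transitions well
unhelpful_cross = [[0.9, 0.1], [0.1, 0.9]]   # predicts them badly
alpha_high = fit_alpha(s_i, s_j, flat, helpful_cross)
alpha_low = fit_alpha(s_i, s_j, flat, unhelpful_cross)
```

Each summand of the gradient has the same form as the derivative given above: the cross term minus the self term, divided by the mixture probability.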
78 2 Aggregate Influence over Multiple Conversations In order to evaluate whether this model provides additional benefit over using a given subject's self-transition statistics alone, we estimated the reduction in KL divergence by using the mixture of interactions vs. [sent-126, score-0.212]
79 We found that by using the mixture model we were able to reduce the KL divergence between a subject's average self-transition statistics and the observed transitions by 32% on average. [sent-128, score-0.108]
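The comparison can be illustrated with a small KL-divergence helper. How the divergences were weighted across conditioning states and conversations is not stated here, so the snippet compares single distributions only; the function names and the numbers are made up for illustration.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def relative_reduction(baseline, improved):
    """Fractional reduction of a divergence relative to a baseline value."""
    return (baseline - improved) / baseline


# Made-up distributions over the next turn state, for illustration only.
observed = [0.8, 0.2]
self_only = [0.5, 0.5]
mixture = [0.7, 0.3]
kl_self = kl_divergence(observed, self_only)
kl_mix = kl_divergence(observed, mixture)
reduction = relative_reduction(kl_self, kl_mix)
```

A reduction of 0.32 under this definition would correspond to the 32% figure reported above.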
80 However, in the mixture model we have added extra degrees of freedom, and hence tested whether the better fit was statistically significant by using the F-test. [sent-129, score-0.166]
81 01 , implying that the mixture model is a significantly better fit to the data. [sent-131, score-0.085]
82 In order to find a single influence parameter for each person, we took a subset of 80 conversations and aggregated all the pairwise influences each subject had on all her conversational partners. [sent-132, score-1.031]
83 In order to compute this aggregate value, there is an additional aspect about $\alpha_{ij}$ we need to consider. [sent-133, score-0.153]
84 If the subject's self-transition matrix and the complement of the partner's cross-transition matrix are very similar, the influence scores are indeterminate, since for a given interaction $S^i_t = \neg S^j_t$; i.e., [sent-134, score-0.497]
85 we would essentially be trying to find the best way to linearly combine two identical transition matrices. [sent-136, score-0.038]
86 6 Link between Conversational Dynamics and Social Role Betweenness centrality is a measure frequently used in social network analysis to characterize importance in the social network. [sent-138, score-0.701]
87 For a given person i, it is defined as being proportional to the number of pairs of people (j,k) for which that person lies along the shortest path in the network between j and k. [sent-139, score-0.415]
88 It is thus used to estimate how much control an individual has over the interaction of others, since it is a count of how often she is a "gateway" between others. [sent-140, score-0.055]
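A small, unoptimized sketch of betweenness centrality on an undirected, unweighted graph follows. It uses the standard convention of crediting a node with the fraction of shortest paths between j and k that pass through it, which is assumed here to be the intended definition; the function names and toy graphs are illustrative.

```python
from collections import deque
from itertools import combinations


def bfs_counts(graph, source):
    """Return (dist, sigma): hop distance and number of shortest paths from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                sigma[nb] = 0
                queue.append(nb)
            if dist[nb] == dist[node] + 1:
                sigma[nb] += sigma[node]
    return dist, sigma


def betweenness(graph):
    """Unnormalized betweenness: sum over pairs (j, k) of the share of
    shortest j-k paths that pass through each other node i."""
    info = {n: bfs_counts(graph, n) for n in graph}
    score = {n: 0.0 for n in graph}
    for j, k in combinations(graph, 2):
        dist_j, sigma_j = info[j]
        if k not in dist_j:
            continue                      # j and k are not connected
        for i in graph:
            if i in (j, k) or i not in dist_j:
                continue
            dist_i, sigma_i = info[i]
            if k in dist_i and dist_j[i] + dist_i[k] == dist_j[k]:
                score[i] += sigma_j[i] * sigma_i[k] / sigma_j[k]
    return score


# Toy graphs, not the study's network.
path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
square = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"A", "C"}}
path_scores = betweenness(path)
square_scores = betweenness(square)
```

On the four-node path, only the two interior nodes act as gateways; on the square, each node carries half of the shortest paths between its two neighbors.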
89 People with high betweenness are often perceived as leaders [2]. [sent-141, score-0.139]
90 We computed the betweenness centrality for the subjects from the 80 conversations using the network structure we estimated in Section 3. [sent-142, score-0.915]
91 We then discovered an interesting and statistically significant correlation between a person's aggregate influence score and her betweenness centrality -- it appears that a person's interaction style is indicative of her role within the community based on the centrality measure. [sent-143, score-1.571]
92 Figure 4 shows the weighted influence values along with the centrality scores. [sent-144, score-0.713]
93 This resulted in her having artificially high centrality (based on link structure) but not high influence based on her interaction style. [sent-146, score-0.843]
94 We computed the statistical correlation between the influence values and the centrality scores, both including and excluding the outlier subject ID 8. [sent-147, score-0.944]
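The comparison with and without the outlier can be sketched with a hand-rolled Pearson correlation. Whether Pearson's coefficient was the statistic used is an assumption, and the values below are invented purely to show the mechanics; they are not the study's data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def pearson_excluding(x, y, ids, excluded):
    """Same correlation after dropping the entries whose id is in excluded."""
    kept = [(a, b) for a, b, i in zip(x, y, ids) if i not in excluded]
    return pearson([a for a, _ in kept], [b for _, b in kept])


# Invented values for illustration only.
ids = [1, 2, 3, 4, 8]
influence = [0.10, 0.20, 0.30, 0.40, 0.15]
centrality = [0.05, 0.10, 0.15, 0.20, 0.60]
r_all = pearson(influence, centrality)
r_without_8 = pearson_excluding(influence, centrality, ids, excluded={8})
```

A single subject with high centrality but modest influence can pull the coefficient down sharply, which is why the text reports the correlation both ways.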
95 The two measures, namely influence and centrality, are highly correlated, and this correlation is statistically significant when we exclude ID 8, who was the coordinator of the project and whose centrality is likely to be artificially large. [sent-155, score-0.933]
96 7 Conclusion We have developed a model for quantitatively representing the influence of a given person j's turn-taking behavior on the joint turn-taking behavior with person i. [sent-156, score-0.827]
97 On real-world data gathered from wearable sensors, we have estimated the relevant component statistics about turn-taking behavior via robust speech processing techniques, and have shown how we can use the Mixed-Memory Markov formalism to estimate the behavioral influence. [sent-157, score-0.398]
98 Finally, we have shown a strong correlation between a person's aggregate influence value and her betweenness centrality score. [sent-158, score-1.0]
99 This implies that our estimate of conversational influence may be indicative of importance within the social network. [sent-159, score-0.792]
100 Figure 4: Aggregate influence values and corresponding centrality scores. [sent-170, score-0.713]
wordName wordTfidf (topN-words)
[('influence', 0.412), ('conversation', 0.371), ('centrality', 0.301), ('conversations', 0.243), ('voicing', 0.202), ('speech', 0.191), ('social', 0.184), ('subjects', 0.168), ('conversational', 0.162), ('pes', 0.141), ('betweenness', 0.139), ('person', 0.138), ('streams', 0.12), ('people', 0.107), ('participants', 0.103), ('aggregate', 0.099), ('interactions', 0.093), ('wearable', 0.093), ('id', 0.092), ('voiced', 0.086), ('subject', 0.085), ('audio', 0.085), ('media', 0.074), ('basu', 0.074), ('style', 0.074), ('excluding', 0.071), ('detection', 0.07), ('ark', 0.07), ('choudhury', 0.07), ('doctoral', 0.07), ('sensing', 0.069), ('influences', 0.06), ('subj', 0.06), ('vjt', 0.06), ('communication', 0.06), ('speaking', 0.06), ('behavior', 0.055), ('partner', 0.055), ('interaction', 0.055), ('speaker', 0.055), ('ij', 0.054), ('frames', 0.053), ('auditory', 0.053), ('talk', 0.051), ('markov', 0.051), ('transitions', 0.05), ('correlation', 0.049), ('significant', 0.049), ('artificially', 0.046), ('coordinator', 0.046), ('lfj', 0.046), ('microphones', 0.046), ('stk', 0.046), ('hours', 0.041), ('talking', 0.04), ('unvoiced', 0.04), ('find', 0.038), ('others', 0.037), ('microphone', 0.037), ('tech', 0.037), ('gather', 0.037), ('alarms', 0.037), ('arts', 0.037), ('participant', 0.037), ('segments', 0.036), ('contribution', 0.036), ('sensors', 0.035), ('indicative', 0.034), ('correlates', 0.034), ('interacting', 0.034), ('layer', 0.033), ('autocorrelation', 0.032), ('estimated', 0.032), ('frame', 0.032), ('kl', 0.032), ('network', 0.032), ('examine', 0.031), ('individuals', 0.031), ('aggregated', 0.031), ('statistically', 0.03), ('scores', 0.03), ('modeling', 0.03), ('lab', 0.029), ('quantitatively', 0.029), ('fit', 0.029), ('divergence', 0.029), ('mixture', 0.029), ('link', 0.029), ('segment', 0.029), ('whether', 0.029), ('aligning', 0.028), ('sounds', 0.028), ('dynamics', 0.028), ('community', 0.028), ('significantly', 0.027), ('stream', 0.027), ('minute', 0.027), 
('joint', 0.027), ('relevant', 0.027), ('outlier', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999928 120 nips-2004-Modeling Conversational Dynamics as a Mixed-Memory Markov Process
2 0.14463474 31 nips-2004-Blind One-microphone Speech Separation: A Spectral Learning Approach
Author: Francis R. Bach, Michael I. Jordan
Abstract: We present an algorithm to perform blind, one-microphone speech separation. Our algorithm separates mixtures of speech without modeling individual speakers. Instead, we formulate the problem of speech separation as a problem in segmenting the spectrogram of the signal into two or more disjoint sets. We build feature sets for our segmenter using classical cues from speech psychophysics. We then combine these features into parameterized affinity matrices. We also take advantage of the fact that we can generate training examples for segmentation by artificially superposing separately-recorded signals. Thus the parameters of the affinity matrices can be tuned using recent work on learning spectral clustering [1]. This yields an adaptive, speech-specific segmentation algorithm that can successfully separate one-microphone speech mixtures.
3 0.10710765 152 nips-2004-Real-Time Pitch Determination of One or More Voices by Nonnegative Matrix Factorization
Author: Fei Sha, Lawrence K. Saul
Abstract: An auditory “scene”, composed of overlapping acoustic sources, can be viewed as a complex object whose constituent parts are the individual sources. Pitch is known to be an important cue for auditory scene analysis. In this paper, with the goal of building agents that operate in human environments, we describe a real-time system to identify the presence of one or more voices and compute their pitch. The signal processing in the front end is based on instantaneous frequency estimation, a method for tracking the partials of voiced speech, while the pattern-matching in the back end is based on nonnegative matrix factorization, an unsupervised algorithm for learning the parts of complex objects. While supporting a framework to analyze complicated auditory scenes, our system maintains real-time operability and state-of-the-art performance in clean speech.
4 0.090927891 5 nips-2004-A Harmonic Excitation State-Space Approach to Blind Separation of Speech
Author: Rasmus K. Olsson, Lars K. Hansen
Abstract: We discuss an identification framework for noisy speech mixtures. A block-based generative model is formulated that explicitly incorporates the time-varying harmonic plus noise (H+N) model for a number of latent sources observed through noisy convolutive mixtures. All parameters including the pitches of the source signals, the amplitudes and phases of the sources, the mixing filters and the noise statistics are estimated by maximum likelihood, using an EM-algorithm. Exact averaging over the hidden sources is obtained using the Kalman smoother. We show that pitch estimation and source separation can be performed simultaneously. The pitch estimates are compared to laryngograph (EGG) measurements. Artificial and real room mixtures are used to demonstrate the viability of the approach. Intelligible speech signals are re-synthesized from the estimated H+N models.
5 0.070333533 20 nips-2004-An Auditory Paradigm for Brain-Computer Interfaces
Author: N. J. Hill, Thomas N. Lal, Karin Bierig, Niels Birbaumer, Bernhard Schölkopf
Abstract: Motivated by the particular problems involved in communicating with “locked-in” paralysed patients, we aim to develop a brain-computer interface that uses auditory stimuli. We describe a paradigm that allows a user to make a binary decision by focusing attention on one of two concurrent auditory stimulus sequences. Using Support Vector Machine classification and Recursive Channel Elimination on the independent components of averaged event-related potentials, we show that an untrained user’s EEG data can be classified with an encouragingly high level of accuracy. This suggests that it is possible for users to modulate EEG signals in a single trial by the conscious direction of attention, well enough to be useful in BCI.
6 0.0628611 97 nips-2004-Learning Efficient Auditory Codes Using Spikes Predicts Cochlear Filters
7 0.054016002 174 nips-2004-Spike Sorting: Bayesian Clustering of Non-Stationary Data
8 0.053982321 156 nips-2004-Result Analysis of the NIPS 2003 Feature Selection Challenge
9 0.051615823 139 nips-2004-Optimal Aggregation of Classifiers and Boosting Maps in Functional Magnetic Resonance Imaging
10 0.044339467 106 nips-2004-Machine Learning Applied to Perception: Decision Images for Gender Classification
11 0.043028872 162 nips-2004-Semi-Markov Conditional Random Fields for Information Extraction
12 0.041515365 21 nips-2004-An Information Maximization Model of Eye Movements
13 0.041455533 13 nips-2004-A Three Tiered Approach for Articulated Object Action Modeling and Recognition
14 0.038360327 102 nips-2004-Learning first-order Markov models for control
15 0.0375931 46 nips-2004-Constraining a Bayesian Model of Human Visual Speed Perception
16 0.036313694 68 nips-2004-Face Detection --- Efficient and Rank Deficient
17 0.036295045 80 nips-2004-Identifying Protein-Protein Interaction Sites on a Genome-Wide Scale
18 0.035918701 142 nips-2004-Outlier Detection with One-class Kernel Fisher Discriminants
19 0.035282902 124 nips-2004-Multiple Alignment of Continuous Time Series
20 0.033869207 198 nips-2004-Unsupervised Variational Bayesian Learning of Nonlinear Models
topicId topicWeight
[(0, -0.129), (1, -0.018), (2, -0.026), (3, -0.118), (4, -0.094), (5, -0.055), (6, 0.152), (7, 0.06), (8, 0.033), (9, -0.014), (10, -0.019), (11, 0.004), (12, -0.055), (13, 0.055), (14, -0.034), (15, -0.024), (16, 0.002), (17, -0.055), (18, -0.028), (19, 0.06), (20, -0.007), (21, 0.062), (22, 0.01), (23, -0.045), (24, 0.099), (25, -0.024), (26, 0.039), (27, -0.066), (28, -0.109), (29, -0.021), (30, -0.006), (31, -0.065), (32, 0.057), (33, -0.0), (34, -0.049), (35, -0.095), (36, 0.012), (37, -0.028), (38, 0.034), (39, 0.047), (40, 0.069), (41, -0.082), (42, 0.044), (43, -0.03), (44, -0.115), (45, 0.09), (46, 0.019), (47, -0.014), (48, 0.011), (49, 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.9376123 120 nips-2004-Modeling Conversational Dynamics as a Mixed-Memory Markov Process
2 0.73174536 152 nips-2004-Real-Time Pitch Determination of One or More Voices by Nonnegative Matrix Factorization
Author: Fei Sha, Lawrence K. Saul
Abstract: An auditory “scene”, composed of overlapping acoustic sources, can be viewed as a complex object whose constituent parts are the individual sources. Pitch is known to be an important cue for auditory scene analysis. In this paper, with the goal of building agents that operate in human environments, we describe a real-time system to identify the presence of one or more voices and compute their pitch. The signal processing in the front end is based on instantaneous frequency estimation, a method for tracking the partials of voiced speech, while the pattern-matching in the back end is based on nonnegative matrix factorization, an unsupervised algorithm for learning the parts of complex objects. While supporting a framework to analyze complicated auditory scenes, our system maintains real-time operability and state-of-the-art performance in clean speech.
3 0.67033273 31 nips-2004-Blind One-microphone Speech Separation: A Spectral Learning Approach
Author: Francis R. Bach, Michael I. Jordan
Abstract: We present an algorithm to perform blind, one-microphone speech separation. Our algorithm separates mixtures of speech without modeling individual speakers. Instead, we formulate the problem of speech separation as a problem in segmenting the spectrogram of the signal into two or more disjoint sets. We build feature sets for our segmenter using classical cues from speech psychophysics. We then combine these features into parameterized affinity matrices. We also take advantage of the fact that we can generate training examples for segmentation by artificially superposing separately-recorded signals. Thus the parameters of the affinity matrices can be tuned using recent work on learning spectral clustering [1]. This yields an adaptive, speech-specific segmentation algorithm that can successfully separate one-microphone speech mixtures. 1
4 0.54286236 5 nips-2004-A Harmonic Excitation State-Space Approach to Blind Separation of Speech
Author: Rasmus K. Olsson, Lars K. Hansen
Abstract: We discuss an identification framework for noisy speech mixtures. A block-based generative model is formulated that explicitly incorporates the time-varying harmonic plus noise (H+N) model for a number of latent sources observed through noisy convolutive mixtures. All parameters including the pitches of the source signals, the amplitudes and phases of the sources, the mixing filters and the noise statistics are estimated by maximum likelihood, using an EM-algorithm. Exact averaging over the hidden sources is obtained using the Kalman smoother. We show that pitch estimation and source separation can be performed simultaneously. The pitch estimates are compared to laryngograph (EGG) measurements. Artificial and real room mixtures are used to demonstrate the viability of the approach. Intelligible speech signals are re-synthesized from the estimated H+N models.
5 0.41597134 20 nips-2004-An Auditory Paradigm for Brain-Computer Interfaces
Author: N. J. Hill, Thomas N. Lal, Karin Bierig, Niels Birbaumer, Bernhard Schölkopf
Abstract: Motivated by the particular problems involved in communicating with “locked-in” paralysed patients, we aim to develop a brain-computer interface that uses auditory stimuli. We describe a paradigm that allows a user to make a binary decision by focusing attention on one of two concurrent auditory stimulus sequences. Using Support Vector Machine classification and Recursive Channel Elimination on the independent components of averaged event-related potentials, we show that an untrained user’s EEG data can be classified with an encouragingly high level of accuracy. This suggests that it is possible for users to modulate EEG signals in a single trial by the conscious direction of attention, well enough to be useful in BCI.
6 0.41561332 21 nips-2004-An Information Maximization Model of Eye Movements
7 0.40102297 53 nips-2004-Discriminant Saliency for Visual Recognition from Cluttered Scenes
8 0.35627687 170 nips-2004-Similarity and Discrimination in Classical Conditioning: A Latent Variable Account
9 0.35521039 106 nips-2004-Machine Learning Applied to Perception: Decision Images for Gender Classification
10 0.35036463 199 nips-2004-Using Machine Learning to Break Visual Human Interaction Proofs (HIPs)
11 0.32827953 27 nips-2004-Bayesian Regularization and Nonnegative Deconvolution for Time Delay Estimation
12 0.31869 97 nips-2004-Learning Efficient Auditory Codes Using Spikes Predicts Cochlear Filters
13 0.31832814 74 nips-2004-Harmonising Chorales by Probabilistic Inference
14 0.31804833 182 nips-2004-Synergistic Face Detection and Pose Estimation with Energy-Based Models
15 0.31412286 80 nips-2004-Identifying Protein-Protein Interaction Sites on a Genome-Wide Scale
16 0.31377426 29 nips-2004-Beat Tracking the Graphical Model Way
17 0.31375667 156 nips-2004-Result Analysis of the NIPS 2003 Feature Selection Challenge
18 0.30425939 149 nips-2004-Probabilistic Inference of Alternative Splicing Events in Microarray Data
19 0.29902536 136 nips-2004-On Semi-Supervised Classification
20 0.29125431 193 nips-2004-Theories of Access Consciousness
topicId topicWeight
[(13, 0.052), (15, 0.064), (26, 0.03), (31, 0.013), (33, 0.697), (50, 0.025)]
simIndex simValue paperId paperTitle
1 0.99443275 63 nips-2004-Expectation Consistent Free Energies for Approximate Inference
Author: Manfred Opper, Ole Winther
Abstract: We propose a novel framework for deriving approximations for intractable probabilistic models. This framework is based on a free energy (negative log marginal likelihood) and can be seen as a generalization of adaptive TAP [1, 2, 3] and expectation propagation (EP) [4, 5]. The free energy is constructed from two approximating distributions which encode different aspects of the intractable model, such as single-node constraints and couplings, and are by construction consistent on a chosen set of moments. We test the framework on a difficult benchmark problem with binary variables on fully connected graphs and 2D grid graphs. We find good performance using sets of moments which either specify factorized nodes or a spanning tree on the nodes (structured approximation). Surprisingly, the Bethe approximation gives very inferior results even on grids.
2 0.99338096 139 nips-2004-Optimal Aggregation of Classifiers and Boosting Maps in Functional Magnetic Resonance Imaging
Author: Vladimir Koltchinskii, Manel Martínez-Ramón, Stefan Posse
Abstract: We study a method of optimal data-driven aggregation of classifiers in a convex combination and establish tight upper bounds on its excess risk with respect to a convex loss function under the assumption that the solution of the optimal aggregation problem is sparse. We use a boosting-type algorithm of optimal aggregation to develop aggregate classifiers of activation patterns in fMRI based on locally trained SVM classifiers. The aggregation coefficients are then used to design a “boosting map” of the brain needed to identify the regions with the most significant impact on classification.
3 0.9919576 39 nips-2004-Coarticulation in Markov Decision Processes
Author: Khashayar Rohanimanesh, Robert Platt, Sridhar Mahadevan, Roderic Grupen
Abstract: We investigate an approach for simultaneously committing to multiple activities, each modeled as a temporally extended action in a semi-Markov decision process (SMDP). For each activity we define a set of admissible solutions consisting of the redundant set of optimal policies, and those policies that ascend the optimal state-value function associated with them. A plan is then generated by merging them in such a way that the solutions to the subordinate activities are realized in the set of admissible solutions satisfying the superior activities. We present our theoretical results and empirically evaluate our approach in a simulated domain.
same-paper 4 0.99175656 120 nips-2004-Modeling Conversational Dynamics as a Mixed-Memory Markov Process
Author: Tanzeem Choudhury, Sumit Basu
Abstract: In this work, we quantitatively investigate the ways in which a given person influences the joint turn-taking behavior in a conversation. After collecting an auditory database of social interactions among a group of twenty-three people via wearable sensors (66 hours of data each over two weeks), we apply speech and conversation detection methods to the auditory streams. These methods automatically locate the conversations, determine their participants, and mark which participant was speaking when. We then model the joint turn-taking behavior as a Mixed-Memory Markov Model [1] that combines the statistics of the individual subjects' self-transitions and the partners' cross-transitions. The mixture parameters in this model describe how much each person's individual behavior contributes to the joint turn-taking behavior of the pair. By estimating these parameters, we thus estimate how much influence each participant has in determining the joint turn-taking behavior. We show how this measure correlates significantly with betweenness centrality [2], an independent measure of an individual's importance in a social network. This result suggests that our estimate of conversational influence is predictive of social influence.
5 0.99045551 126 nips-2004-Nearly Tight Bounds for the Continuum-Armed Bandit Problem
Author: Robert D. Kleinberg
Abstract: In the multi-armed bandit problem, an online algorithm must choose from a set of strategies in a sequence of n trials so as to minimize the total cost of the chosen strategies. While nearly tight upper and lower bounds are known in the case when the strategy set is finite, much less is known when there is an infinite strategy set. Here we consider the case when the set of strategies is a subset of R^d, and the cost functions are continuous. In the d = 1 case, we improve on the best-known upper and lower bounds, closing the gap to a sublogarithmic factor. We also consider the case where d > 1 and the cost functions are convex, adapting a recent online convex optimization algorithm of Zinkevich to the sparser feedback model of the multi-armed bandit problem.
6 0.98890311 149 nips-2004-Probabilistic Inference of Alternative Splicing Events in Microarray Data
7 0.98671347 186 nips-2004-The Correlated Correspondence Algorithm for Unsupervised Registration of Nonrigid Surfaces
8 0.98646492 115 nips-2004-Maximum Margin Clustering
9 0.89900768 143 nips-2004-PAC-Bayes Learning of Conjunctions and Classification of Gene-Expression Data
10 0.89898336 80 nips-2004-Identifying Protein-Protein Interaction Sites on a Genome-Wide Scale
11 0.8982175 56 nips-2004-Dynamic Bayesian Networks for Brain-Computer Interfaces
12 0.88990325 32 nips-2004-Boosting on Manifolds: Adaptive Regularization of Base Classifiers
13 0.88784951 72 nips-2004-Generalization Error and Algorithmic Convergence of Median Boosting
14 0.88511878 44 nips-2004-Conditional Random Fields for Object Recognition
15 0.88432449 147 nips-2004-Planning for Markov Decision Processes with Sparse Stochasticity
16 0.88151073 3 nips-2004-A Feature Selection Algorithm Based on the Global Minimization of a Generalization Error Bound
17 0.87339491 166 nips-2004-Semi-supervised Learning via Gaussian Processes
18 0.8729943 38 nips-2004-Co-Validation: Using Model Disagreement on Unlabeled Data to Validate Classification Algorithms
19 0.8724016 64 nips-2004-Experts in a Markov Decision Process
20 0.87035716 204 nips-2004-Variational Minimax Estimation of Discrete Distributions under KL Loss
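
Note on the mixed-memory Markov model in the abstract of paper 120 above: the abstract describes one speaker's next state as a combination of that speaker's self-transition statistics and the partner's cross-transition statistics, weighted by mixture parameters. A minimal sketch of that mixing step might look like the following. It assumes two states per speaker (0 = silent, 1 = speaking); the function name, the transition tables, and the weight 0.7 are all hypothetical illustration values, not the authors' code or their estimated parameters.

```python
def mixed_memory_transition(self_trans, cross_trans, alpha_self,
                            prev_self, prev_partner):
    """Return the distribution over one speaker's next state.

    self_trans[i][k]  : P(next = k | speaker's own previous state = i)
    cross_trans[j][k] : P(next = k | partner's previous state = j)
    alpha_self        : mixture weight on the self-transition term (assumed
                        to lie in [0, 1]); 1 - alpha_self goes to the partner.
    """
    if not 0.0 <= alpha_self <= 1.0:
        raise ValueError("alpha_self must lie in [0, 1]")
    self_row = self_trans[prev_self]
    cross_row = cross_trans[prev_partner]
    # Convex combination of the two conditional distributions.
    return [alpha_self * p + (1.0 - alpha_self) * q
            for p, q in zip(self_row, cross_row)]


# Hypothetical transition tables; each row sums to 1.
SELF_TRANS = [[0.9, 0.1],   # was silent   -> mostly stays silent
              [0.2, 0.8]]   # was speaking -> mostly keeps speaking
CROSS_TRANS = [[0.6, 0.4],  # partner was silent
               [0.95, 0.05]]  # partner was speaking -> speaker tends to stay quiet

# Speaker was talking, partner was silent, 70% weight on own history.
next_dist = mixed_memory_transition(SELF_TRANS, CROSS_TRANS, 0.7,
                                    prev_self=1, prev_partner=0)
```

In this reading, a speaker whose partner's model puts a large weight on the cross-transition term would count as more influential; estimating those weights from data (for example with EM) is outside the scope of this sketch.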