acl acl2012 acl2012-113 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Timo Baumann ; David Schlangen
Abstract: We present a component for incremental speech synthesis (iSS) and a set of applications that demonstrate its capabilities. This component can be used to increase the responsivity and naturalness of spoken interactive systems. While iSS can show its full strength in systems that generate output incrementally, we also discuss how even otherwise unchanged systems may profit from its capabilities.
Reference: text
sentIndex sentText sentNum sentScore
1 We present a component for incremental speech synthesis (iSS) and a set of applications that demonstrate its capabilities. [sent-3, score-0.99]
2 This component can be used to increase the responsivity and naturalness of spoken interactive systems. [sent-4, score-0.351]
3 While iSS can show its full strength in systems that generate output incrementally, we also discuss how even otherwise unchanged systems may profit from its capabilities. [sent-5, score-0.075]
4 1 Introduction Current state of the art in speech synthesis for spoken dialogue systems (SDSs) is for the synthesis component to expect full utterances (in textual form) as input and to deliver an audio stream verbalising this full utterance. [sent-6, score-1.605]
5 At best, timing information is returned as well so that a control component can determine in case of an interruption / barge-in by the user where in the utterance this happened (Edlund, 2008; Matsuyama et al. [sent-7, score-0.45]
6 We want to argue here that providing capabilities to speech synthesis components for dealing with units smaller than full utterances can be beneficial for a whole range of interactive speech-based systems. [sent-9, score-0.813]
7 In the easiest case, incremental synthesis simply reduces the utterance-initial delay before speech output starts, as output already starts when its beginning has been produced. [sent-10, score-0.914]
8 In an otherwise conventional dialogue system, the synthesis module could make it possible to interrupt the output speech stream (e. [sent-11, score-0.727]
9 , when a noise event is detected that makes it likely that the user will not be able to hear what is being said), and continue production when the interruption is over. [sent-13, score-0.126]
10 If other SDS components are adapted more to take advantage of incremental speech synthesis, even more (author block: David Schlangen, University of Bielefeld, Faculty of Linguistics and Literary Studies, Germany) [sent-14, score-0.44]
11 Another, less conventional type of speech-based system that could profit from iSS is “babelfish-like” simultaneous speech-to-speech translation. [sent-20, score-0.067]
12 Our iSS component is built on top of an existing non-incremental synthesis component, MaryTTS (Schröder and Trouvain, 2003), and on an existing architecture for incremental processing, INPROTK (Baumann and Schlangen, 2012). [sent-23, score-0.927]
13 2 Related Work Typically, in current SDSs utterances are generated (either by lookup/template-based generation, or, less commonly, by concept-to-utterance natural language generation (NLG)) and then synthesised in full (McTear, 2002). [sent-27, score-0.152]
14 There is very little work on incremental synthesis (i. [sent-28, score-0.738]
15 , one that would work with units smaller than full utterances). [sent-30, score-0.094]
16 Edlund (2008) outlines some requirements for incremental speech synthesis: to give constant feedback to the dialogue system about what has been delivered, to be interruptible (and possibly continue from that position), and to run in real time. [sent-31, score-0.552]
17 Edlund (2008) also presents a prototype that meets these requirements, but is limited to diphone synthesis that is performed non-incrementally before utterance delivery starts. [sent-32, score-0.734]
18 Skantze and Hjalmarsson (2010) describe a system that generates utterances incrementally (albeit in a WOz environment), allowing earlier components to incrementally produce and revise their hypotheses about the user’s utterance. [sent-34, score-0.226]
19 The system can automatically play hesitations if by the time it has the turn it does not know what to produce yet. [sent-35, score-0.043]
20 Our approach is complementary to this work, as it targets a lower layer, the realisation or synthesis layer. [sent-37, score-0.44]
21 Where their system relies on ‘regular’ speech synthesis which is called on relatively short utterance fragments (and thus pays for the increase in responsiveness with a reduction in synthesis quality, esp. [sent-38, score-1.216]
22 regarding prosody), we aim to incrementalize the speech synthesis component itself. [sent-39, score-0.692]
23 (2011) have presented an incremental formulation for HMM-based speech synthesis. [sent-41, score-0.409]
24 However, their system works offline and is fed by non-incrementally produced phoneme target sequences. [sent-42, score-0.198]
25 1 The code of the toolkit and its iSS component and the demo applications discussed below have been released as open source at http://inprotk. [sent-43, score-0.197]
26 We aim for a fully incremental speech synthesis component that can be integrated into dialogue systems. [sent-46, score-1.133]
27 There is some work on incremental NLG (Kilger and Finkler, 1995; Finkler, 1997; Guhe, 2007); however, that work does not concern itself with the actual synthesis of speech and hence describes only what would generate the input to our component. [sent-47, score-0.849]
28 In that way, target phoneme sequences annotated with durations and pitch contours are generated, in what is called the linguistic pre-processing step. [sent-50, score-0.407]
29 The subsequent synthesis step proper can be executed in one of several ways, with HMM-based and unit-selection synthesis currently being seen as producing the perceptually best results (Taylor, 2009). [sent-51, score-0.953]
30 The former works by first turning the target sequence into a sequence of HMM states; a global optimization then computes a stream of vocoding features that optimize both HMM emission probabilities and continuity constraints (Tokuda et al. [sent-52, score-0.232]
31 Finally, the parameter frames are fed to a vocoder which generates the speech audio signal. [sent-54, score-0.466]
32 Unit-selection, in contrast, searches for the best sequence of (variably sized) units of speech in a large, annotated corpus of recordings, aiming to find a sequence that closely matches the target sequence. [sent-55, score-0.171]
33 (2011) have presented an online formulation of the optimization step in HMM-based synthesis. [sent-57, score-0.027]
34 pitch independently (and outside of the HMM framework) without altering other parameters or deteriorating speech quality. [sent-60, score-0.28]
35 Figure 1: Hierarchic structure of incremental units describing an example utterance as it is being produced during utterance delivery. [sent-61, score-0.808]
36 2 System Architecture Our component works by reducing the aforementioned top-down requirements. [sent-63, score-0.141]
37 For example, not all words of the utterance need to be known to produce the sentence-level intonation (which itself however is necessary to determine pitch contours) as long as a structural outline of the utterance is available. [sent-65, score-0.662]
38 Thus, our component generates its data structures incrementally in a top-down-and-left-to-right fashion with different amounts of pre-planning, using several processing modules that work concurrently. [sent-67, score-0.257]
39 This results in a ‘triangular’ structure (illustrated in Figure 1) where only the absolutely required minimum has to be specified at each level, allowing for later adaptations with few or no recomputations required. [sent-68, score-0.051]
40 As an aside, we observe that our component’s architecture happens to correspond rather closely to Levelt’s (1989) model of human speech production. [sent-69, score-0.159]
41 Levelt distinguishes several, partially independent processing modules (conceptualization, formulation, articulation, see Figure 1) that function incrementally and “in a highly automatic, reflex-like way” (Levelt, 1989, p. [sent-70, score-0.116]
42 In this architecture, incremental processing is treated as the processing of incremental units (IUs), which are the smallest ‘chunks’ of information at a specific level (such as words, or phonemes, as can be seen in Figure 1). [sent-75, score-0.656]
43 The iSS component takes an IU sequence of chunks of words as input (from an NLG component). [sent-79, score-0.141]
44 Crucially, this sequence can then still be modified, through: (a) continuations, which simply link further words to the end of the sequence; or (b) replacements, where elements in the sequence are “unlinked” and other elements are spliced in. [sent-80, score-0.082]
45 Additionally, a chunk can be marked as open; this has the effect of linking to a special hesitation word, which is produced only if it is not replaced (by the NLG) in time with other material. [sent-81, score-0.161]
46 Technically, the representation levels below the chunk level are generated in our component by running MaryTTS’s linguistic preprocessing and converting its output to IU structures. [sent-82, score-0.173]
47 Our component provides two modes of operation: one mode uses MaryTTS’s HMM optimization routines, which non-incrementally solve a large matrix operation and subsequently iteratively optimize the global variance constraint (Toda and Tokuda, 2007). [sent-83, score-0.168]
48 The alternative mode uses the incremental algorithm as proposed by Dutoit et al. [sent-84, score-0.298]
49 In our implementation of this algorithm, HMM emissions are computed with one phoneme of context in both directions; Dutoit et al. [sent-86, score-0.166]
50 (2011) have found this setting to only slightly degrade synthesis quality. [sent-87, score-0.44]
51 The former mode incurs some utterance-initial delay, but switching between alternatives and prosodic alteration can then be performed at virtually no lookahead; the truly incremental mode requires only a little lookahead. [sent-88, score-0.591]
52 The resulting vocoding frames then are attached to their corresponding phoneme units. [sent-89, score-0.449]
53 Figure 2: Example application that showcases just-in-time manipulation of prosodic aspects (tempo and pitch) of the ongoing utterance. [sent-90, score-0.248]
54 Phoneme units then contain all the information needed for the final vocoding step, in an accessible form, which makes possible various manipulations before the final synthesis step. [sent-91, score-0.612]
55 The lowest level module of our component is what may be called a crawling vocoder, which actively moves along the phoneme IU layer, querying each phoneme for its parameter frames one-by-one and producing the corresponding audio via vocoding. [sent-92, score-0.763]
56 The vocoding algorithm is entirely incremental, making it possible to vocode “just-in-time”: only when audio is needed to keep the sound card buffer full does the vocoder query for a next parameter frame. [sent-93, score-0.418]
57 4 Quality of Results As these descriptions should have made clear, there are some elements in the processing steps in our iSS component that aren’t yet fully incremental, such as assigning a sentence-level prosody. [sent-98, score-0.182]
58 The best results are thus achieved if a full utterance is presented to the component initially, which is used for computation of prosody, and of which then elements may be changed (e. [sent-99, score-0.469]
59 It is unavoidable, though, that there can be some “breaks” at the seams where elements are replaced. [sent-102, score-0.041]
60 Moreover, the way feature frames can be modified (as described below) and the incremental HMM optimization method may lead to deviations from the global optimum. [sent-103, score-0.436]
61 However, preliminary evaluation of the component’s prosody given varying amounts of lookahead indicates that degradations are reasonably small. [sent-105, score-0.177]
62 Also, the benefits in naturalness of behaviour enabled by iSS may outweigh the drawback in prosodic quality. [sent-106, score-0.134]
63 4 Interface Demonstrations We will describe the features of iSS, their implementation, their programming interface, and corresponding demo applications in the following subsections. [sent-107, score-0.056]
64 1 Low-Latency Changes to Prosody Pitch and tempo can be adapted on the phoneme IU layer (see Figure 1). [sent-109, score-0.293]
65 Figure 2 shows a demo interface to this functionality. [sent-110, score-0.102]
66 Pitch is determined by a single parameter in the vocoding frames and can be adapted independently of other parameters in the HMM approach. [sent-111, score-0.346]
67 We have implemented the capability to adjust all pitch values in a phoneme by an offset, or to change the values gradually across all frames in the phoneme. [sent-112, score-0.497]
68 (The first feature is showcased in the application in Figure 2; the latter is used to cancel utterance-final pitch changes when a continuation is appended to an ongoing utterance.) [sent-113, score-0.374]
69 Tempo can be adapted by changing the phoneme units’ durations; the units then repeat (or skip) parameter frames (for lengthened or shortened phonemes, respectively) when passing them to the crawling vocoder. [sent-114, score-0.444]
70 Adaptations are conducted with virtually no lookahead, that is, they can be executed even on a phoneme that is currently being output. [sent-115, score-0.239]
71 A new progress field on IUs marks whether the IU’s production is upcoming, ongoing, or completed. [sent-118, score-0.073]
72 Listeners may subscribe to be notified about such progress changes using an update interface on IUs. [sent-119, score-0.077]
73 The applications in Figures 2 and 4 make use of this interface to mark the words of the utterance in bold for completed, and in italic for ongoing words (incidentally, the screenshot in Figure 4 was taken exactly at the boundary between “delete” and “the”). [sent-120, score-0.39]
74 3 Low-Latency Switching of Alternatives A major goal of iSS is to change what is being said while the utterance is ongoing. [sent-122, score-0.225]
75 Forward-pointing same-level links (SLLs; Schlangen and Skantze, 2009; Baumann and Schlangen, 2012), as shown in Figure 3, allow alternative utterance paths to be constructed beforehand. [sent-123, score-0.225]
76 Figure 3: Incremental units chained together via forward-pointing same-level links to form an utterance tree. [sent-124, score-0.596]
77 Figure 4: Example application to showcase just-in-time selection between different paths in a complex utterance. [sent-125, score-0.038]
78 Deciding on the actual utterance continuation is a simple re-ranking of the forward SLLs, which can be changed until immediately before the word (or phoneme) in question is uttered. [sent-126, score-0.028]
79 The demo application shown in Figure 4 allows the user to select the path through a fairly complex utterance tree. [sent-127, score-0.322]
80 The user has already decided on the color, but not on the type of piece to be deleted and hence the currently selected plan is to play a hesitation (see below). [sent-128, score-0.211]
81 4 Extension of the Ongoing Utterance In the previous subsection we have shown how alternatives in utterances can be selected with very low latency. [sent-130, score-0.135]
82 Thus, practical applications which use incremental NLG must generate their next steps with some lookahead to avoid stalling the output. [sent-132, score-0.381]
83 However, utterances can be marked as non-final, which results in a special hesitation word being inserted, as explained below. [sent-133, score-0.215]
84 5 Autonomously Performing Disfluencies In a multi-threaded, real-time system, the crawling vocoder may reach the end of synthesis before the NLG component (in its own thread) has been able to add a continuation to the ongoing utterance. [sent-135, score-0.969]
85 To avoid this case, special hesitation words can be inserted at the end of a yet unfinished utterance. [sent-136, score-0.129]
86 If the crawling vocoder nears such a word, a hesitation will be played, unless a continuation is available. [sent-137, score-0.398]
87 In that case, the hesitation is skipped (or aborted if currently ongoing). [sent-138, score-0.17]
88 6 Type-to-Speech A final demo application showcases truly incremental HMM synthesis taken to its most extreme: A text input window is presented, and each word that is typed is treated as a single-word chunk which is immediately sent to the incremental synthesizer. [sent-140, score-1.15]
89 (For this demonstration, synthesis is slowed to half the regular speed, to account for slow typing speeds and to highlight the prosodic improvements when more right context becomes available to iSS. [sent-141, score-0.509]
90 ) A use case with a similar (but probably lower) level of incrementality could be simultaneous speech-to-speech translation, or type-to-speech for people with speech disabilities. [sent-142, score-0.137]
91 5 Conclusions We have presented a component for incremental speech synthesis (iSS) and demonstrated its capabilities with a number of example applications. [sent-143, score-1.041]
92 This component can be used to increase the responsivity and naturalness of spoken interactive systems. [sent-144, score-0.351]
93 While iSS can show its full strengths in systems that also generate output incrementally (a strategy which is currently seeing some renewed attention), we discussed how even otherwise unchanged systems may profit from its capabilities, e. [sent-145, score-0.186]
94 We provide this component in the hope that it will help spur research on incremental natural language generation and more interactive spoken dialogue systems, which so far had to make do with inadequate ways of realising their output. [sent-148, score-0.716]
95 2 Thus, in contrast to (Skantze and Hjalmarsson, 2010), hesitations do not take up any additional time. [sent-149, score-0.043]
96 Automatische Selbstkorrektur bei der inkrementellen Generierung gesprochener Sprache unter Realzeitbedingungen. [sent-175, score-0.055]
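The crawling vocoder (sentences 55-56), tempo adaptation by repeating or skipping frames (sentence 69), and the autonomous hesitations (sentences 84-87) can be illustrated with a minimal Python sketch. This is not the INPROTK or MaryTTS API: the class PhonemeIU, its fields, and the function crawl are hypothetical names introduced only for this illustration, and plain floats stand in for vocoding parameter frames. The generator yields one frame per request, so links and hesitation status are inspected only at the moment the consumer needs more audio.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class PhonemeIU:
    """Hypothetical phoneme-level incremental unit (illustration only)."""
    label: str
    frames: List[float]                     # stand-in for vocoding parameter frames
    is_hesitation: bool = False
    next_iu: Optional["PhonemeIU"] = None   # forward same-level link
    progress: str = "upcoming"              # upcoming | ongoing | completed

    def stretched(self, factor: float) -> List[float]:
        """Repeat (factor > 1) or skip (factor < 1) frames to change tempo."""
        if not self.frames:
            return []
        n = max(1, round(len(self.frames) * factor))
        last = len(self.frames) - 1
        return [self.frames[min(last, int(i / factor))] for i in range(n)]


def crawl(first: PhonemeIU, tempo: float = 1.0) -> Iterator[float]:
    """Yield frames one by one; nothing is computed until the consumer pulls."""
    iu: Optional[PhonemeIU] = first
    while iu is not None:
        # Skip a hesitation entirely if a continuation was linked in time.
        if iu.is_hesitation and iu.next_iu is not None:
            iu = iu.next_iu
            continue
        iu.progress = "ongoing"
        for frame in iu.stretched(tempo):
            yield frame
            # Abort an ongoing hesitation as soon as a continuation arrives.
            if iu.is_hesitation and iu.next_iu is not None:
                break
        iu.progress = "completed"
        iu = iu.next_iu


# Usage: a continuation arrives while the first phoneme is still being produced.
a = PhonemeIU("a", [1.0, 2.0])
hes = PhonemeIU("<hes>", [9.0, 9.0], is_hesitation=True)
a.next_iu = hes
frames = crawl(a)
first_frame = next(frames)           # the sound card buffer asks for one frame
hes.next_iu = PhonemeIU("b", [3.0])  # the NLG thread adds a continuation in time
remaining = list(frames)             # the hesitation frames are never produced
```

Because the generator is lazy, the decision to play, skip, or abort the hesitation is taken as late as possible, which mirrors the just-in-time behaviour described in the text. A real implementation would additionally need thread-safe access to the links, which this sketch leaves out.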
wordName wordTfidf (topN-words)
[('synthesis', 0.44), ('incremental', 0.298), ('utterance', 0.225), ('schlangen', 0.215), ('baumann', 0.194), ('skantze', 0.172), ('vocoding', 0.172), ('pitch', 0.169), ('phoneme', 0.166), ('iss', 0.16), ('dialogue', 0.143), ('component', 0.141), ('hesitation', 0.129), ('ongoing', 0.119), ('frames', 0.111), ('speech', 0.111), ('dutoit', 0.108), ('vocoder', 0.108), ('nlg', 0.096), ('prosody', 0.094), ('marytts', 0.086), ('timo', 0.086), ('utterances', 0.086), ('continuation', 0.086), ('iu', 0.083), ('lookahead', 0.083), ('hmm', 0.081), ('crawling', 0.075), ('audio', 0.072), ('spoken', 0.071), ('incrementally', 0.07), ('prosodic', 0.069), ('delivery', 0.069), ('delay', 0.065), ('edlund', 0.065), ('hjalmarsson', 0.065), ('inprotk', 0.065), ('levelt', 0.065), ('naturalness', 0.065), ('schr', 0.065), ('tokuda', 0.065), ('units', 0.06), ('gabriel', 0.06), ('phonemes', 0.057), ('demo', 0.056), ('tempo', 0.056), ('continuations', 0.056), ('ius', 0.056), ('der', 0.055), ('adaptations', 0.051), ('capabilities', 0.051), ('alternatives', 0.049), ('architecture', 0.048), ('modules', 0.046), ('interface', 0.046), ('sigdial', 0.045), ('conceptualization', 0.043), ('contours', 0.043), ('finkler', 0.043), ('hesitations', 0.043), ('interruption', 0.043), ('intonation', 0.043), ('keiichi', 0.043), ('kilger', 0.043), ('matsuyama', 0.043), ('responsivity', 0.043), ('sdss', 0.043), ('slls', 0.043), ('toda', 0.043), ('trouvain', 0.043), ('production', 0.042), ('user', 0.041), ('profit', 0.041), ('currently', 0.041), ('elements', 0.041), ('layer', 0.04), ('autonomously', 0.038), ('incurs', 0.038), ('prompt', 0.038), ('showcase', 0.038), ('full', 0.034), ('stream', 0.033), ('parameter', 0.032), ('generation', 0.032), ('chunk', 0.032), ('executed', 0.032), ('fed', 0.032), ('progress', 0.031), ('adapted', 0.031), ('interactive', 0.031), ('aside', 0.03), ('athens', 0.029), ('durations', 0.029), ('japan', 0.029), ('mary', 0.028), ('changed', 0.028), ('switching', 0.028), ('optimization', 
0.027), ('truly', 0.026), ('simultaneous', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 113 acl-2012-INPRO_iSS: A Component for Just-In-Time Incremental Speech Synthesis
Author: Timo Baumann ; David Schlangen
Abstract: We present a component for incremental speech synthesis (iSS) and a set of applications that demonstrate its capabilities. This component can be used to increase the responsivity and naturalness of spoken interactive systems. While iSS can show its full strength in systems that generate output incrementally, we also discuss how even otherwise unchanged systems may profit from its capabilities.
2 0.12307908 114 acl-2012-IRIS: a Chat-oriented Dialogue System based on the Vector Space Model
Author: Rafael E. Banchs ; Haizhou Li
Abstract: This system demonstration paper presents IRIS (Informal Response Interactive System), a chat-oriented dialogue system based on the vector space model framework. The system belongs to the class of example-based dialogue systems and builds its chat capabilities on a dual search strategy over a large collection of dialogue samples. Additional strategies allowing for system adaptation and learning implemented over the same vector model space framework are also described and discussed.
3 0.09663789 149 acl-2012-Movie-DiC: a Movie Dialogue Corpus for Research and Development
Author: Rafael E. Banchs
Abstract: This paper describes Movie-DiC, a Movie Dialogue Corpus recently collected for research and development purposes. The collected dataset comprises 132,229 dialogues containing a total of 764,146 turns that have been extracted from 753 movies. Details on how the data collection has been created and how it is structured are provided along with its main statistics and characteristics.
4 0.093977764 14 acl-2012-A Joint Model for Discovery of Aspects in Utterances
Author: Asli Celikyilmaz ; Dilek Hakkani-Tur
Abstract: We describe a joint model for understanding user actions in natural language utterances. Our multi-layer generative approach uses both labeled and unlabeled utterances to jointly learn aspects regarding utterance’s target domain (e.g. movies), intention (e.g., finding a movie) along with other semantic units (e.g., movie name). We inject information extracted from unstructured web search query logs as prior information to enhance the generative process of the natural language utterance understanding model. Using utterances from five domains, our approach shows up to 4.5% improvement on domain and dialog act performance over cascaded approach in which each semantic component is learned sequentially and a supervised joint learning model (which requires fully labeled data).
5 0.077340871 16 acl-2012-A Nonparametric Bayesian Approach to Acoustic Model Discovery
Author: Chia-ying Lee ; James Glass
Abstract: We investigate the problem of acoustic modeling in which prior language-specific knowledge and transcribed data are unavailable. We present an unsupervised model that simultaneously segments the speech, discovers a proper set of sub-word units (e.g., phones) and learns a Hidden Markov Model (HMM) for each induced acoustic unit. Our approach is formulated as a Dirichlet process mixture model in which each mixture is an HMM that represents a sub-word unit. We apply our model to the TIMIT corpus, and the results demonstrate that our model discovers sub-word units that are highly correlated with English phones and also produces better segmentation than the state-of-the-art unsupervised baseline. We test the quality of the learned acoustic models on a spoken term detection task. Compared to the baselines, our model improves the relative precision of top hits by at least 22.1% and outperforms a language-mismatched acoustic model.
6 0.062452532 211 acl-2012-Using Rejuvenation to Improve Particle Filtering for Bayesian Word Segmentation
7 0.058351357 119 acl-2012-Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese
8 0.058300123 88 acl-2012-Exploiting Social Information in Grounded Language Learning via Grammatical Reduction
9 0.055224102 95 acl-2012-Fast Syntactic Analysis for Statistical Language Modeling via Substructure Sharing and Uptraining
10 0.055095673 24 acl-2012-A Web-based Evaluation Framework for Spatial Instruction-Giving Systems
11 0.052098524 59 acl-2012-Corpus-based Interpretation of Instructions in Virtual Environments
12 0.045310434 143 acl-2012-Mixing Multiple Translation Models in Statistical Machine Translation
13 0.044852167 3 acl-2012-A Class-Based Agreement Model for Generating Accurately Inflected Translations
14 0.04279029 41 acl-2012-Bootstrapping a Unified Model of Lexical and Phonetic Acquisition
15 0.040610399 177 acl-2012-Sentence Dependency Tagging in Online Question Answering Forums
16 0.040436458 43 acl-2012-Building Trainable Taggers in a Web-based, UIMA-Supported NLP Workbench
17 0.040420145 74 acl-2012-Discriminative Pronunciation Modeling: A Large-Margin, Feature-Rich Approach
18 0.039858259 13 acl-2012-A Graphical Interface for MT Evaluation and Error Analysis
19 0.037648022 153 acl-2012-Named Entity Disambiguation in Streaming Data
20 0.037628345 201 acl-2012-Towards the Unsupervised Acquisition of Discourse Relations
topicId topicWeight
[(0, -0.112), (1, 0.028), (2, -0.024), (3, 0.015), (4, -0.009), (5, 0.088), (6, 0.034), (7, 0.008), (8, 0.009), (9, 0.069), (10, -0.083), (11, 0.04), (12, -0.097), (13, 0.087), (14, -0.095), (15, 0.021), (16, -0.07), (17, 0.11), (18, 0.053), (19, -0.05), (20, -0.009), (21, -0.0), (22, -0.057), (23, -0.237), (24, 0.031), (25, -0.019), (26, 0.057), (27, 0.007), (28, 0.043), (29, 0.006), (30, 0.039), (31, 0.046), (32, -0.021), (33, 0.0), (34, -0.064), (35, 0.037), (36, -0.009), (37, -0.054), (38, -0.012), (39, 0.013), (40, -0.174), (41, 0.147), (42, 0.036), (43, -0.104), (44, 0.011), (45, -0.015), (46, -0.072), (47, -0.076), (48, 0.025), (49, 0.054)]
simIndex simValue paperId paperTitle
same-paper 1 0.96338761 113 acl-2012-INPRO_iSS: A Component for Just-In-Time Incremental Speech Synthesis
Author: Timo Baumann ; David Schlangen
Abstract: We present a component for incremental speech synthesis (iSS) and a set of applications that demonstrate its capabilities. This component can be used to increase the responsivity and naturalness of spoken interactive systems. While iSS can show its full strength in systems that generate output incrementally, we also discuss how even otherwise unchanged systems may profit from its capabilities.
2 0.66824651 114 acl-2012-IRIS: a Chat-oriented Dialogue System based on the Vector Space Model
Author: Rafael E. Banchs ; Haizhou Li
Abstract: This system demonstration paper presents IRIS (Informal Response Interactive System), a chat-oriented dialogue system based on the vector space model framework. The system belongs to the class of example-based dialogue systems and builds its chat capabilities on a dual search strategy over a large collection of dialogue samples. Additional strategies allowing for system adaptation and learning implemented over the same vector model space framework are also described and discussed.
3 0.63272864 149 acl-2012-Movie-DiC: a Movie Dialogue Corpus for Research and Development
Author: Rafael E. Banchs
Abstract: This paper describes Movie-DiC, a Movie Dialogue Corpus recently collected for research and development purposes. The collected dataset comprises 132,229 dialogues containing a total of 764,146 turns that have been extracted from 753 movies. Details on how the data collection has been created and how it is structured are provided along with its main statistics and characteristics.
4 0.52147681 16 acl-2012-A Nonparametric Bayesian Approach to Acoustic Model Discovery
Author: Chia-ying Lee ; James Glass
Abstract: We investigate the problem of acoustic modeling in which prior language-specific knowledge and transcribed data are unavailable. We present an unsupervised model that simultaneously segments the speech, discovers a proper set of sub-word units (e.g., phones) and learns a Hidden Markov Model (HMM) for each induced acoustic unit. Our approach is formulated as a Dirichlet process mixture model in which each mixture is an HMM that represents a sub-word unit. We apply our model to the TIMIT corpus, and the results demonstrate that our model discovers sub-word units that are highly correlated with English phones and also produces better segmentation than the state-of-the-art unsupervised baseline. We test the quality of the learned acoustic models on a spoken term detection task. Compared to the baselines, our model improves the relative precision of top hits by at least 22.1% and outperforms a language-mismatched acoustic model.
5 0.50735843 14 acl-2012-A Joint Model for Discovery of Aspects in Utterances
Author: Asli Celikyilmaz ; Dilek Hakkani-Tur
Abstract: We describe a joint model for understanding user actions in natural language utterances. Our multi-layer generative approach uses both labeled and unlabeled utterances to jointly learn aspects regarding utterance’s target domain (e.g. movies), intention (e.g., finding a movie) along with other semantic units (e.g., movie name). We inject information extracted from unstructured web search query logs as prior information to enhance the generative process of the natural language utterance understanding model. Using utterances from five domains, our approach shows up to 4.5% improvement on domain and dialog act performance over cascaded approach in which each semantic component is learned sequentially and a supervised joint learning model (which requires fully labeled data).
6 0.45027024 211 acl-2012-Using Rejuvenation to Improve Particle Filtering for Bayesian Word Segmentation
7 0.36674407 43 acl-2012-Building Trainable Taggers in a Web-based, UIMA-Supported NLP Workbench
8 0.33840647 165 acl-2012-Probabilistic Integration of Partial Lexical Information for Noise Robust Haptic Voice Recognition
9 0.33515319 138 acl-2012-LetsMT!: Cloud-Based Platform for Do-It-Yourself Machine Translation
10 0.33392352 70 acl-2012-Demonstration of IlluMe: Creating Ambient According to Instant Message Logs
11 0.3170729 129 acl-2012-Learning High-Level Planning from Text
12 0.31632113 88 acl-2012-Exploiting Social Information in Grounded Language Learning via Grammatical Reduction
13 0.30793029 41 acl-2012-Bootstrapping a Unified Model of Lexical and Phonetic Acquisition
14 0.28948486 74 acl-2012-Discriminative Pronunciation Modeling: A Large-Margin, Feature-Rich Approach
15 0.27236211 59 acl-2012-Corpus-based Interpretation of Instructions in Virtual Environments
16 0.26526734 95 acl-2012-Fast Syntactic Analysis for Statistical Language Modeling via Substructure Sharing and Uptraining
17 0.25015351 75 acl-2012-Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing
18 0.24141078 6 acl-2012-A Comprehensive Gold Standard for the Enron Organizational Hierarchy
19 0.24010536 23 acl-2012-A Two-step Approach to Sentence Compression of Spoken Utterances
20 0.23812082 160 acl-2012-Personalized Normalization for a Multilingual Chat System
topicId topicWeight
[(25, 0.022), (26, 0.037), (28, 0.029), (30, 0.021), (37, 0.035), (39, 0.045), (59, 0.016), (74, 0.039), (82, 0.019), (84, 0.021), (85, 0.03), (90, 0.081), (92, 0.042), (94, 0.025), (98, 0.41), (99, 0.054)]
simIndex simValue paperId paperTitle
same-paper 1 0.82236195 113 acl-2012-INPRO_iSS: A Component for Just-In-Time Incremental Speech Synthesis
Author: Timo Baumann ; David Schlangen
Abstract: We present a component for incremental speech synthesis (iSS) and a set of applications that demonstrate its capabilities. This component can be used to increase the responsivity and naturalness of spoken interactive systems. While iSS can show its full strength in systems that generate output incrementally, we also discuss how even otherwise unchanged systems may profit from its capabilities.
2 0.78902596 24 acl-2012-A Web-based Evaluation Framework for Spatial Instruction-Giving Systems
Author: Srinivasan Janarthanam ; Oliver Lemon ; Xingkun Liu
Abstract: We demonstrate a web-based environment for development and testing of different pedestrian route instruction-giving systems. The environment contains a City Model, a TTS interface, a game-world, and a user GUI including a simulated street-view. We describe the environment and components, the metrics that can be used for the evaluation of pedestrian route instruction-giving systems, and the shared challenge which is being organised using this environment.
3 0.53395176 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures
Author: Danilo Croce ; Alessandro Moschitti ; Roberto Basili ; Martha Palmer
Abstract: In this paper, we propose innovative representations for automatic classification of verbs according to mainstream linguistic theories, namely VerbNet and FrameNet. First, syntactic and semantic structures capturing essential lexical and syntactic properties of verbs are defined. Then, we design advanced similarity functions between such structures, i.e., semantic tree kernel functions, for exploiting distributional and grammatical information in Support Vector Machines. The extensive empirical analysis on VerbNet class and frame detection shows that our models capture meaningful syntactic/semantic structures, which allows for improving the state-of-the-art.
4 0.49294701 167 acl-2012-QuickView: NLP-based Tweet Search
Author: Xiaohua Liu ; Furu Wei ; Ming Zhou ; QuickView Team Microsoft
Abstract: Tweets have become a comprehensive repository for real-time information. However, it is often hard for users to quickly get information they are interested in from tweets, owing to the sheer volume of tweets as well as their noisy and informal nature. We present QuickView, an NLP-based tweet search platform to tackle this issue. Specifically, it exploits a series of natural language processing technologies, such as tweet normalization, named entity recognition, semantic role labeling, sentiment analysis, tweet classification, to extract useful information, i.e., named entities, events, opinions, etc., from a large volume of tweets. Then, non-noisy tweets, together with the mined information, are indexed, on top of which two brand new scenarios are enabled, i.e., categorized browsing and advanced search, allowing users to effectively access either the tweets or fine-grained information they are interested in.
5 0.30652511 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base
Author: Gerard de Melo ; Gerhard Weikum
Abstract: We present UWN, a large multilingual lexical knowledge base that describes the meanings and relationships of words in over 200 languages. This paper explains how link prediction, information integration and taxonomy induction methods have been used to build UWN based on WordNet and extend it with millions of named entities from Wikipedia. We additionally introduce extensions to cover lexical relationships, frame-semantic knowledge, and language data. An online interface provides human access to the data, while a software API enables applications to look up over 16 million words and names.
6 0.30638814 191 acl-2012-Temporally Anchored Relation Extraction
7 0.30417433 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning
8 0.30296302 130 acl-2012-Learning Syntactic Verb Frames using Graphical Models
9 0.3017059 21 acl-2012-A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle
10 0.30159217 63 acl-2012-Cross-lingual Parse Disambiguation based on Semantic Correspondence
11 0.30150843 72 acl-2012-Detecting Semantic Equivalence and Information Disparity in Cross-lingual Documents
12 0.30112088 219 acl-2012-langid.py: An Off-the-shelf Language Identification Tool
13 0.30025735 156 acl-2012-Online Plagiarized Detection Through Exploiting Lexical, Syntax, and Semantic Information
14 0.30014434 175 acl-2012-Semi-supervised Dependency Parsing using Lexical Affinities
15 0.29972303 146 acl-2012-Modeling Topic Dependencies in Hierarchical Text Categorization
16 0.29965311 136 acl-2012-Learning to Translate with Multiple Objectives
17 0.29943454 84 acl-2012-Estimating Compact Yet Rich Tree Insertion Grammars
18 0.29910037 40 acl-2012-Big Data versus the Crowd: Looking for Relationships in All the Right Places
19 0.29897413 29 acl-2012-Assessing the Effect of Inconsistent Assessors on Summarization Evaluation
20 0.29866168 159 acl-2012-Pattern Learning for Relation Extraction with a Hierarchical Topic Model