acl acl2011 acl2011-303 knowledge-graph by maker-knowledge-mining

303 acl-2011-Tier-based Strictly Local Constraints for Phonology

Source: pdf

Author: Jeffrey Heinz ; Chetan Rawal ; Herbert G. Tanner

Abstract: Beginning with Goldsmith (1976), the phonological tier has a long history in phonological theory to describe non-local phenomena. This paper defines a class of formal languages, the Tier-based Strictly Local languages, which begin to describe such phenomena. Then this class is located within the Subregular Hierarchy (McNaughton and Papert, 1971). It is found that these languages contain the Strictly Local languages, are star-free, are incomparable with other known sub-star-free classes, and have other interesting properties.

Reference: text

Summary: the most important sentenses genereted by tfidf model

sentIndex sentText sentNum sentScore

1 edu , , Abstract Beginning with Goldsmith (1976), the phonological tier has a long history in phonological theory to describe non-local phenomena. [sent-3, score-0.668]

2 This paper defines a class of formal languages, the Tier-based Strictly Local languages, which begin to describe such phenomena. [sent-4, score-0.075]

3 It is found that these languages contain the Strictly Local languages, are star-free, are incomparable with other known sub-star-free classes, and have other interesting properties. [sent-6, score-0.189]

4 1 Introduction The phonological tier is a level of representation where not all speech sounds are present. [sent-7, score-0.502]

5 For example, the vowel tier of the Finnish word p¨ aiv a¨ a¨ ‘Hello’ is simply the vowels in order without the consonants: a¨i a¨ a¨. [sent-8, score-0.567]

6 Computational work exists which incorporates and formalizes phonological tiers (Kornai, 1994; Bird, 1995; Eisner, 1997). [sent-11, score-0.147]

7 However, there is no work of which the authors are aware that 58 addresses the expressivity or properties of tier-based patterns in terms of formal language theory. [sent-13, score-0.098]

8 This paper begins to fill this gap by defining TierBased Strictly Local (TSL) languages, which generalize the Strictly Local languages (McNaughton and Papert, 1971). [sent-14, score-0.205]

9 It is shown that TSL languages are necessarily star-free, but are incomparable with other known sub-star-free classes, and that natural groups oflanguages within the class are string extension learnable (Heinz, 2010b; Kasprzik and K ¨otzing, 2010). [sent-15, score-0.325]

10 Section 3 reviews major subregular classes and their relationships. [sent-18, score-0.185]

11 Section 4 defines the TSL languages, relates them to known subregular classes, and section 5 discusses the results. [sent-19, score-0.145]

12 Let Σn, Σ≤n, Σ∗ denote all sequences over this alphabet of length n, of length less than or equal to n, and of any finite length, respectively. [sent-23, score-0.069]

13 The empty string is denoted λ and |w| denotes tvheley length mofp pwtyor sdtr w. [sent-24, score-0.064]

14 a nTcehe, |caoanacaa|tenation of two languages L1L2 = {uv : u ∈ L1 and v ∈ L2}. [sent-29, score-0.159]

15 A language is star-free iff there is a GRE defining it which contains no instances of the Kleene star (*). [sent-46, score-0.15]

16 It is well known that the star-free languages (1) are a proper subset of the regular languages, (2) are closed under Boolean operations, and (3) have multiple characterizations, including logical and algebraic ones (McNaughton and Papert, 1971). [sent-47, score-0.277]

17 String u is a factor of string w iff ∃x, y ∈ Σ∗ sucSht tinhagt w = xuy. [sent-48, score-0.245]

18 The domain Fk bisc generalized to languages L ⊆ Σ∗ in the usual way: Fk(L) = ∪w∈LFk(w). [sent-53, score-0.159]

19 Fk,t(w) = {(u, n) : u is a k-factor of w and n = |w|u iff |w|u < t else n = t} For example F2,3 (aaaaab) = {(aa, 3) , (ab, 1)} . [sent-55, score-0.15]

20 r By definition λ is a subsequence of every string in Σ∗. [sent-61, score-0.129]

21 This paper establishes the TSL class and its place in the figure. [sent-67, score-0.076]

22 The Locally Testable (LT) languages and the Strictly Piecewise (SP) languages are discussed by Rogers and Pullum (to ap- pear) and Rogers et al. [sent-72, score-0.318]

23 Definition 1 A language L is Strictly k-Local iff there exists a finite set S ⊆ Fk ( ⋊ Σ∗ ⋉ ) such that L = {w ∈ Σ∗ : Fk(⋊w⋉) ⊆ S} The symbols ⋊ and ⋉ invoke left and right word boundaries, respectively. [sent-76, score-0.219]

24 A language is said to be Strictly Local iff there is some k for which it is Strictly k-Local. [sent-77, score-0.15]

25 d The elements of S can be thought of as the permissible k-factors and the elements in Fk ( ⋊ Σ∗ ⋉ ) S are the forbidden k-factors. [sent-80, score-0.292]

26 For example, bb⋉ a)n−d ⋊ b are forbidden 2-factors for L = aa∗ (b + c). [sent-81, score-0.209]

27 More generally, any SL language L excludes exactly those words with any forbidden factors; i. [sent-82, score-0.209]

28 , L is the intersection of the complements of sets defined to be those words which contain a forbidden factor. [sent-84, score-0.249]

29 This provides another characterization of SL languages (given below in Theorem 1). [sent-86, score-0.159]

30 Formally, let the container of w ∈ ⋊ Σ∗ ⋉ be C(w) = {u ∈ Σ∗ : w is a factor of ⋊ u⋉ } For example, C( ⋊ a) = aΣ∗. [sent-87, score-0.063]

31 Then there exists a finite set of forbidden factors S¯ ⊆ Fk(⋊Σ∗⋉) such that L = ∩w∈S¯ C(w). [sent-90, score-0.337]

32 Definition 2 A language L is Locally t-Threshold k-Testable iff ∃t, k ∈ N such that ∀w, v ∈ Σ∗, if Fk,t(w) = Fk,t(v) kth ∈en w ∈ Lh ⇔ v ∈ L,. [sent-91, score-0.15]

33 v A language is Locally Threshold Testable iff there is some k and t for which it is Locally t-Threshold k-Testable. [sent-92, score-0.15]

34 Definition 3 A language L is Piecewise k-Testable iff ∃k ∈ N such that ∀w, v ∈ Σ∗, if P≤k(w) = P≤k (v) ∈the Nn w ∈ L th ⇔ v ∈ Lv. [sent-93, score-0.15]

35 A language is Piecewise Testable iff there is some k for which it is Piecewise k-Testable. [sent-94, score-0.15]

36 1 Definition The definition of Tier-based Strictly Local languages is similar to the one for SL languages with the exception that forbidden k-factors only apply to elements on a tier T ⊆ Σ, all other symbols are igneolermede. [sent-97, score-0.946]

37 n Isn o onrad e trie to Td ⊆efin Σe, t ahlel oTthSeLr languages, eit i gisnecessary to introduce an “erasing” function (sometimes called string projection), which erases symbols not on the tier. [sent-98, score-0.064]

38 ET(σ1 · · · σn) = u1 · · · un where ui = σi iff σi ∈ T and ui = λ otherwise. [sent-99, score-0.15]

39 a Ad string u = σ1 · · · σn ∈ ⋊ T∗ ⋉ is a factor on tier T of a string w iff u ·isσ a f∈act ⋊orT of ET(w). [sent-101, score-0.694]

40 ⋉A) language rLe iesn a Tier-based Strictly Local iff it is Strictly k-Local on Tier T for some T Σ and k ∈ N. [sent-104, score-0.15]

41 }E,l eTm =ent {s ,ofc S, are tShe = permissible ck,-cfabc,tbo⋉rs, on t. [sent-106, score-0.083]

42 Entlsem ofen Sts roef F2 ( ⋊T∗ ⋉ ) − S = {bb, cc} are the forbidden factors on t⋉ier) T−. [sent-108, score-0.268]

43 STh =e language athreis t hdees fcorirbbeid idnecnlu fdaecswords like aabaaacaaabaa, but excludes words like aabaaabaaacaa since bb is a forbidden 2-factor on tier T. [sent-109, score-0.63]

44 This example captures the nature of longdistance dissimilation patterns found in phonology (Suzuki, 1998; Frisch et al. [sent-110, score-0.244]

45 Like SL languages, TSL languages can also be characterized in terms of the forbidden factors. [sent-113, score-0.368]

46 Let the tier-based container of w ∈ ⋊T∗ ⋉ be CT(w) = {u ∈ Σ∗ : w is a factor on tier T of ⋊ u⋉ } For example, CT( ⋊ b) = (Σ − T)∗bΣ∗. [sent-114, score-0.448]

47 if w = σ1 · · · σn ∈ ⋊Tb)∗ t =he n(Σ CT(w) = Σ∗σ1(Σ − T)∗σ2(Σ − T)∗ · · · (Σ − In general T)∗σnΣ∗ In the case where w begins (ends) with a word boundary symbol then the first (last) Σ∗ in the previous GRE must be replaced with (Σ − T)∗. [sent-115, score-0.046]

48 Theorem 2 For any L ∈ TSL, let T, k, S be the tier, length, and permissible factors, respecTtively, and S¯ the forbidden factors. [sent-116, score-0.292]

49 PTroof The structure of the proof is identical to the one for Theorem 1. [sent-118, score-0.059]

50 2 Relations to other subregular classes This section establishes that TSL languages properly include SL languages and are properly star-free. [sent-121, score-0.545]

51 Theorems 4 and 5 show that TSL languages are not necessarily LTT nor PT, but Theorem 6 shows that TSL languages are necessarily star-free. [sent-123, score-0.318]

52 Proof Inclusion follows immediately from the definitions by setting the tier T = Σ. [sent-125, score-0.414]

53 The fact that TSL languages properly include SL ones follows from the next theorem. [sent-127, score-0.159]

54 Although TSL languages are neither LTT nor PT, Theorem 6 establishes that they are star-free. [sent-149, score-0.201]

55 Since Sthe ⊆ st Far-free languages are c =los ∩edw ∈under finite intersection and complement, it is sufficient to show that CT(w) is starfree for all w ∈ ⋊ T∗ ⋉ . [sent-154, score-0.268]

56 In the cases where w begins (ends) with a word 61 boundary symbol, the first (last) ∅ in the GRE above t∅ (Tla∅s. [sent-157, score-0.046]

57 Together Theorems 1-4 establish that TSL languages generalize the SL languages in a different way than the LT and LTT languages do (Figure 1). [sent-161, score-0.477]

58 3 Other Properties There are two other properties of TSL languages worth mentioning. [sent-163, score-0.159]

59 First, TSL languages are closed under suffix and prefix. [sent-164, score-0.197]

60 This follows immediately because no word w of any TSL language contains any forbidden factors on the tier and so neither does any prefix or suffix of w. [sent-165, score-0.682]

61 Next, consider that the choice of T ⊆ Σ and k ∈ eNx td,e fcionnes systematic hcela scsheosi coef languages Σwh aicnhd are T NS Lde. [sent-168, score-0.159]

62 gIte sfo wllhoiwchs immediately t hLat LT,k is a string extension class (Heinz, 2010b). [sent-170, score-0.165]

63 LA string extension class is one which can be defined by a function f whose domain is Σ∗ and whose codomain is the set of all finite subsets of some set A. [sent-171, score-0.241]

64 A grammar G is a particular finite subset of A and the language of the grammar is all words which f maps to a subset of G. [sent-172, score-0.069]

65 For LT,k, the grammar can be thought of as the set Fofo permissible factors on tier T and the function is w → Fk ( ⋊ ET(w) ⋉ ). [sent-173, score-0.527]

66 In other words, every twioonrd i sis w mapped to the set of k-factors present on tier T. [sent-174, score-0.385]

67 ) String extension classes have quite a bit of structure, which faciliates learning (Heinz, 2010b; Kasprzik and K ¨otzing, 2010). [sent-176, score-0.078]

68 They are closed under intersection, and have a lattice structure under the partial ordering given by the inclusion relation (⊆). [sent-177, score-0.071]

69 In the case just mentioned, the tier is known in advance. [sent-180, score-0.385]

70 Learners which identify in the limit a class of TSL languages with an unknown tier but known k exist in principle (since such a class is of finite size), but it is unknown whether any such learner is efficient in the size of the input sample. [sent-181, score-0.681]

71 5 Discussion Having established the main results, this section discusses some implications for phonology in general, Optimality Theory in particular, and future research. [sent-182, score-0.111]

72 There are three classes of phonotactic constraints in phonology: local segmental patterns, longdistance segmental patterns, and stress patterns (Heinz, 2007). [sent-183, score-0.485]

73 Long-distance segmental phonotactic patterns are those derived from processes of consonant harmony and disharmony and vowel harmony. [sent-185, score-0.543]

74 Below we show each of these patterns belong to TSL. [sent-186, score-0.057]

75 Phonotactic patterns derived from attested longdistance consonantal assimilation patterns (Rose and Walker, 2004; Hansson, 2001) are SP; on the other hand, phonotactic patterns derived from attested long-distance consonantal dissimilation patterns (Suzuki, 1998) are not (Heinz, 2010a). [sent-188, score-0.447]

76 Assimilation is obtained by forbidding disagreeing factors on the tier. [sent-190, score-0.171]

77 For example, forbidding lr and rl on the liquid tier T = {l, r} yields only words which do not contain bTot =h [l] arn}d y [r]. [sent-191, score-0.501]

78 D onislysim wiolartdiosn w hisi ohb dtaoin neodt by tfaoirnbidding agreeing factors on the tier; e. [sent-192, score-0.059]

79 forbidding ll and rr on the liquid tier yields a language of the same character as LD. [sent-194, score-0.501]

80 The phonological literature distinguishes three kinds ofvowel harmony patterns: those without neutral vowels, those with opaque vowels and those with transparent vowels (Bakovi c´, 2000; Nevins, 2010). [sent-195, score-0.462]

81 Formally, vowel harmony patterns without neutral vowels are the same as assimilatory consonant harmony. [sent-196, score-0.409]

82 For example, a case of back harmony can be described by forbidding disagreeing factors {iu, io, ¨o u, ¨o o, ui, u o¨, oi, o o¨} on the vowel tier {Ti ={i, o¨,u,o}. [sent-197, score-0.771]

83 oI,f a ,vo uw o¨,el oiis, opaque, i tt hdeoe vso nwoetl h tiaerrTmo =ni{zie,¨ o b,uu,to begins i vtso own harmony ,d iotm doaiens. [sent-198, score-0.152]

84 nFootr h example if [i] is opaque, this can be described by for- bidding factors {iu, io ¨o u, ¨o o, u o¨, o o¨} on the vowel btieidr. [sent-199, score-0.168]

85 d Tinghu fsa cwtoorrdss { liiuk,e i olu¨ lo oul,ilo o¨ ¨o are acceptable eb e vcoawuseel oi is a permissible factor. [sent-200, score-0.083]

86 If a vowel is transparent, it neither harmonizes nor begins its own harmony domain. [sent-201, score-0.261]

87 by forbidding factors { o¨u, ¨o o, u o¨, o o¨} on ti tieerr ;T i ={¨ o,u,o}. [sent-204, score-0.139]

88 l uT ahere reasonable hypothesis which follows from this discussion is that all humanly possible segmental phonotactic patterns are TSL (since TSL contains SL). [sent-208, score-0.264]

89 The intersection of two languages drawn from the same string extension class is only as expensive as the intersection of finite sets (Heinz, 2010b). [sent-210, score-0.444]

90 nIns athcriso way, ttihnicst w Lork suggests a way to factor OT constraints characterizable as TSL languages in a manner originally suggested by Eisner (1997). [sent-214, score-0.19]

91 Future work includes determining automatatheoretic characterizations of TSL languages and procedures for deciding whether a regular set belongs to TSL, and if so, for what T and k. [sent-215, score-0.225]

92 Also, the erasing function may be used to generalize other subregular classes. [sent-216, score-0.177]

93 6 Conclusion The TSL languages generalize the SL languages and have wide application within phonology. [sent-217, score-0.318]

94 Even though virtually all segmental phonotactic constraints present in the phonologies ofthe world’s languages, both local and non-local, fall into this class, it is striking how highly restricted (sub-star-free) and well-structured the TSL languages are. [sent-218, score-0.411]

95 Neutral vowels in hungarian vowel harmony: An autosegmental interpretation. [sent-249, score-0.218]

96 Information theoretic approaches to phonological structure: the case of Finnish vowel harmony. [sent-276, score-0.226]

97 A maximum en- tropy model of phonotactics and phonotactic learning. [sent-293, score-0.111]

98 A simple proof that Optimality Theory is computationally intractable. [sent-316, score-0.059]

99 Aural pattern recognition experiments and the subregular hi- erarchy. [sent-371, score-0.145]

100 In Christian piecewise and testable in the strict Ebert, Gerhard J ¨ager, and Jens Michaelis, editors, The Mathematics ofLanguage, vol- 6149 of Lecture Notes pages 255–265. [sent-376, score-0.224]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('tsl', 0.489), ('tier', 0.385), ('fk', 0.215), ('forbidden', 0.209), ('strictly', 0.166), ('heinz', 0.164), ('languages', 0.159), ('iff', 0.15), ('subregular', 0.145), ('sl', 0.143), ('theorem', 0.129), ('phonological', 0.117), ('testable', 0.113), ('phonotactic', 0.111), ('phonology', 0.111), ('piecewise', 0.111), ('vowel', 0.109), ('harmony', 0.106), ('ld', 0.098), ('segmental', 0.096), ('rogers', 0.096), ('optimality', 0.085), ('permissible', 0.083), ('goldsmith', 0.08), ('mcnaughton', 0.08), ('forbidding', 0.08), ('ltt', 0.074), ('garland', 0.073), ('vowels', 0.073), ('finite', 0.069), ('consonant', 0.064), ('papert', 0.064), ('string', 0.064), ('ct', 0.063), ('jeffrey', 0.061), ('factors', 0.059), ('proof', 0.059), ('patterns', 0.057), ('opaque', 0.055), ('theory', 0.049), ('prince', 0.048), ('clements', 0.048), ('kasprzik', 0.048), ('begins', 0.046), ('local', 0.045), ('aa', 0.044), ('locally', 0.044), ('inquiry', 0.043), ('algebraic', 0.042), ('establishes', 0.042), ('formal', 0.041), ('intersection', 0.04), ('classes', 0.04), ('gre', 0.04), ('longdistance', 0.04), ('regular', 0.038), ('closed', 0.038), ('extension', 0.038), ('transparent', 0.038), ('aabaaacaaabaa', 0.036), ('archangeli', 0.036), ('autosegmental', 0.036), ('codomain', 0.036), ('dissimilation', 0.036), ('fofo', 0.036), ('frisch', 0.036), ('gres', 0.036), ('liquid', 0.036), ('rawal', 0.036), ('bb', 0.036), ('finnish', 0.036), ('definition', 0.034), ('class', 0.034), ('thesis', 0.033), ('inclusion', 0.033), ('editors', 0.033), ('container', 0.032), ('assimilation', 0.032), ('erasing', 0.032), ('bakovi', 0.032), ('otzing', 0.032), ('disagreeing', 0.032), ('typological', 0.032), ('subsequence', 0.031), ('ab', 0.031), ('factor', 0.031), ('ot', 0.031), ('bc', 0.03), ('incomparable', 0.03), ('tiers', 0.03), ('theorems', 0.03), ('abc', 0.03), ('enx', 0.03), ('herbert', 0.03), ('pullum', 0.03), ('delaware', 0.03), ('cambridge', 0.029), ('immediately', 0.029), ('hayes', 0.028), ('characterizations', 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999893 303 acl-2011-Tier-based Strictly Local Constraints for Phonology

Author: Jeffrey Heinz ; Chetan Rawal ; Herbert G. Tanner

2 0.067098789 239 acl-2011-P11-5002 k2opt.pdf

Author: empty-author

Abstract: unkown-abstract

3 0.061300457 335 acl-2011-Why Initialization Matters for IBM Model 1: Multiple Optima and Non-Strict Convexity

Author: Kristina Toutanova ; Michel Galley

Abstract: Contrary to popular belief, we show that the optimal parameters for IBM Model 1 are not unique. We demonstrate that, for a large class of words, IBM Model 1 is indifferent among a continuum of ways to allocate probability mass to their translations. We study the magnitude of the variance in optimal model parameters using a linear programming approach as well as multiple random trials, and demonstrate that it results in variance in test set log-likelihood and alignment error rate.

4 0.045538485 244 acl-2011-Peeling Back the Layers: Detecting Event Role Fillers in Secondary Contexts

Author: Ruihong Huang ; Ellen Riloff

Abstract: The goal of our research is to improve event extraction by learning to identify secondary role filler contexts in the absence of event keywords. We propose a multilayered event extraction architecture that progressively “zooms in” on relevant information. Our extraction model includes a document genre classifier to recognize event narratives, two types of sentence classifiers, and noun phrase classifiers to extract role fillers. These modules are organized as a pipeline to gradually zero in on event-related information. We present results on the MUC-4 event extraction data set and show that this model performs better than previous systems.

5 0.043754779 157 acl-2011-I Thou Thee, Thou Traitor: Predicting Formal vs. Informal Address in English Literature

Author: Manaal Faruqui ; Sebastian Pado

Abstract: In contrast to many languages (like Russian or French), modern English does not distinguish formal and informal (“T/V”) address overtly, for example by pronoun choice. We describe an ongoing study which investigates to what degree the T/V distinction is recoverable in English text, and with what textual features it correlates. Our findings are: (a) human raters can label English utterances as T or V fairly well, given sufficient context; (b), lexical cues can predict T/V almost at human level.

6 0.042912643 311 acl-2011-Translationese and Its Dialects

7 0.041356012 106 acl-2011-Dual Decomposition for Natural Language Processing

8 0.04014834 10 acl-2011-A Discriminative Model for Joint Morphological Disambiguation and Dependency Parsing

9 0.037741978 154 acl-2011-How to train your multi bottom-up tree transducer

10 0.037355635 103 acl-2011-Domain Adaptation by Constraining Inter-Domain Variability of Latent Feature Representation

11 0.035389766 259 acl-2011-Rare Word Translation Extraction from Aligned Comparable Documents

12 0.035021313 234 acl-2011-Optimal Head-Driven Parsing Complexity for Linear Context-Free Rewriting Systems

13 0.033720355 189 acl-2011-K-means Clustering with Feature Hashing

14 0.033677809 75 acl-2011-Combining Morpheme-based Machine Translation with Post-processing Morpheme Prediction

15 0.033562627 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization

16 0.033393916 339 acl-2011-Word Alignment Combination over Multiple Word Segmentation

17 0.030914096 27 acl-2011-A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging

18 0.030681588 93 acl-2011-Dealing with Spurious Ambiguity in Learning ITG-based Word Alignment

19 0.030522196 250 acl-2011-Prefix Probability for Probabilistic Synchronous Context-Free Grammars

20 0.028309336 323 acl-2011-Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.082), (1, -0.003), (2, -0.018), (3, -0.013), (4, -0.005), (5, 0.008), (6, 0.005), (7, -0.003), (8, -0.011), (9, 0.03), (10, 0.011), (11, 0.016), (12, -0.015), (13, 0.046), (14, -0.006), (15, -0.019), (16, -0.006), (17, -0.005), (18, 0.003), (19, -0.001), (20, 0.029), (21, -0.009), (22, 0.006), (23, -0.022), (24, -0.011), (25, -0.019), (26, -0.03), (27, -0.011), (28, 0.031), (29, -0.009), (30, 0.044), (31, 0.061), (32, 0.021), (33, 0.018), (34, 0.028), (35, 0.051), (36, -0.003), (37, 0.004), (38, -0.077), (39, 0.039), (40, -0.038), (41, -0.056), (42, -0.015), (43, -0.014), (44, 0.021), (45, -0.017), (46, -0.069), (47, 0.0), (48, 0.018), (49, 0.036)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.91705537 303 acl-2011-Tier-based Strictly Local Constraints for Phonology

Author: Jeffrey Heinz ; Chetan Rawal ; Herbert G. Tanner

2 0.66178173 239 acl-2011-P11-5002 k2opt.pdf

Author: empty-author

Abstract: unkown-abstract

3 0.55204362 323 acl-2011-Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Author: Dipanjan Das ; Slav Petrov

Abstract: We describe a novel approach for inducing unsupervised part-of-speech taggers for languages that have no labeled training data, but have translated text in a resource-rich language. Our method does not assume any knowledge about the target language (in particular no tagging dictionary is assumed), making it applicable to a wide array of resource-poor languages. We use graph-based label propagation for cross-lingual knowledge transfer and use the projected labels as features in an unsupervised model (BergKirkpatrick et al., 2010). Across eight European languages, our approach results in an average absolute improvement of 10.4% over a state-of-the-art baseline, and 16.7% over vanilla hidden Markov models induced with the Expectation Maximization algorithm.

4 0.52001435 149 acl-2011-Hierarchical Reinforcement Learning and Hidden Markov Models for Task-Oriented Natural Language Generation

Author: Nina Dethlefs ; Heriberto Cuayahuitl

Abstract: Surface realisation decisions in language generation can be sensitive to a language model, but also to decisions of content selection. We therefore propose the joint optimisation of content selection and surface realisation using Hierarchical Reinforcement Learning (HRL). To this end, we suggest a novel reward function that is induced from human data and is especially suited for surface realisation. It is based on a generation space in the form of a Hidden Markov Model (HMM). Results in terms of task success and human-likeness suggest that our unified approach performs better than greedy or random baselines.

5 0.50966257 234 acl-2011-Optimal Head-Driven Parsing Complexity for Linear Context-Free Rewriting Systems

Author: Pierluigi Crescenzi ; Daniel Gildea ; Andrea Marino ; Gianluca Rossi ; Giorgio Satta

Abstract: We study the problem offinding the best headdriven parsing strategy for Linear ContextFree Rewriting System productions. A headdriven strategy must begin with a specified righthand-side nonterminal (the head) and add the remaining nonterminals one at a time in any order. We show that it is NP-hard to find the best head-driven strategy in terms of either the time or space complexity of parsing.

6 0.49934861 124 acl-2011-Exploiting Morphology in Turkish Named Entity Recognition System

7 0.48639202 102 acl-2011-Does Size Matter - How Much Data is Required to Train a REG Algorithm?

8 0.48536852 317 acl-2011-Underspecifying and Predicting Voice for Surface Realisation Ranking

9 0.48507231 219 acl-2011-Metagrammar engineering: Towards systematic exploration of implemented grammars

10 0.4843719 176 acl-2011-Integrating surprisal and uncertain-input models in online sentence comprehension: formal techniques and empirical results

11 0.47866422 342 acl-2011-full-for-print

12 0.47107479 193 acl-2011-Language-independent compound splitting with morphological operations

13 0.46603459 125 acl-2011-Exploiting Readymades in Linguistic Creativity: A System Demonstration of the Jigsaw Bard

14 0.46094057 157 acl-2011-I Thou Thee, Thou Traitor: Predicting Formal vs. Informal Address in English Literature

15 0.45544264 291 acl-2011-SystemT: A Declarative Information Extraction System

16 0.44940633 123 acl-2011-Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation

17 0.43587181 10 acl-2011-A Discriminative Model for Joint Morphological Disambiguation and Dependency Parsing

18 0.42694032 249 acl-2011-Predicting Relative Prominence in Noun-Noun Compounds

19 0.42571053 212 acl-2011-Local Histograms of Character N-grams for Authorship Attribution

20 0.42520264 274 acl-2011-Semi-Supervised Frame-Semantic Parsing for Unknown Predicates

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.018), (17, 0.039), (26, 0.012), (31, 0.011), (37, 0.06), (39, 0.036), (41, 0.052), (55, 0.032), (59, 0.035), (72, 0.021), (75, 0.454), (91, 0.035), (96, 0.106), (97, 0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.83418822 303 acl-2011-Tier-based Strictly Local Constraints for Phonology

Author: Jeffrey Heinz ; Chetan Rawal ; Herbert G. Tanner

2 0.60718668 299 acl-2011-The Arabic Online Commentary Dataset: an Annotated Dataset of Informal Arabic with High Dialectal Content

Author: Omar F. Zaidan ; Chris Callison-Burch

Abstract: The written form of Arabic, Modern Standard Arabic (MSA), differs quite a bit from the spoken dialects of Arabic, which are the true “native” languages of Arabic speakers used in daily life. However, due to MSA’s prevalence in written form, almost all Arabic datasets have predominantly MSA content. We present the Arabic Online Commentary Dataset, a 52M-word monolingual dataset rich in dialectal content, and we describe our long-term annotation effort to identify the dialect level (and dialect itself) in each sentence of the dataset. So far, we have labeled 108K sentences, 41% of which as having dialectal content. We also present experimental results on the task of automatic dialect identification, using the collected labels for training and evaluation.

3 0.59345114 113 acl-2011-Efficient Online Locality Sensitive Hashing via Reservoir Counting

Author: Benjamin Van Durme ; Ashwin Lall

Abstract: We describe a novel mechanism called Reservoir Counting for application in online Locality Sensitive Hashing. This technique allows for significant savings in the streaming setting, allowing for maintaining a larger number of signatures, or an increased level of approximation accuracy at a similar memory footprint.

4 0.48091495 146 acl-2011-Goodness: A Method for Measuring Machine Translation Confidence

Author: Nguyen Bach ; Fei Huang ; Yaser Al-Onaizan

Abstract: State-of-the-art statistical machine translation (MT) systems have made significant progress towards producing user-acceptable translation output. However, there is still no efficient way for MT systems to inform users which words are likely translated correctly and how confident it is about the whole sentence. We propose a novel framework to predict wordlevel and sentence-level MT errors with a large number of novel features. Experimental results show that the MT error prediction accuracy is increased from 69.1 to 72.2 in F-score. The Pearson correlation between the proposed confidence measure and the human-targeted translation edit rate (HTER) is 0.6. Improve- ments between 0.4 and 0.9 TER reduction are obtained with the n-best list reranking task using the proposed confidence measure. Also, we present a visualization prototype of MT errors at the word and sentence levels with the objective to improve post-editor productivity.

5 0.44974405 72 acl-2011-Collecting Highly Parallel Data for Paraphrase Evaluation

Author: David Chen ; William Dolan

Abstract: A lack of standard datasets and evaluation metrics has prevented the field of paraphrasing from making the kind of rapid progress enjoyed by the machine translation community over the last 15 years. We address both problems by presenting a novel data collection framework that produces highly parallel text data relatively inexpensively and on a large scale. The highly parallel nature of this data allows us to use simple n-gram comparisons to measure both the semantic adequacy and lexical dissimilarity of paraphrase candidates. In addition to being simple and efficient to compute, experiments show that these metrics correlate highly with human judgments.

6 0.32528365 90 acl-2011-Crowdsourcing Translation: Professional Quality from Non-Professionals

7 0.32052463 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering

8 0.31745765 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing

9 0.31735194 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations

10 0.31729752 126 acl-2011-Exploiting Syntactico-Semantic Structures for Relation Extraction

11 0.31662399 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization

12 0.31624651 58 acl-2011-Beam-Width Prediction for Efficient Context-Free Parsing

13 0.31584224 65 acl-2011-Can Document Selection Help Semi-supervised Learning? A Case Study On Event Extraction

14 0.31578881 145 acl-2011-Good Seed Makes a Good Crop: Accelerating Active Learning Using Language Modeling

15 0.31543496 239 acl-2011-P11-5002 k2opt.pdf

16 0.31536171 196 acl-2011-Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models

17 0.31512064 137 acl-2011-Fine-Grained Class Label Markup of Search Queries

18 0.31494206 119 acl-2011-Evaluating the Impact of Coder Errors on Active Learning

19 0.31456685 15 acl-2011-A Hierarchical Pitman-Yor Process HMM for Unsupervised Part of Speech Induction

20 0.31419939 241 acl-2011-Parsing the Internal Structure of Words: A New Paradigm for Chinese Word Segmentation