acl acl2013 acl2013-140 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Chris Fournier
Abstract: This work proposes a new segmentation evaluation metric, named boundary similarity (B), an inter-coder agreement coefficient adaptation, and a confusion-matrix for segmentation that are all based upon an adaptation of the boundary edit distance in Fournier and Inkpen (2012). Existing segmentation metrics such as Pk, WindowDiff, and Segmentation Similarity (S) are all able to award partial credit for near misses between boundaries, but are biased towards segmentations containing few or tightly clustered boundaries. Despite S’s improvements, its normalization also produces cosmetically high values that overestimate agreement & performance, leading this work to propose a solution.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract This work proposes a new segmentation evaluation metric, named boundary similarity (B), an inter-coder agreement coefficient adaptation, and a confusion-matrix for segmentation that are all based upon an adaptation of the boundary edit distance in Fournier and Inkpen (2012). [sent-3, score-1.942]
2 Existing segmentation metrics such as Pk, WindowDiff, and Segmentation Similarity (S) are all able to award partial credit for near misses between boundaries, but are biased towards segmentations containing few or tightly clustered boundaries. [sent-4, score-1.266]
3 1 Introduction Text segmentation is the task of splitting text into segments by placing boundaries within it. [sent-6, score-0.515]
4 A variety of segmentation granularities, or atomic units, exist, including segmentations at the morpheme (e.g., …). [sent-13, score-0.568]
5 Theoretically, segmentations could also contain varying boundary types, e.g. [sent-29, score-0.65]
6 , two boundary types could differentiate between act and scene breaks in a play. [sent-31, score-0.395]
7 Because of its value to natural language processing, various text segmentation tasks have been automated, such as topical segmentation, for which a variety of automatic segmenters exist (e.g., …). [sent-32, score-0.619]
8 This work addresses how to best select an automatic segmenter and which segmentation metrics are most appropriate to do so. [sent-35, score-0.514]
9 To select an automatic segmenter for a particular task, a variety of segmentation evaluation metrics have been proposed, including Pk (Beeferman and Berger, 1999, pp. [sent-36, score-0.514]
10 Each of these metrics has a variety of flaws: Pk and WindowDiff both under-penalize errors at the beginning of segmentations (Lamprier et al. [sent-40, score-0.341]
11 , 2007) and have a bias towards favouring segmentations with few or tightly-clustered boundaries (Niekrasz and Moore, 2010), while S produces overly optimistic values due to its normalization (shown later). [sent-41, score-0.447]
12 To overcome the flaws of existing text segmentation metrics, this work proposes a new series of metrics derived from an adaptation of boundary edit distance (Fournier and Inkpen, 2012, p. [sent-42, score-1.074]
13 A confusion matrix to interpret segmentation as a classification problem is also proposed, allowing for the computation of information retrieval (IR) metrics such as precision and recall. [sent-45, score-0.54]
14 In this work: §2 reviews existing segmentation metrics; §3 proposes an adaptation of boundary edit distance, a new normalization of it, a new confusion matrix for segmentation, and an inter-coder agreement coefficient adaptation. (Footnote 1: an implementation of boundary edit distance, boundary similarity, B-precision, and B-recall, etc.) [sent-46, score-2.012]
15 2.1 Segmentation Evaluation Many early studies evaluated automatic segmenters using information retrieval (IR) metrics such as precision, recall, etc. [sent-53, score-0.344]
16 These metrics looked at segmentation as a binary classification problem and were very harsh in their comparisons—no credit was awarded for nearly missing a boundary. [sent-54, score-0.508]
17 Near misses occur frequently in segmentation: although manual coders often agree upon the bulk of where segments lie, they frequently disagree upon the exact position of boundaries (Artstein and Poesio, 2008, p. [sent-55, score-0.69]
18 To attempt to overcome this issue, both Passonneau and Litman (1993) and Hearst (1993) conflated multiple manual segmentations into one that contained only those boundaries which the majority of coders agreed upon. [sent-57, score-0.6]
19 IR metrics were then used to compare automatic segmenters to this majority solution. [sent-58, score-0.344]
20 Pevzner and Hearst (2002, pp. 3–4) explain Pk well: a window of size k, where k is half of the mean segment length in the manual segmentation, is slid across both the automatic and manual segmentations. [sent-68, score-0.523]
21 A penalty is awarded when the window's edges fall in the same or differing segments within the manual segmentation and the automatic segmentation disagrees. [sent-69, score-0.821]
22 Measuring the proportion of windows in error allows Pk to penalize a fully missed boundary by k windows, whereas a nearly missed boundary is penalized by the distance that it is offset. [sent-71, score-0.953]
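As an illustration of these window mechanics, the following is a minimal Python sketch of Pk; it is not the reference implementation, and the mass-based input format, the function names, and the rounding used to derive k are assumptions for illustration.

    def segment_labels(masses):
        """Expand segment masses into one label per atomic unit,
        e.g. [3, 2] -> [0, 0, 0, 1, 1]."""
        labels = []
        for seg_id, mass in enumerate(masses):
            labels.extend([seg_id] * mass)
        return labels

    def pk(ref_masses, hyp_masses, k=None):
        """Sketch of Pk (Beeferman and Berger, 1999): slide a window of
        width k and count windows whose edges fall in the same segment in
        one segmentation but in different segments in the other."""
        ref = segment_labels(ref_masses)
        hyp = segment_labels(hyp_masses)
        assert len(ref) == len(hyp), "segmentations must cover the same units"
        if k is None:
            # k is half of the mean segment length in the manual segmentation
            k = max(1, round(len(ref) / len(ref_masses) / 2))
        windows = len(ref) - k
        errors = sum(1 for i in range(windows)
                     if (ref[i] == ref[i + k]) != (hyp[i] == hyp[i + k]))
        return errors / windows

For example, pk([2, 3, 3], [5, 3]) returns 1/7: only the window straddling the missed boundary after unit 2 is penalized.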
23 Pevzner and Hearst (2002, pp. 5–10) identified that Pk: i) penalizes false negatives (FNs)² more than false positives (FPs); ii) does not penalize full misses within k units of a reference boundary; iii) penalizes near misses too harshly in some situations; and iv) is sensitive to internal segment size variance. [sent-74, score-1.021]
24 WindowDiff's (WD) major difference is in how it decides to penalize windows: within a window, if the number of boundaries in the manual segmentation (Mij) differs from the number of boundaries in the automatic segmentation (Aij), then a penalty is given. [sent-77, score-0.88]
25 This change better allowed WD to: i) penalize FPs and FNs more equally³; ii) not skip full misses; iii) less harshly penalize near misses; and iv) reduce its sensitivity to internal segment size variance. [sent-79, score-0.417]
26 $\mathrm{WD}(M,A) = \frac{1}{N-k} \sum_{i=1,\, j=i+k}^{N-k} \big( |M_{ij} - A_{ij}| > 0 \big)$ (1) WD did not, however, solve all of the issues related to window-based segmentation comparison. [sent-80, score-0.313]
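A matching sketch of WindowDiff per Equation 1, counting the boundaries inside each window in both segmentations; this is again illustrative, and the handling of document edges varies between implementations.

    def masses_to_boundaries(masses):
        """Segment masses -> one 0/1 flag per gap between atomic units,
        e.g. [3, 2] -> [0, 0, 1, 0]."""
        flags = []
        for mass in masses[:-1]:
            flags.extend([0] * (mass - 1) + [1])
        flags.extend([0] * (masses[-1] - 1))
        return flags

    def window_diff(ref_masses, hyp_masses, k=2):
        """Sketch of WindowDiff (Equation 1): a window is in error when
        the number of boundaries it contains differs between the manual
        (M) and automatic (A) segmentations."""
        ref = masses_to_boundaries(ref_masses)
        hyp = masses_to_boundaries(hyp_masses)
        assert len(ref) == len(hyp), "segmentations must cover the same units"
        windows = len(ref) - k + 1
        errors = sum(1 for i in range(windows)
                     if sum(ref[i:i + k]) != sum(hyp[i:i + k]))  # |M_ij - A_ij| > 0
        return errors / windows

For example, window_diff([2, 3, 3], [5, 3], k=2) returns 2/6: exactly k windows around the missed boundary are in error, which is how WD penalizes a full miss by k windows.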
27 Instead of using windows, the work proposes a new restricted edit distance called boundary edit distance which differentiates between full and near misses. [sent-85, score-1.17]
28 (Footnote 2: a false negative is a boundary present in the manual but not the automatic segmentation, and the reverse for a false positive.) [sent-88, score-0.551]
29 48) noted that WD interprets a near miss as a FP probabilistically more than as a FN. [sent-91, score-0.376]
30 $S(s_a, s_b, n_t) = 1 - \frac{|\mathrm{edits}(s_a, s_b, n_t)|}{\mathrm{pb}(D)}$ (2), where pb(D) is the number of potential boundaries in document D. Boundary edit distance models full misses as the addition/deletion of a boundary, and near misses as n-wise transpositions. [sent-97, score-0.989]
31 An n-wise transposition is the act of swapping the position of a boundary with an empty position such that it matches a boundary in the segmentation compared against (up to a spanning distance of nt). [sent-98, score-1.18]
32 S also scales the severity of a near miss by the distance over which it is transposed, allowing it to scale the penalty of a near miss much like WD. [sent-99, score-0.95]
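The following is a greedy, single-boundary-type sketch of these ideas, reusing masses_to_boundaries from the WindowDiff sketch above. The published boundary edit distance resolves overlapping transpositions more carefully, and S scales near misses by their span; here each edit counts once, and the function names are assumptions.

    def boundary_edit_ops(a_masses, b_masses, nt=2):
        """Greedy sketch of boundary edit distance for one boundary type:
        exact matches first, then any leftover boundaries within nt
        potential positions of each other are paired as transpositions
        (near misses); the remainder are additions/deletions (full misses)."""
        a = {i for i, f in enumerate(masses_to_boundaries(a_masses)) if f}
        b = {i for i, f in enumerate(masses_to_boundaries(b_masses)) if f}
        matches = a & b
        a_only, b_only = sorted(a - matches), sorted(b - matches)
        transpositions = []
        for p in list(a_only):
            near = [q for q in b_only if abs(q - p) < nt]  # span within nt
            if near:
                q = min(near, key=lambda q: abs(q - p))
                transpositions.append((p, q))
                a_only.remove(p)
                b_only.remove(q)
        return matches, transpositions, a_only + b_only  # last: add/deletes

    def similarity_s(a_masses, b_masses, nt=2):
        """Sketch of S (Equation 2): one minus edits over the number of
        potential boundary positions pb(D) in the document."""
        _, transpositions, add_deletes = boundary_edit_ops(a_masses, b_masses, nt)
        pb = sum(a_masses) - 1
        return 1 - (len(transpositions) + len(add_deletes)) / pb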
33 Although direct interpretation of such coefficients is difficult, they are an invaluable tool when comparing segmentation data that has been collected with differing labels and when estimating the replicability of a study. [sent-108, score-0.435]
34 $\kappa, \pi, \kappa^*, \text{and } \pi^* = \frac{A_a - A_e}{1 - A_e}$ (3) When calculating agreement between manual segmenters, boundaries are considered labels and their positions the decisions. [sent-113, score-0.327]
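All four coefficients in Equation 3 share one chance-corrected form and differ only in how actual agreement (A_a) and expected agreement (A_e) are estimated; a one-line sketch, with the function name an assumption:

    def agreement_coefficient(a_actual, a_expected):
        """kappa, pi, kappa*, and pi* all equal (A_a - A_e) / (1 - A_e);
        only the estimation of A_a and A_e differs between them."""
        return (a_actual - a_expected) / (1 - a_expected)

    # e.g., 80% observed agreement against 50% chance agreement:
    # agreement_coefficient(0.8, 0.5) is approximately 0.6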
35 Hearst (1997, pp. 53–54) attempted to adapt π∗ to award partial credit for near misses by using the percentage agreement metric of Gale et al. [sent-116, score-0.808]
36 (1992, p. 254) to compute actual agreement, which conflates multiple manual segmentations together according to whether a majority of coders agree upon a boundary or not. [sent-118, score-0.957]
37 Unfortunately, such a method of computing agreement grossly inflates results, and “the statistic itself guarantees at least 50% agreement by only pairing off coders against the majority opinion” (Isard and Carletta, 1995, p. [sent-119, score-0.462]
38 Fournier and Inkpen (2012, pp. 154–156) proposed using pairwise mean S for actual agreement to allow inter-coder agreement coefficients to award partial credit for near misses. [sent-122, score-0.819]
39 Unfortunately, because S produces cosmetically high values, it also causes inter-coder agreement coefficients to drastically overestimate actual agreement. [sent-123, score-0.371]
40 3 A New Proposal for Edit-Based Text Segmentation Evaluation In this section, a new boundary-edit-distance-based segmentation metric and a confusion matrix are proposed to solve the deficiencies of S for both segmentation comparison and inter-coder agreement. [sent-125, score-1.426]
41 These edit operations (additions/deletions, substitutions, and n-wise transpositions) are symmetric and operate upon the set of boundaries that occur at each potential boundary position in a pair of segmentations. [sent-131, score-0.762]
42 An example of how these edit operations are applied (see footnote 8) is shown in Figure 1, where a near miss (T), a matching pair of boundaries (M), and two full misses (ADs) are shown with the maximum distance that a transposition can span (nt) set to 2 potential boundaries (i.e., nt = 2). [sent-132, score-1.212]
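The exact segmentations of Figure 1 are not reproduced in this excerpt, but an illustrative pair of 11-unit segmentations with the same tally (one match, one transposition, two ADs at nt = 2) can be checked against the boundary_edit_ops sketch above:

    # Reference boundaries after units 2, 5, 7, 9; hypothesis after units 2, 6.
    ref = [2, 3, 2, 2, 2]
    hyp = [2, 4, 5]
    matches, transpositions, add_deletes = boundary_edit_ops(ref, hyp, nt=2)
    print(len(matches), len(transpositions), len(add_deletes))  # -> 1 1 2
    print(similarity_s(ref, hyp))  # -> 0.7, despite two full misses

That S reports 0.7 here, despite two of the four boundary decisions being full misses, previews the cosmetically high values discussed below.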
43 Importantly, however, the pairs of boundaries between the segmentations can be seen as representing the decisions made and the correctness of those decisions. [sent-136, score-0.43]
44 The transposition is a partially correct decision, or boundary pair. [sent-138, score-0.455]
45 The additions/deletions, however, could be one of two erroneous decisions: to not place an expected boundary (FN), or to place a superfluous boundary (FP). [sent-140, score-0.79]
46 This work proposes assigning a correctness score for each boundary pair/decision (shown in Table 1) and then using the mean of this score as a normalization of boundary edit distance. [sent-141, score-1.129]
47 This interpretation intuitively relates boundary edit distance to coder judgements, making it ideal for calculating actual agreement in inter-coder agreement coefficients and for comparing segmentations. [sent-142, score-0.73]
48 (Footnote 8: a complete explanation of boundary edit distance is detailed in Fournier (2013, Section 4).) [sent-145, score-0.415]
49 (Footnote 9: note that the ADs are close together; if nt > 2, they would be considered a T rather than two ADs, which is one way to award partial credit for near misses.) [sent-146, score-0.408]
50 Table 1: Correctness of boundary pair.
    Pair               Correctness
    Match              1
    Addition/deletion  0
    Transposition      1 − w_t · span(T_e, n_t)
    Substitution       1 − w_s · ord(S_e, T_b)
[sent-147, score-0.395]
51 3.2 Boundary Similarity The new boundary edit distance normalization proposed herein is referred to as boundary similarity (B). [sent-148, score-1.096]
52 This normalization was also chosen because it is equivalent to mean boundary pair correctness and so that it ranges in value from 0 to 1. [sent-151, score-0.531]
53 In the worst case, a segmentation comparison will result in no matches, no near misses, no substitutions, and X full misses, i.e. [sent-152, score-0.569]
54 , $|A_e| = X$ and all other terms in Equation 4 are zero, meaning that: $B = 1 - \frac{X + 0 + 0}{X + 0 + 0 + 0} = 1 - X/X = 1 - 1 = 0$. In the best case, a segmentation comparison will result in X matches, no near misses, no substitutions, and no full misses, i.e. [sent-154, score-0.569]
55 , $|B_M| = X$ and all other terms in Equation 4 are zero, meaning that: $B = 1 - \frac{0 + 0 + 0}{0 + 0 + 0 + X} = 1 - 0/X = 1 - 0 = 1$. For all other scenarios, varying numbers of matches, near misses, substitutions, and full misses will result in values of B between 0 and 1. [sent-156, score-0.55]
56 Equation 4 takes two segmentations (in any order) and the maximum transposition spanning distance (nt). [sent-157, score-0.392]
57 This distance represents the greatest offset between boundary positions that could be considered a near miss and can be used to scale the severity of a near miss. [sent-158, score-1.07]
58 A variety of scaling functions could be used, and this work arbitrarily chooses a simple fraction to represent each transposition’s severity in terms of its distance from its paired boundary over nt plus a constant wt (0 by default), as shown in Equation 5. [sent-159, score-0.563]
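A sketch of B under these definitions, for a single boundary type with substitutions omitted, reusing boundary_edit_ops from above; the per-transposition penalty wt + span/nt follows the verbal description of Equation 5, whose exact printed form is not reproduced in this excerpt.

    def boundary_similarity(a_masses, b_masses, nt=2, wt=0.0):
        """Sketch of boundary similarity (B): mean boundary-pair
        correctness, i.e. one minus the total penalty over the number of
        boundary pairs (matches + transpositions + additions/deletions)."""
        matches, transpositions, add_deletes = boundary_edit_ops(
            a_masses, b_masses, nt)
        penalty = float(len(add_deletes))    # full misses score 0
        for p, q in transpositions:          # near misses earn partial credit
            penalty += wt + abs(p - q) / nt
        pairs = len(matches) + len(transpositions) + len(add_deletes)
        return 1.0 if pairs == 0 else 1 - penalty / pairs

On the Figure-1-style example above, boundary_similarity(ref, hyp) gives 1 − 2.5/4 = 0.375, versus the 0.7 that S reports for the same pair, illustrating how the pair-based denominator avoids S's inflation.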
59 If multiple boundary types are used, then substitution edit operations would occur when one boundary type was confused with another. [sent-162, score-0.983]
60 Assigning each boundary type tb ∈ Tb a number on an ordinal scale, substitutions can be weighted by their distance on this scale over the maximum distance, plus a constant ws (0 by default), as shown in Equation 6. [sent-163, score-0.596]
61 A single similarity value (i.e., B) gives an indication of just how similar one segmentation is to another, but what if one wants to identify some specific attributes of the performance of an automatic segmenter? [sent-170, score-0.351]
62 Is the segmenter confusing one boundary type with another, or is it very precise but has poor recall? [sent-171, score-0.472]
63 This work proposes using a task’s set of boundary types (Tb) and the lack of a boundary (∅) to represent the set of segmentation classes in a boundary classification problem. [sent-173, score-1.539]
64 Using these classes, a confusion matrix (defined in Equation 7) can be created which sums boundary pair correctness so that information-retrieval metrics can be calculated that award partial credit for near misses by scaling edit operations. [sent-174, score-1.252]
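A sketch of that idea for a single boundary type versus ∅, again reusing the helpers above; how a transposition's shortfall is split between the missed and spurious cells is an assumption here, not the paper's Equation 7.

    def b_confusion(ref_masses, hyp_masses, nt=2, wt=0.0):
        """Sum boundary-pair correctness into TP/FP/FN-like cells:
        matches add 1 to TP, a transposition adds its correctness to TP
        (splitting the shortfall between FN and FP), and unpaired
        boundaries count as whole FNs (reference) or FPs (hypothesis)."""
        matches, transpositions, _ = boundary_edit_ops(ref_masses, hyp_masses, nt)
        ref = {i for i, f in enumerate(masses_to_boundaries(ref_masses)) if f}
        hyp = {i for i, f in enumerate(masses_to_boundaries(hyp_masses)) if f}
        tp, fn, fp = float(len(matches)), 0.0, 0.0
        for p, q in transpositions:
            c = 1 - (wt + abs(p - q) / nt)  # partial credit for the near miss
            tp += c
            fn += (1 - c) / 2
            fp += (1 - c) / 2
        fn += len(ref - matches - {p for p, _ in transpositions})
        fp += len(hyp - matches - {q for _, q in transpositions})
        return tp, fp, fn

    def b_precision_recall(ref_masses, hyp_masses, nt=2):
        tp, fp, fn = b_confusion(ref_masses, hyp_masses, nt)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return precision, recall

Micro-averaged variants would sum these cells across documents before dividing.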
65 Fournier and Inkpen (2012, pp. 156–157) adapted four inter-coder agreement formulations provided by Artstein and Poesio (2008) to use S to award partial credit for near misses, but because S produces cosmetically high agreement values they grossly overestimate agreement. [sent-179, score-0.738]
66 This work proposes instead using B (i.e., mean boundary pair correctness over all documents and codings compared) to solve this issue (demonstrated in §5), because it does not over-estimate actual agreement (demonstrated in §4 and §5). [sent-182, score-0.695]
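A sketch of the proposed actual-agreement term as mean pairwise B over one document's codings (extending the averaging across documents is straightforward), plugged into the agreement_coefficient sketch above:

    from itertools import combinations

    def actual_agreement_b(codings, nt=2):
        """A_a as mean pairwise boundary similarity over one document's
        codings (each coding given as a list of segment masses)."""
        pairs = list(combinations(codings, 2))
        return sum(boundary_similarity(a, b, nt) for a, b in pairs) / len(pairs)

    # e.g., three coders of an 11-unit document:
    # actual_agreement_b([[2, 3, 2, 2, 2], [2, 4, 5], [2, 3, 2, 2, 2]])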
67 To that end, this section discusses how each metric interprets a set of hypothetical segmentations of an excerpt of a poem by Coleridge (1816, pp. [sent-184, score-0.429]
68 There are surface cues (e.g., punctuation) which may dictate where a boundary lies, but the imagery, places, times, and subjects of the poem appear to twist and wind like a vision in a dream. [sent-203, score-0.45]
69 Thus, placing a topical boundary in this text is a highly subjective task. [sent-204, score-0.49]
70 One hypothetical topical segmentation of the excerpt is shown in Figure 4. [sent-205, score-0.39]
71 In this section, a variety of contrived automatic segmentations are compared to this manual segmentation to illustrate how each metric reacts to different mistakes. [sent-206, score-0.726]
72 S interprets these segmentations as being quite similar; yet the automatic segmentation is missing a boundary. [sent-209, score-0.639]
73 (Figure 5: False negative.) How would each metric react to an automatic segmentation that is very close to placing the boundaries correctly, but makes the slight mistake of thinking that the segment on waterways (3–5) ends a bit too early? [sent-214, score-0.628]
74 (Figure 6: Near miss.) How would each metric react to an automatic segmentation that adds an additional boundary between lines 8 and 9? [sent-221, score-0.98]
75 This would not be ideal because such a boundary falls in the middle of a cohesive description of a garden, representing a false positive. (Footnote 10: WD is reported as 1 − WD because WD is normally a penalty metric where a value of 0 is ideal, unlike S and B. [sent-222, score-0.395]
76 Additionally, k = 2 for all examples in this section because WD computes k from the manual segmentation m, which does not change in these examples.) [sent-223, score-0.376]
77 In this case, there are two matching boundaries and a pair that does not match, which is arguably preferable to the full miss and one match in Figure 5, but not to the match and near miss in Figure 6. [sent-226, score-0.639]
78 (Figure 7: False positive.) How would each metric react to an automatic segmentation that compensates for its lack of precision by spuriously adding boundaries in clusters around where it thinks that segments should begin or end? [sent-231, score-0.611]
79 Both data sets are linear topical segmentations at the paragraph level with only one boundary type, but that is where their similarities end. [sent-259, score-0.698]
80 For the Moonstone data set, the agreement coefficients for each group of 4–6 coders using S-based π∗ are again overinflated at 0. [sent-266, score-0.388]
81 This normalized frequency is shown per coder in Figure 11 for the Moonstone data set, along with bars indicating the mean and one standard deviation above and below. (Figure 11: Normalized boundaries placed by each coder in the Moonstone data set, with mean ± SD.) [sent-285, score-0.363]
82 The Moonstone data set as a whole does not exhibit coders who behaved similarly, supporting the assertion by B-based π∗ that these coders do not agree well (though pockets of agreement exist). [sent-289, score-0.48]
83 From this single random segmentation, other segmentations can be created with a probability of either placing an offset boundary (i.e. [sent-293, score-0.755]
84 , a near miss) or placing an extra boundary or omitting one (i.e., a full miss). [sent-295, score-0.656]
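An illustrative sketch of such pseudo-coder generation, reusing masses_to_boundaries from above; the paper's exact procedure and probability semantics are not reproduced in this excerpt, so the details below are assumptions.

    import random

    def pseudo_coding(masses, p_near, p_full):
        """Perturb one segmentation: offset each boundary by one position
        with probability p_near (a near miss), then flip each position
        with probability p_full (an extra or omitted boundary, a full miss)."""
        flags = masses_to_boundaries(masses)
        out = flags[:]
        for i, flag in enumerate(flags):
            if flag and random.random() < p_near:
                j = i + random.choice((-1, 1))
                if 0 <= j < len(out) and not out[j]:
                    out[i], out[j] = 0, 1
        for i in range(len(out)):
            if random.random() < p_full:
                out[i] = 1 - out[i]
        return out  # returned as gap flags rather than masses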
85 This is not apparent, however, for S-based π∗ in Figure 9a; as the probability of a full miss increases, agreement appears to rise and varies depending upon the number of pseudo-coders. [sent-299, score-0.359]
86 An ideal segmentation evaluation metric should, in theory, place the three automatic segmenters between the upper and lower bounds in terms of performance if the metrics, and the segmenters, function properly. [sent-309, score-0.718]
87 Despite the difference in the scale of their values, both S and WD performed almost identically, placing the three automatic segmenters between the upper and lower bounds as expected. [sent-311, score-0.395]
88 Significant differences (p < 0.05) were found between all segmenters except between APS–human and MinCut–BayesSeg, and WD could only find significant differences between the automatic segmenters and the upper and lower bounds. [sent-313, score-0.524]
89 (Table 2: Mean performance of 5 segmenters using micro-average B, B-precision (B-P), B-recall (B-R), and B-Fβ-measure (B-F1), along with the associated confusion matrix values. Figure 12: Mean B-precision versus B-recall of 5 automatic segmenters.) [sent-327, score-0.807]
90 These automatic segmenters were developed and performance-tuned using WD; thus it would be expected that they would perform as they did according to WD, but the evaluation using B highlights WD's bias towards sparse segmentations (i.e., those with few or tightly clustered boundaries). [sent-328, score-0.513]
91 Mean B shows an unbiased ranking of these automatic segmenters in terms of the upper and lower bounds. [sent-331, score-0.354]
92 B, then, should be preferred over S and WD for an unbiased segmentation evaluation that assumes that similarity to a human solution is the best measure of performance for a task. [sent-332, score-0.39]
93 7 Conclusions In this work, a new segmentation evaluation metric, referred to as boundary similarity (B), is proposed as an unbiased metric, along with a boundary-edit-distance-based (BED-based) confusion matrix to compute predictably biased IR metrics such as precision and recall. [sent-333, score-0.98]
94 Additionally, a method of adapting inter-coder agreement coefficients to award partial credit for near misses is proposed that uses B as opposed to S for actual agreement so as to not over-estimate agreement. [sent-334, score-1.02]
95 B overcomes the cosmetically high values of S and the bias of WD towards segmentations with few or tightly clustered boundaries, which manifests in this work as a bias towards precision over recall for both WD and S. [sent-335, score-0.448]
96 WD and Pk should not be preferred because their biases do not occur consistently in all scenarios, whereas BED-based IR metrics offer expected biases built upon a consistent, edit-based interpretation of segmentation error. [sent-337, score-0.448]
97 B also allows for an intuitive comparison of boundary pairs between segmentations, as opposed to the window counts of WD or the simplistic edit count normalization of S. [sent-338, score-0.597]
98 When an unbiased segmentation evaluation metric is desired, this work recommends the usage of B and the use of an upper and lower bound to provide context. [sent-339, score-0.466]
99 Otherwise, if the evaluation of a segmentation task requires some biased measure, the predictable bias of IR metrics computed from a BED-based confusion matrix is recommended. [sent-340, score-0.508]
100 8 Future Work Future work includes adapting this work to analyse hierarchical segmentations and using it to attempt to explain the low inter-coder agreement coefficients reported in topical segmentation tasks. [sent-342, score-0.847]
wordName wordTfidf (topN-words)
[('boundary', 0.395), ('segmentation', 0.313), ('wd', 0.26), ('segmentations', 0.255), ('misses', 0.247), ('segmenters', 0.22), ('near', 0.214), ('fournier', 0.164), ('edit', 0.162), ('coders', 0.157), ('agreement', 0.139), ('moonstone', 0.136), ('miss', 0.129), ('boundaries', 0.125), ('hearst', 0.106), ('coder', 0.096), ('coefficients', 0.092), ('pk', 0.091), ('metrics', 0.086), ('kazantseva', 0.084), ('segmenter', 0.077), ('distance', 0.077), ('credit', 0.073), ('inkpen', 0.071), ('confusion', 0.069), ('cosmetically', 0.068), ('khan', 0.068), ('pevzner', 0.068), ('manual', 0.063), ('transposition', 0.06), ('metric', 0.057), ('false', 0.055), ('harshly', 0.055), ('kubla', 0.055), ('poem', 0.055), ('transpositions', 0.055), ('windowdiff', 0.055), ('penalize', 0.053), ('usa', 0.052), ('unbiased', 0.05), ('nt', 0.05), ('correctness', 0.05), ('upon', 0.049), ('award', 0.049), ('stroudsburg', 0.049), ('react', 0.048), ('topical', 0.048), ('substitutions', 0.047), ('placing', 0.047), ('mean', 0.046), ('upper', 0.046), ('litman', 0.045), ('ir', 0.044), ('bounds', 0.044), ('pa', 0.042), ('aps', 0.042), ('mincut', 0.042), ('full', 0.042), ('bayesseg', 0.041), ('coleridge', 0.041), ('lamprier', 0.041), ('niekrasz', 0.041), ('severity', 0.041), ('stargazer', 0.041), ('proposes', 0.041), ('matrix', 0.04), ('normalization', 0.04), ('automatic', 0.038), ('eisenstein', 0.038), ('actual', 0.038), ('equation', 0.038), ('awarded', 0.036), ('drastically', 0.034), ('szpakowicz', 0.034), ('windows', 0.033), ('malioutov', 0.033), ('interprets', 0.033), ('fleiss', 0.033), ('artstein', 0.033), ('interpret', 0.032), ('passonneau', 0.032), ('siegel', 0.031), ('beeferman', 0.031), ('coefficient', 0.031), ('operations', 0.031), ('segments', 0.03), ('ads', 0.03), ('replicability', 0.03), ('partial', 0.029), ('hypothetical', 0.029), ('carletta', 0.028), ('penalty', 0.028), ('similarity', 0.027), ('behaved', 0.027), ('codings', 0.027), ('decree', 0.027), ('favouring', 0.027), ('fertile', 0.027), ('grossly', 0.027), ('intercoder', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000017 140 acl-2013-Evaluating Text Segmentation using Boundary Edit Distance
Author: Chris Fournier
Abstract: This work proposes a new segmentation evaluation metric, named boundary similarity (B), an inter-coder agreement coefficient adaptation, and a confusion-matrix for segmentation that are all based upon an adaptation of the boundary edit distance in Fournier and Inkpen (2012). Existing segmentation metrics such as Pk, WindowDiff, and Segmentation Similarity (S) are all able to award partial credit for near misses between boundaries, but are biased towards segmentations containing few or tightly clustered boundaries. Despite S’s improvements, its normalization also produces cosmetically high values that overestimate agreement & performance, leading this work to propose a solution.
Author: Rohan Ramanath ; Monojit Choudhury ; Kalika Bali ; Rishiraj Saha Roy
Abstract: Query segmentation, like text chunking, is the first step towards query understanding. In this study, we explore the effectiveness of crowdsourcing for this task. Through carefully designed control experiments and Inter Annotator Agreement metrics for analysis of experimental data, we show that crowdsourcing may not be a suitable approach for query segmentation because the crowd seems to have a very strong bias towards dividing the query into roughly equal (often only two) parts. Similarly, in the case of hierarchical or nested segmentation, turkers have a strong preference towards balanced binary trees.
3 0.19523375 123 acl-2013-Discriminative Learning with Natural Annotations: Word Segmentation as a Case Study
Author: Wenbin Jiang ; Meng Sun ; Yajuan Lu ; Yating Yang ; Qun Liu
Abstract: Structural information in web text provides natural annotations for NLP problems such as word segmentation and parsing. In this paper we propose a discriminative learning algorithm to take advantage of the linguistic knowledge in large amounts of natural annotations on the Internet. It utilizes the Internet as an external corpus with massive (although slight and sparse) natural annotations, and enables a classifier to evolve on the large-scaled and real-time updated web text. With Chinese word segmentation as a case study, experiments show that the segmenter enhanced with the Chinese wikipedia achieves sig- nificant improvement on a series of testing sets from different domains, even with a single classifier and local features.
4 0.16913214 82 acl-2013-Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation
Author: Xiaodong Zeng ; Derek F. Wong ; Lidia S. Chao ; Isabel Trancoso
Abstract: This paper presents a semi-supervised Chinese word segmentation (CWS) approach that co-regularizes character-based and word-based models. Similarly to multi-view learning, the “segmentation agreements” between the two different types of view are used to overcome the scarcity of the label information on unlabeled data. The proposed approach trains a character-based and word-based model on labeled data, respectively, as the initial models. Then, the two models are constantly updated using unlabeled examples, where the learning objective is maximizing their segmentation agreements. The agreements are regarded as a set of valuable constraints for regularizing the learning of both models on unlabeled data. The segmentation for an input sentence is decoded by using a joint scoring function combining the two induced models. The evaluation on the Chinese tree bank reveals that our model results in better gains over the state-of-the-art semi-supervised models reported in the literature.
5 0.16098922 274 acl-2013-Parsing Graphs with Hyperedge Replacement Grammars
Author: David Chiang ; Jacob Andreas ; Daniel Bauer ; Karl Moritz Hermann ; Bevan Jones ; Kevin Knight
Abstract: Hyperedge replacement grammar (HRG) is a formalism for generating and transforming graphs that has potential applications in natural language understanding and generation. A recognition algorithm due to Lautemann is known to be polynomial-time for graphs that are connected and of bounded degree. We present a more precise characterization of the algorithm’s complexity, an optimization analogous to binarization of contextfree grammars, and some important implementation details, resulting in an algorithm that is practical for natural-language applications. The algorithm is part of Bolinas, a new software toolkit for HRG processing.
6 0.15445901 128 acl-2013-Does Korean defeat phonotactic word segmentation?
7 0.1353112 80 acl-2013-Chinese Parsing Exploiting Characters
8 0.12893867 193 acl-2013-Improving Chinese Word Segmentation on Micro-blog Using Rich Punctuations
9 0.11486705 7 acl-2013-A Lattice-based Framework for Joint Chinese Word Segmentation, POS Tagging and Parsing
10 0.10837795 173 acl-2013-Graph-based Semi-Supervised Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging
11 0.085274525 73 acl-2013-Broadcast News Story Segmentation Using Manifold Learning on Latent Topic Distributions
12 0.065525293 50 acl-2013-An improved MDL-based compression algorithm for unsupervised word segmentation
13 0.064722471 263 acl-2013-On the Predictability of Human Assessment: when Matrix Completion Meets NLP Evaluation
14 0.060129758 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model
15 0.059340023 121 acl-2013-Discovering User Interactions in Ideological Discussions
16 0.058451917 44 acl-2013-An Empirical Examination of Challenges in Chinese Parsing
17 0.056840651 5 acl-2013-A Decade of Automatic Content Evaluation of News Summaries: Reassessing the State of the Art
18 0.055514064 326 acl-2013-Social Text Normalization using Contextual Graph Random Walks
19 0.055147912 195 acl-2013-Improving machine translation by training against an automatic semantic frame based evaluation metric
20 0.050235625 240 acl-2013-Microblogs as Parallel Corpora
topicId topicWeight
[(0, 0.155), (1, -0.015), (2, -0.095), (3, 0.01), (4, 0.186), (5, -0.067), (6, -0.066), (7, -0.021), (8, -0.049), (9, 0.125), (10, -0.055), (11, 0.098), (12, -0.013), (13, -0.042), (14, -0.089), (15, -0.03), (16, 0.087), (17, 0.026), (18, 0.009), (19, 0.027), (20, -0.015), (21, 0.022), (22, -0.017), (23, -0.02), (24, -0.006), (25, -0.0), (26, -0.078), (27, 0.102), (28, -0.003), (29, 0.072), (30, -0.021), (31, 0.002), (32, -0.057), (33, -0.021), (34, 0.013), (35, -0.107), (36, 0.034), (37, 0.049), (38, -0.087), (39, 0.057), (40, 0.16), (41, 0.019), (42, 0.008), (43, 0.054), (44, -0.005), (45, -0.081), (46, -0.004), (47, -0.019), (48, -0.16), (49, -0.039)]
simIndex simValue paperId paperTitle
same-paper 1 0.96891046 140 acl-2013-Evaluating Text Segmentation using Boundary Edit Distance
Author: Chris Fournier
Abstract: This work proposes a new segmentation evaluation metric, named boundary similarity (B), an inter-coder agreement coefficient adaptation, and a confusion-matrix for segmentation that are all based upon an adaptation of the boundary edit distance in Fournier and Inkpen (2012). Existing segmentation metrics such as Pk, WindowDiff, and Segmentation Similarity (S) are all able to award partial credit for near misses between boundaries, but are biased towards segmentations containing few or tightly clustered boundaries. Despite S’s improvements, its normalization also produces cosmetically high values that overestimate agreement & performance, leading this work to propose a solution.
Author: Rohan Ramanath ; Monojit Choudhury ; Kalika Bali ; Rishiraj Saha Roy
Abstract: Query segmentation, like text chunking, is the first step towards query understanding. In this study, we explore the effectiveness of crowdsourcing for this task. Through carefully designed control experiments and Inter Annotator Agreement metrics for analysis of experimental data, we show that crowdsourcing may not be a suitable approach for query segmentation because the crowd seems to have a very strong bias towards dividing the query into roughly equal (often only two) parts. Similarly, in the case of hierarchical or nested segmentation, turkers have a strong preference towards balanced binary trees.
3 0.73030794 128 acl-2013-Does Korean defeat phonotactic word segmentation?
Author: Robert Daland ; Kie Zuraw
Abstract: Computational models of infant word segmentation have not been tested on a wide range of languages. This paper applies a phonotactic segmentation model to Korean. In contrast to the undersegmentation pattern previously found in English and Russian, the model exhibited more oversegmentation errors and more errors overall. Despite the high error rate, analysis suggested that lexical acquisition might not be problematic, provided that infants attend only to frequently segmented items. 1
4 0.66911137 123 acl-2013-Discriminative Learning with Natural Annotations: Word Segmentation as a Case Study
Author: Wenbin Jiang ; Meng Sun ; Yajuan Lu ; Yating Yang ; Qun Liu
Abstract: Structural information in web text provides natural annotations for NLP problems such as word segmentation and parsing. In this paper we propose a discriminative learning algorithm to take advantage of the linguistic knowledge in large amounts of natural annotations on the Internet. It utilizes the Internet as an external corpus with massive (although slight and sparse) natural annotations, and enables a classifier to evolve on the large-scaled and real-time updated web text. With Chinese word segmentation as a case study, experiments show that the segmenter enhanced with the Chinese wikipedia achieves sig- nificant improvement on a series of testing sets from different domains, even with a single classifier and local features.
5 0.63059074 82 acl-2013-Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation
Author: Xiaodong Zeng ; Derek F. Wong ; Lidia S. Chao ; Isabel Trancoso
Abstract: This paper presents a semi-supervised Chinese word segmentation (CWS) approach that co-regularizes character-based and word-based models. Similarly to multi-view learning, the “segmentation agreements” between the two different types of view are used to overcome the scarcity of the label information on unlabeled data. The proposed approach trains a character-based and word-based model on labeled data, respectively, as the initial models. Then, the two models are constantly updated using unlabeled examples, where the learning objective is maximizing their segmentation agreements. The agreements are regarded as a set of valuable constraints for regularizing the learning of both models on unlabeled data. The segmentation for an input sentence is decoded by using a joint scoring function combining the two induced models. The evaluation on the Chinese tree bank reveals that our model results in better gains over the state-of-the-art semi-supervised models reported in the literature.
6 0.61324453 50 acl-2013-An improved MDL-based compression algorithm for unsupervised word segmentation
7 0.58364087 193 acl-2013-Improving Chinese Word Segmentation on Micro-blog Using Rich Punctuations
8 0.47583738 173 acl-2013-Graph-based Semi-Supervised Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging
9 0.46620741 80 acl-2013-Chinese Parsing Exploiting Characters
10 0.44628048 243 acl-2013-Mining Informal Language from Chinese Microtext: Joint Word Recognition and Segmentation
11 0.43861997 7 acl-2013-A Lattice-based Framework for Joint Chinese Word Segmentation, POS Tagging and Parsing
12 0.41100365 1 acl-2013-"Let Everything Turn Well in Your Wife": Generation of Adult Humor Using Lexical Constraints
13 0.40782008 274 acl-2013-Parsing Graphs with Hyperedge Replacement Grammars
14 0.40519294 273 acl-2013-Paraphrasing Adaptation for Web Search Ranking
15 0.39930877 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model
16 0.39664748 149 acl-2013-Exploring Word Order Universals: a Probabilistic Graphical Model Approach
17 0.38844308 281 acl-2013-Post-Retrieval Clustering Using Third-Order Similarity Measures
18 0.35796338 100 acl-2013-Crowdsourcing Interaction Logs to Understand Text Reuse from the Web
19 0.34292039 324 acl-2013-Smatch: an Evaluation Metric for Semantic Feature Structures
20 0.34206051 73 acl-2013-Broadcast News Story Segmentation Using Manifold Learning on Latent Topic Distributions
topicId topicWeight
[(0, 0.043), (6, 0.028), (11, 0.064), (15, 0.03), (24, 0.101), (26, 0.036), (28, 0.012), (35, 0.064), (42, 0.059), (48, 0.042), (63, 0.266), (70, 0.026), (88, 0.03), (90, 0.027), (95, 0.079)]
simIndex simValue paperId paperTitle
1 0.80971426 151 acl-2013-Extra-Linguistic Constraints on Stance Recognition in Ideological Debates
Author: Kazi Saidul Hasan ; Vincent Ng
Abstract: Determining the stance expressed by an author from a post written for a twosided debate in an online debate forum is a relatively new problem. We seek to improve Anand et al.’s (201 1) approach to debate stance classification by modeling two types of soft extra-linguistic constraints on the stance labels of debate posts, user-interaction constraints and ideology constraints. Experimental results on four datasets demonstrate the effectiveness of these inter-post constraints in improving debate stance classification.
same-paper 2 0.78868902 140 acl-2013-Evaluating Text Segmentation using Boundary Edit Distance
Author: Chris Fournier
Abstract: This work proposes a new segmentation evaluation metric, named boundary similarity (B), an inter-coder agreement coefficient adaptation, and a confusion-matrix for segmentation that are all based upon an adaptation of the boundary edit distance in Fournier and Inkpen (2012). Existing segmentation metrics such as Pk, WindowDiff, and Segmentation Similarity (S) are all able to award partial credit for near misses between boundaries, but are biased towards segmentations containing few or tightly clustered boundaries. Despite S’s improvements, its normalization also produces cosmetically high values that overestimate agreement & performance, leading this work to propose a solution.
3 0.71540833 219 acl-2013-Learning Entity Representation for Entity Disambiguation
Author: Zhengyan He ; Shujie Liu ; Mu Li ; Ming Zhou ; Longkai Zhang ; Houfeng Wang
Abstract: We propose a novel entity disambiguation model, based on Deep Neural Network (DNN). Instead of utilizing simple similarity measures and their disjoint combinations, our method directly optimizes document and entity representations for a given similarity measure. Stacked Denoising Auto-encoders are first employed to learn an initial document representation in an unsupervised pre-training stage. A supervised fine-tuning stage follows to optimize the representation towards the similarity measure. Experiment results show that our method achieves state-of-the-art performance on two public datasets without any manually designed features, even beating complex collective approaches.
4 0.69710141 188 acl-2013-Identifying Sentiment Words Using an Optimization-based Model without Seed Words
Author: Hongliang Yu ; Zhi-Hong Deng ; Shiyingxue Li
Abstract: Sentiment Word Identification (SWI) is a basic technique in many sentiment analysis applications. Most existing researches exploit seed words, and lead to low robustness. In this paper, we propose a novel optimization-based model for SWI. Unlike previous approaches, our model exploits the sentiment labels of documents instead of seed words. Several experiments on real datasets show that WEED is effective and outperforms the state-of-the-art methods with seed words.
5 0.5797255 187 acl-2013-Identifying Opinion Subgroups in Arabic Online Discussions
Author: Amjad Abu-Jbara ; Ben King ; Mona Diab ; Dragomir Radev
Abstract: In this paper, we use Arabic natural language processing techniques to analyze Arabic debates. The goal is to identify how the participants in a discussion split into subgroups with contrasting opinions. The members of each subgroup share the same opinion with respect to the discussion topic and an opposing opinion to the members of other subgroups. We use opinion mining techniques to identify opinion expressions and determine their polarities and their targets. We opinion predictions to represent the discussion in one of two formal representations: signed attitude network or a space of attitude vectors. We identify opinion subgroups by partitioning the signed network representation or by clustering the vector space representation. We evaluate the system using a data set of labeled discussions and show that it achieves good results.
6 0.57225227 49 acl-2013-An annotated corpus of quoted opinions in news articles
8 0.55301958 377 acl-2013-Using Supervised Bigram-based ILP for Extractive Summarization
9 0.54880142 185 acl-2013-Identifying Bad Semantic Neighbors for Improving Distributional Thesauri
10 0.54829293 2 acl-2013-A Bayesian Model for Joint Unsupervised Induction of Sentiment, Aspect and Discourse Representations
11 0.54581124 267 acl-2013-PARMA: A Predicate Argument Aligner
12 0.54578084 47 acl-2013-An Information Theoretic Approach to Bilingual Word Clustering
13 0.5446915 194 acl-2013-Improving Text Simplification Language Modeling Using Unsimplified Text Data
14 0.54440194 207 acl-2013-Joint Inference for Fine-grained Opinion Extraction
15 0.54351091 123 acl-2013-Discriminative Learning with Natural Annotations: Word Segmentation as a Case Study
16 0.54134226 68 acl-2013-Bilingual Data Cleaning for SMT using Graph-based Random Walk
17 0.54118478 82 acl-2013-Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation
18 0.54043555 318 acl-2013-Sentiment Relevance
19 0.54036593 287 acl-2013-Public Dialogue: Analysis of Tolerance in Online Discussions
20 0.53971398 72 acl-2013-Bridging Languages through Etymology: The case of cross language text categorization