140 acl-2013-Evaluating Text Segmentation using Boundary Edit Distance

Source: pdf

Author: Chris Fournier

Abstract: This work proposes a new segmentation evaluation metric, named boundary similarity (B), an inter-coder agreement coefficient adaptation, and a confusion matrix for segmentation, all based upon an adaptation of the boundary edit distance in Fournier and Inkpen (2012). Existing segmentation metrics such as Pk, WindowDiff, and Segmentation Similarity (S) are all able to award partial credit for near misses between boundaries, but are biased towards segmentations containing few or tightly clustered boundaries. Despite S’s improvements, its normalization also produces cosmetically high values that overestimate agreement and performance, leading this work to propose a solution.
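To make "partial credit for near misses" concrete, here is a minimal sketch of WindowDiff as it is commonly described (Pevzner and Hearst, 2002). It is not the boundary similarity (B) metric proposed in this work, and it is not the author's implementation. The function name `window_diff`, the 0/1 boundary encoding, and the toy sequences are assumptions made for illustration only.

```python
def window_diff(reference, hypothesis, k):
    """Sketch of WindowDiff (lower is better, 0.0 means identical).

    Assumed encoding (hypothetical, for illustration): each sequence holds one
    0/1 flag per potential boundary position, i.e. per gap between adjacent
    text units, with 1 marking a boundary.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("segmentations must cover the same number of gaps")
    n = len(reference)
    if not 0 < k <= n:
        raise ValueError("window size k must satisfy 0 < k <= number of gaps")

    windows = n - k + 1  # number of positions for a window spanning k gaps
    errors = 0
    for i in range(windows):
        # Penalize a window when the two segmentations disagree on how many
        # boundaries fall inside it.
        if sum(reference[i:i + k]) != sum(hypothesis[i:i + k]):
            errors += 1
    return errors / windows


# Toy data: one reference boundary, a hypothesis off by one position,
# and a hypothesis that is far off.
reference = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
near_miss = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
far_miss = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

print(window_diff(reference, reference, k=3))  # 0.0
print(window_diff(reference, near_miss, k=3))  # 0.25
print(window_diff(reference, far_miss, k=3))   # 0.625
```

The near miss is penalized in fewer windows than the far miss, which is the partial-credit behaviour the abstract refers to. The score depends on the choice of `k`, which is left as a free parameter here.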

reference text

Artstein, Ron and Massimo Poesio. 2008. Intercoder agreement for computational linguistics. Computational Linguistics 34(4):555–596.
Baker, David. 1990. Stargazers look for life. South Magazine 117:76–77.
Beeferman, Doug and Adam Berger. 1999. Statistical models for text segmentation. Machine Learning 34:177–210.
Carletta, Jean. 1996. Assessing Agreement on Classification Tasks: The Kappa Statistic. Computational Linguistics 22(2):249–254.
Chang, Pi-Chuan, Michel Galley, and Christopher D. Manning. 2008. Optimizing Chinese word segmentation for machine translation performance. In Proceedings of the Third Workshop on Statistical Machine Translation. Association for Computational Linguistics, Stroudsburg, PA, USA, pages 224–232.
Cohen, Jacob. 1960. A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement 20:37–46.
Coleridge, Samuel Taylor. 1816. Christabel, Kubla Khan, and the Pains of Sleep. John Murray.
Collins, Wilkie. 1868. The Moonstone. Tinsley Brothers.
Davies, Mark and Joseph L. Fleiss. 1982. Measuring agreement for multinomial data. Biometrics 38:1047–1051.
Eisenstein, Jacob. 2009. Hierarchical text segmentation from multi-scale lexical cohesion. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, Stroudsburg, PA, USA, pages 353–361.
Eisenstein, Jacob and Regina Barzilay. 2008. Bayesian unsupervised topic segmentation. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Morristown, NJ, USA, pages 334–343.
Fleiss, Joseph L. 1971. Measuring nominal scale agreement among many raters. Psychological Bulletin 76:378–382.
Fournier, Chris and Diana Inkpen. 2012. Segmentation Similarity and Agreement. In Proceedings of Human Language Technologies: The 2012 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, Stroudsburg, PA, USA, pages 152–161.
Fournier, Christopher. 2013. Evaluating Text Segmentation. Master’s thesis, University of Ottawa.
Franz, Martin, J. Scott McCarley, and Jian-Ming Xu. 2007. User-oriented text segmentation evaluation measure. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery, Stroudsburg, PA, USA, pages 701–702.
Gale, William, Kenneth Ward Church, and David Yarowsky. 1992. Estimating upper and lower bounds on the performance of word-sense disambiguation programs. In Proceedings of the 30th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Stroudsburg, PA, USA, pages 249–256.
Georgescul, Maria, Alexander Clark, and Susan Armstrong. 2006. An analysis of quantitative aspects in the evaluation of thematic segmentation algorithms. In Proceedings of the 7th SIGdial Workshop on Discourse and Dialogue. Association for Computational Linguistics, Stroudsburg, PA, USA, pages 144–151.
Haghighi, Aria and Lucy Vanderwende. 2009. Exploring content models for multi-document summarization. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, Stroudsburg, PA, USA, NAACL ’09, pages 362–370.
Hearst, Marti A. 1993. TextTiling: A Quantitative Approach to Discourse. Technical report, University of California at Berkeley, Berkeley, CA, USA.
Hearst, Marti A. 1997. TextTiling: Segmenting Text into Multi-paragraph Subtopic Passages. Computational Linguistics 23:33–64.
Hollander, Myles and Douglas A. Wolfe. 1999. Nonparametric Statistical Methods. John Wiley & Sons, 2nd edition.
Isard, Amy and Jean Carletta. 1995. Replicability of transaction and action coding in the map task corpus. In AAAI Spring Symposium: Empirical Methods in Discourse Interpretation and Generation. pages 60–66.
Kazantseva, Anna and Stan Szpakowicz. 2011. Linear Text Segmentation Using Affinity Propagation. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Edinburgh, Scotland, UK, pages 284–293.
Kazantseva, Anna and Stan Szpakowicz. 2012. Topical Segmentation: a Study of Human Performance. In Proceedings of Human Language Technologies: The 2012 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, Stroudsburg, PA, USA, pages 211–220.
Lamprier, Sylvain, Tassadit Amghar, Bernard Levrat, and Frederic Saubion. 2007. On evaluation methodologies for text segmentation algorithms. In Proceedings of the 19th IEEE International Conference on Tools with Artificial Intelligence. IEEE Computer Society, Washington, DC, USA, volume 2, pages 19–26.
Litman, Diane J. and Rebecca J. Passonneau. 1995. Combining multiple knowledge sources for discourse segmentation. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Stroudsburg, PA, USA, pages 108–115.
Malioutov, Igor and Regina Barzilay. 2006. Minimum cut model for spoken lecture segmentation. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Stroudsburg, PA, USA, pages 25–32.
Niekrasz, John and Johanna D. Moore. 2010. Unbiased discourse segmentation evaluation. In Proceedings of the IEEE Spoken Language Technology Workshop, SLT 2010. IEEE 2010, pages 43–48.
Oh, Hyo-Jung, Sung Hyon Myaeng, and Myung-Gil Jang. 2007. Semantic passage segmentation based on sentence topics for question answering. Information Sciences 177(18):3696–3717.
Passonneau, Rebecca J. and Diane J. Litman. 1993. Intention-based segmentation: human reliability and correlation with linguistic cues. In Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Stroudsburg, PA, USA, pages 148–155.
Pevzner, Lev and Marti A. Hearst. 2002. A critique and improvement of an evaluation metric for text segmentation. Computational Linguistics 28:19–36.
Reynar, Jeffrey C. and Adwait Ratnaparkhi. 1997. A maximum entropy approach to identifying sentence boundaries. In Proceedings of the 5th Conference on Applied Natural Language Processing. Association for Computational Linguistics, Stroudsburg, PA, USA, pages 16–19.
Scott, William A. 1955. Reliability of content analysis: The case of nominal scale coding. Public Opinion Quarterly 19:321–325.
Siegel, Sidney and N. J. Castellan. 1988. Nonparametric Statistics for the Behavioral Sciences, McGraw-Hill, New York, USA, chapter 9.8. 2nd edition.
Sirts, Kairit and Tanel Alumäe. 2012. A Hierarchical Dirichlet Process Model for Joint Part-of-Speech and Morphology Induction. In Proceedings of Human Language Technologies: The 2012 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, Stroudsburg, PA, USA, pages 407–416.
Stoyanov, Veselin and Claire Cardie. 2008. Topic identification for fine-grained opinion analysis. In Proceedings of the 22nd International Conference on Computational Linguistics. Association for Computational Linguistics, Stroudsburg, PA, USA, pages 817–824.