Dirichlet-Bernoulli Alignment: A Generative Model for Multi-Class Multi-Label Multi-Instance Corpora (NIPS 2009)

Author: Shuang-hong Yang, Hongyuan Zha, Bao-gang Hu

Abstract: We propose Dirichlet-Bernoulli Alignment (DBA), a generative model for corpora in which each pattern (e.g., a document) contains a set of instances (e.g., paragraphs in the document) and belongs to multiple classes. By casting predefined classes as latent Dirichlet variables (i.e., instance level labels), and modeling the multi-label of each pattern as Bernoulli variables conditioned on the weighted empirical average of topic assignments, DBA automatically aligns the latent topics discovered from data to human-defined classes. DBA is useful for both pattern classification and instance disambiguation, which are tested on text classification and named entity disambiguation in web search queries respectively.
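The generative story in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `sample_pattern`, the bag-of-words instance model `beta`, the per-instance `instance_weights`, and the logistic link with scalar `slope` and `bias` are all assumptions introduced here, since the abstract does not specify them.

```python
import numpy as np

def sample_pattern(rng, alpha, beta, instance_weights, slope, bias, words_per_instance):
    """Draw one pattern (e.g., a document) from a DBA-style generative sketch."""
    n_classes, vocab_size = beta.shape
    n_instances = len(instance_weights)
    # Pattern-level mixing proportions over the predefined classes.
    theta = rng.dirichlet(alpha)
    # Instance-level labels: one latent class assignment per instance.
    z = rng.choice(n_classes, size=n_instances, p=theta)
    # Assumed instance model: word counts drawn from the class's word distribution.
    instances = np.stack([rng.multinomial(words_per_instance, beta[c]) for c in z])
    # Weighted empirical average of the one-hot class assignments.
    one_hot = np.eye(n_classes)[z]
    w = np.asarray(instance_weights, dtype=float)
    z_bar = (w[:, None] * one_hot).sum(axis=0) / w.sum()
    # Pattern-level multi-label: one Bernoulli per class (logistic link is an assumption).
    probs = 1.0 / (1.0 + np.exp(-(slope * z_bar + bias)))
    y = rng.binomial(1, probs)
    return instances, z, z_bar, y

rng = np.random.default_rng(0)
n_classes, vocab_size = 3, 20
alpha = np.full(n_classes, 0.5)                            # Dirichlet prior over classes
beta = rng.dirichlet(np.ones(vocab_size), size=n_classes)  # hypothetical word distributions
instances, z, z_bar, y = sample_pattern(
    rng, alpha, beta, instance_weights=np.ones(5), slope=8.0, bias=-2.0,
    words_per_instance=30,
)
```

Because the latent topics are the predefined classes themselves, `z` serves as the instance-level disambiguation and `y` as the pattern-level multi-label, which mirrors the two uses named in the abstract.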

References

[1] Andrews S. and Hofmann T. (2003) Multiple Instance Learning via Disjunctive Programming Boosting. In Advances in Neural Information Processing Systems 16 (NIPS’03), MIT Press.

[2] Blei D. and McAuliffe J. (2007) Supervised topic models. In Advances in Neural Information Processing Systems 20 (NIPS’07), MIT Press.

[3] Blei D. and Lafferty J. (2007) A correlated topic model of Science. Annals of Applied Statistics, Vol. 1, No. 1, pp. 17–35.

[4] Blei D., Ng A. and Jordan M. (2003) Latent Dirichlet Allocation. Journal of Machine Learning Research, Vol. 3, pp. 993–1022, Jan. 2003, MIT Press.

[5] Boutell M. R., Luo J., Shen X. and Brown C. M. (2004) Learning Multi-Label Scene Classification. Pattern Recognition, Vol. 37, No. 9, pp. 1757–1771.

[6] Cour T., Sapp B., Jordan C. and Taskar B. (2009) Learning from Ambiguously Labeled Images. In the 23rd IEEE Conference on Computer Vision and Pattern Recognition (CVPR’09).

[7] Dietterich T. G., Lathrop R. H. and Lozano-Perez T. (1997) Solving the Multiple-Instance Problem with Axis-Parallel Rectangles. Artificial Intelligence, Vol. 89, pp. 31–71, Jan. 1997.

[8] Ghamrawi N. and McCallum A. (2005) Collective Multi-Label Classification. In ACM International Conference on Information and Knowledge Management (CIKM’05), pp. 195–200.

[9] Jaakkola T. and Jordan M. I. (2000) Bayesian parameter estimation via variational methods. Statistics and Computing, Vol. 10, No. 1, pp. 25–37.

[10] Ueda N. and Saito K. (2002) Parametric Mixture Models For Multi-Labeled Text. In Advances in Neural Information Processing Systems 15 (NIPS’02).

[11] Viola P., Platt J. and Zhang C. (2006) Multiple Instance Boosting For Object Detection. In Advances in Neural Information Processing Systems 20 (NIPS’06), pp. 1419–1426, MIT Press.

[12] Xu G., Yang S.-H. and Li H. (2009) Named Entity Mining from Click-Through Data Using Weakly Supervised LDA. In ACM Knowledge Discovery and Data Mining (KDD’09).

[13] Zhou Z.-H. and Zhang M.-L. (2006) Multi-Instance Multi-Label Learning with Application to Scene Classification. In Advances in Neural Information Processing Systems 19 (NIPS’06).