194 nips-2008-Regularized Learning with Networks of Features

Source: pdf

Author: Ted Sandler, John Blitzer, Partha P. Talukdar, Lyle H. Ungar

Abstract: For many supervised learning problems, we possess prior knowledge about which features yield similar information about the target variable. In predicting the topic of a document, we might know that two words are synonyms, and when performing image recognition, we know which pixels are adjacent. Such synonymous or neighboring features are near-duplicates and should be expected to have similar weights in an accurate model. Here we present a framework for regularized learning when one has prior knowledge about which features are expected to have similar and dissimilar weights. The prior knowledge is encoded as a network whose vertices are features and whose edges represent similarities and dissimilarities between them. During learning, each feature’s weight is penalized by the amount it differs from the average weight of its neighbors. For text classification, regularization using networks of word co-occurrences outperforms manifold learning and compares favorably to other recently proposed semi-supervised learning methods. For sentiment analysis, feature networks constructed from declarative human knowledge significantly improve prediction accuracy.
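
The penalty described in the abstract can be illustrated with a minimal sketch. It assumes a squared difference and uniform (unweighted) edges, neither of which the abstract states; the function name `network_penalty` and the toy graph are hypothetical and are not taken from the paper.

```python
def network_penalty(weights, neighbors):
    """Sum, over features, of the squared gap between a feature's weight
    and the mean weight of its neighbors.

    Assumptions (not from the abstract): squared difference, uniform edge
    weights, and no penalty for features without neighbors.
    """
    total = 0.0
    for i, w_i in enumerate(weights):
        nbrs = neighbors.get(i, [])
        if not nbrs:
            continue  # isolated feature: no network penalty in this sketch
        neighbor_mean = sum(weights[j] for j in nbrs) / len(nbrs)
        total += (w_i - neighbor_mean) ** 2
    return total


# Toy network: features 0 and 1 are linked (e.g. synonyms); feature 2 is isolated.
toy_neighbors = {0: [1], 1: [0], 2: []}

agreeing = network_penalty([1.0, 1.0, -3.0], toy_neighbors)      # linked weights match
disagreeing = network_penalty([1.0, -1.0, -3.0], toy_neighbors)  # linked weights differ
```

In this sketch the penalty is zero when linked features share a weight and grows as they diverge; in training it would be added to the loss with a tunable coefficient.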

reference text

[1] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer New York, 2001.

[2] C. Fellbaum. WordNet: an electronic lexical database. MIT Press, 1998.

[3] H. Ogata, S. Goto, K. Sato, W. Fujibuchi, H. Bono, and M. Kanehisa. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research, 27(1):29–34, 1999.

[4] I. Xenarios, D.W. Rice, L. Salwinski, M.K. Baron, E.M. Marcotte, and D. Eisenberg. DIP: The Database of Interacting Proteins. Nucleic Acids Research, 28(1):289–291, 2000.

[5] R.K. Ando and T. Zhang. A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data. JMLR, 6:1817–1853, 2005.

[6] R. Raina, A.Y. Ng, and D. Koller. Constructing informative priors using transfer learning. In ICML, 2006.

[7] S.T. Roweis and L.K. Saul. Nonlinear Dimensionality Reduction by Locally Linear Embedding. Science, 290(5500):2323–2326, 2000.

[8] E. Krupka and N. Tishby. Incorporating Prior Knowledge on Features into Learning. In AISTATS, 2007.

[9] M. Belkin, P. Niyogi, and V. Sindhwani. Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. JMLR, 7:2399–2434, 2006.

[10] D. Zhou, O. Bousquet, T.N. Lal, J. Weston, and B. Schölkopf. Learning with local and global consistency. In NIPS, 2004.

[11] J. Blitzer, M. Dredze, and F. Pereira. Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification. In ACL, 2007.

[12] A.B. Goldberg, X. Zhu, and S. Wright. Dissimilarity in Graph-Based Semi-Supervised Classification. In AISTATS, 2007.

[13] C. Li and H. Li. Network-constrained regularization and variable selection for analysis of genomic data. Bioinformatics, 24(9):1175–1182, 2008.

[14] A. Esuli and F. Sebastiani. SentiWordNet: A Publicly Available Lexical Resource For Opinion Mining. In LREC, 2006.

[15] R. Tibshirani, M. Saunders, S. Rosset, J. Zhu, and K. Knight. Sparsity and Smoothness via the Fused Lasso. Journal of the Royal Statistical Society Series B, 67(1):91–108, 2005.