
87 nips-2009-Exponential Family Graph Matching and Ranking


Source: pdf

Author: James Petterson, Jin Yu, Julian J. McAuley, Tibério S. Caetano

Abstract: We present a method for learning max-weight matching predictors in bipartite graphs. The method consists of performing maximum a posteriori estimation in exponential families with sufficient statistics that encode permutations and data features. Although inference is in general hard, we show that for one very relevant application, document ranking, exact inference is efficient. For general model instances, an appropriate sampler is readily available. Contrary to existing max-margin matching models, our approach is statistically consistent and, in addition, experiments with increasing sample sizes indicate superior improvement over such models. We apply the method to graph matching in computer vision as well as to a standard benchmark dataset for learning document ranking, on which we obtain state-of-the-art results, in particular improving on max-margin variants. The drawback of this method with respect to max-margin alternatives is its comparatively high runtime for large graphs.
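As a rough, non-authoritative illustration of the prediction step the abstract alludes to: once a weight vector has been estimated, predicting a max-weight bipartite matching is a linear assignment problem. The sketch below assumes edge scores linear in learned weights; `Phi`, `theta`, and `predict_matching` are illustrative names, not the paper's.

```python
# Minimal sketch of max-weight bipartite matching prediction,
# assuming edge scores are linear in a learned weight vector.
# Phi, theta and predict_matching are illustrative, not from the paper.
import numpy as np
from scipy.optimize import linear_sum_assignment

def predict_matching(Phi, theta):
    """Return the max-weight matching as a permutation.

    Phi   : (n, n, d) array; Phi[i, j] is the feature vector of edge (i, j).
    theta : (d,) weight vector (stand-in for the paper's MAP estimate).
    """
    scores = Phi @ theta  # (n, n) matrix of edge scores <Phi[i, j], theta>
    rows, cols = linear_sum_assignment(scores, maximize=True)
    return cols  # cols[i] = right-hand node matched to left-hand node i

rng = np.random.default_rng(0)
Phi = rng.normal(size=(5, 5, 3))   # toy edge features
theta = rng.normal(size=3)         # toy weights
print(predict_matching(Phi, theta))
```

The assignment solver handles only prediction; the learning step the abstract describes additionally involves the exponential family's log-partition function, which for general matchings relates to matrix permanents [19, 25, 30] and motivates the sampler mentioned there [10].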


reference text

In the ranking setting, we are given a set of queries {q_k} and, for each query q_k, a list of D(k) documents {d_1^k, ..., d_{D(k)}^k} with corresponding ratings {r_1^k, ..., r_{D(k)}^k} (assigned by a human editor), measuring the relevance degree of each document with respect to query q_k. A rating or relevance degree is an ordinal value in the list {1, ..., R}, where R is typically between 2 and 5. We are also given, for every retrieved document d_i^k, a joint feature vector psi_i^k for that document and the query. At training time, each query q is modeled as a vector-weighted bipartite graph (a minimal sketch of the corresponding prediction step follows the reference list).

[1] Belongie, S., & Malik, J. (2000). Matching with shape contexts. CBAIVL00.

[2] Belongie, S., Malik, J., & Puzicha, J. (2002). Shape matching and object recognition using shape contexts. IEEE Trans. on PAMI, 24, 509–521.

[3] Burges, C. J. C., Shaked, T., Renshaw, E., Lazier, A., Deeds, M., Hamilton, N., & Hullender, G. (2005). Learning to rank using gradient descent. ICML.

[4] Caetano, T. S., Cheng, L., Le, Q. V., & Smola, A. J. (2009). Learning graph matching. IEEE Trans. on PAMI, 31, 1048–1058.

[5] Cao, Z., Qin, T., Liu, T.-Y., Tsai, M.-F., & Li, H. (2007). Learning to rank: from pairwise approach to listwise approach. ICML.

[6] Freund, Y., Iyer, R., Schapire, R. E., & Singer, Y. (2003). An efficient boosting algorithm for combining preferences. J. Mach. Learn. Res., 4, 933–969.

[7] Herbrich, R., Graepel, T., & Obermayer, K. (2000). Large margin rank boundaries for ordinal regression. In Advances in Large Margin Classifiers.

[8] Huang, B., & Jebara, T. (2007). Loopy belief propagation for bipartite maximum weight b-matching. AISTATS.

[9] Huang, J. C., & Frey, B. J. (2008). Structured ranking learning using cumulative distribution networks. NIPS.

[10] Huber, M., & Law, J. (2008). Fast approximation of the permanent for very dense problems. SODA.

[11] Järvelin, K., & Kekäläinen, J. (2002). Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems, 20, 422–446.

[12] Lafferty, J. D., McCallum, A., & Pereira, F. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. ICML.

[13] Le, Q., & Smola, A. (2007). Direct optimization of ranking measures. http://arxiv.org/abs/0704.3359.

[14] Liu, T.-Y., Xu, J., Qin, T., Xiong, W., & Li, H. (2007). LETOR: Benchmark dataset for research on learning to rank for information retrieval. LR4IR.

[15] Liu, Y., & Shen, X. (2005). Multicategory ψ-learning and support vector machine: Computational tools. J. Computational and Graphical Statistics, 14, 219–236.

[16] Liu, Y., & Shen, X. (2006). Multicategory ψ-learning. JASA, 101, 500–509.

[17] Martinez, C. (2004). Partial quicksort. SIAM.

[18] McAllester, D. (2007). Generalization bounds and consistency for structured labeling. Predicting Structured Data.

[19] Minc, H. (1978). Permanents. Addison-Wesley.

[20] Minka, T., & Robertson, S. (2008). Selection bias in the LETOR datasets. LR4IR.

[21] Nocedal, J., & Wright, S. J. (1999). Numerical optimization. Springer Series in Operations Research. Springer.

[22] Papadimitriou, C. H., & Steiglitz, K. (1982). Combinatorial optimization: Algorithms and complexity. New Jersey: Prentice-Hall.

[23] Qin, T., Liu, T.-Y., Zhang, X.-D., Wang, D.-S., & Li, H. (2009). Global ranking using continuous conditional random fields. NIPS.

[24] Rigutini, L., Papini, T., Maggini, M., & Scarselli, F. (2008). SortNet: Learning to rank by a neural-based sorting algorithm. LR4IR.

[25] Ryser, H. J. (1963). Combinatorial mathematics. The Carus Mathematical Monographs, No. 14, Mathematical Association of America.

[26] Sherman, S. (1951). On a Theorem of Hardy, Littlewood, Pólya, and Blackwell. Proceedings of the National Academy of Sciences, 37, 826–831.

[27] Taskar, B. (2004). Learning structured prediction models: a large-margin approach. Doctoral dissertation, Stanford University.

[28] Tsai, M.-F., Liu, T.-Y., Qin, T., Chen, H.-H., & Ma, W.-Y. (2007). FRank: A ranking method with fidelity loss. SIGIR.

[29] Tsochantaridis, I., Joachims, T., Hofmann, T., & Altun, Y. (2005). Large margin methods for structured and interdependent output variables. JMLR, 6, 1453–1484.

[30] Valiant, L. G. (1979). The complexity of computing the permanent. Theor. Comput. Sci., 8, 189–201.

[31] Wainwright, M. J., & Jordan, M. I. (2003). Graphical models, exponential families, and variational inference (Technical Report 649). UC Berkeley, Department of Statistics.

[32] Xu, J., & Li, H. (2007). AdaRank: a boosting algorithm for information retrieval. SIGIR.

[33] Zheng, Z., Zha, H., & Sun, G. (2008a). Query-level learning to rank using isotonic regression. LR4IR.

[34] Zheng, Z., Zha, H., Zhang, T., Chapelle, O., Chen, K., & Sun, G. (2008b). A general boosting method and its application to learning ranking functions for web search. NIPS.
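For the document-ranking instance sketched before the reference list, each query's documents are assigned to rank positions. The sketch below is minimal and assumes, purely for illustration, that each document receives a scalar score so that the optimal document-to-position assignment reduces to sorting; `rank_documents`, `psi`, and `theta` are illustrative names, not the paper's.

```python
# Minimal sketch of the ranking special case, assuming per-document
# scalar scores so the optimal document-to-position assignment is a sort.
# rank_documents, psi and theta are illustrative names.
import numpy as np

def rank_documents(psi, theta):
    """Return document indices in predicted rank order (best first).

    psi   : (D, d) array; psi[i] is the joint feature vector of
            document i and the query.
    theta : (d,) weight vector.
    """
    scores = psi @ theta        # one scalar score per document
    return np.argsort(-scores)  # descending order; ties broken arbitrarily

rng = np.random.default_rng(1)
psi = rng.normal(size=(4, 3))   # 4 toy documents, 3 features each
theta = rng.normal(size=3)
print(rank_documents(psi, theta))
```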