jmlr2012-107: knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Xinhua Zhang, Ankan Saha, S.V.N. Vishwanathan
Abstract: Optimizing multivariate performance measures is an important task in Machine Learning. Joachims (2005) introduced a Support Vector Method whose underlying optimization problem is commonly solved by cutting plane methods (CPMs) such as SVM-Perf and BMRM. It can be shown that CPMs converge to an ε-accurate solution in O(1/(λε)) iterations, where λ is the trade-off parameter between the regularizer and the loss function. Motivated by the impressive convergence rate of CPM on a number of practical problems, it was conjectured that these rates can be further improved. We disprove this conjecture in this paper by constructing counter examples. However, surprisingly, we further discover that these problems are not inherently hard, and we develop a novel smoothing strategy, which in conjunction with Nesterov’s accelerated gradient method, can find an ε-accurate solution in O∗(min{1/ε, 1/√(λε)}) iterations. Computationally, our smoothing technique is also particularly advantageous for optimizing multivariate performance scores such as precision/recall break-even point and ROCArea; the cost per iteration remains the same as that of CPMs. Empirical evaluation on some of the largest publicly available data sets shows that our method converges significantly faster than CPMs without sacrificing generalization ability. Keywords: non-smooth optimization, max-margin methods, multivariate performance measures, Support Vector Machines, smoothing
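To make the rates in the abstract concrete, here is a hedged LaTeX sketch of the quantities involved. The generic notation (A, Z, and the prox-function d) is ours, not the paper's, and the smoothed surrogate shown is the standard Nesterov construction rather than a restatement of the paper's exact formulation.

% Regularized risk minimization; R_emp is piecewise linear, hence non-smooth
\min_{w}\; J(w) = \frac{\lambda}{2}\|w\|_2^2 + R_{\mathrm{emp}}(w)

% Cutting plane methods (SVM-Perf, BMRM) reach an \varepsilon-accurate solution in
O\!\left(\frac{1}{\lambda\varepsilon}\right) \text{ iterations.}

% Standard Nesterov smoothing replaces the inner max over discrete labelings z \in Z
% by a surrogate with smoothing parameter \mu > 0 and strongly convex prox-function d,
R_{\mu}(w) = \max_{z \in Z}\, \big\{ \langle A w,\, z \rangle - \mu\, d(z) \big\},

% after which accelerated gradient descent on the smoothed objective needs only
O^{*}\!\left(\min\!\left\{\frac{1}{\varepsilon},\; \frac{1}{\sqrt{\lambda\varepsilon}}\right\}\right) \text{ iterations.}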
Reference: text
sentIndex sentText sentNum sentScore
[Figure residue: the extracted sentences in this block consisted only of axis ticks, axis labels, legend entries, running headers, and panel titles from the paper's experimental plots. Recoverable information: each figure compares BMRM against SMS on one data set, plotting (a)-(c) the objective value of the regularized risk for PRBEP, (e)-(f) test PRBEP (%), and (g)-(i) the objective value for ROCArea, each against wall-clock time (seconds) at λ = 10−2, 10−4, and 10−6. The interleaved running headers read "Smoothing Multivariate Performance Measures" and "Zhang, Saha and Vishwanathan".]
wordName wordTfidf (topN-words)
[('bmrm', 0.413), ('prbep', 0.368), ('rocarea', 0.362), ('sms', 0.362), ('lo', 0.351), ('tim', 0.257), ('ec', 0.244), ('lariz', 0.206), ('ris', 0.206), ('valu', 0.206), ('obj', 0.146), ('reg', 0.136), ('remp', 0.057), ('ishwanathan', 0.038), ('moothing', 0.037), ('aha', 0.033), ('ultivariate', 0.032), ('easures', 0.028), ('boldface', 0.025), ('hang', 0.024), ('erformance', 0.023), ('test', 0.023), ('panels', 0.017), ('highlighted', 0.016), ('nesterov', 0.016), ('dna', 0.014), ('smoothing', 0.013), ('seconds', 0.013), ('wk', 0.012), ('cpms', 0.012), ('xinhua', 0.012), ('ai', 0.011), ('es', 0.01), ('ankan', 0.009), ('saha', 0.009), ('mi', 0.009), ('teo', 0.009), ('zi', 0.008), ('gure', 0.008), ('kdda', 0.008), ('risk', 0.007), ('subgradient', 0.007), ('agm', 0.007), ('smoothed', 0.006), ('objective', 0.006), ('panel', 0.006), ('minw', 0.006), ('covertype', 0.006), ('yi', 0.006), ('regularized', 0.006), ('value', 0.005), ('multivariate', 0.005), ('cpm', 0.005), ('kddb', 0.005), ('petsc', 0.005), ('accelerated', 0.005), ('franc', 0.005), ('cutting', 0.005), ('gradient', 0.005), ('jk', 0.005), ('contingency', 0.005), ('ocr', 0.005), ('optimal', 0.005), ('string', 0.005), ('sonnenburg', 0.005), ('yurii', 0.005), ('head', 0.004), ('delta', 0.004), ('ed', 0.004), ('zhang', 0.004), ('ak', 0.004), ('max', 0.004), ('joachims', 0.004), ('cne', 0.004), ('tero', 0.004), ('zeta', 0.004), ('smooth', 0.004), ('convex', 0.004), ('alberta', 0.004), ('bound', 0.004), ('plane', 0.004), ('wi', 0.004), ('nocedal', 0.003), ('beck', 0.003), ('decrement', 0.003), ('epsilon', 0.003), ('maximizer', 0.003), ('purdue', 0.003), ('maxn', 0.003), ('tao', 0.003), ('rates', 0.003), ('misclassi', 0.003), ('argmin', 0.003), ('lemar', 0.003), ('alpha', 0.003), ('worm', 0.003), ('beta', 0.003), ('fd', 0.003), ('culties', 0.003), ('chapelle', 0.003)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 107 jmlr-2012-Smoothing Multivariate Performance Measures
2 0.05848781 118 jmlr-2012-Variational Multinomial Logit Gaussian Process
Author: Kian Ming A. Chai
Abstract: A Gaussian process prior with an appropriate likelihood function is a flexible non-parametric model for a variety of learning tasks. One important and standard task is multi-class classification, which is the categorization of an item into one of several fixed classes. A usual likelihood function for this is the multinomial logistic likelihood function. However, exact inference with this model has proved to be difficult because high-dimensional integrations are required. In this paper, we propose a variational approximation to this model, and we describe the optimization of the variational parameters. Experiments have shown our approximation to be tight. In addition, we provide data-independent bounds on the marginal likelihood of the model, one of which is shown to be much tighter than the existing variational mean-field bound in the experiments. We also derive a proper lower bound on the predictive likelihood that involves the Kullback-Leibler divergence between the approximating and the true posterior. We combine our approach with a recently proposed sparse approximation to give a variational sparse approximation to the Gaussian process multi-class model. We also derive criteria which can be used to select the inducing set, and we show the effectiveness of these criteria over random selection in an experiment. Keywords: Gaussian process, probabilistic classification, multinomial logistic, variational approximation, sparse approximation
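For context, the likelihood this abstract refers to and the generic variational bound it tightens, in textbook form; this is not the paper's specific approximation, only the standard starting point.

% Multinomial logistic (softmax) likelihood over K classes, with latent GP values f_1,...,f_K:
p(y = c \mid f) = \frac{\exp(f_c)}{\sum_{k=1}^{K} \exp(f_k)}

% Exact inference requires integrating this softmax against the GP posterior, which has no
% closed form; a variational posterior q(f) instead maximizes the generic lower bound
\log p(y) \;\ge\; \mathbb{E}_{q(f)}\!\left[\log p(y \mid f)\right] - \mathrm{KL}\!\left(q(f)\,\|\,p(f)\right).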
3 0.044576302 31 jmlr-2012-DEAP: Evolutionary Algorithms Made Easy
Author: Félix-Antoine Fortin, François-Michel De Rainville, Marc-André Gardner, Marc Parizeau, Christian Gagné
Abstract: DEAP is a novel evolutionary computation framework for rapid prototyping and testing of ideas. Its design departs from most other existing frameworks in that it seeks to make algorithms explicit and data structures transparent, as opposed to the more common black-box frameworks. Freely available with extensive documentation at http://deap.gel.ulaval.ca, DEAP is an open source project under an LGPL license. Keywords: distributed evolutionary algorithms, software tools
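The explicit-algorithm style this abstract describes is easiest to see in DEAP's standard OneMax toy problem. The sketch below follows the library's documented example; exact behavior may vary across DEAP versions.

import random
from deap import base, creator, tools, algorithms

# Declare the fitness (maximize one objective) and the individual type explicitly
creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", list, fitness=creator.FitnessMax)

toolbox = base.Toolbox()
toolbox.register("attr_bool", random.randint, 0, 1)
toolbox.register("individual", tools.initRepeat, creator.Individual,
                 toolbox.attr_bool, n=100)             # 100-bit individuals
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("evaluate", lambda ind: (sum(ind),))  # fitness must be a tuple
toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", tools.mutFlipBit, indpb=0.05)
toolbox.register("select", tools.selTournament, tournsize=3)

pop = toolbox.population(n=300)
pop, _ = algorithms.eaSimple(pop, toolbox, cxpb=0.5, mutpb=0.2,
                             ngen=40, verbose=False)
best = tools.selBest(pop, k=1)[0]
print(best.fitness.values)  # approaches (100.0,) as evolution proceeds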
4 0.0098798228 69 jmlr-2012-Mixability is Bayes Risk Curvature Relative to Log Loss
Author: Tim van Erven, Mark D. Reid, Robert C. Williamson
Abstract: Mixability of a loss characterizes fast rates in the online learning setting of prediction with expert advice. The determination of the mixability constant for binary losses is straightforward but opaque. In the binary case we make this transparent and simpler by characterising mixability in terms of the second derivative of the Bayes risk of proper losses. We then extend this result to multiclass proper losses where there are few existing results. We show that mixability is governed by the maximum eigenvalue of the Hessian of the Bayes risk, relative to the Hessian of the Bayes risk for log loss. We conclude by comparing our result to other work that bounds prediction performance in terms of the geometry of the Bayes risk. Although all calculations are for proper losses, we also show how to carry the results across to improper losses. Keywords: mixability, multiclass, prediction with expert advice, proper loss, learning rates
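A worked reading of the binary-case claim, hedged: writing L_ℓ for the (concave) Bayes risk of a proper loss ℓ, the characterization the abstract sketches amounts to curvature relative to log loss. The formula below is our paraphrase, not the paper's exact statement, but the square-loss sanity check recovers the classical constant.

% Mixability constant as Bayes-risk curvature relative to log loss (paraphrase):
\eta_{\ell} \;=\; \inf_{v \in (0,1)} \frac{L_{\log}''(v)}{L_{\ell}''(v)},
\qquad L_{\log}(v) = -v\log v - (1-v)\log(1-v),
\quad L_{\log}''(v) = -\frac{1}{v(1-v)}.

% Sanity check with square loss: L_{\mathrm{sq}}(v) = v(1-v), so L_{\mathrm{sq}}''(v) = -2 and
\eta_{\mathrm{sq}} = \inf_{v \in (0,1)} \frac{-1/(v(1-v))}{-2}
 = \inf_{v \in (0,1)} \frac{1}{2\,v(1-v)} = 2 \quad \text{(attained at } v = \tfrac{1}{2}\text{)},

% recovering the classical fact that square loss is 2-mixable, while log loss gives \eta_{\log} = 1.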
5 0.0091789644 26 jmlr-2012-Coherence Functions with Applications in Large-Margin Classification Methods
Author: Zhihua Zhang, Dehua Liu, Guang Dai, Michael I. Jordan
Abstract: Support vector machines (SVMs) naturally embody sparseness due to their use of hinge loss functions. However, SVMs cannot directly estimate conditional class probabilities. In this paper we propose and study a family of coherence functions, which are convex and differentiable, as surrogates of the hinge function. The coherence function is derived by using the maximum-entropy principle and is characterized by a temperature parameter. It bridges the hinge function and the logit function in logistic regression. The limit of the coherence function at zero temperature corresponds to the hinge function, and the limit of the minimizer of its expected error is the minimizer of the expected error of the hinge loss. We refer to the use of the coherence function in large-margin classification as “C-learning,” and we present efficient coordinate descent algorithms for the training of regularized C-learning models. Keywords: large-margin classifiers, hinge functions, logistic functions, coherence functions, C-learning
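One concrete temperature-indexed function with exactly the limiting behavior this abstract describes is the softplus-smoothed hinge below. We offer it purely as an illustration of the hinge-to-logit bridge, with no claim that it matches the paper's coherence function term for term.

% Temperature-smoothed bridge between hinge and logit (illustrative form):
V_T(z) \;=\; T \log\!\left(1 + e^{(1-z)/T}\right), \qquad T > 0.

% V_T is convex and differentiable for every T > 0; as the temperature vanishes,
\lim_{T \to 0^+} V_T(z) = \max(0,\, 1 - z) \quad \text{(the hinge loss)},

% while at T = 1 it is a shifted logistic loss, V_1(z) = \log(1 + e^{1-z}).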
6 0.0087212194 98 jmlr-2012-Regularized Bundle Methods for Convex and Non-Convex Risks
7 0.0079854177 116 jmlr-2012-Transfer in Reinforcement Learning via Shared Features
8 0.0077446834 59 jmlr-2012-Linear Regression With Random Projections
9 0.0076986435 87 jmlr-2012-PAC-Bayes Bounds with Data Dependent Priors
10 0.0071217483 18 jmlr-2012-An Improved GLMNET for L1-regularized Logistic Regression
11 0.0065963808 85 jmlr-2012-Optimal Distributed Online Prediction Using Mini-Batches
12 0.0064277002 28 jmlr-2012-Confidence-Weighted Linear Classification for Text Categorization
13 0.0063392008 96 jmlr-2012-Refinement of Operator-valued Reproducing Kernels
14 0.0063189222 112 jmlr-2012-Structured Sparsity via Alternating Direction Methods
15 0.0061709397 44 jmlr-2012-Feature Selection via Dependence Maximization
16 0.0060857506 39 jmlr-2012-Estimation and Selection via Absolute Penalized Convex Minimization And Its Multistage Adaptive Applications
17 0.0056766099 80 jmlr-2012-On Ranking and Generalization Bounds
18 0.0055816011 2 jmlr-2012-A Comparison of the Lasso and Marginal Regression
19 0.0055796071 97 jmlr-2012-Regularization Techniques for Learning with Matrices
20 0.0053152656 49 jmlr-2012-Hope and Fear for Discriminative Training of Statistical Translation Models
topicId topicWeight
[(0, -0.03), (1, 0.007), (2, 0.015), (3, -0.002), (4, 0.02), (5, 0.003), (6, 0.019), (7, 0.013), (8, -0.061), (9, -0.009), (10, -0.02), (11, 0.022), (12, -0.017), (13, 0.043), (14, -0.01), (15, -0.005), (16, -0.0), (17, 0.019), (18, -0.098), (19, -0.008), (20, 0.031), (21, 0.057), (22, -0.113), (23, -0.051), (24, -0.033), (25, -0.141), (26, -0.079), (27, -0.048), (28, 0.103), (29, -0.022), (30, -0.201), (31, 0.023), (32, -0.085), (33, -0.371), (34, -0.043), (35, 0.038), (36, 0.367), (37, -0.073), (38, 0.006), (39, -0.173), (40, -0.173), (41, 0.111), (42, 0.024), (43, -0.087), (44, 0.186), (45, -0.056), (46, -0.075), (47, 0.116), (48, 0.148), (49, 0.164)]
simIndex simValue paperId paperTitle
same-paper 1 0.99373305 107 jmlr-2012-Smoothing Multivariate Performance Measures
2 0.55071497 31 jmlr-2012-DEAP: Evolutionary Algorithms Made Easy
3 0.3555128 118 jmlr-2012-Variational Multinomial Logit Gaussian Process
Author: Alain Hauser, Peter Bühlmann
Abstract: The investigation of directed acyclic graphs (DAGs) encoding the same Markov property, that is, the same conditional independence relations of multivariate observational distributions, has a long tradition; many algorithms exist for model selection and structure learning in Markov equivalence classes. In this paper, we extend the notion of Markov equivalence of DAGs to the case of interventional distributions arising from multiple intervention experiments. We show that under reasonable assumptions on the intervention experiments, interventional Markov equivalence defines a finer partitioning of DAGs than observational Markov equivalence and hence improves the identifiability of causal models. We give a graph theoretic criterion for two DAGs being Markov equivalent under interventions and show that each interventional Markov equivalence class can, analogously to the observational case, be uniquely represented by a chain graph called interventional essential graph (also known as CPDAG in the observational case). These are key insights for deriving a generalization of the Greedy Equivalence Search algorithm aimed at structure learning from interventional data. This new algorithm is evaluated in a simulation study. Keywords: causal inference, interventions, graphical model, Markov equivalence, greedy equivalence search
5 0.096303716 59 jmlr-2012-Linear Regression With Random Projections
Author: Odalric-Ambrym Maillard, Rémi Munos
Abstract: We investigate a method for regression that makes use of a randomly generated subspace GP ⊂ F (of finite dimension P) of a given large (possibly infinite) dimensional function space F, for example, L2([0,1]^d; R). GP is defined as the span of P random features that are linear combinations of basis functions of F weighted by random Gaussian i.i.d. coefficients. We show practical motivation for the use of this approach, detail the link that this random projections method shares with RKHS and Gaussian objects theory and prove, both in deterministic and random design, approximation error bounds when searching for the best regression function in GP rather than in F, and derive excess risk bounds for a specific regression algorithm (least squares regression in GP). This paper stresses the motivation to study such methods, thus the analysis developed is kept simple for explanatory purposes and leaves room for future developments. Keywords: regression, random matrices, dimension reduction
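A minimal numpy sketch of the construction this abstract describes, with two caveats: the raw input coordinates stand in for the basis functions of F, and the helper name is ours.

import numpy as np

def random_projection_regression(X, y, P, seed=None):
    """Least squares in a random P-dimensional subspace (a stand-in for G_P).

    Each random feature is a Gaussian linear combination of the original
    coordinates; regression is then ordinary least squares in that subspace.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    A = rng.normal(0.0, 1.0 / np.sqrt(P), size=(d, P))  # random Gaussian weights
    Z = X @ A                                           # projected features, n x P
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)        # least squares in the subspace
    return lambda X_new: (X_new @ A) @ beta             # predictor on fresh inputs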
6 0.093144536 41 jmlr-2012-Exploration in Relational Domains for Model-based Reinforcement Learning
7 0.090803578 27 jmlr-2012-Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection
8 0.090547606 18 jmlr-2012-An Improved GLMNET for L1-regularized Logistic Regression
9 0.07861381 112 jmlr-2012-Structured Sparsity via Alternating Direction Methods
10 0.073389143 96 jmlr-2012-Refinement of Operator-valued Reproducing Kernels
11 0.071065851 29 jmlr-2012-Consistent Model Selection Criteria on High Dimensions
12 0.068648316 69 jmlr-2012-Mixability is Bayes Risk Curvature Relative to Log Loss
13 0.068069808 52 jmlr-2012-Iterative Reweighted Algorithms for Matrix Rank Minimization
14 0.067514151 104 jmlr-2012-Security Analysis of Online Centroid Anomaly Detection
15 0.064914323 26 jmlr-2012-Coherence Functions with Applications in Large-Margin Classification Methods
16 0.064310253 116 jmlr-2012-Transfer in Reinforcement Learning via Shared Features
17 0.063637383 16 jmlr-2012-Algorithms for Learning Kernels Based on Centered Alignment
18 0.06249861 83 jmlr-2012-Online Learning in the Embedded Manifold of Low-rank Matrices
19 0.057827588 66 jmlr-2012-Metric and Kernel Learning Using a Linear Transformation
20 0.054002598 86 jmlr-2012-Optimistic Bayesian Sampling in Contextual-Bandit Problems
topicId topicWeight
[(21, 0.017), (26, 0.019), (29, 0.017), (64, 0.638), (75, 0.026), (92, 0.024), (96, 0.046)]
simIndex simValue paperId paperTitle
same-paper 1 0.8798821 107 jmlr-2012-Smoothing Multivariate Performance Measures
2 0.63017911 94 jmlr-2012-Query Strategies for Evading Convex-Inducing Classifiers
Author: Blaine Nelson, Benjamin I. P. Rubinstein, Ling Huang, Anthony D. Joseph, Steven J. Lee, Satish Rao, J. D. Tygar
Abstract: Classifiers are often used to detect miscreant activities. We study how an adversary can systematically query a classifier to elicit information that allows the attacker to evade detection while incurring a near-minimal cost of modifying their intended malfeasance. We generalize the theory of Lowd and Meek (2005) to the family of convex-inducing classifiers that partition their feature space into two sets, one of which is convex. We present query algorithms for this family that construct undetected instances of approximately minimal cost using only polynomially many queries in the dimension of the space and in the level of approximation. Our results demonstrate that near-optimal evasion can be accomplished for this family without reverse engineering the classifier’s decision boundary. We also consider general ℓp costs and show that near-optimal evasion on the family of convex-inducing classifiers is generally efficient for both positive and negative convexity for all levels of approximation if p = 1. Keywords: query algorithms, evasion, reverse engineering, adversarial learning
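The basic membership-query primitive underlying this line of work is a binary search along a segment between an undetected and a detected instance. The sketch below (names ours) shows only that primitive, not the paper's full algorithms for convex-inducing classifiers.

def bisect_boundary(classifier, x_neg, x_pos, tol=1e-6):
    """Locate the decision boundary along the segment x_neg -> x_pos using only
    membership queries to `classifier` (True means detected / "positive").
    Returns the last undetected point found, within tol of the boundary."""
    def point(t):
        return [a + t * (b - a) for a, b in zip(x_neg, x_pos)]
    lo, hi = 0.0, 1.0  # point(lo) is undetected, point(hi) is detected
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if classifier(point(mid)):
            hi = mid   # still detected: boundary lies below mid
        else:
            lo = mid   # undetected: boundary lies above mid
    return point(lo)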
3 0.57686657 30 jmlr-2012-DARWIN: A Framework for Machine Learning and Computer Vision Research and Development
Author: Stephen Gould
Abstract: We present an open-source platform-independent C++ framework for machine learning and computer vision research. The framework includes a wide range of standard machine learning and graphical models algorithms as well as reference implementations for many machine learning and computer vision applications. The framework contains Matlab wrappers for core components of the library and an experimental graphical user interface for developing and visualizing machine learning data flows. Keywords: machine learning, graphical models, computer vision, open-source software
4 0.55573779 33 jmlr-2012-Distance Metric Learning with Eigenvalue Optimization
Author: Yiming Ying, Peng Li
Abstract: The main theme of this paper is to develop a novel eigenvalue optimization framework for learning a Mahalanobis metric. Within this context, we introduce a novel metric learning approach called DML-eig, which is shown to be equivalent to a well-known eigenvalue optimization problem called minimizing the maximal eigenvalue of a symmetric matrix (Overton, 1988; Lewis and Overton, 1996). Moreover, we formulate LMNN (Weinberger et al., 2005), one of the state-of-the-art metric learning methods, as a similar eigenvalue optimization problem. This novel framework not only provides new insights into metric learning but also opens new avenues to the design of efficient metric learning algorithms. Indeed, first-order algorithms are developed for DML-eig and LMNN which only need the computation of the largest eigenvector of a matrix per iteration. Their convergence characteristics are rigorously established. Various experiments on benchmark data sets show the competitive performance of our new approaches. In addition, we report an encouraging result on a difficult and challenging face verification data set called Labeled Faces in the Wild (LFW). Keywords: metric learning, convex optimization, semi-definite programming, first-order methods, eigenvalue optimization, matrix factorization, face verification
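The per-iteration primitive this abstract highlights, computing only the largest eigenvector of a symmetric matrix, can be sketched as plain power iteration; this is a generic sketch, not the paper's solver.

import numpy as np

def leading_eigenvector(M, iters=200, seed=0):
    """Power iteration for the dominant eigenvector of a symmetric matrix M.
    Converges to the eigenvector of the largest-magnitude eigenvalue; for the
    positive semidefinite matrices arising in metric learning this coincides
    with the largest eigenvalue."""
    v = np.random.default_rng(seed).normal(size=M.shape[0])
    for _ in range(iters):
        v = M @ v
        v /= np.linalg.norm(v)  # renormalize to avoid overflow/underflow
    return v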
5 0.22172375 92 jmlr-2012-Positive Semidefinite Metric Learning Using Boosting-like Algorithms
Author: Chunhua Shen, Junae Kim, Lei Wang, Anton van den Hengel
Abstract: The success of many machine learning and pattern recognition methods relies heavily upon the identification of an appropriate distance metric on the input data. It is often beneficial to learn such a metric from the input training data, instead of using a default one such as the Euclidean distance. In this work, we propose a boosting-based technique, termed BoostMetric, for learning a quadratic Mahalanobis distance metric. Learning a valid Mahalanobis distance metric requires enforcing the constraint that the matrix parameter to the metric remains positive semidefinite. Semidefinite programming is often used to enforce this constraint, but does not scale well and is not easy to implement. BoostMetric is instead based on the observation that any positive semidefinite matrix can be decomposed into a linear combination of trace-one rank-one matrices. BoostMetric thus uses rank-one positive semidefinite matrices as weak learners within an efficient and scalable boosting-based learning process. The resulting methods are easy to implement, efficient, and can accommodate various types of constraints. We extend traditional boosting algorithms in that the weak learner is a positive semidefinite matrix with trace and rank being one rather than a classifier or regressor. Experiments on various data sets demonstrate that the proposed algorithms compare favorably to those state-of-the-art methods in terms of classification accuracy and running time. Keywords: Mahalanobis distance, semidefinite programming, column generation, boosting, Lagrange duality, large margin nearest neighbor
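The representation at the heart of this abstract, a PSD matrix written as a nonnegative combination of trace-one rank-one matrices, makes distance evaluation cheap without ever forming the matrix. A small sketch follows (names ours; it illustrates the representation, not the boosting loop).

import numpy as np

def mahalanobis_sq(weights, units, x, y):
    """Squared distance under M = sum_i w_i * u_i u_i^T with w_i >= 0 and
    ||u_i|| = 1 (each u_i u_i^T is trace-one and rank-one, so M is PSD).
    Uses d^T M d = sum_i w_i (u_i . d)^2, avoiding the explicit matrix."""
    d = np.asarray(x) - np.asarray(y)
    return float(sum(w * np.dot(u, d) ** 2 for w, u in zip(weights, units)))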
6 0.14600082 84 jmlr-2012-Online Submodular Minimization
8 0.14379944 8 jmlr-2012-A Primal-Dual Convergence Analysis of Boosting
9 0.14281704 105 jmlr-2012-Selective Sampling and Active Learning from Single and Multiple Teachers
10 0.14263868 18 jmlr-2012-An Improved GLMNET for L1-regularized Logistic Regression
11 0.14176491 65 jmlr-2012-MedLDA: Maximum Margin Supervised Topic Models
12 0.13976569 36 jmlr-2012-Efficient Methods for Robust Classification Under Uncertainty in Kernel Matrices
14 0.13866115 64 jmlr-2012-Manifold Identification in Dual Averaging for Regularized Stochastic Online Learning
15 0.13520034 11 jmlr-2012-A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models
16 0.13465968 104 jmlr-2012-Security Analysis of Online Centroid Anomaly Detection
17 0.13366479 115 jmlr-2012-Trading Regret for Efficiency: Online Convex Optimization with Long Term Constraints
18 0.12815782 86 jmlr-2012-Optimistic Bayesian Sampling in Contextual-Bandit Problems
19 0.12735976 87 jmlr-2012-PAC-Bayes Bounds with Data Dependent Priors
20 0.12681919 77 jmlr-2012-Non-Sparse Multiple Kernel Fisher Discriminant Analysis