jmlr jmlr2011 jmlr2011-95 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ingo Steinwart, Don Hush, Clint Scovel
Abstract: We develop, analyze, and test a training algorithm for support vector machine classifiers without offset. Key features of this algorithm are a new, statistically motivated stopping criterion, new warm start options, and a set of inexpensive working set selection strategies that significantly reduce the number of iterations. For these working set strategies, we establish convergence rates that, not surprisingly, coincide with the best known rates for SVMs with offset. We further conduct various experiments that investigate both the run time behavior and the iterations performed by the new training algorithm. It turns out that the new algorithm needs significantly fewer iterations and also runs substantially faster than standard training algorithms for SVMs with offset. Keywords: support vector machines, decomposition algorithms
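The abstract describes a decomposition solver for SVMs without offset that stops via a clipped duality gap. Below is a minimal Python sketch of the underlying idea: because the offset-free dual has no equality constraint, single coordinates of alpha can be updated in closed form, and training can stop once a (clipped) duality gap falls below a tolerance. The function names, the random sweep order, and the exact form and normalization of the stopping test are illustrative assumptions; the paper's algorithm uses more refined working set selection and warm start options.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Gram matrix of a Gaussian RBF kernel (an assumed kernel choice for the demo).
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def train_svm_no_offset(K, y, C=1.0, eps=1e-3, max_sweeps=1000, alpha0=None):
    # Dual of the hinge-loss SVM without offset:
    #   max_alpha  sum(alpha) - 0.5 * alpha^T Q alpha,   0 <= alpha_i <= C,
    # with Q_ij = y_i y_j K_ij.  Without an offset there is no equality
    # constraint, so each coordinate can be updated in closed form.
    n = len(y)
    alpha = np.zeros(n) if alpha0 is None else alpha0.astype(float)
    f = (alpha * y) @ K                      # f_i = sum_j alpha_j y_j K_ij
    for sweep in range(max_sweeps):
        for i in np.random.permutation(n):   # a real working-set rule would pick i adaptively
            if K[i, i] <= 0.0:
                continue
            grad = 1.0 - y[i] * f[i]         # dual gradient w.r.t. alpha_i
            new_ai = min(max(alpha[i] + grad / K[i, i], 0.0), C)
            delta = new_ai - alpha[i]
            if delta != 0.0:
                f += delta * y[i] * K[i]     # rank-one update of the decision values
                alpha[i] = new_ai
        # Clipped-duality-gap style test (sketch): evaluate the hinge loss on the
        # decision function clipped to [-1, 1]; the paper's exact criterion and
        # its normalization differ.
        w_norm2 = float(np.dot(alpha * y, f))                 # = alpha^T Q alpha
        clipped_hinge = np.maximum(0.0, 1.0 - y * np.clip(f, -1.0, 1.0)).sum()
        gap = w_norm2 + C * clipped_hinge - alpha.sum()
        if gap <= eps:
            break
    return alpha, f

# Toy usage on random data (labels in {-1, +1}).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = np.where(X[:, 0] + 0.3 * rng.normal(size=200) > 0, 1.0, -1.0)
    K = rbf_kernel(X, gamma=0.5)
    alpha, f = train_svm_no_offset(K, y, C=1.0)
    print("training accuracy:", float(np.mean(np.sign(f) == y)))
```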
Reference: text
sentIndex sentText sentNum sentScore
1 In particular, for data sets containing a few thousand samples, SVMs without offset profit about twice as much from a good warm start strategy as SVMs with offset do. [sent-59, score-0.139]
2 We further describe the new stopping criterion based on a clipped duality gap as well as several warm start strategies. [sent-62, score-0.143]
3-4 [figure residue: data set axis labels (sonar, heart, ionosphere, liver-disorders, breast-cancer, australian, diabetes, fourclass); no recoverable prose]
5 using a warm start for SVMs with offset is about 20%, while for SVMs without offset it is about 45% even if only the simple warm start option W5 is used. [sent-1177, score-0.156]
6 Consequently, it seems fair to say that SVMs without offset benefit substantially more from warm start strategies than SVMs with offset do. [sent-1179, score-0.142]
7 This stopping criterion, which is essentially a relaxed duality gap, never leads to more iterations than the classical duality gap stopping criterion, but it often decreases the number of iterations. [sent-1201, score-0.155]
8-11 [figure residue: data set axis labels and legend entries (WSS 1, 3, 5, 7, 8); no recoverable prose]
12 The graphics display the number of iterations in thousands (top), the run time in seconds (middle), and the ratios WSS x/WSS 1 of the run times (bottom). [sent-1277, score-0.165]
13-17 [figure residue: data set axis labels and legend entries (1D-SVM, WSS 1, 2, 3, 5, 7, 8); no recoverable prose]
18 The graphics display the average number of iterations in thousands (top), the run time in seconds (middle), and the ratios WSS x/WSS 1 of the run times (bottom). [sent-1311, score-0.165]
19-24 [figure residue: data set axis labels and legend entries (WSS 7, 32, 64, 128, 256); no recoverable prose]
25 The graphics display the number of iterations in thousands (top), the run time in seconds (middle), and the ratios WSS x/WSS 32 of the number of iterations (bottom). [sent-1391, score-0.166]
26-29 [figure residue: data set axis labels and legend entries (WSS 7, 32, 64, 128); no recoverable prose]
30 The graphics display the average number of iterations in thousands (top), the run time in seconds (middle), and the ratios WSS x/WSS 32 of the number of iterations (bottom). [sent-1450, score-0.166]
31 The graphics at the top display the number of iterations in thousands (left) and the run time in seconds (right), both averaged over the 10 folds, while the graphics at the bottom display the corresponding ratios WSS x/WSS 7. [sent-1466, score-0.202]
32-35 [figure residue: data set axis labels and legend entries (WSS 7, 519, 1031); no recoverable prose]
36 The graphics display the number of iterations in thousands (top), the run time in seconds (middle), and the ratios WSS x/WSS 7 of the run times (bottom). [sent-1531, score-0.165]
37-39 [figure residue: data set axis labels; no recoverable prose]
40 The graphics display the number of iterations in thousands (top), the run time in seconds (middle), and the ratios WSS x/WSS 7 of the run times (bottom). [sent-1594, score-0.165]
41 The graphics at the top display the number of iterations in thousands (left) and the run time in seconds (right), both averaged over the 10 folds, while the graphics at the bottom display the corresponding ratios WSS x/WSS 7. [sent-1621, score-0.202]
42 [figure residue: legend entries (1D-SVM, WSS 1, 7, 16, LIBSVM) and data set axis labels; no recoverable prose]
43 The graphics display the number of iterations in thousands (top), the run time in seconds (middle), and the ratios x/WSS 7 of the run times (bottom). [sent-1659, score-0.165]
44-45 [figure residue: data set axis labels and legend entries (1D-SVM, WSS 1, 7, 16, LIBSVM); no recoverable prose]
46 The graphics display the average number of iterations in thousands (top), the run time in seconds (middle), and the ratios x/WSS 7 of the run times (bottom). [sent-1683, score-0.165]
47-54 [figure residue: data set axis labels and legend entries (Duality gap, Clipped duality gap); no recoverable prose]
55 The graphics at the top display the number of iterations in thousands for the 2D-SVM with WSS 7, while the graphics in the middle show the corresponding run time in seconds. [sent-1767, score-0.169]
56-61 [figure residue: data set axis labels and legend entries (Duality gap, Clipped duality gap); no recoverable prose]
62 Again, the graphics at the top display the number of iterations in thousands for the different stopping criteria applied to the 2D-SVM with WSS 7, while the graphics in the middle show the corresponding run time in seconds. [sent-1839, score-0.189]
63-64 [figure residue: page header and legend entries (Duality gap, Clipped duality gap); no recoverable prose]
65 The graphics at the top display the number of iterations in thousands (left) and the run time in seconds (right), both averaged over the 10 folds, while the graphics at the bottom display the corresponding ratios. [sent-1864, score-0.19]
66-71 [figure residue: data set axis labels and legend entries (5 NN, 10 NN, 15 NN, 20 NN); no recoverable prose]
72 The graphics display the number of iterations in thousands (top), the run time in seconds (middle), and the corresponding ratios xNN/10NN of the run times (bottom). [sent-1931, score-0.165]
73-78 [figure residue: data set axis labels and legend entries (5 NN, 10 NN, 15 NN, 20 NN); no recoverable prose]
79 The graphics display the average number of iterations in thousands (top), the run time in seconds (middle), and the corresponding ratios xNN/10NN of the run times (bottom). [sent-1997, score-0.165]
80 The graphics at the top display the number of iterations in thousands (left) and the run time in seconds (right), both averaged over the 10 folds, while the graphics at the bottom display the corresponding ratios xNN/10NN. [sent-2021, score-0.202]
81-83 [figure residue: data set axis labels and legend entries (I0-W0, I1-W1, I0-W2, I1-W2, I0-W5, I1-W3); no recoverable prose]
84 The graphics display the number of iterations in thousands (top), the run time in seconds (middle), and the ratios Ix-Wy/I0-W0 of the run times (bottom). [sent-2079, score-0.165]
85-90 [figure residue: data set axis labels and legend entries (I0-W0, I0-W4, I1-W4, I0-W6, I1-W6, I0-W5, I1-W3); no recoverable prose]
91 The graphics display the number of iterations in thousands (top), the run time in seconds (middle), and the ratios Ix-Wy/I0-W0 of the run times (bottom). [sent-2140, score-0.165]
92-98 [figure residue: data set axis labels and legend entries (I0-W0, I0-W2, I0-W3, I0-W5); no recoverable prose]
99 The graphics display the number of iterations in thousands (top), the run time in seconds (middle), and the ratios Ix-Wy/I0-W0 of the run times (bottom). [sent-2202, score-0.165]
100 The graphics at the top display the number of iterations in thousands (left) and the run time in seconds (right), both averaged over the 10 folds, while the graphics at the bottom display the corresponding ratios I0-Wx/I0-W0. [sent-2219, score-0.202]
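Sentences 1 and 5 above report that SVMs without offset benefit roughly twice as much from warm starts over the hyperparameter grid as SVMs with offset. The sketch below shows one generic warm start heuristic, rescaling the previous alpha when moving to the next value of C; it reuses the train_svm_no_offset sketch given after the abstract and is not necessarily any of the W0-W6 options evaluated in the paper.

```python
import numpy as np

def grid_train_warm_start(K, y, C_grid, eps=1e-3):
    # Train one SVM without offset per value of C, warm-starting each run from
    # the previous solution.  Rescaling the old alpha by C_new / C_old keeps it
    # inside the new box [0, C_new]; this is a generic heuristic, not necessarily
    # one of the warm start options studied in the paper.
    models = {}
    alpha_prev, C_prev = None, None
    for C in sorted(C_grid):
        if alpha_prev is None:
            alpha0 = None                                   # cold start at the first grid point
        else:
            alpha0 = np.clip(alpha_prev * (C / C_prev), 0.0, C)
        alpha, f = train_svm_no_offset(K, y, C=C, eps=eps, alpha0=alpha0)
        models[C] = alpha
        alpha_prev, C_prev = alpha, C
    return models
```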
wordName wordTfidf (topN-words)
[('wss', 0.93), ('ijc', 0.11), ('sv', 0.097), ('mg', 0.081), ('nn', 0.079), ('uid', 0.075), ('fou', 0.068), ('lic', 0.064), ('typ', 0.059), ('sp', 0.058), ('libsvm', 0.056), ('ov', 0.055), ('bre', 0.053), ('co', 0.05), ('ion', 0.05), ('offset', 0.049), ('dia', 0.048), ('roo', 0.046), ('ide', 0.043), ('liv', 0.043), ('rcl', 0.043), ('tes', 0.043), ('ge', 0.043), ('mu', 0.042), ('covel', 0.041), ('teinwart', 0.041), ('ush', 0.041), ('graphics', 0.041), ('ffset', 0.039), ('vty', 0.039), ('ali', 0.037), ('wi', 0.035), ('clipped', 0.035), ('art', 0.031), ('duality', 0.03), ('svm', 0.03), ('raining', 0.03), ('sh', 0.03), ('rde', 0.029), ('warm', 0.029), ('na', 0.029), ('cold', 0.029), ('str', 0.029), ('gap', 0.029), ('svms', 0.027), ('ty', 0.027), ('iterations', 0.026), ('grid', 0.025), ('au', 0.025), ('run', 0.025), ('ere', 0.025), ('ete', 0.025), ('las', 0.025), ('ki', 0.025), ('ms', 0.023), ('hro', 0.022), ('ive', 0.022), ('pe', 0.021), ('jcn', 0.021), ('display', 0.021), ('ci', 0.02), ('rs', 0.02), ('cinew', 0.02), ('stopping', 0.02), ('working', 0.019), ('rc', 0.019), ('io', 0.018), ('svmguide', 0.018), ('rm', 0.018), ('ei', 0.017), ('tra', 0.017), ('lia', 0.017), ('nc', 0.017), ('ph', 0.016), ('ord', 0.016), ('gain', 0.016), ('dis', 0.016), ('rea', 0.016), ('om', 0.016), ('cnew', 0.016), ('hush', 0.015), ('thousands', 0.015), ('strategies', 0.015), ('bsv', 0.014), ('steinwart', 0.014), ('vm', 0.014), ('usv', 0.013), ('os', 0.013), ('ce', 0.013), ('er', 0.012), ('solver', 0.012), ('ratios', 0.012), ('graphic', 0.012), ('mvp', 0.012), ('strategy', 0.012), ('requirements', 0.012), ('gu', 0.011), ('attains', 0.011), ('iso', 0.011), ('st', 0.011)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 95 jmlr-2011-Training SVMs Without Offset
Author: Ingo Steinwart, Don Hush, Clint Scovel
Abstract: We develop, analyze, and test a training algorithm for support vector machine classifiers without offset. Key features of this algorithm are a new, statistically motivated stopping criterion, new warm start options, and a set of inexpensive working set selection strategies that significantly reduce the number of iterations. For these working set strategies, we establish convergence rates that, not surprisingly, coincide with the best known rates for SVMs with offset. We further conduct various experiments that investigate both the run time behavior and the iterations performed by the new training algorithm. It turns out that the new algorithm needs significantly fewer iterations and also runs substantially faster than standard training algorithms for SVMs with offset. Keywords: support vector machines, decomposition algorithms
2 0.023023488 45 jmlr-2011-Internal Regret with Partial Monitoring: Calibration-Based Optimal Algorithms
Author: Vianney Perchet
Abstract: We provide consistent random algorithms for sequential decision under partial monitoring, when the decision maker does not observe the outcomes but receives instead random feedback signals. Those algorithms have no internal regret in the sense that, on the set of stages where the decision maker chose his action according to a given law, the average payoff could not have been improved in average by using any other fixed law. They are based on a generalization of calibration, no longer defined in terms of a Voronoï diagram but instead of a Laguerre diagram (a more general concept). This allows us to bound, for the first time in this general framework, the expected average internal, as well as the usual external, regret at stage n by O(n^(-1/3)), which is known to be optimal. Keywords: repeated games, on-line learning, regret, partial monitoring, calibration, Voronoï and Laguerre diagrams
3 0.01366636 77 jmlr-2011-Posterior Sparsity in Unsupervised Dependency Parsing
Author: Jennifer Gillenwater, Kuzman Ganchev, João Graça, Fernando Pereira, Ben Taskar
Abstract: A strong inductive bias is essential in unsupervised grammar induction. In this paper, we explore a particular sparsity bias in dependency grammars that encourages a small number of unique dependency types. We use part-of-speech (POS) tags to group dependencies by parent-child types and investigate sparsity-inducing penalties on the posterior distributions of parent-child POS tag pairs in the posterior regularization (PR) framework of Graça et al. (2007). In experiments with 12 different languages, we achieve significant gains in directed attachment accuracy over the standard expectation maximization (EM) baseline, with an average accuracy improvement of 6.5%, outperforming EM by at least 1% for 9 out of 12 languages. Furthermore, the new method outperforms models based on standard Bayesian sparsity-inducing parameter priors with an average improvement of 5% and positive gains of at least 1% for 9 out of 12 languages. On English text in particular, we show that our approach improves performance over other state-of-the-art techniques.
4 0.012908407 30 jmlr-2011-Efficient Structure Learning of Bayesian Networks using Constraints
Author: Cassio P. de Campos, Qiang Ji
Abstract: This paper addresses the problem of learning Bayesian network structures from data based on score functions that are decomposable. It describes properties that strongly reduce the time and memory costs of many known methods without losing global optimality guarantees. These properties are derived for different score criteria such as Minimum Description Length (or Bayesian Information Criterion), Akaike Information Criterion and Bayesian Dirichlet Criterion. Then a branch-and-bound algorithm is presented that integrates structural constraints with data in a way to guarantee global optimality. As an example, structural constraints are used to map the problem of structure learning in Dynamic Bayesian networks into a corresponding augmented Bayesian network. Finally, we show empirically the benefits of using the properties with state-of-the-art methods and with the new algorithm, which is able to handle larger data sets than before. Keywords: Bayesian networks, structure learning, properties of decomposable scores, structural constraints, branch-and-bound technique
5 0.012452506 105 jmlr-2011-lp-Norm Multiple Kernel Learning
Author: Marius Kloft, Ulf Brefeld, Sören Sonnenburg, Alexander Zien
Abstract: Learning linear combinations of multiple kernels is an appealing strategy when the right choice of features is unknown. Previous approaches to multiple kernel learning (MKL) promote sparse kernel combinations to support interpretability and scalability. Unfortunately, this ℓ1-norm MKL is rarely observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtures that generalize well, we extend MKL to arbitrary norms. We devise new insights on the connection between several existing MKL formulations and develop two efficient interleaved optimization strategies for arbitrary norms, that is ℓp-norms with p ≥ 1. This interleaved optimization is much faster than the commonly used wrapper approaches, as demonstrated on several data sets. A theoretical analysis and an experiment on controlled artificial data shed light on the appropriateness of sparse, non-sparse and ℓ∞-norm MKL in various scenarios. Importantly, empirical applications of ℓp-norm MKL to three real-world problems from computational biology show that non-sparse MKL achieves accuracies that surpass the state-of-the-art. Data sets, source code to reproduce the experiments, implementations of the algorithms, and further information are available at http://doc.ml.tu-berlin.de/nonsparse_mkl/. Keywords: multiple kernel learning, learning kernels, non-sparse, support vector machine, convex conjugate, block coordinate descent, large scale optimization, bioinformatics, generalization bounds, Rademacher complexity ∗. Also at Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany. †. Parts of this work were done while SS was at the Friedrich Miescher Laboratory, Max Planck Society, 72076 Tübingen, Germany. ‡. Most contributions by AZ were done at the Fraunhofer Institute FIRST, 12489 Berlin, Germany.
6 0.011633006 68 jmlr-2011-Natural Language Processing (Almost) from Scratch
7 0.011141522 53 jmlr-2011-Learning High-Dimensional Markov Forest Distributions: Analysis of Error Rates
8 0.011130126 13 jmlr-2011-Bayesian Generalized Kernel Mixed Models
9 0.010746416 18 jmlr-2011-Convergence Rates of Efficient Global Optimization Algorithms
10 0.010221099 67 jmlr-2011-Multitask Sparsity via Maximum Entropy Discrimination
11 0.010116195 34 jmlr-2011-Faster Algorithms for Max-Product Message-Passing
12 0.010098701 66 jmlr-2011-Multiple Kernel Learning Algorithms
13 0.010044969 55 jmlr-2011-Learning Multi-modal Similarity
14 0.0096158125 51 jmlr-2011-Laplacian Support Vector Machines Trained in the Primal
15 0.0095350184 9 jmlr-2011-An Asymptotic Behaviour of the Marginal Likelihood for General Markov Models
16 0.0092894481 19 jmlr-2011-Convergence of Distributed Asynchronous Learning Vector Quantization Algorithms
17 0.0090639582 100 jmlr-2011-Unsupervised Supervised Learning II: Margin-Based Classification Without Labels
18 0.0089630783 101 jmlr-2011-Variable Sparsity Kernel Learning
19 0.0088485274 103 jmlr-2011-Weisfeiler-Lehman Graph Kernels
20 0.0088034933 80 jmlr-2011-Regression on Fixed-Rank Positive Semidefinite Matrices: A Riemannian Approach
topicId topicWeight
[(0, 0.046), (1, -0.006), (2, 0.006), (3, -0.019), (4, 0.004), (5, 0.008), (6, -0.009), (7, 0.01), (8, -0.007), (9, -0.005), (10, -0.032), (11, -0.03), (12, 0.005), (13, 0.021), (14, 0.009), (15, -0.026), (16, 0.002), (17, -0.034), (18, -0.015), (19, 0.023), (20, 0.021), (21, 0.028), (22, 0.018), (23, 0.0), (24, -0.006), (25, -0.019), (26, -0.041), (27, 0.04), (28, -0.099), (29, 0.137), (30, -0.001), (31, 0.054), (32, 0.015), (33, 0.084), (34, -0.187), (35, 0.301), (36, -0.004), (37, -0.138), (38, 0.083), (39, 0.036), (40, -0.233), (41, -0.696), (42, -0.44), (43, -0.09), (44, -0.036), (45, 0.083), (46, -0.063), (47, -0.02), (48, 0.067), (49, -0.005)]
simIndex simValue paperId paperTitle
same-paper 1 0.99040276 95 jmlr-2011-Training SVMs Without Offset
Author: Ingo Steinwart, Don Hush, Clint Scovel
Abstract: We develop, analyze, and test a training algorithm for support vector machine classifiers without offset. Key features of this algorithm are a new, statistically motivated stopping criterion, new warm start options, and a set of inexpensive working set selection strategies that significantly reduce the number of iterations. For these working set strategies, we establish convergence rates that, not surprisingly, coincide with the best known rates for SVMs with offset. We further conduct various experiments that investigate both the run time behavior and the iterations performed by the new training algorithm. It turns out that the new algorithm needs significantly fewer iterations and also runs substantially faster than standard training algorithms for SVMs with offset. Keywords: support vector machines, decomposition algorithms
2 0.1056635 83 jmlr-2011-Scikit-learn: Machine Learning in Python
Author: Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, Édouard Duchesnay
Abstract: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net. Keywords: Python, supervised learning, unsupervised learning, model selection
3 0.10551225 45 jmlr-2011-Internal Regret with Partial Monitoring: Calibration-Based Optimal Algorithms
Author: Vianney Perchet
Abstract: We provide consistent random algorithms for sequential decision under partial monitoring, when the decision maker does not observe the outcomes but receives instead random feedback signals. Those algorithms have no internal regret in the sense that, on the set of stages where the decision maker chose his action according to a given law, the average payoff could not have been improved in average by using any other fixed law. They are based on a generalization of calibration, no longer defined in terms of a Voronoï diagram but instead of a Laguerre diagram (a more general concept). This allows us to bound, for the first time in this general framework, the expected average internal, as well as the usual external, regret at stage n by O(n^(-1/3)), which is known to be optimal. Keywords: repeated games, on-line learning, regret, partial monitoring, calibration, Voronoï and Laguerre diagrams
4 0.078880765 51 jmlr-2011-Laplacian Support Vector Machines Trained in the Primal
Author: Stefano Melacci, Mikhail Belkin
Abstract: In the last few years, due to the growing ubiquity of unlabeled data, much effort has been spent by the machine learning community to develop better understanding and improve the quality of classifiers exploiting unlabeled data. Following the manifold regularization approach, Laplacian Support Vector Machines (LapSVMs) have shown the state of the art performance in semi-supervised classification. In this paper we present two strategies to solve the primal LapSVM problem, in order to overcome some issues of the original dual formulation. In particular, training a LapSVM in the primal can be efficiently performed with preconditioned conjugate gradient. We speed up training by using an early stopping strategy based on the prediction on unlabeled data or, if available, on labeled validation examples. This allows the algorithm to quickly compute approximate solutions with roughly the same classification accuracy as the optimal ones, considerably reducing the training time. The computational complexity of the training algorithm is reduced from O(n^3) to O(kn^2), where n is the combined number of labeled and unlabeled examples and k is empirically evaluated to be significantly smaller than n. Due to its simplicity, training LapSVM in the primal can be the starting point for additional enhancements of the original LapSVM formulation, such as those for dealing with large data sets. We present an extensive experimental evaluation on real world data showing the benefits of the proposed approach. Keywords: Laplacian support vector machines, manifold regularization, semi-supervised learning, classification, optimization
5 0.064548552 77 jmlr-2011-Posterior Sparsity in Unsupervised Dependency Parsing
Author: Jennifer Gillenwater, Kuzman Ganchev, João Graça, Fernando Pereira, Ben Taskar
Abstract: A strong inductive bias is essential in unsupervised grammar induction. In this paper, we explore a particular sparsity bias in dependency grammars that encourages a small number of unique dependency types. We use part-of-speech (POS) tags to group dependencies by parent-child types and investigate sparsity-inducing penalties on the posterior distributions of parent-child POS tag pairs in the posterior regularization (PR) framework of Graça et al. (2007). In experiments with 12 different languages, we achieve significant gains in directed attachment accuracy over the standard expectation maximization (EM) baseline, with an average accuracy improvement of 6.5%, outperforming EM by at least 1% for 9 out of 12 languages. Furthermore, the new method outperforms models based on standard Bayesian sparsity-inducing parameter priors with an average improvement of 5% and positive gains of at least 1% for 9 out of 12 languages. On English text in particular, we show that our approach improves performance over other state-of-the-art techniques.
6 0.062179517 67 jmlr-2011-Multitask Sparsity via Maximum Entropy Discrimination
7 0.06098887 52 jmlr-2011-Large Margin Hierarchical Classification with Mutually Exclusive Class Membership
8 0.060512006 13 jmlr-2011-Bayesian Generalized Kernel Mixed Models
9 0.059730124 49 jmlr-2011-Kernel Regression in the Presence of Correlated Errors
10 0.052678876 34 jmlr-2011-Faster Algorithms for Max-Product Message-Passing
11 0.047706954 10 jmlr-2011-Anechoic Blind Source Separation Using Wigner Marginals
12 0.045895878 31 jmlr-2011-Efficient and Effective Visual Codebook Generation Using Additive Kernels
13 0.044754509 88 jmlr-2011-Structured Variable Selection with Sparsity-Inducing Norms
14 0.04223296 80 jmlr-2011-Regression on Fixed-Rank Positive Semidefinite Matrices: A Riemannian Approach
15 0.039718095 68 jmlr-2011-Natural Language Processing (Almost) from Scratch
16 0.038155477 57 jmlr-2011-Learning a Robust Relevance Model for Search Using Kernel Methods
17 0.036031798 71 jmlr-2011-On Equivalence Relationships Between Classification and Ranking Algorithms
18 0.035768859 4 jmlr-2011-A Family of Simple Non-Parametric Kernel Learning Algorithms
19 0.035708647 18 jmlr-2011-Convergence Rates of Efficient Global Optimization Algorithms
20 0.035401057 19 jmlr-2011-Convergence of Distributed Asynchronous Learning Vector Quantization Algorithms
topicId topicWeight
[(4, 0.021), (9, 0.023), (10, 0.038), (24, 0.014), (31, 0.606), (32, 0.022), (41, 0.028), (71, 0.018), (73, 0.017), (78, 0.046), (90, 0.012)]
simIndex simValue paperId paperTitle
1 0.98491919 100 jmlr-2011-Unsupervised Supervised Learning II: Margin-Based Classification Without Labels
Author: Krishnakumar Balasubramanian, Pinar Donmez, Guy Lebanon
Abstract: Many popular linear classifiers, such as logistic regression, boosting, or SVM, are trained by optimizing a margin-based risk function. Traditionally, these risk functions are computed based on a labeled data set. We develop a novel technique for estimating such risks using only unlabeled data and the marginal label distribution. We prove that the proposed risk estimator is consistent on high-dimensional data sets and demonstrate it on synthetic and real-world data. In particular, we show how the estimate is used for evaluating classifiers in transfer learning, and for training classifiers with no labeled data whatsoever. Keywords: classification, large margin, maximum likelihood
2 0.96244293 26 jmlr-2011-Distance Dependent Chinese Restaurant Processes
Author: David M. Blei, Peter I. Frazier
Abstract: We develop the distance dependent Chinese restaurant process, a flexible class of distributions over partitions that allows for dependencies between the elements. This class can be used to model many kinds of dependencies between data in infinite clustering models, including dependencies arising from time, space, and network connectivity. We examine the properties of the distance dependent CRP, discuss its connections to Bayesian nonparametric mixture models, and derive a Gibbs sampler for both fully observed and latent mixture settings. We study its empirical performance with three text corpora. We show that relaxing the assumption of exchangeability with distance dependent CRPs can provide a better fit to sequential data and network data. We also show that the distance dependent CRP representation of the traditional CRP mixture leads to a faster-mixing Gibbs sampling algorithm than the one based on the original formulation. Keywords: Chinese restaurant processes, Bayesian nonparametrics
same-paper 3 0.96162844 95 jmlr-2011-Training SVMs Without Offset
Author: Ingo Steinwart, Don Hush, Clint Scovel
Abstract: We develop, analyze, and test a training algorithm for support vector machine classifiers without offset. Key features of this algorithm are a new, statistically motivated stopping criterion, new warm start options, and a set of inexpensive working set selection strategies that significantly reduce the number of iterations. For these working set strategies, we establish convergence rates that, not surprisingly, coincide with the best known rates for SVMs with offset. We further conduct various experiments that investigate both the run time behavior and the iterations performed by the new training algorithm. It turns out that the new algorithm needs significantly fewer iterations and also runs substantially faster than standard training algorithms for SVMs with offset. Keywords: support vector machines, decomposition algorithms
4 0.95992988 101 jmlr-2011-Variable Sparsity Kernel Learning
Author: Jonathan Aflalo, Aharon Ben-Tal, Chiranjib Bhattacharyya, Jagarlapudi Saketha Nath, Sankaran Raman
Abstract: This paper presents novel algorithms and applications for a particular class of mixed-norm regularization based Multiple Kernel Learning (MKL) formulations. The formulations assume that the given kernels are grouped and employ l1 norm regularization for promoting sparsity within RKHS norms of each group and ls, s ≥ 2 norm regularization for promoting non-sparse combinations across groups. Various sparsity levels in combining the kernels can be achieved by varying the grouping of kernels—hence we name the formulations as Variable Sparsity Kernel Learning (VSKL) formulations. While previous attempts have a non-convex formulation, here we present a convex formulation which admits efficient Mirror-Descent (MD) based solving techniques. The proposed MD based algorithm optimizes over a product of simplices and has a computational complexity of O(m^2 n_tot log(n_max)/ε^2), where m is the number of training data points, n_max and n_tot are the maximum number of kernels in any group and the total number of kernels, respectively, and ε is the error in approximating the objective. A detailed proof of convergence of the algorithm is also presented. Experimental results show that the VSKL formulations are well-suited for multi-modal learning tasks like object categorization. Results also show that the MD based algorithm outperforms state-of-the-art MKL solvers in terms of computational efficiency. Keywords: multiple kernel learning, mirror descent, mixed-norm, object categorization, scalability 1. All authors contributed equally. The author names appear in alphabetical order.
5 0.92403406 96 jmlr-2011-Two Distributed-State Models For Generating High-Dimensional Time Series
Author: Graham W. Taylor, Geoffrey E. Hinton, Sam T. Roweis
Abstract: In this paper we develop a class of nonlinear generative models for high-dimensional time series. We first propose a model based on the restricted Boltzmann machine (RBM) that uses an undirected model with binary latent variables and real-valued “visible” variables. The latent and visible variables at each time step receive directed connections from the visible variables at the last few time-steps. This “conditional” RBM (CRBM) makes on-line inference efficient and allows us to use a simple approximate learning procedure. We demonstrate the power of our approach by synthesizing various sequences from a model trained on motion capture data and by performing on-line filling in of data lost during capture. We extend the CRBM in a way that preserves its most important computational properties and introduces multiplicative three-way interactions that allow the effective interaction weight between two variables to be modulated by the dynamic state of a third variable. We introduce a factoring of the implied three-way weight tensor to permit a more compact parameterization. The resulting model can capture diverse styles of motion with a single set of parameters, and the three-way interactions greatly improve its ability to blend motion styles or to transition smoothly among them. Videos and source code can be found at http://www.cs.nyu.edu/˜gwtaylor/publications/jmlr2011. Keywords: unsupervised learning, restricted Boltzmann machines, time series, generative models, motion capture
6 0.7549606 51 jmlr-2011-Laplacian Support Vector Machines Trained in the Primal
7 0.72519612 84 jmlr-2011-Semi-Supervised Learning with Measure Propagation
8 0.70000184 27 jmlr-2011-Domain Decomposition Approach for Fast Gaussian Process Regression of Large Spatial Data Sets
10 0.69547635 3 jmlr-2011-A Cure for Variance Inflation in High Dimensional Kernel Principal Component Analysis
11 0.69455612 12 jmlr-2011-Bayesian Co-Training
12 0.69028986 77 jmlr-2011-Posterior Sparsity in Unsupervised Dependency Parsing
13 0.68726003 69 jmlr-2011-Neyman-Pearson Classification, Convexity and Stochastic Constraints
14 0.67472267 36 jmlr-2011-Generalized TD Learning
15 0.67452806 13 jmlr-2011-Bayesian Generalized Kernel Mixed Models
16 0.66574097 74 jmlr-2011-Operator Norm Convergence of Spectral Clustering on Level Sets
17 0.66449392 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling
18 0.66344875 16 jmlr-2011-Clustering Algorithms for Chains
19 0.66272616 38 jmlr-2011-Hierarchical Knowledge Gradient for Sequential Sampling
20 0.65914762 4 jmlr-2011-A Family of Simple Non-Parametric Kernel Learning Algorithms