Error-Correcting Output Codes Library

Sergio Escalera, Oriol Pujol, Petia Radeva
Abstract: In this paper, we present an open source Error-Correcting Output Codes (ECOC) library. The ECOC framework is a powerful tool to deal with multi-class categorization problems. This library contains both state-of-the-art coding designs (one-versus-one, one-versus-all, dense random, sparse random, DECOC, forest-ECOC, and ECOC-ONE) and decoding designs (Hamming, Euclidean, inverse Hamming, Laplacian, β-density, attenuated, loss-based, probabilistic kernel-based, and loss-weighted) with the parameters defined by the authors, as well as the option to include your own coding, decoding, and base classifier.

Keywords: error-correcting output codes, multi-class classification, coding, decoding, open source, Matlab, Octave

1. Error-Correcting Output Codes

The Error-Correcting Output Codes (ECOC) framework (Dietterich and Bakiri, 1995) is a simple but powerful framework to deal with the multi-class categorization problem based on the embedding of binary classifiers. Given a set of N_c classes, the basis of the ECOC framework consists of designing a codeword for each of the classes. These codewords encode the membership information of each class for a given binary problem. Arranging the codewords as rows of a matrix, we obtain a "coding matrix" M_c, where M_c ∈ {−1, 0, 1}^{N_c × n}, with n the length of the codewords codifying each class. From the point of view of learning, M_c is constructed by considering n binary problems, each one corresponding to a column of the matrix M_c. Each of these binary problems (or dichotomizers) splits the set of classes into two partitions (coded by +1 or −1 in M_c according to their class set membership, or 0 if the class is not considered by the current binary problem). Then, at the decoding step, applying the n trained binary classifiers, a code is obtained for each data point in the test set. This code is compared to the base codewords of each class defined in the matrix M_c, and the data point is assigned to the class with the "closest" codeword. Several decoding strategies have been proposed in the literature; the reader is referred to Escalera et al. (2008) for a more detailed review. An example of an ECOC design is shown in Fig. 1.

Figure 1: ECOC design example. ECOC coding design for a 4-class problem. White, black, and grey positions correspond to the symbols +1, −1, and 0, respectively. Once the four binary problems are learnt, at the decoding step a new test sample X is tested by the n classifiers. Then, the new codeword x = {x^1, ..., x^n} is compared with the class codewords {C_1, ..., C_4}, classifying the new sample by the class c_i whose codeword minimizes the decoding measure.

The ECOC designs are independent of the base classifier applied. They involve error-correcting properties (Dietterich and Bakiri, 1995) and have been shown to reduce the bias and variance produced by the learning algorithm (Kong and Dietterich, 1995). For these reasons, ECOC has been widely used to deal with multi-class categorization problems.
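To make the coding and decoding steps concrete, the following Matlab/Octave sketch runs the full pipeline for an arbitrary coding matrix. It is illustrative only and does not reproduce the library's interface: the function name ecoc_toy is ours, and a linear least-squares dichotomizer stands in for an arbitrary base classifier.

    % Minimal end-to-end ECOC sketch (illustrative; not the library's API).
    % Xtr: Ntr-by-d training data; ytr: class indices in 1..Nc;
    % Xte: Nte-by-d test data;     Mc:  Nc-by-n coding matrix over {-1,0,1}.
    function pred = ecoc_toy(Xtr, ytr, Xte, Mc)
      [Nc, n] = size(Mc);
      W = zeros(size(Xtr, 2) + 1, n);
      for j = 1:n                          % one dichotomizer per column of Mc
        b = Mc(ytr, j);                    % per-sample binary targets
        keep = (b ~= 0);                   % classes coded 0 are left out
        A = [Xtr(keep, :) ones(sum(keep), 1)];
        W(:, j) = A \ b(keep);             % least-squares linear dichotomizer
      end
      codes = sign([Xte ones(size(Xte, 1), 1)] * W);   % test codewords
      pred = zeros(size(Xte, 1), 1);
      for s = 1:size(Xte, 1)               % Hamming decoding for each sample
        HD = sum((1 - sign(repmat(codes(s, :), Nc, 1) .* Mc)) / 2, 2);
        pred(s) = find(HD == min(HD), 1);  % class with the closest codeword
      end
    end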
2. Library Algorithms

The ECOC library is Matlab/Octave code under the open source GPL license (gpl), implementing the state-of-the-art coding and decoding ECOC designs. A main function defines the multi-class data, coding, decoding, and base classifier. A list of parameters is also included in order to tune the different strategies. In addition to the implemented coding and decoding designs, which are described in the following section, the user can include their own coding, decoding, and base classifier as defined in the user guide.

2.1 Implemented Coding Designs

The ECOC designs of the ECOC library cover the state of the art of coding strategies, divided into two main groups: problem-independent approaches, which do not take into account the distribution of the data to define the coding matrix, and problem-dependent designs, where information of the particular domain is used to guide the coding design. Code sketches of three problem-independent constructions are given after the lists below.

2.1.1 Problem-Independent ECOC Designs

• One-versus-all (Rifkin and Klautau, 2004): N_c dichotomizers are learnt for N_c classes, where each one splits one class from the rest of classes.

• One-versus-one (Nilsson, 1965): n = N_c(N_c − 1)/2 dichotomizers are learnt for N_c classes, splitting each possible pair of classes.

• Dense Random (Allwein et al., 2002): n = 10 · log₂ N_c dichotomizers are suggested to be learnt for N_c classes, where P(−1) = 1 − P(+1), with P(−1) and P(+1) the probabilities of the symbols −1 and +1 appearing, respectively. Then, from a set of defined random matrices, the one which maximizes a decoding measure among all possible rows of M_c is selected.

• Sparse Random (Escalera et al., 2009): n = 15 · log₂ N_c dichotomizers are suggested to be learnt for N_c classes, where P(0) = 1 − P(−1) − P(+1), defining a set of random matrices M_c and selecting the one which maximizes a decoding measure among all possible rows of M_c.

2.1.2 Problem-Dependent ECOC Designs

• DECOC (Pujol et al., 2006): problem-dependent design that uses n = N_c − 1 dichotomizers. The partitions of the problem are learnt by means of a binary tree structure using exhaustive search or a SFFS (Sequential Forward Floating Search) criterion. Finally, each internal node of the tree is embedded as a column in M_c.

• Forest-ECOC (Escalera et al., 2007): problem-dependent design that uses n = (N_c − 1) · T dichotomizers, where T stands for the number of binary tree structures to be embedded. This approach extends the variability of the classifiers of the DECOC design by including extra dichotomizers.

• ECOC-ONE (Pujol et al., 2008): problem-dependent design that uses n = 2 · N_c suggested dichotomizers. A validation subset is used to extend any initial matrix M_c and to increase its generalization by including new dichotomizers that focus on difficult-to-split classes.
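As a concrete illustration of the problem-independent designs above, the following Matlab/Octave sketch builds one-versus-all, one-versus-one, and dense random coding matrices. The function names are ours, not the library's, and using the minimum inter-row Hamming distance to select the dense random matrix is one plausible reading of the unspecified "decoding measure".

    % Illustrative coding-matrix constructors (not the library's own).
    function Mc = one_vs_all(Nc)
      Mc = 2 * eye(Nc) - ones(Nc);          % +1 on the diagonal, -1 elsewhere
    end

    function Mc = one_vs_one(Nc)
      n = Nc * (Nc - 1) / 2;                % one column per pair of classes
      Mc = zeros(Nc, n);
      j = 0;
      for c1 = 1:Nc-1
        for c2 = c1+1:Nc
          j = j + 1;
          Mc(c1, j) = 1;                    % class c1 against class c2;
          Mc(c2, j) = -1;                   % all other classes are coded 0
        end
      end
    end

    function Mc = dense_random(Nc, T)
      % Keep, out of T random draws, the matrix with the largest minimum
      % Hamming distance between rows (an assumed selection measure).
      n = ceil(10 * log2(Nc));
      best = -inf;
      for t = 1:T
        M = 2 * (rand(Nc, n) > 0.5) - 1;    % P(+1) = P(-1) = 0.5
        D = inf;
        for i1 = 1:Nc-1
          for i2 = i1+1:Nc
            D = min(D, sum(M(i1, :) ~= M(i2, :)));
          end
        end
        if D > best, best = D; Mc = M; end
      end
    end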
2.2 Implemented Decoding Designs

The software comes with a complete set of ECOC decoding strategies. The notation refers to that of Escalera et al. (2008); a code sketch of some of the simpler measures follows the list:

• Hamming decoding: HD(x, y_i) = ∑_{j=1}^{n} (1 − sign(x^j · y_i^j))/2, where x is a test codeword and y_i is the codeword from M_c corresponding to class C_i.

• Inverse Hamming decoding: IHD(x, y_i) = max(Δ^{−1} D^T), where Δ(i_1, i_2) = HD(y_{i_1}, y_{i_2}) and D is the vector of Hamming decoding values of the test codeword x for each of the base codewords y_i.

• Euclidean decoding: ED(x, y_i) = √(∑_{j=1}^{n} (x^j − y_i^j)²).

• Attenuated Euclidean decoding: AED(x, y_i) = √(∑_{j=1}^{n} |y_i^j| |x^j| (x^j − y_i^j)²).

• Loss-based decoding: LB(ρ, y_i) = ∑_{j=1}^{n} L(y_i^j · f^j(ρ)), where ρ is a test sample, L is a loss function, and f is a real-valued function f : R^n → R.

• Probabilistic-based decoding: PD(y_i, x) = −log(∏_{j∈[1,n]: M_c(i,j)≠0} P(x^j = M_c(i, j) | f^j) + K), where K is a constant factor that collects the probability mass dispersed on the invalid codes, and the probability P(x^j = M_c(i, j) | f^j) is estimated by means of P(x^j = y_i^j | f^j) = 1 / (1 + e^{y_i^j (υ^j f^j + ω^j)}), where the vectors υ and ω are obtained by solving an optimization problem (Passerini et al., 2004).

• Laplacian decoding: LAP(x, y_i) = (α_i + 1) / (α_i + β_i + K), where α_i is the number of matched positions between x and y_i, β_i is the number of mismatches without considering the positions coded by 0, and K is an integer value that codifies the number of classes considered by the classifier.

• Pessimistic β-Density Distribution decoding: accuracy s_i: ∫_{ν_i − s_i}^{ν_i} ψ_i(ν, α_i, β_i) dν = 1/3, where ψ_i(ν, α_i, β_i) = (1/K) ν^{α_i} (1 − ν)^{β_i} is the β-Density Distribution between a codeword x and a class codeword y_i for class c_i, and ν ∈ [0, 1].

• Loss-Weighted decoding: LW(ρ, i) = ∑_{j=1}^{n} M_W(i, j) L(y_i^j · f(ρ, j)), where M_W(i, j) = H(i, j) / ∑_{j=1}^{n} H(i, j), H(i, j) = (1/m_i) ∑_{k=1}^{m_i} ϕ(h^j(ρ_i^k), i, j), ϕ(x^j, i, j) = 1 if x^j = y_i^j and 0 otherwise, m_i is the number of training samples from class C_i, and ρ_i^k is the kth sample from class C_i.
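For concreteness, the following Matlab/Octave lines implement four of the simpler measures above for a single pair of codewords. These are our own helper handles, not the library's functions.

    % Illustrative decoding measures for a test codeword x (1-by-n, in
    % {-1,+1}) and a class codeword y (1-by-n, in {-1,0,1}).
    HD    = @(x, y) sum((1 - sign(x .* y)) / 2);               % Hamming
    ED    = @(x, y) sqrt(sum((x - y).^2));                     % Euclidean
    AED   = @(x, y) sqrt(sum(abs(y) .* abs(x) .* (x - y).^2)); % attenuated
    nhit  = @(x, y) sum(x == y & y ~= 0);                      % alpha: matches
    nmiss = @(x, y) sum(x ~= y & y ~= 0);                      % beta: mismatches, 0s ignored
    LAP   = @(x, y, K) (nhit(x, y) + 1) / (nhit(x, y) + nmiss(x, y) + K);

    x = [1 -1 -1 1];    % test codeword from four dichotomizers
    y = [1 0 -1 -1];    % class codeword with one position coded 0
    HD(x, y)            % -> 1.5: the position coded 0 contributes 1/2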
3. Implementation Details

The ECOC library comes with detailed documentation. A user guide describes the usage of the software, and all the strategies and parameters used in the functions and files are described in detail. The user guide also presents examples of variable setting and execution, including a demo file. Regarding computational complexity, the training and testing time depends on the data size, the coding and decoding algorithms, and the base classifier used in the ECOC design.

Acknowledgments

This work has been supported in part by projects TIN2009-14404-C02 and CONSOLIDER-INGENIO CSD 2007-00018.

References

GPL. URL http://www.gnu.org/licences/.

E. Allwein, R. Schapire, and Y. Singer. Reducing multiclass to binary: A unifying approach for margin classifiers. Journal of Machine Learning Research, 1:113–141, 2002.

T. Dietterich and G. Bakiri. Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2:263–282, 1995.

S. Escalera, O. Pujol, and P. Radeva. Boosted landmarks of contextual descriptors and Forest-ECOC: A novel framework to detect and classify objects in clutter scenes. Pattern Recognition Letters, 28(13):1759–1768, 2007.

S. Escalera, O. Pujol, and P. Radeva. On the decoding process in ternary error-correcting output codes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 99, 2008.

S. Escalera, O. Pujol, and P. Radeva. Separability of ternary codes for sparse designs of error-correcting output codes. Pattern Recognition Letters, 30:285–297, 2009.

E. B. Kong and T. G. Dietterich. Error-correcting output coding corrects bias and variance. In International Conference on Machine Learning, pages 313–321, 1995.

N. J. Nilsson. Learning Machines. McGraw-Hill, 1965.

A. Passerini, M. Pontil, and P. Frasconi. New results on error correcting output codes of kernel machines. IEEE Transactions on Neural Networks, 15(1):45–54, 2004.

O. Pujol, P. Radeva, and J. Vitrià. Discriminant ECOC: A heuristic method for application-dependent design of error correcting output codes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28:1001–1007, 2006.

O. Pujol, S. Escalera, and P. Radeva. An incremental node embedding technique for error-correcting output codes. Pattern Recognition, 41:713–725, 2008.

R. Rifkin and A. Klautau. In defense of one-vs-all classification. Journal of Machine Learning Research, 5:101–141, 2004.