cvpr cvpr2013 cvpr2013-403 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Bin Zhao, Eric P. Xing
Abstract: Many vision tasks require a multi-class classifier to discriminate multiple categories, on the order of hundreds or thousands. In this paper, we propose sparse output coding, a principled way for large-scale multi-class classification, by turning high-cardinality multi-class categorization into a bit-by-bit decoding problem. Specifically, sparse output coding is composed of two steps: efficient coding matrix learning with scalability to thousands of classes, and probabilistic decoding. Empirical results on object recognition and scene classification demonstrate the effectiveness of our proposed approach.
Reference: text
sentIndex sentText sentNum sentScore
1 In this paper, we propose sparse output coding, a principled way for large-scale multi-class classification, by turning high-cardinality multi-class categorization into a bit-by-bit decoding problem. [sent-5, score-0.323]
2 Specifically, sparse output coding is composed of two steps: efficient coding matrix learning with scalability to thousands of classes, and probabilistic decoding. [sent-6, score-0.678]
3 Error Correcting Output Coding: For a K-class problem, error correcting output coding (ECOC) [1] consists of two stages: coding and decoding. [sent-27, score-0.678]
4 Each column βl divides the classes {1, . . . , K} into three disjoint sets: positive partition (+1 in βl), negative partition (−1 in βl), and ignored classes (0 in βl). [sent-34, score-0.313]
5 Binary learning algorithms are used to construct the bit predictor hl using training data Zl = {(x1, By1,l), . . . , (xN, ByN,l)}, for l = 1, . . . , L. [sent-35, score-0.526]
6 (Throughout the rest of this paper, we use “bit predictor” to denote the binary classifier associated with a column of the coding matrix.) [sent-41, score-0.328]
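As an illustration of this setup, the following is a minimal sketch (our own, not the authors' code) of how Zl is formed from one column of B and how hl could be trained; it assumes scikit-learn's LinearSVC as the binary learner and integer class labels:

```python
import numpy as np
from sklearn.svm import LinearSVC  # any binary learner would do

def train_bit_predictor(X, y, beta_l):
    """Train the bit predictor h_l for one coding-matrix column.

    X      : (N, d) feature matrix
    y      : (N,) integer class labels in {0, ..., K-1}
    beta_l : (K,) column of B with entries in {-1, 0, +1}
    """
    bit_labels = beta_l[y]      # relabel each example by its class's code
    keep = bit_labels != 0      # classes coded 0 are ignored, i.e. left out of Z_l
    clf = LinearSVC()
    clf.fit(X[keep], bit_labels[keep])
    return clf
```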
7 Results in [1] suggest that learning a coding matrix in a problem-dependent way is better than using a pre-defined one. [sent-42, score-0.316]
8 However, strong error-correcting ability alone does not guarantee good classification [8], since the performance of output coding is also highly dependent on the accuracy of the individual bit predictors. [sent-43, score-0.715]
9 Consequently, several approaches [23, 8, 13] optimizing coding matrix and bit predictors simultaneously have been proposed. [sent-44, score-0.857]
10 However, the coupling of learning coding matrix and bit predictors in a unified optimization framework is both a blessing and a curse. [sent-45, score-0.857]
11 Therefore, SpOC learns coding matrix and bit predictors separately, but still balances error-correcting ability and bit prediction accuracy in learning the coding matrix. [sent-47, score-1.524]
12 Given a test instance x, the decoding procedure finds the class y whose codeword in B is “closest” to h(x) = (h1(x), . . . , hL(x)). [sent-49, score-0.38]
13 For the binary output coding scenario, where B ∈ {−1, +1}K×L, either Hamming distance or Euclidean distance could be adopted to measure the distance between two codewords. [sent-53, score-0.36]
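Concretely, Hamming decoding in the binary case is a nearest-codeword search; a small sketch (helper names are ours):

```python
import numpy as np

def hamming_decode(B, h_x):
    """Return the class whose codeword is closest to h_x in Hamming distance.

    B   : (K, L) coding matrix with entries in {-1, +1}
    h_x : (L,) bit predictions in {-1, +1}
    """
    dist = np.sum(B != h_x, axis=1)  # Hamming distance to every codeword
    return int(np.argmin(dist))
```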
14 Specifically, previous attempts in decoding ternary codes [12] either (1) treat “0” bits the same way as non-zero bits, or (2) ignore those “0” bits entirely and only use non-zero bits for decoding. [sent-55, score-0.761]
15 Specifically, treating “0” bits the same way as nonzero ones would introduce bias in decoding, since the distance increases with the number of positions that contain the zero symbol. [sent-57, score-0.183]
16 On the other hand, ignoring “0” bits entirely would discard a great amount of information. [sent-58, score-0.16]
17 Probabilistic decoding utilizes zero bits by propagating labels from nonzero bits to zero ones subject to smoothness constraints, and proves effective especially on large-scale problems. [sent-59, score-0.581]
18 SpOC is robust to errors in bit predictors, simple to parallelize, and its computational time scales logarithmically with the number of classes. [sent-64, score-0.378]
19 (3) We propose probabilistic decoding to effectively utilize semantic similarity between visual categories for accurate decoding. [sent-66, score-0.308]
20 Coding. For the sake of scalability to large-scale problems, SpOC decouples the learning processes of coding matrix and bit predictors. [sent-68, score-0.715]
21 However, our proposed approach still balances the error-correcting ability of the coding matrix and classification accuracy for each resulting bit predictor, by utilizing training data and structure information among classes. [sent-69, score-0.718]
22 Formulation. The coding matrix in output coding is usually chosen for strong error-correcting ability, its most attractive advantage. [sent-72, score-0.629]
23 Moreover, since output coding essentially aggregates discriminative information residing in each bit, learning accurate bit predictors is also crucial for its success. [sent-73, score-0.854]
24 Since the high computational cost associated with methods optimizing coding matrix and bit predictors simultaneously [13] renders them unfavorable in large-scale problems, we propose to use training data and class taxonomy to provide a measure of separability for each binary partition. [sent-74, score-1.024]
25 Specifically, if some classes are closely related but are given different codes in the l-th bit, the bit predictor hl may not be easily learnable. [sent-75, score-0.628]
26 However, a binary partition is more likely to be well solved if the intra-partition similarity is large while the inter-partition similarity is small. [sent-76, score-0.174]
27 Otherwise, every single class will participate in the training of each bit predictor. [sent-80, score-0.444]
28 With each of the 20K classes participating in learning a bit predictor, we will likely be facing a binary partition problem where both the positive and negative partitions are populated with data points coming from thousands of different classes. [sent-82, score-0.616]
29 Clearly, learning a bit predictor for such a binary partition will be extremely difficult, due to the huge intra-partition dissimilarity. [sent-83, score-0.608]
30 Therefore, sparse output coding learns the optimal coding matrix as follows: max_B Fb(B) − λr Fr(B) − λc Σl ||βl||², subject to constraints (2)–(4) below. [sent-84, score-0.368]
31 Fb(B) measures the separability of each binary partition problem associated with the columns of B, and reflects the expected accuracy of bit predictors. [sent-111, score-0.502]
32 Moreover, Fr (B) measures codeword correlation, and minimizing Fr (B) ensures the strong error-correcting ability of the resulting coding matrix. [sent-112, score-0.34]
33 The l2 regularization on each column βl of B controls the complexity of each bit prediction problem. [sent-113, score-0.426]
34 Constraints in (2) ensure that each column of the coding matrix defines a binary partition problem, with the freedom of introducing ignored classes. [sent-115, score-0.552]
35 Constraints in (3) ensure that each bit prediction problem has non-empty positive partition and non-empty negative partition. [sent-116, score-0.527]
36 Finally, constraints in (4) ensure that each class in the original K-way classification appears in at least one bit prediction problem, so that we could effectively decode this class. [sent-117, score-0.574]
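Under this reading of constraints (2)–(4), a feasibility check for a candidate B is straightforward; the sketch below is our own:

```python
import numpy as np

def is_valid_coding_matrix(B):
    """Check constraints (2)-(4) on a ternary coding matrix B."""
    ternary = np.isin(B, (-1, 0, 1)).all()                           # (2): entries in {-1, 0, +1}
    cols_ok = ((B == 1).any(axis=0) & (B == -1).any(axis=0)).all()   # (3): both partitions non-empty
    rows_ok = (B != 0).any(axis=1).all()                             # (4): every class coded at least once
    return bool(ternary and cols_ok and rows_ok)
```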
37 Fb(B): Separability of Binary Problem. One key issue in designing the coding matrix is to ensure that the resulting bit prediction problems could be effectively solved. [sent-123, score-0.767]
38 The key motivation of our mathematical formulation is to compute the following two measures using the semantic relatedness matrix S (for example, rose is more similar to tulip than to truck) for each binary partition problem: intra-partition similarity and inter-partition similarity. [sent-124, score-0.199]
39 Specifically, in each binary partition problem, both positive partition and negative partition are composed of data points from multiple classes in the original problem. [sent-125, score-0.368]
40 To encourage better separation, those classes composing the positive partition should be similar to each other. [sent-126, score-0.167]
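The closed form of Fb is not reproduced in this extract; one plausible per-column score in the stated spirit (high intra-partition similarity, low inter-partition similarity, computed from the semantic relatedness matrix S) is sketched below, assuming both partitions are non-empty as guaranteed by constraint (3):

```python
import numpy as np

def column_separability(S, beta_l):
    """Intra-partition similarity minus inter-partition similarity for one column.

    S      : (K, K) semantic relatedness between classes
    beta_l : (K,) ternary column of B
    """
    pos = np.flatnonzero(beta_l == 1)
    neg = np.flatnonzero(beta_l == -1)
    intra = S[np.ix_(pos, pos)].mean() + S[np.ix_(neg, neg)].mean()
    inter = S[np.ix_(pos, neg)].mean()
    return intra - inter  # large when each side is coherent and the two sides differ
```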
41 Fr(B): Codeword Correlation. Given an example x, the L-dimensional bit prediction vector h(x) = [h1(x), . . . , hL(x)] is compared with the codewords, i.e., the rows of B. [sent-211, score-0.484]
42 To increase tolerance to errors in bit predictions, a crucial design objective of the coding matrix is to ensure that the rows in B are separated as far from each other as possible. [sent-216, score-0.717]
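Again, the exact Fr is not in the extract; a simple diagnostic of row separation, here counting disagreements only at positions where both codewords are non-zero (an assumption of ours for the ternary case), could look like:

```python
import numpy as np

def min_codeword_separation(B):
    """Smallest pairwise distance between rows of B over jointly non-zero bits."""
    K = B.shape[0]
    best = np.inf
    for i in range(K):
        for j in range(i + 1, K):
            both = (B[i] != 0) & (B[j] != 0)
            best = min(best, np.sum(B[i, both] != B[j, both]))
    return best
```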
43 Therefore, we propose a concave-convex procedure based algorithm, where the non-convexity is handled by the constrained concave-convex procedure (CCCP) [24, 7], and the non-smoothness is handled using a dual proximal gradient method. [sent-316, score-0.177]
44 Algorithm 1 (Learning output coding matrix): Initialize B0; repeat: find Bt as the solution to problem (12), set t = t + 1, and form the new problem (12); until the stopping criterion is satisfied. [sent-365, score-0.736]
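Problem (12) itself is not reproduced in this extract, but the CCCP outer loop in Algorithm 1 has a simple generic shape: linearize the concave part at the current iterate and solve the resulting convex surrogate. A toy scalar instance of that loop (our own example, not the paper's objective):

```python
import numpy as np

def cccp_minimize(solve_convex, grad_concave, x0, iters=100, tol=1e-10):
    """Generic CCCP loop: at each step, linearize the concave part at x_t and
    solve the convex surrogate (the analogue of problem (12) above)."""
    x = x0
    for _ in range(iters):
        x_new = solve_convex(grad_concave(x))  # B_t <- solution of the surrogate
        if abs(x_new - x) < tol:               # stopping criterion
            break
        x = x_new
    return x

# Toy instance: minimize x^2 - sqrt(1 + x^2), a convex plus a concave term.
grad_v = lambda x: -x / np.sqrt(1 + x * x)     # gradient of -sqrt(1 + x^2)
solve = lambda g: -g / 2.0                     # argmin_x x^2 + g*x in closed form
print(cccp_minimize(solve, grad_v, 3.0))       # converges to the minimizer x = 0
```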
45 Probabilistic Decoding. For large-scale multi-class categorization, a sparse output coding matrix is necessary to ensure the learnability of each bit predictor. [sent-485, score-0.806]
46 However, the zero bits in the coding matrix also bring difficulty to decoding. [sent-486, score-0.479]
47 Given a test image from class Husky, if we treat zero bits the same way as nonzero ones, Hamming decoding would prefer Shepherd over Husky. [sent-488, score-0.484]
48 This effect occurs because the decoding value increases with the number of positions that contain the zero symbol and hence introduces bias. [sent-490, score-0.263]
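A tiny numeric illustration of this bias, using hypothetical codewords of our own (the actual entries of Figure 1 are shown only as colors): every non-zero bit of Husky's sparse codeword is predicted correctly, yet counting its many zero positions as mismatches hands the decision to the denser Shepherd codeword.

```python
import numpy as np

husky    = np.array([+1,  0,  0, +1,  0,  0])  # sparse codeword: two non-zero bits
shepherd = np.array([+1, -1,  0, +1, -1, +1])  # denser codeword
h_x      = np.array([+1, -1, +1, +1, -1, +1])  # bit predictions for a Husky image

dist = lambda c: int(np.sum(c != h_x))         # treat 0 as just another symbol
print(dist(husky), dist(shepherd))             # 4 vs 1: Shepherd wrongly wins
```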
49 On the other hand, ignoring zero bits entirely would discard precious information for decoding. [sent-491, score-0.188]
50 This is especially important when K is large, where we expect a very sparse coding matrix. [sent-492, score-0.261]
51 For example, in Figure 1, classes Husky and Tiger have only two non-zero bits in their codewords. [sent-493, score-0.204]
52 Since we cannot always have perfect bit predictors, classification errors on bit 1 and bit 4 would severely impair the overall accuracy. [sent-494, score-1.158]
53 Fortunately, the semantic class similarity S, computed using training data and class taxonomy, provides a venue for effectively propagating information from non-zero bits to zero ones. [sent-495, score-0.34]
54 The second bit predictor in Figure 1 solves a binary partition of (Shepherd, Wolf) against Fox. [sent-497, score-0.608]
55 Even though class Husky is ignored in training for this bit, the binary partition on images from this class will have a higher probability of being +1, due to the fact that the two positive classes in this binary problem are closely related to class Husky. [sent-498, score-0.551]
56 Therefore, those classes with non-zero bits in the coding matrix should effectively propagate their label to those initially ignored classes. [sent-499, score-0.556]
57 (Left) one possible coding matrix for 5-class categorization, with red = +1, black = −1, and green = 0; (Right). [sent-502, score-0.316]
58 For the second bit (highlighted in the dashed box), although the first node (class Husky) is ignored during learning the bit predictor, it has a preference for being colored black, rather than red. [sent-506, score-0.847]
59 Specifically, we treat each bit prediction (without loss of generality, say, the l-th bit) as a label propagation [29] problem, where the labeled data corresponds to those classes whose codeword’s l-th bit is non-zero, and unlabeled data corresponds to those whose l-th bit is zero. [sent-509, score-1.277]
60 The goal of label propagation is to define a prior distribution indicating the probability of one class being classified as positive in the l-th binary partition. [sent-510, score-0.203]
61 Combining this prior with the available training data, we formulate the decoding problem in ternary ECOC as maximum a posteriori estimation. [sent-511, score-0.298]
62 Formulation. Given output coding matrix B ∈ {−1, 0, +1}K×L, our decoding method estimates the conditional probability of each class k given input x and L bit predictors {h1, . . . , hL}. [sent-514, score-1.233]
63 Without loss of generality, we assume the bit predictors constructed in the coding stage are linear classifiers, each parameterized by a vector wl as hl(x) = sign(wl⊤x). [sent-518, score-0.424]
64 The decoding problem is then to find the class k which maximizes the conditional probability P(y = k | {wl}, x, μ). [sent-524, score-0.301]
65 Therefore, we need to learn the Bernoulli parameters {μkl}, where μkl measures the probability of the l-th bit being +1 given the true class y = k. [sent-552, score-0.467]
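The extract truncates the likelihood, but under the stated Bernoulli model, with bits treated as conditionally independent given the class, decoding reduces to the following sketch; this is our reading rather than the paper's exact estimator:

```python
import numpy as np

def map_decode(mu, h_x, eps=1e-12):
    """Pick the class maximizing the Bernoulli likelihood of the observed bits.

    mu  : (K, L) matrix with mu[k, l] = P(l-th bit = +1 | y = k)
    h_x : (L,) observed bit predictions in {-1, +1}
    """
    pos = (h_x == 1)
    # log-likelihood of each class under independent Bernoulli bits
    ll = np.where(pos, np.log(mu + eps), np.log(1 - mu + eps)).sum(axis=1)
    return int(np.argmax(ll))
```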
66 Specifically, for the l-th column of the coding matrix, those classes corresponding to +1 in the l-th bit, i.e., Bkl = +1, are expected to have μkl close to 1 (and analogously μkl close to 0 when Bkl = −1). [sent-553, score-0.069]
67 However, originally ignored classes (those corresponding to 0 in the coding matrix) will also be likely to have a preference on the value of the l-th bit. [sent-558, score-0.33]
68 For the example in Figure 1, the second bit predictor separates (Shepherd, Wolf) from Fox. [sent-559, score-0.484]
69 Prior Distribution via Label Propagation. Since each column βl in the coding matrix has its own labeling (different composition of positive classes, negative classes, and ignored classes), label propagation for each column is independent of others. [sent-572, score-0.492]
70 We define μl = (μ1l, . . . , μKl) as labels on the nodes V, where μkl = 1 for those classes labeled as +1 and μkl = 0 for those classes labeled as −1 in βl. [sent-578, score-0.161]
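A minimal sketch of this per-column propagation, written in the iterative clamped style of [29] (assuming a similarity matrix S with positive row sums; the paper's exact smoothness objective is not in the extract):

```python
import numpy as np

def propagate_column_prior(S, beta_l, iters=200):
    """Label-propagation prior for one column: clamped classes spread their
    labels to the ignored (0) classes over the similarity graph.

    S      : (K, K) symmetric class-similarity matrix with positive row sums
    beta_l : (K,) ternary column; +1/-1 classes are clamped, 0 classes inferred
    Returns mu in [0, 1]^K, the prior probability of each class being +1.
    """
    P = S / S.sum(axis=1, keepdims=True)  # row-normalized transition matrix
    mu = np.full(len(beta_l), 0.5)
    mu[beta_l == 1], mu[beta_l == -1] = 1.0, 0.0
    for _ in range(iters):
        mu = P @ mu                                   # smooth over the graph
        mu[beta_l == 1], mu[beta_l == -1] = 1.0, 0.0  # re-clamp labeled classes
    return mu
```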
71 Parameter Learning. Given L bit predictors and training data Z = (X, Y) = {(x1, y1), . . . , (xN, yN)}. [sent-597, score-0.378]
72 Clearly, given μ, decoding takes time linear in the number of columns of the coding matrix, which could be as small as L = O(log K). [sent-639, score-0.518]
73 Experiments. In this section, we test the performance of sparse output coding on two data sets: ImageNet [10] for object recognition, and SUN database [26] for scene recognition. [sent-643, score-0.313]
74 Moreover, we also compare with several output coding based methods. [sent-666, score-0.313]
75 Specifically, we compare with the random dense code output coding (RDOC) proposed in [1], where each element in the coding matrix is chosen at random from {−1, +1}, with probability 1/2 each for −1 and +1. [sent-667, score-0.676]
76 Also, we provide results for the random sparse code output coding (RSOC) in [1], where each element in the coding matrix is chosen at random from {−1, 0, +1}, with probability 1/2 for 0, and probability 1/4 each for −1 and +1. [sent-668, score-0.675]
77 Finally, to test the impact of probabilistic decoding on SpOC, we report results of SpOC using a simple Hamming distance based decoding strategy, denoted as SpOC-H. [sent-670, score-0.47]
78 For output coding based methods, we set code length L = 200 for flower and SUN, and L = 300 for food. [sent-672, score-0.368]
79 For RDOC and RSOC, 1000 random coding matrices are generated for each scenario, and the one with the largest minimum pair-wise Hamming distance between all pairs of codewords, and with no identical columns, is chosen. [sent-676, score-0.261]
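A sketch of that selection procedure for the RDOC/RSOC baselines (our own helper; parameter names are hypothetical):

```python
import numpy as np

def best_random_coding_matrix(K, L, trials=1000, sparse=False, seed=0):
    """Sample random coding matrices; keep the one maximizing the minimum
    pairwise Hamming distance between codewords, rejecting duplicate columns."""
    rng = np.random.default_rng(seed)
    vals, p = ((-1, 0, 1), (0.25, 0.5, 0.25)) if sparse else ((-1, 1), (0.5, 0.5))
    best, best_d = None, -1
    for _ in range(trials):
        B = rng.choice(vals, size=(K, L), p=p)
        if len({tuple(c) for c in B.T}) < L:  # reject matrices with identical columns
            continue
        d = min(np.sum(B[i] != B[j]) for i in range(K) for j in range(i + 1, K))
        if d > best_d:
            best, best_d = B, d
    return best
```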
80 To decode the label for OVR using learned binary classifiers, we pick the class with the largest decision value. [sent-677, score-0.191]
81 For RDOC, RSOC, SpecOC and SpOC-H, we pick the class whose codeword has minimum Hamming distance to the codeword of the test data point. [sent-678, score-0.247]
82 Specifically, for decoding in RSOC and SpOC-H, we test both strategies of treating zero bits the same way as non-zero ones and ignoring zero bits entirely, and report the best result of these two methods. [sent-679, score-0.561]
83 This shows that for large-scale visual recognition problems, output coding with a carefully designed coding matrix could outperform OVR, while maintaining lower computational cost, due to the error-correcting property introduced in the coding matrix. [sent-694, score-0.912]
84 (2) Both SpOC and OVR beat RDOC and RSOC, revealing the importance of enforcing learnability of each bit predictor, since randomly generated coding matrix could very likely generate difficult binary separation problems. [sent-695, score-0.831]
85 (3) SpOC performs better than SpecOC, which employs a dense coding matrix. [sent-696, score-0.261]
86 The margin between SpOC and SpecOC is even larger on food, revealing the importance of having ignored classes in each bit predictor. [sent-697, score-0.547]
87 On the other hand, the error-correcting property in SpOC introduces robustness to errors made in bit predictors. [sent-699, score-0.378]
88 However, the fact that L = 200 performs almost as well as L = 400 demonstrates that SpOC usually requires far fewer bit predictors than the number of classes in the multi-class categorization problem. [sent-704, score-0.646]
89 Specifically, computational time for SpOC consists of three parts: (1) time for learning the output coding matrix, (2) time for training bit predictors, and (3) time for probabilistic decoding. [sent-706, score-0.719]
90 Bit predictors or binary classifiers are trained in parallel on a cluster composed of 200 nodes. [sent-711, score-0.236]
91 Time for training bit predictors is the summation of time spent on each node. [sent-712, score-0.566]
92 According to Table 4, time for learning coding matrix and probabilistic decoding is almost negligible compared to the time spent on training bit predictors. [sent-713, score-0.982]
93 Moreover, the total CPU time spent for training bit predictors . . . [sent-714, score-0.403]
94 For SpOC, the time (seconds) before | is for learning the coding matrix and decoding, and the time after | is for learning bit predictors. [sent-735, score-0.316]
95 Conclusions. Sparse output coding provides an initial foray into large-scale visual recognition, by turning high-cardinality multiclass classification into a bit-by-bit decoding problem. [sent-738, score-0.617]
96 We also propose probabilistic decoding to decode the optimal class label. [sent-739, score-0.362]
97 The fact that SpOC takes fewer bit predictors than OVR while achieving better accuracy renders it especially promising for large-scale visual recognition. [sent-741, score-0.569]
98 On the learnability and design of output codes for multiclass problems. [sent-785, score-0.167]
99 On the decoding process in ternary error-correcting output codes. [sent-814, score-0.35]
100 Spectral error correcting output codes for efficient multiclass recognition. [sent-917, score-0.168]
wordName wordTfidf (topN-words)
[('spoc', 0.489), ('bit', 0.378), ('bkl', 0.331), ('coding', 0.261), ('decoding', 0.235), ('husky', 0.189), ('cl', 0.185), ('predictors', 0.163), ('bits', 0.135), ('ovr', 0.112), ('shepherd', 0.11), ('predictor', 0.106), ('ek', 0.101), ('fs', 0.101), ('kl', 0.096), ('codeword', 0.079), ('rsoc', 0.079), ('bb', 0.078), ('partition', 0.077), ('wl', 0.074), ('proximal', 0.073), ('ignored', 0.069), ('classes', 0.069), ('class', 0.066), ('iznf', 0.063), ('rdoc', 0.063), ('specoc', 0.063), ('ternary', 0.063), ('bt', 0.058), ('matrix', 0.055), ('output', 0.052), ('fb', 0.052), ('binary', 0.047), ('bktl', 0.047), ('rek', 0.047), ('dual', 0.047), ('multiclass', 0.045), ('hl', 0.042), ('sbt', 0.042), ('taxonomy', 0.042), ('wolf', 0.04), ('pli', 0.039), ('imagenet', 0.039), ('correcting', 0.038), ('learnability', 0.037), ('categorization', 0.036), ('hamming', 0.035), ('food', 0.035), ('gradient', 0.034), ('decode', 0.033), ('codes', 0.033), ('bklbk', 0.032), ('yilpli', 0.032), ('flower', 0.031), ('revealing', 0.031), ('icml', 0.03), ('specifically', 0.029), ('sign', 0.028), ('renders', 0.028), ('equivalently', 0.028), ('zero', 0.028), ('probabilistic', 0.028), ('skk', 0.028), ('prediction', 0.028), ('separability', 0.026), ('million', 0.026), ('classifiers', 0.026), ('ecoc', 0.026), ('yil', 0.026), ('pi', 0.026), ('spent', 0.025), ('entirely', 0.025), ('similarity', 0.025), ('code', 0.024), ('mbin', 0.024), ('facing', 0.024), ('classification', 0.024), ('propagation', 0.024), ('neo', 0.023), ('nonconvexity', 0.023), ('tiger', 0.023), ('probability', 0.023), ('ensure', 0.023), ('sd', 0.023), ('rk', 0.023), ('sun', 0.023), ('pick', 0.023), ('node', 0.022), ('cccp', 0.022), ('label', 0.022), ('localityconstrained', 0.022), ('could', 0.022), ('scalability', 0.021), ('positive', 0.021), ('sij', 0.021), ('semantic', 0.02), ('column', 0.02), ('nonzero', 0.02), ('fr', 0.02), ('clearly', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999976 403 cvpr-2013-Sparse Output Coding for Large-Scale Visual Recognition
Author: Bin Zhao, Eric P. Xing
Abstract: Many vision tasks require a multi-class classifier to discriminate multiple categories, on the order of hundreds or thousands. In this paper, we propose sparse output coding, a principled way for large-scale multi-class classification, by turning high-cardinality multi-class categorization into a bit-by-bit decoding problem. Specifically, sparse output coding is composed of two steps: efficient coding matrix learning with scalability to thousands of classes, and probabilistic decoding. Empirical results on object recognition and scene classification demonstrate the effectiveness ofour proposed approach.
2 0.15189929 178 cvpr-2013-From Local Similarity to Global Coding: An Application to Image Classification
Author: Amirreza Shaban, Hamid R. Rabiee, Mehrdad Farajtabar, Marjan Ghazvininejad
Abstract: Bag of words models for feature extraction have demonstrated top-notch performance in image classification. These representations are usually accompanied by a coding method. Recently, methods that code a descriptor giving regard to its nearby bases have proved efficacious. These methods take into account the nonlinear structure of descriptors, since local similarities are a good approximation of global similarities. However, they confine their usage of the global similarities to nearby bases. In this paper, we propose a coding scheme that brings into focus the manifold structure of descriptors, and devise a method to compute the global similarities of descriptors to the bases. Given a local similarity measure between bases, a global measure is computed. Exploiting the local similarity of a descriptor and its nearby bases, a global measure of association of a descriptor to all the bases is computed. Unlike the locality-based and sparse coding methods, the proposed coding varies smoothly with respect to the underlying manifold. Experiments on benchmark image classification datasets substantiate the superiority of the proposed method over its locality and sparsity based rivals.
3 0.10971431 63 cvpr-2013-Binary Code Ranking with Weighted Hamming Distance
Author: Lei Zhang, Yongdong Zhang, Jinhu Tang, Ke Lu, Qi Tian
Abstract: Binary hashing has been widely used for efficient similarity search due to its query and storage efficiency. In most existing binary hashing methods, the high-dimensional data are embedded into Hamming space and the distance or similarity of two points are approximated by the Hamming distance between their binary codes. The Hamming distance calculation is efficient, however, in practice, there are often lots of results sharing the same Hamming distance to a query, which makes this distance measure ambiguous and poses a critical issue for similarity search where ranking is important. In this paper, we propose a weighted Hamming distance ranking algorithm (WhRank) to rank the binary codes of hashing methods. By assigning different bit-level weights to different hash bits, the returned binary codes are ranked at a finer-grained binary code level. We give an algorithm to learn the data-adaptive and query-sensitive weight for each hash bit. Evaluations on two large-scale image data sets demonstrate the efficacy of our weighted Hamming distance for binary code ranking.
4 0.097513646 53 cvpr-2013-BFO Meets HOG: Feature Extraction Based on Histograms of Oriented p.d.f. Gradients for Image Classification
Author: Takumi Kobayashi
Abstract: Image classification methods have been significantly developed in the last decade. Most methods stem from the bag-of-features (BoF) approach and it is recently extended to a vector aggregation model, such as using Fisher kernels. In this paper, we propose a novel feature extraction method for image classification. Following the BoF approach, plenty of local descriptors are first extracted in an image and the proposed method is built upon the probability density function (p.d.f) formed by those descriptors. Since the p.d.f essentially represents the image, we extract the features from the p.d.f by means of the gradients on the p.d.f. The gradients, especially their orientations, effectively characterize the shape of the p.d.f from the geometrical viewpoint. We construct the features by the histogram of the oriented p.d.f gradients via orientation coding followed by aggregation of the orientation codes. The proposed image features, imposing no specific assumption on the targets, are so general as to be applicable to any kinds of tasks regarding image classification. In the experiments on object recognition and scene classification using various datasets, the proposed method exhibits superior performances compared to the other existing methods.
5 0.087523147 236 cvpr-2013-K-Means Hashing: An Affinity-Preserving Quantization Method for Learning Binary Compact Codes
Author: Kaiming He, Fang Wen, Jian Sun
Abstract: In computer vision there has been increasing interest in learning hashing codes whose Hamming distance approximates the data similarity. The hashing functions play roles in both quantizing the vector space and generating similarity-preserving codes. Most existing hashing methods use hyper-planes (or kernelized hyper-planes) to quantize and encode. In this paper, we present a hashing method adopting the k-means quantization. We propose a novel Affinity-Preserving K-means algorithm which simultaneously performs k-means clustering and learns the binary indices of the quantized cells. The distance between the cells is approximated by the Hamming distance of the cell indices. We further generalize our algorithm to a product space for learning longer codes. Experiments show our method, named as K-means Hashing (KMH), outperforms various state-of-the-art hashing encoding methods.
6 0.086866915 164 cvpr-2013-Fast Convolutional Sparse Coding
7 0.083674505 69 cvpr-2013-Boosting Binary Keypoint Descriptors
8 0.080685861 442 cvpr-2013-Transfer Sparse Coding for Robust Image Representation
9 0.077231519 340 cvpr-2013-Probabilistic Label Trees for Efficient Large Scale Image Classification
10 0.076688759 319 cvpr-2013-Optimized Product Quantization for Approximate Nearest Neighbor Search
11 0.07246995 88 cvpr-2013-Compressible Motion Fields
12 0.069243468 144 cvpr-2013-Efficient Maximum Appearance Search for Large-Scale Object Detection
13 0.068943009 246 cvpr-2013-Learning Binary Codes for High-Dimensional Data Using Bilinear Projections
14 0.068623498 79 cvpr-2013-Cartesian K-Means
15 0.066189587 296 cvpr-2013-Multi-level Discriminative Dictionary Learning towards Hierarchical Visual Categorization
16 0.059702408 328 cvpr-2013-Pedestrian Detection with Unsupervised Multi-stage Feature Learning
17 0.059482347 87 cvpr-2013-Compressed Hashing
18 0.058300734 422 cvpr-2013-Tag Taxonomy Aware Dictionary Learning for Region Tagging
19 0.058166452 304 cvpr-2013-Multipath Sparse Coding Using Hierarchical Matching Pursuit
20 0.055719212 293 cvpr-2013-Multi-attribute Queries: To Merge or Not to Merge?
topicId topicWeight
[(0, 0.131), (1, -0.043), (2, -0.048), (3, 0.047), (4, 0.049), (5, 0.009), (6, -0.027), (7, -0.031), (8, -0.101), (9, -0.01), (10, -0.058), (11, 0.003), (12, 0.012), (13, 0.008), (14, -0.006), (15, 0.051), (16, -0.018), (17, 0.008), (18, 0.013), (19, 0.006), (20, -0.027), (21, -0.007), (22, 0.004), (23, -0.029), (24, 0.022), (25, 0.087), (26, -0.03), (27, 0.054), (28, 0.005), (29, -0.012), (30, -0.014), (31, 0.047), (32, -0.039), (33, -0.009), (34, -0.024), (35, 0.009), (36, -0.045), (37, -0.028), (38, 0.07), (39, -0.077), (40, -0.059), (41, -0.051), (42, 0.028), (43, 0.02), (44, 0.0), (45, -0.047), (46, -0.021), (47, -0.031), (48, -0.012), (49, 0.008)]
simIndex simValue paperId paperTitle
same-paper 1 0.92823374 403 cvpr-2013-Sparse Output Coding for Large-Scale Visual Recognition
Author: Bin Zhao, Eric P. Xing
Abstract: Many vision tasks require a multi-class classifier to discriminate multiple categories, on the order of hundreds or thousands. In this paper, we propose sparse output coding, a principled way for large-scale multi-class classification, by turning high-cardinality multi-class categorization into a bit-by-bit decoding problem. Specifically, sparse output coding is composed of two steps: efficient coding matrix learning with scalability to thousands of classes, and probabilistic decoding. Empirical results on object recognition and scene classification demonstrate the effectiveness ofour proposed approach.
2 0.77873725 442 cvpr-2013-Transfer Sparse Coding for Robust Image Representation
Author: Mingsheng Long, Guiguang Ding, Jianmin Wang, Jiaguang Sun, Yuchen Guo, Philip S. Yu
Abstract: Sparse coding learns a set of basis functions such that each input signal can be well approximated by a linear combination of just a few of the bases. It has attracted increasing interest due to its state-of-the-art performance in BoW based image representation. However, when labeled and unlabeled images are sampled from different distributions, they may be quantized into different visual words of the codebook and encoded with different representations, which may severely degrade classification performance. In this paper, we propose a Transfer Sparse Coding (TSC) approach to construct robust sparse representations for classifying cross-distribution images accurately. Specifically, we aim to minimize the distribution divergence between the labeled and unlabeled images, and incorporate this criterion into the objective function of sparse coding to make the new representations robust to the distribution difference. Experiments show that TSC can significantly outperform state-ofthe-art methods on three types of computer vision datasets.
3 0.76263565 178 cvpr-2013-From Local Similarity to Global Coding: An Application to Image Classification
Author: Amirreza Shaban, Hamid R. Rabiee, Mehrdad Farajtabar, Marjan Ghazvininejad
Abstract: Bag of words models for feature extraction have demonstrated top-notch performance in image classification. These representations are usually accompanied by a coding method. Recently, methods that code a descriptor giving regard to its nearby bases have proved efficacious. These methods take into account the nonlinear structure of descriptors, since local similarities are a good approximation of global similarities. However, they confine their usage of the global similarities to nearby bases. In this paper, we propose a coding scheme that brings into focus the manifold structure of descriptors, and devise a method to compute the global similarities of descriptors to the bases. Given a local similarity measure between bases, a global measure is computed. Exploiting the local similarity of a descriptor and its nearby bases, a global measure of association of a descriptor to all the bases is computed. Unlike the locality-based and sparse coding methods, the proposed coding varies smoothly with respect to the underlying manifold. Experiments on benchmark image classification datasets substantiate the superiority of the proposed method over its locality and sparsity based rivals.
Author: Takumi Kobayashi
Abstract: Image classification methods have been significantly developed in the last decade. Most methods stem from the bag-of-features (BoF) approach and it is recently extended to a vector aggregation model, such as using Fisher kernels. In this paper, we propose a novel feature extraction method for image classification. Following the BoF approach, plenty of local descriptors are first extracted in an image and the proposed method is built upon the probability density function (p.d.f) formed by those descriptors. Since the p.d.f essentially represents the image, we extract the features from the p.d.f by means of the gradients on the p.d.f. The gradients, especially their orientations, effectively characterize the shape of the p.d.f from the geometrical viewpoint. We construct the features by the histogram of the oriented p.d.f gradients via orientation coding followed by aggregation of the orientation codes. The proposed image features, imposing no specific assumption on the targets, are so general as to be applicable to any kinds of tasks regarding image classification. In the experiments on object recognition and scene classification using various datasets, the proposed method exhibits superior performances compared to the other existing methods.
5 0.65782529 164 cvpr-2013-Fast Convolutional Sparse Coding
Author: Hilton Bristow, Anders Eriksson, Simon Lucey
Abstract: Sparse coding has become an increasingly popular method in learning and vision for a variety of classification, reconstruction and coding tasks. The canonical approach intrinsically assumes independence between observations during learning. For many natural signals however, sparse coding is applied to sub-elements (i.e. patches) of the signal, where such an assumption is invalid. Convolutional sparse coding explicitly models local interactions through the convolution operator, however the resulting optimization problem is considerably more complex than traditional sparse coding. In this paper, we draw upon ideas from signal processing and Augmented Lagrange Methods (ALMs) to produce a fast algorithm with globally optimal subproblems and super-linear convergence.
8 0.64935875 87 cvpr-2013-Compressed Hashing
9 0.6380893 320 cvpr-2013-Optimizing 1-Nearest Prototype Classifiers
10 0.60965747 69 cvpr-2013-Boosting Binary Keypoint Descriptors
11 0.60929269 83 cvpr-2013-Classification of Tumor Histology via Morphometric Context
12 0.60410064 34 cvpr-2013-Adaptive Active Learning for Image Classification
13 0.5966087 304 cvpr-2013-Multipath Sparse Coding Using Hierarchical Matching Pursuit
14 0.59643573 246 cvpr-2013-Learning Binary Codes for High-Dimensional Data Using Bilinear Projections
15 0.5909304 421 cvpr-2013-Supervised Kernel Descriptors for Visual Recognition
16 0.58460802 79 cvpr-2013-Cartesian K-Means
17 0.58432025 223 cvpr-2013-Inductive Hashing on Manifolds
18 0.57270437 319 cvpr-2013-Optimized Product Quantization for Approximate Nearest Neighbor Search
19 0.56752807 390 cvpr-2013-Semi-supervised Node Splitting for Random Forest Construction
20 0.56500852 134 cvpr-2013-Discriminative Sub-categorization
topicId topicWeight
[(10, 0.083), (16, 0.359), (26, 0.041), (28, 0.013), (33, 0.243), (67, 0.048), (69, 0.026), (80, 0.011), (87, 0.067)]
simIndex simValue paperId paperTitle
1 0.9588185 410 cvpr-2013-Specular Reflection Separation Using Dark Channel Prior
Author: Hyeongwoo Kim, Hailin Jin, Sunil Hadap, Inso Kweon
Abstract: We present a novel method to separate specular reflection from a single image. Separating an image into diffuse and specular components is an ill-posed problem due to lack of observations. Existing methods rely on a specularfree image to detect and estimate specularity, which however may confuse diffuse pixels with the same hue but a different saturation value as specular pixels. Our method is based on a novel observation that for most natural images the dark channel can provide an approximate specular-free image. We also propose a maximum a posteriori formulation which robustly recovers the specular reflection and chromaticity despite of the hue-saturation ambiguity. We demonstrate the effectiveness of the proposed algorithm on real and synthetic examples. Experimental results show that our method significantly outperforms the state-of-theart methods in separating specular reflection.
2 0.87191164 118 cvpr-2013-Detecting Pulse from Head Motions in Video
Author: Guha Balakrishnan, Fredo Durand, John Guttag
Abstract: We extract heart rate and beat lengths from videos by measuring subtle head motion caused by the Newtonian reaction to the influx of blood at each beat. Our method tracks features on the head and performs principal component analysis (PCA) to decompose their trajectories into a set of component motions. It then chooses the component that best corresponds to heartbeats based on its temporal frequency spectrum. Finally, we analyze the motion projected to this component and identify peaks of the trajectories, which correspond to heartbeats. When evaluated on 18 subjects, our approach reported heart rates nearly identical to an electrocardiogram device. Additionally we were able to capture clinically relevant information about heart rate variability.
3 0.85178828 27 cvpr-2013-A Theory of Refractive Photo-Light-Path Triangulation
Author: Visesh Chari, Peter Sturm
Abstract: 3D reconstruction of transparent refractive objects like a plastic bottle is challenging: they lack appearance related visual cues and merely reflect and refract light from the surrounding environment. Amongst several approaches to reconstruct such objects, the seminal work of Light-Path triangulation [17] is highly popular because of its general applicability and analysis of minimal scenarios. A lightpath is defined as the piece-wise linear path taken by a ray of light as it passes from source, through the object and into the camera. Transparent refractive objects not only affect the geometric configuration of light-paths but also their radiometric properties. In this paper, we describe a method that combines both geometric and radiometric information to do reconstruction. We show two major consequences of the addition of radiometric cues to the light-path setup. Firstly, we extend the case of scenarios in which reconstruction is plausible while reducing the minimal re- quirements for a unique reconstruction. This happens as a consequence of the fact that radiometric cues add an additional known variable to the already existing system of equations. Secondly, we present a simple algorithm for reconstruction, owing to the nature of the radiometric cue. We present several synthetic experiments to validate our theories, and show high quality reconstructions in challenging scenarios.
4 0.83452684 224 cvpr-2013-Information Consensus for Distributed Multi-target Tracking
Author: Ahmed T. Kamal, Jay A. Farrell, Amit K. Roy-Chowdhury
Abstract: Due to their high fault-tolerance, ease of installation and scalability to large networks, distributed algorithms have recently gained immense popularity in the sensor networks community, especially in computer vision. Multitarget tracking in a camera network is one of the fundamental problems in this domain. Distributed estimation algorithms work by exchanging information between sensors that are communication neighbors. Since most cameras are directional sensors, it is often the case that neighboring sensors may not be sensing the same target. Such sensors that do not have information about a target are termed as “naive ” with respect to that target. In this paper, we propose consensus-based distributed multi-target tracking algorithms in a camera network that are designed to address this issue of naivety. The estimation errors in tracking and data association, as well as the effect of naivety, are jointly addressed leading to the development of an informationweighted consensus algorithm, which we term as the Multitarget Information Consensus (MTIC) algorithm. The incorporation of the probabilistic data association mecha- nism makes the MTIC algorithm very robust to false measurements/clutter. Experimental analysis is provided to support the theoretical results.
5 0.82028836 271 cvpr-2013-Locally Aligned Feature Transforms across Views
Author: Wei Li, Xiaogang Wang
Abstract: In this paper, we propose a new approach for matching images observed in different camera views with complex cross-view transforms and apply it to person re-identification. It jointly partitions the image spaces of two camera views into different configurations according to the similarity of cross-view transforms. The visual features of an image pair from different views are first locally aligned by being projected to a common feature space and then matched with softly assigned metrics which are locally optimized. The features optimal for recognizing identities are different from those for clustering cross-view transforms. They are jointly learned by utilizing sparsity-inducing norm and information theoretical regularization. This approach can be generalized to the settings where test images are from new camera views, not the same as those in the training set. Extensive experiments are conducted on public datasets and our own dataset. Comparisons with the state-of-the-art metric learning and person re-identification methods show the superior performance of our approach.
6 0.81292129 138 cvpr-2013-Efficient 2D-to-3D Correspondence Filtering for Scalable 3D Object Recognition
7 0.81060129 363 cvpr-2013-Robust Multi-resolution Pedestrian Detection in Traffic Scenes
same-paper 8 0.78646636 403 cvpr-2013-Sparse Output Coding for Large-Scale Visual Recognition
10 0.72204459 349 cvpr-2013-Reconstructing Gas Flows Using Light-Path Approximation
11 0.71039546 361 cvpr-2013-Robust Feature Matching with Alternate Hough and Inverted Hough Transforms
12 0.7103855 443 cvpr-2013-Uncalibrated Photometric Stereo for Unknown Isotropic Reflectances
13 0.70720452 454 cvpr-2013-Video Enhancement of People Wearing Polarized Glasses: Darkening Reversal and Reflection Reduction
14 0.70129377 130 cvpr-2013-Discriminative Color Descriptors
15 0.69463009 269 cvpr-2013-Light Field Distortion Feature for Transparent Object Recognition
16 0.69383681 352 cvpr-2013-Recovering Stereo Pairs from Anaglyphs
17 0.69288391 115 cvpr-2013-Depth Super Resolution by Rigid Body Self-Similarity in 3D
18 0.68933618 384 cvpr-2013-Segment-Tree Based Cost Aggregation for Stereo Matching
19 0.68359345 149 cvpr-2013-Evaluation of Color STIPs for Human Action Recognition
20 0.67929 54 cvpr-2013-BRDF Slices: Accurate Adaptive Anisotropic Appearance Acquisition