iccv iccv2013 iccv2013-3 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yen-Liang Lin, Cheng-Yu Huang, Hao-Jeng Wang, Winston Hsu
Abstract: We propose a 3D sub-query expansion approach for boosting sketch-based multi-view image retrieval. The core idea of our method is to automatically convert two (guided) 2D sketches into an approximated 3D sketch model, and then generate multi-view sketches as expanded sub-queries to improve the retrieval performance. To learn the weights among synthesized views (sub-queries), we present a new multi-query feature to model the similarity between subqueries and dataset images, and formulate it into a convex optimization problem. Our approach shows superior performance compared with the state-of-the-art approach on a public multi-view image dataset. Moreover, we also conduct sensitivity tests to analyze the parameters of our approach based on the gathered user sketches.
Reference: text
sentIndex sentText sentNum sentScore
1 The core idea of our method is to automatically convert two (guided) 2D sketches into an approximated 3D sketch model, and then generate multi-view sketches as expanded sub-queries to improve the retrieval performance. [sent-8, score-1.85]
2 To learn the weights among synthesized views (sub-queries), we present a new multi-query feature to model the similarity between subqueries and dataset images, and formulate it into a convex optimization problem. [sent-9, score-0.604]
3 However, example images may not always be at hand while searching, which motivates sketch-based image retrieval (SBIR) research that uses simpler hand-drawn sketches as query images. [sent-14, score-0.887]
4 However, these methods still suffer from the multi-view problem and can only retrieve images whose viewing angles are similar to, or related by partial affine transformations to, the query sketch (Figure 1). [sent-21, score-0.845]
5 Given two (2D) user sketches (different colors indicate different parts), our system can automatically reconstruct an approximated 3D sketch model (b) and generate a set of synthesized views (9 views for illustration) as expanded subqueries. [sent-26, score-1.922]
6 Our system will use these synthesized sketches for query expansion to retrieve multi-view images. [sent-27, score-1.149]
7 (d)(e) show the top five search results using the state-of-the-art method [14] with the (2D) frontal and side views as the input sketch, respectively. [sent-28, score-0.565]
8 Motivated by the query expansion technique used in image retrieval [10], we bring the concept of query expansion into sketch-based image retrieval. [sent-39, score-0.653]
9 We expand the original input sketches by a set of synthesized sketches: the system reconstructs a 3D sketch model from the corresponding sketch contours and obtains a set of synthesized sketches for query expansion. [sent-40, score-2.445]
10 Sketches and edge maps are represented by histograms of visual word frequency. [sent-41, score-0.6]
11 All dataset images are then ranked based on the final fusion score. [sent-43, score-0.213]
12 The original input sketches are thus expanded by a set of synthesized multi-view sketches (the expanded sub-queries) from the reconstructed 3D sketch model to boost the retrieval performance. [sent-44, score-1.586]
13 The idea is intuitive: when a user queries with the keyword “Starbucks,” query expansion methods might transparently enhance the search results with more semantically related keywords such as “coffee,” “Seattle,” etc. [sent-45, score-0.5]
14 Figure 1 (a) shows an example (bicycle) of two sketches from the designated front and side views. [sent-50, score-0.672]
15 Once two (nearly orthogonal) sketches have been specified, the 3D sketch model is automatically constructed (Figure 1 (b)). [sent-51, score-0.995]
16 A set of sub-queries can be synthesized and expanded to match the possible multi-view candidate images in the data collection (Figure 1 (c)). [sent-52, score-0.384]
17 For comparison, Figure 1(d)(e) show the top five search results with the state-of-the-art method [14] using the front view and side view as the input sketch, respectively. [sent-54, score-0.464]
18 Our main contributions include: We propose a 3D-enhanced sketch-based system that generates multi-view sketches as expanded sub-queries to boost multi-view image retrieval performance. [sent-57, score-0.326]
19 To the best of our knowledge, this is the first work that brings query expansion into sketch-based image retrieval. [sent-58, score-0.268]
20 To learn the weights of synthesized sub-queries, a new multi-query feature representation is proposed to model the similarity between expanded query sketches and dataset images, and we formulate it as a convex optimization problem. [sent-59, score-1.152]
21 A scalable approach, the MindFinder system [6], is the first to propose an efficient indexing structure for large-scale sketch retrieval; it builds an inverted-index-like structure to speed up sketch-based image search. [sent-75, score-0.499]
22 In this paper, we propose a 3D-enhanced approach that automatically reconstructs a 3D sketch model from (2D) user sketches and expands the query sketches with a set of synthesized sketches (sub-queries) to boost the retrieval performance. [sent-89, score-2.235]
23 A user can draw two orthogonal views of the target sketch in the corresponding panels. [sent-92, score-0.923]
24 To draw different parts, a user can simply switch the brush color, and the system will automatically show a guidance line to help the user align those parts across the two views. [sent-93, score-0.321]
25 3D Model Reconstruction From Line Drawings Previous methods of 3D reconstruction from line drawings are mainly based on a set of heuristic rules [5], yet those rules are not always satisfied in the imperfect sketch drawings. [sent-96, score-0.574]
26 Some approaches employed Teddy [17] to convert user sketches into 3D objects. [sent-98, score-0.666]
27 Fortunately, most man-made or natural objects are (roughly) axis-aligned, and the underlying 3D models can be reconstructed from 2D lines drawn from different orthogonal views (e. [sent-100, score-0.275]
28 Motivated by the method [21], we reconstruct our 3D sketch model using two orthogonal sketches. [sent-103, score-0.482]
29 However, different from their approach for reconstructing a perfect 3D model, we tailor their method to reconstruct an approximated 3D sketch model and reduce the complexity of the user sketches. [sent-104, score-0.602]
30 In the second online step, as a user draws sketches, our system automatically reconstructs the corresponding 3D model from the sketch contours and generates a set of synthesized (expanded) sketches to cover a denser viewing range. [sent-110, score-1.58]
31 Then each synthesized sketch is similarly encoded by a visual word histogram. [sent-111, score-0.768]
32 The similarity between each synthesized sketch and each dataset image is computed and concatenated into a high-dimensional vector (the multi-query feature vector). [sent-112, score-0.802]
33 Then a fusion function is designed and applied to the multi-query feature vector, and all dataset images are ranked by the final fusion score. [sent-113, score-0.373]
34 3D Sketch Model Reconstruction To generate an (approximated) 3D model from 2D sketches with the least effort, we propose to derive it from two (nearly-orthogonal) sketches. [sent-116, score-0.533]
35 In this section, we briefly introduce the 3D reconstruction algorithm proposed in [21] and show how this can be adapted to create our 3D sketch model. [sent-117, score-0.46]
36 The sketch contours are then mapped back from 2D space onto the surface of the 3D model; hidden sketches are removed by testing against a depth map rendered from the reconstructed 3D model. [sent-121, score-1.041]
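To make this visibility test concrete, the following minimal Python sketch (not the paper's code) checks each contour point mapped onto the model surface against a depth map rendered from the reconstructed 3D model; the project callback and the tolerance eps are illustrative assumptions.

    import numpy as np

    def visible_contour_points(points_3d, depth_map, project, eps=1e-2):
        # Keep only contour points that are not hidden by the reconstructed model.
        # points_3d : (N, 3) contour points mapped onto the 3D model surface.
        # depth_map : (H, W) depth rendered from the model for one synthesized view.
        # project   : assumed callback mapping a 3D point to (row, col, depth).
        visible = []
        for p in points_3d:
            r, c, d = project(p)
            r, c = int(round(r)), int(round(c))
            if 0 <= r < depth_map.shape[0] and 0 <= c < depth_map.shape[1]:
                # A point is visible if its depth matches the first surface hit
                # recorded in the depth map at that pixel.
                if d <= depth_map[r, c] + eps:
                    visible.append(p)
        return np.asarray(visible)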
37 Different from their method for creating a sophisticated 3D model, an approximated 3D sketch model is sufficient for our system to estimate the 3D positions from the input 2D sketches. [sent-122, score-0.494]
38 Users may not want to spend a lot of time generating a 3D sketch model. [sent-124, score-0.486]
39 For example, in the bicycle case (Figure 1), the two wheels and the body could be regarded as a single part (blue) and reconstructed as a unit, while the saddle (green) might be occluded by the stem at the front view and is reconstructed separately. [sent-126, score-0.255]
40 , face, Figure 10) may not be axis-aligned as mentioned in [21], and some defects may be generated in the synthesized sketches. [sent-129, score-0.316]
41 We currently implement our user interface on the iPad platform for a pilot study (Figure 3); a more user-friendly interface will be investigated in future work. [sent-131, score-0.257]
42 View Synthesis as Sketch Sub-Queries A fully viewpoint-independent retrieval system would require densely sampling the entire viewing sphere. [sent-134, score-0.292]
43 where a and e represent the azimuth and elevation, respectively (i. [sent-137, score-0.294]
44 The missing viewpoints on the viewing sphere are compensated for by affine-invariant local descriptors. [sent-140, score-0.222]
45 Figure 10 shows some synthesized object sketches defined in the 3D object dataset. [sent-143, score-0.827]
46 For each dataset image, we find the salient edges that are most likely to be drawn by a user by applying the Canny edge detector [4], and HoG descriptors are extracted at 500 random locations on each Canny edge map. [sent-150, score-0.256]
47 The HoG descriptor window size and the codebook size are set to 50 (in percent of the minimum of the image width and height) and 1000, respectively, which give the best retrieval results on our image dataset. [sent-152, score-0.275]
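As an illustration of this feature pipeline, the sketch below (assuming OpenCV and a pre-trained codebook of 1000 codewords; the HoG cell layout and Canny thresholds are assumptions) builds the visual-word frequency histogram of a dataset image from HoG patches sampled on its Canny edge map.

    import cv2
    import numpy as np

    def edge_bovw_histogram(image_gray, codebook, n_samples=500, win_ratio=0.5):
        # Bag-of-visual-words histogram over HoG patches sampled on Canny edges.
        edges = cv2.Canny(image_gray, 100, 200)          # salient edge map
        ys, xs = np.nonzero(edges)                        # candidate edge pixels
        if len(xs) == 0:
            return np.zeros(len(codebook))
        idx = np.random.choice(len(xs), size=min(n_samples, len(xs)), replace=False)

        win = int(win_ratio * min(image_gray.shape[:2]))  # 50% of min(width, height)
        hog = cv2.HOGDescriptor((64, 64), (32, 32), (16, 16), (16, 16), 9)

        hist = np.zeros(len(codebook))
        for i in idx:
            y, x = ys[i], xs[i]
            y0, x0 = max(0, y - win // 2), max(0, x - win // 2)
            patch = image_gray[y0:y0 + win, x0:x0 + win]
            if patch.shape[0] < win or patch.shape[1] < win:
                continue
            patch = cv2.resize(patch, (64, 64))           # fixed size expected by HoG
            desc = hog.compute(patch).ravel()
            # Hard-assign to the nearest codeword (codebook: (K, D) array, K = 1000).
            word = np.argmin(np.linalg.norm(codebook - desc, axis=1))
            hist[word] += 1
        return hist / max(hist.sum(), 1)                  # normalized word frequency

The same encoding would be applied to each synthesized sub-query sketch so that sketches and edge maps live in the same visual-word space.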
48 Interestingly, we found that rotation-variant HoG shows better retrieval performance overall. [sent-154, score-0.255]
49 1st and 3rd rows are the top 10 retrieved samples using 28 synthesized car and bicycle query sketches, respectively, with the “max” fusion scheme. [sent-158, score-1.243]
50 The results reveal that some views of a category are less discriminative and could be confused with other objects. [sent-160, score-0.271]
51 The role of local descriptors is to bridge the gap for those views that are not fully covered by our synthesized data. [sent-162, score-0.555]
52 A visual word histogram is constructed for each synthesized sub-query sketch and each dataset image; the similarity between each synthesized sub-query sketch and a dataset image is computed from the histogram distance, and these similarities are finally concatenated into a high-dimensional feature vector. [sent-163, score-1.601]
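For example, one way to form the multi-query feature vector x_j of a dataset image j is to stack its similarity to every synthesized sub-query histogram; histogram intersection is used here as an illustrative similarity, since the text above does not fix that choice.

    import numpy as np

    def multi_query_feature(subquery_hists, image_hist):
        # subquery_hists : (V, K) visual-word histograms of the V synthesized views.
        # image_hist     : (K,)  visual-word histogram of one dataset image.
        # Returns x_j, one similarity value per synthesized sub-query sketch.
        return np.array([np.minimum(q, image_hist).sum() for q in subquery_hists])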
53 Fusion Function Formally, given a set of synthesized sub-query sketches Q = {q(1) , q(2) , . [sent-166, score-0.827]
54 The dataset images are then ranked by the fusion score. [sent-184, score-0.213]
, e.g., the average or max fusion scheme, which averages the similarity scores or picks the best one as the final fusion score. [sent-187, score-0.368]
56 However, these simple fusion methods obtain poor retrieval performance (Section 4. [sent-188, score-0.333]
57 That is, views within each category are not equally important and some might be confused with the other categories (Figure 5). [sent-190, score-0.268]
58 This motivates us to learn the weights of different views of an object category. [sent-191, score-0.232]
59 In other words, those views that exhibit high discriminative power should discriminate the object better than the rest of the views. [sent-192, score-0.213]
60 We set up the learning problem using those dataset images with the same category label as the query sketch as positive samples, and the others as negative samples. [sent-193, score-0.65]
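The convex learning problem can be set up, for instance, as a logistic-regression surrogate consistent with the sigmoid fusion of Eq. (2) below; the scikit-learn call and the liblinear solver are assumptions of this sketch, not the paper's exact formulation.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def learn_view_weights(X_pos, X_neg, C=1.0):
        # X_pos : (N_pos, V) multi-query features of images sharing the query's category.
        # X_neg : (N_neg, V) multi-query features of all other dataset images.
        X = np.vstack([X_pos, X_neg])
        y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_neg))])
        clf = LogisticRegression(C=C, solver="liblinear")
        clf.fit(X, y)
        return clf.coef_.ravel()   # w: one weight per synthesized view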
61 Visualization of the optimal weights of different views for car and bicycle categories. [sent-195, score-0.307]
62 Our approach automatically learns those discriminative views while down-weighing the less discriminative ones. [sent-196, score-0.271]
63 Given the learned weight w, the final fusion score is defined as: f(x_j) = 1 / (1 + exp(-w^T x_j)), (2) where x_j is the feature vector for each dataset image as defined above. [sent-202, score-0.217]
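Applying Eq. (2) and ranking by the resulting score can be sketched as follows (a direct transcription of the formula, with NumPy assumed):

    import numpy as np

    def fusion_scores(w, X):
        # Eq. (2): sigmoid of the weighted multi-query features.
        # w : (V,) learned view weights; X : (M, V) features of the M dataset images.
        return 1.0 / (1.0 + np.exp(-X.dot(w)))

    def rank_images(w, X):
        # Dataset image indices sorted by decreasing fusion score.
        return np.argsort(-fusion_scores(w, X))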
64 Figure 6 visualizes the linear weights learned from our gathered user sketches and the image dataset. [sent-203, score-0.685]
65 The result confirms our idea: it down-weights those less discriminative views, such as the (nearly) top and frontal views of the car and bicycle categories. [sent-204, score-0.524]
66 Those more discriminative views would contribute more to the final fusion score and thus improve the retrieval performance. [sent-205, score-0.546]
67 The baseline method is based on a single-view query sketch, either front (top) or side view in our experiments. [sent-212, score-0.331]
68 The front (top) and side views are selected manually based on the object characteristics. [sent-213, score-0.326]
69 The average and max fusion schemes are also evaluated; they average the similarity scores or pick the best one as the final fusion score, respectively. [sent-215, score-0.368]
70 Our approach further learns the weights of synthesized views to highlight those more discriminative sub-queries. [sent-216, score-0.529]
71 Thus, we evaluate the retrieval performance on a public multi-view image dataset [23], which is commonly used for evaluating pose estimation and object detection tasks. [sent-220, score-0.254]
72 To compare our approach fairly with the baseline method, we select 5 viewing angles and the largest scale to evaluate the performance, since the backside information is unknown, as users usually draw head-on views. [sent-223, score-0.219]
73 These 5 viewing angles can be mapped into our sphere space with the range: azimuth = 0◦ ∼ 180◦ and elevation = 0◦ ∼ 90◦ . [sent-224, score-0.455]
74 To generate synthesized sketches, we sample the viewpoints within this range with the defined azimuth and elevation steps. [sent-225, score-0.588]
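A minimal sketch of this sampling (endpoint handling is an assumption): with 30-degree steps over azimuth 0-180 and elevation 0-90 it enumerates 7 x 4 = 28 viewpoints, matching the 28 synthesized query sketches reported below.

    import numpy as np

    def sample_viewpoints(a_step=30, e_step=30, a_range=(0, 180), e_range=(0, 90)):
        # Enumerate (azimuth, elevation) pairs used to synthesize sub-query views.
        azimuths = np.arange(a_range[0], a_range[1] + 1, a_step)
        elevations = np.arange(e_range[0], e_range[1] + 1, e_step)
        return [(a, e) for a in azimuths for e in elevations]

    # len(sample_viewpoints()) == 28 with the default 30-degree steps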
75 In our study, the users were first told the rules for drawing two orthogonal views and briefed with example images for each category; the sketches were then drawn from memory. [sent-229, score-0.671]
76 Figure 7 shows example sketches of the car category from 10 subjects. [sent-230, score-0.6]
77 The 1st and 3rd rows show the front views drawn by different users, and the corresponding side views are shown in the 2nd and 4th rows. [sent-233, score-0.569]
78 In all cases, a disjoint set of query sketches from a single user is used as test samples, while the remaining sketches are training samples for learning our category-specific fusion function (cf. [sent-239, score-1.489]
79 In our case, we hypothesize that the category label of a test query is given and the dataset images are unlabeled in the testing phase (footnote 3). [sent-242, score-0.22]
80 The MAP numbers reported in Figure 8 are based on 28 synthesized query sketches (azimuth and elevation steps are set to 30◦). [sent-243, score-1.132]
81 In the next section, we will show how the number of synthesized views influences the retrieval performance. [sent-244, score-0.654]
82 4, the ambiguity problem (Footnote 3: In the text/image retrieval domain, some approaches have successfully used query-dependent (ranking) methods to boost the performance [28, 18].) [sent-247, score-0.226]
83 Meanwhile, it is also possible to automatically approximate the query intent by adopting a recent sketch recognition system [12]. [sent-248, score-0.731]
84 in such cases leads the max fusion scheme to unacceptable, even worse, retrieval performance than the baseline approach. [sent-249, score-0.333]
85 The average fusion scheme does not perform well either, since the views within an object category are not equally important and may include some noise responses from the less discriminative views. [sent-250, score-0.404]
86 The reason is that rotation-invariant SHoG brings more ambiguity for outline sketches under large pose variations, e. [sent-252, score-0.663]
87 The experimental results also show that the use of synthesized views with the learned fusion function can significantly improve the retrieval performance, achieving the best MAP = 0. [sent-255, score-0.814]
88 Sensitivity Test We conduct sensitivity tests to evaluate the impact of the number of synthesized views (controlled by the azimuth (a) and elevation (e) steps) on the retrieval performance. [sent-263, score-1.021]
89 Figure 9 shows the retrieval performance with different azimuth and elevation steps of our method. [sent-264, score-0.467]
90 In addition, both increasing and decreasing the number of synthesized views resulted in a loss in performance due to over- or under-interpreting the pose distribution. [sent-267, score-0.503]
91 We found that the parameters: azimuth = 30◦ and elevation = 30◦ achieve the best overall performance on this dataset. [sent-268, score-0.294]
92 Conclusions and Future work In this paper, we propose the use of synthesized multi-view sketches as expanded sub-queries to retrieve multi-view images. [sent-270, score-1.038]
93 7 GHz Intel Core i5 CPU and 4 GB 1333 MHz memory, it takes approximately 2 seconds on average to recover a 3D sketch model. [sent-273, score-0.43]
94 For future work, we will design a more user-friendly interface to help users draw the two orthogonal views. Figure 10. [sent-278, score-0.398]
95 Some examples of user sketches and synthesized views with azimuth step = 45◦ and elevation step = 45◦ . [sent-279, score-1.413]
96 Sensitivity test with different choices of azimuth (a) and elevation (e) steps. [sent-281, score-0.32]
97 Total recall: Automatic query expansion with a generative feature model for object retrieval. [sent-351, score-0.24]
98 An evaluation of descriptors for large-scale image retrieval from sketched feature lines. [sent-369, score-0.247]
99 A performance evaluation of gradient field hog descriptor for sketch based image retrieval. [sent-394, score-0.485]
100 It can be seen that the proposed method, with 3D sub-query expansion (capturing more information than a single query) and the fusion function (emphasizing the more discriminative sub-queries), can produce more accurate and diversified results. [sent-450, score-0.302]
wordName wordTfidf (topN-words)
[('sketches', 0.533), ('sketch', 0.43), ('synthesized', 0.294), ('shog', 0.231), ('views', 0.187), ('retrieval', 0.173), ('fusion', 0.16), ('query', 0.158), ('azimuth', 0.147), ('elevation', 0.147), ('user', 0.105), ('eitz', 0.102), ('expanded', 0.09), ('viewing', 0.086), ('expansion', 0.082), ('side', 0.075), ('descriptors', 0.074), ('front', 0.064), ('bicycle', 0.062), ('interface', 0.057), ('users', 0.056), ('hog', 0.055), ('affine', 0.055), ('drawings', 0.054), ('orthogonal', 0.052), ('sj', 0.051), ('retrieve', 0.049), ('urs', 0.049), ('arp', 0.046), ('intension', 0.046), ('sbir', 0.046), ('subqueries', 0.046), ('draw', 0.046), ('sensitivity', 0.045), ('sphere', 0.044), ('word', 0.044), ('rotation', 0.043), ('contours', 0.042), ('ehd', 0.041), ('hildebrand', 0.041), ('variant', 0.039), ('boubekeur', 0.038), ('pilot', 0.038), ('invariant', 0.037), ('car', 0.036), ('multiview', 0.036), ('inverted', 0.036), ('reconstructed', 0.036), ('hq', 0.036), ('tailor', 0.036), ('teddy', 0.036), ('offer', 0.036), ('bovw', 0.034), ('diversified', 0.034), ('view', 0.034), ('system', 0.033), ('ntu', 0.033), ('taiwan', 0.033), ('adopting', 0.032), ('automatically', 0.032), ('angles', 0.031), ('category', 0.031), ('approximated', 0.031), ('dataset', 0.031), ('rules', 0.03), ('reconstruction', 0.03), ('boost', 0.03), ('canny', 0.028), ('keyword', 0.028), ('silhouettes', 0.028), ('convert', 0.028), ('tests', 0.028), ('brings', 0.028), ('codebook', 0.028), ('siggraph', 0.028), ('public', 0.028), ('shrivastava', 0.027), ('confused', 0.027), ('choices', 0.026), ('xj', 0.026), ('discriminative', 0.026), ('invariance', 0.026), ('frontal', 0.026), ('reconstructs', 0.025), ('expand', 0.025), ('gathered', 0.025), ('liblinear', 0.024), ('libsvm', 0.024), ('similarity', 0.024), ('averages', 0.024), ('curved', 0.024), ('ambiguity', 0.023), ('edge', 0.023), ('might', 0.023), ('concatenated', 0.023), ('motivates', 0.023), ('ranked', 0.022), ('mentioned', 0.022), ('pose', 0.022), ('weights', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 3 iccv-2013-3D Sub-query Expansion for Improving Sketch-Based Multi-view Image Retrieval
Author: Yen-Liang Lin, Cheng-Yu Huang, Hao-Jeng Wang, Winston Hsu
Abstract: We propose a 3D sub-query expansion approach for boosting sketch-based multi-view image retrieval. The core idea of our method is to automatically convert two (guided) 2D sketches into an approximated 3D sketch model, and then generate multi-view sketches as expanded sub-queries to improve the retrieval performance. To learn the weights among synthesized views (sub-queries), we present a new multi-query feature to model the similarity between subqueries and dataset images, and formulate it into a convex optimization problem. Our approach shows superior performance compared with the state-of-the-art approach on a public multi-view image dataset. Moreover, we also conduct sensitivity tests to analyze the parameters of our approach based on the gathered user sketches.
2 0.37130237 368 iccv-2013-SYM-FISH: A Symmetry-Aware Flip Invariant Sketch Histogram Shape Descriptor
Author: Xiaochun Cao, Hua Zhang, Si Liu, Xiaojie Guo, Liang Lin
Abstract: Recently, studies on sketch, such as sketch retrieval and sketch classification, have received more attention in the computer vision community. One of its most fundamental and essential problems is how to more effectively describe a sketch image. Many existing descriptors, such as shape context, have achieved great success. In this paper, we propose a new descriptor, namely Symmetric-aware Flip Invariant Sketch Histogram (SYM-FISH) to refine the shape context feature. Its extraction process includes three steps. First the Flip Invariant Sketch Histogram (FISH) descriptor is extracted on the input image, which is a flip-invariant version of the shape context feature. Then we explore the symmetry character of the image by calculating the kurtosis coefficient. Finally, the SYM-FISH is generated by constructing a symmetry table. The new SYM-FISH descriptor supplements the original shape context by encoding the symmetric information, which is a pervasive characteristic of natural scene and objects. We evaluate the efficacy of the novel descriptor in two applications, i.e., sketch retrieval and sketch classification. Extensive experiments on three datasets well demonstrate the effectiveness and robustness of the proposed SYM-FISH descriptor.
Author: Basura Fernando, Tinne Tuytelaars
Abstract: In this paper we present a new method for object retrieval starting from multiple query images. The use of multiple queries allows for a more expressive formulation of the query object including, e.g., different viewpoints and/or viewing conditions. This, in turn, leads to more diverse and more accurate retrieval results. When no query images are available to the user, they can easily be retrieved from the internet using a standard image search engine. In particular, we propose a new method based on pattern mining. Using the minimal description length principle, we derive the most suitable set of patterns to describe the query object, with patterns corresponding to local feature configurations. This results in apowerful object-specific mid-level image representation. The archive can then be searched efficiently for similar images based on this representation, using a combination of two inverted file systems. Since the patterns already encode local spatial information, good results on several standard image retrieval datasets are obtained even without costly re-ranking based on geometric verification.
4 0.12159805 378 iccv-2013-Semantic-Aware Co-indexing for Image Retrieval
Author: Shiliang Zhang, Ming Yang, Xiaoyu Wang, Yuanqing Lin, Qi Tian
Abstract: Inverted indexes in image retrieval not only allow fast access to database images but also summarize all knowledge about the database, so that their discriminative capacity largely determines the retrieval performance. In this paper, for vocabulary tree based image retrieval, we propose a semantic-aware co-indexing algorithm to jointly San Antonio, TX 78249 . j dl@gmai l com qit ian@cs .ut sa . edu . The query embed two strong cues into the inverted indexes: 1) local invariant features that are robust to delineate low-level image contents, and 2) semantic attributes from large-scale object recognition that may reveal image semantic meanings. For an initial set of inverted indexes of local features, we utilize 1000 semantic attributes to filter out isolated images and insert semantically similar images to the initial set. Encoding these two distinct cues together effectively enhances the discriminative capability of inverted indexes. Such co-indexing operations are totally off-line and introduce small computation overhead to online query cause only local features but no semantic attributes are used for query. Experiments and comparisons with recent retrieval methods on 3 datasets, i.e., UKbench, Holidays, Oxford5K, and 1.3 million images from Flickr as distractors, manifest the competitive performance of our method 1.
5 0.11959139 95 iccv-2013-Cosegmentation and Cosketch by Unsupervised Learning
Author: Jifeng Dai, Ying Nian Wu, Jie Zhou, Song-Chun Zhu
Abstract: Cosegmentation refers to theproblem ofsegmenting multiple images simultaneously by exploiting the similarities between the foreground and background regions in these images. The key issue in cosegmentation is to align common objects between these images. To address this issue, we propose an unsupervised learning framework for cosegmentation, by coupling cosegmentation with what we call “cosketch ”. The goal of cosketch is to automatically discover a codebook of deformable shape templates shared by the input images. These shape templates capture distinct image patterns and each template is matched to similar image patches in different images. Thus the cosketch of the images helps to align foreground objects, thereby providing crucial information for cosegmentation. We present a statistical model whose energy function couples cosketch and cosegmentation. We then present an unsupervised learning algorithm that performs cosketch and cosegmentation by energy minimization. Experiments show that our method outperforms state of the art methods for cosegmentation on the challenging MSRC and iCoseg datasets. We also illustrate our method on a new dataset called Coseg-Rep where cosegmentation can be performed within a single image with repetitive patterns.
6 0.11476678 210 iccv-2013-Image Retrieval Using Textual Cues
7 0.11013559 162 iccv-2013-Fast Subspace Search via Grassmannian Based Hashing
8 0.10762215 337 iccv-2013-Random Grids: Fast Approximate Nearest Neighbors and Range Searching for Image Search
9 0.10640147 400 iccv-2013-Stable Hyper-pooling and Query Expansion for Event Detection
10 0.10142621 334 iccv-2013-Query-Adaptive Asymmetrical Dissimilarities for Visual Object Retrieval
11 0.096909642 444 iccv-2013-Viewing Real-World Faces in 3D
12 0.09432441 445 iccv-2013-Visual Reranking through Weakly Supervised Multi-graph Learning
13 0.094197825 294 iccv-2013-Offline Mobile Instance Retrieval with a Small Memory Footprint
14 0.092948146 163 iccv-2013-Feature Weighting via Optimal Thresholding for Video Analysis
15 0.085173145 273 iccv-2013-Monocular Image 3D Human Pose Estimation under Self-Occlusion
16 0.078147747 213 iccv-2013-Implied Feedback: Learning Nuances of User Behavior in Image Search
17 0.075214989 333 iccv-2013-Quantize and Conquer: A Dimensionality-Recursive Solution to Clustering, Vector Quantization, and Image Retrieval
18 0.073606007 446 iccv-2013-Visual Semantic Complex Network for Web Images
19 0.071858272 235 iccv-2013-Learning Coupled Feature Spaces for Cross-Modal Matching
20 0.071534976 244 iccv-2013-Learning View-Invariant Sparse Representations for Cross-View Action Recognition
topicId topicWeight
[(0, 0.155), (1, 0.022), (2, -0.044), (3, -0.077), (4, 0.026), (5, 0.097), (6, 0.019), (7, -0.061), (8, -0.079), (9, 0.05), (10, 0.127), (11, 0.014), (12, 0.026), (13, 0.064), (14, -0.001), (15, -0.003), (16, 0.12), (17, -0.092), (18, 0.118), (19, -0.08), (20, 0.047), (21, -0.108), (22, -0.044), (23, 0.016), (24, 0.003), (25, 0.076), (26, 0.005), (27, 0.05), (28, -0.008), (29, -0.006), (30, -0.035), (31, 0.061), (32, 0.008), (33, -0.007), (34, 0.006), (35, 0.022), (36, -0.029), (37, 0.093), (38, -0.055), (39, -0.075), (40, -0.028), (41, -0.172), (42, -0.065), (43, -0.002), (44, 0.055), (45, -0.063), (46, -0.004), (47, 0.06), (48, -0.127), (49, 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 0.94145656 3 iccv-2013-3D Sub-query Expansion for Improving Sketch-Based Multi-view Image Retrieval
Author: Yen-Liang Lin, Cheng-Yu Huang, Hao-Jeng Wang, Winston Hsu
Abstract: We propose a 3D sub-query expansion approach for boosting sketch-based multi-view image retrieval. The core idea of our method is to automatically convert two (guided) 2D sketches into an approximated 3D sketch model, and then generate multi-view sketches as expanded sub-queries to improve the retrieval performance. To learn the weights among synthesized views (sub-queries), we present a new multi-query feature to model the similarity between subqueries and dataset images, and formulate it into a convex optimization problem. Our approach shows superior performance compared with the state-of-the-art approach on a public multi-view image dataset. Moreover, we also conduct sensitivity tests to analyze the parameters of our approach based on the gathered user sketches.
2 0.87011945 368 iccv-2013-SYM-FISH: A Symmetry-Aware Flip Invariant Sketch Histogram Shape Descriptor
Author: Xiaochun Cao, Hua Zhang, Si Liu, Xiaojie Guo, Liang Lin
Abstract: Recently, studies on sketch, such as sketch retrieval and sketch classification, have received more attention in the computer vision community. One of its most fundamental and essential problems is how to more effectively describe a sketch image. Many existing descriptors, such as shape context, have achieved great success. In this paper, we propose a new descriptor, namely Symmetric-aware Flip Invariant Sketch Histogram (SYM-FISH) to refine the shape context feature. Its extraction process includes three steps. First the Flip Invariant Sketch Histogram (FISH) descriptor is extracted on the input image, which is a flip-invariant version of the shape context feature. Then we explore the symmetry character of the image by calculating the kurtosis coefficient. Finally, the SYM-FISH is generated by constructing a symmetry table. The new SYM-FISH descriptor supplements the original shape context by encoding the symmetric information, which is a pervasive characteristic of natural scene and objects. We evaluate the efficacy of the novel descriptor in two applications, i.e., sketch retrieval and sketch classification. Extensive experiments on three datasets well demonstrate the effectiveness and robustness of the proposed SYM-FISH descriptor.
Author: Basura Fernando, Tinne Tuytelaars
Abstract: In this paper we present a new method for object retrieval starting from multiple query images. The use of multiple queries allows for a more expressive formulation of the query object including, e.g., different viewpoints and/or viewing conditions. This, in turn, leads to more diverse and more accurate retrieval results. When no query images are available to the user, they can easily be retrieved from the internet using a standard image search engine. In particular, we propose a new method based on pattern mining. Using the minimal description length principle, we derive the most suitable set of patterns to describe the query object, with patterns corresponding to local feature configurations. This results in apowerful object-specific mid-level image representation. The archive can then be searched efficiently for similar images based on this representation, using a combination of two inverted file systems. Since the patterns already encode local spatial information, good results on several standard image retrieval datasets are obtained even without costly re-ranking based on geometric verification.
4 0.69885433 446 iccv-2013-Visual Semantic Complex Network for Web Images
Author: Shi Qiu, Xiaogang Wang, Xiaoou Tang
Abstract: This paper proposes modeling the complex web image collections with an automatically generated graph structure called visual semantic complex network (VSCN). The nodes on this complex network are clusters of images with both visual and semantic consistency, called semantic concepts. These nodes are connected based on the visual and semantic correlations. Our VSCN with 33, 240 concepts is generated from a collection of 10 million web images. 1 A great deal of valuable information on the structures of the web image collections can be revealed by exploring the VSCN, such as the small-world behavior, concept community, indegree distribution, hubs, and isolated concepts. It not only helps us better understand the web image collections at a macroscopic level, but also has many important practical applications. This paper presents two application examples: content-based image retrieval and image browsing. Experimental results show that the VSCN leads to significant improvement on both the precision of image retrieval (over 200%) and user experience for image browsing.
5 0.69194376 334 iccv-2013-Query-Adaptive Asymmetrical Dissimilarities for Visual Object Retrieval
Author: Cai-Zhi Zhu, Hervé Jégou, Shin'Ichi Satoh
Abstract: Visual object retrieval aims at retrieving, from a collection of images, all those in which a given query object appears. It is inherently asymmetric: the query object is mostly included in the database image, while the converse is not necessarily true. However, existing approaches mostly compare the images with symmetrical measures, without considering the different roles of query and database. This paper first measure the extent of asymmetry on large-scale public datasets reflecting this task. Considering the standard bag-of-words representation, we then propose new asymmetrical dissimilarities accounting for the different inlier ratios associated with query and database images. These asymmetrical measures depend on the query, yet they are compatible with an inverted file structure, without noticeably impacting search efficiency. Our experiments show the benefit of our approach, and show that the visual object retrieval task is better treated asymmetrically, in the spirit of state-of-the-art text retrieval.
6 0.61433983 337 iccv-2013-Random Grids: Fast Approximate Nearest Neighbors and Range Searching for Image Search
7 0.5926674 306 iccv-2013-Paper Doll Parsing: Retrieving Similar Styles to Parse Clothing Items
8 0.58515936 378 iccv-2013-Semantic-Aware Co-indexing for Image Retrieval
9 0.58364332 162 iccv-2013-Fast Subspace Search via Grassmannian Based Hashing
10 0.57422501 419 iccv-2013-To Aggregate or Not to aggregate: Selective Match Kernels for Image Search
11 0.56986499 294 iccv-2013-Offline Mobile Instance Retrieval with a Small Memory Footprint
12 0.56123304 445 iccv-2013-Visual Reranking through Weakly Supervised Multi-graph Learning
13 0.54350859 400 iccv-2013-Stable Hyper-pooling and Query Expansion for Event Detection
14 0.53055531 159 iccv-2013-Fast Neighborhood Graph Search Using Cartesian Concatenation
15 0.52918345 235 iccv-2013-Learning Coupled Feature Spaces for Cross-Modal Matching
16 0.47725341 210 iccv-2013-Image Retrieval Using Textual Cues
18 0.44998464 148 iccv-2013-Example-Based Facade Texture Synthesis
19 0.44009829 444 iccv-2013-Viewing Real-World Faces in 3D
20 0.43606812 221 iccv-2013-Joint Inverted Indexing
topicId topicWeight
[(2, 0.092), (4, 0.011), (7, 0.016), (8, 0.249), (12, 0.032), (26, 0.075), (31, 0.042), (38, 0.01), (40, 0.011), (42, 0.113), (64, 0.037), (73, 0.035), (89, 0.166)]
simIndex simValue paperId paperTitle
1 0.80930686 246 iccv-2013-Learning the Visual Interpretation of Sentences
Author: C. Lawrence Zitnick, Devi Parikh, Lucy Vanderwende
Abstract: Sentences that describe visual scenes contain a wide variety of information pertaining to the presence of objects, their attributes and their spatial relations. In this paper we learn the visual features that correspond to semantic phrases derived from sentences. Specifically, we extract predicate tuples that contain two nouns and a relation. The relation may take several forms, such as a verb, preposition, adjective or their combination. We model a scene using a Conditional Random Field (CRF) formulation where each node corresponds to an object, and the edges to their relations. We determine the potentials of the CRF using the tuples extracted from the sentences. We generate novel scenes depicting the sentences’ visual meaning by sampling from the CRF. The CRF is also used to score a set of scenes for a text-based image retrieval task. Our results show we can generate (retrieve) scenes that convey the desired semantic meaning, even when scenes (queries) are described by multiple sentences. Significant improvement is found over several baseline approaches.
same-paper 2 0.78877503 3 iccv-2013-3D Sub-query Expansion for Improving Sketch-Based Multi-view Image Retrieval
Author: Yen-Liang Lin, Cheng-Yu Huang, Hao-Jeng Wang, Winston Hsu
Abstract: We propose a 3D sub-query expansion approach for boosting sketch-based multi-view image retrieval. The core idea of our method is to automatically convert two (guided) 2D sketches into an approximated 3D sketch model, and then generate multi-view sketches as expanded sub-queries to improve the retrieval performance. To learn the weights among synthesized views (sub-queries), we present a new multi-query feature to model the similarity between subqueries and dataset images, and formulate it into a convex optimization problem. Our approach shows superior performance compared with the state-of-the-art approach on a public multi-view image dataset. Moreover, we also conduct sensitivity tests to analyze the parameters of our approach based on the gathered user sketches.
3 0.77667636 272 iccv-2013-Modifying the Memorability of Face Photographs
Author: Aditya Khosla, Wilma A. Bainbridge, Antonio Torralba, Aude Oliva
Abstract: Contemporary life bombards us with many new images of faces every day, which poses non-trivial constraints on human memory. The vast majority of face photographs are intended to be remembered, either because of personal relevance, commercial interests or because the pictures were deliberately designed to be memorable. Can we make aportrait more memorable or more forgettable automatically? Here, we provide a method to modify the memorability of individual face photographs, while keeping the identity and other facial traits (e.g. age, attractiveness, and emotional magnitude) of the individual fixed. We show that face photographs manipulated to be more memorable (or more forgettable) are indeed more often remembered (or forgotten) in a crowd-sourcing experiment with an accuracy of 74%. Quantifying and modifying the ‘memorability ’ of a face lends itself to many useful applications in computer vision and graphics, such as mnemonic aids for learning, photo editing applications for social networks and tools for designing memorable advertisements.
4 0.75261092 186 iccv-2013-GrabCut in One Cut
Author: Meng Tang, Lena Gorelick, Olga Veksler, Yuri Boykov
Abstract: Among image segmentation algorithms there are two major groups: (a) methods assuming known appearance models and (b) methods estimating appearance models jointly with segmentation. Typically, the first group optimizes appearance log-likelihoods in combination with some spacial regularization. This problem is relatively simple and many methods guarantee globally optimal results. The second group treats model parameters as additional variables transforming simple segmentation energies into highorder NP-hard functionals (Zhu-Yuille, Chan-Vese, GrabCut, etc). It is known that such methods indirectly minimize the appearance overlap between the segments. We propose a new energy term explicitly measuring L1 distance between the object and background appearance models that can be globally maximized in one graph cut. We show that in many applications our simple term makes NP-hard segmentation functionals unnecessary. Our one cut algorithm effectively replaces approximate iterative optimization techniques based on block coordinate descent.
5 0.72166717 62 iccv-2013-Bird Part Localization Using Exemplar-Based Models with Enforced Pose and Subcategory Consistency
Author: Jiongxin Liu, Peter N. Belhumeur
Abstract: In this paper, we propose a novel approach for bird part localization, targeting fine-grained categories with wide variations in appearance due to different poses (including aspect and orientation) and subcategories. As it is challenging to represent such variations across a large set of diverse samples with tractable parametric models, we turn to individual exemplars. Specifically, we extend the exemplarbased models in [4] by enforcing pose and subcategory consistency at the parts. During training, we build posespecific detectors scoring part poses across subcategories, and subcategory-specific detectors scoring part appearance across poses. At the testing stage, likely exemplars are matched to the image, suggesting part locations whose pose and subcategory consistency are well-supported by the image cues. From these hypotheses, part configuration can be predicted with very high accuracy. Experimental results demonstrate significantperformance gainsfrom our method on an extensive dataset: CUB-200-2011 [30], for both localization and classification tasks.
6 0.71298748 428 iccv-2013-Translating Video Content to Natural Language Descriptions
7 0.70019233 238 iccv-2013-Learning Graphs to Match
8 0.69006425 197 iccv-2013-Hierarchical Joint Max-Margin Learning of Mid and Top Level Representations for Visual Recognition
9 0.68979084 180 iccv-2013-From Where and How to What We See
10 0.68974006 384 iccv-2013-Semi-supervised Robust Dictionary Learning via Efficient l-Norms Minimization
12 0.68677115 445 iccv-2013-Visual Reranking through Weakly Supervised Multi-graph Learning
13 0.68616545 188 iccv-2013-Group Sparsity and Geometry Constrained Dictionary Learning for Action Recognition from Depth Maps
14 0.68601382 95 iccv-2013-Cosegmentation and Cosketch by Unsupervised Learning
15 0.68558419 126 iccv-2013-Dynamic Label Propagation for Semi-supervised Multi-class Multi-label Classification
16 0.68536103 137 iccv-2013-Efficient Salient Region Detection with Soft Image Abstraction
17 0.68526453 349 iccv-2013-Regionlets for Generic Object Detection
19 0.68483758 241 iccv-2013-Learning Near-Optimal Cost-Sensitive Decision Policy for Object Detection
20 0.68479443 326 iccv-2013-Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation