nips nips2009 nips2009-131 knowledge-graph by maker-knowledge-mining

131 nips-2009-Learning from Neighboring Strokes: Combining Appearance and Context for Multi-Domain Sketch Recognition


Source: pdf

Author: Tom Ouyang, Randall Davis

Abstract: We propose a new sketch recognition framework that combines a rich representation of low level visual appearance with a graphical model for capturing high level relationships between symbols. This joint model of appearance and context allows our framework to be less sensitive to noise and drawing variations, improving accuracy and robustness. The result is a recognizer that is better able to handle the wide range of drawing styles found in messy freehand sketches. We evaluate our work on two real-world domains, molecular diagrams and electrical circuit diagrams, and show that our combined approach significantly improves recognition performance. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 edu Abstract We propose a new sketch recognition framework that combines a rich representation of low level visual appearance with a graphical model for capturing high level relationships between symbols. [sent-4, score-0.872]

2 This joint model of appearance and context allows our framework to be less sensitive to noise and drawing variations, improving accuracy and robustness. [sent-5, score-0.507]

3 The result is a recognizer that is better able to handle the wide range of drawing styles found in messy freehand sketches. [sent-6, score-0.465]

4 We evaluate our work on two real-world domains, molecular diagrams and electrical circuit diagrams, and show that our combined approach significantly improves recognition performance. [sent-7, score-0.39]

5 We propose a new framework for sketch recognition that combines a rich representation of low level visual appearance with a probabilistic model for capturing higher level relationships. [sent-14, score-0.761]

6 Our combined approach uses a graphical model that classifies each symbol jointly with its context, allowing neighboring interpretations to influence each other. [sent-17, score-0.192]

7 The result is a recognizer that is better able to handle the range of drawing styles found in messy freehand sketches. [sent-19, score-0.465]

8 Current work in sketch recognition can, very broadly speaking, be separated into two groups. [sent-20, score-0.299]

9 The first group focuses on the relationships between geometric primitives like lines, arcs, and curves, specifying them either manually [1, 4, 5] or learning them from labeled data [16, 20]. [sent-21, score-0.271]

10 Circles may not always be round, line segments may not be straight, and stroke artifacts like pen-drag (not lifting the pen between strokes), over-tracing (drawing over a previously drawn stroke), and stray ink may introduce false primitives that lead to poor recognition. [sent-24, score-0.927]

11 In addition, recognizers that rely on extracted primitives often discard potentially useful information contained in the appearance of the original strokes. [sent-25, score-0.384]

12 The second group of related work focuses on the visual appearance of shapes and symbols. [sent-26, score-0.471]

13 These include parts-based methods [9, 18], which learn a set of discriminative parts or patches for each symbol class, and template-based methods [7, 11], which compare the input symbol to a library of learned prototypes. [sent-27, score-0.318]

14 The main advantage of vision-based approaches is their robustness to many of the drawing variations commonly found in real-world sketches, including artifacts like over-tracing and pen drag. [sent-28, score-0.23]

15 However, these methods do not model the spatial relationships between neighboring shapes, relying solely on local appearance to classify a symbol. [sent-29, score-0.527]

16 In the following sections we describe our approach, which combines both appearance and context. [sent-30, score-0.338]

17 2 Preprocessing: The first step in our recognition framework is to preprocess the sketch into a set of simple segments, as shown in Figure 1(b). [sent-32, score-0.299]

18 This is not the case when working with the strokes directly, so preprocessing allows us to handle strokes that contain more than one symbol (e. [sent-36, score-0.738]

19 Our preprocessing algorithm divides strokes into segments by splitting them at their corner points. [sent-39, score-0.611]

20 Previous approaches to corner detection focused primarily on local pen speed and curvature [15], but these measures are not always reliable in messy real-world sketches. [sent-40, score-0.479]

21 Our corner detection algorithm, on the other hand, tries to find the set of vertices that best approximates the original stroke as a whole. [sent-41, score-0.549]
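
The whole-stroke criterion above lends itself to a greedy vertex-insertion scheme. The sketch below is one plausible reading, not the authors' exact algorithm: it repeatedly adds the point that deviates most from the current polyline until the mean squared approximation error of the whole stroke falls below a limit (the threshold value here is a placeholder; the paper's exact cost and threshold are not given in this summary).

```python
import math

def perp_dist(p, a, b):
    """Distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg2 = dx * dx + dy * dy
    if seg2 == 0.0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg2))
    return math.hypot(px - ax - t * dx, py - ay - t * dy)

def detect_corners(points, mse_threshold=2.0):
    """Greedily insert the vertex that deviates most from the current
    polyline until the mean squared approximation error of the whole
    stroke falls below the threshold. Returns the corner indices."""
    corners = [0, len(points) - 1]
    while True:
        worst_d, worst_i, insert_at, total_sq = 0.0, None, None, 0.0
        for k in range(len(corners) - 1):
            a, b = points[corners[k]], points[corners[k + 1]]
            for i in range(corners[k] + 1, corners[k + 1]):
                d = perp_dist(points[i], a, b)
                total_sq += d * d
                if d > worst_d:
                    worst_d, worst_i, insert_at = d, i, k + 1
        if worst_i is None or total_sq / len(points) <= mse_threshold:
            return corners
        corners.insert(insert_at, worst_i)
```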

22 At the end of the preprocessing stage, the system records the length of the longest segment L (after excluding the top 5% as outliers). [sent-45, score-0.385]
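
As a small illustration of the rule just stated, one way to compute the reference length L; the exact rounding and outlier handling are assumptions of this sketch.

```python
def reference_length(segment_lengths):
    """L at the end of preprocessing: the longest segment length after
    excluding the top 5% of lengths as outliers."""
    kept = sorted(segment_lengths)
    cut = max(1, int(round(len(kept) * 0.95)))
    return kept[cut - 1]
```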

23 3 Symbol Detection: Our algorithm searches for symbols among groups of segments. [sent-47, score-0.218]

24 Starting with each segment in isolation, we generate successively larger groups by expanding the group to include the next closest segment3 . [sent-48, score-0.347]

25 This process ends when either the size of the group exceeds 2L (a spatial constraint) or when the group spans more strokes than the temporal window specified for the domain. (Footnote 1: defined as the distance between vi and the line segment formed by vi−1 and vi+1. Footnote 2: in our experiments, we set the threshold to 0. ) [sent-49, score-0.493]

26 3 Distance defined as mindist(s, g) + bbdist(s, g), where mindist(s, g) is the distance at the nearest point between segment s and group g and bbdist(s, g) is the diagonal length of the bounding box containing s and g. [sent-51, score-0.502]

27 Figure 1: Our recognition framework. (a) Original strokes; (b) segments after preprocessing; (c) candidate groups; (d) graphical model; (e) final detections. [sent-52, score-0.304]

28 (a) An example sketch of a circuit diagram and (b) the segments after preprocessing. [sent-53, score-0.562]

29 (c) A subset of the candidate groups extracted from the sketch (only those with an appearance potential > 0. [sent-54, score-0.684]

30 (d) The resulting graphical model: nodes represent segment labels, dark blue edges represent group overlap potentials, and light blue edges represent context potentials. [sent-56, score-0.463]

31 (e) The final set of symbol detections after running loopy belief propagation. [sent-57, score-0.297]

32 when the group spans more strokes than the temporal window specified for the domain (footnote 4). [sent-58, score-0.375]

33 Note that we allow temporal gaps in the detection region, so symbols do not need to be drawn with consecutive strokes. [sent-59, score-0.255]
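
Putting the growth procedure and the footnote-3 distance together, a hedged sketch of candidate generation might look as follows. The data layout (segments as lists of (x, y) points, a stroke_of map from segment index to parent stroke id) and the reading of "spans more strokes" as a range of stroke indices (since temporal gaps are allowed) are assumptions of this sketch.

```python
import math

def diag(points):
    """Diagonal length of the bounding box of a point set."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return math.hypot(max(xs) - min(xs), max(ys) - min(ys))

def mindist(seg, group_segs):
    """Distance at the nearest point between a segment and a group."""
    return min(math.hypot(p[0] - q[0], p[1] - q[1])
               for p in seg for g in group_segs for q in g)

def grow_candidates(segments, stroke_of, L, temporal_window):
    """Seed a group with each segment, then repeatedly absorb the nearest
    remaining segment under the distance mindist(s, g) + bbdist(s, g),
    stopping when the group's bounding-box diagonal exceeds 2L or the
    group spans more strokes than the temporal window."""
    candidates = set()
    n = len(segments)
    for seed in range(n):
        group = [seed]
        candidates.add((seed,))
        while len(group) < n:
            grp_segs = [segments[i] for i in group]
            grp_pts = [p for s in grp_segs for p in s]
            rest = [i for i in range(n) if i not in group]
            nxt = min(rest, key=lambda i: mindist(segments[i], grp_segs)
                      + diag(grp_pts + list(segments[i])))
            ids = [stroke_of[i] for i in group] + [stroke_of[nxt]]
            if (diag(grp_pts + list(segments[nxt])) > 2 * L
                    or max(ids) - min(ids) + 1 > temporal_window):
                break
            group.append(nxt)
            candidates.add(tuple(sorted(group)))
    return candidates
```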

34 We classify each candidate group using the symbol recognizer we described in [11], which converts the on-line stroke sequences into a set of low resolution feature images (see Figure 2(a)). [sent-61, score-0.907]

35 This emphasis on visual appearance makes our method less sensitive to stroke-level differences like over-tracing and pen drag, improving accuracy and robustness. [sent-62, score-0.93]

36 Since [11] was designed for classifying isolated shapes and not for detecting symbols in messy sketches, we augment its output with five geometric features and a set of local context features (sketched in code after this list): stroke count: The number of strokes in the group. [sent-63, score-1.226]

37 segment count: The number of segments in the group. [sent-64, score-0.408]

38 group ink density: The total length of the strokes in the group divided by the diagonal length. [sent-66, score-0.622]

39 stroke separation: Maximum distance between any stroke and its nearest neighbor in the group. [sent-68, score-0.78]

40 local context: A set of four feature images that captures the local context around the group. [sent-69, score-0.197]

41 Each image filters the local appearance at a specific orientation: 0, 45, 90, and 135 degrees. [sent-70, score-0.353]
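
As noted above, here is a minimal sketch of the scalar group features from this list; the stroke-as-point-list representation is an assumption, the segment count comes from the preprocessing stage, and the local-context and orientation images are handled by the feature-image sketch further below.

```python
import math

def ink_length(points):
    return sum(math.hypot(b[0] - a[0], b[1] - a[1])
               for a, b in zip(points, points[1:]))

def group_scalar_features(group_strokes):
    """Stroke count, group ink density, and stroke separation for a
    candidate group; strokes are lists of (x, y) points here."""
    pts = [p for s in group_strokes for p in s]
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    diagonal = math.hypot(max(xs) - min(xs), max(ys) - min(ys)) or 1e-9
    separation = 0.0
    if len(group_strokes) > 1:
        def nearest(s):
            return min(math.hypot(p[0] - q[0], p[1] - q[1])
                       for p in s
                       for o in group_strokes if o is not s
                       for q in o)
        separation = max(nearest(s) for s in group_strokes)
    return {"stroke_count": len(group_strokes),
            "group_ink_density":
                sum(ink_length(s) for s in group_strokes) / diagonal,
            "stroke_separation": separation}
```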

42 The symbol detector uses a linear SVM [13] to classify each candidate group, labeling it as one of the symbols in the domain or as mis-grouped “clutter”. [sent-73, score-0.532]

43 Because the classifier needs to distinguish between more than two classes, we … (Footnote 4: the temporal window is 8 strokes for chemistry diagrams and 20 strokes for the circuit diagrams.) [sent-75, score-0.934]
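
The detector is a linear SVM [13] over the concatenated feature-image and geometric features. As a hedged stand-in sketch (scikit-learn's LinearSVC wraps a liblinear-style solver; the feature dimensionality and training data below are placeholders, and calibration is added here only to produce the probability-like scores Pa(ci|x) that the appearance potential consumes later):

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV

# Placeholder data: one row per candidate group (flattened feature
# images plus geometric features); label 6 stands in for "clutter".
rng = np.random.default_rng(0)
X_train = rng.random((200, 128))
y_train = rng.integers(0, 7, 200)

# LinearSVC trains one-vs-rest linear classifiers; calibration wraps
# the margins into probability-like per-class scores.
clf = CalibratedClassifierCV(LinearSVC())
clf.fit(X_train, y_train)
scores = clf.predict_proba(X_train[:3])   # rows sum to ~1 over classes
print(scores.shape)                        # (3, 7)
```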

44 Figure 2: Symbol Detection Features. (a) Isolated recognizer features at orientations 0, 45, 90, and 135 degrees, plus an endpoint image; (b, c) local context features at orientations 0, 45, 90, and 135 degrees. [sent-77, score-0.252]

45 The first four images encode stroke orientation at 0, 45, 90, and 135 degrees; the fifth captures the locations of stroke endpoints. [sent-79, score-0.845]
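
A hedged sketch of such feature images: each pen movement contributes ink to the orientation channels nearest its local direction, and a fifth channel marks the stroke's endpoints. The grid size, the soft orientation binning, and the omission of the smoothing and downsampling in [11] are simplifications of this sketch.

```python
import math
import numpy as np

def feature_images(points, grid=10):
    """Render an on-line stroke as five low-resolution feature images:
    four encode ink near orientations 0, 45, 90, and 135 degrees; the
    fifth marks the stroke's endpoints."""
    pts = np.asarray(points, dtype=float)
    lo = pts.min(axis=0)
    span = max(float((pts.max(axis=0) - lo).max()), 1e-9)
    norm = (pts - lo) / span                      # into the unit square
    imgs = np.zeros((5, grid, grid))
    centers = (0.0, 45.0, 90.0, 135.0)
    for p, q in zip(norm[:-1], norm[1:]):
        angle = math.degrees(math.atan2(q[1] - p[1], q[0] - p[0])) % 180.0
        cx, cy = np.clip(((p + q) / 2 * (grid - 1)).astype(int), 0, grid - 1)
        for k, c in enumerate(centers):
            diff = min(abs(angle - c), 180.0 - abs(angle - c))
            w = max(0.0, 1.0 - diff / 45.0)       # soft orientation binning
            imgs[k, cy, cx] = max(imgs[k, cy, cx], w)
    for end in (norm[0], norm[-1]):
        ex, ey = np.clip((end * (grid - 1)).astype(int), 0, grid - 1)
        imgs[4, ey, ex] = 1.0
    return imgs
```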

46 For example, an isolated segment always looks like a straight line, so its visual appearance is not very informative. [sent-85, score-0.727]

47 (e.g., wires in circuits and straight bonds in chemistry; a code sketch follows this list): orientation: The orientation of the segment, discretized into evenly spaced bins of size π/4. [sent-88, score-0.282]

48 segment length: The length of the segment, normalized by L. [sent-89, score-0.284]

49 segment count: The total number of segments extracted from the parent stroke. [sent-90, score-0.442]

50 segment ink density: The length of the substroke matching the start and end points of the segment divided by the length of the segment. [sent-91, score-0.674]

51 stroke ink density: The length of the parent stroke divided by the diagonal length of the parent stroke’s bounding box. [sent-93, score-1.169]

52 local context: Same as the local context for multi-segment symbols, except these images are centered at the midpoint of the segment, oriented in the same direction as the segment, and scaled so that each dimension is equal to two times the length of the segment. [sent-94, score-0.259]
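
As referenced above, a compact sketch of the scalar per-segment features; the oriented local-context images would be rendered the same way as in the earlier feature-image sketch. The data layout (substroke and parent stroke as lists of (x, y) points) is an assumption.

```python
import math

def ink_length(points):
    return sum(math.hypot(b[0] - a[0], b[1] - a[1])
               for a, b in zip(points, points[1:]))

def single_segment_features(sub, parent, n_parent_segments, L):
    """Geometric features for an isolated segment per the list above.
    sub: the substroke between the segment's endpoints; parent: the
    full parent stroke."""
    dx, dy = sub[-1][0] - sub[0][0], sub[-1][1] - sub[0][1]
    chord = math.hypot(dx, dy) or 1e-9
    xs = [p[0] for p in parent]
    ys = [p[1] for p in parent]
    diagonal = math.hypot(max(xs) - min(xs), max(ys) - min(ys)) or 1e-9
    return {
        "orientation": int((math.atan2(dy, dx) % math.pi) // (math.pi / 4)),
        "segment_length": chord / L,               # normalized by L
        "segment_count": n_parent_segments,
        "segment_ink_density": ink_length(sub) / chord,
        "stroke_ink_density": ink_length(parent) / diagonal,
    }
```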

53 4 Improving Recognition using Context: The final task is to select a set of symbol detections from the competing candidate groups. [sent-96, score-0.365]

54 Second, it should select candidates that are consistent with each other based on what the system knows about the likely spatial relationships between symbols. [sent-99, score-0.254]

55 Under our formulation, each segment (node) in the sketch needs to be assigned to one of the candidate groups (labels). [sent-101, score-0.566]

56 Thus, our candidate selection problem becomes a segment labeling problem, where the set of possible labels for a given segment is the set of candidate groups that contain that segment. [sent-102, score-0.73]

57 This allows us to incorporate local appearance, group overlap consistency, and spatial context into a single unified model. [sent-103, score-0.316]

58 The joint probability function over the entire graph is given by: log P(c | x) = Σi ψa(ci, x) + Σij ψo(ci, cj) + Σij ψc(ci, cj, xi, xj) − log(Z) (2), where the three sums are the appearance, overlap, and context terms respectively, x is the set of segments in the sketch, c is the set of segment labels, and Z is a normalizing constant. [sent-105, score-1.177]

59 The appearance potential ψa measures how well the candidate group’s appearance matches that of its predicted class. [sent-107, score-0.753]

60 It uses the output of the isolated symbol classifier in section 4 and is defined as: ψa(ci, x) = log Pa(ci | x) (3), where Pa(ci | x) is the likelihood score for candidate ci returned by the isolated symbol classifier. [sent-108, score-0.758]

61 The overlap potential ψo (ci , cj ) is a pairwise compatibility that ensures the segment assignments do not conflict with each other. [sent-110, score-0.428]

62 For example, if segments xi and xj are both members of candidate c and xi is assigned to c, then xj must also be assigned to c. [sent-111, score-0.481]

63 ψo(ci, cj) = −100 if ((xi ∈ cj) or (xj ∈ ci)) and (ci ≠ cj), and 0 otherwise (4). To improve efficiency, instead of connecting every pair of segments that are jointly considered in c, we connect the segments into a loop based on temporal ordering. [sent-112, score-0.914]
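
Eq. (4) translates almost directly into code; representing candidate groups as sets of segment ids is an assumption of this sketch.

```python
def overlap_potential(ci, cj, xi, xj):
    """Pairwise compatibility from Eq. (4): heavily penalize two
    segments whose chosen candidate groups conflict. ci and cj are
    sets of segment ids, xi and xj the two segments' ids."""
    if ((xi in cj) or (xj in ci)) and ci != cj:
        return -100.0
    return 0.0

# Segment 2 belongs to group {1, 2} but chose label {2, 3}: conflict.
print(overlap_potential({1, 2}, {2, 3}, 1, 2))   # -100.0
```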

64 The context potential ψc (ci , cj , xi , xj ) represents the spatial compatibility between segments xi and xj , conditioned on their predicted class labels (e. [sent-116, score-0.694]

65 ψc(ci, cj, xi, xj) = log Pc(θ(xi, xj) | class(ci), class(cj)) (5), where class(ci) is the predicted class for candidate ci and θ(xi, xj) is the set of three spatial relationships (θ1, θ2, θ3) between segments xi and xj. [sent-120, score-1.02]

66 This potential is active only for pairs of segments whose distance at the closest point is less than L/2. [sent-121, score-0.218]
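
With ψa as unary log-potentials and ψo plus ψc folded into pairwise tables, the MAP assignment of Eq. (2) can be approximated with loopy belief propagation, as the system does (Figure 1(e)). Below is a generic max-product sketch in log space, not the authors' implementation; each node's label set is the set of candidate groups containing that segment.

```python
import numpy as np

def max_product_bp(unary, pairwise, n_iters=20):
    """Minimal max-product loopy BP on a pairwise MRF in log space.
    unary: {node: 1-D array of log-potentials over its labels}.
    pairwise: {(i, j): 2-D array indexed [label_of_i, label_of_j]}.
    Returns an argmax label index per node."""
    msgs = {}
    for (i, j) in pairwise:
        msgs[(i, j)] = np.zeros(len(unary[j]))
        msgs[(j, i)] = np.zeros(len(unary[i]))
    for _ in range(n_iters):
        for (src, dst) in list(msgs.keys()):
            pot = (pairwise[(src, dst)] if (src, dst) in pairwise
                   else pairwise[(dst, src)].T)
            total = unary[src].copy()
            for (k, t) in msgs:
                if t == src and k != dst:
                    total = total + msgs[(k, t)]
            new = np.max(total[:, None] + pot, axis=0)
            msgs[(src, dst)] = new - new.max()    # normalize for stability
    labels = {}
    for i, u in unary.items():
        belief = u.copy()
        for (k, t) in msgs:
            if t == i:
                belief = belief + msgs[(k, t)]
        labels[i] = int(np.argmax(belief))
    return labels

# Two segments, two candidate labels each; the pairwise table strongly
# rewards agreement, mimicking the overlap potential above.
unary = {0: np.log([0.7, 0.3]), 1: np.log([0.4, 0.6])}
pairwise = {(0, 1): np.array([[0.0, -100.0], [-100.0, 0.0]])}
print(max_product_bp(unary, pairwise))   # {0: 0, 1: 0}
```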

67 … capturing the sketch but providing no recognition or feedback. [sent-132, score-0.336]

68 Using the data we collected, we evaluated five versions of our system: Appearance uses only the isolated appearance-based recognizer from [11]. [sent-133, score-0.197]

69 Complete (corner detector from [15]) is the complete framework, using the corner detector in [15]. [sent-137, score-0.281]

70 For this domain we noticed that users almost never drew multiple symbols using a single stroke, with the exception of multiple connected straight bonds (e. [sent-145, score-0.388]

71 Following this observation, we optimized our candidate extractor to filter out multi-segment candidates that break stroke boundaries. [sent-148, score-0.56]

72 Table 1: Overall recognition accuracy for the chemistry dataset (0.971). [sent-154, score-0.229]

73 Note that for this dataset we report only accuracy (recall), because, unlike traditional object detection, there are no overlapping detections and every stroke is assigned to a symbol. [sent-155, score-0.557]

74 For example, misclassifying one segment in a three-segment “H” makes it impossible to recognize the original “H” correctly. [sent-158, score-0.253]

75 The results in Table 1 show that our method was able to recognize 97% of the symbols correctly. [sent-159, score-0.204]

76 We can see that the diagrams in this dataset can be very messy and exhibit a wide range of drawing styles. [sent-163, score-0.284]

77 Circuits The second dataset is a collection of circuit diagrams collected by Oltmans and Davis [9]. [sent-165, score-0.318]

78 Each user drew ten or eleven different circuits, and every circuit was required to include a pre-specified set of components. [sent-167, score-0.204]

79 Also, since we do not count wire detections for this dataset (as in [9]), we report precision as well as recall. [sent-172, score-0.187]

80 Table 2: Overall recognition accuracy for the circuit diagram dataset (0.912). [sent-185, score-0.32]

81 Table 2 shows that our method was able to recognize over 91% of the circuit symbols correctly. [sent-186, score-0.331]

82 As Figure 4 (bottom) shows, this is a very complicated and messy corpus with significant drawing variations like overtracing and pen drag. [sent-192, score-0.394]

83 1 seconds to process a new stroke in the circuits dataset and 0. [sent-194, score-0.506]

84 They achieved an accuracy of 62% on a circuits dataset similar to ours, but needed to manually segment any strokes that contained more than one symbol. [sent-201, score-0.634]

85 Gennari et al. [3] developed a system that searches for symbols in high-density regions of the sketch and uses domain knowledge to correct low-level recognition errors. [sent-202, score-0.52]

86 They reported an accuracy of 77% on a dataset with 6 types of circuit components. [sent-203, score-0.193]

87 Sezgin and Davis [16] proposed using an HMM to model the temporal patterns of geometric primitives, and reported an accuracy of 87% on a dataset containing 4 types of circuit components. [sent-204, score-0.262]

88 [17] proposed an approach that treats sketch recognition as a visual parsing problem. [sent-207, score-0.352]

89 Our work differs from theirs in that we use a rich model of low-level visual appearance and do not require a pre-defined spatial grammar. [sent-208, score-0.458]

90 Ouyang and Davis [10] developed a sketch recognition system that uses domain knowledge to refine its interpretation. [sent-209, score-0.347]

91 Their work focused on chemical diagrams, and detection was limited to symbols drawn using consecutive strokes. [sent-210, score-0.32]

92 Outside of the sketch recognition community, there is also a great deal of interest in combining appearance and context for problems in computer vision [6, 8, 19]. [sent-211, score-0.68]

93 Figure 4: Examples of chemical diagrams (top) and circuit diagrams (bottom) recognized by our system (complete framework). [sent-212, score-0.588]

94 Correct detections are highlighted in green (teal for hash and wedge bonds), false detections in red, and missed symbols in orange. [sent-213, score-0.406]

95 6 Discussion: We have proposed a new framework that combines a rich representation of low level visual appearance with a probabilistic model for capturing higher level relationships. [sent-214, score-0.462]

96 To our knowledge this is the first paper to combine these two approaches, and the result is a recognizer that is better able to handle the range of drawing styles found in messy freehand sketches. [sent-215, score-0.465]

97 To preserve the familiar experience of using pen and paper, our system supports the same symbols, notations, and drawing styles that people are already accustomed to. [sent-216, score-0.328]

98 In our initial evaluation we apply our method on two real-world domains, chemical diagrams and electrical circuits (with 10 types of components), and achieve accuracy rates of 97% and 91% respectively. [sent-217, score-0.371]

99 Compared to existing benchmarks in the literature, our method achieved higher accuracy even though the other systems supported fewer symbols [3, 16], trained on data from the same user [3, 16], or required manual pre-segmentation [1]. [sent-218, score-0.237]

100 LADDER: a language to describe drawing, display, and editing in sketch recognition. [sent-248, score-0.194]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('stroke', 0.39), ('appearance', 0.308), ('strokes', 0.263), ('segment', 0.222), ('sketch', 0.194), ('segments', 0.186), ('symbols', 0.173), ('symbol', 0.159), ('diagrams', 0.158), ('ci', 0.153), ('vi', 0.128), ('circuit', 0.127), ('messy', 0.124), ('sketches', 0.122), ('cj', 0.119), ('corner', 0.109), ('pen', 0.106), ('recognizer', 0.106), ('ink', 0.106), ('candidate', 0.105), ('recognition', 0.105), ('detections', 0.101), ('chemical', 0.097), ('drawing', 0.093), ('isolated', 0.091), ('chemistry', 0.091), ('circuits', 0.083), ('styles', 0.081), ('bonds', 0.081), ('oltmans', 0.081), ('group', 0.08), ('relationships', 0.078), ('primitives', 0.076), ('context', 0.073), ('ouyang', 0.071), ('vj', 0.068), ('davis', 0.067), ('candidates', 0.065), ('spatial', 0.063), ('xj', 0.063), ('detector', 0.062), ('length', 0.062), ('freehand', 0.061), ('sezgin', 0.061), ('shilman', 0.061), ('bounding', 0.06), ('vij', 0.058), ('geometry', 0.055), ('overlap', 0.055), ('diagram', 0.055), ('wire', 0.053), ('preprocessing', 0.053), ('visual', 0.053), ('mse', 0.053), ('straight', 0.053), ('detection', 0.05), ('system', 0.048), ('complete', 0.048), ('graphics', 0.047), ('box', 0.047), ('computers', 0.046), ('drew', 0.046), ('groups', 0.045), ('local', 0.045), ('curvature', 0.045), ('alvarado', 0.04), ('bbdist', 0.04), ('drawings', 0.04), ('gennari', 0.04), ('mindist', 0.04), ('overtracing', 0.04), ('geometric', 0.037), ('belief', 0.037), ('capturing', 0.037), ('resistor', 0.035), ('users', 0.035), ('images', 0.034), ('bins', 0.034), ('parent', 0.034), ('rich', 0.034), ('accuracy', 0.033), ('dataset', 0.033), ('classify', 0.033), ('graphical', 0.033), ('kara', 0.032), ('lifting', 0.032), ('organic', 0.032), ('xi', 0.032), ('potential', 0.032), ('temporal', 0.032), ('false', 0.031), ('recognize', 0.031), ('orientation', 0.031), ('user', 0.031), ('diagonal', 0.031), ('labels', 0.031), ('variations', 0.031), ('informal', 0.03), ('shapes', 0.03), ('combines', 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 131 nips-2009-Learning from Neighboring Strokes: Combining Appearance and Context for Multi-Domain Sketch Recognition

Author: Tom Ouyang, Randall Davis

Abstract: We propose a new sketch recognition framework that combines a rich representation of low level visual appearance with a graphical model for capturing high level relationships between symbols. This joint model of appearance and context allows our framework to be less sensitive to noise and drawing variations, improving accuracy and robustness. The result is a recognizer that is better able to handle the wide range of drawing styles found in messy freehand sketches. We evaluate our work on two real-world domains, molecular diagrams and electrical circuit diagrams, and show that our combined approach significantly improves recognition performance. 1

2 0.13296694 201 nips-2009-Region-based Segmentation and Object Detection

Author: Stephen Gould, Tianshi Gao, Daphne Koller

Abstract: Object detection and multi-class image segmentation are two closely related tasks that can be greatly improved when solved jointly by feeding information from one task to the other [10, 11]. However, current state-of-the-art models use a separate representation for each task making joint inference clumsy and leaving the classification of many parts of the scene ambiguous. In this work, we propose a hierarchical region-based approach to joint object detection and image segmentation. Our approach simultaneously reasons about pixels, regions and objects in a coherent probabilistic model. Pixel appearance features allow us to perform well on classifying amorphous background classes, while the explicit representation of regions facilitate the computation of more sophisticated features necessary for object detection. Importantly, our model gives a single unified description of the scene—we explain every pixel in the image and enforce global consistency between all random variables in our model. We run experiments on the challenging Street Scene dataset [2] and show significant improvement over state-of-the-art results for object detection accuracy. 1

3 0.11928639 211 nips-2009-Segmenting Scenes by Matching Image Composites

Author: Bryan Russell, Alyosha Efros, Josef Sivic, Bill Freeman, Andrew Zisserman

Abstract: In this paper, we investigate how, given an image, similar images sharing the same global description can help with unsupervised scene segmentation. In contrast to recent work in semantic alignment of scenes, we allow an input image to be explained by partial matches of similar scenes. This allows for a better explanation of the input scenes. We perform MRF-based segmentation that optimizes over matches, while respecting boundary information. The recovered segments are then used to re-query a large database of images to retrieve better matches for the target regions. We show improved performance in detecting the principal occluding and contact boundaries for the scene over previous methods on data gathered from the LabelMe database.

4 0.088947274 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition

Author: Mario Fritz, Gary Bradski, Sergey Karayev, Trevor Darrell, Michael J. Black

Abstract: Existing methods for visual recognition based on quantized local features can perform poorly when local features exist on transparent surfaces, such as glass or plastic objects. There are characteristic patterns to the local appearance of transparent objects, but they may not be well captured by distances to individual examples or by a local pattern codebook obtained by vector quantization. The appearance of a transparent patch is determined in part by the refraction of a background pattern through a transparent medium: the energy from the background usually dominates the patch appearance. We model transparent local patch appearance using an additive model of latent factors: background factors due to scene content, and factors which capture a local edge energy distribution characteristic of the refraction. We implement our method using a novel LDA-SIFT formulation which performs LDA prior to any vector quantization step; we discover latent topics which are characteristic of particular transparent patches and quantize the SIFT space into transparent visual words according to the latent topic dimensions. No knowledge of the background scene is required at test time; we show examples recognizing transparent glasses in a domestic environment. 1

5 0.084376492 44 nips-2009-Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships

Author: Tomasz Malisiewicz, Alyosha Efros

Abstract: The use of context is critical for scene understanding in computer vision, where the recognition of an object is driven by both local appearance and the object’s relationship to other elements of the scene (context). Most current approaches rely on modeling the relationships between object categories as a source of context. In this paper we seek to move beyond categories to provide a richer appearancebased model of context. We present an exemplar-based model of objects and their relationships, the Visual Memex, that encodes both local appearance and 2D spatial context between object instances. We evaluate our model on Torralba’s proposed Context Challenge against a baseline category-based system. Our experiments suggest that moving beyond categories for context modeling appears to be quite beneficial, and may be the critical missing ingredient in scene understanding systems. 1

6 0.077810183 58 nips-2009-Constructing Topological Maps using Markov Random Fields and Loop-Closure Detection

7 0.073688678 170 nips-2009-Nonlinear directed acyclic structure learning with weakly additive noise models

8 0.070054397 41 nips-2009-Bayesian Source Localization with the Multivariate Laplace Prior

9 0.067831382 10 nips-2009-A Gaussian Tree Approximation for Integer Least-Squares

10 0.062603906 133 nips-2009-Learning models of object structure

11 0.058971204 225 nips-2009-Sparsistent Learning of Varying-coefficient Models with Structural Changes

12 0.058858704 84 nips-2009-Evaluating multi-class learning strategies in a generative hierarchical framework for object detection

13 0.05773855 168 nips-2009-Non-stationary continuous dynamic Bayesian networks

14 0.056947153 236 nips-2009-Structured output regression for detection with partial truncation

15 0.056298859 224 nips-2009-Sparse and Locally Constant Gaussian Graphical Models

16 0.055797338 259 nips-2009-Who’s Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation

17 0.052288301 57 nips-2009-Conditional Random Fields with High-Order Features for Sequence Labeling

18 0.050356854 250 nips-2009-Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference

19 0.048789132 102 nips-2009-Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models

20 0.048780087 172 nips-2009-Nonparametric Bayesian Texture Learning and Synthesis


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.177), (1, -0.088), (2, -0.068), (3, -0.015), (4, -0.028), (5, 0.047), (6, 0.034), (7, 0.035), (8, 0.042), (9, -0.064), (10, -0.017), (11, -0.005), (12, 0.067), (13, -0.028), (14, 0.047), (15, 0.02), (16, 0.014), (17, -0.059), (18, 0.018), (19, -0.134), (20, 0.007), (21, -0.004), (22, -0.083), (23, 0.026), (24, -0.083), (25, 0.04), (26, 0.025), (27, 0.055), (28, -0.016), (29, -0.003), (30, -0.022), (31, 0.105), (32, 0.08), (33, 0.024), (34, -0.025), (35, -0.086), (36, 0.052), (37, -0.01), (38, -0.064), (39, 0.082), (40, 0.048), (41, 0.124), (42, -0.055), (43, -0.097), (44, -0.024), (45, 0.008), (46, 0.061), (47, 0.048), (48, -0.02), (49, -0.076)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94852054 131 nips-2009-Learning from Neighboring Strokes: Combining Appearance and Context for Multi-Domain Sketch Recognition

Author: Tom Ouyang, Randall Davis

Abstract: We propose a new sketch recognition framework that combines a rich representation of low level visual appearance with a graphical model for capturing high level relationships between symbols. This joint model of appearance and context allows our framework to be less sensitive to noise and drawing variations, improving accuracy and robustness. The result is a recognizer that is better able to handle the wide range of drawing styles found in messy freehand sketches. We evaluate our work on two real-world domains, molecular diagrams and electrical circuit diagrams, and show that our combined approach significantly improves recognition performance. 1

2 0.59011269 44 nips-2009-Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships

Author: Tomasz Malisiewicz, Alyosha Efros

Abstract: The use of context is critical for scene understanding in computer vision, where the recognition of an object is driven by both local appearance and the object’s relationship to other elements of the scene (context). Most current approaches rely on modeling the relationships between object categories as a source of context. In this paper we seek to move beyond categories to provide a richer appearancebased model of context. We present an exemplar-based model of objects and their relationships, the Visual Memex, that encodes both local appearance and 2D spatial context between object instances. We evaluate our model on Torralba’s proposed Context Challenge against a baseline category-based system. Our experiments suggest that moving beyond categories for context modeling appears to be quite beneficial, and may be the critical missing ingredient in scene understanding systems. 1

3 0.58901268 170 nips-2009-Nonlinear directed acyclic structure learning with weakly additive noise models

Author: Arthur Gretton, Peter Spirtes, Robert E. Tillman

Abstract: The recently proposed additive noise model has advantages over previous directed structure learning approaches since it (i) does not assume linearity or Gaussianity and (ii) can discover a unique DAG rather than its Markov equivalence class. However, for certain distributions, e.g. linear Gaussians, the additive noise model is invertible and thus not useful for structure learning, and it was originally proposed for the two variable case with a multivariate extension which requires enumerating all possible DAGs. We introduce weakly additive noise models, which extends this framework to cases where the additive noise model is invertible and when additive noise is not present. We then provide an algorithm that learns an equivalence class for such models from data, by combining a PC style search using recent advances in kernel measures of conditional dependence with local searches for additive noise models in substructures of the Markov equivalence class. This results in a more computationally efficient approach that is useful for arbitrary distributions even when additive noise models are invertible. 1

4 0.57737738 201 nips-2009-Region-based Segmentation and Object Detection

Author: Stephen Gould, Tianshi Gao, Daphne Koller

Abstract: Object detection and multi-class image segmentation are two closely related tasks that can be greatly improved when solved jointly by feeding information from one task to the other [10, 11]. However, current state-of-the-art models use a separate representation for each task making joint inference clumsy and leaving the classification of many parts of the scene ambiguous. In this work, we propose a hierarchical region-based approach to joint object detection and image segmentation. Our approach simultaneously reasons about pixels, regions and objects in a coherent probabilistic model. Pixel appearance features allow us to perform well on classifying amorphous background classes, while the explicit representation of regions facilitate the computation of more sophisticated features necessary for object detection. Importantly, our model gives a single unified description of the scene—we explain every pixel in the image and enforce global consistency between all random variables in our model. We run experiments on the challenging Street Scene dataset [2] and show significant improvement over state-of-the-art results for object detection accuracy. 1

5 0.55576688 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition

Author: Mario Fritz, Gary Bradski, Sergey Karayev, Trevor Darrell, Michael J. Black

Abstract: Existing methods for visual recognition based on quantized local features can perform poorly when local features exist on transparent surfaces, such as glass or plastic objects. There are characteristic patterns to the local appearance of transparent objects, but they may not be well captured by distances to individual examples or by a local pattern codebook obtained by vector quantization. The appearance of a transparent patch is determined in part by the refraction of a background pattern through a transparent medium: the energy from the background usually dominates the patch appearance. We model transparent local patch appearance using an additive model of latent factors: background factors due to scene content, and factors which capture a local edge energy distribution characteristic of the refraction. We implement our method using a novel LDA-SIFT formulation which performs LDA prior to any vector quantization step; we discover latent topics which are characteristic of particular transparent patches and quantize the SIFT space into transparent visual words according to the latent topic dimensions. No knowledge of the background scene is required at test time; we show examples recognizing transparent glasses in a domestic environment. 1

6 0.5498895 211 nips-2009-Segmenting Scenes by Matching Image Composites

7 0.54727125 58 nips-2009-Constructing Topological Maps using Markov Random Fields and Loop-Closure Detection

8 0.52664149 172 nips-2009-Nonparametric Bayesian Texture Learning and Synthesis

9 0.5231939 133 nips-2009-Learning models of object structure

10 0.51782233 10 nips-2009-A Gaussian Tree Approximation for Integer Least-Squares

11 0.49556273 84 nips-2009-Evaluating multi-class learning strategies in a generative hierarchical framework for object detection

12 0.48618659 236 nips-2009-Structured output regression for detection with partial truncation

13 0.45286083 97 nips-2009-Free energy score space

14 0.4414148 250 nips-2009-Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference

15 0.43015343 35 nips-2009-Approximating MAP by Compensating for Structural Relaxations

16 0.43009228 30 nips-2009-An Integer Projected Fixed Point Method for Graph Matching and MAP Inference

17 0.4190819 41 nips-2009-Bayesian Source Localization with the Multivariate Laplace Prior

18 0.41345084 141 nips-2009-Local Rules for Global MAP: When Do They Work ?

19 0.41179875 149 nips-2009-Maximin affinity learning of image segmentation

20 0.41043812 175 nips-2009-Occlusive Components Analysis


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(7, 0.012), (21, 0.015), (24, 0.033), (25, 0.125), (33, 0.219), (35, 0.108), (36, 0.085), (39, 0.058), (58, 0.069), (71, 0.064), (81, 0.035), (86, 0.076), (91, 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.85111147 131 nips-2009-Learning from Neighboring Strokes: Combining Appearance and Context for Multi-Domain Sketch Recognition

Author: Tom Ouyang, Randall Davis

Abstract: We propose a new sketch recognition framework that combines a rich representation of low level visual appearance with a graphical model for capturing high level relationships between symbols. This joint model of appearance and context allows our framework to be less sensitive to noise and drawing variations, improving accuracy and robustness. The result is a recognizer that is better able to handle the wide range of drawing styles found in messy freehand sketches. We evaluate our work on two real-world domains, molecular diagrams and electrical circuit diagrams, and show that our combined approach significantly improves recognition performance. 1

2 0.80643117 83 nips-2009-Estimating image bases for visual image reconstruction from human brain activity

Author: Yusuke Fujiwara, Yoichi Miyawaki, Yukiyasu Kamitani

Abstract: Image representation based on image bases provides a framework for understanding neural representation of visual perception. A recent fMRI study has shown that arbitrary contrast-defined visual images can be reconstructed from fMRI activity patterns using a combination of multi-scale local image bases. In the reconstruction model, the mapping from an fMRI activity pattern to the contrasts of the image bases was learned from measured fMRI responses to visual images. But the shapes of the images bases were fixed, and thus may not be optimal for reconstruction. Here, we propose a method to build a reconstruction model in which image bases are automatically extracted from the measured data. We constructed a probabilistic model that relates the fMRI activity space to the visual image space via a set of latent variables. The mapping from the latent variables to the visual image space can be regarded as a set of image bases. We found that spatially localized, multi-scale image bases were estimated near the fovea, and that the model using the estimated image bases was able to accurately reconstruct novel visual images. The proposed method provides a means to discover a novel functional mapping between stimuli and brain activity patterns.

3 0.69883221 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition

Author: Mario Fritz, Gary Bradski, Sergey Karayev, Trevor Darrell, Michael J. Black

Abstract: Existing methods for visual recognition based on quantized local features can perform poorly when local features exist on transparent surfaces, such as glass or plastic objects. There are characteristic patterns to the local appearance of transparent objects, but they may not be well captured by distances to individual examples or by a local pattern codebook obtained by vector quantization. The appearance of a transparent patch is determined in part by the refraction of a background pattern through a transparent medium: the energy from the background usually dominates the patch appearance. We model transparent local patch appearance using an additive model of latent factors: background factors due to scene content, and factors which capture a local edge energy distribution characteristic of the refraction. We implement our method using a novel LDA-SIFT formulation which performs LDA prior to any vector quantization step; we discover latent topics which are characteristic of particular transparent patches and quantize the SIFT space into transparent visual words according to the latent topic dimensions. No knowledge of the background scene is required at test time; we show examples recognizing transparent glasses in a domestic environment. 1

4 0.69810325 133 nips-2009-Learning models of object structure

Author: Joseph Schlecht, Kobus Barnard

Abstract: We present an approach for learning stochastic geometric models of object categories from single view images. We focus here on models expressible as a spatially contiguous assemblage of blocks. Model topologies are learned across groups of images, and one or more such topologies is linked to an object category (e.g. chairs). Fitting learned topologies to an image can be used to identify the object class, as well as detail its geometry. The latter goes beyond labeling objects, as it provides the geometric structure of particular instances. We learn the models using joint statistical inference over category parameters, camera parameters, and instance parameters. These produce an image likelihood through a statistical imaging model. We use trans-dimensional sampling to explore topology hypotheses, and alternate between Metropolis-Hastings and stochastic dynamics to explore instance parameters. Experiments on images of furniture objects such as tables and chairs suggest that this is an effective approach for learning models that encode simple representations of category geometry and the statistics thereof, and support inferring both category and geometry on held out single view images. 1

5 0.68250525 211 nips-2009-Segmenting Scenes by Matching Image Composites

Author: Bryan Russell, Alyosha Efros, Josef Sivic, Bill Freeman, Andrew Zisserman

Abstract: In this paper, we investigate how, given an image, similar images sharing the same global description can help with unsupervised scene segmentation. In contrast to recent work in semantic alignment of scenes, we allow an input image to be explained by partial matches of similar scenes. This allows for a better explanation of the input scenes. We perform MRF-based segmentation that optimizes over matches, while respecting boundary information. The recovered segments are then used to re-query a large database of images to retrieve better matches for the target regions. We show improved performance in detecting the principal occluding and contact boundaries for the scene over previous methods on data gathered from the LabelMe database.

6 0.68249518 162 nips-2009-Neural Implementation of Hierarchical Bayesian Inference by Importance Sampling

7 0.68224555 174 nips-2009-Nonparametric Latent Feature Models for Link Prediction

8 0.68199033 168 nips-2009-Non-stationary continuous dynamic Bayesian networks

9 0.67989266 113 nips-2009-Improving Existing Fault Recovery Policies

10 0.67729872 155 nips-2009-Modelling Relational Data using Bayesian Clustered Tensor Factorization

11 0.6762554 175 nips-2009-Occlusive Components Analysis

12 0.675892 19 nips-2009-A joint maximum-entropy model for binary neural population patterns and continuous signals

13 0.67430437 219 nips-2009-Slow, Decorrelated Features for Pretraining Complex Cell-like Networks

14 0.67221236 158 nips-2009-Multi-Label Prediction via Sparse Infinite CCA

15 0.67135316 97 nips-2009-Free energy score space

16 0.67088872 231 nips-2009-Statistical Models of Linear and Nonlinear Contextual Interactions in Early Visual Processing

17 0.67065805 25 nips-2009-Adaptive Design Optimization in Experiments with People

18 0.66816705 34 nips-2009-Anomaly Detection with Score functions based on Nearest Neighbor Graphs

19 0.66793031 226 nips-2009-Spatial Normalized Gamma Processes

20 0.66494119 188 nips-2009-Perceptual Multistability as Markov Chain Monte Carlo Inference