iccv iccv2013 iccv2013-376 knowledge-graph by maker-knowledge-mining

376 iccv-2013-Scene Text Localization and Recognition with Oriented Stroke Detection


Source: pdf

Author: Lukáš Neumann, Jiri Matas

Abstract: An unconstrained end-to-end text localization and recognition method is presented. The method introduces a novel approach for character detection and recognition which combines the advantages of sliding-window and connected component methods. Characters are detected and recognized as image regions which contain strokes of specific orientations in a specific relative position, where the strokes are efficiently detected by convolving the image gradient field with a set of oriented bar filters. Additionally, a novel character representation efficiently calculated from the values obtained in the stroke detection phase is introduced. The representation is robust to shift at the stroke level, which makes it less sensitive to intra-class variations and the noise induced by normalizing character size and positioning. The effectiveness of the representation is demonstrated by the results achieved in the classification of real-world characters using a Euclidean nearest-neighbor classifier trained on synthetic data in a plain form. The method was evaluated on a standard dataset, where it achieves state-of-the-art results in both text localization and recognition.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 An unconstrained end-to-end text localization and recognition method is presented. [sent-7, score-0.338]

2 The method introduces a novel approach for character detection and recognition which combines the advantages of sliding-window and connected component methods. [sent-8, score-0.541]

3 Characters are detected and recognized as image regions which contain strokes of specific orientations in a specific relative position, where the strokes are efficiently detected by convolving the image gradient field with a set of oriented bar filters. [sent-9, score-1.271]

4 Additionally, a novel character representation efficiently calculated from the values obtained in the stroke detection phase is introduced. [sent-10, score-0.982]

5 The representation is robust to shift at the stroke level, which makes it less sensitive to intra-class variations and the noise induced by normalizing character size and positioning. [sent-11, score-0.954]

6 The effectiveness of the representation is demonstrated by the results achieved in the classification of real-world characters using a Euclidean nearest-neighbor classifier trained on synthetic data in a plain form. [sent-12, score-0.358]

7 The method was evaluated on a standard dataset, where it achieves state-of-the-art results in both text localization and recognition. [sent-13, score-0.338]

8 The character representation allows efficient detection of rotated characters, as only a permutation of the feature vector is required. [sent-23, score-0.507]
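A minimal sketch of this rotation-by-permutation property, assuming — since the exact feature layout is not given in this summary — that the feature vector concatenates one equally sized block per stroke orientation:

```python
import numpy as np

def permute_for_rotation(f, n_orientations, steps):
    """Feature vector of a character rotated by `steps` orientation bins,
    obtained purely by permuting the orientation blocks of `f`.
    Assumes `f` concatenates n_orientations equally sized blocks."""
    steps %= n_orientations
    blocks = np.split(np.asarray(f), n_orientations)
    if steps == 0:
        return np.concatenate(blocks)
    # a rotation maps each orientation bin onto another bin, so the
    # feature of the rotated character is a cyclic shift of the blocks
    return np.concatenate(blocks[-steps:] + blocks[:-steps])
```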

9 The second, recently more popular approach [4, 13, 11, 19, 12, 15] is based on localizing individual characters as connected components using local properties of an image (color, intensity, stroke-width, etc. [sent-27, score-0.351]

10 The complexity of the methods does not depend on the parameters of the text, as characters of all scales and orientations can be detected in one pass, and the connected component representation also provides a character segmentation which can be exploited in an OCR stage. [sent-29, score-1.19]

11 The biggest disadvantage of such methods is a dependence on the assumption that a character is a connected component, which is very brittle - a change in a single pixel may break it. Figure 2 (pipeline overview): Stroke Detection, Candidate Region Detection, Character Recognition, Word Formation, Word NMS. [sent-30, score-0.477]

12 The assumption also prevents the methods from detecting characters which consist of several connected components or where multiple characters are joined into a single connected component. [sent-32, score-0.748]

13 As a first contribution, we introduce a novel approach for character detection which combines the advantages of sliding-window and connected component methods. [sent-34, score-0.541]

14 In the proposed method, the detected strokes induce the set of rectangles to be classified, which reduces the number of rectangles by three orders of magnitude when compared to the standard sliding-window methods. [sent-35, score-0.758]

15 As a second contribution, a novel character representation efficiently calculated from the values obtained in the stroke detection phase is introduced. [sent-37, score-0.982]

16 The representation is robust to shift at the stroke level, which makes it less sensitive to intra-class variations and the noise induced by normalizing character’s size and positioning. [sent-38, score-0.561]

17 The effectiveness of the representation is demonstrated by the results achieved in the classification of real-world characters using a linear (approximative) nearest-neighbor classifier trained on synthetic data in a plain form (i. [sent-39, score-0.358]

18 The method was evaluated on the most cited dataset [14], where it achieves state-of-the-art results in both text localization and recognition. [sent-43, score-0.338]

19 Previous Work: Several methods which focus only on a particular subproblem (text localization [6, 13, 19, 4] or character and word cut-out recognition [3, 9]) have been published. [sent-49, score-0.514]

20 Pairs of parallel edges are then used to calculate stroke width for each pixel, and pixels with a similar stroke width are grouped together into characters. [sent-52, score-1.12]

21 Alongside the aforementioned limitation of connected component methods, the method also relies on successful edge detection, which might be problematic in noisy images; moreover, it cannot handle ambiguities because each image pixel can belong to only one stroke. [sent-53, score-0.121]

22 The proposed method differs radically from [4] in that it does not rely on hard decisions made by an edge detector, and in that it does not aim to estimate the stroke width (it actually assumes a unit stroke width - see Section 3.1). [sent-54, score-1.12]

23 Rather, it estimates the possible positions of strokes and detects characters based on known patterns of stroke orientations and their relative positions. [sent-55, score-1.286]

24 provides character segmentation but does not perform text recognition. [sent-57, score-0.645]

25 The method of Wang and Belongie [17] finds individual characters as visual words using the sliding-window approach and then uses a lexicon to group characters into words. [sent-58, score-0.608]

26 An end-to-end text localization and recognition method [12] introduced by Neumann and Matas detects characters as a subset of Extremal Regions and then recognizes candidate regions in a separate OCR stage. [sent-60, score-0.822]

27 The text recognition, however, performs poorly on noisy images because of the sensitivity induced by the connected component assumption. [sent-62, score-0.415]

28 Moreover, the character representation exploited in the method [11] is based on the direction of the boundary pixels’ chain-code, whose robustness is limited. [sent-63, score-0.453]

29 For an exhaustive survey of text localization and recognition methods refer to the ICDAR Robust Reading competition results [8, 7, 14]. [sent-64, score-0.435]

30 The Proposed Method: We assume that each character is defined by a set of its strokes and their relative position. [sent-66, score-0.83]

31 For instance, the letter “F” consists of two strokes in the 0◦ direction and one stroke in the 90◦ direction, where the 90◦ stroke is to the left of the two 0◦ strokes and the 0◦ strokes are located to the right of it. [sent-67, score-2.842]

32 Figure 3 caption: a stroke appears as two ridges in the gradient (approximately) perpendicular to the stroke direction; note that the distance w between the ridges is the stroke width. [sent-68, score-0.606]

33 In the proposed method, the strokes are modelled as responses to oriented filters in the gradient projection scale space (see Section 3.1). [sent-79, score-0.583]

34 The relative stroke position is modelled by subsampling the responses into a fixed-sized matrix. [sent-80, score-0.61]

35 Characters are detected by recognizing a known stroke pattern with a classifier trained with synthetic data (see Section 3. [sent-81, score-0.58]

36 A gradient projection Gα,s in the direction α and scale s is the change of intensity in the image I rotated by angle α and resized to the scale s, taken in the horizontal direction, i. [sent-86, score-0.149]
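A minimal sketch of this gradient projection; the interpolation settings and the use of scipy.ndimage are our assumptions, not the paper's implementation:

```python
import numpy as np
from scipy.ndimage import rotate, zoom

def gradient_projection(image, alpha_deg, s):
    """G_{alpha,s}: change of intensity taken in the horizontal direction
    of `image` rotated by angle alpha (degrees) and resized to scale s."""
    r = rotate(image.astype(np.float64), alpha_deg, reshape=True, order=1)
    r = zoom(r, s, order=1)
    return np.diff(r, axis=1)  # horizontal intensity change
```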

37 A stroke of direction α can be detected as two opposing ridges in the gradient perpendicular to the stroke direction (see Figure 3), where the distance w between the two ridges corresponds to stroke width. [sent-89, score-1.75]

38 In the proposed method, we assume the stroke width value is one (w = 1) and we search for all strokes of unit width in a scale space by convolving the gradient projection with a 5 × 5 filter that responds to such strokes. [sent-90, score-1.22]

39 The response of the convolution filter Rα,s in the direction α and scale s is defined as Rα,s = …

40 The first term responds to a negative gradient ridge at the distance of one pixel from a positive gradient ridge, and the second and third terms suppress the response where there is only a positive or only a negative gradient ridge, respectively. [sent-96, score-0.276]

41 The thresholding parameter Θ represents a trade-off between the ability to detect low-contrast strokes and the number of candidate regions to classify (see Section 3. [sent-97, score-0.564]

42 In our implementation, we set Θ = 8 for all directions and scales; lowering the threshold did not further improve the method’s recall, but it increased the number of candidate regions (see Figure 6). [sent-99, score-0.16]

43 In our experiments, we first normalized the contrast of the gradient projection with a low-pass filter and then convolved it over a range of 10 exponentially decreasing scales (in the interval of 0. [sent-100, score-0.193]
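The exact 5 × 5 filter coefficients do not survive in this summary, so the sketch below substitutes an illustrative variant: it responds only where a positive gradient ridge is accompanied by a negative ridge one pixel to the right (which naturally suppresses locations with a single ridge polarity), and it scans an assumed orientation set and an exponentially decreasing scale schedule in both the original and the inverted image, reusing gradient_projection from the sketch above:

```python
import numpy as np
from scipy.ndimage import convolve

def unit_stroke_response(g, theta=8.0):
    """Response to strokes of unit width in a gradient projection `g`.
    Illustrative stand-in for the paper's 5x5 filter."""
    pos_bar = np.zeros((5, 5)); pos_bar[:, 2] = 1.0  # positive ridge
    neg_bar = np.zeros((5, 5)); neg_bar[:, 3] = 1.0  # negative ridge, 1 px right
    pos = convolve(np.maximum(g, 0.0), pos_bar, mode="constant")
    neg = convolve(np.maximum(-g, 0.0), neg_bar, mode="constant")
    r = np.minimum(pos, neg)            # both ridges must be present
    return np.where(r > theta, r, 0.0)  # thresholding parameter Theta

def detect_strokes(image, alphas=(0, 45, 90, 135), n_scales=10, theta=8.0):
    """Scan orientations and ~10 exponentially decreasing scales, once in
    the original and once in the inverted image (opposite ridge orientation).
    The 0.8 scale factor and the orientation set are assumptions."""
    out = {}
    scales = [0.8 ** k for k in range(n_scales)]
    for inverted, img in ((False, image), (True, image.max() - image)):
        for alpha in alphas:
            for s in scales:
                g = gradient_projection(img, alpha, s)
                out[(alpha, s, inverted)] = unit_stroke_response(g, theta)
    return out
```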

44 Candidate regions are induced through bounding-boxes of strokes, which reduces the number of target rectangles by three orders of magnitude when compared to the sliding-window methods. [sent-104, score-0.3]

45 Note that the proposed method does not extract any image patches as part of the process. The process is performed twice, once in the original image and once in an inverted image, to detect strokes with an opposite ridge orientation. [sent-107, score-0.438]

46 Candidate Region Detection: In the next step, we generate candidate image regions (in the form of bounding-boxes) for classification. [sent-110, score-0.16]

47 Unlike sliding-window methods which exhaustively evaluate all image regions, we exploit the fact that we are only interested in image regions which contain at least one stroke (in our character representation, regions without any stroke would be rejected as non-characters anyway). [sent-111, score-1.561]

48 Moreover, if we assume that for each character there exists a subset of its strokes that induces its bounding-box, we can efficiently generate candidate regions by taking unions of stroke bounding-boxes. [sent-112, score-1.466]

49 The set of candidate regions T is then defined as T = ⋃_{b∈B} ⋃_{k=0}^{K} { b ⊔ N1(b) ⊔ … ⊔ Nk(b) } , Nk(b) ∈ B (7)

50 where ⊔ denotes a union of bounding-boxes (the smallest rectangle that contains all rectangles in the set) and Nk(b) is the k-th nearest bounding-box (measured by the distance of the bounding-boxes’ centers).

51 Figure 6 caption: the number of candidate regions |T| as a function of the thresholding parameter Θ. [sent-128, score-0.16]

52 In other words, for each connected component (stroke) we consider its bounding-box and then K bounding-boxes created as a union of the bounding-boxes of its 1 to K nearest connected components (see Figure 5). [sent-129, score-0.226]
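A sketch of this candidate generation (Eq. 7) with K = 5 as in the experiments; boxes are (x0, y0, x1, y1) tuples and proximity is measured between bounding-box centers:

```python
import numpy as np

def union_box(boxes):
    """Smallest rectangle containing all boxes in the set."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))

def candidate_regions(stroke_boxes, K=5):
    """For each stroke bounding-box, emit the box itself plus the unions
    with its 1..K nearest boxes (distance between bounding-box centers)."""
    boxes = [tuple(b) for b in stroke_boxes]
    centers = np.array([[(b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0]
                        for b in boxes])
    T = set()
    for i, b in enumerate(boxes):
        dist = np.linalg.norm(centers - centers[i], axis=1)
        nearest = np.argsort(dist)[1:K + 1]  # skip the box itself
        T.add(b)                             # the stroke's own bounding-box
        group = [b]
        for j in nearest:                    # unions with 1..K nearest boxes
            group.append(boxes[j])
            T.add(union_box(group))
    return T
```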

53 The problem is overcomplete as typically there are many different combinations of strokes which induce an identical or nearly identical character bounding-box, which reduces the probability of missing a true bounding-box in such a greedy approach. [sent-130, score-0.863]

54 For example, consider the letter “E” - it consists of 4 strokes (3 horizontal and 1 vertical), and 4 out of 6 possible stroke pairs (and all 4 possible stroke triplets) induce an identical character bounding-box. [sent-131, score-1.864]

55 Moreover, the exact position of the character bounding-box is not crucial because the character representation is robust to shift. [sent-132, score-0.827]

56 This property also makes it possible to further improve the method’s performance by eliminating similar rectangles from the set T, keeping only the largest rectangle of each group of similar rectangles (two rectangles are considered similar if their intersection is more than 95% of their union). [sent-133, score-0.333]
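A sketch of this elimination step: candidates are visited largest-first and kept only if they are not similar (intersection above 95% of the union, i.e. IoU > 0.95) to an already kept rectangle:

```python
def box_area(b):
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

def eliminate_similar(rects, sim=0.95):
    """Keep only the largest rectangle of each group of similar rectangles."""
    def iou(a, b):
        x0, y0 = max(a[0], b[0]), max(a[1], b[1])
        x1, y1 = min(a[2], b[2]), min(a[3], b[3])
        inter = box_area((x0, y0, x1, y1)) if x0 < x1 and y0 < y1 else 0.0
        return inter / (box_area(a) + box_area(b) - inter + 1e-9)
    kept = []
    for r in sorted(rects, key=box_area, reverse=True):  # largest first
        if all(iou(r, k) <= sim for k in kept):
            kept.append(r)
    return kept
```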

57 The number of neighboring bounding-boxes K was set to 5; increasing the value further had very little impact on the overall results because of the overcompleteness of the task and also because characters in our datasets (coming from the Latin alphabet) consist of a relatively low number of strokes. [sent-139, score-0.289]

58 Other scripts (e.g. the Chinese script) might however require an increase of the parameter value, but this would still be computationally feasible as the number of candidate regions is linear in the number of strokes. [sent-142, score-0.16]

59 Character Recognition: Each candidate region b ∈ T is labelled with Unicode code(s) or rejected as “unknown” in the following process. [sent-145, score-0.135]

60 At first, a response Rα(b) of the candidate region is calculated as a maximum pooled over an interval of scales, Rα(b) = max_{s∈ρ(b)} M20(Rα,s(b)). [sent-146, score-0.28]

61 Only a subset of scales (see Section 3.1) is used for each region, depending on its size and aspect, so that strokes from lower scales do not suppress the ones from a higher scale. [sent-156, score-0.44]

62 The subset is determined by the trained function ρ(b), which maps the region’s height and width to an interval of admissible scales. [sent-157, score-0.173]

63 For example, a region which is two times wider than it is high can only be occupied by characters with a similar aspect (i.e. [sent-158, score-0.305]

64 ) and this limits the interval of possible stroke widths. [sent-163, score-0.543]

65 Because of the assumption of the unit stroke width, the interval of possible stroke widths unambiguously determines the interval of admissible scales. [sent-164, score-1.132]
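A sketch of how ρ(b) could look; the paper trains this mapping, so the relative stroke-width bounds below are illustrative assumptions, and the second helper shows the max-pooling over the resulting interval from the response definition above:

```python
def rho(box, rel_stroke_width=(0.05, 0.25)):
    """Interval of admissible scales for a region. A stroke of width w in
    the original image has unit width at scale s = 1/w, so assumed bounds
    on the stroke width (here 5%-25% of region height) translate directly
    into a scale interval."""
    x0, y0, x1, y1 = box
    h = max(1.0, float(y1 - y0))
    w_min, w_max = rel_stroke_width[0] * h, rel_stroke_width[1] * h
    return (1.0 / w_max, 1.0 / w_min)  # (s_low, s_high)

def pooled_response(per_scale, box):
    """R_alpha(b) = max over s in rho(b); `per_scale` maps each scale to
    the response value of this region in direction alpha."""
    s_lo, s_hi = rho(box)
    vals = [v for s, v in per_scale.items() if s_lo <= s <= s_hi]
    return max(vals) if vals else 0.0
```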

66 A set of character regions R is then defined as …

67 The character representation is based on positions of oriented strokes, which are pooled over multiple scales. [sent-187, score-0.471]

68 Figure 8 caption: the set contains 5580 characters from 90 fonts with no distortions, blurring or rotations. In our experiments, the training set consists of images with a single black letter on a white background (see Figure 8). [sent-189, score-0.428]

69 In total there were 5580 training samples (62 character classes in 90 different fonts). [sent-190, score-0.393]
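A minimal sketch of the classification step over such features; the rejection rule and the placeholder value of β are assumptions (the paper only states that β trades off coverage of unseen fonts against false positives):

```python
import numpy as np

def classify_character(feature, train_feats, train_labels, beta=1.0):
    """Euclidean nearest-neighbour classification against synthetic
    training samples (5580 samples: 62 classes x 90 fonts in the paper).
    A region whose nearest-neighbour distance exceeds beta is rejected
    as "unknown"; the value of beta here is an illustrative placeholder."""
    d = np.linalg.norm(np.asarray(train_feats) - np.asarray(feature), axis=1)
    i = int(np.argmin(d))
    return train_labels[i] if d[i] <= beta else None  # None = "unknown"
```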

70 The value of β represents a trade-off between detecting more characters from fonts not in the training set and producing more false positives. [sent-194, score-0.363]

71 Word Formation: Given a set of character regions R, the regions are agglomerated into a set of text lines T (see Algorithm 1). [sent-199, score-0.815]

72 The partial ordering is induced by relative position of the regions in the direction of the text line and represents a left-to-right ordering of characters in a word. [sent-202, score-0.812]

73 In other words, the partial ordering is induced by the restriction that a region can only be preceded by regions to the left of it and succeeded by regions to the right of it, allowing for a small overlap. [sent-203, score-0.292]

74 To detect words in the image and recognize their content, an optimal sequence is found in each text line (where the order in the sequence is induced by the partial ordering of the text line) by maximizing the objective function L∗(T) = argmax_{∀i, li∈L̂(ri)} … (16) [sent-204, score-0.633]

75 The probability pS models the observation that the spacing between characters does not vary a lot in a single word. [sent-215, score-0.336]

76 We define the difference of spacing of three regions as Δs(r1, r2, r3) = |s12 − s23| / max(s12, s23), where sij = rjL − riR (17), and rL and rR denote the left and right boundary of the region, respectively, in the orientation of the text. [sent-216, score-0.16]
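The same definition as a small helper, assuming a horizontal text line with regions given as (x0, y0, x1, y1) boxes; an epsilon guards against degenerate spacings:

```python
def spacing_difference(r1, r2, r3, eps=1e-9):
    """Delta_s(r1, r2, r3) = |s12 - s23| / max(s12, s23), Eq. (17),
    with s_ij = left boundary of r_j minus right boundary of r_i."""
    s12 = r2[0] - r1[2]
    s23 = r3[0] - r2[2]
    return abs(s12 - s23) / max(s12, s23, eps)
```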

77 Similarly, the probability pP models the observation that the positioning of a character triplet is not arbitrary. [sent-218, score-0.453]

78 The probability pA is approximated by relative frequencies of character triplets, which are calculated in the training stage (a list of approx. [sent-227, score-0.5]

79 As a final step, spaces are detected as peaks in the histogram of inter-character spacings to break down text lines into words, and overlapping words are eliminated through non-maximum suppression. [sent-230, score-0.388]
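A rough sketch of this final step; the paper detects peaks in the spacing histogram, while here a simple mean-plus-deviation threshold stands in for the peak analysis:

```python
import numpy as np

def break_into_words(line_regions):
    """Split a text line (regions as (x0,y0,x1,y1) boxes) into words at
    unusually large inter-character spacings."""
    regions = sorted(line_regions, key=lambda r: r[0])
    gaps = np.array([b[0] - a[2] for a, b in zip(regions, regions[1:])])
    if gaps.size == 0:
        return [regions]
    thresh = gaps.mean() + gaps.std()  # stand-in for histogram peak split
    words, current = [], [regions[0]]
    for gap, region in zip(gaps, regions[1:]):
        if gap > thresh:
            words.append(current)
            current = []
        current.append(region)
    words.append(current)
    return words
```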

80 Experiments: The proposed method was evaluated on the ICDAR 2011 Robust Reading competition dataset [14], which contains 1189 words and 6393 letters in 255 images. [sent-232, score-0.142]

81 Algorithm 1 — Data: a set of regions R; Result: a set of text lines T. T ← ∅; R′ ← R; D ← [−45, −35, . . .

82 3% in text localization (see Figure 10 for sample outputs). [sent-242, score-0.338]

83 The method achieves significantly better recall (66%) than the winner of the ICDAR 2011 Robust Reading competition (62%) and the recently published method of Shi [15] (63%). [sent-243, score-0.201]

84 Let us note that the ICDAR 2011 competition was held in an open mode where authors supply only outputs of their methods on a previously published competition dataset. [sent-245, score-0.132]

85 In the end-to-end text recognition, the method achieves a recall of 45. [sent-246, score-0.273]

86 The problems of the method include ambiguities introduced by the fact that a subregion of a character might be another character, failures to detect letters on word boundaries which consist of just one stroke (e.g. [sent-253, score-0.986]

87 “I”, “l”) and false positives caused by strokes around areas with text (see Figure 11). [sent-255, score-0.656]

88 Comparison with the most recent end-to-end text recognition results on the ICDAR 2011 dataset. [sent-267, score-0.252]

89 Conclusions: An end-to-end real-time text localization and recognition method was presented in the paper. [sent-269, score-0.338]

90 The method introduces a novel approach for character detection and recognition which combines the advantages of sliding-window and connected component methods. [sent-270, score-0.541]

91 Characters are detected and recognized as image regions which contain strokes of specific orientations in a specific relative position, where the strokes are efficiently detected by convolving the image gradient field with a set of oriented bar filters. [sent-271, score-1.271]

92 The characters are selected from an efficiently obtained set of target regions by a nearest-neighbor classifier, which exploits novel character representations based on strokes. [sent-272, score-0.766]

93 On the standard ICDAR 2011 dataset [14], the method achieves state-of-the-art results in both text localization and end-to-end text recognition. [sent-273, score-0.59]

94 Detecting text in natural scenes with stroke width transform. [sent-298, score-0.812]

95 Text detection and localization in complex scene images using constrained adaboost algorithm. [sent-304, score-0.167]

96 Figure 11 caption: ambiguities introduced by the fact that a subregion of a character might be another character (“nD”). [sent-307, score-0.834]

97 A failed detection of a letter on a word boundary which consists of just one stroke (“i”). [sent-308, score-0.617]

98 A method for text localization and recognition in real-world images. [sent-349, score-0.338]

99 ICDAR 2011 robust reading competition challenge 2: Reading text in scene images. [sent-369, score-0.456]

100 Scene text detection using graph model built upon maximally stable extremal regions. [sent-377, score-0.318]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('stroke', 0.488), ('strokes', 0.404), ('character', 0.393), ('characters', 0.267), ('icdar', 0.258), ('text', 0.252), ('rectangles', 0.111), ('competition', 0.097), ('ri', 0.092), ('localization', 0.086), ('regions', 0.085), ('reading', 0.085), ('connected', 0.084), ('candidate', 0.075), ('width', 0.072), ('neumann', 0.072), ('fonts', 0.072), ('orientations', 0.064), ('ocr', 0.057), ('letter', 0.057), ('interval', 0.055), ('induced', 0.052), ('convolving', 0.051), ('czech', 0.05), ('nk', 0.05), ('subregion', 0.048), ('winner', 0.048), ('gradient', 0.048), ('matas', 0.048), ('admissible', 0.046), ('ridges', 0.046), ('detected', 0.046), ('words', 0.045), ('epstein', 0.044), ('script', 0.043), ('direction', 0.039), ('approximative', 0.039), ('felk', 0.039), ('republic', 0.039), ('region', 0.038), ('convolution', 0.038), ('spacing', 0.037), ('detection', 0.037), ('distortions', 0.037), ('ps', 0.036), ('scales', 0.036), ('bar', 0.036), ('cvut', 0.036), ('responses', 0.035), ('word', 0.035), ('published', 0.035), ('modelled', 0.034), ('induce', 0.034), ('ridge', 0.034), ('cmp', 0.034), ('oriented', 0.033), ('relative', 0.033), ('response', 0.033), ('rotated', 0.033), ('blurring', 0.032), ('probability', 0.032), ('ordering', 0.032), ('pages', 0.031), ('union', 0.031), ('responds', 0.031), ('detects', 0.03), ('projection', 0.029), ('lexicon', 0.029), ('extremal', 0.029), ('pl', 0.028), ('positioning', 0.028), ('component', 0.027), ('recognizes', 0.027), ('pp', 0.027), ('magnitude', 0.026), ('disconnected', 0.026), ('rb', 0.026), ('orders', 0.026), ('pa', 0.025), ('filter', 0.025), ('detecting', 0.024), ('plain', 0.024), ('agency', 0.023), ('classifier', 0.023), ('triplets', 0.023), ('permutation', 0.023), ('synthetic', 0.023), ('calculated', 0.022), ('perpendicular', 0.022), ('adaboost', 0.022), ('consist', 0.022), ('rejected', 0.022), ('scene', 0.022), ('recall', 0.021), ('pooled', 0.021), ('efficiently', 0.021), ('representation', 0.021), ('frequencies', 0.02), ('position', 0.02), ('google', 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 376 iccv-2013-Scene Text Localization and Recognition with Oriented Stroke Detection

Author: Lukáš Neumann, Jiri Matas

Abstract: An unconstrained end-to-end text localization and recognition method is presented. The method introduces a novel approach for character detection and recognition which combines the advantages of sliding-window and connected component methods. Characters are detected and recognized as image regions which contain strokes of specific orientations in a specific relative position, where the strokes are efficiently detected by convolving the image gradient field with a set of oriented bar filters. Additionally, a novel character representation efficiently calculated from the values obtained in the stroke detection phase is introduced. The representation is robust to shift at the stroke level, which makes it less sensitive to intra-class variations and the noise induced by normalizing character size and positioning. The effectiveness of the representation is demonstrated by the results achieved in the classification of real-world characters using a Euclidean nearest-neighbor classifier trained on synthetic data in a plain form. The method was evaluated on a standard dataset, where it achieves state-of-the-art results in both text localization and recognition.

2 0.57862502 415 iccv-2013-Text Localization in Natural Images Using Stroke Feature Transform and Text Covariance Descriptors

Author: Weilin Huang, Zhe Lin, Jianchao Yang, Jue Wang

Abstract: In this paper, we present a new approach for text localization in natural images, by discriminating text and non-text regions at three levels: pixel, component and text-line levels. Firstly, a powerful low-level filter called the Stroke Feature Transform (SFT) is proposed, which extends the widely-used Stroke Width Transform (SWT) by incorporating color cues of text pixels, leading to significantly enhanced performance on inter-component separation and intra-component connection. Secondly, based on the output of SFT, we apply two classifiers, a text component classifier and a text-line classifier, sequentially to extract text regions, eliminating the heuristic procedures that are commonly used in previous approaches. The two classifiers are built upon two novel Text Covariance Descriptors (TCDs) that encode both the heuristic properties and the statistical characteristics of text strokes. Finally, text regions are located by simply thresholding the text-line confidence map. Our method was evaluated on two benchmark datasets: ICDAR 2005 and ICDAR 2011, and the corresponding F-measure values are 0.72 and 0.73, respectively, surpassing previous methods in accuracy by a large margin.

3 0.45496801 210 iccv-2013-Image Retrieval Using Textual Cues

Author: Anand Mishra, Karteek Alahari, C.V. Jawahar

Abstract: We present an approach for the text-to-image retrieval problem based on textual content present in images. Given the recent developments in understanding text in images, an appealing approach to address this problem is to localize and recognize the text, and then query the database, as in a text retrieval problem. We show that such an approach, despite being based on state-of-the-art methods, is insufficient, and propose a method where we do not rely on an exact localization and recognition pipeline. We take a query-driven search approach, where we find approximate locations of characters in the text query, and then impose spatial constraints to generate a ranked list of images in the database. The retrieval performance is evaluated on public scene text datasets as well as three large datasets, namely IIIT scene text retrieval, Sports-10K and TV series-1M, which we introduce.

4 0.4262712 315 iccv-2013-PhotoOCR: Reading Text in Uncontrolled Conditions

Author: Alessandro Bissacco, Mark Cummins, Yuval Netzer, Hartmut Neven

Abstract: We describe PhotoOCR, a system for text extraction from images. Our particular focus is reliable text extraction from smartphone imagery, with the goal of text recognition as a user input modality similar to speech recognition. Commercially available OCR performs poorly on this task. Recent progress in machine learning has substantially improved isolated character classification; we build on this progress by demonstrating a complete OCR system using these techniques. We also incorporate modern datacenter-scale distributed language modelling. Our approach is capable of recognizing text in a variety of challenging imaging conditions where traditional OCR systems fail, notably in the presence of substantial blur, low resolution, low contrast, high image noise and other distortions. It also operates with low latency; mean processing time is 600 ms per image. We evaluate our system on public benchmark datasets for text extraction and outperform all previously reported results, more than halving the error rate on multiple benchmarks. The system is currently in use in many applications at Google, and is available as a user input modality in Google Translate for Android.

5 0.38833797 345 iccv-2013-Recognizing Text with Perspective Distortion in Natural Scenes

Author: Trung Quy Phan, Palaiahnakote Shivakumara, Shangxuan Tian, Chew Lim Tan

Abstract: This paper presents an approach to text recognition in natural scene images. Unlike most existing works which assume that texts are horizontal and frontal parallel to the image plane, our method is able to recognize perspective texts of arbitrary orientations. For individual character recognition, we adopt a bag-of-keypoints approach, in which Scale Invariant Feature Transform (SIFT) descriptors are extracted densely and quantized using a pre-trained vocabulary. Following [1, 2], the context information is utilized through lexicons. We formulate word recognition as finding the optimal alignment between the set of characters and the list of lexicon words. Furthermore, we introduce a new dataset called StreetViewText-Perspective, which contains texts in street images with a great variety of viewpoints. Experimental results on public datasets and the proposed dataset show that our method significantly outperforms the state-of-the-art on perspective texts of arbitrary orientations.

6 0.21108083 180 iccv-2013-From Where and How to What We See

7 0.12511985 192 iccv-2013-Handwritten Word Spotting with Corrected Attributes

8 0.091454163 253 iccv-2013-Linear Sequence Discriminant Analysis: A Model-Based Dimensionality Reduction Method for Vector Sequences

9 0.089214496 166 iccv-2013-Finding Actors and Actions in Movies

10 0.079671696 44 iccv-2013-Adapting Classification Cascades to New Domains

11 0.061564162 235 iccv-2013-Learning Coupled Feature Spaces for Cross-Modal Matching

12 0.056656707 330 iccv-2013-Proportion Priors for Image Sequence Segmentation

13 0.05462756 327 iccv-2013-Predicting an Object Location Using a Global Image Representation

14 0.049368333 121 iccv-2013-Discriminatively Trained Templates for 3D Object Detection: A Real Time Scalable Approach

15 0.047299359 170 iccv-2013-Fingerspelling Recognition with Semi-Markov Conditional Random Fields

16 0.046099283 197 iccv-2013-Hierarchical Joint Max-Margin Learning of Mid and Top Level Representations for Visual Recognition

17 0.044955805 79 iccv-2013-Coherent Object Detection with 3D Geometric Context from a Single Image

18 0.044836827 277 iccv-2013-Multi-channel Correlation Filters

19 0.044661224 74 iccv-2013-Co-segmentation by Composition

20 0.044146843 377 iccv-2013-Segmentation Driven Object Detection with Fisher Vectors


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.144), (1, 0.023), (2, -0.006), (3, -0.074), (4, 0.06), (5, 0.057), (6, 0.036), (7, -0.031), (8, -0.083), (9, -0.036), (10, 0.467), (11, -0.172), (12, 0.198), (13, 0.127), (14, 0.04), (15, 0.141), (16, -0.102), (17, 0.183), (18, -0.315), (19, 0.15), (20, 0.171), (21, 0.152), (22, 0.051), (23, -0.004), (24, 0.032), (25, -0.041), (26, -0.039), (27, 0.065), (28, 0.04), (29, -0.013), (30, -0.012), (31, 0.054), (32, -0.019), (33, 0.037), (34, 0.044), (35, 0.017), (36, 0.044), (37, 0.015), (38, 0.051), (39, -0.025), (40, -0.032), (41, -0.017), (42, 0.081), (43, -0.025), (44, -0.014), (45, 0.013), (46, -0.037), (47, 0.032), (48, 0.009), (49, 0.006)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96507269 376 iccv-2013-Scene Text Localization and Recognition with Oriented Stroke Detection

Author: Lukáš Neumann, Jiri Matas

Abstract: An unconstrained end-to-end text localization and recognition method is presented. The method introduces a novel approach for character detection and recognition which combines the advantages of sliding-window and connected component methods. Characters are detected and recognized as image regions which contain strokes of specific orientations in a specific relative position, where the strokes are efficiently detected by convolving the image gradient field with a set of oriented bar filters. Additionally, a novel character representation efficiently calculated from the values obtained in the stroke detection phase is introduced. The representation is robust to shift at the stroke level, which makes it less sensitive to intra-class variations and the noise induced by normalizing character size and positioning. The effectiveness of the representation is demonstrated by the results achieved in the classification of real-world characters using a Euclidean nearest-neighbor classifier trained on synthetic data in a plain form. The method was evaluated on a standard dataset, where it achieves state-of-the-art results in both text localization and recognition.

2 0.91775757 415 iccv-2013-Text Localization in Natural Images Using Stroke Feature Transform and Text Covariance Descriptors

Author: Weilin Huang, Zhe Lin, Jianchao Yang, Jue Wang

Abstract: In this paper, we present a new approach for text localization in natural images, by discriminating text and non-text regions at three levels: pixel, component and text-line levels. Firstly, a powerful low-level filter called the Stroke Feature Transform (SFT) is proposed, which extends the widely-used Stroke Width Transform (SWT) by incorporating color cues of text pixels, leading to significantly enhanced performance on inter-component separation and intra-component connection. Secondly, based on the output of SFT, we apply two classifiers, a text component classifier and a text-line classifier, sequentially to extract text regions, eliminating the heuristic procedures that are commonly used in previous approaches. The two classifiers are built upon two novel Text Covariance Descriptors (TCDs) that encode both the heuristic properties and the statistical characteristics of text strokes. Finally, text regions are located by simply thresholding the text-line confidence map. Our method was evaluated on two benchmark datasets: ICDAR 2005 and ICDAR 2011, and the corresponding F-measure values are 0.72 and 0.73, respectively, surpassing previous methods in accuracy by a large margin.

3 0.91734666 345 iccv-2013-Recognizing Text with Perspective Distortion in Natural Scenes

Author: Trung Quy Phan, Palaiahnakote Shivakumara, Shangxuan Tian, Chew Lim Tan

Abstract: This paper presents an approach to text recognition in natural scene images. Unlike most existing works which assume that texts are horizontal and frontal parallel to the image plane, our method is able to recognize perspective texts of arbitrary orientations. For individual character recognition, we adopt a bag-of-keypoints approach, in which Scale Invariant Feature Transform (SIFT) descriptors are extracted densely and quantized using a pre-trained vocabulary. Following [1, 2], the context information is utilized through lexicons. We formulate word recognition as finding the optimal alignment between the set of characters and the list of lexicon words. Furthermore, we introduce a new dataset called StreetViewText-Perspective, which contains texts in street images with a great variety of viewpoints. Experimental results on public datasets and the proposed dataset show that our method significantly outperforms the state-of-the-art on perspective texts of arbitrary orientations.

4 0.88918871 315 iccv-2013-PhotoOCR: Reading Text in Uncontrolled Conditions

Author: Alessandro Bissacco, Mark Cummins, Yuval Netzer, Hartmut Neven

Abstract: We describe PhotoOCR, a system for text extraction from images. Our particular focus is reliable text extraction from smartphone imagery, with the goal of text recognition as a user input modality similar to speech recognition. Commercially available OCR performs poorly on this task. Recent progress in machine learning has substantially improved isolated character classification; we build on this progress by demonstrating a complete OCR system using these techniques. We also incorporate modern datacenter-scale distributed language modelling. Our approach is capable of recognizing text in a variety of challenging imaging conditions where traditional OCR systems fail, notably in the presence of substantial blur, low resolution, low contrast, high image noise and other distortions. It also operates with low latency; mean processing time is 600 ms per image. We evaluate our system on public benchmark datasets for text extraction and outperform all previously reported results, more than halving the error rate on multiple benchmarks. The system is currently in use in many applications at Google, and is available as a user input modality in Google Translate for Android.

5 0.79887199 210 iccv-2013-Image Retrieval Using Textual Cues

Author: Anand Mishra, Karteek Alahari, C.V. Jawahar

Abstract: We present an approach for the text-to-image retrieval problem based on textual content present in images. Given the recent developments in understanding text in images, an appealing approach to address this problem is to localize and recognize the text, and then query the database, as in a text retrieval problem. We show that such an approach, despite being based on state-of-the-art methods, is insufficient, and propose a method where we do not rely on an exact localization and recognition pipeline. We take a query-driven search approach, where we find approximate locations of characters in the text query, and then impose spatial constraints to generate a ranked list of images in the database. The retrieval performance is evaluated on public scene text datasets as well as three large datasets, namely IIIT scene text retrieval, Sports-10K and TV series-1M, which we introduce.

6 0.68702143 180 iccv-2013-From Where and How to What We See

7 0.50521874 192 iccv-2013-Handwritten Word Spotting with Corrected Attributes

8 0.4802486 253 iccv-2013-Linear Sequence Discriminant Analysis: A Model-Based Dimensionality Reduction Method for Vector Sequences

9 0.36649853 235 iccv-2013-Learning Coupled Feature Spaces for Cross-Modal Matching

10 0.32407361 170 iccv-2013-Fingerspelling Recognition with Semi-Markov Conditional Random Fields

11 0.22060901 166 iccv-2013-Finding Actors and Actions in Movies

12 0.20819153 277 iccv-2013-Multi-channel Correlation Filters

13 0.20444353 44 iccv-2013-Adapting Classification Cascades to New Domains

14 0.20415995 408 iccv-2013-Super-resolution via Transform-Invariant Group-Sparse Regularization

15 0.19842714 112 iccv-2013-Detecting Irregular Curvilinear Structures in Gray Scale and Color Imagery Using Multi-directional Oriented Flux

16 0.19367541 388 iccv-2013-Shape Index Descriptors Applied to Texture-Based Galaxy Analysis

17 0.18861794 294 iccv-2013-Offline Mobile Instance Retrieval with a Small Memory Footprint

18 0.18454571 73 iccv-2013-Class-Specific Simplex-Latent Dirichlet Allocation for Image Classification

19 0.18074061 55 iccv-2013-Automatic Kronecker Product Model Based Detection of Repeated Patterns in 2D Urban Images

20 0.1752968 278 iccv-2013-Multi-scale Topological Features for Hand Posture Representation and Analysis


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.052), (7, 0.12), (12, 0.015), (26, 0.078), (27, 0.023), (31, 0.147), (40, 0.019), (42, 0.099), (48, 0.019), (64, 0.045), (73, 0.039), (78, 0.014), (82, 0.079), (89, 0.132), (98, 0.012)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.86923707 415 iccv-2013-Text Localization in Natural Images Using Stroke Feature Transform and Text Covariance Descriptors

Author: Weilin Huang, Zhe Lin, Jianchao Yang, Jue Wang

Abstract: In this paper, we present a new approach for text localization in natural images, by discriminating text and non-text regions at three levels: pixel, component and text-line levels. Firstly, a powerful low-level filter called the Stroke Feature Transform (SFT) is proposed, which extends the widely-used Stroke Width Transform (SWT) by incorporating color cues of text pixels, leading to significantly enhanced performance on inter-component separation and intra-component connection. Secondly, based on the output of SFT, we apply two classifiers, a text component classifier and a text-line classifier, sequentially to extract text regions, eliminating the heuristic procedures that are commonly used in previous approaches. The two classifiers are built upon two novel Text Covariance Descriptors (TCDs) that encode both the heuristic properties and the statistical characteristics of text strokes. Finally, text regions are located by simply thresholding the text-line confidence map. Our method was evaluated on two benchmark datasets: ICDAR 2005 and ICDAR 2011, and the corresponding F-measure values are 0.72 and 0.73, respectively, surpassing previous methods in accuracy by a large margin.

same-paper 2 0.86129117 376 iccv-2013-Scene Text Localization and Recognition with Oriented Stroke Detection

Author: Lukáš Neumann, Jiri Matas

Abstract: An unconstrained end-to-end text localization and recognition method is presented. The method introduces a novel approach for character detection and recognition which combines the advantages of sliding-window and connected component methods. Characters are detected and recognized as image regions which contain strokes of specific orientations in a specific relative position, where the strokes are efficiently detected by convolving the image gradient field with a set of oriented bar filters. Additionally, a novel character representation efficiently calculated from the values obtained in the stroke detection phase is introduced. The representation is robust to shift at the stroke level, which makes it less sensitive to intra-class variations and the noise induced by normalizing character size and positioning. The effectiveness of the representation is demonstrated by the results achieved in the classification of real-world characters using a Euclidean nearest-neighbor classifier trained on synthetic data in a plain form. The method was evaluated on a standard dataset, where it achieves state-of-the-art results in both text localization and recognition.

3 0.85219997 357 iccv-2013-Robust Matrix Factorization with Unknown Noise

Author: Deyu Meng, Fernando De_La_Torre

Abstract: Many problems in computer vision can be posed as recovering a low-dimensional subspace from high-dimensional visual data. Factorization approaches to low-rank subspace estimation minimize a loss function between an observed measurement matrix and a bilinear factorization. Most popular loss functions include the L2 and L1 losses. L2 is optimal for Gaussian noise, while L1 is for Laplacian distributed noise. However, real data is often corrupted by an unknown noise distribution, which is unlikely to be purely Gaussian or Laplacian. To address this problem, this paper proposes a low-rank matrix factorization problem with a Mixture of Gaussians (MoG) noise model. The MoG model is a universal approximator for any continuous distribution, and hence is able to model a wider range of noise distributions. The parameters of the MoG model can be estimated with a maximum likelihood method, while the subspace is computed with standard approaches. We illustrate the benefits of our approach in extensive synthetic and real-world experiments including structure from motion, face modeling and background subtraction.

4 0.85184526 38 iccv-2013-Action Recognition with Actons

Author: Jun Zhu, Baoyuan Wang, Xiaokang Yang, Wenjun Zhang, Zhuowen Tu

Abstract: With the improved accessibility to an exploding amount of video data and growing demands in a wide range of video analysis applications, video-based action recognition/classification becomes an increasingly important task in computer vision. In this paper, we propose a two-layer structure for action recognition to automatically exploit a mid-level “acton” representation. The weakly-supervised actons are learned via a new max-margin multi-channel multiple instance learning framework, which can capture multiple mid-level action concepts simultaneously. The learned actons (with no requirement for detailed manual annotations) observe the properties of being compact, informative, discriminative, and easy to scale. The experimental results demonstrate the effectiveness of applying the learned actons in our two-layer structure, and show the state-of-the-art recognition performance on two challenging action datasets, i.e., Youtube and HMDB51.

5 0.83870494 408 iccv-2013-Super-resolution via Transform-Invariant Group-Sparse Regularization

Author: Carlos Fernandez-Granda, Emmanuel J. Candès

Abstract: We present a framework to super-resolve planar regions found in urban scenes and other man-made environments by taking into account their 3D geometry. Such regions have highly structured straight edges, but this prior is challenging to exploit due to deformations induced by the projection onto the imaging plane. Our method factors out such deformations by using recently developed tools based on convex optimization to learn a transform that maps the image to a domain where its gradient has a simple group-sparse structure. This allows us to obtain a novel convex regularizer that enforces global consistency constraints between the edges of the image. Computational experiments with real images show that this data-driven approach to the design of regularizers promoting transform-invariant group sparsity is very effective at high super-resolution factors. We view our approach as complementary to most recent superresolution methods, which tend to focus on hallucinating high-frequency textures.

6 0.83513665 72 iccv-2013-Characterizing Layouts of Outdoor Scenes Using Spatial Topic Processes

7 0.83477151 345 iccv-2013-Recognizing Text with Perspective Distortion in Natural Scenes

8 0.82390457 178 iccv-2013-From Semi-supervised to Transfer Counting of Crowds

9 0.80597603 212 iccv-2013-Image Set Classification Using Holistic Multiple Order Statistics Features and Localized Multi-kernel Metric Learning

10 0.80363929 323 iccv-2013-Pose Estimation with Unknown Focal Length Using Points, Directions and Lines

11 0.80339253 180 iccv-2013-From Where and How to What We See

12 0.80326259 291 iccv-2013-No Matter Where You Are: Flexible Graph-Guided Multi-task Learning for Multi-view Head Pose Classification under Target Motion

13 0.80326021 137 iccv-2013-Efficient Salient Region Detection with Soft Image Abstraction

14 0.79963481 275 iccv-2013-Motion-Aware KNN Laplacian for Video Matting

15 0.7980783 315 iccv-2013-PhotoOCR: Reading Text in Uncontrolled Conditions

16 0.79587442 73 iccv-2013-Class-Specific Simplex-Latent Dirichlet Allocation for Image Classification

17 0.78895581 269 iccv-2013-Modeling Occlusion by Discriminative AND-OR Structures

18 0.77848798 427 iccv-2013-Transfer Feature Learning with Joint Distribution Adaptation

19 0.7763949 328 iccv-2013-Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation

20 0.77384013 173 iccv-2013-Fluttering Pattern Generation Using Modified Legendre Sequence for Coded Exposure Imaging