iccv iccv2013 iccv2013-376 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Lukáš Neumann, Jiri Matas
Abstract: An unconstrained end-to-end text localization and recognition method is presented. The method introduces a novel approach for character detection and recognition which combines the advantages of sliding-window and connected component methods. Characters are detected and recognized as image regions which contain strokes of specific orientations in a specific relative position, where the strokes are efficiently detected by convolving the image gradient field with a set of oriented bar filters. Additionally, a novel character representation efficiently calculated from the values obtained in the stroke detection phase is introduced. The representation is robust to shift at the stroke level, which makes it less sensitive to intra-class variations and the noise induced by normalizing character size and positioning. The effectiveness of the representation is demonstrated by the results achieved in the classification of real-world characters using a Euclidean nearest-neighbor classifier trained on synthetic data in a plain form. The method was evaluated on a standard dataset, where it achieves state-of-the-art results in both text localization and recognition.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract. An unconstrained end-to-end text localization and recognition method is presented. [sent-7, score-0.338]
2 The method introduces a novel approach for character detection and recognition which combines the advantages of sliding-window and connected component methods. [sent-8, score-0.541]
3 Characters are detected and recognized as image regions which contain strokes of specific orientations in a specific relative position, where the strokes are efficiently detected by convolving the image gradient field with a set of oriented bar filters. [sent-9, score-1.271]
4 Additionally, a novel character representation efficiently calculated from the values obtained in the stroke detection phase is introduced. [sent-10, score-0.982]
5 The representation is robust to shift at the stroke level, which makes it less sensitive to intra-class variations and the noise induced by normalizing character size and positioning. [sent-11, score-0.954]
6 The effectiveness of the representation is demonstrated by the results achieved in the classification of real-world characters using a Euclidean nearest-neighbor classifier trained on synthetic data in a plain form. [sent-12, score-0.358]
7 The method was evaluated on a standard dataset, where it achieves state-of-the-art results in both text localization and recognition. [sent-13, score-0.338]
8 The character representation allows efficient detection of rotated characters, as only a permutation of the feature vector is required. [sent-23, score-0.507]
9 The second, recently more popular approach [4, 13, 11, 19, 12, 15] is based on localizing individual characters as connected components using local properties of an image (color, intensity, stroke-width, etc. [sent-27, score-0.351]
10 The complexity of the methods does not depend on the parameters of the text, as characters of all scales and orientations can be detected in one pass, and the connected component representation also provides a character segmentation which can be exploited in an OCR stage. [sent-29, score-1.19]
11 The biggest disadvantage of such methods is a dependence on the assumption that a character is a connected component, which is very brittle - a change in a sin- [Figure 2: pipeline of Stroke Detection, Candidate Region Detection, Character Recognition, Word Formation, Word NMS.] [sent-30, score-0.477]
12 The assumption also prevents the methods from detecting characters which consist of several connected components or where multiple characters are joined into a single connected component. [sent-32, score-0.748]
13 As a first contribution, we introduce a novel approach for character detection which combines the advantages of sliding-window and connected component methods. [sent-34, score-0.541]
14 In the proposed method, the detected strokes induce the set of rectangles to be classified, which reduces the number of rectangles by three orders of magnitude when compared to the standard sliding-window methods. [sent-35, score-0.758]
15 As a second contribution, a novel character representation efficiently calculated from the values obtained in the stroke detection phase is introduced. [sent-37, score-0.982]
16 The representation is robust to shift at the stroke level, which makes it less sensitive to intra-class variations and the noise induced by normalizing the character's size and positioning. [sent-38, score-0.561]
17 The effectiveness of the representation is demonstrated by the results achieved in the classification of real-world characters using a linear (approximate) nearest-neighbor classifier trained on synthetic data in a plain form (i. [sent-39, score-0.358]
18 The method was evaluated on the most cited dataset [14], where it achieves state-of-the-art results in both text localization and recognition. [sent-43, score-0.338]
19 Previous Work. Several methods which focus only on a particular subproblem (text localization [6, 13, 19, 4] or character/word cut-out recognition [3, 9]) have been published. [sent-49, score-0.514]
20 Pairs of parallel edges are then used to calculate stroke width for each pixel and pixels with a similar stroke width are grouped together into characters. [sent-52, score-1.12]
21 Alongside the aforementioned limitation of connected component methods, the method also relies on successful edge detection, which might be problematic in noisy images; moreover, it cannot handle ambiguities because each image pixel can belong to only one stroke. [sent-53, score-0.121]
22 The proposed method differs radically from [4] in that it does not rely on hard decisions made by an edge detector, and in that it does not aim to estimate the stroke width (it actually assumes a unit stroke width - see Section 3. [sent-54, score-1.12]
23 1), but rather it estimates the possible positions of strokes and detects characters based on known patterns of stroke orientations and their relative positions. [sent-55, score-1.286]
24 provides character segmentation but does not perform text recognition. [sent-57, score-0.645]
25 The method of Wang and Belongie [17] finds individual characters as visual words using the sliding-window approach and then uses a lexicon to group characters into words. [sent-58, score-0.608]
26 An end-to-end text localization and recognition method [12] introduced by Neumann and Matas detects characters as a subset of Extremal Regions and then recognizes candidate regions in a separate OCR stage. [sent-60, score-0.822]
27 The text recognition, however, performs poorly on noisy images because of the sensitivity induced by the connected component assumption. [sent-62, score-0.415]
28 Moreover, the character representation exploited in the method [11] is based on the direction of the boundary pixels' chain-code, whose robustness is limited. [sent-63, score-0.453]
29 For an exhaustive survey of text localization and recognition methods refer to the ICDAR Robust Reading competition results [8, 7, 14]. [sent-64, score-0.435]
30 The Proposed Method. We assume that each character is defined by a set of its strokes and their relative position. [sent-66, score-0.83]
31 For instance, the letter "F" consists of two strokes in the 0° direction and one stroke in the 90° direction, where the 90° stroke is to the left of the two 0° strokes and, on the contrary, the 0° strokes are located to the right of the 90° stroke. [sent-67, score-2.842]
32 [Figure 3: a stroke creates two opposing ridges in the gradient, (approximately) perpendicular to the stroke direction; the distance w between the ridges is the stroke width. Figure 4: panels (a)-(d).] [sent-68, score-0.606]
33 In the proposed method, the strokes are modelled as responses to oriented filters in the gradient projection scale space (see Section 3. [sent-79, score-0.583]
34 1) and the relative stroke position is modelled by subsampling the responses into a fixed-sized matrix. [sent-80, score-0.61]
35 Characters are detected by recognizing a known stroke pattern with a classifier trained with synthetic data (see Section 3. [sent-81, score-0.58]
36 A gradient projection Gα,s in the direction α and scale s is the change of intensity in the image I rotated by angle α and resized to the scale s, taken in the horizontal direction, i.e. [sent-86, score-0.149]
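The defining formula for Gα,s is elided in this extraction. A minimal LaTeX sketch consistent with the prose above (rotate by α, rescale by s, differentiate horizontally); the paper's exact notation may differ:

```latex
% Reconstruction from the prose; the paper's exact notation may differ.
G_{\alpha,s}(x,y) = \frac{\partial I_{\alpha,s}(x,y)}{\partial x},
\qquad I_{\alpha,s} = \mathrm{resize}_s\big(\mathrm{rot}_\alpha(I)\big)
```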
37 A stroke of direction α can be detected as two opposing ridges in the gradient perpendicular to the stroke direction (see Figure 3), where the distance w between the two ridges corresponds to stroke width. [sent-89, score-1.75]
38 In the proposed method, we assume the stroke width value is one (w = 1) and we search for all strokes of unit width in a scale space by convolving the gradient projection with a 5 × 5 filter that responds to such strokes. [sent-90, score-1.22]
39 The response of the convolution filter Rα,s in the direction α and scale s is defined as Rα,s = … (2). [sent-91, score-0.135]
40 The first term of Eq. (2) responds to a negative gradient ridge at the distance of one pixel from a positive gradient ridge, and the second and third terms suppress the response where there is only a positive or only a negative gradient ridge, respectively. [sent-96, score-0.276]
41 The thresholding parameter Θ represents a trade-off between the ability to detect low-contrast strokes and the number of candidate regions to classify (see Section 3. [sent-97, score-0.564]
42 In our implementation, we set Θ = 8 for all directions and scales; lowering the threshold did not further improve the method's recall, but it increased the number of candidate regions (see Figure 6). [sent-99, score-0.16]
43 In our experiments, we first normalized the contrast of the gradient projection with a low-pass filter and then convolved it over a range of 10 exponentially decreasing scales (in the interval of 0. [sent-100, score-0.193]
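As a concrete illustration of the stroke detection described in this section, the following numpy/scipy sketch rotates and rescales the image, takes the horizontal gradient, and convolves it with a bar filter thresholded at Θ = 8. The kernel coefficients and the single-polarity suppression rule are illustrative stand-ins, since the paper's exact 5 × 5 filter (Eq. 2) is elided in this extraction:

```python
import numpy as np
from scipy import ndimage

def gradient_projection(image, alpha, scale):
    """G_{alpha,s}: horizontal intensity derivative of the image rotated
    by `alpha` degrees and resized by factor `scale` (bilinear)."""
    rotated = ndimage.rotate(image.astype(np.float64), alpha, reshape=True, order=1)
    resized = ndimage.zoom(rotated, scale, order=1)
    return np.gradient(resized, axis=1)

def unit_stroke_response(g, theta=8.0):
    """Respond to a unit-width stroke: a positive gradient ridge with a
    negative ridge one pixel away (w = 1). The kernel is an illustrative
    stand-in, not the paper's exact filter coefficients."""
    kernel = np.zeros((5, 5))
    kernel[:, 2] = 1.0   # positive ridge (the stroke's leading edge)
    kernel[:, 3] = -1.0  # negative ridge one pixel to the right
    r = ndimage.convolve(g, kernel, mode="constant")
    # suppress responses where only one of the two ridge polarities is
    # present, mimicking the second and third terms of the paper's filter
    pos = ndimage.convolve(np.maximum(g, 0.0), (kernel > 0).astype(float), mode="constant")
    neg = ndimage.convolve(np.maximum(-g, 0.0), (kernel < 0).astype(float), mode="constant")
    r = r - np.abs(pos - neg)
    return np.where(r > theta, r, 0.0)
```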
44 Candidate regions are induced through bounding-boxes of strokes, which reduces the number of target rectangles by three orders of magnitude when compared to the sliding-window methods. [sent-104, score-0.3]
45 Note that the proposed method does not extract any image patches as part of the process; stroke detection is carried out twice, once in the original image and once in an inverted image, to detect strokes with an opposite ridge orientation. [sent-107, score-0.438]
46 Candidate Region Detection. In the next step, we generate candidate image regions (in the form of bounding-boxes) for classification. [sent-110, score-0.16]
47 Unlike sliding-window methods which exhaustively evaluate all image regions, we exploit the fact that we are only interested in image regions which contain at least one stroke (in our character representation, regions without any stroke would be rejected as non-characters anyway). [sent-111, score-1.561]
48 Moreover, if we assume that for each character there exists a subset of its strokes that induces its bounding-box, we can efficiently generate candidate regions by taking unions of stroke bounding-boxes. [sent-112, score-1.466]
49 The set of candidate regions T is then defined as T = ∪_{b ∈ B} ∪_{k=1}^{K} { ⊔(b, N_1(b), …, N_k(b)) }, N_k(b) ∈ B (7)
50 where ⊔ denotes the union of bounding-boxes (the smallest rectangle that contains all rectangles in the set) and N_k(b) denotes the k-th nearest bounding-box of b (measured by the distance of the bounding-boxes' centers).
51 [Figure 6: the number of candidate regions |T| as a function of the thresholding parameter Θ.] [sent-128, score-0.16]
52 In other words, for each connected component (stroke) we consider its bounding-box and then K bounding-boxes created as a union of the bounding-boxes of the 1 to K nearest connected components (see Figure 5). [sent-129, score-0.226]
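A short sketch of this greedy candidate-region generation, assuming axis-aligned boxes given as (x0, y0, x1, y1) tuples (an illustrative interface, not the paper's data structure):

```python
import numpy as np

def candidate_regions(boxes, K=5):
    """Candidate bounding-boxes as unions of each stroke box with its
    1..K nearest neighbours (distance between box centres), cf. Eq. (7)."""
    boxes = [tuple(b) for b in boxes]
    centers = np.array([((b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0) for b in boxes])
    candidates = set(boxes)                      # each stroke's own bounding-box
    for i, b in enumerate(boxes):
        dist = np.linalg.norm(centers - centers[i], axis=1)
        cur = b
        for j in np.argsort(dist)[1:K + 1]:      # grow the union over 1..K neighbours
            n = boxes[j]
            cur = (min(cur[0], n[0]), min(cur[1], n[1]),
                   max(cur[2], n[2]), max(cur[3], n[3]))
            candidates.add(cur)
    return candidates
```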
53 The problem is overcomplete as typically there are many different combinations of strokes which induce an identical or nearly identical character bounding-box, which reduces the probability of missing a true bounding-box in such a greedy approach. [sent-130, score-0.863]
54 For example, consider the letter "E": it consists of 4 strokes (3 horizontal and 1 vertical), and 4 out of 6 possible stroke pairs (and all 4 possible stroke triplets) induce an identical character bounding-box. [sent-131, score-1.864]
55 Moreover, the exact position of the character bounding-box is not crucial because the character representation is robust to shift. [sent-132, score-0.827]
56 This property also makes it possible to further improve the method's performance by eliminating similar rectangles from the set T, keeping only the largest of the similar rectangles (two rectangles are considered similar if their intersection is more than 95% of their union). [sent-133, score-0.333]
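A minimal sketch of this elimination step, assuming (x0, y0, x1, y1) boxes and interpreting "union" as the area of the union of the two rectangles (an assumption):

```python
def eliminate_similar(rects):
    """Keep only the largest of similar rectangles; two rectangles are
    'similar' if their intersection exceeds 95% of their union (by area)."""
    def area(r):
        return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])
    def inter(a, b):
        return area((max(a[0], b[0]), max(a[1], b[1]),
                     min(a[2], b[2]), min(a[3], b[3])))
    kept = []
    for r in sorted(rects, key=area, reverse=True):  # largest first
        if all(inter(r, k) <= 0.95 * (area(r) + area(k) - inter(r, k)) for k in kept):
            kept.append(r)
    return kept
```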
57 The number of neighboring bounding-boxes K was set to 5; increasing the value further had very little impact on the overall results because of the overcompleteness of the task and also because characters in our datasets (coming from the Latin alphabet) consist of a relatively low number of strokes. [sent-139, score-0.289]
58 Chinese script) might, however, require an increase of the parameter value, but this would still be computationally feasible as the number of candidate regions is linear in the number of strokes. [sent-142, score-0.16]
59 Character Recognition. Each candidate region b ∈ T is labelled with one or more Unicode codes or rejected as "unknown" in the following process. [sent-145, score-0.135]
60 At first, a response of the candidate region Rα(b) is calculated as a maximum pooled over an interval of scales: Rα(b) = max_{s ∈ ρ(b)} M20(Rα,s(b)). [sent-146, score-0.28]
61 A subset of scales (see Section 3.1) is used for each region, depending on its size and aspect, so that strokes from lower scales do not suppress the ones from a higher scale. [sent-156, score-0.44]
62 The subset is determined by the trained function ρ(b), which maps the region's height and width to an interval of admissible scales. [sent-157, score-0.173]
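A compact sketch of the scale pooling described above; the fixed 20 × 20 target size is an assumption based on the M20 notation, and the subsampling scheme is illustrative:

```python
import numpy as np

def region_response(responses_by_scale, box, admissible, target=20):
    """Max-pool the stroke responses of a region over its admissible scales,
    subsampling each patch into a fixed target x target matrix."""
    def subsample(a, n):
        ys = np.linspace(0, a.shape[0] - 1, n).astype(int)
        xs = np.linspace(0, a.shape[1] - 1, n).astype(int)
        return a[np.ix_(ys, xs)]

    x0, y0, x1, y1 = box
    pooled = np.zeros((target, target))
    for s, resp in responses_by_scale.items():   # {scale: 2-D response map}
        if s not in admissible:
            continue
        patch = resp[int(y0 * s):int(np.ceil(y1 * s)),
                     int(x0 * s):int(np.ceil(x1 * s))]
        if patch.size == 0:
            continue
        pooled = np.maximum(pooled, subsample(patch, target))
    return pooled
```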
63 For example, a region which is two times wider than higher can only be occupied by characters with a similar aspect (i. [sent-158, score-0.305]
64 ) and this limits the interval of possible stroke widths. [sent-163, score-0.543]
65 Because of the assumption of the unit stroke width, the interval of possible stroke widths unambiguously determines the interval of admissible scales. [sent-164, score-1.132]
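To make the unit-stroke-width argument concrete, here is a hypothetical stand-in for the trained function ρ(b); only the s = 1/w relationship follows from the text, and the relative stroke-width bounds below are invented for illustration:

```python
def admissible_scales(box_w, box_h, scales,
                      min_rel_width=0.02, max_rel_width=0.30):
    """Hypothetical stand-in for rho(b): under the unit-stroke-width
    assumption, a stroke of real width w is detected at scale s = 1/w, so
    bounding the plausible stroke width relative to the region size bounds
    the admissible scales."""
    side = min(box_w, box_h)
    lo = 1.0 / (max_rel_width * side)   # thickest plausible stroke -> smallest scale
    hi = 1.0 / (min_rel_width * side)   # thinnest plausible stroke -> largest scale
    return {s for s in scales if lo <= s <= hi}
```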
66 A set of character regions R is then defined as … [sent-178, score-0.478]
67 The character representation is based on positions of oriented strokes, which are pooled over multiple scales. [Figure 8: the set contains 5580 characters from 90 fonts with no distortions, blurring or rotations.] [sent-187, score-0.471]
68 In our experiments, the training set consists of images with a single black letter on a white background (see Figure 8). [sent-189, score-0.428]
69 In total there were 5580 training samples (62 character classes in 90 different fonts). [sent-190, score-0.393]
70 The value of β represents a trade-off between detecting more characters from fonts not in the training set and producing more false positives. [sent-194, score-0.363]
71 Word Formation. Given a set of character regions R, the regions are agglomerated into a set of text lines T (see Algorithm 1). [sent-199, score-0.815]
72 The partial ordering is induced by the relative position of the regions in the direction of the text line and represents a left-to-right ordering of characters in a word. [sent-202, score-0.812]
73 In other words, the partial ordering is induced by the restriction that a region can only be preceded by regions to the left of and succeeded by regions to the right of the particular region, allowing for a small overlap. [sent-203, score-0.292]
74 To detect words in the image and recognize their content, an optimal sequence is found in each text line (where the order in the sequence is induced by the partial ordering of the text line) by maximizing the objective function L∗(T) = argmax_{∀i, l ∈ L̂(r_i)} … (16)
75 The probability pS models the observation that the spacing between characters does not vary a lot in a single word. [sent-215, score-0.336]
76 We define the difference of spacing of three regions as Δs(r1, r2, r3) = |s12 − s23| / max(s12, s23), with s_ij = r_j^L − r_i^R (17), where r^L and r^R denote the left and right boundary of the region in the orientation of the text, respectively. [sent-216, score-0.16]
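The same quantity as a short sketch; regions are assumed to expose left/right coordinates along the text orientation (an illustrative interface):

```python
def spacing_difference(r1, r2, r3):
    """Delta_s from Eq. (17): relative difference of consecutive gaps,
    with s_ij = r_j.left - r_i.right along the text orientation."""
    s12 = r2.left - r1.right
    s23 = r3.left - r2.right
    return abs(s12 - s23) / max(s12, s23, 1e-9)  # guard against zero gaps
```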
77 Similarly, the probability pP models the observation that positioning of a character triplet is not arbitrary. [sent-218, score-0.453]
78 The probability pA is approximated by relative frequencies of character triplets, which are calculated in the training stage (a list of approx. [sent-227, score-0.5]
79 As a final step, spaces are detected as peaks in the histogram of inter-character spacings to break text lines into words, and overlapping words are eliminated through non-maximum suppression. [sent-230, score-0.388]
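A simplified sketch of the space detection step; the paper detects spaces as peaks in the histogram of inter-character spacings, and the thresholding criterion below is an assumption, not the paper's exact rule:

```python
import numpy as np

def split_into_words(gaps, factor=2.0):
    """Break a text line into words at unusually large inter-character gaps,
    using the dominant mode of the gap histogram as the character-level gap."""
    gaps = np.asarray(gaps, dtype=float)
    if gaps.size < 2:
        return []
    hist, edges = np.histogram(gaps, bins=max(4, gaps.size // 2))
    mode = 0.5 * (edges[np.argmax(hist)] + edges[np.argmax(hist) + 1])
    return [i for i, g in enumerate(gaps) if g > factor * max(mode, 1e-9)]
```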
80 Experiments. The proposed method was evaluated on the ICDAR 2011 Robust Reading competition dataset [14], which contains 1189 words and 6393 letters in 255 images. [sent-232, score-0.142]
81 Using … [Algorithm 1: Data: a set of regions R; Result: a set of text lines T. T ← ∅; R ← R; D ← [−45, −35, …]] [sent-233, score-0.337]
82 …3% in text localization (see Figure 10 for sample outputs). [sent-242, score-0.338]
83 The method achieves significantly better recall (66%) than the winner of the ICDAR 2011 Robust Reading competition (62%) and the recently published Shi's method [15] (63%). [sent-243, score-0.201]
84 Let us note that the ICDAR 2011 competition was held in an open mode where authors supply only the outputs of their methods on a previously published competition dataset. [sent-245, score-0.132]
85 In the end-to-end text recognition, the method achieves the recall of 45. [sent-246, score-0.273]
86 The problems of the method include ambiguities introduced by the fact that a subregion of a character might be another character, failures to detect letters on word boundaries which consist of just one stroke (e. [sent-253, score-0.986]
87 “I”, “l”) and false positives caused by strokes around areas with text (see Figure 11). [sent-255, score-0.656]
88 Comparison with the most recent end-to-end text recognition results on the ICDAR 2011 dataset. [sent-267, score-0.252]
89 Conclusions. An end-to-end real-time text localization and recognition method was presented in the paper. [sent-269, score-0.338]
90 The method introduces a novel approach for character detection and recognition which combines the advantages of sliding-window and connected component methods. [sent-270, score-0.541]
91 Characters are detected and recognized as image regions which contain strokes of specific orientations in a specific relative position, where the strokes are efficiently detected by convolving the image gradient field with a set of oriented bar filters. [sent-271, score-1.271]
92 The characters are selected from an efficiently obtained set of target regions by a nearest-neighbor classifier, which exploits novel character representations based on strokes. [sent-272, score-0.766]
93 On the standard ICDAR 2011 dataset [14], the method achieves state-of-the-art results in both text localization and end-to-end text recognition. [sent-273, score-0.59]
94 Detecting text in natural scenes with stroke width transform. [sent-298, score-0.812]
95 Text detection and localization in complex scene images using constrained adaboost algorithm. [sent-304, score-0.167]
96 [Figure 11: ambiguities introduced by the fact that a subregion of a character might be another character ("nD").] [sent-307, score-0.834]
97 [Figure 11, continued: a failed detection of a letter on a word boundary which consists of just one stroke ("i").] [sent-308, score-0.617]
98 A method for text localization and recognition in real-world images. [sent-349, score-0.338]
99 ICDAR 2011 robust reading competition challenge 2: Reading text in scene images. [sent-369, score-0.456]
100 Scene text detection using graph model built upon maximally stable extremal regions. [sent-377, score-0.318]
wordName wordTfidf (topN-words)
[('stroke', 0.488), ('strokes', 0.404), ('character', 0.393), ('characters', 0.267), ('icdar', 0.258), ('text', 0.252), ('rectangles', 0.111), ('competition', 0.097), ('ri', 0.092), ('localization', 0.086), ('regions', 0.085), ('reading', 0.085), ('connected', 0.084), ('candidate', 0.075), ('width', 0.072), ('neumann', 0.072), ('fonts', 0.072), ('orientations', 0.064), ('ocr', 0.057), ('letter', 0.057), ('interval', 0.055), ('induced', 0.052), ('convolving', 0.051), ('czech', 0.05), ('nk', 0.05), ('subregion', 0.048), ('winner', 0.048), ('gradient', 0.048), ('matas', 0.048), ('admissible', 0.046), ('ridges', 0.046), ('detected', 0.046), ('words', 0.045), ('epstein', 0.044), ('script', 0.043), ('direction', 0.039), ('approximative', 0.039), ('felk', 0.039), ('republic', 0.039), ('region', 0.038), ('convolution', 0.038), ('spacing', 0.037), ('detection', 0.037), ('distortions', 0.037), ('ps', 0.036), ('scales', 0.036), ('bar', 0.036), ('cvut', 0.036), ('responses', 0.035), ('word', 0.035), ('published', 0.035), ('modelled', 0.034), ('induce', 0.034), ('ridge', 0.034), ('cmp', 0.034), ('oriented', 0.033), ('relative', 0.033), ('response', 0.033), ('rotated', 0.033), ('blurring', 0.032), ('probability', 0.032), ('ordering', 0.032), ('pages', 0.031), ('union', 0.031), ('responds', 0.031), ('detects', 0.03), ('projection', 0.029), ('lexicon', 0.029), ('extremal', 0.029), ('pl', 0.028), ('positioning', 0.028), ('component', 0.027), ('recognizes', 0.027), ('pp', 0.027), ('magnitude', 0.026), ('disconnected', 0.026), ('rb', 0.026), ('orders', 0.026), ('pa', 0.025), ('filter', 0.025), ('detecting', 0.024), ('plain', 0.024), ('agency', 0.023), ('classifier', 0.023), ('triplets', 0.023), ('permutation', 0.023), ('synthetic', 0.023), ('calculated', 0.022), ('perpendicular', 0.022), ('adaboost', 0.022), ('consist', 0.022), ('rejected', 0.022), ('scene', 0.022), ('recall', 0.021), ('pooled', 0.021), ('efficiently', 0.021), ('representation', 0.021), ('frequencies', 0.02), ('position', 0.02), ('google', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 376 iccv-2013-Scene Text Localization and Recognition with Oriented Stroke Detection
Author: Lukáš Neumann, Jiri Matas
Abstract: An unconstrained end-to-end text localization and recognition method is presented. The method introduces a novel approach for character detection and recognition which combines the advantages of sliding-window and connected component methods. Characters are detected and recognized as image regions which contain strokes of specific orientations in a specific relative position, where the strokes are efficiently detected by convolving the image gradient field with a set of oriented bar filters. Additionally, a novel character representation efficiently calculated from the values obtained in the stroke detection phase is introduced. The representation is robust to shift at the stroke level, which makes it less sensitive to intra-class variations and the noise induced by normalizing character size and positioning. The effectiveness of the representation is demonstrated by the results achieved in the classification of real-world characters using a Euclidean nearest-neighbor classifier trained on synthetic data in a plain form. The method was evaluated on a standard dataset, where it achieves state-of-the-art results in both text localization and recognition.
2 0.57862502 415 iccv-2013-Text Localization in Natural Images Using Stroke Feature Transform and Text Covariance Descriptors
Author: Weilin Huang, Zhe Lin, Jianchao Yang, Jue Wang
Abstract: In this paper, we present a new approach for text localization in natural images, by discriminating text and non-text regions at three levels: pixel, component and textline levels. Firstly, a powerful low-level filter called the Stroke Feature Transform (SFT) is proposed, which extends the widely-used Stroke Width Transform (SWT) by incorporating color cues of text pixels, leading to significantly enhanced performance on inter-component separation and intra-component connection. Secondly, based on the output of SFT, we apply two classifiers, a text component classifier and a text-line classifier, sequentially to extract text regions, eliminating the heuristic procedures that are commonly used in previous approaches. The two classifiers are built upon two novel Text Covariance Descriptors (TCDs) that encode both the heuristic properties and the statistical characteristics of text strokes. Finally, text regions are located by simply thresholding the text-line confident map. Our method was evaluated on two benchmark datasets: ICDAR 2005 and ICDAR 2011, and the corresponding F-measure values are 0.72 and 0.73, respectively, surpassing previous methods in accuracy by a large margin.
3 0.45496801 210 iccv-2013-Image Retrieval Using Textual Cues
Author: Anand Mishra, Karteek Alahari, C.V. Jawahar
Abstract: We present an approach for the text-to-image retrieval problem based on textual content present in images. Given the recent developments in understanding text in images, an appealing approach to address this problem is to localize and recognize the text, and then query the database, as in a text retrieval problem. We show that such an approach, despite being based on state-of-the-art methods, is insufficient, and propose a method, where we do not rely on an exact localization and recognition pipeline. We take a query-driven search approach, where we find approximate locations of characters in the text query, and then impose spatial constraints to generate a ranked list of images in the database. The retrieval performance is evaluated on public scene text datasets as well as three large datasets, namely IIIT scene text retrieval, Sports-10K and TV series-1M, which we introduce.
4 0.4262712 315 iccv-2013-PhotoOCR: Reading Text in Uncontrolled Conditions
Author: Alessandro Bissacco, Mark Cummins, Yuval Netzer, Hartmut Neven
Abstract: We describe PhotoOCR, a system for text extraction from images. Our particular focus is reliable text extraction from smartphone imagery, with the goal of text recognition as a user input modality similar to speech recognition. Commercially available OCR performs poorly on this task. Recent progress in machine learning has substantially improved isolated character classification; we build on this progress by demonstrating a complete OCR system using these techniques. We also incorporate modern datacenter-scale distributed language modelling. Our approach is capable of recognizing text in a variety of challenging imaging conditions where traditional OCR systems fail, notably in the presence of substantial blur, low resolution, low contrast, high image noise and other distortions. It also operates with low latency; mean processing time is 600 ms per image. We evaluate our system on public benchmark datasets for text extraction and outperform all previously reported results, more than halving the error rate on multiple benchmarks. The system is currently in use in many applications at Google, and is available as a user input modality in Google Translate for Android.
5 0.38833797 345 iccv-2013-Recognizing Text with Perspective Distortion in Natural Scenes
Author: Trung Quy Phan, Palaiahnakote Shivakumara, Shangxuan Tian, Chew Lim Tan
Abstract: This paper presents an approach to text recognition in natural scene images. Unlike most existing works which assume that texts are horizontal and frontal parallel to the image plane, our method is able to recognize perspective texts of arbitrary orientations. For individual character recognition, we adopt a bag-of-keypoints approach, in which Scale Invariant Feature Transform (SIFT) descriptors are extracted densely and quantized using a pre-trained vocabulary. Following [1, 2], the context information is utilized through lexicons. We formulate word recognition as finding the optimal alignment between the set of characters and the list of lexicon words. Furthermore, we introduce a new dataset called StreetViewText-Perspective, which contains texts in street images with a great variety of viewpoints. Experimental results on public datasets and the proposed dataset show that our method significantly outperforms the state-of-the-art on perspective texts of arbitrary orientations.
6 0.21108083 180 iccv-2013-From Where and How to What We See
7 0.12511985 192 iccv-2013-Handwritten Word Spotting with Corrected Attributes
8 0.091454163 253 iccv-2013-Linear Sequence Discriminant Analysis: A Model-Based Dimensionality Reduction Method for Vector Sequences
9 0.089214496 166 iccv-2013-Finding Actors and Actions in Movies
10 0.079671696 44 iccv-2013-Adapting Classification Cascades to New Domains
11 0.061564162 235 iccv-2013-Learning Coupled Feature Spaces for Cross-Modal Matching
12 0.056656707 330 iccv-2013-Proportion Priors for Image Sequence Segmentation
13 0.05462756 327 iccv-2013-Predicting an Object Location Using a Global Image Representation
14 0.049368333 121 iccv-2013-Discriminatively Trained Templates for 3D Object Detection: A Real Time Scalable Approach
15 0.047299359 170 iccv-2013-Fingerspelling Recognition with Semi-Markov Conditional Random Fields
16 0.046099283 197 iccv-2013-Hierarchical Joint Max-Margin Learning of Mid and Top Level Representations for Visual Recognition
17 0.044955805 79 iccv-2013-Coherent Object Detection with 3D Geometric Context from a Single Image
18 0.044836827 277 iccv-2013-Multi-channel Correlation Filters
19 0.044661224 74 iccv-2013-Co-segmentation by Composition
20 0.044146843 377 iccv-2013-Segmentation Driven Object Detection with Fisher Vectors
topicId topicWeight
[(0, 0.144), (1, 0.023), (2, -0.006), (3, -0.074), (4, 0.06), (5, 0.057), (6, 0.036), (7, -0.031), (8, -0.083), (9, -0.036), (10, 0.467), (11, -0.172), (12, 0.198), (13, 0.127), (14, 0.04), (15, 0.141), (16, -0.102), (17, 0.183), (18, -0.315), (19, 0.15), (20, 0.171), (21, 0.152), (22, 0.051), (23, -0.004), (24, 0.032), (25, -0.041), (26, -0.039), (27, 0.065), (28, 0.04), (29, -0.013), (30, -0.012), (31, 0.054), (32, -0.019), (33, 0.037), (34, 0.044), (35, 0.017), (36, 0.044), (37, 0.015), (38, 0.051), (39, -0.025), (40, -0.032), (41, -0.017), (42, 0.081), (43, -0.025), (44, -0.014), (45, 0.013), (46, -0.037), (47, 0.032), (48, 0.009), (49, 0.006)]
simIndex simValue paperId paperTitle
same-paper 1 0.96507269 376 iccv-2013-Scene Text Localization and Recognition with Oriented Stroke Detection
Author: Lukáš Neumann, Jiri Matas
Abstract: An unconstrained end-to-end text localization and recognition method is presented. The method introduces a novel approach for character detection and recognition which combines the advantages of sliding-window and connected component methods. Characters are detected and recognized as image regions which contain strokes of specific orientations in a specific relative position, where the strokes are efficiently detected by convolving the image gradient field with a set of oriented bar filters. Additionally, a novel character representation efficiently calculated from the values obtained in the stroke detection phase is introduced. The representation is robust to shift at the stroke level, which makes it less sensitive to intra-class variations and the noise induced by normalizing character size and positioning. The effectiveness of the representation is demonstrated by the results achieved in the classification of real-world characters using a Euclidean nearest-neighbor classifier trained on synthetic data in a plain form. The method was evaluated on a standard dataset, where it achieves state-of-the-art results in both text localization and recognition.
2 0.91775757 415 iccv-2013-Text Localization in Natural Images Using Stroke Feature Transform and Text Covariance Descriptors
Author: Weilin Huang, Zhe Lin, Jianchao Yang, Jue Wang
Abstract: In this paper, we present a new approach for text localization in natural images, by discriminating text and non-text regions at three levels: pixel, component and textline levels. Firstly, a powerful low-level filter called the Stroke Feature Transform (SFT) is proposed, which extends the widely-used Stroke Width Transform (SWT) by incorporating color cues of text pixels, leading to significantly enhanced performance on inter-component separation and intra-component connection. Secondly, based on the output of SFT, we apply two classifiers, a text component classifier and a text-line classifier, sequentially to extract text regions, eliminating the heuristic procedures that are commonly used in previous approaches. The two classifiers are built upon two novel Text Covariance Descriptors (TCDs) that encode both the heuristic properties and the statistical characteristics of text strokes. Finally, text regions are located by simply thresholding the text-line confident map. Our method was evaluated on two benchmark datasets: ICDAR 2005 and ICDAR 2011, and the corresponding F-measure values are 0.72 and 0.73, respectively, surpassing previous methods in accuracy by a large margin.
3 0.91734666 345 iccv-2013-Recognizing Text with Perspective Distortion in Natural Scenes
Author: Trung Quy Phan, Palaiahnakote Shivakumara, Shangxuan Tian, Chew Lim Tan
Abstract: This paper presents an approach to text recognition in natural scene images. Unlike most existing works which assume that texts are horizontal and frontal parallel to the image plane, our method is able to recognize perspective texts of arbitrary orientations. For individual character recognition, we adopt a bag-of-keypoints approach, in which Scale Invariant Feature Transform (SIFT) descriptors are extracted densely and quantized using a pre-trained vocabulary. Following [1, 2], the context information is utilized through lexicons. We formulate word recognition as finding the optimal alignment between the set of characters and the list of lexicon words. Furthermore, we introduce a new dataset called StreetViewText-Perspective, which contains texts in street images with a great variety of viewpoints. Experimental results on public datasets and the proposed dataset show that our method significantly outperforms the state-of-the-art on perspective texts of arbitrary orientations.
4 0.88918871 315 iccv-2013-PhotoOCR: Reading Text in Uncontrolled Conditions
Author: Alessandro Bissacco, Mark Cummins, Yuval Netzer, Hartmut Neven
Abstract: We describe PhotoOCR, a system for text extraction from images. Our particular focus is reliable text extraction from smartphone imagery, with the goal of text recognition as a user input modality similar to speech recognition. Commercially available OCR performs poorly on this task. Recent progress in machine learning has substantially improved isolated character classification; we build on this progress by demonstrating a complete OCR system using these techniques. We also incorporate modern datacenter-scale distributed language modelling. Our approach is capable of recognizing text in a variety of challenging imaging conditions where traditional OCR systems fail, notably in the presence of substantial blur, low resolution, low contrast, high image noise and other distortions. It also operates with low latency; mean processing time is 600 ms per image. We evaluate our system on public benchmark datasets for text extraction and outperform all previously reported results, more than halving the error rate on multiple benchmarks. The system is currently in use in many applications at Google, and is available as a user input modality in Google Translate for Android.
5 0.79887199 210 iccv-2013-Image Retrieval Using Textual Cues
Author: Anand Mishra, Karteek Alahari, C.V. Jawahar
Abstract: We present an approach for the text-to-image retrieval problem based on textual content present in images. Given the recent developments in understanding text in images, an appealing approach to address this problem is to localize and recognize the text, and then query the database, as in a text retrieval problem. We show that such an approach, despite being based on state-of-the-art methods, is insufficient, and propose a method, where we do not rely on an exact localization and recognition pipeline. We take a query-driven search approach, where we find approximate locations of characters in the text query, and then impose spatial constraints to generate a ranked list of images in the database. The retrieval performance is evaluated on public scene text datasets as well as three large datasets, namely IIIT scene text retrieval, Sports-10K and TV series-1M, which we introduce.
6 0.68702143 180 iccv-2013-From Where and How to What We See
7 0.50521874 192 iccv-2013-Handwritten Word Spotting with Corrected Attributes
9 0.36649853 235 iccv-2013-Learning Coupled Feature Spaces for Cross-Modal Matching
10 0.32407361 170 iccv-2013-Fingerspelling Recognition with Semi-Markov Conditional Random Fields
11 0.22060901 166 iccv-2013-Finding Actors and Actions in Movies
12 0.20819153 277 iccv-2013-Multi-channel Correlation Filters
13 0.20444353 44 iccv-2013-Adapting Classification Cascades to New Domains
14 0.20415995 408 iccv-2013-Super-resolution via Transform-Invariant Group-Sparse Regularization
15 0.19842714 112 iccv-2013-Detecting Irregular Curvilinear Structures in Gray Scale and Color Imagery Using Multi-directional Oriented Flux
16 0.19367541 388 iccv-2013-Shape Index Descriptors Applied to Texture-Based Galaxy Analysis
17 0.18861794 294 iccv-2013-Offline Mobile Instance Retrieval with a Small Memory Footprint
18 0.18454571 73 iccv-2013-Class-Specific Simplex-Latent Dirichlet Allocation for Image Classification
19 0.18074061 55 iccv-2013-Automatic Kronecker Product Model Based Detection of Repeated Patterns in 2D Urban Images
20 0.1752968 278 iccv-2013-Multi-scale Topological Features for Hand Posture Representation and Analysis
topicId topicWeight
[(2, 0.052), (7, 0.12), (12, 0.015), (26, 0.078), (27, 0.023), (31, 0.147), (40, 0.019), (42, 0.099), (48, 0.019), (64, 0.045), (73, 0.039), (78, 0.014), (82, 0.079), (89, 0.132), (98, 0.012)]
simIndex simValue paperId paperTitle
1 0.86923707 415 iccv-2013-Text Localization in Natural Images Using Stroke Feature Transform and Text Covariance Descriptors
Author: Weilin Huang, Zhe Lin, Jianchao Yang, Jue Wang
Abstract: In this paper, we present a new approach for text localization in natural images, by discriminating text and non-text regions at three levels: pixel, component and textline levels. Firstly, a powerful low-level filter called the Stroke Feature Transform (SFT) is proposed, which extends the widely-used Stroke Width Transform (SWT) by incorporating color cues of text pixels, leading to significantly enhanced performance on inter-component separation and intra-component connection. Secondly, based on the output of SFT, we apply two classifiers, a text component classifier and a text-line classifier, sequentially to extract text regions, eliminating the heuristic procedures that are commonly used in previous approaches. The two classifiers are built upon two novel Text Covariance Descriptors (TCDs) that encode both the heuristic properties and the statistical characteristics of text strokes. Finally, text regions are located by simply thresholding the text-line confident map. Our method was evaluated on two benchmark datasets: ICDAR 2005 and ICDAR 2011, and the corresponding F-measure values are 0.72 and 0.73, respectively, surpassing previous methods in accuracy by a large margin.
same-paper 2 0.86129117 376 iccv-2013-Scene Text Localization and Recognition with Oriented Stroke Detection
Author: Lukáš Neumann, Jiri Matas
Abstract: An unconstrained end-to-end text localization and recognition method is presented. The method introduces a novel approach for character detection and recognition which combines the advantages of sliding-window and connected component methods. Characters are detected and recognized as image regions which contain strokes of specific orientations in a specific relative position, where the strokes are efficiently detected by convolving the image gradient field with a set of oriented bar filters. Additionally, a novel character representation efficiently calculated from the values obtained in the stroke detection phase is introduced. The representation is robust to shift at the stroke level, which makes it less sensitive to intra-class variations and the noise induced by normalizing character size and positioning. The effectiveness of the representation is demonstrated by the results achieved in the classification of real-world characters using a Euclidean nearest-neighbor classifier trained on synthetic data in a plain form. The method was evaluated on a standard dataset, where it achieves state-of-the-art results in both text localization and recognition.
3 0.85219997 357 iccv-2013-Robust Matrix Factorization with Unknown Noise
Author: Deyu Meng, Fernando De_La_Torre
Abstract: Many problems in computer vision can be posed as recovering a low-dimensional subspace from highdimensional visual data. Factorization approaches to lowrank subspace estimation minimize a loss function between an observed measurement matrix and a bilinear factorization. Most popular loss functions include the L2 and L1 losses. L2 is optimal for Gaussian noise, while L1 is for Laplacian distributed noise. However, real data is often corrupted by an unknown noise distribution, which is unlikely to be purely Gaussian or Laplacian. To address this problem, this paper proposes a low-rank matrix factorization problem with a Mixture of Gaussians (MoG) noise model. The MoG model is a universal approximator for any continuous distribution, and hence is able to model a wider range of noise distributions. The parameters of the MoG model can be estimated with a maximum likelihood method, while the subspace is computed with standard approaches. We illustrate the benefits of our approach in extensive syn- thetic and real-world experiments including structure from motion, face modeling and background subtraction.
4 0.85184526 38 iccv-2013-Action Recognition with Actons
Author: Jun Zhu, Baoyuan Wang, Xiaokang Yang, Wenjun Zhang, Zhuowen Tu
Abstract: With the improved accessibility to an exploding amount of video data and growing demands in a wide range of video analysis applications, video-based action recognition/classification becomes an increasingly important task in computer vision. In this paper, we propose a two-layer structure for action recognition to automatically exploit a mid-level "acton" representation. The weakly-supervised actons are learned via a new max-margin multi-channel multiple instance learning framework, which can capture multiple mid-level action concepts simultaneously. The learned actons (with no requirement for detailed manual annotations) observe the properties of being compact, informative, discriminative, and easy to scale. The experimental results demonstrate the effectiveness of applying the learned actons in our two-layer structure, and show the state-of-the-art recognition performance on two challenging action datasets, i.e., Youtube and HMDB51.
5 0.83870494 408 iccv-2013-Super-resolution via Transform-Invariant Group-Sparse Regularization
Author: Carlos Fernandez-Granda, Emmanuel J. Candès
Abstract: We present a framework to super-resolve planar regions found in urban scenes and other man-made environments by taking into account their 3D geometry. Such regions have highly structured straight edges, but this prior is challenging to exploit due to deformations induced by the projection onto the imaging plane. Our method factors out such deformations by using recently developed tools based on convex optimization to learn a transform that maps the image to a domain where its gradient has a simple group-sparse structure. This allows to obtain a novel convex regularizer that enforces global consistency constraints between the edges of the image. Computational experiments with real images show that this data-driven approach to the design of regularizers promoting transform-invariant group sparsity is very effective at high super-resolution factors. We view our approach as complementary to most recent superresolution methods, which tend to focus on hallucinating high-frequency textures.
6 0.83513665 72 iccv-2013-Characterizing Layouts of Outdoor Scenes Using Spatial Topic Processes
7 0.83477151 345 iccv-2013-Recognizing Text with Perspective Distortion in Natural Scenes
8 0.82390457 178 iccv-2013-From Semi-supervised to Transfer Counting of Crowds
10 0.80363929 323 iccv-2013-Pose Estimation with Unknown Focal Length Using Points, Directions and Lines
11 0.80339253 180 iccv-2013-From Where and How to What We See
13 0.80326021 137 iccv-2013-Efficient Salient Region Detection with Soft Image Abstraction
14 0.79963481 275 iccv-2013-Motion-Aware KNN Laplacian for Video Matting
15 0.7980783 315 iccv-2013-PhotoOCR: Reading Text in Uncontrolled Conditions
16 0.79587442 73 iccv-2013-Class-Specific Simplex-Latent Dirichlet Allocation for Image Classification
17 0.78895581 269 iccv-2013-Modeling Occlusion by Discriminative AND-OR Structures
18 0.77848798 427 iccv-2013-Transfer Feature Learning with Joint Distribution Adaptation
19 0.7763949 328 iccv-2013-Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation
20 0.77384013 173 iccv-2013-Fluttering Pattern Generation Using Modified Legendre Sequence for Coded Exposure Imaging