iccv iccv2013 iccv2013-447 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jonathan T. Barron, Mark D. Biggin, Pablo Arbeláez, David W. Knowles, Soile V.E. Keranen, Jitendra Malik
Abstract: We present an algorithm for the per-voxel semantic segmentation of a three-dimensional volume. At the core of our algorithm is a novel “pyramid context” feature, a descriptive representation designed such that exact per-voxel linear classification can be made extremely efficient. This feature not only allows for efficient semantic segmentation but enables other aspects of our algorithm, such as novel learned features and a stacked architecture that can reason about self-consistency. We demonstrate our technique on 3D fluorescence microscopy data of Drosophila embryos for which we are able to produce extremely accurate semantic segmentations in a matter of minutes, and for which other algorithms fail due to the size and high-dimensionality of the data, or due to the difficulty of the task.
Reference: text
sentIndex sentText sentNum sentScore
1 This feature not only allows for efficient semantic segmentation but enables other aspects of our algorithm, such as novel learned features and a stacked architecture that can reason about self-consistency. [sent-10, score-0.316]
2 Introduction Consider Figure 1(a), which shows slices from a volumetric image of a fruit fly embryo in its late stages of development, acquired with 3D fluorescence microscopy. [sent-13, score-0.368]
3 From a computer vision perspective, the problem at hand is that of volumetric semantic segmentation, in which we must predict a tissue label for each voxel in a volume. [sent-17, score-0.835]
4 In this paper, we present an extremely accurate and efficient algorithm for volumetric semantic segmentation, based on a novel feature type called the “pyramid context”. [sent-18, score-0.365]
5 The state-of-the-art in semantic segmentation on 2D images is represented by the leading techniques on the PASCAL VOC challenge [14]. [sent-21, score-0.223]
6 Given training annotations of 8 tissues or organs from a biologist, such as in 1(b), we can produce a per-voxel prediction of each tissue from a new (test-set) volume in a matter of minutes, as shown in 1(c). [sent-33, score-0.998]
7 Our first layer takes as input 4 feature types computed from the input volume (top row, position features are not shown) to produce a per-voxel prediction. [sent-38, score-0.548]
8 This output is fed to a second layer, which computes the same types of features from that per-voxel prediction, and uses the first-layer features with the new second-layer features (bottom row) to produce a new prediction. [sent-39, score-0.244]
9 1 for an explanation of the different feature channels shown here. [sent-42, score-0.291]
10 The handful of volumetric segmentation techniques which do exist are restricted to the specific task of connectomics with Electron Microscopy [1, 25, 26]. [sent-45, score-0.283]
11 Because existing techniques are insufficient, we must construct a novel semantic segmentation algorithm. [sent-46, score-0.223]
12 We will address the problem as one of evaluating a classifier at every voxel in a volume. [sent-47, score-0.249]
13 A key property of this feature is that by design, the dense evaluation of a linear classifier on pyramid context features is extremely efficient. [sent-50, score-0.829]
14 To create a semantic-segmentation algorithm, we will construct these pyramid context features using oriented edge information (as in HOG [12] or SIFT [21]) and also learned “codebook”-like features (as in bag-of-words models [18]). [sent-51, score-0.689]
15 We can then stack these pyramid context layers into a multilayer architecture which allows our model to reason about context and self-consistency. [sent-52, score-0.79]
16 Our model is fast: evaluation of a volume takes a matter of minutes, while the time taken by a biologist to fully annotate an embryo is often on the order of hours, and the time taken by existing computer vision techniques is on the order of days. [sent-57, score-0.475]
17 The pyramid context is similar to the shape context feature [3], geometric blur [4, 5], or DAISY features [23]: all serve to pool information around a location in a log-polar arrangement (Figure 3). [sent-61, score-0.947]
18 The key insight behind our pyramid context feature is that there exist two equivalent “views” of the feature: it can be viewed as a Haar-like [Figure 3 panels: (a) Input Signal, (b) Shape Context [3], (c) Geometric Blur [4, 5] / DAISY [23], (d) Pyramid Context, (e) Pyramid Context] [sent-62, score-0.650]
19 Given an input signal and a location (3(a)) we can pool local information in a retina-like fashion to construct a feature, such as shape context (3(b)) or geometric blur / DAISY (3(c)). [sent-63, score-0.27]
20 pooling of signals at different scales (Figure 3(d)) or as a series of interpolations into a Gaussian pyramid of a signal (Figure 3(e)). [sent-66, score-0.516]
21 Because of this, we can evaluate a linear classifier on top of pyramid context features at every voxel in a volume extremely efficiently, using simple pyramid operations and convolutions with very small kernels. [sent-67, score-1.696]
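The equivalence of the two views can be checked numerically. The following is a minimal one-dimensional sketch, not the authors' implementation: the function names, the [1, 2, 1] / 4 blur kernel, the three-tap per-level context, and the clamped borders are all assumptions made for illustration. It compares scoring every position by explicitly interpolating into the pyramid against filtering each level with a tiny kernel and then upsampling and summing.

```python
import math

def build_pyramid(signal, levels):
    """Level 0 is the signal itself; each further level is blurred and halved."""
    pyramid = [list(signal)]
    for _ in range(levels - 1):
        prev = pyramid[-1]
        last = len(prev) - 1
        # Assumed blur: [1, 2, 1] / 4 with clamped borders, then keep every other sample.
        blurred = [(prev[max(i - 1, 0)] + 2 * prev[i] + prev[min(i + 1, last)]) / 4.0
                   for i in range(len(prev))]
        pyramid.append(blurred[::2])
    return pyramid

def at(level, n):
    """Read a level at integer index n, clamping at the borders."""
    return level[min(max(n, 0), len(level) - 1)]

def interp(read, u):
    """Linear interpolation at real position u, given a reader for integer indices."""
    f = math.floor(u)
    t = u - f
    return (1 - t) * read(f) + t * read(f + 1)

def direct_scores(pyramid, weights, length):
    """Slow view: build the context feature at every position and take a dot product."""
    scores = []
    for x in range(length):
        score = 0.0
        for k, level in enumerate(pyramid):
            for i in (-1, 0, 1):
                score += weights[k][i + 1] * interp(lambda n: at(level, n + i), x / 2 ** k)
        scores.append(score)
    return scores

def pyramid_filter_scores(pyramid, weights, length):
    """Fast view: filter each level with its 3-tap weights, then upsample and sum."""
    total = [0.0] * length
    for k, level in enumerate(pyramid):
        # One extra sample so interpolation at the right border stays in range.
        filtered = [sum(weights[k][i + 1] * at(level, n + i) for i in (-1, 0, 1))
                    for n in range(len(level) + 1)]
        for x in range(length):
            total[x] += interp(filtered.__getitem__, x / 2 ** k)
    return total
```

Under these assumptions the two functions agree up to floating-point rounding, because linear interpolation is itself linear and commutes with integer shifts. The fast version filters each coarse level once instead of interpolating into it once per tap at every position.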
22 Now let P(V ) be a K-level Gaussian pyramid of V , such that Pk (V ) is the k-th level of the pyramid (P1(V ) = V ). [sent-80, score-0.882]
23 A pyramid context feature is the concatenation of our simple “context” features at every scale of the pyramid: C(V, x, y, z) = [ c ? [sent-81, score-0.699]
24 To classify every voxel in a volume, we must compute ? [sent-94, score-0.207]
25 extremely large (15 million voxels), and the corresponding features for each voxel are hard to compute: each requires hundreds of trilinear interpolation operations into a pyramid. [sent-99, score-0.457]
26 Once we have a filtered Gaussian pyramid, we collapse the pyramid by upsampling each scale to the size of the volume, and summing the upsampled scales. [sent-106, score-0.441]
27 that the pyramid collapse at the end of each pyramid filtering [sent-113, score-0.916]
28 is linear, and so we can sum up the filtered pyramids and then collapse the summed pyramid only once. [sent-114, score-0.504]
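The linearity argument can be illustrated with a small one-dimensional sketch. The helper names and the clamped linear upsampling are assumptions for illustration only. Collapsing the sum of two filtered pyramids gives the same result as collapsing each and adding the results, so with many channels the upsampling cost is paid once.

```python
import math

def upsample(level, length, scale):
    """Linearly interpolate a coarse level back to `length` samples (borders clamped)."""
    last = len(level) - 1
    out = []
    for x in range(length):
        u = x / scale
        f = math.floor(u)
        t = u - f
        out.append((1 - t) * level[min(f, last)] + t * level[min(f + 1, last)])
    return out

def collapse(pyramid, length):
    """Upsample every level to full resolution and sum them."""
    total = [0.0] * length
    for k, level in enumerate(pyramid):
        up = upsample(level, length, 2 ** k)
        total = [a + b for a, b in zip(total, up)]
    return total

def add_pyramids(p, q):
    """Add two pyramids level by level (they must have matching shapes)."""
    return [[a + b for a, b in zip(lp, lq)] for lp, lq in zip(p, q)]
```

For C feature channels this replaces C collapses with C - 1 cheap level-wise additions and a single collapse.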
29 In short, only pyramid filtering can run efficiently (or, at all) on the volumetric data we are investigating: naive alternatives either take over 1. [sent-117, score-0.657]
30 Analytically, we show through complexity analysis that pyramid [...] as sliding-window, though [...] improvement in practice because [...] generally fast for non-algorithmic [...] optimized code, etc). [sent-119, score-0.441]
31 Semantic Segmentation Algorithm We will now build upon our novel feature descriptor and its corresponding efficient classification technique to construct a volumetric semantic-segmentation algorithm, as shown in Figure 2. [sent-121, score-0.297]
32 1 we will present three kinds of feature channels for use as input to our model, some of which are themselves built upon pyramid context features. [sent-123, score-0.913]
33 2 to build a two-layer model which uses contextual information, again by exploiting our pyramid context features. [sent-129, score-0.629]
34 To compute our “fixed” features we take our volume V , compute a Gaussian pyramid P(V ), convolve each level by a set of filters, half-wave rectify the output [22], and concatenate the channels together. [sent-136, score-1.02]
35 For each filter f, we convolve each pyramid level Pk (V ) with that filter, and produce the following two channels: max(0, Pk(V ) ∗ f), max(0, −(Pk(V ) ∗ f)) (3) giving us a total of 26 channels. [sent-138, score-0.586]
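Equation (3) splits each filter response into its positive and negative parts. A minimal one-dimensional sketch follows; the `correlate` helper, the clamped borders, and the derivative kernel in the usage note are illustrative assumptions, not the paper's filter bank.

```python
def correlate(signal, kernel):
    """Correlate a 1D signal with an odd-length kernel, clamping at the borders."""
    half = len(kernel) // 2
    last = len(signal) - 1
    return [sum(kernel[j] * signal[min(max(i + j - half, 0), last)]
                for j in range(len(kernel)))
            for i in range(len(signal))]

def half_wave_rectify(response):
    """Return the two channels max(0, r) and max(0, -r) of a filter response."""
    positive = [max(0.0, r) for r in response]
    negative = [max(0.0, -r) for r in response]
    return positive, negative
```

For example, with `signal = [0, 0, 1, 1, 0, 0]` and the kernel `[-1, 0, 1]`, the positive channel responds to the rising edge and the negative channel to the falling edge, and `positive - negative` recovers the original response, so no information is lost by the split.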
36 Examples of our “fixed” channels can be seen in Figure 2. [sent-139, score-0.232]
37 It is difficult to use intuition to hand-design appropriate features, especially in unexplored domains such as our volumetric fluorescence data, so we will use semi-supervised feature learning to learn our second set of “adaptive” feature channels. [sent-142, score-0.339]
38 Filtering volumes with the medium-sized filters commonly used in feature learning experiments (9 × 9, 14 × 14, etc) is intractable, and such filters have too small a spatial support to provide useful information regarding context or morphology. [sent-145, score-0.358]
39 We will therefore use our pyramid context features as a substrate for feature learning: we will extract pyramid context features from the raw volume, learn a set of filters for those features, and then pyramid filter the volume according to those learned filters. [sent-146, score-2.172]
40 This works significantly better due to half-wave rectification being applied to the pyramid rather than the volume. [sent-149, score-0.441]
41 {b}, with which we can compute our feature channels {F} as follows: F(j) = max(0, (V ⊗ f_j) + b_j) (4) where ⊗ is pyramid filtering, as described earlier. [sent-150, score-0.732]
42 We take a semi-supervised approach when learning features: for each tissue, we learn a different set of filters using only locations within 10 voxels of the tissue of interest. [sent-153, score-0.541]
43 Examples of the channels we learn can be seen in Figure 2. [sent-154, score-0.232]
44 Note that our “adaptive” channels describe fundamentally different properties than our “fixed” channels. [sent-155, score-0.232]
45 Our fixed channels describe the local distribution of a volume at a given location, orientation, and scale, while our adaptive channels describe the local distribution of pyramid context features at a given position, and as such they can describe non-local phenomena. [sent-156, score-1.343]
46 An adaptive channel may learn to activate at voxels which are slightly to the left of some mass at a fine scale and distantly to the right of a much larger mass at a coarse scale, for example. [sent-157, score-0.273]
47 With our one “raw” channel, our 26 “fixed” channels, and our 26 “adaptive” channels, we can construct a feature vector for a voxel by computing pyramid context features for each channel at that voxel’s location and concatenating those pyramid context vectors together (See Figure 2). [sent-158, score-1.557]
48 Position Features Our imagery has been rotated to the “canonical” orientation used by the Drosophila community (see Figure 1(b)), and all volumes have been roughly registered to each other, which means that the absolute position of a voxel is informative. [sent-162, score-0.381]
49 Our feature vector for a voxel’s position is an embedding of the voxel’s (x, y, z) position into a multiscale trilinear spline basis. [sent-163, score-0.367]
50 That is, we use trilinear interpolation to embed each voxel’s position into a 3D lattice of control points, and we do this at multiple scales. [sent-164, score-0.217]
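As a rough illustration of such an embedding, the sketch below computes the trilinear weights of a voxel position with respect to the eight surrounding control points of a lattice, and repeats this for several lattice spacings. The function names, the dictionary representation, and the spacings are hypothetical choices, not taken from the paper.

```python
import math

def trilinear_weights(x, y, z, spacing):
    """Return {(i, j, k): weight} for the 8 control points surrounding (x, y, z)."""
    gx, gy, gz = x / spacing, y / spacing, z / spacing
    ix, iy, iz = math.floor(gx), math.floor(gy), math.floor(gz)
    tx, ty, tz = gx - ix, gy - iy, gz - iz
    weights = {}
    for dx, wx in ((0, 1 - tx), (1, tx)):
        for dy, wy in ((0, 1 - ty), (1, ty)):
            for dz, wz in ((0, 1 - tz), (1, tz)):
                weights[(ix + dx, iy + dy, iz + dz)] = wx * wy * wz
    return weights

def multiscale_position_features(x, y, z, spacings):
    """Sparse position feature: one set of 8 weights per lattice scale."""
    features = {}
    for s, spacing in enumerate(spacings):
        for (i, j, k), w in trilinear_weights(x, y, z, spacing).items():
            features[(s, i, j, k)] = w
    return features
```

Each scale contributes at most eight nonzero entries that sum to one, so the feature is very sparse. The dot product of this feature with a vector of learned lattice weights is the trilinear interpolation of those weights at the voxel, which is consistent with evaluating the position term by interpolating a lattice of weights.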
51 features for training, we construct these sparse position feature vectors using trilinear interpolation. [sent-168, score-0.231]
52 1 with these position features) we can evaluate the position part of the classifier by reshaping the weights into our multiscale lattice, [Figure 4] [sent-170, score-0.268]
53 Because our volumes are in a canonical frame of reference, the absolute position of a voxel is informative. [sent-171, score-0.381]
54 We then have the weights that our model learns for position for that tissue shown as a multiscale lattice (4(b)) and flattened to a single-scale volume (4(c)). [sent-173, score-0.784]
55 Our multiscale representation allows our model to learn broad trends about position in coarse scales (such that the tissue is unlikely to occur at the top of the volume) while still learning fine-scale trends (like the shape of the tissue at the bottom of the volume). [sent-174, score-1.001]
56 and collapsing that pyramid to be the same size as the input volume. [sent-176, score-0.506]
57 This can be pre-computed, making evaluating this part of classification extremely fast: the collapsed pyramid of weights is just a per-voxel “bias”. [sent-177, score-0.597]
58 See Figure 4 for a visualization of a pyramid of learned weights for position, and of that pyramid collapsed to a volume. [sent-178, score-0.95]
59 This prediction is noisy, as we classify each voxel in isolation. [sent-184, score-0.268]
60 We therefore construct a “two-layer” model which uses the prediction of the “single-layer” model to reason about the relative arrangement of the tissue, thereby adding information about context and self-consistency. [sent-185, score-0.262]
61 We then learn a two-layer model which uses as its feature channels both the channels used in the first layer, and these new features built on the output of the first layer. [sent-188, score-0.613]
62 Post-Processing Though our classification model can reason about context and self-consistency, its per-voxel predictions are still often noisy and incomplete at a fine scale. [sent-197, score-0.241]
63 We would like to smooth our predictions while still respecting intensity discontinuities in the raw input volume: that is, we want to smooth within tissue boundaries, but not across tissue boundaries. [sent-199, score-1.254]
64 The intensity bins are determined by the intensity of the raw volume while the quantity being filtered is the probability, hence the “joint” aspect of the bilateral filter. [sent-203, score-0.445]
65 We then blur the 4D grid by convolving it with a 5-tap binomial filter in the three “position” dimensions and a 3-tap binomial filter in the “intensity” dimension. [sent-204, score-0.277]
66 We then resample (or “slice”) the smoothed 4D grid according to the linearly-interpolated volume intensity to produce a smoothed 3D volume. [sent-205, score-0.383]
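The splat, blur, and slice steps can be sketched on a one-dimensional signal, where the grid is 2D (position × intensity) rather than 4D. This is a simplified illustration under stated assumptions: nearest-bin splatting, no spatial downsampling, intensities in [0, 1], and hypothetical function names. Only the 5-tap and 3-tap binomial kernels and the linear interpolation in intensity when slicing follow the description above.

```python
def blur_axis(grid, kernel, axis):
    """Convolve a 2D list-of-lists along one axis; cells outside the grid count as zero."""
    rows, cols = len(grid), len(grid[0])
    half = len(kernel) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for j, w in enumerate(kernel):
                rr = r + (j - half if axis == 0 else 0)
                cc = c + (j - half if axis == 1 else 0)
                if 0 <= rr < rows and 0 <= cc < cols:
                    acc += w * grid[rr][cc]
            out[r][c] = acc
    return out

def joint_bilateral_1d(prob, intensity, bins=8):
    """Smooth `prob` along position without mixing samples of very different intensity."""
    n = len(prob)
    value = [[0.0] * bins for _ in range(n)]
    weight = [[0.0] * bins for _ in range(n)]
    # Splat: accumulate each probability (and a count) in its nearest intensity bin.
    for x in range(n):
        b = round(intensity[x] * (bins - 1))
        value[x][b] += prob[x]
        weight[x][b] += 1.0
    # Blur: 5-tap binomial along position, 3-tap binomial along intensity.
    pos_kernel = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]
    int_kernel = [1 / 4, 2 / 4, 1 / 4]
    for kernel, axis in ((pos_kernel, 0), (int_kernel, 1)):
        value = blur_axis(value, kernel, axis)
        weight = blur_axis(weight, kernel, axis)
    # Slice: read the grid back at each sample's own intensity, then normalize.
    out = []
    for x in range(n):
        pos = intensity[x] * (bins - 1)
        lo = min(int(pos), bins - 2)
        t = pos - lo
        v = (1 - t) * value[x][lo] + t * value[x][lo + 1]
        w = (1 - t) * weight[x][lo] + t * weight[x][lo + 1]
        out.append(v / w if w > 0 else prob[x])
    return out
```

With a dark region next to a bright region, samples on either side of the intensity edge land in distant bins, so the blur averages probabilities within each region but not across the edge.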
67 This is probably because most tissues are usually so distant from the other tissues that the pairwise potentials have little effect. [sent-210, score-0.324]
68 In 5(a) we have a cropped slice of an input volume, for which we have a ground-truth annotation of a tissue in 5(b). [sent-212, score-0.547]
69 Our model produces the prediction in 5(c), which is often noisy and incomplete, so we use joint-bilateral smoothing to produce the smoothed prediction in 5(d), which propagates label information across the volume while respecting cell-boundaries. [sent-213, score-0.452]
70 Training For each tissue we train a binary classifier using logistic regression, which we found to work as well as a linear SVM while having the benefits of being interpretable as probabilities and of introducing a non-linearity, which is important for our “two-layer” models. [sent-216, score-0.452]
71 9, or a voxel labeled false with a probability greater than 0. [sent-218, score-0.207]
72 For our two-layer architecture, we do cross-validation on the training set, produce cross-validated predictions, produce features from those, concatenate those second-layer channels with our first-layer channels (and position), and then train on both with bootstrapping. [sent-221, score-0.666]
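The cross-validation step exists so that the second layer is trained on first-layer predictions made by models that never saw the corresponding training volume. A generic sketch follows; the `train` and `predict` callables, the round-robin fold assignment, and the function name are placeholders for illustration, not the authors' code.

```python
def out_of_fold_predictions(samples, labels, folds, train, predict):
    """Predict every sample with a model trained only on the other folds."""
    n = len(samples)
    predictions = [None] * n
    for fold in range(folds):
        # Assumed fold assignment: sample i belongs to fold i % folds.
        held_out = [i for i in range(n) if i % folds == fold]
        kept = [i for i in range(n) if i % folds != fold]
        model = train([samples[i] for i in kept], [labels[i] for i in kept])
        for i in held_out:
            predictions[i] = predict(model, samples[i])
    return predictions
```

In the setting described above, each sample would presumably be a training volume and each prediction a per-voxel probability volume, from which the second-layer channels are then computed and concatenated with the first-layer channels.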
73 Experiments We demonstrate our semantic segmentation algorithm on fluorescence volumes of late-stage Drosophila embryogenesis. [sent-226, score-0.325]
74 As one baseline we present an “oracle” segmentation technique: we use standard watershed segmentation techniques (threshold the volume, compute the distance transform, then compute the watershed transform) on the input volume to produce an oversegmentation of 10-25 thousand “supervoxels”. [sent-232, score-0.642]
75 This oracle technique gives us an upper-bound on the performance we should expect from super-voxel based semantic-segmentation techniques. [sent-234, score-0.217]
76 Because this prediction is produced by registering tissue annotations instead of actual tissues, this oracle technique serves as an upper bound on the performance we should expect from (affine) registration-based or correspondence-based techniques such as [15]. [sent-238, score-0.752]
77 This oracle performs poorly, due to the heavy variation in each tissue and the fine-grained detail of cellular boundaries. [sent-239, score-0.558]
78 Standard sliding-window detection with this 3D HOG feature is only tractable because of the severe pooling used in constructing the features — instead of 15 million voxels, we need only classify a quarter-million HOG features. [sent-242, score-0.208]
79 Our other baselines are ablations of our technique, many of which are actually extremely similar to preexisting techniques. [sent-246, score-0.25]
80 That same model is also similar to DAISY features [23], but again made tractable using pyramid filtering. [sent-249, score-0.529]
81 This comparison of our ablations to past techniques is generous, as pyramid context features and pyramid filtering are required to make all of these models tractable in our domain. [sent-251, score-1.423]
82 In the first column we have the portion of the input volume containing the tissue, and in the second we have the ground-truth annotation of that tissue. [sent-254, score-0.277]
83 The other columns are the output of various models, the first being an improved HOG baseline, the last being our complete model, and the others being notable ablations of our model (some of which resemble optimized and improved versions of other techniques). [sent-255, score-0.275]
84 Model names are as follows: (1) is our “oracle” segmentation technique, (2) is our “oracle” exemplar warping technique, (3) is our HOG baseline, and (4) is (3) where features have been augmented with our position features. [sent-260, score-0.219]
85 (5)-(11) are single-layer models, where the model name indicates what features have been included: ‘R’ is the “raw” feature channel, ‘F’ is our “fixed” feature channels, ‘A’ is our “adaptive” feature channels, and ‘P’ is our position features. [sent-262, score-0.316]
86 In models (12)-(14) we set K (the number of levels in our Gaussian pyramids) to small values, to show the value of the coarse scales of our pyramid context features (in all other experiments, K = 6). [sent-263, score-0.685]
87 Our position-only baseline shows that, even though our volumes are registered to each other, position information is not sufficient to solve this problem. [sent-272, score-0.206]
88 Our ablations which resemble shape context and geometric blur features underperform our complete model, presumably because their input feature channels are impoverished. [sent-273, score-0.802]
89 Both our “fixed” and “adaptive” feature channels improve performance, and so seem to contribute useful and complementary information. [sent-274, score-0.291]
90 ablations of our model (5-19, some of which resemble past techniques), a baseline technique adapted to volumetric data (3-4), one “oracle” technique based on oversegmentation (1), and another “oracle” based on exemplar-based registration. [sent-276, score-0.573]
91 On the left we have the hardest tissue in our dataset (the one for which our model and the baselines perform worst) and on the right we have the easiest. [sent-281, score-0.41]
92 ablations in which our pyramid depths are limited perform poorly, as they are deprived of contextual information. [sent-283, score-0.641]
93 Conclusion We have presented an algorithm for per-voxel semantic segmentation, demonstrated on 3D fluorescence microscopy data of Drosophila embryos. [sent-286, score-0.212]
94 The size and high-dimensionality of our data render most existing techniques intractable or inaccurate, while our technique produces very accurate per-voxel segmentations extremely efficiently: hundreds of times faster than existing techniques. [sent-287, score-0.287]
95 At the core of our algorithm is our novel pyramid context feature, which is not only a powerful descriptive representation, but is designed such that exact per-voxel linear classification can be made extremely efficient. [sent-288, score-0.74]
96 For our semantic segmentation algorithm, we have introduced three feature types: a standard feature set that uses oriented edge information, a novel feature set produced by applying feature-learning to pyramid context features, and a feature which encodes absolute position information. [sent-290, score-1.076]
97 By learning classifiers on top of pyramid context features based on these channels we can produce per-voxel segmentations, which can be improved with contextual information by “stacking” our models and using the output of one layer as input into the next. [sent-291, score-1.077]
98 By efficiently and accurately producing semantic segmentations of tissues from volumetric data, we enable real, breakthrough biological research at a large scale. [sent-294, score-0.444]
99 A quantitative spatiotemporal atlas of gene expression in the drosophila blastoderm. [sent-425, score-0.267]
100 Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. [sent-455, score-0.441]
wordName wordTfidf (topN-words)
[('pyramid', 0.441), ('tissue', 0.41), ('channels', 0.232), ('voxel', 0.207), ('volume', 0.193), ('drosophila', 0.171), ('ablations', 0.162), ('tissues', 0.162), ('context', 0.15), ('oracle', 0.148), ('volumetric', 0.139), ('embryo', 0.114), ('position', 0.09), ('extremely', 0.088), ('volumes', 0.084), ('trilinear', 0.082), ('pervoxel', 0.082), ('fluorescence', 0.082), ('daisy', 0.082), ('raw', 0.081), ('segmentation', 0.08), ('semantic', 0.079), ('filtering', 0.077), ('pk', 0.076), ('resemble', 0.072), ('biologist', 0.07), ('technique', 0.069), ('voxels', 0.066), ('filters', 0.065), ('techniques', 0.064), ('atlas', 0.063), ('binomial', 0.062), ('predictions', 0.061), ('prediction', 0.061), ('channel', 0.06), ('feature', 0.059), ('produce', 0.056), ('gk', 0.054), ('convolutions', 0.054), ('annotation', 0.053), ('slice', 0.053), ('filter', 0.053), ('microscopy', 0.051), ('arrangement', 0.051), ('architecture', 0.049), ('features', 0.049), ('smoothed', 0.049), ('blur', 0.047), ('multiscale', 0.046), ('intractably', 0.046), ('adaptive', 0.046), ('coarse', 0.045), ('lattice', 0.045), ('classifier', 0.042), ('signal', 0.042), ('output', 0.041), ('twolayer', 0.041), ('hog', 0.041), ('layer', 0.039), ('tractable', 0.039), ('bilateral', 0.039), ('collapsed', 0.038), ('watershed', 0.038), ('wk', 0.038), ('contextual', 0.038), ('intensity', 0.036), ('convolve', 0.036), ('supplementary', 0.035), ('collapse', 0.034), ('supervoxel', 0.034), ('collapsing', 0.034), ('experimentation', 0.034), ('ker', 0.034), ('matter', 0.034), ('intractable', 0.033), ('segmentations', 0.033), ('gene', 0.033), ('slices', 0.033), ('pooling', 0.033), ('baseline', 0.032), ('respecting', 0.032), ('producing', 0.031), ('descriptive', 0.031), ('input', 0.031), ('operations', 0.031), ('bins', 0.031), ('aps', 0.031), ('classification', 0.03), ('naively', 0.03), ('oversegmentation', 0.03), ('visualization', 0.03), ('filtered', 0.029), ('arbel', 0.029), ('visualizations', 0.029), ('constructing', 0.028), 
('carreira', 0.028), ('vv', 0.028), ('hours', 0.028), ('rectify', 0.028), ('mass', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000008 447 iccv-2013-Volumetric Semantic Segmentation Using Pyramid Context Features
Author: Jonathan T. Barron, Mark D. Biggin, Pablo Arbeláez, David W. Knowles, Soile V.E. Keranen, Jitendra Malik
Abstract: We present an algorithm for the per-voxel semantic segmentation of a three-dimensional volume. At the core of our algorithm is a novel “pyramid context” feature, a descriptive representation designed such that exact per-voxel linear classification can be made extremely efficient. This feature not only allows for efficient semantic segmentation but enables other aspects of our algorithm, such as novel learned features and a stacked architecture that can reason about self-consistency. We demonstrate our technique on 3D fluorescence microscopy data of Drosophila embryos for which we are able to produce extremely accurate semantic segmentations in a matter of minutes, and for which other algorithms fail due to the size and high-dimensionality of the data, or due to the difficulty of the task.
Author: Hang Chang, Yin Zhou, Paul Spellman, Bahram Parvin
Abstract: Image-based classification of histology sections, in terms of distinct components (e.g., tumor, stroma, normal), provides a series of indices for tumor composition. Furthermore, aggregation of these indices, from each whole slide image (WSI) in a large cohort, can provide predictive models of the clinical outcome. However, performance of the existing techniques is hindered as a result of large technical variations and biological heterogeneities that are always present in a large cohort. We propose a system that automatically learns a series of basis functions for representing the underlying spatial distribution using stacked predictive sparse decomposition (PSD). The learned representation is then fed into the spatial pyramid matching framework (SPM) with a linear SVM classifier. The system has been evaluated for classification of (a) distinct histological components for two cohorts of tumor types, and (b) colony organization of normal and malignant cell lines in 3D cell culture models. Throughput has been increased through the utility of graphical processing unit (GPU), and evaluation indicates superior performance compared with previous research.
3 0.17085178 2 iccv-2013-3D Scene Understanding by Voxel-CRF
Author: Byung-Soo Kim, Pushmeet Kohli, Silvio Savarese
Abstract: Scene understanding is an important yet very challenging problem in computer vision. In the past few years, researchers have taken advantage of the recent diffusion of depth-RGB (RGB-D) cameras to help simplify the problem of inferring scene semantics. However, while the added 3D geometry is certainly useful to segment out objects with different depth values, it also adds complications in that the 3D geometry is often incorrect because of noisy depth measurements and the actual 3D extent of the objects is usually unknown because of occlusions. In this paper we propose a new method that allows us to jointly refine the 3D reconstruction of the scene (raw depth values) while accurately segmenting out the objects or scene elements from the 3D reconstruction. This is achieved by introducing a new model which we called Voxel-CRF. The Voxel-CRF model is based on the idea of constructing a conditional random field over a 3D volume of interest which captures the semantic and 3D geometric relationships among different elements (voxels) of the scene. Such model allows to jointly estimate (1) a dense voxel-based 3D reconstruction and (2) the semantic labels associated with each voxel even in presence of partial occlusions using an approximate yet efficient inference strategy. We evaluated our method on the challenging NYU Depth dataset (Version 1 and 2). Experimental results show that our method achieves competitive accuracy in inferring scene semantics and visually appealing results in improving the quality of the 3D reconstruction. We also demonstrate an interesting application of object removal and scene completion from RGB-D images.
4 0.16479449 125 iccv-2013-Drosophila Embryo Stage Annotation Using Label Propagation
Author: Tomáš Kazmar, Evgeny Z. Kvon, Alexander Stark, Christoph H. Lampert
Abstract: In this work we propose a system for automatic classification of Drosophila embryos into developmental stages. While the system is designed to solve an actual problem in biological research, we believe that the principle underlying it is interesting not only for biologists, but also for researchers in computer vision. The main idea is to combine two orthogonal sources of information: one is a classifier trained on strongly invariant features, which makes it applicable to images of very different conditions, but also leads to rather noisy predictions. The other is a label propagation step based on a more powerful similarity measure that however is only consistent within specific subsets of the data at a time. In our biological setup, the information sources are the shape and the staining patterns of embryo images. We show experimentally that while neither of the methods can be used by itself to achieve satisfactory results, their combination achieves prediction quality comparable to human performance.
5 0.13896966 331 iccv-2013-Pyramid Coding for Functional Scene Element Recognition in Video Scenes
Author: Eran Swears, Anthony Hoogs, Kim Boyer
Abstract: Recognizing functional scene elements in video scenes based on the behaviors of moving objects that interact with them is an emerging problem of interest. Existing approaches have a limited ability to characterize elements such as cross-walks, intersections, and buildings that have low activity, are multi-modal, or have indirect evidence. Our approach recognizes the low activity and multi-model elements (crosswalks/intersections) by introducing a hierarchy of descriptive clusters to form a pyramid of codebooks that is sparse in the number of clusters and dense in content. The incorporation of local behavioral context such as person-enter-building and vehicle-parking nearby enables the detection of elements that do not have direct motion-based evidence, e.g. buildings. These two contributions significantly improve scene element recognition when compared against three state-of-the-art approaches. Results are shown on typical ground level surveillance video and for the first time on the more complex Wide Area Motion Imagery.
6 0.1148733 228 iccv-2013-Large-Scale Multi-resolution Surface Reconstruction from RGB-D Sequences
7 0.10383607 366 iccv-2013-STAR3D: Simultaneous Tracking and Reconstruction of 3D Objects Using RGB-D Data
8 0.089670837 33 iccv-2013-A Unified Video Segmentation Benchmark: Annotation, Metrics and Analysis
9 0.0873411 377 iccv-2013-Segmentation Driven Object Detection with Fisher Vectors
10 0.08640191 404 iccv-2013-Structured Forests for Fast Edge Detection
11 0.081563294 379 iccv-2013-Semantic Segmentation without Annotating Segments
12 0.080371037 204 iccv-2013-Human Attribute Recognition by Rich Appearance Dictionary
13 0.080005318 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
14 0.076134428 432 iccv-2013-Uncertainty-Driven Efficiently-Sampled Sparse Graphical Models for Concurrent Tumor Segmentation and Atlas Registration
15 0.075258136 277 iccv-2013-Multi-channel Correlation Filters
16 0.073706031 128 iccv-2013-Dynamic Probabilistic Volumetric Models
17 0.071438357 211 iccv-2013-Image Segmentation with Cascaded Hierarchical Models and Logistic Disjunctive Normal Networks
18 0.0707644 172 iccv-2013-Flattening Supervoxel Hierarchies by the Uniform Entropy Slice
19 0.070161879 329 iccv-2013-Progressive Multigrid Eigensolvers for Multiscale Spectral Segmentation
20 0.069561444 198 iccv-2013-Hierarchical Part Matching for Fine-Grained Visual Categorization
topicId topicWeight
[(0, 0.183), (1, -0.014), (2, -0.002), (3, -0.028), (4, 0.056), (5, 0.001), (6, -0.052), (7, -0.011), (8, -0.03), (9, -0.102), (10, 0.013), (11, 0.011), (12, 0.014), (13, -0.014), (14, -0.019), (15, -0.033), (16, -0.025), (17, -0.006), (18, -0.001), (19, -0.016), (20, -0.063), (21, 0.035), (22, -0.009), (23, 0.023), (24, -0.135), (25, 0.012), (26, -0.008), (27, 0.035), (28, 0.0), (29, 0.074), (30, 0.055), (31, -0.098), (32, 0.037), (33, 0.003), (34, -0.053), (35, -0.021), (36, 0.009), (37, -0.045), (38, 0.149), (39, 0.143), (40, -0.009), (41, -0.02), (42, -0.114), (43, -0.024), (44, 0.008), (45, -0.083), (46, -0.138), (47, -0.007), (48, -0.184), (49, -0.119)]
simIndex simValue paperId paperTitle
same-paper 1 0.9349488 447 iccv-2013-Volumetric Semantic Segmentation Using Pyramid Context Features
Author: Jonathan T. Barron, Mark D. Biggin, Pablo Arbeláez, David W. Knowles, Soile V.E. Keranen, Jitendra Malik
Abstract: We present an algorithm for the per-voxel semantic segmentation of a three-dimensional volume. At the core of our algorithm is a novel “pyramid context” feature, a descriptive representation designed such that exact per-voxel linear classification can be made extremely efficient. This feature not only allows for efficient semantic segmentation but enables other aspects of our algorithm, such as novel learned features and a stacked architecture that can reason about self-consistency. We demonstrate our technique on 3D fluorescence microscopy data of Drosophila embryos for which we are able to produce extremely accurate semantic segmentations in a matter of minutes, and for which other algorithms fail due to the size and high-dimensionality of the data, or due to the difficulty of the task.
2 0.73944354 401 iccv-2013-Stacked Predictive Sparse Coding for Classification of Distinct Regions in Tumor Histopathology
Author: Hang Chang, Yin Zhou, Paul Spellman, Bahram Parvin
Abstract: Image-based classification of histology sections, in terms of distinct components (e.g., tumor, stroma, normal), provides a series of indices for tumor composition. Furthermore, aggregation of these indices, from each whole slide image (WSI) in a large cohort, can provide predictive models of the clinical outcome. However, performance of the existing techniques is hindered as a result of large technical variations and biological heterogeneities that are always present in a large cohort. We propose a system that automatically learns a series of basis functions for representing the underlying spatial distribution using stacked predictive sparse decomposition (PSD). The learned representation is then fed into the spatial pyramid matching framework (SPM) with a linear SVM classifier. The system has been evaluated for classification of (a) distinct histological components for two cohorts of tumor types, and (b) colony organization of normal and malignant cell lines in 3D cell culture models. Throughput has been increased through the use of a graphics processing unit (GPU), and evaluation indicates superior performance compared with previous research.
3 0.67977118 331 iccv-2013-Pyramid Coding for Functional Scene Element Recognition in Video Scenes
Author: Eran Swears, Anthony Hoogs, Kim Boyer
Abstract: Recognizing functional scene elements in video scenes based on the behaviors of moving objects that interact with them is an emerging problem of interest. Existing approaches have a limited ability to characterize elements such as cross-walks, intersections, and buildings that have low activity, are multi-modal, or have indirect evidence. Our approach recognizes the low activity and multi-modal elements (crosswalks/intersections) by introducing a hierarchy of descriptive clusters to form a pyramid of codebooks that is sparse in the number of clusters and dense in content. The incorporation of local behavioral context such as person-enter-building and vehicle-parking nearby enables the detection of elements that do not have direct motion-based evidence, e.g. buildings. These two contributions significantly improve scene element recognition when compared against three state-of-the-art approaches. Results are shown on typical ground level surveillance video and for the first time on the more complex Wide Area Motion Imagery.
4 0.58699536 125 iccv-2013-Drosophila Embryo Stage Annotation Using Label Propagation
Author: Tomáš Kazmar, Evgeny Z. Kvon, Alexander Stark, Christoph H. Lampert
Abstract: In this work we propose a system for automatic classification of Drosophila embryos into developmental stages. While the system is designed to solve an actual problem in biological research, we believe that the principle underlying it is interesting not only for biologists, but also for researchers in computer vision. The main idea is to combine two orthogonal sources of information: one is a classifier trained on strongly invariant features, which makes it applicable to images of very different conditions, but also leads to rather noisy predictions. The other is a label propagation step based on a more powerful similarity measure that however is only consistent within specific subsets of the data at a time. In our biological setup, the information sources are the shape and the staining patterns of embryo images. We show experimentally that while neither of the methods can be used by itself to achieve satisfactory results, their combination achieves prediction quality comparable to human performance.
Author: Engin Türetken, Carlos Becker, Przemyslaw Glowacki, Fethallah Benmansour, Pascal Fua
Abstract: We propose a new approach to detecting irregular curvilinear structures in noisy image stacks. In contrast to earlier approaches that rely on circular models of the cross-sections, ours allows for the arbitrarily-shaped ones that are prevalent in biological imagery. This is achieved by maximizing the image gradient flux along multiple directions and radii, instead of only two with a unique radius as is usually done. This yields a more complex optimization problem for which we propose a computationally efficient solution. We demonstrate the effectiveness of our approach on a wide range of challenging gray scale and color datasets and show that it outperforms existing techniques, especially on very irregular structures.
6 0.58370888 388 iccv-2013-Shape Index Descriptors Applied to Texture-Based Galaxy Analysis
8 0.54533654 2 iccv-2013-3D Scene Understanding by Voxel-CRF
9 0.51727527 77 iccv-2013-Codemaps - Segment, Classify and Search Objects Locally
10 0.51190543 215 iccv-2013-Incorporating Cloud Distribution in Sky Representation
11 0.4817597 132 iccv-2013-Efficient 3D Scene Labeling Using Fields of Trees
12 0.46869653 228 iccv-2013-Large-Scale Multi-resolution Surface Reconstruction from RGB-D Sequences
13 0.46839419 377 iccv-2013-Segmentation Driven Object Detection with Fisher Vectors
14 0.46764153 329 iccv-2013-Progressive Multigrid Eigensolvers for Multiscale Spectral Segmentation
15 0.46526453 211 iccv-2013-Image Segmentation with Cascaded Hierarchical Models and Logistic Disjunctive Normal Networks
16 0.45831174 386 iccv-2013-Sequential Bayesian Model Update under Structured Scene Prior for Semantic Road Scenes Labeling
17 0.45728979 193 iccv-2013-Heterogeneous Auto-similarities of Characteristics (HASC): Exploiting Relational Information for Classification
18 0.45458522 76 iccv-2013-Coarse-to-Fine Semantic Video Segmentation Using Supervoxel Trees
19 0.45283303 406 iccv-2013-Style-Aware Mid-level Representation for Discovering Visual Connections in Space and Time
20 0.44888276 416 iccv-2013-The Interestingness of Images
topicId topicWeight
[(2, 0.058), (4, 0.014), (7, 0.012), (12, 0.013), (13, 0.013), (26, 0.117), (31, 0.087), (40, 0.038), (42, 0.086), (44, 0.159), (48, 0.016), (64, 0.042), (73, 0.037), (78, 0.01), (89, 0.194), (98, 0.016)]
simIndex simValue paperId paperTitle
1 0.89462924 225 iccv-2013-Joint Segmentation and Pose Tracking of Human in Natural Videos
Author: Taegyu Lim, Seunghoon Hong, Bohyung Han, Joon Hee Han
Abstract: We propose an on-line algorithm to extract a human by foreground/background segmentation and estimate the pose of the human from videos captured by moving cameras. We claim that a virtuous cycle can be created by appropriate interactions between the two modules to solve individual problems. This joint estimation problem is divided into two subproblems, foreground/background segmentation and pose tracking, which alternate iteratively for optimization; the segmentation step generates a foreground mask for human pose tracking, and the human pose tracking step provides a foreground response map for segmentation. The final solution is obtained when the iterative procedure converges. We evaluate our algorithm quantitatively and qualitatively in real videos involving various challenges, and present its outstanding performance compared to the state-of-the-art techniques for segmentation and pose estimation.
2 0.88363421 416 iccv-2013-The Interestingness of Images
Author: Michael Gygli, Helmut Grabner, Hayko Riemenschneider, Fabian Nater, Luc Van Gool
Abstract: We investigate human interest in photos. Based on our own and others' psychological experiments, we identify various cues for “interestingness”, namely aesthetics, unusualness and general preferences. For the ranking of retrieved images, interestingness is more appropriate than cues proposed earlier. Interestingness is, for example, correlated with what people believe they will remember. This is opposed to actual memorability, which is uncorrelated to both of them. We introduce a set of features computationally capturing the three main aspects of visual interestingness that we propose and build an interestingness predictor from them. Its performance is shown on three datasets with varying context, reflecting diverse levels of prior knowledge of the viewers.
same-paper 3 0.86974722 447 iccv-2013-Volumetric Semantic Segmentation Using Pyramid Context Features
Author: Jonathan T. Barron, Mark D. Biggin, Pablo Arbeláez, David W. Knowles, Soile V.E. Keranen, Jitendra Malik
Abstract: We present an algorithm for the per-voxel semantic segmentation of a three-dimensional volume. At the core of our algorithm is a novel “pyramid context” feature, a descriptive representation designed such that exact per-voxel linear classification can be made extremely efficient. This feature not only allows for efficient semantic segmentation but enables other aspects of our algorithm, such as novel learned features and a stacked architecture that can reason about self-consistency. We demonstrate our technique on 3D fluorescence microscopy data of Drosophila embryos, for which we are able to produce extremely accurate semantic segmentations in a matter of minutes, and for which other algorithms fail due to the size and high-dimensionality of the data, or due to the difficulty of the task.
4 0.85311055 303 iccv-2013-Orderless Tracking through Model-Averaged Posterior Estimation
Author: Seunghoon Hong, Suha Kwak, Bohyung Han
Abstract: We propose a novel offline tracking algorithm based on model-averaged posterior estimation through patch matching across frames. Contrary to existing online and offline tracking methods, our algorithm is not based on temporally-ordered estimates of target state but attempts to select easy-to-track frames first out of the remaining ones without exploiting temporal coherency of target. The posterior of the selected frame is estimated by propagating densities from the already tracked frames in a recursive manner. The density propagation across frames is implemented by an efficient patch matching technique, which is useful for our algorithm since it does not require motion smoothness assumption. Also, we present a hierarchical approach, where a small set of key frames are tracked first and non-key frames are handled by local key frames. Our tracking algorithm is conceptually well-suited for the sequences with abrupt motion, shot changes, and occlusion. We compare our tracking algorithm with existing techniques in real videos with such challenges and illustrate its superior performance qualitatively and quantitatively.
5 0.85189664 86 iccv-2013-Concurrent Action Detection with Structural Prediction
Author: Ping Wei, Nanning Zheng, Yibiao Zhao, Song-Chun Zhu
Abstract: Action recognition has often been posed as a classification problem, which assumes that a video sequence has only one action class label and that different actions are independent. However, a single human body can perform multiple concurrent actions at the same time, and different actions interact with each other. This paper proposes a concurrent action detection model where the action detection is formulated as a structural prediction problem. In this model, an interval in a video sequence can be described by multiple action labels. A detected action interval is determined both by the unary local detector and the relations with other actions. We use a wavelet feature to represent the action sequence, and design a composite temporal logic descriptor to describe the action relations. The model parameters are trained by structural SVM learning. Given a long video sequence, a sequential decision window search algorithm is designed to detect the actions. Experiments on our newly collected concurrent action dataset demonstrate the strength of our method.
6 0.83313578 281 iccv-2013-Multi-view Normal Field Integration for 3D Reconstruction of Mirroring Objects
7 0.83299398 156 iccv-2013-Fast Direct Super-Resolution by Simple Functions
8 0.82803684 376 iccv-2013-Scene Text Localization and Recognition with Oriented Stroke Detection
9 0.8279354 196 iccv-2013-Hierarchical Data-Driven Descent for Efficient Optimal Deformation Estimation
10 0.82742465 414 iccv-2013-Temporally Consistent Superpixels
11 0.82546288 137 iccv-2013-Efficient Salient Region Detection with Soft Image Abstraction
12 0.82341093 423 iccv-2013-Towards Motion Aware Light Field Video for Dynamic Scenes
13 0.82228494 95 iccv-2013-Cosegmentation and Cosketch by Unsupervised Learning
14 0.82224512 315 iccv-2013-PhotoOCR: Reading Text in Uncontrolled Conditions
15 0.82160687 309 iccv-2013-Partial Enumeration and Curvature Regularization
16 0.82143116 73 iccv-2013-Class-Specific Simplex-Latent Dirichlet Allocation for Image Classification
17 0.82098389 361 iccv-2013-Robust Trajectory Clustering for Motion Segmentation
18 0.82083994 160 iccv-2013-Fast Object Segmentation in Unconstrained Video
19 0.82057971 150 iccv-2013-Exemplar Cut
20 0.8201561 349 iccv-2013-Regionlets for Generic Object Detection