nips nips2008 nips2008-36 knowledge-graph by maker-knowledge-mining

36 nips-2008-Beyond Novelty Detection: Incongruent Events, when General and Specific Classifiers Disagree


Source: pdf

Author: Daphna Weinshall, Hynek Hermansky, Alon Zweig, Jie Luo, Holly Jimison, Frank Ohl, Misha Pavel

Abstract: Unexpected stimuli are a challenge to any machine learning algorithm. Here we identify distinct types of unexpected events, focusing on ’incongruent events’ when ’general level’ and ’specific level’ classifiers give conflicting predictions. We define a formal framework for the representation and processing of incongruent events: starting from the notion of label hierarchy, we show how partial order on labels can be deduced from such hierarchies. For each event, we compute its probability in different ways, based on adjacent levels (according to the partial order) in the label hierarchy. An incongruent event is an event where the probability computed based on some more specific level (in accordance with the partial order) is much smaller than the probability computed based on some more general level, leading to conflicting predictions. We derive algorithms to detect incongruent events from different types of hierarchies, corresponding to class membership or part membership. Respectively, we show promising results with real data on two specific problems: Out Of Vocabulary words in speech recognition, and the identification of a new sub-class (e.g., the face of a new individual) in audio-visual facial object recognition.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Here we identify distinct types of unexpected events, focusing on ’incongruent events’ when ’general level’ and ’specific level’ classifiers give conflicting predictions. [sent-2, score-0.173]

2 We define a formal framework for the representation and processing of incongruent events: starting from the notion of label hierarchy, we show how partial order on labels can be deduced from such hierarchies. [sent-3, score-0.431]

3 For each event, we compute its probability in different ways, based on adjacent levels (according to the partial order) in the label hierarchy. [sent-4, score-0.164]

4 An incongruent event is an event where the probability computed based on some more specific level (in accordance with the partial order) is much smaller than the probability computed based on some more general level, leading to conflicting predictions. [sent-5, score-0.823]

5 We derive algorithms to detect incongruent events from different types of hierarchies, corresponding to class membership or part membership. [sent-6, score-0.76]

6 Respectively, we show promising results with real data on two specific problems: Out Of Vocabulary words in speech recognition, and the identification of a new sub-class (e. [sent-7, score-0.148]

7 , the face of a new individual) in audio-visual facial object recognition. [sent-9, score-0.135]

8 By definition, an unexpected event is one whose probability to confront the system is low, based on the data that has been observed previously. [sent-15, score-0.269]

9 In line with this observation, much of the computational work on novelty detection focused on the probabilistic modeling of known classes, identifying outliers of these distributions as novel events (see e. [sent-16, score-0.518]

10 More recently, one-class classifiers have been proposed and used for novelty detection without the direct modeling of data distribution [3, 4]. [sent-19, score-0.301]

11 There are many studies on novelty detection in biological systems [5], often focusing on regions of the hippocampus [6]. [sent-20, score-0.301]

12 To advance beyond the detection of outliers, we observe that there are many different reasons why some stimuli could appear novel. [sent-21, score-0.152]

13 Our work, presented in Section 2, focuses on unexpected events which are indicated by the incongruence between prediction induced by prior experience (training) and the evidence provided by the sensory data. [sent-22, score-0.344]

14 A sufficiently large discrepancy between posterior probabilities induced by input data in the two classifiers is taken as indication that an item is incongruent. [sent-26, score-0.104]

15 Thus, in comparison with most existing work on novelty detection, one new and important characteristic of our approach is that we look for a level of description where the novel event is highly probable. [sent-27, score-0.362]

16 Rather than simply respond to an event which is rejected by all classifiers, which more often than not requires no special attention (as in pure noise), we construct and exploit a hierarchy of representations. [sent-28, score-0.303]

17 We attend to those events which are recognized (or accepted) at some more abstract levels of description in the hierarchy, while being rejected by the more concrete classifiers. [sent-29, score-0.309]

18 One approach, used to detect images of unexpected incongruous objects, is to train the more general, less constrained classifier using a larger, more diverse set of stimuli, e. [sent-31, score-0.273]

19 A different approach is used to identify unexpected (out-of-vocabulary) lexical items. [sent-42, score-0.173]

20 A word that did not belong to the expected vocabulary of the more constrained recognizer could then be identified by a discrepancy in the posterior probabilities of phonemes derived from both classifiers. [sent-44, score-0.315]

21 Specifically, we consider two kinds of hierarchies: Part membership as in biological taxonomy or speech, and Class membership, as in human categorization (or levels of categorization). [sent-46, score-0.164]

22 We define a notion of partial order on such hierarchies, and identify those events whose probability as computed using different levels of the hierarchy does not agree. [sent-47, score-0.628]

23 Such events correspond to many interesting situations that are worthy of special attention, including incongruous scenes and new sub-classes, as shown in Section 3. [sent-49, score-0.277]

24 Introducing label hierarchy: The set of labels represents the knowledge base about stimuli, which is either given (by a teacher in supervised learning settings) or learned (in unsupervised or semi-supervised settings). [sent-51, score-0.201]

25 For example, eyes, ears, and nose combine to form a head; head, legs and tail combine to form a dog. [sent-58, score-0.191]

26 Class membership, as in human categorization – where objects can be classified at different levels of generality, from sub-ordinate categories (most specific level), to basic level (intermediate level), to super-ordinate categories (most general level). [sent-59, score-0.179]

27 For example, a Beagle (sub-ordinate category) is also a dog (basic level category), and it is also an animal (super-ordinate category). [sent-60, score-0.566]

28 In the class-membership hierarchy, a parent class admits a higher number of combinations of features than any of its children, i. [sent-62, score-0.127]

29 , the parent category is less constrained than its children classes. [sent-64, score-0.228]

30 In contrast, a parent node in the part-membership hierarchy imposes stricter constraints on the observed features than a child node. [sent-65, score-0.328]

31 Roughly speaking, in the class-membership hierarchy (right panel), the parent node is the disjunction of the child categories. [sent-68, score-0.388]

32 In the part-membership hierarchy (left panel), the parent category represents a conjunction of the children categories. [sent-69, score-0.424]

33 Left: part-membership hierarchy, where the concept of a dog requires a conjunction of parts: a head, legs and a tail. [sent-72, score-0.864]

34 Right: class-membership hierarchy, the concept of a dog is defined as the disjunction of more specific concepts - Afghan, Beagle and Collie. [sent-73, score-0.774]

35 Intuitively speaking, different levels in each hierarchy are related by a partial order: the more specific concept, which corresponds to a smaller set of events or objects in the world, is always smaller than the more general concept, which corresponds to a larger set of events or objects. [sent-75, score-0.804]

36 For the part-membership hierarchy example (left panel), the concept of ’dog’ requires a conjunction of parts as in DOG = LEGS ∩ HEAD ∩ TAIL, and therefore, for example, DOG ⊂ LEGS ⇒ DOG ≺ LEGS. [sent-78, score-0.445]

37 Thus DOG ≺ LEGS, DOG ≺ HEAD, DOG ≺ TAIL. In contrast, for the class-membership hierarchy (right panel), the class of dogs requires the disjunction of the individual members as in DOG = AFGHAN ∪ BEAGLE ∪ COLLIE, and therefore, for example, DOG ⊃ AFGHAN ⇒ DOG ≻ AFGHAN. [sent-79, score-0.363]
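
Illustrative sketch (Python, not from the paper): the two hierarchies and the induced partial order can be stored as a small directed graph, with an edge a -> b meaning that a precedes b (a is the more specific concept, b the more general one). The graphs and the helper below are assumptions made for illustration only.

# A label hierarchy as a directed graph; an edge a -> b means a precedes b
# (a is more specific, b is more general), matching DOG ⊂ LEGS ⇒ DOG ≺ LEGS
# and AFGHAN ⊂ DOG ⇒ AFGHAN ≺ DOG.
PART_MEMBERSHIP = {                       # DOG = LEGS ∩ HEAD ∩ TAIL
    "dog": ["legs", "head", "tail"],
}
CLASS_MEMBERSHIP = {                      # DOG = AFGHAN ∪ BEAGLE ∪ COLLIE
    "afghan": ["dog"],
    "beagle": ["dog"],
    "collie": ["dog"],
}

def precedes(graph, a, b):
    """True if a precedes b in the partial order, i.e. b is reachable from a."""
    frontier, seen = [a], set()
    while frontier:
        node = frontier.pop()
        if node == b:
            return True
        if node not in seen:
            seen.add(node)
            frontier.extend(graph.get(node, []))
    return False

print(precedes(PART_MEMBERSHIP, "dog", "legs"))     # True: DOG ≺ LEGS
print(precedes(CLASS_MEMBERSHIP, "afghan", "dog"))  # True: AFGHAN ≺ DOG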

38 Each node in G is a random variable which corresponds to a class or concept (or event). [sent-82, score-0.213]

39 Each directed link in E corresponds to a partial-order relationship as defined above, where there is a link from node a to node b iff a ≺ b. [sent-83, score-0.146]

40 • If |A_a^s| > 1, Q_a^s(X): a probabilistic model of class a which is based on the probability of the concepts in A_a^s, assuming their independence of each other. [sent-86, score-0.172]

41 Typically, the model incorporates some relatively simple conjunctive and/or disjunctive relations among concepts in A_a^s. [sent-87, score-0.166]

42 • If |A_a^g| > 1, Q_a^g(X): a probabilistic model of class a which is based on the probability of the concepts in A_a^g, assuming their independence of each other. [sent-88, score-0.172]

43 Here too, the model typically incorporates some relatively simple conjunctive and/or disjunctive relations among concepts in A_a^g. [sent-89, score-0.166]

44 1, where our concept of interest a is the concept ‘dog’: In the part-membership hierarchy (left panel), |A_dog^g| = 3 (head, legs, tail). [sent-91, score-0.463]

45 We can therefore learn 2 models for the class ‘dog’ (Q_dog^s is not defined): 1. [sent-92, score-0.533]

46 Q_dog^g - obtained using the outcome of models for head, legs and tail, which were trained on the same training set T with body-part labels. [sent-95, score-0.656]

47 If we further assume that a class-membership hierarchy is always a tree, then |A_a^g| = 1. [sent-97, score-0.201]

48 We can therefore learn 2 models for the class ‘dog’ (Q_dog^g is not defined): 1. [sent-98, score-0.533]

49 Q_dog^s - obtained using the outcome of models for Afghan, Beagle and Collie, which were trained on the same training set T with only specific dog-type labels. [sent-101, score-1.02]

50 In particular, we are interested in the following discrepancy: Definition: Observation X is incongruent if there exists a concept a such that Q_a^g(X) ≫ Q_a(X) or Q_a(X) ≫ Q_a^s(X). [sent-104, score-0.498]

51 In either case, the concept receives high probability at the more general level (according to the GPO), but much lower probability when relying only on the more specific level. [sent-106, score-0.285]
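
A minimal sketch (Python) of the incongruence test implied by this definition; the probabilities are assumed to come from classifiers trained elsewhere, and the fixed odds ratio standing in for "much greater than" is an illustrative choice, not a value taken from the paper.

def is_incongruent(q_direct, q_general=None, q_specific=None, ratio=10.0):
    """Flag concept a as incongruent if Q_a^g(X) >> Q_a(X) or Q_a(X) >> Q_a^s(X).
    Either Q_a^g or Q_a^s may be undefined (None) depending on the hierarchy;
    `ratio` is an assumed stand-in for the '>>' relation."""
    if q_general is not None and q_general > ratio * q_direct:
        return True   # the more general level is confident, the direct classifier is not
    if q_specific is not None and q_direct > ratio * q_specific:
        return True   # the direct classifier is confident, every more specific model rejects
    return False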

52 Let us discuss again the examples we have seen before, to illustrate why this definition indeed captures interesting “surprises”: • In the part-membership hierarchy (left panel of Fig. [sent-107, score-0.272]

53 1), we have Q_dog^g = Q_HEAD · Q_LEGS · Q_TAIL ≫ Q_dog. In other words, while the probability of each part is high (since the multiplication of those probabilities is high), the ’dog’ classifier is rather uncertain about the existence of a dog in this data. [sent-108, score-1.024]

54 Maybe the parts are configured in an unusual arrangement for a dog (as in a 3-legged cat), or maybe we encounter a donkey with a cat’s tail (as in Shrek 3). [sent-110, score-0.66]

55 Those are two examples of the kind of unexpected events we are interested in. [sent-111, score-0.344]

56 • In the class-membership hierarchy (right panel of Fig. [sent-112, score-0.272]

57 1), we have Q_dog^s = Q_AFGHAN + Q_BEAGLE + Q_COLLIE ≪ Q_dog. In other words, while the probability of each sub-class is low (since the sum of these probabilities is low), the ’dog’ classifier is certain about the existence of a dog in this data. [sent-113, score-1.064]
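
Hedged sketch (Python) instantiating the ‘dog’ example with made-up classifier outputs: Q_dog^g is the conjunction (product) of the part probabilities, Q_dog^s the disjunction (sum) of the breed probabilities, and the is_incongruent helper sketched above compares them with the direct ‘dog’ score. All numbers are invented for illustration only.

import numpy as np

part_probs  = {"head": 0.95, "legs": 0.90, "tail": 0.92}          # part detectors
breed_probs = {"afghan": 0.01, "beagle": 0.02, "collie": 0.01}    # sub-class detectors
q_dog = 0.85                                                      # direct 'dog' classifier

q_g_dog = np.prod(list(part_probs.values()))   # Q_dog^g: conjunction of parts
q_s_dog = sum(breed_probs.values())            # Q_dog^s: disjunction of sub-classes

print(is_incongruent(q_dog, q_general=q_g_dog))    # False: parts and 'dog' agree here
print(is_incongruent(q_dog, q_specific=q_s_dog))   # True: 'dog' is confident, no known breed is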

58 Maybe we are seeing a new type of dog that we haven’t seen before - a Pointer. [sent-115, score-0.492]

59 The dog model, if correctly capturing the notion of ’dogness’, should be able to identify this new object, while models of previously seen dog breeds (Afghan, Beagle and Collie) correctly fail to recognize the new object. [sent-116, score-1.03]

60 Incongruent events: algorithms. Our definition for incongruent events in the previous section is indeed unified, but as a result quite abstract. [sent-117, score-0.584]

61 In this section we discuss two different algorithmic implementations, one generative and one discriminative, which were developed for the part membership and class membership hierarchies respectively (see definition in Section 1). [sent-118, score-0.374]

62 Part membership - a generative algorithm: Consider the left panel of Fig. [sent-121, score-0.223]

63 The event in the top node is incongruent if its probability is low, while the probability of all its descendants is high. [sent-123, score-0.59]

64 In many applications, such as speech recognition, one computes the probability of events (sentences) based on a generative model (corresponding to a specific language) which includes a dictionary of parts (words). [sent-124, score-0.468]

65 At the top level the event probability is computed conditionally on the model; typically the parts are assumed to be independent, and the event probability is computed as the product of the parts’ probabilities conditioned on the model. [sent-125, score-0.484]

66 For example, in speech processing and assuming a specific language (e. [sent-126, score-0.155]

67 , English), the probability of the sentence is typically computed by multiplying the probability of each word using an HMM model trained on sentences from a specific language. [sent-128, score-0.186]

68 More formally, consider an event u composed of parts w_k. [sent-130, score-0.257]

69 Using the generative model of events and assuming the conditional independence of the parts given this model, the prior probability of the event is given by the product of the prior probabilities of the parts, p(u|L) = ∏_k p(w_k|L) (3), where L denotes the generative model (e. [sent-131, score-0.518]

70 At the risk of notation abuse, {w_k} now denote the parts which compose the most likely event u. [sent-135, score-0.165]

71 In speech processing, a sentence is incongruent if it includes an incongruent word - a word whose probability based on the generative language model is low, but whose direct probability (not constrained by the language model) is high. [sent-139, score-1.232]
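
Hedged sketch (Python) of this comparison: each word's probability under the strongly constrained model (the language model L, as in p(u|L) = ∏_k p(w_k|L)) is compared with an unconstrained, acoustics-only estimate, and a word is flagged when the two disagree by a large margin. The dictionaries, numbers and threshold are assumptions for illustration, not the actual LVCSR pipeline.

import math

def flag_oov_words(words, p_strong, p_weak, log_ratio_threshold=3.0):
    """Return the words whose weakly constrained (acoustic) probability exceeds
    their strongly constrained (language-model) probability by the given
    log-ratio; the threshold is an illustrative choice."""
    flagged = []
    for w in words:
        if math.log(p_weak[w]) - math.log(p_strong[w]) > log_ratio_threshold:
            flagged.append(w)
    return flagged

words = ["the", "quick", "zvonimir"]                          # hypothetical recognized words
p_strong = {"the": 0.08, "quick": 0.01, "zvonimir": 1e-6}     # constrained by the language model
p_weak   = {"the": 0.05, "quick": 0.02, "zvonimir": 0.01}     # acoustics / phonemes only
print(flag_oov_words(words, p_strong, p_weak))                # ['zvonimir']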

72 Example: Out Of Vocabulary (OOV) words. For the detection of OOV words, we performed experiments using a Large Vocabulary Continuous Speech Recognition (LVCSR) system on the Wall Street Journal Corpus (WSJ). [sent-140, score-0.163]

73 To introduce OOV words, the vocabulary was restricted to the 4968 most frequent words from the language training texts, leaving the remaining words unknown to the model. [sent-143, score-0.224]

74 In this task, we have shown that the comparison between two parallel classifiers, based on strong and weak posterior streams, is effective for the detection of OOV words, and also for the detection of recognition errors. [sent-145, score-0.262]

75 Specifically, we use the derivation above to detect out-of-vocabulary words, by comparing their probability when computed based on the language model with their probability when computed based on mere acoustic modeling. [sent-146, score-0.199]

76 Class membership - a discriminative algorithm: Consider the right panel of Fig. [sent-156, score-0.227]

77 The general class in the top node is incongruent if its probability is high, while the probability of all its sub-classes is low. [sent-158, score-0.529]

78 In other words, the classifier of the parent object accepts the new observation, but all the child object classifiers reject it. [sent-159, score-0.32]

79 Brute force computation of this definition may follow the path taken by traditional approaches to novelty detection, e. [sent-160, score-0.186]

80 Instead, it seems like discriminative classifiers, trained to discriminate between objects at the sub-class level, could be more successful. [sent-164, score-0.133]

81 We note that unlike traditional approaches to novelty detection, which must use generative models or one-class classifiers in the absence of appropriate discriminative data, our dependence on object hierarchy provides discriminative data as a by-product. [sent-165, score-0.583]

82 In other words, after the recognition by a parent-node classifier, we may use classifiers trained to discriminate between its children to implement a discriminative novelty detection algorithm. [sent-166, score-0.463]
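
Hedged sketch (Python/scikit-learn) of this discriminative test: an observation already accepted by the parent-level classifier is declared incongruent (a candidate new sub-class) when none of the classifiers trained to discriminate between the known sub-classes accepts it with sufficient margin. The data, the use of linear SVMs and the margin threshold are assumptions for illustration, not the paper's exact setup.

import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X_parent = rng.normal(size=(300, 20))        # examples accepted by the parent classifier
y_subclass = rng.integers(0, 3, size=300)    # labels of the known sub-classes (children)

# One-vs-rest linear SVMs over the known sub-classes, trained on the
# representation learnt for the parent node.
child_clf = LinearSVC(C=1.0, max_iter=5000).fit(X_parent, y_subclass)

def is_new_subclass(x, parent_accepts, margin_threshold=0.0):
    """Incongruent if the parent accepts x but every child classifier rejects it."""
    if not parent_accepts:
        return False    # rejected everywhere: plain noise, not an incongruent event
    margins = child_clf.decision_function(x.reshape(1, -1))[0]
    return margins.max() < margin_threshold

print(is_new_subclass(rng.normal(size=20), parent_accepts=True))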

83 Specifically, we used the approach described in [8] to build a unified representation for all objects in the sub-class level, which is the representation computed for the parent object whose classifier had accepted (positively recognized) the object. [sent-167, score-0.206]

84 In this setup, the general parent category level is the ‘speech’ (audio) and ‘face’ (visual), and the different individuals are the offspring (sub-class) levels. [sent-175, score-0.249]

85 The task is to identify an individual as belonging to the trusted group of individuals vs. [sent-176, score-0.138]

86 All objects in the sub-class level (different individuals) were represented using the representation learnt for the parent level (’face’). [sent-182, score-0.279]

87 For this fusion, the audio and visual signals were synchronized, and the winning classification margins of both signals were normalized to the same scale and averaged to obtain a single margin for the combined method. [sent-190, score-0.106]
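
A minimal sketch (Python) of one plausible reading of this fusion step: the synchronized per-frame margins of the audio and visual classifiers are rescaled to a common [0, 1] range and averaged. The min-max normalization is an assumption; the paper only states that both margins were brought to the same scale before averaging.

import numpy as np

def fuse_margins(audio_margins, visual_margins):
    """Average per-frame margins from two synchronized modalities after
    rescaling each to [0, 1] (one possible choice of common scale)."""
    def rescale(m):
        m = np.asarray(m, dtype=float)
        span = m.max() - m.min()
        return (m - m.min()) / span if span > 0 else np.zeros_like(m)
    return 0.5 * (rescale(audio_margins) + rescale(visual_margins))

print(fuse_margins([0.2, 1.5, -0.3], [0.9, 0.1, 0.4]))   # combined margin per frame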

88 Since the goal is to identify novel incongruent events, true positive and false positive rates were calculated by considering all frames from the unknown test sequences as positive events and the known individual test sequences as negative events. [sent-191, score-0.63]

89 We compared our method to novelty detection based on one-class SVM [3] extended to our multi-class case. [sent-192, score-0.301]

90 3, our method performs substantially better in both modalities as compared to the “standard” one-class approach for novelty detection. [sent-195, score-0.269]

91 Summary: Unexpected events are typically identified by their low posterior probability. [sent-197, score-0.217]

92 In this paper we employed label hierarchy to obtain a few probability values for each event, which allowed us to tease apart different types of unexpected events. [sent-198, score-0.368]

93 [Figure 3 legend residue: audio, visual, audio-visual, audio (OC-SVM), visual (OC-SVM)]

94 For comparison, we show results with a more traditional novelty detection method using One Class SVM (dashed lines). [sent-223, score-0.301]

95 We focused above on the second type of events - incongruent concepts, which have not been studied previously in isolation. [sent-224, score-0.584]

96 Such events are characterized by some discrepancy between the responses of two classifiers, which can occur for a number of different reasons: Context: in a given context such as the English language, a sentence containing a Czech word is assigned low probability. [sent-225, score-0.359]

97 In the visual domain, in a given context such as a street scene, otherwise high probability events such as “car” and “elephant” are not likely to appear together. [sent-226, score-0.297]

98 We described how our approach can be used to design new algorithms to address these problems, showing promising results on real speech and audio-visual facial datasets. [sent-228, score-0.15]

99 : Brain regions responsive to novelty in the absence of awareness. [sent-263, score-0.186]

100 : Combination of strongly and weakly constrained recognizers for reliable detection of oovs. [sent-282, score-0.202]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('dog', 0.492), ('incongruent', 0.367), ('events', 0.217), ('hierarchy', 0.201), ('novelty', 0.186), ('afghan', 0.16), ('concept', 0.131), ('legs', 0.128), ('qg', 0.128), ('unexpected', 0.127), ('ers', 0.124), ('oov', 0.12), ('classi', 0.119), ('detection', 0.115), ('membership', 0.104), ('event', 0.102), ('beagle', 0.1), ('collie', 0.1), ('speech', 0.1), ('er', 0.092), ('wk', 0.092), ('concepts', 0.091), ('parent', 0.086), ('qs', 0.081), ('lvcsr', 0.08), ('qdog', 0.08), ('hierarchies', 0.077), ('level', 0.074), ('vocabulary', 0.073), ('discrepancy', 0.072), ('panel', 0.071), ('dogs', 0.071), ('head', 0.068), ('ag', 0.067), ('audio', 0.066), ('partial', 0.064), ('tail', 0.063), ('parts', 0.063), ('reject', 0.062), ('disjunction', 0.06), ('incongruous', 0.06), ('phoneme', 0.06), ('levels', 0.06), ('constrained', 0.055), ('language', 0.055), ('discriminative', 0.052), ('facial', 0.05), ('conjunction', 0.05), ('words', 0.048), ('generative', 0.048), ('qa', 0.048), ('trusted', 0.048), ('identify', 0.046), ('uni', 0.046), ('category', 0.045), ('objects', 0.045), ('object', 0.044), ('individuals', 0.044), ('accept', 0.043), ('accepts', 0.042), ('maybe', 0.042), ('modalities', 0.042), ('children', 0.042), ('class', 0.041), ('node', 0.041), ('face', 0.041), ('probability', 0.04), ('beagel', 0.04), ('breed', 0.04), ('conjunctive', 0.04), ('ghan', 0.04), ('hermansky', 0.04), ('markou', 0.04), ('oc', 0.04), ('phonemes', 0.04), ('qaf', 0.04), ('qb', 0.04), ('qbeagle', 0.04), ('qcollie', 0.04), ('qhead', 0.04), ('qlegs', 0.04), ('qtail', 0.04), ('recognizer', 0.04), ('wsj', 0.04), ('visual', 0.04), ('stimuli', 0.037), ('trained', 0.036), ('disjunctive', 0.035), ('plp', 0.035), ('sentence', 0.035), ('nn', 0.035), ('word', 0.035), ('accordance', 0.034), ('item', 0.032), ('recognizers', 0.032), ('recognized', 0.032), ('recognition', 0.032), ('accepted', 0.031), ('posteriors', 0.031), ('detect', 0.031)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999982 36 nips-2008-Beyond Novelty Detection: Incongruent Events, when General and Specific Classifiers Disagree

Author: Daphna Weinshall, Hynek Hermansky, Alon Zweig, Jie Luo, Holly Jimison, Frank Ohl, Misha Pavel

Abstract: Unexpected stimuli are a challenge to any machine learning algorithm. Here we identify distinct types of unexpected events, focusing on ’incongruent events’ when ’general level’ and ’specific level’ classifiers give conflicting predictions. We define a formal framework for the representation and processing of incongruent events: starting from the notion of label hierarchy, we show how partial order on labels can be deduced from such hierarchies. For each event, we compute its probability in different ways, based on adjacent levels (according to the partial order) in the label hierarchy. An incongruent event is an event where the probability computed based on some more specific level (in accordance with the partial order) is much smaller than the probability computed based on some more general level, leading to conflicting predictions. We derive algorithms to detect incongruent events from different types of hierarchies, corresponding to class membership or part membership. Respectively, we show promising results with real data on two specific problems: Out Of Vocabulary words in speech recognition, and the identification of a new sub-class (e.g., the face of a new individual) in audio-visual facial object recognition.

2 0.12763135 130 nips-2008-MCBoost: Multiple Classifier Boosting for Perceptual Co-clustering of Images and Visual Features

Author: Tae-kyun Kim, Roberto Cipolla

Abstract: We present a new co-clustering problem of images and visual features. The problem involves a set of non-object images in addition to a set of object images and features to be co-clustered. Co-clustering is performed in a way that maximises discrimination of object images from non-object images, thus emphasizing discriminative features. This provides a way of obtaining perceptual joint-clusters of object images and features. We tackle the problem by simultaneously boosting multiple strong classifiers which compete for images by their expertise. Each boosting classifier is an aggregation of weak-learners, i.e. simple visual features. The obtained classifiers are useful for object detection tasks which exhibit multimodalities, e.g. multi-category and multi-view object detection tasks. Experiments on a set of pedestrian images and a face data set demonstrate that the method yields intuitive image clusters with associated features and is much superior to conventional boosting classifiers in object detection tasks. 1

3 0.11990166 191 nips-2008-Recursive Segmentation and Recognition Templates for 2D Parsing

Author: Leo Zhu, Yuanhao Chen, Yuan Lin, Chenxi Lin, Alan L. Yuille

Abstract: Language and image understanding are two major goals of artificial intelligence which can both be conceptually formulated in terms of parsing the input signal into a hierarchical representation. Natural language researchers have made great progress by exploiting the 1D structure of language to design efficient polynomialtime parsing algorithms. By contrast, the two-dimensional nature of images makes it much harder to design efficient image parsers and the form of the hierarchical representations is also unclear. Attempts to adapt representations and algorithms from natural language have only been partially successful. In this paper, we propose a Hierarchical Image Model (HIM) for 2D image parsing which outputs image segmentation and object recognition. This HIM is represented by recursive segmentation and recognition templates in multiple layers and has advantages for representation, inference, and learning. Firstly, the HIM has a coarse-to-fine representation which is capable of capturing long-range dependency and exploiting different levels of contextual information. Secondly, the structure of the HIM allows us to design a rapid inference algorithm, based on dynamic programming, which enables us to parse the image rapidly in polynomial time. Thirdly, we can learn the HIM efficiently in a discriminative manner from a labeled dataset. We demonstrate that HIM outperforms other state-of-the-art methods by evaluation on the challenging public MSRC image dataset. Finally, we sketch how the HIM architecture can be extended to model more complex image phenomena. 1

4 0.11487374 109 nips-2008-Interpreting the neural code with Formal Concept Analysis

Author: Dominik Endres, Peter Foldiak

Abstract: We propose a novel application of Formal Concept Analysis (FCA) to neural decoding: instead of just trying to figure out which stimulus was presented, we demonstrate how to explore the semantic relationships in the neural representation of large sets of stimuli. FCA provides a way of displaying and interpreting such relationships via concept lattices. We explore the effects of neural code sparsity on the lattice. We then analyze neurophysiological data from high-level visual cortical area STSa, using an exact Bayesian approach to construct the formal context needed by FCA. Prominent features of the resulting concept lattices are discussed, including hierarchical face representation and indications for a product-of-experts code in real neurons. 1

5 0.102582 42 nips-2008-Cascaded Classification Models: Combining Models for Holistic Scene Understanding

Author: Geremy Heitz, Stephen Gould, Ashutosh Saxena, Daphne Koller

Abstract: One of the original goals of computer vision was to fully understand a natural scene. This requires solving several sub-problems simultaneously, including object detection, region labeling, and geometric reasoning. The last few decades have seen great progress in tackling each of these problems in isolation. Only recently have researchers returned to the difficult task of considering them jointly. In this work, we consider learning a set of related models in such that they both solve their own problem and help each other. We develop a framework called Cascaded Classification Models (CCM), where repeated instantiations of these classifiers are coupled by their input/output variables in a cascade that improves performance at each level. Our method requires only a limited “black box” interface with the models, allowing us to use very sophisticated, state-of-the-art classifiers without having to look under the hood. We demonstrate the effectiveness of our method on a large set of natural images by combining the subtasks of scene categorization, object detection, multiclass image segmentation, and 3d reconstruction. 1

6 0.081258066 82 nips-2008-Fast Computation of Posterior Mode in Multi-Level Hierarchical Models

7 0.081224576 4 nips-2008-A Scalable Hierarchical Distributed Language Model

8 0.075693205 134 nips-2008-Mixed Membership Stochastic Blockmodels

9 0.074396394 142 nips-2008-Multi-Level Active Prediction of Useful Image Annotations for Recognition

10 0.071712814 26 nips-2008-Analyzing human feature learning as nonparametric Bayesian inference

11 0.070470929 97 nips-2008-Hierarchical Fisher Kernels for Longitudinal Data

12 0.067908868 207 nips-2008-Shape-Based Object Localization for Descriptive Classification

13 0.067753866 246 nips-2008-Unsupervised Learning of Visual Sense Models for Polysemous Words

14 0.062665254 226 nips-2008-Supervised Dictionary Learning

15 0.060815964 228 nips-2008-Support Vector Machines with a Reject Option

16 0.058213819 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data

17 0.058136869 199 nips-2008-Risk Bounds for Randomized Sample Compressed Classifiers

18 0.057382595 16 nips-2008-Adaptive Template Matching with Shift-Invariant Semi-NMF

19 0.055989619 15 nips-2008-Adaptive Martingale Boosting

20 0.055126715 62 nips-2008-Differentiable Sparse Coding


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.173), (1, -0.071), (2, 0.092), (3, -0.106), (4, -0.041), (5, -0.005), (6, 0.016), (7, -0.02), (8, 0.045), (9, 0.028), (10, 0.015), (11, 0.074), (12, -0.023), (13, -0.06), (14, -0.027), (15, 0.109), (16, 0.006), (17, -0.143), (18, 0.041), (19, 0.017), (20, 0.038), (21, 0.008), (22, 0.101), (23, 0.005), (24, -0.111), (25, -0.011), (26, -0.052), (27, -0.116), (28, -0.045), (29, -0.04), (30, 0.127), (31, 0.001), (32, 0.024), (33, -0.013), (34, 0.1), (35, -0.065), (36, -0.033), (37, 0.19), (38, -0.004), (39, -0.029), (40, 0.025), (41, -0.003), (42, 0.005), (43, -0.018), (44, -0.011), (45, -0.05), (46, -0.006), (47, 0.042), (48, -0.023), (49, 0.128)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95803726 36 nips-2008-Beyond Novelty Detection: Incongruent Events, when General and Specific Classifiers Disagree

Author: Daphna Weinshall, Hynek Hermansky, Alon Zweig, Jie Luo, Holly Jimison, Frank Ohl, Misha Pavel

Abstract: Unexpected stimuli are a challenge to any machine learning algorithm. Here we identify distinct types of unexpected events, focusing on ’incongruent events’ when ’general level’ and ’specific level’ classifiers give conflicting predictions. We define a formal framework for the representation and processing of incongruent events: starting from the notion of label hierarchy, we show how partial order on labels can be deduced from such hierarchies. For each event, we compute its probability in different ways, based on adjacent levels (according to the partial order) in the label hierarchy. An incongruent event is an event where the probability computed based on some more specific level (in accordance with the partial order) is much smaller than the probability computed based on some more general level, leading to conflicting predictions. We derive algorithms to detect incongruent events from different types of hierarchies, corresponding to class membership or part membership. Respectively, we show promising results with real data on two specific problems: Out Of Vocabulary words in speech recognition, and the identification of a new sub-class (e.g., the face of a new individual) in audio-visual facial object recognition.

2 0.68313748 130 nips-2008-MCBoost: Multiple Classifier Boosting for Perceptual Co-clustering of Images and Visual Features

Author: Tae-kyun Kim, Roberto Cipolla

Abstract: We present a new co-clustering problem of images and visual features. The problem involves a set of non-object images in addition to a set of object images and features to be co-clustered. Co-clustering is performed in a way that maximises discrimination of object images from non-object images, thus emphasizing discriminative features. This provides a way of obtaining perceptual joint-clusters of object images and features. We tackle the problem by simultaneously boosting multiple strong classifiers which compete for images by their expertise. Each boosting classifier is an aggregation of weak-learners, i.e. simple visual features. The obtained classifiers are useful for object detection tasks which exhibit multimodalities, e.g. multi-category and multi-view object detection tasks. Experiments on a set of pedestrian images and a face data set demonstrate that the method yields intuitive image clusters with associated features and is much superior to conventional boosting classifiers in object detection tasks. 1

3 0.6755814 42 nips-2008-Cascaded Classification Models: Combining Models for Holistic Scene Understanding

Author: Geremy Heitz, Stephen Gould, Ashutosh Saxena, Daphne Koller

Abstract: One of the original goals of computer vision was to fully understand a natural scene. This requires solving several sub-problems simultaneously, including object detection, region labeling, and geometric reasoning. The last few decades have seen great progress in tackling each of these problems in isolation. Only recently have researchers returned to the difficult task of considering them jointly. In this work, we consider learning a set of related models in such that they both solve their own problem and help each other. We develop a framework called Cascaded Classification Models (CCM), where repeated instantiations of these classifiers are coupled by their input/output variables in a cascade that improves performance at each level. Our method requires only a limited “black box” interface with the models, allowing us to use very sophisticated, state-of-the-art classifiers without having to look under the hood. We demonstrate the effectiveness of our method on a large set of natural images by combining the subtasks of scene categorization, object detection, multiclass image segmentation, and 3d reconstruction. 1

4 0.60927629 191 nips-2008-Recursive Segmentation and Recognition Templates for 2D Parsing

Author: Leo Zhu, Yuanhao Chen, Yuan Lin, Chenxi Lin, Alan L. Yuille

Abstract: Language and image understanding are two major goals of artificial intelligence which can both be conceptually formulated in terms of parsing the input signal into a hierarchical representation. Natural language researchers have made great progress by exploiting the 1D structure of language to design efficient polynomialtime parsing algorithms. By contrast, the two-dimensional nature of images makes it much harder to design efficient image parsers and the form of the hierarchical representations is also unclear. Attempts to adapt representations and algorithms from natural language have only been partially successful. In this paper, we propose a Hierarchical Image Model (HIM) for 2D image parsing which outputs image segmentation and object recognition. This HIM is represented by recursive segmentation and recognition templates in multiple layers and has advantages for representation, inference, and learning. Firstly, the HIM has a coarse-to-fine representation which is capable of capturing long-range dependency and exploiting different levels of contextual information. Secondly, the structure of the HIM allows us to design a rapid inference algorithm, based on dynamic programming, which enables us to parse the image rapidly in polynomial time. Thirdly, we can learn the HIM efficiently in a discriminative manner from a labeled dataset. We demonstrate that HIM outperforms other state-of-the-art methods by evaluation on the challenging public MSRC image dataset. Finally, we sketch how the HIM architecture can be extended to model more complex image phenomena. 1

5 0.58147156 109 nips-2008-Interpreting the neural code with Formal Concept Analysis

Author: Dominik Endres, Peter Foldiak

Abstract: We propose a novel application of Formal Concept Analysis (FCA) to neural decoding: instead of just trying to figure out which stimulus was presented, we demonstrate how to explore the semantic relationships in the neural representation of large sets of stimuli. FCA provides a way of displaying and interpreting such relationships via concept lattices. We explore the effects of neural code sparsity on the lattice. We then analyze neurophysiological data from high-level visual cortical area STSa, using an exact Bayesian approach to construct the formal context needed by FCA. Prominent features of the resulting concept lattices are discussed, including hierarchical face representation and indications for a product-of-experts code in real neurons. 1

6 0.58087707 15 nips-2008-Adaptive Martingale Boosting

7 0.53676075 82 nips-2008-Fast Computation of Posterior Mode in Multi-Level Hierarchical Models

8 0.50045186 246 nips-2008-Unsupervised Learning of Visual Sense Models for Polysemous Words

9 0.4773638 4 nips-2008-A Scalable Hierarchical Distributed Language Model

10 0.47631705 27 nips-2008-Artificial Olfactory Brain for Mixture Identification

11 0.47162741 186 nips-2008-Probabilistic detection of short events, with application to critical care monitoring

12 0.46088558 207 nips-2008-Shape-Based Object Localization for Descriptive Classification

13 0.45945671 98 nips-2008-Hierarchical Semi-Markov Conditional Random Fields for Recursive Sequential Data

14 0.44897979 142 nips-2008-Multi-Level Active Prediction of Useful Image Annotations for Recognition

15 0.43492076 41 nips-2008-Breaking Audio CAPTCHAs

16 0.42438602 162 nips-2008-On the Design of Loss Functions for Classification: theory, robustness to outliers, and SavageBoost

17 0.41415954 67 nips-2008-Effects of Stimulus Type and of Error-Correcting Code Design on BCI Speller Performance

18 0.39564818 5 nips-2008-A Transductive Bound for the Voted Classifier with an Application to Semi-supervised Learning

19 0.39499483 228 nips-2008-Support Vector Machines with a Reject Option

20 0.39457384 26 nips-2008-Analyzing human feature learning as nonparametric Bayesian inference


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(6, 0.055), (7, 0.079), (12, 0.052), (16, 0.318), (28, 0.133), (57, 0.085), (59, 0.022), (63, 0.019), (71, 0.015), (77, 0.036), (78, 0.04), (83, 0.064)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.75695789 36 nips-2008-Beyond Novelty Detection: Incongruent Events, when General and Specific Classifiers Disagree

Author: Daphna Weinshall, Hynek Hermansky, Alon Zweig, Jie Luo, Holly Jimison, Frank Ohl, Misha Pavel

Abstract: Unexpected stimuli are a challenge to any machine learning algorithm. Here we identify distinct types of unexpected events, focusing on ’incongruent events’ when ’general level’ and ’specific level’ classifiers give conflicting predictions. We define a formal framework for the representation and processing of incongruent events: starting from the notion of label hierarchy, we show how partial order on labels can be deduced from such hierarchies. For each event, we compute its probability in different ways, based on adjacent levels (according to the partial order) in the label hierarchy. An incongruent event is an event where the probability computed based on some more specific level (in accordance with the partial order) is much smaller than the probability computed based on some more general level, leading to conflicting predictions. We derive algorithms to detect incongruent events from different types of hierarchies, corresponding to class membership or part membership. Respectively, we show promising results with real data on two specific problems: Out Of Vocabulary words in speech recognition, and the identification of a new sub-class (e.g., the face of a new individual) in audio-visual facial object recognition.

2 0.676853 145 nips-2008-Multi-stage Convex Relaxation for Learning with Sparse Regularization

Author: Tong Zhang

Abstract: We study learning formulations with non-convex regularizaton that are natural for sparse linear models. There are two approaches to this problem: • Heuristic methods such as gradient descent that only find a local minimum. A drawback of this approach is the lack of theoretical guarantee showing that the local minimum gives a good solution. • Convex relaxation such as L1 -regularization that solves the problem under some conditions. However it often leads to sub-optimal sparsity in reality. This paper tries to remedy the above gap between theory and practice. In particular, we investigate a multi-stage convex relaxation scheme for solving problems with non-convex regularization. Theoretically, we analyze the behavior of a resulting two-stage relaxation scheme for the capped-L1 regularization. Our performance bound shows that the procedure is superior to the standard L1 convex relaxation for learning sparse targets. Experiments confirm the effectiveness of this method on some simulation and real data. 1

3 0.66582233 173 nips-2008-Optimization on a Budget: A Reinforcement Learning Approach

Author: Paul L. Ruvolo, Ian Fasel, Javier R. Movellan

Abstract: Many popular optimization algorithms, like the Levenberg-Marquardt algorithm (LMA), use heuristic-based “controllers” that modulate the behavior of the optimizer during the optimization process. For example, in the LMA a damping parameter λ is dynamically modified based on a set of rules that were developed using heuristic arguments. Reinforcement learning (RL) is a machine learning approach to learn optimal controllers from examples and thus is an obvious candidate to improve the heuristic-based controllers implicit in the most popular and heavily used optimization algorithms. Improving the performance of off-the-shelf optimizers is particularly important for time-constrained optimization problems. For example the LMA algorithm has become popular for many real-time computer vision problems, including object tracking from video, where only a small amount of time can be allocated to the optimizer on each incoming video frame. Here we show that a popular modern reinforcement learning technique using a very simple state space can dramatically improve the performance of general purpose optimizers, like the LMA. Surprisingly the controllers learned for a particular domain also work well in very different optimization domains. For example we used RL methods to train a new controller for the damping parameter of the LMA. This controller was trained on a collection of classic, relatively small, non-linear regression problems. The modified LMA performed better than the standard LMA on these problems. This controller also dramatically outperformed the standard LMA on a difficult computer vision problem for which it had not been trained. Thus the controller appeared to have extracted control rules that were not just domain specific but generalized across a range of optimization domains. 1

4 0.56215155 14 nips-2008-Adaptive Forward-Backward Greedy Algorithm for Sparse Learning with Linear Models

Author: Tong Zhang

Abstract: Consider linear prediction models where the target function is a sparse linear combination of a set of basis functions. We are interested in the problem of identifying those basis functions with non-zero coefficients and reconstructing the target function from noisy observations. Two heuristics that are widely used in practice are forward and backward greedy algorithms. First, we show that neither idea is adequate. Second, we propose a novel combination that is based on the forward greedy algorithm but takes backward steps adaptively whenever beneficial. We prove strong theoretical results showing that this procedure is effective in learning sparse representations. Experimental results support our theory. 1

5 0.53610218 99 nips-2008-High-dimensional support union recovery in multivariate regression

Author: Guillaume R. Obozinski, Martin J. Wainwright, Michael I. Jordan

Abstract: We study the behavior of block 1 / 2 regularization for multivariate regression, where a K-dimensional response vector is regressed upon a fixed set of p covariates. The problem of support union recovery is to recover the subset of covariates that are active in at least one of the regression problems. Studying this problem under high-dimensional scaling (where the problem parameters as well as sample size n tend to infinity simultaneously), our main result is to show that exact recovery is possible once the order parameter given by θ 1 / 2 (n, p, s) : = n/[2ψ(B ∗ ) log(p − s)] exceeds a critical threshold. Here n is the sample size, p is the ambient dimension of the regression model, s is the size of the union of supports, and ψ(B ∗ ) is a sparsity-overlap function that measures a combination of the sparsities and overlaps of the K-regression coefficient vectors that constitute the model. This sparsity-overlap function reveals that block 1 / 2 regularization for multivariate regression never harms performance relative to a naive 1 -approach, and can yield substantial improvements in sample complexity (up to a factor of K) when the regression vectors are suitably orthogonal relative to the design. We complement our theoretical results with simulations that demonstrate the sharpness of the result, even for relatively small problems. 1

6 0.53537685 202 nips-2008-Robust Regression and Lasso

7 0.52829736 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data

8 0.52731377 21 nips-2008-An Homotopy Algorithm for the Lasso with Online Observations

9 0.52387196 66 nips-2008-Dynamic visual attention: searching for coding length increments

10 0.52362722 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization

11 0.52217591 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning

12 0.52107388 194 nips-2008-Regularized Learning with Networks of Features

13 0.52055866 62 nips-2008-Differentiable Sparse Coding

14 0.52004582 95 nips-2008-Grouping Contours Via a Related Image

15 0.51998389 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations

16 0.51898128 222 nips-2008-Stress, noradrenaline, and realistic prediction of mouse behaviour using reinforcement learning

17 0.51871777 200 nips-2008-Robust Kernel Principal Component Analysis

18 0.51861227 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks

19 0.51787496 42 nips-2008-Cascaded Classification Models: Combining Models for Holistic Scene Understanding

20 0.51723522 4 nips-2008-A Scalable Hierarchical Distributed Language Model