nips nips2003 nips2003-153 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Cynthia Archer, Todd K. Leen, António M. Baptista
Abstract: As part of an environmental observation and forecasting system, sensors deployed in the Columbia River Estuary (CORIE) gather information on physical dynamics and changes in estuary habitat. Of these, salinity sensors are particularly susceptible to bio-fouling, which gradually degrades sensor response and corrupts critical data. Automatic fault detectors have the capability to identify bio-fouling early and minimize data loss. Complicating the development of discriminatory classifiers is the scarcity of bio-fouling onset examples and the variability of the bio-fouling signature. To solve these problems, we take a novelty detection approach that incorporates a parameterized bio-fouling model. These detectors identify the occurrence of bio-fouling, and its onset time, as reliably as human experts. Real-time detectors installed during the summer of 2001 produced no false alarms, yet detected all episodes of sensor degradation before the field staff scheduled these sensors for cleaning. From this initial deployment through February 2003, our bio-fouling detectors have essentially doubled the amount of useful data coming from the CORIE sensors. 1
Reference: text
sentIndex sentText sentNum sentScore
1 As part of an environmental observation and forecasting system, sensors deployed in the Columbia River Estuary (CORIE) gather information on physical dynamics and changes in estuary habitat. [sent-10, score-0.548]
2 Of these, salinity sensors are particularly susceptible to bio-fouling, which gradually degrades sensor response and corrupts critical data. [sent-11, score-1.095]
3 Automatic fault detectors have the capability to identify bio-fouling early and minimize data loss. [sent-12, score-0.286]
4 Complicating the development of discriminatory classifiers is the scarcity of bio-fouling onset examples and the variability of the bio-fouling signature. [sent-13, score-0.264]
5 These detectors identify the occurrence of bio-fouling, and its onset time as reliably as human experts. [sent-15, score-0.38]
6 Real-time detectors installed during the summer of 2001 produced no false alarms, yet detected all episodes of sensor degradation before the field staff scheduled these sensors for cleaning. [sent-16, score-0.923]
7 From this initial deployment through February 2003, our bio-fouling detectors have essentially doubled the amount of useful data coming from the CORIE sensors. [sent-17, score-0.342]
8 This system uses data from sensors deployed throughout the estuary (Figure 1) to calibrate and verify numerical models of circulation and material transport. [sent-20, score-0.488]
9 CORIE scientists use these models to predict and evaluate the effects of development on the estuary environment (e. [sent-21, score-0.22]
10 CORIE salinity sensors deployed in the estuary lose several months of data every year due to sensor degradation. [sent-24, score-1.414]
11 The most common form of salinity sensor degradation is bio-fouling, a reduction of the sensor response due to growth of biological material on the sensor. [sent-26, score-1.384]
12 Prior to the deployment of the technology described here, CORIE salinity sensors suffered a 68% yearly data loss due to bio-fouling. [sent-27, score-0.974]
13 Although bio-fouling degradation is a common problem for environmental sensors, there is apparently no previous work that develops automatic detectors of such degradation. [sent-28, score-0.378]
14 Figure 1: Map of Columbia River estuary marked with locations of CORIE sensors. [sent-29, score-0.22]
15 Early bio-fouling detection is made difficult by the normal variability of salinity measurements. [sent-30, score-0.742]
16 Tides cause the measurements to vary from near river salinity to near ocean salinity twice a day. [sent-31, score-1.645]
17 The temporal pattern of salinity penetration varies spatially in the estuary. [sent-32, score-0.645]
18 The time from onset to complete bio-fouling can take anywhere from 3 weeks to 5 months depending on the season and type of growth. [sent-38, score-0.254]
19-20 …(e.g., barnacles), characterized by quick linear degradation, and soft growth (e.g., plant material), characterized by slow linear degradation with occasional interruptions in the downtrend. [sent-41, score-0.184] [sent-43, score-0.178]
21 Figure 2 illustrates tidal variations in salinity and the effect that bio-fouling has on these measurements. [sent-44, score-0.724]
22 It contains salinity time series, in practical salinity units (psu), from two sensors mounted at the Red26 station (Figure 1). [sent-45, score-1.529]
23 The upper trace, from sensor CT1460, contains only clean measurements. [sent-46, score-0.419]
24 The lower trace, from sensor CT1448, contains both clean and bio-fouled measurements. [sent-47, score-0.419]
25 The first halves of the two time series are similar, but beginning on September 28th the salinity measurements diverge. [sent-48, score-0.752]
26 The primary challenge to our work is to detect the degradation quickly, ideally within several diurnal cycles. [sent-50, score-0.214]
27 The upper time series is from clean instrument CT1460. [sent-53, score-0.256]
28 The lower time series from instrument CT1448 shows degradation beginning on September 28, 2001. [sent-54, score-0.274]
29 …deployed in the estuary, and it is this onset that we must detect. [sent-56, score-0.239]
30 The dearth of onset examples, together with the observed variability of the bio-fouling signature (spatially, seasonally, and weekly, according to the spring/neap tidal cycle), prevents the use of classical discriminatory fault detectors. [sent-57, score-0.48]
31 Instead, we develop a parameterized novelty detector for bio-fouling. [sent-58, score-0.214]
32 The parameters in the model of bio-fouled sensor behavior are fit on-line by maximum-likelihood estimation. [sent-60, score-0.263]
33 A model of the clean sensor behavior is fit to archival data. [sent-61, score-0.45]
34 These models are used in a sequential likelihood test to detect bio-fouling and to estimate the time at which the degradation began. [sent-62, score-0.267]
35 Evaluations show that our detectors identify the onset of bio-fouling as reliably as human experts, and frequently flag it within fewer tidal cycles of the onset. [sent-63, score-0.435]
36 Our deployment of detectors throughout the estuary has reduced the actual data loss from 68% to 35%. [sent-64, score-0.571]
37 Were it economical to replace sensors immediately upon detection of degradation, the data loss would have been reduced to 17%. [sent-66, score-0.301]
38 2 Salinity and Temperature. Our detectors monitor maximum diurnal (md) salinity, defined as the maximum salinity near one of the two diurnal tidal floods. [sent-67, score-1.104]
39 When the sensor is clean, the md salinity stays close to some mean value, with occasional dips of several psu caused by variations in the intrusion of salt water into the estuary. [sent-68, score-1.014]
40 When the sensor bio-fouls, the md salinity gradually decreases, typically to less than half its normal mean value, as seen in the Figure 2 example. [sent-69, score-0.956]
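As a concrete illustration of the quantity the detectors monitor, the sketch below extracts an md-salinity series from raw measurements. The one-peak-per-day simplification and the function name are our assumptions, not the authors' code.

```python
import numpy as np

def max_diurnal_salinity(salinity, samples_per_day):
    # Maximum diurnal (md) salinity: the peak value in each diurnal
    # window, approximating the maximum near a diurnal tidal flood.
    # `salinity` is a 1-D array of evenly spaced measurements in psu.
    n_days = len(salinity) // samples_per_day
    windows = salinity[:n_days * samples_per_day].reshape(n_days, samples_per_day)
    return windows.max(axis=1)  # one md value per diurnal cycle
```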
41 Detectors that monitor salinity alone cannot distinguish between normal decreases in salinity and early bio-fouling. [sent-70, score-1.38]
42 Natural salinity decreases can be recognized by monitoring a correlated source of information that is not corrupted by bio-fouling. [sent-72, score-0.682]
43 Salinity and temperature at a station are products of the same mixing process of ocean and river waters, so we expect these values to be correlated. [sent-73, score-0.395]
44 Assuming linear mixing of ocean and river waters, the measured salinity S_m and temperature T_m are linear functions of the ocean {S_o, T_o} and river {S_r, T_r} values: S_m = α(t) S_o + (1 − α(t)) S_r (1) and T_m = α(t) T_o + (1 − α(t)) T_r (2), where α(t) is the mixing coefficient at time t. [sent-74, score-1.367]
45 The river temperature is measured at far upstream stations (Elliot or Woody). [sent-77, score-0.306]
46 The ocean temperature is estimated from measurements at Sand Island, the outermost sensor station. [sent-78, score-0.483]
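Since temperature is unaffected by bio-fouling, Eq. (2) can be inverted to track the mixing coefficient from the measured, river, and ocean temperatures. A minimal sketch follows; the clipping and the zero-denominator guard are our additions, not part of the paper.

```python
import numpy as np

def mixing_coefficient(t_measured, t_river, t_ocean, eps=1e-6):
    # Invert Eq. (2): Tm = a*To + (1 - a)*Tr  =>  a = (Tm - Tr)/(To - Tr).
    # Valid even on a bio-fouled sensor, since fouling corrupts salinity
    # but not temperature.
    denom = np.where(np.abs(t_ocean - t_river) < eps, np.nan, t_ocean - t_river)
    alpha = (t_measured - t_river) / denom
    return np.clip(alpha, 0.0, 1.0)  # a mixing fraction lies in [0, 1]
```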
47 3 Bio-fouling Detection. Our early experiments with single-measurement detection suggested that we develop detectors that accrue information over time, similar to the standard sequential likelihood methods in classical pattern recognition. [sent-79, score-0.323]
48 We construct probability densities for such sequences for both clean sensors, p(y_1, …, y_N | c), and bio-fouled sensors, p(y_1, …, y_N | f). [sent-85, score-0.318]
49 The likelihood ratio test p(y_1, …, y_N | f) / p(y_1, …, y_N | c) ≷ λ (4) declares fouling when the ratio exceeds λ, where the threshold λ is chosen high enough to provide a specified false alarm rate (Neyman-Pearson test). [sent-98, score-0.261]
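A minimal sketch of the test in (4), using the per-sample factorization that (6) makes explicit; the callable interface is hypothetical.

```python
def sequential_llr_test(y, logp_fouled, logp_clean, lam):
    # Accumulate per-sample log-likelihood ratios and declare bio-fouling
    # when the sum h crosses the Neyman-Pearson threshold lam.
    h = sum(logp_fouled(yn) - logp_clean(yn) for yn in y)
    return h, h > lam
```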
50 We assume that the probability density for the measurement sequence for fouled detectors is parameterized by a vector of unknown parameters θ. [sent-99, score-0.314]
51 The model is constructed such that at θ = 0 the density for the sequence assuming a fouled sensor equals the density assuming a clean sensor: p(y_1, …, y_N | θ = 0, f) = p(y_1, …, y_N | c) (5). [sent-100, score-0.417]
52 (Footnote 1) Equivalently, if the alarm threshold is increased to maintain a low false alarm rate, the rate of proper detections is decreased. [sent-108, score-0.396]
53 Taking the logarithm and introducing the onset time τ, the test becomes h = ln [ p(y_1, …, y_N | τ, θ, f) / p(y_1, …, y_N | c) ] = Σ_{n=τ..N} ln [ p(y_n | τ, θ, f) / p(y_n | c) ] ≷ λ (6). Finally, we fit the fouling model parameters θ and the onset time τ by maximizing the log-likelihood ln p(y_1, …, y_N | τ, θ, f). [sent-124, score-0.327]
54 Since the clean detector model is independent of τ and θ, this is equivalent to maximizing the log-likelihood ratio in (6). [sent-128, score-0.263]
55 Hence, we replace the latter with h = max_{τ,θ} Σ_{n=τ..N} ln [ p(y_n | τ, θ, f) / p(y_n | c) ] ≷ λ (7). If the sequence comes from a clean sensor, the fit should give θ ≈ 0 and hence h ≈ 0 (cf. (5)), and we will detect no event (assuming λ > 0). [sent-129, score-0.224]
56 3.1 Bio-fouling Fault Model. By parameterizing the bio-fouling model, we are able to develop detectors using only clean example data. [sent-132, score-0.328]
57 The joint Gaussian model of md salinity and mixing coefficient (8) provides a regression of the salinity on α. [sent-136, score-0.645]
58 The model of the measured md salinity value for a fouled sensor is x_n = g(n) s_n (11), where the suppression factor g(n) equals 1 for n < τ and 1 − m(n − τ) for n ≥ τ (12), and m is the bio-fouling rate (1/sec). [sent-139, score-0.916]
59 The discriminant function in (7) depends on the parameters of the clean model (9) and (10) which are estimated from historical data. [sent-141, score-0.246]
60 It also depends on the slope parameter θ = m of the fouling model and the onset time τ, which are fit online as per (7). [sent-142, score-0.239]
61 Applying our Gaussian models in (8) and (13) to (7) gives h = max_{τ,m} Σ_{n=τ..N} [ ln( 1 / (1 − m(n − τ)) ) + (x_n − η_n)² / (2ρ²) − (x_n − (1 − m(n − τ)) η_n)² / (2 (1 − m(n − τ))² ρ²) ] (14). When h is above our chosen threshold, the detector signals a bio-fouled sensor. [sent-143, score-0.198]
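The discriminant in (14) can be evaluated numerically. The sketch below grid-searches the fouling rate m for each candidate onset time τ; the paper instead solves for m by setting the derivative of (14) to zero, so the grid search is our simplification.

```python
import numpy as np

def fouling_discriminant(x, eta, rho, m_grid):
    # Maximize the summed log-likelihood ratio of Eq. (14) over onset
    # time tau and bio-fouling rate m.
    #   x:   md salinity measurements x_1..x_N
    #   eta: clean-model conditional means (regression on alpha)
    #   rho: clean-model conditional standard deviation
    N = len(x)
    best = (0.0, None, 0.0)  # (h, tau, m); h = 0 corresponds to a clean fit
    for tau in range(N):
        n = np.arange(tau, N)
        for m in m_grid:
            g = 1.0 - m * (n - tau)  # suppression factor, Eq. (12)
            if np.any(g <= 0.0):     # salinity fully suppressed; model invalid
                continue
            h = np.sum(np.log(1.0 / g)
                       + (x[tau:] - eta[tau:]) ** 2 / (2.0 * rho ** 2)
                       - (x[tau:] - g * eta[tau:]) ** 2
                       / (2.0 * g ** 2 * rho ** 2))
            if h > best[0]:
                best = (h, tau, m)
    return best  # signal bio-fouling when best[0] exceeds lambda
```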
62 The threshold λ is set to provide a maximum false alarm rate on historical data. [sent-144, score-0.311]
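One plausible reading of this calibration step, sketched here: evaluate the discriminant on clean archival windows and place λ at the largest clean value for a zero historical false alarm rate, or at a quantile for a nonzero allowed rate. The quantile variant is our assumption.

```python
import numpy as np

def calibrate_threshold(h_clean, false_alarm_rate=0.0):
    # h_clean: discriminant values h computed on clean historical data.
    h = np.asarray(h_clean, dtype=float)
    if false_alarm_rate <= 0.0:
        return h.max()  # largest clean value: no historical false alarms
    return np.quantile(h, 1.0 - false_alarm_rate)
```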
63 3.2 Model Fitting. We find maximum likelihood estimates for μ and Σ from clean archival time series data. [sent-146, score-0.265]
64 For y_n = [s_n, α_n]^T and N training values, the mean is given by μ = (1/N) Σ_n y_n and the covariance matrix by Σ = (1/N) Σ_n (y_n − μ)(y_n − μ)^T. [sent-147, score-0.278]
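A sketch of this fitting step, together with the standard Gaussian conditional that yields the regression quantities η_n and ρ used in (14); the helper names are ours.

```python
import numpy as np

def fit_clean_model(s, alpha):
    # ML Gaussian fit to clean training pairs y_n = [s_n, alpha_n]:
    # mu = (1/N) sum_n y_n,  Sigma = (1/N) sum_n (y_n - mu)(y_n - mu)^T.
    y = np.column_stack([s, alpha])
    mu = y.mean(axis=0)
    d = y - mu
    sigma = d.T @ d / len(y)  # ML (1/N) covariance, as in the text
    return mu, sigma

def clean_conditional(mu, sigma, alpha_n):
    # Regression of salinity on alpha via the Gaussian conditional:
    # eta = mu_s + (Sigma_sa / Sigma_aa) * (alpha_n - mu_a)
    # rho^2 = Sigma_ss - Sigma_sa^2 / Sigma_aa
    eta = mu[0] + sigma[0, 1] / sigma[1, 1] * (alpha_n - mu[1])
    rho = float(np.sqrt(sigma[0, 0] - sigma[0, 1] ** 2 / sigma[1, 1]))
    return eta, rho
```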
65 At each time step N, we determine the maximum likelihood estimate of onset time τ and bio-fouling rate m from the data under test. [sent-149, score-0.296]
66 We find the maximum likelihood estimate of the bio-fouling rate m, for some onset time τ, by setting the first derivative of (14) with respect to m equal to zero. [sent-150, score-0.272]
67 If we choose the window length N − k that maximizes the log-likelihood ratio h, the best estimate of the onset time is the corresponding τ. [sent-158, score-0.236]
68 To determine the onset time estimate τ, we search over all past time for the value of k that maximizes h in (14). [sent-159, score-0.232]
69 The estimated onset time τ is the start of the window, of length N − k, that gives the largest value of h. [sent-164, score-0.208]
70 4 On-line Bio-fouling Detectors. To see how well our classifiers worked in practice, we implemented versions that operated on real-time salinity and temperature measurements. [sent-166, score-0.737]
71 For all four instances of sensor degradation (three bio-fouling incidents and one instrument failure that mimicked bio-fouling) that occurred in the summer 2001 test period, our classifiers correctly indicated a sensor problem before the field staff was aware of it. [sent-167, score-0.816]
72 In addition, the real-time classifiers produced no false alarms during the summer test period. [sent-168, score-0.175]
73 Dotted lines indicate historical no false alarm (lower) and 10% false alarm rate (upper). [sent-172, score-0.536]
74 Field staff schedule sensors for cleaning when the maximum salinity drops “too low”, roughly the no false alarm level. [sent-173, score-1.032]
75 Bottom plots show the sequential likelihood discriminant for forty days of salinity and temperature measurements. [sent-174, score-0.894]
76 Dotted lines indicate historical no false alarm (upper) and 10% false alarm rate (lower). [sent-175, score-0.536]
77 Figure 3 shows the on-line bio-fouling monitor during incidents at the Red26 CT1448 sensor and the Tansy Point CT1462 sensor. [sent-178, score-0.349]
78 Since we had another sensor mounted at the Red26 site that did not bio-foul (Figure 2), we were able to estimate the bio-fouling onset as September 28th. [sent-179, score-0.314]
79 Our detector discriminant passed the no false alarm threshold five days after onset and roughly three days before the field staff decided the instrument needed cleaning. [sent-180, score-0.734]
80 In addition, the onset time estimate of September 29th was within a day of the true onset time. [sent-182, score-0.392]
81 The Tansy Point CT1462 sensor began to bio-foul a few days after the Red26 CT1448 sensor. [sent-183, score-0.327]
82 Our detector indicated that the Tansy Point sensor was bio-fouling on October 9th . [sent-184, score-0.37]
83 Since neighboring sensor Red26 was being replaced on October 11th , the field staff decided to retrieve the Tansy Point sensor as well. [sent-185, score-0.526]
84 On removal, this sensor was found to be in the early stages of bio-fouling. [sent-186, score-0.298]
85 In this case, indications from our classifier permitted the sensor to be replaced before the field staff would normally have scheduled it for retrieval. [sent-187, score-0.29]
86 Experience with our on-line bio-fouling indicators demonstrates that these methods substantially reduce the time from bio-fouling onset to detection. [sent-188, score-0.266]
87 In addition to the events described above, we have fairly extensive experience with the online detectors since their initial deployment in the Spring of 2001. [sent-189, score-0.295]
88 At this writing we have bio-fouling detectors at all observing stations in the estuary and experience with events throughout the year. [sent-190, score-0.455]
89 Near the end of October 2001, we experienced a false alarm in a sensor near the surface in the lower estuary. [sent-191, score-0.515]
90 In a recent (February 2003) study of five sensor stations in the estuary, we compared data loss prior to the deployment of bio-fouling detectors with data loss post-deployment. [sent-194, score-0.735]
91 Neglecting seasonal variation, prior to the deployment of our detectors, 68% of all the sensor data was corrupted by bio-fouling. [sent-197, score-0.423]
92 Were it economical to replace the sensors immediately upon detection of bio-fouling, the data loss rate would have dropped further, to 17%. [sent-200, score-0.36]
93 Even with the delay in responding to event detection, the detectors have more than doubled the amount of reliable data collected from the estuary. [sent-201, score-0.243]
94 5 Discussion. CORIE salinity sensors lose several months of data every year due to sensor bio-fouling. [sent-202, score-1.139]
95 Developing discriminatory fault detectors for these sensors is hampered by the variability of the bio-fouling time-signature and the dearth of bio-fouling onset examples for training. [sent-203, score-0.708]
96 Clean sensor models are developed from archival data, while bio-fouled sensor models are given a simple parametric form that is fit online. [sent-205, score-0.573]
97 On-line bio-fouling detectors deployed during the summer of 2001 detected all episodes of sensor degradation several days before the field staff did, without generating any false alarms. [sent-206, score-0.853]
98 The expanded suite of detectors installed throughout the estuary continues to detect bio-fouling successfully with minimal false alarm intrusion. [sent-207, score-0.666]
99 The detector deployment has effectively doubled the amount of clean data available from the estuary salinity sensors. [sent-208, score-1.298]
100 Fault detection for salinity sensors in the Columbia River Estuary. [sent-235, score-0.871]
wordName wordTfidf (topN-words)
[('salinity', 0.645), ('sensor', 0.263), ('estuary', 0.22), ('onset', 0.184), ('river', 0.173), ('detectors', 0.172), ('sensors', 0.162), ('corie', 0.157), ('clean', 0.156), ('degradation', 0.151), ('yn', 0.139), ('alarm', 0.135), ('deployment', 0.123), ('detector', 0.107), ('sta', 0.096), ('ocean', 0.094), ('tansy', 0.094), ('temperature', 0.092), ('false', 0.09), ('fault', 0.079), ('tidal', 0.079), ('detection', 0.064), ('days', 0.064), ('archer', 0.063), ('baptista', 0.063), ('diurnal', 0.063), ('columbia', 0.06), ('summer', 0.058), ('environmental', 0.055), ('deployed', 0.055), ('novelty', 0.055), ('monitor', 0.055), ('parameterized', 0.052), ('historical', 0.05), ('instrument', 0.05), ('md', 0.048), ('biofouled', 0.047), ('discriminatory', 0.047), ('doubled', 0.047), ('fouled', 0.047), ('months', 0.046), ('loss', 0.044), ('ln', 0.044), ('measurement', 0.043), ('stations', 0.041), ('discriminant', 0.04), ('february', 0.037), ('corrupted', 0.037), ('rate', 0.036), ('mixing', 0.036), ('september', 0.035), ('early', 0.035), ('measurements', 0.034), ('sr', 0.033), ('date', 0.033), ('growth', 0.033), ('suppression', 0.033), ('variability', 0.033), ('archival', 0.031), ('biofouling', 0.031), ('dearth', 0.031), ('economical', 0.031), ('eofs', 0.031), ('fouling', 0.031), ('gather', 0.031), ('incidents', 0.031), ('precipitation', 0.031), ('psu', 0.031), ('sal', 0.031), ('slr', 0.031), ('wilkin', 0.031), ('tr', 0.029), ('material', 0.029), ('sm', 0.029), ('likelihood', 0.028), ('coe', 0.028), ('october', 0.028), ('near', 0.027), ('mounted', 0.027), ('signature', 0.027), ('scheduled', 0.027), ('indicators', 0.027), ('degrading', 0.027), ('alarms', 0.027), ('occasional', 0.027), ('suite', 0.027), ('tm', 0.027), ('eld', 0.027), ('series', 0.026), ('susceptible', 0.025), ('forecasting', 0.025), ('waters', 0.025), ('forty', 0.025), ('time', 0.024), ('event', 0.024), ('xk', 0.024), ('lose', 0.023), ('dropped', 0.023), ('beginning', 0.023), ('throughout', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 153 nips-2003-Parameterized Novelty Detectors for Environmental Sensor Monitoring
Author: Cynthia Archer, Todd K. Leen, António M. Baptista
Abstract: As part of an environmental observation and forecasting system, sensors deployed in the Columbia RIver Estuary (CORIE) gather information on physical dynamics and changes in estuary habitat. Of these, salinity sensors are particularly susceptible to biofouling, which gradually degrades sensor response and corrupts critical data. Automatic fault detectors have the capability to identify bio-fouling early and minimize data loss. Complicating the development of discriminatory classifiers is the scarcity of bio-fouling onset examples and the variability of the bio-fouling signature. To solve these problems, we take a novelty detection approach that incorporates a parameterized bio-fouling model. These detectors identify the occurrence of bio-fouling, and its onset time as reliably as human experts. Real-time detectors installed during the summer of 2001 produced no false alarms, yet detected all episodes of sensor degradation before the field staff scheduled these sensors for cleaning. From this initial deployment through February 2003, our bio-fouling detectors have essentially doubled the amount of useful data coming from the CORIE sensors. 1
2 0.1426466 55 nips-2003-Distributed Optimization in Adaptive Networks
Author: Ciamac C. Moallemi, Benjamin V. Roy
Abstract: We develop a protocol for optimizing dynamic behavior of a network of simple electronic components, such as a sensor network, an ad hoc network of mobile devices, or a network of communication switches. This protocol requires only local communication and simple computations which are distributed among devices. The protocol is scalable to large networks. As a motivating example, we discuss a problem involving optimization of power consumption, delay, and buffer overflow in a sensor network. Our approach builds on policy gradient methods for optimization of Markov decision processes. The protocol can be viewed as an extension of policy gradient methods to a context involving a team of agents optimizing aggregate performance through asynchronous distributed communication and computation. We establish that the dynamics of the protocol approximate the solution to an ordinary differential equation that follows the gradient of the performance objective. 1
3 0.11249862 192 nips-2003-Using the Forest to See the Trees: A Graphical Model Relating Features, Objects, and Scenes
Author: Kevin P. Murphy, Antonio Torralba, William T. Freeman
Abstract: Standard approaches to object detection focus on local patches of the image, and try to classify them as background or not. We propose to use the scene context (image as a whole) as an extra source of (global) information, to help resolve local ambiguities. We present a conditional random field for jointly solving the tasks of object detection and scene classification. 1
4 0.069709904 170 nips-2003-Self-calibrating Probability Forecasting
Author: Vladimir Vovk, Glenn Shafer, Ilia Nouretdinov
Abstract: In the problem of probability forecasting the learner’s goal is to output, given a training set and a new object, a suitable probability measure on the possible values of the new object’s label. An on-line algorithm for probability forecasting is said to be well-calibrated if the probabilities it outputs agree with the observed frequencies. We give a natural nonasymptotic formalization of the notion of well-calibratedness, which we then study under the assumption of randomness (the object/label pairs are independent and identically distributed). It turns out that, although no probability forecasting algorithm is automatically well-calibrated in our sense, there exists a wide class of algorithms for “multiprobability forecasting” (such algorithms are allowed to output a set, ideally very narrow, of probability measures) which satisfy this property; we call the algorithms in this class “Venn probability machines”. Our experimental results demonstrate that a 1-Nearest Neighbor Venn probability machine performs reasonably well on a standard benchmark data set, and one of our theoretical results asserts that a simple Venn probability machine asymptotically approaches the true conditional probabilities regardless, and without knowledge, of the true probability measure generating the examples.
5 0.061917994 109 nips-2003-Learning a Rare Event Detection Cascade by Direct Feature Selection
Author: Jianxin Wu, James M. Rehg, Matthew D. Mullin
Abstract: Face detection is a canonical example of a rare event detection problem, in which target patterns occur with much lower frequency than nontargets. Out of millions of face-sized windows in an input image, for example, only a few will typically contain a face. Viola and Jones recently proposed a cascade architecture for face detection which successfully addresses the rare event nature of the task. A central part of their method is a feature selection algorithm based on AdaBoost. We present a novel cascade learning algorithm based on forward feature selection which is two orders of magnitude faster than the Viola-Jones approach and yields classifiers of equivalent quality. This faster method could be used for more demanding classification tasks, such as on-line learning. 1
6 0.054405596 166 nips-2003-Reconstructing MEG Sources with Unknown Correlations
7 0.048868861 133 nips-2003-Mutual Boosting for Contextual Inference
8 0.047330901 53 nips-2003-Discriminating Deformable Shape Classes
9 0.046647433 35 nips-2003-Attractive People: Assembling Loose-Limbed Models using Non-parametric Belief Propagation
10 0.046530001 182 nips-2003-Subject-Independent Magnetoencephalographic Source Localization by a Multilayer Perceptron
11 0.043134853 154 nips-2003-Perception of the Structure of the Physical World Using Unknown Multimodal Sensors and Effectors
12 0.039176341 10 nips-2003-A Low-Power Analog VLSI Visual Collision Detector
13 0.033948105 101 nips-2003-Large Margin Classifiers: Convex Loss, Low Noise, and Convergence Rates
14 0.03267343 177 nips-2003-Simplicial Mixtures of Markov Chains: Distributed Modelling of Dynamic User Profiles
15 0.031989709 186 nips-2003-Towards Social Robots: Automatic Evaluation of Human-Robot Interaction by Facial Expression Classification
16 0.031681463 28 nips-2003-Application of SVMs for Colour Classification and Collision Detection with AIBO Robots
17 0.029840061 122 nips-2003-Margin Maximizing Loss Functions
18 0.029065644 138 nips-2003-Non-linear CCA and PCA by Alignment of Local Models
19 0.028456649 181 nips-2003-Statistical Debugging of Sampled Programs
20 0.028184118 76 nips-2003-GPPS: A Gaussian Process Positioning System for Cellular Networks
topicId topicWeight
[(0, -0.101), (1, 0.005), (2, 0.03), (3, -0.059), (4, -0.037), (5, -0.055), (6, 0.031), (7, -0.042), (8, -0.063), (9, 0.029), (10, -0.068), (11, -0.011), (12, 0.069), (13, 0.017), (14, 0.024), (15, -0.026), (16, -0.075), (17, 0.004), (18, -0.016), (19, 0.089), (20, 0.09), (21, 0.003), (22, 0.015), (23, 0.086), (24, -0.032), (25, -0.214), (26, 0.046), (27, 0.088), (28, -0.075), (29, 0.064), (30, 0.006), (31, -0.056), (32, -0.137), (33, 0.135), (34, -0.067), (35, 0.179), (36, 0.11), (37, 0.112), (38, 0.029), (39, 0.022), (40, -0.066), (41, -0.039), (42, 0.113), (43, 0.063), (44, 0.102), (45, -0.011), (46, -0.038), (47, 0.064), (48, 0.075), (49, 0.061)]
simIndex simValue paperId paperTitle
same-paper 1 0.96266192 153 nips-2003-Parameterized Novelty Detectors for Environmental Sensor Monitoring
Author: Cynthia Archer, Todd K. Leen, António M. Baptista
Abstract: As part of an environmental observation and forecasting system, sensors deployed in the Columbia RIver Estuary (CORIE) gather information on physical dynamics and changes in estuary habitat. Of these, salinity sensors are particularly susceptible to biofouling, which gradually degrades sensor response and corrupts critical data. Automatic fault detectors have the capability to identify bio-fouling early and minimize data loss. Complicating the development of discriminatory classifiers is the scarcity of bio-fouling onset examples and the variability of the bio-fouling signature. To solve these problems, we take a novelty detection approach that incorporates a parameterized bio-fouling model. These detectors identify the occurrence of bio-fouling, and its onset time as reliably as human experts. Real-time detectors installed during the summer of 2001 produced no false alarms, yet detected all episodes of sensor degradation before the field staff scheduled these sensors for cleaning. From this initial deployment through February 2003, our bio-fouling detectors have essentially doubled the amount of useful data coming from the CORIE sensors. 1
2 0.56802773 55 nips-2003-Distributed Optimization in Adaptive Networks
Author: Ciamac C. Moallemi, Benjamin V. Roy
Abstract: We develop a protocol for optimizing dynamic behavior of a network of simple electronic components, such as a sensor network, an ad hoc network of mobile devices, or a network of communication switches. This protocol requires only local communication and simple computations which are distributed among devices. The protocol is scalable to large networks. As a motivating example, we discuss a problem involving optimization of power consumption, delay, and buffer overflow in a sensor network. Our approach builds on policy gradient methods for optimization of Markov decision processes. The protocol can be viewed as an extension of policy gradient methods to a context involving a team of agents optimizing aggregate performance through asynchronous distributed communication and computation. We establish that the dynamics of the protocol approximate the solution to an ordinary differential equation that follows the gradient of the performance objective. 1
3 0.51043105 192 nips-2003-Using the Forest to See the Trees: A Graphical Model Relating Features, Objects, and Scenes
Author: Kevin P. Murphy, Antonio Torralba, William T. Freeman
Abstract: Standard approaches to object detection focus on local patches of the image, and try to classify them as background or not. We propose to use the scene context (image as a whole) as an extra source of (global) information, to help resolve local ambiguities. We present a conditional random field for jointly solving the tasks of object detection and scene classification. 1
4 0.42672127 182 nips-2003-Subject-Independent Magnetoencephalographic Source Localization by a Multilayer Perceptron
Author: Sung C. Jun, Barak A. Pearlmutter
Abstract: We describe a system that localizes a single dipole to reasonable accuracy from noisy magnetoencephalographic (MEG) measurements in real time. At its core is a multilayer perceptron (MLP) trained to map sensor signals and head position to dipole location. Including head position overcomes the previous need to retrain the MLP for each subject and session. The training dataset was generated by mapping randomly chosen dipoles and head positions through an analytic model and adding noise from real MEG recordings. After training, a localization took 0.7 ms with an average error of 0.90 cm. A few iterations of a Levenberg-Marquardt routine using the MLP’s output as its initial guess took 15 ms and improved the accuracy to 0.53 cm, only slightly above the statistical limits on accuracy imposed by the noise. We applied these methods to localize single dipole sources from MEG components isolated by blind source separation and compared the estimated locations to those generated by standard manually-assisted commercial software. 1
5 0.37954184 76 nips-2003-GPPS: A Gaussian Process Positioning System for Cellular Networks
Author: Anton Schwaighofer, Marian Grigoras, Volker Tresp, Clemens Hoffmann
Abstract: In this article, we present a novel approach to solving the localization problem in cellular networks. The goal is to estimate a mobile user’s position, based on measurements of the signal strengths received from network base stations. Our solution works by building Gaussian process models for the distribution of signal strengths, as obtained in a series of calibration measurements. In the localization stage, the user’s position can be estimated by maximizing the likelihood of received signal strengths with respect to the position. We investigate the accuracy of the proposed approach on data obtained within a large indoor cellular network. 1
6 0.37484139 133 nips-2003-Mutual Boosting for Contextual Inference
7 0.35200754 109 nips-2003-Learning a Rare Event Detection Cascade by Direct Feature Selection
8 0.33954838 166 nips-2003-Reconstructing MEG Sources with Unknown Correlations
9 0.32545206 53 nips-2003-Discriminating Deformable Shape Classes
10 0.30569601 54 nips-2003-Discriminative Fields for Modeling Spatial Dependencies in Natural Images
11 0.29287341 170 nips-2003-Self-calibrating Probability Forecasting
12 0.2707893 28 nips-2003-Application of SVMs for Colour Classification and Collision Detection with AIBO Robots
13 0.26305452 177 nips-2003-Simplicial Mixtures of Markov Chains: Distributed Modelling of Dynamic User Profiles
14 0.25872377 181 nips-2003-Statistical Debugging of Sampled Programs
15 0.25536624 35 nips-2003-Attractive People: Assembling Loose-Limbed Models using Non-parametric Belief Propagation
16 0.23947014 22 nips-2003-An Improved Scheme for Detection and Labelling in Johansson Displays
17 0.22285064 11 nips-2003-A Mixed-Signal VLSI for Real-Time Generation of Edge-Based Image Vectors
18 0.21412887 131 nips-2003-Modeling User Rating Profiles For Collaborative Filtering
19 0.21370348 138 nips-2003-Non-linear CCA and PCA by Alignment of Local Models
20 0.20567876 38 nips-2003-Autonomous Helicopter Flight via Reinforcement Learning
topicId topicWeight
[(0, 0.05), (11, 0.014), (29, 0.016), (30, 0.015), (35, 0.036), (49, 0.012), (53, 0.073), (60, 0.408), (71, 0.059), (76, 0.028), (85, 0.095), (91, 0.082)]
simIndex simValue paperId paperTitle
same-paper 1 0.81596667 153 nips-2003-Parameterized Novelty Detectors for Environmental Sensor Monitoring
Author: Cynthia Archer, Todd K. Leen, António M. Baptista
Abstract: As part of an environmental observation and forecasting system, sensors deployed in the Columbia RIver Estuary (CORIE) gather information on physical dynamics and changes in estuary habitat. Of these, salinity sensors are particularly susceptible to biofouling, which gradually degrades sensor response and corrupts critical data. Automatic fault detectors have the capability to identify bio-fouling early and minimize data loss. Complicating the development of discriminatory classifiers is the scarcity of bio-fouling onset examples and the variability of the bio-fouling signature. To solve these problems, we take a novelty detection approach that incorporates a parameterized bio-fouling model. These detectors identify the occurrence of bio-fouling, and its onset time as reliably as human experts. Real-time detectors installed during the summer of 2001 produced no false alarms, yet detected all episodes of sensor degradation before the field staff scheduled these sensors for cleaning. From this initial deployment through February 2003, our bio-fouling detectors have essentially doubled the amount of useful data coming from the CORIE sensors. 1
2 0.55303043 103 nips-2003-Learning Bounds for a Generalized Family of Bayesian Posterior Distributions
Author: Tong Zhang
Abstract: In this paper we obtain convergence bounds for the concentration of Bayesian posterior distributions (around the true distribution) using a novel method that simplifies and enhances previous results. Based on the analysis, we also introduce a generalized family of Bayesian posteriors, and show that the convergence behavior of these generalized posteriors is completely determined by the local prior structure around the true distribution. This important and surprising robustness property does not hold for the standard Bayesian posterior in that it may not concentrate when there exist “bad” prior structures even at places far away from the true distribution. 1
3 0.37795582 109 nips-2003-Learning a Rare Event Detection Cascade by Direct Feature Selection
Author: Jianxin Wu, James M. Rehg, Matthew D. Mullin
Abstract: Face detection is a canonical example of a rare event detection problem, in which target patterns occur with much lower frequency than nontargets. Out of millions of face-sized windows in an input image, for example, only a few will typically contain a face. Viola and Jones recently proposed a cascade architecture for face detection which successfully addresses the rare event nature of the task. A central part of their method is a feature selection algorithm based on AdaBoost. We present a novel cascade learning algorithm based on forward feature selection which is two orders of magnitude faster than the Viola-Jones approach and yields classifiers of equivalent quality. This faster method could be used for more demanding classification tasks, such as on-line learning. 1
4 0.37792379 3 nips-2003-AUC Optimization vs. Error Rate Minimization
Author: Corinna Cortes, Mehryar Mohri
Abstract: The area under an ROC curve (AUC) is a criterion used in many applications to measure the quality of a classification algorithm. However, the objective function optimized in most of these algorithms is the error rate and not the AUC value. We give a detailed statistical analysis of the relationship between the AUC and the error rate, including the first exact expression of the expected value and the variance of the AUC for a fixed error rate. Our results show that the average AUC is monotonically increasing as a function of the classification accuracy, but that the standard deviation for uneven distributions and higher error rates is noticeable. Thus, algorithms designed to minimize the error rate may not lead to the best possible AUC values. We show that, under certain conditions, the global function optimized by the RankBoost algorithm is exactly the AUC. We report the results of our experiments with RankBoost in several datasets demonstrating the benefits of an algorithm specifically designed to globally optimize the AUC over other existing algorithms optimizing an approximation of the AUC or only locally optimizing the AUC. 1
5 0.37585938 192 nips-2003-Using the Forest to See the Trees: A Graphical Model Relating Features, Objects, and Scenes
Author: Kevin P. Murphy, Antonio Torralba, William T. Freeman
Abstract: Standard approaches to object detection focus on local patches of the image, and try to classify them as background or not. We propose to use the scene context (image as a whole) as an extra source of (global) information, to help resolve local ambiguities. We present a conditional random field for jointly solving the tasks of object detection and scene classification. 1
6 0.37473863 20 nips-2003-All learning is Local: Multi-agent Learning in Global Reward Games
7 0.37118816 147 nips-2003-Online Learning via Global Feedback for Phrase Recognition
8 0.37044317 124 nips-2003-Max-Margin Markov Networks
9 0.36774379 54 nips-2003-Discriminative Fields for Modeling Spatial Dependencies in Natural Images
10 0.36773923 28 nips-2003-Application of SVMs for Colour Classification and Collision Detection with AIBO Robots
11 0.36720598 78 nips-2003-Gaussian Processes in Reinforcement Learning
12 0.36582008 113 nips-2003-Learning with Local and Global Consistency
13 0.36564505 41 nips-2003-Boosting versus Covering
14 0.36530027 57 nips-2003-Dynamical Modeling with Kernels for Nonlinear Time Series Prediction
15 0.36384773 172 nips-2003-Semi-Supervised Learning with Trees
16 0.36315489 143 nips-2003-On the Dynamics of Boosting
17 0.36291063 4 nips-2003-A Biologically Plausible Algorithm for Reinforcement-shaped Representational Learning
18 0.36189798 145 nips-2003-Online Classification on a Budget
19 0.3614589 104 nips-2003-Learning Curves for Stochastic Gradient Descent in Linear Feedforward Networks
20 0.36096746 126 nips-2003-Measure Based Regularization