nips nips2009 nips2009-249 nips2009-249-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Shobha Venkataraman, Avrim Blum, Dawn Song, Subhabrata Sen, Oliver Spatscheck
Abstract: We formulate and address the problem of discovering dynamic malicious regions on the Internet. We model this problem as one of adaptively pruning a known decision tree, but with additional challenges: (1) severe space requirements, since the underlying decision tree has over 4 billion leaves, and (2) a changing target function, since malicious activity on the Internet is dynamic. We present a novel algorithm that addresses this problem, by putting together a number of different “experts” algorithms and online paging algorithms. We prove guarantees on our algorithm’s performance as a function of the best possible pruning of a similar size, and our experiments show that our algorithm achieves high accuracy on large real-world data sets, with significant improvements over existing approaches.
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[14]
[15]
[16]
[17]
[18]
[19]
[20]
[21]
[22]
[23]
[24]
[25]
[26]
[27] Brightmail. http://www.brightmail.com. SpamAssassin. http://www.spamassassin.apache.org. SpamHaus. http://www.spamhaus.net. B LUM , A., AND M ANSOUR , Y. From external to internal regret. In In Proceedings of 18th Annual Conference on Computational Learning Theory (COLT 2005) (2005). C ESA -B IANCHI , N., F REUND , Y., H AUSSLER , D., H ELMBOLD , D. P., S CHAPIRE , R. E., AND WAR MUTH , M. K. How to use expert advice. J. ACM 44, 3 (1997), 427–485. C OLLINS , M. P., S HIMEALL , T. J., FABER , S., NAIES , J., W EAVER , R., AND S HON , M. D. Using uncleanliness to predict future botnet addresses. In Proceedings of the Internet Measurement Conference (2007). C ORMODE , G., KORN , F., M UTHUKRISHNAN, S., AND S RIVASTAVA , D. Diamond in the rough: Finding hierarchical heavy hitters in multi-dimensional data. In SIGMOD ’04: Proceedings of the 2004 ACM SIGMOD international conference on Management of data (2004). D OMINGOS , P., AND H ULTEN , G. Mining high-speed data streams. In Proceedings of ACM SIGKDD (2000), pp. 71–80. E STAN , C., S AVAGE , S., AND VARGHESE , G. Automatically inferring patterns of resource consumption in network traffic. In Proceedings of SIGCOMM’03 (2003). F REUND , Y., S CHAPIRE , R. E., S INGER , Y., AND WARMUTH , M. K. Using and combining predictors that specialize. In Proceedings of the Twenty-Ninth Annual Symposium on the Theory of Computing (STOC) (1997), pp. 334–343. H ELMBOLD , D. P., AND S CHAPIRE , R. E. Predicting nearly as well as the best pruning of a decision tree. Machine Learning 27, 1 (1997), 51–68. H ERBSTER , M., AND WARMUTH , M. Tracking the best expert. Machine Learning 32, 2 (August 1998). J IN , R., AND AGARWAL , G. Efficient and effective decision tree construction on streaming data. In Proceedings of ACM SIGKDD (2003). J UNG , J., K RISHNAMURTHY, B., AND R ABINOVICH , M. Flash crowds and denial of service attacks: Characterization and implications for cdns and websites. In Proceedings of the International World Wide Web Conference (May 2002). J UNG , J., AND S IT, E. An empirical study of spam traffic and the use of DNS black lists. In Proceedings of Internet Measurement Conference (IMC) (2004). KOHLER , E., L I , J., PAXSON , V., AND S HENKER , S. Observed structure of addresses in IP traffic. IEEE/ACM Transactions in Networking 14, 6 (2006). K RISHNAMURTHY, B., AND WANG , J. On network-aware clustering of web clients. In Proceedings of ACM SIGCOMM (2000). M AO , Z. M., S EKAR , V., S PATSCHECK , O., VAN DER M ERWE , J., AND VASUDEVAN , R. Analyzing large ddos attacks using multiple data sources. In ACM SIGCOMM Workshop on Large Scale Attack Defense (2006). R AMACHANDRAN , A., AND F EAMSTER , N. Understanding the network-level behavior of spammers. In Proceedings of ACM SIGCOMM (2006). S LEATOR , D. D., AND TARJAN , R. E. Amortized efficiency of list update and paging rules. In Communications of the ACM (1985), vol. 28, pp. 202–208. S OLDO , F., M ARKOPOULO, A., AND A RGYRAKI, K. Optimal filtering of source address prefixes: Models and algorithms. In Proceedings of IEEE Infocom 2009 (2009). T WINING , D., W ILLIAMSON , M. M., M OWBRAY, M., AND R AHMOUNI , M. Email prioritization: Reducing delays on legitimate mail caused by junk mail. In USENIX Annual Technical Conference (2004). V ENKATARAMAN, S., B LUM , A., S ONG , D., S EN , S., AND S PATSCHECK , O. Tracking dynamic sources of malicious activity at internet-scale. Tech. Rep. TD-7NZS8K, AT&T; Labs, 2009. V ENKATARAMAN, S., S EN , S., S PATSCHECK , O., H AFFNER , P., AND S ONG , D. Exploiting network structure for proactive spam mitigation. In Proceedings of Usenix Security’07 (2007). X IE , Y., Y U , F., ACHAN , K., G ILLUM , E., , G OLDSZMIDT, M., AND W OBBER , T. How dynamic are IP addresses? In Proceedings of ACM SIGCOMM (2007). Z HANG , J., P ORRAS , P., AND U LRICH , J. Highly predictive blacklists. In Proceedings of Usenix Security’08 (2008). Z HANG , Y., S INGH , S., S EN , S., D UFFIELD , N., AND L UND , C. Online identification of hierarchical heavy hitters: algorithms, evaluation, and applications. In IMC ’04: Proceedings of the 4th ACM SIGCOMM conference on Internet measurement (New York, NY, USA, 2004), ACM, pp. 101–114. 9