Network managers are inevitably called upon to associate network traffic with particular applications. Indeed, this operation is critical for a wide range of management functions, from debugging and security to analytics and policy support. Traditionally, managers have relied on applications adhering to a well-established global port mapping: Web traffic on port 80, mail traffic on port 25, and so on. However, a range of factors (including firewall port blocking, tunneling, dynamic port allocation, and a bloom of new distributed applications) has weakened the value of this approach. We analyze three alternative mechanisms that use statistical and structural content models to automatically identify traffic using the same application-layer protocol, relying solely on flow content. In this manner, known applications may be identified regardless of port number, while traffic from one unknown application will be identified as distinct from another. We evaluate each mechanism's classification performance using real-world traffic traces from multiple sites.
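A minimal sketch of one statistical content model of the kind the abstract describes: represent each flow by the byte-value distribution of its first payload bytes, compute a per-protocol centroid from labeled flows, and classify new flows by nearest centroid. The 64-byte prefix length, the cosine-similarity metric, and the toy protocol labels are all illustrative assumptions, not the paper's exact models.

```python
# Sketch: classify flows by the byte-value distribution of their payload prefix.
# Assumptions (not from the paper): 64-byte prefix, cosine similarity, toy data.
import math
from collections import defaultdict

PREFIX_LEN = 64  # how many payload bytes to model (assumed)

def byte_histogram(payload: bytes) -> list[float]:
    """Normalized 256-bin histogram over the first PREFIX_LEN payload bytes."""
    counts = [0] * 256
    for b in payload[:PREFIX_LEN]:
        counts[b] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def train(labeled_flows):
    """labeled_flows: iterable of (protocol_label, payload_bytes) pairs."""
    sums = defaultdict(lambda: [0.0] * 256)
    counts = defaultdict(int)
    for label, payload in labeled_flows:
        for i, x in enumerate(byte_histogram(payload)):
            sums[label][i] += x
        counts[label] += 1
    # Centroid = mean histogram per protocol.
    return {label: [x / counts[label] for x in sums[label]] for label in sums}

def classify(centroids, payload: bytes) -> str:
    hist = byte_histogram(payload)
    return max(centroids, key=lambda label: cosine(centroids[label], hist))

# Toy usage: HTTP-like vs. SMTP-like payload prefixes (illustrative only).
model = train([
    ("http", b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"),
    ("smtp", b"HELO mail.example.com\r\nMAIL FROM:<a@example.com>\r\n"),
])
print(classify(model, b"GET /favicon.ico HTTP/1.1\r\n"))  # -> "http"
```

A classifier like this works regardless of port number because it looks only at flow content, which is the property the paper's mechanisms exploit.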
Remote code injection exploits inflict a significant societal cost, and an active underground economy has grown up around these continually evolving attacks. We present a methodology for inferring the phylogeny, or evolutionary tree, of such exploits. We have applied this methodology to traffic captured at several vantage points, and we demonstrate that it is robust to the observed polymorphism. Our techniques revealed non-trivial code sharing among different exploit families, and the resulting phylogenies accurately captured the subtle variations among exploits within each family. Thus, we believe our methodology and results are a helpful step toward better understanding the evolution of remote code injection exploits on the Internet.
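As a rough illustration of phylogeny inference from code similarity (the distance function and clustering method below are generic stand-ins, not the paper's actual technique): compute pairwise n-gram Jaccard distances between exploit payloads, then build a tree by single-linkage agglomerative clustering.

```python
# Sketch: infer a crude "phylogeny" by agglomerative clustering over
# byte n-gram Jaccard distances. Generic stand-in, not the paper's method.

def ngrams(data: bytes, n: int = 4) -> set:
    """Set of byte n-grams; n=4 is an arbitrary illustrative choice."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}

def jaccard_distance(a: set, b: set) -> float:
    union = a | b
    return 1.0 - (len(a & b) / len(union)) if union else 0.0

def build_tree(samples: dict):
    """samples: name -> payload bytes. Returns a tree of nested tuples
    produced by single-linkage agglomerative clustering."""
    grams = {name: ngrams(data) for name, data in samples.items()}
    clusters = {name: (name,) for name in samples}   # current subtrees
    members = {name: {name} for name in samples}     # leaf names per subtree
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest single-linkage distance.
        keys = list(clusters)
        x, y = min(
            ((a, b) for i, a in enumerate(keys) for b in keys[i + 1:]),
            key=lambda p: min(jaccard_distance(grams[u], grams[v])
                              for u in members[p[0]] for v in members[p[1]]),
        )
        merged = (clusters.pop(x), clusters.pop(y))
        key = x + "+" + y
        clusters[key] = merged
        members[key] = members.pop(x) | members.pop(y)
    return next(iter(clusters.values()))

# Toy usage: two related variants plus an unrelated sample.
tree = build_tree({
    "exploit_a1": b"\x90\x90PAYLOAD-v1-decoder",
    "exploit_a2": b"\x90\x90PAYLOAD-v2-decoder",
    "other":      b"completely different bytes here",
})
print(tree)  # a1 and a2 merge first (shared n-grams), then join "other"
```

The key intuition matches the abstract: shared code produces small pairwise distances, so variants within a family cluster together before families join each other.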
Modern network worms spread with tremendous speed, potentially covering the planet in mere seconds. However, for most worms, this prodigious pace continues unabated long after the outbreak's incidence has peaked. Indeed, it is this ongoing infection activity that is typically used to identify compromised hosts. In principle, a stealthier worm might eliminate this telltale sign by coordinating its members to halt infection activity once the vulnerable population is subverted. Thus, after a short initial spreading period, all infected hosts could become quiescent "sleeper agents." In this paper, we show that such "self-stopping" capabilities are trivial to add to existing worms and can be implemented efficiently without any explicit coordination or additional network traffic.
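To make the idea concrete, here is a toy epidemic simulation of one plausible coordination-free stopping rule (our own illustrative choice, not necessarily the paper's exact scheme): each infected node halts after it observes K consecutive probes that hit already-infected hosts, a purely local signal that the vulnerable population is largely subverted. The population size, K, and scan rate are arbitrary parameters.

```python
# Toy simulation of a "self-stopping" epidemic. Each infected node halts
# after K consecutive probes land on already-infected hosts, a purely
# local stopping rule requiring no coordination traffic. Illustrative only.
import random

N = 10_000           # vulnerable population size (assumed)
K = 8                # consecutive already-infected hits before stopping (assumed)
PROBES_PER_TICK = 3  # scan rate per infected node per time step (assumed)

random.seed(1)
infected = {0}       # node 0 is patient zero
streak = {0: 0}      # consecutive already-infected hits, per node
active = {0}         # nodes still scanning
tick = 0

while active:
    tick += 1
    for node in list(active):
        for _ in range(PROBES_PER_TICK):
            target = random.randrange(N)   # random scanning
            if target in infected:
                streak[node] += 1
                if streak[node] >= K:      # local evidence of saturation
                    active.discard(node)   # go quiescent: a "sleeper agent"
                    break
            else:
                infected.add(target)
                streak[target] = 0
                active.add(target)
                streak[node] = 0           # reset streak on a new infection
    if tick % 5 == 0 or not active:
        print(f"t={tick}: infected={len(infected)}, still scanning={len(active)}")
```

Running this shows the qualitative behavior the abstract describes: infection saturates quickly, and shortly afterward scanning activity drops to zero even though no node ever exchanged a control message.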
The rapid evolution of large-scale worms, viruses, and botnets has made Internet malware a pressing concern. Such infections are at the root of modern scourges including DDoS extortion, online identity theft, spam, phishing, and piracy. However, the most widely used tools for gathering intelligence on new malware (network honeypots) have forced investigators to choose between monitoring activity at a large scale and capturing behavior with high fidelity. In this paper, we describe an approach that minimizes this tension and improves honeypot scalability by up to six orders of magnitude while still closely emulating the execution behavior of individual Internet hosts. We have built a prototype honeyfarm system, called Potemkin, that exploits virtual machines, aggressive memory sharing, and late binding of resources to achieve this goal. While still an immature implementation, Potemkin has emulated over 64,000 Internet honeypots in live test runs using only a handful of physical servers.
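The abstract's key scalability ideas, late binding of resources and copy-on-write memory sharing, can be sketched as a gateway dispatch loop: a VM is cloned from a reference snapshot only when traffic actually arrives for an IP, so idle addresses consume no resources. All names below (Honeyfarm, clone_from_snapshot, the VM type) are hypothetical stand-ins for illustration, not Potemkin's real interfaces.

```python
# Sketch of late-binding honeyfarm dispatch: instantiate a VM for an IP only
# when the first packet for that IP arrives. clone_from_snapshot() is a
# hypothetical stand-in for a copy-on-write VM clone, not Potemkin's real API.
from dataclasses import dataclass, field

@dataclass
class VM:
    ip: str
    # Copy-on-write: pages start shared with the reference snapshot and are
    # copied only when the guest writes to them (modeled here as a counter).
    private_pages: int = 0

@dataclass
class Honeyfarm:
    vms: dict = field(default_factory=dict)   # ip -> VM, bound lazily

    def clone_from_snapshot(self, ip: str) -> VM:
        """Hypothetical stand-in for cloning a booted reference VM."""
        return VM(ip=ip)

    def handle_packet(self, dst_ip: str, payload: bytes) -> VM:
        # Late binding: no VM exists for dst_ip until traffic arrives.
        vm = self.vms.get(dst_ip)
        if vm is None:
            vm = self.clone_from_snapshot(dst_ip)
            self.vms[dst_ip] = vm
        # Delivering the packet may dirty pages, breaking sharing for them.
        vm.private_pages += 1
        return vm

farm = Honeyfarm()
farm.handle_packet("10.0.0.7", b"GET / HTTP/1.1\r\n")
farm.handle_packet("10.0.9.3", b"\x90\x90")   # second IP, second lazy clone
print(len(farm.vms), "VMs bound; untouched IPs cost nothing")
```

Because clones share nearly all memory with the snapshot and exist only while traffic is in flight, a handful of physical servers can present tens of thousands of apparently independent hosts, which is the scaling effect the abstract reports.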