CMAND: Transport Traffic Analysis: SpamFlow

Overview

What: SpamFlow is a spam detection technique that relies on neither content nor reputation analysis. Instead, SpamFlow exploits the discriminatory power of the TCP packet stream that carries each email.
Why: Because spam has a low penetration rate, spammers must send extremely large volumes of mail, increasingly through botnets, to remain commercially viable. Botnet hosts are typically widely distributed and sit behind low-bandwidth, asymmetric Internet connections. As a result, while legitimate mail traffic is generally well-behaved, we observe small congestion windows, retransmissions, packet loss, and high latency in spam flows.
How: SpamFlow uses machine learning and feature selection to identify the most discriminative flow properties, which lets it adapt to different networks and users.
Benefit: By capitalizing on spam's fundamental requirement to source large quantities of mail, often from resource constrained hosts and networks, SpamFlow promises a unique and difficult-to-subvert complement to existing spam defenses.
Use: SpamFlow is implemented as a daemon and a SpamAssassin plugin. Details of the architecture are provided in our USENIX LISA 2011 paper.
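To make the transport-level signals from the Why entry concrete, here is a minimal Python sketch of per-flow feature extraction as seen from the receiving mail server. It is not SpamFlow's code. The `Packet` record, the `flow_features` function, and the three features (handshake RTT, retransmitted segments, and peak unacknowledged bytes as a rough proxy for the sender's congestion window) are hypothetical, and sequence numbers are assumed to be relative, as packet-capture tools commonly display them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical per-packet record for one TCP flow, captured at the receiver."""
    ts: float                       # capture timestamp, seconds
    inbound: bool                   # True if sent by the remote (sending) host
    seq: int                        # relative TCP sequence number
    ack: int                        # relative TCP acknowledgment number
    length: int                     # TCP payload length, bytes
    flags: frozenset = frozenset()  # TCP flags, e.g. frozenset({"SYN", "ACK"})


def flow_features(packets):
    """Compute illustrative (hypothetical) transport features for one inbound SMTP flow."""
    seen_segments = set()  # (seq, length) of inbound data segments seen so far
    retransmits = 0
    highest_ack_sent = 0   # highest cumulative ACK the receiver has sent
    max_in_flight = 0      # peak unacknowledged bytes, a rough cwnd proxy
    synack_ts = None
    handshake_rtt = None

    for p in sorted(packets, key=lambda pkt: pkt.ts):
        if not p.inbound:
            # Outbound packet from the receiving MTA.
            if {"SYN", "ACK"} <= p.flags:
                synack_ts = p.ts
            if "ACK" in p.flags:
                highest_ack_sent = max(highest_ack_sent, p.ack)
            continue

        # The first non-SYN ACK from the sender after our SYN/ACK completes
        # the handshake; the gap approximates the round-trip time to the sender.
        if (handshake_rtt is None and synack_ts is not None
                and "ACK" in p.flags and "SYN" not in p.flags):
            handshake_rtt = p.ts - synack_ts

        if p.length > 0:
            segment = (p.seq, p.length)
            if segment in seen_segments:
                retransmits += 1
            seen_segments.add(segment)
            # Bytes sent by the remote host but not yet acknowledged by us.
            max_in_flight = max(max_in_flight, p.seq + p.length - highest_ack_sent)

    return {
        "handshake_rtt": handshake_rtt,
        "retransmits": retransmits,
        "max_in_flight": max_in_flight,
    }


# Hypothetical trace: handshake, one 100-byte segment, its retransmission, then our ACK.
trace = [
    Packet(0.000, True, 0, 0, 0, frozenset({"SYN"})),
    Packet(0.125, False, 0, 1, 0, frozenset({"SYN", "ACK"})),
    Packet(0.375, True, 1, 1, 0, frozenset({"ACK"})),
    Packet(0.500, True, 1, 1, 100, frozenset({"ACK"})),
    Packet(1.500, True, 1, 1, 100, frozenset({"ACK"})),
    Packet(1.625, False, 1, 101, 0, frozenset({"ACK"})),
]
features = flow_features(trace)
print(features)
```

Counting a repeated (sequence number, length) pair as a retransmission misses retransmissions that repacketize data, a simplification that is acceptable for a sketch but not for production use.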
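The How entry pairs machine learning with feature selection. This page does not describe SpamFlow's own selection procedure, so the sketch below swaps in one simple, standard criterion: information gain with the best single threshold per feature. It shows how the most discriminative flow properties might be ranked from labeled flows. The feature names (`rtt`, `retransmits`, `max_in_flight`), the helper functions, and the toy values are made up for illustration and are not measurements.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(values, labels, threshold):
    """Entropy reduction from splitting one numeric feature at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - remainder


def rank_features(rows, labels):
    """Rank features by their best single-threshold information gain."""
    scores = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        scores[name] = max(info_gain(values, labels, t) for t in set(values))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up toy flows, for illustration only (not measured data).
rows = [
    {"rtt": 0.40, "retransmits": 2, "max_in_flight": 2920},
    {"rtt": 0.35, "retransmits": 0, "max_in_flight": 14600},
    {"rtt": 0.05, "retransmits": 0, "max_in_flight": 14600},
    {"rtt": 0.04, "retransmits": 0, "max_in_flight": 2920},
]
labels = ["spam", "spam", "ham", "ham"]

ranking = rank_features(rows, labels)
selected = [name for name, _ in ranking[:2]]  # keep the top-k features, here k = 2
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
print("selected:", selected)
```

A classifier would then be trained on only the selected features. Re-running the selection on a site's own labeled flows is one way such a system could adapt to different networks and users.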

SpamFlow will be publicly available under an open-source license in January 2012.
Please sign up for the SpamFlow mailing list to be notified of the release.

Thanks to the NSF and Cisco for their support.
Our Transport Traffic Analysis page contains additional details on the larger project.