Robert Beverly and Karen Sollins.
Proceedings of the Fifth Conference on Email and Anti-Spam (CEAS 2008), Mountain View, CA, August 2008.
We present a spam detection technique that relies on neither content nor reputation analysis. Instead, this work investigates the discriminatory power of the email TCP packet stream. From a corpus of packet flows and their corresponding messages, we extract per-email transport-layer features. While legitimate mail traffic is well-behaved, we observe small congestion windows, retransmissions, loss, and large latencies in spam flows. To identify the most selective flow properties, thereby adapting to different networks and users, we build "SpamFlow." On our data, SpamFlow achieves greater than 90% classification accuracy while correctly identifying 78% of the false negatives from a popular content filter. By capitalizing on spam's fundamental requirement to source large quantities of mail, often from resource-constrained hosts and networks, SpamFlow promises a unique and difficult-to-subvert complement to existing spam defenses.
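The idea in the abstract, summarising each mail flow by transport-layer features and then finding which feature best separates spam from legitimate mail, can be illustrated with a small sketch. This is not the SpamFlow implementation: the `Packet` record layout, the three features chosen, and the single-threshold rule in `best_stump` are all assumptions made for illustration, and a real system would read flows from a packet capture and use a proper learning algorithm.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Packet:
    """One TCP segment seen from the remote sender (hypothetical layout)."""
    ts: float    # capture timestamp, seconds
    seq: int     # sequence number of the first payload byte
    length: int  # payload bytes


def flow_features(pkts: List[Packet], synack_ts: float, ack_ts: float) -> Dict[str, float]:
    """Summarise one SMTP flow with a few transport-layer features."""
    seen = set()
    retrans = 0
    for p in pkts:
        if p.length == 0:
            continue  # pure ACKs carry no data and cannot be retransmissions here
        if p.seq in seen:
            retrans += 1  # same data offset observed again
        seen.add(p.seq)
    # Longest pause between consecutive segments, a crude latency signal.
    gaps = [b.ts - a.ts for a, b in zip(pkts, pkts[1:])]
    return {
        "rtt": ack_ts - synack_ts,  # handshake round-trip estimate
        "retrans": retrans,
        "max_gap": max(gaps, default=0.0),
    }


def best_stump(rows: List[Dict[str, float]], labels: List[bool]) -> Tuple[str, float, float]:
    """Return (feature, threshold, accuracy) for the single rule
    'spam if feature > threshold' that best fits the labelled flows."""
    best = ("", 0.0, 0.0)
    for name in rows[0]:
        for t in sorted({r[name] for r in rows}):
            correct = sum((r[name] > t) == y for r, y in zip(rows, labels))
            acc = correct / len(rows)
            if acc > best[2]:
                best = (name, t, acc)
    return best
```

Given one feature dictionary per message and a list of spam/not-spam labels, `best_stump` reports which single feature and threshold are most selective on that data, so the rule it picks can differ from one network or user to another.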