
Friday, October 30, 2009

The importance of false positives

An interesting paper was brought to my attention recently by this blog post: The Base Rate Fallacy and its implications for the difficulty of Intrusion Detection. The central question of the paper is: if we have a flow of N packets per day and our network IDS has a false-positive rate of X, what is the probability that we are experiencing a real attack, given that the IDS says that we are? The paper plugs some numbers into Bayes’ theorem (of which you can find a nice explanation here), gets horrifying results (many false alerts), and concludes that such an FP rate seriously undermines the credibility of the system.
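
To make the base-rate effect concrete, here is a quick sketch of that calculation in Python; the traffic volume, detection rate, and FP rate below are illustrative numbers I made up, not figures from the paper.

```python
# Plugging illustrative numbers into Bayes' theorem to see how a tiny base rate
# of real attacks swamps even a very good IDS.

def p_attack_given_alert(base_rate, detection_rate, fp_rate):
    """P(attack | alert) = P(alert | attack) * P(attack) / P(alert)."""
    p_alert = detection_rate * base_rate + fp_rate * (1.0 - base_rate)
    return detection_rate * base_rate / p_alert

# Assume 10 genuinely malicious packets out of 1 000 000 per day, a perfect
# detection rate, and a false-positive rate of only 0.1% (all made-up numbers).
posterior = p_attack_given_alert(base_rate=10.0 / 1000000.0,
                                 detection_rate=1.0,
                                 fp_rate=0.001)
print("P(real attack | alert) = %.4f" % posterior)  # ~0.0099
# Fewer than 1% of alerts correspond to a real attack -- the base rate fallacy.
```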

The issue of false positives is also a concern in the anti-malware industry. And while I rant quite a bit about the AV industry, you have to give this one to them: the number of false positives is really low. For example, in the AV-Comparatives test 20 false positives is considered many, even though the collection is over 1 500 000 samples (which would put the acceptable FP rate at roughly 0.0013%!). Update: David Harley was kind enough to correct me, because I was comparing apples (the number of malware samples) to oranges (the number of clean files falsely detected). So here is an updated calculation: the Bit9 Global File Registry has more than 6 billion files indexed (they index clean files). Consider whatever fraction of that is used by AV-Comparatives for FP testing (as David correctly pointed out, the cleanset size of AV-Comparatives is not public information – although I would be surprised if it was less than 1 TB). Some back-of-the-napkin calculations: let’s say that AV-Comparatives has only one hundredth of one percent of the 6 billion files, which would be 600 000 files. Even so, 20 files out of 600 000 is just 0.0033%.
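
For what it’s worth, here is the same back-of-the-napkin arithmetic spelled out; the 600 000-file cleanset is just the guess from the paragraph above, not a published number.

```python
# Back-of-the-napkin FP rates for the numbers discussed above.
fp_count = 20

malware_collection = 1500000   # AV-Comparatives malware samples (the "apples")
cleanset_guess = 600000        # guessed clean-file set size (the "oranges")

print("FP rate vs. malware collection: %.4f%%" % (100.0 * fp_count / malware_collection))  # 0.0013%
print("FP rate vs. guessed cleanset:   %.4f%%" % (100.0 * fp_count / cleanset_guess))      # 0.0033%
```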

Now, there were (and will be) a couple of big f***-ups by different companies (like detecting files from Windows), but still, consumers have a very good reason to trust them. Compare this with more “chatty” solutions like software firewalls or – why not – the UAC. Any good security solution needs an FP rate at least this low and much better detection. AV companies with low FP rates – we salute you!

PS. There might be an argument that different false positives should be weighted differently (for example, depending on the popularity of the file) to emphasize the big problems (when out-of-control heuristics start detecting Windows components, for example), as sketched below. That is a valid argument which deserves analysis, but the fact remains that the FP rates of AV solutions are very low!
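
As a rough illustration of what such weighting could look like (the file names and prevalence figures below are invented):

```python
# Weight each false positive by how widely the affected file is deployed, so that
# flagging a core system component counts for much more than flagging an obscure
# tool. Prevalence is the (invented) fraction of machines carrying the file.
false_positives = [
    {"file": "kernel32.dll",     "prevalence": 0.99},    # on virtually every PC
    {"file": "obscure_tool.exe", "prevalence": 0.0001},  # barely deployed
]

weighted_score = sum(fp["prevalence"] for fp in false_positives)
print("weighted FP score: %.4f" % weighted_score)  # ~0.9901, dominated by the common file
```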

Picture taken from wadem's photostream with permission.

1 comment:

  1. Anonymous 8:58 PM

    this was the best explanation i encountered so far: http://yudkowsky.net/rational/bayes
