hype-free: Malicious hosts

Wednesday, August 29, 2007

Malicious hosts

There is a new study on the Honeynet Project site, titled Know Your Enemy: Malicious Web Servers. While the study is interesting, there isn't anything particularly new in it. The methodology is very similar to that of other studies in this area (the Google Ghost in the Browser study - warning, PDF - or the Microsoft HoneyMonkey project): essentially, a set of virtual machines running unpatched versions of the OS were directed to the malicious links, and any changes on them (created files, processes, etc.) were recorded.

The most interesting part (for me), however, was the Defense Evaluation / Blacklisting section. When applied to their dataset, the very famous hosts file maintained by winhelp2002 blocked all infections, although it contained only a minority (12%) of the domains. This means that the majority of the bad code out there consists of redirectors, and that such lists have managed to include (at least until now) the true sources of the infections. This is very interesting: it shows that while the number of different points of contact with malicious intent on the Internet increases very rapidly, their variety does not grow quite as rapidly, so blacklisting technologies are still effective (and, by the same logic, AV systems can still be effective).
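To make the redirector point concrete, here is a minimal sketch of how a hosts-file blacklist can be checked against a list of URLs. It is not the method used in the study. The host names are made up for illustration, and the helper names parse_hosts and is_blocked are hypothetical. It only assumes the usual hosts-file layout (an address followed by one or more names, with # starting a comment).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse hosts-file text into a set of blocked host names.
# Assumption: only entries pointing at 127.0.0.1 or 0.0.0.0 are blocks.
sub parse_hosts {
    my ($text) = @_;
    my %blocked;
    foreach my $line (split /\n/, $text) {
        $line =~ s/#.*$//;                      # strip comments
        my ($addr, @names) = split ' ', $line;
        next unless defined $addr && @names;    # skip blank lines
        next unless $addr eq '127.0.0.1' || $addr eq '0.0.0.0';
        foreach my $name (@names) {
            next if lc $name eq 'localhost';    # the usual loopback entry
            $blocked{lc $name} = 1;
        }
    }
    return \%blocked;
}

# Return 1 if the host part of the URL is in the blocked set, else 0.
sub is_blocked {
    my ($blocked, $url) = @_;
    return 0 unless $url =~ m{^https?://([^/:"]+)}i;
    return exists $blocked->{lc $1} ? 1 : 0;
}

# Hypothetical entries, for illustration only.
my $sample = <<'END';
127.0.0.1  localhost
127.0.0.1  exploit-source.example
0.0.0.0    dropper.example   # inline comment
END

my $blocked = parse_hosts($sample);
foreach my $url ('http://redirector.example/a',
                 'http://exploit-source.example/x.js') {
    printf "%-40s %s\n", $url, is_blocked($blocked, $url) ? 'blocked' : 'allowed';
}
```

In this sketch the redirector itself is not listed, but the host it would eventually send the browser to is, which is the situation the study's numbers hint at: a list covering few domains can still stop the infection if it covers the right ones.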

Another interesting aspect of this data is that almost half of the malicious links are hosted in the US. This data was generated by a small Perl script, which can be seen below and has several weak points. For example, some hosts have been taken down, and it does not differentiate between sites which were possibly hacked and probably only contain IFRAMEs / redirects and sites which intentionally host malicious files. It also counts IP addresses rather than host names - this is not a flaw per se, but it must be noted if we want to make any meaningful comparison. The second most frequent hosting location is, drum roll, China. A quick'n'dirty summary of the results is:

Country	IP count
US	470
CN	429
Unknown	51
DE	47
RU	45
IT	25
CA	22
GB	16
TW	11
FR	8
NL	8
CZ	7
...	...

Again, these results do not differentiate between redirectors and infection sources, or between hacked and purposefully malicious sites. Even so, the results suggest that blocking IP ranges belonging to countries / regions which a business does not target can improve security by at least 50% from the point of view of random (non-targeted) browser exploits.
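As a rough illustration of what blocking by IP range means in practice, the sketch below tests whether an address falls inside a CIDR block. The ranges are the reserved documentation networks, standing in for a real per-country list (which would have to come from a registry or a GeoIP database), and the helper names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert a dotted-quad IPv4 address to a 32-bit integer.
sub ip_to_int {
    my ($ip) = @_;
    return unpack('N', pack('C4', split(/\./, $ip)));
}

# Return 1 if $ip lies inside $cidr (e.g. "192.0.2.0/24"), else 0.
sub in_cidr {
    my ($ip, $cidr) = @_;
    my ($net, $bits) = split m{/}, $cidr;
    my $mask = $bits == 0 ? 0 : (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF;
    return (ip_to_int($ip) & $mask) == (ip_to_int($net) & $mask) ? 1 : 0;
}

# Placeholder ranges (documentation networks), NOT real country allocations.
my @blocked_ranges = ('192.0.2.0/24', '198.51.100.0/24');

# Return 1 if $ip falls in any of the blocked ranges, else 0.
sub is_blocked_ip {
    my ($ip) = @_;
    foreach my $range (@blocked_ranges) {
        return 1 if in_cidr($ip, $range);
    }
    return 0;
}

foreach my $ip ('192.0.2.77', '203.0.113.5') {
    print "$ip\t", (is_blocked_ip($ip) ? 'blocked' : 'allowed'), "\n";
}
```

A real deployment would do this filtering in a firewall or proxy rather than in a script; the point here is only to show the matching logic.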

The script used to generate this data (bear in mind that this is a script hacked together for quick results):

#!/usr/bin/perl
use strict;
use warnings;
my %ips;

foreach (<*>) {
    next unless -f;
    next if /\.pl$/i;
    
    # Skip files that cannot be opened, then read each one line by line
    open(F, '<', $_) or next;
    while (<F>) {
        chomp;
        next unless /https?:\/\/([^\/\"]+)/i;

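        my $ip = $1;
        if ($ip !~ /^\d+\.\d+\.\d+\.\d+$/) {
            # Resolve host names; gethostbyname returns undef for hosts
            # that no longer resolve (e.g. taken-down sites), so skip those
            my $packed = gethostbyname($ip);
            next unless defined($packed);
            $ip = join(".", unpack("C4", $packed));
        }
        # Use the hash as a set so that each IP is counted only once
        $ips{$ip} = 0;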
        print "$ip\n";
    }
    close F;    
}

use IP::Country::DNSBL;

my %countries;
my $reg = IP::Country::DNSBL->new();

foreach (keys %ips) {
    my $cnt = $reg->inet_atocc($_);
    # The lookup may return undef when no country is found for the address
    $cnt = 'Unknown' unless defined($cnt);
    print "$cnt\n";
    $countries{$cnt}++;
}

print "---------------------------\n";
foreach (sort { $countries{$a} <=> $countries{$b} } keys %countries) {
    print "$_\t", $countries{$_}, "\n";
}