
Tuesday, July 31, 2007

Mixed links and commentary


Via a tool to load arbitrary unsigned drivers under Vista without playing with the boot parameters. Very nice. I didn't play with it, but I assume that it does this by loading its (signed) driver, then using that to perform the load from kernel mode. The question remains: can't Microsoft revoke their certificate, so that this driver can no longer be used? If not (and that's what I've heard), the benefits of driver signing are gone.

How to discover a buffer overflow in less than 30 minutes. Cool, just remember that attacking a program cannot prove the absence of vulnerabilities.

The Microsoft file list database, if you want to look up a file to decide if it's truly an MS file (of course you also need to check the validity of its certificate)

Via LonerVamp:

DNS pinning explained. From what I know (and that's not much ;-)), the solution would be to disallow the forging of headers from XMLHttpRequest (why was it in there in the first place!?)

Satori, an OS fingerprinting tool for Windows. I wonder how it compares to nmap or p0f, and how it handles changing the parameters of the Windows TCP/IP stack? (Via PaulDotCom)

Changing your MAC address programmatically under Windows - I used to do this manually, because I found it strange to pay for a utility which does the same thing you can do with a few clicks. Anyway, here is an interesting related tidbit of information, maybe somebody will find it useful: while Windows (tested with XP SP2) can spoof the MAC address, it does not take it kindly (read: everything goes haywire) when it is spoofed from hardware. I was putting two Qemu virtual machines in a virtual network, and to make them have different MAC addresses (so that the network can actually work :)), I gave one 12:34:56:78:9A:BC and the other one an address ending in :BD. With Windows this didn't work (all kinds of strange things happened, like ARP replies being ignored), while with Linux (tested with DSL - Damn Small Linux) everything was perfect. The problem disappeared after I made sure that the MAC address for the Windows machine contained the correct OUI in the first three bytes (six hex digits).
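For anybody picking test addresses for virtual machines, here is a minimal Python sketch that splits a MAC address into its OUI part and reports the two flag bits of the first byte. The function name describe_mac is my own invention; the meaning of the two bits (lowest bit = multicast, second lowest = locally administered) comes from the IEEE 802 addressing rules. It only describes an address, it does not explain why Windows behaved the way it did.

```python
def describe_mac(mac: str) -> dict:
    """Split a colon-separated MAC address into its OUI and flag bits."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not a valid MAC address: {mac!r}")
    first = octets[0]
    return {
        # The OUI is the first three bytes (six hex digits)
        "oui": ":".join(f"{o:02X}" for o in octets[:3]),
        # Lowest bit of the first byte: group (multicast) address
        "multicast": bool(first & 0x01),
        # Second lowest bit: locally administered, i.e. not a vendor OUI
        "locally_administered": bool(first & 0x02),
    }


print(describe_mac("12:34:56:78:9A:BC"))
```

For the address used above, the locally administered bit is set, so 12:34:56 is not a vendor-assigned OUI.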

Security videos

Using telnet to test your e-mail - I always forget it too :)

The Cisco challenge


Today being (very probably - there is an oxymoron for you) the last day I play the Cisco Networking Academy challenge (but the first day you might play it), I thought it may be useful to share some thoughts (cough-cough brag) about it.

The challenge is very simple: you can answer fifty questions each day (plus one Question of the Day for each weekday). Your score is reset to zero at the beginning of each month (this is one of the reasons I decided to quit playing now). There are some prizes you can buy with your points, but they are highly symbolic (like wallpapers or screensavers) - this being Cisco, I thought they would give more physical prizes like routers or switches. I also have other gripes with the system:

  • You get the next round of questions depending on your timezone, so some people will always be one set of questions ahead of you.
  • The questions are highly repetitive; it is not unusual for a question to be repeated within the same batch of 50 questions
  • Very little effort is put into randomizing the questions (most of the time the order of the possible answers remains the same, for example), so many of them can be answered out of muscle memory
  • Too much emphasis is put (IMHO) on subnetting questions, which gets boring and frustrating after a while
  • There is a way to pick and choose the questions you want to answer. It is time-consuming, but since the number of points varies greatly depending on the question (from 100 points to 550 points), it can be used to answer only the more valuable questions.

Anyway, here is the bragging part :-D

This is the result from the first month the contest launched (in the first month they allowed 100 questions to be answered daily, thus the higher scores). Now, in the second month I'm in position six. During all this time I was in second place for the Central and Eastern Europe region. And my background is: I've taken all four semesters of CCNA, but I never did my final exam. Also, system administration is not the core of my job, rather something I have to do from time to time.

Two channel authentication - part two


I've had some excellent replies to my last post (including one from the CTO of PhoneFactor - probably via Google Alerts or something similar ;) - I don't delude myself into thinking that he reads my blog :)), so I thought I'd expand a little on the subject:

As it was pointed out in the comments, this does not prevent active MITM attacks. Then again, security is a cat-and-mouse game, and as long as you stay ahead of the curve, you can reduce your risk to the minimum. There are also other approaches to security which can complement this, like IP Geolocation. Then again (just to make my point about this being a cat-and-mouse game from the technological point of view), bad guys can host servers near you to avoid triggering alarms from IP Geolocation systems. They can even host the code which modifies the transaction on the fly inside your browser, in which case IP Geolocation becomes useless.

The elimination of passive attacks (ie. password capture) has the (small) advantage that the time-frame for attackers to act is limited, and while this is not such a big hurdle from an attack standpoint (since the attack itself can be automated), it means that it can be discovered really fast - for example if I fall for an active MITM attack and after doing my transaction I go down to my bank and ask for the list of transfers, the fraudulent transfer will clearly be present, since the attack can be executed only while I'm logged in. With a passive attack they can withdraw money any time in the future. (Observe that I did not say check my balance / activity history on my computer, since it is possible for attackers to alter the output of the browser to hide the malicious activities, so a second channel needs to be used).

Also I was pointed to the blog posting Only the Easiest Way is the Secure Way (which resonates well with the things said by the second commenter). This is so true. So, while in theory you could thwart some active MITM attacks by good security practices (like checking the validity of SSL certificates), most (and I mean most - like 99.9999%) users won't (know to) do that. Something like NIM (again, no affiliation, just read it in the comments of Roger's Security Blog, and they have a cute animation) has the same capabilities (by which I mean that NIM = client side certificates and proper checking of the server certificates) and probably costs money. Sure, it provides a nicer user interface, but ultimately it is just as vulnerable to active MITM attacks which are performed directly on the client's machine (eg. software which changes the destination account just before the transaction is submitted).

I also got a link to a video of an OpenID presentation. While OpenID is nice and is very useful for having a federated identity, it's usually not multi-factor (although it can be, if your identity provider chooses so) and has the same problem as single channel authentication: the electronic data representing your proof of identity can still be copied and replayed relatively easily.

Finally, I would like to present a solution employed here in Romania (and probably elsewhere) by ING (again, no affiliation, not even a customer):

When you sign up for online banking, you get a device similar to a pocket calculator. This works as an RSA token at login, but also, when you do a transfer you are asked to punch the details of the transfer (like the target, the sum, etc) into it, and it will generate a one-time code which you must provide with the transfer for it to be accepted. This is very cool, and possibly (depending on the implementation) very secure, but a little hard to use. One possible extension of this would be for the webpage to display a 2D barcode with the details of the transfer and for the device to contain a barcode reader, so that you don't have to type them in twice. Again, there are endless possible ways the implementation of such a device can be screwed up (weak hashing algorithms, weak random number generator, etc), however I think the concept is very secure.
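To make the idea more concrete, here is a rough Python sketch of how such a transfer code could be derived. I have no idea how the real device works, so everything here is an assumption: the function names (transfer_code, bank_accepts), the use of HMAC-SHA256, the per-device secret shared with the bank and the counter which makes each code usable only once.

```python
import hashlib
import hmac


def transfer_code(secret: bytes, target: str, amount: str, counter: int,
                  digits: int = 8) -> str:
    """Derive a short numeric code bound to the details of one transfer."""
    # The counter changes with every transfer, so an old code cannot be reused
    message = f"{counter}|{target}|{amount}".encode("utf-8")
    digest = hmac.new(secret, message, hashlib.sha256).digest()
    number = int.from_bytes(digest[:8], "big") % (10 ** digits)
    return str(number).zfill(digits)


def bank_accepts(secret: bytes, target: str, amount: str, counter: int,
                 code: str) -> bool:
    """Server side: recompute the code from the transfer it actually received."""
    expected = transfer_code(secret, target, amount, counter)
    return hmac.compare_digest(expected, code)


# Hypothetical values, only for illustration
device_secret = b"secret shared between the device and the bank"
code = transfer_code(device_secret, "RO49AAAA1B31007593840000", "150.00", 1)
print(bank_accepts(device_secret, "RO49AAAA1B31007593840000", "150.00", 1, code))
print(bank_accepts(device_secret, "RO49EVIL0000000000000000", "150.00", 1, code))
```

The point of binding the code to the target and the sum is that software which changes the destination account inside the browser ends up submitting a code which no longer matches the transfer.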

MySQL triggers and stored procedures


So MySQL is trying to be a big boy and have advanced features like triggers and stored procedures (not just UDFs). However their syntax seems a little complicated compared to the PostgreSQL one. So here it goes:


  INSERT INTO test2 SET a2 = NEW.a1;
  DELETE FROM test3 WHERE a3 = NEW.a1;  
  UPDATE test4 SET b4 = b4 + 1 WHERE a4 = NEW.a1;


The play with the delimiter is necessary to be able to put multiple statements (separated by ;) inside of the trigger. The DROP TRIGGER IF EXISTS construct is the equivalent of the CREATE OR REPLACE construct from PostgreSQL.

The syntax for procedures / functions is similar:

CREATE PROCEDURE simpleproc (OUT param1 INT)

Sunday, July 29, 2007

Updating PHP in XAMPP for Windows


Inspired by the YAIG blog, here is my how to do it post:

XAMPP is a great suite to quickly get up and running with Apache, PHP, Perl and MySQL. Warning! It is not aimed to be used in a production environment! Its settings are geared towards ease of use rather than security!

I went against my own advice and used it in a couple of instances, however these are not public facing sites, rather internal and heavily firewalled services. Even so, whenever a security update to PHP comes out, I feel the need to use the new version. Here is how you can do it without waiting for the official XAMPP package to update (you really can't criticize them for not updating faster, since it is a development and not a production server, and as such minor PHP releases shouldn't influence your code).

Warning! Use these instructions at your own risk. They are written to the best of my knowledge, however I can't make any guarantees. Always backup your data.

  1. Download the latest binary version of PHP (make sure to get the .zip package not the installer)
  2. De-archive it to a directory
  3. Copy the contents of the directory into the php subfolder of your XAMPP installation directory, overwriting the files which are already present
  4. Overwrite the files which are already present in the apache\bin directory with the newer versions.
  5. Now the trick: take the files which have a _2 in their names (for example php5apache2_2.dll or php5apache2_2_filter.dll), copy them into the apache\bin subdirectory and remove the _2 part, overwriting the existing files. This is necessary because XAMPP uses Apache version 2.2, while the files without the _2 suffix are built for Apache 2.0, so you must take the files built for the newer version (which has a different plugin interface) and rename them to the filenames XAMPP expects.
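If you do this regularly, step 5 can be scripted. Below is a small Python sketch of it, with the usual caveats: the function name install_apache22_modules and the paths in the usage comment are assumptions of mine, and it only handles the php5apache2_2*.dll files mentioned above, not the other steps.

```python
import shutil
from pathlib import Path


def install_apache22_modules(php_dir: Path, apache_bin: Path) -> list:
    """Copy the Apache 2.2 builds over the names XAMPP expects.

    php5apache2_2.dll        -> php5apache2.dll
    php5apache2_2_filter.dll -> php5apache2_filter.dll
    """
    copied = []
    for src in sorted(php_dir.glob("php5apache2_2*.dll")):
        # Drop only the first "_2", which marks the Apache 2.2 build
        dest_name = src.name.replace("_2", "", 1)
        shutil.copy2(src, apache_bin / dest_name)
        copied.append(dest_name)
    return copied


# Usage, assuming a default-looking install location (adjust to your own):
# install_apache22_modules(Path(r"C:\xampp\php"), Path(r"C:\xampp\apache\bin"))
```

Stop Apache before running it, otherwise the DLLs in apache\bin may be locked.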

Recovering deleted files the DIY way


I can't really remember if I've written about this or not (old age I suppose :-p), so here it goes:

There are certainly easier (and better) ways to do it, but here is the DIY way for those who enjoy some hands-on fun:

  1. Save the contents of the entire partition (or disk) in a separate file. While this step is not an absolute must, it's still better than playing with the original media and risking unrecoverable damage (to the information, not to the physical media). You can do this for example with the free HxD HexEditor (under Windows) or with the dd command under Linux.
  2. Now you should go through the data (again, you can access the original media directly, but it's better to work on a backup copy), look for the headers of the file types you wish to recover and dump a predefined amount of data starting from that location. The principles behind this approach are:
    • Binary files usually have a size field in their headers so that having junk at the end doesn't influence the ability of the programs to use them.
    • The fragmentation on SD cards for digital cameras is very low, meaning that the file data is laid out in a sequential way with a very high probability
    If these principles don't apply, you can have less than ideal results.

Below is a quick and dirty Perl script implementing this approach for JPEG images:

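use strict;
use warnings;

# Image of the card saved in step 1
my $fileName = "Removable Disk 1";
# Amount of data dumped per hit; pick something larger than the biggest picture
my $picSize = 4*1024*1024;

open(my $in, '<', $fileName) or die "Cannot open $fileName: $!";
binmode $in;
my $size = -s $fileName;

# Slow but simple: test every byte offset for the start of an Exif JPEG
foreach my $strpos (0 .. $size - 1) {
 seek($in, $strpos, 0);
 my $str;
 read $in, $str, 10;

 # The first 10 bytes of such a file end with the text "Exif"
 if ($str =~ /Exif\z/) {
  print "$strpos\n";

  seek($in, $strpos, 0);
  read $in, $str, $picSize;

  open(my $out, '>', "$strpos.jpg") or die "Cannot create $strpos.jpg: $!";
  binmode $out;
  print $out $str;
  close $out;
 }
}

close $in;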

A final tip: you can usually clean up the resulting files by opening and saving them with a program (which should strip the junk from the end). You can do this easily for image files with the batch conversion function of IrfanView, which is a great little freeware tool for image viewing / conversion under Windows (as long as you remember to uncheck the Google Toolbar during installation). Just remember that converting from a lossy image format to a lossy image format always means data loss!

Update: Andreas Gohr (the lead devel on DokuWiki) has a nicer solution.

Friday, July 27, 2007

Mixed links


After DefCon we might have a new debugger based on Olly and with Python scripting support.

A nice little (free) tool to view / edit PE files, with plugin support: CFF explorer

Update: it seems that the debugger (btw, de-bugger, what an interesting word) will be made public on August the 3rd.

Two channel authentication


I'm no Bruce Schneier, so I welcome the comments of any more informed and/or more intelligent readers (which shouldn't be too hard ;-)).

Two factor authentication is the buzz these days; it's the silver bullet of the security industry. To provide a short explanation (which will almost certainly leave out essential facts and get others wrong :-p):

To prove your identity to the other party you can use several factors. Usually factors are categorized in three groups:

  • Something you know (and hopefully only you know :)) - like a password, date of birth, SSN, etc
  • Something you have - like an RSA token, smart card, etc
  • Something you are, also known as biometrics - fingerprint readers, iris scanner, etc.

The basic premise of two factor authentication is that providing two factors from different groups (so a user name and password is not two factor!) increases the trust.

The thing I want to discuss is the way the authentication data goes from the user to the server, the channel through which the information flows. Why is this important? Because communication channels have their own weaknesses like replay attack (where the attacker captures the data - without necessarily being able to interpret it - and resends the exact sequence later) or man-in-the-middle attack. The most frequently used communication channel (HTTP over TCP/IP) is very vulnerable and even protocols which deal with data in transit (HTTPS) can not (and are not designed to) deal with endpoint security. To give a possible attack scenario:

  1. The user goes to a website (let's say over HTTPS)
  2. She enters her username and password
  3. She uses her finger with a fingerprint reader to create a hash that uniquely identifies her finger
  4. She sends all the data to the server for authentication
  5. However, there is an information collecting software (trojan) running on her computer, which captures and relays all the data before it enters the HTTPS tunnel
  6. Now all the attacker has to do is to replay all the data, including the fingerprint hash, without even needing a fingerprint reader (the existence of which the remote server can not check, it can only check that it was supplied the correct hash)

One of the problems in this scenario was that the channel (or more precisely one of its ends) got compromised. PCs (and Macs too :-)) are relatively easy to compromise because they were built as general purpose computers. However if one of the factors was transported over another channel, it would mean that the attacker would need to listen in on two channels. This is what I would call Two factor / Two channel authentication.

What attributes would such a channel need? It should be as widely accessible as possible and should be safe (to the extent possible) against spoofing. The (cell) phone network is something that comes close to this. So here is a possible implementation of such a system using the cell phone network:

  1. When the user registers on a site, she gives her cell phone number
  2. When she needs to log-in, she supplies her username and password
  3. After a very short time her cellphone rings, she picks it up and presses #
  4. Now she has authenticated using two factors (something she knows - the username and password - and something she has - her cellphone) and two channels!

Something like this already exists and can be used for free by anyone who needs to add a second factor to the authentication: PhoneFactor. (Disclaimer: I have no relations with PhoneFactor or anyone involved with it, I just think that it is a very cool idea).

Some issues with this method:

Can't the attacker just specify another phone number and authenticate using that? - No, if the system is built properly. That is, the user supplies the phone number at the registration phase (which is supposed to be done in a safe environment) and it can later be changed only after logging in. If there is no safe environment when the user registers for the first time with the website, all bets are off anyway.

But phone numbers can be spoofed! - Yes, but to fool this system, the attacker would have to be able to (a) know the phone number where the call will be placed and (b) compromise the (cell) phone network to re-route calls. While both of these are possible, they are (hopefully) much more difficult than installing a trojan on the user's machine.

Isn't it a privacy concern that my phone number is stored in a database? - Yes it is, but if the database gets compromised, there is probably more valuable information in it than your phone number. Unfortunately it's not possible to store your phone number as a one-way hash, because (a) the system needs to be able to call it and (b) phone numbers are relatively short (10 numeric characters), so brute-forcing the hash is very feasible. This could be mitigated if PhoneFactor offered a service to store the phone numbers, which could work as follows (I don't know if they have such a system):

  • When a new user registers, the site asks PhoneFactor for a UID (Unique Identifier) to be associated with that user
  • The user gives her phone number to the site, which in turn forwards it to PhoneFactor together with the UID and doesn't store it.
  • From here on, the website only needs to send the UID whenever it wants to ask PhoneFactor to verify the user, and it will know which number to call
  • When the user wants to change her phone number, the website would send the UID and the new phone number to be associated with it, without actually storing the phone number

In this case the information would be split up between two places (the website and PhoneFactor) and unless both are compromised, the complete identity-phone number can not be reconstructed.
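The split described above can be sketched in a few lines of Python. This is purely my illustration of the idea: the class and method names (PhoneService, Website, new_uid and so on) are made up, the phone call itself is only simulated, and I don't know whether PhoneFactor offers anything like it.

```python
import secrets


class PhoneService:
    """Stand-in for the hypothetical service which keeps the phone numbers."""

    def __init__(self):
        self._numbers = {}  # UID -> phone number; no usernames live here

    def new_uid(self) -> str:
        uid = secrets.token_hex(16)
        self._numbers[uid] = None
        return uid

    def set_number(self, uid: str, phone: str) -> None:
        if uid not in self._numbers:
            raise KeyError("unknown UID")
        self._numbers[uid] = phone

    def verify(self, uid: str) -> bool:
        phone = self._numbers[uid]
        # A real service would call the number here and wait for the # key
        return phone is not None


class Website:
    """Keeps username -> UID only; phone numbers are forwarded, never stored."""

    def __init__(self, service: PhoneService):
        self._service = service
        self._uids = {}

    def register(self, username: str, phone: str) -> None:
        uid = self._service.new_uid()
        self._service.set_number(uid, phone)
        self._uids[username] = uid

    def change_number(self, username: str, new_phone: str) -> None:
        self._service.set_number(self._uids[username], new_phone)

    def second_factor(self, username: str) -> bool:
        return self._service.verify(self._uids[username])


service = PhoneService()
site = Website(service)
site.register("alice", "0712345678")
print(site.second_factor("alice"))
```

Somebody who steals only the website's table gets usernames and random UIDs, and somebody who steals only the service's table gets UIDs and phone numbers without knowing whose they are.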

What if I'm at a place where there is no coverage for my phone or I'm unable to use it for other reasons (battery died, etc)? - then you are SOL, pardon my language. But then again, what if you are in a place where there is no internet access? Or no fingerprint reader?

PS. I argue that the RSA-style one-time password generators also fall in this category because one could say that there is a virtual channel between the RSA token and the authentication server, through the clock built into both of them and the serial number of the token.
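To show what I mean by a virtual channel through the clock, here is a Python sketch of a time-based code generator. Note that this is not the algorithm used inside RSA tokens (that one is proprietary); it is the openly specified TOTP scheme from RFC 6238, used here only as a stand-in. The function name time_code is mine, and the seed plays the role of the secret tied to the token's serial number.

```python
import hashlib
import hmac
import struct


def time_code(seed: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Code which both sides can compute from a shared seed and their clocks."""
    # Both the token and the server turn the current time into the same counter
    counter = unix_time // step
    digest = hmac.new(seed, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation as defined for HOTP/TOTP
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


seed = b"12345678901234567890"  # test seed from the RFC, not a real secret
print(time_code(seed, 59))
print(time_code(seed, 59) == time_code(seed, 30))  # same 30 second window
```

Nothing travels from the token to the server except through the user, yet the two agree on the code as long as their clocks agree.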

Approving comments


As I explained earlier, the only reason I prefilter comments is spam. I do not censor comments based on any other criteria. If your comment didn't show up, the only reason is that I'm being lazy (yet again :-p). However there was one comment on my Favicon for blogger post (which I will update shortly by the way) where it was not very clear how it should be categorized:

Anonymous said...

If your looking for more free icons then take a look at this page:

Free Icons

Only a small collection of icons but they're all free to use in your web design, favicon, blogs..

First let's take a look at what made this comment suspicious:

  1. Commented as Anonymous
  2. Its only purpose was promoting a site

These are things which, although they can reasonably be justified (for example it can be burdensome to create an account with YAWS - Yet Another Web Site - just to make a comment), raise the suspicion of people and can lead to your comment being categorized as SPAM and not published.

There are three things which swayed me to publish the comment:

  1. I feel strongly about giving people credit where credit is due
  2. SiteAdvisor gave the site a thumbs up. (A mini rant: quote from the SiteAdvisor result page: We tested this site and didn't find any significant problems. What problems did you find? Or is this just CYA talk? Then again I can't be too mad at them because there is no such thing as a 100% safe site on the internet. Interestingly, for some well respected sites the blurb is We've tested this site and found it safe to use, although they include annoyances like third party cookies or popups in the listing)
  3. I visited the site and it looks legitimate (although here is a tip for their designer: try to redesign the sections so that they do not resemble Google Ads so strongly, because my initial reaction was This site is full of ads, even though it was just the navigation menu)

Funny YouTube videos


Via the ComputerDefense blog:

Intel Video Ad Directed by Christopher Guest #1

And here are some others I've found clicking around:

Intel Video Ad Directed by Christopher Guest #2

"Mac or PC" Rap Music Video - Mac vs PC

South Park Mac vs. PC

Shooting yourself in the foot


This is a very old one and you can find it on a ton of sites. Most recently I saw it at the InfoSecPodcast blog. Rather than reposting the whole thing, here is just my favorite one:

% ls
foot.c foot.h foot.o toe.c toe.o
% rm * .o
rm: .o: No such file or directory
% ls

Wednesday, July 25, 2007

Responsible behavior


Disclaimer: the views expressed in this post (and on the entire blog) do not necessarily reflect the opinion of my past or current employers. These are entirely my own opinions.

"Know your audience!" and "Never underestimate human stupidity!" - these are two ideas missed by Alex Eckelberry in his latest blog post. Before I give you the link, repeat after me: I'm not running as Administrator, I have my computer fully updated and I have an AV or HIPS product installed & up-to-date.

Now here is the link to the posting in question: promoting malware?. In it Alex not only gives a link to a site which (supposedly) contains references to malware, he actually solicits (albeit indirectly) the readers to check it out. In his defense, the site itself is not malicious and only serves as a proxy, and most of his readers are somewhat knowledgeable in this area. However, did he consider that:

  • There could be other exploits served up
  • Maybe not everyone is well protected / prepared?

I'm all for open research in the appropriate forums, but one should always weigh the benefits against the risks of posting possibly harmful material to public forums.

Update: there was a bit of back and forth on the Sunbelt blog, with none of the parties admitting anything :). My final opinion is that the blog tries to ride two horses at once (or something like that): to serve both as a marketing vehicle and to be as responsible as possible, which can lead to conflicts of interest.

Hack the Gibson - special edition - aka lucky 13


I've been absent lately with the whole Hack the Gibson series, completely missing the 100th episode for example, not because I wouldn't have material, but because I'm very busy (or very lazy, depending on your viewpoint :-)). However I just wanted to let you know about a useful resource, which unfortunately seems to be dead (in the sense that the domain seems to have expired).

The site I'm talking about is (again, I didn't provide a link, because you would be met with a generic this domain has expired message). Fortunately most of the content (if not all) is still available at the Internet Archive's Wayback Machine (which as of the time of this writing also seems to be down - this is all a conspiracy I tell ya! :-)). The site consists of a (relatively) large set of materials criticizing Steve Gibson, and, even though the domain name is rather inflammatory, the content is well balanced. Hopefully it will come back someday.

PS. The existence of this site is both reassuring (in the sense that there are others who have similar opinions, and not just anybody, for example the author of Snort is one of them!) and intimidating (because if so much well written material couldn't get Steve to at least tone down his hype-machine, it's very improbable that I can).

And finally here is a quote to remind everybody what I object to (from episode #99, taken directly from the transcript) - the premise of this is that somebody has written in to counter one of Steve's arguments:

STEVE: "In the days before international banking, banks would build elaborate buildings. The reason for this is often considered by non-economists to be competitive. However, economists know that if it were out of competition, there would be similar architectural arms races in other industries. Yet banks were different somehow. The real reason is that the bank could afford to build beautiful buildings, while the fraudsters, who would open a bank and then skip town with the money deposited, could not. A baroque building was a signal of legitimacy. These scenarios are called ‘signaling games’ in economics and game theory that only a legitimate bank could afford to send.

"The problem in the online world, as you well know, is that people use the same rationale. If they go to a phishing site, and it has a nice layout with scripting and menus and animation, they assume it’s real. Enter EV certificates, the online equivalent of building a nice bank. It only makes economic sense to get one if you plan on sticking around. A nice website is a signal that anyone can duplicate, and therefore it isn’t a good signal at all. An EV-enhanced certificate that costs $15,000 per year is not easily duplicated and therefore is an effective signal. If you are legitimate and can’t afford one, you probably are not a target for phishing in the first place." Which actually I thought was sort of a really good point that he made. "If you don’t have the same need to signal your legitimacy as PayPal, eBay, Amazon, or an online bank, all of whom can afford one." And then he says, "I’ve written more on this exact topic if you’re interested," blah blah blah. But anyway, I just - I loved what he said. I mean, this is the kind of really good stuff that’s appearing in the mailbag now, so...

Now please direct your attention to exhibit A, aka the sentence where Steve refuses to give real credit to the guy (Google to the rescue), even though they praise the letter. This is selfish and disrespectful towards the listeners, who put time and effort into the show and without whom there would be no show!

Update: the Internet Archive's Wayback Machine is back up again, so here is the link to the last stored version:

Serving up authenticated static files


Two components which are usually found in web applications are authentication and static files. In this post I will try to show how these two interact. The post will refer to PHP and Apache specifically, since these are the platforms I'm familiar with, however the ideas are generally applicable.

The advantages of static files are: cacheability out of the box (with a dynamically generated result this is very hard to get right) and less overhead when serving up (even more so if something specialized is used like tiny httpd). However you might feel the need to apply authentication to the static files also (that is only users with proper privileges can have access to them). Of course you want to retain the advantages of caching and low overhead as much as possible.

One option (probably the one with the least overhead and ultimately the simplest to implement) is to use mod_auth_mysql on the directory hosting the static files, generate a long (!) random username and password for each user session, insert them into the authentication table, and modify the links to the resources to include these credentials. For example, a link in this case might look like this:

http://w7PLTHUDxK:[email protected]/static/image.jpg

The advantage of this approach is that we get all those wonderful things like content type or cache headers (or even zlib compression if we configured it) for free. The main pitfall is choosing where to do the cleanup (removing the temporary user from the table). The session destroy handler is not good enough, since it won't be called if the user doesn't properly log out. One solution would be to do repeated "garbage collections" on the table (in this case care must be taken to set the garbage collection interval equal to or larger than the session timeout interval, since otherwise the access might "go away" from under the users' feet while they are still logged on). Another option would be to add a user id column to the table and use the "REPLACE INTO" SQL command (which is AFAIK a MySQL extension, not standard SQL) to ensure that the temporary user table has at most as many rows as the main user table.

A quick note: all the above can of course be done with static authentication also (that is a hardcoded username and password in the .htaccess file). This is a very simple solution (and easier to apply, since mod_auth_mysql might not be installed/enabled on all the webservers, but mod_auth is on most of them), but it is insecure: it cannot be used to separate users (ie. to have files which only certain users can access), and because it does not expire automatically, one leaked link is enough for search engines / other crawlers to find the files.
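Generating the per-session credentials and the links is the easy part. Here is a Python sketch of it (the same thing is a few lines in PHP); the function names are mine, example.com is a placeholder host, and inserting the pair into whatever table mod_auth_mysql is configured to read is left out, since the table layout depends on your setup.

```python
import secrets
from urllib.parse import quote


def make_session_credentials(nbytes: int = 24) -> tuple:
    """Return a long random (username, password) pair for one user session."""
    # token_urlsafe uses the OS random source; 24 bytes give 32 characters
    return secrets.token_urlsafe(nbytes), secrets.token_urlsafe(nbytes)


def static_url(host: str, path: str, username: str, password: str) -> str:
    """Build a link to a static resource with the credentials embedded."""
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"http://{user}:{pwd}@{host}{path}"


username, password = make_session_credentials()
print(static_url("example.com", "/static/image.jpg", username, password))
```

The credentials come from a cryptographic random source rather than something like a timestamp, because anybody who can guess them can fetch the files.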

This is all well and good, but what if you don't have control over the server configuration? While I strongly recommend against using shared PHP hosting, some people might be in this situation. The solution is to recreate (at least some of) Apache's functionality.

The first step is to put the actual static files outside your web root (preferably) or to deny access to the folder where the files are placed with .htaccess (less preferable). If the files were to reside in a public folder, this system would provide obfuscation at best and would be equivalent to a 302 or 301 redirect at worst.

The next step is to decide on the method of referencing your static file. You have three options:

  • Put the file name directly in as a GET parameter (for example get_static.php?fn=image.jpg)
  • Use mod_rewrite to simulate a directory structure (static/image.jpg which will be rewritten by a rule into the form shown at the previous point)
  • Use the fact that Apache walks up the path until it finds the first file / directory, so you can do something like get_static.php/image.jpg

The second and third options are the ones I recommend. The reason behind this is that it gives the browser the illusion that you are dealing with different files which can help it do proper caching without relying on the ETag mechanism discussed later.

I would like to pause for a moment and remind everybody that security is a big concern in the web world, since you are practically putting your code out for everybody, meaning that anybody can come and try to break it. One particular concern with these types of scripts (which read and echo back to you arbitrary files) is path traversal. This attack is easy to demonstrate with the following example:

Let's say that the script works by taking the filename given, concatenating it with the directory (which for this example is /home/abcd/static/) and echoing back the given file. Now if I supply as the filename something like ../../../etc/passwd, the resulting path will be /home/abcd/static/../../../etc/passwd, which resolves to /etc/passwd, meaning that I can read any file the web server has access to. And before the Windows guys start jumping up and down saying that this is a *nix problem, the example is very easy to translate to Windows.

Now your first reaction would be to disallow (blacklist) the usage of the . character in the path, but don't go this way. Rather, define the rule which your files will follow and verify that the supplied parameters follow that rule. For example, the filenames will contain one or more alphanumeric, underscore or dash characters and will have a png, jpg, css or js extension. This translates into the regular expression ^[a-z0-9_\-]+\.(png|jpg|css|js)$. Be sure to include the start and end anchors (otherwise the input only has to contain a substring matching the rule; the whole string doesn't have to match it) and watch out for other regular expression gotchas. As an added security measure, use the realpath function (which resolves things like symbolic links or .. sequences) before performing any further verification.
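
The whitelist-plus-realpath check can be sketched roughly as follows. This is a Python illustration rather than PHP, and the function name resolve_static and its parameters are made up for the example. It uses \A and \Z instead of ^ and $ because, in Python at least, $ also matches just before a trailing newline, which is one of the gotchas mentioned above.

```python
import os
import re

# Whitelist: lowercase alphanumerics, underscore or dash, then an allowed extension.
# \Z is used instead of $ because $ would also match just before a trailing newline.
ALLOWED_NAME = re.compile(r"\A[a-z0-9_\-]+\.(png|jpg|css|js)\Z")


def resolve_static(filename, static_dir):
    """Return the real path of a requested file, or None if the request is rejected."""
    if not ALLOWED_NAME.match(filename):
        return None
    base = os.path.realpath(static_dir)
    candidate = os.path.realpath(os.path.join(base, filename))
    # After resolving symlinks and '..', the file must still live inside the base directory.
    if os.path.commonpath([base, candidate]) != base:
        return None
    return candidate
```

The second check looks redundant once the name has passed the whitelist, but it still catches a whitelisted name that is a symbolic link pointing outside the static directory.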

Now we have the file, and need to generate the headers. The important headers are:

  • Content-Length - this is very straightforward: it is the size of the file in bytes.
  • Content-Type - this can be obtained using the mime_content_type function; however, be aware that sometimes it fails to identify the correct type and action must be taken to correct it (for example a CSS file might be identified as text/plain, but it must be served as text/css to work in all browsers)
  • Cache headers - depending on how long you think the clients / intermediate proxies should cache your content, these must be set accordingly.
  • ETag - this is a header which helps the browser distinguish between multiple content sources behind what it considers the same URL. For example, if two different images are reachable through links which the browser treats as the same cache entry, then without an ETag you can have situations where the second image is displayed instead of the first or vice-versa, because the browser operates under the assumption that they are the same and pulls one out of the cache. ETags can be an arbitrary alphanumeric string, so for example you could use the MD5 hash of the file (and no, there is no information disclosure vulnerability here which would warrant the usage of salted hashes, because the user is already getting the file! S/he can recalculate the MD5 of it if s/he wishes!)
  • Content-Encoding - if you wish to compress your content and it makes sense to do so, be sure to output the proper Content-Encoding header. Also make sure to adjust the Content-Length header, otherwise you could have some serious breakage.
  • Accept-Ranges - if you wish to enable resume support for the file (that is, for the client to be able to start downloading from the middle of the file, for example), you need to provide this header (and handle range requests, as described below).
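
To make the header list a bit more concrete, here is a minimal sketch of how the basic response headers could be assembled. It is written in Python purely for illustration; build_headers is a hypothetical helper, and the one-hour cache lifetime is an arbitrary assumption.

```python
import hashlib
import mimetypes


def build_headers(filename, body):
    """Build a minimal set of response headers for a static file (sketch)."""
    content_type, _ = mimetypes.guess_type(filename)
    return {
        # Size of the bytes that will actually be sent.
        "Content-Length": str(len(body)),
        # Fall back to a generic binary type when the extension is unknown.
        "Content-Type": content_type or "application/octet-stream",
        # MD5 of the content, wrapped in double quotes as ETag values are.
        "ETag": '"%s"' % hashlib.md5(body).hexdigest(),
        # Assumed cache lifetime of one hour; adjust to your content.
        "Cache-Control": "max-age=3600",
    }
```

If the body is later compressed, Content-Length has to be recomputed from the compressed bytes.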

The script also needs to take into account the request headers:

  • If-Modified-Since - the browser is checking the validity of the cached object, so the script should return a 304 status if the content didn't change and provide no content body.
  • Accept-Encoding - this should be checked before providing compressed (gzipped) content. Also, beware that some older browsers falsely claim to support gzipped content.
  • Range - if you specified that you handle ranges, you must look out for this header and send only what was requested. This of course can be further complicated by compression, in which case you need to take the specified chunk, compress it, make sure to output the correct Content-Length, and then send it
  • If-None-Match - if you supplied an ETag when serving up the content, it will (should) be returned to you in this header when the browser is doing cache checking
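
The cache validation part of this list can be sketched as below. Again this is Python for illustration only; is_not_modified is a hypothetical helper, and it assumes the request headers were already collected into a dict with lower-cased names. Weak ETags (the W/ prefix) are not handled.

```python
from email.utils import parsedate_to_datetime


def is_not_modified(request_headers, etag, mtime):
    """Return True when the cached copy is still valid and a 304 can be sent.

    request_headers: dict with lower-cased header names (an assumption of this sketch)
    etag: the quoted ETag currently served for the file
    mtime: file modification time as a Unix timestamp
    """
    if_none_match = request_headers.get("if-none-match")
    if if_none_match is not None:
        # When the client sent ETags, that check decides and the date is ignored.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        return "*" in candidates or etag in candidates
    if_modified_since = request_headers.get("if-modified-since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since).timestamp()
        except (TypeError, ValueError):
            return False  # unparsable date: serve the full content
        return int(mtime) <= since
    return False
```

When the function returns True, the script would answer with a 304 status and no body; otherwise it sends the file as usual.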

After writing all this up, I found that there is a PHP extension which provides most of the functions for this: HTTP. Use it. It's much easier than rolling your own and you have less chance of missing some corner cases (like the fact that as per HTTP/1.1 request header names are case-insensitive, meaning that If-Modified-Since and iF-mOdIfIeD-sInCe are the same thing and should be treated the same).

PS. I didn't mention it, but this mechanism can also be used to hide the real file names. This might be needed when for whatever reason you don't want to divulge them (because file names can provide additional information which you might not want your users to have). This can be achieved by using an additional step and giving the user a token which is translated into a file name at the server. These tokens can be:

  • Generated from the file name
  • Arbitrarily chosen
  • Created using a random process
  • Created using a deterministic process

For maximum security I recommend to go with arbitrarily chosen random tokens for each file (otherwise an attacker might break the security by trying other IDs - for example if the IDs are numeric, s/he can try other numbers - or by guessing the file names and applying the generator function on it and checking the existence of the file).
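
A minimal sketch of the random-token approach follows (Python, for illustration). TokenStore is a hypothetical in-memory mapping; a real setup would presumably persist the mapping in a database or a file.

```python
import secrets


class TokenStore:
    """Maps unguessable random tokens to real file names (in-memory sketch)."""

    def __init__(self):
        self._by_token = {}

    def issue(self, filename):
        # 16 random bytes give 128 bits of entropy; the token reveals nothing about the name.
        token = secrets.token_urlsafe(16)
        self._by_token[token] = filename
        return token

    def lookup(self, token):
        # Unknown tokens resolve to None; the caller should answer with a 404.
        return self._by_token.get(token)
```

Because the tokens come from a cryptographically secure random source and are not derived from the file name, neither enumerating IDs nor guessing names and re-running a generator function helps an attacker.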

Update: I've looked at using mod_xsendfile with PHP, however it seems to be a dormant project (the latest posted version is for Apache 2.0, nothing there for 2.2 :-(). Another option which may be worth exploring is the following (if you are using PHP as a loadable module rather than CGI): use virtual to redirect the request to the static files. You can even find a good example in the comments.

Tuesday, July 24, 2007

Compressed HTTP


The HTTP standard allows for the delivered content to be compressed (to be more precise it allows for it to be encoded in different ways, one of the encoding being compression). Under Apache there are two simple ways to do this:

I won't go into much detail on the configuration options, however I want to describe one little quirk, which is logical in hindsight but which I struggled with a little: you can lose the Content-Length header on files which don't fit in your compression buffer from the start. This is of course logical because:

  • Headers must be sent before the content
  • If the server must do several read from file - compress - output cycles to compress the whole file, it can't possibly predict accurately (to the byte level) how large / small the compressed version of the file will be. Getting it wrong is risky because client software might rely on this and could lock up in a wait cycle or display incomplete data.
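
The effect can be illustrated with a small streaming compressor (a Python sketch using zlib, not what Apache does internally; compress_stream is a hypothetical name). The total compressed size is only known after the last piece has been produced, which is too late to put it in a header.

```python
import zlib


def compress_stream(chunks):
    """Yield gzip-compressed pieces for an iterable of byte chunks."""
    # wbits = 16 + MAX_WBITS asks zlib for a gzip (not raw deflate) container.
    compressor = zlib.compressobj(wbits=16 + zlib.MAX_WBITS)
    for chunk in chunks:
        piece = compressor.compress(chunk)
        if piece:
            yield piece
    # Only now is the remaining data (and so the final size) available.
    yield compressor.flush()
```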

Update: if you want to selectively disable mod_deflate for certain files because of this (or other reasons), check out this post about it.

You can observe this effect when downloading (large) files especially, since the absence of a Content-Length header means that the client can't show a progress bar indicating the percentage you downloaded (this is what I observed at first and then went on to investigate the causes).

One more remark regarding the getting the Content-Length wrong part. One (fairly) common case where this can be an issue is with PHP scripts which output Content-Length headers and the compression is done via zlib.output_compression. The problem is that mod_php doesn't remove the Content-Length header, which almost certainly has a larger value than the size of the compressed data. This causes the hanging, incomplete downloads symptom. To be even more confusing:

  • When using HTTP/1.1 and keep-alive, this problem manifests itself.
  • When keep-alive is inactive, the problem disappears (sort of). What actually happens is that the Content-Length is still wrong, but the actual connection is reset by the server after sending all the data (since no keep-alive = one request per connection). This usually works with clients (both curl and Firefox interpreted it as download complete), but other client software might choose to interpret the condition as a failed/corrupted download.

The possible solutions would be:

  • Perform the compression inside your PHP script (possibly caching the compressed version on-disk if it makes sense) and output the correct (ie. the one corresponding to the compressed data) Content-Length header. This is more work, but you will retain the progress-bar when downloading files
  • Use mod_deflate to perform the compression, which removes the Content-Length header if it can't compress the whole data at once (this is not specified in the documentation, but - the beauty of open source - you can take a peek at the source code - the ultimate documentation. Just search for apr_table_unset(r->headers_out, "Content-Length"); ). This will kill the progress bar (for the reasons discussed before). To get back the progress bar, you could increase the DeflateBufferSize configuration parameter (which is by default set to 8k) to be larger than the largest file you wish to serve, or deactivate compression for the files which will be downloaded (rather than displayed).
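
The first option (compress in the script and send a matching Content-Length) might look roughly like this. It is a Python sketch of the idea rather than PHP; compressed_response is a hypothetical helper, and it assumes the request headers are available as a dict with lower-cased names.

```python
import gzip


def compressed_response(body, request_headers):
    """Return (headers, payload), compressing only when the client asked for gzip.

    request_headers: dict with lower-cased header names (an assumption of this sketch)
    """
    headers = {}
    accepted = request_headers.get("accept-encoding", "")
    # Naive parsing: q-values such as "gzip;q=0" are ignored in this sketch.
    encodings = [item.split(";")[0].strip().lower() for item in accepted.split(",")]
    if "gzip" in encodings:
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    # Set Content-Length last so it always describes the bytes actually sent.
    headers["Content-Length"] = str(len(body))
    return headers, body
```

The important design point is the ordering: the length is computed after the compression decision, so the two can never disagree.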

A final remark: the HTTP protocol also supports compressing the uploaded data (this can be useful, for example, when uploading larger files), as shown by the following blurb in the mod_deflate documentation:

The mod_deflate module also provides a filter for decompressing a gzip compressed request body. In order to activate this feature you have to insert the DEFLATE filter into the input filter chain using SetInputFilter or AddInputFilter.


Now if a request contains a Content-Encoding: gzip header, the body will be automatically decompressed. Few browsers have the ability to gzip request bodies. However, some special applications actually do support request compression, for instance some WebDAV clients.

When I saw this, I was ecstatic, since I was searching for something like this for some of my projects. If this works, it means that I can:

  • Use a protocol (HTTP) for file upload which has libraries in many programming languages
  • Use a protocol which needs only one port (as opposed to FTP) and can be made secure if necessary (with SSL/TLS)
  • Use compression, just like rsync can (and, although it can't create binary diffs on its own, when the uploaded files are not used for synchronization, this is not an issue)

Obviously there must be some drawbacks :-)

  • It seems to be an Apache-only feature (I didn't find anything which could indicate support in IIS or even some clear RFC to document how this should work)
  • It can't be negotiated! This is a huge drawback. When server-side compression is used, the process is the following:
    • The client sends an Accept-Encoding: gzip header along with the request
    • The server checks for this header and if present, compresses the content (minus the time, when the client doesn't really support the compression)
    However, the fact that the client is the first to send, means that there is no way for the server to signal its (in)capability to accept gzip encoding. Even the fact that it's Apache and previously served up compressed content doesn't guarantee the fact that it can handle it, since the input and output filters are two separate things. So the options available are:
    • Use gzip (optionally preceding it with a heuristic detection like the one described before - is it Apache and does it serve up gzip compressed content), and if the server returns an error code, try without gzip
    • The option which I will take - use this only with your own private servers where you configured them properly.
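
The first option (try gzip, fall back on error) could be sketched like this in Python. The send callable is hypothetical: it stands for whatever function actually performs the HTTP request and returns the status code.

```python
import gzip


def upload_with_fallback(send, body):
    """Try a gzip-compressed upload first and retry uncompressed on an error status.

    send: hypothetical callable (body_bytes, headers_dict) -> HTTP status code
    Returns (status, used_gzip).
    """
    status = send(gzip.compress(body), {"Content-Encoding": "gzip"})
    if 200 <= status < 300:
        return status, True
    # The server may not have an input filter configured; retry without compression.
    return send(body, {}), False
```

Note the weakness of this approach: a server without the input filter might accept the compressed bytes as-is and still answer with a success code, which is one more reason to prefer servers you configured yourself.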

So how do you do it? Here is a blurb, again from the mod_deflate source code: only work on main request/no subrequests. This means that the whole body of the request must be gzip compressed if we choose to use this; it is not possible to compress only the part containing the file, for example, in a multipart request. Below you can see some Perl code I hacked together to use this feature:

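use strict;
use warnings;
use File::Temp qw/tempfile/;
use Compress::Zlib;
use HTTP::Request::Common;
use LWP;

# Make POST return the body as a callback instead of one big string,
# so that large files are never held in memory all at once.
$HTTP::Request::Common::DYNAMIC_FILE_UPLOAD = 1;

my $request = POST '',
    [
        'test1'  => 'test1',
        'test2'  => 'test2',
        'a_file' => ['somedata.dat'],
    ],
    'Content_Type'     => 'form-data',
    'Content_Encoding' => 'gzip';

# Replace the request body with a gzip-compressed copy spooled to a temporary file.
sub transform_upload {
    my $request = shift;
    my ($fh, $filename) = tempfile();
    my $cs = gzopen($fh, "wb");
    my $request_c = $request->content();
    while (my $data = $request_c->()) { $cs->gzwrite($data); }
    $cs->gzclose();    # flush the remaining compressed data
    close $fh;
    open $fh, '<', $filename or die "Cannot reopen $filename: $!";
    binmode $fh;
    # Stream the compressed file back in 4 KB chunks.
    $request->content(sub {
        my $buffer;
        if (0 < read $fh, $buffer, 4096) {
            return $buffer;
        } else {
            close $fh;
            return undef;
        }
    });
    # The length must describe the compressed body.
    $request->content_length(-s $filename);
}

transform_upload($request);

my $browser = LWP::UserAgent->new();
my $response = $browser->request($request);

print $response->content();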

This code is optimized for big files, meaning that it won't read the whole request in the memory at one time. Hope somebody finds it useful.
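
For comparison, the core idea (compress the entire request body and label it with Content-Encoding: gzip) can be sketched in a few lines of Python. This is only an illustration: build_gzip_upload is a hypothetical helper, the URL in the usage note is a placeholder, and unlike the Perl code above it holds the whole body in memory.

```python
import gzip
import urllib.request


def build_gzip_upload(url, body, content_type):
    """Build (but do not send) a POST request whose entire body is gzip-compressed."""
    compressed = gzip.compress(body)
    request = urllib.request.Request(url, data=compressed, method="POST")
    request.add_header("Content-Type", content_type)
    request.add_header("Content-Encoding", "gzip")
    # The length must describe the compressed bytes, not the original body.
    request.add_header("Content-Length", str(len(compressed)))
    return request
```

The request could then be sent with urllib.request.urlopen(build_gzip_upload("http://upload.example.invalid/upload", data, "application/octet-stream")), assuming the server has the DEFLATE input filter enabled.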

The emperor is not naked!


I was reading the SANS journal for this morning (in my time zone :-)), titled Antivirus: The emperor is naked, and got a little upset (probably because it's very hot here and I hadn't had my morning tea yet :-D). If you are like me (i.e. lazy) and don't want to go over and read the post (btw, subscribe to SANS, they are a great information source), the short version of it is that AV can't cope with some advanced malware served up through exploits which is transformed each time somebody downloads a copy (that is, you can't download two identical copies).

The reality is: while there are many drawbacks of AV (and blacklisting in general) and this certainly is a problem, all of the successful AV companies have moved away from simple signatures (like searching for one or more byte sequences) to more complex methods. And anything that is generated by an algorithm in finite time (like this malware) can be identified by an algorithm (the AV software).

The post has a very valid point however: corporations where the variety of used software is limited should move away from a blacklist approach to a whitelist approach for maximum safety.

Living off of the hype


Disclaimer: I work for a competitor, however this is my personal opinion and does not necessarily represent the views of any of my past or future employers.

So tell me, what exactly does F-Secure contribute to the malware fighting effort? Sure, they have a blog and a chief researcher who has an opinion about everything (including many things he hasn't thought through very well - like the .bank top level domain or SMS authentication), but they are only a front for Kaspersky Labs. (To be fair, they mention it in one of their blog posts). They are 99.99% Kaspersky, so why do they need researchers in two locations? Stop the marketing, guys, and do something useful.

Again, in the spirit of fairness, I know that this blog contributed to raising the awareness about the malware issue, however I feel that they don't give enough credit to the main force behind them (BTW, I'm not in any way affiliated with Kaspersky Labs).

Monday, July 09, 2007

Finding a Windows computer based on its NetBios name


A short tip: when working in hybrid environments (that is, where both Windows and Linux machines are present), it is useful to be able to look up a machine's IP based on its NetBios name. You can do this by running nmblookup [the name of the computer]. This will do a broadcast on all the interfaces, querying the directly attached subnets for machines which match the given name, and will output their IP addresses.

Sunday, July 08, 2007

Offline updating of Debian systems


It has been my experience that a Linux system is much more usable if it's connected to the Internet, because then the package management system can be used to resolve the dependencies of the programs. From what I've seen (and please bear in mind that I'm fairly new to it), in Linux it is much more common to reuse programs / libraries and there is much less reinventing the wheel going on than in the Windows world. I can only theorize as to what the reason for this may be, but I think that the clear-cut licenses may be the main one (basically almost everything is under the GPL - meaning that a programmer knows that s/he can reuse all the other pieces of code).

While this makes for a much more pleasurable experience for the developer, it makes the software harder to install, because you have to have all its dependencies (the libraries/programs it relies on) and their dependencies and so on. A package management system makes it seamless if you are connected to the internet.

However if you have no access to the internet, under Ubuntu you can export the download instructions to a file, which you can take to a computer connected to the internet and execute it. If the given computer runs Windows, you can still use this file to download the packages, just get WGET for Windows, rename the file such that its name ends with .bat (for example download.bat) and remove the first line (the one which begins with #!)
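
The manual conversion step (drop the first line, save as .bat) is simple enough to script. Below is a small Python sketch; to_batch is a hypothetical helper, and it assumes the exported file is a plain shell script consisting of a #! line followed by one download command per line.

```python
def to_batch(script_text):
    """Drop a leading shebang line so the remaining commands can run as a .bat file."""
    lines = script_text.splitlines()
    if lines and lines[0].startswith("#!"):
        lines = lines[1:]
    # Windows batch files conventionally use CRLF line endings.
    return "\r\n".join(lines) + "\r\n"
```

The result would be written to a file such as download.bat and run in a directory where wget is on the PATH.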

An alternative to this is hyperget, a new project which aims to make this process even more simple.

Computer immune system


Disclaimer: this post (like all the others) is my personal opinion and does not necessarily represent the opinions of any of my past or current employers.

From time to time I get questions from people like: how to best secure my computer? or which security products to use?. Other times they ask me is product X any good? or argue that product Y stops 99% of the threats, so it must be good!.

My first response is: look at the independent tests. But the actual response is much more complicated:

One thing that all people should realize is that there is no such thing as perfect security. Unfortunately this is something that is quite shocking at first (although it is not so shocking if you stop to think about it a bit: for example in the physical world there is no perfect security either). Every measure you take is only a risk reduction measure. So don't expect to solve the problem of computer security by throwing a lot of money at it!

Another surprising factor: security by obscurity is a valuable risk reduction strategy in computer security. While you should not rely on it as your sole defense, it can improve your (computer) security considerably, because the bad guys are capitalists themselves and want to optimize their revenue / effort ratio, which usually means: target the most common configuration. So here is a (by no means exhaustive) list of the characteristics of a common setup (this may be biased because it's based on the systems I've seen lately):

  • It runs Windows XP
  • It uses Internet Explorer
  • It runs as administrator or at least as a user with Power User privileges
  • It uses Windows Media Player or Winamp for media playback
  • It uses Yahoo! Messenger or MSN Messenger (which I think is now called Live Messenger) for chatting. Skype is also common.

This means that the following steps reduce your risk. Of course you have to weigh the costs against the benefits of every item you may or may not implement.

(All the enumerated software is free for personal use - most of time also for commercial use, but please do check - and/or open source)

Here are some additional steps you could take to improve your security:

  • Disallow scripting on web pages (entirely, or selectively)
  • Block the execution of programs from all paths, except those which are needed.

I've listed these items separately because they need more active intervention while in use and also need constant adjustment. One thing I want to emphasize here is that many of these measures work not because they inherently improve security, but because they create a playing field the attacker did not anticipate.

One very good example of this is the fairly recent ANI exploit, where many security experts advised turning off JavaScript as a way of preventing exploitation. The thing is that it worked, not because the vulnerability had anything to do with JavaScript, but because the attackers counted on JavaScript being enabled on most computers and used it to obfuscate the exploit code. Later many more exploits appeared which made no use of JavaScript and thus worked perfectly with it turned off!

This is one of the reasons why alternatives are more secure than the thing used by the majority. If all the computers were using the same software, a vulnerability found in it could bring down a very large percentage of them (as the Morris worm demonstrated). This is also why the Microsoft market dominance is such a big problem.

A final thing I wanted to mention: with security products the attacker has the upper hand most of the time. S/he can test the malware against them as many times as s/he would like, to ensure that it's not detected / prevented. This is why the big two (or three) AV companies usually have slower reaction times than smaller ones. Although they can write generic detections for classes of malware as well as the lesser-known ones can, malware authors will usually tweak their malware until it's not detected by them (and before you ask, detecting malware generically is mathematically impossible, although there exists a perfect antivirus :-)). This is also why smaller security vendors seem to provide better protection; however, it is likely that if they faced the same situation they would perform the same as (or worse than) the big players.

In conclusion: diverge from the mainstream where possible, however keep your eye on the cost (if you're responsible for the IT in a company, take into account the time and cost it would take to re-train your users to handle the changes). When evaluating security products ask yourself: is this effective because of the way most other products work or does it provide transparent security (as opposed to security by obscurity)?. Because this is a hard question to answer (if you don't work in the field), try asking the following (mostly) equivalent question (which is somewhat easier, although still hard): can I think of any way to circumvent this system?

These things need to be considered because there are targeted attacks out there (and by targeted I don't mean necessarily you or your company, although that is the worst case because then the attacker can perfectly adapt her/his strategy to your counter measures, it can mean all users for a certain country, all users of a certain service/product and so on).

Thursday, July 05, 2007

Google survey beta


So Blogger wanted to know my opinion. I happily clicked along to express my desire to be able to include syntax highlighted code easily. Five point question: what is wrong with the webpage below?

Hint: how do I submit this thing? The look of the questionnaire was spartan and simple (unlike some people who feel that they must style their checkboxes to look like big buttons, and to add insult to injury do the styling with JavaScript rather than CSS!), and the questionnaire itself was short and to the point, but this was all ruined by the tiny little problem of sending the results :-). Admittedly I'm using a somewhat unconventional browser (Epiphany aka Gnome Web Browser), but it is based on the same engine as Firefox (Gecko).

Checking out CVS and creating patches


Update: Qemu moved from CVS to SVN. While the CVS repository is (and will be) available for some time, you should look at the new checkout instructions.

Lately I started to dive into open-source development, specifically Qemu. Since I'm relatively new, here are some commands I found useful:

cvs -z3 -d:pserver:[email protected]:/sources/qemu co qemu - to check out the source code from a CVS repository, the Qemu source code in this case. Unfortunately CVS by default (without SSH tunneling, etc) uses a rather strange port (2401), which is firewalled at most places.

cvs diff -u vl.c vl.h > ../dump_traffic_to_pcap.patch - to create a so-called patch file (a list of differences between the files on your hard drive and the ones in CVS), which can later be applied to the source code by the maintainer(s) of the project if they so choose. This command must be issued from within the directory where the project was checked out, so that it can pick up the settings of the checkout.
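
To get a feeling for what the -u (unified) format contains, here is a small Python sketch which produces the same kind of output from two in-memory versions of a file. It uses the standard difflib module, not CVS; make_patch and the a/ and b/ path prefixes are choices made for this example.

```python
import difflib


def make_patch(old_text, new_text, filename):
    """Return a unified diff between two versions of a file as a single string."""
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile="a/" + filename,
        tofile="b/" + filename,
    )
    return "".join(diff)
```

Lines starting with - were removed, lines starting with + were added, and the unprefixed lines around them are context which helps the patch apply even if the file has shifted a little.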

Some tips I picked up until now (again, I'm by no means an expert):

  • Use the conventions of the source code. This means everything from commenting style, number of tabs to types of functions used (do they use fopen or open?).
  • Make your patches against the latest CVS version. It makes it easier for the maintainer(s) to apply your patch
  • Use Meld or WinMerge to port your patch to a new CVS version
  • Use dos2unix and unix2dos if you are doing cross-platform development (they can be found in the tofrodos package if you are using Ubuntu)
  • Be patient

Two quick tips


Via the .:Computer Defense:. blog: the Windows command prompt has a history feature: just press F7 in a command window.

One of the great features of Firefox 2 is session saving (I know, there were extensions before that to do the same thing, but they somehow never worked for me). If you want to activate it for every start, not just when Firefox crashes, go to Edit -> Preferences (or Tools -> Options on Windows, I think), Main -> Startup and set When Firefox starts to Show my windows and tabs from the last time. (Via Otaku, Cedric's weblog and MozillaZine)

Update: Thanks to Andy for the tip: there are a lot more hidden features of the command shell which make it a lot more bearable. For a complete description check out The Windows NT Command Shell if you have some time on your hands and/or wish to make your immersions in the command line world more efficient.

Update to the update: the shell has an emulation layer for DOSKEY, which means you can use all of its features without having to run unsupported 16 bit code!

Why not to chain remote desktops?


Quick tip (learned the painful way): do not chain remote desktops, meaning don't open a remote desktop (or VNC session for that matter) to one computer and open in that session a remote desktop to another computer, unless you have bandwidth to waste and don't mind the increased delay :-).

The explanation is rather simple (my head still hurts from banging it against the wall for not figuring it out earlier :-)): remote desktop programs work by monitoring the screen and only sending areas which have changed. This has the advantage of a reduced need for network bandwidth and less latency. However most window managers (and I'm not talking only about things like Gnome, but also the Windows GDI) provide notification only on a control level (a control is usually defined as a rectangular area which has its own window handle or equivalent - for example a button, an edit control, etc), because that's the abstraction level they work on. Usually the flow of things in a GUI system goes like this:

  • The GUI subsystem figures that a portion of the desktop needs redrawing
  • It then creates a list of controls that need to do the redraw by intersecting the area occupied by the control with the area that needs to be redrawn, taking into account the spatial order of the controls (which is behind which).
  • Finally it transmits a message to the controls which need to be redrawn with the exact area they need to draw

As you can see from the above list, the smallest unit the OS knows about (and thus the smallest unit it can notify other programs about) is a rectangular area of the control. Usually there is no way for the control to notify the OS that it has redrawn only a small portion of the rectangle. Now if we have another remote desktop connection open, it basically acts as a big rectangle which continuously refreshes, and thus in turn must be retransmitted as a whole through the primary connection. Basically you are getting a low framerate movie :-)
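
The intersection step from the list can be sketched as follows (Python, purely illustrative: Rect, intersect and controls_to_redraw are made-up names, and the z-order of the controls is ignored for simplicity).

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Return the overlapping area of two rectangles, or None if they do not overlap."""
    left, top = max(a.left, b.left), max(a.top, b.top)
    right, bottom = min(a.right, b.right), min(a.bottom, b.bottom)
    if left >= right or top >= bottom:
        return None
    return Rect(left, top, right, bottom)


def controls_to_redraw(controls, dirty):
    """Map each control name to the part of it that falls inside the dirty area."""
    result = {}
    for name, area in controls.items():
        overlap = intersect(area, dirty)
        if overlap is not None:
            result[name] = overlap
    return result
```

The inner remote desktop window is just one entry in controls. If it reports its whole area as changed on every refresh, the dirty rectangle handed to the outer connection is the entire window each time.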

Before anyone jumps at me: what I've said is true for bitmap based remote desktop products (that is, ones which use bitmaps rather than vector graphics to transmit the information). In the case of vector graphics (like the X server of the NoMachine/NX technology) many of these concerns are eliminated.

So what is the solution? Probably you are remoting in through a third machine, because you can't remote in directly. In this case it is recommended to port forward (through this third machine) to your machine the listening port of the remote desktop. Forwarding through something like an SSH tunnel also gives you added security over something like VNC which is essentially cleartext (or clear-image :-)) and truncates your password to 8 characters (that is from the point of view of a VNC server the passwords abcdefgh1234 and abcdefgh are equivalent). Of course SSH has its problems too, so be sure to only allow version two protocol (version one has some weaknesses) and use a strong password.

Another way to forward ports on Linux (to be used only on secured links!) is socat, which describes itself as netcat++. Its most notable ability is that it can fork separate instances to handle multiple connections.

A related tip: if rdesktop keeps crashing on you, try changing your color depth. For example, it crashed when I was running at 16 bit, while with 24 it works flawlessly, even though I was connecting using an 8 bit color depth.

Also note that your clipboard can stop functioning when you have multiple remote desktop / VNC instances open. This is something I haven't figured out a solution for, and it is most probably related to the way the different remote desktop products try to synchronize the clipboard.

Wednesday, July 04, 2007

Regex magic


First of all I want to apologize to my readers (both of them :-)) for being AWOL, but real life sometimes interferes pretty badly.

I have always been a big fan of regular expressions, and one of the main reasons I love Perl is that they are so deeply integrated in it and are natural to use. (Of course there are many negative aspects one must be aware of, like speed or the fact that sometimes they can be quite hard to read). To deal with the latter problem, here is a link to a Perl module which tries to dissect and explain step by step what a regular expression does:

YAPE::Regex::Explain. Be aware that it has a dependency on YAPE::Regex, but this fact is not specified in the package, so doing an install YAPE::Regex::Explain will fail if it's not preceded by an install YAPE::Regex, even though this should be done automatically (and it would be if the package were created properly). Running a regular expression through this module will produce output like the following:

The regular expression:

(?-imsx:a+?)

matches as follows:

NODE                     EXPLANATION
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
  a+?                      'a' (1 or more times (matching the least
                           amount possible))
)                        end of grouping

Another interesting module I came across thanks to this blog post is Regexp::Assemble, which can be used to combine regular expressions and create one big expression which matches anything the starting expressions would have matched (so it is a union of the regular expressions), but it's also optimized! Wicked cool.
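
The basic idea of the union can be shown with a naive Python sketch. To be clear, this is not how Regexp::Assemble works: it simply joins the patterns with alternation and does none of the optimization (such as merging common prefixes) which makes the module interesting. The combine function is a made-up name.

```python
import re


def combine(patterns):
    """Build one compiled regex that matches whatever any of the input patterns matches."""
    # Each pattern is wrapped in a non-capturing group so the alternation stays well-formed.
    return re.compile("|".join("(?:%s)" % p for p in patterns))
```

With many similar patterns the naive alternation makes the regex engine retry each branch in turn, which is exactly the cost an assembled, prefix-sharing expression is meant to avoid.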