hype-free: My submission for The Ethical Hacker Skillz Challenge

The submission deadline for the 8th Ethical Hacker Skillz Challenge has passed and I'm eagerly awaiting the results (which should be published any day now). Until then, here is my version of the solution; maybe somebody will find it useful someday:

What is the significance of various numbers in the story, including the speech patterns of the goose and Templeton?
Both the goose and Templeton have a tendency to stretch their speech: the goose repeats parts of words, while Templeton uses longer and longer words. The series of numbers representing how many times a given syllable was repeated and how many characters the words contained is 2 3 5 7 11, which are the first five prime numbers (numbers greater than one that are divisible without a remainder only by one and themselves).
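As a quick sanity check of the claim above, here is a minimal sketch that generates the first five primes by trial division. The function name first_primes is my own choice for illustration.

```python
def first_primes(count):
    """Return the first `count` primes using trial division."""
    primes = []
    candidate = 2
    while len(primes) < count:
        # A candidate is prime if no smaller prime divides it evenly.
        if all(candidate % p != 0 for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


print(first_primes(5))  # [2, 3, 5, 7, 11]
```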

As for the two prices (1,618.03 and 2,718.28), I have no idea, but I have found this PDF, which seems to represent some price list where both numbers are the prices of different Mercedes models.
How had Charlotte and the Geography Ants fooled Lurvy's integrity-checking script?
They created two files (t1.html and t2.html) which had different content but the same MD5 hash. This was possible because research in this area has made finding two such streams of data (usually referred to as a collision) rather simple for MD5. One interesting aspect is that MD5 (like most widely used hash algorithms) processes its input sequentially, one fixed-size block at a time (64 bytes for MD5), carrying only a small internal state from one block to the next. It is therefore enough for the attacker to generate two different headers of the same size, a whole number of blocks long, which leave the algorithm in the same internal state; if the data that follows is identical, the resulting hashes will be equal.

For example if we have the following two streams:
AAAAAAASSSSSSSSSSSSSSSSSSSSSSS BBBBBBBSSSSSSSSSSSSSSSSSSSSSSS
if MD5(AA...A) is equal to MD5(BB...B) because both headers, which have the same block-aligned length, leave MD5 in the same internal state, then MD5(AA...ASS...S) will be equal to MD5(BB...BSS...S), regardless of what the sequence SS...S contains. In this concrete situation, the AA...A and BB...B parts were the two colliding byte sequences originally published in the 2004 paper by Xiaoyun Wang et al. entitled "Collisions for Hash Functions - MD4, MD5, HAVAL-128 and RIPEMD", and what followed was a piece of JavaScript which basically said: if the first part contains the string (in C notation) "\xC2\xB5\x07\x12F" (where \x?? means that the ASCII code of the character is ??, with ?? being a hexadecimal number), display one variant, and if it doesn't, display the other.
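The "same header state, same suffix, same hash" property can be tried out on a scaled-down model. The sketch below is not MD5 and does not use the published collision: it is a toy hash I made up for illustration, with a 4-byte block and a 2-byte state (built on truncated MD5), small enough that two colliding one-block headers can be found by brute force. All names in it (compress, toy_hash, find_colliding_headers) are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4  # toy block size in bytes (MD5 itself uses 64)
STATE_SIZE = 2  # toy internal state in bytes (MD5 itself uses 16)
IV = b"\x00" * STATE_SIZE  # arbitrary fixed starting state


def compress(state, block):
    """Toy compression function: truncated MD5 of state + block."""
    return hashlib.md5(state + block).digest()[:STATE_SIZE]


def toy_hash(data):
    """Pad the input, then fold it into the state one block at a time."""
    padded = data + b"\x80"
    padded += b"\x00" * (-len(padded) % BLOCK_SIZE)
    padded += len(data).to_bytes(BLOCK_SIZE, "big")
    state = IV
    for offset in range(0, len(padded), BLOCK_SIZE):
        state = compress(state, padded[offset:offset + BLOCK_SIZE])
    return state


def find_colliding_headers():
    """Brute-force two different one-block headers with the same state."""
    seen = {}
    for counter in range(2 ** (8 * BLOCK_SIZE)):
        header = counter.to_bytes(BLOCK_SIZE, "big")
        state = compress(IV, header)
        if state in seen:
            return seen[state], header
        seen[state] = header
    raise RuntimeError("no collision found")


header_a, header_b = find_colliding_headers()
suffix = b"<html>the same page body follows both headers</html>"

# Both lines print True: different headers, identical final hash.
print(header_a != header_b)
print(toy_hash(header_a + suffix) == toy_hash(header_b + suffix))
```

Because both headers are exactly one block long and leave the toy hash in the same state, any suffix appended to both produces the same result, which mirrors what the AA...A and BB...B parts did for MD5.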

The modus operandi was to place the first file (AA...ASS...S in our example), which displayed the first message, on the site and then swap it at the right moment with the second file (BB...BSS...S), which had the same size and MD5 hash but displayed the other message because the bytes in its "header" were different.
Why did Charlotte have to change the website before the integrity-checking script ran for the first time? Why couldn't she deface it later?
Because the research done in this area only showed how to generate two data streams with the same hash (a collision), not how to generate a data stream which has a given hash. If Charlotte had waited for the integrity-check script to run for the first time and create a baseline hash, she would have had to solve the second problem (given a hash, generate a file which has that hash). This problem, while solvable in theory (because we are mapping an infinite number of possible data streams to a finite number of possible hashes, so there have to be collisions), is computationally much more expensive.
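To get a feel for the gap between the two problems: for an ideal n-bit hash, a generic search needs on the order of 2^(n/2) attempts to find some collision, but on the order of 2^n attempts to match one fixed hash (2^64 versus 2^128 for MD5's 128 bits, and the published MD5 collision attacks are far faster than the generic search). The sketch below measures both on a made-up 16-bit toy hash (truncated MD5); the function names and the sample baseline b"baseline page" are assumptions for illustration only.

```python
import hashlib


def tiny_hash(data):
    """16-bit toy hash: the first two bytes of the MD5 digest."""
    return hashlib.md5(data).digest()[:2]


def find_any_collision():
    """Charlotte's easier problem: any two inputs that share a hash."""
    seen = {}
    for tries in range(1, 2 ** 32):
        candidate = tries.to_bytes(4, "big")
        digest = tiny_hash(candidate)
        if digest in seen:
            return tries, seen[digest], candidate
        seen[digest] = candidate
    raise RuntimeError("no collision found")


def find_second_preimage(original):
    """The harder problem: a different input matching a fixed hash."""
    target = tiny_hash(original)
    for tries in range(1, 2 ** 32):
        candidate = tries.to_bytes(4, "big")
        if candidate != original and tiny_hash(candidate) == target:
            return tries, candidate
    raise RuntimeError("no second preimage found")


collision_tries, first, second = find_any_collision()
preimage_tries, forged = find_second_preimage(b"baseline page")
print("tries for any collision:", collision_tries)
print("tries to match the baseline hash:", preimage_tries)
```

With only 16 bits, the expected counts are a few hundred tries for a collision and tens of thousands for a match against the baseline; the same ratio at 128 bits is what separates a feasible attack from an infeasible one.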
How should Lurvy's script have functioned to improve its ability to detect the kinds of alterations made by Charlotte?
The script should have compared the hash of a local version of the file (the original file, before it was uploaded) with the hash of the remote version. The script should also have used hash algorithms which have no known practical weaknesses (like SHA-256, SHA-512 or Whirlpool). An even more secure solution would have been to compute all of these hashes and raise an alert if at least one of them didn't match. This wouldn't have eliminated the problem (because again, we map an infinite number of possible data streams to a finite number of hash values), but it would have made an attack computationally very expensive. Of course, the best thing would have been to download the file from the website and compare it byte by byte to a stored version of the file. Given the small size of the file and the fact that it already had to be downloaded (for the MD5 hash to be computed), this would have caused no performance problems and would have provided a 100% reliable way of making sure that the page hadn't changed since the baseline was created.
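A minimal sketch of such a check, assuming the script already holds the original file and the downloaded copy as bytes (fetching the page is left out). It uses only SHA-256 and SHA-512 from Python's hashlib, since Whirlpool is not guaranteed to be available there; the function names and the sample page contents are made up for illustration.

```python
import hashlib

ALGORITHMS = ("sha256", "sha512")  # assumed choice; both ship with hashlib


def digests(data):
    """Compute every configured hash over the same bytes."""
    return {name: hashlib.new(name, data).hexdigest() for name in ALGORITHMS}


def mismatched_algorithms(original, downloaded):
    """Return the algorithms whose digests differ; alert if non-empty."""
    local, remote = digests(original), digests(downloaded)
    return [name for name in ALGORITHMS if local[name] != remote[name]]


def page_unchanged(original, downloaded):
    """The simplest and strongest check: compare the bytes directly."""
    return original == downloaded


original = b"<html>Wilbur for sale</html>"  # hypothetical stored copy
tampered = b"<html>SOME PIG</html>"         # hypothetical download

print(mismatched_algorithms(original, original))  # []
print(mismatched_algorithms(original, tampered))  # ['sha256', 'sha512']
print(page_unchanged(original, tampered))         # False
```

The important design point is that the baseline comes from the local original rather than from whatever is on the server the first time the script runs.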
What was Charlotte's proposal to Lurvy for saving Wilbur?
The file counterhackreloadedsteg.png contains an embedded Microsoft Word document. It was embedded with the Digital Invisible Ink program mentioned in the hint, using the password "baaramewe" (without the quotes), which corresponds to the letters in red, in lower case. The document has the following content:

Charlotte’s Proposal

Dear Mr. Lurvy,

Now that I have got your attention, I have a proposal for you. You are obviously a bright businessman, trying to make some money on the sale of Wilbur. But, surely you must recognize the fleeting nature of that one-time sale. I propose to you a better business model, one that can keep this farm profitable for years to come.

Employ me, Charlotte the Spider, as a web site designer, contracting my services out for $150 per hour. I will charge you only $ 50 per hour. Thus, working only 40 hours per week, I can bring in more cash for you every single week than the one-time sale of Wilbur. If you are interested in my offer, please send e-mail to [email protected], with the subject: CHARLOTTE’S PROPOSAL.

Yours truly,

Charlotte

Final note: while the method in this story is interesting and demonstrates the concept of hash collisions, it has two major drawbacks:

it breaks horribly if the target doesn't have javascript enabled
it is very clear that something is wrong even after a brief examination of the page source.

In a real-world scenario a stealthier method would most probably have been used. For example, linking to an external script file (placed on a server controlled by the attacker) from a script tag included in or appended to the document. This method resolves both of the problems mentioned earlier:

it degrades nicely in browsers which don't support javascript or don't have it enabled - it will show the original page
a cursory examination of the page source might not reveal the source of the problem

Further advantages:

the file needs to be modified only once; all subsequent modifications are performed in the linked script file. This means that as long as the modification is made before the baseline for the integrity check is created, it bypasses all the integrity check methods enumerated above, except those which keep an original version of the file from before it was uploaded and use it as the baseline.
the JavaScript can be generated dynamically, so for example it is possible to show the original version to some originating IPs and the altered version to others, making it even harder to discover that the site was modified.

The linked JavaScript would include code which generates an IFRAME pointing to an arbitrary page and overlays it on the original page. Because it is an IFRAME, the URL in the address bar wouldn't change (a change there could raise suspicion).
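The per-IP selection mentioned among the advantages could look roughly like the sketch below, shown from the point of view of the server that hosts the linked script. Everything in it is hypothetical: the network range is a reserved documentation range, and the two script bodies are placeholders rather than working code.

```python
import ipaddress

# Hypothetical networks that should keep seeing the untouched page.
CAUTIOUS_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]

ORIGINAL_VIEW = "/* empty script: the original page is shown */"
ALTERED_VIEW = "/* script that overlays the IFRAME would go here */"


def script_for(client_ip):
    """Pick which script body to serve based on the requesting address."""
    address = ipaddress.ip_address(client_ip)
    if any(address in network for network in CAUTIOUS_NETWORKS):
        return ORIGINAL_VIEW
    return ALTERED_VIEW


print(script_for("192.0.2.10"))    # placeholder for the original view
print(script_for("198.51.100.7"))  # placeholder for the altered view
```

From the defender's side this is a reminder that checking the page from a single, well-known address may not show what other visitors see.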