
Monday, May 11, 2009

Removing features is the best defense


When I read the news that Microsoft is disabling Autorun in Windows 7 for removable media other than CD/DVD (and maybe HD-DVD/Blu-ray), I said: cool! This will slow down the spread of malware that uses this feature (over a very long timeframe of course, because Windows 7 isn’t even final yet and is far away from wide adoption).

Then again, the evil voices in my head ;-) said: OK, maybe they eliminated the automatic way, but I should be able to find a one-click method which makes social engineering malware easy to deploy. My line of thinking was: make something run when the “Open folder” AutoPlay option is selected, using the desktop.ini file (see also the MSDN link). After toying around without much success, I came upon a KB article from MS which states:

To help prevent potentially unsafe content from running when you open a folder on your local computer or on your local area network, by default, Windows XP SP1, Windows Server 2003, Windows Vista, and Windows Server 2008 do not support HTML for Web view in Windows Explorer.

What can I say? Very cool. This again demonstrates the value of the agile practice “just add the features the customer is asking for, nothing more”. So, no cookie for me this time :-).

Picture taken from drumecho's photostream with permission.

Tuesday, May 05, 2009

Is Java slower than C? (and does it matter?)


Via Daniel’s blog (the creator of curl) I arrived at this page: why the Java implementation (JGit) doesn’t run nearly as fast as the C implementation. The short version is: even after much tuning, JGit is twice as slow as the C implementation. One of the problems that got my attention was the different ways a SHA-1 sum gets sliced and diced. So I ran a microbenchmark, and here are my (not very scientific) results:

  • The fastest way to compare two SHA-1 sums in Java (that I found) was to use their string representation. I also tried cramming the hash into Unicode characters (two bytes per character) and into byte arrays. The first was only slightly slower, while the second was about an order of magnitude slower (~15x).
  • Compared to the naive C implementation (using strcmp over the string representation), the Java solution was 100 times (!) slower.
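The original benchmark code is not reproduced here, so the following is only a rough sketch of what such a comparison could look like. The class name, helper names, input string, and iteration count are all made up for illustration, and a naive timing loop like this suffers from exactly the problems such microbenchmarks have (JIT warm-up, dead-code elimination), so treat any numbers it prints with suspicion.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.function.BooleanSupplier;

class Sha1CompareSketch {
    // Computes the 20-byte SHA-1 digest of a string (UTF-8 encoded).
    static byte[] sha1(String input) {
        try {
            return MessageDigest.getInstance("SHA-1")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Representation 1: 40 lowercase hex characters.
    static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16));
            sb.append(Character.forDigit(b & 0xF, 16));
        }
        return sb.toString();
    }

    // Representation 2: two digest bytes packed into each 16-bit char (10 chars).
    static String toPackedChars(byte[] digest) {
        char[] chars = new char[digest.length / 2];
        for (int i = 0; i < chars.length; i++) {
            chars[i] = (char) (((digest[2 * i] & 0xFF) << 8) | (digest[2 * i + 1] & 0xFF));
        }
        return new String(chars);
    }

    // Runs the comparison repeatedly; the returned count keeps the loop from
    // being trivially optimized away.
    static long countMatches(BooleanSupplier comparison, int iterations) {
        long matches = 0;
        for (int i = 0; i < iterations; i++) {
            if (comparison.getAsBoolean()) {
                matches++;
            }
        }
        return matches;
    }

    static void report(String label, BooleanSupplier comparison, int iterations) {
        long start = System.nanoTime();
        long matches = countMatches(comparison, iterations);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(label + ": " + matches + " matches in " + elapsedMs + " ms");
    }

    public static void main(String[] args) {
        byte[] left = sha1("some file contents");
        byte[] right = sha1("some file contents");
        String leftHex = toHex(left);
        String rightHex = toHex(right);
        String leftPacked = toPackedChars(left);
        String rightPacked = toPackedChars(right);
        int iterations = 10_000_000; // arbitrary choice for this sketch

        // The first round mostly serves as warm-up; look at the second one.
        for (int round = 0; round < 2; round++) {
            report("hex strings ", () -> leftHex.equals(rightHex), iterations);
            report("packed chars", () -> leftPacked.equals(rightPacked), iterations);
            report("byte arrays ", () -> Arrays.equals(left, right), iterations);
        }
    }
}
```

Results will vary a lot between JVM versions and machines, so this sketch should not be expected to reproduce the ratios quoted above.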

What is the conclusion? Yes, Java is slower. This is an extreme case of course (among other problems, the test ran for a very short period of time and possibly the JIT didn’t kick in), and in real life the performance loss is much smaller. In fact the email linked above talks about a 2x performance loss and 2x bigger memory consumption. What it doesn’t talk about, however, is the number of bugs (of the “walk all over your memory and you are scratching your head” kind) in the C implementation versus the Java implementation. In my opinion:

  • The speed of Java is “good enough”. In fact it is (orders of magnitude) better than many other high-level languages which are widely used (like PHP, Perl, Python, Ruby).
  • Yes, you can implement things in C, but you will do it in 10x the time with 10x the bugs and probably go mad (unless your aim is job security rather than getting work done)
  • There is an incredible amount of work going into improving the performance of the JVM. Check out this episode from the Java Posse (great podcast btw!) if you are interested in the subject
  • Always profile before deciding that you need to optimize a certain part of your code. Humans are notoriously bad at guessing the bottlenecks
  • “Good enough” means “good enough”. Ok, so the Java implementation was 100 times slower. Still, it managed to compare over 10 million (that is 10^7) hashes in one second! I find it hard to believe that the main bottleneck in a source-code versioning system is the comparing of hashes (or the CPU more generally). Even my crappy CVS saturates the disk I/O over a high latency VPN.
  • Related to the above point: set (realistic) goals and don’t obsess about the fact that you could be “doing better”. For example: it needs to render the HTML page in less than 100 ms in 95% of the cases. Could you do it in less than 50 ms? Maybe, but if 100 ms is good enough, it is good enough.
  • Finally, after you profiled, you always have the option of reimplementing problematic parts in C if you think that it’s worth your time
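A goal like “less than 100 ms in 95% of the cases” is easy to check mechanically once you have measurements. Here is a minimal sketch, assuming the render times were already collected in milliseconds; the class name, method names, and sample values are invented for illustration, and the nearest-rank percentile is just one of several common definitions.

```java
import java.util.Arrays;

class LatencyGoalSketch {
    // Nearest-rank percentile: the smallest sample such that at least
    // pct percent of all samples are less than or equal to it.
    static long percentile(long[] samplesMillis, double pct) {
        long[] sorted = samplesMillis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(pct * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    // True when the given percentile of the samples is below the limit.
    static boolean meetsGoal(long[] samplesMillis, double pct, long limitMillis) {
        return percentile(samplesMillis, pct) < limitMillis;
    }

    public static void main(String[] args) {
        // Made-up measurements: mostly fast renders with one slow outlier.
        long[] renderTimes = {
            40, 42, 38, 51, 47, 45, 39, 60, 44, 43,
            41, 48, 52, 46, 55, 49, 37, 58, 50, 500
        };
        System.out.println("p95 = " + percentile(renderTimes, 95) + " ms");
        System.out.println("goal met: " + meetsGoal(renderTimes, 95, 100));
    }
}
```

With a check like this the single 500 ms outlier does not fail the goal, which is the point: the target says 95% of the cases, not all of them.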

Picture taken from Tahmid Munaz's photostream with permission.

Weird RVRD issue explained


Dear reader. This is a highly specific description of a problem related to the Tibco RV (Rendezvous) product and chances are that it is of no interest to you. If this is the case, feel free to skip it. I wanted to document it here so that other people searching for this problem can find the information.

Here is the problem:

  • Let’s say that you have two environments (RVs): RV1 and RV2.
  • The environments are linked together by a pair of RVRDs (this is important: no RVRDs, no problems).
  • A server in RV1 wants to send a message to a server in RV2 on subject S1 and then expects to hear back from the given server on S2.
  • under some conditions the response can be lost

What is the cause? (The following theory has been confirmed by TIBCO support, so I’m not pulling it out of my rear-end – entirely)

  • RVRDs try to optimize network traffic by forwarding messages only on the subjects where there are listeners
  • What happens is that there is a very small time interval between the server in RV1 setting up a listener on S2 and sending the “request” message. This means that the RVRD in RV2 may learn that a listener is interested in the response messages only after it has already decided not to forward them. More specifically: there is no guarantee that “normal” messages and administrative messages (those sent on the _RV.> subject) are delivered in the order they are sent, so the administrative message announcing the new listener can arrive later than the messages which depend on it.
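To make the ordering problem concrete, here is a toy model of it. This is not the TIBCO API and not how an RVRD is actually implemented; ToyRouter, its methods, and the payload string are all hypothetical. It only models a router that forwards a subject when it already knows about a listener for it.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RoutingRaceSketch {
    // Hypothetical stand-in for the routing daemon in RV2: it forwards a
    // message only if it has already heard about a listener for the subject.
    static class ToyRouter {
        final Set<String> knownListeners = new HashSet<>();
        final List<String> forwarded = new ArrayList<>();
        final List<String> dropped = new ArrayList<>();

        // Models the administrative message announcing a new listener.
        void onListenerAdvisory(String subject) {
            knownListeners.add(subject);
        }

        // Models a "normal" message that should travel towards RV1.
        void onMessage(String subject, String payload) {
            if (knownListeners.contains(subject)) {
                forwarded.add(payload);
            } else {
                dropped.add(payload);
            }
        }
    }

    // Replays the two possible arrival orders at the router in RV2.
    static boolean responseDelivered(boolean advisoryArrivesFirst) {
        ToyRouter routerInRv2 = new ToyRouter();
        if (advisoryArrivesFirst) {
            routerInRv2.onListenerAdvisory("S2");
            routerInRv2.onMessage("S2", "response");
        } else {
            routerInRv2.onMessage("S2", "response");
            routerInRv2.onListenerAdvisory("S2"); // too late, already dropped
        }
        return routerInRv2.forwarded.contains("response");
    }

    public static void main(String[] args) {
        System.out.println("advisory first: delivered=" + responseDelivered(true));  // true
        System.out.println("response first: delivered=" + responseDelivered(false)); // false
    }
}
```

In this model the response is lost only when it overtakes the listener announcement, which is why the loss shows up “under some conditions” rather than every time.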

Possible solutions:

  • Insert delays in your programs after setting listeners on subjects or before responding to messages.
  • Start a tibrvlisten on the given subject (or a superset which includes the given subject) in RV1 on the machine which is running the RVRD. This keeps at least one active “listener” open (of course, you can redirect its output to /dev/null). Strangely enough, it seems important to have tibrvlisten started on the same machine, because using another machine from RV1 doesn’t seem to deliver the same result.

Hope this helps someone.

Picture taken from psd's photostream with permission.