
Sunday, March 20, 2011

Setting the maximum number of opened files under Ubuntu (for JProfiler)


As I found out the hard way, setting fs.file-max in /etc/sysctl.conf is a BAD idea. It can render your system unusable in one step. Please don't do it! If you already did, use the recovery mode to roll back the change. Also, I currently recommend only doubling the limit (i.e. going from 1024 to 2048, or from 2048 to 4096) rather than going to the maximum value.

JProfiler is a great tool; however, under 32-bit Ubuntu you can run into the limit for open file handles being too low. This is a problem for JProfiler because it uses temporary files to work around the address-space limitation of 32-bit systems (yeah, I know, I should upgrade to 64 bit - but 32 bit works great for now...)

To raise the maximum filehandle limit, do the following:

sudo gedit /etc/security/limits.conf
# add the following two lines before the # End of file marker
# yes, the initial star is also part of the line, and you should add it
*       hard    nofile  4096
*       soft    nofile  4096
# do not edit /etc/sysctl.conf (see the warning at the top of this post)
# restart your system

You can check if the changes were successful by using the ulimit command:

ulimit -n
# it should print out 4096
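
Since JProfiler runs on the JVM, it can also be handy to check the limit from inside a Java process. The sketch below assumes a HotSpot/OpenJDK-style JVM on a Unix-like system, where the platform bean implements com.sun.management.UnixOperatingSystemMXBean; the class name FdLimit is made up for this example, and on other platforms the methods simply return -1.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper: reports the file descriptor limit as seen by the JVM.
final class FdLimit {

    // Returns the maximum number of file descriptors, or -1 if the JVM does not expose it.
    static long maxFileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os).getMaxFileDescriptorCount();
        }
        return -1;
    }

    // Returns the number of currently open file descriptors, or -1 if unavailable.
    static long openFileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("max file descriptors:  " + maxFileDescriptors());
        System.out.println("open file descriptors: " + openFileDescriptors());
    }
}
```

The value reported here may differ from what ulimit -n prints in your shell, because the limit is per process and depends on how the JVM was started.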

Tuesday, March 08, 2011

DiskMap - a disk-backed Map in Java


I had the following problem: a Java application was running out of memory. It was not feasible to mandate a 64-bit JVM for this application, and the ~1.4G limit wasn't enough.

My solution was to implement a Map which - when an element is added - also saves the value to disk and only holds a weak reference to the value. When memory pressure occurs, the objects which are reachable only through weak references are evicted. Later, when they are needed again, they are read back from the backing file.


The approach has some limitations:

  • Adding elements takes considerably longer (because they need to be serialized)
  • There is no way to reclaim space from the backing file (this is only intended for short-running, mostly read-only tasks)
  • This is only useful if the values are considerably larger than the keys (because the keys are kept in memory and only the values have the potential to be removed)
  • There is a memory overhead: while the values are in memory, each entry takes up an additional 20 to 40 bytes. However, once the GC kicks in, the map will only take up 20 to 40 bytes per key.
Long story short: you can find the code (together with unit-tests) in my repo.
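
To make the idea more concrete, here is a minimal sketch of the technique described above. It is not the code from the repo: the class name DiskBackedMapSketch, the Slot helper and the loadFromDisk method are invented for this illustration, it uses plain Java serialization, and it is neither thread-safe nor a full java.util.Map implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.RandomAccessFile;
import java.io.Serializable;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a disk-backed map; NOT the DiskMap implementation from the repo.
final class DiskBackedMapSketch<K, V extends Serializable> implements Closeable {

    // Per-key bookkeeping: where the value lives on disk, plus a weak reference to it.
    private static final class Slot<V> {
        final long offset;
        final int length;
        WeakReference<V> ref;

        Slot(long offset, int length, V value) {
            this.offset = offset;
            this.length = length;
            this.ref = new WeakReference<>(value);
        }
    }

    private final Map<K, Slot<V>> index = new HashMap<>(); // keys always stay in memory
    private final RandomAccessFile file;

    DiskBackedMapSketch(File backingFile) throws IOException {
        this.file = new RandomAccessFile(backingFile, "rw");
    }

    // Serializes the value, appends it to the backing file and remembers its position.
    void put(K key, V value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        }
        byte[] bytes = buffer.toByteArray();
        long offset = file.length(); // append only: old space is never reclaimed
        file.seek(offset);
        file.write(bytes);
        index.put(key, new Slot<>(offset, bytes.length, value));
    }

    // Returns the in-memory value if it is still reachable, otherwise reloads it from disk.
    V get(K key) throws IOException, ClassNotFoundException {
        Slot<V> slot = index.get(key);
        if (slot == null) {
            return null;
        }
        V value = slot.ref.get();
        if (value == null) {
            value = read(slot);
            slot.ref = new WeakReference<>(value);
        }
        return value;
    }

    // Always reads from the backing file, bypassing the weak reference.
    V loadFromDisk(K key) throws IOException, ClassNotFoundException {
        Slot<V> slot = index.get(key);
        return slot == null ? null : read(slot);
    }

    @SuppressWarnings("unchecked")
    private V read(Slot<V> slot) throws IOException, ClassNotFoundException {
        byte[] bytes = new byte[slot.length];
        file.seek(slot.offset);
        file.readFully(bytes);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (V) in.readObject();
        }
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

The sketch shows where the limitations listed above come from: every put pays for serialization, the file only ever grows, and the index (keys plus one Slot each) never leaves memory.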

Why is running sushi the best fast food?


I just realized that running sushi is the best fast-food ever. (Yes, I have strong opinions weakly held):

  • You get your food in small chunks, so you can stop at any time and still not feel like you've wasted food
  • You have a great variety of food and you can look at it before taking it (rather than just looking at a picture in the menu and wondering what the real thing will look like)
  • It is most probably healthier than other kinds of fast-food
  • It is neither hot nor cold, so you can eat it right away (you don't have to wait for it to warm up or to cool down)
  • You don't have to order! The last thing you want to do when you are hungry is to stare at food and wait

PS. If you are in Cluj (Romania) you can check out the Wasabi Running Sushi. As far as I know they are the only running sushi in Romania!

Monday, March 07, 2011

Microbenchmarking and you


Crossposted from the Transylvania JUG website.

Microbenchmarking is the practice of measuring the performance characteristics (like CPU, memory or I/O) of a small piece of code to determine which of several alternatives is better suited for a particular scenario. If I could offer but one piece of advice on this, it would be: don't. It is too easy to get it wrong, and bad advice resulting from bad measurement is like cancer.

If you don't want to take my first piece of advice, here is my second: if you really want to do microbenchmarking, watch this talk by Joshua Bloch: Performance Anxiety, and use a framework like caliper, which I present below.

caliper is a Java framework written at Google for doing Java microbenchmarks as correctly as possible. To use it, you first have to build it (there are no prebuilt jars yet, nor is it present in the central Maven repository, sorry):

svn checkout caliper
cd caliper

Now you can start writing your benchmark. Benchmarks are written in a style similar to the JUnit3 tests:

  • you have to extend the com.google.caliper.SimpleBenchmark class
  • your methods must conform to the public void timeZZZZ(int reps) signature
  • you can override the setUp and tearDown methods to implement initialization / finalization

Below is a simple example (taken from the caliper homepage):

public class MyBenchmark extends SimpleBenchmark {
  public void timeMyOperation(int reps) {
    for (int i = 0; i < reps; i++) {
      // the operation being measured goes here
    }
  }
}

To run this you have multiple possibilities:

  • Use the caliper script included in the code distribution (this is a SH script, so it won't work under Windows):
    ~/projects-personal/caliper/build/caliper-0.0/caliper --trials 10 org.transylvania.jug.espresso.shots.d20110306.MyBenchmark
    you can also execute the script without parameters to get a list and description of command line parameters.
  • Run it from your favorite IDE. You need to add the following libraries: allocation.jar, caliper-0.0.jar. The main class is com.google.caliper.Runner and the parameters are the same ones you would pass to the caliper script
  • Add a main method to your test class which would contain the following:
    public static void main(String... args) throws Exception {
      Runner.main(MyBenchmark.class, args);
    }

By default caliper produces easy-to-understand text output. You also have the option to publish the benchmark as a nice HTML page (see this page for example). The publication is done through a Google AppEngine app and the result is publicly available to anyone (a caveat to remember). For more information see the caliper questions on StackOverflow. You might also be interested in the Java performance tuning website if you need to perform such tasks.
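
To illustrate why the timeZZZZ(int reps) contract looks the way it does, here is a tiny stand-alone sketch that does not use caliper at all. Every name in it (RepsContractSketch, TimedMethod, nanosPerRep, sink) is made up for this example. It is exactly the kind of hand-rolled measurement the advice above warns against, so treat it as a picture of the contract and not as a way to obtain trustworthy numbers.

```java
// Hypothetical sketch of the "loop reps times inside the timed method" contract.
final class RepsContractSketch {

    // Stand-in for a benchmark method: the caller decides how many repetitions to run.
    interface TimedMethod {
        void run(int reps);
    }

    // Results are written here so that the JIT cannot discard the loop as dead code.
    static volatile int sink;

    // Example timed method in the timeZZZZ(int reps) style.
    static void timeStringLength(int reps) {
        int total = 0;
        for (int i = 0; i < reps; i++) {
            total += Integer.toString(i).length();
        }
        sink = total;
    }

    // Runs a warm-up pass, then measures one pass and returns the average nanoseconds per rep.
    static double nanosPerRep(TimedMethod method, int warmupReps, int measuredReps) {
        method.run(warmupReps);
        long start = System.nanoTime();
        method.run(measuredReps);
        long elapsed = System.nanoTime() - start;
        return (double) elapsed / measuredReps;
    }

    public static void main(String[] args) {
        double ns = nanosPerRep(RepsContractSketch::timeStringLength, 100_000, 1_000_000);
        System.out.printf("%.1f ns per rep (single run, not a reliable number)%n", ns);
    }
}
```

Because the loop lives inside the timed method, the harness can choose a repetition count large enough that the timer overhead becomes negligible, and it pays the cost of calling the method only once per measurement.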

Sunday, March 06, 2011

Doing some estimations


This is again one of those topics which I like to rant about, so I'll give you the short version: when you see a number, question it! Most of the numbers thrown at us in different media can be disproven quite easily, and it is our responsibility as people not to just repeat whatever we’ve heard, but rather to stop and think a little about it. (Of course I’m not immune to this myself: I fell into this very trap when reading “Contemplating Financial Trading At Picosecond Resolution” on Slashdot, only to see the very insightful comment that light travels only about 0.3 mm in a picosecond. Yes, I’ve done the math, and this article is pure BS.)
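
Checking a number like this takes a single multiplication. With the speed of light in vacuum at 299 792 458 m/s, one picosecond of travel comes out at roughly 0.3 mm. The class and method names below are made up for the example.

```java
// Hypothetical helper: how far does light travel in a given amount of time?
final class PicosecondCheck {

    static final double SPEED_OF_LIGHT_M_PER_S = 299_792_458.0; // in vacuum

    // Distance in millimetres covered by light in the given number of seconds.
    static double millimetresTravelled(double seconds) {
        return SPEED_OF_LIGHT_M_PER_S * seconds * 1000.0;
    }

    public static void main(String[] args) {
        double onePicosecond = 1e-12;
        System.out.printf("light travels %.3f mm in one picosecond%n",
                millimetresTravelled(onePicosecond));
    }
}
```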

Offtopic: why do sayings in different languages have so much in common? For example we have the “beating a dead horse” expression in English, and in Hungarian we would say that somebody is talking about their “horse made of branch” (vesszőparipa). Ain’t it interesting?

Getting back to my rant :-). I’ve seen an article recently about a local (Romanian) affiliate program: eMAG Profitshare 2010. I applaud them for their openness and it also gives us the possibility to do a quick calculation. They say that they’ve given out 463 000 RON (~109 905 EUR / 153 499 USD) to 8690 sites.

Does it sound like a lot? Yes. Is it a lot for each individual site? Unlikely. Let's do some quick math: assuming that each site gets the same share (a very simplistic assumption) we have: 109 905 EUR / 8690 sites = ~13 EUR per site per year (these are yearly figures for 2010), so around 1 EUR (!) per site per month (!).

OK, so let's be more realistic. You have a big fanbase, so you should be among the top sites by revenue. Let's consider a binomial distribution of the sites and do a little chart with Google Docs:
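
The same back-of-the-envelope calculation as a few lines of Java, using only the figures quoted above (109 905 EUR, 8690 sites, 12 months). The class and method names are invented for the example, and the equal-share assumption is the same simplistic one as in the text.

```java
// Hypothetical helper: average payout per site under an equal-share assumption.
final class ProfitshareEstimate {

    // Average yearly payout per site.
    static double perSitePerYear(double totalEur, int sites) {
        return totalEur / sites;
    }

    // Average monthly payout per site.
    static double perSitePerMonth(double totalEur, int sites) {
        return perSitePerYear(totalEur, sites) / 12.0;
    }

    public static void main(String[] args) {
        double totalEur = 109_905.0; // total paid out in 2010, from the article
        int sites = 8690;            // number of participating sites, from the article
        System.out.printf("per site per year:  %.2f EUR%n", perSitePerYear(totalEur, sites));
        System.out.printf("per site per month: %.2f EUR%n", perSitePerMonth(totalEur, sites));
    }
}
```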


What you see here is the revenue per month for a site in a certain category (categories are from 1 to 10, 1 being the lowest-traffic one and 10 the highest-traffic one). The numbers are in EUR. The conclusion: this business model is a very poor revenue source for the individuals participating, but probably a very good marketing avenue for companies (I assume that the cost for companies is around the same as doing an ad campaign, but the returns must be much better, not to mention the Google juice they must be getting from these referrals!).

PS: in the name of transparency, you can see the sheet I used for calculation here.

Saturday, March 05, 2011

Audio quality


This is just one of those topics which comes up from time to time in my life (probably because I consume a lot of media). I was recently watching Jim Zemlin being interviewed by Jeremy Allison (Jim Zemlin is the Executive Director of the Linux Foundation) on the Google Open Source YouTube channel and was frustrated by the background noise and low audio volume, since the topic was really interesting to me. So I decided to look into the problem and see if the audio quality could have been easily improved. I covered the topic a couple of years ago, so I won’t go into details, but rather just give a 10 000 foot view of the process. Please read the original post for more details, since everything in it still applies.

Step 1: download the YouTube video. VLC natively supports YouTube playback, so exporting the sound to a FLAC file (you should always use lossless codecs during processing!) was just a matter of a couple of clicks and one or two minutes.


Step 2: load it up in Audacity and remove the noise. Loading the FLAC file is a little buggy (the progress bar keeps jumping between 0 and 100% and the time estimate is useless, but it loaded in under a minute). As you can see in the screenshot below, the volume is really low, but there are occasional spikes, so plain normalization wouldn’t help here. On the upside, there is no clipping, which would result in hard (impossible?) to repair artifacts.
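
A small sketch of why the spikes matter. It assumes that "plain normalization" means peak normalization (scaling the whole recording so that its loudest sample reaches a target level) and that samples are floating-point values between -1 and 1; the class and method names are made up. A single loud spike pins the gain close to 1, so the quiet speech stays quiet.

```java
// Hypothetical sketch of peak normalization on floating-point samples in [-1, 1].
final class PeakNormalizationSketch {

    // Gain that would bring the loudest sample exactly to targetPeak.
    static double peakGain(double[] samples, double targetPeak) {
        double peak = 0.0;
        for (double s : samples) {
            peak = Math.max(peak, Math.abs(s));
        }
        return peak == 0.0 ? 1.0 : targetPeak / peak;
    }

    public static void main(String[] args) {
        double[] quietSpeech = {0.05, -0.04, 0.03, -0.05};
        double[] quietSpeechWithSpike = {0.05, -0.04, 0.9, -0.05};

        // Without the spike the whole signal can be amplified a lot...
        System.out.printf("gain without spike: %.2f%n", peakGain(quietSpeech, 1.0));
        // ...but one spike leaves almost no headroom for the rest.
        System.out.printf("gain with spike:    %.2f%n", peakGain(quietSpeechWithSpike, 1.0));
    }
}
```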


After noise removal and keeping only one channel (no need for stereo here; we would add it back in the last step if we were to publish it, since some devices can’t handle mono and the overhead of joint stereo is almost zero), the file was exported to WAV and fed into the Levelator. Here is the end result:


As you can see, we have much better volume resulting in a much improved experience for the consumer, all this with a couple of minutes of work while browsing Hacker News and with free (and mostly open-source) cross platform tools.

Content publishers of the world: please take a couple of minutes of your time after editing to do a proper post-production! Thank you.

Update: YouTube downloading is broken in the current VLC release, but it will be fixed in the next version (1.1.12). Until then you can use the nightly builds.