
Monday, March 22, 2010

String.intern() – there are better ways

I don’t want to write a “considered harmful” article (because they are harmful), but after experimenting with different solutions I do have a strong opinion that there is almost no reason to use String.intern() in Java. But let us proceed step by step.

First of all, what does String.intern() do? Go read the Javadoc for it and also take a look at the String interning article on Wikipedia. The essence of it is that if you have two strings s1 and s2 such that s1.equals(s2), only one copy of the string will be stored if they are interned. From this definition follow the two use cases for string interning:

  1. You read a lot of repetitive strings from an external source (a flat file or DB for example) and you need to keep them all in memory. In this case interning the strings has the potential to save you a lot of memory.
  2. You’ve determined (by profiling your application!) that String.equals is a hotspot for your application and you would like to replace those calls with the == operator.
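
To make the definition concrete, here is a minimal sketch showing that two equal strings are distinct objects until they are interned. The class name InternDemo and the helper sameObject are made up for illustration:

```java
final class InternDemo {
    /** Returns true when both arguments are the very same object (identity, not content). */
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        // new String(...) always allocates a distinct object, even for equal contents.
        String s1 = new String("foo");
        String s2 = new String("foo");

        System.out.println(s1.equals(s2));                        // true: same characters
        System.out.println(sameObject(s1, s2));                   // false: two separate objects
        System.out.println(sameObject(s1.intern(), s2.intern())); // true: one canonical copy
    }
}
```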

If you have different reasons for looking at String.intern(), you should think twice about them before going down that route. If you’ve thought about it long and hard, and you still think that String.intern is the best solution for you, but not for any of the reasons mentioned above, please leave me a comment! (Also, read the rest of this post, since it might give you a better alternative.)

So, having the above use cases in mind, what is the problem with calling String.intern?

  1. It is quite CPU intensive. Calling new String("foo").intern() can be an order of magnitude (10x to 15x based on some of my measurements) slower than new String("foo").
  2. You have to remember to do it everywhere. This isn’t so fatal if you’re just aiming for reduced memory consumption, but if you forget to call “intern” somewhere and later use the “==” operator for comparing elements, you can create some hard to track down bugs.
  3. It can result in mysterious “OutOfMemory” errors. In the SUN JVM (which is the most widely used one) interned Strings are stored in a special memory region called “PermGen”. Its size isn’t influenced by the usual “-Xmx1024M” command line option; you have to remember (and to know about it in the first place!) to use the “-XX:MaxPermSize=512m” command line option.
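
The second problem is easy to reproduce. In the sketch below, a single string that was never interned is enough to make an == comparison fail, even though the contents are equal. The names RogueStringDemo, isAdminByIdentity and isAdminByEquals are hypothetical, chosen only for this illustration:

```java
final class RogueStringDemo {
    // String literals are interned automatically by the JVM.
    static final String ADMIN = "admin";

    // Fragile: only correct if every caller remembered to intern its argument.
    static boolean isAdminByIdentity(String role) {
        return role == ADMIN;
    }

    // Robust: works for any String with the same contents.
    static boolean isAdminByEquals(String role) {
        return ADMIN.equals(role);
    }

    public static void main(String[] args) {
        // Stands in for a value read from a file or database: equal contents, not interned.
        String fromOutside = new StringBuilder("ad").append("min").toString();

        System.out.println(isAdminByIdentity(fromOutside));          // false: the forgotten intern()
        System.out.println(isAdminByIdentity(fromOutside.intern())); // true
        System.out.println(isAdminByEquals(fromOutside));            // true
    }
}
```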

These are some very serious problems. What are the alternatives? The easiest one is not to use String.intern. OK, let’s say that you’ve performed measurements with relevant production data and came to the conclusion that your problem needs to be resolved using this method. My recommendation would be the following:

  • Use a WeakHashMap to create a pool of Strings as described in this blog post. This has the advantage that your cache won’t end up keeping the objects in memory after all the references to them have disappeared. Don’t forget to synchronize access to it if you’re planning on using it from multiple threads!
  • Always use String.equals, never “==”. If you take a peek at java.lang.String.equals, you will see that the first check it does is “==”. By never using “==” explicitly you will still have most of the speed benefits, while eliminating the risk that you accidentally get a “rogue” String from somewhere and your code fails, even though the two strings are equal.
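
The two recommendations above could be sketched roughly as follows. This is not the code from the blog post linked above, just one possible shape of such a pool; the class name StringPool and the method canonical are assumptions made for illustration. The values are wrapped in WeakReference because a WeakHashMap holds its values strongly, and a value that pointed straight back at its own key would keep the entry alive forever.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

final class StringPool {
    // Keys are weakly held; values are WeakReferences so they never pin their own key.
    private final Map<String, WeakReference<String>> pool = new WeakHashMap<>();

    /** Returns the pooled instance equal to s, adding s to the pool if none exists yet. */
    public synchronized String canonical(String s) {
        if (s == null) {
            return null;
        }
        WeakReference<String> ref = pool.get(s);
        String cached = (ref == null) ? null : ref.get();
        if (cached != null) {
            return cached;
        }
        pool.put(s, new WeakReference<>(s));
        return s;
    }

    public static void main(String[] args) {
        StringPool strings = new StringPool();
        String a = strings.canonical(new String("hello"));
        String b = strings.canonical(new String("hello"));

        System.out.println(a == b);      // true: both calls returned the pooled instance
        System.out.println(a.equals(b)); // true, and equals() returns early on its identity check
    }
}
```

Callers would pass every incoming String through canonical() and keep comparing with equals(), so a String that bypassed the pool costs a little speed but never breaks the logic. The single synchronized method is the simplest way to make the pool thread-safe; it is also the place where contention would show up under heavy multi-threaded use.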

The advantages of the above solution are:

  • It is 30% to 50% faster than String.intern, although it is still slower than not interning at all. You should also watch out that it doesn’t become a choke point in your application because of the synchronization if you are calling it from multiple threads.
  • It is safe (as mentioned above, forgetting to “make unique” some of the Strings doesn’t make your logic fail)
  • It doesn’t require special configuration on the JVM (like adjusting the PermGen size)

I will post some example code later this week when I post the slides for a presentation I’ll be giving to the local JUG, so be sure to keep an eye on my blog and my code repository.


Picture taken from Mark Drago's photostream with permission.

6 comments:

  1. When do you use intern or something similar to it? What is a practical application for this? I'm thinking: if you have two strings with equal value and only one copy is stored, what happens if another process or thread modifies the string while yet another thread tries to retrieve the string value and now gets a different one?

  2. @Vita: Strings in Java are immutable. This means that you can never modify their value (except using some dirty Reflection trickery, but you are not supposed to do that).

    For example when you write the code:

    String s = "aaaaa";
    s = "bbbbbb" + s;

    What actually happens is that you change which string s points to (remember, in Java all variables of non-primitive type are in fact references), not the actual value of the string.

    Or to put it another way: after the second line you will have two memory areas: one representing the string "aaaaa" and one for the string "bbbbbbaaaaa" (at least until the GC kicks in and eliminates the first). What you are actually doing is re-pointing the reference s from the first to the second one.
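
    A small sketch of this re-pointing, keeping a second reference to the first string so that it can still be inspected afterwards (the class RepointDemo and its run method are made up for illustration):

    ```java
    final class RepointDemo {
        /** Returns { the first string, the string s points to after the reassignment }. */
        static String[] run() {
            String s = "aaaaa";
            String first = s;     // second reference to the same String object
            s = "bbbbbb" + s;     // builds a new String and re-points s to it
            return new String[] { first, s };
        }

        public static void main(String[] args) {
            String[] result = run();
            System.out.println(result[0]);              // aaaaa: the first String was never modified
            System.out.println(result[1]);              // bbbbbbaaaaa
            System.out.println(result[0] == result[1]); // false: two distinct objects
        }
    }
    ```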

  3. The advantage of using String.intern is that it requires less memory. A WeakHashMap has a high overhead (more than 100 bytes per entry IIRC). Weak references also slow down the GC, and what you would really want is a Concurrent StringWeakHashSet, which would (most likely) have an even higher overhead. String.intern is not perfect, but it works well enough on modern VMs. IIRC it's not slower, at least not on the SAP JVM (a SUN HotSpot derivative).

  4. @Markus: I found String.intern to be considerably slower (somewhere in the 2x - 5x range, I don't have the numbers at hand right now) than lookups in a hashmap.

    Also, in many cases you can get away with a simple HashMap (for example if you have a read-only in-memory DB which is populated at the program startup).

    But I completely agree that everyone should do their own measurements and do the most effective thing for their situation.

    Best regards and thanks for the comment.

  5. @Markus: I forgot to mention that I've used the SUN JDK 1.6u18 for the measurements.

  6. just 4 fun:
    I accidentally searched Google for "string", and you should see what I found in the images section: http://www.google.ro/images?q=string&um=1&ie=UTF-8&source=univ&ei=zGTcTMv-HJGTswb_7PyhBA&sa=X&oi=image_result_group&ct=title&resnum=4&ved=0CEsQsAQwAw&biw=1280&bih=666

    ;))
