Back to Top

Monday, December 31, 2012

What every programmer should know about X

0 comments

Piggybacking on some memes floating around on the internet I would like to publish my list of "what every programmer should know".

A couple of introductory words: in my opinion the two most important things to learn for new programmers are terminology - to know what things / ideas / algorithms / concepts are called so that they can search for them on the internet and discuss their ideas) and humility (if something doesn't exists or doesn't work the way we expected, the first thing we should ask ourselves is: "what am I missing?" instead of proclaiming the predecessors to be idiots). Moving along to the list:

Happy holiday reading/watching to all!

Friday, December 21, 2012

Ensuring the order of execution for tasks

0 comments

This post was originally published as part of the Java Advent series. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on! Want to write for the Java Advent blog? We are looking for contributors to fill all 24 slot and would love to have your contribution! Contact Attila Balazs to contribute!

Sometimes it is necessary to impose certain order on the tasks in a threadpool. Issue 206 of the JavaSpecialists newsletter presents one such case: we have multiple connections from which we read using NIO. We need to ensure that events from a given connection are executed in-order but events between different connections can be freely mixed.

I would like to present a similar but slightly different situation: we have N clients. We would like to execute events from a given client in the order they were submitted, but events from different clients can be mixed freely. Also, from time to time, there are "rollup" tasks which involve more than one client. Such tasks should block the tasks for all involved clients (but not more!). Let's see a diagram of the situation:

As you can see tasks from client A and client B are happily processed in parallel until a "rollup" task comes along. At that point no more tasks of type A or B can be processed but an unrelated task C can be executed (provided that there are enough threads). The skeleton of such an executor is available in my repository. The centerpiece is the following interface:

public interface OrderedTask extends Runnable {
    boolean isCompatible(OrderedTask that);
}

Using this interface the threadpool decides if two tasks may be run in parallel or not (A and B can be run in parallel if A.isCompatible(B) && B.isComaptible(A)). These methods should be implemented in a fast, non locking and time-invariant manner.

The algorithm behind this threadpool is as follows:

  • If the task to be added doesn't conflict with any existing tasks, add it to the thread with the fewest elements.
  • If it conflicts with elements from exactly one thread, schedule it to be executed on that thread (and implicitly after the conflicting elements which ensures that the order of submission is maintained)
  • If it conflicts with multiple threads, add tasks (shown with red below) on all but the first one of them on which a task on the first thread will wait, after which it will execute the original task.

More information about the implementation:

  • The code is only a proof-of-concept, some more would would be needed to make it production quality (it needs code for exception handling in tasks, proper shutdown, etc)
  • For maximum performance it uses lock-free* structures where available: each worker thread has an associated ConcurrentLinkedQueue. To achieve the sleep-until-work-is-available semantics, an additional Semaphore is used**
  • To be able to compare a new OrderedTask with currently executing ones, a copy of their reference is kept. This list of copies is updated whenever new elements are enqueued (this is has the potential of memory leaks and if tasks are infrequent enough alternatives - like an additional timer for weak references - should be investigated)
  • Compared to the solution in the JavaSpecialists newsletter, this is more similar to a fixed thread pool executor, while the solution from the newsletter is similar to a cached thread pool executor.
  • This implementation is ideal if (a) the tasks are (mostly) short and (mostly) uniform and (b) there are few (one or two) threads submitting new tasks, since multiple submissions are mutually exclusive (but submission and execution isn't)
  • If immediately after a "rollup" is submitted (and before it can be executed) tasks of the same kind are submitted, they will unnecessarily be forced on one thread. We could add code rearrange tasks after the rollup task finished if this becomes an issue.

Have fun with the source code! (maybe some day I'll find the time to remove all the rough edges).

* somewhat of a misnomer, since there are still locks, only at a lower - CPU not OS - level, but this is the accepted terminology

** - benchmarking indicated this to be the most performant solution. This was inspired from the implementation of the ThreadPoolExecutor.

Meta: this post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on! Want to write for the blog? We are looking for contributors to fill all 24 slot and would love to have your contribution! Contact Attila Balazs to contribute!

Java Runtime options

0 comments

This post was originally published as part of the Java Advent series. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on! Want to write for the Java Advent blog? We are looking for contributors to fill all 24 slot and would love to have your contribution! Contact Attila Balazs to contribute!

The Java runtime is a complex beast - and it has to be since it runs officially on seven platforms and unofficially on many more. Give this, it is normal that there are many knobs and dials to control how things function. The more well known ones are:

  • -Xmx for the maximum heap size
  • -client and -server for selecting the default set of parameters from classes of defaults
  • -XX:MaxPermGen for controlling the permanent generation size

Other than these, it is (very) rarely the case that you need to change the defaults. However, thanks to Java being open source you can see the list of options, their default values and a short explanation directly from the source code. Currently there are almost 800 options in there!

An other way to see the options (but one which doesn't display the explanations unfortunately) is the following command:

java -XX:+UnlockDiagnosticVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal -version

These options are well worth studying. Not for tweaking them (since there is a wealth of testing behind the defaults the extent of which would be very hard to replicate), but rather to understand the different functionalities offered by the JVM (for example why you might not see stacktraces in exceptions).

Meta: this post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on! Want to write for the blog? We are looking for contributors to fill all 24 slot and would love to have your contribution! Contact Attila Balazs to contribute!

Wednesday, December 12, 2012

Changes to String.substring in Java 7

0 comments

This post was originally published as part of the Java Advent series. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on! Want to write for the Java Advent blog? We are looking for contributors to fill all 24 slot and would love to have your contribution! Contact Attila Balazs to contribute!

It is common knowledge that Java optimizes the substring operation for the case where you generate a lot of substrings of the same source string. It does this by using the (value, offset, count) way of storing the information. See an example below:

In the above diagram you see the strings "Hello" and "World!" derived from "Hello World!" and the way they are represented in the heap: there is one character array containing "Hello World!" and two references to it. This method of storage is advantageous in some cases, for example for a compiler which tokenizes source files. In other instances it may lead you to an OutOfMemorError (if you are routinely reading long strings and only keeping a small part of it - but the above mechanism prevents the GC from collecting the original String buffer). Some even call it a bug. I wouldn't go so far, but it's certainly a leaky abstraction because you were forced to do the following to ensure that a copy was made: new String(str.substring(5, 6)).

This all changed in May of 2012 or Java 7u6. The pendulum is swung back and now full copies are made by default. What does this mean for you?

  • For most probably it is just a nice piece of Java trivia
  • If you are writing parsers and such, you can not rely any more on the implicit caching provided by String. You will need to implement a similar mechanism based on buffering and a custom implementation of CharSequence
  • If you were doing new String(str.substring) to force a copy of the character buffer, you can stop as soon as you update to the latest Java 7 (and you need to do that quite soon since Java 6 is being EOLd as we speak).

Thankfully the development of Java is an open process and such information is at the fingertips of everyone!

A couple of more references (since we don't say pointers in Java :-)) related to Strings:

  • If you are storing the same string over and over again (maybe you're parsing messages from a socket for example), you should read up on alternatives to String.intern() (and also consider reading chapter 50 from the second edition of Effective Java: Avoid strings where other types are more appropriate)
  • Look into (and do benchmarks before using them!) options like UseCompressedStrings (which seems to have been removed), UseStringCache and StringCache

Hope I didn't strung you along too much and you found this useful! Until next time
- Attila Balazs

Meta: this post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on! Want to write for the blog? We are looking for contributors to fill all 24 slot and would love to have your contribution! Contact Attila Balazs to contribute!

Saturday, December 01, 2012

(Re)Start me up!

0 comments

This post was originally published as part of the Java Advent series.

There are cases where you would like to start a Java process identical to the current one (or at least using the the same JVM with tweaked parameters). Some concrete cases where this would be useful:

  • Auto-tuning the maximum memory parameters (ie. you have an algorithm to determine the optimal value - for example: 80% of the system memory - and your JVM wasn't started with that particular value)
  • Creating a cluster of processes for high(er)-availability (true HA implies multiple physical nodes) or because processes have different roles (like the components in MongoDB).
  • Daemonizing the current process (that is, the background process should run even after the launching process has terminated) - this is a very frequent modus-operandi for programs on *nix systems where you have the foreground "control" process and the background "daemon" process (not to be confused with the "daemon" threads).

Doing this is relatively simple - and can be done in pure Java - after you find the correct API calls:

List arguments = new ArrayList<>();
// the java executable
arguments
  .add(String.format("%s%sbin%sjava",
    System.getProperty("java.home"), File.separator,
    File.separator));
// pre-execuable arguments (like -D, -agent, etc)
arguments.addAll(ManagementFactory.getRuntimeMXBean()
  .getInputArguments());

String classPath = System.getProperty("java.class.path"), javaExecutable = System
  .getProperty("sun.java.command");
if (classPath.equals(javaExecutable)) {
 // was started with -jar
 arguments.add("-jar");
 arguments.add(javaExecutable);
} else {
 arguments.add("-classpath");
 arguments.add(classPath);
 arguments.add(javaExecutable);
}

// we might add additional arguments here which will be received by the
// launched program
// in its args[] paramater
arguments.add("runme");

// launch it!
new ProcessBuilder().command(arguments).start();

Some explanations about to the code:

  • It is largely inspired from this project
  • We suppose that the java executable is named java and is located in bin/java relative to java.home. We use File.separator for the code to be portable.
  • getInputArguments is used to get specific arguments passed to the JVM (like -Xmx). It does not include the classpath.
  • Which is taken from java.class.path
  • Finally, there is one heuristic step: we try to detect if we were launched using the -jar myjar.jar syntax or the MyMainClass syntax and replicate it.

This is it! After that we use ProcessBuilder (which we should always favour over Runtime.exec because it auto-escapes the parts of the command line for us).

A final thought: if you intend to use this method to "daemonize" a process (that is: to ensure that it stays running after its parent process has terminated) you should do two things:

  • Redirect the standard input and output. By default they are redirected into temporary buffers and the JVM will seemingly randomly terminate when those buffers (pipes) fill up.
  • Under Windows use javaw instead of java. This ensures that the process won't be tied to the console it was started from (however it will still be tied to the user login session and will terminate when the user logs out - for a more heavy-duty solution look into the Java Service Wrapper).

This is it for today, hope you enjoyed it, fond it useful. If you run the code and it doesn't work as advertised, let me know so that I can update it (I'm especially interested if it works with non Sun/Oracle JVMs). Come back tomorrow for an other article!

Meta: this post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license.