Thursday, October 20, 2011

When it comes to multithreading, better be safe than sorry

Recently, I attended a code review of the core parts of a web application, written in Java. The application is used by a large customer base and occassionally, there are error reports and exceptions in the log files. Some of these exceptions are the dreaded ConcurrentModificationExceptions, indicating conflicting read/write access on an unsynchronized collection data structure. In the code review, we found several threading flaws, but not after an exhaustive reading of the whole module. Here, I want to present the flaws and give some advice on how to avoid them:

The public lock

In some parts of the code, methods were defined as synchronized through the method declaration keyword:
public synchronized String getLastReservation() { [...]
While there is nothing wrong with this approach in itself, it can be highly dangerous in combination with synchronized blocks. The code above effectively wraps a synchronized block using the object instance (this) as a lock. No information of an object is more publicly visible as the object reference (this), so you have to check all direct or indirect clients of this object if they synchronize on this instance, too. If they do, you have chained two code blocks together, probably without proper mentioning of this fact. The least harmful defect will be performance losses because your code isn’t locked as fine grained as it could be.
The easiest way to avoid these situations it to always hide the locks. Try not to share one object’s locks with other objects. If you choose publicly accessible locks, you can never be sure about that.

The subtle lock change

In one class, there were both instance and class (static) methods, using the synchronized keyword:
public synchronized String getOrderNumberOf(String customerID) { [...]
public  synchronized static int getTotalPendingOrders() { [...]
And while they were both accessing the same collection data structure (a static hashmap), they were using different locks. The lock of the instance method is the instance itself, while the lock of the static method is the class object of the type. This is very dangerous, as it can be easily missed when writing or altering the code.
The best way to prevent this problem it to avoid the synchronized modifier for methods completely. State your locks explicitely, all the time.

Partial locking

In a few classes, collection datatypes like lists were indeed synchronized by internal synchronized-blocks in the methods, using the private collection instance as lock. The synchronized blocks were applied to the altering methods like putX(), removeX() and getX(). But the toString() method, building a comma-separated list of the textual list entries, wasn’t synchronized to the list. The method contained the following code:
1public String toString() {
2    StringBuilder result = new StringBuilder();
3    for (String entry : this.list) {
4        result.append(entry);
5        result.append(",");
6    }
7    [...]
8    return result.toString();
9}
I’ve left out some details and special cases, as they aren’t revelant here. The problem with the foreach loop is that an anonymous Iterator over the list is used and it will relentlessly monitor the list for any changes and throw a ConcurrentModificationException as soon as one of the properly synchronized sections changes it. The toString() method was used to store the list to a session dependent data storage. Every once in a while, the foreach loop threw an exception and failed to properly persist the list data, resulting in data loss.
The most straight-forward solution to this problem might be to add the missing synchronization block in the toString() method. If you don’t want to block the user session while writing to disk, you might traverse the list without an Iterator (and be careful with your assumptions about valid indices) or work on a copy of the list, given that an in-memory copy of the list would be cheap. In an ACID system scenario, you should probably choose to complete your synchronized block guards.

Locking loophole

Another problem was a collection that was synchronized internally, but could be accessed through a getter method. No client could safely modify or traverse the collection, because they had the collection, but not the lock object (that happened to be the collection, too, but who can really be sure about that in the future?). It would be ridiculous to also provide a getter for the lock object (always hide your locks, remember?), the better solution is to refactor the client code to a “tell, don’t ask” style.
To prevent a scenario when a client can access a data structure but not its lock, you shouldn’t be able to gain access to the data structure, but pass “command objects” to the data structure. This is a perfect use case for closures. Effectively, you’ll end up with something like Function or Operation instances that are applied to every element of the collection within a synchronized block and perform your functionality on them. Have a look at op4j for inspirational syntax.

Local locking

This was the worst of all problems and the final reason for this blog entry: In some methods, the lock objects were local variables. In summary, these methods looked like this:
1public String getData() {
2    Object lock = new Object();
3    synchronized (lock) {
4        [...]
5    }
6}
Of course, it wasn’t that obvious. The lock objects were propagated to other methods, stored in datastructures, removed from them, etc. But in the end, each caller of the method got his own lock and could henceforth wreck havoc in code that appeared very well synchronized on first look. The error in its clarity is too stupid to be widespread. The problem was the obfuscation around it. It took us some time to really understand what is going on and where all that lock objects really come from.
My final advice is: If you have to deal with multithreading, don’t outsmart yourself and the next fellow programmer by building complex code structures or implicit relationships. Be as concise and explicit as you can be. Less clutter is more when dealing with threads. The core problem is the all-or-none law of thread synchronization: Either you’ve got it all right or you’ve got it all wrong – you just don’t know yet.
Hide your locks, name your locks explicitely, reduce the scope of necessary locking so that you can survey it easily, never hand out your locked data, and, most important, remove all clutter around your locking structures. This might make the difference between “just works” and endless ominous bug reports.

No comments:

Post a Comment