Recently,
I attended a code review of the core parts of a web application,
written in Java. The application is used by a large customer base and
occassionally, there are error reports and exceptions in the log files.
Some of these exceptions are the dreaded
ConcurrentModificationExceptions, indicating conflicting read/write
access on an unsynchronized collection data structure. In the code
review, we found several threading flaws, but not after an exhaustive
reading of the whole module. Here, I want to present the flaws and give
some advice on how to avoid them:
The public lock
In some parts of the code, methods were defined as synchronized through the method declaration keyword:
public synchronized String getLastReservation() { [...]
While there is nothing wrong with this approach in itself, it can be
highly dangerous in combination with synchronized blocks. The code above
effectively wraps a synchronized block using the object instance (this)
as a lock. No information of an object is more publicly visible as the
object reference (this), so you have to check all direct or indirect
clients of this object if they synchronize on this instance, too. If
they do, you have chained two code blocks together, probably without
proper mentioning of this fact. The least harmful defect will be
performance losses because your code isn’t locked as fine grained as it
could be.
The easiest way to avoid these situations it to always hide the
locks. Try not to share one object’s locks with other objects. If you
choose publicly accessible locks, you can never be sure about that.
The subtle lock change
In one class, there were both instance and class (static) methods, using the synchronized keyword:
public synchronized String getOrderNumberOf(String customerID) { [...]
public synchronized static int getTotalPendingOrders() { [...]
And while they were both accessing the same collection data structure
(a static hashmap), they were using different locks. The lock of the
instance method is the instance itself, while the lock of the static
method is the class object of the type. This is very dangerous, as it
can be easily missed when writing or altering the code.
The best way to prevent this problem it to avoid the synchronized
modifier for methods completely. State your locks explicitely, all the
time.
Partial locking
In a few classes, collection datatypes like lists were indeed
synchronized by internal synchronized-blocks in the methods, using the
private collection instance as lock. The synchronized blocks were
applied to the altering methods like putX(), removeX() and getX(). But
the toString() method, building a comma-separated list of the textual
list entries, wasn’t synchronized to the list. The method contained the
following code:
1 | public String toString() { |
2 | StringBuilder result = new StringBuilder(); |
3 | for (String entry : this .list) { |
8 | return result.toString(); |
I’ve left out some details and special cases, as they aren’t revelant
here. The problem with the foreach loop is that an anonymous Iterator
over the list is used and it will relentlessly monitor the list for any
changes and throw a ConcurrentModificationException as soon as one of
the properly synchronized sections changes it. The toString() method was
used to store the list to a session dependent data storage. Every once
in a while, the foreach loop threw an exception and failed to properly
persist the list data, resulting in data loss.
The most straight-forward solution to this problem might be to add
the missing synchronization block in the toString() method. If you don’t
want to block the user session while writing to disk, you might
traverse the list without an Iterator (and be careful with your
assumptions about valid indices) or work on a copy of the list, given
that an in-memory copy of the list would be cheap. In an
ACID system scenario, you should probably choose to complete your synchronized block guards.
Locking loophole
Another problem was a collection that was synchronized internally,
but could be accessed through a getter method. No client could safely
modify or traverse the collection, because they had the collection, but
not the lock object (that happened to be the collection, too, but who
can really be sure about that in the future?). It would be ridiculous to
also provide a getter for the lock object (always hide your locks,
remember?), the better solution is to refactor the client code to a
“tell, don’t ask” style.
To prevent a scenario when a client can access a data structure but
not its lock, you shouldn’t be able to gain access to the data
structure, but pass “command objects” to the data structure. This is a
perfect use case for
closures.
Effectively, you’ll end up with something like Function or Operation
instances that are applied to every element of the collection within a
synchronized block and perform your functionality on them. Have
a look at op4j for inspirational syntax.
Local locking
This was the worst of all problems and the final reason for this blog
entry: In some methods, the lock objects were local variables. In
summary, these methods looked like this:
1 | public String getData() { |
2 | Object lock = new Object(); |
Of course, it wasn’t that obvious. The lock objects were propagated
to other methods, stored in datastructures, removed from them, etc. But
in the end, each caller of the method got his own lock and could
henceforth wreck havoc in code that appeared very well synchronized on
first look. The error in its clarity is too stupid to be widespread. The
problem was the obfuscation around it. It took us some time to really
understand what is going on and where all that lock objects really come
from.
My final advice is: If you have to deal with multithreading, don’t
outsmart yourself and the next fellow programmer by building complex
code structures or implicit relationships. Be as concise and explicit as
you can be. Less clutter is more when dealing with threads. The core
problem is the all-or-none law of thread synchronization: Either you’ve
got it all right or you’ve got it all wrong – you just don’t know yet.
Hide your locks,
name your locks explicitely,
reduce the scope of necessary locking so that you can survey it easily,
never hand out your locked data, and, most important,
remove all clutter around your locking structures. This might make the difference between “just works” and endless ominous bug reports.