Monday, March 21, 2011

Java Performance Tuning

         In this chapter:
The biggest difference between time and space is that you can't reuse time.

 "I thought that I didn't need to worry about memory allocation. Java is supposed to handle all that for me." This is a common perception, which is both true and false. Java handles low-level memory allocation and deallocation and comes with a garbage collector. Further, it prevents access to these low-level memory-handling routines, making the memory safe. So memory access should not cause corruption of data in other objects or in the running application, which is potentially the most serious problem that can occur with memory access violations. In a C or C++ program, problems of illegal pointer manipulations can be a major headache (e.g., deleting memory more than once, runaway pointers, bad casts). They are very difficult to track down and are likely to occur when changes are made to existing code. Java deals with all these possible problems and, at worst, will throw an exception immediately if memory is incorrectly accessed.

However, Java does not prevent you from using excessive amounts of memory nor from cycling through too much memory (e.g., creating and dereferencing many objects). Contrary to popular opinion, you can get memory leaks by holding on to objects without releasing references. This stops the garbage collector from reclaiming those objects, resulting in increasing amounts of memory being used.[1] In addition, Java does not provide for large numbers of objects to be created simultaneously (as you could do in C by allocating a large buffer), which eliminates one powerful technique for optimizing object creation.

Creating objects costs time and CPU effort for an application. Garbage collection and memory recycling cost more time and CPU effort. The difference in object usage between two algorithms can make a huge difference. In Chapter 5, Strings, I cover algorithms for appending basic data types to StringBuffer objects. These can be an order of magnitude faster than some of the conversions supplied with Java. A significant portion of the speedup is obtained by avoiding extra temporary objects used and discarded during the data conversions.[2]

Here are a few general guidelines for using object memory efficiently:
  • Avoid creating objects in frequently used routines. Because these routines are called frequently, you will likely be creating objects frequently, and consequently adding heavily to the overall burden of object cycling. By rewriting such routines to avoid creating objects, possibly by passing in reusable objects as parameters, you can decrease object cycling.
  • Try to presize any collection object to be as big as it will need to be. It is better for the object to be slightly bigger than necessary than to be smaller than it needs to be. This recommendation really applies to collections that implement size increases in such a way that objects are discarded. For example, Vector grows by creating a new larger internal array object, copying all the elements from and discarding the old array. Most collection implementations have similar implementations for growing the collection beyond its current capacity, so presizing a collection to its largest potential size reduces the number of objects discarded.
  • When multiple instances of a class need access to a particular object in a variable local to those instances, it is better to make that variable a static variable rather than have each instance hold a separate reference. This reduces the space taken by each object (one less instance variable) and can also reduce the number of objects created if each instance creates a separate object to populate that instance variable.
  • Reuse exception instances when you do not specifically require a stack trace (see ).
This chapter presents many other standard techniques to avoid using too many objects, and identifies some known inefficiencies when using some types of objects.

Object-Creation Statistics

Objects need to be created before they can be used, and garbage-collected when they are finished with. The more objects you use, the heavier this garbage-cycling impact becomes. General object-creation statistics are actually quite difficult to measure decisively, since you must decide exactly what to measure, what size to pregrow the heap space to, how much garbage collection impacts the creation process if you let it kick in, etc.

For example, on a medium Pentium II, with heap space pregrown so that garbage collection does not have to kick in, you can get around half a million to a million simple objects created per second. If the objects are very simple, even more can be garbage-collected in one second. On the other hand, if the objects are complex, with references to other objects, and include arrays (like Vector and StringBuffer) and nonminimal constructors, the statistics plummet to less than a quarter of a million created per second, and garbage collection can drop way down to below 100,000 objects per second. Each object creation is roughly as expensive as a malloc in C, or a new in C++, and there is no easy way of creating many objects together, so you cannot take advantage of efficiencies you get using bulk allocation.

There are already runtime systems that use generational garbage collection, minimize object-creation overhead, and optimize native-code compilation. By doing this they reach up to three million objects created and collected per second (on a Pentium II), and it is likely that the average Java system should improve to get closer to that kind of performance over time. But these figures are for basic tests, optimized to show the maximum possible object-creation throughput. In a normal application with varying size objects and constructor chains, these sorts of figures cannot be obtained or even approached. Also bear in mind that you are doing nothing else in these tests apart from creating objects. In most applications, you are usually doing something with all those objects, making everything much slower but significantly more useful. Avoidable object creation is definitely a significant overhead for most applications, and you can easily run through millions of temporary objects using inefficient algorithms that create too many objects. In Chapter 5, we look at an example that uses the StreamTokenizer class. This class creates and dereferences a huge number of objects while it parses a stream, and the effect is to slow down processing to a crawl. The example in Chapter 5 presents a simple alternative to using a StreamTokenizer, which is 100 times faster: a large percentage of the speedup is gained from avoiding cycling through objects.

Note that different VM environments produce different figures. If you plot object size against object-creation time for various environments, most plots are monotonically increasing, i.e., it takes more time to create larger objects. But there are discrepancies here too. For example, Netscape Version 4 running on Windows has the peculiar behavior that objects of size 4 and 12 ints are created fastest (refer to http://www.javaworld.com/javaworld/jw-09-1998/jw-09-speed.html). Also, note that JIT VMs actually have a worse problem with object creation relative to other VM activities, because JIT VMs can speed up almost every other activity, but object creation is nearly as slow as if the JIT compiler was not there.

Object Reuse

As we saw in the last section, objects are expensive to create. Where it is reasonable to reuse the same object, you should do so. You need to be aware of when not to call new. One fairly obvious situation is when you have already used an object and can discard it before you are about to create another object of the same class. You should look at the object and consider whether it is possible to reset the fields and then reuse the object, rather than throw it away and create another. This can be particularly important for objects that are constantly used and discarded: for example, in graphics processing, objects such as Rectangles, Points, Colors, and Fonts are used and discarded all the time. Recycling these types of objects can certainly improve performance.

Recycling can also apply to the internal elements of structures. For example, a linked list has nodes added to it as it grows, and as it shrinks, the nodes are discarded. Holding on to the discarded nodes is an obvious way to recycle these objects and reduce the cost of object creation.

Pool Management

Most container objects (e.g., Vectors, Hashtables) can be reused rather than created and thrown away. Of course, while you are not using the retained objects, you are holding on to more memory than if you simply discarded those objects, and this reduces the memory available to create other objects. You need to balance the need to have some free memory available against the need to improve performance by reusing objects. But generally, the space taken by retaining objects for later reuse is significant only for very large collections, and you should certainly know which ones these are in your application.

Note that when recycling container objects, you need to dereference all the elements previously in the container so that you don't prevent them from being garbage collected. Because there is this extra overhead in recycling, it may not always be worth recycling containers. As usual for tuning, this technique is best applied to ameliorate an object-creation bottleneck that has already been identified.

A good strategy for reusing container objects is to use your own container classes, possibly wrapping other containers. This gives you a high degree of control over each collection object, and you can design these specifically for reuse. You can still use a pool manager to manage your requirements, even without reuse-designed classes. Reusing classes requires extra work when you've finished with a collection object, but the effort is worth it when reuse is possible. The code fragment here shows how you could use a vector pool manager:

//An instance of the vector pool manager.
public static VectorPoolManager vectorPoolManager =
    new VectorPoolManager(25);

...

public void someMethod(  )
{
  //Get a new Vector. We only use the vector to do some stuff
  //within this method, and then we dump the vector (i.e. it
  //is not returned or assigned to a state variable)
  //so this is a perfect candidate for reusing Vectors.

  //Use a factory method instead of 'new Vector(  )'
  Vector v = vectorPoolManager.getVector( );
 
  ... //do vector manipulation stuff
 
  //and the extra work is that we have to explicitly tell the
  //pool manager that we have finished with the vector

  vectorPoolManager.returnVector(v);
}
Note that nothing stops the application from retaining a handle on a vector after it has been returned to the pool, and obviously that could lead to a classic "inadvertent reuse of memory" bug. You need to ensure that handles to vectors are not held anywhere: these Vectors should be used only internally within an application, not externally in third-party classes where a handle may be retained. The following class manages a pool of Vectors:

package tuning.reuse;
import java.util.Vector;
public class VectorPoolManager
{
 
  Vector[] pool;
  boolean[] inUse;

  public VectorPoolManager(int initialPoolSize)
  {
    pool = new Vector[initialPoolSize];
    inUse = new boolean[initialPoolSize];
    for (int i = pool.length-1; i>=0; i--)
    {
      pool[i] = new Vector(  );
      inUse[i] = false;
    }
  }
 
  public synchronized Vector getVector(  )
  {

    for (int i = inUse.length-1; i >= 0; i--)
      if (!inUse[i])
      {
        inUse[i] = true;
        return pool[i];
      }

    //If we got here, then all the Vectors are in use. We will
    //increase the number in our pool by 10 (arbitrary value for
    //illustration purposes).
    boolean[] old_inUse = inUse;
    inUse = new boolean[old_inUse.length+10];
    System.arraycopy(old_inUse, 0, inUse, 0, old_inUse.length);

    Vector[] old_pool = pool;
    pool = new Vector[old_pool.length+10];
    System.arraycopy(old_pool, 0, pool, 0, old_pool.length);
 
    for (int i = old_pool.length; i < pool.length; i++)
    {

      pool[i] = new Vector(  );
      inUse[i] = false;
    }
 
    //and allocate the last Vector
    inUse[pool.length-1] = true;
    return pool[pool.length-1];

  }
 
  public synchronized void returnVector(Vector v)
  {
    for (int i = inUse.length-1; i >= 0; i--)
      if (pool[i] == v)

      {
        inUse[i] = false;
        //Can use clear(  ) for java.util.Collection objects
        //Note that setSize(  ) nulls out all elements
        v.setSize(0);
        return;

      }
    throw new RuntimeException("Vector was not obtained from the pool: " + v);
  }
}
Because you reset the Vector size to 0 when it is returned to the pool, all objects previously referenced from the vector are no longer referenced (the Vector.setSize( ) method nulls out all internal index entries beyond the new size to ensure no reference is retained). However, at the same time, you do not return any memory allocated to the Vector itself, because the Vector's current capacity is retained. A lazily initialized version of this class simply starts with zero items in the pool and sets the pool to grow by one or more each time.

(Many JDK collection classes, including java.util.Vector, have both a size and a capacity. The capacity is the number of elements the collection can hold before that collection needs to resize its internal memory to be larger. The size is the number of externally accessible elements the collection is actually holding. The capacity is always greater than or equal to the size. By holding spare capacity, elements can be added to collections without having to continually resize the underlying memory. This makes element addition faster and more efficient.)

ThreadLocals

The previous example of a pool manager can be used by multiple threads in a multithreaded application, although the getVector( ) and returnVector( ) methods first need to be defined as synchronized. This may be all you need to ensure that you reuse a set of objects in a multithreaded application. Sometimes though, there are objects you need to use in a more complicated way. It may be that the objects are used in such a way that you can guarantee you need only one object per thread, but any one thread must consistently use the same object. Singletons (see the "Canonicalizing Objects" section) that maintain some state information are a prime example of this sort of object.

In this case, you might want to use a ThreadLocal object. ThreadLocals have accessors that return an object local to the current thread. ThreadLocal use is best illustrated using an example; this one produces:

[This is thread 0, This is thread 0, This is thread 0]
[This is thread 1, This is thread 1, This is thread 1]

[This is thread 2, This is thread 2, This is thread 2]
[This is thread 3, This is thread 3, This is thread 3]
[This is thread 4, This is thread 4, This is thread 4]

Each thread uses the same access method to obtain a vector to add some elements. The vector obtained by each thread is always the same vector for that thread: the ThreadLocal object always returns the thread-specific vector. As the following code shows, each vector has the same string added to it repeatedly, showing that it is always obtaining the same thread-specific vector from the vector access method. (Note that ThreadLocals are only available from Java 2, but it is easy to create the equivalent functionality using a Hashtable: see the getVectorPriorToJDK12( ) method.)

package tuning.reuse;
 
import java.util.*;
 
public class ThreadedAccess
  implements Runnable
{
  static int ThreadCount = 0;

  public void run(  )
  {
    //simple test just accesses the thread local vector, adds the
    //thread specific string to it, and sleeps for two seconds before
    //again accessing the thread local and printing out the value.
    String s = "This is thread " + ThreadCount++;

    Vector v = getVector(  );
    v.addElement(s);
    v = getVector(  );
    v.addElement(s);
    try{Thread.sleep(2000);}catch(Exception e){}
    v = getVector(  );

    v.addElement(s);
    System.out.println(v);
  }
 
  public static void main(String[] args)
  {
    try

    {
      //Four threads to see the multithreaded nature at work
      for (int i = 0; i < 5; i++)
      {
        (new Thread(new ThreadedAccess())).start(  );

        try{Thread.sleep(200);}catch(Exception e){}
      }
    }
    catch(Exception e){e.printStackTrace(  );}
  }
 
  private static ThreadLocal vectors = new ThreadLocal(  );

  public static Vector getVector(  )
  {
     //Lazily initialized version. Get the thread local object
     Vector v = (Vector) vectors.get(  );
     if (v == null)
     {

       //First time. So create a vector and set the ThreadLocal
       v = new Vector(  );
       vectors.set(v);
     }
     return v;
  }

 
  private static Hashtable hvectors = new Hashtable(  );
  /* This method is equivalent to the getVector(  ) method, 
   * but works prior to JDK 1.2 (as well as after).
   */
  public static Vector getVectorPriorToJDK12(  )
  {

     //Lazily initialized version. Get the thread local object
     Vector v = (Vector) hvectors.get(Thread.currentThread(  ));
     if (v == null)
     {
       //First time. So create a vector and set the thread local
       v = new Vector(  );

       hvectors.put(Thread.currentThread(  ), v);
     }
     return v;
  }
}

Reusable Parameters

Reuse also applies when a constant object is returned for information. For example, the preferredSize( ) of a customized widget returns a Dimension object that is normally one particular dimension. But to ensure that the stored unchanging Dimension value does not get altered, you need to return a copy of the stored Dimension. Otherwise, the calling method accesses the original Dimension object and can change the Dimension values, thus affecting the original Dimension object itself.

Java provides a final modifier to fields that allows you to provide fixed values for the Dimension fields. Unfortunately, you cannot redefine an already existing class, so Dimension cannot be redefined to have final fields. The best solution in this case is that a separate class, FixedDimension, be defined with final fields (this cannot be a subclass of Dimension, as the fields can't be redefined in the subclass). This extra class allows methods to return the same FixedDimension object if applicable, or a new FixedDimension is returned (as happens with Dimension) if the method requires different values to be returned for different states. Of course, it is too late now for java.awt to be changed in this way, but the principle remains.

Note that making a field final does not make an object unchangeable. It only disallows changes to the field:

public class FixedDimension {
  final int height;
  final int width;
  ...

}
 
//Both the following fields are defined as final
public static final Dimension dim = new Dimension(3,4);
public static final FixedDimension fixedDim = new FixedDimension(3,4);
 
dim.width = 5;           //reassignment allowed
dim = new Dimension(3,5);//reassignment disallowed
fixedDim.width = 5;      //reassignment disallowed
fixedDim = new FixedDimension(3,5); //reassignment disallowed

An alternative to defining preferredSize( ) to return a fixed object is to provide a method that accepts an object whose values will be set, e.g., preferredSize(Dimension). The caller can then pass in a Dimension object, which would have its values filled in by the preferredSize(Dimension)method. The calling method can then access the values in the Dimension object. This same Dimension object can be reused for multiple components. This design pattern is beginning to be used extensively within the JDK. Many methods developed with JDK 1.2 and onward accept a parameter that is filled in, rather than returning a copy of the master value of some object. If necessary, backward compatibility can be retained by adding this method as extra, rather than replacing an existing method:

public static final Dimension someSize = new Dimension(10,5);
//original definition returns a new Dimension.
public Dimension someSize(  ) {
  Dimension dim = new Dimension(0,0);
  someSize(dim);
  return dim;
}

//New method which fills in the Dimension details in a passed parameter.
public void someSize(Dimension dim) {
  dim.width = someSize.width;
  dim.width = someSize.height;
}

Canonicalizing Objects

Wherever possible, you should replace multiple objects with a single object (or just a few). For example, if you need only one VectorPoolManager object, it makes sense to provide a static variable somewhere that holds this. You can even enforce this by making the constructor private and holding the singleton in the class itself; e.g., change the definition of VectorPoolManager to:

public class VectorPoolManager
{
  public static final VectorPoolManager SINGLETON =

    new VectorPoolManager(10);
  Vector[] pool;
  boolean[] inUse;
 
  //Make the constructor private to enforce that
  //no other objects can be created.
  private VectorPoolManager(int initialPoolSize)

  {
  ...
}

An alternative implementation is to make everything static (all methods and both the instance variables in the VectorPoolManager class). This also ensures that only one pool manager can be used. My preference is to have a SINGLETON object for design reasons.[3]

This activity of replacing multiple copies of an object with just a few objects is often referred to as canonicalizing objects. The Booleans provide an existing example of objects that should have been canonicalized in the JDK. They were not, and no longer can be without breaking backward compatibility. For Booleans, only two objects need to exist, but by allowing a new Boolean object to be created (by providing public constructors), you lose canonicalization. The JDK should have enforced the existence of only two objects by keeping the constructors private. Note that canonical objects have another advantage in addition to reducing the number of objects created: they also allow comparison by identity. For example:

Boolean t1 = new Boolean(true);

System.out.println(t1==Boolean.TRUE);
System.out.println(t1.equals(Boolean.TRUE));

produces the output:

false
true

If Booleans had been canonicalized, all Boolean comparisons could be done by identity: comparison by identity is always faster than comparison by equality, because identity comparisons are simply pointer comparisons.[4]

You are probably better off not canonicalizing all objects that could be canonicalized. For example, the Integer class can (theoretically) have its instances canonicalized, but you need a map of some sort, and it is more efficient to allow multiple instances, rather than to manage a potential pool of four billion objects. However, the situation is different for particular applications. If you use just a few Integer objects in some defined way, you may find you are repeatedly creating the Integer objects with values 1, 2, 3, etc., and also have to access the integerValue( ) to compare them. In this case, you can canonicalize a few integer objects, improving performance in several ways: eliminating the extra Integer creations and the garbage collections of these objects when they are discarded, and allowing comparison by identity. For example:

public class IntegerManager
{
  public static final Integer ZERO = new Integer(0);
  public static final Integer ONE = new Integer(1);
  public static final Integer TWO = new Integer(2);
  public static final Integer THREE = new Integer(3);

  public static final Integer FOUR = new Integer(4);
  public static final Integer FIVE = new Integer(5);
  public static final Integer SIX = new Integer(6);
  public static final Integer SEVEN = new Integer(7);
  public static final Integer EIGHT = new Integer(8);
  public static final Integer NINE = new Integer(9);

  public static final Integer TEN = new Integer(10);
}
 
public class SomeClass
{
  public void doSomething(Integer i)
  {
    //Assume that we are passed a canonicalized Integer

    if (i == IntegerManager.ONE)
     xxx(  );
   else if(i == IntegerManager.FIVE)
     yyy(  );
   else ...
  }

  ...
}

There are various other frequently used objects throughout an application that should be canonicalized. A few that spring to mind are the empty string, empty arrays of various types, and some dates.

String canonicalization

There can be some confusion about whether Strings are already canonicalized. There is no guarantee that they are, although the compiler can canonicalize Strings that are equal and are compiled in the same pass. The String.intern( ) method canonicalizes strings in an internal table. This is supposed to be, and usually is, the same table used by strings canonicalized at compile time, but in some earlier JDK versions (e.g., 1.0), it was not the same table. In any case, there is no particular reason to use the internal string table to canonicalize your strings unless you want to compare Strings by identity (see ). Using your own table gives you more control and allows you to inspect the table when necessary. To see the difference between identity and equality comparisons for Strings, including the difference that String.intern( ) makes, you can run the following class:

public class Test
{
  public static void main(String[] args)
  {
    System.out.println(args[0]);  //see that we have the empty string
 
    //should be true

    System.out.println(args[0].equals(""));
 
    //should be false since they are not identical objects
    System.out.println(args[0] == "");
 
    //should be true unless there are two internal string tables
    System.out.println(args[0].intern(  ) == ""); 

  }
}

This Test class, when run with the command line:

java Test ""

gives the output:

true
false
true

Changeable objects

Canonicalizing objects is best for read-only objects and can be troublesome for objects that change. If you canonicalize a changeable object and then change its state, then all objects that have a reference to the canonicalized object are still pointing to that object, but with the object's new state. For example, suppose you canonicalize a special Date value. If that object has its date value changed, all objects pointing to that Date object now see a different date value. This result may be desired, but more often it is a bug.

If you want to canonicalize changeable objects, one technique to make it slightly safer is to wrap the object with another one, or use your own (sub)class.[5] Then all accesses and updates are controlled by you. If the object is not supposed to be changed, you can throw an exception on any update method. Alternatively, if you want some objects to be canonicalized but with copy-on-write behavior, you can allow the updater to return a noncanonicalized copy of the canonical object.

Note that it makes no sense to build a table of millions or even thousands of strings (or other objects) if the time taken to test for, access, and update objects in the table is longer than the time you are saving canonicalizing them.

Weak references

One technique for maintaining collections of objects that can grow too large is the use of WeakReferences (from the java.lang.ref package in Java 2). If you need to maintain one or more pools of objects with a large number of objects being held, you may start coming up against memory limits of the VM. In this case, you should consider using WeakReference objects to hold on to your pool elements. Objects referred to by WeakReferences can be automatically garbage-collected if memory gets low enough (see the "Reference Objects" sidebar).


Reference Objects


In many ways, you can think of Reference objects as normal objects that have a private Object instance variable. You can access the private object (termed the referent) using the Reference.get( ) method. However, Reference objects differ from normal objects in one hugely important way. The garbage collector may be allowed to clear Reference objects when it decides space is low enough. Clearing the Reference object sets the referent to null. For example, say you assign an object to a Reference. Later you test to see if the referent is null. It could be null if, between the assignment and the test, the garbage collector kicked in and decided to reclaim space:

Reference ref = new WeakReference(someObject); //ref.get( ) is someObject at the moment //Now do something that creates lots of objects, making //the garbage collector try to find more memory space doSomething( );   //now test if ref is null if (ref.get( ) == null) System.out.println("The garbage collector deleted my ref"); else System.out.println("ref object is still here");

Note that the referent can be garbage-collected at any time, as long as there are no other strong references referring to it. (In the example, ref.get( ) can become null only if there are no other non-Reference objects referring to someObject.)

The advantage of References is that you can use them to hang on to objects that you want to reuse but are not needed immediately. If memory space gets too low, those objects not currently being used are automatically reclaimed by the garbage collector. This means that you subsequently need to create objects instead of reusing them, but that is preferable to having the program crash from lack of memory. (To delete the reference object itself when the referent is nulled, you need to create the reference with a ReferenceQueue instance. When the reference object is cleared, it is added to the ReferenceQueue instance and can then be processed by the application, e.g., explicitly deleted from a hash table in which it may be a key.)

There are three Reference types in Java 2. WeakReferences and SoftReferences differ essentially in the order in which the garbage collector clears them. Basically, the garbage collector does not clear SoftReference objects until all WeakReferences have been cleared. PhantomReferences (not addressed here) are not cleared automatically by the garbage collector and are intended for use in a different way.


The concept behind this differentiation is that SoftReferences are intended to be used for canonical tables that may need to have memory automatically freed, and WeakReferences are intended for caches that may need to have memory automatically freed.


The rationale is that caches normally take up more space and are the first to be reclaimed when memory gets low. Canonical tables are normally smaller, and developers prefer them not to be garbage-collected unless memory gets really low. This differentiation between the two reference types allows cache memory to be freed up first if memory gets low; only when there is no more cache memory to be freed does the garbage collector start looking at canonical table memory.

Java 2 comes with a java.util.WeakHashMap class that implements a hash table with keys held by weak references.

A WeakReference normally maintains references to elements in a table of canonicalized objects. If memory gets low, any of the objects referred to by the table and not referred to anywhere else in the application (except by other weak references) are garbage-collected. This does not affect the canonicalization because only those objects not referenced anywhere else are removed. The canonical object can be re-created when required, and this new instance is now the new canonical object: remember that no other references to the object exist, or the original could not have been garbage-collected.

For example, a table of canonical Integer objects can be maintained using WeakReferences. This example is not particularly useful: unlike the earlier example, in which Integer objects from 1 to 10 can be referenced directly with no overhead, thus providing a definite speedup for tests, the next example has overheads that would probably swamp any benefits of having canonical Integers. I present it only as a clear and simple example to illustrate the use of WeakReferences.

The example has two iterations: one sets an array of canonical Integer objects up to a value set by the command-line argument; a second loops through to access the first 10 canonical Integers. If the first loop is large enough (or the VM memory is constrained low enough), the garbage collector kicks in and starts reclaiming some of the Integer objects that are all being held by WeakReferences. The second loop then reaccesses the first 10 Integer objects. Earlier, I explicitly held on to five of these Integer objects (integers 3 to 7 inclusive) in variables so that they could not be garbage-collected, and so that the second loop would reset only the five reclaimed Integers. When running this test with the VM constrained to 4 MB:

java -Xmx4M  tuning.reuse.Test 100000

you get the following output:

Resetting integer 0
Resetting integer 1
Resetting integer 2
Resetting integer 8
Resetting integer 9

The example is defined here. Note the overheads. Even if the reference has not been garbage-collected, you have to access the underlying object and cast it to the desired type:

package tuning.reuse;
 
import java.util.*;
import java.lang.ref.*;
 
public class Test
{

  public static void main(String[] args)
  {
    try
    {
      Integer ic = null;
      int REPEAT = args.length > 0 ? Integer.parseInt(args[0]) : 10000000;

      //Hang on to the Integer objects from 3 to 7
      //so that they cannot be garbage collected
      Integer i3 = getCanonicalInteger(3);
      Integer i4 = getCanonicalInteger(4);
      Integer i5 = getCanonicalInteger(5);
      Integer i6 = getCanonicalInteger(6);

      Integer i7 = getCanonicalInteger(7);
 
      //Loop through getting canonical integers until there is not
      //enough space, and the garbage collector reclaims some.
      for (int i = 0; i < REPEAT; i++)
        ic = getCanonicalInteger(i);

      //Now just re-access the first 10 integers (0 to 9) and
      //the 0, 1, 2, 8, and 9 integers will need to be reset in
      //the access method since they will have been reclaimed
      for (int i = 0; i < 10; i++)
        ic = getCanonicalInteger(i);

      System.out.println(ic);
    }
    catch(Exception e){e.printStackTrace(  );}
  }
 
  private static Vector canonicalIntegers = new Vector(  );
  public static Integer getCanonicalInteger(int i)

  {
    //First make sure our collection is big enough
    if (i >= canonicalIntegers.size(  ))
      canonicalIntegers.setSize(i+1);
 
    //Now access the canonical value.

    //This element contains null if the the value has never been set
    //or a weak reference that may have been garbage collected
    WeakReference ref = (WeakReference) canonicalIntegers.elementAt(i);
    Integer canonical_i;
 
    if (ref == null)
    {

      //never been set, so create and set it now
      canonical_i = new Integer(i);
      canonicalIntegers.setElementAt(new WeakReference(canonical_i), i);
    }
    else if( (canonical_i = (Integer) ref.get(  )) == null)
    {

      //has been set, but was garbage collected, so recreate and set it now
      //Include a print to see that we are resetting the Integer
      System.out.println("Resetting integer " + i);
      canonical_i = new Integer(i);
      canonicalIntegers.setElementAt(new WeakReference(canonical_i), i);

    }
    //else clause not needed, since the alternative is that the weak ref was
    //present and not garbage collected, so we now have our canonical integer
    return canonical_i;
  }

Enumerating constants

Another canonicalization technique often used is replacing constant objects with integers. For example, rather than use the strings "female" and "male", you should use a constant defined in an interface:

public interface GENDER
{

  public static final int FEMALE=1;
  public static final int MALE=2;
}

Used consistently, this enumeration can provide both speed and memory advantages. The enumeration requires less memory than the equivalent strings and makes network transfers faster. Comparisons are faster too, as the identity comparison can be used instead of the equality comparison. For example, you can use:

this.gender == FEMALE;

instead of:

this.gender.equals("female");

Avoiding Garbage Collection

The canonicalization techniques I've discussed are one way to avoid garbage collection: fewer objects means less to garbage-collect. Similarly, the pooling technique in that section also tends to reduce garbage-collection requirements, partly because you are creating fewer objects by reusing them, and partly because you deallocate memory less often by holding on to the objects you have allocated. Of course, this also means that your memory requirements are higher, but you can't have it both ways.

Another technique for reducing garbage-collection impact is to avoid using objects where they are not needed. For example, there is no need to create an extra unnecessary Integer to parse a String containing an int value, as in:

String string = "55";
int theInt = new Integer(string).intValue(  )


Instead, there is a static method available for parsing:

int theInt = Integer.parseInt(string);

Unfortunately, some classes do not provide static methods that avoid the spurious intermediate creation of objects. Until JDK Version 1.2, there were no static methods that allowed you to parse strings containing floating-point numbers to get doubles or floats. Instead, you needed to create an intermediate Double object and extract the value. (Even after JDK 1.2, an intermediate FloatingDecimal is created, but this is arguably due to good abstraction in the programming design.) When a class does not provide a static method, you can sometimes use a dummy instance to repeatedly execute instance methods, thus avoiding the need to create extra objects.

The primitive data types in Java use memory space that also needs reclaiming, but the overhead in reclaiming data-type storage is smaller: it is reclaimed at the same time as its holding object and so has a smaller impact. (Temporary primitive data types exist only on the stack and do not need to be garbage-collected at all: see the section for more on this.) For example, an object with just one instance variable holding an int is reclaimed in one object reclaim, whereas if it holds an Integer object, the garbage collector needs to reclaim two objects.

Reducing garbage collection by using primitive data types also applies when you can hold an object in a primitive data-type format rather than another format. For example, if you have a large number of objects each with a String instance variable holding a number (e.g., "1492", "1997"), it is better to make that instance variable an int data type and store the numbers as ints, provided that the conversion overheads do not swamp the benefits of holding the values in this alternative format.

Similarly, you can use an int (or long) to represent a Date object, providing appropriate calculations to access and update the values, thus saving an object and the associated garbage overhead. Of course, you have a different runtime overhead instead, as those conversion calculations may take up more time.

A more extreme version of this technique is to use arrays to map objects: for example, see the section . Towards the end of that example, one version of the class gets rid of node objects completely, using a large array to map and maintain all instances and instance variables. This leads to a large improvement in performance at all stages of the object life cycle. Of course, this technique is a specialized one that should not be used generically throughout your application, or you will end up with unmaintainable code. It should be used only when called for (and when it can be completely encapsulated). A simple example is for the class defined as:

class MyClass
{
  int x;
  boolean y;
}

This class has an associated collection class that seems to hold an array of MyClass objects, but that actually holds arrays of instance variables of the MyClass class:

class MyClassCollection
{
  int[] xs;
  boolean[] ys;
  public int getXForElement(int i) {return xs[i];}
  public boolean getYForElement(int i) {return ys[i];}

  //If possible avoid having to declare element access like the
  //following method:
  //public MyClass getElement(int i) {return new MyClass(xs[i], ys[i]);}
}

An extension of this technique flattens objects that have a one-to-one relationship. The classic example is a Person object that holds a Name object, consisting of first name and last name (and collection of middle names), and an Address object, with street, number, etc. This can be collapsed down to just the Person object, with all the fields moved up to the Person class. For example, the original definition consists of three classes:

public class Person {
  private Name name;
  private Address address;
}
class Name {
  private String firstName;
  private String lastName;

  private String[] otherNames;
}
class Address {
  private int houseNumber;
  private String houseName;
  private String streetName;
  private String town;

  private String area;
  private String greaterArea;
  private String country;
  private String postCode;
}

These three classes collapse into one class:

public class Person {
  private String firstName;
  private String lastName;
  private String[] otherNames;
  private int houseNumber;
  private String houseName;

  private String streetName;
  private String town;
  private String area;
  private String greaterArea;
  private String country;
  private String postCode;

}

This results in the same data and the same functionality (assuming that Addresses and Names are not referenced by more than one Person). But now you have one object instead of three for each Person. Of course, this violates the good design of an application and should not be used as standard, only when absolutely necessary.

Finally, here are some general recommendations that help to reduce the number of unnecessary objects being generated. These recommendations should be part of your standard coding practice, not just performance-related:

  • Reduce the number of temporary objects being used, especially in loops. It is easy to use a method in a loop that has side effects such as making copies, or an accessor that returns a copy of some object you only need once.
  • Use StringBuffer in preference to the String concatenation operator (+). This is really a special case of the previous point, but needs to be emphasized.
  • Be aware of which methods alter objects directly without making copies and which ones return a copy of an object. For example, any String method that changes the string (such as String.trim( )) returns a new String object, whereas a method like Vector.setSize( ) does not return a copy. If you do not need a copy, use (or create) methods that do not return a copy of the object being operated on.
  • Avoid using generic classes that handle Object types when you are dealing with basic data types. For example, there is no need to use Vector to store ints by wrapping them in Integers. Instead, implement an IntVector class that holds the ints directly. 

Initialization

All chained constructors are automatically called when creating an object with new. Chaining more constructors for a particular object causes extra overhead at object creation, as does initializing instance variables more than once. Be aware of the default values that Java initializes variables to:

  • null for objects
  • 0 for integer types of all lengths (byte, char, short, int, long)
  • 0.0 for float types (float and double)
  • false for booleans
There is no need to reinitialize these values in the constructor (although an optimizing compiler should be able to eliminate the extra redundant statement). Generalizing this point: if you can identify that the creation of a particular object is a bottleneck, either because it takes too long or because a great many of those objects are being created, you should check the constructor hierarchy to eliminate any multiple initializations to instance variables.

You can avoid constructors by unmarshalling objects from a serialized stream, because deserialization does not use constructors. However, serializing and deserializing objects is a CPU-intensive procedure and is unlikely to speed up your application. There is another way to avoid constructors when creating objects, namely by creating a clone( ) of an object. You can create new instances of classes that implement the Cloneable interface using the clone( ) method. These new instances do not call any class constructor, thus allowing you to avoid the constructor initializations. Cloning does not save a lot of time because the main overhead in creating an object is in the creation, not the initialization. However, when there are extensive initializations or many objects generated from a class with some significant initialization, this technique can help.

If you have followed the factory design pattern,[6] it is relatively simple to reimplement the original factory method to use a clone. For example, the original factory method can be defined similar to:

public static Something getNewSomething(  )
{
  return new Something(  );
}


The replaced implementation that uses cloning looks like:

private static Something MASTER_Something = new Something(  );
public static Something getNewSomething(  )
{
  return (Something) MASTER_Something.clone(  );
}

If you have not followed the factory design pattern, you may need to track down all calls that create a new instance of the relevant class and replace those calls. Note that the cloned object is still initialized, but the initialization is not the constructor initialization. Instead, the initialization consists of assigning exactly once to each instance variable of the new (cloned) object, using the instance variables of the object being cloned.

Java arrays all support cloning. This allows you to manage a similar trick when it comes to initializing arrays. But first let's see why you would want to clone an array for performance reasons.

When you create an array in code, using the curly braces to assign a newly created array to an array variable like this:

int[] array1 = {1,2,3,4,5,6,7,8,9};

you might imagine that the compiler creates an array in the compiled file, leaving a nice structure to be pulled in to memory. In fact, this is not what happens. The array is still created at runtime, with all the elements initialized then. Because of this, you should specify arrays just once, probably as a static, and assign that array as required. In most cases this is enough, and there is nothing further to improve on because the array is created just once. But sometimes you have a routine for which you want to create a new array each time you execute it. In this case, the complexity of the array determines how efficient the array creation is. If the array is quite complex, it is faster to hold a reference copy and clone that reference than it is to create a new array. For instance, the array example shown previously as array1 is simple and therefore faster to create, as shown in that example. But the following more complex array, array2, is faster to create as a cloned array:

static int[] Ref_array1 = {1,2,3,4,5,6,7,8,9};
static int[][] Ref_array2 = {{1,2},{3,4},{5,6},{7,8}};
 
int[] array1 = {1,2,3,4,5,6,7,8,9};         //faster than cloning
int[] array1 = (int[]) Ref_array1.clone(  );  //slower than initializing
 
int[][] array2 = {{1,2},{3,4},{5,6},{7,8}}; //slower than cloning
int[][] array2 = (int[][]) Ref_array2.clone(  );//faster than initializing

Early and Late Initialization

The final two sections of this chapter discuss two seemingly opposing tuning techniques. The first section, "Preallocating Objects," presents the technique of creating objects before they are needed. This technique is useful when a large number of objects need to be created at a time when CPU power is needed for other routines, and where those objects could feasibly be created earlier, at a time when there is ample spare CPU power.

The second section, "Lazy Initialization," presents the technique of delaying object creation until the last possible moment. This technique is useful for avoiding unnecessary object creation when only a few objects are used although many possible objects can be created.

In fact, these techniques represent two sides of the same coin: moving object creation from one time to another. Preallocating moves object creation to a time earlier than you would normally create those objects; lazy initialization moves object creation to a later time (or never).

Preallocating Objects

There may be situations in which you cannot avoid creating particular objects in significant amounts: perhaps they are necessary for the application and no reasonable amount of tuning has managed to reduce the object-creation overhead for them. If the creation time has been identified as a bottleneck, it is possible that you can still create the objects, but move the creation time to a part of the application when more spare cycles are available or there is more flexibility in response times.

The idea here is to choose another time to create some or all of the objects (perhaps in a partially initialized stage), and store those objects until they are needed. Again, if you have followed the factory design pattern, it is relatively simple to replace the return new Something( ) statement with an access to the collection of spare objects (presumably testing for a nonempty collection as well). If you have not followed the factory design pattern, you may need to track down all calls that create a new instance of the relevant class and replace them with a call to the factory method. For the real creation, you might want to spawn a background (low-priority) thread to churn out objects and add them into the storage collection until you run out of time, space, or necessity.

This is a variation of the "read-ahead" concept, and you can also apply this idea to:

  • Classloading (obviously not for classes needed as soon as the application starts up): see .
  • Distributed objects: see Chapter 12, Distributed Computing.
  • Reading in external data files.

Lazy Initialization

Lazy initialization means that you do not initialize objects until the first time they are used. Typically, this comes about when you are unsure of what initial value an instance variable might have but want to provide a default. Rather than initialize explicitly in the constructor (or class static initializer), it is left until access time for the variable to be initialized, using a test for null to determine if it has been initialized. For example:

public getSomething(  )
{
  if (something == null)
    something = defaultSomething(  );
  return something;
}

I find this kind of construct quite often in code (too often, in my opinion). I can only rarely see a justifiable reason for using lazy initialization. Not deciding where to initialize a variable correctly is more often a result of lazy design or lazy coding. The result can be many tests for null executing when you access your variables, and these null tests never go away: they are always performed, even after the variable has been initialized. In the worst case, this can impact performance badly, although generally the overhead is small and can be ignored. I always recommend avoiding the use of lazy initialization for general coding.

On the other hand, there are particular design situations in which it is appropriate to use lazy initialization. A good example is classloading, where classes are loaded dynamically as required. This is a specific design situation in which it is clear there will be a performance impact on running applications, but the design of the Java runtime merited this for the features that it brought.

Lazy initialization can be a useful performance-tuning technique. As usual, you should be tuning after functionality is present in your application, so I am not recommending using lazy initialization before the tuning stage. But there are places where you can change objects to be lazily initialized and make a large gain. Specifically, these are objects or variables of objects that may never be used. For example, if you need to make available a large choice of objects, of which only a few will actually be used in the application (e.g., based on a user's choice), then you are better off not instantiating or initializing these objects until they are actually used. An example is the char-to-byte encoding provided by the JDK. Only a few (usually one) of these are used, so you do not need to provide every type of encoding, fully initialized, to the application. Only the required encoding needs to be used.

When you have thousands of objects that need complex initializations but only a few will actually be used, lazy initialization provides a significant speedup to an application by avoiding exercising code that may never be run. A related situation in which lazy initialization can be used for performance tuning is when there are many objects that need to be created and initialized, and most of these objects will be used, but not immediately. In this case, it can be useful to spread out the load of object initialization so you don't get one large hit on the application. It may be better to let a background thread initialize all the objects slowly or to use lazy initialization to take many small or negligible hits, thus spreading the load over time. This is essentially the same technique as for preallocation of objects (see the previous section).

It is true that many of these kinds of situations should be anticipated at the design stage, in which case you could build lazy initialization into the application from the beginning. But this is quite an easy change to make (usually affecting just the accessors of a few classes), and so there is usually little reason to over-engineer the application prior to tuning.

Performance Checklist

Most of these suggestions apply only after a bottleneck has been identified:

  • Establish whether you have a memory problem.
  • Reduce the number of temporary objects being used, especially in loops.
  • Avoid creating temporary objects within frequently called methods.
  • Presize collection objects.
  • Reuse objects where possible.
  • Empty collection objects before reusing them. (Do not shrink them unless they are very large.)
  • Use custom conversion methods for converting between data types (especially strings and streams) to reduce the number of temporary objects.
  • Define methods that accept reusable objects to be filled in with data, rather than methods that return objects holding that data. (Or you can return immutable objects.)
  • Canonicalize objects wherever possible. Compare canonicalized objects by identity.
  • Create only the number of objects a class logically needs (if that is a small number of objects).
  • Replace strings and other objects with integer constants. Compare these integers by identity.
  • Use primitive data types instead of objects as instance variables.
  • Avoid creating an object that is only for accessing a method.
  • Flatten objects to reduce the number of nested objects.
  • Preallocate storage for large collections of objects by mapping the instance variables into multiple arrays.
  • Use StringBuffer rather than the string concatenation operator (+).
  • Use methods that alter objects directly without making copies.
  • Create or use specific classes that handle primitive data types rather than wrapping the primitive data types.
  • Consider using a ThreadLocal to provide threaded access to singletons with state.
  • Use the final modifier on instance-variable definitions to create immutable internally accessible objects.
  • Use WeakReferences to hold elements in large canonical lookup tables. (Use SoftReferences for cache elements.)
  • Reduce object-creation bottlenecks by targeting the object-creation process.
  • Keep constructors simple and inheritance hierarchies shallow.
  • Avoid initializing instance variables more than once.
  • Use the clone( ) method to avoid calling any constructors.
  • Clone arrays if that makes their creation faster.
  • Create copies of simple arrays faster by initializing them; create copies of complex arrays faster by cloning them.
  • Eliminate object-creation bottlenecks by moving object creation to an alternative time.
  • Create objects early, when there is spare time in the application, and hold those objects until required.
  • Use lazy initialization when there are objects or variables that may never be used, or when you need to distribute the load of creating objects.
  • Use lazy initialization only when there is a defined merit in the design, or when identifying a bottleneck which is alleviated using lazy initialization.


1.Ethan Henry and Ed Lycklama have written a nice article discussing Java memory leaks in the February 2000 issue of Dr. Dobb's Journal. This article is available online from the Dr. Dobb's web site, http://www.ddj.com/.
2.Up to Java 1.3. Data-conversion performance is targeted by JavaSoft, however, so some of the data conversions may speed up after 1.3.
3.The VectorPoolManager is really an object with behavior and state. It is not just a related group of functions (which is what class static methods are equivalent to). My colleague Kirk Pepperdine insists that this choice is more than just a preference. He states that holding on to an object as opposed to using statics provides more flexibility should you need to alter the use of the VectorPoolManager or provide multiple pools. I agree.
4.Deserializing Booleans would have required special handling to return the canonical Boolean. All canonicalized objects similarly require special handling to manage serialization. Java serialization supports the ability, when deserializing, to return specific objects in place of the object that is normally created by the default deserialization mechanism.
5.Beware that using a subclass may break the superclass semantics.
6.The factory design pattern recommends that object creation be centralized in a particular factory method. So instead of directly calling new Something( ) in the code to create an instance of the Something class, you call a method such as SomethingFactory.getNewSomething( ), which creates and returns a new instance of the Something class. This is actually detrimental for performance, as there is the overhead of an extra method call for every object creation, but the pattern does provide more flexibility when it comes to tuning. My inclination is to use the factory pattern. If you identify a particular factory method as a bottleneck when performance-tuning, you can relatively easily inline that factory method using a preprocessor.



No comments:

Post a Comment