LOYOLITE: September 2011

Thursday, September 29, 2011

Tuning Java I/O Performance

I/O Performance
This article discusses and illustrates a variety of techniques for improving Java I/O performance. Most of the techniques center around tuning disk file I/O, but some are applicable to network I/O and window output as well. The first set of techniques presented below cover low-level I/O issues, and then higher-level issues such as compression, formatting, and serialization are discussed. Note, however, the discussion does not cover application design issues, such as choice of search algorithms and data structures, nor does it discuss system-level issues such as file caching.

When discussing Java I/O, it's worth noting that the Java programming language assumes two distinct types of disk file organization. One is based on streams of bytes, the other on character sequences. In the Java language a character is represented using two bytes, not one byte as in other common languages such as C. Because of this, some translation is required to read characters from a file. This distinction is important in some contexts, as several of the examples will illustrate.

Low-Level I/O Issues

High-Level I/O Issues

Basic Rules for Speeding Up I/O

As a means of starting the discussion, here are some basic rules on how to speed up I/O:

Avoid accessing the disk.
Avoid accessing the underlying operating system.
Avoid method calls.
Avoid processing bytes and characters individually.

These rules obviously cannot be applied in a "blanket" way, because if that were the case, no I/O would ever get done! But to see how they can be applied, consider the following three-part example that counts the number of newline bytes ('\n') in a file.

Approach 1: Read Method

The first approach simply uses the read method on a FileInputStream:



 import java.io.*;
  
  public class intro1 {
    public static void main(String args[]) {
      if (args.length != 1) {
        System.err.println("missing filename");
        System.exit(1);
      }
      try {
        FileInputStream fis =
            new FileInputStream(args[0]);
        int cnt = 0;
        int b;
        while ((b = fis.read()) != -1) {
          if (b == '\n')
            cnt++;
        }
        fis.close();
        System.out.println(cnt);
      }
      catch (IOException e) {
        System.err.println(e);
      }
    }
  }

However, this approach triggers a lot of calls to the underlying runtime system, that is, FileInputStream.read, a native method that returns the next byte of the file.

Approach 2: Using a Large Buffer

The second approach avoids the above problem, by using a large buffer:

 

import java.io.*;
  
 public class intro2 {
   public static void main(String args[]) {
    if (args.length != 1) {
      System.err.println("missing filename");
      System.exit(1);
    }
    try {
      FileInputStream fis =
          new FileInputStream(args[0]);
      BufferedInputStream bis =
          new BufferedInputStream(fis);
      int cnt = 0;
      int b;
      while ((b = bis.read()) != -1) {
        if (b == '\n')
          cnt++;
        }
      bis.close();
      System.out.println(cnt);
    }
    catch (IOException e) {
      System.err.println(e);
    }
  }
 }

BufferedInputStream.read takes the next byte from the input buffer, and only rarely accesses the underlying system.

Approach 3: Direct Buffering

The third approach avoids BufferedInputStream and does buffering directly, thereby eliminating the read method calls:



 import java.io.*;
  
  public class intro3 {
    public static void main(String args[]) {
      if (args.length != 1) {
        System.err.println("missing filename");
        System.exit(1);
      }
      try {
        FileInputStream fis =
            new FileInputStream(args[0]);
        byte buf[] = new byte[2048];
        int cnt = 0;
        int n;
        while ((n = fis.read(buf)) != -1) {
          for (int i = 0; i < n; i++) {
            if (buf[i] == '\n')
              cnt++;
          }
        }
        fis.close();
        System.out.println(cnt);
      }
      catch (IOException e) {
        System.err.println(e);
      }
    }
  }

For a 1 MB input file, the execution times in seconds of the programs are:

 
 intro1    6.9
 intro2    0.9
 intro3    0.4

or about a 17 to 1 difference between the slowest and fastest.

This huge speedup doesn't necessarily prove that you should always emulate the third approach, in which you do your own buffering. Such an approach may be error-prone, especially in handling end-of-file events, if it is not carefully implemented. It may also be less readable than the alternatives. But it's useful to keep in mind where the time goes, and how it can be reclaimed when necessary.

Approach 2 is probably "right" for most applications.

Buffering

Approaches 2 and 3 use the technique of buffering, where large chunks of a file are read from disk, and then accessed a byte or character at a time. Buffering is a basic and important technique for speeding I/O, and several Java classes support buffering (BufferedInputStream for bytes, BufferedReader for characters).

An obvious question is: Will making the buffer bigger make I/O go faster? Java buffers typically are by default 1024 or 2048 bytes long. A buffer larger than this may help speed I/O, but often by only a few percent, say 5 to 10%.

Approach 4: Whole File

The extreme case of buffering would be to determine the length of a file in advance, and then read in the whole file:



  import java.io.*;
  
  public class readfile {
    public static void main(String args[]) {
      if (args.length != 1) {
        System.err.println("missing filename");
        System.exit(1);
      }
      try {
        int len = (int)(new File(args[0]).length());
        FileInputStream fis =
            new FileInputStream(args[0]);
        byte buf[] = new byte[len];
        fis.read(buf);
        fis.close();
        int cnt = 0;
        for (int i = 0; i < len; i++) {
          if (buf[i] == '\n')
            cnt++;
        }
        System.out.println(cnt);
      }
      catch (IOException e) {
        System.err.println(e);
      }
    }
  }

This approach is convenient, in that a file can be treated as an array of bytes. But there's an obvious problem of possibly not having enough memory to read in a very large file.

Another aspect of buffering concerns text output to a terminal window. By default, System.out (a PrintStream) is line buffered, meaning that the output buffer is flushed when a newline character is encountered. This is important for interactivity, where you'd like to have an input prompt displayed before actually entering any input.

Approach 5: Disabling Line Buffering

But line buffering can be disabled, as in this example:



  import java.io.*;
  
  public class bufout {
    public static void main(String args[]) {
      FileOutputStream fdout =
          new FileOutputStream(FileDescriptor.out);
      BufferedOutputStream bos =
          new BufferedOutputStream(fdout, 1024);
      PrintStream ps =
          new PrintStream(bos, false);
  
      System.setOut(ps);
  
      final int N = 100000;
  
      for (int i = 1; i <= N; i++)
        System.out.println(i);
  
      ps.close();
    }
  }

This program writes the integers 1..100000 to the output, and runs about three times faster than the default equivalent that has line buffering enabled.

Buffering is also an important part of one of the examples presented below, where a buffer is used to speed up random file access.

Reading/Writing Text Files

Earlier the idea was mentioned that method call overhead can be significant when reading characters from a file. Another example of this can be found in a program that counts the number of lines in a text file:

 

 import java.io.*;

  public class line1 {
    public static void main(String args[]) {
      if (args.length != 1) {
        System.err.println("missing filename");
        System.exit(1);
      }
      try {
        FileInputStream fis =
            new FileInputStream(args[0]);
        BufferedInputStream bis =
            new BufferedInputStream(fis);
        DataInputStream dis =
            new DataInputStream(bis);
        int cnt = 0;
        while (dis.readLine() != null)
          cnt++;
        dis.close();
        System.out.println(cnt);
      }
      catch (IOException e) {
        System.err.println(e);
      }
    }
  }

This program uses the old DataInputStream.readLine method, which is implemented using read method calls to obtain each character. A newer approach is to say:



 import java.io.*;

  public class line2 {
    public static void main(String args[]) {
      if (args.length != 1) {
        System.err.println("missing filename");
        System.exit(1);
      }
      try {
        FileReader fr = new FileReader(args[0]);
        BufferedReader br = new BufferedReader(fr);
        int cnt = 0;
        while (br.readLine() != null)
          cnt++;
        br.close();
        System.out.println(cnt);
      }
      catch (IOException e) {
        System.err.println(e);
      }
    }
  }

This approach can be faster. For example, on a 6 MB text file with 200,000 lines, the second program is around 20% faster than the first.

But even if the second program isn't faster, there's an important issue to note. The first program evokes a deprecation warning from the Java 2 compiler, because DataInputStream.readLine is obsolete. It does not properly convert bytes to characters, and would not be an appropriate choice for manipulating text files containing anything other than ASCII text byte streams (recall that the Java language uses the Unicode character set, not ASCII).

This is where the distinction between byte streams and character streams noted earlier comes into play. A program such as:



 import java.io.*;
  
  public class conv1 {
    public static void main(String args[]) {
      try {
        FileOutputStream fos =
            new FileOutputStream("out1");
        PrintStream ps = 
            new PrintStream(fos);
        ps.println("\uffff\u4321\u1234");
        ps.close();
      }
      catch (IOException e) {
        System.err.println(e);
      }
    }
  }

writes an output file, but without preserving the Unicode characters that are actually output. The Reader/Writer I/O classes are character-based, and are designed to resolve this issue. OutputStreamWriter is where the encoding of characters to bytes is applied.

A program that uses PrintWriter to write out Unicode characters looks like this:

 

 import java.io.*;

  public class conv2 {
    public static void main(String args[]) {
      try {
        FileOutputStream fos =
            new FileOutputStream("out2");
        OutputStreamWriter osw =
            new OutputStreamWriter(fos, "UTF8");
        PrintWriter pw =
            new PrintWriter(osw);
        pw.println("\uffff\u4321\u1234");
        pw.close();
      }
      catch (IOException e) {
        System.err.println(e);
      }
    }
  }

This program uses the UTF8 encoding, which has the property of encoding ASCII text as itself, and other characters as two or three bytes.

Formatting Costs

Actually writing data to a file is only part of the cost of output. Another significant cost is data formatting. Consider a three-part example, one that writes out lines like:

 
The square of 5 is 25

Approach 1

The first approach is simply to write out a fixed string, to get an idea of the intrinsic I/O cost:



  public class format1 {
    public static void main(String args[]) {
      final int COUNT = 25000;

      for (int i = 1; i <= COUNT; i++) {
        String s = "The square of 5 is 25\n";
        System.out.print(s);
      }
    }
  }

Approach 2

The second approach employs simple formatting using "+":



 public class format2 {
    public static void main(String args[]) {
      int n = 5;
  
      final int COUNT = 25000;


      for (int i = 1; i <= COUNT; i++) {
        String s = "The square of " + n + " is " +
            n * n + "\n";
        System.out.print(s);
      }
    }
  }

Approach 3

The third approach uses the MessageFormat class from the java.text package:



 import java.text.*;
  
 public class format3 {
   public static void main(String args[]) {
     MessageFormat fmt =
      new MessageFormat("The square of {0} is {1}\n");
      Object values[] = new Object[2];


    int n = 5;


    values[0] = new Integer(n);
    values[1] = new Integer(n * n);
  
    final int COUNT = 25000;


    for (int i = 1; i <= COUNT; i++) {
      String s = fmt.format(values);
      System.out.print(s);
     }
    }
  }

These programs produce identical output. The running times are:

 
 format1   1.3
 format2   1.8
 format3   7.8

or about a 6 to 1 difference between the slowest and fastest. The third program would be even slower if the format had not been precompiled and the static convenience method had been used instead:

Approach 4

MessageFormat.format(String, Object[])

as in:



  import java.text.*;
  
  public class format4 {
    public static void main(String args[]) {
      String fmt = "The square of {0} is {1}\n";
      Object values[] = new Object[2];


      int n = 5;


      values[0] = new Integer(n);
      values[1] = new Integer(n * n);
  
      final int COUNT = 25000;


      for (int i = 1; i <= COUNT; i++) {
        String s =
            MessageFormat.format(fmt, values);
        System.out.print(s);
      }
    }
  }

which takes 1/3 longer than the previous example.

The fact that approach 3 is quite a bit slower than approaches 1 and 2 doesn't mean that you shouldn't use it. But you need to be aware of the cost in time.

Message formats are quite important in internationalization contexts, and an application concerned about this issue might typically read the format from a resource bundle, and then use it.

Random Access

RandomAccessFile is a Java class for doing random access I/O (at the byte level) on files. The class provides a seek method, similar to that found in C/C++, to move the file pointer to an arbitrary location, from which point bytes can then be read or written.

The seek method accesses the underlying runtime system, and as such, tends to be expensive. One cheaper alternative is to set up your own buffering on top of a RandomAccessFile, and implement a read method for bytes directly. The parameter to read is the byte offset >= 0 of the desired byte. An example of how this is done is:

 
 import java.io.*;
  
  public class ReadRandom {
    private static final int DEFAULT_BUFSIZE = 4096;
  
    private RandomAccessFile raf;
    private byte inbuf[];
    private long startpos = -1;
    private long endpos = -1;
    private int bufsize;
  
    public ReadRandom(String name) 
     throws FileNotFoundException {
      this(name, DEFAULT_BUFSIZE);
    }
  
    public ReadRandom(String name, int b)
        throws FileNotFoundException {
      raf = new RandomAccessFile(name, "r");
      bufsize = b;
      inbuf = new byte[bufsize];
    }
  
    public int read(long pos) {
      if (pos < startpos || pos > endpos) {
        long blockstart = (pos / bufsize) * bufsize;
        int n;
        try {
          raf.seek(blockstart);
          n = raf.read(inbuf);
        }
        catch (IOException e) {
          return -1;
        }
        startpos = blockstart;
        endpos = blockstart + n - 1;
        if (pos < startpos || pos > endpos)
          return -1;
      }
  
      return inbuf[(int)(pos - startpos)] & 0xffff;
    }
  
    public void close() throws IOException {
      raf.close();
    }
  
    public static void main(String args[]) {
      if (args.length != 1) {
        System.err.println("missing filename");
        System.exit(1);
      }
  
      try {
        ReadRandom rr = new ReadRandom(args[0]);
        long pos = 0;
        int c;
        byte buf[] = new byte[1];
        while ((c = rr.read(pos)) != -1) {
          pos++;
          buf[0] = (byte)c;
          System.out.write(buf, 0, 1);
        }
        rr.close();
      }
      catch (IOException e) {
        System.err.println(e);
      }
    }
  }

The driver program simply reads the bytes in sequence and writes them out.

This technique is helpful if you have locality of access, where nearby bytes in the file are read at about the same time. For example, if you are implementing a binary search scheme on a sorted file, this approach might be useful. It's of less value if you are truly doing random access at arbitrary points in a large file.

Compression

Java provides classes for compressing and uncompressing byte streams. These are found in the java.util.zip package, and also serve as the basis for Jar files (a Jar file is a Zip file with an added manifest).

The following program takes a single input file, and writes a compressed output Zip file, with a single entry representing the input file:



 import java.io.*;
 import java.util.zip.*;
  
  public class compress {
    public static void doit(
                            String filein, 
                            String fileout
                            ) {
      FileInputStream fis = null;
      FileOutputStream fos = null;
      try {
        fis = new FileInputStream(filein);
        fos = new FileOutputStream(fileout);
        ZipOutputStream zos =
            new ZipOutputStream(fos);
        ZipEntry ze = new ZipEntry(filein);
        zos.putNextEntry(ze);
        final int BUFSIZ = 4096;
        byte inbuf[] = new byte[BUFSIZ];
        int n;
        while ((n = fis.read(inbuf)) != -1)
          zos.write(inbuf, 0, n);
        fis.close();
        fis = null;
        zos.close();
        fos = null;
      }
      catch (IOException e) {
        System.err.println(e);
      }
      finally {
        try {
          if (fis != null)
            fis.close();
          if (fos != null)
            fos.close();
        }
        catch (IOException e) {
        }
      }
    }
  public static void main(String args[]) {
    if (args.length != 2) {
     System.err.println("missing filenames");
     System.exit(1);
    }
   if (args[0].equals(args[1])) {
     System.err.println("filenames are identical");
     System.exit(1);
      }
      doit(args[0], args[1]);
    }
  }

The next program reverses the process, taking an input Zip file that is assumed to have a single entry in it, and uncompresses that entry to the output file:


 import java.io.*;
 import java.util.zip.*;
  
  public class uncompress {
    public static void doit(
                            String filein, 
                            String fileout
                            ) {
      FileInputStream fis = null;
      FileOutputStream fos = null;
      try {
        fis = new FileInputStream(filein);
        fos = new FileOutputStream(fileout);
        ZipInputStream zis = new ZipInputStream(fis);
        ZipEntry ze = zis.getNextEntry();
        final int BUFSIZ = 4096;
        byte inbuf[] = new byte[BUFSIZ];
        int n;
        while ((n = zis.read(inbuf, 0, BUFSIZ)) != -1)
          fos.write(inbuf, 0, n);
        zis.close();
        fis = null;
        fos.close();
        fos = null;
      }
      catch (IOException e) {
        System.err.println(e);
      }
      finally {
        try {
          if (fis != null)
            fis.close();
          if (fos != null)
            fos.close();
        }
        catch (IOException e) {
        }
      }
    }
    public static void main(String args[]) {
      if (args.length != 2) {
     System.err.println("missing filenames");
     System.exit(1);
      }
    if (args[0].equals(args[1])) {
     System.err.println("filenames are identical");
     System.exit(1);
      }
      doit(args[0], args[1]);
    }
  }

Whether compression helps or hurts I/O performance depends a lot on your local hardware setup; specifically the relative speeds of the processor and disk drives. Compression using Zip technology implies typically a 50% reduction in data size, but at the cost of some time to compress and decompress. An experiment with large (5 to 10 MB) compressed text files, using a 300-MHz Pentium PC with IDE disk drives, showed an elapsed time speedup of around 1/3 in reading compressed files from disk, over reading uncompressed ones.

An example of where compression is useful is in writing to very slow media such as floppy disks. A test using a fast processor (300 MHz Pentium) and a slow floppy (the conventional floppy drive found on PCs), showed that compressing a large text file and then writing to the floppy drive results in a speedup of around 50% over simply copying the file directly to the floppy drive.

Caching

A detailed discussion of hardware caching is beyond the scope of this paper. But sometimes software caching can be used to speed up I/O. Consider a case where you want to read lines of a text file in random order. One way to do this is to read in all the lines, and store them in an ArrayList (a collection class similar to Vector):


 import java.io.*;
 import java.util.ArrayList;
  
  public class LineCache {
    private ArrayList list = new ArrayList();
  
    public LineCache(String fn) throws IOException {
      FileReader fr = new FileReader(fn);
      BufferedReader br = new BufferedReader(fr);
      String ln;
      while ((ln = br.readLine()) != null)
        list.add(ln);
      br.close();
    }
  
    public String getLine(int n) {
      if (n < 0)
        throw new IllegalArgumentException();
  
      return (n < list.size() ? 
       (String)list.get(n) : null);
    }
  
    public static void main(String args[]) {
      if (args.length != 1) {
        System.err.println("missing filename");
        System.exit(1);
      }
      try {
        LineCache lc = new LineCache(args[0]);
        int i = 0;
        String ln;
        while ((ln = lc.getLine(i++)) != null)
          System.out.println(ln);
      }
      catch (IOException e) {
        System.err.println(e);
      }
    }
  }

The getLine method is then used to retrieve an arbitrary line. This technique is quite useful, but obviously uses a lot of memory for large files, and so has limitations. An alternative might be to simply remember the last 100 lines that were requested, and read from the disk for any other requests. This scheme works well if there is locality of access of the lines, but not so well if line requests are truly random.

Tokenization

Tokenization refers to the process of breaking byte or character sequences into logical chunks, for example words. Java offers a StreamTokenizer class, that operates like this:

 
 import java.io.*;
  
  public class token1 {
    public static void main(String args[]) {
     if (args.length != 1) {
       System.err.println("missing filename");
       System.exit(1);
      }
      try {
        FileReader fr = new FileReader(args[0]);
        BufferedReader br = new BufferedReader(fr);
        StreamTokenizer st = new StreamTokenizer(br);
        st.resetSyntax();
        st.wordChars('a', 'z');
        int tok;
        while ((tok = st.nextToken()) !=
            StreamTokenizer.TT_EOF) {
          if (tok == StreamTokenizer.TT_WORD)
            ;// st.sval has token
        }
        br.close();
      }
      catch (IOException e) {
        System.err.println(e);
      }
    }
  }

This example tokenizes in terms of lower-case words (letters a-z). If you implement the equivalent yourself, it might look like:

 

 import java.io.*;
  
  public class token2 {
    public static void main(String args[]) {
      if (args.length != 1) {
        System.err.println("missing filename");
        System.exit(1);
      }
      try {
        FileReader fr = new FileReader(args[0]);
        BufferedReader br = new BufferedReader(fr);
        int maxlen = 256;
        int currlen = 0;
        char wordbuf[] = new char[maxlen];
        int c;
        do {
          c = br.read();
          if (c >= 'a' && c <= 'z') {
            if (currlen == maxlen) {
              maxlen *= 1.5;
              char xbuf[] =
                  new char[maxlen];
              System.arraycopy(
                  wordbuf, 0,
                  xbuf, 0, currlen);
              wordbuf = xbuf;
            }
            wordbuf[currlen++] = (char)c;
          }
          else if (currlen > 0) {
            String s = new String(wordbuf,
                0, currlen);
          // do something with s
            currlen = 0;
          }
        } while (c != -1);
        br.close();
      }
      catch (IOException e) {
        System.err.println(e);
      }
    }
  }

The second program runs about 20% faster than the first, at the expense of having to write some tricky low-level code.

StreamTokenizer is sort of a hybrid class, in that it will read from character-based streams (like BufferedReader), but at the same time operates in terms of bytes, treating all characters with two-byte values (greater than 0xff) as though they are alphabetic characters.

Serialization

Serialization is used to convert arbitrary Java data structures into byte streams, using a standardized format. For example, the following program writes out an array of random integers:


  import java.io.*;
  import java.util.*;
  
  public class serial1 {
    public static void main(String args[]) {
      ArrayList al = new ArrayList();
      Random rn = new Random();
      final int N = 100000;
  
      for (int i = 1; i <= N; i++)
        al.add(new Integer(rn.nextInt()));
  
      try {
        FileOutputStream fos =
            new FileOutputStream("test.ser");
        BufferedOutputStream bos =
            new BufferedOutputStream(fos);
        ObjectOutputStream oos =
            new ObjectOutputStream(bos);
        oos.writeObject(al);
        oos.close();
      }
      catch (Throwable e) {
        System.err.println(e);
      }
    }
  }

and this program reads the array back in:

 
 import java.io.*;
 import java.util.*;
  
  public class serial2 {
    public static void main(String args[]) {
      ArrayList al = null;
  
      try {
        FileInputStream fis =
            new FileInputStream("test.ser");
        BufferedInputStream bis =
            new BufferedInputStream(fis);
        ObjectInputStream ois =
            new ObjectInputStream(bis);
        al = (ArrayList)ois.readObject();
        ois.close();
      }
      catch (Throwable e) {
        System.err.println(e);
      }
    }
  }

Note that we used buffering to speed the I/O operations.

Is there a faster way than serialization to write out large volumes of data, and then read it back? Probably not, except in special cases. For example, suppose that you decide to write out a 64-bit long integer as text instead of as a set of 8 bytes. The maximum length of a long integer as text is around 20 characters, or 2.5 times as long as the binary representation. So it seems likely that this format wouldn't be any faster. In some cases, however, such as bitmaps, a special format might be an improvement. However, using your own scheme does work against the standard offered by serialization, so doing so involves some tradeoffs.

Beyond the actual I/O and formatting costs of serialization (using DataInputStream and DataOutputStream), there are other costs, for example, the need to create new objects when deserializing.

Note also that the methods of DataOutputStream can be used to develop semi-custom data formats, for example:

 

 import java.io.*;
  import java.util.*;
  
  public class binary1 {
    public static void main(String args[]) {
      try {
        FileOutputStream fos =
            new FileOutputStream("outdata");
        BufferedOutputStream bos =
            new BufferedOutputStream(fos);
        DataOutputStream dos =
            new DataOutputStream(bos);
        Random rn = new Random();
        final int N = 10;
        dos.writeInt(N);
        for (int i = 1; i <= N; i++) {
          int r = rn.nextInt();
          System.out.println(r);
          dos.writeInt(r);
        }
        dos.close();
      }
      catch (IOException e) {
        System.err.println(e);
      }
    }
  }

and:



 import java.io.*;
  
  public class binary2 {
    public static void main(String args[]) {
      try {
        FileInputStream fis =
            new FileInputStream("outdata");
        BufferedInputStream bis =
            new BufferedInputStream(fis);
        DataInputStream dis =
            new DataInputStream(bis);
        int N = dis.readInt();
        for (int i = 1; i <= N; i++) {
          int r = dis.readInt();
          System.out.println(r);
        }
        dis.close();
      }
      catch (IOException e) {
        System.err.println(e);
      }
    }
  }

These programs write 10 integers to a file and then read them back.

Obtaining Information About Files

Our discussion so far has centered on input and output for individual files. But there's another aspect of speeding I/O performance, that relates to finding out properties of files. For example, consider a small program that prints the length of a filename:


 import java.io.*;
  
  public class length1 {
    public static void main(String args[]) {
      if (args.length != 1) {
        System.err.println("missing filename");
        System.exit(1);
      }
      File f = new File(args[0]);
      long len = f.length();
      System.out.println(len);
    }
  }

The Java runtime system itself cannot know the length of a file, and so must query the underlying operating system to obtain this information. This holds true for other file information, such as whether a file is a directory, the time it was last modified, and so on. The File class in the java.io package provides a set of methods to query this information. Such querying is in general expensive in terms of time, and should be used as little as possible.

A longer example of querying file information, one that recursively walks the file system roots to dump out a set of all the file pathnames on a system, looks like this:

 
 import java.io.*;
  
  public class roots {
    public static void visit(File f) {
      System.out.println(f);
    }
  
    public static void walk(File f) {
      visit(f);
      if (f.isDirectory()) {
        String list[] = f.list();
        for (int i = 0; i < list.length; i++)
          walk(new File(f, list[i]));
      }
    }
  
    public static void main(String args[]) {
      File list[] = File.listRoots();
      for (int i = 0; i < list.length; i++) {
        if (list[i].exists())
          walk(list[i]);
        else
          System.err.println("not accessible: "
              + list[i]);
      }
    }
  }

This example uses File methods, such as isDirectory and exists, to navigate through the directory structure. Each file is queried exactly once as to its type (plain file or directory).

Monday, September 26, 2011

Getting the IP Address and Hostname of the Local Machine

This code will list all interfaces and attached inet addresses on local machine, And fetch you teh actual ip address of the host.


        final String hostname = InetAddress.getLocalHost().getHostName();
       String ip = InetAddress.getLocalHost().getHostAddress();
       final Enumeration<NetworkInterface> ifaces = NetworkInterface.getNetworkInterfaces();
       for (NetworkInterface iface : Collections.list(ifaces))
       {
           /*final Enumeration<NetworkInterface> virtualIfaces = iface.getSubInterfaces();
           for (NetworkInterface viface : Collections.list(virtualIfaces))
           {
               System.out.println(iface.getDisplayName() + " VIRT " + viface.getDisplayName());
                final Enumeration<InetAddress> vaddrs = viface.getInetAddresses();
               for (InetAddress vaddr : Collections.list(vaddrs))
               {
                   System.out.println("\t" + vaddr.toString());
               }
           }*/
           final Enumeration<InetAddress> raddrs = iface.getInetAddresses();
           for (InetAddress addr : Collections.list(raddrs))
           {
               if (!addr.isLoopbackAddress() && addr.isSiteLocalAddress())
               {
                   ip = addr.getHostAddress();
               }
           }
       }

Sunday, September 25, 2011

Generic Methods

We can create a single Generic method which can be called by passing arguments of different types. Depending on the argument type provided to the compiler , each method call is handled by compiler appropriately.
The following thing must be keep in mind when defining Generic Method :

The Generic method has a type parameter section which is enclosed by angel (< >) brackets which should be placed before return type of method. For example :

static <T> void printType(T anytype)

Note :

Type parameter section can have one or more type variable separated by commas(,)
The passed argument(known as actual type arguments) replaces the type parameters which act as return type for the method.In other words, we can say that it(type parameters) acts as placeholder for passed argument to the method.
The Generic method declaration is similar to any other method declaration.

EXAMPLE
In the below example, <T> acts as placeholder for the passed arguments to the Generic method "printClassnName" . This method prints two things : the passed argument package name & name of the runtime class of the passed argument..

package simpleCoreJava;
public class GenericMethodExample {
static <T> void printClassnName(T anyType) {
System.out.println(anyType.getClass().getPackage());
System.out.println(anyType.getClass().getName());
}
public static void main(String[] args) {
GenericMethodExample.printClassnName(String.class);
GenericMethodExample.printClassnName(new String(""));
GenericMethodExample.printClassnName(Integer.class);
GenericMethodExample.printClassnName(new Integer(3));
GenericMethodExample.printClassnName(GenericMethodExample.class);
}
}

OUTPUT :

package java.lang, Java Platform API Specification, version 1.6
java.lang.Class
package java.lang, Java Platform API Specification, version 1.6
java.lang.String
package java.lang, Java Platform API Specification, version 1.6
java.lang.Class
package java.lang, Java Platform API Specification, version 1.6
java.lang.Integer
package java.lang, Java Platform API Specification, version 1.6
java.lang.Class

Wildcards in Generics

Problem :

For better understanding of wildcards, taking a scenario :
Take a look at the following code :

List<String> strArrayList = new ArrayList<String>(); 
strArrayList.add(new String("Genrics"));
strArrayList.add(new String("Genrics Wildcards"));

The above code declares a array list of type String and later adding String objects.
Now we declaring another aray list of type String and passing above array list to this :

List<String> anotherStrArrayList  = strArrayList;

This is perfectly alright to pass String array list to another array list of type String.
But the below code will give compiler error :

List<Object> obj = strArrayList;

The above code will give error because the list of Object type is assigning a list of String type. Though the String is a concrete sub-class of Object, this is assignment is not possible in Generics.

Solution :

Now, the solution of above problem is :

List<?> obj = strArrayList;

The above line is perfectly alright and runs successfully without any error.
The '?' symbol or character is known as wildcard character and it can accept any java type. Whether it is of type java.lang.Object or String or Integer or any Java type. It acts as placeholder which can be assigned with any Java type.

Other Examples

The below code runs without any error :

List<?> anyObj = null; 
List<Integer> intObj = new ArrayList<Integer>();
anyObj = intObj;
List<Double> doublesObj = new ArrayList<Double>();
anyObj = doublesObj;

In the above code the anyObj has wildcard(?) character in place of type parameter due to which it is possible to assign it any type of list.

Make your apps fly with Flyweight pattern!

Justified or not, Java is sometimes perceived as slow. Compared with other languages—such as C++—that are compiled into machine code, Java is interpreted at runtime; and all other things being equal, compiled machine code is faster than interpreted byte code. Of course, you might argue that Java can hold its own, mainly due to advanced technologies such as just-in-time (JIT) compilers and Java HotSpot, and you'd be mostly correct. Today's JVMs are modern marvels that wring amazing performance out of Java byte code.

Regardless of modern JVM efficiency, you should always look for programming techniques that boost performance. In this installment of Java Design Patterns, I discuss one of those techniques that, not surprisingly, takes the form of a design pattern: the Flyweight pattern. Let's see how it works.

The Flyweight pattern

Object-oriented design is sometimes at odds with performance. From a design standpoint, it's best to encapsulate everything in an object so you can treat those objects uniformly, reuse them, and extend them. Unfortunately, modeling everything as an object can be expensive performance-wise. A case-in-point is Smalltalk, which literally models everything as an object, as opposed to Java, which provides a mix of objects and intrinsic types. Today, Java is much more prevalent than Smalltalk, due in no small part to Java's better performance.

Note: Java also provides static type checking, which Smalltalk lacks. Although static type checking is widely regarded as beneficial, most Smalltalk developers argue against it.

Even though Java provides a mix of objects and intrinsic types, Java applications can create a huge number of objects. For example, trees can have an unlimited number of rows, so if you model each row as a distinct Swing component, you're asking for trouble. For fine-grained objects such as tree nodes, you can implement the Flyweight pattern to create a small pool of shared objects, which significantly reduces the number of objects created. Soon enough, you'll learn how to implement flyweights and share them; in the meantime, it's sufficient to understand that flyweights are shared objects and that using them can result in substantial performance gains.

In Design Patterns, the Gang of Four (GOF) authors describe the Flyweight pattern like this:

Use sharing to support large numbers of fine-grained objects efficiently.

Figure 1 shows a class diagram for the Flyweight design pattern.

Figure 1. Flyweight class diagram

Flyweights are typically instantiated by a flyweight factory that creates a limited number of flyweights and doles them out, one at a time, to clients. Those flyweights are instantiated according to some criteria. For example, you might have a pool of line objects that know how to draw lines. In that case, the flyweight factory could create one line object for each line color, such as one object for white lines and another for blue lines. Those lines, which are flyweights, get reused whenever you draw white or blue lines. If you have a drawing with 1,000 white lines and 6,000 blue lines, only two lines—instead of 7,000—are actually instantiated.

Figure 2's sequence diagram depicts runtime interaction between clients and a flyweight factory.

Figure 2. Flyweight sequence diagram

Clients don't directly instantiate flyweights; instead they get them from a factory. The factory first checks to see if it has a flyweight that fits specific criteria (e.g., a blue or white line); if so, the factory returns a reference to the flyweight. If the factory can't locate a flyweight for the specified criteria, it instantiates one, adds it to the pool, and returns it to the client.

You might be wondering how you can reuse flyweights. For example, how can one line flyweight draw all lines of a particular color when those lines presumably have different locations and lengths? You accomplish that reuse by splitting a flyweight's state in two: intrinsicstate (e.g., line color) and extrinsic state (e.g., line location and length). The flyweight maintains intrinsic state, while extrinsic state must be passed to the flyweight at runtime. By externalizing extrinsic state, you can easily reuse flyweights for different variations of that extrinsic state.

Flyweights are abundant in Java, from simple strings to Swing tree nodes and component borders. The rest of this article discusses those flyweights and concludes with an example of how you can implement your own flyweights.

Java strings

Because strings are usually plentiful in most applications, Java goes to great lengths to make sure strings perform well. First, strings are immutable, so they don't have to be thread-safe. Because synchronization is expensive, strings are spared that drag on performance. Second, as you might guess, strings specified at compile-time are flyweights—strings that contain the same character sequence are shared. That sharing can greatly reduce memory footprints, and therefore, increase performance.

Example 1's application demonstrates Java strings' flyweight nature.

Example 1. Flyweight strings

public class StringTest {
   public static void main(String[] args) {
      String fly = "fly", weight = "weight";
      String fly2 = "fly", weight2 = "weight"; 
      System.out.println(fly == fly2);       // fly and fly2 refer to the same String instance
      System.out.println(weight == weight2); // weight and weight2 also refer to
                                             // the same String instance
      String distinctString = fly + weight;
      System.out.println(distinctString == "flyweight"); // flyweight and "flyweight" do not
                                                         // refer to the same instance
      String flyweight = (fly + weight).intern();
      System.out.println(flyweight == "flyweight"); // The intern() method returns a flyweight
   }
}

Here's the application's output: true true false true.

In the preceding application, the strings fly and fly2 refer to the same string object (recall that the == operator evaluates totrue if the objects it compares are the same). Ditto for weight and weight2.

Strings computed at runtime, such as distinctString in the preceding application, are not flyweights by default; however, you can force the issue with the String.intern() method—a poorly named method that brings to mind medical topics, but nonetheless returns flyweights for strings computed at runtime.

Because the Java compiler implements the Flyweight pattern for strings, I can't show you how it's done. So, let's move on to another flyweight implemented in the Java APIs.

Swing borders

Swing comes with an exhaustive set of borders for its components, including line borders, titled borders, bevel borders, etc. Figure 3 shows an application that fits two panels with a bevel border.

Figure 3. Swing borders

You can implement component borders in numerous ways. The GOF book, for example, shows you how to implement them with the Decorator pattern (see page 175), but in Swing, they're implemented as flyweights for maximum performance. Additionally, Swing lets you instantiate a single border per component if you wish, but it also provides a border factory that creates flyweight borders so a single border can be shared among multiple components. Example 2, the listing for Figure 3's application, illustrates the use of that factory.

Example 2. Border flyweights

import java.awt.*;
import java.awt.event.*;
import javax.swing.*;
import javax.swing.border.*;
public class Test extends JFrame {
   public static void main(String[] args) {
      Test test = new Test();
      test.setBounds(20,20,250,150);
      test.show();
   }
   public Test() {
      super("Border flyweights");
      JPanel panel   = new JPanel(), panel2 = new JPanel();
      Border border  = BorderFactory.createRaisedBevelBorder();
      Border border2 = BorderFactory.createRaisedBevelBorder();
      panel.setBorder(border);
      panel.setPreferredSize(new Dimension(100,100));
      panel2.setBorder(border2);
      panel2.setPreferredSize(new Dimension(100,100));
      Container contentPane = getContentPane();
      contentPane.setLayout(new FlowLayout());
      contentPane.add(panel);
      contentPane.add(panel2);
      setDefaultCloseOperation(WindowConstants.DISPOSE_ON_CLOSE);
      addWindowListener(new WindowAdapter() {
         public void windowClosed(WindowEvent e) {
            System.exit(0);
         }
      });
      if(border == border2)
         System.out.println("bevel borders are shared");
      else
         System.out.println("bevel borders are NOT shared");
   }
}

The preceding application's output is: bevel borders are shared, which means the two panels share a single border instance. Notice that BorderFactory.createRaisedBevelBorder() is somewhat of a misnomer because it doesn't actually create anything if a bevel border already exists in the factory.

Borders have a paintBorder() method that the border's component invokes when the component is painted. Here is that method's signature:

public void paintBorder(Component c, 
                        Graphics g, 
                        int x, 
                        int y, 
                        int width, 
                        int height)

The arguments passed to paintBorder() represent a border's extrinsic state, namely, a component, a graphics object to draw in, and the border's bounds. Because a border can be a flyweight, and therefore shared by multiple components, the border does not maintain that extrinsic state.

Swing tree nodes

Swing tree nodes are excellent candidates for the Flyweight pattern. Consider Figure 4, which shows a Swing tree that acts as a file explorer.

Figure 4. Swing trees

As a testament to Swing's design, a file explorer like the one in Figure 4 is straightforward to implement. In fact, it's a mere 144 lines of code, blank lines and all. As a testament to the Flyweight pattern, that file explorer is fast. If you wish, you can download this article's source code and take the file explorer for a spin. If you do, explore some directories with a lot of files so you have a tree with many nodes like Figure 4 (note the scrollbar thumb's size). Then grab the scrollbar thumb and move it up and down as fast as you can. You'll see that performance is excellent.

Swing trees are blazingly fast because they use a single componentfor all nodes in the tree. That component is created by a tree cell renderer with the overly verboseTreeCellRenderer.getTreeCellRendererComponent() method, whose signature is listed below:

public Component getTreeCellRendererComponent(JTree tree,
                                              Object value,
                                              boolean selected,
                                              boolean expanded,
                                              boolean leaf,
                                              int row,
                                              boolean hasFocus)

By now, the sheer number of arguments togetTreeCellRendererComponent() should indicate to you that the component returned from the method is a flyweight. Like Swing borders, tree cell renderers are passed extrinsic state so the rendering component can be shared. ThegetTreeCellRendererComponent() method sets the component's properties depending upon the extrinsic state. For instance, for folders, getTreeCellRendererComponent() fits the component with a folder icon; if not, it uses a document icon.

As an aside, Swing tables render table cells in the same manner as Swing trees. One component represents all the cells for a single table, so Swing tables also have excellent performance.

So far I've detailed three Flyweight pattern examples in Java, but for you to understand the Flyweight pattern completely, let's implement a flyweight from scratch.

Roll your own!

The rest of this article implements a line flyweight similar to the one discussed at the beginning of this article. We'll explore that implementation in three steps. First, I implement an application that draws 10,000 lines without the aid of a line object, by simply painting repeatedly into a graphics context. Second, I implement a naive Lineclass that cannot be shared. Third, I turn that naive implementation into a flyweight. Figure 5 shows the line-drawing application.

Figure 5. Drawing lines with Swing

Drawing without line objects

Example 3 lists Figure 5's application.

Example 3. Drawing lines without line objects

import java.awt.*;
import java.awt.event.*;
import javax.swing.*;
import javax.swing.event.*;
public class Test extends JFrame{
   private static final Color colors[] = { Color.red, Color.blue,
                                           Color.yellow, Color.orange,
                                           Color.black,  Color.white };
   private static final int WINDOW_WIDTH=400,
                            WINDOW_HEIGHT=400,
                           NUMBER_OF_LINES=10000;
   public Test() {
      Container contentPane = getContentPane();
      contentPane.setLayout(new BorderLayout());
      
      JButton button = new JButton("draw lines");
      final JPanel  panel  = new JPanel();
      contentPane.add(panel,  BorderLayout.CENTER);
      contentPane.add(button, BorderLayout.SOUTH);
      setBounds(20,20,WINDOW_WIDTH,WINDOW_HEIGHT);      
      button.addActionListener(new ActionListener() {
         public void actionPerformed(ActionEvent event) {
            Graphics g = panel.getGraphics();
            for(int i=0; i < NUMBER_OF_LINES; ++i) {
               g.setColor(getRandomColor());
               g.drawLine(getRandomX(), getRandomY(), 
                          getRandomX(), getRandomY());
            }
         }
      });
   }
   public static void main(String[] args) {
      Test test = new Test();
      test.show();
   }
   private int getRandomX() {
      return (int)(Math.random()*WINDOW_WIDTH);
   }
   private int getRandomY() {
      return (int)(Math.random()*WINDOW_HEIGHT);
   }
   private Color getRandomColor() {
      return colors[(int)(Math.random()*colors.length)];
   }
}

The preceding application creates a window with two panels. The application draws lines with random colors, locations, and lengths into the top panel when you click on the button in the bottom panel. The graphic context's drawLine() method draws those lines.

Now let's implement a Line class and use line objects to draw lines instead of using the graphics context directly.

A naive Line implementation

Perhaps you think using a graphics context to draw lines is not very object-oriented, so you want to implement a Line class to encapsulate that functionality. Let's do that. Example 4 lists the revised application.

Example 4. Drawing lines with a heavyweight line object

import java.awt.*;
import java.awt.event.*;
import javax.swing.*;
import javax.swing.event.*;
public class Test extends JFrame{
   public Test() {
      ...
      button.addActionListener(new ActionListener() {
         public void actionPerformed(ActionEvent event) {
            Graphics g = panel.getGraphics();
            for(int i=0; i < NUMBER_OF_LINES; ++i) {
               Color color = getRandomColor();
               System.out.println("Creating " + color + " line");
               Line line = new Line(color,
                                    getRandomX(), getRandomY(), 
                                    getRandomX(), getRandomY());
               line.draw(g);
            }
         }
      });
   }
   ...
}

I only included pertinent code; the rest of the application is identical to the one listed in Example 3. The preceding application creates 10,000 line objects and tells each one to draw itself. Example 5 lists the Line class.

Example 5. A heavyweight Line class

import java.awt.*;
public class Line {
   private Color color = Color.black;
   private int x, y, x2, y2;
   
   public Line(Color color, int x, int y, int x2, int y2) {
      this.color = color;
      this.x = x;   this.y = y;
      this.x2 = x2; this.y2 = y2;
   }
   public void draw(Graphics g) {
      g.setColor(color);
      g.drawLine(x, y, x2, y2);
   }
}

The preceding Line class maintains its color and endpoints, which it uses to draw a line.

A flyweight Line implementation

Obviously, we don't want to create 10,000 lines as we did in the preceding application, so let's reduce that number to six, one line for each line color. Example 6 lists the revised application.

Example 6. Drawing lines with a flyweight line object

import java.awt.*;
import java.awt.event.*;
import javax.swing.*;
import javax.swing.event.*;
public class Test extends JFrame {
   ...
   public Test() {
      ...
      button.addActionListener(new ActionListener() {
         public void actionPerformed(ActionEvent event) {
            Graphics g = panel.getGraphics();
            for(int i=0; i < NUMBER_OF_LINES; ++i) {
               Line line = LineFactory.getLine(getRandomColor());
               line.draw(g, getRandomX(), getRandomY(), 
                            getRandomX(), getRandomY());
            }
         }
      });
   }
   ...
}

That's more like it. Now line objects are obtained from a line factory, which returns one of six shared line instances. Example 7 lists that factory.

Example 7. The line factory

import java.util.HashMap;
import java.awt.Color;
public class LineFactory {
   private static final HashMap linesByColor = new HashMap();
   public static Line getLine(Color color) {
      Line line = (Line)linesByColor.get(color);
      if(line == null) {
         line = new Line(color);
         linesByColor.put(color, line);
         System.out.println("Creating " + color + " line");
      }
      return line;
   }
}

The line factory maintains a HashMap of lines, keyed by color. When you ask the factory for a line by calling getLine(), the factory first checks to see if a line with that color exists in the HashMap; if so, it returns it. Otherwise, it creates a line, stores it in the HashMap, and returns it. Either way, the caller winds up with a shared Line instance.

Example 8 lists the revised Line class.

Example 8. A flyweight Line implementation

import java.awt.*;
public class Line {
   private Color color;
   public Line(Color color) {
      this.color = color;
   }
   public void draw(Graphics g, int x, int y, int x2, int y2) {
      g.setColor(color);
      g.drawLine(x, y, x2, y2);
   }
}

Notice the preceding Line class is simpler than Example 5's Line class. Why? Because the preceding Line class has been purged of extrinsic state, namely the line's endpoints. Because the preceding Line class does not maintain that extrinsic state, and because it's passed to the draw() method from the client, those line instances can be shared.

Your apps can soar

Historically, allocating a huge number of (typically small) objects can be detrimental to your Java application's performance, although modern JVMs have greatly reduced the penalty you must pay for such excess. If you find that your application is instantiating numerous objects of a particular type, you might consider using the Flyweight pattern to share a limited number of those objects.