Overview
Many frameworks that store objects off-line or in a cache use standard Java Serialization to encode an object as bytes, which can later be turned back into the original object. Java Serialization is generic and can serialise just about any type of object.
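As a minimal sketch of that round trip, the following encodes an object to a byte[] and decodes it again using only java.io. The Point class and the helper names toBytes and fromBytes are hypothetical, chosen for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class SerializationRoundTrip {
    // Hypothetical example type; any Serializable class works the same way.
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Encode an object as bytes using standard Java Serialization.
    static byte[] toBytes(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Turn the bytes back into an equivalent object.
    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = toBytes(new Point(3, 4));
        Point copy = (Point) fromBytes(data);
        System.out.println(data.length + " bytes -> (" + copy.x + ", " + copy.y + ")");
    }
}
```

The encoded form is noticeably larger than the 8 bytes of field data, because the stream also carries a header and a description of the class.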
Why avoid it
The main problem with Java Serialization is performance and efficiency. Java Serialization is much slower than using in-memory stores and tends to significantly expand the size of the object. Java Serialization also creates a lot of garbage.
Access performance
Say you have a collection and you want to update a field of many elements. Something like
for (MutableTypes mt : mts) { mt.setInt(mt.getInt()); }
If you update one million elements repeatedly for about five seconds, how long does each update take?
Collection | Fast PC | Laptop |
---|---|---|
Huge Collection | 5.1 ns | 33 ns |
List<JavaBean> | 6.5 ns | 54 ns |
List<byte[]> with Externalizable | 5,841 ns | 17,508 ns |
List<byte[]> with Serializable | 23,217 ns | 60,947 ns |
If you update ten million elements for five seconds or more:
Collection | Fast PC | Laptop |
---|---|---|
Huge Collection | 5.4 ns | 33 ns |
List<JavaBean> | 6.6 ns | 60 ns |
List<byte[]> with Externalizable | 6,073 ns | 71,691 ns |
List<byte[]> with Serializable | 22,943 ns | failed |
* Fast PC - 3.8 GHz i7 with 24 GB of memory.
* Laptop - 2.3 GHz Core Duo with 4 GB of memory.
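To see why the List<byte[]> rows are so slow, here is a rough sketch of what updating one field involves when every element is stored in serialized form. This is not the article's test code: the Item class is a hypothetical stand-in for MutableTypes, and the loop adds one to the field (rather than writing back the same value) so the effect is visible.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.List;

class SerializedListUpdate {
    // Hypothetical stand-in for MutableTypes, reduced to a single field.
    static class Item implements Serializable {
        private static final long serialVersionUID = 1L;
        private int value;

        Item(int value) { this.value = value; }
        int getInt() { return value; }
        void setInt(int value) { this.value = value; }
    }

    static byte[] serialize(Item item) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(item);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Item deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Item) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Each update decodes the whole element, changes one field, and encodes it again.
    // The streams, the temporary Item, and the new byte[] all become garbage.
    static void incrementAll(List<byte[]> items) {
        for (int i = 0; i < items.size(); i++) {
            Item item = deserialize(items.get(i));
            item.setInt(item.getInt() + 1);
            items.set(i, serialize(item));
        }
    }
}
```

With a List<JavaBean>, the same update is a single field write on an object that is already in memory, which is consistent with the size of the gap in the tables.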
Huge Collection stores information in a column-based layout, so accessing just one field is much more CPU cache efficient than using JavaBeans. If you were to update every field, it would be about 2x or more slower.
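The difference between the two layouts can be sketched as follows. This is only an illustration of the idea using on-heap arrays, not Huge Collection's actual implementation or API; the class and field names are hypothetical.

```java
class ColumnLayoutSketch {
    // Row-based: each element is a separate object holding all of its fields.
    static class Row {
        int intValue;
        long longValue;
        double doubleValue;
    }

    // Column-based: one primitive array per field; element i is spread across the arrays.
    static class Columns {
        final int[] intValues;
        final long[] longValues;
        final double[] doubleValues;

        Columns(int size) {
            intValues = new int[size];
            longValues = new long[size];
            doubleValues = new double[size];
        }
    }

    // Touches one field per object, but each object also carries its other fields
    // and a header, so fewer useful values fit in each cache line.
    static void incrementIntRows(Row[] rows) {
        for (Row row : rows) {
            row.intValue++;
        }
    }

    // Walks a single contiguous int[]; every byte loaded into cache is useful.
    static void incrementIntColumn(Columns columns) {
        for (int i = 0; i < columns.intValues.length; i++) {
            columns.intValues[i]++;
        }
    }
}
```

Updating every field would mean walking each column array in turn, which is why the advantage shrinks when all fields are touched.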
Using an optimised Externalizable is much faster than the default Serializable; however, it is still hundreds of times slower than using a JavaBean (roughly 300x to 900x in the one-million-element table).
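For reference, a hand-written Externalizable looks roughly like the sketch below, shown next to a default Serializable class with the same fields. The class names and fields are hypothetical and are not taken from MutableTypes.java.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class ExternalizableSketch {
    // Default mechanism: field names and types are written into the class description.
    static class DefaultBean implements Serializable {
        private static final long serialVersionUID = 1L;
        int id;
        long timestamp;
    }

    // Hand-written encoding: only the raw field values are written, in a fixed order.
    public static class CompactBean implements Externalizable {
        private static final long serialVersionUID = 1L;
        int id;
        long timestamp;

        // Externalizable requires a public no-argument constructor.
        public CompactBean() { }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(id);
            out.writeLong(timestamp);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            // Must read the fields in the same order they were written.
            id = in.readInt();
            timestamp = in.readLong();
        }
    }

    static byte[] toBytes(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Even in the compact form, each individually serialized object still carries the stream header and the class name, so the per-element overhead does not disappear.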
Memory efficiency
The memory used per object is also important, as it affects how many objects you can store and the performance of accessing those objects.
Collection type | Heap used per million | Direct memory per million | Garbage produced per million |
---|---|---|---|
Huge Collection | 0.09 MB | 34 MB | 80 bytes |
List<JavaBean> | 68 MB | none | 30 bytes |
List<byte[]> using Externalizable | 140 MB | none | 5,941 MB |
List<byte[]> using Serializable | 506 MB | none | 16,746 MB |
To test the amount of garbage produced I set the Eden size target greater than 17 GB so no GC would be performed.
-mx22g -XX:NewSize=20g -XX:-UseTLAB -verbosegc
Conclusion
Having an optimised readExternal/writeExternal can improve the performance and size of a serialised object by 2-4 times; however, if you need to maximise performance and efficiency, you can gain much more by not using serialization at all.
Related Links
Collections Library for millions of elements
Ehcache BigMemory performance: typical latency is around 200 μs for huge caches, indicating that serialization might not be the only bottleneck.
HugeArrayVsSerializationTest.java The test code
MutableTypes.java The data type used for testing different implementations.
Hi, what is the format that Serializable actually creates? Does it create a field listing as well?
@steven, by default the Serializable format includes a field listing (the first time an object of a type is encoded). It also lists the class and its Serializable parents and their fields the first time. However, if each object is serialized individually (in its own byte[]), this is inefficient.