28 August 2005

Static Initializers

An extremely powerful feature

Here is a very simple class:

public class Sample
{
    public static int x = 6;
}


This is straightforward enough - any code that reads Sample.x will get the correct value. The question is, where is the value of x actually set?

The answer is that the Java compiler generates code to initialize the static data, and this code is included in the resulting class file. The JVM executes this code as soon as the class has been loaded into memory. In the case of our Sample class, the first reference to the class causes the JVM to locate the class file, load it into memory, and then run the initialization code. All of the initialization code for a given class is collected into a single "method", called the static initializer.

Static initializers do not have names, arguments, or access modifiers and do not return any values - in other words, you can't call static initializers as if they were methods. A class's static initializer is run once and once only during the lifetime of an application, at the point where the class is loaded.

If initializing data fields were all that static initializers did, they wouldn't be particularly interesting. What makes them far more powerful is that you can write your own code to be added to the static initializers of your own classes. Here is an example:

import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Properties;

public class Sample2
{
    public final static Properties props;

    static
    {
        props = new Properties();
        try
        {
            InputStream is = new FileInputStream("my.properties");
            props.load(is);
            is.close();
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}


The first thing to note is the static keyword followed by a block of code - this is how we tell the compiler that the code is to be added to the static initializer. In this case, loading the class causes the Properties object to be populated from a file.

Some other things to know about static initializers: first, you can include as many static{...} sections as you like in a class source; each section of compiled code is added to the static initializer in the order the static blocks appear in the source. Second, static initializer code cannot throw checked exceptions (hence the try/catch block in the example). Third, note that any final static data must be initialized; in the example the final props field isn't initialized at the point where it's declared, which would normally be an error, but because it's assigned in our static initializer the compiler doesn't complain.
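To see the ordering rule in action, here is a minimal sketch (the class name and messages are my own inventions):

public class InitOrder
{
    static
    {
        System.out.println("first block");
    }

    static
    {
        System.out.println("second block");
    }

    public static void main(String[] args)
    {
        // By the time main() runs, both messages have already
        // been printed, in source order.
    }
}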

You can put pretty much any code you like into static initializers - this simple fact makes them surprisingly powerful.

For example, if you've ever used JDBC you may have wondered how it manages to select the correct driver for a given database URL when more than one driver is loaded. The trick is that every JDBC driver's static initializer creates an instance of the driver and passes it to a static method in the DriverManager class, which adds the driver to an internal list. When your application passes a URL string to DriverManager.getConnection(), that method iterates through the list, offering the URL to each driver in turn until one of them recognizes the format and opens and returns a database connection.
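In outline the trick looks roughly like this - a sketch only, not the code of any real driver. MyDriver and the "jdbc:mydriver:" prefix are made up, and later JDBC versions add further methods to the Driver interface:

import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.util.Properties;

public class MyDriver implements Driver
{
    static
    {
        try
        {
            // Register an instance with the DriverManager as soon
            // as this class is loaded.
            DriverManager.registerDriver(new MyDriver());
        }
        catch (SQLException e)
        {
            e.printStackTrace();
        }
    }

    public boolean acceptsURL(String url)
    {
        return url != null && url.startsWith("jdbc:mydriver:");
    }

    public Connection connect(String url, Properties info)
        throws SQLException
    {
        if (!acceptsURL(url))
        {
            // Returning null tells DriverManager to try the next driver
            return null;
        }
        throw new SQLException("demonstration driver only");
    }

    // Minimal remaining Driver methods, just enough to compile:
    public DriverPropertyInfo[] getPropertyInfo(String url, Properties info)
    {
        return new DriverPropertyInfo[0];
    }

    public int getMajorVersion()
    {
        return 1;
    }

    public int getMinorVersion()
    {
        return 0;
    }

    public boolean jdbcCompliant()
    {
        return false;
    }
}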

Another example is Log4J, whose Logger class has a static initializer that by default scans the directories in the classpath looking for a file named "log4j.xml" or "log4j.properties". If it finds such a file it reads the logging configuration from it. This means that you don't need to write a single line of configuration code - the configuration can be completely defined in a file, and this will be loaded automatically the first time your application references the Logger. Velocity - a templating package from the Apache Jakarta project - uses a similar technique to load its default configuration settings.

I leave it to you, adventurous reader, to think of more applications for static initializers.

Considering that static initializers are so powerful, and that it would be impossible to understand some commonly-used packages without an understanding of them, it is strange and surprising that a number of (otherwise excellent) books about Java programming make no mention of them whatsoever. My recommendation, when choosing a book, is to check the index for static initializers; if there's no reference, move on to the next book.


21 August 2005

How to set up a simple LRU cache using LinkedHashMap

Caches

Caches are a simple way to improve the performance of an application that reads data from a "slow" source such as files on disk or rows of data from a database table, but may need to re-read the same data multiple times. The idea is simple: instead of discarding the data after using it, keep it in memory so that it doesn't have to be re-read later.

For example, a simple way to organize this for files might be to create a HashMap that maps the file names to objects containing the file data. When your application needs a particular file it first checks the map to see if it already has the data; if it doesn't, it reads the file and places it in the map in case it needs it again later.
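In code, the naive version might look something like this (a sketch; NaiveFileCache and loadFromDisk() are names of my own invention, standing in for real file-reading code):

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class NaiveFileCache
{
    private final Map cache = new HashMap();

    public String readFile(String name) throws IOException
    {
        String data = (String) cache.get(name);
        if (data == null)
        {
            data = loadFromDisk(name); // the slow read
            cache.put(name, data);
        }
        return data;
    }

    private String loadFromDisk(String name) throws IOException
    {
        // Real file-reading code would go here.
        return "";
    }
}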

LRU caches

The problem with this simple method is that your application could use a vast amount of memory. Once read in, a file is in the cache for the lifetime of the application whether or not it is ever used again. For an application such as a server that is intended to stay up and running for weeks or months at a time, this is probably not acceptable.

What's needed is a cache that automatically discards entries that haven't been accessed for some time, so as to limit the amount of space being used. Such a cache is called an LRU cache. LRU stands for "Least Recently Used", which refers to the policy of removing the oldest, or least-recently used, entries to make space for new data.

LRU caches have a maximum number of data items that they will hold and these items are usually arranged in a list. When an item is added to the cache, and every time it is accessed after that, it is automatically moved to the head of the list. If the cache is full and a slot is required for a new item, the cache makes room by discarding the entry at the tail of the list - the least-recently used item.

The java.util.LinkedHashMap class

LinkedHashMap is a subclass of java.util.HashMap that adds a couple of useful features. One is that, by default, the iteration order reflects the order in which entries were added to the map, rather than the effectively arbitrary ordering of a plain HashMap.

The other feature - the one we're interested in here - is that LinkedHashMap has an option to use access-order instead of insertion order, and includes a way to remove the least-recently accessed entries automatically. This makes it well suited for creating LRU caches.

Creating a cache class

The simplest way to create a cache using LinkedHashMap is to extend it. Here is an example:

import java.util.LinkedHashMap;
import java.util.Map;

public class Cache extends LinkedHashMap
{
    private final int capacity;

    public Cache(int capacity)
    {
        super(capacity + 1, 1.1f, true);
        this.capacity = capacity;
    }

    protected boolean removeEldestEntry(Map.Entry eldest)
    {
        return size() > capacity;
    }
}


Note that the constructor takes as an argument the maximum number of entries we want a Cache object to hold. The superclass constructor has three arguments: the initial capacity of the map, the load factor, and a boolean argument that tells the LinkedHashMap constructor to keep entries in access order instead of the default insertion order. (See the java.util.HashMap API documentation for a description of the initial capacity and load factor.) In this case we set the initial capacity to be one more than our required cache size - this is because new entries are added before any are removed, so for example if we want to hold 100 entries the cache will actually contain 101 for an instant when new data is added. Setting the load factor to 1.1 ensures that the rehashing mechanism of the underlying HashMap class isn't triggered - this isn't a vital point but helps a little with efficiency at run-time.

The removeEldestEntry() method overrides a default implementation in LinkedHashMap and is where we determine the policy for removing the oldest entry. In this case, we return true when the cache has more entries than our defined capacity.

Using the cache

Using the cache is simple - just use a suitable key to access the cache; if the data is in the cache we can read it from there. If it's not there we pull it from the slow medium and add it to the cache so that it's in place if needed later:

Cache cache = new Cache(100);

// ...

String filename = "file.txt";
String filedata = (String) cache.get(filename);
if (filedata == null)
{
    // Read filedata from the file system here...
    cache.put(filename, filedata);
}


How does it perform?

Our basic implementation should work just fine but we may need to know how effective it is in a given application. One measure of how well a cache is performing is Hit Rate, which tells us how many cache accesses are "hits" - that is, how many times the required data was found in the cache for a given number of accesses. Hit rate is usually expressed as a percentage - a hit rate above 80% is usually pretty good.

We can add a few things to our Cache class to make it possible to monitor performance so that we can tune the cache by setting an optimum size. We need a counter for the number of accesses and another for the number of hits. We will need getter methods to allow us to retrieve those values after the cache has been running for a time, and finally we must override the get() method to update the counts. Here is our Cache class complete with these new members:

import java.util.LinkedHashMap;
import java.util.Map;

public class Cache extends LinkedHashMap
{
    private final int capacity;
    private long accessCount = 0;
    private long hitCount = 0;

    public Cache(int capacity)
    {
        super(capacity + 1, 1.1f, true);
        this.capacity = capacity;
    }

    public Object get(Object key)
    {
        accessCount++;
        if (containsKey(key))
        {
            hitCount++;
        }
        return super.get(key);
    }

    protected boolean removeEldestEntry(Map.Entry eldest)
    {
        return size() > capacity;
    }

    public long getAccessCount()
    {
        return accessCount;
    }

    public long getHitCount()
    {
        return hitCount;
    }
}
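From those two counters the hit rate is easy to derive. Here is one possible convenience method you might add to the class (my own addition, not part of the original):

public double getHitRate()
{
    // Hit rate as a percentage; zero accesses means a rate of zero.
    return accessCount == 0
            ? 0.0
            : (100.0 * hitCount) / accessCount;
}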


One point to note is that calls to the containsKey() method don't update the access counts; we may want to override that method too (a possible override is sketched after the example), so that the hit rate isn't skewed by code such as this:

if (cache.containsKey(filename))
{
    filedata = (String) cache.get(filename);
}
else
{
    // Read filedata from the file system here...
    cache.put(filename, filedata);
}
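Here is one possible override - a sketch only. Note that if you add this, get() should call super.containsKey() internally, otherwise a single get() will bump both counters twice (once here and once in get() itself):

public boolean containsKey(Object key)
{
    accessCount++;
    if (super.containsKey(key))
    {
        hitCount++;
        return true;
    }
    return false;
}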


11 August 2005

Neat Trick #1

How to log a message identifying the location in code

Consider this code:

public class Debug
{
    public static void debug(String message)
    {
        StackTraceElement ste =
                new Throwable().getStackTrace()[1]; // <==
        System.out.println("[" + ste.getFileName()
                + ":" + ste.getLineNumber()
                + "] " + message);
    }
}


Now we can call this from other code:

public class LoggerDemo
{
    public static void main(String[] args)
    {
        Debug.debug("Test Message");
    }
}


The output will look something like this - note that we get the source file name and line number where the call was made:

[LoggerDemo.java:16] Test Message


This can be an extremely useful trick. The secret to how this works is all in the line marked with the "<==" marker comment.

Whenever a Throwable object is created (or an Exception, which is a subclass of Throwable), the JVM creates a stack trace in the form of an array of StackTraceElement objects. (Note that you can create a Throwable object using new just like any other class, and just because it's Throwable doesn't mean that it must be thrown.) The getStackTrace() method returns the stack trace array.

The elements of the array are ordered starting at the point in the code where the Throwable was created, so element 0 would contain details about the line containing the new Throwable() call. Element 1 is the one we want because it tells us about where the debug() method was called from.
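A quick way to see this ordering for yourself is to print the whole array (just an experiment, not part of the Debug class):

for (StackTraceElement e : new Throwable().getStackTrace())
{
    System.out.println(e);
}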

As you can see, the StackTraceElement gives us the source file name and line number. It also contains the class and method name, so our debug() method could report those too if that were useful.
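For example, the println() call might become something like this (a sketch of one variation):

System.out.println("[" + ste.getClassName()
        + "." + ste.getMethodName()
        + " " + ste.getFileName()
        + ":" + ste.getLineNumber()
        + "] " + message);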


05 August 2005

Character development

Representing characters for global software

When we used to write in C, we didn't often concern ourselves with the character set we were working with. In general we could write a string constant in the usual way - "Hello, world" - and the compiler would convert it to an array of encoded bytes using the character set of the underlying platform. Because most computers in the western world use ASCII or a compatible derivative, this works just fine most of the time for those of us working in ASCII-speaking regions such as Britain and the US.

Things start to go wrong when you start to get a little more adventurous. For example, to check that a character is an upper-case alphabetic you might do something like this:

if ((c >= 'A') && (c <= 'Z'))
{
    ...
}


This will work fine on a PC and many other systems, but what if you try to run this on an IBM AS/400 system? That system didn't use ASCII - it used a different standard called EBCDIC instead (and may still do, if there are any of these machines still around today). On such a machine some characters, such as '}' and 'ü', will pass the test in that example code as if they were upper-case alphabetic characters.

Ok, that's a contrived example. But what about writing code that will work in German, with umlauts over some of the vowels? Or Danish, which has 29 characters in the alphabet? Or in Korean or Arabic, which don't even use the same alphabets?

Java handles this whole problem by working internally with Unicode no matter what the underlying platform uses. Unicode is a character encoding that's designed to allow any characters in use anywhere in the world to be represented. The problem from a developer perspective is how to get those Unicode characters in and out of your applications, because most systems as yet don't use Unicode as a native format for storing files or displaying text on the screen.
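You can see the benefit even in simple character tests: the Character class classifies characters by their Unicode properties, so the equivalent of the earlier C range check works for any alphabet. A small illustration:

char c = '\u00DC'; // 'Ü', a non-ASCII upper-case letter

if (Character.isUpperCase(c))
{
    // True for 'Ü' (and for upper-case letters throughout the
    // Unicode repertoire); false for '}' on every platform.
}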

Mostly you don't have to worry about getting characters in and out: the InputStreamReader and OutputStreamWriter classes, for example, convert characters between Unicode and the platform's default character set without you needing to take explicit action.

But in today's networked world, many applications aren't designed to handle just one system - you may be developing client/server applications. What if your server is running on a system where the platform character set is Latin-9, and the client is on a Latin-1 system? The Unicode in the client application will be converted to Latin-1 and sent to the server, but the server will interpret the bytes as Latin-9. Since the mappings between Latin-1 and Latin-9 don't quite line up, some characters from the client will be wrong by the time they get into the server as Unicode.

As it happens this is quite easy to fix. InputStreamReader and OutputStreamWriter both have constructors that will let you specify the encoding you want to use. In this case you could set the encoding on both sides to be Latin-9 explicitly by putting something like this in the client and server sides:

String charset = "ISO-8859-15"; // a.k.a. Latin-9
BufferedReader reader = new BufferedReader(
        new InputStreamReader(socket.getInputStream(), charset));
BufferedWriter writer = new BufferedWriter(
        new OutputStreamWriter(socket.getOutputStream(), charset));


Now it doesn't matter what the default encoding of the client and server platforms are; your applications will communicate using Latin-9 regardless.

Should you use Latin-9? Probably not. For one thing, the Java specification defines a minimal set of encodings that all Java implementations must support, and Latin-9 isn't one of these. A better choice is probably UTF-8 - this is mandated in the specification so all Java runtimes must support it, and better still it allows all Unicode characters to be encoded and transferred so your application would be able to handle Russian, Greek and even Arabic and Japanese characters.
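Switching the earlier snippet over is then just a one-line change:

String charset = "UTF-8"; // mandated for every Java runtime
// reader and writer are then created exactly as before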


03 August 2005

Waiting as fast as I can

How to avoid an occasional problem with Thread.sleep()

Every so often you find that you need a background thread that runs at intervals to perform some background task. Of course the simplest way to do this is to use Thread.sleep():

while (true)
{
    // Perform essential task here...
    ...

    // Now wait for 10 seconds before continuing
    try
    {
        Thread.sleep(10000);
    }
    catch (InterruptedException ie)
    {
    }
}


This will work fine - it will suspend the thread for ten seconds each time round the loop. However, if it takes two seconds to perform the "essential task", then the task will actually be performed once every twelve seconds instead of ten. What if it's important to run every ten seconds regardless of how long it takes to run the task? Well, that's easy enough to fix:

while (true)
{
    // Perform essential task here...
    ...

    // Now wait for next 10-second interval
    try
    {
        // This calculates how long to wait
        long interval = 10000 -
                (System.currentTimeMillis() % 10000);
        Thread.sleep(interval);
    }
    catch (InterruptedException ie)
    {
    }
}


This calculates how many milliseconds have elapsed since the clock last showed a time that was a multiple of ten seconds ("System.currentTimeMillis() % 10000"), then subtracts that from 10000, giving the number of milliseconds until the next 10-second multiple. For example, if 3,200ms have passed since the last 10-second tick, the thread sleeps for 6,800ms. The code suspends for that long before returning to the top of the loop and running the core task again. The task therefore runs when the system clock shows 0, 10, 20, 30, 40 and 50 seconds past each minute, regardless of how long the task takes (as long as it takes less than ten seconds, of course).

There is just one slight problem with this code: it doesn't always work. I used code just like this in a server application and found that quite often it would run the core task twice in quick succession every ten seconds, instead of once. It turned out that the thread was coming out of the Thread.sleep() early by as much as fifteen or twenty milliseconds. It would then run the core task in just a couple of milliseconds, so when it recalculated the wait interval there was still ten milliseconds or so before the next clock "tick" - actually the same clock tick that it was waiting for on the previous call. The task would then be re-executed just a few milliseconds later.

I never did figure out the root cause, although it did seem that this problem never showed up on the Windows desktop box I develop on or the small Linux system I use as a test platform. It only happened when I moved the code to the 4-processor Sun server, and that makes me think that maybe this is something to do with concurrent thread synchronization on multiprocessor systems (although I think it's more likely that it's something to do with the clock granularity).

The final fix? I wrote this as a general method that can be used to synchronize on any millisecond interval:

public static void syncSleep(long ms)
    throws InterruptedException
{
    long now = System.currentTimeMillis();
    long interval = ms - (now % ms);
    long expectedEndTime = now + interval;
    do
    {
        Thread.sleep(interval);
        // If sleep() returned early, loop and wait for the remainder
        now = System.currentTimeMillis();
        interval = expectedEndTime - now;
    }
    while (interval > 0);
}


In this, if the Thread.sleep() returns early it doesn't matter - the code detects the fact and waits again. Now we can recode the original thread code like this:

while (true)
{
    // Perform essential task here...
    ...

    // Now wait for next 10-second interval
    try
    {
        syncSleep(10000);
    }
    catch (InterruptedException ie)
    {
    }
}
