21 August 2005

How to set up a simple LRU cache using LinkedHashMap

Caches

Caches are a simple way to improve the performance of an application that reads data from a "slow" source such as files on disk or rows of data from a database table, but may need to re-read the same data multiple times. The idea is simple: instead of discarding the data after using it, keep it in memory so that it doesn't have to be re-read later.

For example, a simple way to organize this for files might be to create a HashMap that maps the file names to objects containing the file data. When your application needs a particular file it first checks the map to see if it already has the data; if it doesn't, it reads the file and places it in the map in case it needs it again later.

LRU caches

The problem with this simple method is that your application could use a vast amount of memory. Once read in, a file is in the cache for the lifetime of the application whether or not it is ever used again. For an application such as a server that is intended to stay up and running for weeks or months at a time, this is probably not acceptable.

What's needed is a cache that automatically discards entries that haven't been accessed for some time so as to limit the amount of space being used. Such as cache is called an LRU Cache. LRU stands for "Least Recently Used", which refers to the policy of removing the oldest, or least-recently used entries to make space for new data.

LRU caches have a maximum number of data items that they will hold and these items are usually arranged in a list. When an item is added to the cache, and every time it is accessed after that, it is automatically moved to the head of the list. If the cache is full and a slot is required for a new item, the cache makes room by discarding the entry at the tail of the list - the least-recently used item.

The java.util.LinkedHashMap class

LinkedHashMap is a subclass of java.util.HashMap that adds a couple of useful features. One is that by default the iteration order reflects the order that entries are added to the map, rather than the rather haphazard order of a HashMap.

The other feature - the one we're interested in here - is that LinkedHashMap has an option to use access-order instead of insertion order, and includes a way to remove the least-recently accessed entries automatically. This makes it well suited for creating LRU caches.

Creating a cache class

The simplest way to create a cache using LinkedHashMap is to extend it. Here is an example:

public class Cache extends
LinkedHashMap
{
private final int capacity;

public Cache(int capacity)
{
super(capacity + 1, 1.1f, true);
this.capacity = capacity;
}

protected boolean removeEldestEntry(Entry eldest)
{
return size() > capacity;
}
}


Note that the constructor takes as an argument the maximum number of entries we want a Cache object to hold. The superclass constructor has three arguments: the initial capacity of the map, the load factor, and a boolean argument that tells the LinkedHashMap constructor to keep entries in access order instead of the default insertion order. (See the java.util.HashMap API documentation for a description of the initial capacity and load factor.) In this case we set the initial capacity to be one more than our required cache size - this is because new entries are added before any are removed, so for example if we want to hold 100 entries the cache will actually contain 101 for an instant when new data is added. Setting the load factor to 1.1 ensures that the rehashing mechanism of the underlying HashMap class isn't triggered - this isn't a vital point but helps a little with efficiency at run-time.

The removeEldestEntry() method overrides a default implementation in LinkedHashMap and is where we determine the policy for removing the oldest entry. In this case, we return true when the cache has more entries than our defined capacity.

Using the cache

Using the cache is simple - just use a suitable key to access the cache; if the data is in the cache we can read it from there. If it's not there we pull it from the slow medium and add it to the cache so that it's in place if needed later:

    Cache cache = new Cache(100);
// ...

String filename = "file.txt";
String filedata = (String) cache
.get(filename);
if (filedata == null)
{
// Read filedata from file system here...
cache.put(filename, filedata);
}


How does it perform?

Our basic implementation should work just fine but we may need to know how effective it is in a given application. One measure of how well a cache is performing is Hit Rate, which tells us how many cache accesses are "hits" - that is, how many times the required data was found in the cache for a given number of accesses. Hit rate is usually expressed as a percentage - a hit rate above 80% is usually pretty good.

We can add a few things to our Cache class to make it possible to monitor performance so that we can tune the cache by setting an optimum size. We need a counter for the number of accesses and another for the number of hits. We will need getter methods to allow us to retrieve those values after the cache has been running for a time, and finally we must override the get() method to update the counts. Here is our Cache class complete with these new members:

public class Cache extends
LinkedHashMap
{
private final int capacity;
private long accessCount = 0;
private long hitCount = 0;

public Cache(int capacity)
{
super(capacity + 1, 1.1f, true);
this.capacity = capacity;
}

public Object get(Object key)
{
accessCount++;
if (containsKey(key))
{
hitCount++;
}
Object value = super.get(key);
return value;
}

protected boolean removeEldestEntry(Entry eldest)
{
return size() > capacity;
}

public long getAccessCount()
{
return accessCount;
}

public long getHitCount()
{
return hitCount;
}
}


One point to note is that calls to the containsKey() method don't update the access counts; we may want to override that method also, so that the hit rate isn't skewed by code such as this:

    if (cache.containsKey(filename))
{
filedata = (String) cache
.get(filename);
}
else
{
// Read filedata from file system here...
cache.put(filename, filedata);
}

Labels:

7 Comments:

  • LRU caches must remove the stale entry from the cache. Stale entry is a one which has NOT been accessed for the longest period of time. (i.e) map.get("key") should make that MapEntry "unstale"

    The Map.Entry passed as an argument to removeEldestEntry() is the entry which was first added in the LinkedHashMap. It is NOT the one which is "stale"

    By Blogger 'E', At 6:48 AM  

  • 'E' - note that the code uses the LinkedHashMap constructor variant that switched the map to use access ordering instead of the default insertion ordering. This ensures that, as you said, accessing an entry moves it back to the top of the access order list. As a consequence the eldest entry is in fact the 'stalest', that is the one that has been least recently accessed and hence the one that is to be removed.

    By Blogger Pete Ford, At 11:09 AM  

  • In the get(Object key) method of cache we are incrementing accessCount everytime and hitCount if the key exists in cache. How should we implement the put(Object key,Object value) method ? should we update the existing key if the key alraedy exists in cache ? what if the key doesn't exist in cache ? Need we update instance variables accessCount and hitCount during put() method with either scenario ?

    By Blogger beanicatch, At 1:29 AM  

  • bean - you don't need to override put() unless you want to gather hit statistics in the same way as get(). If you do, you can do it in much the same way as what I showed for get(). Yes, you should then make sure your overriding method calls super.put() so that it updates existing keys with new values, and you should return the value that the super.put() call gives you back so that it remains fully compatible.

    By Blogger Pete Ford, At 2:42 PM  

  • We don't need containsKey(key) and cost of it.

    public Object get(Object key)
    {
    accessCount++;
    Object value = super.get(key);

    hitCount=value==null?hitCount:hitCount+1;

    return value;
    }

    By Blogger mustafayuceel, At 7:21 AM  

  • Really Nice Article ...
    I loved it .

    By Blogger Shirish Bari, At 12:43 AM  

  • org.apache.commons.collections.map.LRUMap


    Apache has provided above excellent class, which has already implemented LRU element removal technique.

    By Blogger Shirish Bari, At 12:46 AM  

Post a Comment

Subscribe to Post Comments [Atom]



<< Home