Rails cache in a distributed environment
Page and fragment caching are life-savers for Rails application scalability. Page caching in particular can make your app very fast, especially if you use a web server like Nginx that serves the cached static files directly, without touching the Rails stack.
But maintaining cache consistency across a distributed Rails application can be challenging.
With page caching, Rails writes the rendered page to a static file in the public folder (with the default options), allowing the web server to serve it directly.
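To make the mapping concrete, here is a minimal plain-Ruby sketch of how a request path turns into a file under the public folder. It only approximates Rails' default layout ("/" becomes index.html, and .html is appended when the path has no extension); cached_file_for and write_cached_page are hypothetical helpers, not Rails' own API.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper approximating Rails' default page cache layout:
# "/" maps to "/index", and ".html" is appended when the last path
# segment has no extension of its own.
def cached_file_for(public_dir, request_path)
  name = ["", "/"].include?(request_path) ? "/index" : request_path.chomp("/")
  name += ".html" unless File.basename(name).include?(".")
  File.join(public_dir, name)
end

# Hypothetical helper: write the rendered page where the web server
# (e.g. Nginx) will find it as a plain static file.
def write_cached_page(public_dir, request_path, html)
  file = cached_file_for(public_dir, request_path)
  FileUtils.mkdir_p(File.dirname(file))
  File.write(file, html)
  file
end

Dir.mktmpdir do |public_dir|
  # Prints the temporary directory followed by /posts/1.html
  puts write_cached_page(public_dir, "/posts/1", "<h1>Post 1</h1>")
end
```

Because the result is an ordinary file on the local disk, each box in a cluster keeps its own copy, which is the root of the consistency problem discussed below.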
Expiring cache
Cache expiration must be done explicitly by your app, using the expire_page method. Pages should be expired whenever your models change (creations, deletions and updates); each change may affect one or more pages, depending on your app. The expiration calls should be placed in model sweepers, as explained here.
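A sweeper's job can be sketched in plain Ruby like this. In the Rails versions this post targets, the same logic would sit in an ActionController::Caching::Sweeper subclass calling expire_page; here PostSweeper, the Post struct and the list of affected pages are hypothetical, and deleting the files stands in for expire_page.

```ruby
# Plain-Ruby stand-in for a model sweeper. PostSweeper, the Post struct
# and the list of affected pages are hypothetical; deleting the files
# plays the role of expire_page.
Post = Struct.new(:id)

class PostSweeper
  def initialize(public_dir)
    @public_dir = public_dir
  end

  # Creations and updates both go through after_save.
  def after_save(post)
    expire_pages_for(post)
  end

  def after_destroy(post)
    expire_pages_for(post)
  end

  private

  # A change to one post affects its own page and the index listing it.
  def expire_pages_for(post)
    ["/posts/#{post.id}.html", "/posts.html"].each do |relative|
      file = File.join(@public_dir, relative)
      File.delete(file) if File.exist?(file)
    end
  end
end
```

Note that this only touches files on the box where the model change happened.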
Distributed environment
What about when you are using more than one box to serve your Rails app? When one box calls expire_page, it only cleans its local cache, leaving the caches on the other boxes stale and inconsistent.
Solutions
There are several solutions to this. Let's look at them:
1. DRb cache store
The DRb (distributed Ruby) cache store keeps the cache in a single shared process that every box talks to. This is not a good solution because:
- there is a single point of failure: the DRb process
- web servers generally can't talk to DRb, and even if they could, serving static files locally is much faster
2. Memcache Store
Using a memcached service is one good solution. Memcached can be clustered and load balanced, and it is pretty fast. But if you are running a distributed Rails environment mainly for the sake of redundancy, or don't want to complicate the environment setup, don't use the memcache store.
3. Cron-based expiration
You can expire the cache on a schedule, with a cron job that periodically deletes cached pages. This can be enough for some applications, but not for those that need tight cache consistency.
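A minimal sketch of such a job, assuming the page cache is written to its own directory (for example public/cache, a hypothetical choice) so that hand-written static files like public/404.html are never deleted. The script name, the 600-second lifetime and the crontab line are illustrative.

```ruby
require "find"

# Hypothetical cron job body. Assumes the page cache lives in its own
# directory, so static files such as public/404.html are never touched.
# Deletes cached .html files older than max_age seconds.
def expire_old_pages(cache_dir, max_age, now: Time.now)
  expired = []
  Find.find(cache_dir) do |path|
    next unless File.file?(path) && path.end_with?(".html")
    next unless now - File.mtime(path) > max_age
    File.delete(path)
    expired << path
  end
  expired
end

# Example crontab entry (every 10 minutes, 600-second lifetime):
#   */10 * * * * ruby expire_old_pages.rb /var/www/app/public/cache
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  expire_old_pages(ARGV[0], 600).each { |file| puts "expired #{file}" }
end
```

Every box runs the job on its own, so no coordination is needed; the price is that a page can stay stale until the first run after it exceeds max_age.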
4. Build your own distributed cache cleaning
In this solution, your model cache sweepers are responsible for cleaning the cache (deleting page cache files) on the other machines.
But how does one machine contact the other machines?
One solution I came up with involves every machine running a Mongrel server listening on a public TCP port. (By public, I mean accessible to the other machines in the cluster. This is not a service you want exposed to the internet.)
This HTTP service exists only to listen for cache expiration events. It accepts, as arguments, the paths of the page cache files to expire.
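What travels over the wire can be as simple as a form-encoded POST listing the paths. The sketch below shows both ends using only Ruby's standard library; the /expire endpoint, the "path" parameter name and both helper names are hypothetical, and the HTTP server that would call paths_from_body (the Mongrel listener described above) is left out.

```ruby
require "net/http"
require "uri"

# Sender side: build a POST for a peer's expiration service.
# The /expire endpoint and the "path" parameter are hypothetical.
def build_expire_request(paths)
  request = Net::HTTP::Post.new("/expire")
  request.set_form_data("path" => paths) # encodes as path=...&path=...
  request
end

# Receiver side: pull the page paths back out of the form body.
def paths_from_body(body)
  URI.decode_www_form(body).select { |key, _| key == "path" }.map { |_, value| value }
end

# Sending it to one peer (not executed here; host and port are made up):
#   Net::HTTP.start("10.0.0.2", 8081) do |http|
#     http.request(build_expire_request(["/posts/1"]))
#   end
```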
Security concerns
This service can be implemented in your Rails app, but it should not be accessible to anyone outside the cluster.
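Two checks such a service could make before deleting anything, sketched under assumptions: the cluster sits on a private network (10.0.0.0/24 here is hypothetical), and the cache lives in a single directory. Checking the caller's IP is best treated as a second line of defense behind a firewall rule that closes the port to the outside.

```ruby
require "ipaddr"

# Hypothetical private network the cluster boxes live on.
CLUSTER_NETWORK = IPAddr.new("10.0.0.0/24")

# Reject callers outside the cluster (ideally also enforced by a firewall).
def allowed_peer?(remote_ip)
  CLUSTER_NETWORK.include?(IPAddr.new(remote_ip))
rescue ArgumentError # IPAddr raises a subclass of ArgumentError on bad input
  false
end

# Only accept paths that resolve inside the cache directory, so a request
# for "/../config/database.yml" cannot delete files elsewhere.
def safe_cache_file(cache_dir, request_path)
  root = File.expand_path(cache_dir)
  file = File.expand_path(File.join(root, request_path))
  file.start_with?(root + File::SEPARATOR) ? file : nil
end
```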
Drawbacks
There are several problems with this approach:
1. Every machine must know about every other machine
For one machine to contact the others when an expiration occurs, every machine must know the addresses of all the others. This is awkward to maintain in Rails configuration, but can be done in Capistrano tasks.
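One way to do it: have a Capistrano task write the list of app servers to a small YAML file on each box at deploy time, and let the app read it. The file name config/cache_peers.yml, its peers key and the other_peers helper are hypothetical, and the Capistrano side is not shown.

```ruby
require "yaml"

# Hypothetical config/cache_peers.yml, written by a Capistrano task at
# deploy time, e.g.:
#   peers:
#     - app1
#     - app2
#     - app3
# Returns every peer except the current box.
def other_peers(yaml_text, self_host)
  YAML.safe_load(yaml_text).fetch("peers").reject { |host| host == self_host }
end

# In the app (not executed here):
#   require "socket"
#   other_peers(File.read("config/cache_peers.yml"), Socket.gethostname)
```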
2. It does not scale well
Every machine you add increases the cost of each cache expiration: with N boxes, every expiration means N - 1 requests.
3. Fault tolerance
When you expire a cached page, you must contact EVERY other box. If the cache expiration service on one box is down, the expiration fails for that box. Error handling must be done carefully, with a fall-back mechanism such as putting the failed expiration command on a queue to be retried.
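A minimal sketch of that error handling, under stated assumptions: the transport that talks to a peer is injected as a lambda (so this code does not depend on Mongrel or Net::HTTP), and the retry queue is an in-memory array. ExpirationFanout is a hypothetical name. A real deployment would want a persistent queue, since an in-memory one is lost when the process restarts.

```ruby
# Hypothetical fan-out: send the expired paths to every other box and
# queue the failures for a later retry instead of giving up.
class ExpirationFanout
  attr_reader :retry_queue

  # transport.call(peer, paths) must raise when the peer can't be reached.
  def initialize(peers, transport)
    @peers = peers
    @transport = transport
    @retry_queue = []
  end

  # One request per peer: this is where the cost grows with cluster size.
  def expire(paths)
    @peers.each { |peer| deliver(peer, paths) }
  end

  # Run periodically (e.g. from a background job); failures stay queued.
  def retry_failed
    pending = @retry_queue
    @retry_queue = []
    pending.each { |peer, paths| deliver(peer, paths) }
  end

  private

  def deliver(peer, paths)
    @transport.call(peer, paths)
  rescue StandardError
    @retry_queue << [peer, paths] # peer down: keep the command for later
  end
end
```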
A better solution
A better solution is to make cache expiration events ASYNCHRONOUS. When a page is expired, an event is published on a channel that every other box listens to.
This can be achieved with UDP broadcasts, with every box listening on the same UDP port.
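A minimal sketch of both sides with Ruby's socket library. The port number, the JSON message format and the helper names are assumptions. Note that 255.255.255.255 broadcasts only reach the local subnet, and UDP does not guarantee delivery.

```ruby
require "socket"
require "json"

EXPIRATION_PORT = 9999 # hypothetical port every box listens on

# Publisher: broadcast the expired page paths as a JSON array.
def broadcast_expiration(paths, address: "255.255.255.255", port: EXPIRATION_PORT)
  socket = UDPSocket.new
  socket.setsockopt(Socket::SOL_SOCKET, Socket::SO_BROADCAST, true)
  socket.send(JSON.generate(paths), 0, address, port)
ensure
  socket&.close
end

# Listener side: bind once, then read one expiration event per call.
def open_listener(port = EXPIRATION_PORT, host = "0.0.0.0")
  socket = UDPSocket.new
  socket.bind(host, port)
  socket
end

def receive_expiration(socket)
  data, _sender = socket.recvfrom(65_535)
  JSON.parse(data)
end

# A daemon on each box could then run, for example:
#   listener = open_listener
#   loop { receive_expiration(listener).each { |path| delete_cached_page(path) } }
# where delete_cached_page is a hypothetical helper that removes the file.
```

The publisher does not wait for anyone, so the cost of an expiration no longer grows with the number of boxes.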
Drawbacks (again)
A fall-back mechanism must still be in place, though: a box that is down during an expiration event misses it, leaving its cache inconsistent.
This could be done with some kind of persistent message queue instead of UDP broadcasts, but I think that would be overkill for most applications.
Expect to hear from me soon regarding the implementation of this solution!