4 December 2009

Rails Caching Optimization

If you use Rails on a high-traffic site, you know that as your number of users increases, you have three main options:

  1. Add servers to handle the load
  2. Optimize your queries
  3. Improve your caching scheme

You may be surprised to know that most people go for Door #1. It’s a lot easier. You don’t have to really do tons of work to rewrite your code and actually make it efficient. You don’t have to ditch “the Rails Way” and start actually thinking about what the database layer is doing to slow your site down to a crawl. And you don’t have to think about how Rails’ cache expiry functions actually work. Finally – and best of all – you can just pass the costs on to your customer, right??

I recently read an article that really slayed me. It was about a “niche site” that runs on Rails. They get 50 million hits a month, and they have SIX servers to handle the load, including multiple dedicated DB servers. I designed a Rails site that now gets 27 million hits a month, and it runs Rails on a single 1.86 GHz dual-core server with 3GB of RAM. By my calculations, the site could easily handle twice as many hits as it does now. Most of the time, the load is very low and the CPU and disk accesses hover at a few percent.

Of course, to achieve good performance, you can use things like Phusion Passenger. But that alone ain’t gonna cut it. You also have to optimize your queries, stop doing things The Rails Way and start thinking for yourself, and of course optimize your caching scheme.

In this episode, I’m going to tell you one very cool way to turbocharge your caching setup!


Alrighty. So, if you’re smart, you will use Rails’ fragment caching feature. It is the most flexible caching option if you need to cache individual sections or even entire pages on your site for maximum performance. It does have its problems, though.

Let’s say you have a cache directory with 50,000 cache files. Those cache files represent various chunks of various pages, all automagically generated by Rails’ cache method. Some might be for an entire method, like an RSS feed cache. Rails will dutifully generate all those cache files without a hitch. The problem comes when you want to expire a cache file so that Rails regenerates it on the next hit to that particular piece of content.

Well, normally, you’d just use the expire_fragment method. And, it just works. Sort of…

You see, when you have, say, 50,000 cache files stored in various nested subdirectories in your main cache directory, Rails’ expire_fragment method is very, very dumb, and very, very slow. The reason is that expire_fragment will more or less check every single file and directory in the cache until it finds the one it’s looking for – even if you have specifically told it, “Go expire /stamps/123/123847/user_comments”.

Yes, it’s THAT dumb. It will cause your disk usage to skyrocket and your server will slow to a crawl while it scans all those cache files needlessly.
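To get a feel for how much extra work that scanning is, here’s a toy illustration with plain files – the directory layout is made up, it just mimics a nested cache tree:

```ruby
require 'fileutils'
require 'tmpdir'

Dir.mktmpdir do |cache_root|
  # Build a small fake cache tree: 100 subdirectories, one fragment each.
  100.times do |i|
    dir = File.join(cache_root, 'stamps', i.to_s)
    FileUtils.mkdir_p(dir)
    File.write(File.join(dir, 'user_comments.cache'), 'fragment')
  end

  # Roughly what expire_fragment does: walk every entry in the tree.
  scanned = Dir.glob(File.join(cache_root, '**', '*')).size

  # What you actually wanted: delete the one file whose path you already know.
  target = File.join(cache_root, 'stamps', '42', 'user_comments.cache')
  FileUtils.rm_f(target)

  puts "entries walked by a full scan: #{scanned}"  # ~200 entries, for ONE expiry
  puts File.exist?(target)                          # false
end
```

Multiply that by 50,000 cache files and a few expiries per content post, and you can see where all your disk I/O is going.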

Now, you might decide to just use FileUtils and remove the cache file manually. You can use it with Dir.glob to even match wildcards, like so:

FileUtils.rm_f(Dir.glob(File.join(CACHEDIRECTORY, 'main_stamp_list', '*.cache')))

That will go into your CACHEDIRECTORY/main_stamp_list/ directory and nuke all the .cache files. And it will do so very quickly – much more quickly than Rails can do it. In fact, you might think that your problems are solved.

But there’s an even better way to do these things: using an atomic file operation. Thanks go out to John Leach for this glorious solution.

To see why, consider this scenario: You have multiple people posting content to your site. When they post new content, it expires several different cache files. Now, what can happen is this:

  1. Bert posts new content, and the cache file “main_stamps” is expired
  2. Very shortly thereafter, Edna posts new content, and the app tries to expire the same “main_stamps” file
  3. Both Bert and Edna see a “success” message, but Edna doesn’t see her new content added in the portion of the site cached as “main_stamps”. The reason is that Bert’s cache expiry had not yet finished, so when Edna’s expiry request went in, the system was busy deleting/regenerating the cache file and just sort of said, “Well, nothing to see here! Carry on then.”

You might think this is a rare occurrence, but it isn’t. Trust me on this one.

What you need to do is implement an atomic operation when deleting the cache files. Even a simple “rm” command can take a while to complete, and so it has a “waiting period”. An “rm -r” is even worse. In other words, they aren’t atomic. But the “mv” command is atomic – as long as the source and destination are on the same filesystem, it’s just a rename. It appears to happen “instantly” – and for all intents and purposes, it does.
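Here’s a small sketch of that difference, assuming the cache lives on a single filesystem (the directory names are made up):

```ruby
require 'fileutils'
require 'tmpdir'

Dir.mktmpdir do |root|
  cache_dir = File.join(root, 'main_stamps')
  FileUtils.mkdir_p(cache_dir)
  1_000.times { |i| File.write(File.join(cache_dir, "#{i}.cache"), 'x') }

  # mv/rename is a single metadata operation: the old path disappears at
  # once, so any concurrent request instantly sees "no cache" and regenerates.
  trash = cache_dir + ".expired.#{Process.pid}"
  FileUtils.mv(cache_dir, trash)
  puts File.exist?(cache_dir)  # false the moment mv returns

  # The slow part -- actually deleting 1,000 files -- happens AFTER the
  # cache is already expired, so nobody is stuck waiting on it.
  FileUtils.rm_rf(trash)
end
```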

So why the heck does all this atomic nonsense matter?

Check this out:

def atomic_cache_expire(isfile, cachepath)
  temp_str = [Process.pid, Time.now.to_i, rand(10000)].join
  FileUtils.mv(cachepath, cachepath + temp_str, :force => true)
  if isfile
    # expire individual file
    FileUtils.rm_f(cachepath + temp_str)
  else
    # expire whole directory
    FileUtils.rm_rf(cachepath + temp_str)
  end
end

All you have to do is define this method somewhere, and then call it to expire your cache files. If you want to expire a single file, pass isfile=true. If you want to expire an entire directory, pass in isfile=false. Obviously you must also pass in a cachepath, like so:

atomic_cache_expire(ISFILE, File.join(CACHEDIRECTORY, 'stamp_categories', '25.cache'))

If you call this:

atomic_cache_expire(ISDIR, File.join(CACHEDIRECTORY, 'stamp_categories'))

Then you will nuke the entire “stamp_categories” cache directory, and it will be regenerated on the next page hit.

Also note that you need to define some things for the above examples to work:

CACHEDIRECTORY = File.join('tmp','cache','views')
ISFILE = true
ISDIR = false

The above assumes your cache directory is RAILS_ROOT/tmp/cache/views/.

So what does the function actually DO?

  1. It creates a temporary string consisting of the process ID, the current time as an integer, and a random number between 0 and 9999, all joined together. This string ensures that the temporary file/directory name is unique even when multiple cache expiry operations run concurrently.
  2. It then moves the existing cache file or directory to a temporary one. In other words, if your old cache file was “stamp_categories.cache”, it moves it to “stamp_categories.cache53231259936248897”, for example. At this point, the old cache file/dir has been expired since it has been moved in one teensy, lightning-fast operation. The app is now free to generate a new cache file/dir without a problem.
  3. If the cache is a file, the temporary file is removed using FileUtils.rm_f.
  4. If the cache is a directory, the temporary directory is removed using FileUtils.rm_rf.
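You can sanity-check the whole routine outside Rails against a throwaway directory. This is self-contained – only the file names are made up:

```ruby
require 'fileutils'
require 'tmpdir'

def atomic_cache_expire(isfile, cachepath)
  temp_str = [Process.pid, Time.now.to_i, rand(10000)].join
  FileUtils.mv(cachepath, cachepath + temp_str, :force => true)
  if isfile
    FileUtils.rm_f(cachepath + temp_str)   # expire individual file
  else
    FileUtils.rm_rf(cachepath + temp_str)  # expire whole directory
  end
end

Dir.mktmpdir do |cache_dir|
  # Expire a single cache file.
  file = File.join(cache_dir, 'stamp_categories.cache')
  File.write(file, 'cached fragment')
  atomic_cache_expire(true, file)
  puts File.exist?(file)   # false

  # Expire a whole cache directory.
  dir = File.join(cache_dir, 'stamp_categories')
  FileUtils.mkdir_p(dir)
  File.write(File.join(dir, '25.cache'), 'cached fragment')
  atomic_cache_expire(false, dir)
  puts File.exist?(dir)    # false
end
```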

That’s all there is to it. By using the “mv” command to rename the file/directory first, you have freed up the app from having to wait 300 years to do the actual deletion of the old cache file. And you most certainly will no longer have to wait for Rails’ inefficient expire_fragment method to scan all the darn cache files!

If you don’t have thousands of cache files, you may think you don’t need to go to all this trouble. But every little performance improvement adds up to huge cost savings in the long run!
