Let’s say you were using Rails 2.3.x, and you made the (wise) decision to implement heavy fragment caching. Then let’s say that you updated to Rails 3.x. At that point, you probably noticed that Rails 3 does something seriously annoying with the paths where it caches fragments.

In Rails 2, doing this:

<% cache('posts/123/123456/main') do %>

Gave you a cache file that looks like this:

RAILS_ROOT/tmp/cache/views/posts/123/123456/main.cache

However, in Rails 3, the same cache code results in a cache file that looks like this:

RAILS_ROOT/tmp/cache/925/AB2/posts%2F123%2F123456%2Fmain

WTH? That’s not documented in the Rails 3 API. So what’s going on, and how do you make Rails 3’s fragment caching work like it used to instead of using those crazy hash subdirectories? Read on!

Unfortunately, it’s not terribly easy to track down exactly what the dear Rails people did. Fortunately, I found it for you.

Behold the supreme hilarity that is the new ActiveSupport::Cache::FileStore.key_file_path(key):

# Translate a key into a file path.
def key_file_path(key)
  fname = Rack::Utils.escape(key)
  hash = Zlib.adler32(fname)
  hash, dir_1 = hash.divmod(0x1000)
  dir_2 = hash.modulo(0x1000)
  fname_paths = []
  # Make sure file name is < 255 characters so it doesn't exceed file system limits
  if fname.size <= 255
    fname_paths << fname
  else
    while fname.size <= 255
      fname_path << fname[0, 255]
      fname = fname[255, -1]
    end
  end
  File.join(cache_path, DIR_FORMATTER % dir_1, DIR_FORMATTER % dir_2, *fname_paths)
end

So, what’s wrong with this method? Quite a few things, actually.

First of all, we can see that the key, which in your case is a file path, is URI escaped so that things like “/” become “%2F”. Then, that escaped key is Adler-32 hashed and then chopped up a bit to give you those annoying “/925/AB2” subdirectories.
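
To make those two steps concrete, here is a minimal standalone sketch of the hashing half of the method. It makes two assumptions: CGI.escape stands in for Rack::Utils.escape so the snippet runs without Rack, and the directory format string is assumed to be "%03X", which matches the three-hex-digit directories shown above. The helper name hashed_cache_path is made up for illustration.

```ruby
require "zlib"
require "cgi"

# Assumed to match FileStore's DIR_FORMATTER; three uppercase hex digits is
# what directories like "925/AB2" look like.
DIR_FORMATTER = "%03X"

# Hypothetical helper: returns the path relative to the cache root.
# CGI.escape is a stand-in for Rack::Utils.escape (an assumption).
def hashed_cache_path(key)
  fname = CGI.escape(key)            # "/" becomes "%2F"
  hash = Zlib.adler32(fname)
  hash, dir_1 = hash.divmod(0x1000)  # low 12 bits name the first directory
  dir_2 = hash.modulo(0x1000)        # next 12 bits name the second directory
  File.join(DIR_FORMATTER % dir_1, DIR_FORMATTER % dir_2, fname)
end

puts hashed_cache_path("posts/123/123456/main")
```

The output has the same shape as the Rails 3 path above: two three-character hex directories followed by the escaped key.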

Why are these “hash subdirectories” a problem? Well, what happens if you have 10,000 cache files and you’d like to find a specific one to expire manually? Good luck finding it! Whereas before, we knew that post #123456 could be found in /posts/123/123456, now our lovely self-created cache file organizational structure that used to be eminently human-readable becomes all crap-hashed and complicated.

Now, I should note that this hash-based scheme for storing cache files is seriously popular. It’s also seriously unnecessary overhead in most cases. Implement your own scheme, and you’ll be much happier. For example, we know that some Linux file systems (ext3, for one) complain when there are more than ~32,000 subdirectories in any one directory. So, if we store cache files based on a Post’s ID, we can easily divide the ID by 1000 (integer division, so 123456 becomes 123) to generate the /posts/123/ directory in which to plop our cache files. The end result is that each /posts/XXX directory holds at most 1000 posts’ worth of cache files, which means our /posts directory can now contain ~32,000 * 1000 = 32,000,000 of them. Need more? Just bucket the IDs by 10,000 instead, which would give you 320 million. It’s simple, it’s fast, and it works.
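
As a rough sketch of that ID scheme (the helper name post_cache_key and its parameters are hypothetical, not anything Rails provides):

```ruby
# Hypothetical helper: bucket a post's cache key by ID so that no single
# directory collects more than bucket_size posts.
def post_cache_key(post_id, fragment = "main", bucket_size = 1000)
  bucket = post_id / bucket_size   # integer division: 123456 / 1000 => 123
  "posts/#{bucket}/#{post_id}/#{fragment}"
end

puts post_cache_key(123456)                  # posts/123/123456/main
puts post_cache_key(123456, "main", 10_000)  # posts/12/123456/main
```

The result is exactly the kind of key passed to cache() at the top of this post, so a given post’s files can be found by eye.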

If you can’t store by ID, you can also generate subdirs based on the first 3 or 4 characters of a string. /people/@@@ (where @@@ is a 3-letter combination) gives you 26^3 = 17,576 subdirs, or roughly 562 million cache files at ~32,000 per subdir!
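
The string version might look something like this. The helper name, the lowercasing, and the "_" padding for very short names are all assumptions made for the sake of a runnable example:

```ruby
# Hypothetical helper: bucket by the first three characters of a name.
def person_cache_key(name, fragment = "main")
  # Keep the key file-system friendly: lowercase, and replace anything
  # outside a-z and 0-9 with "_".
  safe = name.downcase.gsub(/[^a-z0-9]/, "_")
  prefix = safe[0, 3].ljust(3, "_")   # pad names shorter than 3 characters
  "people/#{prefix}/#{safe}/#{fragment}"
end

puts person_cache_key("Smith")   # people/smi/smith/main
puts person_cache_key("Al")      # people/al_/al/main
```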

But, I digress.

Back to our method above, we see that the hash subdirs are being generated with the Adler-32 algorithm. On most platforms, Adler-32 is faster than even CRC-32, which is pretty fast. However, for small keys, the generated hash becomes not-so-unique, which is bad. Furthermore, why do any hashing at all if you don’t have to?
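
Here is one way to see the short-key weakness for yourself. This is only a sketch: the keys are made-up examples, and hash_dirs is a hypothetical helper that repeats the bit-slicing from key_file_path without the formatting.

```ruby
require "zlib"

# Adler-32 packs two 16-bit running sums into one value: (b << 16) | a,
# where a is 1 plus the sum of the bytes (mod 65521).
printf("%08X\n", Zlib.adler32("a"))   # 00620062, since a = b = 1 + 97 = 0x62

# Hypothetical helper: the same divmod/modulo slicing as key_file_path.
def hash_dirs(fname)
  hash, dir_1 = Zlib.adler32(fname).divmod(0x1000)
  [dir_1, hash.modulo(0x1000)]
end

keys = (1..10_000).map { |id| "posts%2F#{id}" }
dirs = keys.map { |key| hash_dirs(key) }

# For keys this short, a never reaches 0x1000, so the low four bits of the
# second directory number are always zero: at most 256 of the 4096 possible
# second-level names can ever show up.
puts dirs.all? { |_, dir_2| dir_2 % 16 == 0 }   # true
puts dirs.uniq.size   # distinct directory pairs used by the 10,000 keys
```

In other words, for short keys the hash spreads files over far fewer directories than the two 12-bit levels suggest.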

But it gets worse. The next part of the function takes the key (now “fname”) and checks to make sure that it is no longer than 255 characters to keep the file system happy. If fname is longer than 255 chars, it’s supposed to be busted up into 255-char chunks. Let’s take a peek:

  # Make sure file name is < 255 characters so it doesn't exceed file system limits
  if fname.size <= 255
    fname_paths << fname
  else
    while fname.size <= 255
      fname_path << fname[0, 255]
      fname = fname[255, -1]
    end
  end
  File.join(cache_path, DIR_FORMATTER % dir_1, DIR_FORMATTER % dir_2, *fname_paths)

Oops! If fname is 255 characters or fewer, it’s passed as the URI-escaped string at the end of the output file path. If it is longer than 255 characters, the while loop is supposed to be executed to break it up. Two problems:

1. while fname.size <= 255 means that the code inside the while will never be executed. Should be: > 255
2. The name of the var inside the while loop to hold 255-character chunks is fname_path, not fname_paths. So, even if the while loop WAS executed, it wouldn’t do anything anyway in terms of the last File.join because there’s a typo in the first line of the while block.
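
One more detail in the loop body is worth checking alongside the two problems above: in Ruby, slicing a string with a negative length returns nil rather than “everything to the end”, so fname[255, -1] would not hand back the remainder even if the loop ran. The loop also never adds the final, shorter piece. The sketch below shows the slicing behavior and one possible way to get the intended chunks; name_chunks is a hypothetical helper, not part of Rails.

```ruby
s = "abcdef"
p s[2, -1]    # nil: a negative length does not mean "to the end"
p s[2..-1]    # "cdef": a range ending in -1 does

# Hypothetical helper: split a name into pieces of at most limit characters,
# including the shorter piece left over at the end.
def name_chunks(fname, limit = 255)
  fname.scan(/.{1,#{limit}}/m)
end

p name_chunks("a" * 600).map(&:size)   # [255, 255, 90]
```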

In short, this function borders on total crap. Not only is it annoying and inefficient in most cases, but it’s broken in 2 places.

Since the Rails core peeps don’t actually ever fix bugs, I’ve fixed it myself. Well, they DO fix bugs, but most of the time they decide that bugs are just not important enough, and we poor end users are left to fend for ourselves with a monkeypatch while they work on the next glorious (and buggy) version of Rails. It’s like they say: Ruby is Awesome, and Rails is… well, Rails.

Without further ado, this is what you do: Create a new file in /config/initializers and call it something like “fragment_cache_fix.rb”. Plop the following code inside:

# Override the key_file_path method to get rid of those hash subdirs for frag cache files

ActiveSupport::Cache::FileStore.module_eval do
  def key_file_path(key)
    fname = key.to_s
    fname_paths = []
    # Break names longer than 255 characters into 255-character chunks so
    # they don't exceed file system limits
    while fname.size > 255
      fname_paths << fname[0, 255]
      # Use a range here: fname[255, -1] returns nil, not the remainder
      fname = fname[255..-1]
    end
    # Whatever is left (or the whole name, if it was short) goes last
    fname_paths << fname
    File.join(cache_path, *fname_paths) + '.cache'
  end
end

Save, restart your app, and fragment cache away!

Of course, the above method assumes that you’ll be passing in only string paths that are properly formatted for your file system (i.e. with / or \ as appropriate). And, as for expiring the cache files, don’t use expire_fragment. Use an atomic cache expiry function.

So there you have it. Now your fragment caching works like it used to! And if it doesn’t, let me know and I’ll ask DHH for a job. Kidding! He would never hire me because I don’t use a Mac.

I’ll stop now… 🙂

2 Comments

Jason Knight on 24 July 2011 at 14:29

There are all manner of gotchas and twists about rails internals that most users just don’t know about. The reason they remain is that, in spite of the evangelism for rails, it’s really just a glorified CRUD framework. It’s used on very very few projects that would ever reach the scales necessary to trip some of these internal bugs and tweaks so they never really hear about it.

On the few big projects it is used on, people just patch the bug themselves and don’t bother with the rails bug track, the rails community generally doesn’t make it worth your while to report bugs.

Jeff on 30 March 2012 at 06:43

Thank you so much for posting this – I was wondering what these random three-digit folders were and why my caching broke on a rails 3.2 upgrade. Now I both know AND have it fixed with your help above.