Server-managed cache in the browser

Imagine browsing to a big web page with lots of images and scripts, and it loads in your browser almost instantly, nearly as fast as loading it from your hard drive. Now imagine you’re browsing a web site with about 60-70 of these pages and they all load just as fast. Sounds interesting? But how do you do that? Prime the browser’s cache? Preload all components of the web pages somehow? Is that even possible?

Well, yes and no. It is possible with Gears. It can be set to store all “static” components (JS, CSS, images, etc.) of a web page or a whole web site and load them from local storage every time the browser requests them. However, the Gears team shifted its priorities to HTML 5.0 offline storage, which was the main idea behind Gears in the first place. Unfortunately, the HTML 5.0 specification for offline storage implements only some of the features that were available in Gears, so this type of caching (controlled by the user and managed by the server) is not possible with it.

But why a server-managed cache? Isn’t the standard browser caching good enough? Yes, it is good. It has evolved significantly during the 15 or so years since the beginning of the World Wide Web. However, it can’t guarantee that every static file is served locally without asking the server.

Let’s take a simplistic look at how the browser cache works:

  • We (the users) browse to a web page.
  • The Server tells the Browser: “Hey there, these few files (images, JS, CSS, etc.) are almost never updated, put them in your cache and don’t ask me to send them again for the next 10 years.”
  • The Browser thinks: “Hmm, put them in the cache, you say? I’ll think about it. You know, I’m a Web Browser. I need to load pages very fast. I don’t want a huge cache with millions of files in it. That will slow me down. Let’s see if the User ever comes back to this page.”

If we keep going to the same web page, eventually the Browser changes its mind: “Maybe that Server was right and I should put these files in my cache. That would speed up page loading. But what will happen if these files are updated… I’d better keep asking the Server to check if they have been updated so my cache is always fresh.”
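The conversation above can be sketched as a tiny decision function. This is a simplified model for illustration only, not how any particular browser is implemented: the names CacheEntry, decide and server_reply are made up here, and the revalidate flag stands in for all the reasons a browser may choose to double-check with the server anyway.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheEntry:
    stored_at: float      # time the file was put in the cache (seconds)
    max_age: float        # freshness lifetime the server asked for (seconds)
    last_modified: float  # modification time of the copy we hold


def decide(entry: Optional[CacheEntry], now: float, revalidate: bool = False) -> str:
    """Return what a (simplified, hypothetical) browser does for one file."""
    if entry is None:
        # Never cached, or evicted to keep the cache small: full download.
        return "GET"
    age = now - entry.stored_at
    if age < entry.max_age and not revalidate:
        # Still fresh and the browser trusts it: no request at all.
        return "USE_CACHE"
    # Stale, or the browser wants to be sure: ask the server if it changed.
    return "CONDITIONAL_GET"


def server_reply(entry: CacheEntry, server_last_modified: float) -> int:
    """Status code for a conditional request: 304 if unchanged, else 200."""
    return 304 if server_last_modified <= entry.last_modified else 200
```

Even in the best case for the standard cache (a 304 reply with no body), the browser still makes a round trip to the server for every file it decides to check.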

A couple of years ago we implemented Gears as WordPress’ Turbo feature. We didn’t use it to make WordPress an offline app; we used it to create a server-managed cache. It worked great. Even the heaviest pages in the WordPress admin loaded considerably faster, regardless of how often the users visited them.

The implementation was very simple: we had a manifest that listed all “static” files and a couple of user options to enable and initialize the “super cache”. The rest was handled automatically by Gears. So in practice we discovered the perfect way of browser caching for web apps:

  • The User decides which web sites / web apps are cached and can add or remove them.
  • The Server (i.e. the web app) maintains the cache, updating, adding and deleting files as needed.
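To make the “server maintains the cache” part more concrete, here is a rough sketch of the idea. It is not the actual WordPress code: build_manifest and plan_update are hypothetical helpers, the file names in the usage note are invented, and the update logic is a simplification of what a client like Gears does. The manifest layout (betaManifestVersion, version, entries) follows the Gears manifest format.

```python
import hashlib
from typing import Dict, List


def build_manifest(files: Dict[str, bytes]) -> dict:
    """Build a Gears-style manifest for a mapping of URL -> file content.

    The version string is derived from the content, so it changes
    whenever a file is added, removed or modified on the server.
    """
    digest = hashlib.sha1()
    for url in sorted(files):
        digest.update(url.encode("utf-8") + b"\0")
        digest.update(files[url] + b"\0")
    return {
        "betaManifestVersion": 1,
        "version": digest.hexdigest()[:12],
        "entries": [{"url": url} for url in sorted(files)],
    }


def plan_update(cached: dict, latest: dict) -> Dict[str, List[str]]:
    """Decide what the client-side store should do (simplified).

    Same version: nothing to do, no requests for the static files.
    New version: re-fetch what is listed, drop what is no longer listed.
    """
    if cached["version"] == latest["version"]:
        return {"fetch": [], "delete": []}
    old_urls = {entry["url"] for entry in cached["entries"]}
    new_urls = {entry["url"] for entry in latest["entries"]}
    return {
        "fetch": sorted(new_urls),
        "delete": sorted(old_urls - new_urls),
    }
```

For example, if the cached manifest was built from /a.js and /b.css, and the server later ships a changed /a.js plus a new /c.png while dropping /b.css, plan_update returns /a.js and /c.png to fetch and /b.css to delete. The only thing the client has to check regularly is the small manifest itself.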

The results were spectacular. We didn’t need to concatenate and compress scripts and stylesheets. We even stopped compressing TinyMCE, which alone can load about 30-40 files on initialization. Page load time was 0.5 to 1.5 seconds no matter how heavy the page was. For comparison, before implementing this “super caching” the same pages were loading in 5 to 9 seconds.

Why did it perform that well? Simple: it eliminated all requests to the server for the files that were cached. And that means all, even the “HEAD” requests (strictly speaking, the conditional requests a browser sends to check whether a cached file is still fresh). In our implementation the only file that was loaded from the server was the actual HTML. All other components of the web page were stored in Gears’ offline storage.

That also had the side benefit of eliminating a big chunk of traffic to the server. At first glance it doesn’t seem like a lot: 30-40 requests for the web page components, followed every now and then by 30-40 HEAD requests per page (while the browser cache is hot). But think about it on a global scale: several million of these pages are loaded every hour.
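A quick back-of-the-envelope calculation shows the scale. The numbers are illustrative assumptions, not measurements: 3 million page loads per hour stands in for “several million”, 35 is the midpoint of the 30-40 components per page, and the “without” case is the worst case where every component triggers a request (a download or a freshness check).

```python
def requests_per_hour(pages_per_hour: int, components_per_page: int,
                      cached_locally: bool) -> int:
    """Requests hitting the server per hour (simplified worst-case model)."""
    html_requests = 1  # the HTML itself always comes from the server
    if cached_locally:
        per_page = html_requests
    else:
        per_page = html_requests + components_per_page
    return pages_per_hour * per_page


# Assumed, illustrative inputs (see the note above).
PAGES_PER_HOUR = 3_000_000
COMPONENTS_PER_PAGE = 35

without_cache = requests_per_hour(PAGES_PER_HOUR, COMPONENTS_PER_PAGE, False)
with_cache = requests_per_hour(PAGES_PER_HOUR, COMPONENTS_PER_PAGE, True)
saved = without_cache - with_cache
print(without_cache, with_cache, saved)
```

Under these assumptions that is 108 million requests per hour against 3 million, so roughly 105 million requests that never reach the server.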

So, why not do the same with HTML 5.0 offline storage? Because it doesn’t work that way. The HTML 5.0 specification for offline storage is good only for… offline storage. It’s missing a lot of the features Gears has. Yes, there is a workaround. We can “store offline” a skeleton of the web page and then load all the dynamic content with XHR (a.k.a. AJAX), but that method has other (quite annoying) limitations. Despite that, we will certainly try this method in WordPress, but that discussion is for another post.
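For the curious, the skeleton workaround would need a cache manifest roughly like the one generated below. This is only a sketch under assumptions: build_cache_manifest is a hypothetical helper, and the URLs in the usage note are invented for illustration. The CACHE section lists the skeleton and its static files, and the “*” in the NETWORK section lets the XHR calls for dynamic content go through to the server.

```python
from typing import Iterable


def build_cache_manifest(version: str, cached_urls: Iterable[str],
                         network_urls: Iterable[str] = ("*",)) -> str:
    """Return the text of an HTML5 offline cache manifest.

    Changing the version comment changes the manifest text, which is
    how the browser learns that the cached files need to be refreshed.
    """
    lines = ["CACHE MANIFEST", "# version " + version, "", "CACHE:"]
    lines.extend(cached_urls)
    lines.extend(["", "NETWORK:"])
    lines.extend(network_urls)
    return "\n".join(lines) + "\n"
```

Calling build_cache_manifest("1", ["/app/skeleton.html", "/app/app.js", "/app/app.css"]) produces a manifest that caches those three (made-up) files and sends everything else to the network. The page opts in by pointing the manifest attribute of its html tag at this file, which the server has to send as text/cache-manifest.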

In short: the HTML 5.0 offline storage implementation is missing some critical features. For example, a file that is stored there is not loaded from the storage when the browser goes to another page on the same website. Yes, it’s sad watching the browser load the same file again and again from the Internet when that file is already on the user’s hard drive.

What can we do about it? I don’t think there is anything that can be done short of changing, or rather enhancing, the HTML 5.0 specification for offline storage. The XHR “hack” that makes this kind of caching possible with the current HTML 5.0 is still just a hack.