Apache deflation and negotiation

Friday, April 17, 2009 - 04:38 PM

Sometimes, wasting time isn't entirely unproductive.

This week, while thinking of getting work done on some longer term projects that are standing nearby and mocking me, I somehow got conned into installing Yahoo's YSlow Firefox extension.

It's pretty cool in a masochistic sort of way. It gives you a performance evaluation of whatever site you're looking at, based on "best practices" for HTML, server configs, etc. In my case, the one thing that popped out at me while looking at my homepage here was that I wasn't compressing any of the "text" content (HTML, CSS, JavaScript, RSS feeds, etc.) that the machines serve. Server-side auto-compression is one of those things that I remember looking at a few years ago, before being distracted by the next shiny bauble that prevented me from actually doing anything about it.

The idea is to auto-compress any text being served back so that the payload delivered to the clients (you and your web browser) is smaller, gets to you faster, and loads faster. Computers are fast enough, and the files are small enough, that the CPU time spent compressing is far outweighed by the time saved sending fewer bytes over the network.
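To get a rough feel for the savings, here's a minimal shell sketch that builds a throwaway, repetitive HTML-ish file and compares its size before and after gzip. The sample content is made up for illustration, and command-line gzip only approximates what mod_deflate does on the fly (both use the DEFLATE algorithm, but compression levels and real pages will differ), so treat the numbers as a ballpark.

```shell
#!/bin/sh
# Build a temporary sample file of repetitive markup (hypothetical content).
sample=$(mktemp)
i=0
while [ "$i" -lt 200 ]; do
    echo '<li class="entry"><a href="/comic/'"$i"'">Comic number '"$i"'</a></li>' >> "$sample"
    i=$((i + 1))
done

# Count bytes before and after gzip; tr strips the padding some wc versions add.
raw_bytes=$(wc -c < "$sample" | tr -d ' ')
gz_bytes=$(gzip -c "$sample" | wc -c | tr -d ' ')

echo "uncompressed: $raw_bytes bytes"
echo "gzipped:      $gz_bytes bytes"

rm -f "$sample"
```

Markup like this is highly repetitive, which is exactly why text content compresses so well.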

As an added bonus, since the server has to hold the network connection, with all its associated memory and resource usage, open until the client is finished getting all its content, this frees up those resources a little bit faster.

The first step was enabling Apache's mod_deflate module. This seems simple enough; the docs even have a perfect example right at the top:

Compress only a few types

AddOutputFilterByType DEFLATE text/html text/plain text/xml

Nifty! Now let's check the documentation for AddOutputFilterByType:

Compatibility: Available in Apache 2.0.33 and later; deprecated in Apache 2.1 and later

Well, crap.

I'm running Apache 2.2 (which is also what all the documentation links point to), so I probably shouldn't start by implementing this with a deprecated config directive.

It does point us in the direction of its replacement, the mod_filter module. Reading through all this documentation is not entirely unconfusing, as there are a lot of parts without a very coherent picture of a whole. At the end of the day, what it comes down to is that I need to first define my filter, and then apply it where and how I want to. To define it, I put at the top of my config:
FilterDeclare compress-response
FilterProvider compress-response DEFLATE resp=Content-Type $text/
FilterProvider compress-response DEFLATE resp=Content-Type $application/x-javascript

This declares a filter with the name "compress-response" and then says that it should be applied to any response whose Content-Type contains "text/" (e.g. text/html) or "application/x-javascript"; the leading "$" makes it a substring match. Further down, in the virtual hosts that I want to use this compression, I need to add the line:
FilterChain compress-response
Nice and easy!

For the purposes of full disclosure, there's also some stuff in the mod_deflate documentation that I used for determining browsers where this will and won't work, so the full set of directives is:
<Location />
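# Netscape 4.x has some problems with anything other than text/html
BrowserMatch ^Mozilla/4 gzip-only-text/html
# Netscape 4.06-4.08 have some more problems
BrowserMatch ^Mozilla/4\.0[678] no-gzip
# MSIE masquerades as Netscape, but it is fine
BrowserMatch \bMSIE !no-gzip !gzip-only-text/html
# Make sure proxies don't deliver the wrong content
Header append Vary User-Agent env=!dont-vary

# Insert filter
FilterChain compress-response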
</Location>

I'll probably eliminate a bunch of those at some point as I don't think we need to worry about Netscape 4 a whole lot these days.

This worked, and made me happy. It's been live for most of this week, and nobody noticed, commented, or complained. Success!

What bugged me about this is that, while compressing on the fly makes sense for all the dynamic pages (most of mine and my clients' sites), it seems like a waste of resources for things like the JavaScript files served back from OhNoRobot, which are written to disk and then served back multiple times. It makes more sense to zip them once, when they're being written, and then serve those back to the clients that can handle it.

I'm about a decade too late to be the first person to think of this, so it's also conveniently built into Apache. Content Negotiation also supports sending back pages in multiple languages, but for my purposes I wanted to send back a ".gz" file instead of a ".js" if there was one available. The two important things to do are: add MultiViews to your enabled Options, and do an AddEncoding for the .gz files:
<Location "/js/">
Options +MultiViews
ForceType "application/x-javascript"
AddEncoding x-gzip .gz
</Location>

I also had to add the ForceType directive, because otherwise the .gz version of the file would be served back with Content-Type: application/x-gzip instead of as JavaScript. The other thing that wasn't immediately clear in the documentation is that, in order to support content negotiation between .gz and .js, you need to have both files there, and each one needs to be named as the ".js" file plus a further suffix, so (for example) Dinosaur Comics needs to have both "/js/23.js.js" and "/js/23.js.gz" on disk.
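Here's a rough sketch of the "zip once, when writing" step as a small shell function that produces that on-disk layout. The function name publish_js and its arguments are hypothetical, not anything OhNoRobot actually runs. It also assumes the bare name (e.g. "23.js") is deliberately not left in the served directory, since as I read the mod_negotiation docs MultiViews only goes looking for "name.*" variants when the requested file itself doesn't exist.

```shell
#!/bin/sh
# publish_js SRC DEST_DIR NAME   (hypothetical helper)
# Writes DEST_DIR/NAME.js (plain copy) and DEST_DIR/NAME.gz (gzip copy)
# so that a request for NAME can be negotiated between the two variants.
publish_js() {
    src=$1
    dest_dir=$2
    name=$3

    # Uncompressed variant, for clients that don't send Accept-Encoding: gzip.
    cp "$src" "$dest_dir/$name.js" || return 1

    # Compressed variant, built once at write time instead of on every request.
    gzip -9 -c "$src" > "$dest_dir/$name.gz" || return 1
}
```

Calling something like publish_js /tmp/build/23.js /var/www/js 23.js (paths made up for the example) would leave "23.js.js" and "23.js.gz" in the target directory.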

If you want to test this using curl from the command line, add -H "Accept-Encoding: gzip,deflate" to your requests, e.g.:
curl -vI -H "Accept-Encoding: gzip,deflate" http://www.dumbrellahosting.com/
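If you'd rather not eyeball the headers every time, a tiny shell helper can do the checking. This is just a sketch: the name served_gzipped is made up, and it only looks for a Content-Encoding of gzip or x-gzip in whatever response headers it's fed on stdin.

```shell
#!/bin/sh
# served_gzipped (hypothetical helper): reads HTTP response headers on stdin
# and exits 0 if they include "Content-Encoding: gzip" (or x-gzip).
served_gzipped() {
    # Strip the carriage returns from HTTP line endings, then match the
    # header name case-insensitively.
    tr -d '\r' | grep -iqE '^content-encoding:[[:space:]]*(x-)?gzip[[:space:]]*$'
}

# Example use (not run here): do a GET, discard the body, dump headers to stdout.
#   curl -s -o /dev/null -D - -H "Accept-Encoding: gzip,deflate" \
#       http://www.dumbrellahosting.com/ | served_gzipped && echo "compressed"
```

The example uses a GET with the body thrown away rather than a HEAD request, on the assumption that some server setups may answer HEAD with slightly different headers than they would send with a real body.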

As a final remark, I'm pretty sure that most of the above is pretty obvious to all good Apache administrators. However, for those of us doing that as just one part of a larger job, it seems remarkably difficult to find a coherent set of task-oriented how-tos. Mostly I document this so I'll remember what the hell I was thinking when I look at this config again next year.
