optimisation

It’s been clear to me for a while that arts & ego is too slow, and it became obvious when I moved it home that it’s also too power hungry. I decided, finally, to sort things out this weekend, by optimising away Server Side Includes (SSIs). The results are very good.

Why did I use SSIs in the first place? My initial online presence was ordinary, ugly, ancient HTML. In 2000, I migrated to ASP running under Windows 2000. In 2002, I decided to migrate again, to a more generic format, mostly so I could choose the cheapest host around, whatever their underlying technology. My requirements were fairly minimal: HTML with file includes. The includes permit common content on many pages, without my having to amend those pages individually when that content changes. I did some research and found that SSIs were supported on both Microsoft’s IIS and Apache (this was years before nginx was launched), so I decided to settle on them. A few years later, my SSI requirements evolved beyond the fairly minimal feature set in IIS, but this wasn’t a problem, because almost every web host runs Apache.
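For anyone who’s not met them, an SSI is just a directive hidden inside an HTML comment, which the server expands when it serves the page. An include looks like this (the path is illustrative, not one of mine):

    <!--#include virtual="/includes/header.html" -->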

[image: underflower]

I rewrote arts & ego in HTML with SSIs in summer 2002, and moved it to powweb. (I stayed with them for 16 years.) Over the years, I tried alternatives to hand-coded HTML with SSIs, such as Apple’s iWeb and drupal, but all gave more hassle than benefit. I’m still using SSIs in HTML.

Using SSIs in HTML unfortunately has a serious problem: humungous inefficiency. The Apache server has to scan every HTML file it serves, at the moment the client requests it, to process any SSIs it contains.
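To see why, consider what enabling SSIs in plain .html files means for the Apache configuration. Something like the following (an illustrative snippet, not my actual config) pushes every .html response through the INCLUDES output filter, so Apache parses every page afresh on every single request:

    Options +Includes
    AddOutputFilter INCLUDES .html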

There are alternatives. If I were to use server side scripting, whether with PHP or another technology, those languages offer less inefficient include mechanisms. Unfortunately, I long since decided that maintaining such scripts wasn’t worth the work, because of the constant rounds of security patches and upgrades breaking code, which I’d then have to repair. Furthermore, the main benefit of server side scripting is to support dynamic sites, whereas I’ve settled on a simpler and more secure static site, so I can concentrate on content, not maintenance.
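For comparison, a PHP include is a one-liner (the path is illustrative); with an opcode cache, the compiled result can be reused across requests, which is where the efficiency gain over per-request SSI scanning comes from:

    <?php include $_SERVER['DOCUMENT_ROOT'] . '/includes/header.html'; ?>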

I decided to resolve the inefficiencies of SSIs by precompiling pages. This means that, whilst I still write the HTML with SSIs (for the time being), the pages are precompiled to process the SSI code before they go live. I tried the Python tool SSIC, and, although it works fairly well, it tends to trip up on relative page addressing. I wrote my own, simple, bash script to precompile the site, built around curl. It’s not very efficient, but it does work — well, I’ve not found the humungous bugs yet.
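The approach is straightforward: fetch each page through a local Apache, so mod_include does the expansion, and save the result as plain HTML. Here’s a minimal sketch of that idea, not my actual script; the paths, port, and variable names are all assumptions for illustration:

    #!/bin/sh
    # Precompile SSIs: fetch each page through a local Apache (which
    # expands the includes) and save the resulting static HTML.
    # All paths and the URL are illustrative assumptions.
    SRC_DIR="$HOME/site/src"        # HTML with SSI directives
    OUT_DIR="$HOME/site/build"      # precompiled, plain HTML
    SRC_URL="http://localhost:8080" # local Apache serving SRC_DIR

    cd "$SRC_DIR" || exit 1
    find . -name '*.html' | while read -r page; do
      mkdir -p "$OUT_DIR/$(dirname "$page")"
      # -sS: quiet, but report errors; -f: fail on HTTP errors
      curl -sSf "$SRC_URL/${page#./}" -o "$OUT_DIR/$page" \
        || echo "failed: $page" >&2
    done

One nice property of going through Apache itself, rather than reimplementing the SSI grammar: the precompiled output is exactly what the live SSI site would have served.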

The results are excellent. Instead of Apache using between 20 and 80 percent of each CPU, it now uses between 0 and 3 percent for much the same traffic (presuming awstats isn’t fibbing). If my simple sanity testing is correct, pages are delivered significantly faster.

If I want to optimise the site any further, I’ll switch from Apache to OpenBSD’s native httpd. I couldn’t use that when I moved the site home because it doesn’t support SSIs, but now that’s irrelevant.
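Since the precompiled site is nothing but static files, the httpd configuration would be about as small as web server configuration gets. A sketch, with an assumed hostname and path (httpd serves from a /var/www chroot by default, so the root is relative to that):

    server "artsandego.example" {
            listen on * port 80
            root "/htdocs/artsandego"
    }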

The one downside is that the precompilation process takes time. If I amend a common include file, such as my standard header file, all HTML pages have to be recompiled and uploaded. This, though, is more a problem of using SSIs, indeed of includes in general, than of the compilation process per se. Having said that, I’m pretty confident the process can be improved. But, for now, I’m happy with the results, so that’s a task for another day, should another day shout so.
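For the record, the obvious improvement would be to recompile only the pages that actually reference the amended include, rather than the whole site. A sketch, reusing the illustrative names from the script above:

    # Recompile only the pages mentioning a changed include (illustrative).
    changed="/includes/header.html"
    cd "$SRC_DIR" || exit 1
    find . -name '*.html' -exec grep -Fl "$changed" {} + \
      | while read -r page; do
          curl -sSf "$SRC_URL/${page#./}" -o "$OUT_DIR/$page"
        done

It’s naive, since nested includes would defeat a simple grep, but it would cover the common case.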