Thursday, July 12, 2007

Improving web performance by distributing images among hostnames

It's a known fact that browsers open a very limited number of concurrent connections to fetch the components of a page. By components I mean all the resources required to render it: images, CSS, JavaScript, Flash, etc. In fact, the HTTP/1.1 standard recommends that clients open no more than two concurrent persistent connections per hostname. Users can change these defaults in their browsers, but as developers we have to assume the defaults. So, if we can't control the number of connections per hostname, we have to increase the number of hostnames (the other variable). How do we do that? Basically, by fetching images and/or other resources from hostnames other than the main host. E.g. if your main host is www.example.org, you could create subdomains and fetch some images from images1.example.org, some from images2.example.org, and so on. With your main host plus two subdomains for images, you get 2 × 3 = 6 concurrent connections from your visitors.

Now this is all nice and dandy, but if you have a 10000+ page site in ASP.NET, replacing every friggin' src attribute in every img tag is simply not viable. Plus, the guy who worked there before you wrote a lot of image URLs in the code-behind, like img.Src = "/img/hello.gif";
And you have to consider browser caching too. If the same image is served from two different hostnames on two different pages, the browser treats it as two different resources and downloads it twice, so whatever assigns the hostnames has to do it deterministically: the same image should always come from the same subdomain.
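
A simple way to keep that assignment stable (just a sketch; the hardcoded subdomain list would really come from configuration) is to hash the image path and always derive the subdomain from the hash:

private static readonly string[] subdomains =
    { "images1.example.org", "images2.example.org", "images3.example.org" };

private static string PickHost(string imagePath)
{
    // Hash only the path, so /img/hello.gif maps to the same subdomain
    // on every page and the browser cache keeps working.
    uint hash = 0;
    foreach (char c in imagePath)
        hash = hash * 31 + c;
    return subdomains[(int)(hash % (uint)subdomains.Length)];
}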

The solution I propose is to write a Stream to plug into Response.Filter. Let's call it ImageDistributionFilter. You create this ImageDistributionFilter and pass it the list of available subdomains (these parameters could be injected by an IoC container or fetched from web.config or dynamic.web.config). Then you hook it to Response.Filter, either from your custom base page class (if you have one) or from an HttpModule. The ImageDistributionFilter scans the HTML before it is sent to the client and rewrites the src of every img tag to point at one of the subdomains. E.g.:

<img src="/img/hello.gif">

becomes:

<img src="http://images1.example.org/img/hello.gif">
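
Here's a minimal sketch of what such a filter could look like (just a sketch: the regex is naive, it assumes UTF-8 output and double-quoted root-relative src attributes, and it reuses the stable-hash idea from above):

using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

public class ImageDistributionFilter : Stream
{
    private readonly Stream inner;        // the original Response.Filter stream
    private readonly string[] subdomains;
    private readonly StringBuilder responseHtml = new StringBuilder();

    public ImageDistributionFilter(Stream inner, string[] subdomains)
    {
        this.inner = inner;
        this.subdomains = subdomains;
    }

    // ASP.NET writes the rendered HTML here; we just buffer it.
    // NOTE: chunk boundaries could split multi-byte chars; fine for a sketch.
    public override void Write(byte[] bytes, int offset, int count)
    {
        responseHtml.Append(Encoding.UTF8.GetString(bytes, offset, count));
    }

    // When the response is closed, rewrite the img tags and push
    // the result to the real output stream.
    public override void Close()
    {
        string html = Regex.Replace(
            responseHtml.ToString(),
            "(<img[^>]+src=\")(/[^\"]+)",           // only root-relative URLs
            new MatchEvaluator(RewriteSrc),
            RegexOptions.IgnoreCase);

        byte[] output = Encoding.UTF8.GetBytes(html);
        inner.Write(output, 0, output.Length);
        inner.Close();
        base.Close();
    }

    private string RewriteSrc(Match m)
    {
        string path = m.Groups[2].Value;
        return m.Groups[1].Value + "http://" + PickHost(path) + path;
    }

    private string PickHost(string path)
    {
        // Same stable hash as before: same path, same subdomain.
        uint hash = 0;
        foreach (char c in path) hash = hash * 31 + c;
        return subdomains[(int)(hash % (uint)subdomains.Length)];
    }

    // Minimal Stream plumbing; this filter is write-only.
    public override bool CanRead { get { return false; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return true; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override void Flush() { inner.Flush(); }
    public override int Read(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
}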


Here's a catch: you should always check that Request.Url.Scheme == Uri.UriSchemeHttp before applying the filter (in other words, don't apply the filter on secure pages). Otherwise you could end up with mixed http/https content (which generates warnings in the browser) or certificate mismatches triggered by requests such as https://images1.example.org/img/hello.gif (a certificate issued for exactly www.example.org won't work for other subdomains).
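
If you go the HttpModule route, the hookup could look something like this (again just a sketch; ImageDistributionModule and the ImageHosts appSettings key are names I made up for the example):

using System;
using System.Configuration;
using System.Web;

public class ImageDistributionModule : IHttpModule
{
    public void Init(HttpApplication app)
    {
        app.BeginRequest += new EventHandler(OnBeginRequest);
    }

    private void OnBeginRequest(object sender, EventArgs e)
    {
        HttpApplication app = (HttpApplication)sender;

        // Only plain HTTP: secure pages keep their original image URLs,
        // so we never trigger mixed-content warnings.
        if (app.Request.Url.Scheme == Uri.UriSchemeHttp)
        {
            // Hypothetical key, e.g.
            // <add key="ImageHosts" value="images1.example.org,images2.example.org" />
            string[] hosts = ConfigurationManager.AppSettings["ImageHosts"].Split(',');

            app.Response.Filter = new ImageDistributionFilter(app.Response.Filter, hosts);
        }
    }

    public void Dispose() { }
}

And register it in web.config:

<httpModules>
    <add name="ImageDistributionModule" type="ImageDistributionModule"/>
</httpModules>

In a real app you'd probably also skip non-HTML responses and any page where the rewriting isn't worth the extra processing.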

Another thing to remember: there's a limit to the number of subdomains that can be used to increase performance. Each extra subdomain requires a separate DNS lookup, and that's an overhead you have to take into account. Yahoo ran an experiment and found that the optimal number of subdomains lies between two and four; beyond that, the DNS overhead is too high and you actually lose performance. CPU usage on the client goes up too, since it has to handle more parallel downloads. So you have to think about the most common connection and hardware your users have (Google Analytics to the rescue!). If 90% of your users are still on dial-up and use Pentium 133s, don't use this technique, it will kill their machines! Ok, maybe I'm exaggerating, but it will make things slower for them. On the other hand, if it's an intranet app (LAN) and users have modern computers, maybe you could use six or eight subdomains.

Note also that it's not necessary for each subdomain to point to a different server. The connection limit applies per hostname, not per IP address, so you can use a wildcard DNS entry to map every subdomain to the same server.
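
For example, a single wildcard record in a BIND-style zone file (the IP below is just a placeholder) covers images1, images2 and any other subdomain you decide to add later:

www.example.org.   IN  A  203.0.113.10
*.example.org.     IN  A  203.0.113.10   ; images1, images2, ... all resolve here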

You can find the source for the ImageDistributionFilter along with a little demo here.
