Measurements of client-side processing delays

Recently I wrote about “holes in the waterfall”: gaps in the waterfall graph for a web page download where no network activity occurred while the browser processed some JavaScript. To measure how widespread this phenomenon might be, I downloaded the HAR files for the August 15, 2011 run of the HTTP Archive and ran a program that measured the total duration of gaps in each one. To avoid counting intentional gaps where website developers had used timers to refresh the content after the initial page load, I ignored any part of the waterfall that occurred after the onload event.
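As a rough illustration of this kind of measurement (not the program used for the study), the sketch below sums the idle time between requests up to the onload event. It assumes each request has already been reduced to a hypothetical (start_ms, end_ms) pair measured from the start of the page load, and it counts only gaps that fall between two requests; whether idle time between the last request and onload should count is a choice the text above does not settle.

```python
def total_gap_ms(intervals, onload_ms):
    """Sum the time before onload during which no request was in flight.

    intervals: iterable of (start_ms, end_ms) pairs, relative to page start.
    onload_ms: time of the onload event; anything after it is ignored.
    Assumption: only gaps between two requests are counted.
    """
    # Drop requests that start after onload and clip the rest at onload.
    clipped = sorted(
        (start, min(end, onload_ms))
        for start, end in intervals
        if start < onload_ms
    )
    gap = 0
    covered_until = None  # latest end time seen so far
    for start, end in clipped:
        if covered_until is not None and start > covered_until:
            gap += start - covered_until
        covered_until = end if covered_until is None else max(covered_until, end)
    return gap


# Two overlapping requests, then two isolated ones; onload fires at 450 ms.
sample = [(0, 100), (50, 150), (200, 300), (400, 500)]
print(total_gap_ms(sample, onload_ms=450))
```

Sorting by start time and tracking the furthest end time seen so far keeps overlapping requests on parallel connections from being counted as idle time.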

For the 17,001 web pages in the sample data set, the distribution of time spent waiting for the client side looked like this:

Note: I limited the height of the Y-axis to 500 milliseconds to keep the middle of the graph from getting compressed too much. The 99th-percentile time was over 16 seconds. (A logarithmic scale wouldn’t work well for this data set because the first 10% of the Y-values on the left side of the graph were zeros.)

The median time spent waiting for client-side processing was slightly over 50 milliseconds. In an absolute sense, 50 milliseconds is a nontrivial amount of time; it’s on the same order of magnitude as the first-byte response time for a lot of the HTTP requests in this same data set. In relative terms, though, the client-side time was only a single-digit percentage of total page load time (measured from the first HTTP request to the onload event) for almost all of the websites in the sample:

Subjectively speaking, the amount of time spent on the client side is:

  • Larger than I expected,
  • Still not big enough to be the primary performance bottleneck for most websites,
  • But worth checking when doing website performance optimization work.

A hole in the waterfall: observations on JavaScript delays

Many web performance analysis tools use waterfall charts to show the timing of HTTP requests within a page download. Here is an example from the HTTP Archive showing the first two dozen requests that an IE 8 browser made while loading the CNN home page.

My own work on web performance has been focused on the time spent waiting for the network – i.e., the length of the bars in the waterfall chart, the number of bars, and the staircase patterns that emerge when synchronous HTTP requests are queued on a small pool of connections. In this waterfall, though, another phenomenon stands out: there is a short interval of no network activity right before the browser starts downloading six images in parallel at the end.

The page source shows the reason for this delay: there is a <script> block near the top of the page. Modern web browsers can download external scripts in parallel with other content, but running a script is a serialized operation.

The interesting question is: how common is this pattern of a waterfall “stalling” due to script execution? I’m working on a study of the HTTP Archive data to measure how many websites are affected by this phenomenon, and I’ll post a follow-up when the results are ready.

Will HTTP pipelining help? A study based on the HTTP Archive data set

1. Background

1.1 Serialization of HTTP requests

One reason why web pages download slowly is that most browsers implement HTTP requests serially on each persistent connection:

  1. Open a connection to a server.
  2. Send an HTTP request.
  3. Wait for the response.
  4. If there are more resources to be fetched from that server, repeat steps 2 and 3.
  5. Eventually close the connection to the server.

The minimum amount of time needed to fetch a resource via HTTP is the network round trip time (RTT) between the client and the server. The good news is that, if the client reuses a persistent connection to the server, the actual elapsed time is very close to this lower bound of 1 RTT in some common scenarios: retrieving small, static image files, for example. The bad news is that the RTT is often a large number: even at the speed of light, it takes several tens of milliseconds to send a packet across a continent and back.

Given a set of n small, static objects, the time needed to fetch them all synchronously over a preestablished TCP connection is approximately:

n * RTT

(Note: For simplicity, this model assumes that the server’s initial TCP congestion window is large enough to avoid further round-trip delays to await ACKs during slow start. It also assumes that the time-to-first-byte as measured at the server is zero.)

For a web page containing 50 small images and a 100ms RTT, requesting all of the images serially would take five seconds. That’s too long.

1.2 Current best practices for mitigation

In common practice, the elapsed time is less than n * RTT, because the developers of web browsers and web sites have taken some steps to reduce it:

  • Multiple connections
    Most modern web browsers will open up to six concurrent, persistent connections per server hostname. This reduces the elapsed time in the simplified model from n*RTT to n*RTT/6.

  • Domain sharding
    Knowing that browsers maintain persistent connections on a per-server-hostname basis, some web applications partition their static content across a few different hostnames.
    With the content spread across C distinct scheme:host:port combinations, the elapsed time drops further, to n*RTT/(6*C).

  • CDNs
    A Content Distribution Network stores cached copies of web content at many geographically distributed nodes and uses anycast DNS to send each client to a nearby node. This reduces the value of n*RTT by replacing a large RTT with a smaller one.
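The simplified model in Sections 1.1 and 1.2 can be written down as a small calculation. This is a sketch only: the function and parameter names are made up for illustration, and the second function is a refinement not stated in the text above (it assumes requests complete in whole rounds of one RTT each).

```python
import math


def serialized_fetch_seconds(n, rtt_s, connections=1, shards=1):
    """Simplified model from the text: n * RTT / (connections * shards)."""
    return n * rtt_s / (connections * shards)


def fetch_seconds_in_rounds(n, rtt_s, connections=1, shards=1):
    """Variant (an assumption, not from the text): requests finish in whole
    rounds, so the number of rounds is rounded up."""
    rounds = math.ceil(n / (connections * shards))
    return rounds * rtt_s


# 50 small images at a 100 ms RTT, as in the example in Section 1.1.
print(serialized_fetch_seconds(50, 0.1))                 # one connection
print(serialized_fetch_seconds(50, 0.1, connections=6))  # six connections
print(serialized_fetch_seconds(50, 0.1, connections=6, shards=2))
print(fetch_seconds_in_rounds(50, 0.1, connections=6))   # whole rounds
```

A CDN does not change the formula; it lowers the value passed as rtt_s.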

1.3 Example of unmitigated serialization latency

Despite these practices, however, serialized HTTP requests are still a significant factor in the slow page downloads of some websites. The following subset of a waterfall diagram, taken from the HTTP Archive, shows part of the work a browser must do to download the home page of a particular e-commerce website. (Because I am using this website as an example of slow page downloads, I have blurred the site name and URLs in the waterfall diagram to preserve the developers’ anonymity.)

Several things stand out in this waterfall:

  • The client, MSIE 8, used 6 concurrent, persistent connections per server hostname.
  • All of the requests in this sequence were for graphics. None of these requests depended on the response to any of the others (all the GIF URLs were specified in a CSS file loaded earlier in the waterfall). Thus, significantly, it would be valid for a client to download all of these images in parallel.
  • Many of these requests had elapsed times of approximately 125 milliseconds. That appears to be the RTT between the client and the server in this test. Thus this waterfall shows the n*RTT/6 phenomenon.
  • The amount of response data was quite small: a total of 25KB in about 1 second during this part of the waterfall, for an effective throughput of under 0.25 Mb/s. The client in this test run had several Mb/s of downstream network bandwidth, so the serialization of requests resulted in an inefficient utilization of the available bandwidth.

Given these observations, how might we speed up the downloading of these files? There are a few ways to reduce the download time by modifying the content:

  • Domain sharding could reduce the elapsed time by a small constant factor. The total time for this section of the waterfall, though, was approximately 7*RTT. While partitioning the requests across 7 distinct hostnames could potentially reduce the total time to 1*RTT, the resulting 42 TCP connections per client might substantially increase the memory footprint of the server and any intermediate proxies or load balancers. (Because of the need to dedicate buffer space per TCP connection to accommodate the advertised receive window and the send window, adding TCP connections is a somewhat resource-intensive way to gain parallelism.)
  • Spriting or inlining the images could reduce this section of the waterfall to a single HTTP request. These techniques have been used to good effect by some websites, but they are nontrivial to implement and maintain.
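The figures quoted in this section can be checked with a few lines of arithmetic. The sketch below assumes 1 KB = 1000 bytes (with 1 KB = 1024 bytes the result is still under 0.25 Mb/s), and the function names are illustrative only.

```python
def effective_throughput_mbps(kilobytes, seconds):
    """Throughput in megabits per second, assuming 1 KB = 1000 bytes."""
    bits = kilobytes * 1000 * 8
    return bits / seconds / 1_000_000


def connections_for_sharding(hostnames, per_hostname=6):
    """TCP connections a client may open when content is spread across
    the given number of hostnames, at per_hostname connections each."""
    return hostnames * per_hostname


# About 25 KB transferred in about 1 second.
print(effective_throughput_mbps(25, 1.0))
# Seven hostnames at six connections per hostname.
print(connections_for_sharding(7))
```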

1.4 Protocol-level strategies

An alternate approach is to seek more parallelism at a lower level: not by changing the content, but by changing the protocol:

  • The HTTP/1.1 specification allows a client to pipeline its requests. The responses arrive in order: first the entire response to the first request in the pipeline, then the entire response to the second request in the pipeline, and so on.
  • The SPDY protocol, currently experimental but in production use on various Google websites, allows a client to issue arbitrarily many requests on the same connection. The responses arrive out of order and may be interleaved with each other; SPDY defines a framing protocol to allow chunks of multiple messages to be multiplexed onto the same connection.

There are barriers to the adoption of both HTTP pipelining and SPDY, however. Most current web browsers do not pipeline their HTTP requests, due in part to past experience with web servers or proxies that handled pipelining incorrectly. Those browsers that do implement pipelining, such as Opera, currently rely on heuristics to decide when pipelining is likely to be safe. And only one major web browser, Chrome, currently supports SPDY.

2. Empirical study of the opportunity for request parallelization

2.1 Definitions

Given a set of timed HTTP transactions comprising the download of a single web page by a browser, a serialized request sequence is a subset of HTTP transactions with the following properties:

  • All the HTTP requests in the set are for the same scheme:host:port.
  • Each transaction except the first must begin immediately upon the completion of some other transaction in the sequence.
  • Each transaction except the last must have an HTTP response status of 2xx.
  • Each transaction except the last must have a response content-type of image/png, image/gif, or image/jpeg.

These criteria are based on a simple heuristic: If a set of back-to-back HTTP transactions are requests for images served from the same site, the requests are assumed to have been serialized due to a scarcity of available persistent connections at the client.

The length of a serialized request sequence is the number of HTTP transactions in the sequence.
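One way to turn this definition into code is sketched below. It is not the Java program used in the study; it is a minimal Python illustration under several assumptions: the Transaction type and its field names are hypothetical, the sequence is treated as a simple chain in which each transaction follows the end of the one before it, and "immediately" is modeled with a tolerance_ms parameter that defaults to an exact match at the 1-millisecond resolution of the timing data.

```python
from dataclasses import dataclass

IMAGE_TYPES = {"image/png", "image/gif", "image/jpeg"}


@dataclass(frozen=True)
class Transaction:
    origin: str        # "scheme:host:port"
    start_ms: int      # request start time
    end_ms: int        # response completion time
    status: int        # HTTP response status
    content_type: str  # response Content-Type


def can_be_followed(t):
    """True if t may be a non-final member: a 2xx response for an image."""
    mime = t.content_type.split(";")[0].strip().lower()
    return 200 <= t.status < 300 and mime in IMAGE_TYPES


def longest_serialized_sequence(transactions, tolerance_ms=0):
    """Length of the longest chain of back-to-back same-origin transactions."""
    ordered = sorted(transactions, key=lambda t: (t.start_ms, t.end_ms))
    best = [1] * len(ordered)  # best[i]: longest chain ending at ordered[i]
    for i, cur in enumerate(ordered):
        for j in range(i):
            prev = ordered[j]
            gap = cur.start_ms - prev.end_ms
            if (prev.origin == cur.origin
                    and can_be_followed(prev)
                    and 0 <= gap <= tolerance_ms):
                best[i] = max(best[i], best[j] + 1)
    return max(best, default=0)
```

The quadratic scan is adequate for a single page, which typically has at most a few hundred transactions. Note that the last transaction in a chain may have any status or content type, matching the "except the last" wording of the definition.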

2.2 Questions to be answered

Given:

  • the pattern of serialized request sequences, as defined in Section 2.1,
  • the negative impact of this pattern on page load speed,
  • the possibility that additional parallelism, in the form of HTTP request pipelining or SPDY, could mitigate the performance impact,
  • and the practical barriers to adoption of either pipelining or a new protocol,

it is useful to address two questions:

  • How prevalent is the pattern?

    If only 1% of real-world web pages had a longest serialized request sequence with length greater than 1, there would be no need for widespread deployment of protocol-level changes to increase request parallelism. At the other extreme, if 99% of pages contained long serialized request sequences, it would be worthwhile for HTTP client, server, and intermediary (proxy) developers to pursue protocol-level solutions.

  • And when the pattern occurs, how bad is it?

    If the longest serialized request sequence in a typical page had length 2, it might reasonably be fixed (i.e., reduced to an average value closer to 1) by increasing the size of browsers’ per-hostname connection pools. Conversely, if the longest serialized request sequence had length 10, it would provide an argument for a more radical solution such as pipelining.

2.3 Experiment design

Starting from the database of surveyed websites’ home pages from the HTTP Archive’s July 1, 2011 sample, I downloaded the HAR file for each page. Each of these HAR files contains a detailed log of all the HTTP transactions required to download the corresponding web page using MSIE 8 (a browser that uses up to 6 concurrent, persistent connections per server hostname and does not pipeline its HTTP requests). The HAR file format contains timing data for each HTTP request and response, with 1-millisecond resolution.
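For readers who want to explore HAR files themselves, here is a minimal sketch of extracting per-request timing from one. It relies on the standard HAR fields log.entries[].startedDateTime, time, request.url, response.status, and response.content.mimeType; the output row layout and the function names are assumptions made for illustration, not part of the study's tooling.

```python
from datetime import datetime
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url):
    """Return "scheme:host:port", filling in the scheme's default port."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme, 0)
    return f"{parts.scheme}:{parts.hostname}:{port}"


def parse_har_entries(har):
    """Convert a parsed HAR document (a dict) into a list of timing rows."""
    rows = []
    for entry in har["log"]["entries"]:
        # HAR timestamps are ISO 8601; normalize a trailing "Z" for older Pythons.
        started = datetime.fromisoformat(
            entry["startedDateTime"].replace("Z", "+00:00"))
        start_ms = round(started.timestamp() * 1000)
        response = entry.get("response", {})
        rows.append({
            "origin": origin_of(entry["request"]["url"]),
            "start_ms": start_ms,
            "end_ms": start_ms + round(entry["time"]),  # "time" is in ms
            "status": response.get("status", 0),
            "content_type": response.get("content", {}).get("mimeType", ""),
        })
    return rows
```

A HAR file on disk can be loaded with json.load and passed straight to parse_har_entries.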

For each HAR file, I ran a simple Java program to find the longest serialized request sequence, using the heuristics described in Section 2.1.

2.4 Results

The following histogram shows the distribution of longest serialized request sequences among the 15,568 web pages in the sample set:

For 12% of the sites, the longest serialized sequence in the home page had a length of 1. These sites are already well optimized, at least with regard to parallelization.

For another 22% of the sites, the longest serialized request sequence had length 2. By applying additional parallelization efforts, one could potentially reduce the elapsed time for the affected requests by 1*RTT. The simplest and most compatible way to achieve the small amount of extra parallelism needed for these sites probably would be to implement domain sharding at the origin web applications.

For the remaining 66% of the sites, the longest serialized request sequence had length 3 or greater. Increasing the number of concurrent client-to-server connections enough to parallelize these requests (via either domain sharding on the server side or bigger per-hostname connection pools on the client side) probably would be a bad solution, due to the per-TCP-connection memory footprint issues noted in Section 1.3. For these sites, it appears that parallelization of requests within the same TCP connection, via HTTP pipelining or SPDY, would be an effective solution.

3. Conclusions

Based on the results in Section 2.4, two thirds of the websites in the HTTP Archive contain request patterns that may be good candidates for acceleration through protocol-level parallelization.

The opportunity for web image optimization: an empirical study based on the July 1, 2011 HTTP Archive

Following my investigations of CSS and JavaScript minification, I applied a similar test methodology to study whether further image optimization could materially accelerate the page loading of a large sampling of websites.

Starting with the database from the HTTP Archive’s July 1, 2011 run, the experimental steps were:

  • Fetch, via HTTP, every PNG or JPEG URL listed in the archive.
  • Upon retrieving each image, attempt to reduce its size with a type-specific optimizer program.

From the test results, the median size reduction possible for image files in the July 1, 2011 HTTP Archive survey was:

  • 1.8% for PNG (sample size: 167,185)
  • 2.3% for JPEG (sample size: 372,131)

The following graphs show the full distribution for each image type.

The potential reduction in file size for the images in this study is relatively small when viewed as a percentage; compare the median reduction of 1.8% for PNG and 2.3% for JPEG to the median 10.4% for compressed JavaScript and 13.9% for compressed CSS:

However, it is important to note that images constitute a larger fraction of total page size than scripts or stylesheets. From the July 1, 2011 HTTP Archive stats, images comprise 482KB of the average page’s weight, compared to 135KB for scripts and just 28KB for CSS.

JavaScript minification: an empirical study based on the July 1, 2011 HTTP Archive

Using the same methodology as in my earlier study of CSS minification, I tested 102,577 JavaScript URLs from the July 1, 2011 HTTP Archive release to determine how much the application of JavaScript minification could reduce their download size.

The 50th percentile benefit observed from applying minification to these JavaScript resources was:

  • 10.0% reduction in the uncompressed JS file size
  • 10.4% reduction in the compressed JS file size

The following graphs show the distribution of minification benefit:

Interestingly, the median size reduction that can be achieved by applying minification to the files in the July 1 HTTP Archive corpus is smaller for JavaScript (10.0% reduction before compression is applied) than for CSS (17.9%). This may indicate that websites in the Archive are more likely to have minified their scripts than their stylesheets.

CSS minification: an empirical study based on the July 1, 2011 HTTP Archive

One of the many common recommendations for accelerating the download of web pages is to minify JavaScript and CSS content. Among content optimization practices, minification is a safe bet: it never slows down a web page the way script combining sometimes does, and it never breaks on older browsers the way image inlining does. Back when I started Jitify, streaming minification was one of the first content optimizations I implemented.

Even though minification does no harm and is relatively easy to implement, it is worthwhile to ask a basic question: how much does it really help?

The HTTP Archive project provides some useful data with which to explore this question. The project periodically retrieves the home pages of a large number of public-facing websites (over 15,000 as of July, 2011) and records the HTTP requests and responses for all of the objects that a browser must fetch for each page. The project then publishes the details of all these HTTP transactions in a form suitable for loading into a MySQL database. From this database we can get the URLs of many tens of thousands of CSS and JavaScript files used by real-world websites.

Using the database from the HTTP Archive’s July 1, 2011 run, I conducted an experiment:

  1. Fetch, via HTTP, every CSS document listed in the archive. (The selection criteria used to obtain a list of CSS URLs from the archive’s Requests table were: status=200 and (resp_content_type="text/css" or resp_content_type like "text/css;%").)
  2. Upon retrieving each CSS document, use YUI Compressor to minify it, and record the pre- and post-minification sizes.
  3. Using “gzip -6,” compress the pre- and post-minified forms of the CSS document to measure the effects of minification in combination with compression.
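The measurement in steps 2 and 3 can be sketched as follows. This is an illustration under stated assumptions, not the script used in the experiment: toy_minify_css is a deliberately naive stand-in for YUI Compressor (it only strips comments and collapses whitespace, and is not safe for production CSS), and Python's gzip module at compresslevel=6 stands in for the gzip -6 command, so byte counts may differ slightly from the command-line tool.

```python
import gzip
import re


def reduction_percent(before, after):
    """Percentage by which a size shrank; 0.0 if there was nothing to shrink."""
    if before == 0:
        return 0.0
    return 100.0 * (before - after) / before


def toy_minify_css(css):
    """Naive stand-in for a real minifier: strip comments, collapse whitespace."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)
    return re.sub(r"\s+", " ", css).strip()


def minification_benefit(original, minified):
    """Return (uncompressed, compressed) size reductions as percentages."""
    raw_before = original.encode("utf-8")
    raw_after = minified.encode("utf-8")
    gz_before = gzip.compress(raw_before, compresslevel=6)
    gz_after = gzip.compress(raw_after, compresslevel=6)
    return (
        reduction_percent(len(raw_before), len(raw_after)),
        reduction_percent(len(gz_before), len(gz_after)),
    )


css = """
/* page header */
.header {
    color: #333;
    margin: 0 auto;
}
"""
print(minification_benefit(css, toy_minify_css(css)))
```

Measuring both the uncompressed and the compressed sizes matters because gzip already removes much of the redundancy that minification targets, so the two percentages can differ noticeably.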

A small number of URLs resulted in unusable data, due to a problem in the fetch script’s handling of servers that returned gzipped content in response to requests without an Accept-Encoding header. With these URLs excluded, 64,470 CSS URLs remained in the data set.

The 50th percentile benefit observed from applying minification was:

  • 17.9% reduction in the uncompressed CSS file size
  • 13.9% reduction in the compressed CSS file size

The following graphs show the distribution of minification benefit, from no benefit at the 0th percentile (corresponding to CSS files that were already minified) to a 100% reduction in download size at the other end of the scale (corresponding to CSS files that contained nothing but comments and/or whitespace).

Based on the data from this experiment, most websites could decrease their total CSS download size measurably by implementing CSS minification.

It is important to note, however, that CSS represents only a small fraction of the download time of a typical web page.  From the July 1, 2011 HTTP Archive trend report, CSS objects accounted for only 3.5% of the (uncompressed) weight of the average page surveyed. HTML and JavaScript were 4.6% and 16.9%, respectively, so reducing the size of those resources could yield a bigger win.  I plan to run follow-up experiments to measure the opportunity for HTML and JavaScript minification.