1.1 Serialization of HTTP requests
One reason why web pages download slowly is that most browsers implement HTTP requests serially on each persistent connection:
1. Open a connection to a server.
2. Send an HTTP request.
3. Wait for the response.
4. If there are more resources to be fetched from that server, repeat steps 2 and 3.
5. Eventually close the connection to the server.
The minimum amount of time needed to fetch a resource via HTTP is the network round trip time (RTT) between the client and the server. The good news is that, if the client reuses a persistent connection to the server, the actual elapsed time is very close to this lower bound of 1 RTT in some common scenarios: retrieving small, static image files, for example. The bad news is that the RTT is often a large number: even at the speed of light, it takes several tens of milliseconds to send a packet across a continent and back.
Given a set of n small, static objects, the time needed to fetch them all synchronously over a preestablished TCP connection is approximately:
n * RTT
(Note: For simplicity, this model assumes that the server’s initial TCP congestion window is large enough to avoid further round-trip delays to await ACKs during slow start. It also assumes that the time-to-first-byte as measured at the server is zero.)
For a web page containing 50 small images and a 100ms RTT, requesting all of the images serially would take five seconds. That’s too long.
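The arithmetic behind this estimate can be written as a minimal Java sketch. The class and method names are hypothetical, and the code simply encodes the simplified model above (one round trip per object, zero server think time, no slow-start stalls):

```java
final class SerialFetchModel {
    /**
     * Estimated time, in milliseconds, to fetch objectCount small objects
     * one after another over an already-open connection: n * RTT.
     */
    static long serialFetchMillis(int objectCount, long rttMillis) {
        return (long) objectCount * rttMillis;
    }

    public static void main(String[] args) {
        // 50 small images at a 100 ms RTT, the example from the text.
        System.out.println(serialFetchMillis(50, 100) + " ms"); // prints "5000 ms"
    }
}
```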
1.2 Current best practices for mitigation
In common practice, the elapsed time is less than n * RTT, because the developers of web browsers and web sites have taken some steps to reduce it:
- Multiple connections
Most modern web browsers will open up to six concurrent, persistent connections per server hostname. This reduces the elapsed time in the simplified model from n * RTT to n * RTT / 6.
- Domain sharding
Knowing that browsers maintain persistent connections on a per-server-hostname basis, some web applications partition their static content into a few different hostnames: http://images1.example.com/, http://images2.example.com/, etc.
With the content spread across C distinct scheme:host:port combinations, the elapsed time drops further, to n * RTT / (6 * C).
- Content distribution networks
A Content Distribution Network (CDN) stores cached copies of web content at many geographically distributed nodes and uses anycast DNS to send each client to a nearby node. This reduces the value of n * RTT by replacing a large RTT with a smaller one.
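To see how multiple connections and domain sharding combine, here is a small Java sketch of the simplified model. It assumes that every connection fetches one object per round trip and that requests are spread evenly over all connections. The rounding up to whole round trips is an assumption added here, and the class and parameter names are hypothetical:

```java
final class ParallelFetchModel {
    /**
     * Estimated time, in milliseconds, to fetch objectCount small objects
     * when the browser opens connectionsPerHost connections to each of
     * hostCount hostnames. Each "round" costs one RTT.
     */
    static long parallelFetchMillis(int objectCount, int connectionsPerHost,
                                    int hostCount, long rttMillis) {
        int slots = connectionsPerHost * hostCount;
        if (slots <= 0) {
            throw new IllegalArgumentException("need at least one connection");
        }
        // Ceiling division: a partly filled round still costs a full RTT.
        long rounds = ((long) objectCount + slots - 1) / slots;
        return rounds * rttMillis;
    }
}
```

Under these assumptions, 50 images at a 100 ms RTT take about 900 ms over six connections to one hostname, and about 500 ms when the same images are sharded across two hostnames.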
1.3 Example of unmitigated serialization latency
Despite these practices, however, serialized HTTP requests are still a significant factor in the slow page downloads of some websites. The following subset of a waterfall diagram, taken from the HTTP Archive, shows part of the work a browser must do to download the home page of a particular e-commerce website. (Because I am using this website as an example of slow page downloads, I have blurred the site name and URLs in the waterfall diagram to preserve the developers’ anonymity.)
Several things stand out in this waterfall:
- The client, MSIE 8, used 6 concurrent, persistent connections per server hostname.
- All of the requests in this sequence were for graphics. None of these requests depended on the response to any of the others (all the GIF URLs were specified in a CSS file loaded earlier in the waterfall). Thus, significantly, it would be valid for a client to download all of these images in parallel.
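- Many of these requests had elapsed times of approximately 125 milliseconds. That appears to be the RTT between the client and the server in this test. Thus this waterfall shows the serialization pattern described in Section 1.1: each request on a given connection costs approximately one RTT.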
- The amount of response data was quite small: a total of 25KB in about 1 second during this part of the waterfall, for an effective throughput of under 0.25 Mb/s. The client in this test run had several Mb/s of downstream network bandwidth, so the serialization of requests resulted in an inefficient utilization of the available bandwidth.
Given these observations, how might we speed up the downloading of these files? There are a few ways to reduce the download time by modifying the content:
- Domain sharding could reduce the elapsed time by a small constant factor. The total time for this section of the waterfall, though, was approximately 7*RTT. While partitioning the requests across 7 distinct hostnames could potentially reduce the total time to 1*RTT, the resulting 42 TCP connections per client might substantially increase the memory footprint of the server and any intermediate proxies or load balancers. (Because of the need to dedicate buffer space per TCP connection to accommodate the advertised receive window and the send window, adding TCP connections is a somewhat resource-intensive way to gain parallelism.)
- Spriting or inlining the images could reduce this section of the waterfall to a single HTTP request. These techniques have been used to good effect by some websites, but they are nontrivial to implement and maintain.
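The throughput and connection-count figures quoted above can be checked with a few lines of Java. This is only a back-of-the-envelope sketch: the class and method names are hypothetical, and it assumes 1 KB = 1,000 bytes and 1 Mb = 1,000,000 bits.

```java
final class WaterfallArithmetic {
    /** Effective throughput in megabits per second for a transfer. */
    static double effectiveMegabitsPerSecond(long bytes, long elapsedMillis) {
        double bits = bytes * 8.0;
        double seconds = elapsedMillis / 1000.0;
        return bits / seconds / 1_000_000.0;
    }

    /** TCP connections a client opens when content is sharded across hostnames. */
    static int totalConnections(int hostnames, int connectionsPerHostname) {
        return hostnames * connectionsPerHostname;
    }

    public static void main(String[] args) {
        // About 25 KB transferred in about 1 second, as observed in the waterfall.
        System.out.println(effectiveMegabitsPerSecond(25_000, 1_000) + " Mb/s");
        // 7 hostnames at 6 connections each.
        System.out.println(totalConnections(7, 6) + " connections");
    }
}
```

With these inputs the throughput comes out near 0.2 Mb/s, consistent with the "under 0.25 Mb/s" figure, and the sharded configuration needs 42 connections per client.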
1.4 Protocol-level strategies
An alternative approach is to seek more parallelism at a lower level, by changing the protocol rather than the content:
- The HTTP/1.1 specification allows a client to pipeline its requests. The responses arrive in order: first the entire response to the first request in the pipeline, then the entire response to the second request in the pipeline, and so on.
- The SPDY protocol, currently experimental but in production use on various Google websites, allows a client to issue arbitrarily many requests on the same connection. The responses arrive out of order and may be interleaved with each other; SPDY defines a framing protocol to allow chunks of multiple messages to be multiplexed onto the same connection.
There are barriers to the adoption of both HTTP pipelining and SPDY, however. Most current web browsers do not pipeline their HTTP requests, due in part to past experience with web servers or proxies that handled pipelining incorrectly. Those browsers that do implement pipelining, such as Opera, currently rely on heuristics to decide when pipelining is likely to be safe. And only one major web browser, Chrome, currently supports SPDY.
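As a rough illustration of what pipelining means on the wire, the following Java sketch builds the bytes a client could write to a single connection before reading any response. It covers only the request side. The class name, hostname, and paths are hypothetical, and a real client would also need to read the responses in order and cope with servers or proxies that close the connection mid-pipeline.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

final class PipelinedRequests {
    /**
     * Concatenates one minimal HTTP/1.1 GET request per path, back to back,
     * so they can be sent without waiting for the intervening responses.
     */
    static byte[] build(String host, List<String> paths) {
        StringBuilder out = new StringBuilder();
        for (String path : paths) {
            out.append("GET ").append(path).append(" HTTP/1.1\r\n")
               .append("Host: ").append(host).append("\r\n")
               .append("\r\n"); // blank line ends each request's headers
        }
        return out.toString().getBytes(StandardCharsets.US_ASCII);
    }
}
```

Calling `PipelinedRequests.build("images1.example.com", List.of("/a.gif", "/b.gif"))` yields two complete requests in one buffer. Under the simplified model of Section 1.1, the responses could then arrive after roughly one RTT rather than two.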
2. Empirical study of the opportunity for request parallelization
2.1 Definitions
Given a set of timed HTTP transactions comprising the download of a single web page by a browser, a serialized request sequence is a subset of HTTP transactions with the following properties:
- All the HTTP requests in the set are for the same scheme:host:port.
- Each transaction except the first must begin immediately upon the completion of some other transaction in the sequence.
- Each transaction except the last must have an HTTP response status of 2xx.
- Each transaction except the last must have a response content-type of image/png, image/gif, or image/jpeg.
These criteria are based on a simple heuristic: If a set of back-to-back HTTP transactions are requests for images served from the same site, the requests are assumed to have been serialized due to a scarcity of available persistent connections at the client.
The length of a serialized request sequence is the number of HTTP transactions in the sequence.
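The definition above can be turned into code in several ways. The following is one possible Java sketch, not the program used in this study. It treats a sequence as a chain in which each transaction starts within a small tolerance after its predecessor ends. The tolerance parameter, the `Transaction` type, and its field names are assumptions introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

final class SerializedSequences {
    /** One timed HTTP transaction. Field names are illustrative, not HAR field names. */
    static final class Transaction {
        final String origin;      // scheme:host:port
        final long startMs;       // time the request was sent
        final long endMs;         // time the response finished
        final int status;
        final String contentType;

        Transaction(String origin, long startMs, long endMs, int status, String contentType) {
            this.origin = origin;
            this.startMs = startMs;
            this.endMs = endMs;
            this.status = status;
            this.contentType = contentType;
        }
    }

    private static final Set<String> IMAGE_TYPES =
            Set.of("image/png", "image/gif", "image/jpeg");

    /** Every transaction except the last must be a 2xx image response. */
    static boolean mayHaveSuccessor(Transaction t) {
        return t.status >= 200 && t.status < 300 && IMAGE_TYPES.contains(t.contentType);
    }

    /** Length of the longest serialized request sequence, or 0 for an empty list. */
    static int longestSequence(List<Transaction> transactions, long toleranceMs) {
        List<Transaction> sorted = new ArrayList<>(transactions);
        sorted.sort(Comparator.comparingLong((Transaction t) -> t.startMs));
        int[] best = new int[sorted.size()]; // best[i]: longest chain ending at i
        int longest = 0;
        for (int i = 0; i < sorted.size(); i++) {
            Transaction cur = sorted.get(i);
            best[i] = 1;
            for (int j = 0; j < i; j++) {
                Transaction prev = sorted.get(j);
                long gap = cur.startMs - prev.endMs;
                boolean backToBack = gap >= 0 && gap <= toleranceMs;
                if (backToBack && prev.origin.equals(cur.origin) && mayHaveSuccessor(prev)) {
                    best[i] = Math.max(best[i], best[j] + 1);
                }
            }
            longest = Math.max(longest, best[i]);
        }
        return longest;
    }
}
```

The quadratic scan keeps the sketch short. Because a single page rarely involves more than a few hundred transactions, this is unlikely to matter in practice. How strictly to interpret "immediately" is a judgment call: a tolerance of a few milliseconds allows for timer granularity in the recorded data.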
2.2 Questions to be answered
Given:
- the pattern of serialized request sequences, as defined in Section 2.1,
- the negative impact of this pattern on page load speed,
- the possibility that additional parallelism, in the form of HTTP request pipelining or SPDY, could mitigate the performance impact,
- and the practical barriers to adoption of either pipelining or a new protocol,
it is useful to address two questions:
- How prevalent is the pattern?
If only 1% of real-world web pages had a longest serialized request sequence with length greater than 1, there would be no need for widespread deployment of protocol-level changes to increase request parallelism. At the other extreme, if 99% of pages contained long serialized request sequences, it would be worthwhile for HTTP client, server, and intermediary (proxy) developers to pursue protocol-level solutions.
- And when the pattern occurs, how bad is it?
If the longest serialized request sequence in a typical page had length 2, it might reasonably be fixed (i.e., reduced to an average value closer to 1) by increasing the size of browsers’ per-hostname connection pools. Conversely, if the longest serialized request sequence had length 10, it would provide an argument for a more radical solution such as pipelining.
2.3 Experiment design
Starting from the database of surveyed websites’ home pages from the HTTP Archive’s July 1, 2011 sample, I downloaded the HAR file for each page from webpagetest.org. Each of these HAR files contains a detailed log of all the HTTP transactions required to download the corresponding web page using MSIE 8 (a browser that uses up to 6 concurrent, persistent connections per server hostname and does not pipeline its HTTP requests). The HAR file format contains timing data for each HTTP request and response, with 1-millisecond resolution.
For each HAR file, I ran a simple Java program to find the longest serialized request sequence, using the heuristics described in Section 2.1.
2.4 Results
The following histogram shows the distribution of longest serialized request sequences among the 15,568 web pages in the sample set:
For 12% of the sites, the longest serialized sequence in the home page had a length of 1. These sites are already well optimized, at least with regard to parallelization.
For another 22% of the sites, the longest serialized request sequence had length 2. By applying additional parallelization efforts, one could potentially reduce the elapsed time for the affected requests by 1*RTT. The simplest and most compatible way to achieve the small amount of extra parallelism needed for these sites would probably be to implement domain sharding at the origin web applications.
For the remaining 66% of the sites, the longest serialized request sequence had length 3 or greater. Increasing the number of concurrent client-to-server connections by a factor of 3 or more to parallelize these requests (via either domain sharding on the server side or larger per-hostname connection pools on the client side) would probably be a poor solution, due to the per-TCP-connection memory footprint issues noted in Section 1.3. For these sites, it appears that parallelization of requests within the same TCP connection, via HTTP pipelining or SPDY, would be an effective solution.
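The three groups discussed above (length 1, length 2, and length 3 or greater) can be derived from the per-page results with a short bucketing step. The Java sketch below is an illustration only. The class and method names are hypothetical, and the input is assumed to hold one longest-sequence length per page.

```java
import java.util.List;

final class SequenceLengthBuckets {
    /**
     * Returns three percentages: pages whose longest serialized request
     * sequence has length 1, length 2, and length 3 or greater.
     */
    static double[] percentages(List<Integer> longestPerPage) {
        if (longestPerPage.isEmpty()) {
            throw new IllegalArgumentException("no pages to summarize");
        }
        int[] counts = new int[3];
        for (int length : longestPerPage) {
            if (length < 1) {
                throw new IllegalArgumentException("sequence length must be at least 1");
            }
            counts[Math.min(length, 3) - 1]++; // lengths of 3 or more share the last bucket
        }
        double[] result = new double[3];
        for (int i = 0; i < 3; i++) {
            result[i] = 100.0 * counts[i] / longestPerPage.size();
        }
        return result;
    }
}
```

For a made-up input of four pages with lengths 1, 2, 2, and 5, the method returns 25%, 50%, and 25%.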
3. Conclusions
Based on the results in Section 2.4, about two thirds (66%) of the home pages in the HTTP Archive sample contain request patterns that may be good candidates for acceleration through protocol-level parallelization.