Referring URLs and web privacy

Some writers at the WSJ recently “discovered” that one of the longstanding conventions of the web, the referring URL, can leak unexpected information if websites put things like user names in their URLs. Although my first thought was, “that’s not news,” Facebook appears to have responded with some technical changes that protect referrer data better than the industry norm.

Quick technical background

  • Assume you’re on a web page http://www.Site-A.com/origin-page.html
  • There’s a link on that page that goes to http://www.Site-B.net/destination-page.html
  • You click on the link.
  • Your web browser finds the web server for www.Site-B.net and sends that server a request for /destination-page.html.
  • The web browser usually also sends the www.Site-B.net web server the URL of the referring page: http://www.Site-A.com/origin-page.html
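
To make the steps above concrete, here is a minimal Python sketch of roughly what the browser's request to site B looks like on the wire. The helper name build_request is made up for illustration, and a real browser sends many more headers than this.

```python
from urllib.parse import urlsplit


def build_request(destination_url: str, referring_url: str) -> str:
    """Return the text of a minimal HTTP/1.1 GET request (illustrative only)."""
    parts = urlsplit(destination_url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {parts.netloc}",
        # The header name really is spelled "Referer" in the HTTP spec.
        f"Referer: {referring_url}",
        "",  # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines)


request_text = build_request(
    "http://www.Site-B.net/destination-page.html",
    "http://www.Site-A.com/origin-page.html",
)
print(request_text)
```

The Referer line carries the full URL of the origin page, not just the name of site A.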

In short, when you’re on a web page on site A and you click on a link to site B, site B normally knows:

  • That you came from site A,
  • and the URL of the specific page on site A.

How is referring URL data used?

The web server for site B often just ignores the referring URL. Occasionally it may serve up different content based on the referring URL. But the most common practice I’ve seen is for the receiving website to study the referring URL information in aggregate to analyze traffic sources. “Last month we saw a big jump in the number of users reaching our home page from referring URLs that look like search result pages,” a marketing manager might say, “so our SEO efforts must be working.”
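
As a rough illustration of that kind of aggregate analysis, the sketch below tallies visits by referring host. It assumes the referring URLs have already been pulled out of the server logs into a list; the function name and the sample URLs (including search.example) are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_referring_hosts(referring_urls):
    """Count how many visits came from each referring host."""
    counts = Counter()
    for url in referring_urls:
        # .hostname is lowercased by urlsplit and is None for an empty URL.
        host = urlsplit(url).hostname
        # Requests with no Referer at all are common; group them together.
        counts[host or "(none)"] += 1
    return counts


sample = [
    "http://www.Site-A.com/origin-page.html",
    "http://www.Site-A.com/another-page.html",
    "http://search.example/results?q=widgets",
    "",
]
counts = count_referring_hosts(sample)
for host, visits in counts.most_common():
    print(host, visits)
```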

What’s dangerous about referring URLs?

If the referring URL contains sensitive data, that data will be visible to the destination site’s web servers. E.g., if the page

http://www.social-network.com/users/BrianPane

links to an article at

http://news-website.com/12345.html

then the people who run the web servers for news-website.com need only read their server logs to find the name of the social network user who linked to their article.

If you work for an organization with an internal Wiki, you might have a page for your secret new project

http://wiki.my-company.com/ProjectPhoenix

that links to the competing product that you plan to crush:

http://www.my-competitor.com/products/Widget-2000.html

Your competitor need only study the referring URLs in their server logs to get the hint that you’re working on something called “Project Phoenix” that has something to do with their Widget-2000 product.
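
Here is a small Python sketch of how little work that takes on the receiving end. It assumes the referring URL has already been read from a log line; describe_referrer is a hypothetical helper, not part of any logging tool.

```python
from urllib.parse import unquote, urlsplit


def describe_referrer(referring_url):
    """Split a referring URL into the pieces a curious log reader would look at."""
    parts = urlsplit(referring_url)
    # Drop empty segments produced by leading or trailing slashes.
    segments = [unquote(s) for s in parts.path.split("/") if s]
    return {"host": parts.hostname, "path_segments": segments}


wiki_hit = describe_referrer("http://wiki.my-company.com/ProjectPhoenix")
profile_hit = describe_referrer("http://www.social-network.com/users/BrianPane")
print(wiki_hit)
print(profile_hit)
```

The host says who is looking, and the path segments say what they were looking at when they clicked.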

As early as 1996, the people writing the HTTP specification (the technical standard describing how web browsers talk to web servers) recognized the potential privacy problems with referring URLs. RFC 1945, which specified HTTP/1.0, introduced a guideline that has remained in all subsequent versions of the spec:

“Because the source of a link may be private information or may reveal an otherwise private information source, it is strongly recommended that the user be able to select whether or not the Referer field is sent. For example, a browser client could have a toggle switch for browsing openly/anonymously, which would respectively enable/disable the sending of Referer and From information.”

Two things are noteworthy here:

  • Yes, the spec uses a misspelling of “referrer.” A programmer working on one of the earliest web browsers misspelled the word, and the misspelling became a de facto standard and then part of the official spec.
  • As far as I know, none of the major web browsers has implemented the “strongly recommended” switch to disable the sending of referring URLs.

Defending against data leakage

In the absence of widespread browser support for hiding referring URLs, people who build websites can defend against data leakage on the server side.

One solution is to design the site so that the URLs don’t provide any externally meaningful information. Webmail services, for example, typically are designed so that no URL conveys information about who the user is. (If you use a webmail service and find that your username or user ID shows up in the service’s URLs, it’s probably time to switch providers.)

That solution isn’t always feasible, though. In social networking websites, for example, it has become popular to put a human-readable user name in URLs. In cases where the referring URL necessarily contains sensitive data, it is possible to keep most browsers from sending that sensitive URL to the destination site’s web servers. The technique for doing this is an old trick:

  • On Site A, instead of linking directly to http://www.Site-B.net/destination-page.html, link to http://www.Site-A.com/some-URL-that-gives-away-no-user-data.html
  • The page http://www.Site-A.com/some-URL-that-gives-away-no-user-data.html contains (nothing but) an HTML meta tag that redirects immediately to http://www.Site-B.net/destination-page.html
  • The web server for www.Site-B.net sees a referring URL of http://www.Site-A.com/some-URL-that-gives-away-no-user-data.html
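
The intermediate page in that trick can be sketched in a few lines of Python. This is only an illustration of the meta-refresh idea, not Facebook's actual implementation; build_redirect_page is a hypothetical helper, and serving the page from Site A is left out.

```python
from html import escape


def build_redirect_page(destination_url: str) -> str:
    """Return an HTML page whose only job is to redirect to destination_url."""
    # Escape the URL so it cannot break out of the attribute value.
    safe_url = escape(destination_url, quote=True)
    return (
        "<!DOCTYPE html>\n"
        "<html><head>\n"
        f'<meta http-equiv="refresh" content="0; url={safe_url}">\n'
        "</head><body></body></html>\n"
    )


page = build_redirect_page("http://www.Site-B.net/destination-page.html")
print(page)
```

A site using this would presumably also want to limit which destinations the page will redirect to, so that it cannot be used to bounce visitors to arbitrary URLs. That is my assumption rather than something described above.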

It appears that Facebook is now using this trick for all links that go offsite, including ads and links in user-generated content. Thus, while the WSJ’s article is arguably alarmist, it seems to have helped push Facebook to deploy more rigorous referrer protection than has been common in the industry.

2 comments to Referring URLs and web privacy

  • Brian, there’s another name for the “trick” that Facebook is using, the one that they no doubt had to invest thousands of engineering hours on. It’s called “click tracking,” and any budget ad serving solution offers it out of the box, including my perennial favorite, OAS. ;)

  • P.S. I’ll give you one guess how easy it is to turn on click tracking for 3rd party served ads…
