
Shedding some light on "dark social"


Recently, The Atlantic had an article about the weight of "dark social"—inbound links without a referer header. I fundamentally agree with the conclusion of the piece that the primary way to optimize for social spread is to focus on the content itself. The data referenced in the article goes on to state that somewhere between 56% and 69% of inbound links across a broad set of sites don't have a referer. That in and of itself is plausible. However, the author then goes on to suggest that all of this traffic must be coming from email or IM.

I find this highly dubious. There must be some other sources of refererless traffic besides just email and IM.

Over on HN and Reddit the commentary isn't critical of these numbers or assertions.

So, what else could be contributing to this source of traffic?

The rise of link shorteners, redirects, and mobile

Content publishers and the broader web have become completely enmeshed with the collection of analytics. One of the ways this collection manifests itself is through link shorteners. So, what effect do link shorteners have on this mysterious dark social? How do they interact with the server when consumed via browser? Via HTTPS? Via a native mobile app?

Collecting some data

Since we are measuring the ever elusive presence of dark social, I set up a Large Referer Collider endpoint that displays the referer header for a request. I'll test posting the direct link along with some of the more popular link shorteners: t.co, bit.ly, goo.gl, ow.ly, and tinyurl.com. Along with these, I'll try three chained shortened links: bit.ly to t.co; bit.ly to t.co to bit.ly; and bit.ly to goo.gl (under the covers these end up being chained HTTP 301s and 302s). Then, I'll post these links to, and click on them from, the two biggest contributors of social traffic: Facebook and Twitter. I'll also take a look at what happens with GChat.
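For anyone who wants to run the same kind of test, here is a minimal sketch of an endpoint that echoes the referer back. It is not the actual Large Referer Collider; the names describe_referer, RefererHandler, and serve, along with port 8000, are assumptions made for illustration, and it uses only Python's standard library.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def describe_referer(headers):
    """Return a one-line description of the Referer header, if any."""
    referer = headers.get("Referer")
    if referer:
        return "referer: " + referer
    return "no referer"


class RefererHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with the referer it saw."""

    def do_GET(self):
        body = (describe_referer(self.headers) + "\n").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the echo endpoint until interrupted (the port is an arbitrary choice)."""
    HTTPServer(("", port), RefererHandler).serve_forever()
```

Calling serve() starts the endpoint. Any link that eventually lands on it will then show whichever referer the client chose to send, or "no referer" if it sent none.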

In order to best simulate what's probably happening for most users, in clicking and testing all of these links I'm not going to be doing things like clearing my cache, opening and closing the browser, forcing a new tab, etc. We want to find places the referer is absent in the course of an ordinary browsing session. On Mac I'm using Safari Version 6.0 (7536.25), Chrome Version 22.0.1229.94, and Firefox 14.0.1; on mobile, Android ICS 4.0.4 and iOS 5.1. For the browser tests, I'm using the HTTPS version of the site.

Some semblance of a control group

First, let's just click all of these links directly from this page right here. Let's be sure we're seeing what we expect to see as the referer.

Safari | Chrome | Firefox | Android | iOS
direct
t.co
bit.ly
goo.gl
ow.ly
tinyurl
bit.ly → t.co
bit.ly → t.co → bit.ly
bit.ly → goo.gl
table 1

Looks good and what we'd expect.

Next, let's just copy and paste these links directly into the location bar. Let's be sure we're seeing that there is, in fact, no referer.

Safari | Chrome | Firefox | Android | iOS
direct
t.co
bit.ly
goo.gl
ow.ly
tinyurl
bit.ly → t.co
bit.ly → t.co → bit.ly
bit.ly → goo.gl
table 2

Wait, wait... wat?

Ok! Well! That's interesting. It appears that any time t.co is in the chain of links the server is getting a referer header set to t.co.

I may have just discovered anti-dark social: links that shouldn't have a referer but are contributing to analytics nonetheless. This finding is, of course, subject to independent verification and various publications in peer reviewed journals. For my own sanity, here's a screen capture of using Chrome to copy and paste the ow.ly link, the bit.ly to goo.gl link, and the bit.ly to t.co to bit.ly link.

I'll leave it as an exercise for the reader to see what's going on with t.co headers.

Facebook

Next, let's post these links to Facebook and click on them from there. On Facebook we can post links as status updates as well as in the comments to a status update. Let's start with a link posted as a status update. The Android and iOS columns reflect clicks made inside the official native Facebook app.

Safari | Chrome | Firefox | Android | iOS
direct
bit.ly
goo.gl
ow.ly
tinyurl
bit.ly → t.co
bit.ly → t.co → bit.ly
bit.ly → goo.gl
table 3

Looks good!

(N.B.: The original article states "whenever someone is moving from a secure site [...] to a non-secure site [the referer is absent]". However, all of these links were tested from Facebook's HTTPS site, and the referer is present every time.)
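The rule quoted from the original article can be written down in a few lines. This is only a sketch of the rule as stated (omit the referer when moving from HTTPS to plain HTTP), not a model of what any browser or site actually does; the function name and the example URLs are made up for illustration.

```python
from urllib.parse import urlsplit


def sends_referer_under_downgrade_rule(from_url, to_url):
    """Return False only for a secure-to-non-secure hop, True otherwise."""
    src = urlsplit(from_url).scheme.lower()
    dst = urlsplit(to_url).scheme.lower()
    return not (src == "https" and dst == "http")


# Hypothetical URLs, used only to exercise the rule.
print(sends_referer_under_downgrade_rule("https://www.facebook.com/", "http://example.com/"))
print(sends_referer_under_downgrade_rule("https://www.facebook.com/", "https://example.com/"))
```

The first call prints False and the second prints True. The tables in this post are observations, so they are free to disagree with what the rule predicts.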

On to the links posted as comments. In the comments section the user has two options: 1) click the link that was part of the comment, or 2) click the section that's rendered from Open Graph data. Since the Large Referer Collider doesn't have any Open Graph data, Facebook just shows the link twice (except for the direct link, which isn't rendered twice). Let's take a look at each one; the Open Graph rendered link will be denoted with a ′.

Safari | Chrome | Firefox | Android | iOS
direct
bit.ly
bit.ly ′
goo.gl
goo.gl ′
ow.ly
ow.ly ′
tinyurl
tinyurl ′
bit.ly → t.co
bit.ly → t.co ′
bit.ly → t.co → bit.ly
bit.ly → t.co → bit.ly ′
bit.ly → goo.gl
bit.ly → goo.gl ′
table 4

What the!?? It looks like WebKit is doing something weird on the desktop. Alrighty then! (For whatever it's worth: I tried each of these links on an older version of Safari and always got a referer.)

Twitter

Next, let's post all of these links to Twitter and click on them. I've folded the t.co link in with the direct link since the rendered page as well as the native clients unroll t.co links as direct links. The columns for Android and iOS are through the context of the official native Twitter app. On Android a click within the native app causes the default system wide browser to open; on iOS a click within the native app pushes a new view inside the official app, rather than opening the system-wide Safari.

Safari | Chrome | Firefox | Android | iOS
direct & t.co
bit.ly
goo.gl
ow.ly
tinyurl
bit.ly → t.co
bit.ly → t.co → bit.ly
bit.ly → goo.gl
table 5

It appears that the embedded UIWebView rendering links in the native iOS Twitter app doesn't pass the referer on to the server if there's more than one redirect in the link (since every link posted to Twitter gets wrapped in a t.co link, every other shortened link will encounter at least one extra 301/302).
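To make the hop counting concrete, here is a small offline sketch that walks a redirect table and counts the redirects. The short links and the destination are hypothetical placeholders, not the URLs used in the tests, and the helper follow_redirects is an illustration rather than anything Twitter or a browser runs.

```python
def follow_redirects(start, redirects, max_hops=10):
    """Return the list of URLs visited, starting at `start`, following `redirects`."""
    chain = [start]
    seen = {start}
    while chain[-1] in redirects:
        nxt = redirects[chain[-1]]
        if nxt in seen or len(chain) > max_hops:
            raise ValueError("redirect loop or too many hops")
        chain.append(nxt)
        seen.add(nxt)
    return chain


# Hypothetical redirect table: a t.co wrapper around a bit.ly link,
# and a t.co wrapper around the direct link.
redirects = {
    "https://t.co/aaa": "http://bit.ly/bbb",
    "http://bit.ly/bbb": "http://example.com/collider",
    "https://t.co/ccc": "http://example.com/collider",
}

wrapped_direct = follow_redirects("https://t.co/ccc", redirects)
wrapped_bitly = follow_redirects("https://t.co/aaa", redirects)
print(len(wrapped_direct) - 1)  # redirects for a direct link posted to Twitter
print(len(wrapped_bitly) - 1)   # redirects for a bit.ly link posted to Twitter
```

This prints 1 and then 2: the direct link passes through a single redirect, while any already-shortened link passes through at least two, which is the case where the iOS app appeared to drop the referer.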

The same note about going from Twitter's HTTPS site to a link applies here: the referer is present in all browsers.

GChat

Let's see what happens with links over GChat when consumed with a browser. First let's see what happens when clicked from within Gmail (i.e. https://mail.google.com/).

Safari | Chrome | Firefox
direct
t.co
bit.ly
goo.gl
ow.ly
tinyurl
bit.ly → t.co
bit.ly → t.co → bit.ly
bit.ly → goo.gl
table 6

This result is interesting since, when clicking on the links over HTTPS from Facebook and Twitter, the referer was present in most cases.

Next, let's see what happens when we click on the links from Google+ (i.e. https://plus.google.com/).

Safari | Chrome | Firefox
direct
t.co
bit.ly
goo.gl
ow.ly
tinyurl
bit.ly → t.co
bit.ly → t.co → bit.ly
bit.ly → goo.gl
table 7

Interesting: the direct link doesn't have a referer, but the shortened links do.

Findings and conclusions

  1. Anti-dark social: t.co links appear to be presenting a referer header when there shouldn't be one.
  2. Contributing to dark social: comments on Facebook, when clicked on with a WebKit (Safari and Chrome) browser, don't send a referer header.
  3. Contributing to dark social: chained redirects in Twitter's official app on iOS.
  4. In testing links from Facebook and Twitter over HTTPS the referer is present in most cases.
  5. GChat links present a referer over https://plus.google.com/ on desktop browsers.

These mechanisms driving refererless traffic are probably pretty small in the grand scheme of things. So, the big question in my mind still remains: what other ordinary interactions, besides email and IM, are driving refererless traffic? What about RSS readers? Apps like Flipboard? Apps like Instapaper? And, more importantly, what glorious hack is going to emerge to measure all of these sources?
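As a starting point for that kind of measurement, here is a rough sketch that buckets inbound requests by referer host and reports the refererless share. The function names and the sample values are invented for illustration; real numbers would come from your own server logs.

```python
from collections import Counter
from urllib.parse import urlsplit


def referer_bucket(referer):
    """Map a raw Referer value to a host name, or to a placeholder bucket."""
    if not referer:
        return "(none)"
    host = urlsplit(referer).hostname
    return host or "(unparseable)"


def tally(referers):
    """Count requests per referer bucket."""
    return Counter(referer_bucket(r) for r in referers)


def refererless_share(referers):
    """Fraction of requests that arrived with no referer at all."""
    counts = tally(referers)
    total = sum(counts.values())
    return counts["(none)"] / total if total else 0.0


# Made-up sample: None and "" both stand for a missing header.
sample = ["https://t.co/abc", None, "", "https://www.facebook.com/", None]
print(tally(sample))
print(refererless_share(sample))
```

With this made-up sample, three of the five requests land in the "(none)" bucket. A tally like this cannot say where the refererless traffic comes from, only how much of it there is.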

The conclusion we can draw from this data is one we all already know: the web, and the internet in general, is a giant hodgepodge. Browsers, implementers of HTTP, proxies, crawlers, bots, and native apps are all just doing their own thing. Most of the time things are in good agreement, but along the edges you're probably going to see what you don't expect.

Well, that was fun and all but I have to go finish up my research on faster than light SSL terminators. Until next time!

If you enjoyed reading this, catch me on Twitter