Principles of measuring connection speed

Today I finally have the time to continue with the connection speed measurements. For those joining the discussion only now, first read part 1 and part 2. See also the draft spec, especially its metered property.

I’m trying to distill a few principles for measuring the connection speed. I give them in the form of statements; be sure to speak up if you disagree with one of them.

A later article will return to the nasty details of reading out the connection speed. Today we’re only concerned with broad outlines.

1. It’s useful

Principle 1: Despite the high level of uncertainty involved, it’s useful to try to read out the connection speed.

If I didn’t think it useful I wouldn’t write so many posts about this topic. Duh.

Seriously, though: does the fact that connection speed readings can go wrong on many levels mean that they can never give useful information?

The key point here is that we need an idea of the uncertainty involved. If that uncertainty is low, we have a reasonably accurate speed reading that we can use to make our sites more responsive.

To me, getting useful information in some cases is sufficient reason to add connection speed measurements to the web developers’ arsenal.

2. The user has the final say

Principle 2: Users should have the option of overriding the entire system. Web developers should obey the user.

I don’t think many people will disagree with this one. Users should be able to indicate they always want to receive the high-res version, even if they’re on a slow connection. Or vice versa.

Still, I will argue in a moment that we should send all information to the web developer, and not just a conclusion. So in theory the web developer could still overrule the user. She shouldn't, though.
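
In code the principle is a one-liner. A minimal sketch, where Quality and userChoice are invented names standing in for whatever override mechanism a browser would actually expose; none of this comes from any spec:

```typescript
// "Quality" and "userChoice" are invented names, not part of any spec.
type Quality = 'low' | 'high';

// Whatever the speed heuristics concluded, an explicit user choice replaces it.
// This covers both directions: forcing high-res on a slow connection,
// or forcing low-res on a fast one.
function applyUserOverride(computed: Quality, userChoice?: Quality): Quality {
  return userChoice ?? computed;
}
```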

3. Give the web developer information, not conclusions

Principle 3: Web developers should get full information about the connection instead of a conclusion generated internally by the browser.

This requires some explanation. Suppose the user starts up her phone and surfs to your site over a 3G connection.

Now the browser could use internal heuristics to decide what connection speed to report. It could also just dump the entire series of data points it has gathered onto the web developer's plate. I prefer the latter.

Browser decision

The browser could try to reason it out:

  1. The user is on 3G, and the default connection speed for 3G is slow.
  2. The actual connection speed is decent, but the sample is really too small to be trustworthy.
  3. The fact that the user wants high quality over wifi does not apply to the current situation.
  4. So I’ll report the current connection speed as slow, just to be on the safe side.

The web developer only receives the result of this calculation: slow.
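
To make that reasoning concrete, here is a rough sketch of what such an internal heuristic could look like. Every name, type, and threshold in it is invented for illustration; no browser implements or exposes anything like this:

```typescript
// Hypothetical browser-internal heuristic; every name, type, and threshold
// here is invented for illustration.
type ConnectionType = '2g' | '3g' | '4g' | 'wifi';

interface ConnectionSample {
  type: ConnectionType;  // e.g. '3g'
  measuredKbps: number;  // speed measured so far on this connection
  sampleBytes: number;   // how much data that measurement is based on
}

interface UserPreference {
  highQualityOn?: ConnectionType; // e.g. 'wifi': always send the high-res site on wifi
}

// Defaults the browser might assume per connection type.
const DEFAULT_IS_SLOW: Record<ConnectionType, boolean> = {
  '2g': true, '3g': true, '4g': false, 'wifi': false,
};

const MIN_TRUSTWORTHY_BYTES = 100_000; // below this the sample is "really too small"

function reportSpeed(sample: ConnectionSample, pref: UserPreference): 'slow' | 'fast' {
  // 3. The user's preference only applies to the connection type it was set for.
  if (pref.highQualityOn === sample.type) return 'fast';

  // 2. A decent measurement only counts if the sample is large enough to trust.
  if (sample.sampleBytes >= MIN_TRUSTWORTHY_BYTES) {
    return sample.measuredKbps >= 1000 ? 'fast' : 'slow';
  }

  // 1 & 4. Otherwise fall back to the default for this connection type,
  // just to be on the safe side.
  return DEFAULT_IS_SLOW[sample.type] ? 'slow' : 'fast';
}

// The scenario above: 3G, a decent reading, a tiny sample, a wifi-only preference.
reportSpeed({ type: '3g', measuredKbps: 2000, sampleBytes: 20_000 },
            { highQualityOn: 'wifi' }); // → 'slow'
```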

Web developer decision

The browser could also send the entire data set to the web developer and wait for instructions: the connection type (3G), the measured speed, the uncertainty of that measurement, and the user's preferences.

This would require the web developer to write her own heuristics for dealing with this situation. In most situations I'd say she'd combine 3G and high uncertainty to decide on the low-quality site, but the point here is that she could decide otherwise if she has good reason.

The disadvantage is that we need a rather large number of properties; see the data points above. I'd like to remove some, but I'm not yet sure which ones.
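
Again a sketch rather than a proposal: the ConnectionInfo shape and its property names are invented (apart from metered, which the draft spec mentions), and the thresholds are arbitrary. It merely shows that a developer-side heuristic can stay small:

```typescript
// Hypothetical data set a browser might hand over; apart from metered,
// every property name here is invented.
interface ConnectionInfo {
  type: '2g' | '3g' | '4g' | 'wifi';
  measuredKbps: number;              // speed measured on the current connection
  uncertainty: number;               // 0 (very sure) to 1 (pure guesswork)
  metered: boolean;                  // the property the draft spec mentions
  userPrefersHighQualityOn?: string; // e.g. 'wifi'
}

// One possible heuristic; another developer could reasonably decide otherwise.
function chooseQuality(info: ConnectionInfo): 'low' | 'high' {
  // Principle 2: the user's explicit wish always wins.
  if (info.userPrefersHighQualityOn === info.type) return 'high';

  // On a metered connection, err on the side of the user's wallet.
  if (info.metered) return 'low';

  // 3G plus high uncertainty: assume slow, as in the example above.
  if (info.type === '3g' && info.uncertainty > 0.5) return 'low';

  return info.measuredKbps >= 1000 ? 'high' : 'low';
}
```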

4. Individual readings may differ

Principle 4: Every individual connection speed reading should reflect the current situation.

Thus, if the user starts browsing in a high-speed area and then enters a tunnel where the speed drops dramatically before the browser downloads the style sheet with speed media queries, the HTTP header should report a fast connection and the media queries a slow one.

This is in line with the previous principle of dumping all information on the web developer’s plate, and leaving it to her to make sense of it.

JavaScript readings can easily be compared to each other, and you can always add the HTTP header reading as another JavaScript variable. Media query readings, unfortunately, are hard, if not impossible, to compare to other readings. Maybe connection speed media queries aren’t such a good idea after all.
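
As a sketch of what comparing such readings could look like: assume the server echoes the hypothetical connection-speed request header back into the page as a global, and that a later reading is available through an equally hypothetical JavaScript property:

```typescript
// Everything here is hypothetical: the header, the injected global, and
// navigator.connection.speed do not exist in any browser today.

// The server could echo the connection-speed request header back into the
// page, so the value it saw at request time stays available to scripts:
//
//   <script>window.headerSpeedKbps = 5000;</script>

function currentSpeedKbps(): number | undefined {
  // Imagined later reading from an imagined JavaScript API.
  return (navigator as any).connection?.speed;
}

function speedDroppedSinceLoad(): boolean {
  const atLoad = (window as any).headerSpeedKbps as number | undefined;
  const now = currentSpeedKbps();
  if (atLoad === undefined || now === undefined) return false; // nothing to compare
  return now < atLoad / 2; // e.g. the user has just entered that tunnel
}
```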

Empowering web developers

The last two principles are about empowering web developers by giving them all information that is available. This has advantages as well as disadvantages.

The disadvantage is information glut. Take another look at the data points listed above, and imagine having to muddle through all of them to take a decision.

Still, if a system like this is ever implemented, it won’t be long before clever web developers start writing tutorials and libraries to help the newbies out with their decisions.

The advantage of empowering web developers is that change occurs at a much faster rate. Suppose that research discovers that a user with the settings I sketched above would really prefer the high-res site. Now the decision heuristics should be updated.

If web developers take the decision themselves they can simply amend their scripts and libraries, and the update will take place in a matter of weeks. Conversely, if browser vendors were involved we'd have to wait for the next upgrade to the browser, which is likely the same as the next upgrade to the OS. It might take months. That's a clear advantage of decentralizing this sort of decision.

Finally, if browser vendors were to take connection speed decisions themselves, it wouldn't be long before we discovered that they take different decisions in the same situation. Browser compatibility tables, premature hair loss, and so on.

So all in all I feel that empowering web developers to take connection speed decisions is the way to go.


Comments


1 Posted by tom jones on 7 November 2012 | Permalink

"Still, I will argue in a moment that we should send all information to the web developer, and not just a conclusion. So in theory the web developer could still overrule the user. "

the developer can *always* override the user (and send either high res or low res), as this is only an (optional) header that merely suggests a preference.

i think this is getting too complicated for headers and media queries, for a simple default use case. i think the difference should be split between headers and media queries on one side, and javascript on the other.

headers (and media queries) should only have one high-level parameter, with maybe three possible values: high, medium and low bandwidth, that can be (simply) detected by the browser, or overridden by the user.

all the other details that you list that can be useful in more specific use cases should still be accessible from javascript, where a more sophisticated algorithm can adjust to a constantly-changing situation.

2 Posted by Michael. on 7 November 2012 | Permalink

As a user I object to 3 "Give the web developer information, not conclusions". The main reason is privacy (I object to information leakage). As a (minor) developer, I would also say "what do I want all that information for?" but you've sort of covered that.

Also, the trouble with using language (e.g. "slow", "very-slow") is that definitions change. What was once blisteringly fast is now too slow for most people. Instead, I prefer numerical values. USB 2 and USB 3 make a lot more sense than USB Hi-Speed and USB Super-Speed (what's USB 4 going to be? USB Extra-Super-Speed?).

3 Posted by Ron Waldon on 7 November 2012 | Permalink

I agree that this information would be great to have. I have been puzzling out a solution to this too, and I think your addition of an "uncertainty" reading is the missing piece.

This reminds me of the GeoLocation API. When you make a request to it, you can tell it that you want the calculations to be based on data that is 5 seconds old, 30 seconds old, 10 minutes old, etc. When you get the results back, there's an accuracy value to indicate how fuzzy the results are. Is there any more than inspiration to be drawn from here?

4 Posted by Lee Kowalkowski on 8 November 2012 | Permalink

Principles 2 & 4 are good, as are your intentions. Principle 3 is a potential digital fingerprint that some users will insist be disabled.

However, I can't get past the notion that if this capability existed, the majority of websites that use it will only do so to essentially slow down my fast connection with unnecessary rubbish. Cue endless blog posts and magazine articles instructing users how to configure their devices to speed up their browsing experience by overriding this system.

I can't see very many website developers caring about connection speed, only device capabilities like resolution (connection speed is not a device capability, it's a network capability - knowing the connection speed tells you nothing about the device - any device can have a USB2 3G wireless adapter).

Websites where connection speed is critical to the experience (e.g. streaming video or gaming), would already have to dynamically measure goodput to ensure usability, not throughput or connection speed. As goodput is measured at the application level, I don't see how these websites would benefit from the extra information at all, except metered - that might be useful. The browser itself will not be capable of measuring goodput, although the server might, depending on the application, but it would have to use application-specific data, nothing that a generic specification could provide.

So principle 1, no, connection speed is not useful, but goodput is. Not because connection speed might be inaccurate, but because it's meaningless, just like knowing the speed of the processor, or whether the device has batteries. It's not telling you what you need to know. You'll just be (ab)using the connection speed to make other assumptions.

5 Posted by Philip Tellis (@bluesmoon) on 9 November 2012 | Permalink

Hi ppk,

I'm joining this conversation late (missed your earlier two posts), so these comments are related to the entire series.

We'd done a lot of research on connection speed at Yahoo! starting from back in 2006. It involved downloading several images in sequence to calculate network throughput. The research was added to the load time measurement library in use at Yahoo!, which we opensourced in 2010 as boomerang.

I was and still am the lead developer of boomerang, and you can find the most current version of the code on my github page (bluesmoon/boomerang).

You can read the source (or the slides for one of the many talks I've done on the topic) for the details, but at the high level, we capture a few important things:
1. Network latency between client and server
2. Margin of Error in this latency measurement
3. Network throughput between client and server
4. Margin of Error in this measurement.

The 2nd and 4th tell us how stable the link is. The 1st gives us an estimate of how far away the user is, but also the kind of connection they're on, and we can use this to get an actual or corrected speed. The 3rd tells us an order of magnitude of the user's network speed.

Note that all measurements are to the server serving content. This may not be what you're proposing here, since the assumption you're making is that it's the last mile that's the most unstable part of the infrastructure and that's the only part we should measure. This may or may not be true.