The HTTPS-only experience

EFF recently announced that “We’re Halfway to Encrypting the Entire Web.” As a celebration of this achievement I’ve started an experiment: as of yesterday, no unencrypted HTTP traffic reaches this machine*.

Experience

Even though securing your web site is easier than ever, it will take some time before everybody encrypts. Plenty of sites support both secure and insecure HTTP in the meantime, though, and there are a few tricks to tilt the scales and make the current experience better:

  • The HTTPS Everywhere browser extension ensures that you use HTTPS whenever possible on thousands of sites.
  • Editing the URL to add “https://” at the start works for some sites. If you get the dreaded certificate error page, make sure to report it, and only add an exception if … well, that subject is too big to get into here. Just don’t.

Many sites have excellent HTTPS support and have enabled HTTP Strict Transport Security (HSTS). In short, if your browser knows that the domain supports HTTPS (either from an earlier visit or because the domain ships on the browser’s built-in preload list), you can simply type the domain name in the URL field and the browser will default to fetching the secure version of the site.
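If you are curious whether a particular site has HSTS enabled, one quick way to check is to look for the Strict-Transport-Security response header, for example with curl (the domain below is just a placeholder):

# Print the HSTS header, if any; replace example.com with the site to check.
curl -sI https://example.com/ | grep -i '^strict-transport-security'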

pacman stopped working after setting this up. It turns out the package database is fetched over unencrypted HTTP by default, but it was easy to generate a new mirror list containing only HTTPS mirrors.
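One way to do that on Arch is with the optional reflector package; a rough sketch (the mirror count and sorting are just examples, adjust to taste):

# Back up the old mirror list, then write a new one with HTTPS mirrors only.
sudo cp /etc/pacman.d/mirrorlist /etc/pacman.d/mirrorlist.bak
sudo reflector --protocol https --latest 20 --sort rate --save /etc/pacman.d/mirrorlist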

Some sites have a strange HTTPS setup. The BBC only serves HTTPS for the front page, which is just weird. Why go to all that trouble for 1% of the gain? Other sites only offer HTTPS to authenticated users, possibly not realising that setting up HTTPS for everyone would be easier.

These have been the only major hassles so far. The sites I really can’t get to work over HTTPS are hold-outs like Wikia and the BBC.

Setup

The change was a simple addition to my Puppet manifest:

firewall { '100 drop insecure outgoing HTTP traffic':
  chain  => 'OUTPUT',
  dport  => 80,
  proto  => tcp,
  action => reject,
}

The resulting rule:

$ sudo iptables --list-rules OUTPUT | grep ^-A
-A OUTPUT -p tcp -m multiport --dports 80 -m comment --comment "100 drop insecure outgoing HTTP traffic" -j REJECT --reject-with icmp-port-unreachable
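A quick sanity check (example.com is just a placeholder): plain HTTP should now fail immediately with a connection error, while HTTPS is untouched.

# The first request should be rejected by the firewall, the second should succeed.
curl -sS -o /dev/null http://example.com/ || echo 'port 80 is blocked'
curl -sS -o /dev/null https://example.com/ && echo 'HTTPS still works'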

* Technical readers will of course notice that the configuration simply blocks port 80, while HTTP can be served on any port. The configuration wasn’t meant as a safeguard against absolutely every way unencrypted HTTP content could be fetched, but was rather focused on the >99.9% of the web which serves unencrypted content (if any) on port 80. I would be interested in any easy solutions for blocking unencrypted HTTP across the board.

More practical HTTP Accept headers

Isn’t it time for user agents to start reporting in a more fine-grained way which standards they support? The HTTP Accept header doesn’t provide enough information to know whether a document will be understood at all, and this can lead to quite a few hacks, especially on sites using cutting-edge technology such as SVG or AJAX.

For an example, compare the Firefox 1.5.0.1 Accept header (text/xml, application/xml, application/xhtml+xml, text/html;q=0.9, text/plain;q=0.8, image/png, */*;q=0.5) with the detailed support levels reported on Web Devout’s Web browser standards support page or Wikipedia’s Comparison of web browsers.
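To see just how little this tells the server, you can replay that exact header with curl against any site that does content negotiation (the URL below is a placeholder); the server only learns which media types it may choose between, not how well any of them are actually supported:

curl -sI -H 'Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5' https://example.com/ | grep -i '^content-type'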

Say that I want to serve a page with some SVG, MathML, CSS2/3, and AJAX functionality. Each of these requires different hacks to ensure that non-compliant browsers don’t barf on the content. For SVG and MathML, I can use CSS to put the advanced content on top of replacement images, or use e.g. SVG to provide replacement text. Both methods increase the amount of content sent to the user agent, and neither is really accessible: non-visual browsers get the same information twice.

For CSS, countless hacks have been devised to make sure sites display the same in different browsers. So the user agent always receives more information than it needs.

AJAX code needs to check for JavaScript support, then for XMLHttpRequest support, and then must use typeof to switch between JS methods. This can easily triple the length of a script.

What if browsers could negotiate support with the server using e.g. namespace URIs, where each URI would reference a standard, a part of one, or some pre-defined support level? Poof: SVG 1.1 Tiny 95% supported, CSS 3 10% supported, DOM Level 2 80% supported, etc.
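A purely hypothetical sketch of what such a request might look like; the profile, level and support parameters below do not exist in any specification and are made up only to illustrate the idea:

# Hypothetical syntax; no server understands these parameters today.
curl -sI -H 'Accept: application/xhtml+xml, image/svg+xml;profile="http://www.w3.org/TR/SVGMobile/";support=0.95, text/css;level=3;support=0.10, application/mathml+xml;support=0.80' https://example.com/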

Obviously, the Accept header would be much longer, but the content received could be reduced significantly. I also believe it would be easier for developers to rely on Accept header switching alone than to learn all the hacks necessary for modern web development.

I don’t really know if this is possible, but maybe this kind of Accept header could be split out into a separate HTTP exchange: the first reply would contain the URIs of the potential content, and the user agent would then send a new HTTP GET request with the modified Accept header, reporting its support levels.

Note: This post is the same as my reply to an email by Allan Beaufour on the W3C www-forms mailing list. The text has been slightly modified for legibility.

Follow-up: Added a bug report for Firefox and a short Wikipedia article.