Over the years, I have referred to cookies as web guano. I'm now of the opinion that I may have been too charitable. Cookies are far more hazardous to your digital health than anyone could have predicted 20 years ago.
I last wrote about this subject more than ten years ago in a piece entitled “Caustic Cookies” in reaction to what I considered to be inexcusable inattention to security and privacy issues in the IETF standard (RFC2965). Caustic no longer does justice to the problems cookies cause. We're beyond caustic, passed by carcinogenic, ducked into mephitic, and are now decidedly toxic – unsafe for any human habitat. And it didn't need to be this way.
Cookies were invented to overcome the statelessness of HTTP for web commerce applications. The contents of the shopping cart had to be stored somewhere, and offloading this responsibility to the customer's hard drive required far less in the way of network and programming resources than retaining it on the merchant's servers. And I don't really have a problem with that as long as all principals – online merchants, web applications/browser developers, search engines, operating systems, computer manufacturers, and most of all end users and customers – are on the same page regarding deployment and disclosures. But as some of us have explained for these many years, not everyone is behaving nicely.
The basic HTTP cookie recipe was invented by Lou Montulli while he was developing e-commerce applications for Netscape Communications in 1994. (Some of you may remember Lou's early multi-platform, hypertext web browser, Lynx.) By extending the concept of the “magic cookie” metaphor in programming, Montulli created the cookie in response to a need for client-side web memory (aka shopping cart) for one of the last releases of the original Netscape Mosaic browser in October, 1994, just months before the browser evolved into Netscape Navigator.
The general idea was straightforward: small amounts of digital guano (cookies) would be left on the user's hard disk by a server-side platform by means of a “set-cookie” header as part of an HTTP response to a browser. The cookie attributes were to be transaction-oriented data such as userID, name, date, domain of server, pages visited, contents of shopping cart, and potentially any personally identifying information (PII) that the user provided during the session. While all of this information is being stored on the user's side, there is also a server-side memory being created. How these two inter-relate in any particular cookie context is anyone's guess because there are no legally enforced standards.
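For readers who want to see the mechanics, here is a minimal sketch in Python of the exchange just described, using the standard library's http.cookies module. The cookie name (cart_id), its value, and the domain (shop.example) are hypothetical, chosen only for illustration; this is not how any particular merchant does it.

```python
from http.cookies import SimpleCookie


def build_set_cookie_header(name, value, domain, max_age_seconds):
    """Return the header line a server would add to its HTTP response."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["domain"] = domain            # which hosts get the cookie back
    cookie[name]["path"] = "/"                 # which URL paths get it back
    cookie[name]["max-age"] = max_age_seconds  # lifetime on the user's disk
    return cookie.output()                     # a "Set-Cookie: ..." line


def parse_cookie_header(header_value):
    """Parse the Cookie header a browser returns on later requests."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {key: morsel.value for key, morsel in cookie.items()}


# Hypothetical values, for illustration only.
header = build_set_cookie_header("cart_id", "abc123", "shop.example", 3600)
print(header)
print(parse_cookie_header("cart_id=abc123; visitor=42"))
```

Note that the browser simply stores what it is handed and returns it on request. Nothing in this exchange tells the user what the value means or what the server has recorded against it.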
Montulli's 1998 patent (US5774670) defines persistent cookies. This concept was almost immediately extended well beyond the original idea of adding a client-side memory to a stateless Internet protocol. Authentication cookies followed so that current user information and connection status might be monitored. Now just ask yourself what information must be exchanged between the client and server in order for a commercial interest to authenticate a user! In a nutshell, it's information that the user/customer doesn't want leaked, so you have to rely on the security integrity of the host website for protection. What has the last twenty years taught us about relying on payment card systems and e-commerce environments to protect our security? Does Heartland Payment Systems ring a bell? TJX? Card Systems Solutions? For a convenient refresher, take a look at our Identity Theft and Financial Fraud Reading Room at http://www.itffroc.org/rr.html or look back at my column in the January, 2012 issue of Computer. The only reliable way to protect against such PII compromise is to prevent the data from being used in the first place. A sobering thought for a world that lives on plastic.
3rd party cookies are a different story altogether. These cookies are set with “foreign” domain tails – i.e., different from that of the site being visited. When embedded ads in a web page are allowed to store their own cookies, the user's browsing behavior may be reconstructed (aka “tracked”, hence the term “tracking cookies”) and exploited by commercial interests. 3rd party cookies present an interesting case-study in perfecting really bad ideas (from the point of view of the consumer). They are primarily used by ad networks to track movement between websites. Cookies were admitted into the HTTP standards without any requirement of user-awareness! Although 3rd party cookie blocking was the default in section 4.3.2 of the original 1997 draft standard for HTTP state management (RFC2109), the browser developers did not follow the standard.
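The “foreign domain tail” test can be sketched in a few lines of Python. This is a deliberate simplification under stated assumptions: the function name is_third_party and the example hosts are hypothetical, and real browsers decide the question with a public suffix list and the context of the request rather than a bare string comparison.

```python
from urllib.parse import urlsplit


def is_third_party(page_url, cookie_domain):
    """Return True when the cookie's domain is foreign to the visited page.

    Simplified rule: the cookie is first-party only if the page's host
    equals the cookie domain or ends with "." plus that domain.
    """
    page_host = (urlsplit(page_url).hostname or "").lower()
    domain = cookie_domain.lstrip(".").lower()
    first_party = page_host == domain or page_host.endswith("." + domain)
    return not first_party


# Hypothetical hosts: an ad network's cookie set while reading a news page.
print(is_third_party("http://news.example/story", "ads.example"))
# The merchant's own cookie on the merchant's own page.
print(is_third_party("http://www.shop.example/cart", ".shop.example"))
```

Because the same ad network can be embedded in thousands of pages, its one cookie is returned from every one of them, which is all that tracking requires.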
The basic cookie mix serves many appetites. Session (or transaction) cookies are (hopefully) dispatched at the end of the browser session. Secure cookies are exchanged only over SSL-protected (e.g., HTTPS) sessions. There are ill-behaved, out-of-band cookies such as supercookies and Zombie cookies, as well. There are many variations on the cookie theme. In my opinion, all but the original recipe are half-baked.
This Wikipedia quotation gives some idea of how far cookie sharing is out of control:
“The United States government has set strict rules on setting cookies in 2000 after it was disclosed that the White House drug policy office used cookies to track computer users viewing its online anti-drug advertising. In 2002, privacy activist Daniel Brandt found that the CIA had been leaving persistent cookies on computers which had visited its website. When notified it was violating policy, CIA stated that these cookies were not intentionally set and stopped setting them. On December 25, 2005, Brandt discovered that the National Security Agency (NSA) had been leaving two persistent cookies on visitors' computers due to a software upgrade. After being informed, the National Security Agency immediately disabled the cookies. ”
Not only are we being abused by e-merchants, but by our own government as well!
So what's all the fuss about? Here are two examples.
Do Not Track (DNT). Tracking cookies have been with us for quite a while. The concept is to enable server-side systems to monitor online customer behavior. DNT is now an accepted IETF HTTP header field. If a browser has DNT enabled, then tracking is prevented, right? Not at all. Let's see how this works.
When a browser sends an HTTP request to a web server, the dialog is organized around request methods (e.g., GET, POST) and message header fields (e.g., Host, Cookie). Some headers are considered “core” (per IETF RFCs 2616 and 4229) and must be supported to achieve IETF HTTP compliance. Others, such as DNT, are outside the core and optional. To put it simply, respecting a browser's DNT request is entirely voluntary, and the request may be ignored without penalty under IETF standards. Ask yourself where the motivation for this idea came from. Cui bono?
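The voluntary nature of DNT is easy to show in a short Python sketch. Everything here is hypothetical (the two server functions, the tracker_id cookie, and its value); the only detail taken from practice is that a browser expresses the preference as a request header named DNT with the value 1.

```python
def wants_no_tracking(request_headers):
    """Return True when the request carries a DNT header set to "1"."""
    # Header names are case-insensitive, so normalize before looking up.
    normalized = {name.lower(): value.strip()
                  for name, value in request_headers.items()}
    return normalized.get("dnt") == "1"


def respectful_server(request_headers):
    """A server that chooses to honor DNT: no tracking cookie is set."""
    if wants_no_tracking(request_headers):
        return {}
    return {"Set-Cookie": "tracker_id=xyz789; Max-Age=31536000"}


def indifferent_server(request_headers):
    """A server that never looks at DNT; nothing in the protocol stops it."""
    return {"Set-Cookie": "tracker_id=xyz789; Max-Age=31536000"}


request = {"Host": "news.example", "DNT": "1"}
print(respectful_server(request))
print(indifferent_server(request))
```

From the browser's side the two servers are indistinguishable until the response arrives, and by then the choice has already been made for the user.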
From a personal privacy perspective, the European Union has taken a more reasonable approach in its model for compliance (cf. http://www.ico.gov.uk/for_organisations/privacy_and_electronic_communications/the_guide/cookies.aspx , especially the PDF entitled “Cookies Guidance”). Its 2009/136/EC amendment to section 5C of the 2003 E-Privacy Regulation was undertaken to “…protect the privacy of internet users – even where the information being collected about them is not directly personally identifiable.” Not surprisingly, such concern for the security and privacy of individuals is anathema to U.S. business interests whose focus is primarily on increasing the consumption of goods and services, and the sale of other people's PII. Philosophically, the U.S. views restrictions on cookies as primarily an economic issue (unrealized sales, decreased advertising revenue, increased overhead, decreased efficiencies), whereas the E.U. looks at it as a matter of civil liberties (right to be left alone, right to privacy).
From a historical perspective, W3C has tried unsuccessfully for several years to come up with a DNT standard. Their problem was conceptual confusion: they spent their time asking the question “What does ‘do not track' mean?” – as if this were some Wittgensteinian grand challenge. They would have been further along if they had approached the problem from the point of view of Humpty Dumpty.
The fault is that in trying to be everything to all parties – regulators, users, privacy zealots, and the 10,000-pound gorilla in the room, the business interests (merchants, advertisers, analytics services, etc.) – the W3C has added a level of agenda-based obfuscation that parallels the gun rights debate. Introducing a sprig of DNT and a pinch of public policy/econo-babble into a huge vat of self-serving business interests will still yield a huge vat of self-serving business interests. DNT is not deceptively simple, it is paradigmatically simple. Do not track means just what it says – full stop! The W3C behaves as if the meaning of DNT is to be found in cost-benefit studies. Linguistic absurdity and the suspension of common sense will never frame an intelligent discussion on DNT. Of course the business interests' mantra is that any restraint over their use of other people's data that would affect their profit is, by definition, over-regulation. Such is the rhetoric of Edward Bernays knock-offs whose ideological mentor thought that strategies to get more people to smoke cigarettes were inspired.
Perhaps do not track may not be the appropriate operational metaphor. Maybe we should define a continuum that runs from “track me a little but don't scar the cheeks” to “have your way with me, you global commerce she-devil.” One may imagine middle ground here. Microsoft had the right idea with DNT1, which was on by default in IE10. They failed in application because they failed to build a consensus, and as a consequence had their effort to protect the user's privacy thwarted by web merchants – and even by Apache, which doesn't recognize DNT1 set on IE10 browsers!
Canada does not allow tracking by statute, f.y.i., which is the only intelligent starting point if one views users as anything more than consumers. This is all easily accomplished with that bête noire of the Web advertising and analytics crowd, the “opt in” checkbox – or at least the E.U.'s “Enhanced Browser Settings.” Microsoft would have been far more successful if they had built “tracker tracking” features into IE to let the end user see just what was being done by the servers, and taken a swerve around the W3C altogether.
Add-on Wars. The 1990s “browser wars” made it clear that there was a lack of orthodoxy concerning browser compliance with W3C recommendations. As a consequence, there was no assurance that what you saw in the browser was what was intended by the Web page author. I coined the term “WYSINWOS” (“what you see isn't necessarily what's on the server”) for this. This disparity led me to develop the World Wide Web Test Pattern in 1994-5 (cf. http://www.berghel.net/webtestpattern/ ). Particularly annoying to the W3C was Microsoft's zeal at innovation – the W3C was trying to get the developers to work through the approval process, while Microsoft was going its own way. To this day there are still Web portals that design around Internet Explorer. How many times have you visited Websites where the text didn't fit nicely into the text box provided? Well, the browser wars are back, but this time the fight is over add-ons. Speaking of which, these are three that I like adding onto Firefox: Adblock Plus ( http://www.youtube.com/watch?v=oNvb2SjVjjI ), a tunable add-on to get rid of most web ads; Empty Cache Button 2.2 ( https://addons.mozilla.org/en-us/firefox/addon/empty-cache-button/ ), which does just what it says in memory or on disk; and https-finder ( https://code.google.com/p/https-finder/ ), which automatically detects and enforces HTTPS connections whenever possible. At this point, the add-on community is doing more for browsing privacy than the W3C and the IETF.
In 2001 I spoke of web barbarians at the electronic gates that penetrate our digital zones of privacy. I wrote: “for want of a simple technical patch to overcome the statelessness of TCP/IP, we have created a cookie monster” – and at that time I assumed that web merchants were for the most part acting responsibly! I no longer hold that belief – especially with respect to the Web advertising and analysis community. We need a federal statutory wake-up call that the data needs for marketing, behavioral analytics and the like do not trump a citizen's expectation of privacy.
I'll return to the more technical side of cookie abuse in a later column. For the moment, I'll conclude now as I did twelve years ago: “The problem society has to deal with is whether the collection of personal information about an individual without the individual's informed consent should be tolerated.”
The current IETF cookie standard is RFC 6265 ( http://tools.ietf.org/html/rfc6265 ) released in April, 2011. The security vulnerabilities of cookies (bearing the euphemism “pitfalls” in RFC6265) are outlined in Section 8. These include cross-site scripting, cross-site request forgeries, session fixation vulnerabilities, ambient authority and confused deputy attacks, and replay attacks, to name but a few. At this writing RFC6265 only “recommends” but does not require encryption of cookie payloads and the use of secure (e.g., HTTPS) channels. Neither of these restrictions has been integrated into the design of browsers, although modern browsers do support user-configurable security improvements. It is interesting to note that the vulnerabilities of cookies were anticipated at the time of the October, 2000 draft standard (cf. http://tools.ietf.org/html/rfc2965 , section 7). The point is that when vulnerabilities are known at the time the standards are set, informing and protecting the end-users should have been a priority concern. Did you know that cookie vulnerabilities were anticipated in 1997? Those of us who wrote about this vulnerability at the time were either ignored or vilified by Internet snake charmers.
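As an illustration of what is left to the server's discretion, the following Python sketch marks a cookie with the Secure and HttpOnly attributes defined in RFC 6265, using the standard library's http.cookies module. The function name hardened_session_cookie and the session value are hypothetical. Note the limits of the technique: these attributes restrict the channels over which the cookie travels and whether page scripts can read it, but they do not encrypt the payload.

```python
from http.cookies import SimpleCookie


def hardened_session_cookie(session_id):
    """Return a Set-Cookie value carrying the Secure and HttpOnly flags."""
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["secure"] = True     # return only over HTTPS
    cookie["session"]["httponly"] = True   # hide from page scripts
    cookie["session"]["path"] = "/"
    # No Max-Age or Expires is set, so this remains a session cookie.
    return cookie["session"].OutputString()


# Hypothetical session identifier, for illustration only.
print(hardened_session_cookie("a1b2c3"))
```

A server that omits both flags is every bit as compliant as one that sets them, which is exactly the gap between “recommends” and “requires.”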
Basic explanations of cookies-in-use may be found online in my 2001 paper ( http://www.berghel.net/col-edit/digital_village/apr-01/dv_4-01.php ), Wikipedia, and the All About Cookies website at http://www.allaboutcookies.org/ .