Saturday 23 May 2009

Getting it up (3) - the trouble with browsers

The story so far: If you want to get a proper website up, you need to know how the text, pictures etc. on web pages are organised. Web pages are text with tags. The tags are bits of code inside pointy brackets which tell a program (the browser) on your computer how to display the page and, most importantly, which parts of your text are links to other pages. Some of them appear in the example that Anne included in another post:


Interestingly, there are problems with most or all of the tags in the graphic. ("b" and "i" are no longer regarded as the best way to achieve these effects; the "p" tag is a problem in an html editor if you try to use the carriage return to start a new line, as in a word processing program; the "br" tag written like this will cause a line break at the beginning of the line, not the end, and will probably generate an error; putting "text-decoration" inside a "span" tag would be a bad idea if you're doing it more than once on a site, and so on.)
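
For comparison, here is a minimal sketch of how the same sort of effects are usually written now. It is an illustration only, not a corrected copy of Anne's graphic: the text and the class name "flagged" are made up, and it assumes the aim is emphasis, a line break inside a paragraph, and a decoration that gets reused around a site.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Tags example</title>
  <style>
    /* Hypothetical class name: define the decoration once, reuse it anywhere */
    .flagged { text-decoration: underline; }
  </style>
</head>
<body>
  <!-- "strong" and "em" are generally preferred to "b" and "i" for emphasis -->
  <p>This is <strong>important</strong> and this is <em>emphasised</em>.</p>

  <!-- "br" goes where the line should break, at the end of the first line -->
  <p>First line of an address<br>
  Second line of an address</p>

  <!-- The span carries only a class; the styling lives in the stylesheet -->
  <p>A <span class="flagged">decorated phrase</span> styled from the stylesheet.</p>
</body>
</html>
```

The point of the class is that changing the decoration later means editing one line in the stylesheet rather than every "span" on the site.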

The reason for the trouble is the history of html.

It started off as an academic research tool, to organise the cross-references between texts and avoid all that tedious business of ibid and op. cit. and passim and cf. and q.v. and librarians hating you because you dropped biscuit crumbs in their card indexes. It was invented by Tim Berners-Lee (now Sir Timothy, and should be at the very least Lord Berners-Lee of Cyberspace if you ask me). He was a researcher at CERN, a big atom thing, on (or more precisely under) the Swiss-French border. Around 1990, the new-fangled internet was being used by nerds and soldiers to send messages to each other. St. Timothy realised that it would also be handy for distributing scientific documents. So you would store the text, illustrations and the references to other documents on your server, then others could call up the document using his clever system of addresses (URLs), read the thing and, gloriously, add to it, resulting in an explosion of collaborative human knowledge, progress and happiness.

Mr Berners-Lee (now the Earl of Url) then made three decisions which, with hindsight, appear unfortunate. First, he called it the "World Wide Web", giving the system (and hence the addresses) a tongue-twisting nine-syllable abbreviation in English. Second, he didn't use his invention in a way that would make him fabulously rich. And third, he didn't give serious attention to the browser issue.

Recall that all that was stored on Mr B's subterranean servers was essentially the elements of a scientific text on paper: the text itself (organised into headings, paragraphs, lists, tables etc.), the references (now in the form of handy embedded links), and the illustrations. To read the stuff, users needed software that displayed the text, lists, tables, illustrations etc. in a readable format, and had some mechanism to call up a new page from a coded link. Plus - the really tricky bit - the software had to allow you to edit the text - add new references, comments etc. - and save your edited version back onto the server.

The vital requirements for a browser were, then, 1) it had to be standards-compliant (i.e. respond to the agreed set of tags to display reliably any text written in html), and 2) it had to work as an editor. After a while, vital requirement 3) became apparent: the browser had to avoid being a means for the writers of the webpages to access a user's computer via naughty bits of code.

The first decent browser was created at the University of Illinois. Under a succession of names - Mosaic, Mozilla, Netscape, Mozilla again, and Firefox - it's been around ever since. Mosaic was also the first attempt to monopolise the web. Visiting the NCSA in Illinois in 1992, Berners-Lee was dismayed to find "that the people at NCSA were attempting to portray themselves as the centre of Web development, and to basically rename the Web as Mosaic. At NCSA, something wasn't 'on the Web', it was 'on Mosaic'". This meant that the Illinois people felt no obligation to comply with the html standards. And they were uninterested in the more difficult requirement of a browser: that it should also be an html editor. As Berners-Lee puts it, they "were more excited about putting fancy display features into the browsers – multimedia, different colours and fonts – which took much less work and created more buzz among users".

The flashiest design feature was a tag which drew a horizontal line across the page. As the web grew, initially in America, into a mass consumer activity, browser designers started increasing their market share by inventing their own tags, effectively inviting the writers of web pages to write html that would only work in particular browsers. What didn't help was the approach that U.S. companies took to the commercial development of the internet. The model was that of a TV network: people would pay a monthly subscription to companies like America Online, CompuServe or Prodigy, and in return the company would provide a full internet service: email, newsgroups and all the web-style content you would ever need – weather, news, entertainment, local information and so on. Since the content was produced by the internet provider, there was no reason to make it compatible with the international html standards. Users installed the provider's software from a CD. The less standards-compliant the "browser" part of this software, the more differentiated was the provider's product.

The next few years are known to infamy as "the browser wars". With a business model that staked everything on market share, the Netscape browsers introduced a succession of new tags. Some were useful, some unreliable (they didn't display properly, or crashed the browser), and some – notably the "blink" tag – were so hideous that web designers disdained to use them. One – the "font" tag – seemed like a really good idea at the time but has had disastrous effects on the ability of non-specialists to create workable web pages.

Microsoft's influence on this process has been surprisingly benign, though this has been more a result of bad judgment than philanthropic instinct. Initially they failed to notice the web. Then, as Netscape got worryingly popular, Microsoft produced its own browser. Aside from the occasional silly tag – "marquee" to scroll text rather than blink it – Internet Explorer was less whizzy than Netscape, but thereby more standards-compliant. After the failure of a half-hearted attempt to rival providers like AOL with their own "network", Microsoft settled into a strategy of bundling its dull-but-sensible browser with the Windows operating system. They were subsequently expensively punished for this decision, but it had the effect of accustoming the world to receiving web content in standardised form from any source, as part of normal computer use.

By around 2000, Microsoft's Internet Explorer had become dominant, and Netscape's browser had collapsed under the weight of its own idiosyncrasies. By then, the battlefield had become online security, with rivals to Internet Explorer claiming that their browsers offered fewer openings to hackers. As to html, everybody now claims to be 100% standards-compliant (and everybody lies).

Apple, by the way, has generally stayed aloof from the browser wars: its strategy of modifying an area of computer technology to create a local monopoly didn't seem to apply here. Until 2003, Macs used Netscape or Internet Explorer. Since 2003, Apple has had its own browser, Safari, which is pretty standards-compliant but apparently dead insecure. Fortunately for Mac users, malicious hackers seem to expend most of their energy on attacking Microsoft's browser. Presumably the cybercriminals reckon that it's not worth targeting Mac users because a) there aren't enough of them, and b) those that there are have already spent all of their money on flashy digital bling.

Illinois eventually redeemed themselves by electing Senator Obama.

The browser wars continue, but in a less destructive form, as the surviving belligerents converge upon full standards-compliance and bicker about their relative security. The legacy is that it's always been tricky to create html that will work in all browsers and on all main computer systems, and that will continue to work in a few months when the next security patches are issued, and next year when the new browsers are released.

The other resultant problem is that it's never been easy to find a program to edit html – i.e. to create web pages. More about that in the next post ...
