300-Page Binary Madness

No, really, Microsoft’s obfuscation was necessary and in no way purposefully opaque!

Joel Spolsky seems like a really good programmer with solid design skills and a fantastic-looking product, FogBugz.  On the other hand, his blog, Joel on Software, is…not bad.  A lot of what he writes is at worst worth thinking about, and occasionally right on target.  Then every once in a while, his misplaced Microsoft sympathies come creeping out and he writes something like today’s post.  In it, he points out the absurd difficulty of reading and understanding Microsoft’s Office file format specifications, just before telling us all why the format was such a good idea and why its inherent unreadability and utter lack of portability were simply necessary side effects of good programming.

I don’t want to go point by point through his post.  Joel makes a lot of technical arguments whose validity is irrelevant, because he’s working from goofy and faulty assumptions.  I imagine these are the same goofy and faulty assumptions Microsoft’s design team works under when it produces its bloated, only occasionally functional monoliths.

There are a lot of claims like: “They were designed to be fast on very old computers.”

And: “They have to reflect all the complexity of the applications.”

As well as: “They were not designed with interoperability in mind.”

The “designed to be fast on very old computers” line amuses me, since the very argument Joel once made in defense of Excel was that, unlike Lotus with its 1-2-3 package, Microsoft largely ignored the speed and hardware limitations of the day, knowing that computers would get better quickly and that having a product out that worked was more important.  I don’t buy that “efficiency” required a format no one can manage to implement today, even with the specification in hand.  I’d also note that the UNIX world managed to get by just fine without ever coming up with an equivalent binary monstrosity.  Plus, Word files were bloody huge, partly because of junk like Word’s “fast save” feature, which just slapped document updates onto the end of the file, so documents grew every time you saved a minor change.
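
To make that growth problem concrete, here’s a minimal sketch of an append-only “fast save” scheme.  The class and the on-disk records are hypothetical, invented for illustration; this is not Word’s actual layout, but it shows why such a file can only ever get bigger:

```python
# Hypothetical append-only "fast save" scheme -- not Word's actual
# on-disk format, just an illustration of why such files only grow.
import os

class FastSaveDoc:
    def __init__(self, path):
        self.path = path

    def full_save(self, text):
        # A normal save rewrites the whole document, so the file
        # shrinks back to the size of the actual content.
        with open(self.path, "w") as f:
            f.write(text)

    def fast_save(self, edit):
        # A "fast save" just appends an edit record to the end of the
        # file.  Readers must replay every record to reconstruct the
        # document, and the file grows even when the edit deletes text.
        with open(self.path, "a") as f:
            f.write(f"\n[EDIT] {edit}")

doc = FastSaveDoc("report.doc")
doc.full_save("Hello, world.")
for i in range(3):
    doc.fast_save(f"minor change {i}")
    print(os.path.getsize("report.doc"))  # size only ever increases
```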

I think it’s telling that the feature Joel holds up as the kind of complexity that necessitates a massive binary specification is a “Keep With Next” function I never knew existed and have never seen anyone use.  This isn’t to say it’s not a vital part of 5% of the user population’s day, but building a file format no one will ever be able to unpack so that you can add features most people don’t know exist is putting the cart far in front of the horse.  Once again I must point out: the UNIX world managed to get by without binary madness filling in for reasonable document markup.
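
For contrast, here’s a minimal sketch of how a feature like “Keep With Next” fits in plain-text markup, using Python’s standard library.  The element and attribute names are made up for illustration, not any real Word schema, though Word 2007’s XML format expresses the same idea with a similarly small element:

```python
# Sketch: a "keep with next" flag as readable markup.  The element and
# attribute names here are hypothetical, not an actual Word schema.
import xml.etree.ElementTree as ET

para = ET.Element("paragraph", {"keepWithNext": "true"})
para.text = "Table 1: Quarterly results"

# Any text editor, diff tool, or third-party parser can handle this.
print(ET.tostring(para, encoding="unicode"))
# -> <paragraph keepWithNext="true">Table 1: Quarterly results</paragraph>
```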

That’s what I can’t get past.  Other people have solved this problem, and they’ve solved it better.  We’re talking about tables and text files here.  UNIX managed to get by with a minimum of crazy opaque binary formats, and largely because of this it became the backbone of the Internet.  To suggest that Microsoft just couldn’t have anticipated interoperability being this important is insane.  UNIX hackers had been making decisions that facilitated interoperability since the ’70s, not because they foresaw the rise of the Internet, but because it was good design practice.
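
As a trivial illustration of what that design practice buys you, consider a table stored as tab-separated text.  The data below is invented, but any language’s standard tooling can read it without a 300-page spec:

```python
# A table as plain tab-separated text: the only "spec" is one row per
# line, tabs between cells.  The data here is made up.
import csv, io

table = "name\tqty\nwidgets\t12\nsprockets\t7\n"
for row in csv.reader(io.StringIO(table), delimiter="\t"):
    print(row)
# ['name', 'qty'] / ['widgets', '12'] / ['sprockets', '7']
```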

It’s possible Microsoft wasn’t prescient enough to see what was coming.  Microsoft certainly isn’t known for being forward-thinking.  At best, it’s quick to react, purchasing or stealing the technology it needs to keep up with the market.  Even so, it never rectified its binary-file-format mistake, and I have no doubt that was for the same reason Microsoft obfuscates all of its formats: because it can.  Microsoft’s control of the market relies on maintaining that control, not by producing better technology or anticipating market trends, but by using its current market share to make switching to competing software so inconvenient that user laziness will sustain that share.  Word 2003 had a big, opaque binary format so that people couldn’t easily switch to Linux and still read Word documents without being cut off from the rest of the business community.  That this stopped working is the only reason Word 2007 uses XML.

I can’t end without mentioning the flaw that cripples every Microsoft product, in Joel’s own words: “They have to reflect the history of the applications.”

Microsoft has been hobbled by the poor design decisions of its past for decades.  Everything from the MS-DOS file system to the Registry to a lack of robust multi-user functionality is, and must remain, part of the Microsoft codebase.  Every Microsoft product has to be 100% compatible with every prior version of that product.  Word 2003 couldn’t just import the Word 97 format; it had to keep using it, shoving even more binary madness into the spec to accommodate feature enhancements.  We’re not even talking about being backwards compatible with the OS here.  We’re talking about a marked-up text file.  It’s one thing to be backwards compatible with sane designs, as much of the UNIX world has been.  It’s another thing to be shackled to bad ideas you had in 1980.  Apple found itself in the same position and scrapped its entire historical codebase.  Would the company still exist without OS X?  Maybe, but it certainly wouldn’t be growing its market share at all, let alone at its current rate.

There are reasons Microsoft’s dominant presence in the home and office spheres did not translate into the same market share online.  Most of them are laid out rather nicely in Joel’s post.  So check it out.
