Code Mark-up Language (CodeML)

Many of the most time-consuming, polarizing debates in software development are the constant back-and-forths over trivial code style: tabs or spaces, 2 or 3 spaces, curly bracket at the end of the line or the beginning of the next, and so on. While C# improves things a bit in that it, by default, does some formatting (such as curly bracket placement), they could have taken it a step further: the source should always be stored in a clear, defined and standardized XML storage format, e.g.

<namespace name="Microsoft.LameExample">
  <class visibility="public" name="MyClass">
    <method visibility="public" type="instance" name="Run">
    </method>
  </class>
</namespace>

I’m hardly the first to advocate this (and of course it’s what most of us thought about when we first started getting a handle on XML many moons ago: how can we apply this to software development?), and while I thought I was original thinking up the term CodeML, there are actually already plenty of references to it out there. It’s a pretty obvious idea.

With such a structure, not only would the source be much easier for third-party tools to parse and interact with, but most importantly it would allow each user’s environment to stylistically render the code however they want. Source control systems could (should) be schema aware, intelligently doing change management, and we wouldn’t have massive change sets caused by someone running a reformatter to update to the standard of the day, switching from tabs to 3 spaces (Python’s use of whitespace to meaningfully indicate blocks/scope is absolutely brilliant, as an aside), or updating the curly bracket standard. What a complete waste of time and effort, and what a ridiculous distraction from actually solving problems.
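To make the rendering idea a bit more concrete, here is a rough Python sketch. It assumes a CodeML fragment shaped like the example earlier in the post and prints it C#-style with whatever indent and brace placement the reader prefers. The `header` and `render` helpers, the `brace_on_new_line` option, and the `void` return type are all made up for illustration; the example XML doesn't say how return types or method bodies would be stored.

```python
import xml.etree.ElementTree as ET

# Hypothetical CodeML fragment, modeled on the example earlier in the post.
CODEML = """
<namespace name="Microsoft.LameExample">
  <class visibility="public" name="MyClass">
    <method visibility="public" type="instance" name="Run" />
  </class>
</namespace>
"""

def header(node):
    """Build the declaration line for a node (assumed C#-like output)."""
    if node.tag == "namespace":
        return f"namespace {node.get('name')}"
    if node.tag == "class":
        return f"{node.get('visibility')} class {node.get('name')}"
    if node.tag == "method":
        # Assumption: type="static" maps to the static keyword; the return
        # type is hard-coded to void because the fragment does not store one.
        static = "static " if node.get("type") == "static" else ""
        return f"{node.get('visibility')} {static}void {node.get('name')}()"
    raise ValueError(f"unknown element: {node.tag}")

def render(node, indent="    ", brace_on_new_line=True, depth=0):
    """Return source lines for node, styled by the reader's preferences."""
    pad = indent * depth
    if brace_on_new_line:
        lines = [pad + header(node), pad + "{"]
    else:
        lines = [pad + header(node) + " {"]
    for child in node:
        lines.extend(render(child, indent, brace_on_new_line, depth + 1))
    lines.append(pad + "}")
    return lines

root = ET.fromstring(CODEML.strip())

# Same stored source, two personal styles.
allman = "\n".join(render(root, indent="    ", brace_on_new_line=True))
knr = "\n".join(render(root, indent="\t", brace_on_new_line=False))

print(allman)
print(knr)
```

Both renderings come from the same stored tree, so switching between them would never touch the file under source control.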

Of course we still have the issue of case sensitivity – I’m still on the fence about this. I love C & C++, and I’ve always hung onto their case sensitivity out of habit (“real programmers pay attention to case”), yet it’s possibly just another needless distraction. There really are very few instances where you want overlapping names that differ only by case.

It was, I think, an ugly decision to incorporate case sensitivity into .NET.

If there were no case sensitivity issues then this too could be an environmental option: all identifiers would be stored in upper or lower case in the XML file, but then rendered according to user preferences (consts all upper case, parameters camel cased, public methods Pascal cased, and so on). No need to “correct” such a menial thing in other people’s code. The IDE knows what each of these things is and what its visibility is, so there should be no need to crystallize such a pseudo-Hungarian notation in our code.
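A quick Python sketch of that rendering step, under one big assumption of mine: identifiers are stored lower case with `_` between words, since plain all-lower-case storage would lose the word boundaries needed for camel or Pascal casing. The `DEFAULT_STYLE` table, the kind names, and `render_identifier` are hypothetical.

```python
# Hypothetical per-user style table: identifier kind -> casing convention.
DEFAULT_STYLE = {
    "const": "upper",
    "parameter": "camel",
    "public_method": "pascal",
}

def render_identifier(stored, kind, style=DEFAULT_STYLE):
    """Render a stored identifier using the casing chosen for its kind.

    Assumes the stored form is lower case with '_' between words.
    Kinds missing from the style table fall back to the stored form.
    """
    words = stored.lower().split("_")
    convention = style.get(kind, "lower")
    if convention == "upper":
        return "_".join(w.upper() for w in words)
    if convention == "camel":
        return words[0] + "".join(w.capitalize() for w in words[1:])
    if convention == "pascal":
        return "".join(w.capitalize() for w in words)
    return "_".join(words)

print(render_identifier("max_retry_count", "const"))          # MAX_RETRY_COUNT
print(render_identifier("max_retry_count", "parameter"))      # maxRetryCount
print(render_identifier("max_retry_count", "public_method"))  # MaxRetryCount
```

Someone who prefers a different convention would only swap out their own style table; the stored name never changes.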

These are of course just off-the-cuff thoughts after hearing yet another reformatting debate, so temper your criticism accordingly. I’m also pragmatic enough to realize that making a standardized schema for something with as many edge conditions and exceptional circumstances as source code isn’t quite as easy as it sounds.