1. Sep 10th, 2006

    What spec authors can learn from context-free grammars

    Good specs are like context-free grammars.

    Back when I started developing language tools, memory was a big issue. There’s only so much you can fit into 16KB of program code, and so I hand-coded parsers in assembly language. The most interesting question I had to answer was: does it work? I knew about context-free grammars, but regarded them as nice theory, the stuff you learn at university. They did not apply to me.

    When I graduated to 640KB land, I started working with tools like Flex, Yacc, Bison. They made life easier, and in doing so also enforced a context-free grammar. Still, I didn’t care. I happened to use one because it was available.

    With time, I expected tools to do much more. I expected the editor to do syntax highlighting, then came code completion and jumping from code to definition. I wanted better searching and diffing, and occasionally some sort of code generation. Tools that can process source code to reveal class hierarchies and diagram module relations are also good.

    That’s when I learned the value of context-free grammars.

    You see, a compiler has a deep and complete understanding of the programming language. However complicated the language is, the compiler will understand it well. Text editors don’t. The text editor knows nothing about the language you’re working with. But if the syntax rules are very simple, you can teach it to pick specific elements and highlight them with different colors. And you can get it done in an hour or two.
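    To make the hour-or-two claim concrete, here is a minimal sketch of such a highlighter in Python. The toy language, the token names, the patterns, and the color codes are all assumptions made up for illustration; they do not come from any particular editor.

```python
import re

# Hypothetical token classes for a toy language. Order matters:
# keywords are listed before names so that "if" is not read as a name.
TOKEN_SPEC = [
    ("comment", r"#[^\n]*"),
    ("string", r'"[^"\n]*"'),
    ("number", r"\d+"),
    ("keyword", r"\b(?:if|else|while|return)\b"),
    ("name", r"[A-Za-z_]\w*"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)

# Assumed ANSI color codes per token class; names are left uncolored.
ANSI = {"comment": "90", "string": "32", "number": "36", "keyword": "35"}


def tokenize(source):
    """Yield (kind, text) pairs; text matching no pattern is 'plain'."""
    pos = 0
    for match in TOKEN_RE.finditer(source):
        if match.start() > pos:
            yield ("plain", source[pos:match.start()])
        yield (match.lastgroup, match.group())
        pos = match.end()
    if pos < len(source):
        yield ("plain", source[pos:])


def highlight(source):
    """Wrap each colored token in ANSI escape codes for a terminal."""
    parts = []
    for kind, text in tokenize(source):
        color = ANSI.get(kind)
        parts.append(f"\033[{color}m{text}\033[0m" if color else text)
    return "".join(parts)


if __name__ == "__main__":
    print(highlight('if count == 10 return "done"  # finished'))
```

    The highlighter never builds a parse tree. It only needs each element to be recognizable without knowing what surrounds it, which is exactly what simple syntax rules give you.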

    The same goes for code generators. Working with context-based grammars requires very sophisticated libraries that understand what code to generate in which context. Working with context-free grammars requires only very simple tools, and you can get a lot done in a short amount of time. You don’t need trucks to drive through a door.

    A good spec assumes that it will be used by a variety of tools. Some of these tools will be trucks that implement the entire spec, down to the last detail. Other tools are roller blades that implement bits and pieces, just enough to solve a specific problem. Some of them will be ad hoc hacks, created in an hour and used only once. It’s quite common for projects to create these one-off solutions out of necessity. It’s called self-help.

    Good specs let you do that because they value simplicity.

    This week I spent a few hours struggling to understand two sections of the WSDL 2.0 spec, and it reminded me of this analogy. I like specs that use a lot of verbs, because then I can easily understand how to use them. Some specs have more nouns than verbs. The noun part describes what the feature is, not how to use it, and is generally obscure and hard to understand. Still, you can delve into the spec, read the verb part, gain an understanding and be done in 30 minutes.

    Not so with WSDL 2.0, which is all nouns. It never quite answers the question: how do I do this? That’s really all I wanted to know, but instead I had to spend hours trying to piece it together from a collection of loosely coupled descriptions.

    Even worse, instead of editing for the reader, it’s edited for the component model. Each feature is broken into units, one per component. Each component has a description, followed by an elaboration of the component model, followed by an elaboration of the almost identical XML schema, followed by a mapping between the two.

    Because features are broken into component units, there are no simple answers. I expected one page to teach me how to use SOAP bindings, and another to show the differences between SOAP 1.1 and 1.2. Instead, I had to wade through several component units, flipping over non-functional pages to try and piece the answer together myself.

    WSDL 2.0 expects the reader to be a master of the spec, studying and memorizing it from beginning to end, before going and implementing it. It’s the equivalent of reading 233 pages of C++: A Reference Manual before you can write your first “hello world”. That level of dedication only pays off if you’re building a truck. It’s a huge barrier if you only want to do one feature, or troubleshoot a problem, or hack a one-off solution.

    Wouldn’t it be better if your average developer could understand a feature of the spec in under 30 minutes? If you could get more people involved who can do things with the spec, without the time commitment barrier?

    If it was a context-free grammar?

    One argument is that specs like this are only interesting to tool vendors. But tool vendors only build general purpose products. So you’re missing the opportunity to go beyond the general purpose tool, the opportunity for self-help. It’s like having HTML, but only if you use FrontPage, DreamWeaver or export from Word.

    Another argument is that spec authors don’t write books. Books will always do a better job of teaching and educating an even wider audience. But that’s creating another artificial barrier. You can’t work with the spec as it’s being created, but have to wait for the book authors (and their online equivalents) to catch up. And getting people involved early helps the spec mature, with more issues found and fixed before it goes live.

    Think of HTML, which evolved out of people using it and building tools around it, and was only then codified into a spec.

    Standard bodies have ways to deal with IP issues, balance interests of multiple vendors, and establish liaisons between different working groups. But they’re off base until they start mandating some level of simplicity that will help specs appeal to a wider audience of developers. Not just those building trucks, but also those who just want to roller blade.

    Standard bodies must start demanding context-free specs.

    Image by alykat.

    1. Sep 11th, 2006

      ludo

      s/grammer(s?)/grammar\1/g :)

    2. Sep 11th, 2006

      Assaf

      thanks :-) someday I will learn to spell grammar, or just start using a spell checker.

    3. Sep 12th, 2006

      ludo

      Easy to spot for an Italian, grammar => grammatica, grammer => grammetica which does not make any sense and has a totally different sound when read aloud. We have a whole different set of spelling problems. :)

    4. Sep 12th, 2006

      Assaf

      it’s one of those spelling mistakes I used to do when I was young, and for some reason it just stuck. I have a few of those, which I always get wrong.

      sometimes I remember to check for them before posting or e-mailing, sometimes I don’t.

      but hey, I’m going to try thinking grammatica, maybe that will cure it!
