Monday, July 8, 2013

some history

When I started there was really only assembly (machine) language. Assembler has some features that were right, but somehow the higher level language designers didn't like. In particular, every identifier stood for a constant value. For labels this was a memory address, but identifiers were also used to give names to constants.

The exceptional language design in which identifiers stood for constants was Algol68. Unfortunately it had some serious problems. If you used it you realised that it was hard to use, but the reason was not obvious at the time, but is with hindsight. The (main) problem was that it used Ref for variables and for pointers. That's just what we do in assembler, but it isn't confusing in assembler because there are no coercions. For variables you want Ref X to coerce to X, but for pointers you don't. Another problem was that Algol68 allowed procedure values, but since they weren't closures they were not very useful.

[In the previous paragraph I used the word "variable" to mean a mutable value, as opposed to a constant. Unfortunately the word is also used for other things, such as any identifier whose value is not known at compile time. That's why Wombat now has a type Mutable X instead of Var X [update: name changed to Assignable X as per Robert Harper]]

Serious programming that wasn't assembler became possible with BCPL (which later led to B and then C). It had procedure values and I remember trying to create a closure (not that I'd ever heard of the name). I remember saying to my boss "This should work but I don't think it will". It didn't.

Other languages were becoming available. In particular Simula 67 was a mile ahead of its time and I wrote some programs in good OO style before that style had a name. But I was more attracted to the ideas of functional programming, which were espoused seductively by W.H.Burge in the IBM Systems Journal: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=5391254.

Once you have closures, and they can be full values, then you don't need any other mechanisms for control flow. For example if-then-else can be just a procedure with 3 parameters: a boolean and two closure parameters (each with no input and compatible output). I designed and implemented a baby language based on this idea and wrote it up for SIGPLAN. They managed to publish it even though I got the formatting all wrong: http://dl.acm.org/citation.cfm?id=988090.988097.

But my thinking ran into a brick wall. I wanted everything to be a procedure. In Algol68 an Array X is immutable and behaves just like a procedure. But how do you do a normal mutable array? Algol68's solution is to say that you can index a Ref Array X, and when you do you get a Ref X that you can assign to, allowing the array to be updated in place. This didn't seem right. A Ref Array X should point to an immutable array. What we really want semantically is an Array Ref X so that when you index it you naturally get an lvalue you can assign to. But that seemed like it meant a sequence of addresses which could point anywhere.

At this point I moved to CSIRO and spent 20+ years doing system management and research related to Internet security and commerce. But when I retired I started to think about it again, and I had an inspiration: Define types by their semantics and allow multiple implementations which need not cover all cases. This solved my problem because now an Array Ref X (Array (Assignable X) in Wombat speak) can have the expensive implementation of being a lot of possibly unrelated addresses, but it can also have the normal case implementation of a start address and length (and stride).

Semantic types had many other advantages, but there were still problems to solve. More on that another day.