Friday, November 4, 2016

The case for Algol2020

I decided to take a different tack on my OBT submission. The new submission is at https://docs.google.com/document/d/1PQSc0UEHcYrfbiv_AGDxc8Wvjiy75UsyA5pM2mqRiyk/edit?usp=sharing, and also pasted below. And the pdf I actually submitted is https://drive.google.com/file/d/0B9G51OQiN3RuanMxTkFjT1pCRnM/view?usp=sharing. See you in Paris. Feel free to lobby the conference committee if you know them :-).
[updated]

The case for Algol2020

Robert Smart

There are many reasons why now is a good time for a new version of Algol, and since “we can see clearly now” it would be a shame to miss the opportunity for Algol2020.
Programming has got past the OO fever, and established a good, though still improving, feel for programming with values rather than variables. Algol68 had already made great progress in the latter direction with its fundamental principle that identifiers stand for constants.
Algol68 was a difficult language to use, and with 20-20 hindsight we can now understand why:
  • The dual use of ref types for pointers and for variables was a mistake. It is true that both have the same implementation as an address in memory, but for variables we want a dereferencing coercion to operate and for pointers we don’t. Indeed, more generally, we want types to be defined by their semantics and not their implementation.
  • Procedure values lacked power because they weren’t closures. This was a pity because the Algol68 rule that identifiers stand for constants is exactly what is needed to give closures a clear cut meaning.
It was a nice move towards functional programming that Algol68 had array X which was constant and behaved just like procedures Index=>X. But why then isn’t a normal mutable array an array(ref X)? Semantically you give it an index and get back a mutable value you can assign to. Instead Algol68 had a hack where a ref (array X) could be indexed and gave back a ref X, allowing one to change a part of the ostensibly constant array X value. Presumably this was intended to be more efficient, yet this is once again mixing semantics with implementation. An array(ref X) could conceivably have pointers to variables arbitrarily arranged in memory. However a simple start address and length could be a separate implementation that gives the required efficiency.
Similarly int64 can be an efficient implementation of a full int type, as long as we detect overflow and transition when necessary to a wider implementation such as BigInt.
Algol is naturally oriented to arrays rather than lists. However this is a distinction without a difference once types are oriented to semantics rather than implementation. Both lists and 1-dimensional arrays are just different implementations of what mathematicians call a “word”. This is hard to see because we are used to arrays of variables (array(ref X)) implemented by a start address and length, and you can’t prepend a new variable to that and retain the “start address and length” implementation. We can do it if we move to a fully general array(ref X), or preferably something in between. This is like moving from Int64 to BigInt when necessary.

The Wombat programming language

The Wombat programming language has been under development for 35 years (under various names) inspired initially by the desire to address the problems in Algol68 mentioned above. It has quite a lot of ideas that I would be keen to put before the Algol committee (IFIP WG 2.1), should they be interested in defining a new Algol. I list some of them here because they show what can be done within the Algol tradition.
  • Behaviours are like Haskell type classes, listing properties and laws. This is similar to interface/trait without allowing them to be used as if they were types.
  • It is natural to organise types in an embedding hierarchy. An Int is a Rational, and a Dog is a Mammal. This is done with apply/unapply pairs of functions (in Scala terminology). The hierarchy forms a bounded order lattice and this is the natural setting for Algol68-style Union. (Unrelatedly, we also need DisjointUnion which the application programmer will more normally use.)
  • The type at the bottom of our hierarchy has no elements. It is Empty in Wombat.
  • Automatic conversions and casts within the hierarchy go down to the intersection of source and destination, then up. When the intersection is Empty then other coercions, such as dereferencing, can occur.
  • Diamond ambiguity (choice of conversion paths) is addressed in Wombat by combining procedures so that the fact that the two paths give the same result is checked at run time. Combinable is a weakened Distinguishable that is sufficient for some uses. Combining is also used to support procedure declaration by multiple declarations with different parameter patterns.
  • The Algol68 declaration, such as “int five = 5” can, in modern style, be sensibly reduced to “five = 5”. It is natural then to allow the left side to pattern match and unpack structures. However we can have names on the left already declared from an outer level which are just representing their outer value. Wombat avoids any ambiguity by marking new names with a preceding backquote. It also extends the meaning of “=” to general unification. The benefit of this is that the same code can, in logic programming style, be declarative and can be run forwards or backwards in some circumstances.
  • Wombat has an Atom type, with values being an identifier with a leading dot (.), and no Behaviours except Distinguishable. So s.length is just a procedure call with Atom parameter .length. All values are allowed to behave as procedures in a type-specific way, so the convenience of OO method call syntax is delivered with simple procedure call semantics. So if s has type String then s behaves as (String.asProc s).
  • The advantages of lazy evaluation can be substantially realised by use of streams [Stream(X) = Unit=>(X*Stream(X))], and similar types, if they can be implemented in iterator style.
  • We can see that proofs are going to become an important part of programming. In Wombat it is expected that when the properties and laws of a type are defined using the properties and laws of an implementation type, then that should be proved. When a specialization is added to a procedure for some implementation of the input, then it should be proved that it actually does implement the same algorithm. Inverse pairs arise in several circumstances, and it is reasonable to expect a proof that they are inverses. And so on. This is an important but new and difficult area that might make it hard to hit the 2020 date.
Naturally there are many other researchers and developers who will want to add great ideas to a new Algol.

More information about Wombat is available on the blog wombatlang.blogspot.com. The posts are of varied quality and varied relevance to the current design. The post Rationale of the Wombat Type System is an alternative OBT submission that I prepared and is perhaps a good introduction. The outline doc, "Outline of the Wombat Programming Language", is reasonably up to date, but not currently very well organised.

No comments:

Post a Comment