Friday, November 15, 2013

Yin Wang on type inference

[update: Note that Yin Wang has deleted his post. Maybe he no longer agrees with it.]

Wombat hasn't made much progress recently. However I was delighted to discover Yin Wang. Basically his talk on type inference, http://yinwang0.wordpress.com/2012/10/19/type-inference/, explains why dynamic typing works where static typing seems to often get in the way.

He also says elsewhere that static typing is not much different to, nor much better than, static analysis of a dynamically typed language. This is very hopeful for Wombat, because Wombat's planned static typing is just based on static analysis, which is a required part of the language, not an add on.

Monday, July 8, 2013

some history

When I started there was really only assembly (machine) language. Assembler has some features that were right, but somehow the higher level language designers didn't like. In particular, every identifier stood for a constant value. For labels this was a memory address, but identifiers were also used to give names to constants.

The exceptional language design in which identifiers stood for constants was Algol68. Unfortunately it had some serious problems. If you used it you realised that it was hard to use, but the reason was not obvious at the time, but is with hindsight. The (main) problem was that it used Ref for variables and for pointers. That's just what we do in assembler, but it isn't confusing in assembler because there are no coercions. For variables you want Ref X to coerce to X, but for pointers you don't. Another problem was that Algol68 allowed procedure values, but since they weren't closures they were not very useful.

[In the previous paragraph I used the word "variable" to mean a mutable value, as opposed to a constant. Unfortunately the word is also used for other things, such as any identifier whose value is not known at compile time. That's why Wombat now has a type Mutable X instead of Var X [update: name changed to Assignable X as per Robert Harper]]

Serious programming that wasn't assembler became possible with BCPL (which later led to B and then C). It had procedure values and I remember trying to create a closure (not that I'd ever heard of the name). I remember saying to my boss "This should work but I don't think it will". It didn't.

Other languages were becoming available. In particular Simula 67 was a mile ahead of its time and I wrote some programs in good OO style before that style had a name. But I was more attracted to the ideas of functional programming, which were espoused seductively by W.H.Burge in the IBM Systems Journal: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=5391254.

Once you have closures, and they can be full values, then you don't need any other mechanisms for control flow. For example if-then-else can be just a procedure with 3 parameters: a boolean and two closure parameters (each with no input and compatible output). I designed and implemented a baby language based on this idea and wrote it up for SIGPLAN. They managed to publish it even though I got the formatting all wrong: http://dl.acm.org/citation.cfm?id=988090.988097.

But my thinking ran into a brick wall. I wanted everything to be a procedure. In Algol68 an Array X is immutable and behaves just like a procedure. But how do you do a normal mutable array? Algol68's solution is to say that you can index a Ref Array X, and when you do you get a Ref X that you can assign to, allowing the array to be updated in place. This didn't seem right. A Ref Array X should point to an immutable array. What we really want semantically is an Array Ref X so that when you index it you naturally get an lvalue you can assign to. But that seemed like it meant a sequence of addresses which could point anywhere.

At this point I moved to CSIRO and spent 20+ years doing system management and research related to Internet security and commerce. But when I retired I started to think about it again, and I had an inspiration: Define types by their semantics and allow multiple implementations which need not cover all cases. This solved my problem because now an Array Ref X (Array (Assignable X) in Wombat speak) can have the expensive implementation of being a lot of possibly unrelated addresses, but it can also have the normal case implementation of a start address and length (and stride).

Semantic types had many other advantages, but there were still problems to solve. More on that another day.




Saturday, June 22, 2013

the wombat has landed

[update 2016-09-28: The outline is up to v 0.0.4. Have modified this post to change Mutable to Assignable.]

I've been working on the Wombat programming language (under various names) for 35 years. I finally have a design that I'm happy with, though I expect that feedback from others (if I get any) will lead to useful, and even necessary, changes.

So I'm happy to release the document "Outline of the Wombat Programming Language". It is in Google Docs, with comments enabled. Comments can also be added to this blog post. In google+ see +WombatLang and/or use #wombatlang. An issue tracker is available at https://code.google.com/p/wombatlang/.

[update: I'm happy to give talks about wombat to any group (or anyone) who is interested, within financial contraints. Within 3.5 hours drive from Melbourne or Sydney should be ok.]

The doc starts:
Wombat aims for simplicity and generality in a comprehensive modern programming language:
  • Expression language with simple left to right evaluation;
  • All optional, repeated, deferred or parallel execution through a single mechanism: the anonymous procedure (closure).
  • All identifiers stand for a constant value of some type. [Mutable variables are of a Assignable(X) type, whose constant value is effectively a memory address.] Identifiers in closures are set to their value at closure creation, and there is no confusion about what that means.
  • Identifiers are just compile-time names with lexical scope. Identifier values are set by unification.
  • Polymorphism resides in values, not identifiers or operators.
  • Operators are defined in a simple way, available to users, which supports suboperators (like then in if-then-else). This covers nearly all syntax.
  • Compile time entities and run time entities are handled with the same syntax. For example List is just a Type=>Type, and there is frequent use of tuples, lists and sets of Types.
  • In mathematics, infinite entities are studied, whether formally (coq) or informally (latex), by finite strings from finite character sets. Wombat copies this, letting the programmer work conceptually with simple infinite entities at compile time.
  • Operator macros utilize free variables to allow syntactic sugar, but also, more importantly, to provide a way of dealing conveniently with infinite entities at compile time.
  • Non-primitive types are defined by semantics. They can have multiple implementations.
  • Disambiguation of expressions uses information from the operands (input), and of the required result (output), equally.
  • The standard library is written in Wombat Compile Time Language (WCTL) and uses capabilities available to all library writers.

  • Wombat allows rich text, using bold, underlining, subscripts and an extended character set, but ASCII equivalents are available.