Tuesday, December 26, 2017

code for Wombat/Marsupial Operators

[Update: This was an insanely premature announcement. Stay tuned ...]
[Update2: now looks roughly ok -- next step write an interpreter]

One of Wombat's many aims was to have user-defined operators that are so good that there is no need for operators to be built into the language at all.

  • Parentheses ("(...)") are just an operator with no left and no right that maps to the identity function.
  • Semicolon ("statement1;statement2;expression") is an operator: with left and right and low priority on both sides.
  • etc -- operators are the only syntax for most users.
We now support different versions of an operator with a left or with no left (e.g. x-y versus -z). Also operators overlap as long as they stay the same until they diverge by either a (mandatory) operator or by whether a mandatory operator is/isn't followed by an operand. E.g. "[" has 3 forms:
  • [ ] is the empty list
  • [ space-separated-elements ] is an ordinary list. Note that wombat uses space as a separator here, so procedure calls have to use adjacency f(x) or be parenthesised (f x). You don't like that? It's really trivial to change operators, and still be compatible with other code.
  • [ head | tail ] intended for pattern matching, but works either way.
The point is that both the compiler and the reader can understand which is which with only one token lookahead.

The code for this now passes the barest one line test. I'll get back to testing it when I have more wombat code. The code is in v0.0.1 under marsupial (https://github.com/rks987/marsupial/tree/v0.0.1). Marsupial is just the bare minimum, and the plan is eventually to separate out the include/import/require files that turn marsupial into wombat.

If any languages out there would like a comprehensive scheme for user defined operators then I might find time to help. It is written in python3, but really just uses json-level features (apart from using generators), so I think it will fit into other languages. The problem is interacting with compiler defined operators already there. Wombat doesn't have that problem because there aren't any!

Wednesday, June 21, 2017

No Holes in Closures, and "case" behaviour

No holes in closures

In Wombat closures are the only block structure. So closures are often called immediately that they are created, and it is easy to forget that they are potentially independent entities. In particular it is clearly wrong to allow references to external identifiers to refer to ones which are still holes and don't have values at the time of closure creation. For one thing the closure will run differently before and after the hole is filled.

Holes should only be passed to closures at closure execution time, via the input parameter ($) and the output (`$). So the code for the appendTest example (from Logic Programming in Functional Style) should be:

`append = { # $ is 2 input lists
case $ of [ # nb. "case x of y" just maps to "caseP y x" { $ = ([],`y); y } { $ = (`hdx +> `tlx, `y); hdx +> append(tlx,y) } ] }; print( append([1 2],[3 4])); # [1 2 3 4] [1 2 3 4] = append([1 2],print(`a)); # [3 4] [1 2 3 4] = append(print(`b),[3 4]); # [1 2]

 And the code for the factorial example (from Quantum, Entropic and Entangled computation) should be:

    `fact = { case $:Nat of [
                { $ = 0; 1}
                { $ = `n >? 0; n*fact(n-1)}

    6 = fact `x; print x

Case Behaviour

Another problem in the factorial example is that the 2nd option won't terminate without substantial and inappropriate compiler cleverness, as the rules stand. I could use firstCase instead, which would work in this particular example, but it is a hack (even though it is the only case construct in most languages). So my current thinking is that when running backwards and matching the output instead of the input it should act like anyCase and abort other branches when a match is found. But I don't think that is quite the right answer.

Thursday, June 15, 2017

Quantum, Entropic and Entangled computation

Quantum, Entropic and Entangled computation

[See the new post https://wombatlang.blogspot.com.au/2017/06/no-holes-in-closures-and-case-behaviour.html for fixes to problems in this post.]

In the quantum world every event is reversible. But above that we have the world of everyday experience where entropy always increases, time moves relentlessly forward and events are not reversible.
In computing there is a similar situation. Some functions don't lose any information when run in the normal forward direction, and these should be able to be run backwards to give the inverse function. Factorial is such a lossless function, and it is interesting to run it backwards in Wombat. [Relevant aspects of Wombat are given at the end for newcomers].
`fact = {   $:Nat = `n;
           caseP [
               {n=0; 1}
               {n>?0; n*fact(n-1)}
           ] ()
6 = fact `x; print x
  • `x = H1 -- call fact (Holes are given as Hn, as they are created).
  • in fact for the 1st time. $=n=H1. `$=6.
    • in 1st case n=H1=0, `$=1 fails
    • 2nd case. n=H1, 6=H1*fact(H1-1)
  • in fact 2nd time. $=n=H2=H1-1. `$=H3. 6=H1*H3.
    • in 1st case. H2=0. H1=1. H3=1 -- 6=1*1 fails
    • 2nd case. H2>0. H1>1. H3=H2*fact(H2-1).
  • in fact 3rd time. $=n=H4=H2-1=H1-2. `$=H5=fact(H2-1). H3=H2*H5. => 6=H1*H2*H5
    • 1st case. H2=1. H1=2. H5=1. But 6=H1*H2*H5=2 fails.
    • 2nd case. H4>0.H2>1.H1>2. H5=H4*fact(H4-1).
  • in fact 4th (and last) time. $=n=H6=H4-1=H2-2=H1-3.
                           `$=H7=fact(H4-1). H5=H4*H7 => 6=H1*H2*H4*H7
    • 1st case. H6=0. H4=1. H2=2. H1=3. H7=1. 6=H1*H2*H4*H7 succeeds! The answer (x=H1) is 3.
    • 2nd case. Exercise: Prove that result must be greater than 6... fails
As we recurse, we keep creating new holes, and having to remember the relationships between them. This is vaguely reminiscent of thunks in lazy languages. It is also feels like entanglement, though whether it could have anything to do with quantum physics entanglement is left as an exercise for the reader.

some relevant features of Wombat are:
  • An identifier is preceded by backquote when used for the first time. It starts life as a hole, and like all holes it can only be filled in once. `x:Int; x=3 (Explicit typing is optional.);
  • An explicit procedure (closure) is just an expression in braces -- { x+1 } ;
  • A closure's input is $ and its output is `$. The input is commonly a tuple which is unpacked immediately, and $ is never mentioned again -- { $ = (`x,`y); x+y } ;
  • If `$ isn't explicitly unified, then it is unified with the whole expression: {$+1} means {`$=$+1}.
  • For each boolean operator, there is a version with ? (such as >? above) which succeeds or fails, instead of returning true or false. The failure is informational, allowing alternative paths. A double ? would cause a fatal error when false. The ubiquitous = operator is the same as ==?.
  • caseP takes a list of procedures, passing the 2nd parameter (just () above) to each. It expects exactly one to succeed, giving its result. Actually more than one can succeed if they give the same or compatible results, but that's another story. The example above would have been easier if I'd used firstCase instead.

P.S. I fixed the formatting in the previous post. Don't ask me how Blogger corrupted it.

Monday, June 12, 2017

Logic Programming in Functional Style

[See https://wombatlang.blogspot.com.au/2017/06/no-holes-in-closures-and-case-behaviour.html for updates on this post. Yes the github code is now wrong ...]

[N.B. There is code. In https://github.com/rks987/appendTest I have hand-compiled the example below to an AST, and written a pretty-printer to check it is right. Next step is to write an interpreter for the AST. How hard can that be :-).]

Wombat was designed to be a (mostly) functional programming language with some logic programming capabilities. But it turned out that you can't be half-hearted about logic programming. However the functional roots shine through, giving Wombat a familiar look for most programmers. But behind the scenes, unification is everywhere.

Procedures are also ubiquitous in Wombat. They always have exactly one input and one output. Even things that aren't procedures can act like procedures. In normal functional programming the input is specified, and the output starts out as a hole that the result gets put into. In Wombat both the output and the input are passed to the procedure. Either can be a hole. One or both might be structures which include holes. Consider

(x,y) = f(1,2)

Here "=" causes unification. One might think that the function f will be called, it will return a pair, and unification will then happen. But this is not how Wombat works. Instead (x,y) is unified with f's output, (1,2) is unified with f's input, and execution of f then starts.

Before we look at a more interesting example, some relevant features of Wombat are:
  • An identifier is preceded by backquote when used for the first time. It starts life as a hole, and like all holes it can only be filled in once. `x:Int; x=3 (Explicit typing is optional.);
  • An explicit procedure (closure) is just an expression in braces -- { x+1 } ;
  • A closure's input is $ and its output is `$. The input is commonly a tuple which is unpacked immediately, and $ is never mentioned again -- { $ = (`x,`y); x+y } ;
  • If `$ isn't explicitly unified, then it is unified with the whole expression: {$+1} means {`$=$+1}.
  • A list is given by elements in square brackets separated by spaces. The +> operator adds an element to the head of the list and is invertible.
  • print returns its argument.

Here is the classic list append program (using the caseP procedure, rather than the almost identical case syntactic sugar):

`append = {
$ = (`x,`y); # 2 input lists caseP [ { x=[]; y } { x = `hdx +> `tlx; hdx +> append(tlx,y) } ] () }; print( append([1 2],[3 4])); # [1 2 3 4] [1 2 3 4] = append([1 2],print(`a)); # [3 4] [1 2 3 4] = append(print(`b),[3 4]); # [1 2]

Consider the last line. Execution proceeds concurrently:
  • x is unified with print(`b) and y with [3 4];
    • print is called with its `$ set to the hole x, and its input set to the hole `b. Since it is going to have an effect it has to stall waiting for one or other to be filled. If there were any later effects they would also stall, even if ready to go, because of a lexical ordering requirement.
  • At the same time caseP is called with input set to unit (=()), and output set to the output of the whole procedure (i.e. [1 2 3 4]) since it is the last expression. Now caseP calls all procedures in its list expecting precisely one to succeed. In this case:
    • Closures execute in a sandbox where unifications with holes from outside are tentative and only make it out if the procedure doesn't fail. If the outside hole gets filled in while the closure is executing then the unification is made firm if it agrees with the tentative binding, or the closure fails if it doesn't.
    • So when we look at the first procedure in the caseP, it tentatively unifies x with [], then tries to unify y=[3 4] with `$=[1 2 3 4]. This fails, so that closure fails.
    • At the same time we start the more complex 2nd closure. The first line creates a relationship between the 3 holes: x, hd and tl. The 2nd line then unifies [1 2 3 4] with (hd+>append(tl,y)). this sets hd=1 and unifies [2 3 4] with append(tl,y). So we call append recursively with `$=[2 3 4] and $=(tl,y).
    • The following time that append is called we have `$=[3 4] and then the first closure succeeds (while the 2nd fails), so that when it returns it establishes its parent's y as [3 4], tlx=[] and hdx=2. This resolves the previous line giving x=[2].
    • When this returns the output of print(`b) is unified with [1 2] which in turns sets b to [1 2] and allows the print to proceed.
    • If we weren't going to use b subsequently we could have just written print(_) because _ is always a new hole.

Friday, November 4, 2016

The case for Algol2020

I decided to take a different tack on my OBT submission. The new submission is at https://docs.google.com/document/d/1PQSc0UEHcYrfbiv_AGDxc8Wvjiy75UsyA5pM2mqRiyk/edit?usp=sharing, and also pasted below. And the pdf I actually submitted is https://drive.google.com/file/d/0B9G51OQiN3RuanMxTkFjT1pCRnM/view?usp=sharing. See you in Paris. Feel free to lobby the conference committee if you know them :-).

The case for Algol2020

Robert Smart

There are many reasons why now is a good time for a new version of Algol, and since “we can see clearly now” it would be a shame to miss the opportunity for Algol2020.
Programming has got past the OO fever, and established a good, though still improving, feel for programming with values rather than variables. Algol68 had already made great progress in the latter direction with its fundamental principle that identifiers stand for constants.
Algol68 was a difficult language to use, and with 20-20 hindsight we can now understand why:
  • The dual use of ref types for pointers and for variables was a mistake. It is true that both have the same implementation as an address in memory, but for variables we want a dereferencing coercion to operate and for pointers we don’t. Indeed, more generally, we want types to be defined by their semantics and not their implementation.
  • Procedure values lacked power because they weren’t closures. This was a pity because the Algol68 rule that identifiers stand for constants is exactly what is needed to give closures a clear cut meaning.
It was a nice move towards functional programming that Algol68 had array X which was constant and behaved just like procedures Index=>X. But why then isn’t a normal mutable array an array(ref X)? Semantically you give it an index and get back a mutable value you can assign to. Instead Algol68 had a hack where a ref (array X) could be indexed and gave back a ref X, allowing one to change a part of the ostensibly constant array X value. Presumably this was intended to be more efficient, yet this is once again mixing semantics with implementation. An array(ref X) could conceivably have pointers to variables arbitrarily arranged in memory. However a simple start address and length could be a separate implementation that gives the required efficiency.
Similarly int64 can be an efficient implementation of a full int type, as long as we detect overflow and transition when necessary to a wider implementation such as BigInt.
Algol is naturally oriented to arrays rather than lists. However this is a distinction without a difference once types are oriented to semantics rather than implementation. Both lists and 1-dimensional arrays are just different implementations of what mathematicians call a “word”. This is hard to see because we are used to arrays of variables (array(ref X)) implemented by a start address and length, and you can’t prepend a new variable to that and retain the “start address and length” implementation. We can do it if we move to a fully general array(ref X), or preferably something in between. This is like moving from Int64 to BigInt when necessary.

The Wombat programming language

The Wombat programming language has been under development for 35 years (under various names) inspired initially by the desire to address the problems in Algol68 mentioned above. It has quite a lot of ideas that I would be keen to put before the Algol committee (IFIP WG 2.1), should they be interested in defining a new Algol. I list some of them here because they show what can be done within the Algol tradition.
  • Behaviours are like Haskell type classes, listing properties and laws. This is similar to interface/trait without allowing them to be used as if they were types.
  • It is natural to organise types in an embedding hierarchy. An Int is a Rational, and a Dog is a Mammal. This is done with apply/unapply pairs of functions (in Scala terminology). The hierarchy forms a bounded order lattice and this is the natural setting for Algol68-style Union. (Unrelatedly, we also need DisjointUnion which the application programmer will more normally use.)
  • The type at the bottom of our hierarchy has no elements. It is Empty in Wombat.
  • Automatic conversions and casts within the hierarchy go down to the intersection of source and destination, then up. When the intersection is Empty then other coercions, such as dereferencing, can occur.
  • Diamond ambiguity (choice of conversion paths) is addressed in Wombat by combining procedures so that the fact that the two paths give the same result is checked at run time. Combinable is a weakened Distinguishable that is sufficient for some uses. Combining is also used to support procedure declaration by multiple declarations with different parameter patterns.
  • The Algol68 declaration, such as “int five = 5” can, in modern style, be sensibly reduced to “five = 5”. It is natural then to allow the left side to pattern match and unpack structures. However we can have names on the left already declared from an outer level which are just representing their outer value. Wombat avoids any ambiguity by marking new names with a preceding backquote. It also extends the meaning of “=” to general unification. The benefit of this is that the same code can, in logic programming style, be declarative and can be run forwards or backwards in some circumstances.
  • Wombat has an Atom type, with values being an identifier with a leading dot (.), and no Behaviours except Distinguishable. So s.length is just a procedure call with Atom parameter .length. All values are allowed to behave as procedures in a type-specific way, so the convenience of OO method call syntax is delivered with simple procedure call semantics. So if s has type String then s behaves as (String.asProc s).
  • The advantages of lazy evaluation can be substantially realised by use of streams [Stream(X) = Unit=>(X*Stream(X))], and similar types, if they can be implemented in iterator style.
  • We can see that proofs are going to become an important part of programming. In Wombat it is expected that when the properties and laws of a type are defined using the properties and laws of an implementation type, then that should be proved. When a specialization is added to a procedure for some implementation of the input, then it should be proved that it actually does implement the same algorithm. Inverse pairs arise in several circumstances, and it is reasonable to expect a proof that they are inverses. And so on. This is an important but new and difficult area that might make it hard to hit the 2020 date.
Naturally there are many other researchers and developers who will want to add great ideas to a new Algol.

More information about Wombat is available on the blog wombatlang.blogspot.com. The posts are of varied quality and varied relevance to the current design. The post Rationale of the Wombat Type System is an alternative OBT submission that I prepared and is perhaps a good introduction. The outline doc, "Outline of the Wombat Programming Language", is reasonably up to date, but not currently very well organised.