Friday, October 3, 2014

Wombat Type Hierarchy

It is natural to regard some types as being contained in others. An Integer is a Rational. A Dog or a Cat is a Mammal or a Quadruped. Programmers expect this to mean that they can use an Integer where a Rational is expected, and so on. If you follow this through to its logical conclusion you are bound to get the Wombat solution outlined below, or something very close. Note that this is about lossless conversion which can be reversed in a “case” statement. Wombat also allows for lossy conversion with convertsTo which will be covered in a separate post.
The relationship between types is declared (and tested) with variants of the “isA” operator. When declaring that Int isA Rational we provide a pair of functions:
  1. The “to” function creates a Rational with denominator 1;
  2. The “from” function fails if the denominator is not 1, else returns the numerator.
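Here is a rough Haskell sketch of that pair of functions (toRat and fromRat are my names, not Wombat's; Haskell's Integer stands in for Wombat's Int, and Maybe marks the conversion that can fail):
import Data.Ratio (numerator, denominator, (%))
-- "to": an Int always converts losslessly up to a Rational.
toRat :: Integer -> Rational
toRat n = n % 1
-- "from": only Rationals with denominator 1 convert back down.
fromRat :: Rational -> Maybe Integer
fromRat r
  | denominator r == 1 = Just (numerator r)
  | otherwise          = Nothing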
The effect of declaring Int isA Rational is that Int acquires all the semantics of Rationals. For example if r:Rational then we can get the denominator as r.denominator. This must now apply to Ints as well. By default the Int value will be converted to Rational, then the denominator will be extracted. It is possible and easy to add a specialization so that for Int values it returns 1 directly, though it seems unlikely that it would get much use.
Also, all overlapping semantics (such as, in this case, addition and multiplication) have to be compatible. This is a delicate matter, particularly if the types come from modules from different sources, so we’ll defer it to a later post.
Another motivating example would be similar to OO style subtyping. Suppose we have types Mammal, Quadruped, Cat and Dog such that Cat isA Mammal; Dog isA Mammal; Cat isA Quadruped; Dog isA Quadruped. Now what can we make of this list:
[ myDog myCat anotherDog ]
What is it a list of? It could be a list of Mammals, or a list of Quadrupeds. Should we make the programmer decide? What if he wants to pass this to a procedure that expects a list of Mammals, but later wants to pass it to a procedure that expects a list of Quadrupeds? It is stuff like this that drives people away from static typing to dynamic typing.
In Wombat, for every set of Types there is a Union, which is the smallest Type above all of them. So our list above is a list of Union(Dog,Cat). This automatically behaves correctly with respect to declared relations: Union(Dog,Cat) isA Mammal and Union(Dog,Cat) isA Quadruped. Note that all the semantics of Mammal and of Quadruped are already compatibly present in Dog and Cat, so there is no problem bestowing them on Union(Dog,Cat). For a Union whose components have no common higher isA relation declared, such as Union(Dog,Int), there are no defined semantics other than those of Union itself.
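A loose Haskell analogue, with type classes standing in for the shared semantics (every name and method here is invented purely for illustration):
class Mammal a    where furColour :: a -> String
class Quadruped a where legCount  :: a -> Int
data Dog = Dog
data Cat = Cat
instance Mammal Dog    where furColour _ = "brown"
instance Mammal Cat    where furColour _ = "grey"
instance Quadruped Dog where legCount  _ = 4
instance Quadruped Cat where legCount  _ = 4
-- Union(Dog,Cat): the smallest type above both components.
data DogOrCat = D Dog | C Cat
instance Mammal DogOrCat where
  furColour (D d) = furColour d
  furColour (C c) = furColour c
instance Quadruped DogOrCat where
  legCount (D d) = legCount d
  legCount (C c) = legCount c
So a value of type DogOrCat can be handed to code that only needs the Mammal semantics, or to code that only needs the Quadruped semantics, without the programmer choosing in advance.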
To fill out our list example we need to have a rule that whenever X isA Y then List(X) isA List(Y). Which gets into yet more delicate areas that we’ll leave for another day.
On the other side we have Intersection(Mammal,Quadruped) which is the largest type that is below each of them, and gets rid of those pesky bipedal mammals. There has to be at least one type that is declared to be below all the types in the Intersection, otherwise the Intersection is Empty.
The rules for the interaction between Union and Intersection need to be extended in obvious ways (such as associativity) and this makes Types into a lattice (http://en.wikipedia.org/wiki/Lattice_(order)).
Wombat’s type-assertion/conversion operator is “:”, and it can go up or down or even sideways. So myDog:Mammal always succeeds (calling the “to” conversion), myMammal:Dog calls the “from” conversion and may fail (allowing other options to be tried in a case statement). Finally myMammal:Quadruped will go down to the Intersection (possibly failing in the bipedal case) then up. Only when the Intersection is Empty does “convertsTo” come into play.
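Here is a separate, self-contained sketch of that down-then-up path (these data types are pure invention, unrelated to the class-based sketch above; Maybe again marks conversions that may fail):
data Mammal    = MDog | MCat | MHuman   -- the Human is the pesky biped
data Quadruped = QDog | QCat | QHorse
data MQ        = IDog | ICat            -- Intersection(Mammal,Quadruped)
-- myMammal:Quadruped goes down to the Intersection (may fail), then up (always succeeds).
mammalToMQ :: Mammal -> Maybe MQ
mammalToMQ MDog   = Just IDog
mammalToMQ MCat   = Just ICat
mammalToMQ MHuman = Nothing
mqToQuadruped :: MQ -> Quadruped
mqToQuadruped IDog = QDog
mqToQuadruped ICat = QCat
mammalToQuadruped :: Mammal -> Maybe Quadruped
mammalToQuadruped m = fmap mqToQuadruped (mammalToMQ m)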
Yet another future post will show how the type hierarchy gives Wombat named parameters with default values.

[update 2016-09-28: That last promise was never fulfilled. It's too easy for a post. If you want named arguments to a procedure, use a Struct parameter. Structs allow default values which have this effect: a Struct without the defaulted field will convertTo any type with extra fields that have defaults. Voila.]

Sunday, September 28, 2014

Whither Wombat?

We interrupt the sequence of posts on aspects of Wombat for a couple of planning posts. This one on language design “decisions”.

  1. Under the influence of Bob Harper, and also guys from the Mercury team, I am giving up my flirtation with lazy evaluation. It has always been my expectation that Wombat’s good support for closures would cover this. I now realise that I also need to handle memoization well. Anyway a fair bit of thought needs to go into making memoized streams as convenient as lazy lists, and also covering other lazy data structures.
  2. Also from reading Harper, I see that he has had the same issue as me with the overloaded use of the word “variable”. My solution was to use “mutable”. However he has already invented the word “assignable”, and since we don’t need two words for the same thing, I will, at some point, change the doc from “Mutable” to “Assignable”.

More substantive is the issue of the status of laws/axioms. It used to be that these were just comments. Then languages started including them in an executable form, with a bit of handwaving that “quickcheck-style checking programs might use this”. That’s where Wombat came in, and it is how laws are currently specified in the doc.

But I don’t think this phase will last long. It is time to do it properly and prove all claimed properties:
  • Primitive types will come with axioms describing their behaviour, from the X86/JVM/whatever manual.
  • The semantics of other Types will be proved from the code implementing the semantics using the assumed (if primitive) or proved (if not) semantics of the implementation.
  • Other things that need to be proved include:
    • proving that a supplied specialization does indeed implement the same algorithm as the original;
    • proving that the procedures linking one implementation to/from a more comprehensive one (usually the main one) are inverses (where the reverse is defined).

Just as the time has come for programming to help mathematicians deal with complexity, so it is time for mathematics to help control the complexity of programming. While this looks really ambitious, the main plan is to get the syntax right so that it can be brought in later. It is expected that initially the only proofs will be “admitted” (as they say in Coq). However it is expected to be extremely useful to think clearly about what should be proved.

A key decision needs to be made about WCTL: Wombat Compile Time Language. While the plan is for it to be a Wombat DSL, in the short term, to get going, it seems like a good idea to start with some existing scripting language which is well integrated with some target machine. A possibility is Pure which has a good interface to LLVM, and also to C code which could call more obscure parts of LLVM (like Integer overflow).

Friday, September 26, 2014

HoTT topics

HoTT stands for Homotopy Type Theory. Sounds scary and maybe it is. I’m slowly learning about it in various ways, mostly by watching Bob Harper’s lectures at CMU. The subject area of HoTT is “Proof Theory”. I’ve been known to say that Mathematics is about thinking clearly about problems. Proof Theory is mathematicians trying to think clearly about how to do Mathematics. While the details aren’t sorted out, there seems to be good reason to believe that HoTT is a major advance in this subject, and hence in Mathematics itself.
It has been known for many decades that there is a correspondence between logic and programming that is usually expressed as “Propositions are equivalent to Types, and proofs are the same as programs”. In HoTT this becomes true in a practical way. It looks like all or most of Mathematics can be formalized as files that can be checked by a proof assistant. Note that Mathematics is a human activity, and is about human understanding. So Mathematics can not normally be in a final form in these files since the implications for understanding need to be communicated in a human way.
In the world of Proof Theory we don’t have the law of the excluded middle. For a given proposition we might have a proof or a refutation, but until we do the status of the proposition is unknown. In Wombat there is a WeakBoolean type which also includes Unknown as well as True and False, and this has various applications which can be seen as being related to those in Proof Theory.

The core of HoTT is thinking clearly about equality. The very first bit about that (in Harper’s lectures 5 and 6) talks about the difference between intensional equality, which follows by direct application of the definition of a Type, and extensional equality, which requires a more substantial proof. This has a close analogy in Wombat, where primitive types are always Distinguishable (i.e. have a Boolean equality) based just on the representation as a string of bits. All types are then SemiDistinguishable (i.e. have a WeakBoolean equality) based, ultimately, on that equality: roughly, if two values of a Type are equal as bits then they are automatically equal, but finer concepts of equality (up to Distinguishable, which excludes Unknown) require actual code (a “program” corresponding to a “proof”).
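A small Haskell sketch of that layering (WeakBoolean, the class and the FnTable example are all just illustrative stand-ins for the Wombat machinery):
data WeakBoolean = WTrue | WFalse | WUnknown
  deriving (Eq, Show)
-- SemiDistinguishable: equality may be left undecided.
class SemiDistinguishable a where
  weakEq :: a -> a -> WeakBoolean
-- A function given by a finite lookup table.
newtype FnTable = FnTable [(Int, Int)]
instance SemiDistinguishable FnTable where
  weakEq (FnTable xs) (FnTable ys)
    | xs == ys  = WTrue      -- identical representation: automatically equal
    | otherwise = WUnknown   -- might still be extensionally equal; deciding that needs real code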

Friday, September 19, 2014

Wombat Types and their implementations


Wombat has quite a few little ideas in it, and one big idea. The big idea is this:

Types are defined by their semantics. A type can have multiple implementations with different performance characteristics.
Since changing implementation doesn’t change the semantics, the programmer doesn’t specify the implementation as part of the language. Programmers can influence the compiler’s implementation choices with pragmas.
An important addendum is that implementations need not cover all values in the type. There does have to be an implementation that covers all possibilities, and this is specified when the type is created. Here are some examples:
  • Int is the type of integers.
    • The comprehensive implementation is BigInt, which is, in turn, implemented by List Int64.
    • The normal implementation is Int64.
If an Int64 operation overflows, the operation is rerun to give a BigInt result, and the continuation of everything that follows is switched to one that uses the BigInt implementation for calculations depending on the value that overflowed.
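A minimal Haskell sketch of just the value-level fallback (the clever part, switching the whole continuation, is not shown; all names are mine):
import Data.Int (Int64)
-- The two implementations of Int: the normal one and the comprehensive one.
data IntVal = Small Int64 | Big Integer
  deriving Show
-- Checked Int64 addition: Nothing means the Int64 implementation no longer covers the value.
addInt64 :: Int64 -> Int64 -> Maybe Int64
addInt64 a b
  | b > 0 && a > maxBound - b = Nothing
  | b < 0 && a < minBound - b = Nothing
  | otherwise                 = Just (a + b)
-- On overflow, rerun the operation in the BigInt implementation.
addIntVal :: IntVal -> IntVal -> IntVal
addIntVal (Small a) (Small b) =
  case addInt64 a b of
    Just r  -> Small r
    Nothing -> Big (toInteger a + toInteger b)
addIntVal a b = Big (asBig a + asBig b)
  where asBig (Small x) = toInteger x
        asBig (Big x)   = x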
  • Array(Assignable X) is the type of traditional arrays from imperative languages: i.e. the bounds don’t change after creation but each entry can be updated. Wombat is functionally oriented, but not prescriptive. This example is the one that inspired the type/implementation split because the comprehensive implementation isn’t something one would normally want.
    • The comprehensive implementation is just an array of addresses in memory where an X is stored. The addresses are not (necessarily) laid out in any neat way.
    • The normal implementation is just a start address plus, in the 1d case, a length (and maybe a stride). Slightly more complex in higher dimensions.
This case is discussed in other blog posts: “Solving a 30 year puzzle” and “some history”.
  • Any value known at compile time can be implemented by Unit. I.e. we don’t need to allocate any space on the stack to pass it as a parameter. Though it is a slightly complicated example, let us consider the expression s.length from the previous post. This is actually String.asProc(s).length, and normally String.asProc(s) is just a closure returned from within the String.asProc code, with s locked down inside it. In order for the compiler to use the zero-length implementation of the Atom parameter, there has to be a specialization of that closure for when the parameter is exactly .length. Likely String.asProc is just one big case statement, and this specialization will just throw away all the code except the little bit returning the length. So we see that for practical purposes we end up with the same result as in languages where a method does map to a special sort of procedure.
I won’t go on here, except to say that this is a very fruitful concept, and more examples will appear.
I gave a talk on this once. The visiting expert at the end asked a question which I couldn’t understand, and hence answered incoherently. Afterwards I realised that he was suggesting that this was the same as interfaces/traits. Obviously my talk did not do a good job of explaining the concept. Or maybe the visiting expert was jet-lagged. Anyway this reminds me: if anyone wants me to give a Wombat talk, let me know and I’ll figure out if I can get there.

[update 2014-09-27: I should have mentioned the interconversion rules for implementations, which is a key thing that is different from interfaces/traits where no interconversion is expected. Every new implementation has to give a pair of functions for converting to/from some other implementation (commonly the initial universal implementation). The "to" direction has to be 1-1, and the "from" direction fails for values that aren't covered by the new implementation.]

Wednesday, September 17, 2014

atoms in Wombat

Since I'm too busy/weak to do more important things, I'm going to do some posts on specific aspects of the Wombat design. Hopefully this will be easier to understand than the doc "Outline of the Wombat Programming Language". I'll try to do it in an order that avoids too many forward references.
Lisp has always used atoms a lot, and perhaps gave them a bad name in the early days by adding a mutable value to each to give dynamically scoped mutable variables (yikes!). Wombat separates atoms firmly from variable identifiers by using a different syntax. An atom is an identifier preceded by a dot ("."). So .length is an atom. The only property of atoms is that they are distinguishable, so they can be easily used in conditional constructs like “if” and “case”.
The syntax is carefully chosen (i.e. it’s a hack) because the main use of atoms is to pick out methods. The expression s.length can occur in many languages. However in Wombat the dot is not a syntactic element, it is just a part of .length.
Of course lisp variants use atoms to pick methods as well. This looks like it is creating global names (as some uses of atoms in lisp do), but it isn’t. In any language where we can write s.length, the “length” method clearly has its meaning restricted to the context of s. In effect s is acting as a namespace of names that are relevant to it. This is much nicer than python’s len(s), in which len is a name polluting the global namespace, but without a clear meaning independent of s.
Well I had to laugh. Of course Haskell field selectors avoid the imaginary abuse of global names. But I notice that the next version will have a feature: OverloadedRecordFields. Yes, Haskell has rediscovered atoms.
In Wombat the expression s.length is a function invocation. It just follows the Haskell style in which the function is immediately followed by the parameter, and binds tightly to it. Parentheses are not needed when the parameter is a simple expression such as an identifier or an atom.
But, I hear you say, s is a String (let’s say) not a procedure. Yes, but in Wombat every value can act as a procedure. Every type, such as String, has a method .asProc which defines the behaviour as a procedure. So in this case, s.length means String.asProc(s).length. We gloss over the recursive character of this definition...
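A rough Haskell rendering of the idea (Atom, Result and stringAsProc are invented for illustration; Wombat's atoms are open-ended, not a fixed enumeration):
-- Atoms: distinguishable tokens that take their meaning from a nearby context.
data Atom = AtomLength | AtomReverse
  deriving (Eq, Show)
data Result = I Int | S String
  deriving Show
-- Every String value can act as a procedure over atoms; "s.length" is stringAsProc s AtomLength.
stringAsProc :: String -> Atom -> Result
stringAsProc s AtomLength  = I (length s)
stringAsProc s AtomReverse = S (reverse s)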
Atoms turn out to have other uses, but they are all similar in taking their meaning from a nearby context.

expanding the derivative

The arithmetic derivative (discussed here: https://plus.google.com/101584889282878921052/posts/9nY35Ma1pbU) is cute but perhaps not that important.

However differentiable functors seem important for programming. They take a data-structure type and give the type of "one-hole contexts" for that structure. This makes it possible to write code that traverses data structures in a very general way.
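The simplest instance is the list: its one-hole context is just the elements before the hole and the elements after it (a minimal sketch, nothing Wombat-specific):
-- A list with one hole: the reversed prefix before the hole and the suffix after it.
data ListHole a = ListHole [a] [a]
  deriving Show
-- Plug a value back into the hole to recover a list.
plug :: a -> ListHole a -> [a]
plug x (ListHole before after) = reverse before ++ (x : after)
-- All ways of punching one hole in a list, paired with the removed element.
holes :: [a] -> [(a, ListHole a)]
holes = go []
  where go _      []     = []
        go before (x:xs) = (x, ListHole before xs) : go (x:before) xs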
In http://stackoverflow.com/questions/25554062/zipper-comonads-generically/25572148#25572148, Conor McBride shows how this can involve partial differentiation. Wow.

As we see in that answer, it is pretty messy in Haskell, even with multifarious optional features enabled. I suspect this is the sort of thing that indicates whether you've got your programming-language design right.

Tuesday, September 2, 2014

Yin Wang talk in 2012

It would have been nice to hear Yin Wang's talk in 2012 (from http://lambda.soic.indiana.edu/talks-fall2012). Sadly some of his posts have disappeared from his blog. I wish I'd kept a copy.

Oct 19: A Fresh View at Type Inference

Speaker: Yin Wang
In this talk, I present a simple and unified understanding of type inference systems ranging from the simplest to the most complicated. Those include:
  • The polymorphic Hindley-Milner system
  • MLF and its contemporaries
  • Intersection type systems
  • Polar type systems and bidirectional type checking
I attempt to answer the following questions:
  • What are the key intuitions behind type inference?
  • What is wrong with let-polymorphism and how to fix it?
  • Why are type systems more powerful than Hindley-Milner system seldom used?
  • Why is there always a conflict between expressiveness and effectiveness and how to find a balance point?
This talk is suitable for both beginners and advanced researchers. No prior knowledge of type systems and typing rules are required. Some understanding of lambda calculus may be helpful.

Monday, September 1, 2014

Simula67 and the threaded stack

Simula 67 is just a few years from its 50th birthday. It was a long way ahead of its time. One of the things that made it possible to do all the things it did, like simulated or real time concurrency, was that it didn't use the hardware's simple but efficient stack. Instead it threaded the stack on the heap. This made it easy to have multiple stacks going simultaneously. [However it was stupid that, at least when I used it, it didn't return popped memory to free, but just left it around for the garbage collector.]

This seems like it would naturally make it a lot easier to have programs containing laziness, concurrency and parallelism, with separate processors working on separate subexpressions. Forty years ago the extra overhead of threading the stack was too inefficient (and we were pleased when C came along, even though it was much less powerful). But now I think it would be ok. Even better if memory technology supported it.

[update 7-Sep-2014: My old brain finally remembered that in Simula67 an object was just a function that left its local variables and inner procedures lying around after it finished. Also you could transfer pointers (refs) to local stuff without problems because these stack areas wouldn't be GCed if there were pointers into them. So there were substantive reasons why it worked as it did.]

Sunday, August 31, 2014

lazy closure creation

Some issues with closure creation become more important in a lazy environment. Consider this simple Wombat Void=>Int closure:
{ x+y }
Since the add doesn't involve the (empty) parameter, it is a subexpression of closure creation, not of closure execution. In an eager environment we mightn't want to do the add until the closure is called, to avoid doing it unnecessarily (this would be clearer if it were a more expensive operation). But (yet another) nice thing about being lazy is that we can do these things correctly without cost (apart from the cost of laziness itself).
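In Haskell terms the point looks like this (the shared subexpression becomes a thunk, evaluated at most once and only if the closure is ever used; mkClosure is just an illustrative name):
mkClosure :: Int -> Int -> (() -> Int)
mkClosure x y = let s = x + y   -- belongs to closure creation, but laziness defers it
                in \() -> s     -- forced on the first call, shared by later calls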

It is not so easy to switch Wombat to being lazy. The intention was that the language would have 3 levels:

  • A very low level that maps directly to the particular hardware format. Let me call this a grade in Mercury style. These are the primitive procedures and types that vary with grade.
  • The next level up is C-like and abstracts out the grade-specific crud, but still doesn't provide a safe programming environment for most uses.
  • At the top level we have types and operations with mathematically nice semantics, designed for most uses. This is where laziness might be appropriate.

parallel laziness

I've just started reading Paul Bone's thesis: https://mercurylang.org/documentation/papers/pbone_phd_thesis.pdf. I got to the bit that said "laziness makes parallelization harder". That set me thinking. Here is some idle speculation.

When we are executing our lazy program we can look at every possible expression in our program, and classify them:

  • There are expressions that we need to evaluate to advance the next IO. This includes all subexpressions that we will definitely need to evaluate.
    • These have subexpressions that we might or might not need to evaluate. In Wombat these will always be in closures. Note that creating those closures (filling in the values of identifiers) is also an expression.
  • On the other side of the coin, and working from the bottom up instead of top down, there are expressions that we could evaluate because we have all the parameters.
    • And conceivably we could order these by how likely the value is to be useful, either for the current IO or for possible future IOs. And conceivably we could improve this ordering as time goes on, based on the actual behaviour of the program.
So one can imagine a program that has as many cores as the operating system wants to give it at any particular time. And those cores can beaver away on all the available work that can be done...
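GHC already gives a cheap way to express this kind of “might be worth evaluating on a spare core” hint (this uses par and pseq from the parallel package, and is only an analogy for the speculation above, not a design proposal):
import Control.Parallel (par, pseq)
-- Spark 'a' for possible evaluation on an idle core while this core evaluates 'b'.
parAdd :: Int -> Int -> Int
parAdd a b = a `par` (b `pseq` (a + b))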

And doesn't this remind me of work that was going on in the next lab when I was younger: http://www.eganfamily.id.au/archive30nov2007/monash/research/projects/CCC/index.html. (Actually their lab was up the road at RMIT since it was a joint CSIRO-RMIT project.)
[update!! And I got this wrong: the Prof Greg Egan is not the Greg Egan (http://gregegan.customer.netspace.net.au/) who is an author and amateur (I presume) mathematician who collaborates with John Baez]. I also see Mark Rawling in the photo. Later we (mostly Mark) wrote a C++ library for reactive programming that is somewhat related to this post. And I now remember that I applied for a job with that CSIRAC2 project, because of my interest in functional programming which they were using. I didn't get it :-(.

Saturday, August 30, 2014

laziness, backtracking and explicit io threading

from http://gifsoup.com/view/37759/lazy.html.
When I first heard about laziness in programming languages I thought "That's just an optimization technique". Then I saw the example of an infinite lazy list and I thought "That's just a stream, umm I mean a memoized stream [1]". Well yes, but streams are uglier and harder to think about than lists, and this quickly gets worse for more complex structures. So I've come around to being rather sympathetic to laziness by default.

And laziness seems closely related to an important emerging trend in programming, which is to allow the programmer, as much as possible, to write the program as specifications and tests rather than in the traditional style of "step by step instructions to a complete idiot". This is also related to the trend that the programmer gets to deal with entities with mathematically nice semantics rather than entities with hardware-level semantics (and this is an important aim of Wombat).

If we're lazy then we need something to inspire us to do anything at all, and in programming it is the need to do IO that provides the inspiration. And, of course, we need to do the IO in the correct order, so you come around to the need to explicitly tell the compiler when IO is needed and in what order. In Haskell that is done with the IO monad. I won't explain the details since Mercury has a similar but conceptually simpler system.

I vaguely knew that the Mercury programming language (mercurylang.org) also had explicit threading of IO. So when I learnt about laziness, I naturally assumed that Mercury was also lazy and that was the reason for the IO threading. Now that I have (finally) started to learn Mercury [2] I see that this is wrong. Mercury is a logic programming language, in the tradition of Prolog, but much more hard core about being declarative. The reason it is explicit about IO is because it supports backtracking when searching for a logical result. So it needs to thread IO to prevent backtracking into an IO operation that has already happened.

The way Mercury does this is that procedures which want to do IO (starting necessarily with main) have:
  • an extra input parameter representing the world before the procedure is called. This parameter is marked "will be destroyed" so that it can't be reused after the procedure is called.
  • an extra output parameter representing the world after the procedure returns. This value is marked "unique", meaning that there are no other copies of it. And naturally this has to be used as the input parameter of the next IO operation.
This could get tedious, but there is syntactic sugar to handle the bookkeeping. An obvious question is: Why does Mercury have eager evaluation instead of lazy, since it is declarative? Is there some interaction with logic programming and backtracking that makes laziness difficult? I don't know the answer to that, but here's a different question. 
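The same shape can be written out explicitly in Haskell, which is roughly how GHC represents IO under the hood (World here is just an illustrative token, not a real API, and these "IO" operations only pretend):
-- Each IO-performing procedure consumes one world and yields the next.
newtype World = World Int          -- stand-in token; real worlds can't be duplicated or reused
putLine :: String -> World -> ((), World)
putLine _msg (World n) = ((), World (n + 1))   -- pretend we wrote to the terminal
getLine' :: World -> (String, World)
getLine' (World n) = ("hello", World (n + 1))  -- pretend we read from the terminal
-- main threads the world through each operation in order.
mainLike :: World -> ((), World)
mainLike w0 =
  let (line, w1) = getLine' w0
      ((),   w2) = putLine line w1
  in ((), w2)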

How do you handle the important need for IO concurrency? In Haskell the guys from Facebook have recently released their Haskell library for concurrently reading from multiple sources. However they are careful to only claim to handle reading. So I'm guessing that handling concurrent changes to the world is not so easy.

In Mercury one can write code (I hesitate to say "function") where spawned procedures can fan out and rejoin (actually I don't yet know how to rejoin before end of program, but I'm sure it is possible). Anyway this is all nicely explained by Mark Brown in this email: http://www.mercurylang.org/list-archives/users/2014-August/007757.html. You need to learn some Mercury to understand it, but I recommend that.

----------------------------------------------------------------
[1] A stream is just a procedure (taking no input) which either returns an end-of-stream marker or it returns a pair consisting of the current value (like the head of a list) and the next procedure to call (like the tail of the list). By a memoized stream I mean that all those procedures are memoized (which means those procedures quickly return their result rather than recalculating it when called more than once).
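A small Haskell sketch of that definition (the names are mine):
-- A stream: a procedure that either ends or yields the current value and the next procedure to call.
data Stream a = Done | Yield a (() -> Stream a)
countdown :: Int -> Stream Int
countdown 0 = Done
countdown n = Yield n (\() -> countdown (n - 1))
-- In Haskell the memoization would come for free: make the tail a plain lazy field
-- (Yield a (Stream a)) and each tail is computed at most once, however often it is inspected.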

[2] Learning Mercury is easier than on my previous attempt (15 years ago) because of the rather nice, though unfinished, tutorial by Ralph Beckett which is at the top of their documentation page.

-----------------------------------------------------------------
Update 2014-09-20: I just saw Bob Harper's post "The Point of Laziness". He basically endorses memoized Streams as doing the job of lazy lists. I'm inclined to be convinced, and Wombat's neat terse procedure format should suit. It is a challenge to handle more complex lazy structures nicely with procedures.

Update 2016-09-28: The current spec calls for values to start as holes and be passed around like that till set. This does some of the job of laziness. Streams can be created in iterator/yield style, making streams much easier to use.

Saturday, August 16, 2014

IO in a declarative or functional language

Haskell is declarative and functional. Its supporters seem to hang out more with the functional crowd. Maybe they think it is easier to sell "declarative" to functional folk, than to sell "functional" to other declarative folk. And there are good reasons why that might be so.

Suppose we declare that
y = x + 1
[This is not an assignment. Imagine a val in front if that fits your language preferences]. This declares a relationship between x and y, but it does it in a functional style. It would look more functional still if we wrote "y = add 1 x", but that is just sugar.

But what if we have y and we want x? Since we are just declaring a relationship it isn't obvious why we have to change this. But in functional style this has to be changed to:
x = y - 1
Functional programs can only run forward; they can't run backwards. Haskell has, of course, a big exception to that: constructors can be run forward (to construct) or backward (to deconstruct). But perhaps everything that can run backwards should be allowed to (and hence be usable in case statements). In addition to convenience there is a dodgy philosophical point.
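In Haskell terms (illustrative only): a constructor can be applied or pattern-matched, but an ordinary function cannot appear in a pattern, so its inverse has to be written by hand:
data Wrapped = Wrap Int
forward :: Int -> Wrapped
forward x = Wrap x          -- construct
backward :: Wrapped -> Int
backward (Wrap x) = x       -- deconstruct: the constructor "run backwards"
-- By contrast, y = x + 1 only runs forward; we must invert it ourselves:
invert :: Int -> Int
invert y = y - 1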

When we look at the way the world works it is rather like a giant computer program. And it is particularly like a program in that, on a large scale it pushes forward in an irreversible way. Entropy increases and broken plates never reassemble themselves. And yet at the core is quantum mechanics which is completely reversible.

Haskell and Mercury are two languages in which the interaction with the external world (including mutable data in memory) is explicit. This is a consequence of the fact that both languages are declarative. In non-declarative languages the textual order of the program specifies an order of execution (optimisers alter that, but they are constrained to preserve the semantics, and are forced to make conservative decisions as a result). In declarative languages the compiler generates execution when required, and the thing that brings about that requirement is interaction with the outside world. Since that interaction is explicit, one would hope that the optimiser is never constrained by worrying about what separately compiled code might do.

So it seems that programming should have three levels.
  • A core language that is reversible. This specifies what can be handled by cases, though if you allow multiple returns (as Mercury does) then you can expand that.
  • Reducing operations (like summing a list) which are intrinsically irreversible.
  • Interaction with the unforgiving world which forces an order of execution.
But maybe it's not quite that simple. It seems that if the part of the real world you are dealing with is quantum then it might be reversible, and you might conceivably be able to interact with that in the reversible part of the language. Perhaps more practically: when two programs interact with each other they might sometimes be able to do that while each stays within the reversible core.

I should have another go at learning Mercury. Last time was 15 years ago.