Saturday, November 21, 2015

Holes: ideas and puzzles

[note that holes have been brought in to version 0.0.4 of the doc, but in a simpler way than this.]
An example from the overview:

firstCase someExpresion of [
{ `x, ok = $; … }
{ `x, error(timeout,`extraInfo) = $; … }
{ `x, `otherError = $; … }]
The idea is that when the error is a timeout we extract the extraInfo and do something with that. The error identifier is acting as a procedure here, and maybe it is one, just building up an appropriate structure:
`error = { (`errorCode,`extraInfo) = $; (.error, errorCode, extraInfo) }
Here we return a 3-tuple with an atom first component used to identify it as an error structure. This might not be a very likely methodology, but it is conveniently illustrative.
`extraInfo is a hole waiting to be filled. What can we make of it being a part of the parameter we pass to error? And holes naturally appear in other places (such as pigworker’s amazing derivatives of data structures). And we’ll see others below. We want to handle them in a uniform way.
The easy thing, if we can make it work, is to see holes as just another type. Let’s write -T for a hole of type T. If X convertsTo Y, then we see that -Y convertsTo -X. If Int convertsTo Rational, then a hole for a Rational can take an Int. We want this to also work for the more important isA relation, and let’s assume that before getting to how it might work.
Extrapolating wildly based on our notation, it is natural to assume that Empty = -Empty, and the other hole types are below that, with -Any at the bottom.
An interesting thing happens when you consider the type Assignable(X) of mutables that can hold a value of type X. Naturally Assignable(X) convertsTo X, so that we can conveniently use the value. But now we see that we can also have Assignable(X) convertsTo -X, which will allow us to use our Assignable as a hole and assign new values using more general hole-oriented code.
The isA rule is that properties must always decrease as we go up. To fit this in we will say that holes types have negative properties. Some types (such as Assignables, or the tuples we looked at above) have mixed positive and negative character. The isA rule now says that, as we go up the isA hierarchy the number of positive properties must go down, while the number of negative properties must go up.
I’m not sure if this is all going to work, but it is fun.

[update: This suggests that an explicit closure should be an Any=>-Any, and maybe other things.]

[upupdate: A cute thing, if this all works is the ancient assignment statement becomes:
someHole := someExpression
and this works in all hole contexts, and its use for Assignables becomes a special case.]

Saturday, November 7, 2015

procedures: - combining and adjoint pairs

After the previous post, I feel a need to put things on a slightly firmer footing. This is a start.

Combinable

We define a Behaviour Combinable with a property combine that is a procedure of type SemiSet X=>X. Informally (ignoring side effects):
  • Every type that is Distinguishable is Combinable. The SemiSet is a Set if X is Distinguishable. If the set has one element then return it, else fail.
  • Procedure type P=>Q are combinable if Q is. The result of combining them is a procedure which runs all procedures and returns the combination of the successful results.
  • A Union is combinable if the alternatives are. If the parameters are in one branch only then it is the combination in that branch. If the values are all in the intersection of 2 or more branches then that intersection must support combining (since properties are lost but not gained as we go up the isA hierarchy).
  • A tuple with combinable component types is combinable: just combine each component separately.
  • Etc.
All types considered in Wombat documentation to date are Combinable, and types which aren’t Combinable will have poor semantics. For example we could imagine a well behaved form of approximate numbers which are defined by a range of Rational numbers. We could say that two are Combinable if their ranges overlap, in which case the combine operation creates a range covering both ranges. But standard floating point numbers are not going to be Combinable in any sensible way.

AdjointPair(X,Y)

These used to be called inverse pairs. That was misleading because it suggested symmetry between the two directions. The new name is also slightly misleading, suggesting that it can be seen as an example of adjoint functors from Category Theory, after which it is named because it seems similar.
If ap is an AdjointPair(X,Y) then it has 2 procedures:
  1. ap.to:(X=>Y) never fails for an x:X, and is effectively 1-1 (injection / monomorphism / section) from X to Y, since one can (almost) get back to x with the other half of the pair;
  2. ap.from:(Y=>X) applied to ap.to(x) may not get back to x, but it will get back to something that is compatible with x, so that they could be combined. In the common case where X is Distinguishable it does get back to x. Conversely ap.from will fail (Fail.next) if its parameter does not come from an X, except that if X is a procedure type it might instead return a procedure that fails when called.
The most obvious place where AdjointPair is used is in an isA declaration. Now if X isA Y via apxy, and P isA Q via appq, then there is an induced relation:
Y=>P isA X=>Q via AdjointPair(
   { `yp = $=:Y=>P; { appq.to (yp (apxy.to $)) }} : (Y=>P)=>(X=>Q)},
   { `xq = $=:X=>Q; { appq.from (xq (apxy.from $)) }} : (X=>Q)=>(Y=>P)})

The 2nd (.from) procedure doesn’t itself fail, but instead it returns a procedure that might fail if given a Y that isn’t an X. The key point is that if you start with a Y=>P, convert to an X=>Q, then convert back, then you get back to a procedure that has the same semantics, just some extra unnecessary checks thrown in. One might well say that this transformation is a natural one. Maybe if we define an appropriate category then these really are an adjoint pair in the Category Theory sense. Anybody who can see that and explain it will get a permanent acknowledgement in Wombat’s main documentation. For extra merit: What then is the associated monad?

Tuesday, November 3, 2015

Adventures in combining procedures

Gaussian Integers are just complex numbers with integer real and imaginary parts. Why don't they just call them Complex Integers? That's what I'll do here.

Suppose we have types Int, Rational and ComplexInteger, such that Int isA Rational and Int isA ComplexInteger.

Imagine a procedure that takes a Union(Rational,ComplexInteger). Any well behaved such procedure is going to give the same result for an Int parameter, whether it arrives as a Rational (with denominator 1) or as a ComplexInteger (with imaginary part 0). Indeed by transitivity, Int isA Union(Rational,ComplexInteger), so it can just come as itself. For definiteness let's suppose the procedure returns the magnitude as an AlgebraicNumber. The traditional hacky approach is to use the equivalent of Wombat's firstCase:

  `magnitude =
    { firstCase $ of [
        { $=`p:ComplexInteger; AlgebraicNumber.sqrt(p.real**2 + p.imag**2) }
        { $=`p:Rational; AlgebraicNumber.abs(p) }
    ]}

Int parameter's will always take the first exit (even if Rational!). The more respectable way to code this is to use the case operator:

  `magnitude =
    { case $ of [
        { `p = $=:ComplexInteger; AlgebraicNumber.sqrt(p.real**2 + p.imag**2) }
        { `p = $=:Rational; AlgebraicNumber.abs(p) }
    ]}

Note that the procedures are still given as a list here, but are actually a set in this instance (with no order), or rather a SemiSet, but we can use list operator because List X convertsTo SemiSet X.

In this 2nd implementation using case, when magnitude is given an Int parameter it will (in theory) run both procedures to conclusion and check that they return the same result. However a runtime check might not be necessary if the compiler can work out that both return the same answer, and in that  case it can choose the more efficient.

The case operator is precisely the mechanism for combining procedures. So we can imagine a class of generic procedures such that, if the procedure is defined for types X and Y, then it is also automatically defined for Union(X,Y). Wombat supports such generic procedures via properties of types. In this case it is natural to make .magnitude a method by making it a case in the type's .asProc static property. This is convenient, allowing one to write expr.magnitude. Other static properties of types can be used for generic procedures that don't fit the .asProc pattern. Except that Wombat doesn't do duck-typing. So the automatic combination of .asProc and other properties only applies where both types get the property from the same Behaviour, but we'll gloss over that for the moment.

Let's see if we can prove in an informal way that this works correctly. Suppose we have, as well we might, a type ComplexRational such that there are declared relations: Rational isA ComplexRational and ComplexInteger isA ComplexRational. The rules say that Union(Rational,ComplexInteger) isA ComplexRational. Suppose (simplifying) that the .magnitude operation for the Union is formed by combining the operations for Rational and ComplexInteger. We know that the .magnitude operation for Rational is compatible with that for ComplexRational, and ditto for ComplexInteger. Now consider the operation of .magnitude on a Union value. If the Rational branch fails then the ComplexInteger branch is the one that wins, and it is compatible with the result for ComplexRational. Conversely if the ComplexInteger case fails the Rational branch result will be compatible. If both succeed then the two results are compatible with each other, and each is separately compatible with the .magnitude of ComplexRational. The combination then has to be compatible as well. Hopefully I can make a less hand-wavy proof one day.

I was going to continue with a discussion of folds. For associative operations, foldl and foldr can be combined (and maybe we can add other potentially parallel variants), and one can imagine adding a little magic pixie-dust that will allow the compiler to pick the best for various available implementations of List. However that will have to wait for another day.

Saturday, October 24, 2015

Scala is scary

This paper about DOT (= future Scala) has stuff that might be relevant to Wombat: http://arxiv.org/abs/1510.05216.
  • "extending subtyping to a lattice breaks at least one key metatheoretic property such as narrowing or subtyping transitivity". Apparently they solve the problem by distinguishing run-time values from those that exist at compile time but are never instantiated. This distinction is important in Wombat where infinite things exist at compile time.
  • However the name DOT = "Dependent Object Types" makes me wonder. Objects are bags full of variables and variables preclude any sane type hierarchy because they can't behave covariantly or contravariantly with respect to any other type. I.e. if X isA Y then we can't create a rule that Assignable(X) isA Assignable(Y) because suppose we pass an Assignable(X) to a procedure that wants an Assignable(Y): that procedure might assign a Y to it that isn't an X. So I don't know how DOT can have sane semantics. I should add it to my list of "stuff I really should read", but I don't feel strong enough.
Jon Pretty gives another interesting talk: http://rapture.io/talks/slippery-surface/cadiz.html. I should try to understand how his ideas fit in with Wombat. His (and presumably Scala's) idea of intersection types is not the same as Wombat's. 

The reason Wombat should (I expect) work without all the drama of more complex languages is the tough rule on the "isA" hierarchy: As you go up the hierarchy you always get less properties and the properties that are the same are (provably) compatible, and you have more data so that you can (provably) go back down the hierarchy with case statements and get back where you came from.

[Update: I should say something about what "compatible" means. One example is that addition works compatibly as you go from Int to Rational, even though one gives an Int result and the other a Rational result, the results are equal whether you promote the Int or demote the Rational. But it gets more complex when you get to things that don't support equality, particularly procedures. This shows up in the case of .asProc. We can use any type as if it was a procedure. For example "a string".length actually calls String.asProc("a string") with parameter .length. But .asProc is going to do less and less as we go up the hierarchy. So compatibility means that a higher .asProc might fail on a parameter, but if it succeeds it will return the same result (or compatible!!). A .asProc can take a parameter and pass it to the .asProc of all lower types (without having to know what they are) and combine the results. This is the equivalent of Abstract methods in other systems, but it has no problem with the situation where the value belongs to the intersection of multiple lower types.]

Wednesday, October 14, 2015

Units in Wombat

The way I hope to do units in Wombat is something like this:

There will be a type for length/distance. There will be some predefined values of that type such as metre. If you want 3.0 metres it is 3.0*metre. If you have a distance and you want to write it out as a number, divide by metre: myDist/metre. The same for angles: Ď€*radian=180*degree. It gets a bit ugly for temperatures: zeroC+30.0*cDeg, and ditto date-time.

To be more exact. Assume the numbers we are working with are type Approx. Units will live in a type U. So metre above should really be U.metre, but there is nothing to stop you declaring `metre=U.metre, etc. The Approx number live in U as a degenerate set of units, so when you do myDist/metre you actually end up with type U, not type Approx. However Approx isA U. So, either immediately or at any later time, we can convert our result to Approx without failing.

[Update: This is just because units exist where we have a torsor (http://math.ucr.edu/home/baez/torsors.html): the difference or ratio is a number, but there is no default scale (multiplicative) or no certain starting point (additive), and the two variants seem to often come together. At any rate it is obvious that we want to generalize this and have type U be just a special case of a wider torsor type.]

[Update 2: Wouldn't it be better to have each sort of quantity (distance, mass, time, etc) be a different type? Then the type system could sort everything out, and detect mistakes at compile time, and some things become more convenient (distance1/distance2 is automatically just a number). But there are an infinite number of combinations (mass/(distance**3) is density) so it would be a complex variety of indexed types. This is very typical of the hard choices between static and dynamic typing. Wombat's standard library will try to hit the middle, while allowing others to do very static or very dynamic stuff. In this case a Distance type may not need any additional properties compared to type U restricted to distance. If that is the case then the advantages of having Distance as a separate type can be realised in a simple way using subset types, that is also pretty easy for the programmer to understand, and which let's them go back to U for one-off stuff without having to create new types all the time.]

Monday, September 28, 2015

Multitype has gone!

Version 0.0.3 of the outline doc (https://docs.google.com/document/d/1sNnCvYFjjBmtrl7XF7JX_pdMZ_0ydrctjTR0pO0p5Y4/edit?usp=sharing) has been updated to remove Multitype and replace it with Empty weirdness, based on the fact that Empty is at the bottom of the isA hierarchy, so Empty isA X for any X.

  • [] is a List Empty, so can be used where any List X is required;
  • Option.none is an Option Empty, so can be used where any Option X is required.
  • An explicit closure, { ... }, has type Any=>Empty, and so isA X=>Y for any X, Y.
We've drifted a bit towards dynamic typing, but all is now very straightforward.

Friday, September 25, 2015

Mathematics influences Wombat again

In Wombat, types form a lattice (an order lattice) generated by isA declarations. The sup/lub/join is Union, the inf/glb/meet is Intersection. Unions between otherwise unrelated types are new types created as required. The intersection is just the union of types below all the types being intersected. So, for some examples: Integer isA Rational, Integer isA GaussianInteger. Union[Integer Rational] is Rational, but Union[Integer String] is a new type (with few properties, just the ability to extract one or other). And, of course, Intersection[Rational GaussianInteger] is Integer. The intersection of unrelated types is the type Empty, which has no values.

[notational note: list entries are separated by spaces. The parameter of Union and of Intersection is a set of types, but can be given as a list as I have done here (because, in a forgetful way, List X convertsTo Set X). The / operator invoked below takes 2 Integers and returns a Rational.]

I've been following along, in a desultory way, with a Mathematics discussion (https://golem.ph.utexas.edu/category/2015/09/the_free_modular_lattice_on_3.html) where I learnt of the difference between a lattice (which has sups and infs for finite sets of members of the lattice) and a bounded lattice which also has sups and infs for an empty set of members. Things usually work better when you include these degenerate cases, so I thought: I wonder if I can make Wombat's type hierarchy into a bounded lattice.

It doesn't take much imagination to see that Union[] is going to be Empty. And then it is natural, looking at everything upside down, to see that Intersection[] is the type Any which is the Union of all types.

Now consider the list [7 3/2 4/5]. What is this a list of? It is naturally a list of the Union of the types of its elements. Since those types are Integer and Rational, the result is a List Rational (as any ordinary mortal would expect).

So what is the type of the empty list []? The set of types of the members is the empty set. The Union is thus type Empty. So [] is a List Empty. But we've seen that Empty is below any other type X, so Empty isA X. So a List Empty isA List X for any X. Which is exactly what you need to allow the empty list to be used in the situations where one might want to use it.

Monday, September 21, 2015

Syntax is a minor matter

I just learnt of the existence of Marpa: https://jeffreykegler.github.io/Marpa-web-site/. Basically this generates a BNF parser that is simple and efficient. I'm not sure how easily Wombat's user-defined syntax system can be implemented with Marpa, but the key point is this:

Having syntax built in to a language definition is going to alienate people. It isn't necessary. A core aim of Wombat is that different folk can use different syntax systems, including rolling their own, and still use each others libraries without difficulty. Of course Wombat will have default syntax for the standard libraries, based on the prejudices of the core developers, but even that can be easily changed. You prefer to separate your list elements with commas instead of spaces, or you like your "if" expressions to end with "fi": easy to do. Wombat may provide standard WCTL (Wombat Compile Time Language) functions to allow programmers to specify common styles.

What is important is the structure of the Type system. This is where I think Wombat is different (though the relevant literature is vast, so I could well be wrong). If a Rational has a denominator of 1 (or a Gaussian Integer has an imaginary part of 0), then it is an Integer, and not distinguishable from an Integer that arose in a more obviously Integer way. If two types have a non-Empty intersection (as Gaussian Integers and Rationals do) then you are not allowed to define a lossy conversion from one to the other: Conversion can only happen losslessly by going down to the intersection and then up (and it may then, naturally, fail).

I trained as a Mathematician, and I think this is the way most Mathematicians operate most of the time. Sometimes they will show a bit of guilt by writing "by abuse of notation", but I think they really feel it is ok. By contrast Mathematicians involved in the foundations of Mathematics don't allow this sort of thing at all. I suspect this is a big part of the disconnect between the two camps. Programming is in a funny state, being a mixture of very hacky stuff on one side, plus stuff that has come over from work on the Foundations of Mathematics. I think that Wombat, by contrast, has the right balance, corresponding to the way most people think.

Wednesday, September 2, 2015

On Infinite Things

Infinite things don't really exist. Consider the simplest example: the natural numbers 0,1,2,3,... We have strong intuitions about this set and we use that intuition in applying mathematics to the real world. Yet everything about Nat, the natural numbers, follows logically from the following simple, and finite, set of rules:
  • Zero and Succ are primitive entities, completely defined by the following:
  • Zero belongs to Nat;
  • Succ maps from Nat to Nat. If N belongs to Nat, then Succ(N) belongs to Nat.
The recurrence relation is real but not easy to think about. The infinite set of natural numbers is something we are more comfortable with.
Consider the test program in the recently released first draft of Wombat compiler code:
   `fact1 _n = if (_n:Int>=?0)==0 then {1} else {_n*fact1(_n-1)};
   `fact2 = { `n = $:Int>=?0; if n==0 then {1} else {n*fact2(n-1)}};
   `in = getInt();
   putInt( fact1 in = fact2 in)
To understand the first line, consider this very finite situation. We have 2 expressions within a closure:
  1. `not true = false
  2. `not false = true
The first thing to note is that it is OK to declare the same name more than once in a closure as long as the names are compatible. Being equal is the simplest form of compatibility. The other acceptable form of compatibility is by combining procedures, and that is what applies here. For the gory details see http://wombatlang.blogspot.com.au/2015/05/combining-procedures-in-wombat.html. But basically they have to agree (or be combined!) where they overlap, and where one fails (harmlessly = Fail.next) and the other doesn't, the combination takes that value. The declared name can't be used until all declarations are complete.
Getting back to that first line above:
   `fact1 _n = if (_n:Int>=?0)==0 then {1} else {_n*fact1(_n-1)}
The first thing to notice is that this expression is the left parameter of a semicolon(;) operator, so the value is discarded, and only the side affect (defining fact1) is relevant. For the record the value would be a Type.
The expression involves a free variable (_n). That means this is an application of a Wombat macro. Since no macro is specified, the default applies, which is "forall". So this is really:
   forall _n (`fact1 _n = if (_n:Int>=?0)==0 then {1} else {_n*fact1(_n-1)})
This is the same as our definition of "not" above, except that there are an infinite number of procedures to be combined (all possibilities for which the expression doesn't fail):
   `fact1 0 = if (0:Int>=?0)==0 then {1} else {0*fact1(0-1)}
   `fact1 1 = if (1:Int>=?0)==0 then {1} else {1*fact1(1-1)}
   `fact1 2 = if (2:Int>=?0)==0 then {1} else {2*fact1(2-1)}
   ...
Except that it is not to be regarded as being in any order.
Note that we can refer to fact1 inside the definition because that is inside a closure. It is technically a forward reference which is filled in, completing the construction of the closure, when the definition completes. The closure can not be called until all forward references are filled in.

[update 2016-09-28: This is no longer quite correct in the new wombat.]

Saturday, August 29, 2015

Wombat changes

Here are some changes to Wombat:

  1. The type that was Void is now Unit. Its only value, that was empty, is now unit. This is cutting a link to Algol68 usage, but the new naming is a modern standard.
  2. I plan to allow operators to have prefix and left-parameter forms. This will allow the "-" in 5-3 to be different from the "-" in -3. What happens to the right can then be different for each. So prefix "-" can have a higher right priority to infix "-". This will also allow the Python3 "if" style. It will allow the whole APL/J monadic/dyadic suite for those that like that sort of thing. [Previously Wombat could handle -3 because it became unit-3 which could be handled differently via polymorphism.]
  3. The type Muteable X [which used to be Var X before that] becomes Assignable X, following the lead of Robert Harper.
  4. Wombat will make more use of 1-tuples. The 0-tuple type is exactly Unit. The comma operator generates n-tuples for n>1. There is no syntax for 1-tuples, just procedures to go to and fro. Previously I vaguely thought that X and Tuple[X] were the same, which caused various problems. Now all is sweetness and light, and I can, for example say that Option X is just Union[Unit,Tuple[X]], and even if X=Unit, that still works because unit!=to1tuple(unit).
  5. In addition to operators for tight and loose breaks, add breaks representing line breaks with various sorts of indentation change.
[update 2016-09-28: point 4 is rubbish.]

Super Simple Syntax

Super Simple Syntax

Wombat likes to take minimalism to extremes. The only built in types are procedure types and tuples (Which includes Unit, the 0-tuple). There is no builtin syntax. All syntax is defined with the syntax creation scheme called Super Simple Syntax (SSS) which is available to library writers and even end-users. If the programmer doesn’t like the syntax she can make big or small changes, without affecting interoperability with other modules.
Some programming languages have user defined operators with just one precedence number that is the same on both sides. This means that you also have to specify left or right associativity. In Wombat the left and right can have different precedence. The operator associates to the left if the right precedence is higher.
Wombat allows operators with no left operand, or no right operand, or both. Indeed an identifier is just an operator, with no left or right, from a syntactic viewpoint. Operators can have following sub-operators, such as then and else for the if operator. Sub-operators can repeat or be optional, and can have a nested structure. There can be repeating groups, such as elif-then pairs in an if operator.
Operators can have a left parameter in which case they have a left priority. They can have a right parameter in which case they have a right priority. Technically the right priority belongs to trailing sub-operators such as else, but this is commonly the operator itself when it doesn’t have other sub-operators. (The operator counts as a sub-operator of itself.)
An operator can have a sub-operator (not itself) which is also in use as an operator. It only takes its sub-operator meaning where it is expected.
Priorities form a partial order. If priorities are not comparable then they can’t do battle for an expression between them. This stops different libraries from getting in each other’s way. The partial order includes as a subset the positive decimal numbers with a finite number of digits (Dewey decimal style), which the ordinary programmer might prefer to use. Additional priorities are defined by names together with enough comparisons between each other and (if desired) the numerical priorities. The partial order is the resulting transitive closure.
Most operators just map to a procedure. The parameters are combined into a tuple in the same order that they occur. Repeated parameters map to an n-tuple where n is the number of repeats. Optional parameters map to a 0-tuple or a 1-tuple. The procedure has to be appropriately polymorphic.
Every (sub)expression starts with an operator with no left and ends with an operator with no right. Every operator with no left is the start of one or more expressions, and every operator with no right is the end of one or more expressions. Every operator with no left will be preceded by a suboperator with a right which will swallow one of the expressions it starts. Every operator with no right is followed by an operator with a left to swallow one of the expressions it ends.
When an expression is required after a (sub)operator, and the following operator has no left, then the 0-tuple (unit:Unit) is inserted automatically. This means that unit can be written (), but also that it can be omitted almost anywhere that it might appear. [Note that '(' is an operator in Wombat, with ')' as a suboperator.]
When an operator with no right (such as an identifier or a parenthesized expression) is followed by an operator with no left, then a space-break token is inserted. In standard Wombat this is left associative procedure call. Actually if there is no actual white space (the break is created by the lexical analysis) then it is a different operator: tight-break. This also maps to procedure call. Tight-break is always an operator: it can't be used as a suboperator. Space-break can be used as a suboperator, and is so used in the list constructor operator (e.g. [12 34 56]).

Some initial code for SSS has been released. See the preceding post and https://github.com/rks987/wombatlang.