Monday, October 31, 2016

Rationale of the Wombat Type System

I wrote a submission for "Off the Beaten Track" (http://popl17.sigplan.org/track/OBT-2017#About). Then I noticed that it has to be 2 pages, so back to the drawing board. Anyway its at
Rationale of the Wombat Type System
The features of the type system that will be covered here are:
  • Support embedding (lossless type conversion) and distinguish it from lossy conversions.
  • Characterise types by their properties, allowing multiple implementations with differing performance characteristics.
  • Allow incomplete implementations (such as Int64 for Int).

Wombat Introduction

Before getting to the subject of the talk, some aspects of Wombat need to be introduced. Yes they could be done differently, but I find that they fit comfortably with the type system features.
The following expressions are key:
  • expression:type returns a value of the given type, failing if that is not possible;
  • expression1 = expression2 is a unification, filling in holes on either side to make the equality true in a unique way, or failing if that is not possible.
In both cases the failure is informational not fatal, and is commonly used for flow control.
Blocks of code are just closures with Unit input. So it was desirable to have a syntax for closures that was terse when the parameter was not mentioned. A closure is just an expression in braces (such a {1+2}). The input parameter is $, and good practice is to start the closure with an unpacking unification and never mention $ again:
{ $=Complex(`re,`im); Complex(re,-im)}
Identifiers are preceded with a backquote when introduced for the first time. Operations that create structures (such as the Complex constructor here) can take holes as parameters, which are quickly resolved in this case.
There is a caseP procedure. The case operator is identical apart from a more familiar syntax. The (curried) parameters are a list of procedures, and a parameter. The parameter is passed to all the procedures, conceptually in parallel, and it is expected that all but one will fail, and the result of the one that succeeds is returned. Effectively the guards of the case statement are moved inside the alternative procedures where they just become normal statements. However more than one of the procedures can succeed if they return the same answer.
`abs = {$=`x; caseP [
                { x >=? 0; x } # ? turns bool operator to unit/fail
                { x <=? 0; -x } ] () ;
If the returned results are not Distinguishable then they are combined if they conformTo Combinable. In particular procedures are combined by using the curried caseP procedure with the list of returned procedures as the parameter. This defers the equality check and also allows the operation to be spread over multiple procedures. Procedure combining in this way is a recurring requirement.
Atom constants start with a dot (.). All values can act as procedures. So s.length is just calling s with Atom parameter .length. If s:String then this is (String.asProc s).length.

Type Embedding

In some languages we have to say “I’m sorry, you’re a Dog, you can’t also be a Mammal. However I have this irreversible conversion you might consider.” or “I’m sorry, you’re an Int, you can’t also be a Rational.” In Wombat we allow embedding with the isA declaration:
Int isA Rational by { Rational($:Int,1) } inverse
                   { $=`r:Rational; r.denominator=1; r.numerator}
The inverse operator creates an inverse pair of procedures. The first always succeeds (converting the input type up), while the second may fail. Explicitly:
(F inverse G):InversePair X Y
means that F:X=>Y is total, G(F(x:X))=x always, and F(G(y:Y))=y whenever G:Y=>X succeeds.
As we go up the hierarchy we expect to lose properties and not gain any. So the above isA declaration has the effect that Int acquires all the properties of Rational, such as the .denominator procedure. This means that no explicit conversion is necessary to convert upwards: just use the properties of that higher type as if the value was of that type, because it is. As we go up we also need more data such as the denominator for a Rational, or dog-specific data for a Mammal.
Downward and sideways conversion only happens in the presence of an explicit colon operator conversion or converting a parameter to the domain type of a procedure.
For an example of sideways conversion, suppose we have types Mammal and Quadruped, with some obvious lower types. Then someMammal:Quadruped will convert to Quadruped type and fail if someMammal is one of those pesky bipedal mammals. The rule is that we first convert down to the intersection of the origin and destination types, then up to the destination.
There is also lossy conversion established with the convertsTo operator. This only comes into play when the source and destination have Empty intersection in the isA hierarchy, in which case it acts the same as a downwards conversion. For example, the type of variable X is Assignable(X), which convertsTo X. So if we give an Assignable(Int) where a Rational is required then we convert down to Int then up to Rational. There is, naturally, no route either way between Assignable(Int) and Assignable(Rational).
Whether going up or down there is the potential for diamond ambiguity. For example suppose Int isA Rational, Int isA ComplexInt, Rational isA ComplexRational, ComplexInt isA ComplexRational. There are two ways to convert from Int to ComplexRational. This is solved by combining the procedures using caseP.

It's Types all the way down

Wombat has Behaviours (spelt with or without the 'u’), which are much the same as Haskell type classes. A type is defined by its compliance with a particular Behaviour. A type can also be declared to conform to other Behaviours, and can do so in more than one way (controlled by a 'how’ parameter which is often specifiable as an optional operator subscript).
We might declare Nat by having a zero, successor and ordering with appropriate laws. That doesn't generate any implementation. Instead every non-primitive type has to have one or more implementations using some other type. Nat might reasonably be implemented by BigUInt, with Nat’s operations implemented using BigUInt’s operations, and its laws proved using BigUInt’s laws.
Ultimately this has to lead to primitive types. These have a Behaviour with operations and laws determined by the specification of the compiler target, and they are the only types that exist at run time.
Yes we can:
  • fully define Nat with a few simple operations;
  • code all other operations, such as add, using those simple operations;
  • implement those simple operations in BigUInt.
But that is going to be a very inefficient use of BigUInt. The rule is that a use of a type is not allowed to peek inside at the implementation. It is only allowed to use the type’s defining Behaviour. The exception is “addSpecialization” which can add a specialization to any procedure, replacing a value of a type with some implementation of that type. The thing that makes that safe is that the programmer is required to provide a proof that it is an accurate specialization. Unfortunately the details of that are not worked out, and the only proof in the initial design is TBS, which always works. Here is some out of context code:
`np=Nat.property;
  Nat conformsTo Monoid name .additive by # Monoid extends Distinguishable
      [
          .zero -> np.zero # monoid .zero goes to Nat .zero
          .binop -> `add={$=`x,`y;
                          `up,`down=np.succ; # .succ is an inverse pair
                          caseP [{y=np.zero; x} # note: down fails for zero
                                 {add(up x, down y)}] ()
                         } # need to add implementation specializations!!
      ];
  Nat.behaviour(Monoid).additive.binop addSpecialization BigUInt.add
      covering (Nat as .bigUInt)*(Nat as .bigUInt)=>(Nat as .bigUInt)
      p
roof TBS;
Implementations are interlinked by inverse pairs of total procedures. All implementations except one have to link to one of the others. But we also allow partial implementations which must link up to a wider implementation (usually to a full one) with an InversePair of procedures that can fail going back down. For example Int64 is an important partial implementation of Int for some target hardware. However operations producing Int64 must detect failure, leading to rerunning the failed operation to produce a BigInt and to following a different continuation of the code from that point.

Every subtype automatically becomes a partial implementation of the higher type. So Int is a partial implementation of Rational and we can add a specialization for, say, .denominator that always returns 1.