Articles

Termination of Flow Typing in Whiley

Whiley uses flow typing to give it the look-and-feel of a dynamically typed language (see this page for more on flow typing).  In short, flow typing means that variables can have different types at different program points.  For example:

define Node as { int data, Tree left, Tree right }
define Tree as null | Node

// Insert item into tree
Tree insert(Tree tree, int item):
    if tree is null:
        return {
          data: item,
          left: null,
          right: null
        }
    else if item < tree.data:
        tree.left = insert(tree.left,item)
    else:
        tree.right = insert(tree.right,item)
    // done
    return tree

Here, Whiley’s flow typing system automatically retypes the variable tree to Node on the false branch of the condition tree is null. Thus, we can safely use tree.left and tree.right without having to explicitly cast tree to a Node. In this way, we see that variable tree has different types at different points. In this case, it always has one of the following types: Tree, Node or null (the latter being its type on the true branch of the condition tree is null).

In the above example, the different types of tree are related — i.e. Node and null are both subtypes of Tree. However, Whiley’s flow typing system can also work with unrelated types. A more involved example, based on my Chess benchmark, illustrates:

define LongPos as { int col, int row }
define LongMove as {
  Piece piece,
  LongPos from,
  LongPos to
}

define ShortPos as {int row} | {int col}
define ShortMove as {
  Piece piece,
  ShortPos from,
  LongPos to
}

// Find matching pieces on the board.
[Pos] find(Piece p, Board b):
    ...

// Intersect destination with possible moves of matching pieces.
[Pos] narrow(ShortMove m, [Pos] ms, Board b):
    ...

// Convert move in short notation into long notation.
LongMove convert(ShortMove m, Board b) throws Error:
    matches = find(m.piece,b)
    matches = narrow(m.to,matches,b)
    if |matches| != 1:
        throw Error("invalid move")
    m.from = matches[0]
    return m

This code converts moves expressed in short algebraic notation into long algebraic notation (see this for more). In the short notation, moves are given in an abbreviated (and potentially ambiguous) form with only the destination square given. For example, a move Nf6 indicates a Knight moving to square f6. If the player has only one knight, then the move must refer to it. However, if there are two Knights, the system must determine which it is. This is done by narrow() above, which intersects the destination with the possible moves of the matching pieces. In the case of multiple matches, an Error is thrown. The notation also permits explicit disambiguation by providing either the rank or file of the given piece. For example, Neg3 indicates the Knight on file e moves to square g3.

Flow typing is crucial in type checking the above fragment. The critical issue resolves around the statement m.from = matches[0]. This retypes variable m from ShortMove to LongMove, allowing the subsequent statement return m to be type checked. In this case, there is no subtyping relationship between ShortMove and LongMove — they are essentially unrelated.

A Problem of Termination

We now come to the main topic of this post. Consider the following very simple example:

define Rec as int | { Rec f }

Rec loopy(int x, int y):
  z = { f : x }
  while x < y:
     z.f = z
     x = x + 1
  return z

This program currently causes the Whiley Compiler to enter an infinite loop! But, why is that? Well, the key is that flow typing is implemented in the style of a dataflow analysis. Roughly speaking, the algorithm employs an environment which maps variables to their current type. It then processes each statement updating the environment as it proceeds. The following illustrates the process for the first two statements above:

Rec loopy(int x, int y):
  // x->int, y->int
  z = { f : x }
  // x=>int, y=>int, z=>{int f}
  ...

In many ways, this is the natural way to implement the flow typing algorithm and follows, for example, the approach used in the JVM bytecode verifier (perhaps the most widely used example of flow typing).

So, why does the above short program not terminate? Well, a dataflow analysis handles loops by iterating until a fixed-point is reached. This is done by first propagating through the loop body to update the environment, like so:

  // x=>int, y=>int, z=>{int f}
  while x < y:
      // x=>int, y=>int, z=>{int f}
      z.f = z
      // x=>int, y=>int, z=>{{int f] f}
      x = x + 1
      // x=>int, y=>int, z=>{{int f] f}

At this point, it merges the new environment with that originally going into the loop and repeats the process:

  // x=>int, y=>int, z=>{int f}
  while x < y:
      // x=>int, y=>int, z=>{{int f}|int f}
      z.f = z
      // x=>int, y=>int, z=>{{{int f}|int f} f}
      x = x + 1
      // x=>int, y=>int, z=>{{{int f}|int f} f}

And then repeats again:

  // x=>int, y=>int, z=>{int f}
  while x < y:
      // x=>int, y=>int, z=>{{{int f}|int f} f}
      z.f = z
      // x=>int, y=>int, z=>{{{{int f}|int f} f} f}
      x = x + 1
      // x=>int, y=>int, z=>{{{{int f}|int f} f} f}

And so on, until there is no change in the environment (i.e. the fixed point is reached). For a typical dataflow analysis, this should happen within a few iterations. However, for the above flow typing algorithm, this process will never terminate because it is constructing successively larger types for variable z.

In fact, we can obtain a valid flow-typing for our simple program as the following illustrates:

Rec loopy(int x, int y):
  // x->int, y->int
  z = { f : x }
  // x=>int, y=>int, z=>{int f}
  while x < y:
      // x=>int, y=>int, z=>{Rec f}
      z.f = z
      // x=>int, y=>int, z=>{Rec f}
      x = x + 1
   // x=>int, y=>int, z=>{Rec f}
   return z

The key is that assigning {Rec f} to field f gives the type {{Rec f} f} — which is a subtype of {Rec f}. Hence, the above typing is valid (albeit conservative). The problem is how to go about finding this typing, given that the standard dataflow approach does not succeed.

At this point, I’m not going to say too much more. That’s because I’ve written up a detailed account of the solution in a technical report. The key idea is to use a constraint-based approach, rather than a dataflow approach. Whilst this is slightly more complex, it has one important advantage: the algorithm always terminates!

8 comments to Termination of Flow Typing in Whiley

  • Hey Dave, just a question,

    The critical issue resolves around the statement m.from = matches[0]. This retypes variable m from ShortMove to LongMove…

    How does it know to retype m to LongMove? Since it seems to be a forward analysis, and there are no loops in the function, I’m assuming it can’t use the subsequent return statement to infer the new type.

    Let’s say there is another type X in scope with a field LongPos from. Will it retype m as LongMove | X ?

  • Hi Edmund,

    That’s the beauty of structural typing. The name is only superficial. Think of it as like a C macro which is always expanded. So, LongMove is really just the type {Piece piece,LongPos from,LongPos to} — nothing more. Whenever you see LongMove, the compiler automatically expands it. So, before the assignment, m has type {Piece piece,ShortPos from,LongPos to} and then afterwards its type is the same LongMove’s.

  • That does sound pretty neat, providing a good compromise between flexibility and strictness of types. But I wonder if humans find it easier to give a name to something, and expect the name to officially tag it? It feels like there’s room to accidentally create a value of the “correct” type by inadvertently composing items similar types, without explicitly stating what is being created. For instance:

    define Temp as { int degrees; }
    define Bearing as { int degrees; }

    Temp heatInDirection(Bearing b):
    // TODO Somehow calculate the temperature seen in direction b
    return b // Oops, type system didn’t stop us doing something dumb!

  • Hi Edmund,

    Yeah, that’s a common complaint against structural typing. I may well include some constructs to help with this kind of situation. For example, something like Units in F# would be interesting …

  • I immediately thought of SSA and ϕ nodes for this, and it seems like you have done something along those lines. I look forward to going deeper into this TR; great work! Reading your blog is always a pleasure.

  • Hey Peter,

    Yeah, this is strongly related to SSA form. However, it’s a bit easier for me since Whiley doesn’t have unstructured control-flow.

  • JamesG

    As I mentioned on twitter, I think that including an example of a Units and Measures would be very constructive.

    (I have been an unabashed fan of units and measures in programming for some years. Once when creating a simulation creation platform, I decided to add JScience to provide some clarity and flexibility to end users, allow for automatic conversions between equivalent units, and provide some basic operations on those measurements. Once I had integrated them into my code, they pinpointed an error in the simulation that otherwise would have gone unnoticed for quite some time… and they worked beautifully for all the reasons I had intended to use them as well.)

  • Yeah, I am definitely going to include units in some way. However, I need to think through exactly how it will work. So it’ll take a little time before it filters in …

Leave a Reply

 

 

 

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>