Indentation Syntax in Whiley

Like Python, Whiley uses indentation syntax instead of curly braces for delimiting blocks. When I started using indentation syntax with Python, I was pretty skeptical, but it grew on me fast and now I really like it. However, this wasn’t the only reason I chose to use indentation syntax in Whiley. The other is that we’d have too many braces otherwise. The thing is, sets and set comprehensions are very important in Whiley, and their syntax uses curly braces (e.g. {1,2,3} and { x+1 | x in xs, x > 0 }). So, indentation syntax is one way to reduce the number of curly (or other) braces.

One of the challenges with indentation syntax is the treatment of whitespace. In traditional languages, characters including newlines, tabs and spaces are dropped by the lexer. With indentation syntax, this is not possible as newlines and tabs form part of the syntax. By itself, this is straightforward to handle. The main issue arises when we want to wrap lines. For example, consider the following:

int f(int x):
    x = x +
    1
    return x

The question is whether or not this is syntactically correct. More importantly, if we decide it is, then how does the parser know when to ignore newlines and tabs?

My answer to this is surprisingly simple. When parsing an incomplete expression, tabs, newlines, spaces (and other forms of whitespace) are ignored. Only once an expression is complete are they again used for determining indentation. Thus, the above example is syntactically correct in Whiley. However, the following is not:

int f(int x):
    x = x
    + 1
    return x

This is not syntactically correct because x = x is considered a complete statement and, thus, the parser is expecting a new statement when it reaches + 1.
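To make the rule concrete, here is a rough sketch in Python of how physical lines might be grouped into statements. It is not the Whiley parser: the names `is_incomplete` and `logical_lines` are hypothetical, and "incomplete" is approximated as "ends in a binary operator or has an unclosed bracket", using an assumed operator set. It also ignores string literals and comments.

```python
# Hypothetical sketch, not the actual Whiley implementation.
# Assumed operator set; Whiley's real set of binary operators is larger.
BINARY_OPS = ("+", "-", "*", "/", "=", ",")
OPENERS, CLOSERS = "([{", ")]}"

def is_incomplete(text):
    """Return True if `text` cannot yet be a complete statement."""
    depth = 0
    for ch in text:
        if ch in OPENERS:
            depth += 1
        elif ch in CLOSERS:
            depth -= 1
    return depth > 0 or text.rstrip().endswith(BINARY_OPS)

def logical_lines(source):
    """Yield (indent, text) pairs; indentation counts only at statement starts."""
    pending = None  # (indent, text) of the statement currently being read
    for raw in source.splitlines():
        if not raw.strip():
            continue  # blank lines carry no information
        stripped = raw.strip()
        if pending is None:
            # Start of a statement: leading whitespace is significant.
            indent = len(raw) - len(raw.lstrip())
            pending = (indent, stripped)
        else:
            # Mid-expression: leading whitespace and the newline are ignored.
            pending = (pending[0], pending[1] + " " + stripped)
        if not is_incomplete(pending[1]):
            yield pending
            pending = None
    if pending is not None:
        raise SyntaxError("unexpected end of input in expression")
```

Under these assumptions, the first example yields the single statement `x = x + 1` at indentation 4, while the second yields `x = x` followed by a separate statement `+ 1`, which a parser would then reject.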

This approach is similar, but not identical, to the way Python handles line wraps (the Python language reference covers the specifics under explicit and implicit line joining). Python distinguishes implicit line wraps from explicit ones. An explicit line wrap is denoted using the \ symbol. For example, the following is valid Python:

def f(x):
    x = x \
    + 1
    return x

An implicit line wrap is one which is permitted without using the \ symbol.  In Python, expressions in parentheses, square brackets or curly braces can be split over multiple lines without using an explicit line wrap.  In Whiley, I have essentially just taken this a bit further to include any incomplete expression, not just those involving e.g. curly braces.
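The difference on the Python side can be checked directly with the built-in `compile`. The sketch below is only an illustration; the helper name `parses` and the three sample strings are made up for this purpose. It tests an explicit wrap, an implicit wrap inside parentheses, and a bare wrap after a trailing `+`, which is the form Whiley accepts.

```python
def parses(src):
    """Return True if Python accepts `src` as a module, False on SyntaxError."""
    try:
        compile(src, "<example>", "exec")
        return True
    except SyntaxError:
        return False

# Explicit wrap using a backslash.
explicit = "def f(x):\n    x = x \\\n    + 1\n    return x\n"
# Implicit wrap: the expression is inside parentheses.
implicit = "def f(x):\n    x = (x +\n    1)\n    return x\n"
# Bare wrap after a trailing operator: fine in Whiley, rejected by Python.
bare = "def f(x):\n    x = x +\n    1\n    return x\n"

print(parses(explicit), parses(implicit), parses(bare))
```

Python accepts the first two and rejects the third, which is exactly the case where Whiley's rule goes further.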

I suppose the real question is whether or not Whiley should also support an explicit line wrap operator.  For now, I’ll just defer this decision as it’s not mission critical …

2 comments to Indentation Syntax in Whiley

  • Adrian Quark

    Have you considered doing something like JavaScript? In JavaScript, if the next token after a newline is a valid continuation of the expression, the expression is continued. This gives you a bit more freedom in where you can break lines without requiring an explicit line wrap operator.

    Unfortunately this sometimes leads to unintuitive results, for example this is parsed as a function application:

    f
    ("hi")

    When I wrote the parser for Orc (http://orc.csres.utexas.edu/) I followed JavaScript’s example but made some parts of the grammar sensitive to newlines to avoid confusion. For example, “(” is used for both function application and tuples, but at the start of a line it can only be used for tuples. For Orc this worked out very well, but I don’t know enough about Whiley to tell if a similar approach would work.

  • So, I have been wondering about that. I guess I need to think through carefully whether or not this can lead to problems. Certainly, it makes parsing slightly harder, although that’s not a big deal.

    I think the main reason I would not do this is simply to enforce a more consistent style for Whiley programs. Thinking about it from a human language perspective, I think requiring a token that indicates there is still something to come makes sense. E.g. “x=y+\n 1” is easier for me to read than “x=y\n +1”, as my brain immediately recognises “x=y” as a complete statement. Hmmmm, so many decisions!
