Disambiguating Ambiguous Syntax?

When designing a programming language, being on the lookout for ambiguous syntax is important. You don’t want to realise down the track that your syntax is ambiguous in some subtle way. This is especially true if it means the compiler can’t decide how to proceed in some important case(s). But spotting these problems is not easy, especially when there are unexpected interactions between constructs.

The Context

In the latest release of Whiley, I have given in to peer pressure and backed off from my earlier syntax for message sends. To recall, this used <-> and <- to signal synchronous and asynchronous message sends respectively. In particular, several people commented on how cumbersome <-> was. Instead, the latest version of Whiley uses plain old . for synchronous sends, and ! for asynchronous sends (i.e. like Erlang). Whilst in many ways this is rather nice, it does leave open an interesting question of ambiguity.

There are essentially two invocation forms the compiler encounters: func() and x.meth(). The former is straightforward as it always corresponds to a function invocation. The latter, however, is more subtle as it has two possible interpretations: a direct message send, or an indirect function invocation via a field dereference.

Here’s an example to illustrate a direct message send:

define MyProc as { int data }

int MyProc::func():

int client(MyProc p):
    return p.func() // direct message send

And, here’s an example to illustrate an indirect function invocation via field dereference:

define MyRec as { int() func }

int client(MyRec p):
    return p.func() // indirect function invocation

The question is: how does the compiler disambiguate this? Well, the (current) rule is simple:

If a matching external symbol exists then it’s a direct message send; otherwise, it’s an indirect function invocation via field dereference.

In other words, priority is given to direct message sends. At this point, alarm bells should be starting to go off. Why? Well, because external symbols may come and go: we have no control over them, and we don’t want changes in external libraries to affect the semantics of our code. Considering the last example, suppose we ended up accidentally importing a matching external symbol:

// following symbol from some imported module
int MyProc::func():

define MyRec as { int() func }

int client(MyRec p):
    return p.func() // indirect function invocation

Well, this code no longer compiles because func is resolved as an external symbol and, hence, the compiler is expecting p to have type MyProc.
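
To make the fragility concrete, here is a minimal sketch of the sends-first rule in Python (not Whiley's actual compiler code). The function name resolve_sends_first and the two dictionaries standing in for the compiler's symbol tables are assumptions made purely for illustration.

```python
def resolve_sends_first(external_methods, record_fields, receiver_type, name):
    """Classify the call `p.name()` where p has type `receiver_type`.

    external_methods: method name -> process type declaring it (hypothetical table)
    record_fields:    record type -> names of its function-typed fields (hypothetical table)
    """
    # Rule: a matching external symbol always wins.
    if name in external_methods:
        expected = external_methods[name]
        if receiver_type != expected:
            raise TypeError(f"{name}() expects a {expected}, but p is a {receiver_type}")
        return "direct message send"
    # Otherwise fall back to a field dereference followed by an invocation.
    if name in record_fields.get(receiver_type, set()):
        return "indirect function invocation"
    raise NameError(f"no method or field named {name}")


# define MyRec as { int() func }
record_fields = {"MyRec": {"func"}}

# Before the import, p.func() is an indirect invocation.
before = resolve_sends_first({}, record_fields, "MyRec", "func")

# After some imported module declares MyProc::func, the same call is rejected.
try:
    after = resolve_sends_first({"func": "MyProc"}, record_fields, "MyRec", "func")
except TypeError as err:
    after = f"compile error: {err}"
```

The client code and the definition of MyRec are identical in both calls; only the table of external symbols differs, and that alone is enough to turn a valid program into a rejected one.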

The Problem

Now, it seems like the above problem just stems from a bad choice of rule. Perhaps an alternative rule would be better? Well, we could give priority to field dereferences. This way, external symbols can’t affect field dereferences. But now field dereferences can affect external symbols! To see why, consider this:

define MyProc as {
 int data,
 int() func
}

int MyProc::func():

int client(MyProc p):
    return p.func() // what is it???

The problem here is that either interpretation makes sense.  We’re either indirectly invoking a function pointer stored in field func of process p, or making a direct message send to p.  Of course, we can choose how to resolve this (i.e. either field dereference or direct message send gets priority).

The problem is, whichever choice we make, it remains the case that a change in an imported module can affect the semantics of our code. To see why, assume MyProc is defined in an imported module and that some change occurs to that module outside of our control. There are two cases:

  1. Dereferences get priority. Suppose MyProc changes from not including a field func to including one (so our client code changes from a direct send to an indirect invoke).
  2. Direct sends get priority. Suppose the method func doesn’t exist initially, but is added later (so our client code changes from an indirect invoke to a direct send).

Neither of these options seems desirable … but what to do?
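
Both cases can be played out in a small Python sketch. The resolve function and its boolean flags are hypothetical stand-ins for whatever the compiler learns from the imported module; the point is only that the reading of the unchanged client code flips under either priority.

```python
SEND = "direct send"
INVOKE = "indirect invoke"


def resolve(priority, has_field, has_method):
    """Classify p.func() given which declarations of func are visible."""
    available = {INVOKE: has_field, SEND: has_method}
    other = SEND if priority == INVOKE else INVOKE
    # Try the prioritised interpretation first, then the remaining one.
    for kind in (priority, other):
        if available[kind]:
            return kind
    raise NameError("func")


# Case 1: dereferences get priority; the module later adds a field func.
case1 = (resolve(INVOKE, has_field=False, has_method=True),
         resolve(INVOKE, has_field=True, has_method=True))

# Case 2: direct sends get priority; the module later adds a method func.
case2 = (resolve(SEND, has_field=True, has_method=False),
         resolve(SEND, has_field=True, has_method=True))
```

Each tuple holds the interpretation before and after the module change, and in both cases the two entries differ.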

The Solution?

Basically, at this stage, I don’t have a solution … so suggestions welcome!

One option is to make a clear distinction between dereferencing a field in a record and dereferencing one in a process. For example, we could view processes as “pointers” to records, and then use C/C++ syntax such as p->func to indicate we’re indirectly accessing field func in process p.

What I don’t like about this approach is the discrepancy between message send and process access syntax. For example, something like sys.out.println() becomes sys->out.println() (this is a direct message send to the process referenced by field out in the process referenced by sys).
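
A minimal Python sketch of one possible reading of this option follows. The table and the interpret function are assumptions for illustration only: on a process, -> is taken to always mean a field dereference and . a message send, while on a record . remains a field dereference.

```python
# (receiver kind, operator) -> interpretation, under the assumed reading.
INTERPRETATION = {
    ("record", "."): "field dereference",
    ("process", "->"): "field dereference",
    ("process", "."): "message send",
}


def interpret(receiver_kind, operator):
    """Classify one access step without consulting any symbol table."""
    try:
        return INTERPRETATION[(receiver_kind, operator)]
    except KeyError:
        raise SyntaxError(f"{operator} is not allowed on a {receiver_kind}") from None


# sys->out.println(): sys and out are both assumed to be processes.
steps = [interpret("process", "->"), interpret("process", ".")]
```

In this sketch interpret receives no information about which fields or methods exist, so adding func to an imported module cannot change how an existing call is read. The price is the extra operator.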

5 comments to Disambiguating Ambiguous Syntax?

  • James

    Heh. Solution – there isn’t one. Perhaps you can disambiguate via types, but that will be tricky and nonmodular. Grace doesn’t use () for “function” invocation for precisely this reason.

    Scala gets away with it (mostly) because it always knows – or tries to guess – the type of everything.

    Incidentally, Thorn uses <–> & <– for sync & async message sends.

  • Hmmm, interesting, so has this issue arisen in the discussion of Grace?

    Using types is difficult because records and processes can both be dereferenced. But, by having distinct syntax for distinct concepts you can solve it … but you then move further away from e.g. Java syntax.

    Definitely a humdinger this one …

    I’ll have a look at Thorn and maybe canvas a whole bunch of other languages as well …

  • Lia

    I think the easiest way is to use different syntax, and it could appear after the name instead of before it. For example, in Scala a function is an object with a method named “apply”, so obj.func() is a method call and obj.func.apply() is a function invocation. (Although Scala can recognize obj.func() as a short form of the latter.) It is the same in Smalltalk, except that it uses “value:” instead of “apply” and has no short form.

  • […] The syntax for synchronous and asynchronous message sends is still causing headaches.  I’ve summarised the main problems in this post. […]

  • […] Send Syntax. Currently, the syntax for sending messages to actors is not finalised.  This post provides all of the interesting […]
