One Approach to Efficient Structural Subtyping

An interesting challenge with [[Structural type system|structural subtyping]] is that of efficient implementation.  In particular, without care, it may be impossible to determine a static offset for each field in a structure at runtime, meaning every field access will require a dictionary lookup.  In this post, I’ll review the problem and  outline one alternative approach.

The Problem

To properly understand the problem faced implementing a structural type system, we must consider things at a low level.  In languages like C or Java, it is possible to determine a static offset for all fields at runtime.  Consider this simple C example:

typedef struct { int day; int month; int year; } date;

int getYear(date *ptr) {
 return ptr->year

The C compiler will automatically determine a fixed layout for the struct named date (which may or may not including padding). For simplicity, let’s assume there’s no padding and an int corresponds to a [[Two’s complement|two’s-complement]] 32 bit integer. Then, the layout will look something like this:

Data Layout Example

Here, the offsets are measured in bytes and, thus, we know that the year field starts at an offset of 8 bytes.  This means that, in function getYear() above, the compiler can implement the field access directly by loading the int stored in memory at an offset of 8 bytes from ptr.

In the presence of structural subtyping, things are a little more complicated.  Consider a similar example in a psuedo-version of C which supports structural subtyping:

typedef struct { int day; int month; int year; } date;
typedef struct { int year; } hasYear;

int getYear(hasYear *ptr) {
 return ptr->year

This is very similar to the original version, except that getYear() has been refined to accept a pointer to any structure which has a field year. For example, it will accept a pointer to an instance of date, as well as a pointer to an instance of hasYear.

The problem is that the compiler is now unable to determine a static offset for the field year within function getYear(). This is because field year is at offset 8 in a date instance, and at offset 0 in a hasYear instance — and, indeed, it could be at any offset (depending on the available structural subtypes of hasYear). Therefore, the compiler is forced to resort to an altnerative strategy, such as using a dictionary lookup (i.e. where the struct is actually implemented as a dictionary keyed on field name).

One Solution

There are many different approaches to generating efficient code for languages with structural type systems, and I’m going to present just one here which I haven’t seen considered before.  In fact, it’s really rather simple — we’re going to distinguish between exact subtypes and variable subtypes (note: in type system terminology, this corresponds to the difference between depth and width subtyping).  An exact subtype must have exactly the same number of fields with the same names, whilst a variable subtype may have more fields than required.  And, we’ll provide a distinct notation, ..., to distinguish them. For example:

typedef struct { int day; int month; int year; } date;
typedef struct { int year; } hasYearExact;
typedef struct { ...; int year; ... } hasYearVariable;

int getYear(hasYearExact *ptr) {
 return ptr->year

int getYear(hasYearVariable *ptr) {
 return ptr->year

In this example, an instance of date is not a subtype of hasYearExact because it does not have exactly the same number of fields with the same names. In contrast, an instance of date is a subtype of hasYearVariable, because this uses the ... notation to indicate that subtypes are may have more fields.

So, what’s the advantage here? Well, essentially, the point is that with exact types you can always determine a static offset of fields, whilst with variable types you cannot.  This means that, in cases where performance is important, you can elect to use an exact type instead of a variable type to ensure that field accesses compile down to direct memory accesses.  This isn’t rocket science, but it does give an interesting and alternative approach to the problem!

5 comments to One Approach to Efficient Structural Subtyping

  • The problem with that solution is that it pushes the problem back onto the programmer. Is it’s hard to see how “data” vs “hasYearExact” really is structural sub-typing.

    But I’d have through something like trace compilation, or type-based inlining (based on concrete types of course) could finesse this problem easily – and without requiring programmer level annotations.

  • Yeah, I think that, in the context of a JIT, trace-based compilation can mostly resolve the issue. But, that’s not the only approach to compilation I’m considering with Whiley.

    And, I actually disagree with your comment regarding pushing it back onto the programmer. From my perspective, that’s a good thing!! Giving the programmer the ability to make simple, well-informed choices is exactly what a good language is about …

  • Daniel Yokomizo

    Daan Leijen has a great solution for this in one of his rows papers:

    IIRC it’s in “First-class labels for extensible rows”.

  • Hi Daniel,

    Thanks for the link! If I understand correctly, what they do is provide an array of offsets with the record, and these offsets are used to determine the actual field offsets. So, when passing a struct { int x; int y; int z } into a function requiring a struct { int y; int z; … }, an array with two entries (for y and z) is created. This is certainly better than using a dictionary lookup — thanks! i should have a look at the referenced paper by Gaster and Jones as well.

  • […] forced to employ some kind of dictionary (i.e. hash table) lookup on every field access.   See my earlier post for more on this problem.  Note, this problem is similar, for example, to that of implementing […]

Leave a Reply




You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>