### Articles

• No categories

# One Approach to Efficient Structural Subtyping

An interesting challenge with [[Structural type system|structural subtyping]] is that of efficient implementation.  In particular, without care, it may be impossible to determine a static offset for each field in a structure at runtime, meaning every field access will require a dictionary lookup.  In this post, I’ll review the problem and  outline one alternative approach.

## The Problem

To properly understand the problem faced implementing a structural type system, we must consider things at a low level.  In languages like C or Java, it is possible to determine a static offset for all fields at runtime.  Consider this simple C example:

```typedef struct { int day; int month; int year; } date;

int getYear(date *ptr) {
return ptr->year
}
```

The C compiler will automatically determine a fixed layout for the `struct` named `date` (which may or may not including padding). For simplicity, let’s assume there’s no padding and an `int` corresponds to a [[Two’s complement|two’s-complement]] 32 bit integer. Then, the layout will look something like this:

Here, the offsets are measured in bytes and, thus, we know that the `year` field starts at an offset of 8 bytes.  This means that, in function `getYear()` above, the compiler can implement the field access directly by loading the `int` stored in memory at an offset of 8 bytes from `ptr`.

In the presence of structural subtyping, things are a little more complicated.  Consider a similar example in a psuedo-version of C which supports structural subtyping:

```typedef struct { int day; int month; int year; } date;
typedef struct { int year; } hasYear;

int getYear(hasYear *ptr) {
return ptr->year
}
```

This is very similar to the original version, except that `getYear()` has been refined to accept a pointer to any structure which has a field `year`. For example, it will accept a pointer to an instance of `date`, as well as a pointer to an instance of `hasYear`.

The problem is that the compiler is now unable to determine a static offset for the field `year` within function `getYear()`. This is because field `year` is at offset 8 in a `date` instance, and at offset 0 in a `hasYear` instance — and, indeed, it could be at any offset (depending on the available structural subtypes of `hasYear`). Therefore, the compiler is forced to resort to an altnerative strategy, such as using a dictionary lookup (i.e. where the `struct` is actually implemented as a dictionary keyed on field name).

## One Solution

There are many different approaches to generating efficient code for languages with structural type systems, and I’m going to present just one here which I haven’t seen considered before.  In fact, it’s really rather simple — we’re going to distinguish between exact subtypes and variable subtypes (note: in type system terminology, this corresponds to the difference between depth and width subtyping).  An exact subtype must have exactly the same number of fields with the same names, whilst a variable subtype may have more fields than required.  And, we’ll provide a distinct notation, `...`, to distinguish them. For example:

```typedef struct { int day; int month; int year; } date;
typedef struct { int year; } hasYearExact;
typedef struct { ...; int year; ... } hasYearVariable;

int getYear(hasYearExact *ptr) {
return ptr->year
}

int getYear(hasYearVariable *ptr) {
return ptr->year
}
```

In this example, an instance of `date` is not a subtype of `hasYearExact` because it does not have exactly the same number of fields with the same names. In contrast, an instance of `date` is a subtype of `hasYearVariable`, because this uses the `...` notation to indicate that subtypes are may have more fields.

So, what’s the advantage here? Well, essentially, the point is that with exact types you can always determine a static offset of fields, whilst with variable types you cannot.  This means that, in cases where performance is important, you can elect to use an exact type instead of a variable type to ensure that field accesses compile down to direct memory accesses.  This isn’t rocket science, but it does give an interesting and alternative approach to the problem!

### 5 comments to One Approach to Efficient Structural Subtyping

• The problem with that solution is that it pushes the problem back onto the programmer. Is it’s hard to see how “data” vs “hasYearExact” really is structural sub-typing.

But I’d have through something like trace compilation, or type-based inlining (based on concrete types of course) could finesse this problem easily – and without requiring programmer level annotations.

• Yeah, I think that, in the context of a JIT, trace-based compilation can mostly resolve the issue. But, that’s not the only approach to compilation I’m considering with Whiley.

And, I actually disagree with your comment regarding pushing it back onto the programmer. From my perspective, that’s a good thing!! Giving the programmer the ability to make simple, well-informed choices is exactly what a good language is about …

• Daniel Yokomizo

Daan Leijen has a great solution for this in one of his rows papers:
http://research.microsoft.com/en-us/um/people/daan/pubs.html

IIRC it’s in “First-class labels for extensible rows”.

• Hi Daniel,

Thanks for the link! If I understand correctly, what they do is provide an array of offsets with the record, and these offsets are used to determine the actual field offsets. So, when passing a struct { int x; int y; int z } into a function requiring a struct { int y; int z; … }, an array with two entries (for y and z) is created. This is certainly better than using a dictionary lookup — thanks! i should have a look at the referenced paper by Gaster and Jones as well.

• […] forced to employ some kind of dictionary (i.e. hash table) lookup on every field access.   See my earlier post for more on this problem.  Note, this problem is similar, for example, to that of implementing […]