Encoding C Strings in Whiley

In this post, we’re going to consider representing the classic C string in Whiley. This turns out to be useful as we can then try to verify properties about functions which operate on C strings (e.g. strlen(), strcpy(), etc). If you’re not familiar with C strings, then the main points are:

  1. Roughly speaking, C Strings are arrays of 8bit ASCII characters.
  2. C Strings are terminated by the special character '\0' (also called null terminated strings).
  3. C Strings do not carry any other length information (e.g. for the size of the containing memory chunk).

The interesting thing about Whiley is that we can encode these constraints within the language itself using only the built-in list and integer types. Specifically, we’re going to encode a C string as a constrained list of constrained integers. The list will be constrained to ensure it is null terminated, whilst the contained integers will be constrained to ensure they are between 0 and 255.

Before giving our definition of a C string in Whiley, we need to first define the notion of an 8bit ASCII character. This is done as follows:

// Define the ASCII 8bit character
type ASCII_char is (int n) where 0 <= n && n <= 255

Here, we have defined an 8bit ASCII character to be an integer which is constrained between 0 and 255 (i.e. to ensure it fits in 8 bits). Using this, we can now define our notion of a C string as follows.

// Define the null terminator
constant NULL is 0

type C_string is (ASCII_char[] str)
// Must have at least one character (i.e. null terminator)
where |str| > 0 && some { i in 0 .. |str| | str[i] == NULL }

Here, we have then defined a C_String to be a list of integers which is constrained to ensure it is always null terminated. That is, there is always at least one NULL terminator somewhere in the array (though there may be several).

As an example to illustrate the use of our C_string, we provide an implementation of the well-known strlen() function below:

// Determine the length of a C string.
function strlen(C_string str) -> (int r)
ensures str[r] == NULL
ensures all { k in 0 .. r | str[k] != NULL }:
  int i = 0
  while str[i] != NULL
  where i >= 0 && i < |str|
  where all { k in 0 .. i | str[k] != NULL}:
    i = i + 1
  return i

The Whiley compiler statically verifies this function does not overrun the string bounds. The loop invariant given by the where clause is needed as a hint to the verifier, but does not affect the function’s execution in any way.

Finally, you can see the complete implementation and check it verifies for yourself here on Whiley Play.

3 comments to Encoding C Strings in Whiley

  • Frederick J Blalalala

    Not NULL terminated but NUL terminated. NUL being the name of the first item in the ASCII chart.

    NULL typically refers to pointers and NUL to the zero’th ASCII character.

  • Frederick J Blalalala

    OK, wipe the comment and remain retarded. Typical Java viewpoint.

  • Hey Frederick,

    Sorry it got marked as spam. Not sure how!! I dug it out now.

    And, fair point btw!!


Leave a Reply




You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>