leftpad() proved in Frama-C

Last Friday I found the lets-prove-leftpad repository on GitHub. The idea is to prove leftpad, a function that pads a string with a given character c, up to a given length n.

leftpad is notorious for being its own NPM package. The owner of the leftpad package decided to take it down, taking thousands of projects with it that had it as a dependency. This article on The Register has more details.

There was no example in lets-prove-leftpad using ACSL or Frama-C, so I decided to give that a shot. My work resulted in this pull request, which was merged shortly after. I have quoted both the implementation and the associated readme below.

leftpad.c

#include <string.h>

/*@ requires \valid(s + (0..n));
    requires valid_string(s);
    requires c != 0;

    behavior no_padding:
      assumes strlen(s) >= n;
      assigns \nothing;

    behavior padding:
      assumes strlen(s) < n;
      ensures valid_string(s);
      ensures \let l = strlen{Pre}(s); \let p = n - l;
        (\forall integer x; 0 <= x < p ==> s[x] == c) &&
        memcmp{Pre, Post}(s, s + p, l + 1) == 0 &&
        strlen(s) == n;
      assigns s[0 .. n];

    complete behaviors;
    disjoint behaviors;
 */
void leftpad(char c, size_t n, char *s) {
  size_t l = strlen(s);
  if (n > l) {
    size_t p = n - l;

    // This loop is equivalent to memmove(s+p, s, l+1)
    /*@ loop invariant x_range: 0 <= x <= l;
        loop invariant s_state: \forall integer i; 0 <= i <= n ==>
          (i <= x+p ==> s[i] == \at(s[i], LoopEntry)) &&
          (i >  x+p ==> s[i] == \at(s[i-p], LoopEntry));
        loop assigns x, s[p..n];
        loop variant x;
     */
    for (size_t x = l;; x--) {
      s[x+p] = s[x];
      if (x == 0) {
        break;
      }
    }

    // These assertions are not strictly necessary, but they speed up the verification
    //@ assert same_chars: memcmp{Pre, Here}(s, s+p, l+1) == 0 && s[n] == 0;
    //@ assert valid_str: valid_string(s+p);
    //@ assert same_len: strlen(s+p) == l;

    // This loop is equivalent to memset(s, c, p)
    /*@ loop invariant x_range2: 0 <= x <= p;
        loop invariant still_same: memcmp{Pre, LoopCurrent}(s, s+p, l+1) == 0 && s[n] == 0;
        loop invariant valid_grows: valid_string(s + p - x);
        loop invariant padding_grows: \forall integer i; p - x <= i < p ==> s[i] == c;
        loop assigns x, s[0..p-1];
        loop variant p - x;
     */
    for (size_t x = 0; x < p; x++) {
      s[p-1-x] = c;
    }
  }
}

readme.md

About Frama-C

From the Frama-C website:

Frama-C is an extensible and collaborative platform dedicated to source-code analysis of C software.

In other words, Frama-C is a set of tools for analysing C programs. Most relevant for this example is the Weakest Precondition plugin (-wp), which uses weakest precondition calculus to prove ANSI/ISO C Specification Language (ACSL) annotations in C programs.

ACSL is based on the Java Modeling Language (JML), so the annotations are quite similar to the Java example in this repository.

Debian ships with some rather old versions of Frama-C (14.0 Silicon and 16.0 Sulfur vs 20.0 Calcium), so I would recommend using the latest version which can be installed via opam.

About this proof

This example proves an in-place variant of leftpad implemented using loops. The latest version of Frama-C is used, version 20.0 (Calcium). An example invocation looks like this:

$ frama-c -wp -wp-rte -warn-unsigned-overflow leftpad.c
[kernel] Parsing leftpad.c (with preprocessing)
[rte] annotating function leftpad
[wp] 56 goals scheduled
[wp] [Cache] found:31
[wp] Proved goals:   56 / 56
  Qed:              25  (2ms-26ms-88ms)
  Alt-Ergo 2.3.0:   31  (23ms-306ms-3.8s) (2273) (cached: 31)

The option -wp enables the Weakest Precondition plugin (WP). -wp-rte tells WP to insert checks for run-time exceptions, which is Frama-C parlance for undefined behavior (UB) such as signed overflow or out-of-bounds memory access. -warn-unsigned-overflow inserts checks for unsigned overflow, which are defined in C, but since our loop variables are size_t we want checks that they don't overflow.

The output of the program tells us that all goals were proved. Some of the proof obligations could be trivially discharged by Frama-C's built-in checker Qed. The rest were verified by Alt-Ergo. On my machine (Intel(R) Core(TM) i5-3317U CPU @ 1.70GHz) this takes just over 7.8 seconds to run.

If you have frama-c-gui installed, it can be invoked similarly by just replacing the name of the program:

$ frama-c-gui -wp -wp-rte -warn-unsigned-overflow leftpad.c

The GUI is used to assist the provers in a semi-automatic fashion, and is outside the scope of this example.

Frama-C uses Why3 to interface with the provers, and usually ships with Alt-Ergo as its default prover.

Contract

First of all we'll take a look at the contract of the function:

/*@ requires \valid(s + (0..n));
    requires valid_string(s);
    requires c != 0;

    behavior no_padding:
      assumes strlen(s) >= n;
      assigns \nothing;

    behavior padding:
      assumes strlen(s) < n;
      ensures valid_string(s);
      ensures \let l = strlen{Pre}(s); \let p = n - l;
        \forall integer x; 0 <= x < p ==> s[x] == c &&
        memcmp{Pre, Post}(s, s + p, l + 1) == 0 &&
        strlen(s) == n;
      assigns s[0 .. n];

    complete behaviors;
    disjoint behaviors;
 */
void leftpad(char c, size_t n, char *s) {

The requires clauses are our preconditions. They say that we need s to have space for n+1 characters, that s must be a valid string and that the padding character cannot be NUL. valid_string is a predicate defined internally in Frama-C in the file __fc_string_axiomatic.h:

/*@ predicate valid_string{L}(char *s) =
  @   0 <= strlen(s) && \valid(s+(0..strlen(s)));

strlen in turn has an axiomatic definition in the same file which is too lengthy to quote here. After the preconditions we have two so-called behaviors: no_padding and padding.

no_padding says that if the length of s is n or more then the function assigns nothing, meaning the function has no side-effects. This implicitly means that the length and contents of s remain unchanged.

The padding behavior says that if the length of s is less than n then s will be padded. The two ensures clauses that follow are postconditions that specify what this means. valid_string(s) means that the resulting s will be a valid string, just as in the preconditions. The second clause defines the contents of s. It defines two local names using \let, for readability: one for the length of the input string (l) and another the amount of padding (p). The {Pre} in the strlen predicate specifies that it is the length of s when the function is called, not after it has been called. The latter would be strlen(s) or strlen{Post}(s). Let us take a closer look at the conjunction that makes up the rest of this postcondition:

(\forall integer x; 0 <= x < p ==> s[x] == c) says that the first p characters in s are c. memcmp{Pre, Post}(s, s + p, l + 1) == 0 uses the logic function memcmp to say that the contents of s at offset p is the same as the input string, including the NUL terminator. memcmp is defined axiomatically in __fc_string_axiomatic.h:

/*@ axiomatic MemCmp {
  @ logic ℤ memcmp{L1,L2}(char *s1, char *s2, ℤ n)
  @   reads \at(s1[0..n - 1],L1), \at(s2[0..n - 1],L2);
  @
  @ axiom memcmp_zero{L1,L2}:
  @   \forall char *s1, *s2; \forall ℤ n;
  @      memcmp{L1,L2}(s1,s2,n) == 0
  @      <==> \forall ℤ i; 0 <= i < n ==> \at(s1[i],L1) == \at(s2[i],L2);
  @
  @ }
  @*/

This definition only covers the memcmp() == 0 case, but this is enough for our purposes. There is a strcmp predicate which could be useful here, but it sadly cannot compare strings at different labels in Calcium. A not-in-place implementation of leftpad could make use of the strcmp predicate however.

Finally we have strlen(s) == n which says that the resulting string is of length n, which implies that there are no NUL characters in the range s[0..n-1].

The final assigns clause says that the padding behavior has the side effect of assigning elements in the range s[0..n]. In general it is possible that the function assigns only a subset of this range; assigns is allowed to specify a superset of the elements actually assigned.

The final two lines in the contract specify that the assumes clauses cover the entire behavior of the function. Specifically, the combination of complete behaviors and disjoint behaviors means that the preconditions imply strlen(s) >= n XOR strlen(s) < n. For more information about behaviors, see the ACSL manual on the Frama-C website.

Implementation

The implementation makes use of two loops, which are only executed if the length of the input string is less than n. The first loop is equivalent to memmove(s+p, s, l+1) and the second loop to memset(s, c, p). Let us look closer at the first loop:

/*@ loop invariant x_range: 0 <= x <= l;
    loop invariant s_state: \forall integer i; 0 <= i <= n ==>
      (i <= x+p ==> s[i] == \at(s[i], LoopEntry)) &&
      (i >  x+p ==> s[i] == \at(s[i-p], LoopEntry));
    loop assigns x, s[p..n];
    loop variant x;
 */
for (size_t x = l;; x--) {
  s[x+p] = s[x];
  if (x == 0) {
    break;
  }
}

This loop moves the string p elements to the right by starting at the end of the string and makings its way back to p elements from the start. The loop invariant x_range constrains the range of x. The invariant s_state describes the contents of s at each iteration in the loop. The loop assigns describes the side effects of the loop, namely assigning the loop variable and the elements of s that the loop moves the string into. To avoid underflowing x an explicit check against zero terminates the loop. Using ssize_t for x would be an alternative solution. The loop variant specifies that the loop terminates. We will go into more detail about loop variants further down.

Between the loops there are three assertions:

//@ assert same_chars: memcmp{Pre, Here}(s, s+p, l+1) == 0 && s[n] == 0;
//@ assert valid_str: valid_string(s+p);
//@ assert same_len: strlen(s+p) == l;

The purpose of these is to speed up the verification. Try removing them and see how it impacts the time spent by frama-c.

The final loop inserts the padding characters at the start of the string:

/*@ loop invariant x_range2: 0 <= x <= p;
    loop invariant still_same: memcmp{Pre, LoopCurrent}(s, s+p, l+1) == 0 && s[n] == 0;
    loop invariant valid_grows: valid_string(s + p - x);
    loop invariant padding_grows: \forall integer i; p - x <= i < p ==> s[i] == c;
    loop assigns x, s[0..p-1];
    loop variant p - x;
 */
for (size_t x = 0; x < p; x++) {
  s[p-1-x] = c;
}

x_range2 is similar to x_range. still_same affirms that the loop does not modify the range s[p..n]. It should not be necessary, but waiting for Alt-Ergo to figure that out would take ages. It is possible that enabling other provers such as Z3 (-wp-prover z3) might make still_same redundant. Feel free to experiment!

valid_grows says that the valid string grows at each iteration of the loop. padding_grows similarly says that the padding grows each iteration. Side effects are similar to the previous loop. Finally we come to the loop variant.

A loop variant in ACSL is an expression which is positive while the loop is executing, decreases at each iteration of the loop and becomes zero or less when the loop terminates. This allows Frama-C to reason about program termination. Support for verifying program complexity and termination is not supported in Frama-C at the moment (in Calcium).

A few notes about builtins

It is possible to implement leftpad using memmove and memset. Doing so requires the -builtin plugin, which is only available in the development version of Frama-C. It will be available in the next version of Frama-C, Scandium. It may also require some "handholding" of the provers via so-called proof scripts generated by the GUI.

Tomas' place in cyberspace