Featherweight Musings: July 2014

Wednesday, July 23, 2014

LibHoare - pre- and postconditions in Rust

I wrote a small macro library for writing pre- and postconditions (design by contract style) in Rust. It is called LibHoare (named after the logic and in turn Tony Hoare) and is here (along with installation instructions). It should be easy to use in your Rust programs, especially if you use Cargo. If it isn't, please let me know by filing issues on GitHub.

The syntax is straightforward, you add `[#precond="predicate"]` annotations before a function where `predicate` is any Rust expression which will evaluate to a bool. You can use any variables which would be in scope where the function is defined and any arguments to the function. Preconditions are checked dynamically before a function is executed on every call to that function.

You can also write `[#postcond="predicate"]` which is checked on leaving a function and `[#invariant="predicate"]` which is checked before and after. You can write any combination of annotations too. In postconditions you can use the special variable `result` (soon to be renamed to `return`) to access the value returned by the function.

There are also `debug_*` versions of each annotation which are not checked in --ndebug builds.

The biggest limitation at the moment is that you can only write conditions on functions, not methods (even static ones). This is due to a restriction on where any annotation can be placed in the Rust compiler. That should be resolved at some point and then LibHoare should be pretty easy to update.

If you have ideas for improvement, please let me know! Contributions are very welcome.

# Implementation

The implementation of these syntax extensions is fairly simple. Where the old function used to be, we create a new function with the same signature and an empty body. Then we declare the old function inside the new function and call it with all the arguments (generating the list of arguments is the only interesting bit here because arguments in Rust can be arbitrary patterns). We then return the result of that function call as the result of the outer function. Preconditions are just an `assert!` inserted before calling the inner function and postconditions are an `assert!` inserted after the function call and before returning.

Thursday, July 17, 2014

Rust for C++ programmers - part 9: destructuring pt2 - match and borrowing

(Continuing from part 8, destructuring).

(Note this was significantly edited as I was wildly wrong the first time around. Thanks to /u/dpx-infinity for pointing out my mistakes.)

When destructuring there are some surprises in store where borrowing is concerned. Hopefully, nothing surprising once you understand borrowed references really well, but worth discussing (it took me a while to figure out, that's for sure. Longer than I realised, in fact, since I screwed up the first version of this blog post).

Imagine you have some `&Enum` variable `x` (where `Enum` is some enum type). You have two choices: you can match `*x` and list all the variants (`Variant1 => ...`, etc.) or you can match `x` and list reference to variant patterns (`&Variant1 => ...`, etc.). (As a matter of style, prefer the first form where possible since there is less syntactic noise). `x` is a borrowed reference and there are strict rules for how a borrowed reference can be dereferenced, these interact with match expressions in surprising ways (at least surprising to me), especially when you a modifying an existing enum in a seemingly innocuous way and then the compiler explodes on a match somewhere.

Before we get into the details of the match expression, lets recap Rust's rules for value passing. In C++, when assigning a value into a variable or passing it to a function there are two choices - pass-by-value and pass-by-reference. The former is the default case and means a value is copied either using a copy constructor or a bitwise copy. If you annotate the destination of the parameter pass or assignment with `&`, then the value is passed by reference - only a pointer to the value is copied and when you operate on the new variable, you are also operating on the old value.

Rust has the pass-by-reference option, although in Rust the source as well as the destination must be annotated with `&`. For pass-by-value in Rust, there are two further choices - copy or move. A copy is the same as C++'s semantics (except that there are no copy constructors in Rust). A move copies the value but destroys the old value - Rust's type system ensures you can no longer access the old value. As examples, `int` has copy semantics and `Box<int>` has move semantics:

fn foo() {
    let x = 7i;
    let y = x;                // x is copied
    println!("x is {}", x);   // OK

    let x = box 7i;
    let y = x;                // x is moved
    //println!("x is {}", x); // error: use of moved value: `x`
}

Rust determines if an object has move or copy semantics by looking for destructors. Destructors probably need a post of their own, but for now, an object in Rust has a destructor if it implements the `Drop` trait. Just like C++, the destructor is executed just before an object is destroyed. If an object has a destructor then it has move semantics. If it does not, then all of its fields are examined and if any of those do then the whole object has move semantics. And so on down the object structure. If no destructors are found anywhere in an object, then it has copy semantics.

Now, it is important that a borrowed object is not moved, otherwise you would have a reference to the old object which is no longer valid. This is equivalent to holding a reference to an object which has been destroyed after going out of scope - it is a kind of dangling pointer. If you have a pointer to an object, there could be other references to it. So if an object has move semantics and you have a pointer to it, it is unsafe to dereference that pointer. (If the object has copy semantics, dereferencing creates a copy and the old object will still exist, so other references will be fine).

OK, back to match expressions. As I said earlier, if you want to match some `x` with type `&T` you can dereference once in the match clause or match the reference in every arm of the match expression. Example:

enum Enum1 {
    Var1,
    Var2,
    Var3
}

fn foo(x: &Enum1) {
    match *x { // Option 1: deref here.
        Var1 => {}
        Var2 => {}
        Var3 => {}
    }

    match x {
        // Option 2: 'deref' in every arm.
        &Var1 => {}
        &Var2 => {}
        &Var3 => {}
    }
}

In this case you can take either approach because `Enum1` has copy semantics. Let's take a closer look at each approach: in the first approach we dereference `x` to a temporary variable with type `Enum1` (which copies the value in `x`) and then do a match against the three variants of `Enum1`. This is a 'one level' match because we don't go deep into the value's type. In the second approach there is no dereferencing. We match a value with type `&Enum1` against a reference to each variant. This match goes two levels deep - it matches the type (always a reference) and looks inside the type to match the referred type (which is `Enum1`).

Either way, we must ensure that we (that is, the compiler) must ensure we respect Rust's invariants around moves and references - we must not move any part of an object if it is referenced. If the value being matched has copy semantics, that is trivial. If it has move semantics then we must make sure that moves don't happen in any match arm. This is accomplished either by ignoring data which would move, or making references to it (so we get by-reference passing rather than by-move).

enum Enum2 {
    // Box has a destructor so Enum2 has move semantics.
    Var1(Box<int>),
    Var2,
    Var3
}

fn foo(x: &Enum2) {
    match *x {
        // We're ignoring nested data, so this is OK
        Var1(..) => {}
        // No change to the other arms.
        Var2 => {}
        Var3 => {}
    }

    match x {
        // We're ignoring nested data, so this is OK
        &Var1(..) => {}
        // No change to the other arms.
        &Var2 => {}
        &Var3 => {}
    }
}

In either approach we don't refer to any of the nested data, so none of it is moved. In the first approach, even though `x` is referenced, we don't touch its innards in the scope of the dereference (i.e., the match expression) so nothing can escape. We also don't bind the whole value (i.e., bind `*x` to a variable), so we can't move the whole object either.

We can take a reference to any variant in the second match, but not in the derferenced version. So, in the second approach replacing the second arm with `a @ &Var2 => {}` is OK (`a` is a reference), but under the first approach we couldn't write `a @ Var2 => {}` since that would mean moving `*x` into `a`. We could write `ref a @ Var2 => {}` (in which `a` is also a reference), although it's not a construct you see very often.

But what about if we want to use the data nested inside `Var1`? We can't write:

    match *x {
        Var1(y) => {}
        _ => {}
    }

    match x {
        &Var1(y) => {}
        _ => {}
    }

because in both cases it means moving part of `x` into `y`. We can use the 'ref' keyword to get a reference to the data in `Var1`: `&Var1(ref y) => {}`.That is OK, because now we are not dereferencing anywhere and thus not moving any part of `x`. Instead we are creating a pointer which points into the interior of `x`.

Alternatively, we could destructure the Box (this match is going three levels deep): `&Var1(box y) => {}`. This is OK because `int` has copy semantics and `y` is a copy of the `int` inside the `Box` inside `Var1` (which is 'inside' a borrowed reference). Since `int` has copy semantics, we don't need to move any part of `x`. We could also create a reference to the int rather than copy it: `&Var1(box ref y) => {}`. Again, this is OK, because we don't do any dereferencing and thus don't need to move any part of `x`. If the contents of the Box had move semantics, then we could not write `&Var1(box y) => {}`, we would be forced to use the reference version. We could also use similar techniques with the first approach to matching, which look the same but without the first `&`. For example, `Var1(box ref y) => {}`.

Now lets get more complex. Lets say you want to match against a pair of reference-to-enum values. Now we can't use the first approach at all:

fn bar(x: &Enum2, y: &Enum2) {
    // Error: x and y are being moved.
    // match (*x, *y) {
    //     (Var2, _) => {}
    //     _ => {}
    // }

    // OK.
    match (x, y) {
        (&Var2, _) => {}
        _ => {}
    }
}

The first approach is illegal because the value being matched is created by dereferencing `x` and `y` and then moving them both into a new tuple object. So in this circumstance, only the second approach works. And of course, you still have to follow the rules above for avoiding moving parts of `x` and `y`.

If you do end up only being able to get a reference to some data and you need the value itself, you have no option except to copy that data. Usually that means using `clone()`. If the data doesn't implement clone, you're going to have to further destructure to make a manual copy or implement clone yourself.

What if we don't have a reference to a value with move semantics, but the value itself. Now moves are OK, because we know no one else has a reference to the value (the compiler ensures that if they do, we can't use the value). For example,

fn baz(x: Enum2) {
    match x {
        Var1(y) => {}
        _ => {}
    }
}

There are still a few things to be aware of. Firstly, you can only move to one place. In the above example we are moving part of `x` into `y` and we'll forget about the rest. If we wrote `a @ Var1(y) => {}` we would be attempting to move all of `x` into `a` and part of `x` into `y`. That is not allowed, an arm like that is illegal. Making one of `a` or `y` a reference (using `ref a`, etc.) is not an option either, then we'd have the problem described above where we move whilst holding a reference. We can make both `a` and `y` references and then we're OK - neither is moving, so `x` remains in tact and we have pointers to the whole and a part of it.

Similarly (and more common), if we have a variant with multiple pieces of nested data, we can't take a reference to one datum and move another. For example if we had a `Var4` declared as `Var4(Box<int>, Box<int>)` we can have a match arm which references both (`Var4(ref y, ref z) => {}`) or a match arm which moves both (`Var4(y, z) => {}`) but you cannot have a match arm which moves one and references the other (`Var4(ref y, z) => {}`). This is because a partial move still destroys the whole object, so the reference would be invalid.

Sunday, July 13, 2014

Rust for C++ programmers - part 8: destructuring

First an update on progress. You probably noticed this post took quite a while to come out. Fear not, I have not given up (yet). I have been busy with other things, and there is a section on match and borrowing which I found hard to write and it turns out I didn't understand very well. It is complicated and probably deserves a post of its own, so after all the waiting, the interesting bit is going to need more waiting. Sigh.

I've also been considering the motivation of these posts. I really didn't want to write another tutorial for Rust, I don't think that is a valuable use of my time when there are existing tutorials and a new guide in the works. I do think there is something to be said for targeting tutorials at programmers with different backgrounds. My first motivation for this series of posts was that a lot of energy in the tutorial was expended on things like pointers and the intuition of ownership which I understood well from C++, and I wanted a tutorial that concentrated on the things I didn't know. That is hopefully where this has been going, but it is a lot of work, and I haven't really got on to the interesting bits. So I would like to change the format a bit to be less like a tutorial and more like articles aimed at programmers who know Rust to some extent, but know C++ a lot better and would like to bring up their Rust skills to their C++ level. I hope that complements the existing tutorials better and is more interesting for readers. I still have some partially written posts in the old style so they will get mixed in a bit. Let me know what you think of the idea in the comments.

Destructuring

Last time we looked at Rust's data types. Once you have some data structure, you will want to get that data out. For structs, Rust has field access, just like C++. For tuples, tuple structs, and enums you must use destructuring (there are various convenience functions in the library, but they use destructuring internally). Destructuring of data structures doesn't happen in C++, but it might be familiar from languages such as Python or various functional languages. The idea is that just as you can create a data structure by filling out its fields with data from a bunch of local variables, you can fill out a bunch of local variables with data from a data structure. From this simple beginning, destructuring has become one of Rust's most powerful features. To put it another way, destructuring combines pattern matching with assignment into local variables.

Destructuring is done primarily through the let and match statements. The match statement is used when the structure being desctructured can have difference variants (such as an enum). A let expression pulls the variables out into the current scope, whereas match introduces a new scope. To compare:

fn foo(pair: (int, int)) {
    let (x, y) = pair;
    // we can now use x and y anywhere in foo

    match pair {
        (x, y) => {
            // x and y can only be used in this scope
        }
    }
}

The syntax for patterns (used after `let` and before `=>` in the above example) in both cases is (pretty much) the same. You can also use these patterns in argument position in function declarations:

fn foo((x, y): (int, int)) {
}

(Which is more useful for structs or tuple-structs than tuples).

Most initialisation expressions can appear in a destructuring pattern and they can be arbitrarily complex. That can include references and primitive literals as well as data structures. For example,

struct St {
    f1: int,
    f2: f32
}

enum En {
    Var1,
    Var2,
    Var3(int),
    Var4(int, St, int)
}

fn foo(x: &En) {
    match x {
        &Var1 => println!("first variant"),
        &Var3(5) => println!("third variant with number 5"),
        &Var3(x) => println!("third variant with number {} (not 5)", x),
        &Var4(3, St{ f1: 3, f2: x }, 45) => {
            println!("destructuring an embedded struct, found {} in f2", x)
        }
        &Var4(_, x, _) => {
            println!("Some other Var4 with {} in f1 and {} in f2", x.f1, x.f2)
        }
        _ => println!("other (Var2)")
    }
}

Note how we destructure through a reference by using `&` in the patterns and how we use a mix of literals (`5`, `3`, `St { ... }`), wildcards (`_`), and variables (`x`).

You can use `_` wherever a variable is expected if you want to ignore a single item in a pattern, so we could have used `&Var3(_)` if we didn't care about the integer. In the first `Var4` arm we destructure the embedded struct (a nested pattern) and in the second `Var4` arm we bind the whole struct to a variable. You can also use `..` to stand in for all fields of a tuple or struct. So if you wanted to do something for each enum variant but don't care about the content of the variants, you could write:

fn foo(x: En) {
    match x {
        Var1 => println!("first variant"),
        Var2 => println!("second variant"),
        Var3(..) => println!("third variant"),
        Var4(..) => println!("fourth variant")
    }
}

When destructuring structs, the fields don't need to be in order and you can use `..` to elide the remaining fields. E.g.,

struct Big {
    field1: int,
    field2: int,
    field3: int,
    field4: int,
    field5: int,
    field6: int,
    field7: int,
    field8: int,
    field9: int,
}

fn foo(b: Big) {
    let Big { field6: x, field3: y, ..} = b;
    println!("pulled out {} and {}", x, y);
}

As a shorthand with structs you can use just the field name which creates a local variable with that name. The let statement in the above example created two new local variables `x` and `y`. Alternatively, you could write

fn foo(b: Big) {
let Big { field6, field3, ..} = b;
println!("pulled out {} and {}", field3, field6);
}

Now we create local variables with the same names as the fields, in this case `field3` and `field6`.

There are a few more tricks to Rust's destructuring. Lets say you want a reference to a variable in a pattern. You can't use `&` because that matches a reference, rather than creates one (and thus has the effect of dereferencing the object). For example,

struct Foo {
field: &'static int
}

fn foo(x: Foo) {
let Foo { field: &y } = x;
}

Here, `y` has type `int` and is a copy of the field in `x`.

To create a reference to something in a pattern, you use the `ref` keyword. For example,

fn foo(b: Big) {
let Big { field3: ref x, ref field6, ..} = b;
println!("pulled out {} and {}", *x, *field6);
}

Here, `x` and `field6` both have type `&int` and are references to the fields in `b`.

One last trick when destructuring is that if you are detructuring a complex object, you might want to name intermediate objects as well as individual fields. Going back to an earlier example, we had the pattern `&Var4(3, St{ f1: 3, f2: x }, 45)`. In that pattern we named one field of the struct, but you might also want to name the whole struct object. You could write `&Var4(3, s, 45)` which would bind the struct object to `s`, but then you would have to use field access for the fields, or if you wanted to only match with a specific value in a field you would have to use a nested match. That is not fun. Rust lets you name parts of a pattern using `@` syntax. For example `&Var4(3, s @ St{ f1: 3, f2: x }, 45)` lets us name both a field (`x`, for `f2`) and the whole struct (`s`).

That just about covers your options with Rust pattern matching. There are a few features I haven't covered, such as matching vectors, but hopefully you know how to use `match` and `let` and have seen some of the powerful things you can do. Next time I'll cover some of the subtle interactions between match and borrowing which tripped me up a fair bit when learning Rust.