How I Learned to Stop Fighting the Borrow Checker and Love Dirty Structs

Karolin Varner
Published in Adobe Tech Blog
7 min read · Aug 14, 2020

This post discusses how ad-hoc structs, designed to be implementation details, can be used to reintroduce aliasing to Rust and cut down on the number of fights with the borrow checker.

This was going to be the post with the clickbaitiest title I was ever going to write. I couldn’t quite bring myself to do it, but here is the title I was going to use: “How this one simple trick can be used to end most fights with the borrow checker.” Marvelously bad.

So, here is that one simple trick: When fighting with the borrow checker, consider moving your variables and your functions into a struct in place of using references.

Admittedly, this might need a bit more elaboration: When I started coding in Rust, coming from other languages like JavaScript and C++, I noticed that a few of the patterns I was using consistently led to fights with the borrow checker. In fact, most of these patterns were intentionally prohibited in Rust, usually with good reason, which makes the entire situation just so much more infuriating. After going through this process a couple of times, I found that the solution I came up with often involved refactoring my code by wrapping the offending variables and functions in a struct.

In this blog post we will look at a couple of examples of these kinds of situations and finally discuss the underlying reason why creating a new struct is a solution that works in these cases.

[Image: the title of the post superimposed on the image of a crab]

Global variables

Global variables are a common occurrence in most languages, but in Rust globals (or statics, as they are called in Rust) are discouraged. You can only initialize them with constant expressions, which rules out all constructors that need some sort of memory allocation or rely on dynamic data like the contents of an external file. This rule is the reason why the following example will fail at compile time:

We can solve this problem by using Vec::new(), which creates an empty vector and is a constant function. Any memory will be allocated later once we manually add our data:
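The original listings are not reproduced here, so what follows is only a minimal sketch of both variants. The global NAMES and the helpers register and registered_count are hypothetical names chosen for illustration. The rejected declaration is kept as a comment so that the block still compiles.

```rust
use std::ptr::{addr_of, addr_of_mut};

// Hypothetical global: a list of registered names.
// A declaration with a non-constant initializer, such as
//     static mut NAMES: Vec<String> = vec![String::from("ferris")];
// is rejected by the compiler, because building that vector needs an
// allocation at run time.
static mut NAMES: Vec<String> = Vec::new(); // const fn, nothing allocated yet

/// Appends a name to the global list.
fn register(name: &str) {
    // SAFETY: only sound as long as a single thread touches NAMES.
    unsafe { (&mut *addr_of_mut!(NAMES)).push(name.to_string()) }
}

/// Number of names registered so far.
fn registered_count() -> usize {
    // SAFETY: same single-threaded assumption as in `register`.
    unsafe { (&*addr_of!(NAMES)).len() }
}
```

Going through addr_of_mut! instead of writing NAMES.push(...) directly is a choice made for this sketch: recent compilers lint against taking references to a mutable static, and the raw pointer makes the unsafe step explicit.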

Great, this works! Unfortunately we had to use unsafe blocks to add our data because, as it turns out, Rust prohibits accessing mutable global variables in safe code for thread safety reasons: Without this restriction two threads may access the object at the same time, causing a race condition.

There are a number of ways to solve this problem; one of the easiest is to not use a global variable at all. By moving our global variables as well as the functions using them into a struct, we not only sidestep the thread safety/static mutable issue, but can also start using a non-constant initializer again.
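As a rough illustration of that refactoring (again a sketch, not the original listing), the state and the functions that use it can live together in one struct. Registry and its methods are hypothetical names.

```rust
/// Hypothetical replacement for a global list of names:
/// the data and the functions working on it live in one struct.
struct Registry {
    names: Vec<String>,
}

impl Registry {
    /// Any initializer works here, including ones that allocate.
    fn new() -> Self {
        Registry {
            names: vec![String::from("ferris")],
        }
    }

    /// Mutation goes through `&mut self`, so no unsafe block is needed.
    fn register(&mut self, name: &str) {
        self.names.push(name.to_string());
    }

    /// Number of names registered so far.
    fn count(&self) -> usize {
        self.names.len()
    }
}
```

Whoever owns the Registry value decides how it is shared; the borrow checker can then see every access, which is what the mutable static prevented.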

Aliasing

The Rust borrow checker bans mutable variable aliasing. This is a good thing, but for me it was a hard pill to swallow, as aliasing is a feature I had been using quite a lot just to give descriptive names. Sometimes these aliases take the form of a reference, renaming a variable nested deeply in one of my objects. Other times my aliases take the form of a closure, giving a descriptive name to a simple code snippet as well as making that snippet reusable. Used carefully, aliasing can help structure long, complex pieces of code and aid in communicating your intentions to other developers. Given the choice between an alias and a line of documentation, I will often choose the alias.

The following (broken) example uses a few aliases: avg is an alias because we need to use it multiple times and because it divides by zero for empty sample vectors. mag is an alias and not a free function because its behaviour is not well defined for negative values. Finally, new_is_outlier is just there to give a relatively opaque formula a clear name.

Using aliases just made a bunch of relatively random code much easier to read and understand; just imagine how long it would take to understand that this code is doing outlier rejection if all you had was some weird code containing a bit of math. Unfortunately, this code is also broken: our closures need to borrow a reference to the arguments, and this stops us from actually mutating the arguments. Luckily, by moving our code into a struct, we can keep the nice names and keep the reusability; our code becomes just a little harder to read and a little more verbose.
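A sketch of what the struct version might look like is given below. The names avg, mag, new_is_outlier and add_sample come from the text; the struct name SampleSet, the field names and the concrete outlier rule (a factor of 10) are assumptions made for illustration.

```rust
/// Hypothetical implementation-detail struct; the fields are the
/// former function arguments.
struct SampleSet {
    samples: Vec<f64>,
    outliers: Vec<f64>,
}

impl SampleSet {
    /// Mean of the accepted samples. Like the closure it replaces, this
    /// divides by zero for an empty set, so callers must check first.
    fn avg(&self) -> f64 {
        self.samples.iter().sum::<f64>() / self.samples.len() as f64
    }

    /// Factor by which `v` deviates from the mean, always >= 1.
    /// Only meaningful for positive values.
    fn mag(&self, v: f64) -> f64 {
        let avg = self.avg();
        if v > avg { v / avg } else { avg / v }
    }

    /// Assumed rule: more than a factor of 10 away from the mean.
    fn new_is_outlier(&self, v: f64) -> bool {
        !self.samples.is_empty() && self.mag(v) > 10.0
    }

    /// Each helper borrows `self` only for the duration of its call,
    /// so mutating the vectors afterwards is accepted.
    fn add_sample(&mut self, v: f64) {
        if self.new_is_outlier(v) {
            self.outliers.push(v);
        } else {
            self.samples.push(v);
        }
    }
}
```

The helpers stay private to this struct, so their sharp edges (the empty set, negative values) remain implementation details rather than documented API.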

Of course we could also solve this by promoting our aliases to become proper, well defined functions; avg could be a generic average function and the same goes for mag which could be turned into a factor_magnitude function. Both changes would introduce a lot of extra complexity; we would have to decide what avg should do if the input collection is empty, document it properly, test it properly, adapt it to work for any Iterator, use a longer name and explicitly pass in the iterator from the outside. Similar caveats apply for mag and new_is_outlier.

No, if we really just need those functions for add_sample and nowhere else, we might just treat them as implementation details and skip that extra maintenance required to make them function in any and all contexts.

Visitors

Finally, I would like to look at an example quite similar to the aliasing ones: using closures as event handlers. In the following example, a function produces two sorts of events, one for warnings and one for errors. Both are handled by passing a closure to the function.
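The original listing is missing, so here is a minimal sketch of such a function. The names check_values and count_problems and the rules for what counts as a warning or an error are hypothetical. The variant shown compiles because the two closures touch different variables; the comment marks where the trouble starts.

```rust
/// Hypothetical event source: reports negative inputs as errors and
/// very large inputs as warnings through two separate callbacks.
fn check_values(
    values: &[i32],
    mut on_warning: impl FnMut(&str),
    mut on_error: impl FnMut(&str),
) {
    for &v in values {
        if v < 0 {
            on_error(&format!("negative value: {}", v));
        } else if v > 100 {
            on_warning(&format!("suspiciously large value: {}", v));
        }
    }
}

/// Counts warnings and errors for a slice of values.
fn count_problems(values: &[i32]) -> (usize, usize) {
    let mut warnings = 0;
    let mut errors = 0;
    // This works because each closure mutably borrows a *different*
    // variable. If both closures pushed into one shared Vec<String>,
    // the second one would be rejected: two mutable borrows of the
    // same value would be alive at the same time.
    check_values(values, |_| warnings += 1, |_| errors += 1);
    (warnings, errors)
}
```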

We can solve that the same way we solved the lifetime problems in the previous examples: by moving our closures into a struct. I will leave that as an exercise to the reader, because this time we are more interested in the sort of structure we end up with after the move.

What we end up with is called a visitor: a method of processing events or data in arbitrary order by assigning each event a method. This sort of construction is used in a lot of places; a particularly common application for visitors involves walking any sort of tree-like structure. For instance, we could use a visitor to walk our file system tree; we could just declare one method for handling files, one for handling directories and one for symbolic links.
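To make the file system idea concrete, here is a small sketch that walks an in-memory stand-in for a directory tree rather than the real file system. Node, FsVisitor, walk and Counter are all hypothetical names.

```rust
/// Hypothetical in-memory stand-in for a file system tree.
enum Node {
    File(String),
    Symlink(String),
    Dir(String, Vec<Node>),
}

/// One method per kind of entry.
trait FsVisitor {
    fn file(&mut self, name: &str);
    fn symlink(&mut self, name: &str);
    fn dir(&mut self, name: &str);
}

/// Walks the tree depth first and reports every entry to the visitor.
fn walk(node: &Node, visitor: &mut impl FsVisitor) {
    match node {
        Node::File(name) => visitor.file(name),
        Node::Symlink(name) => visitor.symlink(name),
        Node::Dir(name, children) => {
            visitor.dir(name);
            for child in children {
                walk(child, visitor);
            }
        }
    }
}

/// Example visitor: counts how many entries of each kind it has seen.
#[derive(Default)]
struct Counter {
    files: usize,
    symlinks: usize,
    dirs: usize,
}

impl FsVisitor for Counter {
    fn file(&mut self, _name: &str) {
        self.files += 1;
    }
    fn symlink(&mut self, _name: &str) {
        self.symlinks += 1;
    }
    fn dir(&mut self, _name: &str) {
        self.dirs += 1;
    }
}
```

All three handlers share the same state through self, which is exactly what separate closures could not do.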

Here is another example, using a visitor to serialize data as JSON:
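The original listing is not available; the sketch below is a reconstruction under assumptions. The event names start_obj, end_obj, number and string as well as example_json and JsonGenerator come from the text. The trait name JsonVisitor, the key event, the struct fields and the sample document are invented for illustration, and string escaping is left out.

```rust
/// Events a JSON-like document can produce (trait name is an assumption).
trait JsonVisitor {
    fn start_obj(&mut self);
    fn end_obj(&mut self);
    fn key(&mut self, k: &str); // hypothetical extra event for object keys
    fn number(&mut self, n: f64);
    fn string(&mut self, s: &str);
}

/// Source: a hard-coded document, emitted as a sequence of events.
fn example_json<V: JsonVisitor>(v: &mut V) {
    v.start_obj();
    v.key("name");
    v.string("ferris");
    v.key("legs");
    v.number(10.0);
    v.end_obj();
}

/// Sink: turns the events into JSON text (no string escaping).
struct JsonGenerator {
    out: String,
    need_comma: bool,
}

impl JsonGenerator {
    fn new() -> Self {
        JsonGenerator { out: String::new(), need_comma: false }
    }

    /// Writes a comma if the previous event completed a value.
    fn sep(&mut self) {
        if self.need_comma {
            self.out.push(',');
        }
    }
}

impl JsonVisitor for JsonGenerator {
    fn start_obj(&mut self) {
        self.sep();
        self.out.push('{');
        self.need_comma = false;
    }
    fn end_obj(&mut self) {
        self.out.push('}');
        self.need_comma = true;
    }
    fn key(&mut self, k: &str) {
        self.sep();
        self.out.push_str(&format!("\"{}\":", k));
        self.need_comma = false;
    }
    fn number(&mut self, n: f64) {
        self.sep();
        self.out.push_str(&n.to_string());
        self.need_comma = true;
    }
    fn string(&mut self, s: &str) {
        self.sep();
        self.out.push_str(&format!("\"{}\"", s));
        self.need_comma = true;
    }
}
```

Calling example_json with a fresh JsonGenerator leaves the generated text in its out field.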

Note how we first define a trait with our events (start_obj, end_obj, number, string, ...), then create separate sources (example_json) and finally the sinks (JsonGenerator, our visitor); this sort of decoupling using traits makes visitors the lowest common denominator for a lot of possible use cases. The decoder could be hard coded, read from an efficient-to-decode binary stream or be a full-blown JSON decoder dynamically selecting which method/event to invoke. The same holds true for the visitor itself: it could just search for some specific value, generate JSON or even transform the events on the fly and forward them to another visitor. Applications using this pattern can be extremely efficient, relying on lots of inlining, or rely on much less efficient dynamic dispatch. They can be asynchronous or process a stream of events already stored in memory all at once.

This level of flexibility makes them especially suitable for data conversion frameworks. At the core of these, some sort of visitor pattern can usually be found; create decoders and encoders for a couple of formats and you have yourself a converter between any two of these formats. Finally, higher-level, more convenient interfaces, such as DOM-style interfaces, can be generated/exported using their own pair of encode/decode visitors.

Check out Serde’s Encoder Visitor Trait for just one of the many examples.

Conclusion

In this post we looked at three examples of borrow checker complaints that can be solved by moving the offending functions, closures and variables into a struct. In each of these cases this worked because by using structs we implicitly started using a form of control inversion. Instead of storing a reference to whatever variable we needed inside our function, closure or our alias, we would hand over the variable from outside every time. Embedding our variables inside a struct just allowed us to keep things convenient and easy to read despite implementing inversion of control.
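Stripped of the struct, the inversion looks roughly like the sketch below: the helper receives its data as a parameter on every call instead of capturing a reference once. The function names are hypothetical.

```rust
/// The "alias" receives its data on every call instead of capturing it.
fn avg(samples: &[f64]) -> f64 {
    samples.iter().sum::<f64>() / samples.len() as f64
}

/// Returns the average before and after adding `v`.
fn push_and_average(samples: &mut Vec<f64>, v: f64) -> (f64, f64) {
    // The shared borrow lasts only for the duration of each call ...
    let before = avg(samples.as_slice());
    // ... so the mutation in between is accepted.
    samples.push(v);
    let after = avg(samples.as_slice());
    (before, after)
}
```

A struct method does the same thing with less typing: self is handed over on each call, so the argument list stays short.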

There is another commonality among our examples: All three are ad-hoc solutions you might often find in other languages, especially JavaScript; these sorts of patterns are used whenever we just need to reuse some snippet of code or assign an alternate name to a variable. I struggle even with the term ad-hoc solution (or worse: hack, quick fix) because having these reusable implementation details is often the best possible solution. Introducing a brand new public interface adds maintenance burden and prevents future refactoring. Expanding your public-facing interface, your API, is a step that should not be taken lightly; premature abstraction is the bane of many software projects.

Object Oriented Development has taught us that having ad-hoc classes is not OK. Classes should be these pristine things with a nice API. Definitely no public fields. I urge you to let go of that notion; API design is an art in and of itself; an excellent API is much more, and much less, than a list of structs contained in a library.

In conclusion, I would say that one important step in learning to work with the borrow checker instead of against it is this: Learn not to shy away from using structs as implementation details when implementing a function.
