Creating Macros in Rust [Tutorial]

Rust has had a great macro system since version 1.0. Macros allow us to apply some code to multiple types or expressions, as they work by expanding themselves at compile time. This means that when you use a macro, you are effectively writing a lot of code before the actual compilation starts. This has two main benefits. First, the codebase can be easier to maintain by being smaller and reusing code. Second, since macros expand before the creation of object code starts, you can abstract at the syntactic level.

In this article, we'll learn how to create our very own macros in Rust.

This Rust tutorial is an extract from Rust High Performance, authored by Iban Eguia Moraza.

For example, you can have a function like this one:

fn add_one(input: u32) -> u32 {
    input + 1
}

This function restricts the input to u32 types and the return type to u32. We could accept more types by using generics; with a bound on the Add trait, for example, the function could also accept &u32. Macros allow us to generate this kind of code for any element that can be written to the left of the + sign, and the macro will be expanded separately for each type of element, generating different code for each case.
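As a rough sketch of the generic alternative mentioned above, a function bounded by the Add trait might look like the following. The name add_one_generic is hypothetical and is only used here to avoid clashing with the add_one function shown earlier.

```rust
use std::ops::Add;

// Hypothetical generic variant of add_one: it accepts anything that can have
// a u32 added to it, which includes both u32 and &u32.
fn add_one_generic<T: Add<u32>>(input: T) -> T::Output {
    input + 1
}

fn main() {
    let value = 41u32;
    println!("By value: {}", add_one_generic(value));
    println!("By reference: {}", add_one_generic(&value));
}
```

Note that this still only covers types that implement Add<u32>; other integer types would need a different bound or more generic machinery.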

To create a macro, you will need to use a macro built into the language, the macro_rules!{} macro. This macro receives the name of the new macro as a first parameter and a block with the macro code as a second element. The syntax can be a bit complex the first time you see it, but it can be learned quickly. Let's start with a macro that does just the same as the function we saw before:

macro_rules! add_one {
    ($input:expr) => {
        $input + 1
    }
}

You can now call that macro from your main() function by calling add_one!(integer);. Note that the macro needs to be defined before the first call, even if it's in the same file. It will work with any integer, which wasn't possible with functions.
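To make this concrete, here is a small sketch that calls the macro with a few different integer types. The variable names are only illustrative.

```rust
macro_rules! add_one {
    ($input:expr) => {
        $input + 1
    };
}

fn main() {
    // The same macro works for different integer types, because the
    // expanded code is type-checked separately at each call site.
    let small: u8 = add_one!(5u8);
    let signed: i64 = add_one!(-3i64);
    let from_expr: u32 = add_one!(2 * 10);
    println!("{} {} {}", small, signed, from_expr);
}
```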

Let's analyze how the syntax works. In the block after the name of the new macro (add_one), we can see two sections. In the first part, on the left of the =>, we see $input:expr inside parentheses. Then, to the right, we see a Rust block where we do the actual addition.

The left part works similarly (in some ways) to a pattern match. You can add any combination of characters and then some variables, all of them starting with a dollar sign ($) and showing the type of variable after a colon. In this case, the only variable is the $input variable and it's an expression. This means that you can insert any kind of expression there and it will be written in the code to the right, substituting the variable with the expression.

Creating Macro variants

As you can see, it's not as complicated as you might think. As I wrote, you can have almost any pattern on the left-hand side of a macro_rules!{} rule. Not only that, you can also have multiple patterns, as if it were a match expression, so that the first one that matches will be the one expanded. Let's see how this works by creating a macro which, depending on how we call it, will add one or two to the given integer:

macro_rules! add {
    {one to $input:expr} => ($input + 1);
    {two to $input:expr} => ($input + 2);
}

fn main() {
    println!("Add one: {}", add!(one to 25/5));
    println!("Add two: {}", add!(two to 25/5));
}

You can see a couple of clear changes to the macro. First, we swapped braces for parentheses and parentheses for braces in the macro. This is because in a macro, you can use interchangeable braces ({ and }), square brackets ([ and ]), and parentheses (( and )). Not only that, you can use them when calling the macro. You have probably already used the vec![] macro and the format!() macro, and we saw the lazy_static!{} macro in the last chapter. We use brackets and parentheses here just for convention, but we could call the vec!{} or the format![] macros the same way, because we can use braces, brackets, and parentheses in any macro call.
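A quick sketch of this interchangeability, reusing the add! macro from above together with the standard vec! and format! macros:

```rust
macro_rules! add {
    {one to $input:expr} => ($input + 1);
    {two to $input:expr} => ($input + 2);
}

fn main() {
    // All three delimiters invoke the same macro.
    let with_parens = add!(one to 4);
    let with_brackets = add![one to 4];
    let with_braces = add! {two to 4};
    println!("{} {} {}", with_parens, with_brackets, with_braces);

    // Standard macros accept unconventional delimiters too.
    let numbers = vec! {1, 2, 3};
    let text = format!["{} items", numbers.len()];
    println!("{}", text);
}
```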

The second change was to add some extra text to our left-hand side patterns. We now call our macro by writing the text one to or two to, so I also removed the redundant one from the macro name and called it add!(). This means that we now call our macro with literal text. That is not valid Rust, but since we are using a macro, we modify the code we are writing before the compiler tries to understand actual Rust code, and the generated code is valid. We could add any text to the pattern, as long as it does not end the pattern (as an unbalanced parenthesis or brace would).

The final change was to add a second possible pattern. We can now add one or two, and the only difference is that each rule in the macro definition must now end with a semicolon (optional for the last one) to separate the options.

A small detail that I also added in the example was when calling the macro in the main() function. As you can see, I could have added one or two to 5, but I wrote 25/5 for a reason. When compiling this code, this will be expanded to 25/5 + 1 (or 2, if you use the second variant). This will later be optimized at compile time, since it will know that 25/5 + 1 is 6, but the compiler will receive that expression, not the final result. The macro system will not calculate the result of the expression; it will simply copy in the resulting code whatever you give to it and then pass it to the next compiler phase.
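One visible consequence of this copying behavior is that an argument used twice in the expansion is also evaluated twice. The following is a minimal sketch of that effect; the twice! macro and the count_evaluations function are hypothetical names introduced only for this illustration.

```rust
// Hypothetical macro that uses its argument twice in the expansion.
macro_rules! twice {
    ($e:expr) => {
        $e + $e
    };
}

// Returns the sum and the number of times the argument was evaluated.
fn count_evaluations() -> (u32, u32) {
    let mut calls = 0u32;
    let mut next = || {
        calls += 1;
        calls
    };
    // Expands to `next() + next()`, so the closure runs twice.
    let total = twice!(next());
    (total, calls)
}

fn main() {
    let (total, calls) = count_evaluations();
    println!("total = {}, evaluations = {}", total, calls);
}
```

If the argument has side effects or is expensive, it is usually safer to bind it to a local variable inside the expansion and reuse that binding.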

You should be especially careful with this when a macro you are creating calls another macro. They will get expanded recursively, one inside the other, so the compiler will receive a bunch of final Rust code that will need to be optimized. Issues related to this were found in the CLAP crate that we saw in the last chapter, since the exponential expansions were adding a lot of bloat code to their executables. Once they found out that there were too many macro expansions inside the other macros and fixed it, they reduced the size of their binary contributions by more than 50%.

Macros allow for an extra layer of customization. You can repeat arguments more than once. This is common, for example, in the vec![] macro, where you create a new vector with information at compile time. You can write something like vec![3, 4, 76, 87];. How does the vec![] macro handle an unspecified number of arguments?

Creating Complex macros

We can specify that we want multiple expressions in the left-hand side pattern of the macro definition by adding a * for zero or more matches or a + for one or more matches. Let's see how we can do that with a simplified my_vec![] macro:

macro_rules! my_vec {
    ($($x: expr),*) => {{
        let mut vector = Vec::new();
        $(vector.push($x);)*
        vector
    }}
}

Let's see what is happening here. First, we see that on the left side, there are two $ signs. The first one introduces the repetition itself, $(...),*, and the second one is the $x variable inside it. Each comma-separated expression will be bound to $x once. Then, on the right side, we use the $(...)* repetition to push $x to the vector once for every expression we receive.

There is another new thing on the right-hand side. As you can see, the macro expansion starts and ends with a double brace instead of using only one. This is because, once the macro gets expanded, the macro call is replaced by a new expression: the one that gets generated. Since what we want is to return the vector we are creating, we need a new scope where the last expression will be the value of the scope once it gets executed. You will be able to see it more clearly in the next code snippet.

We can call this code with the main() function:

fn main() {
    let my_vector = my_vec![4, 8, 15, 16, 23, 42];
    println!("Vector test: {:?}", my_vector);
}

It will be expanded to this code:

fn main() {
    let my_vector = {
        let mut vector = Vec::new();
        vector.push(4);
        vector.push(8);
        vector.push(15);
        vector.push(16);
        vector.push(23);
        vector.push(42);
        vector
    };
    println!("Vector test: {:?}", my_vector);
}

As you can see, we need those extra braces to create the scope that will return the vector so that it gets assigned to the my_vector binding.
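The + repetition mentioned earlier works the same way, except that it requires at least one match. As a small sketch, a hypothetical sum! macro (not part of the original example) could look like this:

```rust
// Hypothetical macro: the `+` repetition requires at least one expression.
macro_rules! sum {
    ($($x:expr),+) => {
        0 $(+ $x)+
    };
}

fn main() {
    // Expands to `0 + 1 + 2 + 3`.
    println!("Sum: {}", sum!(1, 2, 3));
    // `sum!()` would not compile, because no rule matches an empty input.
}
```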

You can have multiple repetition patterns on the left expression and they will be repeated for every use, as needed on the right.

macro_rules! add_to_vec {
    ($( $x:expr; [ $( $y:expr ),* ]);* ) => {
        &[ $($( $x + $y ),*),* ]
    }
}

In this example, the macro can receive zero or more $x; [$y1, $y2,...] inputs, separated by semicolons. So, each input will have one expression, then a semicolon, then an opening bracket, multiple sub-expressions separated by commas, and finally a closing bracket. A semicolon separates each input from the next one. But what does the macro do with this input? Let's check the right-hand side of it.

As you can see, this will create multiple repetitions. We can see that it creates a reference to an array, which can be used as a slice (&[T]), of whatever we feed to it, so all the expressions we use must be of the same type. Then, it will start iterating over all $x variables, one per input group. So if we feed it only one input, it will iterate once for the expression to the left of the semicolon. Then, it will iterate once for every $y expression associated with the $x expression, add the two with the + operator, and include the result in the slice.

If this was too complex to understand, let's look at an example. Let's suppose we call the macro with 65; [22, 34] as input. In this case, 65 will be $x, and 22 and 34 will be the $y variables associated with 65. So, the result will be a slice like this: &[65+22, 65+34]. Or, if we calculate the results: &[87, 99].

If, on the other hand, we give two groups of variables by using 65; [22, 34]; 23; [56, 35] as input, in the first iteration, $x will be 65, while in the second one, it will be 23. The $y variables of 65 will be 22 and 34, as before, and the ones associated with 23 will be 56 and 35. This means that the final slice will be &[87, 99, 79, 58], where 87 and 99 work the same way as before and 79 and 58 are the result of adding 23 to 56 and 23 to 35.
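The two calls described above can be tried with a short program like this one. The explicit &[i32] annotations are only there to show that the result can be used as a slice.

```rust
macro_rules! add_to_vec {
    ($( $x:expr; [ $( $y:expr ),* ]);* ) => {
        &[ $($( $x + $y ),*),* ]
    }
}

fn main() {
    // One group: 65 is added to each value in the brackets.
    let one_group: &[i32] = add_to_vec!(65; [22, 34]);
    // Two groups, separated by a semicolon.
    let two_groups: &[i32] = add_to_vec!(65; [22, 34]; 23; [56, 35]);
    println!("{:?}", one_group);
    println!("{:?}", two_groups);
}
```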

This gives you much more flexibility than functions, but remember, all this will be expanded at compile time, which can make your compilation much slower and the final binary larger, and even slower, if the macro duplicates too much code. In any case, there is more flexibility to it yet.

So far, all variables have been of the expr kind. We have used this by declaring $x:expr and $y:expr but, as you can imagine, there are other kinds of macro variables. The list follows:

expr: Expressions that you can write after an = sign, such as 76+4 or if a==1 {"something"} else {"other thing"}.

ident: An identifier or binding name, such as foo or bar.

path: A qualified path. This will be a path that you could write in a use sentence, such as foo::bar::MyStruct or foo::bar::my_func.

ty: A type, such as u64 or MyStruct. It can also be a path to the type.

pat: A pattern that you can write at the left side of an = sign or in a match expression, such as Some(t) or (a, b, _).

stmt: A full statement, such as a let binding like let a = 43;.

block: A block element that can have multiple statements and a possible expression between braces, such as {vec.push(33); vec.len()}.

item: What Rust calls items. For example, function or type declarations, complete modules, or trait definitions.

meta: A meta element, which you can write inside of an attribute (#[]). For example, cfg(feature = "foo").

tt: Any token tree that will eventually get parsed by a macro pattern, which means almost anything. This is useful for creating recursive macros, for example.

As you can imagine, some of these kinds of macro variables overlap and some of them are just more specific than the others. Their use will be verified on the right-hand side of the macro, in the expansion: for example, you cannot use a statement where an expression is required, although an identifier would be accepted there.
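To illustrate a few of these kinds together, here is a small sketch. The make_fn! and is_match! macros, and the answer function they generate, are hypothetical names made up for this example.

```rust
// Hypothetical macro: `ident` names the generated function, `ty` gives its
// return type and `expr` provides the returned value.
macro_rules! make_fn {
    ($name:ident, $ret:ty, $value:expr) => {
        fn $name() -> $ret {
            $value
        }
    };
}

// Hypothetical macro: `pat` is used as the pattern of a match arm.
macro_rules! is_match {
    ($value:expr, $p:pat) => {
        match $value {
            $p => true,
            _ => false,
        }
    };
}

// Generates `fn answer() -> u32 { 40 + 2 }`.
make_fn!(answer, u32, 40 + 2);

fn main() {
    println!("{}", answer());
    println!("{}", is_match!(Some(3), Some(_)));
    println!("{}", is_match!(None::<i32>, Some(_)));
}
```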

There are some extra rules, too, as we can see in the Rust documentation (https://doc.rust-lang.org/book/first-edition/macros.html#syntactic-requirements). Statements and expressions can only be followed by =>, a comma, or a semicolon. Types and paths can only be followed by =>, the as or where keywords, a comma, or one of =, |, ;, :, >, [, or {. And finally, patterns can only be followed by =>, the if or in keywords, a comma, =, or | (from the 2021 edition onward, a pat variable can no longer be followed by |; the pat_param kind keeps the older behavior).

Let's put this into practice by implementing a small Mul trait for a currency type we can create. This is an adapted example of some work we did when creating the Fractal Credits digital currency. In this case, we will look at the implementation of the Amount type (https://github.com/FractalGlobal/utils-rs/blob/49955ead9eef2d9373cc9386b90ac02b4d5745b4/src/amount.rs#L99-L102), which represents a currency amount. Let's start with the basic type definition:

#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct Amount {
    value: u64,
}

This amount will be divisible by up to three decimals, but it will always be an exact value. We should be able to add an Amount to the current Amount, or to subtract it. I will not explain these trivial implementations, but there is one implementation where macros can be of great help. We should be able to multiply the amount by any positive integer, so we should implement the Mul trait for u8, u16, u32, and u64 types. Not only that, we should be able to implement the Div and the Rem traits, but I will leave those out, since they are a little bit more complex. You can check them in the implementation linked earlier.
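For reference, the addition and subtraction implementations mentioned above could be sketched as follows. This is an assumption about their shape, not the code from the linked repository, and it does no overflow or underflow handling beyond what plain u64 arithmetic does.

```rust
use std::ops::{Add, Sub};

#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct Amount {
    value: u64,
}

impl Add for Amount {
    type Output = Self;

    fn add(self, rhs: Self) -> Self::Output {
        Self { value: self.value + rhs.value }
    }
}

impl Sub for Amount {
    type Output = Self;

    // Assumption: like plain u64 subtraction, this panics in debug builds
    // when rhs is larger than self.
    fn sub(self, rhs: Self) -> Self::Output {
        Self { value: self.value - rhs.value }
    }
}

fn main() {
    let a = Amount { value: 1_500 };
    let b = Amount { value: 250 };
    println!("{} {}", (a + b).value, (a - b).value);
}
```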

The only thing the multiplication of an Amount with an integer should do is to multiply the value by the integer given. Let's see a simple implementation for u8:

use std::ops::Mul;

impl Mul<u8> for Amount {
    type Output = Self;

    fn mul(self, rhs: u8) -> Self::Output {
        Self { value: self.value * rhs as u64 }
    }
}

impl Mul<Amount> for u8 {
    type Output = Amount;

    fn mul(self, rhs: Amount) -> Self::Output {
        Self::Output { value: self as u64 * rhs.value }
    }
}

As you can see, I implemented it both ways so that you can put the Amount to the left and to the right of the multiplication. If we had to do this for all integers, it would be a big waste of time and code. And if we had to modify one of the implementations (especially for Rem functions), it would be troublesome to do it in multiple code points. Let's use macros to help us.

We can define a macro, impl_mul_int!{}, which will receive a list of integer types and then implement the Mul trait back and forward between all of them and the Amount type. Let's see:

macro_rules! impl_mul_int {
    ($($t:ty)*) => ($(
        impl Mul<$t> for Amount {
            type Output = Self;

            fn mul(self, rhs: $t) -> Self::Output {
                Self { value: self.value * rhs as u64 }
            }
        }

        impl Mul<Amount> for $t {
            type Output = Amount;

            fn mul(self, rhs: Amount) -> Self::Output {
                Self::Output { value: self as u64 * rhs.value }
            }
        }
    )*)
}

impl_mul_int! { u8 u16 u32 u64 usize }

As you can see, we specifically ask for the given elements to be types and then we implement the trait for all of them. So, for any code that you want to implement for multiple types, you might as well try this approach, since it will save you from writing a lot of code and it will make it more maintainable.
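Putting the pieces together, a complete program using the macro might look like the following sketch. The main() function and its variable names are illustrative additions; the type and the macro are the ones shown above.

```rust
use std::ops::Mul;

#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct Amount {
    value: u64,
}

macro_rules! impl_mul_int {
    ($($t:ty)*) => ($(
        impl Mul<$t> for Amount {
            type Output = Self;

            fn mul(self, rhs: $t) -> Self::Output {
                Self { value: self.value * rhs as u64 }
            }
        }

        impl Mul<Amount> for $t {
            type Output = Amount;

            fn mul(self, rhs: Amount) -> Self::Output {
                Self::Output { value: self as u64 * rhs.value }
            }
        }
    )*)
}

impl_mul_int! { u8 u16 u32 u64 usize }

fn main() {
    let price = Amount { value: 1_500 };
    // The Amount can be on either side of the multiplication.
    let tripled = price * 3u8;
    let doubled = 2u64 * price;
    println!("{} {}", tripled.value, doubled.value);
}
```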

If you found this article useful and would like to learn more such tips, head on over to pick up the book, Rust High Performance, authored by Iban Eguia Moraza.
