TL;DR
- Use
Pin<Box<T>>
for the held value (here,Me
). Mark
Me
as unpinnable by setting a field asstd::marker::PhantomPinned
.Eg,
_pinned: std::marker::PhantomPinned
.Use
Rc<RefCell<H>>
for holder, (here,Holder
) and storeMe
as raw pointers.Eg,
me: BTreeSet<*const Me>
.Store
Holder
withinMe
usingRc::clone(...)
.- Store
Me
withinHolder
by converting it into a raw pointer. It is completely upto you if you want to store a mutable raw pointer or aconst
raw pointer. - Implement
Drop
trait such that it removes all references toMe
fromHolder
once it is dropped.
Click here to jump to solution.
Introduction
There are mainly two ways to represent self-referential struct in Rust.
Reference within the struct:
struct RefWithinMe { value: String, pointer_to_value: &'? str, }
These can easily be solved using
ouroboros
.#[macro_use] extern crate ouroboros; #[self_referential] struct RefWithinMe { value: String, #[borrows(value)] pointer_to_value: &'this str, }
We won't be discussing about these.
Reference outside the struct (A referential cycle):
use std::collections::BTreeSet; #[derive(Ord, PartialEq, Eq, PartialOrd)] struct Me<'a> { /// The my_holder that I hold. my_holder: &'a Holder<'a>, } /// The my_holder that holds `Me`. #[derive(Ord, PartialEq, Eq, PartialOrd)] struct Holder<'a> { set_of_me: BTreeSet<&'a Me<'a>>, } impl Me<'a> { fn new(my_holder: &'a mut Holder<'a>) -> Self { let this = Self { my_holder, }; // Insert myself here. my_holder.set_of_me.insert(&this); // but the value is being moved ๐ค this } }
We are going to discuss about this.
Problem Description
Given that you have two structs, a holder (Holder
, say) and a struct to hold
something (Me
, say), how will you establish a circular reference?
Experimentation
Try to run the above code and see what happens.
You get an error where Rust complaints about:
error[E0502]: cannot borrow `my_holder.set_of_me` as mutable because it is also
borrowed as immutable
|
15 | impl<'a> Me<'a> {
| -- lifetime `'a` defined here
...
18 | my_holder,
| ---------
| |
| immutable borrow occurs here
| requires that `*my_holder` is borrowed for `'a`
...
21 | my_holder.set_of_me.insert(&this);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ mutable borrow occurs here
error[E0597]: `this` does not live long enough
|
15 | impl<'a> Me<'a> {
| -- lifetime `'a` defined here
...
21 | my_holder.set_of_me.insert(&this);
| ---------------------------^^^^^-
| | |
| | borrowed value does not live long enough
| argument requires that `this` is borrowed for `'a`
...
25 | }
| - `this` dropped here while still borrowed
error[E0505]: cannot move out of `this` because it is borrowed
|
15 | impl<'a> Me<'a> {
| -- lifetime `'a` defined here
...
21 | my_holder.set_of_me.insert(&this);
| ---------------------------------
| | |
| | borrow of `this` occurs here
| argument requires that `this` is borrowed for `'a`
...
24 | this
| ^^^^ move out of `this` occurs here
error: aborting due to 3 previous errors
Aaaand, we hit a lifetime error!
I'll assume that you have a basic knowledge about Rust's lifetimes. If not, don't worry! I'll try to keep this article as simple as possible. But if you're interested, You can read my post on Rust Lifetimes.
Understanding the Error
So, here we see that Rust complaints us about this
variable not "living
long enough".
Pay attention to the message, "borrowed value does not live long enough".
Rust wants us to understand that if this
is moved, then
my_holder.set_of_me
will have a reference to a nonexistent object. And it's
never a good idea to dereference a reference (or, a pointer) to something that
does not exist. *SEGFAULT
intensifies ๐*
In short, Rust prevents the creation of dangling pointer.
But, this prevents us from creating a self-referential struct. Worry not,
because we can pull up some tricks to make the Rust compiler "think" that there
are no lifetime violations. However, it requires you to go down to the unsafe
level.
But first, let us talk about moves and why they are a big deal.
Understanding Moves in Rust
A "move" in a function happens when you return a value:
fn foo() -> String {
let value = "ABC".to_owned();
value // move occurs
}
Now, for ordinary objects, "moving" is fine. But since objects (and their references) created within the function have the lifetime of the function itself, Rust disallows it (thankfully!).
That also means, we need to find a way to tell Rust that "our move will keep the reference so please don't drop it!".
Raw Pointers of Rust
Let's jump right to the point. Raw pointers are your typical C-like pointers that are not bound to Rust's borrow checker.
But Rust's pointer suffer from a fatal flaw. They are not safe to use when two variables are swapped.
Let me demonstrate it. In fact, the following is adapted from The Rust Language Tutorial for Async. Follow this link to know more.
#[derive(Debug)]
struct Test {
value: String,
pointer_to_value: *const String,
}
impl Test {
fn new(txt: &str) -> Self {
let mut this = Test {
value: String::from(txt),
// In `C`, you'd call this initialize to `NULL`.
pointer_to_value: std::ptr::null(),
};
this.pointer_to_value = &this.value;
this
}
fn get_value(&self) -> &str {
&self.value
}
fn get_pointer_to_value(&self) -> &String {
unsafe { &*(self.pointer_to_value) }
}
}
Here we have a simple struct called Test
. pointer_to_value
points to
value
. Two methods (get_value
and get_pointer_to_value
) have been
provided as getters.
Now, let's say we create two instances of Test
and use our getters to
inspect the contents.
fn main() {
let mut test1 = Test::new("test1");
let mut test2 = Test::new("test2");
println!(
"test1: value: {}, ptr to value: {}",
test1.get_value(),
test1.get_pointer_to_value()
);
println!(
"test2: value: {}, ptr to value: {}",
test2.get_value(),
test2.get_pointer_to_value()
);
}
Executing the code gives:
test1: value: test1, ptr to value: test1
test2: value: test2, ptr to value: test2
Nothing out of the ordinary right? You get the contents of value
and
pointer_to_value
and print it. But the real danger happens when we swap the
variables.
fn main() {
// ...
println!("Swapping...");
std::mem::swap(&mut test1, &mut test2);
println!(
"test1: value: {}, ptr to value: {}",
test1.get_value(),
test1.get_pointer_to_value()
);
println!(
"test2: value: {}, ptr to value: {}",
test2.get_value(),
test2.get_pointer_to_value()
);
}
You might expect this to print:
Swapping...
test1: value: test2, ptr to value: test2
test2: value: test1, ptr to value: test1
But to our surprise, this prints:
Swapping...
test1: value: test2, ptr to value: test1
test2: value: test1, ptr to value: test2
Unfolding the Mystery of Raw Pointers
So why did that happen? The answer is quite simple. "A pointer follows its pointee".
When we swapped the variables, the contents got swapped too. But that does not mean the pointers will change their pointee. This image demonstrates the point:
But, this is not good as this might potentially lead to situations like "double free" and "incorrect access".
We need a way to tell Rust that our variables should not be moved. Or in other words, we need something that will encapsulate our objects and make them immovable.
Pin<T>
the objects
Meet Pin<T>
! They "pin" objects to its location in memory.
Now, you can either pin to the stack or pin to the heap. For this case, pinning to the heap is the simplest and most viable option.
In Rust, we know that we can put a variable to the heap by using Box<T>
,
Rc<T>
or Arc<T>
. For this case, Box<T>
is sufficient.
To our convenience, Box<T>
provides a method called Box::pin(...)
. We can
use this to pin our object to encapsulate and prevent it from incorrect moves.
So how do we use Pin<T>
? Let's modify our code to find out.
use std::pin::Pin;
#[derive(Debug)]
struct Test {
// same as before
}
impl Test {
fn new(txt: &str) -> Pin<Box<Self>> { // return Pin<Box<Self>>
let mut this = Box::pin(Test {
value: String::from(txt),
// In `C`, you'd call this initialize to `NULL`.
pointer_to_value: std::ptr::null(),
});
this.as_mut().pointer_to_value = &this.value;
this
}
fn get_value(self: Pin<&Self>) -> &str { // modify `self` parameter
&self.get_ref().value
}
// modify `self` parameter
fn get_pointer_to_value(self: Pin<&Self>) -> &String {
unsafe { &*(self.pointer_to_value) }
}
}
Note that whenever you use Pin<T>
, the self
argument is modified to accept
a Pin<T>
type only.
Running the main
function as before:
fn main() {
let mut test1 = Test::new("test1");
let mut test2 = Test::new("test2");
println!(
"test1: value: {}, ptr to value: {}",
test1.as_ref().get_value(),
test1.as_ref().get_pointer_to_value(),
);
println!(
"test2: value: {}, ptr to value: {}",
test2.as_ref().get_value(),
test2.as_ref().get_pointer_to_value(),
);
println!("Swapping...");
std::mem::swap(&mut test1, &mut test2);
println!(
"test1: value: {}, ptr to value: {}",
test1.as_ref().get_value(),
test1.as_ref().get_pointer_to_value()
);
println!(
"test2: value: {}, ptr to value: {}",
test2.as_ref().get_value(),
test2.as_ref().get_pointer_to_value()
);
}
This prints:
test1: value: test1, ptr to value: test1
test2: value: test2, ptr to value: test2
Swapping...
test1: value: test2, ptr to value: test2
test2: value: test1, ptr to value: test1
As expected!
Note that we can still get a mutable reference to our enclosed Test
by using
as_mut().get_mut()
. This cannot be good!
PhantomPinned
to the Rescue
By using std::marker::PhantomPinned
, we tell Rust that our object cannot be
unpinned (Rust calls it !Unpin
. Read as "Not Unpinnable").
Now, any calls to get_mut()
will fail.
So, how do you use it?
modify your struct to hold that extra field.
use std::{ + marker::PhantomPinned, pin::Pin, }; #[derive(Debug)] struct Test { value: String, pointer_to_value: *const String, + _pinned: PhantomPinned, }
Initialize
PhantomPinned
innew()
impl Test { fn new(txt: &str) -> Pin<Box<Self>> { let mut this = Box::pin(Test { value: String::from(txt), // In `C`, you'd call this initialize to `NULL`. pointer_to_value: std::ptr::null(), + _pinned: PhantomPinned, }); this.as_mut().pointer_to_value = &this.value; this } // ... }
Try to run it, and you'll face with yet another complaint from Rust! This time,
it will say: "cannot assign to data in a dereference of Pin<&mut Test>
".
So, what does that mean?
It means: in the presence of PhantomPinned
, you cannot modify, or mutate
through a dereference. This is a side effect of PhantomPinned
.
To overcome this problem, we have to "force our way" into the enclosed object.
That means, we will use unsafe
.
impl Test {
fn new(txt: &str) -> Pin<Box<Self>> {
let mut this = Box::pin(Test {
value: String::from(txt),
// In `C`, you'd call this initialize to `NULL`.
pointer_to_value: std::ptr::null(),
_pinned: PhantomPinned,
});
+ unsafe {
+ this.as_mut().get_unchecked_mut().pointer_to_value = &this.value;
+ }
this
}
// ...
}
Aaaand that's it! We have constructed a self-referential struct using Pin
.
But why did we go through all this? This doesn't even solve our original problem?!
We went through this in order to understand the mechanism of Rust's Pin
and
its associated intricacies.
Now that we have understood how pinning works, let us go back to our original problem.
Solving the Original Problem
Let us wrap Me
with Pin<T>
.
use std::{
marker::PhantomPinned,
pin::Pin,
}
#[derive(Debug)]
struct Me<'a> {
my_holder: &'a Holder,
_pinned: PhantomPinned,
}
impl Me {
pub fn new(my_holder: &'a mut Holder) -> Pin<Box<Self>> {
let this = Box::pin(Self {
my_holder,
_pinned: PhantomPinned,
});
let this_ptr: *mut _ = unsafe { val.as_mut().get_unchecked_mut() };
my_holder.set_of_me.insert(this_ptr);
val
}
}
But we have a small problem with this scheme. But first, let us check what
Holder
looks like:
#[derive(Debug)]
struct Holder {
set_of_me: BTreeSet<*mut Me>,
}
Holder
holds several instances of Me
. Or in other words, Me
inserts
itself in Holder
, and it can insert at any time at any place. But if we keep
using references for Holder
(ie, &'a mut Holder
), we'll soon realize that
Rust does not allow us to go beyond what Borrow Checker permits.
We need to find a way to share references without letting Rust know that it is a reference. And it should be destroyed once its job is done.
Rc<T>
and RefCell<T>
Rc<T>
is the reference counted version of Box<T>
, and RefCell<T>
allows
you to mutate a seemingly immutable object. It is useful mostly in these
contexts. So how do we use it?
First, we create a type alias for shorthand!
type RcCell<T> = Rc<RefCell<T>>;
Next, we modify Holder
to return a RcCell<Holder>
.
use std::{
cell::RefCell,
rc::Rc,
}
#[derive(Debug)]
struct Holder {
set_of_me: BTreeSet<*mut Me>,
}
impl Holder {
fn new() -> RcCell<Self> {
Rc::new(RefCell::new(Self {
set_of_me: Default::default(),
}))
}
}
Let's discuss a bit about RefCell<T>
.
Why RefCell<T>
?
Let's see what happens when we try to modify anything in an Rc<T>
:
fn main() {
let mut x = Rc::new(String::from("EEE"));
x.push('e');
}
The last line will complain: "cannot borrow data in an Rc
as mutable".
That means, Rc<T>
acts as an immutable enclosure around the object.
Now let's see what happens when we use RefCell<T>
:
fn main() {
let mut x = Rc::new(RefCell::new(String::from("EEE")));
x.borrow_mut().push('e');
}
RefCell<T>
adds a new method called borrow_mut()
that allows you to receive
a mutable reference to the underlying object. And the added benefit of using
RefCell<T>
is that it strictly follows Rust's borrowing rules. This ensures
nothing funny or obscure happens with the object.
Putting the Pieces Together
Let us tell Me
to accept a RcCell<Holder>
.
#[derive(Debug)]
struct Me {
+ my_holder: RcCell<Holder>,
_pinned: PhantomPinned,
}
impl Me {
pub fn new(
+ my_holder: RcCell<Holder>,
) -> Pin<Box<Self>> {
let mut this = Box::pin(Self {
my_holder,
_pinned: PhantomPinned,
});
let this_ptr: *mut _ = unsafe { this.as_mut().get_unchecked_mut() };
+ this.my_holder.borrow_mut().set_of_me.insert(this_ptr);
this
}
}
But before you jump and rejoice, think about what happens in the end? What
happens to the references in Holder
's set_of_me
after an instance of Me
has been destroyed?
Right! They are still there as DANGLING POINTERS!
We need a way to remove Me
from Holder
's set_of_me
when it will be
destroyed. Fortunately, we have something called Drop
trait in Rust.
impl Drop for Me {
fn drop(&mut self) {
let this = &(self as *mut _);
self.my_holder.borrow_mut().set_of_me.remove(this);
}
}
And that's it.
Why not Weak<T>
for storing Me
?
Now you could ask, why go through all the pain of raw pointers and Pins when
I could've used Weak<T>
.
Yes you could've, but Weak<T>
does not implement Ord
, PartialOrd
,
PartialEq
etc. This will limit usability to a great degree.
Similarly, you cannot use Rc<T>
for storing Me
because Rc<T>
will not
help you in resolving the circular reference and will create a memory leak.
Solution
Here is the entire solution. Run it and experiment with it to understand how it works.
use std::{
cell::RefCell, collections::BTreeSet, marker::PhantomPinned, pin::Pin,
rc::Rc,
};
type RcCell<T> = Rc<RefCell<T>>;
#[derive(Debug)]
struct Holder {
set_of_me: BTreeSet<*mut Me>,
}
impl Holder {
fn new() -> RcCell<Self> {
Rc::new(RefCell::new(Self {
// this is initially empty
set_of_me: Default::default(),
}))
}
/// Mutate every value of `Me`.
/// Note how a pinned value is reconstructed.
fn mutate_value_of_me(&self, val: i32) {
self.set_of_me.iter().for_each(|a| {
let a = unsafe { Pin::new_unchecked(&mut **a) };
a.mutate_me(val);
})
}
}
#[derive(Debug)]
struct Me {
name: String,
mutate_by_holder: i32,
my_holder: RcCell<Holder>,
_pinned: PhantomPinned,
}
impl Me {
/// Accept a name
pub fn new(
holder: RcCell<Holder>,
name: impl Into<String>,
) -> Pin<Box<Self>> {
let mut this = Box::pin(Self {
name: name.into(),
mutate_by_holder: 0,
my_holder: holder,
_pinned: PhantomPinned,
});
let this_ptr: *mut _ = unsafe { this.as_mut().get_unchecked_mut() };
this.my_holder.borrow_mut().set_of_me.insert(this_ptr);
this
}
/// Allows you to mutate a value within me.
/// Run this from `Holder` to see what happens.
fn mutate_me(self: Pin<&mut Self>, val: i32) {
let this = unsafe { self.get_unchecked_mut() };
this.mutate_by_holder += val;
}
}
impl Drop for Me {
fn drop(&mut self) {
println!("Dropping {:#?}", self);
let this = &(self as *mut _);
self.my_holder.borrow_mut().set_of_me.remove(this);
}
}
/// A test function to play with `Holder`s.
fn make_ref_of_holder(holder: RcCell<Holder>) {
let holder = Rc::clone(&holder);
println!("Making a ref of {:?}", holder);
println!("No. of refs = {}", Rc::strong_count(&holder));
}
pub fn main() {
let holder = Holder::new();
// be explicit about `Rc`'s cloning
let a = Me::new(Rc::clone(&holder), "a");
let b = Me::new(Rc::clone(&holder), "b");
holder.borrow().mutate_value_of_me(455);
make_ref_of_holder(Rc::clone(&holder));
}
If you look at the code carefully and the code that was shown before, you'd notice that we are suffering from one big issue: Management of duplicate objects.
Duplicate Objects
To see what the problem is, add the following line to the code given in the solution.
// ...
pub fn main() {
let holder = Holder::new();
// be explicit about `Rc`'s cloning
let a = Me::new(Rc::clone(&holder), "a");
let b = Me::new(Rc::clone(&holder), "b");
+ let c = Me::new(Rc::clone(&holder), "b");
holder.borrow().mutate_value_of_me(455);
make_ref_of_holder(Rc::clone(&holder));
}
We might think, "Oh since we are storing our Me
objects in a set, c
will not be added."
But notice that the definition of set_of_me
is BTreeSet<*mut Me>
. Which means, the set stores unique pointers to Me
, not unique Me
s itself! And that's why c
gets stored in holder
.
So what's the solution?
One way to solve this issue is to create a wrapper type over *mut Me
that will allow dereferencing, and thus comparison of objects.
We will discuss about that in Part 2.
Conclusion
I hope this article helps you in understanding Rust's Self Referential Structs a bit better.
Questions? Comments? Concerns? Please put them down below and I'd be happy to help you.
Cover Image Source: Manjaro's /usr/share/backgrounds
folder ๐