I'm working with a friend to define a safe public API for lifetimes of a "scoped" garbage collector. The lifetimes are either overly constrained and correct code does not compile or the lifetimes are too loose and they may allow invalid behavior. After trying multiple approaches, we are still stuck getting a correct API. This is especially frustrating because Rust's lifetimes can help avoid bugs in this situation but right now it just looks stubborn.
Scoped garbage collection
I am implementing an ActionScript interpreter and need a garbage collector. I studied rust-gc but it did not suit my needs. The main reason is that it requires the garbage collected values to have a static lifetime because the GC state is a thread-local static variable. I need to get garbage-collected bindings to a dynamically created host object. The other reason to avoid globals is that it is easier for me to handle multiple independent garbage-collected scopes, control their memory limits or serialize them.
A scoped garbage collector is similar to a typed-arena. You can use it to allocate values and they are all freed once the garbage collector is dropped. The difference is that you can also trigger garbage collection during its lifetime and it will clean-up the unreachable data (and is not limited to a single type).
I have a working implementation implemented (mark & sweep GC with scopes), but the interface is not yet safe to use.
Here is a usage example of what I want:
pub struct RefNamedObject<'a> {
pub name: &'a str,
pub other: Option<Gc<'a, GcRefCell<NamedObject<'a>>>>,
}
fn main() {
// Initialize host settings: in our case the host object will be replaced by a string
// In this case it lives for the duration of `main`
let host = String::from("HostConfig");
{
// Create the garbage-collected scope (similar usage to `TypedArena`)
let gc_scope = GcScope::new();
// Allocate a garbage-collected string: returns a smart pointer `Gc` for this data
let a: Gc<String> = gc_scope.alloc(String::from("a")).unwrap();
{
let b = gc_scope.alloc(String::from("b")).unwrap();
}
// Manually trigger garbage collection: will free b's memory
gc_scope.collect_garbage();
// Allocate data and get a Gc pointer, data references `host`
let host_binding: Gc<RefNamed> = gc_scope
.alloc(RefNamedObject {
name: &host,
other: None,
})
.unwrap();
// At the end of this block, gc_scope is dropped with all its
// remaining values (`a` and `host_bindings`)
}
}
Lifetime properties
The basic intuition is that Gc
can only contain data that lives as long (or longer) than the corresponding GcScope
. Gc
is similar to Rc
but supports cycles. You need to use Gc<GcRefCell<T>>
to mutate values (similar to Rc<RefCell<T>>
).
Here are the properties that must be satisfied by the lifetimes of my API:
Gc
cannot live longer than its GcScope
The following code must fail because a
outlives gc_scope
:
let a: Gc<String>;
{
let gc_scope = GcScope::new();
a = gc_scope.alloc(String::from("a")).unwrap();
}
// This must fail: the gc_scope was dropped with all its values
println("{}", *a); // Invalid
Gc
cannot contain data that lives shorter than its GcScope
The following code must fail because msg
does not live as long (or longer) as gc_scope
let gc_scope = GcScope::new();
let a: Gc<&string>;
{
let msg = String::from("msg");
a = gc.alloc(&msg).unwrap();
}
It must be possible to allocate multiple Gc
(no exclusion on gc_scope
)
The following code must compile
let gc_scope = GcScope::new();
let a = gc_scope.alloc(String::from("a"));
let b = gc_scope.alloc(String::from("b"));
It must be possible to allocate values containing references with lifetimes longer than gc_scope
The following code must compile
let msg = String::from("msg");
let gc_scope = GcScope::new();
let a: Gc<&str> = gc_scope.alloc(&msg).unwrap();
It must be possible to create cycles of Gc pointers (that's the whole point)
Similarly to the Rc<Refcell<T>>
pattern, you can use Gc<GcRefCell<T>>
to mutate values and create cycles:
// The lifetimes correspond to my best solution so far, they can change
struct CircularObj<'a> {
pub other: Option<Gc<'a, GcRefCell<CircularObj<'a>>>>,
}
let gc_scope = GcScope::new();
let n1 = gc_scope.alloc(GcRefCell::new(CircularObj { other: None }));
let n2 = gc_scope.alloc(GcRefCell::new(CircularObj {
other: Some(Gc::clone(&n1)),
}));
n1.borrow_mut().other = Some(Gc::clone(&n2));
Solutions so far
Automatic lifetime / lifetime tag
Implemented on the auto-lifetime
branch
This solution is inspired by neon
's handles.
This lets any valid code compile (and allowed me to test my implementation) but is too loose and allows invalid code. It allows Gc
to outlive the gc_scope
that created it. (Violates the first property)
The idea here is that I add a single lifetime 'gc
to all my structs. The idea is that this lifetime represents "how long gc_scope lives".
// A smart pointer for `T` valid during `'gc`
pub struct Gc<'gc, T: Trace + 'gc> {
pub ptr: NonNull<GcBox<T>>,
pub phantom: PhantomData<&'gc T>,
pub rooted: Cell<bool>,
}
I call it automatic lifetimes because the methods never mix these struct lifetimes with the lifetime of the references they receive.
Here is the impl for gc_scope.alloc:
impl<'gc> GcScope<'gc> {
// ...
pub fn alloc<T: Trace + 'gc>(&self, value: T) -> Result<Gc<'gc, T>, GcAllocErr> {
// ...
}
}
Inner/outer lifetimes
Implemented on the inner-outer
branch
This implementation tries to fix the previous issue by relating Gc
to the lifetime of GcScope
. It is overly constrained and prevents the creation of cycles. This violates the last property.
To constrain Gc
relative to its GcScope
, I introduce two lifetimes: 'inner
is the lifetime of GcScope
and the result is Gc<'inner, T>
. 'outer
represents a lifetime longer than 'inner
and is used for the allocated value.
Here is the alloc signature:
impl<'outer> GcScope<'outer> {
// ...
pub fn alloc<'inner, T: Trace + 'outer>(
&'inner self,
value: T,
) -> Result<Gc<'inner, T>, GcAllocErr> {
// ...
}
// ...
}
Closure (context management)
Implemented on the with
branch
Another idea was to not let the user create a GcScope
manually with GcScope::new
but instead expose a function GcScope::with(executor)
providing a reference to the gc_scope
. The closure executor
corresponds to the gc_scope
. So far, it either prevents the use of external references or allows to leak data to external Gc
variables (first and fourth properties).
Here is the alloc signature:
impl<'gc> GcScope<'gc> {
// ...
pub fn alloc<T: Trace + 'gc>(&self, value: T) -> Result<Gc<'gc, T>, GcAllocErr> {
// ...
}
}
Here is a usage example showing the violation of the first property:
let message = GcScope::with(|scope| {
scope
.alloc(NamedObject {
name: String::from("Hello, World!"),
})
.unwrap()
});
println!("{}", message.name);
What I'd like
From what I understand, the alloc
signature I'd like is:
impl<'gc> GcScope<'gc> {
pub fn alloc<T: Trace + 'gc>(&'gc self, value: T) -> Result<Gc<'gc, T>, GcAllocErr> {
// ...
}
}
Where everything lives as long or longer than self
(the gc_scope
). But this blows up with the most simple tests:
fn test_gc() {
let scope: GcScope = GcScope::new();
scope.alloc(String::from("Hello, World!")).unwrap();
}
causes
error[E0597]: `scope` does not live long enough
--> src/test.rs:50:3
|
50 | scope.alloc(String::from("Hello, World!")).unwrap();
| ^^^^^ borrowed value does not live long enough
51 | }
| - `scope` dropped here while still borrowed
|
= note: values in a scope are dropped in the opposite order they are created
I have no idea what happens here. Playground link
Edit: As explained to me on IRC, this is because I implement Drop
which requires &mut self
, but the scope
is already borrowed in read-only mode.
Overview
Here is a quick overview of the main components of my library.
GcScope
contains a RefCell
to its mutable state. This was introduced to not require &mut self
for alloc
because it "locked" the gc_scope and violated property 3: allocate multiple values.
This mutable state is GcState
. It keeps track of all the allocated values. The values are stored as a forward-only linked list of GcBox
. This GcBox
is heap-allocated and contains the actual value with some metadata (how many active Gc
pointers have it as their root and a boolean flag used to check if the value is reachable from the root (see rust-gc). The value here must outlive its gc_scope
so GcBox
uses a lifetime, and in turn GcState
must then use a lifetime as well as GcScope
: this is always the same lifetime meaning "longer than gc_scope
". The fact that GcScope
has a RefCell
(interior mutability) and lifetime is maybe the reason why I can't get my lifetimes to work (it causes invariance?).
Gc
is a smart pointer to some gc_scope
-allocated data. You can only get it through gc_scope.alloc
or by cloning it.
GcRefCell
is most likely fine, it's just a RefCell
wrapper adding metadata and behavior to properly support borrows.
Flexibility
I'm fine with the following requirements to get a solution:
- unsafe code
- nightly features
- API changes (see for example my
with
approach). What matters is that I can create a temporary zone where I can manipulate garbage-collected values and that they are all dropped after this. These garbage-collected values need to be able to access longer-lived (but not static) variables outside of the scope.
The repository has a few tests in scoped-gc/src/lib.rs
(compile-fail) as scoped-gc/src/test.rs
.
I found a solution, I'll post it once redacted.