4 votes

I find the concept of PhantomData in Rust quite confusing. I use it extensively to constrain object lifetimes in my FFI-based code, and I am still not sure whether I do it correctly or not.

Here's a contrived sample of how I often end up using it. For example, I do not want an instance of MyStruct to outlive an instance of Context:

use std::marker::PhantomData;

// FFI declarations and types
mod ffi {
    use std::ffi::c_void;
    pub type handle_t = *const c_void;
    // ...
}

// A wrapper structure for some context created and maintained
// inside the C library
struct Context {
    // ...
}

// Handle is only valid as long as the Context is alive.
// Hence, I use the PhantomData marker to constrain its lifetime.
struct MyStruct<'a> {
    marker: PhantomData<&'a Context>,
    handle: ffi::handle_t,
}

impl<'a> MyStruct<'a> {
    fn new(context: &'a Context) -> Self {
        let handle: ffi::handle_t = context.new_handle();
        MyStruct {
            marker: PhantomData,
            handle
        }
    }
}

fn main() {
    // Initialize the context somewhere inside the C library
    let ctx = Context::new(unsafe {ffi::create_context()});

    // Create an instance of MyStruct
    let my_struct = MyStruct::new(&ctx);

    // ...
}

I don't quite understand the following:

  1. What exactly is this marker: PhantomData thing, syntactically? I mean, it does not look like a constructor, which I'd expect to be something like PhantomData{} or PhantomData().

  2. For the purposes of lifetime tracking, does PhantomData even care about the actual type in the declaration of marker? I tried changing it to PhantomData<&'a usize>, and it still worked.

  3. In the declaration of my MyStruct::new() method, if I forget to explicitly specify the 'a lifetime for the context argument, the magic of PhantomData disappears, and it becomes OK to drop Context before MyStruct. This is quite insidious; the compiler does not even give a warning. What lifetime does it assign to marker then, and why?

  4. Related to the previous question; if there are multiple input reference arguments with potentially different lifetimes, how does PhantomData determine which lifetime to use?

There are too many questions in this question. - mcarton
@mcarton I felt that these questions stemmed from a single gross misunderstanding of what PhantomData actually is, so I put them all in a single question. - Roman Dmitrienko
The Rustonomicon has a good answer to this question: doc.rust-lang.org/nomicon/phantom-data.html - BallpointBen

1 Answer

6 votes

What exactly is this marker: PhantomData thing, syntactically? I mean, it does not look like a constructor, which I'd expect to be something like PhantomData{} or PhantomData().

You can define a zero-field struct, like this:

struct Foo;

And create an instance of it like this:

let foo: Foo = Foo;

Both the type and the value are named Foo.
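As a small sketch of the same idea applied to PhantomData itself: the standard library declares it as a generic unit struct, so the bare name PhantomData is already a value, and the type parameter is usually inferred from the field or binding it is assigned to. The explicit PhantomData::<u64> spelling below is only there for comparison.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

// A unit struct: the type and its only value share one name.
struct Foo;

fn main() {
    let _foo: Foo = Foo;

    // PhantomData<T> is a generic unit struct, so the bare name is a value.
    // Here T is inferred from the type annotation.
    let inferred: PhantomData<u64> = PhantomData;

    // The same value with the type parameter written out explicitly.
    let explicit = PhantomData::<u64>;

    assert_eq!(inferred, explicit);

    // Neither type occupies any space at runtime.
    assert_eq!(size_of::<Foo>(), 0);
    assert_eq!(size_of::<PhantomData<u64>>(), 0);
}
```

This is why `marker: PhantomData` in a struct literal needs no braces or parentheses.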

For the purposes of lifetime tracking, does PhantomData even care about the actual type in the declaration of marker? I tried changing it to PhantomData<&'a usize>, and it still worked.

There is nothing special about PhantomData except that it is not an error that its type argument is unused (see the source). This behaviour is enabled with the #[lang = "phantom_data"] attribute, which is just a hook in the compiler for this purpose.
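To make this more concrete, here is a minimal sketch with a hypothetical Tagged<T> type (not from the question). Without the marker field the struct is rejected with error E0392 because T is unused. With the marker, the compiler treats the struct as if it held a T. For lifetime tracking alone any &'a T ties the struct to 'a in the same way, which is presumably why PhantomData<&'a usize> still worked. The chosen type does still matter for other properties, such as auto traits like Send:

```rust
use std::marker::PhantomData;
use std::mem::size_of;

// Without the marker field this declaration is rejected (error E0392),
// because the parameter T would be unused:
//     struct Tagged<T> { raw: usize }
struct Tagged<T> {
    raw: usize,
    marker: PhantomData<T>,
}

// Compiles only for types that are Send.
fn assert_send<T: Send>() {}

fn main() {
    // Tagged<u8> is treated as if it owned a u8, so it is Send.
    assert_send::<Tagged<u8>>();

    // Tagged<*const u8> is treated as if it held a raw pointer, which is
    // not Send, so uncommenting the next line fails to compile.
    // assert_send::<Tagged<*const u8>>();

    // The marker adds no size to the struct.
    assert_eq!(size_of::<Tagged<u8>>(), size_of::<usize>());

    let t = Tagged::<u8> { raw: 7, marker: PhantomData };
    assert_eq!(t.raw, 7);
}
```

For an FFI wrapper, then, PhantomData<&'a Context> is the more honest choice, since it describes what the handle actually depends on.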

In the declaration of my MyStruct::new() method, if I forget to explicitly specify the 'a lifetime for the context argument, the magic of PhantomData disappears, and it becomes OK to drop Context before MyStruct. This is quite insidious; the compiler does not even give a warning. What lifetime does it assign to marker then, and why?

PhantomData is there to let you tell the compiler information that it cannot infer itself, because the information is about a type that you are not directly using. It's up to you to give the compiler the correct information.
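A sketch of what probably happens when the 'a is left off the argument, using a stand-in Context with no FFI behind it and a plain usize in place of the real handle (both are assumptions for illustration). Inside impl<'a> MyStruct<'a>, Self means MyStruct<'a>. An argument written as &Context gets a fresh elided lifetime that has no connection to 'a, so nothing constrains 'a and the caller may pick any lifetime, including 'static:

```rust
use std::marker::PhantomData;

// Stand-in for the real FFI-backed context.
struct Context;

struct MyStruct<'a> {
    marker: PhantomData<&'a Context>,
    handle: usize, // stand-in for ffi::handle_t
}

impl<'a> MyStruct<'a> {
    // Correct: the borrow of `context` lasts for the struct's own 'a.
    fn new(_context: &'a Context) -> Self {
        MyStruct { marker: PhantomData, handle: 1 }
    }

    // Incorrect: the elided lifetime of `_context` is unrelated to 'a,
    // so the returned value does not borrow from the context at all.
    fn new_unbound(_context: &Context) -> Self {
        MyStruct { marker: PhantomData, handle: 2 }
    }
}

fn main() {
    let ctx = Context;
    // Nothing constrains 'a, so even 'static is accepted here.
    let loose: MyStruct<'static> = MyStruct::new_unbound(&ctx);
    drop(ctx); // compiles: `loose` is not tied to `ctx`
    println!("loose handle: {}", loose.handle);

    let ctx2 = Context;
    let tied = MyStruct::new(&ctx2);
    // drop(ctx2); // would not compile: `ctx2` is still borrowed by `tied`
    println!("tied handle: {}", tied.handle);
}
```

The compiler accepts new_unbound silently because it is a perfectly valid signature; it just says something different from what was intended.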

Related to the previous question; if there are multiple input reference arguments with potentially different lifetimes, how does PhantomData determine which lifetime to use?

I'm not completely sure if I understand this question. PhantomData doesn't do anything - it's just a way to communicate to the compiler that you are using data in a certain way and it's up to you to express that information accurately. Note that, even if you express the constraints incorrectly, it is only possible to introduce memory unsafety if you also have unsafe code. Correctly expressing lifetimes in PhantomData is a part of creating a safe abstraction around unsafe code.
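On the case of several reference arguments with different lifetimes, a minimal sketch may help. The Config type and make_handle function below are hypothetical, and Context is a stand-in with no FFI behind it. PhantomData does not choose anything: whichever argument is annotated with the struct's lifetime parameter is the one the result borrows from, and the other arguments are borrowed only for the duration of the call.

```rust
use std::marker::PhantomData;

// Stand-in for the real FFI-backed context.
struct Context;

// Hypothetical second input that is only read during construction.
struct Config {
    verbose: bool,
}

struct MyStruct<'ctx> {
    marker: PhantomData<&'ctx Context>,
    verbose: bool,
}

impl<'ctx> MyStruct<'ctx> {
    // Only `context` carries 'ctx. `config` has its own elided lifetime,
    // so the returned value does not keep `config` borrowed.
    fn new(_context: &'ctx Context, config: &Config) -> Self {
        MyStruct { marker: PhantomData, verbose: config.verbose }
    }
}

// `config` is a local that goes away when the function returns, yet the
// result can still be returned because it borrows only from `ctx`.
fn make_handle(ctx: &Context) -> MyStruct<'_> {
    let config = Config { verbose: true };
    MyStruct::new(ctx, &config)
}

fn main() {
    let ctx = Context;
    let handle = make_handle(&ctx);
    assert!(handle.verbose);
}
```

If the struct really depended on both inputs, it could take two lifetime parameters, or both arguments could share one.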