
I'm writing a Rust wrapper for a (mostly C-style) C++ plug-in SDK. The plug-in host is a graphical desktop application that runs an event loop. The plug-in is regularly called as part of that event loop. Whenever this happens, the plug-in has control and can call arbitrary host functions.

One C function which I want to wrap returns a raw pointer. Right after that function returns, the pointer is guaranteed to be a valid C string, so it is safe to dereference it. However, after the plug-in callback returns (thus giving back control to the host), the pointer can become stale. How can I write an ergonomic function wrapper for this which will not result in undefined behavior at some point, e.g. when the consumer tries to access the string in the next event loop cycle?

I've thought about the following approaches:

1. Return an owned string

I could immediately dereference the pointer and copy the content into an owned CString:

pub fn get_string_from_host() -> CString {
    let ptr: *const c_char = unsafe { ffi.get_string() };
    unsafe { CStr::from_ptr(ptr).to_owned() }
}

This is presumptuous: the consumer of my wrapper may not want an owned string at all because they just want to make a comparison (I would say that is even the primary use case). Copying the string would then be a total waste.

2. Return the raw pointer

pub fn get_string_from_host() -> *const c_char {
    unsafe { ffi.get_string() }
}

This just shifts the problem to the consumer.

3. Return a CStr reference (unsafe method)

pub unsafe fn get_string_from_host<'a>() -> &'a CStr {
    let ptr: *const c_char = ffi.get_string();
    CStr::from_ptr(ptr)
}

This is unsafe because the lifetime of the reference is not accurate. Accessing the reference at a later point in time can result in undefined behavior. Another way of shifting the problem to the consumer.

4. Take a closure instead of returning something

pub fn with_string_from_host<T>(f: impl Fn(&CStr) -> T) -> T {
    let ptr: *const c_char = unsafe { ffi.get_string() };
    f(unsafe { CStr::from_ptr(ptr) })
}

pub fn consuming_function() {
    let length = with_string_from_host(|s| s.to_bytes().len());
}

This works but takes some getting used to.


None of these solutions are really satisfying.

Is there a way to make sure a return value is used "immediately", meaning that it is not stored anywhere and never escapes the caller's scope?

This sounds like a job for references/lifetimes, but I'm not aware of any lifetime annotation which means something like "valid just in the current stack frame". If there were one, I would use it (just for illustration):

pub fn get_string_from_host() -> &'??? CStr {
    let ptr: *const c_char = unsafe { ffi.get_string() };
    unsafe { CStr::from_ptr(ptr) }
}

pub fn consuming_function() {
    // For example, this shouldn't be possible in this case
    let prolonged: &'static CStr = get_string_from_host();
    // But this should
    let owned = get_string_from_host().to_owned();
}

If that value becomes invalidated after some other operation occurs, it's the responsibility of the caller to use it correctly. "Immediate" may require passing it to a function for processing. This function shouldn't dictate rules that don't matter. To eliminate any possible invalidation situations you'd have to make a copy and pass that back where it can be used indefinitely. – tadman

@tadman Because if this function has the guarantee that its return value will never leave the caller's scope, it can safely return the string as a reference (which points to C memory). It can vouch for the fact that the reference stays valid within that very short lifetime. Therefore it wouldn't have to be marked as unsafe, which is very good for API ergonomics. – helgoboss

How much of a performance hit do you take by making a copy? Could you also consider what you're doing with this data and perhaps write wrapper functions that perform those operations safely, abstracting this particular issue away? – tadman

@tadman To be fair, the performance hit is probably negligible for many real-world use cases since those strings are rather small and this function is not called from a real-time thread. But it's a generic library, so I can't predict all use cases. For the same reason I also find it hard to predict what consumers are going to do with that data and to write perfectly tailored wrapper functions. Plus, it's not just this one function; many others follow the same pattern. So if there's a better way, I would definitely prefer that. – helgoboss

I think you're stuck here with a choice among three options: shackling the caller in ways that are really onerous, making the limitations of this result known (unsafe), or making a safe copy that can be used however the caller likes. The copy, if cheap, is surely the best approach. – tadman

1 Answer


Your question and the comments lay out your options. It mostly comes down to meeting other people's expectations, that is, the rule of least surprise. This argues for returning an owned string. As noted before, the owned string involves a copy, which will have a negligible performance impact unless the function is called a gazillion times in a loop.
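If "owned" is taken to mean a Rust String rather than the CString from option 1, the copy could look roughly like the sketch below. Note that `mock_host_get_string` is a hypothetical stand-in for the question's `ffi.get_string()`, and replacing invalid UTF-8 via `to_string_lossy` is an assumption about what callers would want, not something the SDK prescribes.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

// Hypothetical stand-in for the host's `ffi.get_string()`.
// A real host may invalidate the pointer once the plug-in callback returns.
fn mock_host_get_string() -> *const c_char {
    static HOST_BUFFER: &[u8] = b"track 1\0";
    HOST_BUFFER.as_ptr() as *const c_char
}

/// Copies the host string into an owned `String` right away,
/// so the result stays valid across event loop cycles.
pub fn get_string_from_host() -> String {
    let ptr = mock_host_get_string();
    // SAFETY (assumed host contract): right after the call, `ptr` is non-null
    // and points to a valid NUL-terminated string.
    let borrowed = unsafe { CStr::from_ptr(ptr) };
    // Invalid UTF-8 sequences are replaced with U+FFFD instead of failing.
    borrowed.to_string_lossy().into_owned()
}
```

The caller can keep the returned value for as long as it likes, at the cost of one allocation per call.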

I'd strongly advise against the raw-pointer and CStr-reference solutions, which are foot-guns.

Personally, I'd go with the closure, as it models the actual situation: the code accessing the string has to move to where the string is; we can't allow the string to move to where that code is (a place which, as far as we know, even the caller may not control). The closure solution should allow you to have your cake and eat it too: the closure of type impl Fn(&CStr) -> T can be |s| s.to_owned(), making with_string_from_host return a copy if so desired.
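A minimal sketch of that idea, under a few assumptions: `mock_host_get_string` is a hypothetical stand-in for `ffi.get_string()`, the helper names `is_named` and `owned_copy` are made up for illustration, and the bound is `FnOnce` rather than the `Fn` from the question (every `Fn` closure is also `FnOnce`, so this only accepts more closures).

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

// Hypothetical stand-in for the host's `ffi.get_string()`.
// A real host may invalidate the pointer once the plug-in callback returns.
fn mock_host_get_string() -> *const c_char {
    static HOST_BUFFER: &[u8] = b"track 1\0";
    HOST_BUFFER.as_ptr() as *const c_char
}

/// Runs `f` on the host string while the pointer is still valid.
pub fn with_string_from_host<T>(f: impl FnOnce(&CStr) -> T) -> T {
    let ptr = mock_host_get_string();
    // SAFETY (assumed host contract): `ptr` is non-null and points to a valid
    // NUL-terminated string until control returns to the host.
    let s = unsafe { CStr::from_ptr(ptr) };
    f(s)
}

/// Comparison without copying: the borrowed string never leaves the closure.
pub fn is_named(expected: &str) -> bool {
    with_string_from_host(|s| s.to_bytes() == expected.as_bytes())
}

/// Opting in to a copy when the value has to outlive the call.
pub fn owned_copy() -> CString {
    with_string_from_host(|s| s.to_owned())
}
```

Because the closure has to accept a `&CStr` of any lifetime while `T` is fixed by the caller, the borrow cannot be part of the return value: something like `with_string_from_host(|s| s)` is rejected by the compiler. That is about as close as safe Rust gets to "valid just in this scope".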