80
votes

I'm trying to get a C string returned by a C library and convert it to a Rust string via FFI.

mylib.c

const char* hello(){
    return "Hello World!";
}

main.rs

#![feature(link_args)]

extern crate libc;
use libc::c_char;

#[link_args = "-L . -I . -lmylib"]
extern {
    fn hello() -> *c_char;
}

fn main() {
    //how do I get a str representation of hello() here?
}
2

2 Answers

120
votes

The best way to work with C strings in Rust is to use structures from the std::ffi module, namely CStr and CString.

CStr is a dynamically sized type and so it can only be used through a pointer. This makes it very similar to the regular str type. You can construct a &CStr from *const c_char using an unsafe CStr::from_ptr static method. This method is unsafe because there is no guarantee that the raw pointer you pass to it is valid, that it really does point to a valid C string and that the string's lifetime is correct.

You can get a &str from a &CStr using its to_str() method.

Here is an example:

extern crate libc;

use libc::c_char;
use std::ffi::CStr;
use std::str;

extern {
    fn hello() -> *const c_char;
}

fn main() {
    let c_buf: *const c_char = unsafe { hello() };
    let c_str: &CStr = unsafe { CStr::from_ptr(c_buf) };
    let str_slice: &str = c_str.to_str().unwrap();
    let str_buf: String = str_slice.to_owned();  // if necessary
}

You need to take into account the lifetime of your *const c_char pointers and who owns them. Depending on the C API, you may need to call a special deallocation function on the string. You need to carefully arrange conversions so the slices won't outlive the pointer. The fact that CStr::from_ptr returns a &CStr with arbitrary lifetime helps here (though it is dangerous by itself); for example, you can encapsulate your C string into a structure and provide a Deref conversion so you can use your struct as if it was a string slice:

extern crate libc;

use libc::c_char;
use std::ops::Deref;
use std::ffi::CStr;

extern "C" {
    fn hello() -> *const c_char;
    fn goodbye(s: *const c_char);
}

struct Greeting {
    message: *const c_char,
}

impl Drop for Greeting {
    fn drop(&mut self) {
        unsafe {
            goodbye(self.message);
        }
    }
}

impl Greeting {
    fn new() -> Greeting {
        Greeting { message: unsafe { hello() } }
    }
}

impl Deref for Greeting {
    type Target = str;

    fn deref<'a>(&'a self) -> &'a str {
        let c_str = unsafe { CStr::from_ptr(self.message) };
        c_str.to_str().unwrap()
    }
}

There is also another type in this module called CString. It has the same relationship with CStr as String with str - CString is an owned version of CStr. This means that it "holds" the handle to the allocation of the byte data, and dropping CString would free the memory it provides (essentially, CString wraps Vec<u8>, and it's the latter that will be dropped). Consequently, it is useful when you want to expose the data allocated in Rust as a C string.

Unfortunately, C strings always end with the zero byte and can't contain one inside them, while Rust &[u8]/Vec<u8> are exactly the opposite thing - they do not end with zero byte and can contain arbitrary numbers of them inside. This means that going from Vec<u8> to CString is neither error-free nor allocation-free - the CString constructor both checks for zeros inside the data you provide, returning an error if it finds some, and appends a zero byte to the end of the byte vector which may require its reallocation.

Like String, which implements Deref<Target = str>, CString implements Deref<Target = CStr>, so you can call methods defined on CStr directly on CString. This is important because the as_ptr() method that returns the *const c_char necessary for C interoperation is defined on CStr. You can call this method directly on CString values, which is convenient.

CString can be created from everything which can be converted to Vec<u8>. String, &str, Vec<u8> and &[u8] are valid arguments for the constructor function, CString::new(). Naturally, if you pass a byte slice or a string slice, a new allocation will be created, while Vec<u8> or String will be consumed.

extern crate libc;

use libc::c_char;
use std::ffi::CString;

fn main() {
    let c_str_1 = CString::new("hello").unwrap(); // from a &str, creates a new allocation
    let c_str_2 = CString::new(b"world" as &[u8]).unwrap(); // from a &[u8], creates a new allocation
    let data: Vec<u8> = b"12345678".to_vec(); // from a Vec<u8>, consumes it
    let c_str_3 = CString::new(data).unwrap();

    // and now you can obtain a pointer to a valid zero-terminated string
    // make sure you don't use it after c_str_2 is dropped
    let c_ptr: *const c_char = c_str_2.as_ptr();

    // the following will print an error message because the source data
    // contains zero bytes
    let data: Vec<u8> = vec![1, 2, 3, 0, 4, 5, 0, 6];
    match CString::new(data) {
        Ok(c_str_4) => println!("Got a C string: {:p}", c_str_4.as_ptr()),
        Err(e) => println!("Error getting a C string: {}", e),
    }  
}

If you need to transfer ownership of the CString to C code, you can call CString::into_raw. You are then required to get the pointer back and free it in Rust; the Rust allocator is unlikely to be the same as the allocator used by malloc and free. All you need to do is call CString::from_raw and then allow the string to be dropped normally.

3
votes

In addition to what @vladimir-matveev has said, you can also convert between them without the aid of CStr or CString:

#![feature(link_args)]

extern crate libc;
use libc::{c_char, puts, strlen};
use std::{slice, str};

#[link_args = "-L . -I . -lmylib"]
extern "C" {
    fn hello() -> *const c_char;
}

fn main() {
    //converting a C string into a Rust string:
    let s = unsafe {
        let c_s = hello();
        str::from_utf8_unchecked(slice::from_raw_parts(c_s as *const u8, strlen(c_s)+1))
    };
    println!("s == {:?}", s);
    //and back:
    unsafe {
        puts(s.as_ptr() as *const c_char);
    }
}

Just make sure that when converting from a &str to a C string, your &str ends with '\0'. Notice that in the code above I use strlen(c_s)+1 instead of strlen(c_s), so s is "Hello World!\0", not just "Hello World!".
(Of course in this particular case it works even with just strlen(c_s). But with a fresh &str you couldn't guarantee that the resulting C string would terminate where expected.)
Here's the result of running the code:

s == "Hello World!\u{0}"
Hello World!