2
votes

My program parses a fairly large JSON document (30 MB). On a machine with a slow CPU this takes 70 ms, and I want to speed it up. I found out that 27% of the parsing time is spent in my foo_document_type_deserialize function. Is it possible to improve this function? Maybe there is a way to skip the String allocation in this line: let s = String::deserialize(deserializer)?;

I am completely sure that the strings representing enum values don't contain special JSON characters such as \b \f \n \r \t \" \\, so it should be safe to work with the unescaped string.

use serde::{Deserialize, Deserializer};

#[derive(Deserialize, Debug, Clone)]
#[serde(rename_all = "camelCase")]
pub struct FooDocument {
    // other fields...
    #[serde(rename = "type")]
    #[serde(deserialize_with = "foo_document_type_deserialize")]
    doc_type: FooDocumentType,
}

fn foo_document_type_deserialize<'de, D>(deserializer: D) -> Result<FooDocumentType, D::Error>
where
    D: Deserializer<'de>,
{
    use self::FooDocumentType::*;
    let s = String::deserialize(deserializer)?;
    match s.as_str() {
        "tir lim bom bom" => Ok(Var1),
        "hgga;hghau" => Ok(Var2),
        "hgueoqtyhit4t" => Ok(Var3),
        "Text" | "Type not detected" | "---" => Ok(Unknown),
        _ => Err(serde::de::Error::custom(format!(
            "Unsupported foo document type '{}'",
            s
        ))),
    }
}

#[derive(Debug, Clone, Copy)]
pub enum FooDocumentType {
    Unknown,
    Var1,
    Var2,
    Var3,
}
1
I'm afraid (but no expert) that there isn't much you can do. Did you compile your program with --release? - hellow
@hellow yes, and I replaced reading from a File with fs::read_to_string + serde_json::from_str, because of github.com/serde-rs/json/issues/160. It is still too slow. - user1244932
As you mention in your question, allocating a new String for each deserialization is likely contributing to the performance issue here. Serde provides an impl Deserialize for &str. Can you use that instead of the String impl? - Wesley Wiser
Instead of let s = String::deserialize(deserializer)?;, try this: let s = <&str as Deserialize<'de>>::deserialize(deserializer)?; - Wesley Wiser

1 Answer

6
votes

The custom impl you've written is in a form that serde_derive can generate:

#[derive(Deserialize, Debug, Clone, Copy)]
pub enum FooDocumentType {
    #[serde(rename = "Text", alias = "Type not detected", alias = "---")]
    Unknown,
    #[serde(rename = "tir lim bom bom")]
    Var1,
    #[serde(rename = "hgga;hghau")]
    Var2,
    #[serde(rename = "hgueoqtyhit4t")]
    Var3,
}

The resulting derived code does not allocate memory and is about 2× faster in a quick microbenchmark compared to your code when I measure the following:

serde_json::from_str::<FooDocument>(r#"{"type":"hgga;hghau"}"#).unwrap()