I'm working on a Serde serializer in Rust for a JSON-like binary format.

It uses different binary encodings for sequences depending on whether each element is the same size, or whether it's a sequence with mixed element sizes.

E.g.:

  • [1, 2, 3]: serializes using a compact encoding, as all elements serialize to the same byte length
  • [1, "two", ["a", 1]]: serializes using a different encoding, because the elements serialize to different byte lengths
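
A rough way to express that rule, independent of Serde, is a check over elements that have already been encoded. This is only a sketch: the names `ArrayEncoding` and `choose_encoding` and the byte values in `main` are made up for illustration and are not part of the format.

```rust
/// Hypothetical label for the two sequence layouts described above.
#[derive(Debug, PartialEq)]
enum ArrayEncoding {
    /// Every element encodes to the same number of bytes.
    SameLength,
    /// At least two elements differ in encoded length.
    VariableLength,
}

/// Picks a layout by comparing each element's encoded length with its
/// neighbour's. An empty or single-element slice counts as uniform here.
fn choose_encoding(encoded_items: &[Vec<u8>]) -> ArrayEncoding {
    let uniform = encoded_items
        .windows(2)
        .all(|pair| pair[0].len() == pair[1].len());
    if uniform {
        ArrayEncoding::SameLength
    } else {
        ArrayEncoding::VariableLength
    }
}

fn main() {
    // Made-up encodings: three one-byte integers.
    let ints = vec![vec![1u8], vec![2], vec![3]];
    // Made-up encodings whose lengths differ.
    let mixed = vec![vec![1u8], vec![b't', b'w', b'o'], vec![0, 1, 2, 3, 4]];
    println!("{:?}", choose_encoding(&ints));
    println!("{:?}", choose_encoding(&mixed));
}
```

The point is that the decision needs the encoded length of every element, which is why it cannot be made when the sequence starts.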

From what I understand of Serde, it serializes a sequence one element at a time, and only the sequence length is (optionally) known at the start of sequence serialization, via the call to serialize_seq on a Serializer.

Is there a good pattern for dealing with cases such as the above, where a sequence can't be serialized until all of its elements have been inspected (and serialized, to know their byte lengths)? It also needs to cope with nested sequences, as in the second example above.

You could store all elements in your SerializeSeq instance until the call to end, where you would do the actual serialization. - mcarton

1 Answer

Solved using the suggestion from mcarton about storing elements in the sequence serializer.

Each element is serialized into its own byte buffer, and the buffers are collected in a Vec (i.e. a Vec<Vec<u8>>). When the sequence ends, the serializer checks whether all buffers have the same length before writing anything to the output.

This looks something like:

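/// Main top level serializer.
/// `Default` is derived so that `ArraySerializer` can create a fresh
/// serializer for each element via `Serializer::default()`.
#[derive(Default)]
pub struct Serializer {
    output: Vec<u8>,
}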

impl<'a> ser::Serializer for &'a mut Serializer {
    // ... skipped ...

    type SerializeSeq = ArraySerializer<'a>;

    // The length hint is not needed, since elements are buffered until `end`.
    fn serialize_seq(self, _len: Option<usize>) -> Result<Self::SerializeSeq> {
        let array_ser = ArraySerializer {
            items: Vec::new(),
            output: &mut self.output,
        };
        Ok(array_ser)
    }

    // ... skipped ...
}


pub struct ArraySerializer<'a> {
    /// Temporary storage for individual serialized array elements.
    items: Vec<Vec<u8>>,

    /// Storage for final serialized output of header plus all elements. This is
    /// typically a reference to the full output buffer being serialized into.
    output: &'a mut Vec<u8>,
}

impl<'a> ser::SerializeSeq for ArraySerializer<'a> {
    type Ok = ();
    type Error = Error;

    fn serialize_element<T>(&mut self, value: &T) -> Result<()>
    where
        T: ?Sized + Serialize,
    {
        // default serializer used for serializing array elements
        let mut serializer = Serializer::default();

        // serialize individual item and add to `items`
        value.serialize(&mut serializer)?;
        self.items.push(serializer.output);
        Ok(())
    }

    fn end(self) -> Result<Self::Ok> {
        if self.items.is_empty() {
            self.output.push(EMPTY_ARRAY_HEADER);
            return Ok(());
        }

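        // Compare every element's encoded length against the first one.
        let first_len = self.items[0].len();
        let all_elems_same_length = self.items
            .iter()
            .all(|item| item.len() == first_len);

        if all_elems_same_length {
            self.output.push(SAME_LENGTH_HEADER);
            // `self` is not mutable here, so copy the buffered bytes rather
            // than draining them with `iter_mut` and `append`.
            for item in &self.items {
                self.output.extend_from_slice(item);
            }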
        } else {
            self.output.push(VARIABLE_LENGTH_HEADER);

            // ... skipped: encode rest of items using more complicated serialization ...
        }

        Ok(())
    }
}
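
Nested sequences work with this approach because each element is serialized by a fresh serializer, so an inner sequence buffers and finishes its own items before the outer sequence ever sees its bytes. The following std-only model, with no Serde involved, is a sketch of that behaviour. Everything in it is an assumption made for illustration: the `Value` enum, the header byte values, the naive scalar encodings, and in particular the one-length-byte-per-element layout in the variable-length branch, which stands in for the part skipped above.

```rust
// Hypothetical header bytes; the real format's values are not given here.
const EMPTY_ARRAY_HEADER: u8 = 0x00;
const SAME_LENGTH_HEADER: u8 = 0x01;
const VARIABLE_LENGTH_HEADER: u8 = 0x02;

/// A tiny stand-in for the values Serde would hand to the serializer.
enum Value {
    Int(u8),
    Str(&'static str),
    Seq(Vec<Value>),
}

/// Encodes `value` by appending bytes to `output`.
fn encode(value: &Value, output: &mut Vec<u8>) {
    match value {
        // Naive scalar encodings, chosen only to keep the model small.
        Value::Int(n) => output.push(*n),
        Value::Str(s) => output.extend_from_slice(s.as_bytes()),
        Value::Seq(elements) => {
            // Buffer every element separately, like `ArraySerializer::items`.
            // A nested sequence recurses here and gets its own buffers.
            let items: Vec<Vec<u8>> = elements
                .iter()
                .map(|element| {
                    let mut buffer = Vec::new();
                    encode(element, &mut buffer);
                    buffer
                })
                .collect();
            end_seq(&items, output);
        }
    }
}

/// Mirrors `end`: the header is chosen only after all elements are buffered.
fn end_seq(items: &[Vec<u8>], output: &mut Vec<u8>) {
    if items.is_empty() {
        output.push(EMPTY_ARRAY_HEADER);
        return;
    }
    let first_len = items[0].len();
    if items.iter().all(|item| item.len() == first_len) {
        output.push(SAME_LENGTH_HEADER);
        for item in items {
            output.extend_from_slice(item);
        }
    } else {
        output.push(VARIABLE_LENGTH_HEADER);
        // Assumed layout, for illustration only: one length byte per element.
        for item in items {
            output.push(item.len() as u8);
            output.extend_from_slice(item);
        }
    }
}

/// Convenience wrapper returning the encoded bytes.
fn to_bytes(value: &Value) -> Vec<u8> {
    let mut output = Vec::new();
    encode(value, &mut output);
    output
}

fn main() {
    let uniform = Value::Seq(vec![Value::Int(1), Value::Int(2), Value::Int(3)]);
    let mixed = Value::Seq(vec![
        Value::Int(1),
        Value::Str("two"),
        Value::Seq(vec![Value::Str("a"), Value::Int(1)]),
    ]);
    println!("{:?}", to_bytes(&uniform));
    println!("{:?}", to_bytes(&mixed));
}
```

The cost of this pattern is one temporary allocation per element at every nesting level, so it trades memory for the ability to pick the header late.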