What are the advantages of SpecificMutableRow in Spark SQL?

Question

From comments it seems:

A parent class for mutable container objects that are reused when the values are changed, resulting in less garbage.

and

A row type that holds an array specialized container objects, of type MutableValue, chosen based on the dataTypes of each column. The intent is to decrease garbage when modifying the values of primitive columns.

Source - https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SpecificMutableRow.scala

Can anyone explain how it is efficent? Is just the fact that it avoids boxing enough?

Michael Armbrust Michael Armbrust · Accepted Answer · 2015-03-25T00:33:15

It avoids boxing in storage, but this is mostly helpful when combined with another specific interface (i.e. the parquet reader or code generated expression evaluation).

The other advantage is that it is reused (unlike generic row). So for many operations it can operate on a range of data without allocating any objects.

What are the advantages of SpecificMutableRow in Spark SQL?

1 Answers