Is it possible to approximate the size of a derived table (in kb/mb/gb etc) in a Spark SQL query ? I don't need the exact size but an approximate value will do, which would allow me to plan my queries better by determining if a table could be broadcast in a join, or if using a filtered subquery in a Join will be better than using the entire table etc.
For e.g. in the following query, is it possible to approximate the size (in MB) of the derived table named b ? This will help me figure out if it will be better to use the derived table in the Join vs using the entire table with the filter outside -
select
a.id, b.name, b.cust
from a
left join (select id, name, cust
from tbl
where size > 100
) b
on a.id = b.id
We use Spark SQL 2.4. Any comments appreciated.