I see really weird behaviour when trying to do EXPORT TO PARQUET from Vertica to local drive. See example:
When I do
EXPORT TO PARQUET (directory = '/data') over (partition by install_date)
AS select field1, field2 from table where install_date >= 'some_date';
Vertica says:
Rows Exported
---------------
16 405 136
But when I read result parquet (using pyarrow or Spark), i get only 2 522 845 rows. oO
Directory size is about 59M.
After few hours of testing I've found that if I do the same command but with limit clause with enormous large value, it works!
EXPORT TO PARQUET (directory = '/data') over (partition by install_date)
AS select field1, field2 from table where install_date >= 'some_date' limit 10000000000;
Then Vertica exports the same 16 405 136 number of rows, and parquet has the same number of rows! And directory size is around 350M.
Any ideas why?
Thanks.