I installed Presto (version 0.57) on our server at work today. Running select count(*) from log; takes more than 17 minutes on a table with only 640 million records (~64GB).

My impression is that this is far too slow for Presto, but I am not sure.

Some info:

Hive and Presto have both been installed with default configurations from their documentation.

The Hive table is an external table with about 24 columns, most of them String and 3 of them Array. The files are stored as Textfile (Hive complains about RCFile with my file for some reason).

The table will be mostly used for grouping and count operations.

Do you have any tips for increasing performance, or an idea of what the target query time should be for a simple count(*) on a table like this?

Cheers

1 Answer

You should solve your problem with RCFile first. Using RCFile will increase performance significantly: the developers say by 2x to 4x, which matches my experience. Try converting the table with CREATE TABLE <new rcfile table name> AS SELECT * FROM <old textfile table name>; in Presto. (Make sure you have enough disk space for the copy.)
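
If you would rather run the conversion from Hive, here is a minimal sketch of the same idea, with the storage format named explicitly in the statement. The table name log is taken from the question; log_rc is a made-up name for the new table, and I am assuming the copy can be a managed (non-external) table. I have not run this against your data.

```sql
-- Run from the Hive CLI. log_rc is a hypothetical name for the converted table.
-- The CTAS rewrites every row of the text-backed table into RCFile storage,
-- so the cluster needs room for a second copy of the data while it runs.
CREATE TABLE log_rc
STORED AS RCFILE
AS
SELECT * FROM log;

-- Sanity check: this should return the same count as the original table.
SELECT count(*) FROM log_rc;
```

Once the counts match, point your Presto queries at log_rc and compare the timing of count(*) against the Textfile table.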