I have been using the SelectHiveQL processor to fetch data from Hive and create CSV files. For around 7 million records, it takes around 5 minutes. Looking more closely, I found that the fetch from Hive is fast and takes less than 10% of the overall time; most of the time goes into writing the CSV files. I am using 8 cores and 32 GB of RAM, and I have configured 16 GB of heap memory. Can someone please help me improve this performance? Do I need to change any system-level settings?

1 Answer

The CSV output option of SelectHiveQL could certainly be improved. Currently it builds each row as a string in memory and then writes it to the flow file; it could probably write straight to the flow file instead. Please feel free to file a Jira for this improvement.
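
To make the difference concrete, here is a minimal sketch of the two approaches in plain Java. It is not the actual SelectHiveQL code: the class `CsvRowWriting` and its methods are hypothetical names made up for illustration, and a `ByteArrayOutputStream` stands in for the flow file's output stream. The first method assembles each row as a `String` before writing it; the second writes each field directly to a buffered writer, which avoids one intermediate string per row. How much that saves in practice is an assumption that would need to be measured.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical illustration only; not taken from the NiFi code base.
final class CsvRowWriting {

    // A field needs quoting if it contains a comma, a quote, or a line break.
    static boolean needsQuoting(String s) {
        return s.indexOf(',') >= 0 || s.indexOf('"') >= 0
                || s.indexOf('\n') >= 0 || s.indexOf('\r') >= 0;
    }

    // Approach 1: build the whole row as a String in memory, then write it once.
    static void writeRowBuffered(Writer out, List<String> row) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.size(); i++) {
            if (i > 0) sb.append(',');
            String field = row.get(i) == null ? "" : row.get(i);
            if (needsQuoting(field)) {
                sb.append('"').append(field.replace("\"", "\"\"")).append('"');
            } else {
                sb.append(field);
            }
        }
        sb.append('\n');
        out.write(sb.toString());
    }

    // Approach 2: write each field straight to the writer, with no per-row String.
    static void writeRowStreaming(Writer out, List<String> row) throws IOException {
        for (int i = 0; i < row.size(); i++) {
            if (i > 0) out.write(',');
            String field = row.get(i) == null ? "" : row.get(i);
            if (needsQuoting(field)) {
                out.write('"');
                for (int j = 0; j < field.length(); j++) {
                    char c = field.charAt(j);
                    if (c == '"') out.write('"'); // double embedded quotes
                    out.write(c);
                }
                out.write('"');
            } else {
                out.write(field);
            }
        }
        out.write('\n');
    }

    // Renders all rows to CSV text; the byte array stands in for the flow file content.
    static String render(List<List<String>> rows, boolean streaming) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new BufferedWriter(
                new OutputStreamWriter(bytes, StandardCharsets.UTF_8))) {
            for (List<String> row : rows) {
                if (streaming) {
                    writeRowStreaming(out, row);
                } else {
                    writeRowBuffered(out, row);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Both methods produce the same bytes, so switching between them changes only the allocation pattern, not the output. Wrapping the target stream in a `BufferedWriter` matters in either case, since unbuffered small writes tend to dominate the cost.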