I am looking into using Hive on our Hadoop cluster so that we can then use Presto to run analytics on the data stored in Hadoop, but I am still confused about a few things:
- As I understand it, files are stored in Hadoop (some kind of file manager).
- As I understand it, Hive needs tables to hold the data from Hadoop (some kind of data manager).
- Do Hadoop and Hive store their data separately, or does Hive just use the files already in Hadoop (in terms of disk space and so on)? In other words, does Hive import the data from Hadoop into its own tables and leave the Hadoop files alone, or how should I think about this?
- Can Presto be used without Hive, directly on top of Hadoop?
Thanks in advance for answering my questions :)