My goal is to perform a SELECT query using Hive.
When I have a small amount of data on a single machine (the namenode), I proceed as follows:
1- Create a table to hold the data: create table table1 (col1 int, col2 string);
2- Load the data from a local file path: load data local inpath 'path' into table table1;
3- Run my SELECT query: select * from table1 where col1 > 0;
I have a huge dataset of 10 million rows that doesn't fit on a single machine. Let's assume Hadoop has split my data across, for example, 10 datanodes, each holding 1 million rows.
Retrieving the data to a single computer is either impossible because of its size or, where it is possible, would take a lot of time.
Will Hive create a table on each datanode and perform the SELECT query there, or will Hive move all the data to one location (a single datanode) and create one table (which would be inefficient)?