I started exploring Drill for our requirement to run SQL on semi-structured data. I have set up a 4-node Drill cluster with ZooKeeper. I have a few questions about how it actually works.
I run Drill in distributed mode using dfs (the local file system), and I have a 1 GB JSON file on only one of the nodes (say n1). I am able to run the query by launching sqlline from any of the nodes (n1, n2, n3, n4), in spite of having the data only on n1. My questions are:
a. Is the query being executed on all the nodes? That is, will Drill parallelise the query execution by distributing the data to the other nodes (n2, n3, n4)?
b. If not, will copying the same file to all the other nodes (n2, n3, n4) help in leveraging Drill's MPP architecture?