I am using CDH-5.4.4 Cloudera Edition, I have a CSV file in HDFS location, My requirement is to perform Real time SQL queries on Hadoop Environement (OLTP).
So I decided to go with Impala, I have created MetaStore table to a CSV file, then execuing query in impala editor (within HUE application) .
When i am executing below query, i am getting error like
"AnalysisException: all DISTINCT aggregate functions need to have the same set of parameters as count(DISTINCT City); deviating function: count(DISTINCT Country)".
CSV File
OrderID,CustomerID,City,Country
Ord01,Cust01,Aachen,Germany
Ord02,Cust01,Albuquerque,USA
Ord03,Cust01,Aachen,Germany
Ord04,Cust02,Arhus,Denmark
Ord05,Cust02,Arhus,Denmark
Problamatic Query
Select CustomerID,Count(Distinct City),Count(Distinct Country) From CustomerOrders Group by CustomerID
Problem:
Unable to execute the Impala Query with More than one Distinct Values in an Query.. I have searched over internet they provide NDV() method as a workaround, But NDV method only returns approximate count of distinct values, I need Exact unique count for more than one fields.
Expectation:
What is the best way to do Exact unique count for more than one fields? Kindly modify the above query to work with Impala.
Note: This is not my original table, I have replicate for the forum question.