0
votes

I've created a table as follows:

CREATE TABLE TEST (ID INT, SCORE INT, NAME STRING);

and inserted several records. I would like to perform a top-k query, returning the top records for each ID, sorted by SCORE.

I'm using the each_top_k() UDF from the Hivemall library as documented here: https://hivemall.incubator.apache.org/userguide/misc/topk.html

SELECT EACH_TOP_K(1, ID, SCORE, ID, NAME) AS (RANK, SCORE, ID, NAME) FROM (
SELECT * FROM TEST
CLUSTER BY ID
) T;

which successfully returns the top SCORE for each ID. However, I then create a view as follows:

CREATE VIEW TEST_VIEW AS SELECT EACH_TOP_K(1, ID, SCORE, ID, NAME) AS (RANK, SCORE, ID, NAME) FROM (
SELECT * FROM TEST
CLUSTER BY ID
) T;

and it executes successfully. However, then a simple

SELECT * FROM TEST_VIEW;

returns the following error:

Error: Error while compiling statement: FAILED: SemanticException View test_view is corresponding to UDTF, rather than a SelectOperator. (state=42000,code=40000)

I can't find any mention of this error. Any suggestions?

1

1 Answers

1
votes

I would suppose that Hive has problems to infer the data type for each field for your udtf at runtime. This should solve it, try put a query on top of your query, something like

CREATE VIEW TEST_VIEW AS 
select cast(rank as long) as rank, cast(score as double) as score, cast(id as string) as id, cast(name as string) as name from (
SELECT EACH_TOP_K(1, ID, SCORE, ID, NAME) AS (RANK, SCORE, ID, NAME) FROM (
SELECT * FROM TEST
CLUSTER BY ID
) T ) t2;