0
votes

I am a beginner with Hive. I have two Hive tables as follows:

Table A contains columns - date, name, age. Range for values in the date column from Table A is from 20150406 to 20150408.

Table B is a copy of Table A - but with one more new column added - date, name, **dept**, age Range for values in the date column from Table B is from 20150409 to 20150411.

I would like to create a view using Tables A and B such that

View A = 
Table A(date, name, dept as NULL, age) //for dates 20150406 to 20150408
UNION
Table B(date, name, dept, age) //for dates 20150409 to 20150411

Example:

Table A

date | name | age
20150406 | John | 21
20150407 | Jane | 23
20150408 | Mary | 20

Table B

date | name | dept | age
20150409 | Claire | CSE | 25
20150410 | Cindy | Maths | 27
20150408 | Tom | Biology | 30

View A

date | name | dept | age
20150406 | John | NULL | 21
20150407 | Jane | NULL | 23
20150408 | Mary | NULL | 20
20150409 | Claire | CSE | 25
20150410 | Cindy | Maths | 27
20150408 | Tom | Biology | 30

Is this feasible? How can this be done?

Thanks in advance!

2

2 Answers

3
votes

You're almost there:

create view viewA
as
select date, name, NULL as dept, age
from tableA
where date between '20150406' and '20150408'
union all
select date, name, dept, age 
from tableB
where date between '20150409' and '20150411'
0
votes

See Detailed solution:

hive> create table tableA(date String,name string,age int) row format delimited fields terminated by '\t' stored as textfile;
OK
Time taken: 0.084 seconds

hive> create table tableB(date String,name string,dept String,age int) row format delimited fields terminated by '\t' stored as textfile;
OK
Time taken: 0.103 seconds

then from local to hive by load:

hive> load data local inpath '/home/hastimal/PracticeTableData/tableB' into table tableB;
Loading data to table default.tableb
Table default.tableb stats: [numFiles=1, totalSize=71]
OK
Time taken: 0.291 seconds

hive> load data local inpath '/home/hastimal/PracticeTableData/tableA' into table tableA;
Loading data to table default.tablea
Table default.tablea stats: [numFiles=1, totalSize=51]
OK

Further available in hive just to make sure:

hive> select * from tableA;
OK
20150406    John    21
20150407    Jane    23
20150408    Mary    20
Time taken: 0.126 seconds, Fetched: 3 row(s)

hive> select * from tableB;
OK
20150409    Claire  CSE 25
20150410    Cindy   Maths   27
20150408    Tom Biology 30
Time taken: 0.11 seconds, Fetched: 3 row(s)

Final solution :)

SELECT tbA.date AS a ,tbA.name AS b ,NULL AS c,tbA.age AS d FROM tableA tbA 
UNION ALL 
SELECT tbB.date AS a ,tbB.name AS b ,tbB.dept AS c,tbB.age AS d FROM tableB tbB 

See output:

OK
20150409    Claire  CSE 25
20150410    Cindy   Maths   27
20150408    Tom Biology 30
20150406    John    NULL    21
20150407    Jane    NULL    23
20150408    Mary    NULL    20
Time taken: 43.462 seconds, Fetched: 6 row(s)