In my Hive Table, I have a column called YearMonth which has values like 'Jul1998'. I need to split the string in each row into Month ('Jul') and Year ('1998) and add those separately in new columns in Hive. Unfortunately, my code doesn't do it correctly and adds a NULL instance in every other row like shown below:
I took the below steps to get the above output:
Created a split.py file to split the string 'Jul1998' into 'Jul' and '1998
#!/usr/bin/python
import sys
for line in sys.stdin:
Month= line[:3]
Year = line[3:]
print '\t'.join([Month, Year])
Then I go into Hive and add python file into hive directory Finally, I used the transform function to split the column and created two new columns.
ADD FILE /home/ec2-user/split.py;
INSERT OVERWRITE TABLE data
SELECT TRANSFORM (yearmonth) USING 'split.py' AS (month,year) FROM data;

SUBSTR? For example:SELECT SUBSTR(yearmonth,1,3) AS month, SUBSTR(yearmonth,4,3) AS year FROM data- pault