I am trying to write a mapper/reducer program that calculates the max/min temperature from a data set. I tried to modify the code myself, but it doesn't work: the mapper runs fine, but the reducer fails since I made changes to the mapper.
My code:
mapper.py
import re
import sys

# Emit "year<TAB>temperature" for each record that passes the filters.
for line in sys.stdin:
    val = line.strip()
    (year, temp, q) = (val[14:18], val[25:30], val[31:32])
    if (temp != "9999" and re.match("[01459]", q)):
        print "%s\t%s" % (year, temp)
reducer.py
import sys

# Input arrives sorted by key; print the maximum value seen for each key.
(last_key, max_val) = (None, -sys.maxint)
for line in sys.stdin:
    (key, val) = line.strip().split("\t")
    if last_key and last_key != key:
        print "%s\t%s" % (last_key, max_val)
        (last_key, max_val) = (key, int(val))
    else:
        (last_key, max_val) = (key, max(max_val, int(val)))
if last_key:
    print "%s\t%s" % (last_key, max_val)
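A likely reason the reducer fails: the mapper emits temperatures such as 42.9, and int("42.9") raises ValueError (invalid literal for int() with base 10). Below is a minimal sketch of the same max-per-key reducer using float instead, written for Python 3 (the code above uses Python 2 print statements, and sys.maxint no longer exists in Python 3). It assumes tab-separated "year<TAB>temp" lines already sorted by year, as Hadoop Streaming delivers them; the function name reduce_max is hypothetical.

```python
def reduce_max(lines):
    """Yield (key, max_value) for each run of consecutive equal keys.

    Assumes tab-separated "key<TAB>value" lines already sorted by key.
    """
    last_key, max_val = None, float("-inf")
    for line in lines:
        key, val = line.strip().split("\t")
        temp = float(val)  # float, not int: values look like "42.9"
        if last_key is not None and key != last_key:
            yield last_key, max_val  # key changed: emit the finished group
            max_val = temp
        else:
            max_val = max(max_val, temp)
        last_key = key
    if last_key is not None:
        yield last_key, max_val  # emit the last group


# In reducer.py you would iterate over reduce_max(sys.stdin) and print each pair.
print(list(reduce_max(["2012\t42.9\n", "2012\t62.9\n", "2013\t50.0\n"])))
# → [('2012', 62.9), ('2013', 50.0)]
```

Comparing against None rather than relying on truthiness also avoids silently skipping an empty-string key.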
A sample line from the file:
690190,13910, 2012**0101, *42.9,18, 29.4,18, 1033.3,18, 968.7,18, 10.0,18, 8.7,18, 15.0, 999.9, 52.5, 31.6*, 0.00I,999.9, 000000,
I need the values marked with asterisks. Any ideas?
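The sample line is comma-separated, while the mapper slices fixed character positions, so if the file really looks like this the slices may land in the wrong place. A minimal sketch that splits on commas instead, assuming the sample line with the asterisk markers removed, that the third field is YEARMODA and the fourth is the mean temperature, and that 9999.9 marks a missing value (as in the field table). The name parse_csv_record is hypothetical.

```python
def parse_csv_record(line):
    """Return (year, mean_temp) from a comma-separated record, or None.

    Assumed layout: STN, WBAN, YEARMODA, TEMP, ... with optional spaces
    around fields. A TEMP of 9999.9 is treated as missing.
    """
    fields = [f.strip() for f in line.split(",")]
    if len(fields) < 4:
        return None  # not enough fields to be a record
    yearmoda, temp = fields[2], fields[3]
    if temp == "9999.9":
        return None
    return yearmoda[:4], float(temp)


# The sample line from the question, with the asterisk markers removed.
sample = ("690190,13910, 20120101, 42.9,18, 29.4,18, 1033.3,18, 968.7,18, "
          "10.0,18, 8.7,18, 15.0, 999.9, 52.5, 31.6, 0.00I,999.9, 000000,")
print(parse_csv_record(sample))  # → ('2012', 42.9)
```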
This is my output when I run the mapper on its own:
root@ubuntu:/home/hduser/files# python maxtemp-map.py
2012 42.9
2012 50.0
2012 47.0
2012 52.0
2012 43.4
2012 52.6
2012 51.1
2012 50.9
2012 57.8
2012 50.7
2012 44.6
2012 46.7
2012 52.1
2012 48.4
2012 47.1
2012 51.8
2012 50.6
2012 53.4
2012 62.9
2012 62.6
The file contains data for several years, and I have to calculate the min, max, and average temperature for each year. The record layout is:
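To get min, max, and average per year, the reducer can keep a count and a running sum next to the two extremes. A sketch using itertools.groupby, which, like the reducer above, only works if the input is sorted by year (Hadoop sorts map output by key before the reduce phase; when testing locally with a pipe you have to sort yourself). It is fed the mapper output shown above, assuming those lines are tab-separated; year_stats is a hypothetical name.

```python
from itertools import groupby


def year_stats(lines):
    """Yield (year, min, max, mean) for sorted "year<TAB>temp" lines."""
    pairs = (line.strip().split("\t") for line in lines if line.strip())
    for year, group in groupby(pairs, key=lambda kv: kv[0]):
        count, total = 0, 0.0
        low, high = float("inf"), float("-inf")
        for _, temp in group:
            t = float(temp)
            count += 1
            total += t
            low = min(low, t)
            high = max(high, t)
        yield year, low, high, total / count


mapper_output = [
    "2012\t42.9", "2012\t50.0", "2012\t47.0", "2012\t52.0", "2012\t43.4",
    "2012\t52.6", "2012\t51.1", "2012\t50.9", "2012\t57.8", "2012\t50.7",
    "2012\t44.6", "2012\t46.7", "2012\t52.1", "2012\t48.4", "2012\t47.1",
    "2012\t51.8", "2012\t50.6", "2012\t53.4", "2012\t62.9", "2012\t62.6",
]
# In reducer.py you would pass sys.stdin instead of this list.
for year, low, high, mean in year_stats(sorted(mapper_output)):
    print("%s\t%.1f\t%.1f\t%.2f" % (year, low, high, mean))
# → 2012	42.9	62.9	50.93
```

Keeping running totals instead of collecting each year's values in a list keeps memory constant per key, which matters for years with many records.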
FIELD    POSITION  TYPE  DESCRIPTION
STN---   1-6       Int.  Station number (WMO/DATSAV3 number) for the location.
WBAN     8-12      Int.  WBAN number where applicable--this is the historical
YEAR     15-18     Int.  The year.
MODA     19-22     Int.  The month and day.
TEMP     25-30     Real  Mean temperature. Missing = 9999.9
Count    32-33     Int.  Number of observations in mean temperature
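The positions in this table are 1-based and inclusive, so columns a-b correspond to the Python slice line[a-1:b]. By that rule TEMP (25-30) is val[24:30], not val[25:30]: the mapper's slice drops the first character, so a missing value 9999.9 comes out as "999.9" and can never equal "9999" (which is not the documented missing marker anyway). Similarly, val[31:32] is the first digit of Count, an observation count rather than a quality code, so the re.match("[01459]", q) filter appears to drop records based on that digit. A sketch of a fixed-width parser, assuming the file follows this table exactly; column and parse_fixed_width are hypothetical helpers, and the test record is built from the sample values laid out at the documented columns.

```python
def column(line, start, end):
    """Return the text at 1-based, inclusive columns start..end, stripped."""
    return line[start - 1:end].strip()


def parse_fixed_width(line):
    """Return (year, mean_temp) for one record, or None if TEMP is missing.

    Column positions follow the field table: YEAR 15-18, TEMP 25-30.
    """
    year = column(line, 15, 18)
    temp = column(line, 25, 30)
    if not temp or temp == "9999.9":
        return None
    return year, float(temp)


# STN(1-6), space, WBAN(8-12), 2 spaces, YEARMODA(15-22), 2 spaces,
# TEMP(25-30, right-aligned), space, Count(32-33).
record = "690190" + " " + "13910" + "  " + "20120101" + "  " + "  42.9" + " " + "18"
print(parse_fixed_width(record))  # → ('2012', 42.9)
print(column(record, 32, 33))     # → 18
```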
What do the * represent? Do all the numbers on each line represent the same quantity (temperature)? - wflynny