The goal is to sort (key, value) pairs by value. The input is a JSON file. I have four methods: two mapper/reducer pairs.
The input is similar to:
{
    id: 1,
    user: {
        friends_count: 1
    }
}
The output of the first mapper/reducer stage looks something like:
A 1
B 2
C 3
D 4
What I want is:
1 A
2 B
3 C
4 D
In the first stage, sorting by key works fine, but in the second stage, where I try to make the value the key, an error is thrown:
TypeError: at 0x7fa43ea615a0> is not JSON serializable
The code I am using is:
from mrjob.job import MRJob
from mrjob.step import MRStep
import json


class MRFrnsCounter(MRJob):

    def steps(self):
        return [
            MRStep(mapper=self.mapper,
                   reducer=self.reducer),
            MRStep(mapper=self.mapper_two,
                   reducer=self.reducer_two)
        ]

    def mapper(self, _, line):
        f = json.loads(line)
        (uid, frns) = f["id"], f["user"]["friends_count"]
        yield (uid), (frns)

    def reducer(self, uid, frns):
        yield uid, sum(frns)

    def mapper_two(self, uid, frns):
        yield (frns), (uid)

    def reducer_two(self, frns, uid):
        yield (frns), uid


if __name__ == '__main__':
    MRFrnsCounter.run()
The code seems to break in the second mapper, when the key and value are swapped. Any suggestions would be appreciated.
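For reference, a minimal sketch of the second stage in plain Python, without mrjob. It assumes the usual MapReduce contract in which a reducer receives a key plus a lazy iterator of values, so the values must be yielded one at a time rather than passed through as a whole. The `run_step` helper is hypothetical and only imitates the map/shuffle/reduce cycle; it is not part of mrjob's API.

```python
import json
from itertools import groupby


def run_step(pairs, mapper, reducer):
    """Hypothetical stand-in for one map/shuffle/reduce step (not mrjob's API)."""
    # Map phase: feed every (key, value) pair through the mapper.
    mapped = [kv for key, value in pairs for kv in mapper(key, value)]
    # Shuffle phase: sort by key so equal keys become adjacent.
    mapped.sort(key=lambda kv: kv[0])
    out = []
    for key, group in groupby(mapped, key=lambda kv: kv[0]):
        # Like a real reducer, this one is handed a lazy iterator of values.
        values = (v for _, v in group)
        out.extend(reducer(key, values))
    return out


def mapper_two(uid, frns):
    # Swap key and value so the shuffle sorts by friends count.
    yield frns, uid


def reducer_two(frns, uids):
    # Yield each uid separately; yielding `uids` itself would emit a generator.
    for uid in uids:
        yield frns, uid


first_stage = [("A", 1), ("B", 2), ("C", 3), ("D", 4)]
second_stage = run_step(first_stage, mapper_two, reducer_two)

for frns, uid in second_stage:
    # Every emitted key and value is now plain JSON-serializable data.
    print(json.dumps(frns), json.dumps(uid))
```

In the job above, the equivalent change would be looping over the second argument of reducer_two instead of yielding it directly.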
Is "TypeError: at 0x7fa43ea615a0> is not JSON serializable" really the full error message? – juanpa.arrivillaga
line? I'm pretty sure you need to materialize that generator in mapper. – juanpa.arrivillaga
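To illustrate the point about materializing the generator, here is a small standalone sketch using only the standard json module. The `values` function is a hypothetical stand-in for the iterator of values that a reducer is handed; the exact wording of the TypeError differs between Python versions.

```python
import json


def values():
    # Hypothetical stand-in for the iterator of values a reducer receives.
    yield "A"
    yield "B"


try:
    # A generator object has no JSON representation, so this raises TypeError.
    json.dumps(values())
    error = None
except TypeError as exc:
    error = str(exc)

# Materializing the generator into a list makes it serializable.
materialized = json.dumps(list(values()))

print(error)
print(materialized)
```

Whether to emit one list per key or one pair per value depends on the output format you want; either way, the generator has to be consumed before it is yielded.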