
The goal is to sort (key, value) pairs by value. The input is a JSON file. I have four methods: two mapper/reducer pairs.

Input is similar to

{
  "id": 1,
  "user": {
    "friends_count": 1
  }
}

The output of the first mapper/reducer stage is something like

A 1
B 2
C 3
D 4

What I want is

1 A
2 B
3 C
4 D

In the first stage, sorting by key works fine, but in the second stage, where I try to make the value the key, an error is thrown that says

TypeError: at 0x7fa43ea615a0> is not JSON serializable

The code I am using is

from mrjob.job import MRJob
from mrjob.step import MRStep
import json

class MRFrnsCounter(MRJob):
    def steps(self):
        return [
            MRStep(mapper=self.mapper,
                   reducer=self.reducer),
            MRStep(mapper = self.mapper_two,
                    reducer = self.reducer_two)
        ]

    def mapper(self, _, line):
        f = json.loads(line)
        (uid, frns) = f["id"], f["user"]["friends_count"]
        yield (uid), (frns)

    def reducer(self, uid, frns):
        yield uid, sum(frns)

    def mapper_two(self, uid, frns):
        yield (frns), (uid)

    def reducer_two(self, frns, uid):
        yield (frns), uid

if __name__ == '__main__':
    MRFrnsCounter.run()

The code breaks in the second mapper when the key and value are reversed. Any opinions would be appreciated.

Is this: TypeError: at 0x7fa43ea615a0> is not JSON serializable really the full error message? - juanpa.arrivillaga
Sorry. The full error message is TypeError: <generator object <genexpr> at 0x7efbfe824a50> is not JSON serializable - Sid
Post the full error message with the stack trace. What is line? I'm pretty sure you need to materialize that generator in mapper. - juanpa.arrivillaga
Please see my comment below. - Sid

1 Answer


Why not just yield sum(frns), uid in the first reducer?
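A rough sketch of what that single-step variant would produce, written in plain Python rather than mrjob so it can be run anywhere. The sample records and the grouping loop are made up for illustration and only stand in for the framework's shuffle; they are not mrjob's API.

```python
from collections import defaultdict


def mapper(record):
    # Same extraction as the question's first mapper.
    yield record["id"], record["user"]["friends_count"]


def reducer(uid, frns):
    # Swap key and value directly in the first reducer.
    yield sum(frns), uid


# Hypothetical sample input (already parsed from JSON lines).
records = [
    {"id": "A", "user": {"friends_count": 3}},
    {"id": "B", "user": {"friends_count": 1}},
    {"id": "A", "user": {"friends_count": 2}},
]

# Stand-in for the shuffle phase: group mapper output by key.
grouped = defaultdict(list)
for record in records:
    for key, value in mapper(record):
        grouped[key].append(value)

# Reducers see keys in sorted order, so iterate over sorted uids.
output = [pair for uid in sorted(grouped) for pair in reducer(uid, grouped[uid])]
print(output)
```

Note that the pairs come out swapped but still in the order of the original key (uid), so a second step is probably still needed if the rows have to be ordered by the count.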

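However, the generator in the error comes from your second reducer, not the second mapper. A reducer receives its values as a generator, and reducer_two yields that generator (uid) as-is, which cannot be JSON-serialized. Iterate through the generator and yield one pair per value. Something like this:

def reducer_two(self, frns, uid):
    for u in uid:
        yield frns, u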
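To make the failure easier to see, here is a minimal sketch that reproduces it without mrjob. It assumes, as the error message suggests, that the values handed to a reducer arrive as a lazy generator. `run_reducer` is a hypothetical stand-in for the framework's sort, grouping, and JSON output; it is not part of mrjob.

```python
import json
from itertools import groupby
from operator import itemgetter


def reducer_two_broken(frns, uids):
    # Yields the values generator itself, which JSON cannot encode.
    yield frns, uids


def reducer_two_fixed(frns, uids):
    # Emit one (friends_count, user_id) pair per user.
    for uid in uids:
        yield frns, uid


def run_reducer(reducer, pairs):
    """Hypothetical stand-in: sort by key, group, JSON-encode the output."""
    lines = []
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        values = (value for _, value in group)  # lazy, like reducer input
        for out_key, out_value in reducer(key, values):
            lines.append(json.dumps(out_key) + "\t" + json.dumps(out_value))
    return lines


# Output of the second mapper: (friends_count, user_id) pairs.
swapped = [(2, "B"), (1, "A"), (4, "D"), (3, "C")]

fixed_lines = run_reducer(reducer_two_fixed, swapped)

try:
    run_reducer(reducer_two_broken, swapped)
    broken_error = None
except TypeError as exc:
    broken_error = str(exc)

print("\n".join(fixed_lines))
print(broken_error)
```

The fixed reducer produces one line per (count, user) pair in key order, while the broken one raises a TypeError as soon as the generator reaches the JSON encoder.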