rebuilding arrays with nested defaultdict

Question

This question is an extension of a previous question: rebuild python array based on common elements - but different enough to warrant a new question:

I've been struggling with this for a bit now. My data is an array of dictionaries from an sql query. Each element in the array represents a shipment, and there are common values based on the keys.

data = [
    {"CustName":"customer1", "PartNum":"part1", "delKey":"0001", "qty":"10", "memo":"blah1"},
    {"CustName":"customer1", "PartNum":"part1", "delKey":"0002", "qty":"10", "memo":"blah2"},
    {"CustName":"customer1", "PartNum":"part1", "delKey":"0003", "qty":"10", "memo":"blah3"},
    {"CustName":"customer2", "PartNum":"part3", "delKey":"0004", "qty":"20", "memo":"blah4"},
    {"CustName":"customer2", "PartNum":"part3", "delKey":"0005", "qty":"20", "memo":"blah5"},
    {"CustName":"customer3", "PartNum":"partXYZ", "delKey":"0006", "qty":"50", "memo":"blah6"},
    {"CustName":"customer3", "PartNum":"partABC", "delKey":"0007", "qty":"100", "memo":"blah7"}]

The output I want is grouped according to specific keys

dataOut = [
   {"CustName":"customer1", "Parts":[
        {"PartNum":"part1", "deliveries":[
            {"delKey":"0001", "qty":"10", "memo":"blah1"},
            {"delKey":"0002", "qty":"10", "memo":"blah2"},
            {"delKey":"0003", "qty":"10", "memo":"blah3"}]}]},
   {"CustName":"customer2", "Parts":[
        {"PartNum":"part3", "deliveries":[
            {"delKey":"0004", "qty":"20", "memo":"blah4"},
            {"delKey":"0005", "qty":"20", "memo":"blah5"}]}]},
   {"CustName":"customer3", "Parts":[
        {"PartNum":"partXYZ", "deliveries":[
            {"delKey":"0006", "qty":"50", "memo":"blah6"}]},
        {"PartNum":"partABC", "deliveries":[
            {"delKey":"0007", "qty":"100", "memo":"blah7"}]}]}]

I can get the grouping with a single level using defaultdict and list comprehension as provided by the previous question and modified slightly

d = defaultdict(list)
for item in data:
    d[item['CustName']].append(item)
print([{'CustName': key, 'parts': value} for key, value in d.items()])

But I can't seem to get the second level in the output array - the grouping b the PartNum key. Through some research, I think what I need to do is use defaultdict as the type of the outer `defaultdict' like so:

d = defaultdict(defaultdict(list))

which throws errors because defaultdict returns a function, so I need to use lambda (yes?)

d = defaultdict(lambda:defaultdict(list))
for item in data:
    d[item['CustName']].append(item) <----this?

My question is how to "access" the second level array in the loop and tell the "inner" defaultdict what to group on (PartNum)? The data comes to me from the database programmer and the project keeps evolving to add more and more data (keys), so I'd like this solution to be as general as possible in case more data gets thrown my way. I was hoping to be able to "chain" the defaultdicts depending on how many levels I need to go. I'm learning as I'm going, so I'm struggling trying to understand the lambda and the basics of the defaultdict type and where to go from here.

Sort the dict then apply groupby. Or do it beforehand with SQL if you can. I cannot help you more now, I am on a mobile phone... — Pynchia
Can there be two delKey's with the same number/value for a PartNum? — wwii
Do you care about the order of the values in the lists in your output? If not, you could easily get rid of those levels and make your structure just a nested set of dictionaries. Tree = lambda: defaultdict(Tree) is all the setup you need for that kind of structure. — Blckknght

cr3 cr3 · Accepted Answer · 2016-01-04T02:24:29

Using groupby as suggested by @Pynchia and using sorted for unordered data as suggested by @hege_hegedus:

from itertools import groupby
dataOut = []
dataSorted = sorted(data, key=lambda x: (x["CustName"], x["PartNum"]))
for cust_name, cust_group in groupby(dataSorted, lambda x: x["CustName"]):
    dataOut.append({
        "CustName": cust_name,
        "Parts": [],
    })
    for part_num, part_group in groupby(cust_group, lambda x: x["PartNum"]):
        dataOut[-1]["Parts"].append({
            "PartNum": part_num,
            "deliveries": [{
                "delKey": delivery["delKey"],
                "memo": delivery["memo"],
                "qty": delivery["qty"],
            } for delivery in part_group]
        })

If you look at the second for loop, this will hopefully answer your question about accessing the second level array in the loop.

rebuilding arrays with nested defaultdict

4 Answers