I'm struggling to build a multi-nested dictionary that I'll add to my collection in MongoDB. I'm questioning both the approach and my attempt at the solution.

Here is the problem: I built a function that identifies the deltas between the local collection data and the updates I'm getting from my golden source.

The function produces a dictionary of all the deltas. The dictionary contains the tag as the key and the updated value as the value.

I then pass the delta dictionary and the current data dictionary to another function, which is responsible for the following:

  1. identifying the old value and the new value using delta.keys()
  2. building a new dictionary that contains the full path to the nested dictionary, which holds only two values: newValue and oldValue.

What I'm struggling with is that when I run the for loop, it just seems to overwrite the previous record. The data should get appended. If the path already exists, then the update should only add to the deltas. Only when the value itself already exists would I expect it to be updated. For example:

  1. Same date -> different tags: should append the new tag and its oldValue and newValue.
  2. Same date -> same tag: should update the existing tag's entry.

The reason I'm doing it this way is to avoid multiple calls and updates to the collection. Ideally I'd stick to one update.

But my concerns are the following:

  1. Is this the best approach when working with nested dictionaries and MongoDB?
  2. What issues will this cause when I update MongoDB using pymongo? I'm worried it will overwrite existing records on the update. I want the records to be appended, not overwritten.
  3. Is there a different approach that would make more sense?

Attempt 1:

def update_record(_collection, _key, _data, _delta):
    today = date.today()
    today_formatted = today.strftime("%Y-%m-%d")
    _query_criteria = {_key: _data[_key]}
    _update_values = {}
    _append_delta = {}

    x = 0
    for delta_key in _delta.keys():
        _update_values = {delta_key: _delta[delta_key]}
        _append_delta["delta"]["byType"][delta_key][today_formatted] = {"oldValue": _data[delta_key],
                                                                        "newValue": _delta[delta_key]}
        _append_delta["delta"]["byDate"][today_formatted][delta_key] = {"oldValue": _data[delta_key],
                                                                        "newValue": _delta[delta_key]}

Attempt 2:

def update_record(_collection, _key, _data, _delta):
    today = date.today()
    today_formatted = today.strftime("%Y-%m-%d")
    _query_criteria = {_key: _data[_key]}
    _update_values = {}
    _append_delta = {}

    x = 0
    for delta_key in _delta.keys():
        _update_values = {delta_key: _delta[delta_key]}
        x_dict = {}
        y_dict = {}

        if x == 0:
            _append_delta["delta"]["byType"] = {delta_key: {today_formatted: {}}}
            _append_delta["delta"]["byDate"][today_formatted] = {delta_key: {}}
            x += 1
            _append_delta["delta"]["byType"][delta_key][today_formatted] = {"oldValue": _data[delta_key],
                                                                            "newValue": _delta[delta_key]}
            _append_delta["delta"]["byDate"][today_formatted][delta_key] = {"oldValue": _data[delta_key],
                                                                            "newValue": _delta[delta_key]}

        else:
            _append_delta.update(
                {"delta":
                    {"byType": {
                        delta_key: {today_formatted: {"oldValue": _data[delta_key], "newValue": _delta[delta_key]}}},
                        "byDate": {
                            today_formatted: {delta_key: {"oldValue": _data[delta_key], "newValue": _delta[delta_key]}}}
                    }
                }
            )

Example of what I want the collection to look like in MongoDB:

[{name: "Apple",
 ticker: "appl",
 description: "Apple Computers",
 currency: "usd",
 delta: {
     byTag: {
         name: {
             "2021-06-01": {
                 oldValue: "appl",
                 newValue: "Apple"
             }
         },
         description: {
             "2021-06-06": {
                 oldValue: "Apple",
                 newValue: "Apple Computers"
             }
         }
     },
     byDate: {
         "2021-06-01": {
             name: {
                 oldValue: "appl",
                 newValue: "Apple"
             }
         },
         "2021-06-06": {
             description: {
                 oldValue: "Apple",
                 newValue: "Apple Computers"
             }
         }
     }
 }
 }]

1 Answer

You have a lot of questions here. You may get a better response if you break them down into bite-size issues.

In terms of dealing with changes to your data, you might want to take a look at dictdiffer. As with a lot of things in Python, there's usually a good library for what you are looking to do. It won't give you the format you are looking for, but it will give you an established format for this sort of problem. You also get extras, such as being able to patch old records with the delta.
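For a sense of what such a comparison involves, here is a minimal stdlib-only sketch for flat records. It is not dictdiffer's API or output format: compute_delta and the sample records are made up for illustration, and the sketch ignores nested values and removed keys, which is where a library helps most.

```python
def compute_delta(current, incoming):
    """Return {tag: new_value} for every tag whose value differs.

    Hypothetical helper: handles flat dicts only and does not
    report keys that were removed from `incoming`.
    """
    return {
        tag: new_value
        for tag, new_value in incoming.items()
        if current.get(tag) != new_value
    }


# Illustrative records, loosely based on the example in the question.
current = {"name": "appl", "ticker": "appl", "description": "Apple"}
incoming = {"name": "Apple", "ticker": "appl", "description": "Apple Computers"}

delta = compute_delta(current, incoming)
print(delta)
```

The result maps each changed tag to its new value, which is the shape of the delta dictionary described in the question.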

Separately, with nested dicts, I think it's easier to create them based on the object structure rather than building them up key by key. It's more verbose but clearer, in my opinion. The code below is a sample using classes to give you an idea of this concept:

from pymongo import MongoClient
from datetime import date
from bson.json_util import dumps

db = MongoClient()['mydatabase']


class UpdateRecord:
    def __init__(self, name, ticker, description, currency, delta):
        self.name = name
        self.ticker = ticker
        self.description = description
        self.currency = currency
        self.date = date.today().strftime("%Y-%m-%d")
        self.delta = delta
        # Need to code to work out the deltas

    def by_tags(self):
        tags = dict()
        for tag in ['name', 'description']:
            tags.update({
                tag: {
                    self.date: {
                        'oldValue': "appl",
                        'newValue': "Apple"
                    }
                }
            })
        return tags

    def by_date(self):
        dates = dict()
        for dt in ['2021-06-01', '2021-06-06']:
            dates.update({
                dt: {
                    self.date: {
                        'oldValue': "appl",
                        'newValue': "Apple"
                    }
                }
            })
        return dates

    def to_dict(self):
        return {
            'name': self.name,
            'ticker': self.ticker,
            'description': self.description,
            'currency': self.currency,
            'delta': {
                'byTag': self.by_tags(),
                'byDate': self.by_date()
            }
        }

    def update(self, _id):
        db.mycollection.update_one({'_id': _id}, {'$push': {'Updates': self.to_dict()}})


delta = {
    'oldValue': "appl",
    'newValue': "Apple"
}
#
# Test it out
#
dummy_record = {'a': 1}
db.mycollection.insert_one(dummy_record)
record = db.mycollection.find_one()

update_record = UpdateRecord(name='Apple', ticker='appl', description='Apple Computer', currency='usd', delta=delta)
update_record.update(record.get('_id'))

print(dumps(db.mycollection.find_one({}, {'_id': 0}), indent=4))

prints:

{
    "a": 1,
    "Updates": [
        {
            "name": "Apple",
            "ticker": "appl",
            "description": "Apple Computer",
            "currency": "usd",
            "delta": {
                "byTag": {
                    "name": {
                        "2021-08-14": {
                            "oldValue": "appl",
                            "newValue": "Apple"
                        }
                    },
                    "description": {
                        "2021-08-14": {
                            "oldValue": "appl",
                            "newValue": "Apple"
                        }
                    }
                },
                "byDate": {
                    "2021-06-01": {
                        "2021-08-14": {
                            "oldValue": "appl",
                            "newValue": "Apple"
                        }
                    },
                    "2021-06-06": {
                        "2021-08-14": {
                            "oldValue": "appl",
                            "newValue": "Apple"
                        }
                    }
                }
            }
        }
    ]
}
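On the concern about overwriting existing records: the sample above uses $push, which appends a whole snapshot to an array. If you would rather keep the nested byTag/byDate layout from the question, one option is a single $set whose keys use MongoDB dot notation, so that only the named leaf fields are written and sibling entries are left alone. The following is a rough sketch under that assumption. build_delta_update is a hypothetical helper, not part of pymongo, and it assumes tag names contain no "." and do not start with "$".

```python
from datetime import date


def build_delta_update(current, delta, day=None):
    """Build one MongoDB update document for the given changes.

    current: the record as it is stored now (flat dict)
    delta:   {tag: new_value} for the tags that changed
    day:     date string used as the key; defaults to today

    Assumption: tag names contain no "." and do not start with "$",
    because the tags are embedded in dotted field paths.
    """
    day = day or date.today().strftime("%Y-%m-%d")
    set_fields = {}
    for tag, new_value in delta.items():
        entry = {"oldValue": current.get(tag), "newValue": new_value}
        # Update the top-level field itself.
        set_fields[tag] = new_value
        # Dotted paths address a single nested field; other tags and
        # other dates under delta.byTag / delta.byDate are not touched.
        set_fields[f"delta.byTag.{tag}.{day}"] = dict(entry)
        set_fields[f"delta.byDate.{day}.{tag}"] = dict(entry)
    return {"$set": set_fields}


# Illustrative data based on the example in the question.
current = {"name": "appl", "ticker": "appl", "description": "Apple"}
delta = {"name": "Apple", "description": "Apple Computers"}
update = build_delta_update(current, delta, day="2021-06-01")

# With a pymongo Collection this would be sent in one call, e.g.:
# db.mycollection.update_one({"ticker": "appl"}, update)
```

With this shape, a new tag on the same date is added next to the existing ones, while the same tag on the same date replaces only that one entry, which matches the two cases listed in the question.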