4
votes

Currently streaming Change Data Capture events from MongoDB into snowflake, would like to apply them to the raw data that is already there.

Let's say I have a table like this:

+---------------------+-----------------+-----------+
|         key         |      value      | document  |
+---------------------+-----------------+-----------+
| foo.bar             | "changed value" | <variant> |
| foo.stuff.anArray.1 | 1000            | <variant> |
| ...                 | ...             | ...       |
+---------------------+-----------------+-----------+

Where variant contains a very heavily nested JSON ex:

{
    "foo": {
        "bar": "Some info",
        "baz": "Other info",
        "stuff": {
            "anArray": [1, 2, 3],
            "things": "More nested info"
        }
    }
}

I would like to use OBJECT_DELETE and OBJECT_INSERT functions to update this nested variant data in snowflake.

Tried making a js UDF but eval() is not supported.

Other approaches like writing a UDF that does key.split(".") and then recursively walking the structure and updating the field seem to take a long time and fail with JavaScript out of memory error: UDF thread memory limit exceeded in some cases.

Looking for a bit more efficient way to solve this issue.

2
Can you be more specific/examples on how the object would be updated (in terms of data)?Felipe Hoffa
@FelipeHoffa Sure, the key mentioned in the table is the path in json. In the first row "foo.bar" would update the "Some info" value in the json to "changed value" and in the second row the expected values in "anArray" would be [1, 1000, 3]. I the Idea was for it to function the same as OBJECT_INSERT but with the ability to go deeper into json.ElvisElvis
do you mean, can someone write snowflake SQL to handling this: docs.mongodb.com/manual/reference/change-events/…Simeon Pilgrim
I really want to help with this - I'm just having trouble understanding exactly what you want.Felipe Hoffa
@SimeonPilgrim Yes indeed the idea is to apply the changes coming in the change stream to data already in the warehouse. The variant is the existing data and the key, value is the parsed information coming from mongodb CDC.ElvisElvis

2 Answers

1
votes

I've faced a similar problem and used a generic UDF to solve it. Here is an example of a UDF implementation that will solve what you need:

create or replace function edit_nested_entity("variant_object" variant, "path" string, "value" string)
returns variant
language javascript
as
$$
// https://stackguides.com/questions/6491463/accessing-nested-javascript-objects-and-arrays-by-string-path?page=1&tab=votes#tab-top
    Object.byString = function(o, s) {
        s = s.replace(/\[(\w+)\]/g, '.$1'); // convert indexes to properties
        s = s.replace(/^\./, '');           // strip a leading dot
        var a = s.split('.');
        for (var i = 0, n = a.length; i < n; ++i) {
            var k = a[i];
            if (k in o) {
                o = o[k];
            } else {
                return;
            }
        }
        return o;
   }
   // get the entity base
   nested_entity = Object.byString(variant_object, path)
   // update the value
   nested_entity = value
   return variant_object;
$$;

And now you'll need to run the following SQL command to achieve what you need:

UPDATE t1
SET document = edit_nested_entity(document, key, value) 

You'll maybe do some fine-tuning for this UDF to be more generic (or to use different UDF for different data types), but this will work.

0
votes

There is a way using OBJECT_INSERT but it's not pretty. Unfortunately I don't see a way to specify a nested key in a single OBJECT_INSERT. So:

create or replace table test2 (document variant);
insert into test2 select object_construct('foo',object_construct('bar','Some info', 'baz', 'Other info','stuff', object_construct('anArray', array_construct(1, 2, 3), 'things', 'More nested info')));
select * from test2;

I get:

{
      "foo": {
                "bar": "Some info",
                "baz": "Other info",
                "stuff": {
                          "anArray": [1,2,3],
                          "things": "More nested info"
                }
      }

}

Now, I want to update foo.bar with "Changed info" so I can do (remember to set the flag to TRUE so you get an update rather than insert):

update test2 set document = OBJECT_INSERT(document, 'foo', OBJECT_INSERT(document:foo::VARIANT, 'bar', 'Changed value', TRUE), TRUE) WHERE document:foo.bar::VARCHAR = 'Some info';

I get back:

{
          "foo": {
                    "bar": "Changed value",
                    "baz": "Other info",
                    "stuff": {
                              "anArray": [1,2,3],
                              "things": "More nested info"
                    }
          }
}

You can also use Javascript UDF as mentioned here.