0
votes

I have a specify use case but my question pertains to the best way of doing this in general.

I have three tables

Order - primary key order_id

OrderLine - Linking table with order_id, product_id and quantity. An order has 1 or more order lines

Product - primary key product_id, each order line has one product

In sqlachemy / python how do I generate nested JSON along the lines of:

{
    "orders": [
        {
            "order_id": 1
            "some_order_level_detail": "Kansas"
            "order_lines": [
                {
                    "product_id": 1,
                    "product_name": "Clawhammer",
                    "quantity": 5
                },
                ...
            ]
        },
        ...
    ]
}

Potential Ideas

Hack away doing successive queries

First idea which I want to get away from if possible is using list comprehesion and a brute force approach.

def get_json():
    answer = {
        "orders": [
            {
                "order_id": o.order_id,
                "some_order_level_detail": o.some_order_level_detail,
                "order_lines": [
                    {
                        "product_id": 1,
                        "product_name": Product.query.get(o_line.product_id).product_name,
                        "quantity": 5
                    }
                    for o_line in OrderLine.query.filter(order_id=o.order_id).all()
                ]
            }
            for o in Order.query.all()
        ]
    }

This gets hard to maintain mixing the queries with json. Ideally I'd like to do a query first...

Get joined results first, somehow manipulate later

The second idea is to do a join query to join the three tables showing per row in OrderLine the order and product details.

My question to pythonista out there is is there a nice way to convert this to nested json.

Another way?

This really seems like such a common requirement I'm really wondering whether there is a book method for this sort of thing? Is there an SQLAchemy version of this

3

3 Answers

1
votes

Look into marshmallow-sqlalchemy, as it does exactly what you're looking for.

I strongly advise against baking your serialization directly into your model, as you will eventually have two services requesting the same data, but serialized in a different way (including fewer or more nested relationships for performance, for instance), and you will either end up with either (1) a lot of bugs that your test suite will miss unless you're checking for literally every field or (2) more data serialized than you need and you'll run into performance issues as the complexity of your application scales.

With marshmallow-sqlalchemy, you'll need to define a schema for each model you'd like to serialize. Yes, it's a bit of extra boilerplate, but believe me - you will be much happier in the end.

We build applications using flask-sqlalchemy and marshmallow-sqlalchemy like this (also highly recommend factory_boy so that you can mock your service and write unit tests in place of of integration tests that need to touch the database):

# models

class Parent(Base):
    __tablename__ = 'parent'
    id = Column(Integer, primary_key=True)
    children = relationship("Child", back_populates="parent")

class Child(Base):
    __tablename__ = 'child'
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey('parent.id'))
    parent = relationship('Parent', back_populates='children',
                          foreign_keys=[parent_id])

# schemas. Don't put these in your models. Avoid tight coupling here

from marshmallow_sqlalchemy import ModelSchema
import marshmallow as ma


class ParentSchema(ModelSchema):
    children = ma.fields.Nested(
        'myapp.schemas.child.Child', exclude=('parent',), many=True)
    class Meta(ModelSchema.Meta):
        model = Parent
        strict = True
        dump_only = ('id',)


class ChildSchema(ModelSchema):
    parent = ma.fields.Nested(
        'myapp.schemas.parent.Parent', exclude=('children',))
    class Meta(ModelSchema.Meta):
        model = Child
        strict = True
        dump_only = ('id',)

# services

class ParentService:
    '''
    This service intended for use exclusively by /api/parent
    '''
    def __init__(self, params, _session=None):
        # your unit tests can pass in _session=MagicMock()
        self.session = _session or db.session
        self.params = params

    def _parents(self) -> typing.List[Parent]:
        return self.session.query(Parent).options(
            joinedload(Parent.children)
        ).all()

    def get(self):
        schema = ParentSchema(only=(
            # highly recommend specifying every field explicitly
            # rather than implicit
            'id',
            'children.id',
        ))
        return schema.dump(self._parents()).data

# views

@app.route('/api/parent')
def get_parents():
    service = ParentService(params=request.get_json())
    return jsonify(data=service.get())


# test factories
class ModelFactory(SQLAlchemyModelFactory):
    class Meta:
        abstract = True
        sqlalchemy_session = db.session

class ParentFactory(ModelFactory):
    id = factory.Sequence(lambda n: n + 1)
    children = factory.SubFactory('tests.factory.children.ChildFactory')

class ChildFactory(ModelFactory):
    id = factory.Sequence(lambda n: n + 1)
    parent = factory.SubFactory('tests.factory.parent.ParentFactory')

# tests
from unittest.mock import MagicMock, patch

def test_can_serialize_parents():
    parents = ParentFactory.build_batch(4)
    session = MagicMock()
    service = ParentService(params={}, _session=session)
    assert service.session is session
    with patch.object(service, '_parents') as _parents:
        _parents.return_value = parents
        assert service.get()[0]['id'] == parents[0].id
        assert service.get()[1]['id'] == parents[1].id
        assert service.get()[2]['id'] == parents[2].id
        assert service.get()[3]['id'] == parents[3].id
1
votes

I would add a .json() method to each model, so that they call each other. It's essentially your "hacked" solution but a bit more readable/maintainable. Your Order model could have:

def json(self):
    return {
        "id": self.id,
        "order_lines": [line.json() for line in self.order_lines]
    }

Your OrderLine model could have:

def json(self):
    return {
        "product_id": self.product_id,
        "product_name": self.product.name,
        "quantity": self.quantity
    }

Your resource at the top level (where you're making the request for orders) could then do:

...
orders = Order.query.all()
return {"orders": [order.json() for order in orders]}
...

This is how I normally structure this JSON requirement.

0
votes

Check my answer in this thread Flask Sqlalchmey - Marshmallow Nested Schema fails for joins with filter ( where ) conditions and using the Marshmallow package you include in your schema something like this:

name = fields.Nested(Schema, many=True)