I've been using SQLAlchemy with Alembic to simplify my database access and to manage schema changes to the tables. This was working out really well until I started noticing more and more issues with SQLAlchemy "expiring" fields, from my point of view nearly at random.
A case in point is this snippet:
class HRDecimal(Model):  # Model is flask-sqlalchemy's declarative base (db.Model)
    # primary key and unrelated columns omitted for brevity
    dec_id = Column(String(50), index=True)

    @staticmethod
    def qfilter(*filters):
        """
        :rtype : list[HRDecimal]
        """
        return list(HRDecimal.query.filter(*filters))


class Meta(Model):
    # primary key and unrelated columns omitted for brevity
    dec_id = Column(String(50), index=True)

    @staticmethod
    def qfilter(*filters):
        """
        :rtype : list[Meta]
        """
        return list(Meta.query.filter(*filters))
Code:
ids = ['1', '2', '3']  # obviously fake list of ids

decs = HRDecimal.qfilter(
    HRDecimal.dec_id.in_(ids))
metas = Meta.qfilter(
    Meta.dec_id.in_(ids))

combined = []
for ident in ids:
    combined.append((
        ident,
        [dec for dec in decs if dec.dec_id == ident],
        [hm for hm in metas if hm.dec_id == ident],
    ))
For the above there wasn't a problem, but when I'm processing a list that may contain a few thousand ids, this process started taking a huge amount of time, and if done from a web request in Flask, the thread would often be killed.
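As an aside, the combining step itself can be done in one pass per result set instead of rescanning decs and metas for every ident. A sketch (this reduces the Python-side cost, but doesn't explain the expiring fields):

from collections import defaultdict

# Group each result set by dec_id once, then do O(1) lookups per ident.
decs_by_id = defaultdict(list)
for dec in decs:
    decs_by_id[dec.dec_id].append(dec)

metas_by_id = defaultdict(list)
for hm in metas:
    metas_by_id[hm.dec_id].append(hm)

combined = [(ident, decs_by_id[ident], metas_by_id[ident]) for ident in ids]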
When I started poking around to find out why this was happening, the key area was
[dec for dec in decs if dec.dec_id == ident],
[hm for hm in metas if hm.dec_id == ident]
At some point during the combining of these (what I thought were) plain Python objects, accessing dec.dec_id or hm.dec_id lands in SQLAlchemy code. At best, we go into:
def __get__(self, instance, owner):
    if instance is None:
        return self

    dict_ = instance_dict(instance)
    if self._supports_population and self.key in dict_:
        return dict_[self.key]
    else:
        return self.impl.get(instance_state(instance), dict_)
of InstrumentedAttribute in sqlalchemy/orm/attributes.py, which seems to be very slow. Even worse, I've observed times when fields had expired, and then we enter:
def get(self, state, dict_, passive=PASSIVE_OFF):
    """Retrieve a value from the given object.

    If a callable is assembled on this object's attribute, and
    passive is False, the callable will be executed and the
    resulting value will be set as the new value for this attribute.
    """
    if self.key in dict_:
        return dict_[self.key]
    else:
        # if history present, don't load
        key = self.key
        if key not in state.committed_state or \
                state.committed_state[key] is NEVER_SET:
            if not passive & CALLABLES_OK:
                return PASSIVE_NO_RESULT

            if key in state.expired_attributes:
                value = state._load_expired(state, passive)
of AttributeImpl in the same file. The horrible issue here is that state._load_expired re-runs the SQL query completely. So in a situation like this, with a big list of idents, we end up running thousands of "small" SQL queries against the database, when I think we should only have been running the two "large" ones at the top.
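This per-access reload is easy to reproduce in isolation. A self-contained sketch (in-memory SQLite, illustrative names) where echo=True shows the extra SELECT the moment an expired attribute is touched:

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class Dec(Base):
    __tablename__ = 'decs'
    id = Column(Integer, primary_key=True)
    dec_id = Column(String(50), index=True)

engine = create_engine('sqlite://', echo=True)
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()  # expire_on_commit defaults to True

session.add(Dec(dec_id='1'))
session.commit()

dec = session.query(Dec).first()
session.commit()   # the commit expires every loaded attribute of dec
print(dec.dec_id)  # the echo output shows a fresh SELECT issued right here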
Now, I've gotten around the expiry issue through the session_options I pass when initialising the database for Flask, changing
app = Flask(__name__)
CsrfProtect(app)
db = SQLAlchemy(app)
to
app = Flask(__name__)
CsrfProtect(app)
db = SQLAlchemy(
    app,
    session_options=dict(
        autoflush=False, autocommit=False, expire_on_commit=False))
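(For completeness: outside Flask, the same flags go straight to sessionmaker; a sketch, with a placeholder connection URL. The trade-off of expire_on_commit=False is that objects can keep stale data after a commit if the database has changed underneath them.)

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine('postgresql://user:pass@localhost/mydb')  # placeholder URL
Session = sessionmaker(
    bind=engine,
    autoflush=False,
    expire_on_commit=False)  # autocommit=False is the default anyway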
This has definitely improved the situation where a row's fields seemed (from my observations) to expire at random, but the "normal" slowness of attribute access in SQLAlchemy is still an issue for what we're currently running.
Is there any way, with SQLAlchemy, to get a "real" Python object returned from a query, instead of a proxied one as it is now, so that it isn't affected by this?
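For what it's worth, querying individual columns instead of full entities returns plain tuples with no instrumentation, so nothing can expire or lazy-load. A sketch, assuming only dec_id is needed downstream:

rows = (HRDecimal.query
        .with_entities(HRDecimal.dec_id)
        .filter(HRDecimal.dec_id.in_(ids))
        .all())
dec_ids = [row.dec_id for row in rows]  # plain tuples, no ORM proxying

That avoids the proxying entirely, but it loses the mapped objects themselves, which is why I'm asking about full entities.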
Comment from Ilja Everilä: Is Query.all() in your example only, or also in your real code? As you have it now, you're repeatedly iterating over the query objects decs and metas, which perform their queries again and again each time.