Can a Python list, set or dictionary be implemented invisibly using a database?

Question

The Python native capabilities for lists, sets & dictionaries totally rock. Is there a way to continue using the native capability when the data becomes really big? The problem I'm working on involved matching (intersection) of very large lists. I haven't pushed the limits yet -- actually I don't really know what the limits are -- and don't want to be surprised with a big reimplementation after the data grows as expected.

Is it reasonable to deploy on something like Google App Engine that advertises no practical scale limit and continue using the native capability as-is forever and not really think about this?

Is there some Python magic that can hide whether the list, set or dictionary is in Python-managed memory vs. in a DB -- so physical deployment of data can be kept distinct from what I do in code?

How do you, Mr. or Ms. Python Super Expert, deal with lists, sets & dicts as data volume grows?

Maybe u need SQLAlchemy? Or another ORM? And save u data to database? — Denis
Mr. and Ms. Python Super Experts are called pythonistas. ;-) — Aufwind
It's extremely hard, if not impossible, to serialize and deserialize arbitrary Python objects but it's easy to persistent a subset of Python objects, like int, str, list and dict using pickle/json or whatever. However data persistence only consists of a small part in your problem. Another problem to solve is that you need to create some kind of mapper to map your object with the database. If you are using relational databases like Postgresql or MySQL, you may take a look at ORMs like Sqlalchemy but If you can only use GAE's bigtable then you may need to write your own ORM... — Overmind Jiang
@Druss: not official, nor ever will be. For myself, I'm a snake charmer. — Chris Morgan
@Druss, first time I hear of snake charmers. Sounds neat, too. ;-) — Aufwind

Daniel Hepper Daniel Hepper · Accepted Answer · 2011-07-11T13:57:51

I'm not quite sure what you mean by native capabilities for lists, sets & dictionaries. However, you can create classes that emulate container types and sequence types by defining some methods with special names. That means that you could create a class that behaves like a list, but stores its data in a SQL database or on GAE datastore. Simply speaking, this is what an ORM does. However, mapping objects to a database is very complicated and it is probably not a good idea to invent your own ORM, but to use an existing one.

I'm afraid there is no one-size-fits-all solution. Especially GAE is not some kind of of Magic Fairy Dust you can sprinkle on your code to make it scale. There are several limitations you have to keep in mind to create an application that can scale. Some of them are general, like computational complexity, others are specific to the environment your code runs in. E.g. on GAE ~~the maximum response time is limited to 30 seconds and~~ querying the datastore works different that on other databases.

It's hard to give any concrete advice without knowing your specific problem, but I doubt that GAE is the right solution.

In general, if you want to work with large datasets, you either have to keep that in mind from the start or you will have to rework your code, algorithms and data structures as the datasets grow.

Can a Python list, set or dictionary be implemented invisibly using a database?

3 Answers