How do you store an object larger than 1 MB in memcache? Is there a way to split it up but still have the data accessible with the same key?
5 Answers
I use the following module ("blobcache") to store values larger than 1 MB in GAE's memcache.
import pickle
import random

from google.appengine.api import memcache

# Stay safely below memcache's 1 MB per-item limit.
MEMCACHE_MAX_ITEM_SIZE = 900 * 1024

def delete(key):
    chunk_keys = memcache.get(key)
    if chunk_keys is None:
        return False
    chunk_keys.append(key)
    memcache.delete_multi(chunk_keys)
    return True

def set(key, value):
    pickled_value = pickle.dumps(value)
    # Delete the previous entry stored under the given key
    # in order to conserve available memcache space.
    delete(key)
    pickled_value_size = len(pickled_value)
    chunk_keys = []
    for pos in range(0, pickled_value_size, MEMCACHE_MAX_ITEM_SIZE):
        # TODO: use memcache.set_multi() for speedup, but don't forget
        # about the batch operation size limit (32 MB currently).
        chunk = pickled_value[pos:pos + MEMCACHE_MAX_ITEM_SIZE]
        # The pos is used for reliable distinction between chunk keys.
        # The random suffix is used as a counter-measure for distinction
        # between different values, which can be simultaneously written
        # under the same key.
        chunk_key = '%s%d%d' % (key, pos, random.getrandbits(31))
        is_success = memcache.set(chunk_key, chunk)
        if not is_success:
            return False
        chunk_keys.append(chunk_key)
    return memcache.set(key, chunk_keys)

def get(key):
    chunk_keys = memcache.get(key)
    if chunk_keys is None:
        return None
    chunks = []
    for chunk_key in chunk_keys:
        # TODO: use memcache.get_multi() for speedup, but don't forget
        # about the batch operation size limit (currently 32 MB).
        chunk = memcache.get(chunk_key)
        if chunk is None:
            return None
        chunks.append(chunk)
    pickled_value = ''.join(chunks)
    try:
        return pickle.loads(pickled_value)
    except Exception:
        return None
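For completeness, usage looks roughly like this, assuming the module above is saved as blobcache.py:

import blobcache

big_value = ['some data'] * 500000   # pickles to well over 1 MB
blobcache.set('my_key', big_value)
assert blobcache.get('my_key') == big_value
blobcache.delete('my_key')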
There are memcache methods set_multi and get_multi that take a dictionary and a prefix as arguments.
If you can split your data into a dictionary of chunks, you can use this; the prefix effectively becomes your new key name.
You'd have to keep track of the names of the chunks somehow. Also, any of the chunks could be evicted from memcache at any time, so you'd also need some way to detect and handle partial data.
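A rough sketch of that idea (the chunk size, the key_prefix value, and passing a chunk count around are all illustrative assumptions, not a fixed API):

from google.appengine.api import memcache

CHUNK_SIZE = 900 * 1024  # stay under the 1 MB per-item limit

def set_chunked(prefix, data):
    # Split the serialized data into a dict of numbered chunks;
    # the prefix effectively becomes the key name.
    chunks = {}
    for i, pos in enumerate(range(0, len(data), CHUNK_SIZE)):
        chunks[str(i)] = data[pos:pos + CHUNK_SIZE]
    # set_multi() returns the list of keys it could NOT store.
    failed = memcache.set_multi(chunks, key_prefix=prefix)
    return len(chunks) if not failed else None

def get_chunked(prefix, num_chunks):
    keys = [str(i) for i in range(num_chunks)]
    found = memcache.get_multi(keys, key_prefix=prefix)
    if len(found) != num_chunks:
        return None  # some chunks were evicted; treat the value as gone
    return ''.join(found[k] for k in keys)

Here num_chunks stands in for whatever bookkeeping you choose; returning None on a partial hit is the simplest way to cope with eviction.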
The best way to store a large blob of data in memcache is to split it up into chunks and use set_multi and get_multi to store and retrieve them efficiently.
But be aware that it's possible for some parts to drop out of the cache while others remain (one way of coping with that is sketched below).
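A hedge against partial eviction, sketched under the assumption that you keep a small index entry listing the chunk keys:

from google.appengine.api import memcache

def get_blob(index_key):
    chunk_keys = memcache.get(index_key)  # hypothetical index layout
    if chunk_keys is None:
        return None
    found = memcache.get_multi(chunk_keys)
    if len(found) != len(chunk_keys):
        # Some chunks were evicted independently of the rest; the only
        # safe move is to treat the whole value as a miss and clean up.
        memcache.delete_multi(chunk_keys + [index_key])
        return None
    return ''.join(found[k] for k in chunk_keys)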
You can also cache data in the application instance by storing it in a global variable, but this is less ideal: it isn't shared across instances, and it disappears whenever the instance is recycled.
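A minimal sketch of the global-variable approach (the names here are made up; nothing in it is a GAE API):

_instance_cache = {}  # module-level, so it lives as long as the instance

def get_locally_cached(key, compute):
    # Each instance has its own copy, and it vanishes whenever the
    # instance is recycled, so treat this as a best-effort accelerator.
    if key not in _instance_cache:
        _instance_cache[key] = compute()
    return _instance_cache[key]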
Support for uploading to the blobstore from within the application is on the GAE roadmap; you might want to keep an eye out for that, as well as for the Google Storage integration.
As others have mentioned, you can add and retrieve multiple values from memcache at once. Interestingly, while the App Engine blog says these bulk operations can handle up to 32 MB, the official documentation still says they're limited to 1 MB. So definitely test it out, and maybe pester Google about updating their documentation. And keep in mind that some of your chunks might get evicted from memcache before others.
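If you do hit the batch size limit, one option is to split a large mapping across several set_multi() calls yourself. A sketch, assuming string values and an illustrative limit constant:

from google.appengine.api import memcache

MAX_BATCH_BYTES = 31 * 1024 * 1024  # headroom under the claimed 32 MB

def set_multi_batched(mapping, **kwargs):
    failed, batch, batch_bytes = [], {}, 0
    for k, v in mapping.items():
        if batch and batch_bytes + len(v) > MAX_BATCH_BYTES:
            failed += memcache.set_multi(batch, **kwargs)
            batch, batch_bytes = {}, 0
        batch[k] = v
        batch_bytes += len(v)
    if batch:
        failed += memcache.set_multi(batch, **kwargs)
    return failed  # the keys that could not be stored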
I'd recommend googling "python compress string" and thinking about serializing and compressing your object before sending it to memcache.
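For example, with nothing beyond the standard library (how much this helps depends entirely on how compressible your data is):

import pickle
import zlib

def dumps_compressed(value):
    return zlib.compress(pickle.dumps(value, pickle.HIGHEST_PROTOCOL))

def loads_compressed(blob):
    return pickle.loads(zlib.decompress(blob))

If the compressed blob still exceeds 1 MB, you'd combine this with one of the chunking approaches above.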
You might also want to ask this guy what he means by the extension that lets him store larger objects in memcache.
A nice workaround is to use layer_cache.py, a Python class written and used at Khan Academy (open source). Basically it combines an in-memory cache (the cachepy module) with memcache, using memcache as a way of syncing the in-memory cache across instances. Find the source here and read Ben Kamens's blog post about it here.
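This isn't layer_cache.py itself, but the general two-level shape it implements looks roughly like this:

from google.appengine.api import memcache

_local_cache = {}  # per-instance, in-memory layer

def layered_get(key, compute, time=0):
    # 1. Check this instance's memory first.
    if key in _local_cache:
        return _local_cache[key]
    # 2. Fall back to memcache, which is shared across instances.
    value = memcache.get(key)
    if value is None:
        # 3. Compute the value and write it back through both layers.
        value = compute()
        memcache.set(key, value, time=time)
    _local_cache[key] = value
    return value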