4 votes

How do you store an object larger than 1 MB in memcache? Is there a way to split it up, but still have the data accessible with the same key?

What is the nature of the object, and why are you trying to cache it in memcache? – Nick Johnson

5 Answers

9 votes

I use the following module ("blobcache") for storing values larger than 1 MB in GAE's memcache.

import pickle
import random
from google.appengine.api import memcache


MEMCACHE_MAX_ITEM_SIZE = 900 * 1024


def delete(key):
  chunk_keys = memcache.get(key)
  if chunk_keys is None:
    return False
  chunk_keys.append(key)
  memcache.delete_multi(chunk_keys)
  return True


def set(key, value):
  pickled_value = pickle.dumps(value)

  # delete previous entity with the given key
  # in order to conserve available memcache space.
  delete(key)

  pickled_value_size = len(pickled_value)
  chunk_keys = []
  for pos in range(0, pickled_value_size, MEMCACHE_MAX_ITEM_SIZE):
    # TODO: use memcache.set_multi() for speedup, but don't forget
    # about the batch operation size limit (currently 32 MB).
    chunk = pickled_value[pos:pos + MEMCACHE_MAX_ITEM_SIZE]

    # The pos reliably distinguishes the chunk keys from one another.
    # The random suffix guards against two concurrent writers storing
    # different values under the same key at the same time.
    chunk_key = '%s%d%d' % (key, pos, random.getrandbits(31))

    is_success = memcache.set(chunk_key, chunk)
    if not is_success:
      return False
    chunk_keys.append(chunk_key)
  return memcache.set(key, chunk_keys)


def get(key):
  chunk_keys = memcache.get(key)
  if chunk_keys is None:
    return None
  chunks = []
  for chunk_key in chunk_keys:
    # TODO: use memcache.get_multi() for speedup.
    # Don't forget about the batch operation size limit (currently 32 MB).
    chunk = memcache.get(chunk_key)
    if chunk is None:
      return None
    chunks.append(chunk)
  pickled_value = ''.join(chunks)
  try:
    return pickle.loads(pickled_value)
  except Exception:
    return None
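
For example, typical usage would look like this (a hypothetical sketch, assuming the module above is saved as blobcache.py; the key name and payload are just illustrations):

import blobcache

# Anything that pickles to more than 1 MB works the same way.
big_value = {'rows': range(500000)}

if blobcache.set('reports:latest', big_value):
  cached = blobcache.get('reports:latest')  # None on a miss or partial eviction

blobcache.delete('reports:latest')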

2 votes

There are memcache methods set_multi and get_multi that take a mapping (or a list of keys) plus a key prefix as arguments.

If you can split your data into a dictionary of chunks, you can use this; the prefix effectively becomes your new key name, with each chunk's key appended to it.

You'd have to keep track of the names of the chunks somehow. Also, any of the chunks could be evicted from memcache at any time, so you'd also need some way to reconstitute partial data.
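
A minimal sketch of that idea, assuming the caller remembers how many chunks were written (the key_prefix argument is part of the GAE memcache API; the function names here are illustrative):

from google.appengine.api import memcache

CHUNK_SIZE = 900 * 1024  # stay under the 1 MB per-value limit


def set_chunked(key, data):
  # data is an already-serialized string; map chunk index -> chunk,
  # and use the shared key as the prefix for every chunk.
  chunks = dict((str(i), data[i:i + CHUNK_SIZE])
                for i in range(0, len(data), CHUNK_SIZE))
  failed = memcache.set_multi(chunks, key_prefix=key)
  return not failed  # set_multi returns the keys it could NOT store


def get_chunked(key, num_chunks):
  found = memcache.get_multi([str(i) for i in range(num_chunks)],
                             key_prefix=key)
  if len(found) != num_chunks:
    return None  # some chunks were evicted; treat it as a miss
  return ''.join(found[str(i)] for i in range(num_chunks))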

0 votes

The best way to store a large blob of data in memcache is to split it up into chunks and use set_multi and get_multi to store and retrieve the data efficiently.

But be aware that it's possible for some parts to drop from cache and others to remain.

You can also cache data in the application instance by storing it in a global variable, but this is less ideal as it won't be shared across instances and is more likely to disappear.
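
A minimal sketch of that instance-level caching (the names are illustrative; each instance keeps its own copy, and the cache disappears whenever the instance is recycled):

_instance_cache = {}  # module-level dict, private to this instance


def get_cached(key, compute_fn):
  # Recompute the value only when this instance hasn't seen it yet.
  if key not in _instance_cache:
    _instance_cache[key] = compute_fn()
  return _instance_cache[key]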

Support for uploading to the blobstore from within the application is on the GAE roadmap, so you might want to keep an eye out for that, as well as for integration with Google Storage.

0 votes

As others have mentioned, you can add and retrieve multiple values from memcache at once. Interestingly, while the App Engine blog says these bulk operations can handle up to 32 MB, the official documentation still says they're limited to 1 MB. So definitely test it out, and maybe pester Google about updating their documentation. Also keep in mind that some of your chunks might get evicted from memcache before others.

I'd recommend googling "python compress string" and thinking about serializing and compressing your object before sending it to memcache.
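
For instance, a rough sketch using the standard library's zlib on top of pickle (whether this actually gets you under the 1 MB limit depends entirely on how compressible your object is):

import pickle
import zlib

from google.appengine.api import memcache


def set_compressed(key, value):
  # Serialize first, then compress the resulting byte string.
  data = zlib.compress(pickle.dumps(value, pickle.HIGHEST_PROTOCOL))
  return memcache.set(key, data)


def get_compressed(key):
  data = memcache.get(key)
  if data is None:
    return None
  return pickle.loads(zlib.decompress(data))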

You might also want to ask this guy what he means when he says he has an extension that allows him to store larger objects in memcache.

0 votes

A nice workaround is to use layer_cache.py, an open-source Python class written and used at Khan Academy. Basically it combines an in-memory cache (the cachepy module) with memcache, which is used to keep the in-memory caches in sync across instances. The source and Ben Kamens's blog post about it are available online.
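
A rough sketch of the general idea (not the actual layer_cache.py API; the names here are illustrative): check a per-instance dict first, fall back to memcache, and repopulate the local layer on a memcache hit.

from google.appengine.api import memcache

_local_cache = {}  # per-instance layer; memcache is the shared layer


def layered_get(key, compute_fn, time=60):
  if key in _local_cache:
    return _local_cache[key]
  value = memcache.get(key)
  if value is None:
    value = compute_fn()
    memcache.set(key, value, time=time)
  _local_cache[key] = value
  return value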