59
votes

Is there any way of keeping a result variable in memory so I don't have to recalculate it each time I run the beginning of my script? I am doing a long (5-10 sec) series of the exact same operations on a data set (which I am reading from disk) every time I run my script. This wouldn't be too much of a problem, since I'm pretty good at using the interactive editor to debug my code in between runs; however, sometimes the interactive capabilities just don't cut it.

I know I could write my results to a file on disk, but I'd like to avoid doing so if at all possible. This should be a solution which generates a variable the first time I run the script, and keeps it in memory until the shell itself is closed or until I explicitly tell it to fizzle out. Something like this:

# Check if variable already created this session
in_mem = var_in_memory() # Returns pointer to var, or False if not in memory yet
if not in_mem:
    # Read data set from disk
    with open('mydata', 'r') as in_handle:
        mytext = in_handle.read()
    # Extract relevant results from data set
    mydata = parse_data(mytext)
    result = initial_operations(mydata)
    in_mem = store_persistent(result)

I have an inkling that the shelve module might be what I'm looking for here, but it looks like I would have to specify a file name for the persistent object in order to open a shelve variable, so I'm not sure it's quite what I want.

Any tips on getting shelve to do what I want it to do? Any alternative ideas?

7 Answers

55
votes

You can achieve something like this by using the built-in reload function (importlib.reload in Python 3) to re-execute your main script's code. Write a wrapper script that imports your main script, asks it for the value you want to cache, and keeps that value in the wrapper's module scope. Whenever you want to re-run (when you hit ENTER on stdin, or whatever), the wrapper calls reload(yourscriptmodule) and passes the cached object back in, so your script can bypass the expensive computation. Here's a quick example.

wrapper.py

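import sys

try:
    from importlib import reload  # Python 3; on Python 2, reload is a builtin
except ImportError:
    pass

import mainscript

part1Cache = None
if __name__ == "__main__":
    while True:
        # Compare against None so a falsy result (0, "", []) still counts as cached
        if part1Cache is None:
            part1Cache = mainscript.part1()
        mainscript.part2(part1Cache)
        print("Press enter to re-run the script, CTRL-C to exit")
        sys.stdin.readline()
        # Pick up any edits made to mainscript.py since the last run
        reload(mainscript)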

mainscript.py

def part1():
    print("part1 expensive computation running")
    return "This was expensive to compute"

def part2(value):
    print("part2 running with %s" % value)

While wrapper.py is running, you can edit mainscript.py, add new code to the part2 function, and run your new code against the pre-computed part1Cache.

9
votes

To keep data in memory, the process must keep running. Memory belongs to the process running the script, NOT to the shell. The shell cannot hold memory for you.

So if you want to change your code and keep your process running, you'll have to reload the modules when they change. If any of the data in memory is an instance of a class that changes, you'll have to find a way to convert it to an instance of the new class. It's a bit of a mess. Few languages have ever been good at this kind of hot patching (Common Lisp comes to mind as one that is), and there are a lot of chances for things to go wrong.
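To make the class problem concrete, here is a minimal Python 3 sketch. The module name hotmod_demo and the class Result are made up for illustration, and the module is written to a temporary directory only so the example is self-contained. Reassigning __class__ is just one possible way to migrate an instance, and it only works when the old and new classes have compatible layouts.

```python
import importlib
import os
import sys
import tempfile


def demo_stale_instance():
    """Return (is_instance_after_reload, is_instance_after_migration)."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical module, created on the fly for this demonstration
        with open(os.path.join(tmp, "hotmod_demo.py"), "w") as f:
            f.write("class Result(object):\n    pass\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            module = importlib.import_module("hotmod_demo")
            old = module.Result()

            # Reloading re-executes the module and creates a NEW class object
            importlib.reload(module)
            stale = isinstance(old, module.Result)

            # One possible fix: point the existing instance at the new class
            old.__class__ = module.Result
            migrated = isinstance(old, module.Result)
            return stale, migrated
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("hotmod_demo", None)


result = demo_stale_instance()
```

The first element of the returned tuple shows that an object created before the reload is no longer an instance of the reloaded class, which is exactly the mess described above.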

6
votes

If you only want to persist one object (or object graph) for future sessions, the shelve module probably is overkill. Just pickle the object you care about. Do the work and save the pickle if you have no pickle-file, or load the pickle-file if you have one.

import os

try:
    import cPickle as pickle  # Python 2: faster C implementation
except ImportError:
    import pickle  # Python 3

pickle_filepath = "/path/to/picklefile.pickle"

if not os.path.exists(pickle_filepath):
    # Read data set from disk
    with open('mydata', 'r') as in_handle:
        mytext = in_handle.read()
    # Extract relevant results from data set
    mydata = parse_data(mytext)
    result = initial_operations(mydata)
    # Pickle files must be opened in binary mode
    with open(pickle_filepath, 'wb') as pickle_handle:
        pickle.dump(result, pickle_handle)
else:
    with open(pickle_filepath, 'rb') as pickle_handle:
        result = pickle.load(pickle_handle)

4
votes

Python's shelve is a persistence solution for pickled (serialized) objects and is file-based. The advantage is that it stores Python objects directly, meaning the API is pretty simple.
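As a rough illustration of how simple the shelve API is, here is a minimal Python 3 sketch. The helper name cached_result, the key "result", and the compute callback are assumptions made for this example; note that the shelf still lives in a file on disk.

```python
import shelve


def cached_result(shelf_path, compute):
    """Return the stored result, computing and storing it on first use."""
    # shelve.open creates the backing file(s) if they do not exist yet
    with shelve.open(shelf_path) as shelf:
        if "result" not in shelf:
            shelf["result"] = compute()  # pickled and written to disk
        return shelf["result"]
```

You would call it as cached_result("/some/path/cache", my_expensive_function); the second and later calls skip the computation as long as the shelf file is still there.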

If you really want to avoid the disk, the technology you are looking for is an "in-memory database". Several alternatives exist; see this SO question: in-memory database in Python.
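One option from the standard library is sqlite3 with the special ":memory:" path. This is only a sketch: the table name, column names and sample rows are invented for the example. Keep in mind that such a database exists only as long as the connection (and therefore the process) stays alive.

```python
import sqlite3


def build_in_memory_db(rows):
    """Create an in-memory SQLite database holding (name, value) pairs."""
    conn = sqlite3.connect(":memory:")  # nothing is written to disk
    conn.execute("CREATE TABLE results (name TEXT PRIMARY KEY, value REAL)")
    conn.executemany("INSERT INTO results VALUES (?, ?)", rows)
    conn.commit()
    return conn


# Made-up sample data standing in for the expensive results
conn = build_in_memory_db([("mean", 4.5), ("max", 9.0)])
value = conn.execute(
    "SELECT value FROM results WHERE name = ?", ("max",)
).fetchone()[0]
```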

2
votes

This is an OS-dependent solution...

$ mkfifo inpipe

#!/usr/bin/python3
# first_process.py
complicated_calculation()
while True:
    # open() blocks until a writer connects to the pipe
    with open('inpipe') as f:
        try:
            # exec() returns None, so there is nothing useful to print
            exec(f.read())
        except Exception as e:
            print(e)

$ ./first_process.py &
$ cat second_process.py > inpipe

This lets you change and redefine variables in the first process without copying or recalculating anything. Because nothing is serialized or copied, it should be more efficient than multiprocessing, memcached, pickle, shelve, or a database.

This is really nice if you want to edit and redefine second_process.py iteratively in your editor or IDE until you have it right without having to wait for the first process (e.g. initializing a large dict, etc.) to execute each time you make a change.

0
votes

You can do this but you must use a Python shell. In other words, the shell that you use to start Python scripts must be a Python process. Then, any global variables or classes will live until you close the shell.

Look at the cmd module, which makes it easy to write a shell program. You can even arrange it so that any commands that are not implemented in your shell get passed to the system shell for execution (without closing your shell). Then you would have to implement some kind of command, prun for instance, that runs a Python script by using the runpy module.

http://docs.python.org/library/runpy.html

You would need to use the init_globals parameter to pass your special data to the program's namespace, ideally a dict or a single class instance.
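Here is a minimal Python 3 sketch of such a shell. The class name CacheShell and the global name CACHE are made up for illustration; the only library pieces relied on are cmd.Cmd, runpy.run_path with its init_globals parameter, and subprocess.call.

```python
import cmd
import runpy
import subprocess


class CacheShell(cmd.Cmd):
    """Tiny shell whose cache dict survives across script runs."""

    prompt = "(cache) "

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.cache = {}  # lives as long as this shell process

    def do_prun(self, path):
        """prun SCRIPT: run a Python script with the cache injected as CACHE."""
        try:
            # The script sees the very same dict object under the name CACHE
            runpy.run_path(path, init_globals={"CACHE": self.cache},
                           run_name="__main__")
        except Exception as exc:
            print("error: %r" % (exc,))

    def default(self, line):
        # Anything that is not a shell command goes to the system shell
        subprocess.call(line, shell=True)

    def do_EOF(self, line):
        """Exit the shell on end-of-file (Ctrl-D)."""
        return True


# To start the interactive loop: CacheShell().cmdloop()
```

A script run through prun could then start with something like: if "result" not in CACHE: CACHE["result"] = initial_operations(mydata). Only the first run pays for the computation.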

0
votes

You could run a persistent script on the server, started through the OS, which loads and calculates the SQL data into in-memory structures of some sort (and can even reload or recalculate it periodically). Your other script then accesses the in-memory data through a socket.
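A minimal Python 3 sketch of that idea, using multiprocessing.connection from the standard library as the socket layer. The function names, the AUTHKEY value and the "send None to stop" convention are assumptions made for this example, not part of any existing setup.

```python
from multiprocessing.connection import Client, Listener

AUTHKEY = b"change-me"  # placeholder secret shared by server and clients


def serve(listener, data):
    """Answer key lookups against the in-memory dict until a client sends None."""
    while True:
        with listener.accept() as conn:
            key = conn.recv()
            if key is None:  # hypothetical shutdown convention
                break
            conn.send(data.get(key))


def lookup(address, key):
    """Ask the persistent process for one value."""
    with Client(address, authkey=AUTHKEY) as conn:
        conn.send(key)
        return conn.recv()


def shutdown(address):
    """Tell the persistent process to stop serving."""
    with Client(address, authkey=AUTHKEY) as conn:
        conn.send(None)
```

The long-running script would create Listener(("localhost", some_port), authkey=AUTHKEY), do the expensive load once, and call serve(listener, data); the script you keep editing only calls lookup.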