35
votes

I have some code in the form of a string and would like to make a module out of it without writing to disk.

When I try using imp and a StringIO object to do this, I get:

>>> imp.load_source('my_module', '', StringIO('print "hello world"'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: load_source() argument 3 must be file, not instance
>>> imp.load_module('my_module', StringIO('print "hello world"'), '', ('', '', 0))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: load_module arg#2 should be a file or None

How can I create the module without having an actual file? Alternatively, how can I wrap a StringIO in a file without writing to disk?

UPDATE:

NOTE: This is also a problem in Python 3.

The code I'm trying to load is only partially trusted. I've gone through it with ast and determined that it doesn't import anything or do anything I don't like, but I don't trust it enough to run it when I have local variables running around that could get modified, and I don't trust my own code to stay out of the way of the code I'm trying to import.

I created an empty module that only contains the following:

def load(code):
    # Delete all local variables
    globals()['code'] = code
    del locals()['code']

    # Run the code
    exec(globals()['code'])

    # Delete any global variables we've added
    del globals()['load']
    del globals()['code']

    # Copy k so we can use it
    if 'k' in locals():
        globals()['k'] = locals()['k']
        del locals()['k']

    # Copy the rest of the variables
    for k in locals().keys():
        globals()[k] = locals()[k]

Then you can import mymodule and call mymodule.load(code). This works for me because I've ensured that the code I'm loading does not use globals. Also, the global keyword is only a parser directive and can't refer to anything outside of the exec.

This really is way too much work to import the module without writing to disk, but if you ever want to do this, I believe it's the best way.

6
If you don't trust the code, do not load it. It's terribly hard to make a Python interpreter really safe (e.g. there are ways to patch the bytecode of a different stack frame once you have access to a code object). So if this is just to prevent errors it might work, but as a security measure it is probably a bad idea. - schlenk
I've used ast pretty extensively to ensure it can't do anything dangerous. There may be holes, but I'm pretty sure that the holes can be patched if discovered. - Conley Owens
del locals()['code'] does nothing, as the locals() dictionary is but a proxy, a reflection of the actual locals. Just use del code if you must delete a reference. Not that I am sure why you are doing this at all. Why the whole dance with globals() and locals() in the first place? - Martijn Pieters♦
@ConleyOwens: I'm pretty sure they can't be patched. Python's dynamic nature leaves you wide open. Execute untrusted code in a virtual machine without network access that you then wipe and restart from a trusted image. - Martijn Pieters♦
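
The point about locals() in the comments above can be checked with a minimal sketch. The function name demo is made up for illustration; it only shows that deleting a key from the dictionary returned by locals() leaves the real local variable bound, while a plain del statement unbinds it.

```python
def demo(code):
    # Deleting from the dict returned by locals() only changes that dict;
    # the actual local variable 'code' stays bound.
    del locals()['code']
    still_bound = code

    # A plain del statement really unbinds the local name.
    del code
    try:
        code
    except UnboundLocalError:
        really_gone = True
    else:
        really_gone = False
    return still_bound, really_gone


result = demo('a = 5')
print(result)
```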

6 Answers

59
votes

Here is how to import a string as a module (Python 2.x):

import sys, imp

my_code = 'a = 5'
mymodule = imp.new_module('mymodule')
exec my_code in mymodule.__dict__

In Python 3, exec is a function, so this should work (on versions before 3.12, which removed the imp module):

import sys, imp

my_code = 'a = 5'
mymodule = imp.new_module('mymodule')
exec(my_code, mymodule.__dict__)

Now access the module attributes (and functions, classes etc) as:

print(mymodule.a)  # prints 5

To make any later import of mymodule return this object instead of searching for a file, add the module to sys.modules:

sys.modules['mymodule'] = mymodule
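
As a rough sketch of what registering the module buys you, the snippet below uses types.ModuleType (instead of imp.new_module) and a made-up module name, string_module_demo, chosen only to avoid clashing with anything real. Once the entry is in sys.modules, a normal import statement returns the same object without touching the disk.

```python
import sys
from types import ModuleType

my_code = 'a = 5'

# 'string_module_demo' is a hypothetical name used only for this example.
module = ModuleType('string_module_demo')
exec(my_code, module.__dict__)

# Register the module so the import system can find it by name.
sys.modules['string_module_demo'] = module

# The import statement now resolves from sys.modules; no file is needed.
import string_module_demo as imported_again

print(imported_again is module)
print(imported_again.a)
```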
13
votes

imp.new_module has been deprecated since Python 3.4, and the imp module itself was removed in Python 3.12.

The short solution from schlenk using types.ModuleType still works in Python 3.7.

The replacement for imp.new_module is importlib.util.module_from_spec.

importlib.util.module_from_spec is preferred over types.ModuleType for creating a new module, because the spec is used to set as many import-controlled attributes on the module as possible.

importlib.util.spec_from_loader uses available loader APIs, such as InspectLoader.is_package(), to fill in any missing information on the spec.

After the code below has run, the module has these attributes in addition to f: __builtins__ (added by exec), __doc__, __loader__, __name__, __package__, __spec__

import sys
import importlib.util  # "import importlib" alone does not guarantee importlib.util is loaded

my_name = 'my_module'
my_spec = importlib.util.spec_from_loader(my_name, loader=None)

my_module = importlib.util.module_from_spec(my_spec)

my_code = '''
def f():
    print('f says hello')
'''
exec(my_code, my_module.__dict__)
sys.modules['my_module'] = my_module

my_module.f()
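
To see the difference between the two ways of creating a module, here is a small sketch that builds one module with importlib.util.module_from_spec and one with types.ModuleType, then prints a few import-related attributes side by side. The names spec_demo and plain_demo are invented for the example.

```python
import importlib.util
from types import ModuleType

# Hypothetical module names, used only for this comparison.
spec = importlib.util.spec_from_loader('spec_demo', loader=None)
from_spec = importlib.util.module_from_spec(spec)
plain = ModuleType('plain_demo')

# Show each attribute on the spec-built module and on the bare ModuleType.
for name in ('__name__', '__package__', '__loader__', '__spec__'):
    print(name, repr(getattr(from_spec, name)), repr(getattr(plain, name)))
```

The spec-built module carries its spec in __spec__, while the bare ModuleType instance has __spec__ set to None.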
5
votes

You could simply create a module object, put it into sys.modules, and execute your code inside it.

Something like:

import sys
from types import ModuleType

mycode = 'a = 5'  # example source; any string of Python code will do
mod = ModuleType('mymodule')
sys.modules['mymodule'] = mod
exec(mycode, mod.__dict__)
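
If you do this in more than one place, it may be worth wrapping the steps in a helper. The following is only a sketch: module_from_string and the module names are made up, and the rollback on failure is a design choice of this example rather than anything the import system requires. It removes the entry from sys.modules again if the code raises, so a half-initialised module is not left behind.

```python
import sys
from types import ModuleType


def module_from_string(name, source):
    """Create a module called `name` from `source` and register it."""
    module = ModuleType(name)
    sys.modules[name] = module
    try:
        exec(source, module.__dict__)
    except BaseException:
        # Do not leave a half-initialised module in sys.modules.
        sys.modules.pop(name, None)
        raise
    return module


# 'string_demo' is a hypothetical module name.
mod = module_from_string('string_demo', 'def double(x):\n    return 2 * x\n')
print(mod.double(21))
```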
3
votes

If the code for the module is in a string, you can forgo using StringIO and use it directly with exec, as illustrated below with a file named dynmodule.py. Works in Python 2 & 3.

from __future__ import print_function

class _DynamicModule(object):
    def load(self, code):
        execdict = {'__builtins__': None}  # optional, to increase safety
        exec(code, execdict)
        keys = execdict.get(
            '__all__',  # use __all__ attribute if defined
            # else all non-private attributes
            (key for key in execdict if not key.startswith('_')))
        for key in keys:
            setattr(self, key, execdict[key])

# replace this module object in sys.modules with empty _DynamicModule instance
# see Stack Overflow question:
# https://stackguides.com/questions/5365562/why-is-the-value-of-name-changing-after-assignment-to-sys-modules-name
import sys as _sys
_ref, _sys.modules[__name__] = _sys.modules[__name__], _DynamicModule()

if __name__ == '__main__':
    import dynmodule  # name of this module
    import textwrap  # for more readable code formatting in sample string

    # string to be loaded can come from anywhere or be generated on-the-fly
    module_code = textwrap.dedent("""\
        foo, bar, baz = 5, 8, 2

        def func():
            return foo*bar + baz

        __all__ = 'foo', 'bar', 'func'  # 'baz' not included
        """)

    dynmodule.load(module_code)  # defines module's contents

    print('dynmodule.foo:', dynmodule.foo)
    try:
        print('dynmodule.baz:', dynmodule.baz)
    except AttributeError:
        print('no dynmodule.baz attribute was defined')
    else:
        print('Error: there should be no dynmodule.baz module attribute')
    print('dynmodule.func() returned:', dynmodule.func())

Output:

dynmodule.foo: 5
no dynmodule.baz attribute was defined
dynmodule.func() returned: 42

Setting the '__builtins__' entry to None in the execdict dictionary prevents the code from directly calling any built-in functions, like __import__, and so makes running it somewhat safer. It is not a real sandbox, since code can still reach built-ins indirectly through object introspection. You can ease the restriction by selectively adding things you feel are OK and/or required.

It's also possible to add your own predefined utilities and attributes which you'd like made available to the code thereby creating a custom execution context for it to run in. That sort of thing can be useful for implementing a "plug-in" or other user-extensible architecture.
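
A minimal sketch of such a custom context might look like the following. The helper run_plugin, the allowed_builtins selection and the scale variable are all assumptions made for illustration. As the comments on the question point out, this limits accidents rather than providing real security.

```python
def run_plugin(code, extra=None):
    """Run `code` with a small set of built-ins plus caller-supplied names."""
    # Only these built-ins are visible to the executed code (example selection).
    allowed_builtins = {'len': len, 'range': range, 'sum': sum}
    context = {'__builtins__': allowed_builtins}
    context.update(extra or {})
    exec(code, context)
    context.pop('__builtins__', None)
    return context


# 'scale' is a predefined value handed to the plug-in code.
plugin_code = "total = sum(scale * n for n in range(4))"
namespace = run_plugin(plugin_code, {'scale': 2})
print(namespace['total'])

# Names outside the allowed set are simply not defined for the plug-in.
try:
    run_plugin("open('some_file')")
except NameError:
    blocked = True
else:
    blocked = False
print(blocked)
```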

-1
votes

You could use exec or eval to execute Python code that is held in a string. See here, here and here.
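
For reference, the two built-ins are not interchangeable: exec runs statements, while eval only evaluates a single expression. A short sketch, with an example namespace and example source strings:

```python
namespace = {}

# exec handles statements such as assignments and function definitions.
exec('a = 5\ndef square(x):\n    return x * x\n', namespace)

# eval evaluates one expression and returns its value.
value = eval('square(a) + 1', namespace)
print(value)

# An assignment is a statement, so eval rejects it.
try:
    eval('b = 1', namespace)
except SyntaxError:
    eval_rejects_statements = True
else:
    eval_rejects_statements = False
print(eval_rejects_statements)
```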

-2
votes

The documentation for imp.load_source says (my emphasis):

The file argument is the source file, open for reading as text, from the beginning. It must currently be a real file object, not a user-defined class emulating a file.

... so you may be out of luck with this method, I'm afraid.

Perhaps eval would be enough for you in this case?

This sounds like a rather surprising requirement, though - it might help if you add some more to your question about the problem you're really trying to solve.