8
votes

I'm trying to install and run pandas on an Amazon Lambda instance. I've used the recommended zip method of packaging my code file model_a.py and related python libraries (pip install pandas -t /path/to/dir/) and uploaded the zip to Lambda. When I try to run a test, this is the error message I get:

Unable to import module 'model_a': C extension: /var/task/pandas/hashtable.so: undefined symbol: PyFPE_jbuf not built. If you want to import pandas from the source directory, you may need to run 'python setup.py build_ext --inplace' to build the C extensions first.

Looks like an error in a variable defined in hashtable.so that comes with the pandas installer. Googling for this did not turn up any relevant articles. There were some references to a failure in numpy installation but nothing concrete. Would appreciate any help in troubleshooting this! Thanks.

3
Why don't you try the virtualenv-based approach? That way you won't miss any dependencies required by the python packages that you include in your lambda deployment package.Leon
@Leon Isn't that virtually the same thing?rtindru
I thought they were different, but cannot find any evidence supporting that point of view.Leon
Check the answer at stackoverflow.com/a/43766512/345606 for advice on including Python packages, like Pandas, that have compiled code.Kevin
I've deployed Pandas projects to AWS Lambda several times using Zappa and I haven't hit the problem you're running into. Zappa also works out of virtual environments. So I'm not sure if it's the venv step or how Zappa packages up the libraries that deserves the credit for avoiding the problem.Chad Parmet

3 Answers

2
votes

I would advise you to use Lambda layers to use additional libraries. The size of a lambda function package is limited, but layers can be used up to 250MB (more here).

AWS has open sourced a good package, including Pandas, for dealing with data in Lambdas. AWS has also packaged it making it convenient for Lambda layers. You can find instructions here.

0
votes

I have successfully run pandas code on lambda before. If your development environment is not binary-compatible with the lambda environment, you will not be able to simply run pip install pandas -t /some/dir and package it up into a lambda .zip file. Even if you are developing on linux, you may still run into compatability issues.

So, how do you get around this? The solution is actually pretty simple: run your pip install on a lambda container and use the pandas module that it downloads/builds instead. When I did this, I had a build script that would spin up an instance of the lambci/lambda container on my local system (a clone of the AWS Lambda container in docker), bind my local build folder to /build and run pip install pandas -t /build/. Once that's done, kill the container and you have the lambda-compatible pandas module in your local build folder, ready to zip up and send to AWS along with the rest of your code.

You can do this for an arbitrary set of python modules by making use of a requirements.txt file, and you can even do it for arbitrary versions of python by first creating a virtual environment on the lambci container. I haven't needed to do this for a couple of years, so maybe there are better tools by now, but this approach should at least be functional.

0
votes

If you want to install it directly through the AWS Console, I made a step-by-step youtube tutorial, check out the video here: How to install Pandas on AWS Lambda