36
votes

I've used pm2 for my Node.js scripts and I love it.
Now I have a Python script that collects streaming data on EC2. Sometimes the script bombs out, and I'd like a process manager to restart it automatically, the way pm2 does.

Is there an equivalent of pm2 for Python? I've been searching around and couldn't find anything.

Here's my error:

  File "/usr/local/lib/python2.7/dist-packages/tweepy/streaming.py", line 430, in filter
    self._start(async)
  File "/usr/local/lib/python2.7/dist-packages/tweepy/streaming.py", line 346, in _start
    self._run()
  File "/usr/local/lib/python2.7/dist-packages/tweepy/streaming.py", line 286, in _run
    raise exception
AttributeError: 'NoneType' object has no attribute 'strip'
/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/util/ssl_.py:90:

It's a simple data-collecting script:

import json

from pymongo import MongoClient
from tweepy import OAuthHandler, Stream
from tweepy.streaming import StreamListener

db = MongoClient().streaming  # assumes a local MongoDB instance


class StdOutListener(StreamListener):

    def on_data(self, data):
        # Each message arrives as a JSON string; store the parsed document
        mydata = json.loads(data)
        db.raw_tweets.insert_one(mydata)
        return True

    def on_error(self, status):
        # status is an HTTP status code (an int), not a JSON string
        db.error_tweets.insert_one({'status': status})


if __name__ == '__main__':

    # This handles Twitter authentication and the connection to the
    # Twitter Streaming API (credentials are defined elsewhere)
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)

    # This line filters the stream down to the given user IDs
    stream.filter(follow=[''])

I'd like it to just restart itself in case something happens.

7
May I suggest supervisord.org? – doog abides

7 Answers

6
votes

UPD: See answers below for better solutions.

--

There are several solutions for that. First, you can use http://supervisord.org/, which is a decent general-purpose process control system with a lot of features out of the box, such as autorestart, a restart counter, logging, flexible configuration and more.
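For example, a minimal supervisord program section for a script like the one in the question might look like this (the program name and paths are placeholders, not from the original post):

```ini
[program:tweet_collector]
command=/usr/bin/python /home/ubuntu/collector.py
; restart whenever the script exits
autorestart=true
; stop retrying after 10 rapid failures in a row
startretries=10
stdout_logfile=/var/log/tweet_collector.out.log
stderr_logfile=/var/log/tweet_collector.err.log
```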

Beyond that, you can simply wrap your logic in a function, run it inside a try/except block, catch all exceptions, and when an exception is caught, call the function again instead of letting the script exit. In your case such a function would include creating the listener, authenticating, and starting the stream.
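A minimal sketch of that wrapper (the function and parameter names here are made up for illustration):

```python
import time
import traceback


def run_with_restarts(task, delay=5.0, max_restarts=None):
    """Call task() and restart it whenever it raises an exception.

    With max_restarts=None it restarts forever, like a process manager.
    """
    restarts = 0
    while True:
        try:
            task()
            return  # task finished normally
        except Exception:
            traceback.print_exc()
            restarts += 1
            if max_restarts is not None and restarts > max_restarts:
                raise  # too many failures in a row; give up
            time.sleep(delay)  # back off briefly before restarting
```

In the question's script, `task` would be a function that creates the listener, authenticates, and calls `stream.filter(...)`.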

97
votes

You can actually run python scripts from within pm2:

pm2 start echo.py

If the script's filename ends in .py, pm2 uses a Python interpreter by default. If it doesn't end in .py, you can do:

pm2 start echo --interpreter=python

I've found you have to be a little careful about which Python is being used, especially with a virtualenv whose Python version differs from the 'default' Python on your machine; passing the interpreter's full path (e.g. --interpreter=/path/to/venv/bin/python) removes the ambiguity.

13
votes

PM2 is enough; it picks the interpreter by file suffix:

{
  ".sh": "bash",
  ".py": "python",
  ".rb": "ruby",
  ".coffee" : "coffee",
  ".php": "php",
  ".pl" : "perl",
  ".js" : "node"
}

11
votes

I created an ecosystem file, ecosystem.config.json:

{
    "apps": [{
        "name": "app_name",
        "script": "/the/app/path/my_app.py",
        "args": ["-c", "my_config.prod.json"],
        "instances": "1",
        "wait_ready": true,
        "autorestart": false,
        "max_restarts": 5,
        "interpreter" : "/path/to/venv/bin/python"
    }]
}

Start it with pm2:

$ pm2 start ecosystem.config.json
$ pm2 -v
3.2.8

10
votes

PM2 with pipenv

For those trying to run a Python program from/with pipenv, try a pm2.config.json (or ecosystem.config.json, as in the official PM2 documentation) like this:

The important parts are "interpreter": "pipenv" and "interpreter_args": "run python3".

pm2.config.json

{
    "apps": [{
        "name": "BackupService",
        "script": "/home/service-backup/service/server.py",
        "args": [""],
        "wait_ready": true,
        "autorestart": false,
        "max_restarts": 5,
        "interpreter" : "pipenv",
        "interpreter_args": "run python3"
    }]
}

Then run pm2 start pm2.config.json. I always pm2 delete BackupService (or whatever you call it in "name") before starting again, because even with the --update-env flag pm2 does not pick up an updated pm2.config.json. I don't know why.

Also note that "interpreter_args" seems to have been renamed to "node_args", according to the latest PM2 docs. I am running pm2 --version 3.0.0, and the old name still works.

PM2 with Python multiprocessing

If you want to run a Python program that uses Python's multiprocessing library, the solution is to force PM2 to run it in fork mode; if not told otherwise, PM2 apparently tries to run it in cluster mode.

However, I suspect we need to leave the multiprocessing part entirely to Python. I can't imagine PM2 being able to manage the multiple processes spawned by Python's multiprocessing, which it tries to do when running in cluster mode. Also, according to the PM2 docs, only fork mode will work when the "interpreter" option is used (e.g. for pipenv).

So add "exec_mode": "fork" to your pm2.config.json to make it run.

If you don't use a pm2.config.json file, passing -i 0 to pm2 start should force fork mode as well. (-i stands for instances)
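Putting the pieces from this answer together, a pm2.config.json forcing fork mode might look like this (the name and paths are the placeholder values from above):

```json
{
    "apps": [{
        "name": "BackupService",
        "script": "/home/service-backup/service/server.py",
        "exec_mode": "fork",
        "interpreter": "pipenv",
        "interpreter_args": "run python3"
    }]
}
```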

2
votes

In my case, I use scrapyd in my project. The original command is:

scrapyd --pidfile /var/log/scrapyd/twistd.pid -l /var/log/scrapyd/logs/scrapyd.log

and the pm2 version is:

pm2 start scrapyd --interpreter python --watch --name=scrapyd -- --pidfile "/var/log/scrapyd/twistd.pid" -l "/var/log/scrapyd/logs/scrapyd.log"

Hope this example helps.

2
votes

You can use nohup. Nohup, short for "no hang-up", is a command on Linux systems that keeps processes running even after you exit the shell or terminal. It does this by preventing the process from receiving the SIGHUP (Signal Hang UP) signal, which is sent to a process when its terminal is closed. Some basic nohup commands are given below.

 nohup mycommand

   OR

 nohup python3 -m flask run &
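A quick sketch of the usual pattern (the one-liner script and log file name here are made up): redirect output to a log file and background the process with &, so it survives after you log out.

```shell
# Start the "collector" detached from the terminal; stdout and stderr
# go to collector.log instead of the (soon to be closed) terminal.
nohup python3 -c 'print("collector started")' > collector.log 2>&1 &

wait $!              # only for this demo; normally you would just log out
cat collector.log    # shows the captured output
```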