6 votes

I am trying to set up a logger for my AWS Glue job using Python's logging module. I have a Glue job of type "Python Shell" using Python version 3.

Logging works fine if I instantiate the logger without any name, but if I give my logger a name, it no longer works, and I get an error which says: Log stream not found.

I have the following code in an example Glue job:

import sys
import logging

# Version 1 - this works fine
logger = logging.getLogger()
log_format = "[%(asctime)s %(levelname)-8s %(message)s"

# Version 2 - this fails
logger = logging.getLogger(name = "foobar")
log_format = "[%(name)s] %(asctime)s %(levelname)-8s %(message)s"

date_format = "%a, %d %b %Y %H:%M:%S %Z"
log_stream = sys.stdout
if logger.handlers:
  for handler in logger.handlers:
    logger.removeHandler(handler)
logging.basicConfig(level = logging.INFO, format = log_format,
    stream = log_stream, datefmt = date_format)
logger.info("This is a test.")

Note that I'm removing the handlers based on this post.
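For reference, a quick way to see which handlers are already attached before removing them (purely a diagnostic print, not part of the job logic):

import logging

# Diagnostic only: show whatever handlers are already attached to the
# root logger before they are removed.
for handler in logging.getLogger().handlers:
    print(type(handler).__name__, handler)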

If I instantiate the logger using Version 1 of the code, it runs successfully and I am able to view the logs, as well as query them in CloudWatch.

If I run Version 2, giving the logger a name, the Glue job still succeeds. However, if I try to view the logs, I get the following error message:

Log stream not found
The log stream jr_f137743545d3d242618ac95d859b9146fd15d15a0aadce64d8f3ba991ffed012 could not be found. Check if it was correctly created and retry.

And I am also not able to query these logs in CloudWatch.

I have tried running this code locally using Python version 3.6.0, and both versions work. Additionally, both versions of this logging code work inside a Lambda function. They only fail in Glue.


3 Answers

7 votes

This code worked for me:

import logging
import sys

root = logging.getLogger()
root.setLevel(logging.DEBUG)

handler = logging.StreamHandler(sys.stdout)
handler.setLevel(logging.DEBUG)
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
root.addHandler(handler)
root.info("check")

4 votes

I had a similar issue but fixed it with a combination of the correct role and looking in the right place in CloudWatch. Make sure you're using the GlueServiceRole. Both Steve's logging code and yours are fine, but the place CloudWatch takes you when you click the "logs" button in Glue isn't the correct logging folder.

Go back into the log groups, then into /aws-glue/python-jobs/error; that is where the logger writes to, while stdout writes to /aws-glue/python-jobs/output. Writing the logs to the error log group isn't a very intuitive setup, but hey ho, I'm sure there is a way of configuring it to write where you'd expect.
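If you want to confirm where a given run's output actually landed, something like the following sketch lists the most recently written streams in both groups (this assumes boto3 is available and your credentials can read CloudWatch Logs; it is not part of the Glue job itself):

import boto3

logs = boto3.client("logs")

# The two Glue Python Shell log groups mentioned above.
for group in ("/aws-glue/python-jobs/error", "/aws-glue/python-jobs/output"):
    print(group)
    # Show the five most recently written streams in this group.
    response = logs.describe_log_streams(
        logGroupName=group,
        orderBy="LastEventTime",
        descending=True,
        limit=5,
    )
    for stream in response["logStreams"]:
        print("  ", stream["logStreamName"])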

2 votes

You should be able to name the log stream by using the following (replace "logger-name-here" with your desired log stream name):

import logging

MSG_FORMAT = '%(asctime)s %(levelname)s %(name)s: %(message)s'
DATETIME_FORMAT = '%Y-%m-%d %H:%M:%S'
logging.basicConfig(format=MSG_FORMAT, datefmt=DATETIME_FORMAT)
logger = logging.getLogger("logger-name-here")

logger.setLevel(logging.INFO)

logger.info("Test log message")