I have created an AWS Glue job which executes successfully. However, I am unable to put any custom logging inside the job.

How can I create a log file in an AWS S3 bucket so that I can keep track of each day's job execution?

Currently, when my job executes it creates the default logs (i.e. Spark logs), and I can see them in AWS CloudWatch. What is the best practice for logging events in AWS Glue?

Please be specific about what information you want to show in CloudWatch; I will suggest a solution according to that. - varnit
I agree that by adding print or logger statements we can log all the events in CloudWatch. But the issue is that the CloudWatch logs become too big to find the root cause of a failure. Hence I am thinking of creating a custom log in an S3 bucket, so that the Spark logs go to CloudWatch and the custom logs get created in the S3 bucket. - trp
Yes, you can do that: just create a CloudWatch event to watch the AWS Glue job, and then store the logs by executing your custom code through AWS Lambda. Is that what you want? - varnit
Yes @varnit. I am very new to AWS. Can you please elaborate or provide a link that would be helpful? - trp
Errors are logged separately by CloudWatch, and that log is not bulky like the default log. Make sure you are logging correctly. - Abraham

1 Answer

Best-practice logging

AWS Glue is designed to log via CloudWatch (see this documentation for details). Since your logs are getting too big to identify the root cause, and there's no CloudWatch event to hook into that would line up with @varnit's suggestion, we can do the next-best thing: create a CloudWatch dashboard with a query that pulls a filtered version of your logs.

Create a custom dashboard

On the CloudWatch console, navigate to "Dashboards" and select "Create dashboard". Name it something meaningful (e.g., "glue-custom-logs"). Next, add and configure a "Query results" widget. Choose your log group, likely "/aws-glue/jobs/error" if you went with the defaults, and note that Glue defaults to the error stream if you're using normal Python prints. Choose a sane time window for your lookback so your results are somewhat pre-filtered.

If you have a unique identifier in your custom log messages, such as "glue-custom-log", we can now easily write a query filtering the results:

fields @timestamp, @message
| filter @message like 'glue-custom-log'
| sort @timestamp desc
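
For the filter to match, the job itself has to put the identifier into every custom message. A minimal sketch of one way to do that with Python's standard `logging` module follows; the logger name `glue_job`, the helper `build_custom_logger`, and the sample messages are hypothetical, and it assumes that whatever stream the handler writes to ends up in the log group you picked for the widget.

```python
import logging
import sys

# Hypothetical marker; it only has to match the string used in the query filter.
MARKER = "glue-custom-log"


def build_custom_logger(name="glue_job", stream=None):
    """Return a logger that prefixes every line with MARKER."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines through the root logger
    if not logger.handlers:  # do not stack handlers on repeated calls
        target = stream if stream is not None else sys.stderr
        handler = logging.StreamHandler(target)
        handler.setFormatter(
            logging.Formatter(MARKER + " %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


log = build_custom_logger()
log.info("job started")
log.info("rows written: %d", 1250)
```

Keeping the marker in the formatter rather than in each message means individual log calls cannot forget it.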

Save the widget, save the dashboard, and you now have an easy-access pre-filtered log in CloudWatch for your custom logging needs.
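
If you would rather pull the filtered lines from a script than from the console, the same query can be submitted through the CloudWatch Logs API. The following is only a sketch: `fetch_custom_logs` is a hypothetical helper, and `client` is assumed to behave like `boto3.client("logs")`, whose `start_query` and `get_query_results` calls it relies on.

```python
import time

QUERY_TEMPLATE = (
    "fields @timestamp, @message\n"
    "| filter @message like '{marker}'\n"
    "| sort @timestamp desc"
)


def fetch_custom_logs(client, log_group, marker, start, end, poll_seconds=1.0):
    """Run the filtered query and return each result row as a dict.

    `start` and `end` are epoch seconds bounding the lookback window.
    """
    query_id = client.start_query(
        logGroupName=log_group,
        startTime=int(start),
        endTime=int(end),
        queryString=QUERY_TEMPLATE.format(marker=marker),
    )["queryId"]

    # Queries run asynchronously, so poll until a final status is reported.
    while True:
        response = client.get_query_results(queryId=query_id)
        if response["status"] not in ("Scheduled", "Running"):
            break
        time.sleep(poll_seconds)

    if response["status"] != "Complete":
        raise RuntimeError("query ended with status " + response["status"])

    # Each row is a list of {"field": ..., "value": ...} cells.
    return [
        {cell["field"]: cell["value"] for cell in row}
        for row in response["results"]
    ]


# Example call (needs boto3 and AWS credentials, so it is left commented out):
# import boto3
# now = time.time()
# rows = fetch_custom_logs(boto3.client("logs"), "/aws-glue/jobs/error",
#                          "glue-custom-log", now - 86400, now)
```

From there the rows could be written wherever you like, including an S3 object, if a standalone daily log file is still wanted.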