I have several crawlers that crawl multiple sites and store the contents in a database. The program's logs are stored in CloudWatch Logs.

When a crawler successfully pulls back content, the log lines look similar to the ones below:

HTTP GET: 200 - https://www.thecheyennepost.com/news/national/r

HTTP GET: 200 - https://www.thecheyennepost.com/news/f-e-warren-hous

The issue I'm dealing with is identifying when 4xx errors pop up. Below is an example:

HTTP GET: 429 - https://www.livingstonparishnews.com/search/?l=25&sort=

HTTP GET: 429 - https://www.livingstonparishnews.com/search/?l=25&sort=rele

HTTP GET: 429 - https://www.ktbs.com/search/?l=25&s=start_time&sd=desc&f=

I tried using status_code=4* but that didn't do anything.

I just want to be able to filter for any and all 4xx errors.

Any help that can be provided would be greatly appreciated.

1 Answer

Yes! Now you can with Logs Insights :)

First, you need the new console UI, or some other way to get to the "Logs Insights" service.

CloudWatch -> CloudWatch Logs -> Log groups -> [your service logs]

With the new UI you can see this button (or search for Logs Insights in the search bar of the AWS console):

Cloud Watch Example

Now you can see this:

Logs Insights UI

  1. The query box. The query language is similar to SQL.
  2. The time range the query will search.

In your case, you need this query (tell me if you need to filter on anything else):

fields @message
| sort @timestamp desc
| filter @message like /4[0-9]{2}/

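Looking at your logs, the status code has a space on each side, so I think this version is better. Requiring the spaces keeps the pattern from matching a three-digit run that starts with 4 somewhere inside a URL: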

fields @message
| sort @timestamp desc
| filter @message like / 4[0-9]{2} /
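
If you want to sanity-check the two patterns before running them in the console, here is a minimal local sketch in Python. It assumes Python's `re` module treats these simple patterns the same way Logs Insights does, which should hold for a digit class and a fixed repeat count. The first three sample lines are copied from the question; the last one is a made-up line (not from your logs) added only to show what the surrounding spaces buy you.

```python
import re

LOG_LINES = [
    # Copied from the question.
    "HTTP GET: 200 - https://www.thecheyennepost.com/news/national/r",
    "HTTP GET: 429 - https://www.livingstonparishnews.com/search/?l=25&sort=",
    "HTTP GET: 429 - https://www.ktbs.com/search/?l=25&s=start_time&sd=desc&f=",
    # Hypothetical line: a successful request whose URL happens to contain "404".
    "HTTP GET: 200 - https://example.com/articles/404-page-tips",
]

# Equivalent to the pattern in the first query (no surrounding spaces).
LOOSE = re.compile(r"4[0-9]{2}")
# Equivalent to the pattern in the second query (space on each side).
STRICT = re.compile(r" 4[0-9]{2} ")


def matching(pattern, lines):
    """Return the lines in which the pattern occurs anywhere."""
    return [line for line in lines if pattern.search(line)]


loose_hits = matching(LOOSE, LOG_LINES)
strict_hits = matching(STRICT, LOG_LINES)

print("without spaces:", len(loose_hits), "lines")
print("with spaces:   ", len(strict_hits), "lines")
for line in strict_hits:
    print("  ", line)
```

The pattern without spaces also picks up the hypothetical 200 line because of the "404" in its URL, while the pattern with spaces only keeps the two 429 lines.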

And that's all.

Now run the query and you will see only the log lines that contain 4xx status codes. I hope that solves your problem.

NOTE: If you open Logs Insights directly from the search bar, you need to select the log group(s) the query should scan, using the drop-down above the query box.
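
If you would rather run the same query from a script than from the console, something like the following sketch should work with boto3's CloudWatch Logs client (`start_query` and `get_query_results`). The function name `run_4xx_query`, the 24-hour default window, and the one-second polling interval are my own choices for illustration, and the log group name is whatever group your crawlers write to. It needs AWS credentials with permission to query that log group.

```python
import time

# Same query as above, with spaces around the status code.
QUERY = """fields @message
| sort @timestamp desc
| filter @message like / 4[0-9]{2} /"""


def run_4xx_query(log_group, hours=24, client=None):
    """Run QUERY against one log group and return the matching @message values.

    `client` is optional so a stub can be passed in for testing; by default
    a real boto3 CloudWatch Logs client is created.
    """
    if client is None:
        import boto3  # imported lazily; requires configured AWS credentials
        client = boto3.client("logs")

    end = int(time.time())        # epoch seconds
    start = end - hours * 3600

    query_id = client.start_query(
        logGroupName=log_group,
        startTime=start,
        endTime=end,
        queryString=QUERY,
    )["queryId"]

    # Logs Insights queries run asynchronously, so poll until it finishes.
    while True:
        response = client.get_query_results(queryId=query_id)
        if response["status"] not in ("Scheduled", "Running"):
            break
        time.sleep(1)

    # Each result row is a list of {"field": ..., "value": ...} dicts.
    return [
        field["value"]
        for row in response["results"]
        for field in row
        if field["field"] == "@message"
    ]
```

Calling `run_4xx_query("your-log-group-name")` would then return the 4xx log lines from the last 24 hours as a list of strings. If the query ends in a state other than Complete, the list will simply be empty in this sketch, so add error handling if you rely on it.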