26 votes

Is there any way to 1) filter and 2) retrieve the raw log data out of CloudWatch via the API or from the CLI? I need to extract a subset of log events from CloudWatch for analysis.

I don't need to create a metric or anything like that. This is for historical research of a specific event in time.

I have gone to the log viewer in the console, but I am trying to pull out specific lines to tell me a story around a certain time. The log viewer would be nigh-impossible to use for this purpose. If I had the actual log file, I would just grep and be done in about 3 seconds. But I don't.

Clarification

In the description of CloudWatch Logs, it says, "You can view the original log data (only in the web view?) to see the source of the problem if needed. Log data can be stored and accessed (only in the web view?) for as long as you need using highly durable, low-cost storage so you don’t have to worry about filling up hard drives." --the parenthetical questions are mine

If this console view is the only way to get at the source data, then storing logs via CloudWatch is not an acceptable solution for my purposes. I need to get at the actual data with sufficient flexibility to search for patterns, not click through dozens of pages of lines and copy/paste. It appears, however, that a better way to get at the source data may not be available.


5 Answers

55 votes

For using AWSCLI (the plain one as well as with the cwlogs plugin), see http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/SearchDataFilterPattern.html

For the pattern syntax (plain text, [space separated], as well as {JSON syntax}), see: http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/FilterAndPatternSyntax.html

For the Python command line utility awslogs, see https://github.com/jorgebastida/awslogs.

AWSCLI: aws logs filter-log-events

AWSCLI is the official CLI for AWS services, and it now supports logs too.

To show help:

$ aws logs filter-log-events help

The filter can be based on:

  • log group name --log-group-name (only the last one is used)
  • log stream name --log-stream-name (can be specified multiple times)
  • start time --start-time
  • end time --end-time (not --stop-time)
  • filter pattern --filter-pattern

Only --log-group-name is obligatory.

Times are expressed as epoch time in milliseconds (not seconds).
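If you would rather compute such millisecond timestamps in Python than with the date utility (used in an example further below), here is a minimal sketch (the helper name to_epoch_ms is mine):

from datetime import datetime, timezone

def to_epoch_ms(iso8601):
    # Parse e.g. '2015-11-10T14:50:00Z' and return epoch milliseconds
    dt = datetime.strptime(iso8601, '%Y-%m-%dT%H:%M:%SZ').replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)

print(to_epoch_ms('2015-11-10T14:50:00Z'))  # 1447167000000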

The call might look like this:

$ aws logs filter-log-events \
    --start-time 1447167000000 \
    --end-time 1447167600000 \
    --log-group-name /var/log/syslog \
    --filter-pattern ERROR \
    --output text

It prints 6 columns of tab-separated text:

  • 1st: EVENTS (to denote that the line is a log record and not other information)
  • 2nd: eventId
  • 3rd: timestamp (the time the record declares as the event time)
  • 4th: logStreamName
  • 5th: message
  • 6th: ingestionTime

So if you have Linux command line utilities at hand and care only about log record messages for the interval from 2015-11-10T14:50:00Z to 2015-11-10T15:00:00Z, you may get them as follows:

$ aws logs filter-log-events \
    --start-time `date -d 2015-11-10T14:50:00Z +%s`000 \
    --end-time `date -d 2015-11-10T15:00:00Z +%s`000 \
    --log-group-name /var/log/syslog \
    --filter-pattern ERROR \
    --output text | grep "^EVENTS" | cut -f 5
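The same query can also be issued from Python through boto3, in case you want the events in a program rather than a shell pipeline. A minimal sketch, assuming default credentials and region are configured:

import boto3

client = boto3.client('logs')
# filter_log_events returns results page by page, so use a paginator
# to walk the whole interval instead of just the first response
paginator = client.get_paginator('filter_log_events')
pages = paginator.paginate(
    logGroupName='/var/log/syslog',
    startTime=1447167000000,  # 2015-11-10T14:50:00Z in epoch milliseconds
    endTime=1447167600000,    # 2015-11-10T15:00:00Z in epoch milliseconds
    filterPattern='ERROR',
)
for page in pages:
    for event in page['events']:
        print(event['message'])

A single FilterLogEvents response is capped (at most 1 MB of events), which is why the loop over pages, not a single call, is what corresponds to the grep above.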

AWSCLI with cwlogs plugin

The cwlogs AWSCLI plugin is simpler to use:

$ aws logs filter \
    --start-time 2015-11-10T14:50:00Z \
    --end-time 2015-11-10T15:00:00Z \
    --log-group-name /var/log/syslog \
    --filter-pattern ERROR

It expects a human-readable date-time and always returns text output with (space-delimited) columns:

  • 1st: logStreamName
  • 2nd: date
  • 3rd: time
  • 4th till the end: message

On the other hand, it is a bit more difficult to install (a few more steps, plus the current pip requires you to declare the installation domain as a trusted one).

$ pip install awscli-cwlogs --upgrade \
    --extra-index-url=http://aws-cloudwatch.s3-website-us-east-1.amazonaws.com/ \
    --trusted-host aws-cloudwatch.s3-website-us-east-1.amazonaws.com
$ aws configure set plugins.cwlogs cwlogs

(if you make a typo in the last command, just correct it in the ~/.aws/config file)

awslogs command from jorgebastida/awslogs

This has become my favourite one: easy to install, powerful, easy to use.

Installation:

$ pip install awslogs

To list available log groups:

$ awslogs groups

To list log streams:

$ awslogs streams /var/log/syslog

To get the records and follow them (see new ones as they come):

$ awslogs get --watch /var/log/syslog

And you may filter the records by time range:

$ awslogs get /var/log/syslog -s 2015-11-10T15:45:00 -e 2015-11-10T15:50:00

Since version 0.2.0 it also has the --filter-pattern option.

The output has columns:

  • 1st: log group name
  • 2nd: log stream name
  • 3rd: message

Using --no-group and --no-stream you may switch the first two columns off.

Using --no-color you may get rid of color control characters in the output.

EDIT: since awslogs version 0.2.0 adds --filter-pattern, the text has been updated.

1 vote

If you are using the Python Boto3 library to extract AWS CloudWatch logs, the get_log_events() function accepts start and end times in milliseconds.

For reference: http://boto3.readthedocs.org/en/latest/reference/services/logs.html#CloudWatchLogs.Client.get_log_events

For this you can take a UTC time input and convert it into milliseconds using the datetime and calendar.timegm modules, and you are good to go:

from calendar import timegm
from datetime import datetime, timedelta
import sys

# Optional start/end times from the command line, e.g. '2015-11-13 00:00:00'
args = [datetime.strptime(a, '%Y-%m-%d %H:%M:%S') for a in sys.argv[1:3]]
start_time = args[0] if len(args) > 0 else None
end_time = args[1] if len(args) > 1 else None

# If no time filters are given use the last hour
now = datetime.utcnow()
start_time = start_time or now - timedelta(hours=1)
end_time = end_time or now
start_ms = timegm(start_time.utctimetuple()) * 1000
end_ms = timegm(end_time.utctimetuple()) * 1000

So you can give the inputs as stated below, using sys.argv, as:

python flowlog_read.py '2015-11-13 00:00:00' '2015-11-14 00:00:00'
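To round the snippet off, here is a minimal sketch of feeding start_ms and end_ms to get_log_events(); the log group and stream names are hypothetical placeholders:

import boto3

client = boto3.client('logs')
response = client.get_log_events(
    logGroupName='/var/log/syslog',  # hypothetical group name
    logStreamName='my-instance',     # hypothetical stream name
    startTime=start_ms,
    endTime=end_ms,
    startFromHead=True,              # oldest events first
)
for event in response['events']:
    print(event['message'])

Note that get_log_events() reads one stream at a time and caps each response (at most 1 MB or 10,000 events), so for longer intervals you call it repeatedly with the returned nextForwardToken.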
1 vote

While Jan's answer is a great one and probably what the author wanted, please note that there is an additional way to get programmatic access to the logs: via subscriptions.

This is intended for always-on streaming scenarios where data is constantly fetched (usually into a Kinesis stream) and then further processed.
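For illustration only, creating such a subscription programmatically might look like the boto3 sketch below; every name and ARN in it is a placeholder, and the IAM role must be one that CloudWatch Logs is allowed to assume:

import boto3

client = boto3.client('logs')
client.put_subscription_filter(
    logGroupName='/var/log/syslog',
    filterName='error-stream',  # placeholder filter name
    filterPattern='ERROR',
    # Placeholder destination (a Kinesis stream) and delivery role
    destinationArn='arn:aws:kinesis:us-east-1:123456789012:stream/my-stream',
    roleArn='arn:aws:iam::123456789012:role/cwl-to-kinesis',
)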

0 votes

Haven't used it myself, but here is an open-source CloudWatch-to-Excel exporter I came across on GitHub:

https://github.com/petezybrick/awscwxls

Generic AWS CloudWatch to Spreadsheet Exporter

CloudWatch doesn't provide an export utility; this does. awscwxls creates spreadsheets based on generic sets of Namespace/Dimension/Metric/Statistic specifications. As long as AWS continues to follow the Namespace/Dimension/Metric/Statistic pattern, awscwxls should work for existing and future Namespaces (Services). Each set of specifications is stored in a properties file, so each properties file can be configured for a specific set of AWS Services and resources. Take a look at run/properties/template.properties for a complete example.

0 votes

I think the best option to retrieve the data is to use the API directly, as described in the CloudWatch Logs API reference (the GetLogEvents and FilterLogEvents operations).