Creating Alarm in AWS for EC2 Instances

Question

How to create alarm when 1) Alert when EC2 instance runs for too long (Say for 1 hour). 2)Alert when number of EC2 instances reaches a threshold (say 5 instances at a time)

One more assumption is, these EC2 instance are specific.Say these alerts applicable to EC2 instances where their instance name start with "test".

When i try to create the alarm , i haven't see this logic in Metrics. Standard Metrics include CPU Utilization, Network In, Network Out etc.

Is there a way to create this alarm either by defining our custom metrics or some other options?

Have you explored cloudwatch documentation? Their are apis which allow you to publish metrics — Shibashis
@Shibashis had a look into publich metrics.But i am not sure where the logics of the metrics is defined.Eg: docs.aws.amazon.com/cli/latest/reference/cloudwatch/… , i am seing only metric name, statiscal outputs and units are defined. But what i want is, suppose metric name is EC2InstanceHealthDuration (means how long the EC2 instances ran until it started).There should be some unix scripts about the logic that metric name is doing. Please let me know where can i find the same. — Vasanth
cloudwatch only collects stats and allows you to create alerts on them, You will need to create scripts for watching how long the instance has been up and then push that metric to cloudwatch. If you dont want to develop such logic you may need to consider other software like datadog,new relic, nagios etc — Shibashis
Thanks @Shibashis, so as for the above requirement 1) when EC2 instance runs for too long (Say for 1 hour). 2)Alert when number of EC2 instances reaches a threshold(say 5 instances at a time) doesn't able to use standard metrics. If we write a custom metric, where we have to write the logic of the scripts. I saw in put-metric-data (docs.aws.amazon.com/cli/latest/reference/cloudwatch/…), there is no option to place the scripts. It would be good, if you throw some light on this. — Vasanth
For pushing the metric u have many options, i would suggest one of these three options 1> place the scripts on instance themselves 2> create another ec2 where the scripts will run 3> Use AWS lambda scheduled events to push the metrics. — Shibashis

cage cage · Accepted Answer · 2017-07-26T16:39:05

For automatically deployed instances it’s impossible to setup CloudWatch Alarm as you do not know the instance ID. The only way to setup an alarm was to create an AWS Lambda function that poles all the running instances and compares their launch time to a specified timeout.

The lambda function is periodically triggered by a CloudWatch - Event – Rule.

Use tags to specify different run durations to different machines. For example your launch tool should tag the instance with key value “Test”

Please note this code comes with NO warranties at all! This is more of an example.

import boto3
import datetime
import json
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

ec2_client = boto3.client('ec2')

INSTANCE_TIMEOUT = 24
MAX_PERMITTED_INSTANCES = 5 
MAILING_LIST = "jon.doh@simpsons.duf, drunk@beer.com"

def parse_tag(tags, keyValuePair):
    for tag in tags:
        if tag['Key'] == keyValuePair[0] and tag['Value'] == keyValuePair[1]:
            return True
    return False

def runtimeExceeded(instance, timeOutHours):
    # Working in to UTC to avoid time-travel during daylight-saving changeover
    timeNow = datetime.datetime.utcnow()
    instanceRuntime = timeNow - instance.launch_time.replace(tzinfo=None)
    print instanceRuntime
    if instanceRuntime > datetime.timedelta(hours=timeOutHours):
        return True
    else:
        return False

def sendAlert(instance, message):
    msg = MIMEMultipart()
    msg['From'] = 'AWS_Notification@sourcevertex.net'
    msg['To'] = MAILING_LIST
    msg['Subject'] = "AWS Alert: " + message
    bodyText = '\n\nThis message was sent by the AWS Monitor ' + \
        'Lambda. For details see AwsConsole-Lambdas. \n\nIf you want to ' + \
        'exclude an instance from this monitor, tag it ' + \
        'with Key=RuntimeMonitor Value=False'

    messageBody = MIMEText( message + '\nInstance ID: ' +
                    str(instance.instance_id) + '\nIn Availability zone: '
                    + str(instance.placement['AvailabilityZone']) + bodyText)
    msg.attach(messageBody)

    ses = boto3.client('ses')
    ses.send_raw_email(RawMessage={'Data' : msg.as_string()})

def lambda_handler(event, context):
    aws_regions = ec2_client.describe_regions()['Regions']
    for region in aws_regions:
        runningInstancesCount = 0
        try:
            ec2 = boto3.client('ec2', region_name=region['RegionName'])
            ec2_resource = boto3.resource('ec2',
                            region_name=region['RegionName'])
            aws_region = region['RegionName']

            instances = ec2_resource.instances.all()

            for i in instances:
                if i.state['Name'] == 'running':
                    runningInstancesCount +=1
                    if i.tags != None:
                        if parse_tag(i.tags, ('RuntimeMonitor', 'False')):
                            # Ignore these instances
                            pass
                        else:
                            if runtimeExceeded(i, INSTANCE_TIMEOUT):
                                sendAlert(i, "An EC2 instance has been running " + \
                                "for over {0} hours".format(INSTANCE_TIMEOUT))
                    else:
                        print "Untagged instence"
                        if runtimeExceeded(i, UNKNOWN_INSTANCE_TIMEOUT):
                                sendAlert(i, "An EC2 instance has been running " + \
                                "for over {0} hours".format(UNKNOWN_INSTANCE_TIMEOUT))

        except Exception as e:
            print e
            continue

        if runningInstancesCount > MAX_PERMITTED_INSTANCES:
            sendAlert(i, "Number of running instances exceeded threshold  " + \
                    "{0} running instances".format(runningInstancesCount))

    return True

Creating Alarm in AWS for EC2 Instances

2 Answers