I'd like to create an EMR cluster programmatically using spot pricing to achieve some cost savings. To do this, I am trying to retrieve EMR spot instance pricing from AWS using boto3, but the only API I'm aware of in boto3 is the ec2 client's describe_spot_price_history call - https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ec2.html#EC2.Client.describe_spot_price_history

The prices from EC2 are not indicative of the pricing for EMR as seen here - https://aws.amazon.com/emr/pricing/. The values are almost double that of EMR's.

Is there a way that I can see the spot price history for EMR similar to EC2? I have checked https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/emr.html and several other pages of documentation from AWS online about this and have found nothing.

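Here's a code snippet that I use to check approximate pricing, which I then use to bid on EMR instances.

from datetime import datetime

import boto3

# Region inferred from the availability zones used below
ec2 = boto3.client('ec2', region_name='us-east-1')

max_bid_price = 0.140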
min_bid_price = max_bid_price
az_choice = ''
response = ec2.describe_spot_price_history(
    Filters=[{
        'Name': 'availability-zone',
        'Values': ['us-east-1a', 'us-east-1c', 'us-east-1d']
        },
        {
            'Name': 'product-description',
            'Values': ['Linux/UNIX (Amazon VPC)']
        }],
    InstanceTypes=['r5.2xlarge'],
    EndTime=datetime.now(),
    StartTime=datetime.now()
)
# TODO: Add more Subnets in other AZ's if picking from our existing 3 is an issue
# 'us-east-1b', 'us-east-1e', 'us-east-1f'
for spot_price_history in response['SpotPriceHistory']:
    print(spot_price_history)
    if float(spot_price_history['SpotPrice']) <= min_bid_price:
        min_bid_price = float(spot_price_history['SpotPrice'])
        az_choice = spot_price_history['AvailabilityZone']

The above fails because the prices for EC2 spot instances are higher than what Amazon charges per hour for EMR on-demand instances, so no spot price ever falls under my maximum bid of $0.140 (e.g. on demand for a cluster of that size only costs $0.126/hour, but on demand for EC2 is $0.504/hour and spot instances go for about $0.20/hour).

When you use EMR, you are charged the EMR price plus the price of each EC2 instance. There is no spot price for EMR; you can get the spot price for each node and then add the EMR on-demand price. That is the total price. - Lamanus

1 Answer

There is no such thing as EMR spot pricing, as already mentioned in the comment. Spot pricing applies to EC2 instances. You can look at the AWS Spot Instance Advisor page to find out which instance categories have a lower interruption rate, and choose based on that.
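
To make the comment's point concrete, here is a minimal sketch of the arithmetic: the hourly cost per node on EMR is the EC2 spot price plus the EMR charge. The function name `emr_hourly_cost` is hypothetical, and the figures are the ones quoted in the question, not current prices.

```python
def emr_hourly_cost(ec2_spot_price, emr_price, instance_count=1):
    """Estimated hourly cost: (EC2 spot price + EMR charge) per instance."""
    return round((ec2_spot_price + emr_price) * instance_count, 3)


# Figures quoted in the question for r5.2xlarge (assumed, not current prices)
cost = emr_hourly_cost(ec2_spot_price=0.20, emr_price=0.126)
print(cost)
```

So the bid you compare against the spot price history should be sized for the EC2 part only; the EMR charge is added on top regardless.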

In 2017, AWS changed the algorithm for spot pricing to one "where prices adjust more gradually, based on longer-term trends in supply and demand", so you probably don't need to look at the historical spot prices. More details about that can be found here.

Nowadays, you're most likely going to be fine using the latest price (plus a small delta) for that instance. This can be achieved using the following code snippet:

from collections import namedtuple
from datetime import datetime, timedelta

import boto3


def get_bid_price(instancetype, aws_region):
    instance_types = [instancetype]
    start = datetime.now() - timedelta(days=1)

    ec2_client = boto3.client('ec2', aws_region)
    price_dict = ec2_client.describe_spot_price_history(StartTime=start,
                                                        InstanceTypes=instance_types,
                                                        ProductDescriptions=['Linux/UNIX (Amazon VPC)']
                                                        )
    if len(price_dict.get('SpotPriceHistory')) > 0:
        PriceHistory = namedtuple('PriceHistory', 'price timestamp')
        price_list = [PriceHistory(round(float(item.get('SpotPrice')), 3), item.get('Timestamp'))
                      for item in price_dict.get('SpotPriceHistory')]
        price_list.sort(key=lambda tup: tup.timestamp, reverse=True)

        # Add a small delta (1 cent) to the most recent spot price
        bid_price = round(price_list[0].price + 0.01, 3)
        return bid_price
    else:
        raise ValueError('Invalid instance type: {} provided. '
                         'Please provide correct instance type.'.format(instancetype))
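
Once you have a bid price, it goes into the instance group definition you pass to the EMR client's run_job_flow call. The following is a sketch only: the helper name `spot_instance_group` and the example values are hypothetical, and the dict is built without calling AWS so you can inspect it first.

```python
def spot_instance_group(name, role, instance_type, count, bid_price):
    """Build one InstanceGroups entry for emr.run_job_flow using spot capacity."""
    return {
        'Name': name,
        'InstanceRole': role,        # 'MASTER', 'CORE' or 'TASK'
        'Market': 'SPOT',
        'BidPrice': str(bid_price),  # the bid is passed as a string, in USD
        'InstanceType': instance_type,
        'InstanceCount': count,
    }


# Example values are assumptions; in practice bid_price could come from get_bid_price()
core_group = spot_instance_group('Core nodes', 'CORE', 'r5.2xlarge', 2, 0.21)
print(core_group)
```

The resulting dict would be placed in the Instances={'InstanceGroups': [...]} argument of run_job_flow, alongside the rest of your cluster configuration.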