1 vote

My function is:

import boto3
import csv

s3 = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')

def lambda_handler(event, context):
    bucket = 'bucketname'
    file_name = 'filename.csv'

    # Fetch the CSV from S3 and decode the bytes so csv.reader gets strings
    obj = s3.get_object(Bucket=bucket, Key=file_name)
    rows = obj['Body'].read().decode('utf-8')
    lines = rows.splitlines()
    reader = csv.reader(lines)
    parsed_csv = list(reader)
    num_rows = len(parsed_csv)

    table = dynamodb.Table('table_name')

    # Skip the header row and write the remaining rows in batches
    with table.batch_writer() as batch:
        for i in range(1, num_rows):
            Brand_Name = parsed_csv[i][0]
            Assigned_Brand_Name = parsed_csv[i][1]
            Brand_URL = parsed_csv[i][2]
            Generic_Name = parsed_csv[i][3]
            HSN_Code = parsed_csv[i][4]
            GST_Rate = parsed_csv[i][5]
            Price = parsed_csv[i][6]
            Dosage = parsed_csv[i][7]
            Package = parsed_csv[i][8]
            Size = parsed_csv[i][9]
            Size_Unit = parsed_csv[i][10]
            Administration_Form = parsed_csv[i][11]
            Company = parsed_csv[i][12]
            Uses = parsed_csv[i][13]
            Side_Effects = parsed_csv[i][14]
            How_to_use = parsed_csv[i][15]
            How_to_work = parsed_csv[i][16]
            FAQs_Downloaded = parsed_csv[i][17]
            Alternate_Brands = parsed_csv[i][18]
            Prescription_Required = parsed_csv[i][19]
            Interactions = parsed_csv[i][20]

            batch.put_item(Item={
                'Brand Name': Assigned_Brand_Name,
                'Brand URL': Brand_URL,
                'Generic Name': Generic_Name,
                'Price': Price,
                'Dosage': Dosage,
                'Company': Company,
                'Uses': Uses,
                'Side Effects': Side_Effects,
                'How to use': How_to_use,
                'How to work': How_to_work,
                'FAQs Downloaded?': FAQs_Downloaded,
                'Alternate Brands': Alternate_Brands,
                'Prescription Required': Prescription_Required,
                'Interactions': Interactions
            })

Response: { "errorMessage": "2020-10-14T11:40:56.792Z ecd63bdb-16bc-4813-afed-cbf3e1fa3625 Task timed out after 3.00 seconds" }

Here I am working on 21 columns and 7,000 rows of data. – Sayan Dey

I use the batch writer in the same way and I had no issues. If you have access to CloudTrail you may be able to get more info from ecd63bdb-16bc-4813-afed-cbf3e1fa3625. One thing I would suggest is to use generators if possible: for example, instead of for i in range(1, num_rows): I would go directly with for row in parsed_csv, or for row in csv.reader(obj['Body']), especially knowing that the records are large. – petrch
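A minimal sketch of the streaming approach this comment suggests, assuming the same bucket, key, table name, and column order as in the question; codecs.getreader is used here so csv.reader receives decoded text rather than raw bytes:

import codecs
import csv

import boto3

s3 = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')

def lambda_handler(event, context):
    obj = s3.get_object(Bucket='bucketname', Key='filename.csv')

    # Decode and parse the body as it streams instead of loading it all into memory
    reader = csv.reader(codecs.getreader('utf-8')(obj['Body']))
    next(reader, None)  # skip the header row

    table = dynamodb.Table('table_name')
    with table.batch_writer() as batch:
        for row in reader:
            batch.put_item(Item={
                'Brand Name': row[1],
                'Brand URL': row[2],
                'Generic Name': row[3],
                'Price': row[6],
                # ... remaining columns mapped the same way as in the question
            })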

1 Answer

0 votes

You haven't specified how many rows there are in your CSV file. "Huge" is pretty subjective, so it is possible that your task is timing out due to throttling on the DynamoDB table.

If you are using provisioned capacity on the table you are loading into, make sure you have enough capacity allocated. If you're using on-demand capacity then this might be due to the on-demand partitioning that happens when the table needs to scale up.
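If you do need to bump provisioned capacity, a minimal sketch (the table name matches the question's code; the capacity numbers are placeholders, not a recommendation):

import boto3

dynamodb_client = boto3.client('dynamodb')

# Raise write capacity before a bulk load; the values here are placeholders
dynamodb_client.update_table(
    TableName='table_name',
    ProvisionedThroughput={
        'ReadCapacityUnits': 5,
        'WriteCapacityUnits': 500,
    },
)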

Either way, you may want to add some error handling for situations like these and add a delay when you get a timeout, before retrying and resuming.
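A minimal sketch of that kind of error handling, using a hypothetical put_with_backoff helper around single-item writes (boto3's batch writer already resends unprocessed items, but throttling errors can still surface):

import time

from botocore.exceptions import ClientError

def put_with_backoff(table, item, max_attempts=5):
    # Retry a throttled write with exponential backoff before giving up
    for attempt in range(max_attempts):
        try:
            table.put_item(Item=item)
            return
        except ClientError as err:
            if err.response['Error']['Code'] != 'ProvisionedThroughputExceededException':
                raise
            time.sleep(2 ** attempt)  # 1s, 2s, 4s, ...
    raise RuntimeError('write failed after {} attempts'.format(max_attempts))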

Something to keep in mind is that every write to DynamoDB consumes at least 1 WCU (1 WCU per KB of item size, rounded up), and a single partition can serve at most 1,000 WCU. So as your write throughput increases, the table may undergo multiple splits behind the scenes when you're in on-demand mode. For provisioned mode, you'll have to allocate enough capacity to begin with, otherwise you'll be limited to writing however many items per second your provisioned write capacity allows.
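As a rough worked example, assuming the ~7,000 rows mentioned in the comments each serialize to 1 KB or less: the full load costs about 7,000 WCUs, so a table provisioned at, say, 100 WCU would need roughly 70 seconds of sustained writes, far longer than the 3 seconds shown in the timeout above.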