0
votes

I am getting issues when logging to cloudwatch where it appears intermittent. Not all errors are logging to Cloudwatch and I am getting Read-only file system Exceptions in the default lambda log group.

The architecture of the system involves a lambda (dotnet core 2.1) being triggered by and receiving a message from a SQS queue. The lambda then makes a http requests to a 3rd party and uploads the response to S3. There are 1000+ messages on the queue at a time and I have set the lambdas reserved concurrency to 100 to deal with this which processes 1 record at a time, which all appears fine.

I am using Nlog to format and filter logging in the lambda, the following config shows that I only have AWSTargets and no file system targets. I have added async wrappers as an attempt to stop these issues but still no luck.

<?xml version="1.0" encoding="utf-8" ?>
<nlog xmlns="http://www.nlog-project.org/schemas/NLog.xsd"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      throwConfigExceptions="true">
  <extensions>
    <add assembly="NLog.AWS.Logger" />
  </extensions>
  <targets>
    <target name="CloudwatchTraceTargetAsync" xsi:type="AsyncWrapper" overflowAction="Block">
      <target name="CloudwatchTraceTarget" type="AWSTarget" logGroup="/aws/lambda/adx-dev-batch-processor-consumer-trace" region="eu-west-2" layout="${longdate}|${uppercase:${level}}|${logger}|${message} ${exception:format=tostring}"/>
    </target>
    <target name="CloudwatchErrorTargetAsync" xsi:type="AsyncWrapper" overflowAction="Block">
      <target name="CloudwatchErrorTarget" type="AWSTarget" logGroup="/aws/lambda/adx-dev-batch-processor-consumer-error" region="eu-west-2" layout="${longdate}|${uppercase:${level}}|${logger}|${message} ${exception:format=tostring}"/>
    </target>
  </targets>
  <rules>
    <logger name="*" minlevel="Trace" maxlevel="Trace" writeTo="CloudwatchTraceTargetAsync" enabled="false" />
    <logger name="*" minlevel="Error" maxlevel="Error" writeTo="CloudwatchErrorTargetAsync" enabled="true" />
  </rules>
</nlog>

The entrypoint to the lambda is as follows. I am constructor injecting loggers into each of my services and loggings as normal.

 public class QueueConsumerLambda
    {
        private const string AspnetCoreEnvironmentVariable = "ASPNETCORE_ENVIRONMENT";
        private IServiceProvider _serviceProvider;

        [LambdaSerializer(typeof(Amazon.Lambda.Serialization.Json.JsonSerializer))]
        public async Task ProcessQueueMessage(SQSEvent sqsEvent, ILambdaContext context)
        {
            var stopwatch = new Stopwatch();
            stopwatch.Start();
            var environmentName = Environment.GetEnvironmentVariable(AspnetCoreEnvironmentVariable);

            LogManager.Configuration = new XmlLoggingConfiguration($"nlog.Consumer.{environmentName}.config");

            var logger = LogManager.GetCurrentClassLogger();
            try
            {
                logger.Trace($"QueueConsumer.ProcessQueueMessage - started with {sqsEvent.Records.Count} records.");

                var config = new ConfigurationBuilder()
                    .SetBasePath(Environment.CurrentDirectory)
                    .AddJsonFile($"appsettings.{environmentName}.json", false, false)
                    .AddEnvironmentVariables()
                    .Build();

                _serviceProvider = ConfigureServices(config);

                var dataRetrievalController = _serviceProvider.GetRequiredService<QueueConsumerController>();

                foreach (var record in sqsEvent.Records)
                {
                    logger.Trace($"QueueConsumer.ProcessQueueMessage - messageId: {record.MessageId}, eventSource: {record.EventSource}, record.Body: {record.Body}");

                    var messageBody = JsonConvert.DeserializeObject<SqsMessageModel>(record.Body);

                    await dataRetrievalController.GetData(messageBody, record.ReceiptHandle);
                }

                stopwatch.Stop();
                logger.Trace($"QueueConsumer.ProcessQueueMessage - finished and it took {stopwatch.Elapsed.Seconds} seconds.");
            }
            catch (Exception ex)
            {
                logger.Error(ex, "Stopped program because of exception");
            }
            finally
            {
                LogManager.Shutdown();
            }
        }
        private static IServiceProvider ConfigureServices(IConfiguration config)
        {
            var services = new ServiceCollection()
                .AddLogging(loggingBuilder =>
                {
                    // configure Logging with NLog
                    loggingBuilder.ClearProviders();
                    loggingBuilder.AddConfiguration(config.GetSection("Logging"));
                    loggingBuilder.AddNLog(config);
                })

                // registering more services
        }
    }

I am successfully getting logs into the separate error log group but not all of them. Some errors are not getting logged and I know this because the expected output file from the http request is not in S3 and when I debug locally I am hitting the expected exceptions. My only guess as to what is happening is the Read-only file system error that is being logged to the default log group is a symptom. It is also worth noting that not always the same error is missing, if I process the same requests the next day a different number of errors are not being logged.

I am confused, not sure if I have my logging set up correctly in the context of a lambda function or if I am hitting cloudwatch limits or any other reason. So any insight is appreciated. Let me know if more information is required

1

1 Answers

0
votes

I would probably change it so the constructor is the one responsible for setting up logging. Instead of having the request-function:

 public class QueueConsumerLambda
    {
        private const string AspnetCoreEnvironmentVariable = "ASPNETCORE_ENVIRONMENT";
        private IServiceProvider _serviceProvider;
        private NLog.Logger _logger;

        public QueueConsumerLambda()
        {
            try
            {
            var environmentName = Environment.GetEnvironmentVariable(AspnetCoreEnvironmentVariable);
            LogManager.Configuration = new XmlLoggingConfiguration($"nlog.Consumer.{environmentName}.config");

            _logger = LogManager.GetCurrentClassLogger();

                var config = new ConfigurationBuilder()
                    .SetBasePath(Environment.CurrentDirectory)
                    .AddJsonFile($"appsettings.{environmentName}.json", false, false)
                    .AddEnvironmentVariables()
                    .Build();


                _serviceProvider = ConfigureServices(config)
            }
            catch (Exception ex)
            {
                _logger?.Error(ex, "Stopped program because of exception");
                _logger?.LogFactory?.Shutdown();
            }
        }

        [LambdaSerializer(typeof(Amazon.Lambda.Serialization.Json.JsonSerializer))]
        public async Task ProcessQueueMessage(SQSEvent sqsEvent, ILambdaContext context)
        {
            var stopwatch = new Stopwatch();
            stopwatch.Start();

            var logger = _logger;
            try
            {
                logger.Trace($"QueueConsumer.ProcessQueueMessage - started with {sqsEvent.Records.Count} records.");

                var dataRetrievalController = _serviceProvider.GetRequiredService<QueueConsumerController>();

                foreach (var record in sqsEvent.Records)
                {
                    logger.Trace($"QueueConsumer.ProcessQueueMessage - messageId: {record.MessageId}, eventSource: {record.EventSource}, record.Body: {record.Body}");

                    var messageBody = JsonConvert.DeserializeObject<SqsMessageModel>(record.Body);

                    await dataRetrievalController.GetData(messageBody, record.ReceiptHandle);
                }

                stopwatch.Stop();
                logger.Trace($"QueueConsumer.ProcessQueueMessage - finished and it took {stopwatch.Elapsed.Seconds} seconds.");
            }
            catch (Exception ex)
            {
                logger.Error(ex, "Stopped program because of exception");
            }
            finally
            {
                LogManager.Flush(); // Not shutdown
            }
        }
        private static IServiceProvider ConfigureServices(IConfiguration config)
        {
            var services = new ServiceCollection()
                .AddLogging(loggingBuilder =>
                {
                    // configure Logging with NLog
                    loggingBuilder.ClearProviders();
                    loggingBuilder.AddConfiguration(config.GetSection("Logging"));
                    loggingBuilder.AddNLog(config);
                })

                // registering more services
        }
    }

But maybe the correct advice is doing the logging through the ILambdaContext. See also https://github.com/NLog/NLog.Extensions.Logging/wiki/NLog-cloud-logging-with-Azure-function-or-AWS-lambda#writing-to-ilambdacontext-ilambdalogger-in-aws and https://github.com/aws/aws-logging-dotnet#aws-lambda